Comments (7)

elijahbenizzy commented on June 28, 2024

Description:

I encountered an issue while trying to apply multiple data validation decorators to a single function in the Hamilton DAG framework. Specifically, I am trying to validate different columns of a DataFrame using multiple instances of the @check_output_custom decorator. However, I receive a ValueError indicating that the function cannot be defined more than once.

Steps to Reproduce:

  1. Define a function to process a DataFrame.
  2. Apply multiple @check_output_custom decorators to the function, each with different validation parameters.
  3. Attempt to run the decorated function.

Example code snippet:

1st issue code snippet

@check_output_custom(CompositePrimaryKeyValidatorPySparkDataFrame(columns=["OrderID", "ItemNumber"], importance="fail"))
@check_output_custom(CategoricalValuesValidatorPySparkDataFrame(column="CategoryID", allowed_values=[1, 2, 3], importance="fail"))
def process_order_data(order_data_config: dict, order_filter_template: List) -> DataFrame:
    # Function implementation
    pass

This raises the error:

ValueError: Cannot define function process_order_data_raw more than once. Already defined by function <function process_order_data

2nd issue code snippet

@check_output_custom(CategoricalValuesValidatorPySparkDataFrame(column="CategoryID", allowed_values=[1, 2, 3], importance="fail"))
@check_output_custom(CategoricalValuesValidatorPySparkDataFrame(column="ProductID", allowed_values=[10, 20, 30], importance="warn"))
def process_order_data(order_data_config: dict, order_filter_template: List) -> DataFrame:
    # Function implementation
    pass

This raises the error:

ValueError: Cannot define function process_order_data_CategoricalValuesValidator more than once. Already defined by function <function process_order_data
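
Both errors are consistent with each decorator adding derived nodes whose names depend only on the function name and the validator's name. The following is a minimal sketch of that collision using a toy registry. It is not Hamilton's implementation: `derived_node_names`, `register_nodes`, the `{function}_{suffix}` naming scheme, and the validator name `CompositePrimaryKeyValidator` are assumptions inferred from the two error messages above.

```python
from typing import Dict, List


def derived_node_names(fn_name: str, validator_names: List[str]) -> List[str]:
    """Names that one decorator application would add (assumed naming scheme)."""
    return [f"{fn_name}_raw"] + [f"{fn_name}_{v}" for v in validator_names]


def register_nodes(registry: Dict[str, str], names: List[str], owner: str) -> None:
    """Add names to the toy registry, refusing duplicates."""
    for name in names:
        if name in registry:
            raise ValueError(f"Cannot define function {name} more than once.")
        registry[name] = owner


registry: Dict[str, str] = {}

# First decorator: adds process_order_data_raw plus one validator node.
register_nodes(
    registry,
    derived_node_names("process_order_data", ["CompositePrimaryKeyValidator"]),
    "process_order_data",
)

try:
    # Second decorator on the same function: the "_raw" name is produced again.
    register_nodes(
        registry,
        derived_node_names("process_order_data", ["CategoricalValuesValidator"]),
        "process_order_data",
    )
    collision = None
except ValueError as exc:
    collision = str(exc)

print(collision)
```

Under the same assumed scheme, two validators of the same class would produce the same `{function}_{validator}` name, which matches the shape of the second error.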

Expected Behavior

Applying multiple @check_output_custom decorators to a single function should allow for different validation checks on various columns of the DataFrame without raising a ValueError.

Actual Behavior

A ValueError is raised, indicating that the function cannot be defined more than once by the same validator.

Library & System Information

python version = 3.9.5
hamilton library version = 1.65.0

Additional Context:

This issue prevents the application of multiple validators to a single function, which is necessary for comprehensive data validation in our use case. It would be helpful if the framework could support multiple validators on the same function without raising errors.

Thank you for your attention to this issue.

Thanks for opening! I think this is a limitation: for example, two validators that have the same name clash, and there is another complexity on top of that. I think we can build a fix, but just to check: if you pass them both to the same decorator (e.g. as follows), does it work? My guess is not, but it's worth a try:

@check_output_custom(
    CompositePrimaryKeyValidatorPySparkDataFrame(columns=["OrderID", "ItemNumber"], importance="fail"),
    CategoricalValuesValidatorPySparkDataFrame(column="CategoryID", allowed_values=[1, 2, 3], importance="fail"),
)

skrawcz commented on June 28, 2024

Another thought would be to add another custom validator that takes in multiple validators... 🤔

Otherwise I think a potential avenue to scope would be to include some name_ kwarg to help name the node so it doesn't clash...
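
The first idea, a validator that wraps several validators, could look roughly like the sketch below. It uses hypothetical stand-in classes (`Result`, `ColumnCheck`, `CompositeValidator`) and plain lists of dicts rather than Hamilton's validator base class or a PySpark DataFrame, so it only illustrates the aggregation logic, not the library's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Result:
    """Hypothetical stand-in for a validation result."""
    passes: bool
    message: str


@dataclass
class ColumnCheck:
    """Hypothetical validator: one labeled check over rows (a list of dicts)."""
    label: str
    predicate: Callable[[List[dict]], bool]

    def validate(self, rows: List[dict]) -> Result:
        ok = self.predicate(rows)
        return Result(ok, f"{self.label}: {'ok' if ok else 'failed'}")


class CompositeValidator:
    """Runs several validators and reports them as a single result."""

    def __init__(self, validators: Sequence[ColumnCheck]):
        self.validators = list(validators)

    def validate(self, rows: List[dict]) -> Result:
        results = [v.validate(rows) for v in self.validators]
        return Result(
            passes=all(r.passes for r in results),
            message="; ".join(r.message for r in results),
        )


rows = [{"CategoryID": 1, "ProductID": 10}, {"CategoryID": 4, "ProductID": 20}]
composite = CompositeValidator([
    ColumnCheck("CategoryID in {1,2,3}",
                lambda rs: all(r["CategoryID"] in {1, 2, 3} for r in rs)),
    ColumnCheck("ProductID in {10,20,30}",
                lambda rs: all(r["ProductID"] in {10, 20, 30} for r in rs)),
])
result = composite.validate(rows)
print(result.passes, result.message)
```

Because the wrapper presents itself as one validator, it would contribute a single name however many checks it holds. The trade-off is that it would presumably carry one importance level for the whole group, so mixing "fail" and "warn" checks would need extra handling.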

rohithrockzz commented on June 28, 2024

@elijahbenizzy
Yes, it's working. Thanks for your help. It would be helpful for newcomers if this were mentioned in the documentation.
However, the second issue is still present: passing the same data validator twice raises an error, as below:
@check_output_custom(
    CategoricalValuesValidatorPySparkDataFrame(column="ReportingId", allowed_values=[156], importance="fail"),
    CategoricalValuesValidatorPySparkDataFrame(column="CategoryID", allowed_values=[1, 2, 3], importance="fail"),
)

skrawcz commented on June 28, 2024

@rohithrockzz could you try installing
pip install sf-hamilton==1.66.1rc0
and see if that fixes your issue please?

rohithrockzz commented on June 28, 2024

@skrawcz
Yes, it worked. Thank you so much for the quick fix.

skrawcz commented on June 28, 2024

@rohithrockzz great, thanks for verifying. I will publish a non-RC version in the morning.

skrawcz commented on June 28, 2024

@rohithrockzz this has been released under sf-hamilton==1.66.1
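
For anyone landing here later, a quick way to check whether an environment already has the fixed release is sketched below. It relies only on the standard library's `importlib.metadata`. The helper names (`release_tuple`, `has_fix`, `installed_version`) are illustrative, and the comparison looks only at the numeric part of the version, so `1.66.1rc0` is treated as containing the fix, as verified earlier in this thread.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

FIXED_IN = (1, 66, 1)  # release named in this thread


def release_tuple(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric release segment, e.g. '1.66.1rc0' -> (1, 66, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def has_fix(version: str) -> bool:
    """True if the numeric release is at least 1.66.1."""
    return release_tuple(version) >= FIXED_IN


def installed_version(dist: str = "sf-hamilton") -> Optional[str]:
    """Return the installed version of the distribution, or None if it is absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


current = installed_version()
if current is None:
    print("sf-hamilton is not installed")
else:
    print(current, "includes the fix" if has_fix(current) else "predates the fix")
```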
