Comments (8)
See the docs here: https://pandera.readthedocs.io/en/latest/polars.html#error-reporting
This is intended behavior: LazyFrame validation will only do schema-level checks (so as not to materialize the data in a lazy method chain). Currently, pandera assumes that all custom checks operate on data. You can force data-level checks by explicitly setting `export PANDERA_VALIDATION_DEPTH=SCHEMA_AND_DATA`.
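For anyone who would rather set this from inside a script than from the shell, here is a minimal sketch. It assumes pandera reads the variable when it is first imported, so the assignment has to happen before any `import pandera` line; the helper name `force_data_level_checks` is made up for illustration and is not part of pandera.

```python
import os


def force_data_level_checks() -> None:
    """Ask pandera to run data-level checks as well as schema-level ones.

    Assumption: pandera picks this variable up at import time, so call
    this before the first `import pandera` in the process.
    """
    os.environ["PANDERA_VALIDATION_DEPTH"] = "SCHEMA_AND_DATA"


force_data_level_checks()

# Only import pandera (e.g. `import pandera.polars as pa`) after this point,
# then validate the LazyFrame as usual.
```

Keep in mind the trade-off mentioned above: running data-level checks means the data in the lazy chain has to be materialized at validation time.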
from pandera.
Gotcha, yeah looks like a bug, looking.
from pandera.
Is this a duplicate of #1565?
from pandera.
See the docs here: https://pandera.readthedocs.io/en/latest/polars.html#error-reporting
This is intended behavior: LazyFrame validation will only do schema-level checks (so as not to materialize the data in a lazy method chain). Currently, pandera assumes that all custom checks operate on data. You can force data-level checks by explicitly setting `export PANDERA_VALIDATION_DEPTH=SCHEMA_AND_DATA`.
This is super helpful and makes total sense. Thanks for the feedback.
from pandera.
Is this a duplicate of #1565?
I don't think so. The error that I'm experiencing in #1565 is specific to `pl.DataFrame`.
from pandera.
@philiporlando would it make sense to add some logging at validation time to explicitly say what types of checks are being run? If so, would it make sense as `logging.info`, `debug`, or something else?
from pandera.
@philiporlando would it make sense to add some logging at validation time to explicitly say what types of checks are being run? If so, would it make sense as `logging.info`, `debug`, or something else?
I'm in favor of this! At the very least, I think it would be helpful to communicate which data-level checks are ignored whenever a LazyFrame is validated instead of a DataFrame. It might even make sense to log a warning here?
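To make the suggestion concrete, here is a rough sketch of what such a warning could look like. It uses only the standard `logging` module and is not pandera's actual code: `CheckInfo`, `select_checks`, and the logger name are all hypothetical names invented for this example.

```python
import logging
from dataclasses import dataclass

# Hypothetical logger name, chosen only for this sketch.
logger = logging.getLogger("validation_sketch")


@dataclass
class CheckInfo:
    """Hypothetical stand-in for a check; not a pandera class."""
    name: str
    data_level: bool  # True if the check needs to look at the data itself


def select_checks(checks, is_lazy):
    """Return the checks to run, warning about any that are skipped.

    For lazy input, data-level checks are dropped and their names are
    logged at WARNING level so the user knows what was not validated.
    """
    if not is_lazy:
        return list(checks)
    skipped = [c.name for c in checks if c.data_level]
    if skipped:
        logger.warning(
            "LazyFrame input: skipping data-level checks: %s",
            ", ".join(skipped),
        )
    return [c for c in checks if not c.data_level]


checks = [CheckInfo("dtype", False), CheckInfo("greater_than(0)", True)]
to_run = select_checks(checks, is_lazy=True)
```

A warning is used here instead of `info` or `debug` because those levels are hidden under Python's default logging configuration, which would defeat the purpose of telling users that some checks were ignored.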
from pandera.
Gotcha, yeah looks like a bug, looking.
Thank you for looking into it!
from pandera.
Related Issues (20)
- Improve strategies internals: accumulate check statistics instead of filtering
- Nullability for `pl.Float64` in `pl.DataFrame` fails
- Try_Pandera edits to be more clear and beginner friendly
- Validate on Initialization doesn't work in 3.11.9 and 3.12.3
- Annotated parametrized dtypes error on version >= 0.19.0
- Allow use of generic pa.DataFrameSchema/Model for different supported libraries
- Time-agnostic DateTime with pandera-native polars datatype using DataFrameModel not working
- Cannot call `get_metadata` on a DataFrameModel if there is a Config without a metadata attribute
- NaNs in boolean column coerced to True, nullable and default parameters are ignored
- Pandera is very slow to import when optional dependencies are installed
- Missing `reason_code` when using custom checks with PySpark dataframes
- Finite values in `pl.DataFrame`
- Optional import hypotheses doesn't install hypothesis
- Custom Check Methods don't support custom error (any more)
- Unexpected behavior when validating date objects. pandera=0.19.1
- Compatibility issues with Pandas
- pandera not compatible with numpy 2.0
- `SchemaFieldNotFoundError` with custom check function if no alias is provided.
- Adding missing columns with a string default
- Scalar return for check in polars-backed model fails on validation with `lazy=True`