Comments (5)
Thanks for bringing this up @kernelpernel. Would it be possible to provide some screenshots and a minimal reproducible example? I don't really understand what you mean by docs being injected.
No screenshots due to possible IP conflicts, but I put together this quick example.
If I write this class:
class ExampleSchema(pa.SchemaModel):
    """Schema to demonstrate doc injection."""
    Column1: sc.Integer = sc.IntegerF()
    Column2: sc.Str = sc.StrF()
I get this output for the sphinx-generated docs:
class jane_dev.options.utils.doc_testing.ExampleSchema(*args, **kwargs)
Bases: "pandera.api.pandas.model.DataFrameModel"
Schema to demonstrate doc injection.
Check if all columns in a dataframe have a column in the Schema.
Parameters:
* **check_obj** (*pd.DataFrame*) -- the dataframe to be
validated.
* **head** -- validate the first n rows. Rows overlapping with
"tail" or "sample" are de-duplicated.
* **tail** -- validate the last n rows. Rows overlapping with
"head" or "sample" are de-duplicated.
* **sample** -- validate a random sample of n rows. Rows
overlapping with "head" or "tail" are de-duplicated.
* **random_state** -- random seed for the "sample" argument.
* **lazy** -- if True, lazily evaluates dataframe against all
validation checks and raises a "SchemaErrors". Otherwise,
raise "SchemaError" as soon as one occurs.
* **inplace** -- if True, applies coercion to the object of
validation, otherwise creates a copy of the data.
Returns:
validated "DataFrame"
Raises:
**SchemaError** -- when "DataFrame" violates built-in or custom
checks.
Example:
Calling "schema.validate" returns the dataframe.
>>> import pandas as pd
>>> import pandera as pa
>>>
>>> df = pd.DataFrame({
...     "probability": [0.1, 0.4, 0.52, 0.23, 0.8, 0.76],
...     "category": ["dog", "dog", "cat", "duck", "dog", "dog"]
... })
>>>
>>> schema_withchecks = pa.DataFrameSchema({
...     "probability": pa.Column(
...         float, pa.Check(lambda s: (s >= 0) & (s <= 1))),
...
...     # check that the "category" column contains a few discrete
...     # values, and the majority of the entries are dogs.
...     "category": pa.Column(
...         str, [
...             pa.Check(lambda s: s.isin(["dog", "cat", "duck"])),
...             pa.Check(lambda s: (s == "dog").mean() > 0.5),
...         ]),
... })
>>>
>>> schema_withchecks.validate(df)[["probability", "category"]]
   probability category
0         0.10      dog
1         0.40      dog
2         0.52      cat
3         0.23     duck
4         0.80      dog
5         0.76      dog
Column1: pandera.typing.pandas.Series[pandas.core.arrays.integer.Int64Dtype] = 'Column1'
Column2: pandera.typing.pandas.Series[str] = 'Column2'
class Config
Bases: "pandera.api.pandas.model_config.BaseConfig"
name: str | None = 'ExampleSchema'
name of schema
Where I would expect to only see this:
class jane_dev.options.utils.doc_testing.ExampleSchema(*args, **kwargs)
Bases: "pandera.api.pandas.model.DataFrameModel"
Schema to demonstrate doc injection.
Column1: pandera.typing.pandas.Series[pandas.core.arrays.integer.Int64Dtype] = 'Column1'
Column2: pandera.typing.pandas.Series[str] = 'Column2'
And the docs appear to be the same as those from here:
Pandera Docs
Thanks for the quick response @cosmicBboy!
It's probably because of the __new__ method: https://github.com/unionai-oss/pandera/blob/main/pandera/api/dataframe/model.py#L127-L132
Can you try overriding that method and see if it still happens?
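A minimal sketch of what overriding __new__ could look like. To keep it runnable without pandera, it uses a hypothetical stand-in class, BaseModel, in place of DataFrameModel; the assumption is that the base class's __new__ carries the long validate docstring and that Sphinx autodoc (depending on settings such as autoclass_content) appends it to the class docs. The override gives __new__ its own short docstring and otherwise just delegates to the parent.

```python
class BaseModel:
    """Hypothetical stand-in for pandera's DataFrameModel (assumption)."""

    def __new__(cls, *args, **kwargs):
        """Check if all columns in a dataframe have a column in the Schema."""
        # The real method is assumed to run validation; here we only
        # create the instance so the sketch stays self-contained.
        return super().__new__(cls)


class ExampleSchema(BaseModel):
    """Schema to demonstrate doc injection."""

    def __new__(cls, *args, **kwargs):
        """Create an ExampleSchema."""
        # Delegate to the parent so behavior is unchanged; only the
        # docstring that documentation tools can see differs.
        return super().__new__(cls, *args, **kwargs)


# The subclass now exposes its own short docstring on __new__,
# while the base class keeps the long one.
print(ExampleSchema.__new__.__doc__)
print(BaseModel.__new__.__doc__)
```

If the injected text disappears from the generated docs after an override like this in the real schema, that would point at the __new__ docstring as the source.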
@kernelpernel any updates on this issue?