How to Avoid Pandera Doc Injection? (pandera, open, 5 comments)

kernelpernel commented on May 30, 2024

How to Avoid Pandera Doc Injection?

Comments (5)

cosmicBboy commented on May 30, 2024

Thanks for bringing this up @kernelpernel. Would it be possible to provide some screenshots and a minimal reproducible example? I don't really understand what you mean by docs being injected.

kernelpernel commented on May 30, 2024

No screenshots due to possible IP conflicts, but I put together this quick example. If I write this class:

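import pandera as pa

# sc is our in-house helper module (its import is not shown here)
class ExampleSchema(pa.SchemaModel):  # deprecated alias of pa.DataFrameModel
    """Schema to demonstrate doc injection."""

    Column1: sc.Integer = sc.IntegerF()
    Column2: sc.Str = sc.StrF()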

I get this output in the Sphinx-generated docs:

class jane_dev.options.utils.doc_testing.ExampleSchema(*args, **kwargs)

   Bases: "pandera.api.pandas.model.DataFrameModel"

   Schema to demonstrate doc injection.

   Check if all columns in a dataframe have a column in the Schema.

   Parameters:
      * **check_obj** (*pd.DataFrame*) -- the dataframe to be
        validated.

      * **head** -- validate the first n rows. Rows overlapping with
        "tail" or "sample" are de-duplicated.

      * **tail** -- validate the last n rows. Rows overlapping with
        "head" or "sample" are de-duplicated.

      * **sample** -- validate a random sample of n rows. Rows
        overlapping with "head" or "tail" are de-duplicated.

      * **random_state** -- random seed for the "sample" argument.

      * **lazy** -- if True, lazily evaluates dataframe against all
        validation checks and raises a "SchemaErrors". Otherwise,
        raise "SchemaError" as soon as one occurs.

      * **inplace** -- if True, applies coercion to the object of
        validation, otherwise creates a copy of the data.

   Returns:
      validated "DataFrame"

   Raises:
      **SchemaError** -- when "DataFrame" violates built-in or custom
      checks.

   Example:
   Calling "schema.validate" returns the dataframe.

   >>> import pandas as pd
   >>> import pandera as pa
   >>>
   >>> df = pd.DataFrame({
   ...     "probability": [0.1, 0.4, 0.52, 0.23, 0.8, 0.76],
   ...     "category": ["dog", "dog", "cat", "duck", "dog", "dog"]
   ... })
   >>>
   >>> schema_withchecks = pa.DataFrameSchema({
   ...     "probability": pa.Column(
   ...         float, pa.Check(lambda s: (s >= 0) & (s <= 1))),
   ...
   ...     # check that the "category" column contains a few discrete
   ...     # values, and the majority of the entries are dogs.
   ...     "category": pa.Column(
   ...         str, [
   ...             pa.Check(lambda s: s.isin(["dog", "cat", "duck"])),
   ...             pa.Check(lambda s: (s == "dog").mean() > 0.5),
   ...         ]),
   ... })
   >>>
   >>> schema_withchecks.validate(df)[["probability", "category"]]
      probability category
   0         0.10      dog
   1         0.40      dog
   2         0.52      cat
   3         0.23     duck
   4         0.80      dog
   5         0.76      dog

   Column1: pandera.typing.pandas.Series[pandas.core.arrays.integer.Int64Dtype] = 'Column1'

   Column2: pandera.typing.pandas.Series[str] = 'Column2'

   class Config

      Bases: "pandera.api.pandas.model_config.BaseConfig"

      name: str | None = 'ExampleSchema'

         name of schema

I would expect to see only this:

class jane_dev.options.utils.doc_testing.ExampleSchema(*args, **kwargs)

   Bases: "pandera.api.pandas.model.DataFrameModel"

   Schema to demonstrate doc injection.

   Column1: pandera.typing.pandas.Series[pandas.core.arrays.integer.Int64Dtype] = 'Column1'

   Column2: pandera.typing.pandas.Series[str] = 'Column2'

The injected part appears to be the same as the documentation on this page:
Pandera Docs
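One configuration to check, offered as a hedged sketch since the issue does not show the project's Sphinx setup: if `conf.py` sets `autoclass_content` to `"both"` or `"init"`, autodoc appends the constructor docstring to the class docstring, and for a class with no `__init__` docstring of its own it falls back to the `__new__` docstring instead. Setting it back to `"class"` documents only the class's own docstring.

```python
# conf.py (Sphinx) -- a sketch, assuming autodoc is enabled and that
# autoclass_content is currently "both" or "init" (not shown in the issue).
extensions = [
    "sphinx.ext.autodoc",
]

# "class" (Sphinx's default) uses only the class's own docstring, so no
# __init__/__new__ docstring is appended to the class documentation.
autoclass_content = "class"
```

To change this for a single class rather than the whole project, the `autoclass` directive also accepts a `:class-doc-from: class` option in recent Sphinx versions (4.1 and later).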

kernelpernel commented on May 30, 2024

Thanks for the quick response @cosmicBboy !

cosmicBboy commented on May 30, 2024

It's probably because of the __new__ method: https://github.com/unionai-oss/pandera/blob/main/pandera/api/dataframe/model.py#L127-L132

Can you try overriding that method and see if it still happens?
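A pandera-free sketch of why a `__new__` docstring can end up in the class docs. It assumes a Sphinx setup with `autoclass_content = "both"` (or `"init"`), where autodoc falls back to the `__new__` docstring when a class has no `__init__` docstring. `autodoc_like_class_doc` is a hypothetical, simplified re-creation of that lookup (not Sphinx's real code), and `ValidatingBase`, `ExampleSchema`, and `PatchedSchema` are hypothetical stand-ins; a real `DataFrameModel` override would presumably keep delegating to `cls.validate(*args, **kwargs)`, as the linked pandera code does.

```python
import inspect


def autodoc_like_class_doc(cls, autoclass_content="class"):
    """Hypothetical, simplified re-creation of how Sphinx autodoc assembles
    a class's documentation (not Sphinx's actual implementation)."""
    class_doc = inspect.getdoc(cls) or ""
    if autoclass_content not in ("both", "init"):
        # "class": only the class's own docstring is used.
        return class_doc

    # Prefer a real __init__ docstring, ignoring object's default one.
    extra = inspect.getdoc(cls.__init__)
    if extra == inspect.getdoc(object.__init__):
        extra = None
    # No useful __init__ docstring: fall back to __new__. This is the step
    # that pulls an inherited __new__ docstring into the class docs.
    if not extra:
        extra = inspect.getdoc(cls.__new__)
        if extra == inspect.getdoc(object.__new__):
            extra = None

    if not extra:
        return class_doc
    if autoclass_content == "init":
        return extra
    return class_doc + "\n\n" + extra if class_doc else extra


class ValidatingBase:
    """Hypothetical stand-in for a base class such as DataFrameModel."""

    def __new__(cls, *args, **kwargs):
        """Validate a dataframe against the schema."""
        # A real DataFrameModel would return cls.validate(...) here instead.
        return super().__new__(cls)


class ExampleSchema(ValidatingBase):
    """Schema to demonstrate doc injection."""


class PatchedSchema(ValidatingBase):
    """Schema that overrides __new__ with its own docstring."""

    def __new__(cls, *args, **kwargs):
        """Calling the schema validates its input."""
        return super().__new__(cls)


for content in ("class", "both"):
    print(f"--- autoclass_content={content!r} ---")
    print(autodoc_like_class_doc(ExampleSchema, content))
```

With `"both"`, the inherited `__new__` text is appended to `ExampleSchema`'s docstring. Overriding `__new__` replaces that text only if the override has its own docstring: an undocumented override may still pick up the base docstring, because `inspect.getdoc` and autodoc (with its default `autodoc_inherit_docstrings = True`) look docstrings up on base classes.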

cosmicBboy commented on May 30, 2024

@kernelpernel any updates on this issue?
