Comments (11)
Care to provide additional detail and a proposal? E.g. what can be removed and what part of the codebase would need to be refactored?
from pandera.
I apologize for not providing more detail, but the main problem is just that the dependency graph is huge. I'm raising this issue out of concern that this is a problem. It is in our case, and it may be for others who would like to use it as a dependency.
I can't start going into how this can best be pruned, but I'm sure there is a way. OS libraries will typically have up to tens of packages in their dependency graph, but this package has thousands. Surely, there's something which can be done about this?
I understand if this is an annoying request to make, but I'm only here because I really like Pandera and wish I could use it at work.
from pandera.
I appreciate what the developers have created. I have exactly the same situation as @ulfaslakprecis in my org. Having all dev dependencies in the requirements, means that if we want to productionize it, we will need to create enormous containers just to perform a simple validation. Not all of the functionality is needed for the core, and for other teams that's a clear no-go. There are lots of dependency management tools (like poetry or pip-tools) that, among other features, target this issue. If you want to go for a native pip solution, that is possible as well:
https://peps.python.org/pep-0508/
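As a rough illustration of the "native pip" route mentioned above, the sketch below shows how optional dependencies could be grouped into extras using the standard setuptools keywords `install_requires` and `extras_require`. The package name, the extra names (`models`, `typing-checks`), and the requirement lists are hypothetical and are not taken from pandera's actual setup.py.

```python
# Hypothetical packaging sketch: the package name, extra names, and
# requirement lists are illustrative and do not reflect pandera's setup.py.

CORE_REQUIREMENTS = ["numpy", "pandas"]

OPTIONAL_REQUIREMENTS = {
    "models": ["pydantic"],
    "typing-checks": ["typeguard"],
}


def build_setup_kwargs(core, optional):
    """Build the dependency-related keyword arguments for setuptools.setup()."""
    extras = {name: sorted(reqs) for name, reqs in optional.items()}
    # Convenience extra that pulls in every optional dependency at once.
    extras["all"] = sorted({req for reqs in optional.values() for req in reqs})
    return {"install_requires": list(core), "extras_require": extras}


SETUP_KWARGS = build_setup_kwargs(CORE_REQUIREMENTS, OPTIONAL_REQUIREMENTS)

# In a real setup.py this would be passed on as:
#   setuptools.setup(name="example-package", **SETUP_KWARGS)
print(SETUP_KWARGS)
```

With a layout like this, a plain `pip install example-package` would pull in only the core list, while `pip install "example-package[models]"` would add the optional group.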
from pandera.
@GOGKI I started building pandabear recently. It has a similar/near-identical API to pandera but it ONLY does pandas dataframe/series validation. Still very beta, but input is much appreciated.
from pandera.
Happy to support work on making pandera more light-weight. @ulfaslakprecis any appetite for contributing to pandera as opposed to building + maintaining a brand new project?
from pandera.
Also wanted to better understand the issue here. The items listed in the dependency graph are not necessarily what you get when you pip install pandera. The dependencies listed there are an exhaustive list based on all the **/requirements* files and GitHub Actions: these are not installed with a plain pip install pandera installation.
Without installing all of the extras, the packages installed are listed here:
https://github.com/unionai-oss/pandera/blob/main/setup.py#L47-L57
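One way to check this locally, rather than relying on the GitHub-reported graph, is to read the requirements declared in an installed distribution's metadata. The sketch below uses the standard-library `importlib.metadata.requires` and separates requirements gated behind an `extra == ...` marker from those installed unconditionally. The helper names are hypothetical, and it only lists direct requirements, not transitive ones.

```python
import re
from importlib import metadata

# Matches the `extra == "..."` environment marker used for optional extras.
EXTRA_MARKER = re.compile(r"\bextra\s*==")


def split_requirements(requirements):
    """Split requirement strings into (core, extras_only) lists."""
    core, extras_only = [], []
    for req in requirements:
        # Everything after the first ';' is the environment marker.
        _, _, marker = req.partition(";")
        if EXTRA_MARKER.search(marker):
            extras_only.append(req)
        else:
            core.append(req)
    return core, extras_only


def installed_requirements(dist_name):
    """Return declared requirements, or [] if the package is not installed."""
    try:
        return metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return []


core, extras_only = split_requirements(installed_requirements("pandera"))
print("installed by default:", core)
print("only with extras:", extras_only)
```

The first list is what a bare installation declares; the second only comes into play when an extra is requested explicitly.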
That said, I do think we could get rid of multimethod, wrapt, and packaging off the bat. pydantic and typeguard can potentially be cordoned off into their own extras.
"Having all dev dependencies in the requirements, means that if we want to productionize it, we will need to create enormous containers just to perform a simple validation"
@GOGKI just so I understand this, do you only need to install core pandera when you need to productionize your code? What unexpected/unwanted dependencies do you get?
from pandera.
"@GOGKI just so I understand this, do you only need to install core pandera when you need to productionize your code? What unexpected/unwanted dependencies do you get?"
Same question to you, @ulfaslakprecis. What dependencies do you consider too heavyweight in your pandera installation (not the dependency graph reported by GitHub, but the ones that are actually installed when you pip install pandera)?
from pandera.
@cosmicBboy in our case we are having issues with typeguard. Pandera uses typeguard>=3.0.2 and jaxtyping uses typeguard==2.13.3 which makes them incompatible. So having typeguard as an optional dependency would possibly fix the problem.
Allowing typeguard 2 would also fix our problem.
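To make the conflict concrete, here is a minimal sketch that compares the two constraints quoted above (typeguard>=3.0.2 and typeguard==2.13.3). It uses a simplified, hand-rolled comparison that only handles plain dotted numeric versions; it is not a full PEP 440 implementation, and the function names are hypothetical.

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, ...]:
    """Parse a plain dotted version such as '2.13.3' into a tuple of ints.

    Simplification: pre-releases, epochs, and local versions are not handled.
    """
    return tuple(int(part) for part in text.split("."))


def pin_satisfies_minimum(pinned: str, minimum: str) -> bool:
    """Return True if an exact pin (==pinned) also satisfies >=minimum."""
    return parse_version(pinned) >= parse_version(minimum)


# Constraints as quoted in the comment above.
jaxtyping_pin = "2.13.3"    # typeguard==2.13.3
pandera_minimum = "3.0.2"   # typeguard>=3.0.2

conflict = not pin_satisfies_minimum(jaxtyping_pin, pandera_minimum)
print("constraints conflict:", conflict)
```

Because the exact pin falls below the minimum, no single typeguard version can satisfy both packages in one environment, which is why making typeguard optional or widening the allowed range would resolve it.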
from pandera.
@ulfaslakprecis any comments on #1365 (comment)?
If not, I'm going to close this issue in the next few days.
Created
To capture slimming down the dependencies of a bare pandera installation, but the initial claim in this issue
"We want to use Pandera in our organization's codebase, but some evaluation deemed it unusable at the moment, due to the ENORMOUS (213 pages long) dependency graph."
is actually a non-issue, since the GitHub-reported dependency graph naively reports dependencies in requirements files and not the dependencies actually entailed by pip install pandera.
from pandera.
Closing now
from pandera.