Comments (2)
Hey @pangjac, first off, thank you for supporting the package! `sample_mismatch` doesn't exist for the `SparkCompare` class in that version of datacompy. We have a branch awaiting review where we are shifting to pandas on PySpark, if you are OK using that instead. v0.8.4 is fairly old, so I'd highly recommend upgrading if you are able to. That old version of `SparkCompare` doesn't inherit from the base class, as it was built separately from it. This has been bugging me for a while, hence the new branch awaiting review and the deprecation of the old Spark class.
If you look at the new implementation (which aligns better with the pandas, Polars, and Fugue logic), we will have that function natively for Spark.
Alternatively, I wonder if the internal dataframe `_all_rows_mismatched` would give you what you need. You can filter on the column you are interested in, since it's just a Spark DataFrame.
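A rough sketch of that workaround, in case it helps. The helper name `sample_mismatched_rows` and its parameters are hypothetical, not part of datacompy. It only assumes that `_all_rows_mismatched` holds a Spark DataFrame once the comparison has run, and it relies on the standard `DataFrame.filter` (which accepts a SQL expression string) and `DataFrame.limit`. Since the attribute is internal, it may change or disappear between versions.

```python
def sample_mismatched_rows(compare, condition, n=10):
    """Return up to `n` mismatched rows that satisfy `condition`.

    `compare` is expected to be a SparkCompare-like object exposing the
    internal `_all_rows_mismatched` Spark DataFrame mentioned above.
    `condition` is a SQL expression string passed straight to
    `DataFrame.filter`; the column names in it are up to the caller.
    """
    mismatched = getattr(compare, "_all_rows_mismatched", None)
    if mismatched is None:
        # Assumption: the attribute is only populated after the comparison
        # has actually been computed (for example after generating a report).
        raise ValueError(
            "_all_rows_mismatched is not populated; run the comparison first"
        )
    # Plain Spark DataFrame operations: keep matching rows, cap the sample size.
    return mismatched.filter(condition).limit(n)
```

The exact column names in that dataframe are not confirmed here, so it is worth inspecting `compare._all_rows_mismatched.columns` before writing the filter condition.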
@pangjac Just wanted to follow up and see if this was solved for you? Thanks!