diffit's Introduction

Diff-it: Data Differ

Overview

diffit reports differences between two data sets that share a similar schema.

Refer to Diffit's documentation for detailed instructions.

Prerequisites

Getting Started

Makester is used as the Integrated Developer Platform.

(macOS Users only) Upgrading GNU Make

Follow these notes to get GNU make.

Creating the Local Environment

Get the code and change into the top level git project directory:

git clone git@github.com:loum/diffit.git && cd diffit

NOTE: Run all commands from the top-level directory of the git repository.

For first-time setup, get the Makester project:

git submodule update --init

Initialise the environment:

make init-dev

Local Environment Maintenance

Keep the Makester project up to date with:

git submodule update --remote --merge

Help

There should be a make target to get most things done. Check the help for more information:

make help

Running the Test Harness

We use pytest. To run the tests:

make tests

FAQs

Q. Why do I get WARNING: An illegal reflective access operation has occurred?

A. This seems to be related to the JVM version being used. Running under Java 8 avoids the warning. To check the available Java versions on your Mac, try /usr/libexec/java_home -V. Then:

export JAVA_HOME=$(/usr/libexec/java_home -v <java_version>)

diffit's People

Contributors

loum

diffit's Issues

mypy errors with missing-imports

Is your feature request related to a problem? Please describe.

When running mypy across a project that imports diffit, the following error is reported:

error: Skipping analyzing "diffit": module is installed, but missing library stubs or py.typed marker  [import]
note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports

As per the mypy documentation:

If you are getting a "Skipping analyzing X: module is installed, but missing library stubs or py.typed marker", error, this means mypy was able to find the module you were importing, but no corresponding type hints.

Describe the solution you would like.

Add type hints to diffit and a corresponding py.typed.
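
As a rough sketch of the packaging half of this request: under PEP 561, a package advertises its inline type hints with an empty py.typed marker file in the package directory. The helper name ensure_py_typed is hypothetical, and src/diffit is assumed to be the package directory. The marker also has to be shipped as package data, otherwise installed copies will still trigger the mypy error.

```python
from pathlib import Path


def ensure_py_typed(package_dir: Path) -> Path:
    """Create an empty PEP 561 ``py.typed`` marker in ``package_dir`` if missing."""
    marker = package_dir / "py.typed"
    # touch() leaves an existing marker untouched apart from its timestamp.
    marker.touch(exist_ok=True)
    return marker


if __name__ == "__main__":
    # Assumed package location; adjust to the real project layout.
    package = Path("src/diffit")
    if package.is_dir():
        print(ensure_py_typed(package))
```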

Describe alternatives you have considered.

Set ignore_missing_imports in the project.

Error installing diffit using git-style dependency in setup.cfg

Describe the bug.

Installing diffit as a package dependency declared in my project's setup.cfg fails with the following error:

/config/setupcfg.py", line 598, in _parse_version
          raise DistutilsOptionError(tmpl.format(**locals()))
      distutils.errors.DistutilsOptionError: Version loaded from file: src/diffit/VERSION does not comply with PEP 440:
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
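
Nothing follows "does not comply with PEP 440:" in the message, which may mean the version string read from src/diffit/VERSION was empty. That is an inference from the output above, not a confirmed diagnosis. The sketch below checks a VERSION file against the canonical-form pattern given in Appendix B of PEP 440. The function names are hypothetical, and the check is stricter than setuptools, which also accepts non-canonical spellings that normalise to a valid version.

```python
import re
from pathlib import Path

# Canonical-form pattern as published in PEP 440, Appendix B.
_CANONICAL = re.compile(
    r"^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*"
    r"((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?"
    r"(\.dev(0|[1-9][0-9]*))?$"
)


def is_canonical_version(version: str) -> bool:
    """Return True if ``version`` is in PEP 440 canonical form."""
    return _CANONICAL.match(version) is not None


def check_version_file(path: Path) -> bool:
    """Read a VERSION file and check its stripped content."""
    return is_canonical_version(path.read_text().strip())


if __name__ == "__main__":
    version_file = Path("src/diffit/VERSION")
    if version_file.is_file():
        print(check_version_file(version_file))
```

An empty or whitespace-only file fails this check, which would be consistent with the empty value in the error message.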

Steps to reproduce the behavior.

  1. Add diffit to setup.cfg using git style format:

    [options.extras_require]
    dev =
        diffit @ git+https://github.com/loum/[email protected]
    
  2. Install:

    pip install -e .[dev]

Expected behavior.

diffit installs into my project.

Apache Spark DataFrame generator for testing and analysis

Is your feature request related to a problem? Please describe.

Generating Spark DataFrames can be a drag. Having a tool that could create one dynamically for testing and analysis purposes would be handy.

Describe the solution you would like.

An extension of the diffit tool with a dataframe create subsystem.

Describe alternatives you have considered.

Creating Spark DataFrames manually.
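
To make the request concrete, here is a pure-Python sketch of the data-generation half. The names generate_rows and SPEC are hypothetical and not part of diffit. It produces column names plus a list of tuples, the shape that SparkSession.createDataFrame accepts as data with a list of column names as the schema. The Spark call stays in a comment so the sketch runs without pyspark.

```python
import random
from typing import Callable, Dict, List, Tuple

# Each column maps to a function of (rng, row_index) that yields one value.
ColumnSpec = Dict[str, Callable[[random.Random, int], object]]


def generate_rows(
    spec: ColumnSpec, count: int, seed: int = 0
) -> Tuple[List[str], List[tuple]]:
    """Build ``count`` rows from ``spec``; a fixed seed keeps runs repeatable."""
    rng = random.Random(seed)
    columns = list(spec)
    rows = [tuple(spec[name](rng, i) for name in columns) for i in range(count)]
    return columns, rows


# Hypothetical three-column layout for illustration.
SPEC: ColumnSpec = {
    "id": lambda rng, i: i,
    "name": lambda rng, i: f"row-{i}",
    "score": lambda rng, i: rng.randint(0, 100),
}

if __name__ == "__main__":
    columns, rows = generate_rows(SPEC, count=3)
    print(columns)
    print(rows)
    # With a SparkSession named `spark` available, the rows could be loaded as:
    # df = spark.createDataFrame(rows, schema=columns)
```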

Container image build fails with missing diff.py error

Describe the bug.

The container image build process does not run to completion.

Steps to reproduce the behavior.

  1. Run make image-build
  2. See error:
...
 => ERROR [ 9/10] COPY src/bin/diff.py /scripts/diff.py                                                                      0.0s
------
 > [ 9/10] COPY src/bin/diff.py /scripts/diff.py:
------
failed to compute cache key: "/src/bin/diff.py" not found: not found
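
A quick way to catch this class of failure before building is to check that every COPY source exists in the build context. The sketch below is an assumption-laden helper and not part of diffit or Makester. It only handles the plain "COPY src... dest" form, skips COPY --from=... lines, and does not expand wildcards or the JSON array form.

```python
import shlex
from pathlib import Path
from typing import List


def missing_copy_sources(dockerfile_text: str, context: Path) -> List[str]:
    """List COPY sources that do not exist under the build context."""
    missing = []
    for line in dockerfile_text.splitlines():
        stripped = line.strip()
        if not stripped.upper().startswith("COPY "):
            continue
        tokens = shlex.split(stripped)
        if any(token.startswith("--from=") for token in tokens[1:]):
            continue  # copied from another build stage, not the context
        args = [token for token in tokens[1:] if not token.startswith("--")]
        for source in args[:-1]:  # the last argument is the destination
            # Sources are resolved relative to the context root.
            if not (context / source.lstrip("/")).exists():
                missing.append(source)
    return missing


if __name__ == "__main__":
    dockerfile = Path("Dockerfile")
    if dockerfile.is_file():
        print(missing_copy_sources(dockerfile.read_text(), Path(".")))
```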

Expected behavior.

The container image should be created.

Select columns to pass through differential engine

Is your feature request related to a problem? Please describe.

diffit currently supports dropping columns from a row level check. However, there is no support for selecting the columns to pass to the differential engine.

Describe the solution you would like.

Analogous to the drop switch, provide the capability to select the columns that are passed to the differential engine.

Describe alternatives you have considered.

In the case of running the differential engine on a subset of columns, we have to resort to multiple drop assignments. This is not practical for large data sets.
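
The intended semantics can be sketched in plain Python. The function name columns_to_compare and its drop and select parameters are hypothetical and do not describe diffit's actual interface. Treating the two options as mutually exclusive and rejecting unknown column names are design assumptions.

```python
from typing import Iterable, List, Optional


def columns_to_compare(
    columns: List[str],
    drop: Optional[Iterable[str]] = None,
    select: Optional[Iterable[str]] = None,
) -> List[str]:
    """Resolve the columns to compare, preserving the schema's column order."""
    if drop is not None and select is not None:
        raise ValueError("use either drop or select, not both")
    requested = set(select if select is not None else drop or [])
    unknown = requested - set(columns)
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    if select is not None:
        return [name for name in columns if name in requested]
    return [name for name in columns if name not in requested]


if __name__ == "__main__":
    schema = ["id", "name", "score", "updated_at"]
    # Selecting two columns replaces dropping every other column by hand.
    print(columns_to_compare(schema, select=["id", "score"]))
    print(columns_to_compare(schema, drop=["name", "updated_at"]))
```

For a wide table, one select list replaces a drop list naming every other column, which is the impracticality described above.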
