
DataCleaner

The premier Open Source Data Quality solution.

DataCleaner is a Data Quality toolkit that allows you to profile, correct and enrich your data. People use it for ad-hoc analysis, for recurring cleansing, and as a Swiss Army knife in matching and Master Data Management solutions.

Where to go for end-user information?

Please visit the DataCleaner community website https://datacleaner.github.io for downloads, news, documentation etc.

Visit our Gitter chat channel https://gitter.im/datacleaner/community to ask questions or join discussions.

The GitHub markdown pages and issues are intended for developers and technical topics only.

Module structure

The main application modules are:

  • api - The public API of DataCleaner. Mostly interfaces and annotations that you should use to build your own extensions.
  • resources - Static resources in DataCleaner
  • oss-branding - Icons and colors
  • testware - Useful classes for unit testing of DataCleaner and extension code.
  • engine
    • core - The core engine piece which allows execution of jobs and components as per the API.
    • xml-config - Contains utilities for reading and writing job files and configuration files of DataCleaner.
    • env - Different/alternative environments that DataCleaner can run in, for instance Apache Spark or webapp-cluster
  • components
    • ... - many submodules containing built-in as well as additional components/extensions to use with DataCleaner.
    • standard-components - a container project that depends on all components that are normally bundled in DataCleaner community edition.
  • desktop
    • api - The public API for the DataCleaner desktop application.
    • ui - The Swing-based user interface for desktop users
  • monitor
    • api - the API classes and interfaces of DataCleaner monitor
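
The api module is described above as mostly interfaces and annotations used to build extensions. The following is a minimal, self-contained sketch of that general pattern only. Every type in it (`ComponentName`, `RowTransformer`, `TrimTransformer`, `ExtensionSketch`) is hypothetical and invented for illustration; none of them is part of the actual DataCleaner API, so consult the api module itself for the real interfaces and annotations.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical annotation (NOT a DataCleaner type): gives a component a display name.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface ComponentName {
    String value();
}

// Hypothetical extension interface (NOT a DataCleaner type): one row in, one row out.
interface RowTransformer {
    Map<String, String> transform(Map<String, String> row);
}

// Example extension: trims leading and trailing whitespace from every value in a row.
@ComponentName("Trim whitespace")
class TrimTransformer implements RowTransformer {
    @Override
    public Map<String, String> transform(Map<String, String> row) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : row.entrySet()) {
            String value = entry.getValue();
            result.put(entry.getKey(), value == null ? null : value.trim());
        }
        return result;
    }
}

// Toy host code: reads the annotation via reflection, then runs the component.
class ExtensionSketch {
    static String describe(RowTransformer transformer) {
        ComponentName name = transformer.getClass().getAnnotation(ComponentName.class);
        // Fall back to the class name when the annotation is absent.
        return name == null ? transformer.getClass().getSimpleName() : name.value();
    }

    public static void main(String[] args) {
        RowTransformer transformer = new TrimTransformer();
        Map<String, String> row = new LinkedHashMap<>();
        row.put("name", "  Ada Lovelace ");
        row.put("city", "London  ");
        System.out.println(describe(transformer) + ": " + transformer.transform(row));
    }
}
```

The point of the split is that an extension depends only on a small set of interfaces and annotations, while the host discovers metadata through reflection and never needs to know the concrete class.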

Code style and formatting

In the root of the project you can find 'Formatter-[IDE].xml' files which enable you to import the code formatting rules of the project into your IDE.

Continuous Integration

There's a public build of DataCleaner that can be found on Travis CI:

https://travis-ci.org/datacleaner/DataCleaner

License

Licensed under the GNU Lesser General Public License, see http://www.gnu.org/licenses/lgpl.txt
