

chlorine-finder

A Java library to detect sensitive data.

Chlorine-finder is an open source Java library for detecting sensitive elements in text. It can detect several types of credit card numbers, as well as SSNs, phone numbers, email addresses, IP addresses, street addresses, and more.

### To download the source code

    git clone https://github.com/dataApps/chlorine-finder.git

### To build chlorine-finder

    mvn install

### To use chlorine-finder

  • Add a dependency on the chlorine-finder library.

Maven dependency definition:

   <dependency>
      <groupId>io.dataapps.chlorine</groupId>
      <artifactId>chlorine-finder</artifactId>
      <version>1.1.5</version>
   </dependency>
  • Add the following lines of code:
 import java.util.List;
 import io.dataapps.chlorine.finder.FinderEngine;
 FinderEngine engine = new FinderEngine();
 List<String> matchedValues = engine.find("Here is my id : [email protected] and my machine info:  124.234.223.12 , ok ?");

matchedValues will contain the email address [email protected] and the IP address 124.234.223.12. If the text contains multiple sensitive elements, all of them are returned.

Internally, chlorine-finder uses a set of Finders to perform detection. A Finder can be specified either as a regular expression or as a Java class.
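To make the two kinds of Finder concrete, here is a minimal, self-contained sketch of an engine that runs several Finders over a string and collects every match. All names (`SimpleFinder`, `RegexFinder`, `LuhnFinder`, `FinderEngineSketch`) are hypothetical and are not chlorine-finder's actual classes, and the patterns are deliberately simplified. `LuhnFinder` shows why a Finder might be written as a Java class: a regular expression proposes candidate digit runs, and Java code then rejects those that fail the Luhn checksum.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: illustrates regex-based and class-based finders,
// not chlorine-finder's real API.
public class FinderEngineSketch {

    // A finder returns every sensitive value it detects in the input text.
    interface SimpleFinder {
        List<String> find(String text);
    }

    // A finder defined purely by a regular expression.
    static class RegexFinder implements SimpleFinder {
        private final Pattern pattern;

        RegexFinder(String regex) {
            this.pattern = Pattern.compile(regex);
        }

        @Override
        public List<String> find(String text) {
            List<String> matches = new ArrayList<>();
            Matcher m = pattern.matcher(text);
            while (m.find()) {
                matches.add(m.group());
            }
            return matches;
        }
    }

    // A finder implemented as a class: a regex proposes 13-16 digit runs,
    // then the Luhn checksum discards runs that cannot be card numbers.
    static class LuhnFinder implements SimpleFinder {
        private final RegexFinder candidates = new RegexFinder("\\b\\d{13,16}\\b");

        @Override
        public List<String> find(String text) {
            List<String> matches = new ArrayList<>();
            for (String candidate : candidates.find(text)) {
                if (passesLuhn(candidate)) {
                    matches.add(candidate);
                }
            }
            return matches;
        }

        // Standard Luhn check: double every second digit from the right.
        static boolean passesLuhn(String digits) {
            int sum = 0;
            boolean doubleIt = false;
            for (int i = digits.length() - 1; i >= 0; i--) {
                int d = digits.charAt(i) - '0';
                if (doubleIt) {
                    d *= 2;
                    if (d > 9) d -= 9;
                }
                sum += d;
                doubleIt = !doubleIt;
            }
            return sum % 10 == 0;
        }
    }

    private final List<SimpleFinder> finders = new ArrayList<>();

    FinderEngineSketch() {
        // Simplified patterns for illustration only.
        finders.add(new RegexFinder("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+"));  // email
        finders.add(new RegexFinder("\\b(?:\\d{1,3}\\.){3}\\d{1,3}\\b")); // IPv4
        finders.add(new LuhnFinder());                                  // card number
    }

    // Runs every finder and returns all matches, grouped in finder order.
    List<String> find(String text) {
        List<String> all = new ArrayList<>();
        for (SimpleFinder f : finders) {
            all.addAll(f.find(text));
        }
        return all;
    }

    public static void main(String[] args) {
        FinderEngineSketch engine = new FinderEngineSketch();
        System.out.println(engine.find(
            "Mail jane@example.com from 124.234.223.12, card 4111111111111111, ref 1234567890123"));
    }
}
```

Here `4111111111111111` passes the checksum and is reported, while `1234567890123` has the right length but fails it and is dropped.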

### Download the library jar

The latest chlorine-finder library jar can be downloaded here.

### Further documentation

See the chlorine-finder wiki.

### Related projects

  • Online Redactor - masks sensitive elements in the input text. Users can customize the rules.
  • Chlorine-hadoop - detects and masks sensitive elements in Hadoop clusters.
  • Chlorine-hive - detects and masks sensitive elements in Hive tables.

### Javadocs

The Javadocs for chlorine-finder are available here.

We welcome all contributions. You can contribute features, enhancements, bug fixes or new Finders.

## Want to contribute features, bug fixes, or enhancements?

  • Fork it.
  • Create your feature branch (git checkout -b my-new-feature).
  • Take a look at the issues.
  • Commit your changes (git commit -am 'Add some feature').
  • Push to the branch (git push origin my-new-feature).
  • Create a new pull request.


## Issues

### Strings containing CCNs not found due to start block

Hi,

While using this function, I've noticed that it misses certain strings even though they contain credit card numbers (CCNs) that the regular expressions would otherwise match. This is caused by the start block: for example, 4444 5555 6666 7777 is picked up, but .4444 5555 6666 7777 and -4444 5555 6666 7777 are not.

Why is the start block included if it has this effect?
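The behavior in this issue can be reproduced with any pattern whose start block is a negative lookbehind. The sketch below uses a hypothetical stand-in pattern, not chlorine-finder's actual credit-card expression: `(?<![\w.-])` blocks a match when the digits are directly preceded by a word character, a period, or a hyphen. The last input hints at one plausible reason for such a block (an assumption, since the issue does not say): without it, the pattern can match the tail of a longer digit run.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of a "start block"; not chlorine-finder's real pattern.
public class StartBlockDemo {

    // The lookbehind (?<![\w.-]) is the start block: the card number may not be
    // directly preceded by a letter, digit, underscore, period, or hyphen.
    static final Pattern WITH_START_BLOCK =
        Pattern.compile("(?<![\\w.-])\\d{4} \\d{4} \\d{4} \\d{4}");

    // The same pattern without the start block.
    static final Pattern WITHOUT_START_BLOCK =
        Pattern.compile("\\d{4} \\d{4} \\d{4} \\d{4}");

    // True if the pattern matches anywhere in the text.
    static boolean found(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "4444 5555 6666 7777",
            ".4444 5555 6666 7777",
            "-4444 5555 6666 7777",
            "14444 5555 6666 7777", // without the block, the tail "4444 ..." matches
        };
        for (String s : inputs) {
            System.out.printf("%-22s with=%b without=%b%n",
                s, found(WITH_START_BLOCK, s), found(WITHOUT_START_BLOCK, s));
        }
    }
}
```

A start block therefore trades recall (missing `.4444 ...` and `-4444 ...`) for precision (not reporting a fragment of a longer number); a narrower lookbehind such as `(?<!\d)` would keep the second benefit while accepting the punctuation cases.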

### Create a GroupFinder

CompositeFinder allows one to combine multiple Finders into a single Finder. If any of the component Finders finds a match, it is reported; if multiple component Finders find matches, they are all reported.

To detect quasi-identifiers, we need a GroupFinder: one that reports a match only if every component Finder detects some element in the text, e.g. (name, zip code, date of birth).
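The difference between the two semantics can be sketched as "any of" versus "all of". In the sketch below, finders are modeled as plain `Function<String, List<String>>` values, and `GroupFinderSketch`, `regex`, `anyOf`, and `allOf` are hypothetical names, not chlorine-finder classes. The quasi-identifier uses only a simplified US ZIP code and an ISO-format date of birth; a name finder is omitted because it cannot be expressed as a short regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the proposed GroupFinder; not an existing chlorine-finder class.
public class GroupFinderSketch {

    // Builds a component finder from a regular expression.
    static Function<String, List<String>> regex(String pattern) {
        Pattern p = Pattern.compile(pattern);
        return text -> {
            List<String> out = new ArrayList<>();
            Matcher m = p.matcher(text);
            while (m.find()) out.add(m.group());
            return out;
        };
    }

    // CompositeFinder-style semantics: report every match from every component.
    static List<String> anyOf(List<Function<String, List<String>>> finders, String text) {
        List<String> out = new ArrayList<>();
        for (Function<String, List<String>> f : finders) out.addAll(f.apply(text));
        return out;
    }

    // GroupFinder semantics: if any component finds nothing, report nothing;
    // otherwise report all component matches.
    static List<String> allOf(List<Function<String, List<String>>> finders, String text) {
        List<String> out = new ArrayList<>();
        for (Function<String, List<String>> f : finders) {
            List<String> found = f.apply(text);
            if (found.isEmpty()) return List.of();
            out.addAll(found);
        }
        return out;
    }

    public static void main(String[] args) {
        // Simplified patterns for a (zip code, date of birth) quasi-identifier.
        List<Function<String, List<String>>> quasiId = List.of(
            regex("\\b\\d{5}\\b"),                 // US ZIP code
            regex("\\b\\d{4}-\\d{2}-\\d{2}\\b"));  // ISO date of birth
        System.out.println(allOf(quasiId, "zip 94105, born 1980-04-12"));
        System.out.println(allOf(quasiId, "zip 94105 only"));
        System.out.println(anyOf(quasiId, "zip 94105 only"));
    }
}
```

On "zip 94105 only", `anyOf` still reports the ZIP code, while `allOf` reports nothing because no date of birth is present, which is the behavior the issue asks for.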

### Enable FinderEngine to scan a file

Currently, FinderEngine only accepts a String to scan with its Finders.
Add methods to FinderEngine that read their input from an InputStream, a File, etc.
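One simple way to build stream support on top of a string-based `find` is to read the stream line by line and scan each line. The sketch below is hypothetical: `StreamScanSketch`, `findInStream`, and `findIps` are illustrative names, and `findIps` is a simplified IPv4 regex standing in for a real Finder. If `FinderEngine.find` takes a `String` and returns a `List<String>` as shown in the usage example above, `engine::find` could be passed instead.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: scan an InputStream line by line with a string-based finder.
public class StreamScanSketch {

    // Feeds each line to `find`, collects all matches, and closes the stream when done.
    static List<String> findInStream(InputStream in, Function<String, List<String>> find)
            throws IOException {
        List<String> all = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                all.addAll(find.apply(line));
            }
        }
        return all;
    }

    // Stand-in for a real finder: a simplified IPv4 regex.
    static List<String> findIps(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("\\b(?:\\d{1,3}\\.){3}\\d{1,3}\\b").matcher(text);
        while (m.find()) out.add(m.group());
        return out;
    }

    public static void main(String[] args) throws IOException {
        String data = "host a: 10.0.0.1\nno address here\nhost b: 124.234.223.12\n";
        InputStream in = new ByteArrayInputStream(data.getBytes(StandardCharsets.UTF_8));
        System.out.println(findInStream(in, StreamScanSketch::findIps));
    }
}
```

For a file, the same method can be given `java.nio.file.Files.newInputStream(path)`. Scanning line by line keeps memory use bounded for large files, at the cost of missing any sensitive value that spans a line break.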

### Add a feature to define compliance suites

A compliance suite is a set of Finders that detect the presence of different sensitive elements.
A compliance suite is commonly associated with a specific compliance requirement, such as HIPAA or PCI.

Add a feature to create a compliance suite as a set of Finders. One way to do this is to add an optional suites tag to the Finders. The tag can be a comma-separated list, so that each Finder can be associated with multiple suites.
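The proposed suites tag could work roughly as follows: each Finder definition carries a comma-separated suites string, and a suite is resolved by selecting every Finder whose tag contains the suite name. `ComplianceSuiteSketch`, `FinderDef`, and `suite` are hypothetical names, and the Finder-to-suite assignments in the example are illustrative only, not a statement about what HIPAA or PCI actually require.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of the proposed "suites" tag; not chlorine-finder's configuration format.
public class ComplianceSuiteSketch {

    static final class FinderDef {
        final String name;
        final Set<String> suites;

        FinderDef(String name, String suitesTag) {
            this.name = name;
            // Split "HIPAA, PCI" into {"HIPAA", "PCI"}; an empty tag yields no suites.
            this.suites = Arrays.stream(suitesTag.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toCollection(LinkedHashSet::new));
        }
    }

    // Returns the names of all finders that belong to the given suite, in definition order.
    static List<String> suite(List<FinderDef> finders, String suiteName) {
        List<String> out = new ArrayList<>();
        for (FinderDef f : finders) {
            if (f.suites.contains(suiteName)) out.add(f.name);
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative assignments only.
        List<FinderDef> finders = List.of(
            new FinderDef("CreditCard", "PCI"),
            new FinderDef("SSN", "HIPAA"),
            new FinderDef("Name", "HIPAA, PCI"),
            new FinderDef("Custom", ""));
        System.out.println("PCI:   " + suite(finders, "PCI"));
        System.out.println("HIPAA: " + suite(finders, "HIPAA"));
    }
}
```

Because the tag is optional, a Finder with an empty tag (here `Custom`) still runs in the default engine but belongs to no suite.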

### Improve and simplify the configuration for Maskers

Currently, MaskFactory and Redactor offer only limited ability to configure a new Masker. This should be improved.
Maskers need configuration capabilities similar to those of FinderEngine, for consistency and configurability.
