mimo's People

Contributors

adrienaury, youen

mimo's Issues

[PROPOSAL] Debug specific field

A new flag --debug-field <field-name> should enable additional information on a specific field:

  • values that were not masked (missed)
  • values masked incoherently (lowering the coherent rate)
  • pseudonyms given to multiple values (lowering the identifiable rate)
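
The three kinds of debug information listed above could be collected in a single pass over the (real, masked) pairs of the field. This is a minimal sketch, not MIMO's implementation: the function name debug_field is hypothetical, and it assumes that a value counts as missed when the masked value equals the real one.

```python
from collections import defaultdict

def debug_field(pairs):
    """Collect debug details for one field from (real, masked) value pairs."""
    missed = []                             # values left unchanged by the masking
    pseudonyms_by_real = defaultdict(set)   # real value -> pseudonyms seen for it
    reals_by_pseudonym = defaultdict(set)   # pseudonym -> real values it hides
    for real, masked in pairs:
        if real == masked:                  # assumption: unchanged means "missed"
            missed.append(real)
            continue
        pseudonyms_by_real[real].add(masked)
        reals_by_pseudonym[masked].add(real)
    # Real values that received more than one pseudonym lower the coherent rate.
    incoherent = {r: sorted(p) for r, p in pseudonyms_by_real.items() if len(p) > 1}
    # Pseudonyms shared by several real values lower the identifiable rate.
    collisions = {p: sorted(r) for p, r in reals_by_pseudonym.items() if len(r) > 1}
    return missed, incoherent, collisions

pairs = [("John", "John"), ("Jane", "Rob"), ("Jane", "Bob"), ("Mary", "Rob")]
missed, incoherent, collisions = debug_field(pairs)
print(missed)      # ['John']
print(incoherent)  # {'Jane': ['Bob', 'Rob']}
print(collisions)  # {'Rob': ['Jane', 'Mary']}
```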

[PROPOSAL] Pre-processing step

Problem

It is difficult to compute inter-stream coherence/identifiable rates when columns are built differently (by format or composition).

Example:

Stream A

A/real.jsonl

{"id":1,"name":"Clothilde","surname":"Renard"}
{"id":2,"name":"Andrée","surname":"Mathieu"}

A/masked.jsonl

{"id":1,"name":"John","surname":"Doe"}
{"id":2,"name":"Jade","surname":"Doe"}

Stream B

B/real.jsonl

{"id":1,"idutil":"1 - Clothilde Renard"}
{"id":2,"idutil":"2 - Andrée Mathieu"}

B/masked.jsonl

{"id":1,"idutil":"1 - John Doe"}
{"id":2,"idutil":"2 - John Doe"}

Solution

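MIMO should be able to compute pre-values in the stream, that is, fields derived from the existing ones before the metrics are evaluated, so that both streams expose the same field.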

Configuration for Stream A

A/config.yaml

version: "1"
preprocess:
  - name: "idutil"
    value: "{{.id}} - {{.name}} {{.surname}}"
metrics:
  - name: "idutil"
    coherentWith: ["id"]

Configuration for Stream B

B/config.yaml

version: "1"
metrics:
  - name: "idutil"
    coherentWith: ["id"]

Command line

$ cat A/masked.jsonl | mimo --config A/config.yaml --persist localdb A/real.jsonl
$ cat B/masked.jsonl | mimo --config B/config.yaml --persist localdb B/real.jsonl

computation matrix (stored in localdb)

id     1 - John Doe   2 - Jade Doe   2 - John Doe   pseudonyms
1      x                                            1
2                     x              x              2
ids    1              1              1

coherent rate = 50%
identifiable rate = 100%

report.html

Field    Nil   Empty   Masked   Missed   Masking Rate   Coherent Rate   Identifiable Rate
idutil   0     0       4        0        100%           50%             100%
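
The coherent and identifiable rates of this example can be checked by hand with a short script. This is a sketch under assumptions: the function name rates is hypothetical, the coherent rate is taken as the share of ids that always received the same pseudonym, and the identifiable rate as the share of pseudonyms that belong to a single id. The pairs are read from A/masked.jsonl (after preprocessing) and B/masked.jsonl.

```python
from collections import defaultdict

# (id, pseudonym) pairs observed across both streams.
pairs = [
    (1, "1 - John Doe"),  # stream A, after preprocessing
    (2, "2 - Jade Doe"),  # stream A, after preprocessing
    (1, "1 - John Doe"),  # stream B
    (2, "2 - John Doe"),  # stream B
]

def rates(pairs):
    """Return (coherent rate, identifiable rate) for the given pairs."""
    pseudonyms_by_key = defaultdict(set)
    keys_by_pseudonym = defaultdict(set)
    for key, pseudonym in pairs:
        pseudonyms_by_key[key].add(pseudonym)
        keys_by_pseudonym[pseudonym].add(key)
    # A key is coherent when it was always given the same pseudonym.
    coherent = sum(len(p) == 1 for p in pseudonyms_by_key.values()) / len(pseudonyms_by_key)
    # A pseudonym is identifiable when it maps back to exactly one key.
    identifiable = sum(len(k) == 1 for k in keys_by_pseudonym.values()) / len(keys_by_pseudonym)
    return coherent, identifiable

coherent_rate, identifiable_rate = rates(pairs)
print(f"coherent rate = {coherent_rate:.0%}, identifiable rate = {identifiable_rate:.0%}")
# coherent rate = 50%, identifiable rate = 100%
```

Id 2 received two different pseudonyms across the streams, which is why only one id out of two is coherent.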

[PROPOSAL] add a path for the repository

The report is generated at the root of the workspace; in some contexts I need to change the output path and the file name.
Indeed (because I'm a Maven man), all build information is written to the target directory, and this directory is in my .gitignore.
Example:

> cat masked.jsonl | mimo --output target/ real.jsonl

> cat masked.jsonl | mimo -o target/ real.jsonl 
> cat masked.jsonl | mimo --output myHTML.html real.jsonl

> cat masked.jsonl | mimo -o target/myHTML.html real.jsonl 

A trailing / should indicate that the string is a directory (and the path should be created if it does not exist).
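
The trailing-slash rule could be resolved as in the sketch below. The function name resolve_output is hypothetical, and the default file name report.html is an assumption for the case where only a directory is given.

```python
import os

DEFAULT_NAME = "report.html"  # assumed default file name when only a directory is given

def resolve_output(output):
    """Turn the --output value into a file path, creating directories as needed."""
    if output.endswith("/"):
        # Trailing slash: the whole value is a directory.
        directory, filename = output, DEFAULT_NAME
    else:
        # Otherwise the last path component is the file name.
        directory, filename = os.path.split(output)
    if directory:
        os.makedirs(directory, exist_ok=True)  # create the path if it does not exist
    return os.path.join(directory, filename)

if __name__ == "__main__":
    print(resolve_output("target/"))             # target/report.html
    print(resolve_output("myHTML.html"))         # myHTML.html
    print(resolve_output("target/myHTML.html"))  # target/myHTML.html
```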

[PROPOSAL] Configuration profile

As a user of MIMO, I want to validate

  • a coherence rate between multiple real input fields vs the single pseudonym value
  • a masking rate with or without taking into account empty values
  • constraints on any rate, such as: equal to 100%, greater than or equal to 95%, ...

Solution

Define a validation profile with a YAML configuration file:

mimo.yaml

metrics:
  - name: # name of the field to validate
    exclude: [ nil, "" ] # exclude nil and empty values from the masking rate computation (default: only nil values)
    coherentWith: ["name", "surname"] # list of fields from which the coherent rate is computed (default: the current field)
    constraints:
      maskingRate:
        shouldEqualsTo: 1
      coherentRate:
        shouldBeGreaterThan: 0.95

$ mkfifo real.jsonl
$ lino pull bdd | tee real.jsonl | pimo | mimo --config mimo.yaml real.jsonl | lino push bdd
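
Checking the constraints section against computed rates could look like the sketch below. Only the two constraint names that appear in the proposal are handled, and the function name failed_constraints as well as the sample rates are hypothetical.

```python
import operator

# Constraint names taken from the proposal; others would be added the same way.
CHECKS = {
    "shouldEqualsTo": operator.eq,
    "shouldBeGreaterThan": operator.gt,
}

def failed_constraints(rates, constraints):
    """Return a message for every constraint that the computed rates do not satisfy."""
    failures = []
    for rate_name, rules in constraints.items():
        for check_name, target in rules.items():
            if not CHECKS[check_name](rates[rate_name], target):
                failures.append(f"{rate_name} {check_name} {target} (got {rates[rate_name]})")
    return failures

# Same constraints as in mimo.yaml, written as a Python dict.
constraints = {
    "maskingRate": {"shouldEqualsTo": 1},
    "coherentRate": {"shouldBeGreaterThan": 0.95},
}
computed = {"maskingRate": 1.0, "coherentRate": 0.5}  # sample values for illustration
print(failed_constraints(computed, constraints))
# ['coherentRate shouldBeGreaterThan 0.95 (got 0.5)']
```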

[PROPOSAL] Add alias in config

A column could use an alias so that it can be identified across multiple execution contexts.

version: "1"
metrics:
  - name: "person.phone"
    alias: "phone_number"

[PROPOSAL] Configure coherence source with template

{"batchs":[{"id":1,"accounts":[{"number":"A"},{"number":"B"}]}]}

version: "1"
metrics:
  - name: "batchs.[].accounts.[].number"
    coherentSource: "{{.current}} {{._1.id}}"

Path should be able to start from the current context of execution (inside an array structure for example)

  • .current will reference the current object
  • .root will reference the root object
  • ._1 will reference the current first level object
  • ._2 will reference the current second level object
  • etc...
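
One possible reading of these context names is sketched below. It rests on assumptions that the proposal does not settle: each [] segment of the path opens a new level (_1, _2, ...), and .current is the value reached at the end of the path. The function name contexts is hypothetical.

```python
def contexts(root, path):
    """Yield one context per value reached by path ('.'-separated, '[]' iterates a list)."""
    def walk(node, segments, levels):
        if not segments:
            ctx = {"root": root, "current": node}
            # _1 is the element of the first array entered, _2 of the second, and so on.
            ctx.update({f"_{i}": obj for i, obj in enumerate(levels, start=1)})
            yield ctx
            return
        head, rest = segments[0], segments[1:]
        if head == "[]":
            for item in node:
                yield from walk(item, rest, levels + [item])
        else:
            yield from walk(node[head], rest, levels)
    yield from walk(root, path.split("."), [])

data = {"batchs": [{"id": 1, "accounts": [{"number": "A"}, {"number": "B"}]}]}
# Equivalent of the template "{{.current}} {{._1.id}}" for each account number.
sources = [f"{ctx['current']} {ctx['_1']['id']}"
           for ctx in contexts(data, "batchs.[].accounts.[].number")]
print(sources)  # ['A 1', 'B 1']
```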

[PROPOSAL] Let MIMO launch the pseudonymize process

Current usage

$ mkfifo real.jsonl
$ lino pull bdd | tee real.jsonl | pimo | mimo real.jsonl | lino push bdd

Wanted usage

$ # mimo starts the subprocess (pimo)
$ lino pull bdd | mimo pimo | lino push bdd

or

$ # mimo reads masked data from a file
$ lino pull bdd | tee real.jsonl | pimo | mimo --from-file real.jsonl | lino push bdd
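
The first wanted usage, where mimo starts the masking subprocess itself, could work roughly as in the sketch below. It is deliberately simplified: the whole input is buffered instead of streamed, the masking command is assumed to emit exactly one line per input line in the same order, and a small Python child process stands in for pimo so that the sketch can run anywhere. The function name run_masker is hypothetical.

```python
import subprocess
import sys

def run_masker(command, real_lines):
    """Run command, feed it the real lines, and pair each real line with its masked line."""
    result = subprocess.run(
        command,
        input="\n".join(real_lines) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    masked_lines = result.stdout.splitlines()
    if len(masked_lines) != len(real_lines):
        raise ValueError("the masking command must emit one line per input line")
    return list(zip(real_lines, masked_lines))

# Stand-in for pimo: a child Python process that renames Clothilde to John.
fake_masker = [
    sys.executable, "-c",
    "import sys; sys.stdout.write(sys.stdin.read().replace('Clothilde', 'John'))",
]
real = ['{"id":1,"name":"Clothilde"}']
for real_line, masked_line in run_masker(fake_masker, real):
    print(real_line, "->", masked_line)
```

With both sides of each pair available in one process, the named pipe and the tee of the current usage would no longer be needed.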

[PROPOSAL] Debug information when constraint fail

$ cat masked.jsonl | mimo --config cfg.yaml real.jsonl
5:05PM ERR summmary for column value count-ignored=0 count-masked=8 count-missed=2 count-nil=0 field=value rate-coherence=0.6 rate-identifiable=0.8 rate-masking=0.8
5:05PM ERR masking-rate shouldEqual 1 for column value failed, below are some example of failed values field=value
5:05PM ERR value was not masked field=value value=John
5:05PM ERR value was not masked field=value value=Jane
5:05PM ERR coherence-rate shouldEqual 1 for column value failed, below are some example of failed values field=value
5:05PM ERR value was attributed 3 pseudonyms field=value value=John pseudonyms=[Rob,John,Jane]
5:05PM ERR identifiable-rate shouldEqual 1 for column value failed, below are some example of failed values field=value
5:05PM ERR pseudonym was attributed to 2 values field=value pseudonym=Rob values=[John,Jane]
5:05PM FTL end MIMO error="report contains unsatisfied constraint(s)"