johnsnowlabs / langtest
Deliver safe & effective language models
Home Page: http://langtest.org/
License: Apache License 2.0
Currently we are passing perturbations as individual parameters:

```python
augment_robustness(conll_path='data.conll',
                   uppercase={'PROBLEM': 0.05, 'TEST': 0.05, 'TREATMENT': 0.05},
                   lowercase={'PROBLEM': 0.05, 'TEST': 0.05, 'TREATMENT': 0.05})
```
We should change this to a new parameter that accepts a perturbation map that looks like this:

```python
detailed_proportions = {
    "uppercase": {'PROBLEM': 0.05, 'TEST': 0.05, 'TREATMENT': 0.05},
    "lowercase": {'PROBLEM': 0.05, 'TEST': 0.05, 'TREATMENT': 0.05},
    "title": {'PROBLEM': 0.05, 'TEST': 0.05, 'TREATMENT': 0.05},
    "add_punctuation": {'PROBLEM': 0.05, 'TEST': 0.05, 'TREATMENT': 0.05},
}

augment_robustness(conll_path='data.conll',
                   entity_perturbation_map=detailed_proportions)
```
We should also accept a simpler version of this in another parameter:

```python
proportions = {
    "uppercase": 0.05,
    "lowercase": 0.05,
}

augment_robustness(conll_path='data.conll',
                   perturbation_map=proportions)
```
The new NERDataHandler class will include:
It will go under nlptest/data_handlers/
The first version of this class will support robustness testing/fixing in Spark NLP, but adaptability to other tasks and libraries should be kept in mind when developing it.
Create a NERDataHandler class that establishes a common CoNLL data structure for all libraries to process labeled NER data. This includes:
This issue will be used to track the sub-tasks required to launch and maintain this class.
Tasks for release-0.0.1:
Task led by @JulesBelveze
Token filtering was created to delete extra added tokens so that token lengths match when comparing predictions from NER models. There are other ways to do this, such as adapting the metrics to ignore token-length differences.
See slides for more details on possible approaches.
.report() should print this:

| test factory | test type | pass count | fail count | pass rate | minimum pass rate | pass |
|---|---|---|---|---|---|---|
| Perturbation | uppercase | 34 | 16 | 68% | 75% | False |
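The pass rate and pass flag follow directly from the counts. A minimal sketch of that derivation (the helper name is hypothetical, not langtest's implementation):

```python
# Hypothetical helper: derive a report row's pass rate and pass flag.
def report_row(pass_count: int, fail_count: int, minimum_pass_rate: float):
    """Return (pass_rate, passed) for one test row."""
    pass_rate = pass_count / (pass_count + fail_count)
    return pass_rate, pass_rate >= minimum_pass_rate
```

For the row above, `report_row(34, 16, 0.75)` yields a 68% pass rate, below the 75% minimum, so the test fails.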
```
noise_proportion = 0.5

# Step 1: apply the perturbation to all samples.
1000 sentences -> apply contraction

# Step 2: sample as many successfully augmented sentences as possible to
# reach noise_proportion; we don't mind if some are already augmented.
50 samples successfully contracted (augmented) + 100 already contracted
50 augmented + 50 (random.sample(n=50) from the remaining 1000 - 50)
    -> we don't mind if sampled ones are already augmented
```
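The two-step procedure above can be sketched as follows; the function and parameter names are illustrative, not the package's actual API:

```python
# Sketch: build a noise set of size noise_proportion * len(sentences),
# preferring samples the perturbation actually changed, then topping up
# with random untouched originals.
import random


def build_noise_set(sentences, perturb, noise_proportion=0.5, seed=7):
    rng = random.Random(seed)
    augmented = [perturb(s) for s in sentences]
    # Step 1: keep only the samples the perturbation actually changed.
    changed_idx = {i for i, (s, a) in enumerate(zip(sentences, augmented)) if s != a}
    changed = [augmented[i] for i in sorted(changed_idx)]
    target = int(len(sentences) * noise_proportion)
    if len(changed) >= target:
        return rng.sample(changed, target)
    # Step 2: top up with random untouched originals to reach the target;
    # we don't mind if some of those already look augmented.
    pool = [s for i, s in enumerate(sentences) if i not in changed_idx]
    return changed + rng.sample(pool, target - len(changed))
```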
```
noise_proportion = 0.5

contraction -> 5 samples to augment + 5 original samples -> F1 score: 0.60
uppercase -> 500 samples to augment + 500 original samples -> F1 score: 0.75

samples to augment == 0 -> "No samples to apply {test_name}, skipping this test."
samples to augment < 50 -> "Low number of samples ({n_samples}) to apply {test_name} to. "
                           "F1-Score may not be representative of true perturbation effect."
```
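The warning logic above could take roughly this shape; the function name and the 50-sample threshold come straight from the notes, everything else is illustrative:

```python
# Sketch of the sample-count guard: skip empty tests, warn on low coverage.
import warnings


def check_sample_count(test_name: str, n_samples: int) -> bool:
    """Return True if the test should run; emit warnings as described above."""
    if n_samples == 0:
        warnings.warn(f"No samples to apply {test_name}, skipping this test.")
        return False
    if n_samples < 50:
        warnings.warn(
            f"Low number of samples ({n_samples}) to apply {test_name} to. "
            "F1-Score may not be representative of true perturbation effect."
        )
    return True
```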
total sentences -> 1000
noise_prob -> 0.5
Tests with low augmentation coverage -> add/strip punctuation, accent conversion, entity swapping, add contraction.
For these, we can apply the perturbation to only some samples, not all:
add_punctuation -> sentence already has punctuation -> skip
Not every sentence can be contracted -> "is not" -> "isn't"
1000 sentences with noise_prob 0.5 -> we can only try to apply augmentation to around 500, because of the noise prob.
1000 -> 500 (added with noise prob) + 500 (will be searched for contraction)
Among those 500 samples -> only 25 contraction augmentations.
While we are testing our perturbation -> the perturbation set contains 500 original sentences and 25 augmented samples.
Problem 1 -> we know 500 of them (the original sentences) will be correct: the total noise set is 500 + 25, but we are really only comparing 25 of them.
This causes a high F1 -> it looks like the model has no problem with this perturbation.
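One way to avoid the inflated F1 described above is to score only the pairs the perturbation actually changed, instead of the whole noise set that still contains unmodified originals. A minimal sketch (the helper name is illustrative):

```python
# Sketch: keep only (original, perturbed) pairs where the text changed,
# so unmodified originals cannot inflate the perturbation score.
def truly_perturbed_pairs(originals, perturbed):
    return [(o, p) for o, p in zip(originals, perturbed) if o != p]
```

In the 500 + 25 example, this would evaluate the model on the 25 genuinely perturbed samples only.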
In YAML:

```yaml
- uppercase:
    min_pass_rate: 75%
```

Or maybe it is better to have one config for all tests - @ArshaanNazir, you can decide when implementing.
Tasks for release-0.0.1:
Tasks for release-0.0.2:
Create a NERModelHandler class that establishes a common way for inference and training on NER models from different libraries. This includes:
This issue will be used to track the sub-tasks required to launch and maintain this class.
Use this package to automate the conversion: https://github.com/cbillingham/docconvert
Sheet can be found in the Development channel - nlptest features.xlsx
Please fill out the Design tab.
The new NERModelHandler class will include methods to:
It will go under nlptest/model_handlers/
The first version of this class will support robustness testing/fixing in Spark NLP, but adaptability to other tasks and libraries should be kept in mind when developing it.
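A hypothetical interface sketch; method names are illustrative, and concrete backends (Spark NLP first) would subclass it:

```python
# Sketch of a common interface for inference and training on NER models.
from abc import ABC, abstractmethod
from typing import List


class NERModelHandler(ABC):
    """Library-agnostic handler; one subclass per supported NER library."""

    @abstractmethod
    def load(self, model_name_or_path: str) -> None:
        """Load a pretrained model from the underlying library."""

    @abstractmethod
    def predict(self, sentences: List[str]) -> List[List[str]]:
        """Return one label sequence per input sentence."""

    @abstractmethod
    def train(self, train_conll_path: str, **kwargs) -> None:
        """Fine-tune or retrain the model on CoNLL data."""
```

Keeping inference and training behind one interface is what lets the robustness tests stay library-agnostic.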
Colab notebook used to experiment: https://colab.research.google.com/drive/1VkBn3xn0yS1MzxR_w30ONRgZNNDjVByt?usp=sharing
Should support both classic Spark NLP pipelines & the new JSL library.
We need to build mechanisms to test data for different categories of privacy attacks:
Membership inference attack: an adversary predicts whether a known subject was present in the training data used to train the synthetic-data model.
Re-identification attack: the adversary estimates the probability that certain features can be re-identified by matching synthetic data back to the training data.
Attribute inference attack: The adversary predicts the value of sensitive features using synthetic data.
Main article discussing mechanisms.
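As a rough illustration of the first category, a confidence-threshold membership-inference test can be sketched as below. This is only one possible mechanism and an assumption on my part, not the package's design; the idea is that records the model is unusually confident about are flagged as likely training members:

```python
# Minimal membership-inference sketch: flag records whose model confidence
# exceeds a threshold as likely members of the training data.
def membership_inference(confidences, threshold=0.95):
    """Return indices of records flagged as likely training members."""
    return [i for i, c in enumerate(confidences) if c > threshold]
```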