johnsnowlabs / langtest


466.0 11.0 34.0 162.8 MB

Deliver safe & effective language models

Home Page: http://langtest.org/

License: Apache License 2.0

Python 99.98% Makefile 0.01% CSS 0.01% Batchfile 0.01% Shell 0.01%

benchmarks ethics-in-ai large-language-models ml-safety ml-testing mlops model-assessment nlp responsible-ai llm-test

langtest's Issues

Create a tutorial notebook for noisy labels testing/fixing

Rename Harness config_path param to config

Add Google style docstrings to all code

Robustness fixing should accept a simple dictionary where keys are perturbation names and values are proportions to apply to all entities for that perturbation

Currently we are passing perturbations as single params:

augment_robustness(conll_path = 'data.conll',
                   uppercase = {'PROBLEM':0.05, 'TEST':0.05, 'TREATMENT':0.05},
                   lowercase = {'PROBLEM':0.05, 'TEST':0.05, 'TREATMENT':0.05})

We should change this to a new parameter that accepts a perturbation map that looks like this:

detailed_proportions = {
   "uppercase": {'PROBLEM':0.05, 'TEST':0.05, 'TREATMENT':0.05},
   "lowercase": {'PROBLEM':0.05, 'TEST':0.05, 'TREATMENT':0.05},
   "title": {'PROBLEM':0.05, 'TEST':0.05, 'TREATMENT':0.05},
   "add_punctuation": {'PROBLEM':0.05, 'TEST':0.05, 'TREATMENT':0.05},
}

augment_robustness(conll_path = 'data.conll',
                   entity_perturbation_map = detailed_proportions)

We should also accept a simpler version of this in another parameter:

proportions = {
   "uppercase": 0.05,
   "lowercase": 0.05,
}

augment_robustness(conll_path = 'data.conll',
                   perturbation_map = proportions)
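One way to support both parameters is to treat the simple map as shorthand for the detailed one, applying each proportion to every entity. A minimal sketch, assuming a hypothetical helper `expand_perturbation_map` (not part of the library) and that the entity labels are known up front:

```python
from typing import Dict, Iterable


def expand_perturbation_map(
    perturbation_map: Dict[str, float],
    entities: Iterable[str],
) -> Dict[str, Dict[str, float]]:
    """Expand {perturbation: proportion} into {perturbation: {entity: proportion}}."""
    entities = list(entities)
    expanded = {}
    for name, proportion in perturbation_map.items():
        # Proportions are fractions of samples, so they must lie in [0, 1].
        if not 0.0 <= proportion <= 1.0:
            raise ValueError(f"Proportion for '{name}' must be in [0, 1], got {proportion}")
        expanded[name] = {entity: proportion for entity in entities}
    return expanded


proportions = {"uppercase": 0.05, "lowercase": 0.05}
detailed = expand_perturbation_map(proportions, ["PROBLEM", "TEST", "TREATMENT"])
```

With this approach `augment_robustness` would only need one internal code path, since `detailed` has the same shape as `detailed_proportions` above.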

Make invalid metrics_output_format impossible to pass to robustness testing

Create and implement a NERDataHandler class that establishes a common method to process labeled NER data in Spark NLP

The new NERDataHandler class will include:

Write and read methods
Storing document indexes
Easy filtering
Converting inputs to match library requirements

It will go under nlptest/data_handlers/

The first version of this class will support robustness testing/fixing in Spark NLP, but adaptability to other tasks and libraries should be kept in mind when developing it.
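To make the scope concrete, here is a rough sketch of what such a class could look like. Everything in it is an assumption for illustration: the class name `NERDataHandlerSketch`, its methods, and the choice to read the word from the first CoNLL column and the label from the last. It works on strings rather than files or Spark DataFrames to stay self-contained.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Token = Tuple[str, str]  # (word, NER label)


@dataclass
class NERDataHandlerSketch:
    """Hypothetical sketch: sentences stored next to the index of their document."""

    sentences: List[List[Token]] = field(default_factory=list)
    doc_indexes: List[int] = field(default_factory=list)

    @classmethod
    def from_conll(cls, text: str) -> "NERDataHandlerSketch":
        handler = cls()
        doc_id, current = -1, []
        for line in text.splitlines() + [""]:  # trailing "" flushes the last sentence
            line = line.strip()
            if line.startswith("-DOCSTART-"):
                doc_id += 1
            elif not line:
                if current:
                    handler.sentences.append(current)
                    handler.doc_indexes.append(max(doc_id, 0))
                    current = []
            else:
                columns = line.split()
                current.append((columns[0], columns[-1]))  # first column: word, last: label
        return handler

    def to_conll(self) -> str:
        lines, previous_doc = [], None
        for doc_id, sentence in zip(self.doc_indexes, self.sentences):
            if doc_id != previous_doc:  # start of a new document
                lines += ["-DOCSTART- -X- -X- O", ""]
                previous_doc = doc_id
            lines += [f"{word} -X- -X- {label}" for word, label in sentence] + [""]
        return "\n".join(lines)

    def filter(self, keep: Callable[[List[Token]], bool]) -> "NERDataHandlerSketch":
        kept = [(s, d) for s, d in zip(self.sentences, self.doc_indexes) if keep(s)]
        return NERDataHandlerSketch([s for s, _ in kept], [d for _, d in kept])


sample = """-DOCSTART- -X- -X- O

Patient NN NP O
has VBZ VP O
fever NN NP B-PROBLEM

No DT NP O
symptoms NNS NP O
"""
handler = NERDataHandlerSketch.from_conll(sample)
with_entities = handler.filter(lambda s: any(label != "O" for _, label in s))
```

Keeping the document index next to each sentence means filtering or augmenting sentences does not lose track of which document they came from when the data is written back out.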

Add unittest for noisy labels testing

Description

Create a NERDataHandler class that establishes a common CoNLL data structure for all libraries to process labeled NER data. This includes:

Write and read methods
Storing document indexes
Easy filtering
Converting inputs to match external library requirements (including direct dataset download from HF datasets)

This issue will be used to track the sub-tasks required to launch and maintain this class.

Tasks

Ensure class supports robustness testing/fixing with Spark NLP
Ensure class supports bias testing with Spark NLP
Ensure class supports noisy label testing/fixing with Spark NLP
Ensure class supports robustness testing/fixing with transformers
Ensure class supports bias testing with transformers
Ensure class supports noisy label testing/fixing with transformers
Ensure class supports robustness testing/fixing with spaCy
Ensure class supports bias testing with spaCy
Ensure class supports noisy label testing/fixing with spaCy

Add support for HF transformers NER models

Tasks for release-0.0.1:

Implement a minimum working version for 3 popular NER models

Task led by @JulesBelveze

Add support for spaCy NER models

Adaptations

Adapt handler classes for noisy labels testing in Spark NLP
Adapt handler classes for noisy labels testing in transformers - notebook
Adapt handler classes for noisy labels testing in spaCy

Improvements

#20
#31
Improve scoring using sentence label quality score
Add type checking to main function
Supporting classification, assertion, relation extraction tasks

Bug Fixes

🎉

Refactor token filtering in robustness_testing

Token filtering was introduced to delete extra added tokens so that token lengths match when comparing predictions from NER models. There are other ways to achieve this, such as making the metrics ignore token length differences.

See slides for more details on possible approaches.

Reformat report method output

.report() should print this:

test factory	test type	pass count	fail count	pass rate	minimum pass rate	pass
Perturbation	uppercase	34	16	68%	75%	False
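As a sanity check on the columns, the derived values in that row follow from the counts: 34 / (34 + 16) = 68%, which is below the 75% minimum, so `pass` is False. A small sketch of building one row, assuming a hypothetical helper `build_report_row` (not the library's actual report code):

```python
def build_report_row(test_factory, test_type, pass_count, fail_count, minimum_pass_rate):
    """Build one report row; minimum_pass_rate is a fraction such as 0.75."""
    total = pass_count + fail_count
    pass_rate = pass_count / total if total else 0.0
    return {
        "test factory": test_factory,
        "test type": test_type,
        "pass count": pass_count,
        "fail count": fail_count,
        "pass rate": f"{pass_rate:.0%}",
        "minimum pass rate": f"{minimum_pass_rate:.0%}",
        # The test passes only if the observed rate reaches the configured minimum.
        "pass": pass_rate >= minimum_pass_rate,
    }


row = build_report_row("Perturbation", "uppercase", 34, 16, 0.75)
print("\t".join(str(value) for value in row.values()))
```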

Implement GH automated build for repo

Add unittest for bias testing

Add installation instructions in README

Add usage instructions in README

Add unittest for robustness testing

Make invalid test names impossible to pass to robustness testing

Create a tutorial notebook for bias testing

Add support for HF datasets for Text Classification task

Rename k to k_fold in noisy labels testing

Improve sampling method for `noise_prob` param by replacing with new `noise_proportion` param in robustness testing

noise_proportion = 0.5

# step 1: apply perturbation to all samples
1000 sentences -> apply contraction

# step 2: sample as many successfully augmented sentences as possible to reach noise_proportion
# we don't mind if some are already augmented
50 samples successfully contracted (augmented) + 100 already contracted

50 augmented + 50 (random.sample(n=50) from 1000 - 50) -> we don't mind if sampled ones are already augmented
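The two steps above can be sketched as follows. This is only an interpretation of the notes in this issue: the function name `build_perturbation_set` and its signature are hypothetical, and it assumes the unperturbed part of the set is sized so that augmented samples make up `noise_proportion` of the total. The warning texts are the ones proposed in this issue.

```python
import random
import warnings
from typing import Callable, List, Optional, Tuple


def build_perturbation_set(
    sentences: List[str],
    perturb: Callable[[str], str],
    test_name: str,
    noise_proportion: float = 0.5,
    seed: Optional[int] = None,
) -> Tuple[List[str], List[str]]:
    """Return (augmented, originals) with augmented ~ noise_proportion of the set."""
    if not 0.0 < noise_proportion <= 1.0:
        raise ValueError("noise_proportion must be in (0, 1]")
    rng = random.Random(seed)

    # Step 1: apply the perturbation to every sentence and keep those that changed.
    changed_ids = [i for i, s in enumerate(sentences) if perturb(s) != s]
    if not changed_ids:
        warnings.warn(f"No samples to apply {test_name}, skipping this test.")
        return [], []

    # Step 2: keep as many augmented sentences as the target proportion allows.
    max_augmented = max(1, round(noise_proportion * len(sentences)))
    chosen = rng.sample(changed_ids, min(len(changed_ids), max_augmented))
    if len(chosen) < 50:
        warnings.warn(
            f"Low number of samples ({len(chosen)}) to apply {test_name} to. "
            "F1-Score may not be representative of true perturbation effect."
        )

    # Fill up with unperturbed sentences; they may already contain the perturbation.
    n_original = round(len(chosen) * (1 - noise_proportion) / noise_proportion)
    chosen_set = set(chosen)
    remaining = [i for i in range(len(sentences)) if i not in chosen_set]
    originals = rng.sample(remaining, min(n_original, len(remaining)))
    return [perturb(sentences[i]) for i in chosen], [sentences[i] for i in originals]


def add_contraction(sentence: str) -> str:
    return sentence.replace("is not", "isn't")


sentences = ["This is not fine."] * 5 + ["All good here."] * 995
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    augmented, originals = build_perturbation_set(
        sentences, add_contraction, "add_contraction", seed=0
    )
```

Sampling after augmenting everything means the size of the test set follows the number of sentences that could actually be perturbed, instead of being fixed before it is known whether the perturbation applies.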


noise_proportion = 0.5

contraction -> 5 samples to augment + 5 original samples -> f1 score: 0.60

uppercase -> 500 samples to augment + 500 original samples -> f1 score: 0.75

samples to augment == 0 -> "No samples to apply {test_name}, skipping this test."

samples to augment < 50  -> "Low number of samples ({n_samples}) to apply {test_name} to."
                            "F1-Score may not be representative of true perturbation effect."

total sentences -> 1000
noise_prob -> 0.5
Perturbations with low augmentation coverage -> add punctuation, strip punctuation, accent_conversion, entity_swapping, add contraction
These perturbations can only be applied to some samples, not all:
add_punctuation -> sentence already has punctuation -> skip
add contraction -> not all sentences can be contracted (only those containing e.g. "is not" -> "isn’t")
1000 sentences -> noise_prob 0.5 -> we only try to augment around 500 of them because of noise_prob
1000 -> 500 (added as-is because of noise_prob) + 500 (searched for contractions)
Among those 500 samples -> 25 get a contraction augmentation
When we test the perturbation -> the perturbation set contains 500 original sentences and 25 augmented samples
Problem 1 -> we know the 500 original sentences will be correct
The noise set has 500 + 25 samples in total, but we are effectively comparing only 25 of them
This causes a high F1 score -> it looks as if the model has no problem with this perturbation test
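The dilution effect can be shown with simple arithmetic. The sketch below uses a plain pass rate instead of F1 to keep the numbers easy to follow, and assumes (as in the notes above) that unchanged sentences always count as correct. The function name is hypothetical.

```python
def apparent_pass_rate(n_original: int, n_augmented: int, augmented_pass_rate: float) -> float:
    """Pass rate over a mixed set in which unchanged sentences always pass."""
    passed = n_original + n_augmented * augmented_pass_rate
    return passed / (n_original + n_augmented)


# Worst case: the model fails on every one of the 25 augmented sentences.
diluted = apparent_pass_rate(n_original=500, n_augmented=25, augmented_pass_rate=0.0)
balanced = apparent_pass_rate(n_original=25, n_augmented=25, augmented_pass_rate=0.0)
print(f"{diluted:.3f}")   # 500 / 525
print(f"{balanced:.3f}")  # 25 / 50
```

Even when the model gets every augmented sentence wrong, the 500 + 25 set still scores about 95%, while a balanced 25 + 25 set drops to 50% and makes the failure visible.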

Features Backlog

Parked Ideas 🚗

Noisy Labels Fixing Roadmap

Adaptations

Adapt handler classes for noisy labels testing in Spark NLP
Adapt handler classes for noisy labels testing in transformers
Adapt handler classes for noisy labels testing in spaCy

Improvements

#32
Prettify UI dropdown, groupby sentences like in ALAB UI
Add type checking to main function
Supporting classification, assertion, relation extraction tasks

Bug Fixes

Fix UI jupyter lab compatibility

Add minimum pass rate config to each test in yaml

In yaml:

- uppercase:
    - min_pass_rate: 75%

Alternatively, it may be better to have a single config for all tests. @ArshaanNazir, you can decide when implementing.
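Both options could be handled by the same parsing step. The sketch below is an assumption, not the implemented config format: it works on the Python structure a YAML loader would typically produce for the fragment above, and the `default` argument (with its 65% value in the usage example) is a hypothetical stand-in for the "single config for all tests" option.

```python
from typing import Dict, List, Optional, Union


def parse_pass_rate(value: Union[str, float]) -> float:
    """Convert '75%' or 0.75 into a fraction between 0 and 1."""
    if isinstance(value, str) and value.strip().endswith("%"):
        rate = float(value.strip().rstrip("%")) / 100
    else:
        rate = float(value)
    if not 0.0 <= rate <= 1.0:
        raise ValueError(f"min_pass_rate must be between 0% and 100%, got {value!r}")
    return rate


def collect_min_pass_rates(
    tests_config: List[dict],
    default: Optional[Union[str, float]] = None,
) -> Dict[str, float]:
    """Map each test name to its minimum pass rate, falling back to a shared default."""
    rates = {}
    for entry in tests_config:  # each entry looks like {test_name: [{option: value}, ...]}
        for test_name, options in entry.items():
            merged = {}
            for option in options or []:
                merged.update(option)
            raw = merged.get("min_pass_rate", default)
            if raw is None:
                raise ValueError(f"No min_pass_rate configured for test '{test_name}'")
            rates[test_name] = parse_pass_rate(raw)
    return rates


# Structure a YAML loader would produce; "lowercase" has no options of its own.
config = [{"uppercase": [{"min_pass_rate": "75%"}]}, {"lowercase": None}]
rates = collect_min_pass_rates(config, default="65%")
```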

Create docs website in line with Spark NLP theme

Tasks for release-0.0.1:

Clone Spark NLP docs
Make inventory of content needed, create GH issues for each item

Tasks for release-0.0.2:

@agsfer deploy locally
@luca-martial edit content and structure

NERModelHandler Roadmap

Description

Create a NERModelHandler class that establishes a common way to run inference and training on NER models from different libraries. This includes:

Wrapping NER inference pipelines for Spark NLP, transformers and spaCy
Standardizing output formats for all pipeline predictions
Wrapping training process for Spark NLP, transformers and spaCy models

This issue will be used to track the sub-tasks required to launch and maintain this class.
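A possible shape for this, sketched under assumptions: the class and method names below are hypothetical, and the "standard output format" is taken to be one label per input token. Real Spark NLP, transformers or spaCy pipelines would each get their own subclass; here a plain function stands in for the backend so the example runs without those libraries.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Sequence


class NERModelHandlerSketch(ABC):
    """Hypothetical common interface over NER backends."""

    @abstractmethod
    def predict(self, tokens: Sequence[str]) -> List[str]:
        """Return one NER label per input token."""

    def predict_batch(self, sentences: Sequence[Sequence[str]]) -> List[List[str]]:
        predictions = [self.predict(tokens) for tokens in sentences]
        # Enforce the standardized output format regardless of the backend.
        for tokens, labels in zip(sentences, predictions):
            if len(tokens) != len(labels):
                raise ValueError("Backend must return exactly one label per token")
        return predictions


class CallableBackendHandler(NERModelHandlerSketch):
    """Adapter around any function that labels a list of tokens."""

    def __init__(self, label_fn: Callable[[Sequence[str]], Sequence[str]]):
        self._label_fn = label_fn

    def predict(self, tokens: Sequence[str]) -> List[str]:
        return list(self._label_fn(tokens))


def toy_tagger(tokens: Sequence[str]) -> List[str]:
    # Stand-in for a real model: tags the word "fever" as a PROBLEM entity.
    return ["B-PROBLEM" if token.lower() == "fever" else "O" for token in tokens]


handler = CallableBackendHandler(toy_tagger)
predictions = handler.predict_batch([["Patient", "has", "fever"]])
```

Checking the output length in the base class keeps the per-token contract in one place, so the testing code does not need library-specific handling.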

Tasks

Ensure class supports robustness testing with Spark NLP
Ensure class supports bias testing with Spark NLP
Ensure class supports noisy label testing/fixing with Spark NLP
Ensure class supports robustness testing with transformers
Ensure class supports bias testing with transformers
Ensure class supports noisy label testing/fixing with transformers
Ensure class supports robustness testing with spaCy
Ensure class supports bias testing with spaCy
Ensure class supports noisy label testing/fixing with spaCy

Implement unit tests to automate testing of the nlptest library

Add unittest for robustness fixing

Convert all docstrings from reStructuredText to Google style docstrings

Use this package to automate the conversion: https://github.com/cbillingham/docconvert
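For reference, this is roughly what the target style looks like, with the equivalent reStructuredText fields shown in comments. The function itself is a made-up example, not code from the repo.

```python
# reStructuredText style (current):
#     :param value: Number to scale.
#     :param factor: Multiplier applied to ``value``.
#     :return: The scaled number.
#     :raises ValueError: If ``factor`` is negative.


def scale(value: float, factor: float = 2.0) -> float:
    """Multiply a value by a factor.

    Args:
        value: Number to scale.
        factor: Multiplier applied to ``value``.

    Returns:
        The scaled number.

    Raises:
        ValueError: If ``factor`` is negative.
    """
    if factor < 0:
        raise ValueError("factor must be non-negative")
    return value * factor
```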

Fill out design sheet

The sheet can be found in the Development channel: nlptest features.xlsx

Please fill out the Design tab

Documentation Roadmap

Description

This is the roadmap for all things related to knowledge translation and documentation. This includes:

tutorial notebooks
readme instructions
blogposts
docs

Wrap NER inference pipelines in Spark NLP
Standardize output formats for all pipeline predictions

It will go under nlptest/model_handlers/

The first version of this class will support robustness testing/fixing in Spark NLP, but adaptability to other tasks and libraries should be kept in mind when developing it.

Convert perturbations to subclasses instead of methods of PerturbationFactory

Fixes Backlog

Parked Ideas 🚗

Removing transformers dependency if possible

Bias Testing Roadmap

Adaptations

Pick and implement final gender classification method
Replace NerDLMetrics with sklearn classification_report
Adapt handler classes for bias testing in Spark NLP
Adapt handler classes for bias testing in transformers
Adapt handler classes for bias testing in spaCy

Improvements

#30
Add type checking to main function
Supporting classification, assertion, relation extraction tasks

Bug Fixes

🎉

Regroup & standardize perturbations for robustness_testing and robustness_fixing into 1 module: perturbations.py

Robustness Fixing Roadmap

Adaptations

#10
#12
Adapt NERDataHandler class for robustness fixing in transformers
Adapt NERDataHandler class for robustness fixing in spaCy

Improvements

#19
#29
Add an optimization algorithm for all augmentations so that augmentation coverage gets as close to 100% as possible for every test (use a method similar to the new noise_proportion method in robustness testing -> sample after augmenting all)
Track if change occurred in every sentence for every augmentation
Apply entity swapping to all entities in sentence (instead of 1)
Add type checking to main function
Supporting classification, assertion, relation extraction tasks
Become compatible with many types of DOCSTART

Bug Fixes

🎉

Robustness Testing Roadmap

Adaptations

Colab notebook used to experiment: https://colab.research.google.com/drive/1VkBn3xn0yS1MzxR_w30ONRgZNNDjVByt?usp=sharing

Improvements

#16
#28
Apply entity swapping to all entities in sentence (instead of 1)
Add type checking to main function
Supporting classification, assertion, relation extraction tasks

Bug Fixes

🚀

Description

We need to build mechanisms to test data for different categories of privacy attacks:

Membership inference attack: An adversary predicts whether a known subject was present in the training data used for training the synthetic data model.
Re-identification attack: The adversary explores the probability of some features being re-identified using synthetic data and matching to the training data.
Attribute inference attack: The adversary predicts the value of sensitive features using synthetic data.

Main article discussing mechanisms.

Tasks

🕐
