Comments (6)
We could also connect to GEM, as per @yjernite's proposal in another thread.
from evaluate.
And to Skimage
And to Torch Fidelity -- for generative metrics
I would rather reimplement these metrics than depend on other libs.
My heuristic:
- if the metric is something stable we don't expect to change (e.g. the Jaccard score): I would reimplement it. Note that if the license of the other lib is permissive (Apache 2.0 or MIT), you can copy the code with a copyright notice and it's fine
- if the metric is expected to change, e.g. BERTScore or other metrics that are either very recent or based on a pretrained neural net: I would rather depend on an external lib, to avoid having to keep our metric in sync
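To make the first bullet concrete: a stable metric like the Jaccard score is small enough to own outright. A minimal sketch (illustrative only, not evaluate's actual implementation):

```python
import numpy as np

def jaccard_score(references, predictions):
    """Jaccard similarity for binary labels: |A ∩ B| / |A ∪ B|.

    A hand-rolled sketch of a "stable" metric; the formula has not
    changed in a century, so there is little to keep in sync.
    """
    refs = np.asarray(references, dtype=bool)
    preds = np.asarray(predictions, dtype=bool)
    intersection = np.logical_and(refs, preds).sum()
    union = np.logical_or(refs, preds).sum()
    # Convention: two empty sets are identical, so score 1.0
    return float(intersection / union) if union else 1.0

print(jaccard_score([1, 1, 0, 1], [1, 0, 0, 1]))  # 2/3
```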
So this would be quite a deviation from the current status of metrics: most of them are wrappers around existing libraries. Just to name a few:
- "accuracy", "recall", and "f1" are all wrappers around scikit-learn
- "meteor" uses nltk
- "seqeval" is a wrapper for seqeval
- "spearmanr" comes from scipy
Would you reimplement all of them? As an example, there are O(100) metrics in scikit-learn and we only have a handful of them inside evaluate at the moment. scikit-learn has a BSD license, which would also mean that we would need a BSD license as well if we copy-paste the code, right?
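For a sense of how thin these wrappers are, here is a simplified sketch of the pattern (assuming scikit-learn is installed; evaluate's real "accuracy" module also carries feature typing, docstrings, and Hub metadata):

```python
from sklearn.metrics import accuracy_score

def compute(predictions, references, normalize=True, sample_weight=None):
    # Delegate the arithmetic entirely to scikit-learn; the wrapper only
    # adapts argument names and the return format. Sketch of the pattern,
    # not evaluate's actual module code.
    return {
        "accuracy": float(
            accuracy_score(
                references, predictions,
                normalize=normalize, sample_weight=sample_weight,
            )
        )
    }

print(compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0]))
```

Writing one of these takes minutes, which is the crux of the coverage argument below.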
My main concern is that we can't move very fast and adopt a wide range of modalities if we implement most of the metrics ourselves. I feel like one criterion for adoption is how quickly we cover a wide range of metrics, so that users never have to leave the library because one is missing. On the other hand, with community metrics, users would probably add missing metrics quickly by creating wrappers, since that's the fastest way (which is probably what happened in datasets/metrics).
Regarding dependencies: the idea was that each metric's repository has a requirements.txt file, so we don't need to add the dependencies to evaluate itself, while still being able to easily guide the user to update their installation.
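A hypothetical sketch of the "guide the user" part: check a metric's declared requirements at load time and fail early with an actionable hint. The helper name and signature are invented for illustration; evaluate's real loading code differs:

```python
import importlib.util

def check_dependencies(metric_name, required):
    """Raise an actionable ImportError if any required package is missing.

    Hypothetical helper illustrating the per-metric requirements.txt idea;
    `required` would be read from the metric repository's requirements file.
    """
    missing = [pkg for pkg in required if importlib.util.find_spec(pkg) is None]
    if missing:
        raise ImportError(
            f"To be able to use {metric_name}, you need to install the "
            f"following dependencies {missing} using "
            f"'pip install {' '.join(missing)}' for instance."
        )

# Stdlib modules are always importable, so this call passes silently.
check_dependencies("demo-metric", ["json", "collections"])
```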
What do you think about the following:
- Phase 1: gradually allow 3rd-party integrations as wrappers in evaluate/metrics
- Phase 2: add proper implementations to evaluate/metrics where there is a need
Just giving my half-cent opinion here… if you are going to rewrite the metrics, it could be done in a way that computes them from a confusion matrix. I know this works for the metrics used in computer vision; I don't know whether the same would be true for NLP.
But as already noted, this would probably take more time than third-party integrations 😅
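To illustrate the confusion-matrix idea for classification: once the matrix is built, accuracy, per-class precision, recall, and F1 all fall out of it without revisiting the raw predictions. A hypothetical sketch (function names invented for illustration):

```python
import numpy as np

def confusion_matrix(references, predictions, num_classes):
    # cm[i, j] = number of samples with true class i predicted as class j
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(references, predictions):
        cm[t, p] += 1
    return cm

def metrics_from_confusion(cm):
    # All four metrics derive from the matrix alone, so several metrics
    # can share one pass over the data.
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)   # column sums = predicted counts
    recall = tp / np.maximum(cm.sum(axis=1), 1)      # row sums = true counts
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

cm = confusion_matrix([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], num_classes=3)
print(metrics_from_confusion(cm)["accuracy"])  # 4 correct out of 5 -> 0.8
```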