Comments (6)
We could also connect to GEM, as per @yjernite's proposal in another thread.
from evaluate.
And to Skimage
And to Torch Fidelity -- for generative metrics
I would rather reimplement these metrics than depend on other libs.
My heuristic:
- if the metric is something stable we don't expect to change (e.g. the Jaccard score): I would reimplement it. Note that if the license of the other lib is permissive (Apache 2.0 or MIT), you can copy the code with a copyright notice and it's fine
- if the metric is expected to change, e.g. BERTScore or other metrics that are either very recent or based on a pretrained neural net: I would rather depend on an external lib, to avoid having to keep our metric in sync
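To make the first bullet concrete: a stable metric like the Jaccard score is small enough to own outright. A minimal sketch (illustrative only, not evaluate's actual implementation):

```python
import numpy as np

def jaccard_score(references, predictions):
    """Jaccard similarity for binary labels: |A ∩ B| / |A ∪ B|.

    A hand-rolled sketch of a "stable" metric; the formula has not
    changed in a century, so there is little to keep in sync.
    """
    refs = np.asarray(references, dtype=bool)
    preds = np.asarray(predictions, dtype=bool)
    intersection = np.logical_and(refs, preds).sum()
    union = np.logical_or(refs, preds).sum()
    # Convention: two empty sets are identical, so score 1.0
    return float(intersection / union) if union else 1.0

print(jaccard_score([1, 1, 0, 1], [1, 0, 0, 1]))  # 2/3
```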
So this would be quite a deviation from the current status of metrics: most of them are wrappers around existing libraries. Just to name a few:
- "accuracy", "recall", and "f1" are all wrappers around scikit-learn
- "meteor" uses nltk
- "seqeval" is a wrapper for seqeval
- "spearmanr" comes from scipy
Would you reimplement all of them? As an example, there are O(100) metrics in scikit-learn and we only have a handful of them inside evaluate at the moment. scikit-learn has a BSD license, which would also mean that we would need a BSD license as well if we copy-paste the code, right?
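For a sense of how thin these wrappers are, here is a simplified sketch of the pattern (assuming scikit-learn is installed; evaluate's real "accuracy" module also carries feature typing, docstrings, and Hub metadata):

```python
from sklearn.metrics import accuracy_score

def compute(predictions, references, normalize=True, sample_weight=None):
    # Delegate the arithmetic entirely to scikit-learn; the wrapper only
    # adapts argument names and the return format. Sketch of the pattern,
    # not evaluate's actual module code.
    return {
        "accuracy": float(
            accuracy_score(
                references, predictions,
                normalize=normalize, sample_weight=sample_weight,
            )
        )
    }

print(compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0]))
```

Writing one of these takes minutes, which is the crux of the coverage argument below.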
My main concern is that we can't move very fast and adopt a wide range of modalities if we implement most of the metrics ourselves. I feel like one criterion for adoption is how quickly we cover a wide range of metrics, so that users never have to leave the library because one is missing. On the other hand, with community metrics, users would probably add missing metrics quickly by creating wrappers, since that's the fastest way (which is probably what happened in datasets/metrics).
Regarding dependencies: the idea was that each metric's repository has a requirements.txt file, so we don't need to add the dependencies to evaluate itself, while still being able to easily guide the user to update their installation.
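A hypothetical sketch of the "guide the user" part: check a metric's declared requirements at load time and fail early with an actionable hint. The helper name and signature are invented for illustration; evaluate's real loading code differs:

```python
import importlib.util

def check_dependencies(metric_name, required):
    """Raise an actionable ImportError if any required package is missing.

    Hypothetical helper illustrating the per-metric requirements.txt idea;
    `required` would be read from the metric repository's requirements file.
    """
    missing = [pkg for pkg in required if importlib.util.find_spec(pkg) is None]
    if missing:
        raise ImportError(
            f"To be able to use {metric_name}, you need to install the "
            f"following dependencies {missing} using "
            f"'pip install {' '.join(missing)}' for instance."
        )

# Stdlib modules are always importable, so this call passes silently.
check_dependencies("demo-metric", ["json", "collections"])
```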
What do you think about the following:
- Phase 1: gradually allow 3rd-party integrations as wrappers in evaluate/metrics
- Phase 2: add proper implementations to evaluate/metrics where there is a need
Just giving my half-cent opinion here… if you are going to rewrite the metrics, it could be done in a way that computes them from a confusion matrix. I know this works for the metrics used in computer vision; I don't know whether the same would be true for NLP.
But as already noted, this would probably take more time than third-party integrations 😅
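To illustrate the confusion-matrix idea for classification: once the matrix is built, accuracy, per-class precision, recall, and F1 all fall out of it without revisiting the raw predictions. A hypothetical sketch (function names invented for illustration):

```python
import numpy as np

def confusion_matrix(references, predictions, num_classes):
    # cm[i, j] = number of samples with true class i predicted as class j
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(references, predictions):
        cm[t, p] += 1
    return cm

def metrics_from_confusion(cm):
    # All four metrics derive from the matrix alone, so several metrics
    # can share one pass over the data.
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)   # column sums = predicted counts
    recall = tp / np.maximum(cm.sum(axis=1), 1)      # row sums = true counts
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

cm = confusion_matrix([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], num_classes=3)
print(metrics_from_confusion(cm)["accuracy"])  # 4 correct out of 5 -> 0.8
```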