lowinli / metric4coref
This is a metric for coreference resolution evaluation.
License: MIT License
Hi~
Thank you so much for your excellent work! However, since it differs considerably from the original paper, A Model-Theoretic Coreference Scoring Scheme, I wonder which implementation or paper you referred to when implementing the MUC metric. After checking several test cases, it seems that your implementation of MUC is stricter than the one described in that paper.
Hi, thank you for this great tool and the elaborated explanation of those metrics. :)
Just a kind reminder: in the calculation of F1, both precision and recall can be zero, so the denominator may be zero and raise an error. The same applies to the precision and recall calculations themselves.
Many thanks again!
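The zero-denominator case mentioned above can be guarded with a small helper along these lines (`f1_score` is an illustrative name, not the repo's actual function):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, guarding the zero case."""
    if precision + recall == 0.0:
        # Convention: define F1 as 0 when both precision and recall are 0,
        # instead of raising ZeroDivisionError.
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.0, 0.0))  # -> 0.0 rather than an error
```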
Hi again, I rechecked the code and found a small problem. After summing the per-mention precisions and recalls, the totals should be normalized by the number of mentions in the predicted clusters (for precision) or in the gold clusters (for recall), but the number you used is the size of the intersection of the two clusters, which makes the denominator incorrect.
Regarding the MUC, I commented on another closed issue.
Hope it makes sense. :)
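For context, the per-mention precision/recall sums described in that comment match the B³ metric (Bagga & Baldwin, 1998). Assuming that is the metric in question, the normalization would look roughly like this sketch (`b_cubed` is a hypothetical name, not the repo's API):

```python
def b_cubed(predict_clusters, gold_clusters):
    """B-cubed precision/recall sketch (Bagga & Baldwin, 1998).

    For a mention m, per-mention precision is |P(m) & G(m)| / |P(m)| and
    recall is |P(m) & G(m)| / |G(m)|, where P(m) / G(m) are the predicted /
    gold clusters containing m. The sums are normalized by the number of
    predicted mentions (precision) and gold mentions (recall) -- not by the
    size of the cluster intersection.
    """
    pred_map = {m: set(c) for c in predict_clusters for m in c}
    gold_map = {m: set(c) for c in gold_clusters for m in c}
    p_sum = sum(len(pred_map[m] & gold_map.get(m, set())) / len(pred_map[m])
                for m in pred_map)
    r_sum = sum(len(gold_map[m] & pred_map.get(m, set())) / len(gold_map[m])
                for m in gold_map)
    precision = p_sum / len(pred_map) if pred_map else 0.0
    recall = r_sum / len(gold_map) if gold_map else 0.0
    return precision, recall

p, r = b_cubed([["a", "b"], ["c"]], [["a", "b", "c"]])
```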
Hello
Thanks so much for making this available in Python - it's really useful!
I think there may be a problem with the MUC measure: it gives different results depending on the order of the elements within each cluster.
For example:
predict_clusters = [["a", "b", "c"], ["d", "e", "f", "g"], ["h", "i", "j"], ["k"]]
predict_clusters_shuffled = [["c", "b", "a"], ["e", "d", "g", "f"], ["j", "i", "h"], ["k"]]
gold_clusters = [["a", "b", "d"], ["c", "e", "f", "g"], ["h", "i", "j", "k"]]
print(metric4coref.muc(predict_clusters, gold_clusters))
>>> (0.5833333333333334, 0.4666666666666667, 0.5185185185185186)
print(metric4coref.muc(predict_clusters_shuffled, gold_clusters))
>>> (0.16666666666666666, 0.13333333333333333, 0.14814814814814814)
This doesn't seem to be a problem with any of the other measures.
For now, I'm just sorting all my clusters before I evaluate them, but I just wanted to flag!
Thanks,
Emily
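For reference, computing MUC over sets of mentions, as in Vilain et al. (1994), avoids the order dependence shown above. This is a sketch of that approach, not the repo's code; note that its numbers will not necessarily match this repo's output, since (per the first issue above) the repo's implementation appears stricter than the paper:

```python
def muc(predict_clusters, gold_clusters):
    """MUC score (Vilain et al., 1994), computed over sets of mentions so
    the result is independent of element order within each cluster."""

    def score(keys, responses):
        # numerator: correctly recovered links; denominator: links in `keys`
        num = den = 0
        response_sets = [set(r) for r in responses]
        for key in keys:
            key = set(key)
            covered = set()
            n_parts = 0
            for r in response_sets:
                overlap = key & r
                if overlap:
                    n_parts += 1
                    covered |= overlap
            # every key mention absent from all responses is its own part
            n_parts += len(key - covered)
            num += len(key) - n_parts
            den += len(key) - 1
        return num / den if den else 0.0

    recall = score(gold_clusters, predict_clusters)
    precision = score(predict_clusters, gold_clusters)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Because each cluster is converted to a set before any counting, shuffled and unshuffled inputs produce identical scores.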