lowinli / metric4coref
This is a metric for coreference resolution evaluation.
License: MIT License
Hi~
Thank you so much for your excellent work! However, since it differs considerably from the original paper, A Model-Theoretic Coreference Scoring Scheme, I wonder which implementation or paper you referred to when implementing the MUC metric. After checking several test cases, it seems that your implementation of MUC is stricter than the one described in that paper.
Hi, thank you for this great tool and the elaborated explanation of those metrics. :)
Just a kind reminder: in the calculation of F1, both precision and recall can be zero, so the denominator may be zero and raise an error. The same applies to the precision and recall calculations themselves.
Many thanks again!
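The zero-denominator case mentioned above can be guarded with a small helper along these lines (`f1_score` is an illustrative name, not the repo's actual function):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, guarding the zero case."""
    if precision + recall == 0.0:
        # Convention: define F1 as 0 when both precision and recall are 0,
        # instead of raising ZeroDivisionError.
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1_score(0.0, 0.0))  # -> 0.0 rather than an error
```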
Hi again, I rechecked the code and found a small problem. After summing the per-mention precisions and recalls, the totals should be normalized by the number of mentions in the predicted clusters (for precision) or in the gold clusters (for recall), but the number you used is the size of the intersection of the two clusters, which makes the denominator incorrect.
Regarding the MUC, I commented on another closed issue.
Hope it makes sense. :)
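For context, the per-mention precision/recall sums described in that comment match the B³ metric (Bagga & Baldwin, 1998). Assuming that is the metric in question, the normalization would look roughly like this sketch (`b_cubed` is a hypothetical name, not the repo's API):

```python
def b_cubed(predict_clusters, gold_clusters):
    """B-cubed precision/recall sketch (Bagga & Baldwin, 1998).

    For a mention m, per-mention precision is |P(m) & G(m)| / |P(m)| and
    recall is |P(m) & G(m)| / |G(m)|, where P(m) / G(m) are the predicted /
    gold clusters containing m. The sums are normalized by the number of
    predicted mentions (precision) and gold mentions (recall) -- not by the
    size of the cluster intersection.
    """
    pred_map = {m: set(c) for c in predict_clusters for m in c}
    gold_map = {m: set(c) for c in gold_clusters for m in c}
    p_sum = sum(len(pred_map[m] & gold_map.get(m, set())) / len(pred_map[m])
                for m in pred_map)
    r_sum = sum(len(gold_map[m] & pred_map.get(m, set())) / len(gold_map[m])
                for m in gold_map)
    precision = p_sum / len(pred_map) if pred_map else 0.0
    recall = r_sum / len(gold_map) if gold_map else 0.0
    return precision, recall

p, r = b_cubed([["a", "b"], ["c"]], [["a", "b", "c"]])
```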
Hello
Thanks so much for making this available in Python - it's really useful!
I think there may be a problem with the MUC measure: it gives different results depending on the order of the elements within each cluster.
For example:
predict_clusters = [["a", "b", "c"], ["d", "e", "f", "g"], ["h", "i", "j"], ["k"]]
predict_clusters_shuffled = [["c", "b", "a"], ["e", "d", "g", "f"], ["j", "i", "h"], ["k"]]
gold_clusters = [["a", "b", "d"], ["c", "e", "f", "g"], ["h", "i", "j", "k"]]
print(metric4coref.muc(predict_clusters, gold_clusters))
>>> (0.5833333333333334, 0.4666666666666667, 0.5185185185185186)
print(metric4coref.muc(predict_clusters_shuffled, gold_clusters))
>>> (0.16666666666666666, 0.13333333333333333, 0.14814814814814814)
This doesn't seem to be a problem with any of the other measures.
For now, I'm just sorting all my clusters before I evaluate them, but I just wanted to flag!
Thanks,
Emily
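For reference, computing MUC over sets of mentions, as in Vilain et al. (1994), avoids the order dependence shown above. This is a sketch of that approach, not the repo's code; note that its numbers will not necessarily match this repo's output, since (per the first issue above) the repo's implementation appears stricter than the paper:

```python
def muc(predict_clusters, gold_clusters):
    """MUC score (Vilain et al., 1994), computed over sets of mentions so
    the result is independent of element order within each cluster."""

    def score(keys, responses):
        # numerator: correctly recovered links; denominator: links in `keys`
        num = den = 0
        response_sets = [set(r) for r in responses]
        for key in keys:
            key = set(key)
            covered = set()
            n_parts = 0
            for r in response_sets:
                overlap = key & r
                if overlap:
                    n_parts += 1
                    covered |= overlap
            # every key mention absent from all responses is its own part
            n_parts += len(key - covered)
            num += len(key) - n_parts
            den += len(key) - 1
        return num / den if den else 0.0

    recall = score(gold_clusters, predict_clusters)
    precision = score(predict_clusters, gold_clusters)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Because each cluster is converted to a set before any counting, shuffled and unshuffled inputs produce identical scores.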