interpreteval's People

Contributors

anonymous4nlp, daniellaye, jinlanfu, neubig, pfliu-nlp

interpreteval's Issues

run_task_ner.sh overwrites git-committed files

When run_task_ner.sh is run, it overwrites many files that are committed to git. As a result, when you make any changes and run git commit, a whole bunch of files are listed as "need merge". The output files of run_task_ner.sh should probably be added to .gitignore, and perhaps written to a separate directory that is not committed.

M	interpretEval/analysis/ner-fig/Flair-ELMo/bucketInfo.pkl
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-ELMo-breakdown-eCon.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-ELMo-breakdown-eCon.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-ELMo-breakdown-eDen.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-ELMo-breakdown-eDen.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-ELMo-breakdown-eFre.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-ELMo-breakdown-eFre.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-ELMo-breakdown-eLen.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-ELMo-breakdown-eLen.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-ELMo-breakdown-oDen.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-ELMo-breakdown-oDen.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-ELMo-breakdown-sLen.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-ELMo-breakdown-sLen.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-ELMo-breakdown-tCon.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-ELMo-breakdown-tCon.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-ELMo-breakdown-tFre.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-ELMo-breakdown-tFre.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-ELMo-breakdown-tag.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-ELMo-breakdown-tag.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-ELMo-selfdiag.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-ELMo-selfdiag.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-Flair-breakdown-eCon.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-Flair-breakdown-eCon.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-Flair-breakdown-eDen.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-Flair-breakdown-eDen.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-Flair-breakdown-eFre.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-Flair-breakdown-eFre.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-Flair-breakdown-eLen.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-Flair-breakdown-eLen.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-Flair-breakdown-oDen.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-Flair-breakdown-oDen.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-Flair-breakdown-sLen.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-Flair-breakdown-sLen.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-Flair-breakdown-tCon.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-Flair-breakdown-tCon.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-Flair-breakdown-tFre.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-Flair-breakdown-tFre.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-Flair-breakdown-tag.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-Flair-breakdown-tag.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-Flair-selfdiag.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-Flair-selfdiag.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-Flair_ELMo-aideddiag.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-Flair_ELMo-aideddiag.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/conll03-Flair_ELMo-heatmap.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/log.latex
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-ELMo-breakdown-eCon.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-ELMo-breakdown-eCon.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-ELMo-breakdown-eDen.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-ELMo-breakdown-eDen.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-ELMo-breakdown-eFre.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-ELMo-breakdown-eFre.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-ELMo-breakdown-eLen.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-ELMo-breakdown-eLen.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-ELMo-breakdown-oDen.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-ELMo-breakdown-oDen.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-ELMo-breakdown-sLen.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-ELMo-breakdown-sLen.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-ELMo-breakdown-tCon.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-ELMo-breakdown-tCon.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-ELMo-breakdown-tFre.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-ELMo-breakdown-tFre.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-ELMo-breakdown-tag.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-ELMo-breakdown-tag.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-ELMo-selfdiag.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-ELMo-selfdiag.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-Flair-breakdown-eCon.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-Flair-breakdown-eCon.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-Flair-breakdown-eDen.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-Flair-breakdown-eDen.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-Flair-breakdown-eFre.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-Flair-breakdown-eFre.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-Flair-breakdown-eLen.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-Flair-breakdown-eLen.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-Flair-breakdown-oDen.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-Flair-breakdown-oDen.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-Flair-breakdown-sLen.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-Flair-breakdown-sLen.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-Flair-breakdown-tCon.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-Flair-breakdown-tCon.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-Flair-breakdown-tFre.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-Flair-breakdown-tFre.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-Flair-breakdown-tag.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-Flair-breakdown-tag.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-Flair-selfdiag.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-Flair-selfdiag.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-Flair_ELMo-aideddiag.pdf
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-Flair_ELMo-aideddiag.png
M	interpretEval/analysis/ner-fig/Flair-ELMo/wnut16-Flair_ELMo-heatmap.png
M	interpretEval/analysis/tEval-ner.html
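One way to act on this suggestion is to make the output location configurable and point it at an untracked directory listed in .gitignore. The sketch below is hypothetical: the --out-dir flag, the default output/ directory and the figure_path helper are assumptions for illustration, not existing options of run_task_ner.sh or genFig.py.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse command-line options for a hypothetical figure-generation script."""
    parser = argparse.ArgumentParser(description="Generate NER analysis figures.")
    # Hypothetical flag: output/ would be listed in .gitignore so regenerated
    # figures and HTML never show up as modified tracked files.
    parser.add_argument(
        "--out-dir",
        type=Path,
        default=Path("output"),
        help="untracked directory for generated figures and reports",
    )
    return parser.parse_args(argv)


def figure_path(out_dir: Path, model_pair: str, filename: str) -> Path:
    """Return where one figure would be written, e.g. output/ner-fig/Flair-ELMo/x.png.

    The caller is expected to create target.parent before saving the figure.
    """
    return out_dir / "ner-fig" / model_pair / filename
```

With all generated files under one root, a single output/ line in .gitignore would cover every figure, pickle and HTML report in the list above.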

Error when generating figures

Hi,

I'm currently running the analysis, and it seems to be mostly working, but it dies when generating figures.

Traceback (most recent call last):
  File "genFig.py", line 550, in <module>
    genHistTex(dict_breakdown_m2, path_template_breakdown, path_tex_base_breakdown, corpus_type, model_name2, data_mn_attr_range[corpus_type][model_name2], dict_data_bucketInfo[corpus_type])
  File "genFig.py", line 115, in genHistTex
    xticklabel = getTuple(eval(xticklabel_list[idx].split(":")[1]))

Here are scripts to completely reproduce the error, just check out the "analysis_scripts" branch of the repo and run run_all_analyses.sh: https://github.com/neubig/masakhane-ner/tree/analysis_scripts

Could you please help take a look?
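The traceback is cut off before the exception type, so the exact cause cannot be read from it. The failing line splits each x-tick label on ":" and passes the second piece to eval, which breaks if a label has no colon, contains more than one, or holds text that is not a Python literal. A defensive parser at least makes the offending label visible. This sketch assumes labels look like "0:(1, 3)"; parse_xticklabel is a hypothetical helper, not code from genFig.py, and it does not replicate what getTuple does.

```python
import ast


def parse_xticklabel(label: str) -> tuple:
    """Parse a tick label of the assumed form 'bucket:value', e.g. '0:(1, 3)'.

    Splits on the first colon only and uses ast.literal_eval instead of eval,
    so a malformed label raises a ValueError that names the offending text.
    """
    _bucket, sep, value = label.partition(":")
    if not sep:
        raise ValueError(f"tick label has no ':' separator: {label!r}")
    try:
        parsed = ast.literal_eval(value.strip())
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"cannot parse value of tick label {label!r}") from exc
    # Wrap scalars so callers always receive a tuple.
    return parsed if isinstance(parsed, tuple) else (parsed,)
```

If the real label format differs, the error message still shows the exact string that failed to parse, which narrows down where the bad bucket information comes from.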

Support for other NER datasets

Hi,

I really enjoyed reading the paper (great analysis and good attributes for NER tasks) and love the idea of a leaderboard that offers both detailed information and comparisons with other models ❤️

So I would like to ask whether you plan to add support for other NER datasets, e.g. CoNLL-2003 (German, both the revisited and the original dataset), CoNLL-2002 (Spanish and Dutch), or even GermEval 2014 (which is an awesome resource for German NER).

I would love to provide the outputs of the systems we've trained for the CoNLL datasets (see the paper here, with Alan from Flair) and for GermEval (see the paper here, with the people from Deepset).

Additionally, a corrected version of the English CoNLL dataset is coming soon (see here), and supporting it on ExplainaBoard would be really awesome (to compare the uncorrected and corrected datasets).

Many thanks for the great work!

Stefan

Bug: Different models generate same breakdown performance pics.

Describe the bug

The generated HTML shows exactly the same results for two different models (Flair and ELMo):
[Screenshot: WechatIMG60]
Something has clearly gone wrong, because Flair and ELMo show identical break-down performance.
The same bug affects the self-diagnosis figures:
[Screenshot: WechatIMG61]

Debug and Fix

At first I thought the problem might be in the main program, but the txt output was correct:

# break-down performance
Flair
eCon	0:0.8938461538461538 1:0.8558758314855874 2:0.9645621181262729 3:0.9733250620347393
tCon	0:0.8864388092613011 1:0.8766129032258065 2:0.9627551020408164 3:0.985190670122177
eFre	0:0.8963153384747216 1:0.9464209172738963 2:0.9558373414954089 3:0.960822722820764
tFre	0:0.9345070422535211 1:0.9395325203252033 2:0.9451523545706371 3:0.9270331083252974
eLen	0:0.9355970253963799 1:0.9315525876460768 2:0.8631578947368422 3:0.8507462686567164
sLen	0:0.9378569029224051 1:0.9269841269841269 2:0.9219178082191781 3:0.9319938176197835
eDen	0:0.9208025343189017 1:0.9346981997882103 2:0.9467226348078787 3:0.9323786793953858
oDen	0:0.9352896914973664 1:0.9170383586083855 2:0.89103690685413 3:0.9480401093892433
tag	0:0.9374064091045223 1:0.8423295454545454 2:0.9177877428998505 3:0.974058060531192

ELMo
eCon	0:0.8814928818776453 1:0.8501118568232663 2:0.9590263691683569 3:0.9699624530663328
tCon	0:0.8745598591549296 1:0.872125857200484 2:0.958141909137315 3:0.9838380085454208
eFre	0:0.8852177644282343 1:0.9394589952769429 2:0.9527145359019265 3:0.9527559055118111
tFre	0:0.9295774647887324 1:0.9336721728081323 2:0.9434903047091413 3:0.9157792836398838
eLen	0:0.9279887482419129 1:0.9234209055338177 2:0.8482328482328482 3:0.8467153284671532
sLen	0:0.931986531986532 1:0.9090265486725665 2:0.9142661179698217 3:0.9323017408123792
eDen	0:0.9160789844851905 1:0.9270538243626062 2:0.9272080232934325 3:0.9330677290836653
oDen	0:0.9269641734758014 1:0.907953529937444 2:0.8804920913884007 3:0.9451553930530164
tag	0:0.9340956966596449 1:0.8133903133903134 2:0.9083308450283668 3:0.9715170278637771

# self-diagnosis 
Flair
eCon	1:0.8558758314855874 3:0.9733250620347393 0.1174492305491519
tCon	1:0.8766129032258065 3:0.985190670122177 0.10857776689637044
eFre	0:0.8963153384747216 3:0.960822722820764 0.06450738434604242
tFre	3:0.9270331083252974 2:0.9451523545706371 0.01811924624533967
eLen	3:0.8507462686567164 0:0.9355970253963799 0.08485075673966347
sLen	2:0.9219178082191781 0:0.9378569029224051 0.015939094703226964
eDen	0:0.9208025343189017 2:0.9467226348078787 0.02592010048897697
oDen	2:0.89103690685413 3:0.9480401093892433 0.05700320253511337
tag	1:0.8423295454545454 3:0.974058060531192 0.13172851507664662

ELMo
eCon	1:0.8501118568232663 3:0.9699624530663328 0.11985059624306649
tCon	1:0.872125857200484 3:0.9838380085454208 0.11171215134493684
eFre	0:0.8852177644282343 3:0.9527559055118111 0.06753814108357681
tFre	3:0.9157792836398838 2:0.9434903047091413 0.02771102106925749
eLen	3:0.8467153284671532 0:0.9279887482419129 0.08127341977475966
sLen	1:0.9090265486725665 3:0.9323017408123792 0.02327519213981266
eDen	0:0.9160789844851905 3:0.9330677290836653 0.01698874459847477
oDen	2:0.8804920913884007 3:0.9451553930530164 0.06466330166461576
tag	1:0.8133903133903134 3:0.9715170278637771 0.15812671447346371
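The numbers above are internally consistent: each self-diagnosis line appears to list the bucket with the lowest F1, the bucket with the highest F1, and their difference, all taken from the matching break-down line (for Flair eCon: bucket 1 at 0.8559, bucket 3 at 0.9733, gap 0.1174). A minimal Python sketch of that reading; it is inferred from the output, not taken from the project's code.

```python
def parse_breakdown_line(line: str) -> tuple:
    """Split 'eCon<TAB>0:0.89 1:0.85 ...' into ('eCon', {0: 0.89, 1: 0.85, ...})."""
    attr, _, rest = line.partition("\t")
    scores = {}
    for item in rest.split():
        bucket, _, score = item.partition(":")
        scores[int(bucket)] = float(score)
    return attr, scores


def self_diagnosis(scores: dict) -> tuple:
    """Return (worst bucket, best bucket, best F1 minus worst F1)."""
    worst = min(scores, key=scores.get)
    best = max(scores, key=scores.get)
    return worst, best, scores[best] - scores[worst]


line = ("eCon\t0:0.8938461538461538 1:0.8558758314855874 "
        "2:0.9645621181262729 3:0.9733250620347393")
attr, scores = parse_breakdown_line(line)
worst, best, gap = self_diagnosis(scores)
print(f"{attr}\t{worst}:{scores[worst]} {best}:{scores[best]} {gap}")
```

Run on the Flair eCon line, this should reproduce the corresponding self-diagnosis entry up to floating-point rounding.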

So the math is correct. After some effort, I found the actual problem (genFig.py, lines 467-489):

    elif block.find("break-down performance") != -1:
        metaInfo_m1 = extValue(block, model_name1+":\n", "\n\n")
        metaInfo_m2 = extValue(block, model_name2+":\n", "\n\n")
        dict_breakdown_m1 = str2dict(metaInfo_m1)
        dict_breakdown_m2 = str2dict(metaInfo_m2)

    elif block.find("self-diagnosis") != -1:
        metaInfo_m1 = extValue(block, model_name1+":\n", "\n\n")
        metaInfo_m2 = extValue(block, model_name2+":\n", "\n\n")
        dict_self_diag_m1 = str2dict(metaInfo_m1)
        dict_self_diag_m2 = str2dict(metaInfo_m2)

    elif block.find("aided-diagnosis line-chart") != -1:
        metaInfo_m1_2 = extValue(block, model_name1+"_"+model_name2+":\n", "\n\n")
        dict_aided_diag_hist_m1_2 = str2dict(metaInfo_m1_2)

    elif block.find("aided-diagnosis heatmap") != -1:
        metaInfo_m1_2 = extValue(block, model_name1+"_"+model_name2+":\n", "\n\n")
        dict_aided_diag_heatmap_m1_2 = str2dict(metaInfo_m1_2)

extValue() is called with a header such as model_name1+":\n". However, if you look at block (the first parameter this method takes), you will find that each model name is followed only by '\n', not by ':\n'.
After changing the code block as follows (removing all the colons):

    # Model headers in block are written as "Flair\n", not "Flair:\n".
    elif block.find("break-down performance") != -1:
        metaInfo_m1 = extValue(block, model_name1+"\n", "\n\n")
        metaInfo_m2 = extValue(block, model_name2+"\n", "\n\n")
        dict_breakdown_m1 = str2dict(metaInfo_m1)
        dict_breakdown_m2 = str2dict(metaInfo_m2)

    elif block.find("self-diagnosis") != -1:
        metaInfo_m1 = extValue(block, model_name1+"\n", "\n\n")
        metaInfo_m2 = extValue(block, model_name2+"\n", "\n\n")
        dict_self_diag_m1 = str2dict(metaInfo_m1)
        dict_self_diag_m2 = str2dict(metaInfo_m2)

    elif block.find("aided-diagnosis line-chart") != -1:
        metaInfo_m1_2 = extValue(block, model_name1+"_"+model_name2+"\n", "\n\n")
        dict_aided_diag_hist_m1_2 = str2dict(metaInfo_m1_2)

    elif block.find("aided-diagnosis heatmap") != -1:
        metaInfo_m1_2 = extValue(block, model_name1+"_"+model_name2+"\n", "\n\n")
        dict_aided_diag_heatmap_m1_2 = str2dict(metaInfo_m1_2)

It will work properly and produce:
[Screenshot: WechatIMG62]
[Screenshot: WechatIMG63]

Because Flair and ELMo perform almost identically, the effect of the fix is hard to see here, but it becomes obvious with other models.
I will submit a pull request for this fix. It's not a big deal, but I am really happy to be a part of this gorgeous project!!
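A more defensive variant accepts the model-name header with or without a trailing colon and returns None when the header is missing, so a mismatch fails loudly instead of silently producing identical figures. This is a sketch under assumptions: extract_model_block is a hypothetical replacement, since the implementation of extValue is not shown here, and it assumes each model's block is non-empty and ends at the next blank line, as in the output above.

```python
import re
from typing import Optional


def extract_model_block(text: str, model_name: str) -> Optional[str]:
    """Return the lines after a model-name header, up to the next blank line.

    Accepts the header as either 'Flair' or 'Flair:' on its own line and
    returns None when no such header exists in text.
    """
    pattern = rf"^{re.escape(model_name)}:?\n(.*?)(?:\n\n|\Z)"
    match = re.search(pattern, text, flags=re.MULTILINE | re.DOTALL)
    return match.group(1).rstrip("\n") if match else None
```

Returning None rather than an empty or arbitrary substring means a caller can raise an error naming the missing model instead of plotting the wrong data.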

Proposal to make adding new datasets easier

Hi,

I have a proposal to make adding new datasets easier:

  1. For each dataset, consolidate the location of all the related data into something like:

    data/ner/conll03/conll03.conf
    data/ner/conll03/data
    data/ner/conll03/results
    data/ner/conll03/precomputed

so it's clear where all the data is.

  2. Show an example of going from just the "data" and "results" directories (i.e., data in easy-to-interpret formats such as CoNLL format) to a full report, including the precomputation.

What do you think? This would be very helpful and hopefully not too much work?
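The proposed layout is regular enough that tooling could derive every location from the task and dataset names alone. A hypothetical helper, assuming exactly the four entries listed in the proposal:

```python
from pathlib import Path


def dataset_paths(root: Path, task: str, dataset: str) -> dict:
    """Map the proposed data/<task>/<dataset>/ layout to concrete paths."""
    base = root / "data" / task / dataset
    return {
        "conf": base / f"{dataset}.conf",
        "data": base / "data",
        "results": base / "results",
        "precomputed": base / "precomputed",
    }


def missing_entries(paths: dict) -> list:
    """Name the expected entries that do not exist yet."""
    return [name for name, path in paths.items() if not path.exists()]
```

A report script could call missing_entries to tell a contributor which pieces are still absent, for example when only data and results have been added and the precomputed directory has not been generated yet.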

How to upload the model predictions?

Hi,
I could not figure out how to upload the file containing model predictions to ExplainaBoard.
I have a NER model for the CoNLL-2003 dataset and also a file containing its dev and test predictions.
Is it possible to analyze our model by submitting the test predictions to ExplainaBoard?
Thanks.
