Comments (12)
Sorry for the long delay in responding. The format you are using is pred_and_softmax, where the first column is the prediction and the remaining columns hold the prediction probabilities. You'll note that your predictions do not match your probabilities: the first ten lines in your file have prediction=2 (positive), even though the probability of negative is 99%.
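The mismatch described above can be checked mechanically before running a suite. The following is a minimal sketch, not part of checklist: `find_mismatches` is a hypothetical helper, and it assumes whitespace-separated columns with labels indexing the probability columns from 0, as the lines quoted in this thread suggest. The second sample row is a made-up corrected row for contrast.

```python
def find_mismatches(lines):
    """Return (line_number, pred, argmax) for rows whose label disagrees
    with the largest probability on the same row."""
    mismatches = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        pred = int(fields[0])
        probs = [float(value) for value in fields[1:]]
        # Index of the largest probability (first one wins on ties).
        best = max(range(len(probs)), key=probs.__getitem__)
        if pred != best:
            mismatches.append((number, pred, best))
    return mismatches


sample = [
    "2 0.99984 0.000000 0.00016",  # row as posted in this thread
    "0 0.99190 0.000000 0.00810",  # hypothetical corrected row
]
print(find_mismatches(sample))  # [(1, 2, 0)]
```

An empty result means every label agrees with its probabilities.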
from checklist.
Hi! Could you please give me a more complete example test, so I can take a closer look? e.g. what do you see if you print test.data and test.conf?
from checklist.
Hi, I'm guessing from your username that you are Chinese, so I'm posting one example for you to take a look at.
Example: I remove the punctuation of the sentence for an INV test. Because it's an NER model, test.conf is None.
In the Jupyter HTML view, it is shown like this:
报道 说 , 印度 目前 外汇 储备 为 255.2亿→ 2552亿 美元 。 Pred: ['印度' '2552亿美元']→['印度' '255.2亿美元']
print(test.data):
[['报道说,印度目前外汇储备为255.2亿美元。', '报道说印度目前外汇储备为2552亿美元']]
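For readers who want to reproduce this kind of perturbation, here is a minimal sketch of a punctuation-stripping function. It is an assumption about how the perturbation could be written, not the poster's actual code: `strip_punctuation` is a hypothetical name, and it drops every character whose Unicode category starts with "P", which covers both ASCII and full-width CJK punctuation.

```python
import unicodedata


def strip_punctuation(text):
    """Drop every character whose Unicode category is punctuation (P*)."""
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )


original = "报道说,印度目前外汇储备为255.2亿美元。"
print(strip_punctuation(original))  # 报道说印度目前外汇储备为2552亿美元
```

Note that the decimal point counts as punctuation too, so 255.2 becomes 2552, as in the pair printed above. Whether that still qualifies as an invariance-preserving change is worth checking for number-bearing entities.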
Another issue is that if the sentence and pred of a test case are too long, Jupyter does not show the complete pred in the test case's HTML box. I think this is because the pred box cannot wrap lines automatically, unlike the sentence. So I need to change the Jupyter width to 200% to show the whole content.
from checklist.
Thanks for catching both bugs!
I've fixed the display in 8a0f05c, and now the display should also automatically wrap lines for the prediction (in 7997666).
Both should work if you reinstall from the repo; we will bump the pip release later.
Please feel free to re-open the issue if you still run into problems!
(And yes, I'm Chinese :P)
from checklist.
Thank you for doing that. Can I just copy some folders, like checklist/viewer and checklist/visual_interface, and then reinstall the project? I ask because I have modified a lot of files in the project.
By the way, I have seen your photo on a WeChat official account (公众号), haha.
from checklist.
Yeah that works, just copy-n-replace /checklist/viewer/static/, and then install locally again: pip install -e .
:)
On a side note, in case checklist has future updates, you probably want to consider forking the repo, so later you can just fetch our updates!
(Well, at least that photo is with cute doggy lol)
from checklist.
Hi, I did the copy-n-replace of /checklist/viewer/static/ just like you said, and then ran pip install -e . in the project. But I get the output below, and I can't tell whether it reinstalled successfully:
Installing collected packages: checklist
Attempting uninstall: checklist
Found existing installation: checklist 0.0.4
Can't uninstall 'checklist'. No files were found to uninstall.
Running setup.py develop for checklist
Successfully installed checklist
from checklist.
I couldn't really guess what's going on, but hopefully this Stack Overflow thread can help?
Another hack to try, essentially forcing pip to upgrade the package:
- bump the version in setup.py (here)
- pip install --upgrade -e .
from checklist.
Thanks for the reply. I did nothing else, but when I tested in Jupyter I found that the bug is gone!
from checklist.
Hi :)
I have a similar problem, but related to the MFT test type. In the visual summary all is well, but in the textual summary the "Example fails" section reports that phrases with negative sentiment have confidence 1.0 for the negative label, while in the visual table the same example is reported as misclassified as 2, i.e. positive (Expect: 0, Pred: 2).
The strange thing is that I only get this error when I test a different model than the released ones (amazon, google, etc). Are there particular format-rules to follow besides saving the tests_n500 predictions in a txt file having the format "pred - prob for 0 - prob for 1 - prob for 2"?
Sentiment-laden words in context
Test cases: 8658
Test cases run: 500
Fails (rate): 208 (41.6%)
Example fails:
1.0 0.0 0.0 This was a creepy aircraft.
1.0 0.0 0.0 That cabin crew is creepy.
1.0 0.0 0.0 This food is lame.
from checklist.
Hm, this is odd. Can you provide us with a small example?
The default prediction file format is indeed the prediction followed by the softmax (no matter how many labels there are)
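As an illustration of that layout, the sketch below builds one line of such a file from a list of softmax probabilities, deriving the label from the probabilities so the two cannot disagree. `format_prediction_line` is a hypothetical helper, not a checklist function. The single-space separator and six decimals are assumptions based on the sample lines quoted in this thread.

```python
def format_prediction_line(probs, decimals=6):
    """Build 'pred p0 p1 ... pN', where pred is the argmax of probs."""
    # Derive the label from the probabilities instead of storing it separately.
    pred = max(range(len(probs)), key=probs.__getitem__)
    columns = [str(pred)] + [f"{p:.{decimals}f}" for p in probs]
    return " ".join(columns)


print(format_prediction_line([0.99984, 0.0, 0.00016]))
# 0 0.999840 0.000000 0.000160
```

Writing one such line per test example, in the same order as the exported examples, gives a file in the prediction-then-softmax layout described above.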
from checklist.
Yes, of course and thank you
These are the first 10 lines of the txt file in the predictions folder. The only thing that differs from the roberta/amazon/* files is that the probabilities for Negative and Positive have 5 decimals instead of 6, but that should not be a real difference (right?)
2 0.99984 0.000000 0.00016
2 0.99190 0.000000 0.00810
2 0.99999 0.000000 0.00001
2 0.98929 0.000000 0.01071
2 0.99970 0.000000 0.00030
2 0.99999 0.000000 0.00001
2 0.99999 0.000000 0.00001
2 0.99848 0.000000 0.00152
2 0.99999 0.000000 0.00001
2 0.99998 0.000000 0.00002
I report the first lines of the textual summary too: the problem persists only when negative labels are involved
Vocabulary
single positive words
Test cases: 34
Fails (rate): 0 (0.0%)
single negative words
Test cases: 35
Fails (rate): 32 (91.4%)
Example fails:
1.0 0.0 0.0 regretted
1.0 0.0 0.0 dislike
0.4 0.0 0.6 horrible
single neutral words
Test cases: 13
Fails (rate): 13 (100.0%)
Example fails:
1.0 0.0 0.0 international
1.0 0.0 0.0 Israeli
1.0 0.0 0.0 see
Sentiment-laden words in context
Test cases: 8658
Test cases run: 500
Fails (rate): 208 (41.6%)
Example fails:
1.0 0.0 0.0 This airline is dreadful.
1.0 0.0 0.0 That service is terrible.
1.0 0.0 0.0 The cabin crew is terrible.
neutral words in context
Test cases: 1716
Test cases run: 500
Fails (rate): 500 (100.0%)
Example fails:
1.0 0.0 0.0 This is an American crew.
1.0 0.0 0.0 That customer service was Indian.
1.0 0.0 0.0 This was an Indian aircraft.
from checklist.