Comments (10)
Run with the following commands (GPU optional). This gets about 88% accuracy, which is within 1% of the score reported in [Riedel et al 2017].
On our dataset, we're sitting at about 60% accuracy for 2-way classification 👎
GPU=1 PYTHONPATH=src:lib/DrQA python src/scripts/rte/fnc_riedel.py
(on FNC data)
GPU=1 PYTHONPATH=src:lib/DrQA python src/scripts/rte/fever_riedel.py
(on our data)
This is without early stopping or a clever learning rate schedule.
I also want to evaluate different vocab/NN sizes. Might try a grid search on Sharc over the weekend.
from naacl2018-fever.
Interesting! The random baseline would be 50%, right? (Same number of supported/refuted.)
I would say let's first complete the full task evaluation (and make sure we are happy with the metrics) and then we optimize the various components according to what the metrics tell us.
from naacl2018-fever.
Training with randomly sampled pages for the not enough info class:
Accuracy on the gold-label dev set is approx. 70%.
Accuracy on predicted pages (from DrQA) is currently approx. 56%.
A random baseline would be 33%.
Will try incorporating DrQA predictions into the training set too.
from naacl2018-fever.
Training solely on pages retrieved from DrQA for the Not Enough Info class gives a dev accuracy of 0.37.
Training on FNC (merging discuss and unrelated into not enough info) and testing on pages predicted with DrQA gives an accuracy of 0.35.
from naacl2018-fever.
Got 2 families of experiments going for this MLP model, differing in how training data is generated for the Not Enough Info (NEI) class.
Method 1: NN - use the closest page from DrQA. Method 2: RS - randomly sample a page.
Oracle RS - no DrQA for the test-time predictions. Just using the annotator labeled pages. Accuracy 71%. Confusion matrix/classification report: https://pastebin.com/x43y9vR0
Oracle NN - using DrQA just to identify the nearest neighbour pages for NEI claims. Accuracy 53%: https://pastebin.com/1MZxpaBg
DrQA selecting k pages for all claims
Confusion matrix for k=1 RS model https://pastebin.com/UQtTHYvK
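To make the difference between the two NEI sampling methods concrete, here is a minimal sketch. It is not the code from this repository: `sample_nei_pages`, `toy_retrieve`, and `PAGES` are hypothetical names, and `toy_retrieve` is a word-overlap stand-in for DrQA used only so the example runs on its own.

```python
import random

# Hypothetical toy page collection standing in for the Wikipedia dump.
PAGES = [
    "Paris is the capital of France",
    "The Moon orbits the Earth",
    "Python is a programming language",
]


def toy_retrieve(claim, k=1):
    """Stand-in for DrQA (assumption): rank pages by word overlap with the claim."""
    claim_words = set(claim.lower().split())
    ranked = sorted(
        PAGES,
        key=lambda page: len(claim_words & set(page.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def sample_nei_pages(claims, all_pages, retrieve, strategy="RS", seed=0):
    """Pair each Not Enough Info claim with one page to use as training evidence.

    strategy="NN": take the top-ranked page from the retriever (nearest neighbour).
    strategy="RS": draw a page uniformly at random from the whole collection.
    """
    rng = random.Random(seed)
    pairs = []
    for claim in claims:
        if strategy == "NN":
            page = retrieve(claim, k=1)[0]
        elif strategy == "RS":
            page = rng.choice(all_pages)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        pairs.append((claim, page, "NOT ENOUGH INFO"))
    return pairs


nei_claims = ["The Moon orbits Mars"]
nn_pairs = sample_nei_pages(nei_claims, PAGES, toy_retrieve, strategy="NN")
rs_pairs = sample_nei_pages(nei_claims, PAGES, toy_retrieve, strategy="RS")
```

With NN the NEI page is topically close to the claim, while with RS it is usually unrelated, which is the contrast being tested in the two experiment families.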
from naacl2018-fever.
PS: that has early stopping with patience=8.
I think the RS model is doing well because the cosine similarity between TF-IDF vectors (one of the features) is going to be very low for unrelated documents. This might work quite nicely as a document relatedness filter.
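A rough sketch of the TF-IDF cosine idea, in plain Python so it runs without dependencies. The tokenisation (lowercase, whitespace split), the smoothed IDF formula, and the function names are all assumptions made for illustration; the actual feature extraction in the model may differ.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) for a list of documents."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Raw term count times a smoothed inverse document frequency.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


claim = "the moon orbits the earth"
related_page = "the moon is a satellite that orbits the earth"
unrelated_page = "python is a programming language"

vecs = tfidf_vectors([claim, related_page, unrelated_page])
sim_related = cosine(vecs[0], vecs[1])
sim_unrelated = cosine(vecs[0], vecs[2])
```

A claim and a randomly sampled page often share no content terms at all, so the similarity drops to zero, and a simple threshold on this value could serve as the relatedness filter.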
from naacl2018-fever.
Hey, yes, makes sense. Random is often hard to beat, and for a good reason (unless we know what mistakes we will make). Is it correct to say that the oracle RS is a bit more oracle than the oracle NN, as the first uses the pages labeled by the annotators while the second one uses DrQA?
from naacl2018-fever.
Both use labeled pages for the supports/refutes classes. It's just the NEI class where we have no labeled pages. I think that because the nearest neighbour pages are more semantically similar to the claim than randomly sampled ones, the classifier needs to be more sensitive, which we cannot achieve with this MLP.
from naacl2018-fever.
Got it. The reason could be what you say. Maybe see if the NN-chosen pages help when added to the RS-chosen ones.
from naacl2018-fever.