
Fact Extraction and VERification

This is an implementation of the FEVER shared task. The goal is to build a system that, given a claim and a large evidence corpus, determines whether the claim is supported, refuted, or undecidable for lack of information. As the evidence corpus we use the pre-processed Wikipedia pages (June 2017 dump) provided by the FEVER task, together with its large training dataset of 185,445 claims generated by altering sentences extracted from Wikipedia. Each claim is labeled Supported, Refuted, or NotEnoughInfo, along with the evidence needed to justify the judgment. We apply TF-IDF and PMI reweighting to the term-document count matrix to retrieve the most relevant documents and sentences. A simple linear classifier with word-overlap and word cross-product feature functions reaches 37% and 42% accuracy, respectively, comparable to the results of the baseline approach in the original FEVER paper.

This work can be seen as a simplified implementation and (small) expansion of the pipeline baseline described in the paper: FEVER: A large-scale dataset for Fact Extraction and VERification.

This is a final project for CS224U Natural Language Understanding, Spring 2018 at Stanford University.

Installation

Clone the repository

git clone https://github.com/jongminyoon/fever.git
cd fever

Install requirements (run export LANG=C.UTF-8 if installation of DrQA fails)

pip install -r requirements.txt

Download the FEVER dataset from the website into the data directory

mkdir data
mkdir data/fever-data

# We use the data used in the baseline paper
wget -O data/fever-data/train.jsonl https://s3-eu-west-1.amazonaws.com/fever.public/train.jsonl
wget -O data/fever-data/dev.jsonl https://s3-eu-west-1.amazonaws.com/fever.public/paper_dev.jsonl
wget -O data/fever-data/test.jsonl https://s3-eu-west-1.amazonaws.com/fever.public/paper_test.jsonl
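
Each of these files is in JSON Lines format: one claim per line, with its label and annotated evidence. A minimal sketch for loading a split (field names follow the FEVER data release):

import json

def load_jsonl(path):
    """Load one FEVER JSON Lines file into a list of dicts."""
    with open(path, encoding='utf-8') as f:
        return [json.loads(line) for line in f]

train = load_jsonl('data/fever-data/train.jsonl')
# Each record holds a claim, its label, and its annotated evidence, e.g.
# {'id': ..., 'label': 'SUPPORTS', 'claim': '...', 'evidence': [...]}
print(train[0]['claim'], train[0]['label'])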

Data Preparation

The data preparation consists of four steps: downloading the pre-processed articles from Wikipedia, storing them in an SQLite database, building term-document count matrices, and reweighting those matrices for evidence retrieval.

1. Download Wikipedia data

Download the pre-processed Wikipedia articles from the website and unzip the archive into the data folder.

wget https://s3-eu-west-1.amazonaws.com/fever.public/wiki-pages.zip
unzip wiki-pages.zip -d data
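
Each shard in wiki-pages is also JSON Lines: one page per line, with the page id, its flattened text, and a lines field holding the individually numbered sentences. A minimal sketch for iterating over a shard (field layout per the FEVER wiki-pages release):

import json

def iter_pages(shard_path):
    """Yield (page_id, sentences) for every page in one wiki-pages shard."""
    with open(shard_path, encoding='utf-8') as f:
        for line in f:
            page = json.loads(line)
            # 'lines' stores entries of the form "index<TAB>sentence<TAB>links..."
            parts = [entry.split('\t') for entry in page['lines'].split('\n')]
            sentences = [p[1] for p in parts if len(p) > 1]
            yield page['id'], sentences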

2. Construct SQLite Database

Construct an SQLite database. An ordinary personal laptop struggles to handle the entire corpus as a single database file, so in addition to the single-file version we also split the Wikipedia database across five files.

python build_db.py data/wiki-pages data/single --num-files 1
python build_db.py data/wiki-pages data/fever --num-files 5
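
The idea, following the DrQA recipe this project builds on, is to store pages in SQLite so they can be fetched by id. A minimal sketch of the single-file case (the documents(id, text) table layout is an assumption here, not necessarily the script's exact schema):

import json
import sqlite3
from pathlib import Path

def build_db(wiki_dir, db_path):
    """Write every page from the wiki-pages shards into one SQLite file."""
    conn = sqlite3.connect(db_path)
    conn.execute('CREATE TABLE IF NOT EXISTS documents (id TEXT PRIMARY KEY, text TEXT)')
    for shard in sorted(Path(wiki_dir).glob('*.jsonl')):
        with open(shard, encoding='utf-8') as f:
            rows = ((page['id'], page['text']) for page in map(json.loads, f))
            conn.executemany('INSERT OR IGNORE INTO documents VALUES (?, ?)', rows)
    conn.commit()
    conn.close()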

3. Create Term-Document count matrices and merge

Create a term-document count matrix for each split, and then merge the count matrices.

python build_count_matrix.py data/fever data/index
python merge_count_matrix.py data/index data/index
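
The matrices use the feature-hashing trick: each unigram is mapped to one of 2^24 buckets (the hash=16777216 in the output filename), so no explicit vocabulary has to be materialized, and merging is then presumably a column-wise concatenation of the per-split matrices. A rough sketch of the construction (tokenization is simplified):

import scipy.sparse as sp
from collections import Counter

NUM_BUCKETS = 2 ** 24  # matches hash=16777216 in the output filename

def count_matrix(docs):
    """Build a hashed unigram count matrix of shape (buckets, num_docs)."""
    rows, cols, vals = [], [], []
    for j, text in enumerate(docs):
        # NOTE: Python's built-in hash is salted per process; the real
        # pipeline would use a stable hash such as murmurhash.
        counts = Counter(hash(tok) % NUM_BUCKETS for tok in text.lower().split())
        for bucket, c in counts.items():
            rows.append(bucket)
            cols.append(j)
            vals.append(c)
    return sp.csr_matrix((vals, (rows, cols)), shape=(NUM_BUCKETS, len(docs)))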

4. Reweight the count matrix

Two reweighting schemes are tried: TF-IDF and PMI.

python reweight_count_matrix.py data/index/count-ngram\=1-hash\=16777216.npz data/index --model tfidf
python reweight_count_matrix.py data/index/count-ngram\=1-hash\=16777216.npz data/index --model pmi
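
Both schemes rescale the raw counts: TF-IDF discounts terms that appear in many documents, while PMI measures how much more often a term and a document co-occur than independence would predict. A sketch of the two reweightings under the common definitions (the exact formula variants in reweight_count_matrix.py may differ):

import numpy as np
import scipy.sparse as sp

def tfidf(counts):
    """Log-scaled term frequency times smoothed inverse document frequency."""
    n_docs = counts.shape[1]
    df = np.asarray((counts > 0).sum(axis=1)).ravel()  # docs containing each term
    idf = np.log((n_docs + 1) / (df + 1))              # smoothed to avoid div by zero
    tf = counts.astype(float)
    tf.data = 1.0 + np.log(tf.data)                    # damp heavy within-doc repetition
    return sp.diags(idf) @ tf

def pmi(counts):
    """Pointwise mutual information: log p(t, d) / (p(t) p(d))."""
    total = counts.sum()
    p_t = np.asarray(counts.sum(axis=1)).ravel() / total
    p_d = np.asarray(counts.sum(axis=0)).ravel() / total
    out = counts.tocoo().astype(float)
    out.data = np.log((out.data / total) / (p_t[out.row] * p_d[out.col]))
    return out.tocsr()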

Results

The remaining tasks of the FEVER challenge, i.e. document retrieval, sentence selection, sampling for NotEnoughInfo, and RTE training, are carried out in the IPython notebook fever.ipynb, with the supporting implementation in fever.py. The class Oracle reads either the TF-IDF or the PMI matrix and provides methods for finding the documents, sentences, etc. most relevant to an input claim.
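
Retrieval itself reduces to a sparse matrix-vector product: hash the claim's terms the same way as the corpus, then rank documents by their scores against the reweighted columns. A hypothetical sketch of the core of such an oracle (the method and attribute names are illustrative, not the exact fever.py interface):

import numpy as np

class Oracle:
    """Hypothetical core of a retrieval oracle over a reweighted matrix."""

    def __init__(self, matrix, doc_ids, vectorize):
        self.matrix = matrix        # reweighted term-document matrix (terms x docs)
        self.doc_ids = doc_ids      # column index -> Wikipedia page id
        self.vectorize = vectorize  # claim text -> 1 x terms sparse vector

    def closest_docs(self, claim, k=5):
        """Return the k page ids whose columns score highest for the claim."""
        scores = (self.vectorize(claim) @ self.matrix).toarray().ravel()
        top = np.argsort(-scores)[:k]
        return [self.doc_ids[i] for i in top]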

1. Document Retrieval

The oracle accuracies for document retrieval, for varying numbers of retrieved documents, are:

Num Docs   TF-IDF (%)   PMI (%)
 1         23.2         23.2
 3         45.5         45.5
 5         56.9         56.9
10         69.0         69.0

2. Sentence Selection

The corresponding oracle accuracies for sentence selection are:

Num Docs   Accuracy (%)
 1         51.2
 3         67.0
 5         72.7
10         81.8
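
Sentence selection reuses the same machinery at a finer granularity: once candidate pages are retrieved, each numbered sentence is scored against the claim and the best ones kept. A rough sketch, assuming the same vectorizer used for document retrieval (the function name is hypothetical):

def top_sentences(claim, sentences, vectorize, k=5):
    """Rank candidate sentences by similarity to the claim; keep the top k."""
    claim_vec = vectorize(claim)  # 1 x terms sparse row vector
    scores = [(claim_vec @ vectorize(s).T)[0, 0] for s in sentences]
    order = sorted(range(len(sentences)), key=lambda i: -scores[i])
    return [sentences[i] for i in order[:k]]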

3. RTE Training

We used a logistic classifier with grid-search cross-validation to select the best hyperparameters; the two feature functions are sketched below. Further details can be found in fever_paper.pdf in the reports folder.
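
Both feature functions are simple to state: the overlap feature fires once for each word shared by the claim and the evidence, while the cross-product feature fires on every (claim word, evidence word) pair. A hedged sketch with scikit-learn; the hyperparameter grid here is illustrative, and the actual grid and preprocessing live in fever.ipynb:

from collections import Counter
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline

def word_overlap_features(claim, evidence):
    """Indicator feature for each word shared by claim and evidence."""
    return Counter(set(claim.split()) & set(evidence.split()))

def word_cross_product_features(claim, evidence):
    """One feature per (claim word, evidence word) pair."""
    return Counter(f'{c}~{e}' for c in claim.split() for e in evidence.split())

def train(pairs, labels, featurize=word_overlap_features):
    """pairs: list of (claim, evidence) strings; labels: the three classes."""
    feats = [featurize(c, e) for c, e in pairs]
    model = make_pipeline(
        DictVectorizer(),
        GridSearchCV(LogisticRegression(max_iter=1000), {'C': [0.1, 1.0, 10.0]}),
    )
    return model.fit(feats, labels)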

1) Word-overlapping feature

             Precision   Recall   F1 score
Supported    0.337       0.798    0.455
Refuted      0.426       0.012    0.023
NEI          0.362       0.326    0.343
avg / total  0.374       0.346    0.274

2) Word cross-product feature

             Precision   Recall   F1 score
Supported    0.378       0.410    0.394
Refuted      0.535       0.219    0.311
NEI          0.339       0.527    0.420
avg / total  0.421       0.385    0.375
