Simple Question Answering — EMNLP 2018

This is the code for the EMNLP 2018 paper "SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach".

On SimpleQuestions, one of the most commonly used benchmarks for studying single-relation factoid questions, we:

  1. Show that ambiguity in the data bounds performance on this benchmark at 83.4%; there are often multiple answers that cannot be disambiguated from the question alone.
  2. Introduce a baseline that sets a new state-of-the-art performance level at 78.1% accuracy, using only standard methods.

Example

Preview of the software

Structure

.
├── /notebooks/
│   ├── /Simple QA End-To-End/           # Experiments on components of the end-to-end QA pipeline
│   ├── /Simple QA Models/               # Experiments on various neural models
│   ├── /Simple QA KG to PostgreSQL DB/  # Scripts to populate PostgreSQL
│   └── /Simple QA Numbers/              # Scripts for computing and verifying various numbers
├── /pretrained_models/
├── /lib/                                # Various utility functionality
├── /tests/
├── .flake8
└── requirements.txt                     # Required Python packages

Prerequisites

This repository requires Python 3.5 or greater and PostgreSQL.

Installation

  • Clone the repository and cd into it
    git clone https://github.com/PetrochukM/Simple-QA-EMNLP-2018.git
    cd Simple-QA-EMNLP-2018
  • Install the required packages
    python -m pip install -r requirements.txt
  • Create and populate a PostgreSQL table named fb_two_subject_name with notebooks/Simple QA KG to PostgreSQL DB/fb_two_subject_name.csv.gz

  • Download the SimpleQuestions v2 dataset from Facebook Research. Use the notebook at Simple-QA-EMNLP-2018/notebooks/Simple QA KG to PostgreSQL DB/FB5M & FB2M KG to DB.ipynb to create and populate a PostgreSQL table.

  • You're done! Feel free to run Simple-QA-EMNLP-2018/notebooks/Simple QA End-To-End.
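The fb_two_subject_name step above does not state a schema, so the following is only a minimal sketch of one way to load a gzip-compressed CSV into PostgreSQL. It assumes the file starts with a header row and that typing every column as TEXT is acceptable; neither assumption is confirmed by this README. The function names (`read_header`, `build_statements`, `load_table`) are hypothetical, and `load_table` expects an already open psycopg2 connection. The notebooks in the repository remain the authoritative procedure.

```python
import csv
import gzip


def read_header(csv_gz_path):
    """Return the column names from the first row of a gzip-compressed CSV.

    Assumption: the file has a header row.
    """
    with gzip.open(csv_gz_path, mode="rt", encoding="utf-8", newline="") as handle:
        return next(csv.reader(handle))


def quote_identifier(name):
    """Double-quote a PostgreSQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_statements(table, columns):
    """Build CREATE TABLE and COPY statements.

    Assumption: every column is typed TEXT, since the real schema is not given.
    """
    column_defs = ", ".join(quote_identifier(c) + " TEXT" for c in columns)
    create = "CREATE TABLE IF NOT EXISTS {} ({});".format(
        quote_identifier(table), column_defs
    )
    copy = "COPY {} FROM STDIN WITH (FORMAT csv, HEADER true);".format(
        quote_identifier(table)
    )
    return create, copy


def load_table(connection, csv_gz_path, table="fb_two_subject_name"):
    """Create the table and stream the decompressed file into it.

    `connection` is assumed to be an open psycopg2 connection.
    """
    create, copy = build_statements(table, read_header(csv_gz_path))
    with connection.cursor() as cursor:
        cursor.execute(create)
        with gzip.open(csv_gz_path, mode="rt", encoding="utf-8", newline="") as handle:
            # copy_expert streams the file through COPY ... FROM STDIN.
            cursor.copy_expert(copy, handle)
    connection.commit()
```

A call might look like `load_table(psycopg2.connect(dbname="..."), "notebooks/Simple QA KG to PostgreSQL DB/fb_two_subject_name.csv.gz")`, with the connection parameters filled in for your own database. Streaming through COPY avoids decompressing the file to disk first.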

Slides

The slides used for our EMNLP talk.

Citation

@article{Petrochuk2018SimpleQuestionsNS,
  title={SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach},
  author={Michael Petrochuk and Luke S. Zettlemoyer},
  journal={CoRR},
  year={2018},
  volume={abs/1804.08798}
}

Important Notes

  • The FB2M and FB5M subsets of Freebase KG can complete 7,188,636 and 7,688,234 graph queries respectively; therefore, the FB5M subset is 6.9% larger than the FB2M subset. Also, the FB5M dataset only contains 3.98M entities. This contradicts the statement that "FB5M, is much larger with about 5M entities" (Bordes et al., 2015).
  • FB5M and FB2M contain 4,322,266 and 3,654,470 duplicate grouped facts respectively.
  • FB2M is not a subset of FB5M: one atomic fact is in FB2M but not in FB5M, namely (01g4wmh, music/album/acquire_webpage, 02q5zps).
  • FB5M and FB2M do not contain the answer for 24 and 36 examples in the SimpleQuestions dataset respectively; therefore, those examples are unanswerable.
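The 6.9% figure in the first note follows directly from the two graph-query counts. As a quick check, the relative increase can be recomputed as below; the constant and function names are illustrative only and do not come from the repository.

```python
# Graph-query counts quoted in the first note above.
FB2M_QUERIES = 7188636
FB5M_QUERIES = 7688234


def relative_increase(smaller, larger):
    """Return how much larger `larger` is than `smaller`, in percent."""
    return (larger - smaller) / smaller * 100.0


increase = relative_increase(FB2M_QUERIES, FB5M_QUERIES)
print("FB5M completes {:.2f}% more graph queries than FB2M".format(increase))
```

The difference is 499,598 queries, which is just under 6.95% of the FB2M count and rounds to the 6.9% stated above.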

Other Important Papers

Other Important GitHub Repositories
