
RNN subject-verb agreement

Code for the paper "Assessing the ability of LSTMs to learn syntax-sensitive dependencies" (TACL 2016). Dependencies:

  • numpy
  • keras
  • theano (the TensorFlow backend will likely work as well)
  • pandas
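Before running anything, it can help to confirm that the packages above are importable. The sketch below is an illustration and is not part of this repository; `missing_modules` is a hypothetical helper name. It only checks that each top-level module is installed, not that its version is compatible. Keras chooses its backend from the `KERAS_BACKEND` environment variable (or `~/.keras/keras.json`), so setting that variable before importing keras is one way to try the TensorFlow backend instead of theano.

```python
import importlib.util

# Packages listed above. TensorFlow is left out because it is only needed
# for the Google LM evaluation or as an alternative Keras backend.
REQUIRED = ("numpy", "keras", "theano", "pandas")


def missing_modules(names=REQUIRED):
    """Return the names in `names` that cannot be found on the import path.

    Hypothetical helper for illustration: it checks installation only,
    not versions.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All dependencies found.")
```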

For the Google LM evaluation, you will also need to install TensorFlow and download the trained model.

If you're just looking for the subject-verb dependency data in a simple format and are not planning to run the code in this repository, download our simple dependency dataset.

Quick start

Follow this section if you'd like to run the code on the same set of dependencies we used in the paper.

All functions should accept the relevant filenames as arguments, but the easiest approach is usually to set the environment variable RNN_AGREEMENT_ROOT to the directory where you cloned this repository.
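As an illustration of how a root directory variable like this is typically used, here is a minimal sketch that resolves data files relative to RNN_AGREEMENT_ROOT. The helper `data_path` is hypothetical, and the real `rnnagr.filenames` module may build its paths differently. The only assumption taken from this README is that data lives in a `data` subdirectory of the clone.

```python
import os
from pathlib import Path


def data_path(filename, root=None):
    """Return <root>/data/<filename>; root defaults to RNN_AGREEMENT_ROOT.

    Hypothetical helper for illustration only; rnnagr.filenames may
    construct its paths differently.
    """
    if root is None:
        root = os.environ.get("RNN_AGREEMENT_ROOT")
    if not root:
        raise RuntimeError("Set RNN_AGREEMENT_ROOT to your clone of the repository")
    return Path(root) / "data" / filename
```

In a shell, the variable can be set with `export RNN_AGREEMENT_ROOT=/path/to/rnn_agreement` (placeholder path). Set it before importing `rnnagr`, in case paths are resolved at import time.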

After cloning the repository, download the dependency dataset into the data subdirectory and unzip the file.
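If you prefer to do the unzipping from Python, the standard `zipfile` module is enough. This is a sketch, not a script shipped with the repository. `unpack_dataset` is a hypothetical name, and the archive path is whatever file you downloaded; nothing in the code depends on its name.

```python
import zipfile
from pathlib import Path


def unpack_dataset(zip_path, repo_root):
    """Extract a downloaded dataset archive into <repo_root>/data.

    Returns the paths of the extracted files (directory entries skipped).
    """
    data_dir = Path(repo_root) / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(data_dir)
        names = archive.namelist()
    return [data_dir / name for name in names if not name.endswith("/")]
```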

from rnnagr.agreement_acceptor import PredictVerbNumber
from rnnagr import filenames

pvn = PredictVerbNumber(filenames.deps, prop_train=0.1)
pvn.pipeline()

After this code runs, pvn.test_results will be a pandas DataFrame containing the test-set results.
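Because the results are an ordinary DataFrame, standard pandas operations apply. The paper breaks down performance by the number of attractors (intervening nouns whose number differs from the subject's), and a `groupby` gives that kind of breakdown. This sketch assumes the frame has a boolean correctness column and an attractor-count column. The names `correct` and `n_diff_intervening` are guesses made for illustration, so check `pvn.test_results.columns` for the real ones. The toy rows are made up and are not results from the paper.

```python
import pandas as pd


def accuracy_by_attractors(results, attractor_col="n_diff_intervening",
                           correct_col="correct"):
    """Mean accuracy for each number of attractors.

    Both column names are assumptions for illustration; adjust them to
    match the columns in pvn.test_results.
    """
    return results.groupby(attractor_col)[correct_col].mean()


# Toy stand-in for pvn.test_results (invented rows, not real results).
toy = pd.DataFrame({
    "n_diff_intervening": [0, 0, 1, 1, 2],
    "correct": [True, True, True, False, False],
})
print(accuracy_by_attractors(toy))
```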

The file experiments.py contains the code used to run the experiments reported in the paper; see that file for examples of tasks and training regimes other than PredictVerbNumber.

More

If you'd like to regenerate the set of dependencies from the corpus (and perhaps modify our criteria), download the subset of the parsed Wikipedia corpus we used (1.7 GB).

from rnnagr.collect_agreement import CollectAgreement
from rnnagr.utils import deps_to_tsv
from rnnagr import filenames

ca = CollectAgreement(filenames.parsed_wiki_subset_50, modes=('infreq_pos',),
                      skip=0, most_common=10000)
ca.collect_agreement()
# Write the collected dependencies to a tab-separated file.
deps_to_tsv(ca.deps, 'agr_mostcommon_10K.tsv')
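To sanity-check the output file, it can be read back with the standard library. This sketch assumes the file starts with a header row. The columns depend on what `CollectAgreement` records, so none are hard-coded here, and `load_deps` is a hypothetical helper name.

```python
import csv


def load_deps(path):
    """Read a tab-separated dependency file into a list of row dicts.

    Assumes the first line is a header. QUOTE_NONE reads quote characters
    literally, so a sentence that begins with a quotation mark is not
    parsed as a quoted field.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE))
```

With pandas, `pd.read_csv(path, sep="\t", quoting=csv.QUOTE_NONE)` is the equivalent DataFrame-based call.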

Citation

If you use our data or code for academic work, please cite:

@article{linzen2016assessing,
    Author = {Linzen, Tal and Dupoux, Emmanuel and Goldberg, Yoav},
    Journal = {Transactions of the Association for Computational Linguistics},
    Title = {Assessing the ability of {LSTMs} to learn syntax-sensitive dependencies},
    Volume = {4},
    Pages = {521--535},
    Year = {2016}
}
