Giter VIP home page Giter VIP logo

representations_of_syntax's Introduction

Representations_Of_Syntax

Code and Resources for the paper 'Representations of Syntax [MASK] Useful: Effects of Constituency and Dependency Structure in Recursive LSTMs' by Lepori, Linzen, and McCoy.

This repository is structured as follows:

  • Artificial Corpus Examples: Contains code to train and run models on artificial corpora. This takes very little time, compared to the natural language training. Thus, it may be useful as a toy example.
  • Corpus Processing: Contains files of helper functions to convert parse trees into representations that the Tree LSTMs can use.
  • Models: Contains the code implementing the models from Tai et al. (2015) https://arxiv.org/abs/1503.00075, a bidirectional LSTM, and the head-lexicalized tree LSTM.
  • Natural Language: Contains the Code to process the LGD dataset from Linzen et al (2016) https://arxiv.org/abs/1611.01368. Also contains the scripts ran on the Maryland Advanced Research Computing Center (MARCC) to obtain results on the natural language corpora. All of these scripts take a long time to run.
  • Natural and Artificial: Contains the code to test pretrained models on the artificial test set. To reproduce our results on the artificial test set, configure the main method of the script test_artificial.py, load in the correct pretrained model (either before or after augmentation), and run. Also contains the code to augment the pretrained models using a variety of PCFG-generated corpora. This also contains example scripts used to test the augmented models on natural language on the Maryland Advanced Research Computing Center (MARCC).
  • Pretrained Models: Contains all models used in the final paper.

Notes:

  • Throughout the code, the head-lexicalized model is referred to as the 'hybrid' model.
  • Because of an error in setting the random seed in the augment_models.py script, fine-tuning the pretrained natural language models is not guaranteed to produce the pretrained augmented models. However, the pretrained augmented models are the exact same as were used for obtaining all results presented in the paper, and this can be verified on our artificial corpus, using the test_artificial.py script.

representations_of_syntax's People

Contributors

mlepori1 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.