This project forked from openai/finetune-transformer-lm

Home Page: https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf

License: Other

finetune-transformer-lm

Code and model for the paper "Improving Language Understanding by Generative Pre-Training"

Entailment

This modified version of the code supports the option "--dataset entailment", which trains models for entailment problems such as SNLI, following the description in the paper.

This model was used in the FEVER Fact Extraction and Verification challenge for the system with the best evidence F1 score, as described in the paper Team Papelo: Transformer Networks at FEVER.

Training input is expected in files named train.premise, train.hypothesis, and train.label, with one example per line. The premise and hypothesis are tokenized by spaCy. Each label should be 0, 1, or 2 (the ESIM convention: 0 for entailment, 1 for neutral, and 2 for contradiction). Similarly, development and test sets should be put in files named dev.premise, test.premise, and so on. All nine files are expected in data_dir.
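As a minimal sketch of the expected layout, the following writes a toy training set in this format (the example sentences and the LABELS helper are illustrative, not part of the code in this repository):

```python
import os

# ESIM label convention described above.
LABELS = {"entailment": 0, "neutral": 1, "contradiction": 2}

# Toy (premise, hypothesis, label) examples.
examples = [
    ("A man is playing a guitar.", "A man is making music.", "entailment"),
    ("A man is playing a guitar.", "A man is cooking dinner.", "contradiction"),
]

data_dir = "data"
os.makedirs(data_dir, exist_ok=True)

# One line per example in each of train.premise and train.hypothesis.
for field, idx in (("premise", 0), ("hypothesis", 1)):
    with open(os.path.join(data_dir, f"train.{field}"), "w") as f:
        for ex in examples:
            f.write(ex[idx] + "\n")

# Matching integer labels, one per line, in train.label.
with open(os.path.join(data_dir, "train.label"), "w") as f:
    for ex in examples:
        f.write(str(LABELS[ex[2]]) + "\n")
```

The dev.* and test.* files follow the same one-line-per-example format.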

Train with a command like:

python train.py --dataset entailment --desc entailment --submit --analysis --data_dir /path/to/data --n_gpu 3 --submission_dir output/submission --save_dir output/save --log_dir output/log

I've also added a prediction script, which lets you obtain model predictions separately from training. If the test files are data/test.premise, etc., the command looks like:

python predict.py --desc entailment --dataset entailment --model_file output/save/entailment/best_params.jl --test_prefix data/test --n_ctx 348 --result_file result.tsv

To run the prediction, you must supply the amount of context the model was trained with as the value of the --n_ctx option. Generally, this depends on the lengths of the examples in your training set. If you don't remember this value from training time, you can compute it from the saved model: take the number of entries in the embedding matrix and subtract the size of the vocabulary (40478, from encoder_bpe_40000.json) and the number of special tokens (3); the remaining embeddings encode each of the positions up to n_ctx. If you choose the wrong value, the resulting error message reports the number of entries in the embedding matrix.
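That arithmetic can be sketched as follows; the constants come from the text above, while the row count 40829 is only a hypothetical example of what an error message might report:

```python
VOCAB_SIZE = 40478  # vocabulary size, from encoder_bpe_40000.json
N_SPECIAL = 3       # number of special tokens added by the code

def recover_n_ctx(n_embed_rows: int) -> int:
    """Recover the --n_ctx a model was trained with from the total
    number of rows in its embedding matrix: the rows beyond the
    vocabulary and special tokens encode positions up to n_ctx."""
    return n_embed_rows - VOCAB_SIZE - N_SPECIAL

# e.g. an embedding matrix with 40829 rows implies --n_ctx 348
print(recover_n_ctx(40829))
```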

-- Christopher Malon (cdmalon)

ROCStories

Currently this code implements the ROCStories Cloze Test result reported in the paper by running:

python train.py --dataset rocstories --desc rocstories --submit --analysis --data_dir [path to data here]

Note: The code is currently non-deterministic due to various GPU ops. The median accuracy of 10 runs with this codebase (using default hyperparameters) is 85.8%, slightly lower than the single run of 86.5% reported in the paper.

The ROCStories dataset can be downloaded from the associated website.
