Giter VIP home page Giter VIP logo

snli_nn_tf's Introduction

snli hacking (in tensorflow)

note: port of snli hacking in theano

hacking with the Stanford Natural Language Inference corpus. problem is to decide if two sentences are neutral, contradict each other or the first entails the second.

model dev accuracy
log_reg_baseline.py 0.667
nn bidir gru -> concat -> mlp 0.780

baseline

simple logistic regression using token features, tokens from sentence 1 prepended with "s1_", tokens from sentence 2 prepended with "s2_"

$ time ./log_reg_baseline.py

train confusion
 [[121306  27808  34073]
  [ 29941 117735  35088]
  [ 23662  20907 138847]] (accuracy 0.687)

dev confusion
 [[2077  549  652]
  [ 546 2044  645]
  [ 474  404 2451]] (accuracy 0.667)

# approx 6m

nn models

lots more variants explored in the theano version of this project.

bidir grus -> concat -> mlp -> logreg

first model is

  • a bidirictional gru rnn over each sentence
  • concatted final states -> couple layer mlp -> 3way logisitic regression
$ ./nn_baseline.py --help
usage: nn_baseline.py [-h] [--train-set TRAIN_SET]
                      [--num-from-train NUM_FROM_TRAIN] [--dev-set DEV_SET]
                      [--num-from-dev NUM_FROM_DEV]
                      [--dev-run-freq DEV_RUN_FREQ] [--batch-size BATCH_SIZE]
                      [--hack-max-len HACK_MAX_LEN] [--hidden-dim HIDDEN_DIM]
                      [--embedding-dim EMBEDDING_DIM]
                      [--num-epochs NUM_EPOCHS] [--optimizer OPTIMIZER]
                      [--learning-rate LEARNING_RATE] [--momentum MOMENTUM]
                      [--mlp-config MLP_CONFIG] [--restore-ckpt RESTORE_CKPT]
                      [--ckpt-dir CKPT_DIR] [--ckpt-freq CKPT_FREQ]
                      [--disable-gpu]
                      [--initial-embeddings INITIAL_EMBEDDINGS]
                      [--vocab-file VOCAB_FILE] [--dont-train-embeddings]


optional arguments:
  -h, --help                       show this help message and exit
  --train-set TRAIN_SET
  --num-from-train NUM_FROM_TRAIN  number of batches to read from train. -1 => all
  --dev-set DEV_SET
  --num-from-dev NUM_FROM_DEV      number of batches to read from dev. -1 => all
  --dev-run-freq DEV_RUN_FREQ      frequency (in num batches trained) to run against dev set
  --batch-size BATCH_SIZE          batch size
  --hack-max-len HACK_MAX_LEN      hack; need to do bucketing, for now just ignore long egs
  --hidden-dim HIDDEN_DIM          hidden node dimensionality
  --embedding-dim EMBEDDING_DIM    embedding node dimensionality
  --num-epochs NUM_EPOCHS          number of epoches to run. -1 => forever
  --optimizer OPTIMIZER            optimizer to use; some baseclass of tr.train.Optimizer
  --learning-rate LEARNING_RATE
  --momentum MOMENTUM              momentum (for MomentumOptimizer)
  --mlp-config MLP_CONFIG          pre classifier mlp config; array describing #hidden
                                   nodes per layer. eg [50,50,20] denotes 3 hidden
                                   layers, with 50, 50 and 20 nodes. a value of []
                                   denotes no MLP before classifier
  --restore-ckpt RESTORE_CKPT      if set, restore from this ckpt file
  --ckpt-dir CKPT_DIR              root dir to save ckpts. blank => don't save ckpts
  --ckpt-freq CKPT_FREQ            frequency (in num batches trained) to dump ckpt to --ckpt-dir
  --disable-gpu                    if set we only run on cpu
  --initial-embeddings INITIAL     initial embeddings npy file. requires --vocab-file
  --vocab-file VOCAB_FILE          vocab (token -> idx) for embeddings, required if using --initial-embeddings
  --dont-train-embeddings          if set don't backprop to embeddings

precalculated embeddings

to use pretrained embeddings we first build a vocab mapping tokens -> row ids

time cat data/snli_1.0_train.jsonl | ./generate_vocab_from_snli.py  > vocab.tsv

we then convert glove embeddings to an npy matrix using the above vocab. for entries not in the glove data we generate a random vector scaled to the median length of the observed glove embeddings. ids 0 and 1 are "reserved" for PAD and UNK where PAD is a zero vector.

time ./convert_glove_embeddings.py \
 --vocab vocab.tsv \
 --glove-data glove/glove.6B.300d.txt \
 --npy glove/snli_glove.npy \
 --random-projection-dimensionality 100

using pretrained embeddings gives a big bump in initial convergence but randomly picked ones overtake them (quite a bit) later on.

baseline

snli_nn_tf's People

Contributors

matpalm avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.