
End-To-End Memory Networks in TensorFlow

TensorFlow implementation of End-To-End Memory Networks for language modeling (see Section 5 of the paper). The original Torch code from Facebook can be found here.

Prerequisites

This code requires TensorFlow. A sample Penn Treebank (PTB) corpus, a popular benchmark for measuring the quality of these models, is included in the data directory. You can also use your own text dataset, which should be formatted like this.
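The PTB files are plain whitespace-tokenized text. As an illustration (not the repo's actual reader, whose details may differ), a minimal loader that turns such a file's tokens into the integer ids the model consumes could look like:

```python
from collections import Counter

def read_words(path):
    """Return the file as a flat list of whitespace-separated tokens."""
    with open(path) as f:
        return f.read().split()

def build_vocab(words, nwords=10000):
    """Map the `nwords` most frequent tokens to integer ids, 0 = most frequent."""
    counts = Counter(words)
    return {w: i for i, (w, _) in enumerate(counts.most_common(nwords))}

words = "the cat sat on the mat".split()
vocab = build_vocab(words)
data = [vocab[w] for w in words]
# 'the' occurs twice, so it is the most frequent token and gets id 0
assert data[0] == 0 and data[4] == 0
```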

When you use the Docker image tensorflow/tensorflow:latest-gpu, you need to install the Python package future:

$ pip install future

If you want to use the --show True option, you also need to install the Python package progress:

$ pip install progress

Usage

To train a model with 6 hops and memory size of 100, run the following command:

$ python main.py --nhop 6 --mem_size 100

To see all training options, run:

$ python main.py --help

which will print:

usage: main.py [-h] [--edim EDIM] [--lindim LINDIM] [--nhop NHOP]
              [--mem_size MEM_SIZE] [--batch_size BATCH_SIZE]
              [--nepoch NEPOCH] [--init_lr INIT_LR] [--init_hid INIT_HID]
              [--init_std INIT_STD] [--max_grad_norm MAX_GRAD_NORM]
              [--checkpoint_dir CHECKPOINT_DIR] [--data_dir DATA_DIR]
              [--data_name DATA_NAME] [--is_test IS_TEST] [--nois_test]
              [--show SHOW] [--noshow]

optional arguments:
  -h, --help            show this help message and exit
  --edim EDIM           internal state dimension [150]
  --lindim LINDIM       linear part of the state [75]
  --nhop NHOP           number of hops [6]
  --mem_size MEM_SIZE   memory size [100]
  --batch_size BATCH_SIZE
                        batch size to use during training [128]
  --nepoch NEPOCH       number of epoch to use during training [100]
  --init_lr INIT_LR     initial learning rate [0.01]
  --init_hid INIT_HID   initial internal state value [0.1]
  --init_std INIT_STD   weight initialization std [0.05]
  --max_grad_norm MAX_GRAD_NORM
                        clip gradients to this norm [50]
  --checkpoint_dir CHECKPOINT_DIR
                        checkpoint directory [checkpoints]
  --data_dir DATA_DIR   data directory [data]
  --data_name DATA_NAME
                        data set name [ptb]
  --is_test IS_TEST     True for testing, False for Training [False]
  --nois_test
  --show SHOW           print progress [False]
  --noshow

(Optional) With progress installed, pass --show True to see a progress bar during training:

$ python main.py --nhop 6 --mem_size 100 --show True

After training is finished, you can evaluate on the test set with:

$ python main.py --is_test True --show True

The training output looks like:

$ python main.py --nhop 6 --mem_size 100 --show True
Read 929589 words from data/ptb.train.txt
Read 73760 words from data/ptb.valid.txt
Read 82430 words from data/ptb.test.txt
{'batch_size': 128,
'data_dir': 'data',
'data_name': 'ptb',
'edim': 150,
'init_hid': 0.1,
'init_lr': 0.01,
'init_std': 0.05,
'lindim': 75,
'max_grad_norm': 50,
'mem_size': 100,
'nepoch': 100,
'nhop': 6,
'nwords': 10000,
'show': True}
I tensorflow/core/common_runtime/local_device.cc:25] Local device intra op parallelism threads: 12
I tensorflow/core/common_runtime/direct_session.cc:45] Direct session inter op parallelism threads: 12
Training |################################| 100.0% | ETA: 0s
Testing |################################| 100.0% | ETA: 0s
{'perplexity': 507.3536108810464, 'epoch': 0, 'valid_perplexity': 285.19489755719286, 'learning_rate': 0.01}
Training |################################| 100.0% | ETA: 0s
Testing |################################| 100.0% | ETA: 0s
{'perplexity': 218.49577035468886, 'epoch': 1, 'valid_perplexity': 231.73457031084268, 'learning_rate': 0.01}
Training |################################| 100.0% | ETA: 0s
Testing |################################| 100.0% | ETA: 0s
{'perplexity': 163.5527845871247, 'epoch': 2, 'valid_perplexity': 175.38771414841014, 'learning_rate': 0.01}
Training |################################| 100.0% | ETA: 0s
Testing |################################| 100.0% | ETA: 0s
{'perplexity': 136.1443535538306, 'epoch': 3, 'valid_perplexity': 161.62522958776597, 'learning_rate': 0.01}
Training |################################| 100.0% | ETA: 0s
Testing |################################| 100.0% | ETA: 0s
{'perplexity': 119.15373237680929, 'epoch': 4, 'valid_perplexity': 149.00768378137946, 'learning_rate': 0.01}
Training |##############                  | 44.0% | ETA: 378s
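The perplexity values in this log are the exponential of the average per-word cross-entropy loss, the standard language-modeling metric. A minimal sketch of that computation (my own illustration, not the repo's code):

```python
import math

def perplexity(losses):
    """exp of the average per-word negative log-likelihood (in nats)."""
    return math.exp(sum(losses) / len(losses))

# an average loss of ln(100) nats per word corresponds to perplexity ~100
val = perplexity([math.log(100.0)] * 3)
assert abs(val - 100.0) < 1e-9
```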

Performance

Perplexity on the Penn Treebank test set.

# of hidden | # of hops | memory size | MemN2N (Sukhbaatar 2015) | This repo
----------- | --------- | ----------- | ------------------------ | -----------
150         | 3         | 100         | 122                      | 129
150         | 6         | 150         | 114                      | in progress

Author

Taehoon Kim / @carpedm20

Contributors

carpedm20, gabrielhuang, j-min, mrchypark


memn2n-tensorflow's Issues

How to choose/calculate context in order to get better results?

In the code of this repo, the context matrix of shape [batch_size, mem_size] is filled from a randomly chosen position, as below:

m = random.randrange(self.mem_size, len(data))
target[b][data[m]] = 1
context[b] = data[m - self.mem_size:m]

My question (sorry, it is not actually an 'issue' but a personal question) is: what approaches can I take to get better results than just random sampling?
Any helpful material is welcome :)
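One simple alternative (a sketch of a standard technique, not code from this repo) is to sweep the corpus with a sliding window, so every position becomes a training example exactly once per epoch instead of being sampled at random:

```python
def sliding_windows(data, mem_size):
    """Yield (context, target) pairs by sweeping the corpus left to right.

    `data` is a list of word ids; each window of `mem_size` words is the
    context used to predict the word that immediately follows it.
    """
    for m in range(mem_size, len(data)):
        yield data[m - mem_size:m], data[m]

pairs = list(sliding_windows([1, 2, 3, 4, 5], 3))
assert pairs == [([1, 2, 3], 4), ([2, 3, 4], 5)]
```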

segmentation fault issue

(tensorflow09GPU)➜  MemN2N-tensorflow git:(master) python main.py --nhop 6 --mem_size 100
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcuda.so locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcurand.so locally
Read 929589 words from data/ptb.train.txt
Read 73760 words from data/ptb.valid.txt
Read 82430 words from data/ptb.test.txt
{'batch_size': 128,
 'checkpoint_dir': 'checkpoints',
 'data_dir': 'data',
 'data_name': 'ptb',
 'edim': 150,
 'init_hid': 0.1,
 'init_lr': 0.01,
 'init_std': 0.05,
 'is_test': False,
 'lindim': 75,
 'max_grad_norm': 50,
 'mem_size': 100,
 'nepoch': 100,
 'nhop': 6,
 'nwords': 10000,
 'show': False}
[1]    9730 segmentation fault (core dumped)  python main.py --nhop 6 --mem_size 100

Exception: [!] Directory checkpoints not found

envy@ub1404:/media/envy/data1t/os_prj/github/MemN2N-tensorflow$ PYTHONPATH=~/os_prj/github/tensorflow/_python_build python main.py --nhop 6 --mem_size 100
Read 929589 words from data/ptb.train.txt
Read 73760 words from data/ptb.valid.txt
Read 82430 words from data/ptb.test.txt
{'batch_size': 128,
'checkpoint_dir': 'checkpoints',
'data_dir': 'data',
'data_name': 'ptb',
'edim': 150,
'init_hid': 0.1,
'init_lr': 0.01,
'init_std': 0.05,
'is_test': False,
'lindim': 75,
'max_grad_norm': 50,
'mem_size': 100,
'nepoch': 100,
'nhop': 6,
'nwords': 10000,
'show': False}
Traceback (most recent call last):
  File "main.py", line 52, in <module>
    tf.app.run()
  File "/home/envy/os_pri/github/tensorflow/_python_build/tensorflow/python/platform/default/_app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "main.py", line 43, in main
    model = MemN2N(FLAGS, sess)
  File "/media/envy/data1t/os_prj/github/MemN2N-tensorflow/model.py", line 26, in __init__
    raise Exception(" [!] Directory %s not found" % self.checkpoint_dir)
Exception:  [!] Directory checkpoints not found
envy@ub1404:/media/envy/data1t/os_prj/github/MemN2N-tensorflow$

Directory not found

Hi, thanks for the code.

I'm getting the following error when running main

Traceback (most recent call last):
  File "main.py", line 52, in <module>
    tf.app.run()
  File "/home/eders/anaconda/lib/python2.7/site-packages/tensorflow/python/platform/default/_app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "main.py", line 43, in main
    model = MemN2N(FLAGS, sess)
  File "/home/eders/python/MemN2N-tensorflow/model.py", line 26, in __init__
    raise Exception(" [!] Directory %s not found" % self.checkpoint_dir)
Exception:  [!] Directory checkpoints not found

I tried to use --data_dir but that didn't work either.
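From the traceback, model.py only checks that the checkpoint directory exists, so creating it before running (or pointing --checkpoint_dir at an existing directory) is a likely workaround; this is my note, not a confirmed fix from the maintainer:

```shell
# create the directory the model expects before it builds the saver
mkdir -p checkpoints
```

Afterwards re-run `$ python main.py --nhop 6 --mem_size 100`.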

How to use this model?

I don't know how to use this model.
I need code that answers questions, e.g.:

context {
    Sam walks into the kitchen.
    Sam picks up an apple.
    Sam walks into the bedroom.
    Sam drops the apple.
}

     Q: Where is the apple?
     A: Bedroom

Please share example code.

Issue with argument 'adjoint_b'

Hello @carpedm20 , thanks for this project, it is helping me get better insight into the paper :)
Quick question: during re-implementation of the code, I am getting the following error:

Aout = tf.matmul(self.hid3dim, Ain, adjoint_b=True)
TypeError: matmul() got an unexpected keyword argument 'adjoint_b'

Any idea what the cause could be?

Thanks a ton :)
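For what it's worth (my note, not a reply from the thread): adjoint_b requests the conjugate transpose of the second operand, and older TensorFlow releases did not accept that keyword on tf.matmul. For the real-valued tensors used here the adjoint equals the plain transpose, so rewriting with transpose_b=True (or an explicit transpose) should be equivalent. A tiny pure-Python check of that equivalence on a real matrix:

```python
def transpose(m):
    """Swap rows and columns of a 2-D list."""
    return [list(row) for row in zip(*m)]

def adjoint(m):
    """Conjugate transpose; conjugation is a no-op for real entries."""
    return [[complex(x).conjugate().real for x in row] for row in transpose(m)]

B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# for a real matrix the adjoint equals the plain transpose, which is why
# adjoint_b=True can be replaced by transpose_b=True in this code
assert adjoint(B) == transpose(B)
```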
