localminimum / qanet

A Tensorflow implementation of QANet for machine reading comprehension

License: MIT License

Python 96.50% Shell 1.26% HTML 2.23%

nlp squad tensorflow machine-comprehension cnn

qanet's Introduction

QANet

A Tensorflow implementation of Google's QANet (previously Fast Reading Comprehension (FRC)) from ICLR2018. (Note: This is not an official implementation from the authors of the paper)

I wrote a blog post about implementing QANet. Check out here for more information!

Training and preprocessing pipeline have been adopted from R-Net by HKUST-KnowComp. Demo mode is working. After training, just use python config.py --mode demo to run an interactive demo server.

Due to a memory issue, single-head dot-product attention is used instead of the 8-head multi-head attention in the original paper. The hidden size is also reduced from 128 to 96 because training was done on a GTX1080 rather than the P100 used in the paper. (8GB of GPU memory is insufficient. If you have a GPU with 12GB of memory, please share your training results with us.)

Currently, the best model reaches EM/F1 = 70.8/80.1 in 60k steps (6~8 hours). Detailed results are listed below.

Dataset

The dataset used for this task is the Stanford Question Answering Dataset (SQuAD). Pretrained GloVe embeddings trained on Common Crawl with 840B tokens are used for words.

Requirements

Python>=2.7
NumPy
tqdm
TensorFlow>=1.5
spacy==2.0.9
bottle (only for demo)

Usage

To download and preprocess the data, run

# download SQuAD and Glove
sh download.sh
# preprocess the data
python config.py --mode prepro

Just like R-Net by HKUST-KnowComp, hyperparameters are stored in config.py. To debug/train/test/demo, run

python config.py --mode debug/train/test/demo

To evaluate the model with the official code, run

python evaluate-v1.1.py ~/data/squad/dev-v1.1.json train/{model_name}/answer/answer.json

The default directory for the tensorboard log file is train/{model_name}/event

Run in Docker container (optional)

To build the Docker image (requires nvidia-docker), run

nvidia-docker build -t tensorflow/qanet .

Set volume mount paths and port mappings (for demo mode)

export QANETPATH={/path/to/cloned/QANet}
export CONTAINERWORKDIR=/home/QANet
export HOSTPORT=8080
export CONTAINERPORT=8080

bash into the container

nvidia-docker run -v $QANETPATH:$CONTAINERWORKDIR -p $HOSTPORT:$CONTAINERPORT -it --rm tensorflow/qanet bash

Once inside the container, follow the commands provided above starting with downloading the SQuAD and Glove datasets.

Pretrained Model

Pretrained model weights are temporarily not available.

Detailed Implementation

The model adopts character level convolution - max pooling - highway network for input representations similar to this paper by Yoon Kim.
The encoder consists of positional encoding - depthwise separable convolution - self attention - feed forward structure with layer norm in between.
Despite the original paper using 200, we observe that using a smaller character dimension leads to better generalization.
For regularization, a dropout of 0.1 is used every 2 sub-layers and 2 blocks.
Stochastic depth dropout is used to drop the residual connection with respect to increasing depth of the network as this model heavily relies on residual connections.
Query-to-Context attention is used along with Context-to-Query attention, which seems to improve the performance more than the paper reported. This may be due to the lack of diversity in self attention with 1 head (as opposed to 8 heads); with 8 heads, self attention may already capture information that overlaps with what the query-to-context attention provides.
The learning rate increases from 0.0 to 0.001 over the first 1000 steps on an inverse exponential scale and is fixed at 0.001 from step 1000 onward.
At inference, this model uses shadow variables maintained by the exponential moving average of all global variables.
This model uses a training / testing / preprocessing pipeline from R-Net for improved efficiency.
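
The warm-up and moving-average items in the list above can be sketched in plain Python. This is a minimal illustration, not the repository's code: the function names, the logarithmic form log(step + 1) / log(warmup_steps), and the decay value in the usage are assumptions chosen to match the description (0.0 at step 0, 0.001 from roughly step 1000 onward).

```python
import math


def warmup_learning_rate(step, base_lr=0.001, warmup_steps=1000):
    """Logarithmic ("inverse exponential") warm-up, then a constant rate.

    Hypothetical helper: rises from 0.0 at step 0 to base_lr at
    step warmup_steps - 1 and stays at base_lr afterwards.
    """
    scale = math.log(step + 1) / math.log(warmup_steps)
    return min(base_lr, base_lr * scale)


def ema_update(shadow, value, decay):
    """One exponential-moving-average step for a shadow variable."""
    return decay * shadow + (1.0 - decay) * value


# Print the learning rate at a few steps to see the shape of the schedule.
for step in (0, 9, 99, 999, 5000):
    print(step, warmup_learning_rate(step))

# A shadow value drifts slowly toward the live value (decay is illustrative).
print(ema_update(shadow=1.0, value=0.0, decay=0.9))
```

The rate climbs quickly at first and flattens as it approaches 0.001, which is what "inverse exponential" suggests. At inference, the shadow values produced by repeated `ema_update` calls would be used in place of the raw weights.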

Results

Here are the collected results from this repository and the original paper.

Model	Training Steps	Size	Attention Heads	Data Size (aug)	EM	F1
My model	35,000	96	1	87k (no aug)	69.0	78.6
My model	60,000	96	1	87k (no aug)	70.4	79.6
My model (reported by @jasonbw)	60,000	128	1	87k (no aug)	70.7	79.8
My model (reported by @chesterkuo)	60,000	128	8	87k (no aug)	70.8	80.1
Original Paper	35,000	128	8	87k (no aug)	NA	77.0
Original Paper	150,000	128	8	87k (no aug)	73.6	82.7
Original Paper	340,000	128	8	240k (aug)	75.1	83.8

TODO's

Training and testing the model
Add trilinear function to Context-to-Query attention
Apply dropouts + stochastic depth dropout
Query-to-context attention
Realtime Demo
Data augmentation by paraphrasing
Train with full hyperparameters (Augmented data, 8 heads, hidden units = 128)

Tensorboard

Run tensorboard for visualisation.

$ tensorboard --logdir=./

qanet's Issues

layer normalization in layer?

https://github.com/NLPLearn/QANet/blob/8107d223897775d0c3838cb97f93b089908781d4/layers.py#L52

Excuse me, the paper "Layer Normalization" (Lei Jimmy Ba, Ryan Kiros, and Geoffrey E. Hinton) says that the mean and variance are computed over all the hidden units in the same layer, and that different training cases have different normalization terms. So I think the mean should be computed like this:

axes = list(range(1, x.shape.ndims))
mean = tf.reduce_mean(x, axes)

So the shape of the mean is [batch,], and the variance is also [batch,].
Both are then used to compute the normalized x.

In the TensorFlow API for layer normalization, the source code is as below, and I think it is the same as mine.
norm_axes = list(range(begin_norm_axis, inputs_rank))
https://github.com/tensorflow/tensorflow/blob/c19e29306ce1777456b2dbb3a14f511edf7883a8/tensorflow/contrib/layers/python/layers/layers.py#L2311
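
To make the difference in reduction axes concrete, here is a small pure-Python sketch that computes the mean of a [batch, length, hidden] input in two ways: over the last axis only, and over all non-batch axes as proposed above. The function names and the toy input are illustrative assumptions, and the sketch does not claim to reproduce what either code base does.

```python
def mean_over_last_axis(x):
    """Mean over the hidden axis only: [batch][length][hidden] -> [batch][length]."""
    return [[sum(vec) / len(vec) for vec in seq] for seq in x]


def mean_over_non_batch_axes(x):
    """Mean over every axis except batch: [batch][length][hidden] -> [batch]."""
    means = []
    for seq in x:
        flat = [value for vec in seq for value in vec]
        means.append(sum(flat) / len(flat))
    return means


# Toy input with batch = 2, length = 2, hidden = 2.
x = [
    [[1.0, 3.0], [5.0, 7.0]],
    [[0.0, 2.0], [10.0, 20.0]],
]

per_position = mean_over_last_axis(x)       # one statistic per token position
per_example = mean_over_non_batch_axes(x)   # one statistic per training case

print(per_position)
print(per_example)
```

With the last-axis reduction, each position is normalized independently; with the all-axes reduction, every position in a training case shares one mean and one variance, so padding and sequence length affect the statistics.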

AssertionError

Hi,
When I try to run your code, I get an error:
Reducing Glove Matrix
100%|█████████████████████████████████████████████████| 442/442 [01:32<00:00, 4.79it/s]
100%|███████████████████████████████████████████████████| 48/48 [00:10<00:00, 4.43it/s]
Processing 91600 vocabs
Total number of lines: 91604
Reduced vocab size: 91604
Reading GloVe from: ./glove.840B.300d.txt
Processing line 91600
Reading GloVe from: ./glove.840B.300d.char.txt

Tokenizing training data.
100%|█████████████████████████████████████████████████| 442/442 [01:25<00:00, 5.19it/s]
Tokenizing dev data.
100%|███████████████████████████████████████████████████| 48/48 [00:10<00:00, 4.77it/s]
Tokenizing complete
Processing 91600 vocabsTraceback (most recent call last):
File "process.py", line 377, in
main()
File "process.py", line 371, in main
load_glove(Params.glove_dir,"glove",vocab_size = Params.vocab_size)
File "process.py", line 203, in load_glove
assert 0
AssertionError
Can you tell me why this happened?

Unable to load pre-trained weights

Steps I have done:
1. Cloned the repo and downloaded the weights
2. Ran sh download.sh
3. Ran python config.py --mode prepro
4. Ran python config.py --mode demo

Error:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/kamalraj/anaconda2/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/home/kamalraj/anaconda2/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/mnt/d/ML exp/Fast-Reading-Comprehension/demo.py", line 74, in demo_backend
    saver.restore(sess, tf.train.latest_checkpoint(config.save_dir))
  File "/home/kamalraj/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1755, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/kamalraj/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/home/kamalraj/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1137, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/kamalraj/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1355, in _do_run
    options, run_metadata)
  File "/home/kamalraj/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1374, in _do_call
    raise type(e)(node_def, op, message)
InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [91588,300] rhs shape= [91589,300]
         [[Node: save/Assign_375 = Assign[T=DT_FLOAT, _class=["loc:@word_mat"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](word_mat, save/RestoreV2:375)]]

Caused by op u'save/Assign_375', defined at:
  File "/home/kamalraj/anaconda2/lib/python2.7/threading.py", line 774, in __bootstrap
    self.__bootstrap_inner()
  File "/home/kamalraj/anaconda2/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/home/kamalraj/anaconda2/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/mnt/d/ML exp/Fast-Reading-Comprehension/demo.py", line 73, in demo_backend
    saver = tf.train.Saver()
  File "/home/kamalraj/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1293, in __init__
    self.build()
  File "/home/kamalraj/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1302, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "/home/kamalraj/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1339, in _build
    build_save=build_save, build_restore=build_restore)
  File "/home/kamalraj/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 796, in _build_internal
    restore_sequentially, reshape)
  File "/home/kamalraj/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 471, in _AddRestoreOps
    assign_ops.append(saveable.restore(saveable_tensors, shapes))
  File "/home/kamalraj/anaconda2/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 161, in restore
    self.op.get_shape().is_fully_defined())
  File "/home/kamalraj/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/state_ops.py", line 280, in assign
    validate_shape=validate_shape)
  File "/home/kamalraj/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_state_ops.py", line 58, in assign
    use_locking=use_locking, name=name)
  File "/home/kamalraj/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/kamalraj/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
    op_def=op_def)
  File "/home/kamalraj/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [91588,300] rhs shape= [91589,300]
         [[Node: save/Assign_375 = Assign[T=DT_FLOAT, _class=["loc:@word_mat"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](word_mat, save/RestoreV2:375)]]

conv_block problem

Why is the dropout here not applied in every residual block?

Training stops after some time

Hello everyone,

I've been trying to train a model with different num_heads, hidden and num_steps parameters.
The default parameters in config.py work like a charm, but once I change the mentioned parameters, I get this:

Exception ignored in: <bound method tqdm.__del__ of  42%|██████████████████████▉                                | 49999/120000 [15:34:24<18:06:29,  1.07it/s]>
Traceback (most recent call last):█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 328/328 [02:05<00:00,  2.53it/s]
  File "/home/username/.virtualenvs/qanet/lib/python3.5/site-packages/tqdm/_tqdm.py", line 889, in __del__
    self.close()
  File "/home/username/.virtualenvs/qanet/lib/python3.5/site-packages/tqdm/_tqdm.py", line 1095, in close
    self._decr_instances(self)
  File "/home/username/.virtualenvs/qanet/lib/python3.5/site-packages/tqdm/_tqdm.py", line 454, in _decr_instances
    cls.monitor.exit()
  File "/home/username/.virtualenvs/qanet/lib/python3.5/site-packages/tqdm/_monitor.py", line 52, in exit
    self.join()
  File "/usr/lib/python3.5/threading.py", line 1051, in join
    raise RuntimeError("cannot join current thread")
RuntimeError: cannot join current thread

This occurred when I set num_heads to 2, 4 and 8. I could train up to 50k and 54k steps when num_heads was set to 2 and 4 respectively, and it failed from the start when num_heads was set to 8.

I'm using Ubuntu 16.04, Python 3.5.2 and training the network on a GPU. Here's the nvidia-smi and nvcc --version output if someone needs it:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.44                 Driver Version: 396.44                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   72C    P0    63W / 149W |      0MiB / 11441MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

So what could be the real cause of this error?

Thanks in advance!

dev set evaluation

My test and dev sets are the same, but I get different results from the checkpoint evaluation during training versus running config.py in test mode.

Shouldn't it give the same results, since we are loading the saved model and running it on the dev file again?

-flags.DEFINE_integer("bucket_range", [40, 401, 40], "the range of bucket")

should be:

-flags.DEFINE_list("bucket_range", [40, 401, 40], "the range of bucket")

OOM error while training

What are the specifications of the system you used for training?

Can you share pre-trained model weights?

No such file or directory: 'demo.html'

I am trying to run the interactive server, but when I navigate to the server URL, the page throws up a 500 code error (Internal Server Error).

The trace for the error is:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/bottle.py", line 862, in _handle
    return route.call(**args)
  File "/usr/local/lib/python3.6/dist-packages/bottle.py", line 1740, in wrapper
    rv = callback(*a, **ka)
  File "/home/rudresh/Documents/machine_comprehension/Fast-Reading-Comprehension/demo.py", line 25, in home
    with open('demo.html', 'r') as fl:
FileNotFoundError: [Errno 2] No such file or directory: 'demo.html'
127.0.0.1 - - [07/Apr/2018 10:07:55] "GET / HTTP/1.1" 500 739
127.0.0.1 - - [07/Apr/2018 10:07:56] "GET /favicon.ico HTTP/1.1" 404 740
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/bottle.py", line 862, in _handle
    return route.call(**args)
  File "/usr/local/lib/python3.6/dist-packages/bottle.py", line 1740, in wrapper
    rv = callback(*a, **ka)
  File "/home/rudresh/Documents/machine_comprehension/Fast-Reading-Comprehension/demo.py", line 25, in home
    with open('demo.html', 'r') as fl:
FileNotFoundError: [Errno 2] No such file or directory: 'demo.html'

This is obviously caused by the missing demo.html file. Can you please tell me where I can get the file?

I am running Python 3.6 on Ubuntu 16.04.
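
Besides the file being absent from the checkout, one possible cause of this error is that the server was started from a different working directory, because open('demo.html') resolves relative to the current directory. The sketch below shows one way to resolve such a file against a fixed base directory and fail with a clearer message. The helper name `resolve_asset` is hypothetical and is not part of the repository; the sketch targets Python 3.

```python
import os


def resolve_asset(filename, base_dir):
    """Return an absolute path to filename inside base_dir, or raise a clear error."""
    path = os.path.join(os.path.abspath(base_dir), filename)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            "{} not found in {}; check that the file exists in the checkout".format(
                filename, base_dir
            )
        )
    return path


# Inside a script, base_dir would typically be the script's own directory:
#   base_dir = os.path.dirname(os.path.abspath(__file__))
#   with open(resolve_asset("demo.html", base_dir), "r") as fl:
#       ...
```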

Results of the original paper

Thanks for this great implementation. I noticed that the README file says the original system can achieve EM: 72.5, F1: 81.4 after 150,000 training steps, and EM: 76.2, F1: 84.6 after 340,000 training steps. But I didn't find this information in the original paper. It seems that the original system takes much longer to train? Could you show me where to get this information? Or did you infer it from other statistics?

mask_logits in layer.py

I think the line "return inputs + mask_value * (1 - mask)" should be "return inputs*mask + mask_value * (1 - mask)"
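
To compare the two expressions, the sketch below applies both to a small list of logits and runs a softmax over each result. It is a pure-Python illustration: `mask_logits` mirrors the quoted line, `mask_logits_multiplicative` is a hypothetical name for the proposed variant, and the toy logits are assumptions.

```python
import math


def mask_logits(inputs, mask, mask_value=-1e30):
    """Additive masking, as in the quoted line: padded positions get a huge negative logit."""
    return [x + mask_value * (1 - m) for x, m in zip(inputs, mask)]


def mask_logits_multiplicative(inputs, mask, mask_value=-1e30):
    """Variant proposed in the issue: zero the padded logit before adding mask_value."""
    return [x * m + mask_value * (1 - m) for x, m in zip(inputs, mask)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 5.0, -3.0]
mask = [1, 1, 0, 0]  # 1 = real token, 0 = padding

additive = softmax(mask_logits(logits, mask))
multiplicative = softmax(mask_logits_multiplicative(logits, mask))

print(additive)
print(multiplicative)
```

As long as the logits are tiny compared with mask_value, adding -1e30 to a padded logit already swamps its original value in floating point, so both variants give padded positions zero probability after the softmax.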

how to predict answers for custom question and context by reusing loaded model

Hello All,

I have many JSON files in the same format as the standard train or dev file. Can I feed them to this network to predict answers for different input questions and contexts?

Thanks,
Sachin B. Ichake

Trainable Embedding for OOV words

Hello,

I have one doubt about your code: all OOV words are represented by id 1, which means all OOV words are treated as the same word, and its embedding is a zero vector. Also, this embedding is not updated during training. However, in the original paper, the authors mention that the word embeddings for OOV words are updated during training.

I think this may be a reason why the score is lower than the original paper.
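
The idea of a trainable OOV embedding can be sketched as follows. This is a pure-Python illustration under stated assumptions: the token names, the Gaussian initialization, and the `build_embedding_table` helper are hypothetical and not taken from the repository; only the convention that id 1 stands for OOV comes from the description above.

```python
import random


def build_embedding_table(vocab, pretrained, dim, seed=0):
    """Build (word2idx, matrix, trainable_rows) with a randomly initialised OOV row.

    Index 0 is padding (zero vector); index 1 is the shared OOV token.
    """
    rng = random.Random(seed)
    word2idx = {"--NULL--": 0, "--OOV--": 1}
    matrix = [
        [0.0] * dim,                                # padding row stays zero
        [rng.gauss(0.0, 0.1) for _ in range(dim)],  # OOV row: random init
    ]
    trainable_rows = {1}  # only the OOV row would receive gradient updates
    for word in vocab:
        if word in pretrained:
            word2idx[word] = len(matrix)
            matrix.append(list(pretrained[word]))
    return word2idx, matrix, trainable_rows


def lookup(word, word2idx):
    """Map a word to its row index, falling back to the OOV row (id 1)."""
    return word2idx.get(word, 1)


pretrained = {"cat": [0.1, 0.2], "dog": [0.3, 0.4]}
word2idx, matrix, trainable_rows = build_embedding_table(
    ["cat", "dog", "zyzzyva"], pretrained, dim=2
)
print(lookup("cat", word2idx), lookup("zyzzyva", word2idx))
```

In a real model, the rows in `trainable_rows` would be stored in a trainable variable and the pretrained rows in a constant, so that only the OOV vector changes during training.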

how to adapt it for squad2.0 dataset?

Train Models With Macs

Hi. I have a MacBook Air (Mid 2017) and I want to train the model. It doesn't have a GPU, so how can I train the model without one?

Is this snippet in prepro.py correct

for token in context_tokens:
    word_counter[token] += len(para["qas"])
    for char in token:
        char_counter[char] += len(para["qas"])

Should it be +=1?
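
The two options weight tokens differently, which the toy example below makes visible. One possible reading of the original code is that each paragraph's context is paired with every one of its questions, so its tokens are counted once per resulting example; that interpretation is an assumption. The "context_tokens" key and the `count_tokens` helper are hypothetical.

```python
from collections import Counter


def count_tokens(paragraphs, per_question=True):
    """Count context tokens, weighted once per question or once per paragraph."""
    word_counter = Counter()
    for para in paragraphs:
        weight = len(para["qas"]) if per_question else 1
        for token in para["context_tokens"]:
            word_counter[token] += weight
    return word_counter


paragraphs = [
    {"context_tokens": ["the", "cat", "sat"], "qas": ["q1", "q2", "q3"]},
    {"context_tokens": ["the", "dog"], "qas": ["q4"]},
]

weighted = count_tokens(paragraphs, per_question=True)
unweighted = count_tokens(paragraphs, per_question=False)

print(weighted)
print(unweighted)
```

The choice only matters if the counts are later compared against a frequency threshold or used for ordering; for plain vocabulary membership both give the same set of words.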

TODOs

This is an umbrella issue where we can collectively tackle some problems and improve general open source reading comprehension quality.

Goal
The network is already there. We just need to add more features on top of the current model.

Implement full features stated in the original paper
Achieve the EM/F1 performance stated in the original paper with a single-model setting

Model

Increase the hidden units to 128. #15 reported performance increase when the hidden units increased from 96 to 128
Increase the number of heads to 8
Add dropouts in better locations to maximize regularization
Train "unknown" word embedding

Data

Implement paraphrasing by back-translation to increase the data size

Contribution to any of these issues is welcome and please comment on this issue and let us know if you want to work on these problems.

Speed ?

With num_heads 1 and hidden size 96, this does not seem faster than HKUST R-Net?
With batch size 64, I get 1.42 batch/s, while HKUST R-Net reaches 2.4+ batch/s.
HKUST R-Net uses a char dim of only 8 by default while here we use 64, but I still think QANet is not as fast as Google shows in the paper?

possibly insufficient driver version:

Hi,

I get a runtime error in sess.run([]):

2018-03-23 11:55:55.959752: E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2018-03-23 11:55:55.959887: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:369] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module  384.69  Wed Aug 16 19:34:54 PDT 2017
GCC version:  gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) 
"""
2018-03-23 11:55:55.959970: E tensorflow/stream_executor/cuda/cuda_dnn.cc:393] possibly insufficient driver version: 384.69.0
2018-03-23 11:55:55.959998: E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2018-03-23 11:55:55.960028: F tensorflow/core/kernels/conv_ops.cc:717] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms) 
2018-03-23 11:55:55.960040: E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
Aborted (core dumped)

tensorflow version: 1.5.0
CUDA: 9.0
Cudnn: 7.0
Driver Version: 384.69

Can I ask which versions you use? Should I update my driver version, or can I just change some model code?
It works without a GPU.

Thanks

tensorflow not in requirements.txt

problem about highwaynet

In a highway network, H is a non-linear function, but in this repo H is a linear function. Why is this? Thanks!
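
For reference, a highway layer computes y = T(x) * H(x) + (1 - T(x)) * x, where T is a sigmoid gate. The pure-Python sketch below lets H be either linear or followed by an activation, so the two cases can be compared on the same input. The helper names, weights, and inputs are illustrative assumptions, not the repository's code.

```python
import math


def dense(x, weights, bias):
    """Affine map; weights is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    return max(0.0, v)


def highway_layer(x, w_h, b_h, w_t, b_t, activation=None):
    """y = T * H(x) + (1 - T) * x, with T = sigmoid(W_t x + b_t).

    activation=None gives a linear H; pass a function such as relu
    for a non-linear H.
    """
    h = dense(x, w_h, b_h)
    if activation is not None:
        h = [activation(v) for v in h]
    t = [1.0 / (1.0 + math.exp(-v)) for v in dense(x, w_t, b_t)]
    return [ti * hi + (1.0 - ti) * xi for ti, hi, xi in zip(t, h, x)]


x = [1.0, -2.0]
w_h = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, for illustration only
b_h = [0.0, 0.0]
w_t = [[0.0, 0.0], [0.0, 0.0]]  # zero gate weights, so T = 0.5 everywhere
b_t = [0.0, 0.0]

linear_out = highway_layer(x, w_h, b_h, w_t, b_t)
relu_out = highway_layer(x, w_h, b_h, w_t, b_t, activation=relu)

print(linear_out)
print(relu_out)
```

Even with a linear H, the layer as a whole is not linear in x, because the sigmoid gate T multiplies both branches; the activation only changes what the transform branch contributes.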

RuntimeError('cannot join current thread',) in <object repr() failed>

(.venv) ub16c9@ub16c9-gpu:~/ub16_prj/QANet$ python config.py --mode train
Building model...
WARNING:tensorflow:From /home/ub16c9/ub16_prj/QANet/layers.py:52: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From /home/ub16c9/ub16_prj/QANet/model.py:134: calling softmax (from tensorflow.python.ops.nn_ops) with dim is deprecated and will be removed in a future version.
Instructions for updating:
dim is deprecated, use axis instead
WARNING:tensorflow:From /home/ub16c9/ub16_prj/QANet/model.py:174: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See tf.nn.softmax_cross_entropy_with_logits_v2.

Total number of trainable parameters: 788673
2018-12-29 11:14:48.345129: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-12-29 11:14:48.431530: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-12-29 11:14:48.431955: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.6575
pciBusID: 0000:01:00.0
totalMemory: 10.92GiB freeMemory: 10.43GiB
2018-12-29 11:14:48.431971: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2018-12-29 11:14:48.733045: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-29 11:14:48.733079: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2018-12-29 11:14:48.733085: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2018-12-29 11:14:48.733318: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10086 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-12-29 11:14:50.042331: W tensorflow/core/framework/allocator.cc:122] Allocation of 109906800 exceeds 10% of system memory.
2018-12-29 11:14:50.174758: W tensorflow/core/framework/allocator.cc:122] Allocation of 109906800 exceeds 10% of system memory.
2018-12-29 11:14:50.507489: W tensorflow/core/framework/allocator.cc:122] Allocation of 109906800 exceeds 10% of system memory.
2018-12-29 11:14:50.691090: W tensorflow/core/framework/allocator.cc:122] Allocation of 109906800 exceeds 10% of system memory.
2018-12-29 11:14:50.825623: W tensorflow/core/framework/allocator.cc:122] Allocation of 109906800 exceeds 10% of system memory.
55%|██████████████████████████████████████████████████████████████████████████████████████▏ | 32935/60000 [3:15:35<2:19:53, 3.22it/s] 90%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 53999/60000 [5:17:29<29:48, 3.36it/sException RuntimeError: RuntimeError('cannot join current thread',) in <object repr() failed> ignored██████████████████████████████████████████████████████████████████████| 328/328 [00:36<00:00, 9.07it/s]
(.venv) ub16c9@ub16c9-gpu:~/ub16_prj/QANet$

Try to test ELMO language model

Hi all,
Based on results from both the Google Brain team and AllenNLP, using ELMO can give a big boost in results. I noticed that AllenNLP provides some pretrained ELMO models. I would love to see some better results.
Thanks.
[1] QANet slide
[2] ELMO page

about embedding matrix structure

Thanks for the brilliant code.
I noticed a sentence in the paper:

"All the out-of-vocabulary words are mapped to an <UNK> token, whose embedding is trainable with random initialization." This is not in your code (they used a pretrained matrix). That seems to make sense.
Does that work for the model?

how to train by changing/adding batch_size?

I am not able to free enough GPU memory for training, so I would like to know how to change the batch_size.

How to start training?

I have read README.md file, but still don't know how to run this project. Can anybody give more instructions?

How to train in Multi GPU

I see that TensorFlow detected 2 GPUs, but training only happens on 1 GPU. Please advise.

Pre-loaded glove char vectors have mismatched tensor shapes

When setting "pretrained_char" to True, there is a tensor reshape size conflict.

glove_char_file = os.path.join('data/glove', "glove.840B.300d-char.txt")
flags.DEFINE_string("glove_char_file", glove_char_file, "Glove character embedding source file")
flags.DEFINE_boolean("pretrained_char", True, "Whether to use pretrained character embedding")

Error is from model.py line 76, below. How can the reshape dimensions be adjusted?

Error:

Traceback (most recent call last):
  File "config.py", line 152, in <module>
    tf.app.run()
  File "/home/my/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "config.py", line 133, in main
    train(config)
  File "QANet/main.py", line 95, in train
    handle: train_handle, model.dropout: config.dropout})
  File "/home/my/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/home/my/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1140, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/my/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    run_metadata)
  File "/home/my/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 2445312 values, but the requested shape has 11462400
	 [[Node: Input_Embedding_Layer/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Input_Embedding_Layer/embedding_lookup, Input_Embedding_Layer/Reshape/shape)]]
	 [[Node: Identity/_4743 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_52979_Identity", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op 'Input_Embedding_Layer/Reshape', defined at:
  File "config.py", line 152, in <module>
    tf.app.run()
  File "/home/my/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "config.py", line 133, in main
    train(config)
  File "QANet/main.py", line 72, in train
    model = Model(config, iterator, word_mat, char_mat, graph = g)
  File "QANet/model.py", line 60, in __init__
    self.forward()
  File "QANet/model.py", line 76, in forward
    ch_emb = tf.reshape(tf.nn.embedding_lookup(self.char_mat, self.ch), [N * PL, CL, dc]) # 32*1000?, 16, 64 = 32768000.  Input to reshape is a tensor with 34099200 values, but the requested shape has 7274496.
  File "/home/my/anaconda3/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 5782, in reshape
    "Reshape", tensor=tensor, shape=shape, name=name)
  File "/home/my/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/my/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3290, in create_op
    op_def=op_def)
  File "/home/my/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1654, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Input to reshape is a tensor with 2445312 values, but the requested shape has 11462400
	 [[Node: Input_Embedding_Layer/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Input_Embedding_Layer/embedding_lookup, Input_Embedding_Layer/Reshape/shape)]]
	 [[Node: Identity/_4743 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_52979_Identity", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
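
A quick arithmetic check on the two sizes in the error message hints at where the mismatch comes from. The sketch below divides each size by a candidate embedding width: 300 is the GloVe 840B.300d vector width, and 64 is assumed to be the default character dimension. Both sizes give the same row count, which suggests (as a hypothesis, not a confirmed diagnosis) that the character matrix being looked up has 64 columns while the reshape expects 300.

```python
def infer_rows(num_values, embedding_dim):
    """Return num_values / embedding_dim if it divides evenly, else None."""
    rows, remainder = divmod(num_values, embedding_dim)
    return rows if remainder == 0 else None


actual_values = 2445312      # size of the embedding_lookup output (from the log)
requested_values = 11462400  # size of the requested [N * PL, CL, dc] shape (from the log)

rows_if_64 = infer_rows(actual_values, 64)       # 64: assumed default char dimension
rows_if_300 = infer_rows(requested_values, 300)  # 300: GloVe 840B.300d vector width

print(rows_if_64, rows_if_300)
```

If this reading is right, the saved character embedding matrix was produced with a different character dimension than the one the model now requests. Re-running preprocessing after changing the flag may regenerate a matching matrix, though that is untested here.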

Can it support Chinese?

I just changed nlp = spacy.blank("en") to nlp = spacy.blank("zh").
Is that OK?

The embedding projection

Hi, I noticed that you put the input projection before the Highway Network. However, the paper mentions that the input of the Embedding Encoding Layer is a vector of dimension p1+p2=500 for each word, which means that the projection is placed after the Highway Network.

Have you already tried this?

This repo cannot reproduce the result of original paper

Thank you for your implementation, it is very helpful for me.
I ran this code and got similar results when the number of heads equals 1. But I cannot reproduce the result of the original paper (73.6/82.7) when I use 8 heads, batch size 32, 150k training steps, and a char dimension of 200 (the same settings as the original paper). I can only get around 71.27/80.58.
The same thing occurred when I ran the PyTorch repo (https://github.com/andy840314/QANet-pytorch-).

Any suggestions?

How do I resume training off a checkpoint?

After training the model to 46 percent there was a power outage. What command do I use to resume training? I'm on checkpoint 26.

Thanks in Advance

Report the results

Model	Training Steps	Size	Attention Heads	Data Size (aug)	EM	F1
My Model	60,000	128	1	87k (no aug)	70.7	79.8

The results were obtained on a K80 machine. I modified the trilinear function for memory efficiency, but the results are the same as with the current version of this repository.

I'm not sure about overfitting; the model is the last checkpoint after training for 60,000 steps.

How can I do fine tuning using QANet?

I've trained the QANet model on SQuAD and want to apply it to a new dataset through fine-tuning. I need to use the weights from the SQuAD-trained model as the initialization for training on the new dataset, so that the model adapts to it.

In the train/FRC folder I can see several checkpoint files. Which checkpoint files should I use to initialize the model for the new dataset?

Thanks,

https://nlp.stanford.edu/data/glove.840B.300d.zip

First of all, great job!
The file https://nlp.stanford.edu/data/glove.840B.300d.zip
could not be downloaded. Where can I download it? Thanks!

Parameter setting problem

1. What is the meaning of config.hidden used in conv(), and why is the kernel size 5 in conv()? Is it a parameter that needs to be tuned?

2. Is the conv function built into TensorFlow, or did you have to write it yourself?

Did you write the highway function yourself? In the original BiDAF code, the highway function provided by Seo is different from yours. Have you already tried it, and did Seo's version perform poorly?

Is it better to use a conv layer in the highway network or the encoder block's feed-forward network rather than a dense layer?

The authors didn't mention using a conv layer in the paper. Thanks for any reply!

Train with M40 card but got OOM message

I'm testing this model on an M40 device, which has 24 GB of memory on board.

What is the default batch size you used on the 1080 card? TensorFlow reports OOM when I increase the batch size to 64.

mask_logits function

I don't understand the purpose of the "mask_logits" function, which is called before the "softmax" function in various places. Can someone please explain?
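
A minimal sketch of the usual idea behind such a function, assuming it follows the common pattern of adding a very large negative number to padded positions so that softmax gives them zero probability. The constant and the pure-Python helpers below are illustrative and are not taken from the repository's code:

```python
import math

# Illustrative constant: large enough that exp() underflows to 0.0.
NEG_INF = -1e30

def mask_logits(logits, mask, mask_value=NEG_INF):
    """Push padded positions (mask == 0) to a very negative value."""
    return [x + mask_value * (1 - m) for x, m in zip(logits, mask)]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 3.0, 0.5]
mask = [1, 1, 0, 0]  # the last two positions are padding

unmasked = softmax(logits)                   # padding competes for probability
masked = softmax(mask_logits(logits, mask))  # padding gets zero probability
```

Without the mask, the padded position with logit 3.0 would receive the largest share of the probability; with it, all of the probability mass stays on real tokens.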

inconsistency in predictions

We have trained QANet on our own question-and-answer data. But when we run it in demo mode for prediction, it gives different results for the same question.

Sometimes it picks the correct answer for the same question and sometimes it does not, but ideally it should pick the same answer, right? Any ideas what could be the reason for this behaviour of the trained model?

I have commented out the section below from the test/demo code:

"""
if config.decay < 1.0:
    sess.run(model.assign_vars)
"""

Memory Issue

I am using an AWS p2.xlarge, which has a Tesla K80.
Training still runs into a memory issue. Why?
My console shows that it has 11.17 GB of memory.
Logs are attached.
logs.txt

TIA

Trying to fine-tune with different data, but getting a tensor dimension mismatch

I am getting the following error while trying to fine-tune:

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [326,64] rhs shape= [1427,64]
	 [[Node: save/Assign_746 = Assign[T=DT_FLOAT, _class=["loc:@char_mat"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](char_mat, save/RestoreV2:746)]]
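
The two shapes suggest that the character vocabulary built from the new data (326 entries) differs in size from the one stored in the checkpoint (1427 entries), although this is an inference from the message rather than something confirmed here. One possible workaround is to restore only the variables whose shapes match. The sketch below shows just the selection step on plain dictionaries; the function name and the sample variable names other than char_mat are hypothetical:

```python
def restorable_variables(model_shapes, checkpoint_shapes):
    """Split model variables into those that can be restored and those that cannot.

    Both arguments map variable names to shapes (lists or tuples of ints).
    """
    keep, skip = [], []
    for name in sorted(model_shapes):
        saved = checkpoint_shapes.get(name)
        if saved is not None and tuple(saved) == tuple(model_shapes[name]):
            keep.append(name)
        else:
            skip.append(name)  # missing from the checkpoint or shape mismatch
    return keep, skip

# Hypothetical shapes; only char_mat is taken from the error above.
model = {"char_mat": [326, 64], "hypothetical/dense/kernel": [96, 96]}
checkpoint = {"char_mat": [1427, 64], "hypothetical/dense/kernel": [96, 96]}
keep, skip = restorable_variables(model, checkpoint)
```

In TensorFlow 1.x the checkpoint shapes can be listed with tf.train.list_variables, and a subset can be restored by passing var_list to tf.train.Saver. Skipped variables keep their fresh initialization, so reusing the original character and word dictionaries when preprocessing the new data may be the better option.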

Inference on my machine differs from other machines I tested

Hi, I trained the model on AWS (GPU instance) for 60K steps. I then tested it on several GPU/CPU instances and the results are consistent: an AWS GPU instance (p2.xlarge), an AWS CPU instance (c5d.4xlarge), and Colab all give the same answers for a given context and question. Only when I deploy it locally on my Ubuntu desktop (CPU only) are the inferences way off. Any input as to why this could be happening would help. Thanks!

InvalidArgumentError during evaluation

Hello,

For some questions in the SQuAD dataset I get this exception:

InvalidArgumentError (see above for traceback): num_upper must be negative or less or equal to number of columns (10) got: 30
[[Node: Output_Layer/MatrixBandPart = MatrixBandPart[T=DT_FLOAT, Tindex=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Output_Layer/MatMul, Output_Layer/MatrixBandPart/num_lower, Output_Layer/MatrixBandPart/num_upper)]]

Do you know the reason for this? How can I get rid of the problem?
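
Judging by the node name, the output layer appears to keep only a band of the start/end score matrix so that the answer end is at most some limit (30 here) after the start, and the message suggests that limit exceeds the context length (10). Both points are assumptions about the cause. A pure-Python sketch of the same span search, with the band clamped to the context length, could look like this (best_span and its arguments are illustrative names):

```python
def best_span(p_start, p_end, ans_limit):
    """Return (start, end) maximising p_start[start] * p_end[end]
    subject to 0 <= end - start <= band."""
    n = len(p_start)
    # Clamp the band width so it never exceeds the context length.
    band = min(ans_limit, n - 1)
    best, best_score = (0, 0), -1.0
    for i in range(n):
        for j in range(i, min(i + band, n - 1) + 1):
            score = p_start[i] * p_end[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best

span = best_span([0.1, 0.7, 0.2], [0.6, 0.1, 0.3], ans_limit=30)
```

In the TensorFlow graph the analogous change would be to clamp the num_upper argument to the number of columns before calling the band operation; whether that is needed may depend on the installed TensorFlow version.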

word_emb.json missing?

I'm trying to train/demo the code, and in both cases (python config.py --mode train and python config.py --mode demo) I end up hitting the same error.

The last few bits of the traceback are:

  File "config.py", line 125, in main
    train(config)
  File "/home/arjoonn/Fast-Reading-Comprehension/main.py", line 19, in train
    with open(config.word_emb_file, "r") as fh:
FileNotFoundError: [Errno 2] No such file or directory: 'data/word_emb.json'

I saw some commented-out lines in the download.sh file; should I be uncommenting those?

Unable to preprocess data

I am getting following error while preprocessing:
Generating word embedding...
 13%|#######################3 | 296814/2200000 [00:36<03:52, 8176.12it/s]
Traceback (most recent call last):
  File "config.py", line 144, in <module>
    tf.app.run()
  File "C:\Users\chchauha\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run
    _sys.exit(main(argv))
  File "config.py", line 127, in main
    prepro(config)
  File "F:\Synapse\QANet-master\prepro.py", line 287, in prepro
    word_counter, "word", emb_file=word_emb_file, size=config.glove_word_size, vec_size=config.glove_dim)
  File "F:\Synapse\QANet-master\prepro.py", line 99, in get_embedding
    vector = list(map(float, array[-vec_size:]))
ValueError: could not convert string to float: 'sania'
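
The error means that one of the last vec_size fields on some line was a word rather than a number. One possible cause, which is a guess rather than a diagnosis, is a line with fewer numeric fields than expected, for example a truncated or corrupted line in the downloaded embedding file. A defensive parser that skips such lines might look like the sketch below; parse_embedding_line is a hypothetical helper and not part of prepro.py:

```python
def parse_embedding_line(line, vec_size):
    """Split a GloVe-style line into (token, vector), or None if malformed."""
    # Split from the right so tokens that contain spaces stay intact.
    parts = line.rstrip("\n").rsplit(" ", vec_size)
    if len(parts) != vec_size + 1:
        return None  # too few fields: wrong vec_size or truncated line
    token, values = parts[0], parts[1:]
    try:
        return token, [float(v) for v in values]
    except ValueError:
        return None  # a non-numeric field where a number was expected

good = parse_embedding_line("hello 0.1 0.2 0.3", 3)
bad = parse_embedding_line("sania 0.1", 3)
```

Counting how many lines return None gives a quick indication of whether the file is damaged and should be downloaded again.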

training on "Answer not available" ?

Any suggestions on how to train the network to answer "Not available" for questions that cannot be answered from the context?

A tool to generate training data

Snorkel can generate training data, so it may be useful for data augmentation.
It uses data programming instead of translating twice (back-translation).

Preprocessing

In preprocessing mode, execution stops in build_features() at the check
(example["y1s"][0] - example["y2s"][0]) > ans_limit
with a "list index out of range" error.

After commenting out that statement, it moves forward and fails with another error at
start, end = example["y1s"][-1], example["y2s"][-1]
again with "list index out of range".

Please help. Is it because I am using SQuAD version 2.0?
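
SQuAD 2.0 adds unanswerable questions, which carry an empty "answers" list and "is_impossible": true, so indexing the first or last answer position fails for them. If the goal is simply to run the existing SQuAD 1.1 pipeline, one option is to drop those questions before preprocessing. This is a sketch under that assumption, and drop_unanswerable is a hypothetical helper:

```python
def drop_unanswerable(squad):
    """Return a copy of a SQuAD-style dict without unanswerable questions."""
    articles = []
    for article in squad["data"]:
        paragraphs = []
        for para in article["paragraphs"]:
            # Keep only questions that have at least one answer.
            qas = [qa for qa in para["qas"]
                   if qa.get("answers") and not qa.get("is_impossible", False)]
            if qas:
                paragraphs.append(dict(para, qas=qas))
        if paragraphs:
            articles.append(dict(article, paragraphs=paragraphs))
    return dict(squad, data=articles)

sample = {"version": "v2.0", "data": [{"title": "t", "paragraphs": [{
    "context": "QANet uses convolutions.",
    "qas": [
        {"id": "1", "question": "What does QANet use?", "is_impossible": False,
         "answers": [{"text": "convolutions", "answer_start": 11}]},
        {"id": "2", "question": "Who trained it?", "is_impossible": True,
         "answers": []},
    ]}]}]}
filtered = drop_unanswerable(sample)
```

The filtered dictionary can be written back with json.dump and used in place of the original file. Training the model to predict "no answer" would need changes to the model itself, which this sketch does not cover.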

AttributeError: 'module' object has no attribute 'blank'

I ran:
sudo pip install spacy==2.0.9

mldl@mldlUB1604:~/ub16_prj/Fast-Reading-Comprehension$ python config.py --mode prepro
Traceback (most recent call last):
  File "config.py", line 9, in <module>
    from prepro import prepro
  File "/home/mldl/ub16_prj/Fast-Reading-Comprehension/prepro.py", line 15, in <module>
    nlp = spacy.blank("en")
AttributeError: 'module' object has no attribute 'blank'