
pytorch-ntm's Introduction

PyTorch Neural Turing Machine (NTM)

PyTorch implementation of Neural Turing Machines (NTM).

An NTM is a memory-augmented neural network (attached to external memory) in which the interactions with the external memory (addressing, reading, writing) are done through differentiable transformations. As a result, the network is end-to-end differentiable and can be trained with a gradient-based optimizer.

The NTM processes input in sequences, much like an LSTM, but with additional benefits: (1) the external memory makes it easier for the network to learn algorithmic tasks, and (2) it provides larger capacity without increasing the number of trainable parameters.

The external memory allows the NTM to learn algorithmic tasks that are much harder for an LSTM to learn, and to maintain internal state for much longer than a traditional LSTM.
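To illustrate what "differentiable memory access" means in practice, here is a minimal sketch (not the repository's exact API): the memory is a matrix, and the attention weights w are assumed to be non-negative and sum to one, so reads and writes are smooth functions of the network's outputs.

    import torch

    # Minimal sketch of NTM-style differentiable memory access (not the repo's exact API).
    # memory: (N, M) matrix of N slots; w: (N,) attention weights that sum to 1.
    def memory_read(memory, w):
        # A read is a convex combination of memory rows, so it is differentiable
        # with respect to both the weights and the memory contents.
        return torch.matmul(w, memory)               # shape: (M,)

    def memory_write(memory, w, e, a):
        # Erase-then-add update (NTM paper, section 3.2); e and a are (M,) vectors.
        erase = w.unsqueeze(-1) * e.unsqueeze(0)     # outer product, shape (N, M)
        add = w.unsqueeze(-1) * a.unsqueeze(0)
        return memory * (1 - erase) + add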

A PyTorch Implementation

This repository implements a vanilla NTM in a straightforward way. The following architecture is used:

NTM Architecture

Features

  • Batch learning support
  • Numerically stable
  • Flexible head configuration - use X read heads and Y write heads and specify the order of operation
  • Copy and repeat-copy experiments agree with the paper

Copy Task

The Copy task tests the NTM's ability to store and recall a long sequence of arbitrary information. The input to the network is a random sequence of bits ending with a delimiter. The sequence lengths are randomized between 1 and 20.
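For illustration, here is a hypothetical sketch of how a single copy-task sample could be generated; the repository's dataloader differs in details such as batching and channel layout.

    import torch

    def make_copy_sample(seq_len, seq_width=8):
        # Random bit sequence, plus an extra channel used only for the delimiter.
        seq = torch.bernoulli(torch.full((seq_len, seq_width), 0.5))
        inp = torch.zeros(seq_len + 1, seq_width + 1)
        inp[:seq_len, :seq_width] = seq
        inp[seq_len, seq_width] = 1.0    # delimiter marks the end of the input
        target = seq.clone()             # the network must reproduce the sequence
        return inp, target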

Training

Training convergence for the copy task using 4 different seeds (see the notebook for details)

NTM Convergence

The following plot shows the cost per sequence length during training. The network was trained with seed=10 and shows fast convergence. Other seeds may not perform as well but should converge in less than 30K iterations.

NTM Convergence

Evaluation

Here is an animated GIF that shows how the model generalizes. The model was evaluated after every 500 training samples, using the target sequence shown in the upper part of the image. The bottom part shows the network output at each training stage.

Copy Task

The following is the same, but with sequence length = 80. Note that the network was trained with sequences of lengths 1 to 20.

Copy Task


Repeat Copy Task

The Repeat Copy task tests whether the NTM can learn a simple nested function and invoke it by learning to execute a for loop. The input to the network is a random sequence of bits, followed by a delimiter and a scalar value that represents the number of repetitions to output. The number of repetitions was normalized to have zero mean and a variance of one (as in the paper). Both the length of the sequence and the number of repetitions are randomized between 1 and 10.
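For illustration, a hypothetical sketch of one repeat-copy sample. The normalization constants are assumptions (the mean and standard deviation of a uniform repetition count over 1..10); the repository's dataloader may differ in layout and details.

    import torch

    def make_repeat_copy_sample(seq_len, reps, seq_width=8, reps_mean=5.5, reps_std=2.87):
        seq = torch.bernoulli(torch.full((seq_len, seq_width), 0.5))
        inp = torch.zeros(seq_len + 2, seq_width + 2)
        inp[:seq_len, :seq_width] = seq
        inp[seq_len, seq_width] = 1.0                                    # delimiter channel
        inp[seq_len + 1, seq_width + 1] = (reps - reps_mean) / reps_std  # normalized repetition count
        target = seq.repeat(reps, 1)                                     # sequence repeated `reps` times
        return inp, target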

Training

Training convergence for the repeat-copy task using 4 different seeds (see the notebook for details)

NTM Convergence

Evaluation

The following image shows the input presented to the network: a sequence of bits + delimiter + num-reps scalar. Specifically, the sequence length here is eight and the number of repetitions is five.

Repeat Copy Task

And here is the output the network predicted:

Repeat Copy Task

Here is an animated GIF that shows how the network learns to predict the targets. Specifically, the network was evaluated at each checkpoint saved during training, using the same input sequence.

Repeat Copy Task

Installation

The NTM can be used as a reusable module; it is not currently packaged, though.

  1. Clone the repository
  2. Install PyTorch
  3. pip install -r requirements.txt

Usage

Execute ./train.py

usage: train.py [-h] [--seed SEED] [--task {copy,repeat-copy}] [-p PARAM]
                [--checkpoint-interval CHECKPOINT_INTERVAL]
                [--checkpoint-path CHECKPOINT_PATH]
                [--report-interval REPORT_INTERVAL]

optional arguments:
  -h, --help            show this help message and exit
  --seed SEED           Seed value for RNGs
  --task {copy,repeat-copy}
                        Choose the task to train (default: copy)
  -p PARAM, --param PARAM
                        Override model params. Example: "-pbatch_size=4
                        -pnum_heads=2"
  --checkpoint-interval CHECKPOINT_INTERVAL
                        Checkpoint interval (default: 1000). Use 0 to disable
                        checkpointing
  --checkpoint-path CHECKPOINT_PATH
                        Path for saving checkpoint data (default: './')
  --report-interval REPORT_INTERVAL
                        Reporting interval

pytorch-ntm's People

Contributors

loudinthecloud, marikgoldstein


pytorch-ntm's Issues

Why do the read vector and memory need to be initialized?

Dear author:
I found that you initialize the read vector and memory as:

    self.register_buffer('mem_bias', torch.Tensor(N, M))

    # Initialize memory bias
    stdev = 1 / (np.sqrt(N + M))
    nn.init.uniform_(self.mem_bias, -stdev, stdev)

and

    init_r_bias = torch.randn(1, M).to('cuda') * 0.01
    # The initial value of the read vector is not optimized.
    self.register_buffer("read{}_bias".format(self.num_read_heads), init_r_bias)

I wonder whether the initialization scheme makes a big difference, or whether I could just initialize everything with torch.zeros().

Cannot reproduce README.md graphs/results on copy task [Commit: 5c5ce66]

Hello,

I'm just trying to reproduce the results/graphs shown in the README.md for the "copy" task.

I am running this on the latest master branch:
Commit 5c5ce66376e8032c38ef4327ca381fee145f4d0f

How I trained my model:

./train.py --seed 1000 --task copy --checkpoint-interval 500 --checkpoint-path ./notebooks/copy -pbatch_size=15

NOTE: I used a batch_size of 15 instead of the default of 1, since it seems to lead to more stable convergence rates.

I then used the Python notebook to generate the 3 graphs shown in the README.md.
For convenience when comparing, I've included both the graphs I got and the graphs I expect (taken from the README.md).

Graph 1: Training convergence

I got:
image
I expect:
image

Graph 2: Training convergence (per sequence length)

I got:
image
I expect:
image

Graph 3: Evaluate

I got:
image

(The expected result is that the Outputs match the Targets.)

Setup information

My setup is:

  • OS: Ubuntu 16.04.4 LTS
  • Python version: 3.6.4
  • PyTorch is installed using Anaconda, pip freeze reports the version as:
    torch==0.3.0.post4
  • CUDA/cuDNN versions/libraries in use by pytorch at runtime:
/usr/lib/x86_64-linux-gnu/libcuda.so.384.111
/usr/local/cuda-9.0/lib64/libcublas.so.9.0.176
/usr/local/cuda-9.0/lib64/libcudart.so.9.0.176
/usr/local/cuda-9.0/lib64/libcudnn.so.7.0.5
/usr/local/cuda-9.0/lib64/libcurand.so.9.0.176
/usr/local/cuda-9.0/lib64/libcusparse.so.9.0.176
/usr/local/cuda-9.0/lib64/libnvrtc.so.9.0.176
/usr/local/cuda-9.0/lib64/libnvToolsExt.so.1.0.0

Let me know if there is any other information I can provide.

register_buffer

I have an issue regarding training: when I launch the program I get:

KeyError: "attribute 'mem_bias' already exists"

It seems that mem_bias is defined twice in memory.py:

self.mem_bias = Variable(torch.Tensor(N, M))
self.register_buffer('mem_bias', self.mem_bias.data)

Perhaps it is a problem with the PyTorch version (I have PyTorch 0.3).

How to change the code when sequence lengths differ

Dear sir:
Sorry for my rude words, but I really want to know how to change the code when the sequence lengths are different. It seems that the NTM needs the sequence lengths to be the same.
I tried padding short sequences with zeros, but the NTM uses attention, and with zero padding the prediction rate drops.
I want to know what I can do to overcome this problem. Please help me.
Best wishes to you.

Why create a new tensor?

Hi, dear author:

    def write(self, w, e, a):
        """write to memory (according to section 3.2)."""
        self.prev_mem = self.memory
        self.memory = Variable(torch.Tensor(self.batch_size, self.N, self.M))
        erase = torch.matmul(w.unsqueeze(-1), e.unsqueeze(1))
        add = torch.matmul(w.unsqueeze(-1), a.unsqueeze(1))
        self.memory = self.prev_mem * (1 - erase) + add

In your write method, I don't understand why you create a new Variable(torch.Tensor(self.batch_size, self.N, self.M)) and then assign the new value.
Why not write the following directly:

    def write(self, w, e, a):
        """write to memory (according to section 3.2).""" 
        erase = torch.matmul(w.unsqueeze(-1), e.unsqueeze(1))
        add = torch.matmul(w.unsqueeze(-1), a.unsqueeze(1))
        self.memory = self.memory * (1 - erase) + add

Convergence is really slow on the copy task when the sequence length is smaller

Hi,

I have tried to run the copy task with the default parameters (controller_size=100, controller_layers=1, num_heads=1, sequence_width=8, sequence_min_len=1, sequence_max_len=20, memory_n=128, memory_m=20, batch_size=1), and the result is similar to the one in the notebook. However, when I changed the sequence length to a smaller range (sequence_min_len=1, sequence_max_len=5), convergence is really slow (like the figure below), which is unexpected, since smaller sequences should be learned faster. Do you have any idea why this happens and how to train on smaller sequences properly? Any suggestion is welcome.

figure_1

Error in copy-task-plots.ipynb

When I run this piece of code:

seq_len = 60
_, x, y = next(iter(dataloader(1, 1, 8, seq_len, seq_len)))
result = evaluate(model.net, model.criterion, x, y)
y_out = result['y_out']

I get the following error:

IndexError                                Traceback (most recent call last)
<ipython-input-41-127bd44fb490> in <module>()
      1 seq_len = 60
      2 _, x, y = next(iter(dataloader(1, 1, 8, seq_len, seq_len)))
----> 3 result = evaluate(model.net, model.criterion, x, y)
      4 y_out = result['y_out']

D:\GithubProjs\pytorch-ntm-master\train.py in evaluate(net, criterion, X, Y)
    151 
    152     result = {
--> 153         'loss': loss.data[0],
    154         'cost': cost / batch_size,
    155         'y_out': y_out,

IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

How can I solve it?
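A likely fix, suggested by the error message itself, is to read the 0-dim loss tensor with .item() instead of indexing into .data (a sketch, assuming the surrounding code matches the traceback above):

    # In train.py's evaluate(): on newer PyTorch versions the loss is a 0-dim
    # tensor, so read it with .item() instead of loss.data[0].
    result = {
        'loss': loss.item(),
        'cost': cost / batch_size,
        'y_out': y_out,
    }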

Why does each batch have its own memory?

Dear author:
Your NTM code is nice work, and its structure is concise and easy to follow. But I am a little confused about why each batch has its own memory. Why not use a single memory for every batch, just like an LSTM but with an expanded memory cell size? Could you help me address this confusion? Thank you very much!

Different results between testing in the mid terms of training and at the end of training

Dear author,

I've forked a repo (at https://github.com/marcwww/pytorch-ntm) from your work, mainly to test the model on longer sequences (for example, training on sequences of lengths 1 to 10 and testing on sequences of lengths 11 to 20).

The issue is that the final test result after training without any mid-training evaluation differs from the result when evaluation is also run during training. The experimental setting in the repo is the latter one (at https://github.com/marcwww/pytorch-ntm/blob/1d0595e165a6790219df76e0b7f13b48e406b4d9/train_test.py#L236).

In the forked repo, batches for testing are sampled in the same way as those for training (at https://github.com/marcwww/pytorch-ntm/blob/1d0595e165a6790219df76e0b7f13b48e406b4d9/tasks/copytask_test.py#L16). I've tried to check whether the results come from the interleaved sampling of training and testing by loading a pre-generated test set, and it does not help.

Could you please help me with this? Thanks a lot.

Why use vague (soft) storage?

Dear sir:
Looking at the code, each input batch of shape B x C is stored as B x N x M. All N weights sum to 1, so C is effectively split into N weighted pieces. This is what I call vague (soft) storage. I don't understand why this kind of storage is used. What is its advantage?

What's the meaning of using memory?

Dear sir:
I have read your code and I really appreciate your work, but I have some questions.

  1. self.register_buffer('mem_bias', torch.Tensor(N, M))  # mem_bias is used as a buffer, which means it will not be updated
  2. self.memory = self.mem_bias.clone().repeat(batch_size, 1, 1)  # self.memory is created from mem_bias to match the batch size
  3. For each batch, we run init_sequence(), which resets the memory; the reset function,
    self.batch_size = batch_size
    self.memory = self.mem_bias.clone().repeat(batch_size, 1, 1)

simply clears all the content in the memory and re-initializes it with mem_bias.
So what is the point of writing to and reading from memory? It just becomes the same as mem_bias for each batch, and mem_bias is not updated, which means it never changes.
I just cannot figure it out, and I would really appreciate it if you could answer my question.
