
pytorch-ntm's Introduction

PyTorch Neural Turing Machine (NTM)

PyTorch implementation of Neural Turing Machines (NTM).

An NTM is a memory-augmented neural network (attached to external memory) in which the interactions with the external memory (addressing, reading, writing) are done through differentiable transformations. As a result, the network is end-to-end differentiable and can be trained with a gradient-based optimizer.

The NTM processes input in sequences, much like an LSTM, but with additional benefits: (1) the external memory makes it easier for the network to learn algorithmic tasks, and (2) it provides larger capacity without increasing the number of trainable parameters.

The external memory allows the NTM to learn algorithmic tasks that are much harder for an LSTM to learn, and to maintain internal state for much longer than a traditional LSTM.
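To illustrate what "differentiable memory access" means in practice, here is a minimal sketch (not the repository's exact API): the memory is a matrix, and the attention weights w are assumed to be non-negative and sum to one, so reads and writes are smooth functions of the network's outputs.

    import torch

    # Minimal sketch of NTM-style differentiable memory access (not the repo's exact API).
    # memory: (N, M) matrix of N slots; w: (N,) attention weights that sum to 1.
    def memory_read(memory, w):
        # A read is a convex combination of memory rows, so it is differentiable
        # with respect to both the weights and the memory contents.
        return torch.matmul(w, memory)               # shape: (M,)

    def memory_write(memory, w, e, a):
        # Erase-then-add update (NTM paper, section 3.2); e and a are (M,) vectors.
        erase = w.unsqueeze(-1) * e.unsqueeze(0)     # outer product, shape (N, M)
        add = w.unsqueeze(-1) * a.unsqueeze(0)
        return memory * (1 - erase) + add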

A PyTorch Implementation

This repository implements a vanilla NTM in a straightforward way. The following architecture is used:

NTM Architecture

Features

  • Batch learning support
  • Numerically stable
  • Flexible head configuration - use X read heads and Y write heads and specify the order of operation
  • Copy and repeat-copy experiments agree with the paper

Copy Task

The Copy task tests the NTM's ability to store and recall a long sequence of arbitrary information. The input to the network is a random sequence of bits ending with a delimiter. The sequence lengths are randomized between 1 and 20.
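For illustration, here is a hypothetical sketch of how a single copy-task sample could be generated; the repository's dataloader differs in details such as batching and channel layout.

    import torch

    def make_copy_sample(seq_len, seq_width=8):
        # Random bit sequence, plus an extra channel used only for the delimiter.
        seq = torch.bernoulli(torch.full((seq_len, seq_width), 0.5))
        inp = torch.zeros(seq_len + 1, seq_width + 1)
        inp[:seq_len, :seq_width] = seq
        inp[seq_len, seq_width] = 1.0    # delimiter marks the end of the input
        target = seq.clone()             # the network must reproduce the sequence
        return inp, target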

Training

Training convergence for the copy task using 4 different seeds (see the notebook for details)

NTM Convergence

The following plot shows the cost per sequence length during training. The network was trained with seed=10 and shows fast convergence. Other seeds may not perform as well but should converge in less than 30K iterations.

NTM Convergence

Evaluation

Here is an animated GIF that shows how the model generalizes. The model was evaluated after every 500 training samples, using the target sequence shown in the upper part of the image. The bottom part shows the network output at each training stage.

Copy Task

The following is the same, but with sequence length = 80. Note that the network was trained with sequences of lengths 1 to 20.

Copy Task


Repeat Copy Task

The Repeat Copy task tests whether the NTM can learn a simple nested function and invoke it by learning to execute a for loop. The input to the network is a random sequence of bits, followed by a delimiter and a scalar value that represents the number of repetitions to output. The number of repetitions was normalized to have zero mean and a variance of one (as in the paper). Both the length of the sequence and the number of repetitions are randomized between 1 and 10.
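For illustration, a hypothetical sketch of one repeat-copy sample. The normalization constants are assumptions (the mean and standard deviation of a uniform repetition count over 1..10); the repository's dataloader may differ in layout and details.

    import torch

    def make_repeat_copy_sample(seq_len, reps, seq_width=8, reps_mean=5.5, reps_std=2.87):
        seq = torch.bernoulli(torch.full((seq_len, seq_width), 0.5))
        inp = torch.zeros(seq_len + 2, seq_width + 2)
        inp[:seq_len, :seq_width] = seq
        inp[seq_len, seq_width] = 1.0                                    # delimiter channel
        inp[seq_len + 1, seq_width + 1] = (reps - reps_mean) / reps_std  # normalized repetition count
        target = seq.repeat(reps, 1)                                     # sequence repeated `reps` times
        return inp, target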

Training

Training convergence for the repeat-copy task using 4 different seeds (see the notebook for details)

NTM Convergence

Evaluation

The following image shows the input presented to the network: a sequence of bits + delimiter + num-reps scalar. Specifically, the sequence length here is eight and the number of repetitions is five.

Repeat Copy Task

And here is the output the network predicted:

Repeat Copy Task

Here is an animated GIF that shows how the network learns to predict the targets. Specifically, the network was evaluated at each checkpoint saved during training, using the same input sequence.

Repeat Copy Task

Installation

The NTM can be used as a reusable module; it is not currently packaged, though.

  1. Clone the repository
  2. Install PyTorch
  3. pip install -r requirements.txt

Usage

Execute ./train.py

usage: train.py [-h] [--seed SEED] [--task {copy,repeat-copy}] [-p PARAM]
                [--checkpoint-interval CHECKPOINT_INTERVAL]
                [--checkpoint-path CHECKPOINT_PATH]
                [--report-interval REPORT_INTERVAL]

optional arguments:
  -h, --help            show this help message and exit
  --seed SEED           Seed value for RNGs
  --task {copy,repeat-copy}
                        Choose the task to train (default: copy)
  -p PARAM, --param PARAM
                        Override model params. Example: "-pbatch_size=4
                        -pnum_heads=2"
  --checkpoint-interval CHECKPOINT_INTERVAL
                        Checkpoint interval (default: 1000). Use 0 to disable
                        checkpointing
  --checkpoint-path CHECKPOINT_PATH
                        Path for saving checkpoint data (default: './')
  --report-interval REPORT_INTERVAL
                        Reporting interval

pytorch-ntm's People

Contributors

loudinthecloud, marikgoldstein


pytorch-ntm's Issues

Why do the read vector and memory need to be initialized?

Dear author:
I found that you initialize the read vector and memory as:

    self.register_buffer('mem_bias', torch.Tensor(N, M))

    # Initialize memory bias
    stdev = 1 / (np.sqrt(N + M))
    nn.init.uniform_(self.mem_bias, -stdev, stdev)

and

    init_r_bias = torch.randn(1, M).to('cuda') * 0.01
    # The initial value of the read vector is not optimized.
    self.register_buffer("read{}_bias".format(self.num_read_heads), init_r_bias)

I wonder whether the initialization scheme makes a big difference, or whether I could just initialize everything with torch.zeros().

Cannot reproduce README.md graphs/results on copy task [Commit: 5c5ce66]

Hello,

I'm just trying to reproduce the results/graphs shown in the README.md for the "copy" task.

I am running this on the latest master branch:
Commit 5c5ce66376e8032c38ef4327ca381fee145f4d0f

How I trained my model:

./train.py --seed 1000 --task copy --checkpoint-interval 500 --checkpoint-path ./notebooks/copy -pbatch_size=15

NOTE: I used a batch_size of 15 instead of the default of 1, since it seems to lead to more stable convergence rates.

I then used the Python notebook to generate the 3 graphs shown in the README.md.
For convenience when comparing, I've included both the graphs I got and the graphs I expect (taken from the README.md).

Graph 1: Training convergence

I got:
image
I expect:
image

Graph 2: Training convergence (per sequence length)

I got:
image
I expect:
image

Graph 3: Evaluate

I got:
image

(The expected result is that the Outputs match the Targets.)

Setup information

My setup is:

  • OS: Ubuntu 16.04.4 LTS
  • Python version: 3.6.4
  • PyTorch is installed using Anaconda, pip freeze reports the version as:
    torch==0.3.0.post4
  • CUDA/cuDNN versions/libraries in use by pytorch at runtime:
/usr/lib/x86_64-linux-gnu/libcuda.so.384.111
/usr/local/cuda-9.0/lib64/libcublas.so.9.0.176
/usr/local/cuda-9.0/lib64/libcudart.so.9.0.176
/usr/local/cuda-9.0/lib64/libcudnn.so.7.0.5
/usr/local/cuda-9.0/lib64/libcurand.so.9.0.176
/usr/local/cuda-9.0/lib64/libcusparse.so.9.0.176
/usr/local/cuda-9.0/lib64/libnvrtc.so.9.0.176
/usr/local/cuda-9.0/lib64/libnvToolsExt.so.1.0.0

Let me know if there is any other information I can provide.

register_buffer

I have an issue regarding training: when I launch the program I get:

KeyError: "attribute 'mem_bias' already exists"

It seems that mem_bias is defined twice in memory.py:

self.mem_bias = Variable(torch.Tensor(N, M))
self.register_buffer('mem_bias', self.mem_bias.data)

Perhaps it is a problem with the PyTorch version (I have PyTorch 0.3).

How to change the code when sequence lengths differ

Dear sir:
Sorry for my rude words, but I really want to know how to change the code when the sequence lengths are different. It seems that the NTM needs the sequence lengths to be the same.
I tried padding short sequences with zeros, but the NTM uses attention, and with zero padding the prediction rate drops.
I want to know what I can do to overcome this problem. Please help me.
Best wishes to you.

Why create a new tensor?

Hi, dear author:

    def write(self, w, e, a):
        """write to memory (according to section 3.2)."""
        self.prev_mem = self.memory
        self.memory = Variable(torch.Tensor(self.batch_size, self.N, self.M))
        erase = torch.matmul(w.unsqueeze(-1), e.unsqueeze(1))
        add = torch.matmul(w.unsqueeze(-1), a.unsqueeze(1))
        self.memory = self.prev_mem * (1 - erase) + add

In your write method, I don't understand why you create a new Variable(torch.Tensor(self.batch_size, self.N, self.M)) and then assign the new value.
Why not write the following directly:

    def write(self, w, e, a):
        """write to memory (according to section 3.2).""" 
        erase = torch.matmul(w.unsqueeze(-1), e.unsqueeze(1))
        add = torch.matmul(w.unsqueeze(-1), a.unsqueeze(1))
        self.memory = self.memory * (1 - erase) + add

Convergence is really slow on the copy task when the sequence length is smaller

Hi,

I have tried to run the copy task with the default parameters (controller_size=100, controller_layers=1, num_heads=1, sequence_width=8, sequence_min_len=1, sequence_max_len=20, memory_n=128, memory_m=20, batch_size=1), and the result is similar to the one in the notebook. However, when I changed the sequence length to a smaller range (sequence_min_len=1, sequence_max_len=5), convergence is really slow (like the figure below), which is unexpected, since smaller sequences should be learned faster. Do you have any idea why this happens and how to train on smaller sequences properly? Any suggestion is welcome.

figure_1

Error in copy-task-plots.ipynb

When I run this piece of code:

seq_len = 60
_, x, y = next(iter(dataloader(1, 1, 8, seq_len, seq_len)))
result = evaluate(model.net, model.criterion, x, y)
y_out = result['y_out']

I get the following error:

IndexError                                Traceback (most recent call last)
<ipython-input-41-127bd44fb490> in <module>()
      1 seq_len = 60
      2 _, x, y = next(iter(dataloader(1, 1, 8, seq_len, seq_len)))
----> 3 result = evaluate(model.net, model.criterion, x, y)
      4 y_out = result['y_out']

D:\GithubProjs\pytorch-ntm-master\train.py in evaluate(net, criterion, X, Y)
    151 
    152     result = {
--> 153         'loss': loss.data[0],
    154         'cost': cost / batch_size,
    155         'y_out': y_out,

IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

How can I solve it?
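A likely fix, suggested by the error message itself, is to read the 0-dim loss tensor with .item() instead of indexing into .data (a sketch, assuming the surrounding code matches the traceback above):

    # In train.py's evaluate(): on newer PyTorch versions the loss is a 0-dim
    # tensor, so read it with .item() instead of loss.data[0].
    result = {
        'loss': loss.item(),
        'cost': cost / batch_size,
        'y_out': y_out,
    }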

Why does each batch have its own memory?

Dear author:
Your NTM code is nice work, and its structure is concise and easy to follow. But I am a little confused about why each batch has its own memory. Why not use a single memory for every batch, just like an LSTM but with an expanded memory cell size? Could you help me address this confusion? Thank you very much!

Different results between testing in the mid terms of training and at the end of training

Dear author,

I've forked a repo (at https://github.com/marcwww/pytorch-ntm) from your work, mainly to test the model on longer sequences (for example, training on sequences of lengths 1 to 10 and testing on sequences of lengths 11 to 20).

The issue is that the final test result after training without any mid-training evaluation differs from the result when evaluation is also run during training. The experimental setting in the repo is the latter one (at https://github.com/marcwww/pytorch-ntm/blob/1d0595e165a6790219df76e0b7f13b48e406b4d9/train_test.py#L236).

In the forked repo, batches for testing are sampled in the same way as those for training (at https://github.com/marcwww/pytorch-ntm/blob/1d0595e165a6790219df76e0b7f13b48e406b4d9/tasks/copytask_test.py#L16). I've tried to check whether the results come from the interleaved sampling of training and testing by loading a pre-generated test set, and it does not help.

Could you please help me with this? Thanks a lot.

Why use vague (soft) storage?

Dear sir:
Looking at the code, each input batch of shape B x C is stored as B x N x M. All N weights sum to 1, so C is effectively split into N weighted pieces. This is what I call vague (soft) storage. I don't understand why this kind of storage is used. What is its advantage?

What's the meaning of using memory?

Dear sir:
I have read your code and I really appreciate your work, but I have some questions.

  1. self.register_buffer('mem_bias', torch.Tensor(N, M))  # mem_bias is used as a buffer, which means it will not be updated
  2. self.memory = self.mem_bias.clone().repeat(batch_size, 1, 1)  # self.memory is created from mem_bias to match the batch size
  3. For each batch, we run init_sequence(), which resets the memory; the reset function,
    self.batch_size = batch_size
    self.memory = self.mem_bias.clone().repeat(batch_size, 1, 1)

simply clears all the content in the memory and re-initializes it with mem_bias.
So what is the point of writing to and reading from memory? It just becomes the same as mem_bias for each batch, and mem_bias is not updated, which means it never changes.
I just cannot figure it out, and I would really appreciate it if you could answer my question.
