dennybritz / rnn-tutorial-rnnlm



Recurrent Neural Network Tutorial, Part 2 - Implementing a RNN in Python and Theano

License: Apache License 2.0

Python 18.85% Jupyter Notebook 81.15%

rnn-tutorial-rnnlm's Introduction

Please read the blog post that goes with this code!

Jupyter Notebook Setup

System Requirements:

Python 2.7, pip
(Optional) virtualenv

To start the Jupyter Notebook:

# Clone the repo
git clone https://github.com/dennybritz/rnn-tutorial-rnnlm
cd rnn-tutorial-rnnlm

# Create a new virtual environment (optional, but recommended)
virtualenv venv
source venv/bin/activate

# Install requirements
pip install -r requirements.txt
# Start the notebook server
jupyter notebook

Setting up a CUDA-enabled GPU instance on EC2:

# Install build tools
sudo apt-get update
sudo apt-get install -y build-essential git python-pip libfreetype6-dev libxft-dev libncurses-dev libopenblas-dev  gfortran python-matplotlib libblas-dev liblapack-dev libatlas-base-dev python-dev python-pydot linux-headers-generic linux-image-extra-virtual
sudo pip install -U pip

# Install CUDA 7
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1410/x86_64/cuda-repo-ubuntu1410_7.0-28_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1410_7.0-28_amd64.deb
sudo apt-get update
sudo apt-get install -y cuda
sudo reboot

# Clone the repo and install requirements
git clone https://github.com/dennybritz/rnn-tutorial-rnnlm
cd rnn-tutorial-rnnlm
sudo pip install -r requirements.txt

# Set Environment variables
export CUDA_ROOT=/usr/local/cuda-7.0
export PATH=$PATH:$CUDA_ROOT/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CUDA_ROOT/lib64
export THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32
# For profiling only
export CUDA_LAUNCH_BLOCKING=1

# Start the Jupyter notebook server
jupyter notebook

To start a public notebook server that is accessible over the network you can follow the official instructions.


rnn-tutorial-rnnlm's Issues

Some basic errors I am getting with code while loading and testing the model.

Great tutorial, I'm learning a lot reading through it! I collected the code from the tutorial as follows, but I don't understand how you got it to work, because I get some very basic errors (I am new to RNNs):

vocabulary_size = 8000
unknown_token = "UNKNOWN_TOKEN"
sentence_start_token = "SENTENCE_START"
sentence_end_token = "SENTENCE_END"

import numpy as np
from rnn_theano import RNNTheano
from utils import load_model_parameters_theano, save_model_parameters_theano

word_to_index = dict([(w,i) for i,w in enumerate(index_to_word)])

model = RNNTheano(vocabulary_size, hidden_dim=50)
load_model_parameters_theano('./data/trained-model-theano.npz', model)

def generate_sentence(model):
    # We start the sentence with the start token
    new_sentence = [word_to_index[sentence_start_token]]
    # Repeat until we get an end token
    while not new_sentence[-1] == word_to_index[sentence_end_token]:
        next_word_probs = model.forward_propagation(new_sentence)
        sampled_word = word_to_index[unknown_token]
        # We don't want to sample unknown words
        while sampled_word == word_to_index[unknown_token]:
            samples = np.random.multinomial(1, next_word_probs[-1])
            sampled_word = np.argmax(samples)
        new_sentence.append(sampled_word)
    sentence_str = [index_to_word[x] for x in new_sentence[1:-1]]
    return sentence_str

num_sentences = 10
senten_min_length = 7

for i in range(num_sentences):
    sent = []
    # We want long sentences, not sentences with one or two words
    while len(sent) < senten_min_length:
        sent = generate_sentence(model)
    print " ".join(sent)

It's probably a silly error, but I can't figure out a fix. I would greatly appreciate it if you could help me test the model.

Traceback (most recent call last):
  File "rnn_test.py", line 11, in <module>
    word_to_index = dict([(w,i) for i,w in enumerate(index_to_word)])
NameError: name 'index_to_word' is not defined
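The NameError happens because index_to_word is never defined in this script: in the tutorial it comes from the preprocessing step that builds the vocabulary from the tokenized training sentences. Below is a minimal sketch of that step, assuming you have the same tokenized sentences used for training (toy data here). It uses collections.Counter in place of NLTK's frequency counting, and build_vocabulary is a hypothetical helper name. The indices only line up with a saved model if the vocabulary is rebuilt from the same data in the same way; ties between equally frequent words may be ordered differently by different counting implementations.

```python
from collections import Counter
import itertools

def build_vocabulary(tokenized_sentences, vocabulary_size, unknown_token="UNKNOWN_TOKEN"):
    """Build index_to_word and word_to_index from tokenized sentences.

    Keeps the vocabulary_size - 1 most frequent words and reserves the
    last index for unknown_token.
    """
    word_freq = Counter(itertools.chain(*tokenized_sentences))
    vocab = word_freq.most_common(vocabulary_size - 1)
    index_to_word = [word for word, _ in vocab]
    index_to_word.append(unknown_token)
    word_to_index = dict((w, i) for i, w in enumerate(index_to_word))
    return index_to_word, word_to_index

# Toy corpus for illustration only; in practice use the tokenized training data.
toy_sentences = [["SENTENCE_START", "the", "cat", "SENTENCE_END"],
                 ["SENTENCE_START", "the", "dog", "SENTENCE_END"]]
index_to_word, word_to_index = build_vocabulary(toy_sentences, vocabulary_size=4)
```

With index_to_word defined this way before the word_to_index line, the rest of the script can run, provided the vocabulary matches the one the saved model was trained on.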

pip install -r requirements.txt fails with "This backport is for Python 2.7 only." for functools32

When running pip install -r requirements.txt, the following error occurred:
Collecting appnope==0.1.0 (from -r requirements.txt (line 1))
Using cached appnope-0.1.0-py2.py3-none-any.whl
Collecting backports.ssl-match-hostname==3.4.0.2 (from -r requirements.txt (line 2))
Using cached backports.ssl_match_hostname-3.4.0.2.tar.gz
Collecting certifi==2015.9.6.2 (from -r requirements.txt (line 3))
Using cached certifi-2015.9.6.2-py2.py3-none-any.whl
Collecting decorator==4.0.2 (from -r requirements.txt (line 4))
Using cached decorator-4.0.2-py2.py3-none-any.whl
Collecting funcsigs==0.4 (from -r requirements.txt (line 5))
Using cached funcsigs-0.4-py2.py3-none-any.whl
Collecting functools32==3.2.3.post2 (from -r requirements.txt (line 6))
Using cached functools32-3.2.3-2.zip
Complete output from command python setup.py egg_info:
This backport is for Python 2.7 only.

----------------------------------------

Command "python setup.py egg_info" failed with error code 1 in C:\Users\lud\AppData\Local\Temp\pip-build-oboinpbr\functools32\

My environment is
Python: 3.6.4
Pip: 9.0.1

Any suggestion?
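functools32 backports Python 3's functools to Python 2.7, so it refuses to install on Python 3.6, and the pinned requirements appear to target Python 2.7 (the tutorial code also uses Python 2 print statements). The most straightforward route is probably a Python 2.7 virtualenv, for example virtualenv -p python2.7 venv. If you want to try Python 3 anyway, one option is a PEP 508 environment marker so pip skips the backport on Python 3. The sketch below rewrites requirement lines that way; add_py2_markers is a hypothetical helper, and the PY2_ONLY set is an assumption containing only the package named in the error. Other 2015-era pins may still fail on Python 3.6.

```python
import re

# Packages assumed to be Python-2-only backports. functools32 is the one named
# in the error above; extend the set if other packages fail the same way.
PY2_ONLY = {"functools32"}

def add_py2_markers(lines, py2_only=PY2_ONLY):
    """Append a PEP 508 marker so pip skips Python-2-only packages on Python 3."""
    out = []
    for line in lines:
        stripped = line.strip()
        # The package name is everything before the first version operator,
        # marker separator, extras bracket or whitespace.
        name = re.split(r"[<>=!~;\s\[]", stripped, maxsplit=1)[0].lower()
        if name in py2_only and ";" not in stripped:
            stripped += ' ; python_version < "3"'
        out.append(stripped)
    return out
```

Usage: pass it open("requirements.txt").read().splitlines(), write the result to a new file, and install from that file instead.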

Not able to get data from the CSV file to train the network in train-theano.py

Hey, I'm having trouble getting the data to train the RNN, specifically on this line:
sentences = itertools.chain(*[nltk.sent_tokenize(x[0].decode('utf-8').lower()) for x in reader])
If I open the file as 'rb', I get the error:

_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

and if I open it with 'r', I get:

sentences = itertools.chain(*[nltk.sent_tokenize(x[0].decode('utf-8').lower()) for x in reader])
AttributeError: 'str' object has no attribute 'decode'

I'm not sure whether the basic idea is to train the network on strings or on binary data (I'd guess binary).
Thanks for your time!
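On Python 3, csv.reader needs a file opened in text mode, and the rows it returns are already str, so the .decode('utf-8') call has to go; the text is tokenized as strings either way. Below is a possible Python 3 version of that reading step, assuming the comment text is in the first column and the first row is a header. read_comment_sentences is a hypothetical name, and the sent_tokenize parameter is meant to be nltk.sent_tokenize; it is passed in so the function can be tried without NLTK's tokenizer data.

```python
import csv
import itertools

def read_comment_sentences(path, sent_tokenize):
    """Read comments from a CSV file and split them into lowercased sentences.

    The file is opened in text mode with an explicit encoding, so csv.reader
    yields str rows and no .decode('utf-8') call is needed.
    """
    with open(path, "r", encoding="utf-8", newline="") as f:
        reader = csv.reader(f, skipinitialspace=True)
        next(reader)  # skip the header row
        # Empty rows (e.g. blank lines) are skipped; x[0] would raise IndexError on them.
        return list(itertools.chain(*[sent_tokenize(x[0].lower()) for x in reader if x]))
```

In the training script this would be called as read_comment_sentences('data/reddit-text-file.csv', nltk.sent_tokenize), where the path is a placeholder for whatever file you use.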

Error: scan() got an unexpected keyword argument 'strict'

When I run RNNTheano in the RNNLM IPython notebook, the following error occurs:
TypeError Traceback (most recent call last)
in ()
2 # To avoid performing millions of expensive calculations we use a smaller vocabulary size for checking.
3 grad_check_vocab_size = 5
----> 4 model = RNNTheano(grad_check_vocab_size, 10)
5 gradient_check_theano(model, [0,1,2,3], [1,2,3,4])

/home/carrierlxk/rnn-tutorial-rnnlm-master/rnn_theano.py in __init__(self, word_dim, hidden_dim, bptt_truncate)
22 # We store the Theano graph here
23 self.theano = {}
---> 24 self.theano_build()
25
26 def theano_build(self):

/home/carrierlxk/rnn-tutorial-rnnlm-master/rnn_theano.py in theano_build(self)
38 non_sequences=[U, V, W],
39 truncate_gradient=self.bptt_truncate,
---> 40 strict=True)
41
42 prediction = T.argmax(o, axis=1)

TypeError: scan() got an unexpected keyword argument 'strict'

What is the reason?

Why is the timestep not used in forward_propagation?

When I use LSTMs in Keras or TensorFlow, they use different data structures. When I compared this tutorial with Keras, I found that the 'timestep' is only used in backward_propagation, not in forward_propagation. Is this right?
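In Keras an LSTM takes input shaped (batch, timesteps, features), so the time axis is explicit. In the tutorial's NumPy implementation it is implicit: each word of a sentence is one time step, and forward propagation loops over them one by one, so the time steps are used in the forward pass too. The sketch below is a standalone function rather than the tutorial's class method, assuming the tutorial's parameter shapes: U is (hidden_dim, word_dim), V is (word_dim, hidden_dim) and W is (hidden_dim, hidden_dim).

```python
import numpy as np

def softmax(z):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def forward_propagation(x, U, V, W):
    """Run a vanilla RNN over one sentence x, given as a list of word indices.

    Returns o with shape (T, word_dim) and s with shape (T + 1, hidden_dim).
    """
    T = len(x)  # the number of time steps is the sentence length
    hidden_dim, word_dim = U.shape
    # The extra last row s[T] stays zero; s[t - 1] at t = 0 reads it as the initial state.
    s = np.zeros((T + 1, hidden_dim))
    o = np.zeros((T, word_dim))
    for t in range(T):
        # U[:, x[t]] equals U multiplied by the one-hot vector of word x[t].
        s[t] = np.tanh(U[:, x[t]] + W.dot(s[t - 1]))
        o[t] = softmax(V.dot(s[t]))
    return o, s
```

Backpropagation through time then walks the same t indices in reverse, which is why the time steps also appear there.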

I think there is something wrong with the calculate_total_loss function

Hi, since the loss function formula is

$$
L(y,o) = - \frac{1}{N} \sum_{n \in N} y_{n} \log o_{n}
$$

And the code in RNNLM.ipynb is

def calculate_total_loss(self, x, y):
    L = 0
    # For each sentence...
    for i in np.arange(len(y)):
        o, s = self.forward_propagation(x[i])
        # We only care about our prediction of the "correct" words
        correct_word_predictions = o[np.arange(len(y[i])), y[i]]
        # Add to the loss based on how off we were
        L += -1 * np.sum(np.log(correct_word_predictions))
    return L

The line L += -1 * np.sum(np.log(correct_word_predictions)) doesn't seem to take $y_n$, the true labels, into account. Or is $y_n$ simply 1 for the correct word because of the one-hot encoding? Am I right? Thank you!
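With one-hot labels, $y_n$ is 1 for the correct word and 0 for every other word, so the sum of $y_n \log o_n$ over the vocabulary collapses to the log-probability of the correct word. Indexing o with the correct word indices computes exactly that. The sketch below checks the equivalence numerically on made-up probabilities; loss_one_hot and loss_indexed are hypothetical names. Note that calculate_total_loss as written does not divide by N, so it returns the summed loss rather than the average.

```python
import numpy as np

def loss_one_hot(o, y):
    """Cross-entropy written literally as -sum(y_n * log o_n) with one-hot y."""
    Y = np.zeros_like(o)
    Y[np.arange(len(y)), y] = 1.0  # one-hot encode the correct word indices
    return -np.sum(Y * np.log(o))

def loss_indexed(o, y):
    """The same quantity computed as in calculate_total_loss: pick the correct-word probabilities."""
    correct_word_predictions = o[np.arange(len(y)), y]
    return -np.sum(np.log(correct_word_predictions))

# Made-up output distributions for 3 time steps over a 4-word vocabulary.
o = np.array([[0.1, 0.6, 0.2, 0.1],
              [0.3, 0.3, 0.3, 0.1],
              [0.25, 0.25, 0.25, 0.25]])
y = [1, 2, 0]
```

Both functions return the same value; the indexed version just skips multiplying by all the zeros.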

module compiled against API version 0xa but this version of numpy is 0x9

I am new to RNNs. When I run "python train-theano.py", it fails with the following error:
RuntimeError: module compiled against API version 0xa but this version of numpy is 0x9
RuntimeError: module compiled against API version 0xa but this version of numpy is 0x9

I have created a virtualenv and installed the packages from requirements.txt.

ValueError: dimension mismatch in args to gemv (50,50)x(80)->(50)

Hi Denny!

As I mentioned on your RNN tutorial post, I'm facing an error similar to Edwin's. I actually had exactly the same error he had, but I already fixed it: it was related to my GPU, which wasn't being enabled correctly.

Now everything seems to be working fine, and I also tested some Theano snippets to check that everything was OK. However, when I try to execute the sampling code, I get the following error (I'll highlight the important parts after it, so you don't have to look directly at this Cthulhu):

Traceback (most recent call last):
  File "RNNLM.py", line 345, in <module>
    sent = generate_sentence(model)
  File "RNNLM.py", line 328, in generate_sentence
    next_word_probs = model.forward_propagation(new_sentence)
  File "rnn-tutorial-rnnlm/venv/local/lib/python2.7/site-packages/theano/compile/function_module.py", line 606, in __call__
    storage_map=self.fn.storage_map)
  File "rnn-tutorial-rnnlm/venv/local/lib/python2.7/site-packages/theano/compile/function_module.py", line 595, in __call__
    outputs = self.fn()
  File "rnn-tutorial-rnnlm/venv/local/lib/python2.7/site-packages/theano/scan_module/scan_op.py", line 672, in rval
    r = p(n, [x[0] for x in i], o)
  File "rnn-tutorial-rnnlm/venv/local/lib/python2.7/site-packages/theano/scan_module/scan_op.py", line 661, in <lambda>
    self, node)
  File "scan_perform.pyx", line 356, in theano.scan_module.scan_perform.perform (/root/.theano/compiledir_Linux-3.13--generic-x86_64-with-Ubuntu-14.04-trusty-x86_64-2.7.6-64/scan_perform/mod.cpp:3605)
  File "scan_perform.pyx", line 350, in theano.scan_module.scan_perform.perform (/root/.theano/compiledir_Linux-3.13--generic-x86_64-with-Ubuntu-14.04-trusty-x86_64-2.7.6-64/scan_perform/mod.cpp:3537)

ValueError: dimension mismatch in args to gemv (50,50)x(80)->(50)
Apply node that caused the error: GpuGemv{no_inplace}(GpuSubtensor{::, int32}.0, TensorConstant{1.0}, W_copy[cuda], <CudaNdarrayType(float32, vector)>, TensorConstant{1.0})
Inputs types: [CudaNdarrayType(float32, vector), TensorType(float32, scalar), CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, vector), TensorType(float32, scalar)]

HINT: Use another linker then the c linker to have the inputs shapes and strides printed.
HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
Apply node that caused the error: forall_inplace,gpu,scan_fn}(Shape_i{0}.0, Subtensor{int64:int64:int8}.0, GpuAlloc{memset_0=True}.0, Shape_i{0}.0, V, U, W)
Inputs types: [TensorType(int64, scalar), TensorType(int32, vector), CudaNdarrayType(float32, matrix), TensorType(int64, scalar), CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, matrix)]
Inputs shapes: [(), (1,), (1, 80), (), (8000, 50), (50, 8000), (50, 50)]
Inputs strides: [(), (4,), (0, 1), (), (50, 1), (8000, 1), (50, 1)]
Inputs values: [array(1), array([0], dtype=int32), 'not shown', array(1), 'not shown', 'not shown', 'not shown']

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

The important message is the following:

ValueError: dimension mismatch in args to gemv (50,50)x(80)->(50)

I think this refers to a multiplication between W (of size 50x50) and the hidden state s_t, which has size 80 instead (note: I'm using the same math notation you used, so I don't know whether it maps directly to the Theano code; I'll check).

Do you know what could be causing that?

Sorry for bothering you.
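The input shapes in the traceback suggest a mismatch between the model and the loaded parameters: V, U and W are 50-dimensional ((8000, 50), (50, 8000), (50, 50)), but the zero initial hidden state passed to scan has 80 entries. That looks like a model constructed with hidden_dim=80 and then given parameters trained with hidden_dim=50, so constructing the model with the file's dimensions should avoid it. The sketch below is a hypothetical helper that reads those dimensions from an .npz file before building the model; it assumes the arrays are stored under the keys "U", "V" and "W", which you should check against how your file was saved.

```python
import numpy as np

def saved_model_dims(path):
    """Return (word_dim, hidden_dim) inferred from a saved .npz parameter file.

    Assumes keys "U" (hidden_dim, word_dim), "V" (word_dim, hidden_dim)
    and "W" (hidden_dim, hidden_dim).
    """
    with np.load(path) as params:
        U, V, W = params["U"], params["V"], params["W"]
    hidden_dim, word_dim = U.shape
    if V.shape != (word_dim, hidden_dim) or W.shape != (hidden_dim, hidden_dim):
        raise ValueError("inconsistent shapes: U %s, V %s, W %s" % (U.shape, V.shape, W.shape))
    return word_dim, hidden_dim
```

Possible usage: word_dim, hidden_dim = saved_model_dims('./data/trained-model-theano.npz'), then RNNTheano(word_dim, hidden_dim=hidden_dim) before loading the parameters.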

Equations not rendering in the related blog posts

Recently, I have been learning about the RNN model from your blog posts. They are very nice, and the code on GitHub is very readable. However, the equations in parts 2 and 4 of the posts don't render: the raw LaTeX is shown instead, which is much harder to read. The equations in part 1 display correctly. I would appreciate it if you could look into this.

IndexError: shape mismatch:

Hi,
I'm trying to build intuition for RNNLMs by writing your code from the tutorial step by step.
Could you please clarify why your code throws this exception in calculate_total_loss:
correct_word_predictions = o[np.arange(len(y)), y[i]]
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (1000,) (20,)
I'm using Python 3.

Thanks in advance.

P.S.
It would also be great if you could clarify some points about your data file "reddit-comments-2015-08.csv":

As I understand it, each comment in a CSV file must be wrapped in double quotes, but that is not the case in your file.
I also tried using just half of the file's content, but in that case I always get this error:
sentences = itertools.chain(*[nltk.sent_tokenize(x[0].lower()) for x in reader])
IndexError: list index out of range

I would appreciate it if you could clarify these points.
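The two shapes in the message point at the mix-up: len(y) is the number of training sentences (1000 here), while y[i] holds the word indices of one sentence (20 here). NumPy broadcasts the row and column index arrays together, so they must have compatible shapes, and the row index has to range over the words of sentence i, which is what the RNNLM.ipynb code does with np.arange(len(y[i])). The sketch below reproduces both versions with made-up sizes and a uniform stand-in for the forward_propagation output. Separately, the IndexError: list index out of range in the P.S. is consistent with csv.reader returning an empty row (for example from a blank or cut-off line), for which x[0] does not exist; skipping empty rows before reading x[0] would avoid that particular failure.

```python
import numpy as np

num_sentences, sentence_len, vocab_size = 1000, 20, 8000
rng = np.random.RandomState(0)
# Made-up labels: one array of word indices per training sentence.
y = [rng.randint(vocab_size, size=sentence_len) for _ in range(num_sentences)]
i = 0
# Stand-in for the model output of sentence i: one row per word.
o = np.full((sentence_len, vocab_size), 1.0 / vocab_size)

try:
    # Row indices 0..999 against 20 column indices: shapes (1000,) and (20,).
    o[np.arange(len(y)), y[i]]
    mismatch_raised = False
except IndexError as err:
    mismatch_raised = True
    print("IndexError:", err)

# Row indices 0..19 paired with the 20 correct word indices of sentence i.
correct_word_predictions = o[np.arange(len(y[i])), y[i]]
```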

Large error when using the generate_sentence method at the end

Hi there,

Thank you for taking the time to code and present it all in your tutorial. I've enjoyed trying them out so far. However, I ran into this rather massive error when I tried to run the generate_sentence method in the portion of the tutorial where you generate words.
The error is as follows (bear with me here, it is rather long):

Traceback (most recent call last):
  File "RNN.py", line 325, in <module>
    sent = generate_sentence(model)
  File "RNN.py", line 306, in generate_sentence
    next_word_probs = model.forward_propagation(new_sentence)
  File "/home/mahesh/miniconda2/lib/python2.7/site-packages/theano/compile/function_module.py", line 917, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/home/mahesh/miniconda2/lib/python2.7/site-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/home/mahesh/miniconda2/lib/python2.7/site-packages/theano/compile/function_module.py", line 903, in __call__
    self.fn() if output_subset is None else
  File "/home/mahesh/miniconda2/lib/python2.7/site-packages/theano/scan_module/scan_op.py", line 963, in rval
    r = p(n, [x[0] for x in i], o)
  File "/home/mahesh/miniconda2/lib/python2.7/site-packages/theano/scan_module/scan_op.py", line 952, in p
    self, node)
  File "theano/scan_module/scan_perform.pyx", line 405, in theano.scan_module.scan_perform.perform (/home/mahesh/.theano/compiledir_Linux-3.17-fc21.x86_64-x86_64-with-fedora-21-Twenty_One-x86_64-2.7.13-64/scan_perform/mod.cpp:4606)
  File "/home/mahesh/miniconda2/lib/python2.7/site-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "theano/scan_module/scan_perform.pyx", line 397, in theano.scan_module.scan_perform.perform (/home/mahesh/.theano/compiledir_Linux-3.17-fc21.x86_64-x86_64-with-fedora-21-Twenty_One-x86_64-2.7.13-64/scan_perform/mod.cpp:4490)
ValueError: Shape mismatch: A.shape[1] != x.shape[0]
Apply node that caused the error: CGemv{no_inplace}(Subtensor{::, int32}.0, TensorConstant{1.0}, W_copy, <TensorType(float64, vector)>, TensorConstant{1.0})
Toposort index: 2
Inputs types: [TensorType(float64, vector), TensorType(float64, scalar), TensorType(float64, matrix), TensorType(float64, vector), TensorType(float64, scalar)]
Inputs shapes: [(50,), (), (50, 50), (100,), ()]
Inputs strides: [(64000,), (), (400, 8), (8,), ()]
Inputs values: ['not shown', array(1.0), 'not shown', 'not shown', array(1.0)]
Outputs clients: [[Elemwise{tanh,no_inplace}(CGemv{no_inplace}.0)]]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
Apply node that caused the error: forall_inplace,cpu,scan_fn}(Shape_i{0}.0, Subtensor{int64:int64:int8}.0, IncSubtensor{InplaceSet;:int64:}.0, Shape_i{0}.0, V, U, W)
Toposort index: 7
Inputs types: [TensorType(int64, scalar), TensorType(int32, vector), TensorType(float64, matrix), TensorType(int64, scalar), TensorType(float64, matrix), TensorType(float64, matrix), TensorType(float64, matrix)]
Inputs shapes: [(), (1,), (1, 100), (), (8000, 50), (50, 8000), (50, 50)]
Inputs strides: [(), (4,), (800, 8), (), (400, 8), (64000, 8), (400, 8)]
Inputs values: [array(1), array([0], dtype=int32), 'not shown', array(1), 'not shown', 'not shown', 'not shown']
Outputs clients: [[], ['output']]

I can't quite figure out what the issue is over here.
I would be grateful for any help you can give me in resolving this issue. I would love to get this code working in its entirety :)

thanks,

Mahesh

error: multinomial in generate_sentence of RNNLM.ipynb

Thanks for this great tutorial!
When I try to run the code in RNNLM.ipynb, I get an error in generate_sentence, which says:

  File "mtrand.pyx", line 4811, in mtrand.RandomState.multinomial (numpy/random/mtrand/mtrand.c:32755)
ValueError: object too deep for desired array

After checking the code, I found that in next_word_probs = model.forward_propagation(new_sentence), forward_propagation returns both o and s, so next_word_probs may be the pair (o, s), while np.random.multinomial(1, next_word_probs[-1]) only needs o (the output). Changing the line to next_word_probs, _ = model.forward_propagation(new_sentence) makes the code run again.

Am I getting this correctly?

Thanks!
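That reading looks right for the NumPy version: its forward_propagation returns the outputs o together with the hidden states s, so indexing the pair with [-1] hands the hidden-state array to multinomial instead of the last row of probabilities. The Theano version is used without unpacking in the generate_sentence code earlier on this page, which suggests its compiled function returns only o. Below is a minimal sketch of the sampling step, assuming probs is a single row of o; sample_word is a hypothetical helper name.

```python
import numpy as np

def sample_word(probs, unknown_index, rng=np.random):
    """Sample a word index from one row of output probabilities, skipping unknown_index."""
    # multinomial expects a 1-D probability vector. Casting to float64 and
    # renormalising reduces the chance that float32 rounding pushes the sum
    # above 1, which multinomial rejects.
    p = np.asarray(probs, dtype=np.float64)
    p = p / p.sum()
    sampled = unknown_index
    # Resample until we get something other than the unknown token
    # (this loops forever if the unknown token holds all of the probability mass).
    while sampled == unknown_index:
        sampled = np.argmax(rng.multinomial(1, p))
    return sampled
```

With the NumPy model the call would look like o, _ = model.forward_propagation(new_sentence) followed by sample_word(o[-1], word_to_index[unknown_token]).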

Where is RNNNumpy?

I cannot find the NumPy code. Thanks.

NameError: name 'softmax' is not defined

Hi,

Thank you for the great tutorial.
I got stuck here, inside the forward_propagation function definition:
o[t] = softmax(self.V.dot(s[t]))
NameError: name 'softmax' is not defined

I tried implementing a function called softmax, in two different ways:

def softmax(x):
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

def softmax(x):
    w = np.array(x)
    maxes = np.amax(w, axis=0)
    e = np.exp(w - maxes)
    s = e / np.sum(e, axis=0)
    return s

but neither works; I get the exception
ValueError: could not broadcast input array from shape (8000) into shape (100)

I also tried
o[t] = Th.nnet.softmax(self.V.dot(s[t]))

but got the following error:
ValueError: setting an array element with a sequence.

Could somebody please help?
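Both softmax definitions look fine for a single vector. The shapes in the first error point at the output array instead: softmax(self.V.dot(s[t])) has one entry per vocabulary word (8000), but o[t] has room for only 100, which is what happens if o is allocated with hidden_dim columns instead of word_dim. The second error is expected too, since Th.nnet.softmax builds a symbolic Theano expression rather than a NumPy array, so it cannot be stored in o. The sketch below reproduces the broadcast error and the fix with made-up sizes (hidden states left at zero for brevity).

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax of a 1-D vector.
    e = np.exp(z - np.max(z))
    return e / e.sum()

word_dim, hidden_dim, T = 8000, 100, 5
rng = np.random.RandomState(0)
V = rng.uniform(-0.01, 0.01, (word_dim, hidden_dim))
s = np.zeros((T + 1, hidden_dim))

# Output buffer with hidden_dim columns: each row can hold only 100 values.
wrong_o = np.zeros((T, hidden_dim))
try:
    wrong_o[0] = softmax(V.dot(s[0]))  # the softmax output has 8000 entries
    broadcast_failed = False
except ValueError as err:
    broadcast_failed = True
    print("ValueError:", err)

# Output buffer with word_dim columns: one probability per vocabulary word.
o = np.zeros((T, word_dim))
for t in range(T):
    o[t] = softmax(V.dot(s[t]))
```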

A simple question about the output o_t of the RNN

@dennybritz I have been reading your RNN code and I have a question about rnn_theano.py (line 30):
def forward_prop_step(x_t, s_t_prev, U, V, W):
    s_t = T.tanh(U[:,x_t] + W.dot(s_t_prev))
    o_t = T.nnet.softmax(V.dot(s_t))
    return [o_t[0], s_t]
It defines a simple recurrent network, and my question is: what does o_t contain, and why use o_t[0]?
When we use it, x is one example from X_train, that is, a list of numbers corresponding to word indices, so x_t is a single number. In that case o_t should be a 1-D vector whose length is word_dim.
Could you clear this up for me? Thank you very much!
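If I read the Theano behaviour correctly, T.nnet.softmax works row-wise on matrices, so given the vector V.dot(s_t) it returns a matrix with a single row, of shape (1, word_dim). o_t[0] takes that row back out, so each scan step returns a plain vector of length word_dim. (Also, x_t is an integer word index used to select a column of U, not a real number.) The NumPy sketch below only mimics that shape behaviour with made-up sizes; row_softmax is a hypothetical helper, not Theano's implementation.

```python
import numpy as np

def row_softmax(z):
    """Row-wise softmax that promotes a 1-D input to a one-row matrix."""
    z = np.atleast_2d(z)
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

word_dim, hidden_dim = 6, 3
rng = np.random.RandomState(0)
V = rng.uniform(-1, 1, (word_dim, hidden_dim))
s_t = rng.uniform(-1, 1, hidden_dim)

o_t = row_softmax(V.dot(s_t))  # shape (1, word_dim)
o_t_vec = o_t[0]               # shape (word_dim,), the row forward_prop_step returns
```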

Dimensionality problem

Hello! I'm trying to modify your code a bit so that the RNN returns an array instead of a word index. That is, my input will be an array of floats and I want the RNN to estimate the next one. I ran into some problems: compiling the forward_prop_step function inside theano.scan gives a warning that all parameters should be specified in the sequences or non_sequences field. In fact, I can't see where the function gets s_t_prev when it executes. Can you clarify this point?
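s_t_prev is not read from sequences or non_sequences. In theano.scan the step function receives, in order, the current element of each sequence, the previous value of each recurrent output listed in outputs_info, and then the non_sequences. When the hidden state's entry in outputs_info has an initial value (for example zeros of size hidden_dim), scan passes that on the first step and the previous s_t on every later step. The plain-Python loop below only illustrates that calling convention for one sequence and one recurrent output; scan_like and toy_step are hypothetical names, and this is not how Theano implements scan.

```python
def scan_like(step, sequence, initial_state, non_sequences):
    """Emulate scan's argument order for one sequence and one recurrent output.

    step is called as step(x_t, state_prev, *non_sequences) and must return
    (output_t, state_t); state_t is fed back as state_prev on the next step.
    """
    outputs, states = [], []
    state_prev = initial_state
    for x_t in sequence:
        output_t, state_t = step(x_t, state_prev, *non_sequences)
        outputs.append(output_t)
        states.append(state_t)
        state_prev = state_t  # this is where "s_t_prev" comes from
    return outputs, states

# Toy step: the state accumulates scale * x_t, the output doubles the state.
def toy_step(x_t, s_t_prev, scale):
    s_t = s_t_prev + scale * x_t
    return s_t * 2, s_t

outputs, states = scan_like(toy_step, [1, 2, 3], 0, [10])
```

Here states is [10, 30, 60] and outputs is [20, 60, 120]; the initial value 0 only appears as s_t_prev on the first call.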

Saving/Loading of Model Parameters

Disclaimer - I am fairly new to coding in general.

Could you also use something like np.savetxt('filename.csv', model.hidden_dim, delimiter=",") instead of using np.savez?

I was having trouble with using np.savez earlier, but figured out I didn't have the utils.py in the correct directory.
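np.savetxt writes one 1-D or 2-D array per file, so it cannot save a bare integer such as model.hidden_dim, and the weight matrices would each need their own file; np.savez keeps them together in one binary file. If a text format is preferred anyway, something along these lines could work, assuming the parameters are NumPy arrays (for the Theano model you would first fetch them with get_value() on the shared variables). The helper names and the file naming scheme are hypothetical; hidden_dim and word_dim can be recovered from the loaded matrix shapes.

```python
import numpy as np

def save_params_csv(prefix, params):
    """Save each 2-D parameter array to its own CSV file named <prefix>_<name>.csv."""
    for name, array in params.items():
        # The default fmt '%.18e' keeps enough digits to round-trip float64 values.
        np.savetxt("%s_%s.csv" % (prefix, name), array, delimiter=",")

def load_params_csv(prefix, names):
    """Load arrays written by save_params_csv, keeping them 2-D."""
    return dict((name, np.loadtxt("%s_%s.csv" % (prefix, name), delimiter=",", ndmin=2))
                for name in names)
```

CSV files for an 8000-word vocabulary are much larger and slower to read than the .npz file, which is one reason the binary format is the usual choice.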

Licence

I would like to use this code for some project at the FAU (Friedrich-Alexander-University Erlangen-Nürnberg, Germany).

Without a license, I can use the code but may not redistribute it. I'd like to have it graded, so I need to give it to the university. This may be forbidden without a license.

In short, I would need either a written statement saying I may use it for this purpose, or a license.

Currently I am adapting the code to learn a language at the character level.

Thanks in advance,
DSZ