Comments (5)
Would you be able to simply post the Kurfile itself here (rather than an iPython notebook)? A minimal working dataset would be great, too, but rather than needing to run the notebook (and risk coming up with a different dataset than you), can you just post it here, too? Not the code, just the dataset.
I would expect the submission to look something like this (where the data entry is really short, just one or two data items):
When I use this Kurfile:

```yaml
settings:
  # ...
train:
  # ...
```

with this data given to the JSONL supplier:

```
{'input_seq': [1, 2, 3, ...], ...}
{'input_seq': [4, 5, 6, ...], ...}
```

then I get NaN all the time.
If I understand your request correctly:

Dataset

I tried to use a smaller dataset (just a few lines of text), but the model does not seem to work on such short text. So the easiest way to replicate the error is still to use the same dataset stored in the kur GitHub repo.

Preprocess the dataset and save it into a JSONL file

- Go to `kur directory/examples/language-model/`
- Inside `make_data.py`, set `dev == True` to speed everything up
- Then run `python make_data.py` to save the data into a jsonl file
- Each sample saved in the jsonl file has the following format:
```
{'out_char': [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
 'in_seq': [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
            [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
            [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]]}
```
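For reference, these records are plain one-hot encodings: `out_char` is a single one-hot vector over the 30-character vocabulary, and `in_seq` is one such vector per input character. A minimal sketch of how such a record could be built (the helper names and the exact vocabulary string are my assumptions for illustration, not the actual `make_data.py` code):

```python
import json

# Assumed 30-character vocabulary: 26 letters plus space, period, comma, apostrophe.
VOCAB = "abcdefghijklmnopqrstuvwxyz .,'"
CHAR_TO_INDEX = {c: i for i, c in enumerate(VOCAB)}

def one_hot(index, size=30):
    """Return a list of `size` zeros with a 1 at `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec

def make_record(context, next_char):
    """Encode a fixed-length character context and its next character."""
    return {
        'in_seq': [one_hot(CHAR_TO_INDEX[c]) for c in context],
        'out_char': one_hot(CHAR_TO_INDEX[next_char]),
    }

# A 30-character context and the character that follows it.
record = make_record('the quick brown fox jumps over', ' ')
line = json.dumps(record)  # one JSONL line, ready to append to train.jsonl
```

Each line of the JSONL file is then one independent training sample of this shape.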
Kurfile

- This kurfile is almost the same as the original kurfile.yaml under `language-model/model/`
- Use the following kurfile to replace the original
- Run `kur -v train kurfile.yaml` under the `model/` directory
```yaml
---
settings:
  vocab:
    size: 30
  rnn:
    size: 128
    depth: 3

model:
  - input: in_seq
  - for:
      range: "{{ rnn.depth - 1 }}"
      iterate:
        - recurrent:
            size: "{{ rnn.size }}"
            type: lstm        # only difference from original
            sequence: yes
            bidirectional: no
        - batch_normalization
  - recurrent:
      size: "{{ rnn.size }}"
      type: lstm              # only difference from original
      sequence: no
      bidirectional: no
  - dense: "{{ vocab.size }}"
  - activation: softmax
  - output: out_char

loss:
  - target: out_char
    name: categorical_crossentropy

train:
  data:
    - jsonl: ../data/train.jsonl
  epochs: 5
  weights:
    initial: best.w.kur
    last: last.w.kur
  log: log

validate:
  data:
    - jsonl: ../data/validate.jsonl
  weights: best.w.kur

test:
  data:
    - jsonl: ../data/test.jsonl
  weights: best.w.kur

evaluate:
  data:
    - jsonl: ../data/evaluate.jsonl
  weights: best.w.kur
  destination: output.pkl
```
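Since Kurfiles are plain YAML and YAML is indentation-sensitive, it is easy to break one when copy-pasting. A quick way to confirm the file parses as intended is to load it with PyYAML (this is just my suggestion for a standalone sanity check, not part of the Kur workflow; PyYAML must be installed separately with `pip install pyyaml`):

```python
import yaml  # PyYAML

# A fragment of the settings section, inline for illustration; in practice,
# run yaml.safe_load(open('kurfile.yaml')) to check the whole file.
KURFILE_EXCERPT = """\
settings:
  vocab:
    size: 30
  rnn:
    size: 128
    depth: 3
"""

spec = yaml.safe_load(KURFILE_EXCERPT)
print(spec['settings']['rnn'])  # {'size': 128, 'depth': 3}
```

If the load succeeds and the nested keys come back as you expect, any remaining problem is in the model itself rather than in the YAML structure.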
Error I always get
```
Epoch 1/5, loss=10.101:   0%| | 32/13300 [00:00<02:19, 95.00samples/s][ERROR 2017-03-07 21:49:15,882 kur.model.executor:647] Received NaN loss value for model output "out_char". Make sure that your inputs are all normalized and that the learning rate is not too high. Sometimes different algorithms/implementations work better than others, so you can try switching optimizers or backend.
Epoch 1/5, loss=nan:   0%| | 64/13300 [00:00<01:29, 147.18samples/s]
[ERROR 2017-03-07 21:49:15,882 kur.model.executor:227] Exception raised during training.
Traceback (most recent call last):
  File "/Users/Natsume/Downloads/kur/kur/model/executor.py", line 224, in train
    **kwargs
  File "/Users/Natsume/Downloads/kur/kur/model/executor.py", line 648, in wrapped_train
    raise ValueError('Model loss is NaN.')
ValueError: Model loss is NaN.
[INFO 2017-03-07 21:49:15,883 kur.model.executor:235] Saving most recent weights: last.w.kur
Traceback (most recent call last):
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/bin/kur", line 11, in <module>
    load_entry_point('kur', 'console_scripts', 'kur')()
  File "/Users/Natsume/Downloads/kur/kur/__main__.py", line 382, in main
    sys.exit(args.func(args) or 0)
  File "/Users/Natsume/Downloads/kur/kur/__main__.py", line 62, in train
    func(step=args.step)
  File "/Users/Natsume/Downloads/kur/kur/kurfile.py", line 371, in func
    return trainer.train(**defaults)
  File "/Users/Natsume/Downloads/kur/kur/model/executor.py", line 224, in train
    **kwargs
  File "/Users/Natsume/Downloads/kur/kur/model/executor.py", line 648, in wrapped_train
    raise ValueError('Model loss is NaN.')
ValueError: Model loss is NaN.
```
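Before blaming the backend, it can be worth ruling out bad data, since the error message itself suggests unnormalized inputs as one cause of a NaN loss. A small, hypothetical sanity check for JSONL records of the shape used here (the function name and checks are my own; run it over each line of `train.jsonl` in practice):

```python
import json
import math

def check_record(line):
    """Return a list of problems found in one JSONL record (empty if clean)."""
    problems = []
    record = json.loads(line)
    for key in ('in_seq', 'out_char'):
        if key not in record:
            problems.append('missing key: %s' % key)
    # Flatten all values and look for NaN (only possible for float entries).
    flat = record.get('out_char', []) + [
        x for row in record.get('in_seq', []) for x in row
    ]
    if any(isinstance(x, float) and math.isnan(x) for x in flat):
        problems.append('NaN value present')
    if 'out_char' in record and sum(record['out_char']) != 1:
        problems.append('out_char is not one-hot')
    return problems

# Example over in-memory samples; in practice, loop over
# open('../data/train.jsonl') line by line and report any non-empty result.
good = '{"in_seq": [[0, 1, 0]], "out_char": [0, 0, 1]}'
bad = '{"in_seq": [[0, 1, 0]], "out_char": [0, 1, 1]}'
print(check_record(good))  # []
print(check_record(bad))   # ['out_char is not one-hot']
```

If the data comes back clean, the NaN is more likely a numerical issue in the backend or optimizer, which matches what the backend switch below suggests.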
Using the code above, if I set the backend in settings as follows:

```yaml
settings:
  backend:
    name: keras
    backend: tensorflow
```

the error reported above disappears and training runs, but I do get the following report saying TensorFlow was not compiled with some CPU optimizations (I remember @ajsyp once told me to install TensorFlow from source to deal with the TensorFlow compile issue; I will try that later):
```
(dlnd-tf-lab) ->kur -v train kurfile.yaml
[INFO 2017-03-08 11:56:56,939 kur.kurfile:699] Parsing source: kurfile.yaml, included by top-level.
[INFO 2017-03-08 11:56:56,952 kur.kurfile:82] Parsing Kurfile...
[INFO 2017-03-08 11:56:56,970 kur.loggers.binary_logger:71] Loading log data: log
[INFO 2017-03-08 11:57:00,218 kur.backend.backend:80] Creating backend: keras
[INFO 2017-03-08 11:57:00,218 kur.backend.backend:83] Backend variants: none
[INFO 2017-03-08 11:57:00,218 kur.backend.keras_backend:81] The tensorflow backend for Keras has been requested.
[INFO 2017-03-08 11:57:01,184 kur.backend.keras_backend:195] Keras is loaded. The backend is: tensorflow
[INFO 2017-03-08 11:57:01,184 kur.model.model:260] Enumerating the model containers.
[INFO 2017-03-08 11:57:01,185 kur.model.model:265] Assembling the model dependency graph.
[INFO 2017-03-08 11:57:01,185 kur.model.model:280] Connecting the model graph.
[INFO 2017-03-08 11:57:02,200 kur.model.model:284] Model inputs: in_seq
[INFO 2017-03-08 11:57:02,200 kur.model.model:285] Model outputs: out_char
[INFO 2017-03-08 11:57:02,200 kur.kurfile:357] Ignoring missing initial weights: best.w.kur. If this is undesireable, set "must_exist" to "yes" in the approriate "weights" section.
[INFO 2017-03-08 11:57:02,200 kur.model.executor:315] No historical training loss available from logs.
[INFO 2017-03-08 11:57:02,200 kur.model.executor:323] No historical validation loss available from logs.
[INFO 2017-03-08 11:57:02,200 kur.model.executor:329] No previous epochs.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
[INFO 2017-03-08 11:57:10,026 kur.backend.keras_backend:666] Waiting for model to finish compiling...
Epoch 1/5, loss=5.123:  26%|████▋ | 3488/13300 [00:29<01:23, 118.16samples/s]^C
```
Change the backend to remove the error and the compiling issue

I tried the latest development version with PyTorch as the backend. It is faster, and the error reported above does not occur when using lstm.

Change the backend from tensorflow to pytorch with the following:

```yaml
settings:
  backend: pytorch
```

There is no error and no TensorFlow compiling issue at all.
Install TensorFlow from source

Installing TensorFlow from source looks complicated. Is there an easier solution to the TensorFlow compiling issue?
I have also experienced worse numerical stability with Theano than with the other backends. Once the PyTorch backend is more stable and better tested, it may become the default backend. So I'm glad that switching away from Theano helps with your problem.

Also, I have installed TensorFlow from source before, but I have not found it an enjoyable or simple process (frankly, bazel doesn't seem like a great build tool, in my experience). My recommendation: unless you really are ready to dive into building from source, just use `pip install tensorflow` or `pip install tensorflow-gpu` (for GPU capabilities) and live with the warnings. After all, if you truly need a big speed boost, a GPU will be worlds different from the extra CPU optimization.
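Concretely, the pragmatic route looks something like this (the `TF_CPP_MIN_LOG_LEVEL` line is optional; it only hides the SSE/AVX/FMA warnings in TensorFlow's C++ logging and does not change performance):

```shell
# Install the prebuilt wheel (CPU), or the GPU build instead:
pip install tensorflow
# pip install tensorflow-gpu   # if you have a CUDA-capable GPU

# Optionally silence the "wasn't compiled to use SSE4.1/AVX/FMA" warnings:
export TF_CPP_MIN_LOG_LEVEL=2
kur -v train kurfile.yaml
```

The warnings are informational: the prebuilt wheel runs correctly, just without those CPU instruction-set optimizations.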