
Comments (5)

ajsyp avatar ajsyp commented on August 19, 2024

Would you be able to simply post the Kurfile itself here (rather than an iPython notebook)? A minimal working dataset would be great, too, but rather than needing to run the notebook (and risk coming up with a different dataset than you), can you just post it here, too? Not the code, just the dataset.

I would expect the submission to look something like this (where the data entry is really short, just one or two data items):

When I use this Kurfile:

settings:
  # ...
train:
  # ...

with this data given to the JSONL supplier:

{'input_seq' : [1, 2, 3, ...], ...}
{'input_seq' : [4, 5, 6, ...], ...}

then I get NaN all the time.
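(For anyone reproducing this: a JSONL file is just one JSON object per line, and strict JSON requires double-quoted keys, unlike the single-quoted illustration above. A minimal reader sketch, not kur's actual JSONL supplier code:)

```python
import json

def read_jsonl(path):
    """Yield one sample dict per line of a JSON-lines file, skipping blank lines."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
```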

from kur.

EmbraceLife avatar EmbraceLife commented on August 19, 2024

If I understand your request correctly, here is everything needed to reproduce the problem:

Dataset

I tried using a smaller dataset of just a few lines of text, but the model does not seem to work on such short text. So the easiest way to replicate the error is still to use the same dataset stored in the kur GitHub repo.

Preprocess dataset and save into JSONL file

  • Go to examples/language-model/ in the kur directory
  • open make_data.py and set dev = True to speed everything up
  • then run python make_data.py to save the data into a JSONL file
  • each sample saved in the JSONL file has the following format:
{'out_char': [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 
'in_seq': [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]]}
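(To make the sample format above concrete, here is an illustrative sketch of how such a sample could be built, with a hypothetical `one_hot` helper and a vocabulary of 30 characters to match the `vocab.size` setting below; this is not the actual make_data.py code:)

```python
import json

VOCAB_SIZE = 30  # matches vocab.size in the Kurfile

def one_hot(index, size=VOCAB_SIZE):
    """Return a one-hot list of length `size` with a single 1 at `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec

def make_sample(seq_indices, target_index):
    """Build one sample: a window of one-hot input characters plus its target."""
    return {
        'in_seq': [one_hot(i) for i in seq_indices],
        'out_char': one_hot(target_index),
    }

# Each sample then becomes one line of the .jsonl file, e.g.:
# with open('train.jsonl', 'w') as f:
#     f.write(json.dumps(make_sample([26, 5, 14], 8)) + '\n')
```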

Kurfile

  • this Kurfile is almost the same as the original kurfile.yaml under language-model/model/
  • use the following Kurfile to replace the original
  • run kur -v train kurfile.yaml under the model/ directory
---

settings:

  vocab:
    size: 30

  rnn:                              
    size: 128        
    depth: 3

model:
  - input: in_seq

  - for:
      range: "{{ rnn.depth - 1 }}"
      iterate:
        - recurrent:
            size: "{{ rnn.size }}"
            type: lstm                                     # only difference from original
            sequence: yes
            bidirectional: no
        - batch_normalization

  - recurrent:
      size: "{{ rnn.size }}"
      type: lstm                                          # only difference from original
      sequence: no
      bidirectional: no

  - dense: "{{ vocab.size }}"

  - activation: softmax

  - output: out_char

loss:
  - target: out_char
    name: categorical_crossentropy

train:
  data:
    - jsonl: ../data/train.jsonl
  epochs: 5
  weights:
    initial: best.w.kur
    last: last.w.kur

  log: log

validate:
  data:
    - jsonl: ../data/validate.jsonl
  weights: best.w.kur
   
test:
  data:
    - jsonl: ../data/test.jsonl
  weights: best.w.kur
    
evaluate:
  data:
    - jsonl: ../data/evaluate.jsonl
  weights: best.w.kur

  destination: output.pkl
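(Note the `for` container with `range: "{{ rnn.depth - 1 }}"`: with depth 3, it emits two sequence-returning LSTM layers, each followed by batch normalization, and the final `recurrent` layer returns only its last output before the dense/softmax head. A sketch of how that loop expands into a flat layer list, purely illustrative and not how kur actually builds the graph:)

```python
def expand_layers(depth=3, size=128):
    """Expand the Kurfile's for-loop into a flat layer list (illustrative only)."""
    layers = []
    for _ in range(depth - 1):
        layers.append({'recurrent': {'size': size, 'type': 'lstm', 'sequence': True}})
        layers.append('batch_normalization')
    # Final recurrent layer collapses the sequence to its last output.
    layers.append({'recurrent': {'size': size, 'type': 'lstm', 'sequence': False}})
    return layers
```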

Error I always get

Epoch 1/5, loss=10.101:   0%| | 32/13300 [00:00<02:19, 95.00samples/s][ERROR 2017-03-07 21:49:15,882 kur.model.executor:647] Received NaN loss value for model output "out_char". Make sure that your inputs are all normalized and that the learning rate is not too high. Sometimes different algorithms/implementations work better than others, so you can try switching optimizers or backend.
Epoch 1/5, loss=nan:   0%| | 64/13300 [00:00<01:29, 147.18samples/s]
[ERROR 2017-03-07 21:49:15,882 kur.model.executor:227] Exception raised during training.
Traceback (most recent call last):
  File "/Users/Natsume/Downloads/kur/kur/model/executor.py", line 224, in train
    **kwargs
  File "/Users/Natsume/Downloads/kur/kur/model/executor.py", line 648, in wrapped_train
    raise ValueError('Model loss is NaN.')
ValueError: Model loss is NaN.
[INFO 2017-03-07 21:49:15,883 kur.model.executor:235] Saving most recent weights: last.w.kur
Traceback (most recent call last):
  File "/Users/Natsume/miniconda2/envs/dlnd-tf-lab/bin/kur", line 11, in <module>
    load_entry_point('kur', 'console_scripts', 'kur')()
  File "/Users/Natsume/Downloads/kur/kur/__main__.py", line 382, in main
    sys.exit(args.func(args) or 0)
  File "/Users/Natsume/Downloads/kur/kur/__main__.py", line 62, in train
    func(step=args.step)
  File "/Users/Natsume/Downloads/kur/kur/kurfile.py", line 371, in func
    return trainer.train(**defaults)
  File "/Users/Natsume/Downloads/kur/kur/model/executor.py", line 224, in train
    **kwargs
  File "/Users/Natsume/Downloads/kur/kur/model/executor.py", line 648, in wrapped_train
    raise ValueError('Model loss is NaN.')
ValueError: Model loss is NaN.
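(For reference, the guard that raises this error can be sketched as below; this is an illustrative re-creation mirroring the traceback, not kur's actual executor code. As the error message suggests, the usual remedies are lowering the learning rate, normalizing inputs, or switching optimizer/backend:)

```python
import math

def check_loss(output_name, loss):
    """Raise if a batch loss has gone non-finite, as in the error above."""
    if math.isnan(loss) or math.isinf(loss):
        raise ValueError('Model loss is NaN.')
    return loss
```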


EmbraceLife avatar EmbraceLife commented on August 19, 2024

Using the Kurfile above, if I set the backend in settings as follows:

settings: 
  backend: 
    name: keras
    backend: tensorflow

The error reported above disappears and it trains, but I do get warnings saying TensorFlow was not compiled with certain CPU instructions (I remember @ajsyp once told me to install TensorFlow from source to deal with this compile issue; I will try it later):

(dlnd-tf-lab)  ->kur -v train kurfile.yaml
[INFO 2017-03-08 11:56:56,939 kur.kurfile:699] Parsing source: kurfile.yaml, included by top-level.
[INFO 2017-03-08 11:56:56,952 kur.kurfile:82] Parsing Kurfile...
[INFO 2017-03-08 11:56:56,970 kur.loggers.binary_logger:71] Loading log data: log
[INFO 2017-03-08 11:57:00,218 kur.backend.backend:80] Creating backend: keras
[INFO 2017-03-08 11:57:00,218 kur.backend.backend:83] Backend variants: none
[INFO 2017-03-08 11:57:00,218 kur.backend.keras_backend:81] The tensorflow backend for Keras has been requested.
[INFO 2017-03-08 11:57:01,184 kur.backend.keras_backend:195] Keras is loaded. The backend is: tensorflow
[INFO 2017-03-08 11:57:01,184 kur.model.model:260] Enumerating the model containers.
[INFO 2017-03-08 11:57:01,185 kur.model.model:265] Assembling the model dependency graph.
[INFO 2017-03-08 11:57:01,185 kur.model.model:280] Connecting the model graph.
[INFO 2017-03-08 11:57:02,200 kur.model.model:284] Model inputs:  in_seq
[INFO 2017-03-08 11:57:02,200 kur.model.model:285] Model outputs: out_char
[INFO 2017-03-08 11:57:02,200 kur.kurfile:357] Ignoring missing initial weights: best.w.kur. If this is undesireable, set "must_exist" to "yes" in the approriate "weights" section.
[INFO 2017-03-08 11:57:02,200 kur.model.executor:315] No historical training loss available from logs.
[INFO 2017-03-08 11:57:02,200 kur.model.executor:323] No historical validation loss available from logs.
[INFO 2017-03-08 11:57:02,200 kur.model.executor:329] No previous epochs.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
[INFO 2017-03-08 11:57:10,026 kur.backend.keras_backend:666] Waiting for model to finish compiling...

Epoch 1/5, loss=5.123:  26%|████▋             | 3488/13300 [00:29<01:23, 118.16samples/s]^C
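(Those W-level lines are only performance hints, not errors. If rebuilding TensorFlow is not worth the trouble, they can be silenced with TensorFlow's log-level environment variable, which must be set before TensorFlow is imported; level '2' filters INFO and WARNING messages:)

```python
import os

# Must be set before `import tensorflow`; '2' hides INFO and WARNING output.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
```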


EmbraceLife avatar EmbraceLife commented on August 19, 2024

Changing the backend removes both the error and the compiling issue

I tried the latest development version with pytorch as the backend. It is faster, and the error reported above does not occur when using lstm.

Change the backend from tensorflow to pytorch with the settings below:

settings: 
  backend: pytorch

There is no error and no TensorFlow compiling issue at all.

Install tensorflow from source

Installing TensorFlow from source looks complicated; is there an easier solution to the TensorFlow compiling issue?


ajsyp avatar ajsyp commented on August 19, 2024

I have also experienced worse numerical stability with Theano than with the other backends. Once the PyTorch backend is more stable and better tested, it may become the default backend. So I'm glad that switching away from Theano helps with your problem.

Also, I have installed TensorFlow from source before, but I have not found it an enjoyable or simple process (frankly, bazel doesn't seem like a great build tool, in my experience). My recommendation: unless you really are ready to dive into building from source, just use pip install tensorflow or pip install tensorflow-gpu (for GPU capabilities) and simply live with the warnings. After all, if you truly need a big speed boost, a GPU will be worlds different from the extra CPU optimizations.

