training with flickr8k aborts: 253/15000 batch done in 5.037s. at ep


Aborting, cost seems to be exploding. about neuraltalk HOT 4 OPEN

karpathy commented on July 24, 2024

Aborting, cost seems to be exploding.

from neuraltalk.

Comments (4)

karpathy commented on July 24, 2024

With default parameters? I thought I had tuned them so that this doesn't happen, sorry about that. As the message suggests, lowering the learning rate fixes it. Set learning_rate to about a half or a fifth of its current value, and keep lowering it until the cost no longer explodes :)
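The advice above amounts to a simple retry loop, sketched here in Python. This is only an illustration, not code from neuraltalk: `find_stable_learning_rate`, `train_fn`, and `fake_train` are hypothetical names, and the starting value 0.001 is taken from the default `learning_rate` in the parameter dump further down the thread. In practice `train_fn` would wrap a run of `driver.py` with the given learning rate; how that value is passed in is not shown in this thread.

```python
def find_stable_learning_rate(train_fn, start_lr=0.001, shrink=0.5, max_tries=5):
    """Retry training with smaller learning rates until the cost stops exploding.

    train_fn is a hypothetical callable: it takes a learning rate and returns
    True if training ran through, False if it aborted with an exploding cost.
    """
    lr = start_lr
    for _ in range(max_tries):
        lr *= shrink  # shrink=0.5 halves the rate; use 0.2 for "a fifth"
        if train_fn(lr):
            return lr
    return None  # nothing in the tried range was stable


# Toy stand-in for a real training run (assumption for illustration only):
# pretend that any learning rate above 3e-4 makes the cost explode.
def fake_train(lr):
    return lr <= 3e-4


if __name__ == "__main__":
    print(find_stable_learning_rate(fake_train))
```

Halving is the gentler choice; setting `shrink=0.2` reaches a stable value in fewer runs but may settle on a learning rate that is lower than necessary.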


StevenLOL commented on July 24, 2024

Here is my result on the default setting:

python driver.py
parsed parameters:
{
"grad_clip": 5,
"rnn_relu_encoders": 0,
"dataset": "flickr8k",
"image_encoding_size": 256,
"eval_max_images": -1,
"drop_prob_decoder": 0.5,
"word_encoding_size": 256,
"max_epochs": 50,
"eval_batch_size": 100,
"fappend": "baseline",
"generator": "lstm",
"write_checkpoint_ppl_threshold": -1,
"decay_rate": 0.999,
"tanhC_version": 0,
"hidden_size": 256,
"momentum": 0.0,
"worker_status_output_directory": "status/",
"learning_rate": 0.001,
"checkpoint_output_directory": "cv/",
"do_grad_check": 0,
"word_count_threshold": 5,
"batch_size": 100,
"regc": 1e-08,
"smooth_eps": 1e-08,
"solver": "rmsprop",
"eval_period": 1.0,
"drop_prob_encoder": 0.5
}
Initializing data provider for dataset flickr8k...
BasicDataProvider: reading data/flickr8k/dataset.json
BasicDataProvider: reading data/flickr8k/vgg_feats.mat
preprocessing word counts and creating vocab based on word count threshold 5

253/15000 batch done in 3.242s. at epoch 0.84. loss cost = 39.264201, reg cost = 0.000001, ppl2 = 29.60 (smooth 47.89)
254/15000 batch done in 3.133s. at epoch 0.85. loss cost = 39.633654, reg cost = 0.000001, ppl2 = 33.57 (smooth 47.74)
255/15000 batch done in 3.169s. at epoch 0.85. loss cost = 38.571550, reg cost = 0.000001, ppl2 = 29.56 (smooth 47.56)

.
.
.
.
.
one day later...
.

14999/15000 batch done in 3.492s. at epoch 50.00. loss cost = 28.621228, reg cost = 0.000004, ppl2 = 11.19 (smooth 10.80)
evaluating val performance in batches of 100
evaluated 5000 sentences and got perplexity = 17.785250
validation perplexity = 17.785250
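To follow a run like this without scrolling through a day of output, the per-batch lines can be parsed into numbers. The sketch below is an assumption-laden helper, not part of neuraltalk: `LINE_RE` and `parse_line` are hypothetical names, and the pattern is written only against the line format visible in the log above.

```python
import re

# Pattern written against the per-batch lines shown in the log above.
LINE_RE = re.compile(
    r"(?P<it>\d+)/(?P<total>\d+) batch done in (?P<secs>[\d.]+)s\. "
    r"at epoch (?P<epoch>[\d.]+)\. "
    r"loss cost = (?P<loss>[\d.]+), reg cost = (?P<reg>[\d.]+), "
    r"ppl2 = (?P<ppl2>[\d.]+) \(smooth (?P<smooth>[\d.]+)\)"
)


def parse_line(line):
    """Return the fields of one per-batch log line as a dict, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    d = m.groupdict()
    fields = {"it": int(d["it"]), "total": int(d["total"])}
    for key in ("secs", "epoch", "loss", "reg", "ppl2", "smooth"):
        fields[key] = float(d[key])
    return fields


if __name__ == "__main__":
    # Two lines copied from the log above: an early batch and the final one.
    early = parse_line(
        "253/15000 batch done in 3.242s. at epoch 0.84. loss cost = 39.264201, "
        "reg cost = 0.000001, ppl2 = 29.60 (smooth 47.89)"
    )
    last = parse_line(
        "14999/15000 batch done in 3.492s. at epoch 50.00. loss cost = 28.621228, "
        "reg cost = 0.000004, ppl2 = 11.19 (smooth 10.80)"
    )
    print("smoothed ppl2 went from", early["smooth"], "to", last["smooth"])
```

Lines that are not per-batch reports, such as the validation summary, simply return `None`, so the helper can be mapped over a whole log file and filtered.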


karpathy commented on July 24, 2024

@StevenLOL Nice! Looking at the Model Zoo,
http://cs.stanford.edu/people/karpathy/neuraltalk/

my LSTM model achieves perplexity of about 15.7 (which is slightly better). I ran it for longer and cross-validated it on our cluster, though.


pannous commented on July 24, 2024

Thanks, I will try again with a reduced learning rate.

On Jan 10, 2015, at 10:59 AM, Steven wrote:

Here is my result on the default setting: [same output as in StevenLOL's comment above]
