hobincar / SA-LSTM
A PyTorch implementation of "Describing Videos by Exploiting Temporal Structure", ICCV 2015
License: MIT License
Hi hobincar, I see that nn.Linear and nn.LSTM differ in where the batch dimension goes: nn.Linear expects batch-first input, while nn.LSTM defaults to the batch in the middle, i.e. (seq_len, batch, features). How did you handle that?
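For anyone who hits the same question: a minimal sketch of the two usual ways to reconcile the batch dimension, assuming batch-first feature tensors (the sizes below are made up for illustration):

```python
import torch
import torch.nn as nn

batch, seq_len, in_dim, hid_dim = 4, 10, 32, 64  # made-up sizes
x = torch.randn(batch, seq_len, in_dim)          # batch-first features

# nn.Linear acts on the last dimension, so batch-first input works as-is.
linear = nn.Linear(in_dim, hid_dim)
y = linear(x)                                    # (batch, seq_len, hid_dim)

# nn.LSTM defaults to (seq_len, batch, features): transpose in and out...
lstm = nn.LSTM(in_dim, hid_dim)
out, _ = lstm(x.transpose(0, 1))                 # (seq_len, batch, hid_dim)
out = out.transpose(0, 1)                        # back to batch-first

# ...or simply construct the LSTM with batch_first=True.
lstm_bf = nn.LSTM(in_dim, hid_dim, batch_first=True)
out_bf, _ = lstm_bf(x)                           # (batch, seq_len, hid_dim)
```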
The get_last_hidden function in the Decoder class is quite confusing to me. Could you walk through those tensor operations? What does this function do? Many thanks.
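For reference, a minimal sketch of what such a function typically does (this is an assumed reimplementation for illustration, not the repository's exact code): for each padded sequence in the batch, it gathers the output at that sequence's last valid timestep.

```python
import torch

def get_last_hidden(outputs, lengths):
    """Pick, for each sequence in the batch, the output at its last valid
    timestep. outputs: (batch, seq_len, hidden); lengths: (batch,)."""
    # Build an index of shape (batch, 1, hidden) pointing at step length-1,
    # then gather along the time dimension and drop it.
    idx = (lengths - 1).view(-1, 1, 1).expand(-1, 1, outputs.size(2))
    return outputs.gather(1, idx).squeeze(1)     # (batch, hidden)

# Toy example: batch of 2 sequences, 3 timesteps, hidden size 4.
outputs = torch.arange(24, dtype=torch.float).view(2, 3, 4)
lengths = torch.tensor([2, 3])
last = get_last_hidden(outputs, lengths)
```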
Hi,
Why didn't you use the 3D-ResNext-101 features of the MSR-VTT dataset to train the model?
Hi, thank you very much for releasing your code.
Could you provide R(2+1)D and 3D-ResNet features extracted from the MSVD dataset?
Hi,
I would like to ask: how can I tune these hyperparameters effectively? Is there a trick, or do we have to try them one by one?
Looking forward to your reply, thank you.
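One common alternative to trying values one by one is random search over a small space; a minimal sketch, where train_and_eval is a hypothetical stand-in for the repository's actual training loop returning a validation score:

```python
import random

# Hypothetical search space; the values are illustrative, not the repo's.
search_space = {
    "lr": [1e-3, 5e-4, 1e-4, 5e-5],
    "hidden_size": [256, 512, 1024],
    "dropout": [0.3, 0.5],
}

def sample(space, rng):
    """Draw one random configuration from the search space."""
    return {k: rng.choice(v) for k, v in space.items()}

rng = random.Random(0)            # fixed seed for reproducibility
trials = [sample(search_space, rng) for _ in range(5)]

# Each trial would be trained and scored, keeping the best:
# best = max(trials, key=lambda hp: train_and_eval(**hp))
```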
Hi,
Thank you again for releasing your code; your work is excellent. I think your research will get better and better.
I made some changes to your code, but the training results were very bad. I would like to ask what might cause this.
In Epoch 1, I got results similar to yours. However, as training progresses, the generated captions become worse and worse, and the metric scores drop to almost zero.
Looking forward to your reply, and heartfelt thanks.
Partial results are as follows:
[Epoch #1] loss: 4.623 = (CE: 4.623) + (Ent: 0.0 * 3099.393): 100%|██████████| 244/244 [03:40<00:00, 1.11it/s]
loss: 5.72320610773 (CE 5.72320610773 + E 2923.30143263)
{'reflen': 581, 'guess': [574, 474, 374, 274], 'testlen': 574, 'correct': [425, 162, 78, 9]}
ratio: 0.987951807227
loss: 4.93912514773 (CE 4.93912514773 + E 2970.68725586)
scores: {'CIDEr': 0.12514787945892875, 'Bleu_4': 0.20157463878737503, 'Bleu_3': 0.3705521147726949, 'Bleu_2': 0.4969475811933564, 'Bleu_1': 0.731443463944282, 'ROUGE_L': 0.5897754218096293, 'METEOR': 0.22409070167637316}
Saving checkpoint at epoch=1 to checkpoints/SA-LSTM | MSVD | FEAT MSVD_InceptionV4 mfl-60 fsl-28 mcl-30 | EMB 468 | DEC uni-LSTM-l1-h512 at-256 | DEC uni-LSTM-l1-h512 at-256 | OPTIM AMSGrad lr-5e-05-dc-20-0.5-5-wd-1e-05 rg-0.0 | 191125-10:14:35/1.ckpt
[Epoch #4] loss: 3.143 = (CE: 3.143) + (Ent: 0.0 * 4099.900): 100%|██████████| 244/244 [03:39<00:00, 1.11it/s]
loss: 3.27911160422 (CE 3.27911160422 + E 4025.05741507)
{'reflen': 1117, 'guess': [1197, 1097, 997, 897], 'testlen': 1197, 'correct': [598, 191, 61, 15]}
ratio: 1.07162041182
loss: 5.4078495936 (CE 5.4078495936 + E 2899.17534846)
scores: {'CIDEr': 0.1675833601295786, 'Bleu_4': 0.09712741190089311, 'Bleu_3': 0.1745913876077926, 'Bleu_2': 0.2949285982058635, 'Bleu_1': 0.4995822890555559, 'ROUGE_L': 0.47106992298136474, 'METEOR': 0.1974787218445869}
Saving checkpoint at epoch=4 to checkpoints/SA-LSTM | MSVD | FEAT MSVD_InceptionV4 mfl-60 fsl-28 mcl-30 | EMB 468 | DEC uni-LSTM-l1-h512 at-256 | DEC uni-LSTM-l1-h512 at-256 | OPTIM AMSGrad lr-5e-05-dc-20-0.5-5-wd-1e-05 rg-0.0 | 191125-10:14:35/4.ckpt
[Epoch #8] loss: 2.302 = (CE: 2.302) + (Ent: 0.0 * 4122.210): 100%|██████████| 244/244 [03:39<00:00, 1.11it/s]
loss: 2.32029580581 (CE 2.32029580581 + E 3954.91530021)
{'reflen': 1399, 'guess': [1804, 1704, 1604, 1504], 'testlen': 1804, 'correct': [695, 216, 57, 11]}
ratio: 1.28949249464
loss: 6.12755023349 (CE 6.12755023349 + E 2833.49368564)
scores: {'CIDEr': 0.06094166329578324, 'Bleu_4': 0.059687988482948866, 'Bleu_3': 0.12017135956381349, 'Bleu_2': 0.2209867404159963, 'Bleu_1': 0.38525498891331195, 'ROUGE_L': 0.3835879206901543, 'METEOR': 0.1720359332410031}
Saving checkpoint at epoch=8 to checkpoints/SA-LSTM | MSVD | FEAT MSVD_InceptionV4 mfl-60 fsl-28 mcl-30 | EMB 468 | DEC uni-LSTM-l1-h512 at-256 | DEC uni-LSTM-l1-h512 at-256 | OPTIM AMSGrad lr-5e-05-dc-20-0.5-5-wd-1e-05 rg-0.0 | 191125-10:14:35/8.ckpt
[Epoch #10] loss: 2.018 = (CE: 2.018) + (Ent: 0.0 * 3962.546): 100%|██████████| 244/244 [03:39<00:00, 1.11it/s]
loss: 2.03280989068 (CE 2.03280989068 + E 3847.87243252)
{'reflen': 1527, 'guess': [1883, 1783, 1683, 1583], 'testlen': 1883, 'correct': [739, 215, 59, 11]}
ratio: 1.23313686968
loss: 6.26145935059 (CE 6.26145935059 + E 2839.57718173)
scores: {'CIDEr': 0.035406386106884076, 'Bleu_4': 0.05826935694302678, 'Bleu_3': 0.1183812833024839, 'Bleu_2': 0.21754074803295365, 'Bleu_1': 0.39245884227276023, 'ROUGE_L': 0.3603913688475546, 'METEOR': 0.17098638202970123}
Saving checkpoint at epoch=10 to checkpoints/SA-LSTM | MSVD | FEAT MSVD_InceptionV4 mfl-60 fsl-28 mcl-30 | EMB 468 | DEC uni-LSTM-l1-h512 at-256 | DEC uni-LSTM-l1-h512 at-256 | OPTIM AMSGrad lr-5e-05-dc-20-0.5-5-wd-1e-05 rg-0.0 | 191125-10:14:35/10.ckpt
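The pattern in these logs (training loss keeps falling while validation loss rises and the scores collapse after epoch 1) is the classic signature of overfitting. A minimal early-stopping sketch over a list of per-epoch validation scores, assuming higher is better:

```python
def early_stop(scores, patience=3):
    """Return True once the validation score has failed to improve on its
    best value for `patience` consecutive epochs."""
    best, since_best = float("-inf"), 0
    for s in scores:
        if s > best:
            best, since_best = s, 0   # new best: reset the counter
        else:
            since_best += 1
            if since_best >= patience:
                return True
    return False

# e.g. CIDEr per epoch, mirroring the degradation seen in the logs above
cider = [0.125, 0.168, 0.061, 0.035, 0.030]
stop = early_stop(cider, patience=3)
```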
Hi,
Why are the evaluation scores of your SA-LSTM (Inception-v4) model so different from those of the SA-LSTM[3] (Inception-v4) model? For example:
SA-LSTM[3] (Inception-v4): BLEU4 45.3, CIDEr 76.2, METEOR 31.9, ROUGE_L 64.2
SA-LSTM [yours] (Inception-v4): BLEU4 50.2, CIDEr 79.0, METEOR 33.3, ROUGE_L 69.7
Hi, your reproduction and coding ability are both very good. Would you consider using reinforcement learning as the objective function instead of cross-entropy, as many image captioning methods do? I think this would further improve performance.
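For reference, the usual form of such an objective in captioning is self-critical sequence training (SCST), which uses the greedy caption's score as a baseline. A minimal sketch; the log-probabilities and rewards below are hypothetical stand-ins for the model's sampled captions and their CIDEr scores:

```python
import torch

def scst_loss(sampled_logprobs, sampled_reward, greedy_reward):
    """Policy-gradient loss with the greedy decode's reward as baseline:
    maximize (reward - baseline) * log p(sampled caption)."""
    advantage = sampled_reward - greedy_reward
    return -(advantage * sampled_logprobs).mean()

# Summed log-probabilities of two sampled captions (made-up values),
# with their rewards and the rewards of the greedy captions.
logprobs = torch.tensor([-4.2, -3.1], requires_grad=True)
loss = scst_loss(logprobs,
                 torch.tensor([0.8, 0.3]),   # sampled-caption rewards
                 torch.tensor([0.5, 0.5]))   # greedy-baseline rewards
```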
Thank you for your awesome code! When I run train.py, I only obtain the scores of the four metrics. My question is: how can I make the model generate sentences and save them?
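Since the evaluation loop already produces the captions it scores, dumping them to disk is straightforward; a minimal sketch with hypothetical video-id/caption pairs (adapt the names to the repository's actual code):

```python
import json

def save_captions(pairs, path):
    """Write (video_id, caption) pairs to a JSON file, one entry per video."""
    with open(path, "w") as f:
        json.dump({vid: cap for vid, cap in pairs}, f, indent=2)

# Hypothetical captions standing in for the model's generated sentences.
save_captions([("vid1", "a man is cooking"),
               ("vid2", "a dog is running")],
              "captions.json")
```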