
densecap's Issues

Cannot re-initialize CUDA in forked subprocess

I am trying to run training for the end-to-end masked transformer on the ActivityNet dataset. I am currently running this on an AWS EC2 p2.xlarge instance, which has one GPU. I call the training script as follows:

CUDA_VISIBLE_DEVICES=0 python scripts/train.py --dist_url ./ss_model --cfgs_file cfgs/anet.yml --checkpoint_path ./checkpoint/ss_model --batch_size 14 --world_size 1 --cuda --sent_weight 0.25 --mask_weight 1.0 --gated_mask | tee log/ss_model-0

Unfortunately I run into the error below regarding multiprocessing, and so far I have been unable to debug it successfully. When I add the 'spawn' start method as the error message suggests, further errors occur. I would appreciate any help in figuring out what I'm doing wrong.

train.py:122: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  options_yaml = yaml.load(handle)
Namespace(alpha=0.95, attn_dropout=0.2, batch_size=14, beta=0.999, cap_dropout=0.2, cfgs_file='cfgs/anet.yml', checkpoint_path='./checkpoint/weird', cls_weight=1.0, cuda=True, d_hidden=2048, d_model=1024, dataset='anet', dataset_file='./data/anet/anet_annotations_trainval.json', densecap_references=['./data/anet/val_1.json', './data/anet/val_2.json'], dist_backend='gloo', dist_url='./weird', dur_file='./data/anet/anet_duration_frame.csv', enable_visdom=False, epsilon=1e-08, feature_root='./dataset', gated_mask=True, grad_norm=1, image_feat_size=3072, in_emb_dropout=0.1, kernel_list=[1, 2, 3, 4, 5, 7, 9, 11, 15, 21, 29, 41, 57, 71, 111, 161, 211, 251], learning_rate=0.1, load_train_samplelist=False, load_valid_samplelist=False, loss_alpha_r=2, losses_log_every=1, mask_weight=1.0, max_epochs=20, max_sentence_len=20, n_heads=8, n_layers=2, neg_thresh=0.3, num_workers=1, optim='sgd', patience_epoch=1, pos_thresh=0.7, reduce_factor=0.5, reg_weight=10, sample_prob=0, sampling_sec=0.5, save_checkpoint_every=1, save_train_samplelist=False, save_valid_samplelist=False, scst_weight=0.0, seed=213, sent_weight=0.25, slide_window_size=480, slide_window_stride=20, start_from='', stride_factor=50, train_data_folder=['training'], train_sample=20, train_samplelist_path='/z/home/luozhou/subsystem/densecap_vid/train_samplelist.pkl', val_data_folder=['validation'], valid_batch_size=64, valid_samplelist_path='/z/home/luozhou/subsystem/densecap_vid/valid_samplelist.pkl', vis_emb_dropout=0.1, world_size=1)
loading dataset
# of words in the vocab: 4563
# of sentences in training: 37421, # of sentences in validation: 17505
# of training videos: 10009
size of the sentence block variable (['training']): torch.Size([37415, 20])
Process ForkPoolWorker-1:
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/queues.py", line 337, in get
    return _ForkingPickler.loads(res)
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 95, in rebuild_storage_cuda
    torch.cuda._lazy_init()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/site-packages/torch/cuda/__init__.py", line 159, in _lazy_init
    "Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Process ForkPoolWorker-2:
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/queues.py", line 337, in get
    return _ForkingPickler.loads(res)
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 95, in rebuild_storage_cuda
    torch.cuda._lazy_init()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/site-packages/torch/cuda/__init__.py", line 159, in _lazy_init
    "Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Process ForkPoolWorker-3:
Process ForkPoolWorker-4:
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/queues.py", line 337, in get
    return _ForkingPickler.loads(res)
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 95, in rebuild_storage_cuda
    torch.cuda._lazy_init()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/site-packages/torch/cuda/__init__.py", line 159, in _lazy_init
    "Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/queues.py", line 337, in get
    return _ForkingPickler.loads(res)
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 95, in rebuild_storage_cuda
    torch.cuda._lazy_init()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/site-packages/torch/cuda/__init__.py", line 159, in _lazy_init
    "Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/THC/generic/THCStorage.c line=150 error=3 : initialization error
Process ForkPoolWorker-5:
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/queues.py", line 337, in get
    return _ForkingPickler.loads(res)
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 94, in rebuild_storage_cuda
    return storage._new_view(offset, view_size)
RuntimeError: cuda runtime error (3) : initialization error at /opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/THC/generic/THCStorage.c:150
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/THC/generic/THCStorage.c line=150 error=3 : initialization error
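The error message itself points at the fix. Below is a minimal sketch (not from this thread; the main() name is only assumed) of where the 'spawn' start method has to be selected, namely before the parent process touches CUDA and before any Pool or DataLoader worker is created:

import torch.multiprocessing as mp

if __name__ == '__main__':
    # 'fork' (the Linux default) cannot re-initialize CUDA in a child process;
    # 'spawn' starts clean interpreters instead. This must run before the first
    # CUDA call in the parent and before any worker process exists.
    mp.set_start_method('spawn', force=True)
    main()  # existing training entry point in scripts/train.py (name assumed)

An alternative while debugging is to avoid CUDA in the workers altogether, e.g. by keeping all tensors produced in the dataset code on the CPU or by running with --num_workers 0.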

Can you release all the model parameters?

Hi author, I am interested in your work and have tried to reproduce this project on the YouCook2 validation set, but the results are disappointing. I don't know why I cannot reach the numbers reported in your paper.

I followed the settings described in the README.
The model parameters I used are listed below (a command-line sketch assembling them follows the list):
--max_sentence_len: 20
--d_model: 1024
--d_hidden: 2048
--n_heads: 8
--in_emb_dropout: 0.1
--attn_dropout: 0.2
--vis_emb_dropout: 0.1
--cap_dropout: 0.2
--image_feat_size: 3072
--n_layers: 2
--train_sample: 20
--sample_prob: 0

--slide_window_size: 480
--slide_window_stride: 20
--sampling_sec: 0.5
--kernel_list: [1, 2, 3, 4, 5, 7, 9, 11, 15, 21, 29, 41, 57, 71, 111, 161, 211, 251]
--pos_thresh: 0.7
--neg_thresh: 0.3
--stride_factor: 50

--max_epochs: 20
--batch_size: 16
--valid_batch_size: 16
--cls_weight: 1.0
--reg_weight: 10
--sent_weight: 0.25
--scst_weight: 0.0
--mask_weight: 1.0
--gated_mask: enabled (store_true)

--optim: sgd
--learning_rate: 0.1
--alpha: 0.95
--beta: 0.999
--epsilon: 1e-8
--loss_alpha_r: 2
--patience_epoch: 1
--reduce_factor: 0.5
--grad_norm: 1
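For reference only (not from the original report): a command line assembling the flags above in the same form as the ActivityNet command in the first issue. The config file name cfgs/yc2.yml, the --dist_url value, and the checkpoint/log paths are assumptions; the remaining flags are assumed to come from the config file or the script defaults.

CUDA_VISIBLE_DEVICES=0 python scripts/train.py --dist_url ./yc2_model --cfgs_file cfgs/yc2.yml --checkpoint_path ./checkpoint/yc2_model --batch_size 16 --valid_batch_size 16 --world_size 1 --cuda --sent_weight 0.25 --mask_weight 1.0 --gated_mask | tee log/yc2_model-0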

The results I got are:
Proposal recall area: 18.823
BLEU@3: 2.749
BLEU@4: 0.6991
Meteor: 8.4377

I also found that the validation loss drops to about 1.2 and cannot be reduced any further. Does this mean the model is still underfitting? The only reason I can think of is that the parameters I set are not the same as the ones you used. Hope to get your answer soon. Thanks!

'ActionPropDenseCap' object has no attribute 'module'

Thank you so much for the previous reply, but now I am facing another issue:
"AttributeError: 'ActionPropDenseCap' object has no attribute 'module'"
Please suggest a way to solve it; note that I am using a single GPU.
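This AttributeError typically means code written for a model wrapped in nn.DataParallel or DistributedDataParallel (which expose the real network as model.module) is running against a plain, unwrapped single-GPU model. A minimal sketch of a guard, where the unwrap name and the checkpoint-saving usage are only assumptions:

import torch.nn as nn

def unwrap(model: nn.Module) -> nn.Module:
    # DataParallel / DistributedDataParallel keep the real network in .module;
    # a plain single-GPU module does not, hence the AttributeError.
    if isinstance(model, (nn.DataParallel, nn.parallel.DistributedDataParallel)):
        return model.module
    return model

# e.g. when saving a checkpoint (usage assumed):
# torch.save(unwrap(model).state_dict(), 'checkpoint.t7')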

Error running test.py:

The errors and configurations are as follows:
D:\Anaconda2\envs\py36\python.exe D:/PythonCode1/densecap-master/test.py --cfgs_file cfgs/anet.yml --densecap_eval_file tools/densevid_eval/evaluate.py --batch_size 1 --start_from checkpoint/anet-2L-e2e-mask/model_epoch_19.t7 --id anet-2L-e2e-mask-19 --val_data_folder validation --learn_mask --gated_mask --cuda
D:/PythonCode1/densecap-master/test.py:82: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
options_yaml = yaml.load(handle)
Namespace(attn_dropout=0.2, batch_size=1, cap_dropout=0.2, cfgs_file='cfgs/anet.yml', cuda=True, d_hidden=2048, d_model=1024, dataset='anet', dataset_file='data/anet/anet_annotations_trainval.json', densecap_eval_file='tools/densevid_eval/evaluate.py', densecap_references=['data/anet/val_1.json', 'data/anet/val_2.json'], dur_file='data/anet/anet_duration_frame.csv', feature_root='dataset/anet/ActivityNet', gated_mask=True, id='anet-2L-e2e-mask-19', image_feat_size=3072, in_emb_dropout=0.1, kernel_list=[1, 2, 3, 4, 5, 7, 9, 11, 15, 21, 29, 41, 57, 71, 111, 161, 211, 251], learn_mask=True, max_prop_num=500, max_sentence_len=20, min_prop_before_nms=200, min_prop_num=50, n_heads=8, n_layers=2, num_workers=2, pos_thresh=0.7, sampling_sec=0.5, slide_window_size=480, slide_window_stride=20, start_from='checkpoint/anet-2L-e2e-mask/model_epoch_19.t7', stride_factor=50, val_data_folder='validation', vis_emb_dropout=0.1)
loading dataset

# of words in the vocab: 4563
# of sentences in training: 37421, # of sentences in validation: 17505
# of training videos: 10009

total number of samples (unique videos): 0
total number of sentences: 0
building model
Initializing weights from checkpoint/anet-2L-e2e-mask/model_epoch_19.t7
avg_prop_num: 0
loader.dataset: 0
Traceback (most recent call last):
  File "D:/PythonCode1/densecap-master/test.py", line 256, in <module>
    main()
  File "D:/PythonCode1/densecap-master/test.py", line 250, in main
    recall_area = validate(model, test_loader, args)
  File "D:/PythonCode1/densecap-master/test.py", line 202, in validate
    print("average proposal number: {}".format(avg_prop_num/len(loader.dataset)))
ZeroDivisionError: division by zero

Process finished with exit code 1
Looking forward to your reply!
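The log above already shows the root cause before the crash: "total number of samples (unique videos): 0". The ZeroDivisionError in validate() is only a symptom of an empty test set, which usually means feature_root (here dataset/anet/ActivityNet) does not contain the extracted features, or val_data_folder does not match them. A small sanity check that fails early with a readable message, sketched with assumed function and variable names:

import os

def check_test_set(loader, feature_root):
    # Fail before validate() divides by len(loader.dataset).
    if not os.path.isdir(feature_root):
        raise FileNotFoundError('feature_root does not exist: {}'.format(feature_root))
    if len(loader.dataset) == 0:
        raise RuntimeError('No validation samples were loaded; '
                           'check feature_root and val_data_folder.')

# check_test_set(test_loader, args.feature_root)  # call before validate() (names assumed)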

exception on loading dataset

Hi

I encountered an exception in the data loading process: https://github.com/LuoweiZhou/densecap/blob/master/data/anet_dataset.py#L146

I got this:

Traceback (most recent call last):
  File "/s1_md0/v-botsh/anaconda/py3.6_torch0.4.0/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/s1_md0/v-botsh/anaconda/py3.6_torch0.4.0/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/s1_md0/v-botsh/anaconda/py3.6_torch0.4.0/lib/python3.6/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/s1_md0/v-botsh/anaconda/py3.6_torch0.4.0/lib/python3.6/multiprocessing/queues.py", line 337, in get
    return _ForkingPickler.loads(res)
  File "/s1_md0/v-botsh/anaconda/py3.6_torch0.4.0/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 95, in rebuild_storage_cuda
    torch.cuda._lazy_init()
  File "/s1_md0/v-botsh/anaconda/py3.6_torch0.4.0/lib/python3.6/site-packages/torch/cuda/__init__.py", line 159, in _lazy_init
    "Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

Do you have any idea how to fix it?
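This is the same 'spawn' error as in the first issue above. Besides switching the start method there, another common workaround (again only a sketch with assumed names, not the repository's code) is to keep everything handed to worker processes on the CPU and move data to the GPU only in the main process after batching:

import torch
from torch.utils.data import Dataset, DataLoader

class CpuOnlyDataset(Dataset):
    # Returning CPU tensors means forked workers never have to initialize CUDA.
    def __init__(self, features):
        self.features = features  # plain Python lists / numpy arrays / CPU tensors

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        return torch.as_tensor(self.features[idx])  # no .cuda() here

loader = DataLoader(CpuOnlyDataset([[0.0, 1.0], [2.0, 3.0]]), batch_size=2, num_workers=1)
for batch in loader:
    if torch.cuda.is_available():
        batch = batch.cuda(non_blocking=True)  # GPU transfer happens only in the main process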
