luoweizhou / densecap Goto Github PK
View Code? Open in Web Editor NEWThis project forked from salesforce/densecap
Dense video captioning in PyTorch
License: BSD 3-Clause "New" or "Revised" License
This project forked from salesforce/densecap
Dense video captioning in PyTorch
License: BSD 3-Clause "New" or "Revised" License
Sir, can you please tell me where i get this pickle file
I am trying to run training for the end-to-end masked transformer using the ActivityNet data set. Currently I am running this on an AWS EC2 instance of type p2.xlarge, which has one GPU. I call the training script as follows:
CUDA_VISIBLE_DEVICES=0 python scripts/train.py --dist_url ./ss_model --cfgs_file cfgs/anet.yml --checkpoint_path ./checkpoint/ss_model --batch_size 14 --world_size 1 --cuda --sent_weight 0.25 --mask_weight 1.0 --gated_mask | tee log/ss_model-0
Unfortunately I run into the error below with regards to multiprocessing. So far I have been unable to debug it successfully. When adding the spawn method as indicated by the error messages, further errors occur. I would appreciate any help in figuring out what I'm doing wrong.
train.py:122: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
options_yaml = yaml.load(handle)
Namespace(alpha=0.95, attn_dropout=0.2, batch_size=14, beta=0.999, cap_dropout=0.2, cfgs_file='cfgs/anet.yml', checkpoint_path='./checkpoint/weird', cls_weight=1.0, cuda=True, d_hidden=2048, d_model=1024, dataset='anet', dataset_file='./data/anet/anet_annotations_trainval.json', densecap_references=['./data/anet/val_1.json', './data/anet/val_2.json'], dist_backend='gloo', dist_url='./weird', dur_file='./data/anet/anet_duration_frame.csv', enable_visdom=False, epsilon=1e-08, feature_root='./dataset', gated_mask=True, grad_norm=1, image_feat_size=3072, in_emb_dropout=0.1, kernel_list=[1, 2, 3, 4, 5, 7, 9, 11, 15, 21, 29, 41, 57, 71, 111, 161, 211, 251], learning_rate=0.1, load_train_samplelist=False, load_valid_samplelist=False, loss_alpha_r=2, losses_log_every=1, mask_weight=1.0, max_epochs=20, max_sentence_len=20, n_heads=8, n_layers=2, neg_thresh=0.3, num_workers=1, optim='sgd', patience_epoch=1, pos_thresh=0.7, reduce_factor=0.5, reg_weight=10, sample_prob=0, sampling_sec=0.5, save_checkpoint_every=1, save_train_samplelist=False, save_valid_samplelist=False, scst_weight=0.0, seed=213, sent_weight=0.25, slide_window_size=480, slide_window_stride=20, start_from='', stride_factor=50, train_data_folder=['training'], train_sample=20, train_samplelist_path='/z/home/luozhou/subsystem/densecap_vid/train_samplelist.pkl', val_data_folder=['validation'], valid_batch_size=64, valid_samplelist_path='/z/home/luozhou/subsystem/densecap_vid/valid_samplelist.pkl', vis_emb_dropout=0.1, world_size=1)
loading dataset
# of words in the vocab: 4563
# of sentences in training: 37421, # of sentences in validation: 17505
# of training videos: 10009
size of the sentence block variable (['training']): torch.Size([37415, 20])
Process ForkPoolWorker-1:
Traceback (most recent call last):
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/pool.py", line 108, in worker
task = get()
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/queues.py", line 337, in get
return _ForkingPickler.loads(res)
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 95, in rebuild_storage_cuda
torch.cuda._lazy_init()
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/site-packages/torch/cuda/__init__.py", line 159, in _lazy_init
"Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Process ForkPoolWorker-2:
Traceback (most recent call last):
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/pool.py", line 108, in worker
task = get()
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/queues.py", line 337, in get
return _ForkingPickler.loads(res)
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 95, in rebuild_storage_cuda
torch.cuda._lazy_init()
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/site-packages/torch/cuda/__init__.py", line 159, in _lazy_init
"Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Process ForkPoolWorker-3:
Process ForkPoolWorker-4:
Traceback (most recent call last):
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/pool.py", line 108, in worker
task = get()
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/queues.py", line 337, in get
return _ForkingPickler.loads(res)
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 95, in rebuild_storage_cuda
torch.cuda._lazy_init()
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/site-packages/torch/cuda/__init__.py", line 159, in _lazy_init
"Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Traceback (most recent call last):
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/pool.py", line 108, in worker
task = get()
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/queues.py", line 337, in get
return _ForkingPickler.loads(res)
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 95, in rebuild_storage_cuda
torch.cuda._lazy_init()
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/site-packages/torch/cuda/__init__.py", line 159, in _lazy_init
"Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/THC/generic/THCStorage.c line=150 error=3 : initialization error
Process ForkPoolWorker-5:
Traceback (most recent call last):
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/pool.py", line 108, in worker
task = get()
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/queues.py", line 337, in get
return _ForkingPickler.loads(res)
File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 94, in rebuild_storage_cuda
return storage._new_view(offset, view_size)
RuntimeError: cuda runtime error (3) : initialization error at /opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/THC/generic/THCStorage.c:150
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/THC/generic/THCStorage.c line=150 error=3 : initialization error
Dear author @LuoweiZhou , thanks for your help in my last issue. Now, I meet another problem ๐ Could you release the pre-trained ResNet-200 model which you used to extract appearance feature?
Datasets and pretrained models sites are taken down, even after trying in multiple browsers i am not able to open these sites
Hi author, I am interested in your work and I have tried to reproduce this project in YouCook2 validation dataset. But the result let me down... I don't know why I cannot realize the result reported in your paper.
I follow the setting what you write in the readme.
The model parameters what I used is below:
--max_sentence_len: 20
--d_mode: 1024
--d_hidden: 2048
--n_heads: 8
--in_emb_dropout: 0.1
--attn_dropout: 0.2
--vis_emb_dropout', default=0.1
--cap_dropout', default=0.2
--image_feat_size', default=3072
--n_layers', default=2
--train_sample', default=20
--sample_prob', default=0
--slide_window_size', default=480
--slide_window_stride', default=20
--sampling_sec', default=0.5
--kernel_list', default=[1, 2, 3, 4, 5, 7, 9, 11, 15, 21, 29, 41, 57, 71, 111, 161, 211, 251]
--pos_thresh', default=0.7
--neg_thresh', default=0.3
--stride_factor', default=50
--max_epochs', default=20
--batch_size', default=16
--valid_batch_size', default=16
--cls_weight', default=1.0
--reg_weight', default=10
--sent_weight', default=0.25
--scst_weight', default=0.0
--mask_weight', default=1.0
--gated_mask', action='store_true'
--optim',default='sgd'
--learning_rate', default=0.1
--alpha', default=0.95
--beta', default=0.999
--epsilon', default=1e-8
--loss_alpha_r', default=2
--patience_epoch', default=1
--reduce_factor', default=0.5
--grad_norm', default=1
The result what I got is:
Proposal recall area: 18.823
BLEU@3: 2.749
BLEU@4: 0.6991
Meteor: 8.4377
And I found that the validation loss reduces to 1.2+ and It cannot be optimized for further reduction. Does it seem that the model still in the under-fitting state? The one reason I can figure out is that the model parameters I set are not the same as what you use. Hope to get your answer soon. Thanks!
Thank u so much sir for the previous reply, but now I m facing another issue related to
"AttributeError: 'ActionPropDenseCap' object has no attribute 'module'"
please suggest me a way to the solution and also I m using a single GPU.
Hi,
It comes nothing although I tried several times to download yc2's feature file
Would you mind checking the link status?
Thank you in advance!
The errors and configurations are as follows:
D:\Anaconda2\envs\py36\python.exe D:/PythonCode1/densecap-master/test.py --cfgs_file cfgs/anet.yml --densecap_eval_file tools/densevid_eval/evaluate.py --batch_size 1 --start_from checkpoint/anet-2L-e2e-mask/model_epoch_19.t7 --id anet-2L-e2e-mask-19 --val_data_folder validation --learn_mask --gated_mask --cuda
D:/PythonCode1/densecap-master/test.py:82: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
options_yaml = yaml.load(handle)
Namespace(attn_dropout=0.2, batch_size=1, cap_dropout=0.2, cfgs_file='cfgs/anet.yml', cuda=True, d_hidden=2048, d_model=1024, dataset='anet', dataset_file='data/anet/anet_annotations_trainval.json', densecap_eval_file='tools/densevid_eval/evaluate.py', densecap_references=['data/anet/val_1.json', 'data/anet/val_2.json'], dur_file='data/anet/anet_duration_frame.csv', feature_root='dataset/anet/ActivityNet', gated_mask=True, id='anet-2L-e2e-mask-19', image_feat_size=3072, in_emb_dropout=0.1, kernel_list=[1, 2, 3, 4, 5, 7, 9, 11, 15, 21, 29, 41, 57, 71, 111, 161, 211, 251], learn_mask=True, max_prop_num=500, max_sentence_len=20, min_prop_before_nms=200, min_prop_num=50, n_heads=8, n_layers=2, num_workers=2, pos_thresh=0.7, sampling_sec=0.5, slide_window_size=480, slide_window_stride=20, start_from='checkpoint/anet-2L-e2e-mask/model_epoch_19.t7', stride_factor=50, val_data_folder='validation', vis_emb_dropout=0.1)
loading dataset
total number of samples (unique videos): 0
total number of sentences: 0
building model
Initializing weights from checkpoint/anet-2L-e2e-mask/model_epoch_19.t7
avg_prop_num:
0
loader.dataset:
0
Traceback (most recent call last):
File "D:/PythonCode1/densecap-master/test.py", line 256, in
main()
File "D:/PythonCode1/densecap-master/test.py", line 250, in main
recall_area = validate(model, test_loader, args)
File "D:/PythonCode1/densecap-master/test.py", line 202, in validate
print("average proposal number: {}".format(avg_prop_num/len(loader.dataset)))
ZeroDivisionError: division by zero
Process finished with exit code 1
Looking forward to your reply!
Hi
I encountered an exception on the data loading process: https://github.com/LuoweiZhou/densecap/blob/master/data/anet_dataset.py#L146
I got this:
Traceback (most recent call last):
File "/s1_md0/v-botsh/anaconda/py3.6_torch0.4.0/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/s1_md0/v-botsh/anaconda/py3.6_torch0.4.0/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/s1_md0/v-botsh/anaconda/py3.6_torch0.4.0/lib/python3.6/multiprocessing/pool.py", line 108, in worker
task = get()
File "/s1_md0/v-botsh/anaconda/py3.6_torch0.4.0/lib/python3.6/multiprocessing/queues.py", line 337, in get
return _ForkingPickler.loads(res)
File "/s1_md0/v-botsh/anaconda/py3.6_torch0.4.0/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 95, in rebuild_storage_cuda
torch.cuda._lazy_init()
File "/s1_md0/v-botsh/anaconda/py3.6_torch0.4.0/lib/python3.6/site-packages/torch/cuda/init.py", line 159, in _lazy_init
"Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Is ther any idea to fix it?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.