
densecap's Issues

Cannot re-initialize CUDA in forked subprocess

I am trying to run training for the end-to-end masked transformer on the ActivityNet dataset. I am currently running this on an AWS EC2 p2.xlarge instance, which has one GPU. I call the training script as follows:

CUDA_VISIBLE_DEVICES=0 python scripts/train.py --dist_url ./ss_model --cfgs_file cfgs/anet.yml --checkpoint_path ./checkpoint/ss_model --batch_size 14 --world_size 1 --cuda --sent_weight 0.25 --mask_weight 1.0 --gated_mask | tee log/ss_model-0

Unfortunately I run into the error below regarding multiprocessing, and so far I have been unable to debug it successfully. When I add the 'spawn' start method as the error message suggests, further errors occur. I would appreciate any help in figuring out what I'm doing wrong.

train.py:122: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  options_yaml = yaml.load(handle)
Namespace(alpha=0.95, attn_dropout=0.2, batch_size=14, beta=0.999, cap_dropout=0.2, cfgs_file='cfgs/anet.yml', checkpoint_path='./checkpoint/weird', cls_weight=1.0, cuda=True, d_hidden=2048, d_model=1024, dataset='anet', dataset_file='./data/anet/anet_annotations_trainval.json', densecap_references=['./data/anet/val_1.json', './data/anet/val_2.json'], dist_backend='gloo', dist_url='./weird', dur_file='./data/anet/anet_duration_frame.csv', enable_visdom=False, epsilon=1e-08, feature_root='./dataset', gated_mask=True, grad_norm=1, image_feat_size=3072, in_emb_dropout=0.1, kernel_list=[1, 2, 3, 4, 5, 7, 9, 11, 15, 21, 29, 41, 57, 71, 111, 161, 211, 251], learning_rate=0.1, load_train_samplelist=False, load_valid_samplelist=False, loss_alpha_r=2, losses_log_every=1, mask_weight=1.0, max_epochs=20, max_sentence_len=20, n_heads=8, n_layers=2, neg_thresh=0.3, num_workers=1, optim='sgd', patience_epoch=1, pos_thresh=0.7, reduce_factor=0.5, reg_weight=10, sample_prob=0, sampling_sec=0.5, save_checkpoint_every=1, save_train_samplelist=False, save_valid_samplelist=False, scst_weight=0.0, seed=213, sent_weight=0.25, slide_window_size=480, slide_window_stride=20, start_from='', stride_factor=50, train_data_folder=['training'], train_sample=20, train_samplelist_path='/z/home/luozhou/subsystem/densecap_vid/train_samplelist.pkl', val_data_folder=['validation'], valid_batch_size=64, valid_samplelist_path='/z/home/luozhou/subsystem/densecap_vid/valid_samplelist.pkl', vis_emb_dropout=0.1, world_size=1)
loading dataset
# of words in the vocab: 4563
# of sentences in training: 37421, # of sentences in validation: 17505
# of training videos: 10009
size of the sentence block variable (['training']): torch.Size([37415, 20])
Process ForkPoolWorker-1:
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/queues.py", line 337, in get
    return _ForkingPickler.loads(res)
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 95, in rebuild_storage_cuda
    torch.cuda._lazy_init()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/site-packages/torch/cuda/__init__.py", line 159, in _lazy_init
    "Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Process ForkPoolWorker-2:
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/queues.py", line 337, in get
    return _ForkingPickler.loads(res)
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 95, in rebuild_storage_cuda
    torch.cuda._lazy_init()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/site-packages/torch/cuda/__init__.py", line 159, in _lazy_init
    "Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Process ForkPoolWorker-3:
Process ForkPoolWorker-4:
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/queues.py", line 337, in get
    return _ForkingPickler.loads(res)
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 95, in rebuild_storage_cuda
    torch.cuda._lazy_init()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/site-packages/torch/cuda/__init__.py", line 159, in _lazy_init
    "Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/queues.py", line 337, in get
    return _ForkingPickler.loads(res)
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 95, in rebuild_storage_cuda
    torch.cuda._lazy_init()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/site-packages/torch/cuda/__init__.py", line 159, in _lazy_init
    "Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/THC/generic/THCStorage.c line=150 error=3 : initialization error
Process ForkPoolWorker-5:
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/multiprocessing/queues.py", line 337, in get
    return _ForkingPickler.loads(res)
  File "/home/ubuntu/miniconda3/envs/demo_ss2/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 94, in rebuild_storage_cuda
    return storage._new_view(offset, view_size)
RuntimeError: cuda runtime error (3) : initialization error at /opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/THC/generic/THCStorage.c:150
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/THC/generic/THCStorage.c line=150 error=3 : initialization error
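The error message itself points at the fix. Below is a minimal sketch (not from this thread; the main() name is only assumed) of where the 'spawn' start method has to be selected, namely before the parent process touches CUDA and before any Pool or DataLoader worker is created:

import torch.multiprocessing as mp

if __name__ == '__main__':
    # 'fork' (the Linux default) cannot re-initialize CUDA in a child process;
    # 'spawn' starts clean interpreters instead. This must run before the first
    # CUDA call in the parent and before any worker process exists.
    mp.set_start_method('spawn', force=True)
    main()  # existing training entry point in scripts/train.py (name assumed)

An alternative while debugging is to avoid CUDA in the workers altogether, e.g. by keeping all tensors produced in the dataset code on the CPU or by running with --num_workers 0.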

Can you release all the model parameters?

Hi author, I am interested in your work and have tried to reproduce this project on the YouCook2 validation set, but the results are disappointing. I don't know why I cannot reach the numbers reported in your paper.

I followed the settings described in the README.
The model parameters I used are listed below (a command-line sketch assembling them follows the list):
--max_sentence_len: 20
--d_model: 1024
--d_hidden: 2048
--n_heads: 8
--in_emb_dropout: 0.1
--attn_dropout: 0.2
--vis_emb_dropout: 0.1
--cap_dropout: 0.2
--image_feat_size: 3072
--n_layers: 2
--train_sample: 20
--sample_prob: 0

--slide_window_size: 480
--slide_window_stride: 20
--sampling_sec: 0.5
--kernel_list: [1, 2, 3, 4, 5, 7, 9, 11, 15, 21, 29, 41, 57, 71, 111, 161, 211, 251]
--pos_thresh: 0.7
--neg_thresh: 0.3
--stride_factor: 50

--max_epochs: 20
--batch_size: 16
--valid_batch_size: 16
--cls_weight: 1.0
--reg_weight: 10
--sent_weight: 0.25
--scst_weight: 0.0
--mask_weight: 1.0
--gated_mask: enabled (store_true)

--optim: sgd
--learning_rate: 0.1
--alpha: 0.95
--beta: 0.999
--epsilon: 1e-8
--loss_alpha_r: 2
--patience_epoch: 1
--reduce_factor: 0.5
--grad_norm: 1
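For reference only (not from the original report): a command line assembling the flags above in the same form as the ActivityNet command in the first issue. The config file name cfgs/yc2.yml, the --dist_url value, and the checkpoint/log paths are assumptions; the remaining flags are assumed to come from the config file or the script defaults.

CUDA_VISIBLE_DEVICES=0 python scripts/train.py --dist_url ./yc2_model --cfgs_file cfgs/yc2.yml --checkpoint_path ./checkpoint/yc2_model --batch_size 16 --valid_batch_size 16 --world_size 1 --cuda --sent_weight 0.25 --mask_weight 1.0 --gated_mask | tee log/yc2_model-0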

The results I got are:
Proposal recall area: 18.823
BLEU@3: 2.749
BLEU@4: 0.6991
Meteor: 8.4377

I also found that the validation loss drops to about 1.2 and cannot be reduced any further. Does this mean the model is still underfitting? The only reason I can think of is that the parameters I set are not the same as the ones you used. Hope to get your answer soon. Thanks!

'ActionPropDenseCap' object has no attribute 'module'

Thank you so much for the previous reply, but now I am facing another issue:
"AttributeError: 'ActionPropDenseCap' object has no attribute 'module'"
Please suggest a way to solve it; note that I am using a single GPU.
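This AttributeError typically means code written for a model wrapped in nn.DataParallel or DistributedDataParallel (which expose the real network as model.module) is running against a plain, unwrapped single-GPU model. A minimal sketch of a guard, where the unwrap name and the checkpoint-saving usage are only assumptions:

import torch.nn as nn

def unwrap(model: nn.Module) -> nn.Module:
    # DataParallel / DistributedDataParallel keep the real network in .module;
    # a plain single-GPU module does not, hence the AttributeError.
    if isinstance(model, (nn.DataParallel, nn.parallel.DistributedDataParallel)):
        return model.module
    return model

# e.g. when saving a checkpoint (usage assumed):
# torch.save(unwrap(model).state_dict(), 'checkpoint.t7')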

Error running test.py:

The errors and configurations are as follows:
D:\Anaconda2\envs\py36\python.exe D:/PythonCode1/densecap-master/test.py --cfgs_file cfgs/anet.yml --densecap_eval_file tools/densevid_eval/evaluate.py --batch_size 1 --start_from checkpoint/anet-2L-e2e-mask/model_epoch_19.t7 --id anet-2L-e2e-mask-19 --val_data_folder validation --learn_mask --gated_mask --cuda
D:/PythonCode1/densecap-master/test.py:82: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
options_yaml = yaml.load(handle)
Namespace(attn_dropout=0.2, batch_size=1, cap_dropout=0.2, cfgs_file='cfgs/anet.yml', cuda=True, d_hidden=2048, d_model=1024, dataset='anet', dataset_file='data/anet/anet_annotations_trainval.json', densecap_eval_file='tools/densevid_eval/evaluate.py', densecap_references=['data/anet/val_1.json', 'data/anet/val_2.json'], dur_file='data/anet/anet_duration_frame.csv', feature_root='dataset/anet/ActivityNet', gated_mask=True, id='anet-2L-e2e-mask-19', image_feat_size=3072, in_emb_dropout=0.1, kernel_list=[1, 2, 3, 4, 5, 7, 9, 11, 15, 21, 29, 41, 57, 71, 111, 161, 211, 251], learn_mask=True, max_prop_num=500, max_sentence_len=20, min_prop_before_nms=200, min_prop_num=50, n_heads=8, n_layers=2, num_workers=2, pos_thresh=0.7, sampling_sec=0.5, slide_window_size=480, slide_window_stride=20, start_from='checkpoint/anet-2L-e2e-mask/model_epoch_19.t7', stride_factor=50, val_data_folder='validation', vis_emb_dropout=0.1)
loading dataset

# of words in the vocab: 4563
# of sentences in training: 37421, # of sentences in validation: 17505
# of training videos: 10009

total number of samples (unique videos): 0
total number of sentences: 0
building model
Initializing weights from checkpoint/anet-2L-e2e-mask/model_epoch_19.t7
avg_prop_num: 0
loader.dataset: 0
Traceback (most recent call last):
  File "D:/PythonCode1/densecap-master/test.py", line 256, in <module>
    main()
  File "D:/PythonCode1/densecap-master/test.py", line 250, in main
    recall_area = validate(model, test_loader, args)
  File "D:/PythonCode1/densecap-master/test.py", line 202, in validate
    print("average proposal number: {}".format(avg_prop_num/len(loader.dataset)))
ZeroDivisionError: division by zero

Process finished with exit code 1
Looking forward to your reply!
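The log above already shows the root cause before the crash: "total number of samples (unique videos): 0". The ZeroDivisionError in validate() is only a symptom of an empty test set, which usually means feature_root (here dataset/anet/ActivityNet) does not contain the extracted features, or val_data_folder does not match them. A small sanity check that fails early with a readable message, sketched with assumed function and variable names:

import os

def check_test_set(loader, feature_root):
    # Fail before validate() divides by len(loader.dataset).
    if not os.path.isdir(feature_root):
        raise FileNotFoundError('feature_root does not exist: {}'.format(feature_root))
    if len(loader.dataset) == 0:
        raise RuntimeError('No validation samples were loaded; '
                           'check feature_root and val_data_folder.')

# check_test_set(test_loader, args.feature_root)  # call before validate() (names assumed)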

exception on loading dataset

Hi

I encountered an exception in the data loading process: https://github.com/LuoweiZhou/densecap/blob/master/data/anet_dataset.py#L146

I got this:

Traceback (most recent call last):
  File "/s1_md0/v-botsh/anaconda/py3.6_torch0.4.0/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/s1_md0/v-botsh/anaconda/py3.6_torch0.4.0/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/s1_md0/v-botsh/anaconda/py3.6_torch0.4.0/lib/python3.6/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/s1_md0/v-botsh/anaconda/py3.6_torch0.4.0/lib/python3.6/multiprocessing/queues.py", line 337, in get
    return _ForkingPickler.loads(res)
  File "/s1_md0/v-botsh/anaconda/py3.6_torch0.4.0/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 95, in rebuild_storage_cuda
    torch.cuda._lazy_init()
  File "/s1_md0/v-botsh/anaconda/py3.6_torch0.4.0/lib/python3.6/site-packages/torch/cuda/__init__.py", line 159, in _lazy_init
    "Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

Do you have any idea how to fix it?
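This is the same 'spawn' error as in the first issue above. Besides switching the start method there, another common workaround (again only a sketch with assumed names, not the repository's code) is to keep everything handed to worker processes on the CPU and move data to the GPU only in the main process after batching:

import torch
from torch.utils.data import Dataset, DataLoader

class CpuOnlyDataset(Dataset):
    # Returning CPU tensors means forked workers never have to initialize CUDA.
    def __init__(self, features):
        self.features = features  # plain Python lists / numpy arrays / CPU tensors

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        return torch.as_tensor(self.features[idx])  # no .cuda() here

loader = DataLoader(CpuOnlyDataset([[0.0, 1.0], [2.0, 3.0]]), batch_size=2, num_workers=1)
for batch in loader:
    if torch.cuda.is_available():
        batch = batch.cuda(non_blocking=True)  # GPU transfer happens only in the main process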
