cbnet_caffe's People

Contributors

pkubahuangliuhe

cbnet_caffe's Issues

gpu out of memory issue

What kind of GPU did you use to train this model? My GPU is an Nvidia 1080 Ti. I tried to train a model using the config 'e2e_mask_cascade_rcnn_dual-X-152-32x8d-FPN-IN5k_1.44x.yaml', but even with the batch size set to 1, training cannot proceed:

[I net_async_base.h:206] Using specified CPU pool size: 16; device id: -1
[I net_async_base.h:211] Created new CPU pool, size: 16; device id: -1
[E net_async_base.cc:382] [enforce fail at context_gpu.cu:496] error == cudaSuccess. 2 vs 0. Error at: /pytorch/caffe2/core/context_gpu.cu:496: out of memory
Error from operator:
input: "gpu_0/res4_23_branch2c" input: "gpu_0/res4_23_branch2c_bn_s" input: "gpu_0/res4_23_branch2c_bn_b" output: "gpu_0/res4_23_branch2c_bn" name: "" type: "AffineChannel" device_option { device_type: 1 device_id: 0 }
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x47

Any tips for finetuning on private datasets?

Hi @PKUbahuangliuhe ,

Great work! As far as I know, CBNet is the state of the art on COCO object detection.
Do you have any plans to support fine-tuning your pretrained model on other datasets (with a different number of object classes)?
Any tips or reference code?

Many thanks!
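
One common approach to fine-tuning a Detectron-style checkpoint on a dataset with a different number of classes is to drop the output-layer blobs whose shape may depend on the class count, so that they are re-initialized. The sketch below is not from this repository. It assumes the weights are a dict mapping blob names to arrays (optionally nested under a 'blobs' key), and the name fragments in CLASS_DEPENDENT are assumptions modeled on upstream Detectron naming; check them against the blob names in the actual checkpoint.

```python
# Hypothetical helper, not part of CBNet or Detectron.
# Name fragments of blobs whose shape may depend on the number of classes.
# These fragments are assumptions; verify them against the checkpoint.
CLASS_DEPENDENT = ("cls_score", "bbox_pred", "mask_fcn_logits")


def drop_class_dependent_blobs(weights, fragments=CLASS_DEPENDENT):
    """Return a copy of the blob dict without class-dependent blobs."""
    blobs = weights["blobs"] if "blobs" in weights else weights
    return {
        name: value
        for name, value in blobs.items()
        if not any(fragment in name for fragment in fragments)
    }


if __name__ == "__main__":
    # Toy stand-in for a loaded checkpoint; real files hold arrays.
    demo = {"blobs": {"conv1_w": [0.1], "cls_score_w": [0.2], "bbox_pred_b": [0.3]}}
    print(sorted(drop_class_dependent_blobs(demo)))
```

The filtered dict would then be saved and referenced from TRAIN.WEIGHTS, with NUM_CLASSES set for the new dataset.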

RuntimeError: [enforce fail at pybind_state.h:425] . Exception encountered running PythonOp function: ValueError: min() arg is an empty sequence

I downloaded the pretrained model from 'https://dl.fbaipublicfiles.com/detectron' and put it in /tmp/detectron/ImageNetPretrained/25093814,
but the error below occurred when I started training the model. I do not understand what this error means.
Traceback (most recent call last):
File "/home/wrc/CBNet/tools/train_net.py", line 132, in <module>
main()
File "/home/wrc/CBNet/tools/train_net.py", line 114, in main
checkpoints = detectron.utils.train.train_model()
File "/home/wrc/CBNet/detectron/utils/train.py", line 67, in train_model
workspace.RunNet(model.net.Proto().name)
File "/home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/workspace.py", line 250, in RunNet
StringifyNetName(name), num_iter, allow_fail,
File "/home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/workspace.py", line 211, in CallWithExceptionIntercept
return func(*args, **kwargs)
RuntimeError: [enforce fail at pybind_state.h:425] . Exception encountered running PythonOp function: ValueError: min() arg is an empty sequence

At:
/home/wrc/CBNet/detectron/utils/segms.py(136): polys_to_boxes
/home/wrc/CBNet/detectron/roi_data/mask_rcnn.py(46): add_mask_rcnn_blobs
/home/wrc/CBNet/detectron/roi_data/cascade_rcnn.py(193): _sample_rois
/home/wrc/CBNet/detectron/roi_data/cascade_rcnn.py(105): add_cascade_rcnn_blobs
/home/wrc/CBNet/detectron/ops/distribute_cascade_proposals.py(61): forward

Error from operator:
input: "gpu_0/proposals_3" input: "gpu_0/roidb" input: "gpu_0/im_info" output: "gpu_0/rois_3" output: "gpu_0/labels_int32_3" output: "gpu_0/bbox_targets_3" output: "gpu_0/bbox_inside_weights_3" output: "gpu_0/bbox_outside_weights_3" output: "gpu_0/mapped_gt_boxes_3" output: "gpu_0/mask_rois" output: "gpu_0/roi_has_mask_int32" output: "gpu_0/masks_int32" output: "gpu_0/rois_3_fpn2" output: "gpu_0/rois_3_fpn3" output: "gpu_0/rois_3_fpn4" output: "gpu_0/rois_3_fpn5" output: "gpu_0/rois_3_idx_restore_int32" output: "gpu_0/mask_rois_fpn2" output: "gpu_0/mask_rois_fpn3" output: "gpu_0/mask_rois_fpn4" output: "gpu_0/mask_rois_fpn5" output: "gpu_0/mask_rois_idx_restore_int32" name: "DistributeCascadeProposalsOp:gpu_0/proposals_3,gpu_0/roidb,gpu_0/im_info" type: "Python" arg { name: "grad_input_indices" } arg { name: "token" s: "forward:11" } arg { name: "grad_output_indices" } device_option { device_type: 0 }
Error from operator:
input: "gpu_0/proposals_3" input: "gpu_0/roidb" input: "gpu_0/im_info" output: "gpu_0/rois_3" output: "gpu_0/labels_int32_3" output: "gpu_0/bbox_targets_3" output: "gpu_0/bbox_inside_weights_3" output: "gpu_0/bbox_outside_weights_3" output: "gpu_0/mapped_gt_boxes_3" output: "gpu_0/mask_rois" output: "gpu_0/roi_has_mask_int32" output: "gpu_0/masks_int32" output: "gpu_0/rois_3_fpn2" output: "gpu_0/rois_3_fpn3" output: "gpu_0/rois_3_fpn4" output: "gpu_0/rois_3_fpn5" output: "gpu_0/rois_3_idx_restore_int32" output: "gpu_0/mask_rois_fpn2" output: "gpu_0/mask_rois_fpn3" output: "gpu_0/mask_rois_fpn4" output: "gpu_0/mask_rois_fpn5" output: "gpu_0/mask_rois_idx_restore_int32" name: "DistributeCascadeProposalsOp:gpu_0/proposals_3,gpu_0/roidb,gpu_0/im_info" type: "Python" arg { name: "grad_input_indices" } arg { name: "token" s: "forward:11" } arg { name: "grad_output_indices" } device_option { device_type: 1 device_id: 0 }
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x59 (0x7f2dc790c409 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: + 0xa2b85 (0x7f2dc805ab85 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #2: + 0xa0fe7 (0x7f2dc8058fe7 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #3: + 0xea931 (0x7f2dc80a2931 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #4: + 0xe8ffd (0x7f2dc80a0ffd in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #5: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7f2da8e92b94 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #6: + 0x168f009 (0x7f2da8e99009 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #7: c10::ThreadPool::main_loop(unsigned long) + 0x2a3 (0x7f2dc79062f3 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #8: + 0xc8421 (0x7f2ddc0a1421 in /home/wrc/anaconda3/envs/py27/bin/../lib/libstdc++.so.6)
frame #9: + 0x76ba (0x7f2de87036ba in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #10: clone + 0x6d (0x7f2de7d2941d in /lib/x86_64-linux-gnu/libc.so.6)
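
The Python part of the traceback ends in polys_to_boxes, where min() receives an empty sequence. That usually points to at least one instance whose segmentation has no polygon coordinates. Below is a minimal sketch of a pre-check, assuming each instance's segmentation is a list of flat [x0, y0, x1, y1, ...] polygons (the COCO polygon convention). The function names are hypothetical and not part of the repository.

```python
def is_valid_polygon(poly):
    """A usable polygon has an even number of values and at least 3 points."""
    return len(poly) >= 6 and len(poly) % 2 == 0


def find_bad_instances(segms):
    """Indices of instances with no usable polygon at all."""
    return [
        i for i, polys in enumerate(segms)
        if not any(is_valid_polygon(p) for p in polys)
    ]


def instance_box(polys):
    """Bounding box [x0, y0, x1, y1] over an instance's usable polygons."""
    valid = [p for p in polys if is_valid_polygon(p)]
    if not valid:
        raise ValueError("instance has no usable polygon")
    xs = [x for p in valid for x in p[0::2]]
    ys = [y for p in valid for y in p[1::2]]
    return [min(xs), min(ys), max(xs), max(ys)]


if __name__ == "__main__":
    segms = [
        [[0, 0, 4, 0, 4, 3]],  # one triangle
        [],                    # no polygons at all
        [[]],                  # one empty polygon
    ]
    print(find_bad_instances(segms))
    print(instance_box(segms[0]))
```

Running such a check over the training annotations before training may show which entries need to be removed or repaired.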

Detectron ops lib not found

After compiling the official Detectron code, running python2 detectron/tests/test_spatial_narrow_as_op.py passes normally. After compiling your code, however, the same test raises an error:

Traceback (most recent call last):
  File "detectron/tests/test_spatial_narrow_as_op.py", line 88, in <module>
    c2_utils.import_detectron_ops()
  File "/home/wrc/CBNet/detectron/utils/c2.py", line 43, in import_detectron_ops
    detectron_ops_lib = envu.get_detectron_ops_lib()
  File "/home/wrc/CBNet/detectron/utils/env.py", line 71, in get_detectron_ops_lib
    ('Detectron ops lib not found; make sure that your Caffe2 '
AssertionError: Detectron ops lib not found; make sure that your Caffe2 version includes Detectron module

Does this code only work with a Caffe2 that was installed by building from source?
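
One way to check whether the installed Caffe2 ships the Detectron ops library is to search the Python path for it. This is a sketch, not the repository's env.py: the file-name pattern libcaffe2_detectron_ops*.so is taken from the library path printed in another report on this page, and the function name is hypothetical.

```python
import fnmatch
import os
import sys


def find_detectron_ops_libs(search_roots=None, pattern="libcaffe2_detectron_ops*.so"):
    """Return sorted paths of files under search_roots that match pattern."""
    roots = sys.path if search_roots is None else search_roots
    found = set()
    for root in roots:
        if not os.path.isdir(root):
            continue  # skips '' and zip entries on sys.path
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in fnmatch.filter(filenames, pattern):
                found.add(os.path.join(dirpath, name))
    return sorted(found)


if __name__ == "__main__":
    libs = find_detectron_ops_libs()
    print(libs if libs else "Detectron ops lib not found on sys.path")
```

If nothing is found, the installed Caffe2 build probably does not include the Detectron module, which is what the assertion message suggests.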

Sharing weights for CBNet

Hello, regarding the comparative experiment with shared parameters, how is parameter sharing implemented in the code?

About Composite connection

Hi, in your paper, the composite connection in AHLC includes a 1x1 conv layer and a BN layer. Is there no activation function? Is the activation function applied after combining with the lower-level feature?

Where is the BN layer before upsampling?

The paper says:

"the composite connection, which consists of a 1×1 convolutional layer and batch normalization layer to reduce the channels and an upsample operation"

But I can't find the BN before upsampling here:

image
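
For reference, the order of operations in the quoted sentence can be sketched as follows. This follows the paper's wording only and is not the repository's code. It uses plain Python lists so it runs without Caffe2, models BN in its inference form as a per-channel scale and bias, and assumes 2x nearest-neighbor upsampling followed by an element-wise sum with the lower-level feature. All function names are hypothetical.

```python
def conv1x1(x, weight):
    """1x1 convolution: x is C_in x H x W, weight is C_out x C_in."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [[sum(weight[o][c] * x[c][i][j] for c in range(c_in)) for j in range(w)]
         for i in range(h)]
        for o in range(len(weight))
    ]


def affine_bn(x, scale, bias):
    """Inference-time batch norm folded into a per-channel scale and bias."""
    return [[[scale[c] * v + bias[c] for v in row] for row in x[c]]
            for c in range(len(x))]


def upsample_nearest(x, factor=2):
    """Nearest-neighbor upsampling of each channel by an integer factor."""
    return [
        [[row[j // factor] for j in range(len(row) * factor)]
         for row in channel for _ in range(factor)]
        for channel in x
    ]


def composite_connection(high, weight, scale, bias, factor=2):
    """1x1 conv, then BN, then upsample, in the order the paper describes."""
    return upsample_nearest(affine_bn(conv1x1(high, weight), scale, bias), factor)


def add_features(a, b):
    """Element-wise sum of two feature maps with identical shapes."""
    return [[[u + v for u, v in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


if __name__ == "__main__":
    high = [[[1.0]], [[2.0]]]            # 2 channels, 1x1 spatial
    low = [[[1.0, 1.0], [1.0, 1.0]]]     # 1 channel, 2x2 spatial
    out = composite_connection(high, [[1.0, 1.0]], [2.0], [1.0])
    print(add_features(out, low))
```

Whether an activation follows the sum is left out here, since the quoted text does not say.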

Any plans for using CBNet in Detectron2?

PLEASE FOLLOW THESE INSTRUCTIONS BEFORE POSTING

  1. Please thoroughly read README.md, INSTALL.md, GETTING_STARTED.md, and FAQ.md
  2. Please search existing open and closed issues in case your issue has already been reported
  3. Please try to debug the issue in case you can solve it on your own before posting

After following steps 1-3 above and agreeing to provide the detailed information requested below, you may continue with posting your issue

(Delete this line and the text above it.)

Expected results

What did you expect to see?

Actual results

What did you observe instead?

Detailed steps to reproduce

E.g.:

The command that you ran

System information

  • Operating system: ?
  • Compiler version: ?
  • CUDA version: ?
  • cuDNN version: ?
  • NVIDIA driver version: ?
  • GPU models (for all devices if they are not all the same): ?
  • PYTHONPATH environment variable: ?
  • python --version output: ?
  • Anything else that seems relevant: ?

INFO net.py: 89: old_res5_2_branch2c_bn_b not found

I installed CBNet following the Detectron repo's installation tutorial.
However, when I run infer_simple.py, I get this message:
INFO net.py: 89: old_res5_2_branch2c_bn_b not found
The command I used:

python tools/infer_simple.py \
    --cfg configs/cascade_rcnn_baselines/e2e_mask_cascade_rcnn_X-152-32x8d-FPN-IN5k_1.44x.yaml \
    --output-dir /tmp/detectron-visualizations \
    --image-ext jpg \
    --wts https://dl.fbaipublicfiles.com/detectron/ImageNetPretrained/25093814/X-152-32x8d-IN5k.pkl \
    demo

What should I do to fix it?
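
A "not found" message of this kind is logged when a blob the network expects is missing from the weights file. The sketch below lists which expected blobs a weights file lacks. It assumes Python 3 and the Detectron convention that the pickle holds a dict of blob names to arrays, optionally nested under a 'blobs' key. The function names and the sample blob list are illustrative, not part of the repository.

```python
import pickle


def load_blobs(path):
    """Load a weights pickle and return its blob dict."""
    with open(path, "rb") as f:
        # encoding="latin1" lets Python 3 read pickles written by Python 2.
        data = pickle.load(f, encoding="latin1")
    return data["blobs"] if "blobs" in data else data


def missing_blobs(blobs, expected_names):
    """Return the expected blob names that are absent from the blob dict."""
    return sorted(name for name in expected_names if name not in blobs)


if __name__ == "__main__":
    # Toy stand-in for a loaded file; real files hold arrays, not lists.
    blobs = {"res5_2_branch2c_bn_b": [0.0], "res5_2_branch2c_bn_s": [1.0]}
    expected = ["res5_2_branch2c_bn_b", "old_res5_2_branch2c_bn_b"]
    print(missing_blobs(blobs, expected))
```

Note that the --wts argument above points to an ImageNet-pretrained backbone file, so a check like this may show that detection-specific blobs are absent from it.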

Are there plans to implement CBNet in the mmdetection toolbox?

RuntimeError: [enforce fail at operator.cc:75] blob != nullptr. op Conv: Encountered a non-existing input blob: gpu_0/old_res3_7_sum

I don't have 8 GPUs, so I changed NUM_GPUS to 2, and it raises this error. How can I fix it?

I use e2e_cascade_rcnn_X-101-64x4d-FPN_1x.yaml, changed as follows:
MODEL:
  TYPE: generalized_rcnn
  CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
  NUM_CLASSES: 21
  FASTER_RCNN: True
  CASCADE_ON: True
  CLS_AGNOSTIC_BBOX_REG: True  # default: False
NUM_GPUS: 2
SOLVER:
  WEIGHT_DECAY: 0.0001
  LR_POLICY: steps_with_decay
  BASE_LR: 0.01
  GAMMA: 0.1
  MAX_ITER: 180000
  STEPS: [0, 120000, 160000]
FPN:
  FPN_ON: True
  MULTILEVEL_ROIS: True
  MULTILEVEL_RPN: True
RESNETS:
  STRIDE_1X1: False  # default True for MSRA; False for C2 or Torch models
  TRANS_FUNC: bottleneck_transformation
  NUM_GROUPS: 64
  WIDTH_PER_GROUP: 4
FAST_RCNN:
  ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
  ROI_XFORM_METHOD: RoIAlign
  ROI_XFORM_RESOLUTION: 7
  ROI_XFORM_SAMPLING_RATIO: 2
CASCADE_RCNN:
  ROI_BOX_HEAD: cascade_rcnn_heads.add_roi_2mlp_head
  NUM_STAGE: 3
  TEST_STAGE: 3
  TEST_ENSEMBLE: True
TRAIN:
  WEIGHTS: https://dl.fbaipublicfiles.com/detectron/ImageNetPretrained/FBResNeXt/X-101-64x4d.pkl
  DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
  SCALES: (800,)
  MAX_SIZE: 1333
  IMS_PER_BATCH: 1
  BATCH_SIZE_PER_IM: 512
  RPN_PRE_NMS_TOP_N: 2000  # Per FPN level
TEST:
  DATASETS: ('coco_2014_valminusminival',)
  SCALE: 800
  MAX_SIZE: 1333
  NMS: 0.5
  RPN_PRE_NMS_TOP_N: 1000  # Per FPN level
  RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .

The error:

[W workspace.cc:170] Blob gpu_0/old_res3_7_sum not in the workspace.
WARNING workspace.py: 222: Original python traceback for operator 383 in network generalized_rcnn in exception above (most recent call last):
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/tools/train_net.py", line 133, in <module>
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/tools/train_net.py", line 115, in main
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/detectron/utils/train.py", line 53, in train_model
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/detectron/utils/train.py", line 145, in create_model
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/detectron/modeling/model_builder.py", line 127, in create
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/detectron/modeling/model_builder.py", line 91, in generalized_rcnn
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/detectron/modeling/model_builder.py", line 259, in build_generic_detection_model
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/detectron/modeling/optimizer.py", line 40, in build_data_parallel_model
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/detectron/modeling/optimizer.py", line 63, in _build_forward_graph
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/detectron/modeling/model_builder.py", line 189, in _single_gpu_build_func
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/detectron/modeling/FPN.py", line 64, in add_fpn_ResNet101_conv5_body
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/detectron/modeling/FPN.py", line 112, in add_fpn_onto_conv_body
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/detectron/modeling/ResNet.py", line 48, in add_ResNet101_conv5_body
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/detectron/modeling/ResNet.py", line 145, in add_ResNet_convX_body
Traceback (most recent call last):
File "/home/lzy/diverse/CBNet/tools/train_net.py", line 133, in <module>
main()
File "/home/lzy/diverse/CBNet/tools/train_net.py", line 115, in main
checkpoints = detectron.utils.train.train_model()
File "/home/lzy/diverse/CBNet/detectron/utils/train.py", line 58, in train_model
setup_model_for_training(model, weights_file, output_dir)
File "/home/lzy/diverse/CBNet/detectron/utils/train.py", line 179, in setup_model_for_training
workspace.CreateNet(model.net)
File "/home/lzy/pytorch/build/caffe2/python/workspace.py", line 181, in CreateNet
StringifyProto(net), overwrite,
File "/home/lzy/pytorch/build/caffe2/python/workspace.py", line 215, in CallWithExceptionIntercept
return func(*args, **kwargs)
RuntimeError: [enforce fail at operator.cc:75] blob != nullptr. op Conv: Encountered a non-existing input blob: gpu_0/old_res3_7_sum
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, void const*) + 0x76 (0x7f916475ed36 in /home/lzy/pytorch/build/lib/libc10.so)
frame #1: caffe2::OperatorBase::OperatorBase(caffe2::OperatorDef const&, caffe2::Workspace*) + 0x3ff (0x7f9144b7bd2f in /home/lzy/pytorch/build/lib/libtorch.so)
frame #2: + 0x3f68805 (0x7f914635b805 in /home/lzy/pytorch/build/lib/libtorch.so)
frame #3: + 0x3f868eb (0x7f91463798eb in /home/lzy/pytorch/build/lib/libtorch.so)
frame #4: + 0x3f8841e (0x7f914637b41e in /home/lzy/pytorch/build/lib/libtorch.so)
frame #5: std::_Function_handler<std::unique_ptr<caffe2::OperatorBase, std::default_deletecaffe2::OperatorBase > (caffe2::OperatorDef const&, caffe2::Workspace*), std::unique_ptr<caffe2::OperatorBase, std::default_deletecaffe2::OperatorBase > ()(caffe2::OperatorDef const&, caffe2::Workspace)>::_M_invoke(std::_Any_data const&, caffe2::OperatorDef const&, caffe2::Workspace*&&) + 0x23 (0x7f9164bf96a3 in /home/lzy/pytorch/build/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #6: + 0x2786301 (0x7f9144b79301 in /home/lzy/pytorch/build/lib/libtorch.so)
frame #7: caffe2::CreateOperator(caffe2::OperatorDef const&, caffe2::Workspace*, int) + 0x32a (0x7f9144b7a60a in /home/lzy/pytorch/build/lib/libtorch.so)
frame #8: caffe2::dag_utils::prepareOperatorNodes(std::shared_ptr<caffe2::NetDef const> const&, caffe2::Workspace*) + 0x17f3 (0x7f9144b74b93 in /home/lzy/pytorch/build/lib/libtorch.so)
frame #9: caffe2::AsyncNetBase::AsyncNetBase(std::shared_ptr<caffe2::NetDef const> const&, caffe2::Workspace*) + 0x246 (0x7f9144b8c026 in /home/lzy/pytorch/build/lib/libtorch.so)
frame #10: caffe2::AsyncSchedulingNet::AsyncSchedulingNet(std::shared_ptr<caffe2::NetDef const> const&, caffe2::Workspace*) + 0x9 (0x7f9144bb6989 in /home/lzy/pytorch/build/lib/libtorch.so)
frame #11: + 0x27c5e2e (0x7f9144bb8e2e in /home/lzy/pytorch/build/lib/libtorch.so)
frame #12: std::_Function_handler<std::unique_ptr<caffe2::NetBase, std::default_deletecaffe2::NetBase > (std::shared_ptr<caffe2::NetDef const> const&, caffe2::Workspace*), std::unique_ptr<caffe2::NetBase, std::default_deletecaffe2::NetBase > ()(std::shared_ptr<caffe2::NetDef const> const&, caffe2::Workspace)>::_M_invoke(std::_Any_data const&, std::shared_ptr<caffe2::NetDef const> const&, caffe2::Workspace*&&) + 0x23 (0x7f9144bb8ce3 in /home/lzy/pytorch/build/lib/libtorch.so)
frame #13: caffe2::CreateNet(std::shared_ptr<caffe2::NetDef const> const&, caffe2::Workspace*) + 0x847 (0x7f9144bc3117 in /home/lzy/pytorch/build/lib/libtorch.so)
frame #14: caffe2::Workspace::CreateNet(std::shared_ptr<caffe2::NetDef const> const&, bool) + 0x13c (0x7f9144bdf24c in /home/lzy/pytorch/build/lib/libtorch.so)
frame #15: caffe2::Workspace::CreateNet(caffe2::NetDef const&, bool) + 0x9f (0x7f9144be094f in /home/lzy/pytorch/build/lib/libtorch.so)
frame #16: + 0x51f70 (0x7f9164beef70 in /home/lzy/pytorch/build/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #17: + 0x521de (0x7f9164bef1de in /home/lzy/pytorch/build/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #18: + 0x99160 (0x7f9164c36160 in /home/lzy/pytorch/build/caffe2/python/caffe2_pybind11_state_gpu.so)

frame #36: __libc_start_main + 0xf0 (0x7f9168059830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #37: + 0x107f (0x55e423b0507f in /home/lzy/anaconda2/envs/lzy/bin/python)

Moreover, I can train the model on the original Detectron.

Cannot find the code for the composite structure

Hey, nice work!

Now I have some trouble understanding your code. I have read all your modifications and didn't find the code for the composite structure that connects two similar backbones. To be honest, I am not very familiar with Caffe or Caffe2, so this may be my own problem.

Could you point out the file I need to read for it?
Thanks in advance for your time.

HTTP Error 301: Moved Permanently

The old link seems to be dead:
INFO io.py: 67: Downloading remote file https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/25093814/X-152-32x8d-IN5k.pkl to /tmp/detectron-download-cache/ImageNetPretrained/25093814/X-152-32x8d-IN5k.pkl
Traceback (most recent call last):
File "/home/wrc/CBNet/tools/train_net.py", line 132, in <module>
main()
File "/home/wrc/CBNet/tools/train_net.py", line 101, in main
assert_and_infer_cfg()
File "/home/wrc/CBNet/detectron/core/config.py", line 1127, in assert_and_infer_cfg
cache_cfg_urls()
File "/home/wrc/CBNet/detectron/core/config.py", line 1136, in cache_cfg_urls
__C.TRAIN.WEIGHTS = cache_url(__C.TRAIN.WEIGHTS, __C.DOWNLOAD_CACHE)
File "/home/wrc/CBNet/detectron/utils/io.py", line 68, in cache_url
download_url(url, cache_file_path)
File "/home/wrc/CBNet/detectron/utils/io.py", line 114, in download_url
response = urllib2.urlopen(url)
File "/home/wrc/anaconda3/envs/py27/lib/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/home/wrc/anaconda3/envs/py27/lib/python2.7/urllib2.py", line 435, in open
response = meth(req, response)
File "/home/wrc/anaconda3/envs/py27/lib/python2.7/urllib2.py", line 548, in http_response
'http', request, response, code, msg, hdrs)
File "/home/wrc/anaconda3/envs/py27/lib/python2.7/urllib2.py", line 473, in error
return self._call_chain(*args)
File "/home/wrc/anaconda3/envs/py27/lib/python2.7/urllib2.py", line 407, in _call_chain
result = func(*args)
File "/home/wrc/anaconda3/envs/py27/lib/python2.7/urllib2.py", line 556, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 301: Moved Permanently

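I then downloaded the pkl weights from detectron2 directly, put them in the specified folder, and changed the config file to WEIGHTS: /home/wrc/CBNet/pretrained/X-152-32x8d-IN5k.pkl
I also changed the setting here to False: def assert_and_infer_cfg(cache_urls=True, make_immutable=True):
But the newly downloaded weights do not seem to match the network.
Running it produces errors.
First, many parameters are not found: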
INFO net.py: 173: 3_b not found
src_name is score_3_w
INFO net.py: 173: score_3_w not found
src_name is score_3_b
INFO net.py: 173: score_3_b not found
src_name is _pred_3_w
INFO net.py: 173: _pred_3_w not found
src_name is _pred_3_b
INFO net.py: 173: _pred_3_b not found
src_name is _w
Then a GPU error is reported:
[E net_async_base.cc:377] [enforce fail at context_gpu.cu:495] error == cudaSuccess. 2 vs 0. Error at: /opt/conda/conda-bld/pytorch_1556653000816/work/caffe2/core/context_gpu.cu:495: out of memory
Error from operator:
input: "gpu_0/res4_17_branch2c_bn" input: "gpu_0/res4_18_branch2a_w" input: "gpu_0/__m9_shared" output: "gpu_0/res4_18_branch2a_w_grad" output: "gpu_0/__m16_shared" name: "" type: "ConvGradient" arg { name: "no_bias" i: 1 } arg { name: "kernel" i: 1 } arg { name: "exhaustive_search" i: 0 } arg { name: "stride" i: 1 } arg { name: "pad" i: 0 } arg { name: "order" s: "NCHW" } arg { name: "dilation" i: 1 } device_option { device_type: 1 device_id: 0 } engine: "CUDNN" is_gradient_op: true
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x59 (0x7fb0ab76c409 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: + 0x281f180 (0x7fb06b638180 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #2: + 0x14db285 (0x7fb08cb45285 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #3: caffe2::empty(c10::ArrayRef, c10::TensorOptions) + 0x72 (0x7fb08cd38ae2 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #4: + 0x1465745 (0x7fb06a27e745 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #5: + 0x1468b75 (0x7fb06a281b75 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #6: + 0x1468e8a (0x7fb06a281e8a in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #7: bool caffe2::CudnnConvGradientOp::DoRunWithType<float, float, float, float, float, float, float>() + 0x2c5 (0x7fb06a295315 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #8: caffe2::CudnnConvGradientOp::RunOnDevice() + 0xb0 (0x7fb06a27c8c0 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #9: + 0x13cb0b5 (0x7fb06a1e40b5 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #10: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7fb08ccf2b94 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #11: + 0x168f009 (0x7fb08ccf9009 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #12: c10::ThreadPool::main_loop(unsigned long) + 0x2a3 (0x7fb0ab7662f3 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #13: + 0xc8421 (0x7fb0bff01421 in /home/wrc/anaconda3/envs/py27/bin/../lib/libstdc++.so.6)
frame #14: + 0x76ba (0x7fb0cc5636ba in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #15: clone + 0x6d (0x7fb0cbb8941d in /lib/x86_64-linux-gnu/libc.so.6)
, op ConvGradient
Is there a new working link for the weights, or could you share pretrained weights that can be used directly?

aws client error (PermanentRedirect)

Hi, I encountered this error when I ran sh train_cascade.sh:

tere3927@terence-ubuntu:~/code/python/pytorch/object_detection/cbnet$ sh train_cascade.sh
Found Detectron ops lib: /home/tere3927/.local/lib/python2.7/site-packages/torch/lib/libcaffe2_detectron_ops_gpu.so
[E init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
INFO train_net.py: 95: Called with args:
INFO train_net.py: 96: Namespace(cfg_file='configs/cascade_rcnn_baselines/e2e_mask_cascade_rcnn_dual-X-152-32x8d-FPN-IN5k_1.44x.yaml', multi_gpu_testing=True, opts=['OUTPUT_DIR', 'detectron-output'], skip_test=False)
configs/cascade_rcnn_baselines/e2e_mask_cascade_rcnn_dual-X-152-32x8d-FPN-IN5k_1.44x.yaml
INFO io.py: 67: Downloading remote file https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/25093814/X-152-32x8d-IN5k.pkl to /tmp/detectron-download-cache/ImageNetPretrained/25093814/X-152-32x8d-IN5k.pkl
Traceback (most recent call last):
File "tools/train_net.py", line 133, in <module>
main()
File "tools/train_net.py", line 102, in main
assert_and_infer_cfg()
File "/home/tere3927/code/python/pytorch/object_detection/cbnet/detectron/core/config.py", line 1127, in assert_and_infer_cfg
cache_cfg_urls()
File "/home/tere3927/code/python/pytorch/object_detection/cbnet/detectron/core/config.py", line 1136, in cache_cfg_urls
__C.TRAIN.WEIGHTS = cache_url(__C.TRAIN.WEIGHTS, __C.DOWNLOAD_CACHE)
File "/home/tere3927/code/python/pytorch/object_detection/cbnet/detectron/utils/io.py", line 68, in cache_url
download_url(url, cache_file_path)
File "/home/tere3927/code/python/pytorch/object_detection/cbnet/detectron/utils/io.py", line 114, in download_url
response = urllib2.urlopen(url)
File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 435, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 473, in error
return self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 556, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 301: Moved Permanently

What might the problem be?

Thank you.
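
Both 301 reports involve a weights URL on s3-us-west-2.amazonaws.com, while another report on this page fetches the same file path from dl.fbaipublicfiles.com. Below is a sketch of rewriting the old prefix before download, assuming the path layout is otherwise identical on the new host. The helper name is hypothetical, and the mapping should be verified against the actual download location.

```python
# Both prefixes appear in logs and commands quoted on this page.
OLD_PREFIX = "https://s3-us-west-2.amazonaws.com/detectron"
NEW_PREFIX = "https://dl.fbaipublicfiles.com/detectron"


def rewrite_weights_url(url, old=OLD_PREFIX, new=NEW_PREFIX):
    """Swap the old host prefix for the new one; leave other URLs untouched."""
    if url.startswith(old + "/"):
        return new + url[len(old):]
    return url


if __name__ == "__main__":
    old_url = OLD_PREFIX + "/ImageNetPretrained/25093814/X-152-32x8d-IN5k.pkl"
    print(rewrite_weights_url(old_url))
```

Alternatively, the WEIGHTS entry in the config file can be pointed at the new URL or at a local copy of the file.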
