vdigpku / cbnet_caffe Goto Github PK

Composite Backbone Network (AAAI20)

License: Apache License 2.0

CMake 3.79% Makefile 0.06% Python 95.21% MATLAB 0.24% C++ 0.36% Cuda 0.22% Dockerfile 0.10% Shell 0.02%

cbnet_caffe's Issues

gpu out of memory issue

What kind of GPU did you use to train this model? My GPU is an NVIDIA 1080 Ti. I was trying to train a model with the config 'e2e_mask_cascade_rcnn_dual-X-152-32x8d-FPN-IN5k_1.44x.yaml', but even with the batch size set to 1, training still cannot proceed.

[I net_async_base.h:206] Using specified CPU pool size: 16; device id: -1
[I net_async_base.h:211] Created new CPU pool, size: 16; device id: -1
[E net_async_base.cc:382] [enforce fail at context_gpu.cu:496] error == cudaSuccess. 2 vs 0. Error at: /pytorch/caffe2/core/context_gpu.cu:496: out of memory
Error from operator:
input: "gpu_0/res4_23_branch2c" input: "gpu_0/res4_23_branch2c_bn_s" input: "gpu_0/res4_23_branch2c_bn_b" output: "gpu_0/res4_23_branch2c_bn" name: "" type: "AffineChannel" device_option { device_type: 1 device_id: 0 }frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x47

Any tips for finetuning on private datasets?

Hi @PKUbahuangliuhe ,

Great work! As far as I know, CBNet is the state of the art on COCO object detection.
Do you have any plans to support fine-tuning your pretrained model on other datasets (with a different number of object classes)?
Any tips or reference code?

Many thanks!

RuntimeError: [enforce fail at pybind_state.h:425] . Exception encountered running PythonOp function: ValueError: min() arg is an empty sequence

I downloaded the pretrained model from 'https://dl.fbaipublicfiles.com/detectron' and put it in /tmp/detectron/ImageNetPretrained/25093814,
but an error occurs when I start training the model. I don't understand what this error means.
Traceback (most recent call last):
File "/home/wrc/CBNet/tools/train_net.py", line 132, in <module>
main()
File "/home/wrc/CBNet/tools/train_net.py", line 114, in main
checkpoints = detectron.utils.train.train_model()
File "/home/wrc/CBNet/detectron/utils/train.py", line 67, in train_model
workspace.RunNet(model.net.Proto().name)
File "/home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/workspace.py", line 250, in RunNet
StringifyNetName(name), num_iter, allow_fail,
File "/home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/workspace.py", line 211, in CallWithExceptionIntercept
return func(*args, **kwargs)
RuntimeError: [enforce fail at pybind_state.h:425] . Exception encountered running PythonOp function: ValueError: min() arg is an empty sequence

At:
/home/wrc/CBNet/detectron/utils/segms.py(136): polys_to_boxes
/home/wrc/CBNet/detectron/roi_data/mask_rcnn.py(46): add_mask_rcnn_blobs
/home/wrc/CBNet/detectron/roi_data/cascade_rcnn.py(193): _sample_rois
/home/wrc/CBNet/detectron/roi_data/cascade_rcnn.py(105): add_cascade_rcnn_blobs
/home/wrc/CBNet/detectron/ops/distribute_cascade_proposals.py(61): forward

Error from operator:
input: "gpu_0/proposals_3" input: "gpu_0/roidb" input: "gpu_0/im_info" output: "gpu_0/rois_3" output: "gpu_0/labels_int32_3" output: "gpu_0/bbox_targets_3" output: "gpu_0/bbox_inside_weights_3" output: "gpu_0/bbox_outside_weights_3" output: "gpu_0/mapped_gt_boxes_3" output: "gpu_0/mask_rois" output: "gpu_0/roi_has_mask_int32" output: "gpu_0/masks_int32" output: "gpu_0/rois_3_fpn2" output: "gpu_0/rois_3_fpn3" output: "gpu_0/rois_3_fpn4" output: "gpu_0/rois_3_fpn5" output: "gpu_0/rois_3_idx_restore_int32" output: "gpu_0/mask_rois_fpn2" output: "gpu_0/mask_rois_fpn3" output: "gpu_0/mask_rois_fpn4" output: "gpu_0/mask_rois_fpn5" output: "gpu_0/mask_rois_idx_restore_int32" name: "DistributeCascadeProposalsOp:gpu_0/proposals_3,gpu_0/roidb,gpu_0/im_info" type: "Python" arg { name: "grad_input_indices" } arg { name: "token" s: "forward:11" } arg { name: "grad_output_indices" } device_option { device_type: 0 }Error from operator:
input: "gpu_0/proposals_3" input: "gpu_0/roidb" input: "gpu_0/im_info" output: "gpu_0/rois_3" output: "gpu_0/labels_int32_3" output: "gpu_0/bbox_targets_3" output: "gpu_0/bbox_inside_weights_3" output: "gpu_0/bbox_outside_weights_3" output: "gpu_0/mapped_gt_boxes_3" output: "gpu_0/mask_rois" output: "gpu_0/roi_has_mask_int32" output: "gpu_0/masks_int32" output: "gpu_0/rois_3_fpn2" output: "gpu_0/rois_3_fpn3" output: "gpu_0/rois_3_fpn4" output: "gpu_0/rois_3_fpn5" output: "gpu_0/rois_3_idx_restore_int32" output: "gpu_0/mask_rois_fpn2" output: "gpu_0/mask_rois_fpn3" output: "gpu_0/mask_rois_fpn4" output: "gpu_0/mask_rois_fpn5" output: "gpu_0/mask_rois_idx_restore_int32" name: "DistributeCascadeProposalsOp:gpu_0/proposals_3,gpu_0/roidb,gpu_0/im_info" type: "Python" arg { name: "grad_input_indices" } arg { name: "token" s: "forward:11" } arg { name: "grad_output_indices" } device_option { device_type: 1 device_id: 0 }frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x59 (0x7f2dc790c409 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: + 0xa2b85 (0x7f2dc805ab85 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #2: + 0xa0fe7 (0x7f2dc8058fe7 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #3: + 0xea931 (0x7f2dc80a2931 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #4: + 0xe8ffd (0x7f2dc80a0ffd in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #5: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7f2da8e92b94 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #6: + 0x168f009 (0x7f2da8e99009 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #7: c10::ThreadPool::main_loop(unsigned long) + 0x2a3 (0x7f2dc79062f3 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #8: + 0xc8421 (0x7f2ddc0a1421 in /home/wrc/anaconda3/envs/py27/bin/../lib/libstdc++.so.6)
frame #9: + 0x76ba (0x7f2de87036ba in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #10: clone + 0x6d (0x7f2de7d2941d in /lib/x86_64-linux-gnu/libc.so.6)
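
The traceback above ends in `polys_to_boxes`, where `min()` is applied to polygon coordinates, so one plausible cause is an annotation whose polygon list is empty. The following is a minimal sketch for spotting such annotations, assuming a COCO-style dict in which each annotation's `segmentation` is a list of flat `[x0, y0, x1, y1, ...]` polygons. The helper name `find_empty_polygon_annotations` is hypothetical and not part of Detectron or CBNet, and treating polygons with fewer than 3 points as unusable is also an assumption.

```python
def find_empty_polygon_annotations(coco):
    """Return ids of annotations that have no usable polygon."""
    bad_ids = []
    for ann in coco.get("annotations", []):
        segm = ann.get("segmentation")
        if isinstance(segm, dict):
            # RLE masks are dicts, not polygon lists; skip them here.
            continue
        # Assumption: a usable polygon has at least 3 points (6 coordinates).
        usable = [p for p in (segm or []) if len(p) >= 6]
        if not usable:
            bad_ids.append(ann["id"])
    return bad_ids


# Tiny hand-made example: annotations 2 and 3 have no usable polygon.
sample = {
    "annotations": [
        {"id": 1, "segmentation": [[0, 0, 10, 0, 10, 10]]},
        {"id": 2, "segmentation": []},
        {"id": 3, "segmentation": [[]]},
    ]
}
print(find_empty_polygon_annotations(sample))
```

Running this over a custom dataset's annotation file (loaded with `json.load`) before training would show whether any instances need to be dropped or repaired.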

Pre-training model URL is invalid, error urllib2.HTTPError : HTTP Error: 301: Moved Permanently.

I can't find the URL by using a browser.
https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/FBResNeXt/X-101-64x4d.pkl
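
Other issues on this page reference the same weight paths under `https://dl.fbaipublicfiles.com/detectron`, so one possible workaround is to rewrite the old S3 prefix before downloading. This is only a sketch: the helper `rewrite_weights_url` is hypothetical, and whether every file is available under the newer host is an assumption.

```python
# Prefixes taken from the URLs quoted in the issues on this page.
OLD_PREFIX = "https://s3-us-west-2.amazonaws.com/detectron"
NEW_PREFIX = "https://dl.fbaipublicfiles.com/detectron"


def rewrite_weights_url(url):
    """Swap the old S3 prefix for the newer host; leave anything else untouched."""
    if url.startswith(OLD_PREFIX):
        return NEW_PREFIX + url[len(OLD_PREFIX):]
    return url


old = "https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/FBResNeXt/X-101-64x4d.pkl"
print(rewrite_weights_url(old))
```

The same substitution can be made by hand in the `WEIGHTS:` entry of a config file; local paths pass through unchanged.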

how to use the CBNet

Hi, I want to use CBNet in my own network, but I don't know how to use it.

Detectron ops lib not found

After building the official Detectron code, running python2 detectron/tests/test_spatial_narrow_as_op.py passes. After building your code, however, the same test fails with:

Traceback (most recent call last):
  File "detectron/tests/test_spatial_narrow_as_op.py", line 88, in <module>
    c2_utils.import_detectron_ops()
  File "/home/wrc/CBNet/detectron/utils/c2.py", line 43, in import_detectron_ops
    detectron_ops_lib = envu.get_detectron_ops_lib()
  File "/home/wrc/CBNet/detectron/utils/env.py", line 71, in get_detectron_ops_lib
    ('Detectron ops lib not found; make sure that your Caffe2 '
AssertionError: Detectron ops lib not found; make sure that your Caffe2 version includes Detectron module

Does this code only work with a Caffe2 that was installed by building from source?
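
A quick way to check whether the installed Caffe2 ships the Detectron operators at all is to look for the shared library on disk. The sketch below assumes the library is named `libcaffe2_detectron_ops_gpu.so` (the name that appears in a log in another issue on this page) and sits in a `lib` or `torch/lib` subdirectory of a Python search path; the function `find_detectron_ops_libs` is hypothetical and is not the repository's own lookup code.

```python
import os
import sys

# Assumed file name, taken from a "Found Detectron ops lib" log line on this page.
LIB_NAME = "libcaffe2_detectron_ops_gpu.so"


def find_detectron_ops_libs(roots=None, name=LIB_NAME):
    """Return paths to the ops library found under <root>/lib or <root>/torch/lib."""
    roots = [sys.prefix] + sys.path if roots is None else roots
    found = []
    for root in roots:
        for subdir in ("lib", os.path.join("torch", "lib")):
            candidate = os.path.join(root, subdir, name)
            if os.path.isfile(candidate) and candidate not in found:
                found.append(candidate)
    return found


print(find_detectron_ops_libs() or "no Detectron ops library found")
```

If nothing is found, the installed Caffe2 build probably does not include the Detectron module, which would match the assertion message above.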

License issues/conflicts!

There is a license file at https://github.com/PKUbahuangliuhe/CBNet/blob/master/LICENSE, but the README says "Our code is free for research, but needs authorization for commerce."

As I understand it, Apache 2.0 permits commercial use. What does "needs authorization for commerce" mean?

Thanks

Sharing weights for CBNet

Hello, regarding the comparative experiment with shared parameters: how is parameter sharing implemented in the code?

About Composite connection

Hi, in your paper's description of AHLC, the composite connection includes a 1x1 conv layer and a BN layer. Is there no activation function? Or is the activation applied after combining with the lower-level feature?

Where is the BN layer before upsampling?

The paper said

the composite connection, which consists of a 1×1 convolutional layer and batch normalization
layer to reduce the channels and an upsample operation

But I can't find BN before upsampling here
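
For reference, the three operations named in the quoted sentence (1×1 convolution, batch normalization, upsampling) can be sketched in NumPy as follows. This is an illustration of the description only, not the repository's Caffe2 code: the function name `composite_connection`, the use of nearest-neighbour upsampling with a factor of 2, and the treatment of batch normalization as a fixed per-channel scale and bias (its inference-time form) are all assumptions.

```python
import numpy as np


def composite_connection(x, conv_w, bn_scale, bn_bias, scale=2):
    """x: (C_in, H, W); conv_w: (C_out, C_in); bn_scale, bn_bias: (C_out,)."""
    # 1x1 convolution: a per-pixel linear map over channels (reduces C_in to C_out).
    y = np.einsum("oc,chw->ohw", conv_w, x)
    # Batch norm at inference time reduces to a per-channel affine transform.
    y = y * bn_scale[:, None, None] + bn_bias[:, None, None]
    # Nearest-neighbour upsampling by an integer factor (assumed).
    return y.repeat(scale, axis=1).repeat(scale, axis=2)


x = np.ones((4, 2, 2), dtype=np.float32)
w = np.full((2, 4), 0.5, dtype=np.float32)
out = composite_connection(x, w, np.ones(2, np.float32), np.zeros(2, np.float32))
print(out.shape)
```

The sketch stops before the output is combined with the other backbone's feature, since where any activation is applied is exactly what the questions above ask.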

How to set up to use CBNet. If I use the added cascade rcnn model by default to train model, will it use CBNet?

As shown in the title

Could you please release the trained model weights file?

any plans for using the CBNet in Detectron2?

INFO net.py: 89: old_res5_2_branch2c_bn_b not found

I installed CBNet by following the Detectron installation tutorial.
However, when I run infer_simple.py, I get this message:
INFO net.py: 89: old_res5_2_branch2c_bn_b not found
The command I used:

python tools/infer_simple.py \
    --cfg configs/cascade_rcnn_baselines/e2e_mask_cascade_rcnn_X-152-32x8d-FPN-IN5k_1.44x.yaml \
    --output-dir /tmp/detectron-visualizations \
    --image-ext jpg \
    --wts https://dl.fbaipublicfiles.com/detectron/ImageNetPretrained/25093814/X-152-32x8d-IN5k.pkl \
    demo

What should I do to fix it?
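
When a weights file and a network definition disagree, it can help to collect every blob reported as missing and see whether the names share a pattern (for example a common prefix such as `old_`). The sketch below parses log lines of the form shown above; the helper names `missing_blobs` and `count_by_prefix` are hypothetical, and the grouping rule (text before the first underscore) is just one convenient assumption.

```python
import re
from collections import Counter

# Matches lines such as "INFO net.py: 89: old_res5_2_branch2c_bn_b not found".
NOT_FOUND = re.compile(r"INFO net\.py:\s*\d+:\s*(\S+) not found")


def missing_blobs(log_text):
    """Return blob names reported as 'not found', in order of appearance."""
    return NOT_FOUND.findall(log_text)


def count_by_prefix(names):
    """Count missing blobs by the token before the first underscore."""
    return Counter(name.split("_", 1)[0] for name in names)


log = """INFO net.py: 89: old_res5_2_branch2c_bn_b not found
INFO net.py: 173: score_3_w not found
INFO net.py: 173: score_3_b not found"""
names = missing_blobs(log)
print(names)
print(count_by_prefix(names))
```

Feeding the full training or inference log through these helpers gives a compact list to compare against the blob names stored in the weights file.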

Are there plans to implement CBNet in the mmdetection toolbox?

RuntimeError: [enforce fail at operator.cc:75] blob != nullptr. op Conv: Encountered a non-existing input blob: gpu_0/old_res3_7_sum

I don't have 8 GPUs, so I changed NUM_GPUS to 2, and it raises this error. How can I fix it?

I use e2e_cascade_rcnn_X-101-64x4d-FPN_1x.yaml, modified as follows:
MODEL:
  TYPE: generalized_rcnn
  CONV_BODY: FPN.add_fpn_ResNet101_conv5_body
  NUM_CLASSES: 21
  FASTER_RCNN: True
  CASCADE_ON: True
  CLS_AGNOSTIC_BBOX_REG: True  # default: False
NUM_GPUS: 2
SOLVER:
  WEIGHT_DECAY: 0.0001
  LR_POLICY: steps_with_decay
  BASE_LR: 0.01
  GAMMA: 0.1
  MAX_ITER: 180000
  STEPS: [0, 120000, 160000]
FPN:
  FPN_ON: True
  MULTILEVEL_ROIS: True
  MULTILEVEL_RPN: True
RESNETS:
  STRIDE_1X1: False  # default True for MSRA; False for C2 or Torch models
  TRANS_FUNC: bottleneck_transformation
  NUM_GROUPS: 64
  WIDTH_PER_GROUP: 4
FAST_RCNN:
  ROI_BOX_HEAD: fast_rcnn_heads.add_roi_2mlp_head
  ROI_XFORM_METHOD: RoIAlign
  ROI_XFORM_RESOLUTION: 7
  ROI_XFORM_SAMPLING_RATIO: 2
CASCADE_RCNN:
  ROI_BOX_HEAD: cascade_rcnn_heads.add_roi_2mlp_head
  NUM_STAGE: 3
  TEST_STAGE: 3
  TEST_ENSEMBLE: True
TRAIN:
  WEIGHTS: https://dl.fbaipublicfiles.com/detectron/ImageNetPretrained/FBResNeXt/X-101-64x4d.pkl
  DATASETS: ('coco_2014_train', 'coco_2014_valminusminival')
  SCALES: (800,)
  MAX_SIZE: 1333
  IMS_PER_BATCH: 1
  BATCH_SIZE_PER_IM: 512
  RPN_PRE_NMS_TOP_N: 2000  # Per FPN level
TEST:
  DATASETS: ('coco_2014_valminusminival',)
  SCALE: 800
  MAX_SIZE: 1333
  NMS: 0.5
  RPN_PRE_NMS_TOP_N: 1000  # Per FPN level
  RPN_POST_NMS_TOP_N: 1000
OUTPUT_DIR: .

the error:

[W workspace.cc:170] Blob gpu_0/old_res3_7_sum not in the workspace.
WARNING workspace.py: 222: Original python traceback for operator 383 in network generalized_rcnn in exception above (most recent call last):
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/tools/train_net.py", line 133, in <module>
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/tools/train_net.py", line 115, in main
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/detectron/utils/train.py", line 53, in train_model
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/detectron/utils/train.py", line 145, in create_model
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/detectron/modeling/model_builder.py", line 127, in create
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/detectron/modeling/model_builder.py", line 91, in generalized_rcnn
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/detectron/modeling/model_builder.py", line 259, in build_generic_detection_model
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/detectron/modeling/optimizer.py", line 40, in build_data_parallel_model
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/detectron/modeling/optimizer.py", line 63, in _build_forward_graph
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/detectron/modeling/model_builder.py", line 189, in _single_gpu_build_func
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/detectron/modeling/FPN.py", line 64, in add_fpn_ResNet101_conv5_body
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/detectron/modeling/FPN.py", line 112, in add_fpn_onto_conv_body
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/detectron/modeling/ResNet.py", line 48, in add_ResNet101_conv5_body
WARNING workspace.py: 227: File "/home/lzy/diverse/CBNet/detectron/modeling/ResNet.py", line 145, in add_ResNet_convX_body
Traceback (most recent call last):
File "/home/lzy/diverse/CBNet/tools/train_net.py", line 133, in <module>
main()
File "/home/lzy/diverse/CBNet/tools/train_net.py", line 115, in main
checkpoints = detectron.utils.train.train_model()
File "/home/lzy/diverse/CBNet/detectron/utils/train.py", line 58, in train_model
setup_model_for_training(model, weights_file, output_dir)
File "/home/lzy/diverse/CBNet/detectron/utils/train.py", line 179, in setup_model_for_training
workspace.CreateNet(model.net)
File "/home/lzy/pytorch/build/caffe2/python/workspace.py", line 181, in CreateNet
StringifyProto(net), overwrite,
File "/home/lzy/pytorch/build/caffe2/python/workspace.py", line 215, in CallWithExceptionIntercept
return func(*args, **kwargs)
RuntimeError: [enforce fail at operator.cc:75] blob != nullptr. op Conv: Encountered a non-existing input blob: gpu_0/old_res3_7_sum
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void const*) + 0x76 (0x7f916475ed36 in /home/lzy/pytorch/build/lib/libc10.so)
frame #1: caffe2::OperatorBase::OperatorBase(caffe2::OperatorDef const&, caffe2::Workspace*) + 0x3ff (0x7f9144b7bd2f in /home/lzy/pytorch/build/lib/libtorch.so)
frame #2: + 0x3f68805 (0x7f914635b805 in /home/lzy/pytorch/build/lib/libtorch.so)
frame #3: + 0x3f868eb (0x7f91463798eb in /home/lzy/pytorch/build/lib/libtorch.so)
frame #4: + 0x3f8841e (0x7f914637b41e in /home/lzy/pytorch/build/lib/libtorch.so)
frame #5: std::_Function_handler<std::unique_ptr<caffe2::OperatorBase, std::default_deletecaffe2::OperatorBase > (caffe2::OperatorDef const&, caffe2::Workspace*), std::unique_ptr<caffe2::OperatorBase, std::default_deletecaffe2::OperatorBase > ()(caffe2::OperatorDef const&, caffe2::Workspace)>::_M_invoke(std::_Any_data const&, caffe2::OperatorDef const&, caffe2::Workspace*&&) + 0x23 (0x7f9164bf96a3 in /home/lzy/pytorch/build/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #6: + 0x2786301 (0x7f9144b79301 in /home/lzy/pytorch/build/lib/libtorch.so)
frame #7: caffe2::CreateOperator(caffe2::OperatorDef const&, caffe2::Workspace*, int) + 0x32a (0x7f9144b7a60a in /home/lzy/pytorch/build/lib/libtorch.so)
frame #8: caffe2::dag_utils::prepareOperatorNodes(std::shared_ptr<caffe2::NetDef const> const&, caffe2::Workspace*) + 0x17f3 (0x7f9144b74b93 in /home/lzy/pytorch/build/lib/libtorch.so)
frame #9: caffe2::AsyncNetBase::AsyncNetBase(std::shared_ptr<caffe2::NetDef const> const&, caffe2::Workspace*) + 0x246 (0x7f9144b8c026 in /home/lzy/pytorch/build/lib/libtorch.so)
frame #10: caffe2::AsyncSchedulingNet::AsyncSchedulingNet(std::shared_ptr<caffe2::NetDef const> const&, caffe2::Workspace*) + 0x9 (0x7f9144bb6989 in /home/lzy/pytorch/build/lib/libtorch.so)
frame #11: + 0x27c5e2e (0x7f9144bb8e2e in /home/lzy/pytorch/build/lib/libtorch.so)
frame #12: std::_Function_handler<std::unique_ptr<caffe2::NetBase, std::default_deletecaffe2::NetBase > (std::shared_ptr<caffe2::NetDef const> const&, caffe2::Workspace*), std::unique_ptr<caffe2::NetBase, std::default_deletecaffe2::NetBase > ()(std::shared_ptr<caffe2::NetDef const> const&, caffe2::Workspace)>::_M_invoke(std::_Any_data const&, std::shared_ptr<caffe2::NetDef const> const&, caffe2::Workspace*&&) + 0x23 (0x7f9144bb8ce3 in /home/lzy/pytorch/build/lib/libtorch.so)
frame #13: caffe2::CreateNet(std::shared_ptr<caffe2::NetDef const> const&, caffe2::Workspace*) + 0x847 (0x7f9144bc3117 in /home/lzy/pytorch/build/lib/libtorch.so)
frame #14: caffe2::Workspace::CreateNet(std::shared_ptr<caffe2::NetDef const> const&, bool) + 0x13c (0x7f9144bdf24c in /home/lzy/pytorch/build/lib/libtorch.so)
frame #15: caffe2::Workspace::CreateNet(caffe2::NetDef const&, bool) + 0x9f (0x7f9144be094f in /home/lzy/pytorch/build/lib/libtorch.so)
frame #16: + 0x51f70 (0x7f9164beef70 in /home/lzy/pytorch/build/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #17: + 0x521de (0x7f9164bef1de in /home/lzy/pytorch/build/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #18: + 0x99160 (0x7f9164c36160 in /home/lzy/pytorch/build/caffe2/python/caffe2_pybind11_state_gpu.so)

frame #36: __libc_start_main + 0xf0 (0x7f9168059830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #37: + 0x107f (0x55e423b0507f in /home/lzy/anaconda2/envs/lzy/bin/python)

What's more, I can train model on the original detectron.
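
Separately from the missing-blob error, going from 8 GPUs to 2 shrinks the effective batch size, and a common adjustment is the linear scaling rule: divide the learning rate and multiply the iteration counts by the same factor. The sketch below applies that rule to the solver values posted above, assuming they were tuned for 8 GPUs and that IMS_PER_BATCH stays unchanged. The helper `scale_schedule` is hypothetical, and this adjustment is not claimed to resolve the error itself.

```python
def scale_schedule(base_lr, max_iter, steps, old_gpus, new_gpus):
    """Scale the learning rate down and the iteration counts up by old_gpus / new_gpus."""
    factor = float(old_gpus) / new_gpus
    return {
        "BASE_LR": base_lr / factor,
        "MAX_ITER": int(round(max_iter * factor)),
        "STEPS": [int(round(s * factor)) for s in steps],
    }


# Solver values from the posted config, assumed to target 8 GPUs.
scaled = scale_schedule(0.01, 180000, [0, 120000, 160000], old_gpus=8, new_gpus=2)
print(scaled)
```

The returned values would replace BASE_LR, MAX_ITER, and STEPS under SOLVER in the config.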

Could you share the Dual + Triple configs and models? Thanks!

Could you share the Dual model, as well as the Triple config and model? Thanks!

Can not find the codes for composite structure

Hey, nice work!

I am having some trouble understanding your code. I have read all your modifications and could not find the code for the composite structure that connects two similar backbones. TBH, I am not very familiar with Caffe or Caffe2, so this may be my own issue.

Could you point out the file I need to read for it?
Thanks ahead for your time.

Excuse me, do we need to train the Assistant Backbones in advance?

Models trained on COCO

Do you plan to share the models trained on COCO?

HTTP Error 301: Moved Permanently

The old link seems to be dead.
INFO io.py: 67: Downloading remote file https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/25093814/X-152-32x8d-IN5k.pkl to /tmp/detectron-download-cache/ImageNetPretrained/25093814/X-152-32x8d-IN5k.pkl
Traceback (most recent call last):
File "/home/wrc/CBNet/tools/train_net.py", line 132, in <module>
main()
File "/home/wrc/CBNet/tools/train_net.py", line 101, in main
assert_and_infer_cfg()
File "/home/wrc/CBNet/detectron/core/config.py", line 1127, in assert_and_infer_cfg
cache_cfg_urls()
File "/home/wrc/CBNet/detectron/core/config.py", line 1136, in cache_cfg_urls
__C.TRAIN.WEIGHTS = cache_url(__C.TRAIN.WEIGHTS, __C.DOWNLOAD_CACHE)
File "/home/wrc/CBNet/detectron/utils/io.py", line 68, in cache_url
download_url(url, cache_file_path)
File "/home/wrc/CBNet/detectron/utils/io.py", line 114, in download_url
response = urllib2.urlopen(url)
File "/home/wrc/anaconda3/envs/py27/lib/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/home/wrc/anaconda3/envs/py27/lib/python2.7/urllib2.py", line 435, in open
response = meth(req, response)
File "/home/wrc/anaconda3/envs/py27/lib/python2.7/urllib2.py", line 548, in http_response
'http', request, response, code, msg, hdrs)
File "/home/wrc/anaconda3/envs/py27/lib/python2.7/urllib2.py", line 473, in error
return self._call_chain(*args)
File "/home/wrc/anaconda3/envs/py27/lib/python2.7/urllib2.py", line 407, in _call_chain
result = func(*args)
File "/home/wrc/anaconda3/envs/py27/lib/python2.7/urllib2.py", line 556, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 301: Moved Permanently

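I then downloaded the Detectron2 pkl weights directly, put them in the expected folder, and changed WEIGHTS in the config file to /home/wrc/CBNet/pretrained/X-152-32x8d-IN5k.pkl
At the same time, I set the flag to False in def assert_and_infer_cfg(cache_urls=True, make_immutable=True):
But the newly downloaded weights do not seem to match the network.
Running it produces errors.
First, many parameters are not found: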
INFO net.py: 173: 3_b not found
src_name is score_3_w
INFO net.py: 173: score_3_w not found
src_name is score_3_b
INFO net.py: 173: score_3_b not found
src_name is _pred_3_w
INFO net.py: 173: _pred_3_w not found
src_name is _pred_3_b
INFO net.py: 173: _pred_3_b not found
src_name is _w
Then a GPU error is reported:
[E net_async_base.cc:377] [enforce fail at context_gpu.cu:495] error == cudaSuccess. 2 vs 0. Error at: /opt/conda/conda-bld/pytorch_1556653000816/work/caffe2/core/context_gpu.cu:495: out of memory
Error from operator:
input: "gpu_0/res4_17_branch2c_bn" input: "gpu_0/res4_18_branch2a_w" input: "gpu_0/__m9_shared" output: "gpu_0/res4_18_branch2a_w_grad" output: "gpu_0/__m16_shared" name: "" type: "ConvGradient" arg { name: "no_bias" i: 1 } arg { name: "kernel" i: 1 } arg { name: "exhaustive_search" i: 0 } arg { name: "stride" i: 1 } arg { name: "pad" i: 0 } arg { name: "order" s: "NCHW" } arg { name: "dilation" i: 1 } device_option { device_type: 1 device_id: 0 } engine: "CUDNN" is_gradient_op: trueframe #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x59 (0x7fb0ab76c409 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: + 0x281f180 (0x7fb06b638180 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #2: + 0x14db285 (0x7fb08cb45285 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #3: caffe2::empty(c10::ArrayRef, c10::TensorOptions) + 0x72 (0x7fb08cd38ae2 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #4: + 0x1465745 (0x7fb06a27e745 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #5: + 0x1468b75 (0x7fb06a281b75 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #6: + 0x1468e8a (0x7fb06a281e8a in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #7: bool caffe2::CudnnConvGradientOp::DoRunWithType<float, float, float, float, float, float, float>() + 0x2c5 (0x7fb06a295315 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #8: caffe2::CudnnConvGradientOp::RunOnDevice() + 0xb0 (0x7fb06a27c8c0 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #9: + 0x13cb0b5 (0x7fb06a1e40b5 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #10: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7fb08ccf2b94 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #11: + 0x168f009 (0x7fb08ccf9009 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #12: c10::ThreadPool::main_loop(unsigned long) + 0x2a3 (0x7fb0ab7662f3 in /home/wrc/anaconda3/envs/py27/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #13: + 0xc8421 (0x7fb0bff01421 in /home/wrc/anaconda3/envs/py27/bin/../lib/libstdc++.so.6)
frame #14: + 0x76ba (0x7fb0cc5636ba in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #15: clone + 0x6d (0x7fb0cbb8941d in /lib/x86_64-linux-gnu/libc.so.6)
, op ConvGradient
Is there a new, working link for the weights, or could you share pretrained weights that can be used directly?

aws client error (PermanentRedirect)

Hi, I encountered this error when running sh train_cascade.sh:

tere3927@terence-ubuntu:~/code/python/pytorch/object_detection/cbnet$ sh train_cascade.sh
Found Detectron ops lib: /home/tere3927/.local/lib/python2.7/site-packages/torch/lib/libcaffe2_detectron_ops_gpu.so
[E init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
INFO train_net.py: 95: Called with args:
INFO train_net.py: 96: Namespace(cfg_file='configs/cascade_rcnn_baselines/e2e_mask_cascade_rcnn_dual-X-152-32x8d-FPN-IN5k_1.44x.yaml', multi_gpu_testing=True, opts=['OUTPUT_DIR', 'detectron-output'], skip_test=False)
configs/cascade_rcnn_baselines/e2e_mask_cascade_rcnn_dual-X-152-32x8d-FPN-IN5k_1.44x.yaml
INFO io.py: 67: Downloading remote file https://s3-us-west-2.amazonaws.com/detectron/ImageNetPretrained/25093814/X-152-32x8d-IN5k.pkl to /tmp/detectron-download-cache/ImageNetPretrained/25093814/X-152-32x8d-IN5k.pkl
Traceback (most recent call last):
File "tools/train_net.py", line 133, in <module>
main()
File "tools/train_net.py", line 102, in main
assert_and_infer_cfg()
File "/home/tere3927/code/python/pytorch/object_detection/cbnet/detectron/core/config.py", line 1127, in assert_and_infer_cfg
cache_cfg_urls()
File "/home/tere3927/code/python/pytorch/object_detection/cbnet/detectron/core/config.py", line 1136, in cache_cfg_urls
__C.TRAIN.WEIGHTS = cache_url(__C.TRAIN.WEIGHTS, __C.DOWNLOAD_CACHE)
File "/home/tere3927/code/python/pytorch/object_detection/cbnet/detectron/utils/io.py", line 68, in cache_url
download_url(url, cache_file_path)
File "/home/tere3927/code/python/pytorch/object_detection/cbnet/detectron/utils/io.py", line 114, in download_url
response = urllib2.urlopen(url)
File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 435, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 473, in error
return self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 556, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 301: Moved Permanently

What might the problem be?

Thank you.
