
vovnet-detectron2's Introduction

👋 Hi there! I'm Youngwan, a senior researcher at ETRI and a Ph.D. student in the Graduate School of AI at KAIST, where I'm advised by Prof. Sung Ju Hwang in the Machine Learning and Artificial Intelligence (MLAI) lab.

My research interest is how computers understand the world, including efficient 2D/3D neural network design, object detection, instance segmentation, semantic segmentation, and video classification. 🖥️🌍

Representative publications and code

See Google Scholar for the full list.

  • RC-MAE: Exploring the Role of Mean Teachers in Self-supervised Masked Auto-Encoders, ICLR 2023.
  • MPViT: Multi-Path Vision Transformer for Dense Prediction, CVPR 2022.
  • CenterMask: Real-Time Anchor-Free Instance Segmentation, CVPR 2020.
  • 2D convolutional neural network: VoVNet
  • 3D convolutional neural network: VoV3D

About me

  • ๐Ÿ“ I enjoy teaching talking what I know learn, so I am giving lectures on AI as an AI Facilitator at ETRI AI Academy.
  • ๐ŸŒ๐ŸŒฑ๐ŸŒฒ๐ŸŒŠ โ›ฐ๏ธ I love to appreciate the beautiful nature.
  • 🎾🏀 I enjoy playing tennis and basketball.
  • 📫 How to reach me: [email protected] | [email protected]

💪 Skills

Platforms & Languages

Python PyTorch TensorFlow Java Android


vovnet-detectron2's Issues

KeyError: 'Non-existent config key: MODEL.VOVNET'

I got the following error:

WARNING [02/08 15:44:32 d2.config.compat]: Config '/home/detectron2/vovnet-detectron2/configs/faster_rcnn_V_99_FPN_3x.yaml' has no VERSION. Assuming it to be compatible with latest v2.
Traceback (most recent call last):
  File "vovnet-detectron2/custom_vovnet_train.py", line 75, in <module>
    cfg = prepareConfig()
  File "vovnet-detectron2/custom_vovnet_train.py", line 69, in prepareConfig
    cfg.merge_from_file(config_file)
  File "/mnt/Data_common/PPE_Violation_Detection_Samjith/MPC_model/detectron2/detectron2/config/config.py", line 45, in merge_from_file
    self.merge_from_other_cfg(loaded_cfg)
  File "/opt/anaconda3/envs/d2_train/lib/python3.8/site-packages/fvcore/common/config.py", line 121, in merge_from_other_cfg
    return super().merge_from_other_cfg(cfg_other)
  File "/opt/anaconda3/envs/d2_train/lib/python3.8/site-packages/yacs/config.py", line 217, in merge_from_other_cfg
    _merge_a_into_b(cfg_other, self, self, [])
  File "/opt/anaconda3/envs/d2_train/lib/python3.8/site-packages/yacs/config.py", line 460, in _merge_a_into_b
    _merge_a_into_b(v, b[k], root, key_list + [k])
  File "/opt/anaconda3/envs/d2_train/lib/python3.8/site-packages/yacs/config.py", line 473, in _merge_a_into_b
    raise KeyError("Non-existent config key: {}".format(full_key))
KeyError: 'Non-existent config key: MODEL.VOVNET'
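
For anyone hitting the same error: MODEL.VOVNET is not part of detectron2's default config, so the key has to be added before the YAML is merged. A minimal sketch, assuming this project's add_vovnet_config helper is importable from the vovnet package:

from detectron2.config import get_cfg
from vovnet import add_vovnet_config  # assumed export of this repo

cfg = get_cfg()
add_vovnet_config(cfg)  # registers MODEL.VOVNET.* so merge_from_file accepts them
cfg.merge_from_file("configs/faster_rcnn_V_99_FPN_3x.yaml")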

SE param is not used in OSA module

From the OSA stage code I noticed that the SE param can be set to False in some cases, e.g. when block_per_stage != 1. I take this to mean that the following OSA module should not include the SE module.

if block_per_stage != 1:
    SE = False
module_name = f"OSA{stage_num}_1"
self.add_module(
    module_name,
    _OSA_module(in_ch, stage_ch, concat_ch, layer_per_block, module_name, SE, depthwise=depthwise),
)

But it seems that the SE param defined in the OSA module is never used, so the SE module will be applied in every OSA module.

class _OSA_module(nn.Module):
    def __init__(
        self, in_ch, stage_ch, concat_ch, layer_per_block, module_name, SE=False, identity=False, depthwise=False
    ):

Is it a bug, or did I just misunderstand it?
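
For reference, a minimal, self-contained sketch (not this repo's actual code) of how the SE flag could gate the eSE attention instead of it being applied unconditionally:

import torch.nn as nn

class eSEModule(nn.Module):
    """Minimal stand-in for the repo's effective-SE block (an assumption, not the repo's code)."""
    def __init__(self, channels):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Conv2d(channels, channels, kernel_size=1)
        self.hsigmoid = nn.Hardsigmoid()

    def forward(self, x):
        # channel attention: x * hsigmoid(fc(gap(x)))
        return x * self.hsigmoid(self.fc(self.avg_pool(x)))

class ConcatTail(nn.Module):
    """Sketch: gate the attention on the SE flag instead of applying it unconditionally."""
    def __init__(self, concat_ch, SE=False):
        super().__init__()
        # nn.Identity() when SE=False, so a module built with SE=False really skips attention
        self.ese = eSEModule(concat_ch) if SE else nn.Identity()

    def forward(self, xt):
        return self.ese(xt)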

Keypoint RCNN

Hi,

Thanks for this great extension. I'm currently looking to try different backbones for keypoint detection in detectron2. I have to say I'm sort of lost on how to go about replacing the backbone of an existing Keypoint R-CNN architecture. Do you have VoVNet implemented for Keypoint R-CNN? Also, I'm looking for a good tutorial/example for swapping the backbone architecture in Detectron2. I'd appreciate any help. Thank you!
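
For illustration, a hedged sketch of what swapping the backbone might look like in code, assuming this repo exposes add_vovnet_config and registers build_vovnet_fpn_backbone as in its other configs:

from detectron2 import model_zoo
from detectron2.config import get_cfg
from vovnet import add_vovnet_config  # assumed export of this repo

cfg = get_cfg()
add_vovnet_config(cfg)  # adds the MODEL.VOVNET keys
cfg.merge_from_file(model_zoo.get_config_file("COCO-Keypoints/keypoint_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.BACKBONE.NAME = "build_vovnet_fpn_backbone"  # registered when vovnet is imported
cfg.MODEL.VOVNET.CONV_BODY = "V-39-eSE"
cfg.MODEL.VOVNET.OUT_FEATURES = ["stage2", "stage3", "stage4", "stage5"]
cfg.MODEL.FPN.IN_FEATURES = ["stage2", "stage3", "stage4", "stage5"]
cfg.MODEL.WEIGHTS = "vovnet39_ese_detectron2.pth"  # ImageNet-pretrained backbone weights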

Add setup.py for easy installation

Consider adding a setup.py for easier use of the vovnet backbones in other projects.
Then we could install the vovnet package as easily as

pip install git+https://github.com/youngwanLEE/vovnet-detectron2

and import it as import vovnet.

A simple setup.py would do the job (of course that is not the best solution); a sketch follows.
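
For illustration, a minimal setup.py along the lines the issue suggests (an untested sketch, not an official file):

from setuptools import find_packages, setup

setup(
    name="vovnet",
    version="0.1.0",
    description="VoVNet backbones for detectron2",
    packages=find_packages(include=["vovnet", "vovnet.*"]),
    install_requires=["torch"],  # detectron2 itself is typically installed separately
)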

I found that there may be ambiguity in the code, or maybe I don't understand it?

[screenshot of the _OSA_stage code]
Regarding the code framed in red in the picture: when block_per_stage is 3 or 4, i.e. block_per_stage != 1, SE becomes False, so the SE argument of the modules below is False no matter how block_per_stage changes.
I suspect this may not be what you originally intended. Judging by the commented-out "last block", I think you may have meant to add attention to the last block, or to a few blocks before the "last block".

Loss NaN about using vovnet as backbone in RetinaNet

Hi! Thank you for your great work.
I wanted to improve the RetinaNet project in detectron2/projects by replacing "retinanet_resnet_fpn_backbone" with "retinanet_vovnet_fpn_backbone".
However, I always encountered NaN losses within 1,000 iterations of starting training.
Training with "retinanet_resnet_fpn_backbone" works fine.

I want to make sure that I wasn't doing something wrong.

my config yaml:

_BASE_: "../Base-RetinaNet.yaml"
MODEL:
  WEIGHTS: "./pre_train/vovnet39_ese_detectron2.pth"
  RETINANET:
    NUM_CLASSES: 2
  BACKBONE:
    NAME: "build_retinanet_vovnet_fpn_backbone"
    FREEZE_AT: 0
  VOVNET:
    CONV_BODY: "V-39-eSE"
    OUT_FEATURES: ["stage3", "stage4", "stage5"]
  FPN:
    IN_FEATURES: ["stage3", "stage4", "stage5"]
SOLVER:
  STEPS: (210000, 250000)
  MAX_ITER: 270000
OUTPUT_DIR: "output/retina/V_39_ms_3x"

build_retinanet_vovnet_fpn_backbone

@BACKBONE_REGISTRY.register()
def build_retinanet_vovnet_fpn_backbone(cfg, input_shape: ShapeSpec):
    """
    Args:
        cfg: a detectron2 CfgNode

    Returns:
        backbone (Backbone): backbone module, must be a subclass of :class:`Backbone`.
    """

    bottom_up = build_vovnet_backbone(cfg, input_shape)
    in_features = cfg.MODEL.FPN.IN_FEATURES
    out_channels = cfg.MODEL.FPN.OUT_CHANNELS
    in_channels_top = out_channels
    top_block = LastLevelP6P7(in_channels_top, out_channels, "p5")
    # in_channels_p6p7 = bottom_up.output_shape()["res5"].channels
    backbone = FPN(
        bottom_up=bottom_up,
        in_features=in_features,
        out_channels=out_channels,
        norm=cfg.MODEL.FPN.NORM,
        top_block=top_block,
        # top_block=LastLevelP6P7(in_channels_p6p7, out_channels),
        fuse_type=cfg.MODEL.FPN.FUSE_TYPE,
    )
    return backbone
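
Not a confirmed fix, but a hedged sketch of common detectron2 knobs to try against early-training NaNs; the YAML path and values are illustrative:

from detectron2.config import get_cfg
from vovnet import add_vovnet_config  # assumed export of this repo

cfg = get_cfg()
add_vovnet_config(cfg)
cfg.merge_from_file("retinanet_vovnet.yaml")  # hypothetical path to the YAML above
cfg.SOLVER.BASE_LR = 0.005                    # try a lower LR than the ResNet default
cfg.SOLVER.WARMUP_ITERS = 2000                # longer warmup before reaching the full LR
cfg.SOLVER.CLIP_GRADIENTS.ENABLED = True      # clip exploding gradients
cfg.SOLVER.CLIP_GRADIENTS.CLIP_TYPE = "norm"
cfg.SOLVER.CLIP_GRADIENTS.CLIP_VALUE = 1.0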

On designing thinner VovNet

Thanks for your great work!
I'm working on an instance segmentation project in which target_classes=2. I tried to use VoVNet in the project to replace the ResNet I had been using. Because the target classes are very different from COCO, the output channels of my model's FPN are only 64, and I think the VoVNet here should be a lot thinner than the COCO version.
I have designed a VoVNet like the model below, but the result is worse than ResNet. Do you have any suggestions on how to design a thinner VoVNet? Thanks.

from collections import namedtuple

StageSpec = namedtuple(
    "StageSpec",
    [
        "index",            # Index of the stage, e.g. 1, 2, ..., 5
        "block_count",      # Number of OSA blocks in the stage
        "layer_per_block",  # Number of conv layers per OSA block
        "return_features",  # True => return the last feature map from this stage
        "in_channels",
        "out_channels",
    ],
)
VoVNet67_eSE = tuple(
    StageSpec(index=i, block_count=b, layer_per_block=l, in_channels=in_c, out_channels=out_c, return_features=r)
    for (i, b, l, in_c, out_c, r) in (
        (1, 1, 5, 16, 48, True),
        (2, 3, 5, 32, 96, True),
        (3, 6, 5, 64, 192, True),
        (4, 3, 5, 128, 256, True),
    )
)

Error trying to export models to caffe2

When I try to run the standard caffe2 export script, I get an error:

(detectron_env_2) sal9000@sal9000-XPS-13-9370:~/Sources/detectron2/tools/deploy$ ./caffe2_converter_guitars.py --config-file /home/sal9000/Sources/detectron2/projects/vovnet-detectron2/checkpoints/MRCN-V2-19-FPNLite-3x/config.yaml  --output ./caffe2_model_guitars_lite --run-eval MODEL.WEIGHTS /home/sal9000/Sources/detectron2/projects/vovnet-detectron2/checkpoints/MRCN-V2-19-FPNLite-3x/model_final.pth  MODEL.DEVICE cpu
[05/17 15:15:55 detectron2]: Command line arguments: Namespace(config_file='/home/sal9000/Sources/detectron2/projects/vovnet-detectron2/checkpoints/MRCN-V2-19-FPNLite-3x/config.yaml', format='caffe2', opts=['MODEL.WEIGHTS', '/home/sal9000/Sources/detectron2/projects/vovnet-detectron2/checkpoints/MRCN-V2-19-FPNLite-3x/model_final.pth', 'MODEL.DEVICE', 'cpu'], output='./caffe2_model_guitars_lite', run_eval=True)
Traceback (most recent call last):
  File "./caffe2_converter_guitars.py", line 81, in <module>
    torch_model = build_model(cfg)
  File "/home/sal9000/Sources/detectron2/detectron2/modeling/meta_arch/build.py", line 21, in build_model
    model = META_ARCH_REGISTRY.get(meta_arch)(cfg)
  File "/home/sal9000/Sources/detectron2/detectron2/modeling/meta_arch/rcnn.py", line 32, in __init__
    self.backbone = build_backbone(cfg)
  File "/home/sal9000/Sources/detectron2/detectron2/modeling/backbone/build.py", line 31, in build_backbone
    backbone = BACKBONE_REGISTRY.get(backbone_name)(cfg, input_shape)
  File "/home/sal9000/virtualenvs/detectron_env_2/lib/python3.6/site-packages/fvcore/common/registry.py", line 70, in get
    "No object named '{}' found in '{}' registry!".format(name, self._name)
KeyError: "No object named 'build_vovnet_fpn_backbone' found in 'BACKBONE' registry!"

I had already inserted a line to call add_vovnet_config(cfg), which fixed an earlier error, but I'm not sure how to proceed with this missing-backbone error.
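
For anyone else: this registry error usually means the module containing the @BACKBONE_REGISTRY.register() decorator was never imported by the export script, so the decorator never ran. A hedged one-line sketch of the fix (assuming the repo's backbones live in the vovnet package):

# In the caffe2 export script, before build_model(cfg) is called:
import vovnet  # noqa: F401 -- importing the package runs the @BACKBONE_REGISTRY.register() decorators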

P.S. Which is the fastest backbone for CPU inference? Eventually I'd like to try putting this model on a mobile device.

How to set the params, such as channels?

Hello, I want to change the output channels to 128. Could you give me some advice on how to modify the params? The params I set did not give a good result; the result is worse than ResNet-34.

Looking forward to your reply.

lower AP than Resnet backbone in my training

Hi! Thank you for your great work. I wanted to improve the DensePose project in detectron2/projects by replacing the resnet-fpn backbone with VoVNet, but during training I always get lower results than with the original ResNet backbone. One of the two results, and the command I used, is as below:
python train_net.py --config-file configs/densepose_rcnn_R_50_FPN_s1x_legacy.yaml SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.001 (the original one)
results:
[03/12 13:05:36 d2.evaluation.testing]: copypaste: Task: bbox
[03/12 13:05:36 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[03/12 13:05:36 d2.evaluation.testing]: copypaste: 53.1516,84.5066,57.3643,26.5924,51.5737,66.5134
[03/12 13:05:36 d2.evaluation.testing]: copypaste: Task: densepose_gps
[03/12 13:05:36 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APm,APl
[03/12 13:05:36 d2.evaluation.testing]: copypaste: 44.6044,83.1504,43.4994,38.4582,46.2426
[03/12 13:05:36 d2.evaluation.testing]: copypaste: Task: densepose_gpsm
[03/12 13:05:36 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APm,APl
[03/12 13:05:36 d2.evaluation.testing]: copypaste: 48.4785,86.5655,50.8903,40.1852,50.2448
[03/12 13:05:36 d2.utils.events]: eta: 0:00:00 iter: 129999 total_loss: 2.144 loss_cls: 0.104 loss_box_reg: 0.153 loss_densepose_U: 0.444 loss_densepose_V: 0.487 loss_densepose_I: 0.176 loss_densepose_S: 0.648 loss_rpn_cls: 0.011 loss_rpn_loc: 0.021 time: 0.4843 data_time: 0.0164 lr: 0.000010 max_mem: 2939M

python train_net.py --config-file configs/densepose_rcnn_R_50_FPN_s1x_legacy.yaml SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0001 (the backbone replaced one)
results:
[03/15 06:48:25 d2.evaluation.testing]: copypaste: Task: bbox
[03/15 06:48:25 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[03/15 06:48:25 d2.evaluation.testing]: copypaste: 50.5212,83.5444,52.5758,22.3658,48.5157,64.9114
[03/15 06:48:25 d2.evaluation.testing]: copypaste: Task: densepose_gps
[03/15 06:48:25 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APm,APl
[03/15 06:48:25 d2.evaluation.testing]: copypaste: 41.6630,82.0449,37.6683,31.7831,43.3619
[03/15 06:48:25 d2.evaluation.testing]: copypaste: Task: densepose_gpsm
[03/15 06:48:25 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APm,APl
[03/15 06:48:25 d2.evaluation.testing]: copypaste: 46.4402,86.1345,45.5157,35.2128,48.2686
[03/15 06:48:25 d2.utils.events]: eta: 0:00:00 iter: 269999 total_loss: 2.757 loss_cls: 0.075 loss_box_reg: 0.136 loss_densepose_U: 0.707 loss_densepose_V: 0.751 loss_densepose_I: 0.215 loss_densepose_S: 0.635 loss_rpn_cls: 0.008 loss_rpn_loc: 0.010 time: 0.2863 data_time: 0.0129 lr: 0.000001 max_mem: 4289M

the changes I've made in Base-DensePose-RCNN-FPN.yaml:

MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  BACKBONE:
    NAME: "build_vovnet_fpn_backbone"
    FREEZE_AT: 0
    #NAME: "build_resnet_fpn_backbone"
  VOVNET:
    OUT_FEATURES: ["stage2", "stage3", "stage4", "stage5"]
  #RESNETS:
    #OUT_FEATURES: ["res2", "res3", "res4", "res5"]

  FPN:
    IN_FEATURES: ["stage2", "stage3", "stage4", "stage5"]
    #IN_FEATURES: ["res2", "res3", "res4", "res5"]

and in densepose_rcnn_R_50_FPN_s1x_legacy.yaml:

_BASE_: "Base-DensePose-RCNN-FPN.yaml"
MODEL:
  #WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-50.pkl"
  #RESNETS:
    #DEPTH: 50
  WEIGHTS: "vovnet39_ese_detectron2.pth"#"https://www.dropbox.com/s/rptgw6stppbiw1u/vovnet19_ese_detectron2.pth?dl=1"
  VOVNET:
    CONV_BODY : "V-39-eSE"

  ROI_DENSEPOSE_HEAD:
    NUM_COARSE_SEGM_CHANNELS: 15
    POOLER_RESOLUTION: 14
    HEATMAP_SIZE: 56
    INDEX_WEIGHTS: 2.0
    PART_WEIGHTS: 0.3
    POINT_REGRESSION_WEIGHTS: 0.1
    DECODER_ON: False
SOLVER:
  BASE_LR: 0.002
  #MAX_ITER: 130000
  #STEPS: (100000, 120000)
  STEPS: (210000, 250000)
  MAX_ITER: 270000
OUTPUT_DIR: "checkpoints/MRCN-V2-39-3x"

Have I done something wrong, like unsuitable learning rates or something else? I've been deeply impressed by how your backbone can improve a model's AP, so could you tell me how to train this new backbone in order to get higher results? Thanks a lot in advance!
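
One thing worth checking (a hedged aside, not a confirmed diagnosis): detectron2's reference schedules assume IMS_PER_BATCH=16, so with IMS_PER_BATCH 2 the linear scaling rule suggests dividing the base LR by 8:

# Linear LR scaling sketch: reference schedules assume 16 images per batch.
reference_lr = 0.02   # baseline LR at IMS_PER_BATCH=16 (illustrative value)
ims_per_batch = 2     # the batch size used in the commands above
scaled_lr = reference_lr * ims_per_batch / 16
print(scaled_lr)      # 0.0025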

AP value too low

Hello, excuse me. I'm doing cartoon character detection. The number of training images and training iterations is sufficient, but the average AP (0.5-0.95) is only a little over 30%. What could be the reason? If you can take the time to answer me, I would be grateful.

Inconsistent evaluation results for same model and dataset

Command: python train_net.py --num-gpus 1 --config-file configs/faster_rcnn_V_39_FPN_3x.yaml --resume --eval-only MODEL.WEIGHTS checkpoints/FRCN-V2-39-3x_1/model_final.pth MODEL.ROI_HEADS.SCORE_THRESH_TEST 0.5

The content of the config file:

_BASE_: "Base-RCNN-VoVNet-FPN.yaml"
MODEL:
  WEIGHTS: "https://www.dropbox.com/s/q98pypf96rhtd8y/vovnet39_ese_detectron2.pth?dl=1"
  MASK_ON: False
  VOVNET:
    CONV_BODY: "V-39-eSE"
  ROI_HEADS:
    NUM_CLASSES: 30
SOLVER:
  STEPS: (210000, 250000)
  MAX_ITER: 200000
  IMS_PER_BATCH: 8
  BASE_LR: 0.0001
  CHECKPOINT_PERIOD: 1000
DATASETS:
  TRAIN: ("train_dataset_leafi",)
  TEST: ("train_dataset_leafi","val_dataset_leafi")
OUTPUT_DIR: "checkpoints/FRCN-V2-39-3x_crops/"
DATALOADER:
  NUM_WORKERS: 4
TEST:
  EVAL_PERIOD: 1000

The evaluation results (for the validation dataset) from running the command:

  • Run 1:

[06/17 10:07:33 d2.evaluation.coco_evaluation]: 'val_dataset_leafi' is not registered by register_coco_instances. Therefore trying to convert it to COCO format ...
[06/17 10:07:33 d2.evaluation.evaluator]: Start inference on 66 images
[06/17 10:07:35 d2.evaluation.fast_eval_api]: Evaluate annotation type bbox
[06/17 10:07:35 d2.evaluation.fast_eval_api]: COCOeval_opt.evaluate() finished in 0.01 seconds.
[06/17 10:07:35 d2.evaluation.fast_eval_api]: Accumulating evaluation results...
[06/17 10:07:35 d2.evaluation.fast_eval_api]: COCOeval_opt.accumulate() finished in 0.04 seconds.
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.086
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.209
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.052
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.102
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.070
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.137
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.093
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.093
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.093
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.104
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.085
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.144
[06/17 10:07:35 d2.evaluation.coco_evaluation]: Evaluation results for bbox:

AP AP50 AP75 APs APm APl AR1 AR10 AR100 ARs ARm ARl
8.572 20.874 5.198 10.182 6.964 13.746 9.268 9.326 9.326 10.449 8.500 14.444
  • Run 2:

[06/17 10:11:20 d2.evaluation.coco_evaluation]: 'val_dataset_leafi' is not registered by register_coco_instances. Therefore trying to convert it to COCO format ...
WARNING [06/17 10:11:20 d2.data.datasets.coco]: Using previously cached COCO format annotations at 'checkpoints/FRCN-V2-39-3x_crops/inference/val_dataset_leafi_coco_format.json'. You need to clear the cache file if your dataset has been modified.
[06/17 10:11:20 d2.evaluation.evaluator]: Start inference on 66 images
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.094
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.242
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.043
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.110
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.196
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.116
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.116
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.116
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.004
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.142
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.233
[06/17 10:11:22 d2.evaluation.coco_evaluation]: Evaluation results for bbox:

AP AP50 AP75 APs APm APl AR1 AR10 AR100 ARs ARm ARl
9.403 24.244 4.297 0.042 10.984 19.574 11.642 11.642 11.642 0.353 14.227 23.333

This seems to be an issue similar to facebookresearch/detectron2#739.
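
Note that run 2 reused a cached COCO-format annotation file (see the warning above). A hedged first step is to delete that cache so evaluation re-converts the dataset:

import os

# Path taken from the warning in run 2 above.
cache = "checkpoints/FRCN-V2-39-3x_crops/inference/val_dataset_leafi_coco_format.json"
if os.path.exists(cache):
    os.remove(cache)  # forces the dataset to be re-converted on the next evaluation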

Problems in Multi Task

VoVNet performs better than ShuffleNetV2 in object detection on our own dataset, but when we use it as the backbone for multi-task learning, its performance decreases sharply compared with ShuffleNetV2.
Any suggestions? Thanks!

Training loss goes into nan values

I got NaN values when I used the default config in vovnet. Then I tried reducing the base LR to 0.001 and 0.00025. That solved the NaN issue, but the training loss barely decreases (it goes from 1.9 down to about 0.7), and the AP is 11 after 75,000 iterations.

Dataset: 57,000 images with one class; the images come in different resolutions.

inference time

Hi, I've noticed that you used a V100 GPU machine to measure the inference time. Can you please tell me what image size you used when measuring it?

Out of memory error - how to reduce batch size?

I'm trying to train a small net on my own dataset, on an AWS P2 machine with ~12 GB of GPU memory.

I'm getting the error below. Do you know what I can do, perhaps reduce the batch size or something? How do I do that? (A config sketch follows the traceback below.)

[05/16 14:46:32 d2.data.build]: Using training sampler TrainingSampler
[05/16 14:46:32 fvcore.common.checkpoint]: Loading checkpoint from https://www.dropbox.com/s/rptgw6stppbiw1u/vovnet19_ese_detectron2.pth?dl=1
[05/16 14:46:32 fvcore.common.file_io]: URL https://www.dropbox.com/s/rptgw6stppbiw1u/vovnet19_ese_detectron2.pth?dl=1 cached in /home/ubuntu/.torch/fvcore_cache/s/rptgw6stppbiw1u/vovnet19_ese_detectron2.pth?dl=1
[05/16 14:46:33 fvcore.common.checkpoint]: Some model parameters or buffers are not in the checkpoint:
  backbone.fpn_output5.{bias, weight}
  roi_heads.box_head.fc1.{bias, weight}
  roi_heads.box_predictor.bbox_pred.{weight, bias}
  roi_heads.mask_head.mask_fcn3.{weight, bias}
  roi_heads.mask_head.predictor.{bias, weight}
  backbone.fpn_output4.{bias, weight}
  backbone.fpn_output3.{weight, bias}
  proposal_generator.anchor_generator.cell_anchors.{0, 2, 3, 4, 1}
  proposal_generator.rpn_head.conv.{weight, bias}
  roi_heads.box_predictor.cls_score.{bias, weight}
  proposal_generator.rpn_head.objectness_logits.{bias, weight}
  roi_heads.mask_head.deconv.{bias, weight}
  roi_heads.box_head.fc2.{bias, weight}
  proposal_generator.rpn_head.anchor_deltas.{weight, bias}
  roi_heads.mask_head.mask_fcn1.{weight, bias}
  roi_heads.mask_head.mask_fcn2.{weight, bias}
  backbone.fpn_output2.{bias, weight}
  roi_heads.mask_head.mask_fcn4.{bias, weight}
  backbone.fpn_lateral2.{bias, weight}
  backbone.fpn_lateral4.{weight, bias}
  backbone.fpn_lateral5.{weight, bias}
  backbone.fpn_lateral3.{weight, bias}
[05/16 14:46:33 fvcore.common.checkpoint]: The checkpoint state_dict contains keys that are not used by the model:
  backbone.bottom_up.stem.stem_1/norm.num_batches_tracked
  backbone.bottom_up.stem.stem_2/norm.num_batches_tracked
  backbone.bottom_up.stem.stem_3/norm.num_batches_tracked
  backbone.bottom_up.stage2.OSA2_1.layers.0.OSA2_1_0/norm.num_batches_tracked
  backbone.bottom_up.stage2.OSA2_1.layers.1.OSA2_1_1/norm.num_batches_tracked
  backbone.bottom_up.stage2.OSA2_1.layers.2.OSA2_1_2/norm.num_batches_tracked
  backbone.bottom_up.stage2.OSA2_1.concat.OSA2_1_concat/norm.num_batches_tracked
  backbone.bottom_up.stage3.OSA3_1.layers.0.OSA3_1_0/norm.num_batches_tracked
  backbone.bottom_up.stage3.OSA3_1.layers.1.OSA3_1_1/norm.num_batches_tracked
  backbone.bottom_up.stage3.OSA3_1.layers.2.OSA3_1_2/norm.num_batches_tracked
  backbone.bottom_up.stage3.OSA3_1.concat.OSA3_1_concat/norm.num_batches_tracked
  backbone.bottom_up.stage4.OSA4_1.layers.0.OSA4_1_0/norm.num_batches_tracked
  backbone.bottom_up.stage4.OSA4_1.layers.1.OSA4_1_1/norm.num_batches_tracked
  backbone.bottom_up.stage4.OSA4_1.layers.2.OSA4_1_2/norm.num_batches_tracked
  backbone.bottom_up.stage4.OSA4_1.concat.OSA4_1_concat/norm.num_batches_tracked
  backbone.bottom_up.stage5.OSA5_1.layers.0.OSA5_1_0/norm.num_batches_tracked
  backbone.bottom_up.stage5.OSA5_1.layers.1.OSA5_1_1/norm.num_batches_tracked
  backbone.bottom_up.stage5.OSA5_1.layers.2.OSA5_1_2/norm.num_batches_tracked
  backbone.bottom_up.stage5.OSA5_1.concat.OSA5_1_concat/norm.num_batches_tracked
[05/16 14:46:33 d2.engine.train_loop]: Starting training from iteration 0
ERROR [05/16 14:46:38 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
  File "/home/ubuntu/detectron2/detectron2/engine/train_loop.py", line 132, in train
    self.run_step()
  File "/home/ubuntu/detectron2/detectron2/engine/train_loop.py", line 215, in run_step
    loss_dict = self.model(data)
  File "/home/ubuntu/virtualenvs/detectron_env_2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/detectron2/detectron2/modeling/meta_arch/rcnn.py", line 121, in forward
    features = self.backbone(images.tensor)
  File "/home/ubuntu/virtualenvs/detectron_env_2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/detectron2/detectron2/modeling/backbone/fpn.py", line 123, in forward
    bottom_up_features = self.bottom_up(x)
  File "/home/ubuntu/virtualenvs/detectron_env_2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/detectron2/projects/vovnet-detectron2/vovnet/vovnet.py", line 367, in forward
    x = getattr(self, name)(x)
  File "/home/ubuntu/virtualenvs/detectron_env_2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/virtualenvs/detectron_env_2/lib/python3.6/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home/ubuntu/virtualenvs/detectron_env_2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/detectron2/projects/vovnet-detectron2/vovnet/vovnet.py", line 234, in forward
    xt = self.concat(x)
  File "/home/ubuntu/virtualenvs/detectron_env_2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/virtualenvs/detectron_env_2/lib/python3.6/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home/ubuntu/virtualenvs/detectron_env_2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/detectron2/detectron2/layers/batch_norm.py", line 55, in forward
    return x * scale + bias
RuntimeError: CUDA out of memory. Tried to allocate 1.03 GiB (GPU 0; 11.17 GiB total capacity; 8.48 GiB already allocated; 845.31 MiB free; 10.03 GiB reserved in total by PyTorch)
[05/16 14:46:38 d2.engine.hooks]: Total training time: 0:00:05 (0:00:00 on hooks)
Traceback (most recent call last):
  File "train_net_docs.py", line 115, in <module>
    dist_url=args.dist_url,
  File "/home/ubuntu/detectron2/detectron2/engine/launch.py", line 57, in launch
    main_func(*args)
  File "train_net_docs.py", line 93, in main
    trainer.resume_or_load(resume=args.resume)
  File "/home/ubuntu/detectron2/detectron2/engine/defaults.py", line 401, in train
    super().train(self.start_iter, self.max_iter)
  File "/home/ubuntu/detectron2/detectron2/engine/train_loop.py", line 132, in train
    self.run_step()
  File "/home/ubuntu/detectron2/detectron2/engine/train_loop.py", line 215, in run_step
    loss_dict = self.model(data)
  File "/home/ubuntu/virtualenvs/detectron_env_2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/detectron2/detectron2/modeling/meta_arch/rcnn.py", line 121, in forward
    features = self.backbone(images.tensor)
  File "/home/ubuntu/virtualenvs/detectron_env_2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/detectron2/detectron2/modeling/backbone/fpn.py", line 123, in forward
    bottom_up_features = self.bottom_up(x)
  File "/home/ubuntu/virtualenvs/detectron_env_2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/detectron2/projects/vovnet-detectron2/vovnet/vovnet.py", line 367, in forward
    x = getattr(self, name)(x)
  File "/home/ubuntu/virtualenvs/detectron_env_2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/virtualenvs/detectron_env_2/lib/python3.6/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home/ubuntu/virtualenvs/detectron_env_2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/detectron2/projects/vovnet-detectron2/vovnet/vovnet.py", line 234, in forward
    xt = self.concat(x)
  File "/home/ubuntu/virtualenvs/detectron_env_2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/virtualenvs/detectron_env_2/lib/python3.6/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home/ubuntu/virtualenvs/detectron_env_2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/detectron2/detectron2/layers/batch_norm.py", line 55, in forward
    return x * scale + bias
RuntimeError: CUDA out of memory. Tried to allocate 1.03 GiB (GPU 0; 11.17 GiB total capacity; 8.48 GiB already allocated; 845.31 MiB free; 10.03 GiB reserved in total by PyTorch)
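
For anyone hitting this: the batch size is SOLVER.IMS_PER_BATCH, and it can be overridden without editing the YAML. A hedged sketch with illustrative values:

# From the command line, opts at the end override the YAML:
#   python train_net.py --config-file <your_config.yaml> \
#       SOLVER.IMS_PER_BATCH 4 SOLVER.BASE_LR 0.005
# Or in code, before building the trainer:
from detectron2.config import get_cfg
from vovnet import add_vovnet_config  # assumed export of this repo

cfg = get_cfg()
add_vovnet_config(cfg)
cfg.merge_from_file("my_config.yaml")  # hypothetical config path
cfg.SOLVER.IMS_PER_BATCH = 4           # fewer images per step, lower peak GPU memory
cfg.SOLVER.BASE_LR = 0.005             # scale the LR down with the batch size (linear rule)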
