This project forked from hrnet/hrnet-object-detection

Object detection with multi-level representations generated from deep high-resolution representation learning (HRNetV2h).

Home Page: https://jingdongwang2017.github.io/Projects/HRNet/

License: Apache License 2.0

Shell 0.08% Python 86.01% C++ 4.74% Cuda 9.17%

High-resolution networks (HRNets) for object detection

News

Introduction

This is the official code of High-Resolution Representations for Object Detection. We extend the high-resolution network (HRNet) [1] by aggregating the (upsampled) representations from all the parallel convolutions, which yields a stronger high-resolution representation. We then build a multi-level representation from this high-resolution representation and apply it to the Faster R-CNN, Mask R-CNN and Cascade R-CNN frameworks. The approach achieves better results than existing single-model networks on COCO object detection. The code is based on mmdetection.
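To make the aggregation concrete, the sketch below only tracks feature-map shapes; it is not the repository's implementation. It assumes the usual HRNet layout of four parallel branches with widths C, 2C, 4C, 8C at strides 4, 8, 16, 32, channel concatenation after upsampling to the highest resolution, and downsampling by successive factors of 2 to obtain the levels. The function names and the number of levels are illustrative assumptions.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def branch_shapes(width: int, img_h: int, img_w: int) -> List[Shape]:
    """Shapes of the four parallel branches (assumed strides 4, 8, 16, 32)."""
    return [
        (width * 2 ** i, img_h // (4 * 2 ** i), img_w // (4 * 2 ** i))
        for i in range(4)
    ]


def aggregate(branches: List[Shape]) -> Shape:
    """Upsample every branch to the highest resolution and concatenate channels."""
    _, h, w = branches[0]
    channels = sum(c for c, _, _ in branches)
    return (channels, h, w)


def multi_level(high_res: Shape, num_levels: int = 4) -> List[Shape]:
    """Downsample the aggregated map by successive factors of 2.

    The level count is a hypothetical default, not taken from the repository.
    """
    c, h, w = high_res
    return [(c, h // 2 ** i, w // 2 ** i) for i in range(num_levels)]


shapes = branch_shapes(width=18, img_h=800, img_w=1344)  # W18 backbone
fused = aggregate(shapes)
levels = multi_level(fused)
print(fused)   # (270, 200, 336)
print(levels)
```

With a width of 18, the concatenated map has 18 + 36 + 72 + 144 = 270 channels at 1/4 of the input resolution, and every coarser level is derived from that single map.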

Performance

ImageNet pretrained models

HRNetV2 ImageNet pretrained models are now available! Code and pretrained models are in HRNets for Image Classification.

All models are trained on COCO train2017 set and evaluated on COCO val2017 set. Detailed settings or configurations are in configs/hrnet.

Note: Models are trained with the newly released code, and the results differ slightly from those in the paper. Current results will be updated soon, and more models and results are coming.

Note: Pretrained HRNets can be downloaded at HRNets for Image Classification.

Faster R-CNN

Backbone     #Params  GFLOPs  lr sched  SyncBN  MS train       mAP   model
HRNetV2-W18  26.2M    159.1   1x        N       N              36.1  OneDrive,BaiduDrive(y4hs)
HRNetV2-W18  26.2M    159.1   1x        Y       N              37.2  OneDrive,BaiduDrive(ypnu)
HRNetV2-W18  26.2M    159.1   1x        Y       Y(Default)     37.6  OneDrive,BaiduDrive(ekkm)
HRNetV2-W18  26.2M    159.1   1x        Y       Y(ResizeCrop)  37.6  OneDrive,BaiduDrive(phgo)
HRNetV2-W18  26.2M    159.1   2x        N       N              38.1  OneDrive,BaiduDrive(mz9y)
HRNetV2-W18  26.2M    159.1   2x        Y       Y(Default)     39.4  OneDrive,BaiduDrive(ocuf)
HRNetV2-W18  26.2M    159.1   2x        Y       Y(ResizeCrop)  39.7
HRNetV2-W32  45.0M    245.3   1x        N       N              39.5  OneDrive,BaiduDrive(ztwa)
HRNetV2-W32  45.0M    245.3   1x        Y       Y(Default)     41.0
HRNetV2-W32  45.0M    245.3   2x        N       N              40.8  OneDrive,BaiduDrive(hmdo)
HRNetV2-W32  45.0M    245.3   2x        Y       Y(Default)     42.6  OneDrive,BaiduDrive(k03x)
HRNetV2-W40  60.5M    314.9   1x        N       N              40.4  OneDrive,BaiduDrive(0qda)
HRNetV2-W40  60.5M    314.9   2x        N       N              41.4  OneDrive,BaiduDrive(xny6)

Mask R-CNN

Backbone     lr sched  Mask mAP  Box mAP  model
HRNetV2-W18  1x        34.2      37.3     OneDrive,BaiduDrive(vvc1)
HRNetV2-W18  2x        35.7      39.2     OneDrive,BaiduDrive(x2m7)
HRNetV2-W32  1x        36.8      40.7     OneDrive,BaiduDrive(j2ir)
HRNetV2-W32  2x        37.6      42.1     OneDrive,BaiduDrive(tzkz)

Cascade R-CNN

Note: We follow the original paper [2] and adopt 280k training iterations, which is equal to 20 epochs in mmdetection.

Backbone     lr sched  mAP   model
ResNet-101   20e       42.8  OneDrive,BaiduDrive(bzlg)
HRNetV2-W32  20e       43.7  OneDrive,BaiduDrive(ydd7)

Multi-scale training techniques

Default

  • Procedure

    1. Select one scale from provided scales randomly and apply it.
    2. Pad all images in a GPU batch (e.g. 2 images per GPU) to the same size (see pad_size; 1600*1000 or 1000*1600).
  • Code

Change the following lines in the config file:

data = dict(
    imgs_per_gpu=4,
    workers_per_gpu=8,
    pad_size=(1600, 1024),
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'images/train2017.zip',
        img_scale=[(1600, 1000), (1000, 600), (1333, 800)],
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0.5,
        with_mask=False,
        with_crowd=True,
        with_label=True),
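The two steps of the Default procedure can be sketched with plain size arithmetic. This is a minimal illustration, not the repository's data pipeline: it assumes that a scale is given as (long edge, short edge), that resizing keeps the aspect ratio, and that the padding canvas takes the orientation of the first image in the batch. All function names and the example image sizes are hypothetical.

```python
import math
import random
from typing import List, Tuple

Size = Tuple[int, int]  # (height, width)


def rescale_size(h: int, w: int, scale: Tuple[int, int]) -> Size:
    """Largest size that fits inside `scale` while keeping the aspect ratio."""
    long_edge, short_edge = max(scale), min(scale)
    factor = min(long_edge / max(h, w), short_edge / min(h, w))
    return int(h * factor + 0.5), int(w * factor + 0.5)


def pad_to_divisor(h: int, w: int, divisor: int) -> Size:
    """Round both sides up to a multiple of `divisor` (cf. size_divisor=32)."""
    return math.ceil(h / divisor) * divisor, math.ceil(w / divisor) * divisor


def pad_canvas(sizes: List[Size], pad_size: Tuple[int, int]) -> Size:
    """Fixed canvas for the batch; orientation follows the first image (assumption)."""
    long_edge, short_edge = max(pad_size), min(pad_size)
    h, w = sizes[0]
    return (short_edge, long_edge) if w >= h else (long_edge, short_edge)


img_scales = [(1600, 1000), (1000, 600), (1333, 800)]
rng = random.Random(0)                  # seeded only to make the demo repeatable
scale = rng.choice(img_scales)          # step 1: pick one scale at random
batch = [rescale_size(480, 640, scale), rescale_size(427, 640, scale)]
canvas = pad_canvas(batch, pad_size=(1600, 1024))  # step 2: shared padded size
print(scale, batch, canvas)
print(pad_to_divisor(*batch[0], 32))    # per-image rounding to a multiple of 32
```

Because the canvas is sized for the largest scale, every batch pays the memory cost of that scale even when a smaller one is drawn.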

ResizeCrop

This implementation uses less memory and less time, so it is more efficient than the Default one.

  • Procedure

    1. Select one scale from provided scales randomly and apply it.
    2. Crop images to a fixed size randomly if they are larger than the given size.
    3. Pad all images to the same size (see pad_size).
  • Code

Change the following lines in the config file:

data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=4,
    pad_size=(1216, 800),
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/instances_train2017.json',
        img_prefix=data_root + 'train2017.zip',
        img_scale=(1200, 800),
        img_norm_cfg=img_norm_cfg,
        size_divisor=1,
        extra_aug=dict(
            rand_resize_crop=dict(
                scales=[[1400, 600], [1400, 800], [1400, 1000]],
                size=[1200, 800]
            )),
        flip_ratio=0.5,
        with_mask=False,
        with_crowd=True,
        with_label=True),
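The crop and pad steps of ResizeCrop reduce to choosing a window and computing the remaining padding. The sketch below is an illustration under assumptions, not the repository's rand_resize_crop code: it treats size=[1200, 800] as (width, height), leaves an axis untouched when the image is already smaller than the crop, and ignores ground-truth boxes, which would need the same offset and clipping. The function names are hypothetical.

```python
import random
from typing import Tuple


def random_crop_window(h: int, w: int, crop_h: int, crop_w: int,
                       rng: random.Random) -> Tuple[int, int, int, int]:
    """Return (top, left, out_h, out_w) of a random crop no larger than the image."""
    out_h, out_w = min(h, crop_h), min(w, crop_w)
    top = rng.randint(0, h - out_h)     # inclusive bounds; 0 when nothing to crop
    left = rng.randint(0, w - out_w)
    return top, left, out_h, out_w


def padding_needed(h: int, w: int, pad_h: int, pad_w: int) -> Tuple[int, int]:
    """Rows and columns to append so the image reaches the padded size."""
    return max(pad_h - h, 0), max(pad_w - w, 0)


rng = random.Random(0)                  # seeded only to make the demo repeatable
# An image already resized to 1000 x 1333 (height x width), cropped to 800 x 1200.
top, left, out_h, out_w = random_crop_window(1000, 1333, 800, 1200, rng)
pad_rows, pad_cols = padding_needed(out_h, out_w, pad_h=800, pad_w=1216)
print((top, left, out_h, out_w), (pad_rows, pad_cols))
```

Since every crop is at most 1200 x 800, the padded tensor stays at 1216 x 800 regardless of which scale was drawn, which is where the memory saving over the Default variant comes from.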

Quick start

Environment

This code is developed using Python 3.6 and PyTorch 1.0.0 on Ubuntu 16.04 with NVIDIA GPUs. Training and testing are performed using 4 NVIDIA P100 GPUs with CUDA 9.0 and cuDNN 7.0. Other platforms or GPUs are not fully tested.

Install

  1. Install PyTorch 1.0 following the official instructions
  2. Install mmcv
pip install mmcv
  3. Install pycocotools
git clone https://github.com/cocodataset/cocoapi.git \
 && cd cocoapi/PythonAPI \
 && python setup.py build_ext install \
 && cd ../../
  4. Install NVIDIA/apex to enable SyncBN
git clone https://github.com/NVIDIA/apex
cd apex
python setup.py install --cuda_ext
  5. Install HRNet-Object-Detection
git clone https://github.com/HRNet/HRNet-Object-Detection.git

cd HRNet-Object-Detection
# compile CUDA extensions.
chmod +x compile.sh
./compile.sh

# run setup
python setup.py install 

# or install locally
python setup.py install --user

For more details, see INSTALL.md

HRNetV2 pretrained models

cd HRNet-Object-Detection
# Download pretrained models into this folder
mkdir hrnetv2_pretrained

Datasets

Please download the COCO dataset from cocodataset. If you use the zip format, specify CocoZipDataset in the config files; if you unzip the downloaded dataset, specify CocoDataset.
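Reading images straight from the archive avoids unpacking the dataset. The sketch below shows the general idea with Python's standard zipfile module; it is not the repository's CocoZipDataset, and the helper name, the member path, and the in-memory demo archive are hypothetical.

```python
import io
import zipfile


def read_member(archive: zipfile.ZipFile, name: str) -> bytes:
    """Return the raw bytes of one archive member without extracting it to disk."""
    with archive.open(name) as fh:
        return fh.read()


# Build a tiny in-memory archive so the example runs without the real dataset.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("train2017/000000000001.jpg", b"fake-jpeg-bytes")

# Reopen the same buffer for reading and fetch one member by name.
with zipfile.ZipFile(buffer) as zf:
    names = zf.namelist()
    payload = read_member(zf, names[0])

print(names, len(payload))
```

In a real loader the returned bytes would then be decoded into an image array by whatever image library the pipeline uses.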

Train (multi-gpu training)

Please specify the configuration file in configs (learning rate should be adjusted when the number of GPUs is changed).

python -m torch.distributed.launch --nproc_per_node <GPUS NUM> tools/train.py <CONFIG-FILE> --launcher pytorch
# example:
python -m torch.distributed.launch --nproc_per_node 4 tools/train.py configs/hrnet/faster_rcnn_hrnetv2p_w18_1x.py --launcher pytorch
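One common way to adjust the learning rate for a different GPU count is the linear scaling rule: keep the learning rate proportional to the total batch size. The sketch below assumes the mmdetection convention of lr=0.02 for 8 GPUs with 2 images per GPU (batch size 16); the configs in this repository may use different reference values, so treat the defaults and the function name as assumptions.

```python
def scaled_lr(num_gpus: int, imgs_per_gpu: int,
              base_lr: float = 0.02, base_batch: int = 16) -> float:
    """Scale the learning rate linearly with the total batch size.

    base_lr and base_batch are assumed reference values, not read from a config.
    """
    total_batch = num_gpus * imgs_per_gpu
    return base_lr * total_batch / base_batch


print(scaled_lr(num_gpus=4, imgs_per_gpu=2))  # half the reference batch
print(scaled_lr(num_gpus=8, imgs_per_gpu=2))  # the reference setting itself
```

The resulting value would go into the optimizer's lr entry of the chosen config file before launching training.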

Test

python tools/test.py <CONFIG-FILE> <MODEL WEIGHT> --gpus <GPUS NUM> --eval bbox --out result.pkl
# example:
python tools/test.py configs/hrnet/faster_rcnn_hrnetv2p_w18_1x.py work_dirs/faster_rcnn_hrnetv2p_w18_1x/model_final.pth --gpus 4 --eval bbox --out result.pkl

NOTE: If you run into problems, you may find a solution in the issues of the official mmdetection repo, or you can submit a new issue in our repo.

Other applications of HRNets (codes and models):

Citation

If you find this work or code helpful in your research, please cite:

@inproceedings{SunXLW19,
  title={Deep High-Resolution Representation Learning for Human Pose Estimation},
  author={Ke Sun and Bin Xiao and Dong Liu and Jingdong Wang},
  booktitle={CVPR},
  year={2019}
}

@article{WangSCJDZLMTWLX19,
  title={Deep High-Resolution Representation Learning for Visual Recognition},
  author={Jingdong Wang and Ke Sun and Tianheng Cheng and 
          Borui Jiang and Chaorui Deng and Yang Zhao and Dong Liu and Yadong Mu and 
          Mingkui Tan and Xinggang Wang and Wenyu Liu and Bin Xiao},
  journal   = {CoRR},
  volume    = {abs/1908.07919},
  year={2019}
}

Reference

[1] Deep High-Resolution Representation Learning for Human Pose Estimation. Ke Sun, Bin Xiao, Dong Liu, and Jingdong Wang. CVPR 2019.

[2] Cascade R-CNN: Delving into High Quality Object Detection. Zhaowei Cai and Nuno Vasconcelos. CVPR 2018.

Acknowledgement

Thanks @open-mmlab for providing the easy-to-use code and kind help!
