Detection in Crowded Scenes: One Proposal, Multiple Predictions

This is the PyTorch implementation of our paper "Detection in Crowded Scenes: One Proposal, Multiple Predictions" (https://arxiv.org/abs/2003.09163), published at CVPR 2020.

Our method aims to detect highly overlapped instances in crowded scenes.

The key to our approach is to let each proposal predict a set of instances that might be highly overlapped, rather than a single instance as in previous proposal-based frameworks. With this scheme, nearby proposals are expected to infer the same set of instances instead of having to distinguish individuals, which is much easier to learn. Equipped with new techniques such as the EMD loss and Set NMS, our detector can effectively handle the difficulty of detecting highly overlapped objects.
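The two techniques named above can be illustrated in a few lines of plain Python. This is a minimal sketch, not the repository's implementation. It assumes the following:

- Each proposal outputs K predictions as (class_probs, box) pairs.
- The ground-truth set is padded to size K with background entries (label 0), which get no regression loss.
- Cross-entropy plus smooth L1 is used as the per-pair loss.
- Boxes are [x1, y1, x2, y2].

All function names are hypothetical. The EMD loss takes the minimum loss over all one-to-one assignments between predictions and targets. Set NMS is ordinary greedy NMS that skips suppression whenever two boxes come from the same proposal.

```python
import itertools
import math


def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 distance summed over the box coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total


def emd_loss(preds, targets):
    """EMD-style set loss for one proposal (hypothetical helper).

    preds:   K predictions, each (class_probs, box).
    targets: K ground-truth entries, each (label, box); label 0 is background,
             used to pad the set when fewer than K objects overlap the proposal.
    Returns the loss of the best one-to-one assignment over all K! permutations.
    """
    best = math.inf
    for perm in itertools.permutations(range(len(targets))):
        loss = 0.0
        for k, j in enumerate(perm):
            probs, box = preds[k]
            label, gt_box = targets[j]
            loss += -math.log(max(probs[label], 1e-12))  # cross-entropy term
            if label > 0:  # regress boxes only for foreground targets
                loss += smooth_l1(box, gt_box)
        best = min(best, loss)
    return best


def iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def set_nms(dets, iou_thresh=0.5):
    """Greedy NMS in which boxes from the same proposal never suppress each other.

    dets: list of (box, score, proposal_id). Returns indices of kept detections.
    """
    order = sorted(range(len(dets)), key=lambda i: dets[i][1], reverse=True)
    suppressed = [False] * len(dets)
    keep = []
    for pos, i in enumerate(order):
        if suppressed[i]:
            continue
        keep.append(i)
        for j in order[pos + 1:]:
            # Same proposal: these boxes are meant to be distinct instances.
            if suppressed[j] or dets[j][2] == dets[i][2]:
                continue
            if iou(dets[i][0], dets[j][0]) > iou_thresh:
                suppressed[j] = True
    return keep
```

With K = 2 the permutation search has only two candidates, so the brute-force minimum is cheap. A larger K would call for a proper assignment solver. In `set_nms`, two heavily overlapping boxes survive together only when they share a proposal_id. Otherwise the lower-scoring one is suppressed as in standard NMS.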

The network structure and results are shown here:

Citation

If you use the code in your research, please cite:

@InProceedings{Chu_2020_CVPR,
author = {Chu, Xuangeng and Zheng, Anlin and Zhang, Xiangyu and Sun, Jian},
title = {Detection in Crowded Scenes: One Proposal, Multiple Predictions},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}
}

Run

  1. Set up the environment with Docker:
    • Build the Docker image:
    sudo docker build . -t crowddet
    • Run the Docker image:
    sudo docker run --gpus all --shm-size=8g -it --rm crowddet
  2. CrowdHuman data:

    • CrowdHuman is a benchmark dataset for evaluating detectors in crowded scenes. It can be downloaded from http://www.crowdhuman.org/. The dataset path is set in config.py.
  3. Steps to run:

    • Step 1: training. More training and testing settings can be configured in config.py.
    cd tools
    python3 train.py -md rcnn_fpn_baseline
    
    • Step 2: testing. If you have four GPUs, pass -d 0-3 to use all of them. The resulting JSON file is evaluated automatically.
    cd tools
    python3 test.py -md rcnn_fpn_baseline -r 40
    
    • Step 3: evaluate a result JSON file, run inference on a single image, and visualize a JSON file. -r is the epoch to resume from, and -n is the number of images to visualize.
    cd tools
    python3 eval_json.py -f your_json_path.json
    python3 inference.py -md rcnn_fpn_baseline -r 40 -i your_image_path.png 
    python3 visulize_json.py -f your_json_path.json -n 3
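The data step above only says that the dataset path is set in config.py. If you want to inspect the annotations directly, the sketch below reads CrowdHuman's .odgt files. It is not the repository's loader and assumes the published CrowdHuman layout:

- Each file holds one JSON object per line.
- Each object has a "gtboxes" list.
- Each entry carries a "tag" ("person", or "mask" for ignore regions), a full-body "fbox" as [x, y, w, h], and optionally "extra": {"ignore": 1}.

The helper names are hypothetical.

```python
import json


def parse_record(line):
    """Parse one line of a CrowdHuman .odgt file (one JSON object per line)."""
    return json.loads(line)


def load_odgt(path):
    """Read every non-empty line of an .odgt annotation file."""
    with open(path, encoding="utf-8") as f:
        return [parse_record(line) for line in f if line.strip()]


def full_boxes(record):
    """Split an image's full-body boxes into (kept, ignored), as [x1, y1, x2, y2].

    Assumes 'fbox' is [x, y, w, h]. Entries whose tag is not 'person'
    (e.g. 'mask' regions) or whose extra.ignore flag is 1 are treated as ignored.
    """
    kept, ignored = [], []
    for gt in record.get("gtboxes", []):
        x, y, w, h = gt["fbox"]
        box = [x, y, x + w, y + h]
        is_ignored = gt.get("tag") != "person" or gt.get("extra", {}).get("ignore", 0) == 1
        (ignored if is_ignored else kept).append(box)
    return kept, ignored
```

For example, `records = load_odgt("annotation_train.odgt")` followed by `full_boxes(records[0])` gives the first image's boxes. The file name here is an assumption about where you unpacked the dataset.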
    

Models

We used MegEngine for the research (https://github.com/megvii-model/CrowdDetection); this project is a re-implementation in PyTorch.

We use a pre-trained model from the MegEngine Model Hub and convert it to PyTorch. You can get this model from here. These models can also be downloaded from Baidu Netdisk (code: yx46).

Model     Top-1 acc (%)  Top-5 acc (%)
ResNet50  76.254         93.056

All models are based on ResNet-50 FPN.

Method                                        AP      MR      JI      Model file
RCNN FPN Baseline (converted from MegEngine)  0.8718  0.4239  0.7949  rcnn_fpn_baseline_mge.pth
RCNN EMD Simple (converted from MegEngine)    0.9052  0.4196  0.8209  rcnn_emd_simple_mge.pth
RCNN EMD with RM (converted from MegEngine)   0.9097  0.4102  0.8271  rcnn_emd_refine_mge.pth
RCNN FPN Baseline (trained with PyTorch)      0.8665  0.4243  0.7949  rcnn_fpn_baseline.pth
RCNN EMD Simple (trained with PyTorch)        0.8997  0.4167  0.8225  rcnn_emd_simple.pth
RCNN EMD with RM (trained with PyTorch)       0.9030  0.4128  0.8263  rcnn_emd_refine.pth
RetinaNet FPN Baseline                        0.8188  0.5644  0.7316  retina_fpn_baseline.pth
RetinaNet EMD Simple                          0.8292  0.5481  0.7393  retina_emd_simple.pth

AP: average precision (higher is better). MR: log-average miss rate, MR^-2 (lower is better). JI: Jaccard index (higher is better).
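The JI column measures how well the set of detections in an image matches the set of ground-truth boxes. The sketch below computes a per-image version as matched / (|D| + |G| - matched). It assumes boxes as [x1, y1, x2, y2], a match when IoU is above 0.5, and greedy one-to-one matching in descending score order. The function names are hypothetical, and the official evaluation may differ, for example in its matching strategy and in how it handles score thresholds. The sketch is not expected to reproduce the numbers in the table above.

```python
def iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def jaccard_index(dets, gts, iou_thresh=0.5):
    """Per-image Jaccard index: matched / (|D| + |G| - matched).

    dets: list of (box, score); gts: list of boxes.
    Matching is greedy and one-to-one in descending score order, which is
    a simplification of an optimal (bipartite) matching.
    """
    used = set()
    matched = 0
    for box, _score in sorted(dets, key=lambda d: d[1], reverse=True):
        best_j, best_iou = None, iou_thresh
        for j, gt in enumerate(gts):
            if j in used:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:  # must exceed the threshold to count
                best_j, best_iou = j, overlap
        if best_j is not None:
            used.add(best_j)
            matched += 1
    denom = len(dets) + len(gts) - matched
    # An image with no detections and no ground truth counts as a perfect match.
    return matched / denom if denom else 1.0
```

Every unmatched detection and every missed ground-truth box enlarges the denominator. JI therefore penalizes false positives and false negatives symmetrically, unlike AP or MR.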

Contact

If you have any questions, please do not hesitate to contact Xuangeng Chu ([email protected]).
