Overview of our method. (a) Entire network structure. (b) Multi-frame attention with feature-level warping. The module consists of three steps: forward warping, multi-frame attention, and backward warping. Forward warping aligns the features of neighboring frames to frame $T$, and multi-frame attention aggregates temporal context. Since the aggregated feature maps are aligned to frame $T$, backward warping then warps them back to their original positions.
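The following is a minimal PyTorch sketch of that pipeline, not the repository's actual implementation: it assumes dense per-frame flow fields are given as inputs (flows_to_t and flows_from_t are illustrative names), and it approximates both warping steps with bilinear sampling via grid_sample.

import torch
import torch.nn as nn
import torch.nn.functional as F

def warp(feat, flow):
    # Warp a (B, C, H, W) feature map with a (B, 2, H, W) flow field
    # (pixel offsets) via bilinear sampling.
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(feat.device)      # (2, H, W)
    coords = base.unsqueeze(0) + flow                                 # (B, 2, H, W)
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                              # (B, H, W, 2)
    return F.grid_sample(feat, grid, align_corners=True)

class MultiFrameAttention(nn.Module):
    # Align neighboring-frame features to frame T, aggregate them with
    # temporal attention, then warp the result back to each frame.
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, feats, flows_to_t, flows_from_t):
        # feats: list of N (B, C, H, W) feature maps, one per frame.
        b, c, h, w = feats[0].shape
        n = len(feats)
        aligned = [warp(f, fl) for f, fl in zip(feats, flows_to_t)]   # forward warping
        tokens = torch.stack(aligned, dim=1).flatten(3)               # (B, N, C, HW)
        tokens = tokens.permute(0, 3, 1, 2).reshape(b * h * w, n, c)
        out, _ = self.attn(tokens, tokens, tokens)                    # temporal attention
        out = out.reshape(b, h * w, n, c).permute(0, 2, 3, 1).reshape(b, n, c, h, w)
        # Backward warping: return features to their original positions.
        return [warp(out[:, i], fl) for i, fl in enumerate(flows_from_t)]

Note that the attention here runs per pixel over the temporal axis, so the sequence length is only the number of frames N.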
This repository provides the official release of the code package for our paper Multi-Frame Attention with Feature-Level Warping for Drone Crowd Tracking in WACV 2023 (URL coming soon).
This method is designed to better address drone crowd tracking by efficiently aggregating information from multiple frames. Compared with seven conventional methods, the proposed method improves the tracking and localization mAP over the backbone score.
- PyTorch 1.12.0
- CUDA && cuDNN
- Download the DroneCrowd dataset: DroneCrowd BaiduYun (code: ml1u) | GoogleDrive
We strongly recommend using a virtual environment such as Anaconda or Docker. The following shows how to build the virtual environment for this code using Anaconda.
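For example, you can first create and activate an environment (the environment name and Python version below are assumptions, not prescribed by this repository):

$ conda create -n mfa-dronecrowd python=3.8
$ conda activate mfa-dronecrowd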
$ pip install -r env/requirements.py
The full version of DroneCrowd consists of 112 video clips with 33,600 high-resolution frames (1920x1080) captured in 70 different scenarios. The dataset provides 20,800 person trajectories with 4.8 million head annotations and several video-level attributes per sequence.
DroneCrowd BaiduYun (code: ml1u) | GoogleDrive
Please arrange the data downloaded above as follows.
current dir
./dataset
├── train
│   └── train_imgs
│       ├── sequence001   # Each sequence has 300 images
│       ├── sequence002
│       ├── :
│       └── sequenceN
├── val
│   └── val_imgs          # Same structure as train_imgs
│       ├── sequence011   # Each sequence has 12 images
│       ├── sequence015
│       ├── :
│       └── sequenceM
└── test
    └── test_imgs         # Same structure as train_imgs
        ├── sequence011   # Each sequence has 300 images
        ├── sequence015
        ├── :
        └── sequenceM
Before using this code, run the following command to create the heatmap ground truth. The training and validation ground truth is added to the dataset directory; the train_map and val_map directories are created automatically.
$ python create_gts.py
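As a rough illustration of what the generated heatmap ground truth looks like, the sketch below renders head-point annotations as Gaussian peaks (the function name and sigma value are assumptions, not the actual interface of create_gts.py):

import numpy as np

def points_to_heatmap(points, height, width, sigma=3.0):
    # Render (x, y) head annotations as a Gaussian heatmap; sigma is an assumed value.
    heatmap = np.zeros((height, width), dtype=np.float32)
    ys, xs = np.mgrid[0:height, 0:width]
    for x, y in points:
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        heatmap = np.maximum(heatmap, g.astype(np.float32))  # keep the per-pixel peak
    return heatmap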
- Prepare the training set in the dataset directory (example datasets are included). The default dataset path is set in config/train.yaml.
- Run the training script:
$ python train.py
You can set the input path, output path, and parameters in config/train.yaml.
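For illustration only, a config of this kind might contain entries like the following; the actual keys are defined by config/train.yaml in this repository:

# Hypothetical sketch -- see config/train.yaml for the real keys.
dataset_dir: ./dataset   # input path
output_dir: ./output     # output path
batch_size: 4            # training parameters
epochs: 100
lr: 1.0e-4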
Coming soon
The trained models are available in the folder /models/trained.
Please cite the following papers if you use this code in your work.
@inproceedings{asanomi2023multi,
title={Multi-Frame Attention with Feature-Level Warping for Drone Crowd Tracking},
author={Asanomi, Takanori and Nishimura, Kazuya and Bise, Ryoma},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
pages={1664--1673},
year={2023}
}
@inproceedings{dronecrowd_cvpr2021,
  title={Detection, Tracking, and Counting Meets Drones in Crowds: A Benchmark},
  author={Wen, Longyin and Du, Dawei and Zhu, Pengfei and Hu, Qinghua and Wang, Qilong and Bo, Liefeng and Lyu, Siwei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2021}
}