afb-urr's Introduction

Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement

This repository is the official implementation of Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement (NeurIPS 2020). It is designed for the semi-supervised video object segmentation (VOS) task.

[NeurIPS Page] [Paper] [Supplementary]

Paper corrections: The feature maps generated by the encoders have 1024 channels and are 1/16 of the original image size.
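As a quick sanity check of the corrected numbers, the sketch below computes the encoder output shape for one frame. It assumes the frame is padded up to a multiple of 16 before encoding, so partial cells are rounded up; the actual code may round differently. The function name `feature_map_shape` is ours, not part of the repository.

```python
import math

ENCODER_CHANNELS = 1024  # channel count stated in the correction above
ENCODER_STRIDE = 16      # feature map is 1/16 of the input size


def feature_map_shape(height, width):
    """Return (channels, h, w) of the encoder output for one frame.

    Assumption: the frame is padded up to a multiple of the stride,
    so fractional cells are rounded up.
    """
    return (
        ENCODER_CHANNELS,
        math.ceil(height / ENCODER_STRIDE),
        math.ceil(width / ENCODER_STRIDE),
    )


# A 480p DAVIS frame is 480 x 854 pixels.
print(feature_map_shape(480, 854))  # (1024, 30, 54)
```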

1. Requirements

We built and tested the repository with Python 3.6.9 on Ubuntu 18.04, using one NVIDIA 1080Ti card (11 GB of memory). Running on Windows or macOS should be possible with minor modifications. An NVIDIA GPU and a CUDA environment are required. To install the requirements, run:

pip3 install -r requirements.txt

Install the torch_scatter package by following its official instructions. We used version 2.0.4.
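After installation, a short script like the following can confirm that the key packages import and that CUDA is visible. This is a minimal sketch, not part of the repository; the helper name `check_environment` is hypothetical, and it reports missing packages instead of failing.

```python
import importlib
import sys


def check_environment(modules=("torch", "torch_scatter")):
    """Map each module name to its version string, or None if it cannot be imported."""
    report = {"python": sys.version.split()[0]}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except (ImportError, OSError):
            # OSError covers compiled extensions built against a mismatched CUDA.
            report[name] = None
            continue
        report[name] = getattr(module, "__version__", "unknown")

    # Only query CUDA when torch itself was imported successfully.
    if report.get("torch") is not None:
        import torch
        report["cuda_available"] = torch.cuda.is_available()
    return report


for key, value in check_environment().items():
    print(f"{key}: {value}")
```

A value of None for torch_scatter usually means the wheel does not match the installed torch and CUDA versions.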

2. Evaluation

DAVIS17-TrainVal

  1. Download and extract DAVIS17-TrainVal dataset.
  2. Download the pretrained DAVIS17 checkpoint.
  3. Run:
python3 eval.py --level 1 --resume /path/to/checkpoint.pth --dataset /path/to/dir/

To reproduce the segmentation scores, you can use the official evaluation tool from the DAVIS benchmark.
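For orientation, the DAVIS benchmark reports region similarity J (the Jaccard index, i.e. intersection over union) together with a boundary measure F. The sketch below illustrates J only, on plain nested lists; it is not the official tool and should not be used to report numbers. Treating two empty masks as a perfect match is an assumption of this sketch.

```python
def region_similarity(pred, gt):
    """Jaccard index (IoU) between two binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                intersection += 1
            if p or g:
                union += 1
    # Assumption: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union


pred = [[0, 1, 1],
        [0, 1, 0]]
gt = [[0, 1, 0],
      [0, 1, 1]]
print(region_similarity(pred, gt))  # 0.5 (2 shared pixels out of 4 covered)
```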

YouTube-VOS18

  1. Download and extract YouTube-VOS18 dataset.
  2. Download the pretrained YouTube-VOS18 checkpoint.
  3. Run:
python3 eval.py --level 2 --resume /path/to/checkpoint.pth --dataset /path/to/dir/ --update-rate 0.05

Attention: Directly submitting our results to the YouTube-VOS CodaLab for evaluation will pollute the leaderboard. We encourage you to submit your own results.

Long Videos

  1. Download and extract the Long Videos dataset.
  2. Download the pretrained YouTube-VOS18 checkpoint above.
  3. Run:
python3 eval.py --level 3 --resume /path/to/checkpoint.pth --dataset /path/to/dir/ --update-rate 0.05

To reproduce the segmentation scores, you can use the same tool from the DAVIS benchmark.

Your Own Video

Prepare your video frames and the first-frame annotation following the data structure shown on the Long Videos page. You can inspect the data structure there without downloading the dataset, and you only need to provide the annotation for the first frame.

Run with the same parameters as the Long Videos setting.
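Before running on your own data, it can help to verify that every video has an annotation. The sketch below assumes a DAVIS-like layout, with frames under `JPEGImages/<video>/` and masks under `Annotations/<video>/`; that layout and the helper name are assumptions on our part, so check the Long Videos page for the authoritative structure.

```python
from pathlib import Path


def videos_missing_first_annotation(root):
    """List video names under root/JPEGImages that have no PNG mask in root/Annotations.

    Assumption: DAVIS-like layout (JPEGImages/<video>/, Annotations/<video>/).
    """
    root = Path(root)
    frames_dir = root / "JPEGImages"
    if not frames_dir.is_dir():
        raise FileNotFoundError(f"expected a frame folder at {frames_dir}")

    missing = []
    for video_dir in sorted(p for p in frames_dir.iterdir() if p.is_dir()):
        annotation_dir = root / "Annotations" / video_dir.name
        has_mask = annotation_dir.is_dir() and any(annotation_dir.glob("*.png"))
        if not has_mask:
            missing.append(video_dir.name)
    return missing


# Usage (path is a placeholder):
# print(videos_missing_first_annotation("/path/to/dir/"))
```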

Options for Evaluation

  1. --gpu: GPU id to run (default: 0).
  2. --viz: Enable output overlays along with the estimated masks (default: False).
  3. --budget: The number of features that can be stored in total (default: 300000 for 1080Ti).

By default, the segmentation results will be saved in ./output.
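When evaluating several datasets in a row, the commands above can be assembled programmatically. The following is a minimal sketch that uses only the flags documented in this section; the helper `build_eval_command` is hypothetical and not part of the repository.

```python
import shlex


def build_eval_command(level, checkpoint, dataset, update_rate=None, gpu=0, budget=None):
    """Assemble an eval.py invocation as an argument list for subprocess.run."""
    cmd = [
        "python3", "eval.py",
        "--level", str(level),
        "--resume", str(checkpoint),
        "--dataset", str(dataset),
        "--gpu", str(gpu),
    ]
    if update_rate is not None:
        cmd += ["--update-rate", str(update_rate)]
    if budget is not None:
        cmd += ["--budget", str(budget)]
    return cmd


# YouTube-VOS18 setting from above, with the default 1080Ti budget made explicit.
cmd = build_eval_command(2, "/path/to/checkpoint.pth", "/path/to/dir/",
                         update_rate=0.05, budget=300000)
print(" ".join(shlex.quote(part) for part in cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` from the repository root would launch the evaluation. A smaller `--budget` is the natural knob to try on GPUs with less memory than the 1080Ti.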

3. Training

Pre-training on Static Images

  1. Download the following datasets (COCO is the largest one). You don't have to download all of them; by default, our pre-training code skips datasets that don't exist.
  2. Run unify_pretrain_dataset.py to convert them into a uniform format (following the DAVIS format).
python3 unify_pretrain_dataset.py --name NAME --src /path/to/dataset/dir/ --dst /path/to/output
  1. MSRA10K: --name MSRA10K
  2. ECSSD: --name ECSSD
  3. PASCAL-S: --name PASCAl-s
  4. PASCAL VOC2012: --name PASCALVOC2012
  5. COCO: --name COCO. The pycocotools API is required.

You may need to make minor modifications to the dataset paths. Other useful options:

  1. --palette: Path to the palette image. We provide a template in assets/mask_palette.png, which follows the DAVIS17 format.
  2. --worker: The number of parallel threads used to speed up the conversion (default: 20).
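Since the conversion has to be repeated per dataset, a small driver can generate one command for each dataset that is actually present. This is a sketch under the assumption that each dataset was extracted to `<src_root>/<NAME>`, which is a hypothetical layout; the `--name` values are copied verbatim from the list above.

```python
from pathlib import Path

# --name values exactly as listed above.
DATASET_NAMES = ("MSRA10K", "ECSSD", "PASCAl-s", "PASCALVOC2012", "COCO")


def build_conversion_commands(src_root, dst, names=DATASET_NAMES):
    """Return one unify_pretrain_dataset.py command per dataset found on disk.

    Assumption: each dataset lives in src_root/<name>; missing ones are skipped.
    """
    src_root = Path(src_root)
    commands = []
    for name in names:
        src = src_root / name
        if not src.is_dir():
            continue  # dataset not downloaded, skip it
        commands.append([
            "python3", "unify_pretrain_dataset.py",
            "--name", name,
            "--src", str(src),
            "--dst", str(dst),
        ])
    return commands


# Usage (paths are placeholders):
# import subprocess
# for cmd in build_conversion_commands("/path/to/datasets", "/path/to/output"):
#     subprocess.run(cmd, check=True)
```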

After the conversion process, you can start pre-training the model:

python3 train.py --level 0 --dataset /path/to/pretrain/ --lr 1e-5 --scheduler-step 3 --total-epoch 12 --log

Pre-training may take days to weeks; you can download our checkpoint to save time.

Training on DAVIS17

Download the semi-supervised TrainVal 480p dataset from the DAVIS website, then run:

python3 train.py --level 1 --new --resume /path/to/PreTrain/checkpoint.pth --dataset /path/to/DAVIS17/ --lr 4e-6 --scheduler-step 200 --total-epoch 1000 --log

Training on YouTube-VOS

Download the training set of the YouTube-VOS dataset, then run:

python3 train.py --level 2 --new --resume /path/to/PreTrain/checkpoint.pth --dataset /path/to/YouTubeVOS/train --lr 4e-6 --scheduler-step 30 --total-epoch 150 --log

4. License

This repository is released for academic use only. If you want to use our code in commercial products, please contact [email protected] in advance. If you use our code, please cite our paper:

@inproceedings{NEURIPS2020_liangVOS,
 author = {Liang, Yongqing and Li, Xin and Jafari, Navid and Chen, Jim},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {H. Larochelle and M. Ranzato and R. Hadsell and M. F. Balcan and H. Lin},
 pages = {3430--3441},
 publisher = {Curran Associates, Inc.},
 title = {Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement},
 url = {https://proceedings.neurips.cc/paper/2020/file/234833147b97bb6aed53a8f4f1c7a7d8-Paper.pdf},
 volume = {33},
 year = {2020}
}
