sgmg's Introduction

The official implementation of the ICCV 2023 paper:

Spectrum-guided Multi-granularity Referring Video Object Segmentation

Bo Miao, Mohammed Bennamoun, Yongsheng Gao, Ajmal Mian

ICCV 2023

Introduction

We propose a Spectrum-guided Multi-granularity (SgMg) approach that follows a segment-and-optimize pipeline to tackle the feature drift issue found in previous decode-and-segment approaches. Extensive experiments show that SgMg achieves state-of-the-art overall performance on multiple benchmark datasets, outperforming the closest competitor by 2.8 percentage points on Ref-YouTube-VOS while also running faster at inference.

Setup

The main setup of our code follows ReferFormer.

Please refer to install.md for installation.

Please refer to data.md for data preparation.

Training and Evaluation

All models are trained on two RTX 3090 GPUs. If you encounter an out-of-memory (OOM) error, please add the --use_checkpoint flag.

The training and evaluation scripts are included in the scripts folder. To train or evaluate SgMg, run the corresponding command:

sh dist_train_ytvos_videoswinb.sh
sh dist_test_ytvos_videoswinb.sh

Note: You can modify the --backbone and --backbone_pretrained arguments to specify a different backbone.

Model Zoo

We provide the pretrained models for the different visual backbones and the checkpoints for SgMg (see below).

Put the models in the checkpoints folder to start training or inference.

Results (Ref-YouTube-VOS & Ref-DAVIS)

To evaluate the results, please upload the zip file to the competition server.

| Backbone | Ref-YouTube-VOS J&F | Ref-DAVIS J&F | Model | Submission |
| --- | --- | --- | --- | --- |
| Video-Swin-T | 62.0 | 61.9 | model | link |
| Video-Swin-B | 65.7 | 63.3 | model | link |
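The upload step mentioned at the start of this section needs a zip archive of the predicted masks. The snippet below is a minimal sketch of one way to build it, not the repository's own tooling. The function name `zip_results`, the directory name `Annotations`, and the choice to keep that top-level folder inside the archive are all assumptions made for illustration; check the competition server's instructions for the layout it actually expects.

```python
import zipfile
from pathlib import Path


def zip_results(results_dir, zip_path):
    """Pack every file under results_dir into zip_path.

    Paths inside the archive are stored relative to the parent of
    results_dir, so the top-level folder name is kept (an assumption
    about the layout the server expects).
    Returns the number of files written.
    """
    results_dir = Path(results_dir).resolve()
    zip_path = Path(zip_path).resolve()
    count = 0
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(results_dir.rglob("*")):
            # Skip directories and the archive itself if it lives in results_dir.
            if path.is_file() and path != zip_path:
                arcname = path.relative_to(results_dir.parent).as_posix()
                zf.write(path, arcname)
                count += 1
    return count


if __name__ == "__main__":
    # Hypothetical paths; replace with wherever the test script wrote its masks.
    if Path("Annotations").is_dir():
        written = zip_results("Annotations", "submission.zip")
        print(f"wrote {written} files to submission.zip")
```

Sorting the paths makes the archive contents deterministic, which helps when comparing two submissions.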

Results (A2D-Sentences & JHMDB-Sentences)

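| Backbone | A2D mAP | A2D Mean IoU | A2D Overall IoU | JHMDB mAP | JHMDB Mean IoU | JHMDB Overall IoU | Model |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Video-Swin-T | 56.1 | 78.0 | 70.4 | 44.4 | 72.8 | 71.7 | model |
| Video-Swin-B | 58.5 | 79.9 | 72.0 | 45.0 | 73.7 | 72.5 | model |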

Results (RefCOCO/+/g)

The overall IoU is used as the metric, and the model is obtained from the pre-training stage mentioned in the paper.

| Backbone | RefCOCO | RefCOCO+ | RefCOCOg | Model |
| --- | --- | --- | --- | --- |
| Video-Swin-B | 76.3 | 66.4 | 70.0 | model |
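For readers unfamiliar with the two IoU variants reported in the tables above, the sketch below shows how they are commonly defined: overall IoU divides the total intersection by the total union accumulated over all samples, while mean IoU averages the per-sample IoU. This is an illustration of the standard definitions, not the repository's evaluation code. The function names and the convention that two empty masks count as IoU 1.0 are assumptions.

```python
def intersection_and_union(pred, target):
    """Count intersecting and union pixels of two binary masks (2D lists of 0/1)."""
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    return inter, union


def overall_and_mean_iou(preds, targets):
    """Return (overall IoU, mean IoU) over paired lists of binary masks."""
    total_inter = 0
    total_union = 0
    per_sample = []
    for pred, target in zip(preds, targets):
        inter, union = intersection_and_union(pred, target)
        total_inter += inter
        total_union += union
        # Assumption: an empty prediction against an empty target scores 1.0.
        per_sample.append(inter / union if union else 1.0)
    overall = total_inter / total_union if total_union else 1.0
    mean = sum(per_sample) / len(per_sample)
    return overall, mean


if __name__ == "__main__":
    preds = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
    targets = [[[1, 0], [0, 0]], [[1, 0], [0, 0]]]
    overall, mean = overall_and_mean_iou(preds, targets)
    print(f"overall IoU = {overall:.3f}, mean IoU = {mean:.3f}")
```

Because overall IoU pools pixels across samples, large objects weigh more heavily in it than in mean IoU, which is why the two numbers can differ noticeably on the same predictions.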

Acknowledgements

Citation

@InProceedings{Miao_2023_ICCV,
    author    = {Miao, Bo and Bennamoun, Mohammed and Gao, Yongsheng and Mian, Ajmal},
    title     = {Spectrum-guided Multi-granularity Referring Video Object Segmentation},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {920-930}
}

Contact

If you have any questions about this project, please feel free to contact [email protected].

sgmg's Issues

The performance of Video Swin Base without pretraining

Thank you for sharing such excellent work. Have you tested the Video Swin Transformer Base backbone on the Ref-YouTube-VOS dataset without pretraining on RefCOCO? The results I obtained using your code are similar to those with Video Swin Tiny.

I'm unsure of the cause. There may be some bugs, or the Ref-YouTube-VOS dataset might be too small to fine-tune the Video Swin Transformer Base effectively.

Thank you for your attention!

Training time

Hi, thanks for your great work. I would like to ask about the training time for both pretraining and fine-tuning. In ReferFormer, pretraining takes 2 days on 32 V100 GPUs. How about SgMg?

Pretrained model

Hello,

Thanks for sharing your excellent work and code!
Can you share the VideoSwin-Tiny model pretrained on the RefCOCO datasets?

Thank you!

How to visualize Figure 7?

Hi, bo-miao.

Thank you for such a great job. 🎉🎉

I am very curious about how the heatmap in Figure 7 was visualized. Could you share the related code? It would be a great help in understanding the working mechanism of SgMg, and RVOS more broadly.

Looking forward to your reply~
