
SIGMA: Semantic-complete Graph Matching for Domain Adaptive Object Detection (CVPR'22)

[Arxiv] [Zhihu]

By Wuyang Li

You are also welcome to check out our previous work SCAN (AAAI'22 Oral), which is the foundation of this work.

Installation

Check INSTALL.md for installation instructions.

If you run into any installation problem, feel free to open an issue with a screenshot. Thanks.

Data preparation

Step 1: Prepare the three benchmark datasets. (BDD100k is also supported.)

We follow EPM to construct the training and testing sets for the following three settings:

  • Cityscapes -> Foggy Cityscapes
    • Download the Cityscapes and Foggy Cityscapes datasets from the link. Specifically, we use leftImg8bit_trainvaltest.zip for Cityscapes and leftImg8bit_trainvaltest_foggy.zip for Foggy Cityscapes.
    • Download and extract the converted annotations from the following links: Cityscapes and Foggy Cityscapes (COCO format).
    • Extract the training set from leftImg8bit_trainvaltest.zip, then move the folder leftImg8bit/train/ to the Cityscapes/leftImg8bit/ directory.
    • Extract the training and validation sets from leftImg8bit_trainvaltest_foggy.zip, then move the folders leftImg8bit_foggy/train/ and leftImg8bit_foggy/val/ to the Cityscapes/leftImg8bit_foggy/ directory.
  • Sim10k -> Cityscapes (class car only)
    • Download the Sim10k and Cityscapes datasets from the following links: Sim10k and Cityscapes. Specifically, we use repro_10k_images.tgz and repro_10k_annotations.tgz for Sim10k and leftImg8bit_trainvaltest.zip for Cityscapes.
    • Download and extract the converted annotations from the following links: Sim10k (VOC format) and Cityscapes (COCO format).
    • Extract the training set from repro_10k_images.tgz and repro_10k_annotations.tgz, then move all images under VOC2012/JPEGImages/ to the Sim10k/JPEGImages/ directory and all annotations under VOC2012/Annotations/ to Sim10k/Annotations/.
    • Extract the training and validation sets from leftImg8bit_trainvaltest.zip, then move the folders leftImg8bit/train/ and leftImg8bit/val/ to the Cityscapes/leftImg8bit/ directory.
  • KITTI -> Cityscapes (class car only)
    • Download the KITTI and Cityscapes datasets from the following links: KITTI and Cityscapes. Specifically, we use data_object_image_2.zip for KITTI and leftImg8bit_trainvaltest.zip for Cityscapes.
    • Download and extract the converted annotations from the following links: KITTI (VOC format) and Cityscapes (COCO format).
    • Extract the training set from data_object_image_2.zip, then move all images under training/image_2/ to the KITTI/JPEGImages/ directory.
    • Extract the training and validation sets from leftImg8bit_trainvaltest.zip, then move the folders leftImg8bit/train/ and leftImg8bit/val/ to the Cityscapes/leftImg8bit/ directory.

After these steps, your dataset root should be organized as follows:
[DATASET_PATH]
└─ Cityscapes
   └─ cocoAnnotations
   └─ leftImg8bit
      └─ train
      └─ val
   └─ leftImg8bit_foggy
      └─ train
      └─ val
└─ KITTI
   └─ Annotations
   └─ ImageSets
   └─ JPEGImages
└─ Sim10k
   └─ Annotations
   └─ ImageSets
   └─ JPEGImages

Step 2: Change the data root to your dataset path in paths_catalog.py.

DATA_DIR = [$Your dataset root]
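
For reference, a minimal sketch of this change is shown below. The DatasetCatalog layout is assumed from the FCOS/maskrcnn-benchmark codebase that this project builds on, and the path and dictionary keys are illustrative, so check the actual paths_catalog.py in this repository:

# paths_catalog.py (sketch; layout assumed from FCOS/maskrcnn-benchmark)
class DatasetCatalog(object):
    DATA_DIR = "/home/user/datasets"  # <- set this to your [DATASET_PATH]

    DATASETS = {
        # Each entry maps a dataset name to its image/annotation folders
        # relative to DATA_DIR, e.g. the Cityscapes and Foggy Cityscapes
        # folders prepared in Step 1 (keys here are illustrative).
    }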

Tutorials for this project

  1. We provide very detailed code comments in sigma_vgg16_cityscapace_to_foggy.yaml.
  2. We modify the trainer to meet the requirements of SIGMA.
  3. GM is integrated in this "middle layer": graph_matching_head (a conceptual sketch is given after this list).
  4. Node sampling is conducted together with the FCOS loss: loss.
  5. We preserve many APIs for different implementation choices in defaults.
  6. We hope this work can inspire lots of good ideas.
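
To make item 3 concrete, here is a purely conceptual sketch of where such a graph-matching "middle layer" sits between the backbone and the FCOS head during adaptation; the class and method names are illustrative and do not match the actual signatures in graph_matching_head:

import torch.nn as nn

class DetectorWithMiddleHead(nn.Module):
    """Conceptual wiring only; not the repository's actual classes."""

    def __init__(self, backbone, middle_head, fcos_head):
        super().__init__()
        self.backbone = backbone        # e.g. VGG16 / ResNet-50 feature extractor
        self.middle_head = middle_head  # graph-based semantic completion + matching
        self.fcos_head = fcos_head      # anchor-free detection head (FCOS)

    def forward(self, source_images, target_images, source_targets):
        src_feats = self.backbone(source_images)
        tgt_feats = self.backbone(target_images)
        # The middle layer samples nodes from both domains, completes missing
        # semantics, and returns refined features plus graph-related losses.
        src_feats, tgt_feats, graph_losses = self.middle_head(src_feats, tgt_feats, source_targets)
        det_losses = self.fcos_head(src_feats, source_targets)
        return {**det_losses, **graph_losses}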

Well-trained models

We have provided lots of well-trained models at (onedrive).

  1. Kindly note that we can obtain higher results than the reported ones with carefully tuned hyperparameters.
  2. We didn't tune the hyperparameters for ResNet-50, so it could be improved further.
  3. We have tested C2F and S2F with end-to-end (e2e) training and achieved similar results. Our config files are for e2e training.
  4. After correcting a default hyperparameter, our S2C gives a 4 mAP gain over the reported result, as explained in the config file.
dataset                        | backbone | mAP  | mAP@50 | mAP@75 | file name
Cityscapes -> Foggy Cityscapes | VGG16    | 24.0 | 43.6   | 23.8   | city_to_foggy_vgg16_43.58_mAP.pth
Cityscapes -> Foggy Cityscapes | VGG16    | 24.3 | 43.9   | 22.6   | city_to_foggy_vgg16_43.90_mAP.pth
Cityscapes -> Foggy Cityscapes | Res50    | 22.7 | 44.3   | 21.2   | city_to_foggy_res50_44.26_mAP.pth
Cityscapes -> BDD100k          | VGG16    | -    | 32.7   | -      | city_to_bdd100k_vgg16_32.65_mAP.pth
Sim10k -> Cityscapes           | VGG16    | 33.4 | 57.1   | 33.8   | sim10k_to_city_vgg16_53.73_mAP.pth
KITTI -> Cityscapes            | VGG16    | 22.6 | 46.6   | 20.0   | kitti_to_city_vgg16_46.45_mAP.pth

Getting started

Train the model from scratch with the default setting (batch size = 4):

python tools/train_net_da.py \
        --config-file configs/SIGMA/xxx.yaml

Test the well-trained model:

python tools/test_net.py \
        --config-file configs/SIGMA/xxx.yaml \
        MODEL.WEIGHT well_trained_models/xxx.pth

# For example: test cityscapes to foggy cityscapes with ResNet50 backbone.

python tools/test_net.py \
         --config-file configs/SIGMA/sigma_res50_cityscapace_to_foggy.yaml \
         MODEL.WEIGHT well_trained_models/city_to_foggy_res50_44.26_mAP.pth

If you train the model from scratch with a limited batch size (batch size = 2), you may need to make some modifications for stable training:

  1. double your training iterations
  2. set MODEL.ADV.GA_DIS_LAMBDA 0.1 (see the example command below)
  3. carefully check that the node_loss decreases continually
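
For reference, the first two modifications can be appended to the training command as key-value overrides, in the same way MODEL.WEIGHT is passed to test_net.py above. SOLVER.IMS_PER_BATCH and SOLVER.MAX_ITER are assumed here from the FCOS/maskrcnn-benchmark config system this codebase builds on, so verify the exact keys in your config file and set MAX_ITER to roughly twice the default value:

python tools/train_net_da.py \
        --config-file configs/SIGMA/xxx.yaml \
        SOLVER.IMS_PER_BATCH 2 \
        SOLVER.MAX_ITER xxx \
        MODEL.ADV.GA_DIS_LAMBDA 0.1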

We provide the reproduced results for City to Foggy (VGG16, e2e, unfinished training) to help you check whether SIGMA works properly:

iterations | batchsize | LR (middle head) | mAP  | mAP@50 | mAP@75 | node_loss
2000       | 2         | 0.0025           | 6.8  | 17.5   | 3.4    | 0.3135
10000      | 2         | 0.0025           | 15.6 | 32.3   | 12.8   | 0.1291
20000      | 2         | 0.0025           | 20.0 | 37.9   | 18.8   | 0.0834
40000      | 2         | 0.0025           | 20.6 | 40.0   | 18.9   | 0.0415
50000      | 2         | 0.0025           | 22.3 | 42.1   | 20.5   | 0.0351

We don't recommend training with too small a batch size, since the cross-image graph cannot discover enough nodes within an image batch.

Citation

If you find this work helpful for your project, please consider giving it a star and a citation:

@inproceedings{li2022sigma,
  title={SIGMA: Semantic-complete Graph Matching for Domain Adaptive Object Detection},
  author={Li, Wuyang and Liu, Xinyu and Yuan, Yixuan},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2022}
}

Contact

E-mail: [email protected]

Acknowledgements

This work is based on SCAN (AAAI'22 ORAL) and EPM (ECCV20).

The implementation of our anchor-free detector is from FCOS.

Abstract

Domain Adaptive Object Detection (DAOD) leverages a labeled source domain to learn an object detector generalizing to a novel target domain free of annotations. Recent advances align class-conditional distributions through narrowing down cross-domain prototypes (class centers). Despite their great success, these works ignore the significant within-class variance and the domain-mismatched semantics within the training batch, leading to a sub-optimal adaptation. To overcome these challenges, we propose a novel SemantIc-complete Graph MAtching (SIGMA) framework for DAOD, which completes mismatched semantics and reformulates the adaptation with graph matching. Specifically, we design a Graph-embedded Semantic Completion module (GSC) that completes mismatched semantics through generating hallucination graph nodes in missing categories. Then, we establish cross-image graphs to model class-conditional distributions and learn a graph-guided memory bank for better semantic completion in turn. After representing the source and target data as graphs, we reformulate the adaptation as a graph matching problem, i.e., finding well-matched node pairs across graphs to reduce the domain gap, which is solved with a novel Bipartite Graph Matching adaptor (BGM). In a nutshell, we utilize graph nodes to establish semantic-aware node affinity and leverage graph edges as quadratic constraints in a structure-aware matching loss, achieving fine-grained adaptation with a node-to-node graph matching. Extensive experiments demonstrate that our method outperforms existing works significantly.
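
As a rough illustration of the node-to-node matching idea only (this is not the repository's implementation, and it omits the hallucination nodes and the edge-based quadratic constraints), one can compute a node affinity matrix between source and target node embeddings, solve a bipartite assignment with the Hungarian algorithm, and pull matched pairs together:

import torch
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment

def node_matching_loss(src_nodes, tgt_nodes):
    """src_nodes: (Ns, D) and tgt_nodes: (Nt, D) node embeddings from the two domains."""
    src = F.normalize(src_nodes, dim=1)
    tgt = F.normalize(tgt_nodes, dim=1)
    affinity = src @ tgt.t()  # semantic-aware node affinity
    # Hungarian assignment on the negated affinity yields one-to-one node pairs.
    row, col = linear_sum_assignment(-affinity.detach().cpu().numpy())
    row, col = torch.as_tensor(row), torch.as_tensor(col)
    # Encourage matched cross-domain pairs to agree (a first-order stand-in for
    # the structure-aware matching loss described above).
    return (1.0 - affinity[row, col]).mean()

# Toy usage with random node embeddings
print(node_matching_loss(torch.randn(8, 256), torch.randn(6, 256)))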

