Giter VIP home page Giter VIP logo

solider's Introduction

PWC PWC PWC PWC PWC PWC PWC PWC

Welcome to SOLIDER! SOLIDER is a Semantic Controllable Self-Supervised Learning Framework to learn general human representations from massive unlabeled human images which can benefit downstream human-centric tasks to the maximum extent. Unlike the existing self-supervised learning methods, prior knowledge from human images is utilized in SOLIDER to build pseudo semantic labels and import more semantic information into the learned representation. Meanwhile, different downstream tasks always require different ratios of semantic information and appearance information, and a single learned representation cannot fit for all requirements. To solve this problem, SOLIDER introduces a conditional network with a semantic controller, which can fit different needs of downstream tasks. For more details, please refer to our paper Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks.

Updates

  • [2023/04/24: Codes of attribute recognition task is released!] new
    • Training details of our pretrained model on downstream person attribute recognition task is released.
  • [2023/03/28: Codes of 3 downstream tasks are released!]
    • Training details of our pretrained model on 3 downstream human visual tasks, including person re-identification, person search and pedestrian detection, are released.
  • [2023/03/13: SOLIDER is accepted by CVPR2023!]
    • The paper of SOLIDER is accepted by CVPR2023, and its offical pytorch implementation is released in this repo.

Installation

This codebase has been developed with python version 3.7, PyTorch version 1.7.1, CUDA 10.1 and torchvision 0.8.2.

Datasets

We use LUPerson as our training data, which consists of unlabeled human images. Download LUPerson from its offical link and unzip it.

Training

  • Choice 1. To train SOLIDER from scratch, please run:
sh run_solider.sh
  • Choice 2. Training SOLIDER from scratch may take a long time. To speed up the training, you can train a DINO model first, and then finetune it with SOLIDER, as follows:
sh run_dino.sh
sh resume_solider.sh

Finetuning and Inference

There is a demo to run the trained SOLIDER model, which can be embedded into the inference or the downstream task finetuning.

python demo.py

Models

We use Swin-Transformer as our backbone, which shows great advantages on many CV tasks.

Task Dataset Swin Tiny
(Link)
Swin Small
(Link)
Swin Base
(Link)
Person Re-identification (mAP/R1)
w/o re-ranking
Market1501 91.6/96.1 93.3/96.6 93.9/96.9
MSMT17 67.4/85.9 76.9/90.8 77.1/90.7
Person Re-identification (mAP/R1)
with re-ranking
Market1501 95.3/96.6 95.4/96.4 95.6/96.7
MSMT17 81.5/89.2 86.5/91.7 86.5/91.7
Attribute Recognition (mA) PETA_ZS 74.37 76.21 76.43
RAP_ZS 74.23 75.95 76.42
PA100K 84.14 86.25 86.37
Person Search (mAP/R1) CUHK-SYSU 94.9/95.7 95.5/95.8 94.9/95.5
PRW 56.8/86.8 59.8/86.7 59.7/86.8
Pedestrian Detection (MR-2) CityPersons 10.3/40.8 10.0/39.2 9.7/39.4
Human Parsing (mIOU) LIP 57.52 60.21 60.50
Pose Estimation (AP/AR) COCO 74.4/79.6 76.3/81.3 76.6/81.5
  • All the models are trained on the whole LUPerson dataset.

Traning codes on Downstream Tasks

Acknowledgement

Our implementation is mainly based on the following codebases. We gratefully thank the authors for their wonderful works.

Reference

If you use SOLIDER in your research, please cite our work by using the following BibTeX entry:

@inproceedings{chen2023beyond,
  title={Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks},
  author={Weihua Chen and Xianzhe Xu and Jian Jia and Hao Luo and Yaohua Wang and Fan Wang and Rong Jin and Xiuyu Sun},
  booktitle={The IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2023},
}

solider's People

Contributors

ssl-solider avatar cwhgn avatar xiuyu-sxy avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.