
SOLIDER's Introduction


Welcome to SOLIDER! SOLIDER is a semantic controllable self-supervised learning framework that learns general human representations from massive unlabeled human images, which can benefit downstream human-centric tasks as much as possible. Unlike existing self-supervised learning methods, SOLIDER exploits prior knowledge from human images to build pseudo semantic labels and inject more semantic information into the learned representation. Meanwhile, different downstream tasks require different ratios of semantic and appearance information, and a single learned representation cannot satisfy all of them. To solve this problem, SOLIDER introduces a conditional network with a semantic controller, which can adapt the representation to the needs of each downstream task. For more details, please refer to our paper Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks.
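
To make the controller idea concrete, here is a minimal, hypothetical sketch of a conditional modulation layer driven by a scalar semantic weight. It is not the SOLIDER implementation; all names and shapes are illustrative only.

```python
import torch
import torch.nn as nn

# Illustrative sketch only (not the SOLIDER code): a scalar semantic weight
# in [0, 1] is mapped to per-channel scale/shift values that modulate
# backbone features, trading semantic vs. appearance content.
class SemanticController(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, 2 * channels),
        )

    def forward(self, feat: torch.Tensor, semantic_weight: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W); semantic_weight: (B, 1) with values in [0, 1]
        scale, shift = self.mlp(semantic_weight).chunk(2, dim=1)
        return feat * scale[:, :, None, None] + shift[:, :, None, None]


# Example: a high weight asks for a more semantic representation.
controller = SemanticController(channels=256)
features = torch.randn(2, 256, 24, 8)
weight = torch.full((2, 1), 0.8)
modulated = controller(features, weight)
```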

Updates

  • [2023/07/21: Code for the human pose task is released!] new
    • Training details of our pretrained model on the downstream human pose task are released.
  • [2023/05/15: Code for the human parsing task is released!] new
    • Training details of our pretrained model on the downstream human parsing task are released.
  • [2023/04/24: Code for the attribute recognition task is released!] new
    • Training details of our pretrained model on the downstream person attribute recognition task are released.
  • [2023/03/28: Code for 3 downstream tasks is released!]
    • Training details of our pretrained model on 3 downstream human visual tasks, including person re-identification, person search, and pedestrian detection, are released.
  • [2023/03/13: SOLIDER is accepted by CVPR2023!]
    • The SOLIDER paper is accepted by CVPR2023, and its official PyTorch implementation is released in this repo.

Installation

This codebase has been developed with Python 3.7, PyTorch 1.7.1, CUDA 10.1, and torchvision 0.8.2.

Datasets

We use LUPerson as our training data, which consists of unlabeled human images. Download LUPerson from its official link and unzip it.

Training

  • Choice 1. To train SOLIDER from scratch, please run:
sh run_solider.sh
  • Choice 2. Training SOLIDER from scratch may take a long time. To speed up training, you can train a DINO model first and then fine-tune it with SOLIDER, as follows:
sh run_dino.sh
sh resume_solider.sh

Finetuning and Inference

There is a demo to run the trained SOLIDER model; the same code can be embedded into inference pipelines or downstream-task fine-tuning.

python demo.py
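
A minimal sketch of how the trained backbone might be loaded for feature extraction, assuming a backbone factory in this repo's swin_transformer.py and a standard checkpoint layout; the constructor name, checkpoint keys, and output format are assumptions, so see demo.py for the exact API.

```python
import torch

# Hedged sketch, not a verbatim copy of demo.py.
from swin_transformer import swin_base_patch4_window7_224  # backbone assumed to be defined in this codebase

# semantic_weight in [0, 1] controls the semantic/appearance ratio of the
# representation (e.g. higher for parsing, lower for re-identification).
model = swin_base_patch4_window7_224(semantic_weight=0.8)

state = torch.load('solider_swin_base.pth', map_location='cpu')
state = state.get('teacher', state)        # self-supervised checkpoints often store a 'teacher' branch
model.load_state_dict(state, strict=False)
model.eval()

with torch.no_grad():
    images = torch.randn(4, 3, 224, 224)   # dummy batch; replace with real, normalized images
    outputs = model(images)                # see demo.py for the exact output format (feature vectors / maps)
```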

Models

We use Swin Transformer as our backbone, which shows strong performance on many computer vision tasks.

| Task | Dataset | Swin Tiny (Link) | Swin Small (Link) | Swin Base (Link) |
| --- | --- | --- | --- | --- |
| Person Re-identification (mAP/R1), w/o re-ranking | Market1501 | 91.6/96.1 | 93.3/96.6 | 93.9/96.9 |
| Person Re-identification (mAP/R1), w/o re-ranking | MSMT17 | 67.4/85.9 | 76.9/90.8 | 77.1/90.7 |
| Person Re-identification (mAP/R1), with re-ranking | Market1501 | 95.3/96.6 | 95.4/96.4 | 95.6/96.7 |
| Person Re-identification (mAP/R1), with re-ranking | MSMT17 | 81.5/89.2 | 86.5/91.7 | 86.5/91.7 |
| Attribute Recognition (mA) | PETA_ZS | 74.37 | 76.21 | 76.43 |
| Attribute Recognition (mA) | RAP_ZS | 74.23 | 75.95 | 76.42 |
| Attribute Recognition (mA) | PA100K | 84.14 | 86.25 | 86.37 |
| Person Search (mAP/R1) | CUHK-SYSU | 94.9/95.7 | 95.5/95.8 | 94.9/95.5 |
| Person Search (mAP/R1) | PRW | 56.8/86.8 | 59.8/86.7 | 59.7/86.8 |
| Pedestrian Detection (MR-2) | CityPersons | 10.3/40.8 | 10.0/39.2 | 9.7/39.4 |
| Human Parsing (mIOU) | LIP | 57.52 | 60.21 | 60.50 |
| Pose Estimation (AP/AR) | COCO | 74.4/79.6 | 76.3/81.3 | 76.6/81.5 |
  • All the models are trained on the whole LUPerson dataset.

Training Code for Downstream Tasks

Acknowledgement

Our implementation is mainly based on the following codebases. We gratefully thank the authors for their wonderful work.

Reference

If you use SOLIDER in your research, please cite our work by using the following BibTeX entry:

@inproceedings{chen2023beyond,
  title={Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks},
  author={Weihua Chen and Xianzhe Xu and Jian Jia and Hao Luo and Yaohua Wang and Fan Wang and Rong Jin and Xiuyu Sun},
  booktitle={The IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2023},
}

SOLIDER's People

Contributors

cwhgn, ssl-solider, xianzhexu, xiuyu-sxy


SOLIDER's Issues

What is the feature dimension for person ReID?

Were the feature embedding sizes defined here used for your person ReID experiments (link), i.e., 96 for Swin Tiny, 96 for Swin Small, and 128 for Swin Base? Or did you use higher-dimensional embeddings? It was not mentioned in the paper. Thanks.

mAP and rank values?

Hello, how can I see the mAP and rank values? These two metrics are not displayed in my training log. Thank you very much for your review!

Downstream task fine-tuning

When fine-tuning on a downstream task such as classification, do you fine-tune the whole model, or freeze the backbone and only fine-tune the classification head?
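
Not an official answer, but a minimal sketch of the two strategies being asked about, using generic PyTorch modules (all names are illustrative):

```python
import torch
import torch.nn as nn

# Illustrative only: choose between full fine-tuning and a frozen backbone
# with a trainable classification head (linear probing).
def build_optimizer(backbone: nn.Module, head: nn.Module, full_finetune: bool) -> torch.optim.Optimizer:
    if full_finetune:
        # Full fine-tuning: every parameter is updated.
        params = list(backbone.parameters()) + list(head.parameters())
    else:
        # Frozen backbone: only the classification head is updated.
        for p in backbone.parameters():
            p.requires_grad = False
        params = list(head.parameters())
    return torch.optim.AdamW(params, lr=1e-4, weight_decay=0.05)
```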

Inference (human parsing)

How can I run the model for inference on one of the downstream tasks?
I found that demo.py only outputs the feature space.
I am interested in the human parsing task.

Using the SOLIDER-base model on ModelScope

Hello, I see you have released the pretrained SOLIDER-base model on ModelScope, but there is no corresponding Tasks.Id in the ModelScope pipeline.

SOLIDER with Resnet50

Hi authors,
Thanks for sharing your method, it is really interesting. This question is outside the scope of the paper: have you tried your method with a ResNet architecture? Does it work? Thank you in advance.

IndexError: tensors used as indices must be long, byte or bool tensors

When I run resume_solider.sh, it sometimes fails with the following traceback:
File "main_solider.py", line 398, in train_one_epoch
semantic_weight = [torch.cat(semantic_weight)[torch.from_numpy(mask_idxs)]]
IndexError: tensors used as indices must be long, byte or bool tensors
How can I deal with it?
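
Not an official fix, but this error usually means the index tensor is not an integer type; casting it to long before indexing is one possible workaround (untested against this repo, values below are made up for illustration):

```python
import numpy as np
import torch

# Possible workaround: make the index tensor an integer (long) tensor
# before using it to index another tensor.
mask_idxs = np.array([0, 2, 3])                     # example indices
semantic_weight = [torch.randn(4, 1)]               # example list of per-sample weights
idx = torch.from_numpy(mask_idxs).long()            # explicit .long() avoids the IndexError
semantic_weight = [torch.cat(semantic_weight)[idx]]
```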

Request for PedestrianDetection pretrained model

Hello!
I ran a test with the pretrained model from SOLIDER and got the following error:

/home/cddjjc/anaconda3/envs/pedestron_v2/bin/python /home/cddjjc/Workspace/SOLIDER-PedestrianDetection/test_city_person.py configs/solider/cp/swin_base.py models_pretrained/solider_origin/swin_base/epoch_ 1 2 --out swin_base.json --show --mean_teacher 
loading annotations into memory...
Done (t=0.03s)
creating index...
index created!
No pre-trained weights for SwinBase, training start from scratch
unexpected key in source state_dict: backbone.norm0.weight, backbone.norm0.bias, head.mlp.0.weight, head.mlp.0.bias, head.mlp.2.weight, head.mlp.2.bias, head.mlp.4.weight, head.mlp.4.bias, head.last_layer.weight_g, head.last_layer.weight_v

missing keys in source state_dict: bbox_head.reg_convs.0.gn.bias, bbox_head.offset_scales.0.scale, bbox_head.cls_convs.0.conv.weight, neck.p3_l2.weight, bbox_head.reg_convs.0.conv.weight, bbox_head.cls_convs.0.gn.bias, bbox_head.csp_reg.weight, neck.p4_l2.weight, bbox_head.csp_cls.weight, neck.p5_l2.weight, bbox_head.cls_convs.0.gn.weight, bbox_head.offset_convs.0.conv.weight, bbox_head.csp_offset.bias, neck.p4.bias, bbox_head.csp_offset.weight, neck.p5.weight, bbox_head.reg_scales.0.scale, bbox_head.csp_cls.bias, neck.p4.weight, bbox_head.reg_convs.0.gn.weight, bbox_head.csp_reg.bias, bbox_head.offset_convs.0.gn.bias, neck.p3.weight, bbox_head.offset_convs.0.gn.weight, neck.p5.bias, neck.p3.bias

[                              ] 0/500, elapsed: 0s, ETA:Traceback (most recent call last):
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/test_city_person.py", line 227, in <module>
    main()
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/test_city_person.py", line 195, in main
    outputs = single_gpu_test(model, data_loader, args.show, args.save_img, args.save_img_dir)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/test_city_person.py", line 30, in single_gpu_test
    result = model(return_loss=False, rescale=not show, **data)
  File "/home/cddjjc/anaconda3/envs/pedestron_v2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cddjjc/anaconda3/envs/pedestron_v2/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 169, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/cddjjc/anaconda3/envs/pedestron_v2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/models/detectors/base.py", line 88, in forward
    return self.forward_test(img, img_meta, **kwargs)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/models/detectors/base.py", line 79, in forward_test
    return self.simple_test(imgs[0], img_metas[0], **kwargs)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/models/detectors/csp.py", line 203, in simple_test
    x = self.extract_feat(img)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/models/detectors/single_stage.py", line 42, in extract_feat
    x = self.neck(x)
  File "/home/cddjjc/anaconda3/envs/pedestron_v2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/models/necks/csp_neck.py", line 73, in forward
    p3 = self.p3(inputs[0])
  File "/home/cddjjc/anaconda3/envs/pedestron_v2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cddjjc/anaconda3/envs/pedestron_v2/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 958, in forward
    output_padding, self.groups, self.dilation)
RuntimeError: Given transposed=1, weight of size [512, 256, 4, 4], expected input[1, 256, 128, 256] to have 512 channels, but got 256 channels instead

Process finished with exit code 1

It looks like the pretrained model from SOLIDER is missing the weights of the last few layers. Would it be possible to provide the complete trained PedestrianDetection model? Thanks!

Pose estimation training fails with KeyError: 'SwinTransformer is not in the models registry'

After downloading the pretrained solider_swin_base.pth and running training for the pose task, I get the following error:

fp16 = dict(loss_scale='dynamic')
work_dir = './work_dirs/swin_base_coco_384x288_lly'
gpu_ids = range(0, 1)

2023-09-25 16:03:15,197 - mmpose - INFO - Set random seed to 1071184448, deterministic: False
Traceback (most recent call last):
File "/data1/user/lly/miniconda3/envs/mmcv/lib/python3.7/site-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
return obj_cls(**args)
File "/data1/user/lly/work/pkg/mmpose-0.25.0/mmpose/models/detectors/top_down.py", line 48, in init
self.backbone = builder.build_backbone(backbone)
File "/data1/user/lly/work/pkg/mmpose-0.25.0/mmpose/models/builder.py", line 19, in build_backbone
return BACKBONES.build(cfg)
File "/data1/user/lly/miniconda3/envs/mmcv/lib/python3.7/site-packages/mmcv/utils/registry.py", line 212, in build
return self.build_func(*args, **kwargs, registry=self)
File "/data1/user/lly/miniconda3/envs/mmcv/lib/python3.7/site-packages/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
return build_from_cfg(cfg, registry, default_args)
File "/data1/user/lly/miniconda3/envs/mmcv/lib/python3.7/site-packages/mmcv/utils/registry.py", line 45, in build_from_cfg
f'{obj_type} is not in the {registry.name} registry')
KeyError: 'SwinTransformer is not in the models registry'

How should I fix this? (A possible fix is sketched after the environment info below.)
My environment information is as follows:
sys.platform: linux
Python: 3.7.16 (default, Jan 17 2023, 22:20:44) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2,3: NVIDIA GeForce RTX 4090
CUDA_HOME: /usr
NVCC: Cuda compilation tools, release 10.1, V10.1.243
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 1.6.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 10.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.3
  • Magma 2.5.2
  • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.7.0
OpenCV: 4.8.0
MMCV: 1.3.17
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMPose: 0.25.0+fd361ca
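
Not an official answer: this KeyError usually means the custom Swin backbone shipped with the SOLIDER downstream repo was never imported, so it never registered itself with mmpose's BACKBONES registry. One possible fix is to import and register it before the model is built; the import path of SwinTransformer below is an assumption about this codebase.

```python
# Hedged sketch: register the repo's Swin implementation with mmpose's
# BACKBONES registry before building the model.
from mmpose.models.builder import BACKBONES
from swin_transformer import SwinTransformer  # path assumed; use the Swin file from this repo

if 'SwinTransformer' not in BACKBONES.module_dict:
    BACKBONES.register_module(module=SwinTransformer)
```

Running the training script from inside the SOLIDER pose repo (rather than a stock mmpose installation) should have the same effect if the repo already registers its backbone on import.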

Visualize and manually modify semantic clustering

Hello,

I was wondering if it is possible to visualize the semantic clustering results of the input images (and the attention maps) as in your paper. I have tried, but I have not been able to visualize them.

Moreover, I was also thinking about replacing the clustering masks with my own so the model could learn to focus on the specified parts, but I'm having some problems. Do you think this is feasible?
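
Not the authors' code, but one generic way to visualize patch-level cluster assignments is to reshape them into the token grid, upsample to image resolution, and overlay them as a color map; function and variable names below are illustrative, and the image size is assumed divisible by the token grid.

```python
import numpy as np
import matplotlib.pyplot as plt

# Generic sketch, not from this repo: overlay per-patch cluster ids on an image.
def show_clusters(image: np.ndarray, cluster_ids: np.ndarray, grid_hw: tuple) -> None:
    # image: (H, W, 3) uint8; cluster_ids: (num_patches,) ints; grid_hw: token grid (h, w)
    h, w = grid_hw
    mask = cluster_ids.reshape(h, w)
    # nearest-neighbour upsample of the token grid back to image resolution
    mask = np.kron(mask, np.ones((image.shape[0] // h, image.shape[1] // w), dtype=mask.dtype))
    plt.imshow(image)
    plt.imshow(mask, alpha=0.5, cmap='tab10')
    plt.axis('off')
    plt.show()
```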

SOLIDER Training Time?

How long did it take to train SOLIDER (in hours) using the settings from the paper, for both the DINO pre-training and the SOLIDER training? I see the number of epochs, but not the time in hours.

question about Gait Recognition

Great job!
I would like to ask a question:
I don't see a gait recognition task among the six downstream tasks mentioned. Is SOLIDER not suitable for silhouette-based gait features, or was this task left out for some other reason?
Looking forward to your reply, thank you.

The Semantic Head

Thanks for your excellent work! I'm trying to reproduce this repo. I found that the semantic head, defined as part_classifier in main_solider.py, seems not to be optimized. Could you please explain why? Also, could you please provide the supplementary materials of the CVPR paper?
