
SOLIDER's Introduction


Welcome to SOLIDER! SOLIDER is a semantic controllable self-supervised learning framework that learns general human representations from massive unlabeled human images, which can benefit downstream human-centric tasks as much as possible. Unlike existing self-supervised learning methods, SOLIDER exploits prior knowledge from human images to build pseudo semantic labels and inject more semantic information into the learned representation. Meanwhile, different downstream tasks require different ratios of semantic and appearance information, and a single learned representation cannot satisfy all of them. To solve this problem, SOLIDER introduces a conditional network with a semantic controller, which can adapt the representation to the needs of each downstream task. For more details, please refer to our paper Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks.
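
To make the controller idea concrete, here is a minimal, hypothetical sketch of a conditional modulation layer driven by a scalar semantic weight. It is not the SOLIDER implementation; all names and shapes are illustrative only.

```python
import torch
import torch.nn as nn

# Illustrative sketch only (not the SOLIDER code): a scalar semantic weight
# in [0, 1] is mapped to per-channel scale/shift values that modulate
# backbone features, trading semantic vs. appearance content.
class SemanticController(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, 2 * channels),
        )

    def forward(self, feat: torch.Tensor, semantic_weight: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W); semantic_weight: (B, 1) with values in [0, 1]
        scale, shift = self.mlp(semantic_weight).chunk(2, dim=1)
        return feat * scale[:, :, None, None] + shift[:, :, None, None]


# Example: a high weight asks for a more semantic representation.
controller = SemanticController(channels=256)
features = torch.randn(2, 256, 24, 8)
weight = torch.full((2, 1), 0.8)
modulated = controller(features, weight)
```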

Updates

  • [2023/07/21: Code for the human pose task is released!] new
    • Training details of our pretrained model on the downstream human pose task are released.
  • [2023/05/15: Code for the human parsing task is released!] new
    • Training details of our pretrained model on the downstream human parsing task are released.
  • [2023/04/24: Code for the attribute recognition task is released!] new
    • Training details of our pretrained model on the downstream person attribute recognition task are released.
  • [2023/03/28: Code for 3 downstream tasks is released!]
    • Training details of our pretrained model on 3 downstream human visual tasks, including person re-identification, person search, and pedestrian detection, are released.
  • [2023/03/13: SOLIDER is accepted by CVPR2023!]
    • The SOLIDER paper is accepted by CVPR2023, and its official PyTorch implementation is released in this repo.

Installation

This codebase has been developed with Python 3.7, PyTorch 1.7.1, CUDA 10.1, and torchvision 0.8.2.

Datasets

We use LUPerson as our training data, which consists of unlabeled human images. Download LUPerson from its official link and unzip it.

Training

  • Choice 1. To train SOLIDER from scratch, please run:
sh run_solider.sh
  • Choice 2. Training SOLIDER from scratch may take a long time. To speed up training, you can train a DINO model first and then fine-tune it with SOLIDER, as follows:
sh run_dino.sh
sh resume_solider.sh

Finetuning and Inference

There is a demo to run the trained SOLIDER model; the same code can be embedded into inference pipelines or downstream-task fine-tuning.

python demo.py
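
A minimal sketch of how the trained backbone might be loaded for feature extraction, assuming a backbone factory in this repo's swin_transformer.py and a standard checkpoint layout; the constructor name, checkpoint keys, and output format are assumptions, so see demo.py for the exact API.

```python
import torch

# Hedged sketch, not a verbatim copy of demo.py.
from swin_transformer import swin_base_patch4_window7_224  # backbone assumed to be defined in this codebase

# semantic_weight in [0, 1] controls the semantic/appearance ratio of the
# representation (e.g. higher for parsing, lower for re-identification).
model = swin_base_patch4_window7_224(semantic_weight=0.8)

state = torch.load('solider_swin_base.pth', map_location='cpu')
state = state.get('teacher', state)        # self-supervised checkpoints often store a 'teacher' branch
model.load_state_dict(state, strict=False)
model.eval()

with torch.no_grad():
    images = torch.randn(4, 3, 224, 224)   # dummy batch; replace with real, normalized images
    outputs = model(images)                # see demo.py for the exact output format (feature vectors / maps)
```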

Models

We use Swin Transformer as our backbone, which shows strong performance on many computer vision tasks.

| Task | Dataset | Swin Tiny (Link) | Swin Small (Link) | Swin Base (Link) |
| --- | --- | --- | --- | --- |
| Person Re-identification (mAP/R1), w/o re-ranking | Market1501 | 91.6/96.1 | 93.3/96.6 | 93.9/96.9 |
| Person Re-identification (mAP/R1), w/o re-ranking | MSMT17 | 67.4/85.9 | 76.9/90.8 | 77.1/90.7 |
| Person Re-identification (mAP/R1), with re-ranking | Market1501 | 95.3/96.6 | 95.4/96.4 | 95.6/96.7 |
| Person Re-identification (mAP/R1), with re-ranking | MSMT17 | 81.5/89.2 | 86.5/91.7 | 86.5/91.7 |
| Attribute Recognition (mA) | PETA_ZS | 74.37 | 76.21 | 76.43 |
| Attribute Recognition (mA) | RAP_ZS | 74.23 | 75.95 | 76.42 |
| Attribute Recognition (mA) | PA100K | 84.14 | 86.25 | 86.37 |
| Person Search (mAP/R1) | CUHK-SYSU | 94.9/95.7 | 95.5/95.8 | 94.9/95.5 |
| Person Search (mAP/R1) | PRW | 56.8/86.8 | 59.8/86.7 | 59.7/86.8 |
| Pedestrian Detection (MR-2) | CityPersons | 10.3/40.8 | 10.0/39.2 | 9.7/39.4 |
| Human Parsing (mIOU) | LIP | 57.52 | 60.21 | 60.50 |
| Pose Estimation (AP/AR) | COCO | 74.4/79.6 | 76.3/81.3 | 76.6/81.5 |
  • All the models are trained on the whole LUPerson dataset.

Training Code for Downstream Tasks

Acknowledgement

Our implementation is mainly based on the following codebases. We gratefully thank the authors for their wonderful work.

Reference

If you use SOLIDER in your research, please cite our work by using the following BibTeX entry:

@inproceedings{chen2023beyond,
  title={Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks},
  author={Weihua Chen and Xianzhe Xu and Jian Jia and Hao Luo and Yaohua Wang and Fan Wang and Rong Jin and Xiuyu Sun},
  booktitle={The IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2023},
}

SOLIDER's People

Contributors

cwhgn, ssl-solider, xianzhexu, xiuyu-sxy


SOLIDER's Issues

What is the feature dimension for person ReID?

Were the feature embedding sizes defined here used for your person ReID experiments (link), i.e., 96 for Swin Tiny, 96 for Swin Small, and 128 for Swin Base? Or did you use higher-dimensional embeddings? It was not mentioned in the paper. Thanks.

mAP and rank values?

Hello, how can I see the mAP and rank values? These two metrics are not displayed in my training log. Thank you very much for your review!

Downstream task fine-tuning

When fine-tuning on a downstream task such as classification, do you fine-tune the whole model, or freeze the backbone and only fine-tune the classification head?
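
Not an official answer, but a minimal sketch of the two strategies being asked about, using generic PyTorch modules (all names are illustrative):

```python
import torch
import torch.nn as nn

# Illustrative only: choose between full fine-tuning and a frozen backbone
# with a trainable classification head (linear probing).
def build_optimizer(backbone: nn.Module, head: nn.Module, full_finetune: bool) -> torch.optim.Optimizer:
    if full_finetune:
        # Full fine-tuning: every parameter is updated.
        params = list(backbone.parameters()) + list(head.parameters())
    else:
        # Frozen backbone: only the classification head is updated.
        for p in backbone.parameters():
            p.requires_grad = False
        params = list(head.parameters())
    return torch.optim.AdamW(params, lr=1e-4, weight_decay=0.05)
```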

Inference (human parsing)

How can I run the model for inference on one of the downstream tasks?
I found that demo.py only outputs the feature space.
I am interested in the human parsing task.

Using the SOLIDER-base model on ModelScope

Hello, I see you have released the pretrained SOLIDER-base model on ModelScope, but there is no corresponding Tasks.Id in the ModelScope pipeline.

SOLIDER with Resnet50

Hi authors,
Thanks for sharing your method, it is really interesting. This question is outside the scope of the paper: have you tried your method with a ResNet architecture? Does it work? Thank you in advance.

IndexError: tensors used as indices must be long, byte or bool tensors

When I run resume_solider.sh, it sometimes fails with the following traceback:
File "main_solider.py", line 398, in train_one_epoch
semantic_weight = [torch.cat(semantic_weight)[torch.from_numpy(mask_idxs)]]
IndexError: tensors used as indices must be long, byte or bool tensors
How can I deal with it?
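
Not an official fix, but this error usually means the index tensor is not an integer type; casting it to long before indexing is one possible workaround (untested against this repo, values below are made up for illustration):

```python
import numpy as np
import torch

# Possible workaround: make the index tensor an integer (long) tensor
# before using it to index another tensor.
mask_idxs = np.array([0, 2, 3])                     # example indices
semantic_weight = [torch.randn(4, 1)]               # example list of per-sample weights
idx = torch.from_numpy(mask_idxs).long()            # explicit .long() avoids the IndexError
semantic_weight = [torch.cat(semantic_weight)[idx]]
```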

Request for PedestrianDetection pretrained model

Hello!
I ran a test with the pretrained model from SOLIDER and got the following error:

/home/cddjjc/anaconda3/envs/pedestron_v2/bin/python /home/cddjjc/Workspace/SOLIDER-PedestrianDetection/test_city_person.py configs/solider/cp/swin_base.py models_pretrained/solider_origin/swin_base/epoch_ 1 2 --out swin_base.json --show --mean_teacher 
loading annotations into memory...
Done (t=0.03s)
creating index...
index created!
No pre-trained weights for SwinBase, training start from scratch
unexpected key in source state_dict: backbone.norm0.weight, backbone.norm0.bias, head.mlp.0.weight, head.mlp.0.bias, head.mlp.2.weight, head.mlp.2.bias, head.mlp.4.weight, head.mlp.4.bias, head.last_layer.weight_g, head.last_layer.weight_v

missing keys in source state_dict: bbox_head.reg_convs.0.gn.bias, bbox_head.offset_scales.0.scale, bbox_head.cls_convs.0.conv.weight, neck.p3_l2.weight, bbox_head.reg_convs.0.conv.weight, bbox_head.cls_convs.0.gn.bias, bbox_head.csp_reg.weight, neck.p4_l2.weight, bbox_head.csp_cls.weight, neck.p5_l2.weight, bbox_head.cls_convs.0.gn.weight, bbox_head.offset_convs.0.conv.weight, bbox_head.csp_offset.bias, neck.p4.bias, bbox_head.csp_offset.weight, neck.p5.weight, bbox_head.reg_scales.0.scale, bbox_head.csp_cls.bias, neck.p4.weight, bbox_head.reg_convs.0.gn.weight, bbox_head.csp_reg.bias, bbox_head.offset_convs.0.gn.bias, neck.p3.weight, bbox_head.offset_convs.0.gn.weight, neck.p5.bias, neck.p3.bias

[                              ] 0/500, elapsed: 0s, ETA:Traceback (most recent call last):
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/test_city_person.py", line 227, in <module>
    main()
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/test_city_person.py", line 195, in main
    outputs = single_gpu_test(model, data_loader, args.show, args.save_img, args.save_img_dir)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/test_city_person.py", line 30, in single_gpu_test
    result = model(return_loss=False, rescale=not show, **data)
  File "/home/cddjjc/anaconda3/envs/pedestron_v2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cddjjc/anaconda3/envs/pedestron_v2/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 169, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/cddjjc/anaconda3/envs/pedestron_v2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/models/detectors/base.py", line 88, in forward
    return self.forward_test(img, img_meta, **kwargs)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/models/detectors/base.py", line 79, in forward_test
    return self.simple_test(imgs[0], img_metas[0], **kwargs)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/models/detectors/csp.py", line 203, in simple_test
    x = self.extract_feat(img)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/models/detectors/single_stage.py", line 42, in extract_feat
    x = self.neck(x)
  File "/home/cddjjc/anaconda3/envs/pedestron_v2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
  File "/home/cddjjc/Workspace/SOLIDER-PedestrianDetection/mmdet/models/necks/csp_neck.py", line 73, in forward
    p3 = self.p3(inputs[0])
  File "/home/cddjjc/anaconda3/envs/pedestron_v2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/cddjjc/anaconda3/envs/pedestron_v2/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 958, in forward
    output_padding, self.groups, self.dilation)
RuntimeError: Given transposed=1, weight of size [512, 256, 4, 4], expected input[1, 256, 128, 256] to have 512 channels, but got 256 channels instead

Process finished with exit code 1

It looks like the pretrained model from SOLIDER is missing the weights of the last few layers. Would it be possible to provide the complete trained PedestrianDetection model? Thanks!

Pose estimation training fails with KeyError: 'SwinTransformer is not in the models registry'

After downloading the pretrained solider_swin_base.pth and running training for the pose task, I get the following error:

fp16 = dict(loss_scale='dynamic')
work_dir = './work_dirs/swin_base_coco_384x288_lly'
gpu_ids = range(0, 1)

2023-09-25 16:03:15,197 - mmpose - INFO - Set random seed to 1071184448, deterministic: False
Traceback (most recent call last):
File "/data1/user/lly/miniconda3/envs/mmcv/lib/python3.7/site-packages/mmcv/utils/registry.py", line 52, in build_from_cfg
return obj_cls(**args)
File "/data1/user/lly/work/pkg/mmpose-0.25.0/mmpose/models/detectors/top_down.py", line 48, in init
self.backbone = builder.build_backbone(backbone)
File "/data1/user/lly/work/pkg/mmpose-0.25.0/mmpose/models/builder.py", line 19, in build_backbone
return BACKBONES.build(cfg)
File "/data1/user/lly/miniconda3/envs/mmcv/lib/python3.7/site-packages/mmcv/utils/registry.py", line 212, in build
return self.build_func(*args, **kwargs, registry=self)
File "/data1/user/lly/miniconda3/envs/mmcv/lib/python3.7/site-packages/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
return build_from_cfg(cfg, registry, default_args)
File "/data1/user/lly/miniconda3/envs/mmcv/lib/python3.7/site-packages/mmcv/utils/registry.py", line 45, in build_from_cfg
f'{obj_type} is not in the {registry.name} registry')
KeyError: 'SwinTransformer is not in the models registry'

How should I fix this? (A possible fix is sketched after the environment info below.)
My environment information is as follows:
sys.platform: linux
Python: 3.7.16 (default, Jan 17 2023, 22:20:44) [GCC 11.2.0]
CUDA available: True
GPU 0,1,2,3: NVIDIA GeForce RTX 4090
CUDA_HOME: /usr
NVCC: Cuda compilation tools, release 10.1, V10.1.243
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 1.6.0
PyTorch compiling details: PyTorch built with:

  • GCC 7.3
  • C++ Version: 201402
  • Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 10.1
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  • CuDNN 7.6.3
  • Magma 2.5.2
  • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.7.0
OpenCV: 4.8.0
MMCV: 1.3.17
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMPose: 0.25.0+fd361ca
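
Not an official answer: this KeyError usually means the custom Swin backbone shipped with the SOLIDER downstream repo was never imported, so it never registered itself with mmpose's BACKBONES registry. One possible fix is to import and register it before the model is built; the import path of SwinTransformer below is an assumption about this codebase.

```python
# Hedged sketch: register the repo's Swin implementation with mmpose's
# BACKBONES registry before building the model.
from mmpose.models.builder import BACKBONES
from swin_transformer import SwinTransformer  # path assumed; use the Swin file from this repo

if 'SwinTransformer' not in BACKBONES.module_dict:
    BACKBONES.register_module(module=SwinTransformer)
```

Running the training script from inside the SOLIDER pose repo (rather than a stock mmpose installation) should have the same effect if the repo already registers its backbone on import.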

Visualize and manually modify semantic clustering

Hello,

I was wondering if it is possible to visualize the semantic clustering results of the input images (and the attention maps) as in your paper. I have tried, but I have not been able to visualize them.

Moreover, I was also thinking about replacing the clustering masks with my own so the model could learn to focus on the specified parts, but I'm having some problems. Do you think this is feasible?
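
Not the authors' code, but one generic way to visualize patch-level cluster assignments is to reshape them into the token grid, upsample to image resolution, and overlay them as a color map; function and variable names below are illustrative, and the image size is assumed divisible by the token grid.

```python
import numpy as np
import matplotlib.pyplot as plt

# Generic sketch, not from this repo: overlay per-patch cluster ids on an image.
def show_clusters(image: np.ndarray, cluster_ids: np.ndarray, grid_hw: tuple) -> None:
    # image: (H, W, 3) uint8; cluster_ids: (num_patches,) ints; grid_hw: token grid (h, w)
    h, w = grid_hw
    mask = cluster_ids.reshape(h, w)
    # nearest-neighbour upsample of the token grid back to image resolution
    mask = np.kron(mask, np.ones((image.shape[0] // h, image.shape[1] // w), dtype=mask.dtype))
    plt.imshow(image)
    plt.imshow(mask, alpha=0.5, cmap='tab10')
    plt.axis('off')
    plt.show()
```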

SOLIDER Training Time?

How long did it take to train SOLIDER (in hours) using the settings from the paper, for both the DINO pre-training and the SOLIDER training? I see the number of epochs, but not the time in hours.

question about Gait Recognition

Great job!
I would like to ask a question:
I don't see a gait recognition task among the six downstream tasks mentioned. Is SOLIDER not suitable for silhouette-based gait features, or was this task left out for some other reason?
Looking forward to your reply, thank you.

The Semantic Head

Thanks for your excellent work! I'm trying to reproduce this repo. I found that the semantic head, defined as part_classifier in main_solider.py, seems not to be optimized. Could you please explain why? Also, could you please provide the supplementary materials of the CVPR paper?
