FGAHOI

FGAHOI: Fine-Grained Anchors for Human-Object Interaction Detection

Abstract

Human-Object Interaction (HOI) detection, an important problem in computer vision, requires locating human-object pairs and identifying the interactive relationships between them. An HOI instance spans a greater range in space, scale, and task than an individual object instance, making its detection more susceptible to noisy backgrounds. To alleviate the disturbance of noisy backgrounds on HOI detection, it is necessary to consider the input image information when generating fine-grained anchors, which are then leveraged to guide the detection of HOI instances. However, this raises the following challenges: i) how to extract pivotal features from images with complex background information is still an open question; ii) how to semantically align the extracted features and query embeddings is also a difficult issue. In this paper, a novel end-to-end transformer-based framework (FGAHOI) is proposed to alleviate the above problems. FGAHOI comprises three dedicated components: multi-scale sampling (MSS), hierarchical spatial-aware merging (HSAM), and a task-aware merging mechanism (TAM). MSS extracts features of humans, objects, and interaction areas from noisy backgrounds for HOI instances of various scales. HSAM and TAM semantically align and merge the extracted features and query embeddings in the hierarchical spatial and task perspectives in turn. Meanwhile, a novel stage-wise training strategy is designed to reduce the training pressure caused by the overly complex task performed by FGAHOI. In addition, we propose two ways to measure the difficulty of HOI detection and a novel dataset, i.e., HOI-SDC, for the two challenges of HOI instance detection (unevenly distributed area in human-object pairs and long-distance visual modeling of human-object pairs). Experiments are conducted on three benchmarks: HICO-DET, HOI-SDC, and V-COCO. Our model outperforms state-of-the-art HOI detection methods, and extensive ablations reveal the merits of our contributions.

Requirements

We tested our models with python=3.8, pytorch=1.10.0, and cuda=11.3. Other versions may work as well.

conda create -n FGAHOI python=3.8 pip
conda activate FGAHOI
conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
pip install -r requirements.txt
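
Before compiling the CUDA operators below, it is worth confirming that the environment actually matches the versions above. A minimal check (ours, not part of the repo):

import torch

print(torch.__version__)          # expect 1.10.x
print(torch.version.cuda)         # expect 11.3
print(torch.cuda.is_available())  # must be True to build and use the CUDA ops
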
  • Compiling CUDA operators
cd ./models/dab_deformable_detr/ops
sh ./make.sh
# test
python test.py
  • Argument details can be adjusted in main.py

Dataset Preparation & Results

HICO-DET

Please follow the HICO-DET dataset preparation instructions of GGNet.

After preparation, the data/hico_20160224_det folder should be organized as follows:

data
├── hico_20160224_det
|   ├── images
|   |   ├── test2015
|   |   └── train2015
|   └── annotations
|       ├── anno_list.json
|       ├── corre_hico.npy
|       ├── file_name_to_obj_cat.json
|       ├── hoi_id_to_num.json
|       ├── hoi_list_new.json
|       ├── test_hico.json
|       └── trainval_hico.json
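
As a quick sanity check of the layout above, the annotation files should load as plain JSON. A minimal snippet (ours, not part of the repo; it only assumes each file holds one entry per annotated image):

import json

with open('data/hico_20160224_det/annotations/trainval_hico.json') as f:
    train_annos = json.load(f)
with open('data/hico_20160224_det/annotations/test_hico.json') as f:
    test_annos = json.load(f)
print(f'train/val images: {len(train_annos)}, test images: {len(test_annos)}')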

V-COCO

Please follow the installation instructions of V-COCO.

For evaluation, please put vcoco_test.ids and vcoco_test.json into data/v-coco/data folder.

After preparation, the data/v-coco folder should be organized as follows:

data
├── v-coco
|   ├── prior.pickle
|   ├── images
|   |   ├── train2014
|   |   └── val2014
|   ├── data
|   |   ├── instances_vcoco_all_2014.json
|   |   ├── vcoco_test.ids
|   |   └── vcoco_test.json
|   └── annotations
|       ├── corre_vcoco.npy
|       ├── test_vcoco.json
|       └── trainval_vcoco.json
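
To confirm the layout before training, the expected files can be checked with the standard library alone (our snippet; file names are taken from the tree above):

from pathlib import Path

root = Path('data/v-coco')
expected = [
    'prior.pickle',
    'data/instances_vcoco_all_2014.json',
    'data/vcoco_test.ids',
    'data/vcoco_test.json',
    'annotations/corre_vcoco.npy',
    'annotations/test_vcoco.json',
    'annotations/trainval_vcoco.json',
]
for rel in expected:
    status = 'ok' if (root / rel).is_file() else 'MISSING'
    print(f'{status:7s} {rel}')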

HOI-SDC

After preparation, the data/SDC folder should be organized as follows:

data
├── SDC
|   ├── JPGImages
|   |   └── image
|   └── annotations
|       ├── train_annotation.json
|       ├── test_annotation.json
|       ├── train_split.txt
|       └── test_split.txt
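
A similar quick check for the split files (our snippet; that each file holds one image id per line is an assumption on our part):

from pathlib import Path

for split in ('train_split.txt', 'test_split.txt'):
    ids = (Path('data/SDC/annotations') / split).read_text().splitlines()
    print(f'{split}: {len(ids)} entries')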

More details will come soon!

Results

We currently provide results on HICO-DET; "def" and "ko" denote the Default and Known-Object evaluation settings (mAP).

| Model | Full (def) | Rare (def) | Non-Rare (def) | Full (ko) | Rare (ko) | Non-Rare (ko) | Weights |
|-------|------------|------------|----------------|-----------|-----------|----------------|---------|
| Swin-Tiny | 29.94 | 22.24 | 32.24 | 32.48 | 24.16 | 34.97 | Tiny_weight |
| Swin-Large*+ | 37.18 | 30.71 | 39.11 | 38.93 | 31.93 | 41.02 | Large_weight |

Training

HICO-DET

  • Training FGAHOI with Swin-tiny from scratch.

Stage 1: base

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env main.py \
        --backbone swin_tiny \
        --pretrained params/swin_tiny_patch4_window7_224.pth \
        --dataset_file hico \
        --num_verb_classes 117 \
        --num_obj_classes 80 \
        --output_dir logs/base \
        --epochs 150 \
        --lr_drop 120 \
        --num_feature_levels 3 \
        --num_queries 300 \
        --merge \
        --scale [1, 3, 5] \
        --base \
        --hoi_path data/hico_20160224_det

Stage 2: hierarchical_merge

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env main.py \
        --backbone swin_tiny \
        --pretrain_model_path "{Weights of the last stage}" \
        --dataset_file hico \
        --num_verb_classes 117 \
        --num_obj_classes 80 \
        --output_dir logs/hierarchical_merge \
        --epochs 50 \
        --lr_drop 40 \
        --num_feature_levels 3 \
        --num_queries 300 \
        --merge \
        --scale [1, 3, 5] \
        --hierarchical_merge \
        --hoi_path data/hico_20160224_det

Stage 3: hierarchical_merge + task_merge

python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env main.py \
        --backbone swin_tiny \
        --pretrain_model_path "{Weights of the last stage}" \
        --dataset_file hico \
        --num_verb_classes 117 \
        --num_obj_classes 80 \
        --output_dir logs/hierarchical_merge_and_task_merge \
        --epochs 50 \
        --lr_drop 40 \
        --num_feature_levels 3 \
        --num_queries 300 \
        --merge \
        --scale [1, 3, 5] \
        --hierarchical_merge \
        --task_merge \
        --hoi_path data/hico_20160224_det
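
The three stages above can also be chained unattended. A hypothetical wrapper (ours, not part of the repo) is sketched below; the checkpoint file name checkpoint.pth is an assumption, so match it to whatever main.py writes into each output_dir, and pass the --scale flag exactly as in the commands above.

import subprocess

COMMON = ['python', '-m', 'torch.distributed.launch', '--nproc_per_node=8',
          '--use_env', 'main.py', '--backbone', 'swin_tiny',
          '--dataset_file', 'hico', '--num_verb_classes', '117',
          '--num_obj_classes', '80', '--num_feature_levels', '3',
          '--num_queries', '300', '--merge',
          '--hoi_path', 'data/hico_20160224_det']

def run_stage(output_dir, extra):
    # check=True aborts the chain if a stage crashes
    subprocess.run(COMMON + ['--output_dir', output_dir] + extra, check=True)

# Stage 1: base
run_stage('logs/base',
          ['--base', '--epochs', '150', '--lr_drop', '120',
           '--pretrained', 'params/swin_tiny_patch4_window7_224.pth'])
# Stage 2: hierarchical merge, warm-started from stage 1
run_stage('logs/hierarchical_merge',
          ['--hierarchical_merge', '--epochs', '50', '--lr_drop', '40',
           '--pretrain_model_path', 'logs/base/checkpoint.pth'])  # assumed name
# Stage 3: hierarchical + task merge, warm-started from stage 2
run_stage('logs/hierarchical_merge_and_task_merge',
          ['--hierarchical_merge', '--task_merge', '--epochs', '50',
           '--lr_drop', '40',
           '--pretrain_model_path', 'logs/hierarchical_merge/checkpoint.pth'])  # assumed name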

Testing

  • Evaluating FGAHOI with Swin-tiny.
python -m torch.distributed.launch \
        --nproc_per_node=8 \
        --use_env main.py \
        --backbone swin_tiny \
        --dataset_file hico \
        --resume "{Weight of the model}" \
        --num_verb_classes 117 \
        --num_obj_classes 80 \
        --output_dir logs \
        --epochs 150 \
        --lr_drop 120 \
        --num_feature_levels 3 \
        --num_queries 300 \
        --merge \
        --scale [1, 3, 5] \
        --hierarchical_merge \
        --task_merge \
        --eval \
        --hoi_path data/hico_20160224_det

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Citation

If you use FGAHOI, please consider citing:

@inproceedings{Ma2023FGAHOI,
  title={FGAHOI: Fine-Grained Anchors for Human-Object Interaction Detection},
  author={Shuailei Ma and Yuefeng Wang and Shanze Wang and Ying Wei},
  year={2023}
}

Contact

Should you have any questions, please contact {[email protected]}

Acknowledgments

FGAHOI builds on the codebases of previous works such as QAHOI and DAB-DETR. If you find FGAHOI useful, please consider citing these works as well.

Issues

HELP

I have read your paper "CAT: LoCalization and IdentificAtion Cascade Detection Transformer for Open-World Object Detection", but the source code repository no longer exists. Could you share the source code (for study purposes)? Thanks!

[email protected]

About test on HICO-DET

Thank you very much for your work! When I run the test command on my machine, namely export CUDA_VISIBLE_DEVICES=1,2; python -m torch.distributed.run --nproc_per_node=2 --use_env main.py --backbone swin_large --dataset_file hico --resume "params/swin_large.pth" --num_verb_classes 117 --num_obj_classes 80 --output_dir logs --epochs 150 --lr_drop 120 --num_feature_levels 3 --num_queries 300 --merge --hierarchical_merge --task_merge --eval --hoi_path data/hico_20160224_det, some errors occur.
The error message is as follows:

Traceback (most recent call last):
  File "/home/qinxiaohan/FGAHOI/main.py", line 446, in <module>
    main(args)
  File "/home/qinxiaohan/FGAHOI/main.py", line 274, in main
    model, criterion, postprocessors = build_model_main(args)
  File "/home/qinxiaohan/FGAHOI/main.py", line 234, in build_model_main
    model, criterion, postprocessors = build_dab_deformable_detr(args)
  File "/home/qinxiaohan/FGAHOI/models/dab_deformable_detr/dab_deformable_detr.py", line 729, in build_dab_deformable_detr
    backbone = build_backbone(args)
  File "/home/qinxiaohan/FGAHOI/models/dab_deformable_detr/backbone.py", line 151, in build_backbone
    backbone = Backbone(
  File "/home/qinxiaohan/FGAHOI/models/dab_deformable_detr/backbone.py", line 124, in __init__
    super().__init__(backbone, backbone_name, num_feature_levels, pretrained)
  File "/home/qinxiaohan/FGAHOI/models/dab_deformable_detr/backbone.py", line 69, in __init__
    backbone.init_weights(pretrained)
  File "/home/qinxiaohan/FGAHOI/models/dab_deformable_detr/swin_transformer.py", line 719, in init_weight
    load_checkpoint(self, pretrained, strict=False)
  File "/home/qinxiaohan/FGAHOI/models/dab_deformable_detr/swin_transformer.py", line 144, in load_checkpoint
    table_current = model.state_dict()[table_key]
KeyError: 'backbone.0.body.layers.0.blocks.0.attn.relative_position_bias_table'

I think this error means the key backbone.0.body.layers.0.blocks.0.attn.relative_position_bias_table is missing, but the parameter strict is already set to False. I don't understand the code well enough yet, so I don't know how to solve this problem. What do you think about this error? I would be very grateful for any suggestions!

่ฟ่กŒ้”™่ฏฏ

ๆ‚จๅฅฝ๏ผŒๆˆ‘ๅœจๅค็Žฐๆ‚จ็š„ไปฃ็ ็š„่ฟ‡็จ‹ไธญๅ‡บ็Žฐไบ†ไธ€ไบ›้—ฎ้ข˜๏ผŒ้€š่ฟ‡ๆœ็ดข๏ผŒๅฏ่ƒฝ็š„ๅŽŸๅ› ๆ˜ฏไธ€ไบ›ๅ‚ๆ•ฐๆฒกๆœ‰ๅ‚ๅŠ ๆขฏๅบฆๆ›ดๆ–ฐ๏ผŒๅฆ‚ไธ‹๏ผš
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by (1) passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel; (2) making sure all forward function outputs participate in calculating loss. If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable).
่ฏท้—ฎๆˆ‘่ฏฅๅฆ‚ไฝ•่งฃๅ†ณๅฎƒ๏ผŸ
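
For reference, the remedy the error text itself suggests is to construct the DistributedDataParallel wrapper with find_unused_parameters=True. A minimal, generic illustration (with a stand-in model; this is not a pointer to where FGAHOI builds its own DDP wrapper):

import os
import torch
import torch.nn as nn

# Run under torch.distributed.launch / torchrun so LOCAL_RANK is set.
torch.distributed.init_process_group(backend='nccl')
local_rank = int(os.environ['LOCAL_RANK'])
model = nn.Linear(8, 2).cuda(local_rank)  # placeholder for the real detector
model = nn.parallel.DistributedDataParallel(
    model,
    device_ids=[local_rank],
    find_unused_parameters=True,  # tolerate params that receive no gradient
)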

About the results with V-COCO dataset

Thank you very much for sharing your great work with us!

I'm trying to use your method with HICO-DET and V-COCO.

While I obtained a result on HICO-DET comparable to your paper's (mine is 36.9 AP), my result on V-COCO is not as good as yours (with Swin-Transformer-Large-based FGAHOI, around 45 AP in Scenario 2).

The details of my experiment with V-COCO are as follows.

The only modification I made to your code concerns the part that calculates auxiliary losses in DABDETR.py; I did this because the original code seemed to cause a RuntimeError in my environment.
I attached a zip file that contains the actual modified file (I modified lines 188, 192, 375 and 492) and scripts to train the networks for each stage.
codes.zip

For the base stage, I downloaded the pre-trained Swin-Transformer-Large weights from the official GitHub.

It would be greatly appreciated if you would respond to the following two requests.

  1. Could you tell me the detailed settings of your training for V-COCO (including the learning rate drop you mentioned in your paper)?

  2. If you come up with any possible reasons for the difference in the results, please also let me know.

Thank you.

Evaluation with Swin-large ckpt came across loading ckpt error. Can you provide the script to evaluate with Swin-large?

Hi,
Thanks for your great work! I'm very impressed by the strong performance of your method and am trying to reproduce the result with Swin-Large. However, as only the Swin-Tiny evaluation script is provided, I tried to run with the Swin-Large checkpoint by replacing "--backbone swin_tiny" with "--backbone swin_large". However, I came across this error:

        size mismatch for backbone.0.body.layers.2.blocks.4.attn.relative_position_index: copying a param with shape torch.Size([144, 144]) from checkpoint, the shape in current model is torch.Size([49, 49]).

How can I fix this and run with your swin-large ckpt? Can you provide the evaluation script for swin-large?
I'm looking forward to your reply. Thanks a lot!
