
Aurora

[NeurIPS 2023] Parameter-efficient Tuning of Large-scale Multimodal Foundation Model

Introduction

Aurora is an efficient PETL (parameter-efficient transfer learning) method for large multimodal models. It uses mode approximation to further reduce the number of trainable parameters and to promote the fusion of different modalities.

1. Comparison with other PETL methods (figure)

2. Overall architecture (figure)
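To make the mode-approximation idea concrete, here is a minimal numpy sketch (names and shapes are illustrative, not the repository's actual API): the pretrained weight is kept frozen, and a CP-style low-rank update is assembled from shared factor matrices plus a small per-mode coefficient vector, so only a handful of parameters per modality are trained.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 64, 64, 4

W = rng.standard_normal((d_out, d_in))   # frozen pretrained weight
U = rng.standard_normal((d_out, rank))   # shared factor matrix
V = rng.standard_normal((d_in, rank))    # shared factor matrix
lam = np.zeros(rank)                     # trainable per-mode coefficients

def forward(x, lam):
    # W x + U diag(lam) V^T x : the low-rank CP-style correction
    delta = U @ (lam[:, None] * V.T)
    return (W + delta) @ x

x = rng.standard_normal(d_in)
# With lam = 0 the tuned layer matches the frozen backbone exactly.
assert np.allclose(forward(x, lam), W @ x)
```

Only `lam` (rank-many scalars per mode) would be updated during tuning, which is where the parameter savings come from; the actual method shares the factors across layers and modalities.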

Getting Started

Requirements

  • Python 3.8, PyTorch>=1.8.0, torchvision>=0.7.0, timm>=0.6.13, numpy>=1.21.0, transformers>=4.27.4 are required for the current codebase.
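A quick stdlib-only way to check that your environment meets these minimum versions (a sketch; the simple tuple comparison ignores pre-release suffixes):

```python
from importlib.metadata import version, PackageNotFoundError

# Minimum versions from the requirements above.
required = {"torch": "1.8.0", "torchvision": "0.7.0", "timm": "0.6.13",
            "numpy": "1.21.0", "transformers": "4.27.4"}

def parse(v):
    # "1.13.0+cu117" -> (1, 13, 0); ignores local build tags
    return tuple(int(p) for p in v.split("+")[0].split(".")[:3])

for pkg, minimum in required.items():
    try:
        installed = version(pkg)
        status = "ok" if parse(installed) >= parse(minimum) else "too old"
        print(f"{pkg}: {installed} ({status})")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```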

Datasets

1. Image-text Retrieval Task

COCO2014: download the dataset from https://cocodataset.org/#download; a Linux command such as `wget -c http://images.cocodataset.org/annotations/annotations_trainval2014.zip` will download it directly.

Flickr30k: download the dataset from https://shannon.cs.illinois.edu/DenotationGraph/data/index.html; alternatively, download it from https://pan.baidu.com/s/1r0RVUwctJsI0iNuVXHQ6kA (password: hrf3).

2. Video-text Retrieval Task

MSRVTT: download the video dataset from https://www.mediafire.com/folder/h14iarbs62e7p/shared, and the corresponding annotation file from https://mega.nz/file/UnRnyb7A#es4XmqsLxl-B7MP0KAat9VibkH7J_qpKj9NcxLh8aHg.

DiDemo: download the dataset through this GitHub project: https://github.com/jpthu17/EMCL.

3. Visual Question Answering Task

VQAv2: the COCO images can be downloaded from https://visualqa.org/download.html, and the additional Visual Genome (VG) data can be downloaded through this GitHub project: https://github.com/jayleicn/ClipBERT.

VideoQA: the video data comes from MSRVTT, and the annotation file can be downloaded through this GitHub project: https://github.com/jayleicn/ClipBERT.

Image-text Retrieval

  • Download COCO and Flickr30k datasets, and set 'image_root' in configs/retrieval_{dataset}.yaml accordingly.

  • To parameter-efficient finetune on MSCOCO/Flickr:

python -m torch.distributed.run --nproc_per_node=8 train_retrieval.py --config ./configs/retrieval_{coco, flickr}.yaml --output_dir output/{coco, flickr} 
  • To evaluate on MSCOCO/Flickr:
python -m torch.distributed.run --nproc_per_node=8 train_retrieval.py --config ./configs/retrieval_{coco, flickr}.yaml --output_dir output/{coco, flickr} --evaluate 

Visual Question Answering

  • Download VQAv2 dataset and Visual Genome dataset, and set 'vqa_root' and 'vg_root' in configs/vqa.yaml.

  • To parameter-efficient finetune on VQAv2:

python -m torch.distributed.run --nproc_per_node=8 train_vqa.py --config ./configs/vqa.yaml --output_dir $static_dir
  • To evaluate on VQAv2:
python -m torch.distributed.run --nproc_per_node=8 train_vqa.py --config ./configs/vqa.yaml --output_dir $static_dir --evaluate

Video-text Retrieval and VideoQA

  • Download MSRVTT and DiDemo datasets, and set 'video_root' & 'ann_root' in configs/retrieval_{dataset}.yaml accordingly.

  • To parameter-efficient finetune on MSRVTT:

python -m torch.distributed.run --nproc_per_node=8 train_video_retrieval.py --config ./configs/retrieval_msrvtt.yaml --output_dir $static_dir
  • To parameter-efficient finetune on DiDemo:
python -m torch.distributed.run --nproc_per_node=8 train_video_retrieval.py --config ./configs/retrieval_didemo.yaml --output_dir $static_dir
  • To parameter-efficient finetune on VideoQA:
python -m torch.distributed.run --nproc_per_node=8 train_vqa.py --config ./configs/videoqa.yaml --output_dir $static_dir

Acknowledgement

Our codebase is built on top of BLIP, timm, and transformers. We thank the authors for the nicely organized code!

How To Cite Aurora

If you use this code in your research, please kindly cite the following paper:

@article{wang2023mode,
  title={Mode Approximation Makes Good Vision-Language Prompts},
  author={Wang, Haixin and Yang, Xinlong and Chang, Jianlong and Jin, Dian and Sun, Jinan and Zhang, Shikun and Luo, Xiao and Tian, Qi},
  journal={arXiv preprint arXiv:2305.08381},
  year={2023}
}

@inproceedings{wang2023parameter,
  title={Parameter-efficient Tuning of Large-scale Multimodal Foundation Model},
  author={Wang, Haixin and Yang, Xinlong and Chang, Jianlong and Jin, Dian and Sun, Jinan and Zhang, Shikun and Luo, Xiao and Tian, Qi},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023}
}


aurora's Issues

About the runtime environment

First of all, thank you for your great work! But when I tried to reproduce it, I got WARNING:main:


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


thread panicked while processing panic. aborting.
thread panicked while processing panic. aborting.
thread panicked while processing panic. aborting.
thread panicked while processing panic. aborting.
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 16031 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 1 (pid: 16032) of binary: /HOME/scw7294/.conda/envs/au/bin/python
Traceback (most recent call last):
File "/HOME/scw7294/.conda/envs/au/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/HOME/scw7294/.conda/envs/au/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/HOME/scw7294/.conda/envs/au/lib/python3.8/site-packages/torch/distributed/run.py", line 766, in
main()
File "/HOME/scw7294/.conda/envs/au/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 346, in wrapper
return f(*args, **kwargs)
File "/HOME/scw7294/.conda/envs/au/lib/python3.8/site-packages/torch/distributed/run.py", line 762, in main
run(args)
File "/HOME/scw7294/.conda/envs/au/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
elastic_launch(
File "/HOME/scw7294/.conda/envs/au/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/HOME/scw7294/.conda/envs/au/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

train_retrieval.py FAILED

Failures:
[1]:
time : 2023-11-20_21:03:20
host : g0210.para.ai
rank : 2 (local_rank: 2)
exitcode : -6 (pid: 16033)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 16033
[2]:
time : 2023-11-20_21:03:20
host : g0210.para.ai
rank : 3 (local_rank: 3)
exitcode : -6 (pid: 16034)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 16034

Root Cause (first observed failure):
[0]:
time : 2023-11-20_21:03:20
host : g0210.para.ai
rank : 1 (local_rank: 1)
exitcode : -6 (pid: 16032)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 16032

My environment is as follows (all meeting the requirements in readme.txt):
(au) [scw7294@ln01 Aurora]$ pip list
Package Version


adapter 0.1
adapters 0.0.0.dev20231116
av 11.0.0
certifi 2023.11.17
charset-normalizer 3.3.2
einops 0.7.0
fairscale 0.4.13
filelock 3.13.1
fsspec 2023.10.0
huggingface-hub 0.19.4
idna 3.4
numpy 1.24.4
packaging 23.2
Pillow 10.0.1
pip 23.3
PyYAML 6.0.1
regex 2023.10.3
requests 2.31.0
ruamel.yaml 0.18.5
ruamel.yaml.clib 0.2.8
safetensors 0.4.0
setuptools 68.0.0
timm 0.9.10
tokenizers 0.13.3
torch 1.13.0+cu117
torchvision 0.14.0+cu117
tqdm 4.66.1
transformers 4.33.3
typing_extensions 4.8.0
urllib3 2.1.0
wheel 0.41.2

Since there is no explicit error message, I cannot tell which part of the environment went wrong. I am running on a single machine with 8 GPUs (the log above is from a 4-GPU run, but the 8-GPU error is identical). May I ask what configuration you used, and whether there are more specific environment requirements?

About a missing package

Great job! When I run the code, I find that I don't have this package; how can I get it? (screenshots omitted)

A bug

There's no adapter module in Aurora, is there?

Shall we delete the following lines?

Aurora/CP/vit.py

Lines 138 to 139 in 6ef56f1

if self.config and self.config['adapter_visual']:
self.adapter = adapter

The training set size of MSR-VTT

Could you please tell me the training set size used by Aurora for video retrieval on MSR-VTT? I found there are two annotation versions, 7k and 9k.

Question about the adapter.

Thanks for your wonderful work!

When I try to follow your code, I cannot find the code for the adapter imported in
Aurora/tree/main/CP/med.py, line 47:

from .adapter import Adapter_Lora

Do we need to install some packages to fix this?

Unable to reproduce your experimental results

We are unable to reproduce your experimental results on the MSRVTT-QA dataset; we get an accuracy of around 31. Could you explain why, or provide some checkpoints?

About the MSRVTT dataset

How can we load and use the MSRVTT dataset with this code? The location information in msrvtt.yaml is inconsistent with the content link of the dataset you provided.
