Giter VIP home page Giter VIP logo

vary's Introduction

Haoran Wei*, Lingyu Kong*, Jinyue Chen, Liang Zhao, Zheng Ge, Jinrong Yang, Jianjian Sun, Chunrui Han, Xiangyu Zhang

Release

  • [2023/12/11] We released the online demo, have fun!
  • [2023/12/11] We released the codes of Vary (train and inference)!

Code License Data License Usage and License Notices: The data, code, and checkpoint are intended and licensed for research use only. They are also restricted to use that follow the license agreement of LLaMA, Vicuna, GPT-4, Qwen, and LLaVA.

Contents

Install

  1. Clone this repository and navigate to the Vary folder
git clone https://github.com/Ucas-HaoranWei/Vary.git
cd Vary
  1. Install Package
conda create -n vary python=3.10 -y
conda activate vary
pip install e .
  1. Install Flash-Attention
pip install ninja
pip install flash-attn --no-build-isolation

Vary Weights

  • Due to download speed issues with Baiduyun, we have temporarily closed the download link. Our weights will be reorganized and open source again in the next few days. If you are in urgent need of weights for your research recently, please contact me by email.
  • Download the CLIP-VIT-L in Hugging Face

Demo

  1. Update the CLIP-VIT path in the codes (/cache/vit-large-patch14/) to your path.

python vary/demo/run_qwen_vary.py  --model-name  /vary/model/path/ --image-file /an/image/file.png

Train

  • We currently do not plan to open source the weights of the intermediate.
  • However, we release the train codes. So you can train on your own dataset. If you want to do this, you can try this:
  1. For Vary-base (one machine, if you have multiple machines you need to prepare your host file)
deepspeed   Vary/train/train_qwen_vary.py  --deepspeed /Vary/zero_config/zero2.json
            --model_name_or_path /Qwen-7B/path/
            --vision_tower /vit-large-patch14/path/
            --freeze_vision_tower True
            --freeze_lm_model False
            --vision_select_layer  -2
            --use_im_start_end True
            --bf16 True
            --per_device_eval_batch_size 4
            --gradient_accumulation_steps 1
            --evaluation_strategy "no"
            --save_strategy "steps"
            --save_steps 5000
            --save_total_limit 1
            --weight_decay 0.
            --warmup_ratio 0.03
            --lr_scheduler_type "cosine"
            --logging_steps 1 --tf32 True
            --model_max_length 4096
            --gradient_checkpointing True
            --dataloader_num_workers 4
            --report_to none
            --per_device_train_batch_size 4
            --num_train_epochs 1
            --learning_rate 5e-5
            --datasets  data_name1+data_name2+data_name3
            --output_dir /path/to/output/
  1. For Vary-tiny
deepspeed   Vary/train/train_opt.py  --deepspeed /Vary/zero_config/zero2.json
            --model_name_or_path /opt125m/path/
            --conversation_version opt
            --freeze_vision_tower False
            --freeze_lm_model False
            --use_im_start_end True
            --bf16 True
            --per_device_eval_batch_size 4
            --gradient_accumulation_steps 1
            --evaluation_strategy "no"
            --save_strategy "steps"
            --save_steps 5000
            --save_total_limit 1
            --weight_decay 0.
            --warmup_ratio 0.03
            --lr_scheduler_type "cosine"
            --logging_steps 1 --tf32 True
            --model_max_length 4096
            --gradient_checkpointing True
            --dataloader_num_workers 4
            --report_to none
            --per_device_train_batch_size 16
            --num_train_epochs 1
            --learning_rate 5e-5
            --datasets  data_name1+data_name2+data_name3
            --output_dir /path/to/output/

Contact

If you have any questions related to the code or the paper, feel free to email ([email protected]).

Acknowledgement

  • LLaVA: the codebase we built upon!
  • Qwen: the LLM base model of Vary, which is good at both English and Chinese!

Citation

If you find our work useful in your research, please consider citing Vary:

@article{wei2023vary,
  title={Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models},
  author={Wei, Haoran and Kong, Lingyu and Chen, Jinyue and Zhao, Liang and Ge, Zheng and Yang, Jinrong and Sun, Jianjian and Han, Chunrui and Zhang, Xiangyu},
  journal={arXiv preprint arXiv:2312.06109},
  year={2023}
}

vary's People

Contributors

ucas-haoranwei avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.