LLaVA++: Extending Visual Capabilities with LLaMA-3 and Phi-3

Oryx Models

* Equal contributions

Mohamed bin Zayed University of AI (MBZUAI)


📢 Latest Updates

  • Apr-26-24- Phi-3-V and LLaVA-3-V released: Excited to release the new integration of LLaVA with Phi-3 Mini Instruct and LLaMA-3 Instruct models! Hugging Face 🔥🔥🔥

💬 Introduction

This repository enhances the capabilities of the LLaVA 1.5 model by incorporating the latest LLMs released this week 🔥: Phi-3 Mini Instruct 3.8B and LLaMA-3 Instruct 8B.

๐Ÿ† Results: Phi-3-V and LLaVA-3-V

Comparison on benchmarks for instruction-following LMMs and academic-task-oriented datasets:

| Model | MMMU | POPE | MME | MMBench-en | MMBench-cn | SEED-all | SEED-img | SEED-vid | LLaVA-Wild | GQA | Science-QA | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LLaVA-v1.5-7B | 35.4 | 85.8 | 1510.7 | 64.3 | 58.3 | 58.6 | 66.1 | 37.3 | 65.4 | 62.0 | 66.8 | 60.0 |
| LLaVA-v1.5-13B | 36.4 | 85.9 | 1531.3 | 67.7 | 63.6 | 61.6 | 68.2 | 42.7 | 72.5 | 63.3 | 71.6 | 63.3 |
| LLaMA-3-V-8B | 37.1 | 84.2 | 1441.1 | 67.0 | 57.8 | 62.8 | 68.6 | 41.1 | 66.2 | 61.9 | 78.6 | 62.5 |
| Phi-3-V-3.8B | 37.8 | 85.6 | 1470.1 | 68.2 | 58.5 | 62.8 | 67.7 | 44.5 | 70.9 | 61.7 | 80.7 | 63.8 |

  • The average is computed excluding MME; second-best results are underlined.
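
As a sanity check on the Average column, the following minimal sketch recomputes it as a plain arithmetic mean over every benchmark except MME. The use of an unweighted mean is an assumption (the README only says MME is excluded), and `average_excluding` is a hypothetical helper name, not part of the repository.

```python
# Scores copied from the results table, in column order.
BENCHMARKS = ["MMMU", "POPE", "MME", "MMBench-en", "MMBench-cn", "SEED-all",
              "SEED-img", "SEED-vid", "LLaVA-Wild", "GQA", "Science-QA"]

SCORES = {
    "LLaVA-v1.5-7B": [35.4, 85.8, 1510.7, 64.3, 58.3, 58.6, 66.1, 37.3, 65.4, 62.0, 66.8],
    "LLaVA-v1.5-13B": [36.4, 85.9, 1531.3, 67.7, 63.6, 61.6, 68.2, 42.7, 72.5, 63.3, 71.6],
    "LLaMA-3-V-8B": [37.1, 84.2, 1441.1, 67.0, 57.8, 62.8, 68.6, 41.1, 66.2, 61.9, 78.6],
    "Phi-3-V-3.8B": [37.8, 85.6, 1470.1, 68.2, 58.5, 62.8, 67.7, 44.5, 70.9, 61.7, 80.7],
}


def average_excluding(scores, excluded=("MME",)):
    """Unweighted mean over all benchmarks whose name is not in `excluded`.

    Assumption: the Average column is a plain arithmetic mean.
    """
    kept = [s for name, s in zip(BENCHMARKS, scores) if name not in excluded]
    return sum(kept) / len(kept)


for model, scores in SCORES.items():
    print(f"{model}: {average_excluding(scores):.2f}")
```

Under this assumption the recomputed means agree with the Average column to within rounding. MME values in the table are in the thousands while every other column is below 100, so including MME would dominate the mean.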

🌟 LLaMA-3-V-8B full fine-tuning results - coming soon!

🤖 Model-Zoo

The following table provides an overview of the available models in our zoo. For each model, you can find links to its Hugging Face page.

| Model Name | Hugging Face Link | Summary |
|---|---|---|
| LLaVA-Phi-3-mini-4k-instruct-pretrain | Hugging Face | Pretrained on LCS-558K. |
| LLaVA-Phi-3-mini-4k-instruct-lora | Hugging Face | LoRA weights fine-tuned on LLaVA-Instruct-665K. |
| LLaVA-Phi-3-mini-4k-instruct | Hugging Face | Merged weights in HuggingFace format. |

| Model Name | Hugging Face Link | Summary |
|---|---|---|
| LLaVA-Meta-Llama-3-8B-Instruct-pretrain | Hugging Face | Pretrained on LCS-558K. |
| LLaVA-Meta-Llama-3-8B-Instruct-lora | Hugging Face | LoRA weights fine-tuned on LLaVA-Instruct-665K. |
| LLaVA-Meta-Llama-3-8B-Instruct | Hugging Face | Merged weights in HuggingFace format. |

Installation

```shell
git clone https://github.com/mbzuai-oryx/LLaVA-pp.git
cd LLaVA-pp
git submodule update --init --recursive
```

Packages you need to update from LLaVA:

```shell
pip install git+https://github.com/huggingface/transformers@a98c41798cf6ed99e1ff17e3792d6e06a2ff2ff3
```
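
Before moving on, it can help to confirm that the submodule checkout and the `transformers` install actually happened. The sketch below is a small preflight check under stated assumptions: `installed_version` and `submodule_ready` are hypothetical helper names, and the `LLaVA/llava` directory is inferred from the copy destinations used later in this README. It reports whether a package is installed but does not verify that it was built from the pinned commit.

```python
from importlib import metadata
from pathlib import Path
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def submodule_ready(repo_root: Path) -> bool:
    """True if the LLaVA submodule appears to be checked out.

    Assumption: an initialised submodule contains LLaVA/llava/, the
    directory that the copy steps in this README write into.
    """
    return (repo_root / "LLaVA" / "llava").is_dir()


if __name__ == "__main__":
    root = Path(".")  # run from the LLaVA-pp checkout
    print("LLaVA submodule present:", submodule_ready(root))
    print("transformers version:", installed_version("transformers") or "not installed")
```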

🚀 Phi-3-V

To integrate Phi-3-V with LLaVA, follow these steps to update the codebase:

```shell
# Copy necessary files
cp Phi-3-V/train.py LLaVA/llava/train/train.py
cp Phi-3-V/llava_phi3.py LLaVA/llava/model/language_model/llava_phi3.py
cp Phi-3-V/builder.py LLaVA/llava/model/builder.py
cp Phi-3-V/model__init__.py LLaVA/llava/model/__init__.py
cp Phi-3-V/main__init__.py LLaVA/llava/__init__.py
cp Phi-3-V/conversation.py LLaVA/llava/conversation.py

# Training commands (destination names match the scripts invoked in "Train Phi-3-V")
cp scripts/Phi3-V_pretrain.sh LLaVA/Phi3-V_pretrain.sh
cp scripts/Phi3-V_finetune_lora.sh LLaVA/Phi3-V_finetune_lora.sh
```
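
The file-copy step above can also be scripted so that nothing is overwritten unless every source file and destination directory exists. This is a minimal sketch, not part of the repository: `apply_copies` and `PHI3_V_FILES` are hypothetical names, the mapping covers only the six model files listed above, and the function defaults to a dry run.

```python
import shutil
from pathlib import Path

# Source -> destination pairs taken from the "Copy necessary files" commands.
PHI3_V_FILES = {
    "Phi-3-V/train.py": "LLaVA/llava/train/train.py",
    "Phi-3-V/llava_phi3.py": "LLaVA/llava/model/language_model/llava_phi3.py",
    "Phi-3-V/builder.py": "LLaVA/llava/model/builder.py",
    "Phi-3-V/model__init__.py": "LLaVA/llava/model/__init__.py",
    "Phi-3-V/main__init__.py": "LLaVA/llava/__init__.py",
    "Phi-3-V/conversation.py": "LLaVA/llava/conversation.py",
}


def apply_copies(repo_root, mapping, dry_run=True):
    """Validate every pair first, then copy (or only plan, when dry_run)."""
    root = Path(repo_root)
    problems = []
    for src, dst in mapping.items():
        if not (root / src).is_file():
            problems.append(f"missing source: {src}")
        if not (root / dst).parent.is_dir():
            problems.append(f"missing destination directory: {Path(dst).parent}")
    if problems:
        # Fail before touching anything so the checkout is never half-patched.
        raise FileNotFoundError("; ".join(problems))

    planned = []
    for src, dst in mapping.items():
        if not dry_run:
            shutil.copy2(root / src, root / dst)  # overwrites the LLaVA file
        planned.append((src, dst))
    return planned
```

Calling `apply_copies(".", PHI3_V_FILES)` from the repository root lists what would be copied; passing `dry_run=False` performs the copies. Validating up front means a missing submodule shows up as one clear error rather than a partially patched tree.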

Train Phi-3-V

  1. Pre-train

```shell
cd LLaVA
bash Phi3-V_pretrain.sh
```

  2. Finetune

```shell
cd LLaVA
bash Phi3-V_finetune_lora.sh
```

🚀 LLaMA-3-V

To integrate LLaMA-3-V with LLaVA, follow these steps to update the codebase:

```shell
# Copy necessary files
cp LLaMA-3-V/train.py LLaVA/llava/train/train.py
cp LLaMA-3-V/conversation.py LLaVA/llava/conversation.py
cp LLaMA-3-V/builder.py LLaVA/llava/model/builder.py
cp LLaMA-3-V/llava_llama.py LLaVA/llava/model/language_model/llava_llama.py

# Training commands
cp scripts/LLaMA3-V_pretrain.sh LLaVA/LLaMA3-V_pretrain.sh
cp scripts/LLaMA3-V_finetune_lora.sh LLaVA/LLaMA3-V_finetune_lora.sh
```
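
Note that the LLaMA-3-V and Phi-3-V copy steps write to some of the same paths inside `LLaVA/`. The short sketch below lists the overlap, using only the destinations that appear in the two sets of `cp` commands in this README; the variable names are illustrative.

```python
# Destinations written by the Phi-3-V copy commands.
PHI3_V_TARGETS = {
    "LLaVA/llava/train/train.py",
    "LLaVA/llava/model/language_model/llava_phi3.py",
    "LLaVA/llava/model/builder.py",
    "LLaVA/llava/model/__init__.py",
    "LLaVA/llava/__init__.py",
    "LLaVA/llava/conversation.py",
}

# Destinations written by the LLaMA-3-V copy commands.
LLAMA3_V_TARGETS = {
    "LLaVA/llava/train/train.py",
    "LLaVA/llava/conversation.py",
    "LLaVA/llava/model/builder.py",
    "LLaVA/llava/model/language_model/llava_llama.py",
}

# Files that both integrations overwrite in the same checkout.
shared = sorted(PHI3_V_TARGETS & LLAMA3_V_TARGETS)
for path in shared:
    print(path)
# prints:
# LLaVA/llava/conversation.py
# LLaVA/llava/model/builder.py
# LLaVA/llava/train/train.py
```

If both integrations are applied to one checkout, whichever set of files is copied last determines the contents of these three paths. This README does not say whether the two versions are interchangeable, so a separate clone per model is the cautious choice.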

Train LLaMA-3-V

  1. Pre-train

```shell
cd LLaVA
bash LLaMA3-V_pretrain.sh
```

  2. Finetune

```shell
cd LLaVA
bash LLaMA3-V_finetune_lora.sh
```
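
Both models follow the same two-stage recipe: pre-train, then LoRA fine-tune. A minimal Python sketch of running the stages in order is shown below. It assumes the scripts sit directly in `LLaVA/` under the names used in the training commands (`<prefix>_pretrain.sh` and `<prefix>_finetune_lora.sh`); `stage_scripts` and `run_stages` are hypothetical helpers, not part of the repository.

```python
import subprocess
from pathlib import Path


def stage_scripts(prefix):
    """Script names for the two stages, e.g. prefix 'Phi3-V' or 'LLaMA3-V'."""
    return [f"{prefix}_pretrain.sh", f"{prefix}_finetune_lora.sh"]


def run_stages(prefix, llava_dir="LLaVA", runner=subprocess.run):
    """Run pre-training, then LoRA fine-tuning, from inside `llava_dir`.

    All scripts are checked up front so a missing fine-tune script is
    reported before pre-training starts. `runner` is injectable for testing.
    """
    workdir = Path(llava_dir)
    scripts = stage_scripts(prefix)
    missing = [s for s in scripts if not (workdir / s).is_file()]
    if missing:
        raise FileNotFoundError(f"scripts not found in {workdir}: {missing}")
    for script in scripts:
        # check=True raises CalledProcessError if a stage exits non-zero,
        # so fine-tuning never starts after a failed pre-training run.
        runner(["bash", script], cwd=workdir, check=True)
```

For example, `run_stages("LLaMA3-V")` called from the repository root would run both LLaMA-3-V stages in sequence.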

๐Ÿ™ Acknowledgement

We are thankful to LLaVA, and lmms-eval for releasing their models and code as open-source contributions.

In case if you face any issues or have any questions, please feel free to create an issue or reach out at [email protected] & [email protected].


Contributors

mmaaz60, hanoonar
