LLaRA: Large Language and Robotics Assistant

LLaRA: Supercharging Robot Learning Data for Vision-Language Policy [arXiv]

Xiang Li¹, Cristina Mata¹, Jongwoo Park¹, Kumara Kahatapitiya¹, Yoo Sung Jang¹, Jinghuan Shang¹, Kanchana Ranasinghe¹, Ryan Burgert¹, Mu Cai², Yong Jae Lee², and Michael S. Ryoo¹

¹Stony Brook University ²University of Wisconsin-Madison

Installation

Set Up Python Environment:

Follow the instructions to install the same Python environment as used by LLaVA.

conda create -n llara python=3.10 -y
conda activate llara
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia
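After the environment is created, it can help to confirm that the pinned versions above were actually installed. The following is a minimal sketch, not part of the repository: it uses only the standard library's `importlib.metadata`, and it assumes the packages are registered under the distribution names `torch`, `torchvision`, and `torchaudio`. The helper name `check_versions` is hypothetical.

```python
from importlib import metadata

# Versions pinned by the conda command above. The distribution names are an
# assumption (conda's "pytorch" package is normally registered as "torch").
EXPECTED = {"torch": "2.1.2", "torchvision": "0.16.2", "torchaudio": "2.1.2"}


def check_versions(expected=EXPECTED, get_version=metadata.version):
    """Return a {package: status} report comparing installed and pinned versions."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Drop a local build suffix such as "+cu121" before comparing.
        base = installed.split("+", 1)[0]
        report[name] = "ok" if base == wanted else f"mismatch (found {installed})"
    return report


if __name__ == "__main__":
    for package, status in check_versions().items():
        print(f"{package}: {status}")
```

Run it inside the activated `llara` environment; any entry other than `ok` suggests the environment differs from the one described here.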

Install Revised LLaVA:

Navigate to train-llava in this repo and install the llava package there:

cd train-llava && pip install -e ".[train]"
conda install cuda=12.1 cuda-compiler=12.1 cuda-nvcc=12.1 cuda-version=12.1 -c nvidia
pip install flash-attn --no-build-isolation

Install VIMABench:

Complete the setup for VIMABench.

git clone https://github.com/vimalabs/VimaBench && cd VimaBench
pip install -e .

Demo

Download the Pretrained Model:

Download the following model to ./checkpoints/:
- llava-1.5-7b-D-inBC + Aux(B) trained on VIMA-80k (Hugging Face)

More models are available in the Model Zoo.

Run the evaluation:

cd eval
# evaluate the model with oracle object detector
python3 eval-llara.py D-inBC-AuxB-VIMA-80k --model-path ../checkpoints/llava-1.5-7b-llara-D-inBC-Aux-B-VIMA-80k --prompt-mode hso

# the results will be saved to ../results/[hso]D-inBC-AuxB-VIMA-80k.json
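The result file name combines the prompt mode in square brackets with the run name. The sketch below is an illustration under that naming assumption only: it builds the expected path and reports the top-level shape of the JSON without assuming anything about its schema. The helper names `result_path` and `describe_json` are hypothetical and not part of the repository.

```python
import json
from pathlib import Path


def result_path(run_name, prompt_mode, results_dir="../results"):
    """Build the path of a result file, e.g. ../results/[hso]<run_name>.json."""
    return Path(results_dir) / f"[{prompt_mode}]{run_name}.json"


def describe_json(path):
    """Summarize the top-level structure of a JSON file without assuming a schema."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, dict):
        return f"dict with keys: {sorted(data)[:10]}"
    if isinstance(data, list):
        return f"list with {len(data)} entries"
    return type(data).__name__


if __name__ == "__main__":
    path = result_path("D-inBC-AuxB-VIMA-80k", "hso")
    if path.exists():
        print(path, "->", describe_json(path))
    else:
        print(f"{path} not found; run the evaluation first")
```

Note that the square brackets in the file name are special characters in shell and `glob` patterns, so quote the path in a shell and use `glob.escape` when matching it in Python.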

Check the results: Please refer to llara-result.ipynb

Quick Start Guide

Minimum Hardware Requirements:

Inference: Requires at least one GPU with a minimum of 24GB of GPU memory.
Training: Requires a system with at least 300GB of system RAM and four Ampere (or newer) GPUs, each equipped with a minimum of 24GB of memory.
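The GPU memory part of these requirements can be checked with a short script. This is a rough sketch, assuming `nvidia-smi` is on the PATH; it does not check the GPU architecture (Ampere or newer) or the 300GB of system RAM. The 24000 MiB threshold is an assumption, chosen because cards sold as 24GB may report slightly less than 24576 MiB. All function names are hypothetical.

```python
import subprocess

# Assumed threshold: cards marketed as 24GB may report a little under 24576 MiB.
MIN_GPU_MEMORY_MIB = 24_000


def parse_gpu_memory_mib(smi_output):
    """Parse one integer (MiB) per non-empty line of nvidia-smi output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def query_gpu_memory_mib():
    """Ask nvidia-smi for the total memory of every visible GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_memory_mib(result.stdout)


def check_hardware(memory_mib, min_mib=MIN_GPU_MEMORY_MIB):
    """Inference needs one qualifying GPU; training needs four."""
    qualifying = sum(1 for mib in memory_mib if mib >= min_mib)
    return {"inference": qualifying >= 1, "training": qualifying >= 4}


if __name__ == "__main__":
    print(check_hardware(query_gpu_memory_mib()))
```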

Prepare the Dataset:

Visit the datasets directory to prepare your dataset for training.

Finetune a LLaVA Model:

To start finetuning a LLaVA model, refer to the instructions in train-llava.

Evaluate the Trained Model:

Follow the steps in eval to assess the performance of your trained model.

Train a MaskRCNN for Object Detection:

If you want to train a MaskRCNN for object detection, check out train-maskrcnn for detailed steps.

Issues

If you encounter any issues or have questions about the project, please submit an issue on our GitHub issues page.

License

This project is licensed under the Apache-2.0 License - see the LICENSE file for details.

Support us

If you find this work useful in your research, please consider giving it a star ⭐ and citing our work:

@article{li2024llara,
  title={LLaRA: Supercharging Robot Learning Data for Vision-Language Policy},
  author={Li, Xiang and Mata, Cristina and Park, Jongwoo and Kahatapitiya, Kumara and Jang, Yoo Sung and Shang, Jinghuan and Ranasinghe, Kanchana and Burgert, Ryan and Cai, Mu and Lee, Yong Jae and Ryoo, Michael S.},
  journal={arXiv preprint arXiv:2406.20095},
  year={2024}
}

Thanks!

Problem about replicating results

Great work, and the code is easy to run.
However, I have run into a problem replicating the evaluation results. Here is my process:

1. I downloaded the checkpoint llava-1.5-7b-D-inBC + Aux(B) trained on VIMA-80k from Hugging Face.
2. I created a new empty directory, myresults, and ran: cd eval && python3 eval-llara.py D-inBC-AuxB-VIMA-80k --model-path ../checkpoints/llava-1.5-7b-llara-D-inBC-Aux-B-VIMA-80k --prompt-mode hso --output-path ../myresults/
3. I copied results/llara-result.ipynb to ./myresults.
4. In the ./myresults directory, I ran llara-result.ipynb to get the final results, but they are very poor.

Steps 2 and 3 are there to produce a new JSON result file.

What did I get wrong in this process? Could anyone point it out?

Also, thank you to the authors for sharing the training logs. According to ./checkpoints/llava-1.5-7b-llara-D-inBC-Aux-B-VIMA-80k/trainer_state.json, the learning rate changes during training. Which schedule was used? When I follow your guide, the learning rate stays at 2e-05 except during the warm-up stage.
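One way to compare the two runs is to read the logged learning rates straight from each trainer_state.json. The sketch below assumes the file follows the usual Hugging Face Trainer layout, with a `log_history` list whose training entries carry `step` and `learning_rate`. It only describes the shape of the logged curve and says nothing about which scheduler the authors configured. The function names are hypothetical.

```python
import json


def load_lr_history(path):
    """Return (step, learning_rate) pairs from a trainer_state.json file."""
    with open(path, encoding="utf-8") as f:
        state = json.load(f)
    return [
        (entry["step"], entry["learning_rate"])
        for entry in state.get("log_history", [])
        if "step" in entry and "learning_rate" in entry
    ]


def describe_schedule(history, rel_tol=1e-6):
    """Classify the learning-rate curve from its peak onward."""
    if not history:
        return "no learning-rate entries found"
    rates = [lr for _, lr in history]
    peak = max(rates)
    after_peak = rates[rates.index(peak):]
    if all(abs(lr - peak) <= rel_tol * peak for lr in after_peak):
        return "constant after warmup"
    if all(later <= earlier for earlier, later in zip(after_peak, after_peak[1:])):
        return "decaying after warmup"
    return "non-monotonic after warmup"


if __name__ == "__main__":
    state_file = (
        "./checkpoints/llava-1.5-7b-llara-D-inBC-Aux-B-VIMA-80k/trainer_state.json"
    )
    print(describe_schedule(load_lr_history(state_file)))
```

Running this on both the released checkpoint and a locally trained one would show whether the two curves differ in shape, which narrows the question down to the scheduler settings.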

