chendelong1999 / polite-flamingo

🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)

Home Page: https://arxiv.org/abs/2307.01003

Python 100.00%

large-language-models multimodal-large-language-models visual-instruction-tuning

polite-flamingo's Introduction

Delong Chen (陈德龙) is a first-year Ph.D. student at HKUST under the supervision of Prof. Pascale Fung. He received a bachelor's degree in computer science from Hohai University in 2021, where he was advised by Prof. Fan Liu. He then spent two gap years interning at MEGVII, MSRA, and Xiaobing.AI. He now works on vision-language and representation learning.

polite-flamingo's People

Forkers

youngergao zhyj3038 wdr-ra02

polite-flamingo's Issues

Error when running python gradio_demo.py

Running python gradio_demo.py produces the following error:
Could not create share link. Please check your internet connection or our status page: https://status.gradio.app.
Traceback (most recent call last):
  File "/root/miniconda3/envs/openflamingo/lib/python3.9/site-packages/gradio/routes.py", line 439, in run_predict
    output = await app.get_blocks().process_api(
  File "/root/miniconda3/envs/openflamingo/lib/python3.9/site-packages/gradio/blocks.py", line 1389, in process_api
    result = await self.call_function(
  File "/root/miniconda3/envs/openflamingo/lib/python3.9/site-packages/gradio/blocks.py", line 1094, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/root/miniconda3/envs/openflamingo/lib/python3.9/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/root/miniconda3/envs/openflamingo/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2134, in run_sync_in_worker_thread
    return await future
  File "/root/miniconda3/envs/openflamingo/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/root/miniconda3/envs/openflamingo/lib/python3.9/site-packages/gradio/utils.py", line 704, in wrapper
    response = f(*args, **kwargs)
  File "/root/paddlejob/workspace/env_run/code/polite-flamingo/gradio_demo.py", line 185, in bot
    inference_results = inferencer(
  File "/root/paddlejob/workspace/env_run/code/polite-flamingo/gradio_demo.py", line 47, in __call__
    return clever_flamingo_api(prompt, imgpaths)
  File "/root/paddlejob/workspace/env_run/code/polite-flamingo/gradio_demo.py", line 45, in clever_flamingo_api
    return js['result']['response']
version:
gradio: 3.37.0
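
The traceback above stops at return js['result']['response'] and the final exception line is not included, so the actual cause is not shown. One plausible but unconfirmed cause is that the backend returned a payload without the expected keys (for example, an error message instead of a result). The following is a minimal sketch of a more defensive lookup, assuming js is the decoded JSON object. The helper name extract_response and its error messages are hypothetical and not part of the repository.

```python
from typing import Any


def extract_response(js: Any) -> str:
    """Return js['result']['response'], or raise a descriptive error.

    Hypothetical helper: only the 'result' and 'response' keys come from
    the traceback; everything else here is an assumption.
    """
    if not isinstance(js, dict):
        raise ValueError(
            f"Expected a JSON object from the API, got {type(js).__name__}"
        )
    result = js.get("result")
    if not isinstance(result, dict) or "response" not in result:
        # Show the whole payload so the real server-side error is visible.
        raise ValueError(f"Unexpected API payload (no result.response): {js!r}")
    return result["response"]
```

Swapping this in for the bare lookup would not fix the underlying problem, but it would print what the API actually returned, which may help narrow it down.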

Is the only trainable part the LoRA-adapted LLM?

https://github.com/ChenDelong1999/polite_flamingo/blob/bf8ef5e103bf7c86a2a4a2b5eae3f89c8580cfb6/polite_flamingo/src/factory.py#L93-L101

From my reading of the code above, is the only trainable part of PF the LoRA-adapted LLM?

Thanks for your time in advance!
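
One way to check this empirically, independent of how factory.py is written, is to count trainable parameters per top-level submodule of the built model. The sketch below assumes a PyTorch model and relies only on the standard named_parameters(), requires_grad, and numel() interfaces. The function name summarize_trainable is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize_trainable(
    named_params: Iterable[Tuple[str, object]], depth: int = 1
) -> Dict[str, Tuple[int, int]]:
    """Map each parameter-name prefix to (trainable, total) element counts.

    `named_params` is expected to be `model.named_parameters()` of a
    torch.nn.Module; any objects exposing `.requires_grad` and `.numel()`
    work. `depth` is how many dotted name components form the prefix.
    """
    counts = defaultdict(lambda: [0, 0])
    for name, param in named_params:
        prefix = ".".join(name.split(".")[:depth])
        n = param.numel()
        counts[prefix][1] += n
        if param.requires_grad:
            counts[prefix][0] += n
    return {prefix: (c[0], c[1]) for prefix, c in counts.items()}
```

Calling summarize_trainable(model.named_parameters()) after the model is constructed lists, for each submodule, how many parameters would receive gradients. A submodule reporting zero trainable parameters is frozen.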

All Image data

This work is wonderful.
However, while downloading the dataset, I found that the released resized_images.zip on Hugging Face does not contain all the images referenced in PF-1M.json. Do you plan to release all the images referenced in PF-1M.json?
Thanks~
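
To quantify the gap, one could list which referenced image paths are absent from the unzipped archive. The sketch below makes two assumptions that are not confirmed by the authors: that image references appear in text fields wrapped in <img_path> tags (the form quoted in the Dataset-PointingQA issue), and that the unzipped folder mirrors those relative paths. The function names are hypothetical, and loading PF-1M.json into a list of strings is left to the reader because its schema is not described here.

```python
import os
import re
from typing import Iterable, List, Set

# Matches <img_path>...<img_path> as quoted in the PointingQA issue;
# a conventional closing tag </img_path> is accepted as well.
IMG_TAG = re.compile(r"<img_path>(.*?)</?img_path>")


def referenced_paths(texts: Iterable[str]) -> Set[str]:
    """Collect every path found between img_path tags in the given strings."""
    paths: Set[str] = set()
    for text in texts:
        paths.update(p.strip() for p in IMG_TAG.findall(text))
    return paths


def missing_images(texts: Iterable[str], image_root: str) -> List[str]:
    """Return the referenced paths that do not exist under image_root."""
    return sorted(
        p
        for p in referenced_paths(texts)
        if not os.path.isfile(os.path.join(image_root, p.lstrip("/")))
    )
```

Grouping the returned paths by their first directory component would show which source datasets are missing from the archive.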

Timeline for code release?

Dear authors,

Great work, and I really enjoyed reading the paper -- I appreciate the effort taken to ensure high-quality data for instruction tuning LMMs. The results in Table 2 (multi-image reasoning) are also impressive -- to the best of my knowledge, few other foundational LMMs perform well on multi-image reasoning benchmarks. I would like to try the models (CleverFlamingo), especially on these tasks.
Do you have an estimated timeline for releasing the code and the CleverFlamingo checkpoints? If possible, could you also release the evaluation scripts used for the numbers in Table 2?

Looking forward to the release!

Dataset-PointingQA

Hello authors. Thanks for your effort in contributing this dataset. I am having trouble matching the images in PF-1M to the open-source [PointingQA] dataset. For example,
<img_path>/pointingga-main/Datasets/LookTwiceQA/images_with_points_train/train_42636.jpg<img_path>
How can I find this image in [PointingQA] or [Visual Genome]?

