kjerk / instructblip-pipeline
A multimodal inference pipeline that integrates InstructBLIP with textgen-webui for Vicuna and related models.
License: BSD 3-Clause "New" or "Revised" License
I was super disappointed when I tried to run InstructBLIP out of the box and it just didn't work. I saw your comment on the LAVIS repo and really appreciate you trying to make this work!
I was wondering whether this could be implemented or adapted outside of Oobabooga. I wasn't able to get it running there to test it out.
I was getting an error, but to be honest I didn't spend much time digging into it.
I'm working on an image captioning project for archival photos. I've run Salesforce/blip2-opt-2.7b locally and it's quite good! However, I would love to explore using LLMs, particularly the Wizard models, to add more user-guided detail to these images. The code to get blip2-opt running is amazingly trivial. The Oobabooga API, however, has given me some problems.
Is it possible to run InstructBLIP outside of Oobabooga without a lot of hassle, or are you piggybacking off its ExLlama and other model-loading infrastructure?
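For reference, here is a minimal standalone sketch that is not part of this pipeline. It assumes Hugging Face transformers 4.31 or later, which ships `InstructBlipProcessor` and `InstructBlipForConditionalGeneration`, so InstructBLIP can be driven much like BLIP-2. The checkpoint `Salesforce/instructblip-vicuna-7b`, the helper names `choose_dtype_name` and `describe_image`, and the generation settings are illustrative assumptions.

```python
"""Hypothetical standalone InstructBLIP helper (assumes transformers >= 4.31)."""

from typing import Optional

# Assumed checkpoint; the Flan-T5 variants (e.g. "Salesforce/instructblip-flan-t5-xl")
# load through the same classes.
DEFAULT_MODEL_ID = "Salesforce/instructblip-vicuna-7b"


def choose_dtype_name(cuda_available: bool) -> str:
    """Use half precision on GPU to save memory; fall back to float32 on CPU."""
    return "float16" if cuda_available else "float32"


def describe_image(
    image_path: str,
    prompt: str,
    model_id: str = DEFAULT_MODEL_ID,
    max_new_tokens: int = 128,
    device: Optional[str] = None,
) -> str:
    """Answer a free-form instruction about a single image with InstructBLIP."""
    # Heavy dependencies are imported lazily so this module imports without them.
    import torch
    from PIL import Image
    from transformers import InstructBlipForConditionalGeneration, InstructBlipProcessor

    device = device or ("cuda" if torch.cuda.is_available() else "cpu")
    dtype = getattr(torch, choose_dtype_name(device == "cuda"))

    processor = InstructBlipProcessor.from_pretrained(model_id)
    model = InstructBlipForConditionalGeneration.from_pretrained(
        model_id, torch_dtype=dtype
    ).to(device)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(device)
    # Only the image tensor is floating point; cast it to match the model weights.
    inputs["pixel_values"] = inputs["pixel_values"].to(dtype)

    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
```

A call such as `describe_image("scan.jpg", "Describe this photograph in detail.")` (the filename is a placeholder) downloads the checkpoint on first use. In fp16 the Vicuna-7B variant is unlikely to fit on an 8 GB card; the Flan-T5-XL variant is smaller.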
I'm using Oobabooga on Windows 11 with an NVIDIA GTX 1080 (8 GB). I've activated the multimodal extension and installed your pipeline as suggested in the README.
Then I load this model: https://huggingface.co/Yhyu13/instructblip-vicuna-7b-gptq-4bit
It is the only InstructBLIP GPTQ model I can find on Hugging Face.
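A quick back-of-the-envelope check shows why a 4-bit build is attractive on an 8 GB card. This sketch counts weight storage only, as parameters × bits ÷ 8; the vision encoder, Q-Former, activations, and CUDA overhead all add more, so treat the numbers as lower bounds.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), ignoring all overhead."""
    # billions of parameters * bits per parameter / 8 bits per byte = gigabytes
    return params_billion * bits_per_param / 8


for bits in (16, 8, 4):
    print(f"7B language model at {bits}-bit: ~{weight_memory_gb(7, bits):.1f} GB")
```

At 16-bit the 7B language model alone needs about 14 GB, well beyond 8 GB, while at 4-bit it drops to about 3.5 GB.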
After clicking the Load button, I get the following error message:
2023-09-22 20:19:46 ERROR:Failed to load the model.
Traceback (most recent call last):
File "C:\apps\oobabooga_windows\text-generation-webui\modules\ui_model_menu.py", line 194, in load_model_wrapper
shared.model, shared.tokenizer = load_model(shared.model_name, loader)
File "C:\apps\oobabooga_windows\text-generation-webui\modules\models.py", line 76, in load_model
output = load_func_map[loader](model_name)
File "C:\apps\oobabooga_windows\text-generation-webui\modules\models.py", line 302, in AutoGPTQ_loader
return modules.AutoGPTQ_loader.load_quantized(model_name)
File "C:\apps\oobabooga_windows\text-generation-webui\modules\AutoGPTQ_loader.py", line 57, in load_quantized
model = AutoGPTQForCausalLM.from_quantized(path_to_model, **params)
File "C:\apps\oobabooga_windows\installer_files\env\lib\site-packages\auto_gptq\modeling\auto.py", line 87, in from_quantized
model_type = check_and_get_model_type(model_name_or_path, trust_remote_code)
File "C:\apps\oobabooga_windows\installer_files\env\lib\site-packages\auto_gptq\modeling_utils.py", line 149, in check_and_get_model_type
raise TypeError(f"{config.model_type} isn't supported yet.")
TypeError: instructblip isn't supported yet.
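The last two frames show the cause: AutoGPTQ checks the `model_type` declared in the checkpoint's config and raises `TypeError` when that type is not on its supported list, and this checkpoint declares `instructblip`. A small pre-flight check along the same lines can catch the mismatch before a load is attempted. This is a sketch, not AutoGPTQ's code; `ASSUMED_SUPPORTED` is an illustrative subset, not AutoGPTQ's actual list, which varies between releases.

```python
import json
from pathlib import Path
from typing import Container, Tuple

# Illustrative subset only; AutoGPTQ's real list differs between releases.
ASSUMED_SUPPORTED = frozenset({"llama", "opt", "gpt_neox", "gptj", "bloom"})


def check_model_type(
    config: dict, supported: Container[str] = ASSUMED_SUPPORTED
) -> Tuple[str, bool]:
    """Return the declared model type and whether it is in the supported set."""
    model_type = config.get("model_type", "")
    return model_type, model_type in supported


def preflight(
    model_dir: str, supported: Container[str] = ASSUMED_SUPPORTED
) -> Tuple[str, bool]:
    """Read a Hugging Face-style config.json from model_dir and check its model type."""
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    return check_model_type(config, supported)
```

For the checkpoint above, `preflight` would report `("instructblip", False)`, matching the `TypeError` in the traceback.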
AFAIK, among large visual question answering models only the Flan-T5 variants are "uncensored", and they can generate more creative answers than the Vicuna variant, LLaVA, and MiniGPT.
https://huggingface.co/Salesforce/instructblip-flan-t5-xl
https://huggingface.co/Salesforce/instructblip-flan-t5-xxl
While AutoGPTQ does not support T5 models, GPTQ-for-LLaMa does.
I tried to modify the code to make it work, but I don't know much about AI and my skills are lacking.