
opengvlab / interngpt

InternGPT (iGPT) is an open source demo platform where you can easily showcase your AI models. It now supports DragGAN, ChatGPT, ImageBind, GPT-4-like multimodal chat, SAM, interactive image editing, and more. Try it at igpt.opengvlab.com (an online demo system supporting DragGAN, ChatGPT, ImageBind, and SAM).

Home Page: https://igpt.opengvlab.com

License: Apache License 2.0

Python 98.11% C++ 0.24% Cuda 1.36% Dockerfile 0.28%
chatgpt foundation-model gpt gpt-4 gradio husky image-captioning langchain llm multimodal

interngpt's Introduction

[Chinese documentation]

The project is still under construction. We will continue to update it, and we welcome contributions and pull requests from the community.


🤖💬 InternGPT [Paper]

InternGPT (short for iGPT) / InternChat (short for iChat) is a pointing-language-driven visual interactive system that lets you interact with ChatGPT by clicking, dragging, and drawing with a pointing device. The name InternGPT stands for interaction, nonverbal, and ChatGPT. Unlike existing interactive systems that rely on pure language, iGPT incorporates pointing instructions, which significantly improves the efficiency of communication between users and chatbots, as well as the accuracy of chatbots on vision-centric tasks, especially in complicated visual scenarios. In addition, iGPT uses an auxiliary control mechanism to improve the control capability of the LLM, and a large vision-language model termed Husky is fine-tuned for high-quality multi-modal dialogue (impressing ChatGPT-3.5-turbo with 93.89% GPT-4 Quality).

🤖💬 Online Demo

InternGPT is online (see https://igpt.opengvlab.com). Let's try it!

[NOTE] You may have to wait in a long queue. Alternatively, you can clone our repo and run it on your own GPU.

Video Demo with DragGAN:

dragGAN_demo2.mp4

Video Demo with ImageBind:

video_demo_with_imagebind.mp4

iGPT Video Demo:

online_demo.mp4

🥳 🚀 What's New

  • (2023.06.19) We optimized GPU memory usage when executing the tools. Please refer to Get Started.

  • (2023.06.19) We updated INSTALL.md, which now provides more detailed instructions for setting up the environment.

  • (2023.05.31) We regret that, for emergency reasons, we have had to suspend the online demo. To experience all the features, please deploy iGPT locally.

  • (2023.05.24) 🎉🎉🎉 We now support DragGAN! Please see the video demo for usage. Try this awesome feature: Demo. (We now support a fully functional DragGAN: you can drag points and use custom images. See the video demo for usage; the reproduced DragGAN code is here, and the online demo is here.)

  • (2023.05.18) We now support ImageBind. Please see the video demo for usage.

  • (2023.05.15) The model_zoo, including HuskyVQA, has been released! Try it on your local machine!

  • (2023.05.15) Our code is also publicly available on Hugging Face! You can duplicate the repository and run it on your own GPUs.

🧭 User Manual

Update:

(2023.05.24) We now support DragGAN. You can try it as follows:

  • Click the button New Image;
  • Click the image to place points: blue denotes a start point and red denotes an end point;
  • Make sure the number of blue points equals the number of red points, then click the button Drag It;
  • After processing, you will receive an edited image and a video that visualizes the editing process.
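The rule in the steps above (every blue start point needs a matching red end point) can be sketched as a small check. This is only an illustration: `pair_drag_points` and the `(x, y)` tuple representation are hypothetical and are not taken from the iGPT code.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # hypothetical (x, y) pixel coordinate of one click


def pair_drag_points(starts: List[Point], ends: List[Point]) -> List[Tuple[Point, Point]]:
    """Pair each blue start point with the red end point at the same index."""
    if not starts:
        raise ValueError("click at least one start/end pair before dragging")
    if len(starts) != len(ends):
        raise ValueError(
            f"{len(starts)} start point(s) but {len(ends)} end point(s)"
        )
    return list(zip(starts, ends))


if __name__ == "__main__":
    # Two drags: (10, 20) -> (15, 25) and (40, 50) -> (60, 50).
    for start, end in pair_drag_points([(10, 20), (40, 50)], [(15, 25), (60, 50)]):
        print(f"drag {start} -> {end}")
```

A check like this would run before Drag It is processed, so that an unmatched point is reported to the user instead of failing later.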

(2023.05.18) We now support ImageBind. If you want to generate a new image conditioned on audio, you can upload an audio file in advance:

  • To generate a new image from a single audio file, send a message like: "generate a real image from this audio";
  • To generate a new image from audio and text, send a message like: "generate a real image from this audio and {your prompt}";
  • To generate a new image from audio and an image, upload an image and then send a message like: "generate a new image from above image and audio".

Main features:

After uploading an image, you can have a multi-modal dialogue by sending messages like: "what is it in the image?" or "what is the background color of image?".
You can also interactively operate on, edit, or generate images as follows:

  • Click the image and press the button Pick to visualize the segmented region, or press the button OCR to recognize the words at the chosen position;
  • To remove the masked region in the image, send a message like: "remove the masked region";
  • To replace the masked region in the image, send a message like: "replace the masked region with {your prompt}";
  • To generate a new image, send a message like: "generate a new image based on its segmentation describing {your prompt}";
  • To create a new image from your scribble, press the button Whiteboard and draw on the board. After drawing, press the button Save and send a message like: "generate a new image based on this scribble describing {your prompt}".

🗓️ Schedule

  • Support VisionLLM
  • Support Chinese
  • Support MOSS
  • More powerful foundation models based on InternImage and InternVideo
  • More accurate interactive experience
  • OpenMMLab toolkit
  • Web page & code generation
  • Support search engine
  • Low cost deployment
  • Support DragGAN
  • Support ImageBind
  • Response verification for agent
  • Prompt optimization
  • User manual and video demo
  • Support voice assistant
  • Support click interaction
  • Interactive image editing
  • Interactive image generation
  • Interactive visual question answering
  • Segment anything
  • Image inpainting
  • Image caption
  • Image matting
  • Optical character recognition
  • Action recognition
  • Video caption
  • Video dense caption
  • Video highlight interpretation

🏠 System Overview

arch

🎁 Major Features

Remove the masked object

Interactive image editing

Image generation

Interactive visual question answer

Interactive image generation

Video highlight interpretation

🛠️ Installation

See INSTALL.md

👨‍🏫 Get Started

Run the following command to start a Gradio service for our basic features:

python -u app.py --load "HuskyVQA_cuda:0,SegmentAnything_cuda:0,ImageOCRRecognition_cuda:0" --port 3456 -e

If you want to enable the voice assistant, use openssl to generate a certificate:

mkdir certificate
openssl req -x509 -newkey rsa:4096 -keyout certificate/key.pem -out certificate/cert.pem -sha256 -days 365 -nodes

and then run:

python -u app.py --load "HuskyVQA_cuda:0,SegmentAnything_cuda:0,ImageOCRRecognition_cuda:0" \
--port 3456 --https -e
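Before launching with --https, it may help to confirm that the two files written by the openssl command exist. The sketch below assumes the `certificate/key.pem` and `certificate/cert.pem` paths from that command; `missing_cert_files` is a hypothetical helper, not part of app.py, and whether app.py reads exactly these paths is an assumption.

```python
from pathlib import Path
from typing import List


def missing_cert_files(cert_dir: str = "certificate") -> List[str]:
    """Return the expected certificate files that are absent from cert_dir."""
    expected = ("key.pem", "cert.pem")  # names used in the openssl command
    base = Path(cert_dir)
    return [name for name in expected if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_cert_files()
    if missing:
        print("Generate the certificate first; missing:", ", ".join(missing))
    else:
        print("Certificate files found.")
```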

For all features of our iGPT, you need to run:

python -u app.py \
--load "ImageOCRRecognition_cuda:0,Text2Image_cuda:0,SegmentAnything_cuda:0,ActionRecognition_cuda:0,VideoCaption_cuda:0,DenseCaption_cuda:0,ReplaceMaskedAnything_cuda:0,LDMInpainting_cuda:0,SegText2Image_cuda:0,ScribbleText2Image_cuda:0,Image2Scribble_cuda:0,Image2Canny_cuda:0,CannyText2Image_cuda:0,StyleGAN_cuda:0,Anything2Image_cuda:0,HuskyVQA_cuda:0" \
-p 3456 --https -e

Note that the -e flag can save a lot of memory.

Selectively Loading Features

If you only want to try DragGAN, you only need to load StyleGAN and open the "DragGAN" tab:

python -u app.py --load "StyleGAN_cuda:0" --tab "DragGAN" --port 3456 --https -e

In this situation, you can only use the functions of DragGAN, which frees you from some dependencies that you are not interested in.
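The --load value used in the commands above is a comma-separated list of `<Model>_<device>` entries, and the startup logs quoted in the issues show it turned into a dictionary such as `load_dict={'StyleGAN': 'cuda:0'}`. A minimal sketch of that mapping follows; `parse_load_spec` is a hypothetical name, and the real parsing in app.py may differ.

```python
from typing import Dict


def parse_load_spec(spec: str) -> Dict[str, str]:
    """Map 'ModelA_cuda:0,ModelB_cpu' to {'ModelA': 'cuda:0', 'ModelB': 'cpu'}."""
    load_dict: Dict[str, str] = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        # Split on the last underscore so the device part stays intact.
        name, sep, device = entry.rpartition("_")
        if not sep or not name or not device:
            raise ValueError(f"expected '<Model>_<device>', got {entry!r}")
        load_dict[name] = device
    return load_dict


if __name__ == "__main__":
    print(parse_load_spec("StyleGAN_cuda:0"))  # {'StyleGAN': 'cuda:0'}
```

Because each entry carries its own device, the same format can place different models on different GPUs, as in the `HuskyVQA_cuda:1,SegmentAnything_cuda:2` command that appears in one of the issues.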

🎫 License

This project is released under the Apache 2.0 license.

🖊️ Citation

If you find this project useful in your research, please consider citing:

@article{2023interngpt,
  title={InternGPT: Solving Vision-Centric Tasks by Interacting with ChatGPT Beyond Language},
  author={Liu, Zhaoyang and He, Yinan and Wang, Wenhai and Wang, Weiyun and Wang, Yi and Chen, Shoufa and Zhang, Qinglong and Lai, Zeqiang and Yang, Yang and Li, Qingyun and Yu, Jiashuo and others},
  journal={arXiv preprint arXiv:2305.05662},
  year={2023}
}

🤝 Acknowledgement

Thanks to the following open-source projects:

Hugging Face, LangChain, TaskMatrix, SAM, Stable Diffusion, ControlNet, InstructPix2Pix, BLIP, Latent Diffusion Models, EasyOCR, ImageBind, DragGAN

You are welcome to discuss with us and help us continuously improve the user experience of InternGPT.

If you want to join our WeChat group, please scan the following QR code to add our assistant as a WeChat friend:

image

interngpt's People

Contributors

czczup, eltociear, erfeicui, g-z-w, jnc-nj, liu-zhy, rajathbharadwaj, weikx, whai362, yinanhe, zeqiang-lai

interngpt's Issues

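Download fails both with Clash enabled and with it disabled

With the VPN on, the error is:
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProxyError('Cannot connect to proxy.', OSError(0, 'Error'))': /simple/draggan/
With the VPN off, the download runs for a while and then fails.
After less than a minute the speed drops to 0 and the following error appears: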
ERROR: Exception:
Traceback (most recent call last):
File "C:\Users\Administrator.conda\envs\draggan\lib\site-packages\pip_vendor\urllib3\response.py", line 437, in _error_catcher
yield
File "C:\Users\Administrator.conda\envs\draggan\lib\site-packages\pip_vendor\urllib3\response.py", line 560, in read
data = self._fp_read(amt) if not fp_closed else b""
File "C:\Users\Administrator.conda\envs\draggan\lib\site-packages\pip_vendor\urllib3\response.py", line 526, in _fp_read
return self._fp.read(amt) if amt is not None else self._fp.read()
File "C:\Users\Administrator.conda\envs\draggan\lib\site-packages\pip_vendor\cachecontrol\filewrapper.py", line 90, in read
data = self.__fp.read(amt)

error: metadata-generation-failed

After installing according to the guidance, I ran the following command and the error below occurred.
python -u app.py --load "HuskyVQA_cuda:0,SegmentAnything_cuda:0,ImageOCRRecognition_cuda:0" --port 3456

Looking in indexes: http://mirrors.ops.ctripcorp.com/pypi-latest/simple
Collecting git+https://github.com/facebookresearch/detectron2.git
Cloning https://github.com/facebookresearch/detectron2.git to /tmp/pip-req-build-tcy8m3_g
Running command git clone --filter=blob:none --quiet https://github.com/facebookresearch/detectron2.git /tmp/pip-req-build-tcy8m3_g
Resolved https://github.com/facebookresearch/detectron2.git to commit 3c7bb714795edc7a96c9a1a6dd83663ecd293e36
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [12 lines of output]
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
  File "/tmp/pip-req-build-tcy8m3_g/setup.py", line 10, in <module>
    import torch
  File "/home/powerop/work/conda/envs/igpt/lib/python3.8/site-packages/torch/__init__.py", line 191, in <module>
    _load_global_deps()
  File "/home/powerop/work/conda/envs/igpt/lib/python3.8/site-packages/torch/__init__.py", line 153, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/home/powerop/work/conda/envs/igpt/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/powerop/work/conda/envs/igpt/lib/python3.8/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: symbol cublasLtHSHMatmulAlgoInit version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Traceback (most recent call last):
  File "app.py", line 28, in <module>
    from iGPT.controllers import ConversationBot
  File "/home/powerop/work/gongsong/InternGPT/iGPT/__init__.py", line 1, in <module>
    from .models import *
  File "/home/powerop/work/gongsong/InternGPT/iGPT/models/__init__.py", line 1, in <module>
    from .image import (InstructPix2Pix, ImageText2Image,
  File "/home/powerop/work/gongsong/InternGPT/iGPT/models/image.py", line 2, in <module>
    import torch
  File "/home/powerop/work/conda/envs/igpt/lib/python3.8/site-packages/torch/__init__.py", line 191, in <module>
    _load_global_deps()
  File "/home/powerop/work/conda/envs/igpt/lib/python3.8/site-packages/torch/__init__.py", line 153, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/home/powerop/work/conda/envs/igpt/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/powerop/work/conda/envs/igpt/lib/python3.8/site-packages/torch/lib/../../nvidia/cublas/lib/libcublas.so.11: symbol cublasLtHSHMatmulAlgoInit version libcublasLt.so.11 not defined in file libcublasLt.so.11 with link time reference

About the choice of visual models

Hi~
Thanks for your great work!

I have read your paper and went through in detail this script (https://github.com/OpenGVLab/InternGPT/blob/main/iGPT/controllers/ConversationBot.py).

I noticed that the used visual models are determined by some key words, i.e., remove & erase means LDMInpainting, describe & introduce means HuskyVQA. This is a direct and effective way.

However, I wonder what will happen if the user does not input such words. For example, the user could input take out some objects instead of remove some objects for object removing.

Thanks in advance.
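The concern raised in this issue can be made concrete with a small sketch of keyword-based routing. Only the four keyword-to-model pairs named in the issue are used; `KEYWORD_TO_MODEL` and `route_by_keyword` are hypothetical and do not reproduce the logic in ConversationBot.py.

```python
import re
from typing import Optional

# Keyword -> model pairs as described in the issue; everything else is assumed.
KEYWORD_TO_MODEL = {
    "remove": "LDMInpainting",
    "erase": "LDMInpainting",
    "describe": "HuskyVQA",
    "introduce": "HuskyVQA",
}


def route_by_keyword(message: str) -> Optional[str]:
    """Return the model for the first known keyword in message, else None."""
    words = re.findall(r"[a-z]+", message.lower())
    for word in words:
        if word in KEYWORD_TO_MODEL:
            return KEYWORD_TO_MODEL[word]
    return None


if __name__ == "__main__":
    print(route_by_keyword("Remove the masked region"))  # LDMInpainting
    print(route_by_keyword("take out the cup"))          # None
```

With pure keyword matching, a paraphrase such as "take out" matches nothing, so some fallback (for example, letting the LLM choose the tool) would be needed. How the project actually handles this case is not stated in this thread.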

OpenAI key not accepted

After the demo server spins up, I cannot login to OpenAI with an api key. I confirmed with another app that the api key is good.
In the GUI I see:

Incorrect key, please input again

The console just logs:
===>logging in

How can I debug this further?

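Running only DragGAN, image generation fails at the last step with the error "needs one of codec_name or template"

When I launch only DragGAN, image generation fails at the last step with the error "needs one of codec_name or template". How can I fix this?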

(igpt) [root@localhost InternGPT]# python -u app.py --load "StyleGAN_cuda:0" --tab "DragGAN" --port 3456 --https
[06/05 16:12:15] bark.generation WARNING: torch version does not support flash attention. You will get faster inference speed by upgrade torch to newest nightly version.
Initializing InternGPT, load_dict={'StyleGAN': 'cuda:0'}
Running on local URL: https://0.0.0.0:3456

To create a public link, set share=True in launch().
===>logging in
sk-<redacted>
Traceback (most recent call last):
File "/usr/local/anaconda3/envs/igpt/lib/python3.8/site-packages/gradio/routes.py", line 399, in run_predict
output = await app.get_blocks().process_api(
File "/usr/local/anaconda3/envs/igpt/lib/python3.8/site-packages/gradio/blocks.py", line 1299, in process_api
result = await self.call_function(
File "/usr/local/anaconda3/envs/igpt/lib/python3.8/site-packages/gradio/blocks.py", line 1036, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/anaconda3/envs/igpt/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/anaconda3/envs/igpt/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "/usr/local/anaconda3/envs/igpt/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
result = context.run(func, *args)
File "/usr/local/anaconda3/envs/igpt/lib/python3.8/site-packages/gradio/utils.py", line 488, in async_iteration
return next(iterator)
File "/data/InternGPT/iGPT/controllers/ConversationBot.py", line 920, in drag_it
imageio.mimsave(video_name, style_gan_state['history'])
File "/usr/local/anaconda3/envs/igpt/lib/python3.8/site-packages/imageio/v2.py", line 484, in mimwrite
return file.write(ims, is_batch=True, **kwargs)
File "/usr/local/anaconda3/envs/igpt/lib/python3.8/site-packages/imageio/plugins/pyav.py", line 634, in write
self.init_video_stream(codec, fps=fps, pixel_format=out_pixel_format)
File "/usr/local/anaconda3/envs/igpt/lib/python3.8/site-packages/imageio/plugins/pyav.py", line 846, in init_video_stream
stream = self._container.add_stream(codec, fps)
File "av/container/output.pyx", line 61, in av.container.output.OutputContainer.add_stream
ValueError: needs one of codec_name or template
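The last frames of this traceback show imageio handing the video write to its pyav plugin, which then fails when adding the stream. One low-risk diagnostic is to check which video backends are importable in the environment. The sketch below uses only the standard library; the module names `av` (PyAV) and `imageio_ffmpeg` (imageio-ffmpeg) are the import names of those packages. Whether installing or removing either one resolves the error is an assumption that this thread does not confirm.

```python
import importlib.util
from typing import Dict


def available_video_backends() -> Dict[str, bool]:
    """Report whether the PyAV and imageio-ffmpeg modules can be imported."""
    modules = {"pyav": "av", "imageio-ffmpeg": "imageio_ffmpeg"}
    return {
        label: importlib.util.find_spec(module) is not None
        for label, module in modules.items()
    }


if __name__ == "__main__":
    for label, present in available_video_backends().items():
        print(f"{label}: {'installed' if present else 'not installed'}")
```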

Get stuck in conda install pytorch==1.13.0 torchvision==0.14.0 torchaudio==0.13.0 pytorch-cuda=11.6 -c pytorch -c nvidia

conda install pytorch==1.13.0 torchvision==0.14.0 torchaudio==0.13.0 pytorch-cuda=11.6 -c pytorch -c nvidia
Collecting package metadata (current_repodata.json): - WARNING conda.models.version:get_matcher(535): Using .* with relational operator is superfluous and deprecated and will be removed in a future version of conda. Your spec was 1.7.1.*, but conda is ignoring the .* and treating it as 1.7.1
done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): | WARNING conda.models.version:get_matcher(535): Using .* with relational operator is superfluous and deprecated and will be removed in a future version of conda. Your spec was 1.9.0.*, but conda is ignoring the .* and treating it as 1.9.0
WARNING conda.models.version:get_matcher(535): Using .* with relational operator is superfluous and deprecated and will be removed in a future version of conda. Your spec was 1.8.0.*, but conda is ignoring the .* and treating it as 1.8.0
WARNING conda.models.version:get_matcher(535): Using .* with relational operator is superfluous and deprecated and will be removed in a future version of conda. Your spec was 1.6.0.*, but conda is ignoring the .* and treating it as 1.6.0
done
Solving environment: - 

I wonder if you ever got stuck here before, thanks!


if resolved_archive_file.endswith(".index"): AttributeError: 'list' object has no attribute 'endswith'

Traceback (most recent call last):
File "/mnt/data/creative/InternGPT/app.py", line 225, in
bot = ConversationBot(load_dict=load_dict, e_mode=args.e_mode)
File "/mnt/data/creative/InternGPT/iGPT/controllers/ConversationBot.py", line 144, in init
self.models[class_name] = globals()class_name
File "/mnt/data/creative/InternGPT/iGPT/models/husky.py", line 369, in init
download_if_not_exists(base_path="model_zoo/llama",
File "/mnt/data/creative/InternGPT/iGPT/models/husky.py", line 359, in download_if_not_exists
apply_delta(output_dir, new_path, delta_path)
File "/mnt/data/creative/InternGPT/iGPT/models/husky_src/load_ckpt.py", line 11, in apply_delta
base = AutoModelForCausalLM.from_pretrained(base_model_path, torch_dtype=torch.float16,from_tf=True,low_cpu_mem_usage=True)
File "/mnt/data/creative/miniconda3/envs/internGPT/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 471, in from_pretrained
return model_class.from_pretrained(
File "/mnt/data/creative/miniconda3/envs/internGPT/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2753, in from_pretrained
if resolved_archive_file.endswith(".index"):
AttributeError: 'list' object has no attribute 'endswith'

This happens when I run python -u app.py --load "HuskyVQA_cuda:1,SegmentAnything_cuda:2,ImageOCRRecognition_cuda:3" --port 7863 -e.
How can I solve it?

Download a 461M file when initializing InternGPT

What is this 461M file? It does not seem to be in the model_zoo, and I cannot find a URL to download it manually.

root@autodl-container-895011b752-eff81c3b:~/autodl-tmp/InternGPT_github/InternGPT# python -u app.py --load "StyleGAN_cuda:0" --tab "DragGAN" --port 19991 --https -e
[08/23 10:29:05] bark.generation WARNING: torch version does not support flash attention. You will get faster inference speed by upgrade torch to newest nightly version.
Initializing InternGPT, load_dict={'StyleGAN': 'cuda:0'}
 11%|████▏                                | 51.8M/461M [09:00<1:10:47, 101kiB/s]

AttributeError: partially initialized module 'cv2' has no attribute 'gapi_wip_gst_GStreamerPipeline' (most likely due to a circular import)

When running this command:

python -u app.py --load "ImageOCRRecognition_cuda:0,Text2Image_cuda:0,SegmentAnything_cuda:0,ActionRecognition_cuda:0,VideoCaption_cuda:0,DenseCaption_cuda:0,ReplaceMaskedAnything_cuda:0,LDMInpainting_cuda:0,SegText2Image_cuda:0,ScribbleText2Image_cuda:0,Image2Scribble_cuda:0,Image2Canny_cuda:0,CannyText2Image_cuda:0,StyleGAN_cuda:0,Anything2Image_cuda:0,HuskyVQA_cuda:0" -e -p 3456 --https

I got this error:

image

Husky model not getting initialized.

Hello. I managed to obtain the LLaMA weights manually and placed them according to the expected directory structure. Everything seemed to be going fine. However, when the Husky model started to load, I got this error:

OSError: model_zoo/husky-7b-delta-v0_01 is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo with `use_auth_token` or log in with `huggingface-cli login` and pass `use_auth_token=True`.

On further inspection, it seems that the Husky 7B model is not available on Hugging Face. Any help on this front is appreciated.

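Husky's training process?

The paper does not describe Husky's training process; it only says that training has three stages. Could you explain these three training stages and the datasets used in detail?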

OSError: Unable to load weights from pytorch checkpoint file for 'model_zoo/llama_7B_hf/pytorch_model-00002-of-00033.bin' at 'model_zoo/llama_7B_hf/pytorch_model-00002-of-00033.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

OSError: Unable to load weights from pytorch checkpoint file for 'model_zoo/llama_7B_hf/pytorch_model-00002-of-00033.bin' at 'model_zoo/llama_7B_hf/pytorch_model-00002-of-00033.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
This happens when I run python -u app.py --load "HuskyVQA_cuda:1,SegmentAnything_cuda:2,ImageOCRRecognition_cuda:3" --port 7863 -e.
How can I solve it?

Online

How can I run it online? I have tried Colab, Gradio, etc. I know there is an online demo, but it is suspended, so I cannot use it, and my PC does not meet the requirements. If someone has code for running it online, please share it.

OpenAI API

When I run python -u app.py --load "StyleGAN_cuda:0" --tab "DragGAN" --port 3456 --https -e, the web page requires an OpenAI API key to log in. Is an internet connection essential for running your work?
image

_pickle.UnpicklingError: invalid load key, 'v'.

I get the following error:

(igpt) host:~/fa/InternGPT$ python -u app.py --load "ImageOCRRecognition_cuda:0,Text2Image_cuda:0,SegmentAnything_cuda:0,ActionRecognition_cuda:0,VideoCaption_cuda:0,DenseCaption_cuda:0,ReplaceMaskedAnything_cuda:0,LDMInpainting_cuda:0,SegText2Image_cuda:0,ScribbleText2Image_cuda:0,Image2Scribble_cuda:0,Image2Canny_cuda:0,CannyText2Image_cuda:0,StyleGAN_cuda:0,Anything2Image_cuda:0,HuskyVQA_cuda:0" -e -p 3456 --https
[07/19 11:24:32] bark.generation WARNING: torch version does not support flash attention. You will get faster inference speed by upgrade torch to newest nightly version.
Initializing InternGPT, load_dict={'ImageOCRRecognition': 'cuda:0', 'Text2Image': 'cuda:0', 'SegmentAnything': 'cuda:0', 'ActionRecognition': 'cuda:0', 'VideoCaption': 'cuda:0', 'DenseCaption': 'cuda:0', 'ReplaceMaskedAnything': 'cuda:0', 'LDMInpainting': 'cuda:0', 'SegText2Image': 'cuda:0', 'ScribbleText2Image': 'cuda:0', 'Image2Scribble': 'cuda:0', 'Image2Canny': 'cuda:0', 'CannyText2Image': 'cuda:0', 'StyleGAN': 'cuda:0', 'Anything2Image': 'cuda:0', 'HuskyVQA': 'cuda:0'}
Initializing ImageOCRRecognition to cuda:0
Initializing Text2Image to cuda:0
text_config_dict is provided which will be used to initialize CLIPTextConfig. The value text_config["id2label"] will be overriden.
Initializing SegmentAnything to cuda:0
Traceback (most recent call last):
File "app.py", line 225, in
bot = ConversationBot(load_dict=load_dict, e_mode=args.e_mode)
File "/home/ubuntu/fa/InternGPT/iGPT/controllers/ConversationBot.py", line 144, in init
self.models[class_name] = globals()class_name
File "/home/ubuntu/fa/InternGPT/iGPT/models/image.py", line 672, in init
self.sam = sam_model_registrymodel_type
File "/home/ubuntu/anaconda3/envs/igpt/lib/python3.8/site-packages/segment_anything/build_sam.py", line 15, in build_sam_vit_h
return _build_sam(
File "/home/ubuntu/anaconda3/envs/igpt/lib/python3.8/site-packages/segment_anything/build_sam.py", line 105, in _build_sam
state_dict = torch.load(f)
File "/home/ubuntu/anaconda3/envs/igpt/lib/python3.8/site-packages/torch/serialization.py", line 795, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/home/ubuntu/anaconda3/envs/igpt/lib/python3.8/site-packages/torch/serialization.py", line 1002, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.
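An "invalid load key, 'v'" error from torch.load means the file starts with the byte `v` rather than pickle or zip data. One frequent cause is that the checkpoint on disk is a Git LFS pointer file, which is a small text file beginning with `version https://git-lfs.github.com/spec/v1`. Whether that is the cause here is an assumption, but it is quick to check. The helper names below are hypothetical.

```python
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer_bytes(head: bytes) -> bool:
    """True if the given leading bytes look like a Git LFS pointer file."""
    return head.startswith(LFS_POINTER_PREFIX)


def is_lfs_pointer_file(path: str) -> bool:
    """Read the first bytes of a checkpoint and test for an LFS pointer."""
    with open(path, "rb") as handle:
        return is_lfs_pointer_bytes(handle.read(len(LFS_POINTER_PREFIX)))


if __name__ == "__main__":
    import sys

    # Usage: python check_ckpt.py <path-to-checkpoint> [...]
    for checkpoint in sys.argv[1:]:
        kind = "LFS pointer (re-download the real file)" if is_lfs_pointer_file(checkpoint) else "binary data"
        print(f"{checkpoint}: {kind}")
```

If the check reports a pointer file, the real weights were never downloaded, for example because the repository was cloned without Git LFS installed.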

Following README.md, I got this error when trying to start the Gradio service

Traceback (most recent call last):
  File "E:\github\InternGPT\app.py", line 28, in <module>
    from iGPT.controllers import ConversationBot
  File "E:\github\InternGPT\iGPT\__init__.py", line 1, in <module>
    from .models import *
  File "E:\github\InternGPT\iGPT\models\__init__.py", line 1, in <module>
    from .image import (InstructPix2Pix, ImageText2Image,
  File "E:\github\InternGPT\iGPT\models\image.py", line 8, in <module>
    import cv2
  File "C:\Users\iwaitu\anaconda3\envs\igpt\lib\site-packages\cv2\__init__.py", line 181, in <module>
    bootstrap()
  File "C:\Users\iwaitu\anaconda3\envs\igpt\lib\site-packages\cv2\__init__.py", line 175, in bootstrap
    if __load_extra_py_code_for_module("cv2", submodule, DEBUG):
  File "C:\Users\iwaitu\anaconda3\envs\igpt\lib\site-packages\cv2\__init__.py", line 28, in __load_extra_py_code_for_module
    py_module = importlib.import_module(module_name)
  File "C:\Users\iwaitu\anaconda3\envs\igpt\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "C:\Users\iwaitu\anaconda3\envs\igpt\lib\site-packages\cv2\gapi\__init__.py", line 301, in <module>
    cv.gapi.wip.GStreamerPipeline = cv.gapi_wip_gst_GStreamerPipeline
AttributeError: partially initialized module 'cv2' has no attribute 'gapi_wip_gst_GStreamerPipeline' (most likely due to a circular import)
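This cv2 circular-import error is often reported when more than one OpenCV wheel is installed in the same environment. That is an assumption here, not something this thread confirms. A standard-library sketch for listing the OpenCV distributions that are present follows; `installed_opencv_packages` is a hypothetical helper.

```python
from importlib import metadata
from typing import List

# The four OpenCV wheels published on PyPI; all of them provide `cv2`.
OPENCV_PACKAGES = (
    "opencv-python",
    "opencv-python-headless",
    "opencv-contrib-python",
    "opencv-contrib-python-headless",
)


def installed_opencv_packages() -> List[str]:
    """Return 'name==version' for every OpenCV wheel found in this environment."""
    found = []
    for name in OPENCV_PACKAGES:
        try:
            found.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            continue  # this wheel is not installed
    return found


if __name__ == "__main__":
    packages = installed_opencv_packages()
    print("\n".join(packages) if packages else "no OpenCV wheel installed")
    if len(packages) > 1:
        print("More than one OpenCV wheel is installed; they all share the cv2 namespace.")
```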

typos

Hi! I noticed that 'marked' may have been mistyped as 'maked' in the Chinese doc.

Commercial use / LLaMa Dependency?

It seems this has a dependency on LLaMa base weights and Segment Anything. Is that correct?

Is there a way to separate this out and use OpenAI or an actually commercially viable model such as MosaicML's MPT-7B-Instruct?

Table 4 experiment

Hi, how was the experiment in Table 4 conducted? If we have a large dataset, e.g. 1000 VQA samples, how could we run the experiment?

LLama weights not getting downloaded. Error 403 Forbidden.

Hello. I have been trying to set up InternGPT with locally downloaded Llama 2 weights. Even after entering the correct link and checking it multiple times, I get the following:

Initializing InternGPT, load_dict={'HuskyVQA': 'cuda:0'}
Downloading tokenizer
model_zoo/llama/tokenizer.model: No such file or directory
model_zoo/llama/tokenizer_checklist.chk: No such file or directory
third-party/llama_download.sh: line 19: cd: model_zoo/llama: No such file or directory
Downloading 7B
--2023-11-18 16:15:15--  https://download.llamameta.net/7B/consolidated.00.pth?Policy=eyJTdGF0ZW1lbnQiOlt7InVuaXF1ZV9oYXNoIjoicjlpeHBvdDBoZHVlanRwdHB1dDhqZDA1IiwiUmVzb3VyY2UiOiJodHRwczpcL1wvZG93bmxvYWQubGxhbWFtZXRhLm5ldFwvKiIsIkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcwMDM4OTg0N319fV19&Signature=lL2YWTPbU8BSWD0wQfqPZ2cjvur44OSSkSXe63V7rvBpCZ80I%7EivDgMaay%7E8dYiOXNj6ULoJJE-Tyl6xn51AW4etc6bP1p2anPc3pWCd-q48GKyKYyvVvOR44EOxfp9dSVzuUBMk83VXkILzGn7kDUWYooGWrov3kRSK72-d2zhsPdcYtdVijc1rG%7EUorXDz8pkUDHUeOHNxgOCQL-0WN-u8BDlvH2HFAbJLWSl1M-Gi4rR4wkyxjH%7EmTqdt-qmaob5L1lF6N9D1jCTupNnIzYMDxBb7sz5qvp6OlBwJonMYGu2tlN%7Ea4DLNT7a-3aHF2JPGLoilVKVt8XexfaTJ4A__&Key-Pair-Id=K15QRJLYKIFSLZ&Download-Request-ID=997730767997515
Resolving download.llamameta.net (download.llamameta.net)... 18.154.144.23, 18.154.144.95, 18.154.144.56, ...
Connecting to download.llamameta.net (download.llamameta.net)|18.154.144.23|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-11-18 16:15:17 ERROR 403: Forbidden.

--2023-11-18 16:15:17--  https://download.llamameta.net/7B/params.json?Policy=eyJTdGF0ZW1lbnQiOlt7InVuaXF1ZV9oYXNoIjoicjlpeHBvdDBoZHVlanRwdHB1dDhqZDA1IiwiUmVzb3VyY2UiOiJodHRwczpcL1wvZG93bmxvYWQubGxhbWFtZXRhLm5ldFwvKiIsIkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcwMDM4OTg0N319fV19&Signature=lL2YWTPbU8BSWD0wQfqPZ2cjvur44OSSkSXe63V7rvBpCZ80I%7EivDgMaay%7E8dYiOXNj6ULoJJE-Tyl6xn51AW4etc6bP1p2anPc3pWCd-q48GKyKYyvVvOR44EOxfp9dSVzuUBMk83VXkILzGn7kDUWYooGWrov3kRSK72-d2zhsPdcYtdVijc1rG%7EUorXDz8pkUDHUeOHNxgOCQL-0WN-u8BDlvH2HFAbJLWSl1M-Gi4rR4wkyxjH%7EmTqdt-qmaob5L1lF6N9D1jCTupNnIzYMDxBb7sz5qvp6OlBwJonMYGu2tlN%7Ea4DLNT7a-3aHF2JPGLoilVKVt8XexfaTJ4A__&Key-Pair-Id=K15QRJLYKIFSLZ&Download-Request-ID=997730767997515
Resolving download.llamameta.net (download.llamameta.net)... 18.154.144.23, 18.154.144.95, 18.154.144.56, ...
Connecting to download.llamameta.net (download.llamameta.net)|18.154.144.23|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-11-18 16:15:18 ERROR 403: Forbidden.

--2023-11-18 16:15:18--  https://download.llamameta.net/7B/checklist.chk?Policy=eyJTdGF0ZW1lbnQiOlt7InVuaXF1ZV9oYXNoIjoicjlpeHBvdDBoZHVlanRwdHB1dDhqZDA1IiwiUmVzb3VyY2UiOiJodHRwczpcL1wvZG93bmxvYWQubGxhbWFtZXRhLm5ldFwvKiIsIkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTcwMDM4OTg0N319fV19&Signature=lL2YWTPbU8BSWD0wQfqPZ2cjvur44OSSkSXe63V7rvBpCZ80I%7EivDgMaay%7E8dYiOXNj6ULoJJE-Tyl6xn51AW4etc6bP1p2anPc3pWCd-q48GKyKYyvVvOR44EOxfp9dSVzuUBMk83VXkILzGn7kDUWYooGWrov3kRSK72-d2zhsPdcYtdVijc1rG%7EUorXDz8pkUDHUeOHNxgOCQL-0WN-u8BDlvH2HFAbJLWSl1M-Gi4rR4wkyxjH%7EmTqdt-qmaob5L1lF6N9D1jCTupNnIzYMDxBb7sz5qvp6OlBwJonMYGu2tlN%7Ea4DLNT7a-3aHF2JPGLoilVKVt8XexfaTJ4A__&Key-Pair-Id=K15QRJLYKIFSLZ&Download-Request-ID=997730767997515
Resolving download.llamameta.net (download.llamameta.net)... 18.154.144.23, 18.154.144.95, 18.154.144.56, ...
Connecting to download.llamameta.net (download.llamameta.net)|18.154.144.23|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-11-18 16:15:18 ERROR 403: Forbidden.

Is anyone else facing this issue?

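No matter which module I run in the Gradio demo, the terminal always reports the same error.

No matter which module I run in the Gradio demo, the terminal reports the same error.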
Traceback (most recent call last):
File "/mnt/ai212/workspace/dengtb/anaconda3/envs/ichat/lib/python3.8/site-packages/gradio/routes.py", line 414, in run_predict
output = await app.get_blocks().process_api(
File "/mnt/ai212/workspace/dengtb/anaconda3/envs/ichat/lib/python3.8/site-packages/gradio/blocks.py", line 1323, in process_api
data = self.postprocess_data(fn_index, result["prediction"], state)
File "/mnt/ai212/workspace/dengtb/anaconda3/envs/ichat/lib/python3.8/site-packages/gradio/blocks.py", line 1257, in postprocess_data
prediction_value = block.postprocess(prediction_value)
File "/mnt/ai212/workspace/dengtb/anaconda3/envs/ichat/lib/python3.8/site-packages/gradio/components.py", line 4629, in postprocess
assert isinstance(
AssertionError: Expected a list of lists or list of tuples. Received: None
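
The assertion at the end of the traceback says that a component's `postprocess` step received `None` where it expected a list of lists or a list of tuples. One plausible cause, offered here as an assumption and not a confirmed diagnosis, is that the callback feeding the chat component returned `None`, for example because an earlier step failed. The sketch below shows a small guard that coerces a callback's return value into the expected shape; `as_chat_history` is a hypothetical helper, not part of gradio or InternGPT.

```python
def as_chat_history(value):
    """Coerce a callback's return value into a list of (user, bot) pairs."""
    if value is None:
        # A callback that falls through without a return statement yields None.
        return []
    history = []
    for pair in value:
        if not isinstance(pair, (list, tuple)) or len(pair) != 2:
            raise TypeError(f"expected a (user, bot) pair, got {pair!r}")
        history.append(tuple(pair))
    return history


# A callback that forgot to return its history:
print(as_chat_history(None))                 # prints: []
print(as_chat_history([["hi", "hello"]]))    # prints: [('hi', 'hello')]
```

Wrapping the return value this way only hides the symptom; the callback that produced `None` still needs to be found and fixed.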

2 reference locations of the model

Hello, just a heads-up:
I think it is looking for the model in two different locations:

  • model_zoo/llama\7B\
  • model_zoo\llama_7B_hf

If I copy the model to both locations, the demo server comes up.
(ichat) E:\ai\InternGPT>python -u app.py --load "HuskyVQA_cuda:0,SegmentAnything_cuda:0,ImageOCRRecognition_cuda:0" --port 3456
[05/17 20:42:38] bark.generation WARNING: torch version does not support flash attention. You will get faster inference speed by upgrade torch to newest nightly version.
Initializing InternGPT, load_dict={'HuskyVQA': 'cuda:0', 'SegmentAnything': 'cuda:0', 'ImageOCRRecognition': 'cuda:0'}
Für das Windows-Subsystem für Linux wurden keine Distributionen installiert.
Distributionen zur Installation finden Sie im Microsoft Store:
https://aka.ms/wslstore
Traceback (most recent call last):
  File "app.py", line 221, in <module>
    bot = ConversationBot(load_dict=load_dict)
  File "E:\ai\InternGPT\iGPT\controllers\ConversationBot.py", line 141, in __init__
    self.models[class_name] = globals()[class_name](device=device)
  File "E:\ai\InternGPT\iGPT\models\husky.py", line 368, in __init__
    download_if_not_exists(base_path="model_zoo/llama",
  File "E:\ai\InternGPT\iGPT\models\husky.py", line 351, in download_if_not_exists
    write_model(
  File "E:\ai\InternGPT\iGPT\models\husky_src\convert_llama_weights_to_hf.py", line 93, in write_model
    params = read_json(os.path.join(input_base_path, "params.json"))
  File "E:\ai\InternGPT\iGPT\models\husky_src\convert_llama_weights_to_hf.py", line 79, in read_json
    with open(path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'model_zoo/llama\\7B\\params.json'

(ichat) E:\ai\InternGPT>python -u app.py --load "HuskyVQA_cuda:0,SegmentAnything_cuda:0,ImageOCRRecognition_cuda:0" --port 3456
[05/17 20:44:16] bark.generation WARNING: torch version does not support flash attention. You will get faster inference speed by upgrade torch to newest nightly version.
Initializing InternGPT, load_dict={'HuskyVQA': 'cuda:0', 'SegmentAnything': 'cuda:0', 'ImageOCRRecognition': 'cuda:0'}
Loading base model
Traceback (most recent call last):
  File "app.py", line 221, in <module>
    bot = ConversationBot(load_dict=load_dict)
  File "E:\ai\InternGPT\iGPT\controllers\ConversationBot.py", line 141, in __init__
    self.models[class_name] = globals()[class_name](device=device)
  File "E:\ai\InternGPT\iGPT\models\husky.py", line 368, in __init__
    download_if_not_exists(base_path="model_zoo/llama",
  File "E:\ai\InternGPT\iGPT\models\husky.py", line 359, in download_if_not_exists
    apply_delta(output_dir, new_path, delta_path)
  File "E:\ai\InternGPT\iGPT\models\husky_src\load_ckpt.py", line 11, in apply_delta
    base = AutoModelForCausalLM.from_pretrained(base_model_path, torch_dtype=torch.float16, low_cpu_mem_usage=True)
  File "C:\Users\Sasch\.conda\envs\ichat\lib\site-packages\transformers\models\auto\auto_factory.py", line 441, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "C:\Users\Sasch\.conda\envs\ichat\lib\site-packages\transformers\models\auto\configuration_auto.py", line 916, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "C:\Users\Sasch\.conda\envs\ichat\lib\site-packages\transformers\configuration_utils.py", line 573, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "C:\Users\Sasch\.conda\envs\ichat\lib\site-packages\transformers\configuration_utils.py", line 628, in _get_config_dict
    resolved_config_file = cached_file(
  File "C:\Users\Sasch\.conda\envs\ichat\lib\site-packages\transformers\utils\hub.py", line 380, in cached_file
    raise EnvironmentError(
OSError: model_zoo\llama_7B_hf does not appear to have a file named config.json. Checkout 'https://huggingface.co/model_zoo\llama_7B_hf/None' for available files.
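
Read together, the two tracebacks above suggest that the loader looks for `params.json` under `model_zoo/llama/7B` and for `config.json` under `model_zoo/llama_7B_hf`. The sketch below checks for both files before the app is started. The `REQUIRED` mapping contains only the two file names that appear in the tracebacks, so it is an assumption about what the loader needs and certainly not a complete list; `missing_files` is a hypothetical helper.

```python
import ntpath
import os


def missing_files(root, required):
    """Return the paths from `required` (directory -> file names) absent under `root`."""
    missing = []
    for directory, names in required.items():
        for name in names:
            path = os.path.join(root, directory, name)
            if not os.path.isfile(path):
                missing.append(path)
    return missing


# Assumption: only the two files named in the tracebacks are listed here.
REQUIRED = {
    os.path.join("llama", "7B"): ["params.json"],
    "llama_7B_hf": ["config.json"],
}

if __name__ == "__main__":
    for path in missing_files("model_zoo", REQUIRED):
        print("missing:", path)

    # ntpath is the Windows flavour of os.path; joining onto a base path
    # written with '/' gives the mixed separators seen in the first traceback.
    print(ntpath.join("model_zoo/llama", "7B", "params.json"))
    # prints: model_zoo/llama\7B\params.json
```

The mixed separators are harmless on Windows, which accepts both, so the first error is about the missing file, not about the path format.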

Module conflicts among the dependencies declared in requirements.txt

Background

Dependencies in requirements.txt have module conflicts.

Description

requirements.txt declares two relevant dependencies, opencv-python and albumentations, and albumentations in turn depends on opencv-python-headless. According to the official opencv-python documentation, opencv-python is intended for desktop environments, while opencv-python-headless is intended for server (headless) environments. The documentation also states that these packages must not be installed together (the exact wording is: “There are four different packages (see options 1, 2, 3, and 4 below) and you should SELECT ONLY ONE OF THEM.”), because they all provide the same module name, cv2.

During installation with pip, the package installed later overwrites the cv2 module files of the package installed earlier (specifically, the files under the cv2 folder that exist in both packages). Furthermore, the dependency graph even resolves to different versions of the two packages, so the files that share a path in both packages differ in content, which may affect behavior at runtime. Even without analyzing this project's code and call hierarchy, it is clear that the overwriting and the module conflict do occur.

Steps to Reproduce

pip install -r requirements.txt

Desired Change

Overwriting modules is undesirable even when the overwritten files are not actively used, or when the copy that survives happens to be the one the project calls. It introduces uncertainty and can cause problems in the long run, especially if the overwritten modules change in future releases. It is generally better to avoid such conflicts and to declare only the necessary, mutually compatible dependencies, so that the project's environment stays stable and predictable.

We recognize that this project can only change its direct dependencies and that indirect dependencies are a black box. Even so, it could add an explanation of the conflict rather than simply declaring both conflicting packages in requirements.txt.

A note in the project's documentation about the potential conflict, and about the need to choose only one of the conflicting packages, would help developers understand the issue and pick the package that fits their environment.
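
As a rough illustration of the conflict described above, the following sketch lists which of the four OpenCV distributions are installed in the current environment and reports a clash when more than one is present. It relies only on the standard-library `importlib.metadata`; the function names are hypothetical, and the check says nothing about which copy of `cv2` actually ended up on disk.

```python
from importlib import metadata

# The four PyPI distributions that all install the same top-level `cv2` module.
OPENCV_VARIANTS = {
    "opencv-python",
    "opencv-python-headless",
    "opencv-contrib-python",
    "opencv-contrib-python-headless",
}


def conflicting_opencv(installed_names):
    """Return the sorted OpenCV variants found, or [] if at most one is installed."""
    normalized = {name.lower().replace("_", "-") for name in installed_names}
    found = sorted(normalized & OPENCV_VARIANTS)
    return found if len(found) > 1 else []


def installed_distribution_names():
    """Names of all distributions visible to the running interpreter."""
    names = (dist.metadata["Name"] for dist in metadata.distributions())
    return [name for name in names if name]


if __name__ == "__main__":
    clash = conflicting_opencv(installed_distribution_names())
    if clash:
        print("conflicting OpenCV packages:", ", ".join(clash))
    else:
        print("no OpenCV conflict detected")
```

Running this after `pip install -r requirements.txt` would show whether both opencv-python and opencv-python-headless were installed side by side.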

OSError: model_zoo/llama_7B_hf does not appear to have a file named config.json

Hi,

The InternGPT videos look very promising, but we are stuck loading the models with the following error:
"OSError: model_zoo/llama_7B_hf does not appear to have a file named config.json"

We understand that this may be due to the license restrictions. We requested the original LLaMA checkpoint by filling in the Google form, but unfortunately we have not heard back from Meta yet.

Could you please suggest whether there is a way to remove the dependency on LLaMA, or otherwise which compatible model could be used with InternGPT?

Thank you in advance.

error: create_ssl_context

ctx.load_cert_chain(certfile, keyfile, get_password)
FileNotFoundError: [Errno 2] No such file or directory.

Running online

Can it be run online? Also, how would you run it in Google Colab?

ModuleNotFoundError: No module named 'controlnet_aux'

Error after running: python app.py

File "/Users/username/Development/InternGPT/InternGPT/iGPT/models/image.py", line 20, in
from controlnet_aux import OpenposeDetector, MLSDdetector, HEDdetector
ModuleNotFoundError: No module named 'controlnet_aux'
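
A quick way to confirm which imports are unavailable before launching the app is to probe them with the standard-library `importlib.util.find_spec`. The sketch below is a generic check, not part of InternGPT; `missing_modules` is a hypothetical helper, and it is an assumption that the missing module is provided by the PyPI package `controlnet-aux`.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Only the module named in the error above is checked here.
    for name in missing_modules(["controlnet_aux"]):
        print(f"module {name!r} is not installed in this environment")
```

If the module is reported as missing, installing the corresponding package into the same environment that runs `app.py` should resolve the import error.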
