
ttengwang / caption-anything

Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/spaces/TencentARC/Caption-Anything https://huggingface.co/spaces/VIPLab/Caption-Anything

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%
chatgpt controllable-generation segment-anything controllable-image-captioning image-captioning

caption-anything's Introduction

Caption-Anything is a versatile image processing tool that combines the capabilities of Segment Anything, Visual Captioning, and ChatGPT. Our solution generates descriptive captions for any object within an image, offering a range of language styles to accommodate diverse user preferences. It supports visual controls (mouse click) and language controls (length, sentiment, factuality, and language).

  • Visual controls and language controls for text generation
  • Chat about selected object for detailed understanding
  • Interactive demo

Along the River During the Qingming Festival (清明上河图)

🚀 Updates

  • 2023/04/30: support caption everything in a paragraph
  • 2023/04/25: We are delighted to introduce Track-Anything, an inventive project from our lab that offers a versatile and user-friendly solution for video object tracking and segmentation.
  • 2023/04/23: support langchain + VQA, better chatbox performance
  • 2023/04/20: add mouse trajectory as visual control (beta)
  • 2023/04/13: add Colab Tutorial
  • 2023/04/11: Release code

🕹️ Demo

Explore the interactive demo of Caption-Anything, which showcases its powerful capabilities in generating captions for various objects within an image. The demo allows users to control visual aspects by clicking on objects, as well as to adjust textual properties such as length, sentiment, factuality, and language.




🛠️ Getting Started

Linux

# Clone the repository:
git clone https://github.com/ttengwang/caption-anything.git
cd caption-anything

# Install dependencies (python version >= 3.8.1):
pip install -r requirements.txt

# Configure the necessary ChatGPT APIs
export OPENAI_API_KEY={Your_Private_Openai_Key}

# Run the Caption-Anything gradio demo.
python app_langchain.py --segmenter huge --captioner blip2 --port 6086  --clip_filter  # requires 13G GPU memory
#python app_langchain.py --segmenter base --captioner blip2 # requires 8.5G GPU memory
#python app_langchain.py --segmenter base --captioner blip # requires 5.5G GPU memory

# (Optional) Use the pre-downloaded SAM checkpoints
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth -O ./sam_vit_h_4b8939.pth
python app_langchain.py --segmenter huge --captioner blip2 --port 6086 --segmenter_checkpoint ./sam_vit_h_4b8939.pth  # requires 11.7G GPU memory
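
The memory figures in the comments above suggest a simple way to choose flags for a given GPU. The following is a minimal sketch, assuming the quoted figures (13G, 8.5G, 5.5G) hold on your hardware; `pick_demo_flags` and `CONFIGS` are hypothetical names, not part of the repository.

```python
from typing import List, Optional

# (minimum GPU memory in GB, flags), copied from the comments in the commands
# above and ordered from most to least demanding.
CONFIGS = [
    (13.0, ["--segmenter", "huge", "--captioner", "blip2", "--clip_filter"]),
    (8.5, ["--segmenter", "base", "--captioner", "blip2"]),
    (5.5, ["--segmenter", "base", "--captioner", "blip"]),
]


def pick_demo_flags(gpu_mem_gb: float) -> Optional[List[str]]:
    """Return the flags of the most demanding listed configuration that fits, or None."""
    for required_gb, flags in CONFIGS:
        if gpu_mem_gb >= required_gb:
            return flags
    return None


if __name__ == "__main__":
    flags = pick_demo_flags(8.0)  # e.g. an 8 GB card
    if flags is None:
        print("No listed configuration fits this GPU")
    else:
        print("python app_langchain.py " + " ".join(flags))
```

The list is scanned from largest to smallest so that the first match is the most capable configuration the GPU can hold.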

Windows(powershell)

Tested in Windows11 using Nvidia 3070-8G.

# Clone the repository:
git clone https://github.com/ttengwang/caption-anything.git
cd caption-anything

# Install dependencies:
pip install -r requirements.txt

# Download the [base SAM checkpoints](https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth).
Invoke-WebRequest https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth -OutFile ./sam_vit_b_01ec64.pth

# Configure the necessary ChatGPT APIs
$env:OPENAI_API_KEY = '{Your_Private_Openai_Key}'

# Run the Caption-Anything gradio demo.
python app_langchain.py --captioner blip --port 6086 --segmenter base # better chatbox via langchain + VQA
python app_langchain.py --captioner blip --port 6086 --segmenter base --segmenter_checkpoint ./sam_vit_b_01ec64.pth  # Use the pre-downloaded SAM checkpoints
python app.py --captioner blip --port 6086 --segmenter base 

💻 Usage

import os

from caption_anything import CaptionAnything, parse_augment

args = parse_augment()
openai_api_key = os.environ["OPENAI_API_KEY"]  # set as shown in Getting Started
image_path = "path/to/image.jpg"  # placeholder: replace with your own image

# Visual controls: where to look in the image.
visual_controls = {
    "prompt_type": ["click"],
    "input_point": [[500, 300], [1000, 500]],
    "input_label": [1, 0],  # 1/0 for positive/negative points
    "multimask_output": "True",
}
# Language controls: how the caption should read.
language_controls = {
    "length": "30",
    "sentiment": "natural",  # "positive", "negative", "natural"
    "imagination": "False",  # "True", "False"
    "language": "English",  # "Chinese", "Spanish", etc.
}
model = CaptionAnything(args, openai_api_key)
out = model.inference(image_path, visual_controls, language_controls)
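
The control dictionaries above pass most values as strings. Below is a small sketch of a helper that builds `language_controls` in that form and rejects sentiment values outside the three listed in the comments. `make_language_controls` is a hypothetical name, and whether the library validates these values itself is not stated here.

```python
# Allowed values taken from the comment in the usage example.
ALLOWED_SENTIMENTS = ("positive", "negative", "natural")


def make_language_controls(length: int = 30, sentiment: str = "natural",
                           imagination: bool = False,
                           language: str = "English") -> dict:
    """Build a language_controls dict with the string-valued fields used above."""
    if length <= 0:
        raise ValueError("length must be a positive integer")
    if sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"sentiment must be one of {ALLOWED_SENTIMENTS}")
    return {
        "length": str(length),
        "sentiment": sentiment,
        "imagination": str(imagination),  # "True" or "False"
        "language": language,
    }


print(make_language_controls(length=20, sentiment="positive", language="Chinese"))
```

The resulting dictionary can be passed as the third argument of `model.inference` in place of a hand-written one.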

📖 Citation

If you find this work useful for your research, please cite our paper:

@article{wang2023caption,
  title={Caption anything: Interactive image description with diverse multimodal controls},
  author={Wang, Teng and Zhang, Jinrui and Fei, Junjie and Ge, Yixiao and Zheng, Hao and Tang, Yunlong and Li, Zhe and Gao, Mingqi and Zhao, Shanshan and Shan, Ying and Zheng, Feng},
  journal={arXiv preprint arXiv:2305.02677},
  year={2023}
}

Acknowledgements

The project is based on Segment Anything, BLIP/BLIP-2, ChatGPT, Visual ChatGPT, and GiT. Thanks to the authors for their efforts.

Contributors

Our project wouldn't be possible without the contributions of these amazing people! Thank you all for making this project better.

Teng Wang @ Southern University of Science and Technology & HKU & Tencent ARC Lab
Jinrui Zhang @ Southern University of Science and Technology
Junjie Fei @ Xiamen University
Zhe Li @ Southern University of Science and Technology
Yunlong Tang @ Southern University of Science and Technology
Mingqi Gao @ Southern University of Science and Technology & University of Warwick
Hao Zheng @ Southern University of Science and Technology

caption-anything's People

Contributors

developer0hye, eltociear, feielysia, gaomingqi, memoryunreal, ttengwang, y10ab1, yunlong10, zh-plus, zjr2000

caption-anything's Issues

About interaction

Could this work caption everything without any interaction, like SAM?

Trajectory Support

Hello!

Thank you for this great work. I was wondering how you were able to input trajectories as a prompt for SAM? Did you just chain together a sequence of points?
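
One plausible reading of the question, sketched under stated assumptions: subsample the mouse trajectory to a handful of points and pass them as positive click prompts in the `visual_controls` format from the Usage section. This is not confirmed as the authors' approach; `trajectory_to_click_controls` and `max_points` are hypothetical names introduced for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def trajectory_to_click_controls(trajectory: Sequence[Tuple[int, int]],
                                 max_points: int = 5) -> Dict[str, object]:
    """Evenly subsample a trajectory and treat each kept point as a positive click."""
    if not trajectory:
        raise ValueError("trajectory must contain at least one point")
    if max_points < 1:
        raise ValueError("max_points must be at least 1")

    n = len(trajectory)
    k = min(max_points, n)
    if k == 1:
        indices = [0]
    else:
        # Evenly spaced indices that always include the first and last point.
        indices = [i * (n - 1) // (k - 1) for i in range(k)]

    points: List[List[int]] = [[trajectory[i][0], trajectory[i][1]] for i in indices]
    return {
        "prompt_type": ["click"],
        "input_point": points,
        "input_label": [1] * len(points),  # assumption: every trajectory point is positive
        "multimask_output": "True",
    }


if __name__ == "__main__":
    stroke = [(x, 2 * x) for x in range(10)]
    print(trajectory_to_click_controls(stroke, max_points=3))
```

Subsampling keeps the prompt small; whether a dense or sparse set of points gives better masks would need to be checked empirically.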

Suggestion - Integrate MobileSAM into the pipeline for lightweight and faster inference

Reference: https://github.com/ChaoningZhang/MobileSAM

Our project performs on par with the original SAM and keeps exactly the same pipeline as the original SAM except for a change to the image encoder; therefore, it is easy to integrate into any project.

MobileSAM is around 60 times smaller and around 50 times faster than the original SAM, and it is around 7 times smaller and around 5 times faster than the concurrent FastSAM. The comparison of the whole pipeline is summarized as follows:

[Images: comparison of the whole pipeline]

Best Wishes,

Qiao

Gradio demo does not seem to work

I have put in my OpenAI API key and loaded one of the example images that has a corgi. I clicked anywhere on the picture and nothing seems to show up in the chatbox.

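Running the demo fails after adding "--gradio_share"

Running the demo directly works, but I want to expose it over the network, so I added "--gradio_share".
It reports the following error: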

Initializing ChatBot, load_dict={'VisualQuestionAnswering': 'cuda:0'}
Initializing VisualQuestionAnswering to cuda:0
/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/gradio/deprecation.py:43: UserWarning: You have unused kwarg parameters in Row, please remove them: {'scale': 1.0}
  warnings.warn(
/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/gradio/deprecation.py:43: UserWarning: You have unused kwarg parameters in Row, please remove them: {'scale': 0.4}
  warnings.warn(
/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/gradio/deprecation.py:43: UserWarning: You have unused kwarg parameters in Row, please remove them: {'scale': 0.5}
  warnings.warn(
/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/gradio/components.py:164: UserWarning: Unknown style parameter: scale
  warnings.warn(f"Unknown style parameter: {key}")
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/gradio/routes.py", line 238, in main
    return templates.TemplateResponse(
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/starlette/templating.py", line 112, in TemplateResponse
    template = self.get_template(name)
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/starlette/templating.py", line 94, in get_template
    return self.env.get_template(name)
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/jinja2/environment.py", line 1010, in get_template
    return self._load_template(name, globals)
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/jinja2/environment.py", line 969, in _load_template
    template = self.loader.load(self, name, self.make_globals(globals))
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/jinja2/loaders.py", line 126, in load
    source, filename, uptodate = self.get_source(environment, name)
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/jinja2/loaders.py", line 218, in get_source
    raise TemplateNotFound(template)
jinja2.exceptions.TemplateNotFound: frontend/share.html

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/uvicorn/protocols/http/h11_impl.py", line 429, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/starlette/middleware/cors.py", line 84, in __call__
    await self.app(scope, receive, send)
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/fastapi/routing.py", line 165, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/starlette/concurrency.py", line 41, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/anyio/to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/anyio/_backends/_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "/app/anaconda3/envs/caption-anything/lib/python3.8/site-packages/gradio/routes.py", line 244, in main
    raise ValueError(
ValueError: Did you install Gradio from source files? Share mode only works when Gradio is installed through the pip package.
Running on local URL:  http://0.0.0.0:6086
Running on public URL: https://0adaffcff18e829f67.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades (NEW!), check out Spaces: https://huggingface.co/spaces
^CKeyboard interruption in main thread... closing server.
Killing tunnel 0.0.0.0:6086 <> https://0adaffcff18e829f67.gradio.live
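
The `TemplateNotFound: frontend/share.html` error above suggests that the installed Gradio package lacks its bundled share-mode template. Below is a minimal sketch for checking this. It assumes the template lives under `templates/frontend/share.html` inside the package directory, which is inferred from the traceback rather than confirmed. The helper names `find_package_dir` and `has_share_template` are hypothetical.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_package_dir(package: str = "gradio") -> Optional[Path]:
    """Return the directory of an installed top-level package, or None if absent."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent


def has_share_template(package_dir: Path) -> bool:
    """Check for the share-mode template (relative path is an assumption)."""
    return (package_dir / "templates" / "frontend" / "share.html").is_file()


if __name__ == "__main__":
    pkg_dir = find_package_dir("gradio")
    if pkg_dir is None:
        print("gradio is not installed in this environment")
    else:
        print(f"{pkg_dir}: share template present = {has_share_template(pkg_dir)}")
```

If the file is missing, reinstalling Gradio through pip, as the `ValueError` message recommends, is the first thing to try.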

Demo crashes both in terminal and in colab

Hi,

Thank you for sharing your code with the community, truly great work!

I encountered some issues when running the app on my local machine and got the same result when running Colab. The problem appears to be related to the wrong build of bitsandbytes:

Initializing ImageCaptioning to cuda:0
Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.

CUDA SETUP: CUDA runtime path found: /scratch/miniconda/envs/CSAM_2/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /scratch/miniconda/envs/CSAM_2/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Error named symbol not found at line 479 in file /mmfs1/gscratch/zlab/timdettmers/git/bitsandbytes/csrc/ops.cu

I also tried running the Colab notebook, and my session consistently crashes on this line:
captioning_model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b", device_map = "sequential", load_in_8bit = True)
It crashes after attempting to load the checkpoint shards, just as with the app.py script.

I tried various CUDA versions (11.3, 11.7) and the issue persists.
Could you please check if you encounter the same issue and give suggestions as to how to fix this problem? Maybe if you specify the exact version of bitsandbytes in requirements.txt this problem will be resolved?

Thanks!
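
To make a version-pinning request like the one above actionable, it helps to report exactly which versions are installed. This is a minimal sketch using only the standard library. The package list is an assumption about what is relevant here, and the helper names are hypothetical.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def as_requirements(versions):
    """Format the installed packages as pinned requirements.txt lines."""
    return [f"{name}=={ver}" for name, ver in versions.items() if ver is not None]


if __name__ == "__main__":
    # Assumed list of packages involved in the 8-bit loading path.
    found = installed_versions(["bitsandbytes", "torch", "transformers", "accelerate"])
    for line in as_requirements(found):
        print(line)
```

The printed lines can be pasted into the issue or into requirements.txt once a working combination is found. The sketch does not identify which version fixes the crash.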

Segmentation fault (core dumped)

When using the "caption everything in a paragraph" feature, I got an error "Segmentation fault (core dumped)" at the end.

I tried "ulimit -s unlimited", but it didn't work.

How can I solve this?
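
One way to narrow down a segmentation fault like this is Python's built-in `faulthandler` module, which prints the Python-level traceback when the process receives a fatal signal. The sketch below is a diagnostic aid only, not a fix, and `enable_crash_tracebacks` is a hypothetical helper name.

```python
import faulthandler
import sys


def enable_crash_tracebacks(stream=sys.stderr):
    """Dump Python tracebacks for all threads on SIGSEGV and similar fatal signals."""
    if not faulthandler.is_enabled():
        faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


if __name__ == "__main__":
    # Call this once, before importing or running the code that crashes.
    print("faulthandler enabled:", enable_crash_tracebacks())
```

The same effect is available without code changes by launching the script with `python -X faulthandler`. The resulting traceback shows which Python call was active when the native code crashed.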

[strange] Dense captions are all filtered out

Hi, I ran into a problem when trying some examples by running caption_anything/model.py.

The dense captions are all filtered by min_ppl_score and min_clip_score in parse_dense_caption(). I noticed that ppl_score is always -100.0 and clip_score is always 0.0, for example:
{'generated_captions': {'raw_caption': 'there is a girl holding a cat and a dog in her arms'}, 'crop_save_path': 'result/crop_1699081274.5122437.png', 'mask_save_path': 'result/mask_1699081274.5082428.png', 'mask': <PIL.Image.Image image mode=RGB size=512x320 at 0x1D552934B80>, 'bbox': array([ 0, 1, 511, 317]), 'area': 81409, 'context_captions': [], 'ppl_score': -100.0, 'clip_score': 0.0},...}

I think the caption is reasonable and should not score that low. Have you run into a similar problem, or do you have any guess about the cause?
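
The behavior described above can be reproduced with a small sketch of a threshold filter. The real logic in parse_dense_caption() may differ: the comparison direction and the threshold values below are assumptions, and `failed_checks` is a hypothetical helper. It illustrates that scores of -100.0 and 0.0 fail any positive CLIP threshold and any perplexity threshold above -100, which is consistent with the scores being placeholder defaults rather than computed values.

```python
def failed_checks(caption, min_ppl_score, min_clip_score):
    """Return the names of the score checks that a caption record fails."""
    failed = []
    if caption["ppl_score"] < min_ppl_score:
        failed.append("ppl_score")
    if caption["clip_score"] < min_clip_score:
        failed.append("clip_score")
    return failed


if __name__ == "__main__":
    # Scores copied from the record in the report; thresholds are hypothetical.
    record = {
        "raw_caption": "there is a girl holding a cat and a dog in her arms",
        "ppl_score": -100.0,
        "clip_score": 0.0,
    }
    print(failed_checks(record, min_ppl_score=-2.0, min_clip_score=0.3))
```

Logging which check rejects each caption makes it easier to tell whether the scoring step ran at all or only the filter is too strict.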

Torch 1.10.1 not available

ERROR: Could not find a version that satisfies the requirement torch==1.10.1 (from versions: 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1)
ERROR: No matching distribution found for torch==1.10.1

Could you bump the pinned Torch version to one that is available and confirmed to work?
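
As a stopgap, the versions that pip lists in its error can be parsed to find the closest newer release. This sketch only handles plain numeric versions such as `1.11.0`, and the helper names are hypothetical. It does not tell you whether that release actually works with this project.

```python
import re


def parse_available_versions(pip_error):
    """Extract the version list from pip's '(from versions: ...)' message."""
    match = re.search(r"\(from versions: ([^)]*)\)", pip_error)
    if not match:
        return []
    return [v.strip() for v in match.group(1).split(",") if v.strip()]


def version_key(version):
    """Turn '1.11.0' into (1, 11, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def nearest_newer(requested, available):
    """Return the smallest available version greater than the requested one."""
    newer = [v for v in available if version_key(v) > version_key(requested)]
    return min(newer, key=version_key) if newer else None


if __name__ == "__main__":
    error = (
        "ERROR: Could not find a version that satisfies the requirement "
        "torch==1.10.1 (from versions: 1.11.0, 1.12.0, 1.12.1, 1.13.0, "
        "1.13.1, 2.0.0, 2.0.1)"
    )
    print(nearest_newer("1.10.1", parse_available_versions(error)))  # prints 1.11.0
```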

Related work

Hello! Thank you so much for contributing this repo.
I'm very interested in this work, and I'm surveying papers with keywords like "captioning anything", "instance-level captioning", or "per-pixel captioning". Could you recommend some related work?
