
Image Captioning

Image Captioning is an img2txt tool built on the BLIP model. It exports captions for a folder of images.

Checkpoints [Required]

If there is no 'checkpoints' folder, the script will create it automatically and download the model file; you can also do this manually if you prefer.

Download the fine-tuned checkpoint and copy it into the 'checkpoints' folder (create it if it does not exist).
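The auto-download step described above can be sketched roughly as follows. The checkpoint URL below is a placeholder, not the project's real download link; substitute the actual URL from the release page.

```python
# Hedged sketch of the checkpoint bootstrap described above.
# CHECKPOINT_URL is a hypothetical placeholder, not the real link.
import os
import urllib.request

CHECKPOINT_DIR = "checkpoints"
CHECKPOINT_FILE = os.path.join(CHECKPOINT_DIR, "model_large_caption.pth")
CHECKPOINT_URL = "https://example.com/model_large_caption.pth"  # placeholder

def ensure_checkpoint(path=CHECKPOINT_FILE, url=CHECKPOINT_URL):
    """Create the checkpoints folder and download the model if it is missing."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    if not os.path.exists(path):
        # Large file; this can take a while on a slow connection.
        urllib.request.urlretrieve(url, path)
    return path
```

If the file is already present, the function is a no-op apart from ensuring the folder exists, which mirrors the manual option mentioned above.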

Demo

datasets\0.jpg, a piece of cheese with figs and a piece of cheese
datasets\1002.jpg, a close up of a yellow flower with a green background
datasets\1005.jpg, a planter filled with lots of colorful flowers
datasets\1008.jpg, a teacher standing in front of a classroom full of children
datasets\1011.jpg, a tortoise on a white background with a white background
datasets\1014.jpg, a glass of wine sitting on top of a table
datasets\1017.jpg, a close up of a plant with pink flowers
datasets\102.jpg, a platter of different types of sushi
datasets\1020.jpg, a frog sitting on top of a bamboo stick
datasets\1023.jpg, a revolver on a white background
datasets\1026.jpg, a woman holding a small white dog in her arms
datasets\1029.jpg, a woman in a business suit standing in front of a building
datasets\1032.jpg, sliced cucumber on a white background
datasets\1035.jpg, a woman in glasses and a pair of boxing gloves
datasets\1038.jpg, a pile of sliced potatoes on a white surface
datasets\1041.jpg, two glasses of orange juice on a wooden table
datasets\1044.jpg, a woman sitting on the floor in front of a door
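The demo output above is one `path, caption` pair per line. A small sketch of parsing that format back into a dict (the exact output file name is an assumption; adjust to whatever the script writes):

```python
# Parse "path, caption" lines like the demo output above into a dict.
def parse_captions(lines):
    captions = {}
    for line in lines:
        # Split on the first ", " so commas inside captions survive.
        path, _, caption = line.partition(", ")
        captions[path] = caption.strip()
    return captions

demo = [
    r"datasets\0.jpg, a piece of cheese with figs and a piece of cheese",
    r"datasets\102.jpg, a platter of different types of sushi",
]
print(parse_captions(demo)[r"datasets\102.jpg"])
# -> a platter of different types of sushi
```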

Usage

usage: inference.py [-h] [-i INPUT] [-b BATCH] [-p PATHS] [-g GPU_ID]        

Image caption CLI

optional arguments:
  -h, --help                      show this help message and exit
  -i INPUT,  --input INPUT        Input directory path, such as ./images
  -b BATCH,  --batch BATCH        Batch size
  -p PATHS,  --paths PATHS        A text file (e.g. any.txt) containing all image paths.
  -g GPU_ID, --gpu-id GPU_ID      gpu device to use (default=0) can be 0,1,2 for multi-gpu
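The usage text above can be reconstructed as a standard argparse parser. This is a sketch based only on the help output shown; the defaults for `--batch` are assumptions.

```python
# Minimal reconstruction of the CLI described above (names taken from
# the usage text; the --batch default is an assumption).
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="Image caption CLI")
    parser.add_argument("-i", "--input",
                        help="Input directory path, such as ./images")
    parser.add_argument("-b", "--batch", type=int, default=1,
                        help="Batch size")
    parser.add_argument("-p", "--paths",
                        help="A text file containing all image paths")
    parser.add_argument("-g", "--gpu-id", type=int, default=0,
                        help="GPU device to use (default=0)")
    return parser

args = build_parser().parse_args(["-i", "./images", "-b", "8", "-g", "0"])
print(args.input, args.batch, args.gpu_id)  # ./images 8 0
```

Note that argparse converts `--gpu-id` to the attribute `args.gpu_id`.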

Example

python inference.py -i /path/images/folder --batch 8 --gpu-id 0

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT


image-captioning's Issues

Multi-GPU does not work

Multi-GPU does not work because '0,1' is not a valid value for an integer parameter.
This results in:
inference.py: error: argument -g/--gpu-id: invalid int value: '0,1'
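The error occurs because `--gpu-id` is declared with `type=int`, so a comma-separated value like '0,1' cannot be parsed. One possible fix (an assumption, not the project's actual patch) is a custom argparse type that accepts a list of device ids:

```python
# Possible fix sketch: let --gpu-id accept '0' or '0,1,2'.
import argparse

def gpu_list(value):
    """Parse a comma-separated device string into a list of ints."""
    return [int(v) for v in value.split(",")]

parser = argparse.ArgumentParser()
parser.add_argument("-g", "--gpu-id", type=gpu_list, default=[0])

args = parser.parse_args(["-g", "0,1"])
print(args.gpu_id)  # [0, 1]
```

Downstream code would then need to iterate over `args.gpu_id` (e.g. via `torch.nn.DataParallel`) instead of treating it as a single int.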

RuntimeError: The size of tensor a (3) must match the size of tensor b (9) at non-singleton dimension 0

All of my images are 256x256 pixels (taken from the images sample folder, just some of the 256x256 ones).

C:\git\image-captioning>python inference.py -i C:\git\image-captioning\inputs --batch 3 --gpu 0
Device: cpu
Images found: 8
Split size: 2
Checkpoint loading...
load checkpoint from ./checkpoints/model_large_caption.pth

Model to cpu
Inference started
0batch [00:02, ?batch/s]
Traceback (most recent call last):
File "C:\git\image-captioning\inference.py", line 88, in <module>
caption = model.generate(
File "C:\git\image-captioning\models\blip.py", line 201, in generate
outputs = self.text_decoder.generate(
File "C:\Python\Python310\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 1752, in generate
return self.beam_search(
File "C:\Python\Python310\lib\site-packages\transformers\generation\utils.py", line 3091, in beam_search
outputs = self(
File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\git\image-captioning\models\med.py", line 886, in forward
outputs = self.bert(
File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\git\image-captioning\models\med.py", line 781, in forward
encoder_outputs = self.encoder(
File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\git\image-captioning\models\med.py", line 445, in forward
layer_outputs = layer_module(
File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\git\image-captioning\models\med.py", line 361, in forward
cross_attention_outputs = self.crossattention(
File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\git\image-captioning\models\med.py", line 277, in forward
self_outputs = self.self(
File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "C:\git\image-captioning\models\med.py", line 178, in forward
attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: The size of tensor a (3) must match the size of tensor b (9) at non-singleton dimension 0

RuntimeError: The size of tensor a (3) must match the size of tensor b (9) at non-singleton dimension 0

my images are 256x256 pixels

/content/image-captioning
2023-09-02 18:30:18.889829: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Device: cuda:0
Images found: 263
Split size: 263
Checkpoint loading...
load checkpoint from ./checkpoints/model_large_caption.pth

Model to cuda:0
Inference started
0batch [00:01, ?batch/s]
Traceback (most recent call last):
File "/content/image-captioning/inference.py", line 88, in <module>
caption = model.generate(
File "/content/image-captioning/models/blip.py", line 201, in generate
outputs = self.text_decoder.generate(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1675, in generate
return self.beam_search(
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 3014, in beam_search
outputs = self(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/content/image-captioning/models/med.py", line 886, in forward
outputs = self.bert(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/content/image-captioning/models/med.py", line 781, in forward
encoder_outputs = self.encoder(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/content/image-captioning/models/med.py", line 445, in forward
layer_outputs = layer_module(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/content/image-captioning/models/med.py", line 361, in forward
cross_attention_outputs = self.crossattention(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/content/image-captioning/models/med.py", line 277, in forward
self_outputs = self.self(
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/content/image-captioning/models/med.py", line 178, in forward
attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: The size of tensor a (3) must match the size of tensor b (9) at non-singleton dimension 0
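A plausible reading of both tracebacks above (an interpretation, not a confirmed diagnosis): beam search expands the text states by `num_beams`, but the image embeddings fed to cross-attention keep their original batch size, so dimension 0 no longer matches (3 images vs 3 images x 3 beams = 9). A torch-free sketch of the mismatch and the usual remedy (repeating each image embedding per beam, as `repeat_interleave` would):

```python
# Sketch of the dim-0 mismatch behind the RuntimeError above.
batch, num_beams = 3, 3
image_states = [f"img{i}" for i in range(batch)]              # dim 0 = 3
text_states = [f"beam{i}" for i in range(batch * num_beams)]  # dim 0 = 9

# Cross-attention needs matching leading dims; here they differ (3 vs 9).
assert len(image_states) != len(text_states)

# Remedy: repeat each image embedding num_beams times before generate()
# (the list comprehension mimics torch's repeat_interleave along dim 0).
expanded = [s for s in image_states for _ in range(num_beams)]
assert len(expanded) == len(text_states)  # dims now match (9 vs 9)
print(expanded[:4])  # ['img0', 'img0', 'img0', 'img1']
```

In the real model this expansion would happen on the encoder outputs passed into `text_decoder.generate`; whether the bug is there or in how the batch is split is something the maintainer would need to confirm.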
