
clip-interrogator's Introduction

clip-interrogator

Want to figure out what a good prompt might be to create new images like an existing one? The CLIP Interrogator is here to get you answers!

Run it!

🆕 Now available as a Stable Diffusion Web UI Extension! 🆕


Run Version 2 on Colab, HuggingFace, and Replicate!

Open In Colab Generic badge Replicate Lambda


Version 1 still available in Colab for comparing different CLIP models

Open In Colab

About

The CLIP Interrogator is a prompt engineering tool that combines OpenAI's CLIP and Salesforce's BLIP to optimize text prompts to match a given image. Use the resulting prompts with text-to-image models like Stable Diffusion on DreamStudio to create cool art!

Using as a library

Create and activate a Python virtual environment

python3 -m venv ci_env
(for linux  ) source ci_env/bin/activate
(for windows) .\ci_env\Scripts\activate

Install with PIP

# install torch with GPU support for example:
pip3 install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu117

# install clip-interrogator
pip install clip-interrogator==0.5.4

# or for very latest WIP with BLIP2 support
#pip install clip-interrogator==0.6.0

You can then use it in your script

from PIL import Image
from clip_interrogator import Config, Interrogator
image = Image.open(image_path).convert('RGB')
ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))
print(ci.interrogate(image))

CLIP Interrogator uses OpenCLIP, which supports many different pretrained CLIP models. For the best prompts for Stable Diffusion 1.x, use ViT-L-14/openai for clip_model_name. For Stable Diffusion 2.0, use ViT-H-14/laion2b_s32b_b79k.
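
For example, a minimal sketch (not from the README) of choosing the model to match the Stable Diffusion version you plan to prompt; in practice you would create only one Interrogator:

from clip_interrogator import Config, Interrogator

# Prompts intended for Stable Diffusion 1.x
ci_sd1 = Interrogator(Config(clip_model_name="ViT-L-14/openai"))

# Prompts intended for Stable Diffusion 2.0 (creating both at once is only for illustration)
ci_sd2 = Interrogator(Config(clip_model_name="ViT-H-14/laion2b_s32b_b79k"))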

Configuration

The Config object lets you configure CLIP Interrogator's processing.

  • clip_model_name: which of the OpenCLIP pretrained CLIP models to use
  • cache_path: path where precomputed text embeddings are saved
  • download_cache: when True, downloads the precomputed embeddings from Hugging Face
  • chunk_size: batch size for CLIP; use a smaller value for lower VRAM
  • quiet: when True, no progress bars or text output are displayed

On systems with low VRAM you can call config.apply_low_vram_defaults() to reduce the amount of VRAM needed (at the cost of some speed and quality). The default settings use about 6.3GB of VRAM and the low VRAM settings use about 2.7GB.
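
A short sketch putting these options together, assuming they are passed as keyword arguments to Config (values are illustrative, not recommendations):

from clip_interrogator import Config, Interrogator

config = Config(
    clip_model_name="ViT-L-14/openai",
    cache_path="cache",      # where precomputed text embeddings are saved
    download_cache=True,     # fetch precomputed embeddings from Hugging Face
    chunk_size=1024,         # smaller batches for CLIP on low-VRAM GPUs
    quiet=True,              # suppress progress bars and text output
)
config.apply_low_vram_defaults()  # about 2.7GB of VRAM instead of ~6.3GB
ci = Interrogator(config)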

See run_cli.py and run_gradio.py for more examples of using the Config and Interrogator classes.

Ranking against your own list of terms (requires version 0.6.0)

from clip_interrogator import Config, Interrogator, LabelTable, load_list
from PIL import Image

ci = Interrogator(Config(blip_model_type=None))
image = Image.open(image_path).convert('RGB')
table = LabelTable(load_list('terms.txt'), 'terms', ci)
best_match = table.rank(ci.image_to_features(image), top_count=1)[0]
print(best_match)
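
If you want more than the single best match, rank takes a top_count argument (used with top_count=1 above); a quick sketch:

top_matches = table.rank(ci.image_to_features(image), top_count=5)
for term in top_matches:
    print(term)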

clip-interrogator's People

Contributors

altryne, amrrs, chenxwh, drewwalkup, harrywang, maplepy, pharmapsychotic, starovoitovs


clip-interrogator's Issues

unable to download any model using Config()

The model download connection always breaks after downloading only a few percent.
If I knew the model URLs I could try cloning the model with git lfs.
Please advise.

Regarding flavors.txt

Can you give some details about how the file flavors.txt was generated?
Is it a standard set of terms?

Pip install issue with the example

After installing it from pip and trying to run the example, I got the following error:
ImportError: cannot import name 'Interrogator' from partially initialized module 'clip_interrogator' (most likely due to a circular import) (/usr/local/lib/python3.8/dist-packages/clip_interrogator/__init__.py)

I use Python 3.8.2

I have a problem using CLIP interrogate

It keeps showing me this error:
Traceback (most recent call last):
File "C:\AI\stable-diffusion-webui\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 275, in run_predict
output = await app.blocks.process_api(
File "C:\AI\stable-diffusion-webui\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 787, in process_api
result = await self.call_function(fn_index, inputs, iterator)
File "C:\AI\stable-diffusion-webui\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 694, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\AI\stable-diffusion-webui\stable-diffusion-webui\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\AI\stable-diffusion-webui\stable-diffusion-webui\venv\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\AI\stable-diffusion-webui\stable-diffusion-webui\venv\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "C:\AI\stable-diffusion-webui\stable-diffusion-webui\modules\ui.py", line 389, in interrogate
prompt = shared.interrogator.interrogate(image)
File "C:\AI\stable-diffusion-webui\stable-diffusion-webui\modules\interrogate.py", line 173, in interrogate
res += ""
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'str'

NameError: name 'Image' is not defined

I'm experiencing a "name error" problem using your script.

I copied the notebook to my Google Drive, then I made the setup.
If I run the notebook using your supplied reference image, everything works correctly.
If I try to swap the image URL with another one, for example:
https://pasteboard.co/64dPaDu0T35Q.png

I got the following error:

NameError Traceback (most recent call last)
in ()
37
38 if str(image_path_or_url).startswith('http://') or str(image_path_or_url).startswith('https://'):
---> 39 image = Image.open(requests.get(image_path_or_url, stream=True).raw).convert('RGB')
40 else:
41 image = Image.open(image_path_or_url).convert('RGB')

NameError: name 'Image' is not defined

Any help will be greatly appreciated

Loading Flavor chain too slow? Is there a better way?

I am using the following code, based on the readme. I am running on Windows 10 using Conda, with a 3090 with 24GB of VRAM.

import os
from PIL import Image
from clip_interrogator import Config, Interrogator

folder_path = "/path/to/input"
output_folder = "/path/to/output"

for image_name in os.listdir(folder_path):
    if image_name.endswith(".png"):
        image_path = os.path.join(folder_path, image_name)
        image = Image.open(image_path).convert('RGB')
        ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))
        output = ci.interrogate(image)
        output_path = os.path.splitext(image_name)[0] + ".txt"
        output_path = os.path.join(output_folder, output_path)
        with open(output_path, "w") as f:
            f.write(output)

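A hedged sketch of the same loop with the Interrogator created once, outside the loop, since loading the BLIP and CLIP models is what dominates the runtime:

import os
from PIL import Image
from clip_interrogator import Config, Interrogator

folder_path = "/path/to/input"
output_folder = "/path/to/output"

# Load the models once; creating the Interrogator per image reloads them every iteration
ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))

for image_name in os.listdir(folder_path):
    if image_name.endswith(".png"):
        image = Image.open(os.path.join(folder_path, image_name)).convert('RGB')
        output = ci.interrogate(image)
        output_path = os.path.join(output_folder, os.path.splitext(image_name)[0] + ".txt")
        with open(output_path, "w") as f:
            f.write(output)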

great work!

Really fascinating! And in some cases I could "reproduce" nearly exactly the image from the prompt in Stable Diffusion. Wow!!
Just two questions I am interested in from the technical view:

  1. Why can the prompt for the same image be different when restarting the process? Does the code start from random noise, but here on the token-embedding side?
  2. What are the differences between fast and best mode? Fast mode processes the image in less than 0.1 seconds; best mode takes 8 seconds. The best-mode result is better, but not 80 times better. ;) I would say "a bit" better. Classic mode, on the other hand, produced a rather poor result. To be honest, my statistical material is limited to my very last prompt generation. ;) But even if more tests came to different results, my questions would still stand.

Best regards
Marc

P.S: I am on cuda, RTX 4090

Is output length hardcoded?

Hello, @pharmapsychotic!

I enjoy using your tool very much. But it seems that the length of the top-item rankings is hardcoded and cannot be changed in the config nor passed as a function argument when using 'clip-interrogator' from pip.

for _ in tqdm(range(25), desc="Flavor chain"):

tops = merged.rank(image_features, 32)

BLIP med_config can't be found

Hello, when I install your git+https://github.com/pharmapsychotic/BLIP.git@lib#egg=blip BLIP fork without the -e option, clip-interrogator can't find the med_config file.

pip install git+https://github.com/pharmapsychotic/BLIP.git@lib#egg=blip
pip install clip-interrogator==0.3.2
pip install -e git+https://github.com/pharmapsychotic/BLIP.git@lib#egg=blip
pip install clip-interrogator==0.3.2

Maybe there is a better way to locate the config file?

blip_path = os.path.dirname(inspect.getfile(blip_decoder))
configs_path = os.path.join(os.path.dirname(blip_path), 'configs')
med_config = os.path.join(configs_path, 'med_config.json')
blip_model = blip_decoder(
    pretrained=config.blip_model_url,
    image_size=config.blip_image_eval_size,
    vit='large',
    med_config=med_config
)

Any way to add more artists/keywords?

Is it possible to add more terms to the database of keywords? I have a pretty big database of words that could be added to increase accuracy. Does it need to be retrained when I add them, or is it possible to just add them?
I'm using it to caption my image dataset, so I think it would be better if I could add more words.
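
One possible approach, sketched from the LabelTable example in the README above (requires the 0.6.0 API; no retraining is involved, the terms are just embedded with CLIP). Put the extra keywords in a text file, one per line; my_terms.txt and example.png are placeholder names:

from PIL import Image
from clip_interrogator import Config, Interrogator, LabelTable, load_list

ci = Interrogator(Config(blip_model_type=None))
table = LabelTable(load_list('my_terms.txt'), 'my_terms', ci)  # one custom keyword per line

image = Image.open('example.png').convert('RGB')
print(table.rank(ci.image_to_features(image), top_count=5))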

Pillow import error

Using a clean conda environment, and installing with pip. When running an example, all I get is:
ImportError: cannot import name 'PILLOW_VERSION' from 'PIL' (/media/evilscript/DATA/Conda/ClipInterrogator/lib/python3.11/site-packages/PIL/__init__.py)

Maybe downgrading pillow is the solution?

RuntimeError: expected scalar type Float but found Half

I've been getting this error on my M1 lately:

/opt/homebrew/lib/python3.10/site-packages/clip_interrogator/clip_interrogator.py:205 in interrogate

  202             best_prompt = t.rank(image_features, 1)[0]
  203             best_sim = self.similarity(image_features, best_prompt)
  204
❱ 205         check_multi_batch([best_medium, best_artist, best_trending, best_movement])
  206
  207         extended_flavors = set(flaves)
  208         for _ in tqdm(range(max_flavors), desc="Flavor chain", disable=self.config.quiet

/opt/homebrew/lib/python3.10/site-packages/clip_interrogator/clip_interrogator.py:202 in check_multi_batch

  199                 prompts.append(prompt)
  200
  201             t = LabelTable(prompts, None, self.clip_model, self.tokenize, self.config)
❱ 202             best_prompt = t.rank(image_features, 1)[0]
  203             best_sim = self.similarity(image_features, best_prompt)

/opt/homebrew/lib/python3.10/site-packages/clip_interrogator/clip_interrogator.py:295 in rank

  293     def rank(self, image_features: torch.Tensor, top_count: int=1) -> List[str]:
  294         if len(self.labels) <= self.chunk_size:
❱ 295             tops = self._rank(image_features, self.embeds, top_count=top_count)
  296             return [self.labels[i] for i in tops]
  297
  298         num_chunks = int(math.ceil(len(self.labels)/self.chunk_size))

/opt/homebrew/lib/python3.10/site-packages/clip_interrogator/clip_interrogator.py:289 in _rank

  286         top_count = min(top_count, len(text_embeds))
  287         text_embeds = torch.stack([torch.from_numpy(t) for t in text_embeds]).to(self.de
  288         with torch.cuda.amp.autocast():
❱ 289             similarity = image_features @ text_embeds.T
  290         _, top_labels = similarity.float().cpu().topk(top_count, dim=-1)
  291         return [top_labels[0][i].numpy() for i in range(top_count)]

Out of Memory - Google Colab

I am running the CLIP Interrogator in Google Colab. It was working fine yesterday, but it seems like PyTorch is allocating a significant amount of memory?

CUDA out of memory. Tried to allocate 6.76 GiB (GPU 0; 14.75 GiB total capacity; 11.73 GiB already allocated; 1.93 GiB free; 11.78 GiB reserved in total by PyTorch)

Is there a solution for this/what is causing this?

AttributeError: module 'torch.backends' has no attribute 'mps'

Fatal error on start:

Traceback (most recent call last):
File "F:\projects\AI\CLIP-Interrogator\run_cli.py", line 9, in
from clip_interrogator import Interrogator, Config
File "d:\Users\nailz\anaconda3\lib\site-packages\clip_interrogator_init_.py", line 1, in
from .clip_interrogator import Interrogator, Config
File "d:\Users\nailz\anaconda3\lib\site-packages\clip_interrogator\clip_interrogator.py", line 43, in
class Config:
File "d:\Users\nailz\anaconda3\lib\site-packages\clip_interrogator\clip_interrogator.py", line 65, in Config
device: str = ("mps" if torch.backends.mps.is_available() else "cuda" if torch.cuda.is_available() else "cpu")
AttributeError: module 'torch.backends' has no attribute 'mps'

Windows 11 intel x64, python 3.10.7 or conda with 3.9

Difference between the tokenizer of sd 2.0 and ViT-H.

First of all, your work on CLIP Interrogator is great!

However, it seems that the tokenizer of SD 2.0 is different from that of ViT-H.
For example, here is the test code.

from transformers import AutoTokenizer, CLIPTokenizer

prompt="!!"

tokenizer1 = AutoTokenizer.from_pretrained("laion/CLIP-ViT-H-14-laion2B-s32B-b79K")
tokenizer2 = CLIPTokenizer.from_pretrained("stabilityai/stable-diffusion-2", subfolder="tokenizer")

tokens1 = tokenizer1.tokenize(prompt)
tokens2 = tokenizer2.tokenize(prompt)

print(len(tokens1),len(tokens2))

I got the following output:

1 2

Why does this difference occur?

Failed to load BLIP

I made a clean install, tried the sample snippet on the wiki, and kept getting the same error.
Any advice to get past this point?
Win11
Cuda 11.6
python 3.10.6

Loading BLIP model...
Traceback (most recent call last):
  File "C:\ken\style-transfer\diff\clip-interrogator\testblip.py", line 4, in <module>
    ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))
  File "C:\ken\style-transfer\diff\clip-interrogator\clip_interrogator\clip_interrogator.py", line 58, in __init__
    blip_model = blip_decoder(
  File "c:\ken\style-transfer\diff\clip-interrogator\src\blip\models\blip.py", line 175, in blip_decoder
    model,msg = load_checkpoint(model,pretrained)
  File "c:\ken\style-transfer\diff\clip-interrogator\src\blip\models\blip.py", line 218, in load_checkpoint
    checkpoint = torch.load(cached_file, map_location='cpu')
  File "C:\Users\ken\Anaconda2\envs\StableDiffusion\lib\site-packages\torch\serialization.py", line 777, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "C:\Users\ken\Anaconda2\envs\StableDiffusion\lib\site-packages\torch\serialization.py", line 282, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

TypeError: 'type' object is not subscriptable

Traceback (most recent call last):
File "run_cli.py", line 9, in
from clip_interrogator import Interrogator, Config
File "D:\sd14\clip-interrogator-with-less-VRAM-main\clip_interrogator_init_.py", line 1, in
from .clip_interrogator import Interrogator, Config
File "D:\sd14\clip-interrogator-with-less-VRAM-main\clip_interrogator\clip_interrogator.py", line 82, in
class Interrogator:
File "D:\sd14\clip-interrogator-with-less-VRAM-main\clip_interrogator\clip_interrogator.py", line 349, in Interrogator
def _first_bit_batch(self, images: list[Image]) -> (list[str], list[torch.Tensor]):
TypeError: 'type' object is not subscriptable

New instructions to get running locally?

Hi, I had this running locally fine, but I just updated and broke it. Is there a new process?

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_17848\3198441247.py in <module>
     34 os.makedirs('cache', exist_ok=True)
     35 for url in CACHE_URLS:
---> 36     print(subprocess.run(['wget', url, '-P', 'cache'], stdout=subprocess.PIPE).stdout.decode('utf-8'))
     37 
     38 

Anaconda3\lib\subprocess.py in run(input, capture_output, timeout, check, *popenargs, **kwargs)
    503         kwargs['stderr'] = PIPE
    504 
--> 505     with Popen(*popenargs, **kwargs) as process:
    506         try:
    507             stdout, stderr = process.communicate(input, timeout=timeout)

Anaconda3\lib\subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, user, group, extra_groups, encoding, errors, text, umask)
    949                             encoding=encoding, errors=errors)
    950 
--> 951             self._execute_child(args, executable, preexec_fn, close_fds,
    952                                 pass_fds, cwd, env,
    953                                 startupinfo, creationflags, shell,

Anaconda3\lib\subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_gid, unused_gids, unused_uid, unused_umask, unused_start_new_session)
   1418             # Start the process
...
-> 1420                 hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
   1421                                          # no special security
   1422                                          None, None,

FileNotFoundError: [WinError 2] The system cannot find the file specified

Clip-Interrogator 2 error, RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 0

I tried using an image and got the following error:

---------------------------------------------------------------------------

RuntimeError                              Traceback (most recent call last)

[<ipython-input-5-f6c038c2ae8b>](https://localhost:8080/#) in go(btn)
     47         display(thumb)
     48 
---> 49         prompt = interrogate(image)
     50         IPython.display.clear_output()
     51         show_ui()

6 frames

[/usr/local/lib/python3.7/dist-packages/torchvision/transforms/functional_tensor.py](https://localhost:8080/#) in normalize(tensor, mean, std, inplace)
    957     if std.ndim == 1:
    958         std = std.view(-1, 1, 1)
--> 959     tensor.sub_(mean).div_(std)
    960     return tensor
    961 

RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 0
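
The mismatch (a 4-channel tensor against 3-channel normalization constants) usually means the input image has an alpha channel; a hedged workaround is converting it to RGB before interrogating, as the README example does:

from PIL import Image

# Drop the alpha channel so normalization sees 3 channels
image = Image.open("input.png").convert('RGB')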

Issue with SD2

When running it for SD2 I'm getting the following error:
Command: ci = Interrogator(Config(clip_model_name="ViT-H-14/laion2b_s32b_b79k"))

RuntimeError: Model ViT-H-14/laion2b_s32b_b79k not found; available models = ['RN50', 'RN101', 'RN50x4', 'RN50x16', 'RN50x64', 'ViT-B/32', 'ViT-B/16', 'ViT-L/14', 'ViT-L/14@336px']

Optimisations for 8GB VRAM?

Are there any optimisations to get this running on an 8GB RTX 3060 Ti? From what I understand it currently needs 12GB.
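
One hedged option is the low-VRAM configuration described in the Configuration section above, which is documented to use about 2.7GB:

from clip_interrogator import Config, Interrogator

config = Config(clip_model_name="ViT-L-14/openai")
config.apply_low_vram_defaults()  # trades some speed and quality for lower VRAM use
ci = Interrogator(config)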

Trying to use with diffusers/upgrade to recent transformers

I moved the transformers install to version 4.24.0 (installed after BLIP/CLIP), which got rid of the "module transformers has no attribute FeatureExtractionMixin" problem, but now it is doing this:

The following model_kwargs are not used by the model: ['encoder_hidden_states', 'encoder_attention_mask']

I realize this may be a problem with Transformers rather than CLIP Interrogator, but I can't 100% tell anymore, so I thought it might be something you need to know about. Right now Diffusers and CLIP Interrogator cannot run in the same environment.

demo is failing

Prediction input failed validation: {"detail":[{"loc":["body","input","clip_model_name"],"msg":"value is not a valid enumeration member; permitted: 'ViT-L-14/openai', 'ViT-H-14/laion2b_s32b_b79k'","type":"type_error.enum","ctx":{"enum_values":["ViT-L-14/openai","ViT-H-14/laion2b_s32b_b79k"]}}]}


only using 10% of my gpu

Hello, I managed to get the new interrogator to run locally, but it only uses a fraction of my GPU power (10%). Is there any way to optimize this?
I'm trying to caption my database, which has millions of images, so it would be nice if it could be optimized.
Thanks 😄

Seems to think everything is blue?

Hi, I love the tool and have been experimenting with it a bunch, but one thing I've noticed is that it seems to think everything is blue, which is especially an issue when it comes to skin tones.

Has anyone else encountered this issue? Maybe I need to preprocess the files differently?

I am using ViT-L-14/openai

'charmap' codec can't encode character '\U0001f380'

I don't know exactly where this came from, but here is the error I got:

Traceback (most recent call last):
  File "E:\AIML\data\data\caption.py", line 17, in <module>
    f.write(output)
  File "C:\Users\Aivan\miniconda3\envs\st\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f380' in position 105: character maps to <undefined>

code:

import os
from PIL import Image
from clip_interrogator import Config, Interrogator

folder_path = "/path/to/input"
output_folder = "/path/to/output"
ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))

for image_name in os.listdir(folder_path):
    if image_name.endswith(".png"):
        image_path = os.path.join(folder_path, image_name)
        image = Image.open(image_path).convert('RGB')
        output = ci.interrogate(image)
        output_path = os.path.splitext(image_name)[0] + ".txt"
        output_path = os.path.join(output_folder, output_path)
        with open(output_path, "w") as f:
            f.write(output)
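
The traceback points at the file write rather than the interrogation: on Windows, open() defaults to a legacy codec (cp1252 here) that cannot encode the emoji that sometimes appear in prompts. A hedged fix is to write the caption files as UTF-8:

        with open(output_path, "w", encoding="utf-8") as f:
            f.write(output)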

UnicodeDecodeError on Windows

When installing it on Ubuntu, there is no problem. But when installing it on Windows, there is a 'gbk' UnicodeDecodeError. I guess the issue could be fixed easily by the author. Thanks!!

(venv37) PS E:\_Ai\clip_interrogator> pip install clip-interrogator
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting clip-interrogator
  Using cached https://pypi.tuna.tsinghua.edu.cn/packages/b9/c0/ac7d63330d69bacbe00e057d54db484560020d5acc69f26df8b64f2c3e85/clip-interrogator-0.4.1.tar.gz (786 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [15 lines of output]
      Traceback (most recent call last):
        File "e:\_ai\clip_interrogator\venv37\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module>
          main()
        File "e:\_ai\clip_interrogator\venv37\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "e:\_ai\clip_interrogator\venv37\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "C:\Users\zhuos\AppData\Local\Temp\pip-build-env-dd06cv8m\overlay\Lib\site-packages\setuptools\build_meta.py", line 338, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
        File "C:\Users\zhuos\AppData\Local\Temp\pip-build-env-dd06cv8m\overlay\Lib\site-packages\setuptools\build_meta.py", line 320, in _get_build_requires
          self.run_setup()
        File "C:\Users\zhuos\AppData\Local\Temp\pip-build-env-dd06cv8m\overlay\Lib\site-packages\setuptools\build_meta.py", line 335, in run_setup
          exec(code, locals())
        File "<string>", line 14, in <module>
      UnicodeDecodeError: 'gbk' codec can't decode byte 0xa4 in position 469: illegal multibyte sequence
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

Out of Memory

Loading CLIP model...
ViT-L-14_openai_artists.safetensors: 100%|████████████████████| 16.2M/16.2M [00:00<00:00, 39.6MB/s]
ViT-L-14_openai_flavors.safetensors: 100%|████████████████████| 155M/155M [00:02<00:00, 55.0MB/s]
ViT-L-14_openai_mediums.safetensors: 100%|████████████████████| 146k/146k [00:00<00:00, 3.55MB/s]
ViT-L-14_openai_movements.safetensors: 100%|████████████████████| 307k/307k [00:00<00:00, 2.69MB/s]
ViT-L-14_openai_trendings.safetensors: 100%|████████████████████| 111k/111k [00:00<00:00, 2.53MB/s]
ViT-L-14_openai_negative.safetensors: 100%|████████████████████| 63.2k/63.2k [00:00<00:00, 1.55MB/s]
Loaded CLIP model and data in 11.78 seconds.
CUDA out of memory. Tried to allocate 224.00 MiB (GPU 0; 6.00 GiB total capacity; 4.05 GiB already allocated; 0 bytes free; 4.12 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Ran out of VRAM!

Maybe you could add some more buttons to the UI and run this stuff in stages/passes so you can get all the data without loading all these models into memory at once. Maybe just an "I have no VRAM and I must scream" button.

If file x.txt exists, append a new line
Something like that?
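
If the request is to append to an existing caption file instead of overwriting it, a minimal sketch in plain Python (the file name and output string are placeholders):

import os

output = "a generated prompt"  # placeholder for the result of ci.interrogate(image)
path = "x.txt"
if os.path.exists(path):
    with open(path, "a", encoding="utf-8") as f:
        f.write("\n" + output)  # append the new prompt on its own line
else:
    with open(path, "w", encoding="utf-8") as f:
        f.write(output)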

What is the source of the data?

Thank you for the great work!
I was wondering where you curated the different lists in the data folder ("flavors.txt", "mediums.txt", etc.).
Is the data from a prompt dataset?
What rules did you use to extract and classify them?
Thanks in advance!

Format for image input?

Trying to use this API in Node via Replicate and having trouble getting the image into a file format that the API will accept. I have tried a number of things: a data URL (which worked in a similar situation with an audio-analysis API from Replicate), creating a readable stream as in the OpenAI docs, and using sharp to flatten the image and mimic what was done in the Python example from the API docs on Replicate.

Any help appreciated!
