
clip-interrogator's Introduction

clip-interrogator

Want to figure out what a good prompt might be to create new images like an existing one? The CLIP Interrogator is here to get you answers!

Run it!

🆕 Now available as a Stable Diffusion Web UI Extension! 🆕


Run Version 2 on Colab, HuggingFace, and Replicate!

Open In Colab Generic badge Replicate Lambda


Version 1 still available in Colab for comparing different CLIP models

Open In Colab

About

The CLIP Interrogator is a prompt engineering tool that combines OpenAI's CLIP and Salesforce's BLIP to optimize text prompts to match a given image. Use the resulting prompts with text-to-image models like Stable Diffusion on DreamStudio to create cool art!

Using as a library

Create and activate a Python virtual environment

python3 -m venv ci_env
(for linux  ) source ci_env/bin/activate
(for windows) .\ci_env\Scripts\activate

Install with PIP

# install torch with GPU support for example:
pip3 install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu117

# install clip-interrogator
pip install clip-interrogator==0.5.4

# or for very latest WIP with BLIP2 support
#pip install clip-interrogator==0.6.0

You can then use it in your script

from PIL import Image
from clip_interrogator import Config, Interrogator
image = Image.open(image_path).convert('RGB')
ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))
print(ci.interrogate(image))

CLIP Interrogator uses OpenCLIP, which supports many different pretrained CLIP models. For the best prompts for Stable Diffusion 1.x, use ViT-L-14/openai for clip_model_name. For Stable Diffusion 2.0, use ViT-H-14/laion2b_s32b_b79k.
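
For example, a minimal sketch (not from the README) of choosing the model to match the Stable Diffusion version you plan to prompt; in practice you would create only one Interrogator:

from clip_interrogator import Config, Interrogator

# Prompts intended for Stable Diffusion 1.x
ci_sd1 = Interrogator(Config(clip_model_name="ViT-L-14/openai"))

# Prompts intended for Stable Diffusion 2.0 (creating both at once is only for illustration)
ci_sd2 = Interrogator(Config(clip_model_name="ViT-H-14/laion2b_s32b_b79k"))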

Configuration

The Config object lets you configure CLIP Interrogator's processing.

  • clip_model_name: which of the OpenCLIP pretrained CLIP models to use
  • cache_path: path where precomputed text embeddings are saved
  • download_cache: when True, downloads the precomputed embeddings from Hugging Face
  • chunk_size: batch size for CLIP; use a smaller value for lower VRAM
  • quiet: when True, no progress bars or text output are displayed

On systems with low VRAM you can call config.apply_low_vram_defaults() to reduce the amount of VRAM needed (at the cost of some speed and quality). The default settings use about 6.3GB of VRAM and the low VRAM settings use about 2.7GB.
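
A short sketch putting these options together, assuming they are passed as keyword arguments to Config (values are illustrative, not recommendations):

from clip_interrogator import Config, Interrogator

config = Config(
    clip_model_name="ViT-L-14/openai",
    cache_path="cache",      # where precomputed text embeddings are saved
    download_cache=True,     # fetch precomputed embeddings from Hugging Face
    chunk_size=1024,         # smaller batches for CLIP on low-VRAM GPUs
    quiet=True,              # suppress progress bars and text output
)
config.apply_low_vram_defaults()  # about 2.7GB of VRAM instead of ~6.3GB
ci = Interrogator(config)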

See run_cli.py and run_gradio.py for more examples of using the Config and Interrogator classes.

Ranking against your own list of terms (requires version 0.6.0)

from clip_interrogator import Config, Interrogator, LabelTable, load_list
from PIL import Image

ci = Interrogator(Config(blip_model_type=None))
image = Image.open(image_path).convert('RGB')
table = LabelTable(load_list('terms.txt'), 'terms', ci)
best_match = table.rank(ci.image_to_features(image), top_count=1)[0]
print(best_match)
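
If you want more than the single best match, rank takes a top_count argument (used with top_count=1 above); a quick sketch:

top_matches = table.rank(ci.image_to_features(image), top_count=5)
for term in top_matches:
    print(term)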

clip-interrogator's People

Contributors

altryne, amrrs, chenxwh, drewwalkup, harrywang, maplepy, pharmapsychotic, starovoitovs


clip-interrogator's Issues

unable to download any model using Config()

The model download connection always breaks after downloading only a few percent.
If I knew the model URLs I could try cloning the model with git lfs.
Please advise.

Regarding flavors.txt

Can you give some details about how the file flavors.txt was generated?
Is it a standard set of terms?

Pip install issue with the example

After installing it from pip and trying to run the example, I got the following error:
ImportError: cannot import name 'Interrogator' from partially initialized module 'clip_interrogator' (most likely due to a circular import) (/usr/local/lib/python3.8/dist-packages/clip_interrogator/__init__.py)

I use Python 3.8.2

I have a problem using CLIP interrogate

It keeps showing me this error:
Traceback (most recent call last):
File "C:\AI\stable-diffusion-webui\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 275, in run_predict
output = await app.blocks.process_api(
File "C:\AI\stable-diffusion-webui\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 787, in process_api
result = await self.call_function(fn_index, inputs, iterator)
File "C:\AI\stable-diffusion-webui\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 694, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\AI\stable-diffusion-webui\stable-diffusion-webui\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\AI\stable-diffusion-webui\stable-diffusion-webui\venv\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\AI\stable-diffusion-webui\stable-diffusion-webui\venv\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "C:\AI\stable-diffusion-webui\stable-diffusion-webui\modules\ui.py", line 389, in interrogate
prompt = shared.interrogator.interrogate(image)
File "C:\AI\stable-diffusion-webui\stable-diffusion-webui\modules\interrogate.py", line 173, in interrogate
res += ""
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'str'

NameError: name 'Image' is not defined

I'm experiencing a "name error" problem using your script.

I copied the notebook to my Google Drive, then I made the setup.
If I run the notebook using your supplied reference image, everything works correctly.
If I try to swap the image URL with another one, for example:
https://pasteboard.co/64dPaDu0T35Q.png

I got the following error:

NameError Traceback (most recent call last)
in ()
37
38 if str(image_path_or_url).startswith('http://') or str(image_path_or_url).startswith('https://'):
---> 39 image = Image.open(requests.get(image_path_or_url, stream=True).raw).convert('RGB')
40 else:
41 image = Image.open(image_path_or_url).convert('RGB')

NameError: name 'Image' is not defined

Any help will be greatly appreciated

Loading Flavor chain too slow? Is there a better way?

I am using the following code, based on the readme. I am running on Windows 10 using Conda, with a 3090 with 24GB of VRAM.

import os
from PIL import Image
from clip_interrogator import Config, Interrogator

folder_path = "/path/to/input"
output_folder = "/path/to/output"

for image_name in os.listdir(folder_path):
    if image_name.endswith(".png"):
        image_path = os.path.join(folder_path, image_name)
        image = Image.open(image_path).convert('RGB')
        ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))
        output = ci.interrogate(image)
        output_path = os.path.splitext(image_name)[0] + ".txt"
        output_path = os.path.join(output_folder, output_path)
        with open(output_path, "w") as f:
            f.write(output)

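A hedged sketch of the same loop with the Interrogator created once, outside the loop, since loading the BLIP and CLIP models is what dominates the runtime:

import os
from PIL import Image
from clip_interrogator import Config, Interrogator

folder_path = "/path/to/input"
output_folder = "/path/to/output"

# Load the models once; creating the Interrogator per image reloads them every iteration
ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))

for image_name in os.listdir(folder_path):
    if image_name.endswith(".png"):
        image = Image.open(os.path.join(folder_path, image_name)).convert('RGB')
        output = ci.interrogate(image)
        output_path = os.path.join(output_folder, os.path.splitext(image_name)[0] + ".txt")
        with open(output_path, "w") as f:
            f.write(output)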

great work!

Really fascinating! And in some cases I could "reproduce" nearly exactly the image from the prompt in Stable Diffusion. Wow!!
Just two questions I am interested in from the technical view:

  1. Why can the prompt for the same image be different when restarting the process? Does the code start from random noise, but here on the token-embedding side?
  2. What are the differences between fast and best mode? Fast mode processes the image in less than 0.1 seconds; best mode takes 8 seconds. The best-mode result is better, but not 80 times better. ;) I would say "a bit" better. Classic mode, on the other hand, produced a rather poor result. To be honest, my statistical material is limited to my very last prompt generation. ;) But even if more tests came to different results, my questions would still stand.

Best regards
Marc

P.S: I am on cuda, RTX 4090

Is output length hardcoded?

Hello, @pharmapsychotic!

I enjoy using your tool very much. But it seems that the length of the top-item rankings is hardcoded and cannot be changed in the config nor passed as a function argument when using 'clip-interrogator' from pip.

for _ in tqdm(range(25), desc="Flavor chain"):

tops = merged.rank(image_features, 32)

BLIP med_config can't be found

Hello, when I install your git+https://github.com/pharmapsychotic/BLIP.git@lib#egg=blip BLIP fork without the -e option, clip-interrogator can't find the med_config file.

pip install git+https://github.com/pharmapsychotic/BLIP.git@lib#egg=blip
pip install clip-interrogator==0.3.2
pip install -e git+https://github.com/pharmapsychotic/BLIP.git@lib#egg=blip
pip install clip-interrogator==0.3.2

Maybe there is a better way to locate the config file?

blip_path = os.path.dirname(inspect.getfile(blip_decoder))
configs_path = os.path.join(os.path.dirname(blip_path), 'configs')
med_config = os.path.join(configs_path, 'med_config.json')
blip_model = blip_decoder(
    pretrained=config.blip_model_url,
    image_size=config.blip_image_eval_size,
    vit='large',
    med_config=med_config
)

Any way to add more artists/keywords?

Is it possible to add more terms to the database of keywords? I have a pretty big database of words that could be added to increase accuracy. Does it need to be retrained when I add them, or is it possible to just add them?
I'm using it to caption my image dataset, so I think it would be better if I could add more words.
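
One possible approach, sketched from the LabelTable example in the README above (requires the 0.6.0 API; no retraining is involved, the terms are just embedded with CLIP). Put the extra keywords in a text file, one per line; my_terms.txt and example.png are placeholder names:

from PIL import Image
from clip_interrogator import Config, Interrogator, LabelTable, load_list

ci = Interrogator(Config(blip_model_type=None))
table = LabelTable(load_list('my_terms.txt'), 'my_terms', ci)  # one custom keyword per line

image = Image.open('example.png').convert('RGB')
print(table.rank(ci.image_to_features(image), top_count=5))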

Pillow import error

Using a clean conda environment, and installing with pip. When running an example, all I get is:
ImportError: cannot import name 'PILLOW_VERSION' from 'PIL' (/media/evilscript/DATA/Conda/ClipInterrogator/lib/python3.11/site-packages/PIL/__init__.py)

Maybe downgrading pillow is the solution?

RuntimeError: expected scalar type Float but found Half

I've been getting this error on my M1 lately:

/opt/homebrew/lib/python3.10/site-packages/clip_interrogator/clip_interrogator.py:205 in interrogate

  202             best_prompt = t.rank(image_features, 1)[0]
  203             best_sim = self.similarity(image_features, best_prompt)
  204
❱ 205         check_multi_batch([best_medium, best_artist, best_trending, best_movement])
  206
  207         extended_flavors = set(flaves)
  208         for _ in tqdm(range(max_flavors), desc="Flavor chain", disable=self.config.quiet

/opt/homebrew/lib/python3.10/site-packages/clip_interrogator/clip_interrogator.py:202 in check_multi_batch

  199                 prompts.append(prompt)
  200
  201             t = LabelTable(prompts, None, self.clip_model, self.tokenize, self.config)
❱ 202             best_prompt = t.rank(image_features, 1)[0]
  203             best_sim = self.similarity(image_features, best_prompt)

/opt/homebrew/lib/python3.10/site-packages/clip_interrogator/clip_interrogator.py:295 in rank

  293     def rank(self, image_features: torch.Tensor, top_count: int=1) -> List[str]:
  294         if len(self.labels) <= self.chunk_size:
❱ 295             tops = self._rank(image_features, self.embeds, top_count=top_count)
  296             return [self.labels[i] for i in tops]
  297
  298         num_chunks = int(math.ceil(len(self.labels)/self.chunk_size))

/opt/homebrew/lib/python3.10/site-packages/clip_interrogator/clip_interrogator.py:289 in _rank

  286         top_count = min(top_count, len(text_embeds))
  287         text_embeds = torch.stack([torch.from_numpy(t) for t in text_embeds]).to(self.de
  288         with torch.cuda.amp.autocast():
❱ 289             similarity = image_features @ text_embeds.T
  290         _, top_labels = similarity.float().cpu().topk(top_count, dim=-1)
  291         return [top_labels[0][i].numpy() for i in range(top_count)]

Out of Memory - Google Colab

I am running the CLIP Interrogator in Google Colab. It was working fine yesterday, but it seems like PyTorch is allocating a significant amount of memory?

CUDA out of memory. Tried to allocate 6.76 GiB (GPU 0; 14.75 GiB total capacity; 11.73 GiB already allocated; 1.93 GiB free; 11.78 GiB reserved in total by PyTorch)

Is there a solution for this/what is causing this?

AttributeError: module 'torch.backends' has no attribute 'mps'

Fatal error on start:

Traceback (most recent call last):
File "F:\projects\AI\CLIP-Interrogator\run_cli.py", line 9, in
from clip_interrogator import Interrogator, Config
File "d:\Users\nailz\anaconda3\lib\site-packages\clip_interrogator_init_.py", line 1, in
from .clip_interrogator import Interrogator, Config
File "d:\Users\nailz\anaconda3\lib\site-packages\clip_interrogator\clip_interrogator.py", line 43, in
class Config:
File "d:\Users\nailz\anaconda3\lib\site-packages\clip_interrogator\clip_interrogator.py", line 65, in Config
device: str = ("mps" if torch.backends.mps.is_available() else "cuda" if torch.cuda.is_available() else "cpu")
AttributeError: module 'torch.backends' has no attribute 'mps'

Windows 11 intel x64, python 3.10.7 or conda with 3.9

Difference between the tokenizer of sd 2.0 and ViT-H.

First of all, your work on CLIP Interrogator is great!

However, it seems that the tokenizer of SD 2.0 is different from that of ViT-H.
For example, here is the test code.

from transformers import AutoTokenizer, CLIPTokenizer

prompt="!!"

tokenizer1 = AutoTokenizer.from_pretrained("laion/CLIP-ViT-H-14-laion2B-s32B-b79K")
tokenizer2 = CLIPTokenizer.from_pretrained("stabilityai/stable-diffusion-2", subfolder="tokenizer")

tokens1 = tokenizer1.tokenize(prompt)
tokens2 = tokenizer2.tokenize(prompt)

print(len(tokens1),len(tokens2))

I got the following output:

1 2

Why does this difference occur?

Failed to load BLIP

I made a clean install, tried the sample snippet on the wiki, and kept getting the same error.
Any advice to get past this point?
Win11
Cuda 11.6
python 3.10.6

Loading BLIP model...
Traceback (most recent call last):
  File "C:\ken\style-transfer\diff\clip-interrogator\testblip.py", line 4, in <module>
    ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))
  File "C:\ken\style-transfer\diff\clip-interrogator\clip_interrogator\clip_interrogator.py", line 58, in __init__
    blip_model = blip_decoder(
  File "c:\ken\style-transfer\diff\clip-interrogator\src\blip\models\blip.py", line 175, in blip_decoder
    model,msg = load_checkpoint(model,pretrained)
  File "c:\ken\style-transfer\diff\clip-interrogator\src\blip\models\blip.py", line 218, in load_checkpoint
    checkpoint = torch.load(cached_file, map_location='cpu')
  File "C:\Users\ken\Anaconda2\envs\StableDiffusion\lib\site-packages\torch\serialization.py", line 777, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "C:\Users\ken\Anaconda2\envs\StableDiffusion\lib\site-packages\torch\serialization.py", line 282, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

TypeError: 'type' object is not subscriptable

Traceback (most recent call last):
File "run_cli.py", line 9, in
from clip_interrogator import Interrogator, Config
File "D:\sd14\clip-interrogator-with-less-VRAM-main\clip_interrogator_init_.py", line 1, in
from .clip_interrogator import Interrogator, Config
File "D:\sd14\clip-interrogator-with-less-VRAM-main\clip_interrogator\clip_interrogator.py", line 82, in
class Interrogator:
File "D:\sd14\clip-interrogator-with-less-VRAM-main\clip_interrogator\clip_interrogator.py", line 349, in Interrogator
def _first_bit_batch(self, images: list[Image]) -> (list[str], list[torch.Tensor]):
TypeError: 'type' object is not subscriptable

New instructions to get running locally?

Hi, I had this running locally fine, but I just updated and broke it. Is there a new process?

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_17848\3198441247.py in <module>
     34 os.makedirs('cache', exist_ok=True)
     35 for url in CACHE_URLS:
---> 36     print(subprocess.run(['wget', url, '-P', 'cache'], stdout=subprocess.PIPE).stdout.decode('utf-8'))
     37 
     38 

Anaconda3\lib\subprocess.py in run(input, capture_output, timeout, check, *popenargs, **kwargs)
    503         kwargs['stderr'] = PIPE
    504 
--> 505     with Popen(*popenargs, **kwargs) as process:
    506         try:
    507             stdout, stderr = process.communicate(input, timeout=timeout)

Anaconda3\lib\subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, user, group, extra_groups, encoding, errors, text, umask)
    949                             encoding=encoding, errors=errors)
    950 
--> 951             self._execute_child(args, executable, preexec_fn, close_fds,
    952                                 pass_fds, cwd, env,
    953                                 startupinfo, creationflags, shell,

Anaconda3\lib\subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_gid, unused_gids, unused_uid, unused_umask, unused_start_new_session)
   1418             # Start the process
...
-> 1420                 hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
   1421                                          # no special security
   1422                                          None, None,

FileNotFoundError: [WinError 2] The system cannot find the file specified

Clip-Interrogator 2 error, RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 0

I tried using an image and got the following error:

---------------------------------------------------------------------------

RuntimeError                              Traceback (most recent call last)

[<ipython-input-5-f6c038c2ae8b>](https://localhost:8080/#) in go(btn)
     47         display(thumb)
     48 
---> 49         prompt = interrogate(image)
     50         IPython.display.clear_output()
     51         show_ui()

6 frames

[/usr/local/lib/python3.7/dist-packages/torchvision/transforms/functional_tensor.py](https://localhost:8080/#) in normalize(tensor, mean, std, inplace)
    957     if std.ndim == 1:
    958         std = std.view(-1, 1, 1)
--> 959     tensor.sub_(mean).div_(std)
    960     return tensor
    961 

RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 0
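
The mismatch (a 4-channel tensor against 3-channel normalization constants) usually means the input image has an alpha channel; a hedged workaround is converting it to RGB before interrogating, as the README example does:

from PIL import Image

# Drop the alpha channel so normalization sees 3 channels
image = Image.open("input.png").convert('RGB')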

Issue with SD2

When running it for SD2 I'm getting the following error:
Command: ci = Interrogator(Config(clip_model_name="ViT-H-14/laion2b_s32b_b79k"))

RuntimeError: Model ViT-H-14/laion2b_s32b_b79k not found; available models = ['RN50', 'RN101', 'RN50x4', 'RN50x16', 'RN50x64', 'ViT-B/32', 'ViT-B/16', 'ViT-L/14', 'ViT-L/14@336px']

Optimisations for 8GB VRAM?

Are there any optimisations to get this running on an 8GB RTX 3060 Ti? From what I understand it currently needs 12GB.
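
One hedged option is the low-VRAM configuration described in the Configuration section above, which is documented to use about 2.7GB:

from clip_interrogator import Config, Interrogator

config = Config(clip_model_name="ViT-L-14/openai")
config.apply_low_vram_defaults()  # trades some speed and quality for lower VRAM use
ci = Interrogator(config)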

Trying to use with diffusers/upgrade to recent transformers

I moved the transformers install to version 4.24.0 (installed after BLIP/CLIP), which got rid of the "module transformers has no attribute FeatureExtractionMixin" problem, but now it is doing this:

The following model_kwargs are not used by the model: ['encoder_hidden_states', 'encoder_attention_mask']

I realize this may be a problem with Transformers rather than CLIP Interrogator, but I can't 100% tell anymore, so I thought it might be something you need to know about. Right now Diffusers and CLIP Interrogator cannot run in the same environment.

demo is failing

Prediction input failed validation: {"detail":[{"loc":["body","input","clip_model_name"],"msg":"value is not a valid enumeration member; permitted: 'ViT-L-14/openai', 'ViT-H-14/laion2b_s32b_b79k'","type":"type_error.enum","ctx":{"enum_values":["ViT-L-14/openai","ViT-H-14/laion2b_s32b_b79k"]}}]}


only using 10% of my gpu

Hello, I managed to get the new interrogator to run locally, but it only uses a fraction of my GPU power (10%). Is there any way to optimize this?
I'm trying to caption my database, which has millions of images, so it would be nice if it could be optimized.
Thanks 😄

Seems to think everything is blue?

Hi, I love the tool and have been experimenting with it a bunch, but one thing I've noticed is that it seems to think everything is blue, which is especially an issue when it comes to skin tones.

Has anyone else encountered this issue? Maybe I need to preprocess the files differently?

I am using ViT-L-14/openai

'charmap' codec can't encode character '\U0001f380'

I don't know exactly where this came from, but here is the error I got:

Traceback (most recent call last):
  File "E:\AIML\data\data\caption.py", line 17, in <module>
    f.write(output)
  File "C:\Users\Aivan\miniconda3\envs\st\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f380' in position 105: character maps to <undefined>

code:

import os
from PIL import Image
from clip_interrogator import Config, Interrogator

folder_path = "/path/to/input"
output_folder = "/path/to/output"
ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))

for image_name in os.listdir(folder_path):
    if image_name.endswith(".png"):
        image_path = os.path.join(folder_path, image_name)
        image = Image.open(image_path).convert('RGB')
        output = ci.interrogate(image)
        output_path = os.path.splitext(image_name)[0] + ".txt"
        output_path = os.path.join(output_folder, output_path)
        with open(output_path, "w") as f:
            f.write(output)
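
The traceback points at the file write rather than the interrogation: on Windows, open() defaults to a legacy codec (cp1252 here) that cannot encode the emoji that sometimes appear in prompts. A hedged fix is to write the caption files as UTF-8:

        with open(output_path, "w", encoding="utf-8") as f:
            f.write(output)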

UnicodeDecodeError on Windows

When installing it on Ubuntu, there is no problem. But when installing it on Windows, there is a 'gbk' UnicodeDecodeError. I guess the issue could be fixed easily by the author. Thanks!!

(venv37) PS E:\_Ai\clip_interrogator> pip install clip-interrogator
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting clip-interrogator
  Using cached https://pypi.tuna.tsinghua.edu.cn/packages/b9/c0/ac7d63330d69bacbe00e057d54db484560020d5acc69f26df8b64f2c3e85/clip-interrogator-0.4.1.tar.gz (786 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [15 lines of output]
      Traceback (most recent call last):
        File "e:\_ai\clip_interrogator\venv37\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module>
          main()
        File "e:\_ai\clip_interrogator\venv37\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "e:\_ai\clip_interrogator\venv37\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "C:\Users\zhuos\AppData\Local\Temp\pip-build-env-dd06cv8m\overlay\Lib\site-packages\setuptools\build_meta.py", line 338, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
        File "C:\Users\zhuos\AppData\Local\Temp\pip-build-env-dd06cv8m\overlay\Lib\site-packages\setuptools\build_meta.py", line 320, in _get_build_requires
          self.run_setup()
        File "C:\Users\zhuos\AppData\Local\Temp\pip-build-env-dd06cv8m\overlay\Lib\site-packages\setuptools\build_meta.py", line 335, in run_setup
          exec(code, locals())
        File "<string>", line 14, in <module>
      UnicodeDecodeError: 'gbk' codec can't decode byte 0xa4 in position 469: illegal multibyte sequence
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

Out of Memory

Loading CLIP model...
ViT-L-14_openai_artists.safetensors: 100%|████████████████████| 16.2M/16.2M [00:00<00:00, 39.6MB/s]
ViT-L-14_openai_flavors.safetensors: 100%|████████████████████| 155M/155M [00:02<00:00, 55.0MB/s]
ViT-L-14_openai_mediums.safetensors: 100%|████████████████████| 146k/146k [00:00<00:00, 3.55MB/s]
ViT-L-14_openai_movements.safetensors: 100%|████████████████████| 307k/307k [00:00<00:00, 2.69MB/s]
ViT-L-14_openai_trendings.safetensors: 100%|████████████████████| 111k/111k [00:00<00:00, 2.53MB/s]
ViT-L-14_openai_negative.safetensors: 100%|████████████████████| 63.2k/63.2k [00:00<00:00, 1.55MB/s]
Loaded CLIP model and data in 11.78 seconds.
CUDA out of memory. Tried to allocate 224.00 MiB (GPU 0; 6.00 GiB total capacity; 4.05 GiB already allocated; 0 bytes free; 4.12 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Ran out of VRAM!

Maybe you could add some more buttons to the UI and run this stuff in stages/passes so you can get all the data without loading all these models into memory at once. Maybe just an "I have no VRAM and I must scream" button.

If file x.txt exists, append a new line
Something like that?
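
If the request is to append to an existing caption file instead of overwriting it, a minimal sketch in plain Python (the file name and output string are placeholders):

import os

output = "a generated prompt"  # placeholder for the result of ci.interrogate(image)
path = "x.txt"
if os.path.exists(path):
    with open(path, "a", encoding="utf-8") as f:
        f.write("\n" + output)  # append the new prompt on its own line
else:
    with open(path, "w", encoding="utf-8") as f:
        f.write(output)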

What is the source of the data?

Thank you for the great work!
I was wondering where you curated the different lists in the data folder ("flavors.txt", "mediums.txt", etc.).
Is the data from a prompt dataset?
What rules did you use to extract and classify them?
Thanks in advance!

Format for image input?

Trying to use this API in Node via Replicate and having trouble getting the image into a file format that the API will accept. I have tried a number of things: a data URL (which worked in a similar situation with an audio-analysis API from Replicate), creating a readable stream as in the OpenAI docs, and using sharp to flatten the image and mimic what was done in the Python example from the API docs on Replicate.

Any help appreciated!
