victorchall / everydream
Advanced fine tuning tools for vision models
License: GNU Affero General Public License v3.0
When trying to run auto caption, the script fails with:
Windows detected, using asyncio.WindowsSelectorEventLoopPolicy
starting
input_dir: input
Downloading model to .cache/model_base_caption_capfilt_large.pth... please wait
Model cached to: .cache/model_base_caption_capfilt_large.pth
Downloading (…)solve/main/vocab.txt: 100%|██████████████████████████████| 232k/232k [00:00<00:00, 6.17MB/s]
Downloading (…)okenizer_config.json: 100%|██████████████████████████████| 28.0/28.0 [00:00<00:00, 14.0kB/s]
Downloading (…)lve/main/config.json: 100%|█████████████████████████████████| 570/570 [00:00<00:00, 228kB/s]
load checkpoint from .cache/model_base_caption_capfilt_large.pth
loading model to cuda
working image: input\00012-1722407061-gigapixel-standard-height-1024px.jpg
Traceback (most recent call last):
File ".\scripts\auto_caption.py", line 217, in <module>
asyncio.run(main(opt))
File "C:\Users\ssuuk\anaconda3\envs\dl\lib\asyncio\runners.py", line 43, in run
return loop.run_until_complete(main)
File "C:\Users\ssuuk\anaconda3\envs\dl\lib\asyncio\base_events.py", line 608, in run_until_complete
return future.result()
File ".\scripts\auto_caption.py", line 157, in main
captions = blip_decoder.generate(image, sample=sample, num_beams=16, min_length=opt.min_length, \
File "scripts/BLIP\models\blip.py", line 156, in generate
outputs = self.text_decoder.generate(input_ids=input_ids,
File "C:\Users\ssuuk\anaconda3\envs\dl\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users\ssuuk\anaconda3\envs\dl\lib\site-packages\transformers\generation\utils.py", line 1524, in generate
return self.beam_search(
File "C:\Users\ssuuk\anaconda3\envs\dl\lib\site-packages\transformers\generation\utils.py", line 2810, in beam_search
outputs = self(
File "C:\Users\ssuuk\anaconda3\envs\dl\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "scripts/BLIP\models\med.py", line 886, in forward
outputs = self.bert(
File "C:\Users\ssuuk\anaconda3\envs\dl\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "scripts/BLIP\models\med.py", line 781, in forward
encoder_outputs = self.encoder(
File "C:\Users\ssuuk\anaconda3\envs\dl\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "scripts/BLIP\models\med.py", line 445, in forward
layer_outputs = layer_module(
File "C:\Users\ssuuk\anaconda3\envs\dl\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "scripts/BLIP\models\med.py", line 361, in forward
cross_attention_outputs = self.crossattention(
File "C:\Users\ssuuk\anaconda3\envs\dl\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "scripts/BLIP\models\med.py", line 277, in forward
self_outputs = self.self(
File "C:\Users\ssuuk\anaconda3\envs\dl\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "scripts/BLIP\models\med.py", line 178, in forward
attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: The size of tensor a (16) must match the size of tensor b (256) at non-singleton dimension 0
(dl) PS D:\Projekty\EveryDream>
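The failing `torch.matmul` computes attention scores, and the mismatched batch dimensions (16 vs. 256) suggest the beam-expanded decoder tensors and the encoder hidden states disagree about batch size. This kind of mismatch is often caused by running the vendored BLIP code against a newer transformers release than it was written for, which expands beam-search inputs differently; pinning an older transformers version is a common workaround. As a hedged sketch of the shape bookkeeping involved (the shapes and tensor names below are illustrative, not BLIP's actual ones):

```python
import torch

num_beams = 16
# hypothetical shapes (batch, heads, seq_len, head_dim); not BLIP's real sizes
query = torch.randn(1, 12, 5, 64)   # decoder queries for one image
key = torch.randn(1, 12, 30, 64)    # encoder keys for the same image

# beam search expands the decoder side once per beam
query = query.repeat_interleave(num_beams, dim=0)  # batch becomes 16

# if the encoder states are not expanded the same way, the batch dimensions
# of the two matmul operands disagree and torch raises the "size of tensor a
# must match the size of tensor b" error seen above (unless one side is 1
# and broadcasting applies)
key = key.repeat_interleave(num_beams, dim=0)      # keep them in sync
scores = torch.matmul(query, key.transpose(-1, -2))
```
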
It's not very clear from the documentation whether this is possible yet in this project's code, unless I'm missing something major.
I would like to resize my training images so that their smallest dimension is, for example, 512px, with the larger dimension being whatever it ends up as. Then, during training, a random 512x512 section is cropped from the 512-by-whatever image. The caption for the full image would remain attached to the crop made from it.
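The resize-then-random-crop idea described above can be sketched with Pillow; this is a minimal illustration, and the function names and parameters are my own, not anything from EveryDream:

```python
import random
from PIL import Image

def resize_shortest_side(img: Image.Image, target: int = 512) -> Image.Image:
    # scale so the smallest dimension equals `target`, preserving aspect ratio
    w, h = img.size
    scale = target / min(w, h)
    return img.resize((round(w * scale), round(h * scale)), Image.LANCZOS)

def random_square_crop(img: Image.Image, size: int = 512) -> Image.Image:
    # pick a random size x size window; the full image's caption stays
    # attached to whatever crop this produces
    w, h = img.size
    left = random.randint(0, w - size)
    top = random.randint(0, h - size)
    return img.crop((left, top, left + size, top + size))
```
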
I am running /EveryDream/scripts/auto_caption.py --torch_device cpu
and it throws RuntimeError: The size of tensor a (16) must match the size of tensor b (256) at non-singleton dimension 0
I am not exactly sure what this means or how I could resolve it.
Note, I have no GPU; for this experiment I intend to caption just a single image on my CPU (a 2.6 GHz 6-core Intel Core i7 with 16 GB of 2667 MHz DDR4).
Add support for training Stable Diffusion 2.0-related models.
Models can be found in this repo: https://github.com/Stability-AI/stablediffusion
...and get rid of globals. Should probably split things up into classes.
Do something to throttle downloads per domain; we may hit rate limits on the other end. Maybe shuffle matches, or get fancy and track requests per domain?
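One way to throttle per domain, sketched here as an assumption rather than anything in the script, is a semaphore keyed by hostname; the concurrency limit of 2 below is an arbitrary placeholder:

```python
import asyncio
from collections import defaultdict
from urllib.parse import urlparse

# one semaphore per domain caps concurrent requests to that host;
# the limit of 2 is an arbitrary assumption, not a value from the script
domain_limits = defaultdict(lambda: asyncio.Semaphore(2))

async def fetch(url: str) -> str:
    host = urlparse(url).netloc
    async with domain_limits[host]:
        await asyncio.sleep(0)  # placeholder for the actual download
        return host
```

Shuffling the match list before downloading would spread requests across domains and make the per-domain caps bite less often.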
--split n
so you can do crazy numbers
Ex: --split 1000 will create an out_dir/n subfolder for every 1000 images downloaded.
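The --split behavior described above amounts to integer division of the running image count; a minimal sketch (the helper names are hypothetical, not from the script):

```python
import os

def subfolder_for(index: int, split: int) -> str:
    # images 0..split-1 land in "0", the next `split` images in "1", and so on
    return str(index // split)

def out_path(out_dir: str, index: int, split: int, filename: str) -> str:
    # e.g. with --split 1000, the 2500th image goes under out_dir/2/
    return os.path.join(out_dir, subfolder_for(index, split), filename)
```
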
filename cleaning is sloppy
Code that will limit extreme aspect ratios, e.g. --aspect_max 1.6 would skip 2:1 images, as they may be difficult to crop.
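The aspect-ratio filter above reduces to comparing the longer side over the shorter side against the cap; a hedged sketch, with a function name of my own invention:

```python
def too_extreme(width: int, height: int, aspect_max: float = 1.6) -> bool:
    # ratio of the longer side to the shorter side; skip the image if it
    # exceeds the cap (a 2:1 image has ratio 2.0 > 1.6, so it is skipped)
    return max(width, height) / min(width, height) > aspect_max
```
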
Hi, this is the first time I've seen something with .parquet.
I cloned https://huggingface.co/datasets/laion/laion2B-en-aesthetic and manually copied 127 parquet files into the laion folder, then ran download_laion.py and got this result. I need help.
(venv) root@n8f6ytisqn:/notebooks/EveryDream# python scripts/download_laion.py --search_text "a man" --limit 50
Launching...
is running in venv: True
{Fore.CYAN}Unix detected, using default asyncio event loop policy{Style.RESET_ALL}
Searching for a man in column: TEXT in ./laion//*.parquet
reading file: ./laion/part-00051-9230b837-b1e0-4254-8b88-ed2976e9cee9-c000.snappy.parquet
Traceback (most recent call last):
File "/notebooks/EveryDream/scripts/download_laion.py", line 322, in <module>
result = asyncio.run(download_laion_matches(opt))
File "/usr/lib/python3.9/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
return future.result()
File "/notebooks/EveryDream/scripts/download_laion.py", line 274, in download_laion_matches
df = pd.read_parquet(file, engine="auto")
File "/notebooks/EveryDream/venv/lib/python3.9/site-packages/pandas/io/parquet.py", line 503, in read_parquet
return impl.read(
File "/notebooks/EveryDream/venv/lib/python3.9/site-packages/pandas/io/parquet.py", line 251, in read
result = self.api.parquet.read_table(
File "/notebooks/EveryDream/venv/lib/python3.9/site-packages/pyarrow/parquet/__init__.py", line 2780, in read_table
dataset = _ParquetDatasetV2(
File "/notebooks/EveryDream/venv/lib/python3.9/site-packages/pyarrow/parquet/__init__.py", line 2368, in __init__
[fragment], schema=schema or fragment.physical_schema,
File "pyarrow/_dataset.pyx", line 898, in pyarrow._dataset.Fragment.physical_schema.__get__
File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Could not open Parquet input source '<Buffer>': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
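"Parquet magic bytes not found in footer" usually means the files on disk are not actual parquet data. One common cause when cloning a Hugging Face dataset is a missing git-lfs install: the clone then contains small text pointer files in place of the real data. A quick way to check, sketched with a function name of my own:

```python
def looks_like_parquet(path: str) -> bool:
    # a real parquet file begins and ends with the 4-byte magic "PAR1";
    # a git-lfs pointer file is plain text and fails this check
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, 2)  # jump to the last 4 bytes
        tail = f.read(4)
    return head == b"PAR1" and tail == b"PAR1"
```

If the check fails, re-downloading the files through git-lfs (or directly from the dataset's web page) should fix the error.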
code should probably be using coroutines