
everydream's People

Contributors

haxys, lehmacdj, mstevenson, nawnie, undersampled, victorchall

everydream's Issues

Running auto_caption fails with a tensor size mismatch.

When trying to run auto_caption, the script fails with:

```
Windows detected, using asyncio.WindowsSelectorEventLoopPolicy
starting
input_dir:  input
Downloading model to .cache/model_base_caption_capfilt_large.pth... please wait
Model cached to: .cache/model_base_caption_capfilt_large.pth
Downloading (…)solve/main/vocab.txt: 100%|██████████████████████████████| 232k/232k [00:00<00:00, 6.17MB/s]
Downloading (…)okenizer_config.json: 100%|██████████████████████████████| 28.0/28.0 [00:00<00:00, 14.0kB/s]
Downloading (…)lve/main/config.json: 100%|█████████████████████████████████| 570/570 [00:00<00:00, 228kB/s]
load checkpoint from .cache/model_base_caption_capfilt_large.pth
loading model to cuda
working image:  input\00012-1722407061-gigapixel-standard-height-1024px.jpg
Traceback (most recent call last):
  File ".\scripts\auto_caption.py", line 217, in <module>
    asyncio.run(main(opt))
  File "C:\Users\ssuuk\anaconda3\envs\dl\lib\asyncio\runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "C:\Users\ssuuk\anaconda3\envs\dl\lib\asyncio\base_events.py", line 608, in run_until_complete
    return future.result()
  File ".\scripts\auto_caption.py", line 157, in main
    captions = blip_decoder.generate(image, sample=sample, num_beams=16, min_length=opt.min_length, \
  File "scripts/BLIP\models\blip.py", line 156, in generate
    outputs = self.text_decoder.generate(input_ids=input_ids,
  File "C:\Users\ssuuk\anaconda3\envs\dl\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\ssuuk\anaconda3\envs\dl\lib\site-packages\transformers\generation\utils.py", line 1524, in generate
    return self.beam_search(
  File "C:\Users\ssuuk\anaconda3\envs\dl\lib\site-packages\transformers\generation\utils.py", line 2810, in beam_search
    outputs = self(
  File "C:\Users\ssuuk\anaconda3\envs\dl\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "scripts/BLIP\models\med.py", line 886, in forward
    outputs = self.bert(
  File "C:\Users\ssuuk\anaconda3\envs\dl\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "scripts/BLIP\models\med.py", line 781, in forward
    encoder_outputs = self.encoder(
  File "C:\Users\ssuuk\anaconda3\envs\dl\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "scripts/BLIP\models\med.py", line 445, in forward
    layer_outputs = layer_module(
  File "C:\Users\ssuuk\anaconda3\envs\dl\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "scripts/BLIP\models\med.py", line 361, in forward
    cross_attention_outputs = self.crossattention(
  File "C:\Users\ssuuk\anaconda3\envs\dl\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "scripts/BLIP\models\med.py", line 277, in forward
    self_outputs = self.self(
  File "C:\Users\ssuuk\anaconda3\envs\dl\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "scripts/BLIP\models\med.py", line 178, in forward
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: The size of tensor a (16) must match the size of tensor b (256) at non-singleton dimension 0
(dl) PS D:\Projekty\EveryDream>
```
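
One plausible explanation (an assumption, not a confirmed diagnosis) is that the image embeddings are expanded per beam twice: once by the bundled BLIP code and once again inside newer transformers releases, which matches the numbers here (16 beams, and 256 = 16 × 16). A toy sketch of the shape arithmetic:

```python
# Toy sketch of the suspected double expansion (hypothetical shapes).
import torch

num_beams = 16
image_embeds = torch.randn(1, 577, 768)        # one image's encoder output
text_states = torch.randn(num_beams, 20, 768)  # text states, expanded per beam

# BLIP's generate() repeats the encoder states once per beam...
encoder_states = image_embeds.repeat_interleave(num_beams, dim=0)  # (16, 577, 768)
# ...and if a newer transformers release expands them again inside generate(),
# the cross-attention keys end up at 256 rows against 16 query rows:
double_expanded = encoder_states.repeat_interleave(num_beams, dim=0)
print(double_expanded.shape[0], "vs", text_states.shape[0])  # 256 vs 16
```

If that is the cause, pinning transformers to the older release the bundled BLIP code was written against is the usual workaround.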

Is it possible to resize the smallest dimension to the target size, and then randomly crop to the final size during training?

It's not clear from the documentation whether this is possible yet in this project's code, unless I'm missing something major.

I would like to resize my training images so that their smallest dimension is, for example, 512px, with the larger dimension left at whatever size results. Then, during training, a random 512x512 section is cropped from the 512-by-whatever image. The caption for the full image would remain attached to the crop made from it.
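
A minimal sketch of the requested pipeline with torchvision (hypothetical, not existing EveryDream code): Resize with an integer argument scales the shortest side, and RandomCrop then takes a random square window.

```python
# Sketch: resize shortest side to 512, then random-crop 512x512 at train time.
import torchvision.transforms as T

TARGET = 512
train_transform = T.Compose([
    T.Resize(TARGET),      # scales so the SHORTEST side becomes 512px
    T.RandomCrop(TARGET),  # random 512x512 window along the longer side
    T.ToTensor(),
])
# The caption stays attached to the image record, so every random crop
# drawn during training reuses the full image's caption.
```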

Running auto_caption with --torch_device cpu results in a fatal error

I am running /EveryDream/scripts/auto_caption.py --torch_device cpu and it throws `RuntimeError: The size of tensor a (16) must match the size of tensor b (256) at non-singleton dimension 0`

I am not exactly sure what this means or how to resolve it.
Note that I have no GPU; for this experiment I intend to caption just a single image on my CPU (a 2.6 GHz 6-core Intel Core i7 with 16 GB of 2667 MHz DDR4).

refactor to OO

...and get rid of globals. The code should probably be split up into classes.
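
A minimal sketch of the direction (all names hypothetical): former globals become attributes on an instance that owns the state.

```python
# Sketch: fold module-level globals into a class (hypothetical names).
class LaionDownloader:
    def __init__(self, search_text: str, limit: int, out_dir: str = "./laion"):
        self.search_text = search_text  # was a global
        self.limit = limit              # was a global
        self.out_dir = out_dir          # was a global
        self.downloaded = 0             # mutable state, now per-instance

    def matches(self, df):
        """Filter a dataframe to rows whose TEXT column contains the query."""
        return df[df["TEXT"].str.contains(self.search_text, case=False, na=False)]
```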

domain conn limit

Do something to throttle downloads per domain; the remote hosts may impose rate limits. Options: shuffle the matches, or get fancier and track per-domain request counts.
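
One way to do the throttling (a sketch, assuming downloads go through aiohttp): keep one semaphore per domain so no single host sees more than a couple of concurrent requests.

```python
# Sketch: cap concurrent requests per domain with one semaphore per netloc.
import asyncio
from collections import defaultdict
from urllib.parse import urlparse

PER_DOMAIN_LIMIT = 2
domain_locks = defaultdict(lambda: asyncio.Semaphore(PER_DOMAIN_LIMIT))

async def fetch(session, url: str) -> bytes:
    domain = urlparse(url).netloc
    async with domain_locks[domain]:      # at most 2 in flight per domain
        async with session.get(url) as resp:
            return await resp.read()
```

Shuffling the matches before downloading would additionally spread requests across domains instead of hitting one host in a burst.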

aspect ratio limit

Add code that limits extreme aspect ratios; e.g., --aspect_max 1.6 would skip 2:1 images, as they may be difficult to crop.
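
The check itself is a one-liner (a sketch; the flag wiring is hypothetical):

```python
# Sketch: skip images whose aspect ratio exceeds --aspect_max.
def within_aspect_limit(width: int, height: int, aspect_max: float = 1.6) -> bool:
    ratio = max(width, height) / min(width, height)
    return ratio <= aspect_max

print(within_aspect_limit(1024, 512))  # False: 2:1 exceeds 1.6
print(within_aspect_limit(768, 512))   # True: 1.5:1 is within the limit
```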

Error (solved)

Hi, this is the first time I've seen anything with .parquet.
I cloned https://huggingface.co/datasets/laion/laion2B-en-aesthetic, manually copied 127 parquet files into the laion folder, then ran download_laion.py, and got this result. I need help.

```
(venv) root@n8f6ytisqn:/notebooks/EveryDream# python scripts/download_laion.py --search_text "a man" --limit 50
Launching...
is running in venv: True
{Fore.CYAN}Unix detected, using default asyncio event loop policy{Style.RESET_ALL}
  Searching for a man in column: TEXT in ./laion//*.parquet
  reading file: ./laion/part-00051-9230b837-b1e0-4254-8b88-ed2976e9cee9-c000.snappy.parquet
Traceback (most recent call last):
  File "/notebooks/EveryDream/scripts/download_laion.py", line 322, in <module>
    result = asyncio.run(download_laion_matches(opt))
  File "/usr/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/lib/python3.9/asyncio/base_events.py", line 647, in run_until_complete
    return future.result()
  File "/notebooks/EveryDream/scripts/download_laion.py", line 274, in download_laion_matches
    df = pd.read_parquet(file, engine="auto")
  File "/notebooks/EveryDream/venv/lib/python3.9/site-packages/pandas/io/parquet.py", line 503, in read_parquet
    return impl.read(
  File "/notebooks/EveryDream/venv/lib/python3.9/site-packages/pandas/io/parquet.py", line 251, in read
    result = self.api.parquet.read_table(
  File "/notebooks/EveryDream/venv/lib/python3.9/site-packages/pyarrow/parquet/__init__.py", line 2780, in read_table
    dataset = _ParquetDatasetV2(
  File "/notebooks/EveryDream/venv/lib/python3.9/site-packages/pyarrow/parquet/__init__.py", line 2368, in __init__
    [fragment], schema=schema or fragment.physical_schema,
  File "pyarrow/_dataset.pyx", line 898, in pyarrow._dataset.Fragment.physical_schema.__get__
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Could not open Parquet input source '<Buffer>': Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
```
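
A common cause of "Parquet magic bytes not found" after cloning a Hugging Face dataset (an assumption here, but consistent with the message) is cloning without git-lfs installed, which leaves small LFS pointer stubs in place of the real parquet data. A quick sanity check:

```python
# Sketch: real parquet files start and end with the 4-byte magic "PAR1";
# git-lfs pointer stubs are tiny text files instead.
from pathlib import Path

def is_parquet(path: Path) -> bool:
    data = path.read_bytes()
    return len(data) > 8 and data[:4] == b"PAR1" and data[-4:] == b"PAR1"

for f in sorted(Path("./laion").glob("*.parquet")):
    if not is_parquet(f):
        print(f"not parquet (likely an LFS pointer stub): {f} ({f.stat().st_size} bytes)")
```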

coroutines

code should probably be using coroutines
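
A minimal sketch of what that could look like for the downloads (names hypothetical, assuming aiohttp):

```python
# Sketch: run per-URL downloads concurrently as coroutines.
import asyncio
import aiohttp

async def download_one(session: aiohttp.ClientSession, url: str) -> bytes:
    async with session.get(url) as resp:
        return await resp.read()

async def download_all(urls: list[str]) -> list[bytes]:
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(download_one(session, u) for u in urls))

# asyncio.run(download_all(["https://example.com/a.jpg"]))
```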
