Comments (11)
@leejet Is it just my impression, or is the CUDA backend experiencing synchronization issues as early as the CLIP model? It only happens some of the time.
build\bin\Release\sd -m models/kotosmix_v10-f16.gguf -p "beautiful anime girl, white hair, blue eyes, realistic, masterpiece, azur lane, 4k, high quality" -n "bad quality, ugly, face malformed, bad anatomy" --sampling-method dpm++2m --steps 20 -s 424354
with cpu backend (and cuda backend sometimes):
The image is incorrect, possibly because an incorrect (incomplete) embedding is generated; I don't really know. The negative embedding is invalid.
Investigating this synchronization issue is very challenging; it tends to occur sporadically, and replicating it isn't easy. I tried printing the output tensor of the clip, and after 10 repetitions, I identified a change in the values of the embedding.
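One way to make a sporadic change like this easier to catch is to dump the CLIP output on every run and compare the dumps automatically instead of by eye. The following is a minimal sketch, assuming each run's embedding has already been collected as a flat list of floats; `first_divergent_run` and its tolerance parameters are hypothetical helpers for illustration, not part of stable-diffusion.cpp.

```python
import math

def first_divergent_run(runs, rel_tol=0.0, abs_tol=0.0):
    """Return (run_index, element_index) of the first value that differs
    from run 0, or None if every run matches run 0.

    With the default tolerances of 0.0 the comparison is exact.
    NaN never compares as close, so a NaN in any position is reported.
    """
    reference = runs[0]
    for run_idx, values in enumerate(runs[1:], start=1):
        if len(values) != len(reference):
            # A truncated (incomplete) embedding counts as a divergence.
            return (run_idx, min(len(values), len(reference)))
        for elem_idx, (a, b) in enumerate(zip(reference, values)):
            if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
                return (run_idx, elem_idx)
    return None

# Hypothetical data: three runs, the third one drifts at element 1.
runs = [
    [0.1, 0.2, 0.3],
    [0.1, 0.2, 0.3],
    [0.1, 0.25, 0.3],
]
print(first_divergent_run(runs))
```

Exact comparison is the right default when the seed and inputs are fixed; a small `abs_tol` only makes sense if the backend is known to reorder floating-point reductions between runs.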
@FSSRepo Please try colab
I think I found the real cause. The earlier problem occurs because there is a read/write race in soft_max_f32. For details, see the racecheck documentation: https://docs.nvidia.com/compute-sanitizer/ComputeSanitizer/index.html#racecheck-tool
========= COMPUTE-SANITIZER
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: Tesla T4, compute capability 7.5
[INFO] stable-diffusion.cpp:5386 - loading model from 'v1-5-pruned-emaonly.safetensors'
[INFO] model.cpp:638 - load v1-5-pruned-emaonly.safetensors using safetensors format
[INFO] stable-diffusion.cpp:5412 - Stable Diffusion 1.x
[INFO] stable-diffusion.cpp:5418 - Stable Diffusion weight type: f32
[INFO] stable-diffusion.cpp:5573 - total memory buffer size = 2731.37MB (clip 470.66MB, unet 2165.24MB, vae 95.47MB)
[INFO] stable-diffusion.cpp:5579 - loading model from 'v1-5-pruned-emaonly.safetensors' completed, taking 2.45s
[INFO] stable-diffusion.cpp:5593 - running in eps-prediction mode
[INFO] stable-diffusion.cpp:6486 - apply_loras completed, taking 0.00s
========= Error: Race reported between Read access at 0xbd0 in soft_max_f32(const float *, const float *, float *, int, int, float)
========= and Write access at 0x1d60 in soft_max_f32(const float *, const float *, float *, int, int, float) [80384 hazards]
=========
========= Error: Race reported between Read access at 0xbd0 in soft_max_f32(const float *, const float *, float *, int, int, float)
========= and Write access at 0x1d60 in soft_max_f32(const float *, const float *, float *, int, int, float) [79488 hazards]
=========
========= Error: Race reported between Read access at 0xbd0 in soft_max_f32(const float *, const float *, float *, int, int, float)
========= and Write access at 0x1d60 in soft_max_f32(const float *, const float *, float *, int, int, float) [77952 hazards]
=========
========= Error: Race reported between Read access at 0xbd0 in soft_max_f32(const float *, const float *, float *, int, int, float)
========= and Write access at 0x1d60 in soft_max_f32(const float *, const float *, float *, int, int, float) [75264 hazards]
=========
========= Error: Race reported between Read access at 0xbd0 in soft_max_f32(const float *, const float *, float *, int, int, float)
========= and Write access at 0x1d60 in soft_max_f32(const float *, const float *, float *, int, int, float) [81408 hazards]
=========
========= Error: Race reported between Read access at 0xbd0 in soft_max_f32(const float *, const float *, float *, int, int, float)
========= and Write access at 0x1d60 in soft_max_f32(const float *, const float *, float *, int, int, float) [79360 hazards]
=========
========= Error: Race reported between Read access at 0xbd0 in soft_max_f32(const float *, const float *, float *, int, int, float)
========= and Write access at 0x1d60 in soft_max_f32(const float *, const float *, float *, int, int, float) [80768 hazards]
=========
========= Error: Race reported between Read access at 0xbd0 in soft_max_f32(const float *, const float *, float *, int, int, float)
========= and Write access at 0x1d60 in soft_max_f32(const float *, const float *, float *, int, int, float) [80384 hazards]
=========
========= Error: Race reported between Read access at 0xbd0 in soft_max_f32(const float *, const float *, float *, int, int, float)
========= and Write access at 0x1d60 in soft_max_f32(const float *, const float *, float *, int, int, float) [78976 hazards]
=========
========= Error: Race reported between Read access at 0xbd0 in soft_max_f32(const float *, const float *, float *, int, int, float)
========= and Write access at 0x1d60 in soft_max_f32(const float *, const float *, float *, int, int, float) [78080 hazards]
=========
========= Error: Race reported between Read access at 0xbd0 in soft_max_f32(const float *, const float *, float *, int, int, float)
========= and Write access at 0x1d60 in soft_max_f32(const float *, const float *, float *, int, int, float) [79104 hazards]
=========
========= Error: Race reported between Read access at 0xbd0 in soft_max_f32(const float *, const float *, float *, int, int, float)
========= and Write access at 0x1d60 in soft_max_f32(const float *, const float *, float *, int, int, float) [78080 hazards]
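Because racecheck prints one report per hazard group, a long log like the one above is easier to read when it is totalled per kernel. This is a rough sketch, assuming the two-line report layout shown in the log (a "Race reported between ..." line followed by a line ending in "[N hazards]"); `summarize_racecheck` is a hypothetical helper, and the pattern only handles plain kernel names without namespaces or template arguments.

```python
import re
from collections import Counter

# First line of a report, e.g.
# "========= Error: Race reported between Read access at 0xbd0 in soft_max_f32(...)"
RACE_RE = re.compile(r"Race reported between \w+ access at 0x[0-9a-fA-F]+ in (\w+)\(")
# Hazard count that closes a report, e.g. "[80384 hazards]"
HAZARD_RE = re.compile(r"\[(\d+) hazards?\]")

def summarize_racecheck(log_text):
    """Sum the reported hazard counts per kernel name."""
    totals = Counter()
    kernel = None
    for line in log_text.splitlines():
        match = RACE_RE.search(line)
        if match:
            kernel = match.group(1)  # remember which kernel this report is about
            continue
        match = HAZARD_RE.search(line)
        if match and kernel is not None:
            totals[kernel] += int(match.group(1))
            kernel = None  # report finished
    return dict(totals)

# The first two reports from the log above.
sample = """\
========= Error: Race reported between Read access at 0xbd0 in soft_max_f32(const float *, const float *, float *, int, int, float)
=========     and Write access at 0x1d60 in soft_max_f32(const float *, const float *, float *, int, int, float) [80384 hazards]
=========
========= Error: Race reported between Read access at 0xbd0 in soft_max_f32(const float *, const float *, float *, int, int, float)
=========     and Write access at 0x1d60 in soft_max_f32(const float *, const float *, float *, int, int, float) [79488 hazards]
"""
print(summarize_racecheck(sample))
```

In this log every report names the same pair of instruction offsets in soft_max_f32, so the summary collapses to a single entry.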
from stable-diffusion.cpp.
@leejet To fix the softmax race condition in CUDA, comment out line 6499 (shown below); this may also resolve the artifacts seen when using VAE tiling:
while (nth < ncols_x && nth < CUDA_SOFT_MAX_BLOCK_SIZE) nth *= 2; // comment this line
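To see what commenting out that line changes, the loop can be mirrored outside CUDA. This is only a sketch: the starting value of `nth` and the value of CUDA_SOFT_MAX_BLOCK_SIZE are not given in this thread, so the defaults `start=32` and `cap=1024` below are assumptions chosen for illustration, and `soft_max_block_size` is a hypothetical name.

```python
def soft_max_block_size(ncols, start=32, cap=1024, grow=True):
    """Mirror of the thread-count selection around the quoted line.

    start: assumed initial thread count (not stated in the thread)
    cap:   assumed value of CUDA_SOFT_MAX_BLOCK_SIZE (not stated in the thread)
    grow:  False models the suggestion to comment the loop out
    """
    nth = start
    if grow:
        # Double until the block covers the row or reaches the cap.
        while nth < ncols and nth < cap:
            nth *= 2
    return nth

for ncols in (20, 77, 4096):
    print(ncols, soft_max_block_size(ncols), soft_max_block_size(ncols, grow=False))
```

With the loop disabled, every row is processed by a block of `start` threads regardless of row length. If `start` equals the warp size, each block is a single warp, which would presumably sidestep the shared-memory exchange between warps that racecheck monitors; that reading is an inference, not something confirmed in this thread, and it trades parallelism per row for the workaround.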
from stable-diffusion.cpp.
On Google Colab (T4, CUDA), img2img with the VAE and without --vae-tiling always produces a solid-color image.
Update: I am also seeing this for txt2img on the T4 with CUDA.
!git clone https://github.com/leejet/stable-diffusion.cpp
%cd stable-diffusion.cpp
!git submodule update --init
!cmake -B build -DSD_CUBLAS=ON && cmake --build build --config Release
!mkdir output models
# https://civitai.com/models/133005?modelVersionId=240840
!wget "https://civitai.com/api/download/models/240840?type=Model&format=SafeTensor&size=full&fp=fp16" -O models/juggernautXL_v7Rundiffusion.safetensors
!wget "https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/resolve/main/sdxl_vae.safetensors?download=true" -O models/sdxl_vae-fp16-fix.safetensors
!wget "https://huggingface.co/madebyollin/taesdxl/resolve/main/diffusion_pytorch_model.safetensors?download=true" -O "models/diffusion_pytorch_model.safetensors"
!clear
!wget "https://upload.wikimedia.org/wikipedia/commons/thumb/c/ce/Improbable_neon_-_--_-_-generative_-code_-processing_-geometry_-algorithmicart_-xuxoe_-procedural_-everyday_-computerart_-3d_-daily_-improbable_-streamofconsciousness_-bnw_-surreal_-abstract_-colors_-blackandwhite_%2826697665157%29.jpg/640px-thumbnail.jpg" -O "output/blob.png"
!./build/bin/sd -m models/juggernautXL_v7Rundiffusion.safetensors \
--vae models/sdxl_vae-fp16-fix.safetensors \
-p "((a lovely cat)), Gorgeous, Magnetic, a palette of warm and vivid colors, Cozy Pastels" \
-n "3d render, 3dcg, abhorrent, abominable, anatomical nonsense, asymmetrical, awful, awkward, b&w" \
-s 77 --sampling-method euler_a --cfg-scale 6.1 --steps 16 \
-o output/tiling.png -M img2img -i output/blob.png --strength 0.77 --vae-tiling
!./build/bin/sd -m models/juggernautXL_v7Rundiffusion.safetensors \
--vae models/sdxl_vae-fp16-fix.safetensors \
-p "((a lovely cat)), Gorgeous, Magnetic, a palette of warm and vivid colors, Cozy Pastels" \
-n "3d render, 3dcg, abhorrent, abominable, anatomical nonsense, asymmetrical, awful, awkward, b&w" \
-s 77 --sampling-method euler_a --cfg-scale 6.1 --steps 16 \
-o output/vae.png -M img2img -i output/blob.png --strength 0.77
!./build/bin/sd -m models/juggernautXL_v7Rundiffusion.safetensors \
--taesd models/diffusion_pytorch_model.safetensors \
-p "((a lovely cat)), Gorgeous, Magnetic, a palette of warm and vivid colors, Cozy Pastels" \
-n "3d render, 3dcg, abhorrent, abominable, anatomical nonsense, asymmetrical, awful, awkward, b&w" \
-s 77 --sampling-method euler_a --cfg-scale 6.1 --steps 16 \
-o output/taesd.png -M img2img -i output/blob.png --strength 0.77
from IPython.display import Image, display
display(Image(filename='output/tiling.png'))
display(Image(filename='output/vae.png'))
display(Image(filename='output/taesd.png'))
from stable-diffusion.cpp.
@Cyberhan123 Could you send me the CLI commands to perform this test? Your link is not allowing me to access Colab.
from stable-diffusion.cpp.
I modified the link and the command is as follows
!rm -r stable-diffusion.cpp
!git clone --recursive https://github.com/leejet/stable-diffusion.cpp.git
!mkdir stable-diffusion.cpp/build
!echo "target_compile_options(ggml PRIVATE \$<\$<COMPILE_LANGUAGE:CUDA>:-lineinfo>)" >> stable-diffusion.cpp/CMakeLists.txt
!cmake -S stable-diffusion.cpp -B stable-diffusion.cpp/build -DSD_CUBLAS=ON
!cmake --build stable-diffusion.cpp/build --config Release
!curl -L -O https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors
# !curl -L -O https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.2.4/RealESRGAN_x4plus_anime_6B.pth
# !mv 223670 model.safetensors
# !stable-diffusion.cpp/build/bin/sd -m v1-5-pruned-emaonly.safetensors -p "a lovely cat" --upscale-model RealESRGAN_x4plus_anime_6B.pth
!compute-sanitizer --tool racecheck stable-diffusion.cpp/build/bin/sd -m v1-5-pruned-emaonly.safetensors -p "a lovely cat"
from stable-diffusion.cpp.
@leejet I've been testing SDXL rendering and found some issues:
- For 1024x1024 pictures (or anything above 512), some undecoded latent is visible at the bottom of the image.
- There seems to be a problem with prompting for SDXL. Under the same conditions and seed, the image should be deterministic, but I get variations in stable-diffusion.cpp that I do not get in other SD apps such as InvokeAI (using the same conditions). For example, this is an image we should be able to reproduce in SD.cpp.
However, when I use the same metadata in SD.cpp, I get this instead...
SDXL does have two text encoders; I'm not sure whether SD.cpp handles this.
(NOTE: as a test of deterministic image generation, I ran SD.cpp with SD1.5.) Here is the example SD1.5 image:
And this was reproduced in SD.cpp using the same metadata.
from stable-diffusion.cpp.
I'm getting the same horrible results while using SD-Turbo and SDXL-Turbo.
from stable-diffusion.cpp.
- It seems to be an issue with ggml's CUDA backend synchronization. Do you still encounter this problem when using the latest code?
- The 'Model: Stable Diffusion XL 1.0 (1024)' model doesn't seem to accommodate 512x512 images well. It's better to set the image generation size to 1024x1024. I think there might be an issue with the parameters for generating the displayed images on the webpage. I've generated normal images using the following parameters, consistent with sd-webui.
.\bin\Release\sd.exe -m ..\..\stable-diffusion-webui\models\Stable-diffusion\sd_xl_base_1.0.safetensors --vae ..\..\stable-diffusion-webui\models\VAE\sdxl_vae-fp16-fix.safetensors -p "Marilyn Monroe in the 21st century A stylish woman with a fashionable outfit, pretty makeup, facial closeup" -v --steps 25 -H 1024 -W 1024
from stable-diffusion.cpp.
@leejet I think the parameters @YAY-3M-TA3 Y set are wrong. He may have set CFG Scale to 7.0
from stable-diffusion.cpp.
For the SDXL base model, setting the CFG scale to 7 should be fine. In my example above, the CFG scale is also 7 (the default value).
from stable-diffusion.cpp.