leejet / stable-diffusion.cpp
Stable Diffusion in pure C/C++
License: MIT License
Over the last week, there has been a lot of talk about a new type of model, Latent Consistency Models, which significantly improves the performance of Stable Diffusion, generating images in far fewer steps.
It apparently works as a LoRA adapter that can be applied to any existing model. I'm not sure whether any specific changes to the UNet architecture are needed, but what does need to be done is adding a new sampler, the LCM solver.
After completing the CUDA acceleration support, which is almost finished, I will see if I can work on adding LoRA support. This will require a complete change in the current project structure. Following that, I'll add the new solver and conduct the necessary tests.
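For anyone curious what the new solver amounts to: here is a toy sketch based on my own reading of the LCM paper, not this repo's code or API. The model predicts a fully denoised image from the current latent, and the sampler re-noises that prediction down to the next noise level:

```cpp
#include <functional>
#include <random>
#include <vector>

// Toy LCM-style multistep sampling loop (hypothetical; names and the
// sigma schedule are stand-ins for the real UNet and scheduler).
// denoise(x, sigma) stands in for the consistency model's x0 prediction.
std::vector<float> lcm_sample(
    std::function<std::vector<float>(const std::vector<float>&, float)> denoise,
    std::vector<float> x, const std::vector<float>& sigmas, std::mt19937& rng) {
    std::normal_distribution<float> gauss(0.0f, 1.0f);
    for (size_t i = 0; i + 1 < sigmas.size(); i++) {
        std::vector<float> x0 = denoise(x, sigmas[i]);  // consistency prediction
        float sigma_next = sigmas[i + 1];
        for (size_t j = 0; j < x.size(); j++) {
            // Re-noise the prediction to the next (lower) noise level;
            // the final step (sigma 0) returns the prediction as-is.
            x[j] = x0[j] + (sigma_next > 0.0f ? sigma_next * gauss(rng) : 0.0f);
        }
    }
    return x;
}
```

The point is that each step is a full denoise-then-re-noise, which is why a handful of steps can suffice.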
I'm not aware of what changes were made to the architecture, but Stability did release a new model promising significant speed-ups in sampling time.
https://stability.ai/news/stability-ai-sdxl-turbo
https://huggingface.co/stabilityai/sdxl-turbo
https://stability.ai/research/adversarial-diffusion-distillation
I tried to compile your app using CMake but encountered an error. I'm a newbie and know nothing about C and C++ compilation. Could you make a binary release for Windows, please?
P.S.: CMake's error was "Invalid character escape '\M'".
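For what it's worth, that particular CMake error usually comes from a Windows backslash path being read as an escape sequence (e.g. `\M` in a path containing `\Models`). A hedged workaround (the paths here are made up; adjust to your setup) is to quote paths and use forward slashes, which CMake accepts on Windows:

```shell
# Hypothetical example paths -- forward slashes avoid '\M' being
# interpreted as a character escape by the CMake language.
cmake -B build -DCMAKE_BUILD_TYPE=Release "-DCMAKE_INSTALL_PREFIX=C:/Users/me/sd-install"
cmake --build build --config Release
```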
It's a really amazing project, but I think it would be great to have a Dockerfile for quickly testing the app. What do you think?
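For reference, a minimal sketch of such a Dockerfile (untested; the build flags and binary path are assumptions based on the README and the CLI examples in these issues):

```dockerfile
# Hypothetical sketch -- not part of the repo.
FROM ubuntu:22.04 AS build
RUN apt-get update && apt-get install -y build-essential cmake git
RUN git clone --recursive https://github.com/leejet/stable-diffusion.cpp /sd
WORKDIR /sd
RUN cmake -B build -DCMAKE_BUILD_TYPE=Release && cmake --build build --parallel

FROM ubuntu:22.04
COPY --from=build /sd/build/bin/sd /usr/local/bin/sd
# Mount a models directory (e.g. -v $PWD/models:/models) when running.
ENTRYPOINT ["sd"]
```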
I encountered a strange problem: after enabling CUDA, I get a pure green picture when running, but it works fine on another computer.
sd_cuda.exe -m meinamix_meinaV11-f16.gguf -p "1girl" -v
Option:
n_threads: 6
mode: txt2img
model_path: meinamix_meinaV11-f16.gguf
output_path: output.png
init_img:
prompt: 1girl
negative_prompt:
cfg_scale: 7.00
width: 512
height: 512
sample_method: euler_a
schedule: default
sample_steps: 20
strength: 0.75
rng: cuda
seed: 42
batch_count: 1
System Info:
BLAS = 1
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[DEBUG] stable-diffusion.cpp:3701 - Using CUDA backend
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce GTX 1070, compute capability 6.1
[INFO] stable-diffusion.cpp:3715 - loading model from 'meinamix_meinaV11-f16.gguf'
[DEBUG] stable-diffusion.cpp:3733 - load_from_file: - kv 0: sd.model.name str
[DEBUG] stable-diffusion.cpp:3733 - load_from_file: - kv 1: sd.model.dtype i32
[DEBUG] stable-diffusion.cpp:3733 - load_from_file: - kv 2: sd.model.version i8
[DEBUG] stable-diffusion.cpp:3733 - load_from_file: - kv 3: sd.vocab.tokens arr
[INFO] stable-diffusion.cpp:3743 - Stable Diffusion 1.x | meinamix_meinaV11.safetensors
[INFO] stable-diffusion.cpp:3751 - model data type: f16
[DEBUG] stable-diffusion.cpp:3755 - loading vocab
[DEBUG] stable-diffusion.cpp:3771 - ggml tensor size = 416 bytes
[DEBUG] stable-diffusion.cpp:887 - clip params backend buffer size = 236.18 MB (449 tensors)
[DEBUG] stable-diffusion.cpp:2028 - unet params backend buffer size = 1641.16 MB (706 tensors)
[DEBUG] stable-diffusion.cpp:3118 - vae params backend buffer size = 95.47 MB (164 tensors)
[DEBUG] stable-diffusion.cpp:3780 - preparing memory for the weights
[DEBUG] stable-diffusion.cpp:3798 - loading weights
[DEBUG] stable-diffusion.cpp:3903 - model size = 1969.67MB
[INFO] stable-diffusion.cpp:3913 - total memory buffer size = 1972.80MB (clip 236.18MB, unet 1641.16MB, vae 95.47MB)
[INFO] stable-diffusion.cpp:3915 - loading model from 'meinamix_meinaV11-f16.gguf' completed, taking 0.92s
[INFO] stable-diffusion.cpp:3939 - running in eps-prediction mode
[DEBUG] stable-diffusion.cpp:3966 - finished loaded file
[DEBUG] stable-diffusion.cpp:4647 - prompt after extract and remove lora: "1girl"
[INFO] stable-diffusion.cpp:4652 - apply_loras completed, taking 0.00s
[DEBUG] stable-diffusion.cpp:1118 - parse '1girl' to [['1girl', 1], ]
[DEBUG] stable-diffusion.cpp:521 - split prompt "1girl" to tokens ["1</w>", "girl</w>", ]
[DEBUG] stable-diffusion.cpp:1051 - learned condition compute buffer size: 1.58 MB
[DEBUG] stable-diffusion.cpp:4061 - computing condition graph completed, taking 455 ms
[DEBUG] stable-diffusion.cpp:1118 - parse '' to [['', 1], ]
[DEBUG] stable-diffusion.cpp:521 - split prompt "" to tokens []
[DEBUG] stable-diffusion.cpp:1051 - learned condition compute buffer size: 1.58 MB
[DEBUG] stable-diffusion.cpp:4061 - computing condition graph completed, taking 415 ms
[INFO] stable-diffusion.cpp:4681 - get_learned_condition completed, taking 876 ms
[INFO] stable-diffusion.cpp:4691 - sampling using Euler A method
[INFO] stable-diffusion.cpp:4694 - generating image: 1/1
[DEBUG] stable-diffusion.cpp:2384 - diffusion compute buffer size: 552.57 MB
|==================================================| 20/20 - 7.42s/it
[INFO] stable-diffusion.cpp:4706 - sampling completed, taking 157.10s
[INFO] stable-diffusion.cpp:4714 - generating 1 latent images completed, taking 157.12s
[INFO] stable-diffusion.cpp:4716 - decoding 1 latents
[DEBUG] stable-diffusion.cpp:3252 - vae compute buffer size: 1664.00 MB
[DEBUG] stable-diffusion.cpp:4605 - computing vae [mode: DECODE] graph completed, taking 6.65s
[INFO] stable-diffusion.cpp:4724 - latent 1 decoded, taking 6.66s
[INFO] stable-diffusion.cpp:4728 - decode_first_stage completed, taking 6.66s
[INFO] stable-diffusion.cpp:4735 - txt2img completed in 164.66s
save result image to 'output.png'
Thanks for your great work!
I wonder if it is possible to output images from intermediate sampling steps. For example, with sampling steps set to 50, I'd like to save an image every 5 steps during diffusion.
If not, could you kindly tell me how to achieve this? (This might be a feature request.) Many thanks! :)
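As far as I can tell stable-diffusion.cpp doesn't expose this today, but conceptually it's just a hook in the sampling loop. A toy sketch (the names are made up, not the project's API), where `decode_and_save` stands in for running the VAE decoder and PNG writer on the current latent:

```cpp
#include <functional>

// Hypothetical sampling loop with a per-step hook; decode_and_save
// would e.g. write output_step_005.png, output_step_010.png, ...
int run_sampling(int total_steps, int save_every,
                 const std::function<void(int)>& decode_and_save) {
    int saved = 0;
    for (int step = 1; step <= total_steps; step++) {
        // ... one denoising step on the latent would happen here ...
        if (step % save_every == 0) {
            decode_and_save(step);
            saved++;
        }
    }
    return saved;
}
```

Note that each intermediate image costs a full VAE decode, which takes several seconds on CPU per the logs above, so saving every 5 of 50 steps adds ten extra decodes.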
At first I got many errors with the latest version of ggml. Following issue #18, I changed the ggml branch to ed522bb8051658899b2f4a5bbb5483a5d21fcfb2, but it still gives errors when I build from source:
[ 16%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml.c.o
[ 33%] Linking C static library libggml.a
[ 33%] Built target ggml
[ 50%] Building CXX object CMakeFiles/stable-diffusion.dir/stable-diffusion.cpp.o
/mnt/d/yeqing/github/stable-diffusion.cpp/stable-diffusion.cpp: In function ‘ggml_tensor* ggml_group_norm_32(ggml_context*, ggml_tensor*)’:
/mnt/d/yeqing/github/stable-diffusion.cpp/stable-diffusion.cpp:266:38: error: too many arguments to function ‘ggml_tensor* ggml_group_norm(ggml_context*, ggml_tensor*)’
266 | return ggml_group_norm(ctx, a, 32);
| ^
In file included from /mnt/d/yeqing/github/stable-diffusion.cpp/stable-diffusion.cpp:16:
/mnt/d/yeqing/github/stable-diffusion.cpp/ggml/src/../include/ggml/ggml.h:929:34: note: declared here
929 | GGML_API struct ggml_tensor* ggml_group_norm(
| ^~~~~~~~~~~~~~~
/mnt/d/yeqing/github/stable-diffusion.cpp/stable-diffusion.cpp: In member function ‘ggml_tensor* ResBlock::forward(ggml_context*, ggml_tensor*, ggml_tensor*)’:
/mnt/d/yeqing/github/stable-diffusion.cpp/stable-diffusion.cpp:985:47: error: too many arguments to function ‘ggml_tensor* ggml_group_norm_inplace(ggml_context*, ggml_tensor*)’
985 | h = ggml_group_norm_inplace(ctx, h, 32);
| ^
In file included from /mnt/d/yeqing/github/stable-diffusion.cpp/stable-diffusion.cpp:16:
/mnt/d/yeqing/github/stable-diffusion.cpp/ggml/src/../include/ggml/ggml.h:933:34: note: declared here
933 | GGML_API struct ggml_tensor* ggml_group_norm_inplace(
| ^~~~~~~~~~~~~~~~~~~~~~~
/mnt/d/yeqing/github/stable-diffusion.cpp/stable-diffusion.cpp: In member function ‘ggml_tensor* UpSample::forward(ggml_context*, ggml_tensor*)’:
/mnt/d/yeqing/github/stable-diffusion.cpp/stable-diffusion.cpp:1480:35: error: too many arguments to function ‘ggml_tensor* ggml_upscale(ggml_context*, ggml_tensor*)’
1480 | x = ggml_upscale(ctx, x, 2); // [N, channels, h*2, w*2]
| ^
In file included from /mnt/d/yeqing/github/stable-diffusion.cpp/stable-diffusion.cpp:16:
/mnt/d/yeqing/github/stable-diffusion.cpp/ggml/src/../include/ggml/ggml.h:1329:34: note: declared here
1329 | GGML_API struct ggml_tensor* ggml_upscale(
| ^~~~~~~~~~~~
make[2]: *** [CMakeFiles/stable-diffusion.dir/build.make:76: CMakeFiles/stable-diffusion.dir/stable-diffusion.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:133: CMakeFiles/stable-diffusion.dir/all] Error 2
make: *** [Makefile:136: all] Error 2
I was wondering which version of ggml I should use to build from source?
Thanks!
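In case it helps anyone hitting the same API mismatch: the repo vendors ggml, so (assuming the standard git-submodule layout under `ggml/`) checking out the pinned commit from issue #18 inside that directory before configuring should give matching function signatures:

```shell
# Assumes ggml is vendored as a git submodule under ggml/.
git submodule update --init
cd ggml
git checkout ed522bb8051658899b2f4a5bbb5483a5d21fcfb2
cd ..
cmake -B build && cmake --build build
```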
@leejet I intend to create a pull request that requires me to use the latest version of ggml to utilize ggml-alloc and ggml-backend for adding GPU acceleration to this project. The issue is that I need some feedback to make progress. I'm not sure if you're already working on something to avoid redoing tasks that are already done.
GGML_ASSERT: ggml.c:5733: a->ne[0] .....
I tried everything given in #54, but it's not working:
Error:
fatal error: intrin.h: No such file or directory
#include <intrin.h>
^
compilation terminated.
ggml\src\CMakeFiles\ggml.dir\build.make:75: recipe for target 'ggml/src/CMakeFiles/ggml.dir/ggml.c.obj' failed
mingw32-make.exe[2]: *** [ggml/src/CMakeFiles/ggml.dir/ggml.c.obj] Error 1
CMakeFiles\Makefile2:157: recipe for target 'ggml/src/CMakeFiles/ggml.dir/all' failed
mingw32-make.exe[1]: *** [ggml/src/CMakeFiles/ggml.dir/all] Error 2
Makefile:134: recipe for target 'all' failed
mingw32-make.exe: *** [all] Error 2
I'm sorry for polluting the GitHub issues with non-bugs, but since that precedent was already set by #1 and there's no Discussions enabled, I thought it may be appropriate to share it here.
Laptop CPUs are always rather underpowered. As said in #15, even old desktop CPUs perform much better than modern mid-range laptops. Even more so, phones and ARM micro-computers are laughably slow.
Sampling can be sped up considerably by using a lower resolution, but models expectedly perform very poorly below the resolution they were trained at, producing colorful abstract shapes that only vaguely resemble the expected objects.
But someone on Hugging Face managed to fine-tune on 256x256 and 128x128 images to the point of getting coherent outputs!
.ckpt first) This is great news for CPU inference, since the sampling time was cut in half! The outputs might have looked slightly less detailed, but were perfectly coherent.
I haven't investigated whether there are any differences in output between stable-diffusion.cpp and the official implementation, or whether quantization has a greater impact at lower resolutions, but it does seem promising for real-life usage of this project.
Hey, I am using this repo successfully in a very strange single-threaded environment, and it's working. Great work! I really love the idea of a CPU-based generator that can be both single-threaded and statically built. One problem I have, though, is that images smaller than 512x512 seem to fail every time. This also happens in a regular Linux terminal. Example:
$ ./build/bin/sd -m models/sd-v1-4-ggml-model-q4_1.bin -W 256 -H 256 --seed 42 --steps 12 -p "A lovely cat, high quality" -o sd.png
Option:
n_threads: 16
mode: txt2img
model_path: models/sd-v1-4-ggml-model-q4_1.bin
output_path: sd.png
init_img:
prompt: A lovely cat, high quality
negative_prompt:
cfg_scale: 7.00
width: 128
height: 128
sample_method: eular a
sample_steps: 12
strength: 0.75
seed: 42
System Info:
BLAS = 0
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
$ ./build/bin/sd -m models/sd-v1-4-ggml-model-q4_1.bin -W 128 -H 128 --seed 42 --steps 12 -p "A lovely cat, high quality" -o sd.png
$ ./build/bin/sd -v -m models/sd-v1-4-ggml-model-q4_1.bin -W 512 -H 512 --seed 42 --steps 12 -p "A lovely cat, high quality" -o sd.png
These images were created on my AMD Ryzen 9 7950X machine. I am looking into this problem now, just creating this issue to track the problem.
Changing models doesn't help. I am experimenting with -march settings now.
I've started the program with -v -s "-1"
it shows:
seed: -1
It would be helpful if there was a random seed option and the log output included the actual seed used.
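The change would presumably be small; here is a hedged sketch (not the project's actual code) of treating a negative seed as "randomize" and logging the seed actually used, so a good result can be reproduced later:

```cpp
#include <cstdint>
#include <cstdio>
#include <random>

// Hypothetical helper: a negative requested seed means "pick one at
// random"; the resolved seed is printed and returned for reuse.
int64_t resolve_seed(int64_t requested) {
    if (requested < 0) {
        std::random_device rd;
        uint64_t raw = ((uint64_t)rd() << 32) | rd();
        requested = (int64_t)(raw & 0x7fffffffffffffffULL);  // keep non-negative
    }
    printf("seed: %lld\n", (long long)requested);  // log the seed actually used
    return requested;
}
```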
I don't know C++ and don't have a solid grasp of how ggml works, but building the repo with cmake -DGGML_CLBLAST=ON seems to work: GPU utilization goes up and it's very fast (10s vs 80s per step on a higher-end CPU). It completes all the steps and finishes sampling too, but then crashes at line 1505 of ggml-opencl.
If it's just a matter of spending time to make this work, is it simple enough for one of you to explain what needs to be done? If so, I'd be happy to give it a shot, but I don't know where to start.
My limited understanding is that sampling is what takes all the effort, so is there a way to switch from GPU to CPU just to save the file? Or am I missing some context/knowledge?
Edit: Fixed typo. Flag used is clblast, not openblas.
Could you add it, please? It is supposed to be faster than the AVX2 version, right? Sorry for the dumb question; I'm a newbie in AI technology.
Could this project be of help to you? https://github.com/philipturner/metal-flash-attention
So far, metal-flash-attention can indeed provide the fastest generation speed for Stable Diffusion on macOS.
txt2img works fine for me, but img2img gives blurry, abstract images.
./sd.exe -m v2.bin -p "old two-storied american mansion entrance porch, bushes, second floor, door and windows nailed up with boards" -t 6 --sampling-method dpm++2mv2 --mode img2img -i Untitled.jpg --strength 0.2 --seed -1
I tried several images and different sampling methods, and tried the negative prompt "blur, blurry"; all give results like these.
The model is 512-base-ema.ckpt (the v2 base model; it works fine for txt2img).
$ ./sd.exe -m v2.bin -p "old two-storied american mansion entrance porch, bushes, second floor, door and windows nailed up with boards" -t 6 --sampling-method dpm++2mv2 --seed -1 --mode img2img -i Untitled.jpg --strength 0.7 -v
Option:
n_threads: 6
mode: img2img
model_path: v2.bin
output_path: output.png
init_img: Untitled.jpg
prompt: old two-storied american mansion entrance porch, bushes, second floor, door and windows nailed up with boards
negative_prompt:
cfg_scale: 7.00
width: 512
height: 512
sample_method: dpm++2mv2
schedule: default
sample_steps: 20
strength: 0.70
rng: cuda
seed: 28994
System Info:
BLAS = 0
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[INFO] stable-diffusion.cpp:2832 - loading model from 'v2.bin'
[DEBUG] stable-diffusion.cpp:2840 - verifying magic
[DEBUG] stable-diffusion.cpp:2851 - loading hparams
[INFO] stable-diffusion.cpp:2860 - model type: SD2.x
[INFO] stable-diffusion.cpp:2868 - ftype: q8_0
[DEBUG] stable-diffusion.cpp:2874 - loading vocab
[DEBUG] stable-diffusion.cpp:2902 - ggml tensor size = 320 bytes
[DEBUG] stable-diffusion.cpp:2907 - clip params ctx size = 360.00 MB
[DEBUG] stable-diffusion.cpp:2926 - unet params ctx size = 1406.42 MB
[DEBUG] stable-diffusion.cpp:2947 - vae params ctx size = 179.12 MB
[DEBUG] stable-diffusion.cpp:2968 - preparing memory for the weights
[DEBUG] stable-diffusion.cpp:2984 - loading weights
[DEBUG] stable-diffusion.cpp:3087 - model size = 1923.54MB
[INFO] stable-diffusion.cpp:3096 - total params size = 1923.98MB (clip 358.70MB, unet 1405.51MB, vae 159.77MB)
[INFO] stable-diffusion.cpp:3098 - loading model from 'v2.bin' completed, taking
[DEBUG] stable-diffusion.cpp:3431 - diffusion context need 16.61MB static memory, with work_size needing 5.31MB
[INFO] stable-diffusion.cpp:3892 - sampling using modified DPM++ (2M) method
[INFO] stable-diffusion.cpp:3561 - step 1 sampling completed, taking 31.30s
[DEBUG] stable-diffusion.cpp:3565 - diffusion graph use 396.74MB runtime memory: static 16.61MB, dynamic 380.13MB
[DEBUG] stable-diffusion.cpp:3566 - 66560 bytes of dynamic memory has not been released yet
[INFO] stable-diffusion.cpp:3561 - step 2 sampling completed, taking 30.70s
[DEBUG] stable-diffusion.cpp:3565 - diffusion graph use 396.74MB runtime memory: static 16.61MB, dynamic 380.13MB
[DEBUG] stable-diffusion.cpp:3566 - 66560 bytes of dynamic memory has not been released yet
[INFO] stable-diffusion.cpp:3561 - step 3 sampling completed, taking 32.15s
[DEBUG] stable-diffusion.cpp:3565 - diffusion graph use 396.74MB runtime memory: static 16.61MB, dynamic 380.13MB
[DEBUG] stable-diffusion.cpp:3566 - 66560 bytes of dynamic memory has not been released yet
[INFO] stable-diffusion.cpp:3561 - step 4 sampling completed, taking 31.58s
[DEBUG] stable-diffusion.cpp:3565 - diffusion graph use 396.74MB runtime memory: static 16.61MB, dynamic 380.13MB
[DEBUG] stable-diffusion.cpp:3566 - 66560 bytes of dynamic memory has not been released yet
[INFO] stable-diffusion.cpp:3561 - step 5 sampling completed, taking 31.49s
[DEBUG] stable-diffusion.cpp:3565 - diffusion graph use 396.74MB runtime memory: static 16.61MB, dynamic 380.13MB
[DEBUG] stable-diffusion.cpp:3566 - 66560 bytes of dynamic memory has not been released yet
[INFO] stable-diffusion.cpp:3561 - step 6 sampling completed, taking 32.33s
[DEBUG] stable-diffusion.cpp:3565 - diffusion graph use 396.74MB runtime memory: static 16.61MB, dynamic 380.13MB
[DEBUG] stable-diffusion.cpp:3566 - 66560 bytes of dynamic memory has not been released yet
[INFO] stable-diffusion.cpp:3561 - step 7 sampling completed, taking 32.68s
[DEBUG] stable-diffusion.cpp:3565 - diffusion graph use 396.74MB runtime memory: static 16.61MB, dynamic 380.13MB
[DEBUG] stable-diffusion.cpp:3566 - 66560 bytes of dynamic memory has not been released yet
[INFO] stable-diffusion.cpp:3561 - step 8 sampling completed, taking 32.13s
[DEBUG] stable-diffusion.cpp:3565 - diffusion graph use 396.74MB runtime memory: static 16.61MB, dynamic 380.13MB
[DEBUG] stable-diffusion.cpp:3566 - 66560 bytes of dynamic memory has not been released yet
[INFO] stable-diffusion.cpp:3561 - step 9 sampling completed, taking 31.19s
[DEBUG] stable-diffusion.cpp:3565 - diffusion graph use 396.74MB runtime memory: static 16.61MB, dynamic 380.13MB
[DEBUG] stable-diffusion.cpp:3566 - 66560 bytes of dynamic memory has not been released yet
[INFO] stable-diffusion.cpp:3561 - step 10 sampling completed, taking 31.08s
[DEBUG] stable-diffusion.cpp:3565 - diffusion graph use 396.74MB runtime memory: static 16.61MB, dynamic 380.13MB
[DEBUG] stable-diffusion.cpp:3566 - 66560 bytes of dynamic memory has not been released yet
[INFO] stable-diffusion.cpp:3561 - step 11 sampling completed, taking 31.47s
[DEBUG] stable-diffusion.cpp:3565 - diffusion graph use 396.74MB runtime memory: static 16.61MB, dynamic 380.13MB
[DEBUG] stable-diffusion.cpp:3566 - 66560 bytes of dynamic memory has not been released yet
[INFO] stable-diffusion.cpp:3561 - step 12 sampling completed, taking 33.07s
[DEBUG] stable-diffusion.cpp:3565 - diffusion graph use 396.74MB runtime memory: static 16.61MB, dynamic 380.13MB
[DEBUG] stable-diffusion.cpp:3566 - 66560 bytes of dynamic memory has not been released yet
[INFO] stable-diffusion.cpp:3561 - step 13 sampling completed, taking 39.65s
[DEBUG] stable-diffusion.cpp:3565 - diffusion graph use 396.74MB runtime memory: static 16.61MB, dynamic 380.13MB
[DEBUG] stable-diffusion.cpp:3566 - 66560 bytes of dynamic memory has not been released yet
[INFO] stable-diffusion.cpp:3561 - step 14 sampling completed, taking 34.51s
[DEBUG] stable-diffusion.cpp:3565 - diffusion graph use 396.74MB runtime memory: static 16.61MB, dynamic 380.13MB
[DEBUG] stable-diffusion.cpp:3566 - 66560 bytes of dynamic memory has not been released yet
[INFO] stable-diffusion.cpp:3561 - step 15 sampling completed, taking 38.46s
[DEBUG] stable-diffusion.cpp:3565 - diffusion graph use 396.74MB runtime memory: static 16.61MB, dynamic 380.13MB
[DEBUG] stable-diffusion.cpp:3566 - 66560 bytes of dynamic memory has not been released yet
[INFO] stable-diffusion.cpp:3960 - diffusion graph use 1802.26MB of memory: params 1405.51MB, runtime 396.74MB (static 16.61MB, dynamic 380.13MB)
[DEBUG] stable-diffusion.cpp:3961 - 66560 bytes of dynamic memory has not been released yet
[INFO] stable-diffusion.cpp:4367 - sampling completed, taking 493.82s
[DEBUG] stable-diffusion.cpp:4131 - vae context need 10.16MB static memory, with work_size needing 0.00MB
[DEBUG] stable-diffusion.cpp:4162 - computing vae graph completed, taking 71.56s
[INFO] stable-diffusion.cpp:4185 - vae graph use 2220.92MB of memory: params 159.77MB, runtime 2061.16MB (static 10.16MB, dynamic 2051.00MB)
[DEBUG] stable-diffusion.cpp:4186 - 3146752 bytes of dynamic memory has not been released yet
[INFO] stable-diffusion.cpp:4379 - decode_first_stage completed, taking 71.61s
[INFO] stable-diffusion.cpp:4393 - img2img completed in 599.98s, use 3535.86MB of memory: peak params memory 1923.98MB, peak runtime memory 2061.16MB
save result image to 'output.png'
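For context on strength: in img2img the init image is encoded and partially noised, and sampling then only runs the tail of the schedule. A toy sketch of the usual mapping (an assumption based on typical k-diffusion-style implementations, not verified against this repo):

```cpp
// Toy mapping from img2img strength to the number of denoising steps
// actually run (typical k-diffusion-style behavior; assumed, not
// verified against this repo). strength 1.0 behaves like txt2img.
int img2img_steps(int sample_steps, float strength) {
    int t_enc = (int)(strength * sample_steps + 0.5f);
    if (t_enc > sample_steps) t_enc = sample_steps;
    return t_enc;  // steps run, starting from a partially noised init latent
}
```

So a very low strength like 0.2 with 20 steps leaves only a few denoising steps; if the encoded init latent is off for some reason, there is little sampling left to clean it up, which would be consistent with blurry outputs.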
The error starts from master-7620b92, and there is no problem with other versions.
OS: MacOS 14 M1 16G
Error message:
[INFO] stable-diffusion.cpp:3084 - running in eps-prediction mode
ggml_aligned_malloc: insufficient memory (attempted to allocate 2268047721628.89 MB)
GGML_ASSERT: /Volumes/SOFT/Dev/stable-diffusion-cpp/stable-diffusion.cpp/ggml/src/ggml.c:4467: ctx->mem_buffer != NULL
Can you share how many seconds per iteration (or it/s) you get with your hardware (CPU/GPU/RAM)?
[ 57%] Building CXX object CMakeFiles/stable-diffusion.dir/stable-diffusion.cpp.o
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:137:43: warning: format specifies type 'size_t' (aka 'unsigned long') but the argument has type 'int64_t' (aka 'long long') [-Wformat]
printf("shape(%zu, %zu, %zu, %zu)\n", tensor->ne[0], tensor->ne[1], tensor->ne[2], tensor->ne[3]);
~~~ ^~~~~~~~~~~~~
%lld
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:137:58: warning: format specifies type 'size_t' (aka 'unsigned long') but the argument has type 'int64_t' (aka 'long long') [-Wformat]
printf("shape(%zu, %zu, %zu, %zu)\n", tensor->ne[0], tensor->ne[1], tensor->ne[2], tensor->ne[3]);
~~~ ^~~~~~~~~~~~~
%lld
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:137:73: warning: format specifies type 'size_t' (aka 'unsigned long') but the argument has type 'int64_t' (aka 'long long') [-Wformat]
printf("shape(%zu, %zu, %zu, %zu)\n", tensor->ne[0], tensor->ne[1], tensor->ne[2], tensor->ne[3]);
~~~ ^~~~~~~~~~~~~
%lld
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:137:88: warning: format specifies type 'size_t' (aka 'unsigned long') but the argument has type 'int64_t' (aka 'long long') [-Wformat]
printf("shape(%zu, %zu, %zu, %zu)\n", tensor->ne[0], tensor->ne[1], tensor->ne[2], tensor->ne[3]);
~~~ ^~~~~~~~~~~~~
%lld
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:902:18: error: use of undeclared identifier 'ggml_group_norm'
auto h = ggml_group_norm(ctx, x);
^
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:1125:13: error: use of undeclared identifier 'ggml_group_norm'
x = ggml_group_norm(ctx, x);
^
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:1379:28: error: use of undeclared identifier 'ggml_get_dynamic'; did you mean 'ggml_get_name'?
bool dynamic = ggml_get_dynamic(ctx);
^~~~~~~~~~~~~~~~
ggml_get_name
/Users/saidm/stable-diffusion.cpp/ggml/src/../include/ggml/ggml.h:664:35: note: 'ggml_get_name' declared here
GGML_API const char * ggml_get_name (const struct ggml_tensor * tensor);
^
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:1379:45: error: cannot initialize a parameter of type 'const struct ggml_tensor *' with an lvalue of type 'struct ggml_context *'
bool dynamic = ggml_get_dynamic(ctx);
^~~
/Users/saidm/stable-diffusion.cpp/ggml/src/../include/ggml/ggml.h:273:12: note: 'ggml_context' is not defined, but forward declared here; conversion would be valid if it was derived from 'ggml_tensor'
struct ggml_context;
^
/Users/saidm/stable-diffusion.cpp/ggml/src/../include/ggml/ggml.h:664:79: note: passing argument to parameter 'tensor' here
GGML_API const char * ggml_get_name (const struct ggml_tensor * tensor);
^
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:1380:13: error: use of undeclared identifier 'ggml_set_dynamic'; did you mean 'ggml_set_name'?
ggml_set_dynamic(ctx, false);
^~~~~~~~~~~~~~~~
ggml_set_name
/Users/saidm/stable-diffusion.cpp/ggml/src/../include/ggml/ggml.h:665:35: note: 'ggml_set_name' declared here
GGML_API struct ggml_tensor * ggml_set_name ( struct ggml_tensor * tensor, const char * name);
^
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:1380:30: error: cannot initialize a parameter of type 'struct ggml_tensor *' with an lvalue of type 'struct ggml_context *'
ggml_set_dynamic(ctx, false);
^~~
/Users/saidm/stable-diffusion.cpp/ggml/src/../include/ggml/ggml.h:273:12: note: 'ggml_context' is not defined, but forward declared here; conversion would be valid if it was derived from 'ggml_tensor'
struct ggml_context;
^
/Users/saidm/stable-diffusion.cpp/ggml/src/../include/ggml/ggml.h:665:79: note: passing argument to parameter 'tensor' here
GGML_API struct ggml_tensor * ggml_set_name ( struct ggml_tensor * tensor, const char * name);
^
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:1382:13: error: use of undeclared identifier 'ggml_set_dynamic'; did you mean 'ggml_set_name'?
ggml_set_dynamic(ctx, dynamic);
^~~~~~~~~~~~~~~~
ggml_set_name
/Users/saidm/stable-diffusion.cpp/ggml/src/../include/ggml/ggml.h:665:35: note: 'ggml_set_name' declared here
GGML_API struct ggml_tensor * ggml_set_name ( struct ggml_tensor * tensor, const char * name);
^
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:1382:30: error: cannot initialize a parameter of type 'struct ggml_tensor *' with an lvalue of type 'struct ggml_context *'
ggml_set_dynamic(ctx, dynamic);
^~~
/Users/saidm/stable-diffusion.cpp/ggml/src/../include/ggml/ggml.h:273:12: note: 'ggml_context' is not defined, but forward declared here; conversion would be valid if it was derived from 'ggml_tensor'
struct ggml_context;
^
/Users/saidm/stable-diffusion.cpp/ggml/src/../include/ggml/ggml.h:665:79: note: passing argument to parameter 'tensor' here
GGML_API struct ggml_tensor * ggml_set_name ( struct ggml_tensor * tensor, const char * name);
^
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:1427:13: error: use of undeclared identifier 'ggml_upscale'
x = ggml_upscale(ctx, x); // [N, channels, h*2, w*2]
^
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:1801:21: error: use of undeclared identifier 'ggml_concat'; did you mean 'ggml_context'?
h = ggml_concat(ctx, h, h_skip);
^
/Users/saidm/stable-diffusion.cpp/ggml/src/../include/ggml/ggml.h:273:12: note: 'ggml_context' declared here
struct ggml_context;
^
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:1818:13: error: use of undeclared identifier 'ggml_group_norm'
h = ggml_group_norm(ctx, h);
^
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:1922:18: error: use of undeclared identifier 'ggml_group_norm'
auto h = ggml_group_norm(ctx, z);
^
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:2031:19: error: use of undeclared identifier 'ggml_group_norm'
auto h_ = ggml_group_norm(ctx, x);
^
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:2256:13: error: use of undeclared identifier 'ggml_group_norm'
h = ggml_group_norm(ctx, h);
^
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:2438:13: error: use of undeclared identifier 'ggml_group_norm'
h = ggml_group_norm(ctx, h);
^
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:2757:20: error: no member named 'dynamic' in 'ggml_init_params'
params.dynamic = false;
~~~~~~ ^
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:2776:20: error: no member named 'dynamic' in 'ggml_init_params'
params.dynamic = false;
~~~~~~ ^
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:2797:20: error: no member named 'dynamic' in 'ggml_init_params'
params.dynamic = false;
~~~~~~ ^
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:2901:49: warning: format specifies type 'size_t' (aka 'unsigned long') but the argument has type 'int64_t' (aka 'long long') [-Wformat]
name.data(), nelements, ggml_nelements(tensor));
^~~~~~~~~~~~~~~~~~~~~~
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:41:68: note: expanded from macro 'LOG_ERROR'
#define LOG_ERROR(format, ...) SD_LOG(SDLogLevel::ERROR, format, ##__VA_ARGS__)
~~~~~~ ^~~~~~~~~~~
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:28:80: note: expanded from macro 'SD_LOG'
printf("[DEBUG] %s:%-4d - " format "\n", __FILENAME__, __LINE__, ##__VA_ARGS__);
~~~~~~ ^~~~~~~~~~~
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:2901:49: warning: format specifies type 'size_t' (aka 'unsigned long') but the argument has type 'int64_t' (aka 'long long') [-Wformat]
name.data(), nelements, ggml_nelements(tensor));
^~~~~~~~~~~~~~~~~~~~~~
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:41:68: note: expanded from macro 'LOG_ERROR'
#define LOG_ERROR(format, ...) SD_LOG(SDLogLevel::ERROR, format, ##__VA_ARGS__)
~~~~~~ ^~~~~~~~~~~
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:30:80: note: expanded from macro 'SD_LOG'
printf("[INFO] %s:%-4d - " format "\n", __FILENAME__, __LINE__, ##__VA_ARGS__);
~~~~~~ ^~~~~~~~~~~
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:2901:49: warning: format specifies type 'size_t' (aka 'unsigned long') but the argument has type 'int64_t' (aka 'long long') [-Wformat]
name.data(), nelements, ggml_nelements(tensor));
^~~~~~~~~~~~~~~~~~~~~~
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:41:68: note: expanded from macro 'LOG_ERROR'
#define LOG_ERROR(format, ...) SD_LOG(SDLogLevel::ERROR, format, ##__VA_ARGS__)
~~~~~~ ^~~~~~~~~~~
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:32:89: note: expanded from macro 'SD_LOG'
fprintf(stderr, "[WARN] %s:%-4d - " format "\n", __FILENAME__, __LINE__, ##__VA_ARGS__);
~~~~~~ ^~~~~~~~~~~
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:2901:49: warning: format specifies type 'size_t' (aka 'unsigned long') but the argument has type 'int64_t' (aka 'long long') [-Wformat]
name.data(), nelements, ggml_nelements(tensor));
^~~~~~~~~~~~~~~~~~~~~~
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:41:68: note: expanded from macro 'LOG_ERROR'
#define LOG_ERROR(format, ...) SD_LOG(SDLogLevel::ERROR, format, ##__VA_ARGS__)
~~~~~~ ^~~~~~~~~~~
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:34:89: note: expanded from macro 'SD_LOG'
fprintf(stderr, "[ERROR] %s:%-4d - " format "\n", __FILENAME__, __LINE__, ##__VA_ARGS__);
~~~~~~ ^~~~~~~~~~~
/Users/saidm/stable-diffusion.cpp/stable-diffusion.cpp:2961:20: error: no member named 'dynamic' in 'ggml_init_params'
params.dynamic = dynamic;
~~~~~~ ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
8 warnings and 20 errors generated.
make[2]: *** [CMakeFiles/stable-diffusion.dir/stable-diffusion.cpp.o] Error 1
make[1]: *** [CMakeFiles/stable-diffusion.dir/all] Error 2
make: *** [all] Error 2
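As an aside, the -Wformat warnings in that log (printing int64_t with %zu) have a portable fix via the <cinttypes> format macros; a minimal sketch:

```cpp
#include <cinttypes>
#include <cstdio>
#include <string>

// ggml's tensor->ne[] entries are int64_t; PRId64 expands to the right
// printf specifier on every platform, avoiding the %zu/-Wformat warning.
std::string shape_str(const int64_t ne[4]) {
    char buf[96];
    snprintf(buf, sizeof(buf),
             "shape(%" PRId64 ", %" PRId64 ", %" PRId64 ", %" PRId64 ")",
             ne[0], ne[1], ne[2], ne[3]);
    return buf;
}
```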
Device Info:
MacBook Pro, Apple M2, Ventura 13.3.1
Error Info:
% ./sd -m revAnimated_v11-ggml-model-f16.bin -p "a lovely pig" -v
Option:
n_threads: 4
mode: txt2img
model_path: revAnimated_v11-ggml-model-f16.bin
output_path: output.png
init_img:
prompt: a lovely pig
negative_prompt:
cfg_scale: 7.00
width: 512
height: 512
sample_method: eular a
sample_steps: 20
strength: 0.75
seed: 42
System Info:
BLAS = 1
SSE3 = 1
AVX = 1
AVX2 = 0
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 0
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[INFO] stable-diffusion.cpp:2698 - loading model from 'revAnimated_v11-ggml-model-f16.bin'
[DEBUG] stable-diffusion.cpp:2706 - verifying magic
[DEBUG] stable-diffusion.cpp:2717 - loading hparams
[INFO] stable-diffusion.cpp:2723 - ftype: f16
[DEBUG] stable-diffusion.cpp:2729 - loading vocab
[DEBUG] stable-diffusion.cpp:2757 - ggml tensor size = 288 bytes
zsh: illegal hardware instruction ./sd -m revAnimated_v11-ggml-model-f16.bin -p "a lovely pig" -v
Most models suggest a value for clip skip. It would be very useful if it were supported.
[INFO] stable-diffusion.cpp:2830 - loading model from '/home/cwillu/ext/work/models/sd/cyberrealistic_v33-ggml-model-f16.bin'
[INFO] stable-diffusion.cpp:2858 - model type: SD1.x
[INFO] stable-diffusion.cpp:2866 - ftype: f16
ggml_opencl: selecting platform: 'AMD Accelerated Parallel Processing'
ggml_opencl: selecting device: 'gfx1012:xnack-'
ggml_opencl: device FP16 support: true
[INFO] stable-diffusion.cpp:3090 - total params size = 1969.98MB (clip 235.01MB, unet 1640.46MB, vae 94.51MB)
[INFO] stable-diffusion.cpp:3096 - loading model from '/home/cwillu/ext/work/models/sd/cyberrealistic_v33-ggml-model-f16.bin' completed, taking 0.64s
[INFO] stable-diffusion.cpp:3121 - running in eps-prediction mode
[INFO] stable-diffusion.cpp:3365 - condition graph use 248.59MB of memory: params 235.01MB, runtime 13.58MB (static 10.65MB, dynamic 2.93MB)
[INFO] stable-diffusion.cpp:3365 - condition graph use 248.59MB of memory: params 235.01MB, runtime 13.58MB (static 10.65MB, dynamic 2.93MB)
[INFO] stable-diffusion.cpp:4097 - get_learned_condition completed, taking 2.71s
[INFO] stable-diffusion.cpp:4113 - start sampling
[INFO] stable-diffusion.cpp:3753 - sampling using modified DPM++ (2M) method
ggml_opencl: ggml_cl_h2d_tensor_2d(queue, d_X, 0, src0, i03, i02, NULL) error -30 at /media/cwillu/External/cwillu/work/stable-diffusion.cpp/ggml/src/ggml-opencl.cpp:1505
I also get a similar error when using models that aren't f16 (i.e., f32, q4, etc.), regardless of any other options, but that may be a related-but-separate issue.
Thanks for your great work. The txt2img mode works fine, but I hit an error when using img2img mode. Any suggestions?
(mlc)- stable-diffusion.cpp % ./cmake-build-debug/bin/sd --mode img2img -m models/stable-diffusion-nano-2-1-ggml-model-q8_0.bin -p "Cat" -i ./nano_cat_q8_0.png -o ./img2img_output_v21_1.png --strength 0.4
[INFO] stable-diffusion.cpp:2830 - loading model from 'models/stable-diffusion-nano-2-1-ggml-model-q8_0.bin'
[INFO] stable-diffusion.cpp:2858 - model type: SD2.x
[INFO] stable-diffusion.cpp:2866 - ftype: q8_0
[WARN] stable-diffusion.cpp:3028 - unknown tensor 'cond_stage_model.model.transformer.text_model.embeddings.position_ids' in model file
[INFO] stable-diffusion.cpp:3094 - total params size = 1923.94MB (clip 358.69MB, unet 1405.49MB, vae 159.76MB)
[INFO] stable-diffusion.cpp:3096 - loading model from 'models/stable-diffusion-nano-2-1-ggml-model-q8_0.bin' completed, taking 0.86s
[INFO] stable-diffusion.cpp:3244 - check is_using_v_parameterization_for_sd2 completed, taking 0.99s
[INFO] stable-diffusion.cpp:3121 - running in eps-prediction mode
[INFO] stable-diffusion.cpp:4296 - img2img 128x128
[INFO] stable-diffusion.cpp:4300 - target t_enc is 8 steps
Assertion failed: (sizeof(dst->nb[0]) == sizeof(float)), function asymmetric_pad, file stable-diffusion.cpp, line 1407.
zsh: abort ./cmake-build-debug/bin/sd --mode img2img -m -p "Cat" -i ./nano_cat_q8_0.png
The input image was generated by nano-SD2.1 at 128x128 resolution.
I tried the provided example, and the same error occurs:
[INFO] stable-diffusion.cpp:2830 - loading model from './models/sd-v1-4-ggml-model-f16.bin'
[INFO] stable-diffusion.cpp:2858 - model type: SD1.x
[INFO] stable-diffusion.cpp:2866 - ftype: f16
[INFO] stable-diffusion.cpp:3094 - total params size = 2035.23MB (clip 235.01MB, unet 1640.46MB, vae 159.76MB)
[INFO] stable-diffusion.cpp:3096 - loading model from './models/sd-v1-4-ggml-model-f16.bin' completed, taking 1.75s
[INFO] stable-diffusion.cpp:3121 - running in eps-prediction mode
[INFO] stable-diffusion.cpp:4296 - img2img 512x512
[INFO] stable-diffusion.cpp:4300 - target t_enc is 0 steps
Assertion failed: (sizeof(dst->nb[0]) == sizeof(float)), function asymmetric_pad, file stable-diffusion.cpp, line 1407.
zsh: abort ./cmake-build-debug/bin/sd --mode img2img -m -p "cat with blue eyes" -i -o
Edit:
By temporarily commenting out the assertions below (lines 1407-1409), it works fine:
// assert(sizeof(dst->nb[0]) == sizeof(float));
// assert(sizeof(a->nb[0]) == sizeof(float));
// assert(sizeof(b->nb[0]) == sizeof(float));
Also, I found that img2img can't change the resolution of the image. Could we pad the input image so the output reaches a target resolution?
Error says:
$ ./bin/sd -m ../models/sd-v1-4-ggml-model-q4_0.bin -p "cat"
[INFO] stable-diffusion.cpp:2830 - loading model from '../models/sd-v1-4-ggml-model-q4_0.bin'
[INFO] stable-diffusion.cpp:2858 - model type: SD1.x
[INFO] stable-diffusion.cpp:2866 - ftype: q4_0
[INFO] stable-diffusion.cpp:3094 - total params size = 1431.17MB (clip 66.46MB, unet 1270.21MB, vae 94.50MB)
[INFO] stable-diffusion.cpp:3096 - loading model from '../models/sd-v1-4-ggml-model-q4_0.bin' completed, taking 2.13s
[INFO] stable-diffusion.cpp:3121 - running in eps-prediction mode
[INFO] stable-diffusion.cpp:3372 - condition graph use 79.79MB of memory: params 66.46MB, runtime 13.33MB (static 10.40MB, dynamic 2.93MB)
[INFO] stable-diffusion.cpp:3372 - condition graph use 79.79MB of memory: params 66.46MB, runtime 13.33MB (static 10.40MB, dynamic 2.93MB)
[INFO] stable-diffusion.cpp:4228 - get_learned_condition completed, taking 0.89s
[INFO] stable-diffusion.cpp:4244 - start sampling
[INFO] stable-diffusion.cpp:3565 - sampling using Euler A method
Segmentation fault
I don't know what the issue is, because nothing else is printed.
Can we add the dynamic library build to the release build product?
Looks like a new error as of Clang 16, according to this article:
https://www.redhat.com/en/blog/new-warnings-and-errors-clang-16
I have clang version 17.0.5
Target: aarch64-unknown-linux-android24
~/stable-diffusion.cpp/build $ cmake --build . --config Release
[ 7%] Building C object ggml/src/CMakeFiles/ggml.dir/ggml.c.o
/data/data/com.termux/files/home/stable-diffusion.cpp/ggml/src/ggml.c:1221:5: warning: implicit conversion increases floating-point precision: 'float32_t' (aka 'float') to 'ggml_float' (aka 'double') [-Wdouble-promotion]
 1221 |     GGML_F16_VEC_REDUCE(sumf, sum);
(notes: expanded from macros 'GGML_F16_VEC_REDUCE' -> 'GGML_F32Cx4_REDUCE' -> 'GGML_F32x4_REDUCE' -> 'GGML_F32x4_REDUCE_ONE' at ggml.c:748, 738, 668, 653)
/data/data/com.termux/files/home/stable-diffusion.cpp/ggml/src/ggml.c:1269:9: warning: implicit conversion increases floating-point precision: 'float32_t' (aka 'float') to 'ggml_float' (aka 'double') [-Wdouble-promotion]
 1269 |     GGML_F16_VEC_REDUCE(sumf[k], sum[k]);
(notes: same macro expansion chain as above)
/data/data/com.termux/files/home/stable-diffusion.cpp/ggml/src/ggml.c:3155:6: warning: no previous prototype for function 'ggml_broadcast' [-Wmissing-prototypes]
 3155 | void ggml_broadcast(
(note: declare 'static' if the function is not intended to be used outside of this translation unit)
/data/data/com.termux/files/home/stable-diffusion.cpp/ggml/src/ggml.c:11953:11: error: type specifier missing, defaults to 'int'; ISO C99 and later do not support implicit int [-Wimplicit-int]
 11953 |     const so2 = ne00 * ne01;
/data/data/com.termux/files/home/stable-diffusion.cpp/ggml/src/ggml.c:11954:11: error: type specifier missing, defaults to 'int'; ISO C99 and later do not support implicit int [-Wimplicit-int]
 11954 |     const so3 = ne00 * ne01 * ne02;
/data/data/com.termux/files/home/stable-diffusion.cpp/ggml/src/ggml.c:11955:11: error: type specifier missing, defaults to 'int'; ISO C99 and later do not support implicit int [-Wimplicit-int]
 11955 |     const do2 = ne0 * ne1;
/data/data/com.termux/files/home/stable-diffusion.cpp/ggml/src/ggml.c:11956:11: error: type specifier missing, defaults to 'int'; ISO C99 and later do not support implicit int [-Wimplicit-int]
 11956 |     const do3 = ne0 * ne1 * ne2;
/data/data/com.termux/files/home/stable-diffusion.cpp/ggml/src/ggml.c:11948:15: warning: unused variable 'padding_factor' [-Wunused-variable]
 11948 |     const int padding_factor = dst->op_params[0];
/data/data/com.termux/files/home/stable-diffusion.cpp/ggml/src/ggml.c:19127:28: warning: comparison of integers of different signs: 'const size_t' (aka 'const unsigned long') and 'const int' [-Wsign-compare]
 19127 |     if (offset_pad != cur_offset) {
5 warnings and 4 errors generated.
make[2]: *** [ggml/src/CMakeFiles/ggml.dir/build.make:76: ggml/src/CMakeFiles/ggml.dir/ggml.c.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:212: ggml/src/CMakeFiles/ggml.dir/all] Error 2
make: *** [Makefile:136: all] Error 2
~/stable-diffusion.cpp/build $
I think it's a great feature that tools like automatic1111 write all the parameters from inference into the images' metadata. That way you can re-discover old pictures on your harddrive and see what model hash, seed and prompt you used.
The Readme of the stb library mentions this small lib:
(Edit: I think that library only reads metadata, it doesn't write it 😞)
It is possible to use cuBLAS by enabling it when compiling:
-DGGML_CUBLAS=ON
Maybe add this to the readme?
I am trying to convert v1-5-pruned-emaonly.safetensors, but the generated file is not working.
convert.exe v1-5-pruned-emaonly.safetensors -t q4_0
loading model 'v1-5-pruned-emaonly.safetensors'
model type: checkpoint
Stable Diffusion 1.x - v1-5-pruned-emaonly.safetensors
preprocessing 0 tensors
using embedded vocab
converting 0 tensors
alphas_cumprod computed
CLIP Model Tensor count: 0
UNET Model Tensor count: 0
VAE Model Tensor count: 0
saving gguf file
model saved 'v1-5-pruned-emaonly-q4_0.gguf' correctly.
and then
sd.exe -m v1-5-pruned-emaonly-q4_0.gguf -p "an orange cat, realistic"
[INFO] stable-diffusion.cpp:3715 - loading model from 'v1-5-pruned-emaonly-q4_0.gguf'
[INFO] stable-diffusion.cpp:3743 - Stable Diffusion 1.x | v1-5-pruned-emaonly.safetensors
[INFO] stable-diffusion.cpp:3751 - model data type: q4_0
[ERROR] stable-diffusion.cpp:3889 - tensor 'cond_stage_model.transformer.text_model.embeddings.position_embedding.weight' not in model file
[ERROR] stable-diffusion.cpp:3889 - tensor 'cond_stage_model.transformer.text_model.embeddings.token_embedding.weight' not in model file
[ERROR] stable-diffusion.cpp:3889 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.0.layer_norm1.bias' not in model file
[ERROR] stable-diffusion.cpp:3889 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.0.layer_norm1.weight' not in model file
[ERROR] stable-diffusion.cpp:3889 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.0.layer_norm2.bias' not in model file
[ERROR] stable-diffusion.cpp:3889 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.0.layer_norm2.weight' not in model file
[ERROR] stable-diffusion.cpp:3889 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.0.mlp.fc1.bias' not in model file
[ERROR] stable-diffusion.cpp:3889 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.0.mlp.fc1.weight' not in model file
[ERROR] stable-diffusion.cpp:3889 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.0.mlp.fc2.bias' not in model file
[ERROR] stable-diffusion.cpp:3889 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.0.mlp.fc2.weight' not in model file
[ERROR] stable-diffusion.cpp:3889 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.k_proj.bias' not in model file
[ERROR] stable-diffusion.cpp:3889 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.k_proj.weight' not in model file
[ERROR] stable-diffusion.cpp:3889 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.out_proj.bias' not in model file
[ERROR] stable-diffusion.cpp:3889 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.out_proj.weight' not in model file
[ERROR] stable-diffusion.cpp:3889 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.q_proj.bias' not in model file
[ERROR] stable-diffusion.cpp:3889 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.q_proj.weight' not in model file
[ERROR] stable-diffusion.cpp:3889 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.v_proj.bias' not in model file
[ERROR] stable-diffusion.cpp:3889 - tensor 'cond_stage_model.transformer.text_model.encoder.layers.0.self_attn.v_proj.weight' not in model file
... (the same error repeats for every tensor in the file, down to) ...
[ERROR] stable-diffusion.cpp:3889 - tensor 'model.diffusion_model.time_embed.2.bias' not in model file
[ERROR] stable-diffusion.cpp:3889 - tensor 'model.diffusion_model.time_embed.2.weight' not in model file
I wanted to convert this model. It's a fine-tuned model based on Stable Diffusion 1.5. I got this error message:
python3 convert.py ~/gameIconInstituteV10_v10.safetensors --out_type f16
loading model from ~/gameIconInstituteV10_v10.safetensors
loading model from ~/gameIconInstituteV10_v10.safetensors completed
Stable diffuison 1.x
no alphas_cumprod in file, generate new one
Saving GGML compatible file to /home/user/stable-diffusion.cpp/models/gameIconInstituteV10_v10-ggml-model-f16.bin
Traceback (most recent call last):
File "/home/user/stable-diffusion.cpp/models/convert.py", line 369, in <module>
convert(args.model_path, args.out_type, args.out_file)
File "/home/user/stable-diffusion.cpp/models/convert.py", line 317, in convert
data = state_dict[name].numpy()
TypeError: Got unsupported ScalarType BFloat16
Is it possible to add support for this kind of model please?
"TAESD is a tiny, distilled version of the Stable Diffusion VAE."
The image generation results for this VAE (showcased on their GitHub) look nearly identical to the full VAE's, so supporting it in stable-diffusion.cpp could increase generation speed.
Hey, finally stable diffusion for ggml 😄
Did a test run
$ ./sd -t 8 -m ../models/v1-5-pruned-emaonly-ggml-model-q8_0.bin -p "alps, distant alms, small church, (cinematic:1.3), intricate details, (ArtStation:1.2), nikon dlsr, masterpiece, hyperreal"
[INFO] stable-diffusion.cpp:2189 - loading model from '../models/v1-5-pruned-emaonly-ggml-model-q8_0.bin'
[INFO] stable-diffusion.cpp:2214 - ftype: q8_0
[INFO] stable-diffusion.cpp:2259 - params ctx size = 1618.72 MB
[INFO] stable-diffusion.cpp:2399 - loading model from '../models/v1-5-pruned-emaonly-ggml-model-q8_0.bin' completed, taking 0.46s
[INFO] stable-diffusion.cpp:2477 - condition graph use 4.34MB of memory: static 1.41MB, dynamic = 2.93MB
[INFO] stable-diffusion.cpp:2477 - condition graph use 4.34MB of memory: static 1.41MB, dynamic = 2.93MB
[INFO] stable-diffusion.cpp:2822 - get_learned_condition completed, taking 0.16s
[INFO] stable-diffusion.cpp:2830 - start sampling
[INFO] stable-diffusion.cpp:2674 - step 1 sampling completed, taking 18.34s
[INFO] stable-diffusion.cpp:2674 - step 2 sampling completed, taking 18.24s
[INFO] stable-diffusion.cpp:2674 - step 3 sampling completed, taking 18.65s
[INFO] stable-diffusion.cpp:2674 - step 4 sampling completed, taking 18.41s
[INFO] stable-diffusion.cpp:2674 - step 5 sampling completed, taking 18.31s
[INFO] stable-diffusion.cpp:2674 - step 6 sampling completed, taking 18.18s
[INFO] stable-diffusion.cpp:2674 - step 7 sampling completed, taking 18.21s
[INFO] stable-diffusion.cpp:2674 - step 8 sampling completed, taking 18.29s
[INFO] stable-diffusion.cpp:2674 - step 9 sampling completed, taking 18.21s
[INFO] stable-diffusion.cpp:2674 - step 10 sampling completed, taking 18.28s
[INFO] stable-diffusion.cpp:2674 - step 11 sampling completed, taking 18.19s
[INFO] stable-diffusion.cpp:2674 - step 12 sampling completed, taking 18.00s
[INFO] stable-diffusion.cpp:2674 - step 13 sampling completed, taking 18.03s
[INFO] stable-diffusion.cpp:2674 - step 14 sampling completed, taking 18.54s
[INFO] stable-diffusion.cpp:2674 - step 15 sampling completed, taking 18.32s
[INFO] stable-diffusion.cpp:2674 - step 16 sampling completed, taking 18.41s
[INFO] stable-diffusion.cpp:2674 - step 17 sampling completed, taking 18.29s
[INFO] stable-diffusion.cpp:2674 - step 18 sampling completed, taking 18.51s
[INFO] stable-diffusion.cpp:2674 - step 19 sampling completed, taking 18.62s
[INFO] stable-diffusion.cpp:2674 - step 20 sampling completed, taking 18.11s
[INFO] stable-diffusion.cpp:2686 - diffusion graph use 623.74MB of memory: static 69.53MB, dynamic = 554.21MB
[INFO] stable-diffusion.cpp:2835 - sampling completed, taking 366.14s
[INFO] stable-diffusion.cpp:2766 - vae graph use 2177.12MB of memory: static 1153.12MB, dynamic = 1024.00MB
[INFO] stable-diffusion.cpp:2842 - decode_first_stage completed, taking 57.66s
[INFO] stable-diffusion.cpp:2843 - txt2img completed in 423.96s, with a runtime memory usage of 2177.12MB and parameter memory usage of 1618.58MB
save result image to 'output.png'
Pain point: the extra Python libs for conversion. I got a pip install error because I already had an incompatible version of something installed; convert.py worked anyway, though. :)
Timings: I used the q8_0 quantization and ran with different thread counts.
I have a 12-core (24-thread) CPU.
I took the timing of a single sampling step.
| quant | q8_0 | q4_0 | f16 |
|---|---|---|---|
| -t 1 | 75.31s | 75.20s | 82.92s |
| -t 2 | 42.44s | | |
| -t 4 | 28.65s | 29.23s | 30.00s |
| -t 6 | 21.68s | | |
| -t 8 | 18.34s | 18.89s | 19.05s |
| -t 10 | 16.38s | 16.78s | 17.61s |
| -t 12 | 16.26s | 16.98s | 18.11s |
| -t 14 | 17.93s | | |
| -t 16 | 16.80s | | |
| -t 18 | 16.70s | | |
| -t 20 | 16.20s | | |
| -t 22 | 16.96s | | |
| -t 24 | 18.93s | | |
Additional question: is prompt-weighting syntax like (cinematic:1.3) actually supported?
edit: added f16 timings
E.g. spacestation should probably be tokenized as space + station</w>, but right now it is just an unhandled token:
[DEBUG] stable-diffusion.cpp:1077 - parse 'spacestation' to [['spacestation', 1], ]
[DEBUG] stable-diffusion.cpp:469 - split prompt "spacestation" to tokens ["<|endoftext|>", ]
(resulting images have nothing to do with a spacestation)
By default OpenBLAS will utilize the maximum available threads.
You can set the thread count in OpenBLAS using
void goto_set_num_threads(int num_threads);
void openblas_set_num_threads(int num_threads);
https://github.com/xianyi/OpenBLAS#setting-the-number-of-threads-at-runtime
The default for --threads is std::thread::hardware_concurrency(), which returns the maximum thread count including hyper-threads. This is not the same as the number of CPU cores; using threads == physical cores usually gives the best performance. Here is how you can determine the number of CPU cores: https://github.com/ggerganov/llama.cpp/blob/d783f7982e0e823a2626a9956359c0d36c1a7e21/examples/common.cpp#L34-L68
Your GGML repo has an OpenCL bug:
leejet/ggml#1
Fix a size-type error: sizeof(ggml_fp16_t) is wrong; it should be sizeof(float).
PS D:\src\sd-cpp-src> $env:CLBLAST_HOME = "C:\vcpkg\installed\x64-windows\"
PS D:\src\sd-cpp-src> ls $env:CLBLAST_HOME\bin
Directory: C:\vcpkg\installed\x64-windows\bin
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 2023/8/31 16:47 3394560 clblast.dll
-a---- 2023/8/31 17:07 1707520 openblas.dll
-a---- 2023/8/31 10:43 54784 OpenCL.dll
PS D:\src\sd-cpp-src> ls $env:CLBLAST_HOME\include
Directory: C:\vcpkg\installed\x64-windows\include
Mode LastWriteTime Length Name
---- ------------- ------ ----
d----- 2023/8/31 10:43 CL
d----- 2023/8/31 17:07 openblas
-a---- 2021/1/20 4:19 43027 clblast.h
-a---- 2021/1/20 4:19 146525 clblast_c.h
-a---- 2021/1/20 4:19 35227 clblast_half.h
-a---- 2023/8/26 3:43 1238 openblas_common.h
PS D:\src\sd-cpp-src>> cmake -B build -DGGML_CLBLAST=ON
-- Building for: Visual Studio 17 2022
-- Selecting Windows SDK version 10.0.22621.0 to target Windows 10.0.19044.
-- The C compiler identification is MSVC 19.34.31933.0
-- The CXX compiler identification is MSVC 19.34.31933.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.34.31933/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.34.31933/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Deprecation Warning at ggml/CMakeLists.txt:1 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - not found
-- Found Threads: TRUE
-- CMAKE_SYSTEM_PROCESSOR: AMD64
-- x86 detected
-- clBLAST found
-- Configuring done (4.1s)
-- Generating done (0.0s)
-- Build files have been written to: D:/src/sd-cpp-src/build
PS D:\src\sd-cpp-src>> cmake --build build -j --config Release
MSBuild version 17.4.0+18d5aef85 for .NET Framework
1>Checking Build System
Building Custom Rule D:/src/sd-cpp-src/ggml/src/CMakeLists.txt
ggml.c
ggml-alloc.c
Generating code...
ggml-opencl.cpp
D:\src\sd-cpp-src\ggml\src\ggml-opencl.cpp(10,10): fatal error C1083: Cannot open include file: 'clblast.h': No such file or directory
[D:\src\sd-cpp-src\build\ggml\src\ggml.vcxproj]
I'm using the latest commit, but I got:
ggml_aligned_malloc: insufficient memory (attempted to allocate 12320886367328.45 MB)
GGML_ASSERT: /Users/raykkk/Desktop/llama.cpp/sd/sync_ggml/stable-diffusion.cpp/ggml/src/ggml.c:4767: ctx->mem_buffer != NULL
Edit:
Commit 09cab2a2ae5006718c334d1b0e285c9d655002cb works fine for me; the bug appears in fbd18e10593fc71f3825d151bd5d8b0a29f8f8bd.
I found ckpt versions of the Segmind distilled diffusion models ( https://github.com/segmind/distill-sd, https://huggingface.co/segmind ):
https://huggingface.co/ClashSAN/small-sd/resolve/main/smallSDdistilled.ckpt
https://huggingface.co/ClashSAN/small-sd/resolve/main/tinySDdistilled.ckpt
I ran the convert.py script from your repo to make a ggml f32 quant of tinySDdistilled.ckpt. Then I tried to load the generated ggml in stable-diffusion.cpp, but got this error:
[ERROR] stable-diffusion.cpp:2898 - tensor 'model.diffusion_model.output_blocks.1.0.in_layers.0.weight' has wrong shape in model file: got [1920, 1, 1, 1], expected [2560, 1, 1, 1]
Running the line from the readme, I get this:
step 1 sampling completed, taking 50.97s
Compiled with cmake on Windows. Shouldn't it be a little bit faster?
I tried to use ggml_flash_attn to accelerate the process, so I replaced the ggml_mul_mat calls in the cross-attention blocks of the UNet in stable-diffusion.cpp:
...
#if 1
struct ggml_tensor * kqv = ggml_flash_attn(ctx, q, k, v, true);
#else
struct ggml_tensor* kq = ggml_mul_mat(ctx, k, q); // [N * n_head, h * w, h * w]
// kq = ggml_diag_mask_inf_inplace(ctx, kq, 0);
kq = ggml_soft_max_inplace(ctx, kq);
struct ggml_tensor* kqv = ggml_mul_mat(ctx, v, kq); // [N * n_head, h * w, d_head]
#endif
...
But it leads to an error. It looks like max_position = 2, N = 64, and const int64_t P = nek1 - N; comes out less than 0 (flash attention apparently assumes the key sequence is at least as long as the query sequence, which need not hold in cross-attention). Can someone help me? Thanks!
I got this error while loading the models
[INFO] stable-diffusion.cpp:2500 - loading model from '/path/to/models/meichidarkMix_meichidarkV4-ggml-model-q4_0.bin'
[DEBUG] stable-diffusion.cpp:2508 - verifying magic
[DEBUG] stable-diffusion.cpp:2519 - loading hparams
[INFO] stable-diffusion.cpp:2525 - ftype: q4_0
[DEBUG] stable-diffusion.cpp:2531 - loading vocab
[DEBUG] stable-diffusion.cpp:2569 - ggml tensor size = 240 bytes
[INFO] stable-diffusion.cpp:2570 - params ctx size = 1431.33 MB
[DEBUG] stable-diffusion.cpp:2587 - preparing memory for the weights
[DEBUG] stable-diffusion.cpp:2602 - loading weights
[WARN] stable-diffusion.cpp:2650 - unknown tensor 'control_model.input_blocks.0.0.bias' in model file
[WARN] stable-diffusion.cpp:2650 - unknown tensor 'control_model.input_blocks.0.0.weight' in model file
[WARN] stable-diffusion.cpp:2650 - unknown tensor 'control_model.input_blocks.1.0.emb_layers.1.bias' in model file
[WARN] stable-diffusion.cpp:2650 - unknown tensor 'control_model.input_blocks.1.0.emb_layers.1.weight' in model file
terminate called after throwing an instance of 'std::length_error'
what(): basic_string::_M_create
Aborted (core dumped)
I'm using Intel i5-8250U laptop with Ubuntu, RAM 12GB
Am I doing something wrong while converting and quantizing the model, or...?
Thank you
Hi!
Is this project viable for Stable Diffusion XL?
Thank you
Hello,
I assume I'm not using this on a fully supported system, but I cannot get it to run on Android with Termux.
I generated a q4_1 version of the standard 1.5 model with the Python script and followed the basic guide to setting everything up.
~/stable-diffusion.cpp/build $ ./bin/sd -m ../models/v1-5-pruned-emaonly-ggml-model-q4_1.bin -p "A cityscape at sunset, oil painting" -v
Option:
n_threads: 4
mode: txt2img
model_path: ../models/v1-5-pruned-emaonly-ggml-model-q4_1.bin
output_path: output.png
init_img:
prompt: A cityscape at sunset, oil painting
negative_prompt:
cfg_scale: 7.00
width: 512
height: 512
sample_method: eular a
sample_steps: 20
strength: 0.75
seed: 42
System Info:
BLAS = 0
SSE3 = 0
AVX = 0
AVX2 = 0
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 0
NEON = 1
ARM_FMA = 1
F16C = 0
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[INFO] stable-diffusion.cpp:2687 - loading model from '../models/v1-5-pruned-emaonly-ggml-model-q4_1.bin'
[DEBUG] stable-diffusion.cpp:2695 - verifying magic
[DEBUG] stable-diffusion.cpp:2706 - loading hparams
[INFO] stable-diffusion.cpp:2712 - ftype: q4_1
[DEBUG] stable-diffusion.cpp:2718 - loading vocab
[DEBUG] stable-diffusion.cpp:2746 - ggml tensor size = 272 bytes
[DEBUG] stable-diffusion.cpp:2751 - clip params ctx size = 75.02 MB
[DEBUG] stable-diffusion.cpp:2770 - unet params ctx size = 1287.24 MB
[DEBUG] stable-diffusion.cpp:2791 - vae params ctx size = 95.51 MB
[DEBUG] stable-diffusion.cpp:2812 - preparing memory for the weights
[DEBUG] stable-diffusion.cpp:2828 - loading weights
[DEBUG] stable-diffusion.cpp:2932 - model size = 1454.34MB
[INFO] stable-diffusion.cpp:2941 - total params size = 1454.64MB (clip 73.80MB, unet 1286.34MB, vae 94.51MB)
[INFO] stable-diffusion.cpp:2943 - loading model from '../models/v1-5-pruned-emaonly-ggml-model-q4_1.bin' completed, taking 1.32s
terminating with uncaught exception of type std::__ndk1::regex_error: The parser did not consume the entire regular expression.
Aborted
I was wondering if video support could be added.
At first I came across lucidrains' video-diffusion-pytorch:
https://github.com/lucidrains/video-diffusion-pytorch
But after some research, it seems like zeroscope might be the right model to use:
https://huggingface.co/cerspense/zeroscope_v2_576w
(Cuda) PS D:\stable-diffusion.cpp> ./build/bin/Release/sd.exe -m "D:\stable-diffusion.cpp\models\v1-5-pruned-emaonly-ggml-model-f32.bin" -p "neko, catgirl, cute" -o "D:\stable-diffusion.cpp\outputs\output.png" -v -t 12
Option:
n_threads: 12
mode: txt2img
model_path: D:\stable-diffusion.cpp\models\v1-5-pruned-emaonly-ggml-model-f32.bin
output_path: D:\stable-diffusion.cpp\outputs\output.png
init_img:
prompt: neko, catgirl, cute
negative_prompt:
cfg_scale: 7.00
width: 512
height: 512
sample_method: eular a
sample_steps: 20
strength: 0.75
seed: 42
System Info:
BLAS = 0
SSE3 = 1
AVX = 1
AVX2 = 1
AVX512 = 0
AVX512_VBMI = 0
AVX512_VNNI = 0
FMA = 1
NEON = 0
ARM_FMA = 0
F16C = 1
FP16_VA = 0
WASM_SIMD = 0
VSX = 0
[INFO] stable-diffusion.cpp:2687 - loading model from 'D:\stable-diffusion.cpp\models\v1-5-pruned-emaonly-ggml-model-f32.bin'
[DEBUG] stable-diffusion.cpp:2695 - verifying magic
[DEBUG] stable-diffusion.cpp:2706 - loading hparams
[INFO] stable-diffusion.cpp:2712 - ftype: f32
[DEBUG] stable-diffusion.cpp:2718 - loading vocab
[DEBUG] stable-diffusion.cpp:2746 - ggml tensor size = 272 bytes
[DEBUG] stable-diffusion.cpp:2751 - clip params ctx size = 470.72 MB
[DEBUG] stable-diffusion.cpp:2770 - unet params ctx size = 2156.43 MB
[DEBUG] stable-diffusion.cpp:2791 - vae params ctx size = 95.51 MB
[DEBUG] stable-diffusion.cpp:2812 - preparing memory for the weights
[DEBUG] stable-diffusion.cpp:2828 - loading weights
[DEBUG] stable-diffusion.cpp:2932 - model size = 2719.24MB
[INFO] stable-diffusion.cpp:2941 - total params size = 2719.53MB (clip 469.50MB, unet 2155.53MB, vae 94.51MB)
[INFO] stable-diffusion.cpp:2943 - loading model from 'D:\stable-diffusion.cpp\models\v1-5-pruned-emaonly-ggml-model-f32.bin' completed, taking 43.51s
(Cuda) PS D:\stable-diffusion.cpp>
It stops and there is no output.
Hi,
I tried to build this project on a Xeon W-2135 system, with both gcc-11 and gcc-12. The CPU supports AVX512 (but nothing more advanced). After building, the binary indicated support for AVX and AVX2, but not for AVX512.
So I went ahead and changed the AVX512 flag in ggml/CMakeLists.txt from OFF to ON, deleted all the generated files and ran cmake again. The logs indicated that the AVX512 flag was indeed ON, but CFLAGS/CXXFLAGS still did not contain "-mavx512f". No AVX512 in the binary yet, either.
Eventually I just put set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mavx512f") somewhere into the CMakeLists.txt (same for CMAKE_CXX_FLAGS) and got a functional AVX512-enabled result. But I still consider it a bug that AVX512 is not detected by default, and not picked up when the flag is manually set, either.
This happened with both "CC=gcc-11" and "CC=gcc-12" when running cmake.
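The workaround above can be expressed as a conditional in CMakeLists.txt; a sketch, assuming the option is named GGML_AVX512 as in ggml's CMakeLists.txt (whether the upstream option propagates these flags correctly is exactly the bug being reported):

```cmake
# Manually append the AVX-512 foundation flag when the option is on;
# gcc/clang syntax, not applicable to MSVC (/arch:AVX512 there).
if (GGML_AVX512)
    set(CMAKE_C_FLAGS   "${CMAKE_C_FLAGS} -mavx512f")
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mavx512f")
endif()
```

Note that changing an option's default in CMakeLists.txt has no effect on an existing build directory, because the old value is kept in CMakeCache.txt; deleting the build directory (as done here) or passing -DGGML_AVX512=ON on the command line is required.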
Best Regards