Comments (10)
Yeah that was it: LatentConsistencyDiffuser.cs:198.
from onnxstack.
Hello everyone! If you don't mind, I'll give you some tips on model conversion based on this doc.
Long story short, running the fusion optimizer on the model combines many ops into one, taking it from 3k+ ops down to around 1k. That reduces VRAM/RAM usage (fewer GPU buffers allocated for each node input/output) and improves performance, since CUDA and DML have fused attention kernels.
I've been using this script, which already has optimized settings for DML, but with some changes: https://github.com/Amblyopius/Stable-Diffusion-ONNX-FP16/blob/main/conv_sd_to_onnx.py
The last 4 lines (disabling BiasAdd, BiasSplitGelu, and packed KV/QKV) are required if you want the model to work on CPU: the Bias* kernels are not implemented for CPU in ONNX Runtime, and packed KV/QKV for MultiHeadAttention is not supported on CPU either.
With these optimizations and fp16 you should be able to run the UNet with less than 5 GB of VRAM. You can check the results with this model I've converted for WebGPU: https://huggingface.co/aislamov/stable-diffusion-2-1-base-onnx/tree/main
But if you want maximum performance, you can create two revisions of the model on Hugging Face: one with maximum GPU optimizations and another for CPU.
Feel free to ask me any questions!
That should be easy enough to support; let me see if I can squeeze it into tomorrow's release.
The latest commit fixes the immediate issue for both pipelines. I added the functionality to both diffuser base classes, but I think the implementation should be moved to a shared place, as I assume new pipelines will need this too.
Perhaps we need a static helper class for methods like these, as DecodeLatents is the same across both as well.
Sorry, I missed your PR and already committed a fix: 38f60b6
GetInputMetadata is accessible and worked perfectly; our implementations were pretty much the same.
Thanks for the PR
Hi! Thank you so much for sharing this. Sadly I have no idea how to code, so I can't do it myself; could you please make some fp16 models for CPU too? Lyriel v16, Deliberate v2 or v3, and epiCRealism are a few good ones; any of them would be great. I would like to use and test them out in OnnxStack if possible, thanks.
https://huggingface.co/nyxia/lyriel16/tree/main
or
https://civitai.com/models/22922/lyriel
https://civitai.com/models/25694/epicrealism
https://huggingface.co/stablediffusionapi/deliberate-v3/tree/main
Also, I assume this LCM model is for GPU only? Could you please make a CPU-optimized one too? Either way, I will test this one on CPU tomorrow to see how it goes!
The LCM fp16 model now works very well, and it is so fast! But I have no idea what is going on: I used DirectML and set the device to 0 for the UNet and the rest to 1, so I think it is using my AMD and Intel GPUs (in Task Manager my Intel graphics goes to 99% usage, so it is mostly that GPU), not the CPU this time.
I'll close this topic if that's OK.
Looks like it doesn't like the UNet's timestep input. The fp16 model's timestep is a float; the original's is a long.
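A small illustration of that mismatch (the helper name is hypothetical, not OnnxStack's API): fp16 exports declare `timestep` as a float tensor while the original declares int64, so reading the element type from the model's input metadata and casting before binding keeps one code path for both:

```python
# Hypothetical illustration of the timestep dtype mismatch. The type strings
# are the "tensor(...)" element types that onnxruntime reports for an input,
# mapped to numpy dtypes so the tensor can be built with the expected type.
import numpy as np

ONNX_TO_NUMPY = {
    "tensor(float16)": np.float16,
    "tensor(float)": np.float32,
    "tensor(int64)": np.int64,
}


def make_timestep(step: int, input_type: str) -> np.ndarray:
    """Build the 1-element `timestep` tensor with the dtype the UNet expects."""
    return np.array([step], dtype=ONNX_TO_NUMPY[input_type])


# fp16 UNet wants a float timestep, the original wants a long:
print(make_timestep(981, "tensor(float16)").dtype)  # float16
print(make_timestep(981, "tensor(int64)").dtype)    # int64
```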
@saddam213 I've been trying to get a PR going, but I don't have access to the IOnnxModel in DiffuseAsync for _onnxModelService.GetInputMetadata. Is it available and I'm just not seeing it, or will I have to edit OnnxModelService?
Uh nice, thanks guys! Can't wait for the update to test it out.
Related Issues (20)
- CUDA "invalid argument" Error When Using OnnxStack.Stable Diffusion on GPU HOT 3
- Safetensors/ckpt conversion or loading HOT 6
- Feature Suggestion: Support for Custom Scheduler Implementation HOT 3
- dotnet build fails on Linux citing missing build target Microsoft.NET.Sdk.WindowsDesktop.targets HOT 1
- WebUI, generated image saved with incorrect path. HOT 9
- WebUI does not support multi-models or pipelines HOT 17
- Suggestions, new ideas and general talk HOT 80
- Support for Float16 Stable Diffusion Onnx models HOT 21
- Pipeline: ControlNet support for Stable Diffusion, LCM and InstaFlow HOT 2
- nuget HOT 2
- GPU resting HOT 1
- Pipeline: Support InstaFlow single step inference HOT 3
- [Bug] CUDA provider disappeared in 0.9.0 HOT 3
- Long input filenames distort the input view HOT 1
- LCM Lora Models run error because of unet had no 4 input HOT 9
- sdxl-turbo can't run HOT 4
- how to use with onnx models that doesnt have tokenizer model onnx file HOT 4
- Run fail on Unit Test (StableDiffusionTests.cs)
- fails when loading model HOT 5