Comments (15)
Yeah, while it's getting a bit complex with differences at that level, I put together an ugly workaround for running the auto profile with "self recovery" when it runs out of memory: simply tail the logs and run the restart command with awk.
After starting up the auto profile normally, run this in another terminal (in WSL):
docker compose --profile auto logs --follow --tail 5 | awk '/illegal memory access was encountered/ {system("docker compose --profile auto restart")}'
After a crash, it should kick docker compose back up again. The original terminal will exit, but the logs can be watched there again with docker compose --profile auto logs --follow. It's a bit of a glitchy experience, but it seems to work for now.
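The awk one-liner above can also be written as a small shell function, which makes the pattern and the restart command easier to swap out. This is only a minimal sketch: the function name watch_and_restart is made up for illustration, and it assumes the error text appears on a single log line, as in the awk version.

```shell
# watch_and_restart PATTERN CMD [ARGS...]
# Hypothetical helper: reads log lines from stdin and runs CMD once
# for every line that contains PATTERN as a literal substring.
watch_and_restart() {
  pattern=$1
  shift
  while IFS= read -r line; do
    case $line in
      # </dev/null keeps CMD from swallowing the rest of the piped log
      *"$pattern"*) "$@" </dev/null ;;
    esac
  done
}

# Usage, assuming the auto profile is already running in another terminal:
# docker compose --profile auto logs --follow --tail 5 |
#   watch_and_restart "illegal memory access was encountered" \
#     docker compose --profile auto restart
```

Like the awk version, this restarts once per matching line, so an error that is logged several times in a row would trigger several restarts.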
from stable-diffusion-webui-docker.
OK, will test that too. I liked the auto profile's live preview a lot.
Thanks for the workaround. I'll keep an eye on the upstream repositories.
from stable-diffusion-webui-docker.
I eyed these parts a bit, but I will keep my hands off them for now, as it works well enough for me and I've got to get back to working on other stuff.
On a side note, the default auto optimizations with a 3060 12GB allow a larger resolution, but the render speed drops quite a bit, from 4-5 it/s to around 1-2 it/s. GPU CUDA activity in Task Manager goes up and down like a sawtooth wave during the render, while it stays at a near-constant 100% with the optimization flags removed from docker-compose.yml. I'm guessing that's the expected behavior with the optimizations. I'm pretty pleased running without them and using the upscaling methods to get to around 1280x1280.
from stable-diffusion-webui-docker.
@jasalt Could you please define what it means to "recover"? Do you mean that the container restarts, or that the app continues to function normally?
from stable-diffusion-webui-docker.
By "recover" I mean that the app continues to function normally.
from stable-diffusion-webui-docker.
@jasalt Hmm, I don't think I can do anything on the container side. If the container stops on an error, you can add restart: on-failure to the docker-compose.yml to restart it. However, it seems that Gradio catches the error but does not recover from it.
In any case, if you just want to restart, you can try docker compose --profile auto restart, which has the same effect as stopping and restarting, with less typing.
from stable-diffusion-webui-docker.
OK, thanks. Will experiment with it. The auto profile has been very stable as long as the resolution doesn't go over 640x640 with 12GB VRAM. It's hard to share access to it with others, however, until there's some solution.
from stable-diffusion-webui-docker.
@jasalt You might want to check your config: I can generate a 704 x 704 image on 6GB VRAM. You should be able to go up to 1024 with 12GB?
from stable-diffusion-webui-docker.
I was getting 512x512 with all optimizations off, that is (12GB VRAM). I tested with the defaults of the hlky profile now and got up to 832 x 960. I didn't inspect the difference in render quality much, but at least it's not clearly visible.
Adding restart: on-failure to the profile didn't restart it after running out of VRAM, but restart: always does. This resets the Gradio public share URL, but that shouldn't be a problem with a proper reverse proxy setup. Example config change for the hlky profile, which restarts after an error:
  hlky:
    <<: *base_service
    profiles: ["hlky"]
    restart: always
    build: ./services/hlky/
    environment:
      - CLI_ARGS=--optimized-turbo
from stable-diffusion-webui-docker.
@jasalt The auto profile has more optimizations, so maybe you can get larger images with it.
For the restart, you would probably need to create an issue in the respective UI repository to handle the errors gracefully; then restart: would not be necessary anymore.
from stable-diffusion-webui-docker.
It seems that restart: always only works with the hlky profile, which exits docker compose with code 0 when running out of VRAM:
webui-docker-hlky-1 | !!Runtime error (txt2img)!!
webui-docker-hlky-1 | CUDA error: an illegal memory access was encountered
webui-docker-hlky-1 | CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
webui-docker-hlky-1 | For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
webui-docker-hlky-1 | exiting...calling os._exit(0)
webui-docker-hlky-1 exited with code 0
After this it restarts. The auto profile handles the out-of-VRAM error differently: it does not exit and restart but hangs in the Docker Compose prompt, like so:
...
webui-docker-automatic1111-1 | File "/stable-diffusion-webui/modules/sd_samplers.py", line 43, in sample_to_image
webui-docker-automatic1111-1 | x_sample = 255. * np.moveaxis(x_sample.cpu().numpy(), 0, 2)
webui-docker-automatic1111-1 | RuntimeError: CUDA error: an illegal memory access was encountered
webui-docker-automatic1111-1 | CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
webui-docker-automatic1111-1 | For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
webui-docker-automatic1111-1 |
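The different results for restart: on-failure and restart: always are consistent with the exit code shown in the hlky log: Docker's on-failure policy only reacts to a non-zero exit code, while hlky calls os._exit(0). A simplified model of that decision can be sketched as follows. The function name would_restart is made up for illustration, and the sketch ignores manual stops and retry limits.

```shell
# would_restart POLICY EXIT_CODE
# Hypothetical, simplified model of Docker restart policies:
# prints "yes" if a container that exited with EXIT_CODE would be
# restarted under POLICY, otherwise "no".
would_restart() {
  case $1 in
    always|unless-stopped) echo yes ;;
    on-failure)
      # on-failure only restarts after a non-zero exit code
      if [ "$2" -ne 0 ]; then echo yes; else echo no; fi ;;
    *) echo no ;;
  esac
}

would_restart on-failure 0   # hlky exits with code 0 -> prints "no"
would_restart always 0       # -> prints "yes"
```

Neither policy helps with the auto profile, because its container keeps running after the error and never exits at all.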
from stable-diffusion-webui-docker.
Yeah, the hlky fork handles the errors explicitly and exits gracefully here. The auto fork just leaves the error to Gradio, which probably does nothing and leaves the app in an invalid state.
from stable-diffusion-webui-docker.
Wizard! This still has the cost of reloading the entire app and models from scratch, which takes roughly 20 seconds on my machine, but I think it is still better than nothing.
One thing you could probably do, if you really want to hack around it, is have some code that runs as part of the build and adds a try/catch to the code responsible for all GPU calls.
I already have something similar for adding a link to this repo; maybe you can try that as well.
Or just open an MR against the main repo with your solution.
from stable-diffusion-webui-docker.
This issue is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 7 days.
from stable-diffusion-webui-docker.
This issue was closed because it has been stalled for 7 days with no activity.
from stable-diffusion-webui-docker.