
Comments (17)

rsxdalv commented on May 25, 2024

Hi, thank you for the report and the log. I'd like to ask: is there a reason you're running it inside the ttsgen environment? I remember that some users saw issues when launching the start script from inside an existing conda environment, since the script creates another, nested conda environment, though that isn't guaranteed to be the cause.

As for the other errors: could you please test whether the Gradio UI works at all? The error suggests it's malfunctioning, but it might still work and help with debugging.
Next, the torch compile problem is something new (I'm afraid somebody somewhere "updated and improved" some project).

from tts-generation-webui.

rsxdalv commented on May 25, 2024

Ah, and the problem is simple: the React UI tries to connect to port 7860, which doesn't have the endpoints it should be finding on 7861. Please test 7860 directly, and then we can make the second Gradio interface work alongside it.

The1Bill commented on May 25, 2024

I try to keep my dependencies contained in virtual environments, since I run several applications in this VM, each with its own dependencies. I get the same result if I run the one-click installer outside the virtual environment, though.

Is there any way to pick a port other than 7860? TextGen WebUI and Automatic1111 are already jockeying for ports 7860 and 7861. In the meantime I'll shut down TextGen WebUI and Automatic1111 and see if I can get TTS to run.

Thanks for the replies - I'll let you know what the results are.

rsxdalv commented on May 25, 2024

Yes. Part one: edit the settings in the Gradio UI or in config.json, then restart. Part two: change the React UI endpoint. It should be possible by setting this environment variable before launching the UI, though I'm not 100% sure it will be passed through:
GRADIO_BACKEND=http://127.0.0.1:4200/
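On Linux that would look something like the following (a sketch of the idea; whether start_linux.sh actually forwards the variable to the Node subprocess is the uncertain part):

```shell
# Point the React UI at the Gradio backend on a non-default port,
# then launch as usual. The URL is the example value from above.
export GRADIO_BACKEND=http://127.0.0.1:4200/
./start_linux.sh
```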

rsxdalv commented on May 25, 2024

And if it works in a nested conda environment, that's good news.

The1Bill commented on May 25, 2024

I'm running the start_linux.sh script outside of a Conda environment; I didn't know that the one-click installer created its own Conda environment when I set it up initially.

I'm up and running; I've been able to pick a different port for Gradio, and there's no conflict on 3000.

One thing I've noticed is that the MusicGen models never seem to be unloaded. I just loaded MusicGen-Medium, then MusicGen-Small, then MusicGen-Medium again, and my VRAM usage kept ramping up (image attached).
[screenshot: VRAM usage climbing as MusicGen models are swapped]

Lastly, how can I shut this down gracefully? Whenever I press Ctrl+C in the terminal or use the "Apply settings and shutdown UI (Manual Restart Required)" button in the settings tab of the UI, port 3000 is not released, so when I try to restart I get the error below. The only fix I've found is to restart the whole VM.

(base) gptj6b@huggingface:~/tts$ ./start_linux.sh
Loading extensions:
Loaded extension: callback_save_generation_musicgen_ffmpeg
Loaded extension: empty_extension
Loaded extension: callback_save_generation_ffmpeg
Loaded 2 callback_save_generation extensions.
Loaded 1 callback_save_generation_musicgen extensions.
Blocksparse is not available: the current GPU does not expose Tensor cores
Failed to load voice clone demo
module 'torch' has no attribute 'compiler'
/home/gptj6b/tts/installer_files/env/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
Starting Gradio server...
Gradio interface options:
inline: False
inbrowser: True
share: False
debug: False
max_threads: 40
auth: None
auth_message: None
prevent_thread_lock: False
show_error: False
server_name: 0.0.0.0
server_port: None
show_tips: False
height: 500
width: 100%
favicon_path: None
ssl_keyfile: None
ssl_certfile: None
ssl_keyfile_password: None
ssl_verify: True
quiet: True
show_api: True
file_directories: None
_frontend: True
Running on local URL: http://0.0.0.0:7860

[email protected] start
next start

  • error Failed to start server
    Error: listen EADDRINUSE: address already in use 0.0.0.0:3000
    at Server.setupListenHandle [as _listen2] (node:net:1740:16)
    at listenInCluster (node:net:1788:12)
    at doListen (node:net:1937:7)
    at process.processTicksAndRejections (node:internal/process/task_queues:83:21) {
    code: 'EADDRINUSE',
    errno: -98,
    syscall: 'listen',
    address: '0.0.0.0',
    port: 3000
    }
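(A standard Linux workaround for the stuck port, rather than rebooting the whole VM, is to kill whichever leftover process is still listening on 3000. This assumes lsof is installed; it's not specific to this project.)

```shell
# Terminate whatever process still holds TCP port 3000.
lsof -ti tcp:3000 | xargs -r kill
# Verify the port is free again; prints nothing if no listener remains.
lsof -i tcp:3000 || true
```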

The1Bill commented on May 25, 2024

Actually, I spoke too soon. I'm now getting the error below whenever I try to run MusicGen. I rebooted the VM and am still getting it.

Loading model facebook/musicgen-large
Traceback (most recent call last):
File "/home/gptj6b/tts/installer_files/env/lib/python3.10/site-packages/gradio/queueing.py", line 407, in call_prediction
output = await route_utils.call_process_api(
File "/home/gptj6b/tts/installer_files/env/lib/python3.10/site-packages/gradio/route_utils.py", line 226, in call_process_api
output = await app.get_blocks().process_api(
File "/home/gptj6b/tts/installer_files/env/lib/python3.10/site-packages/gradio/blocks.py", line 1550, in process_api
result = await self.call_function(
File "/home/gptj6b/tts/installer_files/env/lib/python3.10/site-packages/gradio/blocks.py", line 1185, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/gptj6b/tts/installer_files/env/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/gptj6b/tts/installer_files/env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/home/gptj6b/tts/installer_files/env/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/home/gptj6b/tts/installer_files/env/lib/python3.10/site-packages/gradio/utils.py", line 661, in wrapper
response = f(*args, **kwargs)
File "/home/gptj6b/tts/tts-generation-webui/src/musicgen/musicgen_tab.py", line 148, in generate
MODEL = load_model(model)
File "/home/gptj6b/tts/tts-generation-webui/src/musicgen/musicgen_tab.py", line 127, in load_model
return MusicGen.get_pretrained(version)
File "/home/gptj6b/tts/installer_files/env/lib/python3.10/site-packages/audiocraft/models/musicgen.py", line 91, in get_pretrained
return MusicGen(name, compression_model, lm)
File "/home/gptj6b/tts/installer_files/env/lib/python3.10/site-packages/audiocraft/models/musicgen.py", line 52, in __init__
super().__init__(name, compression_model, lm, max_duration)
File "/home/gptj6b/tts/installer_files/env/lib/python3.10/site-packages/audiocraft/models/genmodel.py", line 55, in __init__
self.compression_model = get_wrapped_compression_model(self.compression_model, self.cfg)
File "/home/gptj6b/tts/installer_files/env/lib/python3.10/site-packages/audiocraft/models/builders.py", line 254, in get_wrapped_compression_model
if cfg.interleave_stereo_codebooks.use:
AttributeError: 'NoneType' object has no attribute 'use'
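(The failure is a generic config problem: `cfg.interleave_stereo_codebooks` exists but is `None`, so attribute access on it raises. A defensive guard, shown here as a sketch of the pattern rather than audiocraft's actual fix, looks like:)

```python
from types import SimpleNamespace

def use_interleaved_codebooks(cfg) -> bool:
    """Return the flag only if the whole config branch actually exists."""
    branch = getattr(cfg, "interleave_stereo_codebooks", None)
    return branch is not None and bool(getattr(branch, "use", False))

# Config from an older checkpoint: the branch is present but None.
old_cfg = SimpleNamespace(interleave_stereo_codebooks=None)
# Config from a newer checkpoint: the branch carries a real flag.
new_cfg = SimpleNamespace(interleave_stereo_codebooks=SimpleNamespace(use=True))

print(use_interleaved_codebooks(old_cfg))  # False (no AttributeError)
print(use_interleaved_codebooks(new_cfg))  # True
```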

rsxdalv commented on May 25, 2024

It seems like it doesn't clear the memory correctly on big GPUs; on small-VRAM cards it does. Once I add the garbage-collector code, I'd like you to test again.

Also, for me, I can just close it with multiple Ctrl+C interrupts; Node.js is spawned as a subprocess. I could add a shutdown button, although the button you found is meant to be a full shutdown.

MusicGen seems to have updated that; I'll try to fix it, but without my workstation I'm not sure I'll be able to test properly.
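For reference, the unload/garbage-collection step being discussed usually looks something like this in PyTorch projects (my assumption of the shape, not this project's actual code; `MODEL` stands in for the cached MusicGen instance):

```python
import gc

try:
    import torch
except ImportError:  # torch absent: skip the CUDA cache step
    torch = None

MODEL = object()  # placeholder for the cached MusicGen model

def unload_model():
    """Drop the cached model and return cached CUDA blocks to the driver."""
    global MODEL
    MODEL = None              # remove the last Python reference
    gc.collect()              # collect reference cycles that pin tensors
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()  # release cached GPU memory

unload_model()
print(MODEL is None)  # True
```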

The1Bill commented on May 25, 2024

I ran an update and was able to use MusicGen again after restarting the WebUI. I'm noticing something else that's a bit hinky: VRAM usage increases as the inference stops.

I'm going to try running inference with this model from a barebones Python script. I've spent most of my time with LLMs, so I don't know a lot about how MusicGen works under the hood; I'm curious whether this is just how the model behaves.

I'm happy to help with bugfinding/bugfixing/testing/whatever else I can do. My hardware may be vintage, but at least it was at the top of its game when it was new. ;)

rsxdalv commented on May 25, 2024

The1Bill commented on May 25, 2024

Is there any way to get it to use both GPUs? I took a quick look at the FB Research Audiocraft repo and didn't find anyone who had managed a multi-GPU setup (aside from someone saying it might work with SLURM, which wouldn't really help me), but I was curious whether you'd found anything to the contrary.

rsxdalv commented on May 25, 2024

The1Bill commented on May 25, 2024

It's more that I'm trying to find a way to make longer clips without going OOM. I'm currently experimenting with running the model directly so I can understand its VRAM usage a bit better (things like how usage scales with clip length) and close my skill gap.
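One way to quantify that while experimenting (a generic torch sketch of my own, not project code) is to wrap a generation call and read the allocator's peak statistic:

```python
try:
    import torch
except ImportError:  # lets the sketch run even without torch installed
    torch = None

def peak_vram_mib(fn, *args, **kwargs):
    """Run fn and report peak CUDA memory allocated during the call, in MiB."""
    if torch is None or not torch.cuda.is_available():
        fn(*args, **kwargs)
        return None  # nothing to measure without a CUDA device
    torch.cuda.reset_peak_memory_stats()
    fn(*args, **kwargs)
    torch.cuda.synchronize()
    return torch.cuda.max_memory_allocated() / 2**20

# Usage idea: calling peak_vram_mib(model.generate, [prompt]) for several
# clip durations shows how VRAM scales with clip length.
print(peak_vram_mib(lambda: None))
```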

rsxdalv commented on May 25, 2024

The1Bill commented on May 25, 2024

Running the models "raw", I find that the VRAM usage is about the same. I didn't realise how catastrophic an impact MultiBand Diffusion would have on VRAM usage.

Models definitely aren't unloading from VRAM after inference (or when beginning inference with another model), though I do see the benefit of leaving a model in memory so another prompt can be run straight away without reloading every time.

Thanks for the explanation of why these models have a deceptively large footprint. I'm going to abandon MBD for the time being, as it doesn't play nicely with even the medium-sized models.

As I suspected, the issue was on my end; I didn't have much knowledge of how these audio models worked. Now that I do, the only thing I'd change would be unload buttons for all of the models (basically what enhancement request 162 covers).

rsxdalv commented on May 25, 2024

Thanks for the deep feedback. Yes, haha, MBD is like a gold-plated HDMI cable: I'm sure it does something, but why would you do that? Actually, if you want to ask for something: I think MBD can be run after the generation. So (assuming development time didn't exist) you could generate with the regular method, then choose to convert that generation with MBD. Also, if this were an app, you could offload MBD to another GPU (again, assuming development time didn't exist).

Let's keep this issue open a little longer; I'd like to convert this conversation into something for the UI, since, as you've experienced, this isn't what you'd expect. Even adding an "(MBD is heavy)" label could be a big improvement.

rsxdalv commented on May 25, 2024

Ok, I changed it to just:
"Use Multi-Band Diffusion (High VRAM Usage)"
