rsxdalv / tts-generation-webui Goto Github PK

TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS)

Home Page: https://rsxdalv.github.io/tts-generation-webui/

License: MIT License

Python 45.62% Jupyter Notebook 0.42% CSS 0.15% Dockerfile 0.28% JavaScript 2.54% TypeScript 46.95% Batchfile 1.73% Shell 1.91% HTML 0.13% PowerShell 0.26%

gradio machine-learning text-to-speech tts web ai audio-generation deep-learning torch bark

tts-generation-webui's Introduction

TTS Generation WebUI

Download || Upgrading || Manual installation || Docker Setup || Configuration Guide || Discord Server || || Feedback / Bug reports

List of models: Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNeT, Stable Audio, Maha TTS, MMS, and more.

Note: Not all models support all platforms. For example, MusicGen and AudioGen are not supported on MacOS as of yet.

Videos

Bark TTS, Seamless Translation, RVC, Music Generation and More	TTS Generation WebUI - A Tool for Text to Speech and Voice Cloning	Text to speech and voice cloning - TTS Generation WebUI

Changelog

Aug 5:

Fix Bark in React UI, add Max Generation Duration.
Change AudioCraft Plus extension models directory to ./data/models/audiocraft_plus/
Improve model unloading for MusicGen and AudioGen. Add unload models button to MusicGen and AudioGen.
Add Huggingface Cache Manager extension.

Aug 4:

Add XTTS-RVC-UI extension, XTTS Fine-tuning demo extension.

Aug 3:

Add Riffusion extension, AudioCraft Mac extension, Bark Legacy extension.

Aug 2:

Add deprecation warning to old installer.
Unify error handling and simplify tab loading.

Aug 1:

Add "Attempt Update" button for external extensions.
Skip reinstalling packages when pip_packages version is not changed.
Synchronize Gradio Port with React UI.
Change default Gradio port to 7770 from 7860.

July 2024

Click to expand

July 31:

Fix React UI's MusicGen after the Gradio changes.
Add unload button to Whisper extension.

July 29:

Change FFMpeg to 4.4.2 from conda-forge in order to support more platforms, including Mac M1.
Disable tortoise CVVP.

July 26:

Whisper extension
Experimental AMD ROCM install support. (Linux only)

July 25:

Add diagnostic scripts for MacOS and Linux.
Add better error details for tabs.
Fix .sh script execution permissions for the installers on Linux and MacOS.

July 21:

Add Gallery History extension (adapted from the old gallery view)
Convert Simple Remixer to extension
Fix update.py to use the newer torch versions (update.py is only for legacy purposes and will likely break)
Add Diagnostic script and Force Reinstall scripts for Windows.

July 20:

Fix Discord join link
Simplify Bark further, removing excessive complexity in code.
Add UI/Modular extensions, these extensions allow installing new models and features to the UI. In the future, models will start as extensions before being added permamently.
Disable Gallery view in outputs
Known issue: Firefox fails at showing outputs in Gradio, it fails at fetching them from backend. Within React UI this works fine.

July 15:

Comment - As the React UI has been out for a long time now, Gradio UI is going to have the role of serving only the functions to the user, without the extremely complicated UI that it cannot handle. There is a real shortage of development time to add new models and features, but the old style of integration was not viable. As the new APIs and 'the role of the model' is defined, it will be possible to have extensions for entire models, enabling a lot more flexibility and lighter installations.
Start scaling back Gradio UI complexity - removed send to RVC/Demucs/Voice buttons. (Remove internal component Joutai).
Add version.json for better updates in the future.
Reduce Gradio Bark maximum number of outputs to 1.
Add unload model button to Tortoise, also unload the model before loading the next one/changing parameters, thus tortoise no longer uses 2x model memory during the settings change.

July 14:

Regroup Gradio tabs into groups - Text to Speech, Audio Conversion, Music Generation, Outputs and Settings
Clean up the header, add link for feedback
Add seed control to Stable Audio
Fix Stable Audio filename bug with newlines
Disable "Simple Remixer" Gradio tab
Fix bark voice clone & RVC once more
Add "Installed Packages" tab for debugging

July 13:

Major upgrade to Torch 2.3.1 and xformers 0.0.27
- All users, including Mac and CPU will now have the same PyTorch version.
Upgrade CUDA to 11.8
Force python to be 3.10.11
Modify installer to allow upgrading Python and Torch without reinstalling (currently major version 2)
Fix magnet default params for better quality
Improve installer script checks to avoid bugs
Update StyleTTS2

July 11:

Improve Stable Audio generation filenames
Add force reinstall to torch repair
Make the installer auto-update before running

July 9:

Fix new installer and installation instructions thanks to https://github.com/Xeraster !

July 8:

Change the installation process to reduce package clashes and enable torch version flexibility.

July 6:

Initial release of new mamba based installer.
Save Stable Audio results to outputs-rvc/StableAudio folder.
Add a disclaimer to Stable Audio model selection and show better error messages when files are missing.

July 1:

Optimize Stable Audio memory usage after generation.
Open React UI automatically only if gradio also opens automatically.
Remove unnecessary conda git reinstall.
Update to lastest Stable Audio which has mps support (requires newer torch versions).

June 2024

Click to expand

June 22: * Add Stable Audio to Gradio.

June 21:

Add Vall-E-X demo to React UI.
Open React UI automatically in browser, fix the link again.
Add Split By Length to React/Tortoise.
Fix UVR5 demo folders.
Set fairseq version to 0.12.2 for Linux and Mac. (#323)
Improve generation history for all React UI tabs.

May 17:

Fix Tortoise presets in React UI.

May 9:

Add MMS to React UI.
Improve React UI and codebase.

May 4:

Group Changelog by month

April 2024

Click to expand

Apr 28: * Add Maha TTS to React UI. * Add GPU Info to React UI.

Apr 6:

Add Vall-E-X generation demo tab.
Add MMS demo tab.
Add Maha TTS demo tab.
Add StyleTTS2 demo tab.

Apr 5:

Fix RVC installation bug.
Add basic UVR5 demo tab.

Apr 4:

Upgrade RVC to include RVMPE and FCPE. Remove the direct file input for models and indexes due to file duplication. Improve React UI interface for RVC.

March 2024

Click to expand

Mar 28:

Add GPU Info tab

Mar 27:

Add information about voice cloning to tab voice clone

Mar 26:

Add Maha TTS demo notebook

Mar 22:

Vall-E X demo via notebook (#292)
Add React UI to Docker image
Add install disclaimer

Mar 16:

Upgrade vocos to 0.1.0

Mar 14:

StyleTTS2 Demo Notebook

Mar 13:

Add Experimental Pipeline (Bark / Tortoise / MusicGen / AudioGen / MAGNeT -> RVC / Demucs / Vocos) (#287)
Fix RVC bug with model reloading on each generation. For short inputs that results in a visible speedup.

Mar 11:

Add Play as Audio and Save to Voices to bark (#286)
Change UX to show that files are deleted from favorites
Fix images for bark voices not showing
Fix audio playback in favorites

Mar 10:

Add Batching to React UI Magnet (#283)
Add audio to audio translation to SeamlessM4T (#284)

Mar 5:

Add Batching to React UI MusicGen (#281), thanks to https://github.com/Aamir3d for requesting this and providing feedback

Mar 3:

Add MMS demo as a notebook
Add MultiBandDiffusion high VRAM disclaimer

February 2024

Click to expand

Feb 21:

Fix Docker container builds and bug with Docker-Audiocraft

Feb 8:

Fix MultiBandDiffusion for MusicGen's stereo models, thank you https://github.com/mykeehu
Fix Node.js installation steps on Google Colab, code by https://github.com/miaohf

Feb 6:

Add FLAC file generation extension by https://github.com/JoaCHIP

January 2024

Click to expand

Jan 21:

Add CPU/M1 torch auto-repair script with each update. To disable, edit check_cuda.py and change FORCE_NO_REPAIR = True

Jan 16:

Upgrade MusicGen, adding support for stereo and large melody models
Add MAGNeT

Jan 15:

Upgraded Gradio to 3.48.0
- Several visual bugs have appeared, if they are critical, please report them or downgrade gradio.
- Gradio: Suppress useless warnings
Supress Triton warnings
Gradio-Bark: Fix "Use last generation as history" behavior, empty selection no longer errors
Improve extensions loader display
Upgrade transformers to 4.36.1 from 4.31.0
Add SeamlessM4T Demo

Jan 14:

React UI: Fix missing directory errors

Jan 13:

React UI: Fix missing npm build step from automatic install

Jan 12:

React UI: Fix names for audio actions
Gradio: Fix multiple API warnings
Integration - React UI now is launched alongside Gradio, with a link to open it

Jan 11:

React UI: Make the build work without any errors

Jan 9:

React UI
- Fix 404 handler for Wavesurfer
- Group Bark tabs together

Jan 8:

Release React UI

2023

Click to expand

October 2023

Oct 26:

Improve model selection UX for Musicgen

Oct 24:

Add initial React UI for Musicgen and Demucs (#202)
Fix Bark long generation seed drifting (thanks to https://github.com/520Pig520)

September 2023

Sep 21:

Bark: Add continue as semantic history button
Switch to github docker image storage, new docker image:
- docker pull ghcr.io/rsxdalv/tts-generation-webui:main
Fix server_port option in config #168 , thanks to https://github.com/Dartvauder

Sep 9:

Fix xdg-open command line, thanks to https://github.com/JFronny
Fix multi-line bark generations, thanks to https://github.com/slack-t and https://github.com/bkutasi
Add unload model button to Bark as requested by https://github.com/Aamir3d
Add Bark details to README_Bark.md as requested by https://github.com/Maki9009
Add "optional" to burn in prompt, thanks to https://github.com/Maki9009

Sep 5:

Add voice mixing to Bark
Add v1 Burn in prompt to Bark (Burn in prompts are for directing the semantic model without spending time on generating the audio. The v1 works by generating the semantic tokens and then using it as a prompt for the semantic model.)
Add generation length limiter to Bark

August 2023

Aug 27:

Fix MusicGen ignoring the melody #153

Aug 26:

Add Send to RVC, Demucs, Vocos buttons to Bark and Vocos

Aug 24:

Add date to RVC outputs to fix #147
Fix safetensors missing wheel
Add Send to demucs button to musicgen

Aug 21:

Add torchvision install to colab for musicgen issue fix
Remove rvc_tab file logging

Aug 20:

Fix MBD by reinstalling hydra-core at the end of an update

Aug 18:

CI: Add a GitHub Action to automatically publish docker image.

Aug 16:

Add "name" to tortoise generation parameters

Aug 15:

Pin torch to 2.0.0 in all requirements.txt files
Bump audiocraft and bark versions
Remove Tortoise transformers fix from colab
Update Tortoise to 2.8.0

Aug 13:

Potentially big fix for new user installs that had issues with GPU not being supported

Aug 11:

Tortoise hotfix thanks to manmay-nakhashi
Add Tortoise option to change tokenizer

Aug 8:

Update AudioCraft, improving MultiBandDiffusion performance
Fix Tortoise parameter 'cond_free' mismatch with 'ultra_fast' preset

Aug 7:

add tortoise deepspeed fix to colab

Aug 6:

Fix audiogen + mbd error, add tortoise fix for colab

Aug 4:

Add MultiBandDiffusion option to MusicGen #109
MusicGen/AudioGen save tokens on generation as .npz files.

Aug 3:

Add AudioGen #105

Aug 2:

Fix Model locations not showing after restart

July 2023

July 26:

Voice gallery
Voice cropping
Fix voice rename bug, rename picture as well, add a hash textbox
Easier downloading of voices (#98)

July 24:

Change bark file format to include history hash: ...continued_generation... -> ...from_3ea0d063...

July 23:

Docker Image thanks to https://github.com/jonfairbanks
RVC UI naming improvements

July 21:

Fix hubert not working with CPU only (#87)
Add Google Colab demo (#88)
New settings tab and model locations (for advanced users) (#90)

July 19:

Add Tortoise Optimizations, Thank you https://github.com/manmay-nakhashi #79 (Implements #18)

July 16:

Voice Photo Demo
Add a directory to store RVC models/indexes in and a dropdown
Workaround rvc not respecting is_half for CPU #74
Tortoise model and voice selection improvements #73

July 10:

Demucs Demo #67

July 9:

RVC Demo + Tortoise, v6 installer with update script and automatic attempts to install extra modules #66

July 5:

Improved v5 installer - faster and more reliable #63

July 2:

Upgrade bark settings #59

July 1:

Studio-tab #58

June 2023

Jun 29:

Tortoise new params #54

Jun 27:

Fix eager loading errors, refactor #50

Jun 20

Tortoise: proper long form generation files #46

Jun 19

Tortoise-upgrade #45

June 18:

Update to newest audiocraft, add longer generations

Jun 14:

add vocos wav tab #42

June 5:

Fix "Save to Favorites" button on bark generation page, clean up console (v4.1.1)
Add "Collections" tab for managing several different data sets and easier curration.

June 4:

Update to v4.1 - improved hash function, code improvements

June 3:

Update to v4 - new output structure, improved history view, codebase reorganization, improved metadata, output extensions support

May 2023

May 21:

Update to v3 - voice clone demo

May 17:

Update to v2 - generate results as they appear, preview long prompt generations piece by piece, enable up to 9 outputs, UI tweaks

May 16:

Add gradio settings tab, fix gradio errors in console, improve logging.
Update History and Favorites with "use as voice" and "save voice" buttons
Add voices tab
Bark tab: Remove "or Use last generation as history"
Improve code organization

May 13:

Enable deterministic generation and enhance generated logs. Credits to suno-ai/bark#175.

May 10:

Enable the possibility of reusing history prompts from older generations. Save generations as npz files. Add a convenient method of reusing any of the last 3 generations for the next prompts. Add a button for saving and collecting history prompts under /voices. #10

May 4:

Long form generation (credits to https://github.com/suno-ai/bark/blob/main/notebooks/long_form_generation.ipynb and suno-ai/bark#161)
Adapt to fixed env var bug

May 3:

Improved Tortoise UI: Voice, Preset and CVVP settings as well as ability to generate 3 results (#6)

May 2:

Added support for history recylcing to continue longer prompts manually
Added support for v2 prompts

Before:

Added support for Tortoise TTS

Upgrading

In case of issues, feel free to contact the developers.

Upgrading from v6 to new installer

Recommended: Fresh install

Download the new version and run the start_tts_webui.bat (Windows) or start_tts_webui.sh (MacOS, Linux)
Once it is finished, close the server.
Recommended: Copy the old generations to the new directory, such as favorites/ outputs/ outputs-rvc/ models/ collections/ config.json
With caution: you can copy the whole new tts-generation-webui directory over the old one, but there might be some old files that are lost.

In-place upgrade, can delete some files, tweaks

Update the existing installation using the update_platform script
After the update run the new start_tts_webui.bat (Windows) or start_tts_webui.sh (MacOS, Linux) inside of the tts-generation-webui directory
Once the server starts, check if it works.
With caution: if the new server works, within the one-click-installers directory, delete the old installer_files.

Upgrading from v5 to v6 installer

Download and run the new installer
Replace the "tts-generation-webui" directory in the newly installed directory
Run update_platform

Is there any more optimal way to do this?

Not exactly, the dependencies clash, especially between conda and python (and dependencies are already in a critical state, moving them to conda is ways off). Therefore, while it might be possible to just replace the old installer with the new one and running the update, the problems are unpredictable and unfixable. Making an update to installer requires a lot of testing so it's not done lightly.

New Installer

Download the repository as a zip file and extract it.
Run start_tts_webui.bat or start_tts_webui.sh to start the server. The server will be available at http://localhost:7860
Output log will be available in the installer_scripts/output.log file.

Manual installation (not recommended)

These instructions might not reflect all of the latest fixes and adjustments, but could be useful as a reference for debugging or understanding what the installer does. Hopefully they can be a basis for supporting new platforms, such as AMD/Intel.
Install conda (https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html)
Set up an environment: conda create -n venv python=3.10
Install git, node.js conda install -y -c conda-forge git nodejs conda
a) Either Continue with the installer script
- activate the environment: conda activate venv and
- (venv) node installer_scripts\init_app.js
- then run the server with (venv) python server.py
b) Or install the requirements manually
- Set up pytorch with CUDA or CPU (https://pytorch.org/audio/stable/build.windows.html#install-pytorch):
  - (venv) conda install pytorch torchvision torchaudio cpuonly ffmpeg -c pytorch for CPU/Mac
  - (venv) conda install -y -k pytorch[version=2,build=py3.10_cuda11.7*] torchvision torchaudio pytorch-cuda=11.7 cuda-toolkit ninja ffmpeg -c pytorch -c nvidia/label/cuda-11.7.0 -c nvidia for CUDA
- Clone the repo: git clone https://github.com/rsxdalv/tts-generation-webui.git
- Potentially (if errors occur in the next step) need to install build tools (without Visual Studio): https://visualstudio.microsoft.com/visual-cpp-build-tools/
- Install the requirements:
  - activate the environment: conda activate venv and
  - install all the requirements*.txt (this list might not be up to date, check https://github.com/rsxdalv/tts-generation-webui/blob/main/Dockerfile#L39-L40):
    - (venv) pip install -r requirements.txt
    - (venv) pip install -r requirements_audiocraft.txt
    - (venv) pip install -r requirements_bark_hubert_quantizer.txt
    - (venv) pip install -r requirements_rvc.txt
    - (venv) pip install hydra-core==1.3.2
    - (venv) pip install -r requirements_styletts2.txt
    - (venv) pip install -r requirements_vall_e.txt
    - (venv) pip install -r requirements_maha_tts.txt
    - (venv) pip install -r requirements_stable_audio.txt
    - (venv) pip install soundfile==0.12.1
  - due to pip-torch incompatibilities torch will be reinstalled to 2.0.0, thus it might be necessary to reinstall it again after the requirements if you have a CPU/Mac or installed a specific torch version other than 2.0.0:
    - (venv) conda install pytorch torchvision torchaudio cpuonly ffmpeg -c pytorch for CPU/Mac
    - (venv) conda install -y -k pytorch[version=2,build=py3.10_cuda11.7*] torchvision torchaudio pytorch-cuda=11.7 cuda-toolkit ninja ffmpeg -c pytorch -c nvidia/label/cuda-11.7.0 -c nvidia for CUDA
  - build the react app: (venv) cd react-ui && npm install && npm run build
- run the server: (venv) python server.py

React UI

Install nodejs (if not already installed with conda)
Install react dependencies: npm install
Build react: npm run build
Run react: npm start
Also run the python server: python server.py or with start_(platform) script

Docker Setup

tts-generation-webui can also be ran inside of a Docker container. To get started, pull the image from GitHub Container Registry:

docker pull ghcr.io/rsxdalv/tts-generation-webui:main

Once the image has been pulled it can be started with Docker Compose:

docker compose up -d

The container will take some time to generate the first output while models are downloaded in the background. The status of this download can be verified by checking the container logs:

docker logs tts-generation-webui

Building the image yourself

If you wish to build your own docker container, you can use the included Dockerfile:

docker build -t tts-generation-webui .

Please note that the docker-compose needs to be edited to use the image you just built.

Extra Voices for Bark, Prompt Samples

https://rsxdalv.github.io/bark-speaker-directory/

Bark Readme

README_Bark.md

Info about managing models, caches and system space for AI projects

#186 (reply in thread)

Screenshots

Examples

audio__bark__continued_generation__2023-05-04_16-07-49_long.webm

audio__bark__continued_generation__2023-05-04_16-09-21_long.webm

audio__bark__continued_generation__2023-05-04_16-10-55_long.webm

Open Source Libraries

This project utilizes the following open source libraries:

suno-ai/bark - MIT License
- Description: A powerful library for XYZ.
- Repository: suno/bark
tortoise-tts - Apache-2.0 License
- Description: A flexible text-to-speech synthesis library for various platforms.
- Repository: neonbjb/tortoise-tts
ffmpeg - LGPL License
- Description: A complete and cross-platform solution for video and audio processing.
- Repository: FFmpeg
- Use: Encoding Vorbis Ogg files
ffmpeg-python - Apache 2.0 License
- Description: Python bindings for FFmpeg library for handling multimedia files.
- Repository: kkroening/ffmpeg-python
audiocraft - MIT License
- Description: A library for audio generation and MusicGen.
- Repository: facebookresearch/audiocraft
vocos - MIT License
- Description: An improved decoder for encodec samples
- Repository: charactr-platform/vocos
RVC - MIT License
- Description: An easy-to-use Voice Conversion framework based on VITS.
- Repository: RVC-Project/Retrieval-based-Voice-Conversion-WebUI

Ethical and Responsible Use

This technology is intended for enablement and creativity, not for harm.

By engaging with this AI model, you acknowledge and agree to abide by these guidelines, employing the AI model in a responsible, ethical, and legal manner.

Non-Malicious Intent: Do not use this AI model for malicious, harmful, or unlawful activities. It should only be used for lawful and ethical purposes that promote positive engagement, knowledge sharing, and constructive conversations.
No Impersonation: Do not use this AI model to impersonate or misrepresent yourself as someone else, including individuals, organizations, or entities. It should not be used to deceive, defraud, or manipulate others.
No Fraudulent Activities: This AI model must not be used for fraudulent purposes, such as financial scams, phishing attempts, or any form of deceitful practices aimed at acquiring sensitive information, monetary gain, or unauthorized access to systems.
Legal Compliance: Ensure that your use of this AI model complies with applicable laws, regulations, and policies regarding AI usage, data protection, privacy, intellectual property, and any other relevant legal obligations in your jurisdiction.
Acknowledgement: By engaging with this AI model, you acknowledge and agree to abide by these guidelines, using the AI model in a responsible, ethical, and legal manner.

License

Codebase and Dependencies

The codebase is licensed under MIT. However, it's important to note that when installing the dependencies, you will also be subject to their respective licenses. Although most of these licenses are permissive, there may be some that are not. Therefore, it's essential to understand that the permissive license only applies to the codebase itself, not the entire project.

That being said, the goal is to maintain MIT compatibility throughout the project. If you come across a dependency that is not compatible with the MIT license, please feel free to open an issue and bring it to our attention.

Known non-permissive dependencies:

Library	License	Notes
encodec	CC BY-NC 4.0	Newer versions are MIT, but need to be installed manually
diffq	CC BY-NC 4.0	Optional in the future, not necessary to run, can be uninstalled, should be updated with demucs
lameenc	GPL License	Future versions will make it LGPL, but need to be installed manually
unidecode	GPL License	Not mission critical, can be replaced with another library, issue: neonbjb/tortoise-tts#494

Model Weights

Model weights have different licenses, please pay attention to the license of the model you are using.

Most notably:

Bark: MIT
Tortoise: Unknown (Apache-2.0 according to repo, but no license file in HuggingFace)
MusicGen: CC BY-NC 4.0
AudioGen: CC BY-NC 4.0

Compatibility / Errors

Audiocraft is currently only compatible with Linux and Windows. MacOS support still has not arrived, although it might be possible to install manually.

Torch being reinstalled

Due to the python package manager (pip) limitations, torch can get reinstalled several times. This is a wide ranging issue of pip and torch.

Red messages in console

These messages:

---- requires ----, but you have ---- which is incompatible.

Are completely normal. It's both a limitation of pip and because this Web UI combines a lot of different AI projects together. Since the projects are not always compatible with each other, they will complain about the other projects being installed. This is normal and expected. And in the end, despite the warnings/errors the projects will work together. It's not clear if this situation will ever be resolvable, but that is the hope.

Configuration Guide

You can configure the interface through the "Settings" tab or, for advanced users, via the config.json file in the root directory (not recommended). Below is a detailed explanation of each setting:

Model Configuration

Argument	Default Value	Description
`text_use_gpu`	`true`	Determines whether the GPU should be used for text processing.
`text_use_small`	`true`	Determines whether a "small" or reduced version of the text model should be used.
`coarse_use_gpu`	`true`	Determines whether the GPU should be used for "coarse" processing.
`coarse_use_small`	`true`	Determines whether a "small" or reduced version of the "coarse" model should be used.
`fine_use_gpu`	`true`	Determines whether the GPU should be used for "fine" processing.
`fine_use_small`	`true`	Determines whether a "small" or reduced version of the "fine" model should be used.
`codec_use_gpu`	`true`	Determines whether the GPU should be used for codec processing.
`load_models_on_startup`	`false`	Determines whether the models should be loaded during application startup.

Gradio Interface Options

Argument	Default Value	Description
`inline`	`false`	Display inline in an iframe.
`inbrowser`	`true`	Automatically launch in a new tab.
`share`	`false`	Create a publicly shareable link.
`debug`	`false`	Block the main thread from running.
`enable_queue`	`true`	Serve inference requests through a queue.
`max_threads`	`40`	Maximum number of total threads.
`auth`	`null`	Username and password required to access interface, format: `username:password`.
`auth_message`	`null`	HTML message provided on login page.
`prevent_thread_lock`	`false`	Block the main thread while the server is running.
`show_error`	`false`	Display errors in an alert modal.
`server_name`	`0.0.0.0`	Make app accessible on local network.
`server_port`	`null`	Start Gradio app on this port.
`show_tips`	`false`	Show tips about new Gradio features.
`height`	`500`	Height in pixels of the iframe element.
`width`	`100%`	Width in pixels of the iframe element.
`favicon_path`	`null`	Path to a file (.png, .gif, or .ico) to use as the favicon.
`ssl_keyfile`	`null`	Path to a file to use as the private key file for a local server running on HTTPS.
`ssl_certfile`	`null`	Path to a file to use as the signed certificate for HTTPS.
`ssl_keyfile_password`	`null`	Password to use with the SSL certificate for HTTPS.
`ssl_verify`	`true`	Skip certificate validation.
`quiet`	`true`	Suppress most print statements.
`show_api`	`true`	Show the API docs in the footer of the app.
`file_directories`	`null`	List of directories that Gradio is allowed to serve files from.
`_frontend`	`true`	Frontend.

tts-generation-webui's People

Contributors

Stargazers

Watchers

Forkers

andyzeng114 pirateal sycomix itsbrex maxmax2016 v-nhandt21 zixian2021 osbarcelos79 neurogen-dev songfang atlonxp 1879687161 suniljeph liujingxiu23 baifengbai jonfairbanks traderpedroso duagangtech kibotu skullface20 bec-au ozymoz mbtwithmbt xiaohai-huang yogendra-singh-rathore vital121 githublover520 skittixch aifahad revmagi kcbf lcsouzamenezes ybl1984 iamleon121 secret-guest enterprisium creativelabsai dev-baris douhaohaode jaedukseo jeddstudio bigrixin evdcush jdwebprogrammer allyaminou themodfather limitlessmatrix sunixliu enzoph byhamzahwijaya anngdev chengcheng23 ali1898 stophobia jmaigc iwmo jisunglim miaohf davidyordan enikolair devdartagnan al-dim wingjoezhou schneewolf-labs codedojokapa cdrini aitrepreneur gabrie1lira quijoteshin leftomelas developertts laynz28 le-si huangyingting mrbloomguy cocktailpeanut lvenxing stas-dok aigongshe soon14 hangover3832 syskey26 xxxxxxxx512 tutumomo wangchengqun saulocatharino ducaaaa wryandginger mukseq zakarialabib steveefemsc aidarkezio knowledge-things walchshofer jorgebmann dm8705 samozta xargs007 joachip metrbox

tts-generation-webui's Issues

WARNING | xformers | WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions

I am coming across the following error when Starting. I can see the WebUI but get errors when trying to run cloning or anything else. When Cloning I get this error: RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory |

And when running Totoise I am getting the following:

2023-07-30 21:14:38 | WARNING | xformers | WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.0.1+cu118 with CUDA 1108 (you have 2.0.1+cpu)
Python 3.10.11 (you have 3.10.12)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
2023-07-30 21:14:39 | WARNING | xformers | Triton is not available, some optimizations will not be enabled.
This is just a warning: No module named 'triton'

Any solutions? I am not a coder, so go gentle. Thanks in advance

ffmpeg module

Apparently this module doesn't get properly installed on Windows. Keep getting AttributeError regarding input method call when trying to save a generated audio file. Tried to install it manually using cmd, but error still occurs.

tortoise-tts-fast

Have you considered adding Tortoise-TTS-Fast instead of the original Tortoise-TTS? Fast performs much faster, and you can add your own models to it. It's better to clone voices because you can extract the latents from whole audio samples, making them more sophisticated.
https://github.com/152334H/tortoise-tts-fast

how much vram do i need i got 6gb

A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
Loading extensions:
Loaded extension: callback_save_generation_ffmpeg
Loaded extension: callback_save_generation_musicgen_ffmpeg
Loaded extension: empty_extension
Loaded 2 callback_save_generation extensions.
Loaded 1 callback_save_generation_musicgen extensions.
Loading Bark models
- Text Generation: GPU: Yes, Small Model: Yes
- Coarse-to-Fine Inference: GPU: Yes, Small Model: Yes
- Fine-tuning: GPU: Yes, Small Model: No
- Codec: GPU: Yes
Traceback (most recent call last):
File "D:\AI\TTS\one-click-installers-tts-4.0\tts-generation-webui\server.py", line 13, in
from src.bark.generation_tab_bark import generation_tab_bark
File "D:\AI\TTS\one-click-installers-tts-4.0\tts-generation-webui\src\bark\generation_tab_bark.py", line 27, in
from src.model_manager import model_manager
File "D:\AI\TTS\one-click-installers-tts-4.0\tts-generation-webui\src\model_manager.py", line 4, in
model_manager = BarkModelManager(config)
File "D:\AI\TTS\one-click-installers-tts-4.0\tts-generation-webui\src\bark\BarkModelManager.py", line 8, in init
self.reload_models(config)
File "D:\AI\TTS\one-click-installers-tts-4.0\tts-generation-webui\src\bark\BarkModelManager.py", line 29, in reload_models
preload_models(
File "D:\AI\TTS\one-click-installers-tts-4.0\tts-generation-webui\models\bark\bark\generation.py", line 327, in preload_models
_ = load_model(
File "D:\AI\TTS\one-click-installers-tts-4.0\tts-generation-webui\models\bark\bark\generation.py", line 275, in load_model
model = _load_model_f(ckpt_path, device)
File "D:\AI\TTS\one-click-installers-tts-4.0\tts-generation-webui\models\bark\bark\generation.py", line 240, in _load_model
model.to(device)
File "D:\AI\TTS\one-click-installers-tts-4.0\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1145, in to
return self._apply(convert)
File "D:\AI\TTS\one-click-installers-tts-4.0\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
File "D:\AI\TTS\one-click-installers-tts-4.0\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
File "D:\AI\TTS\one-click-installers-tts-4.0\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
[Previous line repeated 2 more times]
File "D:\AI\TTS\one-click-installers-tts-4.0\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
param_applied = fn(param)
File "D:\AI\TTS\one-click-installers-tts-4.0\installer_files\env\lib\site-packages\torch\nn\modules\module.py", line 1143, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 5.25 GiB already allocated; 0 bytes free; 5.35 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Microsoft Visual C++ required now? Seriously?

Okay I installed it and update still says that I need it. The installer took about 15 minutes and threw a million errors. After re-running it several times I finally get through it and it says "No GPU being used". Try to run the update bat and 15 minutes later it complains that I don't have MSVC. So I install that, try to update again and it still thinks I don't have it.

No GPU being used. (NVIDIA)

When I launch I get this message:

No GPU being used. Careful, inference might be very slow!

I also get this message earlier in the log, though I am not sure it is related:

WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.0.1+cu118 with CUDA 1108 (you have 2.0.1+cpu)
    Python  3.10.11 (you have 3.10.11)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details

I indicated option "A" when I installed but it didn't seem to get my 4090 GPU. Should I try reinstalling, or is there any quicker way to try to repair the GPU detection issue?

Gradio Auth issues

I am having an issue setting a username and password to the Gradio interface on this program. In the webui it allows the user to specify the username and password associated with the Gradio UI. I set the auth as username:password as Gradio Docs suggest to no avail. The error I get is when I restart and go to log into the webui. It will always display the incorrect login error. I tried to configure the same settinng in the config.json file and --gradio-auth-path flag with no changes. Is this something that I am doing wrong or is there a bug in the authentication page of this project?

-theman23290

Specs:
OS: Debian 11
CPU: Xeon E5 2670 v3 (25C)
RAM: 80GB
GPU: None

can't find the voice clone tab

where is the voice clone tab? I cab see it on the git page, but its not there on my webui?

MusicGen HuggingFace issue fix

Some HuggingFace accounts appear to have issues, as a result if you see errors relating to 404 or 401 when downloading the MusicGen model, this might be useful.

pip uninstall audiocraft
pip install git+https://[email protected]/GrandaddyShmax/audiocraft_plus@85a5112#egg=audiocraft

Or just edit ..\env\Lib\site-packages\audiocraft\models\loaders.py to use GrandaddyShmax repo

HF_MODEL_CHECKPOINTS_MAP = {
    "small": "GrandaddyShmax/musicgen-small",
    "medium": "GrandaddyShmax/musicgen-medium",
    "large": "GrandaddyShmax/musicgen-large",
    "melody": "GrandaddyShmax/musicgen-melody",
}

Originally posted by @rsxdalv in facebookresearch/audiocraft#139 (comment)

Possible bug: "cannot import name 'open_folder' from 'src.history_tab.open_folder'" on launch (WSL2)

Using the one-click installer, I got errors launching the webUI after installation:

Traceback (most recent call last):
  File "/home/***/tts-generation-webui/tts-generation-webui/server.py", line 40, in <module>
    from src.tortoise.generation_tab_tortoise import generation_tab_tortoise
  File "/home/***/tts-generation-webui/tts-generation-webui/src/tortoise/generation_tab_tortoise.py", line 2, in <module>
    from src.history_tab.open_folder import open_folder
ImportError: cannot import name 'open_folder' from 'src.history_tab.open_folder' (/home/***/tts-generation-webui/tts-generation-webui/src/history_tab/open_folder.py)

I resolved this by changing elif sys.platform == "linux2": to elif sys.platform == "linux":" in src/history_tab/open_folder.py. Maybe something to do with how python recognises WSL2? Anyway that fixed it.

(Issue was intended for another repo) Please delete - wrong app

Saving Demuxed files

@rsxdalv - Is there a way to save demuxed files?
On the video there are three dots on the right hand side of the separated track with a save option but I'm guessing this is a demo version. Presumably you will set up a button to feed the track to a vocoder or to a voice cloner.
(Helpful video by the way).

regards

@Magenta-6

starting does not work

After executing the start_windows.bat, everything looked fine, but theres an error:

Loading Bark models
        - Text Generation:               GPU: Yes, Small Model: Yes
        - Coarse-to-Fine Inference:      GPU: Yes, Small Model: Yes
        - Fine-tuning:                   GPU: Yes, Small Model: No
        - Codec:                         GPU: Yes
Traceback (most recent call last):
  File "D:\**************\tts-generation-webui\tts-generation-webui\server.py", line 132, in <module>
    settings_tab_gradio(save_config_gradio, reload_config_and_restart_ui, gradio_interface_options)
  File "D:\**************\tts-generation-webui\tts-generation-webui\settings_tab_gradio.py", line 34, in settings_tab_gradio
    "inline": gr.Checkbox(label="inline: Display inline in an iframe", value=gradio_interface_options["inline"]),
KeyError: 'inline'

Done!
Drücken Sie eine beliebige Taste . . .

always appears when starting the start_windows.bat now How to solve ?

Top-p and Temperature Value Correction

Hey there, great project! For the newly added MusicGen section, can you add support for smaller gradations when it comes to Top-p and Temperature? Right now it can only go up by whole digits rather than tenths. I receive "Please enter a valid value" if I try to input something like 0.9 for Temperature.

Where to put custom trained Tortoise models and voices?

RVC models one just drops into the GUI, but how do I select a custom Tortoise model and voice that I trained myself?

I cannot find a Tortoise folder or data/models folder (which I saw some folks here mention) in my installtion directory, nor can I find a dropdown or other thing in the GUI where I could select custom models. Though I did find a selection for Tortoise voices, I could not find out where those are installed to add my own.

Cloning Voice is giving Metadate is None expection

After generation this function is called
def save_cloned_voice( full_generation: FullGeneration, ): voice_name = f"test_clone_voice{str(np.random.randint(100000))}" filename = f"voices/{voice_name}.npz" save_npz(filename, full_generation) return filename

using the method save_npz from npz_tools.py
`def save_npz(filename: str, full_generation: FullGeneration, metadata: dict[str, Any]):
def pack_metadata(metadata: dict[str, Any]):
return list(json.dumps(metadata))

np.savez(
    filename,
    **{
        **compress_history(full_generation),
        "metadata": pack_metadata(metadata),
    },
)

Seems that metadata is missing. What can I use as Metadata?

The is a test call when calling npz_tools directly. That sets the metadata this way:

metadata_in = { "_version": "0.0.1", "_hash_version": "0.0.2", "_type": "bark", "is_big_semantic_model": True, "is_big_coarse_model": False, "is_big_fine_model": False, "prompt": "test", "language": None, "speaker_id": None, "hash": "98b14851692f09df5e89c68f0a8e2013", "history_prompt": "continued_generation", "history_prompt_npz": None, "history_hash": "98b14851692f09df5e89c68f0a8e2013", "text_temp": 0.7, "waveform_temp": 0.7, "date": "2023-06-07_16-56-09", "seed": "2039063546", }

Can simply copy paste this code to be used in save_cloned_voice? Or do I need to set the values like Language, speaker, prompt ...

TTS-4.0 No module named deepspeed (Ans. Install TTS-6.0)

Hi again.
I'm using your TTS 4.0 on a daily basis.
I think its the most comprehensive and usable so far.
I ran the update_windows.bat to get the latest.
I saw that it brought in some changes, however when I later tried to start the app I ran into the following:

+++++++++++++++++++++++++++++++++++++
from src.tortoise.gen_tortoise import generate_tortoise_long
File "C:\SuperStableDiffusion2.0\TTS-4.0\tts-generation-webui\src\tortoise\gen_tortoise.py", line 7, in
from tortoise.api import TextToSpeech, MODELS_DIR
File "C:\SuperStableDiffusion2.0\TTS-4.0\installer_files\env\lib\site-packages\tortoise-2.4.2-py3.10.egg\tortoise\api.py", line 14, in
from tortoise.models.autoregressive import UnifiedVoice
File "C:\SuperStableDiffusion2.0\TTS-4.0\installer_files\env\lib\site-packages\tortoise-2.4.2-py3.10.egg\tortoise\models\autoregressive.py", line 6, in
import deepspeed
ModuleNotFoundError: No module named 'deepspeed'

Done!
Press any key to continue . . .
+++++++++++++++++++++++++++++++++++++++

I opened cmd and tried running >pip install deepspeed
but it cried out for torch, (which I know is already installed) but didn't want to stuff it up - so I decided to leave it.
Your thoughts?

macOS install issue

The installer works fine for windows but I get this error when installing on macOS:

Retrieving notices: ...working... done
ERROR: Could not open requirements file: [Errno 2] No such file or directory: 'requirements.txt'
Traceback (most recent call last):
  File "/Users/Downloads/one-click-installers-tts-4.0/webui.py", line 125, in <module>
    install_dependencies()
  File "/Users/Downloads/one-click-installers-tts-4.0/webui.py", line 63, in install_dependencies
    update_dependencies()
  File "/Users/Downloads/one-click-installers-tts-4.0/webui.py", line 77, in update_dependencies
    os.chdir("tts-generation-webui/models/bark")
FileNotFoundError: [Errno 2] No such file or directory: 'tts-generation-webui/models/bark'

After manually adding the required directories and files, I get this error:

/System/Volumes/Preboot/Cryptexes/OS/Users/Downloads/one-click-installers-tts-4.0/installer_files/env/lib/python3.10/site-packages/torch/lib/libtorch_global_deps.dylib' (no such file), 
'/Users/Downloads/one-click-installers-tts-4.0/installer_files/env/lib/python3.10/site-packages/torch/lib/libtorch_global_deps.dylib' 
(mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))

http query

Hi guys, I love the project, this is not an issue just a question, How can I query it with http? There is any api flag or something? Thanks for taking the time :3

Problem with voice cloning option

Hi! I have a error when i upload sample of my voice. What I'm doing wrong?

EDIT: Nevermind. I was commented this line in files.

[Feature Request] Stop Generating Tortoise TTS Samples during generation

When I generate anything, it works perfectly, however, sometimes I am finding I made a typo, mistake or something else and have to close it out entirely and come back to it again.

Can there be a halt or stop Generation when Generating Audio Samples?

Issue when 'Use a voice'

So when I try to 'Use a voice' and it will come into an error and gives me this in the terminal:

Traceback (most recent call last):
File "C:\AI\one-click-installers-tts-4.0\installer_files\env\lib\site-packages\gradio\routes.py", line 427, in run_predict output = await app.get_blocks().process_api(
File "C:\AI\one-click-installers-tts-4.0\installer_files\env\lib\site-packages\gradio\blocks.py", line 1323, in process_api
result = await self.call_function(
File "C:\AI\one-click-installers-tts-4.0\installer_files\env\lib\site-packages\gradio\blocks.py", line 1067, in call_function
prediction = await utils.async_iteration(iterator)
File "C:\AI\one-click-installers-tts-4.0\installer_files\env\lib\site-packages\gradio\utils.py", line 336, in async_iteration
return await iterator.anext()
File "C:\AI\one-click-installers-tts-4.0\installer_files\env\lib\site-packages\gradio\utils.py", line 329, in anext
return await anyio.to_thread.run_sync(
File "C:\AI\one-click-installers-tts-4.0\installer_files\env\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\AI\one-click-installers-tts-4.0\installer_files\env\lib\site-packages\anyio_backends_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "C:\AI\one-click-installers-tts-4.0\installer_files\env\lib\site-packages\anyio_backends_asyncio.py", line 807, in run
result = context.run(func, *args)
File "C:\AI\one-click-installers-tts-4.0\installer_files\env\lib\site-packages\gradio\utils.py", line 312, in run_sync_iterator_async
return next(iterator)
File "C:\AI\one-click-installers-tts-4.0\tts-generation-webui\src\bark\generation_tab_bark.py", line 294, in gen
filename, filename_png, _, _, filename_npz, seed, metadata = generate(
File "C:\AI\one-click-installers-tts-4.0\tts-generation-webui\src\bark\generation_tab_bark.py", line 77, in generate
filename, filename_png, filename_npz, metadata = save_generation(
File "C:\AI\one-click-installers-tts-4.0\tts-generation-webui\src\bark\generation_tab_bark.py", line 130, in save_generation
history_hash = history_to_hash(history_prompt)
File "C:\AI\one-click-installers-tts-4.0\tts-generation-webui\src\bark\history_to_hash.py", line 11, in history_to_hash
"semantic_prompt": npz["semantic_prompt"].tolist(),
TypeError: string indices must be integers

Do you know how to fix this?

Error using CPU for voice cloning

Hi, I faced this error when I tried to do a bark voice clone.

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

I'm using a mac m2, and my device does not have proper GPU, so I unchecked "use GPT", but it still gets the issue.

Can you please help to check this??

Error After Installation

After installing I get a few errors and then it crashes

Done!
Press any key to continue . . .

This is the output on launching start_windows.bat

[Feature Request] Choose which Directory Cache and Models go to

Is it possible to choose where cached stuff goes to?
My OS Install or C: is becoming full and I ordered a dedicated external HDD for AI stuff.
For example, the models go to my user cached directory, I want everything to go to my separate HDD in a certain folder.

No such file or directory

one click installer fails for M2

he following NEW packages will be INSTALLED:

  ffmpeg             pkgs/main/osx-arm64::ffmpeg-4.2.2-h04105a8_0
  gnutls             pkgs/main/osx-arm64::gnutls-3.6.15-h887c41c_0
  lame               pkgs/main/osx-arm64::lame-3.100-h1a28f6b_0
  libidn2            pkgs/main/osx-arm64::libidn2-2.3.4-h80987f9_0
  libopus            pkgs/main/osx-arm64::libopus-1.3-h1a28f6b_1
  libtasn1           pkgs/main/osx-arm64::libtasn1-4.19.0-h80987f9_0
  libunistring       pkgs/main/osx-arm64::libunistring-0.9.10-h1a28f6b_0
  libvpx             pkgs/main/osx-arm64::libvpx-1.10.0-hc377ac9_0
  nettle             pkgs/main/osx-arm64::nettle-3.7.3-h84b5d62_1
  openh264           pkgs/main/osx-arm64::openh264-1.8.0-h98b2900_0
  x264               pkgs/main/osx-arm64::x264-1!152.20180806-h1a28f6b_0

The following packages will be DOWNGRADED:

  cryptography                       41.0.2-py310h6204c90_0 --> 41.0.2-py310h6e31b35_0
  curl                                     8.1.1-h80987f9_1 --> 8.1.1-h80987f9_0
  git                               2.40.1-pl5340h6cf2078_1 --> 2.40.1-pl5340h3afa44c_1
  krb5                                    1.20.1-hf3e1bf2_1 --> 1.19.4-h8380606_0
  libcurl                                  8.1.1-h3e2b118_1 --> 8.1.1-h0f1d93c_0
  libnghttp2                              1.52.0-h62f6fdd_1 --> 1.52.0-h10c0552_1
  libssh2                                 1.10.0-h02f6b3c_2 --> 1.10.0-h449679c_2
  openssl                                  3.0.9-h1a28f6b_0 --> 1.1.1u-h1a28f6b_0
  python                                 3.10.12-hb885b13_0 --> 3.10.12-hc0d8a6c_0

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Traceback (most recent call last):
  File "/Users/kibotu/Documents/repos/ai/tts-generation-webui/webui.py", line 96, in <module>
    install_dependencies()
  File "/Users/kibotu/Documents/repos/ai/tts-generation-webui/webui.py", line 60, in install_dependencies
    update_dependencies()
  File "/Users/kibotu/Documents/repos/ai/tts-generation-webui/webui.py", line 68, in update_dependencies
    os.chdir("tts-generation-webui")
FileNotFoundError: [Errno 2] No such file or directory: 'tts-generation-webui'

One click installer seemed to have failed on Windows 11 "ModuleNotFoundError: No module named 'tortoise'"

Trying this on an Nvidia GPU, the 1650 Super to be exact. The entire installation process seemed to have gone fine, I selected Nvidia when asked. No issues.

However, at the end I got this error:

Env file not found. Creating default env.
Config file not found. Creating default config.
Traceback (most recent call last):
  File "C:\Users\USER\Desktop\one-click-installers-tts-6.0\one-click-installers-tts-6.0\tts-generation-webui\server.py", line 40, in <module>
    from src.tortoise.generation_tab_tortoise import generation_tab_tortoise
  File "C:\Users\USER\Desktop\one-click-installers-tts-6.0\one-click-installers-tts-6.0\tts-generation-webui\src\tortoise\generation_tab_tortoise.py", line 7, in <module>
    from src.tortoise.gen_tortoise import (
  File "C:\Users\USER\Desktop\one-click-installers-tts-6.0\one-click-installers-tts-6.0\tts-generation-webui\src\tortoise\gen_tortoise.py", line 7, in <module>
    from tortoise.api import TextToSpeech, MODELS_DIR
ModuleNotFoundError: No module named 'tortoise'

Done!

Then it said press any key to continue. The instructions.txt isn't really clear on how to run the actual webui. So I tried running the webui.py and it just says Conda is not installed. Exiting... in the terminal and doesn't run.

I then tried to run the server.py just in case, and for some reason it said I didn't have gradio installed. So I went ahead and installed gradio just by doing pip install gradio. This let the script run, but similar to the one click installer terminal error, it can't find tortoise. I'm also not sure if that's the only issue, or if the one-click installer wanted to continue doing things after that, but stopped when that failed.

[Feature Suggestion] Combine efforts with gitmylo/audio-webui

You might or might not be aware of gitmylo/audio-webui which has very similar goals. Maybe it would make sense to combine forces with gitmylo?

https://github.com/gitmylo/audio-webui

add german huber model

would it be posibel to add the german huber model version from C0untFloyd (Model) to the Bark Voice Clone Tab.

Query about ideal drive:\location for install to reduce conflicts

This is not an issue as such but a request for advice about where to set up an installation.

Firstly, your idea to create a cross-platform tts tool is extremely valuable, especially with a one-click installer. however
I am hesitant to just set up a folder in my C:\Users\NAME drive and expect everything to go flawlessly.

Over the past year the advent of Ai generated tti and LLM's has been an exciting journey.
I am one of those people who has gone through a steep learning curve getting to grips with virtual python environments and do not fully understand the intricacies of how the multitudes of modules inter-relate.
I do know enough to know that that conflicts can occur between them and that a lot of time can be spent un-installing and reinstalling them.

At present I have successfully set up several tts applications: Coqui, Silero, & Bark.
I have also attempted to set up Tortoise and AudioCraft, but have failed to troubleshoot installation errors. (Note1. below)

I first started using Silero in oobabooga, which worked fine.
However the voices were v. limited so I set up coqui and bark to get a better variety of voices and accents.
And the possibility of voice cloning/training is extremely appealing also.

All that is a long way to ask the question:

Is there an ideal place to install your tts-webui that will not create conflicts with other installations?
Should I un-install all the other applications first?
Is it likely that the installation will add duplicate versions of torch, conda and other dependencies that are already installed?

BACKGROUND
I am using Windows 10, RTX 4070Ti, CUDA 11.7, Anaconda3, conda 22.9.0, Torch 2.0.1
2023-06-25_pip list.txt

Here is a list of the folders where I have set up various applications:
Initially I installed tts so that they could be used with oobabooga and later Silly Tavern
C:\SuperStableDiffusion2.0\stable-diffusion-webui
C:\SuperStableDiffusion2.0\oobabooga-windows
C:\SuperStableDiffusion2.0\Bark\bark-gui
C:\SuperStableDiffusion2.0\CoquiTTS\TTS

Later I began setting up applications in the User\ directory
C:\Users\ABC\Audiocraft\audiocraft-main
C:\Users\ABC\Bark-tts\bark_win\bark-gui
C:\Users\ABC\Silero-tts
C:\Users\ABC\coqui-tts
C:\Users\ABC\tortoise-tts

(Note 1.) Issues Raised:
neonbjb/tortoise-tts#468
facebookresearch/audiocraft#123

History tab broken after using for first time

Using the v3.1 one click installer, I had to comment out lines 12 and 126 of tts-generation-webui/server.py to get the start_windows.bat to work. Ran a few tests and got great results with both Bark and Tortoise.
After closing and restarting the webui, i got the error: ValueError: not enough values to unpack (expected 2, got 1). A screenshot is below of what the Terminal looks after running start_windows.bat

A temporary fix is commenting out line 129 of tts-generation-webui/server.py but this removes the History tab.

Massive amount of both dependency errors and tracebacks related either to Windows 11 or an unsupported GPU while installing "oogabooga", the same problem occurs while installing the "tts-generation-webui"

OS: Windows 11
GPU: GeForce 1650 (no Ti, anything like that)
Problem(s):

The very first error occured while installing the one-click-installer package itself:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
open-clip-torch 2.7.0 requires protobuf==3.20.0, but you have protobuf 4.23.4 which is incompatible.
rembg 2.0.35 requires scikit-image>=0.19.3, but you have scikit-image 0.19.2 which is incompatible.
tensorflow-intel 2.11.0 requires protobuf<3.20,>=3.9.2, but you have protobuf 4.23.4 which is incompatible.
tensorflow-intel 2.11.0 requires tensorboard<2.12,>=2.11, but you have tensorboard 2.13.0 which is incompatible.
audiocraft 0.0.2a2 requires hydra-core>=1.1, but you have hydra-core 1.0.7 which is incompatible.
Successfully installed Cython-0.29.36 Pillow-9.3.0 Werkzeug-2.3.6 absl-py-1.4.0 faiss-cpu-1.7.4 functorch-2.0.0 json5-0.9.14 librosa-0.9.2 llvmlite-0.39.0 matplotlib-inline-0.1.6 praat-parselmouth-0.4.3 protobuf-4.23.4 pyworld-0.3.3 resampy-0.4.2 rvc-beta-0.1.1 tensorboard-2.13.0 tensorboardX-2.6.1 torchcrepe-0.0.20 torchgen-0.0.1 tornado-6.3.2 traitlets-5.9.0 uvicorn-0.21.1

[notice] A new release of pip is available: 23.1.2 -> 23.2
[notice] To update, run: python.exe -m pip install --upgrade pip
Successfully installed RVC dependencies
Env file not found. Creating default env.
Config file not found. Creating default config.
Traceback (most recent call last):
  File "X:\tts_gen_webui\oobabooga_windows\installer_files\env\lib\site-packages\tensorboard\compat\__init__.py", line 42, in tf
    from tensorboard.compat import notf  # noqa: F401
ImportError: cannot import name 'notf' from 'tensorboard.compat' (X:\Bark\one-click-installers-tts-6.0\tts_gen_webui\oobabooga_windows\installer_files\env\lib\site-packages\tensorboard\compat\__init__.py)

After I tried to run it "anyway":

Traceback (most recent call last):
  File "X:\tts_gen_webui\installer_files\env\lib\site-packages\tensorboard\compat\__init__.py", line 42, in tf
    from tensorboard.compat import notf  # noqa: F401
ImportError: cannot import name 'notf' from 'tensorboard.compat' (X:\tts_gen_webui\installer_files\env\lib\site-packages\tensorboard\compat\__init__.py)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "X:\tts_gen_webui\installer_files\env\lib\site-packages\transformers\utils\import_utils.py", line 1172, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "X:\tts_gen_webui\installer_files\env\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "X:\tts_gen_webui\installer_files\env\lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 38, in <module>
    from ...modeling_utils import PreTrainedModel, SequenceSummary
  File "X:\tts_gen_webui\installer_files\env\lib\site-packages\transformers\modeling_utils.py", line 84, in <module>
    from accelerate import __version__ as accelerate_version
  File "C:\Users\Shiny\AppData\Roaming\Python\Python310\site-packages\accelerate\__init__.py", line 3, in <module>
    from .accelerator import Accelerator
  File "C:\Users\Shiny\AppData\Roaming\Python\Python310\site-packages\accelerate\accelerator.py", line 37, in <module>
    from .tracking import LOGGER_TYPE_TO_CLASS, GeneralTracker, filter_trackers
  File "C:\Users\Shiny\AppData\Roaming\Python\Python310\site-packages\accelerate\tracking.py", line 42, in <module>
    from torch.utils import tensorboard
  File "X:\tts_gen_webui\installer_files\env\lib\site-packages\torch\utils\tensorboard\__init__.py", line 12, in <module>
    from .writer import FileWriter, SummaryWriter  # noqa: F401
  File "X:\tts_gen_webui\installer_files\env\lib\site-packages\torch\utils\tensorboard\writer.py", line 16, in <module>
    from ._embedding import (
  File "X:\tts_gen_webui\installer_files\env\lib\site-packages\torch\utils\tensorboard\_embedding.py", line 9, in <module>
    _HAS_GFILE_JOIN = hasattr(tf.io.gfile, "join")
  File "X:\tts_gen_webui\installer_files\env\lib\site-packages\tensorboard\lazy.py", line 65, in __getattr__
    return getattr(load_once(self), attr_name)
  File "X:\tts_gen_webui\installer_files\env\lib\site-packages\tensorboard\lazy.py", line 97, in wrapper
    cache[arg] = f(arg)
  File "X:\tts_gen_webui\installer_files\env\lib\site-packages\tensorboard\lazy.py", line 50, in load_once
    module = load_fn()
  File "X:\tts_gen_webui\installer_files\env\lib\site-packages\tensorboard\compat\__init__.py", line 45, in tf
    import tensorflow
  File "C:\Users\Shiny\AppData\Roaming\Python\Python310\site-packages\tensorflow\__init__.py", line 37, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "C:\Users\Shiny\AppData\Roaming\Python\Python310\site-packages\tensorflow\python\__init__.py", line 37, in <module>
    from tensorflow.python.eager import context
  File "C:\Users\Shiny\AppData\Roaming\Python\Python310\site-packages\tensorflow\python\eager\context.py", line 28, in <module>
    from tensorflow.core.framework import function_pb2
  File "C:\Users\Shiny\AppData\Roaming\Python\Python310\site-packages\tensorflow\core\framework\function_pb2.py", line 16, in <module>
    from tensorflow.core.framework import attr_value_pb2 as tensorflow_dot_core_dot_framework_dot_attr__value__pb2
  File "C:\Users\Shiny\AppData\Roaming\Python\Python310\site-packages\tensorflow\core\framework\attr_value_pb2.py", line 16, in <module>
    from tensorflow.core.framework import tensor_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__pb2
  File "C:\Users\Shiny\AppData\Roaming\Python\Python310\site-packages\tensorflow\core\framework\tensor_pb2.py", line 16, in <module>
    from tensorflow.core.framework import resource_handle_pb2 as tensorflow_dot_core_dot_framework_dot_resource__handle__pb2
  File "C:\Users\Shiny\AppData\Roaming\Python\Python310\site-packages\tensorflow\core\framework\resource_handle_pb2.py", line 16, in <module>
    from tensorflow.core.framework import tensor_shape_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__shape__pb2
  File "C:\Users\Shiny\AppData\Roaming\Python\Python310\site-packages\tensorflow\core\framework\tensor_shape_pb2.py", line 36, in <module>
    _descriptor.FieldDescriptor(
  File "X:\tts_gen_webui\installer_files\env\lib\site-packages\google\protobuf\descriptor.py", line 561, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "X:\tts_gen_webui\tts-generation-webui\server.py", line 40, in <module>
    from src.tortoise.generation_tab_tortoise import generation_tab_tortoise
  File "X:\tts_gen_webui\tts-generation-webui\src\tortoise\generation_tab_tortoise.py", line 7, in <module>
    from src.tortoise.gen_tortoise import (
  File "X:\tts_gen_webui\tts-generation-webui\src\tortoise\gen_tortoise.py", line 7, in <module>
    from tortoise.api import TextToSpeech, MODELS_DIR
  File "X:\tts_gen_webui\installer_files\env\lib\site-packages\tortoise\api.py", line 14, in <module>
    from tortoise.models.autoregressive import UnifiedVoice
  File "X:\tts_gen_webui\installer_files\env\lib\site-packages\tortoise\models\autoregressive.py", line 6, in <module>
    from transformers import GPT2Config, GPT2PreTrainedModel, LogitsProcessorList
  File "<frozen importlib._bootstrap>", line 1075, in _handle_fromlist
  File "X:\tts_gen_webui\installer_files\env\lib\site-packages\transformers\utils\import_utils.py", line 1163, in __getattr__
    value = getattr(module, name)
  File "X:\tts_gen_webui\installer_files\env\lib\site-packages\transformers\utils\import_utils.py", line 1162, in __getattr__
    module = self._get_module(self._class_to_module[name])
  File "X:\tts_gen_webui\installer_files\env\lib\site-packages\transformers\utils\import_utils.py", line 1174, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.gpt2.modeling_gpt2 because of the following error (look up to see its traceback):
Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates

Train on expressions

Hi, voice cloning is working good with bark, is there way of training more expressions in Bark?

Installation error

I used the latest installation package “one-click-installers-tts-4.0” for installation, but encountered an error during the final installation.

Is this error related to an exception during installation? How can I solve this problem?

Opensource License

Hello;
Could this repo get a MIT or Apache2 license on it?

That could free people up to use and build upon it freely and comfortably.

docker installer on M2 fails with this error

190.7   note: This error originates from a subprocess, and is likely not a problem with pip.
190.7   ERROR: Failed building wheel for praat-parselmouth
190.7   Building wheel for pyworld (pyproject.toml): started
197.9   Building wheel for pyworld (pyproject.toml): finished with status 'done'
197.9   Created wheel for pyworld: filename=pyworld-0.3.4-cp310-cp310-linux_aarch64.whl size=813752 sha256=573bb778d844d9e23c988c840b507e1a959c87e9831d7155341699300be9ba95
197.9   Stored in directory: /root/.cache/pip/wheels/66/09/8a/a1d79b73d59756f66e9bfe55a199840efc7473adb76ddacdfd
197.9 Successfully built torchcrepe rvc-beta pyworld
197.9 Failed to build praat-parselmouth
197.9 ERROR: Could not build wheels for praat-parselmouth, which is required to install pyproject.toml-based projects
------
Dockerfile:30
--------------------
  28 |     RUN pip3 install -r requirements_audiocraft.txt
  29 |     RUN pip3 install -r requirements_bark_hubert_quantizer.txt
  30 | >>> RUN pip3 install -r requirements_rvc.txt
  31 |
  32 |     # Run the server
--------------------
ERROR: failed to solve: process "/bin/sh -c pip3 install -r requirements_rvc.txt" did not complete successfully: exit code: 1

missing config.json

config.json is missing

Error Tortoise with corrupted regresive model

I was making my first test with tortoise model and this started to download all requirements to generate the audio but my hard disk get full and this operation could not be finished and get me an error, so I restarted the application and the error still were there, so I delete all the folder for trying to fix the problem, then I reinstall everything once more but the issue still keeps, that means that the installation of the regresive model of tortoise is in a folder out of the application, can you say me where can I find this corrupted file?

Generating tortoise with params:
TortoiseParameters(
text='hello world',
voice='random',
preset='ultra_fast',
seed=-1.0,
cvvp_amount=0.0,
split_prompt=False,
num_autoregressive_samples=16,
diffusion_iterations=30,
temperature=0.8,
length_penalty=1.0,
repetition_penalty=2.0,
top_p=0.8,
max_mel_tokens=500,
cond_free=True,
cond_free_k=2,
diffusion_temperature=1.0,
model='Default'
)
Traceback (most recent call last):
File "C:\one-click-installers-tts-6.0\installer_files\env\lib\site-packages\gradio\routes.py", line 437, in run_predict
output = await app.get_blocks().process_api(
File "C:\one-click-installers-tts-6.0\installer_files\env\lib\site-packages\gradio\blocks.py", line 1352, in process_api
result = await self.call_function(
File "C:\one-click-installers-tts-6.0\installer_files\env\lib\site-packages\gradio\blocks.py", line 1093, in call_function
prediction = await utils.async_iteration(iterator)
File "C:\one-click-installers-tts-6.0\installer_files\env\lib\site-packages\gradio\utils.py", line 341, in async_iteration
return await iterator.anext()
File "C:\one-click-installers-tts-6.0\installer_files\env\lib\site-packages\gradio\utils.py", line 334, in anext
return await anyio.to_thread.run_sync(
File "C:\one-click-installers-tts-6.0\installer_files\env\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\one-click-installers-tts-6.0\installer_files\env\lib\site-packages\anyio_backends_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "C:\one-click-installers-tts-6.0\installer_files\env\lib\site-packages\anyio_backends_asyncio.py", line 807, in run
result = context.run(func, *args)
File "C:\one-click-installers-tts-6.0\installer_files\env\lib\site-packages\gradio\utils.py", line 317, in run_sync_iterator_async
return next(iterator)
File "C:\one-click-installers-tts-6.0\tts-generation-webui\src\tortoise\generation_tab_tortoise.py", line 162, in gen
yield from generate_tortoise_long(
File "C:\one-click-installers-tts-6.0\tts-generation-webui\src\tortoise\gen_tortoise.py", line 188, in generate_tortoise_long
datas = generate_tortoise(
File "C:\one-click-installers-tts-6.0\tts-generation-webui\src\tortoise\gen_tortoise.py", line 100, in generate_tortoise
tts = get_tts()
File "C:\one-click-installers-tts-6.0\tts-generation-webui\src\tortoise\gen_tortoise.py", line 77, in get_tts
MODEL = TextToSpeech(
File "C:\one-click-installers-tts-6.0\installer_files\env\lib\site-packages\tortoise\api.py", line 237, in init
self.diffusion.load_state_dict(torch.load(get_model_path('diffusion_decoder.pth', models_dir)))
File "C:\one-click-installers-tts-6.0\installer_files\env\lib\site-packages\torch\serialization.py", line 797, in load
with _open_zipfile_reader(opened_file) as opened_zipfile:
File "C:\one-click-installers-tts-6.0\installer_files\env\lib\site-packages\torch\serialization.py", line 283, in init
super().init(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

Convert RVC error

when i press generate i am receiving an error
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 437, in run_predict
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1352, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1077, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/content/tts-generation-webui/src/rvc_tab/rvc_tab.py", line 72, in run_rvc
index_path=index_path.name,
AttributeError: 'NoneType' object has no attribute 'name'

[Feature Suggestion] Queue System

I have no experience with gradio otherwise i would even attempt it myself. I plan to even learn from it haha
My suggestion is to add a queue system to the voice generation. Instead of clicking generate, you would basically click add to queue. Process queue would then take care of it. This would help with line splitting as well. In this specific case i am talkign about tortoise. As i am getting way more stable results from that one

https://www.reddit.com/r/StableDiffusion/comments/zq0wl9/i_made_a_queue_system_for_automatic1111s_stable/
https://github.com/Kryptortio/SDAtom-WebUi-us
This is an example on how i envision it

Specific Bart Voice for TTS?

Hello there! Firstly, I must say that this web UI is absolutely amazing! Great job!
I would like to inquire about how I can choose a specific voice for the text-to-speech (TTS) generated by Bart. Currently, it seems to use random voices.
Thank you for your assistance.

How to change the quantizer model.

I encountered an error when trying to use my own quantizer. I have been testing this model, and it works, but I don't know where to add it in your code so that I can use my own. The one you provided only works in English. So, I placed it in the folder where the hubert models are and renamed it as "tokenizer," but I'm getting an error:

RuntimeError: Error(s) in loading state_dict for CustomTokenizer:
Unexpected key(s) in state_dict: "intermediate.weight", "intermediate.bias".
size mismatch for fc.weight: copying a param with shape torch.Size([10000, 4096]) from checkpoint, the shape in the current model is torch.Size([10000, 1024])

Using the code below, everything works fine.

import os
import torchaudio
import torch
import numpy as np

from hubert.hubert_manager import HuBERTManager
from hubert.pre_kmeans_hubert import CustomHubert
from hubert.customtokenizer import CustomTokenizer
from encodec import EncodecModel
from encodec.utils import convert_audio

hubert_model = CustomHubert(checkpoint_path='data/models/hubert/hubert.pt')

wav, sr = torchaudio.load('audio.wav')
if wav.shape[0] == 2:
    wav = wav.mean(0, keepdim=True)
semantic_vectors = hubert_model.forward(wav, input_sample_hz=sr)
tokenizer = CustomTokenizer.load_from_checkpoint('data/models/hubert/tokenizer.pth')
semantic_tokens = tokenizer.get_token(semantic_vectors)
model = EncodecModel.encodec_model_24khz()
model.set_target_bandwidth(6.0)
wav, sr = torchaudio.load('audio.wav')
wav = convert_audio(wav, sr, model.sample_rate, model.channels)
wav = wav.unsqueeze(0)
with torch.no_grad():
    encoded_frames = model.encode(wav)
codes = torch.cat([encoded[0] for encoded in encoded_frames], dim=-1).squeeze()
fine_prompt = codes
coarse_prompt = fine_prompt[:2, :]
np.savez('helloWorld.npz', semantic_prompt=semantic_tokens, fine_prompt=fine_prompt, coarse_prompt=coarse_prompt)

Upload model/index in RVC demo issue

when i press on the icon to upload i receive this error
File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 437, in run_predict
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1352, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1077, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/content/tts-generation-webui/src/tortoise/gr_reload_button.py", line 24, in
fn=lambda: open_folder(dirname),
File "/content/tts-generation-webui/src/history_tab/open_folder.py", line 12, in open_folder
subprocess.check_call(["xdg-open", "--", folder_path])
File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['xdg-open', '--', '/content/tts-generation-webui/data/models/rvc/checkpoints']' returned non-zero exit status 1.
xdg-open: unexpected option '--'
Try 'xdg-open --help' for more information.
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 437, in run_predict
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1352, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1077, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/content/tts-generation-webui/src/tortoise/gr_reload_button.py", line 24, in
fn=lambda: open_folder(dirname),
File "/content/tts-generation-webui/src/history_tab/open_folder.py", line 12, in open_folder
subprocess.check_call(["xdg-open", "--", folder_path])
File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['xdg-open', '--', '/content/tts-generation-webui/data/models/rvc/checkpoints']' returned non-zero exit status 1.

Tortoise TTS: Length and Repetition Penalty at low values makes text too long?

When I put Length and Repetition values at a lower value, command line tells me I need to shorten my text, but I barely put any in.
Is there a way to fix this?

I can clone a voice but not render a TTS

I have managed to get it up and running using Docker Install. However, while I can clone a voice in Bark, I am unable to render a TTS on my clone, or the preinstalled voice, either on Bark or TTS. The attached is a pic of the screen, which just hangs there...

I would really appreciate any pointers. Thanks

Here is a copy of the logs:

2023-08-01 18:01:47
2023-08-01 18:01:47 ==========
2023-08-01 18:01:47 == CUDA ==
2023-08-01 18:01:47 ==========
2023-08-01 18:01:47
2023-08-01 18:01:47 CUDA Version 11.8.0
2023-08-01 18:01:47
2023-08-01 18:01:47 Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
2023-08-01 18:01:47
2023-08-01 18:01:47 This container image and its contents are governed by the NVIDIA Deep Learning Container License.
2023-08-01 18:01:47 By pulling and using the container, you accept the terms and conditions of this license:
2023-08-01 18:01:47 https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
2023-08-01 18:01:47
2023-08-01 18:01:47 A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
2023-08-01 18:01:47
2023-08-01 18:01:58 Loading extensions:
2023-08-01 18:01:58 Loaded extension: callback_save_generation_musicgen_ffmpeg
2023-08-01 18:01:58 Loaded extension: callback_save_generation_ffmpeg
2023-08-01 18:01:58 Loaded extension: empty_extension
2023-08-01 18:01:58 Loaded 2 callback_save_generation extensions.
2023-08-01 18:01:58 Loaded 1 callback_save_generation_musicgen extensions.
2023-08-01 18:01:58 /app/tts-generation-webui/src/rvc_tab/rvc_tab.py
2023-08-01 18:01:58 Starting Gradio server...
2023-08-01 18:01:58 Gradio interface options:
2023-08-01 18:01:58 inline: False
2023-08-01 18:01:58 inbrowser: True
2023-08-01 18:01:58 share: False
2023-08-01 18:01:58 debug: False
2023-08-01 18:01:58 enable_queue: True
2023-08-01 18:01:58 max_threads: 40
2023-08-01 18:01:58 auth: None
2023-08-01 18:01:58 auth_message: None
2023-08-01 18:01:58 prevent_thread_lock: False
2023-08-01 18:01:58 show_error: False
2023-08-01 18:01:58 server_name: 0.0.0.0
2023-08-01 18:01:58 server_port: None
2023-08-01 18:01:58 show_tips: False
2023-08-01 18:01:58 height: 500
2023-08-01 18:01:58 width: 100%
2023-08-01 18:01:58 favicon_path: None
2023-08-01 18:01:58 ssl_keyfile: None
2023-08-01 18:01:58 ssl_certfile: None
2023-08-01 18:01:58 ssl_keyfile_password: None
2023-08-01 18:01:58 ssl_verify: True
2023-08-01 18:01:58 quiet: True
2023-08-01 18:01:58 show_api: True
2023-08-01 18:01:58 file_directories: None
2023-08-01 18:01:58 _frontend: True
2023-08-01 18:03:10 Running on local URL: http://0.0.0.0:7860
2023-08-01 18:03:10 gen old_generation_filename voices/voice_from_audio_1d36e051d27db45661738e095c0a8a9d.npz
2023-08-01 18:03:10 Loading Bark models
2023-08-01 18:03:10 - Text Generation: GPU: Yes, Small Model: Yes
2023-08-01 18:03:10 - Coarse-to-Fine Inference: GPU: Yes, Small Model: Yes
2023-08-01 18:03:10 - Fine-tuning: GPU: Yes, Small Model: Yes
2023-08-01 18:03:10 - Codec: GPU: Yes
Downloading text.pt: 14%|█▎ | 315M/2.32G [07:22<54:57, 607kB/s] /venv/lib/python3.10/site-packages/gradio/processing_utils.py:171: UserWarning: Trying to convert audio automatically from float32 to 16-bit int format.
2023-08-01 14:49:40 warnings.warn(warning.format(data.dtype))
Downloading text.pt: 19%|█▊ | 430M/2.32G [09:55<45:48, 686kB/s]
Downloading text_2.pt: 6%|▌ | 315M/5.35G [08:46<2:31:37, 554kB/s]

"bark" and "tortoise"

Is this two separate things, "bark" and "tortoise" or are you planning to somehow combine them so that "bark" can use the "tortoise" model?

RuntimeError: Input type (MPSHalfType) and weight type (torch.HalfTensor) should be the same for RVC with mps

Hi, I faced this issue when I tried to use RVC with mps (mac m2), and I got this issue.

没有发现支持的N卡, 使用MPS进行推理

logs...

RuntimeError: Input type (MPSHalfType) and weight type (torch.HalfTensor) should be the same

This is current GUI setting

Can you please check this issue?

Installation on cloud gpu

Hey, i would like to run this somewhere in the cloud. Either on google colab, or a runpod.io container (that also has jupyter notebooks available).

Now the one click installer that is recommended, isn't very helpful in that case. I'd like to run this in a jupyter/colab notebook like people do with the other WebUIs out there. Is that an intended use case? Are there instructions floating around or do i have to adapt the manual installation instructions?

Regarding those manual installation: What do you mean with number 8: "clone the repos in the ./models/ directory and install requirements under them". I can't find a directory with that name. do you mean "./data/models/"? And what do i clone there? Can you please specify the repositories somewhere or add them as git submodules?

I think all the other instructions could simply be included in such a notebook. Thanks.

not working

This script relies on Miniconda which can not be silently installed under a path with spaces.
Press any key to continue . . .

RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'

I tried use RVC Beta Demo so i get this error.
"RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'"
How can i fix this?

Specifications:

f0 Up key = 0
f0 Method = harvest
Index Rate = 0,66
Filter Radius = 3
Resample SR = 0
RMS Mix Rate = 1
Protect = 0,33

Hubert
-Device = CPU (I did the installation for "CPU only")

Use half precision model (Depends on GPU support) = [ ](not check)

rsxdalv / tts-generation-webui Goto Github PK

tts-generation-webui's Introduction

TTS Generation WebUI

Videos

Changelog

July 2024

June 2024

April 2024

March 2024

February 2024

January 2024

2023

October 2023

September 2023

August 2023

July 2023

June 2023

May 2023

Upgrading

Upgrading from v6 to new installer

Recommended: Fresh install

In-place upgrade, can delete some files, tweaks

Upgrading from v5 to v6 installer

Is there any more optimal way to do this?

New Installer

Manual installation (not recommended)

React UI

Docker Setup

Building the image yourself

Extra Voices for Bark, Prompt Samples

Bark Readme

Info about managing models, caches and system space for AI projects

Screenshots

Examples

Open Source Libraries

Ethical and Responsible Use

License

Codebase and Dependencies

Model Weights

Compatibility / Errors

Torch being reinstalled

Red messages in console

Configuration Guide

Model Configuration

Gradio Interface Options

tts-generation-webui's People

Contributors

Stargazers

Watchers

Forkers

tts-generation-webui's Issues

Recommend Projects

Recommend Topics

Recommend Org