
german-tts's Introduction

👋 Hey, I'm Yusuf!

I'm an AI research engineer from Turkey. 📊 My work usually involves NLProc, automatic speech recognition, and neural text-to-speech. I'm passionate about efficient implementations and green AI as an abolitionist vegan. 🌱

🗞️ Timeline

The timeline below is dynamically updated with the messages I post to a Telegram bot. 🤖


monatis/clip.cpp ggerganov/llama.cpp monatis/stable-diffusion-tf-docker
unum-cloud/usearch damian0815/llama.cpp abetlen/llama-cpp-python

🤙 Some more places where you can find me


german-tts's Issues

Several typos in inference_tflite.py code and missing audio generation

diff inference_tflite.py inference_tflite_corrected.py
89a90
> processor = Processor()
136c137
< mbmelgan_interpreter = tf.lite.Interpreter(model_path=path_tombmelgan[:-6] + "tflite")
---
> mbmelgan_interpreter = tf.lite.Interpreter(model_path=path_to_melgan[:-6] + "tflite")
142c143,148
< audio = inference_tflite("Möchtest du das meiner Frau erklären? Nein? Ich auch nicht.", interpreter, mbmelgan_interpreter)
\ No newline at end of file
---
> start = time.time()
> audio = infer_tflite("Möchtest du das meiner Frau erklären? Nein? Ich auch nicht.", interpreter, mbmelgan_interpreter)
>
> duration = time.time() - start
> print(f"it took {duration} secs")
> wavfile.write("sample.wav", 22050, audio)
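
For reference, the tail of the corrected script after applying the diff above would look roughly like the sketch below; Processor, infer_tflite, interpreter, and path_to_melgan are defined earlier in inference_tflite.py and are only assumed here:

import time

import tensorflow as tf
from scipy.io import wavfile

# instantiate the text processor that the original script never created
processor = Processor()

# load the MB-MelGAN TFLite model (the path variable was misspelled as path_tombmelgan before)
mbmelgan_interpreter = tf.lite.Interpreter(model_path=path_to_melgan[:-6] + "tflite")

# synthesize, measure the elapsed time, and write the waveform to disk at 22.05 kHz
start = time.time()
audio = infer_tflite("Möchtest du das meiner Frau erklären? Nein? Ich auch nicht.",
                     interpreter, mbmelgan_interpreter)
duration = time.time() - start
print(f"it took {duration} secs")
wavfile.write("sample.wav", 22050, audio)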

Background noise in Multi-Band MelGAN

Hi, in your third note you mention that Multi-Band MelGAN causes some background noise. How can I avoid that background noise, and which optimizations do you mean?

Some guidance on training?

Thanks for sharing your training results. I'm trying to reproduce the same process and perhaps do some fine-tuning afterwards. I have now trained Tacotron2 for 94k steps but am still getting poor results. I'm listing what I have done below; would you mind pointing out what I did wrong?

  1. Cloned the TensorFlowTTS repo (after your PR was merged)
  2. Updated one line in tensorflow_tts/processor/thorsten.py to include the special characters: _letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzäüößÄÜÖ" (a quick check for this is sketched after the list)
  3. Updated tensorflow_tts/configs/tacotron2.py to add support for the 'thorsten' dataset: from tensorflow_tts.processor.thorsten import THORSTEN_SYMBOLS as thorsten_symbols ... elif dataset == "thorsten": self.vocab_size = len(thorsten_symbols)
  4. Updated examples/tacotron2/conf/tacotron2.v1.yaml to use the thorsten dataset and set batch_size to 9 for my 3070 GPU: dataset: thorsten ... batch_size: 9
  5. Set up TensorFlow GPU, ran pip install, and ran the commands to preprocess the dataset: tensorflow-tts-normalize --rootdir ./dump_thorsten --outdir ./dump_thorsten --config preprocess/thorsten_preprocess.yaml --dataset thorsten & tensorflow-tts-normalize --rootdir ./dump_thorsten --outdir ./dump_thorsten --config preprocess/thorsten_preprocess.yaml --dataset thorsten
  6. Trained Tacotron2 with command: CUDA_VISIBLE_DEVICES=0 python examples/tacotron2/train_tacotron2.py --train-dir ./dump_thorsten/train/ --dev-dir ./dump_thorsten/valid/ --outdir ./output/tacotron2/v1.thorsten/ --config ./examples/tacotron2/conf/tacotron2.v1.thorsten.yaml --use-norm 1 --mixed_precision 0
  7. Verified result with the 'german-tts-inference' notebook you shared, using my 94k checkpoint h5 and your MB-MELGAN vocoder, the new vocab_size 70, and new mapper file. Here is the modified notebook with my 94k h5: https://colab.research.google.com/drive/1VLjk3PouMlhAl0Tizko3OqH9CrNcvlzk?usp=sharing
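
A quick way to confirm that step 2 took effect is a small sanity script like the sketch below. It assumes the TensorFlowTTS AutoProcessor API; the mapper path is illustrative and should point to the thorsten mapper JSON written during preprocessing:

from tensorflow_tts.inference import AutoProcessor

# load the character-to-ID mapper produced during preprocessing (illustrative path)
processor = AutoProcessor.from_pretrained(pretrained_path="./dump_thorsten/thorsten_mapper.json")

text = "fünf größere Bäume"
ids = processor.text_to_sequence(text)

# if the umlauts were silently dropped, the ID sequence would be noticeably shorter than the text
print(len(text), len(ids))
print(ids)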

You can listen to the audio in the notebook; any help would be appreciated.
[Attachments: TensorBoard screenshot and generated audio sample]

Adding "emotional" sentences to "thorsten" dataset

Hi @monatis.
Thanks for your efforts on training models based on my free German dataset.
This is not an issue, more of an update on what I'm planning to do next with my dataset.

In mid-January 2021 I'll start recording phrases in an emotional manner (https://arxiv.org/pdf/1901.04276.pdf) and publish them as usual. Because I'm not an actor or a professional voice, I hope to read the phrases with a normal amount of emotion rather than overacting them.

Next steps:

  • Choose 300 random phrases (around 50 characters each) from the existing dataset, none of them specifically emotional
  • Record every one of these phrases in the following emotions:
    • Amused (Erfreut)
    • Angry (Wütend)
    • Disgusted (Angeekelt)
    • Sleepy (Schläfrig)
    • Surprised (Überrascht)

See also:
https://discourse.mozilla.org/t/contributing-my-german-voice-for-tts/48150/217?u=mrthorstenm

Maybe that's interesting for you too.

Umlauts are removed

I think umlaut characters (äüößÄÜÖ) are currently just removed from the input texts instead of getting their own symbol IDs or being replaced by similar ASCII encodings ('ae', 'ue', 'oe', 'ss', ...). Even though I guess the neural network learns to pronounce 'fnf' as 'fünf', I think performance could be improved by fixing this.

The background is that german_transliterate doesn't actually change the umlaut characters, even though it states that it 'replaces Unicode symbols with ASCII characters'. They are still in the string afterwards, and since there is no symbol ID for them in symbol_to_id, they are simply left out of the resulting sequence.

A solution could be to append those characters to ALL_SYMBOLS so they get their own IDs; see the illustration below. Unfortunately, the network would probably have to be retrained after this change.
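
As a minimal illustration (simplified stand-in code, not the repository's actual symbol list), this is how a symbol_to_id lookup silently drops umlauts and how extending the symbol set would fix it:

# simplified stand-in for the processor's symbol table (not the real ALL_SYMBOLS)
_letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
ALL_SYMBOLS = ["_", "~"] + list(_letters)
symbol_to_id = {s: i for i, s in enumerate(ALL_SYMBOLS)}

def text_to_sequence(text):
    # characters without an ID (including ä, ö, ü, ß) are silently skipped
    return [symbol_to_id[c] for c in text if c in symbol_to_id]

print(text_to_sequence("fünf"))  # 'ü' is dropped, so the model effectively sees "fnf"

# proposed fix: give the umlaut characters their own IDs by extending the symbol set
ALL_SYMBOLS += list("äüößÄÜÖ")
symbol_to_id = {s: i for i, s in enumerate(ALL_SYMBOLS)}
print(text_to_sequence("fünf"))  # 'ü' now maps to an ID (the network must be retrained)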

Please don't hesitate to tell me if I got something wrong and umlaut characters are being handled correctly.

[Edit: Regardless of this issue, thank you Monatis and Thorsten for this really great effort!]
