Comments (3)
How many speakers do you have? If you are fine-tuning on a single-speaker dataset, you do not need a reference at all: you can set the multispeaker flag to false and skip loading the pretrained diffusion model. Otherwise you need a reference, just as with the base model, because the model has to know which target speaker to synthesize. If you do not want to supply a reference, you can instead hard-code the speaker embedding as part of the model weights.
from styletts2.
My dataset has one speaker, but LibriTTS has many. I fine-tuned from the LibriTTS model you shared. I will try setting the multispeaker flag to false.
It should work even if you set the multispeaker flag to true. You just need an arbitrary reference audio clip from the training set, and you can save it as part of the parameters. For example,
import time

import torch
import IPython.display as ipd

# compute_style, inference, and device are assumed to be defined earlier,
# as in the StyleTTS2 inference demo.
text = '''Maltby and Company would issue warrants on them deliverable to the importer, and the goods were then passed to be stored in neighboring warehouses.
'''

# Any single clip from the training set can serve as the reference.
reference_dicts = {}
reference_dicts['LJSpeech'] = "data/LJSpeech-1.1/wavs/LJ001-0001.wav"

noise = torch.randn(1,1,256).to(device)
for k, path in reference_dicts.items():
    start = time.time()  # time each reference separately
    ref_s = compute_style(path)
    wav = inference(text, ref_s, alpha=0.9, beta=0.9, diffusion_steps=10, embedding_scale=1)
    # Real-time factor: processing time divided by audio duration at 24 kHz
    rtf = (time.time() - start) / (len(wav) / 24000)
    print(f"RTF = {rtf:5f}")
    print(k + ' Synthesized:')
    ipd.display(ipd.Audio(wav, rate=24000, normalize=False))
    print('Reference:')
    ipd.display(ipd.Audio(path, rate=24000, normalize=False))
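One possible way to "save this as a part of the parameters" is to keep the precomputed style vector in a PyTorch buffer, so that it travels with the checkpoint and no reference audio is needed at inference time. The following is only a minimal sketch, not the repository's own mechanism: `FixedStyle` is a hypothetical helper, the random tensor stands in for the output of `compute_style(path)`, and the 1 x 256 shape is an assumption.

```python
import io

import torch
import torch.nn as nn


class FixedStyle(nn.Module):
    """Hypothetical holder for one precomputed style vector."""

    def __init__(self, ref_s: torch.Tensor):
        super().__init__()
        # A buffer is written to state_dict() but is not a trainable
        # parameter, so the optimizer never updates it.
        self.register_buffer("ref_s", ref_s.detach().clone())

    def forward(self) -> torch.Tensor:
        return self.ref_s


# Stand-in for compute_style(path); the shape is an assumption.
ref_s = torch.randn(1, 256)
holder = FixedStyle(ref_s)

# Round-trip through an in-memory checkpoint to show the vector is saved.
buf = io.BytesIO()
torch.save(holder.state_dict(), buf)
buf.seek(0)

restored = FixedStyle(torch.zeros(1, 256))
restored.load_state_dict(torch.load(buf))
```

After loading, `restored()` returns the stored vector, which could then be passed wherever `ref_s` is expected.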