Comments (10)
Share an image of the training and validation loss, the rhythm (alignment map), and the pitch contour (f0) used during inference so that we can investigate better.
from mellotron.
Here's the image of the training and validation loss and the rhythm (alignment map) and the pitch contour (f0) used during inference.
The style transfer results will likely get better considering how early in training this is (checkpoint at 35500 iterations; I also have one at 36000, but it couldn't generate any sound for some reason). However, the singing voice results are insanely loud and clipping most of the time, which is weird because I don't remember it being like this with the previous dataset I tested.
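For anyone who wants to put a number on "loud and clipping", here is a minimal sketch that measures the peak level and the share of clipped samples in a generated waveform. It assumes the audio has already been loaded as floats in [-1.0, 1.0]; the function names and the `clip_level` threshold are made up for illustration and are not part of mellotron.

```python
import math

def peak_and_clip_ratio(samples, clip_level=0.999):
    """Return (peak amplitude, fraction of samples at or above clip_level).

    `samples` is assumed to be a sequence of floats in [-1.0, 1.0].
    """
    if not samples:
        return 0.0, 0.0
    magnitudes = [abs(s) for s in samples]
    peak = max(magnitudes)
    clipped = sum(1 for m in magnitudes if m >= clip_level)
    return peak, clipped / len(magnitudes)

def peak_normalize(samples, target_peak=0.9):
    """Scale samples so the loudest one sits at target_peak (no-op on silence)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]

# Synthetic example: a sine wave driven 50% too hot, then hard-clipped.
hot = [max(-1.0, min(1.0, 1.5 * math.sin(2 * math.pi * i / 100))) for i in range(1000)]
peak, ratio = peak_and_clip_ratio(hot)
quieter = peak_normalize(hot)
```

Note that peak normalization only lowers the level; it cannot restore a waveform that was already clipped when it was generated.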
I have shared the entire dataset here if you would like to check, thanks!
P.S.: Now that I think about it, I'm using only upper-case letters for the transcription, could this affect the results compared to having lower-case letters?
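In case it helps with testing this, a minimal sketch for producing a lower-case copy of the transcriptions. It assumes a pipe-separated filelist with the text in the second field (`path|text|speaker`); the helper name and the example lines are hypothetical. Whether case actually changes the results is not something this snippet answers.

```python
def lowercase_transcripts(lines, sep="|", text_field=1):
    """Lower-case only the transcription field of each filelist line."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split(sep)
        if len(fields) > text_field:
            fields[text_field] = fields[text_field].lower()
        out.append(sep.join(fields))
    return out

# Hypothetical filelist entries, not taken from the shared dataset.
example = ["wavs/0001.wav|HELLO WORLD|0", "wavs/0002.wav|Make It GREAT|0"]
converted = lowercase_transcripts(example)
```

Paths and speaker IDs are left untouched, so the converted list can be written out as a second filelist and compared against the original in a short training run.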
Set the smoothing factor of your validation loss to 0 and zoom in so that we can take a look at the curve. My suspicion is that you're not picking the model with the lowest validation loss.
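A rough sketch of both points, assuming the raw (unsmoothed) validation losses are available per iteration. The `ema` function is a simplified exponential moving average meant only to show why a smoothing weight of 0 leaves the curve untouched, not a reproduction of any plotting tool's exact formula. All numbers and names below are made up.

```python
def ema(values, weight):
    """Simplified exponential moving average; weight=0 returns the raw values."""
    smoothed = []
    last = values[0] if values else 0.0
    for v in values:
        last = weight * last + (1.0 - weight) * v
        smoothed.append(last)
    return smoothed

def best_checkpoint(val_losses, saved_iterations):
    """Pick the saved iteration with the lowest raw validation loss."""
    candidates = {it: loss for it, loss in val_losses.items() if it in saved_iterations}
    if not candidates:
        raise ValueError("no saved checkpoint has a recorded validation loss")
    return min(candidates, key=candidates.get)

# Hypothetical values for illustration only.
val_losses = {1000: 0.50, 2000: 0.39, 3000: 0.45, 4000: 0.41}
saved = {1000, 3000, 4000}  # iteration 2000 was never written to disk
raw = list(val_losses.values())
unsmoothed = ema(raw, 0.0)
heavily_smoothed = ema(raw, 0.9)
chosen = best_checkpoint(val_losses, saved)
```

With heavy smoothing the dip at a single iteration is flattened out, which is why the raw curve is the one to look at when choosing a checkpoint.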
Also, would love to hear singing samples from Donald Trump if you have them :-)
I went for a GIF; I hope it's clearer now:
No zoom: https://i.imgur.com/NsMzzTO.gif
With zoom: https://i.imgur.com/HmkV5X7.gif
Just zoom in on the long line after the big drop such that we can see the variation on that line.
Alright, I hope it's better now: https://i.imgur.com/VRHpvd9.png
Oh, and I'll definitely share some Trump samples as soon as I can get decent results; the ones I made weren't worth keeping. In the past I synthesized some songs using Trump's voice, but with a completely different method from the one used here, so I'm not sure this is the place to share them, as they're unrelated to this repo. Perhaps by email?
The training and validation curves, alignment and other things look fine.
Your data seems to have audio files from multiple speakers, although you're training it as a single speaker. This might be the source of the problem. You can also train further and check again later.
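One quick way to see how the filelist labels speakers is to count lines per speaker ID. This is only a sketch: it assumes a pipe-separated filelist with the speaker ID in the third field (`path|text|speaker`), and the helper name and example lines are hypothetical.

```python
from collections import Counter

def speaker_counts(lines, sep="|", speaker_field=2):
    """Count filelist lines per speaker ID; lines without that field are skipped."""
    counts = Counter()
    for line in lines:
        fields = line.strip().split(sep)
        if len(fields) > speaker_field:
            counts[fields[speaker_field]] += 1
    return counts

# Hypothetical entries for illustration only.
example = [
    "wavs/0001.wav|FIRST LINE|0",
    "wavs/0002.wav|SECOND LINE|0",
    "wavs/0003.wav|THIRD LINE|0",
]
counts = speaker_counts(example)
```

If this reports a single ID while the recordings contain several voices, the labels and the audio disagree, which is the mismatch described above. Listening to a sample of files is still needed to confirm it.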
Can you share your batch_size and how many steps it took to get this good result? @AndroYD84
@daxiangpanda I haven't changed any parameters other than pointing to my own dataset and training from the provided LibriTTS pretrained model. However, this time I'm using a new dataset (4750 files, total duration 2h 55m), as the old one had a few transcription errors; I manually checked that the audio transcriptions were perfect this time.
The super loud screech I experienced before was still present at the beginning but disappeared as I kept training. I'm just beginning to get listenable results at 235000 iters.
Closing due to inactivity.