Comments (6)

rafaelvalle avatar rafaelvalle commented on August 15, 2024 3

Yeah, it matches.

from tacotron2.

yliess86 avatar yliess86 commented on August 15, 2024 2

@rafaelvalle It is now working. Thank you for the help!
When I finish my prototype, I will probably retrain both models with the same mel representation.

rafaelvalle avatar rafaelvalle commented on August 15, 2024

The representation of the mel-spectrograms output by the Tacotron 2 model you trained does not match the mel-spectrogram used in r9y9's MoL WaveNet. More specifically, the minimum and maximum mel-spectrogram frequencies are different.

The code below converts a mel-spectrogram produced with the default mel-spectrogram representation in this repo to the representation used in r9y9's shared WaveNet MoL. Ideally one would train Tacotron 2 and WaveNet with the same mel representation, especially the same minimum and maximum mel frequencies.

import numpy as np
import torch
from layers import TacotronSTFT  # layers.py from this repo

# load mel file output by Tacotron 2 and add a batch dimension
mel = torch.from_numpy(np.load('mel_spec.npy'))[None, :]

# Tacotron 2 Training Params
filter_length = 1024
hop_length = 256
win_length = 1024
sampling_rate = 22050
mel_fmin = 0.0
mel_fmax = None
taco_stft = TacotronSTFT(
    filter_length, hop_length, win_length, 
    sampling_rate=sampling_rate, mel_fmin=mel_fmin, 
    mel_fmax=mel_fmax)

# Project from Mel-Spectrogram to Spectrogram
mel_decompress = taco_stft.spectral_de_normalize(mel)
mel_decompress = mel_decompress.transpose(1, 2).data.cpu()
spec_from_mel_scaling = 1000
spec_from_mel = torch.mm(mel_decompress[0], taco_stft.mel_basis)
spec_from_mel = spec_from_mel.transpose(0, 1)
spec_from_mel = spec_from_mel * spec_from_mel_scaling

# r9y9's WaveNet Training Params
filter_length = 1024
hop_length = 256
win_length = 1024
sampling_rate = 22050
mel_fmin = 125
mel_fmax = 7600

taco_stft_other = TacotronSTFT(
    filter_length, hop_length, win_length, 
    sampling_rate=sampling_rate, mel_fmin=mel_fmin, mel_fmax=mel_fmax)

# Project from Spectrogram to r9y9's WaveNet Mel-Spectrogram
mel_minmax = taco_stft_other.spectral_normalize(
    torch.matmul(taco_stft_other.mel_basis, spec_from_mel))
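To make the frequency-range mismatch concrete, here is a minimal NumPy-only sketch that builds two triangular mel filterbanks, one with the Tacotron 2 range (0 Hz to half the sampling rate) and one with the 125 to 7600 Hz range used above. The helper `mel_filterbank` is hypothetical and assumes the HTK-style mel formula with unnormalized triangles; the filterbank inside `TacotronSTFT` may use a different mel formula and normalization, so treat this as an illustration rather than a reimplementation.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel formula (an assumption made for this sketch)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels, fmin, fmax=None):
    """Hypothetical helper: triangular filterbank, shape (n_mels, n_fft // 2 + 1)."""
    fmax = sr / 2.0 if fmax is None else fmax
    # n_mels + 2 band edges, evenly spaced on the mel scale
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    basis = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, ctr, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        # each filter is a triangle that is zero outside [lo, hi]
        basis[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return basis

sr, n_fft, n_mels = 22050, 1024, 80
fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)

taco_basis = mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None)
wavenet_basis = mel_filterbank(sr, n_fft, n_mels, fmin=125.0, fmax=7600.0)

# Frequency span covered by the highest mel channel in each representation
taco_top = fft_freqs[np.nonzero(taco_basis[-1])[0]]
wavenet_top = fft_freqs[np.nonzero(wavenet_basis[-1])[0]]
print("Tacotron 2 top channel (Hz):", taco_top.min(), taco_top.max())
print("WaveNet top channel (Hz):   ", wavenet_top.min(), wavenet_top.max())
```

Channel index 79 therefore refers to a different frequency band in each representation, which is why a mel-spectrogram has to be projected back to a linear spectrogram and re-projected before it is fed to a vocoder trained on the other range.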

rafaelvalle avatar rafaelvalle commented on August 15, 2024

The first few frames of the mel-spectrogram you provided sound like this:
yliess86_audio_trim.wav.zip

MXGray avatar MXGray commented on August 15, 2024

@rafaelvalle
Could you help me figure out whether the default Tacotron 2 hparams of this repo match the nv-wavenet hparams I used below?
If they do not, how can I make them match?

config.json of nv-wavenet/pytorch:

"data_config": {
    "training_files": "train_files.txt",
    "segment_length": 22050,
    "mu_quantization": 256,
    "filter_length": 1024,
    "hop_length": 256,
    "win_length": 1024,
    "sampling_rate": 22050
},

"dist_config": {
    "dist_backend": "nccl",
    "dist_url": "tcp://localhost:54321"
},

"wavenet_config": {
    "n_in_channels": 256,
    "n_layers": 16,
    "max_dilation": 128,
    "n_residual_channels": 64,
    "n_skip_channels": 256,
    "n_out_channels": 256,
    "n_cond_channels": 80,
    "upsamp_window": 1024,
    "upsamp_stride": 256
}

}

And, these are my Tacotron2 hparams:

    # Data Parameters             #
    load_mel_from_disk=False,
    training_files='filelists/ljs_audio_text_train_filelist.txt',
    validation_files='filelists/ljs_audio_text_val_filelist.txt',
    text_cleaners=['english_cleaners'],
    sort_by_length=False,

    # Audio Parameters             #
    max_wav_value=32768.0,
    sampling_rate=22050,
    filter_length=1024,
    hop_length=256,
    win_length=1024,
    n_mel_channels=80,
    mel_fmin=0.0,
    mel_fmax=None,  # if None, half the sampling rate

    # Model Parameters             #
    n_symbols=len(symbols),
    symbols_embedding_dim=512,

    # Encoder parameters
    encoder_kernel_size=5,
    encoder_n_convolutions=3,
    encoder_embedding_dim=512,

    # Decoder parameters
    n_frames_per_step=1,  # currently only 1 is supported
    decoder_rnn_dim=1024,
    prenet_dim=256,
    max_decoder_steps=1000,
    gate_threshold=0.6,

    # Attention parameters
    attention_rnn_dim=1024,
    attention_dim=128,

    # Location Layer parameters
    attention_location_n_filters=32,
    attention_location_kernel_size=31,

    # Mel-post processing network parameters
    postnet_embedding_dim=512,
    postnet_kernel_size=5,
    postnet_n_convolutions=5,

    # Optimization Hyperparameters #
    use_saved_learning_rate=False,
    learning_rate=1e-3,
    weight_decay=1e-6,
    grad_clip_thresh=1,
    batch_size=12,
    mask_padding=False  # set model's padded outputs to padded values
)

Would greatly appreciate your help. Thanks!
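One way to approach this kind of question is to compare the audio-related values programmatically. The sketch below copies the relevant values from the two configs quoted above into plain dictionaries and reports any differences. The helper `find_mismatches` is hypothetical, and the pairings of `n_mel_channels` with `n_cond_channels` and of `hop_length` with `upsamp_stride` are assumptions about how the conditioning is wired, not something confirmed in this thread.

```python
# Values copied from the config.json excerpt and the hparams quoted above.
wavenet_data_config = {
    "filter_length": 1024,
    "hop_length": 256,
    "win_length": 1024,
    "sampling_rate": 22050,
}
wavenet_config = {"n_cond_channels": 80, "upsamp_window": 1024, "upsamp_stride": 256}
tacotron_hparams = {
    "sampling_rate": 22050,
    "filter_length": 1024,
    "hop_length": 256,
    "win_length": 1024,
    "n_mel_channels": 80,
    "mel_fmin": 0.0,
    "mel_fmax": None,
}

def find_mismatches(taco, wn_data, wn_model):
    """Hypothetical helper: return (mismatches, keys that could not be checked)."""
    problems = []
    # STFT settings that appear under the same name in both configs
    for key in ("sampling_rate", "filter_length", "hop_length", "win_length"):
        if taco[key] != wn_data[key]:
            problems.append(f"{key}: {taco[key]} != {wn_data[key]}")
    # Assumption: the vocoder is conditioned on one channel per mel band
    if taco["n_mel_channels"] != wn_model["n_cond_channels"]:
        problems.append("n_mel_channels != n_cond_channels")
    # Assumption: mel frames are upsampled by one hop per frame
    if taco["hop_length"] != wn_model["upsamp_stride"]:
        problems.append("hop_length != upsamp_stride")
    # Mel frequency limits are not listed in the excerpt, so flag them
    unchecked = [k for k in ("mel_fmin", "mel_fmax") if k not in wn_data]
    return problems, unchecked

problems, unchecked = find_mismatches(tacotron_hparams, wavenet_data_config, wavenet_config)
print("mismatches:", problems)
print("not verifiable from the excerpt:", unchecked)
```

The values that can be compared agree, but `mel_fmin` and `mel_fmax` do not appear in the quoted `data_config`, so whether the mel frequency range matches cannot be decided from this excerpt alone.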

rafaelvalle avatar rafaelvalle commented on August 15, 2024

Closing. Please re-open if new issues appear!
