I read your impressive paper and thank you for releasing the training script. I am trying to reproduce the results on DSing (train30), but I am running into some problems.
My training overfits quickly. I compared my train_log.txt with yours and found that the training losses are in the same range, but my validation losses and WER/CER are much higher. I suspect that is why the lr scheduler reduces the lr faster than expected, which makes the overfitting worse. Below is my training log for the fine-tuning experiment:
epoch: 1, lr_model: 3.00e-04, lr_wav2vec: 1.00e-05 - train loss: 1.55 - valid loss: 1.23, valid ctc_loss: 2.15, valid seq_loss: 1.00, valid CER: 33.66, valid WER: 49.62
epoch: 2, lr_model: 3.00e-04, lr_wav2vec: 1.00e-05 - train loss: 1.30 - valid loss: 1.33, valid ctc_loss: 2.73, valid seq_loss: 9.79e-01, valid CER: 62.26, valid WER: 93.71
epoch: 3, lr_model: 2.40e-04, lr_wav2vec: 9.00e-06 - train loss: 1.22 - valid loss: 1.46, valid ctc_loss: 3.29, valid seq_loss: 1.00, valid CER: 90.94, valid WER: 1.45e+02
epoch: 4, lr_model: 1.92e-04, lr_wav2vec: 8.10e-06 - train loss: 1.18 - valid loss: 1.47, valid ctc_loss: 3.54, valid seq_loss: 9.49e-01, valid CER: 99.51, valid WER: 1.55e+02
epoch: 5, lr_model: 1.54e-04, lr_wav2vec: 7.29e-06 - train loss: 1.15 - valid loss: 1.39, valid ctc_loss: 3.18, valid seq_loss: 9.40e-01, valid CER: 82.79, valid WER: 1.19e+02
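If I read the log correctly, the lr trajectory above (3.00e-04 → 2.40e-04 → 1.92e-04 → 1.54e-04) is consistent with a NewBob-style scheduler that multiplies the lr by 0.8 whenever the validation loss fails to improve. Here is a minimal sketch of that behavior (my own simplified class, not the actual recipe code; the 0.8 factor and the improvement threshold are assumptions inferred from the log):

```python
class NewBobScheduler:
    """Minimal NewBob-style scheduler sketch: anneal the lr
    multiplicatively whenever validation loss stops improving."""

    def __init__(self, lr, annealing_factor=0.8, improvement_threshold=0.0025):
        self.lr = lr
        self.factor = annealing_factor
        self.threshold = improvement_threshold
        self.prev_loss = None

    def step(self, valid_loss):
        # Relative improvement over the previous epoch's validation loss.
        if self.prev_loss is not None:
            improvement = (self.prev_loss - valid_loss) / self.prev_loss
            if improvement < self.threshold:
                self.lr *= self.factor  # anneal when not improving enough
        self.prev_loss = valid_loss
        return self.lr
```

Feeding it my validation losses (1.23, 1.33, 1.46, ...) reproduces exactly the lr_model sequence in the log, so the scheduler itself seems to be reacting correctly to the bad validation loss rather than causing it.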
At first, I thought there was something wrong with my dev set. I ran inference on my dev and test sets using the checkpoint you provided, and it gives WER/CER similar to what you reported. Now I am confused and want to ask for help. Any insights would be appreciated.
I prepared my dev set with my own script, which should do the same thing as the Kaldi recipe, except that some problematic files are excluded. I ended up with 408 songs, a subset of the standard 482.
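For reference, the exclusion step is roughly this (a hypothetical sketch, not my actual script; the `DSing/dev` path is a placeholder, and a real check might use soundfile instead of the stdlib `wave` module, which only reads PCM WAV):

```python
import pathlib
import wave

def is_readable(path):
    """Return True if the WAV file opens and contains at least one frame."""
    try:
        with wave.open(str(path), "rb") as f:
            return f.getnframes() > 0
    except (wave.Error, OSError):
        return False

# Keep only the files that decode cleanly; the rest are the
# "problematic files" excluded from the 482-song dev set.
dev_songs = [p for p in pathlib.Path("DSing/dev").glob("*.wav")
             if is_readable(p)]
```

Happy to share the full script or the list of excluded files if that helps diagnose the gap.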