Comments (5)
Hi @raghav-menon !
My experience with low-resource languages was quite similar to yours (increasing validation loss + decreasing WER). I'm speculating that this is due to overfitting to high-frequency words.
I found that these hyperparameters work pretty well for most languages in CommonVoice:
attention_dropout=0.2, activation_dropout=0.1, hidden_dropout=0.05, final_dropout=0.1, feat_proj_dropout=0.05, mask_time_prob=0.05, layerdrop=0.04, learning_rate=3e-4, batch_size=128 (can be obtained e.g. with batch_size=16, gradient_accumulation_steps=8).
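To make the recipe above concrete, here is a minimal sketch of how those values map onto a wav2vec2 fine-tuning setup (the keyword names follow transformers' Wav2Vec2Config / TrainingArguments conventions; treat this as an assumption, not a verified config):

```python
# Regularization values from the comment above, keyed by the
# Wav2Vec2Config argument names they correspond to.
model_kwargs = dict(
    attention_dropout=0.2,
    activation_dropout=0.1,
    hidden_dropout=0.05,
    final_dropout=0.1,
    feat_proj_dropout=0.05,
    mask_time_prob=0.05,
    layerdrop=0.04,
)
learning_rate = 3e-4

# batch_size=128 via gradient accumulation: the optimizer step only
# runs every `gradient_accumulation_steps` forward/backward passes,
# so the effective batch size is their product.
per_device_train_batch_size = 16
gradient_accumulation_steps = 8
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 128
```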
Also these augmentations from Audiomentations gave a bit more stability:
AddGaussianNoise(min_amplitude=0.0001, max_amplitude=0.005, p=0.2)
and PitchShift(min_semitones=-1, max_semitones=2, p=0.2)
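For intuition, here is a numpy-only sketch of what AddGaussianNoise does under the hood (not the audiomentations implementation itself — just the same idea: with probability p, add white noise with a random amplitude drawn from the given range):

```python
import numpy as np

def add_gaussian_noise(samples, min_amplitude=0.0001, max_amplitude=0.005,
                       p=0.2, rng=None):
    """With probability p, add Gaussian noise whose standard deviation
    is drawn uniformly from [min_amplitude, max_amplitude]."""
    rng = rng or np.random.default_rng(0)
    if rng.random() >= p:
        return samples  # augmentation skipped this time
    amplitude = rng.uniform(min_amplitude, max_amplitude)
    noise = rng.normal(0.0, amplitude, size=samples.shape)
    return (samples + noise).astype(samples.dtype)

clip = np.zeros(16000, dtype=np.float32)  # 1 s of silence at 16 kHz
noisy = add_gaussian_noise(clip, p=1.0)   # p=1.0 forces the augmentation
```

In practice you would wrap the real AddGaussianNoise and PitchShift in an audiomentations Compose and apply it to each training clip on the fly.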
Keeping shorter utterances should not be a problem either. It's more important to catch any incorrectly transcribed clips, since they can greatly destabilize the CTC loss (if you get inf loss with ctc_zero_infinity=False, then it's likely that some of your clips are transcribed incorrectly or the transcript is too long for the audio).
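The inf-loss condition can be made concrete: CTC assigns zero probability (hence infinite loss) whenever the label, after inserting the mandatory blanks between repeated characters, needs more frames than the encoder produces. A minimal sketch of that feasibility check (the ~50 frames per second figure assumes wav2vec2's 20 ms frame stride):

```python
def ctc_feasible(num_frames, transcript):
    """CTC needs at least one frame per label character, plus one extra
    (blank) frame between every pair of identical adjacent characters."""
    required = len(transcript) + sum(
        1 for a, b in zip(transcript, transcript[1:]) if a == b)
    return num_frames >= required

# wav2vec2 emits roughly 50 frames per second of audio (20 ms stride)
print(ctc_feasible(49, "hello"))  # 1 s clip, short label: feasible
print(ctc_feasible(3, "hello"))   # clip far too short -> inf CTC loss
```

Filtering out clips that fail this check (e.g. a long transcript attached to a truncated recording) is a cheap way to keep the loss finite without masking the problem via ctc_zero_infinity=True.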
from blog.
@raghav-menon the test data followed the same WER dynamic as the validation one.
I also had very limited success working with noisy speech (YouTube & radio). With a frozen feature encoder the model stopped converging at around 40 WER, even with hundreds of hours of speech.
At the moment, the most promising pretrained model for noisy speech is Wav2Vec-Robust (https://huggingface.co/facebook/wav2vec2-large-robust), which may or may not work for you, since the training data for it was English-only.
Hey @raghav-menon,
Did you play around with the hyper-parameters a bit to see what works / doesn't work well? One important thing to note about facebook/wav2vec2-large-xlsr-53 is that it was pretrained on read-out audio data, meaning the data was quite clean. Is this also the case for your dataset?
Also, I would definitely keep utterances that are <4s. I usually filter out only utterances that are shorter than 1s (or even keep those as well).
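That filtering step can be sketched in a couple of lines (the 1 s threshold is the one suggested above; the optional upper bound is an assumption, not part of the original advice):

```python
def keep_clip(duration_s, min_s=1.0, max_s=None):
    """Keep utterances of at least min_s seconds; clips >= 1 s are
    generally fine to keep, so only the very shortest are dropped."""
    if duration_s < min_s:
        return False
    return max_s is None or duration_s <= max_s

durations = [0.4, 1.2, 3.5, 8.0]  # per-clip durations in seconds
kept = [d for d in durations if keep_clip(d)]
print(kept)  # [1.2, 3.5, 8.0]
```

With a datasets.Dataset you would apply the same predicate via `.filter()` over a precomputed duration column.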
It's very normal that the validation loss goes up again whereas the WER continues going down.
In terms of hyperparameters, I would try to play around with the learning_rate, batch_size and hidden_dropout -> those seem to be quite correlated with the final WER. Here is a nice graphic about hyperparameters for fine-tuning the model in Turkish: https://wandb.ai/wandb/xlsr/sweeps/p23j88jo?workspace=user-borisd13
Also it might help quite a bit to use data-augmentation (@anton-l - do you maybe have a link to some good techniques here?)
Hello @patrickvonplaten,
Thank you for your response. I did play around with the hyperparameters, but with even a slight deviation from the given values, the WER remains at 1 throughout, so not much progress on that end. The data comes from real-time radio transmission recordings and hence is not of studio quality.
I will, as you have mentioned, filter out the utterances which are <1s, keep the rest, and try it. I had also tried pretraining wav2vec2 with untranscribed data, but it looks like even Colab Pro's memory is not enough.
I will let you know how it progresses.
Thanks.
Regards,
Raghav
Hello @anton-l,
Thanks for your suggestions. How did the trained model fare on the test data when you experienced increasing validation loss and decreasing WER? Just curious. I did not bother to run my model on the test data, as its final WER was 76%, far worse than my HMM-GMM system, which obtained a WER of 60. The best WER I had obtained for this data was around 50, with a TDNN architecture where I had also included a little bit of self-supervised learning. Just to let you know, my data is not studio quality, as these are real-time radio transmission recordings. I am wondering what the impact of noise is on the wav2vec2 feature extractor, since there is a huge difference in WER.
I will indeed try out your suggestions and let you know.
Thanks.
Regards,
Raghav