zac2022-lyric-alignment's Introduction

Solution for Zalo AI Challenge 2022 - Lyrics Alignment

Requirements

pip install -r requirements.txt

Overview

Using Demucs to separate the vocals (lyrics) from the music in the original audio.
Resampling the original audio to 16 kHz.
Creating a new vocabulary dictionary for Wav2Vec2.
Randomly selecting segments from the labels and merging them to create new audio/lyric pairs.
Fine-tuning the Wav2Vec2 model with the original CTC loss on all training data, using the new vocabulary dictionary.
Using forced alignment (dynamic programming) to find the best alignment path between audio and lyrics.
Merging character durations to obtain the word segment boundaries in the audio.
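
The last two steps can be illustrated with a minimal, pure-Python sketch. It is not the repository's implementation: the function names `force_align` and `merge_words`, the toy vocabulary, the `|` word delimiter, and the 20 ms frame duration (the usual stride of Wav2Vec2 base at 16 kHz) are all assumptions made for illustration. The sketch runs a Viterbi search over the CTC state sequence (blank, char, blank, char, ...) and then merges character spans into word spans.

```python
import math

NEG_INF = -math.inf

def force_align(log_probs, tokens, blank=0):
    """Viterbi over CTC states blank, t0, blank, t1, ..., blank.
    Returns one (start_frame, end_frame) per token, end exclusive."""
    states = [blank]
    for tok in tokens:
        states += [tok, blank]
    T, S = len(log_probs), len(states)
    score = [[NEG_INF] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    score[0][0] = log_probs[0][states[0]]
    if S > 1:
        score[0][1] = log_probs[0][states[1]]
    for t in range(1, T):
        for s in range(S):
            # Allowed moves: stay, advance one state, or skip the blank
            # between two different non-blank tokens.
            cands = [s]
            if s >= 1:
                cands.append(s - 1)
            if s >= 2 and states[s] != blank and states[s] != states[s - 2]:
                cands.append(s - 2)
            prev = max(cands, key=lambda p: score[t - 1][p])
            if score[t - 1][prev] > NEG_INF:
                score[t][s] = score[t - 1][prev] + log_probs[t][states[s]]
                back[t][s] = prev
    # A valid path ends in the final blank or the final token.
    last = S - 1
    if S > 1 and score[T - 1][S - 2] > score[T - 1][S - 1]:
        last = S - 2
    if score[T - 1][last] == NEG_INF:
        raise ValueError("too few frames for this transcript")
    path = [last]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    spans = [None] * len(tokens)
    for t, s in enumerate(path):
        if s % 2 == 1:  # odd states hold the transcript tokens
            i = s // 2
            start = spans[i][0] if spans[i] else t
            spans[i] = (start, t + 1)
    return spans

def merge_words(chars, spans, frame_ms=20, delimiter="|"):
    """Merge per-character frame spans into (word, start_ms, end_ms)."""
    words, current, start, end = [], "", None, None
    for ch, (s, e) in zip(chars, spans):
        if ch == delimiter:
            if current:
                words.append((current, start * frame_ms, end * frame_ms))
            current, start = "", None
        else:
            if start is None:
                start = s
            current += ch
            end = e
    if current:
        words.append((current, start * frame_ms, end * frame_ms))
    return words

# Toy example (hypothetical vocabulary): id 0 is the CTC blank.
vocab = {"a": 1, "b": 2, "|": 3}
chars = list("a|b")
tokens = [vocab[c] for c in chars]

def toy_frame(best, size=4):
    return [math.log(0.91 if k == best else 0.03) for k in range(size)]

# Six frames whose most likely symbols are: a, a, |, blank, b, b
log_probs = [toy_frame(k) for k in (1, 1, 3, 0, 2, 2)]
spans = force_align(log_probs, tokens)
words = merge_words(chars, spans)
```

In a real pipeline, `log_probs` would be the log-softmax output of the fine-tuned model for one song, and `tokens` the lyric characters mapped through the new vocabulary dictionary.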

Reproduce

Prepare Dataset

Download data here and prepare a dataset in the following format:

|- data/
|   |- public_test/
|       |- lyrics/
|       |- new_labels_json/
|       |- songs/
|   |- train/
|       |- labels/
|       |- songs/
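
Before training, it can help to confirm that the folders match the layout above. The following is a small sketch using only the standard library; the helper name `missing_dirs` and the default root `data` are assumptions for illustration and are not part of the repository.

```python
from pathlib import Path

# Sub-directories expected under data/, taken from the layout above.
EXPECTED_DIRS = [
    "public_test/lyrics",
    "public_test/new_labels_json",
    "public_test/songs",
    "train/labels",
    "train/songs",
]

def missing_dirs(root="data"):
    """Return the expected sub-directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]

if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```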

Training

sh reproduce.sh

You can also download and extract our checkpoints here, which gives the following layout:

|- checkpoints/
|   |- dragonSwing/
|       |- wav2vec2-base-vietnamese/
|           |- checkpoint-5500/
|               |- pytorch_model.bin

Make A Submission

python submission.py submission --saved_path ./result
zip -r submit.zip result/*.json

zac2022-lyric-alignment's Issues

Size mismatch when copying a param from checkpoint

Hello team Telegram, I ran the submission and this error appeared:

Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Wav2Vec2ForCTCV2: size mismatch for lm_head.weight: copying a param with shape torch.Size([109, 768]) from checkpoint, the shape in current model is torch.Size([98, 768]).
size mismatch for lm_head.bias: copying a param with shape torch.Size([109]) from checkpoint, the shape in current model is torch.Size([98]).

I would like to ask how I can fix this error. Thank you.
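
One possible reading of this error, offered as a sketch rather than a confirmed fix: in a CTC head the first dimension of `lm_head.weight` normally equals the vocabulary size, so the message suggests that the checkpoint was trained with a 109-token vocabulary while the model was built locally from a 98-token one. The helper below is hypothetical (not part of the repository) and simply lists parameters whose shapes differ; the shapes used in the example are copied from the error message above.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Map parameter name -> (checkpoint shape, model shape) where they differ."""
    return {
        name: (tuple(ckpt), tuple(model_shapes[name]))
        for name, ckpt in checkpoint_shapes.items()
        if name in model_shapes and tuple(ckpt) != tuple(model_shapes[name])
    }

# With PyTorch installed, the shape dictionaries could be built like this:
#   state = torch.load("pytorch_model.bin", map_location="cpu")
#   checkpoint_shapes = {k: tuple(v.shape) for k, v in state.items()}
#   model_shapes = {k: tuple(v.shape) for k, v in model.state_dict().items()}
checkpoint_shapes = {"lm_head.weight": (109, 768), "lm_head.bias": (109,)}
model_shapes = {"lm_head.weight": (98, 768), "lm_head.bias": (98,)}

mismatches = find_shape_mismatches(checkpoint_shapes, model_shapes)
for name, (ckpt, model) in mismatches.items():
    print(f"{name}: checkpoint {ckpt} vs model {model}")
```

If only the `lm_head` parameters differ, as here, comparing the number of entries in the local vocabulary file with the one used to train the checkpoint is a reasonable first check.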
