Fine-tuning Whisper on Catalan datasets to adapt the original model to a new domain.
- Create and activate the Python venv: `python3 -m venv /path/to/new/virtual/environment`, then `source <venv>/bin/activate`.
- Install the requirements listed in `requirements.txt`: `pip install -r requirements.txt`.
- Create an environment variable with your Hugging Face token: `export HF_TOKEN=hf_yOurToKEn`. You can check that it was set correctly with `printenv HF_TOKEN`.
- Run `main.py` either locally or on a cluster using SLURM (or whatever scheduler you have available).
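Before launching `main.py`, a short check can catch two common setup mistakes from the steps above: forgetting to activate the venv and forgetting to export `HF_TOKEN`. This is a minimal sketch, not part of the repository; the function names are hypothetical, and the `hf_` prefix check assumes the token format shown in the placeholder.

```python
"""Hypothetical pre-flight check to run before main.py."""
import os
import sys
from typing import List, Mapping, Optional


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base interpreter.
    return sys.prefix != sys.base_prefix


def hf_token_present(env: Optional[Mapping[str, str]] = None) -> bool:
    # Assumption: tokens start with "hf_", as in the README placeholder.
    env = os.environ if env is None else env
    return env.get("HF_TOKEN", "").startswith("hf_")


def preflight(env: Optional[Mapping[str, str]] = None) -> List[str]:
    # Collect human-readable problems instead of exiting, so the caller decides.
    problems = []
    if not in_virtualenv():
        problems.append("no virtual environment is active")
    if not hf_token_present(env):
        problems.append("HF_TOKEN is missing or does not start with 'hf_'")
    return problems
```

For example, a hypothetical guard at the top of `main.py` could print each entry returned by `preflight()` and exit if the list is non-empty.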
Warning! The following libraries may need to be installed manually from the terminal:
- `pip install accelerate -U`
- `pip install "transformers[torch]"`
This should now be fixed, but if `main.py` fails with missing-dependency errors, run the two commands above.
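To see whether the manual installs in the warning are still needed, one option is to ask the interpreter which modules it can find. This is a minimal sketch using `importlib.util.find_spec`; the `REQUIRED` mapping and helper names are hypothetical, and it only checks that the top-level modules exist, not their versions.

```python
"""Sketch: report which manually installed libraries are missing."""
import importlib.util
from typing import Dict, List

# Import names mapped to the pip commands from the Warning.
REQUIRED: Dict[str, str] = {
    "accelerate": "pip install accelerate -U",
    "transformers": 'pip install "transformers[torch]"',
    "torch": 'pip install "transformers[torch]"',
}


def missing_packages(required: Dict[str, str] = REQUIRED) -> List[str]:
    # find_spec returns None when a top-level module cannot be located.
    return [name for name in required if importlib.util.find_spec(name) is None]


def install_hints(required: Dict[str, str] = REQUIRED) -> List[str]:
    # Deduplicate commands while preserving order
    # (transformers and torch share one install command).
    hints: List[str] = []
    for name in missing_packages(required):
        cmd = required[name]
        if cmd not in hints:
            hints.append(cmd)
    return hints
```

Printing `install_hints()` lists only the commands for modules that are actually missing, so an empty list means no manual install is needed.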
The following datasets are currently supported:
- Common Voice: a crowdsourced dataset developed by Mozilla, in which volunteers record their voices through their microphones. Catalan is the most recorded language, amounting to 3,500 hours.
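As an illustration of how Common Voice Catalan could be prepared for Whisper, the sketch below uses the Hugging Face `datasets` library. It is an assumption, not necessarily what `main.py` does: the dataset ID `mozilla-foundation/common_voice_11_0` and the helper names are hypothetical choices, the dataset is gated (you must accept its terms on the Hub and have `HF_TOKEN` set), and the loading requirements may differ depending on your `datasets` version. Whisper's feature extractor expects 16 kHz audio, so the `audio` column is cast to that rate, which makes `datasets` resample clips when they are decoded.

```python
"""Hypothetical sketch: load Catalan Common Voice for Whisper fine-tuning."""
from typing import Iterable, List, Sequence

# Whisper's feature extractor expects 16 kHz input audio.
WHISPER_SAMPLING_RATE = 16_000

# Assumption: illustrative Hub dataset ID and config, not taken from main.py.
DATASET_ID = "mozilla-foundation/common_voice_11_0"
LANGUAGE = "ca"


def columns_to_drop(
    column_names: Iterable[str], keep: Sequence[str] = ("audio", "sentence")
) -> List[str]:
    # Keep only the audio and its transcript; drop metadata such as client_id.
    return [c for c in column_names if c not in keep]


def load_common_voice_catalan(split: str = "train"):
    # Imported lazily so this module can be imported without `datasets`.
    # huggingface_hub reads HF_TOKEN from the environment for gated access.
    from datasets import Audio, load_dataset

    ds = load_dataset(DATASET_ID, LANGUAGE, split=split)
    ds = ds.remove_columns(columns_to_drop(ds.column_names))
    # Casting the column resamples each clip to 16 kHz when it is decoded.
    return ds.cast_column("audio", Audio(sampling_rate=WHISPER_SAMPLING_RATE))
```

Calling `load_common_voice_catalan("train")` downloads the split, so it needs network access and can take a while for a dataset of this size.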