
MuAViC

A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation.

Paper

Overview

MuAViC provides:

  • 1200 hours of transcribed audio-visual speech for 9 languages (English, Arabic, German, Greek, Spanish, French, Italian, Portuguese and Russian)
  • text translations for 6 English-to-X directions and 6 X-to-English directions (X = Greek, Spanish, French, Italian, Portuguese or Russian)
MuAViC data statistics

The raw data is collected from TED/TEDx talk recordings.

Detailed statistics

Audio-Visual Speech Recognition

| Language | Code | Train Hours (H+P) | Train Speakers |
| --- | --- | --- | --- |
| English | En | 436 + 0 | 4.7K |
| Arabic | Ar | 16 + 0 | 95 |
| German | De | 10 + 0 | 53 |
| Greek | El | 25 + 0 | 113 |
| Spanish | Es | 178 + 0 | 987 |
| French | Fr | 176 + 0 | 948 |
| Italian | It | 101 + 0 | 487 |
| Portuguese | Pt | 153 + 0 | 810 |
| Russian | Ru | 49 + 0 | 238 |

Audio-Visual En-X Speech-to-Text Translation

| Direction | Code | Train Hours (H+P) | Train Speakers |
| --- | --- | --- | --- |
| English-Greek | En-El | 17 + 420 | 4.7K |
| English-Spanish | En-Es | 21 + 416 | 4.7K |
| English-French | En-Fr | 21 + 416 | 4.7K |
| English-Italian | En-It | 20 + 417 | 4.7K |
| English-Portuguese | En-Pt | 18 + 419 | 4.7K |
| English-Russian | En-Ru | 20 + 417 | 4.7K |

Audio-Visual X-En Speech-to-Text Translation

| Direction | Code | Train Hours (H+P) | Train Speakers |
| --- | --- | --- | --- |
| Greek-English | El-En | 8 + 17 | 113 |
| Spanish-English | Es-En | 64 + 114 | 987 |
| French-English | Fr-En | 45 + 131 | 948 |
| Italian-English | It-En | 48 + 53 | 487 |
| Portuguese-English | Pt-En | 53 + 100 | 810 |
| Russian-English | Ru-En | 8 + 41 | 238 |

Getting Data

We provide scripts to generate the audio/video data and AV-HuBERT training manifests for MuAViC.

First, clone this repo for the scripts:

git clone https://github.com/facebookresearch/muavic.git

Install required packages:

conda install -c conda-forge ffmpeg==4.2.2
conda install -c conda-forge sox
pip install -r requirements.txt

Then get the audio-visual speech recognition and translation data via:

python get_data.py --root-path ${ROOT} --src-lang ${SRC_LANG}

where the speech language ${SRC_LANG} is one of en, ar, de, el, es, fr, it, pt, or ru.

Generated data will be saved to ${ROOT}/muavic:

  • ${ROOT}/muavic/${SRC_LANG}/audio for processed audio files
  • ${ROOT}/muavic/${SRC_LANG}/video for processed video files
  • ${ROOT}/muavic/${SRC_LANG}/*.tsv for AV-HuBERT AVSR training manifests
  • ${ROOT}/muavic/${SRC_LANG}/${TGT_LANG}/*.tsv for AV-HuBERT AVST training manifests
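For a quick look at one of the generated manifests, here is a minimal Python sketch. It assumes the standard AV-HuBERT manifest layout (first line is the data root, then tab-separated columns: utterance id, video path, audio path, video frames, audio frames); the file path is just an example.

from pathlib import Path

def read_manifest(tsv_path):
    # first line is the data root; the rest are tab-separated entries
    lines = Path(tsv_path).read_text().splitlines()
    root = lines[0]
    entries = [line.split("\t") for line in lines[1:]]
    return root, entries

root, entries = read_manifest("muavic/en/train.tsv")  # example path
print(f"root={root}, {len(entries)} utterances")
print(entries[0])  # [id, video_path, audio_path, video_frames, audio_frames]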

Models

In the following table, we provide all end-to-end trained models mentioned in our paper:

| Task | Languages | Best Checkpoint | Dictionary | Tokenizer |
| --- | --- | --- | --- | --- |
| AVSR | ar | best_ckpt.pt | dict | tokenizer |
| AVSR | de | best_ckpt.pt | dict | tokenizer |
| AVSR | el | best_ckpt.pt | dict | tokenizer |
| AVSR | en | best_ckpt.pt | dict | tokenizer |
| AVSR | es | best_ckpt.pt | dict | tokenizer |
| AVSR | fr | best_ckpt.pt | dict | tokenizer |
| AVSR | it | best_ckpt.pt | dict | tokenizer |
| AVSR | pt | best_ckpt.pt | dict | tokenizer |
| AVSR | ru | best_ckpt.pt | dict | tokenizer |
| AVSR | ar,de,el,es,fr,it,pt,ru | best_ckpt.pt | dict | tokenizer |
| AVST | en-el | best_ckpt.pt | dict | tokenizer |
| AVST | en-es | best_ckpt.pt | dict | tokenizer |
| AVST | en-fr | best_ckpt.pt | dict | tokenizer |
| AVST | en-it | best_ckpt.pt | dict | tokenizer |
| AVST | en-pt | best_ckpt.pt | dict | tokenizer |
| AVST | en-ru | best_ckpt.pt | dict | tokenizer |
| AVST | el-en | best_ckpt.pt | dict | tokenizer |
| AVST | es-en | best_ckpt.pt | dict | tokenizer |
| AVST | fr-en | best_ckpt.pt | dict | tokenizer |
| AVST | it-en | best_ckpt.pt | dict | tokenizer |
| AVST | pt-en | best_ckpt.pt | dict | tokenizer |
| AVST | ru-en | best_ckpt.pt | dict | tokenizer |
| AVST | {el,es,fr,it,pt,ru}-en | best_ckpt.pt | dict | tokenizer |
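To poke at one of the released checkpoints, something like the following should work. This is a sketch, not an official recipe; it assumes the av_hubert checkout described in the Training section below, since fairseq needs the avhubert user dir to register the task and model classes.

from argparse import Namespace

from fairseq import checkpoint_utils, utils as fairseq_utils

# register AV-HuBERT's task/model with fairseq (path is an assumption)
fairseq_utils.import_user_module(Namespace(user_dir="av_hubert/avhubert"))

models, saved_cfg, task = checkpoint_utils.load_model_ensemble_and_task(
    ["best_ckpt.pt"]  # path to a downloaded checkpoint
)
print(type(models[0]).__name__)  # e.g. AVHubertSeq2Seq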

Demo

To try out our state-of-the-art audio-visual models with different audio and video inputs, including video recorded through the webcam or an uploaded video, check out our demo:

demo.mp4

You can read more about our model in the README file in the demo folder.

Training

For training audio-visual models, we use the AV-HuBERT framework.

  1. Clone and install AV-HuBERT in the root directory:

    $ # Clone the "muavic" branch of av_hubert's repo
    $ git clone -b muavic https://github.com/facebookresearch/av_hubert.git
    $ # Set the fairseq version
    $ cd av_hubert
    $ git submodule init
    $ git submodule update
    $ # Install av-hubert's requirements
    $ pip install -r requirements.txt
    $ # Install fairseq
    $ cd fairseq
    $ pip install --editable ./
  2. Download an AV-HuBERT pre-trained model from here.

  3. Open the training script (scripts/train.sh) and replace these variables:

    # language direction (e.g "en" or "en-fr")
    LANG=
    
    # path where output trained models will be located
    OUT_PATH= 
    
    # path to the downloaded pre-trained model
    PRETRAINED_MODEL_PATH=
  4. Run the training script:

    $ bash scripts/train.sh

Note:
All audio-visual models found here used the large_vox_iter5.pt pre-trained model.
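As a quick sanity check that the pre-trained checkpoint downloaded intact, you can load it with plain PyTorch. This sketch assumes the usual fairseq checkpoint layout (a dict with a "model" state dict), which may differ for a given release.

import torch

ckpt = torch.load("large_vox_iter5.pt", map_location="cpu")
print(sorted(ckpt.keys()))  # typically includes "model", "cfg", ...
n_params = sum(p.numel() for p in ckpt["model"].values())
print(f"{n_params / 1e6:.1f}M parameters")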

Decoding/Evaluating

To evaluate your trained model (or our trained models) against MuAViC, follow these steps:

  1. Open the decoding script (scripts/decode.sh) and replace these variables:

    # language direction (e.g "en" or "en-fr")
    LANG=???
    
    # data split (e.g "test" or "valid")
    GROUP=???
    
    # inference modality (choices: "audio", "video", "audio,video")
    MODALITIES=???
    
    # path to the trained model
    MODEL_PATH=???
    
    # path where decoding results and scores will be located
    OUT_PATH=???
  2. Run the decoding script:

    $ bash scripts/decode.sh
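If you want to re-score the decoded output yourself, a simple word error rate over hypothesis/reference text files looks like the sketch below. The file names are placeholders; adapt them to whatever decode.sh wrote under ${OUT_PATH}.

import editdistance  # pip install editdistance

def wer(refs, hyps):
    errors, total = 0, 0
    for ref, hyp in zip(refs, hyps):
        ref_words, hyp_words = ref.split(), hyp.split()
        errors += editdistance.eval(ref_words, hyp_words)
        total += len(ref_words)
    return 100.0 * errors / total

refs = open("ref.txt").read().splitlines()   # placeholder path
hyps = open("hypo.txt").read().splitlines()  # placeholder path
print(f"WER: {wer(refs, hyps):.2f}%")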

License

CC-BY-NC 4.0

Citation

@article{anwar2023muavic,
  title={MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation},
  author={Anwar, Mohamed and Shi, Bowen and Goswami, Vedanuj and Hsu, Wei-Ning and Pino, Juan and Wang, Changhan},
  journal={arXiv preprint arXiv:2303.00628},
  year={2023}
}


muavic's Issues

Multilingual AVSR model decoding and training

I downloaded the multilingual AVSR model (x_avsr) and tried to use the decoding script.
First, I ran into this error:

Traceback (most recent call last):   
  File "/usr/users/roudi/muavic/av_hubert/avhubert/infer_s2s.py", line 311, in hydra_main                                                         
    distributed_utils.call_main(cfg, main)
  File "/data/sls/u/meng/roudi/muavic/av_hubert/fairseq/fairseq/distributed/utils.py", line 369, in call_main                                     
    main(cfg, **kwargs)
  File "/usr/users/roudi/muavic/av_hubert/avhubert/infer_s2s.py", line 96, in main
    return _main(cfg, h)                                                                                                                          
  File "/usr/users/roudi/muavic/av_hubert/avhubert/infer_s2s.py", line 118, in _main                                                              
    models, saved_cfg, task = checkpoint_utils.load_model_ensemble_and_task([cfg.common_eval.path])
  File "/data/sls/u/meng/roudi/muavic/av_hubert/fairseq/fairseq/checkpoint_utils.py", line 432, in load_model_ensemble_and_task                   
    task = tasks.setup_task(cfg.task)                                                                                                             
  File "/data/sls/u/meng/roudi/muavic/av_hubert/fairseq/fairseq/tasks/__init__.py", line 39, in setup_task
    cfg = merge_with_parent(dc(), cfg)
  File "/data/sls/u/meng/roudi/muavic/av_hubert/fairseq/fairseq/dataclass/utils.py", line 490, in merge_with_parent                               
    merged_cfg = OmegaConf.merge(dc, cfg)                                                                                                         
omegaconf.errors.ConfigKeyError: Key 'add_eos' not in 'AVHubertPretrainingConfig'
        full_key: add_eos
        reference_type=Optional[AVHubertPretrainingConfig]                                                                                        
        object_type=AVHubertPretrainingConfig  

I fixed this by adding add_eos: bool = field(default=False, metadata={"help": "hack: make the multilingual model work"}) to this line: https://github.com/facebookresearch/av_hubert/blob/e8a6d4202c208f1ec10f5d41a66a61f96d1c442f/avhubert/hubert_pretraining.py#L161

I ran decoding on a few languages and noticed that the model outputs a language tag in the hypothesis (examples: <fr> (Applaudissements), <es> (Aplausos)), while the reference doesn't contain the tag.
My WERs were quite different from what's reported in the paper, but adding the language tag to the reference sentences made them comparable (removing the tag from the hypothesis instead gave worse WERs than reported). Did you use the language tag in the references for evaluation in the multilingual setting?
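For reference, the tag handling described above amounts to something like this (a sketch; the tag pattern is inferred from the examples above):

import re

LANG_TAG = re.compile(r"^<[a-z]{2}>\s*")

def strip_tag(line):
    return LANG_TAG.sub("", line)

def add_tag(line, lang):
    return f"<{lang}> {line}"

assert strip_tag("<fr> (Applaudissements)") == "(Applaudissements)"
assert add_tag("(Aplausos)", "es") == "<es> (Aplausos)"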

The model sometimes outputs the text in the wrong language (as well as the incorrect language tag). Is there a way to force output text in a certain language?

I was also wondering how to train the multilingual model (the current training script seems to handle audio in one language only). Specifically, should I add the language tag at the beginning of all the sentences, and how do you balance samples from different languages?

Empty X -> EN translations

For the X -> EN task, I noticed there are some blank/empty translations even though the source-language text has a valid sentence.

Here are the numbers of blank translations per language; the problem is mainly in a few of the test sets. How does the BLEU score computation handle this? (A counting sketch follows the list below.)

  • el train 0
  • el valid 0
  • el test 3
  • es train 0
  • es valid 0
  • es test 11
  • fr train 0
  • fr valid 0
  • fr test 1
  • it train 0
  • it valid 0
  • it test 0
  • pt train 0
  • pt valid 0
  • pt test 1
  • ru train 0
  • ru valid 0
  • ru test 8
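The counts above can be reproduced with a small helper; this sketch assumes the target translations sit one sentence per line in a plain-text file aligned with the source (the path is an example):

def count_blank(path):
    with open(path) as f:
        return sum(1 for line in f if not line.strip())

print(count_blank("muavic/es/en/test.en"))  # example path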

Questions towards hyper-parameters and the token post-processing

Dear authors,
thanks for the great work. I have two questions about the paper.
In Section 4.1 about the experimental setup, it's written:
For both AVSR and AVST, we use an English AV-HuBERT large pre-trained model [3], which is trained on the combination of LRS3-TED [8] and the English portion of VoxCeleb2 [27]. We follow [3] for fine-tuning hyper-parameters, except that we fine-tune our bilingual models to 30K updates and our multilingual AVSR model to 90K updates.

I would like to ask: how many warmup_steps, hold_steps, and decay_steps did you use, and what did you set freeze_finetune_updates to? The original configuration file for the large model has 60k updates, so these hyperparameters may need to change if max_updates is reduced to 30k.
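For what it's worth, one common approach when shrinking max_updates is to rescale the schedule phases proportionally; the phase lengths below are placeholders, not the paper's settings:

def rescale_tristage(phases, old_total=60_000, new_total=30_000):
    # scale each phase length by new_total / old_total
    return {name: int(steps * new_total / old_total) for name, steps in phases.items()}

# placeholder phase lengths, NOT the values used in the paper
print(rescale_tristage({"warmup_steps": 20_000, "hold_steps": 20_000, "decay_steps": 20_000}))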

The second question is about punctuation removal and lowercasing before calculating WER. I also observed some special tokens in the dictionary, e.g. the music token ♪. Which tokens did you remove, and how?
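For context, a typical WER text normalization (not necessarily what the authors used) looks like:

import re
import string

def normalize(text):
    text = text.lower().replace("♪", " ")  # drop the music token
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()

print(normalize("Hello, World! ♪"))  # -> "hello world"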

I'm looking forward to your reply and thank you in advance :)

Best regards,
Zhengyang

RuntimeError: Error(s) in loading state_dict for AVHubertSeq2Seq

Traceback (most recent call last):
  File "/home/lpl/muavic/demo/run_demo.py", line 220, in <module>
    AV_RESOURCES = load_av_models(args.av_models_path)
  File "/home/lpl/muavic/demo/demo_utils.py", line 65, in load_av_models
    models, _, task = checkpoint_utils.load_model_ensemble_and_task(
  File "/home/lpl/av_hubert/fairseq/fairseq/checkpoint_utils.py", line 447, in load_model_ensemble_and_task
    model.load_state_dict(
  File "/home/lpl/av_hubert/fairseq/fairseq/models/fairseq_model.py", line 125, in load_state_dict
    return super().load_state_dict(new_state_dict, strict)
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for AVHubertSeq2Seq:
    size mismatch for decoder.layers.0.encoder_attn.k_proj.weight: copying a param with shape torch.Size([768, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
    size mismatch for decoder.layers.0.encoder_attn.v_proj.weight: copying a param with shape torch.Size([768, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
    size mismatch for decoder.layers.1.encoder_attn.k_proj.weight: copying a param with shape torch.Size([768, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
    size mismatch for decoder.layers.1.encoder_attn.v_proj.weight: copying a param with shape torch.Size([768, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
    size mismatch for decoder.layers.2.encoder_attn.k_proj.weight: copying a param with shape torch.Size([768, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
    size mismatch for decoder.layers.2.encoder_attn.v_proj.weight: copying a param with shape torch.Size([768, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
    size mismatch for decoder.layers.3.encoder_attn.k_proj.weight: copying a param with shape torch.Size([768, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
    size mismatch for decoder.layers.3.encoder_attn.v_proj.weight: copying a param with shape torch.Size([768, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
    size mismatch for decoder.layers.4.encoder_attn.k_proj.weight: copying a param with shape torch.Size([768, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
    size mismatch for decoder.layers.4.encoder_attn.v_proj.weight: copying a param with shape torch.Size([768, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
    size mismatch for decoder.layers.5.encoder_attn.k_proj.weight: copying a param with shape torch.Size([768, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).
    size mismatch for decoder.layers.5.encoder_attn.v_proj.weight: copying a param with shape torch.Size([768, 1024]) from checkpoint, the shape in current model is torch.Size([768, 768]).

I'm having this issue; is there any solution, please?

Only audio files could be downloaded

Dear authors,
I have downloaded the tgz files myself, but only txt, vtt, and wav files can be found in the directory. How can I download the video files for visual speech recognition? Thanks!

VSR performance lower on MuAViC version of LRS3 (En)

Hi, thanks for your nice work! I preprocessed the MuAViC dataset according to the instructions. I already had LRS3 processed according to the AV-HuBERT instructions, so I wanted to test if a pre-trained model would get the same performance on both the AV-HuBERT dataset version and the MuAViC version of LRS3.

I first tried ckpt=large_noise_pt_noise_ft_433h.pt from AV-HuBERT, and ran this command:

python -B infer_s2s.py --config-dir ./conf/ --config-name s2s_decode.yaml \
  dataset.gen_subset=test common_eval.path=${ckpts_dir}/${ckpt} \
  common_eval.results_path=${exp_dir}/av-hubert/decode/s2s/test \
  override.modalities=['audio', 'video'] override.data=${lrs3_dir}/30h_data override.label_dir=${lrs3_dir}/30h_data common.user_dir=`pwd`

Using the AV-HuBERT version of LRS3:

  • 433h audio-visual: 1.486
  • 433h audio-only: 1.951
  • 433h video-only: 34.135

Using the MuAViC version of LRS3:

  • 433h audio-visual: 1.496 (slightly worse)
  • 433h audio-only: 1.951 (the same)
  • 433h video-only: 35.995 (noticeably worse)

It seems that the AV-HuBERT checkpoint got worse performance on the MuAViC data versions whenever video is involved.

I also tried running the MuAViC decoding script using the MuAViC English checkpoint on the MuAViC version of LRS3 and got the following performance:

  • 433h audio-visual: 2.1941
  • 433h audio-only: 3.22
  • 433h video-only: 35.995

Then I tried the MuAViC decoding script, MuAViC English checkpoint, and the AV-HuBERT LRS3 dataset version:

  • 433h audio-visual: 2.153 (slightly better)
  • 433h audio-only: 3.225 (the same)
  • 433h video-only: 34.459 (noticeably better).

The MuAViC checkpoint also gets better performance on the AV-HuBERT version of LRS3, which is somewhat surprising. In both cases (AV-HuBERT checkpoint or MuAViC checkpoint), the audio-only performance stays identical.
I have also tried this with the other AV-HuBERT checkpoints and the conclusion is the same (the gap was more noticeable for the base models).
I wonder if MuAViC processed the LRS3 video differently than AV-HuBERT, which leads to the different performance? (See the probing sketch below.)
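One way to probe this is to compare per-utterance video lengths between the two manifest versions; this sketch assumes the AV-HuBERT TSV layout (id, video path, audio path, video frames, audio frames) and uses placeholder paths:

def video_frames(tsv_path):
    frames = {}
    for line in open(tsv_path).read().splitlines()[1:]:  # skip the root line
        cols = line.split("\t")
        frames[cols[0]] = int(cols[3])
    return frames

a = video_frames("lrs3_avhubert/test.tsv")  # placeholder path
b = video_frames("muavic/en/test.tsv")      # placeholder path
diff = {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}
print(f"{len(diff)} shared utterances differ in video length")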

Unable to download corpora other than English

Downloading mtedx_el.tgz from https://www.openslr.org/resources/100/mtedx_el.tgz
Traceback (most recent call last):
  File "/home/w30043779/miniconda3/lib/python3.10/urllib/request.py", line 1348, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/home/w30043779/miniconda3/lib/python3.10/http/client.py", line 1282, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/home/w30043779/miniconda3/lib/python3.10/http/client.py", line 1328, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/home/w30043779/miniconda3/lib/python3.10/http/client.py", line 1277, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/home/w30043779/miniconda3/lib/python3.10/http/client.py", line 1037, in _send_output
    self.send(msg)
  File "/home/w30043779/miniconda3/lib/python3.10/http/client.py", line 975, in send
    self.connect()
  File "/home/w30043779/miniconda3/lib/python3.10/http/client.py", line 1454, in connect
    self.sock = self._context.wrap_socket(self.sock,
  File "/home/w30043779/miniconda3/lib/python3.10/ssl.py", line 513, in wrap_socket
    return self.sslsocket_class._create(
  File "/home/w30043779/miniconda3/lib/python3.10/ssl.py", line 1071, in _create
    self.do_handshake()
  File "/home/w30043779/miniconda3/lib/python3.10/ssl.py", line 1342, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:997)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/w30043779/code1/muavic-main/utils.py", line 62, in download_file
    wget.download(url, out=str(download_path / filename), bar=custom_bar)
  File "/home/w30043779/miniconda3/lib/python3.10/site-packages/wget.py", line 526, in download
    (tmpfile, headers) = ulib.urlretrieve(binurl, tmpfile, callback)
  File "/home/w30043779/miniconda3/lib/python3.10/urllib/request.py", line 241, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "/home/w30043779/miniconda3/lib/python3.10/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "/home/w30043779/miniconda3/lib/python3.10/urllib/request.py", line 519, in open
    response = self._open(req, data)
  File "/home/w30043779/miniconda3/lib/python3.10/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/home/w30043779/miniconda3/lib/python3.10/urllib/request.py", line 496, in _call_chain
    result = func(*args)
  File "/home/w30043779/miniconda3/lib/python3.10/urllib/request.py", line 1391, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
  File "/home/w30043779/miniconda3/lib/python3.10/urllib/request.py", line 1351, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:997)>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/w30043779/code1/muavic-main/get_data.py", line 115, in <module>
    main(args)
  File "/home/w30043779/code1/muavic-main/get_data.py", line 84, in main
    prepare_mtedx(args)
  File "/home/w30043779/code1/muavic-main/get_data.py", line 15, in prepare_mtedx
    download_mtedx_data(args["mtedx"], args["src_lang"], args["src_lang"])
  File "/home/w30043779/code1/muavic-main/mtedx_utils.py", line 27, in download_mtedx_data
    download_extract_file_if_not(
  File "/home/w30043779/code1/muavic-main/utils.py", line 89, in download_extract_file_if_not
    download_file(url, download_path)
  File "/home/w30043779/code1/muavic-main/utils.py", line 65, in download_file
    raise HTTPError(e.url, e.code, message, e.hdrs, e.fp)
AttributeError: 'URLError' object has no attribute 'url'
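The final AttributeError comes from re-wrapping a URLError as an HTTPError in utils.py's download_file; URLError has no .url/.code/.hdrs/.fp attributes. A hypothetical defensive variant (a sketch, not the repo's actual code) would re-raise non-HTTP errors as-is:

from pathlib import Path
from urllib.error import HTTPError, URLError

import wget  # pip install wget

def download_file(url: str, download_path: Path) -> None:
    filename = url.rsplit("/", 1)[-1]
    try:
        wget.download(url, out=str(download_path / filename))
    except HTTPError as e:
        # only HTTPError carries url/code/hdrs/fp
        raise HTTPError(e.url, e.code, "download failed", e.hdrs, e.fp)
    except URLError:
        raise  # e.g. SSL certificate verification failures land here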

TEDx Talk with ID=D4TE28-L7FI is not available anymore

I was downloading the MuAViC database for Spanish when an error message appeared while segmenting videos: it seems the video with ID=D4TE28-L7FI is not available anymore. Do you have a backup of the database for these cases? In addition, the script was interrupted; I think it should handle missing videos without aborting.

Best regards,

David.

Error running the data prep script

First, I downloaded lrs3_pretrain.zip, lrs3_test_v0.4.zip, and lrs_3_v0.4_txt.zip, and made sure the checksums matched. Unzipping them gave me three folders: pretrain, lrs3_v0.4, and test. I copied out lrs3_v0.4/trainval and placed it in the root folder beside pretrain.

Next, I ran the command:
python3 get_data.py --root-path . --src-lang en

I got an error during "Creating AVSR manifests for en":

KeyError: 'iW4fCwfw1vg/00033'

Can you please let me know what I'm doing wrong?

Problems when Downloading the Italian Dataset

Hi,

I ran the following command to download the Italian dataset from MuAViC:

python get_data.py --root-path ./esperanza/ --src-lang it

However, at some point during the run the script was interrupted. Please find attached the full error trace:

Traceback (most recent call last):
  File "/home/dgimeno/phd/muavic/utils.py", line 62, in download_file
    wget.download(url, out=str(download_path / filename), bar=custom_bar)
  File "/home/dgimeno/anaconda3/envs/muavic/lib/python3.8/site-packages/wget.py", line 506, in download
    (fd, tmpfile) = tempfile.mkstemp(".tmp", prefix=prefix, dir=".")
  File "/home/dgimeno/anaconda3/envs/muavic/lib/python3.8/tempfile.py", line 331, in mkstemp
    return _mkstemp_inner(dir, prefix, suffix, flags, output_type)
  File "/home/dgimeno/anaconda3/envs/muavic/lib/python3.8/tempfile.py", line 250, in _mkstemp_inner
    fd = _os.open(file, flags, 0o600)
FileNotFoundError: [Errno 2] No such file or directory: './esperanza/metadata/it_metadata.tgz88g65ab3.tmp'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "get_data.py", line 115, in <module>
    main(args)
  File "get_data.py", line 84, in main
    prepare_mtedx(args)
  File "get_data.py", line 26, in prepare_mtedx
    preprocess_mtedx_video(
  File "/home/dgimeno/phd/muavic/mtedx_utils.py", line 220, in preprocess_mtedx_video
    video_metadata = load_video_metadata(
  File "/home/dgimeno/phd/muavic/utils.py", line 110, in load_video_metadata
    download_extract_file_if_not(
  File "/home/dgimeno/phd/muavic/utils.py", line 89, in download_extract_file_if_not
    download_file(url, download_path)
  File "/home/dgimeno/phd/muavic/utils.py", line 65, in download_file
    raise HTTPError(e.url, e.code, message, e.hdrs, e.fp)
AttributeError: 'FileNotFoundError' object has no attribute 'url'

Error when generating the manifest for AVSR

Dear authors,

Thanks a lot for your work. When generating manifests for AVSR, I meet the following error, and the script cannot restart from the break point:

Creating fr/train manifest: 26%|██▌ | 30189/116045 [10:24:50<40:25:00, 1.69s/it]
Creating fr/train manifest: 26%|██▌ | 30195/116045 [10:24:53<25:27:27, 1.07s/it]
Creating fr/train manifest: 26%|██▌ | 30197/116045 [10:24:58<33:41:41, 1.41s/it]
Creating fr/train manifest: 26%|██▌ | 30199/116045 [10:25:00<30:51:53, 1.29s/it]
Creating fr/train manifest: 26%|██▌ | 30202/116045 [10:25:03<29:36:35, 1.24s/it]
Exception in thread QueueManagerThread:
Traceback (most recent call last):
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/process.py", line 394, in _queue_management_worker
    work_item.future.set_exception(bpe)
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/_base.py", line 547, in set_exception
    raise InvalidStateError('{}: {!r}'.format(self._state, self))
concurrent.futures._base.InvalidStateError: CANCELLED: <Future at 0x7fc3fa091ca0 state=cancelled>

concurrent.futures.process._RemoteTraceback:
'''
Traceback (most recent call last):
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/process.py", line 368, in _queue_management_worker
    result_item = result_reader.recv()
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
TypeError: __init__() missing 2 required positional arguments: 'stdout' and 'stderr'
'''

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "get_data.py", line 115, in <module>
    main(args)
  File "get_data.py", line 84, in main
    prepare_mtedx(args)
  File "get_data.py", line 31, in prepare_mtedx
    prepare_mtedx_avsr_manifests(args["mtedx"], args["src_lang"], args["muavic"])
  File "/beegfs/work/zhengyangli/muavic/mtedx_utils.py", line 268, in prepare_mtedx_avsr_manifests
    process_map(
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 105, in process_map
    return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 51, in _executor_map
    return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
slurmstepd: error: *** JOB 536548 ON gpu01 CANCELLED AT 2023-04-21T10:16:11 ***

Do you have any clue how to solve this problem?

Best regards,
Zhengyang

Problem met when downloading German data

Hi,
I ran the following command to download the German dataset from MuAViC:
python get_data.py --root-path ./muavic_project --src-lang de
and hit the error below while segmenting (at 21% of "Segmenting de videos files (It takes a few hours to complete)").

  File "get_data.py", line 115, in <module>
    main(args)
  File "get_data.py", line 84, in main
    prepare_mtedx(args)
  File "get_data.py", line 26, in prepare_mtedx
    preprocess_mtedx_video(
  File "/mnt/ceph_rbd/muavic_project/muavic/mtedx_utils.py", line 236, in preprocess_mtedx_video
    process_map(
  File "/mnt/ceph_rbd/applications/anaconda3/envs/avhubert/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 105, in process_map
    return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)
  File "/mnt/ceph_rbd/applications/anaconda3/envs/avhubert/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 51, in _executor_map
    return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
  File "/mnt/ceph_rbd/applications/anaconda3/envs/avhubert/lib/python3.8/site-packages/tqdm/std.py", line 1170, in __iter__
    for obj in iterable:
  File "/mnt/ceph_rbd/applications/anaconda3/envs/avhubert/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists    for element in iterable:
  File "/mnt/ceph_rbd/applications/anaconda3/envs/avhubert/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/mnt/ceph_rbd/applications/anaconda3/envs/avhubert/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/mnt/ceph_rbd/applications/anaconda3/envs/avhubert/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

I'm not very familiar with process_map. Do you have any idea what might cause this error, and any suggestions for solving it?
Many thanks.

Noise parameters for decoding and training

I am trying to figure out the noise parameters for the decoding and training scripts to reproduce the results in the paper.
For decoding, I originally tried adding babble noise from MUSAN:

override.noise_wav=/path-to-musan/musan/tsv/babble \
override.noise_prob=1 \
override.noise_snr=0

I found that the average performance of the monolingual and multilingual models in the noisy condition was noticeably better than reported in the paper (while obtaining results similar to the paper's in clean conditions).
I also tried using the babble noise from LRS3 (override.noise_wav=/path-to-lrs3/noise/babble), and the average performance was closer to what was reported in the paper.
Which noise should be used?

For training, are these the right parameters to add?

override.noise_wav=/path-to-musan/musan/tsv/all \
override.noise_prob=0.25 \
override.noise_snr=0

Also, for the pre-trained model ("All models FT from strongest large_vox_iter5.pt"), is this the noisy or the clean pre-trained checkpoint? I assume it's the noisy one, but just double-checking.

Thanks for the help!

Error when preprocessing the video data

Dear authors,

thanks a lot for this contribution to multilingual AV-ASR! I get an error when preprocessing the video data. It happens in:

muavic/mtedx_utils.py

Lines 190 to 201 in 122ef0c

process_map(
    partial(
        segment_normalize_video_file,
        mean_face_metadata,
        metadata_path / src_lang / split,
        video_dir_path,
        out_path,
    ),
    video_segments.items(),
    max_workers=os.cpu_count(),
    chunksize=1,
)

The error is as follows. Did you also encounter it, or do you have any clues on how to solve it?

  0%|          | 0/95 [00:00<?, ?it/s]
  0%|          | 0/95 [00:05<?, ?it/s]
concurrent.futures.process._RemoteTraceback: 
'''
Traceback (most recent call last):
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/process.py", line 368, in _queue_management_worker
    result_item = result_reader.recv()
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/multiprocessing/connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
TypeError: __init__() missing 2 required positional arguments: 'stdout' and 'stderr'
'''

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "get_data.py", line 77, in <module>
    main(args)
  File "get_data.py", line 59, in main
    prepare_mtedx(args)
  File "get_data.py", line 22, in prepare_mtedx
    preprocess_mtedx_video(
  File "/beegfs/work/zhengyangli/muavic/mtedx_utils.py", line 190, in preprocess_mtedx_video
    process_map(
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 105, in process_map
    return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 51, in _executor_map
    return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/home/zhengyangli/anaconda3/envs/muavic/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

download_ted2020() error

It seems that you try to parse a zip file using GzipFile?
Here is the traceback:
Downloading el-en.txt.zip from https://opus.nlpl.eu/download.php?f=TED2020/v1/moses/el-en.txt.zip
Traceback (most recent call last):
  File "get_data.py", line 107, in <module>
    main(args)
  File "get_data.py", line 73, in main
    prepare_lrs3(args)
  File "get_data.py", line 59, in prepare_lrs3
    download_ted2020(args["ted2020"])
  File "/mnt/pfs/wanghe/corpus/muavic/muavic/lrs3_utils.py", line 345, in download_ted2020
    extract_ted2020_data(str(tgz_filepath), "en", lang, ted2020_path)
  File "/mnt/pfs/wanghe/corpus/muavic/muavic/lrs3_utils.py", line 308, in extract_ted2020_data
    tmx_dict = xmltodict.parse(GzipFile(tgz_filepath))
  File "/opt/conda/envs/oslasr/lib/python3.8/site-packages/xmltodict.py", line 372, in parse
    parser.ParseFile(xml_input)
  File "/opt/conda/envs/oslasr/lib/python3.8/gzip.py", line 292, in read
    return self._buffer.read(size)
  File "/opt/conda/envs/oslasr/lib/python3.8/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/opt/conda/envs/oslasr/lib/python3.8/gzip.py", line 479, in read
    if not self._read_gzip_header():
  File "/opt/conda/envs/oslasr/lib/python3.8/gzip.py", line 427, in _read_gzip_header
    raise BadGzipFile('Not a gzipped file (%r)' % magic)
gzip.BadGzipFile: Not a gzipped file (b'PK')
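A possible fix, given the b'PK' magic in the error: the TED2020 "moses" downloads are zip archives, so they should be opened with ZipFile rather than GzipFile. A sketch (the member names are assumptions):

from zipfile import ZipFile

with ZipFile("el-en.txt.zip") as zf:
    print(zf.namelist())  # e.g. TED2020.el-en.el, TED2020.el-en.en
    with zf.open(zf.namelist()[0]) as member:
        print(member.readline().decode("utf-8"))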

A small bug during audio pre-processing

Hi,
I just found an error when pre-processing the audio data in muavic/mtedx_utils.py, lines 77 to 102 in 122ef0c:

for split in ["train", "valid", "test"]:
    # create directory for segmented & normalized audio
    out_path = muavic_path / src_lang / "audio" / split
    out_path.mkdir(parents=True, exist_ok=True)
    if not is_empty(out_path):
        if split == "train":
            print(f"\nSegmenting {src_lang} audio files")
        # collect needed info from segment file
        segments_info = []
        split_dir_path = mtedx_path / f"{src_lang}-{src_lang}" / "data" / split
        wav_dir_path = split_dir_path / "wav"
        segment_file = split_dir_path / "txt" / "segments"
        for line in read_txt_file(segment_file):
            seg_id, fid, start, end = line.strip().split(' ')
            segments_info.append(
                (wav_dir_path/(fid+".flac"), fid, seg_id, float(start), float(end))
            )
        # preprocess audio files
        process_map(
            partial(segment_normalize_audio_file, out_path),
            segments_info,
            max_workers=os.cpu_count(),
            desc=f"Preprocessing {src_lang}/{split} Audios",
            chunksize=1,
        )

There is additional indentation on lines 84 to 102. I corrected it to the following:

def preprocess_mtedx_audio(mtedx_path, src_lang, muavic_path):
    # get files id per split
    for split in ["train", "valid", "test"]:
        # create directory for segmented & normalized audio
        out_path = muavic_path / src_lang / "audio" / split
        out_path.mkdir(parents=True, exist_ok=True)
        if not is_empty(out_path):
            if split == "train":
                print(f"\nSegmenting {src_lang} audio files")
        # collect needed info from segment file
        segments_info = []
        split_dir_path = mtedx_path / f"{src_lang}-{src_lang}" / "data" / split
        wav_dir_path = split_dir_path / "wav"
        segment_file = split_dir_path / "txt" / "segments"
            
        for line in read_txt_file(segment_file):
            seg_id, fid, start, end = line.strip().split(' ')
            segments_info.append(
                (wav_dir_path/(fid+".flac"), fid, seg_id, float(start), float(end))
            )
        # preprocess audio files
        process_map(
            partial(segment_normalize_audio_file, out_path),
            segments_info,
            max_workers=os.cpu_count(),
            desc=f"Preprocessing {src_lang}/{split} Audios",
            chunksize=1,
        )

Then the code can work ;)

Got error when preparing LRS3

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "get_data.py", line 107, in <module>
    main(args)
  File "get_data.py", line 73, in main
    prepare_lrs3(args)
  File "get_data.py", line 53, in prepare_lrs3
    process_lrs3_videos(args["lrs3"], args["metadata"], args["muavic"])
  File "/mnt/pfs/wanghe/corpus/muavic/muavic/lrs3_utils.py", line 239, in process_lrs3_videos
    process_map(
  File "/opt/conda/envs/oslasr/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 130, in process_map
    return _executor_map(ProcessPoolExecutor, fn, *iterables, **tqdm_kwargs)
  File "/opt/conda/envs/oslasr/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 76, in _executor_map
    return list(tqdm_class(ex.map(fn, *iterables, **map_args), **kwargs))
  File "/opt/conda/envs/oslasr/lib/python3.8/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/opt/conda/envs/oslasr/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/opt/conda/envs/oslasr/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/opt/conda/envs/oslasr/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/opt/conda/envs/oslasr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
AssertionError: /mnt/pfs/wanghe/corpus/muavic/metadata/en/train/FPhZGDS6kVQ.pkl should've been downloaded!
