
ConferencingSpeech 2021 challenge

This repository contains the dataset lists and the scripts required for the ConferencingSpeech 2021 challenge. For more details about the challenge, please see our website.

Details

  • baseline, this folder contains the baseline system, including the inference model exported to ONNX and the inference scripts;

  • eval, this folder contains the evaluation scripts that calculate PESQ, STOI, and SI-SNR;

  • selected_lists, the names of the selected train speech and noise waves drawn from AISHELL-1, AISHELL-3, LibriSpeech-360, VCTK, MUSAN, and AudioSet. Each participant is only allowed to use the speech and noise data selected below:

    • selected_lists/dev/circle.name: utterance names of the circular-array RIR waves in the dev set
    • selected_lists/dev/linear.name: utterance names of the linear-array RIR waves in the dev set
    • selected_lists/dev/non_uniform.name: utterance names of the non-uniform linear-array RIR waves in the dev set
    • selected_lists/dev/clean.name: utterance names of the clean waves used in the dev set
    • selected_lists/dev/noise.name: utterance names of the noise waves used in the dev set
    • selected_lists/train/aishell_1.name: utterance names from AISHELL-1 used in the train set
    • selected_lists/train/aishell_3.name: utterance names from AISHELL-3 used in the train set
    • selected_lists/train/librispeech_360.name: utterance names from LibriSpeech-360 used in the train set
    • selected_lists/train/vctk.name: utterance names from VCTK used in the train set
    • selected_lists/train/audioset.name: utterance names from AudioSet used in the train set
    • selected_lists/train/musan.name: utterance names from MUSAN used in the train set
    • selected_lists/train/circle.name: circular-array RIR wave names of the train set
    • selected_lists/train/linear.name: linear-array RIR wave names of the train set
    • selected_lists/train/non_uniform.name: non-uniform linear-array RIR wave names of the train set
  • simulation, the simulation scripts; see the ReadMe for usage:

    • simulation/mix_wav.py: simulates the dev and train sets
    • simulation/prepare.sh: uses selected_lists/*/*.name to pick the required waves from the downloaded raw data; you may also select them with your own scripts
    • simulation/quick_select.py: quickly selects the required names from a name list, instead of running grep -r -f
    • simulation/challenge_rirgenerator.py: the script that simulates the RIRs of the train and dev sets
    • simulation/data/dev_circle_simu_mix.config: simulation setup of the dev circular-array set, including clean wave, noise wave, RIR wave, SNR, volume scale, and start point
    • simulation/data/dev_linear_simu_mix.config: simulation setup of the dev linear-array set, with the same fields
    • simulation/data/dev_non_uniform_linear_simu_mix.config: simulation setup of the dev non-uniform linear-array set, with the same fields
    • simulation/data/train_simu_circle.config: simulation setup of the train circular-array set, with the same fields; please download it from Dropbox
    • simulation/data/train_simu_linear.config: simulation setup of the train linear-array set, with the same fields; please download it from Dropbox
    • simulation/data/train_simu_non_uniform.config: simulation setup of the train non-uniform linear-array set, with the same fields; please download it from Dropbox
  • requirements.txt, the Python dependencies
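Of the three metrics computed by the eval scripts, SI-SNR is the simplest to state. Below is a minimal NumPy sketch of the standard scale-invariant SNR definition; it is not the challenge's official implementation (that lives in the eval folder), just an illustration of what the metric measures:

```python
import numpy as np

def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-noise ratio (dB) between two 1-D signals."""
    estimate = estimate - np.mean(estimate)
    reference = reference - np.mean(reference)
    # Project the estimate onto the reference to get the target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    s_target = scale * reference
    e_noise = estimate - s_target
    return 10.0 * np.log10((np.sum(s_target ** 2) + eps) / (np.sum(e_noise ** 2) + eps))
```

Because the estimate is projected onto the reference before the energy ratio is taken, multiplying the estimate by any constant leaves the score unchanged, which is the "scale-invariant" part.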

Notes:

1. The paths in the *.config files should be replaced with the correct paths to your audio files.
2. The training config files have been released together with the challenge data.
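The path replacement in note 1 can be scripted. The sketch below is only a guess at the workflow: it assumes the released *.config files reference some placeholder data root (the `/path/to/data` prefix and the `rewrite_config` helper are both hypothetical) and substitutes your local root on every line:

```python
from pathlib import Path

# Hypothetical prefixes: OLD_ROOT is whatever placeholder appears in the
# released *.config files; NEW_ROOT is wherever you extracted the audio.
OLD_ROOT = "/path/to/data"
NEW_ROOT = "/home/user/conferencingspeech/data"

def rewrite_config(in_path, out_path, old_root=OLD_ROOT, new_root=NEW_ROOT):
    """Copy a *.config file, replacing the audio-path prefix on every line."""
    text = Path(in_path).read_text()
    Path(out_path).write_text(text.replace(old_root, new_root))
```

Check a few rewritten lines by hand before running the simulation, since the exact placeholder used in the released configs may differ.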

Citation

If you use this challenge dataset and baseline system in a publication, please cite the following paper:

@article{wei2021interspeech,
  title={{INTERSPEECH 2021 ConferencingSpeech Challenge: Towards Far-field Multi-Channel Speech Enhancement for Video Conferencing}},
  author={Wei Rao and Yihui Fu and Yanxin Hu and Xin Xu and Yvkai Jv and Jiangyu Han and Zhongjie Jiang and Lei Xie and Yannan Wang and Shinji Watanabe and Zheng-Hua Tan and Hui Bu and Tao Yu and Shidong Shang},
  journal={arXiv preprint arXiv:2104.00960}
}

Requirements

Python 3.6 or above

pip install -r requirements.txt

If you simulate the RIRs yourself with our scripts, you should also install:

pyrirgen

Code license

Apache 2.0


Issues

The version mismatch of VCTK.

Hi all,

VCTK was not on my cluster, so I used the given download link to fetch the corpus.
However, I found that the given link may not be the correct one.

The VCTK corpus has now been updated to version 0.92, which is what the links in https://github.com/ConferencingSpeech/ConferencingSpeech2021/blob/master/simulation/ReadMe.md and https://github.com/ConferencingSpeech/ConferencingSpeech2021/blob/master/ReadMe.md point to.
The new version is actually quite different from the widely used version 0.80 (which I believe is also what the official baseline uses, judging from the selected_lists content).
The new version has two audio tracks per utterance, with _mic1.flac and _mic2.flac suffixes,
whereas the VCTK files in selected_lists have a .wav suffix.

So I think the correct VCTK version is 0.80, which can be downloaded from https://datashare.ed.ac.uk/handle/10283/2651.
Please check it.

Also, version 0.80 contains both the raw recordings and the waves with the silence trimmed, and both use the same file names (e.g. p376_295.wav).
I'd also like to know which of the two should be used,
because quick_select.py may otherwise pick the wrong files.

Thanks.
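One quick way to confirm which VCTK version you have is to check how many names from the selected list actually resolve under your VCTK root. A sketch (`missing_utts` is hypothetical, and it assumes the list stores file names such as p376_295.wav):

```python
from pathlib import Path

def missing_utts(name_file, vctk_root):
    """Return names from a selected list with no matching file under vctk_root.

    If nearly every *.wav name is missing, you likely have VCTK 0.92,
    which ships *_mic1.flac / *_mic2.flac files instead of the
    0.80-style *.wav files the selected list expects.
    """
    have = {p.name for p in Path(vctk_root).rglob("*") if p.is_file()}
    wanted = [line.strip() for line in open(name_file) if line.strip()]
    return [w for w in wanted if Path(w).name not in have]
```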

Cannot get the dataset

We have already registered for the competition using an educational mailbox. However, we have not received the sharing code and do not have permission to download the data. Could you help us fix this?

.so files

Are they necessary?
We usually don't commit these library files, since they depend on the environment.

Missing data in AudioSet

Hello,

I was trying to run the simulation with the given selected_lists, but I found that some of the AudioSet IDs are no longer accessible.
Below I list some of them (I haven't checked all of the sample IDs):

HKTIe6piDOI
M7GmqUqVQEA
Hm20kZ7QzO0
oz3LrVaXMb4
6-kHUulyCog
TGd5kPDdN_I
IjoePLT_cFw
dKK-JaIzwS4
Cmhpj4MJ_hQ
NbBM82N1Xos
2JoJ_1agmTk
8YIELHXpf3g
AdLiRtpI01s
AgVZ65Hr9rw
4fh52mLYBYw
KKoTQfro920
L6DFGW6jeV8
X61ftZ590Uc
pK1ucosjoRo
Lpzx6N2aCMY
lnWP_zWFpBg
mg2rhu_HHR0

For example, if you go to https://www.youtube.com/watch?v=6-kHUulyCog, it says the video is unavailable.
If you go to https://www.youtube.com/watch?v=Lpzx6N2aCMY, it says the video has become private.

Could you release the unavailable AudioSet samples directly, or update the selected list for AudioSet?
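Until the list is updated, participants could drop the dead entries themselves. A sketch (`drop_unavailable` is hypothetical, and it assumes each line of the selected list begins with the 11-character YouTube ID):

```python
def drop_unavailable(selected_list, bad_ids, out_path):
    """Write a copy of an AudioSet selected list, skipping lines whose
    leading 11-character YouTube ID is in bad_ids."""
    bad = set(bad_ids)
    with open(selected_list) as fin, open(out_path, "w") as fout:
        for line in fin:
            name = line.strip()
            if name and name[:11] not in bad:
                fout.write(line)
```

If the released file names carry a prefix before the ID, adjust the slice accordingly.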

Where are the ONNX model and the inference scripts?

Hi, thanks for the challenge and the baseline. The README says the baseline folder contains the baseline system, including an inference model exported to ONNX and inference scripts. But where are the ONNX model and the inference scripts, actually?

Thanks a lot.

Can you use PRs?

How about committing changes through PRs, even for internal changes?
That would make it easier for us to review changes to the source code.
(In other words, it is not easy to review changes when they are committed directly to master without a PR.)


Task 2: RIR files

Judging from the RIR files, there seems to be no correlation between the different microphone arrays.

The rooms are different and the source positions are different.

Is this useful for Task 2?

Generating the synthetic examples: step 3 is not clear.

In simulation/README.md, what does step 3 mean?

"Attention to the data/[dev | train]_[linear|circle]_simu_mix.config . In the config file path should be replaced with the corresponding path."

Do we have to write a script to replace the paths with our own?
If so, can you include in the repo the script you used to replace the paths, so that each participant doesn't have to write their own? (I am lazy :).)
