conferencingspeech2021: discussions from conferencingspeech

Can you use PR?

How about using a PR to commit the sources, even for internal changes?
This would make it easy for us to review changes to the source code.
(In other words, it is not easy to review changes if you commit directly to master without a PR.)

exported by onnx and inference scripts?

Hi, thanks for the challenge and the baseline. The description says "baseline, this folder contains baseline system include inference model exported by onnx and inference scripts;" but where is the ONNX model actually, and where are the inference scripts?

Thanks a lot.

Missing data in Audioset

Hello,

I was trying to run the simulation with the given selected_list, but I found that some of the Audioset IDs are no longer accessible.
Below I list some of them (I haven't checked all of the sample IDs):

HKTIe6piDOI
M7GmqUqVQEA
Hm20kZ7QzO0
oz3LrVaXMb4
6-kHUulyCog
TGd5kPDdN_I
IjoePLT_cFw
dKK-JaIzwS4
Cmhpj4MJ_hQ
NbBM82N1Xos
2JoJ_1agmTk
8YIELHXpf3g
AdLiRtpI01s
AgVZ65Hr9rw
4fh52mLYBYw
KKoTQfro920
L6DFGW6jeV8
X61ftZ590Uc
pK1ucosjoRo
Lpzx6N2aCMY
lnWP_zWFpBg
mg2rhu_HHR0

For example, if you go to https://www.youtube.com/watch?v=6-kHUulyCog, it says the video is unavailable.
If you go to https://www.youtube.com/watch?v=Lpzx6N2aCMY, it says the video is private.

Could you release the unavailable Audioset samples directly, or change the selected list for Audioset?
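
Until the list is updated, one way to find out which entries are affected locally is to compare the selected IDs with the files that were actually downloaded. The following is only a minimal sketch: the function names are hypothetical, and it assumes each downloaded clip is stored as `<id>.<ext>`, which may not match the naming used by the challenge scripts.

```python
from pathlib import Path


def ids_in_directory(audio_dir):
    """Collect clip IDs from file stems (assumes files are named <id>.<ext>)."""
    return {p.stem for p in Path(audio_dir).iterdir() if p.is_file()}


def split_by_availability(selected_ids, available_ids):
    """Split the selected IDs into (present, missing), keeping their order."""
    available = set(available_ids)
    present = [i for i in selected_ids if i in available]
    missing = [i for i in selected_ids if i not in available]
    return present, missing
```

For example, `split_by_availability(selected, ids_in_directory("audioset/"))` would return the IDs that can still be used and those that would have to be dropped or replaced; the directory name here is a placeholder.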

Task2 : RIR file

According to the RIR files, it seems that there is no correlation between the different microphone arrays.

The rooms are different and the source positions are different.

Are these RIRs useful for task2?

Version mismatch of the VCTK corpus

Hi all,

Since VCTK was not available on my cluster, I used the given download link to get the corpus.
However, I found that the given link may not be the correct one.

The link given in https://github.com/ConferencingSpeech/ConferencingSpeech2021/blob/master/simulation/ReadMe.md and https://github.com/ConferencingSpeech/ConferencingSpeech2021/blob/master/ReadMe.md now points to VCTK version 0.92.
This new version is quite different from the original, widely used version 0.80 (which I believe is also the one used by the official baseline, judging from the selected_list content).
The new version has two audio tracks per utterance, with the suffix _mic1.flac or _mic2.flac.
However, the VCTK files in selected_list have a .wav suffix.

So I think the correct VCTK version may be 0.80, which can be downloaded from https://datashare.ed.ac.uk/handle/10283/2651.
Please check it, thanks.

Also, version 0.80 contains both the raw recordings and the waveforms with the silence removed, and both use the same file names, e.g. p376_295.wav.
I'd also like to know which one should be used, because quick_select.py may run into problems here.

Thanks.
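
To check quickly which release a local copy corresponds to, the suffixes mentioned above can be used as a rough heuristic. This is a minimal sketch with a hypothetical function name; it relies only on the file suffixes described in this issue and does not tell the raw and silence-trimmed 0.80 variants apart.

```python
def guess_vctk_version(filenames):
    """Guess the VCTK release from file suffixes (heuristic only)."""
    names = list(filenames)
    # Version 0.92 ships two FLAC tracks per utterance.
    if any(n.endswith(("_mic1.flac", "_mic2.flac")) for n in names):
        return "0.92"
    # Version 0.80 ships plain .wav files such as p376_295.wav.
    if any(n.endswith(".wav") for n in names):
        return "0.80"
    return "unknown"
```

Passing the file names from one speaker directory should be enough, since the suffix convention is the same across a release.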

Cannot get the dataset

We have already registered for the competition using an educational email address. However, we have not received the sharing code and do not have permission to download the data. Could you help us fix this?

.so files

Are they necessary?
We usually don't commit these library files, since they depend on the environment.

checkpoint file of pretrained baseline model

Generating the synth examples. Step 3 not clear.

In simulation/README.md:

What does step 3 mean?

Attention to the data/[dev | train]_[linear|circle]_simu_mix.config . In the config file path should be replaced with the corresponding path.

Do we have to write a script to replace the paths with our own?
If so, could you include in the repo the script you used to replace the paths, so that each participant does not have to write their own? (I am lazy :) )
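
In the meantime, a path replacement like this could be sketched as below. This is not the organizers' script: the function names are hypothetical, and it assumes the config files are plain text in which every path starts with the same prefix, which should be checked against the actual files first.

```python
from pathlib import Path


def replace_path_prefix(text, old_prefix, new_prefix):
    """Replace every occurrence of old_prefix in the config text."""
    return text.replace(old_prefix, new_prefix)


def rewrite_config(config_path, old_prefix, new_prefix):
    """Rewrite one config file in place; return True if anything changed."""
    path = Path(config_path)
    original = path.read_text()
    updated = replace_path_prefix(original, old_prefix, new_prefix)
    if updated != original:
        path.write_text(updated)
    return updated != original
```

It could then be called once per file matching data/[dev | train]_[linear|circle]_simu_mix.config, with the old prefix taken from whatever path appears in the shipped configs.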

Bugs in the simulation code

I noticed that the simulation code is not compatible with the current pyrirgen.

In tencent_challenge_rirgenerator.py#L75, it calls the function pyrirgen.generateRir, but this API has been refactored to pyrirgen.rir_generator since this commit in phecda-xu/RIR-Generator.

Could you update it and also double-check the scripts?
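
As a possible stopgap, the script could look up whichever of the two names the installed module provides. The sketch below uses a hypothetical helper and covers only the rename reported above; it assumes nothing about the argument lists, which may also differ between the two versions and would need to be checked separately.

```python
from types import SimpleNamespace


def resolve_rir_function(module):
    """Return the RIR generator under its new or old name, whichever exists."""
    for name in ("rir_generator", "generateRir"):
        fn = getattr(module, name, None)
        if callable(fn):
            return fn
    raise AttributeError("module has neither rir_generator nor generateRir")


# Demo with a stand-in object, not the real pyrirgen module.
old_style = SimpleNamespace(generateRir=lambda *args, **kwargs: "old API")
print(resolve_rir_function(old_style)())
```

In the simulation script this would be used as `generate = resolve_rir_function(pyrirgen)` after importing pyrirgen, with the call site left otherwise unchanged.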
