
dual-path-rnn-pytorch's Introduction

Hey 👋🏽, I'm Kai Li!


       


My name is Kai Li (Chinese name: 李凯). I'm a second-year master's student in the Department of Computer Science and Technology, Tsinghua University, supervised by Prof. Xiaolin Hu (胡晓林). I am also a member of the TSAIL Group directed by Prof. Bo Zhang (张钹) and Prof. Jun Zhu (朱军). I am an intern at Tencent AI Lab, mainly doing research on causal speech separation, supervised by Yi Luo (罗艺).

🤗   I open-source my work whenever I can.

🤗   I am currently doing research on multimodal speech separation, and I am interested in other speech tasks (e.g., pre-training models and neuroscience). If you would like to collaborate, please contact me. Many thanks.

🔖 Homepages

Kai Li, Jusper Lee, cslikai.cn

📅 News

  • 2023.07: 🎲 One paper was accepted by ECAI 2023.
  • 2023.05: 🧩 Two papers were accepted by Interspeech 2023.
  • 2023.05: 🎉 We won first prize 🥇 in the Cinematic Sound Demixing Track 23 on Leaderboards A and B.
  • 2023.05: 🎉 We won first prize 🥇 at ASC23 and the Best Application Award.
  • 2023.04: 🎲 One paper appeared on arXiv.
  • 2023.02: 🧩 One paper was accepted by ICASSP 2023.
  • 2023.01: 🧩 One paper was accepted by ICLR 2023.

📰 Selected Publications:

See Google Scholar for a full list of publications.

Speech Separation

Neuroscience

Cloud Removal

Super Resolution


dual-path-rnn-pytorch's Issues

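What is the approximate loss of DPRNN in the first few epochs?

First of all, thank you for sharing your open-source code. I have two questions. First, your code targets 8 kHz audio and I want to change it to 16 kHz. Should I change the chunk size accordingly, i.e. K = sqrt(2 × 16000 × 4) ≈ 360, compared with 250 for 8 kHz? Second, I am using this code for noise reduction and the loss stays at around 17-18. What was the approximate loss in the first few epochs when you trained DPRNN for speech separation? I look forward to your reply.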
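
The chunk-size arithmetic in this question can be checked with a short sketch. It assumes, as the question does, that L in the rule of thumb K ≈ sqrt(2L) is counted in raw samples for a 4-second clip; if L is instead counted in encoder frames, the numbers change. The function name `chunk_size` is made up for this illustration and is not part of the repository.

```python
import math


def chunk_size(sample_rate_hz: int, duration_s: float) -> int:
    """Rule of thumb K ~ sqrt(2 * L), with L taken as the number of samples."""
    total_samples = sample_rate_hz * duration_s
    return round(math.sqrt(2 * total_samples))


print(chunk_size(8000, 4))   # 253
print(chunk_size(16000, 4))  # 358
```

Under this assumption the 8 kHz value rounds to roughly 250 and the 16 kHz value to roughly 360, which matches the figures quoted in the question.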

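Separated audio is very noisy

When I use the trained model to separate the mixed audio in the test set, why is the separated audio so noisy?
The STOI score is very low. Is there any way to improve it?
Any help would be greatly appreciated!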

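Computing the loss -SI-SNR

Hello, in an earlier answer you said that DPRNN can reach about 17 dB at 50 epochs, but I have now run 36 epochs and only reached 14.7 dB. Do you know what might cause this?
Is it because the WSJ0 dataset I am using is too large?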

About the requirements

Hello, I am new to this project. Could anyone tell me which versions of the requirements (Python, PyTorch, and so on) are needed?
I would be very grateful, thanks!

Can it do Chinese speech separation?

Hello, to use this model for Chinese speech separation, do I only need to retrain it on a Chinese dataset?

Gradient explosion at 17 epochs

The loss starts increasing during epoch 17 when training the model on our own dataset.

20-07-15 02:34:45 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:16000, lr:5.000e-04, loss:-12.715>
20-07-15 02:50:07 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:17000, lr:5.000e-04, loss:-12.711>
20-07-15 03:05:29 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:18000, lr:5.000e-04, loss:-12.701>
20-07-15 03:20:51 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:19000, lr:5.000e-04, loss:-12.699>
20-07-15 03:36:12 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:20000, lr:5.000e-04, loss:-12.700>
20-07-15 03:51:34 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:21000, lr:5.000e-04, loss:-12.703>
20-07-15 04:06:56 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:22000, lr:5.000e-04, loss:-12.704>
20-07-15 04:22:18 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:23000, lr:5.000e-04, loss:-12.703>
20-07-15 04:37:40 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:24000, lr:5.000e-04, loss:-12.705>
20-07-15 04:53:01 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:25000, lr:5.000e-04, loss:-12.703>
20-07-15 05:08:23 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:26000, lr:5.000e-04, loss:-12.705>
20-07-15 05:23:45 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:27000, lr:5.000e-04, loss:-12.703>
20-07-15 05:39:06 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:28000, lr:5.000e-04, loss:-12.705>
20-07-15 05:54:28 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:29000, lr:5.000e-04, loss:-12.704>
20-07-15 06:09:51 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:30000, lr:5.000e-04, loss:-12.704>
20-07-15 06:25:12 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:31000, lr:5.000e-04, loss:-12.705>
20-07-15 06:40:34 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:32000, lr:5.000e-04, loss:-12.705>
20-07-15 06:55:56 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:33000, lr:5.000e-04, loss:-12.705>
20-07-15 07:11:18 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:34000, lr:5.000e-04, loss:-12.704>
20-07-15 07:26:42 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:35000, lr:5.000e-04, loss:-12.702>
20-07-15 07:42:04 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:36000, lr:5.000e-04, loss:-12.698>
20-07-15 07:57:26 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:37000, lr:5.000e-04, loss:-12.599>
20-07-15 08:12:52 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:38000, lr:5.000e-04, loss:-8.057>
20-07-15 08:28:16 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:39000, lr:5.000e-04, loss:-3.748>
20-07-15 08:43:40 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:40000, lr:5.000e-04, loss:0.346>
20-07-15 08:59:05 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:41000, lr:5.000e-04, loss:4.240>
20-07-15 09:14:31 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:42000, lr:5.000e-04, loss:7.948>
20-07-15 09:28:02 [/trainer/trainer_Dual_RNN.py:106 - INFO ] Finished *** <epoch:17, iter:42879, lr:5.000e-04, loss:11.061, Total time:659.562 min>
20-07-15 09:28:02 [/trainer/trainer_Dual_RNN.py:111 - INFO ] Start Validation from epoch: 17, iter: 0
20-07-15 09:34:36 [/trainer/trainer_Dual_RNN.py:136 - INFO ] <epoch:17, iter:1000, lr:5.000e-04, loss:160.000>
20-07-15 09:41:11 [/trainer/trainer_Dual_RNN.py:136 - INFO ] <epoch:17, iter:2000, lr:5.000e-04, loss:160.000>
20-07-15 09:42:57 [/trainer/trainer_Dual_RNN.py:142 - INFO ] Finished *** <epoch:17, iter:2267, lr:5.000e-04, loss:159.929, Total time:14.915 min>
20-07-15 09:42:57 [/trainer/trainer_Dual_RNN.py:171 - INFO ] No improvement, Best Loss: -12.5711
20-07-15 09:42:57 [/trainer/trainer_Dual_RNN.py:70 - INFO ] Start training from epoch: 18, iter: 0
20-07-15 09:58:27 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:18, iter:1000, lr:5.000e-04, loss:160.000>
20-07-15 10:14:02 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:18, iter:2000, lr:5.000e-04, loss:160.000>
20-07-15 10:29:38 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:18, iter:3000, lr:5.000e-04, loss:160.000>
20-07-15 10:45:08 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:18, iter:4000, lr:5.000e-04, loss:160.000>
20-07-15 11:00:39 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:18, iter:5000, lr:5.000e-04, loss:160.000>
20-07-15 11:16:10 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:18, iter:6000, lr:5.000e-04, loss:160.000>
20-07-15 11:31:38 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:18, iter:7000, lr:5.000e-04, loss:160.000>

DPRNN batch size

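Hello, your Zhihu post says that DPRNN seems to work best with a batch size of 1. Have you tried a batch size of 2 or higher? Your README says the SI-SNR of DPRNN reaches 18.98 dB after 100 epochs. Do you remember what it reaches with a batch size of 2? Training with a batch size of 1 is very slow, so I would like to try a batch size of 2, but I don't know what results to expect.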

Number of speakers

During inference, if I need to separate a track with 3 speakers, do I need to retrain the model?
With the provided checkpoint best.pt, the shapes of the separation.conv2d.bias and separation.conv2d.weight tensors do not match when I set num_spks = 3.
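
One way such a shape mismatch can arise is sketched below. It assumes that the final 1x1 convolution of the separation module produces one group of feature channels per speaker, so that its output width scales with num_spks; this is an assumption about the architecture, not something stated in the issue. The function name and the channel count of 64 are hypothetical.

```python
def conv2d_param_shapes(feature_channels: int, num_spks: int) -> dict:
    """Parameter shapes of a 1x1 Conv2d whose output width is feature_channels * num_spks (assumed layout)."""
    out_channels = feature_channels * num_spks
    return {
        "weight": (out_channels, feature_channels, 1, 1),
        "bias": (out_channels,),
    }


two_speaker = conv2d_param_shapes(64, 2)    # shapes a 2-speaker checkpoint would store
three_speaker = conv2d_param_shapes(64, 3)  # shapes a 3-speaker model would expect
print(two_speaker["bias"], three_speaker["bias"])
```

If the layer is built this way, a checkpoint trained for two speakers cannot be loaded into a three-speaker model without retraining or at least re-initialising that layer.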

Test result is sped up

When I run dualrnn_test_wav.py on the test audio from your demo page, the result is sped up: the original 6-second clip comes out as 2 seconds. Is there a parameter I need to adjust?

Source I'm using:
Yaml file: ./config/Dual_RNN/train_rnn.yml
Pretrained model: best.pt file for dprnn from your documentation
Audio file: demo pages from your documentation
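
A speed-up like the one described can happen when the number of samples stays the same but the file is written or played with a different sample rate. This is only one possible cause and is not confirmed anywhere in this thread. The sketch below shows the arithmetic with hypothetical rates of 8 kHz and 24 kHz.

```python
def playback_seconds(num_samples: int, header_rate_hz: int) -> float:
    """Duration a player reports for num_samples played at the rate stored in the file header."""
    return num_samples / header_rate_hz


samples = 6 * 8000  # a 6-second clip at 8 kHz (hypothetical)
print(playback_seconds(samples, 8000))   # 6.0
print(playback_seconds(samples, 24000))  # 2.0
```

Comparing the sample rate of the input file with the rate used when saving the output is a cheap first check.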

about train_rnn.py

Hello, when I run your DPRNN with python train_rnn.py --opt config/Dual_RNN/train_rnn.yml, I keep getting the error below and I don't know why:
22-06-09 16:31:20 [train_rnn.py:69 - INFO ] Building the model of Dual-Path-RNN
22-06-09 16:31:20 [train_rnn.py:72 - INFO ] Building the optimizer of Dual-Path-RNN
22-06-09 16:31:20 [train_rnn.py:76 - INFO ] Building the dataloader of Dual-Path-RNN
22-06-09 16:34:34 [train_rnn.py:81 - INFO ] Train Datasets Length: 27698, Val Datasets Length: 7004
22-06-09 16:34:34 [train_rnn.py:90 - INFO ] Building the Trainer of Dual-Path-RNN
22-06-09 16:34:34 [/home/wangtt/wttpython_project/Dual-Path-RNN-Pytorch-master/trainer/trainer_Dual_RNN.py:35 - INFO ] Load Nvida GPU .....
22-06-09 16:34:43 [/home/wangtt/wttpython_project/Dual-Path-RNN-Pytorch-master/trainer/trainer_Dual_RNN.py:41 - INFO ] Loading Dual-Path-RNN parameters: 2.634 Mb
22-06-09 16:34:43 [/home/wangtt/wttpython_project/Dual-Path-RNN-Pytorch-master/trainer/trainer_Dual_RNN.py:69 - INFO ] Gradient clipping by 5, default L2
22-06-09 16:34:43 [/home/wangtt/wttpython_project/Dual-Path-RNN-Pytorch-master/trainer/trainer_Dual_RNN.py:116 - INFO ] Start Validation from epoch: 0, iter: 0
Traceback (most recent call last):
File "train_rnn.py", line 96, in <module>
train()
File "train_rnn.py", line 92, in train
trainer.run()
File "/home/wangtt/wttpython_project/Dual-Path-RNN-Pytorch-master/trainer/trainer_Dual_RNN.py", line 162, in run
v_loss = self.validation(self.cur_epoch)
File "/home/wangtt/wttpython_project/Dual-Path-RNN-Pytorch-master/trainer/trainer_Dual_RNN.py", line 135, in validation
out = torch.nn.parallel.data_parallel(self.dualrnn,mix,device_ids=self.gpuid)
File "/home/wangtt/anaconda3/envs/Dual_path_RNN_Sepatation/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 207, in data_parallel
return module(*inputs[0], **module_kwargs[0])
File "/home/wangtt/anaconda3/envs/Dual_path_RNN_Sepatation/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/wangtt/wttpython_project/Dual-Path-RNN-Pytorch-master/model/model_rnn.py", line 404, in forward
s = self.separation(e)
File "/home/wangtt/anaconda3/envs/Dual_path_RNN_Sepatation/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/wangtt/wttpython_project/Dual-Path-RNN-Pytorch-master/model/model_rnn.py", line 289, in forward
x = self.dual_rnni
File "/home/wangtt/anaconda3/envs/Dual_path_RNN_Sepatation/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/wangtt/wttpython_project/Dual-Path-RNN-Pytorch-master/model/model_rnn.py", line 207, in forward
intra_rnn = self.intra_norm(intra_rnn)
File "/home/wangtt/anaconda3/envs/Dual_path_RNN_Sepatation/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/wangtt/anaconda3/envs/Dual_path_RNN_Sepatation/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 100, in forward
self._check_input_dim(input)
File "/home/wangtt/anaconda3/envs/Dual_path_RNN_Sepatation/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 207, in _check_input_dim
.format(input.dim()))
ValueError: expected 2D or 3D input (got 4D input)

use this model to noise reduction

I used this project for noise reduction and this is what I got. It's really amazing! Thanks for your implementation.
|-samples
|-mix: mixture audio (clean speech + noise)
|-speech-actual: clean audio
|-speech-separated: separated audio

all the audios are from test set.
samples.zip

PESQ

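Hello, the SDR of Dual-Path is high (a little over 13) and the audio sounds clear, but the PESQ and STOI values come out very low; PESQ is only about 1.9. What could be the problem?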

About computing loss -SI-SNR

Thank you for your great work on speech separation. I've benefited a lot from your awesome-speech-separation repository.
I am a little confused about the loss computation:
In model->loss.py->function sisnr(), you return 20log10(...) at the end, but shouldn't SI-SNR use 10log10(...)?
I can achieve about 19 in SI-SNR with your code, but if I change it to 10, the result falls
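
Whether 10 or 20 is the right factor depends on what sits inside the logarithm: 10·log10 of a ratio of powers equals 20·log10 of the corresponding ratio of norms. Which of the two the repository's sisnr() computes is not visible from this issue, so the NumPy sketch below is only an illustration of that identity, not the repository's code.

```python
import numpy as np


def si_snr(estimate: np.ndarray, target: np.ndarray, eps: float = 1e-8):
    """Return SI-SNR in dB computed two ways: from powers and from norms."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to get the scaled reference.
    s_target = np.dot(estimate, target) * target / (np.dot(target, target) + eps)
    e_noise = estimate - s_target
    power_ratio = np.sum(s_target ** 2) / (np.sum(e_noise ** 2) + eps)
    norm_ratio = np.linalg.norm(s_target) / (np.linalg.norm(e_noise) + eps)
    return 10 * np.log10(power_ratio), 20 * np.log10(norm_ratio)


rng = np.random.default_rng(0)
target = rng.standard_normal(8000)
estimate = target + 0.1 * rng.standard_normal(8000)
from_power, from_norm = si_snr(estimate, target)
print(from_power, from_norm)  # the two values agree up to the eps terms
```

If the code takes a ratio of norms, switching the factor to 10 would halve the reported value in dB without changing the underlying quality.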

Increasing Memory Size Problem

Hello there,

I am using this repo to resume training from your pretrained model on my own dataset. I prepared the dataset as the README describes. When I run the train_rnn.py script, it starts successfully but then freezes at:

20-09-29 15:51:21 [train_rnn.py:66 - INFO ] Building the model of Dual-Path-RNN
20-09-29 15:51:21 [train_rnn.py:69 - INFO ] Building the optimizer of Dual-Path-RNN
20-09-29 15:51:21 [train_rnn.py:72 - INFO ] Building the dataloader of Dual-Path-RNN

It's been an hour, and htop shows my RAM usage climbing above 200 GB. Is this normal, or is the script processing the dataset on the fly?

Addition: I am using your default .yml config, with batch size 1. Would a larger batch size help with this problem?
FYI, my dataset is about 60 GB per speaker, so speaker 1, speaker 2, and the mixtures total about 180 GB.

The number of tasks is also increasing. Is that normal?

about 'key'

Hello, I can't find any definition of 'key'.

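Epochs for Dual-Path

Hello, in your code Dual-Path is set to run on a single GPU. With my amount of data, running 100 epochs would take 3 months. Would it still work if I set a smaller number of epochs? What is roughly the minimum I could use?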

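Question about replacing the autoencoder with an FFT

Hello, I would like to try replacing the encoder and decoder in DPRNN with an FFT and compare the results against the convolutional autoencoder. However, I am still a beginner and my programming skills are limited. Would it be possible to plug the DPRNN model into your Deep Clustering code to do this?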

Input Normalization

I'm not sure whether my mixing is exactly the same as yours, but does torchaudio read your wav files as int values (typically around a couple hundred) or as float values in [-1, 1]?

I started with scipy, which loads ints, and the loss went to NaN at some point. So I switched to librosa, which loads floats.
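
For readers who want to keep an integer-loading reader, a minimal sketch of the conversion is shown below. It assumes 16-bit PCM input; the function name `pcm16_to_float` is made up for this example.

```python
import numpy as np


def pcm16_to_float(samples: np.ndarray) -> np.ndarray:
    """Scale int16 PCM samples to float32 values in [-1, 1)."""
    return samples.astype(np.float32) / 32768.0


raw = np.array([-32768, 0, 16384], dtype=np.int16)
print(pcm16_to_float(raw))  # values -1.0, 0.0, 0.5
```

Feeding the model values on the same scale it was trained on keeps the loss in a comparable range.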

GPU Usage

Hi Jusper

I tried your code for speech denoising, with the number of speakers set to 1.

However, it only used 1 GPU even though I set up multi-GPU training with 4 GPUs, which makes training very slow.

Have you ever run into this problem?

Sampling rate and application

Hello! We are very glad to read the code you've shared here. It saves a lot of time for our work! We noticed that the audio files in the dataset have a sampling rate of 8 kHz. However, most current audio files are 44.1 kHz, so we wonder whether the model would give good results at 44.1 kHz.
In addition, we wonder whether this model can be used in other scenarios, for example to separate mixed bird songs.
Thanks again for your contribution!

RuntimeError: Error(s) in loading state_dict for Conv_TasNet: Missing key(s) in state_dict: "separation.conv1d_list.0.0.conv1x1.weight",

When I run inference with the provided model, I get the following error. Is there anything wrong with the model?

root:/lab/Dual-Path-RNN-Pytorch# python3 test_tasnet.py -mix_scp=sample/test.scp -save_path=sample/result
Traceback (most recent call last):
File "test_tasnet.py", line 81, in <module>
main()
File "test_tasnet.py", line 76, in main
separation=Separation(args.mix_scp, args.yaml, args.model, gpuid)
File "test_tasnet.py", line 20, in __init__
net.load_state_dict(dicts["model_state_dict"])
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1045, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Conv_TasNet:
Missing key(s) in state_dict: "separation.conv1d_list.0.0.conv1x1.weight", "separation.conv1d_list.0.0.conv1x1.bias", "separation.conv1d_list.0.0.PReLu1.weight", "separation.conv1d_list.0.0.norm1.weight", "separation.conv1d_list.0.0.norm1.bias", "separation.conv1d_list.0.0.dwconv.weight", "separation.conv1d_list.0.0.dwconv.bias", "separation.conv1d_list.0.0.PReLu2.weight", "separation.conv1d_list.0.0.norm2.weight", "separation.conv1d_list.0.0.norm2.bias", "separation.conv1d_list.0.0.end_conv1x1.weight", "separation.conv1d_list.0.0.end_conv1x1.bias", "separation.conv1d_list.0.1.conv1x1.weight", "separation.conv1d_list.0.1.conv1x1.bias", "separation.conv1d_list.0.1.PReLu1.weight", "separation.conv1d_list.0.1.norm1.weight", "separation.conv1d_list.0.1.norm1.bias", "separation.conv1d_list.0.1.dwconv.weight", "separation.conv1d_list.0.1.dwconv.bias", "separation.conv1d_list.0.1.PReLu2.weight", "separation.conv1d_list.0.1.norm2.weight", "separation.conv1d_list.0.1.norm2.bias", "separation.conv1d_list.0.1.end_conv1x1.weight", "separation.conv1d_list.

Batch size of DPRNN_TasNet

Hi Kai,

I noticed that the batch size in your training config for DPRNN_TasNet is 1. Have you tried a larger batch size? How does the performance compare with batch size 1?

Thanks a lot!

Output normalization in dualrnn_test_wav.py

Thank you for your contribution !

I have a question about line 44 of dualrnn_test_wav.py.

You normalize the DPRNN prediction like below:

norm = torch.norm(egs,float('inf'))
#norm
s = s - torch.mean(s)
s = s*norm/torch.max(torch.abs(s))

I'm a little confused here: you don't normalize the input anywhere in the training pipeline, so why do it here?
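
For reference, the quoted snippet removes the mean of the separated signal and rescales it so that its peak matches the peak of the mixture (the infinity norm of egs). The sketch below restates that in NumPy; the function name is made up here. SI-SNR is invariant to the scale of the estimate, so a model trained with that loss has no fixed output level, which may be why a rescaling step appears only at test time. That motivation is an assumption, not something the author states.

```python
import numpy as np


def rescale_to_mixture_peak(separated: np.ndarray, mixture: np.ndarray) -> np.ndarray:
    """Zero-mean the separated signal and scale its peak to the mixture's peak."""
    peak = np.max(np.abs(mixture))            # same as the infinity norm
    centered = separated - separated.mean()
    return centered * peak / np.max(np.abs(centered))


mixture = np.array([0.5, -0.25, 0.1])
separated = np.array([10.0, -20.0, 10.0])
print(rescale_to_mixture_peak(separated, mixture))  # peak magnitude is now 0.5
```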

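Why does dual-path RNN need an overlap of P?

Hello, why does dual-path RNN use overlap when segmenting? If I simply use P alone and split the sequence into N/P segments without overlap, does that have a large impact on performance?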
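
The two segmentation schemes being compared can be sketched as follows. This is a minimal NumPy illustration on a 1-D sequence, not the repository's segmentation code; `segment`, `chunk_len`, and `hop` are names chosen for this example.

```python
import numpy as np


def segment(sequence: np.ndarray, chunk_len: int, hop: int) -> np.ndarray:
    """Split a 1-D sequence into chunks of chunk_len taken every hop samples, zero-padding the tail."""
    n = len(sequence)
    num_chunks = max(1, int(np.ceil((n - chunk_len) / hop)) + 1)
    padded_len = (num_chunks - 1) * hop + chunk_len
    padded = np.pad(sequence, (0, padded_len - n))
    return np.stack([padded[i * hop: i * hop + chunk_len] for i in range(num_chunks)])


x = np.arange(10)
print(segment(x, chunk_len=4, hop=2))  # 50% overlap: 4 chunks
print(segment(x, chunk_len=4, hop=4))  # no overlap: 3 chunks, last one zero-padded
```

With a hop of half the chunk length, every interior sample falls inside two chunks, so the edge of one chunk lies in the middle of its neighbour. Without overlap, each sample is seen by the intra-chunk pass in exactly one context.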

Problem running train_rnn.py

C:\Users\yhq.conda\envs\pytorch\python.exe D:/Dual-Path-RNN-Pytorch-master/Dual-Path-RNN-Pytorch-master/train_rnn.py
C:\Users\yhq.conda\envs\pytorch\lib\site-packages\torchaudio\extension\extension.py:14: UserWarning: torchaudio C++ extension is not available.
warnings.warn('torchaudio C++ extension is not available.')
Traceback (most recent call last):
File "D:/Dual-Path-RNN-Pytorch-master/Dual-Path-RNN-Pytorch-master/train_rnn.py", line 91, in <module>
train()
File "D:/Dual-Path-RNN-Pytorch-master/Dual-Path-RNN-Pytorch-master/train_rnn.py", line 61, in train
opt = option.parse(args.opt)
File "D:\Dual-Path-RNN-Pytorch-master\Dual-Path-RNN-Pytorch-master\config\option.py", line 4, in parse
with open(opt_path, mode='r') as f:
TypeError: expected str, bytes or os.PathLike object, not NoneType

Process finished with exit code 1
How can I fix this error, or which part is going wrong? I would appreciate a reply. Thanks!
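
The traceback shows open() receiving None for opt_path, and the command line at the top of the log has no --opt argument, so args.opt was most likely left at its default. Another issue on this page runs the script as python train_rnn.py --opt config/Dual_RNN/train_rnn.yml. The sketch below is a hypothetical launcher, not the repository's own parser; it shows how marking the option as required turns the late TypeError into an immediate usage message.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: --opt must be given, so a missing path fails at parse time."""
    parser = argparse.ArgumentParser(description="Launcher sketch for a YAML-configured training run")
    parser.add_argument("--opt", type=str, required=True, help="Path to the YAML option file")
    return parser


args = build_parser().parse_args(["--opt", "config/Dual_RNN/train_rnn.yml"])
print(args.opt)  # config/Dual_RNN/train_rnn.yml
```

When launching from an IDE such as PyCharm, the same argument goes into the run configuration's parameters field.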

Evaluation on 4sec segments?

Hi,

I was just having a look at the evaluation script.
Can you confirm that you are discarding the utterances smaller than 2s and chunking all remaining utterances in chunks of 4sec, zero-padding the last chunk?

Thanks

BTW, the demo page is great !

error

What does this mean, and how can I fix it? Any advice is appreciated.

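Hyperparameter settings for 48 kHz audio

If the audio sampling rate is 48 kHz, besides changing L to 96, do other hyperparameters such as N, B, and H need to change?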

multichannel

Hi, I am new to speech separation. I'm trying to do multichannel speech separation in the time domain. What should I pay attention to in DPRNN for the multichannel case?

About fine-tuning

Hello, since your code was trained on 8 kHz data, can I fine-tune your pretrained model on a 16 kHz dataset?

Results

Is there still a working link to the results demo page? This one returns a 404, even through a VPN.

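Performance of DPRNN under reverberation

Hello, I would like to ask about the performance of DPRNN in reverberant conditions.

I trained on a dataset I synthesized myself (fs = 8 kHz, RT60 = 0.7 s, 4 s training segments; single channel, with the clean reverberant signal as the target, lr = 1e-3).
During training, SI-SDR rises extremely slowly compared with training on non-reverberant data, and it seems to peak at only about 0.6 dB. In the spectrograms of the separated signals, the two outputs also show no clear difference.

According to Luo's paper, the WER reaches 9.1% in the noise-free reverberant speech condition.
I am not sure whether the problem lies in my dataset or whether I have missed some setting that differs from the non-reverberant case and makes training difficult. Any guidance would be appreciated!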

error when training 3-mix

I am trying to train the model on a 3-mix dataset. I only changed num_speakers and the data path. Is there anything else I need to modify? Thanks!

the following is the error detail:

Training...
data_loader.len 20000
/home/bsipl_1/PycharmProjects/dual-path-RNNs-DPRNNs-based-speech-separation-master/utils.py:45: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than tensor.new_tensor(sourceTensor).
frame = signal.new_tensor(frame).long().cuda() # signal may in GPU or CPU
Traceback (most recent call last):
File "/home/bsipl_1/PycharmProjects/dual-path-RNNs-DPRNNs-based-speech-separation-master/train.py", line 138, in <module>
main(args)
File "/home/bsipl_1/PycharmProjects/dual-path-RNNs-DPRNNs-based-speech-separation-master/train.py", line 132, in main
solver.train()
File "/home/bsipl_1/PycharmProjects/dual-path-RNNs-DPRNNs-based-speech-separation-master/solver.py", line 60, in train
tr_avg_loss = self._run_one_epoch(epoch)
File "/home/bsipl_1/PycharmProjects/dual-path-RNNs-DPRNNs-based-speech-separation-master/solver.py", line 153, in _run_one_epoch
cal_loss(padded_source, estimate_source, mixture_lengths)
File "/home/bsipl_1/PycharmProjects/dual-path-RNNs-DPRNNs-based-speech-separation-master/pit_criterion.py", line 19, in cal_loss
source_lengths)
File "/home/bsipl_1/PycharmProjects/dual-path-RNNs-DPRNNs-based-speech-separation-master/pit_criterion.py", line 31, in cal_si_snr_with_pit
assert source.size() == estimate_source.size()
AssertionError

Process finished with exit code 1

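Hello, following your advice yesterday I read your Zhihu write-up of the dual-path RNN paper, so I want to see what metrics the reproduction code can reach, but there is still something I don't understand about the SDRi metric

Because of the pandemic I cannot return to campus and currently only have a laptop with a 1050. Can it run this training code? Also, how does the SDRi metric in the paper compare with Google's SDR? Can SDRi be understood as follows: the signal-to-distortion ratio between the clean speech and the noisy signal gives SDR1, the signal-to-distortion ratio between the clean speech and the generated target speech gives SDR2, and the log ratio of SDR2 to SDR1 gives SDRi?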
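
On the SDRi question: SDR improvement is conventionally the difference, in dB, between the SDR of the separated output and the SDR of the unprocessed mixture, both measured against the clean reference. Because both values are already on a log scale, no further logarithm is taken. The sketch below uses a plain energy-ratio SDR for illustration, which is simpler than the BSS Eval definition used in many papers; all names are chosen for this example.

```python
import numpy as np


def sdr_db(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-8) -> float:
    """Energy-ratio SDR in dB: reference power over error power."""
    noise = reference - estimate
    return 10 * np.log10(np.sum(reference ** 2) / (np.sum(noise ** 2) + eps))


def sdr_improvement_db(reference: np.ndarray, estimate: np.ndarray, mixture: np.ndarray) -> float:
    """SDRi = SDR(reference, estimate) - SDR(reference, mixture), a difference in dB."""
    return sdr_db(reference, estimate) - sdr_db(reference, mixture)


reference = np.array([1.0, -1.0] * 50)
interference = np.ones(100)
mixture = reference + interference          # 0 dB input SDR
estimate = reference + 0.1 * interference   # 20 dB output SDR
print(sdr_improvement_db(reference, estimate, mixture))  # about 20 dB
```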
