jusperlee / dual-path-rnn-pytorch

Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation, implemented in PyTorch

License: Apache License 2.0

Python 100.00%

deep-learning pytorch rnn-model speech-separation speech-separation-algorithm

dual-path-rnn-pytorch's Introduction

Hey 👋🏽, I'm Kai Li!

My name is Kai Li (Chinese name: 李凯). I'm a second-year master's student at the Department of Computer Science and Technology, Tsinghua University, supervised by Prof. Xiaolin Hu (胡晓林). I am also a member of the TSAIL Group directed by Prof. Bo Zhang (张钹) and Prof. Jun Zhu (朱军). I am an intern at Tencent AI Lab, mainly doing research on causal speech separation, supervised by Yi Luo (罗艺).

🤗 These works are open source to the best of my ability.

🤗 I am currently doing research on multimodal speech separation, and am interested in other speech tasks (e.g., pre-training models and neuroscience). If you would like to collaborate, please contact me. Many thanks.

🔖 Homepages

Kai Li, Jusper Lee, cslikai.cn

📅 News

2023.07: 🎲 One paper was accepted by ECAI 2023.
2023.05: 🧩 Two papers were accepted by Interspeech 2023.
2023.05: 🎉 We won the first prize 🥇 of the Cinematic Sound Demixing Track 23 in Leaderboards A and B.
2023.05: 🎉 We won the first prize 🥇 of the ASC23 and the Best Application Award.
2023.04: 🎲 One paper appeared on arXiv.
2023.02: 🧩 One paper was accepted by ICASSP 2023.
2023.01: 🧩 One paper was accepted by ICLR 2023.

📰 Selected Publications:

See Google Scholar for a full list of publications.

Speech Separation

An efficient encoder-decoder architecture with top-down attention for speech separation. Kai Li, Runxuan Yang, Xiaolin Hu. ICLR 2023.
An Audio-Visual Speech Separation Model Inspired by Cortico-Thalamo-Cortical Circuits. Kai Li, Fenghua Xie, Hang Chen, Kexin Yuan, Xiaolin Hu. arXiv 2022.
Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network. Xiaolin Hu, Kai Li, Weiyi Zhang, Yi Luo, Jean-Marie Lemercier, Timo Gerkmann. NeurIPS 2021.

Neuroscience

Inferring mechanisms of auditory attentional modulation with deep neural networks. Ting-Yu Kuo, Yuanda Liao, Kai Li, Bo Hong, Xiaolin Hu. Neural Computation 2022.

Cloud Removal

PMAA: A Progressive Multi-scale Attention Autoencoder Model for High-Performance Cloud Removal from Multi-temporal Satellite Imagery. Xuechao Zou, Kai Li, Junliang Xing, Pin Tao#, Yachao Cui. ECAI 2023.

Super Resolution

A Survey of Single Image Super Resolution Reconstruction. Kai Li, Shenghao Yang, Runting Dong, Jianqiang Huang, Xiaoying Wang. IET Image Processing 2020.
Single Image Super-resolution Reconstruction of Enhanced Loss Function with Multi-GPU Training. Jianqiang Huang, Kai Li, Xiaoying Wang. ISPA 2019.

dual-path-rnn-pytorch's Issues

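What is the approximate loss of DPRNN in the first few epochs?

First, thank you for sharing your open-source code. I have two questions. First, your code is set up for 8 kHz audio and I want to change it to 16 kHz. Should I change the per-block chunk_size accordingly, i.e. K = sqrt(2 × 16000 × 4) ≈ 360, compared with 250 for 8 kHz? Second, I am using this code for noise reduction and the loss stays around 17-18. Roughly what loss did you see in the first few epochs when training DPRNN for speech separation? Looking forward to your reply.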
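
The chunk-size arithmetic in this question can be checked with a short sketch. It assumes the rule of thumb K ≈ sqrt(2L) and, as the question does, takes L to be the number of samples in a 4-second segment; if the sequence length the model sees is the number of encoder frames instead, L would be smaller. The function name is hypothetical and is not part of the repository.

```python
import math

def suggested_chunk_size(sample_rate, seconds=4.0):
    """Return sqrt(2 * L), with L = sample_rate * seconds (an assumption)."""
    length = sample_rate * seconds
    return math.sqrt(2 * length)

# Compare the two sample rates mentioned in the question.
for sr in (8000, 16000):
    print(sr, round(suggested_chunk_size(sr), 1))
```

Under these assumptions the values come out near 253 for 8 kHz and 358 for 16 kHz, which is consistent with the rounded figures of 250 and 360 quoted in the question.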

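Separated audio contains significant noise

When I use the trained model to separate mixed audio from the test set, why is the separated audio so noisy?
The STOI score is very low. Is there any way to improve it?
Please help, many thanks!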

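Computing the loss: -SI-SNR

Hello, in one of your earlier answers you said that DPRNN can reach about 17 dB at 50 epochs, but I have now run 36 epochs and only reached 14.7 dB. Do you know what the reason might be?

Is it because the WSJ0 dataset I am using is too large?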

About the requirements

Hello, I am new to this project. Could anyone tell me which versions of the requirements (Python, PyTorch, and so on) are needed?
I would be very grateful, thanks!

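Can this be used for Chinese speech separation?

Hello, if I want to use this model for Chinese speech separation, do I just need to retrain it on a Chinese dataset?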

Gradient explosion at 17 epochs

The loss starts increasing at epoch 17 when training the model on our own dataset.

20-07-15 02:34:45 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:16000, lr:5.000e-04, loss:-12.715>
20-07-15 02:50:07 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:17000, lr:5.000e-04, loss:-12.711>
20-07-15 03:05:29 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:18000, lr:5.000e-04, loss:-12.701>
20-07-15 03:20:51 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:19000, lr:5.000e-04, loss:-12.699>
20-07-15 03:36:12 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:20000, lr:5.000e-04, loss:-12.700>
20-07-15 03:51:34 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:21000, lr:5.000e-04, loss:-12.703>
20-07-15 04:06:56 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:22000, lr:5.000e-04, loss:-12.704>
20-07-15 04:22:18 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:23000, lr:5.000e-04, loss:-12.703>
20-07-15 04:37:40 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:24000, lr:5.000e-04, loss:-12.705>
20-07-15 04:53:01 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:25000, lr:5.000e-04, loss:-12.703>
20-07-15 05:08:23 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:26000, lr:5.000e-04, loss:-12.705>
20-07-15 05:23:45 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:27000, lr:5.000e-04, loss:-12.703>
20-07-15 05:39:06 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:28000, lr:5.000e-04, loss:-12.705>
20-07-15 05:54:28 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:29000, lr:5.000e-04, loss:-12.704>
20-07-15 06:09:51 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:30000, lr:5.000e-04, loss:-12.704>
20-07-15 06:25:12 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:31000, lr:5.000e-04, loss:-12.705>
20-07-15 06:40:34 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:32000, lr:5.000e-04, loss:-12.705>
20-07-15 06:55:56 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:33000, lr:5.000e-04, loss:-12.705>
20-07-15 07:11:18 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:34000, lr:5.000e-04, loss:-12.704>
20-07-15 07:26:42 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:35000, lr:5.000e-04, loss:-12.702>
20-07-15 07:42:04 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:36000, lr:5.000e-04, loss:-12.698>
20-07-15 07:57:26 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:37000, lr:5.000e-04, loss:-12.599>
20-07-15 08:12:52 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:38000, lr:5.000e-04, loss:-8.057>
20-07-15 08:28:16 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:39000, lr:5.000e-04, loss:-3.748>
20-07-15 08:43:40 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:40000, lr:5.000e-04, loss:0.346>
20-07-15 08:59:05 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:41000, lr:5.000e-04, loss:4.240>
20-07-15 09:14:31 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:17, iter:42000, lr:5.000e-04, loss:7.948>
20-07-15 09:28:02 [/trainer/trainer_Dual_RNN.py:106 - INFO ] Finished *** <epoch:17, iter:42879, lr:5.000e-04, loss:11.061, Total time:659.562 min>
20-07-15 09:28:02 [/trainer/trainer_Dual_RNN.py:111 - INFO ] Start Validation from epoch: 17, iter: 0
20-07-15 09:34:36 [/trainer/trainer_Dual_RNN.py:136 - INFO ] <epoch:17, iter:1000, lr:5.000e-04, loss:160.000>
20-07-15 09:41:11 [/trainer/trainer_Dual_RNN.py:136 - INFO ] <epoch:17, iter:2000, lr:5.000e-04, loss:160.000>
20-07-15 09:42:57 [/trainer/trainer_Dual_RNN.py:142 - INFO ] Finished *** <epoch:17, iter:2267, lr:5.000e-04, loss:159.929, Total time:14.915 min>
20-07-15 09:42:57 [/trainer/trainer_Dual_RNN.py:171 - INFO ] No improvement, Best Loss: -12.5711
20-07-15 09:42:57 [/trainer/trainer_Dual_RNN.py:70 - INFO ] Start training from epoch: 18, iter: 0
20-07-15 09:58:27 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:18, iter:1000, lr:5.000e-04, loss:160.000>
20-07-15 10:14:02 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:18, iter:2000, lr:5.000e-04, loss:160.000>
20-07-15 10:29:38 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:18, iter:3000, lr:5.000e-04, loss:160.000>
20-07-15 10:45:08 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:18, iter:4000, lr:5.000e-04, loss:160.000>
20-07-15 11:00:39 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:18, iter:5000, lr:5.000e-04, loss:160.000>
20-07-15 11:16:10 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:18, iter:6000, lr:5.000e-04, loss:160.000>
20-07-15 11:31:38 [/trainer/trainer_Dual_RNN.py:100 - INFO ] <epoch:18, iter:7000, lr:5.000e-04, loss:160.000>
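
One possible reading of the 160.000 plateau in the log above, offered as an assumption rather than a confirmed diagnosis: if the loss is a negative SI-SNR of the form -20 log10(eps + ratio) with eps = 1e-8, then an output that no longer correlates with the target (ratio of zero) yields a loss of about 160. The function below and the eps value are hypothetical stand-ins, not the repository's code.

```python
import math

EPS = 1e-8  # assumed stabilizer; the log does not confirm this value

def neg_sisnr_loss(target_norm, residual_norm, eps=EPS):
    """Negative SI-SNR in dB from the norms of the projected target and the residual."""
    return -20 * math.log10(eps + target_norm / (residual_norm + eps))

# A collapsed output: the projection onto the target has zero norm.
collapsed = neg_sisnr_loss(0.0, 1.0)
print(collapsed)  # about 160
```

If this reading holds, the plateau indicates that the network output has degenerated, so lowering the learning rate or resuming from the last good checkpoint may be worth trying.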

DPRNN batch size

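Hello, I saw in your Zhihu post that DPRNN seems to work best with a batch size of 1. Have you tried a batch size of 2 or higher? Your README says DPRNN reaches an SI-SNR of 18.98 dB after 100 epochs; do you remember what SI-SNR it reaches with batch size = 2? Training with batch size = 1 is very slow, so I would like to try batch size = 2, but I don't know what results to expect.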

Curve of train and val loss of DPRNN_TasNet

Hi Kai,

Could you please share the training and validation loss curves of DPRNN TasNet, like those shown for ConvTasNet in your Results section?

Thanks a lot!

Number of speakers

During inference, if I need to test on a track with 3 speakers, do I need to retrain the model?
With the provided checkpoint best.pt, there is a shape mismatch in the separation.conv2d.bias and separation.conv2d.weight tensors when I set num_spks = 3.

TypeError: expected str, bytes or os.PathLike object, not NoneType when running DPRNN

Hello! When running DPRNN I keep getting TypeError: expected str, bytes or os.PathLike object, not NoneType. What is the cause? Thanks for your help!

Test output is sped up

Why is the output sped up when I run dualrnn_test_wav.py on the test audio from your demo page? The original is 6 seconds long, but the output is 2 seconds long. Are there parameters I need to adjust?

Source I'm using:
Yaml file: ./config/Dual_RNN/train_rnn.yml
Pretrained model: best.pt file for dprnn from your documentation
Audio file: demo pages from your documentation

About train_rnn.py

Hello, when I run your DPRNN with python train_rnn.py --opt config/Dual_RNN/train_rnn.yml, I keep getting the error below and I don't know why:
22-06-09 16:31:20 [train_rnn.py:69 - INFO ] Building the model of Dual-Path-RNN
22-06-09 16:31:20 [train_rnn.py:72 - INFO ] Building the optimizer of Dual-Path-RNN
22-06-09 16:31:20 [train_rnn.py:76 - INFO ] Building the dataloader of Dual-Path-RNN
22-06-09 16:34:34 [train_rnn.py:81 - INFO ] Train Datasets Length: 27698, Val Datasets Length: 7004
22-06-09 16:34:34 [train_rnn.py:90 - INFO ] Building the Trainer of Dual-Path-RNN
22-06-09 16:34:34 [/home/wangtt/wttpython_project/Dual-Path-RNN-Pytorch-master/trainer/trainer_Dual_RNN.py:35 - INFO ] Load Nvida GPU .....
22-06-09 16:34:43 [/home/wangtt/wttpython_project/Dual-Path-RNN-Pytorch-master/trainer/trainer_Dual_RNN.py:41 - INFO ] Loading Dual-Path-RNN parameters: 2.634 Mb
22-06-09 16:34:43 [/home/wangtt/wttpython_project/Dual-Path-RNN-Pytorch-master/trainer/trainer_Dual_RNN.py:69 - INFO ] Gradient clipping by 5, default L2
22-06-09 16:34:43 [/home/wangtt/wttpython_project/Dual-Path-RNN-Pytorch-master/trainer/trainer_Dual_RNN.py:116 - INFO ] Start Validation from epoch: 0, iter: 0
Traceback (most recent call last):
File "train_rnn.py", line 96, in
train()
File "train_rnn.py", line 92, in train
trainer.run()
File "/home/wangtt/wttpython_project/Dual-Path-RNN-Pytorch-master/trainer/trainer_Dual_RNN.py", line 162, in run
v_loss = self.validation(self.cur_epoch)
File "/home/wangtt/wttpython_project/Dual-Path-RNN-Pytorch-master/trainer/trainer_Dual_RNN.py", line 135, in validation
out = torch.nn.parallel.data_parallel(self.dualrnn,mix,device_ids=self.gpuid)
File "/home/wangtt/anaconda3/envs/Dual_path_RNN_Sepatation/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 207, in data_parallel
return module(*inputs[0], **module_kwargs[0])
File "/home/wangtt/anaconda3/envs/Dual_path_RNN_Sepatation/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/wangtt/wttpython_project/Dual-Path-RNN-Pytorch-master/model/model_rnn.py", line 404, in forward
s = self.separation(e)
File "/home/wangtt/anaconda3/envs/Dual_path_RNN_Sepatation/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/wangtt/wttpython_project/Dual-Path-RNN-Pytorch-master/model/model_rnn.py", line 289, in forward
x = self.dual_rnni
File "/home/wangtt/anaconda3/envs/Dual_path_RNN_Sepatation/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/wangtt/wttpython_project/Dual-Path-RNN-Pytorch-master/model/model_rnn.py", line 207, in forward
intra_rnn = self.intra_norm(intra_rnn)
File "/home/wangtt/anaconda3/envs/Dual_path_RNN_Sepatation/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/wangtt/anaconda3/envs/Dual_path_RNN_Sepatation/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 100, in forward
self._check_input_dim(input)
File "/home/wangtt/anaconda3/envs/Dual_path_RNN_Sepatation/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 207, in _check_input_dim
.format(input.dim()))
ValueError: expected 2D or 3D input (got 4D input)

Which version of PyTorch was used in this experiment?

Using this model for noise reduction

I used this project for noise reduction, and this is what I got. It's really amazing! Thanks for your implementation.
|-samples
|-mix: mixture audio (clean speech + noise)
|-speech-actual: clean audio
|-speech-separated: separated audio

All the audio files are from the test set.
samples.zip

Audio samples?

Could you upload some audio samples?

How long does it take to train one epoch on the WSJ dataset?

Hi, I only have one Tesla T4 for training this model, and I'm about to compare it with Conv-TasNet. Could you tell me how long it takes to train one epoch with your own mixed data? Thanks!

What is the real-time factor (RTF) on CPU or GPU?

PESQ

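Hello, the Dual-Path model achieves a high SDR, a little over 13, and the output sounds clear, but the PESQ and STOI values are low; PESQ is only about 1.9. What might the problem be?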

About computing loss -SI-SNR

Thank you for your great work on speech separation. I've benefited a lot from your awesome-speech-separation repository.
I am a little confused about how the loss is computed:
In model->loss.py->function sisnr(), you return 20log10(...) at the end, but when computing SI-SNR, shouldn't we use 10log10(...)?
I can achieve about 19 in SI-SNR with your code, but if I change it to 10, the result falls
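
A short sketch may help with the 10 versus 20 question. It assumes, without confirmation from the repository, that the argument of the logarithm in sisnr() is a ratio of L2 norms (amplitudes); in that case 20 log10 of the norm ratio equals 10 log10 of the power ratio, and switching the factor to 10 would halve the reported dB value. The function and sample values are illustrative only.

```python
import math

def si_snr_db(estimate, reference):
    """Return SI-SNR in dB, computed from powers and from norms."""
    n = len(reference)
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [e - est_mean for e in estimate]   # zero-mean estimate
    ref = [r - ref_mean for r in reference]  # zero-mean reference
    # Project the estimate onto the reference to get the target component.
    scale = sum(e * r for e, r in zip(est, ref)) / sum(r * r for r in ref)
    target = [scale * r for r in ref]
    noise = [e - t for e, t in zip(est, target)]
    target_power = sum(t * t for t in target)
    noise_power = sum(v * v for v in noise)
    by_power = 10 * math.log10(target_power / noise_power)
    by_norm = 20 * math.log10(math.sqrt(target_power) / math.sqrt(noise_power))
    return by_power, by_norm

reference = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
estimate = [0.1, 0.9, -0.1, -1.1, 0.0, 1.2, 0.1, -0.8]
by_power, by_norm = si_snr_db(estimate, reference)
print(by_power, by_norm)  # the two values agree up to rounding error
```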

Increasing Memory Size Problem

Hello there,

I am using this repo to resume training your pretrained model on my own dataset. I prepared the dataset as the README tutorial describes. The problem is that when I run the train_rnn.py script, it starts successfully and then freezes at:

20-09-29 15:51:21 [train_rnn.py:66 - INFO ] Building the model of Dual-Path-RNN
20-09-29 15:51:21 [train_rnn.py:69 - INFO ] Building the optimizer of Dual-Path-RNN
20-09-29 15:51:21 [train_rnn.py:72 - INFO ] Building the dataloader of Dual-Path-RNN

It has been an hour, and htop shows my RAM usage rising above 200 GB. Is this normal, or is the script processing the dataset on the fly?

Addition: I am using your default .yml config, with batch size 1. Would a larger batch size help with this problem?
FYI, my dataset is about 60 GB per speaker, so the total for speaker 1, speaker 2, and the mix is about 180 GB.

The number of tasks is also increasing. Is that normal?

About 'key'

Hello, I can't find any definition of 'key'.

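Epochs for Dual-Path

Hello, in your code Dual-Path is set to run on a single GPU. Running 100 epochs on my data would take about 3 months. Would training still work if I set a smaller number of epochs? Roughly what is the minimum I should use?

Question about replacing the autoencoder with FFT

Hello, I would like to compare the results of replacing the encoder and decoder in DPRNN with an FFT against the convolutional autoencoder, but I am still a beginner and my programming skills are limited. Could the DPRNN model be plugged into your Deep Clustering code to achieve this?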

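The output sample rate in dual_path_test_wav is probably wrong

Hi, the model was trained at 8000 Hz, so the output should be 8000 Hz, right? When the output file is written, the sample rate is set to 16 kHz.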
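
The effect described here can be illustrated with simple arithmetic: the playback duration of a file is its sample count divided by the sample rate stored in the header, so writing 8 kHz samples with a 16 kHz header makes the audio play twice as fast. The clip length below is an illustrative number and the function name is hypothetical.

```python
def playback_seconds(num_samples, header_rate):
    """Duration a player reports for a file with the given header sample rate."""
    return num_samples / header_rate

num_samples = 6 * 8000  # a 6 s clip processed at 8 kHz (illustrative numbers)
correct = playback_seconds(num_samples, 8000)   # header matches the data
wrong = playback_seconds(num_samples, 16000)    # header claims 16 kHz
print(correct, wrong)
```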

Input Normalization

I'm not sure if my mixing is exactly the same as yours, but does your torchaudio read the wav files as int (values typically around a couple hundred) or as float values in [-1, 1]?

I started with scipy, which loads as int, and it caused the loss to go to NaN at some point. So I switched to librosa, which loads as float.
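
For readers who want to keep an integer-returning loader, a minimal sketch of the usual rescaling is shown below. It assumes 16-bit signed PCM input, which the issue does not state, and the function name is hypothetical.

```python
INT16_SCALE = 32768.0  # 2**15, full-scale magnitude of signed 16-bit PCM

def pcm16_to_float(samples):
    """Map signed 16-bit integer samples to floats in [-1.0, 1.0)."""
    return [s / INT16_SCALE for s in samples]

raw = [0, 16384, -32768, 32767]  # illustrative sample values
scaled = pcm16_to_float(raw)
print(scaled)
```

Keeping inputs in this range at both training and test time avoids feeding the network values that are several orders of magnitude larger than it was trained on.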

output and output_gate in decoder?

Hi, are those two something you added yourself, or are they from the original DPRNN paper?

GPU Usage

Hi Jusper

I tried your code for speech denoising and set the number of speakers to 1.

However, it only used 1 GPU when I set up multi-GPU training with 4 GPUs, which makes training very slow.

Have you ever encountered this problem?

Why was the skip-connection part of Conv-TasNet removed?

I would like to know why Conv-TasNet here does not keep the skip-connection part from the original paper.

TypeError: expected str, bytes or os.PathLike object, not NoneType when running DPRNN

Sampling rate and application

Hello! We are very glad to read the code you've shared here. It saves a lot of time in our work! We noticed that the audio files in the dataset have a sampling rate of 8 kHz, whereas most current audio files are at 44.1 kHz. So we wonder whether the model would give good results with 44.1 kHz audio.
In addition, we wonder whether this model can be used in other scenarios, for example to separate mixed bird songs.
Thanks again for your contribution!

train_rnn.py error in make_dataloader(opt)

Don't discard last chunk during training

Currently the data loader code discards the last chunk if it doesn't exactly fit N * chunk_size https://github.com/JusperLee/Dual-Path-RNN-Pytorch/blob/master/data_loader/AudioData.py#L77

For example taking the default step size of 1s and chunk size of 2s and an audio clip of length 3.5s, these are the chunks:

start | start + chunk_size
0     | 2 -> ok
1     | 3 -> ok
2     | 4 -> break

Is this on purpose? Shouldn't we zero-pad the last chunk?
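
The behavior in the table above, and the zero-padding alternative, can be sketched as follows. This is not the repository's data loader; the function name is hypothetical, and an 8 kHz sample rate is assumed only to convert the 3.5 s example into integer sample counts.

```python
def chunk_spans(total, chunk, step, pad_last=False):
    """Return (start, end, pad) tuples for a clip of `total` samples."""
    spans = []
    start = 0
    while start < total:
        end = start + chunk
        if end <= total:
            spans.append((start, end, 0))             # full chunk, no padding
        elif pad_last:
            spans.append((start, end, end - total))   # zero-pad the tail
            break                                     # the clip is now fully covered
        else:
            break                                     # current behavior: drop the tail
        start += step
    return spans

SR = 8000  # assumed sample rate, for illustration only
total, chunk, step = int(3.5 * SR), 2 * SR, 1 * SR
dropped = chunk_spans(total, chunk, step)
padded = chunk_spans(total, chunk, step, pad_last=True)
print(dropped)
print(padded)
```

With padding enabled, the third chunk covers seconds 2 to 4 and carries 0.5 s of zeros, so the last 0.5 s of the clip is no longer discarded.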

RuntimeError: Error(s) in loading state_dict for Conv_TasNet: Missing key(s) in state_dict: "separation.conv1d_list.0.0.conv1x1.weight",

When I run inference with the provided model, I get the following error. Is there anything wrong with the model?

root:/lab/Dual-Path-RNN-Pytorch# python3 test_tasnet.py -mix_scp=sample/test.scp -save_path=sample/result
Traceback (most recent call last):
File "test_tasnet.py", line 81, in
main()
File "test_tasnet.py", line 76, in main
separation=Separation(args.mix_scp, args.yaml, args.model, gpuid)
File "test_tasnet.py", line 20, in __init__
net.load_state_dict(dicts["model_state_dict"])
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1045, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for Conv_TasNet:
Missing key(s) in state_dict: "separation.conv1d_list.0.0.conv1x1.weight", "separation.conv1d_list.0.0.conv1x1.bias", "separation.conv1d_list.0.0.PReLu1.weight", "separation.conv1d_list.0.0.norm1.weight", "separation.conv1d_list.0.0.norm1.bias", "separation.conv1d_list.0.0.dwconv.weight", "separation.conv1d_list.0.0.dwconv.bias", "separation.conv1d_list.0.0.PReLu2.weight", "separation.conv1d_list.0.0.norm2.weight", "separation.conv1d_list.0.0.norm2.bias", "separation.conv1d_list.0.0.end_conv1x1.weight", "separation.conv1d_list.0.0.end_conv1x1.bias", "separation.conv1d_list.0.1.conv1x1.weight", "separation.conv1d_list.0.1.conv1x1.bias", "separation.conv1d_list.0.1.PReLu1.weight", "separation.conv1d_list.0.1.norm1.weight", "separation.conv1d_list.0.1.norm1.bias", "separation.conv1d_list.0.1.dwconv.weight", "separation.conv1d_list.0.1.dwconv.bias", "separation.conv1d_list.0.1.PReLu2.weight", "separation.conv1d_list.0.1.norm2.weight", "separation.conv1d_list.0.1.norm2.bias", "separation.conv1d_list.0.1.end_conv1x1.weight", "separation.conv1d_list.

Batch size of DPRNN_TasNet

Hi Kai,

I noticed that the batch size in your training config for DPRNN_TasNet is 1. Have you tried a larger batch size? How does the performance compare with batch size 1?

Thanks a lot!

Output normalization in dualrnn_test_wav.py

Thank you for your contribution!

I have a question about line 44 of dualrnn_test_wav.py.

You normalize the DPRNN prediction as below:

norm = torch.norm(egs,float('inf'))
#norm
s = s - torch.mean(s)
s = s*norm/torch.max(torch.abs(s))

I'm a little confused here: you don't do any input normalization in the training pipeline, so why do it here?

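Why does dual-path RNN need an overlap of P?

Hello, when dual-path RNN segments the input, why does it need overlap? If the input is split into N/P non-overlapping segments of length P, is the effect on performance large?

A problem running train_rnn.py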

C:\Users\yhq.conda\envs\pytorch\python.exe D:/Dual-Path-RNN-Pytorch-master/Dual-Path-RNN-Pytorch-master/train_rnn.py
C:\Users\yhq.conda\envs\pytorch\lib\site-packages\torchaudio\extension\extension.py:14: UserWarning: torchaudio C++ extension is not available.
warnings.warn('torchaudio C++ extension is not available.')
Traceback (most recent call last):
File "D:/Dual-Path-RNN-Pytorch-master/Dual-Path-RNN-Pytorch-master/train_rnn.py", line 91, in
train()
File "D:/Dual-Path-RNN-Pytorch-master/Dual-Path-RNN-Pytorch-master/train_rnn.py", line 61, in train
opt = option.parse(args.opt)
File "D:\Dual-Path-RNN-Pytorch-master\Dual-Path-RNN-Pytorch-master\config\option.py", line 4, in parse
with open(opt_path, mode='r') as f:
TypeError: expected str, bytes or os.PathLike object, not NoneType

Process finished with exit code 1
How can I fix this error, or which part is going wrong? I'd appreciate a reply, thanks!
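
The traceback above shows the script being started without arguments, so the option path passed to open() is None. The sketch below reproduces that situation with a hypothetical stand-in parser (it is not the repository's train_rnn.py) and shows that supplying --opt, as in the command python train_rnn.py --opt config/Dual_RNN/train_rnn.yml quoted in another issue, yields a string path instead.

```python
import argparse

def build_parser(require_opt):
    """Hypothetical stand-in for the argument parser of a training script."""
    parser = argparse.ArgumentParser(description="stand-in for train_rnn.py")
    parser.add_argument("--opt", type=str, required=require_opt,
                        help="path to the YAML option file")
    return parser

# Without --opt the value defaults to None, and open(None) raises the TypeError.
args = build_parser(require_opt=False).parse_args([])
print(args.opt)

# With --opt the value is a string path that can be opened.
args = build_parser(require_opt=False).parse_args(
    ["--opt", "config/Dual_RNN/train_rnn.yml"])
print(args.opt)
```

Marking the argument as required (require_opt=True in this sketch) would make argparse exit with a usage message instead of failing later inside open().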

Evaluation on 4sec segments?

Hi,

I was just having a look at the evaluation script.
Can you confirm that you are discarding the utterances smaller than 2s and chunking all remaining utterances in chunks of 4sec, zero-padding the last chunk?

Thanks

BTW, the demo page is great !

error

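What does this mean, and how do I fix it? Please advise.

There is no need to read all the data into CPU memory.

Hello, I would like to ask: does this method train on all the mixtures in wsj02_mix with the training code, and then take one mixture from wsj02_mix as input to the test code to obtain its separation result?

Hyperparameter settings for 48 kHz audio

If the audio sample rate is 48 kHz, besides changing L to 96, do other hyperparameters such as N, B, and H need to change?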

multichannel

Hi, I am new to speech separation. I'm trying to do multichannel speech separation in the time domain. What should I pay attention to when using DPRNN for the multichannel case?

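About fine-tuning

Hi, since your code is trained at 8 kHz, can I fine-tune your pretrained model on a 16 kHz dataset?

Demo results

Is there still a working link to the demo page? This one returns a 404 and won't open, even through a VPN.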

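How many epochs do I need at least?

How many epochs did you use in your own training? I've now run two epochs and the loss is -16.8.

DPRNN performance under reverberation

Hello, I have a question about DPRNN performance in reverberant conditions.

I trained on a dataset I synthesized myself (fs = 8 kHz, RT60 = 0.7 s, 4 s training segments; single channel, with the clean reverberant signal as the target, lr = 1e-3).
However, compared with training on non-reverberant data, SI-SDR rises extremely slowly and seems to peak at only about 0.6 dB. The spectrograms of the two separated outputs also show no obvious difference.

According to Luo's paper, the WER in the noise-free reverberant speech scenario can reach 9.1%.
I am not sure whether the problem is my dataset, or whether I missed some setting that differs from the non-reverberant case and makes training difficult. Any guidance would be appreciated!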

Error when training 3-mix

I am trying to train the model on 3-mix datasets, and I only changed num_speakers and the data path. Is there anything else I need to modify? Thanks!

The error details follow:

Training...
data_loader.len 20000
/home/bsipl_1/PycharmProjects/dual-path-RNNs-DPRNNs-based-speech-separation-master/utils.py:45: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than tensor.new_tensor(sourceTensor).
frame = signal.new_tensor(frame).long().cuda() # signal may in GPU or CPU
Traceback (most recent call last):
File "/home/bsipl_1/PycharmProjects/dual-path-RNNs-DPRNNs-based-speech-separation-master/train.py", line 138, in
main(args)
File "/home/bsipl_1/PycharmProjects/dual-path-RNNs-DPRNNs-based-speech-separation-master/train.py", line 132, in main
solver.train()
File "/home/bsipl_1/PycharmProjects/dual-path-RNNs-DPRNNs-based-speech-separation-master/solver.py", line 60, in train
tr_avg_loss = self._run_one_epoch(epoch)
File "/home/bsipl_1/PycharmProjects/dual-path-RNNs-DPRNNs-based-speech-separation-master/solver.py", line 153, in _run_one_epoch
cal_loss(padded_source, estimate_source, mixture_lengths)
File "/home/bsipl_1/PycharmProjects/dual-path-RNNs-DPRNNs-based-speech-separation-master/pit_criterion.py", line 19, in cal_loss
source_lengths)
File "/home/bsipl_1/PycharmProjects/dual-path-RNNs-DPRNNs-based-speech-separation-master/pit_criterion.py", line 31, in cal_si_snr_with_pit
assert source.size() == estimate_source.size()
AssertionError

Process finished with exit code 1

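Can I train on a CPU, and how should I modify train.yml?

Hello, following your advice yesterday, I read your Zhihu write-up of the dual-path RNN paper and wanted to see what metrics the reproduction code can reach, but there are still things I don't understand about the SDRi metric.

Because of the pandemic I cannot return to campus and only have a laptop with a 1050 GPU. Can it run this training code? Also, how can the SDRi in this paper be compared with Google's SDR? Can SDRi be understood this way: the signal-to-distortion ratio between the clean speech and the noisy-environment signal gives SDR1, the signal-to-distortion ratio between the clean speech and the generated target speech gives SDR2, and the logarithmic ratio of SDR2 to SDR1 gives SDRi?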
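
The SDR1/SDR2 reading in the question can be sketched numerically. SDR improvement is conventionally the SDR of the separated output minus the SDR of the unprocessed mixture, both in dB; since the values are already logarithmic, this difference corresponds to the ratio of the two linear signal-to-distortion ratios. The sketch below uses a simplified SNR-style SDR rather than a full BSS Eval SDR, and all signals are made-up illustrative values.

```python
import math

def snr_db(reference, estimate):
    """Simplified SDR: reference energy over error energy, in dB."""
    signal = sum(r * r for r in reference)
    error = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    return 10 * math.log10(signal / error)

clean = [0.0, 1.0, 0.0, -1.0]
mixture = [0.5, 1.5, -0.5, -0.5]    # clean plus an interfering signal
separated = [0.1, 0.9, 0.0, -1.1]   # hypothetical model output

sdr_before = snr_db(clean, mixture)     # "SDR1" in the question
sdr_after = snr_db(clean, separated)    # "SDR2" in the question
sdr_improvement = sdr_after - sdr_before
print(sdr_before, sdr_after, sdr_improvement)
```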