nbss's Introduction

Multichannel Speech Separation, Denoising and Dereverberation

The official repo of:
[1] Changsheng Quan, Xiaofei Li. Multi-channel Narrow-band Deep Speech Separation with Full-band Permutation Invariant Training. In ICASSP 2022.
[2] Changsheng Quan, Xiaofei Li. Multichannel Speech Separation with Narrow-band Conformer. In Interspeech 2022.
[3] Changsheng Quan, Xiaofei Li. NBC2: Multichannel Speech Separation with Revised Narrow-band Conformer. arXiv:2212.02076.
[4] Changsheng Quan, Xiaofei Li. SpatialNet: Extensively Learning Spatial Information for Multichannel Joint Speech Separation, Denoising and Dereverberation. TASLP, 2024.
[5] Changsheng Quan, Xiaofei Li. Multichannel Long-Term Streaming Neural Speech Enhancement for Static and Moving Speakers. arXiv:2403.07675.

Audio examples can be found at https://audio.westlake.edu.cn/Research/nbss.htm and https://audio.westlake.edu.cn/Research/SpatialNet.htm. More information about our group can be found at https://audio.westlake.edu.cn.

Performance

SpatialNet:

  • SOTA performance on 6 public datasets for all three multichannel tasks (speech separation, denoising and dereverberation). Below are the results on the SMS-WSJ dataset.

  • Relatively low computational cost and small model size.

Requirements

pip install -r requirements.txt

# gpuRIR: check https://github.com/DavidDiazGuerra/gpuRIR

Generate Dataset SMS-WSJ-Plus

Generate the room impulse responses (RIRs) for the SMS-WSJ-Plus dataset used in the SpatialNet ablation experiments.

CUDA_VISIBLE_DEVICES=0 python generate_rirs.py --rir_dir ~/datasets/SMS_WSJ_Plus_rirs --save_to configs/datasets/sms_wsj_rir_cfg.npz
cp configs/datasets/sms_wsj_plus_diffuse.npz ~/datasets/SMS_WSJ_Plus_rirs/diffuse.npz # copy diffuse parameters

For SMS-WSJ, please see https://github.com/fgnt/sms_wsj

Train & Test

This project is built on the pytorch-lightning package, in particular its command line interface (CLI), so we recommend some familiarity with the Lightning CLI. Chinese-speaking users can learn the CLI and Lightning with the beginner project pytorch_lightning_template_for_beginners.

Train SpatialNet on GPU 0 with the network config file configs/SpatialNet.yaml and the dataset config file configs/datasets/sms_wsj_plus.yaml (replace the RIR and clean speech directories in the dataset config before training).

python SharedTrainer.py fit \
 --config=configs/SpatialNet.yaml \
 --config=configs/datasets/sms_wsj_plus.yaml \
 --model.channels=[0,1,2,3,4,5] \
 --model.arch.dim_input=12 \
 --model.arch.dim_output=4 \
 --model.arch.num_freqs=129 \
 --trainer.precision=bf16-mixed \
 --model.compile=true \
 --data.batch_size=[2,4] \
 --trainer.devices=0, \
 --trainer.max_epochs=100
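
The numeric overrides in the command above appear to follow from the microphone array and STFT setup. The sketch below is one plausible reading, not something this README states: it assumes dim_input counts a real and an imaginary part per microphone channel, dim_output counts a real and an imaginary part per separated speaker, and num_freqs is the number of one-sided STFT bins. The helper name derive_arch_dims and the values num_speakers=2 and n_fft=256 are hypothetical, chosen only because they reproduce the numbers in the command.

```python
def derive_arch_dims(channels, num_speakers, n_fft):
    """Return (dim_input, dim_output, num_freqs) under the stated assumptions."""
    dim_input = 2 * len(channels)   # assumed: real + imaginary part per microphone
    dim_output = 2 * num_speakers   # assumed: real + imaginary part per speaker
    num_freqs = n_fft // 2 + 1      # one-sided spectrum of a real-valued signal
    return dim_input, dim_output, num_freqs


dims = derive_arch_dims(channels=[0, 1, 2, 3, 4, 5], num_speakers=2, n_fft=256)
print(dims)  # (12, 4, 129)
```

If this reading holds, changing the channel subset in --model.channels would also require adjusting --model.arch.dim_input; check the config files before relying on it.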

More GPUs can be used by appending their indexes to trainer.devices, e.g. --trainer.devices=0,1,2,3, (the trailing comma is part of the value).
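
The trailing comma in the devices value matters. As far as we understand Lightning's handling of this option, a string containing a comma is read as a list of GPU indexes, while a bare integer is read as a number of devices. The function below is a simplified stand-in written for illustration, not Lightning's own parser; parse_devices is a hypothetical name.

```python
def parse_devices(spec: str):
    """Mimic the convention: a comma means a list of GPU indexes,
    otherwise the value is a device count."""
    spec = spec.strip()
    if "," in spec:
        # Empty fragments (e.g. after a trailing comma) are skipped.
        return [int(part) for part in spec.split(",") if part.strip()]
    return int(spec)


print(parse_devices("0,"))        # [0]
print(parse_devices("0,1,2,3,"))  # [0, 1, 2, 3]
print(parse_devices("2"))         # 2
```

Under this convention, "0," selects the GPU with index 0, whereas a bare "0" would be a device count rather than an index.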

Resume training from a checkpoint:

python SharedTrainer.py fit --config=logs/SpatialNet/version_x/config.yaml \
 --data.batch_size=[2,2] \
 --trainer.devices=0, \
 --ckpt_path=logs/SpatialNet/version_x/checkpoints/last.ckpt

where version_x should be replaced with the version you want to resume.
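
When several runs exist, picking the version to resume can be scripted. The sketch below assumes only the directory layout shown in the command above (logs/SpatialNet/version_x/checkpoints/last.ckpt) and that x is an integer; latest_resumable_version is a hypothetical helper, not part of this repository.

```python
from pathlib import Path
from typing import Optional


def latest_resumable_version(log_root: str) -> Optional[Path]:
    """Return the highest-numbered version_* directory that has a last.ckpt."""
    candidates = []
    for version_dir in Path(log_root).glob("version_*"):
        suffix = version_dir.name[len("version_"):]
        ckpt = version_dir / "checkpoints" / "last.ckpt"
        if suffix.isdigit() and ckpt.is_file():
            candidates.append((int(suffix), version_dir))
    if not candidates:
        return None
    # Compare by the numeric suffix so that version_10 ranks above version_2.
    return max(candidates, key=lambda item: item[0])[1]


if __name__ == "__main__":
    # Prints None when no resumable run is found under the given root.
    print(latest_resumable_version("logs/SpatialNet"))
```

The returned directory can then be substituted for version_x in both --config and --ckpt_path.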

Test the model trained:

python SharedTrainer.py test --config=logs/SpatialNet/version_x/config.yaml \
 --ckpt_path=logs/SpatialNet/version_x/checkpoints/epochY_neg_si_sdrZ.ckpt \
 --trainer.devices=0,
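
Choosing which checkpoint to test can also be scripted. The sketch below assumes that Y in the file name is an integer epoch and Z is the validation negative SI-SDR written as a plain decimal number, so the lowest value marks the best checkpoint. The file names in the example are made up for illustration, and best_checkpoint is a hypothetical helper; adjust the pattern to the names your run actually produces.

```python
import re
from typing import Iterable, Optional

# Assumed naming scheme: epoch<Y>_neg_si_sdr<Z>.ckpt
CKPT_PATTERN = re.compile(r"epoch(\d+)_neg_si_sdr(-?\d+(?:\.\d+)?)\.ckpt$")


def best_checkpoint(names: Iterable[str]) -> Optional[str]:
    """Return the checkpoint name with the lowest negative SI-SDR, if any."""
    scored = []
    for name in names:
        match = CKPT_PATTERN.search(name)
        if match:
            scored.append((float(match.group(2)), name))
    if not scored:
        return None
    # Lower negative SI-SDR means higher SI-SDR, i.e. better separation.
    return min(scored)[1]


# Hypothetical file names, for illustration only.
names = ["epoch10_neg_si_sdr-12.5.ckpt", "epoch42_neg_si_sdr-15.1.ckpt", "last.ckpt"]
print(best_checkpoint(names))  # epoch42_neg_si_sdr-15.1.ckpt
```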

Module Version

network                              file
NB-BLSTM [1] / NBC [2] / NBC2 [3]    models/arch/NBSS.py
SpatialNet [4]                       models/arch/SpatialNet.py
online SpatialNet [5]                models/arch/OnlineSpatialNet.py

Note

The dataset generation and training commands for NB-BLSTM, NBC and NBC2 are available in the NBSS branch.
