
CMGAN's Introduction

CMGAN: Conformer-Based Metric GAN for Monaural Speech Enhancement (https://ieeexplore.ieee.org/document/10508391)

Abstract:

Recently, convolution-augmented transformer (Conformer) has achieved promising performance in automatic speech recognition (ASR) and time-domain speech enhancement (SE), as it can capture both local and global dependencies in the speech signal. In this paper, we propose a conformer-based metric generative adversarial network (CMGAN) for SE in the time-frequency (TF) domain. In the generator, we utilize two-stage conformer blocks to aggregate all magnitude and complex spectrogram information by modeling both time and frequency dependencies. The estimation of magnitude and complex spectrogram is decoupled in the decoder stage and then jointly incorporated to reconstruct the enhanced speech. In addition, a metric discriminator is employed to further improve the quality of the enhanced estimated speech by optimizing the generator with respect to a corresponding evaluation score. Quantitative analysis on the Voice Bank+DEMAND dataset indicates the capability of CMGAN to outperform various previous models by a margin, i.e., PESQ of 3.41 and SSNR of 11.10 dB.

Demo of audio samples

A longer, detailed version is now available in IEEE/ACM Transactions on Audio, Speech, and Language Processing (arXiv version).

The short manuscript was published at INTERSPEECH 2022.

Source code is released!

How to train:

Step 1:

In the src directory:

pip install -r requirements.txt

Step 2:

Download the VCTK-DEMAND dataset resampled to 16 kHz and arrange the dataset directory as follows:

-VCTK-DEMAND/
  -train/
    -noisy/
    -clean/
  -test/
    -noisy/
    -clean/
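The public Voice Bank+DEMAND release ships at 48 kHz (see the issues below), so the files must be resampled first. A minimal sketch of one way to do this, assuming librosa and soundfile are installed; the 48 kHz source directory name is hypothetical:

import os
import librosa
import soundfile as sf

def resample_dir(src_dir, dst_dir, target_sr=16000):
    # Read every wav at target_sr (librosa resamples on load) and rewrite it.
    os.makedirs(dst_dir, exist_ok=True)
    for name in sorted(os.listdir(src_dir)):
        if not name.endswith(".wav"):
            continue  # skip stray entries such as .ipynb_checkpoints
        audio, _ = librosa.load(os.path.join(src_dir, name), sr=target_sr)
        sf.write(os.path.join(dst_dir, name), audio, target_sr)

for split in ("train", "test"):
    for kind in ("noisy", "clean"):
        resample_dir(os.path.join("VCTK-DEMAND-48k", split, kind),
                     os.path.join("VCTK-DEMAND", split, kind))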

Step 3:

To train the model, run train.py:

python3 train.py --data_dir <dir to VCTK-DEMAND dataset>

Step 4:

Evaluation with the best ckpt:

python3 evaluation.py --test_dir <dir to VCTK-DEMAND/test> --model_path <path to the best ckpt>

Model and Comparison:

The detailed architecture of CMGAN with both generator and discriminator.

Performance comparison on the Voice Bank+DEMAND dataset. “-” denotes that the result is not provided in the original paper. Model size is the number of trainable parameters in millions.

Long version citation:

@misc{abdulatif2022cmgan,
  title={CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement}, 
  author={Abdulatif, Sherif and Cao, Ruizhe and Yang, Bin},
  year={2022},
  eprint={2209.11112},
  archivePrefix={arXiv}
}

Short version citation:

@inproceedings{cao22_interspeech,
  author={Cao, Ruizhe and Abdulatif, Sherif and Yang, Bin},
  title={{CMGAN: Conformer-based Metric GAN for Speech Enhancement}},
  year=2022,
  booktitle={Proc. Interspeech 2022},
  pages={936--940},
  doi={10.21437/Interspeech.2022-517}
}

CMGAN's People

Contributors

aniruddhapal211316, ruizhecao96, sherifabdulatif


CMGAN's Issues

Extract estimated noise

Hi!
First off, excellent paper and code! I learnt a lot about SE from the paper and find your code very well structured and clear. Thank you!

Now, what I'd like to be able to do is to extract the estimated noise (as opposed to the estimated speech). I have experimented a little with the code but have not been able to extract the noise. Is this supported by the model structure, and if so, how?

Kindest wishes.
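One model-agnostic way to get at the estimated noise is to subtract the enhanced waveform from the noisy input, since the network itself only predicts speech. A minimal sketch, assuming noisy and enhanced are hypothetical, time-aligned tensors at the same sample rate:

import torch

# Residual-based noise estimate: whatever the enhancer removed.
# Trim to the shorter length in case the iSTFT changed the length slightly.
n = min(noisy.shape[-1], enhanced.shape[-1])
est_noise = noisy[..., :n] - enhanced[..., :n]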

About the decreasing of loss

Hello,

Thanks for releasing the code!

I tried to train the model on my own 8 kHz dataset. I wonder if you could provide your training log. Does the generator loss decrease rapidly in the first few epochs? I just want to make sure my adaptation to my own dataset is correct...

Thanks a lot.

Test set requirements when training

Hello,
I am implementing your code to train another model on another dataset.
However, when I put only the train set (without the test set) in the "data" folder, the error below appeared.

Do we need to use a test set during the training phase? And can we use the model to clean another test set?

$ python3 train.py --data_dir /scratch/thinh/CMGAN/20240227_Backup_alarm_5dB/CMGAN/src/data/
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 1
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
Namespace(batch_size=1, cut_len=32000, data_dir='/scratch/thinh/CMGAN/20240227_Backup_alarm_5dB/CMGAN/src/data/', decay_epoch=30, epochs=20, init_lr=0.0005, log_interval=500, loss_weights=[0.1, 0.9, 0.2, 0.05], save_model_dir='./saved_model')
['Tesla V100-SXM2-16GB', 'Tesla V100-SXM2-16GB']
Traceback (most recent call last):
File "train.py", line 298, in
mp.spawn(main, args=(world_size, args), nprocs=world_size)
File "/home/thinh/.local/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/thinh/.local/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/home/thinh/.local/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/thinh/.local/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/scratch/thinh/CMGAN/20240227_Backup_alarm_5dB/CMGAN/src/train.py", line 288, in main
args.data_dir, args.batch_size, 2, args.cut_len
File "/scratch/thinh/CMGAN/20240227_Backup_alarm_5dB/CMGAN/src/data/dataloader.py", line 60, in load_data
test_ds = DemandDataset(test_dir, cut_len)
File "/scratch/thinh/CMGAN/20240227_Backup_alarm_5dB/CMGAN/src/data/dataloader.py", line 18, in init
self.clean_wav_name = os.listdir(self.clean_dir)
FileNotFoundError: [Errno 2] No such file or directory: '/scratch/thinh/CMGAN/20240227_Backup_alarm_5dB/CMGAN/src/data/test/clean'

How do you resample to 16000?

When I used the VCTK-DEMAND dataset for testing, I found that the sample rate of the wavs in VCTK-DEMAND/test/ is 48000, but evaluation.py asserts

assert sr == 16000

so I added these lines to resample:

def evaluation(model_path, noisy_dir, clean_dir, save_tracks, saved_dir):
    ...
    clean_audio, sr = sf.read(clean_path)
    clean_audio = librosa.resample(clean_audio, sr, 16000)
    metrics = compute_metrics(clean_audio, est_audio, 16000, 0)
    ...

def enhance_one_track(model, audio_path, saved_dir, cut_len, n_fft=400, hop=100, save_tracks=False):
    noisy, sr = torchaudio.load(audio_path)
    # audio_path: VCTK-DEMAND/test/noisy/p232_001.wav, sr 48000
    noisy_np = noisy.numpy()
    noisy_resampled_np = librosa.resample(noisy_np, sr, 16000)
    noisy = torch.tensor(noisy_resampled_np)
    sr = 16000
    noisy = noisy.cuda().to(device)
    ...

and generated some wavs. But the audio quality of the resulting WAV files is very poor; it is hard to make out the speech.
How do you resample to 16000? Maybe my way of resampling is wrong?

And the result is

pesq: 1.2306799195634508 csig: 1.6080942775665945 cbak: 2.1193723316105366 covl: 1.4202725754636616 ssnr: 0.6998261689532145 stoi: 0.6101097034995405
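A likely cause of the poor quality: older librosa.resample versions expect a 1-D float array, while torchaudio.load returns a (channels, samples) tensor. A minimal sketch that stays inside torchaudio instead (hedged; this is not the authors' script):

import torchaudio
import torchaudio.functional as F

noisy, sr = torchaudio.load(audio_path)  # shape (channels, samples), e.g. sr == 48000
if sr != 16000:
    noisy = F.resample(noisy, orig_freq=sr, new_freq=16000)
    sr = 16000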

Not match the pretrained ckpt

I used VCTK-DEMAND to train the model (120 epochs), but when I use the 120-epoch ckpt for inference on the VCTK-DEMAND noisy data, the denoising effect is not as good as with the pretrained ckpt.
Did you use an extra dataset to train the model?

Question about the time-domain loss computation

When computing the time-domain loss, the reconstructed audio (multiplied by c) is subtracted from the clean audio (the dataloader output, which is not multiplied by c). Is this a bug or intentional? If it is intentional, should inference output the reconstructed audio directly? Why is the reconstructed audio divided by c at inference time?
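For context on c: a sketch of the normalization pattern as it appears to be used (an assumption based on the scripts, not a verbatim excerpt). Each utterance is scaled toward unit average power before the STFT, and inference divides by c to restore the original level:

import torch

# noisy: (batch, time) waveform; c scales each utterance to unit average power.
c = torch.sqrt(noisy.size(-1) / torch.sum(noisy ** 2.0, dim=-1, keepdim=True))
noisy_norm = noisy * c
# ... forward pass + iSTFT produce est_audio in this normalized domain ...
est_audio = est_audio / c  # inference undoes the scaling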

My server has a 3090, but reports that I don't have a gpu

[W CUDAFunctions.cpp:108] Warning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11070). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (function operator())
Traceback (most recent call last):
File "/root/.pycharm_helpers/pydev/pydevd.py", line 1483, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "/root/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/root/autodl-tmp/CMGAN/src/train.py", line 297, in
python-BaseException
mp.spawn(main, args=(world_size, args), nprocs=world_size)
File "/root/miniconda3/envs/CMGAN/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 246, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method="spawn")
File "/root/miniconda3/envs/CMGAN/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 202, in start_processes
while not context.join():
File "/root/miniconda3/envs/CMGAN/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 163, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/root/miniconda3/envs/CMGAN/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 74, in _wrap
fn(i, *args)
File "/root/autodl-tmp/CMGAN/src/train.py", line 279, in main
ddp_setup(rank, world_size)
File "/root/autodl-tmp/CMGAN/src/train.py", line 42, in ddp_setup
init_process_group(backend="nccl", rank=rank, world_size=world_size)
File "/root/miniconda3/envs/CMGAN/lib/python3.9/site-packages/torch/distributed/c10d_logger.py", line 74, in wrapper
func_return = func(*args, **kwargs)
File "/root/miniconda3/envs/CMGAN/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 1148, in init_process_group
default_pg, _ = _new_process_group_helper(
File "/root/miniconda3/envs/CMGAN/lib/python3.9/site-packages/torch/distributed/distributed_c10d.py", line 1279, in _new_process_group_helper
backend_class = ProcessGroupNCCL(backend_prefix_store, group_rank, group_size, pg_options)
RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!

What Python version does this model work on?

It gives me this:

ERROR: Could not find a version that satisfies the requirement torch==1.10.0+cu113 (from versions: 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0, 1.12.0, 1.12.1, 1.13.0)
ERROR: No matching distribution found for torch==1.10.0+cu113

I think I am using the wrong Python version.

Can not reproduce the results

Hi!
Your paper and code are excellent! I have learned a lot about speech enhancement from the paper, and I find your code to be very well-structured and clear. Thank you so much!

I can not reproduce the results in your paper. I just want to confirm some settings used to run the experiments:

  1. About the loss_weights: do you use the setting from your paper or the setting from your GitHub?
  2. About the number of epochs: do you use 50 as in the paper or 120 as in the GitHub repo?
  3. How do you select the final model for inference?
  4. Why do you set the utterance length to 16 * 16000 during testing?
  5. How do you downsample the audio? Could you share the script?

Model Finetuning

Hello Ruizhe. Thank you for sharing the entire code!
I am trying to fine-tune the saved ckpt on a custom dataset, and I was wondering if you could give a few hints regarding the following code. Is this the correct way to load the saved best_ckpt that you provided?

def train(self):
        scheduler_G = torch.optim.lr_scheduler.StepLR(self.optimizer, step_size=decay_epoch, gamma=0.5)
        scheduler_D = torch.optim.lr_scheduler.StepLR(self.optimizer_disc, step_size=decay_epoch, gamma=0.5)
        self.model.load_state_dict(torch.load(model_path))  # load the released best_ckpt before fine-tuning
        for epoch in range(epochs):
            self.model.train()
            self.discriminator.train()
            for idx, batch in enumerate(self.train_ds):
                step = idx + 1
                loss, disc_loss = self.train_step(batch)
                template = 'GPU: {}, Epoch {}, Step {}, loss: {}, disc_loss: {}'
                # template = 'Epoch {}, Step {}, loss: {}, disc_loss: {}'
                if (step % log_interval) == 0:
                    # logging.info(template.format(epoch, step, loss, disc_loss))
                    logging.info(template.format(self.gpu_id, epoch, step, loss, disc_loss))
            gen_loss = self.test()
            path = os.path.join(save_model_dir, 'CMGAN_epoch_' + str(epoch) + '_' + str(gen_loss)[:5])
            if not os.path.exists(save_model_dir):
                os.makedirs(save_model_dir)
            # if self.gpu_id == 0:
            # torch.save(self.model.module.state_dict(), path)
            torch.save(self.model.state_dict(), path)
            scheduler_G.step()
            scheduler_D.step()

RuntimeError

istft requires a complex-valued input tensor matching the output from stft with return_complex=True.

Chinese

Sorry for the trouble. I want to ask whether the model is suitable for a Chinese speech dataset (containing clean and noisy pairs).

epochs

Hello, after how many epochs of training does the model converge? I see the default training is 120 epochs.

Error while loading audio file while training on Colab

Original Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/content/CMGAN/src/data/dataloader.py", line 25, in getitem
clean_ds, _ = torchaudio.load(clean_file)
File "/usr/local/lib/python3.8/dist-packages/torchaudio/backend/sox_io_backend.py", line 152, in load
return torch.ops.torchaudio.sox_io_load_audio_file(
RuntimeError: Error loading audio file: failed to open file /content/train/clean/.ipynb_checkpoints
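The loader trips over Colab's hidden .ipynb_checkpoints folder because os.listdir returns every entry. A minimal guard for DemandDataset (hedged; attribute names are taken from the traceback above):

# In DemandDataset.__init__: keep only .wav files so stray entries such as
# .ipynb_checkpoints are ignored.
self.clean_wav_name = [f for f in os.listdir(self.clean_dir) if f.endswith(".wav")]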

The weight

Hello, I want to ask whether the weight under the path "src/best_ckpt" is a trained weight, because when I use it to evaluate my own data, the results are not good.

Training can get stuck

Hello
With batch_size=4, mp.spawn(main, args=(2, args), nprocs=2), and train_ds, test_ds = dataloader.load_data(args.data_dir, args.batch_size, 8, args.cut_len), training gets stuck during the first epoch with GPU utilization at 100%. Have you encountered this, and how can it be solved?

Dataset

Can you please share the code you used to organise the VCTK corpus?
Thanks

SSNR is 10.4

I used your open-source code from GitHub, ran evaluation.py, and downloaded the VCTK+DEMAND data, using the model checkpoint you left in the best_ckpt folder on GitHub. Because the source data is 48000 Hz, I added a resampling step to the code. After running evaluation.py, the resulting SSNR is only 10.4, and the other metrics are also slightly lower than in the paper. I would appreciate any thoughts.

the change of gen_loss during training

Due to limited GPU memory I have to change following parameters to reduce memory usage in training:

  • batch_size: from 4 to 2
  • self.hop: from 100 to 200

Training works fine, but the gen_loss in the checkpoint model filename seems to be stuck at about 0.057. Just curious what the gen_loss would be with the default parameters.

how to train on samplerate=44100

I get this error:

Run model on reference(ref) and degraded(deg)
Sample rate (fs) - No default. Must select either 8000 or 16000.
Note there is narrow band (nb) mode only when sampling rate is 8000Hz.

It seems the model can only be trained at 8000 Hz or 16000 Hz?
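The message comes from the PESQ metric, which is only defined for 8 kHz (narrow-band) and 16 kHz (wide-band), so 44100 Hz audio has to be resampled before scoring. A minimal sketch, assuming the pesq package and torchaudio; ref and deg are hypothetical 1-D waveform tensors:

import torchaudio.functional as F
from pesq import pesq

# Resample both reference and degraded signals to 16 kHz before scoring.
ref_16k = F.resample(ref, orig_freq=44100, new_freq=16000)
deg_16k = F.resample(deg, orig_freq=44100, new_freq=16000)
score = pesq(16000, ref_16k.numpy(), deg_16k.numpy(), "wb")  # "nb" is only valid at 8000 Hz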

Inferior results trained from scratch

Hello! Your paper and code are very enlightening to me, and I tried to train the model from scratch on the VCTK-DEMAND dataset to reproduce the results, but the results are very bad: PESQ and SSNR are merely 2.13 and 1.12, respectively. I did not modify the code except for changing cut_len to 1.6 and batch_size to 2 to fit my limited GPU. I don't know what errors are in my setup.
For hyper-parameters, I ran experiments with both [0.3, 0.7, 1, 0.01] from the paper and [0.1, 0.9, 0.2, 0.05] from GitHub, but the results are similar. For inference, I changed the variable length to 8.
Looking forward to your reply~

Can not reproduce the results

Sorry to bother you. When I trained with the original GitHub parameters for 50 epochs, PESQ only reached 2.2. I found that the generator loss increases around epoch 20, so I tested the epoch-20 checkpoint instead, which gives a PESQ of 2.5. Is this model easily affected by random seeds or similar factors? Should PESQ still be rising after 50 epochs? Here is part of my log file:

INFO:root:GPU: 0, Epoch 19, Step 500, loss: 0.13138285279273987, disc_loss: 0.0011457887012511492
INFO:root:GPU: 0, Epoch 19, Step 1000, loss: 0.13329894840717316, disc_loss: 0.002238199347630143
INFO:root:GPU: 0, Epoch 19, Step 1500, loss: 0.1375545859336853, disc_loss: 0.0013448027893900871
INFO:root:GPU: 0, Epoch 19, Step 2000, loss: 0.10363011807203293, disc_loss: 0.00407591974362731
INFO:root:GPU: 0, Epoch 19, Step 2500, loss: 0.12573812901973724, disc_loss: 0.0016270074993371964
INFO:root:GPU: 0, Generator loss: 0.10857707276506331, Discriminator loss: 0.006840812125004764
INFO:root:GPU: 0, Epoch 20, Step 500, loss: 0.13214150071144104, disc_loss: 0.0050138202495872974
INFO:root:GPU: 0, Epoch 20, Step 1000, loss: 0.1271681934595108, disc_loss: 0.001531225978396833
INFO:root:GPU: 0, Epoch 20, Step 1500, loss: 0.14278484880924225, disc_loss: 0.0020016769412904978
INFO:root:GPU: 0, Epoch 20, Step 2000, loss: 0.10626572370529175, disc_loss: 0.001245336257852614
INFO:root:GPU: 0, Epoch 20, Step 2500, loss: 0.12693673372268677, disc_loss: 0.0011474542552605271
INFO:root:GPU: 0, Generator loss: 0.11096440830711023, Discriminator loss: 0.009940159190909539
INFO:root:GPU: 0, Epoch 21, Step 500, loss: 0.13440656661987305, disc_loss: 0.002420067088678479
INFO:root:GPU: 0, Epoch 21, Step 1000, loss: 0.13075625896453857, disc_loss: 0.003778402227908373
INFO:root:GPU: 0, Epoch 21, Step 1500, loss: 0.1465650051832199, disc_loss: 0.000708427163772285
INFO:root:GPU: 0, Epoch 21, Step 2000, loss: 0.10176242142915726, disc_loss: 0.001363859511911869
INFO:root:GPU: 0, Epoch 21, Step 2500, loss: 0.12090417742729187, disc_loss: 0.003150659380480647
INFO:root:GPU: 0, Generator loss: 0.1136595553275451, Discriminator loss: 0.005748485746082524
INFO:root:GPU: 0, Epoch 22, Step 500, loss: 0.1308746635913849, disc_loss: 0.0012463699094951153
INFO:root:GPU: 0, Epoch 22, Step 1000, loss: 0.1357329934835434, disc_loss: 0.0028033568523824215
INFO:root:GPU: 0, Epoch 22, Step 1500, loss: 0.1422165483236313, disc_loss: 0.011835111305117607
INFO:root:GPU: 0, Epoch 22, Step 2000, loss: 0.10482912510633469, disc_loss: 0.002375335432589054
INFO:root:GPU: 0, Epoch 22, Step 2500, loss: 0.1230633407831192, disc_loss: 0.0025530874263495207
.
.
.
INFO:root:GPU: 0, Generator loss: 0.10818793404015523, Discriminator loss: 0.0060944516351333486
INFO:root:GPU: 0, Epoch 50, Step 500, loss: 0.13102197647094727, disc_loss: 0.001470182673074305
INFO:root:GPU: 0, Epoch 50, Step 1000, loss: 0.12735071778297424, disc_loss: 0.00043258434743620455
INFO:root:GPU: 0, Epoch 50, Step 1500, loss: 0.13080495595932007, disc_loss: 0.0006283668917603791
INFO:root:GPU: 0, Epoch 50, Step 2000, loss: 0.09814462810754776, disc_loss: 0.0006183524965308607
INFO:root:GPU: 0, Epoch 50, Step 2500, loss: 0.11451182514429092, disc_loss: 0.0008931112242862582
INFO:root:GPU: 0, Generator loss: 0.1056402796535816, Discriminator loss: 0.004768050249429471
INFO:root:GPU: 0, Epoch 51, Step 500, loss: 0.1282331794500351, disc_loss: 0.0003159200132358819
INFO:root:GPU: 0, Epoch 51, Step 1000, loss: 0.13263651728630066, disc_loss: 0.0025379392318427563
INFO:root:GPU: 0, Epoch 51, Step 1500, loss: 0.1349661946296692, disc_loss: 0.0008474360220134258
INFO:root:GPU: 0, Epoch 51, Step 2000, loss: 0.10511360317468643, disc_loss: 0.0023576354142278433
INFO:root:GPU: 0, Epoch 51, Step 2500, loss: 0.1188545823097229, disc_loss: 0.011680787429213524
INFO:root:GPU: 0, Generator loss: 0.1116596677349609, Discriminator loss: 0.003238930973998984
INFO:root:GPU: 0, Epoch 52, Step 500, loss: 0.12579278647899628, disc_loss: 0.0030293799936771393
INFO:root:GPU: 0, Epoch 52, Step 1000, loss: 0.13355058431625366, disc_loss: 0.0017012724420055747
INFO:root:GPU: 0, Epoch 52, Step 1500, loss: 0.13323292136192322, disc_loss: 0.0010849591344594955
INFO:root:GPU: 0, Epoch 52, Step 2000, loss: 0.09986048191785812, disc_loss: 0.002493425039574504
INFO:root:GPU: 0, Epoch 52, Step 2500, loss: 0.11676986515522003, disc_loss: 0.005572836380451918
INFO:root:GPU: 0, Generator loss: 0.10965107189654147, Discriminator loss: 0.00365501402201808
INFO:root:GPU: 0, Epoch 53, Step 500, loss: 0.12720242142677307, disc_loss: 0.020244047045707703
INFO:root:GPU: 0, Epoch 53, Step 1000, loss: 0.12291329354047775, disc_loss: 0.0010470837587490678
INFO:root:GPU: 0, Epoch 53, Step 1500, loss: 0.1304602324962616, disc_loss: 0.0023244067560881376
INFO:root:GPU: 0, Epoch 53, Step 2000, loss: 0.10364727675914764, disc_loss: 0.0025291203055530787
INFO:root:GPU: 0, Epoch 53, Step 2500, loss: 0.11734053492546082, disc_loss: 0.0014599505811929703
INFO:root:GPU: 0, Generator loss: 0.10770693768575354, Discriminator loss: 0.004855742649931361
INFO:root:GPU: 0, Epoch 54, Step 500, loss: 0.12715910375118256, disc_loss: 0.0006953651900403202
INFO:root:GPU: 0, Epoch 54, Step 1000, loss: 0.1277923434972763, disc_loss: 0.0006543396739289165
INFO:root:GPU: 0, Epoch 54, Step 1500, loss: 0.12873995304107666, disc_loss: 0.001449460512958467
INFO:root:GPU: 0, Epoch 54, Step 2000, loss: 0.10310039669275284, disc_loss: 0.0012442576698958874

When the GPU is occupied

When running evaluation.py, how much GPU memory is needed, for example, for a 20-second clip?

File "pesq/cypesq.pyx", line 1, in init cypesq ImportError: numpy.core.multiarray failed to import (auto-generated because you didn't call 'numpy.import_array()' after cimporting numpy; use '<void>numpy._import_array' to disable if you are certain you don't need it)

Hello, I met a problem when running the evaluation: 'File "pesq/cypesq.pyx", line 1, in init cypesq ImportError: numpy.core.multiarray failed to import (auto-generated because you didn't call 'numpy.import_array()' after cimporting numpy; use 'numpy._import_array' to disable if you are certain you don't need it)'. I can't fix this problem.

Run-time Error

Hey,
can you please tell me how to resolve this error? It is generated while running python train.py.

Platform: Google Colaboratory

RuntimeError: CUDA out of memory. Tried to allocate 1.97 GiB (GPU 0; 14.76 GiB total capacity; 12.16 GiB already allocated; 1.04 GiB free; 12.32 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Real-time enhancement

Hi,

Does this project work in real time? If not, do you think I can adapt it for real time?

Thank you

Too much GPU usage

I ran evaluation.py on your given AudioSamples with an RTX 2080 Ti. However, I got a CUDA out-of-memory error. I also tried to run inference on a single file, but it used approximately 8 GB of VRAM. This confused me because your generator has only about 1.8M parameters.
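Peak memory scales with utterance length (the conformer attention runs over the whole spectrogram), so parameter count alone is a poor proxy. A rough workaround is chunked inference under torch.no_grad(); a sketch with a hypothetical enhance_segment helper, not the repo's API:

import torch

@torch.no_grad()  # no gradients are stored, which alone saves a lot of VRAM
def enhance_chunked(model, noisy, chunk=16000 * 10):  # 10 s chunks at 16 kHz
    out = []
    for start in range(0, noisy.shape[-1], chunk):
        segment = noisy[..., start:start + chunk]
        out.append(enhance_segment(model, segment))  # hypothetical per-chunk wrapper
    return torch.cat(out, dim=-1)

Naive chunking can introduce audible seams at chunk boundaries; overlapping the chunks and cross-fading mitigates this.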

File "/anaconda3/envs/cmg/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 578, in __init__ dist._verify_model_across_ranks(self.process_group, parameters) RuntimeError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:957, invalid usage, NCCL version 21.0.3 ncclInvalidUsage: This usually reflects invalid usage of NCCL library (such as too many async ops, too many collectives at once, mixing streams in a group, etc).

When I run "python3 train.py --data_dir <dir to VCTK-DEMAND dataset>" as described in the README, the program is interrupted by an NCCL bug. I can't fix it. Can you help me?

the training speed confusion

Hi, thanks for your great work.
It seems that you trained with a single GPU, because the code for distributed training seems incomplete. I would really like to know the training time with a single GPU, because when I train with 4x 24 GB GPUs, I still need a day to reach 50 epochs.
Thanks for your attention; looking forward to your reply.

Inference

Hello! I saw your work on Papers with Code and wanted to try it out. Nice work.
I have completed the model training; can you share the inference code?

Have a nice day
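In the meantime, evaluation.py already contains the pieces needed for single-file inference. A minimal sketch built around the enhance_one_track signature quoted in the issues above; the model class name, constructor arguments, and checkpoint path are assumptions to adjust to the repo:

import torch
from models import generator              # module path assumed from the repo layout
from evaluation import enhance_one_track  # signature shown in the issues above

n_fft = 400
model = generator.TSCNet(num_channel=64, num_features=n_fft // 2 + 1).cuda()  # args assumed
model.load_state_dict(torch.load("best_ckpt/ckpt"))  # checkpoint path assumed
model.eval()

with torch.no_grad():
    enhance_one_track(model, "noisy/p232_001.wav", "enhanced/", cut_len=16 * 16000, save_tracks=True)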

RuntimeError

I don't want to change my PyTorch version, so I set "return_complex=False" in torch.stft(),
but I got an error while running torch.istft(). The problem is shown below:

/home/tewt/anaconda3/envs/hyj_test/lib/python3.9/site-packages/torch/functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error.
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:862.)
return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
Traceback (most recent call last):
File "/home/tewt/Desktop/hyj_projects/CMGAN/src/hyj_test2.py", line 53, in
est_audio = torch.istft(est_spec_uncompress, n_fft, hop, window=torch.hamming_window(n_fft).cuda(),
RuntimeError: istft requires a complex-valued input tensor matching the output from stft with return_complex=True.

Can anyone give some suggestions?
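On recent PyTorch, torch.istft only accepts a complex tensor; if the estimated spectrogram is kept as a real tensor with a trailing real/imag axis, view it as complex first. A minimal sketch under that assumption:

import torch

# est_spec_uncompress: real tensor of shape (..., freq, frames, 2)
est_spec_complex = torch.view_as_complex(est_spec_uncompress.contiguous())
est_audio = torch.istft(est_spec_complex, n_fft, hop,
                        window=torch.hamming_window(n_fft).cuda())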

RuntimeError

RuntimeError: The size of tensor a (3) must match the size of tensor b (4) at non-singleton dimension 0

Pre-trained weights

Hi! First off, lovely work with the model! I'd really like to try it, but have limited GPU power available to me.
Do you have any downloadable pre-trained weights that you could make public, so that one could do inference and fine-tuning with the model?
Thanks!

RuntimeError

RuntimeError: stft requires the return_complex parameter be given for real inputs, and will further require that return_complex=True in a future PyTorch release.
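The complementary fix on the analysis side is to pass return_complex=True to torch.stft and, if the network expects a separate real/imag axis, convert back with torch.view_as_real. A minimal sketch with hypothetical noisy, n_fft, and hop:

import torch

spec = torch.stft(noisy, n_fft, hop,
                  window=torch.hamming_window(n_fft).to(noisy.device),
                  return_complex=True)
spec_ri = torch.view_as_real(spec)  # (..., freq, frames, 2), the old return_complex=False layout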

ComplexDecoder

Hello, I found that, compared with your paper, your code has one conv layer fewer in the ComplexDecoder. Could you tell me the reason, or am I misunderstanding your paper/code? Thank you.
