santi-pdp / segan_pytorch

Speech Enhancement Generative Adversarial Network in PyTorch

License: MIT License

Python 95.67% Shell 0.59% MATLAB 3.74%

deeplearning gans neural-network pytorch segan

segan_pytorch's Issues

please help me

No module named 'ahoproc_tools'. I cannot find it. What can I do?
Thanks!

pesqmain not found! Please add it your PATH

where is pesqmain source code?

How to run the vanilla SEGAN version (like the one in TensorFlow repo)?
Or do I need to make it deeper manually, change the strides from 4 to 2, etc.? To my ears, after listening to the outputs you shared, SEGAN+ has tonal artifacts.

The enhanced .wav file sounds very weird!

After running run_segan+_clean.sh I obtained enhanced .wav files, but they sound as if the audio has been slowed down many times over. Which test data set should I use?
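One thing worth ruling out when the output sounds slowed down is a sample-rate mismatch between the input files and what the model expects. The sketch below uses only the standard-library `wave` module to read a file's header. The 16 kHz, mono, 16-bit target is an assumption taken from other reports in this tracker, and `describe_wav` and `matches_expected` are hypothetical helper names, not part of segan_pytorch.

```python
import wave


def describe_wav(path):
    """Return (sample_rate_hz, channels, bits_per_sample, duration_s) of a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        bits = wf.getsampwidth() * 8
        duration = wf.getnframes() / float(rate)
    return rate, channels, bits, duration


def matches_expected(path, rate=16000, channels=1, bits=16):
    """True if the header matches the assumed 16 kHz / mono / 16-bit format."""
    r, c, b, _ = describe_wav(path)
    return (r, c, b) == (rate, channels, bits)
```

If, for example, 48 kHz samples are written back with a 16 kHz header, playback is three times slower than the original, so a failed check here is a plausible (though unconfirmed) explanation for the symptom.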

Can anyone run clean.py on their own audio file and get good cleaned output?

Can anyone run clean.py on their own audio file and get good cleaned output audio?
I did my best to debug the code and it runs successfully, but the quality of the output audio is very bad.
Now I'm wondering whether the provided weights file (segan+_generator.ckpt) is good enough.

add LICENSE

Hi there!

This is great research / code, and it would be great if there were an explicit (and hopefully permissive) license with it.

You can find good info on pros/cons of licenses here: https://choosealicense.com/

or more in-depth here: https://www.cio.com/article/2382115/how-to-choose-the-best-license-for-your-open-source-software-project.html

Thanks again for releasing this into the world:)

-josh

segan+generation.ckpt

Thanks for your code!

GPU out of memory

Hi,

I am getting a GPU out-of-memory error after several epochs of training. It seems that GPU memory is not being freed automatically.

What might be the reason?

Thanks in advance

A single-tone noise in the enhanced speech

Hi!
Thanks a lot for the code. It is very helpful. I have a small question and it would be great if you could address it.
In the enhanced speech, there is a single-tone noise in the background. Is this some kind of bug, or is there any way to remove this single-tone frequency?

The single-tone noise can be seen in the spectrograms (at 4 kHz in the first sentence and at 7 kHz and 2 kHz in the second sentence).

The speech files are also attached here. Any suggestions would be appreciated.

p232_102.zip

DB used for training pretrained model

Excuse me, can you tell me which DB you used for training the pretrained model?
I might refer to your repository in my research paper and would like to know which DB it was trained on, and perhaps some other details of the training settings, if that's not too much to ask.
Thank you :D

can you provide the pesqmain

Hello, I want to calculate the PESQ score, but I don't have the pesqmain script. Can you provide it to me? My email is [email protected]. Thanks very much.

OutGate is missing

generator.py

class Generator1D(Model):
    ...
        if post_proc:
            self.comb_net = PostProcessingCombNet(1, 512)
        if out_gate:
            self.out_gate = OutGate(1, 1)  # <- There isn't any OutGate
        if big_out_filter:
            self.out_filter = nn.Conv1d(1, 1, 513, padding=513 // 2)
    ...

pretrain model error: 'ArgsParser' object has no attribute 'reg_loss'

Hello author, when I run segan+_clean.sh I get this error. Could you help me? Thanks.

Enhanced wav files have clearly lost some frequency bins in the speech

Hi, I am using the official training dataset and test dataset from https://datashare.ed.ac.uk/handle/10283/1942, as mentioned in the repo.

During training a warning appears throughout, but I assume that as long as new ckpt files are generated this should be fine:

Removing old ckpt ckpt_segan+/weights_EOE_D-Discriminator-164.ckpt
ERROR: ckpt is not there?
(Iter 16187) Batch 50/163 (Epoch 100) d_real:0.0036, d_fake:0.0003, g_adv:0.9946, g_l1:0.3390 l1_w: 100.00, btime: 0.9317 s, mbtime: 0.9344 s
(Iter 16237) Batch 100/163 (Epoch 100) d_real:0.0159, d_fake:0.0003, g_adv:0.9716, g_l1:0.3667 l1_w: 100.00, btime: 0.9287 s, mbtime: 0.9344 s
(Iter 16287) Batch 150/163 (Epoch 100) d_real:0.0014, d_fake:0.0004, g_adv:1.0072, g_l1:0.4364 l1_w: 100.00, btime: 0.9299 s, mbtime: 0.9344 s
(Iter 16300) Batch 163/163 (Epoch 100) d_real:0.0027, d_fake:0.0003, g_adv:1.0105, g_l1:0.4218 l1_w: 100.00, btime: 0.6067 s, mbtime: 0.9343 s
Removing old ckpt ckpt_segan+/weights_EOE_G-Generator-164.ckpt
ERROR: ckpt is not there?
Removing old ckpt ckpt_segan+/weights_EOE_D-Discriminator-164.ckpt
ERROR: ckpt is not there?

So I am using the latest trained model, after 100 epochs, but the enhanced wav file is not OK: it is denoised but even more damaged.
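The repeated "ERROR: ckpt is not there?" lines in the log above suggest that the saver tries to delete a checkpoint file that no longer exists on disk. A minimal sketch of a rotation routine that treats a missing file as a no-op is shown below. `remove_old_ckpt` and `rotate_ckpts` are hypothetical names; this is not the repository's saver code, only an illustration of why such a message can be harmless.

```python
import os


def remove_old_ckpt(path):
    """Delete an old checkpoint if it exists; return True if a file was removed."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        # Already gone (e.g. removed by an earlier rotation): nothing to do.
        return False


def rotate_ckpts(ckpt_paths, max_keep):
    """Keep the newest max_keep checkpoints; ckpt_paths is ordered oldest to newest."""
    if max_keep < 1:
        raise ValueError("max_keep must be at least 1")
    stale, kept = ckpt_paths[:-max_keep], ckpt_paths[-max_keep:]
    for path in stale:
        remove_old_ckpt(path)
    return kept
```

With this approach a stale entry in the bookkeeping list does not interrupt training, which is consistent with new checkpoints still being written in the report.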

--skip_type constant creates an error

in the generator.

generator.py", line 70, in forward
    sk_h =  skip_k * hj
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
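This kind of error typically appears when one operand lives on the CPU and the other on the GPU. One common way to avoid it for a fixed (non-learned) scaling factor is to register the constant as a module buffer, so that `.cuda()` or `.to(device)` moves it together with the rest of the model. The sketch below illustrates that pattern under the assumption that the constant skip is a single scalar; `ConstantSkip` is a hypothetical stand-in, not the class used in generator.py.

```python
import torch
import torch.nn as nn


class ConstantSkip(nn.Module):
    """Scale a skip connection by a fixed constant that follows the module's device."""

    def __init__(self, init_value=1.0):
        super().__init__()
        # A buffer is not a trainable parameter, but it is moved by .to()/.cuda()
        # and saved in the state dict alongside the weights.
        self.register_buffer("skip_k", torch.tensor([init_value]))

    def forward(self, hj):
        # skip_k and hj are on the same device as long as the whole module
        # and its input were moved together.
        return self.skip_k * hj
```

An alternative with the same effect is to create the constant inside `forward` on `hj.device`; the buffer version has the advantage that the value is stored in checkpoints.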

ValueError: cannot reshape array of size 2 into shape (1,)

I was trying to train this model, but an error occurred. Is there any way to resolve it?

Thanks for your time!

core dumped

Hello, when I run the run_segan+clean.sh script, a strange problem occurs:
Cleaning 1 wavs
./run_segan+_clean.sh: line 19: 150043 Segmentation fault (core dumped) python -u clean.py --g_pretrained_ckpt $CKPT_PATH/$G_PRETRAINED_CKPT --test_files $TEST_FILES_PATH
--cfg_file $CKPT_PATH/train.opts --synthesis_path $SAVE_PATH --soundfile
How can I solve this? Thank you very much.

Reproducible script for training GSEGAN?

Hi @santi-pdp, thank you for open-sourcing this amazing project.

I am trying to reproduce training GSEGAN [1] from scratch.
However, I could not find a reproducible script for it.
Could you point me to such a training script?

Since there are a lot of parameters and scripts, I am not sure which options I must select to reproduce it.
For example, I don't know which dataset I must select.

segan_pytorch/train_gsegan.py

Lines 137 to 159 in 6b831de

        dset = SEOnlineDataset(opts.data_root,
                               distorteds=opts.distorted_roots,
                               distorted_p=opts.distorted_p,
                               noises_dir=opts.noises_dir,
                               chunker=chunker,
                               nsamples=opts.data_samples,
                               transform=trans,
                               utt2class=opts.utt2class,
                               lab_transform=aco_transform,
                               lab_folder=opts.lab_folder)
        """
    else:
        # create Dataset(s) and Dataloader(s)
        assert opts.noisy_data_root is not None
        # a contaminated dataset is specified, use ChunkerSEDataset
        dset = RandomChunkSEDataset(opts.data_root,
                                    opts.noisy_data_root,
                                    opts.preemph,
                                    slice_size=opts.slice_size,
                                    transform=aco_transform)
        dloader = DataLoader(dset, batch_size=opts.batch_size,
                             shuffle=True, num_workers=opts.num_workers,
                             pin_memory=CUDA)

Could you share a training script for GSEGAN similar to this one?

python train.py --save_path ckpt_segan+ --batch_size 300 \
		--clean_trainset data/clean_trainset \
		--noisy_trainset data/noisy_trainset \
		--cache_dir data/cache

Best,
Woosung Choi
[1] Pascual, Santiago, Joan Serrà, and Antonio Bonafonte. "Towards generalized speech enhancement with generative adversarial networks." arXiv preprint arXiv:1904.03418 (2019). Interspeech

What does the latent vector z mean?

Why doesn't my result with your SEGAN+ generator weights sound like a human voice?

Hello, I used your generator to clean test_noisy. However, the result is so bad that it does not sound like a human voice, and the duration of each result differs from that of the test data. Could you please help me fix this?

run_segan+_clean.sh with CUDA gives an error

When I run run_segan+_clean.sh with CUDA enabled, I get the following error.

RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #2 'weight'

ahoproc_tools

Hello author, where can I find the Ahoprocessing tools (ahoproc_tools)? Thank you very much!

error while running clean.py

I tried using the pretrained model as described in the instructions, but got the following error:

line 79, in __init__
    self.reg_loss = getattr(F, opts.reg_loss)
AttributeError: 'ArgParser' object has no attribute 'reg_loss'
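The traceback indicates that the options object loaded for the pretrained model has no `reg_loss` field, which can happen when a saved options file predates an option that newer code reads. A possible workaround, sketched below, is to fill in missing fields before the model is built. The default value `"l1_loss"` (the name of a function in `torch.nn.functional`) is an assumption, not a value confirmed by the repository, and `fill_missing_opts` is a hypothetical helper.

```python
from types import SimpleNamespace

# Assumed defaults for options that newer code reads but an older options
# file may lack. The value for reg_loss is a guess and should be checked.
ASSUMED_DEFAULTS = {"reg_loss": "l1_loss"}


def fill_missing_opts(opts, defaults=None):
    """Set any attribute from defaults that opts does not already define."""
    if defaults is None:
        defaults = ASSUMED_DEFAULTS
    for name, value in defaults.items():
        if not hasattr(opts, name):
            setattr(opts, name, value)
    return opts


# Stand-in for options loaded from an older train.opts file.
opts = fill_missing_opts(SimpleNamespace(batch_size=100))
print(opts.reg_loss)
```

Values already present in the loaded options are left untouched, so the helper only affects fields that would otherwise raise `AttributeError`.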

Inference Speed

Hey!
Amazing work, I'm going to test it.

I have one question before testing: how fast is enhancement inference, say, relative to the length of the input?
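Speed relative to input length is usually reported as a real-time factor: processing time divided by audio duration, where values below 1 mean faster than real time. The sketch below shows one way to measure it. `enhance_fn` is a hypothetical placeholder for whatever callable performs the enhancement; no timing figures for SEGAN are implied.

```python
import time


def real_time_factor(enhance_fn, samples, sample_rate):
    """Return processing_time / audio_duration for one call of enhance_fn(samples)."""
    if len(samples) == 0 or sample_rate <= 0:
        raise ValueError("need a non-empty signal and a positive sample rate")
    start = time.perf_counter()
    enhance_fn(samples)
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / float(sample_rate)
    return elapsed / audio_seconds
```

When measuring on a GPU, the device should be synchronized inside `enhance_fn` before it returns, otherwise the elapsed time may only cover kernel launches.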

what does this line do? can you please explain?

segan_pytorch/segan/datasets/se_dataset.py

Line 563 in 0522387

returns = ['N/A', torch.FloatTensor(c_slice).squeeze(-1),

ADD LICENSE

Hi @santi-pdp

First of all, congrats on putting out such amazing work!

I have been experimenting with the segan tensorflow repository, and now I am looking to do some testing on this one.

The other repo has a license, found here, so I imagine that this repo, being related work, will have the same license and that you simply forgot to add it.

Anyway, could you please add a license to this repository?
This way we will know how we can use it! (As you know, if no license is attached to the code, the only thing I can do is view the code, not modify it.)

Thanks a lot for your time and effort.

Looking forward to hearing from you soon,

Miguel.

P.S. Greetings from Santander!

Cannot use TensorBoard

tensorboard\data_compat.py", line 74, in _migrate_histogram_value
    buckets = np.array([bucket_lefts, bucket_rights, bucket_counts], dtype=np.float32).transpose()
ValueError: setting an array element with a sequence.

Hey, I've tried hard to solve this problem but haven't managed to.
Python 3.6.6
Torch 0.4.1
Numpy 1.14.3

I can't get pretrained models

When I click "this link", I can't download the pretrained models. Can you help me? Thanks.

Why is the stride of the convolution layers in the generator 4?

In your paper LINK, I read on page 3 that G is composed of ~ strides of N = 2, but the strides are 4 in your code. Is this for performance reasons?
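The trade-off behind the question can be seen with a little arithmetic: a larger stride reaches a similar amount of temporal compression with far fewer layers. The sketch below assumes a 16384-sample input window and compares 11 layers of stride 2 with 5 layers of stride 4. The 5-layer depth for the stride-4 encoder is an assumption made for illustration, not a figure confirmed in this thread, and `encoder_lengths` is a hypothetical helper.

```python
def encoder_lengths(input_len, stride, n_layers):
    """Temporal length after each strided conv layer, assuming exact division by stride."""
    lengths = [input_len]
    for _ in range(n_layers):
        if lengths[-1] % stride:
            raise ValueError("length %d is not divisible by stride %d" % (lengths[-1], stride))
        lengths.append(lengths[-1] // stride)
    return lengths


# 11 layers of stride 2: 16384 / 2**11
print(encoder_lengths(16384, 2, 11)[-1])  # 8
# 5 layers of stride 4 (assumed depth): 16384 / 4**5
print(encoder_lengths(16384, 4, 5)[-1])   # 16
```

Fewer layers generally mean fewer sequential operations per forward pass, which may be one reason for the change, though the thread does not confirm the motivation.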

Is the pretrained model valid only for English speech?

If the speech is in a language other than English, must we train the model again? Thanks.

Enhanced output very strange

I have been training with 100 clean audio files and 100 noisy audio files, each about three minutes long. For whatever reason, when I use this model to enhance audio, the output is terrible: it is full of crackling and clicks. What am I doing wrong? All the audio files are 16-bit, 16 kHz mono. Please respond as soon as possible.

WSEGAN weights

Hi!

I would be very interested in using a pre-trained WSEGAN model. However, from what I can see, you have not released pre-trained weights for this version of your model, only for SEGAN. Do you plan to release them at any point?

Thanks a lot.

Size of the dataset is small - 824 files

Can you suggest the best training setup, e.g. code modifications, number of epochs, batch size, etc.?

Or should I increase the size of the training set?

Thank you
