santi-pdp / segan_pytorch
Speech Enhancement Generative Adversarial Network in PyTorch
License: MIT License
No module named 'ahoproc_tools': I cannot find it. What can I do?
Thanks!
Where is the pesqmain source code?
How do I run the vanilla SEGAN version (like the one in the TensorFlow repo)?
Or do I need to make it deeper manually, change the strides from 4 to 2, etc.? In my opinion, after listening to the outputs you shared, SEGAN+ has tonal artifacts.
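Going from stride 4 back to stride 2 means roughly doubling the number of downsampling layers to reach the same overall compression. A quick arithmetic sketch, assuming every strided layer divides the sequence length exactly by its stride; `downsampled_length` is a hypothetical helper, not part of the repo, and 16384 is the window length used in the original SEGAN paper.

```python
def downsampled_length(n_samples: int, stride: int, n_layers: int) -> int:
    """Length after n_layers strided convolutions.

    Assumes 'same'-style padding, so each layer divides the length by
    exactly `stride`; raises if the division is not exact.
    """
    length = n_samples
    for _ in range(n_layers):
        if length % stride != 0:
            raise ValueError("length is not divisible by the stride at this depth")
        length //= stride
    return length


# 16384-sample window (about 1 s at 16 kHz)
print(downsampled_length(16384, 2, 11))  # 8
print(downsampled_length(16384, 4, 5))   # 16
print(downsampled_length(16384, 2, 10))  # 16, same as 5 layers of stride 4
```

In other words, ten stride-2 layers compress a window as much as five stride-4 layers; the stride-2 encoder is deeper, with a finer progression of intermediate resolutions.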
After running run_segan+_clean.sh I obtained enhanced .wav files, but the audio sounds slowed down several times over. Which test dataset should I use?
Has anyone run clean.py on their own audio files and gotten good cleaned output?
I tried my best to debug the code and got it running, but the quality of the output audio is very poor.
Now I'm wondering whether the provided weights file (segan+_generator.ckpt) is good enough.
Hi there!
This is great research / code, and it would be great if there were an explicit (and hopefully permissive) license with it.
You can find good info on pros/cons of licenses here: https://choosealicense.com/
or more in-depth here: https://www.cio.com/article/2382115/how-to-choose-the-best-license-for-your-open-source-software-project.html
Thanks again for releasing this into the world:)
-josh
Thanks for your code!
Hi,
I am getting a GPU out-of-memory error after several epochs of training. It seems that GPU memory is not being freed automatically.
What might be the reason?
Thanks in advance
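A frequent cause of memory that grows epoch after epoch in PyTorch, described in the PyTorch FAQ, is keeping loss tensors (rather than plain numbers) around for logging, since each one carries its autograd history. Whether this is what happens in this repo's training loop is not confirmed; the sketch below only shows the safe logging pattern on a hypothetical toy model and random data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical toy model and random data, used only to show the logging pattern.
torch.manual_seed(0)
model = nn.Linear(8, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

running_loss = 0.0   # plain Python float
history = []         # list of Python floats

for step in range(5):
    x, y = torch.randn(4, 8), torch.randn(4, 1)
    loss = F.l1_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    # `running_loss += loss` would chain every iteration's autograd history
    # together, so that history would never be freed.
    # .item() returns a Python number with no graph attached.
    running_loss += loss.item()
    history.append(loss.item())

print(f"mean loss over {len(history)} steps: {running_loss / len(history):.4f}")
```

The same applies to anything kept for TensorBoard or printed summaries: convert it with `.item()` (or `.detach().cpu()` for whole tensors) before storing it.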
Hi!
Thanks a lot for the code; it is very helpful. I have a small question, and it would be great if you could address it.
In the enhanced speech, there is a single-tone noise in the background. Is this some kind of bug, or is there a way to remove this single-frequency tone?
The tone can be seen in the spectrograms (at 4 kHz in the first sentence, and at 7 kHz and 2 kHz in the second).
The speech files are also attached here. Any suggestions would be appreciated.
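One way to confirm that the artifact is a stationary tone, and to read off its frequency, is to average the short-time magnitude spectrum over the whole file and look for a persistent peak. A numpy-only sketch; `dominant_tone_hz` is a hypothetical helper, not part of the repo, and it only detects the tone rather than removing it.

```python
import numpy as np


def dominant_tone_hz(signal: np.ndarray, sr: int, n_fft: int = 1024) -> float:
    """Return the frequency (Hz) whose magnitude, averaged over all frames, is largest.

    A stationary tone appears in every frame, so averaging the magnitude
    spectrum over time makes it stand out from speech and noise.
    """
    hop = n_fft // 2
    window = np.hanning(n_fft)
    frames = np.stack([signal[i:i + n_fft] * window
                       for i in range(0, len(signal) - n_fft + 1, hop)])
    mean_mag = np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    return float(freqs[1 + np.argmax(mean_mag[1:])])  # skip the DC bin


# Synthetic check: 1 s of noise plus a 4 kHz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
x = 0.1 * rng.standard_normal(sr) + 0.5 * np.sin(2 * np.pi * 4000 * t)
print(dominant_tone_hz(x, sr))  # ~4000.0
```

Once the frequency is known, a narrow notch filter (for example one designed with `scipy.signal.iirnotch`) is a common way to suppress it, although that treats the symptom rather than whatever in the model produces the tone.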
Excuse me. Can you tell me which database you used to train the pretrained model?
I may cite your repository in my research paper and would like to know which database it was trained on, and perhaps some other details of the training settings, if that's not too much to ask.
Thank you :D
Hello, I want to calculate the PESQ score, but I don't have the pesqmain script. Could you provide it to me? My email is [email protected]. Thanks very much.
generator.py
class Generator1D(Model):
    ...
        if post_proc:
            self.comb_net = PostProcessingCombNet(1, 512)
        if out_gate:
            self.out_gate = OutGate(1, 1)  # <- There isn't any OutGate
        if big_out_filter:
            self.out_filter = nn.Conv1d(1, 1, 513, padding=513 // 2)
    ...
Hi, I am using the official training dataset and test dataset from https://datashare.ed.ac.uk/handle/10283/1942, as mentioned in the repo.
During training, a warning appears throughout, but I guess that as long as new ckpt files are generated, this should be fine:
Removing old ckpt ckpt_segan+/weights_EOE_D-Discriminator-164.ckpt
ERROR: ckpt is not there?
(Iter 16187) Batch 50/163 (Epoch 100) d_real:0.0036, d_fake:0.0003, g_adv:0.9946, g_l1:0.3390 l1_w: 100.00, btime: 0.9317 s, mbtime: 0.9344 s
(Iter 16237) Batch 100/163 (Epoch 100) d_real:0.0159, d_fake:0.0003, g_adv:0.9716, g_l1:0.3667 l1_w: 100.00, btime: 0.9287 s, mbtime: 0.9344 s
(Iter 16287) Batch 150/163 (Epoch 100) d_real:0.0014, d_fake:0.0004, g_adv:1.0072, g_l1:0.4364 l1_w: 100.00, btime: 0.9299 s, mbtime: 0.9344 s
(Iter 16300) Batch 163/163 (Epoch 100) d_real:0.0027, d_fake:0.0003, g_adv:1.0105, g_l1:0.4218 l1_w: 100.00, btime: 0.6067 s, mbtime: 0.9343 s
Removing old ckpt ckpt_segan+/weights_EOE_G-Generator-164.ckpt
ERROR: ckpt is not there?
Removing old ckpt ckpt_segan+/weights_EOE_D-Discriminator-164.ckpt
ERROR: ckpt is not there?
So I am using the latest trained models, after 100 epochs, but the enhanced wav files are not good: they are denoised but even more damaged.
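The "Removing old ckpt ... ERROR: ckpt is not there?" lines suggest the training script tries to delete an older checkpoint file that is not on disk; by itself that should not affect the newly saved weights. As an illustration only, here is a hypothetical keep-last-N rotation that checks for the file before deleting it; `CheckpointRotator` and the file names are not from the repo.

```python
import os
import tempfile
from collections import deque


class CheckpointRotator:
    """Hypothetical keep-last-N checkpoint rotation (not the repo's implementation)."""

    def __init__(self, max_keep: int = 2):
        self.paths = deque()
        self.max_keep = max_keep

    def register(self, path: str) -> list:
        """Record a newly written checkpoint; return the paths that were deleted."""
        self.paths.append(path)
        removed = []
        while len(self.paths) > self.max_keep:
            old = self.paths.popleft()
            if os.path.exists(old):
                os.remove(old)
                removed.append(old)
            # A missing file is harmless: there is nothing to delete.
        return removed


# Usage: write three dummy checkpoints into a temporary directory.
ckpt_dir = tempfile.mkdtemp()
rot = CheckpointRotator(max_keep=2)
paths = []
for i in range(3):
    p = os.path.join(ckpt_dir, f"weights-{i}.ckpt")
    open(p, "w").close()
    paths.append(p)
    rot.register(p)

os.remove(paths[1])  # simulate a checkpoint that disappeared
p3 = os.path.join(ckpt_dir, "weights-3.ckpt")
open(p3, "w").close()
removed_last = rot.register(p3)  # tries to prune weights-1, which is already gone
```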
I get the following error in the generator:
generator.py", line 70, in forward
sk_h = skip_k * hj
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
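The error means `skip_k` and `hj` are on different devices. One plausible cause, not confirmed against the repo's code, is that `skip_k` is stored as a plain tensor attribute instead of a registered parameter or buffer, so moving the model to the GPU leaves it on the CPU. A minimal sketch with a hypothetical `SkipScale` module:

```python
import torch
import torch.nn as nn


class SkipScale(nn.Module):
    """Hypothetical module with a learnable per-channel skip scale."""

    def __init__(self, channels: int):
        super().__init__()
        # Registered as an nn.Parameter, so model.cuda() / model.to(device) moves it.
        # A plain attribute (self.skip_k = torch.ones(...)) would stay on the CPU.
        # For a non-learnable tensor, self.register_buffer("skip_k", ...) also works.
        self.skip_k = nn.Parameter(torch.ones(1, channels, 1))

    def forward(self, hj: torch.Tensor) -> torch.Tensor:
        return self.skip_k * hj


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
m = SkipScale(16).to(device)
hj = torch.randn(2, 16, 100, device=device)
out = m(hj)
print(out.shape, m.skip_k.device == hj.device)
```

Registered parameters and buffers are also included in `state_dict()`, so they are saved and restored with the rest of the checkpoint.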
Hello, when I run the run_segan+_clean.sh script, a strange problem occurs:
Cleaning 1 wavs
./run_segan+_clean.sh: line 19: 150043 Segmentation fault (core dumped) python -u clean.py --g_pretrained_ckpt $CKPT_PATH/$G_PRETRAINED_CKPT --test_files $TEST_FILES_PATH --cfg_file $CKPT_PATH/train.opts --synthesis_path $SAVE_PATH --soundfile
How can I solve this? Thank you very much.
Hi @santi-pdp, thank you for open-sourcing this amazing project.
I am trying to reproduce training GSEGAN [1] from scratch.
However, I could not find a script that reproduces it.
Could you share such a training script?
Since there are many parameters and scripts, I am not sure which options I should select to reproduce it.
For example, I don't know which dataset I should use.
Lines 137 to 159 in 6b831de
Could you share a training script similar to this one, but for GSEGAN?
python train.py --save_path ckpt_segan+ --batch_size 300 \
--clean_trainset data/clean_trainset \
--noisy_trainset data/noisy_trainset \
--cache_dir data/cache
Best,
Woosung Choi
[1] Pascual, Santiago, Joan Serrà, and Antonio Bonafonte. "Towards generalized speech enhancement with generative adversarial networks." Proc. Interspeech 2019. arXiv preprint arXiv:1904.03418.
Hello, I used your generator to clean test_noisy. However, the result is so bad that it doesn't sound like a human voice, and the duration of each output differs from that of the corresponding test file. Could you please help me fix it?
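If the output duration differs from the input, one thing to check is how the signal is padded before being split into fixed-size windows and whether that padding is trimmed afterwards. A sketch, assuming non-overlapping windows of 16384 samples (the window size used in the SEGAN paper) and a hypothetical `enhance_fn` standing in for the generator; this is not the repo's clean.py logic.

```python
import numpy as np


def enhance_in_chunks(wav: np.ndarray, enhance_fn, chunk: int = 16384) -> np.ndarray:
    """Zero-pad to a multiple of `chunk`, process each chunk, then trim back.

    Trimming to len(wav) keeps the output duration equal to the input duration.
    """
    n = len(wav)
    if n == 0:
        return wav.copy()
    pad = (-n) % chunk
    padded = np.pad(wav, (0, pad))
    out = np.concatenate([enhance_fn(padded[i:i + chunk])
                          for i in range(0, len(padded), chunk)])
    return out[:n]


# Usage with an identity "enhancer" standing in for the generator.
x = np.random.default_rng(0).standard_normal(40000).astype(np.float32)
y = enhance_in_chunks(x, lambda c: c)
print(len(y) == len(x), np.allclose(x, y))  # True True
```

Non-overlapping windows are the simplest choice but can leave discontinuities at chunk boundaries; overlapping windows with overlap-add are a common refinement.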
When I run run_segan+_clean.sh with CUDA enabled, I get the following error:
RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #2 'weight'
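This message means a layer's weights and its input are on different devices (here, weights on the CPU and input on the GPU). The usual remedy is to move the model and every input tensor to the same device, and to pass `map_location` to `torch.load` when restoring weights. A minimal sketch, assuming the checkpoint is a plain `state_dict` (the repo's own loading code may wrap it differently); the file name and the stand-in model are hypothetical.

```python
import os
import tempfile

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the generator; any nn.Module behaves the same way.
model = nn.Conv1d(1, 16, kernel_size=31, padding=15)

# Save and reload a plain state_dict; map_location puts every tensor on `device`
# regardless of where it was saved.
path = os.path.join(tempfile.mkdtemp(), "generator.ckpt")
torch.save(model.state_dict(), path)
model.load_state_dict(torch.load(path, map_location=device))

model.to(device)   # weights and biases now live on `device`
model.eval()

x = torch.randn(1, 1, 16384, device=device)   # input on the same device
with torch.no_grad():
    y = model(x)
print(y.shape)   # torch.Size([1, 16, 16384])
```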
Hello, where can I find the Ahoprocessing tools (ahoproc_tools)? Thank you very much!
I tried using the pretrained model as described in the instructions, but got the following error:
line 79, in __init__
self.reg_loss = getattr(F, opts.reg_loss)
AttributeError: 'ArgParser' object has no attribute 'reg_loss'
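The error says the options object built from the saved config has no `reg_loss` field, which is what you would expect if the `train.opts` shipped with the pretrained model predates that option. A workaround sketch, assuming the config file is JSON and that `reg_loss` names a function in `torch.nn.functional` (as the `getattr(F, opts.reg_loss)` line suggests): merge defaults underneath the loaded values. `load_opts`, `DEFAULTS`, the stand-in `ArgParser`, and the `"l1_loss"` default are all hypothetical; check train.py for the real default.

```python
import json


class ArgParser:
    """Hypothetical stand-in: turns a dict of saved options into attributes."""

    def __init__(self, args: dict):
        for k, v in args.items():
            setattr(self, k, v)


# Assumed defaults for options that an older train.opts may lack.
DEFAULTS = {"reg_loss": "l1_loss"}


def load_opts(text: str, defaults: dict = DEFAULTS) -> ArgParser:
    saved = json.loads(text)
    merged = {**defaults, **saved}  # saved values win; defaults only fill gaps
    return ArgParser(merged)


old_cfg = '{"batch_size": 300}'  # hypothetical config without reg_loss
opts = load_opts(old_cfg)
print(opts.reg_loss, opts.batch_size)  # l1_loss 300
```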
Hey!
Amazing work, I'm going to test it.
I have one question before testing: how fast is enhancement inference,
say, as a function of input length?
segan_pytorch/segan/datasets/se_dataset.py
Line 563 in 0522387
Hi @santi-pdp
First of all, congrats for putting out such an amazing work!
I have been experimenting with the SEGAN TensorFlow repository, and now I am looking to do some testing with this one.
The other repo has a license, found here, so I imagine that, this being related work, it will have the same license and you simply forgot to add it.
Anyway, could you please add a license to this repository?
This way we will know how we can use it! (As you know, if no license is attached to the code, all I can do is view it, not modify it.)
Thanks a lot for your time and effort.
Looking forward to hearing from you soon,
Miguel.
PS: Greetings from Santander!
tensorboard\data_compat.py", line 74, in _migrate_histogram_value
buckets = np.array([bucket_lefts, bucket_rights, bucket_counts], dtype=np.float32).transpose()
ValueError: setting an array element with a sequence.
Hey, I've tried hard to solve this problem, but I couldn't.
Python 3.6.6
Torch 0.4.1
Numpy 1.14.3
When I click "this link", I can't download the pretrained models. Can you help me? Thanks.
In your paper (LINK), I read on page 3 that G is composed of strided convolutions with stride N = 2, but the strides are 4 in your code. Is that for performance reasons?
If the speech is in a language other than English, do we need to retrain the model? Thanks.
I have been training with 100 clean and 100 noisy audio files, each about three minutes long. For some reason, when I use this model to enhance audio, the output is terrible: it is full of crackling and clicks. What am I doing wrong? All the files are 16-bit, 16 kHz mono. Please respond as soon as possible.
Hi!
I would be very interested in using a pretrained WSEGAN. However, from what I can see, you have released pretrained weights only for SEGAN, not for this version of your model. Do you plan to release them at some point?
Thanks a lot.
I tried to access http://veu.talp.cat/seganp/, but the site is not responding.
Hello,
I have a clean training set of .wav files with the following characteristics:
bitrate: 128 kb/s, Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 128 kb/s
and a noisy training set of .wav files (generated by encoding the clean wavs to .gsm and then back to .wav) with:
encoder : Lavf58.20.100, bitrate: 128 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 8000 Hz, 1 channels, s16, 128 kb/s
The dataset is small: 824 files.
Can you suggest the best training setup, e.g. code modifications, number of epochs, batch size, etc.?
Or should I increase the training set?
Thank you
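Per the metadata above, the clean files are 16000 Hz but the noisy ones are 8000 Hz. Paired training presumably assumes both sides of each pair share one sample rate (and the pretrained model works at 16 kHz), so the noisy files would likely need to be resampled to 16 kHz before training. A stdlib-only sketch that flags mismatched pairs; `wav_sample_rate`, `mismatched_pairs`, and `write_silence` are hypothetical helpers.

```python
import os
import tempfile
import wave


def wav_sample_rate(path: str) -> int:
    with wave.open(path, "rb") as w:
        return w.getframerate()


def mismatched_pairs(pairs, expected_sr: int = 16000):
    """Return (clean, noisy, clean_sr, noisy_sr) for pairs not both at expected_sr."""
    bad = []
    for clean, noisy in pairs:
        c_sr, n_sr = wav_sample_rate(clean), wav_sample_rate(noisy)
        if c_sr != expected_sr or n_sr != expected_sr:
            bad.append((clean, noisy, c_sr, n_sr))
    return bad


def write_silence(path: str, sr: int, seconds: float = 0.1) -> None:
    """Write a short mono 16-bit PCM file of silence (for the demo only)."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(sr)
        w.writeframes(b"\x00\x00" * int(sr * seconds))


d = tempfile.mkdtemp()
clean, noisy = os.path.join(d, "clean.wav"), os.path.join(d, "noisy.wav")
write_silence(clean, 16000)
write_silence(noisy, 8000)
bad = mismatched_pairs([(clean, noisy)])
print(bad[0][2:])  # (16000, 8000)
```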