segan_pytorch's People

Contributors

chemingway, santi-pdp

segan_pytorch's Issues

please help me

I get "No module named 'ahoproc_tools'" and I cannot find the module anywhere. What can I do?
thanks!!!
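
A quick way to confirm whether the module is visible to the interpreter is sketched below. It uses only the standard library, and the helper name `missing_modules` is hypothetical. ahoproc_tools is not bundled with segan_pytorch; it appears to be distributed separately by the same author, which is an assumption worth verifying against the repository's README.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_modules(["ahoproc_tools"])
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All modules found.")
```

If the module is reported as not importable, it has to be installed into the same Python environment that runs the SEGAN scripts.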

Running vanilla SEGAN

How to run the vanilla SEGAN version (like the one in TensorFlow repo)?
Or do I need to manually make it deeper, change the strides from 4 to 2, etc.? In my opinion, after listening to the outputs you shared, SEGAN+ has tonal artifacts.

The enhanced .wav file sounds very weird!

After running run_segan+_clean.sh I obtained enhanced .wav files, but they sound as if the audio had been slowed down many times. Which test dataset should I use?
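
One common cause of slowed-down output, offered here as an assumption and not confirmed in this thread, is a sample-rate mismatch: samples recorded at one rate and written or played back at a lower rate sound slow. The sketch below reads the WAV header with Python's standard `wave` module and compares it with an expected rate. The 16000 Hz default and the helper name `check_sample_rate` are illustrative assumptions.

```python
import wave


def check_sample_rate(path, expected_hz=16000):
    """Return (actual_hz, matches) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        actual_hz = wav.getframerate()
    return actual_hz, actual_hz == expected_hz
```

Running this on both the input test files and the enhanced outputs shows whether the two disagree with the rate the model was trained on.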

GPU out of memory

Hi,

I am getting a GPU out-of-memory error after several epochs of training. It seems that GPU memory is not being freed automatically.

What may be the reason?

Thanks in advance

Single-tone noise in the enhanced speech

Hi!
Thanks a lot for the code; it is very helpful. I have a small question and it would be great if you could address it.
In the enhanced speech, there is single-tone noise in the background. Is this some kind of bug, or is there a way to remove this single-tone frequency?

The single-tone noise can be seen in the spectrograms (at 4 kHz in the first sentence, and at 7 kHz and 2 kHz in the second sentence).

[Image: Untitled drawing]

The speech files are also attached here. Please advise.

p232_102.zip

DB used for training pretrained model

Excuse me, can you tell me which DB you used to train the pretrained model?
I might refer to your repository in my research paper and would like to know which DB it was trained on, and perhaps some other details of the training settings, if that's not too much to ask.
Thank you :D

OutGate is missing

generator.py

    class Generator1D(Model):
        ...
            if post_proc:
                self.comb_net = PostProcessingCombNet(1, 512)
            if out_gate:
                self.out_gate = OutGate(1, 1)  # <- There isn't any OutGate
            if big_out_filter:
                self.out_filter = nn.Conv1d(1, 1, 513, padding=513 // 2)
        ...

Enhanced wav files have clearly lost some frequency bins in the speech

Hi, I am using the official training dataset and test dataset from https://datashare.ed.ac.uk/handle/10283/1942, as mentioned in the repo.

During training there is a recurring warning, but I assume that as long as new ckpt files are generated, this should be fine:

Removing old ckpt ckpt_segan+/weights_EOE_D-Discriminator-164.ckpt
ERROR: ckpt is not there?
(Iter 16187) Batch 50/163 (Epoch 100) d_real:0.0036, d_fake:0.0003, g_adv:0.9946, g_l1:0.3390 l1_w: 100.00, btime: 0.9317 s, mbtime: 0.9344 s
(Iter 16237) Batch 100/163 (Epoch 100) d_real:0.0159, d_fake:0.0003, g_adv:0.9716, g_l1:0.3667 l1_w: 100.00, btime: 0.9287 s, mbtime: 0.9344 s
(Iter 16287) Batch 150/163 (Epoch 100) d_real:0.0014, d_fake:0.0004, g_adv:1.0072, g_l1:0.4364 l1_w: 100.00, btime: 0.9299 s, mbtime: 0.9344 s
(Iter 16300) Batch 163/163 (Epoch 100) d_real:0.0027, d_fake:0.0003, g_adv:1.0105, g_l1:0.4218 l1_w: 100.00, btime: 0.6067 s, mbtime: 0.9343 s
Removing old ckpt ckpt_segan+/weights_EOE_G-Generator-164.ckpt
ERROR: ckpt is not there?
Removing old ckpt ckpt_segan+/weights_EOE_D-Discriminator-164.ckpt
ERROR: ckpt is not there?

So I am using the latest trained models, after 100 epochs, but the enhanced wav file is not OK: it is denoised, but the speech is damaged even more.

--skip_type constant creates an error

in the generator.

generator.py", line 70, in forward
    sk_h =  skip_k * hj
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor
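
This kind of error typically appears when a tensor created inside a module is stored as a plain attribute, so that moving the model to the GPU does not move that tensor along with the weights. The sketch below illustrates the idea on CPU, using a dtype conversion as a stand-in for the device move. `ConstantSkip` is a hypothetical module, not the repository's code, and whether this is the actual cause in generator.py is an assumption.

```python
import torch
import torch.nn as nn


class ConstantSkip(nn.Module):
    """Hypothetical skip connection scaled by a constant factor."""

    def __init__(self, registered):
        super().__init__()
        skip_k = torch.ones(1)
        if registered:
            # Buffers follow the module through .to(), .cuda(), .double(), ...
            self.register_buffer("skip_k", skip_k)
        else:
            # Plain tensor attributes are left behind when the module is moved.
            self.skip_k = skip_k

    def forward(self, hj):
        return self.skip_k * hj


plain = ConstantSkip(registered=False).double()
fixed = ConstantSkip(registered=True).double()
print(plain.skip_k.dtype)  # torch.float32 (not converted)
print(fixed.skip_k.dtype)  # torch.float64 (converted with the module)
```

An alternative that leaves the constructor untouched is to move the constant at use time, for example `self.skip_k.to(hj.device) * hj` inside `forward`.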

core dumped

Hello, when I run the run_segan+_clean.sh script, a strange problem occurs:
Cleaning 1 wavs
./run_segan+_clean.sh: line 19: 150043 Segmentation fault (core dumped) python -u clean.py --g_pretrained_ckpt $CKPT_PATH/$G_PRETRAINED_CKPT --test_files $TEST_FILES_PATH --cfg_file $CKPT_PATH/train.opts --synthesis_path $SAVE_PATH --soundfile
How can I solve this? Thank you very much.

Reproducible script for training GSEGAN?

Hi @santi-pdp, thank you for open-sourcing this amazing project.

I am trying to reproduce training GSEGAN [1] from scratch.
However, I could not find a script that reproduces it.
Could you share such a training script?

Since there are a lot of parameters and scripts, I am not sure which options I must select to reproduce it.
For example, I don't know which dataset I must select.

        dset = SEOnlineDataset(opts.data_root,
                               distorteds=opts.distorted_roots,
                               distorted_p=opts.distorted_p,
                               noises_dir=opts.noises_dir,
                               chunker=chunker,
                               nsamples=opts.data_samples,
                               transform=trans,
                               utt2class=opts.utt2class,
                               lab_transform=aco_transform,
                               lab_folder=opts.lab_folder)
        """
    else:
        # create Dataset(s) and Dataloader(s)
        assert opts.noisy_data_root is not None
        # a contaminated dataset is specified, use ChunkerSEDataset
        dset = RandomChunkSEDataset(opts.data_root,
                                    opts.noisy_data_root,
                                    opts.preemph,
                                    slice_size=opts.slice_size,
                                    transform=aco_transform)
        dloader = DataLoader(dset, batch_size=opts.batch_size,
                             shuffle=True, num_workers=opts.num_workers,
                             pin_memory=CUDA)

Could you let me know such a train script, similar to this one (for GSEGAN)?

python train.py --save_path ckpt_segan+ --batch_size 300 \
		--clean_trainset data/clean_trainset \
		--noisy_trainset data/noisy_trainset \
		--cache_dir data/cache

Best,
Woosung Choi
[1] Pascual, Santiago, Joan Serrà, and Antonio Bonafonte. "Towards generalized speech enhancement with generative adversarial networks." arXiv preprint arXiv:1904.03418 (2019). Interspeech

run_segan+_clean.sh with CUDA gives an error

When I run run_segan+_clean.sh with CUDA enabled, I get the following error:

RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #2 'weight'

ahoproc_tools

Hello author, where can I find the Ahoprocessing tools (ahoproc_tools)? Thank you very much!

error while running clean.py

I tried using the pretrained model as given in the instructions, but got the following error:

line 79, in __init__
    self.reg_loss = getattr(F, opts.reg_loss)
AttributeError: 'ArgParser' object has no attribute 'reg_loss'
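
The traceback suggests that the options object used by clean.py has no `reg_loss` entry, which can happen when a saved configuration predates that option. A minimal workaround sketch, assuming options are held as plain attributes on a simple object, is to fill in any missing attributes before the model is constructed. The `Opts` class, the helper `fill_missing` and the default value `"l1_loss"` are hypothetical stand-ins, not values confirmed by the repository.

```python
class Opts:
    """Hypothetical stand-in for an options object built from a dict."""

    def __init__(self, values):
        for key, value in values.items():
            setattr(self, key, value)


def fill_missing(opts, defaults):
    """Set each default on `opts` only if the attribute is absent."""
    for key, value in defaults.items():
        if not hasattr(opts, key):
            setattr(opts, key, value)
    return opts


# Options as they might be loaded from an older config file.
opts = Opts({"batch_size": 100})
fill_missing(opts, {"reg_loss": "l1_loss", "batch_size": 1})
print(opts.reg_loss, opts.batch_size)  # l1_loss 100
```

Existing values are never overwritten, so only options that the saved configuration lacks receive the fallback.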

Inference Speed

Hey!
Amazing work, gonna test it.

I have one question before testing: how fast is enhancement inference, say, relative to the input length?
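
A common way to express speed relative to input length is the real-time factor (RTF): processing time divided by audio duration, so values below 1 mean faster than real time. The sketch below measures it for any callable. `enhance` is a hypothetical placeholder for the model's inference call, and no timing figures for SEGAN are implied.

```python
import time


def real_time_factor(enhance, samples, sample_rate):
    """Return processing_time / audio_duration for one call of `enhance`."""
    audio_seconds = len(samples) / sample_rate
    start = time.perf_counter()
    enhance(samples)
    elapsed = time.perf_counter() - start
    return elapsed / audio_seconds


if __name__ == "__main__":
    # Placeholder "model" that returns its input unchanged.
    one_second = [0.0] * 16000
    rtf = real_time_factor(lambda x: x, one_second, 16000)
    print(f"RTF = {rtf:.6f}")
```

Measuring several input lengths and averaging over repeated runs gives a more reliable picture than a single call.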

ADD LICENSE

Hi @santi-pdp

First of all, congratulations on putting out such amazing work!

I have been experimenting with the segan tensorflow repository, and now I am looking to do some testing on this one.

The other repo has a license, found here, so I imagine that this repository, being related work, is meant to carry the same license and that it was simply left out.

Anyway, could you please add a license to this repository?
That way we will know how we can use it! (As you know, if no license is attached to the code, all I can do is view the code, not modify anything.)

Thanks a lot for your time and effort.

Looking forward to hearing from you soon,

Miguel.

P.S.: Greetings from Santander!

Cannot use tensorboard

tensorboard\data_compat.py", line 74, in _migrate_histogram_value
    buckets = np.array([bucket_lefts, bucket_rights, bucket_counts], dtype=np.float32).transpose()
ValueError: setting an array element with a sequence.

Hey, I've tried hard to solve this problem but haven't managed to.
Python 3.6.6
Torch 0.4.1
Numpy 1.14.3

Enhanced output very strange

I have been training with 100 clean and 100 noisy audio files, each about three minutes long. For some reason, when I use this model to enhance audio, the output is terrible: it is full of crackling and clicks. What am I doing wrong? All the audio files are 16-bit, 16 kHz, mono. Please respond as soon as possible.

WSEGAN weights

Hi!

I would be very interested in using a pre-trained WSEGAN model. However, from what I can see, you have not released pre-trained weights for this version of your model, only for SEGAN. Do you plan to release them at any point?

Thanks a lot.

the best training experience

Hello,
I have a clean training set of .wav files with the following characteristics:
bitrate: 128 kb/s, Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 16000 Hz, 1 channels, s16, 128 kb/s

and a noisy training set of .wav files (generated by encoding the clean wavs to .gsm and then back to .wav) with:
encoder : Lavf58.20.100, bitrate: 128 kb/s
Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 8000 Hz, 1 channels, s16, 128 kb/s

Size of the dataset is small - 824 files

Can you suggest the best training setup, e.g. code modifications, number of epochs, batch size, etc.?

Or should I increase the size of the training set?

Thank you
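
The stream information above shows the clean files at 16000 Hz and the noisy files at 8000 Hz. If training expects sample-aligned clean/noisy pairs (an assumption about the data loader, not something this thread confirms), such pairs would not line up. The sketch below lists mismatching pairs using only the standard library, assuming both directories hold files with identical names; `wav_info` and `mismatched_pairs` are hypothetical helpers.

```python
import os
import wave


def wav_info(path):
    """Return (sample_rate, n_frames) read from a WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnframes()


def mismatched_pairs(clean_dir, noisy_dir):
    """List file names whose clean and noisy versions differ in rate or length."""
    bad = []
    for name in sorted(os.listdir(clean_dir)):
        noisy_path = os.path.join(noisy_dir, name)
        if not name.lower().endswith(".wav") or not os.path.exists(noisy_path):
            continue
        if wav_info(os.path.join(clean_dir, name)) != wav_info(noisy_path):
            bad.append(name)
    return bad
```

Any file reported here would need its noisy version resampled to the clean file's rate before the two can be compared sample by sample.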
