
pit-speech-separation's Issues

Dataset structure

Hi,
Can you explain what the actual structure of the dataset should be? I'm not sure I got it.
Is it like in the MATLAB file you attached, i.e. cv/tr/tt folders, each containing the mixed/s1/s2 folders?
And which file/function should I run after the signal mixing?
Thanks!

A little question about get_padded_batch

You said:

def get_padded_batch(file_list, batch_size, input_size, output_size,
                     num_enqueuing_threads=4, num_epochs=1, shuffle=True):
    """Reads batches of SequenceExamples from TFRecords and pads them.
    Can deal with variable length SequenceExamples by padding each batch to the
    length of the longest sequence with zeros.
    Args:
        file_list: A list of paths to TFRecord files containing SequenceExamples.
        batch_size: The number of SequenceExamples to include in each batch.
        input_size: The size of each input vector. The returned batch of inputs
            will have a shape [batch_size, num_steps, input_size].
        num_enqueuing_threads: The number of threads to use for enqueuing
            SequenceExamples.
    Returns:
        inputs: A tensor of shape [batch_size, num_steps, input_size] of float32s.
        labels: A tensor of shape [batch_size, num_steps] of float32s.
        lengths: A tensor of shape [batch_size] of int32s. The lengths of each
            SequenceExample before padding.
    """

What is the meaning of lengths? It says it has shape [batch_size]; do you mean it is the same for all batches, or is it a vector whose entries are the lengths of the audio files?

I have looked at decode(), and it also says the shape is [batch_size].
So does the first sample of every batch have the same length?
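
For what it's worth, lengths is per example, not per batch: it holds one int32 per SequenceExample giving that utterance's frame count before padding, so the entries within a batch generally differ. A tiny numpy sketch of the shapes involved (values are made up for illustration, not taken from the repo):

import numpy as np

input_size = 3
# Two utterances of different lengths (num_steps x input_size).
utt_a = np.ones((5, input_size), dtype=np.float32)
utt_b = np.ones((8, input_size), dtype=np.float32)

# lengths has one entry per example in the batch: the number of frames
# each utterance had before zero-padding.
lengths = np.array([utt_a.shape[0], utt_b.shape[0]], dtype=np.int32)   # -> [5, 8]

# Every utterance is padded with zeros up to the longest one in the batch.
max_len = lengths.max()
inputs = np.zeros((len(lengths), max_len, input_size), dtype=np.float32)
inputs[0, :lengths[0]] = utt_a
inputs[1, :lengths[1]] = utt_b
# inputs.shape == (batch_size, num_steps, input_size) == (2, 8, 3)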

wv version of wsj0 dataset

Hi, I have the wv version of WSJ0. How can I convert this format to wav without any information loss? As you know, wv is a specific compressed format, and we have to convert it to plain wav. Do you have any suggestions?
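
Not the author, but the .wv1/.wv2 files in WSJ0 are shorten-compressed NIST SPHERE files, and the usual lossless route is LDC's sph2pipe tool (the same one Kaldi recipes use). A minimal sketch, assuming sph2pipe is on your PATH and the directory names below are placeholders:

import subprocess
from pathlib import Path

src_dir = Path("wsj0_wv")    # placeholder: directory containing the .wv1 files
dst_dir = Path("wsj0_wav")   # placeholder: where to put the decoded wavs
dst_dir.mkdir(parents=True, exist_ok=True)

for wv in sorted(src_dir.rglob("*.wv1")):
    out = dst_dir / (wv.stem + ".wav")
    # sph2pipe decodes the shorten-compressed SPHERE data to a plain RIFF wav.
    subprocess.run(["sph2pipe", "-f", "wav", str(wv), str(out)], check=True)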

list index out of range

Hi, I have created the dataset and extracted the features. When I execute run_lstm.py it shows this error:

Traceback (most recent call last):
File "run_lstm.py", line 454, in
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow_core\python\platform\app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "C:\ProgramData\Anaconda3\lib\site-packages\absl\app.py", line 299, in run
_run_main(main, args)
File "C:\ProgramData\Anaconda3\lib\site-packages\absl\app.py", line 250, in _run_main
sys.exit(main(argv))
File "run_lstm.py", line 295, in main
train()
File "run_lstm.py", line 166, in train
tr_tfrecords_lst, tr_num_batches = read_list_file("tr_tf", FLAGS.batch_size)
File "run_lstm.py", line 40, in read_list_file
utt_id = line.strip().split()[0]
IndexError: list index out of range

How can I resolve this error?
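
Not the maintainer, but the IndexError at line.strip().split()[0] almost always means the list file contains an empty or whitespace-only line (often a trailing newline at the end of the file). Deleting the blank line, or skipping blanks when reading, should fix it; a hedged sketch of the latter (the real read_list_file parses more fields and returns batch counts, which is omitted here):

def read_list_file(list_path):
    """Read a list file, skipping blank lines so split()[0] cannot fail."""
    utt_ids = []
    with open(list_path) as f:
        for line in f:
            fields = line.strip().split()
            if not fields:        # blank or whitespace-only line
                continue
            utt_ids.append(fields[0])
    return utt_ids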

Have you ever faced the problem that the two masks always become equal during training?

Hi, this issue is not about your code. I am doing some research on speech separation and implemented a basic PIT model (CNN, RNN); however, I keep running into the problem that the two masks tend toward equal values during training, which makes the separation not work at all. I just wonder whether you ever faced a similar issue when you implemented the PIT algorithm?
Thanks
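
Not the author either, but for context: one frequent cause of both masks collapsing to the same values is a loss that omits the min-over-permutations step of PIT, or applies it per frame instead of per utterance. A minimal numpy sketch of the two-speaker utterance-level PIT MSE, written for illustration rather than taken from this repo:

import numpy as np

def pit_mse_2spk(est1, est2, ref1, ref2):
    """Utterance-level PIT MSE for two sources.

    est*/ref*: arrays of shape (num_frames, num_bins).
    Returns the error of the better of the two possible assignments.
    """
    err_keep = np.mean((est1 - ref1) ** 2) + np.mean((est2 - ref2) ** 2)
    err_swap = np.mean((est1 - ref2) ** 2) + np.mean((est2 - ref1) ** 2)
    return min(err_keep, err_swap)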

I have some problems with executing code.

Hello, sorry to bother you.
I am a student just getting into machine learning.
I have recently been studying papers and code for speech separation.
I want to run your program, but I have some problems.
Because I don't have the WSJ0 data, I use the TIMIT dataset instead,
and I slightly modified the MATLAB code you provided to produce the mixed speech.
But I don't quite understand how to extract the spectral features and labels of each utterance in Python.
When I execute gen_tfreords.py separately with Spyder (Python 2.7 or 3.7), an error like "No module named 'io_funcs.signal_processing'" appears.
Or would it be possible to contact you on WeChat or by email? Many thanks!
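
Not the author, but that particular error usually just means the script is not being run from the repository root, so Python cannot find the io_funcs package. A small workaround when launching from an IDE such as Spyder (REPO_ROOT is a placeholder for wherever the project was cloned):

import sys

REPO_ROOT = "/path/to/pit-speech-separation"   # placeholder: directory containing io_funcs/
if REPO_ROOT not in sys.path:
    sys.path.insert(0, REPO_ROOT)

import io_funcs.signal_processing   # should now resolve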

run.sh

Hello, sorry to bother you. Regarding this line in run.sh:
python -u local/gen_tfreords.py --gender_list local/wsj0-train-spkrinfo.txt data/wav/wav8k/min/$x/ lists/${x}_wav.lst data/tfrecords/${x}_psm/ &
I cannot get the _wav.lst file. Could you please tell me where it comes from? Thank you very much.
Or could we chat on WeChat or QQ, please?
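
While waiting for an answer: judging from the command above, ${x}_wav.lst is just a list of the mixed wav file names for each set (another issue below mentions a man_wav_list.py script for this), so it can be regenerated by walking the wav directory. A rough sketch with placeholder paths, since the exact line format gen_tfreords.py expects may differ:

import os

wav_dir = "data/wav/wav8k/min/tr/mix"   # placeholder: mixed wavs for one set
out_lst = "lists/tr_wav.lst"            # placeholder output list

with open(out_lst, "w") as f:
    for name in sorted(os.listdir(wav_dir)):
        if name.endswith(".wav"):
            f.write(name + "\n")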

there is a small error

Hi Sun, blstm.py and spknet.py in the model folder are identical; maybe there is a small error?

Hello, I have a question...

Hi, Sun Sining:
I want to ask a question about line 103 of local/gen_tfreords.py. The line "labels = np.concatenate((s1_abs * np.cos(mix_angle - s1_angle), s2_abs * np.cos(mix_angle - s2_angle)), axis = 1)" means that you want to compute the PSM (phase-sensitive mask), right? If so, why not compute it as "labels = np.concatenate((s1_abs / mix_abs * np.cos(mix_angle - s1_angle), s2_abs / mix_abs * np.cos(mix_angle - s2_angle)), axis = 1)"?
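
For reference, a small numpy sketch of the two definitions being compared: the line quoted above produces the phase-sensitive spectrum approximation |S|*cos(theta_mix - theta_s) as the training target, while the suggested variant divides by |Y| and therefore yields the phase-sensitive mask itself (variable names here are mine, not the script's):

import numpy as np

def psm_labels(mix_spec, s1_spec, s2_spec, as_mask=False):
    """Phase-sensitive targets for two sources from complex STFT matrices."""
    mix_abs, mix_angle = np.abs(mix_spec), np.angle(mix_spec)
    s1_abs, s1_angle = np.abs(s1_spec), np.angle(s1_spec)
    s2_abs, s2_angle = np.abs(s2_spec), np.angle(s2_spec)

    t1 = s1_abs * np.cos(mix_angle - s1_angle)   # phase-sensitive spectrum target
    t2 = s2_abs * np.cos(mix_angle - s2_angle)
    if as_mask:
        # The variant proposed in the question: a true phase-sensitive mask.
        t1, t2 = t1 / mix_abs, t2 / mix_abs
    return np.concatenate((t1, t2), axis=1)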

A problem when preparing my own data for TFRecords format.

Hi! I am a college student. I have been studying your project these days and ran into some problems.
I did not find the function "make_sequence_example_two_labels" in the "io_funcs" folder. Is that an error? And I am not really sure how to generate the required TFRecords from my own data. Or would it be possible to contact you on WeChat? Many thanks!
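
Not the author, but as a guess at what such a helper would do, here is a minimal sketch of packing per-frame inputs and labels into a tf.train.SequenceExample; the feature names and layout used by this repo's TFRecords may well differ:

import tensorflow as tf

def make_sequence_example(inputs, labels):
    """inputs/labels: iterables of per-frame float vectors."""
    input_features = [
        tf.train.Feature(float_list=tf.train.FloatList(value=frame))
        for frame in inputs]
    label_features = [
        tf.train.Feature(float_list=tf.train.FloatList(value=frame))
        for frame in labels]
    feature_lists = tf.train.FeatureLists(feature_list={
        "inputs": tf.train.FeatureList(feature=input_features),
        "labels": tf.train.FeatureList(feature=label_features),
    })
    return tf.train.SequenceExample(feature_lists=feature_lists)

# Usage: serialize one example per utterance into a TFRecord file.
# with tf.io.TFRecordWriter("utt0001.tfrecords") as writer:   # tf.python_io.TFRecordWriter on old TF 1.x
#     writer.write(make_sequence_example(feats, targets).SerializeToString())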

What is the meaning of a meta-frame?

1. I don't understand what a meta-frame means.

2. How do we choose the least total MSE when the model parameters have not been optimized yet?

Maybe my questions sound a little foolish; I am very new to ML.

dataset

Hello my friend, I have two important questions.
Finally I could run your amazing code. As far as I know, to do that we need 4 kinds of lists:

1) dataset lists (mix_2_spk_tr, etc.)
2) gender lists
3) wav lists
4) tfrecords lists

For a small-scale run, just to get the code working, I generated these lists by hand and with the man_wav_list.py script, but here are my two big problems:

1. How can I produce the above lists, especially the dataset lists, with a script? Do you have a script for that?

2. For example, mix_2_spk_tr has many lines mixing different wav files at different SNRs to generate the mixed training dataset. My question is: does the mixing code automatically scale the wav files to the target SNRs, or do we have to do that ourselves before making that list? For example, the first line of mix_2_spk_tr is:
/home/disk1/snsun/Workspace/tensorflow/kaldi/data/wsj0/tr/40na010x.wav 1.9857 /home/disk1/snsun/Workspace/tensorflow/kaldi/data/wsj0/tr/01xo031a.wav -1.9857

Does the script create_wav_2speakers.m automatically produce wavs at these SNRs (1.9857 and -1.9857) and then mix them, or do we have to produce such wavs first and then run the script to build the dataset?
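
Not the author, but as far as I understand the standard WSJ0-2mix recipe, the list only records the desired SNR offsets, and create_wav_2speakers.m itself scales the two signals to those SNRs before summing, so no pre-scaled wavs are needed. A rough Python sketch of that scaling step (the original MATLAB script additionally normalizes by active speech level, which is omitted here):

import numpy as np

def mix_at_snr(s1, s2, snr1_db, snr2_db):
    """Scale two equal-length signals to the given gains in dB and mix them.

    In the mix_2_spk_* lists snr2_db is typically -snr1_db.
    """
    def rms(x):
        return np.sqrt(np.mean(x ** 2) + 1e-12)

    # Bring both signals to unit RMS, then apply the listed gains.
    s1_scaled = (10 ** (snr1_db / 20.0)) * s1 / rms(s1)
    s2_scaled = (10 ** (snr2_db / 20.0)) * s2 / rms(s2)
    return s1_scaled + s2_scaled, s1_scaled, s2_scaled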

tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimensions must be equal, but are 1488 and 992 for 'model/blstm/stack_bidirectional_rnn/cell_0/bidirectional_rnn/fw/fw/while/fw/basic_lstm_cell/MatMul_2' (op: 'MatMul') with input shapes: [25,1488], [992,1984].

Problem:
(1) tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimensions must be equal, but are 1488 and 992 for 'model/blstm/stack_bidirectional_rnn/cell_0/bidirectional_rnn/fw/fw/while/fw/basic_lstm_cell/MatMul_2' (op: 'MatMul') with input shapes: [25,1488], [992,1984].
(2) ValueError: Dimensions must be equal, but are 1488 and 992 for 'model/blstm/stack_bidirectional_rnn/cell_0/bidirectional_rnn/fw/fw/while/fw/basic_lstm_cell/MatMul_2' (op: 'MatMul') with input shapes: [25,1488], [992,1984].

@snsun I guess it might be due to the TensorFlow version (is your TensorFlow version 1.0?). What do you think? Thanks!

How to convert feats_mapping.lst to tfrecords?

I have run the baseline, and I also generated other features using the MATLAB script you provided, extract_czt_fft_feats, and converted them to Kaldi format.

Then I used local/makelist to generate a list file like this:

huanglu@speech-WorkStation:pit-speech-separation$ head lists/tt_feats_mapping.lst
050a0501_1.7783_442o030z_-1.7783.wav /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_inputs/feats.ark:37 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:39 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:141477
050a0502_1.3461_440o030j_-1.3461.wav /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_inputs/feats.ark:281761 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:282915 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:438801
050a0502_1.463_420a010o_-1.463.wav /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_inputs/feats.ark:592267 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:594685 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:750569
050a0502_1.9707_440c020w_-1.9707.wav /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_inputs/feats.ark:902775 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:906455 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:1062341
050a0504_2.4414_443o0313_-2.4414.wav /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_inputs/feats.ark:1213283 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:1218227 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:1398365
050a0505_1.5097_440o030d_-1.5097.wav /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_inputs/feats.ark:1572107 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:1578503 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:1767413
050a0506_1.7744_447c0213_-1.7744.wav /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_inputs/feats.ark:1948407 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:1956323 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:2108597
050a0506_1.9887_447c0215_-1.9887.wav /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_inputs/feats.ark:2251719 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:2260871 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:2444621
050a0507_0.75154_423a010l_-0.75154.wav /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_inputs/feats.ark:2617741 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:2628373 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:2811093
050a0508_0.19796_423a010l_-0.19796.wav /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_inputs/feats.ark:2981707 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:2993813 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:3160021

How can I then convert this list to tfrecords? I am not familiar with TF.

Thx!
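
Not the maintainer, but one workable route is to read each ark offset with the third-party kaldi_io package (pip install kaldi_io) and write one tf.train.SequenceExample per utterance, along the lines of the sketch in the TFRecords issue above. A hedged outline, assuming the three ark columns in your list are the input features and the two label matrices, and that the training code accepts one TFRecord file per set (paths are placeholders):

import numpy as np
import kaldi_io
import tensorflow as tf

def frames_to_feature_list(mat):
    return tf.train.FeatureList(feature=[
        tf.train.Feature(float_list=tf.train.FloatList(value=row))
        for row in mat])

with open("lists/tt_feats_mapping.lst") as lst, \
        tf.io.TFRecordWriter("data/tfrecords/tt_czt.tfrecords") as writer:
    for line in lst:
        fields = line.strip().split()
        if not fields:
            continue
        utt_id, in_rspec, lab1_rspec, lab2_rspec = fields
        inputs = kaldi_io.read_mat(in_rspec)                       # (frames, bins)
        labels = np.concatenate((kaldi_io.read_mat(lab1_rspec),
                                 kaldi_io.read_mat(lab2_rspec)), axis=1)
        example = tf.train.SequenceExample(feature_lists=tf.train.FeatureLists(
            feature_list={"inputs": frames_to_feature_list(inputs),
                          "labels": frames_to_feature_list(labels)}))
        writer.write(example.SerializeToString())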

How to separate the target speech?

Dear author,

Thanks for your great project.

I have a question: if I only have the target person's corpus, how can I separate his speech from a multi-talker mixture?

Any suggestion or advice would be appreciated.

Thanks,
Lychee
