
pit-speech-separation's Issues

Dataset structure

Hi,
Can you explain what the actual structure of the dataset should be? I'm not sure I got it.
Is it like in the MATLAB file you attached, i.e. cv/tr/tt folders, each containing the mixed/s1/s2 folders?
And which file/function should I run after the signal mixing?
Thanks!

A little question about get_padded_batch

You said:

def get_padded_batch(file_list, batch_size, input_size, output_size,
                     num_enqueuing_threads=4, num_epochs=1, shuffle=True):
    """Reads batches of SequenceExamples from TFRecords and pads them.
    Can deal with variable length SequenceExamples by padding each batch to the
    length of the longest sequence with zeros.
    Args:
        file_list: A list of paths to TFRecord files containing SequenceExamples.
        batch_size: The number of SequenceExamples to include in each batch.
        input_size: The size of each input vector. The returned batch of inputs
            will have a shape [batch_size, num_steps, input_size].
        num_enqueuing_threads: The number of threads to use for enqueuing
            SequenceExamples.
    Returns:
        inputs: A tensor of shape [batch_size, num_steps, input_size] of float32s.
        labels: A tensor of shape [batch_size, num_steps] of float32s.
        lengths: A tensor of shape [batch_size] of int32s. The lengths of each
            SequenceExample before padding.
    """

What is the meaning of lengths? It says it has shape [batch_size]; do you mean it is the same for all batches, or is it a vector whose entries are the lengths of the audio files?

I have looked at decode(), and it also says the shape is [batch_size].
So does the first sample of every batch have the same length?
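
For what it's worth, lengths is per example, not per batch: it holds one int32 per SequenceExample giving that utterance's frame count before padding, so the entries within a batch generally differ. A tiny numpy sketch of the shapes involved (values are made up for illustration, not taken from the repo):

import numpy as np

input_size = 3
# Two utterances of different lengths (num_steps x input_size).
utt_a = np.ones((5, input_size), dtype=np.float32)
utt_b = np.ones((8, input_size), dtype=np.float32)

# lengths has one entry per example in the batch: the number of frames
# each utterance had before zero-padding.
lengths = np.array([utt_a.shape[0], utt_b.shape[0]], dtype=np.int32)   # -> [5, 8]

# Every utterance is padded with zeros up to the longest one in the batch.
max_len = lengths.max()
inputs = np.zeros((len(lengths), max_len, input_size), dtype=np.float32)
inputs[0, :lengths[0]] = utt_a
inputs[1, :lengths[1]] = utt_b
# inputs.shape == (batch_size, num_steps, input_size) == (2, 8, 3)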

wv version of wsj0 dataset

Hi, I have the wv version of WSJ0. How can I convert this format to wav without any information loss? As you know, wv is a specific compressed format, and we have to convert it to plain wav. Do you have any suggestions?
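
Not the author, but the .wv1/.wv2 files in WSJ0 are shorten-compressed NIST SPHERE files, and the usual lossless route is LDC's sph2pipe tool (the same one Kaldi recipes use). A minimal sketch, assuming sph2pipe is on your PATH and the directory names below are placeholders:

import subprocess
from pathlib import Path

src_dir = Path("wsj0_wv")    # placeholder: directory containing the .wv1 files
dst_dir = Path("wsj0_wav")   # placeholder: where to put the decoded wavs
dst_dir.mkdir(parents=True, exist_ok=True)

for wv in sorted(src_dir.rglob("*.wv1")):
    out = dst_dir / (wv.stem + ".wav")
    # sph2pipe decodes the shorten-compressed SPHERE data to a plain RIFF wav.
    subprocess.run(["sph2pipe", "-f", "wav", str(wv), str(out)], check=True)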

list index out of range

Hi, I have created the dataset and extracted the features. When I execute run_lstm.py it shows this error:

Traceback (most recent call last):
File "run_lstm.py", line 454, in
tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
File "C:\ProgramData\Anaconda3\lib\site-packages\tensorflow_core\python\platform\app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "C:\ProgramData\Anaconda3\lib\site-packages\absl\app.py", line 299, in run
_run_main(main, args)
File "C:\ProgramData\Anaconda3\lib\site-packages\absl\app.py", line 250, in _run_main
sys.exit(main(argv))
File "run_lstm.py", line 295, in main
train()
File "run_lstm.py", line 166, in train
tr_tfrecords_lst, tr_num_batches = read_list_file("tr_tf", FLAGS.batch_size)
File "run_lstm.py", line 40, in read_list_file
utt_id = line.strip().split()[0]
IndexError: list index out of range

How can I resolve this error?
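
Not the maintainer, but the IndexError at line.strip().split()[0] almost always means the list file contains an empty or whitespace-only line (often a trailing newline at the end of the file). Deleting the blank line, or skipping blanks when reading, should fix it; a hedged sketch of the latter (the real read_list_file parses more fields and returns batch counts, which is omitted here):

def read_list_file(list_path):
    """Read a list file, skipping blank lines so split()[0] cannot fail."""
    utt_ids = []
    with open(list_path) as f:
        for line in f:
            fields = line.strip().split()
            if not fields:        # blank or whitespace-only line
                continue
            utt_ids.append(fields[0])
    return utt_ids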

Have you ever faced the problem that the two masks always become equal during training?

Hi, this issue is not about your code. I am doing some research on speech separation and implemented a basic PIT model (CNN, RNN); however, I keep running into the problem that the two masks tend toward equal values during training, which makes the separation not work at all. I just wonder whether you ever faced a similar issue when you implemented the PIT algorithm?
Thanks
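
Not the author either, but for context: one frequent cause of both masks collapsing to the same values is a loss that omits the min-over-permutations step of PIT, or applies it per frame instead of per utterance. A minimal numpy sketch of the two-speaker utterance-level PIT MSE, written for illustration rather than taken from this repo:

import numpy as np

def pit_mse_2spk(est1, est2, ref1, ref2):
    """Utterance-level PIT MSE for two sources.

    est*/ref*: arrays of shape (num_frames, num_bins).
    Returns the error of the better of the two possible assignments.
    """
    err_keep = np.mean((est1 - ref1) ** 2) + np.mean((est2 - ref2) ** 2)
    err_swap = np.mean((est1 - ref2) ** 2) + np.mean((est2 - ref1) ** 2)
    return min(err_keep, err_swap)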

I have some problems with executing code.

Hello, sorry to bother you.
I am a student just getting into machine learning.
I have recently been studying papers and code for speech separation.
I want to run your program, but I have some problems.
Because I don't have the WSJ0 data, I use the TIMIT dataset instead,
and I slightly modified the MATLAB code you provided to produce the mixed speech.
But I don't quite understand how to extract the spectral features and labels of each utterance in Python.
When I execute gen_tfreords.py separately with Spyder (Python 2.7 or 3.7), an error like "No module named 'io_funcs.signal_processing'" appears.
Or would it be possible to contact you on WeChat or by email? Many thanks!
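
Not the author, but that particular error usually just means the script is not being run from the repository root, so Python cannot find the io_funcs package. A small workaround when launching from an IDE such as Spyder (REPO_ROOT is a placeholder for wherever the project was cloned):

import sys

REPO_ROOT = "/path/to/pit-speech-separation"   # placeholder: directory containing io_funcs/
if REPO_ROOT not in sys.path:
    sys.path.insert(0, REPO_ROOT)

import io_funcs.signal_processing   # should now resolve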

run.sh

Hello, sorry to bother you. Regarding this line in run.sh:
python -u local/gen_tfreords.py --gender_list local/wsj0-train-spkrinfo.txt data/wav/wav8k/min/$x/ lists/${x}_wav.lst data/tfrecords/${x}_psm/ &
I cannot get the _wav.lst file. Could you please tell me where it comes from? Thank you very much.
Or could we chat on WeChat or QQ, please?
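
While waiting for an answer: judging from the command above, ${x}_wav.lst is just a list of the mixed wav file names for each set (another issue below mentions a man_wav_list.py script for this), so it can be regenerated by walking the wav directory. A rough sketch with placeholder paths, since the exact line format gen_tfreords.py expects may differ:

import os

wav_dir = "data/wav/wav8k/min/tr/mix"   # placeholder: mixed wavs for one set
out_lst = "lists/tr_wav.lst"            # placeholder output list

with open(out_lst, "w") as f:
    for name in sorted(os.listdir(wav_dir)):
        if name.endswith(".wav"):
            f.write(name + "\n")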

there is a small error

Hi Sun, blstm.py and spknet.py in the model folder are identical; maybe there is a small error?

Hello, I have a question...

Hi, Sun Sining:
I want to ask a question about line 103 of local/gen_tfreords.py. The line "labels = np.concatenate((s1_abs * np.cos(mix_angle - s1_angle), s2_abs * np.cos(mix_angle - s2_angle)), axis = 1)" means that you want to compute the PSM (phase-sensitive mask), right? If so, why not compute it as "labels = np.concatenate((s1_abs / mix_abs * np.cos(mix_angle - s1_angle), s2_abs / mix_abs * np.cos(mix_angle - s2_angle)), axis = 1)"?
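
For reference, a small numpy sketch of the two definitions being compared: the line quoted above produces the phase-sensitive spectrum approximation |S|*cos(theta_mix - theta_s) as the training target, while the suggested variant divides by |Y| and therefore yields the phase-sensitive mask itself (variable names here are mine, not the script's):

import numpy as np

def psm_labels(mix_spec, s1_spec, s2_spec, as_mask=False):
    """Phase-sensitive targets for two sources from complex STFT matrices."""
    mix_abs, mix_angle = np.abs(mix_spec), np.angle(mix_spec)
    s1_abs, s1_angle = np.abs(s1_spec), np.angle(s1_spec)
    s2_abs, s2_angle = np.abs(s2_spec), np.angle(s2_spec)

    t1 = s1_abs * np.cos(mix_angle - s1_angle)   # phase-sensitive spectrum target
    t2 = s2_abs * np.cos(mix_angle - s2_angle)
    if as_mask:
        # The variant proposed in the question: a true phase-sensitive mask.
        t1, t2 = t1 / mix_abs, t2 / mix_abs
    return np.concatenate((t1, t2), axis=1)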

A problem when preparing my own data for TFRecords format.

Hi! I am a college student. I have been studying your project these days and ran into some problems.
I did not find the function "make_sequence_example_two_labels" in the "io_funcs" folder. Is that an error? And I am not really sure how to generate the required TFRecords from my own data. Or would it be possible to contact you on WeChat? Many thanks!
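
Not the author, but as a guess at what such a helper would do, here is a minimal sketch of packing per-frame inputs and labels into a tf.train.SequenceExample; the feature names and layout used by this repo's TFRecords may well differ:

import tensorflow as tf

def make_sequence_example(inputs, labels):
    """inputs/labels: iterables of per-frame float vectors."""
    input_features = [
        tf.train.Feature(float_list=tf.train.FloatList(value=frame))
        for frame in inputs]
    label_features = [
        tf.train.Feature(float_list=tf.train.FloatList(value=frame))
        for frame in labels]
    feature_lists = tf.train.FeatureLists(feature_list={
        "inputs": tf.train.FeatureList(feature=input_features),
        "labels": tf.train.FeatureList(feature=label_features),
    })
    return tf.train.SequenceExample(feature_lists=feature_lists)

# Usage: serialize one example per utterance into a TFRecord file.
# with tf.io.TFRecordWriter("utt0001.tfrecords") as writer:   # tf.python_io.TFRecordWriter on old TF 1.x
#     writer.write(make_sequence_example(feats, targets).SerializeToString())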

What is the meaning of a meta-frame?

1. I don't understand what a meta-frame means.

2. How do we choose the least total MSE when the model parameters have not been optimized yet?

Maybe my questions sound a little foolish; I am very new to ML.

dataset

Hello my friend, I have two important questions.
Finally I could run your amazing code. As far as I know, to do that we need 4 kinds of lists:

1) dataset lists (mix_2_spk_tr, etc.)
2) gender lists
3) wav lists
4) tfrecords lists

For a small-scale run, just to get the code working, I generated these lists by hand and with the man_wav_list.py script, but here are my two big problems:

1. How can I produce the above lists, especially the dataset lists, with a script? Do you have a script for that?

2. For example, mix_2_spk_tr has many lines mixing different wav files at different SNRs to generate the mixed training dataset. My question is: does the mixing code automatically scale the wav files to the target SNRs, or do we have to do that ourselves before making that list? For example, the first line of mix_2_spk_tr is:
/home/disk1/snsun/Workspace/tensorflow/kaldi/data/wsj0/tr/40na010x.wav 1.9857 /home/disk1/snsun/Workspace/tensorflow/kaldi/data/wsj0/tr/01xo031a.wav -1.9857

Does the script create_wav_2speakers.m automatically produce wavs at these SNRs (1.9857 and -1.9857) and then mix them, or do we have to produce such wavs first and then run the script to build the dataset?
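
Not the author, but as far as I understand the standard WSJ0-2mix recipe, the list only records the desired SNR offsets, and create_wav_2speakers.m itself scales the two signals to those SNRs before summing, so no pre-scaled wavs are needed. A rough Python sketch of that scaling step (the original MATLAB script additionally normalizes by active speech level, which is omitted here):

import numpy as np

def mix_at_snr(s1, s2, snr1_db, snr2_db):
    """Scale two equal-length signals to the given gains in dB and mix them.

    In the mix_2_spk_* lists snr2_db is typically -snr1_db.
    """
    def rms(x):
        return np.sqrt(np.mean(x ** 2) + 1e-12)

    # Bring both signals to unit RMS, then apply the listed gains.
    s1_scaled = (10 ** (snr1_db / 20.0)) * s1 / rms(s1)
    s2_scaled = (10 ** (snr2_db / 20.0)) * s2 / rms(s2)
    return s1_scaled + s2_scaled, s1_scaled, s2_scaled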

tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimensions must be equal, but are 1488 and 992 for 'model/blstm/stack_bidirectional_rnn/cell_0/bidirectional_rnn/fw/fw/while/fw/basic_lstm_cell/MatMul_2' (op: 'MatMul') with input shapes: [25,1488], [992,1984].

Problem:
(1) tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimensions must be equal, but are 1488 and 992 for 'model/blstm/stack_bidirectional_rnn/cell_0/bidirectional_rnn/fw/fw/while/fw/basic_lstm_cell/MatMul_2' (op: 'MatMul') with input shapes: [25,1488], [992,1984].
(2) ValueError: Dimensions must be equal, but are 1488 and 992 for 'model/blstm/stack_bidirectional_rnn/cell_0/bidirectional_rnn/fw/fw/while/fw/basic_lstm_cell/MatMul_2' (op: 'MatMul') with input shapes: [25,1488], [992,1984].

@snsun I guess it might be due to the TensorFlow version (is your TensorFlow version 1.0?). What do you think? Thanks!

How to convert feats_mapping.lst to tfrecords?

I have run the baseline, and I also generated other features using the MATLAB script you provided, extract_czt_fft_feats, and converted them to Kaldi format.

Then I used local/makelist to generate a list file like this:

huanglu@speech-WorkStation:pit-speech-separation$ head lists/tt_feats_mapping.lst
050a0501_1.7783_442o030z_-1.7783.wav /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_inputs/feats.ark:37 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:39 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:141477
050a0502_1.3461_440o030j_-1.3461.wav /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_inputs/feats.ark:281761 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:282915 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:438801
050a0502_1.463_420a010o_-1.463.wav /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_inputs/feats.ark:592267 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:594685 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:750569
050a0502_1.9707_440c020w_-1.9707.wav /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_inputs/feats.ark:902775 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:906455 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:1062341
050a0504_2.4414_443o0313_-2.4414.wav /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_inputs/feats.ark:1213283 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:1218227 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:1398365
050a0505_1.5097_440o030d_-1.5097.wav /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_inputs/feats.ark:1572107 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:1578503 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:1767413
050a0506_1.7744_447c0213_-1.7744.wav /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_inputs/feats.ark:1948407 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:1956323 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:2108597
050a0506_1.9887_447c0215_-1.9887.wav /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_inputs/feats.ark:2251719 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:2260871 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:2444621
050a0507_0.75154_423a010l_-0.75154.wav /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_inputs/feats.ark:2617741 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:2628373 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:2811093
050a0508_0.19796_423a010l_-0.19796.wav /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_inputs/feats.ark:2981707 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:2993813 /home/huanglu/data/WSJ0_Mix/2speakers/feat8k/min/tt_labels/feats.ark:3160021

How can I then convert this list to tfrecords? I am not familiar with TF.

Thx!
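
Not the maintainer, but one workable route is to read each ark offset with the third-party kaldi_io package (pip install kaldi_io) and write one tf.train.SequenceExample per utterance, along the lines of the sketch in the TFRecords issue above. A hedged outline, assuming the three ark columns in your list are the input features and the two label matrices, and that the training code accepts one TFRecord file per set (paths are placeholders):

import numpy as np
import kaldi_io
import tensorflow as tf

def frames_to_feature_list(mat):
    return tf.train.FeatureList(feature=[
        tf.train.Feature(float_list=tf.train.FloatList(value=row))
        for row in mat])

with open("lists/tt_feats_mapping.lst") as lst, \
        tf.io.TFRecordWriter("data/tfrecords/tt_czt.tfrecords") as writer:
    for line in lst:
        fields = line.strip().split()
        if not fields:
            continue
        utt_id, in_rspec, lab1_rspec, lab2_rspec = fields
        inputs = kaldi_io.read_mat(in_rspec)                       # (frames, bins)
        labels = np.concatenate((kaldi_io.read_mat(lab1_rspec),
                                 kaldi_io.read_mat(lab2_rspec)), axis=1)
        example = tf.train.SequenceExample(feature_lists=tf.train.FeatureLists(
            feature_list={"inputs": frames_to_feature_list(inputs),
                          "labels": frames_to_feature_list(labels)}))
        writer.write(example.SerializeToString())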

How to separate the target speech?

Dear author,

Thanks for your great project.

I have a question: if I only have the target person's corpus, how can I separate his speech from a multi-talker mixture?

Any suggestion or advice would be appreciated.

Thanks,
Lychee
