
wave-u-net's People

Contributors

f90 avatar jaesuny avatar madoshakalaka avatar matangover avatar satvik-venkatesh avatar


wave-u-net's Issues

TypeError: load() missing 1 required positional argument: 'Loader'

Excuse me, when I use the training command python Training.py, it raises TypeError: load() missing 1 required positional argument: 'Loader'. The version of musdb is 0.2.3. How can I solve this problem?

Preparing MUSDB dataset! This could take a while...
ERROR - Waveunet Training - Failed after 0:00:00!
Traceback (most recent calls WITHOUT Sacred internals):
File "Training.py", line 162, in run
sup_model_path, sup_loss = optimise()
File "Training.py", line 137, in optimise
model_path = train(load_model=model_path)
File "Training.py", line 39, in train
dataset = Datasets.get_dataset(model_config, sep_input_shape, sep_output_shape, partition="train")
File "C:\Users\86135\PycharmProjects\Wave-U-Net\Wave-U-Net\Datasets.py", line 137, in get_dataset
dsd_train, dsd_test = getMUSDB(model_config["musdb_path"]) # List of (mix, acc, bass, drums, other, vocal) tuples
File "C:\Users\86135\PycharmProjects\Wave-U-Net\Wave-U-Net\Datasets.py", line 222, in getMUSDB
mus = musdb.DB(root_dir=database_path, is_wav=False)
File "C:\Users\86135\PycharmProjects\py\Wave-U-Net\lib\site-packages\musdb_init_.py", line 86, in init
self.setup = yaml.load(f)
TypeError: load() missing 1 required positional argument: 'Loader'
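
For anyone hitting this: PyYAML 6 removed the implicit default loader, so yaml.load(f) raises exactly this TypeError. A minimal, hedged workaround sketch: either pin PyYAML below 6 (pip install "pyyaml<6"), or pass a loader explicitly in the call the traceback points at inside musdb's __init__.py:

import yaml

# Stand-in for musdb's own setup file; the point is only the Loader argument.
with open("setup.yaml") as f:
    setup = yaml.load(f, Loader=yaml.SafeLoader)  # instead of yaml.load(f)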

InvalidArgumentError in multi_instrument Training

I'm trying to train with the musdb dataset on Colab. At EPOCH 0, after printing the number of sep_variables and the number of variables, it throws this error.

/usr/local/lib/python2.7/dist-packages/sacred/config/captured_function.pyc in captured_function(wrapped, instance, args, kwargs)
44 start_time = time.time()
45 # =================== run actual function =================================
---> 46 result = wrapped(*args, **kwargs)
47 # =========================================================================
48 if wrapped.logger is not None:

/content/drive/My Drive/Wave-U-Net-master/Wave-U-Net-master/Training.pyc in run(cfg)
164 print(model_config)
165 # Optimize in a supervised fashion until validation loss worsens
--> 166 sup_model_path, sup_loss = optimise(model_config, idd)
167 print("Supervised training finished! Saved model at " + sup_model_path + ". Performance: " + str(sup_loss))
168

/usr/local/lib/python2.7/dist-packages/sacred/config/captured_function.pyc in captured_function(wrapped, instance, args, kwargs)
44 start_time = time.time()
45 # =================== run actual function =================================
---> 46 result = wrapped(*args, **kwargs)
47 # =========================================================================
48 if wrapped.logger is not None:

/content/drive/My Drive/Wave-U-Net-master/Wave-U-Net-master/Training.pyc in optimise(model_config, experiment_id)
136 while worse_epochs < model_config["worse_epochs"]: # Early stopping on validation set after a few epochs
137 print("EPOCH: " + str(epoch))
--> 138 model_path = train(model_config, experiment_id, model_path)
139 curr_loss = Test.test(model_config, model_folder=str(experiment_id), partition="valid", load_model=model_path)
140 epoch += 1

/usr/local/lib/python2.7/dist-packages/sacred/config/captured_function.pyc in captured_function(wrapped, instance, args, kwargs)
44 start_time = time.time()
45 # =================== run actual function =================================
---> 46 result = wrapped(*args, **kwargs)
47 # =========================================================================
48 if wrapped.logger is not None:

/content/drive/My Drive/Wave-U-Net-master/Wave-U-Net-master/Training.pyc in train(model_config, experiment_id, load_model)
104 for _ in range(model_config["epoch_it"]):
105 # TRAIN SEPARATOR
--> 106 _, _sup_summaries = sess.run([separator_solver, sup_summaries])
107 writer.add_summary(_sup_summaries, global_step=_global_step)
108

/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in run(self, fetches, feed_dict, options, run_metadata)
927 try:
928 result = self._run(None, fetches, feed_dict, options_ptr,
--> 929 run_metadata_ptr)
930 if run_metadata:
931 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _run(self, handle, fetches, feed_dict, options, run_metadata)
1150 if final_fetches or final_targets or (handle and feed_dict_tensor):
1151 results = self._do_run(handle, final_targets, final_fetches,
-> 1152 feed_dict_tensor, options, run_metadata)
1153 else:
1154 results = []

/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1326 if handle is None:
1327 return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1328 run_metadata)
1329 else:
1330 return self._do_call(_prun_fn, handle, feeds, fetches)

/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.pyc in _do_call(self, fn, *args)
1346 pass
1347 message = error_interpolation.interpolate(message, self._graph)
-> 1348 raise type(e)(node_def, op, message)
1349
1350 def _extend_graph(self):

InvalidArgumentError: Input to reshape is a tensor with 0 values, but the requested shape has 2099164
[[{{node Reshape}}]]
[[node IteratorGetNext (defined at /content/drive/My Drive/Wave-U-Net-master/Wave-U-Net-master/Training.py:42) ]]
[[node IteratorGetNext (defined at /content/drive/My Drive/Wave-U-Net-master/Wave-U-Net-master/Training.py:42) ]]

But I'm 100% sure I passed the correct paths and that the configuration has the right parameters. I've tried printing sep_input_shape and sep_output_shape and they're both 3-dimensional arrays with proper dimensions. Is there something I'm missing?
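
One hedged way to narrow this down: "Input to reshape is a tensor with 0 values" suggests an empty example reached the input pipeline, so it may be worth checking that every track decodes to non-empty audio. A minimal sketch, assuming the musdb 0.2.x API this repo pins (the root path is a placeholder):

import musdb

mus = musdb.DB(root_dir="path/to/musdb")
for track in mus.load_mus_tracks(subsets=["train"]):
    audio = track.audio  # decodes the stem; shape (samples, channels)
    assert audio is not None and audio.shape[0] > 0, "Empty audio: " + track.name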

Inquiry on SDR mean and median on MUSDB/test

When running the provided pre-trained model on MUSDB (i.e., calling compute_mean_metrics), the SDR result on musdb/test is "acc_median: -15.40719 acc_mean: -15.94557 voc_median: -18.48437 voc_mean: -19.03658",
while the SDR result on musdb/train is "acc_median: 3.91334 acc_mean: 4.07193 voc_median: 7.46886 voc_mean: 28.05654".

I wonder why the SDR on the MUSDB test set is so strange; it obviously does not match what is stated on GitHub: "M5-HighSR is our best vocal separator, reaching a median (mean) vocal/acc SDR of 4.95 (1.01) and 11.16 (12.87), respectively."

Key separator/conv1d_26/bias not found in checkpoint

Thanks for your work! When I run Predict.py with the full_44KHz pretrained model, I get "NotFoundError (see above for traceback): Key separator/conv1d_26/bias not found in checkpoint". Can you help me solve this problem?

How to Install Tensorflow on Python 2.7 ?

Hi there.
I'm having this error while trying to install TensorFlow on Python 2.7:

"Could not find a version that satisfies the requirement tensorflow-gpu==1.8.0 (from -r requirements.txt (line 3)) (from versions: )
No matching distribution found for tensorflow-gpu==1.8.0 (from -r requirements.txt (line 3))"

I'm not very familiar with these things; is there anything I'm missing before installing this?

Thanks!

Question - Why not train on smaller patches

I noticed that you run convolutions over the entire 16384 frames rather than processing a song in smaller patches. Is there a reason for this decision? Doesn't this increase the memory requirements and reduce the ability to randomize the data?

Training halts after the first epoch

Hello,

I tried to train your model in full_multi_instrument mode using 4 GPUs (NVIDIA Tesla P100) and with the same dataset (musdb). It took 2 hours to finish the first epoch, followed by a very long hang with no progress.

Here is the stack trace after stopping the script manually:



2018-07-16 16:21:31.186201: I tensorflow/core/common_runtime/placer.cc:886] separator_solver/separator/conv1d_25/kernel/Adam/Initializer/zeros: (Const)/job:localhost/replica:0/task:0/device:GPU:0
2018-07-16 16:21:31.186236: I tensorflow/core/common_runtime/placer.cc:886] separator_solver/separator/conv1d_25/kernel/Adam_1/Initializer/zeros: (Const)/job:localhost/replica:0/task:0/device:GPU:0
2018-07-16 16:21:31.186247: I tensorflow/core/common_runtime/placer.cc:886] separator_solver/separator/conv1d_25/bias/Adam/Initializer/zeros: (Const)/job:localhost/replica:0/task:0/device:GPU:0
2018-07-16 16:21:31.186262: I tensorflow/core/common_runtime/placer.cc:886] separator_solver/separator/conv1d_25/bias/Adam_1/Initializer/zeros: (Const)/job:localhost/replica:0/task:0/device:GPU:0
2018-07-16 16:21:31.186272: I tensorflow/core/common_runtime/placer.cc:886] separator_solver/separator/conv1d_26/kernel/Adam/Initializer/zeros: (Const)/job:localhost/replica:0/task:0/device:GPU:0
2018-07-16 16:21:31.186285: I tensorflow/core/common_runtime/placer.cc:886] separator_solver/separator/conv1d_26/kernel/Adam_1/Initializer/zeros: (Const)/job:localhost/replica:0/task:0/device:GPU:0
2018-07-16 16:21:31.186297: I tensorflow/core/common_runtime/placer.cc:886] separator_solver/separator/conv1d_26/bias/Adam/Initializer/zeros: (Const)/job:localhost/replica:0/task:0/device:GPU:0
2018-07-16 16:21:31.186309: I tensorflow/core/common_runtime/placer.cc:886] separator_solver/separator/conv1d_26/bias/Adam_1/Initializer/zeros: (Const)/job:localhost/replica:0/task:0/device:GPU:0
2018-07-16 16:21:31.186322: I tensorflow/core/common_runtime/placer.cc:886] separator_solver/separator/conv1d_27/kernel/Adam/Initializer/zeros: (Const)/job:localhost/replica:0/task:0/device:GPU:0
2018-07-16 16:21:31.186334: I tensorflow/core/common_runtime/placer.cc:886] separator_solver/separator/conv1d_27/kernel/Adam_1/Initializer/zeros: (Const)/job:localhost/replica:0/task:0/device:GPU:0
2018-07-16 16:21:31.186347: I tensorflow/core/common_runtime/placer.cc:886] separator_solver/separator/conv1d_27/bias/Adam/Initializer/zeros: (Const)/job:localhost/replica:0/task:0/device:GPU:0
2018-07-16 16:21:31.186359: I tensorflow/core/common_runtime/placer.cc:886] separator_solver/separator/conv1d_27/bias/Adam_1/Initializer/zeros: (Const)/job:localhost/replica:0/task:0/device:GPU:0
2018-07-16 16:21:31.186384: I tensorflow/core/common_runtime/placer.cc:886] separator_solver/Adam/beta1: (Const)/job:localhost/replica:0/task:0/device:GPU:0
2018-07-16 16:21:31.186406: I tensorflow/core/common_runtime/placer.cc:886] separator_solver/Adam/beta2: (Const)/job:localhost/replica:0/task:0/device:GPU:0
2018-07-16 16:21:31.186427: I tensorflow/core/common_runtime/placer.cc:886] separator_solver/Adam/epsilon: (Const)/job:localhost/replica:0/task:0/device:GPU:0
2018-07-16 16:21:31.186440: I tensorflow/core/common_runtime/placer.cc:886] sep_loss/tags: (Const)/job:localhost/replica:0/task:0/device:CPU:0
2018-07-16 16:21:31.186451: I tensorflow/core/common_runtime/placer.cc:886] save/Const: (Const)/job:localhost/replica:0/task:0/device:CPU:0
2018-07-16 16:21:31.186464: I tensorflow/core/common_runtime/placer.cc:886] save/SaveV2/tensor_names: (Const)/job:localhost/replica:0/task:0/device:CPU:0
2018-07-16 16:21:31.186477: I tensorflow/core/common_runtime/placer.cc:886] save/SaveV2/shape_and_slices: (Const)/job:localhost/replica:0/task:0/device:CPU:0
2018-07-16 16:21:31.186490: I tensorflow/core/common_runtime/placer.cc:886] save/RestoreV2/tensor_names: (Const)/job:localhost/replica:0/task:0/device:CPU:0
2018-07-16 16:21:31.186503: I tensorflow/core/common_runtime/placer.cc:886] save/RestoreV2/shape_and_slices: (Const)/job:localhost/replica:0/task:0/device:CPU:0
2018-07-16 16:21:31.186516: I tensorflow/core/common_runtime/placer.cc:886] save_1/Const: (Const)/job:localhost/replica:0/task:0/device:CPU:0
2018-07-16 16:21:31.186528: I tensorflow/core/common_runtime/placer.cc:886] save_1/SaveV2/tensor_names: (Const)/job:localhost/replica:0/task:0/device:CPU:0
2018-07-16 16:21:31.186542: I tensorflow/core/common_runtime/placer.cc:886] save_1/SaveV2/shape_and_slices: (Const)/job:localhost/replica:0/task:0/device:CPU:0
2018-07-16 16:21:31.186554: I tensorflow/core/common_runtime/placer.cc:886] save_1/RestoreV2/tensor_names: (Const)/job:localhost/replica:0/task:0/device:CPU:0
2018-07-16 16:21:31.186567: I tensorflow/core/common_runtime/placer.cc:886] save_1/RestoreV2/shape_and_slices: (Const)/job:localhost/replica:0/task:0/device:CPU:0


^CWARNING - Waveunet - Aborted after 6:05:45!
Traceback (most recent call last):
  File "Training.py", line 326, in <module>
    @ex.automain
  File "/home/leo/.local/lib/python2.7/site-packages/sacred/experiment.py", line 137, in automain
    self.run_commandline()
  File "/home/leo/.local/lib/python2.7/site-packages/sacred/experiment.py", line 260, in run_commandline
    return self.run(cmd_name, config_updates, named_configs, {}, args)
  File "/home/leo/.local/lib/python2.7/site-packages/sacred/experiment.py", line 209, in run
    run()
  File "/home/leo/.local/lib/python2.7/site-packages/sacred/run.py", line 221, in __call__
    self.result = self.main_function(*args)
  File "/home/leo/.local/lib/python2.7/site-packages/sacred/config/captured_function.py", line 46, in captured_function
    result = wrapped(*args, **kwargs)
  File "Training.py", line 373, in dsd_100_experiment
    sup_model_path, sup_loss = optimise(dataset=dataset)
  File "/home/leo/.local/lib/python2.7/site-packages/sacred/config/captured_function.py", line 46, in captured_function
    result = wrapped(*args, **kwargs)
  File "Training.py", line 311, in optimise
    model_path = train(sup_dataset=dataset["train_sup"], load_model=model_path)
  File "/home/leo/.local/lib/python2.7/site-packages/sacred/config/captured_function.py", line 46, in captured_function
    result = wrapped(*args, **kwargs)
  File "Training.py", line 269, in train
    sup_batch = sup_batch_gen.get_batch()
  File "/home/leo/Wave-U-Net/Input/batchgenerators.py", line 123, in get_batch
    self.cache.update_cache_from_queue()
  File "/home/leo/Wave-U-Net/Input/multistreamcache.py", line 84, in update_cache_from_queue
    self.update_next_cache_item(self.communication_queue.get())
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 117, in get
    res = self._recv()
KeyboardInterrupt
^C

The full stack trace can be found in this [stack trace gist](https://gist.github.com/leoybkim/789d367a0ee2c63db8a513613270b017).

It seems like something failed while fetching the next batch from the queue. I also found your TODO comment about empty queues on the update_cache_from_queue() function in multistreamcache.py. I wonder if it's related; a hedged sketch of one way to surface the failure is below.
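
For what it's worth, the hang sits in a blocking Queue.get() with no timeout. A hedged sketch of the general fix, turning a silent hang into a diagnosable error (Python 2, matching the traceback; communication_queue is the object named in the trace):

from Queue import Empty  # stdlib 'queue' on Python 3

def get_with_timeout(communication_queue, timeout=60):
    # multiprocessing.Queue.get raises Empty if nothing arrives in time,
    # instead of blocking forever when all workers have died.
    try:
        return communication_queue.get(timeout=timeout)
    except Empty:
        raise RuntimeError("No batch arrived within %ds - did the workers die?" % timeout)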

Data Augmentation

Hi, I am following this paper for performing data augmentation on MUSDB.
I am using librosa's time_stretch and pitch_shift on each sample of the MUSDB dataset.
I then use stempeg to build a new stem file.
Unfortunately, the Wave-U-Net preprocessing shows these statistics, which do not seem good enough for re-training the network properly:

stems_augmented/train/ANiMAL - Clinic A_stretched_1.0.stem_bass.wav
stems_augmented/train/ANiMAL - Clinic A_stretched_1.0.stem_drums.wav
stems_augmented/train/ANiMAL - Clinic A_stretched_1.0.stem_other.wav
stems_augmented/train/ANiMAL - Clinic A_stretched_1.0.stem_vocals.wav
Maximum absolute deviation from source additivity constraint: 1.015533447265625
Mean absolute deviation from source additivity constraint:    0.09679516069867423

On the musdb website it is also stated that:

Since the mixture is separately encoded as AAC, there is a small difference between the sum of all sources and the mixture. This difference has no impact on the bsseval evaluation performance.

Some of my code:

import os

import librosa
import numpy as np
import stempeg

SR = 44100
R = 0.1

def timeStretch(y, rate):
    # y has shape (samples, channels); stretch each channel separately
    y_right = y[:, 0]
    y_left = y[:, 1]

    y_stretched_R = librosa.effects.time_stretch(y_right, rate=rate)
    y_stretched_L = librosa.effects.time_stretch(y_left, rate=rate)

    y_stretched = np.array([y_stretched_R, y_stretched_L])  # (channels, samples)

    return y_stretched


# open stem and retrieve all channels
stem_path = os.path.join(ORIGINAL_STEMS_DIR, f)
info = stempeg.Info(stem_path)
S, _ = stempeg.read_stems(stem_path, info=info)

stretched_list = []
process_list = [S[0], S[1], S[2], S[3], S[4]]
for audio_to_process in process_list:
    y_stretched = timeStretch(audio_to_process, rate=R)
    stretched_list.append(y_stretched)

# create and save stem
S = np.array(stretched_list)
S = np.swapaxes(S, 1, 2)  # n x samples x channels
stempeg.write_stems(S, output_mp4, rate=SR)

Do you have any idea on what could be the problem here?
Thanks a lot!

Pre-trained model full_44KHz works fine, but the pre-trained model baseline_stereo does not work

C:\Users\Administrator\Anaconda3\envs\AudioProcess\python.exe

"C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.4\helpers\pydev\pydevd.py" --multiproc --qt-support=auto --client 127.0.0.1 --port 58567 --file D:/Workspace/MusicVoiceSeparation/OpenSourcs/Wave-U-Net-master/Predict.py
pydev debugger: process 133692 is connecting

Connected to pydev debugger (build 182.4505.26)
WARNING - Waveunet Prediction - No observers have been added to this run
INFO - Waveunet Prediction - Running command 'main'
INFO - Waveunet Prediction - Started
Backend Qt5Agg is interactive backend. Turning interactive mode on.
Producing source estimates for input mixture file audio_examples\testset\Jam - 七月上.mp3
Testing...
2018-12-27 10:15:18.122639: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
Num of variables54
INFO:tensorflow:Restoring parameters from checkpoints\baseline_stereo\baseline_stereo-186093
INFO - tensorflow - Restoring parameters from checkpoints\baseline_stereo\baseline_stereo-186093
2018-12-27 10:15:19.049623: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key separator/conv1d_26/bias not found in checkpoint
ERROR - Waveunet Prediction - Failed after 0:01:41!
Traceback (most recent calls WITHOUT Sacred internals):
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\client\session.py", line 1292, in _do_call
return fn(*args)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\client\session.py", line 1277, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\client\session.py", line 1367, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.NotFoundError: Key separator/conv1d_26/bias not found in checkpoint
[[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent calls WITHOUT Sacred internals):
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\training\saver.py", line 1538, in restore
{self.saver_def.filename_tensor_name: save_path})
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\client\session.py", line 887, in run
run_metadata_ptr)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\client\session.py", line 1110, in _run
feed_dict_tensor, options, run_metadata)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\client\session.py", line 1286, in _do_run
run_metadata)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\client\session.py", line 1308, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key separator/conv1d_26/bias not found in checkpoint
[[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.4\helpers\pydev\pydevd.py", line 1664, in
main()
File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.4\helpers\pydev\pydevd.py", line 1658, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.4\helpers\pydev\pydevd.py", line 1068, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.4\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "D:/Workspace/MusicVoiceSeparation/OpenSourcs/Wave-U-Net-master/Predict.py", line 14, in
@ex.automain
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\sacred\experiment.py", line 137, in automain
self.run_commandline()
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\sacred\experiment.py", line 260, in run_commandline
return self.run(cmd_name, config_updates, named_configs, {}, args)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\sacred\experiment.py", line 209, in run
run()
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\sacred\run.py", line 221, in call
self.result = self.main_function(*args)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\sacred\config\captured_function.py", line 46, in captured_function
result = wrapped(*args, **kwargs)
File "D:/Workspace/MusicVoiceSeparation/OpenSourcs/Wave-U-Net-master/Predict.py", line 17, in main
Evaluate.produce_source_estimates(model_config, model_path, input_path, output_path)
File "D:\Workspace\MusicVoiceSeparation\OpenSourcs\Wave-U-Net-master\Evaluate.py", line 193, in produce_source_estimates
sources_pred = predict(track, model_config, load_model) # Input track to prediction function, get source estimates
File "D:\Workspace\MusicVoiceSeparation\OpenSourcs\Wave-U-Net-master\Evaluate.py", line 65, in predict
restorer = tf.train.Saver(None, write_version=tf.train.SaverDef.V2)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\training\saver.py", line 1094, in init
self.build()
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\training\saver.py", line 1106, in build
self._build(self._filename, build_save=True, build_restore=True)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\training\saver.py", line 1143, in _build
build_save=build_save, build_restore=build_restore)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\training\saver.py", line 787, in _build_internal
restore_sequentially, reshape)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\training\saver.py", line 406, in _AddRestoreOps
restore_sequentially)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\training\saver.py", line 854, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\ops\gen_io_ops.py", line 1549, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\framework\ops.py", line 3272, in create_op
op_def=op_def)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\framework\ops.py", line 1768, in init
self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Key separator/conv1d_26/bias not found in checkpoint
[[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent calls WITHOUT Sacred internals):
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\training\saver.py", line 1548, in restore
names_to_keys = object_graph_key_mapping(save_path)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\training\saver.py", line 1822, in object_graph_key_mapping
checkpointable.OBJECT_GRAPH_PROTO_KEY)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 359, in get_tensor
status)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 526, in exit
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: Key _CHECKPOINTABLE_OBJECT_GRAPH not found in checkpoint

During handling of the above exception, another exception occurred:

Traceback (most recent calls WITHOUT Sacred internals):
File "D:/Workspace/MusicVoiceSeparation/OpenSourcs/Wave-U-Net-master/Predict.py", line 17, in main
Evaluate.produce_source_estimates(model_config, model_path, input_path, output_path)
File "D:\Workspace\MusicVoiceSeparation\OpenSourcs\Wave-U-Net-master\Evaluate.py", line 193, in produce_source_estimates
sources_pred = predict(track, model_config, load_model) # Input track to prediction function, get source estimates
File "D:\Workspace\MusicVoiceSeparation\OpenSourcs\Wave-U-Net-master\Evaluate.py", line 67, in predict
restorer.restore(sess, load_model)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\training\saver.py", line 1554, in restore
err, "a Variable name or other graph key that is missing")
tensorflow.python.framework.errors_impl.NotFoundError: Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key separator/conv1d_26/bias not found in checkpoint
[[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Caused by op 'save/RestoreV2', defined at:
File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.4\helpers\pydev\pydevd.py", line 1664, in
main()
File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.4\helpers\pydev\pydevd.py", line 1658, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.4\helpers\pydev\pydevd.py", line 1068, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.2.4\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "D:/Workspace/MusicVoiceSeparation/OpenSourcs/Wave-U-Net-master/Predict.py", line 14, in
@ex.automain
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\sacred\experiment.py", line 137, in automain
self.run_commandline()
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\sacred\experiment.py", line 260, in run_commandline
return self.run(cmd_name, config_updates, named_configs, {}, args)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\sacred\experiment.py", line 209, in run
run()
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\sacred\run.py", line 221, in call
self.result = self.main_function(*args)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\sacred\config\captured_function.py", line 46, in captured_function
result = wrapped(*args, **kwargs)
File "D:/Workspace/MusicVoiceSeparation/OpenSourcs/Wave-U-Net-master/Predict.py", line 17, in main
Evaluate.produce_source_estimates(model_config, model_path, input_path, output_path)
File "D:\Workspace\MusicVoiceSeparation\OpenSourcs\Wave-U-Net-master\Evaluate.py", line 193, in produce_source_estimates
sources_pred = predict(track, model_config, load_model) # Input track to prediction function, get source estimates
File "D:\Workspace\MusicVoiceSeparation\OpenSourcs\Wave-U-Net-master\Evaluate.py", line 65, in predict
restorer = tf.train.Saver(None, write_version=tf.train.SaverDef.V2)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\training\saver.py", line 1094, in init
self.build()
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\training\saver.py", line 1106, in build
self._build(self._filename, build_save=True, build_restore=True)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\training\saver.py", line 1143, in _build
build_save=build_save, build_restore=build_restore)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\training\saver.py", line 787, in _build_internal
restore_sequentially, reshape)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\training\saver.py", line 406, in _AddRestoreOps
restore_sequentially)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\training\saver.py", line 854, in bulk_restore
return io_ops.restore_v2(filename_tensor, names, slices, dtypes)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\ops\gen_io_ops.py", line 1549, in restore_v2
shape_and_slices=shape_and_slices, dtypes=dtypes, name=name)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\util\deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\framework\ops.py", line 3272, in create_op
op_def=op_def)
File "C:\Users\Administrator\Anaconda3\envs\AudioProcess\lib\site-packages\tensorflow\python\framework\ops.py", line 1768, in init
self._traceback = tf_stack.extract_stack()

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key separator/conv1d_26/bias not found in checkpoint
[[{{node save/RestoreV2}} = RestoreV2[dtypes=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]

Test it on random song

Hello !

I would like to ask how to test it on a random song (test.wav).

Thank you in advance
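
For reference, other issues in this thread run prediction on an arbitrary file with a command of the form python Predict.py with cfg.full_44KHz input_path="test.wav" output_path="out/" (the paths here are placeholders); see the CPU/GPU timing issue further down for a full transcript.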

scikits.audiolab: failed

I get this error while installing the required scikits.audiolab:

Command "python setup.py egg_info" failed with error code 1 in /private/tmp/pip-install-BXHQz4/scikits.audiolab/

This is because scikits.audiolab needs libsndfile, so it must be installed first:

brew install libsndfile

Requirements for CPU on macOS

These are the right requirements on macOS (hence no NVIDIA, no GPU):

sacred==0.7.3
#tensorflow-gpu==1.8.0
tensorflow == 1.8
librosa==0.6.2
scikit-image==0.13.1
soundfile==0.10.2
scikits.audiolab==0.11.0
lxml==4.2.1
musdb==0.2.3
museval==0.2.0
google
protobuf

Note that both google and protobuf are needed - according to tensorflow/tensorflow#6341

Thank you.

Questions - About the paper results

Hi, I'm trying to train the M6 model with the MUSDB dataset and have the following two questions.
1) How much GPU memory is needed to train this model? I have to set the batch size to 1, otherwise the GPU reports an out-of-memory error at the beginning of training.

2) When Training.py finishes, the folder where the evaluation results are saved contains 151 JSON files (one per song, plus one test-test.json file), as well as four separated audio sources for each song. How do I produce Table 3 (test performance metrics for the multi-instrument model) in the paper?
I think the compute_mean_metrics(json_folder, compute_averages=True, metric="SDR") function computes all the numbers shown in the paper (Mean, SD, Median, MAD), but I don't know how to use it. I only found that the plotting module plot.py calls this function, so how do I use it in the evaluation process?
I am a deep learning beginner and hope to get your answer.
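
A hedged usage sketch: per the function listing in the "Evaluation metric of models" issue further down, compute_mean_metrics returns one (median, MAD, mean, SD) tuple per source when compute_averages=True (the JSON folder path below is a placeholder):

from Evaluate import compute_mean_metrics

stats = compute_mean_metrics("path/to/eval_json_folder", compute_averages=True)
for i, (med, mad, mean, sd) in enumerate(stats):
    print("Source %d: median=%.2f MAD=%.2f mean=%.2f SD=%.2f" % (i, med, mad, mean, sd))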

Error during evaluation: `pad_width` must be of integral type.

Great project! During model evaluation, I am facing the following issue:

Pre-trained model restored for song prediction
ERROR - Waveunet Prediction - Failed after 0:00:11!
Traceback (most recent calls WITHOUT Sacred internals):
  File "/home/martin/Wave-U-Net/Predict.py", line 17, in main
    Evaluate.produce_source_estimates(model_config, model_path, input_path, output_path)
  File "/home/martin/Wave-U-Net/Evaluate.py", line 193, in produce_source_estimates
    sources_pred = predict(track, model_config, load_model) # Input track to prediction function, get source estimates
  File "/home/martin/Wave-U-Net/Evaluate.py", line 71, in predict
    separator_preds = predict_track(model_config, sess, mix_audio, orig_sr, sep_input_shape, sep_output_shape, separator_sources, mix_context)
  File "/home/martin/Wave-U-Net/Evaluate.py", line 138, in predict_track
    mix_audio_padded = np.pad(mix_audio, [(pad_time_frames, pad_time_frames), (0,0)], mode="constant", constant_values=0.0)
  File "/home/martin/anaconda3/envs/tf-gpu/lib/python3.6/site-packages/numpy/lib/arraypad.py", line 1197, in pad
    raise TypeError('`pad_width` must be of integral type.')
TypeError: `pad_width` must be of integral type.

Any thoughts on what might be the cause?

Thanks!
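
The message points at a float pad width: NumPy 1.12+ rejects non-integer values in np.pad. A minimal hedged sketch of the likely fix, using the names from the traceback above (the real code in Evaluate.predict_track may differ):

import numpy as np

def pad_mix(mix_audio, pad_time_frames):
    # Cast the computed padding to int before np.pad; a float here is what
    # raises "`pad_width` must be of integral type."
    pad = int(pad_time_frames)
    return np.pad(mix_audio, [(pad, pad), (0, 0)], mode="constant", constant_values=0.0)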

Random seed

Hello, I'm trying to find the validation set you used.

To split data into train/validation, you said you fixed the random seed.

# Pick 25 random songs for validation from MUSDB train set (this is always the same selection each time since we fix the random seed!)

I found that you defined the random seed in the config:

seed=1337

but I couldn't find where this value is used.

Where is the random seed fixed?
Or can you tell me the indices of the validation split you used?
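
For context, a hedged sketch of what a reproducible 25-song split looks like when the seed is applied before sampling (MUSDB's train set has 100 tracks; whether the repo actually seeds its split this way is exactly the open question here):

import random

random.seed(1337)  # the config value quoted above
val_indices = sorted(random.sample(range(100), 25))  # 25 of 100 MUSDB train songs
print(val_indices)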

Python 3 Support

Hi,

I am planning on using this code in a personal Python 3 project. Are the maintainers of this repository interested in a Python 3 rewrite? Does the current implementation support Python 3? (I have not dug into the code deeply, so I am uncertain if this package is compatible with Python 3.) If work is needed to support Python 3, I'd be happy to contribute.

Evaluation metric of models

To evaluate model performance, did you average segment-wise SDRs over the whole dataset using the function below?

https://github.com/f90/Wave-U-Net/blob/master/Evaluate.py#L207

import glob
import json
import os

import numpy as np


def compute_mean_metrics(json_folder, compute_averages=True):
    files = glob.glob(os.path.join(json_folder, "*.json"))
    sdr_inst_list = None
    for path in files:
        #print(path)
        with open(path, "r") as f:
            js = json.load(f)

        if sdr_inst_list is None:
            sdr_inst_list = [list() for _ in range(len(js["targets"]))]

        for i in range(len(js["targets"])):
            sdr_inst_list[i].extend([np.float(f['metrics']["SDR"]) for f in js["targets"][i]["frames"]])

    #return np.array(sdr_acc), np.array(sdr_voc)
    sdr_inst_list = [np.array(sdr) for sdr in sdr_inst_list]

    if compute_averages:
        return [(np.nanmedian(sdr), np.nanmedian(np.abs(sdr - np.nanmedian(sdr))), np.nanmean(sdr), np.nanstd(sdr)) for sdr in sdr_inst_list]
    else:
        return sdr_inst_list

TypeError: __init__() got an unexpected keyword argument 'serialized_options'

I finally was able to install all required packages, but after that:

Traceback (most recent call last):
  File "Predict.py", line 3, in <module>
    import Evaluate
  File "/Users/loretoparisi/Documents/Projects/AI/Wave-U-Net/Evaluate.py", line 2, in <module>
    import tensorflow as tf
  File "/Library/Python/2.7/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/Library/Python/2.7/site-packages/tensorflow/python/__init__.py", line 59, in <module>
    from tensorflow.core.framework.graph_pb2 import *
  File "/Library/Python/2.7/site-packages/tensorflow/core/framework/graph_pb2.py", line 15, in <module>
    from tensorflow.core.framework import node_def_pb2 as tensorflow_dot_core_dot_framework_dot_node__def__pb2
  File "/Library/Python/2.7/site-packages/tensorflow/core/framework/node_def_pb2.py", line 15, in <module>
    from tensorflow.core.framework import attr_value_pb2 as tensorflow_dot_core_dot_framework_dot_attr__value__pb2
  File "/Library/Python/2.7/site-packages/tensorflow/core/framework/attr_value_pb2.py", line 15, in <module>
    from tensorflow.core.framework import tensor_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__pb2
  File "/Library/Python/2.7/site-packages/tensorflow/core/framework/tensor_pb2.py", line 15, in <module>
    from tensorflow.core.framework import resource_handle_pb2 as tensorflow_dot_core_dot_framework_dot_resource__handle__pb2
  File "/Library/Python/2.7/site-packages/tensorflow/core/framework/resource_handle_pb2.py", line 22, in <module>
    serialized_pb=_b('\n/tensorflow/core/framework/resource_handle.proto\x12\ntensorflow\"r\n\x13ResourceHandleProto\x12\x0e\n\x06\x64\x65vice\x18\x01 \x01(\t\x12\x11\n\tcontainer\x18\x02 \x01(\t\x12\x0c\n\x04name\x18\x03 \x01(\t\x12\x11\n\thash_code\x18\x04 \x01(\x04\x12\x17\n\x0fmaybe_type_name\x18\x05 \x01(\tBn\n\x18org.tensorflow.frameworkB\x0eResourceHandleP\x01Z=github.com/tensorflow/tensorflow/tensorflow/go/core/framework\xf8\x01\x01\x62\x06proto3')
TypeError: __init__() got an unexpected keyword argument 'serialized_options'

I have linked the pre-trained models in the checkpoints folder:

(.env) ip-192-168-22-127:checkpoints loretoparisi$ tree -L 1
.
├── README.md
├── baseline_stereo -> /Users/loretoparisi/wavenet/models/baseline_stereo
└── full_multi_instrument -> /Users/loretoparisi/wavenet/models/full_multi_instrument

2 directories, 1 file
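
In case it helps: this TypeError usually means the installed protobuf package is older than the one TensorFlow's generated *_pb2.py files were built against (serialized_options arrived in protobuf 3.6); upgrading protobuf (pip install -U protobuf) is the commonly reported fix, though I can't confirm that's the cause in this environment.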

FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated

It's just a deprecation warning, but it would be good to fix it:

/Library/Python/2.7/site-packages/scipy/signal/signaltools.py:2383: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
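
The warning's own suggestion, as a minimal self-contained example:

import numpy as np

arr = np.zeros((4, 4))
seq = [slice(0, 2), slice(0, 2)]
sub = arr[tuple(seq)]  # the future-proof form; arr[seq] is the deprecated one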

Pretrained models NotFoundError

Running the command
python2 Predict.py with cfg.full_multi_instrument
results in:
NotFoundError: Key separator/conv1d_26/bias not found in checkpoint
However, the command python2 Predict.py with cfg.full_44KHz terminates with no errors.

How much is the GPU usage?

Hello.
I want to know how much GPU is used while running your code.
The GPU doesn't seem to be used on my PC during training...
Please tell me about it.

Question - About audio input resampling

Hello, according to the util.load method

def load(path, sr=22050, mono=True, offset=0.0, duration=None, dtype=np.float32):

you are loading the audio file through librosa.load but not doing any resampling there, while by default resampling to sr=22050 with scipy's resample_poly method.
I'm using produce_source_estimates where the audio is loaded like

audio, sr = Utils.load(input_path, sr=None, mono=False)

so it does not resample by default. Any reason for that? In my CNN I resample the input to 12 kHz, so can I apply that resampling here?
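
A hedged sketch of doing that 12 kHz resampling explicitly before feeding the network, assuming librosa 0.6.x as pinned in this repo's requirements (the input path is a placeholder; librosa.resample accepts the (channels, samples) array that mono=False returns):

import librosa

audio, sr = librosa.load("input.wav", sr=None, mono=False)  # keep the native rate
audio_12k = librosa.resample(audio, sr, 12000)              # explicit downsample to 12 kHz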

Training.py not running on GPU

Hi @f90 ,
I have a GPU in my Paperspace machine. I am trying to run Training.py using the following command: python Training.py with cfg.full_44KHz. But the script runs only on the CPU, not the GPU.
I have just forked this repo and am executing the Training.py script.
How can I make sure that it runs on the GPU?
I am a PyTorch guy trying to refactor the codebase for PyTorch. Please help.
Thank you
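
A quick hedged check that this TF 1.x setup can see a GPU at all; if the device list shows only a CPU, it's a driver/install problem rather than anything in Training.py:

import tensorflow as tf
from tensorflow.python.client import device_lib

print(tf.test.is_gpu_available())       # True only if a usable GPU is registered
print(device_lib.list_local_devices())  # should include /device:GPU:0 entries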

How to adapt for real-time / streaming?

Suppose I'm reading audio from a live music performance. Is it possible to do real-time vocal extraction using Wave-U-Net? How small should the chunks of audio be, and could you give some pointers on how to modify the code to process one chunk at a time? Many thanks for this wonderful project.
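
Not an answer from the author, but a hedged sketch of the general block-wise pattern, mirroring what Evaluate.predict_track does offline with input context (all names here are hypothetical; model_fn stands for a session run of the network):

import numpy as np

def stream_separate(chunks, model_fn, context, out_len):
    """chunks: iterable of 1-D float32 arrays from the live input.
    model_fn: maps an input window of out_len + 2*context samples to the
    out_len separated samples at its centre (valid convolutions)."""
    buf = np.zeros(context, dtype=np.float32)  # zero left-context, as the offline padding does
    for chunk in chunks:
        buf = np.concatenate([buf, chunk.astype(np.float32)])
        while len(buf) >= out_len + 2 * context:
            window = buf[:out_len + 2 * context]
            yield model_fn(window)   # out_len output samples
            buf = buf[out_len:]      # retain the tail as the next left-context

Latency is then at least out_len + context samples, so smaller output windows trade throughput for responsiveness.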

Not sure if this is even possible

Hello again. I noticed you added the multi-instrument dataset and I'm very interested in this. However, I cannot get this running on Windows, which is my primary OS. I think it might be impossible: I have every dependency installed on my Python 3.5 installation aside from scikits.audiolab, and as far as I can tell, that one isn't available for Python 3.x. Am I hopeless? I don't have an Ubuntu computer handy (I can't dual boot, since Windows overwrites the GRUB bootloader every time I boot back into Windows).

KeyError: 'brand'

(yinpin) c:\download\Wave-U-Net-master>python Predict.py with cfg.full_multi_instrument model_path="checkpoints/full_multi_instrument/full_multi_instrument-134067" input_path="/mnt/medien/Daniel/Music/Dark Passion Play/Nightwish - Bye Bye Beautiful.mp3" output_path="/home/daniel"
Training multi-instrument separation with best model
Traceback (most recent call last):
File "Predict.py", line 14, in
@ex.automain
File "C:\Users\Administrator\anaconda3\envs\yinpin\lib\site-packages\sacred\experiment.py", line 137, in automain
self.run_commandline()
File "C:\Users\Administrator\anaconda3\envs\yinpin\lib\site-packages\sacred\experiment.py", line 260, in run_commandline
return self.run(cmd_name, config_updates, named_configs, {}, args)
File "C:\Users\Administrator\anaconda3\envs\yinpin\lib\site-packages\sacred\experiment.py", line 208, in run
meta_info, options)
File "C:\Users\Administrator\anaconda3\envs\yinpin\lib\site-packages\sacred\experiment.py", line 433, in _create_run
None))
File "C:\Users\Administrator\anaconda3\envs\yinpin\lib\site-packages\sacred\initialize.py", line 394, in create_run
host_info = get_host_info()
File "C:\Users\Administrator\anaconda3\envs\yinpin\lib\site-packages\sacred\host_info.py", line 40, in get_host_info
host_info[k] = v()
File "C:\Users\Administrator\anaconda3\envs\yinpin\lib\site-packages\sacred\host_info.py", line 95, in _cpu
return _get_cpu_by_pycpuinfo()
File "C:\Users\Administrator\anaconda3\envs\yinpin\lib\site-packages\sacred\host_info.py", line 159, in _get_cpu_by_pycpuinfo
return cpuinfo.get_cpu_info()['brand']
KeyError: 'brand'
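
For reference, this is usually a sacred/py-cpuinfo version clash: py-cpuinfo 6.x renamed the 'brand' key to 'brand_raw', which older sacred releases still look up in host_info.py. Pinning py-cpuinfo (e.g. pip install py-cpuinfo==5.0.0) or upgrading sacred are the commonly reported workarounds.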

Further optimization of the model with the help of "Neural Structured Learning"?

Recently TensorFlow also added support for NSL (Neural Structured Learning). This is supposed to be a great help for training models better.

So I just wanted to ask whether this might be relevant for this project as well. From your point of view, would it be possible to improve the results this way, or is this completely the wrong approach?

I thought I'd ask the author directly, because that's easier than puzzling over it myself for a long time.

I would be very grateful for a short and concise answer!

libcublas.so.9.0: cannot open shared object file: No such file or directory

I'm not sure what's happening. Here's the log:

Traceback (most recent call last):
File "Predict.py", line 3, in <module>
import Evaluate
File "/home/user/Wave-U-Net-master/Evaluate.py", line 2, in <module>
import tensorflow as tf
File "/usr/local/lib/python2.7/dist-packages/tensorflow/__init__.py", line 24, in <module>
from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
from tensorflow.python import pywrap_tensorflow
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
_pywrap_tensorflow_internal = swig_import_helper()
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_problems

for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.
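
For context: tensorflow-gpu 1.8.0 was built against CUDA 9.0 and cuDNN 7, so this import error generally means CUDA 9.0's libraries are missing or not on LD_LIBRARY_PATH; installing CUDA 9.0 (or switching to the CPU-only tensorflow package) is the usual remedy.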

After epoch, <defunct> process remains

EPOCH: 1
Starting worker
2018-10-21 14:50:57.740707: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-10-21 14:50:57.740741: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-10-21 14:50:57.740747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-10-21 14:50:57.740751: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-10-21 14:50:57.740864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10417 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:04:00.0, compute capability: 6.1)
terminate called after throwing an instance of 'std::system_error'
what(): Invalid argument
terminate called after throwing an instance of 'std::system_error'
what(): Invalid argument
terminate called after throwing an instance of 'std::system_error'
what(): Invalid argument
terminate called after throwing an instance of 'std::system_error'
what(): Invalid argument
terminate called after throwing an instance of 'std::system_error'
what(): Invalid argument

num_workers was 6, and when I checked the processes there were 5 defunct children.
Is it a stop_worker problem?
The server is Ubuntu 16.04.5 LTS (GNU/Linux 4.15.0-36-generic x86_64).

How to run the codes in multi GPUs?

@f90 Thank you for your previous work; what you did saved me a lot of time. Now I want to run the code on multiple GPUs, but I found that it doesn't work directly. So I decided to modify the code, only to find that TensorFlow's multi-GPU support is not great. Since you wrote this code, I wonder if you could change it to run on multiple GPUs? Thanks again.

MacOSX additional installation steps

I was getting this error on MacOSX:
Python is not installed as a framework. The Mac OS X backend will not be able to function correctly if Python is not installed as a framework.

and then also a missing library:
ImportError: No module named wx

I fixed using the following steps:

export MPLBACKEND="WXAgg"
pip install tensorflow==1.8.0
pip install -U wxPython
python Predict.py with cfg.full_44KHz

Index out of range errors when using ccmixter

I had some trouble getting this to run with the ccmixter tracks included, using python Training.py with full. It appears that the ccmixter tracks are tuples of length 3, while the musdb tracks are of length 6.

The following line doesn't take that into account and returns index-out-of-range errors. I added a conditional to check the length of the tuple (a sketch of what I mean follows the link below). Not sure if I did something wrong or not, but I'm just throwing this out there in case anyone else runs into it:

https://github.com/f90/Wave-U-Net/blob/master/Training.py#L362
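
A hedged sketch of the length check described above (the exact unpacking at that line in Training.py is assumed; the 6-tuple layout follows the Datasets.py comment quoted earlier in this thread):

def split_track(track):
    # MUSDB tuples have 6 entries, CCMixter tuples only 3, so branch on
    # the length before unpacking.
    if len(track) == 6:   # (mix, acc, bass, drums, other, vocal)
        mix, acc, vocal = track[0], track[1], track[5]
    else:                 # assumed CCMixter layout: (mix, acc, vocal)
        mix, acc, vocal = track
    return mix, acc, vocal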

Question - About Prediction time over CPU and GPU

I'm doing some tests of prediction time (Predict.py) in CPU and GPU environments.
I'm using an audio file (mp3, 44100 Hz, stereo, fltp, 192 kb/s) of duration 00:03:15.29:

$ ffprobe /audio/12380187.mp3
ffprobe version 4.0 Copyright (c) 2007-2018 the FFmpeg developers
  built with Apple LLVM version 9.1.0 (clang-902.0.39.1)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/4.0 --enable-shared --enable-pthreads --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-gpl --enable-libmp3lame --enable-libx264 --enable-libxvid --enable-opencl --enable-videotoolbox --disable-lzma
  libavutil      56. 14.100 / 56. 14.100
  libavcodec     58. 18.100 / 58. 18.100
  libavformat    58. 12.100 / 58. 12.100
  libavdevice    58.  3.100 / 58.  3.100
  libavfilter     7. 16.100 /  7. 16.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  1.100 /  5.  1.100
  libswresample   3.  1.100 /  3.  1.100
  libpostproc    55.  1.100 / 55.  1.100
Input #0, mp3, from '/audio/12380187.mp3':
  Metadata:
    encoder         : Lavf56.40.101
  Duration: 00:03:15.29, start: 0.025057, bitrate: 192 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 192 kb/s
    Metadata:
      encoder         : Lavc56.60

On an Intel i7 12-core CPU, the prediction log says Completed after 0:03:19:

$ time python Predict.py with cfg.full_44KHz input_path=/audio/12380187.mp3 output_path=/audio_sep/
Training full singing voice separation model, with difference output and input context (valid convolutions) and stereo input/output, and learned upsampling layer, and 44.1 KHz sampling rate
WARNING - Waveunet Prediction - No observers have been added to this run
INFO - Waveunet Prediction - Running command 'main'
INFO - Waveunet Prediction - Started
Producing source estimates for input mixture file /audio/12380187.mp3
Testing...
2018-11-20 14:54:05.306099: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Num of variables64
INFO:tensorflow:Restoring parameters from checkpoints/full_44KHz/full_44KHz-236118
INFO - tensorflow - Restoring parameters from checkpoints/full_44KHz/full_44KHz-236118
Pre-trained model restored for song prediction
INFO - Waveunet Prediction - Completed after 0:03:19

real	3m26.034s
user	13m30.420s
sys	4m40.200s

while on an Intel Xeon 12-core with 2x NVIDIA GeForce GTX 1080 it says Completed after 0:00:16:

$ time python Predict.py with cfg.full_44KHz input_path=/audio/12380187.mp3
/usr/local/lib/python2.7/dist-packages/scikits/audiolab/soundio/play.py:48: UserWarning: Could not import alsa backend; most probably, you did not have alsa headers when building audiolab
  warnings.warn("Could not import alsa backend; most probably, "
Training full singing voice separation model, with difference output and input context (valid convolutions) and stereo input/output, and learned upsampling layer, and 44.1 KHz sampling rate
WARNING - Waveunet Prediction - No observers have been added to this run
INFO - Waveunet Prediction - Running command 'main'
INFO - Waveunet Prediction - Started
Producing source estimates for input mixture file /audio/12380187.mp3
Testing...
2018-11-21 12:34:13.829481: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-11-21 12:34:13.830157: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.8475
pciBusID: 0000:01:00.0
totalMemory: 7.92GiB freeMemory: 7.46GiB
2018-11-21 12:34:13.961794: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2018-11-21 12:34:13.962562: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 1 with properties: 
name: GeForce GTX 1080 major: 6 minor: 1 memoryClockRate(GHz): 1.8475
pciBusID: 0000:02:00.0
totalMemory: 7.93GiB freeMemory: 7.81GiB
2018-11-21 12:34:13.963292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0, 1
2018-11-21 12:34:14.531254: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-11-21 12:34:14.531305: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 1 
2018-11-21 12:34:14.531329: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N Y 
2018-11-21 12:34:14.531336: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 1:   Y N 
2018-11-21 12:34:14.531830: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7209 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-11-21 12:34:14.589915: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 7543 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080, pci bus id: 0000:02:00.0, compute capability: 6.1)
Num of variables64
INFO:tensorflow:Restoring parameters from checkpoints/full_44KHz/full_44KHz-236118
INFO - tensorflow - Restoring parameters from checkpoints/full_44KHz/full_44KHz-236118
Pre-trained model restored for song prediction
INFO - Waveunet Prediction - Completed after 0:00:16

real	0m18.340s
user	0m15.972s
sys	0m5.528s

I'm not sure from the logging whether TensorFlow is using both GPU devices or only gpu:0. If I'm not wrong, most of the work is done in Models.py, here: https://github.com/f90/Wave-U-Net/blob/master/Models/UnetSpectrogramSeparator.py#L39, where the computation graph is built. I assume these operations go to gpu:0 in this configuration, so gpu:1 will not be used, but I'm not sure about that.

Thank you very much!

Possible to train with number of sources K=1?

Hello there,

I have to admit that you are doing a very, very great job on this audio source separation task. After reading through your paper and trying your code, I would say this is a revelation among audio source separation methods, especially for music and vocal separation. How did you come up with the idea of using the entire time-domain input for training (if I am not misunderstanding your method)? It gives smoother and more natural audio outputs compared to spectrum/STFT-based methods. The sound quality of your predictions (a high demand in the music industry) is incredibly high, and the accuracy is decent considering you are only using a very small dataset for training. Excellent job!!

So back to my question. I am really interested in your model and would expect huge improvements with larger training datasets. However, as indicated in your paper, you use K sources for the prediction output, so K=2 for vocal separation (if I am not wrong), which requires both accompaniment and clean vocal data for training. Accompaniment music is easy to find but clean vocals are not, which makes it harder to train on a larger dataset. So, say I am only interested in predicting the accompaniment: can your model predict a K=1 output, exactly the accompaniment prediction I want, so that I can train on a large accompaniment-only dataset?

Thanks very much!

Matplotlib Version

When on Ubuntu, using the current requirements with sudo -H pip install -r requirements.txt, the latest matplotlib (3.3) will be installed. But since we need Python 2.7, the requirements should explicitly specify a lower version:

matplotlib<3.0

so that 2.2.3 will be installed:

$ python -c "import matplotlib as mp; print mp.__version__"
2.2.3
