
chimeranet's Introduction

ChimeraNet

An implementation of the music separation model by Luo et al.

Getting started

Sample separation task with a pretrained model
  1. Prepare .wav files to separate.

  2. Install the library: pip install git+https://github.com/leichtrhino/ChimeraNet

  3. Download the pretrained model.

  4. Download the sample script.

  5. Run the script:

python chimeranet-separate.py -i ${input_dir}/*.wav \
    -m model.hdf5 \
    --replace-top-directory ${output_dir}

Output in a nutshell

  • Output filenames follow the pattern ${input_file}_{embd,mask}_ch[12].wav.
  • embd and mask indicate that the channels were inferred from the deep-clustering head and the mask-inference head, respectively.
  • ch1 and ch2 are the voice and music channels, respectively.
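
A concrete example may help. For a hypothetical input songs/song.wav, the pattern above expands to the four files below (the output directory also depends on --replace-top-directory):

input_path = 'songs/song.wav'                # hypothetical input file
stem = input_path[:-len('.wav')]
expected = ['{}_{}_ch{}.wav'.format(stem, head, ch)
            for head in ('embd', 'mask')     # deep-clustering head vs. mask-inference head
            for ch in (1, 2)]                # ch1 = voice, ch2 = music
print(expected)
# ['songs/song_embd_ch1.wav', 'songs/song_embd_ch2.wav',
#  'songs/song_mask_ch1.wav', 'songs/song_mask_ch2.wav']
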
Train and separation examples

See the Example section of the ChimeraNet documentation.

Install

Requirements
  • keras
  • one of Keras' backends (e.g. TensorFlow, CNTK, or Theano)
  • sklearn
  • librosa
  • soundfile
Instructions
  1. Run pip install git+https://github.com/leichtrhino/ChimeraNet or use any other Python package installer. (Currently, ChimeraNet is not on PyPI.)
  2. Install a Keras backend if the environment does not have one. Install TensorFlow if unsure.


chimeranet's Issues

sample_rate argument

Hello again! I have tried my own wav files with the characteristics shown below by the soxi command. With these wav files, chimeranet-train.py threw an error when I set the --sr argument to either 44100 or 22050 (the error message: Error when checking input: expected input_1 to have shape (64, 259) but got array with shape (64, 188)), whereas the default sampling rate of 16,000 works fine for training. I am wondering whether a sampling rate higher than 16,000 is accepted for training (or whether a higher rate even makes sense in your system). Thanks.

Input File : 'XXX.wav'
Channels : 1
Sample Rate : 44100
Precision : 16-bit
Duration : 00:00:02.00 = 88200 samples = 150 CDDA sectors
File Size : 184k
Bit Rate : 734k
Sample Encoding: 16-bit Signed Integer PCM
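
For what it's worth, here is a minimal sketch of why the time dimension of the input changes with the sampling rate. The segment length, mel-band count and hop size are assumptions for illustration, not necessarily ChimeraNet's defaults:

import numpy as np
import librosa

segment_seconds = 1.5   # assumed analysis segment length
n_mels = 64             # assumed number of mel bands
hop_length = 128        # assumed STFT hop size

for sr in (16000, 22050, 44100):
    y = np.zeros(int(sr * segment_seconds))   # silent clip of fixed duration
    spec = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels,
                                          hop_length=hop_length)
    print(sr, spec.shape)                     # (n_mels, n_frames)

With a fixed hop length, a clip of fixed duration yields roughly 1 + sr * segment_seconds / hop_length frames, so raising the sampling rate changes the spectrogram shape the network sees unless the hop size (or the segment length in samples) is scaled along with it; this kind of dependency is the likely source of the "expected (64, 259) but got (64, 188)" mismatch.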

folder format

Hello!
It is easy to use and seems to work. Before going further with my own data, please advise on the data placement. I placed the .wav files as shown below. Is this the correct way to have melody1 and vocal1, etc. processed as pairs? Adding more to the README would be appreciated. Thanks!

root
|-- melody
|   |-- melody1.wav
|   `-- melody2.wav
`-- vocal
    |-- vocal1.wav
    `-- vocal2.wav

Total training samples

Hi,

I have a question about the generate_test_data function in data_generator.py.

How do you arrive at the number of steps (samples) = 7200 in your training script when using this function? The Keras generator interface requires knowing the number of steps per epoch.

How can I use this to calculate the number of steps for a different dataset?
Also, how can I calculate the same for a validation set?

Any help for the understanding would be appreciated.
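
A minimal, self-contained sketch of the bookkeeping Keras expects (the generator, model and sample counts below are stand-ins for illustration, not ChimeraNet's actual data_generator code): steps_per_epoch is the number of samples you intend to draw per epoch divided by the batch size, and validation_steps is the same calculation for the validation generator.

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

n_train_samples, n_val_samples, batch_size = 7200, 800, 32   # assumed sizes

def dummy_generator(n_samples):
    # Stand-in for the project's generator functions: yields (input, target) batches forever.
    while True:
        for _ in range(n_samples // batch_size):
            yield np.random.rand(batch_size, 10), np.random.rand(batch_size, 1)

model = Sequential([Dense(1, input_shape=(10,))])
model.compile(optimizer='adam', loss='mse')

model.fit_generator(
    dummy_generator(n_train_samples),
    steps_per_epoch=n_train_samples // batch_size,    # 7200 // 32 = 225
    validation_data=dummy_generator(n_val_samples),
    validation_steps=n_val_samples // batch_size,     # 800 // 32 = 25
    epochs=1,
)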

Speech Separation

Hi,

Did you try using this network to separate two speaker sources from a mixture of them?

Details about the pretrained net

Hi,
thanks for implementing the Chimera network! Could you explain what data the pretrained model was trained on? What are the sources it separates? What does the 120 in the name stand for? Do you have more pretrained models that you could share?

Thanks and best regards
Verena

Joining coding forces

Hi leichtrhino,

I'd like to ask whether you'd like to join the dev team of Asteroid. This would be really great, as we would be able to develop things faster and cover even more architectures and datasets to experiment with quickly. 🚀

Don't hesitate to join our Slack to discuss a potential collaboration! 😃

Mask-Inference layers

https://github.com/arity-r/ChimeraNet/blob/6341383c61f238a83a0be8c7d4972aac4e7d958a/chimeranet/model.py#L57-L69

Correct me if I am wrong, but one could simply replace this block of code with Dense and Reshape layers like this:

mask_linear = Dense(self.F*self.C, activation='softmax', name='mask_linear')(body_linear)
mask = Reshape((self.T, self.F, self.C), name='mask')(mask_linear)

I think the API can handle the gradient updates correctly because of the Reshape layer. It also does not require much memory, because the masks are not extracted into a list. But I wonder whether this would be a correct definition for the mask-inference head of the model.

The reference I use is the Chimera++ network from the paper "Alternative Objective Functions for Deep Clustering", which redefines the architecture for speaker separation.
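
Below is a self-contained sketch of the suggested Dense + Reshape head on top of a placeholder BLSTM body. T, F, C and the layer sizes are assumed for illustration and are not ChimeraNet's actual configuration; note that, as in the snippet above, the softmax is taken over all F*C outputs of the Dense layer at each time step.

from keras.models import Model
from keras.layers import Input, Bidirectional, LSTM, Dense, Reshape

T, F, C = 64, 64, 2   # time frames, frequency bins, sources (assumed values)

inp = Input(shape=(T, F), name='input')
body_linear = Bidirectional(LSTM(300, return_sequences=True), name='body')(inp)

# Dense acts on the last axis of the (batch, T, 600) tensor, giving (batch, T, F*C);
# Reshape then splits the last axis into (F, C) without adding parameters.
mask_linear = Dense(F * C, activation='softmax', name='mask_linear')(body_linear)
mask = Reshape((T, F, C), name='mask')(mask_linear)

model = Model(inputs=inp, outputs=mask)
model.summary()   # the 'mask' output should have shape (None, 64, 64, 2)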


Topological sorting of the Graph

When running the training, before the initial epoch I get a log like:

2019-08-02 08:32:55.251530: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-08-02 08:32:55.305719: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-08-02 08:32:56.161844: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-08-02 08:32:56.213568: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:666] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.

That said, I don't have any problem with the final model accuracy. I think this is related to the fact that there are multiple heads in the functional definition of the model. Do you see the same messages when you run the script? Which Keras/TensorFlow versions do you use?

The versions I am using are:

  • tensorflow-gpu: 1.12.0
  • Keras: 2.2.4
  • CUDA Version: 9.0.176
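
For reference, the "multiple heads" refers to a functional model with more than one output, along the lines of this minimal sketch (shapes, layer sizes and losses are illustrative stand-ins, not ChimeraNet's actual definition):

from keras.models import Model
from keras.layers import Input, Bidirectional, LSTM, Dense, Reshape

T, F, C, D = 64, 64, 2, 20   # assumed time frames, frequency bins, sources, embedding size

inp = Input(shape=(T, F))
body = Bidirectional(LSTM(300, return_sequences=True))(inp)

# Head 1: deep-clustering embeddings, shape (T, F, D).
embd = Reshape((T, F, D), name='embedding')(Dense(F * D, activation='tanh')(body))
# Head 2: time-frequency masks, shape (T, F, C).
mask = Reshape((T, F, C), name='mask')(Dense(F * C, activation='sigmoid')(body))

model = Model(inputs=inp, outputs=[embd, mask])
model.compile(optimizer='rmsprop', loss={'embedding': 'mse', 'mask': 'mse'})
model.summary()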
