
kapre's Introduction

Kapre

Keras Audio Preprocessors - compute STFT, ISTFT, Melspectrogram, and others on GPU, in real time.

Tested on Python 3.6 and 3.7

Why Kapre?

vs. Pre-computation

  • You can optimize DSP parameters
  • Your model deployment becomes simpler and more consistent.
  • Your code and model have fewer dependencies.

vs. Your own implementation

  • Quick and easy!
  • Consistent with 1D/2D tensorflow batch shapes
  • Data format agnostic (channels_first and channels_last)
  • Less error prone - Kapre layers are tested against Librosa (stft, decibel, etc) - which is (trust me) trickier than you think.
  • Kapre layers have some extended APIs from the default tf.signal implementation, such as:
    • A perfectly invertible STFT and InverseSTFT pair
    • Mel-spectrogram with more options
  • Reproducibility - Kapre is available on pip with versioning

Workflow with Kapre

  1. Preprocess your audio dataset. Resample the audio to the right sampling rate and store the audio signals (waveforms); a minimal sketch of this step follows the list.
  2. In your ML model, add Kapre layer e.g. kapre.time_frequency.STFT() as the first layer of the model.
  3. The data loader simply loads audio signals and feeds them into the model.
  4. In your hyperparameter search, include DSP parameters like n_fft to boost the performance.
  5. When deploying the final model, all you need to remember is the sampling rate of the signal. No dependency or preprocessing!
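A minimal sketch of step 1, assuming librosa and soundfile are installed; audio_paths is a hypothetical list of input files.

import librosa
import soundfile as sf

TARGET_SR = 44100  # the sampling rate the model will assume from now on

for path in audio_paths:  # audio_paths: hypothetical list of file paths
    y, _ = librosa.load(path, sr=TARGET_SR, mono=True)  # decode + resample
    sf.write(path.rsplit('.', 1)[0] + '.wav', y, TARGET_SR)  # store waveform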

Installation

pip install kapre

API Documentation

Please refer to Kapre API Documentation at https://kapre.readthedocs.io

One-shot example

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU, GlobalAveragePooling2D, Dense, Softmax
from kapre import STFT, Magnitude, MagnitudeToDecibel
from kapre.composed import get_melspectrogram_layer, get_log_frequency_spectrogram_layer

# 6 channels (!), maybe a 1-second audio signal, for example.
input_shape = (44100, 6)
sr = 44100
model = Sequential()
# A STFT layer
model.add(STFT(n_fft=2048, win_length=2018, hop_length=1024,
               window_name=None, pad_end=False,
               input_data_format='channels_last', output_data_format='channels_last',
               input_shape=input_shape))
model.add(Magnitude())
model.add(MagnitudeToDecibel())  # these three layers can be replaced with get_stft_magnitude_layer()
# Alternatively, you may want to use a melspectrogram layer
# melgram_layer = get_melspectrogram_layer()
# or log-frequency layer
# log_stft_layer = get_log_frequency_spectrogram_layer() 

# add more layers as you want
model.add(Conv2D(32, (3, 3), strides=(2, 2)))
model.add(BatchNormalization())
model.add(ReLU())
model.add(GlobalAveragePooling2D())
model.add(Dense(10))
model.add(Softmax())

# Compile the model
model.compile('adam', 'categorical_crossentropy') # if single-label classification

# train it with raw audio sample inputs
# for example, you may have functions that load your data as below.
x = load_x() # e.g., x.shape = (10000, 44100, 6), matching the channels_last input_shape
y = load_y() # e.g., y.shape = (10000, 10) if it's 10-class classification
# then..
model.fit(x, y)
# Done!

TFLite compatibility

The STFT layer is not TFLite-compatible (due to tf.signal.stft). To create a TFLite-compatible model, first train using the normal Kapre layers, then create a new model in which STFT and Magnitude are replaced with STFTTflite and MagnitudeTflite. The TFLite-compatible layers are restricted to a batch size of 1, which prevents their use during training.

# assumes you have run the one-shot example above.
from kapre import STFTTflite, MagnitudeTflite
model_tflite = Sequential()

model_tflite.add(STFTTflite(n_fft=2048, win_length=2018, hop_length=1024,
               window_name=None, pad_end=False,
               input_data_format='channels_last', output_data_format='channels_last',
               input_shape=input_shape))
model_tflite.add(MagnitudeTflite())
model_tflite.add(MagnitudeToDecibel())  
model_tflite.add(Conv2D(32, (3, 3), strides=(2, 2)))
model_tflite.add(BatchNormalization())
model_tflite.add(ReLU())
model_tflite.add(GlobalAveragePooling2D())
model_tflite.add(Dense(10))
model_tflite.add(Softmax())

# load the trained weights into the tflite compatible model.
model_tflite.set_weights(model.get_weights())
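From here, converting the batch-size-1 model is the standard TensorFlow Lite workflow; a sketch assuming TF 2.x (converter options may vary by version):

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model_tflite)
tflite_model = converter.convert()
with open('model.tflite', 'wb') as f:
    f.write(tflite_model)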

Citation

Please cite this paper if you use Kapre for your work.

@inproceedings{choi2017kapre,
  title={Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras},
  author={Choi, Keunwoo and Joo, Deokjin and Kim, Juho},
  booktitle={Machine Learning for Music Discovery Workshop at 34th International Conference on Machine Learning},
  year={2017},
  organization={ICML}
}

kapre's People

Contributors

cgratie, douglas125, jackz314, jamesmishra, keunwoochoi, path-a, tgabor, timgates42, xreyrobert-ibm


kapre's Issues

Functional API example?

As in the title: is it possible to show in the examples how to use Kapre with the Keras functional API?
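A minimal sketch using the kapre 0.3-style layers from the README above; the shapes and hyperparameters here are illustrative assumptions:

from tensorflow.keras import Input, Model
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense
from kapre import STFT, Magnitude, MagnitudeToDecibel

inputs = Input(shape=(44100, 1))  # (samples, channels), channels_last
x = STFT(n_fft=2048, hop_length=1024)(inputs)
x = Magnitude()(x)
x = MagnitudeToDecibel()(x)
x = GlobalAveragePooling2D()(x)
outputs = Dense(10, activation='softmax')(x)
model = Model(inputs, outputs)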

STFT

At this point, I feel like we should definitely update the current STFT implementation with more native STFT operations for speed/stability. Out of the three Keras backends -- tf, theano, cntk -- only tf has its own STFT, but in practice it'd be fair to assume that the majority is using TF. It'd make sense to have another type of layer, probably STFT (currently we have Spectrogram for it, with a conv2d-based implementation), which would only be available for the TF backend.

Well, also, before doing it, the Melspectrogram layer can be re-implemented to be based on Sequential so that ultimately, Melspectrogram can be based on either Spectrogram or STFT.

Guess it's my to-do for one of my nothing-to-do weekends in the future.

stft.py error

There seems to be a mix-up between the variable names "n_dft" and "n_fft". For example, in __init__ for the class, the argument is clearly n_fft, but the code that triggers when "n_hop" is undefined references "n_dft", which is not defined in this constructor. Similar problems are present in the documentation: it references "n_dft", but the constructor clearly uses "n_fft".

Tensorflow 2.0

Any plans for supporting TensorFlow 2.0?

Currently, here's an issue I faced when using kapre with TF 2.0:

Python 3.6.8 |Anaconda, Inc.| (default, Feb 21 2019, 18:30:04) [MSC v.1916 64 bit (AMD64)] on win32
>>> import os,sys,pdb
>>> import tensorflow as tf
>>> print(tf.__version__)
2.0.0-alpha0
>>> print(tf.keras.__version__)
2.2.4-tf
>>> from tensorflow.keras.models import Sequential
>>> from kapre.time_frequency import Melspectrogram
Using TensorFlow backend.
>>> from kapre.utils import Normalization2D
>>> from kapre.augmentation import AdditiveNoise
>>> input_shape = (6, 44100)
>>> sr = 44100
>>> model = Sequential()
>>> model.add(Melspectrogram(n_dft=512, n_hop=256, input_shape=input_shape,
...                          padding='same', sr=sr, n_mels=128,
...                          fmin=0.0, fmax=sr/2, power_melgram=1.0,
...                          return_decibel_melgram=False, trainable_fb=False,
...                          trainable_kernel=False,
...                          name='trainable_stft'))
Traceback (most recent call last):
  File "<stdin>", line 6, in <module>
  File "C:\Users\.conda\envs\Python36_tf2_kapre\lib\site-packages\tensorflow\python\training\tracking\base.py", line 456, in _method_wrapper
    result = method(self, *args, **kwargs)
  File "C:\Users\.conda\envs\Python36_tf2_kapre\lib\site-packages\tensorflow\python\keras\engine\sequential.py", line 152, in add
    'Found: ' + str(layer))
TypeError: The added layer must be an instance of class Layer. Found: <kapre.time_frequency.Melspectrogram object at 0x000001FD56DA69E8>

todo's

  • remove stft.
  • fully keras 2 (e.g., image_data_format)
  • python3 compatible
    then
  • update documentation
  • update pip

Why use K.conv2d ?

I wonder why you use keras.conv2d in Spectrogram, knowing that the STFT computation is based on a 1D convolution.
If you used keras.conv1d, I think you would be able to remove one axis from the filters and the inputs and thus gain computation efficiency.
Do you have any argument for using 2D convolutions over 1D convolutions? Is it about the shape?

Unit tests

It'd be awesome (and necessary to ensure correctness) to add unit tests and continuous integration (e.g. via travis) to this project - this way pull requests can be validated against the tests before being merged into master to ensure no bugs are introduced to the code base. It'll also save people from re-writing small test scripts when developing new features.

Scipy spectrogram

Let's assume that I wish to perform the function in the subject (scipy's spectrogram), particularly getting the magnitude. How should I use Kapre for this?
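One way to get a magnitude spectrogram comparable to scipy.signal.spectrogram with mode='magnitude', as a sketch with the layers from the README (audio_batch is a hypothetical array of shape (batch, 22050, 1)):

from tensorflow.keras.models import Sequential
from kapre import STFT, Magnitude

model = Sequential()
model.add(STFT(n_fft=512, hop_length=256, input_shape=(22050, 1)))
model.add(Magnitude())  # |STFT|, i.e. the magnitude spectrogram

spec = model.predict(audio_batch)  # audio_batch: hypothetical input batch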

"channels_first" vs "channels_last" parameter for the input

Hi,

Thank you for the excellent library, it is really very handy.

I have a minor issue related to the fact that "image_data_format" only applies to the output format. I think it would be better if this parameter applied to both the input and the output. Is it possible to change this so that if we choose channels_last as the configuration, the input is also shaped like the output, i.e. something like (samples, n_channels)?
I would change the parameter to a Boolean and name it "channels_first" (with a default value of true), or simply add another parameter for the input, something like "audio_data_format". What do you think?

Thanks!

CQT

This is a feature request: is it possible to add the constant-Q transform to this library?

Conversion to Core ML

Hi all,

I've tried to convert this to Core ML but have failed so far...
I tried 2 approaches:
* Kapre / Keras -> coremltools = FAILED
* Kapre / Keras -> TensorFlow -> tf-coreml = FAILED

Any hints?

How to feed in train data

Hi,

Please can you tell me how you feed in the data for training and prediction tasks.
Do you load the signals from MP3/WAV into numpy arrays?

It would be great if you can provide a full example with training and prediction on real data.

Raphael

Test PyPI package for 0.1.3.1 still broken

Though the fix introduced in 0.1.3.1 has been incorporated in the main PyPI index, the fix is not present in the Test PyPI index, which can be verified by looking at the files in the archive: https://test.pypi.org/project/kapre/#files. This is not an issue for production code, but it is when testing PyPI packages that depend on kapre: since (AFAIK) even if you specify --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple, pip will still look at the test index first, and since there is indeed a version 0.1.3.1 on Test PyPI, it pulls this version without the fix.

Kapre with Keras 2.3.1 issues

For the record, while trying to debug some issues I had with a new environment, I realised that my Kapre mel-spectrograms were wrong when using Keras 2.3.1 (latest) and 2.3.0.

Reverting to Keras 2.2.5 (like my stable environment) solved the issue. (Keras 2.2.4 is also good.)

parameters were as this:

fmin = 70   
fmax = 496
hop_length = 512
n_fft = 4096 
n_mels =  110

model = Sequential()
model.add(Melspectrogram(sr=44100, n_mels=n_mels, fmin=fmin, fmax=fmax,
  n_dft=n_fft, n_hop=hop_length, input_shape=input_shape, power_melgram=1,
  return_decibel_melgram=True, 
  trainable_kernel=True, name='melgram'))  

With Keras 2.2.5:
[screenshot of the resulting mel-spectrogram]

With Keras 2.3.1:
[screenshot of the resulting mel-spectrogram]

Now it looks a lot like the issue I encountered trying to use #58 & #56 with the embedded tensorflow.keras (issues tested with TensorFlow 2 and the latest TensorFlow 1.15),
BUT the Keras version embedded with TensorFlow 1.15.0 is supposed to be Keras 2.2.4-tf, and standalone Keras 2.2.4 didn't display this issue.

Amplitude-to-decibel conversion produces different results on different batches

Related to #16, I found another issue that contributes to different prediction results depending on batch size (and the batches themselves). In particular, it occurs when converting spectrograms to decibels.

https://github.com/keunwoochoi/kapre/blob/master/kapre/backend_keras.py#L17

The maximum is taken over the entire tensor, instead of per example in the batch. This results in different normalization when the examples in a batch are different.
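A sketch of the per-example alternative, assuming the TensorFlow backend and a 4D channels_last tensor (batch, time, freq, ch): reduce over every axis except the batch axis, so each example is normalized by its own maximum.

import tensorflow.keras.backend as K

def per_example_max(log_spec):
    # one max per example instead of one max over the whole batch
    return K.max(log_spec, axis=[1, 2, 3], keepdims=True)

# then: log_spec = log_spec - per_example_max(log_spec)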

Noise layers during training

Namaste,

considering the AdditiveNoise-layer:

def call(self, x):
    if self.random_gain:
        noise_x = x + K.random_normal(shape=K.shape(x),
                                      mean=0.,
                                      stddev=np.random.uniform(0.0, self.power))
    else:
        noise_x = x + K.random_normal(shape=K.shape(x),
                                      mean=0.,
                                      stddev=self.power)

    return K.in_train_phase(noise_x, x)

to me this clearly computes the noise always, that is, both during and outside of training. If so, maybe this could be sped up with a deferred function like in Keras? See the Keras implementation below, and a sketch applying the same pattern right after it:

def call(self, inputs, training=None):
    if 0 < self.rate < 1:
        def noised():
            stddev = np.sqrt(self.rate / (1.0 - self.rate))
            return inputs * K.random_normal(shape=K.shape(inputs),
                                            mean=1.0,
                                            stddev=stddev)
        return K.in_train_phase(noised, inputs, training=training)
    return inputs

Best,
Tristan
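A sketch of the suggested change (an assumption, not the library's actual code): wrapping the noisy branch in a closure means it is only evaluated during the training phase.

def call(self, x, training=None):
    def noised():
        if self.random_gain:
            stddev = np.random.uniform(0.0, self.power)
        else:
            stddev = self.power
        return x + K.random_normal(shape=K.shape(x), mean=0., stddev=stddev)
    return K.in_train_phase(noised, x, training=training)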

Issue with expected dimensions on basic example

Hello,

I'm trying to use Kapre to do some basic audio classification as a starting point before I dive into deeper projects. Right now, I'm having an issue compiling and fitting the most basic of models.

Right now, I have 2 classes, 556 samples, and I'm using a sampling rate of 22050 Hz.

Per the Using Mel-spectrogram section in the readme, I have my input data shaped like this:

>>> x.shape
(556, 1, 22050)
>>> y.shape
(556, 2)

I tried using the exact model from the README in the aforementioned section (except I substituted in the correct sampling rate (22050) and channel count (1) for my use case).

With those updates, it looks something like this:

input_shape = (1, 22050)
sr = 22050
model = Sequential()
# A mel-spectrogram layer
model.add(Melspectrogram(n_dft=512, n_hop=256, input_shape=input_shape,
                         padding='same', sr=sr, n_mels=128,
                         fmin=0.0, fmax=sr/2, power_melgram=1.0,
                         return_decibel_melgram=False, trainable_fb=False,
                         trainable_kernel=False,
                         name='trainable_stft'))
# Maybe some additive white noise.
model.add(AdditiveNoise(power=0.2))
# If you wanna normalise it per-frequency
model.add(Normalization2D(str_axis='freq')) # or 'channel', 'time', 'batch', 'data_sample'
# After this, it's just a usual keras workflow. For example..
# Add some layers, e.g., model.add(some convolution layers..)
# Compile the model
model.compile('adam', 'categorical_crossentropy') # if single-label classification
model.fit(x, y)

I get this stacktrace:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/anaconda3/envs/py3_6/lib/python3.7/site-packages/keras/engine/training.py", line 952, in fit
    batch_size=batch_size)
  File "/anaconda3/envs/py3_6/lib/python3.7/site-packages/keras/engine/training.py", line 789, in _standardize_user_data
    exception_prefix='target')
  File "/anaconda3/envs/py3_6/lib/python3.7/site-packages/keras/engine/training_utils.py", line 128, in standardize_input_data
    'with shape ' + str(data_shape))
ValueError: Error when checking target: expected normalization2d_14 to have 4 dimensions, but got array with shape (556, 2)

OS: Mac 10.14.1
Python 3.7.2 (though it was happening with 3.6 earlier today too)
Kapre: 0.1.3.1
keras: 2.2.4
tensorflow: 1.13.0rc2

Any ideas on what I'm missing?
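One likely cause (an assumption based on the traceback): the model ends at Normalization2D, whose 4D output cannot match the (556, 2) targets, because no classification head was added. A sketch of a fix, inserted before compiling:

from keras.layers import Flatten, Dense

model.add(Flatten())
model.add(Dense(2, activation='softmax'))  # match y.shape = (556, 2)
model.compile('adam', 'categorical_crossentropy')
model.fit(x, y)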

Compatibility with Keras2

Hello! I suppose that you are aware of the current Keras API changes. Would it be OK for you to support these changes? (There are some breaking changes in the layer implementation.) I could submit a PR if that would help you.

When running CNTK with Keras, I'm getting an error

When running CNTK with Keras, I'm getting this error:
ValueError: CNTK backend: the permute pattern [0, 2, 1] requested permute on dynamic axis, which is not supported. Please do permute on static axis.

This is where this error happens:
x = K.permute_dimensions(x, [0, 2, 1])

Is there any temporary workaround? Any hints how I should change the code?
I tried using:
x = cntk.transpose(x, perm = (0, 2, 1))

Computing the correct spectrogram

Seems like there's a bug that causes some small differences in the spectrogram results. Would anyone be interested in fixing it?

missing "import os"?

In kapre/datasets.py, os is used but never imported:

for set in set_names:
    fnames = [f.lstrip('._') for f in os.listdir(os.path.join(save_path, 'jamendo', set)) \

The error appeared while running dl4mir's "main_preprocess.py jamendo".

Spectrogram integration issues

I'm a bit stumped on this.
TL;DR: I'm getting weird behavior using Kapre as a replacement for spectrogram features.

I have a model, and traditionally I have precomputed 64-mel x 128-frame specs and fed them into the first layer. I tried integrating Kapre because it seemed like a great idea (and I still think it is). I added in the Kapre mel-spectrogram layer, tuned the same way I was generating my precomputed ones (librosa-based), and nothing about it is trainable. My new input was pickled raw mono 16 kHz WAV files (~5 seconds).

I started training and noticed there was very little learning taking place compared to the original model. I poked around, tried adjusting my input shape, and made sure it was (None, 1, 79872), where ~80k is the number of samples per WAV.

I also did a similar spectrogram comparison as in examples/, and the Kapre version looked nearly identical to my original. The values were scaled slightly differently, but more or less contained the same information. For example, [-14.019988 -11.445856] became [-51.93689 -49.89946]; see the attached specs for comparison:

Original version:
[spectrogram image]

Kapre version:
[spectrogram image]

They basically look the same, which is why I'm confused/surprised the Kapre version doesn't train. I tried with and without normalization (frequency-wise), transposing, and scaling differently, and I always have the same issue: they all seem to stop improving after a few epochs. My original model trained for > 50 epochs before it stopped improving.

At this point I'm trying to figure out if this is a me problem or something going on with Kapre, so I figured it was worth a shot asking. Thanks for any help resolving this!

Lastly, here is a snippet of my model summary for the old and new kapre one

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
====================================================================
in_layer (InputLayer)           (None, 64, 128)      0                                            
__________________________________________________________________________________________________
reshape_1 (Reshape)             (None, 64, 128, 1)   0           in_layer[0][0]                   
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 64, 128, 64)  3200        reshape_1[0][0]                  
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 64, 128, 64)  256         conv2d_1[0][0]                   
__________________________________________________________________________________________________
elu_1 (ELU)                     (None, 64, 128, 64)  0           batch_normalization_1[0][0]      
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 32, 64, 64)   0           elu_1[0][0]                      
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, 32, 64, 64)   0           max_pooling2d_1[0][0]            
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 32, 64, 64)   200768      dropout_1[0][0]                  
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 32, 64, 64)   256         conv2d_2[0][0]                   
__________________________________________________________________________________________________
elu_2 (ELU)                     (None, 32, 64, 64)   0           batch_normalization_2[0][0]      
__________________________________________________________________________________________________
.... and 2 more simlar conv layers and some dense layers 
output is multiple scalar values,
the cost function uses MSE for each output,
this traditionally worked well

Similarly, the Kapre version looked like this, identical other than the first few layers:

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
===================================================================
in_layer (InputLayer)           (None, 1, 79872)     0                                            
__________________________________________________________________________________________________
log-power-mel-spec (Melspectrog (None, 64, 128, 1)   1083456     in_layer[0][0]                   
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 64, 128, 64)  3200        in_layer[0][0]                  
__________________________________________________________________________________________________
... and the same other stuff

Finally, here is a plot of a validation metric, where the gray line is the Kapre version:
[plot image]

random cropping won't be implemented

Note: although it can be implemented and can be useful, it's tricky to decide what to do at inference time. Probably (random) cropping should be done on the CPU side when the data is loaded.

Melspectrogram() no arguments 'border_mode' or 'power'

I just copied your mel-spectrogram code found in the readme:

# Keras model setup
input_shape = (6, 44100)
sr = 44100
model = Sequential()
# A mel-spectrogram layer
model.add(Melspectrogram(n_dft=512, n_hop=256, input_shape=input_shape,
                         sr=sr, n_mels=128,
                         fmin=0.0, fmax=sr/2, power=1.0,
                         return_decibel_melgram=False, trainable_fb=False,
                         trainable_kernel=False,
                         name='trainable_stft'))

but that gave the two errors below.
If I delete these two arguments, .add() doesn't give any error.

'border_mode'

TypeError                                 Traceback (most recent call last)
<ipython-input-10-41496f793cdb> in <module>()
      9                          return_decibel_melgram=False, trainable_fb=False,
     10                          trainable_kernel=False,
---> 11                          name='trainable_stft'))

~/anaconda3/envs/StoAU/lib/python3.6/site-packages/kapre-0.1.2.1-py3.6.egg/kapre/time_frequency.py in __init__(self, sr, n_mels, fmin, fmax, power_melgram, return_decibel_melgram, trainable_fb, **kwargs)
    254                  trainable_fb=False, **kwargs):
    255 
--> 256         super(Melspectrogram, self).__init__(**kwargs)
    257         assert sr > 0
    258         assert fmin >= 0.0

~/anaconda3/envs/StoAU/lib/python3.6/site-packages/kapre-0.1.2.1-py3.6.egg/kapre/time_frequency.py in __init__(self, n_dft, n_hop, padding, power_spectrogram, return_decibel_spectrogram, trainable_kernel, image_data_format, **kwargs)
     96         self.power_spectrogram = float(power_spectrogram)
     97         self.return_decibel_spectrogram = return_decibel_spectrogram
---> 98         super(Spectrogram, self).__init__(**kwargs)
     99 
    100     def build(self, input_shape):

~/anaconda3/envs/StoAU/lib/python3.6/site-packages/keras/engine/topology.py in __init__(self, **kwargs)
    277         for kwarg in kwargs:
    278             if kwarg not in allowed_kwargs:
--> 279                 raise TypeError('Keyword argument not understood:', kwarg)
    280         name = kwargs.get('name')
    281         if not name:

TypeError: ('Keyword argument not understood:', 'border_mode')

'power'

TypeError                                 Traceback (most recent call last)
<ipython-input-11-0ae5a30e071e> in <module>()
      9                          return_decibel_melgram=False, trainable_fb=False,
     10                          trainable_kernel=False,
---> 11                          name='trainable_stft'))
     12 
     13 # border_mode='same',

~/anaconda3/envs/StoAU/lib/python3.6/site-packages/kapre-0.1.2.1-py3.6.egg/kapre/time_frequency.py in __init__(self, sr, n_mels, fmin, fmax, power_melgram, return_decibel_melgram, trainable_fb, **kwargs)
    254                  trainable_fb=False, **kwargs):
    255 
--> 256         super(Melspectrogram, self).__init__(**kwargs)
    257         assert sr > 0
    258         assert fmin >= 0.0

~/anaconda3/envs/StoAU/lib/python3.6/site-packages/kapre-0.1.2.1-py3.6.egg/kapre/time_frequency.py in __init__(self, n_dft, n_hop, padding, power_spectrogram, return_decibel_spectrogram, trainable_kernel, image_data_format, **kwargs)
     96         self.power_spectrogram = float(power_spectrogram)
     97         self.return_decibel_spectrogram = return_decibel_spectrogram
---> 98         super(Spectrogram, self).__init__(**kwargs)
     99 
    100     def build(self, input_shape):

~/anaconda3/envs/StoAU/lib/python3.6/site-packages/keras/engine/topology.py in __init__(self, **kwargs)
    277         for kwarg in kwargs:
    278             if kwarg not in allowed_kwargs:
--> 279                 raise TypeError('Keyword argument not understood:', kwarg)
    280         name = kwargs.get('name')
    281         if not name:

TypeError: ('Keyword argument not understood:', 'power')
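The keyword names changed across Kapre versions; with the 0.1.x constructor shown in the tracebacks, the arguments are padding and power_melgram rather than border_mode and power. A corrected call, as a sketch:

model.add(Melspectrogram(n_dft=512, n_hop=256, input_shape=input_shape,
                         padding='same', sr=sr, n_mels=128,
                         fmin=0.0, fmax=sr/2, power_melgram=1.0,
                         return_decibel_melgram=False, trainable_fb=False,
                         trainable_kernel=False,
                         name='trainable_stft'))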

trainable_stft error

Following your example, but I get an error at the trainable_stft layer definition; can you provide an example with the error resolved?

# 6 channels (!), maybe 1-sec audio signal
input_shape = (6, 44100)
sr = 44100
model = Sequential()
model.add(Melspectrogram(n_dft=512, n_hop=256, input_shape=src_shape,
                         border_mode='same', sr=sr, n_mels=128,
                         fmin=0.0, fmax=sr/2, power=1.0,
                         return_decibel=False, trainable_fb=False,
                         trainable_kernel=False
                         name='trainable_stft'))

  File "<ipython-input-24-cea5588ddf1e>", line 13
    name='trainable_stft'))
       ^
SyntaxError: invalid syntax
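The SyntaxError itself comes from the missing comma after trainable_kernel=False (and src_shape should be input_shape); the keyword names border_mode, power, and return_decibel additionally predate the renames noted in the previous issue. A corrected sketch:

input_shape = (6, 44100)
sr = 44100
model = Sequential()
model.add(Melspectrogram(n_dft=512, n_hop=256, input_shape=input_shape,
                         padding='same', sr=sr, n_mels=128,
                         fmin=0.0, fmax=sr/2, power_melgram=1.0,
                         return_decibel_melgram=False, trainable_fb=False,
                         trainable_kernel=False,  # note the comma
                         name='trainable_stft'))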

GPU Acceleration

Hi there!

I really, really like kapre. Good work!

One thing I am curious about: how well is Kapre GPU-accelerated? You are using librosa, and to the best of my knowledge librosa is CPU-only. Could using Kapre layers create bottlenecks when it comes to training time?

Best,
Tristan

TODO

  • #16: make all normalization be done per sample.
  • perhaps add phase computation in STFT (it's not the most efficient though)
  • perhaps I can deprecate the support for theano

Data preparation

Hello, is there any specific requirement on how to load our data? I suppose the dimensions have to be (batches, 1, samples), but is there any specific constraint on the normalization?

Inverse Spectrogram and Mel-Spectrogram Layer?

Namaste!

kapre has become an integral part of all my audio deep learning experiments. Powerful! Thanks for providing such great software!

I was thinking... I guess it would make sense to have layers for inverse spectrogram and inverse mel-spectrogram. Thinking about Autoencoders, this would be even more powerful. I know that reconstructing samples from spectrograms is not the best, but it is possible to a certain degree.

What do you think about that feature request?

Best,
Tristan

output channel of melspectrogram

Hi,
I am trying to run a 1D conv on a mel-spectrogram with Kapre, but it seems like the Melspectrogram layer assumes that 2D operations will be done subsequently, since it gives a 4D output.
So at the moment I am removing the channel dim and swapping axes after the Melspectrogram layer (a workaround sketch follows below).
Any thoughts on allowing a 3D output with no channel dim, and an appropriate input dimension ordering for 1D operations in Keras? (Or am I missing something in the docs?)

Thanks :)
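A sketch of the workaround described above, assuming a channels_last mel-spectrogram output of shape (batch, n_mels, time, 1): squeeze the channel axis and transpose so Conv1D convolves over the time axis.

from keras.layers import Lambda
import keras.backend as K

model.add(Lambda(lambda t: K.permute_dimensions(K.squeeze(t, axis=-1),
                                                (0, 2, 1))))
# model.add(Conv1D(...)) now sees (batch, time, n_mels)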

Different prediction results depending on inference batch size

Hi @keunwoochoi !

I'm enjoying your Kapre library, as I'm currently using it in my music-based deep learning project...

In my deep network, I'm using a similar architecture as the ones you have in your music auto tagging repo, but I have replaced manual input audio preprocessing and the batch normalization layers in the network with their Kapre equivalent layers (like Melspectrogram and Normalization2D).

However, I am getting different prediction results when I change the batch size for the number of audio samples I am predicting at once.

I believe this is because your Normalization2D layer (which I think mimics Keras's BatchNormalization) recalculates the mean/std, etc. during testing mode (i.e. Keras.learning_phase == 0). This might cause the difference that I am experiencing when changing the batch size during batch prediction...

Is it possible to fix this?

Update: You can see Keras's implementation of BatchNormalization here: https://github.com/fchollet/keras/blob/master/keras/layers/normalization.py

From what I observe, it seems that Keras handles this issue by checking if the model is in testing mode; if it is, it simply performs the batch normalization using the mean and variance values learned during training (by storing them and updating these weights).

How can we implement this functionality into Normalization2D, so that it also produces accurate prediction results when predicting single and/or batch samples?

Or, can we maybe rewrite Normalization2D such that it simply extends Keras's BatchNormalization and only passes the axis to it (determined by str_axis)? I think this would resolve the issue and also reduce the amount of code needed for this Kapre layer... (a sketch of this idea follows below)

Thanks!
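A sketch of that last idea (an assumption about how it could look, not Kapre's code): map str_axis to a tensor axis and delegate to BatchNormalization, which keeps running statistics for inference.

from keras.layers import BatchNormalization

def make_normalization(str_axis):
    # hypothetical mapping, assuming channels_last 4D input (batch, freq, time, ch)
    axis = {'freq': 1, 'time': 2, 'channel': 3}[str_axis]
    return BatchNormalization(axis=axis)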

Hey! The input is too short!

Hi,

I'm encountering an assertion problem when calling your code with a Tensorflow backend.

input_shape = (44100,1)

Could this be a problem with "channels_first" / "channels_last"?

Best,
Alex

kapre install error

Hi,

I installed kapre on two Ubuntu servers, and the same problem happened on both. I don't know what is wrong.
When I import kapre in Python, the error is as below:

/usr/local/lib/python2.7/dist-packages/llvmlite-0.20.0-py2.7.egg/llvmlite/binding/libllvmlite.so: cannot open shared object file: No such file or directory

I really don't know how to fix this error. I wonder could you please help me solve this error.

Input data representation

Hi!

I just wanted to re-check on the input data representation requirements, because I think this is not mentioned in the documentation.

Are the frequency transformation layers expecting integer or float values as input?
According to the provided example, where you use librosa to load the audio data, it should be a float32 representation scaled between -1 and 1.

Is this a fixed requirement or would it also work with int16 values which are returned by scipy.io.wavfile.read()?

Best,
Alex
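For what it's worth, a sketch of the usual conversion from 16-bit PCM, assuming scipy.io.wavfile:

import numpy as np
from scipy.io import wavfile

sr, x = wavfile.read('audio.wav')    # x is int16 in [-32768, 32767]
x = x.astype(np.float32) / 32768.0   # float32 in [-1, 1], like librosa.load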

Numba installation warning and subsequent errror

Hi I installed Kapre in Anaconda

The installation has a few warnings about Numba:

Running numba-0.38.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-epmbsvi/numba-0.38.0/egg-dist-tmp-a6780hd
warning: no files found matching '.inc' under directory 'numba'
warning: no files found matching '*.inc' under directory 'numba'
warning: no files found matching '*.ipynb' under directory 'docs'
warning: no files found matching '*.txt' under directory 'docs'
no previously-included directories found matching 'docs/_build'
no previously-included directories found matching 'docs/gh-pages'
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
zip_safe flag not set; analyzing archive contents...
numba.__pycache__._dispatcher.cpython-34: module references __file__
numba.__pycache__._dynfunc.cpython-34: module references __file__
numba.__pycache__._helperlib.cpython-34: module references __file__
numba.__pycache__.caching.cpython-34: module MAY be using inspect.getsource
numba.__pycache__.errors.cpython-34: module references __file__
numba.__pycache__.mviewbuf.cpython-34: module references __file__
numba.__pycache__.six.cpython-34: module references __path__
numba.__pycache__.tracing.cpython-34: module MAY be using inspect.trace
numba.annotations.__pycache__.type_annotations.cpython-34: module references __file__
numba.cuda.cudadrv.__pycache__._extras.cpython-34: module references __file__
numba.cuda.tests.__pycache__.__init__.cpython-34: module references __file__
numba.cuda.tests.cudadrv.__pycache__.__init__.cpython-34: module references __file__
numba.cuda.tests.cudadrv.__pycache__.test_linker.cpython-34: module references __file__
numba.cuda.tests.cudapy.__pycache__.__init__.cpython-34: module references __file__
numba.cuda.tests.cudasim.__pycache__.__init__.cpython-34: module references __file__
numba.cuda.tests.nocuda.__pycache__.__init__.cpython-34: module references __file__
numba.hsa.tests.hsadrv.__pycache__.test_driver.cpython-34: module references __file__
numba.jitclass.__pycache__._box.cpython-34: module references __file__
numba.npyufunc.__pycache__._internal.cpython-34: module references __file__
numba.npyufunc.__pycache__.workqueue.cpython-34: module references __file__
numba.pycc.__pycache__.cc.cpython-34: module references __file__
numba.runtime.__pycache__._nrt_python.cpython-34: module references __file__
numba.scripts.__pycache__.generate_lower_listing.cpython-34: module references __file__
numba.scripts.__pycache__.generate_lower_listing.cpython-34: module MAY be using inspect.getsourcefile
numba.testing.__pycache__.ddt.cpython-34: module MAY be using inspect.getsourcefile
numba.testing.__pycache__.loader.cpython-34: module references __file__
numba.testing.__pycache__.__main__.cpython-34: module references __file__
numba.tests.__pycache__.__init__.cpython-34: module references __file__
numba.tests.__pycache__.cffi_usecases.cpython-34: module references __file__
numba.tests.__pycache__.ctypes_usecases.cpython-34: module references __file__
numba.tests.__pycache__.test_cfunc.cpython-34: module references __file__
numba.tests.__pycache__.test_dispatcher.cpython-34: module references __file__
numba.tests.__pycache__.test_pycc.cpython-34: module references __file__
numba.tests.__pycache__.test_stencils.cpython-34: module MAY be using inspect.getsource
numba.tests.npyufunc.__pycache__.__init__.cpython-34: module references __file__
numba.tests.npyufunc.__pycache__.test_caching.cpython-34: module references __file__
numba.typeconv.__pycache__._typeconv.cpython-34: module references __file__
creating /gruntdata/sicheng.wang/anaconda3/envs/keras/lib/python3.4/site-packages/numba-0.38.0-py3.4-linux-x86_64.egg
Extracting numba-0.38.0-py3.4-linux-x86_64.egg to /gruntdata/sicheng.wang/anaconda3/envs/keras/lib/python3.4/site-packages
Adding numba 0.38.0 to easy-install.pth file
Installing pycc script to /gruntdata/sicheng.wang/anaconda3/envs/keras/bin
Installing numba script to /gruntdata/sicheng.wang/anaconda3/envs/keras/bin

After installing, importing kapre results in the following error (may be unrelated):

Traceback (most recent call last):
File "/gruntdata/sicheng.wang/anaconda3/envs/keras/lib/python3.4/site-packages/llvmlite-0.23.0-py3.4.egg/llvmlite/binding/ffi.py", line 119, in
lib = ctypes.CDLL(os.path.join(_lib_dir, _lib_name))
File "/gruntdata/sicheng.wang/anaconda3/envs/keras/lib/python3.4/ctypes/init.py", line 351, in init
self._handle = _dlopen(self._name, mode)
OSError: /gruntdata/sicheng.wang/anaconda3/envs/keras/lib/python3.4/site-packages/llvmlite-0.23.0-py3.4.egg/llvmlite/binding/libllvmlite.so: cannot open shared object file: No such file or directory

conda install llvmlite only updates the package to 0.20.0. Manual compilation of libllvmlite gives this error: undefined symbol: LLVMInitializeInstCombine

pip package for 0.1.3 broken

First: thanks for sharing your code!

I couldn't import kapre after installing it via pip.

Investigating the archive on pypi revealed a problem with uppercases in the following filenames:

kapre-0.1.3
├── kapre
│   ├── Filterbank.py
│   └── Utils.py

Furthermore, the links on pypi to the repo are broken

Home Page: http://github.com/keunwoo/kapre/
Download URL: http://github.com/keunwoochoi/kapre/releases

why does magnitude_to_db scale to [-?, 0]

log_spec = log_spec - K.max(log_spec, axis=axis, keepdims=True) # [-?, 0]

it's different from the librosa version:

if top_db is not None:
    if top_db < 0:
        raise ParameterError('top_db must be non-negative')
    log_spec = np.maximum(log_spec, log_spec.max() - top_db)

I doubt that making the spectrogram's max value 0 is standard practice.

htk=true for mel frequencies

We noticed the current implementation of the mel_frequencies function (based on librosa) doesn't include the htk=True option, which is handy when training CNNs: with it the frequency scale is fully logarithmic, which in principle makes more sense for frequency-invariant convolutional filters.

What was the motivation for removing this? Any chance it can be added?
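For reference, a sketch of the librosa option the issue refers to (the htk=True flag switches to the fully logarithmic HTK mel formula):

import librosa

mel_fb = librosa.filters.mel(sr=22050, n_fft=2048, n_mels=128, htk=True)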

how to build audio classifier using this?

I have a folder containing 100 samples of 1-second-long music files.
Given a test audio clip (1-2 minutes long), I want a Keras model that will identify where in the audio the trained music samples occur.

Please tell me how to achieve this using kapre.

Thank you.

Pip?

It seems you were on pip, but are no longer. Is there anything I could do to help get kapre back on there? We want to use this library in a commercial application, and for our process pip packages are easier to support than a git repository.
