bapalto / birdsong-keras

Birdsong classification in noisy environments with Convolutional Neural Networks implemented in Keras Deep Learning library for the BIRDCLEF 2016 competition. Can be fine-tuned to arbitrary audio classification task.

License: GNU General Public License v3.0

Shell 1.14% Python 98.86%

birdsong-keras's Introduction

Training scripts for deep convolutional neural network based audio classification in Keras

The following scripts were created for the BirdCLEF 2016 competition by Bálint Czeba and Bálint Pál Tóth.

The LifeCLEF bird identification challenge provides a large-scale testbed for the system-oriented evaluation of bird species identification based on audio recordings. One of its main strengths is that the data used for the evaluation is collected through Xeno-Canto, the largest network of bird sound recordists in the world. This makes the task closer to the conditions of a real-world application than previous, similar initiatives. The main novelty of the 2016 edition of the challenge was the inclusion of soundscape recordings in addition to the usual Xeno-Canto recordings that focus on a single foreground species. This paper reports the methodology of the conducted evaluation, an overview of the systems tested by the 6 participating research groups, and a synthetic analysis of the obtained results. (More details: http://www.imageclef.org/lifeclef/2016/bird)

With some tweaks (reading the metadata and modifying the network structure or the spectrogram preprocessing), the scripts can be applied to arbitrary audio classification problems.

Citation

Please cite the following paper if this code was useful for your research:

Tóth Bálint Pál, Czeba Bálint, "Convolutional Neural Networks for Large-Scale Bird Song Classification in Noisy Environment", In: Working Notes of Conference and Labs of the Evaluation Forum, Évora, Portugal, 2016, p. 8. Download from here (PDF): http://ceur-ws.org/Vol-1609/16090560.pdf

@inproceedings{tothczeba,
    author =       "B\'{a}lint P\'{a}l T\'{o}th and B\'{a}lint Czeba",
    title =        "{Convolutional Neural Networks for Large-Scale Bird Song Classification in Noisy Environment}",
    booktitle =    "{Working Notes of Conference and Labs of the Evaluation Forum}",
    pages =        "8",
    year =         "2016",
}

Prerequisites

You will need SoX for wave file resampling, the Keras deep learning framework, and a few supporting modules. At the time of writing you can install them in the following way:

sudo apt-get install sox
sudo apt-get install python-tk
sudo pip install scipy
sudo pip install matplotlib
sudo pip install sklearn
sudo pip install tensorflow-gpu
sudo pip install keras

The code was tested under Python 2.7 with Keras 1.1.1 on the TensorFlow (GPU) 1.0.0a0 backend, using an NVIDIA Titan X 12 GB GPU.

If you use TensorFlow as a backend with Keras 1.x you should set

"image_dim_ordering": "th",

in the ~/.keras/keras.json configuration file.

In Keras 2 "image_dim_ordering" is deprecated. If you use TensorFlow + Keras 2.x, you should change the "image_data_format" setting to "channels_first".
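The two settings above can also be applied programmatically. The following is a minimal sketch, not part of the repository: the helper names patch_keras_config and update_keras_json are hypothetical, and the sketch assumes that ~/.keras/keras.json already exists and contains valid JSON.

```python
import json
import os


def patch_keras_config(config, keras_major):
    """Return a copy of a keras.json dict set to channels-first ordering.

    Hypothetical helper: `config` is the parsed keras.json content and
    `keras_major` is the major version of the installed Keras (1 or 2).
    """
    patched = dict(config)
    if keras_major >= 2:
        # Keras 2.x setting named in the text above.
        patched["image_data_format"] = "channels_first"
    else:
        # Keras 1.x setting named in the text above.
        patched["image_dim_ordering"] = "th"
    return patched


def update_keras_json(keras_major,
                      path=os.path.expanduser("~/.keras/keras.json")):
    """Rewrite keras.json in place; assumes the file already exists."""
    with open(path) as f:
        config = json.load(f)
    with open(path, "w") as f:
        json.dump(patch_keras_config(config, keras_major), f, indent=4)
```

Calling update_keras_json(1) or update_keras_json(2) before importing Keras should have the same effect as editing the file by hand.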

Directory structure and files

doall.sh                - run this script and it will do everything (you will need plenty of disk space > 100 GB)
preprocess/loadData.py  - responsible for preprocessing the data (wavs and XML meta-data)
preprocess/sample_wavs_to_16k.sh - simple script that resamples wave files to 16 kHz with SOX
preprocess/xmltodict.py - XML processing from https://github.com/martinblech/xmltodict
train/trainModel.py     - after preprocessing this script trains the neural networks
train/model-AlexNet.py  - AlexNet inspired model for audio classification
train/model-BirdClef.py - Another convolutional neural net model for audio classification
train/MAPCallback.py    - Script to calculate MAP scores during training the neural nets
train/generateImages.py - Generate images from the preprocessed spectrogram for visualization purposes
train/io_utils_mod.py   - Functions for loading and saving data to HDF5
train/log.py            - Functions for logging purposes
predict/predict.py      - Predict after preprocessing and training is done
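For readers who want to resample their own recordings outside of doall.sh, the 16 kHz resampling step can be sketched in Python as follows. This is an illustration only and may differ from what preprocess/sample_wavs_to_16k.sh actually does; the function names are hypothetical, and the sketch assumes the sox binary is on the PATH.

```python
import os
import subprocess


def build_sox_command(src_path, dst_path, rate_hz=16000):
    """Build a SoX command line that resamples src_path to rate_hz."""
    # `sox INPUT -r RATE OUTPUT` sets the sample rate of the output file.
    return ["sox", src_path, "-r", str(rate_hz), dst_path]


def resample_directory(src_dir, dst_dir, rate_hz=16000):
    """Resample every .wav file in src_dir into dst_dir (needs sox on PATH)."""
    if not os.path.isdir(dst_dir):
        os.makedirs(dst_dir)
    for name in sorted(os.listdir(src_dir)):
        if name.lower().endswith(".wav"):
            cmd = build_sox_command(os.path.join(src_dir, name),
                                    os.path.join(dst_dir, name),
                                    rate_hz)
            # Raises CalledProcessError if sox exits with a non-zero status.
            subprocess.check_call(cmd)
```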

Training (and download data and preprocess)

For training you have to simply run

./doall.sh

Be aware that this will download all the data (>50 GB) from http://otmedia.lirmm.fr/LifeCLEF/BirdCLEF2016/ to the

birdclef_data

directory, unpack it, resample it to 16 kHz, and preprocess it into HDF5 files. You need approximately 280 GB of free space for the whole process. If you would like to store the data somewhere else, modify the doall.sh, preprocess/loadData.py and train/trainModel.py scripts.
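Because the process needs a large amount of disk space, it may be worth checking the free space before starting. The snippet below is a small sketch under two assumptions: it uses shutil.disk_usage, which requires Python 3.3 or later (the repository itself targets Python 2.7), and the helper names are hypothetical.

```python
import shutil

REQUIRED_GB = 280  # free space quoted in the text above


def free_gigabytes(path="."):
    """Free space, in GB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


def has_enough_space(path=".", required_gb=REQUIRED_GB):
    """True if the filesystem holding `path` has at least required_gb free."""
    return free_gigabytes(path) >= required_gb


if __name__ == "__main__":
    print("free: %.1f GB, enough: %s"
          % (free_gigabytes("."), has_enough_space(".")))
```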

Downloading, preprocessing and training take 4-5 days in total on an i7 CPU + Titan X GPU.

Prediction

After preprocessing and training are done, simply run the following script to make predictions on the test data:

./predict.sh

The prediction results will be written to a .csv file in the predict/ directory.

birdsong-keras's Issues

AlexNet ValueError: Negative dimension size caused by subtracting 16 from 1

Hi,

When running trainModel.py, I get the following error:

model-AlexNet.py:45: UserWarning: Update your `Conv2D` call to the Keras 2 API: `Conv2D(kernel_initializer="glorot_normal", activation="relu", input_shape=(1, 200, 3..., padding="valid", strides=(6, 6), filters=96, kernel_size=(16, 16))`
  subsample=(6, 6)
Traceback (most recent call last):
  File "model-AlexNet.py", line 45, in <module>
    subsample=(6, 6)
  File "/anaconda/lib/python2.7/site-packages/keras/models.py", line 430, in add
    layer(x)
  File "/anaconda/lib/python2.7/site-packages/keras/engine/topology.py", line 578, in __call__
    output = self.call(inputs, **kwargs)
  File "/anaconda/lib/python2.7/site-packages/keras/layers/convolutional.py", line 164, in call
    dilation_rate=self.dilation_rate)
  File "/anaconda/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2893, in conv2d
    data_format='NHWC')
  File "/anaconda/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 639, in convolution
    op=op)
  File "/anaconda/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 308, in with_space_to_batch
    return op(input, num_spatial_dims, padding)
  File "/anaconda/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 631, in op
    name=name)
  File "/anaconda/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 129, in _non_atrous_convolution
    name=name)
  File "/anaconda/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 396, in conv2d
    data_format=data_format, name=name)
  File "/anaconda/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/anaconda/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2329, in create_op
    set_shapes_for_outputs(ret)
  File "/anaconda/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1717, in set_shapes_for_outputs
    shapes = shape_func(op)
  File "/anaconda/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1667, in call_with_requiring
    return call_cpp_shape_fn(op, require_shape_fn=True)
  File "/anaconda/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 610, in call_cpp_shape_fn
    debug_python_shape_fn, require_shape_fn)
  File "/anaconda/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 676, in _call_cpp_shape_fn_impl
    raise ValueError(err.message)
ValueError: Negative dimension size caused by subtracting 16 from 1 for 'conv2d_1/convolution' (op: 'Conv2D') with input shapes: [?,1,200,310], [16,16,310,96].

Perhaps it's a change from Keras 1 to 2, but it seems that the `input_shape` should be in the form (200,310,1), with the last value being the depth, rather than (1,200,310).
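The arithmetic behind the error message can be illustrated with a short sketch. It assumes the standard output-size formula for an unpadded ("valid") convolution, floor((size - kernel) / stride) + 1, and takes the kernel size 16 and stride 6 from the warning above; the function name is hypothetical.

```python
def valid_conv_output(size, kernel, stride):
    """Output length of a 'valid' (unpadded) convolution along one axis."""
    if size < kernel:
        raise ValueError("kernel %d does not fit into input of size %d"
                         % (kernel, size))
    return (size - kernel) // stride + 1


# Channels-first reading of (1, 200, 310): the spatial size is 200 x 310.
print(valid_conv_output(200, 16, 6), valid_conv_output(310, 16, 6))  # 31 50

# Channels-last reading of the same tuple: the spatial size is 1 x 200 with
# 310 channels, so the 16-wide kernel cannot fit along the first axis.
try:
    valid_conv_output(1, 16, 6)
except ValueError as err:
    print(err)
```

The input shapes reported in the traceback, [?,1,200,310] and [16,16,310,96], match the second case: the backend treats 310 as the channel count, which is consistent with the data format setting not being channels-first.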

The Version of tensorflow and keras

I had installed the tensorflow-cpu 1.0.1 and keras 2.0.2.

When I ran the script of trainModel.py, I got an error as below:

Traceback (most recent call last):
File "trainModel.py", line 101, in <module>
execfile(modelPath)
File "./model-AlexNet.py", line 25, in <module>
from keras.models import Sequential
File "/usr/local/lib/python2.7/dist-packages/keras/__init__.py", line 3, in <module>
from . import activations
File "/usr/local/lib/python2.7/dist-packages/keras/activations.py", line 3, in <module>
from . import backend as K
File "/usr/local/lib/python2.7/dist-packages/keras/backend/__init__.py", line 64, in <module>
from .tensorflow_backend import *
File "/usr/local/lib/python2.7/dist-packages/keras/backend/tensorflow_backend.py", line 1, in <module>
import tensorflow as tf
File "/usr/local/lib/python2.7/dist-packages/tensorflow/__init__.py", line 24, in <module>
from tensorflow.python import *
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/__init__.py", line 75, in <module>
from tensorflow.core.framework.graph_pb2 import *
File "/usr/local/lib/python2.7/dist-packages/tensorflow/core/framework/graph_pb2.py", line 16, in <module>
from tensorflow.core.framework import node_def_pb2 as tensorflow_dot_core_dot_framework_dot_node__def__pb2
File "/usr/local/lib/python2.7/dist-packages/tensorflow/core/framework/node_def_pb2.py", line 16, in <module>
from tensorflow.core.framework import attr_value_pb2 as tensorflow_dot_core_dot_framework_dot_attr__value__pb2
File "/usr/local/lib/python2.7/dist-packages/tensorflow/core/framework/attr_value_pb2.py", line 16, in <module>
from tensorflow.core.framework import tensor_pb2 as tensorflow_dot_core_dot_framework_dot_tensor__pb2
File "/usr/local/lib/python2.7/dist-packages/tensorflow/core/framework/tensor_pb2.py", line 16, in <module>
from tensorflow.core.framework import resource_handle_pb2 as tensorflow_dot_core_dot_framework_dot_resource__handle__pb2
File "/usr/local/lib/python2.7/dist-packages/tensorflow/core/framework/resource_handle_pb2.py", line 22, in <module>
serialized_pb=_b('\n/tensorflow/core/framework/resource_handle.proto\x12\ntensorflow"m\n\x0eResourceHandle\x12\x0e\n\x06\x64\x65vice\x18\x01 \x01(\t\x12\x11\n\tcontainer\x18\x02 \x01(\t\x12\x0c\n\x04name\x18\x03 \x01(\t\x12\x11\n\thash_code\x18\x04 \x01(\x04\x12\x17\n\x0fmaybe_type_name\x18\x05 \x01(\tB4\n\x18org.tensorflow.frameworkB\x13ResourceHandleProtoP\x01\xf8\x01\x01\x62\x06proto3')
TypeError: __init__() got an unexpected keyword argument 'syntax'

So can you tell me the version of tensorflow and keras?

how to use gpu

thanks. I'm inspired by the code.

[QUESTION] How to use my own training data

I have two folders. Each folder has one type of sound. (about 100 1 second samples each type)
Please suggest how to train model on my data.
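One possible starting point for a folder-per-class layout like the one described is sketched below. It is not part of the repository: the function name is hypothetical, and it only collects file paths and integer labels. Feeding these into the preprocessing would still require adapting preprocess/loadData.py, which reads the BirdCLEF XML metadata.

```python
import os


def list_labeled_wavs(root_dir):
    """Return (paths, labels, class_names) for a folder-per-class layout.

    Every sub-folder of root_dir is treated as one class, and every .wav
    file inside it as one sample of that class.
    """
    class_names = sorted(
        d for d in os.listdir(root_dir)
        if os.path.isdir(os.path.join(root_dir, d))
    )
    paths, labels = [], []
    for index, name in enumerate(class_names):
        class_dir = os.path.join(root_dir, name)
        for fname in sorted(os.listdir(class_dir)):
            if fname.lower().endswith(".wav"):
                paths.append(os.path.join(class_dir, fname))
                labels.append(index)  # integer class id
    return paths, labels, class_names
```

With two folders, the labels are 0 and 1 in alphabetical order of the folder names.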