jibjib-model

A model for bird sound classification

The model for training the bird classifier.

Repo layout

The complete list of JibJib repos is:

  • jibjib: Our Android app. Records sounds and looks fantastic.
  • deploy: Instructions to deploy the JibJib stack.
  • jibjib-model: Code for training the machine learning model for bird classification.
  • jibjib-api: Main API to receive database requests & audio files.
  • jibjib-data: A MongoDB instance holding information about detectable birds.
  • jibjib-query: A thin Python Flask API that handles communication with the TensorFlow Serving instance.
  • gopeana: An API client for Europeana, written in Go.
  • voice-grabber: A collection of scripts to construct the dataset required for model training.

Overview

CNN for Spectrogram-wise Classification

In vggish_train.py we train a convolutional classifier for an arbitrary number of bird classes. We take a pretrained VGGish/AudioSet model by Google and fine-tune it by iterating during training over more than 80,000 audio samples of 10 seconds each. Please refer to the VGGish and AudioSet papers for more information.

Before you can start, you first need to download a VGGish checkpoint file. You can either use a checkpoint provided by Google or our very own model that has been additionally trained for more than 100 hours and 60 epochs on a GPU cluster inside a Docker container.

The original final layer is cut off and replaced with our own output nodes.
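
For orientation, here is a minimal sketch of that step in TF 1.x, modeled on the structure of Google's vggish_train_demo.py; the class count, optimizer settings, and layer name are placeholders rather than the exact values used in vggish_train.py:

# Minimal fine-tuning head on top of VGGish (TF 1.x sketch; NUM_CLASSES and
# the optimizer settings are placeholders, not the values from vggish_train.py).
import tensorflow as tf
import vggish_slim

NUM_CLASSES = 10  # placeholder: one output node per bird class

with tf.Graph().as_default(), tf.Session() as sess:
    # Build VGGish up to its 128-D embedding layer.
    embeddings = vggish_slim.define_vggish_slim(training=True)

    # Our own output nodes replace the original AudioSet classifier layer.
    logits = tf.layers.dense(embeddings, NUM_CLASSES, name='bird_logits')
    labels = tf.placeholder(tf.float32, shape=(None, NUM_CLASSES))
    loss = tf.losses.softmax_cross_entropy(onehot_labels=labels, logits=logits)
    train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)

    # Initialize everything, then overwrite the VGGish weights with the
    # downloaded checkpoint; only the new head keeps its random initialization.
    sess.run(tf.global_variables_initializer())
    vggish_slim.load_vggish_slim_checkpoint(sess, 'input/vggish_model.ckpt')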

During the first training step, a directory containing labeled bird songs is iterated over and each .wav file is converted into a spectrogram, where the x-axis represents time and the y-axis represents frequency. For instance, this is the spectrogram of a golden eagle's call:

[mel spectrogram of a golden eagle's call]
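
The conversion itself can be pictured with the helper functions that ship with Google's VGGish code, which this repo builds on; a minimal sketch (the file path is illustrative):

# Turn a labeled recording into log-mel spectrogram patches using the
# vggish_input helper from the VGGish codebase.
import vggish_input

examples = vggish_input.wavfile_to_examples('input/data/golden_eagle/call.wav')
# `examples` is a numpy array of shape [num_patches, 96, 64]:
# ~0.96 s windows, 96 time frames each, 64 mel frequency bands.
print(examples.shape)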

Furthermore, each bird class is one-hot encoded, and the features are fed into the model together with their corresponding labels. VGGish's convolutional filters then run over each spectrogram and extract meaningful features. The following graphic gives a short overview of how, after several convolution and pooling steps, the extracted features are fed into the fully connected layers, just like in any other CNN:

[diagram: convolution and pooling layers feeding into the fully connected layers]
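
The one-hot encoding mentioned above can be illustrated with a small numpy sketch (the class names and ordering are illustrative, not the contents of bird_id_map.pickle):

# One-hot encode bird classes and pair them with spectrogram patches.
import numpy as np

classes = ['golden_eagle', 'common_blackbird', 'eurasian_wren']  # illustrative
class_to_id = {name: i for i, name in enumerate(classes)}

def one_hot(bird_name):
    vec = np.zeros(len(classes), dtype=np.float32)
    vec[class_to_id[bird_name]] = 1.0
    return vec

# Every patch extracted from a golden eagle recording is paired with [1, 0, 0].
label = one_hot('golden_eagle')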

After every epoch, a snapshot of the model's weights and biases is saved to disk. In the next step we can restore the model to either run a query or continue training.
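
A bare-bones sketch of that snapshot/restore cycle with tf.train.Saver (TF 1.x); the dummy variable and epoch count stand in for the full graph and settings used in vggish_train.py:

# Save a snapshot after every epoch and restore the latest one later.
import tensorflow as tf

NUM_EPOCHS = 5                         # placeholder
dummy = tf.Variable(0, name='dummy')   # stands in for the real model variables
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(NUM_EPOCHS):
        # ... run the training batches for this epoch ...
        saver.save(sess, 'output/train/model.ckpt', global_step=epoch)

# Later: restore the newest snapshot to run a query or resume training.
with tf.Session() as sess:
    saver.restore(sess, tf.train.latest_checkpoint('output/train'))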

We deploy the model with TensorFlow Serving to drastically reduce response time. Check out jibjib-query to learn more about how we implemented TensorFlow Serving for our model.
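
Purely for illustration, a served model can be queried over TensorFlow Serving's REST API roughly as below; the host, port, model name, and payload layout are assumptions, and the actual client lives in jibjib-query:

# Hypothetical REST query against a TensorFlow Serving instance; endpoint
# details (port 8501, model name "jibjib") are assumptions.
import json
import requests
import vggish_input

examples = vggish_input.wavfile_to_examples('recording.wav')
payload = json.dumps({'instances': examples.tolist()})
resp = requests.post('http://localhost:8501/v1/models/jibjib:predict',
                     data=payload)
print(resp.json())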

New: Convolutional LSTM for Sequence Classification

In train_LSTM.py we provide a convolutional LSTM for audio event recognition. Like vggish_train.py, it performs classification on mel spectrograms. In contrast to vggish_train.py, however, it does not classify each spectrogram individually but analyzes an array of matrices and performs a single classification on the entire sequence. C-LSTMs may outperform CNNs when the data only contains sparse, specific features that don't occur in every timestep.
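
To illustrate the idea, here is a rough tf.keras sketch of a convolutional LSTM that classifies a whole sequence of spectrogram patches at once; the layer sizes are illustrative and do not mirror train_LSTM.py exactly:

# Convolutional LSTM sketch: the same small CNN runs on every spectrogram in
# the sequence, an LSTM aggregates the timesteps, and one prediction is made
# for the entire sequence (sizes are placeholders).
import tensorflow as tf

SEQ_LEN, FRAMES, BANDS, NUM_CLASSES = 10, 96, 64, 10

model = tf.keras.Sequential([
    tf.keras.layers.TimeDistributed(
        tf.keras.layers.Conv2D(32, (3, 3), activation='relu'),
        input_shape=(SEQ_LEN, FRAMES, BANDS, 1)),
    tf.keras.layers.TimeDistributed(tf.keras.layers.MaxPooling2D((2, 2))),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Flatten()),
    tf.keras.layers.LSTM(128),                       # aggregate the sequence
    tf.keras.layers.Dense(NUM_CLASSES, activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy')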

Training

Docker

Get the container:

# GPU, needs nvidia-docker installed
docker pull obitech/jibjib-model:latest-gpu

# CPU
docker pull obitech/jibjib-model:latest-cpu

Create folders, if necessary:

mkdir -p output/logs output/train output/model input/data

Get the AudioSet checkpoint:

curl -o input/vggish_model.ckpt https://storage.googleapis.com/audioset/vggish_model.ckpt

Copy all training folders / files into input/data/

Get the bird_id_map.pickle:

curl -L -o input/bird_id_map.pickle https://github.com/gojibjib/voice-grabber/raw/master/meta/bird_id_map.pickle

Run the container:

docker container run --rm -d \
    --runtime=nvidia \
    -v $(pwd)/input:/model/input \
    -v $(pwd)/output:/model/output \
    obitech/jibjib-model:latest-gpu

To quickly start training, run:

# GPU
./train_docker.sh

# CPU
./train_docker.sh

Locally

Clone the repo:

git clone https://github.com/gojibjib/jibjib-model

Install dependencies (use Python 2.7):

# CPU training
pip install -r requirements.txt

# GPU training
pip install -r requirements-gpu.txt

Copy all training folders / files into input/data/

Get the AudioSet checkpoint:

curl -o input/vggish_model.ckpt https://storage.googleapis.com/audioset/vggish_model.ckpt

Get the bird_id_map.pickle:

curl -L -o input/bird_id_map.pickle https://github.com/gojibjib/voice-grabber/raw/master/meta/bird_id_map.pickle

Start training:

# Make sure to start the script from the code/ directory!
cd code
python ./vggish_train.py

You can then use modelbuilder.py to convert the model to a protocol buffer (SavedModel) so it can be deployed with TensorFlow Serving.
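
A sketch of such an export with the TF 1.x SavedModelBuilder API; the tensor names and export path are assumptions, see modelbuilder.py for the actual conversion:

# Export the latest checkpoint as a SavedModel protocol buffer for
# TensorFlow Serving (tensor names and paths are assumptions).
import tensorflow as tf

export_dir = 'output/model/1'  # Serving expects a numbered version directory

with tf.Session(graph=tf.Graph()) as sess:
    # Rebuild the graph and load the latest training snapshot.
    ckpt = tf.train.latest_checkpoint('output/train')
    saver = tf.train.import_meta_graph(ckpt + '.meta')
    saver.restore(sess, ckpt)

    graph = tf.get_default_graph()
    features = graph.get_tensor_by_name('vggish/input_features:0')
    prediction = graph.get_tensor_by_name('prediction:0')  # assumed output name

    signature = tf.saved_model.signature_def_utils.predict_signature_def(
        inputs={'features': features}, outputs={'prediction': prediction})

    builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
    builder.add_meta_graph_and_variables(
        sess, [tf.saved_model.tag_constants.SERVING],
        signature_def_map={'serving_default': signature})
    builder.save()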

Contributors

  • sebibi07
  • obitech
