
Bird Sound Recognition

A Python app that identifies Indian bird sounds and songs. The motivation for building it came from a personal experience during the COVID-19 pandemic lockdown: I spent a lot of time at home, saw and heard many different birds in my suburb, and slowly became fond of them. That gave me the idea of building a bird sound classifier using machine learning techniques.

Data Collection

I collected recordings of the 10 bird species most commonly found in India from the excellent xeno-canto website (https://www.xeno-canto.org/), using a very useful download script available at https://github.com/AgaMiko/xeno-canto-download.

Initially I collected all bird recordings for the region India, but later had to limit the collection to the 10 most common species, because learning was difficult with 997 different classes and a very imbalanced class distribution in the dataset.
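A minimal sketch of limiting a dataset to its most frequent classes, assuming each downloaded recording is represented as a (file path, species label) pair. The function name `keep_most_common` and the example labels are hypothetical, not taken from the project:

```python
from collections import Counter


def keep_most_common(recordings, k=10):
    """Keep only recordings whose label is one of the k most frequent labels.

    `recordings` is a list of (file_path, species_label) pairs (a hypothetical format).
    """
    counts = Counter(label for _, label in recordings)
    top = {label for label, _ in counts.most_common(k)}
    return [(path, label) for path, label in recordings if label in top]


# Example with made-up labels: "sparrow" is the rarest class, so it is dropped for k=2.
recs = [("a.mp3", "koel"), ("b.mp3", "koel"), ("c.mp3", "myna"),
        ("d.mp3", "sparrow"), ("e.mp3", "myna")]
print(keep_most_common(recs, k=2))
# [('a.mp3', 'koel'), ('b.mp3', 'koel'), ('c.mp3', 'myna'), ('e.mp3', 'myna')]
```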

Data Preprocessing

I relied heavily on the librosa library for data wrangling and preprocessing.

I performed the following preprocessing steps:

  1. Converted all the MP3 files to WAV format
  2. Removed silent parts from the audio
  3. Split long recordings into chunks of 10 seconds each
  4. Computed Mel-Frequency Cepstral Coefficients (MFCCs), a small set of features (usually about 10-20) that concisely describe the overall shape of the spectral envelope, to be used as input to an Artificial Neural Network
  5. Generated Mel spectrograms, which show how the frequency content of a signal changes over time, with the frequency axis converted to the mel scale (a perceptual scale that approximates how humans hear pitch). These images were then used as inputs to a Convolutional Neural Network.
  6. Performed data augmentation on the WAV files using the nlpaug library, increasing loudness, shifting the signal, and adding white noise to generate more inputs.

Modelling using Artificial Neural Network

To handle the imbalance in class ratio, I performed manual stratified sampling of the data points.
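The README does not say exactly how the sampling was done. One common reading of stratified sampling for imbalance is equal allocation: draw the same number of examples from every class, by default as many as the smallest class has. A NumPy sketch under that assumption (the name `balanced_sample` is hypothetical):

```python
import numpy as np


def balanced_sample(X, y, per_class=None, seed=0):
    """Equal-allocation stratified sampling: the same number of rows from every class.

    With per_class=None, each class is undersampled to the size of the smallest class.
    A per_class larger than some class's size raises an error (sampling is without replacement).
    """
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n = counts.min() if per_class is None else per_class
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n, replace=False) for c in classes
    ])
    rng.shuffle(idx)  # avoid blocks of a single class in the output
    return X[idx], y[idx]
```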

  • I used the Sequential model from the Keras library to build the network.
  • Architecture: 1 input layer, 2 hidden layers, 1 output layer
  • The ReLU activation function was used for all layers except the output layer, which used softmax to perform multi-class classification.
  • Dropout regularization with keep_prob set to 0.5 was applied to every layer to prevent overfitting.
  • The Adam optimizer was used with "categorical_crossentropy" as the loss function.

Unfortunately, I couldn't get good accuracy with the MFCC features, so I decided to switch to a Convolutional Neural Network with Mel spectrogram images as input.
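A Keras sketch of the ANN described in the bullets above. The layer widths (256, 128, 64) and the 20-feature input size are assumptions, since the README does not give them. Keras's `Dropout` takes the fraction of units to drop rather than keep_prob, so keep_prob = 0.5 corresponds to `Dropout(0.5)`:

```python
import tensorflow as tf


def build_mfcc_ann(n_features=20, n_classes=10, units=(256, 128, 64), keep_prob=0.5):
    """Dense network on MFCC vectors: input layer, two hidden layers, softmax output."""
    rate = 1.0 - keep_prob  # Keras Dropout takes the fraction to DROP, not to keep
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(units[0], activation="relu"),  # "input" layer
        tf.keras.layers.Dropout(rate),
        tf.keras.layers.Dense(units[1], activation="relu"),  # hidden layer 1
        tf.keras.layers.Dropout(rate),
        tf.keras.layers.Dense(units[2], activation="relu"),  # hidden layer 2
        tf.keras.layers.Dropout(rate),
        tf.keras.layers.Dense(n_classes, activation="softmax"),  # one probability per species
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

Because the loss is categorical cross-entropy, the labels must be one-hot encoded, for example with `tf.keras.utils.to_categorical(labels, num_classes=10)`, before calling `model.fit`.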

Modelling using Convolutional Neural Network

I used transfer learning on top of the existing VGG-16 architecture; an overview of that architecture is available at https://www.geeksforgeeks.org/vgg-16-cnn-model/
The basic idea is to take advantage of a model pre-trained on a large Computer Vision dataset, which has already learned many useful features. We remove the final fully connected layers of the pre-trained model and add our own small network with a softmax output layer for multi-class classification. The output of the truncated VGG-16, called bottleneck features, then serves as the input for our classification task.

  • Split the images 70-25-5 into separate train, validation, and test folders, each containing one subfolder per class.
  • Generated bottleneck features from the pre-trained VGG-16 model and saved them.
  • Loaded the train, validation, and test data from the saved bottleneck features.
  • Defined my own network to which the bottleneck features are fed.
  • Used an architecture similar to the earlier ANN, with changes to the Dropout keep_prob values and layer sizes.
  • The Adam optimizer was used with "categorical_crossentropy" as the loss function.

Thanks to transfer learning, I was able to achieve 90% accuracy even though I had relatively few inputs per class (around 600-700).
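A sketch of the bottleneck-feature workflow with `tf.keras.applications.VGG16`, assuming the Mel spectrogram images are loaded as RGB arrays with pixel values in 0-255. The helper names, the 256-unit hidden layer, and keep_prob = 0.5 in the head are assumptions, not the project's actual settings:

```python
import numpy as np
import tensorflow as tf


def extract_bottleneck(images, weights="imagenet", batch_size=32):
    """Run images through VGG-16 without its fully connected top; return the conv features."""
    x = np.array(images, dtype="float32")  # copy, because preprocess_input may work in place
    base = tf.keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=x.shape[1:]
    )
    # VGG-16 expects ImageNet-style preprocessing (BGR channel order, mean subtraction).
    x = tf.keras.applications.vgg16.preprocess_input(x)
    return base.predict(x, batch_size=batch_size, verbose=0)


def build_head(feature_shape=(7, 7, 512), n_classes=10, units=256, keep_prob=0.5):
    """Small classifier trained on the saved bottleneck features."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=feature_shape),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(units, activation="relu"),
        tf.keras.layers.Dropout(1.0 - keep_prob),  # Keras takes the drop rate
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model
```

For 224x224 inputs the bottleneck features have shape (7, 7, 512). They can be cached with `np.save` and reloaded with `np.load`, so the expensive VGG-16 forward pass runs only once per split while the small head is trained repeatedly.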

Future Scope

Currently, I'm working on an Android application that will record audio in real time and predict the bird species from its sound.
I have already taken steps to convert the Keras model from HDF5 format to TFLite format for use in the app.
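The HDF5-to-TFLite conversion can be sketched with TensorFlow's `tf.lite.TFLiteConverter`; the file paths and the helper name `h5_to_tflite` are hypothetical:

```python
import os
import tensorflow as tf


def h5_to_tflite(h5_path, tflite_path):
    """Load a Keras model saved as HDF5 and write it out as a TFLite flatbuffer."""
    model = tf.keras.models.load_model(h5_path)
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    tflite_model = converter.convert()  # serialized model as bytes
    os.makedirs(os.path.dirname(os.path.abspath(tflite_path)), exist_ok=True)
    with open(tflite_path, "wb") as f:
        f.write(tflite_model)
    return tflite_path
```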
Thanks for reading and stay tuned for further updates.
