columbia_e6891

Reproduction Package for Final Project of Columbia University course EECS E6891: Reproducing Computational Research. The work done is aimed at reproducing the paper 'Data mining applied to acoustic bird species recognition' by Vilches, E., Escobar, I. A., Vallejo, E. E., & Taylor, C. E. (2006, August). In Pattern Recognition, 2006. ICPR 2006. 18th International Conference on (Vol. 3, pp. 400-403). IEEE.

Software Pre-requisites. In order to run the project, you should:

Have Python installed. The project was build on Python 2.7.6

Have the following Python libraries installed: numpy, matplotlib, scipy
Have Matlab
Download Weka Softwate version 3.4.19 (http://www.cs.waikato.ac.nz/ml/weka/) from http://sourceforge.net/projects/weka/files/weka-3-4/

Data: Data is provided by Macaulay Library at Cornell University. There are two ways to proceed with the project in terms of data. 1) Use the original sound recordings which have been provided under license by the Macaulay Library, 2) Use .mat files, which already contain the segmented sound data and its MFCC features. Using the later approach will save a lot of time (as sounds will not have to be segmented) but .mat files take up about 15GB of space. Using the first approach requires less initial data, around 3 GB, but takes massive processing time. On my Macbook air, just running the segmentation algorithm on all calls took over 17 hours to complete! The scripts are designed to work with both the approaches, without any manual interference. There is, however, a third approach too, a reduced data set containing only the sound files which were used in the project. This is the data set which is being used in the reproduction instructions below.

#Step by step Instructions:

Clone the Github repository to your local computer

Download the zipped file bird_sounds_data_reduced.zip from (https://drive.google.com/file/d/0B1Ywmwt5sCj3TWpLcTNobFU4MkU/edit?usp=sharing) and put it in the source folder (directory where the script folder is), unzip bird_sounds_data_reduced.zip and rename the unzipped folder bird_sounds_data_reduced to bird_sounds_data.
Go to the directory named scripts
Open Matlab and run main_step1.m (wait for it to finish)
Run main_step2.py in Python
Classification (1,2,3 out of 5 approaches)
- Open Weka Software> Explorer> OpenFile> Select: 'features_nom_revised.csv'
- In Weka, go to Tab Explorer>Filters> select: unsupervised > attributes > numericToNominal, and apply!
- In Weka, go to Tab: Classification>Classifier>Open> Select: 'classifier/trees/id3; -> Start. Note down the accuracy *** IMPORTANT *** Copy the tree from the output of Weka (only the tree) and save it in new 'tree_id3.txt' file.
- In Weka, go to Tab: Classification>Classifier>Open> Select: 'classifier/trees/j48; -> Start. Note down the accuracy *** IMPORTANT *** Copy the tree from the output of Weka (only thetree) and save it in new 'tree_j48.txt' file.
- In Weka, go to Tab: Classification>Classifier>Open> Select: 'classifier/bayes/NaiveBayes; -> Start. Note down the accuracy
Dimensionality Reduction, carried out in Python script 'main_step3.py'. Please run main_step3.py to create two new .csv files, with reduced datasets using ID3 and J4.8. The the new datasets are named 'features_nom_id3.csv' and 'features_nom_j48.csv'
Classification (4, 5 approach)
- Open Weka Software> Explorer> OpenFile> Select: 'features_nom_id3.csv'
- In Weka, go to Tab Explorer>Filters> select: unsupervised > attributes > numericToNominal, and apply!
- In Weka, go to Tab: Classification>Classifier>Open> Select: 'classifier/bayes/NaiveBayes; -> Start. Note down the accuracy
- Open Weka Software> Explorer> OpenFile> Select: 'features_nom_j48.csv'
- In Weka, go to Tab: Classification>Classifier>Open> Select: 'classifier/bayes/NaiveBayes; -> Start. Note down the accuracy
Compare the accuracies of the five approaches to classification

If you have any questions, feel free to email at [email protected] Talha Jawad Ansari (tja2117), 2014.

teejays / columbia_e6891 Goto Github PK

columbia_e6891's Introduction

columbia_e6891

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent