
bat-detector-msds's Introduction

MSDS Capstone Project: Bats!!!

Overview

This repository offers the following features:

  • Detection of bat search, social, and feedbuzz calls
  • A fast, customizable pipeline for automating application of the aforementioned detectors

Usage

Setup

We recommend using Python 3.9.x. Other versions may work, but they're not tested. We also recommend creating a Python virtual environment.

From repository root, run the following commands:

git submodule update --init --recursive
pip install -r requirements.txt

Usage

The following invocation generates a TSV in output_dir containing all detections:

python src/cli.py audio.wav output_dir/

Additionally, you can specify the number of processes used to process the audio and generate a CSV instead of TSV:

python src/cli.py --csv --num_processes=4 audio.wav output_dir/
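To process several recordings, the CLI can be called once per file. The sketch below is a minimal illustration that only uses the flags shown above; `build_command` and `run_batch` are hypothetical helper names that are not part of this repository, and it assumes the script is run from the repository root.

```python
import subprocess
import sys
from pathlib import Path


def build_command(wav_path, output_dir, num_processes=4, csv=False):
    """Build the argument list for one src/cli.py invocation."""
    cmd = [sys.executable, "src/cli.py"]
    if csv:
        cmd.append("--csv")
    cmd.append(f"--num_processes={num_processes}")
    cmd += [str(wav_path), str(output_dir)]
    return cmd


def run_batch(input_dir, output_dir, **kwargs):
    """Run the CLI once for every .wav file in input_dir (hypothetical helper)."""
    for wav_path in sorted(Path(input_dir).glob("*.wav")):
        # check=True stops the batch as soon as one file fails.
        subprocess.run(build_command(wav_path, output_dir, **kwargs), check=True)
```

For example, `run_batch("recordings/", "output_dir/", csv=True)` would invoke the CLI for each recording in turn, where `recordings/` is a placeholder directory name.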

Analytics Configuration

All of the analytical parameters are accessible in src/cfg.py. Have a look!

Adding custom detectors

src/cfg.py is also where new, custom detectors can be added. To add your own detector to the pipeline:

  1. Create a new class in src/models/ that inherits from the DetectionInterface class in src/models/detection_interface.py
  2. Override DetectionInterface's run() and get_name() methods
  3. Add a call to your model's constructor to the models list in src/cfg.py, passing in any parameters the constructor needs.

The pipeline executes the run() method of every model present in the models list in src/cfg.py.
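The steps above can be sketched as follows. This is an illustration only: the `DetectionInterface` class here is a stand-in written for the example, and the argument and return type of `run()` are assumptions, so check src/models/detection_interface.py for the real signatures. `ExampleDetector` and its `threshold` parameter are hypothetical.

```python
from abc import ABC, abstractmethod


class DetectionInterface(ABC):
    """Stand-in for the real interface (assumed shape, not the repository's code)."""

    @abstractmethod
    def get_name(self):
        """Return a human-readable name for this detector."""

    @abstractmethod
    def run(self, audio_file):
        """Run detection on one audio file and return the detections."""


class ExampleDetector(DetectionInterface):
    """Hypothetical custom detector used only to show the wiring."""

    def __init__(self, threshold):
        self.threshold = threshold

    def get_name(self):
        return "ExampleDetector"

    def run(self, audio_file):
        # A real detector would analyse audio_file here.
        # This placeholder reports no detections.
        return []


# In src/cfg.py, the constructor call would be added to the models list.
models = [ExampleDetector(threshold=0.5)]
```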

Update feedbuzz detection templates

To identify feeding buzzes, this repository uses a technique called template matching. We provide an initial set of templates, stored in src/models/bat_call_detector/templates/template_dict.pickle, that should perform reasonably well on feeding buzzes from bats native to Seattle, Washington. The templates were generated with the following steps:

  1. Identify an individual feeding buzz in an audio recording, and manually note its time and frequency ranges.
  2. Run the generate_template() function in src/models/bat_call_detector/feed_buzz_helper.py to generate a template from the time and frequency ranges identified above.
  3. The template is saved in a pickle object.

To see which templates are stored in template_dict.pickle, run the load_template() function in src/models/bat_call_detector/feed_buzz_helper.py. The table below lists the templates used in the current pipeline.

Template | Audio File Name | Time (s) | Frequency (Hz)
-- | -- | -- | --
1 | 20210910_030000_time2303_LFbuzz.wav | (9.762, 10.059) | (14532.7, 29760.3)
2 | 20210910_033000.wav | (70.637, 71.328) | (19745, 28638.2)
3 | 20210910_033000.wav | (620.663, 620.854) | (12434.9, 29910.9)
4 | 20210910_033000.wav | (898.079, 898.368) | (11426.6, 25205.9)
5 | 20210910_030000.wav | (608.139, 608.452) | (14328.0, 30138.3)
6 | 20210910_030000.wav | (744.961, 745.0877) | (10375.5, 47430.83)
7 | 20210910_030000.wav | (1065.034, 1065.228) | (14328, 25691.7)
8 | 20211016_030000.wav | (1611.886, 1612.014) | (19214.9, 53801.6)
9 | 20211016_030000.wav | (1717.383, 1717.518) | (19762.8, 46442.7)
10 | 20211016_030000.wav | (1728.248, 1728.397) | (20751, 52865.6)

You can update these templates to improve the performance of feeding buzz detection by following the steps below. All the functions mentioned are in src/models/bat_call_detector/feed_buzz_helper.py.

  1. Run load_template() to inspect the existing templates.
  2. Run remove_template() to remove any unwanted templates.
  3. Run generate_template() to generate new templates and save them to an existing or new template dictionary.

Help

python src/cli.py --help

Deeper dive into the models inside the library

We built software that combines BatDetect2 and scikit-maad to improve the accuracy and efficiency of bat call and feeding buzz detection. The pipeline runs in parallel processes to increase efficiency.

BatDetect2

BatDetect2 is an open-source pipeline, based on a convolutional neural network, for detecting ultrasonic, full-spectrum, search-phase calls produced by echolocating bats. The model first converts a raw audio file into a spectrogram, then uses a sliding window method to identify the pieces of the spectrogram that contain bat calls.

BatDetect2_example

Example output of BatDetect2.

Scikit-maad (Spectrogram Template Matching)

Scikit-maad is a Python package that specializes in quantitative analysis of environmental audio recordings. Feeding buzzes and ordinary bat calls have different shapes in the spectrogram, and feeding buzzes have a stereotypical shape. We exploit this by using multiple feeding buzz templates with a template matching function provided in the package, which has proved effective at identifying feeding buzzes among bat calls.

BatCall_example

(a) A group of bat calls has a consistent frequency from call to call.

FeedingBuzz_example

(b) A feeding buzz is identified as a sudden dip in calls.

TemplateMatching_example

Example output of template matching from scikit-maad using only one template. The bounding boxes in the top image show the feeding buzzes identified. The chart below shows the correlation coefficient between this file and the template used. Note that the three peaks in the chart correspond to the bounding boxes in the top image.
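To illustrate the idea behind the correlation coefficient chart, here is a simplified sketch of template matching by sliding a template along the time axis of a spectrogram. It is not scikit-maad's implementation: the function names are hypothetical, spectrograms are plain lists of rows (frequency bins) by columns (time frames), and the template is assumed to span every frequency bin.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat window or template carries no shape information
    return cov / sqrt(var_a * var_b)


def match_template(spectrogram, template):
    """Return one correlation coefficient per time offset of the template."""
    width = len(template[0])
    flat_template = [v for row in template for v in row]
    n_frames = len(spectrogram[0])
    scores = []
    for start in range(n_frames - width + 1):
        window = [v for row in spectrogram for v in row[start:start + width]]
        scores.append(pearson(window, flat_template))
    return scores


def peaks_above(scores, threshold):
    """Indices of local maxima whose coefficient reaches the threshold."""
    last = len(scores) - 1
    return [
        i for i, s in enumerate(scores)
        if s >= threshold
        and (i == 0 or s >= scores[i - 1])
        and (i == last or s > scores[i + 1])
    ]
```

Each peak above the threshold marks a time offset where the spectrogram resembles the template, which is what the bounding boxes in the figure represent.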

Our model combines the results of multiple templates (10 templates) that are passed over each spectrogram. Because this results in many potential feeding buzz detections, we use a voting system among these detections to choose the final feeding buzz identifications. Our current voting threshold is 2.
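One way such a vote could work is sketched below. This is an assumption about the mechanism, not the repository's code: a detection is kept when detections from at least `min_votes` distinct templates overlap it in time, and the kept detections are then merged. The names `overlaps` and `vote`, and the interval representation, are hypothetical.

```python
def overlaps(a, b):
    """True when two (start, end) intervals share some time."""
    return a[0] < b[1] and b[0] < a[1]


def vote(detections_by_template, min_votes=2):
    """Keep intervals supported by at least min_votes templates, then merge them.

    detections_by_template maps a template id to a list of (start, end) tuples.
    """
    kept = []
    for intervals in detections_by_template.values():
        for interval in intervals:
            # Count templates (including the interval's own) that agree.
            votes = sum(
                any(overlaps(interval, other) for other in others)
                for others in detections_by_template.values()
            )
            if votes >= min_votes:
                kept.append(interval)

    merged = []
    for start, end in sorted(kept):
        if merged and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

With `min_votes=2`, a buzz found by a single template is discarded, while one found by two overlapping templates is reported once.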

Pipeline Workflow

The following diagram describes the overall pipeline of our model:

PipelineWorkflow

Analysis

Model Evaluation

We evaluate our model by calculating recall and precision on one audio wav file, 20210910_030000.wav, which contains more than 3000 bat calls.

Bat calls

One tunable parameter in the bat call model is the probability threshold, which is applied to the detection probability computed by the model. The higher the probability, the more confident the model is that the target is a bat call. We found that recall and precision for bat calls are best balanced around threshold=0.44, where both are around 0.85.
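As a reminder of how precision and recall depend on the threshold, here is a minimal sketch. It assumes each detection has already been compared against the manual annotations and marked as a true or false positive; the function name and the toy data are hypothetical and are not our evaluation results.

```python
def precision_recall(scored, n_true_calls, threshold):
    """Precision and recall when only detections at or above threshold are kept.

    scored is a list of (probability, is_true_positive) pairs.
    n_true_calls is the number of annotated calls in the recording.
    """
    kept = [is_tp for prob, is_tp in scored if prob >= threshold]
    true_positives = sum(kept)
    # With no detections kept, precision is undefined; 1.0 is used by convention.
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / n_true_calls if n_true_calls else 0.0
    return precision, recall


# Toy data for illustration only.
scored = [(0.9, True), (0.8, True), (0.5, False), (0.4, True), (0.2, False)]
for threshold in (0.1, 0.44, 0.85):
    print(threshold, precision_recall(scored, n_true_calls=4, threshold=threshold))
```

Raising the threshold usually trades recall for precision, which is why a balanced operating point is read off the precision-recall curve.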

PRCurve_BatCalls

Feeding buzzes

We combined threshold tuning with filtering of false positives using the results of the bat call pipeline, which improved our recall and precision from 0.25 to 0.6 using two templates (number of templates=2). The threshold that provides the most balanced outcome is 0.26. This threshold is applied to the correlation coefficient between the target and the template.

PRCurve_FeedingBuzzes

Results

Based on the table below, our pipeline increased precision by 73% and recall by 140% for bat call detection, and improved computation time by 10% for a 30-minute audio wav file.

ResultsTable

*The value for precision is not available for feeding buzzes because there is no labelled data in the manual process.

Computation time gains are calculated for the specific improvement that our sponsor will observe, so they should be interpreted with care. Here is why:

  1. Our sponsor currently uses RavenPro. On Mac, the software limits batch processing to approximately 16 files with 16GB RAM and 8 files with 8GB RAM. They were therefore forced to use a slower Linux machine to batch process the number of files they require. This is the machine they currently use, and it gives our baseline of 2 minutes 36 seconds per file.

  2. Our library can be run on any OS, including the faster Mac machine they have available. On a similar MacBook Pro M1X with 64GB RAM, it takes 2 minutes 12 seconds to run. This is the processing time they will observe per file.

Acknowledgements

Dr. Wu-Jung Lee -- University of Washington EchoSpace
Aditya Krishna -- University of Washington EchoSpace
Juan Sebastian Ulloa -- Author of scikit-maad
Oisin Mac Aodha -- Author of BatDetect2

bat-detector-msds's People

Contributors

ccharp, emcediel, kirsteenng

Forkers

aditya-uw

bat-detector-msds's Issues

Create a util.py for data download

In utils.py, create a data_download function with the following requirements.

Input: URL path/directory path where raw data resides
Output: the downloaded raw data, in the form of a directory.

Investigate Koogu repository

From Dr. Lee:

Forgot if I mentioned this but just thought of it: I think the package koogu can take in RavenPro detection table as inputs for training, and it is a complete framework for the detection from the look of it

Familiarise using RavenPro

Ensure a good understanding of the following workflow.

  1. Open .wav and .txt files using RavenPro.
  2. Know how to identify a specific time box in the .txt file.
  3. Know how to annotate and amend existing annotations.

Bat Detective! Understand capabilities of model and data

http://visual.cs.ucl.ac.uk/pubs/batDetective/index.html

Questions to answer:
Model:

  • Detect calls other than Search?
  • Y-values of bounding box seem nonsensical (0 to 288?) What does it mean?
  • Detection cuts early. Are there classification thresholds that can be adjusted?

Data:

  • Are non-search calls labeled at all?
  • Are non-search calls labeled as feeding and social?
  • Quality of training data?

Misc:

  • They used "citizen scientists" to label their data. Can we leverage their system?

Final exploration summary and next steps

Sharing all insights and next steps found during the bat call detection and classification exploration:

A. Bat call detection

The idea here is to be able to detect any type of call in any frequency range and from any type of bat species.
Raven Pro: We explored RavenPro's capabilities by trying out the band-limited detector. We noticed that it takes very long, and since the whole project should live within a single pipeline, RavenPro would be a problem.

Scikit-maad

We found a useful library that has most of the required capabilities for the project such as sound processing, segmentation and feature extraction. We believe this library will be very useful throughout the project. We found the following most useful for now:

  1. trim: a snipper that trims a sound file into smaller segments.
  2. find_rois(): a band-limited detector. It works at the same level as RavenPro but inside Python, which is a big plus.
  3. Spectrogram generation and plotting of bounding boxes.
  4. Shape and centroid feature extraction: from a set of bounding boxes, it extracts the shape and frequency characteristics into a set of features that can later be used for training any model.

batdetect

This model, based on the following paper: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005995, uses a CNN pre-trained on labeled calls from many different bat species around the globe. The authors have published open-source code here: https://github.com/macaodha/batdetect. It uses Python 2.7, and for visualizations they reference another library, AudioTagger, that also uses Python 2.7. For our project we have found the following:

  1. The code works, even though it requires Python 2.7, and it detects a good number of calls. On the 20221012_030000.wav file it detects a total of 23 calls (compared to the 42 detections found by RavenPro's Band-Limited Detector). On manual inspection, it misses some fainter calls but captures all of the clear calls, with no false positives. The results for this audio file are as follows:
    LabelStartTime_Seconds | LabelEndTime_Seconds | DetectorConfidence
    -- | -- | --
    48.152 | 48.153 | 0.977
    48.336 | 48.337 | 0.973
    48.534 | 48.535 | 1
    48.633 | 48.634 | 0.995
    48.732 | 48.733 | 1
    48.835 | 48.836 | 1
    48.939 | 48.94 | 1
    49.024 | 49.025 | 1
    49.184 | 49.185 | 0.955
    49.278 | 49.279 | 0.99
    49.372 | 49.373 | 1
    49.575 | 49.576 | 0.999
    49.674 | 49.675 | 0.998
    49.763 | 49.764 | 1
    49.764 | 49.765 | 0.998
    49.863 | 49.864 | 0.997
    49.966 | 49.967 | 0.953
    50.141 | 50.142 | 0.965
    50.348 | 50.349 | 1
    50.452 | 50.453 | 0.982
    50.65 | 50.651 | 0.995
    50.852 | 50.853 | 0.993
    51.573 | 51.574 | 0.978

It starts to recognize calls from here:
Untitled
But it misses multiple fainter calls such as these:
Untitled 2

  2. The GUI and plotter, AudioTagger, is old and we were not able to make it run.
  3. The output format is a .csv that lacks the y-axis (frequency) values needed to generate a bounding box, and the precision on the x-axis is on the order of milliseconds (1e-3 seconds), which could chop off part of the horizontal extent of a bounding box.

batdetect2

We found that the authors of batdetect have recently released a new version called batdetect2. The open-source code can be found here: https://github.com/macaodha/batdetect2, and it is explained and used in the following paper: https://www.biorxiv.org/content/10.1101/2022.12.14.520490v1. This version includes multiple enhancements, such as:

  1. They use a transformer that considers the time aspect of bat calls, claiming to have much better accuracy.
  2. It uses Python 3.10
  3. They increase their dataset for training to include more species and countries
  4. Their prediction now includes a potential species (class prediction)
  5. They develop their own easy and intuitive GUI (Python 3.10) to annotate new audio as well as visualize results.
  6. Their visualizations and output include detailed bounding boxes.
  7. They have developed a Colab version of their model to run right away

For our project, we were able to easily clone this repository and run the pre-trained model. The results are also good: for the same sample audio file, 20221012_030000.wav, it identifies 59 calls, including all of the ones that batdetect missed. All or almost all of them look like bat calls (we need Wu-Jung or Aditya to confirm), and the ones that look less like bat calls clearly have lower probabilities, so a higher threshold would let us remove these FPs. There are no FNs in this case. Excerpts of this detection are shown below:
Clearest part detection:
Untitled 6
We observe that these detections might not be bat calls, but they are assigned lower probabilities, as shown by the brightness of the boxes:

Untitled 7

This model is robust, easy to run and fast when run with excerpts of less than 10 seconds.

B. Bat call classification

Even though batdetect2 is the best tool we have found for detecting bat calls, it is robust only for social and search calls and has not proven able to detect feed buzzes.

We are currently developing an idea to detect feed buzzes independently, using template matching from the scikit-maad library. If this detection works, we will be able to distinguish feed buzzes from social and search calls, giving us an automatic segmentation between these two types of calls. The preliminary results for this model on one audio file with 3 feeding buzzes (20210910_030000_time1220_HFbuzz_LF.wav) are shown below:

unnamed

The template matching was able to identify the 3 feed-buzzes using the first one as the template to recognize the following two.

C. Pipeline

We will initially implement these models into a Pipeline that approximately looks like this (subject to change):

  1. The user inputs a 30-minute audio file.
  2. The audio file is cut into 30 smaller 1-minute audio files using scikit.maad.trim().
  3. Each of the 30 files is processed for social or search call detection using batdetect2 (labeled 'social or search call') and for feed buzz detection using template matching or an alternative feed buzz detection model (labeled 'feed buzz').
  4. All detections are re-contextualized to the 30-minute span of the audio file when merged together into a single .txt detection and classification file.
  5. This .txt file is output for the user.
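The segmentation and re-contextualization steps can be sketched as below. This is an illustration of the arithmetic only: the function names and the (start, end, label) detection format are assumptions, and the actual audio trimming and detection are left out.

```python
def segment_bounds(total_seconds, segment_seconds=60.0):
    """Return (start, end) times, in seconds, for each segment of the recording."""
    bounds = []
    start = 0.0
    while start < total_seconds:
        bounds.append((start, min(start + segment_seconds, total_seconds)))
        start += segment_seconds
    return bounds


def recontextualize(detections_per_segment, bounds):
    """Shift segment-relative detection times back onto the full recording.

    detections_per_segment[i] holds (start, end, label) tuples whose times are
    relative to the beginning of segment i.
    """
    merged = []
    for (segment_start, _), detections in zip(bounds, detections_per_segment):
        for start, end, label in detections:
            merged.append((segment_start + start, segment_start + end, label))
    return sorted(merged)


# A 30-minute file yields 30 one-minute segments.
bounds = segment_bounds(30 * 60.0)
per_segment = [
    [(1.5, 1.75, "feed buzz")],
    [(0.25, 0.5, "social or search call")],
]
merged = recontextualize(per_segment, bounds[:2])
```

A detection at 0.25 s in the second segment ends up at 60.25 s in the merged output, since that segment starts 60 s into the recording.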

D. Call with Juan Ulloa (creator of scikit-maad)

We were able to contact the creator of scikit-maad, who kindly dedicated an hour to answering our questions. This summarizes what we found:

Q: Batdetect2 is not robust against feed buzzes. What do you recommend we do?
A: He recommends template matching, a function that is still in the development code of scikit-maad but is stable, so he told us to use it by accessing that branch directly.

Q: What are centroid and shape features, how and for what can we use them?
A: Shape features are based entirely on the bounding box: they describe numerically what we observe within it. Centroid features tell us where in the spectrogram (y-axis) the bounding box sits. Combining both gives a lot of information to describe a bounding box. This data can be fed into classifiers, ML models, etc. There are 3 presets, 'Low', 'Med' and 'Hi', which set the resolution of the shape features. A higher resolution describes the shape better (more angles; "you can think of them as more convolutional filters") but is much heavier (larger file). This link helps to understand this more: https://scikit-maad.github.io/generated/maad.features.opt_shape_presets.html#maad.features.opt_shape_presets

Q: How do we plot bounding boxes with maad?
A: Follow this tutorial page: https://scikit-maad.github.io/_auto_examples/1_basic/plot_find_rois_simple.html#sphx-glr-auto-examples-1-basic-plot-find-rois-simple-py

Q: What model would you recommend if we go for supervised learning with annotated datasets?
A: He recommends the following paper: https://www.sciencedirect.com/science/article/pii/S1574954120300637?via%3Dihub

Q: Is there an example to be able to do template matching?
A: Yes, we have an example in the production branch: https://github.com/scikit-maad/scikit-maad/tree/production/example_gallery/1_basic

Last mile TODOs

  • Parallel processing capability (@ccharp)
  • Create runtime comparison after parallel processing (@ccharp)
  • Output TSV in format consumable by RavenPro (@ccharp)
  • Fix "There is an error" bug occurring in audio segmenter @ccharp
  • Investigate extraneous output from CLI application and remove (@ccharp )
  • Try putting all code inside of src directory @ccharp
  • Clear out all TODOs in the current repo (@ccharp )
  • Move template_dict.pickle to be local with the code that uses it (similar to BatDetect2's model)
  • Create conclusions around FP and FN, what do these detections have in common? ( @Kirsteenng and @emcediel )
  • Add results, graphs and conclusions to poster (due Wednesday) ( whole team )
  • Verify that Feed Buzz model works End-to-End (@ccharp and @emcediel )
  • Verify that FP removal is at the right level and works (@ccharp and @emcediel )
  • Create PR to add FeedBuzz and FP removal ( @emcediel )
  • Make random subsample of FP and modify FP results with a better estimate of true FP (annotator error) ( @Kirsteenng and @emcediel )
  • Check if Feed Buzz FN are truly FN (send Wu - Jung pictures) ( @Kirsteenng and @emcediel )
  • Write documentation for updating feed buzz detection templates and scikit-maad library (@Kirsteenng )
  • In-line documentation for all functions involved in the template matching pipeline
  • Write documentation on analysis and results (@emcediel )
  • Write documentation on how to batch process multiple files using bash. (@ccharp )

Please let me know any other TODOs to include in this list!
