orcasound / aifororcas-livesystem

Real-time AI-assisted killer whale notification system (model and moderator portal) :star:

Home Page: http://orcahello.ai4orcas.net/

License: MIT License

C# 58.19% HTML 6.56% CSS 22.49% JavaScript 0.85% Python 11.82% Dockerfile 0.09%
artificial-intelligence audio-analysis audio-processing bioacoustics deep-learning inference machine-learning marine marine-biology neural-network orcas real-time realtime whales

aifororcas-livesystem's People

Contributors

aarjukumar, aayushmnit, akashmjn, benjamintdk, catskids3, danielpickens, dependabot[bot], dhananjaypurohit, esolor, micowan, micya, molkree, nithya4, pastorep, prakruti, rawlink, salsal97, sasgilmer, scottveirs, truashamu

aifororcas-livesystem's Issues

Change subscriber notifications to be sent less frequently

We would like to reduce the number of notifications subscribers receive.

  1. Subscribers should get a notification when orcas are first confirmed.
  2. Subscribers should continue to get notifications every 15 minutes while orcas are still there.
  3. Subscribers should get a notification when orcas are no longer there.

Some state tracking might be necessary across function runs.

Moderator emails should still go out immediately for every potential detected orca clip.
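
The three rules above amount to a small state machine whose state has to survive between function runs. A minimal sketch of that logic follows. The names `NotifierState` and `next_notification` are hypothetical, and how "orcas are still there" gets decided (the `confirmed_now` flag) is an assumption the issue leaves open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Reminder cadence taken from rule 2 of the issue.
REMINDER_INTERVAL = timedelta(minutes=15)


@dataclass
class NotifierState:
    """State to persist between function runs (hypothetical shape)."""
    orcas_present: bool = False
    last_sent: Optional[datetime] = None


def next_notification(state: NotifierState, confirmed_now: bool, now: datetime) -> Optional[str]:
    """Return which subscriber notification to send on this run, or None.

    `confirmed_now` is assumed to mean "a moderator-confirmed detection is
    currently active"; the issue does not define how that is determined.
    """
    if confirmed_now and not state.orcas_present:
        # Rule 1: orcas first confirmed.
        state.orcas_present = True
        state.last_sent = now
        return "first_confirmed"
    if confirmed_now and state.orcas_present:
        # Rule 2: remind every 15 minutes while orcas remain.
        if now - state.last_sent >= REMINDER_INTERVAL:
            state.last_sent = now
            return "still_present"
        return None
    if not confirmed_now and state.orcas_present:
        # Rule 3: orcas are no longer there.
        state.orcas_present = False
        state.last_sent = now
        return "departed"
    return None
```

In a real deployment the state object would need to be loaded from and saved to durable storage on each run; that part is omitted here.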

Add a python function that given a date-range, downloads audio in a range from Orcasound's S3

There is a class to do this, see DateRangeHLS
https://github.com/orcasound/aifororcas-livesystem/tree/main/InferenceSystem/src/hls_utils

See https://github.com/orcasound/aifororcas-livesystem/blob/main/InferenceSystem/src/PrepareDataForPredictionExplorer.py for usage.

Add support + error handling to make sure only date ranges < a certain time period are allowed.
Add support for randomly sampling audio segments within the date range.
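
The two additions requested above (a cap on the date range and random sampling within it) could be sketched independently of the download step. The function names, the 24-hour default limit, and the fixed segment length are all assumptions for illustration; none of this uses or describes the actual DateRangeHLS interface.

```python
import random
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Assumed limit; the issue only says "a certain time period".
MAX_RANGE = timedelta(hours=24)


def validate_date_range(start: datetime, end: datetime,
                        max_range: timedelta = MAX_RANGE) -> None:
    """Raise ValueError unless start < end and the range is shorter than max_range."""
    if end <= start:
        raise ValueError("end must be after start")
    if end - start >= max_range:
        raise ValueError(f"date range must be shorter than {max_range}")


def sample_segments(start: datetime, end: datetime, segment_seconds: int,
                    count: int, seed: Optional[int] = None
                    ) -> List[Tuple[datetime, datetime]]:
    """Pick `count` distinct, non-overlapping segments at random within the range."""
    validate_date_range(start, end)
    num_slots = int((end - start).total_seconds() // segment_seconds)
    if count > num_slots:
        raise ValueError("range is too short for the requested number of segments")
    rng = random.Random(seed)
    slots = sorted(rng.sample(range(num_slots), count))
    seg = timedelta(seconds=segment_seconds)
    return [(start + i * seg, start + (i + 1) * seg) for i in slots]
```

Each returned (start, end) pair could then be handed to whatever performs the actual download.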

Moderator candidates silent on iOS playback

On my iPhone in Chrome (and in Safari and Firefox) I can log into the moderator site, but playback in the default or detailed view of a candidate makes no sound. This makes it impossible to review candidates when I’m away from my laptop, other than by visually inspecting the spectrograms.

However, the play button icon does toggle to pause, and the vertical white lines indicating predictions in the detail view are rendered. See screenshot:

562CE5B3-FE14-4E9F-85BC-C0FFA4298593

This is not expected behavior for WAV format playback on iOS, at least in Chrome; see https://en.m.wikipedia.org/wiki/HTML5_audio#Supported_audio_coding_formats

Add dashboard for sendgrid metrics

SendGrid presents us with a lovely dashboard showing email metrics:

sendgrid dashboard

However, the dashboard only lives inside SendGrid, which makes accessing the dashboard more difficult. We would like to replicate the dashboard in Azure.

This should be possible by using Event Webhooks.
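
As a rough illustration of the webhook approach: SendGrid's Event Webhook POSTs a JSON array of event objects, each carrying an `event` field such as `delivered` or `bounce`. A receiving function could tally those per batch before forwarding the counts to whatever Azure metrics store is chosen. The function name and the sample payload below are made up for the sketch.

```python
import json
from collections import Counter


def tally_events(payload: str) -> Counter:
    """Count events by type in one Event Webhook request body (a JSON array)."""
    events = json.loads(payload)
    return Counter(e.get("event", "unknown") for e in events)


# Illustrative payload only; addresses and timestamps are invented.
sample = json.dumps([
    {"email": "a@example.com", "timestamp": 1636500000, "event": "processed"},
    {"email": "a@example.com", "timestamp": 1636500005, "event": "delivered"},
    {"email": "b@example.com", "timestamp": 1636500006, "event": "delivered"},
    {"email": "c@example.com", "timestamp": 1636500007, "event": "bounce"},
])
counts = tally_events(sample)
```

How the counts are stored and charted in Azure is a separate decision that this sketch does not cover.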

OrcaHello false negative event: SRKWs at Orcasound Lab on 11/9/21

For this brief (<10 min-long) event, I received no notification as a moderator, nor did the OrcaHello system seem to create any candidates that are visible within the moderator portal.

Having reviewed much of the continuous recording, I believe there are many SRKW calls that would have been detected by the system as it was performing in late 2020 and early 2021. The signal to noise ratio was intermediate and many calls were 100% recognizable as those from SRKWs.

Which hydrophone?
Orcasound Lab (Haro Strait)

When did the KW event start?
11/9/2021 | 15:48:00

When did the KW event end?
11/9/2021 | 15:57:00

What was the nature of the event?
About 8 minutes of SRKW signals at intermediate to high SNR. Signals included an unusually wide variety of SRKW calls. An Orcasound listener annotated the live audio data at 15:58:16. I reviewed continuous recording and heard lots of calls, nearly continuous sometimes and often overlapping. I noted a surprisingly high % of S10 excitement calls, some S01s, S16s, S17s?, and 3 S04s at the end.

Was the container running? (If you have this info?)
TBD

Here is a link to the continuous recording (HLS segments transcoded to .ogg and .mp3). There is also a set of preliminary annotations there in Audacity label track format.

Automated re-training with known negative samples: Launch script

Audio such as chain clinking, boat noise, or high-frequency sounds can sometimes confuse our system.
This feature request is for a moderator to launch an automated re-training of a model when we find new instances of false positives.

Scoped to only inanimate false positives for now though the actual mechanism would be independent of what data is considered false positive.

Ideally, this feature enables the creation of a PR that updates the model.

Batch annotation

On some days there are many candidates containing only false positives (e.g. 20-40 candidates containing only Pigeon Guillemot bird calls). In such cases -- when the moderator is confident that no SRKW were present at the time and location of the false positives -- it would be helpful to be able to "Select all" within a page view or a time interval, and then annotate that temporal batch of candidates with the same labels and comment.

To ensure high-quality annotation of acoustic bouts containing SRKW, each candidate in a bout should be processed separately, rather than in a batch. This would increase odds that every true positive is confirmed, false negatives are flagged, and that call types and "special" signals like whistles or buzzes will be annotated accurately.

Add SMS to notification system

Add functionality to notify moderators and subscribers via SMS. Maybe use Azure Communication Services?

  1. Add subscribe/unsubscribe functionality for moderators and subscribers
  2. Figure out how to store user info properly in Azure Tables
  3. Send SMS messages as appropriate

Candidate metadata states 4 detections, but only one displayed

In at least one case during first year of deployment, the metadata for a candidate stated four detections were contained in the candidate. Yet, upon examination of the model predictions via the "Details" button, there appears to be only one detection (overlaid white bounding lines) near the end of the 60-second clip.

I've noticed this only once (see candidate in question), but tagged it as related to this potential "bug" within the Cosmos DB of annotations.

incomplete data problem

The system normally processes one minute at a time. However, sometimes data are missing, so that only 40-50 seconds are received. When this happens, the bars indicating when detections occurred are misaligned. This is a rare problem and has little impact on the function of the system, so is a low priority from my perspective, but could be important for retraining runs.

Sorting problem

When sorting all in ascending or descending order, year is not included in the sort. I.e., December, 2020 is treated as after January, 2021.

Due to the large number of detections accumulated over the last year, it would be helpful to be able to jump to a specific date and time rather than starting at the beginning or end and working toward the middle.
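
Both requests come down to comparing full timestamps rather than month/day strings. The sketch below is in Python purely for illustration (the moderator portal itself is not Python), and the timestamp format and function names are assumptions. It sorts with the year included and then finds the first candidate at or after a chosen date.

```python
from bisect import bisect_left
from datetime import datetime
from typing import List

# Assumed display format, e.g. "12/20/2020 09:30:00".
TIMESTAMP_FORMAT = "%m/%d/%Y %H:%M:%S"


def parse_timestamp(text: str) -> datetime:
    return datetime.strptime(text, TIMESTAMP_FORMAT)


def sort_candidates(timestamps: List[str], descending: bool = False) -> List[str]:
    """Sort on the parsed datetime so the year takes part in the ordering."""
    return sorted(timestamps, key=parse_timestamp, reverse=descending)


def index_at_or_after(sorted_ascending: List[str], target: datetime) -> int:
    """Index of the first candidate at or after `target` in an ascending list."""
    keys = [parse_timestamp(t) for t in sorted_ascending]
    return bisect_left(keys, target)
```

With this ordering, a candidate from December 2020 sorts before one from January 2021, and the index lookup gives a starting point for a "jump to date" control.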

Moderator UI slow upon submission

It seems the delay is slowly increasing between a push of the "Submit" button and the refreshed moderator UI view (next candidate ready for annotation). As of today (8/19/21) it is about 9 seconds. This makes bulk annotation much less efficient than it could be...

FWIW: this delay is based on a MacBook Pro (OSX 10.15.5) running the Chrome browser -- Version 92.0.4515.159 (Official Build) (x86_64).

Add dashboard for notification system metrics

Add dashboard for Azure Functions execution count for SendModeratorEmail and SendSubscriberEmail. This will allow us to have a "single pane of glass" to identify failures in the pipeline.

OrcaHello false negative event: SRKWs at Orcasound Lab on 11/2/21

For this event, I received no notification as a moderator, nor did the OrcaHello system seem to create any candidates that are visible within the moderator portal. Having reviewed much of the continuous recording, I believe there are many SRKW calls that would have been detected by the system as it was performing in late 2020 and early 2021.

  • Which hydrophone?

Orcasound Lab (Haro Strait)

  • When did the KW event start?

11/2/2021 | 14:05:20

  • When did the KW event end?

11/2/2021 | 16:15

  • What was the nature of the event?

More than 2 hours of SRKW signals at intermediate to high SNR. Signals included an unusually wide variety of SRKW calls; Monika Wieland estimated hearing about 2/3 of the SRKW repertoire, whereas we typically hear <1/4. There were

  • Was the container running? (If you have this info?)

TBD

I will provide a link to a blog post presenting the continuous recording (HLS segments transcoded to .ogg and .mp3). For now, the start/stop date-times are listed in the shared Orcasound annotation candidate spreadsheet.

Add axes to spectrograms

As a bioacoustician end-user (OrcaHello moderator persona),
In the default moderator view of candidates and the Detail view of candidates with overlain detections,
I would like to see the axes of the spectrogram, specifically frequency (in Hz) and time (in seconds) with ticks and labels.

Shared logging for inference system

The inference system currently runs in Azure Container Instances. It restarts every once in a while, which is expected. However, logs are printed to stdout and lost when the container restarts.

We would like to save logs externally so they are available beyond container restarts.
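
One low-effort option, sketched here with only the Python standard library, is to keep printing to stdout while also writing to a file on storage that outlives the container. The function name and the idea of pointing `log_path` at a mounted persistent volume are assumptions; shipping logs to a hosted log service would be an equally valid route and is not shown.

```python
import logging
import sys
from logging.handlers import RotatingFileHandler


def build_logger(log_path: str, name: str = "inference") -> logging.Logger:
    """Log to stdout and to a rotating file at `log_path`.

    `log_path` is assumed to sit on storage that survives container restarts.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if logger.handlers:
        # Already configured; avoid attaching duplicate handlers.
        return logger
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    stream_handler = logging.StreamHandler(sys.stdout)
    stream_handler.setFormatter(fmt)
    file_handler = RotatingFileHandler(log_path, maxBytes=5_000_000, backupCount=3)
    file_handler.setFormatter(fmt)
    logger.addHandler(stream_handler)
    logger.addHandler(file_handler)
    return logger
```

Existing `print` calls would need to be switched to `logger.info(...)` and similar for this to capture them.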

Update data flow schematic for README

As of Sept 2022, the data schematic in the aifororcas-livesystem README indicates the inference model is deployed via Azure Container Instance service. It currently looks like this:

SystemOverview

To be more accurate, the current image should be modified to instead indicate that the inference system is deployed via Azure Kubernetes Service (AKS).

I'm not sure where this graphic lives (in the Microsoft hackathon Teams workspace?) or in what software it was created. Ideally it would be stored within the repo in a format that could be edited easily in the future, rather than only as an image file.

After updating the file, the README should be proof-read to ensure deployment descriptions are consistent with the new version of the schematic.

Add rationale for 2.45 second window to README

A good question was raised on a call with Canadian open source collaborators today (HALLO project, #ai4orcas-hallo in Orcasound Slack), some of whom have been experimenting with different window durations in developing a binary classifier for SRKW+Bigg's+NRKW+offshore ecotypes of killer whales in the NE Pacific (with habitat in BC, Canada, coastal environments):

Why did Pod.Cast and OrcaHello elect to use a 2.45 second window?

It would be ideal to recall the rationale and add it to the README.MD file.

On the call, I said I thought it was due to the statistics of SRKW call duration, but I'm not seeing the 2.45 second (or 2450 millisecond) value in Orcasound's shared spreadsheet of SRKW.

Add heartbeat/monitoring dashboard for inference system

Historically, troubleshooting inference system or notification system failures has involved manual steps to identify them. A past hackathon focused on using Azure Dashboards to surface some metrics from Log Analytics. However, Azure Dashboards is difficult for non-technical observers to use.

I'd like to look into setting up something separate from Azure for monitoring purposes. It can either be a self-developed application or an existing monitoring solution (prometheus?). It should show at minimum:

  • Heartbeats from inference system instances
  • Line chart for Cosmos DB read/write metrics
  • Line chart for Azure function executions
  • Line chart for SendGrid emails sent
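
For the heartbeat item in particular, whichever tool ends up drawing the charts, the core check is just "which instances have not reported recently". A minimal sketch, with a hypothetical function name and an assumed 5-minute staleness threshold:

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stale_instances(last_heartbeat: Dict[str, datetime], now: datetime,
                    max_age: timedelta = timedelta(minutes=5)) -> List[str]:
    """Return names of instances whose latest heartbeat is older than max_age."""
    return sorted(name for name, seen in last_heartbeat.items()
                  if now - seen > max_age)
```

The heartbeat timestamps themselves would have to be emitted by each inference instance and collected somewhere; that plumbing is not covered here.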

Ascertain % of known SRKW transits missed by OrcaHello

Prakruti asked @scottveirs, @dbaing17, and others to work out how many times SRKWs were missed by the live inference system over the first year of deployment and beta-testing.

Tasks:

  • Review and consistently complete Orcasound event spreadsheet (including tally of when humans and/or machines detected SRKWs)
  • Double-check that all known SRKW events are included (using Orca Network sighting reports and OBI/etc FB accounts, and any other available "ground truth" opportunities).
  • Compute percentages of total transits detected/missed by OrcaHello (and the human listening network, and both combined).
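
Once the spreadsheet has one row per known SRKW event with a flag for machine detection and a flag for human detection, the last task is simple arithmetic. A sketch, assuming each event is reduced to a `(detected_by_machine, detected_by_human)` pair; the function name and the output keys are illustrative:

```python
from typing import Dict, Iterable, Tuple


def detection_rates(events: Iterable[Tuple[bool, bool]]) -> Dict[str, float]:
    """Percentages of events detected by OrcaHello, by humans, and by either.

    Each event is a (detected_by_machine, detected_by_human) pair.
    """
    rows = list(events)
    if not rows:
        raise ValueError("no events to score")
    n = len(rows)
    machine = sum(1 for m, _ in rows if m)
    human = sum(1 for _, h in rows if h)
    either = sum(1 for m, h in rows if m or h)
    return {
        "machine_detected_pct": 100.0 * machine / n,
        "machine_missed_pct": 100.0 * (n - machine) / n,
        "human_detected_pct": 100.0 * human / n,
        "combined_detected_pct": 100.0 * either / n,
    }
```

Events during which the system was known to be down may deserve a separate tally, since they say nothing about model performance.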

Questions:

  • On 8/28/21 around 22:35 did OrcaHello miss Bigg's transit of Orcasound Lab, or had instance failed at this point in August? (Not Bigg's but may be symptomatic of system destabilizing...)
  • On 9/7/21 did OrcaHello miss Bush Point event due to high water noise levels, or was system down (after 9/5 detections)?

Functions to perform forward pass on the test set, create a submission and score it.

The output of this function should be graphs embedded in a markdown file as described in the readme.md

A lot of this likely exists, you'd have to look into https://github.com/orcasound/aifororcas-livesystem/blob/main/ModelEvaluation/readme.md and https://github.com/orcasound/aifororcas-livesystem/blob/main/ModelEvaluation/score.py

This "scoring" procedure will be used to create the body of the PR that we will eventually check in.

You'll likely need to check in with Aayush for the forward inference part of this.

CI/CD for Moderator FrontEnd

Put in unit tests that will run for each PR.
Put in a build pipeline that will run for each PR.
Automated deployment of moderator portal after PR completes.

Python function to finetune existing FastAI model with new data.

Convert the notebook to a script containing at least two functions, blenddataset() and finetune().
Also, include code to "blend" data from old dataset + new as a separate function.
Please liberally add comments in the script apart from writing the reasoning in the notebook.
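
The issue does not define what "blend" means, so the following is only one possible reading: keep every new sample and mix in a random fraction of the old dataset so the fine-tuned model still sees earlier examples. The signature, the `old_fraction` parameter, and operating on lists of file paths are all assumptions; nothing here touches FastAI.

```python
import random
from typing import List, Optional, Sequence


def blend_dataset(old_items: Sequence[str], new_items: Sequence[str],
                  old_fraction: float = 0.5,
                  seed: Optional[int] = None) -> List[str]:
    """Return all new items plus a random `old_fraction` of old items, shuffled."""
    if not 0.0 <= old_fraction <= 1.0:
        raise ValueError("old_fraction must be between 0 and 1")
    rng = random.Random(seed)
    keep = round(len(old_items) * old_fraction)
    blended = rng.sample(list(old_items), keep) + list(new_items)
    rng.shuffle(blended)
    return blended
```

The resulting list would then be passed to whatever builds the training data for finetune().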

Update inference system deployment documentation

We moved the inference system from ACI to AKS after several long and unexplained failures of the former. However, the current documentation for inference system deployment still refers to ACI.

Improve availability of prediction system

The email notifications are working intermittently. After the first round of failures, it restarted, but has failed again. It failed on 9/15, resumed working on 9/16, and failed again on 9/17. It seemed to work flawlessly for almost a year until these failures.

False positives with no discernible signal

Since approximately the last Microsoft hackathon in October there has been a degradation in performance of the inference system. Over this period (Nov through Feb) it feels like the model has been generating more false positives that don't seem to have any sort of signal reminiscent of a killer whale call. I have been wondering if the model has become more sensitive to low-frequencies, possibly due to being re-trained with data that included humpback non-song vocalizations.

However, there are cases where I can't hear (or see in the spectrogram) any tonal, whale-like signals at either high or low frequencies. These false positives are disconcerting and are difficult for moderators to process in large numbers (because they are pretty boring to review! ;). Most often, these candidates have pretty low average confidence (just above the 50% threshold), but here are a couple of recent examples where the average confidence was above 70%!

My impression is that this issue is by far most prevalent for the Orcasound Lab node. In fact, there have been remarkably few candidates for review from Port Townsend and Bush Point in comparison over the last few months. I'm not sure if the differences are related to this performance issue, but here are the tallies for Jan 01 - Feb 28, 2022:

  • 343 Orcasound Lab (97%)
  • 003 Port Townsend (01%)
  • 008 Bush Point (02%)

It took a while to settle in on this apparent change in performance of the inference system, in part because after the hackathon we migrated to a new Azure subscription and experimented with different modes of deploying the code. Additionally, the winter storms caused a suite of physical changes in the hydrophones and deployment conditions at the Orcasound node during this period. A confounding factor is that SRKW transits of the Orcasound hydrophones are decreasing during the same period (which is the normal wintertime pattern), so we have few opportunities to observe the model performance when SRKW calls are present. Nevertheless, these false positives associated only with typical background noise are a significant issue and may indicate a bug and/or a step backwards in model performance, at least for the Orcasound Lab node.

For the last week or so, I've added the fail tag to candidates which only seem to contain "typical" noise for a particular hydrophone location. These can be explored via the OrcaHello Dashboard by searching the tag cloud for fail...

Port Townsend inference system container crashes

Port Townsend inference system container (orcaconservancycr.azurecr.io/live-inference-system:11-15-20.FastAI.R1-12.PortTownsend.v0) crashes with the below message:

Listening to location https://s3-us-west-2.amazonaws.com/streaming-orcasound-net/rpi_port_townsend
Traceback (most recent call last):
  File "./src/LiveInferenceOrchestrator.py", line 158, in <module>
    clip_path, start_timestamp, current_clip_end_time = hls_stream.get_next_clip(current_clip_end_time)
  File "/usr/src/app/src/hls_utils/HLSStream.py", line 94, in get_next_clip
    num_segments_in_wav_duration = math.ceil(self.polling_interval/stream_obj.target_duration)
TypeError: unsupported operand type(s) for /: 'int' and 'NoneType'
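
The traceback shows the division failing because `stream_obj.target_duration` is `None`. Why the playlist had no target duration is not stated in the issue. A defensive guard along these lines would at least let the caller skip or retry instead of crashing; the helper name and the choice to return `None` are assumptions, not the project's actual fix.

```python
import math
from typing import Optional


def segments_per_clip(polling_interval: int,
                      target_duration: Optional[float]) -> Optional[int]:
    """Number of HLS segments needed to cover one polling interval.

    Returns None when the playlist reports no usable target duration,
    so the caller can wait and retry rather than raise TypeError.
    """
    if not target_duration or target_duration <= 0:
        return None
    return math.ceil(polling_interval / target_duration)
```

The orchestrator loop would then need to handle the `None` case, for example by sleeping and reloading the playlist.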
