Project Objective:

This project was created with the intention of detecting/classifying whether an audio signal is a cough or sneeze.
A further goal is to pipeline this to mobile applications to narrow the detection of sickness audio specifically to COVID-19.
This would help government authorities identify people with a probable coronavirus infection living among us. ("We should fight the virus, not the patient affected by the virus")

Audio Classification using Deep Learning in Python

Let's Get Started

Terminology to Know:

Audio Signal Processing
Basic ML Framework
Bit Depth
CNN
Data visualization
Digital Signal Processing
Fast Fourier Transform
Filter Bank Coefficients
Fourier Transform
Hanning Window
Implementation of an ML model in Python
Mel Filter Bank
Mel Frequency Cepstrum Coefficients (MFCCs)
Preprocessing
RNN
Sampling and Sampling Frequency
Sensors
Short Time Fourier Transform
Spectrogram

Questions to ponder:

What are the different types of audio sources?
What are the various audio file formats?
How to read an audio file?
What are the various properties of audio files?
How to visualize an audio file?
How to find the bit depth of an audio file in Python?
How to find the various properties of an audio file?
How to extract features from audio files?

And so on.
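Several of the questions above (reading a file, finding its bit depth and other properties) can be sketched with Python's standard `wave` module. This is only a minimal illustration, assuming an uncompressed PCM WAV file; the helper names `describe_wav` and `write_test_tone` are made up for this example, and the tone file is synthetic, not part of the project data.

```python
import math
import struct
import tempfile
import wave
from pathlib import Path


def describe_wav(path):
    """Return basic properties of an uncompressed PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        n_frames = wav.getnframes()
        sample_rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": sample_rate,
            "bit_depth": wav.getsampwidth() * 8,  # bytes per sample -> bits
            "n_frames": n_frames,
            "duration_s": n_frames / sample_rate,
        }


def write_test_tone(path, sample_rate=16000, seconds=1.0, freq_hz=440.0):
    """Write a mono 16-bit sine tone so the example has a file to read."""
    n = int(sample_rate * seconds)
    samples = (
        int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
        for i in range(n)
    )
    with wave.open(str(path), "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"".join(struct.pack("<h", s) for s in samples))


# Demo on a throwaway file; replace the path with a real recording.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_tone.wav"
    write_test_tone(demo_path)
    wav_info = describe_wav(demo_path)

print(wav_info)
```

Compressed formats (MP3, OGG and so on) are not handled by `wave` and would need a third-party reader.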

Why do we hear various Sounds? What makes a sound unique/distinct?

► When we think about sound, we often think about how loud it is (amplitude, or intensity) and its pitch (frequency).

► In a given medium under fixed conditions, the speed of sound is constant. Hence, frequency (f) and wavelength (λ) are inversely related: the higher the frequency, the smaller the wavelength.
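As a quick numeric check of that relationship (λ = v / f), here is a small sketch. The speed of 343 m/s is an assumed value for dry air at about 20 °C, and `wavelength_m` is a name made up for this example.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 degrees Celsius


def wavelength_m(frequency_hz, speed_m_s=SPEED_OF_SOUND_M_S):
    """Wavelength in metres for a wave of the given frequency and speed."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return speed_m_s / frequency_hz


# Doubling the frequency halves the wavelength when the speed is fixed.
for f in (220.0, 440.0, 880.0):
    print(f"{f:6.1f} Hz -> {wavelength_m(f):.3f} m")
```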

► The animation above shows two acoustic longitudinal waves with two different frequencies but travelling with the same velocity. It can be seen that the wavelength is halved when the frequency is doubled.

► An interactive animation illustrates the amplitude, wavelength and phase of a sine wave. Vary the amplitude, wavelength and phase, and observe the effects on the transverse wave.

Sound in our environment is the energy that things produce when they vibrate (move back and forth quickly).

How much of Sound Intensity can we Feel?

Image Follow Up

How much of Sound Frequency can we Feel?

Image courtesy of NASA

Image Follow-up

How do we record various Sounds? Can Machines distinguish audio from non-audio data?

► Digital Sound Recording:
A method of preserving sound in which audio signals are transformed into a series of pulses that correspond to patterns of binary digits (0s and 1s).
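To make the "patterns of binary digits" idea concrete, here is a rough sketch of how one amplitude value could be stored as a 16-bit pattern. It assumes 16-bit signed PCM with amplitudes scaled to the range -1.0 to 1.0; the function names are made up for illustration.

```python
def quantize_pcm16(x):
    """Map a float amplitude in [-1.0, 1.0] to a signed 16-bit integer."""
    x = max(-1.0, min(1.0, x))  # clip anything outside the valid range
    return int(round(x * 32767))


def to_bits(value, bit_depth=16):
    """Two's-complement bit pattern of a signed integer sample."""
    return format(value & ((1 << bit_depth) - 1), f"0{bit_depth}b")


for amplitude in (0.0, 0.25, 1.0, -1.0):
    sample = quantize_pcm16(amplitude)
    print(f"{amplitude:+.2f} -> {sample:6d} -> {to_bits(sample)}")
```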

► What's the science of sound?

Graphic Follow Up

► Signal sampling representation:

A sample is a value or set of values at a point in time and/or space.
A sampler is a subsystem or operation that extracts samples from a continuous signal.

Fig: The continuous signal is represented with a green colored line while the discrete samples are indicated by the blue vertical lines.

Let s(t) be a continuous function (or "signal") to be sampled.

Sampling Interval or Sampling Period:
Sampling is performed by measuring the value of the continuous function every T seconds; T is the sampling interval.

Sampling Frequency or Sampling Rate:
The average number of samples obtained in one second (samples per second), i.e. the reciprocal 1/T of the sampling interval.
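The definitions above can be sketched in a few lines of NumPy. The 5 Hz tone and the 100 Hz sampling rate are arbitrary values picked for illustration, and `sample_signal` is a hypothetical helper, not part of the project code.

```python
import numpy as np


def sample_signal(s, duration_s, sampling_rate_hz):
    """Sample the continuous function s(t) every T = 1 / sampling_rate seconds."""
    T = 1.0 / sampling_rate_hz                        # sampling interval
    n = np.arange(int(round(duration_s * sampling_rate_hz)))
    t = n * T                                         # sample instants 0, T, 2T, ...
    return t, s(t)


def tone(t):
    """A 5 Hz sine wave standing in for the continuous signal s(t)."""
    return np.sin(2 * np.pi * 5.0 * t)


t, samples = sample_signal(tone, duration_s=1.0, sampling_rate_hz=100.0)
print(len(samples), "samples, spaced", t[1] - t[0], "seconds apart")
```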

COVID'19 Cough Audio Analysis

Patient Details:

Age: 49
Sex: Male
Country: UK
Day: 5
Resource Date: Mar 23, 2020
Infection Symptoms: Cannot breathe, heavy coughs.
Health Status before being affected by COVID-19: Healthy person, regular swimmer

COVID-19 cough sample of a 49-year-old male in the UK

Audio Analysis of COVID-19 Cough and Breathing patterns of the patient:

► Time Domain to Frequency Domain Transformation:
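A minimal sketch of this transformation using NumPy's real FFT. The cough recording itself is not included here, so the input is a synthetic two-tone signal; the sample rate and tone frequencies are assumptions for the example.

```python
import numpy as np


def magnitude_spectrum(signal, sample_rate_hz):
    """Return (frequencies in Hz, magnitudes) for a real-valued signal."""
    n = len(signal)
    spectrum = np.fft.rfft(signal)                     # one-sided FFT
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate_hz)
    magnitude = np.abs(spectrum) / n                   # scale by signal length
    return freqs, magnitude


sample_rate_hz = 8000
t = np.arange(sample_rate_hz) / sample_rate_hz         # one second of samples
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

freqs, magnitude = magnitude_spectrum(signal, sample_rate_hz)
peak_hz = freqs[np.argmax(magnitude)]
print("strongest component at", peak_hz, "Hz")
```

In the time domain the two tones are mixed together; in the frequency domain they show up as two separate peaks, which is what makes frequency-based features useful for classification.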

► Signal Feature Extraction:

Filter Banks
Mel Frequency Cepstrum Coefficients

How to find these Coefficients?

A signal goes through a pre-emphasis filter.
It is then sliced into (overlapping) frames.
A window function is applied to each frame.
Afterwards, we do a Fourier transform on each frame (or, more specifically, a Short-Time Fourier Transform).
We calculate the power spectrum,
and subsequently compute the filter banks.
To obtain MFCCs, a Discrete Cosine Transform (DCT) is applied to the filter banks, retaining a number of the resulting coefficients while the rest are discarded.

A final step in both cases is mean normalization.
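The first half of that chain (pre-emphasis, framing, windowing, Fourier transform, power spectrum), together with the final mean normalization, could look roughly like the sketch below. All parameter values (0.97 pre-emphasis coefficient, 25 ms frames, 10 ms step, 512-point FFT, Hanning window) are common defaults assumed for illustration; they are not confirmed settings of this project.

```python
import numpy as np


def pre_emphasis(signal, coeff=0.97):
    """y[n] = x[n] - coeff * x[n-1]; boosts the high frequencies."""
    return np.append(signal[0], signal[1:] - coeff * signal[:-1])


def frame_signal(signal, sample_rate, frame_s=0.025, step_s=0.010):
    """Slice the signal into overlapping frames, zero-padding the tail."""
    frame_len = int(round(frame_s * sample_rate))
    step = int(round(step_s * sample_rate))
    n_frames = 1 + int(np.ceil(max(len(signal) - frame_len, 0) / step))
    pad = (n_frames - 1) * step + frame_len - len(signal)
    padded = np.append(signal, np.zeros(pad))
    idx = np.arange(frame_len)[None, :] + step * np.arange(n_frames)[:, None]
    return padded[idx]


def power_spectrum(frames, n_fft=512):
    """Window each frame, take its FFT and return the power spectrum."""
    windowed = frames * np.hanning(frames.shape[1])
    magnitude = np.abs(np.fft.rfft(windowed, n=n_fft, axis=1))
    return (magnitude ** 2) / n_fft


def mean_normalize(features):
    """Subtract the per-column mean taken over all frames."""
    return features - features.mean(axis=0, keepdims=True)


sample_rate = 16000                                   # assumed sample rate
t = np.arange(sample_rate) / sample_rate              # one second of audio
signal = np.sin(2 * np.pi * 300 * t)                  # synthetic stand-in

frames = frame_signal(pre_emphasis(signal), sample_rate)
power = power_spectrum(frames)
print(frames.shape, power.shape)
```

The filter bank and DCT steps operate on `power`; `mean_normalize` can then be applied to either the filter bank output or the MFCCs.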

► Steps used for calculating MFCCs for the COVID19 Cough audio sample:

Slice the signal into short frames (of time)
Compute the periodogram estimate of the power spectrum for each frame
Apply the mel filterbank to the power spectra and sum the energy in each filter
Take the discrete cosine transform (DCT) of the log filterbank energies
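These four steps can be sketched end to end with NumPy alone. This is an illustrative sketch, not the code used for the cough sample: the 26 filters, 13 coefficients, 512-point FFT, 25 ms / 10 ms framing and the random test signal are all assumptions, and the DCT is written out by hand to avoid extra dependencies.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def periodogram_frames(signal, frame_len, step, n_fft):
    """Steps 1-2: slice into frames and estimate each frame's power spectrum."""
    n_frames = 1 + int(np.ceil(max(len(signal) - frame_len, 0) / step))
    pad = (n_frames - 1) * step + frame_len - len(signal)
    padded = np.append(signal, np.zeros(pad))
    idx = np.arange(frame_len)[None, :] + step * np.arange(n_frames)[:, None]
    return np.abs(np.fft.rfft(padded[idx], n=n_fft, axis=1)) ** 2 / frame_len


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):                  # rising edge
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):                 # falling edge
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank


def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out terms."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale


def mfcc(signal, sample_rate, n_filters=26, n_coeffs=13, n_fft=512):
    frame_len, step = int(0.025 * sample_rate), int(0.010 * sample_rate)
    power = periodogram_frames(signal, frame_len, step, n_fft)
    # Step 3: apply the mel filterbank and sum the energy in each filter.
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    energies = np.where(energies == 0, np.finfo(float).eps, energies)
    # Step 4: DCT of the log filterbank energies.
    return dct_ii(np.log(energies), n_coeffs)


rng = np.random.default_rng(0)
coeffs = mfcc(rng.standard_normal(16000), sample_rate=16000)
print(coeffs.shape)  # (frames, coefficients)
```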

What is the difference between mono and stereo?

In monaural sound one single channel is used. It can be reproduced through several speakers, but all speakers are still reproducing the same copy of the signal.

In stereophonic sound more channels are used (typically two). You can use two different channels and make one feed one speaker and the second channel feed a second speaker (which is the most common stereo setup).

This is used to create a sense of directionality, perspective and space.
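In array terms, a mono clip is a 1-D array of samples and a stereo clip has one column per channel. A common way to reduce stereo to mono is to average the channels; the sketch below assumes the (samples, channels) layout, and `to_mono` is a name made up for this example.

```python
import numpy as np


def to_mono(audio):
    """Average the channels of a (samples, channels) array; pass mono through."""
    audio = np.asarray(audio, dtype=float)
    if audio.ndim == 1:
        return audio
    return audio.mean(axis=1)


left = np.array([0.2, 0.4, -0.6])
right = np.array([0.0, 0.4, 0.6])
stereo = np.stack([left, right], axis=1)   # shape (3, 2): two channels

mono = to_mono(stereo)
print(stereo.shape, "->", mono.shape)
```

Note that averaging discards the directional information that the two channels carried.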

Data Description

Dataset Source: https://osf.io/tmkud/

Motivation:

This dataset has been created for the Pfizer Digital Medicine Challenge.

Early detection of respiratory tract infections can lead to timely diagnosis and treatment, which can result in better outcomes and reduce the likelihood of severe complications.
Respiratory sounds carry rich information that can be mined to develop automated approaches for detection of sickness behaviors like coughing and sneezing.
In this challenge, we invite you to build machine learning models for automatic detection of sickness sounds by using audio recordings from open datasets.
The dataset was created using audio files from ESC-50 and AudioSet.
We used the open source BMAT Annotation Tool to annotate this dataset.

Challenge

Develop machine learning models for detection of sickness sounds (coughing and sneezing)

Dataset

The dataset is organized as follows:

train: sick (n=1435), not_sick (n=2283)

validation: sick (n=468), not_sick (n=753)

test: sick (n=642), not_sick (n=1012)

What's the Execution Plan?

The data is in the directory 'Dataset',
- further split into the directories 'Train', 'Test' and 'Validation'.
Each set has two directories named after the dataset classes.
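A quick way to sanity-check this layout is to count the clips in each split/class folder. The sketch below is only an assumption about the details: it takes the class folders to be named `sick` and `not_sick`, the clips to be `.wav` files, and it runs on a throwaway directory tree rather than the real dataset.

```python
import tempfile
from pathlib import Path


def count_clips(dataset_root, splits=("Train", "Validation", "Test"),
                classes=("sick", "not_sick"), pattern="*.wav"):
    """Count audio files in every <root>/<split>/<class> folder."""
    root = Path(dataset_root)
    counts = {}
    for split in splits:
        for label in classes:
            folder = root / split / label
            n = len(list(folder.glob(pattern))) if folder.is_dir() else 0
            counts[(split, label)] = n
    return counts


# Demo on a temporary tree so the sketch runs without the real data.
with tempfile.TemporaryDirectory() as tmp:
    layout = [("Train", "sick", 3), ("Train", "not_sick", 5), ("Test", "sick", 2)]
    for split, label, n in layout:
        folder = Path(tmp) / split / label
        folder.mkdir(parents=True)
        for i in range(n):
            (folder / f"clip_{i}.wav").touch()
    demo_counts = count_clips(tmp)

for (split, label), n in sorted(demo_counts.items()):
    print(f"{split}/{label}: {n}")
```

On the real data this would be called as `count_clips("Dataset")`, and the result can be compared with the counts listed in the Dataset section.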

What's the dataset Size?

It's big!!!

Is it a Big Data problem?

Do I have the resources to use Hadoop/AWS?

No. I'm in lockdown, and limited time, knowledge and internet access are a concern for me!!

What's the solution?

I have to use my old Intel Core i3 :/ laptop to develop a few basic templates.
Once I get internet access, I'll use the templates to run on Google's Colab =')
After debugging, I'll scale up to the full dataset and re-run the program files for visualization and model training :O (A possible update on this :|)

Preprocessing

The data is cleaned, and the class distribution is as follows:

The above analysis shows that the clips of both classes in the training folder are similarly distributed in length.

MFCC feature extraction is applied to every training sample to get 13x99 features/coefficients. This is the method used to convert the audio data into NumPy arrays.
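The 99 in 13x99 is presumably the number of frames per clip. The source does not state the clip length, sample rate or window settings, so the values below (1-second clips at 16 kHz, 25 ms window, 10 ms step, 13 coefficients per frame) are only one assumed combination that happens to give that shape.

```python
import math


def n_frames(n_samples, sample_rate, win_s=0.025, step_s=0.010):
    """Number of frames when the tail of the signal is zero-padded."""
    frame_len = round(win_s * sample_rate)
    step = round(step_s * sample_rate)
    if n_samples <= frame_len:
        return 1
    return 1 + math.ceil((n_samples - frame_len) / step)


# Assumed settings: 13 coefficients, 1 second of audio sampled at 16 kHz.
feature_shape = (13, n_frames(n_samples=16000, sample_rate=16000))
print(feature_shape)
```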