cmsc730-project: Live gesture recognition

Getting started

Requirements: Python 3.7.3+

May need some tweaking to work on an Apple M1 chip.

Preliminaries: General setup

Set up virtualenv using setup.sh or by running

#!/bin/zsh

python3 -m venv venv 
source venv/bin/activate
pip install jupyter
ipython kernel install --name "local-venv" --user
python -m pip install -r requirements.txt
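
Before installing further dependencies, it can help to confirm that the virtualenv is actually active. The check below is a minimal sketch and not part of the repository; `in_virtualenv` is a hypothetical helper name.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("python version:", ".".join(map(str, sys.version_info[:3])))
```

If the first line prints False, re-run `source venv/bin/activate` in the current shell.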

Step 1: Collect data for your gestures

You can also skip data collection and use our dataset of 8 predefined gestures (recording: https://github.com/tinydeltas/cmsc730-project/blob/main/docs/videos/recording_all.mov), available here: https://drive.google.com/file/d/18hm1RU70tTb_t4tOmPw_p5gTfLb840NV/view?usp=sharing

Download and install MATLAB
Install the Signal Processing Toolbox (https://www.mathworks.com/products/signal.html)
Follow the instructions to set up the MATLAB Engine for Python: https://www.mathworks.com/help/matlab/matlab_external/install-the-matlab-engine-for-python.html

cd "matlabroot/extern/engines/python"
python setup.py install

Run src/audio/record.py, adjusting the directory_default and gesture_label_default parameters as needed. This saves the samples to src/audio.
Move the samples you'd like to use to data/raw/audio.
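
The move can also be scripted. The following is a sketch under assumptions: it treats the recordings as .wav files sitting directly in src/audio, and `move_samples` is a hypothetical helper, not something the repository provides.

```python
import shutil
from pathlib import Path


def move_samples(src_dir, dest_dir, pattern="*.wav"):
    """Move files matching pattern from src_dir into dest_dir; return new paths."""
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for wav in sorted(src.glob(pattern)):
        target = dest / wav.name
        if target.exists():
            # Skip rather than overwrite an earlier recording with the same name.
            continue
        shutil.move(str(wav), str(target))
        moved.append(target)
    return moved


# Usage (paths taken from the steps above):
#   move_samples("src/audio", "data/raw/audio")
```

Skipping existing files is a deliberate choice here so that re-running the script never clobbers samples already placed under data/raw/audio.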

Step 2: Train model on collected dataset

Generate spectrograms from the raw .wav files of your samples. The script stores them under data/.

Looks for your raw WAV files under data/raw/audio

python gen_spectrograms.py
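
For readers unfamiliar with spectrograms, the idea behind this step can be illustrated with a dependency-free sketch. This is not the code in gen_spectrograms.py, which is not reproduced here; the function name, frame size, and hop size are assumptions chosen for illustration. Each overlapping frame of audio is windowed and transformed into frequency-bin magnitudes, and stacking the frames over time gives the spectrogram.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return one list of frequency-bin magnitudes per overlapping frame."""
    # A Hann window tapers each frame's edges to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    n_bins = frame_size // 2 + 1  # real input: bins above Nyquist are redundant
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        # Naive DFT, O(frame_size^2) per frame: fine for a demo, slow for real audio.
        magnitudes = [
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                    for n in range(frame_size)))
            for k in range(n_bins)
        ]
        frames.append(magnitudes)
    return frames


# Example: a 1 kHz tone sampled at 8 kHz falls in bin 1000 * 64 / 8000 = 8.
sample_rate, tone_hz = 8000, 1000
tone = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(256)]
frames = spectrogram(tone)
peak_bin = max(range(len(frames[0])), key=frames[0].__getitem__)
print(len(frames), "frames; peak bin", peak_bin)
```

A real pipeline would use an FFT-based routine instead of the naive transform above; the sketch only shows what the resulting image encodes.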

Train your model: defines a 7-layer CNN Siamese network model and trains it on each input image type. Training takes about 10 minutes per image type, for a total of ~1.5 hours to train and compare every image type.

Looks for your processed spectrogram files under data/processed/images

python pipeline.py

Step 3: Live gesture detection

Install requirements
Place the Keras model trained above into demo_app/static/data/model
Start the Flask server

./start_flask.sh

Navigate to http://localhost:5000
Click "Start Recording", wait 10s for the signal to broadcast, then start gesturing!
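
If the page does not load, a quick way to tell whether the Flask server is reachable is to probe the URL from Python. This is a sketch using only the standard library; `server_is_up` is a hypothetical helper, and the only project-specific detail it relies on is the http://localhost:5000 address given above.

```python
import http.client
import urllib.request


def server_is_up(url="http://localhost:5000", timeout=2.0):
    """Return True if url answers with a success or redirect status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, and HTTP error statuses
        # (urllib's URLError and HTTPError are both OSError subclasses).
        return False


if __name__ == "__main__":
    print("server reachable:", server_is_up())
```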

Directory overview

data/: Data for predefined gestures.
- images: The spectrogram images generated from the raw .wav files
- wav: The raw .wav files of the eight predefined gestures, collected by the SEEED microphone mounted on a Raspberry Pi.
src/
- params.py:
  - input_image_types: Defines the input image (spectrogram) types we are interested in feeding to the ML model for training and validation.
  - default_gestures: Labels for the predefined gestures (corresponding to the names of their respective folders in data/images).
  - source_wav_directory: Where to save the WAV files as part of dataset collection.
- spectrograms.py: Library that produces spectrograms from raw sound data.
- src/data: Prepares the dataset for the model training step. Divides the spectrogram image data generated by spectrograms.py into training and validation data sets. For each gesture, param_training_percentage * (number of samples for that gesture) samples are selected at random for the training data set; the rest make up the validation data set.
  - /src/data/params: Parameters
    - run_directory: Directory for output of each run. Default: ./tmp.
    - training_percentage: Fraction of each gesture's samples allotted to the training set. Default: 0.55.
    - data_type: Specifies type of data being stored and loaded (either npy or png).
- src/ml/siamese.py: Defines the few-shot learning implementation. Skeleton code taken from https://github.com/akshaysharma096/Siamese-Networks and heavily modified for the purposes of this assignment.
  - Parameters:
    - loss_function: Loss function for the ML model. Default: binary_crossentropy
    - optimizer: Optimization algorithm. Default: Adam (an adaptive variant of stochastic gradient descent).
    - param_N_way: Number of candidate classes in each N-way classification task. Default: 8 (for the 8 predefined gestures)
    - param_n_val: How many tasks to validate on. Default: 7
    - param_batch_size_per_trial: Number of paired batch tasks per trial. Default: 7
    - param_n_trials: Number of trials to perform during the validation phase. Default: 100
    - param_n_iterations: Number of epochs to train the model for. Default: 1000
- train.py: Runs the whole pipeline.
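
The per-gesture random split and the N-way validation described above can be sketched as follows. This is an illustration under assumptions, not the code in src/data or src/ml/siamese.py: the function names are hypothetical, samples are held in a plain dict keyed by gesture label, and `similarity` stands in for the trained Siamese network, assumed to return a higher score for pairs from the same gesture.

```python
import random


def split_samples(samples_by_gesture, training_percentage=0.55, seed=None):
    """Split each gesture's samples at random into (training, validation) dicts."""
    rng = random.Random(seed)
    training, validation = {}, {}
    for gesture, samples in samples_by_gesture.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        n_train = int(training_percentage * len(shuffled))
        training[gesture] = shuffled[:n_train]
        validation[gesture] = shuffled[n_train:]
    return training, validation


def n_way_accuracy(similarity, samples_by_gesture, n_way=8, n_trials=100, seed=None):
    """Estimate accuracy over n_trials N-way one-shot tasks.

    Each task compares a query with one support sample from each of n_way
    classes (the true class included) and predicts the most similar class.
    Requires at least two samples per gesture and n_way <= number of gestures.
    """
    rng = random.Random(seed)
    labels = sorted(samples_by_gesture)
    correct = 0
    for _ in range(n_trials):
        true_label = rng.choice(labels)
        # Draw two distinct samples so the query is never its own support item.
        query, positive = rng.sample(samples_by_gesture[true_label], 2)
        others = [label for label in labels if label != true_label]
        support = {true_label: positive}
        for label in rng.sample(others, n_way - 1):
            support[label] = rng.choice(samples_by_gesture[label])
        predicted = max(support, key=lambda label: similarity(query, support[label]))
        correct += predicted == true_label
    return correct / n_trials
```

With 20 samples per gesture and the default training_percentage of 0.55, `split_samples` places 11 samples in the training set and 9 in the validation set for each gesture.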

Temporary folders

tmp: Stores the dataset, ML models, and results for each run.
- YYYY-MM-DD HH-MM-SS: Run folder, one created each time pipeline.py is run.
  - models: Stores model weights
  - results: Stores results of training and validation, by type of spectrogram, as well as composite
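
The folder layout above can be reproduced with a few lines of standard-library Python. This is a sketch of the naming scheme only; `make_run_directory` is a hypothetical helper, and how pipeline.py actually creates its folders is not shown in this document.

```python
from datetime import datetime
from pathlib import Path


def make_run_directory(root="tmp", now=None):
    """Create root/<YYYY-MM-DD HH-MM-SS>/{models,results} and return the run path."""
    now = now or datetime.now()
    run_dir = Path(root) / now.strftime("%Y-%m-%d %H-%M-%S")
    for sub in ("models", "results"):
        (run_dir / sub).mkdir(parents=True, exist_ok=True)
    return run_dir


# Usage: run_dir = make_run_directory()  # e.g. tmp/<timestamp>/models
```

Timestamped names keep the output of separate runs apart and sort chronologically in a directory listing.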
