
cmsc730-project: Live gesture recognition

Getting started

Requirements: Python 3.7.3+

The setup may need some adjustments to work on Apple M1 machines.

Preliminaries: General setup

Set up a virtual environment using setup.sh, or by running:

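#!/bin/zsh

# Create and activate a virtual environment in ./venv
python3 -m venv venv
source venv/bin/activate
# Install Jupyter and register this environment as a notebook kernel
pip install jupyter
ipython kernel install --name "local-venv" --user
# Install the project's Python dependencies
python -m pip install -r requirements.txt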

Step 1: Collect data for your gestures

Alternatively, you can use our dataset of 8 predefined gestures (demo video: https://github.com/tinydeltas/cmsc730-project/blob/main/docs/videos/recording_all.mov), available here: https://drive.google.com/file/d/18hm1RU70tTb_t4tOmPw_p5gTfLb840NV/view?usp=sharing

  1. Download and install MATLAB.
  2. Install the Signal Processing Toolbox (https://www.mathworks.com/products/signal.html).
  3. Follow the instructions to install the MATLAB Engine API for Python (https://www.mathworks.com/help/matlab/matlab_external/install-the-matlab-engine-for-python.html), where matlabroot is your MATLAB installation folder:
cd "matlabroot/extern/engines/python"
python setup.py install
  4. Run src/audio/record.py, adjusting the directory_default and gesture_label_default parameters as needed. The samples are saved to src/audio.

  5. Move the samples you'd like to use to data/raw/audio.
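Each recorded sample ends up as a sequence of amplitudes stored in a .wav file. The sketch below shows one way to save a captured buffer as 16-bit mono WAV with Python's standard wave module. It is not record.py: the 16 kHz sample rate, the float-in-[-1, 1] input format, and the <label>/<label>_<index>.wav naming scheme are all assumptions made for illustration.

```python
import os
import struct
import wave


def save_sample(samples, directory, label, index, sample_rate=16000):
    """Write float samples in [-1.0, 1.0] as a 16-bit mono WAV file.

    The <directory>/<label>/<label>_<index>.wav layout is a hypothetical
    naming scheme, not necessarily the one record.py uses.
    """
    folder = os.path.join(directory, label)
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, f"{label}_{index:03d}.wav")
    # Clip to the valid range, then scale to signed 16-bit integers.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack(f"<{len(ints)}h", *ints))
    return path
```

For example, save_sample(buffer, "data/raw/audio", "swipe", 0) would write data/raw/audio/swipe/swipe_000.wav, where "swipe" is a hypothetical gesture label.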

Step 2: Train model on collected dataset

  1. Generate spectrograms from the raw .wav files of your samples; they are stored under data/.

The script looks for your raw WAV files under data/raw/audio.

python gen_spectrograms.py
  2. Train your model: defines a 7-layer CNN Siamese network and trains it on each input image type. Training takes about 10 minutes per image type, for a total of ~1.5 hours to train and compare all image types.

The script looks for your processed spectrogram files under data/processed/images.

python pipeline.py
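To illustrate what the spectrogram step produces, here is a minimal NumPy sketch of a log-magnitude short-time Fourier transform (STFT) spectrogram computed from a 16-bit WAV file. It is not gen_spectrograms.py or spectrograms.py: the project may rely on MATLAB's Signal Processing Toolbox and generates several image types (see input_image_types), and the window size of 512 and hop of 128 samples here are arbitrary assumptions.

```python
import wave

import numpy as np


def read_wav_mono(path):
    """Read a 16-bit PCM WAV file; return (float samples in [-1, 1), sample rate)."""
    with wave.open(path, "rb") as wf:
        assert wf.getsampwidth() == 2, "sketch assumes 16-bit PCM"
        rate = wf.getframerate()
        channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    data = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    if channels > 1:
        # Average interleaved channels down to mono.
        data = data.reshape(-1, channels).mean(axis=1)
    return data, rate


def log_spectrogram(x, n_fft=512, hop=128):
    """Log-magnitude STFT with a Hann window; shape (n_fft // 2 + 1, n_frames)."""
    if len(x) < n_fft:
        x = np.pad(x, (0, n_fft - len(x)))
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    return 20.0 * np.log10(magnitude + 1e-10)  # decibels; epsilon avoids log(0)
```

Typical use is x, rate = read_wav_mono(path) followed by S = log_spectrogram(x). Rows of S are frequency bins spaced rate / n_fft Hz apart, and columns are time frames spaced hop samples apart; the resulting 2-D array can then be saved as an image or .npy file.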

Step 3: Live gesture detection

  1. Install the requirements.
  2. Place the Keras model trained above in demo_app/static/data/model.
  3. Start the Flask server:
./start_flask.sh
  4. Navigate to http://localhost:5000.
  5. Click "Start Recording", wait 10 s for the signal to start broadcasting, then start gesturing!
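A Siamese network labels a sample by comparison rather than by a softmax over classes: the model scores how similar two inputs are, and in an N-way one-shot task (see param_N_way) the predicted class is the one whose reference example scores highest. The sketch below assumes the live demo compares each incoming sample against one stored reference per gesture; this README does not confirm how demo_app does it. Cosine similarity on plain feature vectors stands in for the trained network's similarity output, and the gesture labels are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Stand-in similarity score; a trained Siamese network would supply this."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_one_shot(query, references, similarity=cosine_similarity):
    """Return the label whose single reference example is most similar to query.

    `references` maps each gesture label to one reference feature vector,
    i.e. one support example per class in an N-way one-shot task.
    """
    scores = {label: similarity(query, ref) for label, ref in references.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

With references {"tap": [1, 0, 0], "swipe": [0, 1, 0], "circle": [0, 0, 1]}, the query [0.9, 0.2, 0.1] is labeled "tap". Swapping in the model's pairwise prediction as the similarity argument leaves the selection logic unchanged.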

Directory overview

  • data/: Data for predefined gestures.

    • images: The spectrogram images generated from the raw .wav files
    • wav: The raw .wav files of the eight predefined gestures, collected by a SEEED microphone mounted on a Raspberry Pi.
  • src/

    • params.py:

        - `input_image_types`: Defines the input image (spectrogram) types to feed to the ML model for training and validation.

        - `default_gestures`: Labels for the predefined gestures (corresponding to the names of their respective folders in `data/images`).

        - `source_wav_directory`: Where to save the WAV files during dataset collection.
      
    • spectrograms.py: Library that produces spectrograms from raw sound data.

    • src/data: Prepares the dataset for the model training step. Divides the spectrogram image data generated by spectrograms.py into training and validation sets: for each gesture, param_training_percentage × (number of samples for that gesture) samples are selected at random for the training set, and the rest form the validation set.

      • /src/data/params: Parameters
        • run_directory: Directory for output of each run. Default: ./tmp.

        • training_percentage: Fraction of each gesture's samples allotted to the training set. Default: 0.55.

        • data_type: Specifies type of data being stored and loaded (either npy or png).

    • src/ml/siamese.py: Defines the few-shot learning implementation (a Siamese network). Skeleton code was taken from https://github.com/akshaysharma096/Siamese-Networks and heavily modified for the purposes of this assignment.

      • Parameters:
        • loss_function: Loss function for the ML model. Default: binary_crossentropy

        • optimizer: Optimization algorithm. Default: Adam (an adaptive variant of stochastic gradient descent).

        • param_N_way: Number of classes in each N-way task. Default: 8 (for the 8 predefined gestures)

        • param_n_val: How many tasks to validate on. Default: 7

        • param_batch_size_per_trial: Number of paired batch tasks per trial. Default: 7

        • param_n_trials: Number of trials to perform during the validation phase. Default: 100

        • param_n_iterations: Number of epochs to train the model on. Default: 1000

    • train.py: Runs the whole pipeline.
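The per-gesture split that src/data performs can be sketched as follows. This is a minimal illustration, not the project's code: it assumes samples are grouped in a dict keyed by gesture label and that the training count is rounded down, and the seed argument is an addition for reproducible splits.

```python
import random


def split_per_gesture(samples_by_gesture, training_percentage=0.55, seed=None):
    """Randomly split each gesture's samples into training and validation sets.

    For each gesture, int(training_percentage * n) samples go to training
    (rounding down is an assumption) and the rest go to validation.
    """
    rng = random.Random(seed)
    train, val = {}, {}
    for gesture, samples in samples_by_gesture.items():
        shuffled = list(samples)  # copy so the caller's list is not reordered
        rng.shuffle(shuffled)
        n_train = int(training_percentage * len(shuffled))
        train[gesture] = shuffled[:n_train]
        val[gesture] = shuffled[n_train:]
    return train, val
```

Splitting within each gesture, rather than over the pooled dataset, keeps every gesture represented in both sets in the same proportion.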

Temporary folders

  • tmp: Stores the dataset, ML models, and results for each run
    • YYYY-MM-DD HH-MM-SS: Overall run folder, corresponding to each time pipeline.py is run.
      • models: Stores model weights
      • results: Stores results of training and validation, by type of spectrogram, as well as composite results
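The per-run layout above can be reproduced with a few lines of Python. This is a sketch under stated assumptions, not pipeline.py's actual code: it uses datetime.strftime to build the YYYY-MM-DD HH-MM-SS folder name, takes ./tmp as the default from the run_directory parameter, and make_run_directory is a hypothetical helper name.

```python
import os
from datetime import datetime


def make_run_directory(run_directory="./tmp", now=None):
    """Create <run_directory>/<YYYY-MM-DD HH-MM-SS>/ with models/ and results/."""
    now = now or datetime.now()
    run_path = os.path.join(run_directory, now.strftime("%Y-%m-%d %H-%M-%S"))
    for sub in ("models", "results"):
        os.makedirs(os.path.join(run_path, sub), exist_ok=True)
    return run_path
```

Hyphens instead of colons in the time part keep the folder name valid on file systems that reject ":" in paths.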
