samsunginnovationai_sign_mnist's Introduction

Samsung Innovation Campus 2021 Project Sign_MNIST_AI

Sign Language Detection

Team: Vision of Sauron

  • Abel Asfaw
  • Hilarion Reyes
  • Mosunmola Oyeleye
  • Mya Thanegi Soe
  • Giovan Panzanella

Technologies used:

-- Google Colab Pro -- TensorFlow -- Python -- OpenCV -- Pandas -- Matplotlib -- Seaborn -- Keras -- scikit-learn

================

Table of Contents:

  • Problem Definition
  • Dataset Overview
  • EDA
  • Models
  • Further Improvements
  • Conclusion

Problem Definition:

  • Deaf and hard-of-hearing people lack efficient applications that they can use to communicate.
  • Current visual recognition algorithms have issues in real-world applications.
  • American Sign Language (ASL) is a complex language and the primary language of many deaf people.

Dataset Overview:

  • The training and test sets each contain a label (0-25) per sample, corresponding to the letters A-Z
  • The format matches that of the MNIST dataset
  • Each image is 28 × 28 pixels
  • Each sample therefore has 784 pixel values (28 × 28 = 784)
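
The relationship between the 784 values per sample and the 28 × 28 image can be sketched as below. This is a minimal illustration, not the project's loading code: it assumes each row stores the label first, followed by the 784 pixel values, and that intensities are 0-255 grayscale values scaled to 0-1. The name `row_to_image` and the synthetic row are hypothetical.

```python
import numpy as np

IMAGE_SIDE = 28                               # each image is 28 x 28 pixels
PIXELS_PER_SAMPLE = IMAGE_SIDE * IMAGE_SIDE   # 784 values per sample


def row_to_image(row):
    """Split one row (label followed by 784 pixel values) into (label, image)."""
    values = np.asarray(row, dtype=np.int64)
    if values.size != PIXELS_PER_SAMPLE + 1:
        raise ValueError(
            f"expected {PIXELS_PER_SAMPLE + 1} values, got {values.size}"
        )
    label = int(values[0])
    # Assumption: pixels are 0-255 grayscale intensities; scale them to 0-1.
    image = values[1:].reshape(IMAGE_SIDE, IMAGE_SIDE).astype(np.float32) / 255.0
    return label, image


# Synthetic row for illustration only: label 2 and a simple pixel ramp.
example_row = [2] + [i % 256 for i in range(PIXELS_PER_SAMPLE)]
label, image = row_to_image(example_row)
```

With Pandas, the same function could be applied to each row of a loaded CSV, provided the column order matches the assumption above.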

Exploratory Data Analysis:

  • Label count (how many times each letter appears)
  • Labels 0-25 map to A-Z
  • J and Z require motion
  • As a result, there are no samples for 9 = J and 25 = Z
  • The letter with the highest count is Q
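
The label-to-letter mapping and the label count can be sketched with the standard library alone. The helper `letter_counts` and the toy labels are hypothetical and exist only to show the idea; they are not the real dataset counts.

```python
from collections import Counter
import string

# Labels 0-25 map to A-Z by position in the alphabet.
LABEL_TO_LETTER = dict(enumerate(string.ascii_uppercase))
# J (9) and Z (25) require motion, so no static samples are expected for them.
MOTION_LABELS = {9, 25}


def letter_counts(labels):
    """Count samples per letter and list static letters with no samples."""
    counts = Counter(LABEL_TO_LETTER[label] for label in labels)
    expected = {
        letter
        for label, letter in LABEL_TO_LETTER.items()
        if label not in MOTION_LABELS
    }
    missing = sorted(expected - set(counts))
    return counts, missing


# Toy labels for illustration only, not the real dataset.
toy_labels = [0, 0, 16, 16, 16, 24]
counts, missing = letter_counts(toy_labels)
```

Calling `counts.most_common(1)` returns the most frequent letter, which is the kind of check behind the label-count plot.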

Dataset Image Previews:

Models:

Model 1: Stochastic Gradient Descent Architecture

  • Built a custom model trained with SGD, using Convolution2D layers and Dense layers, with a total of 1,526,425 trainable parameters.
  • Used max-pooling, dropout, and decay for regularization.
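
To make two of the operations named above concrete, here is a NumPy sketch of 2 × 2 max-pooling and of dropout. It is not the project's Keras code: the function names are hypothetical, the dropout shown is the common "inverted" variant applied at training time, and decay is left out because the source does not say whether it means weight decay or learning-rate decay.

```python
import numpy as np


def max_pool_2x2(feature_map):
    """Downsample an (H, W) feature map by taking the max of each 2x2 block."""
    height, width = feature_map.shape
    if height % 2 or width % 2:
        raise ValueError("height and width must be even")
    # Split into 2x2 blocks: index [i, a, j, b] is feature_map[2i + a, 2j + b].
    blocks = feature_map.reshape(height // 2, 2, width // 2, 2)
    return blocks.max(axis=(1, 3))


def dropout(activations, rate, rng):
    """Inverted dropout: zero roughly `rate` of the units, rescale the rest."""
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob


feature_map = np.arange(16, dtype=np.float32).reshape(4, 4)
pooled = max_pool_2x2(feature_map)  # shape (2, 2)

rng = np.random.default_rng(0)
dropped = dropout(np.ones(1000, dtype=np.float32), rate=0.5, rng=rng)
```

Rescaling by `1 / keep_prob` keeps the expected activation unchanged, so nothing needs to be adjusted at inference time, when dropout is switched off.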

Model 1: SGD Results

Model 2: Pretrained ResNet Model

  • Introduced by Microsoft Research
  • Simply stacking more layers together does not make a deeper network work better.
  • Applies the concept of skip connections
  • Avoids vanishingly small gradients by giving the gradient an alternate shortcut path to flow through
  • Recommended minimum input shape of 32 × 32 (plus the number of feature channels)
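
The skip connection idea can be sketched in a few lines of NumPy. This is a simplified stand-in, not the pretrained model: real ResNet blocks use convolutions and batch normalization, whereas the hypothetical `residual_block` below uses two small dense layers so the shortcut term is easy to see.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two small dense layers.

    The `+ x` term is the skip connection: it gives the signal (and, during
    training, the gradient) a path that bypasses F entirely.
    """
    fx = relu(x @ w1) @ w2  # residual branch F(x)
    return relu(fx + x)     # add the identity shortcut, then activate


x = np.array([1.0, 2.0, 3.0])
zero = np.zeros((3, 3))
# With all-zero weights the residual branch contributes nothing, so the
# block reduces to relu(x) and the input still passes through unchanged.
out = residual_block(x, zero, zero)
```

Because the block only has to learn a correction on top of the identity, adding more blocks does not have to degrade what earlier layers already compute.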

Model 2: ResNet Architecture

Model 3: Custom Model RMSprop CNN

  • Convolutional neural networks (CNNs) are most commonly applied to analyze visual imagery.
  • CNNs are regularized versions of multilayer perceptrons.
  • Multilayer perceptrons usually mean fully connected networks.
  • In a fully connected network, each neuron in one layer is connected to all neurons in the next layer.
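
One way to see the difference between a fully connected layer and a convolutional layer is to count their parameters. The layer sizes below are hypothetical and chosen only for illustration; they are not the sizes used in the project's models.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases for a fully connected layer."""
    return n_inputs * n_units + n_units


def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus biases for a 2D convolution; independent of image size."""
    return kernel_h * kernel_w * in_channels * filters + filters


# Hypothetical layer sizes, chosen only for illustration.
dense_count = dense_params(28 * 28, 128)  # every pixel connects to every unit
conv_count = conv2d_params(3, 3, 1, 32)   # 32 shared 3x3 filters, 1 channel

print(dense_count)  # 100480
print(conv_count)   # 320
```

The convolution reuses the same small filters across the whole image, which is why its parameter count does not grow with the image size.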

Model 2 + 3: Results

ResNet Drawbacks:

  • Requires larger input images
  • Lower performance compared to the CNN model
  • Increased complexity of architecture
  • Computationally expensive
  • There is no specific rule for determining the structure of a neural network.
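
The image-size requirement means the 28 × 28 grayscale samples have to be enlarged before a pretrained ResNet can use them. The sketch below zero-pads an image to 32 × 32 and copies it into 3 channels. Both choices are assumptions for illustration: the project may have resized with interpolation instead, and the 3-channel copy assumes ImageNet-style pretrained weights that expect colour input. The name `to_resnet_input` is hypothetical.

```python
import numpy as np


def to_resnet_input(image, min_side=32):
    """Zero-pad a 2D grayscale image to at least min_side x min_side,
    then copy it into 3 identical channels."""
    height, width = image.shape
    pad_h = max(min_side - height, 0)
    pad_w = max(min_side - width, 0)
    padded = np.pad(
        image,
        ((pad_h // 2, pad_h - pad_h // 2), (pad_w // 2, pad_w - pad_w // 2)),
        mode="constant",
    )
    return np.stack([padded] * 3, axis=-1)


gray = np.ones((28, 28), dtype=np.float32)
resnet_ready = to_resnet_input(gray)  # shape (32, 32, 3)
```

Going from 28 × 28 × 1 to 32 × 32 × 3 roughly quadruples the number of input values per sample, which is part of the extra computational cost listed above.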

OpenCV GUI video demo: Using Stochastic Gradient Descent Model

AI_SignLanguage_demo_video.mov

Future Possible Improvements:

  • Expand the diversity of the dataset by having different individuals model each letter (excluding J and Z, which require motion).
    • Potential factors: hand size, hand colour
  • Implement data augmentation (rotations, flipping, etc.) to improve model robustness.
  • Increase image sizes to improve generalization and live-testing accuracy.
  • Continue fine-tuning the hyperparameters and the complexity of the hidden layers in order to optimize the models' performance.
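
The data augmentation idea can be sketched with NumPy as below. This is an illustrative sketch rather than a planned implementation: it covers horizontal flipping and small translations only, the function names are hypothetical, and small-angle rotations would normally come from an image library instead.

```python
import numpy as np


def horizontal_flip(image):
    """Mirror the image left to right."""
    return image[:, ::-1]


def shift(image, dy, dx):
    """Translate by (dy, dx) pixels, filling the vacated area with zeros.

    Assumes |dy| < height and |dx| < width.
    """
    height, width = image.shape
    shifted = np.zeros_like(image)
    src = image[max(-dy, 0):height - max(dy, 0), max(-dx, 0):width - max(dx, 0)]
    top, left = max(dy, 0), max(dx, 0)
    shifted[top:top + src.shape[0], left:left + src.shape[1]] = src
    return shifted


def augment(image, rng, max_shift=2):
    """Return a randomly flipped and shifted copy of a 2D image."""
    out = horizontal_flip(image) if rng.random() < 0.5 else image
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return shift(out, int(dy), int(dx))


image = np.arange(16).reshape(4, 4)
flipped = horizontal_flip(image)
moved = shift(image, 1, 0)
sample = augment(image, np.random.default_rng(0))
```

Note that a horizontal flip mirrors the hand, so whether it is a useful augmentation for sign images is a judgment call that would need checking against validation accuracy.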

Conclusion:

  • Our code was able to capture live webcam data from the user reasonably well using OpenCV and pass it to our model's prediction algorithm.

  • Further improvements are needed to increase model accuracy and improve generalization, in order to enhance communication for individuals with hearing impairment and build a more inclusive society.

Contributors:

  • h-jamesr2
