- Abel Asfaw
- Hilarion Reyes
- Mosunmola Oyeleye
- Mya Thanegi Soe
- Giovan Panzanella
- Google Colab Pro
- TensorFlow
- Python
- OpenCV
- Pandas
- Matplotlib
- Seaborn
- Keras
- Scikit-learn
- Problem Definition
- Dataset Overview
- EDA
- Models
- Further Improvements
- Conclusion
- Deaf and hard-of-hearing people lack efficient applications for communication
- Current visual recognition algorithms struggle in real-world applications
- American Sign Language (ASL) is a complex language and the primary language of many deaf people
- The training and test sets contain a label (0-25) corresponding to a letter (A-Z)
- Matches the format of the MNIST dataset
- Each image is 28×28 pixels
- Each sample therefore has 784 pixel values
- Label count (how many times each letter appears)
- Labels 0-25 are mapped to A-Z
- J and Z require motion, so they cannot be captured in static images
- There are no samples for label 9 (J) or label 25 (Z)
- The highest count is Q
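The label-to-letter mapping above can be sketched in a few lines of plain Python (a minimal illustration; the project presumably builds something similar when decoding predictions):

```python
import string

# Labels 0-25 map to A-Z. Labels 9 (J) and 25 (Z) never appear in the
# data because those letters require motion.
label_to_letter = dict(enumerate(string.ascii_uppercase))

print(label_to_letter[0])   # A
print(label_to_letter[16])  # Q  (the most frequent label in the data)
print(label_to_letter[9])   # J  (defined in the mapping, absent from the data)
```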
- Built a custom CNN trained with SGD, using Conv2D and Dense layers, with a total of 1,526,425 trainable parameters
- Used max-pooling, dropout, and learning-rate decay for regularization
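A Keras sketch of a model like the one described is shown below. The specific layer sizes are assumptions and will not reproduce the reported 1,526,425 parameters exactly; the point is the combination of Conv2D, max-pooling, dropout, dense layers, and SGD with learning-rate decay:

```python
from tensorflow.keras import layers, models, optimizers

# Hypothetical layer sizes; the project's exact architecture may differ.
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),          # Sign Language MNIST images
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),                   # max-pooling for downsampling
    layers.Dropout(0.25),                     # dropout for regularization
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(2),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(24, activation="softmax"),   # 24 static letters (no J, Z)
])

# SGD with a decaying learning rate, as the slides mention
lr = optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01, decay_steps=1000, decay_rate=0.9)
model.compile(optimizer=optimizers.SGD(learning_rate=lr, momentum=0.9),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```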
- Introduced by Microsoft Research
- Increasing network depth does not work by simply stacking layers together
- Applies the concept of skip connections
- Avoids vanishing gradients by giving them an alternate shortcut path to flow through
- Recommended minimum input shape of 32×32 (plus the number of feature channels)
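The skip connection idea can be sketched as a minimal Keras residual block (an illustration, not ResNet's exact block; real ResNets also use batch normalization and a 1×1 projection convolution when channel counts differ):

```python
from tensorflow.keras import Input, Model, layers

def residual_block(x, filters):
    # Main path: two 3x3 convolutions
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    # Skip connection: add the input back, so gradients also flow
    # through this shortcut path instead of only through the convolutions
    out = layers.Add()([x, y])
    return layers.Activation("relu")(out)

# Assumes the input already has `filters` channels
inp = Input(shape=(32, 32, 16))       # ResNet's recommended minimum 32x32
out = residual_block(inp, 16)
model = Model(inp, out)
```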
- Most commonly applied to analyze visual imagery
- CNNs are regularized versions of multilayer perceptrons
- "Multilayer perceptron" usually means a fully connected network
- Each neuron in one layer is connected to every neuron in the next layer
- Requires larger image sizes
- Lower performance compared to a CNN model
- Increased complexity of architecture
- Computationally expensive
- There is no specific rule for determining the structure of a neural network.
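The "computationally expensive" point can be made concrete with a quick parameter count (illustrative arithmetic, not taken from the slides): a single fully connected layer on a 28×28 image already dwarfs a small convolutional layer, because every pixel connects to every unit.

```python
# One fully connected layer: 784 inputs, a hypothetical 512 units
fc_params = 784 * 512 + 512          # weights + biases

# One conv layer: 32 filters of size 3x3 on a 1-channel image
conv_params = 32 * (3 * 3 * 1) + 32  # weights + biases

print(fc_params)    # 401920
print(conv_params)  # 320
```

The gap only widens at the larger image sizes an MLP tends to need, which is why the CNN-based models were preferred.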
AI_SignLanguage_demo_video.mov
- Expand dataset diversity by having different individuals model the letters (excluding J and Z, which require motion)
- Potential factors: hand size, skin colour
- Implement data augmentation (rotations, flipping, etc.) to improve model robustness
- Increase image sizes to improve training generalization and live-testing accuracy
- Continue fine-tuning the hyperparameters and the complexity of the hidden layers to optimize model performance
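A minimal NumPy sketch of the proposed augmentation (flip plus a small shift; the specific ranges are assumptions, and a real pipeline would likely use Keras' image-preprocessing utilities):

```python
import numpy as np

def augment(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomly flip and shift a 28x28 image."""
    out = img
    if rng.random() < 0.5:
        out = np.fliplr(out)                  # horizontal flip
    dy, dx = rng.integers(-2, 3, size=2)      # small random translation
    out = np.roll(out, shift=(dy, dx), axis=(0, 1))
    return out

# Example on a synthetic image
img = np.arange(784, dtype=np.float32).reshape(28, 28)
aug = augment(img, np.random.default_rng(0))
print(aug.shape)  # (28, 28)
```

Note that small rotations would also need an interpolation step (e.g. from SciPy or OpenCV); they are omitted here to keep the sketch dependency-free.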
- Our code configuration was able to capture dynamic webcam data from the user reasonably well using OpenCV, feeding our model's prediction algorithm.
- Further improvements must be made to increase model accuracy and improve generalization, in order to enhance communication for individuals with hearing impairments and build a more inclusive society.