sodapeter / automated-speech-recognition

This project forked from allenye66/computer-vision-lip-reading

An autonomous speechreading algorithm that helps deaf or hard-of-hearing people by translating visual lip movements into coherent sentences in real time. The algorithm uses deep learning, computer vision, and natural language processing models.

Python 100.00%

automated-speech-recognition's Introduction

Automated-Speech-Recognition

More than 13% of U.S. adults suffer from hearing loss. Causes include exposure to loud noise, physical head injuries, and presbycusis. We propose an autonomous speechreading algorithm that helps deaf or hard-of-hearing people by translating visual lip movements into coherent sentences in real time. To do this, we use a supervised ensemble deep learning model to classify lip movements into phonemes, then stitch the phonemes back into words.

Our dataset consists of images of segmented mouths, each labeled with a phoneme. We first downsize the images to 64 by 64 pixels to speed up training and reduce memory use. We then apply Gaussian blurring to blur edges, reduce contrast, and smooth sharp curves, and we perform data augmentation to make the model less prone to overfitting.

Our first computer vision model is a 1-D CNN (convolutional neural network) modeled on the well-known VGG architecture. Our second is a 2-D CNN with a similar architecture. We then combine the two with ensemble learning, specifically the voting technique. The 1-D and 2-D CNNs achieve balanced accuracies of 31.7% and 17.3%, respectively, and ensembling raises the balanced accuracy to 33.29%. We report balanced accuracy because the dataset is unbalanced. Human experts achieve only about 30% accuracy after years of training, which our models match after a few minutes of training.
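To make the voting and metric steps concrete, here is a minimal sketch in plain Python. It is not the project's code: the function names `soft_vote` and `balanced_accuracy`, the equal weighting, and the toy probabilities are all assumptions made for illustration. The README only says "the voting technique", so soft voting (averaging the two models' class probabilities) is one plausible reading. Balanced accuracy is computed as the mean of per-class recall.

```python
from collections import defaultdict


def soft_vote(probs_a, probs_b, weight_a=0.5):
    """Blend two models' per-sample class probabilities and pick the top class.

    Assumption: soft voting with a fixed weight; the original project may
    have used hard (majority) voting instead.
    """
    predictions = []
    for row_a, row_b in zip(probs_a, probs_b):
        blended = [weight_a * pa + (1.0 - weight_a) * pb
                   for pa, pb in zip(row_a, row_b)]
        predictions.append(max(range(len(blended)), key=blended.__getitem__))
    return predictions


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare classes count as much as common ones."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy data: 4 samples, 3 hypothetical phoneme classes (0, 1, 2).
probs_1d = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.2, 0.5, 0.3], [0.4, 0.3, 0.3]]
probs_2d = [[0.5, 0.2, 0.3], [0.1, 0.7, 0.2], [0.3, 0.3, 0.4], [0.2, 0.2, 0.6]]
y_true = [0, 0, 1, 2]

y_pred = soft_vote(probs_1d, probs_2d)
print(y_pred)  # [0, 1, 1, 2]
print(f"{balanced_accuracy(y_true, y_pred):.3f}")  # 0.833

# Why plain accuracy misleads on unbalanced data: always predicting the
# majority class gets 90% of these samples right but only 0.5 balanced accuracy.
skewed_true = [0] * 9 + [1]
skewed_pred = [0] * 10
print(balanced_accuracy(skewed_true, skewed_pred))  # 0.5
```

The last example shows why balanced accuracy suits an unbalanced phoneme dataset: a model that ignores rare classes gets no credit for them.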

automated-speech-recognition's People

Contributors

allenye66, richard-cao226
