patrobic / blindaid

C 9.70% C++ 90.16% Objective-C 0.14%
capstone computer-vision blind visually-impaired tactile-feedback visual-assistant depth-perception traffic-light depth deep-learning

blindaid's Introduction

BlindAid

Capstone Project: Assist the blind in moving around safely by warning them of impending obstacles using depth sensing, computer vision, and tactile glove feedback.

CONFIGURATION MODES: BlindAid can be launched and configured in three ways.

Arguments: Launch from the command line, configured with no interaction by passing arguments.
User Menu: Launch without arguments and configure through an interactive menu interface.
Shortcuts: To simplify mode selection, create desktop shortcuts with preset argument combinations.

USAGE: Summary of flags by category.

COMMAND     [FLAGS]                                                 (CATEGORY)
blindaid    [-a | -c | -t <path> | -s <path> -r [delay] [path]]     (operation mode selection)
            [-d -v [level] -l]                                      (debugging performance options)
            [-p <port> -cc <count>]                                 (miscellaneous settings)
            [-do {fr | hp} -tl {dl | bd}]                           (processing module selection)
            [-coloroff | -depthoff]                                 (image channel selection)

DETAILS: Description of flags and parameters.

-FLAG [ARGUMENT]    DESCRIPTION             (DETAILS)                                       [CHANGES]
-a                  Realtime Mode           (bypass menu, no user interaction)              [Camera ON / Glove ON]
-c                  Camera Only             (disable glove, print control to screen)        [Camera ON / Glove OFF]
-t <path>           Glove Only              (disable camera, load images from disk)         [Camera OFF / Glove ON]
-s <path>           Simulate Mode           (disable camera and glove, test software only)  [Camera OFF / Glove OFF]
-r [delay] [path]   Record Enabled          (save images to disk, 0 for manual)             [Record ON]
-d                  Display Images          (show color/depth images to screen)             [Display ON]
-v [level]          Verbose Messages        (print info messages to screen)                 [Logging ON]
-l                  Low Performance         (disable multi threading optimizations)         [Threads OFF]
-p <port #>         Set COM Port Number     (for Bluetooth glove connection)
-cc [count]         Set Consecutive Count   (number of detections before warning)
-do {fr | hp}       Depth Obstacle Mode     (fixed regions/hand position)                   [DepthObstacle FR/HP]
-tl {dl | bd}       Traffic Light Mode      (deep learning/blob detector)                   [TrafficLight DL/BD]
-coloroff           Depth Image Only        (disable color stream processing)               [Color OFF]
-depthoff           Color Image Only        (disable depth stream processing)               [Depth OFF]
-smooth [count]     Depth Frame Smoothing   (take consecutive max to reduce noise)
-conf [confidence]  Minimum Red Confidence  (confidence required to detect red)
-colordim <w> [h]   Color Image Dimensions  (color image width and height ratios)
-depthdim <w> [h]   Depth Image Dimensions  (depth image width and height ratios)
-ignore [ignore]    Percentile to Ignore    (near end of histogram ratio to ignore)
-valid [valid]      Valid Ratio Threshold   (minimum ratio of non-zero depth pixels)
-?                  Show Help               (show flag descriptions)

SCENARIOS: Useful argument combinations.

COMMAND  FLAGS              DESCRIPTION             (PURPOSE)
blindaid                    Menu Interface          (manual configuration, via interactive menu)
blindaid -a                 Realtime Final          (complete experience, for final product demo)
blindaid -c                 Capture Only            (to demo without glove, print control to screen)
blindaid -t path            Control Only            (to demo without camera, load images from disk)
blindaid -s path            Simulate All            (disable camera and glove, only test software loop)
blindaid -r ms              Record Images           (save images periodically, or 0 for manual trigger)
blindaid -a -d              Realtime w/Debug        (full experience w/image display, low performance)
blindaid -c -d              Capture w/Debug         (capture only w/display, to demonstrate processing)

Shortcut Icons

1) The FULL experience, NO display (for the final demo).
2) Disable glove, NO display (mainly for benchmarking).
3) FULL experience, WITH images on screen (to give an in-depth look into the program).
4) Disable glove, WITH images on screen (so we don't have to worry about the glove's battery/connection).

Modules

The various supported modules, which can be enabled or disabled, along with the key parameters associated with each.

DepthObstacle Detection

Uses the depth (IR Grayscale) image to detect the presence of nearby obstacles, warning the user of their position and distance via the glove's tactile feedback.
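A minimal sketch of the fixed-regions (fr) flavour of this idea, not the project's actual implementation: it assumes the depth frame is an 8-bit cv::Mat where smaller non-zero values mean closer objects, and the grid size and MinDepthPerRegion name are illustrative.

  #include <opencv2/opencv.hpp>
  #include <vector>

  // Split the depth image into a fixed grid (one cell per finger/vibrator)
  // and return the nearest valid (non-zero) depth value found in each cell.
  std::vector<uchar> MinDepthPerRegion(const cv::Mat& depth, int rows, int cols)
  {
    std::vector<uchar> mins;
    int cellW = depth.cols / cols, cellH = depth.rows / rows;
    for (int r = 0; r < rows; ++r)
      for (int c = 0; c < cols; ++c)
      {
        cv::Mat cell = depth(cv::Rect(c * cellW, r * cellH, cellW, cellH));
        cv::Mat valid = (cell > 0);               // mask out zero (invalid) pixels
        double minVal = 0;
        if (cv::countNonZero(valid) > 0)
          cv::minMaxLoc(cell, &minVal, nullptr, nullptr, nullptr, valid);
        mins.push_back(static_cast<uchar>(minVal));
      }
    return mins;
  }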

TrafficLight Detection

Uses the color (RGB) image to detect traffic lights in the upper portion of the frame, alerting the user to their presence and color (red/yellow/green) via audible feedback.
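A minimal sketch of the blob-detector (bd) flavour for red lights only, assuming an 8-bit BGR cv::Mat; the HSV thresholds, blob parameters, and the FindRedLights name are assumptions rather than the project's tuned values.

  #include <opencv2/opencv.hpp>
  #include <vector>

  // Threshold red hues in the upper third of the frame and look for
  // roughly circular blobs that could be a lit traffic light.
  std::vector<cv::KeyPoint> FindRedLights(const cv::Mat& bgr)
  {
    cv::Mat top = bgr(cv::Rect(0, 0, bgr.cols, bgr.rows / 3));  // upper portion only
    cv::Mat hsv, low, high, mask;
    cv::cvtColor(top, hsv, cv::COLOR_BGR2HSV);
    cv::inRange(hsv, cv::Scalar(0, 120, 120), cv::Scalar(10, 255, 255), low);    // red wraps around hue 0
    cv::inRange(hsv, cv::Scalar(170, 120, 120), cv::Scalar(180, 255, 255), high);
    mask = low | high;

    cv::SimpleBlobDetector::Params p;
    p.filterByColor = true;  p.blobColor = 255;        // detect white blobs in the binary mask
    p.filterByCircularity = true;  p.minCircularity = 0.7f;
    std::vector<cv::KeyPoint> lights;
    cv::SimpleBlobDetector::create(p)->detect(mask, lights);
    return lights;
  }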

blindaid's People

Contributors

alisb, patrobic, salma-aly


blindaid's Issues

Improve Display Module persistence and user friendliness.

The display window system currently opens a color and a depth OpenCV window at a huge size and a fixed place on screen, usually blocking the command line window's results. The images also disappear once the function displaying them goes out of scope.

  • Scale the windows to a reasonable size.
  • Position the image windows and command line window so they do not overlap.
  • Have a persistent display window that isn't recreated and doesn't disappear each time a new image/mode is selected.

It would be nice to have a unified window/primitive interface that displays all images and text together. Look into GUI extensions to C++.
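A minimal sketch of the persistent-window part using plain OpenCV HighGUI; the window names, sizes, and positions are placeholders.

  #include <opencv2/opencv.hpp>

  // Create the windows once (e.g. when the Display module starts) so they
  // persist across frames, and position/size them away from the console.
  void InitWindows()
  {
    cv::namedWindow("Color", cv::WINDOW_NORMAL);   // WINDOW_NORMAL allows resizing
    cv::namedWindow("Depth", cv::WINDOW_NORMAL);
    cv::resizeWindow("Color", 640, 480);
    cv::resizeWindow("Depth", 640, 480);
    cv::moveWindow("Color", 0, 0);
    cv::moveWindow("Depth", 650, 0);               // side by side, clear of the console
  }

  // Reuse the same windows every frame instead of recreating them.
  void ShowFrame(const cv::Mat& color, const cv::Mat& depth)
  {
    cv::imshow("Color", color);
    cv::imshow("Depth", depth);
    cv::waitKey(1);                                // keep the HighGUI windows responsive
  }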

Fix Depth Detection Gaussian blur causing Shadow regions to overflow into non-masked areas.

A mask is calculated using the regions in the depth image which are zero (no depth data available), since zero would otherwise indicate ZERO DISTANCE from the camera and invalidate/pollute depth data.

Gaussian blur does not seem to support masks, and so it causes non-zero regions NEAR a zero region to become much darker. These affected regions are NOT covered by the mask, and so depth data is often erroneously zero.

We need a way to make GaussianBlur ignore zero regions, use another blur algorithm, or, as I've read, dilate the non-zero regions into the zero regions (or perform an inverse erode on the zero regions?). A sketch of one possible workaround follows.
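One possible workaround is normalized convolution: blur the depth image and the validity mask separately, then divide, so zero pixels no longer drag down their neighbours. A minimal sketch under that assumption (names illustrative):

  #include <opencv2/opencv.hpp>

  // Blur only where depth data exists: blur(depth * mask) / blur(mask).
  cv::Mat MaskedGaussianBlur(const cv::Mat& depth8u, cv::Size kernel)
  {
    cv::Mat depthF, maskF;
    depth8u.convertTo(depthF, CV_32F);
    cv::Mat valid = (depth8u > 0);                  // 255 where depth data exists
    valid.convertTo(maskF, CV_32F, 1.0 / 255.0);    // 1.0 valid, 0.0 invalid

    cv::Mat weighted = depthF.mul(maskF);           // zero out invalid pixels before blurring
    cv::Mat blurredDepth, blurredMask;
    cv::GaussianBlur(weighted, blurredDepth, kernel, 0);
    cv::GaussianBlur(maskF, blurredMask, kernel, 0);

    cv::Mat denom = blurredMask + 1e-6;             // epsilon avoids divide-by-zero
    cv::Mat result;
    cv::divide(blurredDepth, denom, result);        // renormalize by the blurred mask
    result.convertTo(result, CV_8U);
    return result;                                  // fully invalid neighbourhoods stay ~0 and remain masked
  }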

Implement mapping function from nearest pixel intensity value to glove control value.

Design an algorithm that calculates the vibrator control value (voltage/factor/whatever is required by the API controlling the Arduino/glove).
It should implement a function that maps the nearest pixel intensity value to a control value in some non-linear way, giving higher weight to near objects (possibly a negative exponential function? see the sketch below).

Relevant function is in ModuleControl.h source file.
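A minimal sketch of such a mapping, assuming an 8-bit intensity where 0 means closest and a 0-255 control output; the decay constant and the MapIntensityToControl name are placeholders to be tuned against the real glove API.

  #include <cmath>

  // Map the nearest pixel intensity (0 = closest) to a vibration strength,
  // weighting near objects much more heavily via a decaying exponential.
  int MapIntensityToControl(int nearestIntensity)
  {
    const double maxOutput = 255.0;
    const double decay = 0.02;                       // larger = output falls off faster with distance
    double value = maxOutput * std::exp(-decay * nearestIntensity);
    return static_cast<int>(value + 0.5);            // ~255 when touching, ~0 when far away
  }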

Implement Control class to communicate with Glove controller.

Once we choose/design/build a Glove unit.

  • Convert the minimum depth pixel values into voltages/signals that control the glove.
  • Provide audible feedback for detection of specific objects (record audio).
  • Implement consecutive/time-duration counter for each object type, to reduce false positives.

Write Unit Tests for various Modules.

Design unit tests for the various modules, testing for stability and functionality.

  • Core module photo/video file loading with multiple key configurations.
  • Each vision module run with a set of sample images, both easy and difficult conditions. Count the number and position of expected detections, and compare actual with expected.

Implement proper config file loading/validating mechanism that creates a new file if nonexistent or invalid.

Should be safe (i.e. no crashing) regardless of the current state of the config file (valid/corrupt/nonexistent). The following pseudocode addresses all the cases.

if (FileExists())                    // a configuration file is present
{
  if (!LoadConfiguration())          // LoadConfiguration should return whether the file is valid (contains all values, and sensical)
  {
    if (PromptOverwrite())           // prompt the user to overwrite the existing corrupt file?
      SaveConfiguration();
    else
      throw exception();             // terminate the application
  }
}
else                                 // no configuration exists at all (new installation)
{
  SaveConfiguration();               // create a file by saving the default configuration specified in each class's Parameters class
}

Relevant code is in Menu.cpp source file, in the constructor.

Fix threading concurrency read/write conflicts between Image/Results.

Threads for Capture/Vision/Control/Display modules use mutexes to access Image/Results, but it still seems to be a bit buggy and possibly unsafe.

Also, I suspect that the threads do very little to improve performance, if not actually make it worse!

Consider either improving the code to guarantee that threads do not collide, AND improving performance by better synchronizing them, or REMOVING them entirely and calling the modules consecutively in ModuleCore. A locking sketch follows.
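A minimal sketch of one way to make the shared data unambiguously safe using std::mutex and std::lock_guard; the SharedImage wrapper is illustrative, not the project's actual Image/Results classes.

  #include <mutex>
  #include <opencv2/opencv.hpp>

  // Every reader and writer goes through the same mutex, and frames are
  // deep-copied so no thread ever sees a half-written buffer.
  class SharedImage
  {
  public:
    void Write(const cv::Mat& frame)
    {
      std::lock_guard<std::mutex> lock(_mutex);
      frame.copyTo(_frame);
    }
    cv::Mat Read() const
    {
      std::lock_guard<std::mutex> lock(_mutex);
      return _frame.clone();
    }
  private:
    mutable std::mutex _mutex;
    cv::Mat _frame;
  };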

Need to find a way to cause Capture module to trigger the other 3 through ModuleCore.

It remains to be seen how the SDK will provide frames. I foresee one of two approaches:

  • Push: The SDK writes every new frame to a buffer as it arrives, and it is up to us whether we want to and have the time to get it.
  • Pull: The user requests a new frame from the SDK whenever the program is done processing the previous one (this could theoretically mean that the same frame could be acquired twice, though in practice the processing will NEVER be that rapid).

Need to familiarize ourselves with Events/Callbacks... and anything else the SDK requires; a push-style sketch follows.
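For the push approach, a minimal sketch of how Capture could wake the other modules through ModuleCore using a std::condition_variable; the FrameSignal type and its single-flag handshake are illustrative only.

  #include <condition_variable>
  #include <mutex>

  // Capture signals once per new frame; Vision/Control/Display wait on it.
  struct FrameSignal
  {
    std::mutex mtx;
    std::condition_variable cv;
    bool frameReady = false;

    void NotifyNewFrame()                  // called by Capture after storing the frame
    {
      { std::lock_guard<std::mutex> lock(mtx); frameReady = true; }
      cv.notify_all();
    }

    void WaitForFrame()                    // called by a consumer thread
    {
      std::unique_lock<std::mutex> lock(mtx);
      cv.wait(lock, [this] { return frameReady; });
      frameReady = false;                  // simple single-consumer handshake; multiple consumers need per-thread flags
    }
  };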

Implement FaceDetection using Machine Learning.

Identify an existing library/API that offers Face Detection (identification is not necessary).
Integrate it into the VS project and use it in ModuleFace.

Tune/configure it to accurately detect faces in sample images.
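One off-the-shelf option is OpenCV's Haar cascade detector; a minimal sketch, assuming the stock haarcascade_frontalface_default.xml model that ships with OpenCV is available next to the executable.

  #include <opencv2/opencv.hpp>
  #include <vector>

  // Detect frontal faces in a BGR frame using the bundled Haar cascade.
  std::vector<cv::Rect> DetectFaces(const cv::Mat& bgr)
  {
    static cv::CascadeClassifier cascade("haarcascade_frontalface_default.xml");
    cv::Mat gray;
    cv::cvtColor(bgr, gray, cv::COLOR_BGR2GRAY);
    cv::equalizeHist(gray, gray);                    // improve contrast before detection
    std::vector<cv::Rect> faces;
    cascade.detectMultiScale(gray, faces, 1.1, 3, 0, cv::Size(30, 30));
    return faces;
  }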

Investigate the use of Deep/Machine Learning for Traffic Light Detection.

It seems that machine learning/CNNs are well suited to identifying patterns/objects in images, which is basically all that we do.

Advantages:

  • Requires much less CV algorithm design.
  • Potentially much more reliable (fewer false positives, maybe more true positives?).
  • Would bring an aspect of novelty/coolness to our project, and give us experience in ML.

Pitfalls:

  • Harder to set up and configure (lack of familiarity), and a higher learning curve.
  • Potentially high demand in performance, maybe not suitable for embedded systems.
  • Requires training process which is expensive in time/computation cost/number of images.

Conduct research to see if the foreseen advantages are realistic, and the disadvantages are manageable/not show stoppers.

Implement Record module that saves color and depth image streams to disk.

Should create a folder for recordings, a subfolder with a uniquely numbered/dated name, and within this subfolder save every captured color and depth image as a jpg (or another common format), named as follows.

  • Recordings/
  • DD-MM-YY_HH-MM-SS/
  • Color_0000.jpg ... Color_9999.jpg
  • Depth_0000.jpg ... Depth_9999.jpg

Photos are easier than video and give us the ability to select a subset of key images for testing/simulation.

Relevant code is in ModuleRecord.cpp source file, in function SaveToDisk().
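A minimal sketch of the folder/naming scheme above, assuming C++17 std::filesystem is available; this is not the actual SaveToDisk() implementation.

  #include <opencv2/opencv.hpp>
  #include <cstdio>
  #include <ctime>
  #include <filesystem>
  #include <string>

  // Create Recordings/DD-MM-YY_HH-MM-SS/ for the current session.
  std::string CreateSessionFolder()
  {
    std::time_t t = std::time(nullptr);
    char stamp[32];
    std::strftime(stamp, sizeof(stamp), "%d-%m-%y_%H-%M-%S", std::localtime(&t));
    std::string folder = std::string("Recordings/") + stamp;
    std::filesystem::create_directories(folder);
    return folder;
  }

  // Save one color/depth pair as zero-padded jpg files (Color_0000.jpg, ...).
  void SaveFrame(const std::string& folder, int index, const cv::Mat& color, const cv::Mat& depth)
  {
    char name[64];
    std::snprintf(name, sizeof(name), "/Color_%04d.jpg", index);
    cv::imwrite(folder + name, color);
    std::snprintf(name, sizeof(name), "/Depth_%04d.jpg", index);
    cv::imwrite(folder + name, depth);
  }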

Add New Control Parameters to Customize the Commands Sent to the Glove.

Need a set of parameters to control the max/min range thresholds for glove vibration control.

  • Minimum point and value for mapping function.
  • Maximum point and value for mapping function.
  • Threshold value for minimum command to transmit.

These parameters may be adjustable independently for the various fingers or sets of fingers.

  • Separate range for middle/middle 3 fingers (i.e. ignore objects in periphery unless they are REALLY close).
  • Separate weights for top/bottom and center horizontal region (i.e. prioritize center or top regions ex. to ignore floor).

Look into procedure to port code to Embedded Platform.

Since the end product needs to be mobile, we need to look into options for a compact/efficient processing platform. Some alternatives suggested are as follows.

  • Raspberry Pi (seems most appealing)
  • Android Phone/Tablet (also very interesting)
  • Intel NUC (very expensive)
  • Arduino (not powerful enough)

As we look into options, we need to consider:

  • Cost
  • Size
  • Applicable Languages
  • Compatibility with OpenCV
  • Overall Porting Difficulty

Add detection of Green and Yellow traffic lights.

Requested by Ted, must be done before Friday of this week.

  • Modify TrafficLight::Results class to incorporate the color of a Circle.
  • Modify Display Module to color the circle according to the light's color, and display text ("Red", "Green", "Yellow").
  • Modify Control Simulator to print the colors which have been detected.

Design Pixel Intensity to Arduino Control Value Mapping Function.

Need a function in the Control Module that will map the nearest pixel value for a region to a PWM value for the Arduino. It must do the following.

  • Invert the value's range (0-255 -> 255-0).
  • Weight closer values more significantly.

Possibly use some kind of exponential distribution? A sketch of one option follows.
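A minimal sketch of one such curve, a variant of the ModuleControl mapping above: invert the range, apply a power curve so near objects dominate, and suppress commands below a minimum threshold (the exponent and threshold are illustrative, not tuned values).

  #include <cmath>

  // Invert 0-255 depth intensity (closer = larger) and emphasize near objects.
  int IntensityToPwm(int nearestIntensity)
  {
    double inverted = 255.0 - nearestIntensity;                 // 0-255 -> 255-0
    double weighted = 255.0 * std::pow(inverted / 255.0, 3.0);  // cubic curve: output rises steeply only when close
    const int minCommand = 20;                                  // below this, don't bother vibrating
    int pwm = static_cast<int>(weighted + 0.5);
    return (pwm < minCommand) ? 0 : pwm;
  }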

Capture test images/videos of typical/challenging detection scenarios.

Use any old camera to capture images and videos of traffic lights/stop signs/faces/vehicles in easy conditions, as well as the following:

  • Night
  • Rainy
  • Snowing
  • High glare
  • Complex background

Downscale the images to anywhere between 640×480 and 1920×1080, following a consistent naming/numbering convention.

Implement ParseConfig class to load/save parameters from/to .INI file.

Could use XML or a simpler [NAME]=data format.
The INI file should have the same name as the program ("BlindAid.ini") and be loaded from the executable's folder.

Optional (very useful):

  • Auto-generate a template config file with default values.
  • Error checking/recovery in case the file is corrupt/missing.
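A bare-bones sketch of the [NAME]=data option using a std::map; validation, defaults, and recovery would sit on top of this (function names are illustrative).

  #include <fstream>
  #include <map>
  #include <string>

  // Read NAME=value lines into a map, skipping comments and malformed lines.
  std::map<std::string, std::string> LoadIni(const std::string& path)
  {
    std::map<std::string, std::string> values;
    std::ifstream file(path);
    std::string line;
    while (std::getline(file, line))
    {
      if (line.empty() || line[0] == ';') continue;    // skip blanks and comments
      size_t eq = line.find('=');
      if (eq == std::string::npos) continue;           // skip malformed lines
      values[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return values;
  }

  // Write the map back out as NAME=value lines.
  void SaveIni(const std::string& path, const std::map<std::string, std::string>& values)
  {
    std::ofstream file(path);
    for (const auto& kv : values)
      file << kv.first << "=" << kv.second << "\n";
  }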

Implement StopSign detection using OpenCV

Tools available:

  • SimpleBlobDetector
  • FindContours/approximatePolyDP
  • HoughLines
  • Canny edge detection
  • And especially cvtColor to convert the image to HSV (hue, saturation, value) color space. That way you can threshold on the color wheel and easily extract red regions.

From that point on, it will be a question of using clever ways to filter out garbage regions: limit size/aspect ratio/concavity, maybe count the number of vertices (should be 8), or even search within the candidate region with OCR/stringReader for the string "STOP". A rough sketch follows.
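A rough sketch combining the HSV threshold with approxPolyDP vertex counting; the thresholds and area cutoff are placeholders, not tuned values.

  #include <opencv2/opencv.hpp>
  #include <vector>

  // Extract red regions, then keep contours whose polygon approximation
  // has roughly 8 vertices (an octagon-like shape).
  std::vector<std::vector<cv::Point>> FindStopSignCandidates(const cv::Mat& bgr)
  {
    cv::Mat hsv, low, high, red;
    cv::cvtColor(bgr, hsv, cv::COLOR_BGR2HSV);
    cv::inRange(hsv, cv::Scalar(0, 100, 100), cv::Scalar(10, 255, 255), low);
    cv::inRange(hsv, cv::Scalar(170, 100, 100), cv::Scalar(180, 255, 255), high);
    red = low | high;

    std::vector<std::vector<cv::Point>> contours, candidates;
    cv::findContours(red, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    for (const auto& c : contours)
    {
      if (cv::contourArea(c) < 500) continue;          // discard tiny garbage regions
      std::vector<cv::Point> poly;
      cv::approxPolyDP(c, poly, 0.02 * cv::arcLength(c, true), true);
      if (poly.size() >= 7 && poly.size() <= 9)        // octagon-ish vertex count
        candidates.push_back(poly);
    }
    return candidates;
  }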

Fix Persistence of Parameters when Running Multiple Modes without Restarting.

Currently, when using the menu system's manual configuration, if one mode is selected, the customized parameters persist and are inherited by the next selected mode. This requires a program restart to obtain clean, expected behavior.

Using either a new Parameters class (to set it back to default), or some kind of reset function... parameters should be fresh for each mode without requiring a restart.

Convert Capture/Control Module Base Classes to Factory Pattern.

Currently, an if statement is needed to check the mode parameter and construct either the simulation or the realtime derived class at runtime. The base class should instead expose a function that returns either Simulate or ModuleCapture based on its internal Mode parameter.

The goal is to eliminate the need for a logical switch in the Core Module.
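A minimal sketch of what such a factory could look like; the Capture/CaptureRealtime/CaptureSimulate names mirror the issue text but are assumptions, not the project's actual class names.

  #include <memory>

  // The base class owns the construction decision, so Core never branches on the mode.
  class Capture
  {
  public:
    enum class Mode { Realtime, Simulate };
    static std::unique_ptr<Capture> Create(Mode mode);
    virtual ~Capture() = default;
    virtual void AcquireFrame() = 0;
  };

  class CaptureRealtime : public Capture { public: void AcquireFrame() override { /* read from camera */ } };
  class CaptureSimulate : public Capture { public: void AcquireFrame() override { /* load images from disk */ } };

  std::unique_ptr<Capture> Capture::Create(Mode mode)
  {
    if (mode == Mode::Realtime)
      return std::make_unique<CaptureRealtime>();
    return std::make_unique<CaptureSimulate>();
  }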

Implement Consecutive Counter for Vision Detections.

In order to decrease the number of false positives, we rely on the following three principles:

  • A valid detection will appear consistently in consecutive frames (unless conditions are very bad).
  • A valid detection will move very little in the frame between successive frames (unless the subject moves very quickly).
  • A valid detection will not grow or shrink significantly in size.

We can thus implement a counter that "silently" tracks the number of consecutive frames in which a particular detection is made, and only trigger a warning once said counter reaches a threshold (e.g. 10 frames ~= 1 second).

The tracking mechanism must thus be spatially and size aware. The challenge is that the number of detections will change frame by frame, and so an algorithm to match closest detections together is needed (i.e. the vector of detections will not be in the same order, and so detection #x could be associated with #y in the next frame).

This may not be needed if we use deep learning AND it is sufficiently accurate. However, with the high frame rate and the low-quality video we will produce, even a very high success rate will still lead to a significant number of false detections (e.g. 10 fps at 95% accuracy -> 1 false positive every 2 seconds!). A minimal tracking sketch follows.
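A minimal sketch of a spatially aware consecutive counter with nearest-detection matching; the distance limit and frame threshold are placeholders.

  #include <opencv2/opencv.hpp>
  #include <cmath>
  #include <vector>

  struct Track { cv::Point2f center; int hits; };

  // Match each new detection to the nearest existing track within maxDist;
  // matched tracks accumulate hits, unmatched old tracks are dropped.
  void UpdateTracks(std::vector<Track>& tracks, const std::vector<cv::Point2f>& detections,
                    float maxDist = 30.f, int warnThreshold = 10)
  {
    std::vector<Track> updated;
    for (const cv::Point2f& d : detections)
    {
      Track best = { d, 1 };                           // unmatched detections start a new track
      float bestDist = maxDist;
      for (const Track& t : tracks)
      {
        float dx = t.center.x - d.x, dy = t.center.y - d.y;
        float dist = std::sqrt(dx * dx + dy * dy);
        if (dist < bestDist) { bestDist = dist; best = { d, t.hits + 1 }; }
      }
      if (best.hits >= warnThreshold)
      {
        // consecutive-frame threshold reached: trigger the glove/audio warning here
      }
      updated.push_back(best);
    }
    tracks = updated;                                  // tracks with no match this frame reset
  }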

Decide on and Implement Functionality of 6th Vibrator.

Most likely it will be used to alert the user to a traffic light in the frame.

Write a new function that reads the traffic light results, determines whether there is a red light in the frame, and modulates the 6th vibrator signal accordingly.
Also need a new parameter to enable or disable this functionality, or possibly repurpose it for another use.
