Giter VIP home page Giter VIP logo

seal-see-all's Introduction

Project SEAL-See All

Project SEAL-See All is a comprehensive computer vision application designed to augment the interaction between users and computers using facial detection, hand gesture recognition, and object detection. Implemented on a macOS environment and leveraging a webcam, SEAL enables users to control actions or trigger scripts through specific hand gestures while also offering object detection and text extraction capabilities.

Features Facial Detection and Recognition: Identifies and recognizes faces in real-time video streams. Hand Gesture Recognition: Detects hand gestures including closed fist, open palm, pointing up, thumbs up, thumbs down, and victory sign using MediaPipe. Object Detection: Utilizes YOLO COCO dataset for recognizing a wide range of common objects. Optical Character Recognition (OCR): Extracts text from identified objects with high confidence using PyTesseract.

Installation Ensure you have Python 3.x installed on your macOS. Clone the repository and navigate to the project directory.

git clone cd SEAL-See-All

Dependencies

Install the required Python packages: pip install -r requirements.txt

Make sure to have the models and required files in the following paths: YOLO configuration and weights: /yolo/yolov3.cfg, /yolo/yolov3.weights COCO names: /yolo/coco.names Haar Cascade for face detection: /haar/haarcascade_frontalface_default.xml Gesture recognizer model: gesture_recognizer/gesture_recognizer.task Tesseract OCR: Ensure tesseract is installed and correctly set up on your macOS.

Preparing Facial Encodings Place sample images in the designated folder. Run image_augmentation.py to augment your images for better recognition performance. Use face_encoder.py to generate facial encoding files.

python image_augmentation.py python face_encoder.py

image

Usage Run the main script to start the application:

python seal.py

Toggle Object Detection: Press the 'o' key to turn on/off object detection. Extract Text: Press the 'a' key to activate OCR on objects detected with over 80% confidence.

Yolo Objects:

names:

0: person

1: bicycle

2: car

3: motorcycle

4: airplane

5: bus

6: train

7: truck

8: boat

9: traffic light

10: fire hydrant

11: stop sign

12: parking meter

13: bench

14: bird

15: cat

16: dog

17: horse

18: sheep

19: cow

20: elephant

21: bear

22: zebra

23: giraffe

24: backpack

25: umbrella

26: handbag

27: tie

28: suitcase

29: frisbee

30: skis

31: snowboard

32: sports ball

33: kite

34: baseball bat

35: baseball glove

36: skateboard

37: surfboard

38: tennis racket

39: bottle

40: wine glass

41: cup

42: fork

43: knife

44: spoon

45: bowl

46: banana

47: apple

48: sandwich

49: orange

50: broccoli

51: carrot

52: hot dog

53: pizza

54: donut

55: cake

56: chair

57: couch

58: potted plant

59: bed

60: dining table

61: toilet

62: tv

63: laptop

64: mouse

65: remote

66: keyboard

67: cell phone

68: microwave

69: oven

70: toaster

71: sink

72: refrigerator

73: book

74: clock

75: vase

76: scissors

77: teddy bear

78: hair drier

79: toothbrush

image

image

image

image

image

Adding Gesture Actions

Edit the gesture_recognition_callback function in seal.py to define actions for detected gestures.

To-Do:

Image Enhancement Integration:

I plan to integrate an image enhancement model to refine the images captured for object detection. This enhancement aims to significantly improve the quality of images before they are processed by OCR, thereby enhancing the accuracy of text extraction from complex backgrounds or low-quality images.

Voice Command Activation:

In an effort to broaden the interactivity of SEAL-See All, I am working on incorporating a feature that enables live voice recording with immediate transcription. The transcribed audio will then be fed into ShellGPT, facilitating the conversion of voice commands into actionable code execution within the command line environment. This enhancement is targeted towards creating a more seamless and hands-free user experience, enabling users to execute commands and control the application through voice interactions.

Stay tuned for these exciting updates as we continue to enhance the functionality and user experience of Project SEAL-See All.

seal-see-all's People

Contributors

ljpearson176 avatar

Stargazers

 avatar

Watchers

Kostas Georgiou avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.