wolfmanstout / gaze-ocr
Easily apply OCR to wherever the user is looking onscreen.
License: Apache License 2.0
@lunixbochs FYI
Currently, if I try to include this package (and screen-ocr) in a Talon user directory, it runs several files that I don't want run, including setup.py and optional-dependency modules such as dragonfly.py. I can guard some of these with if __name__ == "__main__", but not the optional dependencies, which are supposed to be imported as modules. This triggers import errors due to missing dependencies.
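One way such import errors might be avoided is to make the import of an optional dependency tolerant of its absence. The sketch below is only an illustration under that assumption: `optional_import` and `describe_backends` are hypothetical helpers, not part of gaze-ocr, screen-ocr, or Talon, and whether this fits the packages' actual layout is untested.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it cannot be imported."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # The module, or something it depends on, is not installed.
        return None


def describe_backends(names):
    """Map each module name to True if it can be imported, else False."""
    return {name: optional_import(name) is not None for name in names}


if __name__ == "__main__":
    # Runs only when executed as a script, not when imported as a module.
    print(describe_backends(["json", "dragonfly"]))
```

A module written this way can be loaded even when the optional package is missing; code that needs the dependency would then check for `None` before using it.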
I'm testing this out on Python 3.8.2 32-bit.
I've tried several different ways to run this. The 2nd and 3rd methods do not function reliably. Is there anything further I can do to help troubleshoot?
Running the grammar through NatLink the traditional way, as _gaze-ocr.py, works correctly (in-process method).
python -m dragonfly load --engine natlink gaze-ocr.py --no-recobs-messages
in-process method.
The OCR seems to work correctly: it is marked as a success and does have a correct gaze location. However, no text is highlighted.
ocr_data.zip
python -m dragonfly load --engine natlink gaze-ocr.py --no-recobs-messages
out-of-process method (the grammar runs on its own thread).
What's interesting here is that several different behaviors appear.
The OCR seems to work correctly: it is marked as a success and does have a correct gaze location. However, no text is highlighted.
ocr_data.zip success
On occasion during repeated tests, it fails to create a selection and Execution failed: SelectTextAction() is produced. On examination, the OCR data does not reflect the gaze at the time of recognition (verified through Gaze Trace). In fact, no matter where I look (e.g. the four corners of the screen), the coordinates of the gaze snapshot seem to be the same location. This is despite a significant pause after dictation.
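To check the "same location no matter where I look" symptom more systematically, a small test along these lines could be run over gaze snapshots recorded while looking at different corners of the screen. This is a rough sketch: `gaze_looks_stuck` is a hypothetical helper, the 5-pixel tolerance is an arbitrary assumption, and how the `(x, y)` points are collected is left open, since it does not use any gaze-ocr API.

```python
def gaze_looks_stuck(points, tolerance_px=5.0):
    """Return True if all (x, y) points lie within tolerance_px of each other on both axes."""
    if len(points) < 2:
        # A single sample cannot show whether the gaze is updating.
        return False
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) <= tolerance_px and (max(ys) - min(ys)) <= tolerance_px


# Hypothetical samples: four screen corners versus four near-identical readings.
corners = [(0, 0), (1919, 0), (0, 1079), (1919, 1079)]
frozen = [(960, 540), (961, 540), (960, 541), (960, 540)]
print(gaze_looks_stuck(corners), gaze_looks_stuck(frozen))
```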
import threading, time
import gaze_ocr
import screen_ocr # dependency of gaze-ocr
from dragonfly import (
    Dictation,
    Grammar,
    Key,
    MappingRule,
    Mouse,
    Text,
    get_engine,
)
# See installation instructions:
# https://github.com/wolfmanstout/gaze-ocr
DLL_DIRECTORY = r"C:\Users\Main\Desktop\ocr_data\dll"
# Initialize eye tracking and OCR.
tracker = gaze_ocr.eye_tracking.EyeTracker.get_connected_instance(DLL_DIRECTORY)
ocr_reader = screen_ocr.Reader.create_fast_reader()
gaze_ocr_controller = gaze_ocr.Controller(ocr_reader, tracker, save_data_directory=r"C:\Users\Main\Desktop\ocr_data")
class CommandRule(MappingRule):
    mapping = {
        # Click on text.
        "<text> click": gaze_ocr_controller.move_cursor_to_word_action("%(text)s") + Mouse("left"),
        # Move the cursor for text editing.
        "go before <text>": gaze_ocr_controller.move_cursor_to_word_action("%(text)s", "before") + Mouse("left"),
        "go after <text>": gaze_ocr_controller.move_cursor_to_word_action("%(text)s", "after") + Mouse("left"),
        # Select text starting from the current position.
        "words before <text>": gaze_ocr_controller.move_cursor_to_word_action("%(text)s", "before") + Key("shift:down") + Mouse("left") + Key("shift:up"),
        "words after <text>": gaze_ocr_controller.move_cursor_to_word_action("%(text)s", "after") + Key("shift:down") + Mouse("left") + Key("shift:up"),
        # Select a phrase or range of text.
        "words <text> [through <text2>]": gaze_ocr_controller.select_text_action("%(text)s", "%(text2)s"),
        # Select and replace text.
        "replace <text> with <replacement>": gaze_ocr_controller.select_text_action("%(text)s") + Text("%(replacement)s"),
    }
    extras = [
        Dictation("text"),
        Dictation("text2"),
        Dictation("replacement"),
    ]

    def _process_begin(self):
        # Start OCR now so that results are ready when the command completes.
        gaze_ocr_controller.start_reading_nearby()
grammar = Grammar("ocr_test")
grammar.add_rule(CommandRule())
grammar.load()
# Force NatLink to schedule background threads frequently by regularly waking up
# a dummy thread.
shutdown_dummy_thread_event = threading.Event()
def run_dummy_thread():
    while not shutdown_dummy_thread_event.is_set():
        time.sleep(1)
dummy_thread = threading.Thread(target=run_dummy_thread)
dummy_thread.start()
# Initialize a Dragonfly timer to manually yield control to the thread.
def wake_dummy_thread():
    dummy_thread.join(0.002)
wake_dummy_thread_timer = get_engine().create_timer(wake_dummy_thread, 0.02)
def unload():
    # ... after unloading the grammar ...
    shutdown_dummy_thread_event.set()
    dummy_thread.join()
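The dummy-thread workaround at the end of the script can be tried in isolation, without NatLink or Dragonfly. The sketch below is a variation rather than the script's exact code: it waits on the event instead of calling time.sleep(1) so that shutdown returns promptly, and in the real grammar `wake_dummy_thread` would be called from a Dragonfly timer rather than directly. The names `shutdown_event` and `shutdown` are illustrative.

```python
import threading

shutdown_event = threading.Event()


def run_dummy_thread():
    # Event.wait returns True as soon as the event is set, ending the loop.
    while not shutdown_event.wait(timeout=1.0):
        pass


dummy_thread = threading.Thread(target=run_dummy_thread, daemon=True)
dummy_thread.start()


def wake_dummy_thread(timeout=0.002):
    # Blocking in join() for a short time releases the GIL, which gives
    # background threads (such as an OCR worker) a chance to run.
    dummy_thread.join(timeout)
    return dummy_thread.is_alive()


def shutdown():
    # Signal the dummy thread to exit, then wait for it to finish.
    shutdown_event.set()
    dummy_thread.join()
```

Calling `wake_dummy_thread()` returns True while the dummy thread is still running; after `shutdown()` the thread has exited.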