danpla / dpscreenocr
Program to recognize text on screen
Home Page: https://danpla.github.io/dpscreenocr/
License: zlib License
Currently users see a read-only list of available languages, with no idea how to add a new language. The manual describes the ways to do it, but not all users read it. It would be good to add an "Add languages" button that shows instructions on how to add new languages by installing Tesseract packages.
$ sudo add-apt-repository ppa:daniel.p/dpscreenocr
E: The repository 'https://ppa.launchpadcontent.net/daniel.p/dpscreenocr/ubuntu kinetic Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
$ sudo apt install dpscreenocr
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package dpscreenocr
Application was running previously but after
the application opens with the message "Please install languages". I assumed that the language file was not installed or was missing. Upon inspection, the file eng.traindata was present in the C:\Program Files (x86)\dpScreenOCR\tessdata directory. Downloading another language file and copying it to the same directory does not change the error message.
Removing and reinstalling dpscreenocr several times did not yield different results. Uninstalling dpscreenocr does not remove the configuration directory C:\Users\<user>\AppData\Local\dpscreenocr. I manually deleted the directory and reinstalled dpscreenocr, but still no change.
Installation on another computer with a similar Windows version (version 20H2, build 19042.1466) was successful, so the issue is local to this computer.
Is dpScreenOCR aware of other Tesseract installations through environment variables?
Currently, after the hotkey is pressed, the screen doesn't freeze as it does in Text Extractor from Windows PowerToys, where I can select text without worrying that I'll miss the moment after the hotkey press. Adding this feature would be convenient when I need to quickly copy some text that is going to disappear from the screen soon.
When I try to translate some text on the screen, dpscreenocr crashes with:
Warning: Invalid resolution 0 dpi. Using 70 instead.
Segmentation fault
I don't know if the warning is meaningful, btw.
Any idea how to diagnose this better?
I'm running Ubuntu 18.04 and dpscreenocr 1.0.2 from the PPA.
Some of the DLLs are detected by AVG as heavily disguised malware.
Hi, I'm just starting to look at the code and wondering what would be needed for Wayland support. If it's just about the hotkey triggering limitations, I know some projects (e.g. kupfer) have worked around needing X11KeyManager by having a command-line method of invocation paired with instructions to set up a custom shortcut (e.g. via GNOME settings).
I'm opening an issue to discuss and understand whether there are other aspects to be considered in attempting to support Wayland.
(Thanks for your earlier advice, the application works fine under X11.)
I just upgraded to 1.30 and Malwarebytes is flagging it as malware:
-Blocked Malware Details-
File: 1
MachineLearning/Anomalous.94%, C:\Program Files (x86)\dpScreenOCR\dpscreenocr.exe, Quarantined, 0, 392687, 1.0.63270, , shuriken, , 7A97F31B2DC40CFC17A36D015C311E30, 89CF93609749445DB72798C62731B3611525E7BAFA1872E3A9AC825720D03451
I have this error:
qt5ct: using qt5ct plugin
Error opening data file /usr/share/tesseract-ocr/4.00/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
But /usr/share/tesseract-ocr/5/tessdata/eng.traineddata is the new location.
The error is from Linux Mint.
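A quick way to confirm which tessdata directory actually holds the language file is sketched below. It is only a diagnostic aid, not part of dpscreenocr: the function name find_traineddata is hypothetical, and the two directories are the ones mentioned in the error message and the report above.

```python
import os


def find_traineddata(lang, candidate_dirs):
    """Return the directories in candidate_dirs that contain <lang>.traineddata."""
    filename = lang + ".traineddata"
    return [d for d in candidate_dirs
            if os.path.isfile(os.path.join(d, filename))]


if __name__ == "__main__":
    # Directories taken from the error message and the report above.
    candidates = [
        "/usr/share/tesseract-ocr/4.00/tessdata",
        "/usr/share/tesseract-ocr/5/tessdata",
    ]
    # If TESSDATA_PREFIX is set, check that directory first.
    prefix = os.environ.get("TESSDATA_PREFIX")
    if prefix:
        candidates.insert(0, prefix)
    print(find_traineddata("eng", candidates))
```

If only the 5/tessdata directory is listed, the program is looking in a path that no longer holds the data file.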
Hello, thank you very much for this useful program. How about a command-line option that runs the same action the hotkey triggers (copying the recognized text of the selected rectangle), so that a key grabber such as xbindkeys can be used instead of dpscreenocr's own?
Hello!
I have a very annoying issue with dpscreenocr. It's a super nice program, but if I open a new window, e.g. a PDF, after launching dpscreenocr, there is no way to OCR the content of the new window, because dpscreenocr's selection box shows up behind the window I need.
I use Linux, X11 with qtile DE.
Currently the result of an OCR operation seems to be placed only in the history text area, so it's not easy to tell that the process completed successfully. It would be good to show a popup window with the result text when recognition finishes.
Sorry if this is a stupid question. I read the manual, but I could not decide whether I should use "Split text blocks" or not. I'm using dpscreenocr for reading manga. Thank you.
Dear danpla,
I hope this message finds you well. I am an avid user of dpScreenOCR, and I want to express my appreciation for the excellent work you've done on this application. I have been impressed by its speed and efficiency.
As I've been using dpScreenOCR, I've identified three features that I believe would significantly enhance the user experience:
Auto Startup: It would be immensely beneficial to have an option to enable dpScreenOCR to launch automatically upon system startup. This feature would streamline its accessibility and make it readily available whenever the computer is powered on.
Minimize to Tray: Another valuable addition would be the ability to keep dpScreenOCR minimized and in the system tray upon launch. This would contribute to a cleaner desktop environment while ensuring quick access to the application when needed.
Adjustable Line Concatenation Option: Introducing a feature that allows users to enable or disable line concatenation upon copying text would significantly enhance the readability and organization of extracted content. When activated, this option would intelligently join lines of text, eliminating line breaks and presenting a seamless and cohesive view of the captured information.
Additionally, it seems that the "Split Text Blocks" option is not working. Your consideration of these enhancements, including investigating the "Split Text Blocks" issue, would be greatly appreciated. I believe these improvements will contribute to making dpScreenOCR an even more versatile and user-friendly tool.
Thank you for your time and dedication to improving this superb application. I look forward to seeing dpScreenOCR evolve with these potential enhancements.
Hi,
I've downloaded version 1.1.0, which supports Tesseract 5.2.0, but the OCR languages page refers to Tesseract 4.1:
https://danpla.github.io/dpscreenocr/languages
I'm particularly interested in Spanish.
Thanks in advance and thank you for your programme!
It would be good to integrate a function that translates the result text into another language, using the translate-shell command-line tool.
Is there any plan to support BSD?
Hi, I was using it on Ubuntu without any problem, but on Windows it does not save the config when I reboot the computer. I created a shortcut to dpscreenocr.exe and copied it to C:\Users\username\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup to make dpscreenocr start at system startup, but it does not save the config.
I use Windows 7 with English OCR. By the way, what does the "Split text blocks" option do?
Hello, running sudo alien --to-rpm on the latest deb package produced a perfectly working rpm file on openSUSE 15.4; maybe worth mentioning?
Congratulations on such a promising tool! Thanks for the effort!
It seems that, sometimes, dark text on a light background works better than light text on a dark background (or vice versa), depending on the specific sample.
It would be interesting to have an option (a checkbox or similar) to "invert" the image clip (negative) before sending it to Tesseract. Is it possible?
DJuego
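To make the request concrete, the inversion itself is a per-pixel operation. The sketch below assumes an 8-bit grayscale image stored as rows of integers; the function name invert_grayscale is hypothetical, and nothing here reflects how dpScreenOCR actually handles its screenshots.

```python
def invert_grayscale(pixels):
    """Return the negative of an 8-bit grayscale image.

    pixels is a list of rows, each row a list of ints in the range 0-255.
    The input is left unchanged; a new image is returned.
    """
    return [[255 - value for value in row] for row in pixels]


if __name__ == "__main__":
    # Light text (255) on a dark background (0) becomes dark text on light.
    clip = [[0, 255, 0],
            [0, 255, 0]]
    print(invert_grayscale(clip))
```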
dpScreenOCR does not recognize numbers. It should recognize numbers both when they appear inside text and when the whole OCRed image is a number.
I tried adding these to the config file: Caps, Capslock, Caps-Lock, CapsLock, but none of them works. After restarting dpscreenocr, hotkey_cancel_selection is empty.
At the moment, it seems it's not possible to start dpscreenocr without having its window opened.
I don't need to use the window; I'd just like to use the OCR screenshot reading feature through the configured keyboard shortcut.
Also, and maybe obviously, I don't want to start the app manually, so it'd be great if there were an "autostart at system startup" feature (with, as said before, no window opened and shown).
Thank you for creating this cool little app. But there is something important I really miss.
When I use dpScreenOCR with this picture, I get this output:
Eh bien, je m’entraîne beaur-
coup, je me prépare pour les
championnats suisses, après je
voudrais partir pour les Etats-
Unis. Mais ma mère préfère
rester en Europe.
But what I would like to have is this:
Eh bien, je m’entraîne beaurcoup, je me prépare pour les championnats suisses, après je voudrais partir pour les Etats-Unis. Mais ma mère préfère rester en Europe.
What is the difference?
A) I prefer to have a mode that produces no line breaks.
B) Words split at the end of a line, like "beaur-" / "coup", should be joined: "beaurcoup".
C) Hyphenated words in which the hyphen is not used to split the word across lines (like "Etats-Unis"): the hyphen should be retained here.
If you cannot implement C), it would be good enough to have A) and B).
What do you think?
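Points A) to C) can be sketched as a small post-processing step. This is only an illustration, not dpScreenOCR's behavior: the function name join_lines is hypothetical, and the rule used for C) (keep the hyphen when the next line starts with an uppercase letter) is an assumed heuristic that happens to fit the sample above and will misjudge other cases.

```python
def join_lines(text):
    """Join OCR lines into one paragraph, repairing end-of-line hyphenation."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    result = ""
    for line in lines:
        if not result:
            result = line
        elif result.endswith("-"):
            if line[0].isupper():
                # Assumed heuristic for C): a capitalized continuation is
                # treated as a real hyphenated compound, so keep the hyphen.
                result += line
            else:
                # B): a lowercase continuation is treated as a split word,
                # so drop the hyphen and join the halves.
                result = result[:-1] + line
        else:
            # A): replace the line break with a single space.
            result += " " + line
    return result


if __name__ == "__main__":
    print(join_lines("voudrais partir pour les Etats-\nUnis. Mais ma mère"))
```

A dictionary lookup would be a more reliable way to decide C), at the cost of needing a word list per language.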
I'm new to Ubuntu and have no idea where the download is saved or how to get the application to run. Can you please give some further guidance?
I really love the program and am using it more than I thought I would. Thank you!
However, I have a feature request: run in the background (preferably as a tray icon).
I am a bit confused.
According to https://github.com/danpla/dpscreenocr/releases/tag/v1.3.0, version 1.3.0 is the latest version.
I have added the danpla Repo according to https://software.opensuse.org//download.html?project=home%3Adanpla&package=dpscreenocr:
With ...
echo 'deb http://download.opensuse.org/repositories/home:/danpla/Debian_10/ /' | sudo tee /etc/apt/sources.list.d/home:danpla.list
curl -fsSL https://download.opensuse.org/repositories/home:danpla/Debian_10/Release.key | gpg --dearmor | sudo tee /etc/apt/trusted.gpg.d/home_danpla.gpg > /dev/null
sudo apt update
sudo apt install dpscreenocr
... some months ago. This worked.
But when I run this:
$ sudo apt install dpscreenocr
Package lists are read... Ready
Dependency tree is built.
Status information is read.... Done
dpscreenocr is already the latest version (1.2.0-1).
0 updated, 0 reinstalled, 0 to remove and 0 not updated.
My expectation was that I would get version 1.3.0.
What am I missing?
I would appreciate an answer. Thank you.
Can you briefly explain the difference between the dpScreenOCR tool, Microsoft's Text Extractor utility from PowerToys, and the OCR tool from the open-source ShareX package?
Actually, it would be ideal to include in the README.md file a brief comparison of these three tools, or a table comparing their features.
Thanks
Even when testing with one language, it takes ten seconds to recognize. OCR'ing the same image with tesseract via the command line takes under a second.
I have it set to recognize only one language in both cases, so I am not sure what's causing the delay in dpscreenocr.
I'm trying to find ways to improve OCR quality, and I have seen some OCR post-processing tweaks with ImageMagick, so I wonder: does dpscreenocr do that already?
After installation using the PPA, I installed tesseract-ocr-eng, but it wasn't recognized. So I reinstalled the app. It still does not recognize the language.
This is very bad UX, and no examples of English packages are given in the manual.
I'm on Ubuntu 20.04
Due to the nature of the program, it cannot detect oblique text or words; even when it does detect them, it does so incorrectly. An angled area selection would be nice.
Please provide a screenshot or a video link to help users better understand how the tool works before installing the app.
I have found a video, although not very well recorded: https://www.youtube.com/watch?v=DCbfp0zMJCk
... but several screenshots from the application would be much better.
Maybe you could "integrate" it somehow in the Scoop package manager for Windows, for example.
In order to easily and quickly test the application, adding a Scoop manifest JSON file might be a good idea,
then I could install it quickly with:
scoop install https://raw.githubusercontent.com/danpla/dpscreenocr/master/dpscreenocr.json
or something similar.
For now, that's enough and too much from me...
Thanks!
Great program, good job.
Is it possible to add an option to minimize the program to the system tray?
I use Kubuntu 18.04.
Hello,
Please allow me a suggestion. It would be nice to have an option to start the program minimized in the system tray. In my case the program starts when Windows starts, and I use it with a hotkey when I need it. I can minimize it to the system tray using another program made for this purpose, but that has to be done every time the session starts. An option to minimize itself to the system tray on startup would be wonderful.
Greetings, and congratulations on this fantastic program.
Problem:
dpscreenocr uses Tesseract to recognize text on the screen. However, Tesseract has known issues with recognizing text from images captured by Intel UHD 600 graphics cards (like BSOD).
Solution:
Add support for Tesseract-OCR-DNN to dpscreenocr. Tesseract-OCR-DNN is a deep learning-based OCR engine that is not affected by the same issues as Tesseract.
Benefits:
Improved accuracy for recognizing text from images captured by Intel UHD600 graphics cards.
Increased flexibility for users to choose the best OCR engine for their needs.
Implementation:
Add a new configuration option to dpscreenocr that allows users to choose between using Tesseract or Tesseract-OCR-DNN.
Update the code to use Tesseract-OCR-DNN when the new configuration option is set to "Tesseract-OCR-DNN".
Conclusion:
Adding support for Tesseract-OCR-DNN to dpscreenocr would be a valuable addition to the program. It would provide users with improved accuracy and flexibility, making it a more powerful tool for recognizing text on the screen.