janis91 / ocr
Nextcloud OCR (optical character recognition) processing for images with tesseract-js
License: GNU Affero General Public License v3.0
Nextcloud 11.0.0, OCR 2.0.0.
When trying to OCR while OCRWorker is not running, the OCR queue grows bigger and bigger.
After starting OCRWorker, freshly queued documents will be OCR'd, but the old OCR queue from before won't vanish.
The Apache log also fills up with requests like this:
"GET /apps/ocr/status HTTP/1.1" 200 997 "
For the moment I can only disable the app until the bug is fixed.
I have tried to install this app by downloading tarball from Nextcloud appstore and by cloning the repo. It should appear as an inactive app.
App can't be found in list.
I have tried to install the app from the old ownCloud appstore. As it is not there, I changed my settings.php to point to the NC appstore. I couldn't find it there either, so I tried to download the tar.gz from there. I noticed some problems with file permissions and fixed them manually, but the app doesn't appear in the list.
I have also tried to install the app by cloning the repository. I found the same permission issue.
The permissions in app folder are:
Nextcloud 10
Apache 2.4.18
MariaDb 10.1.19
sudo -u www-data php /var/www/nextcloud/occ encryption:status
Instead of running supervisor, I've added this to systemd (/etc/systemd/system/nextcloud-ocr-worker.service):
[Unit]
Description=OCRWorker for Nextcloud OCR
After=apache2.service
[Service]
User=www-data
Group=www-data
ExecStart=/usr/bin/php /var/www/nextcloud/apps/ocr/worker/OCRWorker.php
Nice=19
Restart=always
[Install]
WantedBy=multi-user.target
A file in a folder shared by another user should be able to be processed by the OCR app.
Currently you may select files shared by other users and start the OCR process but it never finishes since the file is not accessible via nextcloud/data/user/current_user/files.
There are two possible solutions: first, disallow OCR processing of shared files; or second, implement a way to find out the real path of the file to process and, if it is writable by the current user, process it.
OCRWorker outputs "ERROR - File not found"
Create the remaining unit tests for better test coverage. Maybe exclude the worker from the coverage report.
Fix the failing build. It is only an issue for the PHP 7 build.
Provide better documentation, for example for PHP 7 and Gearman.
If a command or file fails to process, the OCRWorker will receive error messages which can be very helpful for bug tracking AND problem resolution for the user.
None of these messages is logged or displayed when the status is updated to FAILED.
The whole message received by the OCRWorker could be transferred (via a tmp file solution) to the occ command, logged, and shown as a mouse-hover tip on the personal settings page.
This would make bug tracking easier and provide much more information about why a file failed.
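As a rough illustration of the tmp file idea, the sketch below runs a command and, if it fails, keeps its error output in a temporary file whose path is printed. The function name run_capturing_errors is hypothetical and not part of the OCR app; how the path would then reach the occ command is left open here.

```shell
# Run a command and keep its error output when it fails.
# On success: removes the temp file, prints nothing, returns 0.
# On failure: prints the path of the file holding stderr, returns 1.
# The command's stdout is discarded, on the assumption that the OCR
# tools write their result to a file rather than to stdout.
run_capturing_errors() {
    errfile=$(mktemp) || return 1
    if "$@" > /dev/null 2> "$errfile"; then
        rm -f "$errfile"
        return 0
    fi
    printf '%s\n' "$errfile"
    return 1
}

# Example with a command that always fails:
#   path=$(run_capturing_errors sh -c 'echo "no such file" >&2; exit 2')
#   cat "$path"    # shows the captured error text
```

The caller is responsible for deleting the file once the message has been logged.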
I will test the app with Ubuntu 14.04 LTS on a VM in the next few days.
OCRWorker.php is running as a systemd service, as described in the documentation, with the correct user and group www-data, the same as the web server.
Yet OCRWorker fails to OCR a newly uploaded PDF file.
The documentation states that OCRWorker.php is supposed to automatically OCR a newly added PDF file.
It doesn't OCR the PDF file, even after waiting about an hour. However, the command in the overlay menu does indeed OCR the file.
I would prefer the automatic OCR to be working.
Where should I look to troubleshoot this?
Is there a log?
Has anyone had the same issue with OCRWorker and solved it?
OCRWorker.php daemon using the systemd option as detailed in the wiki.
OCRWorker.php is doing nothing, even after a long time.
OCRWorker and tesseract run with high load for 10 seconds, and a new file is produced adjacent to the original file, with the _OCR.pdf suffix, correctly containing the OCR'ed data.
Enabling the OCR app in Nextcloud 11 should succeed
App "Array" cannot be installed because the following dependencies are not fulfilled: The command line tool ocrmypdf could not be found
alias ocrmypdf='docker run --rm -v "$(pwd):/home/docker" ocrmypdf'
ocrmypdf on the CLI. (Success.)
At the moment the JS code is really ugly and not testable at all. I want to set up a package.json inside the "js" folder:
js:
Webpack as bundler for the dist/ocr-app.js file (npm run buildApp).
Webpack as bundler for the dist/ocr-personal.js file (npm run buildPersonal).
Jasmine for unit tests (separate tsconfig.json) (step: npm run test).
Restructure the client code to a good class structure.
Add unit tests for the classes/methods.
Adjust the php application to only include the ocr-app.js file in the dist folder.
Adjust the .travis.yml that the node_modules are installed and tests run properly.
Error index OCP\AppFramework\Db\DoesNotExistException: Did expect one result but found none when executing: query "SELECT file_target FROM *PREFIX*share WHERE file_source = ? AND share_with = ? AND uid_owner = ?"; parameters Array ( [0] => 13392691 [1] => rascal [2] => local::/home/ ) ; limit ""; offset "" /var/www/nextcloud/lib/public/AppFramework/Db/Mapper.php - line 373: OCP\AppFramework\Db\Mapper->findOneQuery('SELECT file_tar...', Array, NULL, NULL) /var/www/nextcloud/apps/ocr/lib/Db/ShareMapper.php - line 42: OCP\AppFramework\Db\Mapper->findEntity('SELECT file_tar...', Array) /var/www/nextcloud/apps/ocr/lib/Service/OcrService.php - line 320: OCA\Ocr\Db\ShareMapper->find('13392691', 'rascal', 'local /home/') /var/www/nextcloud/apps/ocr/lib/Service/OcrService.php - line 173: OCA\Ocr\Service\OcrService->buildTargetForShared(Object(OCA\Ocr\Db\File)) /var/www/nextcloud/apps/ocr/lib/Controller/OcrController.php - line 74: OCA\Ocr\Service\OcrService->process(Array, Array) /var/www/nextcloud/apps/ocr/lib/Controller/Errors.php - line 35: OCA\Ocr\Controller\OcrController->OCA\Ocr\Controller\{closure}() /var/www/nextcloud/apps/ocr/lib/Controller/OcrController.php - line 75: OCA\Ocr\Controller\OcrController->handleNotFound(Object(Closure)) [internal function] OCA\Ocr\Controller\OcrController->process(Array, Array) /var/www/nextcloud/lib/private/AppFramework/Http/Dispatcher.php - line 160: call_user_func_array(Array, Array) /var/www/nextcloud/lib/private/AppFramework/Http/Dispatcher.php - line 90: OC\AppFramework\Http\Dispatcher->executeController(Object(OCA\Ocr\Controller\OcrController), 'process') /var/www/nextcloud/lib/private/AppFramework/App.php - line 114: OC\AppFramework\Http\Dispatcher->dispatch(Object(OCA\Ocr\Controller\OcrController), 'process') /var/www/nextcloud/lib/private/AppFramework/Routing/RouteActionHandler.php - line 47: OC\AppFramework\App main('OcrController', 'process', Object(OC\AppFramework\DependencyInjection\DIContainer), Array) [internal 
function] OC\AppFramework\Routing\RouteActionHandler->__invoke(Array) /var/www/nextcloud/lib/private/Route/Router.php - line 299: call_user_func(Object(OC\AppFramework\Routing\RouteActionHandler), Array) /var/www/nextcloud/lib/base.php - line 1010: OC\Route\Router->match('/apps/ocr') /var/www/nextcloud/index.php - line 40: OC handleRequest() {main} 2 minutes ago Error ocr Exception during ocr service function processing: {"Exception":"OCP\\AppFramework\\Db\\DoesNotExistException","Message":"Did expect one result but found none when executing: query \"SELECT file_target FROM *PREFIX*share WHERE file_source = ? AND share_with = ? AND uid_owner = ?\"; parameters Array\n(\n [0] => 13392691\n [1] => rascal\n [2] => local::\/home\/\n)\n; limit \"\"; offset \"\"","Code":0,"Trace":"#0 \/var\/www\/nextcloud\/lib\/public\/AppFramework\/Db\/Mapper.php(373): OCP\\AppFramework\\Db\\Mapper->findOneQuery('SELECT file_tar...', Array, NULL, NULL)\n#1 \/var\/www\/nextcloud\/apps\/ocr\/lib\/Db\/ShareMapper.php(42): OCP\\AppFramework\\Db\\Mapper->findEntity('SELECT file_tar...', Array)\n#2 \/var\/www\/nextcloud\/apps\/ocr\/lib\/Service\/OcrService.php(320): OCA\\Ocr\\Db\\ShareMapper->find('13392691', 'rascal', 'local::\/home\/')\n#3 \/var\/www\/nextcloud\/apps\/ocr\/lib\/Service\/OcrService.php(173): OCA\\Ocr\\Service\\OcrService->buildTargetForShared(Object(OCA\\Ocr\\Db\\File))\n#4 \/var\/www\/nextcloud\/apps\/ocr\/lib\/Controller\/OcrController.php(74): OCA\\Ocr\\Service\\OcrService->process(Array, Array)\n#5 \/var\/www\/nextcloud\/apps\/ocr\/lib\/Controller\/Errors.php(35): OCA\\Ocr\\Controller\\OcrController->OCA\\Ocr\\Controller\\{closure}()\n#6 \/var\/www\/nextcloud\/apps\/ocr\/lib\/Controller\/OcrController.php(75): OCA\\Ocr\\Controller\\OcrController->handleNotFound(Object(Closure))\n#7 [internal function]: OCA\\Ocr\\Controller\\OcrController->process(Array, Array)\n#8 \/var\/www\/nextcloud\/lib\/private\/AppFramework\/Http\/Dispatcher.php(160): call_user_func_array(Array, 
Array)\n#9 \/var\/www\/nextcloud\/lib\/private\/AppFramework\/Http\/Dispatcher.php(90): OC\\AppFramework\\Http\\Dispatcher->executeController(Object(OCA\\Ocr\\Controller\\OcrController), 'process')\n#10 \/var\/www\/nextcloud\/lib\/private\/AppFramework\/App.php(114): OC\\AppFramework\\Http\\Dispatcher->dispatch(Object(OCA\\Ocr\\Controller\\OcrController), 'process')\n#11 \/var\/www\/nextcloud\/lib\/private\/AppFramework\/Routing\/RouteActionHandler.php(47): OC\\AppFramework\\App::main('OcrController', 'process', Object(OC\\AppFramework\\DependencyInjection\\DIContainer), Array)\n#12 [internal function]: OC\\AppFramework\\Routing\\RouteActionHandler->__invoke(Array)\n#13 \/var\/www\/nextcloud\/lib\/private\/Route\/Router.php(299): call_user_func(Object(OC\\AppFramework\\Routing\\RouteActionHandler), Array)\n#14 \/var\/www\/nextcloud\/lib\/base.php(1010): OC\\Route\\Router->match('\/apps\/ocr')\n#15 \/var\/www\/nextcloud\/index.php(40): OC::handleRequest()\n#16 {main}","File":"\/var\/www\/nextcloud\/lib\/public\/AppFramework\/Db\/Mapper.php","Line":289}
Following the Supervisord example results in the file sitting in the queue.
To process the files
The file stays in a pending mode
I added user=www-data to supervisord.conf, which fixed it.
Install per current instructions
An installation of the OCR app on Debian Stretch fails to work. Users are able to start the OCR with a selected language, but then see a never ending sequence of messages reporting that "Temp file does not exist.". This sequence can only be stopped by hacking the database and removing the row related to the OCR job. Technically, the code should reset the job status before raising the exception (see pull request #64).
The file which was not found exists in /tmp/ and has the correct permissions. Nevertheless, the PHP code claims that the file does not exist. This is caused by a security feature of systemd: apache2.service gets its own /tmp directory. Therefore /tmp seen by PHP code running in Apache2 is not the same as /tmp seen by the OCRWorker.php process.
It is possible to disable the security feature with a private /tmp for apache2.service, but that would be a bad solution. The better solution is using systemd to start OCRWorker.php and tell it to share /tmp with apache2.service. This works for me.
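One way to get this behaviour, not necessarily the exact setup used above, is systemd's JoinsNamespaceOf= setting combined with PrivateTmp=true. The sketch below writes such a unit file. The helper name write_ocr_worker_unit is made up, the paths are copied from the systemd example earlier on this page, and it assumes that apache2.service itself runs with PrivateTmp=true.

```shell
# Write a unit file for OCRWorker.php that joins apache2's private /tmp.
# $1: destination path of the unit file.
# All paths and the unit layout are assumptions; adjust to your system.
write_ocr_worker_unit() {
    cat > "$1" <<'EOF'
[Unit]
Description=OCRWorker for Nextcloud OCR
After=apache2.service
# See the same /tmp and /var/tmp as apache2.service
JoinsNamespaceOf=apache2.service

[Service]
User=www-data
Group=www-data
# Must be enabled here too, otherwise JoinsNamespaceOf has no effect on /tmp
PrivateTmp=true
ExecStart=/usr/bin/php /var/www/nextcloud/apps/ocr/worker/OCRWorker.php
Nice=19
Restart=always

[Install]
WantedBy=multi-user.target
EOF
}

# Example (as root):
#   write_ocr_worker_unit /etc/systemd/system/nextcloud-ocr-worker.service
#   systemctl daemon-reload
#   systemctl restart nextcloud-ocr-worker.service
```

With this in place the worker and the PHP code inside Apache should resolve /tmp to the same directory, while other services still cannot see it.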
I downloaded the master.zip and the 0.8.8: same error.
I put the folder in apps/ and opened the Apps web page. It can't display the list of disabled apps, and I got this message:
Array to string conversion at /home/www/kh.ro/dev/nextcloud-dev/settings/Controller/AppSettingsController.php#238
I had to remove the German version in the info.xml (name/summary/description):
<name>OCR</name>
<summary >Character recoginition for your images and pdf files.</summary>
<description><![CDATA[# Description
[![Build Status](https://travis-ci.org/janis91/ocr.svg?branch=master)](https://travis-ci.org/janis91/ocr) [![Scrutinizer Code Quality](https://scrutinizer-ci.com/g/janis91/ocr/badges/quality-score.png?b=master)](https://scrutinizer-ci.c$
**This software is in beta phase
[...]
Feature request
This may already be available and implicit in docs and I've just failed to understand it.
Make it possible to have the OCR work running on another server to help split workloads
On larger installations (250+ users, 1m+ docs) the servers have been specified for the job envisaged. Adding in (the very useful) OCR functionality puts additional load on the frontend server and potentially impacts UX for all users; offloading it reduces impact and also means the OCR server is more easily updated/optimised.
I will test the app on nextcloud with raspbian.
The file should be OCRed as in any other common Nextcloud folder.
An error is shown in the top area of the web interface and the file is not processed.
I'm trying to OCR a file in an external storage local folder to make it available to other users with access to this folder.
{"reqId":"2cHVIjfj1kBYuU64e6xu","remoteAddr":"192.168.1.104","app":"ocr","message":"Exception during ocr service function processing: {"Exception":"OCP\\AppFramework\\Db\\DoesNotExistException","Message":"Did expect one result but found none when executing: query \"SELECT file_target FROM PREFIXshare WHERE file_source = ? AND share_with = ? AND uid_owner = ?\"; parameters Array\n(\n [0] => 320\n [1] => [email protected]\n [2] => local::\/data\/data\/\n)\n; limit \"\"; offset \"\"","Code":0,"Trace":"#0 \/var\/www\/nextcloud\/lib\/public\/AppFramework\/Db\/Mapper.php(373): OCP\\AppFramework\\Db\\Mapper->findOneQuery('SELECT file_tar...', Array, NULL, NULL)\n#1 \/var\/www\/nextcloud\/apps\/ocr\/lib\/Db\/ShareMapper.php(42): OCP\\AppFramework\\Db\\Mapper->findEntity('SELECT file_tar...', Array)\n#2 \/var\/www\/nextcloud\/apps\/ocr\/lib\/Service\/OcrService.php(320): OCA\\Ocr\\Db\\ShareMapper->find(320, '[email protected]...', 'local::\/data\/da...')\n#3 \/var\/www\/nextcloud\/apps\/ocr\/lib\/Service\/OcrService.php(173): OCA\\Ocr\\Service\\OcrService->buildTargetForShared(Object(OCA\\Ocr\\Db\\File))\n#4 \/var\/www\/nextcloud\/apps\/ocr\/lib\/Controller\/OcrController.php(74): OCA\\Ocr\\Service\\OcrService->process(Array, Array)\n#5 \/var\/www\/nextcloud\/apps\/ocr\/lib\/Controller\/Errors.php(35): OCA\\Ocr\\Controller\\OcrController->OCA\\Ocr\\Controller\\{closure}()\n#6 \/var\/www\/nextcloud\/apps\/ocr\/lib\/Controller\/OcrController.php(75): OCA\\Ocr\\Controller\\OcrController->handleNotFound(Object(Closure))\n#7 [internal function]: OCA\\Ocr\\Controller\\OcrController->process(Array, Array)\n#8 \/var\/www\/nextcloud\/lib\/private\/AppFramework\/Http\/Dispatcher.php(160): call_user_func_array(Array, Array)\n#9 \/var\/www\/nextcloud\/lib\/private\/AppFramework\/Http\/Dispatcher.php(90): OC\\AppFramework\\Http\\Dispatcher->executeController(Object(OCA\\Ocr\\Controller\\OcrController), 'process')\n#10 \/var\/www\/nextcloud\/lib\/private\/AppFramework\/App.php(114): 
OC\\AppFramework\\Http\\Dispatcher->dispatch(Object(OCA\\Ocr\\Controller\\OcrController), 'process')\n#11 \/var\/www\/nextcloud\/lib\/private\/AppFramework\/Routing\/RouteActionHandler.php(47): OC\\AppFramework\\App::main('OcrController', 'process', Object(OC\\AppFramework\\DependencyInjection\\DIContainer), Array)\n#12 [internal function]: OC\\AppFramework\\Routing\\RouteActionHandler->__invoke(Array)\n#13 \/var\/www\/nextcloud\/lib\/private\/Route\/Router.php(299): call_user_func(Object(OC\\AppFramework\\Routing\\RouteActionHandler), Array)\n#14 \/var\/www\/nextcloud\/lib\/base.php(1010): OC\\Route\\Router->match('\/apps\/ocr')\n#15 \/var\/www\/nextcloud\/index.php(40): OC::handleRequest()\n#16 {main}","File":"\/var\/www\/nextcloud\/lib\/public\/AppFramework\/Db\/Mapper.php","Line":289}","level":3,"time":"2017-02-14T18:04:15+00:00","method":"POST","url":"/nextcloud/index.php/apps/ocr","user":"[email protected]","version":"11.0.1.2"}
{"reqId":"2cHVIjfj1kBYuU64e6xu","remoteAddr":"192.168.1.104","app":"ocr","message":"Exception during ocr service function processing: {"Exception":"OCP\\AppFramework\\Db\\DoesNotExistException","Message":"Did expect one result but found none when executing: query \"SELECT file_target FROM PREFIXshare WHERE file_source = ? AND share_with = ? AND uid_owner = ?\"; parameters Array\n(\n [0] => 320\n [1] => [email protected]\n [2] => local::\/data\/data\/\n)\n; limit \"\"; offset \"\"","Code":0,"Trace":"#0 \/var\/www\/nextcloud\/lib\/public\/AppFramework\/Db\/Mapper.php(373): OCP\\AppFramework\\Db\\Mapper->findOneQuery('SELECT file_tar...', Array, NULL, NULL)\n#1 \/var\/www\/nextcloud\/apps\/ocr\/lib\/Db\/ShareMapper.php(42): OCP\\AppFramework\\Db\\Mapper->findEntity('SELECT file_tar...', Array)\n#2 \/var\/www\/nextcloud\/apps\/ocr\/lib\/Service\/OcrService.php(320): OCA\\Ocr\\Db\\ShareMapper->find(320, '[email protected]...', 'local::\/data\/da...')\n#3 \/var\/www\/nextcloud\/apps\/ocr\/lib\/Service\/OcrService.php(173): OCA\\Ocr\\Service\\OcrService->buildTargetForShared(Object(OCA\\Ocr\\Db\\File))\n#4 \/var\/www\/nextcloud\/apps\/ocr\/lib\/Controller\/OcrController.php(74): OCA\\Ocr\\Service\\OcrService->process(Array, Array)\n#5 \/var\/www\/nextcloud\/apps\/ocr\/lib\/Controller\/Errors.php(35): OCA\\Ocr\\Controller\\OcrController->OCA\\Ocr\\Controller\\{closure}()\n#6 \/var\/www\/nextcloud\/apps\/ocr\/lib\/Controller\/OcrController.php(75): OCA\\Ocr\\Controller\\OcrController->handleNotFound(Object(Closure))\n#7 [internal function]: OCA\\Ocr\\Controller\\OcrController->process(Array, Array)\n#8 \/var\/www\/nextcloud\/lib\/private\/AppFramework\/Http\/Dispatcher.php(160): call_user_func_array(Array, Array)\n#9 \/var\/www\/nextcloud\/lib\/private\/AppFramework\/Http\/Dispatcher.php(90): OC\\AppFramework\\Http\\Dispatcher->executeController(Object(OCA\\Ocr\\Controller\\OcrController), 'process')\n#10 \/var\/www\/nextcloud\/lib\/private\/AppFramework\/App.php(114): 
OC\\AppFramework\\Http\\Dispatcher->dispatch(Object(OCA\\Ocr\\Controller\\OcrController), 'process')\n#11 \/var\/www\/nextcloud\/lib\/private\/AppFramework\/Routing\/RouteActionHandler.php(47): OC\\AppFramework\\App::main('OcrController', 'process', Object(OC\\AppFramework\\DependencyInjection\\DIContainer), Array)\n#12 [internal function]: OC\\AppFramework\\Routing\\RouteActionHandler->__invoke(Array)\n#13 \/var\/www\/nextcloud\/lib\/private\/Route\/Router.php(299): call_user_func(Object(OC\\AppFramework\\Routing\\RouteActionHandler), Array)\n#14 \/var\/www\/nextcloud\/lib\/base.php(1010): OC\\Route\\Router->match('\/apps\/ocr')\n#15 \/var\/www\/nextcloud\/index.php(40): OC::handleRequest()\n#16 {main}","File":"\/var\/www\/nextcloud\/lib\/public\/AppFramework\/Db\/Mapper.php","Line":289}","level":3,"time":"2017-02-14T18:04:15+00:00","method":"POST","url":"/nextcloud/index.php/apps/ocr","user":"[email protected]","version":"11.0.1.2"}
{"reqId":"2cHVIjfj1kBYuU64e6xu","remoteAddr":"192.168.1.104","app":"index","message":"Exception: {"Exception":"OCP\\AppFramework\\Db\\DoesNotExistException","Message":"Did expect one result but found none when executing: query \"SELECT file_target FROM PREFIXshare WHERE file_source = ? AND share_with = ? AND uid_owner = ?\"; parameters Array\n(\n [0] => 320\n [1] => [email protected]\n [2] => local::\/data\/data\/\n)\n; limit \"\"; offset \"\"","Code":0,"Trace":"#0 \/var\/www\/nextcloud\/lib\/public\/AppFramework\/Db\/Mapper.php(373): OCP\\AppFramework\\Db\\Mapper->findOneQuery('SELECT file_tar...', Array, NULL, NULL)\n#1 \/var\/www\/nextcloud\/apps\/ocr\/lib\/Db\/ShareMapper.php(42): OCP\\AppFramework\\Db\\Mapper->findEntity('SELECT file_tar...', Array)\n#2 \/var\/www\/nextcloud\/apps\/ocr\/lib\/Service\/OcrService.php(320): OCA\\Ocr\\Db\\ShareMapper->find(320, '[email protected]...', 'local::\/data\/da...')\n#3 \/var\/www\/nextcloud\/apps\/ocr\/lib\/Service\/OcrService.php(173): OCA\\Ocr\\Service\\OcrService->buildTargetForShared(Object(OCA\\Ocr\\Db\\File))\n#4 \/var\/www\/nextcloud\/apps\/ocr\/lib\/Controller\/OcrController.php(74): OCA\\Ocr\\Service\\OcrService->process(Array, Array)\n#5 \/var\/www\/nextcloud\/apps\/ocr\/lib\/Controller\/Errors.php(35): OCA\\Ocr\\Controller\\OcrController->OCA\\Ocr\\Controller\\{closure}()\n#6 \/var\/www\/nextcloud\/apps\/ocr\/lib\/Controller\/OcrController.php(75): OCA\\Ocr\\Controller\\OcrController->handleNotFound(Object(Closure))\n#7 [internal function]: OCA\\Ocr\\Controller\\OcrController->process(Array, Array)\n#8 \/var\/www\/nextcloud\/lib\/private\/AppFramework\/Http\/Dispatcher.php(160): call_user_func_array(Array, Array)\n#9 \/var\/www\/nextcloud\/lib\/private\/AppFramework\/Http\/Dispatcher.php(90): OC\\AppFramework\\Http\\Dispatcher->executeController(Object(OCA\\Ocr\\Controller\\OcrController), 'process')\n#10 \/var\/www\/nextcloud\/lib\/private\/AppFramework\/App.php(114): 
OC\\AppFramework\\Http\\Dispatcher->dispatch(Object(OCA\\Ocr\\Controller\\OcrController), 'process')\n#11 \/var\/www\/nextcloud\/lib\/private\/AppFramework\/Routing\/RouteActionHandler.php(47): OC\\AppFramework\\App::main('OcrController', 'process', Object(OC\\AppFramework\\DependencyInjection\\DIContainer), Array)\n#12 [internal function]: OC\\AppFramework\\Routing\\RouteActionHandler->__invoke(Array)\n#13 \/var\/www\/nextcloud\/lib\/private\/Route\/Router.php(299): call_user_func(Object(OC\\AppFramework\\Routing\\RouteActionHandler), Array)\n#14 \/var\/www\/nextcloud\/lib\/base.php(1010): OC\\Route\\Router->match('\/apps\/ocr')\n#15 \/var\/www\/nextcloud\/index.php(40): OC::handleRequest()\n#16 {main}","File":"\/var\/www\/nextcloud\/lib\/public\/AppFramework\/Db\/Mapper.php","Line":289}","level":3,"time":"2017-02-14T18:04:15+00:00","method":"POST","url":"/nextcloud/index.php/apps/ocr","user":"[email protected]","version":"11.0.1.2"}
Mark a folder and process it. Maybe even specify a folder in the user settings, which then can be processed automatically by the cron jobs in nextcloud.
Most parts of ocr service are left out, because of global php functions.
It would be useful if the app could use instances of the required dependencies:
a) OCRmyPDF
b) tesseract-ocr
hosted at an external site, if this is possible.
This would benefit those using hosted Nextcloud versions where they cannot control the software on the server.
Currently JPEG2000 images are not supported.
Those files are typically named *.jp2 and use the MIME type image/jp2, which is supported by Tesseract 3.x and could be added to the OCR app.
Some more image formats are also supported by Tesseract 3.x and missing in the OCR app.
I'm trying to install the OCR app on a Nextcloud 11.0.0 installation with PHP 7.0.
The tesseract and ocrmypdf commands are installed and executable by any user:
% which ocrmypdf
/usr/local/bin/ocrmypdf
% ls -l /usr/local/bin/ocrmypdf
-rwxr-xr-x 1 root wheel 242 Dec 20 19:53 /usr/local/bin/ocrmypdf
% which tesseract
/usr/local/bin/tesseract
% ls -l /usr/local/bin/tesseract
-rwxr-xr-x 1 root wheel 21280 Dec 15 01:42 /usr/local/bin/tesseract
When I click "Activate" in the Nextcloud App Admin, I get the following error message:
App "Array" cannot be installed because the following dependencies are not fulfilled: The command line tool ocrmypdf could not be found The command line tool tesseract could not be found
In the Admin/Logging section I can then see two error messages:
Error PHP Array to string conversion at /next/data/nextcloud/lib/private/legacy/l10n/string.php#72
Error core App "Array" cannot be installed because the following dependencies are not fulfilled: The command line tool ocrmypdf could not be found The command line tool tesseract could not be found
tesseract 3.04.01
leptonica-1.72
libgif 5.1.3 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.6.23+apng : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.5.0 : libopenjp2 2.1.0
{"reqId":"wRghPv69b1oS1OA4rmt7","remoteAddr":"x.x.x.x","app":"PHP","message":"Array to string conversion at \/next\/data\/nextcloud\/lib\/private\/legacy\/l10n\/string.php#72","level":3,"time":"2016-12-21T09:35:48+00:00","method":"POST","url":"\/nextcloud\/index.php\/settings\/ajax\/enableapp.php","user":"x","version":"11.0.0.10"}
{"reqId":"wRghPv69b1oS1OA4rmt7","remoteAddr":"x.x.x.x","app":"core","message":"App \"Array\" cannot be installed because the following dependencies are not fulfilled: The command line tool ocrmypdf could not be found\nThe command line tool tesseract could not be found","level":3,"time":"2016-12-21T09:35:48+00:00","method":"POST","url":"\/nextcloud\/index.php\/settings\/ajax\/enableapp.php","user":"x","version":"11.0.0.10"}
https://github.com/janis91/ocr/blob/master/lib/Service/QueueService.php#L97
OCR should be written in uppercase letters.
Evaluate and compare the native PHP semaphore features. Maybe we can get rid of Gearman.
Add more tests that cover the whole OCR service, not only a small part of it.
Most parts of the OCR service are left out because of global PHP functions which cannot be mocked.
Maybe we can build a better environment where the global functions can be processed correctly in Travis. (For now, the processing is available in the local dev environment only.)
This would make the project more reliable.
To work properly with Nextcloud 11+, the appinfo.xml needs the following entry in the <dependencies> tag (of course, adjust the numbers to the ones that actually work ;)):
<nextcloud min-version="9" max-version="11" />
cc @janis91
https://github.com/janis91/ocr/blob/master/templates/settings-personal.php#L16
The info.xml should also be translated, to give other users a better understanding of what this app is about.
should be included for translation.
Bug report
NC Version 11.01, current OCR app.
When I want to OCR a file that was uploaded by me but resides in a directory that has been shared with me, OCR will not start and an error appears in the log:
OCP\AppFramework\Db\DoesNotExistException: Did expect one result but found none when executing: query "SELECT file_target FROM *PREFIX*share WHERE file_source = ? AND share_with = ? AND uid_owner = ?"; parameters Array ( [0] => 168228 [1] => current.user[2] => OwnerOfSharedDir ) ; limit ""; offset ""
When I log in as (OwnerOfSharedDir), OCR works fine.
Is this reproducible, or shall I do some more tests and provide more info here?
I should be able to keep using Nextcloud even if the app is broken.
It's impossible to use the Files app
Add exception handling to not break Nextcloud when a serious issue occurs.
tesseract 3.04.01
leptonica-1.72
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.1) : libpng 1.6.28+apng : libtiff 4.0.7 : zlib 1.2.8 : libwebp 0.5.2 : libopenjp2 2.1.2
An unhandled exception has been thrown:
Error: Call to undefined function OCA\Ocr\Service\msg_get_queue() in customapps/ocr/lib/Service/QueueService.php:70
Stack trace:
#0 [internal function]: OCA\Ocr\Service\QueueService->__construct(Object(OCA\Ocr\Db\OcrStatusMapper), Object(OC\AllConfig), Object(OC\L10N\L10N), Object(OC\Log))
#1 lib/private/AppFramework/Utility/SimpleContainer.php(79): ReflectionClass->newInstanceArgs(Array)
#2 lib/private/AppFramework/Utility/SimpleContainer.php(96): OC\AppFramework\Utility\SimpleContainer->buildClass(Object(ReflectionClass))
#3 lib/private/AppFramework/Utility/SimpleContainer.php(117): OC\AppFramework\Utility\SimpleContainer->resolve('OCA\\Ocr\\Service...')
#4 lib/private/AppFramework/DependencyInjection/DIContainer.php(544): OC\AppFramework\Utility\SimpleContainer->query('OCA\\Ocr\\Service...')
#5 lib/private/AppFramework/Utility/SimpleContainer.php(66): OC\AppFramework\DependencyInjection\DIContainer->query('OCA\\Ocr\\Service...')
#6 lib/private/AppFramework/Utility/SimpleContainer.php(96): OC\AppFramework\Utility\SimpleContainer->buildClass(Object(ReflectionClass))
#7 lib/private/AppFramework/Utility/SimpleContainer.php(117): OC\AppFramework\Utility\SimpleContainer->resolve('OCA\\Ocr\\Service...')
#8 lib/private/AppFramework/DependencyInjection/DIContainer.php(544): OC\AppFramework\Utility\SimpleContainer->query('OCA\\Ocr\\Service...')
#9 lib/private/AppFramework/Utility/SimpleContainer.php(66): OC\AppFramework\DependencyInjection\DIContainer->query('OCA\\Ocr\\Service...')
#10 lib/private/AppFramework/Utility/SimpleContainer.php(96): OC\AppFramework\Utility\SimpleContainer->buildClass(Object(ReflectionClass))
#11 lib/private/AppFramework/Utility/SimpleContainer.php(117): OC\AppFramework\Utility\SimpleContainer->resolve('OCA\\Ocr\\Command...')
#12 lib/private/AppFramework/DependencyInjection/DIContainer.php(544): OC\AppFramework\Utility\SimpleContainer->query('OCA\\Ocr\\Command...')
#13 customapps/ocr/appinfo/register_command.php(18): OC\AppFramework\DependencyInjection\DIContainer->query('OCA\\Ocr\\Command...')
#14 lib/private/Console/Application.php(119): require('/backyard/yourmum/d...')
#15 console.php(89): OC\Console\Application->loadCommands(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#16 occ(11): require_once('/backyard/yourmum/d...')
#17 {main}
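msg_get_queue() is provided by PHP's sysvmsg extension, so this error usually means the extension is not loaded in the PHP binary that runs occ. A small check, as a sketch: the helper name has_php_module is made up, and it simply reads the module list that php -m prints.

```shell
# Succeeds if the module named in $1 appears in the list read from stdin.
# Matching is case-insensitive and against whole lines only.
has_php_module() {
    grep -qixF -- "$1"
}

# Usage, run as the same user and with the same php binary as occ:
#   if php -m | has_php_module sysvmsg; then
#       echo "sysvmsg is loaded"
#   else
#       echo "sysvmsg is missing: msg_get_queue() will be undefined"
#   fi
```

The web server and the CLI can use different PHP configurations, so the result for occ does not necessarily hold for requests served through Apache.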
As stated here: https://github.com/tesseract-ocr/tesseract/wiki#running-tesseract
tesseract also supports multiple languages to process for.
Include such a behaviour in the app.
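For reference, tesseract takes several languages in one -l option, joined with a plus sign (for example deu+eng). Below is a minimal sketch of how a list of selected languages could be turned into that argument. join_langs and the file names are placeholders, not code from the app.

```shell
# Join all arguments with '+', the separator tesseract's -l option expects.
# The body runs in a subshell so the changed IFS does not leak to the caller.
join_langs() (
    IFS='+'
    printf '%s\n' "$*"
)

# Hypothetical usage; scan.png and scan_ocr are placeholder names:
#   tesseract scan.png scan_ocr -l "$(join_langs deu eng)"
```

A single language passes through unchanged, so the same call works whether the user picks one language or several.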
ocr/lib/Service/OcrService.php
Line 183 in a121269
Should be "Empty parameters passed" instead of "Empty passed parameters".
You don't say "Leere übergeben Parameter" in German either ;)
Don't know
No Idea
Just install it as below.
Access for user www-data tested: OK
Level App Message Time
Debug ocr Following status objects failed: [] 2016-12-28T14:11:52+0100
Debug ocr Find processed ocr files and put them to the right dirs. 2016-12-28T14:11:52+0100
Debug ocr Following status objects failed: [] 2016-12-28T14:11:50+0100
Debug ocr Find processed ocr files and put them to the right dirs. 2016-12-28T14:11:50+0100
Debug ocr Following status objects failed: [] 2016-12-28T14:11:47+0100
Debug ocr Find processed ocr files and put them to the right dirs. 2016-12-28T14:11:47+0100
Debug ocr Following status objects failed: [] 2016-12-28T14:11:45+0100
Debug ocr Find processed ocr files and put them to the right dirs. 2016-12-28T14:11:45+0100
Debug ocr Client message: "{"type":"mypdf","datadirectory":"\/var\/ocdata","path":"\/xxxAdmin\/files\/PHA--2054Z0.pdf","tempfile":"\/var\/ocdata\/upload-tmp\/oc_tmp_B6AFT0","language":"deu","statusid":5,"occdir":"\/var\/www\/nextcloud"}" 2016-12-28T14:11:43+0100
Debug ocr Fetched languages: ["ita","fra","osd","deu","spa","equ","por","eng"] 2016-12-28T14:11:43+0100
Debug ocr Fetching languages. 2016-12-28T14:11:42+0100
Debug ocr Will now process files: [{"name":"PHA--2054Z0.pdf","path":"/","type":"file","mimetype":"application/pdf"}] with language: "deu" 2016-12-28T14:11:42+0100
Debug ocr Following status objects failed: [] 2016-12-28T14:11:41+0100
Debug ocr Find processed ocr files and put them to the right dirs. 2016-12-28T14:11:41+0100
Debug ocr Following status objects failed: [] 2016-12-28T14:11:36+0100
Debug ocr Find processed ocr files and put them to the right dirs. 2016-12-28T14:11:36+0100
Debug ocr Following status objects failed: [] 2016-12-28T14:11:31+0100
Debug ocr Find processed ocr files and put them to the right dirs. 2016-12-28T14:11:31+0100
Info admin_audit File written to: "//PHA--2054Z0.pdf" 2016-12-28T14:11:30+0100
Info admin_audit File created: "//PHA--2054Z0.pdf" 2016-12-28T14:11:30+0100
ocadmin@owncloud:~$ sudo apt-get install python3-pip
[sudo] password for ocadmin:
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
build-essential g++ g++-5 libpython3-dev libpython3.5-dev libstdc++-5-dev python3-dev python3-setuptools python3-wheel
python3.5-dev
Suggested packages:
g++-multilib g++-5-multilib gcc-5-doc libstdc++6-5-dbg libstdc++-5-doc python-setuptools-doc
The following NEW packages will be installed:
build-essential g++ g++-5 libpython3-dev libpython3.5-dev libstdc++-5-dev python3-dev python3-pip python3-setuptools
python3-wheel python3.5-dev
0 upgraded, 11 newly installed, 0 to remove and 0 not upgraded.
Need to get 47.7 MB of archives.
After this operation, 94.3 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://se.archive.ubuntu.com/ubuntu xenial-updates/main amd64 libstdc++-5-dev amd64 5.4.0-6ubuntu1~16.04.4 [1,426 kB]
Get:3 http://se.archive.ubuntu.com/ubuntu xenial/main amd64 g++ amd64 4:5.3.1-1ubuntu1 [1,504 B]
Get:4 http://se.archive.ubuntu.com/ubuntu xenial/main amd64 build-essential amd64 12.1ubuntu2 [4,758 B]
Get:6 http://se.archive.ubuntu.com/ubuntu xenial/main amd64 libpython3-dev amd64 3.5.1-3 [6,926 B]
Get:7 http://se.archive.ubuntu.com/ubuntu xenial-updates/main amd64 python3.5-dev amd64 3.5.2-2ubuntu0~16.04.1 [413 kB]
Get:8 http://se.archive.ubuntu.com/ubuntu xenial/main amd64 python3-dev amd64 3.5.1-3 [1,186 B]
Get:9 http://se.archive.ubuntu.com/ubuntu xenial-updates/universe amd64 python3-pip all 8.1.1-2ubuntu0.4 [109 kB]
Get:10 http://se.archive.ubuntu.com/ubuntu xenial/main amd64 python3-setuptools all 20.7.0-1 [88.0 kB]
Get:11 http://se.archive.ubuntu.com/ubuntu xenial/universe amd64 python3-wheel all 0.29.0-1 [48.1 kB]
Get:2 http://gensho.acc.umu.se/ubuntu xenial-updates/main amd64 g++-5 amd64 5.4.0-6ubuntu1~16.04.4 [8,300 kB]
Get:5 http://caesar.acc.umu.se/ubuntu xenial-updates/main amd64 libpython3.5-dev amd64 3.5.2-2ubuntu0~16.04.1 [37.3 MB]
Fetched 47.7 MB in 13s (3,619 kB/s)
Selecting previously unselected package libstdc++-5-dev:amd64.
(Reading database ... 120514 files and directories currently installed.)
Preparing to unpack .../libstdc++-5-dev_5.4.0-6ubuntu1~16.04.4_amd64.deb ...
Unpacking libstdc++-5-dev:amd64 (5.4.0-6ubuntu1~16.04.4) ...
Selecting previously unselected package g++-5.
Preparing to unpack .../g++-5_5.4.0-6ubuntu1~16.04.4_amd64.deb ...
Unpacking g++-5 (5.4.0-6ubuntu1~16.04.4) ...
Selecting previously unselected package g++.
Preparing to unpack .../g++_4%3a5.3.1-1ubuntu1_amd64.deb ...
Unpacking g++ (4:5.3.1-1ubuntu1) ...
Selecting previously unselected package build-essential.
Preparing to unpack .../build-essential_12.1ubuntu2_amd64.deb ...
Unpacking build-essential (12.1ubuntu2) ...
Selecting previously unselected package libpython3.5-dev:amd64.
Preparing to unpack .../libpython3.5-dev_3.5.2-2ubuntu0~16.04.1_amd64.deb ...
Unpacking libpython3.5-dev:amd64 (3.5.2-2ubuntu0~16.04.1) ...
Selecting previously unselected package libpython3-dev:amd64.
Preparing to unpack .../libpython3-dev_3.5.1-3_amd64.deb ...
Unpacking libpython3-dev:amd64 (3.5.1-3) ...
Selecting previously unselected package python3.5-dev.
Preparing to unpack .../python3.5-dev_3.5.2-2ubuntu0~16.04.1_amd64.deb ...
Unpacking python3.5-dev (3.5.2-2ubuntu0~16.04.1) ...
Selecting previously unselected package python3-dev.
Preparing to unpack .../python3-dev_3.5.1-3_amd64.deb ...
Unpacking python3-dev (3.5.1-3) ...
Selecting previously unselected package python3-pip.
Preparing to unpack .../python3-pip_8.1.1-2ubuntu0.4_all.deb ...
Unpacking python3-pip (8.1.1-2ubuntu0.4) ...
Selecting previously unselected package python3-setuptools.
Preparing to unpack .../python3-setuptools_20.7.0-1_all.deb ...
Unpacking python3-setuptools (20.7.0-1) ...
Selecting previously unselected package python3-wheel.
Preparing to unpack .../python3-wheel_0.29.0-1_all.deb ...
Unpacking python3-wheel (0.29.0-1) ...
Processing triggers for man-db (2.7.5-1) ...
Setting up libstdc++-5-dev:amd64 (5.4.0-6ubuntu1~16.04.4) ...
Setting up g++-5 (5.4.0-6ubuntu1~16.04.4) ...
Setting up g++ (4:5.3.1-1ubuntu1) ...
update-alternatives: using /usr/bin/g++ to provide /usr/bin/c++ (c++) in auto mode
Setting up build-essential (12.1ubuntu2) ...
Setting up libpython3.5-dev:amd64 (3.5.2-2ubuntu0~16.04.1) ...
Setting up libpython3-dev:amd64 (3.5.1-3) ...
Setting up python3.5-dev (3.5.2-2ubuntu0~16.04.1) ...
Setting up python3-dev (3.5.1-3) ...
Setting up python3-pip (8.1.1-2ubuntu0.4) ...
Setting up python3-setuptools (20.7.0-1) ...
Setting up python3-wheel (0.29.0-1) ...
ocadmin@owncloud:~$ sudo pip3 install --upgrade pip
The directory '/home/ocadmin/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/ocadmin/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting pip
Downloading pip-9.0.1-py2.py3-none-any.whl (1.3MB)
100% |████████████████████████████████| 1.3MB 712kB/s
Installing collected packages: pip
Found existing installation: pip 8.1.1
Not uninstalling pip at /usr/lib/python3/dist-packages, outside environment /usr
Successfully installed pip-9.0.1
ocadmin@owncloud:~$ sudo apt-get install libffi-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
libffi-dev is already the newest version (3.2.1-4).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
ocadmin@owncloud:~$ sudo pip3 install ocrmypdf
The directory '/home/ocadmin/.cache/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/home/ocadmin/.cache/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Collecting ocrmypdf
Downloading ocrmypdf-4.3.4-py34-none-any.whl (48kB)
100% |████████████████████████████████| 51kB 561kB/s
Collecting cffi>=1.5.0 (from ocrmypdf)
Downloading cffi-1.9.1-cp35-cp35m-manylinux1_x86_64.whl (398kB)
100% |████████████████████████████████| 399kB 1.8MB/s
Collecting PyPDF2>=1.26 (from ocrmypdf)
Downloading PyPDF2-1.26.0.tar.gz (77kB)
100% |████████████████████████████████| 81kB 4.6MB/s
Collecting ruffus==2.6.3 (from ocrmypdf)
Downloading ruffus-2.6.3.tar.gz (36.9MB)
100% |████████████████████████████████| 36.9MB 26kB/s
Collecting img2pdf>=0.2.1 (from ocrmypdf)
Downloading img2pdf-0.2.1.tar.gz (46kB)
100% |████████████████████████████████| 51kB 3.8MB/s
Collecting Pillow>=3.1.0 (from ocrmypdf)
Downloading Pillow-3.4.2-cp35-cp35m-manylinux1_x86_64.whl (5.6MB)
100% |████████████████████████████████| 5.6MB 181kB/s
Collecting reportlab>=3.2.0 (from ocrmypdf)
Downloading reportlab-3.3.0.tar.gz (2.0MB)
100% |████████████████████████████████| 2.0MB 503kB/s
Collecting pycparser (from cffi>=1.5.0->ocrmypdf)
Downloading pycparser-2.17.tar.gz (231kB)
100% |████████████████████████████████| 235kB 362kB/s
Requirement already satisfied: pip>=1.4.1 in /usr/local/lib/python3.5/dist-packages (from reportlab>=3.2.0->ocrmypdf)
Requirement already satisfied: setuptools>=2.2 in /usr/lib/python3/dist-packages (from reportlab>=3.2.0->ocrmypdf)
Installing collected packages: pycparser, cffi, PyPDF2, ruffus, Pillow, img2pdf, reportlab, ocrmypdf
Running setup.py install for pycparser ... done
Running setup.py install for PyPDF2 ... done
Running setup.py install for ruffus ... done
Running setup.py install for img2pdf ... done
Running setup.py install for reportlab ... done
Successfully installed Pillow-3.4.2 PyPDF2-1.26.0 cffi-1.9.1 img2pdf-0.2.1 ocrmypdf-4.3.4 pycparser-2.17 reportlab-3.3.0 ruffus-2.6.3
ocadmin@owncloud:~$ sudo apt-get install tesseract-ocr tesseract-ocr-deu
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
libdatrie1 liblept5 libopenjp2-7 libpango-1.0-0 libpangocairo-1.0-0 libpangoft2-1.0-0 libtesseract3 libthai-data
libthai0 libwebp5 tesseract-ocr-eng tesseract-ocr-equ tesseract-ocr-osd
The following NEW packages will be installed:
libdatrie1 liblept5 libopenjp2-7 libpango-1.0-0 libpangocairo-1.0-0 libpangoft2-1.0-0 libtesseract3 libthai-data
libthai0 libwebp5 tesseract-ocr tesseract-ocr-deu tesseract-ocr-eng tesseract-ocr-equ tesseract-ocr-osd
0 upgraded, 15 newly installed, 0 to remove and 0 not upgraded.
Need to get 19.3 MB of archives.
After this operation, 73.3 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://se.archive.ubuntu.com/ubuntu xenial/main amd64 libdatrie1 amd64 0.2.10-2 [17.3 kB]
Get:2 http://se.archive.ubuntu.com/ubuntu xenial-updates/universe amd64 libopenjp2-7 amd64 2.1.0-2.1ubuntu0.1 [103 kB]
Get:3 http://se.archive.ubuntu.com/ubuntu xenial/main amd64 libwebp5 amd64 0.4.4-1 [165 kB]
Get:4 http://se.archive.ubuntu.com/ubuntu xenial/universe amd64 liblept5 amd64 1.73-1 [872 kB]
Get:5 http://se.archive.ubuntu.com/ubuntu xenial/main amd64 libthai-data all 0.1.24-2 [131 kB]
Get:6 http://se.archive.ubuntu.com/ubuntu xenial/main amd64 libthai0 amd64 0.1.24-2 [17.3 kB]
Get:7 http://se.archive.ubuntu.com/ubuntu xenial/main amd64 libpango-1.0-0 amd64 1.38.1-1 [148 kB]
Get:8 http://se.archive.ubuntu.com/ubuntu xenial/main amd64 libpangoft2-1.0-0 amd64 1.38.1-1 [33.3 kB]
Get:9 http://se.archive.ubuntu.com/ubuntu xenial/main amd64 libpangocairo-1.0-0 amd64 1.38.1-1 [20.5 kB]
Get:10 http://se.archive.ubuntu.com/ubuntu xenial/universe amd64 libtesseract3 amd64 3.04.01-4 [1,106 kB]
Get:12 http://se.archive.ubuntu.com/ubuntu xenial/universe amd64 tesseract-ocr-osd all 3.04.00-1 [2,988 kB]
Get:11 http://gensho.acc.umu.se/ubuntu xenial/universe amd64 tesseract-ocr-eng all 3.04.00-1 [8,824 kB]
Get:13 http://se.archive.ubuntu.com/ubuntu xenial/universe amd64 tesseract-ocr-equ all 3.04.00-1 [568 kB]
Get:14 http://se.archive.ubuntu.com/ubuntu xenial/universe amd64 tesseract-ocr amd64 3.04.01-4 [132 kB]
Get:15 http://se.archive.ubuntu.com/ubuntu xenial/universe amd64 tesseract-ocr-deu all 3.04.00-1 [4,153 kB]
Fetched 19.3 MB in 6s (3,137 kB/s)
Selecting previously unselected package libdatrie1:amd64.
(Reading database ... 121649 files and directories currently installed.)
Preparing to unpack .../libdatrie1_0.2.10-2_amd64.deb ...
Unpacking libdatrie1:amd64 (0.2.10-2) ...
Selecting previously unselected package libopenjp2-7:amd64.
Preparing to unpack .../libopenjp2-7_2.1.0-2.1ubuntu0.1_amd64.deb ...
Unpacking libopenjp2-7:amd64 (2.1.0-2.1ubuntu0.1) ...
Selecting previously unselected package libwebp5:amd64.
Preparing to unpack .../libwebp5_0.4.4-1_amd64.deb ...
Unpacking libwebp5:amd64 (0.4.4-1) ...
Selecting previously unselected package liblept5.
Preparing to unpack .../liblept5_1.73-1_amd64.deb ...
Unpacking liblept5 (1.73-1) ...
Selecting previously unselected package libthai-data.
Preparing to unpack .../libthai-data_0.1.24-2_all.deb ...
Unpacking libthai-data (0.1.24-2) ...
Selecting previously unselected package libthai0:amd64.
Preparing to unpack .../libthai0_0.1.24-2_amd64.deb ...
Unpacking libthai0:amd64 (0.1.24-2) ...
Selecting previously unselected package libpango-1.0-0:amd64.
Preparing to unpack .../libpango-1.0-0_1.38.1-1_amd64.deb ...
Unpacking libpango-1.0-0:amd64 (1.38.1-1) ...
Selecting previously unselected package libpangoft2-1.0-0:amd64.
Preparing to unpack .../libpangoft2-1.0-0_1.38.1-1_amd64.deb ...
Unpacking libpangoft2-1.0-0:amd64 (1.38.1-1) ...
Selecting previously unselected package libpangocairo-1.0-0:amd64.
Preparing to unpack .../libpangocairo-1.0-0_1.38.1-1_amd64.deb ...
Unpacking libpangocairo-1.0-0:amd64 (1.38.1-1) ...
Selecting previously unselected package libtesseract3.
Preparing to unpack .../libtesseract3_3.04.01-4_amd64.deb ...
Unpacking libtesseract3 (3.04.01-4) ...
Selecting previously unselected package tesseract-ocr-eng.
Preparing to unpack .../tesseract-ocr-eng_3.04.00-1_all.deb ...
Unpacking tesseract-ocr-eng (3.04.00-1) ...
Selecting previously unselected package tesseract-ocr-osd.
Preparing to unpack .../tesseract-ocr-osd_3.04.00-1_all.deb ...
Unpacking tesseract-ocr-osd (3.04.00-1) ...
Selecting previously unselected package tesseract-ocr-equ.
Preparing to unpack .../tesseract-ocr-equ_3.04.00-1_all.deb ...
Unpacking tesseract-ocr-equ (3.04.00-1) ...
Selecting previously unselected package tesseract-ocr.
Preparing to unpack .../tesseract-ocr_3.04.01-4_amd64.deb ...
Unpacking tesseract-ocr (3.04.01-4) ...
Selecting previously unselected package tesseract-ocr-deu.
Preparing to unpack .../tesseract-ocr-deu_3.04.00-1_all.deb ...
Unpacking tesseract-ocr-deu (3.04.00-1) ...
Processing triggers for libc-bin (2.23-0ubuntu5) ...
Processing triggers for man-db (2.7.5-1) ...
Setting up libdatrie1:amd64 (0.2.10-2) ...
Setting up libopenjp2-7:amd64 (2.1.0-2.1ubuntu0.1) ...
Setting up libwebp5:amd64 (0.4.4-1) ...
Setting up liblept5 (1.73-1) ...
Setting up libthai-data (0.1.24-2) ...
Setting up libthai0:amd64 (0.1.24-2) ...
Setting up libpango-1.0-0:amd64 (1.38.1-1) ...
Setting up libpangoft2-1.0-0:amd64 (1.38.1-1) ...
Setting up libpangocairo-1.0-0:amd64 (1.38.1-1) ...
Setting up libtesseract3 (3.04.01-4) ...
Setting up tesseract-ocr-eng (3.04.00-1) ...
Setting up tesseract-ocr-osd (3.04.00-1) ...
Setting up tesseract-ocr-equ (3.04.00-1) ...
Setting up tesseract-ocr (3.04.01-4) ...
Setting up tesseract-ocr-deu (3.04.00-1) ...
Processing triggers for libc-bin (2.23-0ubuntu5) ...
ocadmin@owncloud:~$ sudo apt-get install tesseract-ocr-spa
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
tesseract-ocr-spa
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 6,688 kB of archives.
After this operation, 39.2 MB of additional disk space will be used.
Get:1 http://caesar.acc.umu.se/ubuntu xenial/universe amd64 tesseract-ocr-spa all 3.04.00-1 [6,688 kB]
Fetched 6,688 kB in 2s (3,143 kB/s)
Selecting previously unselected package tesseract-ocr-spa.
(Reading database ... 121790 files and directories currently installed.)
Preparing to unpack .../tesseract-ocr-spa_3.04.00-1_all.deb ...
Unpacking tesseract-ocr-spa (3.04.00-1) ...
Setting up tesseract-ocr-spa (3.04.00-1) ...
ocadmin@owncloud:~$ sudo apt-get install tesseract-ocr-por
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
tesseract-ocr-por
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 3,893 kB of archives.
After this operation, 12.9 MB of additional disk space will be used.
Get:1 http://se.archive.ubuntu.com/ubuntu xenial/universe amd64 tesseract-ocr-por all 3.04.00-1 [3,893 kB]
Fetched 3,893 kB in 1s (3,273 kB/s)
Selecting previously unselected package tesseract-ocr-por.
(Reading database ... 121801 files and directories currently installed.)
Preparing to unpack .../tesseract-ocr-por_3.04.00-1_all.deb ...
Unpacking tesseract-ocr-por (3.04.00-1) ...
Setting up tesseract-ocr-por (3.04.00-1) ...
ocadmin@owncloud:~$ sudo apt-get install tesseract-ocr-ndl
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package tesseract-ocr-ndl
ocadmin@owncloud:~$ sudo apt-get install tesseract-ocr-ita
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
tesseract-ocr-ita
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 5,845 kB of archives.
After this operation, 32.8 MB of additional disk space will be used.
Get:1 http://saimei.acc.umu.se/ubuntu xenial/universe amd64 tesseract-ocr-ita all 3.04.00-1 [5,845 kB]
Fetched 5,845 kB in 16s (357 kB/s)
Selecting previously unselected package tesseract-ocr-ita.
(Reading database ... 121805 files and directories currently installed.)
Preparing to unpack .../tesseract-ocr-ita_3.04.00-1_all.deb ...
Unpacking tesseract-ocr-ita (3.04.00-1) ...
Setting up tesseract-ocr-ita (3.04.00-1) ...
ocadmin@owncloud:~$ sudo apt-get install tesseract-ocr-fra
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
tesseract-ocr-fra
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 6,075 kB of archives.
After this operation, 37.4 MB of additional disk space will be used.
Get:1 http://caesar.acc.umu.se/ubuntu xenial/universe amd64 tesseract-ocr-fra all 3.04.00-1 [6,075 kB]
Fetched 6,075 kB in 1s (3,163 kB/s)
Selecting previously unselected package tesseract-ocr-fra.
(Reading database ... 121817 files and directories currently installed.)
Preparing to unpack .../tesseract-ocr-fra_3.04.00-1_all.deb ...
Unpacking tesseract-ocr-fra (3.04.00-1) ...
Setting up tesseract-ocr-fra (3.04.00-1) ...
ocadmin@owncloud:~$ sudo apt-get install tesseract-ocr-eng
Reading package lists... Done
Building dependency tree
Reading state information... Done
tesseract-ocr-eng is already the newest version (3.04.00-1).
tesseract-ocr-eng set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
ocadmin@owncloud:~$ sudo apt-get install tesseract-ocr-deu-frak
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
tesseract-ocr-deu-frak
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 616 kB of archives.
After this operation, 2,013 kB of additional disk space will be used.
Get:1 http://se.archive.ubuntu.com/ubuntu xenial/universe amd64 tesseract-ocr-deu-frak all 3.04.00-1 [616 kB]
Fetched 616 kB in 0s (1,016 kB/s)
Selecting previously unselected package tesseract-ocr-deu-frak.
(Reading database ... 121829 files and directories currently installed.)
Preparing to unpack .../tesseract-ocr-deu-frak_3.04.00-1_all.deb ...
Unpacking tesseract-ocr-deu-frak (3.04.00-1) ...
Setting up tesseract-ocr-deu-frak (3.04.00-1) ...
OCR 2.0.0
Mark a folder and say OCR it
Cannot OCR a folder
Update the travis.yml for MySQL. First, the MySQL service has to be available in the Trusty beta of Travis.
Have a new branch available that is ready for the latest Nextcloud 12.
For example there is a new feature available:
nextcloud/server#3151
Hey guys,
First of all, the Nextcloud OCR plugin is great and works fine. Thank you for your work.
Have you thought about an option to replace the original file during the OCR process? At the moment, every time I run OCR I delete the old file and rename the new one.
Possible solution:
I would appreciate an option in the plugin settings or somewhere else to automatically replace a file when OCR is processed. In case of an error or a bad result, for example, I would be able to restore the original file via the NC history feature.
After a file is selected and the delete action (in the top action bar) is clicked, the file action bar gets hidden and the file information sorting is available again.
The OCR icon and menu option are still available and do not get disabled.
Maybe hide them after the events of the other actions have fired?
I have a (3rd party) queueing solution in mind which could allow queueing the tesseract processing for multiple files.
It could run in the background and use WebDAV to upload the results to ownCloud.
Feedback on the status should be available (WebSockets, maybe).
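A rough sketch of the queueing idea, using only Python's standard library rather than any particular 3rd party queue. The names run_ocr_queue and fake_ocr are hypothetical, and the actual tesseract call, WebDAV upload and status push are left out and stood in for by a callback.

```python
import queue
import threading
from typing import Callable, Dict, Iterable


def run_ocr_queue(paths: Iterable[str],
                  process: Callable[[str], None]) -> Dict[str, str]:
    """Process files one by one in a background thread and track their status."""
    jobs: "queue.Queue[str]" = queue.Queue()
    status: Dict[str, str] = {}
    lock = threading.Lock()

    for path in paths:
        status[path] = "pending"
        jobs.put(path)

    def worker() -> None:
        while True:
            try:
                path = jobs.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            try:
                process(path)  # assumption: wraps the real OCR and upload step
                result = "processed"
            except Exception:
                result = "failed"
            with lock:
                status[path] = result
            jobs.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join()  # a real service would keep running instead of joining
    return status


def fake_ocr(path: str) -> None:
    """Stand-in for the OCR step; fails for one made-up file name."""
    if path == "broken.png":
        raise RuntimeError("simulated OCR failure")


print(run_ocr_queue(["a.png", "broken.png", "b.png"], fake_ocr))
```

The status dictionary is the part a status endpoint or WebSocket push could report back to the client.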
After extracting the current master version of OCR to nextcloud/apps/ocr/ on my Nextcloud 10.0.1 installation, it appears in the Apps section under "Not activated" like this:
[object Object],[object Object] 1.0.0
by Janis Koehr (agpl-licensed)
I can enable and use OCR just fine, but as soon as it is enabled, the "Activated" page of the Apps view does not load anymore (endless loading animation).
After disabling the OCR app on the command line (with occ app:disable ocr), the "Activated" page works again.
I took a quick look at the OCR app code, but couldn't find the reason for this behaviour myself.
Nothing that looks like it has to do with this issue.
transifex setup required.
Scrutinizer waits a very long time for the code coverage and runs into a timeout. Maybe change the Travis behaviour once again to get this right.
Also: change the timeout to 10 minutes.
I probably should have added this to the other issue (oops).
I forget the exact error, but since you are specifying multiple processes you have to specify process_name.
This is what worked for me: process_name = %(program_name)s_%(process_num)02d
And this is the output of supervisorctl status:
myworker:myworker_00 RUNNING pid 2223, uptime 0:12:02
myworker:myworker_01 RUNNING pid 2240, uptime 0:12:02
myworker:myworker_02 RUNNING pid 2241, uptime 0:12:02
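To show where those names come from: the process_name value is a Python-style % format expression, so it can be expanded by hand. This is only a sketch. The function expand_process_names is hypothetical, and it assumes the numbering starts at 0, as in the status output above.

```python
from typing import List

# The template from the supervisor config quoted above.
PROCESS_NAME_TEMPLATE = "%(program_name)s_%(process_num)02d"


def expand_process_names(program_name: str, numprocs: int,
                         template: str = PROCESS_NAME_TEMPLATE) -> List[str]:
    """Expand the process_name template once per process (hypothetical helper)."""
    return [
        template % {"program_name": program_name, "process_num": num}
        for num in range(numprocs)  # assumption: numbering starts at 0
    ]


print(expand_process_names("myworker", 3))
```

Without process_num in the template, every process would get the same name, which is why it has to be set explicitly when several processes are configured.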
Hello,
On Nextcloud 12 I get this in the personal settings.
The worker is running with sudo -u pleskuser nohup php /var/www/vhosts/larsmueller.net/nextcloud/apps/ocr/worker/OCRWorker.php
(pleskuser is the equivalent of www-data on a Plesk server)
Blank OCR entry in the personal settings
No logfile entry
Warning at the top says:
OCR App could not be initialized: "No languages found."
but I do have languages installed
# tesseract --list-langs
List of available languages (4):
deu
equ
osd
eng
"message": "Exception during ocr service function processing: {\"Exception\":\"OCA\\\\Ocr\\\\Service\\\\NotFoundException\",\"Message\":\"No languages found.\",\"Code\":0,\"Trace\":\"#0 \\\/customapps\\\/ocr\\\/lib\\\/Controller\\\/OcrController.php(61): OCA\\\\Ocr\\\\Service\\\\OcrService->listLanguages()\\n#1 \\\/customapps\\\/ocr\\\/lib\\\/Controller\\\/Errors.php(35): OCA\\\\Ocr\\\\Controller\\\\OcrController->OCA\\\\Ocr\\\\Controller\\\\{closure}()\\n#2 \\\/customapps\\\/ocr\\\/lib\\\/Controller\\\/OcrController.php(62): OCA\\\\Ocr\\\\Controller\\\\OcrController->handleNotFound(Object(Closure))\\n#3 [internal function]: OCA\\\\Ocr\\\\Controller\\\\OcrController->languages()\\n#4 \\\/lib\\\/private\\\/AppFramework\\\/Http\\\/Dispatcher.php(160): call_user_func_array(Array, Array)\\n#5 \\\/lib\\\/private\\\/AppFramework\\\/Http\\\/Dispatcher.php(90): OC\\\\AppFramework\\\\Http\\\\Dispatcher->executeController(Object(OCA\\\\Ocr\\\\Controller\\\\OcrController), 'languages')\\n#6 \\\/lib\\\/private\\\/AppFramework\\\/App.php(114): OC\\\\AppFramework\\\\Http\\\\Dispatcher->dispatch(Object(OCA\\\\Ocr\\\\Controller\\\\OcrController), 'languages')\\n#7 \\\/lib\\\/private\\\/AppFramework\\\/Routing\\\/RouteActionHandler.php(47): OC\\\\AppFramework\\\\App::main('OcrController', 'languages', Object(OC\\\\AppFramework\\\\DependencyInjection\\\\DIContainer), Array)\\n#8 [internal function]: OC\\\\AppFramework\\\\Routing\\\\RouteActionHandler->__invoke(Array)\\n#9 \\\/lib\\\/private\\\/Route\\\/Router.php(299): call_user_func(Object(OC\\\\AppFramework\\\\Routing\\\\RouteActionHandler), Array)\\n#10 \\\/lib\\\/base.php(1010): OC\\\\Route\\\\Router->match('\\\/apps\\\/ocr')\\n#11 \\\/index.php(40): OC::handleRequest()\\n#12 {main}\",\"File\":\"\\\/customapps\\\/ocr\\\/lib\\\/Service\\\/OcrService.php\",\"Line\":142}"
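For debugging, it can help to check what a parser would make of the --list-langs output quoted above. The following is a minimal sketch, not the app's actual implementation, and parse_list_langs is a hypothetical name. Some Tesseract 3.x builds are reported to print this list on stderr rather than stdout, so a caller that reads only stdout might see no languages at all. That is an assumption worth verifying on the affected server.

```python
from typing import List


def parse_list_langs(output: str) -> List[str]:
    """Extract language codes from `tesseract --list-langs` output."""
    languages = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages (N):" header.
        if not line or line.lower().startswith("list of available languages"):
            continue
        languages.append(line)
    return languages


# Sample text copied from the report above.
sample = """List of available languages (4):
deu
equ
osd
eng
"""
print(parse_list_langs(sample))
```

An empty result from a parser like this would match the "No languages found." message even though the language packs are installed.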
Inside a PDF viewer (Acrobat Reader, or pdf.js in the browser), you cannot search for a phrase of multiple words. The phrase matches nothing even when it is in the document.
If the document contains, for example, "Breakfast menu" and you click the search icon (magnifying glass) and enter the text "breakfast menu", it should match the text and find it.
It only matches one word. For example, it matches "breakfast", or it matches "menu". If you try to search for two words, it fails to find a match, even when the two words are clearly together on the same line in the document!
Possibly take a look at the parameters or settings for tesseract-ocr and see if it can be made to connect words which are on the same line into one continuous line of text.
This concerns the _OCR.pdf version of the PDF file, which contains the recognized text. Searching for only one word at a time is awkward and time-consuming.