djireland / cl-tesseract
This project forked from rigidus/cl-tesseract
Common Lisp bindings for the Tesseract OCR library.
License: MIT License
CL-TESSERACT is a set of CFFI bindings for the Tesseract OCR library v. 3.04: https://github.com/tesseract-ocr/tesseract

On OS X, Tesseract can be conveniently installed using Homebrew:

    brew install tesseract

Because Tesseract OCR’s C API changed in the update to v. 3.04, earlier versions such as 3.02 will not work with these bindings.

CL-TESSERACT also provides two convenience Lisp functions for retrieving text from images, IMAGE-TO-TEXT and IMAGE-TO-HOCR.

IMAGE-TO-TEXT accepts a Lisp pathname and an optional language parameter and returns a Unicode string:

    * (image-to-text #P"~/eurotext.tif")
    "The (quick) [brown] {fox} jumps!
    Over the $43,456.78 <lazy> #90 dog
    & duck/goose, as 12.5% of E-mail
    from aspammer@website.com is spam.
    Der ,,schnelle” braune Fuchs springt
    fiber den faulen Hund. Le renard brun
    «rapide» saute par-dessus le chien
    paresseux. La volpe marrone rapida
    salta sopra i] cane pigro. El zorro
    marrén répido salta sobre el perro
    perezoso. A raposa marrom répida
    salta sobre 0 C50 preguieoso.
    "

    * (image-to-text #P"~/eurotext.tif" :lang "rus")
    "ТЬе (чиісК) [Ьгошп] {Гох} ]итрз!
    Очег [пе $43‚456.78 <1а2у> #90 603
    & ‹1исК/3005е, аз 12.5% ог Е-таіі
    Ггот азраттег@шеЬ5і[е.сош із зрат.
    Бег ‚,5с11пе11е” Ьгаипе Риспз зргіпві
    ііЬег ‹!еп Тапіеп Нипа. Ье гепага Ьгип
    «гарісіе» заше раг-сіеззиз 1е сЬіеп
    рагеззеих. Ьа уоіре тапопе гаріаа
    зама зорга і] сапе рівго. Е1 гогго
    таггбп гёріао зама воЬге е1 репо
    регегозо. А гароза шапот гйріаа
    зака воЬге о еде ргевиісозо.
    "

Available languages depend on the Tesseract OCR .traineddata files located in the directory denoted by *TESSDATA-DIRECTORY*. CL-TESSERACT attempts to set this variable to a reasonable default for your platform.
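Since the available languages are determined by the .traineddata files on disk, it can help to list them before passing a :lang value. The following is a minimal sketch using only standard Common Lisp. LIST-TRAINEDDATA-LANGUAGES is a hypothetical helper, not part of CL-TESSERACT, and it assumes its argument is a directory pathname with a trailing slash.

```lisp
;; Hypothetical helper (not part of CL-TESSERACT): list the language codes
;; for which a .traineddata file exists in DIRECTORY.
;; Assumption: DIRECTORY is a directory pathname, e.g. #P"/some/tessdata/".
(defun list-traineddata-languages (directory)
  "Return a sorted list of the names of all *.traineddata files in DIRECTORY."
  (let ((pattern (merge-pathnames
                  (make-pathname :name :wild :type "traineddata")
                  directory)))
    ;; PATHNAME-NAME drops the directory and the .traineddata type,
    ;; leaving just the language code, e.g. "eng" or "rus".
    (sort (mapcar #'pathname-name (directory pattern)) #'string<)))

;; Possible usage, assuming the variable is exported from a package named
;; CL-TESSERACT and holds a directory pathname (both are assumptions):
;;   (list-traineddata-languages cl-tesseract:*tessdata-directory*)
```

Any code returned this way should be usable as the :lang argument, provided *TESSDATA-DIRECTORY* points at the same directory.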
IMAGE-TO-HOCR accepts a Lisp pathname, the optional language parameter, and an optional page number (default 0) and returns hOCR XML describing not just the recognized text, but also its location on the page:

    * (image-to-hocr #P"~/python-tesseract/eurotext.jpg")
    "
    <div class='ocr_page' id='page_2' title='image \"/Users/Walrus/python-tesseract/eurotext.jpg\"; bbox 0 0 1024 800; ppageno 1'>
    <div class='ocr_carea' id='block_2_1' title=\"bbox 98 66 918 661\">
    . . .
    word_2_65' title='bbox 391 621 456 651; x_wconf 72' lang='eng' dir='ltr'>C50</span>
    <span class='ocrx_word' id='word_2_66' title='bbox 481 621 710 661; x_wconf 74' lang='eng' dir='ltr'>preguieoso.</span>
    </span>
    </p>
    </div>
    </div>
    "

This can be parsed using Common Lisp libraries such as Closure-XML and plump.

Tested on CCL and SBCL.

License: MIT
Author: Edward Geist ([email protected])
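Once an XML library has extracted the title attribute of a word element from the hOCR output, the bounding box and confidence still need to be pulled out of strings such as "bbox 391 621 456 651; x_wconf 72". The sketch below does this with standard Common Lisp only. SPLIT-ON and PARSE-HOCR-TITLE are hypothetical helpers, not part of CL-TESSERACT, and they assume every property value is an integer, as in the word-level titles of the sample output (the page-level title, which carries an image path, is not handled).

```lisp
;; Hypothetical helpers (not part of CL-TESSERACT) for word-level hOCR titles.

(defun split-on (char string)
  "Split STRING on CHAR, dropping empty pieces."
  (loop with start = 0
        for end = (position char string :start start)
        for piece = (subseq string start end)
        unless (string= piece "") collect piece
        while end
        do (setf start (1+ end))))

(defun parse-hocr-title (title)
  "Parse TITLE into a list of (property-name integer ...) entries.
Assumes all property values are integers."
  (loop for property in (split-on #\; title)
        for tokens = (split-on #\Space (string-trim " " property))
        when tokens
          collect (cons (first tokens)
                        (mapcar #'parse-integer (rest tokens)))))

;; (parse-hocr-title "bbox 391 621 456 651; x_wconf 72")
;; => (("bbox" 391 621 456 651) ("x_wconf" 72))
```

The bbox entry holds the left, top, right and bottom pixel coordinates of the word, and x_wconf is the word confidence reported by Tesseract.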