
tesseract4java: Tesseract GUI

A graphical user interface for the Tesseract OCR engine. The program was introduced in the Master’s thesis “Analyses and Heuristics for the Improvement of Optical Character Recognition Results for Fraktur Texts” by Paul Vorbach (written in German).


Download

Binary distributions and release notes are available in the releases section.

Screenshots

Preprocessing

Preprocessing view

Box Editor

Box editor for training

Glyph Overview

Glyph overview for easier detection of errors

Comparison View

Comparison view to compare the original document with the perceived result

Transcription View

Evaluation view with a transcription field

ocrevalUAtion


Batch Export

Batch export functionality to handle large projects

Building and running the software

This software is written in Java and is built with Apache Maven. To build it, follow these steps:

  1. Obtain a copy either by cloning the repository or downloading the current zip file.
  2. Also obtain a copy of a patched version of ocrevalUAtion (zip file).
  3. Open a command line in the ocrevalUAtion directory and run mvn clean install.
  4. cd to the tesseract4java directory and run mvn clean package -Pstandalone. This bundles the Tesseract binaries for your platform. To select a platform explicitly, pass the option -Djavacpp.platform=[PLATFORM] (available platforms are windows-x86_64, windows-x86, linux-x86_64, linux-x86, and macosx-x86_64).

After you've run through all steps, the directory "tesseract4java/gui/target" will contain the file "tesseract4java-[VERSION]-[PLATFORM].jar", which you can run by double-clicking or executing java -jar tesseract4java-[VERSION]-[PLATFORM].jar.
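The build and run steps above can be sketched as a few shell functions. This is only an illustration under stated assumptions: the directory names ocrevalUAtion and tesseract4java, the variable names OCREVAL_DIR and T4J_DIR, and the function names are all hypothetical and should be adapted to your checkout. The Maven commands and the -Djavacpp.platform option are the ones given in the steps.

```shell
#!/bin/sh
# Sketch of the build steps above. The directory names are assumptions:
# adjust them to wherever you cloned or unpacked the two projects.
OCREVAL_DIR="${OCREVAL_DIR:-ocrevalUAtion}"
T4J_DIR="${T4J_DIR:-tesseract4java}"

# Print the Maven arguments for step 4. An optional first argument
# selects the platform explicitly, e.g. linux-x86_64.
package_args() {
  if [ -n "${1:-}" ]; then
    printf '%s\n' "clean package -Pstandalone -Djavacpp.platform=$1"
  else
    printf '%s\n' "clean package -Pstandalone"
  fi
}

# Step 3: install the patched ocrevalUAtion into the local Maven repository.
build_ocrevaluation() {
  (cd "$OCREVAL_DIR" && mvn clean install)
}

# Step 4: package tesseract4java, optionally for an explicit platform.
build_tesseract4java() {
  # Left unquoted on purpose so the arguments split into separate words.
  (cd "$T4J_DIR" && mvn $(package_args "${1:-}"))
}

# Run the first JAR found in gui/target. The exact file name depends on
# the version and platform, so a glob is used instead of a fixed name.
run_gui() {
  for jar in "$T4J_DIR"/gui/target/tesseract4java-*.jar; do
    if [ -f "$jar" ]; then
      java -jar "$jar"
      return
    fi
  done
  echo "no tesseract4java JAR found in $T4J_DIR/gui/target" >&2
  return 1
}
```

After sourcing the file from the directory that contains both projects, a full build and launch might look like build_ocrevaluation && build_tesseract4java linux-x86_64 && run_gui. Omitting the platform argument lets the build pick the platform itself, as described in step 4.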

Credits

License

GPLv3

tesseract4java - a graphical user interface for the Tesseract OCR engine
Copyright (C) 2014-2016 Paul Vorbach

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.
