tesseract-ocr-dotnet's Introduction

Warning

This project has been deprecated in favour of https://github.com/charlesw/tesseract.

Overview

A tesseract-ocr .NET wrapper based on tesseractdotnet.

This project can be considered an (unofficial) fork of the tesseract-ocr project that adds a .NET wrapper using C++/CLI. It is based on the excellent work done by the tesseractocrdotnet team.

Code License: Apache License 2.0
Site Content License (Documentation etc): [Creative Commons Attribution 3.0 Unported License](http://creativecommons.org/licenses/by/3.0/)

Example

using System;
using System.Collections.Generic;
using System.Text;
using System.IO;
using System.Drawing;
using OCR.TesseractWrapper;

namespace tesseractconsole
{
    public class Program
    {
        const string TessractData = @".\tessdata\";

        public static void Main(string[] args)
        {
            const string language = "eng";
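            // Guard against a missing argument before indexing into args.
            if (args.Length < 1) {
                Console.WriteLine("Usage: tesseractconsole <imageFile>");
                return;
            }
            string imageFile = args[0];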

            TesseractProcessor processor = new TesseractProcessor();

            using (var bmp = Bitmap.FromFile(imageFile) as Bitmap) {
                var success = processor.Init(TessractData, language, (int)eOcrEngineMode.OEM_DEFAULT);
                if (!success) {
                    Console.WriteLine("Failed to initialize tesseract.");
                } else {
                    string text = processor.Recognize(bmp);
                    Console.WriteLine("Text:");
                    Console.WriteLine("*****************************");
                    Console.WriteLine(text);
                    Console.WriteLine("*****************************");
                }
            }

            Console.WriteLine("Press any key to exit.");
            Console.ReadKey();
        }
    }
}

tesseract-ocr-dotnet's People

Contributors

charlesw, jlewin

tesseract-ocr-dotnet's Issues

Wiki for use?

Is there a wiki explaining how to install and use this Tesseract wrapper? I have been looking for one, but with no luck at all.

I am having trouble compiling the version for VS2010 / .NET Framework 4.0.

If someone has a guide, it would be great.

Runtime exception

I am trying to integrate the Tesseract OCR in my C# application. I have included libtesseract.dll in my application as a reference, and during Tesseract initialization it throws a runtime exception saying

"Attempted to read or write protected memory. This is often an indication that other memory is corrupt."

I am using the same code example that is given on this page:

                    using (var bmp = numberplate)
                    {
                        var success = processor.Init(@"E:\Roshan Rajaratnam\Desktop\tesseract-ocr-3.01.eng (1)\tesseract-ocr\tessdata\", language, (int)eOcrEngineMode.OEM_DEFAULT);
                        if (!success)
                        {
                            Console.WriteLine("Failed to initialize tesseract.");
                        }
                        else
                        {
                            string text = processor.Recognize(bmp);
                            Console.WriteLine("Text:");
                            Console.WriteLine("*****************************");
                            Console.WriteLine(text);
                            Console.WriteLine("*****************************");
                        }
                    }

What could be the cause of this?
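
One thing that may be worth ruling out before calling Init is a problem with the tessdata folder itself. The sketch below is only a guess at a contributing factor, not a confirmed cause of the access violation. `TessdataCheck` and `Validate` are hypothetical names introduced here for illustration, and the check assumes that a Tesseract 3.x language pack is installed as `<language>.traineddata` directly inside the folder passed to Init.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the wrapper: sanity-checks the tessdata
// folder before it is handed to TesseractProcessor.Init.
public static class TessdataCheck
{
    // Returns null when the folder looks usable, otherwise a description of the problem.
    public static string Validate(string tessdataPath, string language)
    {
        if (string.IsNullOrEmpty(tessdataPath) || !Directory.Exists(tessdataPath)) {
            return "tessdata folder not found: " + tessdataPath;
        }

        // Assumption: the language data file is named <language>.traineddata.
        string trained = Path.Combine(tessdataPath, language + ".traineddata");
        if (!File.Exists(trained)) {
            return "language file not found: " + trained;
        }

        return null;
    }
}
```

Calling `TessdataCheck.Validate(path, "eng")` with the same path that is passed to Init and printing any non-null result would at least separate a bad path from a problem inside the native library.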

Tesseract 3.02

Is there any plan to update the project to Tesseract 3.02? Thanks.

Parsing results

Hi Charles,

I am using your wrapper in a C# class, so I hope you will forgive my seeming ignorance. Using your application I am getting OCR results, but I cannot figure out how to get the metadata such as words, zones, characters, confidences, etc. Can you give me a small shove in the right direction?

Many thanks.

John

Exception when running VS2008 console app (The specified module could not be found: 0x8007007E)

Hi, thanks very much for creating this fork - just what I was looking for!

When I run the VS2010 version of the console app it works fine.
However, when I run the VS2008 version, I get the following error immediately (before even hitting a breakpoint at the start of Main()):

The specified module could not be found. (Exception from HRESULT: 0x8007007E)

at tesseractconsole.Program.Main(String[] args)
at System.AppDomain._nExecuteAssembly(Assembly assembly, String[] args)
at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()

I've searched the tesseractdotnet issues for this, and the only possible solutions I've seen are these posts here:
http://code.google.com/p/tesseractdotnet/issues/detail?id=1#c60
http://code.google.com/p/tesseractdotnet/issues/detail?id=1#c67

I've tried ensuring, as suggested in the second post, that the console app is configured for "x86" and not "AnyCPU", but it doesn't appear to make any difference, and I'm afraid I don't understand what is meant by "try add "leptonlibd.dll" in the output file." in the first post.

Any suggestions?
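
A possible way to narrow this down is to check whether the native DLLs the wrapper depends on are actually present in the output folder. This is a minimal sketch under assumptions: `DependencyCheck` and `FindMissing` are hypothetical names, and the file names passed in (for example `libtesseract.dll` and `leptonlibd.dll`, both taken from the posts in this tracker) are guesses at what the VS2008 build needs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: reports which expected files are absent from a folder.
public static class DependencyCheck
{
    public static List<string> FindMissing(string folder, IEnumerable<string> fileNames)
    {
        var missing = new List<string>();
        foreach (string name in fileNames) {
            // Only looks in the given folder; it does not search PATH
            // or detect a missing Visual C++ runtime.
            if (!File.Exists(Path.Combine(folder, name))) {
                missing.Add(name);
            }
        }
        return missing;
    }
}
```

It could be called as `DependencyCheck.FindMissing(AppDomain.CurrentDomain.BaseDirectory, new[] { "libtesseract.dll", "leptonlibd.dll" })`. Because the exception is raised before the first line of Main executes, the check would probably need to run from a small separate program, or from a Main that does not itself reference any wrapper types.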

System.IO.FileNotFoundException

Hi!
I'm using tesseract 3.01 and vs 2008.
When I run tesseractconsole.exe from the command line, an exception is thrown:

C:\Users\Iris>E:\tesseract\tesseract-ocr-dotnet-master\tesseractconsole.exe E:\tesseract\tesseract-ocr-dotnet-master\phototest.tif E:\tesseract\tesseract-ocr-dotnet-master\abc.txt

Unhandled exception: System.IO.FileNotFoundException: Could not load file or assembly "tesseract, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null" or one of its dependencies. The system cannot find the file specified.
File name: "tesseract, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null"
at tesseractconsole.Program.Main(String[] args)

Warning: Assembly binding logging is turned off.
To enable assembly bind failure logging, set the registry value HKLM\Software\Microsoft\Fusion!EnableLog to 1.
Note: There is some performance penalty associated with assembly bind failure logging.
To turn this feature off, remove the registry value [HKLM\Software\Microsoft\Fusion!EnableLog].

What's the problem?
Thanks in advance!
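
The message above names the assembly that failed to load and mentions the binding log. As a rough sketch, the same information can be pulled out of the exception in code. `LoadFailureReport` and `Describe` are hypothetical names; `FileName` and `FusionLog` are standard properties of `System.IO.FileNotFoundException`, and `FusionLog` is normally empty unless assembly binding logging has been enabled as the message describes.

```csharp
using System;
using System.IO;

// Hypothetical helper: turns a failed assembly load into a readable report.
public static class LoadFailureReport
{
    public static string Describe(FileNotFoundException ex)
    {
        // FileName holds the file or assembly name the loader could not find.
        string report = "Could not load: " + ex.FileName;

        // FusionLog is only populated when binding logging is enabled.
        if (!string.IsNullOrEmpty(ex.FusionLog)) {
            report += Environment.NewLine + ex.FusionLog;
        }
        return report;
    }
}
```

This would be called from a catch block placed around the first method that uses types from the tesseract assembly. It only reports what the loader looked for; whether the fix is copying the wrapper assembly and its native dependencies next to tesseractconsole.exe is an assumption that the report should help confirm or rule out.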

Rewrite tesseract-ocr wrapper to use C++/CLI

The wrapper currently uses Managed Extensions for C++, which Microsoft has deprecated, rather than C++/CLI. When doing this, we should also develop the .NET API as a one-to-one port of the C++ API as far as possible.

x86 problem (I think)

Hello, I want to start by thanking you for making tesseract-ocr work on .NET 4.0!

Everything has worked well on my development machine, but now that I am publishing the project (an ASP.NET MVC 3 project) to my web server, something seems to be wrong.

I am getting this error: http://pastebin.com/LUvBT6ak

I have libtesseract.dll in the bin folder on the server, as it should be, and everything looks right as far as I can tell. However, when I searched for this problem on Google, I found this issue on the old tesseract-ocr project for .NET: http://code.google.com/p/tesseractdotnet/issues/detail?id=1

It looks like they think it is an x86 vs x64 problem, so I would like you to release the DLL as both x86 and x64 binaries. Or do you have any other ideas?

I cannot open the code because I don't have Visual Studio with C++ support; otherwise I would try to do it myself.
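
If the x86 vs x64 theory is right, a quick check is to log the bitness of the process that hosts the site, since a 32-bit native DLL cannot be loaded into a 64-bit process. The sketch below is illustrative only: `ProcessBitness` and `Describe` are hypothetical names, and the linked error has not been inspected here, so a mismatch is an assumption rather than a diagnosis.

```csharp
using System;

// Hypothetical helper: reports whether the current process is 32-bit or 64-bit.
public static class ProcessBitness
{
    public static string Describe()
    {
        // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit process.
        return IntPtr.Size == 8 ? "64-bit" : "32-bit";
    }
}
```

If this returns "64-bit" on the web server while the shipped libtesseract.dll is an x86 build, one option to try is the IIS application pool setting "Enable 32-Bit Applications", which runs the worker process as 32-bit.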

at System.RuntimeMethodHandle._InvokeConstructor

The following error occurred on a Windows XP system:
Stack trace: at System.RuntimeMethodHandle._InvokeConstructor(IRuntimeMethodInfo method, Object[] args, SignatureStruct& signature, RuntimeType declaringType)
at System.RuntimeMethodHandle.InvokeConstructor(IRuntimeMethodInfo method, Object[] args, SignatureStruct signature, RuntimeType declaringType)
at System.Reflection.RuntimeConstructorInfo.Invoke(BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
at System.RuntimeType.CreateInstanceImpl(BindingFlags bindingAttr, Binder binder, Object[] args, CultureInfo culture, Object[] activationAttributes)
at System.Activator.CreateInstance(Type type, BindingFlags bindingAttr, Binder binder, Object[] args, CultureInfo culture, Object[] activationAttributes)
at InteropDotNet.InteropRuntimeImplementer.CreateInstanceT
at Tesseract.Interop.LeptonicaApi.Initialize()
at Tesseract.Interop.TessApi.Initialize()
at Tesseract.Interop.TessApi.get_Native()
at Tesseract.TesseractEngine..ctor(String datapath, String language, EngineMode engineMode, IEnumerable`1 configFiles, IDictionary`2 initialOptions, Boolean setOnlyNonDebugVariables)

Word.Confidence and Word.Text not populated

Hi, first of all, thank you for your efforts in making this wrapper work on VS2010. It would be great if you could implement a monitor so that Word.Confidence and Word.Text are populated after an AnalyzeLayout run.
Thank you
