TesseractSharp

A C# wrapper for Tesseract 5 (last updated 21 August 2021).

Use TesseractSharp!

Example usage:

using System;
using System.Drawing; // Bitmap and Image, used in the Bitmap example
using System.IO;
using TesseractSharp;
using TesseractSharp.Hocr;

namespace ConsoleApp
{
    class Program
    {
        static void Main(string[] args)
        {
            var input = @"C:\Users\nobody\sample.jpg";

            // Path.ChangeExtension swaps only the extension, so the input file is never
            // overwritten by accident. File.Create truncates an existing output file,
            // whereas File.OpenWrite would leave stale trailing bytes behind.

            // Searchable PDF
            var output1 = Path.ChangeExtension(input, ".pdf");
            using (var stream = Tesseract.ImageToPdf(input, languages: new[] { Language.English, Language.French }))
            using (var writer = File.Create(output1))
            {
                stream.CopyTo(writer);
            }

            // Plain text
            var output2 = Path.ChangeExtension(input, ".txt");
            using (var stream = Tesseract.ImageToTxt(input, languages: new[] { Language.English, Language.French }))
            using (var writer = File.Create(output2))
            {
                stream.CopyTo(writer);
            }

            // Tab-separated values
            var output3 = Path.ChangeExtension(input, ".tsv");
            using (var stream = Tesseract.ImageToTsv(input, languages: new[] { Language.English, Language.French }))
            using (var writer = File.Create(output3))
            {
                stream.CopyTo(writer);
            }

            // hOCR
            var output4 = Path.ChangeExtension(input, ".hocr");
            using (var stream = Tesseract.ImageToHocr(input, languages: new[] { Language.English, Language.French }))
            using (var writer = File.Create(output4))
            {
                stream.CopyTo(writer);
            }

            // Also works with a Bitmap!
            // Note: this is the same path as output1, so the PDF written above is overwritten.
            var output5 = Path.ChangeExtension(input, ".pdf");
            using (var bitmap = (Bitmap)Image.FromFile(input))
            using (var stream = Tesseract.ImageToPdf(bitmap, languages: new[] { Language.English, Language.French }))
            using (var writer = File.Create(output5))
            {
                stream.CopyTo(writer);
            }

            // Parse the hOCR file written above and walk its hierarchy:
            // page > area > paragraph > line > word.
            var hocr = HOCRParser.Parse(File.OpenText(output4));
            foreach (var page in hocr.Pages)
            {
                Console.WriteLine($"page={page.Title}");
                foreach (var area in page.Areas)
                {
                    Console.WriteLine($"area={area.Title}");
                    foreach (var par in area.Paragraphs)
                    {
                        Console.WriteLine($"par={par.Title}");
                        foreach (var line in par.Lines)
                        {
                            Console.WriteLine($"line={line.Title}");
                            foreach (var word in line.Words)
                            {
                                Console.WriteLine($"word={word.Title}");
                            }
                        }
                    }
                }
            }

            Console.ReadKey();
        }
    }
}
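
The example above prints the Title of each hOCR element. In the hOCR format, an element's title attribute is a semicolon-separated list of properties such as "bbox x0 y0 x1 y1" and, for words, "x_wconf N". The following is a minimal sketch of pulling those values out of such a string. It assumes that the Title property exposes the raw hOCR title attribute, which this README does not confirm. HocrTitle, TryGetBBox and GetWordConfidence are hypothetical helper names and are not part of TesseractSharp.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of TesseractSharp.
// Assumes the input is a raw hOCR title attribute, e.g. "bbox 36 92 96 116; x_wconf 95".
static class HocrTitle
{
    // "bbox x0 y0 x1 y1": bounding box in pixel coordinates.
    private static readonly Regex BBox =
        new Regex(@"\bbbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)", RegexOptions.Compiled);

    // "x_wconf N": word confidence reported by the OCR engine.
    private static readonly Regex WordConf =
        new Regex(@"\bx_wconf\s+(\d+(?:\.\d+)?)", RegexOptions.Compiled);

    // Returns false (and zeroed coordinates) when the title has no bbox property.
    public static bool TryGetBBox(string title, out int x0, out int y0, out int x1, out int y1)
    {
        x0 = y0 = x1 = y1 = 0;
        var m = BBox.Match(title ?? string.Empty);
        if (!m.Success) return false;

        x0 = int.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
        y0 = int.Parse(m.Groups[2].Value, CultureInfo.InvariantCulture);
        x1 = int.Parse(m.Groups[3].Value, CultureInfo.InvariantCulture);
        y1 = int.Parse(m.Groups[4].Value, CultureInfo.InvariantCulture);
        return true;
    }

    // Returns null when the title has no x_wconf property.
    public static double? GetWordConfidence(string title)
    {
        var m = WordConf.Match(title ?? string.Empty);
        if (!m.Success) return null;
        return double.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
    }
}

class HocrTitleDemo
{
    static void Main()
    {
        // Sample title string in hOCR syntax (illustrative values, not real OCR output).
        var title = "bbox 36 92 96 116; x_wconf 95";

        if (HocrTitle.TryGetBBox(title, out var x0, out var y0, out var x1, out var y1))
        {
            Console.WriteLine($"bbox=({x0},{y0})-({x1},{y1})");
        }

        var conf = HocrTitle.GetWordConfidence(title);
        Console.WriteLine(conf.HasValue ? $"confidence={conf.Value}" : "no confidence");
    }
}
```

If the assumption holds, the innermost loop of the example could call HocrTitle.TryGetBBox(word.Title, ...) to filter words by position or by confidence instead of printing the raw string.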

For developers

Developers can rebuild Tesseract from source.

Contributors

andrei-shift, thibault-reigner, veselv2010
