Name

ChupaText

Description

ChupaText is an extensible text extractor. You can plug your own text extractors into ChupaText, and you can write plugins in Ruby.

Overview

ChupaText applies registered decomposers to the input data recursively until the input data has been decomposed into text data.

The following ASCII art describes the process flow:

input data
     |
    \|/
|decomposer|
     |
    \|/
other data
     |
    \|/
|decomposer|
     |
    \|/
...
     |
    \|/
|decomposer|
     |
    \|/
text data

A decomposer is a module that decomposes input data into other data. The decomposed data is not necessarily text data. If it is not text data, ChupaText applies a decomposer again. This repeats until the decomposed data is text data.
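The loop described above can be sketched in plain Ruby. This is only a minimal illustration, not ChupaText's implementation: the `Blob` struct, the `DECOMPOSERS` table, and the `extract_text` method are hypothetical names made up for this example, and the "archive" and "html" steps are simulated with simple string operations. It also assumes each step produces exactly one piece of data.

```ruby
# Hypothetical sketch of recursive decomposition; none of these names
# come from the ChupaText API.
Blob = Struct.new(:type, :body)

# Each decomposer turns one kind of data into another kind of data.
DECOMPOSERS = {
  # Pretend "archive" data wraps an HTML document.
  archive: ->(blob) { Blob.new(:html, blob.body.delete_prefix("ARCHIVE:")) },
  # Pretend "html" data becomes text data once its tags are stripped.
  html: ->(blob) { Blob.new(:text, blob.body.gsub(/<[^>]+>/, "")) },
}

# Apply decomposers again and again until the data is text data.
def extract_text(blob)
  return blob.body if blob.type == :text

  decomposer = DECOMPOSERS.fetch(blob.type) do
    raise ArgumentError, "no decomposer for #{blob.type}"
  end
  extract_text(decomposer.call(blob))
end

input = Blob.new(:archive, "ARCHIVE:<p>Hello, ChupaText!</p>")
puts extract_text(input) # prints "Hello, ChupaText!"
```

Here the input passes through two decomposers (archive to HTML, then HTML to text) before the recursion stops.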

A decomposer module is a plugin. You can add support for more data types by installing decomposer modules, or you can create your own decomposer. A decomposer is a simple Ruby object, so it is easy to create one. How to do so is described later.
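To make the plugin idea concrete, here is a rough sketch of what a decomposer-style object could look like. It is written without the chupa-text gem, so everything in it is an assumption for illustration: the `Item` struct, the `CsvDecomposer` class, the `REGISTRY` array, the `decompose_with_registry` helper, and the method names `target?` and `decompose` are chosen for this sketch only. See doc/text/decomposer.md for the actual interface.

```ruby
# Hypothetical sketch of a decomposer-style plugin; written without the
# chupa-text gem, so every name below is an assumption for illustration.
Item = Struct.new(:mime_type, :body)

class CsvDecomposer
  # Say whether this decomposer can handle the given data.
  def target?(item)
    item.mime_type == "text/csv"
  end

  # Yield each decomposed piece; here, one text item per CSV row.
  def decompose(item)
    item.body.each_line(chomp: true) do |line|
      yield Item.new("text/plain", line.split(",").join(" "))
    end
  end
end

# In this sketch, installing a plugin means adding an object to this list.
REGISTRY = [CsvDecomposer.new]

# Find a decomposer that accepts the item and collect what it yields.
def decompose_with_registry(item)
  decomposer = REGISTRY.find { |candidate| candidate.target?(item) }
  return [item] if decomposer.nil? # No plugin handles this data type.

  pieces = []
  decomposer.decompose(item) { |piece| pieces << piece }
  pieces
end

pieces = decompose_with_registry(Item.new("text/csv", "a,b\nc,d\n"))
p pieces.map(&:body) # prints ["a b", "c d"]
```

Because the registry only relies on two methods, supporting a new data type in this sketch means writing one small class and adding an instance of it to the list.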

Install

Install the chupa-text gem:

% gem install chupa-text

Now you can use the chupa-text command:

% chupa-text --version
chupa-text 1.0.0

How to use

You can use ChupaText as a command line tool or as a Ruby library. See the following documents for details:

How to create a decomposer

See doc/text/decomposer.md for how to write a decomposer.

Available plugins

Search for chupa-text-decomposer- on https://rubygems.org/: http://rubygems.org/search?query=chupa-text-decomposer-

Author

License

LGPL 2.1 or later.

(Kouhei Sutou has the right to change the license, including for contributed patches.)
