gamera-3's Introduction

Gamera 3 (deprecated version for Python 2)

Gamera is a framework for building document analysis applications.
It is not a packaged document recognition system, but a toolkit for building document image recognition systems.

For more information about Gamera, visit the Gamera website at:

http://gamera.informatik.hsnr.de/

Note that Gamera version 3 is for Python 2.x only. For a Gamera version that runs under Python 3, see

https://github.com/hsnr-gamera/gamera-4

Installation and Usage

See the INSTALL file for installation instructions, or read them online at:

http://gamera.informatik.hsnr.de/docs/gamera-docs/install.html

The complete Gamera documentation is available online at:

http://gamera.informatik.hsnr.de/docs/gamera-docs/

Authors and License

(c)2001-2003 Michael Droettboom, Karl Mac Millan, Ichiro Fujinaga
(c)2004-2007 Michael Droettboom
(c)2008-2016 Michael Droettboom and Christoph Dalitz

See the file ACKNOWLEDGEMENTS for additional contributors.

This software is distributed under the GNU General Public License. See the file LICENSE for more information.

As the GNU GPL is only applicable to software, the accompanying documentation is distributed under the terms of the Creative Commons Attribution-Share Alike license. See the bottom of the file doc/src/index.txt for details.

gamera-3's People

Contributors

cdalitz, fujinaga, jannschu, jedah007, jwilk, mdboom, vincsdev

gamera-3's Issues

Why is the step of reading the image so slow after a rebuild?

In Python-2.7 I use the standard Ubuntu 18.04 deb of python-gamera:
python-gamera_3.4.2+git20160808.1725654-2_amd64.deb

When I rebuild Gamera with the nowx option on Mint 20.2, it reads images much more slowly. The same happens with gamera-4.

Has the deb been built with an optimization I don't know of, or is 3.4.4 slower than 3.4.2 at reading images?

The read step is invoked from didjvu; the whole run below takes 2 minutes 45 seconds:

date;./didjvu encode ~/Afbeeldingen/outputbase2-000-raar-effect-onderste-regel-didjvu\ zonder\ tekst.tif -o jaarverslagraa5r.djvu;date
wo 17 nov 2021  7:21:20 CET
/home/robert/Afbeeldingen/outputbase2-000-raar-effect-onderste-regel-didjvu zonder tekst.tif:
- reading image
- converting to DjVu
- 0.010 bits/pixel; 113.343:1, 99.12% saved, 4122172 bytes in, 36369 bytes out
wo 17 nov 2021  7:24:05 CET

Output of pipdeptree:
argparse==1.2.1
ExifRead==2.3.2
gamera==3.4.4
gyp==0.1
image==1.5.33
  - django [required: Any, installed: 1.11.29]
    - pytz [required: Any, installed: 2021.3]
  - pillow [required: Any, installed: 6.2.1]
  - six [required: Any, installed: 1.16.0]
nose==1.3.7
numpy==1.16.6
olefile==0.46
pipdeptree==2.2.0
  - pip [required: >=6.0.0, installed: 20.3.4]
pyexiv2==1.2.1
Python==2.7.18
setuptools==44.1.1
wheel==0.37.0
wsgiref==0.1.2

Python 3 support

I just tried to build Gamera from the current master branch using Python 3.6.8 on Ubuntu 18.04, and the build failed. It seems that Python 3 is not supported at all, although I could not find any information on this in the docs (they only mention Python >= 2.4).

Are there any plans to add support for Python 3 to this project? (Python 2 will be considered EOL at the end of the year.)

Grouped glyphs getting created every time Group and Guess All

(Using Gamera version 3.4.3)
If you use "group and guess all" more than once, duplicates of glyphs will get created. (Gamera keeps looking at the _group._part glyphs and groups them every time the function is called.)

I used this image and grouped one of the 'i's as _group.i:
[image: test_image]

The first time I grouped them, these are the glyphs I got. There is already an extra i.
[image: first_group]

The second time I grouped them, there were twice as many 'i's.
[image: second_group]

Further grouping caused some of the 'i's to be placed into the _group._part.i

Stops working too often to be usable

First observed on the Debian version. Yesterday I compiled from source on Debian buster. I am unable to save results after this message:
`(gamera_gui:12378): Gtk-CRITICAL **: 08:40:53.929: gtk_widget_set_allocation: assertion gtk_widget_get_visible (widget) || _gtk_widget_is_toplevel (widget)' failed'
I'm curious whether the problem also appears in gamera-4 but, as reported yesterday, I'm unable to build it.

Gamera crashes on importing some gamera_xml files as classifiers, seg fault 11

When importing a gamera_xml file using knn.kNNNonInteractive(), Gamera crashes (both in the GUI and in scripts). Removing the fourier_broken feature often fixes the issue. If the file is imported as an xml file instead of as a classifier, the crash does not occur.

Here is an example of such a file generated through common gamera procedures (cc_analysis, split, etc.)

<?xml version="1.0" encoding="utf-8"?>
<gamera-database version="2.0">
  <glyphs>
    <glyph uly="3869" ulx="2010" nrows="1" ncols="6">
      <ids state="MANUAL">
        <id name="skip" confidence="1.000000"/>
      </ids>
      <data>
        0 6 
      </data>
      <features scaling="1.0">
        <feature name="fourier_broken">
          1.0 0.00862878687372 0.00862878687371 0.402552988566 0.402552988566
          0.015147028975 0.015147028975 0.0106590671642 0.0106590671642
          0.0451549855635 0.0451549855635 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
          0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
          0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
        </feature>
      </features>
    </glyph>
  </glyphs>
</gamera-database>

This is in Gamera version 3.4.3

Gamera crash: free(): invalid pointer: 0x0000558fd75398f0

This has been reported on the Debian BTS (https://bugs.debian.org/874797). A crash reproducer is available there. The output is:

$ ./874797.py 
*** Error in `/usr/bin/python': munmap_chunk(): invalid pointer: 0x0000559f1463da30 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x70bfb)[0x7f3b972e1bfb]
/lib/x86_64-linux-gnu/libc.so.6(+0x76fc6)[0x7f3b972e7fc6]
/usr/lib/python2.7/dist-packages/gamera/plugins/_transformation.x86_64-linux-gnu.so(_ZN5vigra10BasicImageIhSaIhEE10deallocateEv+0x51)[0x7f3b83f5dfb1]
/usr/lib/python2.7/dist-packages/gamera/plugins/_transformation.x86_64-linux-gnu.so(_ZN5vigra13resampleImageIN6Gamera18ConstImageIteratorIKNS1_9ImageViewINS1_9ImageDataIhEEEEPKhEENS1_8AccessorIhEENS1_13ImageIteratorIS6_PhEESC_EEvT_SG_T0_T1_T2_dd+0x33b)[0x7f3b83f76e6b]
/usr/lib/python2.7/dist-packages/gamera/plugins/_transformation.x86_64-linux-gnu.so(_ZN6Gamera6resizeINS_9ImageViewINS_9ImageDataIhEEEEEEPNS_5ImageERT_RKNS_3DimEi+0x43f)[0x7f3b83fb2c3f]
/usr/lib/python2.7/dist-packages/gamera/plugins/_transformation.x86_64-linux-gnu.so(+0x1fd85)[0x7f3b83f4ed85]

Thanks,
DS

'Avg' criterion not used in grouping

I noticed in the function _OnGroupAndGuess, in classifier_display.py, the choice made between “avg” and “min” isn’t passed, so the default value of “min” is always used.

Important Maintenance Considerations.

@cdalitz @danstender
The travis build should guarantee a working build, without any substitutions. A 'passing' travis build badge, in its current state, is misleading.
https://github.com/hsnr-gamera/gamera/blob/096270731e249256af7ba0fd7fdbf3980d630b49/.travis.yml#L9

  • This installation does not include wx. If Gamera is to be maintained (again, this is a suggestion), the dependencies should also be maintained. This includes moving incrementally from wxPython 2.8 to 3, and so on to the latest version. As it stands, the Gamera GUI cannot be installed easily because of issues surrounding wx.
  • A second option is to drop support for the Mac GUI and concentrate on properly maintaining a stable Debian package only. Here wx could also be replaced with another project.
  • The last option is to drop the GUI completely (and wx with it).

https://wxpython.org/Phoenix/docs/html/MigrationGuide.html
https://docs.travis-ci.com/user/gui-and-headless-browsers/

gamera xml validation schema

Feature

This is a "would be nice" request. I would like to have an XML validation schema for Gamera xml files straight from the maintainers of Gamera. I have looked at https://gamera.informatik.hsnr.de/docs/gamera-docs/xml_format.html but the DTD does not seem to be valid anymore.

Python script

# XML validation
from lxml import etree

def validate_xml(gamera_xml, schema_filepath="gamera.xsd"):
  """
  Return whether the provided gamera xml file is valid against the given XML schema.

  Args:
    gamera_xml (string): a filepath to the gamera xml file to be validated.
    schema_filepath (string): a filepath to the XSD file (default "gamera.xsd").

  Returns:
    bool
  """
  # etree.parse accepts a file path directly, so the files do not need
  # to be read into StringIO/BytesIO buffers first.
  xml_schema = etree.XMLSchema(etree.parse(schema_filepath))
  return xml_schema.validate(etree.parse(gamera_xml))

print(validate_xml("./CF-011-position.xml"))

xsd spec

<!--  xsd schema -->
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="gamera-database">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="glyphs">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="glyph" maxOccurs="unbounded" minOccurs="0">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="ids">
                      <xs:complexType>
                        <xs:sequence>
                          <xs:element name="id">
                            <xs:complexType>
                              <xs:simpleContent>
                                <xs:extension base="xs:string">
                                  <xs:attribute type="xs:string" name="name" use="optional"/>
                                  <xs:attribute type="xs:float" name="confidence" use="optional"/>
                                </xs:extension>
                              </xs:simpleContent>
                            </xs:complexType>
                          </xs:element>
                        </xs:sequence>
                        <xs:attribute type="xs:string" name="state" use="optional"/>
                      </xs:complexType>
                    </xs:element>
                    <xs:element type="xs:string" name="data"/>
                    <xs:element name="features">
                      <xs:complexType>
                        <xs:sequence>
                          <xs:element name="feature" maxOccurs="unbounded" minOccurs="0">
                            <xs:complexType>
                              <xs:simpleContent>
                                <xs:extension base="xs:string">
                                  <xs:attribute type="xs:string" name="name" use="optional"/>
                                </xs:extension>
                              </xs:simpleContent>
                            </xs:complexType>
                          </xs:element>
                        </xs:sequence>
                        <xs:attribute type="xs:float" name="scaling" use="optional"/>
                      </xs:complexType>
                    </xs:element>
                  </xs:sequence>
                  <xs:attribute type="xs:nonNegativeInteger" name="uly" use="optional"/>
                  <xs:attribute type="xs:nonNegativeInteger" name="ulx" use="optional"/>
                  <xs:attribute type="xs:nonNegativeInteger" name="nrows" use="optional"/>
                  <xs:attribute type="xs:nonNegativeInteger" name="ncols" use="optional"/>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute type="xs:float" name="version"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

installation with pip requires wx, even with --nowx flag

I attempted to install Gamera with the following command:

pip install git+https://github.com/hsnr-gamera/gamera.git --global-option="--nowx"

The installation fails with the following output:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/private/tmp/pip-req-build-9bFOuV/setup.py", line 235, in <module>
    import wx
ImportError: No module named wx
Command "python setup.py egg_info" failed with error code 1 in /private/tmp/pip-req-build-9bFOuV/
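
A minimal sketch of how a setup script could avoid this kind of failure, assuming the intent of --nowx is to skip the GUI entirely: check the command line before importing wx, and treat a missing wx as "no GUI" rather than a fatal error. This is not Gamera's actual setup.py; the helper names wants_wx and try_import_wx are hypothetical.

```python
def wants_wx(argv):
    """Return False when the --nowx flag appears on the command line."""
    return "--nowx" not in argv


def try_import_wx(argv):
    """Import wx only when the GUI was requested (hypothetical helper).

    Returns the wx module, or None when --nowx was given or wx is missing.
    """
    if not wants_wx(argv):
        return None
    try:
        import wx  # deferred import: never reached when --nowx is set
    except ImportError:
        return None
    return wx
```

A setup script would call try_import_wx(sys.argv) once and build the GUI parts only when the result is not None. Whether pip forwards --global-option to the egg_info step at all is a separate question that this sketch does not answer.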

Crash in _gui_support.color_ccs()

When Gamera is built with GCC 9, test_rle.py crashes:

$ pytest test_rle.py
============================= test session starts ==============================
platform linux2 -- Python 2.7.16, pytest-4.6.5, py-1.8.0, pluggy-0.12.0
rootdir: /home/jwilk/gamera
collected 1 item

test_rle.py Segmentation fault

Backtrace:

#0  0x00007fb7e18aa603 in Gamera::RleDataDetail::RLEProxy<Gamera::RleDataDetail::RleVector<unsigned short> >::operator unsigned short (this=<synthetic pointer>) at include/rle_data.hpp:260
#1  Gamera::is_white<Gamera::RleDataDetail::RLEProxy<Gamera::RleDataDetail::RleVector<unsigned short> > > (value=...) at include/pixel.hpp:440
#2  Gamera::color_ccs<Gamera::ImageView<Gamera::RleImageData<unsigned short> > > (m=..., ignore_unlabeled=false) at include/plugins/gui_support.hpp:258
#3  0x00007fb7e18a0d6a in call_color_ccs (self=<optimized out>, args=<optimized out>) at /home/jwilk/gamera/gamera/plugins/_gui_support.cpp:275
#4  0x000056106008e0d9 in call_function (oparg=<optimized out>, pp_stack=0x7fff9951f7e8) at Python/ceval.c:4376
#5  PyEval_EvalFrameEx (f=f@entry=0x7fb7e0bb4050, throwflag=throwflag@entry=0) at Python/ceval.c:3013
...

Python backtrace:

File "/home/jwilk/.local/lib/python2.7/site-packages/gamera/plugins/gui_support.py", line 98, in __call__
  return _gui_support.color_ccs(image, ignore_unlabeled)
File "/home/jwilk/gamera/tests/test_rle.py", line 23, in test_rle1
  assert image1.color_ccs().to_string() == image2.color_ccs().to_string()
...

This was tested with:

  • GCC 9.2.1
  • Python 2.7.16 (built from source)
  • Gamera from git master (c77194d)

The bug was originally reported in Debian:
https://bugs.debian.org/925689

install toolkits

hey,

I wanted to install this toolkit - MusicStaves

But I encountered this error,
fatal error: gameramodule.hpp: No such file or directory

When I checked under /usr/include/, indeed gamera was not there. There was no error during installation and the GUI is working. I'm probably doing something wrong, could you kindly help me out?

This is the compile command that gives the error:
x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -Iinclude -I/home/mukunar1/workspace/wonderingEar/NewNew/musicstaves-1.4.0/gamera/toolkits/musicstaves/plugins -Iinclude/plugins -I/usr/include/python2.7/gamera -I/usr/include/python2.7/gamera -I/usr/include/python2.7/../gamera -I/usr/include/gamera -I/usr/include/python2.7 -c /home/mukunar1/workspace/wonderingEar/NewNew/musicstaves-1.4.0/gamera/toolkits/musicstaves/plugins/_evaluation.cpp -o build/temp.linux-x86_64-2.7/home/mukunar1/workspace/wonderingEar/NewNew/musicstaves-1.4.0/gamera/toolkits/musicstaves/plugins/_evaluation.o -Wall
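
To check where, if anywhere, the header ended up, a small search over likely include roots can help. This is only a sketch: the function name find_header is hypothetical, and the roots you pass in are guesses that depend on how Gamera was installed.

```python
import os


def find_header(name, roots):
    """Return every path below the given roots whose file name equals name."""
    hits = []
    for root in roots:
        # os.walk yields nothing for a root that does not exist.
        for dirpath, _dirnames, filenames in os.walk(root):
            if name in filenames:
                hits.append(os.path.join(dirpath, name))
    return hits
```

For example, find_header("gameramodule.hpp", ["/usr/include", "/usr/local/include"]) lists the matching paths. If the header turns up in a directory that is not among the -I options of the compile command, that mismatch would explain the error.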

integer overflow in include/plugins/features.hpp

I get warnings about integer overflow when building on a 32-bit machine:

i586-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -Iinclude -I/home/jwilk/gamera/gamera/plugins -Iinclude/plugins -I/usr/include/python2.7 -c /home/jwilk/gamera/gamera/plugins/_fourier_features.cpp -o build/temp.linux-i686-2.7/home/jwilk/gamera/gamera/plugins/_fourier_features.o -DFDLENGTH=48 -Wall
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from include/plugins/segmentation.hpp:32:0,
                 from include/plugins/fourier_features.hpp:27,
                 from /home/jwilk/gamera/gamera/plugins/_fourier_features.cpp:7:
include/plugins/features.hpp: In function ‘double Gamera::zer_pol_R(int, int, double, double)’:
include/plugins/features.hpp:457:101: warning: integer overflow in expression [-Woverflow]
                               (long int)2*3*4*5*6*7*8*9*10*11*12, (long int)2*3*4*5*6*7*8*9*10*11*12*13, (long int)2*3*4*5*6*7*8*9*10*11*12*13*14, (long int)2*3*4*5*6*7*8*9*10*11*12*13*14*15};
                                                                                                     ^
include/plugins/features.hpp:457:140: warning: integer overflow in expression [-Woverflow]
                               (long int)2*3*4*5*6*7*8*9*10*11*12, (long int)2*3*4*5*6*7*8*9*10*11*12*13, (long int)2*3*4*5*6*7*8*9*10*11*12*13*14, (long int)2*3*4*5*6*7*8*9*10*11*12*13*14*15};
                                                                                                                                            ^
include/plugins/features.hpp:457:182: warning: integer overflow in expression [-Woverflow]
                               (long int)2*3*4*5*6*7*8*9*10*11*12, (long int)2*3*4*5*6*7*8*9*10*11*12*13, (long int)2*3*4*5*6*7*8*9*10*11*12*13*14, (long int)2*3*4*5*6*7*8*9*10*11*12*13*14*15};
                                                                                                                                                                                      ^

Indeed, 13! > 2^31, so it doesn't fit in a 32-bit long int.
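
The arithmetic can be checked with a short script. It compares n! with the largest signed 32-bit value and shows what a two's-complement wrap would leave behind. The wrap is an illustration only: it assumes a 32-bit long int and wrapping semantics, and the helper name wrap_int32 is made up for this sketch.

```python
INT32_MAX = 2**31 - 1


def factorial(n):
    """Exact n! using Python's arbitrary-precision integers."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


def wrap_int32(value):
    """Reduce value modulo 2**32 and reinterpret it as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value >= 2**31 else value


for n in range(12, 16):
    exact = factorial(n)
    # Columns: n, exact n!, whether it overflows, wrapped 32-bit value
    print(n, exact, exact > INT32_MAX, wrap_int32(exact))
```

12! still fits, while 13!, 14! and 15! do not, which matches the three warnings in the compiler output.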

Tested with git master (d78e199).

installation problem

I tried to install gamera 3.4.2 from the release page on macOS Sierra 10.12.2 following the instructions in the INSTALL file, i.e. running:

sudo python setup.py install

Unfortunately, the process fails at line 235 (import wx) with ImportError: No module named wx.

The installed python version is 2.7.10, the one that came with the OS.

threshold.hpp djvu_threshold doesn't respect image boundaries

You can reproduce it with jwilk/didjvu#16

If you want the bottom of the stencil to look right you can alter threshold.hpp like this:

diff --git a/include/plugins/threshold.hpp b/include/plugins/threshold.hpp
index a637c91..617aef9 100644
--- a/include/plugins/threshold.hpp
+++ b/include/plugins/threshold.hpp
@@ -584,8 +584,8 @@ Image *djvu_threshold(const T& image, const double smoothness,
 
   for (size_t r = 0; r < image.nrows(); ++r) {
     for (size_t c = 0; c < image.ncols(); ++c) {
-      double c_frac = (double)c / min_block_size;
-      double r_frac = (double)r / min_block_size;
+      double c_frac = std::min((double)c / min_block_size,floor((double)(image.ncols() - 1)/min_block_size));
+      double r_frac = std::min((double)r / min_block_size,floor((double)(image.nrows() - 1)/min_block_size));
       RGBPixel fg = fg_acc(fg_image.upperLeft(), c_frac, r_frac); 
       RGBPixel bg = bg_acc(bg_image.upperLeft(), c_frac, r_frac);
       double fg_dist = djvu_distance(image.get(Point(c, r)), fg);

However, I'm not sure whether the bilinear interpolation is set up exactly right; that still has to be figured out.
There is a newer version of vigra, but it does not solve this issue, and the comments in vigra suggest the issue was fixed there before, so I guess something is wrong with the way Gamera prepares this interpolation.
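
The effect of the cap in the patch can be seen in isolation with a few numbers. The sketch below reproduces only the arithmetic of the two changed lines; it makes no assumption about how fg_image and bg_image are sized, and the function name clamped_frac is hypothetical.

```python
import math


def clamped_frac(index, extent, min_block_size):
    """Fractional block coordinate, capped as in the patch above.

    index is a row or column, extent is nrows or ncols.
    """
    limit = math.floor(float(extent - 1) / min_block_size)
    return min(float(index) / min_block_size, limit)


# With 10 columns and a block size of 4, the last column would map to
# 9 / 4 = 2.25 without the cap; with the cap it stays at 2.
for c in range(6, 10):
    print(c, float(c) / 4, clamped_frac(c, 10, 4))
```

Only coordinates past the last whole block index are changed, which is consistent with the fix affecting just the bottom and right edges of the stencil.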

Threshold size method?

❓ Maybe add size method?

Is <- thumbnail(I, WxH -> RxR) //small image
Ib <- bicubic(Is, RxR -> WxH) //upscale small image
Ith <- (sign(I - Ib + delta) < 0) ? 0 : 255 //threshold
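
A rough pure-Python sketch of the three steps above, for a grayscale image stored as a list of rows. It is an interpretation, not an existing Gamera or didjvu method: the thumbnail is a plain block average, bilinear interpolation stands in for the bicubic upscale, and the names shrink, enlarge and threshold_by_size are invented here.

```python
def shrink(img, r):
    """Is: block-average the image down to r x r."""
    h, w = len(img), len(img[0])
    small = []
    for j in range(r):
        y0 = j * h // r
        y1 = max((j + 1) * h // r, y0 + 1)
        row = []
        for i in range(r):
            x0 = i * w // r
            x1 = max((i + 1) * w // r, x0 + 1)
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / float(len(block)))
        small.append(row)
    return small


def enlarge(small, w, h):
    """Ib: bilinear upscale of an r x r image to w x h (stand-in for bicubic)."""
    r = len(small)

    def locate(pos, size):
        # Map a pixel centre to small-image coordinates, clamped to the grid.
        f = min(max((pos + 0.5) * r / size - 0.5, 0.0), r - 1.0)
        lo = int(f)
        return lo, min(lo + 1, r - 1), f - lo

    out = []
    for y in range(h):
        y0, y1, ty = locate(y, h)
        row = []
        for x in range(w):
            x0, x1, tx = locate(x, w)
            top = small[y0][x0] * (1 - tx) + small[y0][x1] * tx
            bottom = small[y1][x0] * (1 - tx) + small[y1][x1] * tx
            row.append(top * (1 - ty) + bottom * ty)
        out.append(row)
    return out


def threshold_by_size(img, r, delta):
    """Ith: 0 where I - Ib + delta < 0, otherwise 255."""
    h, w = len(img), len(img[0])
    background = enlarge(shrink(img, r), w, h)
    return [[0 if img[y][x] - background[y][x] + delta < 0 else 255
             for x in range(w)] for y in range(h)]
```

A smaller r gives a smoother background estimate, and delta sets how much darker than that background a pixel must be before it is marked black.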

🔗 See also jwilk/didjvu#15
