
jbig2-imageio's Issues

Memory Leak in SoftReferenceCache

Hello,

I noticed a memory leak in SoftReferenceCache. The cache is implemented as a HashMap where the key is an arbitrary object and the value is wrapped with a SoftReference. While the values eventually get reclaimed by the garbage collector, there is no mechanism to ever clear the keys of the HashMap. Thus, the keys of the cache grow without bound and may eventually exhaust available heap space.

JBIG2ImageReader uses this cache with a JBIG2Page as the key. Each JBIG2Page object occupies about 20 kB of memory because it contains a 20 kB javax.imageio.stream.MemoryCacheImageInputStream. This accumulates quickly when rendering PDFs with JBIG2 images (50k images ~ 1 GB of heap space).

As a workaround, I'm periodically calling cache.clear().

Suggestions:

  1. An option to disable this cache without having to override via ServiceLoader.
  2. Implement a size or time-based eviction strategy to purge old cached entries.
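A minimal sketch of the size-based eviction idea from suggestion 2. The class name BoundedSoftCache and its methods are hypothetical and not part of jbig2-imageio; the sketch only shows how a LinkedHashMap in access order can cap the number of keys, and how a stale key can be dropped once its soft value has been collected.

```java
import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical size-bounded cache; NOT the plugin's SoftReferenceCache.
final class BoundedSoftCache<K, V> {
  private final Map<K, SoftReference<V>> map;

  BoundedSoftCache(final int maxEntries) {
    // accessOrder = true: iteration order runs from least to most recently used.
    this.map = new LinkedHashMap<K, SoftReference<V>>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<K, SoftReference<V>> eldest) {
        // Evict the least recently used entry once the limit is exceeded.
        return size() > maxEntries;
      }
    };
  }

  synchronized void put(K key, V value) {
    map.put(key, new SoftReference<>(value));
  }

  synchronized V get(K key) {
    SoftReference<V> ref = map.get(key);
    if (ref == null) {
      return null;
    }
    V value = ref.get();
    if (value == null) {
      // The value was reclaimed by the GC: remove the key as well,
      // so that keys cannot pile up behind cleared references.
      map.remove(key);
    }
    return value;
  }

  synchronized int size() {
    return map.size();
  }
}
```

With a limit of 2, putting three entries leaves only the two most recent keys in the map, so the key set can no longer grow without bound.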

IP vetting for included reference data

The following files come from the official reference suite. The suite is no longer available from the original source, but it has been mirrored here for the jbig2dec library.

  • 042*.jb2 is an excerpted page from a CCITT specification. It is used as an example in several places, e.g. https://books.google.de/books?id=ptzgBwAAQBAJ&lpg=PA129&ots=hXbW_FaEVw&hl=de&pg=PA129#v=onepage&q&f=false
  • amb.bmp is a dithered representation of a promotional photo of Ally McBeal (Calista Flockhart)
  • 002.jb2 is the first page of US Patent 6,122,289
  • 003.jb2 seems to be from a report concerning the Lewinsky scandal
  • 004.jb2 and 005.jb2 are excerpted pages from "Libertarianism: A Primer" by David Boaz, a copyrighted work from 1997
  • 006.jb2 is a page from an SEC registration
  • 007.jb2 see 042*.jb2 above

These sample bitstreams are referenced in many places, e.g. here, but not all of those places include the actual files. jbig2dec is licensed under the GPL, but I doubt that this applies to the sample images as well.

The following files/bitstreams are reproduced as hex-dumps within the T.88 specification document

  • sampledata*.jb2 contains minimal sample bitstreams for various segment types

The heritage of the following files is known, but the copyright status is not clear

newlines printed to stdout

In the 2.0 version, newlines are printed to stdout. I went through the repository and it is HuffmanTable.java, line 64. The effect can be reproduced with the code and the files from #21.

Include licensing information about T-REC-T.88 bitstreams

The code base includes sample bitstreams that have been transcribed from the T-REC-T.88 specification. The intellectual property is wholly owned by the International Telecommunication Union. The ITU has kindly permitted us to use the bitstreams, as long as the following information is included with the source code.

The compliance checking files provided with this package contain information which has been extracted from Recommendation ITU-T T.88 "Information technology – Lossy/lossless coding of bi-level images" (2000/02) of the International Telecommunication Union (“ITU Information”), as found in http://www.itu.int/rec/T-REC-T.88-200002-I/en.

The extraction and use of ITU Information has been made under license from International Telecommunication Union (“ITU”), which owns all property rights (including intellectual property rights) to Recommendation ITU-T T.88 (2000/02). This ITU Information is made available to everyone for free and may be used for non-commercial purposes; for any other use please contact ITU at [email protected]. The sole responsibility for extracting the ITU Information and the responsibility for any errors or deficiencies in the package lies exclusively with [creator of package]. ITU is not involved in the development of the package or the extraction and use of ITU Information contained therein.

What are the *.jb2 files?

Hello,

I want to package this library for Debian because I need it as a dependency.

Can you provide the source for these files too?

src/test/resources/images/

Threading-Problem

Hello,

I have a problem with the JBIG2 plugin when using multiple threads.

When I used it with 5 threads, all five threads were blocked after about one hour. All threads were locked in HashMap.getEntry()

Regards

Felix
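Threads stuck in HashMap.getEntry() are a typical symptom of a plain HashMap being modified by several threads without synchronization. The report does not say which map is involved, so the following is only a sketch under that assumption. SharedPageCache and its methods are hypothetical names; the point is that a ConcurrentHashMap (or one reader instance per thread) avoids the shared unsynchronized state.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a cache shared between decoding threads.
final class SharedPageCache {
  // ConcurrentHashMap tolerates concurrent reads and writes; a plain HashMap does not.
  private final Map<Integer, byte[]> pages = new ConcurrentHashMap<>();

  byte[] getOrDecode(int pageNumber) {
    // The lambda stands in for the actual (expensive) decoding step.
    return pages.computeIfAbsent(pageNumber, n -> new byte[16]);
  }

  int size() {
    return pages.size();
  }

  // Hammers one cache instance from several threads and returns the final size.
  static int demo(int threadCount, int pageCount) {
    SharedPageCache cache = new SharedPageCache();
    Thread[] workers = new Thread[threadCount];
    for (int t = 0; t < threadCount; t++) {
      workers[t] = new Thread(() -> {
        for (int p = 0; p < pageCount; p++) {
          cache.getOrDecode(p);
        }
      });
      workers[t].start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return cache.size();
  }
}
```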

google code repository broken

https://code.google.com/archive/p/jbig2-imageio/

Maven coordinates will work and need not to be changed

No, they don't work anymore, the repository is 404 now, so they must be changed: use the general repository, and version 1.6.5. Apparently older versions don't work anymore. (I don't mind this, but it broke our build until I understood what happened).

RuntimeException: Can't instantiate segment class

Exception in thread "main" java.lang.RuntimeException: Can't instantiate segment class
	at com.levigo.jbig2.SegmentHeader.getSegmentData(SegmentHeader.java:405)
	at com.levigo.jbig2.JBIG2Page.createNormalPage(JBIG2Page.java:182)
	at com.levigo.jbig2.JBIG2Page.createPage(JBIG2Page.java:154)
	at com.levigo.jbig2.JBIG2Page.composePageBitmap(JBIG2Page.java:145)
	at com.levigo.jbig2.JBIG2Page.getBitmap(JBIG2Page.java:125)
	at com.levigo.jbig2.JBIG2ImageReader.read(JBIG2ImageReader.java:223)
	at javaapplicationjbig2test.JavaApplicationJBig2Test.test2(JavaApplicationJBig2Test.java:84)
	at javaapplicationjbig2test.JavaApplicationJBig2Test.main(JavaApplicationJBig2Test.java:52)
Caused by: java.lang.ClassCastException: com.levigo.jbig2.decoder.huffman.ValueNode cannot be cast to com.levigo.jbig2.decoder.huffman.InternalNode
	at com.levigo.jbig2.decoder.huffman.InternalNode.append(InternalNode.java:94)
	at com.levigo.jbig2.decoder.huffman.InternalNode.append(InternalNode.java:94)
	at com.levigo.jbig2.decoder.huffman.InternalNode.append(InternalNode.java:94)
	at com.levigo.jbig2.decoder.huffman.InternalNode.append(InternalNode.java:94)
	at com.levigo.jbig2.decoder.huffman.InternalNode.append(InternalNode.java:94)
	at com.levigo.jbig2.decoder.huffman.InternalNode.append(InternalNode.java:94)
	at com.levigo.jbig2.decoder.huffman.HuffmanTable.initTree(HuffmanTable.java:68)
	at com.levigo.jbig2.decoder.huffman.FixedSizeTable.<init>(FixedSizeTable.java:30)
	at com.levigo.jbig2.segments.TextRegion.symbolIDCodeLengths(TextRegion.java:892)
	at com.levigo.jbig2.segments.TextRegion.computeSymbolCodeLength(TextRegion.java:255)
	at com.levigo.jbig2.segments.TextRegion.parseHeader(TextRegion.java:153)
	at com.levigo.jbig2.segments.TextRegion.init(TextRegion.java:901)
	at com.levigo.jbig2.SegmentHeader.getSegmentData(SegmentHeader.java:400)
	... 7 more

jbig2bug.zip

my code:

JBIG2ImageReader reader = (JBIG2ImageReader) ImageIO.getImageReadersByFormatName("JBIG2").next();
JBIG2Globals globals = reader.processGlobals(ImageIO.createImageInputStream(new File(dir,"globals.bin")));
reader.setGlobals(globals);
reader.setInput(ImageIO.createImageInputStream(new File(dir,"img.jbig2")));
BufferedImage image = reader.read(0, reader.getDefaultReadParam());

I'm using your library as part of the Apache PDFBox project. The two data segments come from the PDF, which displays in Adobe Reader, so I'd assume that the image is valid.

Missing license headers

The following files don't have a license header:
pom.xml
release-notes.md
src/main/resources/META-INF/services/org.apache.pdfbox.jbig2.util.cache.CacheBridge
src/main/resources/META-INF/services/org.apache.pdfbox.jbig2.util.log.LoggerBridge
src/main/resources/META-INF/services/javax.imageio.spi.ImageReaderSpi
src/test/resources/META-INF/services/org.apache.pdfbox.jbig2.util.TestService
README.md
.travis.yml

IP vetting for transition to ASL 2.0

In order to achieve #30, we need to vet the IP to this component. The vast majority of the component's code base is solely owned by levigo holding GmbH. However, there is one known contributor with an already merged change request and two pending pull requests. We need permission by those contributors for a transition to the ASL.

The following contributions from outside levigo need to be considered:

So, to make the questions explicit and hopefully cause GitHub to notify the contributors:

  • @janpe2 would you still be willing to provide your pending pull requests under the new ASL 2.0 license? If so, please let us know in a comment.
  • @dbdr would you be willing to give us permission to release your changes under the ASL 2.0? If so, please let us know in a comment.

Cannot read JBIG2 image: jbig2-imageio is not installed

I downloaded the newest PDFBox command line tools, pdfbox-app-2.0.7.jar, from the official website and levigo-jbig2-imageio-1.6.5.jar from search.maven.org.

When I use the following command to extract images from a PDF document that contains JBIG2 images, the PDFBox tool reports an error.

java -cp levigo-jbig2-imageio-1.6.5.jar -jar pdfbox-app-2.0.7.jar ExtractImages my.pdf

The error messages look like this:

org.apache.pdfbox.contentstream.PDFStreamEngine operatorException
Cannot read JBIG2 image: jbig2-imageio is not installed
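One likely explanation, not confirmed in the report: the java launcher ignores -cp when -jar is given, so the plugin JAR never reaches the classpath. Putting both JARs on -cp and naming the main class explicitly avoids that. A small check like the following sketch (the class name Jbig2PluginCheck is made up for illustration) shows whether ImageIO actually sees a JBIG2 reader on the current classpath.

```java
import javax.imageio.ImageIO;

// Hypothetical diagnostic helper; not part of PDFBox or jbig2-imageio.
final class Jbig2PluginCheck {
  // Returns true if ImageIO has at least one reader registered for the format name.
  static boolean isReaderAvailable(String formatName) {
    return ImageIO.getImageReadersByFormatName(formatName).hasNext();
  }

  public static void main(String[] args) {
    System.out.println("JBIG2 reader registered: " + isReaderAvailable("JBIG2"));
  }
}
```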

Problems in standard Huffman tables

jbig2_2.zip

If you open the image PDFjs_8145_p55.jb2, you get:

ArrayIndexOutOfBoundsException: -2147483604
at Bitmap.getByte(Bitmap.java:120)

The image unitized_page_ii.jb2 should contain some text but the decoding result is an empty page.

I have identified an issue in StandardTables.java that causes these problems. I'll create a pull request.

Trying to copy a non-existing line in generic region decoding

If the typical prediction feature is enabled (TPGDON is 1) and LTP is set to 1, the algorithm defined in ITU-T Rec. T.88, 6.2.5.7 "Decoding the bitmap" specifies that the current row should receive all pixels of the row immediately above. That won't work if the current row is the first (top) row of the page.
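One way to handle the top row is sketched below, assuming that the missing row above row 0 is treated as all zeros, in line with the usual convention for pixels outside the bitmap. This assumption and the names TypicalPredictionSketch and copyRowAbove are illustrative and are not taken from the decoder's source.

```java
import java.util.Arrays;

// Hypothetical helper illustrating the LTP = 1 case; not the plugin's code.
final class TypicalPredictionSketch {
  /**
   * Copies row y-1 into row y of a packed bitmap.
   * Row 0 has no predecessor and is filled with zeros instead (assumption).
   */
  static void copyRowAbove(byte[] bitmap, int rowStride, int y) {
    int dst = y * rowStride;
    if (y == 0) {
      // No row above: treat it as a virtual all-zero row.
      Arrays.fill(bitmap, dst, dst + rowStride, (byte) 0);
    } else {
      System.arraycopy(bitmap, dst - rowStride, bitmap, dst, rowStride);
    }
  }
}
```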

Java 9 IllegalArgumentException

Caused by: java.lang.IllegalArgumentException: com.levigo.jbig2.util.log.LoggerBridge is not an ImageIO SPI class
at javax.imageio.spi.ServiceRegistry.checkClassAllowed(java.desktop@9-ea/ServiceRegistry.java:733)
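The message suggests that javax.imageio.spi.ServiceRegistry was used to look up a service that is not an ImageIO SPI; from Java 9 on, that registry accepts only ImageIO SPI types. A sketch of the same lookup with java.util.ServiceLoader follows. DemoLoggerBridge and BridgeLookup are hypothetical stand-ins, not the plugin's actual classes.

```java
import java.util.Iterator;
import java.util.ServiceLoader;

// Hypothetical bridge interface, standing in for the plugin's LoggerBridge.
interface DemoLoggerBridge {
  String name();
}

final class BridgeLookup {
  // Returns the first provider found via META-INF/services, or the fallback if none exists.
  static <T> T loadFirst(Class<T> type, T fallback) {
    Iterator<T> providers = ServiceLoader.load(type).iterator();
    return providers.hasNext() ? providers.next() : fallback;
  }
}
```

ServiceLoader reads the same META-INF/services descriptor files, so existing provider registrations should not need to change.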

Add CI run with Java 9

As soon as Travis CI supports Java 9 builds, the descriptor should be updated to also use Java 9 in the matrix definition.

More pom.xml adjustments in preparation for PDFBox

On 19.08.2017 at 14:30, Tilman Hausherr wrote:

About the pom.xml: you are using

<groupId>org.apache.pdfbox.jbig2</groupId>
<artifactId>pdfbox-jbig2-imageio</artifactId>
<version>3.0-SNAPSHOT</version>

Shouldn't this be more like this?

<groupId>org.apache.pdfbox</groupId>
<artifactId>jbig2-imageio</artifactId>
<version>3.0.0-SNAPSHOT</version>

+1, there is one superfluous "pdfbox". Besides some other minor things that need adjusting, we have to discuss how the plugin should be integrated.

Slow performance of the decoder

Hi,

I am trying to decode a JBIG2 file to a PNG file, but the code is slow, especially while reading the image.
Could you suggest performance improvements to make execution faster? Currently it takes 485 ms, and I want to reduce it to under 150 ms. I am posting the code that I use to decode.

Iterator<?> readers = ImageIO.getImageReadersByFormatName("JBIG2"); //takes about 80ms

ImageReader reader = (ImageReader) readers.next();
Object source = fis; //reading the jbig2 file from the disk using FileInputStream fis
ImageInputStream iis = ImageIO.createImageInputStream(source);
reader.setInput(iis, true);
ImageReadParam param = reader.getDefaultReadParam();

Image image = reader.read(0, param); // takes about 150ms

BufferedImage bufferedImage = new BufferedImage(image.getWidth(null), image.getHeight(null), BufferedImage.TYPE_BYTE_BINARY);

Graphics2D g2 = bufferedImage.createGraphics();
g2.drawImage(image, null, null);

Eagerly waiting for your reply.
Thank You
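Two things in the snippet above can probably be trimmed, as a sketch rather than a measured result. The reader lookup can be done once and the reader reused, and ImageReader.read already returns a BufferedImage, so the extra Graphics2D copy is only needed if a specific image type is required. ReusableDecoder is a hypothetical helper; it takes the format name as a parameter so that it works with any registered ImageIO reader.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

// Hypothetical helper: looks the reader up once and reuses it for every image.
// ImageReader instances are not thread-safe, so use one decoder per thread.
final class ReusableDecoder {
  private final ImageReader reader;

  ReusableDecoder(String formatName) {
    ImageIO.setUseCache(false); // cache streams in memory instead of temp files
    Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(formatName);
    if (!readers.hasNext()) {
      throw new IllegalStateException("No ImageIO reader for " + formatName);
    }
    this.reader = readers.next();
  }

  BufferedImage decode(byte[] data) throws IOException {
    try (ImageInputStream iis = ImageIO.createImageInputStream(new ByteArrayInputStream(data))) {
      reader.setInput(iis, true);
      // read() already returns a BufferedImage; no extra copy is made here.
      return reader.read(0, reader.getDefaultReadParam());
    } finally {
      reader.reset(); // drop the reference to the closed stream before the next call
    }
  }
}
```

Usage would look like new ReusableDecoder("JBIG2").decode(bytes), with the decoder instance kept around between images.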

Replace deprecated `STANDARD_INPUT_TYPE`

The constant STANDARD_INPUT_TYPE in ImageReaderSpi has been deprecated. Replace with a self-created array as suggested by javadoc: { ImageInputStream.class }
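A minimal sketch of the replacement. The holder class InputTypes and the constant name INPUT_TYPES are made up for illustration; in the plugin the array would be passed to the ImageReaderSpi constructor in place of the deprecated constant.

```java
import javax.imageio.stream.ImageInputStream;

// Hypothetical holder class for the self-created input type array.
final class InputTypes {
  // Equivalent of the deprecated ImageReaderSpi.STANDARD_INPUT_TYPE.
  static final Class<?>[] INPUT_TYPES = { ImageInputStream.class };
}
```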

Transition project to Apache PDFBox

We are currently in talks with the Apache PDFBox project to donate the JBIG2 ImageIO plugin to the ASF. This will probably entail:

  • Changing the license and/or re-licensing the component under the ASL 2.0
  • Updating package names and maven coordinates to reflect the new home
  • levigo continuing to offer support contracts for the component

build tests fail if project path has a space

I wanted to do a local build to test the snapshot with my test sets from work and from home. It worked fine at work but not at home. At home the tests fail at BitmapsChecksumTest.test() at this line

final InputStream inputStream = new FileInputStream(new File(imageUrl.getPath()));

java.io.FileNotFoundException: .....%20....\LevigoJBig2ImageIO\target\test-classes\images\042_1.jb2 (The system cannot find the path specified)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at com.levigo.jbig2.image.BitmapsChecksumTest.test(BitmapsChecksumTest.java:127)

The reason is that my path has a space. This space is converted to "%20" which doesn't work.

Solutions:

final InputStream inputStream = new FileInputStream(new File(imageUrl.toURI()));

even better:

final InputStream inputStream = imageUrl.openStream();

ultimate:

final InputStream inputStream = JBIG2ImageReaderDemo.class.getResourceAsStream(resourcePath);
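The difference between the failing call and the first fix can be reproduced in isolation. The sketch below uses a made-up directory name ("my dir") and a hypothetical helper class UrlPathDemo; it shows that URL.getPath() keeps the percent-encoding while going through toURI() decodes it.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical demo class; the directory "my dir" does not need to exist.
final class UrlPathDemo {
  // Builds a file: URL for a path that contains a space.
  static URL sampleUrl() {
    try {
      return new File("my dir", "042_1.jb2").toURI().toURL();
    } catch (MalformedURLException e) {
      throw new IllegalStateException(e);
    }
  }

  // What the failing test did: the space stays encoded as %20.
  static String rawPath(URL url) {
    return url.getPath();
  }

  // Going through a URI decodes %20 back into a space.
  static String decodedPath(URL url) {
    try {
      return new File(url.toURI()).getPath();
    } catch (URISyntaxException e) {
      throw new IllegalStateException(e);
    }
  }
}
```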

Huffman user tables in text regions

Huffman user tables do not work in TextRegion segments. Examples:
jbig2.zip

If you open the JBIG2 files, you get:

ClassCastException: SymbolDictionary cannot be cast to Table
at TextRegion.getUserTable(TextRegion.java:826)

I'll create a pull request.

Maven Central

Hello guys!

Thank you for this lib, we are using it together with tess4j.
I would like to help you get jbig2-imageio onto Maven Central.

Let me know if you are up for it.

Best regards
