levigo / jbig2-imageio
A Java ImageIO plugin for the JBIG2 bi-level image format
License: Apache License 2.0
Hello,
I noticed a memory leak in SoftReferenceCache. The cache is implemented as a HashMap where the key is an arbitrary object and the value is wrapped with a SoftReference. While the values eventually get reclaimed by the garbage collector, there is no mechanism to ever clear the keys of the HashMap. Thus, the keys of the cache grow without bound and may eventually exhaust available heap space.
JBIG2ImageReader uses this cache with a JBIG2Page as the key. Each JBIG2Page object occupies about 20 kB of memory because it contains a 20 kB javax.imageio.stream.MemoryCacheImageInputStream. This can accumulate quickly when rendering PDFs that contain JBIG2 images (50k images ~ 1 GB heap space).
As a workaround, I'm periodically calling cache.clear().
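One way to bound the key set is to register each SoftReference with a ReferenceQueue and drop the matching map entry once the collector has cleared the value. The following is a minimal sketch of that idea, not the library's SoftReferenceCache; the class name PurgingSoftCache and its methods are hypothetical.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cache for illustration; not part of jbig2-imageio. */
final class PurgingSoftCache<K, V> {

  /** A SoftReference that remembers its key so the stale entry can be removed. */
  private static final class KeyedRef<K, V> extends SoftReference<V> {
    final K key;

    KeyedRef(K key, V value, ReferenceQueue<V> queue) {
      super(value, queue);
      this.key = key;
    }
  }

  private final Map<K, KeyedRef<K, V>> map = new HashMap<>();
  private final ReferenceQueue<V> queue = new ReferenceQueue<>();

  synchronized V put(K key, V value) {
    purge();
    KeyedRef<K, V> old = map.put(key, new KeyedRef<>(key, value, queue));
    return old == null ? null : old.get();
  }

  synchronized V get(K key) {
    purge();
    KeyedRef<K, V> ref = map.get(key);
    return ref == null ? null : ref.get();
  }

  synchronized int size() {
    purge();
    return map.size();
  }

  /** Drops every entry whose value has been cleared by the garbage collector. */
  private void purge() {
    Reference<? extends V> ref;
    while ((ref = queue.poll()) != null) {
      @SuppressWarnings("unchecked")
      KeyedRef<K, V> stale = (KeyedRef<K, V>) ref;
      // Remove only if the key still maps to this exact (cleared) reference.
      map.remove(stale.key, stale);
    }
  }
}
```

With this approach the map shrinks as values are reclaimed, so the keys no longer accumulate; the methods are synchronized because the purge mutates the map on every access.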
Suggestions:
The following files come from the official reference suite. It is currently no longer available from the original source, but has been mirrored here for the jbig2dec library
These sample bitstreams are referenced in many places, e.g. here but not all include the actual files. Jbig2dec is licensed under the GPL but I doubt that this applies to the sample images as well.
The following files/bitstreams are reproduced as hex-dumps within the T.88 specification document
The heritage of the following files is known, but the copyright status is not clear
In the 2.0 version, newlines are printed to stdout. I went through the repository and it is HuffmanTable.java, line 64. The effect can be reproduced with the code and the files from #21.
The code base includes sample bitstreams that have been transcribed from the T-REC-T.88 specification. The intellectual property is wholly owned by the International Telecommunication Union. The ITU has kindly permitted us to use the bitstreams, as long as the following information is included with the source code.
The compliance checking files provided with this package contain information which has been extracted from Recommendation ITU-T T.88 "Information technology – Lossy/lossless coding of bi-level images" (2000/02) of the International Telecommunication Union (“ITU Information”), as found in http://www.itu.int/rec/T-REC-T.88-200002-I/en.
The extraction and use of ITU Information has been made under license from International Telecommunication Union (“ITU”), which owns all property rights (including intellectual property rights) to Recommendation ITU-T T.88 (2000/02). This ITU Information is made available to everyone for free and may be used for non-commercial purposes; for any other use please contact ITU at [email protected]. The sole responsibility for extracting the ITU Information and the responsibility for any errors or deficiencies in the package lies exclusively with [creator of package]. ITU is not involved in the development of the package or the extraction and use of ITU Information contained therein.
Hello,
I want to package this library for Debian because I need it as a dependency.
Can you provide the source for these files too?
src/test/resources/images/
Hello,
I have a problem with the JBIG2 plugin when using multiple threads.
When I used it with 5 threads, all five threads were blocked after about one hour. All threads were stuck in HashMap.getEntry().
Regards
Felix
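Threads that spin forever inside HashMap.getEntry() are a typical symptom of a plain HashMap being modified from several threads without synchronization. The report does not say which map is involved, so the following is only a sketch under that assumption: a hypothetical shared cache (SharedCacheDemo, lookup and runConcurrently are made-up names) backed by ConcurrentHashMap, which is safe for concurrent readers and writers.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical shared cache for illustration; not part of jbig2-imageio. */
final class SharedCacheDemo {

  // ConcurrentHashMap tolerates concurrent reads and writes, unlike HashMap.
  private final Map<Integer, String> cache = new ConcurrentHashMap<>();

  String lookup(int key) {
    // computeIfAbsent is atomic per key on ConcurrentHashMap.
    return cache.computeIfAbsent(key, k -> "page-" + k);
  }

  /** Fills the cache from several threads and returns the number of entries. */
  static int runConcurrently(int threads, int keysPerThread) {
    SharedCacheDemo demo = new SharedCacheDemo();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      pool.execute(() -> {
        for (int i = 0; i < keysPerThread; i++) {
          demo.lookup(i);
        }
      });
    }
    pool.shutdown();
    try {
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("workers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return demo.cache.size();
  }
}
```

An alternative that avoids shared state altogether is to give each thread its own reader instance, at the cost of losing cache hits across threads.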
https://code.google.com/archive/p/jbig2-imageio/
Maven coordinates will work and need not to be changed
No, they don't work anymore, the repository is 404 now, so they must be changed: use the general repository, and version 1.6.5. Apparently older versions don't work anymore. (I don't mind this, but it broke our build until I understood what happened).
Exception in thread "main" java.lang.RuntimeException: Can't instantiate segment class
at com.levigo.jbig2.SegmentHeader.getSegmentData(SegmentHeader.java:405)
at com.levigo.jbig2.JBIG2Page.createNormalPage(JBIG2Page.java:182)
at com.levigo.jbig2.JBIG2Page.createPage(JBIG2Page.java:154)
at com.levigo.jbig2.JBIG2Page.composePageBitmap(JBIG2Page.java:145)
at com.levigo.jbig2.JBIG2Page.getBitmap(JBIG2Page.java:125)
at com.levigo.jbig2.JBIG2ImageReader.read(JBIG2ImageReader.java:223)
at javaapplicationjbig2test.JavaApplicationJBig2Test.test2(JavaApplicationJBig2Test.java:84)
at javaapplicationjbig2test.JavaApplicationJBig2Test.main(JavaApplicationJBig2Test.java:52)
Caused by: java.lang.ClassCastException: com.levigo.jbig2.decoder.huffman.ValueNode cannot be cast to com.levigo.jbig2.decoder.huffman.InternalNode
at com.levigo.jbig2.decoder.huffman.InternalNode.append(InternalNode.java:94)
at com.levigo.jbig2.decoder.huffman.InternalNode.append(InternalNode.java:94)
at com.levigo.jbig2.decoder.huffman.InternalNode.append(InternalNode.java:94)
at com.levigo.jbig2.decoder.huffman.InternalNode.append(InternalNode.java:94)
at com.levigo.jbig2.decoder.huffman.InternalNode.append(InternalNode.java:94)
at com.levigo.jbig2.decoder.huffman.InternalNode.append(InternalNode.java:94)
at com.levigo.jbig2.decoder.huffman.HuffmanTable.initTree(HuffmanTable.java:68)
at com.levigo.jbig2.decoder.huffman.FixedSizeTable.<init>(FixedSizeTable.java:30)
at com.levigo.jbig2.segments.TextRegion.symbolIDCodeLengths(TextRegion.java:892)
at com.levigo.jbig2.segments.TextRegion.computeSymbolCodeLength(TextRegion.java:255)
at com.levigo.jbig2.segments.TextRegion.parseHeader(TextRegion.java:153)
at com.levigo.jbig2.segments.TextRegion.init(TextRegion.java:901)
at com.levigo.jbig2.SegmentHeader.getSegmentData(SegmentHeader.java:400)
... 7 more
my code:
JBIG2ImageReader reader = (JBIG2ImageReader) ImageIO.getImageReadersByFormatName("JBIG2").next();
JBIG2Globals globals = reader.processGlobals(ImageIO.createImageInputStream(new File(dir,"globals.bin")));
reader.setGlobals(globals);
reader.setInput(ImageIO.createImageInputStream(new File(dir,"img.jbig2")));
BufferedImage image = reader.read(0, reader.getDefaultReadParam());
I'm using your library as part of the Apache PDFBox project. The two data segments come from the PDF, which displays in Adobe Reader, so I'd assume that the image is valid.
In addition to my question posted on Software Recommendations.SE: are there any plans to develop a JBIG2 image writer that would allow encoding JBIG2 images using ImageIO/jbig2-imageio?
The following files don't have a license header:
pom.xml
release-notes.md
src/main/resources/META-INF/services/org.apache.pdfbox.jbig2.util.cache.CacheBridge
src/main/resources/META-INF/services/org.apache.pdfbox.jbig2.util.log.LoggerBridge
src/main/resources/META-INF/services/javax.imageio.spi.ImageReaderSpi
src/test/resources/META-INF/services/org.apache.pdfbox.jbig2.util.TestService
README.md
.travis.yml
In order to achieve #30, we need to vet the IP to this component. The vast majority of the component's code base is solely owned by levigo holding GmbH. However, there is one known contributor with an already merged change request and two pending pull requests. We need permission by those contributors for a transition to the ASL.
The following contributions from outside levigo need to be considered:
So, to make the questions explicit and hopefully cause GitHub to notify the contributors:
I downloaded the newest PDFBox command line tools, pdfbox-app-2.0.7.jar, from the official website and levigo-jbig2-imageio-1.6.5.jar from search.maven.org.
When I use the following command to extract images from a PDF document that includes JBIG2 images, the PDFBox tool reports an error.
java -cp levigo-jbig2-imageio-1.6.5.jar -jar pdfbox-app-2.0.7.jar ExtractImages my.pdf
The error messages look like this:
org.apache.pdfbox.contentstream.PDFStreamEngine operatorException
Cannot read JBIG2 image: jbig2-imageio is not installed
If you open the image PDFjs_8145_p55.jb2, you get:
ArrayIndexOutOfBoundsException: -2147483604
at Bitmap.getByte(Bitmap.java:120)
The image unitized_page_ii.jb2 should contain some text but the decoding result is an empty page.
I have identified an issue in StandardTables.java that causes these problems. I'll create a pull request.
If the typical prediction feature is enabled (TPGDON is 1) and LTP is set to 1, the algorithm defined in ITU-T Rec. T.88, 6.2.5.7 "Decoding the bitmap" specifies that the current row should receive all pixels of the row immediately above. That won't work if the current row is the first (top) row of the page.
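One possible way to handle the top row is to treat the missing row above as all zero, in line with the usual convention that pixels outside the bitmap read as 0. The sketch below illustrates that assumption only; it is not the plugin's code, and the names TypicalPredictionSketch and copyRowAbove as well as the byte-packed row layout are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of jbig2-imageio. */
final class TypicalPredictionSketch {

  /**
   * Applies the "typical" row rule to row y of a byte-packed bitmap:
   * the row becomes a copy of the row above. For y == 0 there is no row
   * above, so it is assumed (not confirmed by the report) to be all zero.
   */
  static void copyRowAbove(byte[] bitmap, int rowStride, int y) {
    int dst = y * rowStride;
    if (y == 0) {
      // Assumption: pixels outside the bitmap count as 0.
      Arrays.fill(bitmap, dst, dst + rowStride, (byte) 0);
    } else {
      System.arraycopy(bitmap, dst - rowStride, bitmap, dst, rowStride);
    }
  }
}
```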
Update LICENSE-HEADER.txt and run Maven with license:format.
Maven specifies the authors in pom's section. All other places with this kind of information should be finally deleted according to SSOT.
Caused by: java.lang.IllegalArgumentException: com.levigo.jbig2.util.log.LoggerBridge is not an ImageIO SPI class
at javax.imageio.spi.ServiceRegistry.checkClassAllowed(java.desktop@9-ea/ServiceRegistry.java:733)
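The stack trace shows that, on Java 9, javax.imageio.spi.ServiceRegistry rejects service types that are not ImageIO SPI classes. A possible alternative for non-ImageIO services is java.util.ServiceLoader. The sketch below assumes a single-method LoggerBridge stand-in; the real interface in the plugin may look different, and BridgeLookup and find are hypothetical names.

```java
import java.util.Iterator;
import java.util.ServiceLoader;

/** Hypothetical lookup helper for illustration; not part of jbig2-imageio. */
final class BridgeLookup {

  /** Stand-in for the plugin's LoggerBridge; the real interface may differ. */
  interface LoggerBridge {
    String name();
  }

  /** Returns the first registered provider, or a built-in default if none is registered. */
  static LoggerBridge find() {
    // ServiceLoader reads META-INF/services entries and accepts any service type.
    Iterator<LoggerBridge> providers = ServiceLoader.load(LoggerBridge.class).iterator();
    if (providers.hasNext()) {
      return providers.next();
    }
    return () -> "default";
  }
}
```

The existing META-INF/services descriptor files can stay as they are, since ServiceLoader uses the same file format.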
As soon as Travis CI supports Java 9 builds, the descriptor should be updated to also use Java 9 in the matrix definition.
On 19.08.2017 at 14:30, Tilman Hausherr wrote:
About the pom.xml: you are using
groupId: org.apache.pdfbox.jbig2
artifactId: pdfbox-jbig2-imageio
version: 3.0-SNAPSHOT
Shouldn't this be more like this?
groupId: org.apache.pdfbox
artifactId: jbig2-imageio
version: 3.0.0-SNAPSHOT
+1, there is one superfluous "pdfbox". Besides some other minor things to be adjusted we have to discuss how the plugin shall be integrated.
Hi,
I am trying to decode a JBIG2 file to a PNG file, but the code is slow, especially while reading the image.
Could you suggest performance improvements so that execution is faster? Currently it takes 485 ms; I want to reduce it to under 150 ms. I am posting the code that I use to decode.
Iterator<?> readers = ImageIO.getImageReadersByFormatName("JBIG2"); //takes about 80ms
ImageReader reader = (ImageReader) readers.next();
Object source = fis; //reading the jbig2 file from the disk using FileInputStream fis
ImageInputStream iis = ImageIO.createImageInputStream(source);
reader.setInput(iis, true);
ImageReadParam param = reader.getDefaultReadParam();
Image image = reader.read(0, param); // takes about 150ms
BufferedImage bufferedImage = new BufferedImage(image.getWidth(null), image.getHeight(null), BufferedImage.TYPE_BYTE_BINARY);
Graphics2D g2 = bufferedImage.createGraphics();
g2.drawImage(image, null, null);
Eagerly waiting for your reply.
Thank You
The constant STANDARD_INPUT_TYPE in ImageReaderSpi has been deprecated. Replace it with a self-created array as suggested by the javadoc: { ImageInputStream.class }
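A minimal sketch of the replacement array follows. The holder class InputTypes and its member names are hypothetical; in the plugin the array would be passed to the ImageReaderSpi constructor in place of the deprecated constant.

```java
import javax.imageio.stream.ImageInputStream;

/** Hypothetical holder for illustration; not part of jbig2-imageio. */
final class InputTypes {

  // Equivalent of the deprecated ImageReaderSpi.STANDARD_INPUT_TYPE.
  private static final Class<?>[] INPUT_TYPES = { ImageInputStream.class };

  /** Returns a defensive copy so callers cannot alter the shared array. */
  static Class<?>[] inputTypes() {
    return INPUT_TYPES.clone();
  }
}
```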
We are currently in talks with the Apache PDFBox project to donate the JBig2 ImageIO-Plugin to the ASF. This will probably entail:
I wanted to do a local build to test the snapshot with my test sets from work and from home. It worked fine at work but not at home. At home the tests fail at BitmapsChecksumTest.test() at this line
final InputStream inputStream = new FileInputStream(new File(imageUrl.getPath()));
java.io.FileNotFoundException: .....%20....\LevigoJBig2ImageIO\target\test-classes\images\042_1.jb2 (The system cannot find the path specified)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at com.levigo.jbig2.image.BitmapsChecksumTest.test(BitmapsChecksumTest.java:127)
The reason is that my path has a space. This space is converted to "%20" which doesn't work.
Solutions:
final InputStream inputStream = new FileInputStream(new File(imageUrl.toURI()));
even better:
final InputStream inputStream = imageUrl.openStream();
ultimate:
final InputStream inputStream = JBIG2ImageReaderDemo.class.getResourceAsStream(resourcePath);
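The difference between URL.getPath() and URL.toURI() can be reproduced in isolation. The sketch below creates a temporary directory whose name contains a space and checks both ways of turning the URL back into a File; the class name SpaceInPathDemo and the method check are hypothetical, and the file name is borrowed from the report only for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Files;

/** Hypothetical demo for illustration; not part of jbig2-imageio. */
final class SpaceInPathDemo {

  /** Returns { found via getPath(), found via toURI() } for a file below a directory with a space. */
  static boolean[] check() {
    try {
      File dir = Files.createTempDirectory("with space").toFile();
      File file = new File(dir, "042_1.jb2");
      Files.write(file.toPath(), new byte[] {0});

      URL url = file.toURI().toURL();
      // getPath() keeps the percent-encoding, so the space stays "%20".
      boolean viaPath = new File(url.getPath()).exists();
      // toURI() lets File decode the escape sequence properly.
      boolean viaUri = new File(url.toURI()).exists();

      file.delete();
      dir.delete();
      return new boolean[] { viaPath, viaUri };
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (URISyntaxException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Opening the resource as a stream, as in the last two solutions above, sidesteps the File conversion entirely and also works when the resource is packaged inside a JAR.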
Huffman user tables do not work in TextRegion segments. Examples:
jbig2.zip
If you open the JBIG2 files, you get:
ClassCastException: SymbolDictionary cannot be cast to Table
at TextRegion.getUserTable(TextRegion.java:826)
I'll create a pull request.
Hello guys!
Thank you for this lib, we are using it together with tess4j.
I would like to help you guys, to get jbig2-imageio to maven central.
Let me know if you are up to it.
Best regards