Comments (14)
https://gist.github.com/Rockstar04/c77f9f46f15be7b156aaed9a34bb5188
from pdf2htmlex.
I have a script I use for building pdf2htmlEX that uses Poppler 0.63.0, and font forge, git branch 20170731.
I start with my fork, which is a mix of other people's patches: https://github.com/Rockstar04/pdf2htmlEX
Quick work on that alpine image! I had taken a look at reducing the image size using alpine as a base image, but ended up building an image using Debian: https://hub.docker.com/r/jgoldfar/pdf2htmlex-stable/
Looks like an interesting application you all are working on!
Thanks Rockstar04! Is your script in your fork? Would you be willing to share? I would be very grateful.
What do you mean by "and font forge, git branch 20170731"? I'm not seeing a FontForge branch with that name or date: https://github.com/fontforge/fontforge/branches/all
Awesome work Rockstar04! I have a comment and a related question.
Comment: Compiling Poppler with the -DENABLE_LIBOPENJPEG=none flag (line 56) produced poor results when converting PDFs with many layers of images. We found that some layers were missing.
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr -DENABLE_XPDF_HEADERS=ON -DENABLE_LIBOPENJPEG=none
To compile without the -DENABLE_LIBOPENJPEG flag, I needed to install the following dependencies:
apt-get install libgirepository1.0-dev
apt-get install libopenjp2-7-dev
apt-get install libgtk-3-dev
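The two variants of the build described above can be put side by side in a small sketch. The helper name `poppler_cmake_args`, the `-y` option, and the `..` source directory are assumptions made for illustration; only the package names and the cmake options come from the comment itself. The commands are printed rather than executed, so nothing is installed or configured.

```shell
#!/bin/sh
# Hypothetical helper: prints the cmake arguments quoted in the comment above.
# With no argument, the -DENABLE_LIBOPENJPEG option is left out entirely;
# with an argument (for example "none"), it is appended.
poppler_cmake_args() {
    args="-DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr -DENABLE_XPDF_HEADERS=ON"
    if [ -n "${1:-}" ]; then
        args="$args -DENABLE_LIBOPENJPEG=$1"
    fi
    printf '%s\n' "$args"
}

# Print the commands instead of running them, so the sketch has no side effects.
# The ".." source path assumes an out-of-tree build directory.
echo "apt-get install -y libgirepository1.0-dev libopenjp2-7-dev libgtk-3-dev"
echo "cmake $(poppler_cmake_args) .."
```

Calling `poppler_cmake_args none` instead reproduces the original command line with the flag included, which makes it easy to build both ways and compare the converted output.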
Question: What is the purpose of the -DENABLE_LIBOPENJPEG flag?
My assumption is that it disables Poppler's use of OpenJPEG, the library it uses to decode JPEG 2000 (JPX) images embedded in PDFs. For the steps that compile FontForge and Poppler, see their official documentation for more information about building them from source.
Thanks a lot for the script. I'll try adapting it to run on Alpine Linux so I can use that as a Docker image. Have you tried it? There's this image but it's a bit outdated (alpine 3.2).
...and here it is, in case you want it: https://hub.docker.com/r/oaeproject/oae-pdf2htmlex-docker/ It's using alpine:3.8
Source is here.
Feel free to use this as a base for someone else's work and/or include it in the wiki. Thanks a lot for your hard work!
Just an FYI: it looks like a lot of security updates have been made to Poppler since 0.63 (https://poppler.freedesktop.org/releases.html). I'm going to try getting this working on CentOS 7, building from source. I will post the steps if I get it working.
@amit777 feel free to fork my repo (alpine based) above and please report back your findings
I was able to get it compiled with Poppler 0.63 on CentOS 7, though I had to make some updates to the .sh script. I was a little too ambitious and tried to get it working with Poppler 0.72, which (I think) required me to upgrade the C++ environment to GCC 7. I made it further, but then got other compilation errors. I'm nowhere near knowledgeable enough to solve them.
Anyway, I'm seeing a bunch of forks and I'm not sure which one is the best to work off:
pdf2htmlEX/pdf2htmlEX
Rockstar
Alpine (I don't know what that is).
@amit777 Alpine is a very lightweight Linux distro, which makes it a common base for Docker images.
The Docker image I published above is built on Alpine and compiles pdf2htmlEX successfully; you can check the versions in the Dockerfile.
@brecke, I've never really worked with Docker before, but I started with your image and got it working pretty well. I noticed a small difference in library versions between your Docker dev build and my CentOS 7 build: libfontforge and cairo differ. Below is the --version output of each.
--version output on the CentOS 7 build:
pdf2htmlEX version 0.15.0
Copyright 2012-2015 Lu Wang <[email protected]> and other contributors
Libraries:
poppler 0.63.0
libfontforge 20190219
cairo 1.15.12
Default data-dir: /usr/local/share/pdf2htmlEX
Supported image format: png jpg svg
--version output on your Docker build:
pdf2htmlEX version 0.15.0
Copyright 2012-2015 Lu Wang <[email protected]> and other contributors
Libraries:
poppler 0.63.0
libfontforge 20181011
cairo 1.14.8
Default data-dir: /build/usr/share/pdf2htmlEX
Supported image format: png jpg svg
And on my mac (using brew install pdf2htmlEX):
pdf2htmlEX version 0.14.6
Copyright 2012-2015 Lu Wang <[email protected]> and other contributors
Libraries:
poppler 0.57.0
libfontforge 20180321
cairo 1.16.0
Default data-dir: /usr/local/Cellar/pdf2htmlex/0.14.6_20/share/pdf2htmlEX
Supported image format: png jpg svg
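To compare builds like the three above without reading the full output, the library lines can be pulled out with a short filter. This is only a sketch: the function name `lib_versions` is hypothetical, and it assumes the `--version` text has the same layout as the output pasted above. It reads from standard input, so it makes no assumption about whether pdf2htmlEX writes that text to stdout or stderr.

```shell
#!/bin/sh
# Hypothetical helper: reads `pdf2htmlEX --version` text on stdin and prints
# one name=version line for each library listed in the output above.
lib_versions() {
    awk '$1 == "poppler" || $1 == "libfontforge" || $1 == "cairo" { print $1 "=" $2 }'
}

# Example using the CentOS 7 output quoted above as sample input.
lib_versions <<'EOF'
pdf2htmlEX version 0.15.0
Libraries:
poppler 0.63.0
libfontforge 20190219
cairo 1.15.12
Supported image format: png jpg svg
EOF
```

On a real installation the input would come from something like `pdf2htmlEX --version 2>&1 | lib_versions`; saving the result on each machine and running `diff` on the files shows which libraries differ.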
Yeah, well, I didn't really care about having the most recent versions of all dependencies; once I got one combination working, that was enough 🤷🏻♂️ Feel free to try different versions, though, and fork at will.