fagbokforlaget / pdftohtmljs
PDF to HTML (pdf2htmlEX) shell wrapper pdftohtmljs
License: MIT License
I want to be able to pass in a data format like a buffer and get the raw HTML from the function so that I don't have to create a file on my server.
When opening the HTML document:
Uncaught SyntaxError: Unexpected end of input
y @ DLdVetwT2o:267
L @ DLdVetwT2o:279
E @ DLdVetwT2o:280
(anonymous function) @ DLdVetwT2o:269
It appears to be referencing pdf2htmlEX.js, so I'm debating whether it's an issue with that script or with pdftohtmljs. Any ideas?
Currently, when the spawned child process closes (child.on('close', ...)), the rejection error is created with custom properties that are not native to JavaScript's Error.
This code is at https://github.com/fagbokforlaget/pdftohtmljs/blob/master/lib/pdftohtml.js#L97:
reject(new Error({code: code, msg:`${self.options.bin} returned an error.`, params: self.options.additional}));
Because of this, when pdf2htmlEX returns an error, pdftohtmljs only shows the following message without explicit details about the error:
Error: [object Object]
at ChildProcess.child.on (/usr/src/app/node_modules/pdftohtmljs/lib/pdftohtml.js:97:18)
at ChildProcess.emit (events.js:198:13)
at maybeClose (internal/child_process.js:982:16)
at Process.ChildProcess._handle.onexit (internal/child_process.js:259:5)
The message above should display explicit details about the error.
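The `[object Object]` text appears because the `Error` constructor converts its first argument to a string to form the message. Below is a minimal sketch of one possible fix, assuming the same three values (the exit code, the binary name, and the additional parameters) should be preserved. `createExitError` is a hypothetical helper, not part of pdftohtmljs.

```javascript
// Hypothetical helper (not part of pdftohtmljs): build a real Error whose
// message is a string, then attach the extra details as properties.
function createExitError(bin, code, params) {
  const err = new Error(`${bin} returned an error (exit code ${code}).`);
  err.code = code;     // exit code reported by the child process
  err.params = params; // additional command-line options that were used
  return err;
}

const err = createExitError('pdf2htmlEX', 1, ['--zoom', '1.25']);
console.log(String(err)); // prints "Error: pdf2htmlEX returned an error (exit code 1)."
```

Callers can then read `err.message`, `err.code`, and `err.params` separately, and the stack trace shows a readable message.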
Hello!
I hope you are doing well!
We are a security research team. Our tool automatically detected a vulnerability in this repository. We want to disclose it responsibly. GitHub has a feature called Private vulnerability reporting, which enables security researchers to privately disclose a vulnerability. Unfortunately, it is not enabled for this repository.
Can you enable it, so that we can report it?
Thanks in advance!
PS: you can read about how to enable private vulnerability reporting here: https://docs.github.com/en/code-security/security-advisories/repository-security-advisories/configuring-private-vulnerability-reporting-for-a-repository
Hi,
I'm trying to build a file uploader that accepts only PDFs and uses pdf2htmlEX / your wrapper to convert the files to HTML.
I ran npm install pdftohtmljs and I now see pdftohtmljs in my node_modules folder.
However, for the life of me I cannot even get your usage code to work :(
var pdftohtml = require('pdftohtmljs');
var converter = new pdftohtml("/uploads/test.pdf", "upload/test.html");
converter.convert('ipad').then(function() {
  console.log("Success");
}).catch(function(err) {
  console.error("Conversion error: " + err);
});
// If you would like to tap into progress then create
// progress handler
converter.progress(function(ret) {
  console.log((ret.current*100.0)/ret.total + " %");
});
I get this error if I don't have pdf2htmlEX "installed":
events.js:183
throw er; // Unhandled 'error' event
^
Error: spawn pdf2htmlEX ENOENT
at _errnoException (util.js:1022:11)
at Process.ChildProcess._handle.onexit (internal/child_process.js:190:19)
at onErrorNT (internal/child_process.js:372:16)
at _combinedTickCallback (internal/process/next_tick.js:138:11)
at process._tickCallback (internal/process/next_tick.js:180:9)
at Function.Module.runMain (module.js:695:11)
at startup (bootstrap_node.js:188:16)
at bootstrap_node.js:609:3
And if I place pdf2htmlEX.exe in the root of my project I get this:
Sorry, this script requires pdf2htmlEX. Please make sure its already installed.
Install it from http://github.com/coolwanglu/pdf2htmlEX
{ code: 1 }
I also tried to make it available through the PATH variable, but no luck.
How do I "install" pdf2htmlEX correctly? What am I doing wrong?
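`ENOENT` from `spawn` means the operating system could not find an executable with that name. A minimal sketch for checking this before converting, using Node's `spawnSync`; `canSpawn` is a hypothetical helper, and the default `--version` argument is only a harmless probe:

```javascript
const { spawnSync } = require('child_process');

// Returns true if `bin` could be started, false if spawning failed
// (for example with ENOENT, the code shown in the stack trace above).
// Any exit status counts as "found"; only a spawn failure counts as missing.
function canSpawn(bin, args = ['--version']) {
  const result = spawnSync(bin, args, { stdio: 'ignore' });
  return !result.error;
}

console.log('pdf2htmlEX found:', canSpawn('pdf2htmlEX'));
```

If this prints `false`, the directory containing the pdf2htmlEX executable is not on the `PATH` seen by the Node process. A terminal or IDE that was open before `PATH` was changed usually has to be restarted to pick up the new value.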
@iapain and @Jenack any help guys.
pdftohtmljs sample.pdf
I ran this command on Windows and it returned the error below. How can I solve this? Please provide step-by-step instructions.
Sorry, this script requires pdf2htmlEX.
Install it from http://github.com/coolwanglu/pdf2htmlEX
Please install pdf2htmlEX
TypeError: data.split is not a function
at Socket.child.stderr.on (/node_modules/pdftohtmljs/lib/pdftohtml.js:71:31)
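One plausible cause of the `TypeError`, offered as an assumption since the library source is not shown here, is that a stream's `data` listener receives a `Buffer` unless `setEncoding()` has been called, and `Buffer` has no `split` method. A minimal sketch of a listener-side conversion; `chunkToLines` is a hypothetical helper:

```javascript
// `chunk` is what a stream 'data' listener receives: a Buffer by default,
// or a string if setEncoding() was called on the stream.
function chunkToLines(chunk) {
  return chunk
    .toString('utf8')   // Buffer -> string; a no-op for strings
    .split(/\r?\n/)     // handle both Unix and Windows line endings
    .filter((line) => line.length > 0);
}

console.log(chunkToLines(Buffer.from('line one\r\nline two\n')));
// prints [ 'line one', 'line two' ]
```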
Hi Team
Your package is awesome. I need some help with it.
I have converted a PDF to HTML with your package, but it does not return the images from the PDF as separate files. How can I get all the images?
For example, if I have a PDF with 3 images, I need the HTML to contain those 3 images separately, so that I can reuse or replace them in the HTML. Later on I convert that HTML to an image.
Please help.
Thank you in advance.
Hey,
Trying to install this module but for some reason it just won't install. The chmod (line 9) implies it's a permissions issue; however, chmod 777 node_modules didn't make a difference. Also, it only seems to be this repo; all my other npm installs are working fine.
Haydens-MacBook-Pro:tracker haydenbleasel$ npm install pdftohtmljs --save
npm ERR! Darwin 14.3.0
npm ERR! argv "node" "/usr/local/bin/npm" "install" "pdftohtmljs" "--save"
npm ERR! node v0.12.1
npm ERR! npm v2.7.3
npm ERR! path /Users/haydenbleasel/Projects/tracker/node_modules/pdftohtmljs/bin/pdftotextjs
npm ERR! code ENOENT
npm ERR! errno -2
npm ERR! enoent ENOENT, chmod '/Users/haydenbleasel/Projects/tracker/node_modules/pdftohtmljs/bin/pdftotextjs'
npm ERR! enoent This is most likely not a problem with npm itself
npm ERR! enoent and is related to npm not being able to find a file.
There seems to be a memory issue with some PDF files, as can be seen in an issue in pdf2htmlEX: coolwanglu/pdf2htmlEX#776.
This kind of file raises the following error:
Lookup 'mark' Mark Positioning lookup 2 has an
offset bigger than 65535 bytes. This means
FontForge must use an extension lookup to output it.
Not all applications support extension lookups.
Lookup 'smcp' Lowercase to Small Capitals lookup 21 has an
offset bigger than 65535 bytes. This means
FontForge must use an extension lookup to output it.
Not all applications support extension lookups.
Internal Error: Attempt to output 65744 into a 16-bit field. It will be truncated and the file may not be useful.
I have another file that produces the same error messages when I try to convert it: 66146383-Estudos Disciplinares XIV TI Trabalho Individual 2019.pdf.
The messages above can be searched for online. By searching them, it seems that there are some fonts that FontForge (a dependency of pdf2htmlEX) cannot convert (?).
When trying to convert files that have this problem, the process consumes all memory of the host computer.
Given this problem, there should be an option to limit pdf2htmlEX's spawned process, so that these files do not crash the system when being converted.
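One way to keep a runaway conversion from taking the host down is to bound how long the child process may run. The sketch below uses the `timeout` and `killSignal` options of Node's `spawnSync`. It limits wall-clock time only; a true memory cap would need an operating-system mechanism (for example `ulimit` or a container limit), which is not shown. `runWithTimeout` is a hypothetical helper, and Node itself stands in for pdf2htmlEX so the example runs without it installed.

```javascript
const { spawnSync } = require('child_process');

// Run a command but give up after `timeoutMs` milliseconds.
// This caps wall-clock time only; it does not cap memory directly.
function runWithTimeout(bin, args, timeoutMs) {
  const result = spawnSync(bin, args, {
    timeout: timeoutMs,    // kill the child once this much time has elapsed
    killSignal: 'SIGKILL', // signal used for that kill
    stdio: 'ignore',
  });
  const timedOut = Boolean(result.error && result.error.code === 'ETIMEDOUT');
  return { timedOut, status: result.status };
}

// Stand-in for a conversion that never finishes.
const demo = runWithTimeout(process.execPath, ['-e', 'setInterval(() => {}, 1000)'], 300);
console.log('timed out:', demo.timedOut);
```

`spawnSync` blocks the event loop while the child runs, so a server would more likely apply the same idea to the asynchronous `spawn` and kill the child from a timer.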
Hello,
I would like to store the generated HTML in my database. For that, I think I need to capture the output of the function in a variable and then save that value to the database. The problem is that I don't know how to get the output into a variable. Is that possible, and if so, how can I do it?
I get the following from the command line (with the PDF in the root directory):
pdftohtmljs test.pdf
/usr/local/lib/node_modules/pdftohtmljs/lib/pdftohtml.js:97
throw new Error("Error code: "+ code);
^
Error: Error code: 1
at ChildProcess.<anonymous> (/usr/local/lib/node_modules/pdftohtmljs/lib/pdftohtml.js:97:17)
at emitTwo (events.js:106:13)
at ChildProcess.emit (events.js:191:7)
at Process.ChildProcess._handle.onexit (internal/child_process.js:204:12)
All dependencies installed: using "engine": "node 5.11.1" to avoid the graceful-fs issue.
Any thoughts?
Thanks
Hello,
I am getting this error:
Please install pdf2htmlEX from https://github.com/coolwanglu/pdf2htmlEX
Conversion error: Error: spawn pdf2htmlEX ENOENT
How would I install it on Windows and Ubuntu?
try {
  await converter.add_options(['--zoom 1.25', '--embed cfIjo', '--dest-dir out']);
  await converter.convert();
} catch (err) {
  console.error(`Psst! something went wrong: ${err.msg}`);
}
When I pass the '--dest-dir out' parameter, I always get the error "Psst! something went wrong: undefined".
Hi there,
Filenames with spaces cause your tool to break because filename arguments are passed to the pdf2htmlEX command line without quotes.
See here: https://github.com/fagbokforlaget/pdftohtmljs/blob/master/lib/pdftohtml.js#L58
Given that the command-line string is built by simple concatenation, I assume this also constitutes a potential command injection vector (here is an example discussion regarding this security issue). I believe the following article offers a partial solution:
I'm not sure whether using require('child_process').execFile() helps with the spaces in file paths; it might be worth considering a package geared towards escaping shell arguments.
P.S. this is a drive-by comment, I'm reporting something I found during testing but I ended up not using this module - I won't be creating a PR, please don't ask :-)
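On the `execFile()` question: when arguments are passed as an array and no shell is involved, each element reaches the child process as a single argument, so spaces and shell metacharacters need no quoting. A minimal sketch, assuming the wrapper can be changed to build an array rather than a string. `buildArgs` is a hypothetical helper, and a Node one-liner stands in for pdf2htmlEX so the example runs anywhere.

```javascript
const { execFileSync } = require('child_process');

// Build the argument list as an array: options first, then input and output.
// Each element is delivered to the child as exactly one argument.
function buildArgs(inputPath, outputName, options = []) {
  return [...options, inputPath, outputName];
}

// Stand-in for pdf2htmlEX: prints the last two arguments it received as JSON.
const echoScript = 'console.log(JSON.stringify(process.argv.slice(-2)))';
const args = buildArgs('my report; echo pwned.pdf', 'out file.html');
const received = JSON.parse(
  execFileSync(process.execPath, ['-e', echoScript, '--', ...args], { encoding: 'utf8' })
);
console.log(received);
// prints [ 'my report; echo pwned.pdf', 'out file.html' ]
```

The file name containing `; echo pwned` arrives intact and is never interpreted as a command, because no shell parses the string.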
I needed all the data that the pdf2htmlEX child process outputs to stderr, as it contains information such as font info (when the debug flag is 1). Also, if the process fails, I need this output to figure out what went wrong.
As a temporary workaround, I passed the 'error' variable to the Promise's 'resolve' function.
Line 89 of lib/pdftohtml.js:
resolve(); <--> resolve(error);
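A more general version of this workaround collects stderr and returns it on both success and failure. The sketch below is not the library's code. `runCollectingStderr` is a hypothetical helper built on Node's `spawn`, and Node itself stands in for pdf2htmlEX in the usage lines.

```javascript
const { spawn } = require('child_process');

// Run a command and collect everything it writes to stderr.
// Resolves with { code, stderr } when the exit code is 0; otherwise rejects
// with an Error that carries the same two values as properties.
function runCollectingStderr(bin, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(bin, args, { stdio: ['ignore', 'ignore', 'pipe'] });
    let stderr = '';
    child.stderr.setEncoding('utf8'); // receive strings instead of Buffers
    child.stderr.on('data', (text) => { stderr += text; });
    child.on('error', reject);        // e.g. ENOENT when the binary is missing
    child.on('close', (code) => {
      if (code === 0) return resolve({ code, stderr });
      const err = new Error(`${bin} exited with code ${code}`);
      err.code = code;
      err.stderr = stderr;
      reject(err);
    });
  });
}

// Usage with a stand-in command that writes one line to stderr.
runCollectingStderr(process.execPath, ['-e', 'console.error("font info")'])
  .then((result) => console.log(result.stderr.trim()));
```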
A useful way to use this package would be to provide the input file as buffer so that we could maybe get the file from a database and pass it to your package to get an html string as output. The dependency on file system should not be enforced.
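Since pdf2htmlEX itself works on files, one workaround is to hide the file system behind a helper that writes the buffer to a temporary directory, runs the conversion, reads the result back as a string, and cleans up. This is a sketch under assumptions, not a feature of the package. `bufferToHtml` is a hypothetical helper, and the conversion step is passed in as a function so that the sketch does not depend on any particular API.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: `convertFile(inputPath, outputPath)` is supplied by
// the caller and must return a promise that settles once outputPath exists.
async function bufferToHtml(pdfBuffer, convertFile) {
  const dir = await fs.promises.mkdtemp(path.join(os.tmpdir(), 'pdf2html-'));
  const inputPath = path.join(dir, 'input.pdf');
  const outputPath = path.join(dir, 'output.html');
  try {
    await fs.promises.writeFile(inputPath, pdfBuffer);
    await convertFile(inputPath, outputPath);
    return await fs.promises.readFile(outputPath, 'utf8');
  } finally {
    // Remove the temporary files whether or not the conversion succeeded.
    await fs.promises.rm(dir, { recursive: true, force: true });
  }
}
```

With pdftohtmljs, `convertFile` would wrap the constructor and `convert()` call from the README usage; whether the wrapper honors an absolute output path as given is untested here and may require a destination-directory option. The returned string can then be stored in a database directly.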
I'm using the default code in the readme and it's giving me that error. When I don't add a preset the program logs successful but the file hasn't been created.