fagbokforlaget / pdftohtmljs

PDF to HTML (pdf2htmlEX) shell wrapper pdftohtmljs

License: MIT License

JavaScript 100.00%

pdftohtmljs's Issues

Pass in Buffer and Receive plain text.

I want to be able to pass in data such as a Buffer and get the raw HTML back from the function, so that I don't have to create a file on my server.
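One way to get close to this without changing the library, as a minimal sketch: stage the Buffer in a temporary directory, run the file-based conversion, read the HTML back as a string, and delete the directory. The names `pdfBufferToHtml` and `convertFile` are hypothetical. `convertFile` stands in for whatever performs the file-to-file conversion (for example a wrapper around `new pdftohtml(input, output).convert()`), and how pdf2htmlEX resolves the output path is not verified here. A temporary file is still created, but it does not outlive the call.

```javascript
const fs = require('fs/promises');
const os = require('os');
const path = require('path');

// Convert a PDF held in memory by staging it in a temporary directory.
// `convertFile(inputPath, outputPath)` is a caller-supplied async function
// (hypothetical here); it must write the HTML to `outputPath`.
async function pdfBufferToHtml(pdfBuffer, convertFile) {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), 'pdf2html-'));
  const input = path.join(dir, 'input.pdf');
  const output = path.join(dir, 'output.html');
  try {
    await fs.writeFile(input, pdfBuffer);
    await convertFile(input, output);
    return await fs.readFile(output, 'utf8'); // HTML as a plain string
  } finally {
    // Always remove the staging directory, even if the conversion failed.
    await fs.rm(dir, { recursive: true, force: true });
  }
}
```

The returned string can then be stored in a variable or a database like any other text value.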

Unexpected end of input

When opening the HTML document:

Uncaught SyntaxError: Unexpected end of input

y @ DLdVetwT2o:267
L @ DLdVetwT2o:279
E @ DLdVetwT2o:280
(anonymous function) @ DLdVetwT2o:269

It appears to be referencing the pdf2htmlEX.js, so I'm debating whether it's an issue with that script or pdftohtmljs. Any ideas?

Correctly create error on binary child process error

Currently, when the spawned child process closes (child.on('close', ...)), the rejected error is created with custom properties that are not native to JavaScript's Error.

This code is at https://github.com/fagbokforlaget/pdftohtmljs/blob/master/lib/pdftohtml.js#L97:

reject(new Error({code: code, msg:`${self.options.bin} returned an error.`, params: self.options.additional}));

Because of this, when pdf2htmlEX returns an error, pdftohtmljs only shows the following message without explicit details about the error:

Error: [object Object]
    at ChildProcess.child.on (/usr/src/app/node_modules/pdftohtmljs/lib/pdftohtml.js:97:18)
    at ChildProcess.emit (events.js:198:13)
    at maybeClose (internal/child_process.js:982:16)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:259:5)

The message above should display explicit details about the error.
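A possible fix, sketched under the assumption that the rejection should stay a native Error: the Error constructor expects a message string, and a non-string argument is stringified, which is where "[object Object]" comes from. The details can go into the message and be attached as extra properties afterwards. `makeExitError` is a hypothetical helper name, not part of pdftohtmljs.

```javascript
// Build a real Error with a readable message, then attach the details.
// `bin`, `code`, and `params` mirror the fields in the snippet quoted above.
function makeExitError(bin, code, params) {
  const err = new Error(`${bin} returned an error (exit code ${code}).`);
  err.code = code;     // exit code reported by the 'close' event
  err.params = params; // additional options that were passed to the binary
  return err;
}

const err = makeExitError('pdf2htmlEX', 1, ['--zoom', '1.25']);
console.log(String(err));
// prints: Error: pdf2htmlEX returned an error (exit code 1).
```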

We are a security research team. Our tool automatically detected a vulnerability in this repository, and we want to disclose it responsibly. GitHub has a feature called private vulnerability reporting, which enables security researchers to privately disclose a vulnerability. Unfortunately, it is not enabled for this repository.

Can you enable it, so that we can report it?

Thanks in advance!

PS: you can read about how to enable private vulnerability reporting here: https://docs.github.com/en/code-security/security-advisories/repository-security-advisories/configuring-private-vulnerability-reporting-for-a-repository

"Installing" pdf2htmlEX - Error: spawn pdf2htmlEX ENOENT or { code: 1 }

Hi,

I'm trying to build a file uploader, which should accept only PDFs and use pdf2htmlEX / your wrapper to convert the files to HTML.

I ran npm install pdftohtmljs and I now see pdftohtmljs in my node_modules folder.

However, for the life of me I cannot even get your usage code to work :(

var pdftohtml = require('pdftohtmljs');
var converter = new pdftohtml("/uploads/test.pdf", "upload/test.html");

converter.convert('ipad').then(function() {
  console.log("Success");
}).catch(function(err) {
  console.error("Conversion error: " + err);
});

// If you would like to tap into progress then create
// progress handler
converter.progress(function(ret) {
  console.log ((ret.current*100.0)/ret.total + " %");
});

I get this error if I don't have pdf2htmlEX "installed":

events.js:183
throw er; // Unhandled 'error' event
^

Error: spawn pdf2htmlEX ENOENT
at _errnoException (util.js:1022:11)
at Process.ChildProcess._handle.onexit (internal/child_process.js:190:19)
at onErrorNT (internal/child_process.js:372:16)
at _combinedTickCallback (internal/process/next_tick.js:138:11)
at process._tickCallback (internal/process/next_tick.js:180:9)
at Function.Module.runMain (module.js:695:11)
at startup (bootstrap_node.js:188:16)
at bootstrap_node.js:609:3

And if I place pdf2htmlEX.exe in the root of my project I get this:

Sorry, this script requires pdf2htmlEX. Please make sure its already installed.
Install it from http://github.com/coolwanglu/pdf2htmlEX
{ code: 1 }

I also tried to make it available through the PATH variable, but no luck.

How do I "install" pdf2htmlEX correctly? What am I doing wrong?
@iapain and @Jenack, any help?
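"spawn pdf2htmlEX ENOENT" typically means Node could not find an executable with that name through PATH. A small check like the following can confirm that from inside the same Node process before calling the wrapper. This is a sketch only: `findOnPath` is a hypothetical helper, not part of pdftohtmljs, and the Windows extension list is an assumption.

```javascript
const fs = require('fs');
const path = require('path');

// Return the first executable file named `name` found in the directories
// listed in `pathEnv`, or null when none matches.
function findOnPath(name, pathEnv = process.env.PATH || '') {
  // Assumed list of extensions to try on Windows; none elsewhere.
  const exts = process.platform === 'win32' ? ['', '.exe', '.cmd', '.bat'] : [''];
  const dirs = pathEnv.split(path.delimiter).filter(Boolean);
  for (const dir of dirs) {
    for (const ext of exts) {
      const candidate = path.join(dir, name + ext);
      try {
        fs.accessSync(candidate, fs.constants.X_OK);
        if (fs.statSync(candidate).isFile()) return candidate;
      } catch (_) {
        // Not present or not executable here; keep looking.
      }
    }
  }
  return null;
}

const bin = findOnPath('pdf2htmlEX');
if (!bin) {
  console.error('pdf2htmlEX is not reachable through PATH for this process');
}
```

If this reports the binary as missing while a terminal can run it, the Node process is probably started with a different PATH than the terminal.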

Sorry, this script requires pdf2htmlEX

pdftohtmljs sample.pdf
I ran this command on Windows and it returned the error below. How can I solve this? Please give step-by-step instructions.

Sorry, this script requires pdf2htmlEX.
Install it from http://github.com/coolwanglu/pdf2htmlEX
Please install pdf2htmlEX

data.split is not a function error

TypeError: data.split is not a function
at Socket.child.stderr.on (/node_modules/pdftohtmljs/lib/pdftohtml.js:71:31)
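This error is typical when a stream's 'data' chunk is a Buffer rather than a string: Node delivers Buffers unless setEncoding() has been called on the stream, and Buffer has no split() method. A minimal sketch of a defensive conversion follows; `chunkToLines` is a hypothetical helper name, and the sample progress text is made up for illustration.

```javascript
// Normalise a stderr chunk to a string before splitting it into lines.
function chunkToLines(data) {
  const text = Buffer.isBuffer(data) ? data.toString('utf8') : String(data);
  return text.split(/\r?\n/).filter((line) => line.length > 0);
}

// Works for Buffers and for strings alike.
const lines = chunkToLines(Buffer.from('Working: 1/3\nWorking: 2/3\n'));
// lines is ['Working: 1/3', 'Working: 2/3']
```

Calling child.stderr.setEncoding('utf8') before attaching the handler is an alternative that also copes with multi-byte characters split across chunks.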

Need PDF to HTML with all images

Hi Team

Your package is awesome. I need some help with it.
I converted a PDF to HTML with your package, but it does not return the images in the PDF as separate files. How can I get all the images?

For example, I have a PDF with 3 images and I need the HTML to reference those 3 images separately, so that I can reuse or replace them in the HTML. Later on I convert that HTML to an image.

Please help.

Thank you in advance.

Install issue

Hey,

Trying to install this module but for some reason it just won't install. chmod (line 9) implies it's a permissions issue, however chmod 777 node_modules didn't make a difference. Also, it only seems to be this repo; all my other npm installs are working fine.

Haydens-MacBook-Pro:tracker haydenbleasel$ npm install pdftohtmljs --save
npm ERR! Darwin 14.3.0
npm ERR! argv "node" "/usr/local/bin/npm" "install" "pdftohtmljs" "--save"
npm ERR! node v0.12.1
npm ERR! npm  v2.7.3
npm ERR! path /Users/haydenbleasel/Projects/tracker/node_modules/pdftohtmljs/bin/pdftotextjs
npm ERR! code ENOENT
npm ERR! errno -2
npm ERR! enoent ENOENT, chmod '/Users/haydenbleasel/Projects/tracker/node_modules/pdftohtmljs/bin/pdftotextjs'
npm ERR! enoent This is most likely not a problem with npm itself
npm ERR! enoent and is related to npm not being able to find a file.

Limit memory/CPU usage on pdf2htmlEX process

There seems to be a memory issue with some PDF files, as can be seen in an issue in pdf2htmlEX: coolwanglu/pdf2htmlEX#776.

This kind of file raises the following error:

Lookup 'mark' Mark Positioning lookup 2 has an
offset bigger than 65535 bytes. This means
FontForge must use an extension lookup to output it.
Not all applications support extension lookups.
Lookup 'smcp' Lowercase to Small Capitals lookup 21 has an
offset bigger than 65535 bytes. This means
FontForge must use an extension lookup to output it.
Not all applications support extension lookups.
Internal Error: Attempt to output 65744 into a 16-bit field. It will be truncated and the file may not be useful.

I have another file that displays the same error messages when I try to convert it: 66146383-Estudos Disciplinares XIV TI Trabalho Individual 2019.pdf.

The messages above can be searched for online. From the results, it seems that there are some fonts that FontForge (a dependency of pdf2htmlEX) cannot convert (?).
When trying to convert files that have this problem, the process consumes all the memory of the host computer.
Given this problem, there should be an option to limit the resources of the process spawned for pdf2htmlEX, so that these files do not crash the system when they are converted.
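Until such an option exists, a caller can at least bound how long a runaway conversion is allowed to run. The sketch below is a wall-clock timeout, not a true memory or CPU cap: it spawns a command and kills it if it has not exited in time. `runWithTimeout` is a hypothetical helper, not part of pdftohtmljs. A real memory limit would need an operating-system mechanism such as `ulimit -v` in a wrapping shell or a container limit, which is outside this sketch.

```javascript
const { spawn } = require('child_process');

// Run a command and kill it if it has not exited after `timeoutMs`.
// This bounds wall-clock time only; it does not cap memory or CPU.
function runWithTimeout(bin, args, timeoutMs) {
  return new Promise((resolve, reject) => {
    const child = spawn(bin, args, { stdio: 'ignore' });
    let timedOut = false;
    const timer = setTimeout(() => {
      timedOut = true;
      child.kill('SIGKILL');
    }, timeoutMs);

    child.on('error', (err) => {
      clearTimeout(timer);
      reject(err); // e.g. ENOENT when the binary cannot be found
    });
    child.on('close', (code, signal) => {
      clearTimeout(timer);
      if (timedOut) {
        reject(new Error(`${bin} killed after ${timeoutMs} ms`));
      } else if (code !== 0) {
        reject(new Error(`${bin} exited with code ${code} (signal ${signal})`));
      } else {
        resolve();
      }
    });
  });
}

// Example call (file names are placeholders):
// runWithTimeout('pdf2htmlEX', ['input.pdf', 'output.html'], 60000)
```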

Is there anyway to output the html in a variable?

Hello;

I would like to store the generated HTML in my database. For that, I think I need to capture the output of the function in a variable and then save that value to the database. The problem is that I don't know how to get the output into a variable. Is that possible, and how can I do it?

Re: Error code

I am getting this from the command line (with the PDF in the root directory):

pdftohtmljs test.pdf

/usr/local/lib/node_modules/pdftohtmljs/lib/pdftohtml.js:97
throw new Error("Error code: "+ code);
^

Error: Error code: 1
at ChildProcess.<anonymous> (/usr/local/lib/node_modules/pdftohtmljs/lib/pdftohtml.js:97:17)
at emitTwo (events.js:106:13)
at ChildProcess.emit (events.js:191:7)
at Process.ChildProcess._handle.onexit (internal/child_process.js:204:12)

All dependencies installed: using "engine": "node 5.11.1" to avoid the graceful-fs issue.

Any thoughts?

Thanks

Error while executing

Hello,
I am getting this error:
Please install pdf2htmlEX from https://github.com/coolwanglu/pdf2htmlEX
Conversion error: Error: spawn pdf2htmlEX ENOENT
How would I install it on Windows and Ubuntu?

spawn pdf2htmlEX ENOENT

I ran 'pdf2htmlEX et.pdf sample.html' in the terminal and it succeeded, but the wrapper still fails with this error.
Can anyone help me resolve this issue?

--dest-dir not working

try {
  await converter.add_options(['--zoom 1.25', '--embed cfIjo', '--dest-dir out']);
  await converter.convert();
} catch (err) {
  console.error(`Psst! something went wrong: ${err.msg}`);
}

When I pass the '--dest-dir out' parameter, I always get the error "Psst! something went wrong: undefined".
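The "undefined" part at least can be explained independently of --dest-dir: native Error objects expose `message`, not `msg`, so `err.msg` is undefined whenever the rejection is a plain Error. A minimal sketch of a logging helper that copes with either shape follows; `describeError` is a hypothetical name, and which shape a given pdftohtmljs version rejects with is an assumption to verify.

```javascript
// Produce a readable description whether the rejection is a native Error
// or a plain object carrying `msg` and `code` fields.
function describeError(err) {
  if (err == null) return 'unknown error';
  const parts = [err.msg || err.message || String(err)];
  if (err.code !== undefined) parts.push(`code=${err.code}`);
  return parts.join(' ');
}

console.error(`Psst! something went wrong: ${describeError(new Error('boom'))}`);
// prints: Psst! something went wrong: boom
```

Logging the whole object with console.error(err) is a simpler alternative while debugging, since it also shows the stack trace.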

How to pass --embed options

filenames with spaces not supported

hi there,

filenames with spaces cause your tool to break because filename arguments are passed to pdf2htmlEX command line without quotes.

see here: https://github.com/fagbokforlaget/pdftohtmljs/blob/master/lib/pdftohtml.js#L58

given the way the cmd-line string is simply concatenated I assume this also constitutes a potential cmd injection vector (here is an example discussion regarding this security issue). I believe the following article offers a partial solution:

https://blog.liftsecurity.io/2014/08/19/Avoid-Command-Injection-Node.js

I'm not sure whether using require('child_process').execFile() helps with the spaces in file paths, it might be worth considering some kind of package geared towards escaping shell arguments
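On the execFile question: child_process.execFile() and child_process.spawn() both take the arguments as an array and do not start a shell by default, so each array element reaches the child as one argument, spaces included. A minimal sketch of that approach follows. `buildArgs` and `runConverter` are hypothetical names, not the library's current code, and the argument order assumes the usual `pdf2htmlEX [options] input.pdf output.html` form.

```javascript
const { spawn } = require('child_process');

// Build an argument vector instead of a concatenated command string.
// Each element is passed to the child as a single argument, so spaces and
// shell metacharacters in file names are never interpreted by a shell.
function buildArgs(inputPath, outputPath, extraArgs = []) {
  return [...extraArgs, inputPath, outputPath];
}

function runConverter(bin, inputPath, outputPath, extraArgs) {
  // No `shell: true` here: the arguments go through verbatim.
  return spawn(bin, buildArgs(inputPath, outputPath, extraArgs));
}

const args = buildArgs('my report.pdf', 'out.html', ['--zoom', '1.25']);
// args is ['--zoom', '1.25', 'my report.pdf', 'out.html']
```

With this shape no escaping package is needed for the file names, because no shell ever parses them.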

P.S. this is a drive-by comment, I'm reporting something I found during testing but I ended up not using this module - I won't be creating a PR, please don't ask :-)

Cannot access pdf2htmlEX debug data sent to stderr.

I need all the data that the pdf2htmlEX child process writes to stderr, as it contains information such as font info (when the debug flag is 1). Also, if the process fails, I need this output to figure out what went wrong.

As a temporary workaround, I pass the 'error' variable to the Promise's 'resolve' function.

Line 89 of lib/pdftohtml.js:
resolve(); --> resolve(error);
