
pdftohtmljs's People

Contributors

dependabot[bot], iapain, jeremybyu, kalley, pgcalixto, rodrigobdz, sundeepnarang

pdftohtmljs's Issues

Unexpected end of input

When opening the HTML document:

Uncaught SyntaxError: Unexpected end of input

y @ DLdVetwT2o:267
L @ DLdVetwT2o:279
E @ DLdVetwT2o:280
(anonymous function) @ DLdVetwT2o:269

It appears to be referencing the pdf2htmlEX.js, so I'm debating whether it's an issue with that script or pdftohtmljs. Any ideas?

Correctly create error on binary child process error

Currently, when the spawned child process closes (child.on('close', ...)), the promise is rejected with an Error constructed from an object of custom properties, which JavaScript's native Error constructor does not support.

This code is at https://github.com/fagbokforlaget/pdftohtmljs/blob/master/lib/pdftohtml.js#L97:

reject(new Error({code: code, msg:`${self.options.bin} returned an error.`, params: self.options.additional}));

Because of this, when pdf2htmlEX returns an error, pdftohtmljs only shows the following message without explicit details about the error:

Error: [object Object]
    at ChildProcess.child.on (/usr/src/app/node_modules/pdftohtmljs/lib/pdftohtml.js:97:18)
    at ChildProcess.emit (events.js:198:13)
    at maybeClose (internal/child_process.js:982:16)
    at Process.ChildProcess._handle.onexit (internal/child_process.js:259:5)

The message above should display explicit details about the error.
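
A minimal sketch of one way to get a readable message, assuming the goal is to keep the exit code and parameters available on the error. The helper name `createBinError` and the message wording are hypothetical and are not taken from the library. The Error constructor expects a string, so the details are attached as properties afterwards:

```javascript
// Hypothetical helper: build a real Error with a readable message and
// attach the exit code and parameters as extra properties.
function createBinError(bin, code, params) {
  const err = new Error(`${bin} returned an error (exit code ${code}).`);
  err.code = code;
  err.params = params;
  return err;
}

// Passing an object to Error() stringifies it, which loses the details.
const broken = new Error({ code: 1, msg: 'pdf2htmlEX returned an error.' });
const fixed = createBinError('pdf2htmlEX', 1, ['--zoom', '1.25']);

console.log(broken.message); // [object Object]
console.log(fixed.message);  // pdf2htmlEX returned an error (exit code 1).
```

With this shape, callers can log `err.message` and still branch on `err.code`.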

Reporting a vulnerability

Hello!

I hope you are doing well!

We are a security research team. Our tool automatically detected a vulnerability in this repository, and we want to disclose it responsibly. GitHub has a feature called Private vulnerability reporting, which enables security researchers to privately disclose a vulnerability. Unfortunately, it is not enabled for this repository.

Can you enable it, so that we can report it?

Thanks in advance!

PS: you can read about how to enable private vulnerability reporting here: https://docs.github.com/en/code-security/security-advisories/repository-security-advisories/configuring-private-vulnerability-reporting-for-a-repository

"Installing" pdf2htmlEX - Error: spawn pdf2htmlEX ENOENT or { code: 1 }

Hi,

I'm trying to build a file uploader that accepts only PDFs and uses pdf2htmlEX / your wrapper to convert the files to HTML.

I ran npm install pdftohtmljs and I now see pdftohtmljs in my node_modules folder.

However, for the life of me I cannot even get your usage code to work :(

var pdftohtml = require('pdftohtmljs');
var converter = new pdftohtml("/uploads/test.pdf", "upload/test.html");

converter.convert('ipad').then(function() {
  console.log("Success");
}).catch(function(err) {
  console.error("Conversion error: " + err);
});

// If you would like to tap into progress then create
// progress handler
converter.progress(function(ret) {
  console.log ((ret.current*100.0)/ret.total + " %");
});

I get this error if I don't have pdf2htmlEX "installed":

events.js:183
throw er; // Unhandled 'error' event
^

Error: spawn pdf2htmlEX ENOENT
at _errnoException (util.js:1022:11)
at Process.ChildProcess._handle.onexit (internal/child_process.js:190:19)
at onErrorNT (internal/child_process.js:372:16)
at _combinedTickCallback (internal/process/next_tick.js:138:11)
at process._tickCallback (internal/process/next_tick.js:180:9)
at Function.Module.runMain (module.js:695:11)
at startup (bootstrap_node.js:188:16)
at bootstrap_node.js:609:3

And if I place pdf2htmlEX.exe in the root of my project, I get this:

Sorry, this script requires pdf2htmlEX. Please make sure its already installed.
Install it from http://github.com/coolwanglu/pdf2htmlEX
{ code: 1 }

I also tried making it available through the PATH variable, but no luck.

How do I "install" pdf2htmlEX correctly? What am I doing wrong?
@iapain and @Jenack, any help?
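
"spawn pdf2htmlEX ENOENT" generally means Node could not find an executable with that name. A small diagnostic sketch, assuming the binary is expected to be found through PATH: the helper `findOnPath` is hypothetical and not part of pdftohtmljs. It only checks that a file exists, not that it is executable.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: return the first match for `name` in the directories
// listed in `pathVar`, or null if none exists.
function findOnPath(name, pathVar = process.env.PATH || '') {
  // On Windows, executables usually carry an extension such as .exe.
  const exts = process.platform === 'win32' ? ['', '.exe', '.cmd', '.bat'] : [''];
  for (const dir of pathVar.split(path.delimiter).filter(Boolean)) {
    for (const ext of exts) {
      const candidate = path.join(dir, name + ext);
      if (fs.existsSync(candidate) && fs.statSync(candidate).isFile()) {
        return candidate;
      }
    }
  }
  return null;
}

const found = findOnPath('pdf2htmlEX');
console.log(found ? `pdf2htmlEX found at ${found}` : 'pdf2htmlEX is not on PATH');
```

If this reports that the binary is missing, the Node process is seeing a different PATH than the terminal, or the binary is not installed in any listed directory.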

Need PDF to HTML with all images

Hi Team

Your package is awesome. I need some help with it.
I have converted a PDF to HTML with your package, but it does not return the images from the PDF as separate files. How can I get all the images?

For example, if I have a PDF with 3 images, I need the HTML to contain those 3 images separately, so that I can reuse or replace them in the HTML. Later on I convert that HTML to an image.

Please help.

Thank you in advance.

Install issue

Hey,

I'm trying to install this module, but for some reason it just won't install. The chmod error (line 9) implies it's a permissions issue; however, chmod 777 node_modules didn't make a difference. It also only seems to affect this repo: all my other npm installs work fine.

Haydens-MacBook-Pro:tracker haydenbleasel$ npm install pdftohtmljs --save
npm ERR! Darwin 14.3.0
npm ERR! argv "node" "/usr/local/bin/npm" "install" "pdftohtmljs" "--save"
npm ERR! node v0.12.1
npm ERR! npm  v2.7.3
npm ERR! path /Users/haydenbleasel/Projects/tracker/node_modules/pdftohtmljs/bin/pdftotextjs
npm ERR! code ENOENT
npm ERR! errno -2
npm ERR! enoent ENOENT, chmod '/Users/haydenbleasel/Projects/tracker/node_modules/pdftohtmljs/bin/pdftotextjs'
npm ERR! enoent This is most likely not a problem with npm itself
npm ERR! enoent and is related to npm not being able to find a file.

Limit memory/CPU usage on pdf2htmlEX process

There seems to be a memory issue with some PDF files, as can be seen in an issue in pdf2htmlEX: coolwanglu/pdf2htmlEX#776.

This kind of file raises the following error:

Lookup 'mark' Mark Positioning lookup 2 has an
offset bigger than 65535 bytes. This means
FontForge must use an extension lookup to output it.
Not all applications support extension lookups.
Lookup 'smcp' Lowercase to Small Capitals lookup 21 has an
offset bigger than 65535 bytes. This means
FontForge must use an extension lookup to output it.
Not all applications support extension lookups.
Internal Error: Attempt to output 65744 into a 16-bit field. It will be truncated and the file may not be useful.

I have another file that produces the same error messages on conversion: 66146383-Estudos Disciplinares XIV TI Trabalho Individual 2019.pdf.

Searching for the messages above online suggests that there are some fonts that FontForge (a dependency of pdf2htmlEX) cannot convert (?).
When converting files that have this problem, the process consumes all of the host computer's memory.
Given this problem, there should be an option to limit the spawned pdf2htmlEX process, so that these files do not crash the system during conversion.
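
One partial mitigation can be sketched with Node's own child_process options: bounding the run time so a runaway conversion is killed. This is an assumption about what such an option could look like, not an existing pdftohtmljs feature, and the helper `runWithTimeout` is hypothetical. A timeout does not cap memory directly; a hard memory limit would need an OS-level mechanism such as ulimit, cgroups, or container limits.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: run a command and kill it if it exceeds `timeoutMs`.
function runWithTimeout(cmd, args, timeoutMs) {
  const result = spawnSync(cmd, args, {
    timeout: timeoutMs,    // the child is killed once this elapses
    killSignal: 'SIGKILL',
    encoding: 'utf8',
  });
  const timedOut = Boolean(result.error && result.error.code === 'ETIMEDOUT');
  return { timedOut, status: result.status, stderr: result.stderr };
}

// Stand-in for a runaway conversion: a Node process that sleeps for 10 s.
const outcome = runWithTimeout(
  process.execPath,
  ['-e', 'setTimeout(() => {}, 10000)'],
  500
);
console.log(outcome.timedOut ? 'killed after timeout' : `exited with ${outcome.status}`);
```

In real use, the command would be the pdf2htmlEX binary and the timeout would be chosen from the conversion times of typical files.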

Is there any way to output the HTML to a variable?

Hello;

I would like to store the generated HTML in my database. For that, I think I need to capture the function's output in a variable and then save that value to the database. The problem is that I don't know how to get it into a variable. Is that possible, and how can I do it?
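
A minimal sketch of one approach, assuming the conversion has already finished and written its HTML file to a known path: read that file back into a string. The helper `readGeneratedHtml` is hypothetical, and the file names in the comment are only illustrative.

```javascript
const fs = require('fs');

// Hypothetical helper: load the generated HTML file into a string.
function readGeneratedHtml(outputPath) {
  return fs.readFileSync(outputPath, 'utf8');
}

// Intended use (commented out because it needs pdf2htmlEX installed):
//   const converter = new pdftohtml('test.pdf', 'test.html');
//   await converter.convert();
//   const html = readGeneratedHtml('test.html');
//   // `html` is now a plain string that can be saved to a database.
```

The path passed to the helper must match where the converter actually wrote the file, which depends on the working directory and any destination options in use.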

Re: Error code

I am getting this from cmd (with the PDF in root):

pdftohtmljs test.pdf

/usr/local/lib/node_modules/pdftohtmljs/lib/pdftohtml.js:97
throw new Error("Error code: "+ code);
^

Error: Error code: 1
at ChildProcess.<anonymous> (/usr/local/lib/node_modules/pdftohtmljs/lib/pdftohtml.js:97:17)
at emitTwo (events.js:106:13)
at ChildProcess.emit (events.js:191:7)
at Process.ChildProcess._handle.onexit (internal/child_process.js:204:12)

All dependencies installed: using "engine": "node 5.11.1" to avoid the graceful-fs issue.

Any thoughts?

Thanks

spawn pdf2htmlEX ENOENT

Running 'pdf2htmlEX et.pdf sample.html' in the terminal works successfully.
Can anyone help me resolve this issue?

--dest-dir not working

try {
  await converter.add_options(['--zoom 1.25', '--embed cfIjo', '--dest-dir out']);
  await converter.convert();
} catch (err) {
  console.error(`Psst! something went wrong: ${err.msg}`);
}

When I pass the '--dest-dir out' parameter, I always get the error "Psst! something went wrong: undefined".

filenames with spaces not supported

hi there,

filenames with spaces cause your tool to break because filename arguments are passed to pdf2htmlEX command line without quotes.

see here: https://github.com/fagbokforlaget/pdftohtmljs/blob/master/lib/pdftohtml.js#L58

Given that the command-line string is simply concatenated, I assume this also constitutes a potential command injection vector (here is an example discussion regarding this security issue). I believe the following article offers a partial solution:

I'm not sure whether using require('child_process').execFile() helps with the spaces in file paths; it might be worth considering a package geared towards escaping shell arguments.

P.S. this is a drive-by comment, I'm reporting something I found during testing but I ended up not using this module - I won't be creating a PR, please don't ask :-)
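
As a sketch of the general technique rather than a patch for this library: when arguments are passed to child_process as an array and no shell is involved, each element reaches the child as a single argument, so spaces and shell metacharacters are not interpreted. The example below uses a Node child process as a stand-in for pdf2htmlEX, and the helper `echoArgs` is hypothetical.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: start a child process that echoes the arguments it
// received as JSON, so we can see exactly what arrived.
function echoArgs(args) {
  const result = spawnSync(
    process.execPath,
    ['-e', 'console.log(JSON.stringify(process.argv.slice(1)))', ...args],
    { encoding: 'utf8' } // no `shell` option, so no shell parses the arguments
  );
  return JSON.parse(result.stdout);
}

// A file name with spaces and one containing shell metacharacters.
const sent = ['my report (final).pdf', 'out; echo injected.html'];
const received = echoArgs(sent);
console.log(received);
```

The same array form applies to spawn and execFile; only exec and the `shell: true` option route the command through a shell.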

Cannot access pdftohtmlex debug data sent to stderr.

I needed all the data that the pdf2htmlEX child process writes to stderr, as it contains information such as font info (when the debug flag is 1). Also, if the process fails, I need this output to figure out what went wrong.

As a temporary workaround, I passed the 'error' variable to the Promise's 'resolve' function.

Line 89 of lib/pdftohtml.js:
resolve(); <--> resolve(error);
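
For illustration, the idea of returning stderr alongside the exit code can be sketched with child_process directly. This uses the synchronous spawnSync for brevity and a Node child as a stand-in for pdf2htmlEX; the helper `runAndCapture` is hypothetical and is not how the library is implemented.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: run a command and return its exit code together
// with the complete stdout and stderr text.
function runAndCapture(cmd, args) {
  const { status, stdout, stderr } = spawnSync(cmd, args, { encoding: 'utf8' });
  return { status, stdout, stderr };
}

// Stand-in for pdf2htmlEX: a Node child that writes to stderr and fails.
const res = runAndCapture(process.execPath, [
  '-e',
  'console.error("font info: demo"); process.exit(1)',
]);
console.log(res.status, JSON.stringify(res.stderr));
```

Returning stderr on both success and failure lets callers inspect debug output without changing how errors are signalled.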

Remove dependency on file-system

A useful way to use this package would be to provide the input file as buffer so that we could maybe get the file from a database and pass it to your package to get an html string as output. The dependency on file system should not be enforced.
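
Until such an option exists, a caller-side workaround can be sketched as follows, assuming the converter still needs a file path: write the buffer to a temporary file, run the work against that path, and clean up afterwards. The helper `withTempPdf` is hypothetical. This sketch is synchronous, so the callback must finish its work before returning; a promise-based variant would need to await the callback before cleaning up. fs.rmSync requires Node 14.14 or later.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: write `buffer` to a temporary PDF, hand its path to
// `fn`, and remove the temporary directory afterwards.
function withTempPdf(buffer, fn) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'pdftohtml-'));
  const pdfPath = path.join(dir, 'input.pdf');
  try {
    fs.writeFileSync(pdfPath, buffer);
    return fn(pdfPath, dir);
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

// Demo with a placeholder buffer; a real caller would pass PDF bytes
// loaded from a database and run the conversion inside the callback.
const demoBuffer = Buffer.from('%PDF-1.4 demo');
const size = withTempPdf(demoBuffer, (pdfPath) => fs.statSync(pdfPath).size);
console.log(`temporary file held ${size} bytes`);
```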
