hummusjssamples's People

Contributors

chunyenhuang, galkahana, ksloan

hummusjssamples's Issues

Support CID fonts

Hello Gal,

I have a PDF file that contains CID fonts, and I could not extract text from it using your text-extraction lib.

If I understand correctly, you need such files for further development.

"If you find me an example file that doesn’t bring the text (but also doesn’t crush the code), it’s probably an example of such a file - and you’ll have my thanks if you send it to me."

You can download it here:
https://www.dropbox.com/s/6x9zrc6cd75l3as/publication_1284_original.pdf?dl=0

It would be great if you could give some advice on how to handle such fonts.
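
A minimal sketch of how text shown with a CID (composite) font might be decoded. It is not taken from the text-extraction lib. It assumes every character code is exactly 2 bytes (as with Identity-H) and that the font's ToUnicode map has already been parsed into a plain object; decodeCidString and toUnicodeMap are hypothetical names.

```javascript
// Hypothetical helper: decode bytes shown with a composite (CID) font.
// Assumes 2-byte big-endian character codes and a toUnicodeMap that maps
// numeric codes to strings (parsed beforehand from the ToUnicode CMap).
function decodeCidString(bytes, toUnicodeMap) {
    let result = '';
    for (let i = 0; i + 1 < bytes.length; i += 2) {
        const code = (bytes[i] << 8) | bytes[i + 1];
        const mapped = toUnicodeMap[code];
        // Unmapped codes become U+FFFD so that gaps stay visible.
        result += mapped !== undefined ? mapped : '\uFFFD';
    }
    return result;
}

const sampleMap = { 0x0024: 'A', 0x0025: 'B' };
console.log(decodeCidString([0x00, 0x24, 0x00, 0x25], sampleMap));
```

Fonts without a ToUnicode map, or with variable-length codes, would need a different approach.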

How can I know the query for a text field?

I have been looking everywhere. I have a government form that I want to fill using Hummus, but how do I tell which field to fill?

I can see you used something like that here.

How were you able to find the name/selector?
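
One way to discover field names is to walk the form's field tree and join each field's partial name (T) with a period, which is how the PDF specification defines a fully qualified field name. The sketch below runs on plain objects standing in for parsed field dictionaries; collectFieldNames is a hypothetical helper, and whether pdf-form-fill.js expects exactly this form of name should be checked against the sample.

```javascript
// Hypothetical plain-object stand-in for parsed field dictionaries:
// T is the partial field name, Kids are child fields or widgets.
function collectFieldNames(fields, prefix) {
    const names = [];
    for (const field of fields) {
        // Entries without their own T (pure widgets) share the parent's name.
        const fullName = field.T === undefined
            ? prefix
            : (prefix ? prefix + '.' + field.T : field.T);
        const namedKids = (field.Kids || []).filter((kid) => kid.T !== undefined);
        if (namedKids.length > 0) {
            names.push(...collectFieldNames(namedKids, fullName));
        } else if (fullName) {
            names.push(fullName);
        }
    }
    return names;
}

const sampleFields = [
    { T: 'form1', Kids: [{ T: 'Given Name Text Box' }, { T: 'Group', Kids: [{}, {}] }] }
];
console.log(collectFieldNames(sampleFields));
```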

font-decoding.js wrong arguments, pass 1 argument which is a valid index in the array

When using the sample for extracting text, I kept getting an error in the font-decoding.js code. The issue is on line 76:

map[j] = besToUnicodes(unicodeArray.queryObject(j).toBytesArray());

queryObject(j) throws an error because j is a character code: it identifies the item's position in the map being built during parsing. The unicodeArray, however, usually has a length of just 1 in the PDFs where I hit this code, so using j tries to read an index that doesn't exist in the array.

I was able to fix the issue by changing things thusly:

if(operands[i+2].getType() === hummus.ePDFObjectArray) {
    var unicodeArray = operands[i+2].toPDFArray();
    // specific codes
    for(var j = startCode;j<=endCode;++j) {
        map[j] = besToUnicodes(unicodeArray.queryObject(j).toBytesArray());
    }
}

becomes ...

if(operands[i+2].getType() === hummus.ePDFObjectArray) {
    var unicodeArray = operands[i+2].toPDFArray();
    // specific codes
    var index = 0;
    for(var j = startCode;j<=endCode;++j) {
        map[j] = besToUnicodes(unicodeArray.queryObject(index).toBytesArray());
        index++;
    }
}
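
The change above amounts to indexing the array by j - startCode instead of by j. A minimal sketch of the same idea, with a plain array standing in for the parsed PDF array; mapRangeToArray is a hypothetical name and no HummusJS calls are involved.

```javascript
// Plain arrays stand in for the parsed PDF array; names are illustrative only.
function mapRangeToArray(map, startCode, endCode, unicodeStrings) {
    for (let j = startCode; j <= endCode; ++j) {
        const index = j - startCode; // position inside the array, not the code itself
        // Guard against arrays shorter than the declared code range.
        if (index < unicodeStrings.length) {
            map[j] = unicodeStrings[index];
        }
    }
    return map;
}

console.log(mapRangeToArray({}, 65, 67, ['A', 'B', 'C']));
```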

How to fill a radio field?

Is there example code for filling out radio fields?

I have it working with text, and checkboxes like in the example here https://github.com/galkahana/HummusJSSamples/blob/master/filling-form-values/main.js

However, even after reading http://pdfhummus.com/post/161128437261/a-good-day-to-everyone-today-we-will-discuss-a and http://pdfhummus.com/post/154893591116/parsing-pdf-digital-form-values I can't figure out how to fill in a radio field. They keep disappearing on me.

Thanks!
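
In the PDF specification, a radio group's selection is a name stored in the parent field's V entry, and each kid widget's AS entry picks which of its appearances is drawn. One possible reason for buttons disappearing is an AS value that names an appearance the widget does not have. The sketch below models this on plain objects; selectRadioOption and the onState property are hypothetical and not part of the sample.

```javascript
// Hypothetical plain-object model of a radio group: the parent field holds V,
// each kid widget holds AS (its current appearance state) and onState
// (the name of its "on" appearance, assumed to be known already).
function selectRadioOption(group, value) {
    const match = group.kids.some((kid) => kid.onState === value);
    if (!match) {
        throw new Error('No radio button with on-state "' + value + '"');
    }
    group.V = value;
    for (const kid of group.kids) {
        // Exactly one kid shows its "on" appearance; all others show Off.
        kid.AS = kid.onState === value ? value : 'Off';
    }
    return group;
}

const sampleGroup = { V: 'Off', kids: [{ onState: 'Choice1', AS: 'Off' }, { onState: 'Choice2', AS: 'Off' }] };
selectRadioOption(sampleGroup, 'Choice2');
console.log(sampleGroup.V, sampleGroup.kids.map((kid) => kid.AS));
```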

Fill form error

I have a very simple pdf form with one single field.
When trying to fill the form with the pdf-form-fill library, I receive this error.
TypeError: Inconsistent ending of dictionary. Wrong nesting of startDictionary and endDictionary

Segmentation fault: 11 when running text-extraction example on Chrome printed pdf

Hello,

I'm not sure if it's more appropriate to post this here or over in HummusJS, but I'm trying to use your sample library for text extraction. The project I'm working on will be using headless chrome to create pdfs, so I'm trying it against a pdf I manually saved from chrome. I've tried a couple of different pdfs with the same results. I get a Segmentation fault: 11 error when running the text-extraction/test.js

It works fine with the supplied pdf and with another pdf I created using PDF writer, so I think there's something unexpected about how Chrome is creating the pdf.

Attached is an example of a pdf I've had this error with. I got as far as finding it seems to be happening inside the translateText function in lib/text-extraction.js but I don't have a whole lot of time to look into it at the moment. Let me know if you need more info.
HummusJS.pdf

text-extraction thwarted by inline images

Played a bit with text-extraction sample and found that if an inline image is encountered (a BI / ID / EI construct) the rest of the page is skipped. Most likely this is happening because the image stream that follows ID is parsed as a PDF token not as a stream.

Any hint on how I might skip inline images?

Thanks!
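
One possible workaround is to cut inline images out of the decoded content stream before it is tokenized. The sketch below is a heuristic on a binary (latin1) string: it looks for whitespace-delimited BI, ID and EI tokens and can be fooled by image data or strings that happen to contain them. stripInlineImages is a hypothetical helper, and how the cleaned bytes are fed back into the sample's parser is left open.

```javascript
const PDF_WHITESPACE = /[\x00\t\n\f\r ]/;

// A token boundary is PDF whitespace or the start/end of the content.
function isBoundary(ch) {
    return ch === undefined || PDF_WHITESPACE.test(ch);
}

// Find a whitespace-delimited token at or after position `from`.
function findToken(content, token, from) {
    let at = content.indexOf(token, from);
    while (at >= 0) {
        if (isBoundary(content[at - 1]) && isBoundary(content[at + token.length])) {
            return at;
        }
        at = content.indexOf(token, at + 1);
    }
    return -1;
}

// Remove every BI ... ID ... EI span from a latin1 content string.
function stripInlineImages(content) {
    let out = '';
    let pos = 0;
    while (pos < content.length) {
        const bi = findToken(content, 'BI', pos);
        if (bi < 0) break;
        const id = findToken(content, 'ID', bi + 2);
        const ei = id < 0 ? -1 : findToken(content, 'EI', id + 3);
        if (ei < 0) break; // unterminated image: leave the rest untouched
        out += content.slice(pos, bi);
        pos = ei + 2;
    }
    return out + content.slice(pos);
}

const sampleContent = 'BT (a) Tj ET\nBI /W 1 /H 1 ID \x00\xff EI\nBT (b) Tj ET';
console.log(JSON.stringify(stripInlineImages(sampleContent)));
```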

Error in the sample

Hello, trying to reproduce the fill form sample with exactly the same code I'm having this error:

ReferenceError: modifiedAcroFormDict is not defined
at writeFieldAndKids (G:\DAMA\hitoscreator\public\controller\pdf-form-fill.js:366:9)
at writeFilledField (G:\DAMA\hitoscreator\public\controller\pdf-form-fill.js:392:9)
at G:\DAMA\hitoscreator\public\controller\pdf-form-fill.js:436:13
at Array.forEach (native)
at writeFilledFields (G:\DAMA\hitoscreator\public\controller\pdf-form-fill.js:433:22)
at writeFilledForm (G:\DAMA\hitoscreator\public\controller\pdf-form-fill.js:458:9)
at fillForm (G:\DAMA\hitoscreator\public\controller\pdf-form-fill.js:500:9)
at generatePDF (G:\DAMA\hitoscreator\public\controller\pdf.js:15:3)
at Layer.handle [as handle_request] (G:\DAMA\hitoscreator\node_modules\express\lib\router\layer.js:95:5)
at next (G:\DAMA\hitoscreator\node_modules\express\lib\router\route.js:137:13)

The PDF that I'm trying to modify is version 1.4. Any tip on how to resolve this problem?

PS: Excuse my poor English :(

Encoding bug in the filling-form-example

I think there is a text encoding bug in the filling-form-example, at least on OS X.

If you run the provided example (main.js) and try to fill the form with any non-English characters (for example å ö ä), the output doesn't handle those characters correctly.

Steps to reproduce
Run the example (main.js) with var data = { "Given Name Text Box": "åäö" } on OS X.
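
For context, PDF text strings outside the basic character set are commonly written as UTF-16BE with a leading FE FF byte order mark. The sketch below shows that encoding in plain JavaScript; toPdfTextStringBytes is a hypothetical helper, not the sample's code. It only covers the field's value string: text drawn in the appearance stream also depends on the font's encoding.

```javascript
// Encode a JS string as PDF text string bytes: printable ASCII stays as-is,
// anything else becomes UTF-16BE prefixed with the FE FF byte order mark.
function toPdfTextStringBytes(text) {
    if (/^[\x20-\x7E]*$/.test(text)) {
        return Array.from(text, (ch) => ch.charCodeAt(0));
    }
    const bytes = [0xFE, 0xFF];
    for (let i = 0; i < text.length; ++i) {
        const unit = text.charCodeAt(i); // UTF-16 code unit, surrogates included
        bytes.push(unit >> 8, unit & 0xFF);
    }
    return bytes;
}

console.log(toPdfTextStringBytes('åäö'));
```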

Acrobat Reader does not display values filled by HummusJs pdf-fill-form

I'm trying to figure out why Acrobat Reader doesn't display values filled with HummusJS and pdf-form-fill.js. I can open the pdf in Chrome and see the filled field as expected but not when I open in Acrobat Reader. Clicking in the field also does not get the value to display.

Using:

  • The code in this sample (HummusJSSamples/filling-form-values) latest commit (6261ead). With one modification: .TD(0,4.5) added to writeAppearanceXObjectForText in pdf-form-fill.js to fix alignment as per #12. But same issue even without that modification.
  • Adobe Acrobat Reader DC Version 2018.011.20063
  • Chrome Version 69.0.3497.100

Pdf with filled field (2.a. Family Name of Attorney)
g-28-out.pdf

text-extraction.js code duplication

Hello,

I found this code in text-extraction.js, line 17:

if(extGState.getType() === hummus.ePDFIndirectObjectReference) {
    extGState = pdfReader.parseNewObject(extGState.toPDFIndirectObjectReference().getObjectID()).toPDFDictionary();
} else {
    extGState = pdfReader.parseNewObject(extGState.toPDFIndirectObjectReference().getObjectID()).toPDFDictionary();
}

Both branches of the if execute the same code.

Should the branches differ depending on the extGState type, or is the if unnecessary?
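
A guess at what the two branches may have been meant to do: resolve the object only when it is an indirect reference, and use it directly otherwise. This is a sketch, not the author's fix. resolveToDictionary is a hypothetical helper, and the type constant is passed in as a parameter because its exact name should be checked against the HummusJS version in use.

```javascript
// Hypothetical helper. pdfReader and obj are expected to behave like the objects
// in the snippet above; indirectRefType is the type constant for indirect references.
function resolveToDictionary(pdfReader, obj, indirectRefType) {
    if (obj.getType() === indirectRefType) {
        const objectId = obj.toPDFIndirectObjectReference().getObjectID();
        return pdfReader.parseNewObject(objectId).toPDFDictionary();
    }
    // Already a direct object: no lookup needed.
    return obj.toPDFDictionary();
}
```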

Nonexistent member of text state

In text-extraction.js line 131:
state.currentTextState().text.font = _.extend({},resources.extGStates[operands[0].value].font);

currentTextState() returns an object that has no member text.
It should be
state.currentTextState().font = _.extend({},resources.extGStates[operands[0].value].font);

pdf-form-lock removes all fields

@galkahana I am using your form filling sample, and now I need to lock/flatten the resulting PDF. @Hatzl directed me to your lock-form branch here, and I am trying to use pdf-form-lock.js

I had to modify pdf-form-lock.js to avoid crashes. I don't think my change in collectWidgetAnnotations will be sufficient, but it avoids crashes for my testing. Here are my changes:
lock-form...rcoryjohnson:patch-1#diff-4b182c8febde9600d798afac72f21a7e

The resulting PDF has all of the fields cleared. "fillable.pdf" is the filled version with the editable fields, and "locked.pdf" is the result pdf which has been passed to lockForm(...) and has all of the fields cleared rather than just locked as intended.

locked.pdf
fillable.pdf

Any assistance would be much appreciated.
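
For comparison, locking without flattening can be expressed in the PDF specification by setting the ReadOnly bit (bit position 1, value 1) in each field's Ff flags, which leaves the fields and their values in place. The sketch below shows the flag arithmetic on plain objects. It does not describe what pdf-form-lock.js does, and markReadOnly is a hypothetical helper.

```javascript
const READ_ONLY = 1; // Ff bit position 1 in the PDF specification

// Set the ReadOnly flag on every field in a plain-object field tree,
// keeping any flags that are already present.
function markReadOnly(fields) {
    for (const field of fields) {
        field.Ff = (field.Ff || 0) | READ_ONLY;
        if (field.Kids) {
            markReadOnly(field.Kids);
        }
    }
    return fields;
}

const sampleTree = [{ T: 'name', Ff: 4096 }, { T: 'group', Kids: [{ T: 'a' }] }];
markReadOnly(sampleTree);
console.log(sampleTree[0].Ff, sampleTree[1].Ff, sampleTree[1].Kids[0].Ff);
```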

Horizontal alignment of filled fields with pdf-form-fill.js

I'm running into problems with the default alignment of filled fields when using the pdf-form-fill.js script. I'm hoping there's some way to set a default alignment for the form XObject that gets created in writeAppearanceXObjectForText().

Following Issue #12, I've been able to manually set a y-offset using TD, which worked well enough. But I'm trying to fill in a form whose fields have a mixture of horizontal alignments. There's a comment mentioning that you're not supporting the quadding of the containing acroform. My PDF knowledge is decidedly shaky, but I think "supporting" this would give me correct horizontal alignment in the form I'm trying to fill. If anyone has some pointers on edits I might make to get this working, it'd be a real lifesaver. I've been banging my head against this wall for quite a while now.
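
Quadding is the Q entry of a field (inheritable, with a form-wide default in the AcroForm dictionary): 0 means left, 1 centered, 2 right. A minimal sketch of turning it into an x offset follows. horizontalOffset is a hypothetical helper, textWidth is assumed to have been measured separately with the field's font, and padding is an arbitrary illustrative inset.

```javascript
// Q (quadding) values from the PDF specification: 0 = left, 1 = centered, 2 = right.
function horizontalOffset(quadding, boxWidth, textWidth, padding) {
    switch (quadding) {
        case 1:
            return (boxWidth - textWidth) / 2;
        case 2:
            return boxWidth - textWidth - padding;
        default:
            return padding; // left-justified, also used when Q is absent
    }
}

console.log(horizontalOffset(1, 100, 40, 2));
```

The result could be used as the x operand of the same TD call that sets the y-offset.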

Filled Text Position Off

Hi, I used this sample code to test form filling on a PDF with fields that I created in Adobe Acrobat. After I run the code and open the newly created PDF, each text is in the correct field, but it is positioned at the bottom left corner of the field, with half of the text cut off. This happens even if I set the field to align center. What's even weirder is that when I input text manually into the field using Acrobat, it displays just fine. I don't know why this is happening.
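
The symptom is consistent with text being drawn at the box origin with no vertical offset, so descenders fall below the field. A minimal sketch of computing a centered baseline from font metrics rather than hard-coding an offset; baselineOffset is a hypothetical helper, and ascent and descent are assumed to come from the font descriptor in 1/1000 em units.

```javascript
// ascent and descent are in units of 1/1000 em (descent is negative),
// as found in a font descriptor.
function baselineOffset(boxHeight, fontSize, ascent, descent) {
    const textHeight = (ascent - descent) * fontSize / 1000;
    // Distance from the bottom of the box to the text baseline.
    return (boxHeight - textHeight) / 2 - descent * fontSize / 1000;
}

console.log(baselineOffset(20, 10, 800, -200));
```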

Static Form convert to Editable PDF

Great to see this kind of interesting work!

I am checking whether a static PDF form can be converted to an editable PDF form (with text boxes and checkboxes), like the attached PDF. How can we achieve this?

form-2.pdf

In your example, data is filled into a static PDF form and an output PDF file is generated. We are looking for something different: send in a static PDF form and get an editable PDF form file as output.

Thanks in advance.

pdf-form-fill radio buttons don't fill when no default value is provided

Thanks to the author for this awesome library.

I tried to fill the radio buttons in a sample PDF created in Adobe Acrobat Pro, and they aren't working when there is no default option selected for the radio button group (I named it Group7).

Initially, when I try to get the field values using the PDFDigitalForm class, I get {"Favourite Colour List Box": "Red", "Group9": null}. The radio button group Group9 is null, since there is no default value selected.

However when there is a default radio button selected then the radio button filling works, but will there be any fix for not being able to fill the radio button when there is no default option selected?
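
Even when the group has no value, the selectable options can still be read from each kid widget: the keys of its normal appearance dictionary (AP, then N) are its state names, one of which is Off. The sketch below works on plain objects and is not a diagnosis of the sample; onStateOf and radioGroupOptions are hypothetical helpers.

```javascript
// widget.AP.N is modeled as a plain object whose keys are appearance state names.
function onStateOf(widget) {
    const states = Object.keys((widget.AP && widget.AP.N) || {});
    return states.find((name) => name !== 'Off');
}

// List the "on" state of every kid widget that has one.
function radioGroupOptions(group) {
    return group.Kids.map(onStateOf).filter((name) => name !== undefined);
}

const sampleRadioGroup = {
    V: null,
    Kids: [{ AP: { N: { Off: {}, Red: {} } } }, { AP: { N: { Blue: {}, Off: {} } } }]
};
console.log(radioGroupOptions(sampleRadioGroup));
```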

Text alignment in multi column text field

I am working with the g28.pdf provided in the example. It has the field "U S C I S Online Account Number, if any". If I fill this field, the data doesn't come out properly aligned. Please see the attached image.
image
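
If this is a comb field (a text field with the Comb flag and a MaxLen entry, which has not been verified for g28.pdf), each character is expected to sit centered in one of MaxLen equal-width cells. A minimal sketch of the position arithmetic; combPositions is a hypothetical helper and charWidths is assumed to hold the measured width of each character.

```javascript
// Return the x position of each character in a comb field of maxLen cells.
function combPositions(text, fieldWidth, maxLen, charWidths) {
    const cellWidth = fieldWidth / maxLen;
    return Array.from(text).slice(0, maxLen).map((ch, i) => {
        // Center the character horizontally inside its own cell.
        return i * cellWidth + (cellWidth - charWidths[i]) / 2;
    });
}

console.log(combPositions('12', 100, 10, [6, 6]));
```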

Vertical align form fields

Hi,

I'm currently working on my own little form filling app, but I noticed the text being cut off on the bottom after it is filled in. So for example the letters 'y' or 'p' get cut off. When I edit the form field manually the text will center itself vertically on blur. Very strange. I use font 'Open Sans' btw if that matters.

I'm trying to find the cause but had no success. Do you have any idea how to fix this?
The strangest thing is that if I copy your form filling example it works correctly. So I think it has something to do with Adobe Acrobat.

Which program (and version) did you use to create the example/template PDF? And what kind of PDF is it? I have a lot of options in Adobe Acrobat to save as:

  • Reduced size PDF
  • certified PDF
  • reader extended PDF
  • optimized PDF
  • archivable PDF (PDF/A)
  • press-ready PDF (PDF/X)
  • (PDF/E)

I don't know which one to choose.

Thanks in advance.
