jsaponara commented on June 29, 2024

Cool! Running otf [the opentaxforms command-line script] on one of the PDF files:

(otf_py35) py35/opentaxforms$ cp t1159-fill-17e.pdf forms/static/pdf/
(otf_py35) py35/opentaxforms$ PYTHONPATH=. script/otf -f t1159-fill-17e.pdf
commandlineArgs:Namespace(calledFromCmdline=True, debug=False, dirName='forms', doctests=False, loglevel='warn', maxrecurselevel=-1, okToDownload=True, postgres=False, quiet=False, recurse=False, rootForms='t1159-fill-17e.pdf', skip=[], useCaches=False, verbose=False, version=False)
logfilename is "t1159-fill-17e.pdf.log"
failed to process 1 forms: ['t1159-fill-17e.pdf']
(otf_py35) py35/opentaxforms$ tail t1159-fill-17e.pdf.log
...
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 75, column 19

With an older otf version, I got a more helpful error:

PDFEncryptionError: Unknown algorithm: param={'CF': {'StdCF': {'Length': 16, 'CFM': /AESV2, 'AuthEvent': /DocOpen}}, 'O': '\x8dm\x14\x0bT~...

In our case, these errors can be addressed by decrypting the PDF:

(otf_py35) py35/opentaxforms$ sudo apt install qpdf
(otf_py35) py35/opentaxforms$ qpdf --decrypt t1159-fill-17e.pdf t1159-fill-17e-decrypt.pdf
(otf_py35) py35/opentaxforms$ ls -l
-rw-rw-r-- 1 john john 385204 Mar 24 06:40 t1159-fill-17e-decrypt.pdf
-rwxr--r-- 1 john john 434789 Mar 24 06:24 t1159-fill-17e.pdf
(otf_py35) py35/opentaxforms$ cp t1159-fill-17e-decrypt.pdf forms/static/pdf
(otf_py35) py35/opentaxforms$ PYTHONPATH=. script/otf -f t1159-fill-17e-decrypt.pdf

which fails at

Form.py, line 163, in pdfInfo: KeyError: 'description': docinfo['desc'] = xmpdict['dc']['description']['x-default']

But that's just a dc [Dublin Core] field that the IRS fills in but the CRA evidently doesn't. We don't need it. I'll shortly push changes to be more CRA-friendly [or less IRS-specific].
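
For anyone scripting this, here is a minimal Python sketch of both workarounds. It assumes qpdf is installed and on the PATH; `decrypt_pdf` and `get_nested` are hypothetical helper names, not part of opentaxforms, and the actual CRA fixes in the repository may look different.

```python
import subprocess


def decrypt_pdf(src, dst):
    """Write a decrypted copy of src to dst using the qpdf command-line tool.

    qpdf exits with 0 on success, 2 on errors, and 3 when it succeeded
    but printed warnings, so 3 is accepted here as well.
    """
    result = subprocess.run(["qpdf", "--decrypt", src, dst])
    if result.returncode not in (0, 3):
        raise RuntimeError("qpdf failed on %s (exit status %d)" % (src, result.returncode))


def get_nested(d, *keys, default=None):
    """Follow keys through nested dicts; return default if any level is missing."""
    for key in keys:
        if not isinstance(d, dict) or key not in d:
            return default
        d = d[key]
    return d
```

With these helpers, the failing line in Form.py could become something like `docinfo['desc'] = get_nested(xmpdict, 'dc', 'description', 'x-default', default='')`, so a form without a Dublin Core description just gets an empty one instead of raising KeyError.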

jsaponara commented on June 29, 2024

I checked in some temporary CRA fixes, so now the forms run. For example:

(t2) john@nj1:~/dev/tax/opentaxforms$ PYTHONPATH=. script/otf -f t776-fill-17e-decrypt.pdf
... 
form StatementofRealEstateRentals status: layoutBoxes: 200found,0overlapping,?missing,?spurious; refs: 0found,0unrecognized,?missing,?spurious; computedFields: 0found,0empty,?missing,?spurious

Looking through the resulting log file, this issue breaks down into several sub-issues:

  • not sure of element type: textbox, checkbox, other? [<Element ...>]
    What pieces of the visible tax form do these XML elements refer to (if any), and are they fillable or do they contain instructions? I suggest we ignore these for now; once all the recognized elements are boxed, these will remain unboxed and their relevance will be obvious.
  • tuple index out of range; icol,coltitles,coltypes,colinstructions=9,('', '', '', '', '', '', '', '', ''),('', 'cost', 'cost', 'proceeds', 'adjustment', '', '', '', ''),('', '', '', '', u'6 Adjustment for current-year additions (col. 3 minus col. 4) divided by 2). If negative, enter "0".', u'7 Base amount for CCA (col. 5 minus col. 6)', '', '', '')
    From line 388 in opentaxforms/extractFillableFields.py, function extractFields.
    Note that icol is 9 while each tuple has only 9 entries (valid indices 0 to 8). Forgive the excessive length of this function! That part of it is meant to propagate the instructions at the top of some table columns to each row in the table. But the comment "set icol to 1st column" doesn't seem to be reflected in any code in that block, so the bug may be a failure to reset icol when we find a new table (i.e., when currTable != prevTable).
  • linenumNotFound: cannot find the linenum in: Line 8690. Insurance. Total expenses.
    From line 58 in opentaxforms/link.py
    None of the three IRS line-number regexes match, but there's clearly a line number in the string. Maybe another regex is needed.
  • cannotParse: cannot parse [total] cmd [expenses] on t776-fill-17e-decrypt/p2/line8521
    From line 254 in opentaxforms/cmds.py
    'total' is treated as a command [as in "total number of exemptions claimed"; see line 511 of cmds.py]. Maybe this is an IRS vs. CRA difference. Here 'total' should not be considered a command.
  • cannotParse: cannot parse [total] cmd [of column 9] on t776-fill-17e-decrypt/p3/None
    Here, by contrast, 'total' probably is a command, but we need code to parse the predicate.
  • cannotParse: cannot parse [add] cmd [the lines listed under total expenses] on t776-fill-17e-decrypt/p2/None
    And here 'add' is a command, but we need code to parse the predicate and relate it to the input fields.
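
For the "tuple index out of range" sub-issue, here is a minimal sketch of the intended column propagation with the missing reset. It is not the code in extractFillableFields.py: the `Field` class and `propagate_column_instructions` function are hypothetical, and it assumes fields arrive in reading order (row by row, left to right) tagged with the table they belong to.

```python
class Field(object):
    """Hypothetical stand-in for a fillable field.

    table is the table the field belongs to (None if not in a table);
    instruction receives the text from the top of its column.
    """

    def __init__(self, table, instruction=""):
        self.table = table
        self.instruction = instruction


def propagate_column_instructions(fields, colinstructions):
    """Copy each table's column-header instructions onto every row's fields.

    colinstructions maps a table id to its sequence of column instructions.
    """
    prev_table = None
    icol = 0
    for f in fields:
        if f.table is None:
            prev_table = None  # left the table, so the next table starts fresh
            continue
        if f.table != prev_table:
            icol = 0  # set icol to 1st column whenever a new table begins
            prev_table = f.table
        cols = colinstructions[f.table]
        f.instruction = cols[icol]
        icol = (icol + 1) % len(cols)  # wrap back to column 0 at the end of a row
    return fields
```

The modulo keeps icol inside the tuple bounds within a table, and the reset on a table change keeps a leftover icol from the previous table from shifting (or overflowing) the next one.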

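For the two 'total' cases, one possible heuristic is to treat 'total' as a command only when its predicate has the form "of column N", and otherwise as part of a label such as "Total expenses". This is a hypothetical sketch, not the logic in cmds.py: `parse_total_predicate` and the "sumcolumn" command name are invented for illustration, it would need to be combined with the existing IRS rules (for example "total number of exemptions claimed"), and the 'add' case is not handled.

```python
import re

# Hypothetical pattern for predicates such as "of column 9".
TOTAL_OF_COLUMN_RE = re.compile(r"^of\s+column\s+(\d+)$", re.IGNORECASE)


def parse_total_predicate(predicate):
    """Return ('sumcolumn', N) for 'of column N'; None means 'total' is just a label."""
    m = TOTAL_OF_COLUMN_RE.match(predicate.strip())
    if m:
        return ("sumcolumn", int(m.group(1)))
    return None
```
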
farhany commented on June 29, 2024

The line numbers are normally 3-4 digits long. I'm not sure how long the IRS forms are usually... :)

Glad to see progress. Let me know if there is anything I can do to help.

jsaponara commented on June 29, 2024

The length of the line number doesn't seem to be the issue. Specifically, "\w+" (line 39 in opentaxforms/link.py) matches one or more word characters, digits included, and "\d+" (lines 42 and 44) matches one or more digits, so line numbers of any length are accepted.

How to help: the easiest sub-issue to address is the 3rd one: adding a regex to match "Line 8690. Insurance. Total expenses." and the text behind any other "linenumNotFound" errors in the log file.
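
A minimal sketch of such an extra pattern, assuming CRA field descriptions start with "Line NNNN" as in the example above. `CRA_LINENUM_RE` and `find_cra_linenum` are hypothetical names, and how the pattern would be combined with the three existing regexes in link.py is not shown.

```python
import re

# Hypothetical pattern for CRA descriptions such as
# "Line 8690. Insurance. Total expenses."; \b stops the digits at a word boundary.
CRA_LINENUM_RE = re.compile(r"^\s*Line\s+(\d+)\b", re.IGNORECASE)


def find_cra_linenum(text):
    """Return the line number (as a string) from a CRA field description, or None."""
    m = CRA_LINENUM_RE.match(text)
    return m.group(1) if m else None
```

For example, `find_cra_linenum("Line 8690. Insurance. Total expenses.")` gives "8690". Running each linenumNotFound string from the log through it is a quick way to see which descriptions still need another pattern.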
