wootski / peepdf
Automatically exported from code.google.com/p/peepdf
License: GNU General Public License v3.0
What steps will reproduce the problem?
1. Run "./peepdf.py -i"
2. Run "create pdf" in peepdf console
3. Run "save 'test.pdf'" in peepdf console
What is the expected output? What do you see instead?
The following content is the cross-reference table and trailer of test.pdf. The
size of cross-reference table is 4, however, there are 5 entries in table.
There is a useless entry in cross-reference table which does not point to an
object.
xref
0 4
0000000000 65535 f
0000000009 00000 n
0000000059 00000 n
0000000118 00000 n
0000000119 00000 n
trailer
<< /Size 4
/Root 1 0 R >>
startxref
210
%%EOF
What version of the product are you using? On what operating system?
The version of peepdf is r42. The operating system is ubuntu-11.10 x86_64.
Please provide any additional information below.
Original issue reported on code.google.com by czchen
on 24 Oct 2011 at 11:50
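The mismatch described above can be checked mechanically. The following is a minimal sketch, not peepdf code: `check_xref_subsections` and `SAMPLE` are hypothetical names, and the parser only handles a classic (non-stream) cross-reference table like the one in the report.

```python
import re

# The cross-reference section quoted in the report.
SAMPLE = """xref
0 4
0000000000 65535 f
0000000009 00000 n
0000000059 00000 n
0000000118 00000 n
0000000119 00000 n
trailer
<< /Size 4
/Root 1 0 R >>
"""

def check_xref_subsections(xref_text):
    """Return (first_object, declared_count, actual_count) per subsection."""
    header_re = re.compile(r"^(\d+) (\d+)$")          # e.g. "0 4"
    entry_re = re.compile(r"^\d{10} \d{5} [fn]$")     # e.g. "0000000009 00000 n"
    results = []
    current = None
    for raw in xref_text.splitlines():
        line = raw.strip()
        if line == "trailer":
            break
        header = header_re.match(line)
        if header:
            current = [int(header.group(1)), int(header.group(2)), 0]
            results.append(current)
        elif current is not None and entry_re.match(line):
            current[2] += 1
    return [tuple(r) for r in results]

for first, declared, actual in check_xref_subsections(SAMPLE):
    if declared != actual:
        print("subsection at %d: declared %d entries, found %d" % (first, declared, actual))
```

For the table in the report this flags the single subsection as declaring 4 entries while containing 5.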
I needed to dump streams directly to file, e.g. extracting fonts from a PDF.
Attached is a patch which duplicates the 'stream' command, but accepts a
filename to output to rather than the console.
Original issue reported on code.google.com by [email protected]
on 11 Nov 2012 at 5:07
Attachments:
When processing certain files, peepdf crashes with the following error:
UnboundLocalError: local variable 'ret' referenced before assignment
The bug lies in the PDFFilters.py file in the decodeStream() function, line 92:
{{{
Traceback (most recent call last):
File "my_script.py", line 45, in <module>
ret, pdf = PDFCore.PDFParser().parse(filepath, True, True)
File "/home/travesti/peepdf_0.2/PDFCore.py", line 6727, in parse
ret = body.updateObjects()
File "/home/travesti/peepdf_0.2/PDFCore.py", line 4126, in updateObjects
object.resolveReferences()
File "/home/travesti/peepdf_0.2/PDFCore.py", line 2470, in resolveReferences
ret = self.decode()
File "/home/travesti/peepdf_0.2/PDFCore.py", line 2001, in decode
ret = decodeStream(self.encodedStream, self.filter.getValue(), self.filterParams)
File "/home/travesti/peepdf_0.2/PDFFilters.py", line 92, in decodeStream
return ret
UnboundLocalError: local variable 'ret' referenced before assignment
}}}
The exception is raised because the "ret" variable is never initialized in the
decodeStream() function. If none of the conditions is true, "ret" never gets a
value, the return statement is reached, and Python raises the
UnboundLocalError exception.
I patched the function by adding the following line at the beginning of the
decodeStream() function:
{{{
ret = (-1, "")
}}}
But it keeps raising errors in other modules :(
Original issue reported on code.google.com by [email protected]
on 8 Mar 2014 at 3:11
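To illustrate the pattern behind that one-line patch, here is a minimal sketch, assuming a `(status, payload)` return convention like the `(-1, "")` tuple above. `decode_stream` is a hypothetical stand-in and not the real `decodeStream()` from PDFFilters.py; it only knows two filters.

```python
import zlib

def decode_stream(data, filter_name):
    """Return (0, decoded_bytes) on success or (-1, error_message) on failure."""
    # Set the default before any branching, so an unknown filter can never
    # reach the return statement with "ret" unassigned.
    ret = (-1, "Unsupported filter: %s" % filter_name)
    if filter_name == "/FlateDecode":
        try:
            ret = (0, zlib.decompress(data))
        except zlib.error as exc:
            ret = (-1, "Flate error: %s" % exc)
    elif filter_name == "/ASCIIHexDecode":
        try:
            # ">" is the end-of-data marker for ASCIIHexDecode.
            text = data.decode("ascii").strip().rstrip(">")
            ret = (0, bytes.fromhex(text))
        except ValueError as exc:
            ret = (-1, "Hex error: %s" % exc)
    return ret
```

Callers still have to check the status code; returning a default only moves the failure from an exception to an error value, which may be why errors kept surfacing in other modules.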
What steps will reproduce the problem?
1. https://www.virustotal.com/en/file/784d1ebd1faccec27f98970cc266859eaf5676da1c451e3304fb55435d8c8473/analysis/
2. run peepdf.py -f vtfile
What is the expected output? What do you see instead?
#Expected:
Warning: PyV8 is not installed!!
Warning: pylibemu is not installed!!
Decryption error: Bad format for /O!!
Decryption error: Bad format for /U!!
Decryption error: Default user password not working here!!
File: tp_22340_utf8_88292d7181514fda5390292d73da28d4
MD5: 88292d7181514fda5390292d73da28d4
SHA1: fbc3856fd689e1ac0f8fb56bbd7d0a2b8332a928
Size: 807079 bytes
Version: 1.4
Binary: True
Linearized: False
Encrypted: True (RC4 40 bits)
Updates: 0
Objects: 7
Streams: 1
Comments: 0
Errors: 5
Version 0:
Catalog: 1
Info: No
Objects (7): [1, 2, 3, 4, 5, 8, 9]
Errors (1): [5]
Streams (1): [5]
Encoded (1): [5]
Decoding errors (1): [5]
Suspicious elements:
/AcroForm: [1]
/OpenAction: [1]
/JS: [1]
/JavaScript: [1]
#Instead see:
Traceback (most recent call last):
File "peepdf.py", line 352, in <module>
ret,pdf = pdfParser.parse(fileName, options.isForceMode, options.isLooseMode, options.isManualAnalysis)
File "/Users/tross/Code/satori/peepdf_service/peepdf-svn/PDFCore.py", line 6822, in parse
ret = pdfFile.decrypt()
File "/Users/tross/Code/satori/peepdf_service/peepdf-svn/PDFCore.py", line 5179, in decrypt
ret = computeUserPass(password, dictO, fileId, perm, keyLength, revision, encryptMetadata)
File "/Users/tross/Code/satori/peepdf_service/peepdf-svn/PDFCrypto.py", line 164, in computeUserPass
ret = computeEncryptionKey(userPassString, dictO, dictU, dictOE, dictUE, fileID, pElement, keyLength, revision, encryptMetadata)
File "/Users/tross/Code/satori/peepdf_service/peepdf-svn/PDFCrypto.py", line 58, in computeEncryptionKey
md5input = password + dictOwnerPass + struct.pack('<I',abs(int(pElement))) + fileID
TypeError: cannot concatenate 'str' and 'instance' objects
What version of the product are you using? On what operating system?
latest version from svn, any os
Please provide any additional information below.
When force mode is used and errors are encountered, the dictO/dictOwnerPass
object does not resolve to a simple string, which prevents further execution.
Original issue reported on code.google.com by [email protected]
on 5 Sep 2013 at 3:35
Attachments:
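One way to avoid the TypeError is to validate the operands before concatenating them. This is a minimal sketch and not the code in PDFCrypto.py: `build_md5_input` is a hypothetical helper, it assumes the string operands have already been reduced to `bytes`, and it reuses the `abs(int(...))` expression from the traceback as-is.

```python
import struct

def build_md5_input(password, owner_pass, p_element, file_id):
    """Return (0, joined_bytes) or (-1, error_message) instead of raising."""
    operands = (("password", password), ("owner_pass", owner_pass), ("file_id", file_id))
    for name, value in operands:
        if not isinstance(value, bytes):
            # An unresolved PDF object (or any other type) ends up here.
            return (-1, "Bad type for %s: %s" % (name, type(value).__name__))
    try:
        perm = struct.pack("<I", abs(int(p_element)))
    except (TypeError, ValueError, struct.error) as exc:
        return (-1, "Bad value for /P: %s" % exc)
    return (0, password + owner_pass + perm + file_id)
```

With a check like this, a malformed /O entry would surface as a decryption error message rather than a crash.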
peepdf will raise exception when opening the sample.pdf in attachment because
it does not handle key P in standard encryption dictionary properly. The
rc4.patch in attachment can fix this problem.
Original issue reported on code.google.com by czchen
on 21 Oct 2011 at 1:10
Attachments:
We have an automated malware analysis system that runs a variety of scans in
memory on input files. We patched PDFCore.py to enable string input of file
contents, rather than a filename. It is attached, in case anyone finds it
useful.
Original issue reported on code.google.com by [email protected]
on 22 Mar 2012 at 2:48
Attachments:
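The general idea of accepting either a filename or in-memory contents can be sketched as follows. This is not the attached patch: `read_pdf_source` is a hypothetical helper, and it assumes in-memory contents arrive as `bytes` or `bytearray`.

```python
import os

def read_pdf_source(source):
    """Return the raw PDF bytes for either in-memory contents or a file path."""
    if isinstance(source, (bytes, bytearray)):
        # Already the file contents; nothing to read from disk.
        return bytes(source)
    with open(os.fspath(source), "rb") as handle:
        return handle.read()
```

A parser built on a helper like this never needs to know whether the sample ever touched the disk.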
this is the error.log
Traceback (most recent call last):
File "./peepdf.py", line 541, in <module>
console.cmdloop()
File "/usr/lib/python2.7/cmd.py", line 142, in cmdloop
stop = self.onecmd(line)
File "/usr/lib/python2.7/cmd.py", line 219, in onecmd
return func(arg)
File "/usr/local/peepdf/PDFConsole.py", line 2721, in do_open
ret = pdfParser.parse(fileName, forceMode, looseMode)
File "/usr/local/peepdf/PDFCore.py", line 6838, in parse
sys.exit('Error: An error has occurred while parsing an indirect object!!')
SystemExit: Error: An error has occurred while parsing an indirect object!!
do you need other info?
thanks a lot
Original issue reported on code.google.com by [email protected]
on 23 Jun 2014 at 3:26
What steps will reproduce the problem?
1. Don't install PyV8
2. try to run peepdf.py on any pdf w/ js
What is the expected output? What do you see instead?
For the python to load.
Instead presented with this:
Traceback (most recent call last):
File "peepdf.py", line 32, in <module>
from PDFCore import PDFParser, vulnsDict
File "/Users/tross/Code/satori/peepdf_service/peepdf-svn/PDFCore.py", line 31, in <module>
from JSAnalysis import *
File "/Users/tross/Code/satori/peepdf_service/peepdf-svn/JSAnalysis.py", line 36, in <module>
class Global(PyV8.JSClass):
NameError: name 'PyV8' is not defined
What version of the product are you using? On what operating system?
any
Please provide any additional information below.
Placing the Global class in the try block will fix it, though there is
probably a better fix.
try:
    import PyV8
    JS_MODULE = True
    class Global(PyV8.JSClass):
        evalCode = ''
        def evalOverride(self, expression):
            self.evalCode += '\n\n// New evaluated code\n' + expression
            return
except:
    JS_MODULE = False
Original issue reported on code.google.com by [email protected]
on 5 Sep 2013 at 3:18
What steps will reproduce the problem?
1. running metadata in the console on a malformed PDF
What is the expected output? What do you see instead?
The program crashed with:
Traceback (most recent call last):
File "/home/.../bin/peepdf.py", line 465, in <module>
console.cmdloop(stats + newLine)
File "/usr/lib64/python2.6/cmd.py", line 142, in cmdloop
stop = self.onecmd(line)
File "/usr/lib64/python2.6/cmd.py", line 219, in onecmd
return func(arg)
File "/home/.../src/svn/sec/peepdf-read-only/PDFConsole.py", line 2290, in do_metadata
type = object.getElementByName('/Type').getValue()
AttributeError: 'list' object has no attribute 'getValue'
What version of the product are you using? On what operating system?
r158 from svn
Please provide any additional information below.
I don't know if the patch is the right long-term solution, but it solved my
crash.
Maybe every interactive command should be in a try/except block, so the program
does not crash on the user?
Original issue reported on code.google.com by [email protected]
on 30 Nov 2012 at 3:27
Attachments:
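The suggestion to guard every interactive command can be done in one place with the standard `cmd` module, by overriding `onecmd`. A minimal sketch, not peepdf's PDFConsole: `SafeConsole` and its `do_metadata` body are hypothetical and only reproduce the kind of failure shown in the traceback.

```python
import cmd

class SafeConsole(cmd.Cmd):
    prompt = "> "

    def onecmd(self, line):
        # Every do_* handler is dispatched through onecmd, so one try/except
        # here covers all commands.
        try:
            return cmd.Cmd.onecmd(self, line)
        except Exception as exc:
            self.stdout.write("*** Error: %s\n" % exc)
            return False  # keep the command loop running

    def do_metadata(self, arg):
        """Hypothetical command that fails like the report describes."""
        element = []  # a list where an object with getValue() was expected
        self.stdout.write(element.getValue())
```

The trade-off is that a blanket handler hides the traceback, so it would be worth logging it somewhere for bug reports.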
Add the ASCII85Decode filter to peepdf, using the decoder
from pdfminer.
Original issue reported on code.google.com by [email protected]
on 30 Nov 2012 at 2:49
Attachments:
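For reference, ASCII85 decoding can also be sketched with Python 3's standard library instead of the pdfminer decoder the patch uses. `ascii85_decode` is a hypothetical name; the sketch assumes the stream may or may not carry the `<~` prefix and ends at the `~>` marker.

```python
import base64

def ascii85_decode(data):
    """Decode an ASCII85-encoded PDF stream body to raw bytes."""
    data = data.strip()
    if data.startswith(b"<~"):
        data = data[2:]          # optional Adobe-style prefix
    end = data.find(b"~>")
    if end != -1:
        data = data[:end]        # drop the end-of-data marker and anything after it
    # a85decode skips ASCII whitespace and expands the "z" shortcut by default.
    return base64.a85decode(data)
```

Malformed input raises `ValueError`, so a caller would still need to turn that into a decoding error.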
What steps will reproduce the problem?
1. Have a PDF with /AAPL:Keywords and it will get flagged as /AA based on line
43 of PDFCore.py. By adding a space after each of the items on lines
43-45, e.g. '/AA ', you will still receive hits for legitimate Additional
Actions, but you will no longer receive false positive hits because something
else contains _part_ of the string being matched.
What is the expected output? What do you see instead?
Expected to flag only on the correct Event/Action/Element names but instead you
may receive false hits.
What version of the product are you using? On what operating system?
Version included in REMnux - checked the latest trunk version and it should
still be the same.
Please provide any additional information below.
pdfxray_lite also has this issue since it uses peepdf on the back end. However,
since it uses its own copy of PDFCore.py, that owner will be contacted
separately if this issue is accepted, as it will also need the slight change.
Original issue reported on code.google.com by [email protected]
on 11 Jun 2012 at 10:56
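A trailing space fixes the /AAPL:Keywords case but would miss a name that is followed directly by a delimiter, such as `/AA<<`. An alternative is to require that the match ends at PDF whitespace or a delimiter character. A minimal sketch, not the matching code in PDFCore.py: `find_pdf_names` is a hypothetical helper, and it ignores names written with `#xx` hex escapes.

```python
import re

# A PDF name ends at whitespace, at a delimiter, or at the end of the data.
_NAME_END = rb"(?=[\x00\t\n\x0c\r ()<>\[\]{}/%]|\Z)"

def find_pdf_names(data, names):
    """Return the names from `names` that occur in `data` as complete names."""
    found = []
    for name in names:
        pattern = re.compile(re.escape(name) + _NAME_END)
        if pattern.search(data):
            found.append(name)
    return found

print(find_pdf_names(b"<< /AAPL:Keywords (x) >>", [b"/AA"]))
print(find_pdf_names(b"<< /AA<< /O 5 0 R >> >>", [b"/AA"]))
```

The first call reports no hit and the second reports `/AA`, even though no space follows the name.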
When using PDFs containing PNG images with prediction > 10, the current
implementation only decodes part of the image (1/3 of each row of the image).
Luckily, I already found the problem and I will attach a patch with a possible
solution :)
Original issue reported on code.google.com by [email protected]
on 17 Sep 2013 at 9:55
Attachments:
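The report does not say what caused the 1/3 figure. One cause that would produce that symptom is sizing each row from /Columns alone while the image has three color components, but that is an assumption. The sketch below undoes PNG prediction with the row length derived from /Columns, /Colors and /BitsPerComponent. `undo_png_prediction` is a hypothetical name and this is not the attached patch.

```python
import math

def _paeth(a, b, c):
    """Paeth predictor: a = left, b = up, c = upper-left."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c

def undo_png_prediction(data, columns, colors=1, bits_per_component=8):
    """Reverse PNG row filters (predictor >= 10) and return the raw samples."""
    bpp = max(1, (colors * bits_per_component) // 8)            # bytes per pixel
    row_len = math.ceil(columns * colors * bits_per_component / 8)
    stride = row_len + 1                                         # +1 filter-type byte
    out = bytearray()
    prev = bytearray(row_len)                                    # row above the first is zeros
    for start in range(0, len(data) - stride + 1, stride):
        ftype = data[start]
        row = bytearray(data[start + 1:start + stride])
        for i in range(row_len):
            left = row[i - bpp] if i >= bpp else 0
            up = prev[i]
            up_left = prev[i - bpp] if i >= bpp else 0
            if ftype == 0:
                pred = 0
            elif ftype == 1:
                pred = left
            elif ftype == 2:
                pred = up
            elif ftype == 3:
                pred = (left + up) // 2
            elif ftype == 4:
                pred = _paeth(left, up, up_left)
            else:
                raise ValueError("Unknown PNG filter type: %d" % ftype)
            row[i] = (row[i] + pred) & 0xFF
        out += row
        prev = row
    return bytes(out)
```

With `colors=3` and 8 bits per component, each row is three times as long as `columns`, which is the part a decoder has to get right for RGB images.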
CVE-2013-3346 pdf samples have obfuscated Javascript code using jjencode
(http://utf-8.jp/public/jjencode.html). It would be nice to have a jjdecoder in
peepdf to quickly deobfuscate the code.
Sample jjdecoder written in Javascript can be found here:
http://csc.cs.utm.my/syed/images/files/jjdecode/jjdecode.html
Some explanation about how a jjdecoder works can be found here:
http://corkami.googlecode.com/svn-history/r399/trunk/misc/jjencode.txt
Original issue reported on code.google.com by [email protected]
on 12 Dec 2013 at 12:28
What steps will reproduce the problem?
1. Get this specially forged PDF:
https://www.virustotal.com/en-gb/file/be9c0025b99f0f8c55f448ba619ba303fc65eba862cac65a00ea83d480e5efec/analysis/
2. run peepdf -fi filename
3. run js_analysis object 6
What is the expected output? What do you see instead?
Run the JS code with PyV8.
Because there are XFA tags opening and closing, js emulation fails:
*** Error analysing Javascript: SyntaxError: Unexpected token < ( @ 1 : 0 )
-> <? xml version = "1.0"
What version of the product are you using? On what operating system?
Version: peepdf 0.2 r203
Ubuntu 12.10
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 18 Oct 2013 at 2:53
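One possible approach is to pull the script bodies out of the XFA markup before handing anything to the JavaScript engine. A minimal sketch under stated assumptions: `extract_xfa_scripts` is a hypothetical helper, it assumes the code sits inside `<script>` elements, and it uses a regular expression rather than a real XML parser, so it will not cope with every malformed sample.

```python
import html
import re

_SCRIPT_RE = re.compile(r"<script\b[^>]*>(.*?)</script\s*>", re.IGNORECASE | re.DOTALL)

def extract_xfa_scripts(xfa_xml):
    """Return the JavaScript bodies found in <script> elements of an XFA string."""
    scripts = []
    for body in _SCRIPT_RE.findall(xfa_xml):
        body = body.strip()
        if body.startswith("<![CDATA[") and body.endswith("]]>"):
            body = body[len("<![CDATA["):-len("]]>")]   # CDATA content is literal
        else:
            body = html.unescape(body)                   # turn &lt; etc. back into code
        body = body.strip()
        if body:
            scripts.append(body)
    return scripts
```

Only the extracted strings would then be passed to the emulator, so the leading `<?xml` declaration never reaches the JavaScript parser.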
What steps will reproduce the problem?
1. ./peepdf -i
2. create pdf
3. embed file
4. filters 4 lzw
5. save test.pdf
6. exit
7. ./peepdf -i test.pdf
8. peepdf shows decode error in object 4
What is the expected output? What do you see instead?
Peepdf shall encode/decode LZW filter successfully.
What version of the product are you using? On what operating system?
The peepdf version is r45
The python version is 2.7.2+
The operating system is ubuntu 11.10 x86_64
Please provide any additional information below.
The test.pdf cannot be decoded by other PDF tools such as origami-pdf.
Original issue reported on code.google.com by czchen
on 27 Oct 2011 at 12:38