
tnefparse's Introduction

tnefparse - TNEF decoding and attachment extraction

This is a pure-Python library for decoding Microsoft's Transport Neutral Encapsulation Format (TNEF), for Python versions 3.6+ and PyPy3. The last version to support Python 2 was 1.3.1. For more information on TNEF, see for example Wikipedia. The full TNEF specification is also available as a PDF download.

A tnefparse command-line utility is provided for listing contents of TNEF files, extracting attachments found inside them and so on:

usage: tnefparse [-h] [-o] [-a] [-z] [-p PATH] [-b] [-hb] [-rb]
                 [-l LEVEL] [-c] [-d]
                 file [file ...]

Extract TNEF file contents. Show this help message if no arguments are given.

positional arguments:
  file                  space-separated list of paths to the TNEF files

optional arguments:
  -h, --help             show this help message and exit
  -o, --overview         show (possibly long) overview of TNEF file contents
  -a, --attachments      extract attachments, by default to current dir
  -z, --zip              extract attachments into a single zip file, by default to current dir
  -p PATH, --path PATH   optional explicit path to extract attachments to
  -b, --body             extract the body to stdout
  -hb, --htmlbody        extract the HTML body to stdout
  -rb, --rtfbody         extract the RTF body to stdout
  -l LEVEL, --log LEVEL  set log level to DEBUG, INFO, WARN or ERROR
  -c, --checksum         calculate checksums (off by default)
  -d, --dump             extract a json dump of the tnef contents

The library can also be used as a basis for applications that need to parse TNEF. To parse a TNEF attachment, run e.g.:

>>> from tnefparse import TNEF
>>> with open("tests/examples/one-file.tnef", "rb") as tneffile:
...     tnefobj = TNEF(tneffile.read())

The parsed attachment contents are then available as TNEF object attributes:

  • signature - TNEF file signature
  • key - generated by TNEF enabled transports before using the TNEF implementation to generate a TNEF stream
  • codepage - a Windows code page string
  • objects - a collection of TNEFObject instances
  • attachments - a collection of TNEFAttachment instances
  • mapiprops - a collection of MAPI properties represented by TNEFMAPI_Attribute instances
  • body - message body (may contain both HTML and RTF)
  • htmlbody - a string containing just the HTML message body
  • rtfbody - just the RTF body

Some of the above properties may be empty, depending on what's contained in the attachment that was parsed.

Tests

To run the test suite, all you need is tox. tox will run all tests on all supported Python versions.

If you want to run the tests only for e.g. Python 3.8, just enter tox -e py38.

You also can run a subset of tests in a specific environment by invoking e.g. tox -e py38 -- -k test_cmdline.

With tox -e coverage you can generate a coverage report. The output will be shown in the terminal and an HTML coverage report will be generated in the htmlcov directory.

Contributing

Issues and pull requests are welcome. However, please always provide an example TNEF file that can be used to demonstrate the bug or desired behavior, if at all possible.

Note: If you have an understanding of TNEF and/or MIME internals, or just need this package and want to help with maintaining it, I am open to giving you commit rights. Just let me know.

tnefparse's People

Contributors

1nf0rmed, albrechtd, beercow, dmbaggett, eumiro, evomassiny, jugmac00, mlaferrera, petri, rosspatterson, styxman, venilton, wataru-chocola

tnefparse's Issues

decode_mapi exception

When using decode_mapi, I get the following error with the logger level set to DEBUG:

'int' object has no attribute '__getitem__'

After poking at the source to prevent it swallowing the stack trace of the error, it looks as though it's coming from:

length = ord(offset[0]) + (ord(offset[1]) << 8) + (ord(offset[2]) << 16) + (ord(offset[3]) << 24)

(That makes sense to me, given that offset is an int.)

Apologies, I don't know the subject matter well enough to propose a fix.

JSON dump fails on Windows

I tried doing a JSON dump and I got this error:

PS C:\Users\REMOVED\Downloads\test> tnefparse ./winmail.dat -d -l INFO
INFO:tnef-decode:Skipping checksum for performance
Traceback (most recent call last):
  File "C:\Users\REMOVED\AppData\Local\Programs\Python\Python37-32\Scripts\tnefparse-script.py", line 11, in <module>
    load_entry_point('tnefparse', 'console_scripts', 'tnefparse')()
  File "c:\users\REMOVED\projects\tnefparse\tnefparse\cmdline.py", line 89, in tnefparse
    print(json.dumps(t.dump(force_strings=True), sort_keys=True, indent=4))
  File "c:\users\REMOVED\projects\tnefparse\tnefparse\tnef.py", line 358, in dump
    attachment[att.name_str] = get_data(att)
  File "c:\users\REMOVED\projects\tnefparse\tnefparse\tnef.py", line 342, in get_data
    if force_strings and isinstance(a.data, bytes):
  File "c:\users\REMOVED\projects\tnefparse\tnefparse\mapi.py", line 205, in data
    return systime(self.raw_data)
  File "c:\users\REMOVED\projects\tnefparse\tnefparse\util.py", line 48, in systime
    return datetime.utcfromtimestamp((ft - EPOCH_AS_FILETIME) / HUNDREDS_OF_NANOSECONDS)
OSError: [Errno 22] Invalid argument

I did a debug of the ft, EPOCH_AS_FILETIME, and HUNDREDS_OF_NANOSECONDS variables as well as the end result and here is the output:

PS C:\Users\REMOVED\Downloads\test> tnefparse ./winmail.dat -d -l DEBUG
INFO:tnef-decode:Skipping checksum for performance
DEBUG:tnef-decode:ft: 915151392000000000
DEBUG:tnef-decode:EPOCH_AS_FILETIME: 116444736000000000
DEBUG:tnef-decode:HUNDREDS_OF_NANOSECONDS: 10000000
DEBUG:tnef-decode:END RESULT: 79870665600.0
Traceback (most recent call last):
  File "C:\Users\REMOVED\AppData\Local\Programs\Python\Python37-32\Scripts\tnefparse-script.py", line 11, in <module>
    load_entry_point('tnefparse', 'console_scripts', 'tnefparse')()
  File "c:\users\REMOVED\projects\tnefparse\tnefparse\cmdline.py", line 89, in tnefparse
    print(json.dumps(t.dump(force_strings=True), sort_keys=True, indent=4))
  File "c:\users\REMOVED\projects\tnefparse\tnefparse\tnef.py", line 358, in dump
    attachment[att.name_str] = get_data(att)
  File "c:\users\REMOVED\projects\tnefparse\tnefparse\tnef.py", line 342, in get_data
    if force_strings and isinstance(a.data, bytes):
  File "c:\users\REMOVED\projects\tnefparse\tnefparse\mapi.py", line 205, in data
    return systime(self.raw_data)
  File "c:\users\REMOVED\projects\tnefparse\tnefparse\util.py", line 48, in systime
    return datetime.utcfromtimestamp((ft - EPOCH_AS_FILETIME) / HUNDREDS_OF_NANOSECONDS)
OSError: [Errno 22] Invalid argument

So it looks like the datetime.utcfromtimestamp function can't take the value of 79870665600.0 which I was able to confirm on my local terminal:

>>> print(datetime.utcfromtimestamp(79870665600.0))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 22] Invalid argument

Is the math just being done wrong on Windows? I was able to patch it by changing HUNDREDS_OF_NANOSECONDS from 10000000 to 1000000000. This at least got the function running but I can't confirm that the dates are correct at this time.

Thanks!
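
For reference, FILETIME values count 100-nanosecond ticks since 1601-01-01 UTC, so a divisor of 10000000 ticks per second is the right one, and changing it would shift every decoded date. One way to avoid the platform-dependent range limit of datetime.utcfromtimestamp is plain timedelta arithmetic. The following is only a sketch of that idea, not tnefparse's actual code; filetime_to_datetime and FILETIME_EPOCH are hypothetical names.

```python
from datetime import datetime, timedelta

# FILETIME counts 100-nanosecond ticks since 1601-01-01 00:00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1)


def filetime_to_datetime(ft: int) -> datetime:
    """Convert a FILETIME tick count to a naive UTC datetime.

    Pure timedelta arithmetic does not go through the C library's time
    functions, so their platform-specific range limits do not apply.
    """
    # Ten ticks of 100 ns make one microsecond.
    return FILETIME_EPOCH + timedelta(microseconds=ft // 10)


# The ft value from the debug log above.
print(filetime_to_datetime(915151392000000000))  # 4501-01-01 00:00:00
```

The logged value decodes to 1 January 4501, which suggests a "no date set" placeholder rather than a real timestamp. Values beyond year 9999 would still raise OverflowError, so a caller may want to catch that as well.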

use type hints & mypy

There's a lot of byte & string handling and so on that are easy to get wrong accidentally. Having some type hints and mypy checking them as part of tests would help catch errors & increase code quality.

improve (fix) tnefparse logging

Currently, logging in tnefparse is a bit of a mess and should be improved.

This issue is to gather input for that. If you don't like how tnefparse logs warnings etc. please submit a proposal under this issue.

Using `range` in Python 2 causes memory issues

While parsing some corrupted TNEF files the code below can hang and use large amounts of memory:

    num_vals = bytes_to_int(data[offset:offset+4]); offset += 4
    # logger.debug("Number of values: %i" % num_vals)
    attr_data = []
    for j in range(num_vals):

In my example num_vals is for some reason 3666903040, and the parser uses large amounts of memory until the range(num_vals) call has been evaluated. On machines with a low amount of RAM, the parser just raises MemoryError after a while.

The solution would be to use xrange for those loops if the script is running on Python 2.
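
On Python 3, range is already lazy, so the memory spike does not occur there, but a corrupt count can still keep the loop busy for a very long time. A possible extra guard is to reject counts that cannot fit in the remaining data. The sketch below uses a hypothetical helper, read_value_count, which is not part of tnefparse, and assumes a little-endian 32-bit count and at least four bytes per value.

```python
import struct


def read_value_count(data: bytes, offset: int, min_value_size: int = 4):
    """Read a little-endian uint32 count and sanity-check it.

    Returns (count, new_offset). min_value_size is an assumed lower bound
    on the encoded size of a single value.
    """
    (num_vals,) = struct.unpack_from("<I", data, offset)
    offset += 4
    remaining = len(data) - offset
    # A count that needs more bytes than are left cannot be genuine.
    if num_vals * min_value_size > remaining:
        raise ValueError(
            f"value count {num_vals} does not fit in {remaining} remaining bytes"
        )
    return num_vals, offset
```

With a check like this, the count 3666903040 from the example would be rejected immediately instead of being iterated over.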

speed bottleneck within decode_mapi

Hello,

Today I had a problem with a customer's TNEF mail. It took around 150 ms to call decode_mapi. Because it is called several times within tnefparse and within my code, it took around 2500 ms to handle a request for this file.

I searched for the bottleneck with cProfile and found the problem:

In decode_mapi, the variable num_vals is an int larger than several million, and on Python 2 range() builds a list that big.
Solution: do not build this big list; just use an iterator: xrange()

Speed change: 2500ms > 60ms

TNEFMAPI_Attribute for attachment content-id

In HTML mail, embedded images are referenced by content-id. At the moment I cannot find this value in the TNEFMAPI_Attribute class, so I searched in mapi_attrs. The content-id attribute has the name '14098' => 0x3712.

Should 0x3712 be added to the TNEFMAPI_Attribute class?
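
The decimal name and the hexadecimal property ID are the same number, which is easy to verify. A small sketch (the variable names are for illustration only):

```python
# The attribute name as it appears in mapi_attrs, taken from the text above.
name = "14098"

# Convert the decimal string to an int and render it as four hex digits.
prop_id = int(name)
print(f"0x{prop_id:04X}")  # 0x3712
```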

Inconsistent line endings

Can we adopt some code standards?

In particular I'd like to suggest:

  • unix line endings are used consistently
  • 4 spaces for the default indent

More aggressively we could use something like black to auto-format the style.

nullbytes in data

I am trying to get some TNEF data, but at the moment I have to remove null bytes myself.
Is there another way to get the same information without null bytes, or is this a bug?

    for tnef_attachment in tnef.attachments:
        for mapi_attr in tnef_attachment.mapi_attrs:
            if mapi_attr.name == tnefparse.TNEFMAPI_Attribute.MAPI_ATTACH_MIME_TAG:
                mimetype = mapi_attr.data[0].rstrip(b'\x00')

Should we rename PID_TAG_ constants?

PidTag or PidLid are the naming conventions MS uses, so I used the same when adding a bunch of properties outside the core MAPI types.

I'm thinking though, we just call everything MAPI_ for consistency. We'll want to decide this before publishing 1.3.

ValueError has no attribute message

tnefparse/cmdline.py:68: error: "ValueError" has no attribute "message"

This is Python 2 code.

The fix is easy -> use str(e) instead. The test is a bit more elaborate - when there is no suitable test file.

This bug was uncovered only now, as the line has no test coverage.

see

try:
    t = TNEF(tfp.read(), do_checksum=args.checksum)
except ValueError as exc:
    sys.exit(exc.message)
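
On Python 3, exceptions no longer carry a message attribute, so the text has to be obtained with str(). A self-contained sketch of the idea follows; parse is a stand-in for the real TNEF constructor, and both function names are hypothetical.

```python
def parse(data: bytes) -> bytes:
    """Stand-in for a parser that rejects bad input (hypothetical)."""
    if not data:
        raise ValueError("no data to parse")
    return data


def error_text(data: bytes) -> str:
    """Return the parser's error message, or an empty string on success."""
    try:
        parse(data)
    except ValueError as exc:
        # str(exc) works on Python 2 and 3; exc.message is Python 2 only.
        return str(exc)
    return ""


print(error_text(b""))  # no data to parse
```

Applied to the snippet above, the last line would become sys.exit(str(exc)).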

Drop support for Python 2.7 (and 3.5)

The build currently breaks on Python 2.7, as a transitive dep is no longer Python 2.7 compatible.

While we could pin the dependency, maybe it is about time to move forward?

How to recognise inline attachment

Hi,

Using this library I am able to extract all the attachments from winmail.dat. My challenge is to differentiate inline attachments (attachments that are displayed directly within the email message body) from normal attachments.

Is there a way to recognize the inline attachments? Could you please help me with that?

Many Thanks,
Suresh.

Logging logger not registered when used as module

It looks like tnef.py, when used as a library rather than from the command line, does not initialize logging with logging.basicConfig(), so when I import it and there is any issue, the logger prints output like
No handlers could be found for logger "tnef-decode"

I recommend changing __init__.py to initialize logging. Currently this seems to happen only in cmdline.py, which is useful only when using the command line. It would be helpful for it to happen earlier, so that importing tnefparse works without this error!

Here is my recommended diff for __init__.py:


*** 1,4 ****
--- 1,6 ----
  import warnings

+ import logging
+ logging.basicConfig()
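
Calling logging.basicConfig() at import time would configure the root logger of every application that imports the package. The pattern the Python logging documentation suggests for libraries is to attach a NullHandler to the library's own logger and leave the real configuration to the application. A minimal sketch, assuming the logger name "tnef-decode" shown in the message above:

```python
import logging

# Library side: register a do-nothing handler so that Python 2 does not
# print "No handlers could be found" when the application has not
# configured logging itself.
logger = logging.getLogger("tnef-decode")
logger.addHandler(logging.NullHandler())

# Application side (optional): opt in to seeing the library's messages.
# logging.basicConfig(level=logging.WARNING)
```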

Cannot pull ics file

Python: 3.5.3
tnefparse installed from pip
Windows 10

I'm trying to parse a winmail.dat file from an appointment invite email. Using lookout (fix version) in Thunderbird I can see an rtf file, the ics file and a vcf file. Using:

    for part in msg.walk():
        if part.get_content_type() == 'application/ms-tnef':
            tnef = TNEF( part.get_payload( decode=True ) )
            for attachment in tnef.attachments:
                print(attachment.name)

I get this output:

WARNING:tnef-decode:Unknown TNEF Object: <TNEFObject 'TNEF Version'>
WARNING:tnef-decode:Unknown TNEF Object: <TNEFObject 'OEM Codepage'>
WARNING:tnef-decode:Unknown TNEF Object: <TNEFObject 'Message ID'>
WARNING:tnef-decode:Unknown TNEF Object: <TNEFObject 'Priority'>
WARNING:tnef-decode:Unknown TNEF Object: <TNEFObject 'Date Sent'>
WARNING:tnef-decode:Unknown TNEF Object: <TNEFObject 'Date Modified'>
WARNING:tnef-decode:Unknown TNEF Object: <TNEFObject 'Message Class'>
WARNING:tnef-decode:Unknown TNEF Object: <TNEFObject 'Subject'>
WARNING:tnef-decode:Unknown TNEF Object: <TNEFObject 'Date Start'>
WARNING:tnef-decode:Unknown TNEF Object: <TNEFObject 'Date End'>
WARNING:tnef-decode:Unknown TNEF Object: <TNEFObject 'Owner Appointment ID'>
WARNING:tnef-decode:Unknown TNEF Object: <TNEFObject 'Response Requested'>

It looks like the calendar is being parsed as TNEF Objects. Any advice?

converting winmail.dat tnef mapi calendar invitation to .ics file

First of all thanks a lot for tnefparse, it rocks! It seems to be the only open-source util/library that has so far managed to parse this winmail.dat file I have here.

My question is: are there any plans to support generating .ics files from winmail.dat MAPI calendar invitations? For example MS Exchange 2010 seems to send such winmail.dat files (and some organizations still seem to use such old servers). Based on some googling, newer Exchange servers and/or Outlook clients might send such winmail.dat invitations as well.

If I understand correctly, such a winmail.dat file does not include any real .ics attachments; instead the calendar invitation is included in the MAPI fields/objects/properties, like this:

$ tnefparse -o winmail.dat  -b

Overview of winmail.dat:

  Attachments:


  Objects:

    TNEF Version
    OEM Codepage
    Message ID
    Priority
    Date Sent
    Date Modified
    Message Class
    Subject
    Date Start
    Date End
    Owner Appointment ID
    Response Requested
    MAPI Properties

  Properties:

    MAPI_TNEF_CORRELATION_KEY
    MAPI_SENT_REPRESENTING_NAME
    MAPI_SENT_REPRESENTING_EMAIL_ADDRESS
    MAPI_SENT_REPRESENTING_ADDRTYPE
    MAPI_SENT_REPRESENTING_ENTRYID
    MAPI_SENT_REPRESENTING_SMTP_ADDRESS
    MAPI_SIP_ADDRESS
    MAPI_SENDER_NAME
    MAPI_SENDER_EMAIL_ADDRESS
    MAPI_SENDER_ADDRTYPE
    MAPI_SENDER_ENTRYID
    MAPI_SENDER_SMTP_ADDRESS 
    MAPI_SEND_RICH_INFO
    MAPI_MESSAGE_CLASS
    MAPI_MESSAGE_LOCALE_ID   
    MAPI_SEND_RICH_INFO
    MAPI_MESSAGE_CODEPAGE
    MAPI_SEARCH_KEY
    MAPI_IMPORTANCE
    MAPI_CLIENT_SUBMIT_TIME  
    MAPI_LAST_MODIFICATION_TIME
    MAPI_OWNER_APPT_ID
    MAPI_START_DATE
    MAPI_END_DATE
    MAPI_READ_RECEIPT_REQUESTED
    MAPI_ORIGINATOR_DELIVERY_REPORT_REQUESTED
    MAPI_LAST_MODIFIER_NAME  
    MAPI_REPLY_REQUESTED
    MAPI_RESPONSE_REQUESTED  
    MAPI_SUBJECT
    MAPI_SUBJECT_PREFIX
    MAPI_SENSITIVITY
    MAPI_ORIGINAL_SENSITIVITY
    MAPI_CONVERSATION_INDEX  
    MAPI_CONVERSATION_TOPIC  
    MAPI_SMTP_MESSAGE_ID
    MAPI_INTERNET_CODEPAGE   
    MAPI_ICON_INDEX
    MAPI_CREATION_TIME
    MAPI_ALTERNATE_RECIPIENT_ALLOWED
    MAPI_PRIORITY
    MAPI_RECIPIENT_REASSIGNMENT_PROHIBITED
    MAPI_NON_RECEIPT_NOTIFICATION_REQUESTED
    MAPI_TARGET_ENTRY_ID
    MAPI_CONVERSATION_ID
    MAPI_CREATOR_NAME
    MAPI_CREATOR_ADDRESS_TYPE
    MAPI_CREATOR_EMAIL_ADDRESS
    MAPI_SENDER_SIMPLE_DISPLAY_NAME
    MAPI_SENT_REPRESENTING_SIMPLE_DISPLAY_NAME
    MAPI_CREATOR_SIMPLE_DISP_NAME
    MAPI_LAST_MODIFIER_SIMPLE_DISPLAY_NAME
    MAPI_INTERNET_MAIL_OVERRIDE_FORMAT
    MAPI_MESSAGE_EDITOR_FORMAT
    MAPI_STORE_SUPPORT_MASK  
<body text goes here>

Flake8

Flake8 is currently only used in GitHub actions.

I'd like to introduce a tox environment for Flake8.

Currently, there would be 78 Flake8 errors when I apply very strict rules.

While most of the errors are pretty straightforward to fix, before I start I wanted to know your take on it in general, and especially on E221 (multiple spaces before operator).

E221 would mark the following code as erroneously formatted:

SZMAPI_UNSPECIFIED             = 0x0000  # MAPI Unspecified
SZMAPI_NULL                    = 0x0001  # MAPI null property
SZMAPI_SHORT                   = 0x0002  # MAPI short (signed 16 bits)
SZMAPI_INT                     = 0x0003  # MAPI integer (signed 32 bits)
SZMAPI_FLOAT                   = 0x0004  # MAPI float (4 bytes)
SZMAPI_DOUBLE                  = 0x0005  # MAPI double
SZMAPI_CURRENCY                = 0x0006  # MAPI currency (64 bits)
SZMAPI_APPTIME                 = 0x0007  # MAPI application time
SZMAPI_ERROR                   = 0x000A  # MAPI error (32 bits)
SZMAPI_BOOLEAN                 = 0x000B  # MAPI boolean (16 bits)
SZMAPI_OBJECT                  = 0x000D  # MAPI embedded object
SZMAPI_INT8BYTE                = 0x0014  # MAPI 8 byte signed int
SZMAPI_STRING                  = 0x001E  # MAPI string
SZMAPI_UNICODE_STRING          = 0x001F  # MAPI unicode-string (null terminated)
SZMAPI_SYSTIME                 = 0x0040  # MAPI time (64 bits)
SZMAPI_CLSID                   = 0x0048  # MAPI OLE GUID
SZMAPI_BINARY                  = 0x0102  # MAPI binary
SZMAPI_BEATS_THE_HELL_OUTTA_ME = 0x0033
MULTI_VALUE_FLAG               = 0x1000
GUID_EXISTS_FLAG               = 0x8000

So, before I start: is applying Flake8 welcome? Is it OK to re-format the above code? If not, do you prefer noqa annotations per line, or ignoring E221 altogether?

Is it ok to enforce Flake8 via Travis / GitHub actions?

tnefparse utility issue

I tried running the tnefparse utility prior to including this in my own script, but I am receiving the following errors:

[/home/lwapnitsky/git/tnefparse]% tnefparse (10:02 - 12-05-29)
Traceback (most recent call last):
  File "/usr/local/bin/tnefparse", line 9, in <module>
    load_entry_point('tnefparse==1.1', 'console_scripts', 'tnefparse')()
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 337, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 2279, in load_entry_point
    return ep.load()
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 1989, in load
    entry = __import__(self.module_name, globals(),globals(), ['__name__'])
ImportError: No module named tnef.cmdline

This was installed using python setup.py install using python 2.7 on Debian Wheezy

htmlbody for Hebrew text (iso-8859-8-i encoding) is incorrect

iso-8859-8-i.LOG

The htmlbody of a tnefobject containing Hebrew text (iso-8859-8-i encoding) is incorrect. Attached is an email example iso-8859-8-i.LOG [LOG extension only because github does not allow eml] to demonstrate the issue.

>>> fname = 'iso-8859-8-i.LOG'
>>> import email
>>> mime_msg = email.message_from_file(open(fname))
>>> tnef_parsed_content = mime_msg.get_payload()[-1].get_payload(decode=True)
>>> from tnefparse import TNEF
>>> tnefobj = TNEF(tnef_parsed_content)
>>> htmlbody = tnefobj.htmlbody
>>> htmlbody
u'<html>\r\n<head>\r\n<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-8-i">\r\n<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>\r\n</head>\r\n<body dir="ltr">\r\n<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255);">\r\n<a href="https://www.walla.co.il/" id="LPlnk">https://www.walla.co.il/</a><br>\r\n</div>\r\n<div class="_Entity _EType_OWALinkPreview _EId_OWALinkPreview _EReadonly_1">\r\n<div id="LPBorder_GTaHR0cHM6Ly93d3cud2FsbGEuY28uaWwv" class="LPBorder510319" style="width: 100%; margin-top: 16px; margin-bottom: 16px; position: relative; max-width: 800px; min-width: 424px;">\r\n<table id="LPContainer510319" role="presentation" style="padding: 12px 36px 12px 12px; width: 100%; border-width: 1px; border-style: solid; border-color: rgb(200, 200, 200); border-radius: 2px;">\r\n<tbody>\r\n<tr valign="top" style="border-spacing: 0px;">\r\n<td>\r\n<div id="LPImageContainer510319" style="position: relative; margin-right: 12px; height: 135px; overflow: hidden; width: 240px;">\r\n<a target="_blank" id="LPImageAnchor510319" href="https://www.walla.co.il/"><img id="LPThumbnailImageId510319" alt="" height="135" width="240" style="display: block;" src="https://img.wcdn.co.il/f_auto,q_auto,w_1200,t_54/3/1/3/6/3136860-46.jpg"></a></div>\r\n</td>\r\n<td style="width: 100%;">\r\n<div id="LPTitle510319" style="font-size: 21px; font-weight: 300; margin-right: 8px; font-family: wf_segoe-ui_light, &quot;Segoe UI Light&quot;, &quot;Segoe WP Light&quot;, &quot;Segoe UI&quot;, &quot;Segoe WP&quot;, Tahoma, Arial, sans-serif; margin-bottom: 12px;">\r\n<a target="_blank" id="LPUrlAnchor510319" href="https://www.walla.co.il/" style="text-decoration: none; color: var(--themePrimary);">\xe5\xe5\xe0\xec\xe4! 
- \xe4\xe0\xfa\xf8 \xe4\xee\xe5\xe1\xe9\xec \xe1\xe9\xf9\xf8\xe0\xec - \xf2\xe3\xeb\xe5\xf0\xe9\xed \xee\xf1\xe1\xe9\xe1 \xec\xf9\xf2\xe5\xef</a></div>\r\n<div id="LPDescription510319" style="font-size: 14px; max-height: 100px; color: rgb(102, 102, 102); font-family: wf_segoe-ui_normal, &quot;Segoe UI&quot;, &quot;Segoe WP&quot;, Tahoma, Arial, sans-serif; margin-bottom: 12px; margin-right: 8px; overflow: hidden;">\r\n\xe5\xe5\xe0\xec\xe4!- \xe4\xe0\xfa\xf8 \xe4\xf4\xe5\xf4\xe5\xec\xf8\xe9 \xe1\xe9\xf9\xf8\xe0\xec. \xe7\xe3\xf9\xe5\xfa \xf2\xe3\xeb\xf0\xe9\xe5\xfa 24/7, \xf2\xf9\xf8\xe5\xfa \xf2\xf8\xe5\xf6\xe9 \xfa\xe5\xeb\xef \xe5\xee\xe9\xe3\xf2 \xee\xe5\xe1\xe9\xec\xe9\xed, \xf9\xe9\xf8\xe5\xfa \xe3\xe5\xe0\xf8 \xe0\xec\xf7\xe8\xf8\xe5\xf0\xe9 \xec\xec\xe0 \xe4\xe2\xe1\xec\xfa \xf0\xf4\xe7, \xf9\xe9\xf8\xe5\xfa\xe9 \xf7\xf0\xe9\xe5\xfa \xe5\xfa\xe9\xe9\xf8\xe5\xfa \xe1\xe0\xfa\xf8 walla!</div>\r\n<div id="LPMetadata510319" style="font-size: 14px; font-weight: 400; color: rgb(166, 166, 166); font-family: wf_segoe-ui_normal, &quot;Segoe UI&quot;, &quot;Segoe WP&quot;, Tahoma, Arial, sans-serif;">\r\nwww.walla.co.il</div>\r\n</td>\r\n</tr>\r\n</tbody>\r\n</table>\r\n</div>\r\n</div>\r\n<br>\r\n</body>\r\n</html>\r\n'
>>> htmlbody[1795:1799]
u'\xe5\xe5\xe0\xec'
>>>

This seems wrongly decoded while constructing the htmlbody as the part in 1795:1799 is not unicode. Using this htmlbody to create a text/html part fails while setting the payload with content charset.

>>> from email.charset import Charset, QP
>>> from email.mime.nonmultipart import MIMENonMultipart
>>> charset_name = 'iso-8859-8'  # text/plain content type is 'iso-8859-8-i' which is mapped to 'iso-8859-8'
>>> cs = Charset(charset_name)
>>> cs.body_encoding = QP
>>> text_body_part = MIMENonMultipart('text', 'html', charset=charset_name)
>>> text_body_part.set_payload(htmlbody, charset=cs)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/email/message.py", line 226, in set_payload
    self.set_charset(charset)
  File "/usr/local/lib/python2.7/email/message.py", line 262, in set_charset
    self._payload = self._payload.encode(charset.output_charset)
  File "/usr/local/lib/python2.7/encodings/iso8859_8.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 1795-1799: character maps to <undefined>
>>> htmlbody[1795:1799]
u'\xe5\xe5\xe0\xec'
>>>

The above code snippets were run with version 1.3.1 on an Ubuntu machine.

The issue seems to exist in versions 1.3.1 and 1.4.0 (I didn't check earlier ones, but I think they have the bug too).
It seems the bug is here: https://github.com/koodaamo/tnefparse/blob/master/tnefparse/mapi.py#L152-L155
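
The symptom is consistent with the bytes having been decoded as Latin-1 (or a similar single-byte Western code page) instead of ISO-8859-8. As an illustration only, and not a fix for the library, the four characters shown above can be round-tripped back to Hebrew:

```python
# The slice of htmlbody reported above.
mojibake = "\xe5\xe5\xe0\xec"

# Latin-1 maps code points 0-255 straight back to the original bytes.
raw = mojibake.encode("latin-1")

# Decode those bytes with the charset the HTML declares. As noted above,
# 'iso-8859-8-i' is mapped to 'iso-8859-8' here.
hebrew = raw.decode("iso-8859-8")
print(hebrew)
```

If this reading is right, decoding the raw body bytes with the message's declared charset, rather than a Western default, would avoid the problem at the source.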

unable to extract content from winmail.dat (Mail is a read receipt)

Hi @petri ,

I am trying to get the mailbody of a read receipt by parsing the winmail.dat file, but I am not able to get it from tnefparse.

This is the code I am using ...

from tnefparse import TNEF
TNEFObject = TNEF(open(filename, 'rb').read(), do_checksum=True) #filename = "winmail.dat"

mailBody = getattr(TNEFObject, "rtfbody")
if mailBody is None:
    logger.debug("RTF Body not found")
    mailBody = getattr(TNEFObject, "htmlbody")
    if mailBody is None:
        logger.debug("HTML Body not found")
        mailBody = getattr(TNEFObject, "body")
        if mailBody is None:
            logger.debug("Mail Body not found")
        else:
            logger.debug("Found the mail mailBody from TNEF file")
    else:
        logger.debug("Found the HTML mailBody from TNEF file")
else:
    logger.debug("Found the rich text mailBody from TNEF file")

Attached the winmail.dat file for your reference. Could you please help me to check what is the issue?

Note: I am able to get the body of a delivery report email without any issue using the tnefparse.

Many Thanks,
Suresh.

travis-ci.org shuts down at the end of the year

Repositories must be migrated to travis-ci.com

As only owners can do this...

Please

  • log in to travis-ci.com
  • go to settings
  • (for organizations you need to select the org name, for private accounts I am not sure)
  • hit the migrate tab
  • select repos and migrate them

Maybe you need to click the "sign up for beta" button first.

modifying tnef file

I was wondering whether it would be feasible to use this package to easily update the contents of a TNEF file, for example just modifying the HTML body.

TNEF.__str__ is broken

Traceback (most recent call last):
  File "/Users/petri/Code/koodaamo/tnefparse/tests/utest.py", line 5, in <module>
    print(TNEF(tf.read()))
  File "/Users/petri/Code/koodaamo/tnefparse/tnefparse/tnef.py", line 340, in __str__
    return f"<{self.__class__.__name__}:0x{self.key:2.2x}{atts}>"
ValueError: Precision not allowed in integer format specifier
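
The format spec 2.2x carries a precision (.2), which Python rejects for integers. If the intent was a zero-padded hexadecimal key, which is an assumption here, a width with a zero fill does the job. A sketch with a made-up key value:

```python
key = 0x0A  # made-up example value, not taken from a real TNEF file

# A precision in an integer format spec raises ValueError at runtime.
try:
    f"{key:2.2x}"
except ValueError as exc:
    print(exc)

# Zero-fill to at least two hex digits instead.
print(f"0x{key:02x}")  # 0x0a
```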

switch to "plain" pytest

We have old pytest integration with a setup.py subcommand and a versioned "runtests.py". This is deprecated and giving some trouble on Python2.

Suggest we remove this and just use pytest from now on.

Make it possible to give cmdline a directory to extract attachments to

This is mostly motivated by testing; it's rather annoying to always find the extracted files in the current directory.

Gonna reshuffle the cmdline params a bit for this; will use -o | --output as the param and make overview use -c | --contents from now on.

See tnefparse --help for revised usage.

WARNING Unknown TNEF Object

thank you for this library.

All works well.

But we get these warning messages.

Are these really warnings? What can we do to suppress these warnings?

Maybe it is not important and no logging in tnefparse would be the best solution.

2014-04-18 11:55:22 tnef-decode: WARNING  Unknown TNEF Object: <TNEFObject 'TNEF Version'>
2014-04-18 11:55:22 tnef-decode: WARNING  Unknown TNEF Object: <TNEFObject 'OEM Codepage'>
2014-04-18 11:55:22 tnef-decode: WARNING  Unknown TNEF Object: <TNEFObject 'Message ID'>
2014-04-18 11:55:22 tnef-decode: WARNING  Unknown TNEF Object: <TNEFObject 'Priority'>
2014-04-18 11:55:22 tnef-decode: WARNING  Unknown TNEF Object: <TNEFObject 'Date Sent'>
2014-04-18 11:55:22 tnef-decode: WARNING  Unknown TNEF Object: <TNEFObject 'Date Modified'>
2014-04-18 11:55:22 tnef-decode: WARNING  Unknown TNEF Object: <TNEFObject 'Message Class'>
2014-04-18 11:55:22 tnef-decode: WARNING  Unknown TNEF Object: <TNEFObject 'Subject'>
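
Independent of any change in the library, an application can filter these messages itself with the standard logging module. A minimal sketch, assuming the logger name "tnef-decode" that appears in the output above (the name may differ between versions):

```python
import logging

# Only let ERROR and above through from the tnefparse logger; the rest of
# the application's logging is unaffected.
tnef_logger = logging.getLogger("tnef-decode")
tnef_logger.setLevel(logging.ERROR)
```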

new release

Hi there, first thank you for this incredibly useful tool. I really love the JSON export feature but just realized it's not released in the main pip install. Any tips on when to expect it?

TNEF is throwing an error for python version 3.7.3

Hi Team,

Such a great and wonderful tool to parse TNEF files. I was using Python 2.7 and it was working excellently; a few days back I upgraded to 3.7.3 and now I am seeing this exception.

Traceback (most recent call last):
  File "TNEFParser.py", line 99, in readTNEFFiles
    t = TNEF(open(filename).read(), do_checksum=True) ## filename is winmail.dat
  File "/usr/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9f in position 1: invalid start byte

Could you please help me to understand the issue?
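
The traceback comes from open(filename) reading the file in text mode, which on Python 3 tries to decode the bytes as UTF-8. TNEF data is binary, so the file needs to be opened with "rb". The sketch below is self-contained: instead of a real winmail.dat it uses a temporary file holding only the four TNEF signature bytes.

```python
import os
import struct
import tempfile

# The TNEF signature 0x223E9F78, stored little-endian: 78 9f 3e 22.
SIGNATURE_BYTES = bytes.fromhex("789f3e22")

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(SIGNATURE_BYTES)
    path = tmp.name

try:
    # Text mode fails like the report: byte 0x9f at position 1.
    try:
        with open(path, encoding="utf-8") as fh:
            fh.read()
        text_mode_error = None
    except UnicodeDecodeError as exc:
        text_mode_error = exc

    # Binary mode returns the raw bytes untouched.
    with open(path, "rb") as fh:
        data = fh.read()
finally:
    os.remove(path)

(signature,) = struct.unpack("<I", data)
print(hex(signature))  # 0x223e9f78
```

For the reported code, this means passing open(filename, "rb").read() to TNEF().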

Support Python 3.9

Just ran tox -e py39 without issues. Should be a matter of adding metadata, documenting and including in builds.

Push new version to PyPI

Hi! Thanks for building this library; it's very handy 😄.

Do you think that you'd be able to push a new build to PyPI? It would be great to be able to pip install (without linking to a GitHub revision) the code that has the latest fixes.

crash with message "struct.error: unpack_from requires a buffer of at least 4 bytes"

Hi,

When I try to extract the winmail.dat using the TNEF library I am seeing the below crash.
Could you please help me to have a look.

 TNEFObject = TNEF(open(filename, 'rb').read(), do_checksum=True) 
  File "/usr/local/lib/python3.7/dist-packages/tnefparse/tnef.py", line 201, in __init__
    self.signature = uint32(data)
  File "/usr/local/lib/python3.7/dist-packages/tnefparse/util.py", line 20, in unpack
    return call(byte_arr, offset)[0]
struct.error: unpack_from requires a buffer of at least 4 bytes
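
The error indicates that the data passed to TNEF() is shorter than the four-byte signature, for example because the file is empty. A caller can check for that before parsing. A sketch with a hypothetical helper, looks_like_tnef, which is not part of tnefparse:

```python
import struct

# Standard TNEF signature, stored as a little-endian uint32.
TNEF_SIGNATURE = 0x223E9F78


def looks_like_tnef(data: bytes) -> bool:
    """Return True if data is long enough and starts with the TNEF signature."""
    if len(data) < 4:
        return False
    (signature,) = struct.unpack_from("<I", data, 0)
    return signature == TNEF_SIGNATURE


print(looks_like_tnef(b""))  # False
```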

UnicodeDecodeError: 'gbk' codec can't decode byte 0xba in position 14: illegal multibyte sequence

I run tnefparse in command line with debug mode:
tnefparse winmail.dat -l DEBUG

INFO:tnefparse:Skipping checksum for performance
DEBUG:tnefparse:Attribute type: 0x001e
DEBUG:tnefparse:Attribute name: 0x1008 (MAPI_RTF_SYNC_BODY_TAG)
ERROR:tnefparse:decode_mapi exception: 'gbk' codec can't decode byte 0xba in position 14: illegal multibyte sequence
DEBUG:tnefparse:exception details:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/tnefparse/mapi.py", line 99, in decode_mapi
    attr_data, offset = parse_property(data, offset, attr_name, attr_type, codepage, num_mv_properties)
  File "/usr/local/lib/python3.9/site-packages/tnefparse/mapi.py", line 155, in parse_property
    item = item.decode(codepage)
UnicodeDecodeError: 'gbk' codec can't decode byte 0xba in position 14: illegal multibyte sequence

It seems like the encoding charset is wrong. Is there any way to set the charset as a parameter?
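
Independent of whether the code page can be overridden, a decoder can be made tolerant of stray bytes. The sketch below uses a hypothetical helper, decode_lenient, and does not describe tnefparse's actual behavior: it falls back to errors="replace", which substitutes U+FFFD for undecodable bytes instead of raising.

```python
def decode_lenient(raw: bytes, codepage: str) -> str:
    """Decode raw with codepage, substituting U+FFFD for undecodable bytes."""
    try:
        return raw.decode(codepage)
    except UnicodeDecodeError:
        # Second attempt: keep what can be decoded, mark the rest.
        return raw.decode(codepage, errors="replace")


# 0xba starts a two-byte GBK sequence, but nothing follows it here.
print(decode_lenient(b"abc\xba", "gbk"))
```

The trade-off is that damaged text is returned silently, so logging a warning on the fallback path may be worthwhile.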

Filenames are truncated

When you run with the --attachments argument the files are using the truncated name instead of the full name. This is a problem for my use case. Can we either get an option to use the long_filename or just make this the default?

In my example the filename was decemberpermits14.pdf but was being truncated to decemb~1.pdf.

Thanks!

tnef_attachement.name is not unicode

I get a UnicodeError in my application code, because tnef_attachement.name is not unicode.

Could the tnef library get updated, to make tnef_attachement.name a unicode string?

In my case it looks like a latin1 string. It would improve the usability of the library if the application programmer does not need to do string decoding.

Sorry, I can't post the tnef binary, since it is from a customer.

BTW, how can you create tnef test binaries?

Raise test coverage to 100%

While test coverage is already at a very good 93%, the PR #84 alone would have brought two regressions (changed log level + no more help text), as there are still some lines without coverage.

I suggest filling the coverage gaps before continuing to work on the currently open PRs.

deprecation warning

e.g. run tox -e py38

tests/test_decoding.py::test_zip
  /home/jugmac00/Projects/tnef-parallel/tnefparse/tnefparse/tnef.py:403: DeprecationWarning: passing bytes to tnef.to_zip will be deprecated, pass a TNEF object instead
    warnings.warn(msg, DeprecationWarning)

Release 1.4

We have a bunch of changes ready to go for 1.4. Should anything else be included, or are we ready to ship?

Extract MAPI_ATTACH_METHOD == 5 parts?

I got a few (malicious) messages with winmail.dat attachments which tnefparse cannot extract. The JSON dump produced by it contains, inter alia

    "attachments": [
        {
            "MAPI_ATTACH_METHOD": 5,
            "MAPI_CREATION_TIME": "2020-10-10 14:57:44.056806",
            "MAPI_LANGUAGE": "EnUs",
            "MAPI_LAST_MODIFICATION_TIME": "2020-10-10 14:57:44.056806",
            "MAPI_RENDERING_POSITION": -1,
            "data_len": 1,
            "filename": "",
            "long_filename": ""
        }
    ],

However, the part is ~170 kBytes, and running unzip on it extracts at least parts of a (malware) M$ Office document starting at offset ~32 kBytes.
I found a note that MAPI attach method 5 indicates a “message”, whatever this means in this context. Is it generally impossible to extract such items (neither tnef nor ytnef-tools produces usable output either), or would it be possible to add it to tnefparse?

Thanks in advance, Albrecht.
