mailgun / flanker

Python email address and Mime parsing library

Home Page: http://www.mailgun.com

License: Apache License 2.0

Python 100.00%

flanker's Issues

git tags missing for recent releases

Recent releases appear not to be tagged in git. This makes it hard to figure out if a certain change is part of a release. For example I'm trying to figure out why #21 is not included in the current release.

Python 3 support?

The build is not running with Python 3.
Has anyone ever tried running the tests with Python 3? Is there a reason why it's not supported yet?

Cache interface does not work with dict semantics

When mail_exchanger_lookup() fails to connect to the mail exchanger for a specific domain, it sets the corresponding cache entry to False (line 149). However, lookup_exchanger_in_cache() looks for the string 'False' to figure out if there was an MX connection failure for a domain in cache.

This asymmetry means that caches that implement proper dict semantics (i.e. not coercing all values to strings unlike the Redis driver) fail miserably. I believe that a simple in memory cache like defaultdict(lambda: None) should just work and not fail with TypeError. This behaviour is also the root cause for #31.
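The mismatch can be sketched as follows. This is a minimal illustration, not Flanker's actual code: `store_failure` and `is_known_failure` are hypothetical helper names, and the cache is assumed to be any mapping. Accepting both the boolean and its string form lets a plain dict and a string-coercing backend behave the same way.

```python
# Hypothetical helpers illustrating the False vs 'False' mismatch.
# A string-coercing backend turns False into 'False' (or b'False');
# a plain dict keeps the boolean as-is.


def store_failure(cache, domain):
    """Record that the MX connection for `domain` failed."""
    cache[domain] = False


def is_known_failure(cache, domain):
    """Return True if the cache holds a failure marker for `domain`."""
    value = cache.get(domain)
    # Accept the boolean itself and the forms a string-only store returns.
    return value is False or value in ("False", b"False")
```

With this check, `defaultdict(lambda: None)` or an ordinary `dict` works as a cache without any string coercion.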

Add `replace_header()` method

The email module has a convenient replace_header method. I don't see a clear way to do the same thing with Flanker.

Installation Errors

I get the following when installing on Ubuntu:

Package libffi was not found in the pkg-config search path.
Perhaps you should add the directory containing `libffi.pc'
to the PKG_CONFIG_PATH environment variable
No package 'libffi' found

compilation terminated.
error: Setup script exited with error: command 'x86_64-linux-gnu-gcc' failed with exit status 1

Separation of email format validation for standalone use

I have a use case where I would just like to ensure an email has a valid format. I would love to see some additional separation in the solution.
For this purpose this module has heavy dependencies: chardet dnsq expiringdict mock nose WebOb redis regex dnspython

Aside from my, perhaps slim, use-case there is also the matter of maybe separating mime-parsing and email address validation. I really don't foresee using these two areas in the same parts of my system.

I understand if these aren't relevant, but since your module seems good I figured I'd chip in my thoughts for my use-case.

ImportError: No module named addresslib

This import is not working:

from flanker.addresslib import address

Traceback (most recent call last):
  File "/home/jabur/PycharmProjects/scripts/tmp/flanker.py", line 1, in <module>
    from flanker.addresslib import address
  File "/home/jabur/PycharmProjects/scripts/tmp/flanker.py", line 1, in <module>
    from flanker.addresslib import address
ImportError: No module named addresslib

Redis caching credentials

Hey there, excellent work on the flanker library. I'm looking to use it in a docker environment where I do not have a local redis cache. I would like to submit a PR that adds configurable Redis tuning for RedisCache.

My thought was to make it configurable via environment variables REDIS_HOST, REDIS_PORT, and REDIS_DB with the defaults matching the current behavior. If I submitted this, what are the chances of it being included in a (near) future release?

If you do not like this approach, how would you like me to approach the PR?
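A rough sketch of the environment-variable approach described above. The function name `redis_settings` is hypothetical, and the defaults (localhost, 6379, database 0) are assumptions that would need to match whatever RedisCache currently hard-codes.

```python
import os


def redis_settings(environ=None):
    """Build Redis connection settings from environment variables.

    The fallback values are assumed defaults, not taken from Flanker.
    """
    env = os.environ if environ is None else environ
    return {
        "host": env.get("REDIS_HOST", "localhost"),
        "port": int(env.get("REDIS_PORT", "6379")),
        "db": int(env.get("REDIS_DB", "0")),
    }
```

The resulting dict could then be passed as keyword arguments to the Redis client constructor, so an unset environment keeps today's behavior.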

i18n support for strip_replies on subject

Relevant here: https://github.com/mailgun/flanker/blob/master/flanker/mime/message/headers/wrappers.py#L185

Other languages deviate from the mostly standard Re: prefix. For example, some Outlook clients in German locale will use "AW" (reply) and "WG" (forwarded).

A more complete list: http://en.wikipedia.org/wiki/List_of_email_subject_abbreviations#Abbreviations_in_other_languages
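One possible shape for a locale-aware version, as a sketch only: the prefix tuple is an illustrative subset (the Wikipedia list is much longer), `strip_reply_prefixes` is a hypothetical name, and this version strips forward prefixes as well as reply prefixes, which may or may not be what strip_replies should do.

```python
import re

# Illustrative subset of reply/forward prefixes. "AW" and "WG" are the
# German Outlook forms mentioned above.
PREFIXES = ("re", "fwd", "fw", "aw", "wg")

_PREFIX_RE = re.compile(
    r"^\s*(?:(?:%s)\s*:\s*)+" % "|".join(PREFIXES),
    re.IGNORECASE,
)


def strip_reply_prefixes(subject):
    """Remove any leading run of known reply/forward prefixes."""
    return _PREFIX_RE.sub("", subject)
```

Because each prefix must be followed by a colon, a subject such as "Regarding lunch" is left untouched.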

A valid email as invalid...

Maybe it's a bug...

[email protected] - It's a valid email address, but the API returns it as invalid.

Noob Help with locating attachments

I've been using Flanker for a while with no issues, but I just got a new feed and I'm struggling to fix my problem.

Flanker finds the attachment type with this: msg.parts[1] and then it strips and decodes the attachment with this: msg.parts[1].body

The problem is, the attachments from my new feed aren't in those sections. I get IndexError: List out of range

I don't know how to look anywhere else. If I do msg.parts[2], that doesn't work. Any help?

I'm basically doing this:
import sys
from flanker import mime

message_string = sys.stdin.read()
msg = mime.from_string(message_string)
msg.headers.items()
print msg.parts[1]
print msg.parts[1].body

on this email: http://pastebin.com/A3EXGBTB

But can't get it to work.
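Indexing a fixed position such as parts[1] only works when every message has the same layout. A more robust approach is to walk the whole MIME tree and pick out the attachments wherever they are. The sketch below uses the standard library's `email` package rather than Flanker so that it is self-contained; the sample message and the name `find_attachments` are made up for illustration.

```python
from email import message_from_string

# Fabricated sample: the attachment is the second top-level part, but the
# first part is itself a nested multipart container.
RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="outer"\n'
    "\n"
    "--outer\n"
    'Content-Type: multipart/alternative; boundary="inner"\n'
    "\n"
    "--inner\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--inner--\n"
    "--outer\n"
    "Content-Type: text/csv\n"
    'Content-Disposition: attachment; filename="report.csv"\n'
    "\n"
    "a,b\n"
    "--outer--\n"
)


def find_attachments(raw):
    """Return (filename, payload bytes) for every attachment at any depth."""
    msg = message_from_string(raw)
    found = []
    for part in msg.walk():  # depth-first over the whole tree
        if part.is_multipart():
            continue  # containers carry no payload of their own
        if part.get_content_disposition() == "attachment":
            found.append((part.get_filename(), part.get_payload(decode=True)))
    return found
```

A single-part message simply yields an empty list instead of raising IndexError.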

Errors in encoding

I am getting errors for:

if msg.content_type.is_multipart():
    for part in msg.parts:
        print 'Content-Type: {} Body: {}'.format(part, part.body)

This is the error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u017e' in position 159: ordinal not in range(128)

It looks like your library only works with US-ASCII characters. That sucks.

Here are a few of the characters as they appear in the raw email: ka=BEem, prili=E8no.
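For reference, a small sketch of how the two quoted-printable fragments decode and how the resulting text can be made safe for an ASCII-only output. The charset is an assumption: the report does not name it, but in ISO-8859-2 the byte 0xBE maps to U+017E, which matches the character in the error message. The helper names are hypothetical.

```python
import quopri

# Assumption: the message declares charset ISO-8859-2 (Latin-2).
ASSUMED_CHARSET = "iso-8859-2"


def decode_qp(fragment, charset=ASSUMED_CHARSET):
    """Decode a quoted-printable byte fragment to text."""
    return quopri.decodestring(fragment).decode(charset)


def ascii_safe(text):
    """Escape non-ASCII characters so the text can go to an ASCII-only sink."""
    return text.encode("ascii", "backslashreplace").decode("ascii")
```

The decoded body itself is fine; the failure comes from pushing the decoded text through an implicit ASCII encode when formatting or printing it.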

Unnecessary double-quoting of MIME header parameters

Given a MIME header as follows:

Content-Type: multipart/alternative; boundary=decafbad

Doing a round-trip parse-then-serialize using Flanker will result in the header being represented as:

Content-Type: multipart/alternative; boundary="decafbad"

This is due to flanker.mime.message.headers.encoding.encode_param calling email.message._formatparam without explicitly specifying the optional quote argument (which defaults to True).

>>> email.message._formatparam('boundary', 'hello')
'boundary="hello"'
>>> email.message._formatparam('boundary', 'hello', quote=False)
'boundary=hello'

>>> email.message._formatparam('boundary', 'hel<lo')
'boundary="hel<lo"'
>>> email.message._formatparam('boundary', 'hel<lo', quote=False)
'boundary="hel<lo"'

As seen above, if Flanker passes quote=False when calling email.message._formatparam, the parameter value is quoted only if it contains special characters.

AFAICT, header parameters can be always double-quoted (even if the value is a legal "token", and contains no special characters -- as the example above shows.) So, this is arguably not a bug.

However, I have found (empirically) that Gmail tends to generate hex boundary values and does not double-quote them in the Content-Type header. This causes DKIM signature verification (which covers the Content-Type header) to fail on messages that have gone through this round-trip conversion.

We have found this to be an issue with using Mailgun API to retrieve inbound messages with the Accept: message/rfc2822 header. We found that the message returned has an incorrectly re-encoded Content-Type header boundary parameter. This in turn causes DKIM verification to fail. I'm mentioning this because we think that Mailgun is using Flanker internally.

Obviously, the "fix" above could break in the opposite scenario, where the original header had a double-quoted boundary parameter value even though it contains no special characters. Unfortunately, I do not have evidence of whether such messages are out there in the wild, or how frequent they are.

I also realise that there are other ways that DKIM verification could fail when doing such roundtrip conversions. However, at least for a big major email provider (Gmail), fixing this header value encoding takes care of the issue in our testing.
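The quoting rule under discussion can be sketched without relying on the private `_formatparam` helper. This is an independent illustration, not Flanker's or the standard library's code: `is_token` and `format_param` are hypothetical names, and the character set is the "tspecials" list from RFC 2045.

```python
# RFC 2045 "tspecials": a value containing any of these (or whitespace,
# control characters, or non-ASCII) is not a bare token and needs quoting.
TSPECIALS = set('()<>@,;:\\"/[]?=')


def is_token(value):
    """True if `value` may appear unquoted as a MIME parameter value."""
    return bool(value) and all(
        33 <= ord(ch) <= 126 and ch not in TSPECIALS for ch in value
    )


def format_param(name, value, quote_always=False):
    """Format name=value, quoting only when required unless asked to."""
    if quote_always or not is_token(value):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '%s="%s"' % (name, escaped)
    return "%s=%s" % (name, value)
```

Neither policy is round-trip safe on its own. Preserving the original bytes of an unmodified header is the only way to guarantee that a DKIM signature still verifies.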

Headers with certain characters fail to log

When a header fails parsing, if there is a character that cannot be decoded to ASCII, it will cause logging to fail.

Traceback (most recent call last):
  File "/app/src/flanker/flanker/mime/message/headers/encodedword.py", line 79, in mime_to_unicode
    b64encode(header)))
  File "/usr/lib/python2.7/base64.py", line 53, in b64encode
    encoded = binascii.b2a_base64(s)[:-1]
UnicodeEncodeError: 'ascii' codec can't encode character u'\xea' in position 11: ordinal not in range(128)

ImportError: No module named paste.util.multidict

FYI. :)

→ sudo pip install flanker
[…]



    442 warnings generated.
    cc -bundle -undefined dynamic_lookup -arch x86_64 -arch i386 -Wl,-F. build/temp.macosx-10.9-intel-2.7/Python2/_regex.o -o build/lib.macosx-10.9-intel-2.7/_regex.so

  Running setup.py install for dnspython

Successfully installed flanker chardet dnsq expiringdict mock nose Paste redis regex dnspython
Cleaning up...

Then on the CLI

→ python
Python 2.7.5 (default, Aug 25 2013, 00:04:04) 
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from flanker.addresslib import address
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/flanker/addresslib/address.py", line 38, in <module>
    import flanker.addresslib.parser
  File "/Library/Python/2.7/site-packages/flanker/addresslib/parser.py", line 79, in <module>
    from flanker.mime.message.headers.encoding import encode_string
  File "/Library/Python/2.7/site-packages/flanker/mime/__init__.py", line 61, in <module>
    from flanker.mime.message.errors import DecodingError, EncodingError, MimeError
  File "/Library/Python/2.7/site-packages/flanker/mime/message/__init__.py", line 1, in <module>
    from flanker.mime.message.scanner import ContentType
  File "/Library/Python/2.7/site-packages/flanker/mime/message/scanner.py", line 4, in <module>
    from flanker.mime.message.headers import parsing, is_empty, ContentType
  File "/Library/Python/2.7/site-packages/flanker/mime/message/headers/__init__.py", line 1, in <module>
    from flanker.mime.message.headers.headers import MimeHeaders
  File "/Library/Python/2.7/site-packages/flanker/mime/message/headers/headers.py", line 1, in <module>
    from paste.util.multidict import MultiDict
ImportError: No module named paste.util.multidict

Failed to correctly parse ISO-8859-15 encoded FROM header

See this issue for full example: nylas/sync-engine#174

Question about MAX_LINE_LENGTH in parsing.py

Hello,

Thanks for your work. I'm using this package to parse email, but I have run into a problem.

My question concerns

flanker/flanker/mime/message/headers/parsing.py

Lines 61 to 65 in 8421b0e

for line in fp:
    if len(line) > MAX_LINE_LENGTH:
        raise DecodingError(
            "Line is too long: {0}".format(len(line)))

Is the MAX_LINE_LENGTH necessary?

Support Python 3

Support Yahoo disposable e-mail addresses

The Yahoo e-mail plugin rejects disposable e-mail addresses created using Yahoo's "AddressGuard" feature. This feature allows accounts to create a basename prefix and then append a hyphen (-) and a keyword to the end. It's similar to Gmail's + suffix, but the basename is different from the Yahoo username.

The basename and keyword parts appear to support the same characters as a normal username: [ a-z, 0-9, dot, period ]. There doesn't appear to be a minimum length on keyword value as I have seen e-mail addresses using "1" as their keyword.

A few links explaining the functionality are available from Yahoo:

Add HACKING document

Adding a HACKING.md would be helpful to potential contributors

Discussion Forum? (IRC? Github issues?)
Code Style guidelines (PEP8? Other?)
How to run test cases (nosetests?)

Decode gb2312

flanker can't decode some messages with charset "gb2312" e.g.:

Content-Type: text/plain;
    charset="gb2312"
Content-Transfer-Encoding: base64

DQogICCyze+LmEkNCiAgIDIwMTUtMS0yOQ0K

See StackOverflow for more details.
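The payload above contains byte sequences that fall outside strict GB2312 but are valid in its supersets. A common workaround, shown here as a sketch and not as Flanker's behavior, is to retry with GB18030 when the declared charset fails. The mapping `SUPERSETS` and the function `decode_body` are hypothetical names.

```python
import base64

# Assumption: treat GB2312 as its superset GB18030 when strict decoding fails.
SUPERSETS = {"gb2312": "gb18030"}

BODY_B64 = "DQogICCyze+LmEkNCiAgIDIwMTUtMS0yOQ0K"  # payload from the report


def decode_body(raw, charset):
    """Decode with the declared charset, then retry with a known superset."""
    try:
        return raw.decode(charset)
    except UnicodeDecodeError:
        fallback = SUPERSETS.get(charset.lower())
        if fallback is None:
            raise  # no superset known; surface the original error
        return raw.decode(fallback)


text = decode_body(base64.b64decode(BODY_B64), "gb2312")
```

Many mail clients label GBK or GB18030 text as "gb2312", so the declared charset is better treated as a hint than as a guarantee.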

regex version pin breaks Python 3 support

Currently regex has a hard version pin on a very old version of the regex package. This version does not support Python 3, which results in flanker also not being usable in Python 3 projects.

The version pin has a comment that indicates that this is done for performance reasons. I am wondering a few things:

Is there a benchmark script that one can run to test this performance problem? That would make it possible to discuss this with the regex developers.
Which regex version showed this degradation?
For many sites slow performance for flanker is not problematic. Can you consider doing the same pin/unpinned trick you added for #20 for regex as well?

Developing/Hosting when port 25 is blocked.

I was successfully developing something that uses Flanker on Windows. When I switched over to a Mac, connect_to_mail_exchanger() began timing out. Using netcat on Linux, Mac, and Windows, I've discovered that I can't establish a connection to the mail server on port 25 at all. I believe it's because FIOS is blocking it. I can't explain why the code seemed to be working on Windows and Linux.

Would it be useful to provide a way to suppress this check for those developing (seems likely) or serving (less likely) on a network that blocks connections on port 25?

Email parsed when host name has no tld

from flanker.addresslib import address
email = address.parse('foo@examplecom', addr_spec_only=True)
print email

gives out: foo@examplecom

Why was the software history not kept?

Hi there,

I'm a researcher studying software evolution. As part of my current research, I'm studying the implications of open-sourcing proprietary software, for instance whether the project succeeds in attracting newcomers. However, I observed that some projects, like flanker, deleted their software history.

9eae95f

Knowing that software history is indispensable for developers (e.g., developers need to refer to history several times a day), I would like to ask flanker developers the following four brief questions:

Why did you decide not to keep the software history?
Did the core developers face any kind of problems when trying to refer to the old history? If so, how did they solve these problems?
Did the newcomers face any kind of problems when trying to refer to the old history? If so, how did they solve these problems?
How has the lack of history impacted software evolution? Has it placed any burden on understanding and evolving the software?

Thanks in advance for your collaboration,

Gustavo Pinto, PhD
http://www.gustavopinto.org

Attachments from AppleMail client have no bodies

Flanker parses the attachment's body from this email as None:

In [19]: mail = open("/tmp/1.txt", "rb").read()

In [20]: mail
Out[20]: 'Delivered-To: [email protected]\nReceived: by 10.27.184.6 with SMTP id i6csp81852wlf;\n        Wed, 9 Sep 2015 01:42:19 -0700 (PDT)\nX-Received: by 10.180.75.176 with SMTP id d16mr54538910wiw.75.1441788139160;\n        Wed, 09 Sep 2015 01:42:19 -0700 (PDT)\nReturn-Path: <[email protected]>\nReceived: from mail-wi0-f180.google.com (mail-wi0-f180.google.com. [209.85.212.180])\n        by mx.google.com with ESMTPS id kf6si11215377wjb.11.2015.09.09.01.42.19\n        for <[email protected]>\n        (version=TLSv1.2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\n        Wed, 09 Sep 2015 01:42:19 -0700 (PDT)\nReceived-SPF: softfail (google.com: domain of transitioning [email protected] does not designate 209.85.212.180 as permitted sender) client-ip=209.85.212.180;\nAuthentication-Results: mx.google.com;\n       spf=softfail (google.com: domain of transitioning [email protected] does not designate 209.85.212.180 as permitted sender) [email protected]\nReceived: by wiclk2 with SMTP id lk2so12858549wic.1\n        for <[email protected]>; Wed, 09 Sep 2015 01:42:19 -0700 (PDT)\nX-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;\n        d=1e100.net; s=20130820;\n        h=x-gm-message-state:from:content-type:subject:message-id:date:to\n         :mime-version;\n        bh=iADgoaV1xyNMOvT6xlZQjOp+2r7wUfsTdfZA3wUI/eg=;\n        b=H6GiBZGlEfhUxlw6ytg1vbcHiqXd69rOWl0z09HqH6ywhG8dSDXlFFVfe0rYKVZAjc\n         bD0YAjmEAw1BjgRJUXMsVa4zS48+iRLSqRboeWBjnbxJAseUHesxCKzCOd0FTITxHAA6\n         S9E3MwSqUv+zwK6ES7DV90X0hWvxVUyzzVSDtemBnV/rkWr7jlZ9uyAvnaK7dztiTZos\n         lKwuz4+H0OvDw0LV1d1y/23rr0R6TMGqd8QmGnlVqyCTI8E6LQjoeHWaQ3b7tLJxHMtM\n         d5NIhqkRAl58aVSSTSAbKOEiAUqgBq98ZJpz4q5Nw3stPdu1btF/uDxyLUyaQmoTU8nr\n         vIUA==\nX-Gm-Message-State: ALoCoQmaqZjZegl0KF6Y/see4tzw8O/hXN1+vW7W0waIfhTff9DYQa3y+iMBYjCE6XlOJAsq2d1U\nX-Received: by 10.180.230.197 with SMTP id ta5mr31843529wic.26.1441788138945;\n        Wed, 09 Sep 2015 01:42:18 -0700 
(PDT)\nReturn-Path: <[email protected]>\nReceived: from [192.168.1.9] (215-81-133-95.pool.ukrtel.net. [95.133.81.215])\n        by smtp.gmail.com with ESMTPSA id fn8sm2658059wib.2.2015.09.09.01.42.18\n        for <[email protected]>\n        (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128);\n        Wed, 09 Sep 2015 01:42:18 -0700 (PDT)\nFrom: Michael Korbakov <[email protected]>\nContent-Type: multipart/mixed; boundary="Apple-Mail=_C9C2B061-E965-4274-A8D2-0CAAB92A2F17"\nSubject: =?utf-8?Q?Fwd=3A_=D1=82=D0=B5=D1=81=D1=82_ApplMail?=\nMessage-Id: <[email protected]>\nDate: Wed, 9 Sep 2015 11:42:16 +0300\nTo: Anton Koval <[email protected]>\nMime-Version: 1.0 (Mac OS X Mail 8.2 \\(2104\\))\nX-Mailer: Apple Mail (2.2104)\n\n\n--Apple-Mail=_C9C2B061-E965-4274-A8D2-0CAAB92A2F17\nContent-Transfer-Encoding: base64\nContent-Type: text/plain;\n\tcharset=utf-8\n\n0KLQtdGB0YINCg==\n--Apple-Mail=_C9C2B061-E965-4274-A8D2-0CAAB92A2F17\nContent-Disposition: attachment;\n\tfilename*=utf-8\'\'%D1%82%D0%B5%D1%81%D1%82%20ApplMail.eml\nContent-Type: message/rfc822;\n\tx-mac-hide-extension=yes;\n\tname="=?utf-8?Q?=D1=82=D0=B5=D1=81=D1=82_ApplMail=2Eeml?="\nContent-Transfer-Encoding: 7bit\n\nDelivered-To: [email protected]\nReceived: by 10.27.173.129 with SMTP id w123csp81617wle;\n        Wed, 9 Sep 2015 01:38:33 -0700 (PDT)\nX-Received: by 10.180.101.164 with SMTP id fh4mr54549269wib.25.1441787913118;\n        Wed, 09 Sep 2015 01:38:33 -0700 (PDT)\nReturn-Path: <[email protected]>\nReceived: from mail-wi0-f172.google.com (mail-wi0-f172.google.com. 
[209.85.212.172])\n        by mx.google.com with ESMTPS id l20si11145816wjw.125.2015.09.09.01.38.33\n        for <[email protected]>\n        (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\n        Wed, 09 Sep 2015 01:38:33 -0700 (PDT)\nReceived-SPF: softfail (google.com: domain of transitioning [email protected] does not designate 209.85.212.172 as permitted sender) client-ip=209.85.212.172;\nAuthentication-Results: mx.google.com;\n       spf=softfail (google.com: domain of transitioning [email protected] does not designate 209.85.212.172 as permitted sender) [email protected]\nReceived: by wicfx3 with SMTP id fx3so12722933wic.0\n        for <[email protected]>; Wed, 09 Sep 2015 01:38:33 -0700 (PDT)\nX-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;\n        d=1e100.net; s=20130820;\n        h=x-gm-message-state:mime-version:date:message-id:subject:from:to\n         :content-type;\n        bh=yCsUOUoICXWy85xw3Vv9Q1aD4vzogMR3DYZ0+y+X65w=;\n        b=jIoY2gfVjfrjya69jmfc3W/2p108e3c1+3TDCbKQNAgT2B1BQfEtvZqWXHXcij4NbK\n         YCyiva/EJ3CBFQ9C0B4j+fiKUCi5DLTkUXF6E9W6INNM5HFdQJiIpWqi/kHdI7gtBN2G\n         6WlrcVm1NHvAvESVh7j65iDcmdimNDC/zjrUFb0nCkpvmldmhP0dJGTJ0K8t1Ho/3RhH\n         6P2idxr+HpR1RbEaV5+0ehUmiEiVZinaEyGUuT8fcLnsz/ztCy6LueNIyT+jmNbvz1HH\n         /R5/lIykXTI40Q5FUz5vuLxx09u1s4f7JPvAWISnzmQTm51NIesZQ65F94twSce8wLtC\n         A7sA==\nX-Gm-Message-State: ALoCoQmKzyo22NkyAdTk4HymZiAf2dqGE2bdsk3Drc+uloOqVIX4tuICf8KIgMN5JuNsFIQbZlFn\nMIME-Version: 1.0\nX-Received: by 10.194.201.71 with SMTP id jy7mr56686474wjc.93.1441787912788;\n Wed, 09 Sep 2015 01:38:32 -0700 (PDT)\nReceived: by 10.27.176.135 with HTTP; Wed, 9 Sep 2015 01:38:32 -0700 (PDT)\nDate: Wed, 9 Sep 2015 11:38:32 +0300\nMessage-ID: <CABxjYs9Kk=4OnT6uYrR5=kiQ3H+yGw9XNcu_JaJvwRDU_U5GSA@mail.gmail.com>\nSubject: =?UTF-8?B?0YLQtdGB0YIgQXBwbE1haWw=?=\nFrom: Anton Koval <[email protected]>\nTo: Michael Korbakov <[email protected]>\nContent-Type: multipart/alternative; 
boundary=047d7bae4944623830051f4c6811\n\n--047d7bae4944623830051f4c6811\nContent-Type: text/plain; charset=UTF-8\nContent-Transfer-Encoding: base64\n\n0JAg0YHQtNC10LvQsNC5INGA0LXQv9C70LDQuSDQvdCwINGN0YLQviDQv9C40YHRjNC80L4g0YfQ\ntdGA0LXQtyDRjdC/0L/Qu9C+INC60LvQuNC10L3Rgi4NCg==\n--047d7bae4944623830051f4c6811\nContent-Type: text/html; charset=UTF-8\nContent-Transfer-Encoding: base64\n\nPGRpdiBkaXI9Imx0ciI+0JAg0YHQtNC10LvQsNC5INGA0LXQv9C70LDQuSDQvdCwINGN0YLQviDQ\nv9C40YHRjNC80L4g0YfQtdGA0LXQtyDRjdC/0L/Qu9C+INC60LvQuNC10L3Rgi48YnI+PC9kaXY+\nDQo=\n--047d7bae4944623830051f4c6811--\n\n--Apple-Mail=_C9C2B061-E965-4274-A8D2-0CAAB92A2F17--'

In [21]: msg = mime.from_string(mail)

In [23]: parts = [p for p in msg.walk(with_self=True)]

In [24]: parts
Out[24]:
[<flanker.mime.message.part.MimePart at 0x10bfa7d50>,
 <flanker.mime.message.part.MimePart at 0x10bfa7ad0>,
 <flanker.mime.message.part.MimePart at 0x10bfa7cd0>,
 <flanker.mime.message.part.MimePart at 0x10bfa7c50>,
 <flanker.mime.message.part.MimePart at 0x10bfa7b50>,
 <flanker.mime.message.part.MimePart at 0x10bfa7bd0>]

In [25]: [(p.is_attachment(), p) for p in parts]
Out[25]:
[(False, <flanker.mime.message.part.MimePart at 0x10bfa7d50>),
 (False, <flanker.mime.message.part.MimePart at 0x10bfa7ad0>),
 (True, <flanker.mime.message.part.MimePart at 0x10bfa7cd0>),
 (False, <flanker.mime.message.part.MimePart at 0x10bfa7c50>),
 (False, <flanker.mime.message.part.MimePart at 0x10bfa7b50>),
 (False, <flanker.mime.message.part.MimePart at 0x10bfa7bd0>)]

In [26]: attach = parts[2]

In [27]: attach.dete
attach.detected_content_type  attach.detected_file_name     attach.detected_format        attach.detected_subtype

In [27]: attach.detected_file_name
Out[27]: u'\u0442\u0435\u0441\u0442 ApplMail.eml'

In [28]: attach.body is None
Out[28]: True

However:

In [29]: p_attach = attach.to_python_message()
In [32]: p_attach.get_payload()[0].as_string()
Out[32]: 'Delivered-To: [email protected]\nReceived: by 10.27.173.129 with SMTP id w123csp81617wle;\n Wed, 9 Sep 2015 01:38:33 -0700 (PDT)\nX-Received: by 10.180.101.164 with SMTP id fh4mr54549269wib.25.1441787913118; \n Wed, 09 Sep 2015 01:38:33 -0700 (PDT)\nReturn-Path: <[email protected]>\nReceived: from mail-wi0-f172.google.com (mail-wi0-f172.google.com.\n [209.85.212.172])\n by mx.google.com with ESMTPS id l20si11145816wjw.125.2015.09.09.01.38.33\n for <[email protected]>\n (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\n Wed, 09 Sep 2015 01:38:33 -0700 (PDT)\nReceived-SPF: softfail (google.com: domain of transitioning [email protected]\n does not designate 209.85.212.172 as permitted sender)\n client-ip=209.85.212.172; \nAuthentication-Results: mx.google.com;\n spf=softfail (google.com: domain of transitioning [email protected] does not\n designate 209.85.212.172 as permitted sender) [email protected]\nReceived: by wicfx3 with SMTP id fx3so12722933wic.0\n for <[email protected]>; Wed, 09 Sep 2015 01:38:33 -0700 (PDT)\nX-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;\n d=1e100.net; s=20130820;\n h=x-gm-message-state:mime-version:date:message-id:subject:from:to\n :content-type;\n bh=yCsUOUoICXWy85xw3Vv9Q1aD4vzogMR3DYZ0+y+X65w=;\n b=jIoY2gfVjfrjya69jmfc3W/2p108e3c1+3TDCbKQNAgT2B1BQfEtvZqWXHXcij4NbK\n YCyiva/EJ3CBFQ9C0B4j+fiKUCi5DLTkUXF6E9W6INNM5HFdQJiIpWqi/kHdI7gtBN2G\n 6WlrcVm1NHvAvESVh7j65iDcmdimNDC/zjrUFb0nCkpvmldmhP0dJGTJ0K8t1Ho/3RhH\n 6P2idxr+HpR1RbEaV5+0ehUmiEiVZinaEyGUuT8fcLnsz/ztCy6LueNIyT+jmNbvz1HH\n /R5/lIykXTI40Q5FUz5vuLxx09u1s4f7JPvAWISnzmQTm51NIesZQ65F94twSce8wLtC\n A7sA==\nX-Gm-Message-State: ALoCoQmKzyo22NkyAdTk4HymZiAf2dqGE2bdsk3Drc+uloOqVIX4tuICf8KIgMN5JuNsFIQbZlFn\nMIME-Version: 1.0\nX-Received: by 10.194.201.71 with SMTP id jy7mr56686474wjc.93.1441787912788;\n Wed, 09 Sep 2015 01:38:32 -0700 (PDT)\nReceived: by 10.27.176.135 with HTTP; Wed, 9 Sep 2015 01:38:32 -0700 (PDT)\nDate: Wed, 9 Sep 2015 11:38:32 
+0300\nMessage-ID: <CABxjYs9Kk=4OnT6uYrR5=kiQ3H+yGw9XNcu_JaJvwRDU_U5GSA@mail.gmail.com>\nSubject: =?UTF-8?B?0YLQtdGB0YIgQXBwbE1haWw=?=\nFrom: Anton Koval <[email protected]>\nTo: Michael Korbakov <[email protected]>\nContent-Type: multipart/alternative; boundary=047d7bae4944623830051f4c6811\n\n--047d7bae4944623830051f4c6811\nContent-Type: text/plain; charset=UTF-8\nContent-Transfer-Encoding: base64\n\n0JAg0YHQtNC10LvQsNC5INGA0LXQv9C70LDQuSDQvdCwINGN0YLQviDQv9C40YHRjNC80L4g0YfQ\ntdGA0LXQtyDRjdC/0L/Qu9C+INC60LvQuNC10L3Rgi4NCg==\n--047d7bae4944623830051f4c6811\nContent-Type: text/html; charset=UTF-8\nContent-Transfer-Encoding: base64\n\nPGRpdiBkaXI9Imx0ciI+0JAg0YHQtNC10LvQsNC5INGA0LXQv9C70LDQuSDQvdCwINGN0YLQviDQ\nv9C40YHRjNC80L4g0YfQtdGA0LXQtyDRjdC/0L/Qu9C+INC60LvQuNC10L3Rgi48YnI+PC9kaXY+\nDQo=\n--047d7bae4944623830051f4c6811--\n'

Maybe I'm trying to get the attachment's body the wrong way?

No long description in setup.py

There is no LONG_DESCRIPTION in setup.py, which means if you go to the pypi page for flanker you don't see any information about the app.

Since the URL is to Mailgun there is also no easy way for somebody to find the documentation (which is inside this repo)

Best way to handle this is probably to just use RST for the core README and then have long_description pull from that and change the URL to point to this repo instead of the main mailgun site.

expiringdict: stale dependency?

Is expiringdict still required for this project? The github search bar only finds it in the setup.py. If it is no longer needed, it should be removed from the setup.py.

ImportError: No module named setuptools

Following this:
https://github.com/mailgun/flanker

I get this:

ubuntu@ubuntu:~/Desktop/python/flanker$ python setup.py install
Traceback (most recent call last):
  File "setup.py", line 4, in <module>
    from setuptools import setup, find_packages
ImportError: No module named setuptools

This fixes it:

wget https://bootstrap.pypa.io/ez_setup.py -O - | sudo python

address.validate_list validates url as valid email address

Hello there,

I recently bumped into a strange case.
Passing this URL to the address.validate_list function returns it as if it were a valid email address:
http://mail.bg/#message/inbox/1/1/all

>>> address.validate_list("http://mail.bg/")
[http://mail.bg/]

This happens also with this url:
http://broshura.bg/shops/hippoland

At the same time, other URLs were not returned as valid email addresses, which is the expected result.
Passing the URLs from above to address.validate_address works as expected and returns None.

>>> address.validate_address("http://mail.bg/")
>>>

Regards,
Lyubo

Flanker barfs on malformed MIME parts

Hi,

We use flanker to parse MIME parts and I think I found a special case where the parser crashes on slightly-malformed content. I've narrowed it to the following test case:

Delivered-To: [email protected]
Date: 11 Jan 2013 18:54:26 -0000
MIME-Version: 1.0
To: <[email protected]>
Message-ID: <1357884894.S.69618.18751.f5mail-224-118.example.com>
Sender: [email protected]
Subject: Dear sir
From: "John Doe " <[email protected]>
Content-Type: multipart/mixed;
    boundary="=_e6ddd3579a993208589b263b76d66bec"

--=_e6ddd3579a993208589b263b76d66bec
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset="UTF-8"

URGENT - HELP ME DISTRIBUTE MY $15 MILLION TO CHARITY

IN SUMMARY:- I have 15,000,000.00 (fifteen million) U.S. Dollars and I want you to assist me in distributing the money to charity organizations.

--=_e6ddd3579a993208589b263b76d66bec
Content-Transfer-Encoding: 
Content-Type: message/rfc822;
 name="ForwardedMessage"; 
Content-Disposition: inline;
 filename="ForwardedMessage"; 
--=_e6ddd3579a993208589b263b76d66bec--

I think the parser expects the Message part to be followed by \r\n which makes it crash.

I use the following program to trigger the bug:

import sys
from flanker import mime

fd = open(sys.argv[1], "r")
contents = fd.read()

parsed = mime.from_string(contents)

for mimepart in parsed.walk(with_self=parsed.content_type.is_singlepart()):
    print mimepart.headers

The traceback is:

Traceback (most recent call last):
  File "/contrib/flanker/testcase.py", line 11, in <module>
    print mimepart.headers
  File "/contrib/flanker/flanker/mime/message/part.py", line 389, in headers
    return self._container.headers
  File "/contrib/flanker/flanker/mime/message/part.py", line 42, in headers
    self._load_headers()
  File "/contrib/flanker/flanker/mime/message/part.py", line 65, in _load_headers
    self.stream.seek(self.start)
TypeError: an integer is required

I'd be happy to contribute a patch if you could point me in the right direction.

Mail Exchanger lookup returns incorrect grammar rules

Suppose CompanyA buys CompanyB and maintains their email service but changes the MX servers of CompanyB to point to the MX servers of CompanyA. This throws our grammar check for a loop: are we checking the grammar of CompanyA or CompanyB?

A modern-day example: AOL bought CompuServe. So when someone tries to validate [email protected] (where x is an integer), we check it against the AOL grammar and mark it as invalid.

We shouldn't rely completely on MX servers, but also take the domain name into consideration when checking custom grammar.

Inline attachment filename not detected

The culprit is this piece of code in flanker/mime/message/part.py:

def detected_file_name(self):
...
        cdisp = self.content_disposition
        if cdisp.value == 'attachment':
            file_name = cdisp.params.get('filename', '') or file_name

It doesn't check for the inline content disposition.
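The intended behavior can be illustrated with the standard library's `email` package. This is a stand-in sketch, not a patch to Flanker's part.py; the function below only mirrors the name used in the snippet above.

```python
from email.message import Message


def detected_file_name(part):
    """Return the filename for attachment *or* inline dispositions."""
    if part.get_content_disposition() in ("attachment", "inline"):
        return part.get_filename() or ""
    return ""


# Build a part that carries an inline disposition with a filename.
part = Message()
part["Content-Type"] = "image/png"
part.add_header("Content-Disposition", "inline", filename="logo.png")
```

Calling `detected_file_name(part)` on this part returns the filename, whereas a check for 'attachment' alone would skip it.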

Consistent use of Strict / Ignore / Replace for Encoding errors

For an example message:

https://gist.github.com/pquerna/b6025e307e82262fa04c/raw/c31c0e18568ccc0a9ad7df27989839be5a49b0f5/t.eml

It has characters in its Subject line that are neither ASCII nor valid UTF-8.

This only causes an error on accessing the .subject property.

to_unicode in ./flanker/flanker/mime/message/headers/parsing.py does:

return unicode(val, 'utf-8', 'strict')

However, other places that convert strings to UTF-8 use ignore, for example flanker.utils.to_utf8.
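For comparison, this is how the three error policies behave on the same invalid input. It is a minimal sketch with a made-up sample; `decode_header_bytes` is a hypothetical helper, not part of Flanker.

```python
RAW_SUBJECT = b"caf\xe9"  # Latin-1 bytes, not valid UTF-8 (made-up sample)


def decode_header_bytes(raw, errors="replace"):
    """Decode header bytes as UTF-8 with one consistently chosen policy."""
    return raw.decode("utf-8", errors)


dropped = decode_header_bytes(RAW_SUBJECT, "ignore")    # bad byte removed
marked = decode_header_bytes(RAW_SUBJECT, "replace")    # bad byte -> U+FFFD
```

With "strict" the same call raises UnicodeDecodeError, which is why only the code paths that use strict decoding fail on this message.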

Addresses with apostrophes are rejected as invalid

See the following issue on the validator-demo repo: mailgun/validator-demo#5

The basic issue is that MS Exchange allows apostrophes, so the validator should as well even though RFC grammar does not.

International characters in email-addresses / diacritics

I've had validation fail on a number of addresses containing the Swedish characters åäö as well as the more typically French é. From what I've gathered, any UTF-8 is valid (https://en.wikipedia.org/wiki/International_email#Email_addresses). It might not be nice, and I will consider rejecting these anyway, but what kind of ruleset does Flanker actually use?

Unable to extract Sent emails

When attempting to extract emails from the Sent folder, the MIME parts include the original message but not the actual sent response. Is that by design?

Valid email also return as Invalid

When a domain's DNS is hosted on one server and its MX record points to another server, the address is reported as invalid.

Broken parsing of email with escaped doublequote

We're using flanker to parse emails from Gmail API. Data we receive for "From:" header looks like "some\"thing" <[email protected]> and is not treated as valid email by Flanker.

I understand that this string can be invalid according to RFC, but that's what we're getting from Gmail.

Failure to parse address field encoded in iso-8859-1 (?) encoding

The original message has the following From field:
From: =?iso-8859-1?Q?W=F6rz=2C_Michael?= <[email protected]>
After parsing the whole message with flanker.mime.from_string():

ipdb> sndr = headers['From']
ipdb> sndr
u'W\xf6rz, Michael <[email protected]>'

and

ipdb> address.parse(sndr) is None
True

It looks like something went wrong when decoding the value of the From header?
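
One possible explanation, offered as an assumption rather than a confirmed diagnosis: once the encoded word is decoded, the display name contains an unquoted comma, which address grammars normally treat as a list separator. The sketch below uses only the standard library (not flanker) to split the header before decoding and to quote the name when rebuilding it. The address `user@example.com` is a placeholder and `quote_if_needed` is a hypothetical helper.

```python
from email.header import decode_header, make_header
from email.utils import parseaddr

raw_from = '=?iso-8859-1?Q?W=F6rz=2C_Michael?= <user@example.com>'  # placeholder address


def decode_from(value):
    """Split the header into name and address first, then decode only the name."""
    encoded_name, addr = parseaddr(value)
    name = str(make_header(decode_header(encoded_name)))
    return name, addr


def quote_if_needed(name):
    """Wrap a display name in double quotes if it contains RFC 5322 specials."""
    if any(ch in name for ch in '()<>[]:;@\\,."'):
        escaped = name.replace('\\', '\\\\').replace('"', '\\"')
        return '"{0}"'.format(escaped)
    return name


name, addr = decode_from(raw_from)
rebuilt = '{0} <{1}>'.format(quote_if_needed(name), addr)
```

Here `rebuilt` carries the comma inside a quoted string, so a parser no longer sees it as a separator between two addresses.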

Equality with EmailAddress and String is broken

Flanker's EmailAddress object implements an __eq__ method to support comparison with strings, which is super convenient, but the missing __ne__ (and friends) creates some crazy behavior:

>>> email_address = address.parse('[email protected]')
>>> email_address == '[email protected]'
True
>>> email_address != '[email protected]'
True

Both of these things can't be true, obviously. The answer here is to implement __ne__ and the rest of the Python comparison methods.
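
A minimal sketch of the suggested fix, using a hypothetical stand-in class rather than flanker's actual EmailAddress. In Python 2, `__ne__` is not derived from `__eq__`, so it has to be defined explicitly; in Python 3 the explicit definition is redundant but harmless.

```python
class Address(object):
    """Hypothetical stand-in for an address object comparable with strings."""

    def __init__(self, spec):
        self.spec = spec

    def __eq__(self, other):
        if isinstance(other, Address):
            return self.spec == other.spec
        if isinstance(other, str):
            return self.spec == other
        return NotImplemented

    def __ne__(self, other):
        # Delegate to __eq__ so the two operators can never disagree.
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result

    def __hash__(self):
        # Defining __eq__ removes the default hash in Python 3; restore one.
        return hash(self.spec)
```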

Large attachments

How does flanker work with large attachments? From what I saw, you need to pass a string to flanker, which means you first have to read the whole MIME message into a string, so it will all be in memory, right? Is there a way to use streams?

Allow for ( and ) as well as < and >

Some email clients use the form:

First Last ([email protected])

Flanker doesn't parse this. You may have a reason for not allowing it and if so that's fine, but since some common email clients use that form it is probably worth adding support for?
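
For illustration only, a pre-processing sketch that accepts both forms before handing the address to a stricter parser. This is not how flanker works: the function name and the regular expressions are assumptions, and in RFC 5322 parentheses delimit comments, which may be why the form is rejected.

```python
import re

# Deliberately loose patterns: a display name followed by an address
# wrapped either in <...> or in (...).
_ANGLE = re.compile(r'^\s*(?P<name>.*?)\s*<(?P<addr>[^<>()\s]+@[^<>()\s]+)>\s*$')
_PAREN = re.compile(r'^\s*(?P<name>.*?)\s*\((?P<addr>[^<>()\s]+@[^<>()\s]+)\)\s*$')


def split_name_and_address(text):
    """Return (display_name, address), or None if neither form matches."""
    for pattern in (_ANGLE, _PAREN):
        match = pattern.match(text)
        if match:
            return match.group('name'), match.group('addr')
    return None
```

The extracted address could then be validated separately, keeping the lenient handling outside the grammar itself.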

Empty body part gets mingled with the next part

According to RFC 1341 section 7.2.1 and RFC 822 section 4.1, it looks like two CRLFs between parts should be enough to delimit an empty body. However:

>>> s  = """Content-Type: multipart/mixed; boundary="----=_20140710132934_74779"
... 
... ------=_20140710132934_74779
... Content-Type: text/plain; charset="windows-1251"
... Content-Transfer-Encoding: 8bit
... 
... ------=_20140710132934_74779
... Content-Type: application/x-zip; name="ZIP-1.zip"
... Content-Transfer-Encoding: base64
... Content-Disposition: attachment; filename="ZIP-1.zip"
... 
... xxx"""
>>> 
>>> 
>>> 
>>> m = mime.from_string(s)
>>> m.parts
deque([<flanker.mime.message.part.MimePart object at 0x2a84e90>, <flanker.mime.message.part.MimePart object at 0x2a84f10>])
>>> m.parts[0].content_type
('text/plain', {'charset': u'windows-1251'})
>>> m.parts[0].body
u'------=_20140710132934_74779\nContent-Type: application/x-zip; name="ZIP-1.zip"\nContent-Transfer-Encoding: base64\nContent-Disposition: attachment; filename="ZIP-1.zip"\n\nxxx'
>>>
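
For comparison, the same message can be fed to the standard library's email parser. This sketch says nothing about flanker's internals; it only shows the result the report expects, with an empty first part and the attachment in the second.

```python
from email import message_from_string

# Same message as in the session above, built line by line so the
# blank lines are explicit.
raw = "\n".join([
    'Content-Type: multipart/mixed; boundary="----=_20140710132934_74779"',
    '',
    '------=_20140710132934_74779',
    'Content-Type: text/plain; charset="windows-1251"',
    'Content-Transfer-Encoding: 8bit',
    '',
    '------=_20140710132934_74779',
    'Content-Type: application/x-zip; name="ZIP-1.zip"',
    'Content-Transfer-Encoding: base64',
    'Content-Disposition: attachment; filename="ZIP-1.zip"',
    '',
    'xxx',
])

message = message_from_string(raw)
parts = message.get_payload()              # list of sub-messages
first_body = parts[0].get_payload()        # body of the text/plain part
second_name = parts[1].get_filename()      # filename of the zip part
second_body = parts[1].get_payload()       # undecoded base64 text
```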

Provide classifiers for PyPI that specify which Python versions are supported

Flanker Library Is Not Working

How is the function for validating an email address supposed to work?

from flanker.addresslib import address

if __name__ == '__main__':
    # Parse the first address (addr-spec only, no display name)
    isValid = address.parse('[email protected]', addr_spec_only=True)
    # Validate the second address
    isValid2 = address.validate_address('[email protected]')
    print(isValid)
    print(isValid2)

The first address, [email protected], does not exist, and the second, [email protected], is my own email address.
The result is:

[email protected]
None

It's not working.

chardet is very very very slow

The character detection code in chardet is very very slow. A simple profiling of flanker parsing shows that ~85% of CPU time is spent in chardet.
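
A measurement like this can be reproduced with the standard profiler. The sketch below is generic: `parse_many` and `profile_top` are hypothetical helper names, and `parse` stands for whatever parsing callable is being measured (for instance a MIME parsing function).

```python
import cProfile
import io
import pstats


def parse_many(parse, messages):
    """Run the parsing callable over every raw message."""
    for raw in messages:
        parse(raw)


def profile_top(func, *args, limit=10):
    """Profile func(*args) and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats('cumulative').print_stats(limit)
    return out.getvalue()
```

Calling `profile_top(parse_many, some_parse_function, raw_messages)` returns a text report in which the share of time spent in character detection can be read from the cumulative column.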

This validation will hurt the IP?

I know that something like https://github.com/hbattat/verifyEmail can validate an email address using PHP.

However, using it often will get the IP blocked, so the number of checks is limited.

I want to know whether Flanker will hurt the IP's reputation or not.

Thank you

Custom hotmail grammar for old hotmail address

Hi,

We are using the flanker validation (the web service, not self-hosted) for our web application, and I'm getting a complaint from a user with a hotmail.nl address. His email address is:

[email protected]

I've changed the letters and numbers to anonymize the email address. The api response is:

{
  "address": "[email protected]",
  "did_you_mean": "[email protected]",
  "is_valid": false,
  "parts": {
    "display_name": null,
    "domain": null,
    "local_part": null
  }
}

I have tested the email address and can confirm the email address exists.

Allow Encoded Attachment Filenames

According to RFC 2388 (multipart/form-data):

Field names originally in non-ASCII character sets may be encoded   
within the value of the "name" parameter using the standard method   
described in RFC 2047.

This may occur in cases where an HTTP payload is converted into SMTP MIME. An HTTP payload uses the Content-Type to define that the file is an "application/octet-stream". This is invalid for SMTP MIME, hence Flanker's conversion methods.

If the Content-Type value is broken or missing, Flanker will attempt to reconstruct it in the method fix_content_type().

As a result, when an encoded filename is present and contains a slash, value.lower().split("/") could truncate the name.

This method should probably detect and skip encoded filenames by inspecting the string for UTF prefixes, for example "=?UTF-8?b?0L/RgNC+0LHQu9C10LzQsC5wbmc=?=".
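
A small sketch of the suggested check, written against the standard library only. The helper names and the regular expression are assumptions for illustration; they are not taken from flanker's fix_content_type().

```python
import re
from email.header import decode_header, make_header

# RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
_ENCODED_WORD = re.compile(r'=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=')


def has_encoded_word(value):
    """Return True if the value contains an RFC 2047 encoded-word."""
    return _ENCODED_WORD.search(value) is not None


def decode_words(value):
    """Decode any encoded-words in the value to a text string."""
    return str(make_header(decode_header(value)))


sample = '=?UTF-8?b?0L/RgNC+0LHQu9C10LzQsC5wbmc=?='
naive_pieces = sample.lower().split('/')  # the base64 text itself contains '/'
decoded = decode_words(sample)
```

The sample from this report shows the problem directly: its base64 text contains a slash, so a naive split on "/" breaks the value apart before it is ever decoded.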

for line in fp:
    if len(line) > MAX_LINE_LENGTH:
        raise DecodingError(
            "Line is too long: {0}".format(len(line)))