parsemail's People

Contributors

bishopofturkey, coypoop, dusankasan, k-yomo, kevinmichaelchen, xgess

parsemail's Issues

Unknown multipart/mixed nested mime type: application/octet-stream

----boundary_23799_9387ca5c-694c-4010-b46e-3154585edde8
Content-Type: application/octet-stream; name="filename.pdf"
Content-Transfer-Encoding: base64
Content-Disposition: attachment

...

Hi, thanks for a great lib! I am using it to parse a lot of emails and ran into this error whenever a message contains an attachment like the one above: Unknown multipart/mixed nested mime type: application/octet-stream. This may be a bug.

plain text attachments are parsed as if they are part of TextBody

When a plain text file is sent as an attachment to an email, the parser treats it as part of the body: it appends the file's content to TextBody and leaves the Attachments slice empty.

example mail:

From: John Doe <[email protected]>
Content-Type: multipart/mixed; boundary="Apple-Mail=_D95A65F4-8972-4641-871C-AE6A660FEA26"
Subject: has attachment
Date: Fri, 27 Apr 2012 16:55:53 +0200
Message-Id: <[email protected]>
To: [email protected]
Mime-Version: 1.0 (Apple Message framework v1257)
X-Mailer: Apple Mail (2.1257)

--Apple-Mail=_D95A65F4-8972-4641-871C-AE6A660FEA26
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
    charset=us-ascii

test

--Apple-Mail=_D95A65F4-8972-4641-871C-AE6A660FEA26
Content-Disposition: attachment;
    filename=test.txt
Content-Type: text/plain;
    x-unix-mode=0644;
    name="test.txt"
Content-Transfer-Encoding: quoted-printable

testfile

--Apple-Mail=_D95A65F4-8972-4641-871C-AE6A660FEA26--

Expected result of parsemail.Parse:
TextBody is test, Attachments has a single attachment containing the plain-text file

Actual result:
TextBody is testtestfile, Attachments is empty
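
The expected behaviour can be sketched with the standard library alone. This is a minimal illustration, not parsemail's implementation: the part struct and splitMixed are hypothetical names, the sample message uses a shortened boundary and a placeholder address, and nested multiparts are not handled. The idea is to check Content-Disposition before looking at Content-Type, so that a text/plain (or application/octet-stream, or application/zip) part marked as an attachment never ends up in the body.

```go
package main

import (
	"fmt"
	"io"
	"mime"
	"mime/multipart"
	"net/mail"
	"strings"
)

// Sample message modelled on the report; boundary shortened, address is a placeholder.
const raw = `From: John Doe <john@example.com>
Content-Type: multipart/mixed; boundary="Apple-Mail=_D95A65F4"
Subject: has attachment

--Apple-Mail=_D95A65F4
Content-Type: text/plain; charset=us-ascii

test

--Apple-Mail=_D95A65F4
Content-Disposition: attachment; filename=test.txt
Content-Type: text/plain; name="test.txt"
Content-Transfer-Encoding: quoted-printable

testfile

--Apple-Mail=_D95A65F4--
`

// part is a hypothetical, simplified view of one MIME part.
type part struct {
	MediaType  string
	Filename   string
	Attachment bool
	Body       string
}

// splitMixed walks the direct children of a multipart/mixed message and marks
// a part as an attachment whenever Content-Disposition says so, whatever its
// Content-Type is.
func splitMixed(raw string) ([]part, error) {
	msg, err := mail.ReadMessage(strings.NewReader(raw))
	if err != nil {
		return nil, err
	}
	_, params, err := mime.ParseMediaType(msg.Header.Get("Content-Type"))
	if err != nil {
		return nil, err
	}
	mr := multipart.NewReader(msg.Body, params["boundary"])
	var parts []part
	for {
		p, err := mr.NextPart()
		if err == io.EOF {
			return parts, nil
		}
		if err != nil {
			return nil, err
		}
		// Parse errors (for example a missing header) leave the values empty.
		mediaType, _, _ := mime.ParseMediaType(p.Header.Get("Content-Type"))
		disp, dispParams, _ := mime.ParseMediaType(p.Header.Get("Content-Disposition"))
		body, err := io.ReadAll(p)
		if err != nil {
			return nil, err
		}
		parts = append(parts, part{
			MediaType:  mediaType,
			Filename:   dispParams["filename"],
			Attachment: disp == "attachment",
			Body:       strings.TrimSpace(string(body)),
		})
	}
}

func main() {
	parts, err := splitMixed(raw)
	if err != nil {
		panic(err)
	}
	for _, p := range parts {
		fmt.Printf("%s attachment=%v filename=%q body=%q\n", p.MediaType, p.Attachment, p.Filename, p.Body)
	}
}
```

A caller would then append only the parts with Attachment == false to the text body.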

REPO UNMAINTAINED

Sadly, this repo has not been updated in more than 3 years and it doesn't look like it will ever be. The library is pretty nice, it just misses some important functionality regarding content-transfer-encoding which was provided by PRs but never merged.

So I decided to create a new fork with CI and integrate all open PRs into it:

  • #42 MERGED make encoding case-insensitive
  • #41 MERGED to allow attaching text/html attachments
  • #39 MERGED (test case only as quoted-printable implemented by another PR) adds quoted-printable
  • #38 MERGED use NextRawPart instead of NextPart and decode content the same way as the rest of the library
  • #37 MERGED TODO NEEDS MANUAL FIX TO AVOID RE-ENCODE Add 8bit and binary decoding support
  • #31 SKIPPED incomplete duplicate of 37
  • #29 MERGED Parse email with just a binary attachment and no text
  • #28 SKIPPED duplicate of 42
  • #26 MERGED Decode content with supported Content-Encoding everywhere
  • #25 SKIPPED duplicate of 26
  • #27 TODO: Support to Encapsulated Messages (message/rfc822) inside multipart/mixed
  • #24 MERGED (test case for base64 encoding only)

Some of them required manual merging but the resulting code should be correct, hopefully.

The fork is a drop-in replacement (just change the import to github.com/k3a/parsemail). I will attempt to keep the API the same. If you miss some functionality, feel free to contribute to the fork.

I've also made a separate branch with integrated PRs which is directly mergeable into this original repo in case it resurrects but I won't update that branch going forward.

Unknown multipart/mixed nested mime type: text/html

I am getting this error when trying to parse emails sent from Outlook365:

Unknown multipart/mixed nested mime type: text/html

The email configuration inside of Outlook365 is:

Server Name: smtp.office365.com
Port: 587
SSL/TLS: true

Are there any known issues with text/html mime types or with TLS?

Unable to parse email with 7bit encoded attachment

I have been using this library to parse some emails and it was working fine until I came across an email with a 7bit-encoded attachment (a csv file), which the parser was unable to handle.

I have opened pull request #8, which fixes it for me.

Feature Request: Surface the filename for embedded files

Embedded files store their filename as a parameter of the content-type and content-disposition. It would be handy if the parsemail package supported extracting them conveniently.

It's not particularly difficult to do with the mime stdlib package, so I can definitely understand if this is out-of-scope of this package and would be considered scope creep.

If the feature would be appreciated I'd be happy to contribute.

An example of how I'd implement it:
https://github.com/leighmcculloch/emlx/blob/487fb6c539137ec1d4b0c91bc2cb0118db7b4be3/main.go#L72-L78

Content-Transfer-Encoding in multipart not respected

I've tried to parse an email where the Content-Transfer-Encoding was specified in the multipart headers, but it was not respected; the Email struct had both TextBody and HTMLBody as raw, undecoded base64 strings.

--_004_LO2P123MB191986C7932A77EC22F3974AA4290LO2P123MB1919GBRP_
Content-Type: multipart/alternative;
	boundary="_000_LO2P123MB191986C7932A77EC22F3974AA4290LO2P123MB1919GBRP_"

--_000_LO2P123MB191986C7932A77EC22F3974AA4290LO2P123MB1919GBRP_
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64

DQpNeSBhY2NvdW50IElEIGlzIDMxNzgyLg0KDQpSZWdhcmRzDQpBaWRhbg0KDQpGcm9tOiBTdXBw
b3J0IHwgR3Vlc3RwbGFuIDxzdXBwb3J0QGd1ZXN0cGxhbi5jb20+DQpTZW50OiAwNyBKdWx5IDIw

Cannot handle Dates with additional suffix

The parser is failing to recognise dates which have an additional timezone suffix, e.g. (GMT).

As this is redundant (the timestamp should include -0700) this can be stripped off and thus allow the time string to be parsed.

I have opened pull request #9 to do this.
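
The stripping step could look roughly like this. It is a sketch under assumptions, not the code from the pull request: parseDate and the regular expression are hypothetical, and only a single trailing parenthesized suffix is handled.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"time"
)

// trailingComment matches a parenthesized suffix such as " (GMT)" at the end
// of a date string.
var trailingComment = regexp.MustCompile(`\s*\([^()]*\)\s*$`)

// parseDate is a hypothetical helper: it drops the redundant suffix and
// parses the remainder, relying on the numeric zone offset.
func parseDate(s string) (time.Time, error) {
	s = trailingComment.ReplaceAllString(strings.TrimSpace(s), "")
	return time.Parse("Mon, 2 Jan 2006 15:04:05 -0700", s)
}

func main() {
	t, err := parseDate("Fri, 27 Apr 2012 16:55:53 +0200 (CEST)")
	if err != nil {
		panic(err)
	}
	fmt.Println(t.UTC())
}
```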

Test does not compile

The test does not compile. This line

e, err := Parse(m)

has this error:

cannot use m (type *mail.Message) as type io.Reader in argument to Parse:
*mail.Message does not implement io.Reader (missing Read method)

What are you trying to do here? mail.Message looks like this:

type Message struct {
	Header Header
	Body   io.Reader
}

You could call Parse(m.Body) but this would fail the tests because the body does not contain the headers.

Reading attachments reads nothing

When I try to read an attachment, no bytes are returned:

var content []byte
size, err := attachement.Data.Read(content)
// size = 0

The email roughly looks like this:

Content-Type: multipart/mixed; boundary="000000000000f229a405f393ef1b"
<!-- ... -->

--000000000000f229a405f393ef1b
Content-Type: multipart/alternative; boundary="000000000000f229a105f393ef19"

--000000000000f229a105f393ef19
Content-Type: text/plain; charset="UTF-8"

<!-- ... -->

--000000000000f229a105f393ef19
Content-Type: text/html; charset="UTF-8"

<!-- ... -->

--000000000000f229a105f393ef19--
--000000000000f229a405f393ef1b
Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document;
        name="bondsstaf 27_01_2023 JULIE.docx"
Content-Disposition: attachment; filename="bondsstaf 27_01_2023 JULIE.docx"
Content-Transfer-Encoding: base64
Content-ID: <f_ldkltcqk0>
X-Attachment-Id: f_ldkltcqk0

<!-- attachment in base64 ... -->
--000000000000f229a405f393ef1b--

Can't get extension headers

I'm writing an application where the code using the Email object needs to be able to extract extension headers; an example is "X-Thread-ID". May I have a GetHeader method, please?
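
Until such a method exists, a possible workaround is to read the header with net/mail directly. This sketch assumes the raw message text is still available to the caller. The helper extensionHeader is a hypothetical name and not part of parsemail.

```go
package main

import (
	"fmt"
	"net/mail"
	"strings"
)

// extensionHeader is a hypothetical helper that parses the raw message and
// returns the first value of the named header. The lookup is
// case-insensitive because mail.Header.Get canonicalizes the key.
func extensionHeader(raw, name string) (string, error) {
	msg, err := mail.ReadMessage(strings.NewReader(raw))
	if err != nil {
		return "", err
	}
	return msg.Header.Get(name), nil
}

func main() {
	// Illustrative message, not taken from the report.
	raw := "Subject: hi\r\nX-Thread-ID: abc123\r\n\r\nbody"
	id, err := extensionHeader(raw, "X-Thread-ID")
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```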

can't parse correctly Content-Type:text/html

INPUT:

Content-Type: text/html; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
From: Private Person [email protected]
To: A Test User [email protected]
Subject: Hi HTML Smtpman

ICAgIDxodG1sPgogICAgPGJvZHk+CiAgICA8aDE+VGhpcyBpcyBhIHRlc3QgZS1tYWlsIG1lc3Nh
Z2UuPC9oMT4KICAgIDxib2R5PgogICAgPC9odG1sPgogICAg

OUTPUT:

email, err := parsemail.Parse(strings.NewReader(raw))

email.HTMLBody:

ICAgIDxodG1sPgogICAgPGJvZHk+CiAgICA8aDE+VGhpcyBpcyBhIHRlc3QgZS1tYWlsIG1lc3Nh
Z2UuPC9oMT4KICAgIDxib2R5PgogICAgPC9odG1sPgogICAg

Expected: HTMLBody should contain the base64-decoded HTML.

Failed to forward emails with Content type like file

Hey,

I know this repository is not supported, but maybe someone knows how to fix emails like that:

Content-Type: application/zip;
name="google.com!pxdmail.net!1711756800!1711843199.zip"
Content-Disposition: attachment;
filename="google.com!pxdmail.net!1711756800!1711843199.zip"
Content-Transfer-Encoding: base64

UEsDBAoAAAAIAKFMf1iNN1a6OgIAAJQHAAAwAAAAZ29vZ2xlLmNvbSFweGRtYWlsLm5ldCExNzEx
....

The email is forwarded without the attachment. The body is empty, as it should be, but the attachment should still be sent.

I'm currently working from a fork of https://github.com/EVANA-AG/parsemail because it handles encodings better: k3a's fork had an encoding problem that was fixed in EVANA's.
I've worked around the issue by adding application/zip to a switch case (yatsenkolesh@e372177), but it's not a very good solution. If you want to borrow my workaround, be aware that the commit after it fixes base64.

If anyone has already fixed this, I would appreciate borrowing your solution. If not, let me know if you have the same problem and I'll ping you in case I fix it myself.

Thanks!
