dusankasan / parsemail
Simple email parsing for Golang
License: MIT License
----boundary_23799_9387ca5c-694c-4010-b46e-3154585edde8
Content-Type: application/octet-stream; name="filename.pdf"
Content-Transfer-Encoding: base64
Content-Disposition: attachment
...
Hi, thanks for a great lib! I am using it to parse a lot of emails and found this error when there is an attachment like the one above:
Unknown multipart/mixed nested mime type: application/octet-stream
Maybe it is a bug.
When sending a plain text file as an attachment to an email, the parser treats it as part of the body: it appends the file content to TextBody and leaves the Attachments slice empty.
Example mail:
From: John Doe <[email protected]>
Content-Type: multipart/mixed; boundary="Apple-Mail=_D95A65F4-8972-4641-871C-AE6A660FEA26"
Subject: has attachment
Date: Fri, 27 Apr 2012 16:55:53 +0200
Message-Id: <[email protected]>
To: [email protected]
Mime-Version: 1.0 (Apple Message framework v1257)
X-Mailer: Apple Mail (2.1257)
--Apple-Mail=_D95A65F4-8972-4641-871C-AE6A660FEA26
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
charset=us-ascii
test
--Apple-Mail=_D95A65F4-8972-4641-871C-AE6A660FEA26
Content-Disposition: attachment;
filename=test.txt
Content-Type: text/plain;
x-unix-mode=0644;
name="test.txt"
Content-Transfer-Encoding: quoted-printable
testfile
--Apple-Mail=_D95A65F4-8972-4641-871C-AE6A660FEA26--
Expected result of parsemail.Parse: TextBody is "test", and Attachments has a single attachment containing the plain-text file.
Actual result: TextBody is "testtestfile", and Attachments is empty.
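One way to get the expected result is to look at Content-Disposition before looking at Content-Type. The following is a minimal sketch using only the standard library, not parsemail's actual code; splitParts and sampleMail are hypothetical names, and the sample is a shortened version of the mail above with a made-up boundary.

```go
package main

import (
	"fmt"
	"io"
	"mime"
	"mime/multipart"
	"net/mail"
	"strings"
)

// sampleMail is a shortened, hypothetical version of the mail in the report.
const sampleMail = "Content-Type: multipart/mixed; boundary=\"B\"\n" +
	"Subject: has attachment\n\n" +
	"--B\nContent-Type: text/plain; charset=us-ascii\n\ntest\n" +
	"--B\nContent-Disposition: attachment; filename=test.txt\n" +
	"Content-Type: text/plain; name=\"test.txt\"\n\ntestfile\n" +
	"--B--\n"

// splitParts walks a flat multipart message and keeps parts marked as
// "attachment" out of the text body, even when they are text/plain.
func splitParts(raw string) (string, []string, error) {
	msg, err := mail.ReadMessage(strings.NewReader(raw))
	if err != nil {
		return "", nil, err
	}
	_, params, err := mime.ParseMediaType(msg.Header.Get("Content-Type"))
	if err != nil {
		return "", nil, err
	}
	var textBody string
	var attachments []string
	mr := multipart.NewReader(msg.Body, params["boundary"])
	for {
		part, err := mr.NextPart()
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", nil, err
		}
		data, err := io.ReadAll(part)
		if err != nil {
			return "", nil, err
		}
		content := strings.TrimSpace(string(data))
		// A missing or unparsable disposition yields "", i.e. an inline part.
		disp, _, _ := mime.ParseMediaType(part.Header.Get("Content-Disposition"))
		if disp == "attachment" {
			attachments = append(attachments, content)
			continue
		}
		textBody += content
	}
	return textBody, attachments, nil
}

func main() {
	text, atts, err := splitParts(sampleMail)
	fmt.Println(text, atts, err)
}
```

The sketch only handles one level of multipart nesting and keeps attachments as strings for brevity; a real parser would recurse and keep the data as a reader or byte slice.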
Sadly, this repo has not been updated in more than 3 years and it doesn't look like it ever will be. The library is pretty nice; it just misses some important functionality regarding Content-Transfer-Encoding, which was provided by PRs that were never merged.
So I decided to create a new fork with CI and integrate all open PRs into it.
Some of them required manual merging, but the resulting code should be correct, hopefully.
The fork is a drop-in replacement (just change the import to github.com/k3a/parsemail). I will attempt to keep the API the same. If you miss some functionality, feel free to contribute to the fork.
I've also made a separate branch with the integrated PRs which is directly mergeable into this original repo in case it is revived, but I won't update that branch going forward.
I am getting this error when trying to parse emails sent from Outlook365:
Unknown multipart/mixed nested mime type: text/html
The email configuration inside of Outlook365 is:
Server Name: smtp.office365.com
Port: 587
SSL/TLS: true
Are there any known issues with the text/html MIME type or with TLS?
I have been using this library to parse some emails and it was working fine until I came across an email with an attachment (a CSV file) that was 7bit encoded, which the parser was unable to handle.
I have opened a pull request that fixes it for me: #8
The Subject is taken directly from the mail source without any decoding, so if it is encoded as =?UTF-8?B?... then you get it as it is.
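For reference, the standard library can decode such RFC 2047 encoded-words with mime.WordDecoder. This is a minimal sketch of how a caller could post-process the subject; decodeSubject is a hypothetical helper, not part of parsemail.

```go
package main

import (
	"fmt"
	"mime"
)

// decodeSubject decodes RFC 2047 encoded-words in a header value.
// If decoding fails, the raw value is returned unchanged.
func decodeSubject(raw string) string {
	dec := new(mime.WordDecoder)
	decoded, err := dec.DecodeHeader(raw)
	if err != nil {
		return raw
	}
	return decoded
}

func main() {
	fmt.Println(decodeSubject("=?UTF-8?B?SGVsbG8=?=")) // Hello
	fmt.Println(decodeSubject("plain subject"))        // plain subject
}
```

Out of the box the decoder handles UTF-8, ISO-8859-1 and US-ASCII; other charsets need a CharsetReader set on the WordDecoder.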
Embedded files store their filename as a parameter of the content-type and content-disposition. It would be handy if the parsemail package supported extracting them conveniently.
It's not particularly difficult to do with the mime stdlib package, so I can definitely understand if this is out of scope for this package and would be considered scope creep.
If the feature would be appreciated, I'd be happy to contribute.
An example of how I'd implement it would be something like this:
https://github.com/leighmcculloch/emlx/blob/487fb6c539137ec1d4b0c91bc2cb0118db7b4be3/main.go#L72-L78
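A rough sketch of the idea with mime.ParseMediaType, independent of the linked code: prefer the filename parameter of Content-Disposition and fall back to the name parameter of Content-Type. attachmentFilename is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"mime"
)

// attachmentFilename returns the filename parameter of Content-Disposition,
// falling back to the name parameter of Content-Type, or "" if neither is set.
func attachmentFilename(contentDisposition, contentType string) string {
	if _, params, err := mime.ParseMediaType(contentDisposition); err == nil {
		if name := params["filename"]; name != "" {
			return name
		}
	}
	if _, params, err := mime.ParseMediaType(contentType); err == nil {
		return params["name"]
	}
	return ""
}

func main() {
	fmt.Println(attachmentFilename(`attachment; filename="test.txt"`, `text/plain; name="other.txt"`))
	fmt.Println(attachmentFilename("attachment", `application/octet-stream; name="filename.pdf"`))
}
```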
I've tried to parse an email where the Content-Transfer-Encoding was specified in the multipart headers, but it was not respected; the Email struct had both TextBody and HTMLBody as raw, undecoded base64 strings.
--_004_LO2P123MB191986C7932A77EC22F3974AA4290LO2P123MB1919GBRP_
Content-Type: multipart/alternative;
boundary="_000_LO2P123MB191986C7932A77EC22F3974AA4290LO2P123MB1919GBRP_"
--_000_LO2P123MB191986C7932A77EC22F3974AA4290LO2P123MB1919GBRP_
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
DQpNeSBhY2NvdW50IElEIGlzIDMxNzgyLg0KDQpSZWdhcmRzDQpBaWRhbg0KDQpGcm9tOiBTdXBw
b3J0IHwgR3Vlc3RwbGFuIDxzdXBwb3J0QGd1ZXN0cGxhbi5jb20+DQpTZW50OiAwNyBKdWx5IDIw
The parser fails to recognise dates that have an additional timezone suffix, e.g. (GMT).
As this is redundant (the timestamp should already include an offset such as -0700), it can be stripped off, which allows the time string to be parsed.
I have opened a pull request to do this: #9
The test does not compile, this line
Line 255 in abc6488
has this error:
cannot use m (type *mail.Message) as type io.Reader in argument to Parse:
*mail.Message does not implement io.Reader (missing Read method)
What are you trying to do here? mail.Message looks like this:
type Message struct {
	Header Header
	Body   io.Reader
}
You could call Parse(m.Body), but this would fail the tests because the body does not contain the headers.
Could the latest commit be tagged as a new version?
When I try to read an attachment, no bytes are returned:
var content []byte
size, err := attachement.Data.Read(content)
// size = 0
The email roughly looks like this:
Content-Type: multipart/mixed; boundary="000000000000f229a405f393ef1b"
<!-- ... -->
--000000000000f229a405f393ef1b
Content-Type: multipart/alternative; boundary="000000000000f229a105f393ef19"
--000000000000f229a105f393ef19
Content-Type: text/plain; charset="UTF-8"
<!-- ... -->
--000000000000f229a105f393ef19
Content-Type: text/html; charset="UTF-8"
<!-- ... -->
--000000000000f229a105f393ef19--
--000000000000f229a405f393ef1b
Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document;
name="bondsstaf 27_01_2023 JULIE.docx"
Content-Disposition: attachment; filename="bondsstaf 27_01_2023 JULIE.docx"
Content-Transfer-Encoding: base64
Content-ID: <f_ldkltcqk0>
X-Attachment-Id: f_ldkltcqk0
<!-- attachment in base64 ... -->
--000000000000f229a405f393ef1b--
I'm writing an application where the code using the Email object needs to be able to extract extension headers; an example is "X-Thread-ID". May I have a GetHeader method, please?
For QQ mailboxes, the sender is in the X-QQ-ORGSender header instead of Sender.
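Until such a method exists, extension headers can be read with net/mail directly. This is a sketch under the assumption that the raw message is still available to the caller; extensionHeader is a hypothetical helper, and whether the Email struct itself exposes the parsed headers is not verified here.

```go
package main

import (
	"fmt"
	"net/mail"
	"strings"
)

// extensionHeader returns the first value of the named header from a raw
// message. The lookup is case-insensitive.
func extensionHeader(raw, name string) (string, error) {
	msg, err := mail.ReadMessage(strings.NewReader(raw))
	if err != nil {
		return "", err
	}
	return msg.Header.Get(name), nil
}

func main() {
	raw := "X-Thread-ID: abc123\nSubject: hi\n\nbody\n"
	v, err := extensionHeader(raw, "X-Thread-ID")
	fmt.Println(v, err)
}
```

The same lookup would work for a header such as X-QQ-ORGSender.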
INPUT:
Content-Type: text/html; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
From: Private Person [email protected]
To: A Test User [email protected]
Subject: Hi HTML Smtpman
ICAgIDxodG1sPgogICAgPGJvZHk+CiAgICA8aDE+VGhpcyBpcyBhIHRlc3QgZS1tYWlsIG1lc3Nh
Z2UuPC9oMT4KICAgIDxib2R5PgogICAgPC9odG1sPgogICAg
OUTPUT:
email, err := parsemail.Parse(strings.NewReader(raw))
email.HTMLBody:
ICAgIDxodG1sPgogICAgPGJvZHk+CiAgICA8aDE+VGhpcyBpcyBhIHRlc3QgZS1tYWlsIG1lc3Nh
Z2UuPC9oMT4KICAgIDxib2R5PgogICAgPC9odG1sPgogICAg
The parser should decode the base64 body.
I got this error when parsing an inbound email:
Unknown top level mime type: multipart/related
Hey,
I know this repository is not supported, but maybe someone knows how to fix emails like that:
Content-Type: application/zip;
name="google.com!pxdmail.net!1711756800!1711843199.zip"
Content-Disposition: attachment;
filename="google.com!pxdmail.net!1711756800!1711843199.zip"
Content-Transfer-Encoding: base64
UEsDBAoAAAAIAKFMf1iNN1a6OgIAAJQHAAAwAAAAZ29vZ2xlLmNvbSFweGRtYWlsLm5ldCExNzEx
....
It is parsed without the attachment. The body is empty, as it should be, but the attachment should be present.
I'm currently using the fork https://github.com/EVANA-AG/parsemail because it works better with encoding; k3a had an encoding problem that was fixed in EVANA.
I've fixed it by adding application/zip to a switch case (yatsenkolesh@e372177), but it's not a very good solution. Also, if you want to steal my crutch, please be aware that the next commit fixes base64.
If anyone has fixed it already, I would appreciate stealing your solution; if not, let me know if you have the same problem and I'll ping you in case I fix it myself.
Thanks!