Comments (6)

joeyates commented on August 26, 2024

Hi @bentolor

That's an interesting one! If the thunderbird gem had a mailbox message iterator, the rest would just be a bit of glue and deciding on the import and export paths :)

...I'll have a look

from imap-backup.

bentolor commented on August 26, 2024

Thanks @joeyates for your quick feedback and help.

Meanwhile, I found the small Python script https://github.com/rgladwell/imap-upload/ which, after some fiddling, allowed me to upload a local MBOX export. So my immediate problem is solved, and I now appreciate the challenges of running a self-hosted mail archive with full-text search on web and mobile.

I still think that, for symmetry, an imap-backup utils import-from-thunderbird FOLDER command would be a great addition.

Along the same lines, I was also missing an imap-backup remote accounts command lately ;-).

joeyates commented on August 26, 2024

With Thunderbird, it's not sufficient to read the mailbox file itself to get the messages.

This is for two reasons.

Firstly, the mailbox may contain messages that have been deleted.

Secondly, there is an edge case when finding message boundaries in plain-text emails. The following is a note explaining this second problem.

Each message starts with a 'From' line e.g.:

From - Sun Jan 14 09:39:37 2024

To find the start of the following message, it is not sufficient to search for lines with that format, as lines in the email body itself may match.

If the email message is multipart (text+html), the boundary markers can be used to skip past the body, e.g.:

From - Sun Jan 14 09:36:08 2024
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
MIME-Version: 1.0
Date: Sun, 14 Jan 2024 09:35:57 +0100
Message-ID: <CAD0bxQFFegRpHErN1rAKEp1tqwxVBhb=e2UcoBRurSSYMF+Bew@mail.gmail.com>
Subject: Blah
From: Me <[email protected]>
To: You <[email protected]>
Content-Type: multipart/alternative; boundary="00000000000043120b060ee3c968"

--00000000000043120b060ee3c968
Content-Type: text/plain; charset="UTF-8"

From - Sun Jan 14 09:34:15 2024
The previous line is part of the email body!

--00000000000043120b060ee3c968
Content-Type: text/html; charset="UTF-8"

<div dir="ltr"><div>From - Sun Jan 14 09:34:15 2024</div><div>The previous line is part of the email body!</div><div><br></div></div>

--00000000000043120b060ee3c968--

This is not possible for text-only emails, which don't have boundary markers, e.g.:

From - Sun Jan 14 09:39:37 2024
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
MIME-Version: 1.0
Date: Sun, 14 Jan 2024 09:39:34 +0100
Message-ID: <CAD0bxQEb=3gAWk6Gys3FUhOURsTOns9sREQhVJVrD-Quq=gTQg@mail.gmail.com>
Subject: Blah
From: Me <[email protected]>
To: You <[email protected]>
Content-Type: text/plain; charset="UTF-8"

From - Sun Jan 14 09:34:15 2024
The previous line is part of the email body!
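To make the ambiguity concrete, here is a minimal Python sketch of a splitter that treats every line matching the 'From' format as a message boundary. The function name naive_split and the regular expression are my own illustration, not anything from imap-backup or the thunderbird gem, and the sample is a shortened version of the text-only message above.

```python
import re
from typing import List

# Matches Thunderbird-style separator lines such as
# "From - Sun Jan 14 09:39:37 2024" (pattern is an illustrative assumption).
FROM_LINE = re.compile(r"^From - \w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}$")


def naive_split(mbox_text: str) -> List[str]:
    """Start a new message at every line that looks like a separator.

    Deliberately naive: it cannot tell a real separator from a body line
    that happens to have the same format.
    """
    messages: List[List[str]] = []
    for line in mbox_text.splitlines():
        if FROM_LINE.match(line):
            messages.append([line])
        elif messages:
            messages[-1].append(line)
    return ["\n".join(lines) for lines in messages]


# A single text-only message whose body contains a 'From' look-alike.
SAMPLE = "\n".join([
    "From - Sun Jan 14 09:39:37 2024",
    "Subject: Blah",
    'Content-Type: text/plain; charset="UTF-8"',
    "",
    "From - Sun Jan 14 09:34:15 2024",
    "The previous line is part of the email body!",
])

parts = naive_split(SAMPLE)
print(len(parts))  # the mailbox holds one message, but this prints 2
```

The second "message" it reports is just the tail of the first message's body, which is why pattern matching alone is not enough here.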

So, I believe that, to correctly identify the message boundaries in Thunderbird mailboxes, it is necessary to parse the associated *.msf index file.

These files contain indicators for the position and length of current messages in the mailbox (msgOffset, offlineMsgSize).

Unfortunately, Thunderbird still uses the dreaded Mork file format for these files.
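If the offsets and sizes were available, extracting the messages would reduce to seeking and reading. The Python sketch below assumes the (offset, size) pairs have already been obtained somehow; it does not parse Mork, and the function name read_messages is hypothetical. Whether msgOffset points at the 'From' line and exactly which bytes offlineMsgSize covers are assumptions here that would need checking against real mailboxes.

```python
import io
from typing import BinaryIO, Iterable, Iterator, Tuple


def read_messages(
    mailbox: BinaryIO, index: Iterable[Tuple[int, int]]
) -> Iterator[bytes]:
    """Yield each message as raw bytes, given (offset, size) pairs.

    The pairs stand in for msgOffset / offlineMsgSize values; obtaining
    them from the .msf file is out of scope for this sketch.
    """
    for offset, size in index:
        mailbox.seek(offset)
        data = mailbox.read(size)
        if len(data) != size:
            raise ValueError(
                f"index entry ({offset}, {size}) runs past the end of the mailbox"
            )
        yield data


# Made-up sample data: the first message has a 'From' look-alike in its body.
first = (
    b"From - Sun Jan 14 09:39:37 2024\n"
    b"Subject: Blah\n"
    b"\n"
    b"From - Sun Jan 14 09:34:15 2024\n"
    b"The previous line is part of the email body!\n"
)
second = (
    b"From - Sun Jan 14 09:41:02 2024\n"
    b"Subject: Another\n"
    b"\n"
    b"Hello\n"
)
mailbox_bytes = first + second

# Hypothetical index, computed by hand for the sample above.
index = [(0, len(first)), (len(first), len(second))]

messages = list(read_messages(io.BytesIO(mailbox_bytes), index))
print(len(messages))  # one entry per index pair
```

Because the index only lists current messages, deleted messages that are still physically present in the mailbox file would simply never be read.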

joeyates commented on August 26, 2024

I'll leave this open in the hope that an easier solution comes to light. Otherwise, I may just write a Mork parser!

bentolor commented on August 26, 2024

Thanks for your research and friendly feedback!

Mork is called out on Wikipedia as follows:

He has lambasted the ostensibly "textual" format on the grounds that it is "not human-readable",[3] bemoaned the impossibility of writing a correct parser for the format,[4] and referred to it as "...the single most braindamaged file format that I have ever seen in my nineteen year career".[4]

Given that, and for the sake of your sanity, I'm not sure I'd recommend writing a Mork parser ;-)

I had understood (and treated) the .msf files as throwaway files, especially when my full-text index got corrupted. But I also have a few corrupted emails where I'm not aware of the source of the corruption.

Reliably storing emails – how hard can it be?!?

.mbox file format family: Hold my beer!

joeyates commented on August 26, 2024

I've added a contrib script with an example of importing from Thunderbird.
