Comments (6)
Hi @bentolor
That's an interesting one! If the `thunderbird` gem had a mailbox message iterator, the rest would just be a bit of glue and deciding on the import and export paths :)
...I'll have a look
from imap-backup.
Thanks @joeyates for your quick feedback and help.
Meanwhile, I found the small Python script https://github.com/rgladwell/imap-upload/ which, after some fiddling, allowed me to upload a local MBOX export. So my immediate problem has been solved, and now I realize the challenges of having a self-hosted, web/mobile full-text searchable mail archive.
I still think that, for symmetry, an `imap-backup utils import-from-thunderbird FOLDER` command would be a great addition.
Along the same lines, I was also missing an `imap-backup remote accounts` command lately ;-).
With Thunderbird, it's not sufficient to read the mailbox file itself to get the messages.
This is for two reasons.
Firstly, the mailbox may contain messages that have been deleted.
Secondly, there is an edge case regarding plain-text emails in finding message boundaries. The following is a note explaining this second problem.
Each message starts with a 'From' line e.g.:
From - Sun Jan 14 09:39:37 2024
To find the following message, it is not sufficient to search for lines with that format, as lines in the email body itself may match.
If the email message is multipart (text+html), the boundary markers can be used to skip past the body, e.g.:
From - Sun Jan 14 09:36:08 2024
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
MIME-Version: 1.0
Date: Sun, 14 Jan 2024 09:35:57 +0100
Message-ID: <CAD0bxQFFegRpHErN1rAKEp1tqwxVBhb=e2UcoBRurSSYMF+Bew@mail.gmail.com>
Subject: Blah
From: Me <[email protected]>
To: You <[email protected]>
Content-Type: multipart/alternative; boundary="00000000000043120b060ee3c968"
--00000000000043120b060ee3c968
Content-Type: text/plain; charset="UTF-8"
From - Sun Jan 14 09:34:15 2024
The previous line is part of the email body!
--00000000000043120b060ee3c968
Content-Type: text/html; charset="UTF-8"
<div dir="ltr"><div>From - Sun Jan 14 09:34:15 2024</div><div>The previous line is part of the email body!</div><div><br></div></div>
--00000000000043120b060ee3c968--
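As a rough illustration of the boundary-skipping idea, here is a minimal Python sketch. The names `FROM_LINE`, `BOUNDARY` and `next_message_start` are made up for this example and are not part of imap-backup or the thunderbird gem. It assumes the message really is multipart, that the first `boundary=` parameter it sees belongs to the top-level `Content-Type` header, and that the closing marker appears exactly once.

```python
import re

# Thunderbird-style separator, e.g. "From - Sun Jan 14 09:36:08 2024"
FROM_LINE = re.compile(r"^From - \w{3} \w{3} [ \d]\d \d\d:\d\d:\d\d \d{4}$")
# Assumption: the first boundary= parameter found is the top-level one.
BOUNDARY = re.compile(r'boundary="?([^";]+)"?', re.IGNORECASE)


def next_message_start(lines, start):
    """Return the index of the next 'From' line after the multipart
    message starting at lines[start], or len(lines) if there is none."""
    closing = None
    for i in range(start + 1, len(lines)):
        if closing is None:
            match = BOUNDARY.search(lines[i])
            if match:
                closing = "--" + match.group(1) + "--"
            continue
        if lines[i] == closing:
            # The body has ended, so 'From' lines can be trusted again.
            for j in range(i + 1, len(lines)):
                if FROM_LINE.match(lines[j]):
                    return j
            return len(lines)
    return len(lines)


sample = [
    "From - Sun Jan 14 09:36:08 2024",
    'Content-Type: multipart/alternative; boundary="00000000000043120b060ee3c968"',
    "",
    "--00000000000043120b060ee3c968",
    'Content-Type: text/plain; charset="UTF-8"',
    "",
    "From - Sun Jan 14 09:34:15 2024",
    "The previous line is part of the email body!",
    "--00000000000043120b060ee3c968--",
    "From - Sun Jan 14 09:39:37 2024",
    "Subject: next message",
]
next_start = next_message_start(sample, 0)
print(next_start)
```

The 'From' line inside the body (index 6) is skipped because it appears before the closing boundary marker.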
This is not possible for text-only emails, which don't have boundary markers, e.g.:
From - Sun Jan 14 09:39:37 2024
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
MIME-Version: 1.0
Date: Sun, 14 Jan 2024 09:39:34 +0100
Message-ID: <CAD0bxQEb=3gAWk6Gys3FUhOURsTOns9sREQhVJVrD-Quq=gTQg@mail.gmail.com>
Subject: Blah
From: Me <[email protected]>
To: You <[email protected]>
Content-Type: text/plain; charset="UTF-8"
From - Sun Jan 14 09:34:15 2024
The previous line is part of the email body!
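To make the text-only problem concrete, here is a small Python sketch of a naive scanner that treats every line in the 'From' format as a message start. The function name `naive_message_starts` and the regular expression are assumptions made for this example, not code from imap-backup.

```python
import re

# Thunderbird-style separator, e.g. "From - Sun Jan 14 09:39:37 2024"
FROM_LINE = re.compile(r"^From - \w{3} \w{3} [ \d]\d \d\d:\d\d:\d\d \d{4}$")


def naive_message_starts(text):
    """Return the line indexes that look like message separators."""
    return [
        i for i, line in enumerate(text.splitlines())
        if FROM_LINE.match(line)
    ]


# A single text-only message whose body happens to contain a 'From' line.
mailbox_text = "\n".join([
    "From - Sun Jan 14 09:39:37 2024",
    "Subject: Blah",
    'Content-Type: text/plain; charset="UTF-8"',
    "",
    "From - Sun Jan 14 09:34:15 2024",
    "The previous line is part of the email body!",
])

starts = naive_message_starts(mailbox_text)
print(starts)
```

The scanner reports two message starts although the mailbox holds only one message, and nothing in the text itself tells the two lines apart.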
So, I believe that, to correctly identify the message boundaries in Thunderbird mailboxes, it is necessary to parse the associated *.msf index file.
These files contain indicators for the position and length of current messages in the mailbox (`msgOffset`, `offlineMsgSize`).
Unfortunately, Thunderbird still uses the dreaded Mork file format for these files.
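Once the offsets and sizes are known, extracting the messages is the easy part. The Python sketch below assumes the index has already been read from the `.msf` file into a list of `(offset, size)` pairs; the Mork parsing itself is not shown. It also assumes that `msgOffset` is a byte offset to the start of the message and that `offlineMsgSize` is its length in bytes, which should be checked against real mailboxes. `iter_messages` is a hypothetical helper name.

```python
import io
from typing import BinaryIO, Iterable, Iterator, Tuple


def iter_messages(
    mbox: BinaryIO, index: Iterable[Tuple[int, int]]
) -> Iterator[bytes]:
    """Yield the raw bytes of each message listed in the index."""
    for offset, size in index:
        mbox.seek(offset)
        yield mbox.read(size)


first = (
    b"From - Sun Jan 14 09:39:37 2024\n"
    b"Subject: Blah\n\n"
    b"From - Sun Jan 14 09:34:15 2024\n"
    b"The previous line is part of the email body!\n"
)
deleted = b"From - Sun Jan 14 09:40:00 2024\nSubject: Deleted\n\nGone\n"
last = b"From - Sun Jan 14 09:41:00 2024\nSubject: Kept\n\nStill here\n"

# Assumed index contents: only current (non-deleted) messages are listed.
index = [(0, len(first)), (len(first) + len(deleted), len(last))]

messages = list(iter_messages(io.BytesIO(first + deleted + last), index))
print(len(messages))
```

Because the slices come from the index rather than from scanning for 'From' lines, the deleted message is skipped and the 'From' line inside the first body causes no false split.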
I'll leave this open in the hope that an easier solution comes to light. Otherwise, I may just write a Mork parser!
Thanks for your research and friendly feedback!
Given that Mork is called out on Wikipedia as follows:
He has lambasted the ostensibly "textual" format on the grounds that it is "not human-readable",[3] bemoaned the impossibility of writing a correct parser for the format,[4] and referred to it as "...the single most braindamaged file format that I have ever seen in my nineteen year career".[4]
I'm not sure I'd recommend writing a Mork parser, for the sake of sanity ;-)
I understood (and handled) the `.msf` files as throwaway files, especially when my full-text index got corrupted. But I also have a few corrupted emails where I'm not aware of the source of the corruption.
Reliably storing emails – how hard can it be?!?
The `.mbox` file format family: Hold my beer!
I've added a `contrib` script with an example of importing from Thunderbird.