Outlook, Mail Archives, and Duplicates

Exchange and Outlook are dismal examples of code, but the fact remains that they are ubiquitous. Nobody has managed to create a mail/calendar/contacts/task application with wider adoption, and it has enough inertia that well designed applications have little chance to make inroads, which means a lot of people are stuck with it. For those of us who prefer elegant, well designed applications, putting up with their quirks is maddening.

Outlook, for example, has a hard-to-explain 2 gigabyte limit on mail archives — and mail archives are arguably one of the niftier features that Outlook offers. Early versions of Outlook don’t know any better, and simply corrupt your mail archives. Later versions of Outlook know better, and warn you not to exceed the limit. While some noise has been made about Outlook finally removing the 2 gigabye limit, it’s actually not quite true, it’s only been removed for Exchange style mailboxes, and is still there, for example, for imap mail boxes.

For those of us with lots of mail and the need to archive it (I receive a lot of technical documents, some very large, via email) using Outlook’s built-in “archives” isn’t really an option, so I used the simple expedient of setting up an archive IMAP server, where the size wouldn’t be an issue. While this works reasonably well going forward, Outlook puked enough while trying to move messages from its proprietary formats to imap, that I was left with a vast number of duplicates.

On a significantly large mailbox, this is a bigger problem than it sounds like — especially since the duplicates were created with different mail id’s, and in many cases the white space or envelopes are different, while the messages are clearly identical. Maddening, but it largely means that any automated duplicate removal will have to happen through IMAP, not through the filesystem.

While it seems that a tool to locate and eliminate duplicate IMAP emails would be simple to find, it appears that such a beast simply does not exist, except for the trivial case in which the message id’s are identical. At the imap level, there are a decent number of tools here:

http://www.athensfbc.com/imap_tools/

Which work admirably, for the most part. For the remainder, I used this Thunderbird Add-on, which took care of the remaining fringe cases. The only problem, of course, is that on a really large email folder, Thunderbird starts to complain endlessly about script timeouts. However, you shouldn’t really need to do this regularly.

Share