By Curtis Jones in tech — 02 Apr 2009

Mail

Due to years of POP3 use and a rough transition to IMAP, I ended up with thousands of duplicate emails scattered across hundreds of mailboxes. In the same spirit as my desire to replace my boss at the AJC with a small shell script, I reduced the cleanup to a small Perl script: mail.pl.

Assuming you save it to your Downloads directory, and after you update the $DUPLICATES value near the top of the script, you can run it like this:

find ~/Library/Mail -name Messages | while IFS= read -r FILE; do \
echo "$FILE"; \
perl ~/Downloads/mail.pl "$FILE"; \
done

The script works on one mailbox at a time. It parses the MIME headers of each email in the mailbox; if two emails have the same Date (to the second) and exactly the same Subject, it treats them as duplicates and moves one of them to the Duplicates directory. Unfortunately, I can't use file size as part of the comparison, because some mail programs seem to append their own crap to the end of the email. A cautious person might also compare From, but the script doesn't do that at the moment.
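For anyone who wants to see the matching rule spelled out, here is a rough Python sketch of the same idea. It is not mail.pl; the function names (message_key, find_duplicates) and the optional From check are made up for illustration. It also assumes that an .emlx file begins with a line holding the message's byte count, which is skipped before the headers are parsed. The sketch only reports duplicates and does not move anything.

```python
import email.parser
import email.utils
from pathlib import Path


def message_key(path, extra_headers=()):
    """Build a comparison key from Date and Subject (plus any extra headers).

    Returns None when the Date or Subject is missing or the Date is unparseable,
    so such messages are never treated as duplicates.
    """
    raw = Path(path).read_bytes()
    # Assumption: .emlx files start with a byte-count line; skip it if present.
    first, _, rest = raw.partition(b"\n")
    if first.strip().isdigit():
        raw = rest
    headers = email.parser.BytesHeaderParser().parsebytes(raw)
    date, subject = headers.get("Date"), headers.get("Subject")
    if date is None or subject is None:
        return None
    try:
        # Header dates carry whole seconds, so this compares to the second.
        when = email.utils.parsedate_to_datetime(str(date))
    except (TypeError, ValueError):
        return None
    extras = tuple(str(headers.get(name, "")) for name in extra_headers)
    return (when, str(subject)) + extras


def find_duplicates(mailbox_dir, extra_headers=()):
    """Return the .emlx files whose key repeats an earlier file in the mailbox."""
    seen = set()
    duplicates = []
    for path in sorted(Path(mailbox_dir).glob("*.emlx")):
        key = message_key(path, extra_headers)
        if key is None:
            continue
        if key in seen:
            duplicates.append(path)  # same Date and Subject as an earlier file
        else:
            seen.add(key)
    return duplicates
```

Calling find_duplicates(some_messages_dir) gives a list to review before moving anything; passing extra_headers=("From",) adds the more cautious From comparison. File size is deliberately left out of the key, for the reason given above.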

Remember, this is only effective on LOCAL mailboxes. If you point the script at mailboxes that belong to an IMAP account, you'll just be screwing around with the local copies (which is not the same as a local mailbox). That will probably only piss off Mail, and it certainly won't change anything on the IMAP server. So if you have duplicates in an IMAP account, you need to COPY the IMAP mailboxes down so that they're On-My-Mac (or whatever), run the script, verify the results, remove the mailbox(es) from the IMAP account, and then COPY the duplicate-free mailbox(es) back up to the IMAP account.

This script isn't Mac- or Mail-only in any sense other than that it relies on the file names ending in ".emlx" and on each file containing the raw email (MIME headers followed by other crap).

Use with caution.
