Some months ago I wrote a tiny Python script which scan all folders and sub-folders of a Maildir, then remove duplicate mails.
You can give the script a list of email headers to ignore while it compares mails between each others. This is particularly helpful to find duplicate mails having the exact same content but different headers/metadatas.
I created this script to clean up a Maildir folder I messed up after moving repeatedly tons of mails from a Lotus Notes database. As you can see below, the same mail imported twice contain a variable header based on the date and time the import was performed:
This variable header make mails looks different from the point of view of the
script. Thatβs explain why I implemented the
HEADERS_TO_IGNORE
parameter with
the default set to
X-MIMETrack
.
The script is available on GitHub . It was tested on Mac OS X Snow Leopard with python 2.6.2 but should work on other systems and versions as the code is really simple (and stupid).