How-To Backup Gmail with OfflineImap

Gmail’s content can be retrieved via IMAP, and we’ll use that to back up all our mail with OfflineImap, a generic IMAP synchronization utility.

Let’s start by creating a dedicated configuration file in your home directory. Its content is quite straightforward, as you can see in my /home/kevin/.offlineimaprc, which backs up two Gmail accounts:

[general]
accounts = gmail_account1, gmail_account2
maxsyncaccounts = 3
ui = Noninteractive.Basic

[Account gmail_account1]
localrepository = gmail_account1_local
remoterepository = gmail_account1_remote

[Repository gmail_account1_local]
type = Maildir
localfolders = ~/gmail-backup-account1

[Repository gmail_account1_remote]
type = IMAP
remotehost = imap.gmail.com
remoteport = 993
remoteuser = account1@gmail.com
remotepass = XXXXXXXX
ssl = yes
maxconnections = 1
realdelete = no
folderfilter = lambda foldername: foldername not in ['[Gmail]/%s' % f for f in ['All Mail', 'Trash', 'Spam', 'Starred', 'Important']]

[Account gmail_account2]
localrepository = gmail_account2_local
remoterepository = gmail_account2_remote

[Repository gmail_account2_local]
type = Maildir
localfolders = ~/gmail-backup-account2

[Repository gmail_account2_remote]
type = IMAP
remotehost = imap.gmail.com
remoteport = 993
remoteuser = account2@gmail.com
remotepass = XXXXXXXX
ssl = yes
maxconnections = 1
realdelete = no
folderfilter = lambda foldername: foldername not in ['[Gmail]/%s' % f for f in ['All Mail', 'Trash', 'Spam', 'Starred', 'Important']]

Notice how we use a Python lambda expression to filter out some of Gmail’s virtual folders.
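
To see what that filter keeps, here is a minimal sketch that runs the same lambda outside of OfflineImap. The folder names in the sample list are hypothetical and only serve as an illustration:

```python
# Same expression as the folderfilter option in the configuration above.
folderfilter = lambda foldername: foldername not in [
    '[Gmail]/%s' % f for f in ['All Mail', 'Trash', 'Spam', 'Starred', 'Important']
]

# Hypothetical folder names, for illustration only.
folders = ['INBOX', '[Gmail]/All Mail', '[Gmail]/Sent Mail', '[Gmail]/Spam', 'Receipts']

# Only the folders for which the filter returns True get synchronized.
kept = [name for name in folders if folderfilter(name)]
print(kept)  # ['INBOX', '[Gmail]/Sent Mail', 'Receipts']
```

The filter returns True for folders to synchronize, so anything listed in the inner list is skipped.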

Then all you have to do is launch the offlineimap command as the right user, for example in a cron job:

00 20 * * * kevin offlineimap

A final warning: OfflineImap is fully bi-directional. This means local deletions propagate to the remote server. This can be quite dangerous, so be careful not to touch your local folders. If for any reason you’d like to reset your backups, stop all OfflineImap processes first, then remove its cache folder (~/.offlineimap/) before removing the local folders themselves (~/gmail-backup-account*).

Also, intensively playing with OfflineImap to adjust its configuration may trigger Gmail’s infamous “Temporary Error 500”. In this case don’t panic: it seems to be a common auto-immune response from Gmail to suspicious activity. It happened to me and in the end my account and mails were safe: I just had to wait a few hours for it to resume normal operations.

Increase OpenERP 6.0 web-client session timeout

Another week working with OpenERP means another trick learned to answer a customer’s intricate needs.

Today I was asked to keep users logged in on OpenERP 6.0’s web client. The latter being powered by CherryPy, it was a matter of adding the following configuration directive to the web client configuration file to increase the session timeout:

tools.sessions.timeout = 720

This will keep client sessions open for 12 hours (12 hours × 60 minutes = 720 minutes) before they expire. This is enough to keep employees from complaining about having to log in to OpenERP several times a day.

Problem solved!

Oh, and another way to address this issue consists of implementing some kind of Single Sign-On. And you know what? We have that in store thanks to the smile_sso module for OpenERP! :)
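
Since the value is expressed in minutes, a tiny helper can spare you the mental arithmetic when picking another duration. This is only a sketch: the function name is hypothetical and not part of CherryPy or OpenERP, it simply renders the directive shown above.

```python
def session_timeout_directive(hours):
    """Render the timeout directive for a duration given in hours.

    Hypothetical helper: it only formats the line to paste in the
    web client configuration file, converting hours to minutes.
    """
    minutes = int(hours * 60)
    return "tools.sessions.timeout = %d" % minutes

print(session_timeout_directive(12))  # tools.sessions.timeout = 720
```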

Live Browser: a Python web app using Microsoft Live Connect API

5 months ago I was called by a recruiter for a position in a startup building cloud-computing solutions. At the end of my first interview with the engineers of the company, I was asked to write a little web application to test my technical abilities.

The goal was to create a back-end talking to Microsoft’s Live Connect API and keep a cache of user profiles. Then a front-end demonstrating my HTML/CSS/JS know-how was to be built. User authentication was supposed to use OAuth.

The only technological constraint was to use Python. I decided to use CherryPy and Mako to leverage the boilerplate code I had just released back then. For the persistence layer, my first intention was to use SQLAlchemy, but I quickly switched to MongoDB: I had never played with it, and this project was a great opportunity to do so.

Although my web app was far from finished, it was still well received by the team. After further interviews I was made a competitive offer. I eventually declined, as I wanted to finish what I had started at my current company.

What’s left of this experience is Live Browser, the web app I created, whose source code is now available on GitHub.

How-to merge Mailman mailing-lists

Let’s say I have an old, inactive mailing list (whose ID is old-ml) and I want to merge its archives into another one (called active-ml).

To do so, I have to merge the two mbox files holding all mails since the creation of these mailing lists. I first tried to use cat to concatenate the two mbox files, but it didn’t work.

Luckily, I found a Python script that merges 2 mbox files while sorting all mails by date. Here is how I used it:

$ cd /var/lib/mailman/archives/private
$ wget http://mail.python.org/pipermail/mailman-users/attachments/20080322/80455064/attachment.txt --output-document=mbmerge.py
$ python ./mbmerge.py ./old-ml.mbox/old-ml.mbox ./active-ml.mbox/active-ml.mbox > ./active-ml.mbox/active-ml.mbox.new
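
For reference, the same idea can be sketched with nothing but Python’s standard mailbox module. This is not the mbmerge.py script used above, only a minimal illustration of merging mbox files sorted by their Date header. The function names are hypothetical, and messages with a missing or unparsable date are assumed to sort first:

```python
import mailbox
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def message_date(message):
    """Return the Date header as an aware datetime (Unix epoch if unusable)."""
    try:
        parsed = parsedate_to_datetime(message['Date'])
    except (TypeError, ValueError):
        # Assumption: undated messages are sorted before all the others.
        return datetime.fromtimestamp(0, tz=timezone.utc)
    if parsed.tzinfo is None:
        # Naive dates cannot be compared with aware ones, so assume UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def merge_mboxes(source_paths, destination_path):
    """Write all messages of the source mboxes, sorted by date, to a new mbox."""
    messages = []
    for path in source_paths:
        box = mailbox.mbox(path, create=False)
        messages.extend(box)  # iterating a mailbox yields its messages
        box.close()
    messages.sort(key=message_date)

    # Note: if the destination already exists, messages are appended to it.
    merged = mailbox.mbox(destination_path)
    for message in messages:
        merged.add(message)
    merged.flush()
    merged.close()
    return len(messages)
```

It would be called as merge_mboxes(['old-ml.mbox', 'active-ml.mbox'], 'active-ml.mbox.new'), with paths adapted to your own archives.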

Then I switched the current mbox with the one generated above and asked mailman to regenerate the static HTML archives:

$ cd /var/lib/mailman/archives/private/active-ml.mbox/
$ mv active-ml.mbox active-ml.mbox.backup
$ mv active-ml.mbox.new active-ml.mbox
$ chown list:list active-ml.mbox*
$ /usr/lib/mailman/bin/arch --wipe active-ml

Of course this will only merge mail archives. You still have to merge the old mailing list’s parameters (including membership) manually.

Finally, once everything looks right to you, you can safely remove your old mailing list:

$ rmlist -a old-ml
$ /var/lib/mailman/bin/genaliases

How I Open-Sourced an Internal Corporate Project (WebPing)

2 weeks ago I released WebPing. This article is more or less the same as the one I wrote 4 months ago, when I released the FTT project and needed to move it from SVN to Git. But this time I added more details on how I removed all the sensitive information that was hard-coded in the project files.

Subversion to Git migration

Everything starts from a local copy of the Subversion repository that had hosted the WebPing project since its inception:

$ rm -rf ./svn-repository-copy
$ tar xvzf ./svn-repository-copy.tar.gz
$ kill `ps -ef | grep svnserve | awk '{print $2}'`
$ svnserve --daemon --listen-port 3690 --root ./svn-repository-copy

Let’s initialize a Git repository:

$ rm -rf ./webping-git
$ mkdir ./webping-git
$ cd ./webping-git
$ git init
$ git commit --allow-empty -m 'Initial commit'
$ git tag "init"

We now migrate the code from Subversion to Git:

$ git svn init --no-metadata --username deldycke svn://localhost:3690
$ git svn fetch
$ git rebase --onto git-svn master
$ rm -rf ./.git/svn/
$ rm -rf ./.git/refs/original/
$ git reflog expire --all
$ git gc --aggressive --prune

Removing unrelated files and folders

As WebPing was not alone in the original Subversion repository, we need to clean up the latter and only keep the code of the former. Worse, WebPing didn’t start its life in a dedicated subfolder, but as a tool of another project, and jumped from folder to folder. After identifying in the history all the places where WebPing once lived, I came up with this big, convoluted command line to do the cleaning:

$ git filter-branch --force --prune-empty --tree-filter 'find ./ -not -ipath "*webping*" -and -not -path "./other-project/trunk/tools/web-ping*" -and -not -path "./other-project/trunk/tools" -and -not -path "./other-project/trunk" -and -not -path "./other-project" -and -not -path "./.git*" -and -not -path "./" | xargs rm -rf' -- --all
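
The chain of -not tests is easier to reason about when read as a "keep" rule. Here is a minimal sketch of that rule in Python, useful to check a few paths before rewriting the whole history. The function name and the sample paths are hypothetical. Like find’s -path, fnmatchcase lets * match slashes too:

```python
from fnmatch import fnmatchcase

# Paths spared as exact matches by the "-not -path" tests above.
KEEP_EXACT = {
    "./",
    "./other-project",
    "./other-project/trunk",
    "./other-project/trunk/tools",
}
# Paths spared by wildcard "-not -path" tests.
KEEP_PATTERNS = ["./other-project/trunk/tools/web-ping*", "./.git*"]


def is_kept(path):
    """Return True if the find command above would NOT delete this path."""
    # -ipath "*webping*" is case-insensitive.
    if fnmatchcase(path.lower(), "*webping*"):
        return True
    if path in KEEP_EXACT:
        return True
    return any(fnmatchcase(path, pattern) for pattern in KEEP_PATTERNS)


# Hypothetical paths, for illustration only.
for sample in ["./WebPing/trunk/setup.py",
               "./other-project/trunk/tools/web-ping.py",
               "./other-project/trunk/tools/backup.sh",
               "./other-project/trunk/src"]:
    print(sample, "kept" if is_kept(sample) else "removed")
```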

Strangely enough, my init tag went astray after the command above, so I had to rebase to bring it back in line:

$ git rebase init master

We can now remove SVN tags and branches, get rid of the imported git-svn branch, and clean up our Git repository:

$ git filter-branch --force --prune-empty --tree-filter 'find -path "./WebPing/tags*" | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find -path "./WebPing/branches*" | xargs rm -rf' -- --all
$ git branch -r -D git-svn
$ rm -rf ./.git/svn/
$ rm -rf ./.git/refs/original/
$ git reflog expire --all
$ git gc --aggressive --prune

Although the repository now only contains WebPing code, that code still jumps between the following locations throughout the history:

  • other-project/trunk/tools/web-ping.py
  • other-project/trunk/tools/web-ping/
  • WebPing/trunk/

Using a series of git filter-branch invocations, I managed to move everything to the root of the repository:

$ git filter-branch --force --prune-empty --tree-filter 'test -d ./other-project/trunk/tools && cp -axv ./other-project/trunk/tools/* ./ && rm -rf ./other-project/trunk/tools || echo "No tools folder found"' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'test -d ./other-project/trunk/tools/web-ping && cp -axv ./other-project/trunk/tools/web-ping/* ./ && rm -rf ./other-project/trunk/tools/web-ping || echo "No web-ping folder found"' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'test -d ./WebPing/trunk && cp -axv ./WebPing/trunk/* ./ && rm -rf ./WebPing/trunk || echo "No trunk folder found"' -- --all

Hide and obfuscate hard-coded content

As WebPing was created for internal needs at my previous job, its original code base contains lots of references to the infrastructure it used to live in. My professional standards require me to remove all this sensitive information before making WebPing available to the public.

For example, here is the command that allowed me to remove all references to the hostnames of our intranets:

$ git filter-branch --force --prune-empty --tree-filter 'find . -type f -exec perl -i -pe "s/([\w-.]*?)\.(company(-intranet|-extention)?)\.(fr|com|net|org)/intranet\.example\.com/g" "{}" \;' -- --all
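
Before letting such a substitution loose on a whole history, it is worth trying the expression on a few strings. Here is a minimal sketch of the same substitution with Python’s re module. The sample hostnames are hypothetical, and the character class is written [\w.-] because Python, unlike Perl, rejects [\w-.] as a bad range:

```python
import re

# Same expression as in the Perl one-liner above, with the hyphen moved
# to the end of the character class to make it valid in Python.
HOSTNAME = re.compile(
    r"([\w.-]*?)\.(company(-intranet|-extention)?)\.(fr|com|net|org)"
)


def obfuscate(text):
    """Replace every matching internal hostname by a neutral one."""
    return HOSTNAME.sub("intranet.example.com", text)


# Hypothetical sample, for illustration only.
print(obfuscate("http://wiki.company-intranet.fr/page"))
# http://intranet.example.com/page
```

Note that the expression requires at least a dot before the company name, so a bare company.com with no subdomain is left untouched.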

The Perl one-liner embedded in the command above will only apply the regular expression on a line-by-line basis. If you want to have the regexp applied on the whole content of each file, you have to use Perl’s slurp mode (source of that tip):

$ git filter-branch --force --prune-empty --tree-filter 'perl -0777 -i -pe "s/MAILING_LIST\s*=\s*\[(.*?)\]/MAILING_LIST = \[\]/gs" ./web-ping.py' -- --all

The specific example above helped me remove the content of the MAILING_LIST Python list found in web-ping.py, in order to protect from spam the email addresses of my former co-workers that were unfortunately hard-coded in that variable.

Another place to hunt for sensitive information is commit messages. These can be easily modified thanks to the --msg-filter option. Here is how I removed references to our internal Trac tickets:
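
The difference between the two modes is easy to reproduce outside of Perl. The sketch below applies the same expression with Python’s re module, first line by line and then on the whole content. The sample file content and its addresses are made up for the demonstration:

```python
import re

# re.DOTALL plays the role of the /s modifier: "." also matches newlines.
PATTERN = re.compile(r"MAILING_LIST\s*=\s*\[(.*?)\]", re.DOTALL)
REPLACEMENT = "MAILING_LIST = []"

# Hypothetical file content, for illustration only.
source = (
    'MAILING_LIST = [\n'
    '    "alice@example.com",\n'
    '    "bob@example.com",\n'
    ']\n'
    'TIMEOUT = 30\n'
)

# Line-by-line mode: no single line holds both brackets, so nothing matches.
line_by_line = "".join(
    PATTERN.sub(REPLACEMENT, line) for line in source.splitlines(keepends=True)
)

# Slurp mode: the expression sees the whole content and spans the newlines.
whole_file = PATTERN.sub(REPLACEMENT, source)

print(line_by_line == source)  # True
print(whole_file)
```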

$ git filter-branch --force --msg-filter 'sed "s/ (see ticket:666)//g"' -- --all
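
The sed expression above targets one literal ticket number. As a sketch of how the same clean-up could cover any ticket, here is a Python equivalent where the number is matched with \d+. That generalization is an assumption on my side, as is the sample commit message:

```python
import re

# Assumption: ticket references always look like " (see ticket:<number>)".
TICKET_REFERENCE = re.compile(r" \(see ticket:\d+\)")


def strip_ticket_references(message):
    """Remove Trac ticket references from a commit message."""
    return TICKET_REFERENCE.sub("", message)


# Hypothetical commit message, for illustration only.
print(strip_ticket_references("Fix timeout handling (see ticket:666)"))
# Fix timeout handling
```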

I also had to remove carriage returns introduced by careless use of Windows text editors (remember, WebPing was born in a corporate environment):

$ git filter-branch --force --prune-empty --tree-filter 'perl -i -pe "s/\r//" ./*' -- --all

The last useful command I used was the following, to fix authors’ names and emails:

$ git filter-branch --force --env-filter '
    if [ "$GIT_AUTHOR_NAME" = "deldycke" ]
      then
        export GIT_AUTHOR_NAME="Kevin Deldycke"
        export GIT_AUTHOR_EMAIL="kevin@deldycke.com"
    fi
    if [ "$GIT_AUTHOR_NAME" = "diehr" ]
      then
        export GIT_AUTHOR_NAME="Matthieu Diehr"
        export GIT_AUTHOR_EMAIL="matthieu.diehr@gmail.com"
    fi
  ' -- --all

By using a dozen variations of the commands above, and carefully reviewing the code, I was able to engineer a clean code history.

But I was certainly a little too blunt with these regular expressions: some of them also acted on binary content. As a result, I had to restore static images from their original copies.

Final steps

Now that your code is clean, all you need to do is recreate your tags and fix the init tag date before pushing everything to GitHub:

$ git tag -f "0.0" bad4ff7fc48b8b34f6f661d75c782c7fc0d098c5
$ git tag -f "0.1" 590ac9953df0e3bc76fd02615471e36a9796a065
$ git tag -f "0.2" 33f731054042b02c6d2600e7aead5bb7c4991b12
$ git filter-branch --env-filter '
      if [ $GIT_COMMIT = 361224542bc73bba747c7ca382e992e2cdd0c356 ]
      then
          export GIT_AUTHOR_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
          export GIT_COMMITTER_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
      fi' -- --all
$ git remote add origin git@github.com:kdeldycke/webping.git
$ git push -u origin master
$ git push --tags
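
The date forced on the initial commit above is simply the Unix epoch written in the RFC 2822 style. If you’d rather use another timestamp, a string in the same format can be produced with Python’s standard library, as in this minimal sketch (the variable names are mine):

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Timestamp 0 is the Unix epoch; replace it with any other timestamp.
epoch = datetime.fromtimestamp(0, tz=timezone.utc)

# Same format as the GIT_AUTHOR_DATE and GIT_COMMITTER_DATE values above.
git_date = format_datetime(epoch)
print(git_date)  # Thu, 01 Jan 1970 00:00:00 +0000
```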

CherryPy + Mako + Formish + OOOP boilerplate

After WebPing last week, here is another release of open-source code. This time it’s the boilerplate code base I created to integrate some Python components, with the goal of publishing OpenERP content on the web.

This stack is composed of:

  • CherryPy to serve web content,
  • Mako for HTML templating,
  • Formish for HTML form generation and validation,
  • OOOP to talk to an OpenERP server via web services.

This project contains the experiments I did while working at Smile, when I explored the possibility of integrating these components. This code was a proof-of-concept that we leveraged later for a highly specific OpenERP project.

Because of the highly experimental nature of this project, it contains lots of stupid and failed attempts. The whole code base should be thoroughly cleaned up before it can be considered reusable.

All that code is available in a GitHub repository, under a GPL v2 license.