Live Browser: a Python web app using the Microsoft Live Connect API

Five months ago, a recruiter called me about a position at a startup building cloud-computing solutions. At the end of my first interview with the company’s engineers, I was asked to write a small web application to test my technical abilities.

The goal was to create a back-end that talked to Microsoft’s Live Connect API and kept a cache of user profiles, then to build a front-end demonstrating my HTML/CSS/JS know-how. User authentication was to rely on OAuth.

The only technological constraint was to use Python. I decided to use CherryPy and Mako to leverage the boilerplate code I had just released back then. For the persistence layer, my first intention was to use SQLAlchemy, but I quickly switched to MongoDB, as I had never played with it and this project was a great opportunity to do so.
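The “cache of user profiles” part of the exercise is essentially a read-through cache. The sketch below is not the Live Browser code: it keeps profiles in a plain dict instead of MongoDB, and fetch stands in for a hypothetical function that calls the Live Connect API for one user.

```python
import time
from typing import Callable, Dict, Tuple


class ProfileCache:
    """In-memory stand-in for a MongoDB-backed cache of user profiles.

    `fetch` is a hypothetical callable that queries the remote API; a cached
    profile is reused until it is older than `ttl` seconds.
    """

    def __init__(self, fetch: Callable[[str], dict], ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, user_id: str) -> dict:
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: no API round-trip
        profile = self._fetch(user_id)  # miss or stale: ask the API again
        self._entries[user_id] = (now, profile)
        return profile
```

Swapping the dict for a MongoDB collection would keep the same shape: look the profile up by user ID, compare its stored timestamp with the TTL, and upsert on a miss.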

Although my web app was far from finished, it was still well received by the team. After further interviews I was made a competitive offer. I finally declined, as I wanted to finish what I had started at my current company.

What’s left of this experience is Live Browser, the web app I created, whose source code is now available on GitHub.

How to merge Mailman mailing lists

Let’s say I have an old, inactive mailing list (ID: old-ml) and I want to merge its archives into another one (called active-ml).

To do so, I have to merge the two mbox files holding all mails since these mailing lists were created. I first tried to use cat to concatenate the two mbox files, but it didn’t work.

Luckily, I found a Python script that merges two mbox files while sorting all mails by date. Here is how I used it:

$ cd /var/lib/mailman/archives/private
$ wget http://mail.python.org/pipermail/mailman-users/attachments/20080322/80455064/attachment.txt --output-document=mbmerge.py
$ python ./mbmerge.py ./old-ml.mbox/old-ml.mbox ./active-ml.mbox/active-ml.mbox > ./active-ml.mbox/active-ml.mbox.new
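For a rough idea of what such a merge involves, here is a minimal sketch built on Python’s standard mailbox module. It is not mbmerge.py itself, and the handling of mails with a missing or unparsable Date header (they simply sort first) is an assumption of this sketch.

```python
import mailbox
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Mails without a usable Date header sort before everything else.
OLDEST = datetime.min.replace(tzinfo=timezone.utc)


def message_date(msg):
    """Return the Date header as a timezone-aware datetime, or OLDEST."""
    raw = msg["Date"]
    if raw is None:
        return OLDEST
    try:
        parsed = parsedate_to_datetime(raw)
    except (TypeError, ValueError):  # unparsable header
        return OLDEST
    if parsed.tzinfo is None:  # "-0000" dates come back naive
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def merge_mboxes(sources, destination):
    """Copy every mail from the `sources` mbox files into `destination`, oldest first."""
    messages = []
    for path in sources:
        box = mailbox.mbox(path, create=False)
        messages.extend(box)  # iteration loads each mail into memory
        box.close()
    messages.sort(key=message_date)
    out = mailbox.mbox(destination)
    out.lock()
    try:
        for msg in messages:
            out.add(msg)  # mboxMessage objects keep their "From " line
        out.flush()
    finally:
        out.unlock()
        out.close()
    return len(messages)
```

Plain concatenation with cat does not reorder anything, which is one reason sorting by the Date header matters here: the regenerated archives are browsed chronologically.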

Then I swapped the current mbox with the one generated above and asked Mailman to regenerate the static HTML archives:

$ cd /var/lib/mailman/archives/private/active-ml.mbox/
$ mv active-ml.mbox active-ml.mbox.backup
$ mv active-ml.mbox.new active-ml.mbox
$ chown list:list active-ml.mbox*
$ /usr/lib/mailman/bin/arch --wipe active-ml

Of course, this only merges mail archives. You still have to merge your old mailing list’s parameters (including membership) manually.

Finally, once everything looks right to you, you can safely remove your old mailing list:

$ rmlist -a old-ml
$ /var/lib/mailman/bin/genaliases

How I Open-Sourced an Internal Corporate Project (WebPing)

Two weeks ago I released WebPing. This article is more or less the same as the one I wrote 4 months ago, when I released the FTT project and needed to move it from SVN to Git. But this time I’ve added more details on how I removed all the sensitive information that was hard-coded in the project files.

Subversion to Git migration

Everything starts from a local copy of the Subversion repository that had hosted the WebPing project since its inception:

$ rm -rf ./svn-repository-copy
$ tar xvzf ./svn-repository-copy.tar.gz
$ pkill svnserve
$ svnserve --daemon --listen-port 3690 --root ./svn-repository-copy

Let’s initialize a Git repository:

$ rm -rf ./webping-git
$ mkdir ./webping-git
$ cd ./webping-git
$ git init
$ git commit --allow-empty -m 'Initial commit'
$ git tag "init"

We now migrate the code from Subversion to Git:

$ git svn init --no-metadata --username deldycke svn://localhost:3690
$ git svn fetch
$ git rebase --onto git-svn master
$ rm -rf ./.git/svn/
$ rm -rf ./.git/refs/original/
$ git reflog expire --expire=now --all
$ git gc --aggressive --prune=now

Removing unrelated files and folders

As WebPing was not alone in the original Subversion repository, we need to clean up the latter and only keep the former’s code. Worse, WebPing didn’t start its life in a dedicated subfolder: it began as a tool of another project and jumped from folder to folder. After identifying all the places in the history where WebPing once lived, I came up with this big, convoluted command line to do the cleaning:

$ git filter-branch --force --prune-empty --tree-filter 'find ./ -not -ipath "*webping*" -and -not -path "./other-project/trunk/tools/web-ping*" -and -not -path "./other-project/trunk/tools" -and -not -path "./other-project/trunk" -and -not -path "./other-project" -and -not -path "./.git*" -and -not -path "./" | xargs rm -rf' -- --all
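The find expression is easier to reason about as a predicate. The Python sketch below mirrors it with fnmatch, whose “*” also matches “/” like find’s -path; is_deleted is a hypothetical helper for checking paths, not part of the original workflow. It also shows why the ancestor directories are listed: rm -rf on a directory takes its whole subtree, so every parent of a kept path must be kept too, while its other children are deleted one by one.

```python
from fnmatch import fnmatchcase

# Patterns copied from the find command above.
KEEP_ANY_CASE = ["*webping*"]  # the -ipath test (case-insensitive)
KEEP_EXACT = [                 # the -path tests
    "./other-project/trunk/tools/web-ping*",
    "./other-project/trunk/tools",
    "./other-project/trunk",
    "./other-project",
    "./.git*",
    "./",
]


def is_deleted(path: str) -> bool:
    """True if the find | xargs rm -rf pipeline would remove `path`."""
    if any(fnmatchcase(path.lower(), p) for p in KEEP_ANY_CASE):
        return False
    return not any(fnmatchcase(path, p) for p in KEEP_EXACT)
```

For example, ./other-project/trunk survives (it is an ancestor of the tool), while a sibling such as ./other-project/trunk/tools/backup.sh is removed.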

Strangely enough, my init tag went off track after the command above, so I had to rebase master onto it to bring it back in line:

$ git rebase init master

We can now remove SVN tags and branches, get rid of the imported git-svn branch, and clean up our Git repository:

$ git filter-branch --force --prune-empty --tree-filter 'find . -path "./WebPing/tags*" | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -path "./WebPing/branches*" | xargs rm -rf' -- --all
$ git branch -r -D git-svn
$ rm -rf ./.git/svn/
$ rm -rf ./.git/refs/original/
$ git reflog expire --expire=now --all
$ git gc --aggressive --prune=now

Although the repository now contains only WebPing code, that code still jumps through the history between the following locations:

  • other-project/trunk/tools/web-ping.py
  • other-project/trunk/tools/web-ping/
  • WebPing/trunk/

Using a series of git filter-branch invocations, I managed to move everything to the root of the repository:

$ git filter-branch --force --prune-empty --tree-filter 'test -d ./other-project/trunk/tools/web-ping && cp -axv ./other-project/trunk/tools/web-ping/* ./ && rm -rf ./other-project/trunk/tools/web-ping || echo "No web-ping folder found"' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'test -d ./other-project/trunk/tools && cp -axv ./other-project/trunk/tools/* ./ && rm -rf ./other-project/trunk/tools || echo "No tools folder found"' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'test -d ./WebPing/trunk && cp -axv ./WebPing/trunk/* ./ && rm -rf ./WebPing/trunk || echo "No trunk folder found"' -- --all
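The goal of these invocations can be written down as a path mapping. This Python sketch only models the intended result on individual paths (relocate is a hypothetical helper; it does not rewrite any history). What it makes explicit is that the nested tools/web-ping/ location has to be flattened before its parent tools/, otherwise its files would end up in a ./web-ping/ subfolder instead of at the root.

```python
# Historical locations, most specific first.
PREFIXES = [
    "other-project/trunk/tools/web-ping/",
    "other-project/trunk/tools/",
    "WebPing/trunk/",
]


def relocate(path: str) -> str:
    """Map a path from one of WebPing's old homes to the repository root."""
    for prefix in PREFIXES:
        if path.startswith(prefix):
            return path[len(prefix):]
    return path  # already at the root
```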

Hide and obfuscate hard-coded content

As WebPing was created for internal needs at my previous job, its original code base contains lots of references to the infrastructure it used to live in. My professional standards require me to remove all this sensitive information before making WebPing available to the public.

For example, here is the command that allowed me to remove all references to our intranet hostnames:

$ git filter-branch --force --prune-empty --tree-filter 'find . -type f -exec perl -i -pe "s/([\w-.]*?)\.(company(-intranet|-extention)?)\.(fr|com|net|org)/intranet\.example\.com/g" "{}" \;' -- --all
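Since a history rewrite is slow and painful to undo, it pays to try the expression on a few sample strings first. Here is the same substitution in Python; the sample hostnames in the usage lines are made up, and the hyphen is moved to the end of the character class because Python rejects [\w-.] as a bad character range (Perl tolerates it and treats the hyphen literally).

```python
import re

# Same pattern as the Perl one-liner, with "[\w-.]" rewritten as "[\w.-]".
INTERNAL_HOST = re.compile(
    r"([\w.-]*?)\.(company(-intranet|-extention)?)\.(fr|com|net|org)"
)


def anonymize_hosts(text: str) -> str:
    """Replace every matching internal hostname with intranet.example.com."""
    return INTERNAL_HOST.sub("intranet.example.com", text)


print(anonymize_hosts("http://wiki.company-intranet.fr/page"))
print(anonymize_hosts("smtp = 'mail.prod.company.net'"))
```

Note that a bare company.com (without any subdomain) is left alone, because the pattern requires a dot before the company part.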

The Perl one-liner embedded in the command above only applies the regular expression line by line. If you want the regexp applied to the whole content of each file, you have to use Perl’s slurp mode, enabled by -0777 (source of that tip):

$ git filter-branch --force --prune-empty --tree-filter 'perl -0777 -i -pe "s/MAILING_LIST\s*=\s*\[(.*?)\]/MAILING_LIST = \[\]/gs" ./web-ping.py' -- --all

The specific example above helped me remove the contents of the MAILING_LIST Python list found in web-ping.py, to protect from spam the email addresses of my former co-workers, which were unfortunately hard-coded in that variable.
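The difference between the two Perl modes is easy to reproduce in Python. In this sketch, scrub_per_line imitates plain perl -pe (the regex never sees more than one line), while scrub_whole_file imitates perl -0777 with the /s flag, whose Python counterpart is re.DOTALL. The sample MAILING_LIST content is invented for the demonstration.

```python
import re

# Python equivalent of s/MAILING_LIST\s*=\s*\[(.*?)\]/.../gs
MAILING_LIST = re.compile(r"MAILING_LIST\s*=\s*\[(.*?)\]", re.DOTALL)
EMPTY = "MAILING_LIST = []"


def scrub_per_line(source: str) -> str:
    """Like `perl -pe`: the pattern is applied to one line at a time."""
    return "".join(MAILING_LIST.sub(EMPTY, line)
                   for line in source.splitlines(keepends=True))


def scrub_whole_file(source: str) -> str:
    """Like `perl -0777 -pe`: the pattern sees the whole file at once."""
    return MAILING_LIST.sub(EMPTY, source)


source = 'MAILING_LIST = [\n    "alice@example.com",\n    "bob@example.com",\n]\nDEBUG = False\n'
```

With a list spread over several lines, scrub_per_line(source) returns the text unchanged, since no single line holds both brackets, whereas scrub_whole_file(source) empties the list.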

Another place to hunt for sensitive information is commit messages. These can easily be modified with the --msg-filter option. Here is how I removed references to our internal Trac tickets:

$ git filter-branch --force --msg-filter 'sed "s/ (see ticket:666)//g"' -- --all
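The sed expression above only matches ticket 666. A message filter is just a command that reads the original message on stdin and writes the new one on stdout, so a generalization to any ticket number could look like the following Python sketch; the module name strip_tickets.py and its functions are hypothetical.

```python
import re
import sys

# Matches " (see ticket:NNN)" for any ticket number, not only 666.
TICKET_REF = re.compile(r" \(see ticket:\d+\)")


def strip_ticket_refs(message: str) -> str:
    """Remove every Trac ticket reference from a commit message."""
    return TICKET_REF.sub("", message)


def filter_stream(src=sys.stdin, dst=sys.stdout) -> None:
    """Read a commit message from `src` and write the cleaned one to `dst`."""
    dst.write(strip_ticket_refs(src.read()))
```

Saved as strip_tickets.py in some directory (/path/to below is a placeholder), it could take the place of the sed call:

$ git filter-branch --force --msg-filter 'PYTHONPATH=/path/to python3 -c "import strip_tickets; strip_tickets.filter_stream()"' -- --all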

I also had to remove the carriage returns introduced by the abusive use of Windows text editors (remember, WebPing was born in a corporate environment):

$ git filter-branch --force --prune-empty --tree-filter 'perl -i -pe "s/\r//" ./*' -- --all
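The Perl substitution above strips every \r from every file, binary files included. A more careful variant, sketched here in Python, only converts CRLF pairs and skips any file with a NUL byte in its first 8000 bytes, a heuristic modeled on the one Git uses to detect binary content. normalize_tree is a hypothetical helper, not something shipped with WebPing.

```python
import os

SNIFF_SIZE = 8000  # leading bytes inspected, as in Git's binary check


def looks_binary(data: bytes) -> bool:
    """Treat a file as binary if a NUL byte shows up early on."""
    return b"\0" in data[:SNIFF_SIZE]


def normalize_tree(root: str) -> list:
    """Convert CRLF to LF in text files under `root`; return the changed paths."""
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # never touch Git internals
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                data = f.read()
            if looks_binary(data):
                continue  # images and other binaries are left untouched
            fixed = data.replace(b"\r\n", b"\n")
            if fixed != data:
                with open(path, "wb") as f:
                    f.write(fixed)
                changed.append(path)
    return sorted(changed)
```

Run from the tree filter on the current directory, such a script would have left static images alone.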

The last useful command I used was the following one, to fix authors’ names and emails:

$ git filter-branch --force --env-filter '
    if [ "$GIT_AUTHOR_NAME" = "deldycke" ]
      then
        export GIT_AUTHOR_NAME="Kevin Deldycke"
        export GIT_AUTHOR_EMAIL="kevin@deldycke.com"
    fi
    if [ "$GIT_AUTHOR_NAME" = "diehr" ]
      then
        export GIT_AUTHOR_NAME="Matthieu Diehr"
        export GIT_AUTHOR_EMAIL="matthieu.diehr@gmail.com"
    fi
  ' -- --all

By using a dozen variations of the commands above, and carefully reviewing the code, I was able to engineer a clean code history.

But I was certainly a little too blunt with these regular expressions: some of them altered binary content, so I had to restore static images to their original copies.

Final steps

Now that your code is clean, all that’s left is to recreate your tags, fix the date of the init commit and push everything to GitHub:

$ git tag -f "0.0" bad4ff7fc48b8b34f6f661d75c782c7fc0d098c5
$ git tag -f "0.1" 590ac9953df0e3bc76fd02615471e36a9796a065
$ git tag -f "0.2" 33f731054042b02c6d2600e7aead5bb7c4991b12
$ git filter-branch --force --tag-name-filter cat --env-filter '
      if [ "$GIT_COMMIT" = 361224542bc73bba747c7ca382e992e2cdd0c356 ]
      then
          export GIT_AUTHOR_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
          export GIT_COMMITTER_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
      fi' -- --all
$ git remote add origin git@github.com:kdeldycke/webping.git
$ git push -u origin master
$ git push --tags

CherryPy + Mako + Formish + OOOP boilerplate

After WebPing last week, here is another Open-Source release. This time it’s the boilerplate code base I created to integrate several Python components, with the goal of publishing OpenERP content on the web.

This stack is composed of:

  • CherryPy to serve web content,
  • Mako for HTML templating,
  • Formish for HTML form generation and validation,
  • OOOP to talk to OpenERP server via web services.
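OOOP hides OpenERP’s web services behind Python objects. For a rough idea of what travels over the wire underneath, here is a standard-library sketch. It assumes the OpenERP 6-era XML-RPC endpoints /xmlrpc/common (login) and /xmlrpc/object (execute); it is not OOOP’s API, and the database name, credentials and model in the usage lines are placeholders.

```python
import xmlrpc.client


def search_payload(db, uid, password, model, domain):
    """Serialize an execute(db, uid, password, model, 'search', domain) call without sending it."""
    return xmlrpc.client.dumps((db, uid, password, model, "search", domain),
                               methodname="execute")


def search(server_url, db, login, password, model, domain):
    """Log in, then run `search` on `model`; needs a reachable OpenERP server."""
    common = xmlrpc.client.ServerProxy(server_url + "/xmlrpc/common")
    uid = common.login(db, login, password)
    objects = xmlrpc.client.ServerProxy(server_url + "/xmlrpc/object")
    return objects.execute(db, uid, password, model, "search", domain)


payload = search_payload("demo", 1, "secret", "res.partner", [("name", "ilike", "smile")])
```

A library like OOOP lets you manipulate records as objects instead of assembling these positional calls by hand.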

This project contains the experiments I did while working at Smile, when I explored the possibility of integrating these components. This code was a proof-of-concept that we leveraged later for a highly specific OpenERP project.

Because of the highly experimental nature of this project, it contains lots of stupid and failed attempts. The whole code base should be thoroughly cleaned up before it can be considered reusable.

All that code is available in a GitHub repository, under a GPL v2 license.

WebPing Open-sourced!

I’ve just released WebPing under a GPL license. It’s available right now on a GitHub repository.

WebPing is a script I started to work on in 2009 while working at EDF. Back then, I needed a monitoring tool to keep an eye on the 80+ Plone instances that my team managed. For several corporate reasons, I wasn’t allowed to use a proper monitoring tool like Munin or Nagios. So I created a small script to fill this need. That’s how WebPing came to be.

WebPing is just a stupid Python script designed to be run regularly by a cron job. It tries to fetch a list of URLs and stores their response times in an SQLite database. It then creates a static HTML report you’re free to serve with any HTTP server (an example Apache configuration is provided). WebPing’s configuration and the list of URLs it monitors are stored in a YAML file.
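The fetch, store and report cycle described above can be sketched with the standard library alone. This is not WebPing’s code: the table layout, the function names and the None-for-failure convention are assumptions. The last function only illustrates the [timestamp in milliseconds, value] pairs that Flot plots on a time axis.

```python
import http.client
import json
import sqlite3
import time
import urllib.request


def ping(url, timeout=10.0):
    """Fetch `url` and return the elapsed time in seconds, or None on failure."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()  # time the full download, not just the headers
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx) and timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return None
    return time.perf_counter() - start


def record(conn, url, elapsed, when=None):
    """Store one measurement; `elapsed` is None when the URL failed."""
    conn.execute("CREATE TABLE IF NOT EXISTS pings (url TEXT, checked_at REAL, elapsed REAL)")
    conn.execute("INSERT INTO pings VALUES (?, ?, ?)",
                 (url, time.time() if when is None else when, elapsed))
    conn.commit()


def flot_series(conn, url):
    """Export successful checks as JSON [[epoch_ms, response_ms], ...] pairs."""
    rows = conn.execute(
        "SELECT checked_at, elapsed FROM pings "
        "WHERE url = ? AND elapsed IS NOT NULL ORDER BY checked_at", (url,))
    return json.dumps([[int(ts * 1000), round(el * 1000, 1)] for ts, el in rows])
```

A cron-driven run would then loop over the configured URLs, calling ping() and record() for each, before regenerating the static report from flot_series().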

The generated HTML report uses the Flot jQuery plugin to render graphs. Here is what the dashboard looks like:

Finally, WebPing can send reports and alerts by email. Here is what a mail alert looks like:

Since creating WebPing, I have found several other projects built around more or less the same idea. See Kong, which is based on Django and Twill, a web-oriented DSL. Another project I spotted after the fact was multi-mechanize. Like Kong, it’s written in Python. But I never played with either of them.

Feed Tracking Tool released under an Open-Source license

I’ve just open-sourced the Feed Tracking Tool project (aka “FTT”), my first (and only) Ruby on Rails experience.

This tool was developed within Uperto, the company I currently work for, for its internal needs. The project had an ancestor written in 2006 that was based on Pylons. It was a prototype and was barely working. Iterating over the abandoned Python code base was considered a waste of time. So in summer 2007, it was decided to rewrite this application from scratch.

As my co-worker was available and had already played with Ruby on Rails, he was tasked with creating the initial code base. I joined the project early on, as it was a great opportunity to play with the (then really trendy) Ruby on Rails framework.

In the end, FTT was essentially a test project to explore Ruby on Rails. It was never deployed on a production server and was never used.

As it had been rotting for more than 3 years and represented absolutely no business value in itself, I decided to release it under a GPLv2 license (with Uperto’s approval, of course). My intention with this open-source release is to share knowledge and code back with the community.

FTT was living in a private Subversion repository at Uperto, but we unfortunately lost it. During the last few weeks I tried to rebuild the code history from old and partial backups. I then used my Git-based reconstruction method to consolidate everything in a Git repository. The code is now available on GitHub.

I don’t plan to maintain this project, but I may reboot it in the future if I need feed-related features, or an excuse to play with Ruby on Rails again. For now, beware: the code is quite outdated and only runs on the old Rails 1.2.x. This project should be considered an ugly legacy code base, so please be indulgent while looking at FTT’s code: it was the work of inexperienced RoR developers! ;)