How to generate a PDF from Markdown

Pandoc

The first tool you can use to convert a Markdown file to PDF is Pandoc.

To install Pandoc and all its dependencies on my Ubuntu 11.04, I used the following command:

$ aptitude install pandoc nbibtex texlive-latex-base texlive-latex-recommended texlive-latex-extra preview-latex-style dvipng texlive-fonts-recommended

Then I applied the PDF transformation on the README.md file from my openerp.buildout GitHub project:

$ wget https://raw.github.com/kdeldycke/openerp.buildout/master/README.md
$ markdown2pdf README.md -o readme-pandoc.pdf

The result is good, but not perfect. For example, long lines in code blocks are not wrapped at the edge of the page:

While trying to solve this issue, I stumbled upon another tool…

Gimli

Gimli is a utility that was explicitly written with GitHub in mind.

Gimli is written in Ruby, so let’s install it the Ruby way:

$ aptitude install rubygems wkhtmltopdf
$ gem install gimli

Then we can convert our Markdown file to a PDF. The following will generate a README.pdf file in the current folder:

$ /var/lib/gems/1.8/bin/gimli -f ./README.md

The resulting PDF is really close to how GitHub renders Markdown content on its website. It also solves Pandoc’s badly styled code blocks:

Live Browser: a Python web app using the Microsoft Live Connect API

5 months ago I was called by a recruiter for a position in a startup building cloud-computing solutions. At the end of my first interview with the engineers of the company, I was asked to write a little web application to test my technical abilities.

The goal was to create a back-end talking to Microsoft’s Live Connect API and keep a cache of user profiles. Then a front-end demonstrating my HTML/CSS/JS know-how was to be built. User authentication was supposed to use OAuth.

The only technological constraint was to use Python. I decided to use CherryPy and Mako to leverage the boilerplate code I had just released back then. For the persistence layer, my first intention was to use SQLAlchemy, but I quickly switched to MongoDB: I had never played with it, and this project was a great opportunity to do so.

Although my web app was far from finished, it was still well received by the team. After other interviews I was made a competitive offer. I eventually declined, as I wanted to finish what I had started at my current company.

What’s left of this experience is Live Browser, the web app I created, whose source code is now available on GitHub.

How I Open-Sourced an Internal Corporate Project (WebPing)

Two weeks ago I released WebPing. This article is more or less the same as the one I wrote 4 months ago when I released the FTT project and needed to move it from SVN to Git. But this time I added more details on how I removed all the sensitive information that was hard-coded in the project files.

Subversion to Git migration

Everything starts from a local copy of the Subversion repository that had hosted the WebPing project since its inception:

$ rm -rf ./svn-repository-copy
$ tar xvzf ./svn-repository-copy.tar.gz
$ pkill svnserve
$ svnserve --daemon --listen-port 3690 --root ./svn-repository-copy

Let’s initialize a Git repository:

$ rm -rf ./webping-git
$ mkdir ./webping-git
$ cd ./webping-git
$ git init
$ git commit --allow-empty -m 'Initial commit'
$ git tag "init"

We now migrate the code from Subversion to Git:

$ git svn init --no-metadata --username deldycke svn://localhost:3690
$ git svn fetch
$ git rebase --onto git-svn master
$ rm -rf ./.git/svn/
$ rm -rf ./.git/refs/original/
$ git reflog expire --all
$ git gc --aggressive --prune

Removing unrelated files and folders

As WebPing was not alone in the original Subversion repository, we need to clean up the latter and keep only the code of the former. Worse, WebPing didn’t start its life in a dedicated subfolder, but as a tool of another project, and it jumped from folder to folder. After identifying in the history all the places where WebPing once lived, I came up with this big, convoluted command line to do the cleaning:

$ git filter-branch --force --prune-empty --tree-filter 'find ./ -not -ipath "*webping*" -and -not -path "./other-project/trunk/tools/web-ping*" -and -not -path "./other-project/trunk/tools" -and -not -path "./other-project/trunk" -and -not -path "./other-project" -and -not -path "./.git*" -and -not -path "./" | xargs rm -rf' -- --all
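Before letting such a filter loose on the whole history, it helps to check what the find expression actually selects. Here is a minimal sketch that runs the same predicate on a throwaway directory and prints the doomed paths instead of deleting them. The list_doomed helper and the sample file names are made up for the demonstration, and GNU find is assumed:

```shell
# Build a throwaway tree that mimics the mixed repository layout.
# All file names below are invented for the demonstration.
demo=$(mktemp -d)
mkdir -p "$demo/other-project/trunk/tools/web-ping" \
         "$demo/other-project/trunk/src" \
         "$demo/WebPing/trunk"
touch "$demo/other-project/trunk/tools/web-ping/web-ping.py" \
      "$demo/other-project/trunk/src/main.py" \
      "$demo/WebPing/trunk/README" \
      "$demo/notes.txt"

# Same predicate as the tree filter, but listing instead of deleting.
# $1: root of the tree to inspect.
list_doomed() {
    ( cd "$1" && find ./ \
        -not -ipath "*webping*" \
        -and -not -path "./other-project/trunk/tools/web-ping*" \
        -and -not -path "./other-project/trunk/tools" \
        -and -not -path "./other-project/trunk" \
        -and -not -path "./other-project" \
        -and -not -path "./.git*" \
        -and -not -path "./" | sort )
}

# Prints the paths the real filter would hand over to "rm -rf".
list_doomed "$demo"
```

In this made-up tree, only notes.txt and the src folder of the other project are listed. Everything related to WebPing, and the parent folders leading to it, is kept.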

Strangely enough, my init tag went off track after the command above, so I had to rebase to get it back in line:

$ git rebase init master

We can now remove SVN tags and branches, get rid of the imported git-svn branch, and clean up our Git repository:

$ git filter-branch --force --prune-empty --tree-filter 'find -path "./WebPing/tags*" | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find -path "./WebPing/branches*" | xargs rm -rf' -- --all
$ git branch -r -D git-svn
$ rm -rf ./.git/svn/
$ rm -rf ./.git/refs/original/
$ git reflog expire --all
$ git gc --aggressive --prune

The repository now contains only WebPing code, but throughout the history that code still jumps between the following locations:

  • other-project/trunk/tools/web-ping.py
  • other-project/trunk/tools/web-ping/
  • WebPing/trunk/

Using a series of git filter-branch invocations, I managed to move everything to the root of the repository:

$ git filter-branch --force --prune-empty --tree-filter 'test -d ./other-project/trunk/tools && cp -axv ./other-project/trunk/tools/* ./ && rm -rf ./other-project/trunk/tools || echo "No tools folder found"' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'test -d ./other-project/trunk/tools/web-ping && cp -axv ./other-project/trunk/tools/web-ping/* ./ && rm -rf ./other-project/trunk/tools/web-ping || echo "No web-ping folder found"' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'test -d ./WebPing/trunk && cp -axv ./WebPing/trunk/* ./ && rm -rf ./WebPing/trunk || echo "No trunk folder found"' -- --all
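Each of these tree filters follows the same pattern: if the folder exists in the commit being rewritten, copy its content to the root and drop it, otherwise do nothing. The sketch below isolates that pattern outside of Git. The promote_dir helper is a hypothetical name, and it copies "folder/." rather than "folder/*", a variation on the commands above that also picks up hidden files:

```shell
# Move the content of a subfolder to the top of a tree, which is what the
# tree filters do for each commit.
# $1: tree root, $2: subfolder to flatten (relative to the root).
promote_dir() {
    if [ -d "$1/$2" ]; then
        # "dir/." copies hidden files too, which a "dir/*" glob would skip.
        cp -a "$1/$2/." "$1/" && rm -rf "$1/$2"
    else
        echo "No $2 folder found"
    fi
}

# Demonstration on a throwaway directory with invented content.
demo=$(mktemp -d)
mkdir -p "$demo/WebPing/trunk/docs"
touch "$demo/WebPing/trunk/web-ping.py" "$demo/WebPing/trunk/.hidden"
promote_dir "$demo" "WebPing/trunk"
ls -A "$demo"
```

The now empty WebPing folder stays on disk in this demonstration. That is harmless inside a tree filter, since Git does not track empty directories.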

Hide and obfuscate hard-coded content

As WebPing was created for internal needs at my previous job, its original code base contains lots of references to the infrastructure it used to live in. My professional standards require me to remove all this sensitive information before making WebPing available to the public.

For example, here is the command that allowed me to remove all references to hostnames of our intranets:

$ git filter-branch --force --prune-empty --tree-filter 'find . -type f -exec perl -i -pe "s/([\w-.]*?)\.(company(-intranet|-extention)?)\.(fr|com|net|org)/intranet\.example\.com/g" "{}" \;' -- --all

The Perl one-liner embedded in the command above will only apply the regular expression on a line-by-line basis. If you want to have the regexp applied on the whole content of each file, you have to use Perl’s slurp mode (source of that tip):

$ git filter-branch --force --prune-empty --tree-filter 'perl -0777 -i -pe "s/MAILING_LIST\s*=\s*\[(.*?)\]/MAILING_LIST = \[\]/gs" ./web-ping.py' -- --all

The specific example above helped me remove the content of the MAILING_LIST Python list found in web-ping.py. The email addresses of my former co-workers were unfortunately hard-coded in that variable, and I wanted to protect them from spam.

Another place to hunt for sensitive information is commit messages. These can be easily modified thanks to the --msg-filter option. Here is how I removed references to our internal Trac tickets:

$ git filter-branch --force --msg-filter 'sed "s/ (see ticket:666)//g"' -- --all
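A message filter is simply a command that reads the commit message on its standard input and writes the new one to its standard output, so it can be tried on its own first. The command above targets one specific ticket. The sketch below is a generalization that assumes every reference looks like " (see ticket:NNN)", which may not hold for your own messages. The strip_tickets name is hypothetical:

```shell
# Read a commit message on stdin, write the cleaned message to stdout.
# Assumption: ticket references always look like " (see ticket:NNN)".
strip_tickets() {
    sed -E 's/ \(see ticket:[0-9]+\)//g'
}

# Try the filter on a sample message before rewriting any history.
printf 'Fix timeout handling (see ticket:666)\n' | strip_tickets
```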

I also had to remove carriage returns introduced by heavy use of Windows text editors (remember, WebPing was born in a corporate environment):

$ git filter-branch --force --prune-empty --tree-filter 'perl -i -pe "s/\r//" ./*' -- --all

The last useful command I used was the following, to fix authors’ names and emails:

$ git filter-branch --force --env-filter '
    if [ "$GIT_AUTHOR_NAME" = "deldycke" ]
      then
        export GIT_AUTHOR_NAME="Kevin Deldycke"
        export GIT_AUTHOR_EMAIL="kevin@deldycke.com"
    fi
    if [ "$GIT_AUTHOR_NAME" = "diehr" ]
      then
        export GIT_AUTHOR_NAME="Matthieu Diehr"
        export GIT_AUTHOR_EMAIL="matthieu.diehr@gmail.com"
    fi
  ' -- --all
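With more than a couple of authors, the chain of if blocks gets long. The same mapping can be sketched as a case statement wrapped in a function, which makes it easy to test outside of Git. The fix_author name is hypothetical, and the placeholder address in the demonstration is made up:

```shell
# Rewrite the author identity found in the environment, the way an
# --env-filter script would. Unknown logins are left untouched.
fix_author() {
    case "$GIT_AUTHOR_NAME" in
        deldycke)
            GIT_AUTHOR_NAME="Kevin Deldycke"
            GIT_AUTHOR_EMAIL="kevin@deldycke.com"
            ;;
        diehr)
            GIT_AUTHOR_NAME="Matthieu Diehr"
            GIT_AUTHOR_EMAIL="matthieu.diehr@gmail.com"
            ;;
    esac
    export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL
}

# Simulate the environment Git sets up for one commit.
GIT_AUTHOR_NAME="diehr"
GIT_AUTHOR_EMAIL="diehr@localhost"   # placeholder address
fix_author
echo "$GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>"
```

Note that, like the filter above, this only touches the author fields. The committer identity lives in the separate GIT_COMMITTER_NAME and GIT_COMMITTER_EMAIL variables.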

By using a dozen variations of the commands above, and carefully reviewing the code, I was able to engineer a clean code history.

But I was certainly a little too blunt with these regular expressions. Some of them also altered binary content, so I had to restore static images from their original copies.
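One way to avoid that kind of damage is to restrict the substitutions to files that look like text. This is not what I did back then. It is a sketch of the idea, relying on GNU grep, xargs and sed, with a hypothetical scrub_text_files helper and invented sample files:

```shell
# Apply a sed expression only to files that grep considers text.
# "-I" makes grep skip binary files, and the empty pattern matches any
# line, so "-l" lists every non-empty text file. "-Z" and "-0" keep odd
# file names intact.
# $1: directory, $2: sed expression.
scrub_text_files() {
    grep -rIlZ '' "$1" | xargs -0 -r sed -i "$2"
}

# Demonstration: one text file and one fake binary file.
demo=$(mktemp -d)
printf 'host = server.company.com\n' > "$demo/settings.cfg"
printf 'server.company.com\000\001\002' > "$demo/logo.png"
scrub_text_files "$demo" 's/server\.company\.com/intranet.example.com/g'
cat "$demo/settings.cfg"
```

The fake image contains the same hostname, but it is left byte for byte as it was because grep classifies it as binary.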

Final steps

Now that your code is clean, all you need to do is recreate your tags and fix the init tag date before pushing everything to GitHub:

$ git tag -f "0.0" bad4ff7fc48b8b34f6f661d75c782c7fc0d098c5
$ git tag -f "0.1" 590ac9953df0e3bc76fd02615471e36a9796a065
$ git tag -f "0.2" 33f731054042b02c6d2600e7aead5bb7c4991b12
$ git filter-branch --env-filter '
      if [ $GIT_COMMIT = 361224542bc73bba747c7ca382e992e2cdd0c356 ]
      then
          export GIT_AUTHOR_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
          export GIT_COMMITTER_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
      fi' -- --all
$ git remote add origin git@github.com:kdeldycke/webping.git
$ git push -u origin master
$ git push --tags

CherryPy + Mako + Formish + OOOP boilerplate

After WebPing last week, here is another release of open-source code. This time it’s the boilerplate codebase I created to integrate some Python components, with the goal of publishing OpenERP content on the web.

This stack is composed of:

  • CherryPy to serve web content,
  • Mako for HTML templating,
  • Formish for HTML form generation and validation,
  • OOOP to talk to an OpenERP server via web services.

This project contains the experiments I did while working at Smile, when I explored the possibility of integrating these components. This code was a proof-of-concept that we leveraged later for a highly specific OpenERP project.

Because of the highly experimental nature of this project, it contains lots of stupid and failed attempts. The whole code base should be thoroughly cleaned up before it can be considered reusable.

All that code is available in a GitHub repository, under a GPL v2 license.

WebPing Open-sourced!

I’ve just released WebPing under a GPL license. It’s available right now on a GitHub repository.

WebPing is a script I started to work on in 2009 while working at EDF. Back then, I needed a monitoring tool to keep an eye on the 80+ Plone instances that my team managed. For several corporate reasons, I wasn’t allowed to use a proper monitoring tool like Munin or Nagios. So I created a small script to fill this need. That’s how WebPing came to be.

WebPing is just a stupid Python script designed to be triggered regularly by a cron job. It tries to fetch a list of URLs and stores the response times in an SQLite database. Then it creates a static HTML report you’re free to serve with any HTTP server (an example Apache configuration is provided). The configuration of WebPing and the list of URLs it monitors are stored in a YAML file.

The generated HTML report uses the Flot jQuery plugin to render graphs. Here is what the dashboard looks like:

Finally, WebPing is able to send reports and alerts by email. Here is what a mail alert looks like:

Since I created WebPing, I have found several other projects more or less built around the same idea. See Kong, which is based on Django and Twill, a web-oriented DSL. Another project I spotted after the fact is multi-mechanize. Like Kong, it’s written in Python. I have never tried either of them, though.

FTT Migration from Subversion to Git

Last month I released the Feed Tracking Tool project (aka FTT) on GitHub. I had reconstructed the code history from old tarballs. In the meantime, my friend at Uperto managed to recover the original Subversion repository from very old backups. Here is how I migrated the old SVN repository to GitHub.

First, I started a local Subversion server with the repository my co-worker gave me:

$ tar xvzf ./ftt-svn.tar.gz
$ sed -i 's/# password-db = passwd/password-db = passwd/' ./ftt-svn/conf/svnserve.conf
$ echo "kevin = kevin" >> ./ftt-svn/conf/passwd
$ pkill svnserve
$ svnserve --daemon --listen-port 3690 --root ./ftt-svn

Then I created a local Git repository, using my initialization routine:

$ rm -rf ./ftt-git
$ mkdir ./ftt-git
$ cd ./ftt-git
$ git init
$ git commit --allow-empty -m 'Initial commit'
$ git tag "init"

The next step consists of importing the Subversion repository into Git:

$ git svn init --no-metadata --username kevin svn://localhost:3690
$ git svn fetch

Here I rebased the imported git-svn branch to the main branch:

$ git rebase --onto git-svn master
$ git rebase init master

At that point I no longer needed the remote git-svn branch, so I removed it:

$ git branch -r -D git-svn

To clean things up, let’s remove all SVN metadata and local commit backups:

$ rm -rf ./.git/svn/
$ rm -rf ./.git/refs/original/
$ git reflog expire --all
$ git gc --aggressive --prune

We can now proceed to alter the code history. In FTT we never created branches. I also plan to recreate tags by hand later. So I decided to remove all the tags and branches folders coming from Subversion:

$ git filter-branch --force --prune-empty --tree-filter 'rm -rf ./tags*'     -- --all
$ git filter-branch --force --prune-empty --tree-filter 'rm -rf ./branches*' -- --all

Now let’s move the trunk directory to the base of the repository. I didn’t use the --subdirectory-filter parameter as FTT started its life without a proper “branches/tags/trunk” SVN structure:

$ git filter-branch --force --prune-empty --tree-filter 'test -d ./trunk && cp -axv ./trunk/* ./ && rm -rf ./trunk || echo "No trunk folder found"' -- --all

Next is the Git command I used to fix commit authorship:

$ git filter-branch --force --env-filter '
    if [ "$GIT_AUTHOR_NAME" = "kdeldycke" ]
      then
        export GIT_AUTHOR_NAME="Kevin Deldycke"
        export GIT_AUTHOR_EMAIL="kevin@deldycke.com"
    fi
    if [ "$GIT_AUTHOR_NAME" = "qdesert" ]
      then
        export GIT_AUTHOR_NAME="Quentin Desert"
        export GIT_AUTHOR_EMAIL="quentin.desert@uperto.com"
    fi
    ' -- --all

While exploring my own backups of the FTT project, I stumbled upon a preliminary HTML mockup of the app. I decided to include it in the final repository, as the first commit, just after my init tag. Here is how I did this, assuming the mockup sources were available in the ../mockup directory:

$ git branch mockup-injection init
$ git checkout mockup-injection
$ cp -axv ../mockup .
$ git add --all
$ git commit --all --date="2007-07-17 15:49" --author="Quentin Desert <quentin.desert@uperto.com>" -m "Commit the oldest mockup I can find."
$ git rebase --onto mockup-injection init master
$ git branch -D mockup-injection

The procedure above comes from my “Commit history reconstruction” article.

Now I can tag by hand all FTT releases.

$ git tag -f "0.4.1"  5f5cc2a36743f2c8d2088669e475ef09d8cec029
$ git tag -f "0.5"    54a76e143f9f2efdec88d3181cbcfbfddda5f725
$ git tag -f "0.6"    934447f185330903c389364bed94e994f6b280e6
$ git tag -f "0.7"    ef87ab3287ba23655781565fd622345c942d9c49
$ git tag -f "0.8"    cdcf2f459826019bbbc5874d6632392b07ea889b
$ git tag -f "0.8.1"  f47a3f219eb918069efe701d082928cdb953f05f
$ git tag -f "0.8.2"  2542754dd088d359ce96db8511e0a15588eb50ce
$ git tag -f "0.8.3"  ea9455c0ed75cf504c1cc872d5e5946b578ae702
$ git tag -f "0.9.0"  57a39879b3bcc61bd9560d7ac4e71cbfd0af22df
$ git tag -f "0.9.1"  e483fd1a287fa86a8b12d088b78a319b0990e6ef
$ git tag -f "0.10.0" ed77af77506836892be78044ae4ef15d07f18583

FTT was always developed as an internal app. As such, the code and its history still contained lots of sensitive information. I thoroughly audited the code to identify the kind of data that we should absolutely not disclose to the outside world.

At the end of this code review, I had found only references to our internal architecture (server names and IP addresses), and some usernames and passwords. There were also some logs and temporary files. I cleaned them all with the following set of Git commands:

$ git filter-branch --force --prune-empty --tree-filter 'find . -iname ".svn"        | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -iname "*.log"       | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -iname "*~"          | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -iname "*.pid"       | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -iname "*.ppid"      | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -iname "ruby_sess.*" | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -type f -exec sed -i "s/password: 1234567/password: *******/g"   "{}" \;' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -type f -exec sed -i "s/smtp\.server12\.com/smtp\.uperto\.com/g" "{}" \;' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -type f -exec sed -i "s/192\.168\.0\.2/12\.34\.56\.78/g"         "{}" \;' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -type f -exec sed -i "s/user qdesert/user *******/g"             "{}" \;' -- --all
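Each filter-branch invocation rewrites the whole history, so grouping the deletions into one tree filter can save a lot of time. Here is a sketch of a single find call covering the same file patterns. It is a variation on the commands above, not what I actually ran. It assumes GNU find and xargs, and the purge_leftovers helper and sample files are made up:

```shell
# Delete several kinds of leftovers in one traversal.
# -prune avoids descending into folders that are about to be removed,
# and -print0 / -0 keep file names with spaces or newlines intact.
# $1: root of the tree to clean.
purge_leftovers() {
    find "$1" \( -iname ".svn" -o -iname "*.log" -o -iname "*~" \
                 -o -iname "*.pid" -o -iname "*.ppid" \
                 -o -iname "ruby_sess.*" \) -prune -print0 \
        | xargs -0 -r rm -rf
}

# Demonstration on a throwaway directory with invented content.
demo=$(mktemp -d)
mkdir -p "$demo/app/.svn" "$demo/log dir"
touch "$demo/app/.svn/entries" "$demo/log dir/production.log" \
      "$demo/app/main.rb" "$demo/app/main.rb~" "$demo/server.pid"
purge_leftovers "$demo"
find "$demo" -type f
```

In this demonstration only app/main.rb survives, including the cleanup inside a folder whose name contains a space.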

After all these modifications, I was pretty sure my code was ready to be published. But better safe than sorry: I spent a couple of minutes on a second deep code review to check that I hadn’t missed anything. And to push the reviewing process even further, I’m offering a beer at the local bar to anyone who finds sensitive information in FTT’s code base! :)

The last thing I did was to delete FTT’s old GitHub repository and recreate it. Then I fixed the date of my first commit, cleaned up Git’s local backups and pushed my carefully crafted repository to its new home on GitHub:
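Part of such a review can be automated with a recursive grep over the checked-out files. The sketch below is only a starting point: the patterns are examples derived from the substitutions above, the audit_tree name is hypothetical, GNU grep is assumed, and only the working tree is inspected, not older commits:

```shell
# List lines that still look sensitive in the files under a directory.
# -I skips binary files, -n shows line numbers, and the .git folder is
# left out. Returns a non-zero status when nothing is found.
audit_tree() {
    grep -rInE --exclude-dir=.git \
        -e 'password: *[^*[:space:]]' \
        -e '192\.168\.[0-9]+\.[0-9]+' \
        -e 'server12' "$1"
}

# Demonstration: one clean file and one with a leftover IP address.
demo=$(mktemp -d)
printf 'user: admin\npassword: *******\n' > "$demo/clean.yml"
printf 'host: 192.168.0.2\n' > "$demo/leftover.yml"
audit_tree "$demo" || echo "Nothing suspicious found"
```

A masked password passes the check, while the private IP address in the second file is reported with its file name and line number.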

$ export GIT_TMP_INIT_HASH=`git show-ref init | cut -d ' ' -f 1`
$ git filter-branch --env-filter '
    if [ $GIT_COMMIT = $GIT_TMP_INIT_HASH ]
      then
        export GIT_AUTHOR_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
        export GIT_COMMITTER_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
    fi' -- --all
$ unset GIT_TMP_INIT_HASH
$ rm -rf ./.git/refs/original/
$ git reflog expire --all
$ git gc --aggressive --prune
$ git remote add origin git@github.com:kdeldycke/feed-tracking-tool.git
$ git push origin master --force --tags