How I Open-Sourced an Internal Corporate Project (WebPing)

Two weeks ago I released WebPing. This article is more or less the same as the one I wrote 4 months ago, when I released the FTT project and needed to move it from SVN to Git. This time I added more details on how I removed all the sensitive information that was hard-coded in the project files.

Subversion to Git migration

Everything starts from a local copy of the Subversion repository that has hosted the WebPing project since its inception:

$ rm -rf ./svn-repository-copy
$ tar xvzf ./svn-repository-copy.tar.gz
$ pkill svnserve
$ svnserve --daemon --listen-port 3690 --root ./svn-repository-copy

Let’s initialize a Git repository:

$ rm -rf ./webping-git
$ mkdir ./webping-git
$ cd ./webping-git
$ git init
$ git commit --allow-empty -m 'Initial commit'
$ git tag "init"

We now migrate the code from Subversion to Git:

$ git svn init --no-metadata --username deldycke svn://localhost:3690
$ git svn fetch
$ git rebase --onto git-svn master
$ rm -rf ./.git/svn/
$ rm -rf ./.git/refs/original/
$ git reflog expire --all
$ git gc --aggressive --prune

Removing unrelated files and folders

As WebPing was not alone in the original Subversion repository, we need to clean up the latter and only keep the code of the former. Worse, WebPing didn’t start its life in a dedicated subfolder, but as a tool of another project, and jumped from folder to folder. After identifying in the history all the places where WebPing once lived, I came up with this big, convoluted command line to do the cleaning:

$ git filter-branch --force --prune-empty --tree-filter 'find ./ -not -ipath "*webping*" -and -not -path "./other-project/trunk/tools/web-ping*" -and -not -path "./other-project/trunk/tools" -and -not -path "./other-project/trunk" -and -not -path "./other-project" -and -not -path "./.git*" -and -not -path "./" | xargs rm -rf' -- --all

Strangely enough, my init tag went off after the command above, so I had to rebase to bring it back in line:

$ git rebase init master

We can now remove SVN tags and branches, get rid of the imported git-svn branch, and clean up our Git repository:

$ git filter-branch --force --prune-empty --tree-filter 'find -path "./WebPing/tags*" | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find -path "./WebPing/branches*" | xargs rm -rf' -- --all
$ git branch -r -D git-svn
$ rm -rf ./.git/svn/
$ rm -rf ./.git/refs/original/
$ git reflog expire --all
$ git gc --aggressive --prune

Although the repository now only contains WebPing code, that code still jumps between the following locations throughout the history:

  • other-project/trunk/tools/web-ping.py
  • other-project/trunk/tools/web-ping/
  • WebPing/trunk/

Using a series of git filter-branch invocations, I managed to move everything to the root of the repository:

$ git filter-branch --force --prune-empty --tree-filter 'test -d ./other-project/trunk/tools && cp -axv ./other-project/trunk/tools/* ./ && rm -rf ./other-project/trunk/tools || echo "No tools folder found"' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'test -d ./other-project/trunk/tools/web-ping && cp -axv ./other-project/trunk/tools/web-ping/* ./ && rm -rf ./other-project/trunk/tools/web-ping || echo "No web-ping folder found"' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'test -d ./WebPing/trunk && cp -axv ./WebPing/trunk/* ./ && rm -rf ./WebPing/trunk || echo "No trunk folder found"' -- --all

Hide and obfuscate hard-coded content

As WebPing was created for internal needs at my previous job, its original code base contains lots of references to the infrastructure it used to live in. My professional standards require me to remove all this sensitive information before making WebPing available to the public.

For example, here is the command that allowed me to remove all references to the hostnames of our intranets:

$ git filter-branch --force --prune-empty --tree-filter 'find . -type f -exec perl -i -pe "s/([\w-.]*?)\.(company(-intranet|-extention)?)\.(fr|com|net|org)/intranet\.example\.com/g" "{}" \;' -- --all

The Perl one-liner embedded in the command above only applies the regular expression line by line. If you want the regexp applied to the whole content of each file, you have to use Perl’s slurp mode (source of that tip):

$ git filter-branch --force --prune-empty --tree-filter 'perl -0777 -i -pe "s/MAILING_LIST\s*=\s*\[(.*?)\]/MAILING_LIST = \[\]/gs" ./web-ping.py' -- --all

The specific example above helped me remove the content of the MAILING_LIST Python list found in web-ping.py, to protect from spam the email addresses of my former co-workers, which were unfortunately hard-coded in that variable.

Another place to hunt for sensitive information is commit messages. These can be easily modified thanks to the --msg-filter option. Here is how I removed references to our internal Trac tickets:

$ git filter-branch --force --msg-filter 'sed "s/ (see ticket:666)//g"' -- --all
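
The command above only targets one ticket number. Here is a minimal sketch of a more general filter, assuming every reference follows the same " (see ticket:N)" pattern. The strip_ticket_refs name is hypothetical, and sed -E assumes a GNU or BSD sed:

```shell
# Hypothetical helper: remove " (see ticket:N)" for any numeric N.
# Reads a commit message on stdin and writes the cleaned message on stdout.
strip_ticket_refs() {
    sed -E 's/ \(see ticket:[0-9]+\)//g'
}
```

The same expression should be usable directly as a message filter, for instance with --msg-filter 'sed -E "s/ \(see ticket:[0-9]+\)//g"'.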

I also had to remove the carriage returns introduced by Windows text editors (remember, WebPing was born in a corporate environment):

$ git filter-branch --force --prune-empty --tree-filter 'perl -i -pe "s/\r//" ./*' -- --all

The last useful command I used was the following, to fix the authors’ names and emails:

$ git filter-branch --force --env-filter '
    if [ "$GIT_AUTHOR_NAME" = "deldycke" ]
      then
        export GIT_AUTHOR_NAME="Kevin Deldycke"
        export GIT_AUTHOR_EMAIL="kevin@deldycke.com"
    fi
    if [ "$GIT_AUTHOR_NAME" = "diehr" ]
      then
        export GIT_AUTHOR_NAME="Matthieu Diehr"
        export GIT_AUTHOR_EMAIL="matthieu.diehr@gmail.com"
    fi
  ' -- --all

By using a dozen variations of the commands above, and carefully reviewing the code, I was able to engineer a clean code history.

But I certainly have been a little too blunt with these regular expressions. Some of them were able to act on binary content. As a result, I had to restore static images to their original copy.
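
One way to avoid that damage is to restrict the substitution to text files. The following is only a sketch, not what I ran at the time. It assumes a grep that supports -I (GNU and BSD do), GNU sed for -i, and file names without newlines. The sed_text_files_only name is hypothetical:

```shell
# Hypothetical helper: apply a sed script only to files that grep sees as text.
# "grep -I" treats binary files as non-matching and "-l" prints the names of
# files with at least one match; the empty pattern matches any line.
sed_text_files_only() {
    script="$1"
    dir="${2:-.}"
    find "$dir" -type f -not -path "*/.git/*" -exec grep -Il "" {} + |
    while IFS= read -r file; do
        sed -i "$script" "$file"
    done
}
```

Called as sed_text_files_only 's/company/example/g' from a tree filter, it should leave images and other binary files untouched.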

Final steps

Now that your code is clean, all you need to do is recreate your tags and fix the init tag date before pushing everything to GitHub:

$ git tag -f "0.0" bad4ff7fc48b8b34f6f661d75c782c7fc0d098c5
$ git tag -f "0.1" 590ac9953df0e3bc76fd02615471e36a9796a065
$ git tag -f "0.2" 33f731054042b02c6d2600e7aead5bb7c4991b12
$ git filter-branch --env-filter '
      if [ $GIT_COMMIT = 361224542bc73bba747c7ca382e992e2cdd0c356 ]
      then
          export GIT_AUTHOR_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
          export GIT_COMMITTER_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
      fi' -- --all
$ git remote add origin git@github.com:kdeldycke/webping.git
$ git push -u origin master
$ git push --tags

FTT Migration from Subversion to Git

Last month I released the Feed Tracking Tool project (aka FTT) on GitHub. I reconstructed the code history from old tarballs. In the meantime, my friend at Uperto managed to recover the original Subversion repository from very old backups. Here is how I migrated the old SVN repository to GitHub.

First, I started a local Subversion server with the repository my co-worker gave me:

$ tar xvzf ./ftt-svn.tar.gz
$ sed -i 's/# password-db = passwd/password-db = passwd/' ./ftt-svn/conf/svnserve.conf
$ echo "kevin = kevin" >> ./ftt-svn/conf/passwd
$ pkill svnserve
$ svnserve --daemon --listen-port 3690 --root ./ftt-svn

Then I created a local Git repository, using my initialization routine:

$ rm -rf ./ftt-git
$ mkdir ./ftt-git
$ cd ./ftt-git
$ git init
$ git commit --allow-empty -m 'Initial commit'
$ git tag "init"

The next step consists of importing the Subversion repository into Git:

$ git svn init --no-metadata --username kevin svn://localhost:3690
$ git svn fetch

Here I rebased the imported git-svn branch to the main branch:

$ git rebase --onto git-svn master
$ git rebase init master

At that point I no longer needed the remote git-svn branch, so I removed it:

$ git branch -r -D git-svn

To clean things up, let’s remove all SVN metadata and local commit backups:

$ rm -rf ./.git/svn/
$ rm -rf ./.git/refs/original/
$ git reflog expire --all
$ git gc --aggressive --prune

We can now proceed to alter the code history. In FTT we never created branches. I also plan to recreate tags by hand later. So I decided to remove all the tags and branches folders coming from Subversion:

$ git filter-branch --force --prune-empty --tree-filter 'rm -rf ./tags*'     -- --all
$ git filter-branch --force --prune-empty --tree-filter 'rm -rf ./branches*' -- --all

Now let’s move the trunk directory to the base of the repository. I didn’t use the --subdirectory-filter parameter as FTT started its life without a proper “branches/tags/trunk” SVN structure:

$ git filter-branch --force --prune-empty --tree-filter 'test -d ./trunk && cp -axv ./trunk/* ./ && rm -rf ./trunk || echo "No trunk folder found"' -- --all
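
Note that the ./trunk/* glob skips hidden files such as .htaccess. Here is a sketch of a variant that also moves them, written as a function so it can be tried on a scratch directory first. The flatten_trunk name is hypothetical:

```shell
# Hypothetical helper: move the content of ./trunk (hidden files included)
# into the current directory. "cp -a ./trunk/. ./" copies the content of the
# directory rather than the directory itself, so dotfiles are not skipped.
flatten_trunk() {
    if [ -d ./trunk ]; then
        cp -a ./trunk/. ./ && rm -rf ./trunk
    else
        echo "No trunk folder found"
    fi
}
```

The body of the function can replace the tree filter command above if hidden files matter in your project.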

Next is the Git command I used to fix commit authorship:

$ git filter-branch --force --env-filter '
    if [ "$GIT_AUTHOR_NAME" = "kdeldycke" ]
      then
        export GIT_AUTHOR_NAME="Kevin Deldycke"
        export GIT_AUTHOR_EMAIL="kevin@deldycke.com"
    fi
    if [ "$GIT_AUTHOR_NAME" = "qdesert" ]
      then
        export GIT_AUTHOR_NAME="Quentin Desert"
        export GIT_AUTHOR_EMAIL="quentin.desert@uperto.com"
    fi
    ' -- --all

While exploring my own backups of the FTT project, I stumbled upon a preliminary HTML mockup of the app. I decided to include it in the final repository, as the first commit, just after my init tag. Here is how I did this, assuming the mockup sources were available in the ../mockup directory:

$ git branch mockup-injection init
$ git checkout mockup-injection
$ cp -axv ../mockup .
$ git add --all
$ git commit --all --date="2007-07-17 15:49" --author="Quentin Desert <quentin.desert@uperto.com>" -m "Commit the oldest mockup I can find."
$ git rebase --onto mockup-injection init master
$ git branch -D mockup-injection

The procedure above comes from my “Commit history reconstruction” article.

Now I can tag by hand all FTT releases.

$ git tag -f "0.4.1"  5f5cc2a36743f2c8d2088669e475ef09d8cec029
$ git tag -f "0.5"    54a76e143f9f2efdec88d3181cbcfbfddda5f725
$ git tag -f "0.6"    934447f185330903c389364bed94e994f6b280e6
$ git tag -f '0.7'    ef87ab3287ba23655781565fd622345c942d9c49
$ git tag -f "0.8"    cdcf2f459826019bbbc5874d6632392b07ea889b
$ git tag -f "0.8.1"  f47a3f219eb918069efe701d082928cdb953f05f
$ git tag -f "0.8.2"  2542754dd088d359ce96db8511e0a15588eb50ce
$ git tag -f "0.8.3"  ea9455c0ed75cf504c1cc872d5e5946b578ae702
$ git tag -f "0.9.0"  57a39879b3bcc61bd9560d7ac4e71cbfd0af22df
$ git tag -f "0.9.1"  e483fd1a287fa86a8b12d088b78a319b0990e6ef
$ git tag -f "0.10.0" ed77af77506836892be78044ae4ef15d07f18583

FTT was always developed as an internal app. As such, the code and its history still contained lots of sensitive information. I thoroughly audited the code to identify the kind of data that we should absolutely not disclose to the outside world.

At the end of this code review, I had only found references to our internal architecture (server names and IP addresses), and some usernames and passwords. There were also some logs and temporary files. I cleaned them all with the following set of Git commands:

$ git filter-branch --force --prune-empty --tree-filter 'find . -iname ".svn"        | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -iname "*.log"       | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -iname "*~"          | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -iname "*.pid"       | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -iname "*.ppid"      | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -iname "ruby_sess.*" | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -type f -exec sed -i "s/password: 1234567/password: *******/g"   "{}" \;' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -type f -exec sed -i "s/smtp\.server12\.com/smtp\.uperto\.com/g" "{}" \;' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -type f -exec sed -i "s/192\.168\.0\.2/12\.34\.56\.78/g"         "{}" \;' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -type f -exec sed -i "s/user qdesert/user *******/g"             "{}" \;' -- --all
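
The six deletion passes above could probably be merged into a single one. Here is a sketch of such a cleanup as a standalone function, which I have not run against the real FTT history. The purge_junk name is hypothetical:

```shell
# Hypothetical helper: delete VCS leftovers, logs and temporary files under a
# directory in one pass. "-prune" stops find from descending into a match
# (needed when the match is a directory such as .svn), and "-exec ... {} +"
# hands names to rm directly, so spaces in file names are harmless.
purge_junk() {
    find "${1:-.}" \( -iname ".svn" -o -iname "*.log" -o -iname "*~" \
        -o -iname "*.pid" -o -iname "*.ppid" -o -iname "ruby_sess.*" \) \
        -prune -exec rm -rf {} +
}
```

Running one filter-branch pass instead of six should also save a lot of time on a long history, as each pass checks out every commit.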

After all these modifications, I was pretty sure my code was ready to be published. But better safe than sorry: I spent a couple of minutes doing a second deep code review to check that I didn’t miss anything. And to push the reviewing process even further, I offer a beer at the local bar to anyone finding sensitive information in FTT’s code base ! :)

The last thing I did was to delete the old FTT GitHub repository and recreate it. Then I fixed my first commit date, cleaned Git’s local backups and pushed my carefully crafted repository to its new GitHub home:
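
Part of such a review can be automated. Here is a rough sketch of a scanner for a checked-out tree. The pattern is only illustrative (private IPv4 ranges and password keys) and would need adapting to your own secrets. The audit_tree name is hypothetical, and --exclude-dir assumes GNU or BSD grep:

```shell
# Hypothetical helper: list lines that still look sensitive in a directory.
# Prints "file:line:content" for each hit and returns 1 when there are hits.
audit_tree() {
    pattern='(192\.168|10\.[0-9]+)\.[0-9]+\.[0-9]+|password[[:space:]]*[:=]'
    if grep -rnIiE --exclude-dir=.git "$pattern" "${1:-.}"; then
        return 1
    fi
    return 0
}
```

This only looks at the current working tree. To search older revisions too, something like git grep -nE PATTERN $(git rev-list --all) should do the job.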

$ export GIT_TMP_INIT_HASH=`git show-ref init | cut -d ' ' -f 1`
$ git filter-branch --env-filter '
    if [ $GIT_COMMIT = $GIT_TMP_INIT_HASH ]
      then
        export GIT_AUTHOR_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
        export GIT_COMMITTER_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
    fi' -- --all
$ unset GIT_TMP_INIT_HASH
$ rm -rf ./.git/refs/original/
$ git reflog expire --all
$ git gc --aggressive --prune
$ git remote add origin git@github.com:kdeldycke/feed-tracking-tool.git
$ git push origin master --force --tags

Pushing Git to Subversion: the case of WordPress plugin repository

Some weeks ago I moved my e107 Importer project from a big fat Git repository to its own.

Then I wanted my plugin to be available on WordPress.org. In fact, the plugin list there is only the visible tip of the WordPress plugin hosting solution. It means that if you want your plugin listed there, you have to push your code to WordPress’ big Subversion repository. And that’s when I realized I had to sync my Git repository to Subversion…

This article details how I managed to push to Subversion all the development activity taking place in Git.

Before going further: be careful ! It’s really easy to mess things up. After all, we’re trying to push code to a public Subversion repository. We must be certain of what we are doing here. The risk of deleting things that are not ours is high.

The simulation

To prevent any big mistake, we’ll test our commands on a local subversion repository.

Let’s create one:

$ rm -rf svn-repo
$ svnadmin create ./svn-repo

Now we’ll launch a local Subversion server with a minimal config:

$ sed -i 's/# password-db = passwd/password-db = passwd/' ./svn-repo/conf/svnserve.conf
$ echo "kevin = kevin" >> ./svn-repo/conf/passwd
$ pkill svnserve
$ svnserve --daemon --listen-port 3690 --root ./svn-repo

To test our server, let’s checkout a local working copy from it:

$ rm -rf svn-working-copy
$ svn co svn://localhost:3690 svn-working-copy
$ cd svn-working-copy/
$ svn info

To simulate an already active Subversion repository, we’ll make a first commit with a structure mimicking WordPress’ plugin repository:

$ mkdir -p e107-importer/{trunk,branches,tags}
$ svn add *
$ svn commit -m "Create a WordPress-like repository structure" --username kevin
$ svn up
$ svn info

Now that we have a place to hack, we can experiment on Git side. We start with a copy of my plugin repository:

$ cd ..
$ rm -rf e107-importer
$ git clone git@github.com:kdeldycke/e107-importer.git
$ cd e107-importer

Thanks to git-svn, we can attach a remote Subversion repository:

$ git svn init --trunk=e107-importer/trunk --branches=e107-importer/branches --tags=e107-importer/tags --username kevin  svn://localhost:3690

Let’s get a copy of Subversion’s content:

$ git svn fetch
r1 = d969aa9a11684a1cd2ba0b3eab0a3ee72a62af51 (refs/remotes/trunk)

Now we will rebase our whole Git tree to Subversion’s trunk:

$ git rebase trunk

According to gitg, the result of this is 2 parallel trees:

  • the first is the untouched original tree;
  • the other starts on the trunk branch and continues with a copy of the original tree, and is the result of the rebase.

But the latter has a problem: my initial commit and all my tags are squashed. I tried several methods to rebase my whole Git tree onto the local trunk branch while keeping these. But I failed.

I resigned myself and passed over this. After all, the initial commit played its role, by taking care of this corner-case.

As for the tags, I just re-added them by hand. I forced their creation, as Git keeps them attached to the original parallel tree:

$ git tag -f "e107-importer-0.1" 728ec8689d13350bbfc1f2d9dc17dda2b8a8fdbf
$ git tag -f "e107-importer-0.2" 8049b92265a41f594e97020bae6f3aa74b6a7fb1
$ git tag -f "e107-importer-0.3" 9505aa0656ba61f39cd6cb91c76c1e7279c68473
$ git tag -f "e107-importer-0.4" 0da2d61239c9a9549d197737518705912fd4982d
$ git tag -f "e107-importer-0.5" 561d35b5d1b4d2c35e13c76a3f2a41689c96e991
$ git tag -f "e107-importer-0.6" c6de1a2bf60cad054c5420eab2f30f190092fb68
$ git tag -f "e107-importer-0.7" 6ad4d4a67e8b84da31565383e5eed6ceb5b7d2b2
$ git tag -f "e107-importer-0.8" 47b8efdc82132027b139a2f214f119cee1e9c06c
$ git tag -f "e107-importer-0.9" a82f5d0814db7cf6ac7a1ac171b30c300e1a91d4

Now we are ready to push the code to the remote Subversion repository:

$ git svn dcommit

Things seem to have worked: if you go back to your local copy of the simulated remote SVN, you’ll get all your code base and its history:

$ cd ..
$ cd svn-working-copy
$ svn up
$ svn log

Although commit order is preserved, dates are not: unlike Git, Subversion only tracks the commit date, not the author’s date. This is sad but expected.

But here I was hoping that git-svn was smart enough to create tags automatically. It wasn’t, and my tags folder remained empty. That may be due to the nature of tags in Subversion, which are just branches. I don’t know. In the end I just decided to create the tags by hand on the Subversion side:

$ svn copy svn://localhost:3690/e107-importer/trunk@2  svn://localhost:3690/e107-importer/tags/0.1 -m "Tag e107-importer 0.1"
$ svn copy svn://localhost:3690/e107-importer/trunk@4  svn://localhost:3690/e107-importer/tags/0.2 -m "Tag e107-importer 0.2"
$ svn copy svn://localhost:3690/e107-importer/trunk@5  svn://localhost:3690/e107-importer/tags/0.3 -m "Tag e107-importer 0.3"
$ svn copy svn://localhost:3690/e107-importer/trunk@6  svn://localhost:3690/e107-importer/tags/0.4 -m "Tag e107-importer 0.4"
$ svn copy svn://localhost:3690/e107-importer/trunk@8  svn://localhost:3690/e107-importer/tags/0.5 -m "Tag e107-importer 0.5"
$ svn copy svn://localhost:3690/e107-importer/trunk@9  svn://localhost:3690/e107-importer/tags/0.6 -m "Tag e107-importer 0.6"
$ svn copy svn://localhost:3690/e107-importer/trunk@10 svn://localhost:3690/e107-importer/tags/0.7 -m "Tag e107-importer 0.7"
$ svn copy svn://localhost:3690/e107-importer/trunk@11 svn://localhost:3690/e107-importer/tags/0.8 -m "Tag e107-importer 0.8"
$ svn copy svn://localhost:3690/e107-importer/trunk@12 svn://localhost:3690/e107-importer/tags/0.9 -m "Tag e107-importer 0.9"
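
Typing these commands by hand is error-prone. A small generator can print them from a list of “revision version” pairs so the batch can be reviewed before anything is run. This is only a sketch, and the print_tag_commands name is hypothetical:

```shell
# Hypothetical helper: print one "svn copy" command per "revision version"
# pair read from stdin. Printing instead of executing makes the batch easy
# to review; pipe the output to "sh" once it looks right.
print_tag_commands() {
    base="$1"    # URL of the plugin folder, without a trailing slash
    while read -r rev version; do
        echo "svn copy $base/trunk@$rev $base/tags/$version -m \"Tag e107-importer $version\""
    done
}
```

For example, printf '2 0.1\n4 0.2\n' | print_tag_commands svn://localhost:3690/e107-importer prints the first two commands of the list above.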

Real life push to WordPress repository

Now that our commit simulation has more or less worked, we can perform the same steps in the real world.

First, initialize a copy of the Git repository:

$ rm -rf e107-importer-git
$ git clone git@github.com:kdeldycke/e107-importer.git e107-importer-git

Let’s attach Subversion to Git:

$ cd e107-importer-git
$ git svn init --trunk=trunk --branches=branches --tags=tags http://plugins.svn.wordpress.org/e107-importer

Here you might want to do a git svn fetch as we did before. But this will take a while, especially on the WordPress plugin repository, as Git will browse all SVN revisions (more than 330,000 currently).

To speed things up, and following a tip from Nicolas Kuttler, we’ll search for the revision we’re interested in (the start of our plugin subfolder life), then fetch from here:

$ svn log --limit 1 http://plugins.svn.wordpress.org/e107-importer
------------------------------------------------------------------------
r333566 | plugin-master | 2011-01-17 17:09:40 +0100 (Mon, 17 Jan 2011) | 1 line

adding e107-importer by Coolkevman
------------------------------------------------------------------------
$ git svn fetch -r333566
r333566 = b850438a98c26a8f55ee2ddd7bdf8816d0390a1b (refs/remotes/trunk)

And now we can send our massive payload, after rebasing our master branch to SVN’s trunk:

$ git rebase trunk
$ git svn dcommit --username=Coolkevman

We can then contemplate our work in the official WordPress plugin repository.

There is one problem though: git-svn has left empty folders because of renaming. Let’s fix this:

$ svn rm http://plugins.svn.wordpress.org/e107-importer/trunk/bbcode -m "Git-svn doesn't delete empty folders on move." --username=Coolkevman

Last thing to do is to tag our old versions on Subversion, as we did in our simulation:

$ svn copy http://plugins.svn.wordpress.org/e107-importer/trunk@336229 http://plugins.svn.wordpress.org/e107-importer/tags/0.1 -m "Tag e107-importer 0.1"
$ svn copy http://plugins.svn.wordpress.org/e107-importer/trunk@336231 http://plugins.svn.wordpress.org/e107-importer/tags/0.2 -m "Tag e107-importer 0.2"
$ svn copy http://plugins.svn.wordpress.org/e107-importer/trunk@336232 http://plugins.svn.wordpress.org/e107-importer/tags/0.3 -m "Tag e107-importer 0.3"
$ svn copy http://plugins.svn.wordpress.org/e107-importer/trunk@336233 http://plugins.svn.wordpress.org/e107-importer/tags/0.4 -m "Tag e107-importer 0.4"
$ svn copy http://plugins.svn.wordpress.org/e107-importer/trunk@336235 http://plugins.svn.wordpress.org/e107-importer/tags/0.5 -m "Tag e107-importer 0.5"
$ svn copy http://plugins.svn.wordpress.org/e107-importer/trunk@336236 http://plugins.svn.wordpress.org/e107-importer/tags/0.6 -m "Tag e107-importer 0.6"
$ svn copy http://plugins.svn.wordpress.org/e107-importer/trunk@336237 http://plugins.svn.wordpress.org/e107-importer/tags/0.7 -m "Tag e107-importer 0.7"
$ svn copy http://plugins.svn.wordpress.org/e107-importer/trunk@336238 http://plugins.svn.wordpress.org/e107-importer/tags/0.8 -m "Tag e107-importer 0.8"
$ svn copy http://plugins.svn.wordpress.org/e107-importer/trunk@336239 http://plugins.svn.wordpress.org/e107-importer/tags/0.9 -m "Tag e107-importer 0.9"

But this means I had to clean up the tags too, to remove the remaining empty folder.

Pushing new commits

All of the above only works with a newly created plugin structure on the WordPress plugin repository. What if we want to push new commits to Subversion once we’ve already pushed part of our Git history ?

First, let’s make our life miserable and delete all our local repositories:

$ cd ..
$ rm -rf e107-importer-git

Now, if we replay the steps above, the git rebase trunk command will end with loads of conflicts. The procedure is different this time and is explained by Ikke.

This involves Git’s graft:

$ git clone git@github.com:kdeldycke/e107-importer.git e107-importer-git
$ cd e107-importer-git
$ git svn init --trunk=trunk --branches=branches --tags=tags http://plugins.svn.wordpress.org/e107-importer
$ git svn fetch -r333566
$ git show-ref trunk
$ git log --pretty=oneline master | tail -n1
$ echo `git log --pretty=oneline master | tail -n1 | cut -d ' ' -f 1` `git show-ref trunk | cut -d ' ' -f 1` >> .git/info/grafts
$ git svn dcommit

The last command will not end well, with Git complaining about unmerged differences. This is likely due to my additional commit removing the empty folder left by git-svn. Fortunately Git suggests something in its log:

If you are attempting to commit  merges, try running:
  git rebase --interactive --preserve-merges  refs/remotes/trunk
Before dcommitting

Well, that’s exactly what I did:

$ git rebase --interactive --preserve-merges refs/remotes/trunk
$ git svn dcommit

And it magically fixed the issue ! :)

I’m quite happy now to have a clearly identified workflow to push my Git updates to Subversion ! :)

Apache commands

  • Hide Subversion and Git directories content (source):
    RedirectMatch 404 /\.(svn|git)(/|$)
    
  • Disable rendering of PHP files coming from imported third party Javascript submodules (context):
    RedirectMatch 404 js-(.*)\.php$
    
  • Redirect any request to current year sub-directory (I used this for a yearly-updated static web page):
    RewriteEngine on
    RewriteRule !^/2010/ /2010/ [R=301,L]
    
  • Here is my template for domain-based virtual host routing:
    # Setup the main website access
    <VirtualHost *:80>
      ServerName example.com
      DocumentRoot /var/www/example
      # Add extra capabilities to let CMS like WordPress manage redirections
      <Directory /var/www/example>
        Options +FollowSymLinks +SymLinksIfOwnerMatch
      </Directory>
    </VirtualHost>
    # Redirect all other access to the website from different domains to the canonical URL
    <VirtualHost *:80>
      ServerName www.example.com
      ServerAlias *.example.com
      ServerAlias example.net *.example.net
      ServerAlias example.org *.example.org
      RedirectMatch permanent (.*) http://example.com$1
    </VirtualHost>
    
  • Insert dynamic headers in HTTP responses depending on the browser:
    BrowserMatchNoCase ".*MSIE\s[1-6].*" IS_DISGUSTING_BROWSER
    Header add X-advice-of-the-day "Save a kitten: use Firefox !" env=IS_DISGUSTING_BROWSER
    
  • Prevent WebDAV connections (thanks Guillaume!):
    <Location />
      <Limit PROPFIND PROPPATCH MKCOL COPY MOVE LOCK UNLOCK PATCH>
        # Leaves GET (and HEAD), POST, PUT, DELETE, CONNECT, OPTIONS and TRACE alone
        Order allow,deny
        Deny from all
      </Limit>
    </Location>
    SetEnvIf Request_Method "OPTIONS" CLIENT_PROBE
    Header set Allow "GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, TRACE" env=CLIENT_PROBE
    
  • At work, we had to engineer a convoluted software architecture for our intranet to fit the network security policy of our customer. This had the bad side effect of letting the web statistics collector delete all cookies but its own, thus breaking the intranet’s authentication. So we (thanks Matthieu!) came up with this unmaintainable hack on the Apache side to hide our intranet’s cookies from NedStat’s embedded JavaScript code:
    # LoadModule is only valid in the server config context, so it stays outside <LocationMatch>
    LoadModule headers_module modules/mod_headers.so
    <LocationMatch "/(.*)">
      RequestHeader edit Cookie "(app_cookie_001=[^;]*(; )*)" ""
      RequestHeader edit Cookie "(app_cookie_002=[^;]*(; )*)" ""
      RequestHeader edit Cookie "(app_cookie_003=[^;]*(; )*)" ""
    </LocationMatch>
    
  • Kill all apache processes and restart the service:
    /etc/init.d/apache2 stop ; pkill -9 -u www-data ; /etc/init.d/apache2 restart
    
  • Restart Apache service if no process found:
    [ `ps axu | grep -v "grep" | grep --count "www-data"` -le 0 ] && /etc/init.d/apache2 restart
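
Before deploying a rule such as the first one of the list above, its regular expression can be tried from the shell. This is only an approximation, as grep -E stands in here for Apache’s own regex engine, and the is_vcs_path name is hypothetical:

```shell
# Hypothetical helper: tell whether a URL path would be caught by the
# /\.(svn|git)(/|$) pattern used to hide Subversion and Git directories.
is_vcs_path() {
    printf '%s\n' "$1" | grep -qE '/\.(svn|git)(/|$)'
}
```

With this pattern, /.git/config and /project/.svn are caught, while /.gitignore and /index.html are not.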
    

Automate Trac instance deployment with Buildout

Recently, I started to contribute to pbp.recipe.trac, a Buildout recipe aimed to simplify the management and configuration of Trac instances.

I took an interest in this piece of code the day I realized the Trac instance we used at work was still running on the old 0.10.x series. Even though we spend the majority of our time there, nobody had taken care of our little Trac: it had not been updated for 3 years. If you add to this a sudden need for multi-repository support (as our team is adopting other internal projects), you have enough incentives to upgrade our Trac and automate its maintenance.

So here is how I migrated our legacy Trac 0.10 instance to a brand new 0.12 thanks to Buildout and pbp.recipe.trac.

First, let’s install all system dependencies using your distribution’s package management tool. My target server is running RHEL 5.4, so I’ll invoke Yum:

$ sudo yum install subversion subversion-python sqlite-devel cyrus-sasl-lib cyrus-sasl-md5 mercurial

On Debian/Ubuntu, equivalent packages should be installed with apt-get:

$ sudo apt-get install subversion python-subversion libsqlite3-dev libsasl2-2 libsasl2-modules sasl2-bin mercurial

Now we create an empty structure that will host our Trac instance:

$ mkdir ~/trac-home
$ cd ~/trac-home
$ touch ./buildout.cfg

It’s time to edit the file at the core of the process: buildout.cfg. Here is my version:

[buildout]
extensions = buildout.bootstrap
parts = my-trac
deploy-server = trac.example.net

[my-trac]
recipe = pbp.recipe.trac
project-name = My Trac instance
project-description = This is my stand-alone Trac instance hosting my development activities.
project-url = http://${buildout:deploy-server}:8000/my-trac
repos = my-repo-1 | svn | ${buildout:directory}/repos/my-repo-1 | svn://${buildout:deploy-server}:3690/my-repo-1
        my-repo-2 | svn | ${buildout:directory}/repos/my-repo-2 | svn://${buildout:deploy-server}:3690/my-repo-2
        my-repo-3 | svn | ${buildout:directory}/repos/my-repo-3 | svn://${buildout:deploy-server}:3690/my-repo-3
default-repo = my-repo-1
force-instance-upgrade = True
force-repos-resync = True
wiki-doc-upgrade = True
stats-plugin = enabled
permissions = anonymous | STATS_VIEW
header-logo = ${buildout:directory}/my_trac_logo.png
smtp-enabled = true
smtp-server = localhost
smtp-port = 25
smtp-from = trac@example.net
smtp-replyto = no-reply@example.net
smtp-always-cc = kevin@example.net bob@example.net
additional-menu-items = Buildbot | http://${buildout:deploy-server}:9080/console
trac-ini-additional = attachment   | max_size               | 26214400
                      browser      | downloadable_paths     | /*/trunk, /*/branches/*, /*/tags/*
                      notification | always_notify_owner    | true
                      notification | always_notify_reporter | true
                      timeline     | ticket_show_details    | true
                      wiki         | ignore_missing_pages   | true
                      svn          | branches               | /*/trunk, /*/branches/*
                      svn          | tags                   | /*/tags/*

I now encourage you to use my buildout.cfg above as a template and customize it to your needs. Please read pbp.recipe.trac documentation carefully to set the recipe options to values you like.

Before going further, we need a bootstrap.py script. This script will take care of all stuff required by a bare Python interpreter to handle a Buildout project from scratch. Let’s download the latest version:

$ wget http://svn.zope.org/repos/main/zc.buildout/trunk/bootstrap/bootstrap.py

Now we can initialize our Buildout environment. The --distribute option here is necessary to get something more modern than the abandoned setuptools:

$ python ./bootstrap.py --distribute

And then we can ask Buildout to construct our instance:

$ ./bin/buildout

Now that we have an empty Trac 0.12 instance, we will migrate there our legacy Subversion repositories:

$ svnadmin create ./repos/my-repo-1
$ svnadmin create ./repos/my-repo-2
$ svnadmin create ./repos/my-repo-3
$ ssh -C root@legacy.example.net "svnadmin dump /software/svn/repo1" | svnadmin load ./repos/my-repo-1
$ ssh -C root@legacy.example.net "svnadmin dump /software/svn/repo2" | svnadmin load ./repos/my-repo-2
$ svnadmin load ./repos/my-repo-3 < ~/svn_repo3_20100612.dmp

Note that in this case my first two subversion repositories are still running on my legacy server, and I already have a local dump of the third.

Let’s copy the data from our legacy Trac instance. By studying the differences between a default Trac instance and the legacy one I was working on, I came to the conclusion that I only needed to move the attachments and the main database. Of course this is my personal case and yours may be a little bit different:

$ scp -rC root@legacy.example.net:/software/trac/project/attachments ./parts/my-trac/
$ scp -rC root@legacy.example.net:/software/trac/project/db/trac.db  ./parts/my-trac/db/

We need to call Buildout a second time to update our project with all the data we’ve just migrated:

$ ./bin/buildout

Now we’ll activate and configure SASL-based authentication in all Subversion repositories:

$ sed -i 's/# use-sasl = true/use-sasl = true/' ./repos/my-repo-1/conf/svnserve.conf
$ sed -i 's/# use-sasl = true/use-sasl = true/' ./repos/my-repo-2/conf/svnserve.conf
$ sed -i 's/# use-sasl = true/use-sasl = true/' ./repos/my-repo-3/conf/svnserve.conf
$ sed -i 's/# realm = My First Repository/realm = svn/' ./repos/my-repo-1/conf/svnserve.conf
$ sed -i 's/# realm = My First Repository/realm = svn/' ./repos/my-repo-2/conf/svnserve.conf
$ sed -i 's/# realm = My First Repository/realm = svn/' ./repos/my-repo-3/conf/svnserve.conf
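
With more than a handful of repositories, the six commands above are better expressed as a loop. Here is a sketch, assuming GNU sed for -i and the commented defaults written by svnadmin create. The enable_sasl name is hypothetical:

```shell
# Hypothetical helper: enable SASL and set the realm in the svnserve.conf of
# every repository found directly under a parent directory.
enable_sasl() {
    for conf in "$1"/*/conf/svnserve.conf; do
        [ -f "$conf" ] || continue
        sed -i -e 's/# use-sasl = true/use-sasl = true/' \
               -e 's/# realm = My First Repository/realm = svn/' "$conf"
    done
}
```

Calling enable_sasl ./repos should then be equivalent to the commands above, and it picks up any repository added later.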

Create a password database with our users:

$ saslpasswd2 -f sasl.db -u svn kevin
$ saslpasswd2 -f sasl.db -u svn bob
$ ...

Set up SASL authentication on the system (please change the sasl.conf location below according to your file structure):

$ touch ./sasl.conf
$ sudo ln -s /home/kevin/trac-home/sasl.conf /etc/sasl2/svn.conf

And put the following content in the sasl.conf file we just created above (don’t forget to update the sasl.db location):

pwcheck_method: auxprop
auxprop_plugin: sasldb
sasldb_path: /home/kevin/trac-home/sasl.db
mech_list: ANONYMOUS CRAM-MD5 DIGEST-MD5

It’s time to create and populate the password file used by Trac, with all the users we created 3 steps above:

$ touch ./htdigest
$ htdigest ./htdigest trac kevin
$ htdigest ./htdigest trac bob
$ ...

And now we can start the Subversion server in the background:

$ svnserve --daemon --listen-port 3690 --root ./repos/

Last step, we launch Trac’s standalone webserver:

$ ./bin/tracd --port 8000 --single-env --auth="*,htdigest,trac" ./parts/my-trac

You can now reach Trac from your browser, on the following URL:


http://trac.example.net:8000/my-trac

A final test consists of getting some code from Subversion:

$ svn co svn://trac.example.net:3690/my-repo-1

From now on, and that’s where the fun begins, each time a new Trac version is released on PyPI, I just have to:

  1. stop both Trac and Subversion standalone servers,
  2. run ./bin/buildout, and
  3. restart both Subversion and Trac servers.

That’s enough to upgrade my instance.

Now you can clearly see how it’s important to invest time in automation to save on maintenance costs and prevent code rotting… :)

Subversion commits and mail activity stream in iCalendar

Last week I consolidated all my code in my GitHub repository. I stumbled upon an old script I haven’t publicized yet: svn2ical.py.

This is a simple hack which gets commit metadata out of a Subversion repository and generates an iCalendar file containing all commits of a given author. I used it back then to visualize my commit activity in a calendar. Nowadays this script is quite useless, as services like Ohloh and GitHub provide great timelines and activity streams. But it can still be useful for private repositories.

In the same spirit, I uncovered maildir2ical.py, a script that looks in a maildir folder for mails sent by a particular author, then generates an iCal file based on mail dates.
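
To give an idea of the output format, here is a shell sketch of how one commit can be turned into an iCalendar event. It is not taken from svn2ical.py, which is written in Python. The commit_to_vevent name and the svn2ical.example UID domain are hypothetical, and the commit message is assumed to be a single line without commas or semicolons, which would need escaping:

```shell
# Hypothetical helper: turn one commit into an iCalendar VEVENT.
# Arguments: revision, UTC timestamp (YYYYMMDDTHHMMSSZ), commit message.
# RFC 5545 asks for CRLF line endings, plus a UID and a DTSTAMP per event.
commit_to_vevent() {
    printf 'BEGIN:VEVENT\r\n'
    printf 'UID:r%s@svn2ical.example\r\n' "$1"
    printf 'DTSTAMP:%s\r\n' "$2"
    printf 'DTSTART:%s\r\n' "$2"
    printf 'SUMMARY:r%s %s\r\n' "$1" "$3"
    printf 'END:VEVENT\r\n'
}
```

A complete file would wrap such events between BEGIN:VCALENDAR and END:VCALENDAR lines, along with the VERSION and PRODID properties.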