Live Browser : a Python web app using Microsoft Live Connect API

5 months ago I was called by a recruiter for a position in a startup building cloud-computing solutions. At the end of my first interview with the engineers of the company, I was asked to write a little web application to test my technical abilities.

The goal was to create a back-end talking to Microsoft’s Live Connect API and keep a cache of user profiles. Then a front-end demonstrating my HTML/CSS/JS know-how was to be built. User authentication was supposed to use OAuth.

The only technological constraint was to use Python. I decided to use CherryPy and Mako to leverage the boilerplate code I had just released back then. For the persistence layer, my first intention was to use SQLAlchemy, but I quickly switched to MongoDB: I had never played with it and this project was a great opportunity to do so.

Although my web app was far from finished, it was still well received by the team. After other interviews I was made a competitive offer. I finally declined as I wanted to finish what I had started at my current company.

What’s left of this experience is Live Browser, the web app I created, whose source code is now available on GitHub.

How I Open-Sourced an Internal Corporate Project (WebPing)

2 weeks ago I released WebPing. This article is more or less the same as the one I wrote 4 months ago when I released the FTT project and needed to move it from SVN to Git. But this time I added more details on how I removed all the sensitive information that was hard-coded in the project files.

Subversion to Git migration

Everything starts from a local copy of the Subversion repository that had hosted the WebPing project since its inception:

$ rm -rf ./svn-repository-copy
$ tar xvzf ./svn-repository-copy.tar.gz
$ kill `ps -ef | grep svnserve | awk '{print $2}'`
$ svnserve --daemon --listen-port 3690 --root ./svn-repository-copy

Let’s initialize a Git repository:

$ rm -rf ./webping-git
$ mkdir ./webping-git
$ cd ./webping-git
$ git init
$ git commit --allow-empty -m 'Initial commit'
$ git tag "init"

We now migrate the code from Subversion to Git:

$ git svn init --no-metadata --username deldycke svn://localhost:3690
$ git svn fetch
$ git rebase --onto git-svn master
$ rm -rf ./.git/svn/
$ rm -rf ./.git/refs/original/
$ git reflog expire --all
$ git gc --aggressive --prune

Removing unrelated files and folders

As WebPing was not alone in the original Subversion repository, we need to clean up the latter and only keep the code of the former. Worse, WebPing didn’t start its life in a dedicated subfolder, but as a tool of another project, and jumped from folder to folder. After identifying in the history all the places where WebPing once lived, I came up with this big, convoluted command line to do the cleaning:

$ git filter-branch --force --prune-empty --tree-filter 'find ./ -not -ipath "*webping*" -and -not -path "./other-project/trunk/tools/web-ping*" -and -not -path "./other-project/trunk/tools" -and -not -path "./other-project/trunk" -and -not -path "./other-project" -and -not -path "./.git*" -and -not -path "./" | xargs rm -rf' -- --all
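
A tree filter is just a shell command run at the root of each checked-out revision, so it can be rehearsed on a throwaway directory before rewriting any history. The sketch below runs the same find expression against a scratch tree. The file names are made up and only stand in for the real layout, and the .git exclusion is dropped because the scratch tree is not a repository.

```shell
# Build a scratch tree that mimics a mixed repository (all names are made up).
scratch=$(mktemp -d)
mkdir -p "$scratch/other-project/trunk/tools/web-ping" \
         "$scratch/other-project/trunk/src" \
         "$scratch/WebPing/trunk" \
         "$scratch/unrelated"
touch "$scratch/other-project/trunk/tools/web-ping/web-ping.py" \
      "$scratch/other-project/trunk/src/main.c" \
      "$scratch/WebPing/trunk/README" \
      "$scratch/unrelated/notes.txt"

# Same idea as the tree filter: delete everything that is neither WebPing
# code nor one of the parent folders leading to it.
(
  cd "$scratch" &&
  find ./ -not -ipath "*webping*" \
      -and -not -path "./other-project/trunk/tools/web-ping*" \
      -and -not -path "./other-project/trunk/tools" \
      -and -not -path "./other-project/trunk" \
      -and -not -path "./other-project" \
      -and -not -path "./" | xargs rm -rf
)

# List the files that survived, relative to the scratch root.
survivors=$(cd "$scratch" && find . -type f | LC_ALL=C sort)
echo "$survivors"
rm -rf "$scratch"
```

Only the two WebPing files should be listed. If something else shows up, the expression needs another exclusion before it is handed to git filter-branch.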

Strangely enough, my init tag went off after the command above, so I had to rebase to bring it back in line:

$ git rebase init master

We can now remove SVN tags and branches, get rid of the imported git-svn branch, and clean up our Git repository:

$ git filter-branch --force --prune-empty --tree-filter 'find -path "./WebPing/tags*" | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find -path "./WebPing/branches*" | xargs rm -rf' -- --all
$ git branch -r -D git-svn
$ rm -rf ./.git/svn/
$ rm -rf ./.git/refs/original/
$ git reflog expire --all
$ git gc --aggressive --prune

Although I now only have WebPing code in the repository, it still jumps through the history between the following locations:

  • other-project/trunk/tools/web-ping.py
  • other-project/trunk/tools/web-ping/
  • WebPing/trunk/

Using a series of git filter-branch invocations, I managed to move everything to the root of the repository:

$ git filter-branch --force --prune-empty --tree-filter 'test -d ./other-project/trunk/tools && cp -axv ./other-project/trunk/tools/* ./ && rm -rf ./other-project/trunk/tools || echo "No tools folder found"' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'test -d ./other-project/trunk/tools/web-ping && cp -axv ./other-project/trunk/tools/web-ping/* ./ && rm -rf ./other-project/trunk/tools/web-ping || echo "No web-ping folder found"' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'test -d ./WebPing/trunk && cp -axv ./WebPing/trunk/* ./ && rm -rf ./WebPing/trunk || echo "No trunk folder found"' -- --all

Hide and obfuscate hard-coded content

As WebPing was created for internal needs at my previous job, its original code base contains lots of references to the infrastructure it used to live in. My professional standards require me to remove all this sensitive information before making WebPing available to the public.

For example, here is the command that allowed me to remove all references to the hostnames of our intranets:

$ git filter-branch --force --prune-empty --tree-filter 'find . -type f -exec perl -i -pe "s/([\w-.]*?)\.(company(-intranet|-extention)?)\.(fr|com|net|org)/intranet\.example\.com/g" "{}" \;' -- --all

The Perl one-liner embedded in the command above only applies the regular expression on a line-by-line basis. If you want the regexp applied to the whole content of each file, you have to use Perl’s slurp mode (source of that tip):

$ git filter-branch --force --prune-empty --tree-filter 'perl -0777 -i -pe "s/MAILING_LIST\s*=\s*\[(.*?)\]/MAILING_LIST = \[\]/gs" ./web-ping.py' -- --all

The specific example above helped me remove the content of the MAILING_LIST Python list found in web-ping.py, in order to protect from spam the email addresses of my former co-workers, which were unfortunately hard-coded in that variable.

Another place to hunt for sensitive information is commit messages. These can be easily modified thanks to the --msg-filter option. Here is how I removed references to our internal Trac tickets:

$ git filter-branch --force --msg-filter 'sed "s/ (see ticket:666)//g"' -- --all
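
A message filter receives the original commit message on standard input and prints the new one, so the sed expression can be tried with a simple pipe first. The commit messages below are made up, and the second expression is a hypothetical generalisation that was not part of the original clean-up.

```shell
# A made-up commit message ending with a Trac ticket reference.
original="Fix timeout handling (see ticket:666)"

# The filter reads the message on standard input and prints the new one.
rewritten=$(printf '%s\n' "$original" | sed "s/ (see ticket:666)//g")
echo "$rewritten"

# Hypothetical generalisation: strip any ticket number, not just 666.
generic=$(printf '%s\n' "Add retry (see ticket:42)" | sed -E "s/ \(see ticket:[0-9]+\)//g")
echo "$generic"
```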

I also had to remove carriage returns introduced by heavy use of Windows text editors (remember, WebPing was born in a corporate environment):

$ git filter-branch --force --prune-empty --tree-filter 'perl -i -pe "s/\r//" ./*' -- --all

The last useful command I used was the following, to fix authors’ names and emails:

$ git filter-branch --force --env-filter '
    if [ "$GIT_AUTHOR_NAME" = "deldycke" ]
      then
        export GIT_AUTHOR_NAME="Kevin Deldycke"
        export GIT_AUTHOR_EMAIL="kevin@deldycke.com"
    fi
    if [ "$GIT_AUTHOR_NAME" = "diehr" ]
      then
        export GIT_AUTHOR_NAME="Matthieu Diehr"
        export GIT_AUTHOR_EMAIL="matthieu.diehr@gmail.com"
    fi
  ' -- --all
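
Since an environment filter is plain shell code working on the GIT_AUTHOR_* variables, it can be rehearsed without any repository by evaluating it in a subshell with a fake commit environment. This is only a sketch: the @localhost addresses are placeholders, not values from the real history.

```shell
# The filter body, as it would be passed to --env-filter.
author_filter='
    if [ "$GIT_AUTHOR_NAME" = "deldycke" ]
      then
        export GIT_AUTHOR_NAME="Kevin Deldycke"
        export GIT_AUTHOR_EMAIL="kevin@deldycke.com"
    fi
'

# Rehearse it in a subshell with a fake commit environment.
known=$(
  GIT_AUTHOR_NAME="deldycke"; GIT_AUTHOR_EMAIL="deldycke@localhost"
  eval "$author_filter"
  echo "$GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>"
)

# An author the filter does not know about must be left alone.
unknown=$(
  GIT_AUTHOR_NAME="someone"; GIT_AUTHOR_EMAIL="someone@localhost"
  eval "$author_filter"
  echo "$GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL>"
)

echo "$known"
echo "$unknown"
```

The same pattern applies to GIT_COMMITTER_NAME and GIT_COMMITTER_EMAIL if committer identities need fixing too.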

By using a dozen variations of the commands above, and carefully reviewing the code, I was able to engineer a clean code history.

But I was certainly a little too blunt with these regular expressions. Some of them also acted on binary content. As a result, I had to restore static images from their original copies.

Final steps

Now that your code is clean, all you need is to recreate your tags and fix the init tag date before pushing everything to GitHub:

$ git tag -f "0.0" bad4ff7fc48b8b34f6f661d75c782c7fc0d098c5
$ git tag -f "0.1" 590ac9953df0e3bc76fd02615471e36a9796a065
$ git tag -f "0.2" 33f731054042b02c6d2600e7aead5bb7c4991b12
$ git filter-branch --env-filter '
      if [ $GIT_COMMIT = 361224542bc73bba747c7ca382e992e2cdd0c356 ]
      then
          export GIT_AUTHOR_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
          export GIT_COMMITTER_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
      fi' -- --all
$ git remote add origin git@github.com:kdeldycke/webping.git
$ git push -u origin master
$ git push --tags

FTT Migration from Subversion to Git

Last month I released the Feed Tracking Tool project (aka FTT) on GitHub. I had reconstructed the code history from old tarballs. In the meantime, my friend at Uperto managed to recover the original Subversion repository from very old backups. Here is how I migrated the old SVN repository to GitHub.

First, I started a local Subversion server with the repository my co-worker gave me:

$ tar xvzf ./ftt-svn.tar.gz
$ sed -i 's/# password-db = passwd/password-db = passwd/' ./ftt-svn/conf/svnserve.conf
$ echo "kevin = kevin" >> ./ftt-svn/conf/passwd
$ kill `ps -ef | grep svnserve | awk '{print $2}'`
$ svnserve --daemon --listen-port 3690 --root ./ftt-svn

Then I created a local Git repository, using my initialization routine:

$ rm -rf ./ftt-git
$ mkdir ./ftt-git
$ cd ./ftt-git
$ git init
$ git commit --allow-empty -m 'Initial commit'
$ git tag "init"

The next step consists of importing the Subversion repository into Git:

$ git svn init --no-metadata --username kevin svn://localhost:3690
$ git svn fetch

Here I rebased the imported git-svn branch to the main branch:

$ git rebase --onto git-svn master
$ git rebase init master

At that point I no longer needed the remote git-svn branch, so I removed it:

$ git branch -r -D git-svn

To clean things up, let’s remove all SVN metadata and local commit backups:

$ rm -rf ./.git/svn/
$ rm -rf ./.git/refs/original/
$ git reflog expire --all
$ git gc --aggressive --prune

We can now proceed to alter the code history. In FTT we never created branches, and I planned to recreate tags by hand later. So I decided to remove all the tags and branches folders coming from Subversion:

$ git filter-branch --force --prune-empty --tree-filter 'rm -rf ./tags*'     -- --all
$ git filter-branch --force --prune-empty --tree-filter 'rm -rf ./branches*' -- --all

Now let’s move the trunk directory to the base of the repository. I didn’t use the --subdirectory-filter parameter as FTT started its life without a proper “branches/tags/trunk” SVN structure:

$ git filter-branch --force --prune-empty --tree-filter 'test -d ./trunk && cp -axv ./trunk/* ./ && rm -rf ./trunk || echo "No trunk folder found"' -- --all
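
Two details of this filter are worth checking on a scratch tree. First, git filter-branch aborts when a tree filter exits with a non-zero status, which is what the echo fallback guards against for revisions without a trunk folder. Second, the ./trunk/* glob does not match hidden files. The sketch below uses made-up file names, including a hypothetical .htaccess, to show both behaviours.

```shell
# Made-up revision that still uses the trunk layout, with one hidden file.
rev=$(mktemp -d)
mkdir -p "$rev/trunk/app"
touch "$rev/trunk/Rakefile" "$rev/trunk/app/main.rb" "$rev/trunk/.htaccess"

# Same command as the tree filter, run from the root of the revision
# (-v is dropped to keep the output short).
outcome=$(cd "$rev" && { test -d ./trunk && cp -ax ./trunk/* ./ && rm -rf ./trunk || echo "No trunk folder found"; })

# Files present after the move: the hidden file is gone, because the glob
# skipped it and rm -rf then deleted it along with trunk.
moved=$(cd "$rev" && find . -type f | LC_ALL=C sort)

# Running the command again on a tree without trunk takes the fallback
# branch, which keeps the exit status at 0.
second_run=$(cd "$rev" && { test -d ./trunk && cp -ax ./trunk/* ./ && rm -rf ./trunk || echo "No trunk folder found"; })

rm -rf "$rev"
echo "$moved"
echo "$second_run"
```

If the project contains dotfiles that matter, they need to be copied explicitly before trunk is removed.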

Next is the Git command I used to fix commit authorship:

$ git filter-branch --force --env-filter '
    if [ "$GIT_AUTHOR_NAME" = "kdeldycke" ]
      then
        export GIT_AUTHOR_NAME="Kevin Deldycke"
        export GIT_AUTHOR_EMAIL="kevin@deldycke.com"
    fi
    if [ "$GIT_AUTHOR_NAME" = "qdesert" ]
      then
        export GIT_AUTHOR_NAME="Quentin Desert"
        export GIT_AUTHOR_EMAIL="quentin.desert@uperto.com"
    fi
    ' -- --all

While exploring my own backups of the FTT project, I stumbled upon a preliminary HTML mockup of the app. I decided to include it in the final repository, as the first commit, just after my init tag. Here is how I did this, assuming the mockup sources were available in the ../mockup directory:

$ git branch mockup-injection init
$ git checkout mockup-injection
$ cp -axv ../mockup .
$ git add --all
$ git commit --all --date="2007-07-17 15:49" --author="Quentin Desert <quentin.desert@uperto.com>" -m "Commit the oldest mockup I can find."
$ git rebase --onto mockup-injection init master
$ git branch -D mockup-injection

The procedure above comes from my “Commit history reconstruction” article.

Now I can tag by hand all FTT releases.

$ git tag -f "0.4.1"  5f5cc2a36743f2c8d2088669e475ef09d8cec029
$ git tag -f "0.5"    54a76e143f9f2efdec88d3181cbcfbfddda5f725
$ git tag -f "0.6"    934447f185330903c389364bed94e994f6b280e6
$ git tag -f '0.7'    ef87ab3287ba23655781565fd622345c942d9c49
$ git tag -f "0.8"    cdcf2f459826019bbbc5874d6632392b07ea889b
$ git tag -f "0.8.1"  f47a3f219eb918069efe701d082928cdb953f05f
$ git tag -f "0.8.2"  2542754dd088d359ce96db8511e0a15588eb50ce
$ git tag -f "0.8.3"  ea9455c0ed75cf504c1cc872d5e5946b578ae702
$ git tag -f "0.9.0"  57a39879b3bcc61bd9560d7ac4e71cbfd0af22df
$ git tag -f "0.9.1"  e483fd1a287fa86a8b12d088b78a319b0990e6ef
$ git tag -f "0.10.0" ed77af77506836892be78044ae4ef15d07f18583

FTT was always developed as an internal app. As such, the code and its history still contained lots of sensitive information. I thoroughly audited the code to identify the kind of data that we should absolutely not disclose to the outside world.

At the end of this code review, I had only found references to our internal architecture (server names and IP addresses), and some usernames and passwords. There were also some logs and temporary files. I cleaned them all with the following set of Git commands:

$ git filter-branch --force --prune-empty --tree-filter 'find . -iname ".svn"        | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -iname "*.log"       | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -iname "*~"          | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -iname "*.pid"       | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -iname "*.ppid"      | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -iname "ruby_sess.*" | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -type f -exec sed -i "s/password: 1234567/password: *******/g"   "{}" \;' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -type f -exec sed -i "s/smtp\.server12\.com/smtp\.uperto\.com/g" "{}" \;' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -type f -exec sed -i "s/192\.168\.0\.2/12\.34\.56\.78/g"         "{}" \;' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -type f -exec sed -i "s/user qdesert/user *******/g"             "{}" \;' -- --all
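
Before trusting such substitutions on the whole history, they can be rehearsed on a scratch directory and followed by a grep audit that looks for the original values. The sketch below assumes GNU sed (for -i without a suffix) and uses made-up file names. The patterns are the ones from the commands above, with the dots in the replacement strings left unescaped.

```shell
tree=$(mktemp -d)
mkdir -p "$tree/config"
# Made-up files containing the kind of values replaced above.
printf 'host: 192.168.0.2\npassword: 1234567\n' > "$tree/config/database.yml"
printf 'address: smtp.server12.com\n' > "$tree/config/mail.yml"

# Apply the substitutions to every file of the scratch tree.
find "$tree" -type f -exec sed -i "s/password: 1234567/password: *******/g" "{}" \;
find "$tree" -type f -exec sed -i "s/smtp\.server12\.com/smtp.uperto.com/g" "{}" \;
find "$tree" -type f -exec sed -i "s/192\.168\.0\.2/12.34.56.78/g" "{}" \;

# Audit: list any file that still contains one of the original values.
leftovers=$(grep -rlE '1234567|server12|192\.168\.0\.2' "$tree")
database=$(cat "$tree/config/database.yml")
rm -rf "$tree"
echo "files still leaking: ${leftovers:-none}"
```

The same grep can be run at the root of a checkout after the rewrite, as a cheap complement to a manual review.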

After all these modifications, I was pretty sure my code was ready to be published. But better safe than sorry: I spent a couple of minutes on a second deep code review to check that I didn’t miss anything. And to push the reviewing process even further, I offer a beer at the local bar to anyone finding sensitive information in FTT’s code base ! :)

The last thing I did was to delete the old FTT GitHub repository and recreate it. Then I fixed my first commit date, cleaned Git’s local backup and pushed my carefully crafted repository to its new GitHub home:

$ export GIT_TMP_INIT_HASH=`git show-ref init | cut -d ' ' -f 1`
$ git filter-branch --env-filter '
    if [ $GIT_COMMIT = $GIT_TMP_INIT_HASH ]
      then
        export GIT_AUTHOR_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
        export GIT_COMMITTER_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
    fi' -- --all
$ unset GIT_TMP_INIT_HASH
$ rm -rf ./.git/refs/original/
$ git reflog expire --all
$ git gc --aggressive --prune
$ git remote add origin git@github.com:kdeldycke/feed-tracking-tool.git
$ git push origin master --force --tags

Feed Tracking Tool released under an Open-Source license

I’ve just open-sourced the Feed Tracking Tool project (aka “FTT”), my first (and only) Ruby on Rails experience.

This tool was developed within Uperto, the company I currently work for, for its internal needs. The project had an ancestor written in 2006 that was based on Pylons. It was a prototype and was barely working. Iterating over the abandoned Python code base was considered a waste of time. So in summer 2007, it was decided to rewrite this application from scratch.

As my co-worker was available and had already played with Ruby on Rails, he was tasked with creating the initial code base. I joined the project early on, as it was a great opportunity to play with the (then really trendy) Ruby on Rails framework.

In the end, FTT was essentially a test project to explore Ruby on Rails. It was never deployed on a production server and was never used.

As the code had been rotting for more than 3 years and represented absolutely no business value in itself, I decided to release it under a GPLv2 license (with Uperto’s approval of course). My intention with this open-source release is to share back knowledge and code with the community.

FTT was living in a private Subversion repository at Uperto, but we unfortunately lost it. During the last few weeks I tried to rebuild the code history from old and partial backups. I then used my Git-based reconstruction method to consolidate everything in a Git repository. The code is now available on GitHub.

I don’t plan to maintain this project. But I may reboot it in the future if I need feed-related features, or if I need an excuse to play with Ruby on Rails again. For now, beware: the code is quite outdated and only runs on old Rails 1.2.x. This project should be considered an ugly legacy code base. So please be indulgent while looking at FTT’s code: it was the work of inexperienced RoR developers ! ;)

Pushing Git to Subversion: the case of WordPress plugin repository

Some weeks ago I moved my e107 Importer project from a big fat Git repository to its own.

Then I wanted my plugin to be available on WordPress.org. In fact, this list is only the tip of WordPress’ plugin hosting solution. It means that if you want to have your plugin there, you have to push your code to WordPress’ big Subversion repository. And that’s when I realized I had to sync my Git repository to Subversion…

This article details how I managed to push to Subversion all my development activity taking place in Git.

Before going further: be careful ! It’s really easy to mess things up. After all, we’re trying to push code to a public Subversion repository. We must be certain of what we are doing here. The risk of deleting stuff that is not ours is great.

The simulation

To prevent any big mistake, we’ll test our commands on a local Subversion repository.

Let’s create one:

$ rm -rf svn-repo
$ svnadmin create ./svn-repo

Now we’ll launch a local Subversion server with a minimal config:

$ sed -i 's/# password-db = passwd/password-db = passwd/' ./svn-repo/conf/svnserve.conf
$ echo "kevin = kevin" >> ./svn-repo/conf/passwd
$ kill `ps -ef | grep svnserve | awk '{print $2}'`
$ svnserve --daemon --listen-port 3690 --root ./svn-repo

To test our server, let’s checkout a local working copy from it:

$ rm -rf svn-working-copy
$ svn co svn://localhost:3690 svn-working-copy
$ cd svn-working-copy/
$ svn info

To simulate an already active Subversion repository, we’ll make a first commit with a structure mimicking WordPress’ plugin repository:

$ mkdir -p e107-importer/{trunk,branches,tags}
$ svn add *
$ svn commit -m "Create a WordPress-like repository structure" --username kevin
$ svn up
$ svn info

Now that we have a place to hack, we can experiment on Git side. We start with a copy of my plugin repository:

$ cd ..
$ rm -rf e107-importer
$ git clone git@github.com:kdeldycke/e107-importer.git
$ cd e107-importer

Thanks to git-svn, we can attach a remote Subversion repository:

$ git svn init --trunk=e107-importer/trunk --branches=e107-importer/branches --tags=e107-importer/tags --username kevin  svn://localhost:3690

Let’s get a copy of Subversion’s content:

$ git svn fetch
r1 = d969aa9a11684a1cd2ba0b3eab0a3ee72a62af51 (refs/remotes/trunk)

Now we will rebase our whole Git tree to Subversion’s trunk:

$ git rebase trunk

According to gitg, the result of this is 2 parallel trees:

  • the first is the untouched original tree;
  • the other starts on the trunk branch and continues with a copy of the original tree, and is the result of the rebase.

But the latter has a problem: my initial commit and all my tags are squashed. I tried several methods to rebase my whole Git tree onto the local trunk branch while keeping these. But I failed.

I resigned myself to this and moved on. After all, the initial commit played its role, by taking care of this corner case.

As for the tags, I just re-added them by hand. I forced their creation, as Git keeps them attached to the original parallel tree:

$ git tag -f "e107-importer-0.1" 728ec8689d13350bbfc1f2d9dc17dda2b8a8fdbf
$ git tag -f "e107-importer-0.2" 8049b92265a41f594e97020bae6f3aa74b6a7fb1
$ git tag -f "e107-importer-0.3" 9505aa0656ba61f39cd6cb91c76c1e7279c68473
$ git tag -f "e107-importer-0.4" 0da2d61239c9a9549d197737518705912fd4982d
$ git tag -f "e107-importer-0.5" 561d35b5d1b4d2c35e13c76a3f2a41689c96e991
$ git tag -f "e107-importer-0.6" c6de1a2bf60cad054c5420eab2f30f190092fb68
$ git tag -f "e107-importer-0.7" 6ad4d4a67e8b84da31565383e5eed6ceb5b7d2b2
$ git tag -f "e107-importer-0.8" 47b8efdc82132027b139a2f214f119cee1e9c06c
$ git tag -f "e107-importer-0.9" a82f5d0814db7cf6ac7a1ac171b30c300e1a91d4

Now we are ready to push the code to the remote Subversion repository:

$ git svn dcommit

Things seem to have worked: if you go back to your local copy of the simulated remote SVN, you’ll get all your code base and its history:

$ cd ..
$ cd svn-working-copy
$ svn up
$ svn log

Although commit order is preserved, dates are not, because unlike Git, Subversion only tracks the commit date, not the author’s date. This is sad but expected.

But here I was hoping that git-svn was smart enough to create tags automatically. It wasn’t, and my tags folder remained empty. That may be due to the nature of tags in Subversion, which are just branches. I don’t know. In the end I just decided to create tags by hand on the Subversion side:

$ svn copy svn://localhost:3690/e107-importer/trunk@2  svn://localhost:3690/e107-importer/tags/0.1 -m "Tag e107-importer 0.1"
$ svn copy svn://localhost:3690/e107-importer/trunk@4  svn://localhost:3690/e107-importer/tags/0.2 -m "Tag e107-importer 0.2"
$ svn copy svn://localhost:3690/e107-importer/trunk@5  svn://localhost:3690/e107-importer/tags/0.3 -m "Tag e107-importer 0.3"
$ svn copy svn://localhost:3690/e107-importer/trunk@6  svn://localhost:3690/e107-importer/tags/0.4 -m "Tag e107-importer 0.4"
$ svn copy svn://localhost:3690/e107-importer/trunk@8  svn://localhost:3690/e107-importer/tags/0.5 -m "Tag e107-importer 0.5"
$ svn copy svn://localhost:3690/e107-importer/trunk@9  svn://localhost:3690/e107-importer/tags/0.6 -m "Tag e107-importer 0.6"
$ svn copy svn://localhost:3690/e107-importer/trunk@10 svn://localhost:3690/e107-importer/tags/0.7 -m "Tag e107-importer 0.7"
$ svn copy svn://localhost:3690/e107-importer/trunk@11 svn://localhost:3690/e107-importer/tags/0.8 -m "Tag e107-importer 0.8"
$ svn copy svn://localhost:3690/e107-importer/trunk@12 svn://localhost:3690/e107-importer/tags/0.9 -m "Tag e107-importer 0.9"
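
Typing nine nearly identical commands is error-prone, so the list could also be generated from a small revision/version table. The sketch below only prints the commands (a dry run) and does not talk to any server. The table is copied from the commands above, while the variable names are my own.

```shell
# Revision/version pairs copied from the commands above.
tag_map='2 0.1
4 0.2
5 0.3
6 0.4
8 0.5
9 0.6
10 0.7
11 0.8
12 0.9'

repo="svn://localhost:3690/e107-importer"

# Dry run: print each command instead of executing it.
tag_commands=$(printf '%s\n' "$tag_map" | while read -r rev version; do
  echo "svn copy $repo/trunk@$rev $repo/tags/$version -m \"Tag e107-importer $version\""
done)
echo "$tag_commands"
```

Once the printed list looks right, it can be pasted into a terminal or piped to sh.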

Real life push to WordPress repository

Now that our commit simulation worked somehow, we can perform the real thing.

First, initialize a copy of the Git repository:

$ rm -rf e107-importer-git
$ git clone git@github.com:kdeldycke/e107-importer.git e107-importer-git

Let’s attach Subversion to Git:

$ cd e107-importer-git
$ git svn init --trunk=trunk --branches=branches --tags=tags http://plugins.svn.wordpress.org/e107-importer

Here you might want to do a git svn fetch as we did before. But this will take a while, especially on the WordPress plugin repository, as Git will browse all SVN revisions (more than 330,000 currently).

To speed things up, and following a tip from Nicolas Kuttler, we’ll search for the revision we’re interested in (the start of our plugin subfolder’s life), then fetch from there:

$ svn log --limit 1 http://plugins.svn.wordpress.org/e107-importer
------------------------------------------------------------------------
r333566 | plugin-master | 2011-01-17 17:09:40 +0100 (Mon, 17 Jan 2011) | 1 line

adding e107-importer by Coolkevman
------------------------------------------------------------------------
$ git svn fetch -r333566
r333566 = b850438a98c26a8f55ee2ddd7bdf8816d0390a1b (refs/remotes/trunk)
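
The revision number can also be extracted from the log output with sed rather than copied by hand. The sketch below parses the output shown above, stored in a variable so that no network access is needed. It relies on that output holding a single revision, the one that created the plugin folder, as its log message suggests.

```shell
# Output of "svn log --limit 1", as shown above.
svn_log_output='------------------------------------------------------------------------
r333566 | plugin-master | 2011-01-17 17:09:40 +0100 (Mon, 17 Jan 2011) | 1 line

adding e107-importer by Coolkevman
------------------------------------------------------------------------'

# Keep the first line that starts with "r<digits> |" and strip everything
# but the number.
start_rev=$(printf '%s\n' "$svn_log_output" | sed -n 's/^r\([0-9][0-9]*\) |.*/\1/p' | head -n 1)
echo "git svn fetch -r$start_rev"
```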

And now we can send our massive payload, after rebasing our master branch to SVN’s trunk:

$ git rebase trunk
$ git svn dcommit --username=Coolkevman

We can then contemplate our work in the official WordPress plugin repository.

There is one problem though: git-svn has left empty folders because of renaming. Let’s fix this:

$ svn rm http://plugins.svn.wordpress.org/e107-importer/trunk/bbcode -m "Git-svn doesn't delete empty folders on move." --username=Coolkevman

The last thing to do is to tag our old versions on Subversion, as we did in our simulation:

$ svn copy http://plugins.svn.wordpress.org/e107-importer/trunk@336229 http://plugins.svn.wordpress.org/e107-importer/tags/0.1 -m "Tag e107-importer 0.1"
$ svn copy http://plugins.svn.wordpress.org/e107-importer/trunk@336231 http://plugins.svn.wordpress.org/e107-importer/tags/0.2 -m "Tag e107-importer 0.2"
$ svn copy http://plugins.svn.wordpress.org/e107-importer/trunk@336232 http://plugins.svn.wordpress.org/e107-importer/tags/0.3 -m "Tag e107-importer 0.3"
$ svn copy http://plugins.svn.wordpress.org/e107-importer/trunk@336233 http://plugins.svn.wordpress.org/e107-importer/tags/0.4 -m "Tag e107-importer 0.4"
$ svn copy http://plugins.svn.wordpress.org/e107-importer/trunk@336235 http://plugins.svn.wordpress.org/e107-importer/tags/0.5 -m "Tag e107-importer 0.5"
$ svn copy http://plugins.svn.wordpress.org/e107-importer/trunk@336236 http://plugins.svn.wordpress.org/e107-importer/tags/0.6 -m "Tag e107-importer 0.6"
$ svn copy http://plugins.svn.wordpress.org/e107-importer/trunk@336237 http://plugins.svn.wordpress.org/e107-importer/tags/0.7 -m "Tag e107-importer 0.7"
$ svn copy http://plugins.svn.wordpress.org/e107-importer/trunk@336238 http://plugins.svn.wordpress.org/e107-importer/tags/0.8 -m "Tag e107-importer 0.8"
$ svn copy http://plugins.svn.wordpress.org/e107-importer/trunk@336239 http://plugins.svn.wordpress.org/e107-importer/tags/0.9 -m "Tag e107-importer 0.9"

But this means I had to clean up tags too, to remove the remaining empty folder.

Pushing new commits

All of the above only works with a newly created plugin structure on the WordPress plugin repository. What if we want to push new commits to Subversion once we’ve already pushed part of our Git history ?

First, let’s make our life miserable and delete all our local repositories:

$ cd ..
$ rm -rf e107-importer-git

Now, if we replay the steps above, the git rebase trunk command will end with loads of conflicts. The procedure is different this time and is explained by Ikke.

This involves Git’s graft:

$ git clone git@github.com:kdeldycke/e107-importer.git e107-importer-git
$ cd e107-importer-git
$ git svn init --trunk=trunk --branches=branches --tags=tags http://plugins.svn.wordpress.org/e107-importer
$ git svn fetch -r333566
$ git show-ref trunk
$ git log --pretty=oneline master | tail -n1
$ echo `git log --pretty=oneline master | tail -n1 | cut -d ' ' -f 1` `git show-ref trunk | cut -d ' ' -f 1` >> .git/info/grafts
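
The echo command above is dense, so here is a sketch of what it assembles, using canned text instead of real Git output. The three master hashes are placeholders. The trunk hash is the one printed earlier by git svn fetch and is reused here purely for illustration.

```shell
# Stand-in for "git log --pretty=oneline master" (placeholder hashes).
fake_master_log='cccccccccccccccccccccccccccccccccccccccc Latest commit
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb Second commit
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa Initial commit'
# Stand-in for "git show-ref trunk".
fake_show_ref='b850438a98c26a8f55ee2ddd7bdf8816d0390a1b refs/remotes/trunk'

# Oldest commit of master: last line of the log, first field.
root_commit=$(printf '%s\n' "$fake_master_log" | tail -n1 | cut -d ' ' -f 1)
# Commit the SVN trunk points to: first field of show-ref.
trunk_commit=$(printf '%s\n' "$fake_show_ref" | cut -d ' ' -f 1)

# A graft line reads "<commit> <parent>": it tells Git to pretend that the
# root of master has the SVN trunk commit as its parent.
graft_line="$root_commit $trunk_commit"
echo "$graft_line"
```

Recent Git versions deprecate the grafts file in favour of git replace --graft, which takes the same commit and parent arguments.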
$ git svn dcommit

The last command will not end well, with Git complaining about unmerged differences. This is likely due to my additional commit removing the empty folder left by git-svn. Fortunately Git suggests something in its log:

If you are attempting to commit  merges, try running:
  git rebase --interactive --preserve-merges  refs/remotes/trunk
Before dcommitting

Well, that’s exactly what I did:

$ git rebase --interactive --preserve-merges refs/remotes/trunk
$ git svn dcommit

And it magically fixed the issue ! :)

I’m quite happy now to have a clearly identified workflow to push my Git updates to Subversion ! :)

Moving a Git sub-tree to its own repository

Coming from Subversion (and with the Plone collective repository structure in mind), I’ve recently moved all my tiny software projects into a big standalone Git repository (named kev-code). Now that I figured out that GitHub allows you to create an unlimited number of repositories, as long as they are open-source public projects, it makes sense to emancipate some of my projects to their own repository. How do I move a sub-tree to its own repository ? That’s what I talk about in this article.

First, there is an automated way of performing this task with git-subtree. You should try it first. For some reason I didn’t investigate, git-subtree didn’t work for me. So I’ll now explain how I did it by hand.

The idea is to revisit the history of my bloated Git repository and massively delete everything that is not related to the sub-folder I’m looking to export. In this case, I try to make a dedicated repository for my e107 importer for WordPress.

Let’s start by getting a local copy of my source repository:

git clone git@github.com:kdeldycke/kev-code.git
cd kev-code

Then I’ll use the filter-branch action with a combination of find and rm to remove everything except the source code of my plugin:

git filter-branch --prune-empty --tree-filter 'find ./ -maxdepth 1 -not -path "./e107*" -and -not -path "./wordpress-e107*" -and -not -path "./.git" -and -not -path "./" -print -exec rm -rf "{}" \;' -- --all

Instead of the command above, I could have used the --subdirectory-filter option (as suggested by jamessan on Stack Overflow):

git filter-branch --prune-empty --subdirectory-filter e107-importer -- --all

But this doesn’t work in my case as my e107 Importer plugin didn’t start its life straight in a dedicated folder. So this command squashes some of the history I want to preserve.

At this point I’m left with this following history:

This looks pretty good, as all the history of my plugin is kept in order. But tags unrelated to my plugin are still there. Let’s remove them:

git tag -d coolkevmen-0.3 cool-blue-0.1 sapphire-0.1 sapphire-0.2 sapphire-0.3 sapphire-0.4

Now there are some commits polluting my history. These are leftovers of git-modules additions. I tried to remove them, but it didn’t work. Also left in the history are unwanted merges and empty commits from an old CVS import. To clean this up, I started an interactive rebase:

git rebase --interactive init

There, using my text editor, I deleted the entries corresponding to these unrelated commits (namely c21a840, 0dc1d76, 37473a8 and c6f9f64), and hoped Git would be smart enough to reconstruct a clean history.

Luckily, it worked for me. If Git complains about such abuse, you may ignore warnings and force it to continue:

git rebase --continue

Now that we only have a clean sub-tree, let’s create a dedicated local Git repository to receive our branch:

cd ..
mkdir e107-importer
cd e107-importer
git init

Add a temporary origin hooked on our source repository:

git remote add origin ../kev-code

And import the master branch we carefully crafted (including tags):

git pull --tags origin master

Now we can create on GitHub the new repository that will receive our exported project:

It’s time to push our changes. Let’s replace our temporary origin with the new GitHub repository we just created:

git remote rm origin
git remote add origin git@github.com:kdeldycke/e107-importer.git
git push origin master --force --tags

So now we have a copy of the sub-tree of my plugin in its own repository. That’s great, but there is still some stuff to clean up.

First, we will rewrite the repository to look as if the ./e107-importer sub-folder had been its project root since the beginning:

git filter-branch --tree-filter 'test -d ./e107-importer && mv ./e107-importer/* ./ || echo "No folder found"' -- --all

Then I altered some commit messages to fix inconsistencies due to sub-folder removal:

git filter-branch --msg-filter 'sed "s/Move the script to a dedicated folder/Rename script/g"' -- --all

Finally, at the bottom of the history, I still have my initial commit (a personal habit of mine when I initialize my Git repositories). But its date was updated by the first filter-branch call. Let’s set its date back to epoch:

git filter-branch --force --env-filter \
  'if [ $GIT_COMMIT = a2a5c05aed893fdd10250b724eb6a54bc6e7f122 ]
     then
       export GIT_AUTHOR_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
       export GIT_COMMITTER_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
   fi' -- --all

We can now send our latest changes to the remote GitHub repository by forcing a push:

git push --force

The last thing we have to do is to remove the plugin code from the fat source repository (I don’t like duplicates). But that’s another story for another article…