Last month I released the Feed Tracking Tool project (aka FTT) on GitHub, with a code history I had reconstructed from old tarballs. In the meantime, my friend at Uperto managed to recover the original Subversion repository from very old backups. Here is how I migrated that old SVN repository to GitHub.

First, I started a local Subversion server with the repository my co-worker gave me:

$ tar xvzf ./ftt-svn.tar.gz
$ sed -i 's/# password-db = passwd/password-db = passwd/' ./ftt-svn/conf/svnserve.conf
$ echo "kevin = kevin" >> ./ftt-svn/conf/passwd
$ pkill -x svnserve
$ svnserve --daemon --listen-port 3690 --root ./ftt-svn
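
The two configuration tweaks can be checked without a running server. Here is a minimal sketch that applies them to a stand-in copy of `conf/`. It assumes the stock `svnserve.conf` ships the commented-out `# password-db = passwd` line, and that `[users]` is the last section of `conf/passwd`, which is why appending a line is enough.

```shell
#!/bin/sh
# Sketch: replay the two svnserve config edits on a stand-in conf/ directory.
# Assumption: the stock svnserve.conf has "# password-db = passwd" commented
# out, and [users] is the last section of conf/passwd.
set -eu

repo=$(mktemp -d)
mkdir -p "$repo/conf"
cat > "$repo/conf/svnserve.conf" <<'EOF'
[general]
# anon-access = read
# auth-access = write
# password-db = passwd
EOF
printf '[users]\n' > "$repo/conf/passwd"

# Uncomment password-db so svnserve checks logins against conf/passwd (GNU sed).
sed -i 's/^# password-db = passwd/password-db = passwd/' "$repo/conf/svnserve.conf"
# Appending works because [users] is the last section of the file.
echo "kevin = kevin" >> "$repo/conf/passwd"

cat "$repo/conf/svnserve.conf" "$repo/conf/passwd"
```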

Then I created a local Git repository, using my initialization routine:

$ rm -rf ./ftt-git
$ mkdir ./ftt-git
$ cd ./ftt-git
$ git init
$ git commit --allow-empty -m 'Initial commit'
$ git tag "init"
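
If you run this routine often, it can live in a shell function. This is only a sketch: `init_repo` and the placeholder identity are my own inventions. The `symbolic-ref` call pins the branch to `master`, since newer Git versions may name the first branch after `init.defaultBranch`.

```shell
#!/bin/sh
# Sketch: the initialization routine as a reusable function.
# init_repo is a hypothetical helper; the identity below is a placeholder.
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

init_repo() {
  dir=$1
  rm -rf "$dir"
  mkdir -p "$dir"
  git -C "$dir" init -q
  # Pin the branch name: newer Git may default to "main".
  git -C "$dir" symbolic-ref HEAD refs/heads/master
  # An empty root commit gives later rebases a stable base to anchor on.
  git -C "$dir" commit -q --allow-empty -m 'Initial commit'
  git -C "$dir" tag init
}

repo="$(mktemp -d)/ftt-git"
init_repo "$repo"
git -C "$repo" log --oneline --decorate
```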

The next step consists of importing the Subversion repository into Git:

$ git svn init --no-metadata --username kevin svn://localhost:3690
$ git svn fetch
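
As an aside, `git svn` can map SVN logins to full Git identities at import time through its `--authors-file` option, which could make the authorship rewrite further down unnecessary. Here is a sketch of such a file. The e-mail addresses are placeholders, and the `fetch` call is left as a comment because it needs the running `svnserve`.

```shell
#!/bin/sh
# Sketch: an authors file for git svn, one "login = Name <email>" per line.
# The e-mail addresses are placeholders, not the real ones.
set -eu

authors="$(mktemp -d)/authors.txt"
cat > "$authors" <<'EOF'
kdeldycke = Kevin Deldycke <kevin@example.com>
qdesert = Quentin Desert <quentin@example.com>
EOF

# With svnserve running, the file would be passed at fetch time:
#   git svn fetch --authors-file="$authors"
# git svn stops if it meets an SVN login that is missing from the file.
cat "$authors"
```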

Then I grafted the imported git-svn history onto master, on top of the init commit:

$ git rebase --onto git-svn master
$ git rebase init master
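
To see what these two rebases do, here is a sketch on a throwaway repository, where a local branch named `svn-import` stands in for the `git-svn` remote branch. The first rebase has nothing to replay, so it simply moves `master` to the imported history; the second replays every imported commit on top of the empty `init` commit.

```shell
#!/bin/sh
# Sketch: the two rebases on a toy repository.
# Assumption: a local branch "svn-import" stands in for remotes/git-svn.
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m 'Initial commit'
git tag init

# Unrelated history, like the one "git svn fetch" produces.
git checkout -q --orphan svn-import
echo 'v1' > README
git add README
git commit -q -m 'r1'
echo 'v2' > README
git commit -q -am 'r2'
git checkout -q master

# Nothing to replay between master and itself: master now points at svn-import.
git rebase -q --onto svn-import master
# Replay the imported commits on top of the empty init commit.
git rebase -q init master

git log --oneline master
```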

At that point I no longer needed the remote git-svn branch, so I removed it:

$ git branch -r -D git-svn

To clean things up, let’s remove all SVN metadata and local commit backups:

$ rm -rf ./.git/svn/
$ rm -rf ./.git/refs/original/
$ git reflog expire --expire=now --all
$ git gc --aggressive --prune=now
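
A plain `git reflog expire --all` only drops entries older than the configured expiry (by default 90 days, or 30 days for entries no longer reachable from the branch tip), and `git gc --prune` keeps unreachable objects younger than two weeks, so freshly orphaned commits survive both. This sketch, on a made-up repository, shows that passing `now` to both commands removes a commit orphaned by `--amend` right away.

```shell
#!/bin/sh
# Sketch: "now" makes reflog expiry and pruning take effect immediately.
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'secret' > notes.txt
git add notes.txt
git commit -q -m 'first'
old=$(git rev-parse HEAD)
# After the amend, the old commit is only reachable through the reflog.
git commit -q --amend -m 'first, rewritten'
git cat-file -e "$old" && echo "before cleanup: $old still exists"

git reflog expire --expire=now --all
git gc -q --aggressive --prune=now

if git cat-file -e "$old" 2>/dev/null; then echo "still there"; else echo "gone"; fi
```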

We can now proceed to alter the code history. In FTT we never created branches, and I planned to recreate tags by hand later, so I decided to remove all the tags and branches directories coming from Subversion:

$ git filter-branch --force --prune-empty --tree-filter 'rm -rf ./tags*'     -- --all
$ git filter-branch --force --prune-empty --tree-filter 'rm -rf ./branches*' -- --all
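
Each `filter-branch` run rewrites the whole history, so the two commands can also be folded into a single pass. Here is a sketch on a toy repository mimicking an SVN layout: the second commit, which only adds a tag copy, ends up empty and `--prune-empty` drops it. Setting `FILTER_BRANCH_SQUELCH_WARNING` only silences the deprecation notice (and ten-second pause) that recent Git versions add.

```shell
#!/bin/sh
# Sketch: drop SVN tags/ and branches/ directories in one filter-branch pass.
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
mkdir -p trunk tags/0.1
echo 'code' > trunk/app.rb
git add trunk
git commit -q -m 'r1: add trunk'
# An SVN "tag" is a copy; once tags/ is filtered out, this commit is empty.
cp trunk/app.rb tags/0.1/app.rb
git add tags
git commit -q -m 'r2: tag 0.1'

# One rewrite instead of two: both globs go to the same rm.
git filter-branch --force --prune-empty \
  --tree-filter 'rm -rf ./tags* ./branches*' -- --all

git log --oneline --name-only
```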

Now let’s move the trunk directory to the root of the repository. I didn’t use the --subdirectory-filter parameter, as FTT started its life without a proper “branches/tags/trunk” SVN structure:

$ git filter-branch --force --prune-empty --tree-filter 'test -d ./trunk && cp -axv ./trunk/* ./ && rm -rf ./trunk || echo "No trunk folder found"' -- --all
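
Since a tree filter is just a shell command run inside a checkout of each commit, its logic can be dry-run on a scratch directory first. This sketch does so with invented files. It copies `./trunk/.` rather than `./trunk/*`, because the `*` glob skips hidden files such as the hypothetical `.htaccess` below.

```shell
#!/bin/sh
# Sketch: dry-run of the "move trunk to the root" tree filter on scratch files.
# The file names are invented for the demo.
set -eu

scratch=$(mktemp -d)
cd "$scratch"
mkdir -p trunk/lib
echo 'puts "hi"' > trunk/lib/app.rb
echo 'Deny from all' > trunk/.htaccess

# Copy trunk's content, hidden files included, then drop trunk itself.
if [ -d ./trunk ]; then
  cp -a ./trunk/. ./ && rm -rf ./trunk
else
  echo "No trunk folder found"
fi

find . | sort
```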

Next is the Git command I used to fix commit authorship:

$ git filter-branch --force --env-filter '
    if [ "$GIT_AUTHOR_NAME" = "kdeldycke" ]
      then
        export GIT_AUTHOR_NAME="Kevin Deldycke"
        export GIT_AUTHOR_EMAIL="[email protected]"
    fi
    if [ "$GIT_AUTHOR_NAME" = "qdesert" ]
      then
        export GIT_AUTHOR_NAME="Quentin Desert"
        export GIT_AUTHOR_EMAIL="[email protected]"
    fi
    ' -- --all
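
The filter above only touches the author fields; after the rebases, the committer fields hold whoever ran `git rebase`. If you would rather have the committer match the author, here is a sketch that sets both from a single `case`. The e-mail addresses are placeholders and the repository is a toy one.

```shell
#!/bin/sh
# Sketch: rewrite author AND committer per SVN login with --env-filter.
# E-mail addresses are placeholders.
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
echo 'code' > app.rb
git add app.rb
# Simulate an imported commit whose author is a raw SVN login.
GIT_AUTHOR_NAME=kdeldycke GIT_AUTHOR_EMAIL=kdeldycke@svn git commit -q -m 'r1'

git filter-branch --force --env-filter '
  case "$GIT_AUTHOR_NAME" in
    kdeldycke) name="Kevin Deldycke"; email="kevin@example.com" ;;
    qdesert)   name="Quentin Desert"; email="quentin@example.com" ;;
    *)         name=""; email="" ;;
  esac
  if [ -n "$name" ]; then
    export GIT_AUTHOR_NAME="$name" GIT_AUTHOR_EMAIL="$email"
    export GIT_COMMITTER_NAME="$name" GIT_COMMITTER_EMAIL="$email"
  fi
' -- --all

git log --format='%an <%ae> / %cn <%ce>'
```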

While exploring my own backups of the FTT project, I stumbled upon a preliminary HTML mockup of the app. I decided to include it in the final repository as the first commit, just after my init tag. Here is how I did it, assuming the mockup sources were available in the ../mockup directory:

$ git branch mockup-injection init
$ git checkout mockup-injection
$ cp -axv ../mockup .
$ git add --all
$ git commit --all --date="2007-07-17 15:49" --author="Quentin Desert <[email protected]>" -m "Commit the oldest mockup I can find."
$ git rebase --onto mockup-injection init master
$ git branch -D mockup-injection
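
Here is the same injection replayed on a toy repository, as a sketch: a temporary directory with an invented `index.html` stands in for `../mockup`, and the e-mail address is a placeholder. After the final rebase, the mockup commit sits between `init` and the first imported revision.

```shell
#!/bin/sh
# Sketch: inject a commit right after the "init" tag on a toy repository.
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m 'Initial commit'
git tag init
echo 'imported' > app.rb
git add app.rb
git commit -q -m 'First SVN revision'

# Stand-in for ../mockup (invented content).
mockup_src=$(mktemp -d)
echo '<html></html>' > "$mockup_src/index.html"

git checkout -q -b mockup-injection init
cp -a "$mockup_src" ./mockup
git add --all
git commit -q --date="2007-07-17 15:49" \
  --author="Quentin Desert <quentin@example.com>" \
  -m "Commit the oldest mockup I can find."
# Replay everything after init on top of the mockup commit.
git rebase -q --onto mockup-injection init master
git branch -q -D mockup-injection

git log --reverse --format=%s master
```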

The procedure above comes from my “Commit history reconstruction” article.

Then I could tag all FTT releases by hand:

$ git tag -f "0.4.1"  5f5cc2a36743f2c8d2088669e475ef09d8cec029
$ git tag -f "0.5"    54a76e143f9f2efdec88d3181cbcfbfddda5f725
$ git tag -f "0.6"    934447f185330903c389364bed94e994f6b280e6
$ git tag -f "0.7"    ef87ab3287ba23655781565fd622345c942d9c49
$ git tag -f "0.8"    cdcf2f459826019bbbc5874d6632392b07ea889b
$ git tag -f "0.8.1"  f47a3f219eb918069efe701d082928cdb953f05f
$ git tag -f "0.8.2"  2542754dd088d359ce96db8511e0a15588eb50ce
$ git tag -f "0.8.3"  ea9455c0ed75cf504c1cc872d5e5946b578ae702
$ git tag -f "0.9.0"  57a39879b3bcc61bd9560d7ac4e71cbfd0af22df
$ git tag -f "0.9.1"  e483fd1a287fa86a8b12d088b78a319b0990e6ef
$ git tag -f "0.10.0" ed77af77506836892be78044ae4ef15d07f18583
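
With a dozen releases, the version-to-commit map can also be kept as data and applied in a loop. Here is a sketch on a toy repository, whose two hashes stand in for the real ones above.

```shell
#!/bin/sh
# Sketch: apply "tag commit" pairs from a here-document in a loop.
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m 'release A'
a=$(git rev-parse HEAD)
git commit -q --allow-empty -m 'release B'
b=$(git rev-parse HEAD)

# One "tag commit" pair per line; the hashes come from the toy repository.
while read -r tag commit; do
  git tag -f "$tag" "$commit"
done <<EOF
0.4.1 $a
0.5   $b
EOF

git tag -l
```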

FTT was always developed as an internal app. As such, the code and its history contained lots of sensitive information. I thoroughly audited the code to identify the kind of data we must not disclose to the outside world.

At the end of this code review, I had only found references to our internal architecture (server names and IP addresses) and some usernames and passwords. There were also some logs and temporary files. I cleaned them all with the following set of Git commands:

$ git filter-branch --force --prune-empty --tree-filter 'find . -iname ".svn"        | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -iname "*.log"       | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -iname "*~"          | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -iname "*.pid"       | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -iname "*.ppid"      | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -iname "ruby_sess.*" | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -type f -exec sed -i "s/password: 1234567/password: *******/g"   "{}" \;' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -type f -exec sed -i "s/smtp\.server12\.com/smtp\.uperto\.com/g" "{}" \;' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -type f -exec sed -i "s/192\.168\.0\.2/12\.34\.56\.78/g"         "{}" \;' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -type f -exec sed -i "s/user qdesert/user *******/g"             "{}" \;' -- --all
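
Because each `--tree-filter` pass rewrites the whole history, folding these ten rules into one script should cost a single pass. This sketch dry-runs such a combined filter on a scratch directory with invented files. It also swaps `| xargs rm -rf` for `-exec rm -rf {} +`, which copes with file names containing spaces, and lets one `sed` call apply all four substitutions.

```shell
#!/bin/sh
# Sketch: the cleanup rules as one tree-filter script, dry-run on scratch files.
# The file layout is invented; patterns and substitutions mirror the commands above.
set -eu

scratch=$(mktemp -d)
cd "$scratch"
mkdir -p app/.svn log
echo 'entries' > app/.svn/entries
echo 'boot' > log/server.log
echo 'draft' > app/notes.txt~
echo '4242' > app/ruby.pid
printf 'smtp: smtp.server12.com\nhost: 192.168.0.2\npassword: 1234567\n' > app/config.yml

# -prune keeps find from descending into a .svn directory slated for removal.
find . -iname '.svn' -prune -exec rm -rf {} +
find . \( -iname '*.log' -o -iname '*~' -o -iname '*.pid' -o -iname '*.ppid' \
  -o -iname 'ruby_sess.*' \) -exec rm -rf {} +
# Redact in place (GNU sed -i), batching many files per sed call.
find . -type f -exec sed -i \
  -e 's/password: 1234567/password: *******/g' \
  -e 's/smtp\.server12\.com/smtp.uperto.com/g' \
  -e 's/192\.168\.0\.2/12.34.56.78/g' \
  -e 's/user qdesert/user *******/g' {} +

find . -type f | sort
cat app/config.yml
```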

After all these modifications, I was pretty sure my code was ready to be published. But better safe than sorry: I spent a couple of minutes on a second deep code review to check that I hadn’t missed anything. And to push the review process even further, I offer a beer at the local bar to anyone who finds sensitive information in FTT’s code base! :)
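
Part of such a second review can be automated by searching every commit for the strings that were supposed to disappear. Here is a sketch; `audit_history` is a hypothetical helper built on `git rev-list --all` and `git grep`, and the toy repository deliberately keeps one leaky revision so the output is not empty.

```shell
#!/bin/sh
# Sketch: grep every commit of every ref for strings that should be gone.
# audit_history is a hypothetical helper; the repository content is invented.
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
printf 'host: 192.168.0.2\n' > config.yml
git add config.yml
git commit -q -m 'leaky'
printf 'host: 12.34.56.78\n' > config.yml
git commit -q -am 'scrubbed'

audit_history() {
  # Prints "<commit>:<path>:<line>:<text>" for each remaining match.
  for needle in "$@"; do
    git rev-list --all | while read -r rev; do
      # git grep exits 1 when a revision has no match, hence "|| true".
      git grep -n -I -F -e "$needle" "$rev" || true
    done
  done
}

audit_history '192.168.0.2' '1234567'
```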

The last thing I did was delete the old FTT GitHub repository and recreate it. Then I fixed the date of my first commit, cleaned Git’s local backups and pushed my carefully crafted repository to its new home on GitHub:

$ export GIT_TMP_INIT_HASH=$(git rev-parse "init^{commit}")
$ git filter-branch --force --env-filter '
    if [ "$GIT_COMMIT" = "$GIT_TMP_INIT_HASH" ]
      then
        export GIT_AUTHOR_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
        export GIT_COMMITTER_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
    fi' -- --all
$ unset GIT_TMP_INIT_HASH
$ rm -rf ./.git/refs/original/
$ git reflog expire --expire=now --all
$ git gc --aggressive --prune=now
$ git remote add origin git@github.com:kdeldycke/feed-tracking-tool.git
$ git push origin master --force --tags
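
Looking up the `init` hash with `git show-ref init` can return more than one line: it matches every ref whose name ends in `/init`, including a `refs/original/refs/tags/init` backup if an earlier `filter-branch` run rewrote the tag. `git rev-parse "init^{commit}"` always resolves to a single commit. This sketch fakes such a backup ref with `update-ref` to show the difference.

```shell
#!/bin/sh
# Sketch: show-ref tail-matching versus rev-parse on a toy repository.
set -eu
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git commit -q --allow-empty -m 'Initial commit'
git tag init
# Fake the backup ref a previous filter-branch run could leave behind.
git update-ref refs/original/refs/tags/init HEAD

git show-ref init               # two lines: the backup and the tag
git rev-parse 'init^{commit}'   # exactly one commit hash
```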