Two weeks ago I released WebPing. This article is more or less the same as the one I wrote four months ago, when I released the FTT project and needed to move it from SVN to Git. But this time I added more details on how I removed all the sensitive information that was hard-coded in the project files.
Subversion to Git migration
Everything starts from a local copy of the Subversion repository that had hosted the WebPing project since its inception:
$ rm -rf ./svn-repository-copy
$ tar xvzf ./svn-repository-copy.tar.gz
$ kill $(ps -ef | grep '[s]vnserve' | awk '{print $2}')
$ svnserve --daemon --listen-port 3690 --root ./svn-repository-copy
Let’s initialize a Git repository:
$ rm -rf ./webping-git
$ mkdir ./webping-git
$ cd ./webping-git
$ git init
$ git commit --allow-empty -m 'Initial commit'
$ git tag "init"
We now migrate the code from Subversion to Git:
$ git svn init --no-metadata --username deldycke svn://localhost:3690
$ git svn fetch
$ git rebase --onto git-svn master
$ rm -rf ./.git/svn/
$ rm -rf ./.git/refs/original/
$ git reflog expire --all
$ git gc --aggressive --prune
Hide and obfuscate hard-coded content
As WebPing was created for internal needs at my previous job, its original code base contains lots of references to the infrastructure it used to live in. My professional standards require me to remove all this sensitive information before making WebPing available to the public.
For example, here is the command that allowed me to remove all references to the hostnames of our intranets:
$ git filter-branch --force --prune-empty --tree-filter 'find . -type f -exec perl -i -pe "s/([\w-.]*?)\.(company(-intranet|-extension)?)\.(fr|com|net|org)/intranet\.example\.com/g" "{}" \;' -- --all
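Since filter-branch rewrites the whole history, it can be worth previewing the substitution on a throwaway file first. The following is only a sketch: the sample file and its content are made up, "company" is the placeholder name used in the command above, and the hyphen has been moved to the end of the character class, which matches the same characters.

```shell
# Hypothetical sample file standing in for a project source file.
sample=$(mktemp)
printf '%s\n' \
  'SERVER = "wiki.company-intranet.fr"' \
  'MIRROR = "build-01.company.com"' \
  'PUBLIC = "www.python.org"' > "$sample"

# Same substitution as in the tree filter, applied in place to the sample only.
perl -i -pe 's/([\w.-]*?)\.(company(-intranet|-extension)?)\.(fr|com|net|org)/intranet.example.com/g' "$sample"

# Inspect the result: the first two hostnames should be replaced,
# the unrelated public hostname should be left alone.
cat "$sample"
```

Once the output looks right, the same expression can be dropped into the tree filter.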
The Perl one-liner embedded in the command above applies the regular expression line by line only. If you want the regexp applied to the whole content of each file, you have to use Perl's slurp mode (source of that tip):
$ git filter-branch --force --prune-empty --tree-filter 'perl -0777 -i -pe "s/MAILING_LIST\s*=\s*\[(.*?)\]/MAILING_LIST = \[\]/gs" ./web-ping.py' -- --all
The specific example above helped me remove the content of the MAILING_LIST Python list found in web-ping.py, in order to protect from spam the email addresses of my former co-workers, which were unfortunately hard-coded in that variable.
Another place to hunt for sensitive information is commit messages. These can be easily modified thanks to the --msg-filter option. Here is how I removed references to our internal Trac tickets:
$ git filter-branch --force --msg-filter 'sed "s/ (see ticket:666)//g"' -- --all
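A message filter reads the commit message on standard input and writes the new one on standard output, so it can be tried outside of Git. The sketch below uses invented commit messages; the second expression is an assumed generalization to any ticket number, not something taken from the WebPing migration.

```shell
# Hypothetical commit message; 666 is the ticket number used in the command above.
message='Fix timeout handling (see ticket:666)'

# The exact filter from the command above: removes one specific ticket reference.
exact=$(printf '%s\n' "$message" | sed "s/ (see ticket:666)//g")

# Assumed generalization: remove a reference to any ticket number.
generic=$(printf '%s\n' 'Add retries (see ticket:42)' | sed -E 's/ \(see ticket:[0-9]+\)//g')

printf '%s\n%s\n' "$exact" "$generic"
```

Note that the parentheses are literal in the first expression (basic regular expressions) but need escaping in the second one because of `-E`.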
I also had to remove the carriage returns introduced by careless use of Windows text editors (remember, WebPing was born in a corporate environment):
$ git filter-branch --force --prune-empty --tree-filter 'perl -i -pe "s/\r//" ./*' -- --all
The last useful command I used was the following, which fixes the authors' names and emails:
$ git filter-branch --force --env-filter '
if [ "$GIT_AUTHOR_NAME" = "deldycke" ]
then
export GIT_AUTHOR_NAME="Kevin Deldycke"
export GIT_AUTHOR_EMAIL="[email protected]"
fi
if [ "$GIT_AUTHOR_NAME" = "diehr" ]
then
export GIT_AUTHOR_NAME="Matthieu Diehr"
export GIT_AUTHOR_EMAIL="[email protected]"
fi
' -- --all
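To know which identities need such a rewrite, the distinct authors of a repository can be listed first. The sketch below builds a throwaway repository for illustration; the `deldycke@localhost` address is a made-up placeholder, and only the last command is meant to be run in a real repository.

```shell
# Throwaway repository with a single commit authored under an SVN-style login.
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" -c user.name=deldycke -c user.email=deldycke@localhost \
  commit -q --allow-empty -m 'First commit'

# One line per distinct author identity found in the whole history.
authors=$(git -C "$repo" log --all --format='%an <%ae>' | sort -u)
printf '%s\n' "$authors"
```

Running the same listing again after the env filter shows whether any old login is left.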
By using a dozen variations of the commands above, and carefully reviewing the code, I was able to engineer a clean code history.
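Part of that review can be automated by searching every commit for strings that should no longer be there. This is a sketch on a throwaway repository: the file content and the "company-intranet" pattern are placeholders, and on a very large history the list of revisions may be too long for a single command line.

```shell
# Throwaway repository containing one leftover hostname.
repo=$(mktemp -d)
git -C "$repo" init -q
printf 'HOST = "wiki.company-intranet.fr"\n' > "$repo/web-ping.py"
git -C "$repo" add web-ping.py
git -C "$repo" -c user.name=demo -c user.email=demo@localhost commit -q -m 'Add config'

# Search every commit reachable from any ref, skipping binary files (-I).
# Each hit is printed as <commit>:<path>.
leaks=$(git -C "$repo" grep -I -l 'company-intranet' $(git -C "$repo" rev-list --all))
printf '%s\n' "$leaks"
```

An empty result for every sensitive pattern is a good sign that the rewritten history is clean.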
But I was certainly a little too blunt with these regular expressions: some of them also acted on binary content. As a result, I had to restore the static images from their original copies.
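One possible way to avoid this, sketched here and not what was done for WebPing, is to let the filter skip binary files. The example relies on `grep -I`, which treats files that look binary as if they contained no match; the two files are made up for the demonstration.

```shell
# Hypothetical working tree: one text file with CRLF line endings, one fake binary.
workdir=$(mktemp -d)
printf 'line one\r\nline two\r\n' > "$workdir/script.py"
printf '\211PNG\r\n\032\n\000\000' > "$workdir/logo.png"

# The first -exec succeeds only for text files, so perl never touches the binary.
find "$workdir" -type f -exec grep -Iq . {} \; -exec perl -i -pe 's/\r$//' {} \;

# The text file loses its carriage returns, the fake image keeps its 10 bytes.
wc -c "$workdir/script.py" "$workdir/logo.png"
```

The same `find` expression can be used as the body of a tree filter, with `.` in place of the temporary directory.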
Final steps
Now that your code is clean, all you need to do is recreate your tags and fix the date of the init tag before pushing everything to GitHub:
$ git tag -f "0.0" bad4ff7fc48b8b34f6f661d75c782c7fc0d098c5
$ git tag -f "0.1" 590ac9953df0e3bc76fd02615471e36a9796a065
$ git tag -f "0.2" 33f731054042b02c6d2600e7aead5bb7c4991b12
$ git filter-branch --force --env-filter '
if [ "$GIT_COMMIT" = 361224542bc73bba747c7ca382e992e2cdd0c356 ]
then
export GIT_AUTHOR_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
export GIT_COMMITTER_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
fi' -- --all
$ git remote add origin git@github.com:kdeldycke/webping.git
$ git push -u origin master
$ git push --tags