Subversion commits and mail activity stream in iCalendar

Last week I consolidated all my code in my GitHub repository. While doing so, I stumbled upon an old script I hadn’t publicized yet: svn2ical.py.

This is a simple hack that extracts commit metadata from a Subversion repository and generates an iCalendar file containing all the commits of a given author. I used it back then to visualize my commit activity in a calendar. Nowadays this script is mostly redundant, as services like Ohloh and GitHub provide great timelines and activity streams. But it can still be useful for private repositories.
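To give an idea of what such a conversion involves, here is a minimal sketch. It is not the actual svn2ical.py: the function names, the sample log and the UID/PRODID values are all made up for illustration, and it assumes the input has the shape produced by `svn log --xml`. It does not fold long lines, which a stricter iCalendar writer should do.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Made-up sample in the shape produced by `svn log --xml`.
SAMPLE_LOG = """<log>
<logentry revision="2">
<author>kevin</author>
<date>2006-10-29T22:13:00.000000Z</date>
<msg>Delete previous backups if nothing has changed.</msg>
</logentry>
<logentry revision="1">
<author>someone-else</author>
<date>2006-04-30T21:17:00.000000Z</date>
<msg>Unrelated commit.</msg>
</logentry>
</log>"""


def escape_text(value):
    """Escape the characters that are special in iCalendar TEXT values."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))


def commits_to_ical(svn_log_xml, author):
    """Return an iCalendar string with one event per commit by `author`."""
    now = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0",
             "PRODID:-//example//svn2ical-sketch//EN"]  # hypothetical PRODID
    for entry in ET.fromstring(svn_log_xml).iter("logentry"):
        if entry.findtext("author") != author:
            continue
        # Subversion reports UTC dates; keep the seconds-precision part only.
        stamp = datetime.strptime(entry.findtext("date")[:19],
                                  "%Y-%m-%dT%H:%M:%S")
        message = (entry.findtext("msg") or "").strip()
        lines += [
            "BEGIN:VEVENT",
            f"UID:r{entry.get('revision')}@svn2ical-sketch.example",
            f"DTSTAMP:{now}",
            f"DTSTART:{stamp.strftime('%Y%m%dT%H%M%SZ')}",
            f"SUMMARY:{escape_text(message)}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # iCalendar content lines are terminated by CRLF.
    return "\r\n".join(lines) + "\r\n"


ics = commits_to_ical(SAMPLE_LOG, "kevin")
```

The resulting string can be written to a .ics file and imported into most calendar applications.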

In the same spirit, I also uncovered maildir2ical.py, a script that looks in a maildir folder for mails sent by a particular author, then generates an iCal file based on the mail dates.
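The mail-scanning half of that idea can be sketched with Python’s standard mailbox module. Again, this is not the original maildir2ical.py: the function name and its parameters are hypothetical, and the matching rule (exact comparison on the From address) is an assumption.

```python
import mailbox
from datetime import timezone
from email.utils import parseaddr, parsedate_to_datetime


def sent_dates(maildir_path, author_email):
    """Return the sorted Date headers of messages sent by `author_email`."""
    dates = []
    box = mailbox.Maildir(maildir_path, create=False)
    for message in box:
        # parseaddr splits "Name <address>" into (name, address).
        _, address = parseaddr(message.get("From", ""))
        if address.lower() != author_email.lower():
            continue
        try:
            sent = parsedate_to_datetime(message["Date"])
        except (TypeError, ValueError):
            continue  # missing or malformed Date header
        if sent.tzinfo is None:
            # Treat dates without a usable offset as UTC so they can be sorted
            # together with timezone-aware dates.
            sent = sent.replace(tzinfo=timezone.utc)
        dates.append(sent)
    return sorted(dates)
```

Each returned date could then become the start of a calendar event, in the same way as the commit dates of the Subversion script.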

Commit history reconstruction with Git

Here is something I have wanted to do for 3 years. I wanted to migrate my code repository from this:

to a proper revision control system, like Subversion. And I wanted to reconstruct the commit history with all the proper dates. That’s something I can’t do with SVN.

Then came Git. I knew that Git was powerful enough to let me manipulate the history (at my own risk). So I studied it over the last few weeks until I found an acceptable way to do exactly what I had in mind. Here are my notes regarding this journey.

First, I need to get a local copy of my GitHub repository. That’s the place where I want all my code to reside at the end of the process.

cd ~
git clone git@github.com:kdeldycke/kev-code.git

In gitg, my untouched repository looks like this:

Notice all the pre-existing code.

Let’s create a history-injection branch from the init tag. The latter is the root of my repository, as explained in my previous post on how I initialize my Git repositories.

git branch history-injection init

Then switch to our brand new branch:

git checkout history-injection

We are now in a safe and contained environment in which we can do all our dirty stuff. Let’s copy the oldest snapshot of the file we want to add over its final name in the repository:

cp ~/kev-code/website-backup-2006_04_30.py ~/kev-code/website-backup.py

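Commit this new file locally, as usual, but with a date set in the past (note that --date overrides the author date only; the committer date remains the time at which the command is run):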

cd ~/kev-code
git add --all
git commit --all --date="2006-04-30 23:17" -m "First version of a script to backup several remote websites via FTP and make bzip2 archives."

I can repeat the last steps to reconstruct the commit history of my website-backup.py script:

cp ~/kev-code/website-backup-2006_10_29.py ~/kev-code/website-backup.py
git commit --all --date="2006-10-29 23:13" -m "Delete previous backups if nothing has changed."
cp ~/kev-code/website-backup-2006_11_01.py ~/kev-code/website-backup.py
git commit --all --date="2006-11-01 23:14" -m "Keep monthly bzip2 snapshots of backups and incremental backups of the last 32 days thanks to rdiff-backup."
(...)
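Since the snapshots carry their date in their file name, the repetitive part can be scripted. The following is only a sketch under a few assumptions: the helper names are made up, the commit message is a placeholder (the real messages above were written by hand), and the time of day is arbitrarily set to 12:00 because the file names only encode the day. It only builds the list of commands; running them is left to the reader.

```python
import re
from datetime import datetime

# Matches names like website-backup-2006_04_30.py
SNAPSHOT = re.compile(r"^(?P<name>.+)-(?P<date>\d{4}_\d{2}_\d{2})(?P<ext>\.\w+)$")


def plan_commits(snapshots):
    """Return (day, snapshot, target) tuples, oldest snapshot first."""
    plan = []
    for filename in snapshots:
        match = SNAPSHOT.match(filename)
        if not match:
            continue  # not a dated snapshot, ignore it
        day = datetime.strptime(match["date"], "%Y_%m_%d").date()
        plan.append((day, filename, match["name"] + match["ext"]))
    return sorted(plan)


def git_commands(plan, message="Import snapshot"):
    """Yield the argument lists of the commands to run for each snapshot."""
    for day, snapshot, target in plan:
        yield ["cp", snapshot, target]
        yield ["git", "add", target]
        # 12:00 is a placeholder time: the file name only encodes the day.
        yield ["git", "commit", f"--date={day.isoformat()} 12:00",
               "-m", f"{message} {snapshot}"]


plan = plan_commits([
    "website-backup-2006_11_01.py",
    "website-backup-2006_04_30.py",
    "website-backup-2006_10_29.py",
])
commands = list(git_commands(plan))
```

Each item of `commands` can be passed to `subprocess.run(..., check=True)` from within the repository once the plan has been reviewed.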

At last, the history-injection branch contains all the versions of website-backup.py:

Now I’ll use the rebase command to insert the history-injection branch back into the main line (aka master). This insertion will take place just after the init tag. This translates to the following Git command:

git rebase --preserve-merges --onto history-injection init master

The --preserve-merges option is really important here to keep Git from taking too much initiative. Without this option, all our branches between the init tag and the head of the master branch will be rebased. Believe me, that’s not what we want. (Recent Git releases have dropped --preserve-merges in favor of --rebase-merges.)

I no longer need my temporary history-injection branch. Let’s remove it:

git branch -D history-injection

Now you should have a single, straight line of history from the init tag to the head of master. Like this:

Commits appear to be ordered as they should, but you may not be as lucky as I was. In fact, the freshly injected commits are stuck at the “bottom” (just after the init tag, as we asked Git to do on rebase), so you may find yourself in a situation where the commits of the whole master branch are not chronologically ordered.

Here is such an example. It happened when I tried to rebase the full history of my system-backup.py script:

I haven’t found a way to tell Git how to rebase by following commit dates. I know that something can be done with a command like:

git rebase --interactive init

But I haven’t succeeded yet. So I left these commits unsorted for now. I may write another blog post in the future if I find a way to cleanly sort them. In the meantime, if you have a solution, I’ll be happy to hear it!
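One possible starting point, which I have not validated on a real repository, is to generate the todo list of the interactive rebase already sorted by date. The sketch below assumes its input comes from `git log --reverse --format="%H %at %s" init..master` (full hash, author date as a UNIX timestamp, subject); the function name is made up. Keep in mind that reordering commits that touch the same files can produce conflicts during the rebase.

```python
def sorted_todo(log_output):
    """Build interactive-rebase "pick" lines ordered by author timestamp."""
    commits = []
    for line in log_output.splitlines():
        if not line.strip():
            continue
        parts = line.split(" ", 2)
        sha, timestamp = parts[0], int(parts[1])
        subject = parts[2] if len(parts) > 2 else ""
        commits.append((timestamp, sha, subject))
    # sort() is stable, so commits with equal dates keep their input order.
    commits.sort(key=lambda commit: commit[0])
    return [f"pick {sha} {subject}".rstrip() for _, sha, subject in commits]


# Made-up hashes and timestamps, for illustration only.
example_log = "bbb2222 200 Second change\naaa1111 100 First change\n"
todo = sorted_todo(example_log)
```

The resulting lines would replace the content of the file that `git rebase --interactive init` opens in the editor.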

Finally, when we have something that looks good, we can push our changes to our remote GitHub repository:

git push origin

But Git will complain: changing already-pushed commits is bad. As I explained several weeks ago, it’s dangerous but I don’t care. I’m the only user of this repository. So let’s bypass Git’s wise warnings:

git push origin +master:master

Et voilà! By repeating these steps several times, I moved my code to GitHub, with a consistent and clean commit history.

How to fix bad commit authorship in Git

Several months ago I committed some code in my GitHub repository, but I did it from a temporary system. Although I had registered my authentication keys correctly, I forgot to create a minimal ~/.gitconfig file with the right settings in it.

The result did not look good, as my usual name and email address were not attached to the commit:

Let’s fix this !

First, get a local copy of the remote Git repository:

git clone git@github.com:kdeldycke/kev-code.git

What was missing in my ~/.gitconfig file were the following options:

[user]
name = Kevin Deldycke
email = kevin@deldycke.com

These values can also be passed on the Git command line, with the following syntax:

--author 'user.name <user.email>'

The commit I want to change is the latest in the history, so I’ll use the --amend option to make my changes. Putting it all together, our final command becomes:

git commit --amend --author 'Kevin Deldycke <kevin@deldycke.com>'

After this, here is what the local branches look like in gitg:

Using the git log -n1 command, we can compare the old commit:

commit 81a26f03901918ed4a954d964b2659187f1cc988
Author: kevin <kevin@laptop-kev.(none)>
Date:   Mon Mar 8 22:49:43 2010 +0100

    Update old shop logo with the brand new one

with the new one:

commit adf4620f3d8a89746dd643dcefc3f900f0f69878
Author: Kevin Deldycke <kevin@deldycke.com>
Date:   Mon Mar 8 22:49:43 2010 +0100

    Update old shop logo with the brand new one

Notice the fixed authorship. The commit ID also changed, since it is a hash computed over the commit’s content and metadata, author line included.
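To see why the ID has to change, here is a small sketch of how Git derives a commit ID: it is the SHA-1 of a "commit <size>" header, a NUL byte, and the text of the commit object. The tree ID and the committer line below are placeholders that are not taken from my repository (the tree is Git’s well-known empty tree), so the printed IDs will not match the ones above. The point is only that changing the author line yields a different hash.

```python
import hashlib


def commit_id(tree, author, committer, message, parents=()):
    """Compute the SHA-1 ID Git would give to a commit object with these fields."""
    lines = [f"tree {tree}"]
    lines += [f"parent {parent}" for parent in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    body = "\n".join(lines).encode("utf-8")
    # Object header: type, space, size of the body in bytes, NUL byte.
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# Placeholder values: empty tree ID and an assumed committer line.
TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
WHEN = "1268084983 +0100"  # Mon Mar 8 22:49:43 2010 +0100
COMMITTER = f"Kevin Deldycke <kevin@deldycke.com> {WHEN}"
MESSAGE = "Update old shop logo with the brand new one\n"

old_id = commit_id(TREE, f"kevin <kevin@laptop-kev.(none)> {WHEN}", COMMITTER, MESSAGE)
new_id = commit_id(TREE, f"Kevin Deldycke <kevin@deldycke.com> {WHEN}", COMMITTER, MESSAGE)
print(old_id)
print(new_id)
```

The two printed IDs differ even though the tree, the dates and the message are identical.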

Now we can push our changes back to the remote repository:

git push origin

But this doesn’t work and throws the following error:

To git@github.com:kdeldycke/kev-code.git
 ! [rejected]        master -> master (non-fast forward)
error: failed to push some refs to 'git@github.com:kdeldycke/kev-code.git'

This is Git’s protection mechanism in action. Modifying already-published commits like this is a bad idea. It can break the updates of other developers’ repositories (if they have already pulled the commit we’re trying to change).
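The "non-fast forward" wording has a precise meaning: a push is a fast-forward only when the commit currently at the tip of the remote branch is an ancestor of the commit being pushed. The toy model below illustrates this with a plain dictionary mapping each commit to its parents. The commit named "previous" is hypothetical; the two other IDs are the abbreviated old and amended commits from above.

```python
def is_ancestor(parents, ancestor, commit):
    """Return True if `ancestor` is reachable from `commit` via parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return False


def is_fast_forward(parents, remote_tip, local_tip):
    """A push is a fast-forward when the remote tip is part of the local history."""
    return is_ancestor(parents, remote_tip, local_tip)


# Both the old and the amended commit share the same (hypothetical) parent.
history = {
    "previous": ["init"],
    "81a26f0": ["previous"],  # commit still on GitHub
    "adf4620": ["previous"],  # amended commit, local only
}

amend_is_ff = is_fast_forward(history, "81a26f0", "adf4620")
normal_is_ff = is_fast_forward(history, "previous", "adf4620")
```

Here `amend_is_ff` is False, because the amended commit replaces the old one instead of building on top of it, while `normal_is_ff` is True.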

In our case we will force the remote repository to take our changes:

git push origin +master:master

As I told you before, this is bad, but nobody really cares: I’m the only person working on this repository! ;)

Finally, you can contemplate the result on GitHub, a clean and tidy commit history: