I’m a Top 1% Open-Source Developer! (and that’s a lie)

With all the recent active development on my e107 Importer WordPress plugin, I increased my public code contributions. This had the nice side effect of bumping my ranking on Ohloh from #8 (bronze level) to #9 (silver level):

Another interesting statistic is that I’m now ranked as open-source developer number 5673 out of a population of 438276, which places me in the top 1.3% of the population! :D
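As a quick sanity check of that percentage, the ratio can be recomputed in the shell. This is only a sketch: the variable names are mine, the two numbers are the ones quoted above.

```shell
# Rank and population as quoted from my Ohloh profile.
rank=5673
population=438276

# Percentage of developers ranked at or above my position, to one decimal.
share=$(awk -v r="$rank" -v p="$population" 'BEGIN { printf "%.1f", r / p * 100 }')
echo "Top ${share}% of the population"
```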

That’s exciting, but irrelevant. Ohloh stats must be taken with a huge grain of salt.

  1. Some code is counted twice: my WordPress plugin lives both in my personal GitHub repository and in the official WordPress plugin repository, doubling my apparent contributions to this project.
  2. Not all open-source projects are tracked by Ohloh, which makes developers registered on Ohloh seem to be part of a smaller community.
  3. Not all developers on Ohloh have aggregated their repository accounts under one identity, making those who have done so look like bigger contributors than the others. This also artificially inflates the global population.
  4. Open-source contributions do not necessarily live in code repositories. Think about project promotion, maintenance of forges and websites, documentation, bug reports, testing, benchmarking, support (in mailing-lists, IRC and forums), …
  5. And most importantly, the best contributions are not always tied to high commit activity or to the number of lines of code added. Think about removing old/dead/legacy code and refactoring. These may be the best code contributions a project will ever see.

That’s why Ohloh stats must not be taken at face value. But this doesn’t remove the fun you can get from them. Especially when they put me in a favorable light! ;)

Commit history reconstruction with Git

Here is something I have wanted to do for 3 years. I wanted to migrate my code repository from this:

to a proper revision control system, like Subversion. And I wanted to reconstruct the commit history with all the proper dates. That’s something I can’t do with SVN.

Then came Git. I knew that Git was powerful enough to let me manipulate the history (at my own risk). So I studied it over the last few weeks until I found an acceptable way to do exactly what I had in mind. Here are my notes regarding this journey.

First, I need to get a local copy of my GitHub repository. That’s the place where I want all my code to reside at the end of the process.

cd ~
git clone git@github.com:kdeldycke/kev-code.git

In gitg, my untouched repository looks like this:

Notice all the pre-existing code.

Let’s create a history-injection branch from the init tag. The latter is the root of my repository, as explained in my previous post on how I initialize my Git repositories.

git branch history-injection init

Then switch to our brand new branch:

git checkout history-injection

We are now in a safe and contained environment in which we can do all our dirty stuff. Let’s copy the file we want to add to our repository under its final name:

cp ~/kev-code/website-backup-2006_04_30.py ~/kev-code/website-backup.py

Commit this new file locally, as usual, but with a commit date set in the past:

cd ~/kev-code
git add --all
git commit --all --date="2006-04-30 23:17" -m "First version of a script to backup several remote websites via FTP and make bzip2 archives."
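One detail worth knowing: Git stores two timestamps per commit, and --date only sets the author date. The committer date stays at the time you run the command unless the GIT_COMMITTER_DATE environment variable is set too. Here is a minimal sketch in a throwaway repository; the identity, the file content and the explicit +0000 timezone are assumptions made for the demo, not what I used on my real repository.

```shell
# Throwaway sandbox, so nothing here touches a real repository.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

echo "# placeholder content" > website-backup.py
git add website-backup.py

# --date sets the author date; GIT_COMMITTER_DATE sets the committer date.
old_date="2006-04-30 23:17:00 +0000"
GIT_COMMITTER_DATE="$old_date" git commit --quiet --date="$old_date" \
    -m "First version of a script to backup several remote websites."

# %ad is the author date, %cd the committer date.
dates=$(git log -1 --format='%ad | %cd' --date=short)
echo "$dates"
```

Whether the committer date matters depends on the tool reading the history: some interfaces display one, some the other.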

I can repeat the last steps to reconstruct the commit history of my website-backup.py script:

cp ~/kev-code/website-backup-2006_10_29.py ~/kev-code/website-backup.py
git commit --all --date="2006-10-29 23:13" -m "Delete previous backups if nothing has changed."
cp ~/kev-code/website-backup-2006_11_01.py ~/kev-code/website-backup.py
git commit --all --date="2006-11-01 23:14" -m "Keep monthly bzip2 snapshots of backups and incremental backups of the last 32 days thanks to rdiff-backup."
(...)
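When there are many dated snapshots, the copy-and-commit cycle can be scripted. The following is only a sketch under assumptions: the snapshots sit in a separate folder, the date is derived from the file name, and the time of day and commit message are placeholders (in my case I typed the real ones by hand).

```shell
# Sandbox: a scratch repository plus a folder of hypothetical dated snapshots.
work=$(mktemp -d)
mkdir "$work/snapshots" "$work/repo"
echo "version 1" > "$work/snapshots/website-backup-2006_04_30.py"
echo "version 2" > "$work/snapshots/website-backup-2006_10_29.py"
echo "version 3" > "$work/snapshots/website-backup-2006_11_01.py"

cd "$work/repo"
git init --quiet
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

# The glob expands in alphabetical order, which is also chronological here.
for snapshot in "$work"/snapshots/website-backup-*.py; do
    stamp=$(basename "$snapshot" .py)            # website-backup-2006_04_30
    stamp=${stamp#website-backup-}               # 2006_04_30
    day=$(printf '%s' "$stamp" | tr '_' '-')     # 2006-04-30
    cp "$snapshot" website-backup.py
    git add website-backup.py
    # Placeholder time and message: the real ones have to be supplied by hand.
    git commit --quiet --date="$day 12:00:00 +0000" -m "Snapshot from $day"
done

# Oldest first: one commit per snapshot, each with its author date.
git log --reverse --format='%ad %s' --date=short
```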

At last, the history-injection branch contains all versions of website-backup.py:

Now I’ll use the rebase command to insert the history-injection branch back into the main line (aka master). This insertion will take place just after the init tag. This translates to the following Git command:

git rebase --preserve-merges --onto history-injection init master

The --preserve-merges option is really important here to keep Git from taking too much initiative. Without this option, all our branches between the init tag and the head of the master branch will be rebased. Believe me, that’s not what we want. (Recent Git releases have removed --preserve-merges in favor of --rebase-merges.)
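To see what the --onto form does without risking a real repository, here is a minimal sketch in a sandbox. The empty root commit, the file names and the identity are assumptions for the demo. The sandbox contains no merge commits, so the merge-handling option is left out.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/master      # make sure the branch is called master
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

# Root commit carrying the init tag.
git commit --quiet --allow-empty -m "Initial commit"
git tag init

# Pre-existing code on master.
echo "recent" > recent.py
git add recent.py
git commit --quiet -m "Recent work"

# Old history, built on a branch starting at init.
git checkout --quiet -b history-injection init
echo "old" > website-backup.py
git add website-backup.py
git commit --quiet --date="2006-04-30 23:17:00 +0000" -m "Old script"

# Replay the commits in init..master on top of history-injection.
git rebase --quiet --onto history-injection init master
git branch --quiet -D history-injection

# Newest first: the old commit now sits between init and the recent work.
git log --format=%s
```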

I no longer need my temporary history-injection branch. Let’s remove it:

git branch -D history-injection

Now you should have a single, straight line of history from the init tag to the head of master. Like this:

Commits appear to be ordered as they should, but you may not be as lucky as me. In fact the freshly injected commits are stuck at the “bottom” (just after the init tag, as we asked Git to do on rebase). And you may find yourself in a situation where the commits of the whole master branch are not chronologically ordered.

Here is such an example. It happened when I tried to rebase the full history of my system-backup.py script:

I haven’t found a way to tell Git to order commits by date while rebasing. I know that something can be done with a command like:

git rebase --interactive init

But I haven’t succeeded yet. So I left these commits unsorted for now. I may write another blog post in the future if I find a way to cleanly sort them. In the meantime, if you have a solution, I’ll be happy to hear it!
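One direction that might work, and which I have not applied to my real repository, is to rebuild the branch by cherry-picking the commits in author-date order. The sketch below assumes a linear history without merge commits, and commits that touch independent files (otherwise cherry-pick stops on conflicts). The sorted branch name and the sample commits are made up for the demo.

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

git commit --quiet --allow-empty --date="2005-01-01 00:00:00 +0000" -m "Initial commit"
git tag init

# Two commits deliberately recorded out of chronological order.
echo "b" > b.txt
git add b.txt
git commit --quiet --date="2008-01-01 12:00:00 +0000" -m "Newer"
echo "a" > a.txt
git add a.txt
git commit --quiet --date="2006-01-01 12:00:00 +0000" -m "Older"

# Rebuild the history on a new branch, oldest author date first.
# %at is the author date as a Unix timestamp, %H the full commit hash.
git checkout --quiet -b sorted init
git log --format='%at %H' init..master | sort -n | cut -d ' ' -f 2 |
while read -r commit; do
    git cherry-pick "$commit" > /dev/null < /dev/null
done

# Newest first: "Newer" is now on top of "Older".
git log --format='%ad %s' --date=short init..sorted
```

Cherry-picking keeps the author date of each commit but stamps a fresh committer date, and master itself is left untouched until you decide to move it.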

Finally, when we have something that looks good, we can push our changes to our remote GitHub repository:

git push origin

But Git will complain: changing already-pushed commits is bad. As I explained several weeks ago, it’s dangerous but I don’t care. I’m the only user of this repository. So let’s bypass Git’s wise warnings:

git push origin +master:master
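The leading + in the refspec is what tells Git to accept a non-fast-forward update for that one branch. Here is a small sandbox sketch of the refusal and of the override, using a local bare repository as a stand-in for GitHub; the paths, the identity and the commit messages are assumptions for the demo.

```shell
work=$(mktemp -d)
git init --quiet --bare "$work/remote.git"   # stand-in for the GitHub repository
git init --quiet "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
git remote add origin "$work/remote.git"

git commit --quiet --allow-empty -m "First"
git push --quiet origin master

# Rewrite a commit that has already been pushed.
git commit --quiet --amend --allow-empty -m "First, rewritten"

# A plain push is refused because the update is not a fast-forward.
if git push --quiet origin master 2> /dev/null; then
    plain_push="accepted"
else
    plain_push="rejected"
fi
echo "plain push: $plain_push"

# The + prefix forces the update of master only.
git push --quiet origin +master:master
remote_subject=$(git --git-dir="$work/remote.git" log -1 --format=%s master)
echo "remote master: $remote_subject"
```

On a repository shared with other people, git push --force-with-lease is a more careful variant: it refuses to overwrite the remote branch if someone else pushed to it in the meantime.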

Et voilà! By repeating these steps several times, I moved my code to GitHub, with a consistent and clean commit history.

Git commands

  • Get a clean local copy of my GitHub repository with read & write access:
    git clone git@github.com:kdeldycke/scripts.git
    
  • Switch to another branch:
    git checkout another_branch
    
  • Set the current repository in the state it was at commit 1234567:
    git checkout 1234567
    
  • Get the hash of the current commit:
    git rev-parse HEAD
    
  • Print a nice graph of your commits sorted by date:
    git log --graph --all --pretty=oneline --abbrev-commit --date-order
    
  • Destroy all your local changes and get back a sane repository:
    git reset --hard
    
  • Send local repository modifications to remote one:
    git push origin
    
  • Attach a tag to a given commit:
    git tag "1.2.3" 8fe2934d1552c97246836987f0ea08e10ba749ae
    
  • Publish all tags to the remote repository:
    git push --tags
    
  • Add a remote repository located on GitHub as a submodule in the ./folder/project-copy folder:
    git submodule add https://github.com/my-id/project.git ./folder/project-copy
    
  • While playing with backups of a local repository, you may encounter this error:
    Cannot rewrite branch(es) with a dirty working directory.
    

    In this case, you can get back a clean working directory by setting aside all the uncommitted changes:

    git stash
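Note that git stash does not throw the changes away, it shelves them. A minimal sandbox sketch of the round trip, with a made-up file name and identity:

```shell
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

echo "v1" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"

# An uncommitted modification makes the working directory dirty.
echo "v2" > notes.txt

# Shelve the change: the file is back to its committed content.
git stash push --quiet
after_stash=$(cat notes.txt)

# Bring the shelved change back.
git stash pop --quiet
after_pop=$(cat notes.txt)

echo "after stash: $after_stash, after pop: $after_pop"
```

If the changes really are garbage, git stash drop discards the shelved entry instead of restoring it.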