e107 Importer 1.2 is out, with an enhanced BBCode parser.

Here is a brand new version of my e107 Importer plugin. This release fixes lots of nasty bugs. Better, I added an enhanced BBCode parser that tries to clean up what e107’s parser outputs. This new parser also tries to align the final HTML with what WordPress produces by default.

As usual, my plugin is available on the official WordPress plugin directory.

Here is the detailed changelog:

  • Upgrade e107 code to match latest 0.7.25-rc1.
  • Fix variable bleeding when importing items in batches.
  • Add a new way of handling e107 extended news using WordPress’ excerpts.
  • Parse BBCode and replace e107 constants in news excerpt.
  • Use internal WordPress library (kses) to parse HTML in the image upload step.
  • Do not upload the same images more than once.
  • Add a new enhanced BBCode parser on top of the one from e107. Make it the default parser.
  • Each time we alter the original imported content, we create a post revision.

Feed Tracking Tool released under an Open-Source license

I’ve just open-sourced the Feed Tracking Tool project (aka “FTT”), my first (and only) Ruby on Rails experience.

This tool was developed at Uperto, the company I currently work for, for its internal needs. The project had an ancestor, written in 2006 and based on Pylons. It was a prototype and barely worked. Iterating over the abandoned Python code base was considered a waste of time, so in summer 2007 it was decided to rewrite this application from scratch.

As my co-worker was available and had already played with Ruby on Rails, he was tasked with creating the initial code base. I joined the project early on, as it was a great opportunity to play with the (then really trendy) Ruby on Rails framework.

In the end, FTT was essentially a test project to explore Ruby on Rails. It was never deployed on a production server and was never used.

The project had been rotting for more than 3 years and represents absolutely no business value in itself, so I decided to release it under a GPLv2 license (with Uperto’s approval, of course). My intention with this open-source release is to give knowledge and code back to the community.

FTT was living in a private Subversion repository at Uperto, but we unfortunately lost it. During the last few weeks I tried to rebuild the code history from old and partial backups. I then used my Git-based reconstruction method to consolidate everything in a Git repository. The code is now available on GitHub.

I don’t plan to maintain this project, but I may reboot it in the future if I need feed-related features, or if I need an excuse to play with Ruby on Rails again. For now, beware: the code is quite outdated and only runs on the old Rails 1.2.x. This project should be considered an ugly legacy code base, so please be indulgent while looking at FTT’s code: it was the work of inexperienced RoR developers ! ;)

Moving a Git sub-tree to its own repository

Coming from Subversion (and with the Plone collective repository structure in mind), I’ve recently moved all my tiny software projects into a big standalone Git repository (named kev-code). Now that I’ve figured out that GitHub lets you create an unlimited number of repositories, as long as they are public open-source projects, it makes sense to emancipate some of my projects into their own repositories. How do you move a sub-tree to its own repository ? That’s what this article is about.

First, note that there is an automated way of performing this task with git-subtree, and you should try it first. For reasons I didn’t investigate, git-subtree didn’t work for me, so I’ll now explain how I did it by hand.

The idea is to revisit the history of my bloated Git repository and massively delete everything that is not related to the sub-folder I want to export. In this case, I want to create a dedicated repository for my e107 Importer plugin for WordPress.

Let’s start by getting a local copy of my source repository:

git clone git@github.com:kdeldycke/kev-code.git
cd kev-code

Then I’ll use the filter-branch command, with a tree filter combining find and rm, to remove everything except the source code of my plugin:

git filter-branch --prune-empty --tree-filter 'find ./ -maxdepth 1 -not -path "./e107*" -and -not -path "./wordpress-e107*" -and -not -path "./.git" -and -not -path "./" -print -exec rm -rf "{}" \;' -- --all
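Because this tree filter deletes files, it may be worth previewing what it will remove before rewriting anything. The sketch below wraps the same find expression, minus the -exec rm part, in a hypothetical preview_tree_filter function. Like the original command, it assumes GNU find and should be run from the root of a working copy.

```shell
# Hypothetical helper: list the top-level entries the tree filter would
# delete, using the same find expression but without "-exec rm -rf".
# Run it from the root of a working copy; nothing is modified.
preview_tree_filter() {
  find ./ -maxdepth 1 \
    -not -path "./e107*" -and \
    -not -path "./wordpress-e107*" -and \
    -not -path "./.git" -and \
    -not -path "./" \
    -print
}
```

Running it at the tip of master shows what the rewrite will drop there; older revisions can be inspected the same way after a git checkout.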

Instead of this --tree-filter approach, I could have used the --subdirectory-filter option (as suggested by jamessan on Stack Overflow):

git filter-branch --prune-empty --subdirectory-filter e107-importer -- --all

But this doesn’t work in my case, as my e107 Importer plugin didn’t start its life in a dedicated folder: this command would throw away part of the history I want to preserve.
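One way to check beforehand whether --subdirectory-filter would lose history is to look for the first commit touching the sub-folder: only commits that touch it are kept, so anything older disappears from the rewritten history. A minimal sketch, with first_commit_touching as a hypothetical helper name:

```shell
# Hypothetical helper: print the earliest listed commit (short hash and
# subject) that touches the given path. --subdirectory-filter only keeps
# commits touching that path, so older ones are left out.
first_commit_touching() {
  git log --all --reverse --format='%h %s' -- "$1" | head -n 1
}
# Usage: first_commit_touching e107-importer
```

If the plugin has commits older than the one printed (for instance when its files still lived elsewhere), --subdirectory-filter is not a safe option.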

At this point, I’m left with the following history:

This looks pretty good, as all the history of my plugin is kept in order. But tags unrelated to my plugin are still there. Let’s remove them:

git tag -d coolkevmen-0.3 cool-blue-0.1 sapphire-0.1 sapphire-0.2 sapphire-0.3 sapphire-0.4
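With more tags, listing them all by hand gets tedious. Assuming the plugin’s own tags share a common prefix (e107 below is a guess, not taken from the actual repository), the unrelated ones could be deleted in one go with a hypothetical helper like this:

```shell
# Hypothetical helper: delete every local tag whose name does not start
# with the given prefix. Double-check the list with `git tag -l` first.
delete_tags_except() {
  git tag -l | grep -v -- "^$1" | while read -r tag; do
    git tag -d "$tag"
  done
}
# Usage: delete_tags_except e107
```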

Now there are some commits polluting my history. These are left-overs from git-modules additions. I tried to remove them, but it didn’t work. Also left in the history are unwanted merges and empty commits from an old CVS import. To clean this up, I started an interactive rebase:

git rebase --interactive init

There, using my text editor, I deleted the entries corresponding to these unrelated commits (namely c21a840, 0dc1d76, 37473a8 and c6f9f64), and hoped Git would be smart enough to reconstruct a clean history:

Luckily, it worked for me. If Git complains about such abuse, you can ignore its warnings and force it to continue:

git rebase --continue
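The editor step of this interactive rebase can also be scripted, which helps when it has to be replayed several times. When GIT_SEQUENCE_EDITOR is set, Git runs that command on the todo list instead of the regular editor, so a sed call can delete the pick lines of unwanted commits. This sketch assumes GNU sed (for -i) and abbreviated hashes like the ones listed above; drop_commits is a hypothetical name.

```shell
# Hypothetical helper: interactively rebase onto BASE, but let sed remove
# the "pick <hash>" lines of the given commits from the todo list instead
# of editing it by hand. Requires GNU sed for in-place editing (-i).
drop_commits() {
  local base=$1 script="" sha
  shift
  for sha in "$@"; do
    script="$script -e '/^pick $sha/d'"
  done
  GIT_SEQUENCE_EDITOR="sed -i $script" git rebase --interactive "$base"
}
# Usage: drop_commits init c21a840 0dc1d76 37473a8 c6f9f64
```

If the rebase stops on a conflict, it is resumed with git rebase --continue as usual.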

Now that we only have a clean sub-tree, let’s create a dedicated local Git repository to receive our branch:

cd ..
mkdir e107-importer
cd e107-importer
git init

Add a temporary origin pointing to our source repository:

git remote add origin ../kev-code

And import the master branch we carefully crafted (including tags):

git pull --tags origin master
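Before going further, a quick sanity check can confirm that the new repository received the same master branch and tags as the source one. This is a hypothetical helper, assuming the source is a local clone reachable at the given path (../kev-code here) and that HEAD is the freshly pulled branch.

```shell
# Hypothetical helper: succeed only if HEAD points to the same commit as
# the source repository's master branch, and both have the same tags.
# Uses `git -C`, available since Git 1.8.5.
same_history_as() {
  src=$1
  [ "$(git rev-parse HEAD)" = "$(git -C "$src" rev-parse master)" ] &&
    [ "$(git tag -l)" = "$(git -C "$src" tag -l)" ]
}
# Usage: same_history_as ../kev-code && echo "History imported correctly"
```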

Now we can create, on GitHub, the new repository that will receive our exported project:

It’s time to push our changes. Let’s replace our temporary origin with the new GitHub repository we just created:

git remote rm origin
git remote add origin git@github.com:kdeldycke/e107-importer.git
git push origin master --force --tags

So now we have a copy of my plugin’s sub-tree in its own repository. That’s great, but there is still some stuff to clean up.

First, we will rewrite the repository to look as if the ./e107-importer sub-folder had been its project root since the beginning:

git filter-branch --tree-filter 'test -d ./e107-importer && mv ./e107-importer/* ./ || echo "No folder found"' -- --all
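One caveat: the * glob does not match hidden files, so if the sub-folder held something like a .gitignore, the folder would survive in some commits. A hypothetical check like the one below lists every commit of the branches and tags that still has the folder at its root; tags and branches only, because the refs/original backups left by filter-branch still point to the old commits. Empty output means the move worked everywhere.

```shell
# Hypothetical helper: print every commit reachable from a branch or tag
# whose root tree still contains an entry with the given name.
commits_with_entry() {
  git rev-list --branches --tags | while read -r commit; do
    if git ls-tree --name-only "$commit" | grep -qxF -- "$1"; then
      echo "$commit"
    fi
  done
}
# Usage: commits_with_entry e107-importer
```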

Then, I’ve altered some commit messages to fix inconsistencies due to sub-folder removal:

git filter-branch --force --msg-filter 'sed "s/Move the script to a dedicated folder/Rename script/g"' -- --all
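To make sure the message filter matches what you expect, you can count the affected message lines before and after running it. A minimal sketch with a hypothetical helper name; it searches the full messages of every branch and tag for a fixed string.

```shell
# Hypothetical helper: count message lines, across all branches and tags,
# that contain the given fixed string.
count_message_lines() {
  git log --branches --tags --format=%B | grep -cF -- "$1"
}
# Usage: count_message_lines "Move the script to a dedicated folder"
```

After the rewrite, the same call should print 0.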

Finally, at the bottom of the history, I still have my initial commit (a personal habit of mine when I initialize my Git repositories). But its date was updated by the first filter-branch call. Let’s set its date back to epoch:

git filter-branch --force --env-filter \
  'if [ $GIT_COMMIT = a2a5c05aed893fdd10250b724eb6a54bc6e7f122 ]
     then
       export GIT_AUTHOR_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
       export GIT_COMMITTER_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
   fi' -- --all
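The hard-coded hash above only matches as long as the root commit is still the same object in the current history. A variant that looks the root commit up with git rev-list could look like this sketch; it assumes a single root commit reachable from HEAD, and reset_root_date is a hypothetical name. Recent Git versions also print a warning and pause before running filter-branch, which exporting FILTER_BRANCH_SQUELCH_WARNING=1 silences.

```shell
# Hypothetical helper: set the author and committer dates of the root
# commit to the epoch, finding that commit instead of hard-coding it.
reset_root_date() {
  ROOT_COMMIT=$(git rev-list --max-parents=0 HEAD)
  # Exported so the shell evaluating the env filter can see it.
  export ROOT_COMMIT
  git filter-branch --force --env-filter '
    if [ "$GIT_COMMIT" = "$ROOT_COMMIT" ]; then
      export GIT_AUTHOR_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
      export GIT_COMMITTER_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
    fi' -- --all
}
# Usage: reset_root_date
```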

We can now send our latest changes to the remote GitHub repository by forcing a push:

git push --force

The last thing we have to do is remove the plugin code from the fat source repository (I don’t like duplicates). But that’s another story for another article…

Subversion commits and mail activity stream in iCalendar

Last week I consolidated all my code in my GitHub repository. I stumbled upon an old script I hadn’t publicized yet: svn2ical.py.

This is a simple hack that gets commit metadata out of a Subversion repository and generates an iCalendar file containing all commits of a given author. I used it back then to visualize my commit activity in a calendar. Nowadays this script is quite useless, as services like Ohloh and GitHub provide great timelines and activity streams. But it can still be useful for private repositories.
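svn2ical.py’s code is not reproduced here, but the iCalendar side of such a tool is small. The following sketch turns tab-separated “UTC date, summary” lines into a minimal iCalendar file following RFC 5545. The input format, the to_ical name and the UID scheme are assumptions for illustration, and extracting the commit dates from Subversion is left out.

```shell
# Hypothetical helper: read "DATE<TAB>SUMMARY" lines on stdin, where DATE
# is UTC in iCalendar basic format (e.g. 20091201T143749Z), and write a
# minimal iCalendar file (RFC 5545, CRLF line endings) on stdout.
to_ical() {
  printf 'BEGIN:VCALENDAR\r\nVERSION:2.0\r\nPRODID:-//example//to_ical//EN\r\n'
  n=0
  tab=$(printf '\t')
  while IFS=$tab read -r start summary; do
    n=$((n + 1))
    # TEXT values must escape backslashes, semicolons and commas.
    summary=$(printf '%s' "$summary" | sed 's/[\\;,]/\\&/g')
    printf 'BEGIN:VEVENT\r\nUID:event-%d@example.invalid\r\n' "$n"
    printf 'DTSTAMP:%s\r\nDTSTART:%s\r\n' "$start" "$start"
    printf 'SUMMARY:%s\r\nEND:VEVENT\r\n' "$summary"
  done
  printf 'END:VCALENDAR\r\n'
}
# Usage: printf '20091201T143749Z\tr1234: Fix parser\n' | to_ical > commits.ics
```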

In the same spirit as svn2ical.py, I also uncovered maildir2ical.py, a script that looks in a maildir folder for mails sent by a particular author, then generates an iCal file based on their dates.
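The mail-scanning half of such a script can be sketched in a few lines of shell. This is not maildir2ical.py’s logic, only an illustration: maildir_dates is a hypothetical name, and it assumes GNU date (for -d), LF line endings and headers that are not folded over several lines.

```shell
# Hypothetical helper: for each message in a maildir whose From: header
# contains the given address, print its Date: header converted to UTC in
# iCalendar basic format. Assumes GNU date (-d), LF line endings and
# unfolded headers.
maildir_dates() {
  maildir=$1
  author=$2
  for mail in "$maildir"/cur/* "$maildir"/new/*; do
    [ -f "$mail" ] || continue
    # Keep only the header block, which ends at the first empty line.
    headers=$(sed '/^$/q' "$mail")
    printf '%s\n' "$headers" | grep -i '^From:' | grep -qiF -- "$author" || continue
    sent=$(printf '%s\n' "$headers" | grep -i '^Date:' | head -n 1 | cut -d' ' -f2-)
    date -u -d "$sent" +%Y%m%dT%H%M%SZ
  done
}
# Usage: maildir_dates ~/Maildir kevin@example.com
```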

How I initialize my Git repositories

The first few days I used Git, I messed up my repository. I had to reset and recreate it from scratch several times. After enough trial and error, I came up with a way to initialize my repositories. Let me explain in this post why git init is not enough for me.

To create a Git repository, nothing more than these few trivial commands is strictly necessary:

$ mkdir kev-code
$ cd kev-code/
$ git init

But after reading some documentation and user experiences on the web, it looks like Git has some limitations when dealing with the root of a repository history. As I plan to heavily manipulate the commit history (to do some kind of code archaeology and history reconstruction), I need the widest possible time latitude to play with commits.

In this situation, I came to the conclusion that it’s a good idea to create an empty commit at the start of your repository’s life, and date it to the start of the epoch. In the future, I’ll be able to leverage this initial commit as an ordinary history point from which I can start a branch. Then, in this branch, I’ll be free to mess up the history before merging my changes back into the mainline tree.

So, let’s create an empty commit:

$ git commit --allow-empty -m 'Initial commit'

Then get the commit hash:

$ git log
commit 395290bcdb8ffccfbff89e42cb976077fbd3c1b7
Author: Kevin Deldycke <kevin@deldycke.com>
Date:   Tue Dec 1 15:37:49 2009 +0100

    Initial commit

We now change the date of our first commit to the start of the epoch:

$ git filter-branch --env-filter '
>     if [ $GIT_COMMIT = 395290bcdb8ffccfbff89e42cb976077fbd3c1b7 ]
>     then
>         export GIT_AUTHOR_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
>         export GIT_COMMITTER_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
>     fi' -- --all
Rewrite 395290bcdb8ffccfbff89e42cb976077fbd3c1b7 (1/1)
Ref 'refs/heads/master' was rewritten

And check that the previous operation did what we expected:

$ git log
commit 8fe2934d1552c97246836987f0ea08e10ba749ae
Author: Kevin Deldycke <kevin@deldycke.com>
Date:   Thu Jan 1 00:00:00 1970 +0000

    Initial commit

Looks good !
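In hindsight, the filter-branch step could probably be avoided: Git reads GIT_AUTHOR_DATE and GIT_COMMITTER_DATE when creating a commit, so the empty commit can carry the epoch date from the start. A minimal sketch, with init_epoch_commit as a hypothetical name:

```shell
# Hypothetical helper: create the empty initial commit directly dated at
# the epoch. The two variables only apply to this one git commit call.
init_epoch_commit() {
  GIT_AUTHOR_DATE="Thu, 01 Jan 1970 00:00:00 +0000" \
  GIT_COMMITTER_DATE="Thu, 01 Jan 1970 00:00:00 +0000" \
    git commit --allow-empty -m 'Initial commit'
}
# Usage, right after git init: init_epoch_commit
```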

For convenience, we’ll now attach a tag to this initial commit. Let’s call it init:

$ git tag "init"

This will come in handy later, when we need to create a branch from here.
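For instance, a branch for history experiments could start from that tag as in the sketch below; start_rework_branch and the history-rework branch name are hypothetical.

```shell
# Hypothetical helper: create and switch to a new branch rooted at the
# "init" tag, where history can be rewritten before merging back.
start_rework_branch() {
  git checkout -b "${1:-history-rework}" init
}
# Usage: start_rework_branch history-rework
```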

It’s time to push all changes to our brand new public repository:

$ git remote add origin git@github.com:kdeldycke/kev-code.git
$ git status
# On branch master
nothing to commit (working directory clean)
$ git push origin master --force

Counting objects: 2, done.
Writing objects: 100% (2/2), 159 bytes, done.
Total 2 (delta 0), reused 0 (delta 0)
To git@github.com:kdeldycke/kev-code.git
 + 86bd2c7...8fe2934 master -> master (forced update)

And here is the result on GitHub:

Maybe this “first commit” trick is unnecessary. So, if you have a better understanding of the issue, or can explain to me why this is stupid, please tell me ! :)