e107 Importer v1.3 released

A month after the last one, here is a brand new version of my e107 Importer, numbered 1.3.

This version add loads of polishing and is not far from being feature-complete. I think I’m reaching the end of the active development of this plugin. I don’t see the need to add new features.

I also feel this way because last week, I succeeded in moving to WordPress all news and pages from the old e107′s Cool Cavemen website. I now only need to import all forums to definitively get rid of e107 from my life. At that point, I will declare the plugin no longer active. This mean I will no longer update it, but will still integrate code other developers are willing to contribute.

Before that happen, I will of course release one or two revisions of this plugin in the next few months. But expect bug fixes and tiny enhancements, not big new features.

That being said, here is the changelog of the brand new e107 Importer 1.3:

  • Upgrade embedded e107 code with latest 0.7.25.
  • Redirect imported images to attachments.
  • Purge invalid mapping entries on import.
  • Replace old e107 URLs in content by new WordPress permalinks.
  • Allow both imported and already-existing content to by updated with new permalinks.
  • Let user specify the list of e107 forums to import.
  • Phased imports should work without major problems.

FTT Migration from Subversion to Git

Last month I released the Feed Tracking Tool project (aka FTT) on GitHub. I reconstructed the code history from old tarballs. In the mean time, my friend at Uperto managed to recover the original Subversion repository from very old backups. Here is how I migrated the old SVN repository to GitHub.

First, I started a local Subversion server with the repository my co-worker gave me:

$ tar xvzf ./ftt-svn.tar.gz
$ sed -i 's/# password-db = passwd/password-db = passwd/' ./ftt-svn/conf/svnserve.conf
$ echo "kevin = kevin" >> ./ftt-svn/conf/passwd
$ kill `ps -ef | grep svnserve | awk '{print $2}'`
$ svnserve --daemon --listen-port 3690 --root ./ftt-svn

Then I created a local Git repository, using my initialization routine:

$ rm -rf ./ftt-git
$ mkdir ./ftt-git
$ cd ./ftt-git
$ git init
$ git commit --allow-empty -m 'Initial commit'
$ git tag "init"

The next step consist in importing the Subversion repository to Git:

$ git svn init --no-metadata --username kevin svn://localhost:3690
$ git svn fetch

Here I rebased the imported git-svn branch to the main branch:

$ git rebase --onto git-svn master
$ git rebase init master

At that point I don’t need the remote git-svn branch so I removed it:

$ git branch -r -D git-svn

To clean things up, let’s remove all SVN metadatas and local commit backups:

$ rm -rf ./.git/svn/
$ rm -rf ./.git/refs/original/
$ git reflog expire --all
$ git gc --aggressive --prune

We can now proceed to alter the code history. In FTT we never created branches. I also plan to recreate tags by hand later. So I decided to remove all the tags and branches folders coming from Subversion:

$ git filter-branch --force --prune-empty --tree-filter 'rm -rf ./tags*'     -- --all
$ git filter-branch --force --prune-empty --tree-filter 'rm -rf ./branches*' -- --all

Now let’s move the trunk directory to the base of the repository. I didn’t used the --subdirectory-filter parameter as FTT started its life without a proper “branches/tags/trunk” SVN structure:

$ git filter-branch --force --prune-empty --tree-filter 'test -d ./trunk && cp -axv ./trunk/* ./ && rm -rf ./trunk || echo "No trunk folder found"' -- --all

Next is the Git command I used to fix commit authorship:

$ git filter-branch --force --env-filter '
    if [ "$GIT_AUTHOR_NAME" = "kdeldycke" ]
      then
        export GIT_AUTHOR_NAME="Kevin Deldycke"
        export GIT_AUTHOR_EMAIL="kevin@deldycke.com"
    fi
    if [ "$GIT_AUTHOR_NAME" = "qdesert" ]
      then
        export GIT_AUTHOR_NAME="Quentin Desert"
        export GIT_AUTHOR_EMAIL="quentin.desert@uperto.com"
    fi
    ' -- --all

While exploring my own backups of the FTT project, I stumble upon a preliminary HTML mockup of the app. I decided to include it in the final repository, as the first commit, just after my init tag. Here how I did this, assuming the mockup sources were available in the ../mockup directory:

$ git branch mockup-injection init
$ git checkout mockup-injection
$ cp -axv ../mockup .
$ git add --all
$ git commit --all --date="2007-07-17 15:49" --author="Quentin Desert <quentin.desert@uperto.com>" -m "Commit the oldest mockup I can find."
$ git rebase --onto mockup-injection init master
$ git branch -D mockup-injection

The procedure above come from my “Commit history reconstruction” article.

Now I can tag by hand all FTT releases.

$ git tag -f "0.4.1"  5f5cc2a36743f2c8d2088669e475ef09d8cec029
$ git tag -f "0.5"    54a76e143f9f2efdec88d3181cbcfbfddda5f725
$ git tag -f "0.6"    934447f185330903c389364bed94e994f6b280e6
$ git tag -f '0.7'    ef87ab3287ba23655781565fd622345c942d9c49
$ git tag -f "0.8"    cdcf2f459826019bbbc5874d6632392b07ea889b
$ git tag -f "0.8.1"  f47a3f219eb918069efe701d082928cdb953f05f
$ git tag -f "0.8.2"  2542754dd088d359ce96db8511e0a15588eb50ce
$ git tag -f "0.8.3"  ea9455c0ed75cf504c1cc872d5e5946b578ae702
$ git tag -f "0.9.0"  57a39879b3bcc61bd9560d7ac4e71cbfd0af22df
$ git tag -f "0.9.1"  e483fd1a287fa86a8b12d088b78a319b0990e6ef
$ git tag -f "0.10.0" ed77af77506836892be78044ae4ef15d07f18583

FTT was always developed as an internal app. As such the code and its history still contain lots of sensible informations. I deeply audited the code to identify the kind of data that we should absolutely not disclose to the outside world.

At the end of this code review, I just found references to our internal architecture (server’s names and IP addresses), and some usernames and passwords. There was also some logs and temporary files. I cleaned them all with the following set of Git commands:

$ git filter-branch --force --prune-empty --tree-filter 'find . -iname ".svn"        | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -iname "*.log"       | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -iname "*~"          | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -iname "*.pid"       | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -iname "*.ppid"      | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -iname "ruby_sess.*" | xargs rm -rf' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -type f -exec sed -i "s/password: 1234567/password: *******/g"   "{}" \;' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -type f -exec sed -i "s/smtp\.server12\.com/smtp\.uperto\.com/g" "{}" \;' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -type f -exec sed -i "s/192\.168\.0\.2/12\.34\.56\.78/g"         "{}" \;' -- --all
$ git filter-branch --force --prune-empty --tree-filter 'find . -type f -exec sed -i "s/user qdesert/user *******/g"             "{}" \;' -- --all

After all these modifications, I was pretty sure my code was ready to be published. But better safe than sorry, I spent a couple of minutes to do a second deep code review to check that I didn’t missed anything. And to push the reviewing process even further, I offer a beer at the local bar for anyone finding sensible information in FTT’s code base ! :)

The last things I did was to delete the old FTT’s GitHub repository and recreate it. Then I fixed my first commit date, cleaned Git’s local backup and pushed my carefully crafted repository to its new GitHub’s home:

$ export GIT_TMP_INIT_HASH=`git show-ref init | cut -d ' ' -f 1`
$ git filter-branch --env-filter '
    if [ $GIT_COMMIT = $GIT_TMP_INIT_HASH ]
      then
        export GIT_AUTHOR_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
        export GIT_COMMITTER_DATE="Thu, 01 Jan 1970 00:00:00 +0000"
    fi' -- --all
$ unset GIT_TMP_INIT_HASH
$ rm -rf ./.git/refs/original/
$ git reflog expire --all
$ git gc --aggressive --prune
$ git remote add origin git@github.com:kdeldycke/feed-tracking-tool.git
$ git push origin master --force --tags

e107 Importer 1.1 available !

A month after the 1.0 release, here is my e107 Importer v1.1 for WordPress !

The biggest new feature is support of e107 forum import to the bbPress WordPress plugin. This plugin is still in alpha and was not released yet. This mean you have to fetch it from its Subversion repository. And be careful to get the recommended version (SVN release 2942).

Because of the experimental status of forum import, the default option of e107 Importer is to not import forums. I decided to include this feature anyway to get feedbacks. So please don’t consider forum import as a highly reliable. It may work for you or may not. And please write detailed bug reports.

Here is a detailed changelog between 1.0 and 1.1:

  • Add import of forums and threads to bbPress WordPress plugin.
  • Parse BBCode and e107 constants in forums and thread.
  • Add forums and threads redirections.
  • Make e107 user import optional. This needs you to set a pre-existing WordPress user that will take ownership of all imported content.
  • Parse BBCode in titles too.
  • Import images embedded in comments and forum threads.
  • Description update of existing users is no longer destructive.
  • Add an entry in the FAQ regarding script ending prematurely.
  • Disable all extra HTML rendering hooks like the one coming from e107 linkwords plugin.
  • Allow news and pages import to be skipped.
  • Add missing news category redirects.
  • Minimal requirement set to WordPress 3.1.
  • Some pages are not tied to a user. In this case, default to the current user.

e107 Importer WordPress plugin v1.0 released !

After 3 years in limbo, here is a new stable version of my e107 Importer plugin for WordPress, proudly numbered 1.0 ! :)

This is the first time this plugin is available on the official WordPress plugin repository. This means easy upgrades for you ! :)

I’ve heavily updated the plugin to use the latest WordPress import framework, so everything is now cleanly integrated.

This version was tested with e107 v0.7.24 and WordPress 3.1-RC3.

Here is the changelog:

A note for developpers: the reference code base is now located on GitHub. That’s were all new code must be commited. WordPress’ Subversion repository is only a mirror.

If any question, please read the FAQ first.

How-to: e107 autogallery to Zenphoto migration

These past few days I was working on the Cool Cavemen’s photo gallery to move it to a shiny new one, powered by Zenphoto. In this post I will roughly describe how I’ve done it, code and commands included.

The old gallery was based on autogallery, a e107 plugin. We assume here that both e107 and Zenphoto are well configured and installed at the root of you web hosting space (/www in this case).

The first step is to copy the autogallery album structure, with all its content, to Zenphoto:

cd /www
cp -ax ./e107_plugins/autogallery/Gallery/* ./zenphoto/albums/

Then we delete all previews, thumbnails and XML metadatas, to keep in Zenphoto original assets only:

find ./zenphoto/albums/ -iname "*.xml" | xargs rm -f
find ./zenphoto/albums/ -iname "pv_*" | xargs rm -f
find ./zenphoto/albums/ -iname "th_*" | xargs rm -f

By now, you should be able to play with your medias using Zenphoto’s admin interface.

But if you’re unlucky as I was, you will find a strange bug which break down drag’n'drop album sorting. The fix I found was to remove, in photo filenames, the numerical prefix (and the following dot) set by autogallery to define the sort order. This operation should be performed, before the copy from autogallery to Zenphoto (= the first command in this post). By the way, if you know a one-liner to do this, please, please… share ! :)

To migrate comments, I have no automatic solution. I choose to do this manually, editing the database by hand. In my case it was the quickest way as I only had a dozen of comments to migrate.

And last but not least, if you care about measuring the popularity of your photos, you should consider migrating the view counter associated with each of your media. Don’t worry, this time I wrote a script to take care of it automagically. It will generate a bunch of SQL statements you’ll have to execute on your Zenphoto MySQL database. Here is my “e107 autogallery to Zenphoto hit counter migration script” (nice name isn’t it ? ;) ) that do the job:

#!/usr/bin/python

##############################################################################
#
# Copyright (C) 2008 Kevin Deldycke <kevin@deldycke.com>
#
# This program is Free Software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
#
##############################################################################

"""
  Last update: 2008 aug 21
"""

########### User config ###########

AUTOGALLERY_ALBUM_PATH = "/www/e107_plugins/autogallery/Gallery"
ZENPHOTO_ALBUM_PATH    = "/www/zenphoto/albums"
ZENPHOTO_TABLE_PREFIX  = "zenphoto_"

######## End of user config #######

import os, hashlib
import xml.etree.ElementTree as ET

# Calculate hash of a given file
def getHash(path):
  # Calculate the hash from file raw data
  if not os.path.isfile(path):
    return None
  try:
    file_object = open(path, 'r')
    data = file_object.read()
  except:
    return None
  if not len(data):
    return None
  return hashlib.sha224(data).hexdigest()

# Associate each autogallery photo having a hitcounter greater than 0 with its MD5 hash
def populateHashTable(arg, dirname, names):
  global hash_table
  for name in names:
    file_path = os.path.join(dirname, name)
    # print "Get hit count for %s" % file_path
    # Check that the file as a positive hit counter associated with
    xml_file_path = "%s.xml" % file_path
    if not os.path.isfile(xml_file_path):
      continue
    try:
      tree = ET.parse(xml_file_path)
    except:
      continue
    node = tree.find("viewhits")
    if node is None:
      continue
    try:
      hits = int(node.text)
    except:
      continue
    if not hits > 0:
      continue
    # Update hash table with data we care about
    file_hash = getHash(file_path)
    if file_hash is None:
      continue
    hash_table[file_hash] = hits + hash_table.get(file_hash, 0)

# Generate hitcount SQL request for each matching file
def generateSQL(arg, dirname, names):
  global sql
  for name in names:
    file_path = os.path.join(dirname, name)
    # print "Search hitcounter matching file %s" % file_path
    file_hash = getHash(file_path)
    if file_hash is None:
      continue
    if file_hash in hash_table:
      sql += "UPDATE `%simages` SET `hitcounter`=`hitcounter`+%d WHERE `filename`=%r;\n" % (ZENPHOTO_TABLE_PREFIX, hash_table[file_hash], name)

# Core of the script
hash_table = {}
sql        = ""
# Normalize path
source_path = os.path.abspath(AUTOGALLERY_ALBUM_PATH)
dest_path   = os.path.abspath(ZENPHOTO_ALBUM_PATH)

os.path.walk(source_path, populateHashTable, None)
# print repr(hash_table)
os.path.walk(dest_path, generateSQL, None)
print sql

I think code and comments are self-explainatory. And do not forget to update constants at the top of the script to match your installation paths and database’s tables prefix.

And finally, for your information, I tested all of this on following versions:

  • e107 0.7.11
  • autogallery 2.61
  • Zenphoto 1.2
  • Python 2.5.2
  • Linux server

e107 to WordPress migration : v0.9 plug-in released

9 months after the last one, here is the new version (v0.9) of my e107 to WordPress import plug-in !

Change log:

  • “One-click migration” instead of multiple step process (more user-friendly),
  • Better error management (a must-have for precise bug reports),
  • Replace all links to old content with permalinks (increased SEO),
  • Better database management,
  • Work with latest WordPress v2.3.2 and e107 v0.7.11,
  • Code cleaned up ! ;)