e107 Importer 1.2 is out, with an enhanced BBCode parser.

Here is a brand new version of my e107 Importer plugin. This release fix lots of nasty bugs. Better, I added an enhanced BBCode parser which try to clean-up what e107′s parser output. This new parser also try to align the final HTML with what WordPress produce by default.

As usual, my plugin is available on the official WordPress plugin directory.

Here is the detailed changelog:

  • Upgrade e107 code to match latest 0.7.25-rc1.
  • Fix variable bleeding when importing items in batches.
  • Add a new way of handling e107 extended news using WordPress’ excerpts.
  • Parse BBCode and replace e107 constants in news excerpt.
  • Use internal WordPress library (kses) to parse HTML in the image upload step.
  • Do not upload the same images more than once.
  • Add a new enhanced BBCode parser on top of the one from e107. Make it the default parser.
  • Each time we alter the original imported content, we create a post revision.

Fixing messed-up encoding in MySQL

Currently working on my e107 Importer plugin, I was confronted today with badly-encoded data coming from my databases.

e107 migrated to full UTF-8 years ago, but I must have messed the upgrade process at the time. That was my conclusion when I took a close look to my tables: all of them seems to be set to Latin-1 but contain UTF-8 data. Here are screenshots from SQLBuddy (a great light-weight MySQL manager) showing just that:

To fix this, I first tried to use the following command I found on the web:

mysql --database=e107db -B -N -e "SHOW TABLES"  | awk '{print "ALTER TABLE", $1, "CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;"}' | mysql --database=e107db

But this doesn’t work, as it not only change the encoding of the table, but also transcode the data inside the table.

Let’s try something else. First, we’ll export the database to a dump file, of which the encoding is forced to Latin-1:

mysqldump -a -c -e --no-create-db --add-drop-table --default-character-set=latin1 --databases 'e107db' > ./e107-data.sql

Now the trick is to change the CHARSET parameter of all CREATE TABLE directives to UTF-8:

sed -i 's/CHARSET=latin1/CHARSET=utf8/g' ./e107-data.sql

We’ll also change the NAMES directive to force MySQL to handle imported data as UTF-8:

sed -i 's/SET NAMES latin1/SET NAMES utf8/g' ./e107-data.sql

Then we’re free to import the result in a new UTF-8 database:

sed -i 's/USE `e107db`;/#USE `e107db`;/g' ./e107-data.sql
mysql --execute="CREATE DATABASE e107db_new CHARACTER SET=utf8"
mysql --database=e107db_new < ./e107-data.sql

And now, accentuated characters appears as they should in our database, meaning we’ve fixed all the mess ! :)


PS: I found another alternative method (look at the end of the linked page) which consists of temporarily handling TEXT fields as BLOB, to have MySQL treat them as binary content (thus skipping character transcoding). Haven’t tested this but sounds tricky.

MySQL commands

  • List all users in the current MySQL server:
    mysql> SELECT User, Host FROM mysql.user;
    
  • Remove a user from the server:
    mysql> DROP USER '<User>'@'<Host>';
    
  • Remove all binary logs:
    mysql> RESET MASTER;
    
  • Delete all binary logs but keep a week worth of logs:
    mysql --verbose --execute="PURGE BINARY LOGS BEFORE '`date +"%Y-%m-%d" -d last-week`';"
    

    And if you put this in a cron-tab, don’t forget to escape percents:

    mysql --verbose --execute="PURGE BINARY LOGS BEFORE '`date +\%Y-\%m-\%d -d last-week`';"
    
  • Check, auto-repair and optimize all databases:
    mysqlcheck --auto-repair --optimize --all-databases
    
  • Export a database:
    mysqldump -u my_user "my-database" > data.sql
    
  • Here is a cron-able command to restart a MySQL service if no process found active:
    [ `ps axu | grep -v "grep" | grep --count "mysql"` -le 0 ] && /etc/init.d/mysql restart
    
  • Monitor the queries being run (source):
    watch -n 1 mysqladmin --user=XXXXX --password=XXXXX processlist
    
  • Get the list of default configuration parameters the server will use regardless of the values set in config files (source):
    mysqld --no-defaults --verbose --help
    
  • Migrate all tables of all databases from MyISAM to InnoDB:
    mysql --skip-column-names --silent --raw --execute="SELECT CONCAT(table_schema , '.', table_name) FROM INFORMATION_SCHEMA.tables WHERE table_type='BASE TABLE' AND engine='MyISAM';" | xargs -I '{}' mysql --verbose --execute="ALTER TABLE {} ENGINE=InnoDB;"
    

Moving a WordPress blog to another domain

qpx-site-domain-migration I provide hosting for free to some of my friends. One of them, QPX, had a side project called Lich’ti. But the latter is no longer active, so he decided to not renew the lich-ti.fr domain.

If Lich’ti’s domain name is dead, QPX’s personal blog is not. His website is powered by WordPress and was available at http://qpx.lich-ti.fr. My job is now to move it to http://qpx.coolcavemen.com. In this post, I’ll tell you how I’ve done it.

Before going further, backup everything, and be ready to revert back to your original situation at any moment ! What works for me will not necessary works for you…

To play nice with your visitors, you can setup a temporary maintenance page while we’re performing the migration.

Let’s start the migration by replacing, in the files served by Apache, all occurrences of the old domain name by the new one:

find /var/www/qpx-blog -mount -print -type f -exec sed -i 's/qpx.lich-ti.fr/qpx.coolcavemen.com/g' "{}" \;

If you have doubts about the efficiency of the command above, you can check the presence of the string we’re looking to replace via this command:

grep -RIi "qpx.lich-ti.fr" ./*

Then, we dump the database containing all WordPress content and config to a local file (the command will prompt for password):

mysqldump -p --host=localhost --port=3306 --user=root --opt --databases "qpx_blog" > qpx_dump.sql

And we replace all strings of the old domain by the new one:

sed 's/qpx.lich-ti.fr/qpx.coolcavemen.com/g' qpx_dump.sql > new_qpx.sql

Finally, we re-inject the modified database content after clearing the original:

mysql -p --host=localhost --port=3306 --user=root --execute='DROP DATABASE `qpx_blog`;'
mysql -p --host=localhost --port=3306 --user=root < new_qpx.sql

Now you can disable the maintenance page and test the blog to check nothing’s broken.

Again, to play nice with your visitors (and search engines), you can redirect old URLs to the new domain, with apache directives similar to this one:

<VirtualHost *:80>
  ServerName qpx.lich-ti.fr
  RedirectMatch permanent (.*) http://qpx.coolcavemen.com$1
</VirtualHost>

How-to: e107 autogallery to Zenphoto migration

These past few days I was working on the Cool Cavemen’s photo gallery to move it to a shiny new one, powered by Zenphoto. In this post I will roughly describe how I’ve done it, code and commands included.

The old gallery was based on autogallery, a e107 plugin. We assume here that both e107 and Zenphoto are well configured and installed at the root of you web hosting space (/www in this case).

The first step is to copy the autogallery album structure, with all its content, to Zenphoto:

cd /www
cp -ax ./e107_plugins/autogallery/Gallery/* ./zenphoto/albums/

Then we delete all previews, thumbnails and XML metadatas, to keep in Zenphoto original assets only:

find ./zenphoto/albums/ -iname "*.xml" | xargs rm -f
find ./zenphoto/albums/ -iname "pv_*" | xargs rm -f
find ./zenphoto/albums/ -iname "th_*" | xargs rm -f

By now, you should be able to play with your medias using Zenphoto’s admin interface.

But if you’re unlucky as I was, you will find a strange bug which break down drag’n'drop album sorting. The fix I found was to remove, in photo filenames, the numerical prefix (and the following dot) set by autogallery to define the sort order. This operation should be performed, before the copy from autogallery to Zenphoto (= the first command in this post). By the way, if you know a one-liner to do this, please, please… share ! :)

To migrate comments, I have no automatic solution. I choose to do this manually, editing the database by hand. In my case it was the quickest way as I only had a dozen of comments to migrate.

And last but not least, if you care about measuring the popularity of your photos, you should consider migrating the view counter associated with each of your media. Don’t worry, this time I wrote a script to take care of it automagically. It will generate a bunch of SQL statements you’ll have to execute on your Zenphoto MySQL database. Here is my “e107 autogallery to Zenphoto hit counter migration script” (nice name isn’t it ? ;) ) that do the job:

#!/usr/bin/python

##############################################################################
#
# Copyright (C) 2008 Kevin Deldycke <kevin@deldycke.com>
#
# This program is Free Software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
#
##############################################################################

"""
  Last update: 2008 aug 21
"""

########### User config ###########

AUTOGALLERY_ALBUM_PATH = "/www/e107_plugins/autogallery/Gallery"
ZENPHOTO_ALBUM_PATH    = "/www/zenphoto/albums"
ZENPHOTO_TABLE_PREFIX  = "zenphoto_"

######## End of user config #######

import os, hashlib
import xml.etree.ElementTree as ET

# Calculate hash of a given file
def getHash(path):
  # Calculate the hash from file raw data
  if not os.path.isfile(path):
    return None
  try:
    file_object = open(path, 'r')
    data = file_object.read()
  except:
    return None
  if not len(data):
    return None
  return hashlib.sha224(data).hexdigest()

# Associate each autogallery photo having a hitcounter greater than 0 with its MD5 hash
def populateHashTable(arg, dirname, names):
  global hash_table
  for name in names:
    file_path = os.path.join(dirname, name)
    # print "Get hit count for %s" % file_path
    # Check that the file as a positive hit counter associated with
    xml_file_path = "%s.xml" % file_path
    if not os.path.isfile(xml_file_path):
      continue
    try:
      tree = ET.parse(xml_file_path)
    except:
      continue
    node = tree.find("viewhits")
    if node is None:
      continue
    try:
      hits = int(node.text)
    except:
      continue
    if not hits > 0:
      continue
    # Update hash table with data we care about
    file_hash = getHash(file_path)
    if file_hash is None:
      continue
    hash_table[file_hash] = hits + hash_table.get(file_hash, 0)

# Generate hitcount SQL request for each matching file
def generateSQL(arg, dirname, names):
  global sql
  for name in names:
    file_path = os.path.join(dirname, name)
    # print "Search hitcounter matching file %s" % file_path
    file_hash = getHash(file_path)
    if file_hash is None:
      continue
    if file_hash in hash_table:
      sql += "UPDATE `%simages` SET `hitcounter`=`hitcounter`+%d WHERE `filename`=%r;\n" % (ZENPHOTO_TABLE_PREFIX, hash_table[file_hash], name)

# Core of the script
hash_table = {}
sql        = ""
# Normalize path
source_path = os.path.abspath(AUTOGALLERY_ALBUM_PATH)
dest_path   = os.path.abspath(ZENPHOTO_ALBUM_PATH)

os.path.walk(source_path, populateHashTable, None)
# print repr(hash_table)
os.path.walk(dest_path, generateSQL, None)
print sql

I think code and comments are self-explainatory. And do not forget to update constants at the top of the script to match your installation paths and database’s tables prefix.

And finally, for your information, I tested all of this on following versions:

  • e107 0.7.11
  • autogallery 2.61
  • Zenphoto 1.2
  • Python 2.5.2
  • Linux server

Website Backup Script: bug fix release

14 months after the last release, here is a new version of my website backup script. As you can see in the changelog, this version is essentially released to fix some bugs.

Changelog:

  • Check version of Python (at least v2.4 is required)
  • Rename --debug option to --verbose
  • Add a --dry-run option for testing
  • Remove use of deprecated pexpect methods
  • Add and update some error messages