Tag Archive for 'script'

How-to: e107 autogallery to Zenphoto migration

These past few days I was working on the Cool Cavemen’s photo gallery to move it to a shiny new one, powered by Zenphoto. In this post I will roughly describe how I’ve done it, code and commands included.

The old gallery was based on autogallery, a e107 plugin. We assume here that both e107 and Zenphoto are well configured and installed at the root of you web hosting space (/www in this case).

The first step is to copy the autogallery album structure, with all its content, to Zenphoto:

cd /www
cp -ax ./e107_plugins/autogallery/Gallery/* ./zenphoto/albums/

Then we delete all previews, thumbnails and XML metadatas, to keep in Zenphoto original assets only:

find ./zenphoto/albums/ -iname "*.xml" | xargs rm -f
find ./zenphoto/albums/ -iname "pv_*" | xargs rm -f
find ./zenphoto/albums/ -iname "th_*" | xargs rm -f

By now, you should be able to play with your medias using Zenphoto’s admin interface.

But if you’re unlucky as I was, you will find a strange bug which break down drag’n'drop album sorting. The fix I found was to remove, in photo filenames, the numerical prefix (and the following dot) set by autogallery to define the sort order. This operation should be performed, before the copy from autogallery to Zenphoto (= the first command in this post). By the way, if you know a one-liner to do this, please, please… share ! :)

To migrate comments, I have no automatic solution. I choose to do this manually, editing the database by hand. In my case it was the quickest way as I only had a dozen of comments to migrate.

And last but not least, if you care about measuring the popularity of your photos, you should consider migrating the view counter associated with each of your media. Don’t worry, this time I wrote a script to take care of it automagically. It will generate a bunch of SQL statements you’ll have to execute on your Zenphoto MySQL database. Here is my “e107 autogallery to Zenphoto hit counter migration script” (nice name isn’t it ? ;) ) that do the job:

#!/usr/bin/python

##############################################################################
#
# Copyright (C) 2008 Kevin Deldycke <kevin@deldycke.com>
#
# This program is Free Software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.
#
##############################################################################

"""
  Last update: 2008 aug 21
"""

########### User config ###########

AUTOGALLERY_ALBUM_PATH = "/www/e107_plugins/autogallery/Gallery"
ZENPHOTO_ALBUM_PATH    = "/www/zenphoto/albums"
ZENPHOTO_TABLE_PREFIX  = "zenphoto_"

######## End of user config #######

import os, hashlib
import xml.etree.ElementTree as ET

# Calculate hash of a given file
def getHash(path):
  # Calculate the hash from file raw data
  if not os.path.isfile(path):
    return None
  try:
    file_object = open(path, 'r')
    data = file_object.read()
  except:
    return None
  if not len(data):
    return None
  return hashlib.sha224(data).hexdigest()

# Associate each autogallery photo having a hitcounter greater than 0 with its MD5 hash
def populateHashTable(arg, dirname, names):
  global hash_table
  for name in names:
    file_path = os.path.join(dirname, name)
    # print "Get hit count for %s" % file_path
    # Check that the file as a positive hit counter associated with
    xml_file_path = "%s.xml" % file_path
    if not os.path.isfile(xml_file_path):
      continue
    try:
      tree = ET.parse(xml_file_path)
    except:
      continue
    node = tree.find("viewhits")
    if node is None:
      continue
    try:
      hits = int(node.text)
    except:
      continue
    if not hits > 0:
      continue
    # Update hash table with data we care about
    file_hash = getHash(file_path)
    if file_hash is None:
      continue
    hash_table[file_hash] = hits + hash_table.get(file_hash, 0)

# Generate hitcount SQL request for each matching file
def generateSQL(arg, dirname, names):
  global sql
  for name in names:
    file_path = os.path.join(dirname, name)
    # print "Search hitcounter matching file %s" % file_path
    file_hash = getHash(file_path)
    if file_hash is None:
      continue
    if file_hash in hash_table:
      sql += "UPDATE `%simages` SET `hitcounter`=`hitcounter`+%d WHERE `filename`=%r;\n" % (ZENPHOTO_TABLE_PREFIX, hash_table[file_hash], name)

# Core of the script
hash_table = {}
sql        = ""
# Normalize path
source_path = os.path.abspath(AUTOGALLERY_ALBUM_PATH)
dest_path   = os.path.abspath(ZENPHOTO_ALBUM_PATH)

os.path.walk(source_path, populateHashTable, None)
# print repr(hash_table)
os.path.walk(dest_path, generateSQL, None)
print sql

I think code and comments are self-explainatory. And do not forget to update constants at the top of the script to match your installation paths and database’s tables prefix.

And finally, for your information, I tested all of this on following versions:

  • e107 0.7.11
  • autogallery 2.61
  • Zenphoto 1.2
  • Python 2.5.2
  • Linux server

Website Backup Script: bug fix release

14 months after the last release, here is a new version of my website backup script. As you can see in the changelog, this version is essentially released to fix some bugs.

Changelog:

  • Check version of Python (at least v2.4 is required)
  • Rename --debug option to --verbose
  • Add a --dry-run option for testing
  • Remove use of deprecated pexpect methods
  • Add and update some error messages

How-to import a Maildir++ folder to Kmail

Let’s say you have a local copy a mail folder you want to browse with Kmail. This folder is normally found on a dedicated mail server and you access it through the IMAP protocol. I was in this situation some days ago and I will tell you how I’ve done it.

Instinctively, I assumed that my folder was of the Maildir format, and Kmail local mails too. So I tried to copy my ~/Maildir folder from the mail server to my local machine (~/.kde/share/apps/kmail/mail/). And that was the result in Kmail:

kmail-no-sub-folders.png

It looks good but it’s not: there is no sub-folders !

After some googling, I found what was wrong: my ~/Maildir folder is not a Maildir, but a Maildir++ folder. This kind of folder is handle by popular IMAP MTA like qmail, Dovecot and courier-imap (which was used on the mail server where my ~/Maildir come from). There is some advantages of using the “++” flavor of Maildir over the classic one, like quotas and sub-folders. Unfortunately Kmail is not able to read the Maildir++ folder structure.

To fix this, I’ve created a tiny python script to migrate a Maildir++ folder to Kmail.

How-to use it ? Simply:

  1. Download it to your disk,
  2. Edit it and change the MAILDIR_SOURCE and KMAILDIR_DEST variables to match your local configuration,
  3. Give it execution privileges,
  4. Run it !

I advise you to try it first in a safe environment (like under a temporary user account). And don’t forget to backup everything before playing with it: because this script work for me doesn’t mean that it will work for you ! ;)

System backup script: no more endless lock

I’ve just released a new version of my system-backup.py script.

The main update is about the lock file, which I implemented in the last version to keep the script to run twice (or more) in parallel. This is a nice feature to avoid overlapping processes that fight each other to use the same ressources. But in some extreme cases (reboot or power failure during backup, …), the lock file will remain and so will prevent the script to start (until you notice the problem and remove the lock file manually). This new version take care of this problem and is now able to remove the lock automatically if a timeout is reached. It also kill all remaining child processes.

Here is the detailed changelog:

  • Auto-kill the script if the backup process take to much time. Timeout can be defined via a constant.
  • Clean kill: track all child processes to kill them safely before removing the lock file.
  • Require newer versions of python (>= v2.4), rsync (>= v2.6.7) and rdiff-backup (>= v1.1.0).
  • Use --preserve-numerical-ids option when adding rdiff-backup increment.
  • Keep 15 increments by default instead of 20. This value can be easily changed thanks to a defined constant.
  • Remove deleted file first during mirroring and delete outdated increments before adding a new one to gain space. This strategy is safer for target disk with low remaining free space.
  • Tell rsync to print human-readable values.

System Backup on Unreliable Link thanks to rdiff-backup and rsync

I’ve just write a brand new script called system-backup.py. It’s similar to my website-backup.py script but instead of website and MySQL databases, it is designed to backup systems of several machines. This script is based on an idea from the “Backup up on unreliable link” article from the official rdiff-backup wiki. It use rdiff-backup to keep the last 20 backups and rsync to speed-up the backup process.

I run this script to backup all the local machines within my LAN. I start the backup process everyday thanks to a cron entry similar to this one:

0 20 * * * root /root/system-backup.py >> /mnt/backup-disk/backup.log

If you need more information about the rsync part the script, please have a look to my previous Remote Backup with rsync article, which detail how-to setup key authentification with ssh.