Tag Archive for 'backup'

Auto-saving a file at regular intervals

Here is a way to autosave a file at regular intervals: use cron!

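The trick is knowing that cron requires percent signs (%) in the command field to be escaped with a backslash: an unescaped % is turned into a newline, and everything after it is fed to the command on standard input. For example, here is my crontab entry that creates a local backup of an important project file I’m currently working on every 10 minutes (it uses the system-wide /etc/crontab format, which includes a user field, here kevin):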

*/10 * * * * kevin cp "/home/kevin/Desktop/Projects/Very Important Project/project.file" "/home/kevin/Desktop/Projects/Very Important Project/project.file-backup-`date +\%s`"
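To make the escaping rule concrete, here is a minimal Python sketch that generates such an entry. The helper names escape_percent and backup_entry are hypothetical. The sketch assumes the /etc/crontab format with a user field, and it uses $(date +%s), which the shell expands just like the backticks above.

```python
import shlex


def escape_percent(command: str) -> str:
    """Backslash-escape every % so cron hands it to the shell unchanged
    (an unescaped % in the command field would become a newline)."""
    return command.replace("%", "\\%")


def backup_entry(path: str, user: str, every_minutes: int = 10) -> str:
    """Hypothetical helper: build an /etc/crontab line (with a user field)
    that copies `path` to a timestamped backup next to it."""
    src = shlex.quote(path)  # protects spaces in the path
    # Quote the fixed part of the name, then append the date substitution
    # in double quotes so the shell still expands it at run time.
    dst = shlex.quote(path + "-backup-") + '"$(date +%s)"'
    command = f"cp {src} {dst}"
    return f"*/{every_minutes} * * * * {user} {escape_percent(command)}"


print(backup_entry(
    "/home/kevin/Desktop/Projects/Very Important Project/project.file",
    "kevin",
))
```

The only character cron cares about here is %. The quoting is for the shell, which receives the command once cron has undone the escaping.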

Quick and dirty, but it may save you precious time on unstable machines! ;)

Heroic journey to RAID-5 data recovery

Last week, a power grid failure broke my server’s RAID array. I have no UPS (as I’m a skinflint) and no automatic email alerts (because I’m too lazy to set them up). As a result, my 3-disk RAID-5 array ran on only 2 disks for 5 days until I noticed the issue…

Using a combination of the following commands, I soon realized how serious the situation was:

cat /proc/mdstat
mdadm --examine /dev/sda1

My /dev/sda1 partition had been kicked out of the array, so I did the right thing and re-added it to rebuild the array:

mdadm /dev/md0 -a /dev/sda1

Then, in an unlucky combination of cosmic ray bombardment, spooky action at a distance and astrological misalignment, another disk failed halfway through the rebuild (which can take up to 5 hours)! It was late, I was tired and utterly worried about losing 1.5 TB of precious data. In such a state, I was afraid of making things worse, so I decided to shut down the server and sleep on it.

The next day, I booted my server only to find it (surprise!) stuck in the middle of the boot process, with the famous message:

hit control-D to continue or give root password to fix manually

This is “normal”: my server tried to mount the ext3 filesystem on /dev/md0, which mdadm had just assembled. But although md0 was assembled and visible to the system, it was not running, because only one disk out of three was in a clean state.

I’ll skip the epic side story in which I wasted days searching for a working keyboard, but I’ll let you imagine how such adventures made my week…

Eventually, I was able to analyze the situation in detail. My first reflex? Check that the disks were not physically dead:

fdisk -l /dev/sda
fdisk -l /dev/sdb
fdisk -l /dev/sdc

The “Linux raid autodetect” partitions (type code “fd”) were still there. Good. From there, I assumed that the disks were not physically damaged. Maybe I should have looked at the S.M.A.R.T. data and statistics (via smartmontools). But remember, I’m lazy (and a bit crazy).

The next step was to get information about the RAID array itself, using:

mdadm --detail /dev/md0

which printed the status table below (possibly inaccurate, as I reconstructed it afterwards):

Number   Major   Minor   RaidDevice State
   0       0        0        0      removed
   1       0        0        1      faulty removed
   2       8       33        2      active sync   /dev/sdc1
   3       8       17        3      spare

What does this table tell us?

  • The array is assembled, but not running. Only one of its devices (sdc1) is clean and active, which is not enough to run a RAID-5 array.
  • My first attempt to rebuild the array led to an unexpected result: it added sda1 as a spare device (in slot #3).
  • It confirms that sdb1 unexpectedly failed and is now in a bad state (“faulty removed”).
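Why is one clean disk not enough? In RAID-5, each stripe stores data blocks plus one parity block that is their XOR. Any single missing block can therefore be recomputed from the others, but two missing blocks cannot. The toy Python model below illustrates this with a 3-disk stripe. It is a simplification that ignores real md details such as parity rotation across disks, chunk size and superblocks. The functions xor_blocks and reconstruct are hypothetical names.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild a RAID-5 stripe in which at most one block is None.

    The XOR of all blocks in a stripe (data + parity) is zero, so a single
    missing block equals the XOR of the surviving ones.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("more than one block lost: the stripe cannot be rebuilt")
    rebuilt = list(stripe)
    if missing:
        rebuilt[missing[0]] = xor_blocks(*(b for b in stripe if b is not None))
    return rebuilt


data0, data1 = b"RAID", b"five"
parity = xor_blocks(data0, data1)
print(reconstruct([data0, None, parity]))  # the lost block comes back
```

With sdb1 faulty and sda1 demoted to a spare, md had exactly one usable member out of three. That is the “more than one block lost” case for every stripe.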

Then I stopped the array and fearlessly tried to (re)assemble it using 3 different methods:

mdadm -S /dev/md0
mdadm -A /dev/md0
mdadm --assemble /dev/md0 --verbose /dev/sd[abc]1
mdadm --assemble --force --scan /dev/md0 --verbose

It always failed with messages like:

mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
mdadm: /dev/md0 assembled from 1 drives and 1 spare - not enough to start the array.

So I examined each drive from mdadm’s point of view:

mdadm -E /dev/sda1
mdadm -E /dev/sdb1
mdadm -E /dev/sdc1
mdadm -E /dev/sd[abc]1 | grep Event

The last command compares the “Events” counter of all devices. It printed something like:

Events : 0.53120
Events : 0.53108
Events : 0.53120

which indicates that sda1 and sdc1 are in sync (they share the same count), while sdb1 is “late” (lower count).
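The same comparison can be scripted. This hypothetical Python sketch takes the mdadm -E output of each member, extracts the Events value, and lists the members that lag behind the highest count. It treats the counter as dot-separated integers, so it accepts both the “0.53120” form shown above and a plain integer.

```python
import re

# Matches e.g. "         Events : 0.53120" or "Events : 53120"
EVENTS_RE = re.compile(r"^\s*Events\s*:\s*([\d.]+)", re.MULTILINE)


def event_count(examine_output: str) -> tuple:
    """Extract the Events counter from `mdadm -E` output as a tuple of ints."""
    match = EVENTS_RE.search(examine_output)
    if match is None:
        raise ValueError("no Events line found")
    return tuple(int(part) for part in match.group(1).split("."))


def stale_devices(examined: dict) -> list:
    """Given {device: mdadm -E output}, return the devices whose event
    count is lower than the highest one (they missed some updates)."""
    counts = {dev: event_count(text) for dev, text in examined.items()}
    newest = max(counts.values())
    return sorted(dev for dev, count in counts.items() if count < newest)
```

To use it for real, you would fill the dictionary with the output of mdadm -E for each member (as root). A device it reports is the one to leave out, as sdb1 was here.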

This gave me the idea of recreating the RAID array without sdb1, relying only on sda1 and sdc1, by using the “magic” (hence dangerous) --assume-clean option. With it, mdadm --create writes new RAID superblocks but skips the initial resync, so the data already on the disks is left as is. Here is the command:

mdadm --create /dev/md0 --assume-clean --level=5 --verbose --raid-devices=3 /dev/sda1 missing /dev/sdc1
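The order of the devices on this command line is what makes it work. Each position becomes a RaidDevice slot, so the order must match the original one: sda1 in slot 0, the failed sdb1 in slot 1 replaced by the keyword missing, and sdc1 in slot 2. The geometry must match too. The hypothetical helper below only builds that argument list from a slot table. It leaves room for options such as --metadata, --chunk or --layout in case mdadm’s defaults differ from those the array was originally created with. It runs nothing by itself.

```python
from typing import List, Optional, Sequence


def recreate_command(md_device: str, level: int, slots: Sequence[Optional[str]],
                     extra_options: Sequence[str] = ()) -> List[str]:
    """Build an `mdadm --create --assume-clean` argument list.

    `slots` lists the members by their original RaidDevice number; None marks
    a member to leave out, which mdadm expects as the keyword "missing".
    """
    if level == 5 and sum(dev is None for dev in slots) > 1:
        raise ValueError("RAID-5 can only start with one member missing")
    return (["mdadm", "--create", md_device, "--assume-clean",
             f"--level={level}", "--verbose", f"--raid-devices={len(slots)}"]
            + list(extra_options)  # e.g. geometry options, if defaults differ
            + [dev if dev is not None else "missing" for dev in slots])


print(" ".join(recreate_command("/dev/md0", 5, ["/dev/sda1", None, "/dev/sdc1"])))
```

The printed line is the command used above. Getting the slot order wrong would produce an array that assembles but contains garbage, so double-check it against mdadm -E before running anything.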

And it worked! :D

I checked the filesystem on md0, then mounted it:

fsck.ext3 -v /dev/md0
mount /dev/md0

I updated my mdadm configuration before rebooting my server:

mdadm --detail --scan >> /etc/mdadm/mdadm.conf
vi /etc/mdadm/mdadm.conf
reboot
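Because >> appends, the file may end up with two ARRAY lines for /dev/md0: the old one and the fresh one (re-creating the array gives it a new UUID). Cleaning that up is presumably what the vi step is for. A hypothetical Python helper that does the same cleanup could look like the sketch below. It assumes each ARRAY definition fits on a single line, with no continuation lines.

```python
def keep_last_array_lines(conf_text: str) -> str:
    """Drop every ARRAY line whose device is defined again further down,
    keeping only the last (most recently appended) definition."""
    lines = conf_text.splitlines()
    last_index = {}
    for i, line in enumerate(lines):
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "ARRAY":
            last_index[fields[1]] = i  # later lines overwrite earlier ones
    kept = []
    for i, line in enumerate(lines):
        fields = line.split()
        is_array = len(fields) >= 2 and fields[0] == "ARRAY"
        if is_array and last_index[fields[1]] != i:
            continue  # outdated definition of the same device
        kept.append(line)
    return "\n".join(kept) + "\n"
```

It compares device names literally, so /dev/md0 and /dev/md/0 would be treated as different arrays. Review the result before writing it back to /etc/mdadm/mdadm.conf.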

But history repeated itself, and once again the system hung during boot. Except this time I knew what was happening: the boot process detected the leftover sdb1 device as a member of the old array (the one from before the re-creation above) and tried to run it. Remembering last year’s post, I zeroed the superblock of sdb1:

mdadm -S /dev/md0
mdadm --zero-superblock /dev/sdb1

A server reboot proved me right: my md0 array was automagically assembled and mounted, in a degraded state:

localhost:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb1[3] sda1[0] sdc1[2]
      1465143808 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]

unused devices: <none>

I just had to re-add sdb1 to fill the empty slot and update the mdadm configuration to get my array back to its initial state:

mdadm --manage /dev/md0 --add /dev/sdb1
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
vi /etc/mdadm/mdadm.conf

Website Backup Script: bug fix release

14 months after the last release, here is a new version of my website backup script. As you can see in the changelog, this is essentially a bug fix release.

Changelog:

  • Check version of Python (at least v2.4 is required)
  • Rename --debug option to --verbose
  • Add a --dry-run option for testing
  • Remove use of deprecated pexpect methods
  • Add and update some error messages
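The first three items of this changelog could be sketched as follows. This is a hypothetical modern-Python illustration, not the script’s actual code. It uses argparse, whereas a script that still supports Python 2.4 would have to rely on optparse. The names MIN_PYTHON, check_python and run are made up.

```python
import argparse
import subprocess
import sys

MIN_PYTHON = (2, 4)  # minimum version from the changelog


def check_python(version_info=sys.version_info, minimum=MIN_PYTHON) -> None:
    """Exit with an error message if the interpreter is too old."""
    if tuple(version_info[:2]) < minimum:
        sys.exit("Python %d.%d or newer is required" % minimum)


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Website backup (sketch)")
    parser.add_argument("--verbose", action="store_true",
                        help="print each command before running it")
    parser.add_argument("--dry-run", action="store_true",
                        help="print the commands without running them")
    return parser.parse_args(argv)


def run(command, options) -> bool:
    """Run `command` unless --dry-run is set; return True if it was executed."""
    if options.verbose or options.dry_run:
        print(("[dry-run] " if options.dry_run else "") + " ".join(command))
    if options.dry_run:
        return False
    subprocess.run(command, check=True)
    return True
```

Routing every external command through a single run() helper is what makes a --dry-run option cheap to add: only one place has to check it.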

How to import a Maildir++ folder into Kmail

Let’s say you have a local copy of a mail folder you want to browse with Kmail. This folder normally lives on a dedicated mail server, which you access through the IMAP protocol. I was in this situation a few days ago, and here is how I handled it.

Instinctively, I assumed that my folder was in Maildir format, like Kmail’s local mail folders. So I tried to copy my ~/Maildir folder from the mail server to my local machine (into ~/.kde/share/apps/kmail/mail/). Here is the result in Kmail:

[Screenshot: the copied folder in Kmail, without any sub-folders]

It looks good, but it’s not: there are no sub-folders!

After some googling, I found what was wrong: my ~/Maildir folder is not a plain Maildir, but a Maildir++ folder. This format is used by popular mail software such as qmail, Dovecot and Courier-IMAP (which ran on the mail server my ~/Maildir came from). The “++” flavor of Maildir has some advantages over the classic one, such as quotas and sub-folders. Unfortunately, Kmail is not able to read the Maildir++ folder structure.

To work around this, I wrote a tiny Python script that migrates a Maildir++ folder to Kmail.

How to use it? Simply:

  1. Download it to your disk,
  2. Edit it and change the MAILDIR_SOURCE and KMAILDIR_DEST variables to match your local configuration,
  3. Give it execution privileges,
  4. Run it!

I advise you to try it first in a safe environment (such as a temporary user account). And don’t forget to back up everything before playing with it: just because this script works for me doesn’t mean it will work for you! ;)
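The heart of such a migration is translating folder names between the two layouts. The following is a minimal sketch of that step only, not the script mentioned above. Maildir++ keeps every sub-folder flat in the root, with “.” as the hierarchy separator (for example ~/Maildir/.Work.Projects). The sketch assumes that Kmail’s local layout stores the sub-folders of “Work” in a hidden “.Work.directory” folder next to the “Work” maildir. The function name kmail_path is hypothetical.

```python
from pathlib import PurePosixPath


def kmail_path(maildirpp_name: str) -> PurePosixPath:
    """Map a Maildir++ folder name such as ".Work.Projects" to a path relative
    to Kmail's local mail directory (assumed layout, see above)."""
    if not maildirpp_name.startswith(".") or len(maildirpp_name) < 2:
        raise ValueError(f"not a Maildir++ sub-folder name: {maildirpp_name!r}")
    parts = maildirpp_name[1:].split(".")  # "." separates hierarchy levels
    parents = [f".{part}.directory" for part in parts[:-1]]
    return PurePosixPath(*parents, parts[-1])


for name in (".Sent", ".Work", ".Work.Projects"):
    print(name, "->", kmail_path(name))
```

A complete migration would then copy each folder’s cur, new and tmp sub-directories to the mapped location, making sure every parent folder (such as Work) also exists as a maildir. The top-level ~/Maildir itself is the INBOX.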

System backup script: no more endless lock

I’ve just released a new version of my system-backup.py script.

The main update concerns the lock file, which I implemented in the last version to prevent the script from running twice (or more) in parallel. This is a nice feature that stops overlapping processes from fighting each other over the same resources. But in some extreme cases (reboot or power failure during a backup, …), the lock file remains and prevents the script from starting (until you notice the problem and remove the lock file manually). This new version takes care of this problem: it can now remove the lock automatically once a timeout is reached. It also kills all remaining child processes.
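Here is a minimal sketch of a lock that expires. It is an illustration, not the script’s actual code. It assumes the lock’s age is judged by the file’s modification time, and LOCK_TIMEOUT is a hypothetical constant. Creation uses O_EXCL, so only one process can win the race to create the file.

```python
import os
import time

LOCK_TIMEOUT = 6 * 3600  # hypothetical constant: seconds before a lock is stale


def acquire_lock(path: str, timeout: float = LOCK_TIMEOUT) -> bool:
    """Try to take the lock; return True on success, False if a fresh lock exists.

    A lock file older than `timeout` seconds (judged by its modification
    time) is assumed to be left over from a crash or reboot and is removed.
    """
    try:
        if time.time() - os.path.getmtime(path) > timeout:
            os.remove(path)  # stale lock from an interrupted run
    except FileNotFoundError:
        pass  # no lock yet, or another process removed it first
    try:
        # O_EXCL makes creation atomic: it fails if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as lock_file:
        lock_file.write(f"{os.getpid()}\n")  # handy when debugging
    return True


def release_lock(path: str) -> None:
    os.remove(path)
```

An age-based lock only makes sense if a live backup can never outlast the timeout. That is why it pairs naturally with the auto-kill timeout in the changelog.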

Here is the detailed changelog:

  • Auto-kill the script if the backup process takes too much time. The timeout can be defined via a constant.
  • Clean kill: track all child processes to kill them safely before removing the lock file.
  • Require newer versions of python (>= v2.4), rsync (>= v2.6.7) and rdiff-backup (>= v1.1.0).
  • Use the --preserve-numerical-ids option when adding an rdiff-backup increment.
  • Keep 15 increments by default instead of 20. This value can be easily changed thanks to a defined constant.
  • Remove deleted files first during mirroring, and delete outdated increments before adding a new one, to free up space. This strategy is safer for target disks with little free space left.
  • Tell rsync to print human-readable values.
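The auto-kill and clean-kill items can be approached in several ways. The hypothetical sketch below uses POSIX process groups rather than tracking each child PID individually. It starts the backup command in a new session, and on timeout it signals the whole group: SIGTERM first, then SIGKILL. It needs Python 3.3 or newer (for wait(timeout=...)), unlike the Python 2.4 baseline of the original script. BACKUP_TIMEOUT is a made-up constant.

```python
import os
import signal
import subprocess
from typing import Optional, Sequence

BACKUP_TIMEOUT = 4 * 3600  # hypothetical constant, in seconds


def run_with_timeout(command: Sequence[str],
                     timeout: float = BACKUP_TIMEOUT) -> Optional[int]:
    """Run `command` in a new session (hence its own process group).

    If it is still running after `timeout` seconds, send SIGTERM to the whole
    group, then SIGKILL if it does not exit within 10 more seconds.
    Returns the exit status, or None if the command had to be killed.
    """
    proc = subprocess.Popen(command, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        pass
    try:
        os.killpg(proc.pid, signal.SIGTERM)  # group id == leader's pid
        proc.wait(timeout=10)
    except ProcessLookupError:
        pass  # the group already exited
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)
    proc.wait()  # reap the leader in every case
    return None
```

Children that start a session of their own would escape the group signal. For rsync and rdiff-backup, which simply fork, the group covers the whole process tree. Only after run_with_timeout returns would the lock file be removed.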