Today mdadm sent me an email warning that one of my hard drives (/dev/hdd1) had been ejected from my RAID-5 array. After some manipulations (no writes, only reads on the file system to gather information) and a few reboots, I ended up with a file system in a strange state: the folder structure was totally messed up and lots of files had disappeared.

Assuming the problem came from an inconsistent file index, I decided to reset the superblocks of the remaining physical disks:

$ mdadm --zero-superblock /dev/hdc1
$ mdadm --zero-superblock /dev/hdb1

I don’t know why I decided to do so, but it was the stupidest idea of the week. After such violent treatment, my array refused to start:

$ mdadm --assemble /dev/md0 --auto --scan --update=summaries --verbose
mdadm: looking for devices for /dev/md0
mdadm: no RAID superblock on /dev/hdc1
mdadm: /dev/hdc1 has wrong raid level.
mdadm: no RAID superblock on /dev/hdb1
mdadm: /dev/hdb1 has wrong raid level.
mdadm: no devices found for /dev/md0

At that moment I was sure that all my data was lost. I was desperate. My only alternative was to ask Google. So I did.

I spent several minutes browsing the web without much hope. I finally found someone in the same situation as me (sorry, it is in French) on the debian-user-french mailing list.

The solution was to recreate the RAID array. This sounds counter-intuitive: if we recreate a RAID array over an existing one, it will be erased! Right? Wrong! As explained on debian-user-french, mdadm is smart enough to “see” that the disks of the new array were members of a previous one. Knowing that, mdadm does its best (i.e. provided the parameters match the previous array's configuration) to build the new array on top of the old one in a non-destructive way, keeping the disks' content.
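Since everything hinges on the parameters matching the old configuration, it probably helps to write them down while the superblocks still exist. Here is a minimal sketch of that idea, not something I actually ran that day: the `save_superblocks` function, the `BACKUP_DIR` variable and the `MDADM` override (handy for a dry run) are all names I made up for illustration. It relies only on `mdadm --examine`, which reads the member device and prints its RAID metadata.

```shell
#!/bin/sh
# Sketch: save the RAID metadata of each member device to a text file
# before attempting anything destructive.
# BACKUP_DIR and MDADM are hypothetical knobs, not mdadm features.
save_superblocks() {
    dir=${BACKUP_DIR:-./raid-metadata}
    mkdir -p "$dir" || return 1
    for dev in "$@"; do
        out="$dir/$(basename "$dev").examine.txt"
        # --examine only reads the device; it reports level, chunk size,
        # layout and the slot this device occupied in the array.
        if ! ${MDADM:-mdadm} --examine "$dev" > "$out" 2>&1; then
            echo "warning: could not examine $dev" >&2
        fi
    done
}

# Example (as root): save_superblocks /dev/hdc1 /dev/hdb1
```

With those files at hand, the level, chunk size, layout and device order no longer have to be guessed when the array is created again.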

So, here is how I finally recovered my RAID array:

$ mdadm --create /dev/md0 --verbose --level=5 --raid-devices=3 /dev/hdc1 missing /dev/hdb1
mdadm: layout defaults to left-symmetric
mdadm: chunk size defaults to 64K
mdadm: size set to 312568576K
mdadm: array /dev/md0 started.
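Note that the command above relies on mdadm's defaults for the layout and chunk size, and those defaults are not the same in every mdadm release. A more careful variant might spell each value out. The sketch below only prints the command instead of running it; `build_create_cmd` is a hypothetical helper, the level, chunk and layout values come from the output above, and `--metadata=0.90` is purely my assumption about the old array's superblock format, to be checked before use.

```shell
#!/bin/sh
# Sketch: print (do not run) an mdadm --create command with every
# geometry parameter written explicitly.
# LEVEL, CHUNK, LAYOUT and METADATA are hypothetical overrides.
build_create_cmd() {
    md=$1
    shift
    # "$#" counts the remaining arguments, including any "missing" slot,
    # which is what --raid-devices expects.
    printf 'mdadm --create %s --verbose --level=%s --raid-devices=%s' \
        "$md" "${LEVEL:-5}" "$#"
    # Chunk size is given in KiB. METADATA is an assumption, not a known value.
    printf ' --chunk=%s --layout=%s --metadata=%s' \
        "${CHUNK:-64}" "${LAYOUT:-left-symmetric}" "${METADATA:-0.90}"
    # Device order matters: each device must go back into its old slot.
    for dev in "$@"; do
        printf ' %s' "$dev"
    done
    printf '\n'
}

# Example: build_create_cmd /dev/md0 /dev/hdc1 missing /dev/hdb1
```

Printing the command first leaves a chance to compare it against whatever is known about the old array before anything touches the disks.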

Of course this doesn’t solve my initial problem with the /dev/md0 file system: it is still in an altered state. Maybe it’s too late to recover the data. But at least I have reverted all of today’s mistakes, and the situation will not deteriorate until I power up my RAID again! :)