Installation Guide for a full-featured Debian server

Featured

Here is a collection of articles I wrote over the past year. Together they form a guide that will let you set up a full-featured Debian server. All of these tutorials are based on the recent work I did to set up my personal server on Debian Squeeze.

These articles are independent of each other, meaning you can pick the ones you’re interested in to customize your server and ignore the others.

  1. Set up the SMART monitoring tool for HDDs.
  2. Set up Nut to manage the UPS.
  3. Set up Duplicity and Amazon S3 for cloud-based backups.
  4. Set up Exim to relay mails via Gmail.
  5. Set up cron-apt to keep our distribution up to date.
  6. Add a fail2ban daemon.
  7. Set up Munin to monitor our machine.
  8. Basic setup of Nginx + PHP-FPM + MySQL web stack.
  9. Optimizing Nginx + PHP-FPM + MySQL for performance.
  10. Set up PHP APC op-code cache.
  11. Install haveged to get lots of entropy.
  12. Set up a WebDAVs server with Lighttpd.
  13. Set up Mailman + Nginx + Exim for mailing-lists.
  14. Mailman mailing-list migration and merging.

System & Shell commands

  • Run a process detached from the current terminal:
    nohup my_command &
    
  • Get the exit code of the last command run:
    echo $?
    
  • Run the last command as root (source):
    sudo !!
    
  • Show the user under which I’m currently logged in:
    whoami
    
  • If you have the following error:
    -bash: ./myscript.sh: /bin/bash^M: bad interpreter: No such file or directory
    

    Then the fix consists of removing the stray carriage-return characters (Windows line endings):

    sed -i 's/\r//' ./myscript.sh
    
  • Free up some memory by clearing RAM caches (source):
    sync ; echo 3 > /proc/sys/vm/drop_caches
    
  • Display which distro the system is running (source):
    lsb_release -a
    

    or

    cat /etc/lsb-release
    
  • List the most used commands:
    history | awk '{a[$2]++}END{for(i in a){print a[i] " " i}}' | sort -rn | head
    
  • Disable a service on Debian/Ubuntu, then re-enable it:
    update-rc.d -f my-service-name remove
    update-rc.d my-service-name defaults
    
  • Same thing as above but on a RedHat-like system:
    chkconfig sshd --del
    chkconfig sshd --add
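
To double-check the line-ending fix from the list above, here is a minimal sketch that builds a throwaway script with Windows line endings and verifies that sed strips them. It assumes GNU sed (for -i and \r); the temporary file and variable names are purely illustrative.

```shell
# Build a throwaway script with Windows (CRLF) line endings.
tmp_script="$(mktemp)"
printf '#!/bin/bash\r\necho hello\r\n' > "$tmp_script"

# A literal carriage return, used as the grep pattern.
cr="$(printf '\r')"

# Count the lines that contain a carriage return before the fix.
before="$(grep -c "$cr" "$tmp_script")"

# Same substitution as above, anchored to the end of each line.
sed -i 's/\r$//' "$tmp_script"

# grep -c exits with status 1 when it counts zero lines, hence the "|| true".
after="$(grep -c "$cr" "$tmp_script" || true)"

echo "lines with CR before: $before, after: $after"
rm -f "$tmp_script"
```

Anchoring the pattern with $ only removes carriage returns at the end of a line, which is a little more conservative than deleting the first one found anywhere on the line.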
    

Fuse and sshfs on MacOSX Leopard

I’m used to accessing remote machines’ file systems via ssh. My favorite environment, KDE, makes things easy thanks to its support of sftp:// URLs via a kio_slave. MacOSX is not as friendly and doesn’t have any built-in mechanism of that kind.

To get similar features in Leopard, we have to rely on MacFuse and sshfs. I’ll explain here how I’ve installed these components on MacOSX 10.5.

First, download the latest MacFuse dmg and install it. FYI, the version I’ve got was MacFuse 2.0.3,2.

Then, download the sshfs executable for Leopard, either the gzipped version or the binary from the SVN as explained in the MacFuse wiki.

From a terminal, rename the binary:

sudo mv ./sshfs-static-leopard ./sshfs

Then allow the binary to be executed and place it in the system:

sudo chmod +x sshfs
sudo install sshfs /usr/local/bin

From now on, you can test sshfs mounting with the following command:

sshfs user@myserver.net:/folder/ /Network/distant-folder -p 22

I personally had a problem here: sshfs complained about a missing library. I fixed this by downloading the required file from the MacFusion project and copying it beside the sshfs binary:

sudo wget http://www.macfusionapp.org/trac/export/86/trunk/SSHFS/sshnodelay.so
sudo mv ./sshnodelay.so /usr/local/bin/
sudo chmod +x /usr/local/bin/sshnodelay.so

If this fails, you can also check:

  • that the user you’re currently logged in as has access to the remote server with the ssh user@myserver.net command;
  • or that the local mount point exists (you can create it with mkdir -p /Network/distant-folder);
  • and finally, you can add the -o debug option to the sshfs command above to get additional clues.
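
The first two checks above can be wrapped into small helpers. This is only a sketch: the function names ensure_mount_point and check_ssh_login are mine, and the BatchMode and ConnectTimeout options assume a reasonably recent OpenSSH client.

```shell
# Hypothetical helper: make sure the local mount point exists.
ensure_mount_point() {
    mount_point="$1"
    if [ -d "$mount_point" ]; then
        echo "ok: $mount_point exists"
    elif mkdir -p "$mount_point"; then
        echo "created: $mount_point"
    else
        echo "error: cannot create $mount_point" >&2
        return 1
    fi
}

# Hypothetical helper: check that a non-interactive SSH login works.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_ssh_login() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 -p "${2:-22}" "$1" true
}

# Example usage (not run here, as it needs a reachable server):
#   ensure_mount_point /Network/distant-folder && check_ssh_login user@myserver.net 22
```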

Now we will automate the mounting of sshfs at every start.

At this point I recommend registering the root user of your MacOSX system with the remote server, so that the mount can be established without a password prompt:

sudo cat ~/.ssh/id_rsa.pub | sudo ssh -p 22 user@myserver.net "cat >> ~/.ssh/authorized_keys"

If /etc/fstab doesn’t exist yet, we have to create it before editing it:

sudo touch /etc/fstab
sudo vi /etc/fstab

And add the following directives:

dummy:user@myserver.net:/folder/ /Network/distant-folder sshfs allow_other,auto_cache,reconnect,port=22,follow_symlinks,volname="Distant folder" 0 0

As you can see, I’ve added lots of options to suit my needs. You can get more information about sshfs options through the traditional help pages:

sshfs --help

MacOSX’s automount daemon will look for a script called mount_sshfs at start. It doesn’t exist on your system for now, but the sshfs command line is compatible with what automount expects. So creating a symbolic link will do the trick:

sudo ln -s /usr/local/bin/sshfs /sbin/mount_sshfs

Finally, we can tell automount to acknowledge all our modifications:

sudo automount -vc

Heroic journey to RAID-5 data recovery

Last week a power grid failure broke my server’s RAID array. I have no UPS (as I’m a skinflint) and no automatic email alerts (because I’m too lazy to set them up). As a result, for 5 days, my 3-disk RAID-5 array was running on only 2 disks until I noticed the issue…

By using a combination of the following commands, I soon became aware of the gravity of the situation:

cat /proc/mdstat
mdadm --examine /dev/sda1
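
Since the degraded state went unnoticed for days, a check like the one below could have flagged it from the /proc/mdstat output. This is a minimal sketch: the degraded_arrays function name is hypothetical, and it assumes arrays are named mdN and that a missing member shows up as an underscore in the status map (as in [U_U]).

```shell
# Hypothetical helper: print the md arrays whose status map has a missing member.
# Reads /proc/mdstat by default; another file can be given for testing.
degraded_arrays() {
    mdstat_file="${1:-/proc/mdstat}"
    awk '
        # Remember the name of the array whose block we are reading.
        /^md[0-9]+ :/ { dev = $1 }
        # A status map such as [U_U] contains "_" for each missing member.
        /\[[U_]*_[U_]*\]/ { print dev }
    ' "$mdstat_file"
}

# Example usage, for instance from a daily cron job:
#   degraded_arrays | grep -q . && echo "RAID array degraded!"
```

Cron mails any output of a job to its owner, so wiring this into a crontab would be one cheap way to get the missing alert.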

My /dev/sda1 disk had been kicked out of the array, so I did the right thing, which was to add it back and let the array rebuild:

mdadm /dev/md0 -a /dev/sda1

Then, in an unlucky combination of cosmic ray bombardment, spooky action at a distance and astrological misalignment, halfway through the rebuilding process (which can take up to 5 hours), another disk failed! It was late, I was tired and utterly worried about losing 1.5 TB of precious data. In such bad shape, I was afraid of making the situation worse. So I decided to shut down the server and sleep on the problem.

The next day I tried to boot my server, only to find it (surprise!) stuck in the middle of the boot process, with the famous message:

hit control-D to continue or give root password to fix manually

This is “normal”, as my server tried to mount the ext3 filesystem from the /dev/md0 device that had just been assembled by mdadm. Of course md0, although assembled and visible to the system, was not running because only one disk out of three was in a clean state.

I’ll skip the epic subplot in which I wasted days searching for a working keyboard, but I’ll let you imagine how such adventures made my week…

Eventually, I was able to analyze the situation in detail. My first reflex? Check that the disks were not physically dead:

fdisk -l /dev/sda
fdisk -l /dev/sdb
fdisk -l /dev/sdc

“Linux raid partitions” (type code “fd”) were still there. Good. I assumed from this that the disks were not physically damaged. Maybe I should have looked at the S.M.A.R.T. data and statistics (via smartmontools). But remember, I’m lazy (and a bit crazy).

The next step was to get information about the RAID array itself using:

mdadm --detail /dev/md0

which printed the status table below (probably inaccurate, as I reconstructed it afterwards):

Number   Major   Minor   RaidDevice State
   0       0        0        0      removed
   1       0        0        1      faulty removed
   2       8       33        2      active sync   /dev/sdc1
   3       8       17        3      spare

What did this table tell us?

  • The array is up, but not running. One of its devices (sdc1) was clean and active, but that’s not enough to get a working RAID-5.
  • My first attempt to rebuild the array led to an unexpected result: it added sda1 as a spare device (in slot #3).
  • It confirms that sdb1 unexpectedly failed and is now in a bad state (“faulty removed”).

Then I stopped the array and fearlessly tried to (re)assemble it using 3 different methods:

mdadm -S /dev/md0
mdadm -A /dev/md0
mdadm --assemble /dev/md0 --verbose /dev/sd[abc]1
mdadm --assemble --force --scan /dev/md0 --verbose

It always failed with messages like:

mdadm: failed to RUN_ARRAY /dev/md0: Input/output error
mdadm: /dev/md0 assembled from 1 drives and 1 spare - not enough to start the array.

So I examined each drive from mdadm’s point of view:

mdadm -E /dev/sda1
mdadm -E /dev/sdb1
mdadm -E /dev/sdc1
mdadm -E /dev/sd[abc]1 | grep Event

The last command compares the “Events” attribute of all devices. It printed something like:

Events : 0.53120
Events : 0.53108
Events : 0.53120

which indicates that sda1 and sdc1 are more or less in sync (they share the same number) while sdb1 is lagging behind (lower number).
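
With more than three members, spotting the lagging device by eye gets tedious. Here is a minimal sketch that does the comparison: it reads "device events" pairs on standard input and prints the devices whose counter is below the highest one. The stale_members name is hypothetical, and I assume the counter is either a plain integer or the "high.low" pair shown above.

```shell
# Hypothetical helper: read "device events" pairs and print the lagging devices.
stale_members() {
    awk '
        {
            # Old-style superblocks print the counter as "high.low";
            # fold both halves into a single number before comparing.
            n = split($2, part, ".")
            count = (n == 2) ? part[1] * 4294967296 + part[2] : part[1]
            events[$1] = count
            if (count > max) max = count
        }
        END {
            for (dev in events)
                if (events[dev] < max) print dev
        }
    ' | sort
}

# Example with the values shown above:
printf '%s\n' '/dev/sda1 0.53120' '/dev/sdb1 0.53108' '/dev/sdc1 0.53120' | stale_members
```

The pairs themselves would have to be collected by looping over the devices and extracting the Events line from each mdadm -E output.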

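At this point I got the idea of recreating the RAID array without sdb1, relying only on sda1 and sdc1, by using the “magic” (hence dangerous) --assume-clean option. With it, mdadm writes fresh superblocks but doesn’t rebuild, erase or initialize the data: it just tries to assemble the array “as is”. This only works if the devices are listed in their original slot order, with the same RAID level and chunk size as before. Here is the command: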

mdadm --create /dev/md0 --assume-clean --level=5 --verbose --raid-devices=3 /dev/sda1 missing /dev/sdc1

And it worked! :D

I checked the filesystem on md0 and mounted it:

fsck.ext3 -v /dev/md0
mount /dev/md0

I updated my mdadm configuration before rebooting my server:

mdadm --detail --scan >> /etc/mdadm/mdadm.conf
vi /etc/mdadm/mdadm.conf
reboot

But history repeats itself, and again, the system hung during boot. Except this time I knew what was happening: the boot process detected the remaining sdb1 device as part of the old array (the one before the regeneration I did above) and tried to run it. Remembering a post I wrote last year, I zeroed the superblock of sdb1:

mdadm -S /dev/md0
mdadm --zero-superblock /dev/sdb1

A server reboot proved I was right, and my md0 partition was automagically mounted in a degraded state:

localhost:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb1[3] sda1[0] sdc1[2]
      1465143808 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]

unused devices: <none>

I just had to re-add sdb1 to fill the available slot and update the mdadm configuration to get my array back to its initial state:

mdadm --manage /dev/md0 --add /dev/sdb1
mdadm --detail --scan >> /etc/mdadm/mdadm.conf
vi /etc/mdadm/mdadm.conf

System backup script: no more endless lock

I’ve just released a new version of my system-backup.py script.

The main update is about the lock file, which I implemented in the last version to keep the script from running twice (or more) in parallel. This is a nice feature to avoid overlapping processes that fight each other for the same resources. But in some extreme cases (reboot or power failure during backup, …), the lock file will remain and so will prevent the script from starting (until you notice the problem and remove the lock file manually). This new version takes care of this problem and is now able to remove the lock automatically if a timeout is reached. It also kills all remaining child processes.
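
To illustrate the idea, here is a minimal shell sketch of a lock file with a timeout. The script itself is written in Python, so this is not its actual code: the acquire_lock function, the LOCK_TIMEOUT variable and the choice of storing the start time inside the lock file are all assumptions of mine. The check is also not atomic, so two processes starting at the very same instant could still both get the lock.

```shell
# Assumed timeout after which a lock is considered stale (one day, in seconds).
LOCK_TIMEOUT="${LOCK_TIMEOUT:-86400}"

# Hypothetical helper: take the lock, or fail if a fresh lock already exists.
# The lock file stores the start time as seconds since the epoch.
acquire_lock() {
    lock_file="$1"
    now="$(date +%s)"
    if [ -f "$lock_file" ]; then
        started="$(cat "$lock_file" 2>/dev/null)"
        # Treat an empty or corrupted lock file as infinitely old.
        case "$started" in
            ''|*[!0-9]*) started=0 ;;
        esac
        age=$(( now - started ))
        if [ "$age" -lt "$LOCK_TIMEOUT" ]; then
            echo "lock held for ${age}s, giving up" >&2
            return 1
        fi
        echo "stale lock (${age}s old), removing it" >&2
        rm -f "$lock_file"
    fi
    echo "$now" > "$lock_file"
}

# Example usage:
#   acquire_lock /tmp/system-backup.lock || exit 1
#   ... run the backup, then remove the lock file ...
```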

Here is the detailed changelog:

  • Auto-kill the script if the backup process takes too much time. The timeout can be defined via a constant.
  • Clean kill: track all child processes to kill them safely before removing the lock file.
  • Require newer versions of python (>= v2.4), rsync (>= v2.6.7) and rdiff-backup (>= v1.1.0).
  • Use --preserve-numerical-ids option when adding rdiff-backup increment.
  • Keep 15 increments by default instead of 20. This value can be easily changed thanks to a defined constant.
  • Remove deleted files first during mirroring and delete outdated increments before adding a new one to gain space. This strategy is safer for target disks with little remaining free space.
  • Tell rsync to print human-readable values.

System Backup: Auto-Clean and Lock added

I’ve updated the system backup script I released 3 weeks ago to let it automatically clean rdiff-backup folders. This is mandatory because the incremental backup process is transactional, and a power failure or a reboot can break the consistency of the rdiff-backup data repository. So even if such a misfortune happens, the script will be able to revert backups to a previously consistent state.

I’ve also added a locking mechanism to prevent the script from being run twice on the same machine. I’ve added this feature because I start my script every day via cron and some backups can take more than one day.

Finally, all rsync commands will now be run first to reduce the time window during which all external machines are reached and, as mentioned above, because rdiff-backup can take lots of time to finish its job.

Here is a direct link to the new version of the script. You can also find it in my page dedicated to various linux scripts.