Installation Guide for a full-featured Debian server

Here is a collection of articles I wrote during the past year. Together they form a guide that will let you set up a full-featured Debian server. All of these tutorials are based on the recent work I did to set up my personal server on Debian Squeeze.

These articles are independent of each other, meaning you can pick the ones you’re interested in to customize your server and ignore the others.

  1. Set up a SMART monitoring tool for HDDs.
  2. Set up Nut to manage the UPS.
  3. Set up Duplicity and Amazon S3 for cloud-based backups.
  4. Set up Exim to relay mails via Gmail.
  5. Set up cron-apt to keep our distribution up to date.
  6. Add a fail2ban daemon.
  7. Set up Munin to monitor our machine.
  8. Basic setup of Nginx + PHP-FPM + MySQL web stack.
  9. Optimizing Nginx + PHP-FPM + MySQL for performance.
  10. Set up PHP APC op-code cache.
  11. Install haveged to get lots of entropy.
  12. Set up a WebDAVs server with Lighttpd.
  13. Set up Mailman + Nginx + Exim for mailing-lists.
  14. Mailman mailing-list migration and merging.

Mailman migration

Last week I detailed how I configured Mailman with Exim and Nginx on Debian Squeeze. Here are some more notes on how I migrated my mailing lists from my old server (Lenny with Mailman 2.1.11) to the new Mailman installation (Squeeze with Mailman 2.1.13).

First, I remove the default mailman meta-list as I will retrieve the one from the old server:

$ /etc/init.d/mailman stop
$ rmlist -a mailman
$ /var/lib/mailman/bin/genaliases

Then I copy mailing-list data from the old server to the new:

$ rsync --progress -vrae "ssh -C" /var/lib/mailman/lists    root@new.example.com:/var/lib/mailman/
$ rsync --progress -vrae "ssh -C" /var/lib/mailman/archives root@new.example.com:/var/lib/mailman/
$ rsync --progress -vrae "ssh -C" /var/lib/mailman/data     root@new.example.com:/var/lib/mailman/

Back on the new server, fix file ownership, check that all lists are there, and run the automatic update:

$ chown -R list:list /var/lib/mailman/
$ /etc/init.d/mailman start
$ list_lists
$ /var/lib/mailman/bin/update

Now let Mailman check its databases and fix permissions:

$ check_db -a -v
$ check_perms -f -v

At this point you may get this error in your /var/log/exim4/mainlog:

2011-09-13 10:06:09 failed to expand condition "${lookup{$local_part@$domain}lsearch{/var/lib/mailman/data/virtual-mailman}{1}{0}}" for mailman_router router: failed to open /var/lib/mailman/data/virtual-mailman for linear search: Permission denied (euid=101 egid=103)

This can be fixed with:

$ chgrp Debian-exim /var/lib/mailman/data/virtual-mailman

You may also encounter this error:

2011-09-13 10:06:09 H=mail-xxx-xxxx.google.com [209.85.000.000] F=<kevin@example.com> rejected RCPT <kev-test@lists.example.com>: Unrouteable address

In this case regenerating Mailman aliases should fix the issue:

$ /var/lib/mailman/bin/genaliases

By the way, to test that Exim is routing mails as expected, you can use the following command:

$ exim -bt kev-test@lists.example.com
R: system_aliases for kev-test@lists.example.com
R: mailman_router for kev-test@lists.example.com
kev-test@lists.example.com
  router = mailman_router, transport = mailman_transport
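
To repeat this routing check for several addresses, the router name can be pulled out of the `exim -bt` output. The helper below is a minimal sketch: `router_of` is a hypothetical name, and it assumes the output format shown above, where the last line reads `router = ..., transport = ...`.

```shell
#!/bin/sh
# Hypothetical helper: print the router that accepted an address,
# parsed from `exim -bt` output read on stdin.
# Assumes the "  router = NAME, transport = NAME" line shown above.
router_of() {
  sed -n 's/^[[:space:]]*router = \([^,]*\),.*$/\1/p'
}

# Example use on a live system (not run here):
#   exim -bt kev-test@lists.example.com | router_of
```

An address handled by Mailman should report `mailman_router`; anything else suggests the address is being picked up by another router first.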

The last problem I had was that mails did not reach my server. Every time I sent something from Gmail to a list, I got back an error mail saying this:

Technical details of permanent failure:
Google tried to deliver your message, but it was rejected by the recipient domain. We recommend contacting the other email provider for further information about the cause of this error. The error that the other server returned was: 550 550 relay not permitted (state 14).

I fixed this issue by updating my SPF record on the example.com domain from:

v=spf1 a mx ~all

to:

v=spf1 a mx ptr ~all
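
After changing the record, it can be worth confirming what DNS actually publishes. The sketch below checks whether a given term appears in an SPF string. `has_spf_mechanism` is a hypothetical helper name, and it only does an exact match on whitespace-separated terms: it does not understand qualifiers such as `+ptr` or arguments such as `ptr:example.com`.

```shell
#!/bin/sh
# Hypothetical helper: succeed if SPF record $1 contains the exact term $2.
has_spf_mechanism() {
  # $1 is left unquoted on purpose so the record is split on whitespace.
  for term in $1; do
    [ "$term" = "$2" ] && return 0
  done
  return 1
}

# Example use against live DNS (not run here). dig wraps TXT data in
# double quotes, so they are stripped first:
#   has_spf_mechanism "$(dig +short TXT example.com | tr -d '"')" ptr && echo "ptr is published"
```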

How-to setup Mailman + Nginx + Exim on Debian Squeeze

Before going further, please take note that I start this tutorial assuming that you already have a minimal Exim setup running on your Debian machine.

Mailman

Now that you have the context, let’s proceed with Mailman install:

$ aptitude install mailman

During the installation, you’ll be prompted for the language files you want the Mailman web interface to support. English is enough for me.

Now Mailman requires a meta-mailing-list from which it will send all mails related to subscription, reminders and all:

$ newlist mailman kevin@deldycke.com

You’ll then be prompted for a password.

After that, Mailman will provide you with a list of directives to add to /etc/aliases:

mailman:              "|/var/lib/mailman/mail/mailman post mailman"
mailman-admin:        "|/var/lib/mailman/mail/mailman admin mailman"
mailman-bounces:      "|/var/lib/mailman/mail/mailman bounces mailman"
mailman-confirm:      "|/var/lib/mailman/mail/mailman confirm mailman"
mailman-join:         "|/var/lib/mailman/mail/mailman join mailman"
mailman-leave:        "|/var/lib/mailman/mail/mailman leave mailman"
mailman-owner:        "|/var/lib/mailman/mail/mailman owner mailman"
mailman-request:      "|/var/lib/mailman/mail/mailman request mailman"
mailman-subscribe:    "|/var/lib/mailman/mail/mailman subscribe mailman"
mailman-unsubscribe:  "|/var/lib/mailman/mail/mailman unsubscribe mailman"

You don’t need to add them, as Exim will handle these aliases automatically.

You can now start the Mailman server:

$ /etc/init.d/mailman start

Oh, and the first time you run Mailman, do a start as above, not a restart, or you’ll end up with this error:

Restarting Mailman master qrunner: mailmanctl PID unreadable in: /var/run/mailman/mailman.pid
[Errno 2] No such file or directory: '/var/run/mailman/mailman.pid'
Is qrunner even running?

If everything is all right, you’ll receive a notification mail from Mailman.

Nginx

Now we have to configure our HTTP server to make the administration interface available from the web. Although Apache is the recommended server to use with Mailman, Nginx is already running on my machine, so let’s use it instead.

First, as explained on the Nginx wiki, we need to install fcgiwrap:

$ aptitude install fcgiwrap

Then we have to create an Nginx configuration file dedicated to Mailman. Assuming we want all mailing-lists managed under the lists.example.com domain, here are the directives you have to put in a new /etc/nginx/sites-available/mailman file:

server {
  server_name lists.example.com;

  root /usr/lib/cgi-bin;

  location = / {
    rewrite ^ /mailman/listinfo permanent;
  }

  location / {
    rewrite ^ /mailman$uri;
  }

  location /mailman {
    include /etc/nginx/fastcgi_params;
    # Fastcgi socket
    fastcgi_pass  unix:/var/run/fcgiwrap.socket;
    # Disable gzip (it makes scripts feel slower since they have to complete
    # before getting gzipped)
    gzip off;
  }

  location /images/mailman {
    alias /var/lib/mailman/icons;
  }

  location /pipermail {
    alias /var/lib/mailman/archives/public;
    autoindex on;
  }
}

server {
  server_name *.lists.example.com .lists.example.org .lists.example.net;
  rewrite ^ http://lists.example.com$request_uri? permanent;
}

The configuration above is a mix between the one available on the Nginx wiki and the /usr/share/doc/fcgiwrap/examples/nginx.conf example file that comes with the Debian package.

All we have to do now is activate the configuration above and restart our CGI and HTTP servers:

$ ln -s /etc/nginx/sites-available/mailman /etc/nginx/sites-enabled/
$ /etc/init.d/fcgiwrap restart
$ /etc/init.d/nginx restart

If everything’s OK, going to http://lists.example.com will redirect you to Mailman’s list overview page at /mailman/listinfo.

Exim

Now we have to set up the MTA. All the information here comes from the documentation you can find on your Debian system in /usr/share/doc/mailman/README.Exim4.Debian.gz.

First, we have to update /etc/mailman/mm_cfg.py (the global Mailman configuration file). There we’ll align the default URLs, hosts and MTA-related parameters:

--- /etc/mailman/mm_cfg.py.orig    2011-08-31 22:28:53.000000000 +0200
+++ /etc/mailman/mm_cfg.py 2011-09-07 22:43:41.000000000 +0200
@@ -57,16 +57,16 @@
 #-------------------------------------------------------------
 # If you change these, you have to configure your http server
 # accordingly (Alias and ScriptAlias directives in most httpds)
-DEFAULT_URL_PATTERN = 'http://%s/cgi-bin/mailman/'
-PRIVATE_ARCHIVE_URL = '/cgi-bin/mailman/private'
+DEFAULT_URL_PATTERN = 'http://%s/mailman/'
+PRIVATE_ARCHIVE_URL = '/mailman/private'
 IMAGE_LOGOS         = '/images/mailman/'

 #-------------------------------------------------------------
 # Default domain for email addresses of newly created MLs
-DEFAULT_EMAIL_HOST = 'server123.example.net'
+DEFAULT_EMAIL_HOST = 'lists.example.com'
 #-------------------------------------------------------------
 # Default host for web interface of newly created MLs
-DEFAULT_URL_HOST   = 'server123.example.net'
+DEFAULT_URL_HOST   = 'lists.example.com'
 #-------------------------------------------------------------
 # Required when setting any of its arguments.
 add_virtualhost(DEFAULT_URL_HOST, DEFAULT_EMAIL_HOST)
@@ -94,7 +94,10 @@
 # Uncomment if you use Postfix virtual domains (but not
 # postfix-to-mailman.py), but be sure to see
 # /usr/share/doc/mailman/README.Debian first.
-# MTA='Postfix'
+MTA = 'Postfix'
+POSTFIX_ALIAS_CMD = '/bin/true'
+POSTFIX_MAP_CMD = 'chgrp Debian-exim'
+POSTFIX_STYLE_VIRTUAL_DOMAINS = ['lists.example.com']

 #-------------------------------------------------------------
 # Uncomment if you want to filter mail with SpamAssassin. For

Then we have to update the Exim configuration template. If, like me, you haven’t chosen to split the configuration into small files, here are the modifications you have to make to /etc/exim4/exim4.conf.template:

--- /etc/exim4/exim4.conf.template.orig 2011-09-07 23:34:53.000000000 +0200
+++ /etc/exim4/exim4.conf.template       2011-09-07 23:44:45.000000000 +0200
@@ -395,6 +395,21 @@
 ### end main/03_exim4-config_tlsoptions
 #####################################################
 #####################################################
+### main/04_local_mailman_macros
+#####################################################
+# Home dir for your Mailman installation -- aka Mailman's prefix
+# directory.
+MAILMAN_HOME=/var/lib/mailman
+MAILMAN_WRAP=MAILMAN_HOME/mail/mailman
+
+# User and group for Mailman, should match your --with-mail-gid
+# switch to Mailman's configure script.
+MAILMAN_USER=list
+MAILMAN_GROUP=daemon
+#####################################################
+### end main/04_local_mailman_macros
+#####################################################
+#####################################################
 ### main/90_exim4-config_log_selector
 #####################################################

@@ -1371,6 +1386,44 @@
 ### end router/900_exim4-config_local_user
 #####################################################
 #####################################################
+### router/970_local_mailman
+#####################################################
+# Messages get sent out with
+# envelope from "mailman-bounces@virtual_domain"
+# But mailman doesn't put such addresses
+# in the aliases. Recognise these here.
+mailman_workaround:
+  debug_print = "R: mailman_workaround for $local_part@$domain"
+  domains = +local_domains
+  require_files = MAILMAN_HOME/lists/$local_part/config.pck
+  driver = accept
+  local_parts = mailman
+  local_part_suffix_optional
+  local_part_suffix = -bounces : -bounces+* : \
+           -confirm+* : -join : -leave : \
+           -subscribe : -unsubscribe : \
+           -owner : -request : -admin : -loop
+  transport = mailman_transport
+  group = MAILMAN_GROUP
+
+# Mailman lists
+mailman_router:
+  debug_print = "R: mailman_router for $local_part@$domain"
+  domains = +local_domains
+  condition = ${lookup{$local_part@$domain}lsearch{MAILMAN_HOME/data/virtual-mailman}{1}{0}}
+  require_files = MAILMAN_HOME/lists/$local_part/config.pck
+  driver = accept
+  local_part_suffix_optional
+  local_part_suffix = -bounces : -bounces+* : \
+                      -confirm+* : -join : -leave : \
+                      -subscribe : -unsubscribe : \
+                      -owner : -request : -admin : -loop
+  transport = mailman_transport
+  group = MAILMAN_GROUP
+#####################################################
+### end router/970_local_mailman
+#####################################################
+#####################################################
 ### router/mmm_mail4root
 #####################################################

@@ -1689,6 +1742,25 @@
 ### end transport/35_exim4-config_address_directory
 #####################################################
 #####################################################
+### transport/40_local_mailman
+#####################################################
+mailman_transport:
+  debug_print = "T: mailman_transport for $local_part@$domain"
+  driver = pipe
+  command = MAILMAN_WRAP \
+            '${if def:local_part_suffix \
+                  {${sg{$local_part_suffix}{-(\\w+)(\\+.*)?}{\$1}}} \
+                  {post}}' \
+            $local_part
+  current_directory = MAILMAN_HOME
+  home_directory = MAILMAN_HOME
+  user = MAILMAN_USER
+  group = MAILMAN_GROUP
+  freeze_exec_fail = true
+#####################################################
+### end transport/40_local_mailman
+#####################################################
+#####################################################
 ### retry/00_exim4-config_header
 #####################################################

Don’t apply this diff as-is, as the original file contains the modifications I previously made to let Exim use Gmail to send mails.

Then we have to update the Exim meta-configuration that is stored in /etc/exim4/update-exim4.conf.conf. There we specify our host (lists.example.com) and public IP address (123.456.78.90):

dc_eximconfig_configtype='smarthost'
dc_other_hostnames='lists.example.com'
dc_local_interfaces='127.0.0.1 ; ::1 ; 123.456.78.90'
dc_readhost='lists.example.com'
dc_relay_domains='lists.example.com'
dc_minimaldns='false'
dc_relay_nets=''
dc_smarthost='smtp.gmail.com:587'
CFILEMODE='644'
dc_use_split_config='false'
dc_hide_mailname='false'
dc_mailname_in_oh='true'
dc_localdelivery='mail_spool'

Finally, our hostname must be a FQDN, so we have to add it to /etc/hosts:

--- /etc/hosts.orig        2011-09-12 13:52:19.000000000 +0200
+++ /etc/hosts     2011-09-12 12:21:31.000000000 +0200
@@ -1,7 +1,7 @@
 # Do not remove the following line, or various programs
 # that require network functionality will fail.
 127.0.0.1      localhost.localdomain localhost
-123.456.78.90   server123.example.net
+123.456.78.90   server123.example.net lists.example.com
 # The following lines are desirable for IPv6 capable hosts
 #(added automatically by netbase upgrade)
 ::1     ip6-localhost ip6-loopback

Then we have to regenerate Exim’s configuration before restarting Mailman:

$ update-exim4.conf --verbose
$ /etc/init.d/exim4 restart
$ /etc/init.d/mailman restart

Testing

You can now test your setup by creating a test mailing-list:

$ newlist kev-test

Now subscribe some test users and play with this mailing-list.

By monitoring /var/log/mailman/error, you may run into this error:

IOError: [Errno 13] Permission denied: '/var/lib/mailman/archives/private/kev-test.mbox/kev-test.mbox'

This can be easily fixed with:

$ chown -R list /var/lib/mailman/archives/private/

Once you’re convinced that Mailman is working as expected, you can remove your temporary test mailing-list, and regenerate aliases to clean things up:

$ rmlist -a  kev-test
$ /var/lib/mailman/bin/genaliases

Munin monitoring

Finally, if like me you use Munin to monitor your machine, then it’s a good idea to let it graph some Mailman usage:

$ wget http://exchange.munin-monitoring.org/plugins/mailman-queue-check/version/2/download --output-document=/usr/share/munin/plugins/mailman-queue-check
$ wget http://exchange.munin-monitoring.org/plugins/mailman_subscribers/version/3/download --output-document=/usr/share/munin/plugins/mailman_subscribers
$ ln -s /usr/share/munin/plugins/mailman-queue-check /etc/munin/plugins/
$ ln -s /usr/share/munin/plugins/mailman_subscribers /etc/munin/plugins/
$ echo "[mailman*]
user root
" > /etc/munin/plugin-conf.d/mailman
$ chmod 755 /usr/share/munin/plugins/mailman*
$ /etc/init.d/munin-node restart

My Nginx + PHP-FPM + MySQL configuration

This article is a follow-up to the one I wrote 3 months ago, in which I explained how to install a web stack based on Nginx, PHP-FPM and MySQL on a Debian Squeeze server. Now it’s time to tune this basic install to get some performance out of it.

The setup I’ll detail below runs on an OVH VPS instance. This virtual server has 4 CPU cores at 1.5 GHz, 1 GB of RAM and a 50 GB HDD.

I’m mostly running WordPress instances on that server, so you’ll see some references to it in this post.

MySQL

First, let’s tune MySQL. That’s the easiest part of this article, as you only need to create a .cnf file in /etc/mysql/conf.d/ and place all your custom parameters there. Here is the content of my /etc/mysql/conf.d/kev.cnf:

[mysqld]
interactive_timeout = 50
join_buffer = 1M
key_buffer = 250M
max_connections = 100
max_heap_table_size = 32M
myisam_sort_buffer_size = 96M
query_cache_limit = 4M
query_cache_size = 250M
query_prealloc_size = 65K
query_alloc_block_size = 128K
read_buffer_size = 1M
read_rnd_buffer_size = 768K
sort_buffer_size = 1M
table_cache = 4096
thread_cache_size = 1024
tmp_table_size = 32M
wait_timeout = 500
# Debug
#general_log_file = /var/log/mysql/mysql.log
#general_log = 1
# InnoDB
innodb_buffer_pool_size = 256M
innodb_additional_mem_pool_size = 10M
innodb_log_file_size = 32M
innodb_flush_method = O_DIRECT
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 0

[mysqld_safe]
nice = -5
open_files_limit = 8192

[isamchk]
key_buffer = 64M
sort_buffer = 64M
read_buffer = 16M
write_buffer = 16M

Most of these parameters were set for my particular usage and with insights from the MySQL Tuning Primer Script.
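
One sanity check worth running on such values is a rough worst-case memory estimate: global buffers plus per-connection buffers multiplied by max_connections. The sketch below uses only the numbers from the file above. The formula is a common rule of thumb and an assumption on my part, not MySQL’s exact accounting: it ignores thread stacks and in-memory temporary tables.

```shell
#!/bin/sh
# Rough worst-case memory estimate for the [mysqld] values above.
# Assumption: global buffers + (per-connection buffers * max_connections).
# All sizes are in KB so the 768K value needs no fractions.

max_connections=100

# Global buffers: key_buffer, query_cache_size, innodb_buffer_pool_size,
# innodb_additional_mem_pool_size
global_kb=$(( (250 + 250 + 256 + 10) * 1024 ))

# Per-connection buffers: read_buffer_size, read_rnd_buffer_size,
# sort_buffer_size, join_buffer
per_conn_kb=$(( 1024 + 768 + 1024 + 1024 ))

total_kb=$(( global_kb + per_conn_kb * max_connections ))
total_mb=$(( total_kb / 1024 ))

echo "global:         $(( global_kb / 1024 )) MB"
echo "per connection: ${per_conn_kb} KB"
echo "worst case:     ${total_mb} MB"
```

With these numbers the theoretical peak comes out at 1141 MB, which is above the RAM of the VPS. That peak is only reached if all 100 connections use every buffer at the same time, so whether it matters depends on the actual load.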

PHP-FPM

Unlike MySQL, the structure of PHP configuration files on Debian Squeeze doesn’t let us easily add our customizations. We have to modify the default files provided at the package installation.

Here is my setup of the PHP processes pool:

--- /etc/php5/fpm/pool.d/www.conf.orig     2011-06-07 08:14:30.000000000 +0200
+++ /etc/php5/fpm/pool.d/www.conf  2011-08-15 17:34:09.000000000 +0200
@@ -237,3 +237,10 @@
 ;php_admin_value[error_log] = /var/log/fpm-php.www.log
 ;php_admin_flag[log_errors] = on
 ;php_admin_value[memory_limit] = 32M
+
+pm.max_children = 25
+pm.start_servers = 4
+pm.min_spare_servers = 2
+pm.max_spare_servers = 10
+pm.max_requests = 500
+request_terminate_timeout = 30

The second customization I made is not about performance but convenience. It just allows my WordPress users to upload larger files:

--- /etc/php5/fpm/php.ini.orig      2011-06-18 13:32:37.000000000 +0200
+++ /etc/php5/fpm/php.ini   2011-06-22 22:50:49.000000000 +0200
@@ -725,7 +725,7 @@

 ; Maximum size of POST data that PHP will accept.
 ; http://php.net/post-max-size
-post_max_size = 8M
+post_max_size = 15M

 ; Magic quotes are a preprocessing feature of PHP where PHP will attempt to
 ; escape any character sequences in GET, POST, COOKIE and ENV data which might
@@ -876,7 +876,7 @@

 ; Maximum allowed size for uploaded files.
 ; http://php.net/upload-max-filesize
-upload_max_filesize = 2M
+upload_max_filesize = 15M

 ; Maximum number of files that can be uploaded via a single request
 max_file_uploads = 20

Nginx

Let’s say my WordPress blog is installed in /var/www/my_wordpress. To let it be served by Nginx, we add a configuration file for this site in /etc/nginx/sites-available/my_wordpress:

server {
  server_name blog.example.com;
  root /var/www/my_wordpress/;
  include /etc/nginx/wordpress.conf;
  location /static {
    autoindex on;
  }
}

server {
  listen 80 default_server;
  server_name .example.com .example.org .example.net;
  rewrite ^ http://blog.example.com$request_uri? permanent;
}

In the configuration above, you can see that I want my blog to be served at http://blog.example.com. I also added some domain redirections in the form of a second server section, and a way to better display my static file repository by letting Nginx generate index pages.

Then don’t forget to activate this site:

$ ln -s /etc/nginx/sites-available/my_wordpress /etc/nginx/sites-enabled/

The file above refers to /etc/nginx/wordpress.conf, which is where I place all the configuration directives common to all the WordPress blogs on my server. Here is the content of that file:

# This order might seem weird - this is attempted to match last if rules below fail.
# See: http://wiki.nginx.org/HttpCoreModule
location / {
  try_files $uri $uri/ /index.php?q=$uri&$args;
}

# Add trailing slash to */wp-admin requests.
rewrite /wp-admin$ $scheme://$host$uri/ permanent;

include global.conf;

include php.conf;

Again, this file makes a reference to php.conf, which is the same as the one featured in my previous article. I only removed the index directive to place it elsewhere, and added a limit on the number of PHP requests a client can make:

location ~ \.php$ {
  # Throttle requests to prevent abuse
  limit_req zone=antidos burst=5;

  # Zero-day exploit defense.
  # http://forum.nginx.org/read.php?2,88845,page=3
  # Won't work properly (404 error) if the file is not stored on this server, which is entirely possible with php-fpm/php-fcgi.
  # Comment the 'try_files' line out if you set up php-fpm/php-fcgi on another machine.  And then cross your fingers that you won't get hacked.
  try_files $uri =404;

  fastcgi_split_path_info ^(.+\.php)(/.+)$;
  include /etc/nginx/fastcgi_params;

  # As explained in http://kbeezie.com/view/php-self-path-nginx/ some fastcgi_param are missing from fastcgi_params.
  # Keep these parameters for compatibility with old PHP scripts using them.
  fastcgi_param PATH_INFO       $fastcgi_path_info;
  fastcgi_param PATH_TRANSLATED $document_root$fastcgi_path_info;
  fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;

  # Some default config
  fastcgi_connect_timeout        60;
  fastcgi_send_timeout          180;
  fastcgi_read_timeout          180;
  fastcgi_buffer_size          128k;
  fastcgi_buffers            4 256k;
  fastcgi_busy_buffers_size    256k;
  fastcgi_temp_file_write_size 256k;

  fastcgi_intercept_errors    on;
  fastcgi_ignore_client_abort off;

  fastcgi_pass 127.0.0.1:9000;
}

Here is where the index directive moved: /etc/nginx/conf.d/kev.conf. I also added some tweaks there, along with the global request throttling configuration:

# Hide Nginx version
server_tokens off;

# Set default index file names
index index.php index.html index.htm;

# Allow uploads up to 15 MB
client_max_body_size 15m;

# Create a global request accounting pool to prevent DOS
limit_req_zone $binary_remote_addr zone=antidos:10m rate=3r/s;

The global.conf file we saw in /etc/nginx/wordpress.conf refers to /etc/nginx/global.conf, which contains additional measures to remove cruft from log files and enhance security:

# Do not log excessive request on common web content like favicon and robots.txt
location = /favicon.ico {
  log_not_found off;
  access_log off;
}
location = /robots.txt {
  allow all;
  log_not_found off;
  access_log off;
}

# Deny all attempts to access any dotfile (=hidden files) such as .htaccess, .htpasswd, .DS_Store, .directory, .svn, .git, ...
location ~ /\. {
  deny all;
  access_log off;
  log_not_found off;
}

Not all of the default Nginx configuration can be overridden by additional files. For the rest, we have to change /etc/nginx/nginx.conf itself:

--- /etc/nginx/nginx.conf.orig   2011-06-06 00:46:56.000000000 +0200
+++ /etc/nginx/nginx.conf        2011-08-15 17:44:58.000000000 +0200
@@ -3,8 +3,9 @@
 pid /var/run/nginx.pid;

 events {
-       worker_connections 768;
-       # multi_accept on;
+       use epoll;
+       worker_connections 1024;
+       multi_accept on;
 }

 http {
@@ -16,7 +17,7 @@
        sendfile on;
        tcp_nopush on;
        tcp_nodelay on;
-       keepalive_timeout 65;
+       keepalive_timeout 3;
        types_hash_max_size 2048;
        # server_tokens off;

That’s all for our customizations. We can now restart all our servers:

$ /etc/init.d/mysql restart
$ /etc/init.d/php5-fpm restart
$ /etc/init.d/nginx restart

Conclusion

I’ve been running my websites under this configuration for about 3 months and I’m really happy with the results. I’m sure I can push optimizations further, but it may require lots of time and effort compared to the marginal gain I’ll get. My websites are responsive enough for me. And if they collapse in the future under the load of the Reddit crowd, I’ll still have the option to move to a bigger virtual server (vertical scaling FTW!).

Cloud-based Server Backups with Duplicity and Amazon S3

For years I was backing up my server with website-backup.py, a custom script I wrote to manage data mirroring, do incremental backups and monthly snapshots based on rdiff-backup, rsync, tar and bzip2. All these data were pushed to a storage server hosted at home.

I’ve just replaced my script with duplicity, a tool written by the same author as rdiff-backup. And Amazon S3 cloud storage replaced my home server. Here is how I did it.

First, we need to create an account on Amazon AWS. This is easy and fast. My account was activated in minutes.

Now that you have access to Amazon’s cloud, let’s create a bucket on S3. I used the reversed domain name of the server, which gives me a bucket name like com.example.server.backup. With this naming scheme, I can identify the purpose of the bucket by its label only.

Duplicity can use the cheaper RRS (Reduced Redundancy Storage) class, but you need at least version 0.6.09. On Debian Squeeze, the only way to get a recent enough version is to install it from the backports:

$ apt-get -t squeeze-backports install duplicity python-boto

Then I created a GPG key pair:

$ gpg --gen-key

You absolutely need to provide a passphrase, else Duplicity will refuse to run.

Now update the script below with the GPG key passphrase and your AWS credentials:

# Do not let this script run more than once
[ `ps axu | grep -v "grep" | grep --count "duplicity"` -gt 0 ] && exit 1

# Set some environment variables required by duplicity
export PASSPHRASE=XXXXXXXXXX
export AWS_ACCESS_KEY_ID=XXXXXXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXXXXXX

# ~/.cache/duplicity/ should be excluded, as explained in http://comments.gmane.org/gmane.comp.sysutils.backup.duplicity.general/4449
PARAMS='--exclude-device-files --exclude-other-filesystems --exclude **/.cache/** --exclude **/.thumbnails/** --exclude /mnt/ --exclude /tmp/ --exclude /dev/ --exclude /sys/ --exclude /proc/ --exclude /media/ --exclude /var/run/ --volsize 10 --s3-use-rrs --asynchronous-upload -vinfo'
DEST='s3+http://com.example.server.backup'

# Export MySQL databases
mysqldump --user=root --opt --all-databases > /home/kevin/mysql-backup.sql

# Do the backup
duplicity $PARAMS --full-if-older-than 1M / $DEST

# Clean things up
duplicity remove-older-than 1Y --force --extra-clean $PARAMS $DEST

# Remove temporary environment variables
unset PASSPHRASE
unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY

Before running duplicity, the script will dump all MySQL databases to a plain-text file. Then the first duplicity call will do the backup itself, and the second call will remove all backups older than a year.
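
The `ps | grep` guard at the top of the script matches any process whose command line contains "duplicity", including unrelated ones such as an editor opened on a duplicity log. One alternative, sketched below and not part of the original script, is a lock directory: `mkdir` either creates it or fails, atomically. The `LOCKDIR` path and the two function names are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of a lock-directory guard (an alternative to the ps/grep check).
# LOCKDIR is a hypothetical path; override it from the environment if needed.
LOCKDIR="${LOCKDIR:-/var/run/s3-backup.lock}"

# mkdir is atomic: it succeeds for exactly one caller and fails
# if the directory already exists.
acquire_lock() {
  mkdir "$1" 2>/dev/null
}

# Remove the lock directory so the next run can proceed.
release_lock() {
  rmdir "$1" 2>/dev/null
}

# Intended use at the top of the backup script (not run here):
#   acquire_lock "$LOCKDIR" || exit 1
#   trap 'release_lock "$LOCKDIR"' EXIT
#   ... mysqldump and duplicity calls ...
```

The trade-off is that a run killed without a chance to execute its exit trap leaves the directory behind, and it then has to be removed by hand before the next backup can start.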

I saved the script above in /home/kevin/s3-backup.sh and cron-ed it:

$ chmod 755 /home/kevin/s3-backup.sh
$ echo "
# Backup everything to an Amazon S3 storage
0 1 * * * root /home/kevin/s3-backup.sh
" > /etc/cron.d/s3-backup

I can now sleep better knowing all the work I do on my server will not be lost in case of a catastrophic event. Amazon S3 is today a no-brainer for server backups: your data will be secured and available. And for a small quantity of data (like the 10 GB of my server), it’s incredibly cheap, especially if you compare it to the cost of maintaining a storage server at home.

This solution is so good and obvious, that I don’t know why I haven’t implemented it earlier… :)

Better Entropy on a Debian Squeeze server

While generating a GPG key on my server, I got the following error:

Not enough random bytes available. Please do some other work to give the OS a chance to collect more entropy! (Need 283 more bytes)

That’s a well-known issue on headless servers. Thanks to a comment on Hacker News, I knew there was a way to fix this with a software entropy generator such as the haveged daemon.

My server is running Debian Squeeze. Luckily, a package is available in the backports repository. All we have to do is add the latter to our sources list before installing haveged:

$ echo 'deb http://backports.debian.org/debian-backports squeeze-backports main' > /etc/apt/sources.list.d/squeeze-backports.list
$ apt-get update
$ apt-get -t squeeze-backports install haveged

Now you can get proof that haveged is running by monitoring your entropy. The Munin graph of my server, for example, clearly shows the big jump in available entropy.

Although I’m not sure about the quality of the randomness it generates on virtual machines, haveged is still a really practical solution to the lack of entropy on a server.
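
Without Munin, the kernel’s own counter can be read directly. The snippet below is a minimal sketch: `entropy_avail` is a hypothetical wrapper name, and the path is the standard Linux procfs entry for the entropy estimate.

```shell
#!/bin/sh
# Print the kernel's estimate of available entropy, in bits.
# The default path is the standard Linux procfs entry; the optional
# argument exists only to make the function easy to test.
entropy_avail() {
  cat "${1:-/proc/sys/kernel/random/entropy_avail}"
}

# Example use on a live system (not run here):
#   entropy_avail
#   watch -n 1 cat /proc/sys/kernel/random/entropy_avail
```

Comparing the value before and after starting haveged gives the same information as the graph, just without the history.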