Cloud-based Server Backups with Duplicity and Amazon S3

For years I backed up my server with website-backup.py, a custom script I wrote to manage data mirroring, incremental backups and monthly snapshots, based on rdiff-backup, rsync, tar and bzip2. All of this data was pushed to a storage server hosted at home.

I’ve just replaced my script with duplicity, a tool written by the author of rdiff-backup, and my home server with Amazon S3 cloud storage. Here is how I did it.

First, we need to create an account on Amazon AWS. This is easy and fast. My account was activated in minutes.

Now that you have access to Amazon’s cloud, let’s create a bucket on S3. I used the reversed domain name of the server, which gives me a bucket name like com.example.server.backup. With this naming scheme, I can identify the purpose of the bucket from its name alone.

Duplicity can use the cheaper RRS (Reduced Redundancy Storage) class, but you need at least version 0.6.09. On Debian Squeeze, the only way to get a recent enough version is to install it from the backports:

$ apt-get -t squeeze-backports install duplicity python-boto
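
The 0.6.09 requirement can be checked from a script before relying on --s3-use-rrs. Here is a minimal sketch of such a check; the version_at_least helper is a name I made up, it relies on the -V (version sort) option of GNU sort, and the version strings are literals chosen for the demonstration:

```shell
#!/bin/sh
# Succeeds when version $1 is greater than or equal to version $2.
# Relies on the -V (version sort) option of GNU sort.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Literal version string used for the demonstration only.
installed="0.6.18"
if version_at_least "$installed" "0.6.09"; then
    echo "$installed supports --s3-use-rrs"
else
    echo "$installed is too old for --s3-use-rrs"
fi
```

In real use the installed version could be read with something like duplicity --version | awk '{print $2}', assuming the output has the form "duplicity 0.6.18".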

Then I generated a simple key pair with GPG:

$ gpg --gen-key

You absolutely need to provide a passphrase, otherwise Duplicity will refuse to run.

Now update the script below with the GPG key passphrase and your AWS credentials:

#!/bin/sh

# Do not let this script run more than once: exit if a duplicity process already exists
[ "$(ps axu | grep -v "grep" | grep --count "duplicity")" -gt 0 ] && exit 1

# Set some environment variables required by duplicity
export PASSPHRASE=XXXXXXXXXX
export AWS_ACCESS_KEY_ID=XXXXXXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXXXXXX

# ~/.cache/duplicity/ should be excluded, as explained in http://comments.gmane.org/gmane.comp.sysutils.backup.duplicity.general/4449
# Disable shell globbing so that the ** patterns below reach duplicity untouched when $PARAMS is expanded
set -f
PARAMS='--exclude-device-files --exclude-other-filesystems --exclude **/.cache/** --exclude **/.thumbnails/** --exclude /mnt/ --exclude /tmp/ --exclude /dev/ --exclude /sys/ --exclude /proc/ --exclude /media/ --exclude /var/run/ --volsize 10 --s3-use-rrs --asynchronous-upload -vinfo'
DEST='s3+http://com.example.server.backup'

# Export MySQL databases
mysqldump --user=root --opt --all-databases > /home/kevin/mysql-backup.sql

# Do the backup
duplicity $PARAMS --full-if-older-than 1M / $DEST

# Clean things up
duplicity remove-older-than 1Y --force --extra-clean $PARAMS $DEST

# Remove temporary environment variables
unset PASSPHRASE
unset AWS_ACCESS_KEY_ID
unset AWS_SECRET_ACCESS_KEY

Before running duplicity, the script dumps all MySQL databases to a plain-text file. The first duplicity call then does the backup itself, forcing a full backup when the last one is more than a month old, and the second call removes all backups older than a year.
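
The ps and grep check at the top of the script also triggers when any unrelated process has "duplicity" in its command line. As a possible alternative, here is a minimal sketch of a lock based on mkdir, which either creates the directory or fails, so two concurrent runs cannot both get through. The LOCK_DIR path and the run_exclusively helper are names I made up for the example:

```shell
#!/bin/sh
# Hypothetical lock location; any path writable by the backup user will do.
LOCK_DIR="${LOCK_DIR:-${TMPDIR:-/tmp}/s3-backup.lock}"

# Run the given command only if no other run currently holds the lock.
run_exclusively() {
    # mkdir fails if the directory already exists, which means the lock is taken.
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "another backup seems to be running" >&2
        return 1
    fi
    "$@"
    status=$?
    # Release the lock and report the exit code of the command.
    rmdir "$LOCK_DIR"
    return $status
}

run_exclusively echo "the backup commands would run here"
```

One caveat of this approach: if the script is killed before reaching rmdir, the lock directory stays behind and has to be removed by hand.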

I saved the script above in /home/kevin/s3-backup.sh, made it readable by its owner only since it contains the passphrase and the AWS credentials, and cron-ed it:

$ chmod 700 /home/kevin/s3-backup.sh
$ echo "
# Backup everything to an Amazon S3 storage
0 1 * * * root /home/kevin/s3-backup.sh
" > /etc/cron.d/s3-backup

I can now sleep better knowing that the work I do on my server will not be lost in case of a catastrophic event. Amazon S3 is today a no-brainer for server backups: your data will be secure and available. And for a small quantity of data (like the 10 GB of my server), it’s incredibly cheap, especially compared to the cost of maintaining a storage server at home.

This solution is so good and obvious that I don’t know why I didn’t implement it earlier… :)

Subversion commands

Native commands

  • Revert current local folder to revision 666:
    svn merge -rHEAD:666 ./
    
  • Create an empty repository:
    svnadmin create ./my-repo
    
  • Dump a repository (a sure way to migrate a subversion repository from one version to another):
    svnadmin dump ./my-repo > ./my-repo.dmp
    
  • Migrate a remote Subversion repository without creating an intermediate dump file:
    ssh -C user@myserver.com "svnadmin dump /home/user/my-repo" | svnadmin load /home/user2/my-new-repo
    
  • Launch a standalone Subversion server listening on port 3690 and serving all repositories located in ./repos/:
    svnserve --daemon --listen-port 3690 --root ./repos/
    

Local working copy hacking

  • Recursive and case insensitive content search on non-binary files from the current folder, while ignoring .svn folders and their content:
    find ./ -type f -not -regex ".*\/.svn\/.*" -exec grep -Iil "string to search" {} \;
    
  • Same thing as above but with an alternative approach (that doesn’t work when the folder contains too many files):
    grep -Ii "string to search" $(find . | grep -v .svn)
    

    Other alternative: use ack.

  • Use sed to replace text in all files except in Subversion metadata:
    find ./ -type f -not -regex ".*\/.svn\/.*" -print -exec sed -i 's/str1/str2/g' "{}" \;
    
  • Use svn delete to remove all files containing a tilde in their name without touching local Subversion metadata:
    find ./ -type f -not -regex ".*\/.svn\/.*" -name "*~*" -print -exec svn delete "{}" \;
    
  • In a repository structure containing sub-projects (think of Plone’s collective repository as an example), get the list of all folders in all trunks, while ignoring Subversion metadata folders:
    find ./ -type d -regex ".*\/trunk\/?.*" -not -regex ".*\/.svn\/?.*" -print
    
  • Similarly to the command above, replace all occurrences of the string @coolcavemen.fr by @coolcavemen.com in all trunk subfolders while ignoring .svn content:
    find ./ -type f -regex ".*\/trunk\/.*" -not -regex ".*\/.svn\/.*" -print -exec sed -i 's/@coolcavemen\.fr/@coolcavemen\.com/g' "{}" \;
    
  • Set an svn property to ignore all .mo files during commit in every folder of our local working copy containing .po files:
    find ./ -type f -name "*.po" -regex ".*\/trunk\/.*" -not -regex ".*\/.svn\/.*" -printf "%h\n" | uniq | xargs svn propset "svn:ignore" "*.mo"
    

OpenSSH commands

  • Syntax that makes scp cope with spaces in a remote path:
    scp foo.com:"/home/fubar/some\ folder/file.txt" ./
    
  • Copy a bunch of files to a remote server (or how to use find with scp):
    find /var/log/ -iname "*.log" -type f | xargs -I '{}' scp '{}' kevin@myserver:/media/backup/logs/
    
  • Redirect local port 8081 to proxy.company.com:8080 via an SSH tunnel passing through the authorized-server.company.com machine:
    ssh -T -N -C -L 8081:proxy.company.com:8080 kevin@authorized-server.company.com
    
  • Use rsync over a non-default SSH port:
    rsync --progress -vrae 'ssh -p 8022' /home/user/docs/ bill@server:/home/user/docs/
    

System & Shell commands

  • Run a process detached from the current terminal:
    nohup my_command &
    
  • Get the exit code of the last command run:
    echo $?
    
  • Run the last command as root (source):
    sudo !!
    
  • Show the user under which I’m currently logged in:
    whoami
    
  • If you have the following error:
    -bash: ./myscript.sh: /bin/bash^M: bad interpreter: No such file or directory
    

    Then the fix consists of removing the stray carriage return characters:

    sed -i 's/\r//' ./myscript.sh
    
  • Free up some memory by clearing RAM caches (source):
    sync ; echo 3 > /proc/sys/vm/drop_caches
    
  • Display which distro the system is running:
    lsb_release -a
    

    or

    cat /etc/lsb-release
    
  • List the most used commands:
    history | awk '{a[$2]++}END{for(i in a){print a[i] " " i}}' | sort -rn | head
    
  • Disable a service on Debian/Ubuntu, then re-enable it:
    update-rc.d -f my-service-name remove
    update-rc.d my-service-name defaults
    
  • Same thing as above but on a RedHat-like system:
    chkconfig sshd --del
    chkconfig sshd --add
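
To illustrate the "bad interpreter" fix from the list above, here is a small sketch that detects carriage returns in a script and strips them. The has_crlf and strip_cr helpers are names made up for the example; the sed expression is built with printf so it does not depend on sed understanding \r:

```shell
#!/bin/sh
# Succeeds when the file contains at least one carriage return.
has_crlf() {
    grep -q "$(printf '\r')" "$1"
}

# Print the file with the trailing carriage return of each line removed.
strip_cr() {
    cr=$(printf '\r')
    sed "s/${cr}\$//" "$1"
}

# Build a script with Windows line endings, then repair a copy of it.
script=$(mktemp)
printf '#!/bin/sh\r\necho ok\r\n' > "$script"
has_crlf "$script" && echo "carriage returns found"

fixed=$(mktemp)
strip_cr "$script" > "$fixed"
has_crlf "$fixed" || echo "carriage returns removed"

# The repaired copy now runs normally.
sh "$fixed"
```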