System backup script: no more endless lock

I’ve just released a new version of my system-backup.py script.

The main update concerns the lock file, which I implemented in the previous version to prevent the script from running twice (or more) in parallel. This is a nice feature to avoid overlapping processes fighting each other over the same resources. But in some extreme cases (a reboot or power failure during a backup, …), the lock file remains and prevents the script from starting until you notice the problem and remove the lock file manually. This new version takes care of this problem: it now removes the lock automatically once a timeout is reached, and it also kills all remaining child processes.
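The post does not show the lock code itself, so the following is only a sketch of the idea under stated assumptions. The lock is a JSON file holding a start timestamp and a `process_list` of PIDs (the `process_list` key appears in the traceback in the comments; everything else, including `LOCK_FILE`, `LOCK_TIMEOUT`, `acquire_lock` and `release_lock`, is hypothetical). It is written for Python 3 rather than the Python 2.4 the script supports.

```python
import json
import os
import signal
import time

# Hypothetical defaults: the real script defines its own lock path and timeout.
LOCK_FILE = "/var/run/system-backup.lock"
LOCK_TIMEOUT = 6 * 60 * 60  # seconds before a lock counts as stale


def _read_lock(lock_file):
    """Return the lock data; an unreadable file is treated as infinitely old."""
    try:
        with open(lock_file) as f:
            return json.load(f)
    except (OSError, ValueError):
        return {"started": 0, "process_list": []}


def acquire_lock(lock_file=LOCK_FILE, timeout=LOCK_TIMEOUT):
    """Take the lock, clearing a stale one first. Return True on success."""
    if os.path.exists(lock_file):
        lock_data = _read_lock(lock_file)
        if time.time() - lock_data.get("started", 0) < timeout:
            return False  # the previous run is still within its time budget
        # Stale lock: stop the processes the previous run recorded.
        for pid in lock_data.get("process_list", []):
            if pid == os.getpid():
                continue  # never signal ourselves
            try:
                os.kill(pid, signal.SIGTERM)
            except OSError:
                pass  # already gone, or not ours to signal
        try:
            os.remove(lock_file)
        except FileNotFoundError:
            pass
    # O_EXCL makes creation atomic: if another run got there first, we lose.
    try:
        fd = os.open(lock_file, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        json.dump({"started": time.time(), "process_list": [os.getpid()]}, f)
    return True


def release_lock(lock_file=LOCK_FILE):
    """Remove the lock file at the end of a run."""
    try:
        os.remove(lock_file)
    except FileNotFoundError:
        pass
```

One caveat with this approach: after a reboot, the PIDs stored in a stale lock may have been reused by unrelated processes, so a real implementation would want to check that each PID still belongs to the backup (for example by comparing its command line) before sending it a signal.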

Here is the detailed changelog:

  • Auto-kill the script if the backup process takes too much time. The timeout can be defined via a constant.
  • Clean kill: track all child processes to kill them safely before removing the lock file.
  • Require newer versions of Python (>= v2.4), rsync (>= v2.6.7) and rdiff-backup (>= v1.1.0).
  • Use the --preserve-numerical-ids option when adding an rdiff-backup increment.
  • Keep 15 increments by default instead of 20. This value can easily be changed via a constant.
  • Remove deleted files first during mirroring, and delete outdated increments before adding a new one, to free up space. This strategy is safer for target disks with little free space left.
  • Tell rsync to print human-readable values.
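To make the last few changelog entries concrete, here is a hypothetical helper that only builds the command lines they imply, without running them. It assumes rsync keeps a plain mirror and rdiff-backup stores increments of that mirror in a separate repository; the function name, `MAX_INCREMENTS`, paths and argument order are illustrative and not taken from the script.

```python
# Hypothetical constant: the post only says the default is now 15 (was 20).
MAX_INCREMENTS = 15


def build_backup_commands(source, mirror, increments, keep=MAX_INCREMENTS):
    """Return the argv lists for one backup run, in execution order."""
    # 1. Mirror with rsync; --delete-before (which implies --delete) removes
    #    vanished files before transferring new ones, freeing space first.
    mirror_cmd = ["rsync", "--archive", "--delete-before", "--human-readable",
                  source, mirror]
    # 2. Prune old increments before adding a new one. The "nB" time spec
    #    counts backup sessions; --force allows removing several at once.
    prune_cmd = ["rdiff-backup", "--force",
                 "--remove-older-than", "%dB" % keep, increments]
    # 3. Add a new increment, storing numeric uid/gid instead of names.
    increment_cmd = ["rdiff-backup", "--preserve-numerical-ids",
                     mirror, increments]
    return [mirror_cmd, prune_cmd, increment_cmd]
```

On a first run the rdiff-backup repository does not exist yet, so a real script would skip the pruning step or tolerate its failure. The rdiff-backup man page documents exactly how the `nB` time spec counts sessions, which is worth checking to get the number of kept increments right.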

3 thoughts on “System backup script: no more endless lock”

  1. This version does not work on my server; it produces errors like the one below:

    Traceback (most recent call last):
      File "bq-my-system-backup-2007_08_12.py", line 357, in ?
        main()
      File "bq-my-system-backup-2007_08_12.py", line 303, in main
        updateLockFile()
      File "bq-my-system-backup-2007_08_12.py", line 164, in updateLockFile
        lock_data['process_list'] = getProcessList()
      File "bq-my-system-backup-2007_08_12.py", line 249, in getProcessList
        children = getRecursiveProcessChildren(parent_pid=script_pid)
      File "bq-my-system-backup-2007_08_12.py", line 236, in getRecursiveProcessChildren
        children = getProcessChildren(parent_pid)
      File "bq-my-system-backup-2007_08_12.py", line 225, in getProcessChildren
        child_pid = int(child_info_list[0])
    ValueError: invalid literal for int(): ps:
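The last frame shows where this fails: `getProcessChildren` passes the first field of a line of `ps` output to `int()`, and that line starts with `ps:`, which looks like an error or warning message from `ps` rather than a PID. A defensive parser simply skips anything that is not a pair of integers. The sketch below is not the script's code; `parse_children`, `list_children` and `list_descendants` are hypothetical names, and it assumes a POSIX-style `ps` that accepts `-e -o pid= -o ppid=`.

```python
import subprocess


def parse_children(ps_output, parent_pid, exclude=()):
    """Return the PIDs whose parent is parent_pid, from 'PID PPID' lines."""
    children = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip headers, warnings such as "ps: ...", and blank lines.
        if len(fields) != 2 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        pid, ppid = int(fields[0]), int(fields[1])
        if ppid == parent_pid and pid not in exclude:
            children.append(pid)
    return children


def list_children(parent_pid):
    """Run ps once and return the direct children of parent_pid."""
    # "pid=" and "ppid=" suppress the header; stderr is kept out of the parsed text.
    ps = subprocess.Popen(["ps", "-e", "-o", "pid=", "-o", "ppid="],
                          stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
                          universal_newlines=True)
    output, _ = ps.communicate()
    # If parent_pid is this script, ps itself shows up as a child; leave it out.
    return parse_children(output, parent_pid, exclude=(ps.pid,))


def list_descendants(parent_pid):
    """Return every descendant of parent_pid, depth first."""
    result = []
    for pid in list_children(parent_pid):
        result.append(pid)
        result.extend(list_descendants(pid))
    return result
```

Keeping the parsing in a pure function makes it easy to test against odd `ps` output without spawning processes.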
    
  2. Traceback (most recent call last):
       File "bq-my-system-backup-2007_08_12.py", line 357, in ?
         main()

    Is this common?

  3. The first time I run the script, it always crashes with:

        exit_code = waitpid(child.pid, 0)[1]
    OSError: [Errno 10] No child processes
    

    As a workaround, I use this:

      from os import waitpid

      try:
          exit_code = waitpid(child.pid, 0)[1]
      except OSError:
          # errno 10 (ECHILD): the child was already reaped, so its status is lost
          exit_code = 1
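`OSError: [Errno 10] No child processes` is `ECHILD`: `waitpid()` was called for a PID that is no longer an unreaped child of the script, for instance because it was already waited for elsewhere or because `SIGCHLD` is ignored. Note also that the second item returned by `os.waitpid()` is an encoded wait status, not the exit code itself. A slightly stricter variant of the workaround above, as a Python 3 sketch (`wait_for_exit_code` and the fallback value 1 are assumptions), only swallows `ECHILD` and decodes the status:

```python
import os


def wait_for_exit_code(pid, default=1):
    """Wait for pid and return its exit code.

    If the child was already reaped elsewhere, its status is lost, so
    return `default` (hypothetical choice: treat it as a failure).
    """
    try:
        _, status = os.waitpid(pid, 0)
    except ChildProcessError:  # Python 3's OSError subclass for errno ECHILD
        return default
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    if os.WIFSIGNALED(status):
        return 128 + os.WTERMSIG(status)  # shell-style "killed by signal N"
    return default
```

Any other `OSError` still propagates, so real problems are not silently turned into exit code 1.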
    
