Backups to S3 using Duplicity

First, install Duplicity along with its dependencies:

apt-get install haveged python-paramiko python-boto python-gobject-2 duplicity
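To make sure everything is in place, you can check the installed version:

duplicity --version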

Then generate a gpg key:

gpg --gen-key

The output will look something like this (you can confirm the defaults with Return; keep the passphrase you enter somewhere safe, you'll need it later):

gpg (GnuPG) 1.4.12; Copyright (C) 2012 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Please select what kind of key you want:
   (1) RSA and RSA (default)
   (2) DSA and Elgamal
   (3) DSA (sign only)
   (4) RSA (sign only)
Your selection? 
RSA keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048) 
Requested keysize is 2048 bits
Please specify how long the key should be valid.
         0 = key does not expire
      &lt;n&gt;  = key expires in n days
      &lt;n&gt;w = key expires in n weeks
      &lt;n&gt;m = key expires in n months
      &lt;n&gt;y = key expires in n years
Key is valid for? (0) 
Key does not expire at all
Is this correct? (y/N) y

You need a user ID to identify your key; the software constructs the user ID
from the Real Name, Comment and Email Address in this form:
    "Heinrich Heine (Der Dichter) "

Real name: Max Mustermann
Email address: test@example.net
Comment: Test123
You selected this USER-ID:
    "Max Mustermann (Test123) "

Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? O
You need a Passphrase to protect your secret key.

We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
.....+++++
+++++
We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
....................................+++++
+++++
gpg: key 9C3E2F25 marked as ultimately trusted
public and secret key created and signed.

gpg: checking the trustdb
gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u
pub   2048R/9C3E2F25 2014-02-26
      Key fingerprint = B1FC A9BD 5797 48FC F795  29C6 C37A 4FF6 9C3E 2F25
uid                  Max Mustermann (Test123) <test@example.net>
sub   2048R/A15DC968 2014-02-26
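Without this key (and its passphrase) you won't be able to restore encrypted backups, so consider exporting it and keeping a copy off the server. A minimal sketch, using the key ID 9C3E2F25 from the sample output above (substitute your own):

gpg --export --armor 9C3E2F25 > duplicity-key-pub.asc
gpg --export-secret-keys --armor 9C3E2F25 > duplicity-key-sec.asc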

Supposing that you've created an S3 bucket in region us-east-1 called example, your S3 URL would look like this: s3://s3-external-1.amazonaws.com/example (you can append a subfolder if needed).

For a list of the region-specific Amazon endpoint URLs, have a look here
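For instance, assuming the same bucket had been created in eu-west-1 instead, the URL would use that region's endpoint (endpoint name taken from Amazon's endpoint list):

s3://s3-eu-west-1.amazonaws.com/example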

Next you can run duplicity from the command line; however, I'd recommend putting it in a bash script:

#!/bin/bash
#the passphrase of the gpg key you created above (also needed when restoring)
export PASSPHRASE="......."
#put here your AWS credentials to access your bucket:
export AWS_ACCESS_KEY_ID="......"
export AWS_SECRET_ACCESS_KEY="......."
#run duplicity, encrypting with the gpg key created above
#(replace 9C3E2F25 with your own key ID):
/usr/bin/duplicity --include /etc --include /var/www --exclude "**" --full-if-older-than 30D --s3-use-new-style --encrypt-key 9C3E2F25 / s3://s3-external-1.amazonaws.com/example

This backs up only the specified folders /etc and /var/www to S3, and forces a new full backup whenever the last full backup is older than 30 days (--full-if-older-than 30D); you might want to change this to fit your needs. Note that this option only starts new full chains, it doesn't delete anything by itself. The option --exclude "**" takes care that nothing outside the included folders ends up in the backup.
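The script takes care of neither scheduling nor pruning. For scheduling, a minimal sketch, assuming you saved the script as /root/s3-backup.sh (the path and the schedule are just examples):

chmod 700 /root/s3-backup.sh   #keeps the embedded credentials readable by root only
#in root's crontab (crontab -e): run the backup every night at 03:15
15 3 * * * /root/s3-backup.sh

And to actually delete old data, you can append a prune step at the end of the script, e.g. keeping only the last two full backup chains (--force is required to really remove files):

/usr/bin/duplicity remove-all-but-n-full 2 --force s3://s3-external-1.amazonaws.com/example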

duplicity --help gives you an overview of all possible parameters:

Usage: 
  duplicity [full|incremental] [options] source_dir target_url
  duplicity [restore] [options] source_url target_dir
  duplicity verify [options] source_url target_dir
  duplicity collection-status [options] target_url
  duplicity list-current-files [options] target_url
  duplicity cleanup [options] target_url
  duplicity remove-older-than time [options] target_url
  duplicity remove-all-but-n-full count [options] target_url
  duplicity remove-all-inc-of-but-n-full count [options] target_url

Backends and their URL formats:
  cf+http://container_name
  file:///some_dir
  ftp://user[:password]@other.host[:port]/some_dir
  ftps://user[:password]@other.host[:port]/some_dir
  hsi://user[:password]@other.host[:port]/some_dir
  imap://user[:password]@other.host[:port]/some_dir
  rsync://user[:password]@other.host[:port]::/module/some_dir
  rsync://user[:password]@other.host[:port]/relative_path
  rsync://user[:password]@other.host[:port]//absolute_path
  s3://other.host/bucket_name[/prefix]
  s3+http://bucket_name[/prefix]
  scp://user[:password]@other.host[:port]/some_dir
  ssh://user[:password]@other.host[:port]/some_dir
  tahoe://alias/directory
  webdav://user[:password]@other.host/some_dir
  webdavs://user[:password]@other.host/some_dir
  gdocs://user[:password]@other.host/some_dir

Commands:
  cleanup <target_url>
  collection-status <target_url>
  full <source_dir> <target_url>
  incr <source_dir> <target_url>
  list-current-files <target_url>
  restore <source_url> <target_dir>
  remove-older-than <time> <target_url>
  remove-all-but-n-full <count> <target_url>
  remove-all-inc-of-but-n-full <count> <target_url>
  verify <target_url> <source_dir>
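For example, to list what's currently in the archive and restore a single directory from it (the environment variables are the same as in the backup script; /tmp/etc-restored is just an example target):

export PASSPHRASE="......."
export AWS_ACCESS_KEY_ID="......"
export AWS_SECRET_ACCESS_KEY="......."
duplicity list-current-files s3://s3-external-1.amazonaws.com/example
duplicity restore --file-to-restore etc s3://s3-external-1.amazonaws.com/example /tmp/etc-restored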