Backups and Synchronisations

The following is a brief introduction to writing a simple backup script in bash that takes advantage of the date command, and to using rsync to produce a synchronised mirror, which can also serve as a backup.

Backup with bash

Shell scripting is not as powerful as a complete programming language when cross-platform support is required, when the tasks are resource-intensive, or when complex data structures, floating-point operations or complex numbers are needed. In these cases you should use a language like C, FORTRAN or Java. But for many tasks one will be pleasantly surprised how useful shell scripting can be. Indeed, many file and text manipulation tasks which are complex in the aforementioned programming languages are simple with shell scripting.

The most basic form of scripting simply runs commands in sequence, such as this rather undeveloped backup script, which runs tar (archive) and gzip (compression) on the home directory. First use vim to create the script, add an identifier line for bash (#!/bin/bash) followed by the single script line, exit vim, make the file executable and run it. There should be a file called [username].tgz when the script is complete.

#!/bin/bash
tar cvfz [username].tgz /home/[username]

This is, of course, no different to typing the command at the prompt. Not much of a script! So let's start adding some variables to it. Let's use the date command to put a timestamp in the archive's filename and, indeed, store that filename in a variable.

BU=homeuser$(date +%Y%m%d).tgz
tar cvfz $BU /home/[username]

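Note the space between "date" and the + sign. The + introduces date's output format, so +%Y%m%d produces a string such as 20130415, giving a file called homeuser20130415.tgz.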
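As a sketch of where this can go, the two lines above can be wrapped in a function that takes the directory to back up and an optional destination, and puts the time as well as the date in the name so that two runs on the same day don't overwrite each other. The function name backup_dir and its arguments are illustrative, not part of any standard tool.

```shell
#!/bin/bash
# Sketch: the timestamped backup from above, wrapped in a reusable function.
# backup_dir is a hypothetical name; call it as: backup_dir SOURCE_DIR [DEST_DIR]
set -euo pipefail

backup_dir() {
    local src=${1%/}        # directory to archive, e.g. /home/[username] (trailing slash dropped)
    local dest=${2:-.}      # where to write the archive; defaults to the current directory
    local name
    # Date and time in the name, so two backups on the same day don't collide
    name="$(basename "$src")-$(date +%Y%m%d-%H%M%S).tgz"
    # -C changes into the parent directory first, so paths are stored as user/...
    tar czf "$dest/$name" -C "$(dirname "$src")" "$(basename "$src")"
    echo "$dest/$name"      # print the archive path so callers can use it
}
```

For example, backup_dir /home/[username] /media/backup should write an archive named like [username]-YYYYMMDD-HHMMSS.tgz into /media/backup and print its path. Changing into the parent directory with tar's -C option stores paths as [username]/... rather than home/[username]/..., which makes the archive easier to unpack elsewhere.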

The file can be made executable by changing its permissions and then run directly, or it can simply be run with sh, i.e.,

$ chmod +x [scriptname]
$ ./[scriptname]

or

$ sh [scriptname]

The created tgz file can then be copied to a backup partition or other media using cp or scp as appropriate.

Rsync - Synchronisation Utility

Rsync provides a way to keep repositories (e.g., directories) in synchronisation; for example, a home directory and its backup. The advantage of rsync (apart from being free and open-source, of course) is that after the first run it only transfers what has changed. The initial backup will be about as slow as a cp or scp. After that, however, rsync skips files that haven't changed, and over a network it sends only the changed parts of files, so it should probably be used in preference to those commands. There is no point copying and recopying an entire file when only a few characters have changed. Another advantage of rsync is its versatility: it can copy locally, to networked hosts, or even to remote rsync daemons, and it has a great number of options to control its behaviour.
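To see the "only transfer what has changed" behaviour for yourself, the following sketch (using the -a option described below) syncs a scratch directory twice, editing one file in between, and uses --itemize-changes on the second run to list what rsync updates. The function and file names are made up. Note that for purely local copies rsync sends whole files by default, so this shows unchanged files being skipped rather than the in-file delta transfer used over a network.

```shell
#!/bin/bash
# Sketch: sync a scratch directory twice and list what the second run updates.
# The function name and file layout are made up for illustration.
set -euo pipefail

second_run_demo() {
    local scratch
    scratch=$(mktemp -d)
    mkdir -p "$scratch/src"
    echo "unchanged" > "$scratch/src/a.txt"
    echo "unchanged" > "$scratch/src/b.txt"

    rsync -a "$scratch/src/" "$scratch/dest"    # first run: copies everything

    echo "edited" > "$scratch/src/b.txt"        # change one file only (its size changes too)

    # Second run: --itemize-changes prints a line for each item rsync updates,
    # so b.txt should be listed while a.txt is skipped as unchanged
    rsync -a --itemize-changes "$scratch/src/" "$scratch/dest"
}
```

Running second_run_demo should list b.txt but not a.txt, because rsync's default quick check (file size plus modification time) finds a.txt unchanged.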

Most modern Linux systems have rsync installed as a binary, such is its popularity; it will usually be found at /usr/bin/rsync. If it is not installed, it can be installed as a package through the usual methods (e.g., sudo apt-get install rsync on Debian/Ubuntu or, on RHEL/CentOS, yum install rsync).
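If you want a script to take care of this, here is a small sketch that reports an existing rsync or otherwise falls back to whichever of apt-get or yum is present. install_rsync is a hypothetical helper name, and it assumes you have sudo rights.

```shell
#!/bin/bash
# Sketch: make sure rsync is available, using the package managers mentioned above.
# install_rsync is a hypothetical helper; installing needs sudo rights.
set -euo pipefail

install_rsync() {
    if command -v rsync >/dev/null 2>&1; then
        echo "rsync already installed at $(command -v rsync)"
    elif command -v apt-get >/dev/null 2>&1; then   # Debian/Ubuntu
        sudo apt-get install rsync
    elif command -v yum >/dev/null 2>&1; then       # RHEL/CentOS
        sudo yum install rsync
    else
        echo "no apt-get or yum found; try building from source" >&2
        return 1
    fi
}
```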

Or you can download and build it from source, a fairly simple process, plus you get the sourcey goodness: you can look at the code (it's quite small) and work out how the good people worked their magic.

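mkdir -p /usr/local/src/RSYNC
cd /usr/local/src/RSYNC
# Assumes rsync-3.0.9.tar.gz has already been downloaded into this directory
tar xvzf rsync-3.0.9.tar.gz
cd rsync-3.0.9
# Turns "rsync-3.0.9" into "rsync/3.0.9", giving a versioned install prefix
install=$(basename "$(pwd)" | sed 's%-%/%')
./configure --prefix="/usr/local/$install"
make
make install    # may need sudo; rsync ends up in /usr/local/rsync/3.0.9/bin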

Either way, you'll have rsync installed. Now, let's look at using it. The simplest activity is to copy files or directories on the same machine. This uses the same general syntax as any other rsync procedure (i.e., rsync source destination). It also uses the -a (--archive) option, which is shorthand for recursing into directories, copying symlinks as symlinks, and preserving permissions, modification times, group and owner (owner only when run as root), along with transferring character and block devices and special files such as named sockets. These latter parts can cause some issues which will be noted soon. Usually you will want some extra information about what is happening as the sync is occurring, so the -v option is also typical.

Thus a very simple transfer takes the form of the following example:

rsync -av /home/user /media/backup
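A quick way to check what -a preserves is to copy a scratch directory containing a symlink and a file with an old modification time. This sketch assumes GNU touch and stat, as found on most Linux systems; the function and file names are illustrative.

```shell
#!/bin/bash
# Sketch: check two things -a preserves, symlinks and modification times.
# Assumes GNU touch and stat; function and file names are illustrative.
set -euo pipefail

archive_demo() {
    local scratch
    scratch=$(mktemp -d)
    mkdir -p "$scratch/src"
    echo "data" > "$scratch/src/file.txt"
    touch -d '2001-01-01 00:00:00' "$scratch/src/file.txt"   # an old, easy-to-spot mtime
    ln -s file.txt "$scratch/src/link.txt"                    # relative symlink

    rsync -a "$scratch/src/" "$scratch/copy"

    [ -L "$scratch/copy/link.txt" ] && echo "link.txt is still a symlink"
    # stat -c %Y prints the modification time in seconds since the epoch
    [ "$(stat -c %Y "$scratch/src/file.txt")" = "$(stat -c %Y "$scratch/copy/file.txt")" ] \
        && echo "file.txt kept its modification time"
}
```

Running archive_demo should print both confirmation lines.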

Note the lack of trailing slashes. Rsync doesn't really care about a trailing slash on the destination, but it matters a great deal on the source (which is not the case with cp, for example). Without a trailing slash on the source, rsync copies the directory itself, so in the above example the files end up in /media/backup/user. With a trailing slash on the source (e.g., /home/user/), only the contents of the directory are copied, so the files land directly in /media/backup.
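The trailing-slash rule is easy to check in a throwaway directory. In this sketch the function name and directory layout are made up; it copies the same directory once without and once with a trailing slash on the source.

```shell
#!/bin/bash
# Sketch: the trailing-slash rule, demonstrated in a throwaway scratch directory.
# The function name and directory names are made up for illustration.
set -euo pipefail

trailing_slash_demo() {
    local scratch
    scratch=$(mktemp -d)
    mkdir -p "$scratch/home/user" "$scratch/no-slash" "$scratch/with-slash"
    echo "hello" > "$scratch/home/user/notes.txt"

    # No trailing slash on the source: the directory "user" itself is copied
    rsync -a "$scratch/home/user" "$scratch/no-slash"
    # Trailing slash on the source: only the contents of "user" are copied
    rsync -a "$scratch/home/user/" "$scratch/with-slash"

    # List the copied files relative to the scratch directory
    (cd "$scratch" && find no-slash with-slash -type f | sort)
}
```

Running trailing_slash_demo prints:

no-slash/user/notes.txt
with-slash/notes.txt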

Often you will want to synchronise across a network, using the -e option to specify the remote shell. Whilst rsync is nice enough to give one the option of using insecure protocols, in reality it is probably best to stick to something like ssh (e.g., -e ssh). Another common feature, particularly if you're keeping a single up-to-date mirror rather than a historical record of backups, is --delete, which deletes files on the destination that are not on the source. Give careful consideration to whether you want this or not.

rsync -av --delete -e ssh /home/user user@remotesystem:/backup/directory

Again, think about whether you really want the --delete option. If you are not sure, take it out. You may end up with old copies of renamed or removed files accumulating on the destination, but if that's preferable to losing files that you didn't think you wanted to keep, leave --delete off.
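If you want to see exactly what --delete does before trusting it with real data, this sketch runs the same local sync without and then with it in a scratch directory. The function and file names are made up for illustration.

```shell
#!/bin/bash
# Sketch: what --delete does, shown in a throwaway scratch directory.
# The function name and file names are made up for illustration.
set -euo pipefail

delete_demo() {
    local scratch
    scratch=$(mktemp -d)
    mkdir -p "$scratch/src" "$scratch/dest"
    echo "keep" > "$scratch/src/current.txt"
    echo "old" > "$scratch/dest/stale.txt"     # exists only on the destination

    rsync -a "$scratch/src/" "$scratch/dest"
    echo "without --delete:"; ls -1 "$scratch/dest"

    rsync -a --delete "$scratch/src/" "$scratch/dest"
    echo "with --delete:"; ls -1 "$scratch/dest"
}
```

Running delete_demo should show stale.txt surviving the first sync and disappearing after the second:

without --delete:
current.txt
stale.txt
with --delete:
current.txt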

Another issue that one will probably run across is virtual file systems in home directories (e.g., .gvfs). These give users easy access to remote data via SFTP etc., and are pretty much standard on any modern desktop system, but they are not something you want copied into a backup. So check for any such directories (~/.wine or ~/.cache/gvfs are likely annoyances) and add exclusions for them to your synchronisation command. A complete remote, verbose, safe synchronisation that preserves permissions can thus look like the following:

rsync -av --exclude=.gvfs -e ssh /home/user user@remotesystem:/backup/directory

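Also consider using the --exclude-from option (e.g., --exclude-from='file-and-directorylist.txt'), where the file lists one pattern per line; this keeps a long list of exclusions out of the command itself.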
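As a sketch of the --exclude-from approach, the exclusion list can live in its own file and be passed to rsync alongside any other options. The helper names write_excludes and sync_with_excludes are hypothetical, and the patterns are simply the annoyances mentioned above.

```shell
#!/bin/bash
# Sketch: keep exclusions in a file instead of repeating --exclude on the command line.
# write_excludes and sync_with_excludes are hypothetical helper names.
set -euo pipefail

# Write an exclusion list: one pattern per line, lines starting with # are ignored.
# A pattern without a leading / can match below any directory in the transfer.
write_excludes() {
    cat > "$1" <<'EOF'
# virtual file systems and other things not worth mirroring
.gvfs
.cache/gvfs
.wine
EOF
}

# Usage: sync_with_excludes LISTFILE [other rsync options] SOURCE DEST
sync_with_excludes() {
    local list=$1
    shift
    rsync -av --exclude-from="$list" "$@"
}
```

For example:

write_excludes file-and-directorylist.txt
sync_with_excludes file-and-directorylist.txt -e ssh /home/user user@remotesystem:/backup/directory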

If one is ever in doubt about the results of an rsync command, --dry-run (-n) means that it doesn't actually change anything, but combined with -v it will list what it would do.
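One way to build --dry-run into day-to-day use is a small wrapper that previews a sync and asks for confirmation before running it for real. This is a sketch: confirm_sync is a hypothetical name, it passes its arguments straight through to rsync, and because it prompts it suits manual runs rather than cron jobs.

```shell
#!/bin/bash
# Sketch: preview a sync with --dry-run, then ask before doing it for real.
# confirm_sync is a hypothetical helper; it passes its arguments straight to rsync.
set -euo pipefail

confirm_sync() {
    # --dry-run (short form -n) lists what would happen without changing anything
    rsync -av --dry-run "$@"
    local answer
    read -r -p "Proceed with this sync? [y/N] " answer
    if [[ $answer == [yY] ]]; then
        rsync -av "$@"
    else
        echo "Aborted; nothing was changed." >&2
        return 1
    fi
}
```

For example: confirm_sync --delete -e ssh /home/user user@remotesystem:/backup/directory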

Some time in the future I'll expand this post to include material on rdiff and rsnapshot. For now, there's the following further reading:

Further Reading