Sunday, January 13, 2008

grok rsync for automatic, redundant backups

Been under the weather a little bit... but had a fun day Saturday watching NFL playoff football and WSU Coug basketball... and did some shell scripting on the couch with the tv on:

...wrote a couple of Bash shell scripts to automatically back up all the files on my Mac with rsync over ssh (a secure, encrypted connection) each night. The scripts back up the files to two external hard drives connected to our home network. I put the scripts in a .cron.daily folder in my home directory and added crontab entries to run them every day at 3:15 a.m.

The first file:
#!/bin/sh
# remove the week-old backup (e.g. backup-Users-Sunday)
rm -rf /{path_to_your_backup_server_here}/backup-Users-`date +%A`

# move the current backup to today's day-of-week folder:
mv /{path_to_your_backup_server_here}/current-Users/ /{path_to_your_backup_server_here}/backup-Users-`date +%A`

# use hard links instead of keeping multiple copies of the files:
# when the new full backup is run with rsync using the --link-dest
# option, any file under /Users/ on your machine that already existed,
# unchanged, in the previous backup (now in backup-Users-[day]) is
# hard-linked from current-Users to the backup-Users-[day] directory
# rather than copied again
rsync --link-dest=/{path_to_your_backup_server_here}/backup-Users-`date +%A` -avz /Users/ /{path_to_your_backup_server_here}/current-Users/
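
If the backup destination is reached over ssh rather than as a mounted volume, the same rsync call just takes a user@host: prefix on the destination. A rough sketch -- the user name "backupuser" and host "backupbox" are hypothetical, not part of my setup:
# with a remote destination, an absolute --link-dest path is resolved
# on the receiving machine, so it should point at the snapshot there
rsync -avz -e ssh \
  --link-dest=/{path_to_your_backup_server_here}/backup-Users-`date +%A` \
  /Users/ backupuser@backupbox:/{path_to_your_backup_server_here}/current-Users/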


The second file, if you do not already have a crontab set up:
# .crontab
# fields (numbers or asterisks) are separated by spaces or tabs
# minute hour mday month wday command
15 03 * * * sh /Users/{your_user_name}/.cron.daily/backup.sh
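
To load that file into cron the first time -- a minimal sketch, assuming it is saved as ~/.crontab:
# install the file as your crontab (this replaces any existing crontab)
crontab ~/.crontab
# list the installed entries to confirm cron picked them up
crontab -l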

The beauty of rsync is that it compares the files on the source and the backup (by default using file size and modification time, plus rolling checksums on the contents of files that have changed) and only transfers the pieces that are different. So after tonight's first full backups, the nightly traffic will be minimal -- just the items that have changed each day.
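
An easy way to see just how little moves on a given night is to add rsync's dry-run flag to the same command -- purely an illustrative check, using the same placeholder path as above:
# -n (dry run) lists what would be transferred without copying anything;
# --stats prints a summary of file counts and bytes at the end
rsync -avzn --stats --link-dest=/{path_to_your_backup_server_here}/backup-Users-`date +%A` \
  /Users/ /{path_to_your_backup_server_here}/current-Users/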

I used two old external hard drives that I have had for about 4 years -- the total space is only about 300 GB, but that should give us plenty of room for a while (we don't have a lot of video). I have it set up to back up to a folder called "current-Applications" or "current-Users", etc. Each day the script moves those "current" files into a folder named after the day -- "backup-Users-Monday", "backup-Users-Tuesday", and so on -- then copies any updated or new files into the current directory. So this way there is a snapshot of each of the last 7 days of backups.
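Since the rotate-then-rsync steps are the same for each top-level folder, the whole thing can be wrapped in a loop. A rough sketch -- it assumes only /Users and /Applications are being backed up, and uses the same placeholder destination:
#!/bin/sh
DEST=/{path_to_your_backup_server_here}
for DIR in Users Applications; do
    # drop the week-old snapshot for today's day name
    rm -rf $DEST/backup-$DIR-`date +%A`
    # rotate the current backup into today's snapshot
    mv $DEST/current-$DIR/ $DEST/backup-$DIR-`date +%A`
    # re-sync, hard-linking unchanged files against today's snapshot
    rsync --link-dest=$DEST/backup-$DIR-`date +%A` -avz /$DIR/ $DEST/current-$DIR/
done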

...I'll also be setting up crontab entries for a few sites I run to update their RSS feeds, etc., instead of running a daemon, so those updates run on their own. The Golf News feed will probably run twice a day (3:30 a.m. and 11 a.m. on weekdays) -- with cron, you can lay out the days it will run years in advance.
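
In the same .crontab format, those feed updates would look something like this -- the script name is hypothetical, and this reads the schedule as weekdays only for both runs:
# update the Golf News RSS feed at 3:30 a.m. and 11 a.m., Monday-Friday
30 03 * * 1-5 sh /Users/{your_user_name}/.cron.daily/update_golf_news_rss.sh
00 11 * * 1-5 sh /Users/{your_user_name}/.cron.daily/update_golf_news_rss.sh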

Remember, the computers work for us, we don't work for the computers!