
SaltyCrane Blog

Notes on Python, Django, and web development on Ubuntu Linux

    

Backup on Linux: rsnapshot vs. rdiff-backup (vs. Time Machine)

Apple's release of Leopard and its included backup utility, Time Machine, has generated a lot of talk about backups recently. I will admit Time Machine is pretty cool, and I believe it is a bit more than a glorified GUI on top of an existing *nix tool, as some have claimed. However, its core functionality is very similar to that of the command-line tool rsnapshot, which is itself based on an rsync script by Mike Rubel. Time Machine added a couple of features and a GUI to make it easy to use. Since I prefer the command line over GUIs most of the time anyway, rsnapshot seemed perfect for me.

To be thorough, I researched a number of other backup utilities for Linux. Dirvish and flyback were out because I prefer the command line and they didn't seem to offer anything more than rsnapshot. Scripting rsync wouldn't get me anything more than rsnapshot either, plus it would be more work. In the end, I eliminated all but rsnapshot and another command-line tool called rdiff-backup. rdiff-backup has some advantages over rsnapshot (and Time Machine) because it stores compressed deltas of each version of a file instead of a complete copy of the file each time it changes. This is not a big deal for small files, but for large files that change often, it makes a significant difference. However, the big disadvantage of rdiff-backup, for me, was the inability to perform different levels of backup, such as hourly, daily, weekly, and monthly. Depending on the needs of the user, this could negate the space-saving advantage, because covering a long time span means keeping a large number of snapshots.
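To put rough numbers on that storage difference, here is a minimal sketch in Python. The file size, delta size, and version count are made-up figures for illustration, and the two functions are simplified models rather than measurements of either tool: real deltas depend on the data and on how well it compresses.

```python
def full_copy_storage_mb(file_size_mb, versions):
    """Full-copy model: every changed version of the file is stored whole."""
    return file_size_mb * versions


def delta_storage_mb(file_size_mb, versions, delta_mb):
    """Delta model: one full current copy plus one delta per older version."""
    return file_size_mb + (versions - 1) * delta_mb


if __name__ == "__main__":
    # Hypothetical example: a 1000 MB file that changes every day,
    # where each change can be described by a 10 MB delta.
    size_mb, versions, delta_mb = 1000, 30, 10
    print("full copies:", full_copy_storage_mb(size_mb, versions), "MB")
    print("deltas:     ", delta_storage_mb(size_mb, versions, delta_mb), "MB")
```

With these assumed numbers, 30 versions cost 30000 MB as full copies and 1290 MB as a current copy plus deltas. For a file that never changes, both approaches store it only once, which is why the gap only matters for large files that change often.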

I ended up choosing rsnapshot over rdiff-backup for this last reason. It seems rdiff-backup is closer to a version control tool and rsnapshot closer to a traditional backup solution. It would be great to create a hybrid of the two tools to gain the advantages of each. I started to dig into the source of rdiff-backup (Python source makes me happy), but I didn't want to get too sidetracked with another project. For now, I am using Mercurial to version control my /etc and /home/.* config files and rsnapshot as a broader, general-purpose backup tool.


Here is my comparison of rsnapshot and rdiff-backup:

Similar Features
  • both use an rsync-like algorithm to transfer data (rsnapshot actually uses rsync; rdiff-backup uses the librsync library from Python)
  • both can be used over ssh (though rsnapshot cannot push over ssh without some extra scripting)
  • both use a simple copy of the source for the current backup
Written in: rsnapshot is written in Perl; rdiff-backup is written in Python and C.
Size: rdiff-backup stores previous versions as compressed deltas to the current version, similar to a version control system. rsnapshot uses actual files and hardlinks to save space. For small files, storage size is similar. For large files that change often, such as logfiles and databases, rdiff-backup requires significantly less space for a given number of versions.
Speed: rdiff-backup is slower than rsnapshot.
Metadata: rdiff-backup stores file metadata, such as ownership, permissions, and dates, separately.
Transparency: For rsnapshot, all versions of the backup are accessible as plain files. For rdiff-backup, only the current backup is accessible as plain files. Previous versions are stored as rdiff deltas.
Backup levels: rsnapshot supports multiple levels of backup, such as monthly, weekly, and daily. rdiff-backup can only delete snapshots earlier than a given date; it cannot delete snapshots in between two dates.
Community: Based on the number of responses to my post on the mailing lists (rsnapshot: 6, rdiff-backup: 0), rsnapshot has a more active community.
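
The hardlink approach mentioned under Size and Transparency can be sketched in a few lines of Python. This is not rsnapshot's code (rsnapshot is written in Perl and relies on rsync); it is a simplified imitation that handles only a flat directory of files. The names `take_snapshot`, `demo`, and `snapshot.N` are hypothetical and chosen just for this example.

```python
import filecmp
import os
import shutil
import tempfile


def take_snapshot(source, backup_root, keep=3):
    """Rotate snapshot.N directories and create a fresh snapshot.0."""
    # Drop the oldest snapshot so at most `keep` snapshots remain.
    oldest = os.path.join(backup_root, f"snapshot.{keep - 1}")
    if os.path.isdir(oldest):
        shutil.rmtree(oldest)
    # Shift the remaining snapshots up by one: snapshot.0 -> snapshot.1, ...
    for i in range(keep - 2, -1, -1):
        current = os.path.join(backup_root, f"snapshot.{i}")
        if os.path.isdir(current):
            os.rename(current, os.path.join(backup_root, f"snapshot.{i + 1}"))

    newest = os.path.join(backup_root, "snapshot.0")
    previous = os.path.join(backup_root, "snapshot.1")
    os.makedirs(newest)
    for name in sorted(os.listdir(source)):
        src_file = os.path.join(source, name)
        if not os.path.isfile(src_file):
            continue  # subdirectories are skipped to keep the sketch short
        prev_file = os.path.join(previous, name)
        dst_file = os.path.join(newest, name)
        if os.path.isfile(prev_file) and filecmp.cmp(src_file, prev_file, shallow=False):
            # Unchanged file: hardlink to the previous snapshot, no extra data stored.
            os.link(prev_file, dst_file)
        else:
            # New or changed file: store a complete copy.
            shutil.copy2(src_file, dst_file)


def write_text(path, text):
    with open(path, "w") as f:
        f.write(text)


def read_text(path):
    with open(path) as f:
        return f.read()


def demo():
    """Take two snapshots and report which files share storage."""
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "source")
        backups = os.path.join(tmp, "backups")
        os.makedirs(source)
        os.makedirs(backups)
        write_text(os.path.join(source, "stable.txt"), "never changes")
        write_text(os.path.join(source, "log.txt"), "v1")
        take_snapshot(source, backups)
        write_text(os.path.join(source, "log.txt"), "v2")
        take_snapshot(source, backups)

        def inode(snapshot, name):
            return os.stat(os.path.join(backups, snapshot, name)).st_ino

        stable_shared = inode("snapshot.0", "stable.txt") == inode("snapshot.1", "stable.txt")
        log_shared = inode("snapshot.0", "log.txt") == inode("snapshot.1", "log.txt")
        old_log = read_text(os.path.join(backups, "snapshot.1", "log.txt"))
        return stable_shared, log_shared, old_log


if __name__ == "__main__":
    print(demo())
```

On a filesystem that supports hardlinks, the unchanged file should share one inode across both snapshots, while the changed file gets a complete second copy. Every snapshot stays browsable as plain files, and the cost of that transparency is a full copy for each changed version.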

12 Comments


#1 yungchin commented on 2008-02-27:

Hi, nice post. I'd be interested to read more about how you've set up version control on your home dir (could make a nice blog post? :)).

I haven't figured out a manageable way yet (using bzr) - there are always lots of new files appearing that have to go to your ignore-list, or something like that...

Thanks! YC


#2 sofeng commented on 2008-03-26:

yungchin,
Thanks. I am still refining how I use version control for this. I plan to post about it sometime soon. Please let me know what solution you go with as well.


#3 manuel quinta commented on 2008-10-17:

I have used rdiff-backup for years to make backups over ssh, and I'm happy with it.

We used rsnapshot on our server to make a backup over ssh, but it took a lot of CPU. We also made a crazy misconfiguration: we wanted to stop the hourly backup because of the CPU load, but when you suspend the hourly backup, it will not make a daily one either. It just stopped making any backups!

We only noticed it by accident half a year later, but we did not lose any data, luckily!


#4 Eliot commented on 2008-10-17:

manuel, Thanks for the notes on your experiences.


#5 Chris Wilper commented on 2008-12-01:

Note on rdiff-backup, for transparency, there is a FUSE implementation called archfs that allows you to mount all snapshots in a read-only filesystem. It's a little rough around the edges, but I've been impressed now that I've got it to work.

It occurs to me that it should be possible to run rdiff-backup against the fuse-exposed snapshots, in order, skipping the ones you don't want. Haven't tried it, but it would be an interesting experiment...


#6 Eliot commented on 2008-12-08:

Chris, thanks a lot for the information. Please feel free to leave a link to your notes if you've posted any.


#7 johnP commented on 2009-06-18:

We've been using rdiff-backup for a few months. Our requirements are for a cross platform, cheap, solution, so anything that required hard-link support couldn't be considered. Personally, I'd wipe all the MS-windows server machines, but our customers would be mad. We're mostly *NIX for production systems, with a few Windows dev VMs.

In that time, we've recovered from "oops" moments twice. Once was during an email system upgrade that couldn't be completed and the other was while screwing with an Alfresco system permissions model that failed in the end. Backups rock, but folks here already know that.

Has anyone solved the remote Windows backup issue in a good free way that can actually be restored without loss of user and group permissions?

rdiff-backup is mostly good, but there are a few problems. Large file differencing doesn't work in our experience. If it doesn't crash, you'll get a completely new copy. We tried backing up complete Xen image files this way. We got around this issue by mounting the IMG files on the host and pushing the rdiff-backup to another system on the same LAN. Complete system backups are 2-3 minutes now.

That works for Xen, but not VirtualBox with a Vista-64 host. Any ideas?


#8 Eliot commented on 2009-06-22:

johnP, thank you for adding your experience with rdiff-backup. I'm sorry I can't be of any help to you. Maybe another visitor will have ideas.


#9 Jakob Malm commented on 2009-07-07:

I've been using rdiff-backup for my home system since March 2008 and am overall happy with it. I've had to restore single files a few times, and though that procedure is not very straightforward, it's been easy enough to figure out how to do it the few times I've needed it. The FUSE file system seems interesting.

My biggest gripe with rdiff-backup is that backing up large files like VirtualBox disk images takes very long. Since there is no progress indication, and I do backups manually (when I see a need for it), this can be a bit frustrating. But I haven't experienced any problems with it like johnP writes, and the resulting increment files appear to be reasonably small.


#10 Eliot commented on 2009-07-08:

Jakob, thanks for adding your experience also. Good to hear rdiff-backup makes you happy.


#11 koolnyze commented on 2009-10-09:

I have been using rdiff-backup for a long time and it seems to be working well.


#12 Maccy commented on 2010-01-11:

I have been using rdiff-backup for a couple of years now with just a few restore operations, and so far I have been lucky with it.

Recently I did a small trial of changing the metadata of media files, such as comments on JPEGs or ID3 tags on MP3 or Ogg files. Since rdiff-backup stores only diffs, even for binary files, I expected the diffs to be tiny. However, the diffs I got so far are roughly half the size of the original file when I changed a few characters.

The background for this is the question of whether media data should have an rsync or an rdiff backup. Currently I'm doing rsync on my pictures (jpg) and audio files (ogg). My plan was to change this to rdiff, basically to be safe against deletion of the files. This would be a perfect solution if rdiff-backup would find the real differences in the file, e.g. the changed metadata. Otherwise it will clutter my backup system with a lot of unnecessary data. So for the time being I will keep rsync for media data and rdiff-backup for all the other files.
