Today is world backup day, a day to highlight the importance of backups, protecting data, and keeping systems secure (at least that's what Wikipedia says). I'm taking this day as a chance to review my backup strategy, and make sure I'm happy with the coverage I'm getting.
I mentioned a few years ago in my backup strategy post that I have an additional USB drive as part of my backups, which provides some coverage many may be missing. I've had a few people comment on what a great idea it is, so I wanted to highlight why I have it, and the part it plays in my wider backup and disaster recovery plan.
#What does it do?
The reason I have this USB is to cover a use case I'm surprised more people haven't actively considered. For proper backups you should have at least:
- 3 distinct copies
- 2 different formats
- 1 off-site
If it doesn't exist in 3 places, it might as well not exist. However, it's all well and good having those locations, but how do you ensure you can still get to them? For me, I have all my data at home, and an off-site copy in Backblaze. Should something happen to my server, I can probably restore it using some access keys I have lying around. But what if I lose everything - what happens then? I don't have access to either my Backblaze credentials or the encryption key used by `restic` to encrypt my data. It's no good having backups if you can't access them!
And that's where this USB comes in. It's an additional backup, but one that I can access physically, with an encryption key I know. So in the event of an issue, I can get to my USB, use it to log in to Backblaze, and start rebuilding my life.
Obviously in reality there are other important things I'd have to deal with should my house burn down before getting my data back.
#What is it?
Well, it's a USB drive with my data on it, obviously! Careful thought went into every decision about the device's design to make sure it's as fit for purpose as possible.
The drive itself is a Corsair Survivor. I needed a drive that's rugged, reliable, and was available in a reasonable capacity at a reasonable price. The Survivor is both waterproof and shockproof, which helps it survive most environments without issue.
I think the "surprisingly flimsy" comment in one review is either down to a bad unit, or an unreasonable expectation that the drive survive being thrown at the ground. There are other videos on the internet which filled me with confidence.
I bought 2 drives and configured them identically. On reflection, if I were following the laws of probability correctly, I would have bought the drives separately, to reduce the risk of both coming from the same faulty manufacturing batch. But so far, so good.
Moving deeper, the drive itself contains a single, LUKS-encrypted, BTRFS partition. Now before you all start screaming at your screen "Why not ZFS?!", there are a few good reasons.
Firstly, I needed something well supported. Sure, ZFS is fantastically supported, but I wanted something simple, and the last thing I wanted to deal with was DKMS issues when trying to recover my data. BTRFS works absolutely fine on a single partition (I'm writing this from a BTRFS-on-root laptop). Heck, I could even do it on Windows!
Secondly, the features. I'll move on to how the backups are captured shortly, but in short, filesystem snapshots aren't of use to me. What is useful, however, is bit-rot detection and a little compression, both of which BTRFS has. Detecting something odd-looking about the drive during a backup, rather than discovering it when it's time to restore, will save me a huge amount of headache. Compression isn't a huge factor here, as the drive is much larger than the data I'm storing, but it can help a little.
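If you want to build something similar, provisioning looks roughly like this. This is a sketch, not my exact commands: the device path and mapper name are examples, and `luksFormat` destroys everything on the target device.

```bash
# Encrypt the raw device (prompts for the passphrase - pick one you'll remember!)
sudo cryptsetup luksFormat /dev/sdX

# Unlock it, exposing /dev/mapper/backup-usb
sudo cryptsetup open /dev/sdX backup-usb

# Create a single BTRFS filesystem on the unlocked device
sudo mkfs.btrfs -L backup-usb /dev/mapper/backup-usb

# Mount with zstd compression enabled
sudo mount -o compress=zstd /dev/mapper/backup-usb /mnt/backup-usb
```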
#Where is it?
If you came here trying to find out where I keep my backups, you're out of luck I'm afraid.
The 2 drives are in separate locations. One I keep in my house, in a different room to my server, which means if there's some kind of issue there, I may still be able to recover easily.
The other is in an undisclosed location off-site. Somewhere far enough away that it won't be affected by physical damage to my house, but not so far away that I have to travel for hours and hours to restore from it. It does mean the backups don't get updated very often, but the initial seed is there, and that's a start.
The configuration for it lives on my GitLab server, allowing me to version and sync the config easily between devices, and even modify it slightly without access to the drives. It contains a few pieces of semi-sensitive data, so it's kept private I'm afraid.
#How does it work?
Ah yes, the fun part.
The core of my USB backup is `rsnapshot` - a wrapper around `rsync` which adds automation and snapshot functionality. `rsnapshot` connects to servers over SSH, and pulls down files efficiently (or as efficiently as `rsync` can - sorry, `zfs send` fans). Only needing SSH makes it both simple and secure to pull down any files I need, from any of my servers, whilst preserving permissions.

`rsnapshot`'s snapshotting works by creating hard links, meaning only changed files take up extra space. Sure, it's not perfect, as a small change in a large file duplicates the entire file, but at my scale that's fine.
Permissions-wise, I have had to do a bit of a hack. Because I'm a sane system administrator, `root` SSH is disabled (if yours isn't, go fix that). Lots of files are owned by someone other than me, and my user doesn't natively have access to them. To get around this, I use this:

```
jake ALL=(ALL) NOPASSWD: /usr/bin/rsync
```

This line, when added to my `sudoers` file, lets me run `rsync` without needing to enter my password. I can then combine that with `--rsync-path="sudo rsync"`. Yes, it's not ideal for security reasons, as `rsync` can then be used to read any file, but it requires an attacker to have access to my user account. If they already have that, then as far as I'm concerned I've already lost.
The backup script looks like this:

```bash
#!/usr/bin/env bash
set -e

cd "$(dirname "$0")"

# Enable compression
btrfs property set "$PWD" compression zstd

# Create config
trap 'rm -f "$CONFIG_FILE"' EXIT
CONFIG_FILE=$(mktemp) || exit 1
envsubst < ./rsnapshot.conf > "$CONFIG_FILE"

echo "> Capturing snapshots..."
rsnapshot -c "$CONFIG_FILE" -V main

echo "> Scrubbing drive..."
sudo btrfs scrub start -B "$PWD"
```
First, I ensure compression is set up for BTRFS. There's no strict need to do this every run, but it's good to assert the drive is configured correctly.
Secondly, we create the configuration for `rsnapshot`. `rsnapshot`'s configuration format is quite simple (and requires tabs for indenting), but doesn't support any kind of variable replacement, nor referencing paths relative to the working directory. To ensure backup logs and scripts can be referenced relatively, I needed to insert `$PWD` in a few places. For that, I template the configuration with `envsubst` into a temporary file, and point `rsnapshot` at that instead. Sure, it's a bodge, but it works. `trap` makes sure the temporary file is removed when the script exits.
Thirdly, we run `rsnapshot` with the previously-created config file. `-V` shows exactly what's being run, and most usefully to me, the download progress. `main` on the end references the retention period, which for me is 14. Not 14 days, 14 versions. Inside, `rsnapshot` pulls down files from a variety of servers, and runs a few local scripts. The first is a helper script which backs up a directory listing rather than the files themselves, which I use for my Linux ISO collection.
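For illustration, a cut-down config along these lines ties the pieces together. The hostname and paths are invented, and the fields must be separated by real tabs; the per-backup `+rsync_long_args` override is how I'd expect the `sudo rsync` trick from earlier to be wired in:

```
config_version	1.2
snapshot_root	$PWD/snapshots/

# Keep 14 versions under the "main" retention name
retain	main	14

# Pull over SSH, elevating on the remote end via sudo
backup	jake@server.example.com:/etc/	server/	+rsync_long_args=--rsync-path="sudo rsync"
```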
Finally, we initiate a manual scrub of the drive, to confirm all data on it looks correct and nothing has corrupted during the backup or whilst the drive was idle. It runs in the foreground so I know it's still doing things, and to not unplug the drive until it's finished.
#How's it working?
Well, fortunately I'm glad to say I've never needed to resort to using it. But it's incredibly comforting knowing it's there. It's been nearly 2 years since I first started this, and it's now got all my important data in geographically-distinct locations, easy to access should the worst happen.
The biggest problem with the system is actually me: I forget to use it. Currently, I run the backup "when I remember", which is far from a sustainable model. In this case, the backup is still incredibly useful, even if it's a little old, as the critical data on it doesn't change too much, and my Backblaze backups will continue running regardless. But newer data is definitely better.
Sure, I could add a recurring event to my calendar to do it, but where's the fun in that? An idea I've considered is adding a healthcheck ping to the end of the backup, and allowing healthchecks.io to scream at me until I actually run the backup. Given I use healthchecks to monitor my other backups and recurring tasks, it makes sense to keep everything in 1 place.
But, for now, it works.