USB off-site backup

2023-03-31

6 minutes

#backups #privacy #security #self-hosting

Today is World Backup Day, a day to highlight the importance of backups, protecting data, and keeping systems secure (at least that's what Wikipedia says). I'm taking the day as a chance to review my backup strategy and make sure I'm happy with the coverage I'm getting.

I mentioned a few years ago in my backup strategy post that I keep an additional USB drive as part of my backups, which provides some coverage many people may be missing. A few people have commented on what a great idea it is, so I wanted to explain why I have it, and the part it plays in my wider backup and disaster recovery plan.

#What does it do?

The reason I have this USB is to cover a use case I'm surprised more people haven't actively considered. For proper backups you should have at least:

3 distinct copies
2 different formats
1 off-site

If it doesn't exist in 3 places, it might as well not exist. Having those locations is all well and good, but how do you make sure you can still access them? I keep all my data at home, with an off-site copy in Backblaze. Should something happen to my server, I can probably restore it using some access keys I have lying around. But what if I lose everything? I'd have access to neither my Backblaze credentials nor the encryption key restic uses to encrypt my data. It's no good having backups if you can't access them!

And that's where this USB comes in. It's an additional backup, but one that I can access physically, with an encryption key I know. So in the event of an issue, I can get to my USB, use it to log in to Backblaze, and start rebuilding my life.

<aside>

Obviously, in reality, if my house burned down there would be other important things to deal with before getting my data back.

</aside>
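As a rough illustration of that recovery step, here's a minimal sketch of restoring the latest restic snapshot from Backblaze B2 once the USB is unlocked. The file names b2.env and restic-password, and the function names, are hypothetical. It assumes b2.env defines B2_ACCOUNT_ID, B2_ACCOUNT_KEY and RESTIC_REPOSITORY (a b2:bucket:path repository), which are the environment variables restic reads for its B2 backend. Setting DRY_RUN=1 prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch: restore the latest restic snapshot from Backblaze B2 using
# credentials kept on the unlocked USB drive. File and function names are
# hypothetical placeholders.
set -euo pipefail

# run CMD...: execute CMD, or just print it when DRY_RUN=1
# (arguments are joined with spaces, not shell-quoted)
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# restore_latest CREDS_DIR TARGET_DIR
restore_latest() {
  local creds=$1 target=$2
  # Assumed to define B2_ACCOUNT_ID, B2_ACCOUNT_KEY and RESTIC_REPOSITORY
  # (for example RESTIC_REPOSITORY=b2:my-bucket:server); set -a exports them
  set -a
  # shellcheck source=/dev/null
  source "$creds/b2.env"
  set +a
  # restic reads the repository password from this file
  export RESTIC_PASSWORD_FILE="$creds/restic-password"
  # List what's available, then restore the newest snapshot
  run restic snapshots
  run restic restore latest --target "$target"
}
```

For example, DRY_RUN=1 restore_latest /mnt/usb-backup/keys /srv/restore would show the two restic commands without touching the network.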

#What is it?

Well, it's a USB drive with my data on, obviously! Careful thought went into every decision made about the design of the device to make sure it's as fit for purpose as possible.

The drive itself is a Corsair Survivor. I needed a drive that was rugged, reliable, and available in a reasonable capacity at a reasonable price. The Survivor is both waterproof and shockproof, which helps it survive most environments without issue.

I've seen the Survivor described as "surprisingly flimsy", but I think that's either a bad unit, or a sign that throwing it at the ground from a short distance is an unreasonable test. Other videos online filled me with confidence.

I bought 2 drives and configured them identically. On reflection, if I were following the laws of probability properly, I would have bought the drives separately, to minimise the risk of a single bad manufacturing batch affecting both. But so far, so good.

#Partitions

Moving deeper, the drive itself contains a single LUKS-encrypted BTRFS partition. Now, before you all start screaming at your screen "Why not ZFS?!", there are a few good reasons.

Firstly, I needed something well supported. Sure, ZFS is fantastically supported too, but I wanted something simple, and the last thing I wanted to deal with was DKMS issues when trying to recover my data. BTRFS is in the mainline Linux kernel and works absolutely fine on a single partition (I'm writing this from a laptop with BTRFS on root). Heck, I could even do it on Windows!

Secondly, the features. I'll cover how the backups are captured shortly, but in short, filesystem snapshots aren't of use to me. What is useful, however, is bit-rot detection (BTRFS checksums data and metadata) and a little transparent compression. Detecting that something looks odd about the drive during a backup, rather than discovering it when it's time to restore, will save me a huge amount of headache. Compression isn't a big factor here, as the drive is much larger than what I'm actually using, but it helps a little.
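For anyone wanting to replicate this layout, the following is a minimal sketch of preparing a single LUKS-encrypted BTRFS partition with cryptsetup and mkfs.btrfs. It assumes the partition (e.g. /dev/sdX1) already exists; the device, mapper name, mount point and function names are placeholders. It enables zstd compression as a mount option, whereas backup.sh sets it as a BTRFS property instead. Running it for real wipes the partition, so DRY_RUN=1 only prints the commands.

```bash
#!/usr/bin/env bash
# Sketch: prepare a single LUKS2-encrypted BTRFS partition for a backup USB.
# Device, mapper name and mount point are placeholders.
set -euo pipefail

# run CMD...: execute CMD, or just print it when DRY_RUN=1
# (arguments are joined with spaces, not shell-quoted)
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# setup_drive PARTITION NAME MOUNTPOINT  (DESTROYS all data on PARTITION)
setup_drive() {
  local part=$1 name=$2 mnt=$3
  # Refuse to run for real against something that isn't a block device
  if [[ ${DRY_RUN:-0} != 1 && ! -b $part ]]; then
    echo "not a block device: $part" >&2
    return 1
  fi
  # Encrypt the partition; cryptsetup prompts for the passphrase
  run cryptsetup luksFormat --type luks2 "$part"
  # Unlock it as /dev/mapper/NAME
  run cryptsetup open "$part" "$name"
  # Create the filesystem inside the encrypted container
  run mkfs.btrfs --label "$name" "/dev/mapper/$name"
  run mkdir -p "$mnt"
  # Mount with transparent zstd compression
  run mount -o compress=zstd "/dev/mapper/$name" "$mnt"
}
```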

#Where is it?

If you came here trying to find out where I keep my backups, you're out of luck I'm afraid.

The 2 drives are in separate locations. One I keep in my house, in a different room to my server, which means if there's some kind of issue there, I may still be able to recover easily.

The other is in an undisclosed location off-site. Somewhere far enough away that it won't be affected by physical damage to my house, but not so far away that I have to travel for hours and hours to restore from it. It does mean the backups don't get updated very often, but the initial seed is there, and that's a start.

<tangent>

No, it's not in a wall under a rock that has no earthly business being where it is.

</tangent>

The configuration for it lives on my GitLab server, allowing me to version and sync the config easily between devices, and even modify it slightly without access to the drives. It contains a few pieces of semi-sensitive data, so it's kept private I'm afraid.

#How does it work?

Ah yes, the fun part.

The core of my USB backup is rsnapshot, a wrapper around rsync which adds automation and snapshot functionality. rsnapshot connects to servers over SSH and pulls down files efficiently (or as efficiently as rsync can - sorry, zfs send fans). Only needing SSH makes it both simple and secure to pull down any files I need from any of my servers, whilst preserving permissions. rsnapshot's snapshots are built from hard links, so only files which have changed take up additional space. Sure, it's not perfect, as a small change to a large file duplicates the entire file, but at my scale that's fine.
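To make the hard-link idea concrete, here's a small sketch of snapshot rotation in the same spirit as rsnapshot: shift main.0 to main.1 and so on, then recreate main.0 as a hard-linked copy with cp -al. It's an illustration, not rsnapshot's actual code. The hypothetical replace_file stands in for rsync, which by default writes a changed file under a temporary name and renames it into place, breaking the link so older snapshots keep their version. The function names are made up, and cp -al and stat -c assume GNU coreutils.

```bash
#!/usr/bin/env bash
# Sketch of hard-link snapshot rotation, in the style rsnapshot uses.
# Function names are hypothetical; requires GNU coreutils.
set -euo pipefail

# rotate_snapshots DIR NAME RETAIN: keep at most RETAIN snapshots NAME.0..N
rotate_snapshots() {
  local dir=$1 name=$2 retain=$3 i
  # Drop the oldest snapshot so at most RETAIN remain afterwards
  rm -rf "${dir:?}/$name.$((retain - 1))"
  # Shift the others up: NAME.1 -> NAME.2, NAME.0 -> NAME.1, ...
  for ((i = retain - 2; i >= 0; i--)); do
    if [[ -d $dir/$name.$i ]]; then
      mv "$dir/$name.$i" "$dir/$name.$((i + 1))"
    fi
  done
  # Recreate NAME.0 as a hard-linked copy of NAME.1: unchanged files share
  # the same inode, so they take up no extra space
  if [[ -d $dir/$name.1 ]]; then
    cp -al "$dir/$name.1" "$dir/$name.0"
  else
    mkdir -p "$dir/$name.0"
  fi
}

# replace_file PATH CONTENT: write a temp file, then rename it over PATH,
# which gives PATH a new inode and leaves older hard links untouched
replace_file() {
  local tmp
  tmp=$(mktemp "$1.XXXXXX")
  printf '%s\n' "$2" > "$tmp"
  mv -f "$tmp" "$1"
}
```

With a retention of 14, as in the main level, the same rotation simply runs with RETAIN set to 14.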

Permissions-wise, I've had to do a bit of a hack. Because I'm a sane system administrator, root SSH login is disabled (if yours isn't, go fix that). Lots of files are owned by someone other than me, and my user doesn't natively have access to them. To get around this, I use this:

/etc/sudoers

```
jake ALL=(ALL) NOPASSWD: /usr/bin/rsync
```

This line, added to my sudoers file, lets me run rsync via sudo without entering my password. I then pass --rsync-path="sudo rsync" to rsync, so the rsync process on the server runs as root. Yes, it's not ideal for security, as rsync can then be used to read (or overwrite) any file, but an attacker would first need access to my user account. If they already have that, then as far as I'm concerned I've already lost.
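A minimal sketch of how those pieces fit together when pulling from a server: the transport stays plain SSH as my unprivileged user, while the remote rsync is started through sudo (relying on the NOPASSWD rule, since there's no terminal for a password prompt). The function name, host and paths are hypothetical; the rsync flags themselves are standard.

```bash
#!/usr/bin/env bash
# Sketch: build an rsync command that pulls files over SSH as a normal user,
# but runs the remote rsync as root via sudo. Host and paths are hypothetical.
set -euo pipefail

# build_pull_cmd HOST SRC DEST: store the command in the global array PULL_CMD
#   --archive       recurse and preserve permissions, times and owners
#   --numeric-ids   keep raw uid/gid numbers rather than mapping names
#   --rsync-path    command used to start rsync on the server
#   -e ssh          transport: plain SSH as the current user
build_pull_cmd() {
  local host=$1 src=$2 dest=$3
  PULL_CMD=(
    rsync
    --archive
    --numeric-ids
    --rsync-path='sudo rsync'
    -e ssh
    "$host:$src"
    "$dest"
  )
}
```

Usage would be along the lines of build_pull_cmd jake@server.example /etc/ ./server/ followed by "${PULL_CMD[@]}". Building an array rather than a string keeps the space in "sudo rsync" intact.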

The backup script looks like this:

backup.sh

```bash
#!/usr/bin/env bash

set -e

cd "$(dirname "$0")"

# Enable compression
btrfs property set "$PWD" compression zstd

# Create config
trap 'rm -f "$CONFIG_FILE"' EXIT
CONFIG_FILE=$(mktemp) || exit 1

envsubst < ./rsnapshot.conf > "$CONFIG_FILE"

# Take a snapshot at the "main" retention level, verbosely
echo "> Capturing snapshots..."
rsnapshot -c "$CONFIG_FILE" -V main

# Verify checksums across the drive; -B keeps the scrub in the foreground
echo "> Scrubbing drive..."
sudo btrfs scrub start -B "$PWD"
```

First, I make sure compression is set up for BTRFS. This isn't strictly necessary on every run, but it's good to assert the drive is configured correctly.

Secondly, we create the configuration for rsnapshot. rsnapshot's configuration format is quite simple (and requires tabs, not spaces, between fields), and doesn't support any kind of variable substitution, nor paths relative to the working directory. So that backup logs and scripts can be referenced relative to the drive, I needed to insert $PWD in a few places. To do that, I template the configuration with envsubst into a temporary file, and point rsnapshot at that instead. Sure, it's a bodge, but it works. The trap makes sure the file is removed when the script exits.
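For illustration, here's a sketch of rendering a minimal rsnapshot config relative to the drive's mount point. It swaps envsubst for plain printf, which guarantees real tab characters between fields, and adds a small check that flags lines missing a tab. The directives (config_version, snapshot_root, cmd_rsync, cmd_ssh, logfile, retain, backup, backup_script) are standard rsnapshot ones, but the host, paths, script name and function names are hypothetical placeholders. rsnapshot's own configtest command (rsnapshot -c FILE configtest) remains the authoritative syntax check.

```bash
#!/usr/bin/env bash
# Sketch: render an rsnapshot config relative to the drive's mount point and
# check that fields are TAB-separated. Host, paths and names are hypothetical.
set -euo pipefail

# render_config ROOT: print a config for a drive mounted at ROOT.
# printf '\t' guarantees real TABs, which rsnapshot requires between fields.
render_config() {
  local root=$1
  printf 'config_version\t1.2\n'
  printf 'snapshot_root\t%s/snapshots/\n' "$root"
  printf 'cmd_rsync\t/usr/bin/rsync\n'
  printf 'cmd_ssh\t/usr/bin/ssh\n'
  printf 'logfile\t%s/rsnapshot.log\n' "$root"
  printf 'retain\tmain\t14\n'
  printf 'backup\tjake@server.example:/etc/\tserver/\n'
  printf 'backup_script\t%s/scripts/list-isos.sh\tisos/\n' "$root"
}

# check_tabs FILE: fail if a non-blank, non-comment line has no TAB in it
check_tabs() {
  local n=0 line
  while IFS= read -r line || [[ -n $line ]]; do
    n=$((n + 1))
    [[ -z $line || $line == \#* ]] && continue
    if [[ $line != *$'\t'* ]]; then
      echo "line $n: no TAB separator" >&2
      return 1
    fi
  done < "$1"
}
```

A run might then look like render_config "$PWD" > "$CONFIG_FILE" followed by check_tabs "$CONFIG_FILE".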

Thirdly, we run rsnapshot with the generated config file. -V shows exactly what's being run and, most usefully to me, the download progress. main on the end is the name of the retention level, which for me keeps 14 versions. Not 14 days, 14 versions. During the run, rsnapshot pulls down files from a variety of servers and runs a few local scripts. The first is a helper script which backs up just a directory listing rather than the files themselves, which I use for my Linux ISO collection. Finally, perhaps the most useful

Finally, we start a manual scrub of the drive, to confirm all the data on it is intact and nothing has been corrupted during the backup or whilst the drive was idle. It runs in the foreground (-B), so I know it's still working, and not to unplug the drive until it's finished.
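As an extra belt-and-braces check after the scrub, something like the following could fail the run if any of the drive's BTRFS error counters are non-zero. It's a sketch that parses the output of btrfs device stats, whose lines look like [/dev/mapper/usb].corruption_errs 0; the function name is hypothetical. Note the counters persist until reset with btrfs device stats -z, so an old error keeps tripping the check until it's acknowledged.

```bash
#!/usr/bin/env bash
# Sketch: fail if `btrfs device stats` output reports any non-zero counter.
# The function name is hypothetical.
set -euo pipefail

# stats_all_zero: read `btrfs device stats` output on stdin, print any
# non-zero counters, and return 1 if there were any
stats_all_zero() {
  awk 'NF && $NF != 0 { print "non-zero: " $0; bad = 1 } END { exit bad }'
}
```

Usage would be along the lines of sudo btrfs device stats "$PWD" | stats_all_zero at the end of backup.sh.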

#How's it working?

Well, I'm glad to say I've never needed to use it. But it's incredibly comforting knowing it's there. It's been nearly 2 years since I started this, and it now holds all my important data in geographically distinct locations, easy to access should the worst happen.

The biggest problem with the system is actually me: I forget to use it. Currently, I run the backup "when I remember", which is far from a sustainable model. Even so, the backup is still incredibly useful when it's a little old, as the critical data on it doesn't change much, and my Backblaze backups keep running regardless. But newer data is definitely better.

Sure, I could add a recurring event to my calendar, but where's the fun in that? One idea I've considered is adding a healthcheck ping to the end of the backup, and letting healthchecks.io scream at me until I actually run it. Given I already use healthchecks to monitor my other backups and recurring tasks, it makes sense to keep everything in 1 place.
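The healthchecks.io idea could look something like this sketch: wrap the backup in start, success and fail pings, so a check with a long period (say, a month) starts alerting once the backup is overdue. It uses healthchecks.io's documented ping URLs (https://hc-ping.com/ followed by the check's UUID, plus /start and /fail). HC_UUID and the function names are placeholders, and HC_DRY_RUN=1 prints the URLs instead of calling curl.

```bash
#!/usr/bin/env bash
# Sketch: wrap a command with healthchecks.io start/success/fail pings.
# HC_UUID is a placeholder for the check's UUID; function names are made up.
set -euo pipefail

# hc_url [SUFFIX]: ping URL for the check; SUFFIX is empty, "start" or "fail"
hc_url() {
  printf 'https://hc-ping.com/%s%s\n' "${HC_UUID:?set HC_UUID}" "${1:+/$1}"
}

# hc_ping [SUFFIX]: send the ping, or just print the URL if HC_DRY_RUN=1
hc_ping() {
  local url
  url=$(hc_url "${1:-}")
  if [[ ${HC_DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "$url"
  else
    # A monitoring hiccup shouldn't fail the backup itself
    curl -fsS -m 10 --retry 5 -o /dev/null "$url" || true
  fi
}

# with_healthcheck CMD...: ping /start, run CMD, then ping success or /fail
with_healthcheck() {
  hc_ping start
  if "$@"; then
    hc_ping
  else
    local status=$?
    hc_ping fail
    return "$status"
  fi
}
```

Running the backup as with_healthcheck ./backup.sh would then report both the start and the outcome of each run.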

But, for now, it works.
