Backup Strategies

I’ve done an awful lot of disaster recovery (DR) planning and implementation (and thankfully very little execution!) over the years. A well-made DR plan usually has a lot of work poured into it during the planning stages, followed by a few testing sessions that can easily test the mettle of the heartiest sysadmin. After everything is finished and all the stakeholders are happy, the documented plan is filed away in several safe places; contracts are signed for DR sites, hardware, and support; and life pretty much goes back to normal. A DR plan worth having includes at least one full-up test per year, and hopefully much more frequent tests of the feeder systems necessary to support a DR situation (backup media, offsite backup storage, etc.).

I used to do these plans for large companies whose very existence hinged upon their ability to get their data back online and their servers back on the internet as soon as possible. Any downtime was a giant loss. Their data wasn’t just some old school papers and photos from Aunt Marge’s birthday party; their data was their only real asset. It wasn’t just used in their business, it was their business.

I propose that an effective DR plan should be part of every media professional’s life. It isn’t enough to just have your media files sitting on a RAID. That data is your business. What would you do if you lost it all tomorrow? Don’t think it can happen? Would you like to make a bet with me on that? I’ve seen disks fail in ways that even the company that made them can’t describe. Sure, that RAID mirror set you have is pretty safe, until the power supply develops a fault and takes out both drives in a split second. Or there’s a fire, or flood, or earthquake, or other calamity that renders your RAID unreadable. Now what do you do? 10+ years of work were stored on that RAID, including the three jobs you’re working on in post right now. Sorry, Mr. Client, but I wasn’t careful with my data and lost the interviews we did with your now-deceased company founder/relative/what-have-you. Congratulations, you just lost that job, probably lost the client, and most likely lost other clients as well.

All of this can be prevented with a little bit of forethought. Sure, you’ll have to spend a little in order to implement your own DR plan, but you can (and, really, if you are going to be a professional, you must) do it. It’s never too late to start, and it will be time well spent when you need it (notice that I didn’t say if).

Let’s start by talking a bit about backup strategies. This is likely going to be a series of posts, so we’ll start with the basics and move on from there.

I think that a good backup strategy is multi-tiered. What is this, you ask? Well, an ordinary single-tier backup strategy that many people use is simply making a second copy of their data. This is better than nothing, but it isn’t completely fail-safe. The biggest problems here are that the backup media is rarely checked for consistency with the original data, and the backup media becomes a single point of failure for the DR scenario. On top of that, making this system work often requires that the backup media be readily accessible to the user (after all, who is going to use a backup system that is so inconvenient as to be irritating?), which usually means that the backup media is not stored in a safe location.
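To make that consistency problem concrete, here’s a minimal sketch (in Python, with hypothetical paths — point SOURCE and BACKUP at your own locations) of the kind of check a backup should get periodically: hash every file in the original location and compare it against the copy on the backup media. It only compares file contents, not permissions or other metadata, but it will catch silent corruption or files that never made it to the backup.

```python
"""Sketch: verify that a backup copy still matches the original data."""
import hashlib
from pathlib import Path

SOURCE = Path("/data/media")        # hypothetical original data location
BACKUP = Path("/mnt/backup/media")  # hypothetical backup copy

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large media files don't exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(source: Path, backup: Path) -> bool:
    """Report files that are missing from, or different in, the backup."""
    ok = True
    for original in source.rglob("*"):
        if not original.is_file():
            continue
        copy = backup / original.relative_to(source)
        if not copy.exists():
            print(f"MISSING in backup: {original}")
            ok = False
        elif sha256_of(original) != sha256_of(copy):
            print(f"MISMATCH: {original}")
            ok = False
    return ok

if __name__ == "__main__":
    print("Backup consistent" if verify(SOURCE, BACKUP) else "Backup has problems")
```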

One of the ways to fix these problems is to go to a multi-tiered solution. A well-designed solution can give the user the best of both worlds: a system that is fast and easy to use, and one that can get a disaster-stricken user back on their feet as quickly as possible. A multi-tiered solution might include a few different backup systems, sometimes with their own strategies. One way to implement this type of solution might be to have a network-attached disk to which the system writes periodic full (all-data) backups, and daily (or even hourly) incremental (only the data changed since the last full) backups. Then, once per week, a full backup is taken and automatically shipped over the network to a service like Amazon S3 or another off-site network storage location. Finally, once per month, a full backup is taken and placed onto a removable drive, which is then stored in an off-site physical storage location. When that disk is taken to the storage location, the oldest disk in storage is removed and brought back to take the next monthly backup. This system can be further supplemented with media-specific backups of projects as they occur, such as burning photos to DVD by day, week, or project, and then storing those DVDs off-site.
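A couple of those tiers are easy to script. The sketch below (Python again, with made-up paths and a made-up S3 bucket name; it assumes the boto3 package is installed and your AWS credentials are already configured the usual way) shows an hourly-style incremental copy to a network-attached disk plus a full backup that gets archived and shipped off-site to S3.

```python
"""Sketch of two tiers: incremental copies to a NAS, full backups to S3."""
import shutil
import tarfile
import time
from pathlib import Path

import boto3  # assumes boto3 is installed and credentials are configured

SOURCE = Path("/data/media")        # hypothetical working data
NAS = Path("/mnt/nas/incremental")  # hypothetical network-attached disk
BUCKET = "example-offsite-backups"  # hypothetical S3 bucket name

def incremental_backup(since: float) -> None:
    """Copy only files modified after `since` (a Unix timestamp) to the NAS."""
    for original in SOURCE.rglob("*"):
        if original.is_file() and original.stat().st_mtime > since:
            target = NAS / original.relative_to(SOURCE)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(original, target)

def weekly_full_backup() -> None:
    """Archive everything and ship it off-site to S3."""
    stamp = time.strftime("%Y-%m-%d")
    archive = Path(f"/tmp/full-backup-{stamp}.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(str(SOURCE), arcname=SOURCE.name)
    boto3.client("s3").upload_file(str(archive), BUCKET, archive.name)
    archive.unlink()  # clean up the local temporary archive

if __name__ == "__main__":
    # In practice these would run from cron/launchd: the incremental hourly,
    # the full once per week.
    incremental_backup(since=time.time() - 3600)
    weekly_full_backup()
```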

The result of a strategy like the one outlined above is that in the case of a minor issue, such as a failed drive in a laptop or desktop, or even a whole-machine failure, there is a very recent backup stored locally that can be restored to quickly regain access to all of the data. If backups are taken hourly, then chances are good that the user lost maybe an hour’s worth of work. In the event of a larger issue, such as a theft involving the office and main data store, the options exist to either restore from the monthly backup stored off-site or restore from the weekly backup stored on the network. The network-stored weekly backup has another advantage, too: if something occurred that forced the business to open in an alternate geographic location, that weekly backup is accessible from just about anywhere. It might take some extra time to get the image, but the business is back online with only a week’s worth of lost data, and could be back online in some far-flung area. In the event of an issue that takes the company off of the network, or where there might be some question as to the integrity of the data in the network-stored weekly, there is the off-site monthly that can be pulled in and restored.

Adding media-specific backups to this strategy makes it even more flexible. Accidentally deleted a master photo during culling? No problem: retrieve the DVD from its safe storage location, pop it back into the system, copy the necessary file(s), and everything is back to normal.

Next time, I’ll talk about what sort of data is important, and will touch on a few technologies to help make all of this work.

Jonathan does a lot of stuff. If you ask Jenny, maybe he does too much stuff.