Redundancy and Failure in Climbing and Computers

June 28

When it comes to data, skipping redundancy (of data, equipment, and so forth) and never testing your disaster recovery plan can cost you a lot of money, and in some cases might force you to close your doors for good. This is an unfortunate truth, but in the grand scheme of things, nobody is likely to die if data disappears (the exception is health-related data in the medical industry).

In the world of climbing, skipping tests and relying on a single piece of equipment can mean grave injury or death. There’s a lot climbing can teach you about how to approach redundancy and testing in your backup and disaster recovery plan.

1. Check your equipment

The first thing you do before climbing anything is inspect your equipment for cracks, chips, and other clues about its integrity. If there’s any question about the safety of a carabiner, rope, cam, or anything else, you must assume it’s going to fail and therefore can’t be trusted with your life. Luckily, many climbers carry lots of extra equipment as backups.

When you’re talking about computers, there are clues that might tell you a hard drive is nearing the end of its life. You can also think about its age. The average server lifespan is 3-5 years. How old is the one in question? Think about the age of your equipment and how much strain is consistently put on it, so that you have a plan for when it’s no longer useful. As with climbing equipment, if there’s any indication that it will fail, plan on it.
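As a rough illustration of the age check, here is a minimal Python sketch. The 3-5 year range comes from the paragraph above; the function names, the "ok"/"review"/"replace" labels, and the idea of keying everything off a purchase date are my own assumptions, not a standard. Age is only one clue, so treat the result as a prompt to look closer, not a verdict.

```python
from datetime import date

# Thresholds taken from the 3-5 year lifespan mentioned in the text.
# Treating 3 years as "review" and 5 years as "replace" is an assumption.
REVIEW_AGE_YEARS = 3
REPLACE_AGE_YEARS = 5


def age_in_years(purchased, today):
    """Approximate age in years between two dates."""
    return (today - purchased).days / 365.25


def classify(purchased, today):
    """Label a machine by age: 'ok', 'review', or 'replace'."""
    age = age_in_years(purchased, today)
    if age >= REPLACE_AGE_YEARS:
        return "replace"
    if age >= REVIEW_AGE_YEARS:
        return "review"
    return "ok"
```

You could run something like this over a spreadsheet of purchase dates once a quarter and budget for anything that comes back as "review" before it becomes "replace".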

2. Double check equipment

When you climb, you rely on a belayer on the ground to make sure that if you fall, something catches you. He or she hangs on to your rope and keeps you from falling to the ground if you slip. It’s teamwork. But your teammate is useful for more than just that. Before you climb, it’s important to make sure all of your belayer’s equipment is set up properly, and your belayer should check that yours is as well. Your life depends on it, so double checking each other’s gear is essential to making sure everything is safe before any climbing actually happens. The small things can elude anybody.

In the computer world, you might not always set up computer systems, networks, and phones yourself. If there’s any doubt about how one of your employees configured something, or about how a client configured something (this is especially true of clients who set up their own ShadowProtect backups on their desktops), it’s best to double check, if possible. While it’s impractical (and no doubt costly) to check every single thing anybody does, it’s worthwhile to double check that critical procedures like backups are being done properly. Teamwork saves lives (and data).
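To make the "double check" concrete, here is a small Python sketch of a sanity check on a backup file. It is not specific to ShadowProtect or any other product; the function names, the 24-hour freshness window, and the problem labels are all assumptions for illustration. It only confirms that a backup file exists, isn't empty, and was written recently. It does not prove the backup can be restored, so it complements a real test restore and doesn't replace one.

```python
import os
import time


def find_problems(size_bytes, modified_at, now, max_age_hours=24):
    """Return a list of problems based on a backup file's size and age.

    An empty list means nothing obviously wrong was found.
    The 24-hour default is an assumed policy, not a recommendation.
    """
    problems = []
    if size_bytes == 0:
        problems.append("empty")
    age_hours = (now - modified_at) / 3600
    if age_hours > max_age_hours:
        problems.append("stale")
    return problems


def check_backup_file(path, max_age_hours=24):
    """Check a backup file on disk; returns ['missing'] if it isn't there."""
    if not os.path.isfile(path):
        return ["missing"]
    return find_problems(
        os.path.getsize(path),
        os.path.getmtime(path),
        time.time(),
        max_age_hours,
    )
```

Pointing a check like this at a client's backup folder is the rough equivalent of tugging on your partner's harness before the climb.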

3. Redundancy

People in the climbing world have died because they didn’t build redundancy into their safety systems, especially when it comes to anchors. Basically, an anchor is a piece of nylon webbing or other material that you tie to anchor bolts at the top of the rock you’re climbing. Together, the anchor bolts and webbing safely attach the rope (and therefore you) to the rock, stopping falls and holding static loads. In sport climbing, webbing is generally attached to two anchor bolts. There are a few ways you can tie your webbing for your anchor, but some of them aren’t safe.

The American death triangle is a method sometimes used by inexperienced climbers. It doesn’t have redundancy built in, which means that if one anchor bolt fails, the other bolt gets shock-loaded and the entire thing is much more likely to fail. If both bolts fail, there’s nothing holding you to the rock, and you’ll fall and be injured or killed. The alternative is to build redundancy into your anchor so that if one side fails, the other remains and can hold the rope and you until you can get down safely.

When you look at computer networks, there are things that clearly need redundancy. The first is probably your data, the second might be power, and the third might be things like phone lines and Internet connections. You should do whatever it takes to make sure that when one thing fails, no matter what it is, you’ve got another on standby to save your butt.
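The "another on standby" idea can be sketched in a few lines of Python. This is a simplified illustration, not a production failover setup: the function name and the way sources are passed in as (name, function) pairs are assumptions I made for the example. The point is the shape of it: try the primary, and if it fails, fall through to the backup instead of falling to the ground.

```python
def first_available(sources):
    """Try each (name, fetch) pair in order.

    Returns (name, result) from the first source that works.
    Raises RuntimeError only if every source fails.
    """
    errors = []
    for name, fetch in sources:
        try:
            return name, fetch()
        except Exception as exc:  # any failure means "try the next one"
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all sources failed: " + "; ".join(errors))
```

Notice that a single source in the list is the software version of a one-bolt anchor: it works fine right up until the moment it doesn't.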

Photo Credit: ground.zero via Compfight cc