Eliminating System Downtime

April 19

System downtime kills. The more we rely on IT systems and data to run our businesses, the more periods without access to those systems or data hurt.

Your tolerance for downtime (often expressed as a recovery time objective, or RTO) has to drive your disaster recovery planning, even more than your tolerance for data loss (your recovery point objective, or RPO). Solutions to the data loss problem have existed for years, so these days the true determining factor in beating a disaster is your ability to get back up and running with little or no downtime.

Getting Started

In order to effectively eliminate system downtime, you need to understand at least three broad things:

  1. What systems do you have that could go down?
  2. What can cause these systems to go down?
  3. What effect does this downtime have on your business?

Even a basic answer to these questions can help you eliminate system downtime from your list of worries, but as you’ve probably guessed, the better you understand the answers to these questions, the more effective your solution will be.

So how do you find the answers to these questions for your business?

It starts with asking a bunch of other questions. The examples below are not meant to be an exhaustive list, but collectively they can help you answer the three questions above. As you go through the process, you’ll probably come up with all sorts of other questions that you need to answer as well (I’d love for you to share some of those questions in the comments). Make sure you document your answers and set up a process for reviewing them as time goes on. As you grow, the answers to many of these questions will change.

Some Possible Questions

  • What IT systems do you use to operate your business?
  • Which systems do you own and which ones are operated by a third party? (Obviously, your responsibility and ability to react in a disaster differ radically between systems that you operate directly and systems that belong to a third party.)
  • Which systems are on-premises and which ones operate from somewhere else?
  • How do these various systems interact with each other?
  • Which systems could you live without, if you had to?
  • How does your entire IT environment get power, Internet, etc.?
  • How does each individual system get power?
  • How does each individual system connect to your network?
  • How do these systems connect to each other?
  • If an individual system went down, what effect would it have on each other system?
  • What dependencies do you have in place?
  • Do you have adequate security/firewall/antivirus protocols in place?
  • Do your employees understand what to do in case of a security breach or hacker?
  • What natural disasters are likely to occur in your area?
  • What is the annual monetary cost of system downtime for your company? (The Aberdeen Group suggests using a formula like this: number of downtime events × average length in hours × cost per hour of downtime = yearly downtime cost.)
  • What is the annual monetary cost of downtime for each specific system? (You can use the same formula.)
  • What other adverse effects does system downtime have on your business? (For example, does it affect your credibility? Your productivity?)
  • If you had an IT disaster (such as a server failure) in one of your systems, how long would it take you to get that system up and running?
  • If you suffered a major disaster (such as a natural disaster), how long would it take you to get up and running?
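
The downtime cost formula in the list above is easy to turn into a quick calculation. Here is a minimal Python sketch of it. The system names and every figure in it are made up for illustration, so swap in your own numbers.

```python
from dataclasses import dataclass


@dataclass
class DowntimeProfile:
    """Downtime figures for one system (all values are your own estimates)."""
    name: str
    events_per_year: float      # number of downtime events per year
    avg_hours_per_event: float  # average length of an event, in hours
    cost_per_hour: float        # cost of one hour of downtime

    def yearly_cost(self) -> float:
        # events x average length in hours x cost per hour = yearly cost
        return self.events_per_year * self.avg_hours_per_event * self.cost_per_hour


def total_yearly_cost(profiles) -> float:
    """Add up the yearly downtime cost across all systems."""
    return sum(p.yearly_cost() for p in profiles)


# Hypothetical example figures, not real data.
systems = [
    DowntimeProfile("email", events_per_year=4, avg_hours_per_event=2.0, cost_per_hour=500.0),
    DowntimeProfile("order-database", events_per_year=2, avg_hours_per_event=6.0, cost_per_hour=3000.0),
]

# List the most expensive systems first.
for p in sorted(systems, key=lambda p: p.yearly_cost(), reverse=True):
    print(f"{p.name}: ${p.yearly_cost():,.2f} per year")
print(f"total: ${total_yearly_cost(systems):,.2f} per year")
```

Running the same calculation per system, as well as for the company as a whole, shows where downtime hurts most and where a faster recovery is worth paying for.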

Other Considerations

Offsite backup in geographically diverse locations is one piece of the downtime defense, but it alone cannot beat the beast. When considering a solution, you need one that makes it easy to turn your backups into production systems, whether through virtualization or some other means.

Remember that the key to fighting downtime is not necessarily getting every system back up in pristine order, but keeping your business operating. You can rebuild your backend later, but not if your business fails first, so when you’re building your downtime-killer, consider which technologies you need to get things going fast.
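
One way to put this into practice is to work out ahead of time which systems your business truly cannot operate without, and what those systems depend on. The Python sketch below is one possible approach, not a prescribed method. It assumes you have already mapped your dependencies, and the system names, the dependency map, and the "critical" set are all hypothetical. It uses the standard library's graphlib module (Python 3.9 or later) to produce an order in which each system comes up only after the systems it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical map: each system -> the systems it depends on.
depends_on = {
    "storefront": {"order-database", "network"},
    "order-database": {"network"},
    "email": {"network"},
    "network": set(),
}

# Hypothetical set of systems the business cannot operate without.
critical = {"storefront"}


def required_systems(targets, graph):
    """Return the targets plus everything they depend on, directly or indirectly."""
    needed = set()
    stack = list(targets)
    while stack:
        system = stack.pop()
        if system not in needed:
            needed.add(system)
            stack.extend(graph.get(system, ()))
    return needed


def recovery_order(targets, graph):
    """Order the required systems so dependencies are restored first."""
    needed = required_systems(targets, graph)
    subgraph = {s: graph.get(s, set()) & needed for s in needed}
    return list(TopologicalSorter(subgraph).static_order())


order = recovery_order(critical, depends_on)
print(order)
```

In this made-up example, email is left out of the first wave entirely because nothing critical depends on it. That is the point: it can wait until the business is running again.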

Remember, backups are only useful in eliminating system downtime if you can a) access them even if your building is flooded and b) use them to run your business until you can restore your primary systems. So when you’ve found the answers to your downtime questions, take that information and use it to answer one last question: Can my current backup and disaster recovery solution eliminate my system downtime?

If the answer is no, it’s time to go shopping.

Photo Credit: cell105 via Compfight cc