Harvard Business Review hit the nail on the head when referring to metrics as “The True Measures of Success”. Metrics currently play a prominent role in the healthcare field, the sports arena, and many other industries. Disaster recovery (DR) is no exception. When it comes to DR planning, metrics not only gauge effectiveness but also help organizations meet the goals that are so crucial to their recovery efforts.
There are three primary metrics organizations can use to steer their disaster recovery program:
- Recovery time objective (RTO): Determines how quickly IT systems must be recovered before a business suffers irreparable damage. RTO is normally measured in minutes, hours, or days.
- Recovery point objective (RPO): Determines the maximum amount of data that can be lost in a disaster scenario. Also measured in units of time, RPO is based on backup frequency and other data protection strategies.
- Recovery time actual (RTA): The verified amount of time required to recover IT systems after a disaster. Unlike RTO and RPO, RTA is a measurement rather than an objective, and it can be captured during an actual emergency or a recovery drill. RTA can help an organization confirm that it meets its RTO or determine if additional strategies or resources are required to meet that objective.
Determining Your DR Metrics by Objectives
Although disaster recovery metrics should be tailored around an organization’s requirements, there are specific factors to consider for each individual objective.
Location and resource allocation play a pivotal role in shaping your recovery objectives. Let’s say you’re recovering to a cold site. You’re equipped with a basic infrastructure that requires you to transfer data and IT personnel to the facility, purchase additional licenses, and install new software before recovery can even begin. In this scenario, your RTO could range anywhere from a couple of days to a week, as it may take a considerable amount of time to completely recover your systems.
In a hot site scenario, the recovery location is all set up and ready to go. For the most part, you’re simply failing over to the new system, which can be done in a matter of minutes depending on the size of your backups. Since your data is readily available, IT can walk in and immediately start working towards getting your systems back online. Both examples illustrate that RTO is shaped by the resources you invest: the more you commit up front, the shorter the recovery time you can realistically target.
Your RPO and backup strategy go hand in hand. Backing up data requires resources in the form of time, technology, and the manpower to make sure everything goes off without a hitch. An organization shooting for an RPO near zero would need to continually send data to a highly scalable recovery destination such as the cloud. This is a rather lofty objective that requires ample resources to accommodate capacity and network traffic without hindering core business operations.
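To make the link between backup frequency and RPO concrete, here is a minimal Python sketch. It assumes the data-loss window is simply the time between the last backup completed before a failure and the failure itself. The function names, the four-hour schedule, and the timestamps are all hypothetical.

```python
from datetime import datetime, timedelta


def data_loss_window(backup_times, failure_time):
    """Time between the last backup completed before the failure and the failure."""
    completed = [t for t in backup_times if t <= failure_time]
    if not completed:
        raise ValueError("no backup completed before the failure")
    return failure_time - max(completed)


def meets_rpo(backup_times, failure_time, rpo):
    """True if the data lost in this scenario stays within the RPO."""
    return data_loss_window(backup_times, failure_time) <= rpo


# Hypothetical schedule: a backup every four hours, starting at midnight.
backups = [datetime(2024, 1, 1, hour) for hour in (0, 4, 8, 12)]
# Hypothetical failure at 11:30, shortly before the noon backup.
failure = datetime(2024, 1, 1, 11, 30)

print(data_loss_window(backups, failure))
print(meets_rpo(backups, failure, rpo=timedelta(hours=1)))
```

In this made-up scenario, three and a half hours of data are lost, so a one-hour RPO would be missed. Tightening the RPO means shortening the backup interval, which is where the resource cost comes in.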
A general rule of thumb is to base RPO on the complexity of your business functions or applications. We’ll use SQL Server as an example. The speed and volume of data are constantly changing in this dynamic environment, so an RPO of no more than one hour would help ensure that your backup strategy is protecting enough data to meet your recovery goals. It also helps to know that RPO is typically best determined alongside an exhaustive cost-benefit analysis. Whether you’re targeting two hours or 24 hours, it’s important to weigh the cost of losing data against the cost of backing up your data more frequently.
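The cost-benefit trade-off described above can be sketched in a few lines of Python. This is a deliberately simplified model, not a recommended formula: it assumes the backup interval equals the RPO, that each incident loses a full interval of data (the worst case), and that costs scale linearly. Every figure and name below is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24


def total_cost_per_year(rpo_hours, loss_cost_per_hour, incidents_per_year, cost_per_backup):
    """Yearly backup cost plus worst-case yearly data-loss cost for a given RPO."""
    backups_per_year = HOURS_PER_YEAR / rpo_hours
    backup_cost = backups_per_year * cost_per_backup
    # Worst case: each incident strikes just before the next backup,
    # so a full interval of data is lost.
    loss_cost = incidents_per_year * rpo_hours * loss_cost_per_hour
    return backup_cost + loss_cost


# Hypothetical inputs: replace with figures from your own analysis.
candidates = [1, 2, 4, 8, 24]  # candidate RPOs in hours
costs = {
    hours: total_cost_per_year(
        hours,
        loss_cost_per_hour=5000,
        incidents_per_year=2,
        cost_per_backup=3,
    )
    for hours in candidates
}
best = min(costs, key=costs.get)

for hours, cost in costs.items():
    print(f"RPO {hours:>2} h: {cost:,.0f} per year")
print(f"Lowest total cost at an RPO of {best} h")
```

With these invented numbers the two-hour RPO comes out cheapest, because backing up every hour costs more than the extra data it protects. Different inputs will shift the balance, which is the point of running the analysis.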
Disaster recovery is a complex undertaking. Meeting your predetermined goals requires extensive resources, testing, and optimization. Having a comprehensive response plan helps, but the actual results can be influenced by several factors. For this reason, it is not uncommon for RTO and RTA to differ significantly. What’s important is gleaning insights from the actual recovery time and using them to refine your RTO and strengthen your DR capabilities.
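One simple way to track the gap between RTO and RTA is to record the measured recovery time from each drill and compare it with the objective. The sketch below assumes a single four-hour RTO and three invented drill results. The function name and data are hypothetical.

```python
from datetime import timedelta


def compare_to_rto(rto, drill_results):
    """Return (name, gap_minutes, met) per drill.

    A positive gap means the recovery took longer than the RTO.
    """
    report = []
    for name, rta in drill_results:
        gap_minutes = (rta - rto).total_seconds() / 60
        report.append((name, gap_minutes, gap_minutes <= 0))
    return report


# Hypothetical objective and drill measurements.
rto = timedelta(hours=4)
drills = [
    ("drill 1", timedelta(hours=6, minutes=30)),
    ("drill 2", timedelta(hours=4, minutes=45)),
    ("drill 3", timedelta(hours=3, minutes=50)),
]

report = compare_to_rto(rto, drills)
for name, gap, met in report:
    status = "met" if met else "missed"
    print(f"{name}: {status} the RTO by {abs(gap):.0f} min")
```

A shrinking gap across drills suggests the plan is improving. A gap that persists points to either adding resources or revisiting whether the RTO is realistic.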
The Last Word on Metrics
In many organizations, the IT department shoulders the responsibility of implementing and managing disaster recovery strategies. But it shouldn’t be IT’s burden alone. An effective DR program requires a coordinated effort between multiple parties. Senior management, IT, and key decision makers must all work together to align recovery objectives with the findings of a business impact analysis and the organization’s individual needs. Defining your recovery metrics before disaster strikes can make all the difference when it comes time to set your response plan in motion.