
Getting to yesterday: Complexities of the recovery point objective


September 5

One of the simplest and yet most complex concepts in data protection is the Recovery Point Objective (RPO).  How can RPO be both simple and complex, you might ask?  That, my friend, is part of the reason I find RPO so interesting.  On the outside, RPO seems like a simple matter of selecting a point in the past from which to recover information.  It should be that simple, right?  Well, let’s walk through a scenario and see how this pans out.

It’s now two days after the Labor Day holiday and I’m at my desk.  I get a Nagios alert that one of my servers isn’t responding and it’s currently offline.  When I go into my server room I detect the faint acrid smell of burnt electronics and my server won’t power on.  Well, I immediately switch to recovery mode and start weighing my options.  Since the server won’t even power on I would like to spin up a temporary VM of my server using the most recent backup.  I can do this on my new host computer by right-clicking the backup image and selecting VirtualBoot.  The VirtualBoot process loads my server VM and with a few quick checks I can see that it’s functioning normally.  I verify that the ShadowProtect backup on this VM is re-enabled and watch as the next incremental backup is processed and added to my image chain.

Now the question is, how much data did I lose?  Well, if this is a critical system I would expect to run incremental backups as frequently as possible to minimize the amount of data lost.  On the other hand, if this is a server hosting file and print services I may not need to take backups as frequently.  So you can see that there are some variables beginning to take shape that now dictate my RPO.  One of these variables is the importance of the data on the server.  This is the “How much data can I afford to lose?” question.
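One way to make the “How much data can I afford to lose?” question concrete is to treat worst-case data loss as the gap between backups. The helper below is a hypothetical sketch for illustration (not a ShadowProtect API); it assumes the worst timing, a failure striking just before the in-flight backup finishes:

```python
from datetime import timedelta

def worst_case_data_loss(backup_interval: timedelta,
                         backup_duration: timedelta) -> timedelta:
    """Worst case: the server dies just before the in-flight backup
    completes, so everything written since the previous snapshot
    was taken is lost."""
    return backup_interval + backup_duration

# A critical database backed up every 15 minutes (2-minute incrementals):
print(worst_case_data_loss(timedelta(minutes=15), timedelta(minutes=2)))
# prints 0:17:00

# A file/print server backed up nightly (30-minute incrementals):
print(worst_case_data_loss(timedelta(hours=24), timedelta(minutes=30)))
# prints 1 day, 0:30:00
```

The arithmetic is trivial, but it makes the trade-off visible: the critical system’s exposure is measured in minutes, the file server’s in a day.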

Another variable that is implied in this example is the type of data stored.  A simple file server probably does not see frequent changes to its files; it’s more of a storage area for digital information.  If my data were a highly active SQL database with records changing every second, that would likely affect how frequently I back up the data.  Still another variable might be the amount of data I’m backing up.  Taking a backup of 100MB of data will be far faster than a backup of 100TB of data.  All of these play a part in deciding how frequently I take a backup, or in other words what my RPO will be.
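The data-volume constraint is easy to quantify: a backup can’t run more often than it takes to complete. Here’s a rough back-of-the-envelope calculation (the 200 MB/s sustained throughput figure is an assumption for illustration, not a measured number):

```python
def full_backup_hours(data_gb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to move the data at a sustained throughput."""
    return data_gb * 1024 / throughput_mb_per_s / 3600

# 100 MB finishes almost instantly at an assumed 200 MB/s...
print(f"{full_backup_hours(0.1, 200) * 3600:.2f} seconds")   # 0.51 seconds
# ...while a full pass over 100 TB takes the better part of a week:
print(f"{full_backup_hours(100_000, 200):.1f} hours")        # 142.2 hours
```

This is exactly why incrementals matter for large data sets: once the data outgrows the backup window, only the changes can be captured frequently.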

Still another variable is the amount of backup data I’m able to store.  Incremental backups are often preferred because they capture only the changes that have occurred since the previous backup was taken.  These are smaller files, but over time they add up and can require a lot of storage space.  With ImageManager it’s easy to set a retention policy that reduces the required storage space and speeds up whole-system recovery by consolidating incremental images into daily, weekly and monthly image files.  This consolidation reduces the number of backup images needed to restore a system, since the recovery chain is built from the consolidated files.  A good retention policy will typically provide more recovery points in the recent past, giving a high level of granularity, while allowing consolidated monthly files to provide long-term archival data for recovery points reaching into the distant past.
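To see why consolidation matters, compare the number of image files kept under a tiered retention policy against keeping every incremental. The policy numbers below are purely illustrative; they are not ImageManager’s actual defaults:

```python
def recovery_points(hourly_days: int, daily_days: int,
                    weekly_weeks: int, monthly_months: int) -> int:
    """Recovery points kept under a tiered retention policy:
    dense coverage of the recent past, sparse coverage further back."""
    return hourly_days * 24 + daily_days + weekly_weeks + monthly_months

# Hypothetical policy: hourly for a week, daily for a month,
# weekly for a quarter, monthly for two years.
tiered = recovery_points(hourly_days=7, daily_days=30,
                         weekly_weeks=12, monthly_months=24)

# Versus retaining every 15-minute incremental for the same two years:
every_incremental = 2 * 365 * 24 * 4

print(tiered)             # 234
print(every_incremental)  # 70080
```

A few hundred files instead of seventy thousand, while the recent past, where most restores happen, stays just as granular.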

Are the layers of complexity becoming more apparent as we continue?  The more variables we consider, the more complex our Recovery Point Objective can become.  And in fact, it seems that every unique environment deserves its own RPO.  If I had told you that my RPO in the previous example was 2 hours, you would infer an implied set of values I’d placed on my data.  If I had told you that the RPO was 15 minutes, you might see my data valued differently, perhaps as more valuable.  In reality, all I’ve done is weigh the variables, measure the options available, and assign a number representing how frequently I will back up my data.  That’s my RPO: it’s a complex measure of variables, processes, risks, and options all summed up in a nice, simple number.  This is why the RPO concept can be both simple and complex—and in my opinion interesting as well.

Interested in improving your recovery point objective? What about your recovery time objective? Learn how easy it is with the StorageCraft Recovery-Ability Solution.

Photo Credit: h.koppdelaney via Compfight cc