May 28th, 2012, became a nightmare for Brigham Young University, one that will likely continue past Christmas into 2013 and beyond. On that fateful Memorial Day, a failed software upgrade on BYU’s primary server system effectively destroyed hundreds of terabytes of data, including payroll, research, and student data. Backup systems failed to recover most of it.
As a BDR professional whose alma mater is BYU, I feel the intense pain that administrators, professors, and students are going through as a result: spring graduation was delayed, 30 departments lost significant amounts of research data, graduate students’ careers were put on hold, research grants failed, and employment opportunities were lost. Disk recovery costs alone are running in the hundreds of thousands of dollars. The total outlay in IT and administrator time, opportunity costs, new hardware, and recreating data will be well into eight figures.
Perhaps BYU sociology professor Vaughn Call summed it up best: “My first reaction was disbelief. I’ve never in my long career been in any circumstance like this. It just brought us to a dead halt.”
Whose business—be it scholastic or commercial—could survive a “dead halt”? A dead halt that extends into its seventh month? Could anything have been done to avoid some or all of this impact?
Three recommendations may well have made Memorial Day 2012 a day of rest rather than a day of rage in Provo:
- Select a BDR solution that regularly verifies existing backup files. If the backup files remain intact, recovery may be laborious but is straightforward: swap out the damaged hardware, then restore the backed-up data.
- Use replication to a remote site, either via a cloud service or directly to a data center. Replication not only keeps a localized natural disaster from taking out data, it also keeps an onsite system failure from propagating to existing backup files. A complete set of intact image files stored with a cloud service or at another data center keeps the loss of a single system from cascading.
- Most importantly, rehearse any server upgrade against backup images first. Testing and simulation with backup image files is ShadowProtect’s unique strength. First, administrators can test individual backup images to confirm that the volume backup is intact, clearly showing that its contents can be opened and viewed and that data can be retrieved. Even more valuable is using VirtualBoot to bring a backed-up server online as a VM. On this live testbed, administrators can run any software upgrade on the VM. In BYU’s case, that kind of testing might have immediately exposed the upgrade’s problems: corrupted data and destroyed hard drives.
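
For readers who want to see what "regularly verifies existing backup files" can mean in practice, here is a minimal, generic sketch in Python. It is not how ShadowProtect or any other BDR product works internally. It simply assumes that backup files sit in a directory and that a SHA-256 checksum was recorded for each one when the backup was taken. The function names (`sha256_of`, `build_manifest`, `verify_backups`) and the manifest layout are made up for this illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(backup_dir: Path) -> dict[str, str]:
    """Record a checksum for every file under backup_dir (hypothetical manifest format)."""
    return {
        str(p.relative_to(backup_dir)): sha256_of(p)
        for p in sorted(backup_dir.rglob("*"))
        if p.is_file()
    }


def verify_backups(backup_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Compare files on disk with the manifest and return a list of problems found."""
    problems = []
    for name, expected in manifest.items():
        path = backup_dir / name
        if not path.is_file():
            problems.append(f"{name}: missing")
        elif sha256_of(path) != expected:
            problems.append(f"{name}: checksum mismatch")
    return problems
```

Run on a schedule, a check like this flags a missing or silently corrupted backup file long before anyone needs to restore from it. The same manifest could also be checked against a copy replicated to a remote site. A matching checksum only shows that a file has not changed since the manifest was built; it does not prove the image can be restored, which is why test restores and booting the image as a VM still matter.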