The security screening process alone makes flying a rough go for me. I know the routine, but it’s still a dramatic experience internally. I can’t even imagine the mental anguish I would’ve endured had I been affected by this mini disaster.
Southwest Airlines recently experienced some untimely technical woes that made travelling a real drag for some customers. According to the online media machine, a glitch disrupted the airline’s primary points of contact: the main website at Southwest.com, the mobile app, and its call centers. As a result, more than 800 of over 3,300 scheduled flights were delayed. While there were no cancellations, some passengers missed flights or didn’t have a chance to check their bags. Needless to say, this led to some very frustrated customers.
The Southwest situation offers yet another lesson in business continuity. Here are some takeaways I think deserve some attention.
1. Recovery Objectives Should Set the Tone
Surely Southwest has established specific recovery objectives that define how the airline wants to rebound to full strength. Disaster recovery experts frame these objectives in two ways: the recovery point objective (RPO), which caps how much data can be lost, and the recovery time objective (RTO), which caps how long systems can stay down. In simple terms, these objectives are a measure of what an organization can afford to lose should its systems go down. A firm that routinely replicates its backups to an offsite data center has, in the process, vowed to lose as little data as possible. On the other hand, a company that backs up to tape once per day has in effect made peace with the loss of data and a slower recovery process.
In the past, many organizations accepted that recovering systems from the primary data center to the backup site could take up to 72 hours. Nowadays recovery goals are a lot loftier and, with the right technology, easier to achieve as well. Armed with a comprehensive disaster recovery solution, service providers can securely back up everything on their systems and fully restore their data to bare metal, physical, or virtual environments in minutes rather than hours or even days.
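To make the two objectives concrete, here is a minimal Python sketch that checks a backup plan against an RPO and an RTO. Everything in it is an assumption made for illustration: the `BackupPlan` class, the `meets_objectives` function, and all of the figures are hypothetical, and the restore times are guesses rather than measurements from any real product. It treats the backup interval as the worst-case data loss, since a failure just before the next backup loses everything written since the previous one.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class BackupPlan:
    """Hypothetical description of a backup strategy."""
    name: str
    backup_interval: timedelta  # time between successive backups
    restore_time: timedelta     # estimated time to bring systems back


def meets_objectives(plan: BackupPlan, rpo: timedelta, rto: timedelta) -> bool:
    """Return True if the plan satisfies both recovery objectives.

    Worst-case data loss is taken to be the backup interval, and
    worst-case downtime is taken to be the estimated restore time.
    """
    return plan.backup_interval <= rpo and plan.restore_time <= rto


# Illustrative figures only, not taken from any real deployment.
replicated = BackupPlan("offsite replication", timedelta(minutes=15), timedelta(minutes=30))
daily_tape = BackupPlan("daily tape", timedelta(hours=24), timedelta(hours=72))

# Example objectives: lose at most 1 hour of data, be back within 4 hours.
rpo = timedelta(hours=1)
rto = timedelta(hours=4)

for plan in (replicated, daily_tape):
    status = "meets" if meets_objectives(plan, rpo, rto) else "misses"
    print(f"{plan.name}: {status} the objectives")
```

With these made-up numbers the replicated plan passes and the daily tape plan fails on both counts, which is the trade-off described above: the tape shop has accepted up to a day of lost data and a multi-day recovery.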
2. Be Ready to Rough It
In order to keep the operation flowing, Southwest was forced to resort to manual processes that work, but were left behind for a reason. Staff was limited at the check-in counter, so lines wrapped around in loops, and the flight delays made for an extra long weekend for several people, I’m sure. But they got through it, and being ready to run a scaled-down operation was the key.
Automation has made it possible for companies to streamline processes from IT to marketing. Disaster preparedness is pretty much a case of being able to go in reverse and transform your fancy automated functions into simpler manual processes. These examples illustrate how service providers can wind back their technology, so to speak, and keep business rolling amid chaotic times.
Manual DR: In a situation like the one Southwest Airlines faced – where a glitch causes problems that carry into the next day – you may need some extra cushion for your backup and disaster recovery strategy. Automated DR is all the rage, but if bleep hits the fan hard enough, a manual approach may be necessary to get your systems back on the road to business continuity.
Custom Coders: Today’s developers have it super sweet thanks to new tools that virtually eliminate the need for free-hand coding. The value of grizzled developers who can get in there and get filthy in writing and managing code will be realized when those fancy automated tools aren’t so easily accessible. I know a few coding warriors who welcome this kind of challenge.
Around-the-clock QA: You may be forced to go without some luxuries fresh on the heels of disaster. I’m gonna guess that quality assurance is an area that needs to function as long as the “Open” sign is facing the customer. IT’s ability to manually perform acceptance testing, integration testing, and other critical evaluation methods can make sure downtime isn’t followed by a decline in product quality.
File transfer alternatives: File-friendly protocols such as FTP and SFTP may not be available when technical difficulties arise. These methods are indeed luxuries compared to alternative means that you probably wouldn’t touch if you had your pick. Whether it’s attaching a gaggle of files to email messages or loading up a spindle of CDs, you never know when some less than ideal file transfer methods will save the day.
3. Hope For the Best, Prepare For the Worst
Though inconvenient, a system defect and subsequent delays don’t necessarily qualify as catastrophic, especially when you consider everything that could go wrong in an airport setting. The Southwest Airlines experience was hindered for much of the Sunday on which things went haywire. Smooth sailing (or flying) resumed by Monday afternoon. Having to delay several hundred flights isn’t the worst that could happen, but it does highlight the importance of a worst-case-scenario mentality.
As humans, we’re naturally skeptical about scenarios that sound unlikely on first mention. Members of your team may be tempted to allow that skepticism to creep into their thoughts during disaster recovery planning. There is no place for this type of thinking in today’s dynamic business environment. A well-coordinated contingency strategy is built around exercises and tests that enable you to rapidly respond to predictable and unpredictable incidents alike.
4. Employees Need Flexibility
Disasters have a way of making organizations gain a greater appreciation for their human resources. Without its teams working diligently around the clock, those delays might have become cancellations that led to staggering amounts of lost revenue for Southwest. Some disasters are impossible to prevent, but your ability to continue your business operations hinges heavily on your preparedness. One crucial lesson we can take from the Southwest situation and disasters past is the role a flexible work environment plays in readiness.
If customers look to you for guidance in business continuity planning, a list of best practices that details everything employees need to work effectively during a crisis is among the most valuable resources you can give them. Access to crucial internet services, power sources for mobile devices, and efficient communication tools are just some of the items that should be included on that list. Workforce flexibility will ensure that employees are in position to succeed and can share information with customers as needed once business resumes.
5. There Is No Continuity Without Security
This time Southwest’s IT problems were caused by a computer glitch, but it could’ve easily been a security breach. Cyber criminals have already managed to compromise the airline industry in ways that are downright scary. The air traffic control system has been infiltrated. Planes rerouted and forced to land. Travel records belonging to millions of people possibly stolen. These incidents and others have left lawmakers scratching their heads and desperate for answers.
Airline industry or hospitality sector, security threats will linger wherever IT is prominent. A recent study conducted by ISACA found that 46 percent of IT professionals anticipate their organization having to deal with some sort of cyber attack in 2015. Another 38 percent doubted their ability to even fend off a well-executed cyber attack. A true defense strategy must address malware, DDoS attacks, inside jobs, and other cyber security threats, because if it doesn’t, the chances of business continuing uninterrupted are slim to none.
For the average small to medium-sized business, tackling disaster recovery and business continuity is both cost-prohibitive and challenging from a technical standpoint. These organizations are increasingly turning to service providers who are better equipped to create, implement, and get hands-on with these complex IT projects. MSPs that work closely with clients to simplify the management aspects while keeping costs to a minimum will flourish in the managed business continuity race. Are you up for the challenge?