How Airlines Can Prevent Upsetting System Outages

An interview with Continuity Software’s Chief Technology Officer Doron Pinhas

Southwest's new listening center (Photo: Southwest)

Delta Air Lines, British Airways, and Southwest Airlines have all experienced devastating, costly, and upsetting delays over the past several years. Airlines expected this past summer to be one of the busiest travel seasons in history, and they tried to be as prepared as possible. The forecast proved true, but many of their passengers found themselves battling chaos both in the sky and on the ground as technology failed. When the news broke, customers began asking how and why this could happen, while airlines searched for the culprit.

AirlineGeeks didn’t just look for an answer; we went looking for a solution. We posed the question “How can we prevent this from happening in the first place?” to Continuity Software’s Chief Technology Officer, Doron Pinhas. Continuity Software is eleven years young and carries the mission statement of “helping the world’s leading organizations prevent unplanned IT outages.”

A History of Mistakes

It’s important to remember that airline passengers aren’t always vacationers. Some are catching a red-eye to a funeral, and others are hopping on board an afternoon flight to a family member’s wedding. That impact on personal lives is why tech issues in aviation receive so much attention. When people are grounded, frustration peaks, and social media makes it much easier to voice.

In an era of ordering food at the airport from wired iPads, airlines rely on similar technology to get passengers where they need to be. Just as you and I see pop-ups for updates on our personal systems, airlines do too. Pinhas says that is because “systems as a whole are not as reliable as many commodities we are used to.” Airlines understand this and try to create redundancy by adding more components. However, bringing in more components makes systems a lot more complex and harder to manage, which, in a way, can make them even less reliable than they were to begin with. A backup component essentially becomes another part of the system: it too can fail, and it can bring the whole system down unless everything is connected perfectly at all times.
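To put rough numbers on that trade-off, here is a minimal sketch. It does not come from Pinhas or Continuity Software. It assumes components fail independently, that a hypothetical failover switch sits between the primary and backup systems, and that nothing works when that switch is down. All availability figures are made up for illustration.

```python
def redundant_availability(primary, backup, switch):
    """Availability of a primary/backup pair behind a failover switch.

    Assumptions (illustrative only): failures are independent, and the
    switch is on the critical path, so the pair is reachable only while
    the switch itself is healthy.
    """
    # Probability that at least one of the two systems is up.
    either_up = 1 - (1 - primary) * (1 - backup)
    # The switch must also be working for either system to be usable.
    return switch * either_up


# Hypothetical figures: each system is up 99% of the time.
single_system = 0.99
perfect_switch = redundant_availability(0.99, 0.99, switch=1.0)
flaky_switch = redundant_availability(0.99, 0.99, switch=0.98)

print(f"single system:             {single_system:.4f}")
print(f"backup + perfect switch:   {perfect_switch:.4f}")
print(f"backup + unreliable switch: {flaky_switch:.4f}")
```

Under these assumptions the backup helps only when the piece connecting it is more dependable than the system it protects. With the less reliable switch, the "redundant" setup ends up less available than the single system on its own.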

Resiliency for a Price

Doron Pinhas is convinced that airlines have made resiliency a key player in their arsenal of technology. Resilience doesn’t come cheap, and the CTO does not believe airlines are cutting corners on such an important aspect of their companies. He favors the expression that an airline is “only as strong as its weakest link.”

This means that even the smallest configuration mistake can bring an entire system down. And when a system is weak, most of the time you won’t know until it’s too late.

For example, if you have put a backup in place, you won’t know that the middleman connecting the two systems doesn’t work until one of them fails. Sometimes it takes an airline being punished by its own system to realize that something it has installed simply doesn’t work. Delta experienced the epitome of this kind of scenario. A small fire caused a malfunction at the airline’s Operations and Customer Center (OCC) in Atlanta. Power control was lost and the system rerouted itself to a transformer outside of Delta. A powerful surge to one source is what caused the actual outage. The airline soon restored power, but vital systems and equipment refused to come back online. A fail-safe setup is, after all, a complex setup.

Southwest, meanwhile, had a technology malfunction that caused the cancellation of more than 400 flights. Countless systems within the company were affected. Pinhas calls it the “tip of the iceberg.” The airline shared its statement in one of the ongoing updates during the outage crisis: “Make no mistake. Southwest created this problem…. The machines failed, but Heart prevailed.”

Calling All Data Experts

Pinhas’ experience has led him to believe that airlines, like companies in many other industries, “do not check all IT components every day.” This is likely because a complete system shutdown is an infrequent event. Disaster recovery tests have also proven to be intrusive and to come with a hefty price tag. Airlines need to perform check-ups seamlessly, and Pinhas believes that more communication will help fulfill that mission. He says that airlines need to hire people who “specialize…and those individuals or teams need to share information with one another.” Outsourcing, he believes, causes misconceptions that can stall the prevention process.

All in all, Doron Pinhas of Continuity Software believes that IT outages are preventable. He urges airlines, their customers, and companies in general to remember that a car can work with many inoperable parts. However, “an ounce of prevention is worth a pound of cure.” With so many airlines flying across the world and relying on thousands of computers, it has become all the more important that airlines take every taxiway turn toward preventing the next grounding.

Malick Mercier

In 2013, Malick dusted off his copy of Flight Simulator X and installed it once again, in hopes of passing the tutorials. Well, he did that, and so much more! He had always loved flying, but that was the pivotal moment in his life where he realized that aviation was his “thing.” Now, in 2015, he plans to begin taking flight lessons at Long Island’s MacArthur Airport (KISP). He just can’t get enough of the roaring engines and beautiful wing flexes. Those unique features keep the airline industry booming with news. Malick was always one to stay on top of it and break it to all of his friends. That, and a day at ABC, sparked his interest in journalism, and he ended up here, at AirlineGeeks.com, and couldn’t be happier. He knows that whether he goes into broadcast journalism, or into piloting as a career, or even something totally different, his heart will always long for a clear blue sky.