How a computer problem can shut down an airline like Delta

The power outage that left thousands of Delta passengers facing worldwide flight cancellations and delays Monday shows how computer-dependent we've become and airlines must decide if their backups are good enough, an IT expert says.

'We've kind of painted ourselves into a corner where we must rely on computer systems,' prof says

Delta Air Lines grounded by power outage

At least 650 flights were cancelled and thousands more were delayed after a worldwide computer system outage

The system outage that left thousands of Delta Air Lines passengers around the world facing flight cancellations and delays on Monday shows how computer-dependent society has become, and airlines have to decide if their backup technologies are good enough to deal with that reality, a Canadian computer networking expert says.

"We've kind of painted ourselves into a corner where we must rely on computer systems," said Srinivasan Keshav, a professor of computer science at the University of Waterloo.

"[But] we have now been able to build systems which are very tolerant of losses, of parts of the system being taken down."

The key, Keshav said, is to adopt the model used by technology leaders like Google, known as "system fault tolerance," which assumes any single component in a computer network can fail at any time, but it doesn't matter because there are multiple backup measures in place at every level of the system.

"Failures are not exceptions. Failures are kind of normal," Keshav said, noting that companies like Google or Amazon have dozens of servers "dying every day,"but with upward of100,000 servers on hand, the systems don't crash.

Power outage a 'surprising' cause

Delta Air Lines said the cause of Monday's mess was a power outage at its base in Atlanta, Ga., at around 2:30 a.m. ET. In a statement posted online Monday afternoon, the airline said systems were once again "fully operational" and flights had "resumed hours ago but delays and cancellations remain as recovery efforts continue."

The fact that a power outage was to blame is "surprising," Keshav said, because "it's the one thing you wouldn't expect to have happen because that's easy to get right."

Airline data centres usually have two layers of backup diesel generators and batteries to protect "critical systems," he added.

"When you look at a complex computer system such as the one that Delta runs, there's many layers of the cake, so to speak. At the bottom is power," Keshav said.

Mark Duell, vice-president of operations for the global aviation tracking website FlightAware, said airlines "go to great lengths" to make sure backup systems, including several power sources, are in place in their data centres.

"Everything from bringing in power from the utility on opposite literal sides of the building, just so [a] single backhoe can't take them both out at the same time; having more generators than they need so that they don't need all the generators to be operable; having ... multiple battery backup systems internally to cover everything until the generators come online," Duell said.

Delta passengers, including four-year-old Lisette Hamann, lower left, and older sister Harper, wait at a ticket counter at Newark airport in New Jersey on Monday, among thousands of stranded Delta customers around the world. (Seth Wenig/Associated Press)

"And then down to the point of literally each computer, each server in the data centre is plugged into two different power strips and has two power supplies that are redundant."

Although he doesn't know specifically what happened in Delta's case, Duell said it was likely that the problem extended beyond a basic utility failure, since the batteries and generator backups should have kicked in.

"It was probably more than one failure," he said.

Safety not at risk

Both Duell and Keshav emphasized that the computer system outage would not have posed a risk to passengers in flight.

"The airplane is entirely independent of the ground in terms of continuing to fly," Duell said.

That's because airlines use "decoupling" in computer system design, Keshav said, meaning systems involved in actually operating the aircraft are independent from other systems like reservations or flight schedules.

The reason a system outage like this one has such an impact, Duell said, is that airlines stop and cancel flights for safety reasons when they can't get access to important computerized information like passenger counts, how much baggage has been checked or fuelling records.

"You run into those sorts of dependencies where they can't move things, but anything already moving is not in any real danger," he said.

'Critically examine' infrastructure

Delta isn't the only airline to have experienced a recent system failure.

Last month, Southwest Airlines cancelled more than 2,000 flights over several days after an outage that it blamed on a faulty network router.

United Airlines has suffered a series of delays since it merged with Continental as the technological systems of the two airlines clashed.

Perry Higgins, left, and Alaina Whittaker check for updates from Delta Air Lines at Toronto's Pearson airport on Monday. Computer problems at the U.S. airline forced the couple to cancel their plans to get married that afternoon in San Francisco. (Nick Boisvert/CBC)

"It'ssomething that happens from time to time," Duell said."There's no particular airline that is immune to these [problems], and from what we've seen, there's none that are particularly prone to these."

Although Keshavdoesn't know what measures specific airlines have already taken,such large-scalefailurescould bepreventedif theyinvest inrigoroussystems "that tolerate fault and assume faults are going to happen."

But that wouldentail expensiveand complex engineering, requiring the replacement of legacy systemsbuilt years ago, he said.

"Banks, airlines, things like that which have been around for a while... need to at some point critically examine their infrastructure."

With files from The Associated Press