The Virgin Blues

Web site down, 116 flights cancelled and hundreds of customers milling around airports stacking luggage into mini mountains and stocking up on coffee and water. It started about 8am on the weekend of the school holidays, so timing was wonderful. All because of a computer systems failure.
The airline handed out free pizza and water. It attributed the outage to an ‘external supplier’s hardware failure’ and offered sincere apologies.
There are two things of interest to security folks here. One is the risk management that must have taken place to run such a system and the other is the substantial financial impact of computer system failure.
Not so long ago, when people had a computer failure it wasn’t this disastrous. And 116 cancelled flights is a disaster by any measure. Christchurch suffered a 7.1 earthquake, about the same magnitude as Haiti’s, and the airport closed down for less than twelve hours. Flights resumed that same day. As I write this, it’s 8:45am the day after and Virgin are still fighting to get planes off the ground.
In the olden days (the 1980s), businesses reverted to their manual systems and kept running at reduced speed. That assumed manual systems existed, which of course they did, because this was still within the first generation of computerization. The manual systems that computerization replaced were still, by and large, lying around, and staff who knew how to run them were still on hand. All that’s lost now. It’s not generally cost effective to resurrect manual systems and keep staff skills up to run them.
Which brings us to the opaque risk management question, because complexity is one of the arch enemies of reliability and an oft-overlooked design consideration. What we see here is massive system failure. What is unseen is the thought process at Virgin that led them to this black hole. Were corners cut and cost savings placed before system performance? Maybe some details will see the light of day in the fullness of time, but one can’t help but wonder what the cost/risk analysis will look like after the wash-up of lost flights and clean-up costs. Never mind the loss of reputation.
We find ourselves swept along into ever more complex computer systems, controlling ever greater portions of life, supported by a soft bed of vendor promises: multiple fault tolerances, automated failovers, seamless redundancies and astounding mean time between failures statistics. God help us all.
Incidents like this serve to show us all is not well in the land of risk management and vendor promises.
