CIO

Bound to Fail

Old green-screen legacy systems exist at the core of many businesses, and they can't take the velocity and number of transactions coming at them today from outside.

The crash of a critical legacy system at Comair is a classic risk management mistake that cost the airline $US20 million and badly damaged its reputation

Reader ROI

  • How Comair IT took its eye off the risk management ball

  • Why Delta, its parent company, shares the blame

  • How to get your legacy system replaced and avoid Comair's mistake

When Eric Bardes joined the Comair IT department in 1997, one of the very first meetings he attended was called to address the replacement of an ageing legacy system the regional airline used to manage flight crews. The application, from SBS International, was one of the oldest in the company (11 years old at the time), was written in Fortran (which no one at Comair was fluent in) and was the only system left that ran on the airline's old IBM AIX platform (all other applications ran on HP Unix).

SBS came in to make a pitch for its new Maestro crew management software. One of the flight crew supervisors at the meeting had used Maestro, a first-generation Windows application, at a previous job. He found it clumsy, to put it kindly. "He said he wouldn't wish the application on his worst enemy," Bardes recalls. The existing crew management system wasn't exactly elegant, but all the business users had grown adept at operating it, and a great number of Comair's existing business processes had sprung from it. The consensus at the meeting was that if Comair was going to shoulder the expense of replacing the old crew management system, it should wait for a more satisfactory substitute to come along.

And wait they did. The prospect of replacing the ever-maturing crew management system was floated again the following year, with plans laid out to select a vendor in 2000. But that didn't happen. Over the next several years, Comair's corporate leadership was distracted by a sequence of tumultuous events: managing the approach of Y2K, the purchase of the independent carrier by Delta in 2000, a pilot strike that grounded the airline in 2001, and finally, 9/11 and the ensuing downturn that ravaged the airline industry.

A replacement system from Sabre Airline Solutions was finally approved last year, but the switch didn't happen soon enough. Over the Christmas holidays, the legacy system failed, bringing down the entire airline, cancelling or delaying 3900 flights, and stranding nearly 200,000 passengers. The network crash cost Comair and its parent company, Delta Air Lines, $US20 million, damaged the airline's reputation and prompted an investigation by the US Department of Transportation.

Chances are, the whole mess could have been avoided if Comair or Delta had done a comprehensive analysis of the risk that this critical system posed to the airline's daily operations and had taken steps to mitigate that risk. But a look inside Comair reveals that senior executives there did not consider a replacement system an urgent priority, and IT did little to disrupt that sense of complacency. Though everyone seemed to know that there was a need to deal with the ageing applications and architecture that supported the growing regional carrier - and the company even created a five-year strategic plan for just that purpose - a lack of urgency prevailed.

Former employees say that after the acquisition by Delta, Comair IT executives didn't do the kind of thorough management analysis that might have persuaded the parent airline to invest in a replacement system before it was too late. Instead, Delta kept a lid on capital expenditures at Comair, with unfortunate consequences. The failure of the almost 20-year-old scheduling system not only saddled Delta with a plethora of customer service and financial headaches that the airline could ill afford but also provides a cautionary tale for any company that thinks it can operate on its legacy systems for just . . . one . . . more . . . day.

The Five-Year Plan That Wasn't

Today, Cincinnati-based Comair is a regional airline that operates in 117 cities and carries about 30,000 passengers on 1130 flights a day, with three or four crew members on each. But back in 1984, when Jim Dublikar joined the company as director of finance and risk management, Comair had just 25 planes and not a jet among them. Dublikar, who served as Comair's director of risk management and information technology from 1992 until 1999, explains that back then, flight crew managers did all their scheduling with pen and paper. But in 1986, Comair leased software from SBS that kept track of crews, the flights they were assigned to and how many hours they were flying, in order to be in compliance with union and federal regulations.

The system worked just fine.

In 1993, Comair bought a jet - the first Bombardier CRJ regional jet in the industry, in fact. The company grew swiftly. But by 1996, other regional contenders such as American Eagle, Mesa and Continental Express acquired their own jets, and Comair lost its competitive advantage. "At that point, the playing field had got pretty even," says Dublikar, now an airline consultant. "So we had to start looking at ways of doing things better and more efficiently."

Over the years, Comair, like most airlines, had acquired a hodgepodge of applications - from crew scheduling to aircraft maintenance to passenger booking engines. "We had several systems that were getting pretty long in the tooth - around for seven, eight, nine years," Dublikar says.

Unfortunately, you can't see a crew management system age the way you can see a plane rust. But it does. "These systems are just like physical assets," says Mike Childress, former Delta CTO and now vice president of applications and industry frameworks for EDS. "They become brittle with age, and you have to take great care in maintaining them."

In 1998, Dublikar and his IT steering committee brought in consultants from Sabre, the Southlake, Texas-based airline software and consulting company, to create a long-term IT strategy to address the issue of legacy systems and architecture. The consultants spent five months meeting with IT's various constituents in the business to find out what their needs were. They examined the airline's existing IT infrastructure and suggested a five-year strategic plan outlining (among other things) which systems needed to be retired, replaced or added, and a time line for doing so.

The crew scheduling system was marked for retirement. SBS was no longer "the only game in town", recalls Dublikar, and the case for replacing the system was pretty easy to make. "The application was getting old. There was risk there, and there was new technology out there," he adds. "There were even financial benefits to replacing the system in terms of crew productivity and expenses that could be controlled better in a new system."

But this was 1998, and for the next two years, Y2K absorbed most of IT's attention. By 1999, a significant amount of the work that had been laid out in the five-year plan (including Y2K remediation) had been completed or was under way - including implementing an e-ticketing system, upgrading the corporate network, replacing the maintenance and engineering system (another high-risk legacy system written in Cobol), and implementing a revenue management application.

The replacement of the crew scheduling system was among those next on the list. But after nearly 15 years in use, the business had grown accustomed to the SBS system, and many of Comair's crew management business processes had grown directly out of it. Just look at a pilot's contract at Comair; the definition of a workday is lifted straight out of the old SBS crew management application and expressed in Julian minutes the way the system did. (There are 44,640 Julian minutes in a 31-day month.) "That's the reason why it's almost impossible to replace these systems," says John Parker, former airline CIO and 17-year Delta veteran, now CIO of AG Edwards & Sons.
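The 44,640 figure is simply 31 days x 24 hours x 60 minutes. As a rough illustration, the sketch below assumes a Julian minute is a count of minutes elapsed since the start of the month. The article does not spell out the SBS convention, so the function names and the day-numbering scheme here are hypothetical.

```python
import calendar

MINUTES_PER_DAY = 24 * 60  # 1,440


def minutes_in_month(year: int, month: int) -> int:
    """Total number of minutes in the given calendar month."""
    days = calendar.monthrange(year, month)[1]
    return days * MINUTES_PER_DAY


def to_month_minute(day: int, hour: int, minute: int) -> int:
    """Minutes elapsed since 00:00 on day 1 (assumed convention, not from SBS)."""
    return (day - 1) * MINUTES_PER_DAY + hour * 60 + minute


print(minutes_in_month(2004, 12))   # 44640
print(to_month_minute(24, 18, 30))  # 34230, i.e. 6:30pm on the 24th
```

A contract that defines a workday this way is tied to the old system's arithmetic, which is part of what makes such systems hard to replace.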

But systems requirements had been defined, and the IT department was in the software selection process. Final vendor selection was slated for 2000, according to Dublikar.

But then, in the middle of 1999, Dublikar left Comair, and shortly afterward, Delta announced its plans to acquire Comair.

Hands Off At Delta

To Delta, buying Comair - one of its most profitable regional partners - was a no-brainer. It made money. It was an industry leader in on-time, cancellation and missed baggage statistics. And it was a stock market darling. "Before Delta came in, Comair was one of the best managed and most successful regionals in the country," says Holly Hegeman, an airline industry analyst and founder of PlaneBusiness.com. "They had the reputation of being the top dog."

By all accounts, Delta's attitude toward Comair was: Why mess with success? Even IT looked OK to Delta. On paper, Comair had project time lines, good budgets, everything you'd expect to see. "So there was no mandate at the Delta level to get the Comair IT ship righted," Hegeman says. The only area Delta appeared to be concerned about was marketing; the parent airline replaced the entire marketing department at Comair within days of taking over.

Comair, like most acquired companies, wasn't exactly welcoming to its new owner either. "There was definite friction," Hegeman says. "Top management at Comair didn't take kindly to being part of mother Delta." So Comair continued to run independently for the most part - although, because it was a wholly owned subsidiary, all major capital expenditures had to be approved by the parent company, Bardes and others say.

After Dublikar left, the IT director position stayed vacant for a number of months. In early 2000, Mike Stuart, senior vice president of flight operations, was given oversight of IT. And in March 2000, Sherri Kurlas-Schalk, who had been with the company since 1990, was named IT director. The tendency in the IT department, meanwhile, was to "keep your head down" and not draw too much attention to anything, according to Bardes, who left Comair in late 2003 to join software and IT services company Compuware as a senior systems designer. After the uncertainty in IT leadership and the takeover by Delta, there was a palpable lack of commitment to projects in IT. "Everyone was expecting someone else to move projects along," Bardes says. "The business units were expecting IT to push a project through. And IT was waiting for the business unit to push it through."

The five-year plan, which was supposed to be revisited on a regular basis, languished.

In 2001, an 89-day pilots' strike from March to June shut down Comair and the Cincinnati/Northern Kentucky International Airport, where Delta and Comair operate 90 percent of all flights. Comair closed its Cincinnati concourse, losing more than 800 daily flights and saddling Delta with a $US200 million loss for the quarter. Once the strike was over, Comair's flight operations group, the primary users of the crew scheduling system, had their hands full getting planes back in the air. "You can't just switch things on and get an airline running again," says Bardes. "There's a lot of inertia and momentum lost when you shut down, and it's hard to get started again." During this period, they gave little or no thought to replacing the crew scheduling system.

Then came 9/11, crippling airlines big and small, and pushing some of the largest carriers into bankruptcy. Though Delta has thus far avoided that fate, the airline lost nearly $US8.5 billion over the following four years, increasing the pressure to keep costs down. While there's no evidence that Delta refused to fund an upgrade for the crew scheduling system, "approval for capital expenditures seldom went through the first time [at Delta]", Bardes says. "They'd want more analysis. They had a definite influence on how fast money got spent."

Delta declined to comment for this story and referred questions to Comair. Comair spokesman Nick Miller would say only: "We've been very straightforward in acknowledging the challenging time we've been facing in the airline industry and that we've had to be very prudent in how we've invested in technology."

Late in 2002, the Comair IT group did turn its attention back to the crew management system and brought in several vendors, including Sabre and SBS, to perform demos. Comair went down the road a bit with one vendor, which it refuses to name, but ultimately backed out during contract negotiations due to pricing concerns. All in all, there seemed to be no hurry on either Comair's or Delta's part to get the project rolling, even though the crew scheduling system was (and still is) the oldest application of its kind still running at a regional carrier, according to a recent survey by Regional Aviation News.

Finally, Comair got approval from Delta to replace the legacy SBS system and inked a deal with Sabre in June 2004 to implement its AirCrews Operations Manager. Implementation was set to begin in 2005. But by then, it would be too late.

On December 16, Comair reported an operating profit of $US25.7 million in the third quarter of 2004. A week later, a severe winter storm hit the Ohio Valley. The snow came with sleet and freezing rain. De-icing the jets took much longer than expected and some jets' tyres froze to the ground. From December 22 through the 24th, Comair had to cancel or delay 91 percent of its flights.

And another problem was looming. As it turned out, the crew management application, unbeknownst to anyone at Comair, could process only a set number of changes - 32,000 per month - before shutting down. And that's exactly what happened. On Christmas Eve, all the rescheduling necessitated by the bad weather forced the system to crash. As a result, Comair had to cancel all 1100 of its flights on Christmas Day, stranding tens of thousands of passengers heading home for the holidays. It had to cancel nearly 90 percent of its flights on December 26, stranding more.
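The article reports only the figure, about 32,000 changes a month. One plausible explanation, which is an assumption here rather than something the article states, is a counter stored in a 16-bit signed integer, whose maximum value is 32,767. The sketch below shows how such a counter behaves when one more change arrives. The class name and the choice to raise an error are hypothetical.

```python
import ctypes

INT16_MAX = 2**15 - 1  # 32,767: largest value a signed 16-bit integer can hold


class MonthlyChangeCounter:
    """Hypothetical counter that refuses to go past a 16-bit signed limit."""

    def __init__(self) -> None:
        self.count = 0

    def record_change(self) -> None:
        if self.count >= INT16_MAX:
            # Failing loudly is an illustrative choice, not the SBS behaviour.
            raise OverflowError("monthly change limit reached")
        self.count += 1


# Without a guard, a fixed-width counter silently wraps to a negative number.
wrapped = ctypes.c_int16(INT16_MAX + 1).value
print(wrapped)  # -32768
```

Either way, a limit like this stays invisible in normal months and surfaces only when volume spikes, which is why nobody at the airline had seen it before.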

There was no backup system. It took a full day for the vendor to fix the software. But Comair was not able to operate a full schedule until December 29.

The Big Lesson: Manage Your Risk

The Comair disaster is a classic case study in operational risk, according to industry experts, who say both Comair and Delta carry some blame. Comair IT executives should have done the kind of risk management analysis that would have alerted Delta to the dangers of not replacing the legacy system sooner. And IT should have repeatedly brought that analysis to the attention of Delta officials until a replacement system was funded. Similarly, Delta executives should have insisted on scrutinizing Comair's operations and done their own analysis of the carrier's risks.

"Anything that can damage a parent company's brand or reputation has to be managed in some way," a former Delta IT executive says. "Risk assessment of worst-case scenarios at Comair should have happened at Delta."

What happened at Comair is hardly an isolated problem. Old green-screen legacy systems exist at the core of many businesses, and they can't take the velocity and number of transactions coming at them today from outside. "The more applications a legacy system is hardwired to over the years, the more fragile it becomes," says EDS's Charlie Feld, also a former Delta CIO.

The larger problem is that at many companies, Comair among them, operational risks are not factored into day-to-day decision making the way matters such as aircraft mechanics and daily operations are. Robert Charette, director of the Cutter Consortium's enterprise risk management and governance practice, says Comair executives still don't seem to have learned this basic risk management lesson. As late as March, Comair's Miller was blaming the debacle on bad weather. "If the weather had not hit as hard as it did, the problem would have never come up," he says.

Industry observers acknowledge that convincing first Comair corporate and then Delta executives of the necessity of replacing the system would have been a tough sell for IT. But if Comair's IT had done a cost-benefit analysis of the risk of not replacing the system, they could have made a convincing business case for an upgrade, says a former Delta IT executive.

The technology is there, Feld says. "The question is how people in IT lay out that multiyear plan and get the right partners in there to help transform these legacy systems," he adds. "Because if you don't, there will be more meltdowns."

The Epilogue

Having lost nearly as much in this fiasco as Comair made in profit the previous quarter, Delta finally intervened. On January 17 of this year, 20-year Comair veteran and president Randy Rademacher stepped down, and Delta assigned Fred Buttrell, CEO of Delta Connection (which manages Delta's network of regional carriers), to take over. Shortly after Rademacher's resignation, Stuart was also asked to leave. Some say more Delta blood may be infused into this regional subsidiary.

But whether Delta will invest more in Comair's IT remains to be seen. In its 2004 annual report, Delta said that it will post another substantial loss in 2005. A bankruptcy filing remains a possibility. And, says Childress, "when the airlines are in trouble, it's a lot harder to find cash for IT renewal and replacement". In fact, Delta has not ruled out the possibility of selling Comair or the other regional airline it owns to raise cash.

In the meantime, Bardes says he meets up with some of his old Comair co-workers for lunch every now and then. "I'll say: 'Are you still keeping your head down?' They'll say: 'Oh yeah'," Bardes says. "That place just seems to punish people who want to be agents of change."

As of March, Comair was still using the nearly 20-year-old crew management system from SBS, though with a lot more care. SBS implemented a bridge solution, dividing the legacy system into two modules - one for pilot schedules and another for flight attendant schedules - each with its own monthly limit of 32,000 changes. Comair also began generating a daily report to monitor the volume of transactions going through the system.
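A daily volume report of the kind described could be as simple as a month-to-date total checked against the limit. The following is a minimal sketch, not Comair's actual report: the 32,000 limit comes from the article, while the 80 percent warning threshold, the function name and the sample figures are assumptions for illustration.

```python
from dataclasses import dataclass

MONTHLY_LIMIT = 32_000   # per-module limit cited in the article
WARN_FRACTION = 0.80     # hypothetical early-warning threshold


@dataclass
class VolumeReport:
    month_to_date: int
    remaining: int
    projected_month_end: int
    status: str  # "ok", "warning" or "critical"


def daily_report(daily_counts, days_in_month, limit=MONTHLY_LIMIT):
    """Summarise changes so far this month and flag an approaching limit."""
    used = sum(daily_counts)
    days_elapsed = len(daily_counts)
    # Naive projection: assume the average daily rate so far continues.
    projected = round(used / days_elapsed * days_in_month) if days_elapsed else 0
    if used >= limit:
        status = "critical"
    elif used >= WARN_FRACTION * limit or projected >= limit:
        status = "warning"
    else:
        status = "ok"
    return VolumeReport(used, max(limit - used, 0), projected, status)


quiet = [700] * 23                    # invented figures for a routine month
print(daily_report(quiet, 31).status)             # ok
print(daily_report(quiet + [12_000], 31).status)  # warning after one storm day
```

The point of the projection is to raise the alarm days before the limit is reached, while there is still time to act.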

And plans are still in the works to replace it.

SIDEBAR: Living with Your Legacy System

LEGACY SYSTEMS can be like ticking time bombs. Even if a business case can be made to replace them, doing so takes time. If you have to live with a legacy system for a period of time before replacing it, you should take these steps to protect your business:

  1. Conduct performance tests and capacity planning at least once a year.

  2. Test critical applications thoroughly, especially for buffer overflows that can lead to application failure.

  3. Develop workable backup plans to deal with catastrophic legacy system failures.

  4. If you find that the legacy system has so much value to the business that replacement is not an option, create a business case for modernizing the legacy system so that you reduce the risk associated with it.

SIDEBAR: Your Legacy System May Be Too Big a Risk to Tolerate When . . .

Although the legacy system that failed Comair was 18 years old, age by itself is not an indicator that a legacy system should be replaced, says Robert Charette, director of the Cutter Consortium's enterprise risk management and governance practice. There are other, more relevant factors - in addition to its maturity - that may make your legacy system intolerably risky, such as:

  1. You no longer have anyone on staff who understands the language that it's written in.

  2. You can't locate the source code or documentation, and the only person who understands the design retired years ago.

  3. There is no one around who can fix the application on short notice.

  4. Backup systems haven't been tested for years, or they're manual and too hard to implement even if you wanted to.

  5. The original vendor went out of business.

  6. You're not sure how many more transactions the application is handling than it did when you last upgraded it.

  7. You discover there are other applications wired into the system that you weren't aware of.

  8. Your company is still highly dependent on the system for everyday operations.

SIDEBAR: How to Make the Case for Legacy System Replacement

Investing in risk management or system replacement can be a tough sell to senior leadership at many companies. After all, replacing just one old system can carry a multimillion-dollar price tag.

To overcome this very common reluctance to invest in IT, CIOs must make the case that the legacy system is critical to the company's operations and that its failure could jeopardize the company's success. Veteran CIOs such as John Parker of AG Edwards & Sons advise performing a cost-benefit analysis that includes not only the costs of replacing legacy systems but also the cost of not mitigating the risk of a systems crash.

"That's where the business case is," Parker says. Comair's cost for not replacing its crew management system, for example, was the shutdown of the business for several days and a loss of $US20 million in operating costs and revenue. In addition, there can be many potential benefits to replacing legacy systems - such as lower operational and maintenance costs, better customer service, increased system integration and adaptability, or even increased revenue.

Kevin Murray made such a case for legacy system replacement when he was an IT executive at insurance company AIG in 2002. When making his business case, Murray presented the costs of a potential failure of the legacy systems along with the high maintenance costs the company was paying to keep those systems running. He balanced that sum against the cost of replacing them. "I knew our total cost of ownership would go way down and our speed to market would go way up," says Murray.

His business case hit home with AIG's executives. "Our CEO and our board of directors' jaws dropped. They saw what we had been spending and that we could get 30 percent savings by moving to a newer technology," says Murray. "It worked very well."

While CIOs should include risk analysis in any business case they make, they should also include the tangible benefits of replacement systems. "Flexibility and cost savings can be the straw that breaks the camel's back," says Murray, who's now putting together a similar case for legacy replacement as CIO of AXA.

CIOs who don't make the case will probably find themselves replacing their risky legacy systems at some point down the road anyway. "Once the worst-case scenario materializes and you haven't explored these kinds of options, you can get into desperation mode," Parker says. "Then you have to make decisions quickly to move off a platform, and most of them won't be well thought-out. It's a lot better to move strategically to replace legacy systems than to have to respond tactically to a system failure."
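
The comparison Parker and Murray describe can be reduced to simple arithmetic: weigh the cost of replacement against the expected cost of keeping the old system. In the sketch below, only the $US20 million failure cost comes from the Comair case. The failure probability, maintenance costs, replacement price and planning horizon are invented placeholders, and a real analysis would also discount future costs and count benefits such as productivity gains.

```python
def cost_of_keeping(annual_failure_probability, failure_cost,
                    annual_maintenance, years):
    """Expected cost of running the legacy system over the planning horizon.

    Simplification: the expected failure loss is probability x cost per year,
    summed over the horizon, with no discounting.
    """
    expected_failure_loss = annual_failure_probability * failure_cost * years
    return expected_failure_loss + annual_maintenance * years


def cost_of_replacing(purchase_and_implementation, annual_maintenance, years):
    """Up-front replacement cost plus maintenance of the new system."""
    return purchase_and_implementation + annual_maintenance * years


# All inputs except the $20 million failure cost are hypothetical.
keep = cost_of_keeping(
    annual_failure_probability=0.10,
    failure_cost=20_000_000,
    annual_maintenance=1_500_000,
    years=5,
)
replace = cost_of_replacing(
    purchase_and_implementation=8_000_000,
    annual_maintenance=1_000_000,
    years=5,
)
print(f"keep: ${keep:,.0f}  replace: ${replace:,.0f}")
print("replace" if replace < keep else "keep")
```

With these placeholder numbers, replacement comes out cheaper, but the value of the exercise lies in forcing an explicit estimate of the failure probability and its cost, which is the figure most often left out of the budget discussion.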