CIO

Merck

A 2014 Computerworld Data+ Editors' Choice Awards honoree, Merck uses a combination of Hadoop and cloud technologies to identify the causes of yield variations in the vaccine production process.
  • Julia King (Computerworld (US))
  • 08 September, 2014 17:45

Variations in the manufacture of batches of pharmaceuticals can force drug companies to discard products, potentially incurring tens of millions of dollars in losses.

Hoping to understand how to reduce variability in the manufacturing process, Merck used a combination of cloud technologies and open-source Hadoop big data tools to aggregate and analyze 12 years of data associated with the production of one of its key vaccines.

"We were able to aggregate the complete data set of every batch produced and compared them at scale," says Gerard Megaro, director of innovation and analytics at Merck, explaining that a total of 1.5 million terabytes of data, including previously archived data and current data, was queried from one place and subsequently analyzed on a single platform.

The five-person project team then performed more than 15 billion calculations and ran more than 5.5 million batch-to-batch comparisons. In all, the team analyzed more than 1 billion records, applying advanced analytics to determine which production factors had the greatest impact on product yield.

"I gave [the project] three months and a couple of hundred thousand dollars, and [Megaro] came back with a heat map that pointed us toward a number of variables that would impact the manufacturing process and make us more efficient," says George Llado, vice president of IT for supply chain and manufacturing at Merck. "We now have a much faster way to get at [data] we couldn't get at before and to build a sustainable model that we can continue to apply [to the manufacture of other products]."