Sensible behaviours for nonsensical data

Where the CIO Comes In

In Koronios' mind, the CIO has a critical responsibility to ensure the organization has the infrastructure required to enable people to ask the right questions and to come up with meaningful answers.

For instance, when executives look upon the data warehouse as a panacea, as many are wont to do, it is the CIO's role to take the strategists' perspective instead: to focus on what the organization wants to achieve and on identifying the data required to support those aims. In other words, it is up to the CIO to ensure the design of the data warehouse is business-driven, rather than technology-driven.

It can also be the CIO's role to quantify a confidence level for the data. He or she must be able either to give the strategist confidence that the data is accurate, or else to keep them fully informed about the data's flaws, Koronios says. "In other words, for the strategist, the CIO needs to know how good the data is; needs to have some measure of confidence about the data on which they're making a decision. So that if they still wish to make that decision on a 50:50 chance, at least it's an informed kind of decision that they're making about the data that they have in front of them."

In this role as quality control/quality assurance agent, it is up to the CIO to be alert to "stuff ups" as data makes its way through the organization.

"The CIO needs to be alert to the kind of things that can go wrong, and have processes in place to have quality assured that the data that's reaching the analyst is a faithful representation of the data in the warehouse," Preston says. "When you don't know whether it's the data that's the problem or what you've done to it that's the problem, that can be a little bit tricky and I think one needs a range of strategies to deal with that."

One is to reference back to the warehouse and attempt some kind of validation process. Another is to conduct face-value analysis: does the data look right, based on knowledge of the business?

Finally comes analysis, to check for likely causes of error and to work out where data collection may have failed.
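The first two strategies can be sketched in a few lines. The following is purely illustrative: the field names, the aggregate chosen for reconciliation, and the plausibility range are all hypothetical, not anything prescribed by the people quoted here.

```python
# Sketch of two of the strategies above: (1) reconcile an analyst's
# extract against the source warehouse, and (2) a face-value range check.
# All field names and thresholds are hypothetical.

def reconcile(extract_rows, warehouse_rows):
    """Reference back to the warehouse: compare record counts and a
    key aggregate between the extract and its source."""
    checks = {
        "row_count": (len(extract_rows), len(warehouse_rows)),
        "revenue_sum": (
            sum(r["revenue"] for r in extract_rows),
            sum(r["revenue"] for r in warehouse_rows),
        ),
    }
    return {name: a == b for name, (a, b) in checks.items()}

def face_value_check(rows, field, low, high):
    """Face-value analysis: flag values outside a plausible business range."""
    return [r for r in rows if not (low <= r[field] <= high)]

warehouse = [{"revenue": 100}, {"revenue": 250}, {"revenue": 80}]
extract = [{"revenue": 100}, {"revenue": 250}]  # one record lost in transit

print(reconcile(extract, warehouse))  # both checks fail: the extract is incomplete
print(face_value_check(warehouse, "revenue", 0, 10_000))  # nothing implausible
```

A mismatch in the reconciliation points at the extraction step rather than the warehouse itself, which is exactly the "is it the data or what you've done to it" distinction Preston describes.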

The Time to Rework

Steve Neilson, an independent consultant who has been involved in building data warehouses in the public sector for the past 10 years, says if there is bad data in the system, at some stage it will be up to the CIO to rework it.

"You either rework the decision you've made because the decision was bad, and that's very expensive, or you rework the data collection process, which is just plain expensive and double handling of everything. And that introduces all sorts of extra costs into your overall process, which is dead silly. And there are engineering analogies, because what you're saying is: 'We know this machine is producing crappy output, or the operator is producing crappy output, we're prepared to put up with it simply because it's data.'"

Bad data costs real money and the starting point to addressing that fact is statistical analysis, or data quality metrics, Neilson says.

"First you've got to measure it. You've got to know whether it's 5 percent bad, or 10 percent bad, or 100 percent bad. Is 5 percent less bad than 10 percent? Well, it all depends. In some data fields in databases it doesn't matter if you have 50 percent errors because no one cares about them. In other fields - say fields that you're using to feed a Balanced Scorecard or something - you need to know what the error rate is when you get the Balanced Scorecard. And when a manager makes a decision based on that data he needs to know there is an error level in there. So if it was 5 percent error, I think most managers would be comfortable with making decisions on that. If it was a 50 percent error, I think they would be most uncomfortable with it and might have to take some action to do some statistical analysis, work out where the process is going wrong, and fix it."
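The kind of field-level measurement Neilson describes can be sketched as follows. The field names, validation rules, and per-field tolerances below are invented for illustration; the point is only that the acceptable error rate is set per field, by the business, not by a single global threshold.

```python
# Illustrative field-level error-rate measurement. Field names,
# validators, and tolerance thresholds are all hypothetical.
import re

def error_rate(rows, field, is_valid):
    """Fraction of records whose value for `field` fails its validator."""
    bad = sum(1 for r in rows if not is_valid(r.get(field)))
    return bad / len(rows)

records = [
    {"dob": "1970-04-12", "postcode": "2000"},
    {"dob": "not known",  "postcode": "2001"},
    {"dob": "1985-11-03", "postcode": "ABCDE"},
    {"dob": "1990-01-30", "postcode": "3056"},
]

validators = {
    "dob":      lambda v: bool(v and re.fullmatch(r"\d{4}-\d{2}-\d{2}", str(v))),
    "postcode": lambda v: bool(v and re.fullmatch(r"\d{4}", str(v))),
}

# Tolerance varies with how the field feeds downstream decisions.
tolerance = {"dob": 0.0, "postcode": 0.10}

for field, is_valid in validators.items():
    rate = error_rate(records, field, is_valid)
    status = "OK" if rate <= tolerance[field] else "ACTION NEEDED"
    print(f"{field}: {rate:.0%} bad -> {status}")
```

Here both fields show a 25 percent error rate, but whether that triggers action depends entirely on the tolerance the business has assigned to each field.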

Some data is more important than other data, with each organization having its own focus. For some organizations, date of birth may need to be absolutely accurate, for others it might be postal address, and others might need special operations data that they can rely on 100 percent. Who is the best person to make these decisions? The businessperson, not the technical people, Neilson says. The data quality analyst cannot say how many errors are acceptable; only the businessperson can.

A good place for the CIO to start is by using metrics to divide data into three categories: missing, invalid and valid. Under this measure, anything identified as neither missing nor invalid is considered by default to be valid. This allows the business to identify quickly those data items that it can afford to "allow" to have missing values. By encouraging the business to insist that the meaning of data be identified, the CIO can thus introduce the concept of metadata into the equation, Neilson says.
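Neilson's three-way metric can be expressed directly in code. This is a minimal sketch: the field name and the validity rule are hypothetical, and the "valid by default" behaviour falls out of the ordering of the checks.

```python
# Sketch of the missing / invalid / valid metric described above.
# The field and validity rule are illustrative only.
import re

def classify(value, is_valid):
    """Missing beats invalid; anything neither missing nor invalid is valid."""
    if value is None or str(value).strip() == "":
        return "missing"
    if not is_valid(value):
        return "invalid"
    return "valid"  # valid by default

def profile(rows, field, is_valid):
    """Count each category for one field across a dataset."""
    counts = {"missing": 0, "invalid": 0, "valid": 0}
    for r in rows:
        counts[classify(r.get(field), is_valid)] += 1
    return counts

customers = [
    {"dob": "1970-04-12"},
    {"dob": ""},            # missing
    {"dob": "12/04/1970"},  # wrong format -> invalid
    {"dob": "1988-09-01"},
]

valid_dob = lambda v: bool(re.fullmatch(r"\d{4}-\d{2}-\d{2}", str(v)))
print(profile(customers, "dob", valid_dob))
# prints {'missing': 1, 'invalid': 1, 'valid': 2}
```

Writing the validity rule down forces the business to state what each field actually means, which is the back door through which metadata enters the conversation.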
