Cleaning Up Your Act

Companies relying on poor quality data will inevitably pay a high price by way of economic damage springing from poorly premised decisions, lost opportunities, bad publicity and risk to reputation.

Redundant data, wrong data, missing data, miscoded data. Every company has some of each, probably residing in IT nooks that don't communicate much. It's not a new problem, but these days the jumble becomes very apparent during high-profile projects, such as installing CRM or supply chain management software.

Reader ROI

  • Understand why poor data quality is no longer a little problem
  • Find out how costly dirty data can be
  • Learn why CXO buy-in is a must for instituting a data quality program

For many companies the trustworthiness of the data they rely on to keep them in business remains a total mystery, forcing them either to base decisions on guesswork or to slide into virtual paralysis, where they feel they can make no effective decisions at all. Some companies make wildly optimistic assessments of the validity of their data, which, as far as potential harm goes, can be even worse.

In today's information economy, knowledge has become such a critical asset that the quality of a company's data is even considered a reliable predictor of its future success. Companies relying on poor quality data will inevitably pay a high price by way of economic damage springing from poorly premised decisions, lost opportunities, bad publicity and risk to reputation.

The Data Warehousing Institute in Seattle has found low-quality data costs US businesses $US611 billion per year in bad mailings and staff overhead alone. A 2003 IDC survey of 1648 companies implementing business analytics software enterprisewide found data cleanliness and quality came second only to budget cuts on the list of problems cited. And just 23 per cent of 130 companies surveyed by Cutter Consortium on their data warehousing and business intelligence practices use specialised data cleansing tools.

Achieving data quality is not unlike trying to hold an eel - it is slippery, will not stay still and seems determined to slide out of one's grasp. Consulting firm Hildebrandt International has found data quality decays at an average rate of 33 per cent annually, and without proper attention even "clean" data can become incorrect, unusable and ultimately untrustworthy. That means no company can afford to entirely trust the data it relies on at any one point in time. Yet according to some experts, far too few companies recognise the real data quality issues they face - above all, that when it comes to data, "oils ain't oils", as it were.

"I think a lot of organisations take their data as being sacrosanct in some way," says professor Graham Pervan of Curtin University School of Information Systems. "And of course [the validity of that approach] depends where you're getting the data from. There's data that comes internally from your company, and there's data you can acquire from outside - very often at great expense - and it's not all the same. You need to look with great caution at where the data is coming from and how reliable it is. People are very good at making predictions, but a prediction without error bars on it saying how reliable it is, is not a great deal of use."

The importance of "golden" data - data that has been cleansed, consolidated and proved 100 per cent accurate - cannot be overestimated. Yet despite enormous advances in technological remedies, the data quality problem seems more pervasive and tenacious than ever.

"Every company has data quality issues whether they address them or not," says Frank Block, director for finance solutions, Insightful Corporation. "And the need to integrate disparate data sources creates an avalanche effect. Smart companies recognise the need and apply analytic solutions to access, analyse and report data quality information using a predefined framework." Block says too few companies realise the positive bottom-line benefits that come with improvements in data quality. "While the costs of poor data quality can be steep, the benefits of clean and reliable data are even greater," he says.

Three quarters of respondents to the PricewaterhouseCoopers Global Data Management Survey 2001 reported that investment in effective data management had delivered improved bottom-line results across their business. Almost 60 per cent of respondents had cut processing costs, and well over 40 per cent had managed to boost sales through better analysis of customer data. However, the latest advances in - and thinking on - data quality contain both good news and bad news.

The Good . . .

The good news is that any organisation should be able to improve the quality of its data by treating data as a strategic corporate resource; by developing a program for managing data quality with a commitment from the top; by installing appropriate technologies; by relying on experienced data quality professionals to oversee and carry out the program; and, most importantly, by constantly working at the issue.

Some companies - particularly those where the data quality issues are relatively straightforward - are making real progress. For instance Associated Food Stores in the US has moved away from "fat-fingered" manual data entry to adopt a wireless real-time locating system from WhereNet Corporation to enable it to locate, track and manage assets across its 243-hectare distribution centre. Now internal logistics manager Tim Van de Merwe says the company has achieved 100 per cent accurate data capture.

"Our data collection no longer requires any keyboard entry, no scanning because in the RF [radio frequency] sense - it's transmitted. We have automatic entries of locations, and then status association with the IDs that are located based on areas. Areas are predetermined and predefined priorities are connected to those areas. So any time an asset shows up in that area it's automatically labelled as available or unavailable and some kind of action that needs to be taken with those assets has been predefined. Once you set up the system, it just rolls," de Merwe says.

However, for many companies the data quality issues are far more complex, and these organisations must draw on analytic solutions to measure and proactively manage data quality. Insightful Corporation, for instance, provides Fortune 1000 companies with scalable data analysis solutions that drive better decisions faster by revealing patterns, trends and relationships. Block says Insightful uses a stepped approach to data quality that starts with an assessment of the number of missing values, zeroes, undefined values, missing primary keys, missing foreign keys and domain outliers found in corporate data.

"The multi-level approach involves checking a number of observations in a certain table and its variation with time, frequency of missing values, outliers, and so on, compared to previous months and other factors," Block says.

"Finally the work moves to the level of business quality, which involves checking reasonable bounds on purchase orders, transaction sizes, the order of dates - you cannot close a cheque account before you even open it, the number and variation of customers in a certain region, and so on." Insightful also might check time variations of data fields such as a birthday, name and gender (Block has often found gender variations over time as a consequence of bad data).

Every company's data quality problems are unique, and not every company can expect to achieve 100 per cent data accuracy all of the time. Indeed, for some companies, achieving 100 per cent accuracy will never be a realistic aspiration; instead the talk is of finding the right trade-off between pragmatism and the ideal world.

Take the Queensland Retail Traders & Shopkeepers Association (QRTSA), which deputy director Randall Swayne describes as "basically a union for employers".

The QRTSA has moved from an expensive American-packaged membership solution to a much simpler tailored Access-based system developed for it by Queensland-based Data Quality Control. "Obviously with a membership-based organisation our membership is our lifeblood. If we don't have correct data, if [statements are] not going to the right people when we send them out, then we don't get our money," Swayne says. "So that is certainly a big issue."

The association relies on a couple of mini programs to check the veracity of the data: it is implementing a barcode generator called Postman to verify postal addresses, and surveys about 20 per cent of members every year to ensure the individual data is up to date. But while Swayne says he believes the data is now between 97 and 98 per cent accurate, getting those accuracy rates any higher would take more time and resources than it is worth.

"I guess the main issue that we've had with this in particular is to ensure that with such a large number of records - with huge numbers of retailers throughout Queensland, throughout Australia in fact - that we were getting as much detail as we possibly could and getting it through the system so that it was easy to manage, easy to read, and so on," Swayne says. "The other issue is we have a number of group members, like all the IGA stores, so we get a list direct from their head office. That data is assumed to be 100 per cent correct. It is not always, but it is as good as we can get without going through and checking every individual detail.

"We're constantly updating. Any time we get a new phone number or a new address we update. We've got about 1200 out of our 3000 [members] on e-mail, so we send them e-mails quite regularly saying: 'If your details have changed let us know.' "We send out a monthly magazine, so we know fairly quickly if that doesn't get to them, and we can chase that up - we're constantly doing that. To the 10,000 prospects we send out a newsletter, and again when those come back, we chase the people up and delete them if they have disappeared, or update their [details in the] system. We [also] survey our members, generally about 20 per cent every year just to be sure that their data is up to date. Obviously with a database like the one we've got it's very simple to do that in real time and change things as they occur.

"But [if we had to make sure all the data was 100 per cent accurate] we'd spend six months out of every year doing it," Swayne says.

. . . the Bad

That brings us to the bad news: many experts now believe few companies will ever be able to rely purely on technology to resolve their data quality issues.

"The reality is that real data, as in data that people actually use, as in real-life data, is noisy: it's got errors in it," says Curtin's professor Pervan.

"And I'm not sure that there is going to be a technological solution at all. To some extent you can, by picking up outlying items, sort of correct for it but data is always going to have errors in it. It's getting worse of course, because we've got more data, and the possibility of having bad data is increasing."

The lack of a ready technological solution presents many challenges for companies wishing to move ahead with strategic IT projects. For instance, data quality is the cornerstone of any successful customer relationship management implementation. Incorrect information - for example, duplicate contacts, outdated or erroneous contact information, contacts incorrectly categorised - erodes the system's credibility with users, erodes an organisation's credibility with its clients, and limits professionals' ability to make sound business decisions and produce effective business development programs.

"Avoiding the entry of incorrect information and identifying and correcting that which seeps in are essential to ensuring the effective use of relationship intelligence," says Rick Klau, vice president of vertical markets with US company Interface Software.

A provider of CRM software and services, Interface understands how to achieve data quality and is keen on innovation. Early in its development (circa 1991), the company struggled with the issue as much as any other organisation. But Klau says more than 300 implementations brought perspective on the barriers to data quality and enabled the company and its customers to overcome them. Now Interface uses its data quality features and process capabilities as key selling points.

"Many companies have failed to realise that the real problems in data quality lie as much with process as with technology," Klau says. "In a typical organisation, project, contact and relationship data exists in application and practice group silos. It's sliced, diced, expanded and updated for many different purposes." Klau says the problem is compounded because insights from one silo rarely get shared with any kind of centralised database due to time constraints and the limits of database reconciliation technology.

"I think this is an ongoing problem for virtually every financial institution, because they all have multiple channels," agrees Zurich Financial Services Australia head of investment management and life (and former CIO) Peter Delprado. "Some business will come in through an intermediary such as a financial planner or an insurance broker, and then they'll have other channels where it comes direct, and everywhere I've worked they've had this issue."

Delprado points out that the quest for absolute data quality can itself damage corporate intent. While good quality data is essential to CRM, overzealous efforts to maintain that quality can prove counterproductive by marring the organisation's relationship with some customers. And while some fairly sophisticated software is available to do the de-duping, scrubbing and cleansing, the only way to ensure the data really is current is to check its accuracy every time you touch the customer.
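None of the vendors quoted here publish their matching logic, but the heart of de-duping - normalising records and scoring pairs for similarity so a person can review the near-matches - can be sketched with nothing more than Python's standard library. The field names and the 0.88 threshold below are illustrative assumptions.

    from difflib import SequenceMatcher
    from itertools import combinations

    def likely_duplicates(contacts, threshold=0.88):
        """Score every pair of contact records; pairs above the threshold are
        candidates for human review, never automatic merging."""
        def key(c):
            # Normalise: lowercase name plus postcode, whitespace collapsed.
            return " ".join("{} {}".format(c.get("name", ""), c.get("postcode", "")).lower().split())
        pairs = []
        for a, b in combinations(contacts, 2):
            score = SequenceMatcher(None, key(a), key(b)).ratio()
            if score >= threshold:
                pairs.append((a, b, round(score, 2)))
        return pairs

    # Both spellings of the same customer surface as one review candidate.
    print(likely_duplicates([
        {"name": "Peter Delprado", "postcode": "2000"},
        {"name": "Peter Del Prado", "postcode": "2000"},
        {"name": "Jane Citizen", "postcode": "4000"},
    ]))

A production system would block candidates on postcode or phonetic keys rather than comparing every pair, but the review-first principle is the point: the software nominates, a person decides.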

But Delprado says organisations must remain sensitive to customer sensibilities, including privacy and other concerns, and avoid unnecessarily annoying customers in their efforts to keep the data up to date.

"Take the channel conflict issue, where you've got a customer coming in through a planner and taking out a certain policy and then coming in through a general insurance broker, and it's the same person. They may specifically not want the two (sets of information) to come together, and we certainly honour the customer's privacy in that circumstance. We don't even attempt to match that kind of data.

"I know that other institutions want to get that database [match] because they have a view that if a customer has got a bank loan with them over here and a superannuation product over there, then the bank thinks that they should be telling them one story at all times. But there are certainly circumstances where customers don't want that at all. We certainly would prefer to keep the customers happy," Delprado says.

. . . and the People

Data quality is, above all, a people issue. George Kahkedjian, CIO for Eastern Connecticut State University in the US, says most organisations do not even analyse their data to make sure it is correct. He also says awareness and training are even more critical to maintaining good data quality than technical analyses. In most organisations, says Kahkedjian, data collection is left to low-paid staff, so training those individuals is not a high priority. Further, most employees do not recognise the value of the data.

Delprado agrees data quality is as much or more a people issue than a technological one. In his view, de-duplication never fully works: it yields at best 99 per cent accuracy and demands constant effort. That means the only way to get your data as accurate as possible is to have a sense of the importance of accurate customer data embedded in the organisational culture.

"The real way to fix it is [to ensure] that you only have one database, and whenever your organisation touches a customer it is written to the one copy. Then, if you see that there's two Delprados there, you ask the question: 'Do you want these linked? Is it the same person, is it not the same person?'

"[To instill that culture] we constantly have sessions with the staff," Delprado says. "We have a number of forums for internal communication about our strategy. Our strategy is very heavily focused on the intermediary and the relationship that we have with them and making sure that we satisfy their needs, which is satisfying their customers' needs. It is a matter of constant communication and [having] a consistent story. It's ensuring that the staff share the vision and the strategies that we've got. You can't do enough talking to them about it. I would be reasonably comfortable that most of our staff understand our strategy of intermediary-based and therefore we honour the relationships there.

"A lot of the companies are looking at data from a marketing perspective to cross-sell, but it's not a necessary focus from our perspective. We are honouring the relationship that we have with our intermediaries.

SIDEBAR: Housecleaning

Rules of thumb for a data quality initiative

Recall is a global organisation that stores vast amounts of information for client organisations in both physical and digital form. Global president and CEO Al Trujillo says the company has ample global experience in helping companies to clean up data, particularly where the problem lies in the integrity between various sources of data.

He says the company has learned numerous lessons about maintaining data quality since its formation. First, there has to be an ongoing, regular review or inventory of the data that is critical to the organisation, and that program has to be actively maintained and actively managed - and not just by the IT department. At a minimum, that means the organisation should conduct a comprehensive audit or inventory of the data it needs and deems critical at least annually, and ensure there is a plan in place to maintain that information. Failure to do so is at the heart of many data quality failures, Trujillo says.

"Organisations discover that the IT department believes that it is complying with what it has been asked to do, but the business side of an organisation has to have awareness of those sets of data in the organisation which are critical. That data has to be managed in an active manner, [and] it has to be tested on a frequent basis to ensure that the integrity and the quality of that data is good."

Next, the organisation needs a written, formal plan, reviewed by senior management as well as IT, acknowledging that all parties understand where that information is and resources have been allocated to ensure that information is indeed maintained. "In certain jurisdictions around the world this is no longer an option, but is now the law," Trujillo points out. "In particular the United States, where as a result of the post-Enron and post-corporate scandal environment, you have laws that in fact now require CEOs and CFOs of companies to sign off every year with their annual reports that they in fact know and understand where those critical data sets are." While no such legislation applies in Australia yet, Trujillo says Australian managers in organisations that are global by nature are likely to find themselves being asked to sign off on behalf of Australia.

Finally, he says, once the plan is in place it must be tested. "You need to prove affirmatively that the systems and the safeguards in place needed to protect the data are there. That again requires a dedicated resource, a dedicated commitment to have that testing done on a regular basis. That's where many of these plans fail," he says. Trujillo cites the example of Arthur Andersen, which did an audit, had a plan and then failed to execute it. "It was ultimately one of their failures relative to the management of their data and their e-mail systems and everything else: there wasn't the execution of the testing of these plans.

"So these are really the three main elements that we remind our clients. First, know what you've got. And part of knowing what you've got is knowing who has access to it, to ensure that the data is not somehow contaminated either deliberately or by accident. Second, have the plan in place in terms of a formal plan to maintain the information. And third, test the plan. Assure yourselves in an affirmative manner that as a matter of fact the information has been protected and maintained according to the standards that have been described."

SIDEBAR: Down and Dirty with Customers

Dirty data is the dirty little secret that can jeopardise your CRM effort

According to Rick Klau, vice president of vertical markets at Interface Software, the issues associated with data quality and customer relationship management that can impact an organisation's entire strategic business development initiative are: sharing, control, categorisation and workflow.

Sharing The concept of sharing is fundamental to the success of CRM implementation, and the overall quality of data. By contributing contacts to the centralised knowledge base, users can ultimately gain a 360-degree view of all firm relationships, which can then be leveraged for business development, cross-selling and client retention initiatives.

However, the reality is that organisational attitudes about sharing relationship data vary, as do individual professional attitudes. For instance, some professionals will willingly share all of their contacts with the centralised knowledge base and, in return, will gain all the benefits of an open system. Others are more guarded with contact information and will only share selectively.

Moreover, there are gradations as to exactly what specific information about clients professionals may want to share with the centralised database. The question then is how to protect the client's sensitive data in the CRM system while sharing information that should be disseminated.

Sharing is not an all-or-nothing proposition but instead involves various degrees of participation with the CRM knowledge base. CRM systems incapable of accommodating degrees of sharing are at a distinct disadvantage. Klau says without the ability to accommodate different sharing behaviour, professionals will opt out of the CRM system, severely impacting the quantity and quality of the information being fed to the system.

Control Another issue impacting CRM data quality is the notion of control. Historically, professionals kept their contacts in paper files, in documents or on miscellaneous slips of paper. They had ultimate control regarding to whom they provided information and what updates they would accept. Unfortunately they likely also had a significant amount of outdated contact information.

The introduction of a centralised CRM knowledge base means it is now much easier for professionals to keep their contacts up to date without extra manual effort.

However, what happens when another user makes an incorrect change causing a critical fax to be sent to the wrong location? Or an administrative assistant incorrectly changes contact information on the firm's top client? To address this issue, the system should allow professionals to control their contacts by preventing these mistakes from occurring.

"Despite the trust that professionals have for their fellow co-workers, there are some contacts that they feel no one should touch," Klau says. "For instance, the clients of many professionals are individuals with whom the professionals have cultivated close personal relationships over a long period of time and with whom they keep in regular contact.

"It therefore makes those professionals feel uncomfortable knowing that anyone in the firm, regardless of their relationship with a client, can edit and potentially corrupt client information within the CRM system. Indeed, from the professional's perspective, any changes to contact information made by others are suspect, as the professional with the close client ties would likely be the first to learn of any change in the client's information. Accordingly, professionals need some sense that they are still in control over their clients' data regardless of whether it resides in a firm-wide CRM database."

In many instances, control is a prerequisite for sharing. Professionals who feel they lose control over their business contacts by contributing them to the centralised database will often decline participation. Accordingly, the more confidence they have in their ability to control what happens to their contacts within the CRM system, the greater likelihood that they will contribute and help build the firm's institutional knowledge.

Categorisation Categorisation represents another element of data change management. Not all changes will or should be submitted to a data steward for processing. Depending on the importance of a contact and the type of change, Klau says it is perfectly acceptable to allow certain changes to be completed and perform the data quality review later and en masse.

For instance, the 80-20 rule applies to most organisations - 80 per cent of the firm's business is derived from 20 per cent of its clients. It therefore follows that bad data on a top client can have a substantially more devastating impact on the firm's bottom line than bad data on a client for which the firm provides few and sporadic services. For example, Klau says in one case an Interface Software customer had invited its top client to an exclusive firm event. The client's name was misspelled on the invitation because someone had made an inadvertent error in the database. The outraged client argued that if the firm could not even manage to spell his name correctly on an invitation, what other mistakes might it be making? One seemingly minor error can have an enormous impact on relationships. Clearly this firm needed a CRM system that would allow it to monitor closely changes made to key contacts.

Given the frequency and volume of changes to contact and relationship information on an organisation-wide basis, even with data change management tools it could be an overwhelming task for data stewards to oversee all changes being made to firm data and ensure its accuracy. This could stall data quality efforts and negatively impact an implementation.

To prevent this data quality bottleneck from occurring, Klau says the system should facilitate the categorisation of contacts to enable data stewards to direct their efforts on changes that will have the greatest impact upon the organisation.

Data stewards can then apply some basic business rules to the data maintenance process. The organisation might require that all changes made to category one contacts be submitted to the data steward for verification prior to being saved in the centralised knowledge base, while all changes made to category ten contacts can be saved immediately in the database without verification.
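Reduced to code, such a rule is only a few lines. A hypothetical sketch - the category numbers, queue and store are illustrative, not a description of any particular CRM product:

    # Illustrative steward-review rule: edits to top-category contacts are
    # queued for verification; low-category edits are saved straight away.
    REVIEW_CATEGORIES = {1, 2}  # "category one" clients get the closest scrutiny

    def apply_change(contact, change, database, pending_reviews):
        if contact["category"] in REVIEW_CATEGORIES:
            pending_reviews.append((contact["id"], change))  # steward verifies first
            return "queued for review"
        database[contact["id"]].update(change)  # saved without verification
        return "saved"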

From a data quality perspective, the ability to categorise clients within the CRM system and treat the various categories differently provides tremendous efficiencies. Data stewards are able to direct their time and attention to data changes that have the greatest potential impact on the organisation. Their resources will no longer be tied up managing contacts that are inconsequential to the firm's business.

Workflow A centralised database that allows everyone to contribute information is the optimum approach to keeping information current and maintaining high quality. When professionals are not allowed to make changes to existing information, they create duplicate copies. When they are not allowed to add new information, they find alternative storage mechanisms.

However, the manner in which these changes take place should be structured. Otherwise the most recent change to a particular contact will always prevail in the database - regardless of whether that change was accurate. This could add an element of chaos to a database that should be orderly and structured. Not all contacts should be blindly updated, not all users are careful about their changes and not all changes are desired.

Furthermore, some changes are more destructive than others and should be more tightly controlled, while other changes are harmless and only serve to enhance the firm's relationship intelligence. Adding a middle name to a contact at a prospective company only serves to enhance the quality of a contact and has no negative side-effects. However, changing a client's company name is highly suspect and should be approved before making the change.
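The same idea works at field level. Another hypothetical sketch, with invented field lists:

    # Hypothetical field-sensitivity rules: enriching a blank, low-risk field
    # is harmless; overwriting an identifying field needs sign-off.
    SAFE_FIELDS = {"middle_name", "assistant", "mobile"}
    SUSPECT_FIELDS = {"company", "surname", "postal_address"}

    def route_edit(field, old_value, new_value):
        if field in SUSPECT_FIELDS and old_value:
            return "hold for approval"  # e.g. changing a client's company name
        if field in SAFE_FIELDS and not old_value:
            return "apply immediately"  # e.g. adding a missing middle name
        return "apply and log"  # everything else: allow, but leave an audit trail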

Distinguishing between important contacts and destructive changes is necessary to allow firms to apply the optimal amount of resources to protect the most critical data.

SIDEBAR: Fight Dirty Data

In a 2001 report focused on organisations that had implemented data warehouses for the purpose of business intelligence, Cutter Consortium identified the following causes of dirty data.

  • Poor data entry, which includes misspellings, typos and transpositions, and variations in spelling or naming.
  • Data missing from database fields.
  • Lack of company-wide or industry-wide data coding standards (a big problem in health care, for example).
  • Multiple databases scattered throughout different departments or organisations, with the data in each structured according to the idiosyncratic rules of that particular database.
  • Older systems that contain poorly documented or obsolete data.