Addressing cost and privacy issues with open data in government

Australia 3.0 group discusses cost of anonymising data and the risk of de-anonymisation


The government’s push for more open data could well drive innovation in private sector organisations, but it is not without its challenges.

At the Australia 3.0 forum in Melbourne last week, a mixed group of IT professionals, government officials and heads of private companies came together to discuss how to address the main barriers preventing agencies and departments releasing more data to the public.

The elephant in the room when it comes to releasing government data is the expense of anonymising it. Departments such as human services, health and social services hold vast amounts of data but much of it is highly personal and sensitive, and costly to anonymise.

“It requires a degree of expertise, and not all agencies will have the skills to be able to do that, so they might have to bring the skills in. Most agencies will not have the relevant skills to be able to understand how to anonymise the data in a sufficient, safe way to get it out there,” said Abul Rizvi, former deputy secretary, digital economy, Department of Communications.

“Secondly, they are going to have to ask themselves, ‘Once I’ve anonymised it, how am I going to have to keep it up to date? What are the processes involved in that?’”

Rizvi said there is a role for the ICT community to step in and propose cost-effective ways of anonymising datasets and keeping them up to date. He said that, so far, agencies have kept their costs low by taking a piecemeal approach, releasing their data in small portions rather than in large chunks.

“Apart from the [Australian Bureau of Statistics], I’m not aware of any agency that’s actually put a very large dataset on data.gov.au,” he said. “The bulk of those datasets on data.gov.au are relatively small, and therefore costs have been nearly insignificant.”

Government agencies also need to manage the risk that someone could de-anonymise data by cross-referencing multiple datasets. Rizvi said the government should clearly set out what constitutes appropriate and reasonable steps for anonymising data to prevent this from happening.

There will always be people clever enough to break the system and find a way to de-anonymise data, he said, but having clear steps for agencies to follow will help build their confidence to release more data.

The approach to ICT procurement is also key to keeping open data costs down in the long term, according to Labor Senator Kate Lundy, who said agencies and departments need to consider the data-sharing capabilities of their systems.

“The systems, models, operating architecture of ICT in agencies and departments predetermines what the actual datasets will look like anyway,” Lundy said. “So unless it starts to be built in at that tender level, about the requirements of the availability of datasets in an updatable and reusable format… you are having to reverse engineer, creating an extensive problem.

“So going right to the front end of it, when those contracts are renegotiated or put in place for the first time, could resolve a lot of the downstream problems and costs associated with [sharing] of the datasets.”