Data science to help move natural science out of the 'Stone Age'
- 12 August, 2015 12:47
Although some natural scientists already use statistical analysis in their research, there is far more scope to apply new and advanced data science techniques to making discoveries.
This is what Professor Hugh Durrant-Whyte from the University of Sydney discussed at the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining in Sydney on Wednesday.
Durrant-Whyte, a former CEO of NICTA, said: “The science community is in the Stone Age. It would be progress if they moved on to using a database.
“There is an enormous amount that this [data science] community could do to simply help scientists access and visualise and use data.
“I've decided to make the rest of my career about using data science to progress science in general; to try and change the way they do discovery.”
One challenge facing natural science is the scarcity of data, because it is not always easy to physically collect samples or measurements from deep within the Earth, the oceans or other nearly inaccessible places.
“Data is very expensive. In some geological cases, if you want to get a data point it can cost you $20 million for one point. You really have to work to get the data, so you have to be careful about how you use it.
“It's about small data and big models. It's not about big data, it's not about petabyte science, it's not about genomics and all these things you hear about, which is really just number crunching. It's about how you use relatively sparse data to build complex models,” he said.
“Models also turn out to be very expensive to evaluate, as these are quite complex systems. If you are trying to simulate the Earth, taking one sample is actually quite an expensive thing to do. So you want to be a lot more data driven about the way that you use these models,” he added.
Durrant-Whyte gave an example of a project he is working on where data is scarce: predicting the impact fracking would have on water contamination in New South Wales.
“I can tell you that by the time you build a model the size of NSW and you take 2000 sample points, which are water bores, you do not have a lot of data on which to base your model.”
Given his background in data science and machine learning, Durrant-Whyte said he sees an opportunity to find ways to build reliable models, quantify uncertainty and help natural scientists move forward in their work.
Another challenge in natural science is the shortcomings of using non-linear differential equations to predict the outcome of an experiment and then calibrating those equations against the results.
“The problem is, what you see is that those calibrations predict only the experiments that you just finished, not the ones you are about to do. Those models just aren't rich enough for that kind of problem.
“So there are a lot of interesting things, like trying to produce probabilities over models and being able to use non-parametrics to really build complexity into the models while you are doing it.”
On another project, predicting the motion of tectonic plates billions of years ago, when mineralisation took place along plate boundaries, Durrant-Whyte said that instead of using a differential equation to model magma and plate tectonic motion, he placed probabilities over different types of motion parameters.
“That allows you to quantify the uncertainty that's involved, but also to integrate all those different types of data, and any other data that might or might not be relevant, to get the best possible model of what's been going on with the plates,” he said.
“We're also exploring much more spatially constrained ideas using things like conditional random fields that might actually explain how different parts of a plate connect together and different things like that,” he added.
There are many more ways that data scientists can contribute to scientific discovery, Durrant-Whyte said. He encouraged the community to think differently and look outside the usual applications of their work in finance and marketing to help change our understanding of the planet.
“There is a great future out there for data science, for people in the KDD community, to transform [natural] science.”
“I hope the projects [I mentioned] are significantly more compelling and interesting than selling adverts on a mobile phone.”