
'Vaccine' for machine learning models developed by CSIRO's Data61

"We had to figure out the proper thing to put in the syringe," says Data61's Dr Richard Nock

Researchers from CSIRO’s Data61 have formalised a technique to ‘vaccinate’ machine learning models against adversarial attacks and make them more robust.

The technique works by making slight modifications to training data so that it is tougher for models to learn from. Having learnt from this more difficult dataset, the resulting models are more robust.

Speaking to Computerworld, Data61’s machine learning group leader Dr Richard Nock used the example of an image classifier trained to distinguish between images of apples and images of pears.

“These pictures are taken in the real world – what the machine does would take this sample and make the images harder to classify. So you would distort a little bit the apples so they look a little more like pears,” Nock said.

“When you train from this set, which has been modified just a little bit, the classifiers that you get are more robust,” he added.

Modifying the training data to make it tougher to learn from is like creating a “weak version of an adversary”, Nock says, hence the vaccine analogy.
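For readers who want a concrete picture, the sketch below shows what training on slightly “hardened” data can look like in code. It is a generic adversarial-training-style loop written in PyTorch using the well-known fast gradient sign perturbation; the function names are illustrative, and it is not the specific construction described in the Data61 paper.

```python
import torch
import torch.nn.functional as F

def vaccinate_batch(model, images, labels, epsilon=0.05):
    """Perturb a batch of training images so they are slightly harder to classify."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    # Take a small step in the direction that increases the loss,
    # then keep pixel values in the valid [0, 1] range.
    hardened = images + epsilon * images.grad.sign()
    return hardened.clamp(0.0, 1.0).detach()

def train_step(model, optimizer, images, labels, epsilon=0.05):
    """One optimisation step on the perturbed ('vaccinated') batch."""
    hardened = vaccinate_batch(model, images, labels, epsilon)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(hardened), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The point of the sketch is that the perturbation is recomputed against the current model at every step, so the training data stays just slightly harder than what the model has already learnt.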

Although the method is somewhat intuitive, it had not been formalised until the publication of the researchers’ paper, Monge blunts Bayes: Hardness Results for Adversarial Training, which was presented at the International Conference on Machine Learning in California last week.

“It started from an intuition. We had to crack the intuition and make it more formal. Then we had to figure out the proper thing to put in the syringe,” Nock said.

In machine learning terms, the differences between an apple and a pear are described by numerical functions. Much of the work focused on finding which functions to modify to do the job most efficiently.

“You could use a lot of different functions, but not all of them would be as efficient as the best possible one,” Nock said.

The researchers’ results are quite general, and were demonstrated on the task of recognising handwritten digits, a common machine learning benchmarking dataset that allowed them to “proof check the system”.
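As a rough illustration of that kind of benchmark run, the snippet below wires the hypothetical train_step helper from the earlier sketch to the standard MNIST handwritten-digit dataset via torchvision. It is a minimal usage example, not the experimental setup reported in the paper.

```python
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Handwritten-digit benchmark (MNIST): 28x28 greyscale images, ten classes.
train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=128, shuffle=True)

# A small classifier; any differentiable model would do here.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(),
                      nn.Linear(256, 10))
optimizer = optim.SGD(model.parameters(), lr=0.1)

for images, labels in loader:
    # Each batch is perturbed before the model learns from it.
    loss = train_step(model, optimizer, images, labels, epsilon=0.1)
```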

The technique could be applied to any dataset used to train models, and has immediate real-world value.

"It could apply to road signs for autonomous cars, if you want to make the recognition of signs more robust, which I believe is really important,” Nock said.

“It could apply to the recognition of terrorists in an airport, if people try to put on fake moustaches and change their appearance a little bit – not too much because it would look suspect of course – but the system could be more robust to supporting these changes,” he added.

Wise to trickery

As well as making models generally more robust, the technique also provides a defence against adversarial attacks. Mounting and defending against such attacks has become an area of focus for researchers in recent years.

In 2017, a group demonstrated how to fool image classifiers by making alterations to ‘Stop’ signs in the real world. By using some black and white stickers – designed to mimic graffiti, and so 'hide in the human psyche' – the researchers were able to make a deep neural network-based classifier see a stop sign as a speed limit sign, 100 per cent of the time.

Others have demonstrated how a small, square, printed patch can be used as a “cloaking device” to hide people from AI object detectors.

“Adversarial attacks have proven capable of tricking a machine learning model into incorrectly labelling a traffic stop sign as a speed sign, which could have disastrous effects in the real world,” Nock said.

"How to make the decisions and predictions of machines more robust is a crucially important problem,” he added.

Nock’s team is hoping to test the technique on far larger datasets, and is eyeing the road imagery and footage held by Tesla.

“I want to scale this,” Nock said.

Another future effort will be to formally explain why models trained on the modified, ‘vaccinated’ images turn out to be much better at classifying unmodified images than models trained conventionally.

“It’s intuitive, but it needs to be put in equations,” Nock said. “We need to explain the difference, because the difference is significant.”