Intelligent machines part 2: Big data, machine learning and its challenges
- 18 June, 2015 13:22
In the 1990s, average computers ran on megabytes of RAM and processing power under 100MHz, with hundreds of gigabytes of global Internet traffic being generated each day. Nowadays, RAM is in gigabytes and processing power is in the thousands of MHz, with hundreds of terabytes of Internet traffic per second.
It is argued that increases in computational power and data are what have led to the success of artificial intelligence capabilities over recent years, especially in the area of deep learning: artificial neural networks made up of many hidden layers between input and output.
In part 2 of this series on intelligent machines, CIO looks at some of the challenges of machine learning, where computers pick up on patterns, predict future outcomes and train themselves on how best to respond in certain situations using big data.
Read part 1 of this series here
“One of the things that makes deep learning very successful is that the algorithms tend to make better predictions when you provide them with a lot more data and when you can use a bigger computer to train a much larger model,” says Adam Coates, director of Baidu Silicon Valley AI Lab.
“But the caveat to that is it takes a lot of computing power and it’s sometimes very tricky to get all of the data to train these things.”
Simply put, deep learning is expensive, meaning there’s still a barrier to many companies taking up this technology. Even though computers are becoming more powerful and data is growing, deep learning is still mostly utilised by companies the size of Google that have the resources and computational power to pull it off successfully.
Coates says there’s a large systems team at the Baidu Silicon Valley AI lab to make deep learning work. It’s not just the algorithms that do the work, it’s also supercomputer technology that’s made up of many high performance processors and GPUs, as well as access to high speed networks.
“Later in my PhD, something I spent a lot of time thinking about and working on, and which I brought here to Baidu, is how to use systems to make deep learning scale better so that we can train those big models from a lot of data and really focus on the importance of scalability to get better results.
“If we can scale up these models and train them on a lot more data, this is where we are going to see more performance improvements. So we should spend less time trying to hand-engineer the algorithms, less time focusing on the special cases, and put more effort into the systems aspect of AI,” Coates says.
Obtaining the right training data for what you are trying to build can also be expensive. Coates says for Baidu’s Deep Speech technology, obtaining transcriptions of voice audio data looked quite expensive. In order to train the deep neural networks, a transcription or text output was needed as well as the voice or audio input.
“It’s very expensive it turns out to have people transcribe audio for you – it’s like a dollar a minute or something. So if you need 10,000 hours you are really talking about a significant investment to get all of that audio transcribed for you.”
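Coates’s figures make for a quick back-of-the-envelope calculation. The $1-a-minute rate and 10,000-hour target are his rough numbers; the arithmetic below is just an illustration:

```python
# Back-of-the-envelope cost of human transcription at the rate Coates cites:
# roughly $1 per minute of audio, for 10,000 hours of training data.
RATE_PER_MINUTE = 1.00      # US dollars, Coates's rough figure
HOURS_NEEDED = 10_000       # hours of transcribed speech wanted

minutes = HOURS_NEEDED * 60
cost = minutes * RATE_PER_MINUTE
print(f"{minutes:,} minutes of audio -> ${cost:,.0f}")
# 600,000 minutes of audio -> $600,000
```

At over half a million dollars for labels alone, it is easy to see why the team looked for a cheaper route.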
The Baidu team decided to take a different approach by using text from books as the output and crowdsourced a huge number of people to read the text from the books out loud for the input data.
This worked out to be much cheaper than getting transcribed audio, Coates says, but when doing deep learning there may not always be an affordable way to obtain the data you need.
Although deep learning has proven to be a powerful form of machine learning over recent years, it might not yield much higher performance on certain tasks, says Robin Anil, an ex-Googler who left the company this year to work on startup Tock with other former Google staff.
“The places where deep learning has given large improvements are things like image recognition, where traditional algorithms like logistic regression did not do well.
“You might be able to get small improvements by applying deep learning to an existing problem that has already been solved using logistic regression, but that small improvement and the amount of compute power that you use may not be worth it,” Anil points out.
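For readers unfamiliar with the baseline Anil mentions, here is a minimal sketch of logistic regression trained by gradient descent. The data, learning rate and iteration count are illustrative choices, not anything from the article:

```python
import numpy as np

# Logistic regression on a tiny synthetic dataset: the "traditional"
# baseline Anil contrasts with deep learning. All numbers are illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                 # 200 samples, 2 features
y = (X[:, 0] + X[:, 1] > 0).astype(float)     # linearly separable labels

w = np.zeros(2)
b = 0.0
lr = 0.5
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))    # sigmoid predictions
    grad_w = X.T @ (p - y) / len(y)           # gradient of the log loss
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

accuracy = np.mean(((X @ w + b) > 0) == (y == 1))
print(f"training accuracy: {accuracy:.2f}")
```

A model this simple trains in milliseconds, which is the point: if it already solves the problem well, the extra compute of a deep network may buy very little.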
Toby Walsh, AI researcher at National ICT Australia, says it’s not clear whether deep learning is yet skilled at accomplishing tasks that involve strategic thinking. He points to Google DeepMind’s paper on its deep Q-network (DQN) algorithm, which learnt to play 49 Atari 2600 arcade games, progressing up to human expert level without having to be modified or re-adjusted each time it learnt a new game.
“You could actually see a distinction between the video games that were pure perception games and games like Ms Pac-Man, which require you to do a bit of planning. When you eat the ghosts, you’ve got to plan around where you are going to go quickly to eat the cherry. There’s some strategic planning there and they didn’t actually get to human level performance.
“So I think we’ll start to realise that deep learning is good for those perception tasks, but when you need to do some high level planning, and those sorts of things we do in optimisation, deep learning is probably not going to be the best, most effective or easiest way to solve those problems,” he says.
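At the core of DQN is the Q-learning update rule, which the deep network approximates. As a rough illustration only, here is the tabular version applied to a toy five-state corridor; the environment and hyperparameters are invented for this sketch and have nothing to do with DeepMind’s Atari setup:

```python
import random

# Tabular Q-learning on a toy corridor: start at state 0, reward for
# reaching state 4 by stepping right. DQN replaces the table Q with a
# deep neural network; the update rule is the same idea.
random.seed(0)
N_STATES, GOAL = 5, 4
ACTIONS = (-1, 1)                              # step left or step right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.1

for _ in range(200):                           # training episodes
    s = 0
    while s != GOAL:
        # Epsilon-greedy action choice; break ties randomly.
        if random.random() < eps or Q[(s, -1)] == Q[(s, 1)]:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), N_STATES - 1)
        r = 1.0 if s2 == GOAL else 0.0
        # Bellman update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        target = r + gamma * max(Q[(s2, -1)], Q[(s2, 1)])
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

# The learned greedy policy should step right (+1) in every state.
policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(GOAL)]
print(policy)
```

This works because the reward propagates backwards one state at a time, which is also a hint at Walsh’s point: tasks needing long chains of strategic planning are much harder for this kind of learner.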
Besides deep learning, recommender systems also have a way to go. Anil, who spent much of his time at Google working on the Ads team, says it’s not always obvious what improves quality when building these systems.
“Every day they think about new features, new techniques, new ways to clean up data to get the next 1 per cent improvement. People can get something like 80 per cent quality recommendations very quickly, but going from 80 per cent to 90, 92 and 95 per cent is usually really hard and involves time, patience and good engineering, both systems and features.”
One thing he is seeing these days is more people starting to realise that user-rating-based recommendations are not as good as taking a probabilistic or predictive modelling approach.
“User rating means you rated a movie 5 stars, I rated it 4 and my wife rated it 2. You and I think that movie is good but my wife doesn’t, so the algorithm will try to figure out all the users who have similar likes and then recommend other movies to those users. It’s like cross-recommending content based on similarity of users.
“It works well in general, you can get a decent recommendation system based on that, but beyond a certain quality level if you want to improve, it doesn’t help you.”
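The user-similarity scheme Anil describes can be sketched in a few lines. The ratings below extend his 5/4/2-star movie example with made-up movies and a simple cosine similarity over co-rated titles; a production recommender would be far more involved:

```python
import math

# User-based collaborative filtering: find users whose ratings resemble
# yours, then recommend what they liked. The data is invented for this sketch.
ratings = {
    "you":  {"Movie A": 5, "Movie B": 4},
    "anil": {"Movie A": 4, "Movie C": 5},
    "wife": {"Movie A": 2, "Movie D": 5},
}

def cosine(u, v):
    # Similarity over movies both users rated, normalised by full vectors.
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[m] * v[m] for m in common)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def recommend(user):
    # Score each unseen movie by similarity-weighted ratings of other users.
    scores = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        for movie, r in their.items():
            if movie not in ratings[user]:
                scores[movie] = scores.get(movie, 0.0) + sim * r
    return max(scores, key=scores.get)

print(recommend("you"))   # "Movie C": anil's tastes are closest to yours
```

Because “anil” rated Movie A similarly to “you”, his other favourite wins out over the dissimilar user’s pick, which is exactly the cross-recommendation behaviour in the quote.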
Sentiment analysis is another challenging area. “Sarcasm is absolutely a hard problem in sentiment analysis, and during college my research was focused on sentiment analysis,” says Anil.
“The sentence looks positive but a certain combination of words makes it sarcastic. These are called rare patterns. Machine learning algorithms are very poor at learning rare patterns; they are well adapted to learning frequent patterns but quite bad at learning rare ones.
“Sometimes it requires people with domain specific knowledge for this problem to go in and tweak it to make it do all those things.”
However, there are some tricks to get decent results here, he says, such as looking at long-distance word negations to pick up on patterns related to sarcasm.
“The patterns are usually multi-word patterns – you say something at the beginning of the sentence and you say something else negating it by the end of the sentence and it looks like a positive sentence but actually it is sarcasm.
“An example is a title like ‘Great read for idiots’. The initial word is positive, the last word is negative. Individually the features are not great signals, but collectively they strongly indicate sarcasm.”
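A toy version of that long-distance pattern check might look like this. The word lists and the rule itself are illustrative stand-ins for a real sentiment lexicon and classifier:

```python
# Toy long-distance sarcasm check: a positive word early in the sentence
# followed by a negative word later on is flagged as candidate sarcasm.
# Word lists are tiny illustrative stand-ins for a real lexicon.
POSITIVE = {"great", "amazing", "wonderful", "perfect"}
NEGATIVE = {"idiots", "terrible", "awful", "waste"}

def looks_sarcastic(sentence):
    words = sentence.lower().strip(".!?").split()
    pos_idx = [i for i, w in enumerate(words) if w in POSITIVE]
    neg_idx = [i for i, w in enumerate(words) if w in NEGATIVE]
    # A positive word appearing before a later negative word triggers the flag.
    return any(p < n for p in pos_idx for n in neg_idx)

print(looks_sarcastic("Great read for idiots"))     # True
print(looks_sarcastic("Great read for beginners"))  # False
```

A single-word ("bag of words") model would score both sentences as positive; only the multi-word, positional pattern separates them, which is Anil’s point.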
Powering AI with a low-energy supercomputer the size of a small chip
In August last year, IBM came out with its brain-inspired chip, SyNAPSE, which tiles 4096 neurosynaptic cores with one million neurons and 256 million synapses while consuming only 70 milliwatts. Today, it can integrate 4096 chips in a single rack with 4 billion neurons and 1 trillion synapses, consuming only ~4kW of power.
“This is beyond any of the supercomputing capabilities today,” says Dharmendra Modha, lead researcher of the Cognitive Computing group at IBM Almaden Research Center. “This chip can run on an iPhone 6 battery for seven days without draining it.”
The chip can be used to power deep neural networks without having to consume a heap of power. This is possible through the way the architecture was built, having no clock and being completely asynchronous. It also doesn’t require a large memory; small memories are placed right next to each of the neurons so information never moves more than a few microns.
“There is no other complicated circuit in the middle, they just interconnect. You cannot tile any of the existing chips this way; you need some way to communicate between them. It’s also only doing what is necessary, when it is necessary, and only that which is necessary,” Modha adds.
Modha hopes the chip will complement number-crunching, calculator type computers of today by allowing future computers to also be pattern recognition, sensory machines.
“Today’s computers work on binary code of 0s and 1s. They are good for symbolic computation in databases, file servers, all that kind of data. But SyNAPSE is where the right brain complements the left brain in terms of pattern recognition, sensor data, ambiguous data, sense and respond in real time.
“The idea is to join them together to create hybrid computers of the future. As you begin to bring this hybrid computer of the left and the right brain, you can imagine all sort of applications of not just running audio and video, but natural language with audio and video,” he says.
“You could put it in your cell phone, and it could in theory be on all the time, looking at the world around you and telling you things and analysing things. Or it could be listening all the time and recognising your voice from somebody else’s voice,” adds Jeff Welser, VP and lab director of IBM Research – Almaden.