Machine learning is transforming business. But even as the technology advances, companies still struggle to take advantage of it, largely because they don’t understand how to strategically implement machine learning in service of business goals. Hype hasn’t helped, sowing confusion over what exactly machine learning is, how well it works and what it can do for your company.
Here, we provide a clear-eyed look at what machine learning is and how it can be used today.
What is machine learning?
Machine learning is a subset of artificial intelligence that enables systems to learn and predict outcomes without explicit programming. It is often used interchangeably with the term AI because it is the AI technique that has made the greatest impact in the real world to date, and it's what you're most likely to use in your business. Chatbots, product recommendations, spam filters, self-driving cars and a huge range of other systems leverage machine learning, as do “intelligent agents” like Siri and Cortana.
Instead of writing algorithms and rules that make decisions directly, or trying to program a computer to “be intelligent” using sets of rules, exceptions and filters, machine learning teaches computer systems to make decisions by learning from large data sets. Rule-based systems quickly become fragile when they have to account for the complexity of the real world; machine learning can create models that represent and generalize patterns in the data you use to train it, and it can use those models to interpret and analyze new information.
Machine learning is well suited to classification, such as recognizing text and objects in images and video. It is also good at finding associations in data and at segmenting data into clusters (e.g., finding groups of customers). Machine learning is also adept at prediction, such as calculating the likelihood of events or forecasting outcomes. It can also be used to generate missing data; for example, the latest version of CorelDRAW uses machine learning to interpolate the smooth stroke you’re trying to draw from multiple rough strokes you make with the pen tool.
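To make the idea of segmenting data into clusters concrete, here is a minimal sketch of k-means clustering in plain Python. The customer figures (annual spend, visits per month) and the choice of starting centroids are invented for illustration; a real project would more likely use a library implementation and scale the features first.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Basic k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it, and repeat."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical customers: (annual spend, visits per month).
customers = [(200, 1), (250, 2), (300, 1), (2000, 8), (2200, 10), (2100, 9)]

# Start from two of the data points as initial centroids (an assumption;
# real implementations usually pick starting points more carefully).
centroids, clusters = kmeans(customers, [customers[0], customers[-1]])
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

Nobody told the code which customers belong together; the groups emerge from the distances between the data points alone.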
At the heart of machine learning are algorithms. Some, such as regression, k-means clustering and support vector machines, have been in use for decades. Support vector machines, for example, use mathematical methods for representing how a dividing line can be drawn between things that belong in separate categories. The key to effective use of machine learning is matching the right algorithm to your problem.
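As a rough illustration of learning a dividing line between two categories, the sketch below trains a perceptron in plain Python. Note that this is a deliberately simpler stand-in, not a support vector machine: an SVM would also try to place the line as far as possible from both groups. The sample points and labels are made up for the example.

```python
def train_perceptron(samples, labels, epochs=20, learning_rate=1.0):
    """Learn weights w and bias b so that w.x + b is positive for label +1
    and negative for label -1 (this only works if a dividing line exists)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * activation <= 0:  # point is on the wrong side of the line
                weights = [w + learning_rate * y * xi
                           for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias

def predict(weights, bias, x):
    """Return +1 or -1 depending on which side of the line x falls."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else -1

# Hypothetical, linearly separable data: two small groups of 2-D points.
samples = [(1, 1), (2, 1), (1, 2), (4, 4), (5, 4), (4, 5)]
labels = [-1, -1, -1, 1, 1, 1]

weights, bias = train_perceptron(samples, labels)
print(predict(weights, bias, (0, 0)), predict(weights, bias, (6, 6)))
```

The learned weights and bias define the line; classifying a new point only requires checking which side of that line it falls on.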
A neural network is a machine learning algorithm built on a network of interconnected nodes; it works well for tasks like recognizing patterns.
Neural networks aren’t a new algorithm, but the availability of large data sets and more powerful processing (especially GPUs, which can handle large streams of data in parallel) have only recently made them useful in practice. Despite the name, neural networks are based only loosely on biological neurons. Each node in a neural network has connections to other nodes that are triggered by inputs. When triggered, each node applies a weight to each of its inputs and combines the results into an output that indicates how strongly the input matches that node’s function. The nodes are organized in fixed layers that the data flows through, unlike the brain, which creates, removes and reorganizes synapse connections regularly.
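One common way to picture the nodes and layers described above is as a forward pass: each node multiplies its inputs by weights, adds a bias and squashes the total into a value between 0 and 1. The following plain-Python sketch assumes a sigmoid activation and uses hand-picked weights purely for illustration; in a real network the weights are learned from training data.

```python
import math

def sigmoid(z):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """Compute one layer: every node takes a weighted sum of all inputs,
    adds its bias and applies the sigmoid activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]

def network_forward(inputs, layers):
    """Pass the data through each fixed layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Illustrative (untrained) network: 2 inputs -> 3 hidden nodes -> 1 output.
layers = [
    ([[0.5, -0.2], [0.8, 0.1], [-0.3, 0.9]], [0.0, -0.1, 0.2]),
    ([[1.0, -1.5, 0.7]], [0.05]),
]
output = network_forward([0.6, 0.4], layers)
print(output)
```

Adding more entries to `layers` makes the network deeper without changing the code that runs it.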
Deep learning is a subset of machine learning based on deep neural networks. Deep neural networks are neural networks that have many layers for performing learning in multiple steps. Convolutional deep neural networks often perform image recognition by processing a hierarchy of features where each layer looks for more complicated objects. For example, the first layer of a deep network that recognizes dog breeds might be trained to find the shape of the dog in an image, the second layer might look at textures like fur and teeth, with other layers recognizing ears, eyes, tails and other characteristics, and the final layer distinguishing different breeds. Recurrent deep neural networks are used for speech recognition and natural language processing, where the sequence and context are important.
There are many open source deep learning toolkits available that you can use to build your own systems. Theano, Torch and Caffe are popular choices, and Google’s TensorFlow and Microsoft Cognitive Toolkit let you use multiple servers to build more powerful systems with more layers in your network.
Microsoft’s Distributed Machine Learning Toolkit packages up several of these deep learning toolkits with other machine learning libraries, and both AWS and Azure offer VMs with deep learning toolkits pre-installed.
Machine learning in practice
Machine learning results are usually expressed as confidence scores: a percentage certainty that the data you’re looking at matches what your machine learning model is trained to find. So, a deep network trained to identify emotions from photographs and videos of people’s faces might score an image as “97.6% happiness, 0.1% sadness, 5.2% surprise, 0.5% neutral, 0.2% anger, 0.3% contempt, 0.01% disgust, 12% fear.” Using that information means working with probabilities and uncertainty, not exact results.
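In application code, working with probabilities usually means deciding how confident is confident enough. The sketch below shows two small pieces of that in plain Python: a softmax function, one common way to turn a model's raw outputs into scores that sum to 1, and a helper that only acts on a label when its score clears a threshold. The raw outputs, the scores and the 0.9 threshold are illustrative assumptions, not values from any particular service.

```python
import math

def softmax(raw_outputs):
    """Turn raw model outputs into probabilities that sum to 1."""
    highest = max(raw_outputs)  # subtract the max for numerical stability
    exps = [math.exp(value - highest) for value in raw_outputs]
    total = sum(exps)
    return [value / total for value in exps]

def decide(scores, threshold=0.9):
    """Return the top label only if the model is confident enough;
    otherwise return None so the caller can fall back (e.g., to a human)."""
    label, confidence = max(scores.items(), key=lambda item: item[1])
    return label if confidence >= threshold else None

probabilities = softmax([2.0, 1.0, 0.1])

# Hypothetical scores for two different images.
clear_case = {"happiness": 0.976, "surprise": 0.052, "fear": 0.12}
unclear_case = {"happiness": 0.55, "surprise": 0.40, "fear": 0.05}

print(decide(clear_case), decide(unclear_case))
```

Where to set the threshold is a business decision: a higher value means fewer wrong answers but more cases handed off for review.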
Probabilistic machine learning uses the concept of probability to enable you to perform machine learning without writing algorithms at all. Instead of the set values of variables in standard programming, some variables in probabilistic programming have values that fall in a known range and others have unknown values. Treat the data you want to understand as if it were the output of this code and you can work backwards to fill in what those unknown values would have to be to produce that result. With less coding, you can do more prototyping and experimenting; probabilistic machine learning is also easier to debug.
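The idea of working backwards from observed data to unknown values can be sketched with a very small Bayesian example in plain Python. Here the unknown is the rate at which a user reads a certain kind of message, and the observation (7 read out of 10) is invented for illustration. This grid-based approach is only a teaching device; it is not how a probabilistic programming framework performs inference internally.

```python
def posterior_over_rate(successes, trials, grid_size=101):
    """Estimate an unknown rate between 0 and 1 from observed outcomes.

    Every candidate rate on a grid starts out equally plausible (a uniform
    prior). Each is then weighted by how likely it would be to produce the
    observed data, and the weights are normalized to sum to 1."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    failures = trials - successes
    unnormalized = [p ** successes * (1 - p) ** failures for p in grid]
    total = sum(unnormalized)
    return grid, [weight / total for weight in unnormalized]

# Hypothetical observation: the user read 7 of the last 10 such messages.
grid, posterior = posterior_over_rate(successes=7, trials=10)

# The single most plausible rate, and the probability the rate exceeds 50%.
most_plausible = grid[posterior.index(max(posterior))]
chance_above_half = sum(w for p, w in zip(grid, posterior) if p > 0.5)
print(most_plausible, chance_above_half)
```

The result is not a single answer but a spread of plausible values, which is what lets a system express how sure it is.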
This is the technique the Clutter feature in Outlook uses to filter messages that are less likely to be interesting to you based on what messages you’ve read, replied to and deleted in the past. It was built with Infer.NET, a .NET framework you can use to build your own probabilistic systems.
Cognitive computing is the term IBM uses for its Watson offerings. When an earlier version of Watson won Jeopardy in 2011, the term AI wasn't fashionable; over the decades it’s been worked on, AI has gone through alternating periods of hype and dismissal.
Watson isn't a single tool. It's a mix of models and APIs that you can also get from other vendors such as Salesforce, Twilio, Google and Microsoft. These give you so-called “cognitive” services, such as image recognition, including facial recognition, speech (and speaker) recognition, natural language understanding, sentiment analysis and other recognition APIs that look like human cognitive abilities. Whether it's Watson or Microsoft's Cognitive Services, the cognitive term is really just a marketing brand wrapped around a collection of (very useful) technologies. You could use these APIs to create a chatbot from an existing FAQ page that can answer text queries and also recognize photos of products to give the right support information, or use photos of shelf labels to check stock levels.
Many “cognitive” APIs use deep learning, but you don’t need to know how they’re built because many work as REST APIs that you call from your own app. Some let you create custom models from your own data. Salesforce Einstein has a custom image recognition service and Microsoft’s Cognitive APIs let you create custom models for text, speech, images and video.
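Calling such a service from your own app typically comes down to sending an HTTP request with your data and an API key, then reading back a JSON response. The sketch below builds (but does not send) such a request using Python's standard library. The endpoint URL, the key, the `Authorization` header format and the JSON payload shape are all hypothetical placeholders; each vendor documents its own.

```python
import json
import urllib.request

# Hypothetical values: replace with the endpoint and key from your vendor.
ENDPOINT = "https://api.example.invalid/v1/sentiment"
API_KEY = "YOUR-API-KEY"

def build_sentiment_request(text):
    """Package the text as a JSON POST request for a sentiment service."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
        },
        method="POST",
    )

request = build_sentiment_request("The delivery was late but support was helpful.")

# Sending it would look like this (left commented out because the
# endpoint above does not exist):
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

The model itself never runs in your app; you only handle the request and decide what to do with the scores that come back.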
That’s made easier by transfer learning, which is less a technique and more a useful side effect of deep networks. A deep neural network that has been trained to do one thing, like translating between English and Mandarin, turns out to learn a second task, like translating between English and French, more efficiently. That may be because the long lists of numbers that represent, say, the mathematical relationships between words like big and large are to some degree common between languages, but we don’t really know.
Transfer learning isn't well understood but it may enable you to get good results from a smaller training set. The Microsoft Custom Vision Service uses transfer learning to train an image recognizer in just a few minutes using 30 to 50 images per category, rather than the thousands usually needed for accurate results.
Build your own machine learning system
If you don’t want pre-built APIs, and you have the data to work with, there’s an enormous range of tools for building machine learning systems, from R and Python scripts, to predictive analytics using Spark and Hadoop, to specific AI tools and frameworks.
Rather than set up your own infrastructure, you can use machine learning services in the cloud to build data models. With cloud services you do not need to install a range of tools. Moreover, these services build in more of the expertise needed to get successful results.