How conditional random fields are ‘powerful’ in machine learning

Professor Rao Kotagiri gives a rundown on how CRFs work and their application in detecting multiple sclerosis in the brain


Conditional random fields (CRFs) have been touted as among the most powerful methods and algorithms that underpin machine learning, says Professor Rao Kotagiri at the University of Melbourne’s Department of Computing and Information Systems.

CRFs are a type of probabilistic graphical model in machine learning, and can be applied to computer vision, natural language processing and bioinformatics. Machine learning is where a computer is programmed to pick up on patterns in sample or past data to predict future outcomes or train itself on how to best respond in certain situations.

The aim is to label something, given certain features, Kotagiri said. For example, a bank might be trying to determine if a prospective customer will be ‘low’ or ‘high’ risk for a loan, which is the label (y). To do that, the bank has to evaluate the person’s income, current debt situation and so on, which are the features (x).

“In the case of conditional random fields, what we are trying to do is we want to represent the probability of a sequence of labelling, given the observations. We want to find the best sequence of labels by looking at all the features that we have,” Kotagiri told attendees at the recent Big Data Summit held at the University of Technology, Sydney.

CRFs can be mathematically written as:
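One standard form, for a linear-chain CRF, is sketched here. The symbols are assumed notation rather than taken from Kotagiri's talk: y is the label sequence, x the observations, f_k are feature functions, \lambda_k their learned weights, and Z(x) the normalising term summed over every possible label sequence y'.

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{t=1}^{T} \sum_{k=1}^{K} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \right),
\qquad
Z(x) = \sum_{y'} \exp\left( \sum_{t=1}^{T} \sum_{k=1}^{K} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \right)
```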

CRFs follow a ‘discriminative’ approach in that they directly learn the mapping from input data to the class label. This means learning the conditional probability distribution of labels given features, or p(y|x), which is enough to find the class label. Labels are globally conditioned on the set of observed features.

A ‘generative’ approach, however – which is what hidden Markov models, a popular class of machine learning model, follow – is not direct in that it first has to learn how all of the labels and features occur together before it can map to the class label. This means learning the joint probability distribution of labels and features, or p(x,y).

By not having to go through this kind of pre-step – p(x,y) – a discriminative model can focus directly on the end goal of finding the class label.

This p(x,y) pre-step also introduces the risk of errors, as it is typically based on the assumption that all features are independent of each other, not taking into account that some features could be interdependent. By bypassing this pre-step altogether, the discriminative approach reduces the likelihood of such errors.
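The difference between the two quantities can be seen on a toy version of the bank example. The sketch below is only illustrative: the data is invented, and simple counting stands in for real model fitting. It estimates the joint distribution p(x,y) over every feature/label pair, then the conditional p(y|x) for a single observed feature value.

```python
from collections import Counter

# Hypothetical toy data, invented for illustration:
# (income_band, risk_label) pairs for eight past customers.
data = [
    ("low", "high"), ("low", "high"), ("low", "low"),
    ("high", "low"), ("high", "low"), ("high", "high"),
    ("high", "low"), ("low", "high"),
]

def joint(pairs):
    """Generative view: estimate p(x, y) for every feature/label pair."""
    counts = Counter(pairs)
    total = len(pairs)
    return {xy: c / total for xy, c in counts.items()}

def conditional(pairs, x):
    """Discriminative target: estimate p(y | x) for one observed x."""
    counts = Counter(y for xi, y in pairs if xi == x)
    total = sum(counts.values())
    return {y: c / total for y, c in counts.items()}

p_xy = joint(data)
p_y_given_low = conditional(data, "low")

print(p_xy[("low", "high")])   # share of all customers who are low income AND high risk
print(p_y_given_low["high"])   # share of low-income customers who are high risk
```

The conditional answers the bank's question directly, whereas the joint table also has to describe how often each income band occurs, which is not needed to pick a label.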

CRFs also avoid the label bias problem that occurs in conditional Markov models such as maximum entropy Markov models (MEMMs). MEMMs, which are discriminative, normalise probabilities locally at each state, so the choice of the next label depends only on the current state. As a result, states with few outgoing transitions can be unfairly favoured, which biases the labelling. CRFs instead normalise over the entire label sequence and so consider all states when finding the labels.
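To make "considering all states" concrete, here is a minimal sketch of a linear-chain CRF scored by brute force. The labels, the emission and transition scores, and the observations are all hypothetical values chosen for illustration. Enumerating every sequence is only feasible for tiny inputs; practical implementations use dynamic programming instead.

```python
import itertools
import math

LABELS = ["O", "L"]  # hypothetical labels: "O" = normal, "L" = lesion

def emission(label, obs):
    """Assumed score: higher observation values favour the lesion label."""
    return obs if label == "L" else 1.0 - obs

def transition(prev, cur):
    """Assumed score: neighbouring positions are rewarded for agreeing."""
    return 0.8 if prev == cur else 0.0

def score(labels, observations):
    """Total score of one whole label sequence."""
    s = sum(emission(l, o) for l, o in zip(labels, observations))
    s += sum(transition(a, b) for a, b in zip(labels, labels[1:]))
    return s

def sequence_probabilities(observations):
    """p(y | x) for every label sequence, with one global normaliser."""
    seqs = list(itertools.product(LABELS, repeat=len(observations)))
    weights = [math.exp(score(seq, observations)) for seq in seqs]
    z = sum(weights)  # normalises over whole sequences, not per state
    return {seq: w / z for seq, w in zip(seqs, weights)}

obs = [0.9, 0.4, 0.8]
probs = sequence_probabilities(obs)
best = max(probs, key=probs.get)
independent = tuple("L" if o > 0.5 else "O" for o in obs)

print(best)         # most probable joint labelling
print(independent)  # labelling each position on its own
```

With these assumed scores, labelling each position on its own marks the middle position as normal, while the jointly most probable sequence labels all three as lesion because the neighbours on both sides agree.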

One issue with CRFs, however, is over-fitting to noise in the training or sample data when too many features make the model overly complex. This can not only affect results but also be computationally expensive.

Kotagiri said regularization is a technique to avoid this in that it selects the most valuable features to do the modelling. “What we are trying to do is optimise features. This means trying to take the minimal number of features that are needed so that the system behaves properly.”
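The article does not say which form of regularization was used. As one illustration, an L1 penalty (an assumption here) drives small weights to exactly zero, which has the effect of dropping the corresponding features. The sketch below applies the soft-thresholding step associated with an L1 penalty to a set of hypothetical feature weights; the feature names and values are invented.

```python
def soft_threshold(weight, penalty):
    """Shrink a weight toward zero; weights within +/- penalty become 0.0."""
    if weight > penalty:
        return weight - penalty
    if weight < -penalty:
        return weight + penalty
    return 0.0

# Hypothetical learned weights, invented for illustration.
weights = {
    "intensity": 1.4,
    "neighbour_mean": -0.9,
    "scanner_id": 0.05,
    "slice_index": -0.02,
}
penalty = 0.1  # assumed regularization strength

shrunk = {name: soft_threshold(w, penalty) for name, w in weights.items()}
kept = [name for name, w in shrunk.items() if w != 0.0]

print(kept)  # features that survive the penalty
```

A larger penalty removes more features, trading some fit to the training data for a simpler, cheaper model.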

Detecting multiple sclerosis lesions in the brain

Kotagiri gave an example of an application of CRFs in detecting multiple sclerosis in the brain, which proved to be highly accurate.

Before applying CRFs, a simple, standard analysis of brain magnetic resonance imaging (MRI) scans was done to obtain training data, the data used for modelling.

CRFs were able to detect lesions more accurately and locate them in much finer detail than other methods were able to, Kotagiri said.

“So what you do is take a normal person’s brain image and we can map them into some kind of standard range. And then [with] each voxel, which is like a pixel… we can tell them which part of [the range] that voxel belongs to. For each voxel, we give these probabilities, and then we map them into some kind of graph.

“If you apply any of the standard machine learning algorithms, you get too many false positives. But once we apply conditional random fields, basically these are like spatial filters because we are labelling many, many voxels simultaneously.”
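The "spatial filter" effect can be illustrated with a small sketch. This is not the method used in the research described: it uses iterated conditional modes (ICM), a simple approximate technique swapped in for illustration, on a made-up 2D grid of per-voxel lesion probabilities. The function name `icm_smooth`, the `pairwise` weight and the grid values are all hypothetical.

```python
def icm_smooth(unary, pairwise=0.6, sweeps=5):
    """Label a grid of per-voxel lesion probabilities (values in [0, 1]).

    Each voxel repeatedly takes the label (0 = normal, 1 = lesion) that best
    balances its own probability against agreement with its 4 neighbours.
    """
    rows, cols = len(unary), len(unary[0])
    # Start from the independent, per-voxel decision.
    labels = [[1 if p > 0.5 else 0 for p in row] for row in unary]
    for _ in range(sweeps):
        changed = False
        for r in range(rows):
            for c in range(cols):
                neighbours = [
                    labels[rr][cc]
                    for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                    if 0 <= rr < rows and 0 <= cc < cols
                ]

                def local_score(label):
                    data_term = unary[r][c] if label == 1 else 1.0 - unary[r][c]
                    agree = sum(1 for n in neighbours if n == label)
                    return data_term + pairwise * agree

                new = max((0, 1), key=local_score)
                if new != labels[r][c]:
                    labels[r][c] = new
                    changed = True
        if not changed:
            break  # no voxel changed in a full sweep
    return labels

# Hypothetical probabilities: one isolated 0.7 and a 2x2 high-probability cluster.
unary = [
    [0.1, 0.1, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.8, 0.9],
    [0.1, 0.1, 0.1, 0.9, 0.8],
    [0.1, 0.1, 0.1, 0.1, 0.1],
]
labels = icm_smooth(unary)
for row in labels:
    print(row)
```

In this toy grid the isolated voxel, which a per-voxel threshold would flag as a lesion, is relabelled as normal because none of its neighbours agree, while the 2x2 cluster is kept.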

CRF detection of lesions (marked green) compared with 'ground truth' or the actual situation (marked orange) to confirm if the model is accurate. The CRF detection and ground truth closely match.

Making correlations between eye retina and brain health

Kotagiri said CRFs can be used to confirm strong correlations of problems in the eye retina and brain.

“One of the things that we found is that the human retina has a very similar vascular structure to the brain. By looking at your retina they can almost predict whether you have pre-diabetes or you have hypertension.”

Kotagiri said examining the retina first can save on costs for health providers and patients, as patients wouldn’t have to undergo an expensive brain MRI just to find out if there is a problem or not.

“So we look at these retina images and basically recognise if these people are in danger and then go for an MRI if it’s needed, because MRI tells you how healthy your brain is. But maybe it’s too late, so you would like to be routinely detected, and maybe eye retina is the cheapest way of doing it,” he said.

“Retina analysis is very cheap; you can take the eye image very cheaply. Maybe the days are not too far where everyone’s mobile phone can take retina image of your eye quite cheaply.

“Apparently, there’s a lens that you can attach to your smartphone for about 50 cents. And if you can get the lighting right, sit in a dark room for 5 minutes, you can take a ‘selfie’ of your retina. This can save your life in a big way.”

He added that in Australia, which is a wealthy country, there are about 1.7 million people suffering from diabetes who don’t know they have a problem. About 3.7 million people in this country also have hypertension, but don’t realise it.

Techworld Australia has contacted Professor Rao Kotagiri for further information on conditional random fields.

Follow Rebecca Merrett on Twitter: @Rebecca_Merrett