Bloomberg's big move on machine learning and open source

The company's head of data science, Gideon Mann, on the work behind 'The Terminal'

With its orange-on-black interface and colour-coded keyboard, the Bloomberg professional services terminal – known simply as ‘The Terminal’ – doesn’t appear to have changed much since it was launched in the early ’80s.

But behind the retro (Bloomberg prefers ‘modern icon’) stylings, its delivery of financial markets data, news, and trading tools has advanced rapidly.

The terminal’s 325,000 subscribers globally can now draw on machine learning, deep learning, and natural language processing techniques developed by the company as they seek an edge in their investment decisions. Bloomberg is also applying those same techniques to its internal processes.

Leading the company’s efforts in the area is Bloomberg’s head of data science, Gideon Mann, who spoke with CIO Australia earlier this month.

Fever pitch

While the look of The Terminal hasn’t changed significantly since it was launched in 1981, the data it deals with certainly has.

"Traditional financial models are very good at structured data. You have a quarterly economic indicator, something that comes out every three months and you have had that for the past 10 years. That's not a huge amount of data. So you can build a great financial model using traditional financial mathematics,” Mann explains.

“What changes is once you start to have huge amounts of data that comes through every second. Whether that's news data or social media data or receipt data or satellite images, that are not structured and that are high volume. How do you model that? In that context, machine learning is very applicable.”

Bloomberg has around 5,000 engineers worldwide who support the terminal and the products on it. A fast-growing number of them are data science specialists, many of whom have been hired directly from academia.

'The Terminal' as it looks today

Mann, who reports to chief technology officer Shawn Edwards, leads the technology strategy around machine learning, natural language processing and search. He joined in 2014 from Google, where he was a research scientist.

"Over the past year or so global finance has gotten increasingly excited about using machine learning for various different kinds of things and now it's kind of reached a fever pitch. Everybody is very focused and interested on what does machine learning have to offer?” he says.

The original design

In the toolbox

Bloomberg was a pioneer of sentiment analysis, which it began developing around a decade ago. The technique uses machine learning to flag a news story or tweet as relevant to a stock and to assign it a sentiment score.

Typically, if there is a positive news story on a company, its share price will rise and vice versa.
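
As a rough illustration of the flag-and-score idea described above, here is a minimal Python sketch. It is not Bloomberg's method: the article says the company uses machine learning, whereas this toy version swaps in a fixed word list. The company names, tickers and word lists are all hypothetical.

```python
import re

# Hypothetical word lists; a real system would learn these signals from labelled data.
POSITIVE = {"beats", "surges", "record", "upgrade", "profit"}
NEGATIVE = {"misses", "plunges", "lawsuit", "downgrade", "loss"}

# Hypothetical mapping from company names (lower-case) to tickers.
COMPANY_TICKERS = {"acme": "ACME", "globex": "GLBX"}


def tokenize(text):
    """Split text into lower-case word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_story(text):
    """Flag which tickers a story mentions and give it a score in [-1, 1]."""
    tokens = tokenize(text)
    tickers = sorted({COMPANY_TICKERS[t] for t in tokens if t in COMPANY_TICKERS})
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    # No sentiment-bearing words means a neutral score of 0.0.
    sentiment = 0.0 if total == 0 else (pos - neg) / total
    return {"tickers": tickers, "sentiment": sentiment}


if __name__ == "__main__":
    print(score_story("Acme beats forecasts and posts record profit"))
    print(score_story("Globex plunges after lawsuit"))
```

A word list like this misses negation, sarcasm and context, which is one reason a learned model is the more plausible choice at the scale the article describes.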

The ability to read hundreds of articles in less time than it would take a human to read just one gives Terminal customers a distinct advantage, as Bloomberg market specialist Ian McFarlane explains.

“During the time you’ve got your nose buried in that piece, the stocks or bonds in your portfolio might have been mentioned in hundreds of social media posts and news articles. It’s impossible for a human to keep up with that deluge of real-time data. That’s where distilling sentiment from news and social media provides an advantage,” he said.

The tool is being further developed to judge the reliability of tweets and other social media posts, says Mann.

“We know how to vet a news story and a news organisation. How do you vet that stuff on Twitter for accuracy?” he says.

In another project, conducted over the last two years, machine learning is being used to extract data from PDF financial reports and documents.

“Sometimes they're in a structured format like XML or XBRL but often they're in PDF and they have a huge amount of data. To extract the data, in the past we've had armies of data analysts typing stuff in looking into these reports. It’s expensive, it's slow and we often don't have the recall that we want,” Mann says. “So we've been mapping a fairly involved research effort to extract data from those documents.”

As a next step to that work, the company is now researching how a machine can identify graphs and scatterplots to extract the numbers.

“It firstly looks at the scatterplot, then it identifies the axes of the scatterplot and the ticks on the scatterplot and then it registers each data point so that it can recover all of the data that was used to make up that scatterplot,” Mann says. “All of this is an effort to give structure to all of the unstructured data.”

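The last step Mann describes, turning registered points back into numbers, can be sketched in a few lines of Python. This is an illustrative sketch only, not Bloomberg's implementation. It assumes the axes, ticks and point positions have already been detected in pixel coordinates and that both axes are linear. The function names `axis_mapper` and `recover_points` are hypothetical.

```python
def axis_mapper(tick_a, tick_b):
    """Build a pixel-to-value function from two ticks on a linear axis.

    Each tick is a (pixel_position, data_value) pair.
    """
    (p0, v0), (p1, v1) = tick_a, tick_b
    if p0 == p1:
        raise ValueError("ticks must sit at distinct pixel positions")
    scale = (v1 - v0) / (p1 - p0)  # data units per pixel
    return lambda pixel: v0 + (pixel - p0) * scale


def recover_points(point_pixels, x_ticks, y_ticks):
    """Convert detected (x, y) pixel positions into data coordinates."""
    to_x = axis_mapper(*x_ticks)
    to_y = axis_mapper(*y_ticks)
    return [(to_x(px), to_y(py)) for px, py in point_pixels]


if __name__ == "__main__":
    # Assumed detections: the x axis runs 0..10 between pixels 100 and 500;
    # the y axis runs 0..70 between pixels 400 and 50 (image y grows downward).
    x_ticks = ((100, 0.0), (500, 10.0))
    y_ticks = ((400, 0.0), (50, 70.0))
    print(recover_points([(300, 225), (100, 400)], x_ticks, y_ticks))
```

With these assumed ticks, a point detected at pixel (300, 225) maps to roughly (5, 35) in data units. Logarithmic axes or skewed scans would need a different mapping.
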
Open edge

An open source ethic lies behind many of Bloomberg’s recent builds. Mann says there has been a sea change in the company’s attitude to open source.

"When the company started in 1981 there really wasn't a whole lot of open source. And so there was a mentality of if it's not invented here we're not interested,” Mann says.

Gideon Mann, head of data science at Bloomberg

Indeed, Bloomberg once built networking gear for its clients, and had its own networking protocol. The company even produced its own keyboards before they became standardised.

“There's always this thread of ‘well we'll build it on our own when it's not widely available and then when it becomes a commodity then we'll adopt’. And I think the same thing was true for open source,” Mann says.

The organisation took some convincing, but, championed by the CTO, there has been a “huge culture change” towards open source.

“There are two groups you got to convince: you’ve got to convince management that using open source is going to be safe and lead to better software, and then you also have to convince engineers that using open source is going to increase their skillset, will lead to software that’s easier to maintain and is less buggy and it's going to be a more beautiful system. Once you can kind of convince those two then you're set,” Mann says.

The company is an active contributor to projects including Solr, Hadoop, Apache Spark and OpenStack.

“I don't think you can be a leading edge technology company these days without being heavily invested in open source and without being – especially in the machine learning space – heavily invested in the academic community and in publishing,” Mann says.

Big buzz

Although there is certainly a lot of buzz around machine learning, Mann believes it is well founded.

“It’s funny. There’s certainly a lot of hype around machine learning and data science right now. As a cautious person my inclination would be to be cynical, but my cynicism is tempered by the fact that when I look at what we know in the state of the art in academia, and what is in practice – there’s a phenomenally huge gap,” he says.

“I feel like if we didn’t learn anything new it would take five, 10 years for all of that learning to get integrated. This actually makes me very optimistic about the prognosis for machine learning, that it will really have a huge effect.”