CIO

Click Me Maybe: Inside eHarmony's matchmaking machine

Dating site's tech chief Prateek Jain shares the machine learning secrets of the ‘brains behind the butterflies’

When a user signs up to eHarmony, they fill out a lengthy questionnaire about the type of person they are, their likes and dislikes, beliefs, values and preferences in potential partners.

The information is fed into the company’s closely guarded, secret algorithm, which serves up the most compatible matches in its user base.

The matching algorithm is based on data collected from interviews with more than 50,000 married couples in 23 countries, from which the company says it has derived a mathematical model of a successful relationship.

Cue butterflies? Not particularly, explains eHarmony VP of technology Prateek Jain.

“So, I found you the most compatible person on the planet. What if you are not attracted to them?” he says.

It’s what happens next that counts. Over the last few years eHarmony has been leveraging machine learning models and distribution algorithms to boost the butterflies, and help hundreds of users find true love every day.

The result is the ultimate recommendations service for singles, the company says, which leads to an average of 438 people getting married through the site every day.

“We say we’re like Netflix,” Jain explains, “but the movie has to like you back.”

Algo’ my loving

In the time before Tinder, telling people you had met your partner online was met with more than a little derision.

Founded by clinical psychologist and Christian theologian Dr Neil Warren and his son-in-law, eHarmony launched in 2000 as the world’s first algorithm-based dating site.

“I think back in the day when we started I think eHarmony was primarily focused on the compatibility part of its matching algorithms, which was the secret sauce what made it popular,” Jain, who joined the company in 2011, explains.

Despite doubt about the algorithm’s success rate versus other methods – earlier this year eHarmony ads in the UK claiming its system was “scientifically proven” were banned – it certainly works for many.

By 2012, the company had a 14 per cent share of the $2-billion-a-year US dating services industry, according to research firm IBISWorld, boasting 750,000 paid subscribers and 10 million active users.

The company has a number of regional sites, plus same-sex relationship brand Compatible Partners.

Australia is eHarmony's second-largest market by profitability and accounts for about nine per cent of global business revenues.

“One thing that became apparent to us was that compatibility was working and doing its job. But we were not seeing a lot of communication happening between the members. We could find you the most compatible person on the planet but if you’re not attracted to them, if you’re not going to reach out to them with a message or call then that match is not going to be success,” Jain says.

A few years ago, eHarmony started experimenting and investing in big data and machine learning.

It has since added extra layers to its compatibility system, with around 20 ‘Affinity’ models at work to ensure the site’s recommendations are more personalised and primed for users. Matches are now based on far more than just the questionnaire, including how users behave on the site, the profiles they click on and the content of their self-descriptions.

“All these indirect signals we look for, it allows us to refine the filter,” Jain says.
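To make the idea of layering behavioural signals on top of a compatibility score concrete, here is a minimal Python sketch. It is not eHarmony's algorithm, which the company keeps secret. The `Candidate` fields, the 0 to 1 score scales, the weight and the `rerank` function are all hypothetical, chosen only to show how an affinity signal could reorder a list of compatible matches.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    compatibility: float  # questionnaire-based score in [0, 1] (assumed scale)
    affinity: float       # behaviour-based score in [0, 1] (assumed scale)


def rerank(candidates, affinity_weight=0.4):
    """Order candidates by a weighted blend of compatibility and affinity.

    The linear blend and the default weight are illustrative assumptions.
    """
    def blended(c):
        return (1 - affinity_weight) * c.compatibility + affinity_weight * c.affinity

    return sorted(candidates, key=blended, reverse=True)


candidates = [
    Candidate("A", compatibility=0.95, affinity=0.20),
    Candidate("B", compatibility=0.85, affinity=0.90),
    Candidate("C", compatibility=0.60, affinity=0.95),
]

ranked = [c.name for c in rerank(candidates)]
print(ranked)
```

With these made-up numbers, candidate A is the most compatible on paper but drops to last place once the affinity signal is blended in, which mirrors Jain's point that compatibility alone does not guarantee a message gets sent.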

The looks, of love

From the outset, Jain says, eHarmony’s founders subscribed to the idea that “compatibility shouldn’t be about looks it should be about personal level compatibility”.

Nevertheless, the site’s machine learning models quickly gain an understanding of what each user finds attractive, based on the profiles they interact with.

“We do not ask any direct questions which ask you to define your attractiveness, but based on what are the kind of matches you are reaching out to we can learn who you find attractive as well as where you rate on the attractiveness score based on how people are reaching out to you,” Jain says.

eHarmony uses Google’s Cloud Vision API to scan user profile pictures for a number of features – including hair and eye colour, whether the image shows a beard or moustache, as well as ‘has cleavage’ and ‘deduced BMI’.

A user who clicks more often on the profiles of blonde-haired users will be served more blonde matches.
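One simple way to picture this kind of click-driven preference is to count the hair colours of the profiles a user has clicked and favour candidates with the most-clicked colour. The sketch below is an illustration under that assumption only. The function names and the data layout are hypothetical, and the article does not describe how eHarmony's models actually weigh such signals.

```python
from collections import Counter


def hair_preference(clicked_hair_colours):
    """Return each hair colour's share of a user's profile clicks."""
    counts = Counter(clicked_hair_colours)
    total = sum(counts.values())
    return {colour: n / total for colour, n in counts.items()} if total else {}


def order_by_preference(candidates, preference):
    """Sort (name, hair_colour) pairs so more-clicked colours come first."""
    return sorted(candidates, key=lambda c: preference.get(c[1], 0.0), reverse=True)


clicks = ["blonde", "blonde", "brown", "blonde"]
preference = hair_preference(clicks)
candidates = [("Sam", "brown"), ("Alex", "blonde"), ("Jo", "red")]
ordered = order_by_preference(candidates, preference)
print(ordered)
```

Here three of the four clicks were on blonde profiles, so the blonde candidate moves to the front of the list, while a colour the user has never clicked falls to the back.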

Preferences are also parsed from written profiles. “Some people mention ‘I have a thing for guys with beards’ right? If I see that in your profile and can detect in other people’s photos whether they have a beard we can use that as a criteria for matching,” Jain says.

Another match factor is a user’s site usage. For example, a user who usually sends the first message is matched with ‘shy’ users who rarely do.

“If you can match those people you can increase the chances of success. Not just for you but the shy individual as well,” Jain says.
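The pairing Jain describes can be sketched as follows, assuming each user has an "initiation rate": the fraction of their conversations that they started. The rate, the 0.5 threshold and the rule of pairing the most active initiator with the shyest user are all illustrative assumptions. A real system would presumably combine this with compatibility and the other signals described above.

```python
def initiation_rate(first_messages_sent, conversations):
    """Fraction of a user's conversations that they started."""
    return first_messages_sent / conversations if conversations else 0.0


def pair_initiators_with_shy(users, threshold=0.5):
    """Pair users above the threshold with users at or below it.

    `users` maps a name to an initiation rate. The threshold and the
    most-active-with-least-active pairing rule are illustrative assumptions.
    """
    initiators = sorted((u for u in users if users[u] > threshold),
                        key=users.get, reverse=True)
    shy = sorted((u for u in users if users[u] <= threshold), key=users.get)
    return list(zip(initiators, shy))


users = {
    "Ana": initiation_rate(9, 10),
    "Ben": initiation_rate(1, 10),
    "Cy": initiation_rate(7, 10),
    "Dee": initiation_rate(3, 10),
}
pairs = pair_initiators_with_shy(users)
print(pairs)
```

In this toy data Ana, who starts nine in ten conversations, is paired with Ben, who starts one in ten.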

As part of the image analysis, eHarmony is currently working on a tool to help users decide which photos to post on their profile and in which order. Its data shows, for example, that photos of individuals wearing sunglasses or in a group don’t do as well. By alerting users when they upload a photo like that, eHarmony will help them maximise the appeal of their profile, Jain says.

Occasionally, the site will serve up a match that is outside a user’s usual preferences, called a ‘Serendipitous Recommendation’. This keeps users from ‘getting caught in a bubble’, Jain says.
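A common way to build this kind of occasional surprise into a recommender is an epsilon-greedy style rule: most slots go to the best in-preference matches, and a small fraction go to candidates from outside them. The article does not say which technique eHarmony uses, so the sketch below is a swapped-in illustration. The `recommend` function, its parameters and the 10 per cent default are hypothetical.

```python
import random


def recommend(preferred, outside, n, epsilon=0.1, rng=None):
    """Pick n recommendations, drawing from `outside` with probability epsilon.

    `preferred` and `outside` are lists ordered best-first. Epsilon and the
    slot-by-slot coin flip are illustrative assumptions.
    """
    rng = rng or random.Random()
    preferred, outside = list(preferred), list(outside)  # avoid mutating inputs
    picks = []
    while len(picks) < n and (preferred or outside):
        # Explore if an outside candidate exists and either the coin flip
        # says so or there are no preferred candidates left.
        explore = bool(outside) and (not preferred or rng.random() < epsilon)
        picks.append(outside.pop(0) if explore else preferred.pop(0))
    return picks


print(recommend(["p1", "p2", "p3"], ["o1", "o2"], 3, epsilon=0.2,
                rng=random.Random(42)))
```

Setting `epsilon` to 0 turns exploration off whenever preferred matches are available, while raising it widens the bubble at the cost of showing more matches the user may not click on.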

Click me maybe

The front-facing systems emit thousands of events. The events are pumped through Apache Kafka and on to Hadoop for processing. Server logs “to figure out where users are clicking” also go into Hadoop and the MapReduce system.
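The kind of question Jain mentions, working out where users are clicking, maps naturally onto a map step and a reduce step. The plain-Python sketch below stands in for that flow and uses no Kafka or Hadoop client. The JSON event format and its `type`, `page` and `user` fields are assumptions made for illustration, not eHarmony's actual schema.

```python
import json
from collections import defaultdict
from functools import reduce


def map_event(line):
    """Map step: emit (page, 1) for each click event, nothing otherwise."""
    event = json.loads(line)
    if event.get("type") == "click":
        yield event["page"], 1


def reduce_counts(totals, pair):
    """Reduce step: add one emitted pair into the running totals."""
    page, count = pair
    totals[page] += count
    return totals


# Hypothetical events, as they might arrive from a message queue or log file.
raw_events = [
    '{"type": "click", "page": "profile", "user": 1}',
    '{"type": "click", "page": "inbox", "user": 2}',
    '{"type": "view", "page": "profile", "user": 3}',
    '{"type": "click", "page": "profile", "user": 2}',
]

pairs = (pair for line in raw_events for pair in map_event(line))
clicks_per_page = dict(reduce(reduce_counts, pairs, defaultdict(int)))
print(clicks_per_page)
```

On a real cluster the map and reduce steps would run in parallel across many machines, but the shape of the computation is the same.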

“We also have traditional data warehouse systems which interact with legacy Oracle systems. We keep historical data back to last 15 years or so,” Jain says.

Jain says most of the data sits on premises, but the company is mounting a cloud migration effort and is currently assessing different providers.

“We’ve realised that we cannot be sitting on sidelines as the industry moves to cloud. Running our own data centres has its own challenges, not just on capital but the time the engineering team is maintaining infrastructure and dealing with vendors and partners if something goes wrong,” Jain, who built eHarmony's spin-off job site Elevated Careers in AWS, says.

“For a business of our size I would rather have all my engineering energy focused on new products rather than infrastructure.”

In this area, eHarmony is playing catch up to its born-in-the-cloud rivals.

“They don’t have the baggage eHarmony has to maintain years of data. That’s what I’m trying to unshackle us from,” Jain adds.

It is hoped that over the next year the 100-strong technology team will ship product multiple times a day, rather than the current multiple times a week.

“The ultimate vision is: an engineer commits code, it gets picked up by the testing system, it auto runs test cases. If they look good it promotes the code to production systems and makes it live. If you can cut down this development cycle you can learn your lessons much faster – rather than working on a feature for months, releasing it and realising people don’t like it,” Jain says.
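The flow Jain outlines (commit, automated tests, promotion to production) can be reduced to a single gate, sketched below. This is a toy model rather than eHarmony's tooling. The commit dictionaries, the two checks and the coverage threshold are hypothetical, and a list stands in for the production environment.

```python
def run_pipeline(commit, test_cases, deploy):
    """Run every test case against a commit and deploy only if all pass.

    Returns True if the commit was promoted to production.
    """
    results = [test(commit) for test in test_cases]
    if all(results):
        deploy(commit)
        return True
    return False


live = []  # stands in for the production environment

tests = [
    lambda commit: "syntax error" not in commit["diff"],
    lambda commit: commit["coverage"] >= 0.8,  # threshold is an assumption
]

good = {"id": "abc123", "diff": "add photo tips", "coverage": 0.9}
bad = {"id": "def456", "diff": "add photo tips", "coverage": 0.5}

print(run_pipeline(good, tests, live.append))
print(run_pipeline(bad, tests, live.append))
print([c["id"] for c in live])
```

Only the commit that passes every check reaches the `live` list; the failing one is stopped at the gate without anyone having to intervene.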

An effort is also being made to make the enrolment process far easier on users. This could involve them offering their social media data for eHarmony to learn about their likes and dislikes, rather than having to answer questions about them.

“Our barrier to entry is a bit high right now. We would like to be creative around – how do we ask you all the questions we ask you today and figure out a lot of that without having to ask you explicitly? What can we learn about your personality with some of your social data?” Jain says.