eHarmony currently operates in North America, Australia and the UK. The company has a great track record of success – since launch in 2000, 1.2 million couples have married after being introduced by the service. Today eHarmony has 55m registered users, a number that will increase dramatically as the service is rolled out to 20 other countries around the globe in the coming months.
eHarmony employs some serious data science chops to match prospective partners. Users complete a detailed questionnaire when they sign up for the service. Sophisticated compatibility models are then executed to create a personality profile, based on the user’s responses. Additional research based around machine learning and predictive analytics is added to the algorithms to enhance the matching of prospective partners.
Unlike searching for a specific item or term on Google, the matching process used to identify prospective partners is bi-directional, with multiple attributes such as age, location, education, preferences, income, etc. cross-referenced and scored between each potential partner.
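To make the bi-directional idea concrete, here is a minimal sketch in Python. It is not eHarmony's compatibility model: the `Profile` fields, the two preference checks and the rule of taking the weaker of the two directions are all assumptions chosen for illustration. It only shows why each pair has to be scored from both sides.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    # Hypothetical attributes; the real questionnaire is far richer.
    name: str
    age: int
    city: str
    min_age: int          # youngest partner this user will accept
    max_age: int          # oldest partner this user will accept
    same_city_only: bool  # whether the partner must live in the same city


def one_way_score(seeker: Profile, candidate: Profile) -> float:
    """Fraction of the seeker's preferences that the candidate satisfies."""
    age_ok = seeker.min_age <= candidate.age <= seeker.max_age
    city_ok = (not seeker.same_city_only) or seeker.city == candidate.city
    return (int(age_ok) + int(city_ok)) / 2


def mutual_score(a: Profile, b: Profile) -> float:
    """Score the pair in both directions and keep the weaker result.

    Taking the minimum is an illustrative choice: a pair only scores
    highly when each person fits the other's preferences.
    """
    return min(one_way_score(a, b), one_way_score(b, a))
```

A one-way search would stop at `one_way_score(a, b)`. The bi-directional version also evaluates `one_way_score(b, a)`, which is why the work grows with the number of candidate pairs rather than with the number of queries.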
In eHarmony’s initial architecture, a single monolithic database stored all user data and matches, but this didn’t scale as the service grew. eHarmony split the matches out into a distributed Postgres database, which bought some headroom, but as the number of potential matches grew to 3 billion per day, generating 25TB of data, this approach could only scale so far. Running a complete matching analysis of the user base was taking 2 weeks.
In addition to the problems of scale, as the data models became richer and more complex, adjusting the schema required a full database dump and reload, causing operational complexity and downtime, as well as inhibiting how quickly the business could evolve.
eHarmony explored Apache Solr as a possible solution, but it was eliminated because the matching system requires bi-directional searches, rather than just conventional uni-directional searches. Apache Cassandra was also considered, but the API was too difficult to match to the data model, and there were imbalances between read and write performance.
After extensive evaluation, eHarmony selected MongoDB. As well as meeting its requirements, eHarmony also gained a lot of value from the MongoDB community and from the enterprise support that is part of MongoDB Enterprise Advanced.
Of course, MongoDB isn’t the only part of eHarmony’s data management infrastructure. The data science team integrates MongoDB with Hadoop, as well as Apache Spark and R for predictive analytics.
And the story doesn’t end there. eHarmony will start to add geo-location services as part of the mobile experience, taking advantage of MongoDB’s support for geospatial indexes and queries. eHarmony is also excited by the prospect of pluggable storage engines delivered in MongoDB 3.0. The ability to mix multiple storage engines within a MongoDB cluster can provide a foundation to consolidate search, matches and user data. Whether you’re looking for a new partner, or a new job, it seems eHarmony has the data science and database to get you there.