Master Data Management Empowers a Global Network of Airbnb Employees

The Challenge

Once a struggling startup, Airbnb has grown into a household name in the online accommodations marketplace. With the company’s success came a rapid expansion of its business and workforce, which currently includes 3,500 employees spread across 20 offices worldwide.

In any large, complex organization, an ever-growing landscape of internal and external data resources – especially when scattered across various platforms – eventually becomes unmanageable and restrictive.

After a year at Airbnb, software engineer John Bodley recognized that the company’s data was prohibitively siloed, inaccessible, or lacking proper context.

With over 200,000 tables in their main Hive data warehouse spread across multiple clusters, 10,000 Superset charts and dashboards, 6,000 experiments in metrics, over 6,000 Tableau workbooks and charts, and over 1,500 knowledge posts, the vast amount of scattered data was working against the company rather than serving as an operational advantage.

Bodley also noticed that employees were relying on tribal knowledge for answers to questions, which ultimately stifled productivity.

“We often run an employee survey,” he said, “and we consistently scored really poorly around the question: ‘The information I need to do my job is easy to find.’”

He knew they needed to democratize data so that any employee, regardless of role or data literacy level, could find resources and be fully confident that the search results were relevant and reliable.

The Solution

With various resources (e.g., data tables, dashboards, reports, users, teams, and business outcomes) each carrying its own level of context and connections, Bodley and his team quickly realized that their entire data ecosystem was best represented as a graph. That led them to the Neo4j graph database.

“There’s four main reasons,” said Bodley. “One, it kind of felt logical, right? Our data represents a graph, so it felt logical to use a graph database as well to store the data. It’s nimble. We wanted a really fast, performant system. It’s popular, right? It’s the world’s number one graph database... And finally, it integrates really well.”

In terms of speed, the Dataportal is meant to be a data resource search engine, where fast, detailed and accurate interactions ultimately incentivize exploration. Neo4j offers the fastest way to search through millions of data connections per second.

In terms of integration, Airbnb had their own tech stack in place, including Elasticsearch and Python. “We use Flask as a lightweight Python web framework for the API, which is consistent with a number of open source Airbnb data tools like Airflow, The Knowledge Repository, and Superset,” said Bodley. “The single-page web app leverages React and Redux.”

Neo4j integrated well with all of Airbnb’s preferred programming languages, while also allowing them to enrich search rankings by taking advantage of the graph topology. Every day, they push data from Hive into the Neo4j graph database – connecting their siloed data from a relationships perspective – to facilitate quick, highly relevant contextual search results.
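To make the idea of topology-enriched ranking concrete, here is a minimal sketch in plain Python. It is not Airbnb's implementation and does not use Neo4j or Elasticsearch. The resource names, relationship types, and the choice of node degree as the ranking signal are all assumptions made for illustration. It only shows how connections between resources can push a well-connected result above a sparsely connected one.

```python
from collections import defaultdict

# Hypothetical resources and relationships (illustrative names only).
# Each edge is (source node, relationship type, target node).
EDGES = [
    ("user:alice", "CREATED", "dashboard:bookings_overview"),
    ("user:bob", "VIEWED", "dashboard:bookings_overview"),
    ("user:carol", "VIEWED", "dashboard:bookings_overview"),
    ("dashboard:bookings_overview", "USES", "table:bookings"),
    ("user:bob", "CREATED", "dashboard:bookings_draft"),
    ("dashboard:bookings_draft", "USES", "table:bookings"),
]


def build_degree(edges):
    """Count how many relationships touch each node (its degree)."""
    degree = defaultdict(int)
    for source, _relationship, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree


def search(term, edges):
    """Return nodes whose name contains `term`, most connected first.

    Assumption: degree stands in for relevance. Ties are broken
    alphabetically so the ordering is deterministic.
    """
    degree = build_degree(edges)
    matches = [node for node in degree if term in node.split(":", 1)[1]]
    return sorted(matches, key=lambda node: (-degree[node], node))


results = search("bookings", EDGES)
for node in results:
    print(node)
```

In this toy data set, the dashboard that several users created or viewed ranks ahead of the draft that only one user touched, even though both match the search term equally well on text alone.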
