Airbnb Data Catalog — Democratizing Data With Dataportal

September 9th, 2022


What is Dataportal?

Dataportal is a data catalog built at Airbnb to drive data enablement and democratization by improving data discovery. Airbnb employees use it to easily surface desired data with the appropriate context. Thanks to the tool, data is democratized and used in daily flows, rather than residing within the sole domain of just a few data workers.

Why did Airbnb build Dataportal?

Within five years of Airbnb’s 2008 founding, the then-startup had served over 9 million guests en route to becoming the world’s most recognizable online hospitality platform. However, the company’s explosive growth came with challenges, including:

  • Too many disparate data sources
  • Siloed data
  • Tribal knowledge

Too many disparate data sources

In 2013 alone, Airbnb added 250,000 properties to its platform. These additions were just some of the many data points flowing into the company as data tables, dashboards, reports, metrics, and definitions. As the company scaled, the deluge of data sources made it difficult for employees to actually use the data in decision-making.

Siloed data

Airbnb also suffered from a fragmented data landscape where data was siloed, inaccessible, and lacked context, making it virtually unusable for all but a handful of data workers. The problem only grew as the company expanded its operations to include offices around the world. It became very difficult to maintain a single source of truth when it came to data.

Tribal knowledge

Like many organizations, Airbnb had a problem with tribal knowledge: unwritten information and processes that rest in the heads of select individuals. Only data workers understood how to find and use data. Other employees had to turn to them for information or forgo using the data entirely. The situation became untenable as the company added thousands of employees dispersed across the globe.

To overcome these challenges, the company charged its data team with developing a data catalog that would better enable employees to explore, discover, trust, and leverage data. The project became known as Dataportal.

Learn more → Dataportal — Democratizing data at Airbnb



How was Dataportal built?

In building the Airbnb data catalog, the company leveraged a team of data scientists and visualization engineers. Believing the interface and user experience of a data tool should not be an afterthought, the data team ensured the backend and frontend were given equal weight so the product wasn’t just functional, but also easy to use.

Key components of Airbnb Dataportal. Source: GraphConnect Europe 2017


Backend

The Airbnb data catalog uses Flask, a lightweight Python web framework, for the API. It leverages the data resources in Hive to build a graph composed of nodes and relationships. This graph maps the connections between data users and the data itself.

From there, the data follows a winding path:

  • It starts in Hive;
  • Airflow pushes it to Python, where it is represented as an object and a PageRank score is computed to help with ranking;
  • A Neo4j driver pushes it to Neo4j, which serves as the source of truth;
  • Neo4j integrates with Elasticsearch, to which the nodes are pushed;
  • Elasticsearch serves as the search engine, and the web server fetches the results.
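
The ranking step in this pipeline can be illustrated with a minimal sketch. It assumes the graph is a plain dictionary mapping each node to the nodes it points to, and it uses a textbook power-iteration PageRank. The node names, edge direction, damping factor, and iteration count are all hypothetical; the source does not describe how Airbnb actually computes the score.

```python
from typing import Dict, List


def pagerank(graph: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Textbook power-iteration PageRank over an adjacency list."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly.
        dangling = sum(ranks[node] for node in nodes if not graph.get(node))
        new_ranks = {
            node: (1.0 - damping) / n + damping * dangling / n
            for node in nodes
        }
        for source, targets in graph.items():
            if not targets:
                continue
            # Each node splits its rank equally among its targets.
            share = damping * ranks[source] / len(targets)
            for target in targets:
                new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Hypothetical edges: a user or dashboard "points to" the tables it consumes.
example_graph = {
    "user:alice": ["table:bookings", "table:listings"],
    "user:bob": ["table:bookings"],
    "dashboard:revenue": ["table:bookings"],
    "table:bookings": [],
    "table:listings": [],
}

ranks = pagerank(example_graph)
top_node = max(ranks, key=ranks.get)
```

In this toy graph the most heavily consumed table ends up with the highest score, which is the kind of signal a search layer could use to order results.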

Frontend

Airbnb took pains to ensure that the frontend provided an intuitive, frictionless experience so it would be usable by people of all data literacy levels. The work began with interviews of employees across the company, which were used to create a range of user personas spanning data knowledge and use cases. Airbnb stressed that the UI had to be free of bugs to build trust, so that employees wouldn’t hesitate to use it in their daily workflows.

Personas used to design and build Dataportal. Source: neo4j

In terms of technologies, the frontend employs:

  • ES6, NPM for application and dependencies
  • React for DOM (Document Object Model)
  • Redux for application state
  • Khan/Aphrodite for styling
  • ESLint, Enzyme, Mocha, and Chai for testing



What are the features of the Airbnb data catalog?

Airbnb describes Dataportal as having four primary features:

  • Search
  • Context and metadata
  • Employee-centric data
  • Team-centric data

Search

The Airbnb data catalog features a clean, minimalist design for clarity, since the data itself is already complex. Engineers designed the search experience to mimic Google’s, allowing users to quickly surface the information they need. Search is also intentionally fast, because a laggy search discourages exploration.

Metadata search and discovery on Dataportal. Source: GraphConnect Europe 2017
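
As a rough illustration of how search results might be ordered, the sketch below matches query terms against resource names and descriptions and breaks ties with a precomputed popularity score. The `Resource` fields, the substring-matching rule, and the sample data are assumptions made for illustration only; in Dataportal itself, Elasticsearch serves as the search engine.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Resource:
    name: str
    description: str
    rank: float  # hypothetical precomputed popularity score


def search(resources: List[Resource], query: str, limit: int = 5) -> List[Resource]:
    """Return resources matching any query term, best matches first."""
    terms = query.lower().split()
    scored = []
    for resource in resources:
        text = f"{resource.name} {resource.description}".lower()
        # Simple substring matching; a real engine would tokenize and stem.
        matches = sum(1 for term in terms if term in text)
        if matches:
            scored.append((matches, resource.rank, resource))
    # Sort by number of matched terms, then by popularity score.
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return [resource for _, _, resource in scored[:limit]]


catalog = [
    Resource("bookings_daily", "Daily bookings per listing", 0.40),
    Resource("listings", "All active listings", 0.25),
    Resource("bookings_dashboard", "Dashboard of bookings trends", 0.10),
]

results = search(catalog, "bookings listing")
result_names = [r.name for r in results]
```

Resources that match more query terms come first, and the popularity score only decides between resources with the same number of matches.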


Context and metadata

The Airbnb data catalog doesn’t just display data. It also provides context in the form of metadata so users can understand:

  • Who created the data
  • Who consumed it
  • When it was last updated

Users can even trace data lineage by exploring parent and child tables in order to understand relationships between data sets.

Data relationships, consumption intelligence and data description in Dataportal. Source: GraphConnect Europe 2017
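
Tracing lineage through parent and child tables amounts to walking a directed graph. The sketch below is one minimal way to do that with a breadth-first traversal. The table names and the `children` mapping are hypothetical, and nothing here is taken from Dataportal’s actual implementation.

```python
from collections import deque
from typing import Dict, List

# Hypothetical lineage: each table maps to the child tables derived from it.
children: Dict[str, List[str]] = {
    "raw_events": ["sessions", "searches"],
    "sessions": ["bookings_funnel"],
    "searches": ["bookings_funnel"],
    "bookings_funnel": [],
}


def invert(edges: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Build the child -> parents mapping from a parent -> children mapping."""
    parents: Dict[str, List[str]] = {table: [] for table in edges}
    for parent, kids in edges.items():
        for kid in kids:
            parents.setdefault(kid, []).append(parent)
    return parents


def reachable(edges: Dict[str, List[str]], start: str) -> List[str]:
    """Breadth-first walk returning every table reachable from start."""
    seen = {start}
    order: List[str] = []
    queue = deque([start])
    while queue:
        table = queue.popleft()
        for neighbour in edges.get(table, []):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


# Everything derived from raw_events, and everything bookings_funnel depends on.
downstream = reachable(children, "raw_events")
upstream = reachable(invert(children), "bookings_funnel")
```

Walking the original mapping gives the downstream tables, while walking the inverted mapping gives the upstream ones, so a single traversal function covers both directions.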


Employee-centric data

Airbnb believes that employees (including ex-employees) are the ultimate holders of tribal knowledge, so it’s important to provide information about them, the data they’ve created, and the data they’ve consumed. All employees have access to the pages of other employees, promoting greater transparency and trust within the company.

Team-centric data

Teams have tables they query regularly, dashboards they review, and metrics they’ve defined. The Airbnb data catalog makes this team information searchable so that people inside and outside the team can reference it rather than chasing down a team member.

Can’t build your own Dataportal? Consider Atlan

When Airbnb understood it had a problem getting employees to use data in their normal workflow, the company embarked on a mission to build a data catalog from scratch that would enable greater data democratization. By assembling the right data scientists and engineers, the company succeeded in creating a tool that’s effective and simple to use.

However, not all organizations have the resources to build a data catalog from the ground up. Luckily, they don’t have to.

If you are a data consumer or producer weighing build-versus-buy options while championing better use of a modern data stack in your organization, it’s worth looking at off-the-shelf alternatives like Atlan, an easy-to-integrate, modern data catalog designed for data-driven teams to discover, understand, trust, and collaborate on data assets.


