Airbnb Data Catalog: Democratizing Data with Dataportal in 2025

The Airbnb Data Catalog, known as Dataportal, is a powerful tool designed to enhance data accessibility.
It allows users to discover and utilize data effectively, promoting data democratization across the organization.
This catalog provides context and metadata, enabling informed decision-making for all employees.
Table of contents #
- What is Dataportal?
- Why did Airbnb build Dataportal?
- How was Dataportal built?
- What are the features of the Airbnb data catalog?
- Can’t build your own Dataportal? Consider Atlan
- How organizations are making the most of their data using Atlan
- FAQs about Airbnb Data Catalog
- Airbnb Dataportal: Related reads
What is Dataportal? #
Dataportal is a data catalog built at Airbnb to drive data enablement and democratization by improving data discovery. Airbnb employees use it to easily surface desired data with the appropriate context. Thanks to the tool, data is democratized and used in daily flows, rather than residing within the sole domain of just a few data workers.
Why did Airbnb build Dataportal? #
Within five years of Airbnb’s 2008 founding, the then-startup had served over 9 million guests en route to becoming the world’s most recognizable online hospitality platform. However, the company’s explosive growth came with challenges, including:
- Too many disparate data sources
- Siloed data
- Tribal knowledge
Too many data sources #
In 2013 alone, Airbnb added 250,000 properties to its platform. These additions were just some of the many data points flowing into the company as data tables, dashboards, reports, metrics, and definitions. As the company scaled, the deluge of data sources made it difficult for employees to actually use the data in decision-making.
Siloed data #
Airbnb also suffered from a fragmented data landscape where data was siloed, inaccessible, and lacked context, making it virtually unusable for all but a handful of data workers. The problem only grew as the company expanded its operations to include offices around the world. It became very difficult to maintain a single source of truth when it came to data.
Tribal knowledge #
Like many organizations, Airbnb had a problem with tribal knowledge: unwritten information and processes that rest in the heads of select individuals. Only data workers understood how to find and use data. Other employees had to turn to them for information or forgo using the data entirely. The situation became untenable as the company added thousands of employees dispersed across the globe.
To overcome these challenges, the company charged its data team with developing a data catalog that would better enable employees to explore, discover, trust, and leverage data. The project became known as Dataportal.
Learn more → Dataportal — Democratizing data at Airbnb
How was Dataportal built? #
In building the Airbnb data catalog, the company leveraged a team of data scientists and visualization engineers. Believing the interface and user experience of a data tool should not be an afterthought, the data team ensured the backend and frontend were given equal weight so the product wasn’t just functional, but also easy to use.
Key components of Airbnb Dataportal. Source: GraphConnect Europe 2017
Backend #
The Airbnb data catalog uses Flask, a lightweight Python web framework, for the API. The catalog draws on the company’s data resources to build a graph in Hive that is composed of nodes and relationships. This graph maps the connections between data users and the data itself.
From there, the data follows a winding path:
- The graph starts in Hive;
- Airflow pushes it to Python, where it is represented as an object and a PageRank score is computed to help with ranking;
- A Neo4j driver then pushes the data to Neo4j, which serves as the source of truth, and Neo4j integrates with Elasticsearch, where the nodes are pushed;
- Elasticsearch serves as the search engine, and the web server fetches results from it.
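The PageRank step in this pipeline can be illustrated with a minimal sketch. This is not Airbnb’s implementation: the `pagerank` function, the damping factor, the iteration count, and the node names in `EDGES` are all assumptions made for illustration. It only shows how a score could be derived from a graph of users, dashboards, and tables so that heavily referenced resources rank higher in search.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Compute PageRank scores for a directed graph given as (source, target) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for source, targets in out_links.items():
            if not targets:
                continue
            # Each node splits its damped rank equally among the nodes it points to.
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical edges: a user or dashboard points at a table it consumes.
EDGES = [
    ("user:alice", "table:bookings"),
    ("user:bob", "table:bookings"),
    ("dashboard:weekly_kpis", "table:bookings"),
    ("user:bob", "table:listings"),
]

scores = pagerank(EDGES)
ranked = sorted(scores, key=scores.get, reverse=True)
for node in ranked:
    print(f"{node}: {scores[node]:.3f}")
```

In this toy graph the table consumed by the most nodes receives the highest score. A search layer could use such a score as a tie-breaker or boost when several resources match the same query.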
Frontend #
Airbnb took pains to ensure that the frontend provided an intuitive, frictionless experience so it would be usable by people of all data literacy levels. The journey began by interviewing employees across the company and creating a range of user personas spanning data knowledge and use cases. Airbnb stressed that the UI had to be free of bugs to build trust so that employees wouldn’t hesitate to use it in their daily workflows.
Personas used to design and build Dataportal. Source: neo4j
In terms of technologies, the frontend employs: #
- ES6, NPM for application and dependencies
- React for DOM (Document Object Model)
- Redux for application state
- Khan/Aphrodite for styling
- ESLint, Enzyme, Mocha, and Chai for testing
What are the features of the Airbnb data catalog? #
Airbnb describes Dataportal as having four primary features:
- Search
- Context and metadata
- Employee-centric data
- Team-centric data
Search #
The Airbnb data catalog features a clean, minimalist design for clarity, since the data itself is already complex. Engineers designed the search experience to mimic that of Google, allowing users to quickly surface the information they need. Search is also intentionally designed to be fast, because a laggy search discourages exploration.
Metadata search and discovery on Dataportal. Source: GraphConnect Europe 2017
Context and metadata #
The Airbnb data catalog doesn’t just display data; it provides context in the form of metadata so users can understand:
- Who created the data
- Who consumed it
- When it was last updated
Users can even trace data lineage by exploring parent and child tables in order to understand relationships between data sets.
Data relationships, consumption intelligence and data description in Dataportal. Source: GraphConnect Europe 2017
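As a rough illustration of the metadata and lineage described above, the following sketch models a table with its creator, consumers, last-updated date, and parent tables, then walks the parent links to find upstream and downstream tables. The `TableNode` class, the `ancestors` and `children` helpers, and all table and user names are hypothetical; Dataportal stores this information in a graph database, whereas the sketch uses plain in-memory Python objects.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TableNode:
    """Hypothetical catalog entry holding context about one table."""
    name: str
    created_by: str
    last_updated: date
    consumers: set = field(default_factory=set)
    parents: set = field(default_factory=set)  # names of upstream tables


def ancestors(catalog, name):
    """Return every upstream table reachable from `name` through parent links."""
    seen = set()
    stack = list(catalog[name].parents)
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(catalog[current].parents)
    return seen


def children(catalog, name):
    """Return the tables that list `name` as a direct parent."""
    return {table.name for table in catalog.values() if name in table.parents}


# Illustrative data only; none of these tables or users come from the source.
catalog = {
    table.name: table
    for table in [
        TableNode("raw_events", "alice", date(2017, 5, 1)),
        TableNode("listings", "bob", date(2017, 5, 2), consumers={"carol"}),
        TableNode("sessions", "alice", date(2017, 5, 3), parents={"raw_events"}),
        TableNode(
            "daily_bookings",
            "carol",
            date(2017, 5, 4),
            consumers={"alice", "bob"},
            parents={"sessions", "listings"},
        ),
    ]
}

print(sorted(ancestors(catalog, "daily_bookings")))
print(sorted(children(catalog, "raw_events")))
```

Keeping only parent links on each node and deriving children by lookup avoids storing the same relationship twice. A graph database makes both directions cheap to query, which is one reason a graph model suits this kind of catalog.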
Employee-centric data #
Airbnb believes that employees (including ex-employees) are the ultimate holders of tribal knowledge so it’s important to provide information about them, the data they’ve created, and the data they’ve consumed. All employees have access to the pages of other employees, promoting greater transparency and trust within the company.
Team-centric data #
Teams have tables they query regularly, dashboards they review, and metrics they’ve defined. The Airbnb data catalog allows users to search this team info so users (both inside and outside the team) can reference it rather than chasing down someone on the team.
Can’t build your own Dataportal? Consider Atlan #
When Airbnb understood it had a problem getting employees to use data in their normal workflow, the company embarked on a mission to build a data catalog from scratch that would enable greater data democratization. By assembling the right data scientists and engineers, the company succeeded in creating a tool that’s effective and simple to use.
However, not all organizations have the resources to build a data catalog from the ground up. Luckily, they don’t have to.
If you are a data consumer or producer weighing build-versus-buy options and want your organization to get full value from a modern data stack, it’s worth looking at off-the-shelf alternatives like Atlan, an easy-to-integrate, modern data catalog designed for data-driven teams to discover, understand, trust, and collaborate on data assets.
How organizations are making the most of their data using Atlan #
The recently published Forrester Wave report compared all the major enterprise data catalogs and positioned Atlan as the market leader ahead of all others. The comparison was based on 24 different aspects of cataloging, broadly across the following three criteria:
- Automatic cataloging of the entire technology, data, and AI ecosystem
- Enabling the data ecosystem with an AI- and automation-first approach
- Prioritizing data democratization and self-service
These criteria made Atlan the ideal choice for a major audio content platform, where the data ecosystem was centered around Snowflake. The platform sought a “one-stop shop for governance and discovery,” and Atlan played a crucial role in ensuring their data was “understandable, reliable, high-quality, and discoverable.”
For another organization, Aliaxis, which also uses Snowflake as its core data platform, Atlan served as “a bridge” between various tools and technologies across the data ecosystem. With its organization-wide business glossary, Atlan became the go-to platform for finding, accessing, and using data. It also significantly reduced the time spent by data engineers and analysts on pipeline debugging and troubleshooting.
A key goal of Atlan is to help organizations maximize the use of their data for AI use cases. As generative AI capabilities have advanced in recent years, organizations can now do more with both structured and unstructured data—provided it is discoverable and trustworthy, or in other words, AI-ready.
Tide’s Story of GDPR Compliance: Embedding Privacy into Automated Processes #
- Tide, a UK-based digital bank with nearly 500,000 small business customers, sought to improve their compliance with GDPR’s Right to Erasure, commonly known as the “Right to be forgotten”.
- After adopting Atlan as their metadata platform, Tide’s data and legal teams collaborated to define personally identifiable information in order to propagate those definitions and tags across their data estate.
- Tide used Atlan Playbooks (rule-based bulk automations) to automatically identify, tag, and secure personal data, turning a 50-day manual process into mere hours of work.
Book your personalized demo today to find out how Atlan can help your organization in establishing and scaling data governance programs.
FAQs about Airbnb Data Catalog #
1. What is the Airbnb Data Catalog and how does it function? #
The Airbnb Data Catalog, known as Dataportal, is a centralized tool that enables employees to discover and access data easily. It provides context and metadata, allowing users to understand the data’s origin and relevance, thus promoting data democratization within the organization.
2. How can I leverage the Airbnb Data Catalog for better insights into my rental property? #
Dataportal is an internal catalog built for Airbnb employees, so it is not a tool that hosts can use to analyze their rental properties. Within Airbnb, it helps employees find data tables, dashboards, reports, and metrics along with the context needed to use them.
3. What types of data are available in the Airbnb Data Catalog? #
Dataportal catalogs Airbnb’s internal data resources, such as data tables, dashboards, reports, metrics, and definitions. It also includes pages for employees and teams that show the data they have created and consumed.
4. How does the Airbnb Data Catalog support hosts in optimizing their listings? #
Dataportal was built for Airbnb employees rather than hosts, so it does not provide host-facing insights. Its role is to help internal teams discover, trust, and use data in their daily work.
5. What are the benefits of using the Airbnb Data Catalog for market analysis? #
Dataportal is not a market analysis product. It helps Airbnb employees find and understand internal data by pairing search with context and metadata, such as who created a data set, who consumes it, and when it was last updated.
6. How can I access the Airbnb Data Catalog for research purposes? #
Dataportal is an internal Airbnb tool built for the company’s employees, not a public product. Researchers interested in its design can refer to the material Airbnb has published about it, including the article and the GraphConnect Europe 2017 talk referenced above.
Airbnb Dataportal: Related reads #
- Lyft Amundsen data catalog: open source data discovery tool.
- LinkedIn DataHub: Open-source tool for data discovery, catalog, and metadata management
- Open source data catalog software: 5 popular tools to consider in 2025
- What Is a Data Catalog? & Do You Need One?
- Best Alation Alternative: 5 Reasons Why Customers Choose Atlan
- Open Source Data Catalog - List of 6 Popular Tools to Consider in 2025
- Apache Atlas: Origins, Architecture, Capabilities, Installation, Alternatives & Comparison
- Netflix Metacat: Origin, Architecture, Features & More
- DataHub: LinkedIn’s Open-Source Tool for Data Discovery, Catalog, and Metadata Management
- Amundsen Demo: Explore Amundsen in a Pre-configured Sandbox Environment
- Amundsen Set Up Tutorial: A Step-by-Step Installation Guide Using Docker
- DataHub Set Up Tutorial: A Step-by-Step Installation Guide Using Docker
- How to Install Apache Atlas?: A Step-by-Step Setup Guide
- How To Set Up Okta OIDC Authentication in Amundsen
- Amundsen Data Lineage - How to Set Up Column level Lineage Using dbt
- Amundsen vs. DataHub: Which Data Discovery Tool Should You Choose?
- Amundsen vs. Atlas: Which Data Discovery Tool Should You Choose?
- Databook: Uber’s In-house Metadata Catalog Powering Scalable Data Discovery
- Step-By-Step Guide to Configure and Set up Amundsen on AWS
- Amundsen Alternatives – DataHub, Metacat, and Apache Atlas
- Lexikon: Spotify’s Efficient Solution For Data Discovery And What You Can Learn From It
- OpenMetadata: Design Principles, Architecture, Applications & More
- OpenMetadata vs. DataHub: Compare Architecture, Capabilities, Integrations & More
- OpenMetadata vs. Amundsen: Compare Architecture, Capabilities, Integrations & More
- Amundsen Data Catalog: Understanding Architecture, Features, Ways to Install & More
- Open Data Discovery: An Overview of Features, Architecture, and Resources
- Atlan vs. DataHub: Which Tool Offers Better Collaboration and Governance Features?
- Atlan vs Amundsen: A Comprehensive Comparison of Features, Integration, Ease of Use, Governance, and Cost for Deployment
- Pinterest Querybook 101: A Step-by-Step Tutorial and Explainer for Mastering the Platform’s Analytics Tool
- Marquez by WeWork: Origin, Architecture, Features & Use Cases for 2025
- Atlan vs. Apache Atlas: What to Consider When Evaluating?
- Build vs. Buy Data Catalog: What Should Factor Into Your Decision Making?
- Magda Data Catalog: An Ultimate Guide on This Open-Source, Federated Catalog
- OpenMetadata vs. OpenLineage: Primary Capabilities, Architecture & More
- OpenMetadata Ingestion Framework, Workflows, Connectors & More
- 6 Steps to Set Up OpenMetadata: A Hands-On Guide
- Apache Atlas Alternatives: Amundsen, DataHub, and Metacat
- Guide to Setting up OpenDataDiscovery