6 Steps to Set Up OpenMetadata: A Hands-On Guide
Share this article
How to set up OpenMetadata?
To simplify the process of setting up OpenMetadata, we’ve broken down the process into the following six steps:
Step 1: Taking stock of requirements to install OpenMetadata
Step 2: Understanding OpenMetadata architecture
Step 3: Downloading the Docker Compose YAML file
Step 4: Running the containers using Docker Compose
Step 5: Verifying if all the relevant services are operational
Step 6: Loading sample data by running Airflow DAGs
Once you’re done loading the data in OpenMetadata, you can explore OpenMetadata features. In this article, you’ll go through an example of data lineage of dimension tables from a sample Shopify data model.
Table of contents
- How to set up OpenMetadata?
- Understand OpenMetadata architecture
- Download the Docker Compose YAML file
- Run the containers using Docker Compose
- Verify if all the relevant services are operational
- Load sample data by running Airflow DAGs
- Explore OpenMetadata
- Related reads
Step 1. Prerequisites
Although you can install OpenMetadata using several different methods, for simplicity’s sake, this article will use the Docker method:
- Docker Engine (>= v20.10.0) — you can verify your Docker Engine version with
- Docker Compose (>= v2.1.1) — you can verify your Docker Compose version with
docker compose version
This tutorial was run on MacOS, but you can run Docker Engine on any supported OS, and the installation process will be the same.
Step 2. Understand OpenMetadata architecture
OpenMetadata is driven by a DropWizard-powered REST API that serves as the backbone of all internal and external communications with systems like metadata sources, the ingestion framework, the UI, the backend database, and the search engine.
The OpenMetadata UI is powered by the API, a MySQL database that stores all the metadata, and an Elasticsearch instance that makes the metadata across the business searchable and discoverable.
OpenMetadata only supports pull-based ingestion, but it supports both push and pull-based metadata consumption. Metadata ingestion in OpenMetadata is facilitated by a source-agnostic ingestion framework written in Python. To learn more, go to the OpenMetadata blog, where the engineering team discusses how they built the ingestion framework.
Step 3. Download the Docker Compose YAML file
Now, download the Docker Compose file from the OpenMetadata GitHub repository using the following command:
You can locate the Docker Compose file in the project assets listed on the GitHub Releases page, as shown in the image below:
Step 4. Run the containers using Docker Compose
Spin up all the containers defined in the Docker Compose file using the command below:
docker compose -f docker-compose.yml up - detach
This might take a few seconds to a couple of minutes to run. Here’s what you’ll see on your terminal screen when Docker Compose is doing its magic:
If you want to see detailed logs of what’s happening during the installation, you can go to the Docker Desktop application, as shown in the image below:
Step 5. Verify if all the relevant services are operational
Once the installation is finished, you can execute a
docker ps command to see the status of the different containers that are operational, as shown in the image below:
If all the services are running okay, you should be able to log into both the OpenMetadata UI and the Airflow UI using the same username and password combo, which is
admin, as shown in the sections below.
Step 5.1. Log into OpenMetadata
The OpenMetadata frontend is hosted on port
8585 by default, so you can go to
localhost:8585/login to log into OpenMetadata, as shown in the image below:
After successfully logging into OpenMetadata, you’ll land on the following page:
Step 5.2. Log into Airflow
You can log into Airflow by going to
localhost:8080 and entering the default username and password mentioned at the beginning of this section, as shown in the image below:
Once that’s done, you’ll be ready to load sample data into OpenMetadata using one of the pre-configured Airflow DAGs.
Step 6. Load sample data by running Airflow DAGs
In this guide, we’ll be loading pre-packaged sample data into OpenMetadata. The sample data is a dimensional model for an e-commerce website called Shopify. All the default DAGs are shown in the image below; you have to enable and run the
sample_data DAG first:
Once the DAG is run, you can run the
lineage_tutorial_operator, which will fetch lineage metadata for that dimensional model into OpenMetadata.
After you have some metadata ingested into the data catalog, you can have a look at what’s been loaded, i.e., the Shopify dimensional model, as shown below:
You can also explore the object and column-level lineage using the Lineage view for any given table in the dimensional model:
This article took you through the step-by-step process of setting up OpenMetadata using Docker. It also introduced you to OpenMetadata’s architecture, ingestion framework, and the sample dimensional data used while exploring OpenMetadata.
Although the Docker-powered OpenMetadata deployment is suitable for playing around and running small-scale workloads, for production deployment, you should go with a more flexible and scalable solution powered by Kubernetes. Learn more about deploying OpenMetadata in production from the official deployment guides.
OpenMetadata installation: Related reads
- OpenMetadata Ingestion : Framework, Workflows, Connectors & More
- OpenMetadata : Design Principles, Architecture, Applications & More
- OpenMetadata and dbt : For an Integrated View of Data Assets
- OpenMetadata vs. Amundsen : Compare Architecture, Capabilities, Integrations & More
- OpenMetadata vs. DataHub : Compare Architecture, Capabilities, Integrations & More
- Open Source Data Catalog - List of 6 Popular Tools to Consider in 2023
Share this article