A Guide to Configure and Set up Amundsen on GCP (Google Cloud Platform)

Published on: May 14th, 2022, Last Updated On: January 30th, 2023

Coping with diverse data sources and structures is often grueling for data engineering teams, and finding what’s in a data warehouse or a data lake is even harder for business and analytics teams. Metadata catalogs and data discovery engines help address both problems. Amundsen is one such open-source metadata catalog. It helps you get more out of your data.


This step-by-step guide will take you through setting up Amundsen on GCP (Google Cloud Platform) using Docker. You’ll be using Amundsen with its default database backend, Neo4j. Alternatively, you can use Apache Atlas as the backend.

Seven steps to set up Amundsen on GCP #

  1. Create a Google Cloud VM
  2. Configure networking to enable public access to Amundsen
  3. Log in to the Cloud VM with Cloud Shell and install Git
  4. Install Docker and Docker Compose on your GCP VM
  5. Clone the Amundsen GitHub repository
  6. Deploy Amundsen using Docker Compose on GCP
  7. Load sample data using Databuilder

Step 1: Create a Google Cloud VM #

Start by logging into your Google Cloud account. You’ll be installing Amundsen on a fresh instance, so go ahead and spin up a new VM instance, as shown in the image below:

Launch a new GCP Cloud VM instance
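
If you prefer the command line to the console, the same VM can be created with the gcloud CLI. The following is a minimal sketch, not the configuration used in the screenshots: the instance name, zone, machine type, image family, and disk size are all assumptions chosen for illustration. The script only prints the command unless you set APPLY=1.

```shell
# Hypothetical values; replace them with your own.
INSTANCE_NAME="amundsen-vm"
ZONE="us-central1-a"
MACHINE_TYPE="e2-standard-2"

# Assemble the command in the positional parameters so it can be reviewed first.
set -- gcloud compute instances create "$INSTANCE_NAME" \
  --zone="$ZONE" \
  --machine-type="$MACHINE_TYPE" \
  --image-family=debian-12 \
  --image-project=debian-cloud \
  --boot-disk-size=30GB

# Print the command; run it only when APPLY=1 is set.
echo "$*"
if [ "${APPLY:-0}" = "1" ]; then
  "$@"
fi
```

Run it once as-is to review the command, then again with APPLY=1 in the environment to create the instance.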

Once you finish configuring your instance, you’ll be able to get the instance details by clicking on the named link from the list of instances, as shown in the image:

View GCP Cloud VM instance details

Following the link takes you to the instance details page, which lists information such as the Instance Id, Zone, and Machine type.

View information about the GCP Cloud VM instance

Step 2: Configure networking to enable public access to Amundsen #

Set inbound & outbound traffic rules #

As this guide concentrates on getting you started with Amundsen, you can allow all ingress and egress traffic on the VPC network associated with your VM. Allowing all traffic is not recommended for production. To enable it, go to the VPC network and select the Firewall option in the left panel to see the list of firewall rules. There might be a few rules already present. Remove those rules and add the two rules shown in the image below:

Set inbound & outbound traffic rules
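
The two rules can also be sketched with the gcloud CLI. The exact rules from the screenshot are not reproduced here; this sketch assumes they are an allow-all ingress rule and an allow-all egress rule on a network named default. The rule names and the preview_or_run helper are hypothetical. Commands are only printed unless APPLY=1 is set.

```shell
# Assumption: the VM is attached to the "default" VPC network.
NETWORK="default"

# Hypothetical helper: print a command, and run it only when APPLY=1.
preview_or_run() {
  echo "$*"
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  fi
}

# Allow all inbound traffic from any source (not suitable for production).
preview_or_run gcloud compute firewall-rules create allow-all-ingress \
  --network="$NETWORK" --direction=INGRESS --action=ALLOW \
  --rules=all --source-ranges=0.0.0.0/0

# Allow all outbound traffic to any destination.
preview_or_run gcloud compute firewall-rules create allow-all-egress \
  --network="$NETWORK" --direction=EGRESS --action=ALLOW \
  --rules=all --destination-ranges=0.0.0.0/0
```

For anything beyond a trial setup, a narrower rule such as --rules=tcp:5000 with a restricted --source-ranges value would expose only the port tested later in this guide.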

Verify networking details #

Use the kebab (three vertical dots) menu for your instance to navigate to the View network details option, as shown in the image below:

Verify networking details

Analyze the network using connectivity tests #

As you’ve allowed traffic on all ports, reaching the VM from the internet will not be a problem. However, if you decide to limit incoming and outgoing traffic, you can navigate to the VPC network and create a connectivity test in the Network analysis section, as shown below:

Create a connectivity test in the Network analysis

In the above example, the source is a random public IP from the internet, and the test checks whether requests from that IP can reach your VM on port 5000. You can view the result summary under the Last configuration analysis result column. You can also view details of the connectivity test by clicking on the VIEW link, as shown in the image below:

View connectivity test results
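
A comparable connectivity test can be sketched from the command line. Everything named below is an assumption for illustration: the test name, project ID, zone, and instance name are hypothetical, and the source address comes from a range reserved for documentation. Only port 5000 and the TCP protocol follow the example above. The command is printed rather than run unless APPLY=1 is set.

```shell
# Hypothetical values; replace them with your own.
TEST_NAME="amundsen-port-5000"
PROJECT_ID="my-project"
ZONE="us-central1-a"
INSTANCE_NAME="amundsen-vm"
SOURCE_IP="203.0.113.10"   # documentation-only address, stands in for a public IP

# Assemble the command so it can be reviewed before it is run.
set -- gcloud network-management connectivity-tests create "$TEST_NAME" \
  --source-ip-address="$SOURCE_IP" \
  --destination-instance="projects/$PROJECT_ID/zones/$ZONE/instances/$INSTANCE_NAME" \
  --destination-port=5000 \
  --protocol=TCP

# Print the command; run it only when APPLY=1 is set.
echo "$*"
if [ "${APPLY:-0}" = "1" ]; then
  "$@"
fi
```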

Step 3: Log in to Google Cloud VM and install Git #

Connect to the Google Cloud VM #

There are several ways to interact with your VM. The simplest is to use Cloud Shell or the browser-based SSH session, as neither requires you to set passwords or manage SSH keys yourself. You can connect using SSH in your browser by clicking the SSH link or by using one of the options in the dropdown shown in the image below:

Connect to the GCP Cloud VM

Google Cloud creates temporary SSH keys and transfers them to your VM, enabling you to access it:

Transfer SSH keys to VM

Once Google Cloud transfers your SSH keys to the VM, you will land at the following screen inside your VM:

Accessing your GCP cloud VM using CLI
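
If you would rather connect from a terminal where the gcloud CLI is installed and authenticated, the following sketch shows the equivalent command. The instance name and zone are hypothetical placeholders. The command is printed for review and only opens a session when APPLY=1 is set.

```shell
# Hypothetical values; use the name and zone from your instance details page.
INSTANCE_NAME="amundsen-vm"
ZONE="us-central1-a"

# Assemble the SSH command so it can be reviewed first.
set -- gcloud compute ssh "$INSTANCE_NAME" --zone="$ZONE"

# Print the command; open the session only when APPLY=1 is set.
echo "$*"
if [ "${APPLY:-0}" = "1" ]; then
  "$@"
fi
```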

Install Git #

As this is a completely fresh VM, it won’t have many of the standard tools you might expect, so you first need to install Git. Refresh the apt package index and then install Git using the following commands:

$ sudo apt update
$ sudo apt install git

Verify that Git has been installed correctly by checking its version.
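
One way to run that check is sketched below. It guards against Git being absent from the PATH; the GIT_VERSION variable name is only illustrative.

```shell
# Check whether git is on the PATH before asking for its version.
if command -v git >/dev/null 2>&1; then
  GIT_VERSION="$(git --version)"
  echo "Installed: $GIT_VERSION"
else
  echo "git not found on PATH; re-run the apt install step" >&2
fi
```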

Step 4: Install Docker and Docker Compose on your GCP VM #

Install Docker Engine #

Amundsen runs as a set of Docker containers, so you first need to install Docker Engine on your Google Cloud VM. As Amundsen is a multi-container application, you will also need Docker Compose. Start by refreshing the apt package index, installing the prerequisite packages, and downloading Docker’s GPG key using the following commands:

$ sudo apt-get update
$ sudo apt-get install ca-certificates curl gnupg lsb-release
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg