Guide To Set Up OIDC Authentication in Amundsen

May 18th, 2022

As data security is a rising concern for data platform architects and engineers, you need to enforce the highest level of security for your data infrastructure, be it a BI tool, an internal application, or a data cataloging tool, such as Amundsen.

Passwords have become increasingly unsafe, and taking a ZTA (zero-trust architecture) route seems the right way to go for user authentication. To enable ZTA, you can use a passwordless authentication system based on OpenID Connect (OIDC), an identity layer built on top of the OAuth 2.0 authorization framework. Popular identity providers, such as Okta, Google, Auth0, and PingIdentity, all support and promote this form of user authentication.

This article will take you through the general steps required to set up passwordless authentication using the OIDC protocol, so that you can use Amundsen securely while maintaining a zero-trust architecture.

Five steps to set up OIDC for Amundsen

  1. Set up Amundsen
  2. Create an OIDC app using your OIDC provider
  3. Configure Amundsen to use OIDC auth
  4. Rebuild the frontend and deploy
  5. Log into Amundsen using OIDC auth

Step 1: Set up Amundsen

Clone Git repository

If you haven’t deployed Amundsen already, setting it up on your local machine or a cloud provider is very easy. You can use the official Docker Compose configuration file in the following repository to install Amundsen:

$ git clone --recursive https://github.com/amundsen-io/amundsen.git

Deploy Amundsen using Docker

You need Docker and Docker Compose installed on your instance to deploy Amundsen. You can find a more detailed overview of the installation process on our blog. Once the installation completes, you can run the following command from the cloned repository to spin up Amundsen:

$ docker-compose -f docker-amundsen-atlas.yml up

Stop all containers

Amundsen doesn’t come with an authentication mechanism of its own. To integrate a third-party one, you’ll need to shut all the containers down and make some changes to the frontend service and a couple of configuration files, which we’ll discuss shortly. After verifying the initial Docker Compose run, you can shut down all containers with the following command:

$ docker stop $(docker ps -q)

Step 2: Create an OIDC app using your OIDC provider

Before you can make changes to the frontend service and the related configuration files, you’ll need to decide on which identity provider you will use. OIDC is built on top of OAuth 2.0, so you’ll have to choose from one of the many OAuth 2.0 flows your identity provider supports.

Following that, you will have to create an application that redirects your authentication requests to the authorization server. For instance, if you’re using OIDC in Okta, you can create a SPA (single-page application), a web application, or a native application to do this. Each application type supports its own set of OAuth 2.0 flows.

This blog post on different OAuth grant types can better help you make that decision.
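To make one of these flows concrete, here is a minimal sketch of the first leg of the authorization code flow, which is commonly used for server-side web apps. FlaskOIDC performs this step for you, so the code is purely illustrative. The function name, the endpoint URL, and the scope list are assumptions made for this example, not values taken from Amundsen or from any particular provider.

```python
import secrets
from typing import Tuple
from urllib.parse import urlencode


def build_authorization_url(auth_endpoint: str, client_id: str, redirect_uri: str) -> Tuple[str, str]:
    """Return (url, state) for the first leg of the authorization code flow."""
    # "state" is an opaque random value. The app checks that the same value
    # comes back on the callback, which guards against CSRF.
    state = secrets.token_urlsafe(16)
    params = {
        "response_type": "code",          # ask for an authorization code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": "openid profile email",  # "openid" marks this as an OIDC request
        "state": state,
    }
    return f"{auth_endpoint}?{urlencode(params)}", state


# Placeholder values, for illustration only.
url, state = build_authorization_url(
    "https://your-app.oidc-provider.com/oauth2/default/v1/authorize",
    "client-id",
    "http://localhost:5000/authorization-code/callback",
)
print(url)
```

After the user signs in, the provider redirects the browser to the redirect URI with a short-lived code, which the application then exchanges for tokens at the provider's token endpoint.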

Step 3: Configure Amundsen to use OIDC auth

Install FlaskOIDC

FlaskOIDC is a wrapper over Flask, which is preconfigured with OIDC support. Using this package, you can directly set some environment variables to integrate your Amundsen frontend with your identity provider. To get started, you need to install FlaskOIDC using the following command:

$ pip3 install flaskoidc

Add FlaskOIDC support to Amundsen

Amundsen allows you to customize the Flask app by adding your own modules and classes. You just need to install the relevant Python libraries (as you’ve already done in the previous step) and set the related environment variables so that the Amundsen frontend service uses your class as its entry point. The following snippet from the frontend service reads variables such as FLASK_APP_MODULE_NAME and FLASK_APP_CLASS_NAME to complete this integration:

import importlib
import os
import sys

from flask import Flask

# For customized flask use below arguments to override.

FLASK_APP_MODULE_NAME = os.getenv('FLASK_APP_MODULE_NAME') or os.getenv('APP_WRAPPER')
FLASK_APP_CLASS_NAME = os.getenv('FLASK_APP_CLASS_NAME') or os.getenv('APP_WRAPPER_CLASS')
FLASK_APP_KWARGS_DICT_STR = os.getenv('FLASK_APP_KWARGS_DICT') or os.getenv('APP_WRAPPER_ARGS')

""" Support for importing a subclass of flask.Flask, via env variables """
if FLASK_APP_MODULE_NAME and FLASK_APP_CLASS_NAME:
    print('Using requested Flask module {module_name} and class {class_name}'
          .format(module_name=FLASK_APP_MODULE_NAME, class_name=FLASK_APP_CLASS_NAME), file=sys.stderr)
    moduleName = FLASK_APP_MODULE_NAME
    module = importlib.import_module(moduleName)
    moduleClass = FLASK_APP_CLASS_NAME
    app_wrapper_class = getattr(module, moduleClass)  # type: ignore
else:
    app_wrapper_class = Flask

To point the Flask entry point at FlaskOIDC, set the variables as shown in the snippet below:

$ export FLASK_APP_MODULE_NAME=flaskoidc
$ export FLASK_APP_CLASS_NAME=FlaskOIDC

# This is needed to invoke OidcConfig module ./amundsen_application/wsgi.py 
$ export FRONTEND_SVC_CONFIG_MODULE_CLASS='amundsen_application.oidc_config.OidcConfig'

# This is needed to set flaskoidc for ./amundsen_application/__init__.py
$ export APP_WRAPPER=flaskoidc
$ export APP_WRAPPER_CLASS=FlaskOIDC
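Before starting the frontend, it can help to confirm that the module and class names you exported actually resolve. The sketch below mirrors the lookup logic from the frontend snippet shown earlier, but as a standalone function. The function name resolve_wrapper_class is hypothetical, and the demo uses a standard-library class as a stand-in so that it runs without Flask or flaskoidc installed.

```python
import importlib
from typing import Mapping, Optional


def resolve_wrapper_class(env: Mapping[str, str], default: type) -> type:
    """Resolve a class from module/class names in env, falling back to default."""
    module_name: Optional[str] = env.get("FLASK_APP_MODULE_NAME") or env.get("APP_WRAPPER")
    class_name: Optional[str] = env.get("FLASK_APP_CLASS_NAME") or env.get("APP_WRAPPER_CLASS")
    if module_name and class_name:
        module = importlib.import_module(module_name)  # ImportError if the package is not installed
        return getattr(module, class_name)             # AttributeError if the class name is wrong
    return default


# Stand-in values: "collections.OrderedDict" plays the role of "flaskoidc.FlaskOIDC".
demo_env = {"FLASK_APP_MODULE_NAME": "collections", "FLASK_APP_CLASS_NAME": "OrderedDict"}
print(resolve_wrapper_class(demo_env, dict))
print(resolve_wrapper_class({}, dict))
```

In a real check you would pass os.environ and Flask as the default. An ImportError then tells you that flaskoidc is not installed in the environment the frontend runs in.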

Set OIDC identity provider app credentials

The next step is to set the variables required to enable Amundsen to go to the identity service for authentication. You can get these credentials from the identity provider app you just created in the previous section. Here are some of the other variables that you would need to set based on your OIDC identity provider:

$ export FLASK_OIDC_PROVIDER_NAME='oidc-provider'
$ export FLASK_OIDC_CLIENT_ID='client-id'
$ export FLASK_OIDC_CLIENT_SECRET='client-secret'
$ export FLASK_OIDC_SECRET_KEY='base-flask-oidc-secret-key'
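A missing or empty variable is an easy mistake to make at this stage. The following sketch checks that the four variables listed above are present before you start the frontend. The helper name missing_oidc_vars is hypothetical. Generating the secret key with secrets.token_hex is a suggestion that assumes the key is used as a random application secret; it is not something FlaskOIDC requires.

```python
import os
import secrets
from typing import List, Mapping

REQUIRED_VARS = (
    "FLASK_OIDC_PROVIDER_NAME",
    "FLASK_OIDC_CLIENT_ID",
    "FLASK_OIDC_CLIENT_SECRET",
    "FLASK_OIDC_SECRET_KEY",
)


def missing_oidc_vars(env: Mapping[str, str]) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_oidc_vars(os.environ)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All OIDC variables are set.")
    # A random 32-byte, hex-encoded value you could use instead of a hand-typed key.
    print("Example secret key:", secrets.token_hex(32))
```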

Route Amundsen to your OIDC identity provider

Now that you have FlaskOIDC and the credentials ready, let’s look at the variables you need to set to redirect Amundsen to your chosen OIDC identity provider. You need to set the provider’s configuration URL and the redirect URI of your app:

$ export FLASK_OIDC_CONFIG_URL='https://app-name.oidc-provider.com/.well-known/openid-configuration'
$ export FLASK_OIDC_REDIRECT_URI='https://localhost:5000/authorization-code/callback'
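The configuration URL points at the provider's OIDC discovery document, a JSON file that lists its endpoints. If login fails early, checking that document is a quick sanity test. The sketch below is one way to do it, under stated assumptions: the helper names are hypothetical, and the field list is the set that the OpenID Connect Discovery specification marks as required, plus token_endpoint, which the authorization code flow needs.

```python
import json
from typing import Any, Dict, List
from urllib.request import urlopen

REQUIRED_FIELDS = (
    "issuer",
    "authorization_endpoint",
    "token_endpoint",
    "jwks_uri",
    "response_types_supported",
    "subject_types_supported",
    "id_token_signing_alg_values_supported",
)


def missing_discovery_fields(doc: Dict[str, Any]) -> List[str]:
    """Return required discovery fields that are absent from the document."""
    return [field for field in REQUIRED_FIELDS if field not in doc]


def fetch_discovery(config_url: str, timeout: float = 5.0) -> Dict[str, Any]:
    """Download and parse the discovery document (needs network access)."""
    with urlopen(config_url, timeout=timeout) as resp:
        return json.load(resp)


# Offline demo with a trimmed, made-up document.
sample = {
    "issuer": "https://app-name.oidc-provider.com",
    "authorization_endpoint": "https://app-name.oidc-provider.com/authorize",
    "token_endpoint": "https://app-name.oidc-provider.com/token",
}
print(missing_discovery_fields(sample))
```

Against a real provider, you would call missing_discovery_fields(fetch_discovery(url)) with the same URL you exported as FLASK_OIDC_CONFIG_URL. An empty list means the document has every field checked here.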

If you don’t want to use environment variables for storing credentials, you can keep them in a JSON file instead. The frontend service picks the file up when you set another variable, OIDC_CLIENT_SECRETS, to the path of the JSON file. The JSON would look something like the following:

{
  "web":{
    "issuer":"https://your-app.oidc-provider.com/oauth2/default",
    "auth_uri":"https://your-app.oidc-provider.com/oauth2/default/v1/authorize",
    "client_id":"client-id",
    "client_secret":"client-secret",
    "token_uri":"https://your-app.oidc-provider.com/oauth2/default/v1/token",
    "token_introspection_uri":"https://your-app.oidc-provider.com/oauth2/default/v1/introspect",
    "userinfo_uri":"https://your-app.oidc-provider.com/oauth2/default/v1/userinfo",
    "redirect_uris":[ 
      "http://localhost:5000/authorization-code/callback"
    ]
  }
}
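Hand-edited JSON files are prone to missing keys and mistyped hostnames. The following sketch lints a file with the shape shown above. The function name check_client_secrets and the list of expected keys are assumptions based on that example, and the host comparison is only a heuristic for catching typos, since a provider may legitimately serve endpoints from a different host than its issuer.

```python
import json
from typing import Any, Dict, List
from urllib.parse import urlsplit

EXPECTED_KEYS = (
    "issuer", "auth_uri", "client_id", "client_secret",
    "token_uri", "userinfo_uri", "redirect_uris",
)


def check_client_secrets(raw: str) -> List[str]:
    """Return human-readable problems found in a client-secrets JSON string."""
    web: Dict[str, Any] = json.loads(raw).get("web") or {}
    if not web:
        return ['top-level "web" object is missing or empty']
    problems = [f"missing key: {key}" for key in EXPECTED_KEYS if key not in web]
    if not isinstance(web.get("redirect_uris", []), list):
        problems.append("redirect_uris must be a list")
    # Heuristic typo check: flag endpoint hosts that differ from the issuer host.
    issuer_host = urlsplit(str(web.get("issuer", ""))).hostname
    for key, value in web.items():
        if key.endswith("_uri") and isinstance(value, str) and issuer_host:
            if urlsplit(value).hostname != issuer_host:
                problems.append(f"{key} host differs from issuer host")
    return problems


# Demo input with two deliberate mistakes: no userinfo_uri, and a mistyped token host.
sample = """
{"web": {
  "issuer": "https://your-app.oidc-provider.com/oauth2/default",
  "auth_uri": "https://your-app.oidc-provider.com/oauth2/default/v1/authorize",
  "client_id": "client-id",
  "client_secret": "client-secret",
  "token_uri": "https://your-app.oidc-provider.example/oauth2/default/v1/token",
  "redirect_uris": ["http://localhost:5000/authorization-code/callback"]
}}
"""
for problem in check_client_secrets(sample):
    print(problem)
```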

Step 4: Rebuild the frontend and deploy

Build static content

You can start building the front-end service after integrating FlaskOIDC with Amundsen’s front-end service and ensuring that Amundsen has all the secrets and configuration items in place. To do that, run the following set of commands to build the static content first:

$ cd amundsen/frontend/amundsen_application/static
$ npm install
$ npm run build
$ cd ../../

Build the Flask app

Now, run the following set of commands to rebuild the Flask app with your newly created OIDC integration:

$ python3 -m venv venv
$ source venv/bin/activate
$ pip3 install -e ".[all]" .

Start Amundsen frontend in standalone mode

Once this installation is complete, you can start the frontend service in standalone mode, and you can visit the Amundsen URL to check if everything’s been configured correctly. You can do that using the following command:

$ python3 amundsen_application/wsgi.py

Build and deploy Docker images after local changes

However, starting the Amundsen frontend in standalone mode won’t let you interact with Amundsen’s other services, such as search, metadata, etc. To do that, you’ll need to build your Docker images from the ground up and deploy them using the docker-amundsen-local.yml file, as shown below:

$ docker-compose -f docker-amundsen-local.yml build
$ docker-compose -f docker-amundsen-local.yml up -d

The official documentation has a couple of good suggestions when you are making changes to the frontend service and rebuilding the images using Docker:

  • Use the --no-cache flag when rebuilding the image to ensure that you aren’t using an old version of a Docker image.
  • Use the docker images command to verify your new images and delete old ones, so that your containers run the latest image.

Step 5: Log into Amundsen using OIDC auth

Amundsen will direct all your authentication requests to your OIDC-based identity provider if the Docker build and deployment are successful. Once Amundsen hears back from the identity provider, it will let you into Amundsen, and you’ll be able to use all the features of Amundsen.

Inspect the logs

To know more about what’s happening behind the scenes when you log into Amundsen using your OIDC-based identity provider, you can tail the logs of your Docker containers when building and deploying, using the following command:

$ docker-compose -f docker-amundsen-local.yml logs --tail=3 -f

You’ll be able to see the requests and responses from your OIDC identity provider in real-time.
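If an ID token shows up while you are debugging, you can inspect its claims locally. OIDC ID tokens are JWTs, so the payload is the second dot-separated segment, encoded as base64url JSON. The sketch below decodes that segment without verifying the signature, which makes it suitable for debugging only and never for making access decisions. The helper names and the fake token are made up for this example.

```python
import base64
import json
from typing import Any, Dict


def decode_jwt_payload(token: str) -> Dict[str, Any]:
    """Decode a JWT payload WITHOUT verifying the signature (debugging only)."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))


def _b64url(obj: Dict[str, Any]) -> str:
    """Encode a dict the way JWT segments are encoded (base64url, no padding)."""
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# A made-up, unsigned token built only for this demo.
fake_token = ".".join([
    _b64url({"alg": "none"}),
    _b64url({"iss": "https://your-app.oidc-provider.com", "sub": "user-123", "aud": "client-id"}),
    "",
])
print(decode_jwt_payload(fake_token))
```

The iss and aud claims are the ones to compare against your provider URL and client ID when a login is rejected.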

Conclusion

Amundsen doesn’t come with any authentication out of the box, but businesses must keep their data safe and secure. Even information about data, i.e., metadata alone, can make your system vulnerable to security breaches and attacks. To narrow the attack surface, you need to run Amundsen with a zero-trust architecture for authentication. This article took you through all the necessary steps to enable OIDC-based identity checks in Amundsen. Your Amundsen deployment should be safer now.


Amundsen demo: Get hands-on

This blog is part of a series in which we discuss the steps to set up Amundsen as a data catalog for your team. If you don't want to go through this entire process and would rather browse through the Amundsen experience quickly, we've set up a sample demo environment for you to explore.



If you are a data consumer or producer looking to help your organization get the most value from its modern data stack, and you are weighing your build vs buy options, it’s worth taking a look at off-the-shelf alternatives like Atlan, a data catalog and metadata management tool built for modern data teams.

"It would take six or seven people up to two years to build what Atlan gave us out of the box. We needed a solution on day zero, not in a year or two."

Akash Deep Verma

Director of Data Engineering

Delhivery: Leading fulfilment platform for digital commerce.

Build vs Buy: Delhivery’s Learnings from Implementing a Data Catalog
