Data Catalog for Microsoft SQL Server: How to Setup Guide

Emily Winks
Data Governance Expert
Published: 08/04/2023 | Updated: 08/05/2023
8 min read

Key takeaways

  • Understanding how to set up a data catalog for Microsoft SQL Server is key for modern data teams.
  • A structured approach helps organizations scale their data governance efforts.

Quick Answer: How do you set up a data catalog for Microsoft SQL Server?

Setting up a data catalog for Microsoft SQL Server involves connecting to your SQL Server instance, configuring metadata extraction, establishing data lineage, and enabling discovery for users. This guide covers prerequisites, connection setup, metadata syncing, and best practices for maintaining catalog accuracy.

Setup steps:

  • Prerequisites and requirements
  • Connection configuration guide
  • Metadata extraction setup
  • Lineage establishment process
  • Best practices for maintenance


Enterprises the world over entrust their data to Microsoft SQL Server. But some companies have dozens, hundreds, or even thousands of instances of the popular database software running. Monitoring them all via a data catalog ensures that data is tracked, classified, and appropriately secured.

In this article, we’ll show how to track data inside Microsoft SQL Server using a data catalog. We’ll discuss the prerequisite setup you need, how to connect your data catalog to SQL Server, and how to set up a data crawler to track data and monitor for changes.


How to set up a data catalog for Microsoft SQL Server: Prerequisites

Before setting up a data catalog integration with SQL Server, make sure you have the following in place:

  • A data catalog: We’ve written the steps below specifically for the Atlan data catalog. However, most of these steps are generalizable to any major data catalog on the market.
  • Data catalog rights: You must have an account (such as an administrator) that can create a new connector to an external data source, as well as create a new crawler.
  • Networking prerequisites: Your SQL Server instances need to be reachable from your data catalog. If either your data catalog or your SQL Server instances run in a private virtual network in the cloud, you may need to create a NAT gateway and outbound firewall rules to allow access to SQL Server.
  • Ensure SQL Server capacity: Crawling all of the data in a database regularly will result in additional read activity. That will consume SQL Server memory, networking bandwidth, and log space. Make sure you have sufficient capacity to support a full database scan from your data catalog.

You may consider setting up a dedicated read replica for this. If you use Azure SQL Database, read scale-out replicas are included in the Premium and Business Critical tiers.
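To verify the networking prerequisite before running a crawl, a quick TCP reachability check against the SQL Server port (1433 by default) can save debugging time later. Below is a minimal sketch using only Python's standard library; the host name is a placeholder for your environment.

```python
import socket

def is_reachable(host, port=1433, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (placeholder host): check whether the catalog host can reach SQL Server
# is_reachable("sqlserver.internal.example.com", 1433)
```

Run this from the machine (or network) where your data catalog's crawler executes, not from your laptop, since firewall rules may differ.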


Determine assets to crawl

Before starting, determine which assets in your SQL Server instances you want to crawl and track. Typical trackable assets include:

  • Databases
  • Schemas
  • Tables
  • Views
  • Columns
  • Routines

Most data catalogs will enable you to either specifically include or exclude what objects you want to track.

By default, it’s a good practice to track everything. However, you may decide to exclude specific sensitive data from tracking or exclude derivative data to make crawling more efficient.
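Conceptually, include/exclude filtering works like the sketch below. This is illustrative only, not Atlan's actual implementation: given optional include and exclude sets of asset names, it decides whether an asset should be crawled.

```python
def should_crawl(asset, include=None, exclude=None):
    """Decide whether to crawl an asset.

    If an include set is given, only those assets are crawled;
    the exclude set always wins over the include set.
    """
    if exclude and asset in exclude:
        return False
    if include is not None:
        return asset in include
    return True  # default: track everything

# Track everything except a sensitive schema:
# should_crawl("sales.customers", exclude={"hr.salaries"})  -> True
# should_crawl("hr.salaries", exclude={"hr.salaries"})      -> False
```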


Determine connection type

There are a couple of ways to catalog data in SQL Server:

  • Direct method. Your data catalog crawls SQL Server directly by establishing a connection to the server and reading from it like any other user or application.
  • Indirect method. In this method, you dump your SQL Server data to a data store - e.g., an Azure storage account or an Amazon S3 bucket - and import it from this secondary source.

There are several reasons why you might use the indirect method:

  • You already source all of your data catalog data from a single source (e.g., a data lake).
  • You have multiple applications accessing this data and need to reduce read activity on your SQL Server instances.
  • You need to conceal certain data. If your SQL Server instance contains sensitive data, such as customers' Personally Identifiable Information (PII), you can redact it as part of the extraction and loading process.
  • Your database is running in an environment (e.g., on-premises) that makes a direct connection from your data catalog impossible.
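To make the redaction point concrete, here is an illustrative Python sketch (not part of any specific tool; the column names are hypothetical) that masks PII columns while exporting rows to CSV for the indirect method:

```python
import csv
import io

def export_with_redaction(rows, pii_columns, fieldnames):
    """Write rows to CSV text, replacing values in PII columns with a mask."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        # Mask any column flagged as PII before it leaves the database tier
        masked = {k: ("***REDACTED***" if k in pii_columns else v)
                  for k, v in row.items()}
        writer.writerow(masked)
    return buf.getvalue()

rows = [{"id": 1, "email": "a@example.com", "region": "EU"}]
csv_text = export_with_redaction(rows, {"email"}, ["id", "email", "region"])
```

The same masking step could run inside whatever export job writes your dump to Azure storage or S3, so the raw PII never reaches the secondary store.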

Atlan supports either a direct connection or import from Amazon S3. We’ll discuss both methods in detail below.


Set up cloud storage export for the indirect method

If you’re using the indirect method, you’ll need to set up an export job for your SQL Server data first. How you do this depends on a couple of factors, including:

  • Where you host SQL Server and what import/export tools you have available
  • Your data destination

Common patterns include exporting to an Azure storage account using native tooling such as bcp or SQL Server Integration Services, or exporting to an Amazon S3 bucket using a service such as AWS Database Migration Service.

Atlan metadata extractor

If you’re using Atlan as your data catalog, we also offer a metadata extractor that runs as a Docker container. The metadata extractor supports crawling data from Microsoft SQL Server, MySQL, Oracle, and PostgreSQL.


Set up SQL Server for data catalog connector access

For direct extraction, you’ll need to create a connector from your data catalog and specify how to authenticate.

Microsoft SQL Server supports two modes of authentication:

  • Windows authentication, which uses an Active Directory identity
  • Mixed mode authentication, which supports Windows authentication but also accepts SQL Server logins with a separate username and password

Atlan supports username/password authentication to Microsoft SQL Server. So, for a data catalog like Atlan, you would connect to your SQL Server instance and create a new user with the CREATE LOGIN Transact-SQL command:


CREATE LOGIN <login_name> WITH PASSWORD = '<password>';

Replace login_name with a unique name in your server and password with a strong password.

Next, create a user for your login with the CREATE USER command. This step is necessary because SQL Server separates server-level logins from database-level users: the login authenticates to the server, while the user grants access within a specific database. Run this command in each database you plan to catalog.


CREATE USER <username> FOR LOGIN <login_name>;

If you need to change the login’s password later, use the ALTER LOGIN command (Note: you’ll need to be logged in as a user with the ALTER ANY LOGIN permission):


ALTER LOGIN <login_name> WITH PASSWORD = '<enterStrongPasswordHere>';

Next, grant permissions for your user to access all of the databases you will catalog with the GRANT command. Run it in the context of each database (for example, after USE <database_name>), replacing the values below with the ones specific to your environment:


GRANT SELECT ON DATABASE::<database_name> TO <username>;

For Atlan, you’ll need to repeat this command for every database in your SQL Server instance except the system databases (master, tempdb, msdb, model). If you don’t, the crawler will fail to run properly.
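Because the GRANT has to be repeated for every non-system database, it can help to script the statements. This hypothetical Python sketch (the username is a placeholder) generates the commands for a list of database names, skipping the system databases:

```python
SYSTEM_DATABASES = {"master", "tempdb", "msdb", "model"}

def grant_statements(databases, username):
    """Generate per-database GRANT SELECT statements, skipping system DBs."""
    return [
        f"GRANT SELECT ON DATABASE::{db} TO {username};"
        for db in databases
        if db not in SYSTEM_DATABASES
    ]

stmts = grant_statements(["master", "sales", "hr"], "atlan_crawler")
# stmts -> ["GRANT SELECT ON DATABASE::sales TO atlan_crawler;",
#           "GRANT SELECT ON DATABASE::hr TO atlan_crawler;"]
```

You could feed the generated statements to sqlcmd or your preferred client, running each one in the context of its target database.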

You’ll also need to grant permissions to any views on which you want to track data lineage:


GRANT VIEW DEFINITION ON DATABASE::<database_name> TO <username>;


Configure your connection

Finally, set up data crawling on SQL Server from your data catalog. In Atlan, this means creating a new workflow using the SQL Server Assets package. This is a SQL Server data connector created and maintained by Atlan.

For your workflow, specify which connection method to use: Direct or S3. For Direct connections, specify:

  • Database domain name/IP address
  • Port
  • Username
  • Password

For S3 connections, specify the S3 bucket name, the S3 bucket prefix, and the AWS region in which your bucket resides.

Finally, give your connection a name (e.g., Development, Production) and further configure your crawler to specify what data it crawls. In Atlan, these options include:

  • Include or exclude metadata: Specify the name of objects you either wish to include in or leave out of crawling.
  • Regex exclude: Specify a regular expression that will exclude all database objects it matches from crawling.
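The regex exclude behaves much like the following sketch (illustrative only, not Atlan's actual matching logic): any object whose name matches the pattern is dropped from the crawl list.

```python
import re

def apply_regex_exclude(objects, pattern):
    """Return only the objects whose names do NOT match the exclude pattern."""
    rx = re.compile(pattern)
    return [name for name in objects if not rx.search(name)]

objects = ["sales.orders", "sales.orders_tmp", "hr.payroll"]
kept = apply_regex_exclude(objects, r"_tmp$")
# kept -> ["sales.orders", "hr.payroll"]
```

A pattern like `_tmp$` is a handy way to exclude derivative or scratch tables, as suggested earlier, without listing each one by name.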

At this point, in Atlan, you can run a preflight check to verify that your connection works.

Finally, once you’ve configured the crawler, you can either run it once or set it to run on a schedule. You can schedule the crawler to run hourly, daily, weekly, or monthly.


The business value of adding SQL Server data to your data catalog

Company employees say they waste up to eight hours each week looking for data. By bringing all of your SQL Server assets under the scope of a single data catalog, your users can more quickly find the data they need. That spells a massive boost in productivity across your company.

You’ll also receive all of the other benefits of using a data catalog, including:

  • Optimizing data governance and business efficiency by showing exactly what data is stored and where
  • Ensuring consistency in data quality by detecting anomalies, inconsistencies, and duplicates in data
  • Ensuring regulatory compliance by tagging data to ensure that it meets the regulatory requirements of standards such as CCPA, HIPAA, PCI DSS, and GDPR
  • Reducing spending and unnecessary costs by increasing productivity and avoiding damaging regulatory fines

How to deploy Atlan for Microsoft SQL Server

For more steps specific to setting up Microsoft SQL Server to work with Atlan, see Atlan’s official documentation for the SQL Server connector.



Atlan is the next-generation platform for data and AI governance. It is a control plane that stitches together a business's disparate data infrastructure, cataloging and enriching data with business context and security.

 
