Data Catalog for Microsoft SQL Server: How to Setup Guide

Updated August 04th, 2023

Share this article

Enterprises the world over entrust their data to Microsoft SQL Server. But some companies have dozens, hundreds, or even thousands of instances of the popular database software running. Monitoring them all via a data catalog ensures that data is tracked, classified, and appropriately secured.

In this article, we’ll show how to track data inside Microsoft SQL Server using a data catalog. We’ll discuss the prerequisite setup you need, how to connect your data catalog to SQL Server, and how to set up a data crawler to track data and monitor for changes.

Table of contents #

How to set up a data catalog for Microsoft SQL Server: Prerequisites
Determine assets to crawl
Determine connection type
Set up cloud storage export for the indirect method
Set up SQL Server for data catalog connector access
Configure your connection
The business value of adding SQL Server data to your data catalog
How to deploy Atlan for Microsoft SQL Server
Related reads

How to set up a data catalog for Microsoft SQL Server: Prerequisites #

Before setting up a data catalog integration with SQL Server, make sure you have the following in place:

A data catalog: We’ve written the steps below specifically for the Atlan data catalog. However, most of these steps are generalizable to any major data catalog on the market.
Data catalog rights: You must have an account (such as an administrator) that can create a new connector to an external data source, as well as create a new crawler.
Networking prerequisites: Your SQL Server instances need to be reachable from your data catalog. Suppose either your data catalog or your SQL Server instances are running in the cloud in a private virtual network. In that case, this may require creating a NAT server and outbound firewall rules to allow access to Power BI.
Ensure SQL Server capacity: Crawling all of the data in a database regularly will result in additional read activity. That will consume SQL Server memory, networking bandwidth, and log space. Make sure you have sufficient capacity to support a full database scan from your data catalog.

You may consider setting up a dedicated read replica for this. If you use Azure SQL Server, read replicas are part of every Premium and Business SQL Server plan.

Determine assets to crawl #

Before starting, determine which assets in your SQL Server instances you want to crawl and track. Typical trackable assets include:

Databases
Schemas
Tables
Views
Columns
Routines

Most data catalogs will enable you to either specifically include or exclude what objects you want to track.

By default, it’s a good practice to track everything. However, you may decide to exclude specific sensitive data from tracking or exclude derivative data to make crawling more efficient.

Determine connection type #

There are a couple of ways to catalog data in SQL Server:

Direct method. Your data catalog crawls SQL Server directly by establishing a connection to the server and reading from it like any other user or application.
Indirect method. In this method, you dump your SQL Server data to a data store - e.g., an Azure storage account, or an AWS S3 bucket - and import it from this secondary source.

There are several reasons why you might use the indirect method:

You already source all of your data catalog data from a single source (e.g., a data lake).
You have multiple applications accessing this data and need to reduce read activity on your SQL Server instances.
You need to conceal certain data. If your SQL Server instance contains sensitive data - Personally Identifiable Information (PII) for customers - you can redact it as part of the extraction and loading process.
Your database is running in an environment (e.g., on-premises) that makes a direct connection from your data catalog impossible.

Atlan supports either a direct connection or import from Amazon S3. We’ll discuss both methods in detail below.

Set up cloud storage export for the indirect method #

If you’re using the indirect method, you’ll need to set up an export job for your SQL Server data first. How you do this depends on a couple of factors, including:

Where you host SQL Server and what import/export tools you have available
Your data destination

Two common patterns are:

Export to Amazon S3 using AWS Database Migration Service (DMS): Set up a new DMS replication instance and replication task to run that has permission to access both your SQL Server database as well as the target S3 bucket.
Export to Azure blob storage using Azure Data Factory: Create a new pipeline in Azure Data Factory with a connection to your SQL Server database that then saves the SQL data to Azure blog storage in comma-separated text (CSV) format.

Atlan metadata extractor #

If you’re using Atlan as your data catalog, we also offer a metadata extractor that runs as a Docker container. The metadata extractor supports crawling data from Microsoft SQL Server, MySQL, Oracle, and PostgreSQL.

Set up SQL Server for data catalog connector access #

For direct extraction, you’ll need to create a connector from your data catalog and specify how to authenticate.

Microsoft SQL Server supports two modes of authentication:

Windows authentication, which uses an Active Directory identity
Mixed mode authentication, which supports AD but also accepts a separate username and password

Atlan supports username/password authentication to Microsoft SQL Server. So, for a data catalog like Atlan, you would connect to your SQL Server instance and create a new user with the CREATE LOGIN Transact-SQL command:


CREATE LOGIN <login_name> WITH PASSWORD = '<password>';

Replace login_name with a unique name in your server and password with a strong password.

Next, create a user for your login with the CREATE USER command. This step is necessary because SQL Server supports the concept of system users that exist and can run jobs but do not have permission to log in interactively.


CREATE USER <username> FOR LOGIN <login_name>;

You’ll also need to supply a password for the user, which you can do with the ALTER LOGIN command (Note: you’ll need to be logged in as a user with the ALTER ANY LOGIN permission):


ALTER LOGIN <username> WITH PASSWORD = '<enterStrongPasswordHere>';

Next, grant permissions for your user to access all of the databases you will catalog with the GRANT command. Again, replace the values below with the ones specific to your environment:


GRANT SELECT ON DATABASE::<database_name> TO <username>;

For Atlan, you’ll need to repeat this command for every database in your SQL Server instance except the system databases (master, tempdb, msdb, model). If you don’t, the crawler will fail to run properly.

You’ll also need to grant permissions to any views on which you want to track data lineage:


GRANT VIEW DEFINITION ON DATABASE::<database_name> TO <username>;

Configure your connection #

Finally, set up data crawling on SQL Server from your data catalog. In Atlan, this means creating a new workflow using the SQL Server Assets package. This is a SQL Server data connector created and maintained by Atlan.

For your workflow, specify which connection method to use: Direct or S3. For Direct connections, specify:

Database domain name/IP address
Port
Username
Password

For S3 connections, specify the S3 bucket name, the S3 bucket prefix, and the AWS region in which your bucket resides.

Finally, give your connection a name (e.g., Development, Production) and further configure your crawler to specify what data it crawls. In Atlan, these options include:

Include or exclude metadata: Specify the name of objects you either wish to include in or leave out of crawling.
Regex exclude: Specify a regular expression that will exclude all database objects it matches from crawling.

At this point, in Atlan, you can run a preflight check that tests your connection works.

Finally, once you’ve configured the crawler, you can either run it once or set it to run on a schedule. You can schedule the crawler to run hourly, daily, weekly, or monthly.

The business value of adding SQL Server data to your data catalog #

Company employees say they waste up to eight hours each week looking for data. By bringing all of your SQL Server assets under the scope of a single data catalog, your users can more quickly find the data they need. That spells a massive boost in productivity across your company.

You’ll also receive all of the other benefits of using a data catalog, including:

Optimizing data governance and business efficiency by showing exactly what data is stored and where
Ensuring consistency in data quality by detecting anomalies, inconsistencies, and duplicates in data
Ensuring regulatory compliance by tagging data to ensure that it meets the regulatory requirements of standards such as CCPA, HIPAA, PCI DSS, and GDPR
Reducing spending and unnecessary costs by increasing productivity and avoiding damaging regulatory fines

How to deploy Atlan for Microsoft SQL Server #

For more steps specific to setting up Microsoft SQL Server to work with Atlan, see the following documentation:

What Is a Data Catalog? & Do You Need One?
AI Data Catalog: Exploring the Possibilities That Artificial Intelligence Brings to Your Metadata Applications & Data Interactions
8 Ways AI-Powered Data Catalogs Save Time Spent on Documentation, Tagging, Querying & More
Data Catalog Market: Current State and Top Trends in 2023
15 Essential Data Catalog Features to Look For in 2023
What is Active Metadata? — Definition, Characteristics, Example & Use Cases
Data catalog benefits: 5 key reasons why you need one
Open Source Data Catalog Software: 5 Popular Tools to Consider in 2023
Data Catalog Platform: The Key To Future-Proofing Your Data Stack
Top Data Catalog Use Cases Intrinsic to Data-Led Enterprises
Business Data Catalog: Users, Differentiating Features, Evolution & More