Data Catalog for Microsoft SQL Server: How to Setup Guide

Updated August 04th, 2023
Microsoft SQL Server Data Catalog

Share this article

Enterprises the world over entrust their data to Microsoft SQL Server. But some companies have dozens, hundreds, or even thousands of instances of the popular database software running. Monitoring them all via a data catalog ensures that data is tracked, classified, and appropriately secured.

In this article, we’ll show how to track data inside Microsoft SQL Server using a data catalog. We’ll discuss the prerequisite setup you need, how to connect your data catalog to SQL Server, and how to set up a data crawler to track data and monitor for changes.

Table of contents #

  1. How to set up a data catalog for Microsoft SQL Server: Prerequisites
  2. Determine assets to crawl
  3. Determine connection type
  4. Set up cloud storage export for the indirect method
  5. Set up SQL Server for data catalog connector access
  6. Configure your connection
  7. The business value of adding SQL Server data to your data catalog
  8. How to deploy Atlan for Microsoft SQL Server
  9. Related reads

How to set up a data catalog for Microsoft SQL Server: Prerequisites #

Before setting up a data catalog integration with SQL Server, make sure you have the following in place:

  • A data catalog: We’ve written the steps below specifically for the Atlan data catalog. However, most of these steps are generalizable to any major data catalog on the market.
  • Data catalog rights: You must have an account (such as an administrator) that can create a new connector to an external data source, as well as create a new crawler.
  • Networking prerequisites: Your SQL Server instances need to be reachable from your data catalog. Suppose either your data catalog or your SQL Server instances are running in the cloud in a private virtual network. In that case, this may require creating a NAT server and outbound firewall rules to allow access to Power BI.
  • Ensure SQL Server capacity: Crawling all of the data in a database regularly will result in additional read activity. That will consume SQL Server memory, networking bandwidth, and log space. Make sure you have sufficient capacity to support a full database scan from your data catalog.

You may consider setting up a dedicated read replica for this. If you use Azure SQL Server, read replicas are part of every Premium and Business SQL Server plan.

Determine assets to crawl #

Before starting, determine which assets in your SQL Server instances you want to crawl and track. Typical trackable assets include:

  • Databases
  • Schemas
  • Tables
  • Views
  • Columns
  • Routines

Most data catalogs will enable you to either specifically include or exclude what objects you want to track.

By default, it’s a good practice to track everything. However, you may decide to exclude specific sensitive data from tracking or exclude derivative data to make crawling more efficient.

Determine connection type #

There are a couple of ways to catalog data in SQL Server:

  • Direct method. Your data catalog crawls SQL Server directly by establishing a connection to the server and reading from it like any other user or application.
  • Indirect method. In this method, you dump your SQL Server data to a data store - e.g., an Azure storage account, or an AWS S3 bucket - and import it from this secondary source.

There are several reasons why you might use the indirect method:

  • You already source all of your data catalog data from a single source (e.g., a data lake).
  • You have multiple applications accessing this data and need to reduce read activity on your SQL Server instances.
  • You need to conceal certain data. If your SQL Server instance contains sensitive data - Personally Identifiable Information (PII) for customers - you can redact it as part of the extraction and loading process.
  • Your database is running in an environment (e.g., on-premises) that makes a direct connection from your data catalog impossible.

Atlan supports either a direct connection or import from Amazon S3. We’ll discuss both methods in detail below.

Set up cloud storage export for the indirect method #

If you’re using the indirect method, you’ll need to set up an export job for your SQL Server data first. How you do this depends on a couple of factors, including:

  • Where you host SQL Server and what import/export tools you have available
  • Your data destination

Two common patterns are:

Atlan metadata extractor #

If you’re using Atlan as your data catalog, we also offer a metadata extractor that runs as a Docker container. The metadata extractor supports crawling data from Microsoft SQL Server, MySQL, Oracle, and PostgreSQL.

Set up SQL Server for data catalog connector access #

For direct extraction, you’ll need to create a connector from your data catalog and specify how to authenticate.

Microsoft SQL Server supports two modes of authentication:

  • Windows authentication, which uses an Active Directory identity
  • Mixed mode authentication, which supports AD but also accepts a separate username and password

Atlan supports username/password authentication to Microsoft SQL Server. So, for a data catalog like Atlan, you would connect to your SQL Server instance and create a new user with the CREATE LOGIN Transact-SQL command:

CREATE LOGIN <login_name> WITH PASSWORD = '<password>';

Replace login_name with a unique name in your server and password with a strong password.

Next, create a user for your login with the CREATE USER command. This step is necessary because SQL Server supports the concept of system users that exist and can run jobs but do not have permission to log in interactively.

CREATE USER <username> FOR LOGIN <login_name>;

You’ll also need to supply a password for the user, which you can do with the ALTER LOGIN command (Note: you’ll need to be logged in as a user with the ALTER ANY LOGIN permission):

ALTER LOGIN <username> WITH PASSWORD = '<enterStrongPasswordHere>';

Next, grant permissions for your user to access all of the databases you will catalog with the GRANT command. Again, replace the values below with the ones specific to your environment:

GRANT SELECT ON DATABASE::<database_name> TO <username>;

For Atlan, you’ll need to repeat this command for every database in your SQL Server instance except the system databases (master, tempdb, msdb, model). If you don’t, the crawler will fail to run properly.

You’ll also need to grant permissions to any views on which you want to track data lineage:

GRANT VIEW DEFINITION ON DATABASE::<database_name> TO <username>;

Configure your connection #

Finally, set up data crawling on SQL Server from your data catalog. In Atlan, this means creating a new workflow using the SQL Server Assets package. This is a SQL Server data connector created and maintained by Atlan.

For your workflow, specify which connection method to use: Direct or S3. For Direct connections, specify:

  • Database domain name/IP address
  • Port
  • Username
  • Password

For S3 connections, specify the S3 bucket name, the S3 bucket prefix, and the AWS region in which your bucket resides.

Finally, give your connection a name (e.g., Development, Production) and further configure your crawler to specify what data it crawls. In Atlan, these options include:

  • Include or exclude metadata: Specify the name of objects you either wish to include in or leave out of crawling.
  • Regex exclude: Specify a regular expression that will exclude all database objects it matches from crawling.

At this point, in Atlan, you can run a preflight check that tests your connection works.

Finally, once you’ve configured the crawler, you can either run it once or set it to run on a schedule. You can schedule the crawler to run hourly, daily, weekly, or monthly.

The business value of adding SQL Server data to your data catalog #

Company employees say they waste up to eight hours each week looking for data. By bringing all of your SQL Server assets under the scope of a single data catalog, your users can more quickly find the data they need. That spells a massive boost in productivity across your company.

You’ll also receive all of the other benefits of using a data catalog, including:

  • Optimizing data governance and business efficiency by showing exactly what data is stored and where
  • Ensuring consistency in data quality by detecting anomalies, inconsistencies, and duplicates in data
  • Ensuring regulatory compliance by tagging data to ensure that it meets the regulatory requirements of standards such as CCPA, HIPAA, PCI DSS, and GDPR
  • Reducing spending and unnecessary costs by increasing productivity and avoiding damaging regulatory fines

How to deploy Atlan for Microsoft SQL Server #

For more steps specific to setting up Microsoft SQL Server to work with Atlan, see the following documentation:

Share this article

[Website env: production]