Data Catalog Implementation Plan: 10 Steps to Follow, Common Roadblocks & Solutions
A clear data catalog implementation plan is essential for ensuring the successful deployment of your data catalog. From ensuring connectivity and interoperability with your data stack to driving adoption, the plan can guide your implementation efforts in numerous ways.
This article will explore the ten essential steps of a data catalog implementation plan and address common implementation challenges and ways to overcome them.
Looking for a data catalog with an ROI you can present to your CDO? Atlan is designed for adoption and embedded with automation. It helps you save time, cut cloud costs, and make faster, better decisions that lead to revenue.
Table of contents #
- Data catalog implementation plan: Prerequisites to consider
- Steps to follow
- Common challenges
- Data catalog implementation plan
- Related Reads
Data catalog implementation plan: Prerequisites to consider #
Before creating a data catalog implementation plan, you should review a few essential factors. Start by thinking about your organization’s size, stage, and structure.
A 1,000-person, 20-year-old, globally distributed company must consider coordinating across regions and connecting to diverse data sources. Meanwhile, a small startup may have fewer data sources and cross-functional overheads to consider.
Based on the size, stage, and maturity of your organization’s data practices, think about who will need to be involved with the implementation and in what capacity.
To determine who will need to participate and at which stages of the process, consider key questions, such as:
- Who will be the point of contact for the data catalog provider?
- Do you have a dedicated data team, or are your data engineers and subject matter experts spread across teams?
- Will you need a dedicated project or program manager to oversee the work?
- Are any of the involved teams facing resource constraints or competing priorities?
- What are the key milestones for each phase of the implementation? Who will need to participate in each phase, and how will you know if the milestone is complete?
Preparing your data infrastructure before the implementation starts will save you time later, so evaluate whether it is ready. Here are some questions to ask:
- Do your data sources’ security settings allow them to connect to the catalog?
- Are the tools in your data stack running on the latest software versions?
- Do your databases have enough computational resources to support extracting and syncing your data?
It’s best to start small. Consider choosing a proof of concept (POC) to pilot the catalog, such as a single BI report. Through this POC, try:
- Connecting data sources to the catalog
- Using the catalog while working on the report
- Adding one or two workflows, like creating an issue in Jira or flagging a question in Slack
Once you’ve implemented the POC, you can gather feedback from those involved, such as data engineers and the report’s users, to iterate your approach.
The POC is also a great way to understand the support and guidance you can expect from the data catalog provider. The catalog provider should act more as a partner, rather than a vendor, by assigning an implementation point of contact who will offer onboarding support, training materials, and even a template project plan to help you deploy the data catalog and drive adoption.
Consider Takealot, for instance. The South African retail and eCommerce leader secured a technical support contact and a customer success manager from Atlan for its implementation. These two points of contact helped Takealot connect all its data sources and offered critical support whenever roadblocks arose.
Having considered the above prerequisites, let’s move on to the specifics of a data catalog implementation plan.
Data catalog implementation plan: 10 steps to follow #
When creating your data catalog implementation plan, include these ten steps:
- Connect data sources to the catalog with native connectors or APIs
- Extract metadata from your data estate
- Map data domains and user personas
- Establish governance controls
- Set up the business glossary
- Ensure your data assets come with full context
- Configure your workflows
- Train data catalog admins and consumers
- Promote data catalog adoption
- Monitor the data catalog’s performance, adoption, and use
Let’s delve into the specifics of each step further.
1. Connect data sources to the catalog with native connectors or APIs #
The first step is to connect your data sources (for example, your data warehouses, BI tools, and ETL tools) to the data catalog. You will most likely create these connections through out-of-the-box (OOTB) connectors or via APIs.
If you’re using a native connector, you’ll need to configure the connection and perform preflight checks to ensure that all the necessary grants and permissions are in place. This helps avoid common installation issues. Preflight checks may vary depending on the data source.
This is also the stage to set the frequency of data source syncs.
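As a rough illustration of a preflight check, the sketch below compares the privileges a catalog's service account holds against a required set and validates the sync interval. The grant names, the `ConnectionConfig` class, and the `preflight` function are assumptions made for this example; real requirements depend on the data source and the catalog.

```python
from dataclasses import dataclass

# Hypothetical grants a connector might need; real requirements vary by source.
REQUIRED_GRANTS = {"USAGE", "SELECT", "REFERENCES"}


@dataclass
class ConnectionConfig:
    source_name: str
    granted: set              # privileges the catalog's service account holds
    sync_interval_hours: int  # how often the catalog re-syncs the source


def preflight(config: ConnectionConfig) -> list:
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    missing = REQUIRED_GRANTS - config.granted
    if missing:
        problems.append(f"{config.source_name}: missing grants {sorted(missing)}")
    if config.sync_interval_hours <= 0:
        problems.append(f"{config.source_name}: sync interval must be positive")
    return problems


warehouse = ConnectionConfig("warehouse", {"USAGE", "SELECT"}, sync_interval_hours=24)
for problem in preflight(warehouse):
    print(problem)
```

Running a check like this before the first sync surfaces permission gaps early, rather than partway through an extraction.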
If you’re using APIs, you must also evaluate the data catalog’s technical requirements and review its developer documentation to establish a successful connection.
Data catalogs like Atlan are built for extensibility on an Open API architecture. Atlan also offers software development kits (SDKs), which are client libraries that wrap its underlying REST APIs. As a result, you don’t have to start from scratch.
2. Extract metadata from your data estate #
Once you’ve connected your data sources to the catalog, you can start the extraction process to collect your data estate’s metadata. Different data sources may require different extraction methods and testing to ensure quality.
For automated extraction, you may need to review the results to determine if there are any mapping issues with how the catalog crawls your metadata. Depending on your data sources—for example, if your databases are on-premises rather than in the cloud—you may need to use a metadata extractor tool.
You can also manually extract preexisting external metadata and load it into the data catalog via spreadsheet to enrich your metadata.
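Before loading a spreadsheet of external metadata, it can help to validate it. The sketch below reads a CSV export with Python's standard `csv` module and separates rows that are ready to load from assets that still lack a description. The column names (`asset_name`, `description`, `owner`) and the sample rows are assumptions for this example, not a format any particular catalog requires.

```python
import csv
import io

# Hypothetical spreadsheet export; the column names are assumptions for this sketch.
SHEET = """asset_name,description,owner
orders,One row per customer order,finance-team
customers,,
"""


def load_metadata(text: str):
    """Split rows into ones ready to load and asset names missing a description."""
    ready, incomplete = [], []
    for row in csv.DictReader(io.StringIO(text)):
        if row["description"].strip():
            ready.append(row)
        else:
            incomplete.append(row["asset_name"])
    return ready, incomplete


ready, incomplete = load_metadata(SHEET)
print("ready to load:", [row["asset_name"] for row in ready])
print("needs a description:", incomplete)
```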
3. Map data domains and user personas #
Now that your data is successfully syncing to the data catalog, you can determine which groups of users work with, own, or maintain the data domains.
Such a mapping process might involve many departments and stakeholders. Takealot, the South African eCommerce and retail company, experienced this complexity firsthand. Multiple domains, such as Finance, Supply Chain and Logistics, and eCommerce, used the same data set, like “Orders” or “Customers.”
So, Takealot systematically assigned primary owners and stakeholders for each asset, determining the ultimate decision-makers responsible for defining and maintaining them.
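One way to picture such a mapping is a small ownership registry, sketched below. The domain names come from the example above, but which domain owns which asset is purely illustrative, and the `AssetOwnership` class and `unowned` helper are hypothetical names for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class AssetOwnership:
    asset: str
    primary_owner: str                                 # domain with the final say on definitions
    stakeholders: list = field(default_factory=list)   # other domains that use the asset


# Illustrative assignment only; not a record of any company's actual ownership.
registry = {
    "Orders": AssetOwnership(
        asset="Orders",
        primary_owner="eCommerce",
        stakeholders=["Finance", "Supply Chain and Logistics"],
    ),
}


def unowned(assets: list, registry: dict) -> list:
    """List shared assets that still have no primary owner assigned."""
    return [asset for asset in assets if asset not in registry]


print(unowned(["Orders", "Customers"], registry))
```

A check like `unowned` makes it easy to see which shared data sets still need a decision-maker before governance policies are layered on top.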
4. Establish governance controls #
Data governance ensures data enablement and democratization. Traditionally, it was synonymous with centralized control, rules, and policies. However, modern data governance empowers everyone to extract value from data.
So, customize your data governance initiative with personalization — build policies around who your data users are, which teams they work in, and what projects they’re working on.
Consider defining policies and curating data assets based on business domains and project context. This also helps you set up granular access policies — manage who can read metadata, view and query your data, edit data definitions or business taxonomies, and more.
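A granular access policy can be thought of as a rule tying a persona and a business domain to a set of permitted actions. The sketch below is a minimal, deny-by-default model of that idea. The persona names, action names, and the `AccessPolicy` class are assumptions for illustration and do not reflect any specific catalog's policy engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessPolicy:
    persona: str        # hypothetical group of users, e.g. "finance-analyst"
    domain: str         # business domain the policy covers
    actions: frozenset  # actions permitted within that domain


POLICIES = [
    AccessPolicy("finance-analyst", "Finance", frozenset({"read_metadata", "query_data"})),
    AccessPolicy("data-steward", "Finance", frozenset({"read_metadata", "query_data", "edit_glossary"})),
]


def is_allowed(persona: str, domain: str, action: str, policies=POLICIES) -> bool:
    """Deny by default; allow only when a matching policy lists the action."""
    return any(
        p.persona == persona and p.domain == domain and action in p.actions
        for p in policies
    )


print(is_allowed("finance-analyst", "Finance", "edit_glossary"))  # False
print(is_allowed("data-steward", "Finance", "edit_glossary"))     # True
```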
Scaling your data governance efforts requires automation. That’s why the data catalog should help you automatically:
- Identify, classify, and tag data assets, including sensitive assets (such as PII)
- Propagate custom classifications downstream and upstream via lineage
- Secure your data with custom masking and hashing
- Add context to data assets with documentation like READMEs and business glossary
Here’s an example of automation in action. Tide is a mobile-first financial platform based in the UK. Instead of spending 50 days manually identifying and then tagging personally identifiable information, Tide used automation to identify, tag, and classify the data in a single, automated workflow. As a result, they finished their task in mere hours.
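To make automated tagging and downstream propagation concrete, here is a minimal sketch. It flags columns whose names look like PII using a simple pattern, then walks a lineage graph breadth-first so that every column derived from a tagged column inherits the tag. The pattern, column names, and lineage map are all assumptions for this example; production classifiers typically inspect data as well as names, and this is not a description of how Tide or any catalog implements it.

```python
import re
from collections import deque

# Hypothetical name-based pattern; real classifiers usually inspect values too.
PII_PATTERN = re.compile(r"(email|phone|ssn|date_of_birth)", re.IGNORECASE)


def classify(columns: list) -> set:
    """Tag columns whose names match the PII pattern."""
    return {column for column in columns if PII_PATTERN.search(column)}


def propagate(tagged: set, lineage: dict) -> set:
    """Breadth-first walk: every column derived from a tagged column inherits the tag."""
    result = set(tagged)
    queue = deque(tagged)
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in result:
                result.add(child)
                queue.append(child)
    return result


columns = ["users.email", "users.signup_date", "users.phone_number"]
# Maps each column to the downstream columns derived from it.
lineage = {
    "users.email": ["marts.contact_key"],
    "marts.contact_key": ["reports.reach"],
}
tagged = propagate(classify(columns), lineage)
print(sorted(tagged))
```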
Also, read → How Atlan views and implements data governance
5. Set up the business glossary #
To make your data catalog valuable to business consumers and enrich your governance controls, you’ll need a business glossary that defines and describes key terms in a standardized fashion.
So, start by gathering a list of terms and concepts you want to include and ask the core stakeholders to create definitions. Also, create a process for updating them as they evolve.
The size and distribution of your organization will play an important role in how this step will look. A small, co-located team with a centralized data estate can establish a business glossary quickly. However, a larger, globally distributed organization must coordinate across departments, teams, regions, and even languages.
For instance, Aliaxis, a global manufacturer and distributor of plumbing and water solutions, has 15,000 employees in over 40 countries. The company used its data catalog to create both global and regional business glossaries stored in a single location that could serve all its users, regardless of department or geography.
6. Ensure your data assets come with full context #
To get the full value of your data catalog, you need as much information as possible about your data through data lineage and documentation.
To drive your documentation efforts, you can consider using AI to suggest content for artifacts like READMEs and asset descriptions. This can significantly scale your data documentation efforts and ensure proper context for all data assets in your data estate.
Consider the case of Delhivery. The company increased documentation by 250% with a data catalog through auto-populated column descriptions for 90% of its assets. It also assigns owners by scanning the query log history.
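One simple heuristic for suggesting owners from query history is to pick the user who queries each table most often. The sketch below assumes the query log has already been parsed into `(user, table)` pairs; the names and the heuristic itself are assumptions for illustration, not a description of how Delhivery or any catalog assigns owners.

```python
from collections import Counter, defaultdict


def suggest_owners(query_log: list) -> dict:
    """Map each table to the user who queried it most often."""
    counts = defaultdict(Counter)
    for user, table in query_log:
        counts[table][user] += 1
    return {table: users.most_common(1)[0][0] for table, users in counts.items()}


# Hypothetical, pre-parsed query log entries: (user, table).
query_log = [
    ("asha", "orders"),
    ("asha", "orders"),
    ("ben", "orders"),
    ("ben", "customers"),
]
print(suggest_owners(query_log))
```

Suggestions like these are best treated as a starting point for a human to confirm, since the most frequent reader of a table is not always the right person to maintain it.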
You should also encourage a culture of continuous documentation to keep your assets updated whenever changes occur. An online audio content leader did just that to ensure proper context for its critical assets.
7. Configure your workflows #
You must outline how data users integrate the catalog into their daily tasks to encourage collaboration.
Consider setting up integrations with workflow tools like Jira and Slack so that users can create tickets on data issues as they’re working and notify the data owner automatically. This is called embedded collaboration, wherein you collaborate with your team without leaving the data catalog and switching apps.
With embedded collaboration, you can create Jira tickets, add announcements, ask questions on Slack channels, etc., directly from the data catalog.
For example, the credit card processing company North American Bancard uses Atlan’s Chrome Plug-in to help hundreds of BI users submit Jira tickets directly from a dashboard.
8. Train data catalog admins and consumers #
Workflows enhanced with a data catalog won’t create value if no one uses them. Additionally, data catalog features like lineage and the glossary remain beneficial only if they are maintained. So, you’ll need to provide training for both your data admins and consumers.
For the data governance team, this might mean teaching users how to configure permissions through personas and purposes. End users, on the other hand, may require guidance on tasks such as data discovery or creating an issue in Jira via the catalog.
Data owners and business analysts might need training on writing more advanced SQL queries within the catalog and managing the business glossary effectively. Data scientists, engineers, and IT admins will also need in-depth training on creating and maintaining pipelines and technical workflows.
Lastly, all data practitioners in your organization must be trained on documentation standards, compliance requirements, and data use for reporting.
Also, read → Data catalog training 101
9. Promote data catalog adoption #
Changing established workflows is hard. Even if a new tool’s benefits are obvious, people may not welcome change. So, encourage adoption through workshops, meetings, and awareness sessions. Consider creating a program that will work individually with each department to roll out the data catalog.
Also, consider strategies to promote data literacy throughout your organization and enhance all users’ confidence with data.
Expand this comfort level beyond solely using the catalog: think about how you can include your other data and analytics tools to demonstrate how powerful a data catalog can be when combined with your existing data stack.
Swapfiets, the world’s first bicycle subscription service, did just this. The Dutch “bicycle as a service” company enables self-service with their Data Literacy Program. The program teaches new Swapfiets employees about Atlan’s capabilities and how to integrate them with Tableau. It also offers SQL-Atlan training for its Data Champions.
Also, read → Data catalog adoption: What limits it and how to drive it effectively
10. Monitor the data catalog’s performance, adoption, and use #
Setup and training are only the beginning. You’ll want to continuously improve your approach over time, so keep a close eye on key metrics. These metrics should align with your organization’s strategic, outcomes-based data governance goals.
Below are areas you might want to monitor, along with example metrics you can use for each:
- Adoption: NPS surveys and CSAT scores for your data catalog
- Data literacy: Surveys and questionnaires on employees’ comfort level with data
- Financial ROI: Time-to-value for data projects and time-to-resolution for data issues
- Data quality: Number of data issue escalations or incidents
- Compliance: Percentage of assets tagged with classification
- Performance: Time to run syncs and percentage of successful syncs
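Several of the metrics above reduce to simple arithmetic. The sketch below computes a compliance percentage, a sync success rate, and a Net Promoter Score (share of respondents scoring 9 to 10 minus share scoring 0 to 6). All input numbers are made up for illustration, and the function and key names are assumptions for this example.

```python
def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place; 0.0 when there is nothing to measure."""
    return round(100 * part / whole, 1) if whole else 0.0


def net_promoter_score(scores: list) -> float:
    """NPS on a 0-10 scale: share of promoters (9-10) minus share of detractors (0-6)."""
    if not scores:
        return 0.0
    promoters = sum(1 for score in scores if score >= 9)
    detractors = sum(1 for score in scores if score <= 6)
    return percent(promoters - detractors, len(scores))


# Made-up inputs, purely for illustration.
report = {
    "compliance_pct_tagged": percent(part=420, whole=600),
    "performance_pct_successful_syncs": percent(part=57, whole=60),
    "adoption_nps": net_promoter_score([10, 9, 9, 7, 4]),
}
for metric, value in report.items():
    print(f"{metric}: {value}")
```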
Setting your key metrics before the implementation is useful to create a baseline. After the initial implementation, assess with a report card that scores vital numbers.
The tech company Elastic, known for its Elasticsearch product, produced an implementation success report of this kind. The report tracks nine key metrics, including the number of BI reports mapped, the percentage of tables and columns with lineage, and the number of assets tagged with classifications.
Following this initial assessment, monitor these metrics regularly to detect potential declines.
While executing a data catalog implementation plan, you may encounter some critical challenges. Let’s explore how to tackle them.
6 common data catalog implementation challenges and how to overcome them #
Even with a detailed plan, data catalog implementations bring challenges. Here are a few common issues you may face:
- Connectivity and compatibility
- Data and metadata updates
- Data search and discovery
- Adoption across all users
- Security and compliance
- ROI measurement
Let’s explore each challenge further, along with ways to manage them.
1. Connectivity and compatibility #
Connecting data sources to the data catalog can be tricky. This is either due to potential capability limitations of the source itself (for example, the API the source uses may not be compatible with the data catalog) or because of complexities in the environment and security configurations.
A data catalog with native connectors, APIs, and an open architecture is easier to connect and leverage for reliable syncs.
2. Data and metadata updates #
Data and metadata can quickly go stale if syncs are not run successfully on a regular basis. If users think that metadata in your catalog is outdated or untrustworthy, they are less likely to rely on or use the catalog in the long term.
Automating metadata ingestion can eliminate a cumbersome, manual process. Moreover, enabling an automated, bidirectional data flow can ensure that your data and metadata are constantly updated across your data estate.
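A lightweight freshness check can flag sources whose last successful sync is older than an agreed threshold. The sketch below assumes you can obtain a last-sync timestamp per source; the source names, timestamps, and the 24-hour threshold are assumptions for this example.

```python
from datetime import datetime, timedelta, timezone


def stale_sources(last_sync: dict, max_age: timedelta, now: datetime) -> list:
    """Return, in sorted order, the sources whose last successful sync exceeds max_age."""
    return sorted(
        name for name, synced_at in last_sync.items() if now - synced_at > max_age
    )


# Fixed timestamps keep the example reproducible; in practice, use datetime.now(timezone.utc).
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
last_sync = {
    "warehouse": datetime(2024, 1, 10, 6, 0, tzinfo=timezone.utc),  # 6 hours ago
    "bi_tool": datetime(2024, 1, 7, 12, 0, tzinfo=timezone.utc),    # 3 days ago
}
print(stale_sources(last_sync, timedelta(hours=24), now))
```

Alerting on the output of a check like this helps catch failed syncs before users start to distrust the catalog.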
3. Data search and discovery #
If users can’t easily find the data they want, they’re unlikely to return to the data catalog, slowing adoption.
Data search and discovery should be seamless, incorporating natural language searches, no-code querying, and searching within the business context. Besides using a data catalog that offers these features, you can also set up ongoing training to help users understand the best ways to find the data they need with minimal technical know-how.
4. Adoption across all users #
Users may resist adopting a new tool if they don’t want to change their existing workflows. Additionally, users who feel uncomfortable with data or have low data literacy may need to learn how to interact with the tool to perform their regular tasks.
Incorporating the data catalog into users’ existing processes will help them understand how to interact with it on a daily basis. As mentioned earlier, facilitating embedded collaboration with your data catalog will make the catalog beneficial to users by reducing context switching and promoting flow.
Simultaneously, promoting data literacy and offering training can also help.
5. Security and compliance #
Like all data tools, data catalogs must comply with data-centric regulations like GDPR and CCPA. Failure to comply or an organization’s failure to implement a catalog properly can lead to penalties and reputational damage.
Incorporating a robust, community-centric data governance framework will hold everyone accountable for the data they create, consume, and share. Additionally, consider integrating the catalog into external audits, such as SOC 2, and cybersecurity assessments, like penetration tests and red team testing.
6. ROI measurement #
Implementing a data catalog is a significant undertaking that requires many resources. As a result, organizations want to ensure that they see a return on their investment. To track ROI, align the data catalog’s outcomes with your strategic goals. These goals might include:
- Cost savings related to data project time-to-value
- Data issue time-to-resolution
- Data practitioner time-to-onboard
Alternatively, these goals could be broader in scope, like company-wide data literacy or your organization’s level of data governance maturity. In any scenario, aim to identify the most critical aspects when implementing a data catalog and ensure you have clearly defined parameters to determine if you’ve moved the needle.
Data catalog implementation plan: The bottom line #
Implementing a data catalog has many inherent challenges. As a result, creating a plan that accounts for your business goals, data stack maturity, and resource availability is critical. Follow the above ten-step plan to ensure you’re setting up your data catalog for success across your organization.
Data Catalog Implementation Plan: Related Reads #
- Data Catalogs 101
- Data Catalog Demo 101: What to Expect, Questions to Ask, and More
- Data catalogs in 2024
- 5 Main Benefits of a Data Catalog
- Data Cataloging Process: Challenges, Steps, and Success Factors
- Data Catalog Business Value: Assessment Factors, Benefits, and ROI Calculation
- Who Uses a Data Catalog & How to Drive Positive Outcomes?
- 15 Essential Features of Data Catalogs to Look for in 2024
- Data Catalog Adoption: What Limits It and How to Drive It Effectively