Build vs. Buy Data Catalog: What Should Factor Into Your Decision Making?

Updated June 05th, 2024

Central to the challenge of exponential data growth is the data catalog — a structured inventory of your data that enhances data governance, security, and discoverability.

However, businesses often stumble when faced with the “build vs. buy” decision concerning data catalogs. In this article, we’ll untangle this complex but crucial decision-making process.




Why the build vs. buy decision matters #

Choosing whether to build or buy a data cataloging tool is a strategic decision, because it commits budget, engineering time, and your data governance approach for years. However, the decision need not be daunting. It is an opportunity for your organization to critically evaluate its requirements, assets, and aspirations while striking the right balance between customization and convenience.

Choosing to build allows you to leverage your financial strength and IT resources, creating a solution tailored precisely to your needs. On the other hand, opting to buy opens doors to quick implementation, capitalizing on the technical expertise of seasoned vendors.

A well-defined set of criteria helps streamline this decision, mitigating risks and enhancing the effectiveness of the chosen solution. Let’s explore the criteria further.

Also read: How to Build a Data Catalog: An 8-Step Guide to Get You Started


Table of contents #

  1. Why the build vs. buy decision matters
  2. Considerations for building vs. buying a data catalog
  3. When to buy a data catalog
  4. When to build a data catalog
  5. The build vs. buy spectrum
  6. Making the build vs. buy decision
  7. Conclusion
  8. Build vs. buy Data catalog: Related reads

Considerations for building vs. buying a data catalog #

When deciding whether to build or buy your data catalog, a multi-dimensional evaluation can help ensure your choice is well-rounded and informed.

Here are the key considerations:

  • Total Cost of Ownership (TCO)
  • Value creation through differentiation
  • Resource gap
  • Sunk cost
  • Opportunity cost
  • Holdup risks
  • Spectrum of choices

Let’s look at each in turn.

Total Cost of Ownership (TCO) #


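The decision to build or buy isn’t just about immediate costs. It encompasses the total cost of ownership over time. Build costs include planning, development, maintenance, user education, and ongoing support. Buy costs include setup, licensing, professional services, integration, and training. Compare both over the same time horizon rather than at the point of purchase.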
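
To illustrate how the time horizon changes the comparison, here is a minimal Python sketch. Every figure in it is a made-up placeholder, and the two-part split into upfront and annual running costs is a simplifying assumption, not a standard TCO model.

```python
from dataclasses import dataclass


@dataclass
class CostProfile:
    """Hypothetical cost inputs for one option (all figures are placeholders)."""
    upfront: int         # one-time: planning and development, or setup and integration
    annual_running: int  # per year: maintenance, support, licensing, training


def total_cost_of_ownership(profile: CostProfile, years: int) -> int:
    """Upfront cost plus running costs accumulated over the chosen horizon."""
    return profile.upfront + profile.annual_running * years


# Illustrative numbers only; replace them with your own estimates.
build = CostProfile(upfront=400_000, annual_running=150_000)
buy = CostProfile(upfront=50_000, annual_running=200_000)

for years in (1, 3, 5, 10):
    build_tco = total_cost_of_ownership(build, years)
    buy_tco = total_cost_of_ownership(buy, years)
    cheaper = "build" if build_tco < buy_tco else "buy"
    print(f"{years:>2} years: build={build_tco:,} buy={buy_tco:,} -> {cheaper}")
```

With these placeholder inputs, buying is cheaper over short horizons and building only becomes cheaper on the 10-year horizon, which is why the horizon you pick matters as much as the individual cost estimates.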

Value creation through differentiation #


A critical factor to consider is whether a data catalog is so foundational to your business that it necessitates a bespoke solution. This was the case for Lyft, which developed its own data catalog tool, Amundsen, to meet its unique requirements.

If your data catalog can significantly differentiate your market offering, then building might be the right choice.

Resource gap #


If you decide to build, ensure that your organization possesses the necessary expertise and other resources to carry out the project. Deficiencies in the required engineering skills, for example, could lead to subpar outcomes.

Established vendors often have all the required elements in place, including installation, security, training, updates, and other maintenance practices.

Sunk cost #


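A sunk cost is money or effort that has already been spent and cannot be recovered. The danger is continuing to invest in a project because of what has already gone into it, even when it isn’t providing the expected return. If you’re building and your project isn’t delivering value, it might be time to reconsider your approach rather than fall for the sunk cost fallacy.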

Here’s an explanation of the sunk cost fallacy:

The Sunk Cost Fallacy is the phenomenon where “a person is reluctant to abandon a strategy or course of action because they have invested in it, even when it is clear that abandonment would be more beneficial.”
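
To make the fallacy concrete, the following Python sketch compares only forward-looking values when deciding whether to keep building. The function name, its parameters, and the numbers are hypothetical and chosen purely for illustration.

```python
def should_continue(
    remaining_cost_to_finish: float,
    expected_value_if_finished: float,
    net_value_of_best_alternative: float,
    already_spent: float = 0.0,
) -> bool:
    """Return True if finishing the current project beats the best alternative.

    `already_spent` is accepted only to show that it is deliberately ignored:
    money that cannot be recovered should not influence the choice.
    """
    net_value_of_continuing = expected_value_if_finished - remaining_cost_to_finish
    return net_value_of_continuing > net_value_of_best_alternative


# Placeholder scenario: a lot has been spent, but finishing yields little.
decision = should_continue(
    remaining_cost_to_finish=300_000,
    expected_value_if_finished=350_000,
    net_value_of_best_alternative=120_000,  # e.g. switching to a purchased tool
    already_spent=500_000,
)
print("continue building" if decision else "reconsider the approach")
```

Changing `already_spent` never changes the result, which is the point: only the remaining cost, the expected value, and the alternative should drive the decision.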

Opportunity cost #


Every decision involves trade-offs. If you choose to build a data catalog, consider what other projects or opportunities you’re potentially giving up.

Here’s an example of the rationale behind opportunity costs:

Opportunity costs occur whenever there is a tradeoff between two options. For example, either one of two things can be done, A or B. If you decide on A, then the benefits of B cannot be realized. These lost benefits of B are the opportunity costs of choosing A.

Buying carries opportunity costs too. If you opt to buy, you’ll still need to dedicate resources to setup, development, integration, and training.

Does your company have the necessary skill sets for these tasks, and what else could those people be working on?

Holdup risks #


Holdup risk is the risk of a long delay between making the decision to build or buy and actually getting value from the system. Build projects can run well past their planned timelines. Meanwhile, buying can also backfire if the data catalog doesn’t integrate well with your systems.

Spectrum of choices #


The build vs. buy decision isn’t a binary one. As we’ll discuss in greater detail, it falls on a spectrum that includes other options such as leasing or borrowing software solutions. Consider all possibilities before making your final decision.


When to buy a data catalog #

The decision to buy a data catalog tool often hinges on a calculation of return on investment (ROI).

For many organizations, buying a ready-made solution can yield the most favorable ROI, provided several key conditions are met:

  • Matching features: Investigate whether the tool offers essential features, such as robust data governance capabilities and advanced functionality like data lineage.
  • Integration capability: It’s vital that the data catalog can seamlessly integrate with your primary data sources. Look for solutions that provide connectors or similar technology, allowing for smooth interoperability with your existing data infrastructure.
  • Adaptability to existing automation via API: Your new data catalog tool should be capable of integrating into your existing data pipelines and software automation processes, ideally through an open API architecture. This will enable you to leverage your existing systems and infrastructure, optimizing overall data operations.
  • Scalability: The chosen data catalog tool should be able to scale and adapt to your growing data demands so the tool remains useful and effective with growth.
  • In-house expertise for integration: Finally, ensure that your staff has the requisite skills to work collaboratively with the vendor on integrating the data catalog tool into your systems for a smooth transition.

When these conditions align, buying a data catalog tool can be a powerful, effective solution to your data management needs, delivering a high ROI.
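
The five conditions above can be captured as a simple checklist. The sketch below is a hypothetical Python helper, not part of any vendor's tooling; the field names mirror the bullet points, and the example values are invented.

```python
from dataclasses import dataclass, fields


@dataclass
class BuyConditions:
    """One flag per condition listed above; True means the condition is met."""
    matching_features: bool
    integration_capability: bool
    api_adaptability: bool
    scalability: bool
    in_house_integration_expertise: bool


def unmet_conditions(conditions: BuyConditions) -> list[str]:
    """Return the names of conditions that are not yet satisfied."""
    return [f.name for f in fields(conditions) if not getattr(conditions, f.name)]


# Invented assessment of a candidate tool.
candidate = BuyConditions(
    matching_features=True,
    integration_capability=True,
    api_adaptability=False,
    scalability=True,
    in_house_integration_expertise=True,
)

gaps = unmet_conditions(candidate)
print("Ready to buy" if not gaps else f"Open gaps: {', '.join(gaps)}")
```

An empty list suggests that the conditions align. Any remaining entries are the gaps to raise with the vendor or to close internally before committing.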


When to build a data catalog #

Embarking on the journey to build a data catalog tool is a significant undertaking. However, under certain circumstances, the build approach can provide the most suitable solution:

If your data catalog is anticipated to be a flagship feature or key differentiator of your business, building your own might be the best way to ensure it aligns perfectly with your unique vision and business model.

You may also have integration needs that aren’t catered to by current data catalog tools. The Director of Engineering at Delhivery described why the company initially decided to build its own data catalog, before it later switched to Atlan:

Unfortunately, we couldn’t find the right solution. Each one (referring to Alation, Waterline, and Collibra) was either missing non-negotiable features or the TCO was just too high for us (due to expensive set-up, licensing, and professional service fees). We didn’t want to settle for something that wasn’t quite right, since setting up a data catalog is a huge commitment. So we realized we needed to build and customize our own internal solution.

Finally, if no data catalog in the market offers an extensions/API system that allows you to build higher-level features required by your business, this is a strong indicator for the build approach.


The build vs. buy spectrum: It’s not always black and white #

The decision between building or buying a data catalog tool doesn’t have to be a binary one. The options sit on a spectrum, with multiple paths potentially leading to your ideal solution. In fact, a hybrid approach that combines elements of both building and buying often provides the best of both worlds.

As Delhivery learned first-hand, variations on the hybrid approach can help you grow into the right data catalog solution over time:

  1. Buy, then build
  2. Build, then buy

Let’s explore each approach further.

Buy, then build #


This approach involves initially purchasing a data catalog tool that covers most of your fundamental needs and is open by default, for example through an open API architecture. Then, you build additional custom features or integrations that cater to your specific requirements.

A comprehensive data catalog tool acts as the foundation. Your custom integrations ensure that your data catalog fully meshes with your existing IT landscape and workflows.

Build, then buy #


Alternatively, you might choose to start by building your own solution. Focus on unique or complex requirements that off-the-shelf products don’t address. Create your own basic data catalog system, then purchase and integrate standalone features or tools to enhance its functionality.

Each of these hybrid approaches offers a distinct blend of customization and convenience, enabling you to optimize your data catalog to your exact needs.


Making the build vs. buy decision #

The decision to build or buy a data catalog benefits from a structured process, such as the Resource Pathways Framework, that organizes your understanding of your needs and resources.

Here’s a short, step-by-step example to guide your decision-making process:

  1. Start by clearly identifying your data catalog needs and defining the problem you aim to solve.
  2. Evaluate the resources available to you, including financial assets, the skills and time of your team, and existing technology infrastructure.
  3. Estimate the potential costs of both building and buying, including not only upfront costs, but also the total lifetime ownership costs, as mentioned earlier in this article.
  4. Stay agile, and define the challenges associated with both building and buying a data catalog.
  5. Conduct low-risk experiments or prototypes, limited in scope, to further inform your decision.
  6. Use the results of these experiments to address unknowns and assist in making an informed decision.

Remember, the goal is to find the best solution for your organization’s specific needs. Systematically evaluate your options using an agile, experimental approach to make confident, informed build vs. buy decisions for your data catalog tool.
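
One way to record the outcome of these steps is a weighted decision matrix over the considerations discussed earlier. The Python sketch below is only an illustration: the weights (importance from 1 to 5) and the scores (1 is poor, 5 is strong) are invented, and the scoring scheme itself is an assumption rather than part of the Resource Pathways Framework.

```python
# Hypothetical importance of each consideration, from 1 (low) to 5 (high).
WEIGHTS = {
    "total_cost_of_ownership": 5,
    "differentiation": 3,
    "resource_gap": 4,
    "opportunity_cost": 3,
    "holdup_risk": 4,
}


def weighted_score(scores: dict[str, int], weights: dict[str, int]) -> int:
    """Sum of weight * score; raises if a consideration has not been scored."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"Missing scores for: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


# Invented scores per option (1 = poor fit, 5 = strong fit).
build_scores = {
    "total_cost_of_ownership": 2,
    "differentiation": 5,
    "resource_gap": 2,
    "opportunity_cost": 2,
    "holdup_risk": 1,
}
buy_scores = {
    "total_cost_of_ownership": 4,
    "differentiation": 2,
    "resource_gap": 4,
    "opportunity_cost": 4,
    "holdup_risk": 4,
}

print("build:", weighted_score(build_scores, WEIGHTS))
print("buy:  ", weighted_score(buy_scores, WEIGHTS))
```

The totals are only as good as the inputs, so revisit the scores after each experiment or prototype instead of treating the first pass as final.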


Conclusion #

The build vs. buy decision for a data catalog tool hinges on your unique use cases and resources. Both paths have their advantages. However, in today’s data-centric world, buying a modern data catalog tool with an open API architecture can often provide a distinct edge.

Ultimately, the choice between building and buying a data catalog is a strategic one. By carefully considering your specific needs, resources, and potential solutions like Atlan, you can make an informed decision that advances your data management capabilities.

Considering buying a data catalog? Explore Atlan’s rich feature set today and embark on a new chapter in your data management journey.


