What Are Data Silos and How Can You Break Them Down?

February 10th, 2021

Learn what data silos are (definition); why they pose the biggest challenge to your big data strategy, pipelines and dreams; and how to break them down.

First things first, if you’re searching for articles on the topic “data silos” and what they mean, congratulations—you’ve almost attained the state of self-realization.🧘🏽 Why?

Because let’s face it: one of the biggest challenges facing data teams, or anyone interested in making data-driven decisions, is the lack of easy access to data. You would think that this problem would have been sorted by now, right? Well, turns out it hasn’t.

Just like Tom Cruise keeps coming back to sort out the baddies of the world in the wildly popular Mission: Impossible movie franchise, data access problems keep cropping up as well. 🤷

Data workers spend 90% of their working week (around 36 hours) on data-related activities such as searching, preparation and analytics (with searching for and preparing data being the most common activities).

- The State of Data Science and Analytics report

That’s the bad news. Is there good news when it comes to data silos? Read on to find out!

Now buckle up for the long drive ahead—let’s dive deeper into the concept of data silos. For starters (second starters that is)…

What are data silos, what does siloed data mean and how do you define them?

As the term suggests, data silos are situations where data sits in isolation, cut off from the rest of the organization.

When data silos exist, some groups of people in an organization can access certain data or its sources while others can’t.

And sometimes people can’t get enough information on the context of the data, that is, its metadata: attributes such as lineage, meaning, business glossary terms, changes made and source.
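To make "context" a little more concrete, here is a minimal Python sketch of what a metadata record for a single table might hold. The `DatasetContext` class, its field names and the sample values are all hypothetical, made up for illustration; they don't describe any particular catalog tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetContext:
    """Hypothetical metadata record; every field name here is illustrative."""
    name: str
    meaning: str = ""                                         # plain-language description
    source: str = ""                                          # where the data came from
    lineage: List[str] = field(default_factory=list)          # upstream datasets
    glossary_terms: List[str] = field(default_factory=list)   # linked business terms
    changes: List[str] = field(default_factory=list)          # change history notes

    def missing_context(self) -> List[str]:
        """Return the names of the context fields that are still empty."""
        checks = {
            "meaning": self.meaning,
            "source": self.source,
            "lineage": self.lineage,
            "glossary_terms": self.glossary_terms,
            "changes": self.changes,
        }
        return [name for name, value in checks.items() if not value]


# A table that someone exported and never documented (sample values are invented).
record = DatasetContext(name="customer_2020_US", source="CRM export")
print(record.missing_context())
```

In this sketch only the source is filled in, so the record flags its meaning, lineage, glossary terms and change history as undocumented. That is roughly the situation a teammate is in when they stumble on a table with no context.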

The biggest problem with data silos is that there is no transparency or democracy in this equation. And the problem keeps on getting compounded when you add the scale and implications of using data in an enterprise.

Are data silos a new problem?

Not really.

  • Back in 2012, McKinsey reported that employees spend 1.8 hours every day (9.3 hours per week, on average) searching and gathering information.

So why are we worried about data silos today?

Data silos are a more recognizable and pressing problem now given the scale, volume, veracity, applications and implications of big data.

And the problem of data silos cuts across the lines of teams or departments.

Let’s take one of the most popular and sparkly teams in companies—marketing.

Most pioneering marketing teams today use data to make business decisions (also referred to as programmatic marketing or data-driven marketing in places).

Now if someone in marketing wants to optimize their ad spend, they will probably look at impressions generated, click-through rate (CTR) and cost per click (CPC).

Which is great—all the right boxes get ticked, right? Wrong—this data is incomplete.

This data only shows the marketer how to optimize the ads. Ideally, they’d also look at sales data to understand whether they are even bringing in the right kind of leads and traffic. Otherwise, the company will just keep bumping up ad budgets without ever seeing an increase in actual sales and customers.
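Here is a rough Python sketch of that gap. Every campaign name and number below is invented, and `campaign_report` is a hypothetical helper, not part of any ad platform's API. It simply assumes the ad metrics and the sales data can be matched on a shared campaign name, which is exactly what a silo tends to prevent.

```python
# Hypothetical ad-platform numbers per campaign (all figures are made up).
ad_metrics = {
    "campaign_a": {"impressions": 10000, "clicks": 500, "spend": 250.0},
    "campaign_b": {"impressions": 10000, "clicks": 200, "spend": 200.0},
}

# Hypothetical sales data living in a different tool: customers won per campaign.
sales = {"campaign_a": 2, "campaign_b": 10}


def ratio(numerator, denominator):
    """Divide, returning None when the denominator is zero."""
    return numerator / denominator if denominator else None


def campaign_report(ad_metrics, sales):
    """Combine ad metrics with sales outcomes for each campaign."""
    report = {}
    for name, m in ad_metrics.items():
        customers = sales.get(name, 0)  # campaigns missing from sales count as 0
        report[name] = {
            "ctr": ratio(m["clicks"], m["impressions"]),
            "cpc": ratio(m["spend"], m["clicks"]),
            "customers": customers,
            "cost_per_customer": ratio(m["spend"], customers),
        }
    return report


report = campaign_report(ad_metrics, sales)
for name, row in report.items():
    print(name, row)
```

With these made-up numbers, campaign_a wins on CTR and CPC, yet each customer it brings in costs far more than one from campaign_b. A marketer who only sees the ad dashboard would back the wrong campaign.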

Yeah, yeah. I know what you’re thinking—that this is not a data problem, it’s a mindset problem. But that’s the thing—data silos are not just about sharing the data, they are also about sharing the love, sorry… mindset.

Now, I know I’m jumping the gun here, considering that we still need to go over the challenges posed by data silos. So I’m just going to leave a crumb here, or fold the page a little if you will, so we remember to come back to the idea of creating a healthy data culture and democratizing your data.

Now that we know what data silos are, let’s go on and understand what causes them in the first place. And that will tell us whether this mission is possible at all!

Causes of data silos—the baddies in the world of data

That’s like asking what causes the sun to shine or the birds to chirp. No, really!

The truth is that no matter which way you slice and dice the facts, data silos occur for one of the following two reasons, or more usually a combination of both:

  1. People… also known as humans of data in the Atlan universe 🧑‍🤝‍🧑
  2. Technology… associated with software, vendors and platforms 👨‍💻

1. People

When it comes to people, culture and organizational structures are the biggest challenges. Many times, departments are siloed and far removed from each other, especially in larger enterprises.

And while this is the natural order in most cases, and even seen as necessary for companies to function, the problem is that information ends up being tied to a particular department or team and doesn’t flow naturally from one desktop to another.

Simply because no one knows who else could use or is looking for that data. Multiple layers of hierarchy and management just make the problem worse.

And considering we are human, and data means power, a competitive mindset could just be getting in the way of data democratization.

2. Technology

Now, when you have all these different humans of data using data for different use cases and purposes, you are bound to have an overload of the tools and technologies they use.

For instance, maybe your Sales team uses Salesforce, but your Marketing team uses HubSpot. Yet Salesforce might contain valuable information that your marketers could use.

Just think about it for an itty-bitty minute. How many apps and tools does your company use? (Interested in how people use apps at work? Check out this useful nugget).

And just like humans don’t talk to each other, tools don’t talk to each other. They could, but they are bound by their people and processes.

And yes, you might say that your company is moving to, or has already moved to, the cloud (Google Cloud, Azure or AWS), which is great. But this is also making things very cloudy.

Do you know what’s in your cloud data lake and who has access to it?

TL;DR: The problem is that humans don’t talk to other human beings and tools don’t talk to other tools. They don’t exchange data or its context (metadata).

<Writer inserts “We Don't Talk Anymore” here as the theme song>

So, what do you think? Can you solve these problems? But more importantly, do we even want to solve the problem of data silos?

Are data silos a big enough problem to solve? What do data silos cost you?

The short answer—time, effort and money. Can you put a price on that for your enterprise?

Now for the (slightly) long answer.

Silos significantly hamper workplace collaboration and productivity

This one is a no-brainer. If you’ve ever tried to hunt down your latest sales numbers, or asked someone to explain what a particular table in your data set (say, customer_2020_US) means, you know what I mean.

When teams do not share data, they end up duplicating work and effort and getting demotivated in the process.

Less data sharing = wasted storage + inconsistent data

If your data lives in a million places, it will take up extra space—just like the gazillion printed copies of last year’s sustainability report.

Irony anyone?

The cloud may give you unlimited storage, but it can also give you unlimited chaos if not managed at the organizational level. And duplicate data will only threaten the accuracy of your data and create many sources of the un-truth.
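One simple way to spot this kind of duplication is to fingerprint each file's contents and group the locations that share a fingerprint. The Python sketch below does that with SHA-256 from the standard library. The `find_duplicates` helper and the file paths are hypothetical, and it only catches byte-for-byte copies, not the near-duplicates that have quietly drifted apart.

```python
import hashlib
from collections import defaultdict


def find_duplicates(files):
    """Group locations whose contents are byte-for-byte identical.

    `files` maps a location label to the raw bytes stored there.
    Returns a list of groups, each a sorted list of locations.
    """
    by_digest = defaultdict(list)
    for location, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        by_digest[digest].append(location)
    # Keep only fingerprints that show up in more than one place.
    return [sorted(locations) for locations in by_digest.values() if len(locations) > 1]


# Hypothetical copies of the same export sitting in different team folders.
files = {
    "sales/customers.csv": b"id,name\n1,Ada\n",
    "marketing/customers_copy.csv": b"id,name\n1,Ada\n",
    "finance/invoices.csv": b"id,amount\n1,100\n",
}
duplicates = find_duplicates(files)
print(duplicates)
```

Here the sales and marketing files end up in the same group, while the finance file stands alone. In practice you would read the bytes from wherever your data lives, but the grouping idea stays the same.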

Inability to see the big data picture

Have you ever heard the Indian fable of the blind men and an elephant?

“It is a story of a group of blind men, who have never come across an elephant before and who learn and conceptualize what the elephant is like by touching it. Each blind man feels a different part of the elephant’s body, but only one part, such as the side or the tusk. They then describe the elephant based on their limited experience…the moral of the parable is that humans have a tendency to claim absolute truth based on their limited, subjective experience as they ignore other people’s limited, subjective experiences which may be equally true.”

Image: Data is the big elephant in the room. Courtesy Wildequus

Data can be like that. The big elephant in the room. Data silos mean that you will never get a comprehensive view of your data, or of the best way to use it for your use case.

Data silos are often associated with specific industries such as healthcare or marketing.

But the truth is that any function or organization that uses data as a moat may very well be dealing with data silos. (You can read this excellent article on data silos and customer centricity to understand how data affects customer success.)

Can we break down these data silos?

Yes, in two steps of moderate difficulty. 🙂 #MissionPossible

1. Clean up your data culture

Remember the bookmark we added earlier in this blog, the one about creating a healthy data culture? Well, it’s important to educate your teammates about the importance of sharing and collaborating on data and becoming data literate.

2. Create one source of truth for your data

And if you’re a CDO or CAO, or any type of leader in an organization, even better. You can set up the data ecosystem in your company to be open and accessible, without compromising on data security, governance, and compliance.

Think one tool to rule them all.
