What Is DataOps?

November 26th, 2020

DataOps can help you bring together your data, team, tools and processes to become a truly data-driven organization.

Did you know that Gartner listed DataOps (i.e. data operations) as one of three innovation triggers in data management in 2018? That’s because there’s a lot of interest around the concept. Many people are actively taking part in conversations around DataOps and, as a result, it's now a buzzword in the data universe: everyone wants it, but not many know how to get it.

Here’s how Nick Heudecker, Research Vice President at Gartner, describes all the buzz around DataOps:

"DataOps is a new practice without any standards or frameworks. Currently, a growing number of technology providers have started using the term when talking about their offerings and we are also seeing data and analytics teams asking about the concept. The hype is present and DataOps will quickly move up on the Hype Cycle."

All of this can make it overwhelming for any data and analytics leader looking for a better way to manage data. That’s why we’ve created this resource answering frequently asked questions about DataOps such as:

  • What is DataOps?
  • Do you really need it?
  • What are the principles of DataOps?
  • How does it benefit your organization?

What is DataOps?

Many believe DataOps is a tool that you buy to fix your data problems magically. Or that DataOps is just DevOps for data pipelines. This, in turn, leads to another misconception: that DataOps is the sole responsibility of your data engineers (short answer: it’s the responsibility of the entire organization and not just a chosen few). So let’s debunk these myths (and others that you might have) by first defining DataOps.

DataOps as defined by Gartner & Other Leaders in the Data Community:

"DataOps is a collaborative data management practice, really focused on improving communication, integration, and automation of data flow between managers and consumers of data within an organization."

And now, the DataOps definition from Andy Palmer—the man credited with popularizing the term:

"DataOps is a data management method that emphasizes communication, collaboration, integration, automation and measurement of cooperation between data engineers, data scientists and other data professionals."

Notice how both definitions go beyond technology? They emphasize terms such as communication, collaboration, integration and cooperation. Also, notice how they refer to diverse roles within data teams? That’s because:

"DataOps is all about bringing together the tools you love, the processes you need and your people, in a single place for better data management within your organization."

At least, that’s how we think about DataOps at Atlan.

Getting started: do you need DataOps?

To start, let’s pick the easiest question to answer—whether or not you need DataOps. Here’s a quick check. Do you know:

  • Where your data comes from and what it means?
  • Where all of your data currently resides?
  • Whether everyone within your organization has access to the data they need?

If you’re unable to answer (or unsure of the answers to) even one of the questions above then, without a doubt, you need DataOps. One down, only a million more to go! So let’s move on to the next question, which is also often the most misunderstood one.

What are the principles of DataOps?

And how does it relate to Agile, DevOps or Lean Manufacturing?
DataOps takes inspiration from the principles of Agile, DevOps and Lean Manufacturing and applies them to the management of data teams, processes and people. That matters because being data-driven can be a significant moat for your business, in this decade and even the next.

1. Agile and DataOps

What is Agile?

The agile methodology is an iterative project management principle for software projects. With Agile, IT teams can release new software within a few hours (i.e. continuous delivery), not months, without compromising on quality.

How can data teams benefit from Agile?

Data teams can use the principles of Agile to work with big data and drive quick business decision-making. Let’s say that today, your data team takes two months to respond to business changes. This, in turn, delays business operations and leads to a lot of friction between your IT and business teams.

With DataOps, you can drastically reduce the time you spend finding the right data or bringing data science models into production. As a result, IT can change and adapt at the speed of business. And hey, here’s the best part: what your data team does isn’t a black box for your business teams anymore.

2. DevOps and DataOps

What is DevOps?

DevOps breaks down the silos between development and operations teams within organizations. It makes software development and deployment faster, easier and more collaborative.

How can data teams benefit from DevOps?

Data teams working in silos can use the principles of DevOps to collaborate better and deploy faster. For example, your data scientists depend on either engineering or IT to deploy their models—from exploratory data analysis to deploying machine learning algorithms. With DataOps, they can deploy their models themselves and perform analysis quickly. No more dependencies. #self-service analytics

Quick note (and we cannot emphasize this enough): DataOps isn’t just DevOps with data pipelines. The problem that DevOps solves is still between two highly technical teams—software development and IT. What DataOps has to deal with is diverse technical as well as business teams. So the challenges data teams face are more complex.

3. Lean manufacturing and DataOps

What is lean manufacturing?

Manufacturing happens in pipelines—raw materials flow through various manufacturing workstations to be transformed into finished goods. Lean manufacturing ensures minimal waste and greater efficiency without sacrificing product quality.

How can data teams benefit from lean manufacturing?

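Data teams build pipelines (think ETL/ELT) to transform data into insightful reports or visualizations, much as a production line turns raw materials into finished goods. Lean principles apply in the same way: cut wasted effort at each step without sacrificing the quality of the output.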

Let’s say that today, your data engineers spend most of their time taking the models (that your data scientists built) into production, building pipelines and fixing pipeline issues. With DataOps, that time goes down significantly.

So you see, DataOps uses the principles of Agile, DevOps and Lean Manufacturing for better management of your data, your processes and your teams. All that sounds good on paper. But is there really a need for DataOps in organizations? To answer that, let’s take a step back and take a stroll through history.

What led to the rise of DataOps?

This has been widely recognized as the data decade, so organizations are naturally investing to ensure their data teams can keep scaling in productivity, efficiency and innovation. That is where DataOps comes into the picture.

"While organizations are spending more on data and analytics initiatives, they still struggle to get any value out of it. The top reason is difficulty in showing ROI (Return on Investment)—getting stakeholders to believe."

- Gartner

Another reason is the rise in the number of consumers of data within an organization—each with a unique set of skills, tools and expertise. Leaders of data teams, especially CDOs, are expected to deliver value to the business with data, respond to ad hoc demands, ensure their teams are productive while managing all processes related to data management. Boy, that’s a tall order! Let's delve deeper into each of these struggles.

1. Massive volumes of complex data

It all started with the rise of big data. Any business that you can think of works with large volumes of data coming from various sources in different formats. In large organizations, the data landscape is complex, with tens of thousands of data sources and formats. Some examples include:

  • Financial transactions
  • CRM data
  • Online reviews and comments
  • Customer information (which includes sensitive data that’s subject to data compliance regulations and privacy laws)

However, you cannot use this information as-is to answer strategic questions such as where to open your next store, what products your target customers want or which global markets to target.

2. Technology overload

To answer your business questions, the data needs to be in a format that you can understand and use for analysis. That’s why all the data you gather undergoes a series of transformations (i.e. data and analytics pipelines). The data is profiled, cleaned, transformed and stored in a secure location to ensure data quality, integrity and relevance. This last bit is extremely critical for complying with regulations and policies around data protection (aka data governance).
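
As a rough illustration of the steps described above (profile, clean, transform, store), here is a minimal Python sketch. The sample records, the function names and the in-memory list standing in for a secure data store are all assumptions made for this example, not part of any particular DataOps tool.

```python
from collections import Counter

# Hypothetical raw records, standing in for data gathered from several sources.
RAW_ROWS = [
    {"customer": " Alice ", "amount": "120.50", "country": "us"},
    {"customer": "Bob", "amount": "", "country": "IN"},
    {"customer": "Carol", "amount": "75", "country": "in"},
]


def profile(rows):
    """Profile: count missing (empty) values per column."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if not value.strip():
                missing[column] += 1
    return dict(missing)


def clean(rows):
    """Clean: trim whitespace and drop rows that have any empty field."""
    trimmed = [{k: v.strip() for k, v in row.items()} for row in rows]
    return [row for row in trimmed if all(row.values())]


def transform(rows):
    """Transform: convert amounts to numbers and normalize country codes."""
    return [
        {**row, "amount": float(row["amount"]), "country": row["country"].upper()}
        for row in rows
    ]


def store(rows, destination):
    """Store: append rows to a list that stands in for a secure data store."""
    destination.extend(rows)
    return destination


def run_pipeline(rows):
    """Run the four steps in order and return the profile report and stored rows."""
    report = profile(rows)
    warehouse = store(transform(clean(rows)), [])
    return report, warehouse
```

Calling `run_pipeline(RAW_ROWS)` returns a small profiling report alongside the rows that survived cleaning. In practice each step is often handled by a separate tool.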

Now, for each of the processes mentioned above, you might be using various tools, from data cataloging and data profiling tools to analytics and reporting tools, which leads to technology overload.

Figure: Technology overload

3. Diverse roles and mandates

The people using the tools and technologies to work on your data (aka the humans of data) are also diverse.

  • Data engineers focus on data preparation and transformation
  • Data scientists worry about getting the right data for their algorithms
  • Analysts care about building daily/weekly reports and visualizations
  • IT cares about maintaining data access protocols and guaranteeing data quality, security and integrity
  • Business managers are keen on finding out whether the business is flourishing

Figure: Challenges that the humans of data face

Bringing together diverse technologies, processes and people with different mandates creates collaboration overhead and friction between teams. Sounds complex? It is. And that’s why we need a DataOps framework in place.

How does DataOps benefit your data team?

As we mentioned earlier, the humans of data are a diverse lot. See how DataOps makes things easier for them and helps them do the best work of their lives.

Figure: The current situation of a data team, without DataOps in the picture

Figure: A data team after implementing DataOps

  • True data democratization: Universal access to data for everyone within the organization who may benefit from it.
  • Faster time to insight: Since everyone has equal visibility of and access to data, they can get real-time insights and act on them.
  • Strong governance: DataOps standardizes data creation, usage and deletion policies, which enables central data governance.

You can have a DataOps framework in place by setting up processes and then finding tools that take care of each process. For example, one of the processes you have could be related to data quality checks—your analysts and business users get together to define what sort of data can enter your systems and set up some checks that automatically weed out bad data at source.

Borrowing from lean manufacturing, this process is a quick way to ensure better quality and build more trust in your data. Now, you could use a bunch of open-source engineering tools and write a few Python/R scripts … or you could use one platform that lets even a business user configure and run these quality checks in minutes. (Curious to know how you can set up quality checks? Check out our article on the topic here.)
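
To make the script-based route a little more concrete, here is a minimal Python sketch of quality checks that weed out bad records at source. The field names, the rules and the allowed country codes are hypothetical. They stand in for whatever your analysts and business users agree on, and a real setup would likely need more than this.

```python
# Hypothetical rules that analysts and business users might agree on.
# Each rule maps a field name to a check that returns True for acceptable values.
RULES = {
    "order_id": lambda value: isinstance(value, str) and value != "",
    "amount": lambda value: isinstance(value, (int, float)) and value >= 0,
    "country": lambda value: value in {"US", "IN", "DE"},
}


def failed_checks(record, rules=RULES):
    """Return the names of the fields in `record` that break a rule."""
    return [field for field, check in rules.items() if not check(record.get(field))]


def split_at_source(records, rules=RULES):
    """Separate incoming records into accepted ones and rejected ones with reasons."""
    accepted, rejected = [], []
    for record in records:
        failures = failed_checks(record, rules)
        if failures:
            rejected.append((record, failures))
        else:
            accepted.append(record)
    return accepted, rejected
```

Only the accepted records would move on into your systems. Keeping the rejected records together with the names of the failed checks gives the team something concrete to review, rather than silently dropping data.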
