DataOps can help you bring together your data, team, tools and processes to become a truly data-driven organization.
Did you know that Gartner listed DataOps (i.e., data operations) as one of the three innovation triggers in data management in 2018? That’s because there’s a lot of interest around the concept. People across the industry are actively discussing DataOps, and as a result, it's now a buzzword in the data universe: everyone wants it, but not many know how to get there.
Here’s how Nick Heudecker, Research Vice President at Gartner, describes all the buzz around DataOps:
"DataOps is a new practice without any standards or frameworks. Currently, a growing number of technology providers have started using the term when talking about their offerings and we are also seeing data and analytics teams asking about the concept. The hype is present and DataOps will quickly move up on the Hype Cycle."
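All of this can be overwhelming for any data and analytics leader looking for a better way to manage data. That’s why we’ve created this resource answering frequently asked questions about DataOps, such as:
- What is DataOps?
- What are the principles of DataOps?
- What led to the rise of DataOps?
- How does DataOps benefit your data team?
- Do you need DataOps?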
Do you need DataOps?
We’re going to flip the order of the questions listed above and start with the easiest one to answer: whether or not you need DataOps.
Here’s a quick check. Do you know:
- Where your data comes from and what it means?
- Where all of your data currently resides?
- Whether everyone within your organization has access to the data they need?
If you’re unable to answer (or unsure of the answers to) even one of the questions above, then, without a doubt, you need DataOps. One down, only a million more to go! So let’s move on to the next question, which is also the most often misunderstood one.
What is DataOps?
Many believe DataOps is a tool that you can buy to magically fix your data problems, or that DataOps is just DevOps for data pipelines. This, in turn, leads to another misconception: that DataOps is the sole responsibility of your data engineers (short answer: it’s the responsibility of the entire organization, not just a chosen few). So let’s debunk these myths (and any others you might have) by first defining DataOps.
Here’s the DataOps definition from Gartner:
"DataOps is a collaborative data management practice, really focused on improving communication, integration, and automation of data flow between managers and consumers of data within an organization."
And now, the DataOps definition from Andy Palmer—the man credited with popularizing the term:
"DataOps is a data management method that emphasizes communication, collaboration, integration, automation and measurement of cooperation between data engineers, data scientists and other data professionals."
Notice how both definitions go beyond technology? They emphasize terms such as communication, collaboration, integration and cooperation. Also, notice how they refer to diverse roles within data teams? That’s because:
"DataOps is all about bringing together the tools you love, the processes you need and your people, in a single place for better data management within your organization."
- DataOps definition from Atlan
What are the principles of DataOps?
And how does it relate to Agile, DevOps or Lean Manufacturing?
1. Agile and DataOps
What is Agile?
Agile is an iterative approach to managing software projects: teams work in short cycles, deliver small increments and adjust based on feedback. Combined with practices such as continuous delivery, this lets IT teams release new software in hours or days, not months, without compromising on quality.
How can data teams benefit from Agile?
Data teams can apply Agile principles to their big data work and support faster business decision-making. Let’s say that today, your data team takes two months to respond to business changes. This delays business operations and creates a lot of friction between your IT and business teams.
With DataOps, you can drastically reduce the time you spend finding the right data or bringing data science models into production. As a result, IT can change and adapt at the speed of business. And hey, here’s the best part: what your data team does isn’t a black box for your business teams anymore.
2. DevOps and DataOps
What is DevOps?
DevOps breaks down the silos between development and operations teams within organizations. It makes software development and deployment faster, easier and more collaborative.
How can data teams benefit from DevOps?
Data teams working in silos can use the principles of DevOps to collaborate better and deploy faster. For example, your data scientists may depend on engineering or IT at every stage of their work, from exploratory data analysis to deploying machine learning models.
With DataOps, they can deploy their models themselves and perform analysis quickly. No more dependencies. #self-service analytics
Quick note (and we cannot emphasize this enough): DataOps isn’t just DevOps with data pipelines. The problem DevOps solves sits between two highly technical teams: software development and IT operations. DataOps has to bring together diverse technical and business teams, so the challenges data teams face are more complex.
3. Lean manufacturing and DataOps
What is lean manufacturing?
Manufacturing happens in pipelines: raw materials flow through various workstations to be transformed into finished goods. Lean manufacturing aims to minimize waste and maximize efficiency without sacrificing product quality.
How can data teams benefit from lean manufacturing?
Data teams build pipelines too (think ETL/ELT): raw data flows through a series of steps and comes out as insightful reports or visualizations.
Let’s say that today, your data engineers spend most of their time taking the models your data scientists built into production, building pipelines and fixing pipeline issues. With DataOps, which applies lean thinking such as cutting rework and automating repetitive steps, that time goes down significantly.
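To make the production-line analogy concrete, here is a minimal Python sketch, assuming a pipeline modeled as an ordered list of plain functions. The stage names and records are hypothetical. Like a lean production line, it records how many records enter and leave each stage, so you can see where data is lost (the "waste") instead of discovering it downstream.

```python
from typing import Callable

Record = dict
Stage = Callable[[list[Record]], list[Record]]


def run_pipeline(records: list[Record], stages: list[tuple[str, Stage]]):
    """Pass records through each stage in order, logging how many go in and come out."""
    report = []
    for name, stage in stages:
        count_in = len(records)
        records = stage(records)
        report.append((name, count_in, len(records)))
    return records, report


# Hypothetical "workstations" on the line
def drop_missing_amount(records):
    return [r for r in records if r.get("amount") is not None]


def dedupe(records):
    seen, out = set(), []
    for r in records:
        if r["id"] not in seen:
            seen.add(r["id"])
            out.append(r)
    return out


def to_cents(records):
    return [{**r, "amount_cents": round(r["amount"] * 100)} for r in records]


raw = [
    {"id": 1, "amount": 10.5},
    {"id": 2, "amount": None},
    {"id": 1, "amount": 10.5},  # duplicate of id 1
    {"id": 3, "amount": 4.25},
]

result, report = run_pipeline(raw, [
    ("drop_missing_amount", drop_missing_amount),
    ("dedupe", dedupe),
    ("to_cents", to_cents),
])
for name, n_in, n_out in report:
    print(f"{name}: {n_in} in, {n_out} out, {n_in - n_out} dropped")
```

Here, the first two stages each drop one record. In a real pipeline, a stage that keeps dropping records is a signal to fix the problem upstream rather than clean up after it.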
So you see, DataOps uses the principles of Agile, DevOps and Lean Manufacturing for better management of your data, your processes and your teams. All that sounds good on paper. But is there really a need for DataOps in organizations? To answer that, let’s take a step back and take a stroll through history.
What led to the rise of DataOps?
"While organizations are spending more on data and analytics initiatives, they still struggle to get any value out of it. The top reason is difficulty in showing ROI (Return on Investment)—getting stakeholders to believe."
Another reason is the rise in the number of consumers of data within an organization, each with a unique set of skills, tools and expertise. Leaders of data teams, especially CDOs, are expected to deliver value to the business with data, respond to ad hoc demands and keep their teams productive, all while managing every process related to data management. Boy, that’s a tall order! Let's delve deeper into each of these struggles.
1. Massive volumes of complex data
It all started with the rise of big data. Almost any business you can think of works with large volumes of data coming from various sources in different formats. In large organizations, the data landscape is especially complex, spanning tens of thousands of data sources and formats. Some examples include:
- Financial transactions
- CRM data
- Online reviews and comments
- Customer information (which includes sensitive data that’s subject to data compliance regulations and privacy laws)
However, you cannot use this information as-is to answer strategic questions such as where to open your next store, which products your target customers want, or which global markets you should target.
2. Technology overload
To answer your business questions, the data needs to be in a format that you can understand and use for analysis. That’s why the data you gather goes through a series of transformations (i.e., data and analytics pipelines): it is profiled, cleaned, transformed and stored in a secure location to ensure data quality, integrity and relevance. Secure storage, in particular, is critical for complying with regulations and policies around data protection, a core part of data governance.
For each of these processes, you might be using a different tool, from data cataloging and data profiling tools to analytics and reporting tools. The result: technology overload.
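The profile, clean, transform and store steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the transaction records, the cleaning rules and the `customer_spend` table are hypothetical, and an in-memory SQLite database stands in for a secure, governed store.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw transactions, as they might arrive from a source system
raw = [
    {"customer_id": " c1 ", "amount": "19.99"},
    {"customer_id": "c2", "amount": "5.00"},
    {"customer_id": None, "amount": "7.50"},
    {"customer_id": "c1", "amount": "not a number"},
]


def profile(rows):
    """Count missing values per column so problems are visible before cleaning."""
    missing = defaultdict(int)
    for row in rows:
        for col, value in row.items():
            if value is None:
                missing[col] += 1
    return {"row_count": len(rows), "missing": dict(missing)}


def clean(rows):
    """Trim identifiers, parse amounts, and drop rows that cannot be repaired."""
    cleaned = []
    for row in rows:
        if row["customer_id"] is None:
            continue
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            continue
        cleaned.append({"customer_id": row["customer_id"].strip(), "amount": amount})
    return cleaned


def transform(rows):
    """Aggregate spend per customer, a shape that reports can use directly."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer_id"]] += row["amount"]
    return sorted(totals.items())


def store(conn, totals):
    """Persist results; in-memory SQLite stands in for a secure store (assumption)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_spend (customer_id TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO customer_spend VALUES (?, ?)", totals)
    conn.commit()


print(profile(raw))
conn = sqlite3.connect(":memory:")
store(conn, transform(clean(raw)))
print(conn.execute("SELECT * FROM customer_spend ORDER BY customer_id").fetchall())
```

Note that profiling only flags the missing customer ID; the unparseable amount is caught during cleaning. Each step usually lives in a separate tool, which is exactly where the technology overload comes from.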
3. Diverse roles and mandates
The people using the tools and technologies to work on your data (aka the humans of data) are also diverse.
- Data engineers focus on data preparation and transformation
- Data scientists worry about getting the right data for their algorithms
- Analysts care about building daily/weekly reports and visualizations
- IT cares about maintaining data access protocols and guaranteeing data quality, security and integrity
- Business managers are keen on finding out whether the business is flourishing
Figure: Challenges that the humans of data face
Bringing together diverse technologies, processes and people with different mandates creates collaboration overhead and friction between teams. Sounds complex? It is. And that’s why we need a DataOps framework in place.
How does DataOps benefit your data team?
As we mentioned earlier, the humans of data are a diverse lot. Here’s how DataOps makes things easier for them and helps them do their best work.
Figure: The current situation of a data team, without DataOps in the picture
Figure: A data team after implementing DataOps
You can put a DataOps framework in place by first setting up processes and then finding tools that support each process. For example, one of your processes could cover data quality checks: your analysts and business users get together to define what data can enter your systems, then set up checks that automatically weed out bad data at the source.