What Is a Modern Data Platform: Components, Capabilities, and Tools
October 6th, 2022
Data platforms are composed of technology units that collect, transform, analyze and deliver data. While data platforms are nothing new, they’re being constantly redefined. In fact, a key distinguishing feature of data platforms is expandability — the ability to grow with advances in technology. Still, there are constants that never change.
In this blog, we’ll define and explore the characteristics of the modern data platform as seen in its latest evolution.
What is a modern data platform?
A modern data platform is designed to be democratic, proactive, scalable, and flexible to respond to future technologies and evolving needs of modern data teams. It is the technological base on which data tools and applications connect or sit. Think of it as the operating system of the data stack. It provides the acquisition, storage, preparation, and analysis functions necessary so that users can make data-inspired decisions.
Just as most home computers run on one of two operating systems (Windows or macOS), there are only a handful of players in the cloud data platform space (AWS, Azure, Google Cloud, IBM, Oracle). The data platform hypothesis posits that consolidation around these cloud-based vendors has led to data being collected in a standard set of systems. This benefits developers, who can use this single point of integration to build new applications: dashboards, data catalogs, and tools for orchestration, governance, and observability, all of which connect to data platforms and form the modern data stack.
The three characteristics of a modern data platform
There are really three characteristics that set the modern data platform apart from older generations:
- Self-service for diverse users
- Agile data management
- Flexible, fast, pay as you go
Self-service for diverse users
Previously, data was siloed and accessible for the exclusive use of data professionals. A modern data platform removes barriers so that everyone within an organization is able to access and leverage data.
Data democratization requires technology that is intuitive to use and features easy discovery capabilities so that users can find what they need as easily as using Google. Users should also be able to perform basic analysis, understand the context of the data, and derive insights all on their own.
Agile data management
Modern data platforms are governed by the twin principles of availability and elasticity:
- Availability refers to data being readily available in a data lake or warehouse that separates storage and compute. Splitting the two functions enables the cheap storage of large amounts of data.
- Elasticity refers to cloud-based compute functions that scale automatically to meet demand. For example, if the data team runs a lot of analysis on Monday, then processing power will expand on that day to meet the need, before contracting again to save on costs.
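The elasticity principle can be sketched as a simple scaling rule: compute capacity expands with demand and contracts to a floor when idle. This is an illustrative sketch; the function, worker counts, and thresholds are hypothetical, not any cloud vendor's actual autoscaling API.

```python
# Illustrative elastic-scaling rule: capacity grows with pending work
# and shrinks back to a minimum when demand drops. All names and
# numbers here are hypothetical.

def workers_needed(pending_queries: int, queries_per_worker: int = 10,
                   min_workers: int = 1, max_workers: int = 32) -> int:
    """Return how many compute workers to run for the current load."""
    # Round up so every pending query has capacity, then clamp to limits.
    needed = -(-pending_queries // queries_per_worker)  # ceiling division
    return max(min_workers, min(max_workers, needed))

print(workers_needed(0))    # quiet Sunday: scale down to the floor -> 1
print(workers_needed(250))  # busy Monday: scale out -> 25
```

The clamp to `max_workers` mirrors the budget guardrails most cloud autoscalers expose, so a runaway workload cannot scale costs without bound.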
Flexible, fast, pay as you go
As previously mentioned, modern data platforms mostly run on cloud infrastructure. Today’s cloud solutions are easy to set up, fast (data lakes and lakehouses reduce ETL time), and flexible enough to evolve with future technologies. Plus, you don’t have to invest in building a massive on-premises infrastructure; you pay for only what you need.
The building blocks of a modern data platform
The foundation of any modern data platform is constructed from the same building blocks:
- Data sources
- Ingestion and transport
- Storage
- Query and processing
- Transformation
- Analysis and output
Data sources
These are where data originates. OLTP databases, logs, third-party APIs, file and object storage, event collectors, apps, and more can all be sources of data.
Ingestion and transport
After data is generated at a source, it’s moved (typically in real-time or in batches) to a destination for storage, query, and analysis. In a modern data platform, this process of ingestion and transport is usually performed automatically rather than manually, which was the case with traditional data platforms.
Ingestion and transport tools include Fivetran, Hevo Data, Stitch, Pulsar, and Upsolver.
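The batch side of ingestion can be sketched in a few lines: records are pulled from a source, grouped into fixed-size batches, and handed off to a destination. The generator source and batch size below are hypothetical stand-ins for what tools like Fivetran or Stitch manage at scale.

```python
# Minimal sketch of batch ingestion: pull records from a source and
# group them into fixed-size batches for loading. The source and batch
# size are illustrative, not any vendor's API.
from typing import Iterable, Iterator

def batches(records: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield records in batches of at most `size`."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch

source = ({"id": i} for i in range(7))       # pretend upstream source
loaded = [len(b) for b in batches(source, size=3)]
print(loaded)  # -> [3, 3, 1]: two full batches plus a partial one
```

Streaming tools like Pulsar invert this model, pushing each event onward as it arrives rather than waiting for a batch to fill.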
Storage
Data is commonly stored in data warehouses, and, increasingly, in data lakes or data lakehouses which allow for the fast migration of data from a source to storage. That’s because data lakes and lakehouses are able to house raw, unstructured data, delaying the time-consuming transformation process until it’s needed for analysis.
Storage tools include Amazon S3, Azure Data Lake Storage Gen2, Google Cloud Storage, Redshift, and Snowflake.
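The "store raw now, transform later" idea can be illustrated by landing events untouched under a date-partitioned path, a layout commonly used on object stores like Amazon S3. The directory names and fields below are illustrative.

```python
# Sketch of landing raw events in a data lake layout: files are written
# as-is under a date-partitioned path, deferring transformation until
# analysis time ("schema on read"). Paths and fields are illustrative.
import json
import tempfile
from pathlib import Path

def land_raw(events: list[dict], root: Path, dataset: str, dt: str) -> Path:
    """Write events untransformed to a date-partitioned path."""
    partition = root / "raw" / dataset / f"dt={dt}"
    partition.mkdir(parents=True, exist_ok=True)
    out = partition / "part-0.jsonl"
    with out.open("w") as f:
        for event in events:
            f.write(json.dumps(event) + "\n")  # stored exactly as received
    return out

lake = Path(tempfile.mkdtemp())  # local stand-in for an object store
path = land_raw([{"user": 1, "action": "click"}], lake, "events", "2022-10-06")
print(path.relative_to(lake))
```

Partitioning by date keeps later queries cheap: an engine can skip every partition outside the date range it needs.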
Query and processing
Data warehouses and data lakehouses allow for query and processing, where users can perform simple or complex searches. The data can also be processed, that is to say, validated, sorted, summarized, and aggregated. Query and processing can also occur via Spark, via a dedicated SQL query engine, or within a data science/machine learning platform.
Query and processing tools include Databricks, Pandas, Dask, Confluent, and Flink.
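The processing steps named above (validate, sort, aggregate) can be shown on a handful of records. The record shapes are illustrative; at platform scale this same logic runs on engines like Spark or Flink.

```python
# Sketch of the processing steps described above: validate, sort, and
# aggregate a small set of records. Field names are illustrative.
from collections import defaultdict

orders = [
    {"region": "eu", "amount": 40},
    {"region": "us", "amount": 25},
    {"region": "us", "amount": -5},   # invalid: negative amount
    {"region": "eu", "amount": 10},
]

valid = [o for o in orders if o["amount"] > 0]   # validate: drop bad rows
valid.sort(key=lambda o: o["region"])            # sort by grouping key

totals = defaultdict(int)
for o in valid:                                  # aggregate per region
    totals[o["region"]] += o["amount"]

print(dict(totals))  # -> {'eu': 50, 'us': 25}
```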
Transformation
Previously, ETL was performed locally onsite and was slow due to limited computing power. A modern data platform utilizes the latest transformation technique – cloud ETL, which leverages the power and performance of a cloud environment to execute faster data transformations, reducing time to insights. The ability to transform data in hours, rather than days, enables data to be used in predictive machine learning and artificial intelligence models (which require lots of data), driving even more valuable business intelligence.
Transformation tools include AWS Glue, dbt, Matillion, and Domo.
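The in-warehouse transformation pattern (often called ELT) can be sketched as: load raw rows first, then derive transformed tables with SQL running inside the engine, which is the approach tools like dbt take on cloud warehouses. Here SQLite stands in for the warehouse; the table and column names are illustrative.

```python
# Sketch of in-warehouse ("ELT") transformation: raw data is loaded
# first, then transformed with SQL inside the engine itself. SQLite is
# a local stand-in for a cloud warehouse; schema names are illustrative.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw_events (user_id INT, amount_cents INT)")
con.executemany("INSERT INTO raw_events VALUES (?, ?)",
                [(1, 999), (1, 500), (2, 350)])

# The "model": a derived table defined entirely by a SQL query.
con.execute("""
    CREATE TABLE user_totals AS
    SELECT user_id, SUM(amount_cents) AS total_cents
    FROM raw_events
    GROUP BY user_id
""")

print(con.execute("SELECT * FROM user_totals ORDER BY user_id").fetchall())
# -> [(1, 1499), (2, 350)]
```

Because the transformation is just SQL against already-loaded data, it can be rerun, versioned, and tested like any other code, which is much of what makes cloud ETL faster to iterate on than its on-premises predecessor.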
Analysis and output
A modern data platform provides a workspace for users to derive analysis and insights. This business intelligence can then be presented with visualizations (e.g., charts and graphs) so that it’s intelligible to all, not just those with a data background.
Analysis and output tools include Looker, Mode, and Tableau.
Metadata Management – the control plane of the modern data platform
The modern data platform is a powerful technology that is fast, simple to set up, and easy to scale, all without a significant capital investment. Still, it falls short in bringing discovery, trust, and context to data. Data is often isolated in silos, making it difficult to find, leverage, or even know what actually exists. As a result, data workers spend the majority of their time searching for and preparing data rather than analyzing it for insights. That’s where metadata comes in.
Metadata management has become the control plane of the modern data platform. It’s an indispensable practice in bringing context to data by providing attributes like data source, origin, owner, query logs, lineage, and quality score.
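A metadata record carrying the attributes listed above might look like the following sketch. The schema is purely illustrative, not the API of any real catalog or metadata platform.

```python
# Illustrative metadata record with the context attributes described
# above (source, owner, lineage, quality). The schema is hypothetical.
from dataclasses import dataclass, field

@dataclass
class AssetMetadata:
    name: str                    # fully qualified asset name
    source: str                  # where the data originates
    owner: str                   # who is accountable for the asset
    lineage: list = field(default_factory=list)  # upstream assets
    quality_score: float = 0.0   # e.g. share of rows passing checks

table = AssetMetadata(
    name="analytics.user_totals",
    source="warehouse",
    owner="data-team@example.com",
    lineage=["raw.events"],
    quality_score=0.98,
)
print(table.lineage)  # -> ['raw.events']
```

Attaching lineage and ownership to every asset is what lets a catalog answer "where did this number come from, and who do I ask about it?" without a manual hunt.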
Atlan is a modern data workspace that leverages active metadata for better data discovery, lineage, cataloging, and governance. Our metadata management solution helps teams better locate, manage, organize, and understand data assets.
Modern data platform: Related reads
- Modern data teams: Roles, structure and how do you build one
- Modern data catalogs: 5 essential features and evaluation guide
- What is a data lake? Definition, architecture, and solutions
- Modern data culture: The open secret to great data teams
- What is modern data stack: History, components, platforms, and the future
- What is data democratization: Definition, benefits, and strategy