Modern Data Team 101: A Roster of Diverse Talent
October 7th, 2022
Data teams are among the most diverse teams in an organization and can add tremendous value. In this blog, we’ll explore how to build and structure a modern data team.
What are the roles in a modern data team?
Data workers bring a diverse set of skills to an organization, from strengths in math and programming to talents in visualization and storytelling. A modern data team usually includes a mix of the following roles:
- Data Engineer
- Data Analyst
- Data Scientist
- Data Analytics Engineer
- Machine Learning (ML) Engineer
- BI Developer
- Data Product Owner
- Data Steward
- Data Architect
Data engineers design, build and maintain datasets used in data projects. This also includes maintaining data lakes and other data repositories.
A data engineer’s responsibilities largely revolve around preparing and overseeing the data pipeline. For example, a data engineer might collect data from multiple sources, then cleanse and transform it into an acceptable format for colleagues to use. A data engineer makes a good first hire when filling out a data team.
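To make the collect, cleanse, and transform steps concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the two sources (`crm_rows`, `webshop_rows`), their field names, and the helper functions are hypothetical, and a real pipeline would typically read from databases or APIs rather than in-memory lists.

```python
from datetime import date

# Hypothetical raw records from two sources with inconsistent formats.
crm_rows = [
    {"customer": " Alice ", "signup": "2022-01-15", "spend": "120.50"},
    {"customer": "bob", "signup": "2022-02-03", "spend": ""},
]
webshop_rows = [
    {"name": "ALICE", "joined": "2022-01-15", "total": "80"},
    {"name": "Carol", "joined": "2022-03-09", "total": "42.10"},
]


def clean_record(name, signup, spend):
    """Normalize one raw record into a shared schema."""
    return {
        "customer": name.strip().title(),
        "signup": date.fromisoformat(signup),
        # Assumption: a missing amount counts as zero; a real pipeline
        # might flag or reject the record instead.
        "spend": float(spend) if spend else 0.0,
    }


def build_dataset(crm, webshop):
    """Collect from both sources, cleanse, and merge spend per customer."""
    cleaned = [clean_record(r["customer"], r["signup"], r["spend"]) for r in crm]
    cleaned += [clean_record(r["name"], r["joined"], r["total"]) for r in webshop]

    merged = {}
    for rec in cleaned:
        entry = merged.setdefault(
            rec["customer"],
            {"customer": rec["customer"], "signup": rec["signup"], "spend": 0.0},
        )
        # Keep the earliest signup date and add up spend across sources.
        entry["signup"] = min(entry["signup"], rec["signup"])
        entry["spend"] += rec["spend"]
    return sorted(merged.values(), key=lambda r: r["customer"])


dataset = build_dataset(crm_rows, webshop_rows)
```

The resulting `dataset` has one tidy row per customer, which is the kind of "acceptable format" an analyst or data scientist could pick up without redoing the cleanup.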
Data analysts analyze the structured data provided by the data engineer to identify trends and patterns. They then translate their findings into actionable insights and present them in an intelligible way to non-technical audiences, such as marketing or sales departments, who can then decide on a course of action.
In a way, they try to make sense of past and present data. For example, an airline might ask its data analysts to determine which current routes are least profitable and should be reconsidered.
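As a rough illustration of the airline example, the sketch below ranks routes by profit margin and flags the weakest ones. The route codes, the figures, and the 10% threshold are all made up for the example; a real analysis would draw on actual booking and cost data.

```python
# Hypothetical per-route figures; route codes and numbers are invented.
routes = [
    {"route": "AAA-BBB", "revenue": 1_200_000, "cost": 900_000},
    {"route": "AAA-CCC", "revenue": 450_000, "cost": 520_000},
    {"route": "BBB-DDD", "revenue": 800_000, "cost": 760_000},
]

# Assumed cutoff: routes earning less than a 10% margin get reviewed.
MARGIN_THRESHOLD = 0.10


def profit_margin(row):
    """Profit as a share of revenue for one route."""
    return (row["revenue"] - row["cost"]) / row["revenue"]


# Sort from least to most profitable.
ranked = sorted(routes, key=profit_margin)

# Routes whose margin falls below the assumed threshold.
to_reconsider = [r["route"] for r in ranked if profit_margin(r) < MARGIN_THRESHOLD]
```

With these sample numbers, `to_reconsider` contains the loss-making route first, followed by the one that barely breaks even.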
Data scientists are typically more experienced data analysts who focus primarily on building predictive models and machine learning algorithms that mine large data sets and forecast future outcomes.
They are usually experts in statistics and mathematics, experienced with SQL, skilled in programming languages, and often work with unstructured data. For instance, a cruise line might lean on data scientists to forecast upcoming demand for a range of new destinations so the company can determine which is likely to be the most profitable.
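The cruise line example can be sketched with a deliberately simple stand-in: fitting a straight-line trend to past bookings and extending it one period ahead. This is only an illustration under stated assumptions. The destinations and booking counts are hypothetical, and a data scientist would normally use richer models that account for seasonality, pricing, and other factors.

```python
def fit_linear_trend(values):
    """Ordinary least-squares line through (0, v0), (1, v1), ..."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast_next(values):
    """Extend the fitted trend one period past the last observation."""
    slope, intercept = fit_linear_trend(values)
    return intercept + slope * len(values)


# Hypothetical quarterly bookings for comparable existing destinations.
bookings = {
    "Destination A": [100, 110, 120, 130],
    "Destination B": [200, 190, 180, 170],
}

forecasts = {name: forecast_next(history) for name, history in bookings.items()}

# Destination with the highest forecast demand for the next period.
best = max(forecasts, key=forecasts.get)
```

Note that demand is only one input to profitability; a fuller version of this sketch would also bring in expected costs and prices per destination.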
Data Analytics Engineers
Data analytics engineers blend and support data engineer and data analyst roles. They are responsible for cleansing, organizing, and transforming data compiled by the data engineer. Their job also entails building and maintaining complex databases, as well as using analytics to reveal business intelligence, then presenting this information to stakeholders.
Data analytics engineers are agile, able to understand the business context and use data to design solutions. It’s a generalist role that is becoming increasingly popular as organizations look for versatile people who can cover both engineering and analysis.
Machine Learning (ML) Engineers
Machine learning (ML) engineers are responsible for the MLOps (machine learning operations) that are used to train, advance, and serve models. They write code and deploy machine learning products such as social media algorithms, fraud detection systems, and even the arrival-time estimates computed by applications like Google Maps or Uber.
In the context of a modern data team, a data scientist will build a predictive algorithm, then hand it off to the ML engineer, who will ensure the model is scalable and functions as intended.
BI developers use data analytics to create business intelligence an organization can leverage to improve processes, increase productivity and efficiency, and add stakeholder value. They spend time developing, deploying and maintaining BI tools, as well as translating highly technical language into intelligible insights by producing reports everyone in the organization can comprehend.
For example, a BI developer might be tasked with projecting the financial impact of sourcing raw materials from Vietnam vs. China and then presenting the findings to stakeholders. This doesn’t require building predictive algorithms, which is what separates BI developers from data scientists.
Data Product Owner
Data product owners act as managers or coaches of the data team. They are responsible for developing the team’s vision for leveraging data, as well as the strategy and execution for achieving data-related goals.
The job requires them to be creative thinkers, strong leaders, and effective communicators, as they serve as a liaison between the data team, executives, partners, customers, and stakeholders. For example, a ridesharing company might hire a data product owner to oversee the creation and implementation of a new algorithm that matches small freight with truck owners.
Data stewards lead data management and governance at an organization. They craft, implement and oversee data policies and standards so data assets can be used by colleagues on and off the data team.
They are responsible for assuring quality and trust of the data while ensuring their organization operates in compliance with local, regional, and national data laws. A bank might hire data stewards to ensure financial data is accurate, accessible, standardized, up to date, and secure.
Data architects are usually senior-level executives charged with creating the “blueprint” for an organization’s data management system. They formulate data strategy, craft standards of data quality, map the flow of data within the organization, and plan how data will be secured.
For example, a clothing retailer might bring on a data architect who will design a secure and rules-compliant data infrastructure that can be used to derive business intelligence on how to reach new customers.
How should you structure a data team?
The structure of your data team is based on your organization’s data-driven efforts and size. Data teams are often structured around a centralized, embedded, or hub and spoke model.
In the fully centralized data team model, all data workers, and the technology they use, are owned by a central data team. If someone from marketing or sales has a data-related request, they’ll turn to the data team.
Benefits of this model include:
- Closer alignment of data resources since they’re all housed together.
- Greater collaboration among engineers, analysts, and scientists since they’re all working together.
- Enhanced mentorship as junior data engineers, scientists, and analysts can learn from senior members of the team.
The challenge of this model is speed. A centralized team can slow down how data is used, because departments must wait for the central data team to fulfill their requests for data insights. This, in turn, slows down decision-making. As such, a centralized team is most effective for small organizations, where requests won’t pull the team in too many directions at once.
In the embedded (or decentralized) model, multiple small data teams are embedded within separate business departments: marketing, sales, finance, and product teams each have their own data practitioners supporting them.
The big advantage of this model is speed. By having access to their own data workers, departments don’t need to wait for a centralized data team to handle a data-related request (such as a predictive model). They can simply turn to a close colleague who can get it done. As such, the model is useful for growing organizations looking to move quickly.
However, the embedded model has well-known challenges: knowledge silos, limited mentorship opportunities, and fewer opportunities for career growth.
Hub and Spoke
The hub and spoke approach combines the previous models by having a centralized team (the hub) that handles data engineering and company-wide analytics, along with data analysts/scientists (spokes) embedded within individual business units. These data practitioners understand their unit well enough to design the right data-inspired solutions.
The challenge with this model is directly related to cost. To be effective, it requires numerous data employees. It’s unsurprising that organizations shift to this model as they become large, established, and equipped with the depth of financial resources to match.
The size of your data team should correlate with just how data-centric your organization is or wishes to be. In short, the larger the datasets, the larger the team. If an organization chooses to scale from being data-informed to data-driven to data-led, it can expect to add more members to its data team.
Data Team Leader – the Chief Data Officer
It’s becoming common for organizations to hire a Chief Data Officer (CDO) who is responsible for overseeing the governance and utilization of data assets. Though this is not a tech position, CDOs are often former data practitioners themselves. These individuals have high business acumen, and often seek out ways in which data can provide solutions.
Capital One appointed the first CDO in 2002. A decade later, 12% of major companies reported having a CDO role. By 2019, that number had surged to 67.9%. CDOs are business strategists who commonly report directly to the CEO, sometimes to the COO (Chief Operating Officer), and less frequently to the CFO (Chief Financial Officer). Where the CDO sits within an organization is usually determined by how central data is to the organization. The more central or essential data is, the closer the CDO sits to the CEO.
CDO vs CIO vs CTO
There is a difference between the CDO and the CIO (Chief Information Officer). The CDO is more concerned with deriving business insights from data, while the CIO handles the technology for keeping data safe and other cybersecurity-related matters. CDOs often work in collaboration with the CIO and/or Chief Technology Officer (CTO), rather than reporting to them.
Look to the Data Team When Building Data Culture
We’ve previously discussed that organizations must take intentional steps to build a healthy data culture. A healthy data culture is one that fosters collaboration on data throughout an entire organization. It’s one in which data is democratized, and not used exclusively by the data team.
Diversity is a key component in building a healthy data culture. Organizations looking to define or improve their data culture should look to their data team, which is inherently diverse, for inspiration. Each data team member, with the distinct talents they bring, plays a crucial role in the “raw data to insights” process, and in making that process even more efficient.
Diversity is the biggest strength of a data team, but it can also become a blocker if the team is not empowered with the right resources and culture.
We were a data team first, and we learned firsthand the value of building a modern data team. Learn how we created a modern data culture at Atlan and obtained 6X greater agility.