How Brainly Builds a Data Mesh on Snowflake with Atlan
It’s a problem any student can relate to. You’re learning a new concept, and you’re stuck on how best to answer a question, or even on how to understand it at all. It’s the universality of this experience that’s made Brainly the world’s most popular education app, serving students, parents, and educators, helping them answer questions and strengthen skills across subjects like English, Math, Science, and History.
Serving hundreds of millions of users, across diverse personas and subject matter, Brainly’s business generates an astounding amount of data, and their organization depends on it to better serve their users and continue their growth.
Illustrating some of what’s worked so well for Brainly’s data team, Kasia Bodzioch-Marczewska, their Domain Lead of Data Engineering, joined Atlan at the 2023 Gartner Data & Analytics Summit in London. She shared the progress her team has made on Data Mesh, how Active Metadata Management can help drive the necessary cultural and technical shifts, and key takeaways for data leaders considering a similar journey.
“We have hundreds of millions of students, educators, and general users in our application. As you can imagine, it also results in a lot of data,” Kasia explained. “And because Brainly is a global platform, we operate in 35 different countries, and we have a really extensive knowledge base for all school subjects and grades.”
With data from these hundreds of millions of users, each interacting with content on a regular basis, the Brainly team works tirelessly to put valuable data to work, most recently releasing Ginny, an AI-powered helper that enables students to expand or ask follow-up questions on what they learn in the app. “We are solving educational challenges in ways that no one has tried before, and data plays a critical role in this process,” Kasia shared.
Decision on Data Mesh
Recognizing the potential of activating Brainly’s data, their team began to consider implementing Data Mesh to better organize work, and encourage full ownership and stewardship of their data.
A data mesh is a technical and cultural approach to building a decentralized architecture that organizes data by business domain, giving data producers more ownership.
The Brainly team began their Data Mesh journey by defining two critical dimensions to master. First was technical, investing in technology that would better enable transparency, team-to-team collaboration, and data quality standards. Second, and perhaps more importantly, was cultural, enabling ownership and accountability, despite a decentralized governance model.
Further enhancing the value they could yield from Data Mesh, and treating data as a product, Kasia and her team found important alignment with the way Brainly’s product teams were organized and run.
“A few years back we decided, as a company, to decentralize our product teams,” Kasia shared. “Every product team at Brainly is independent and has their own tools and data.”
While this model paid dividends for innovation and agility, making Brainly the leader in education technology that it is today, the siloed nature and ownership of this data meant frustrating back-and-forth whenever one team needed another’s data.
“If we think about this data and this setup, if a program department like Tutoring, for example, wants to utilize financial data, they would have to go and ask. There was a very long process to get access to the right data, and to figure out which data you could use for your analysis,” Kasia explained.
The Data Mesh concept, combined with Brainly’s unique model of product domain ownership, was a clear and exciting opportunity. But to eliminate the team-to-team collaboration friction inherent to this strategy, Kasia’s team evaluated the Active Metadata Management market in search of a solution.
Data Catalog – Enhance Data Mesh
“We figured out that we needed a data catalog to support us with the cultural piece of Data Mesh. We went through several proofs of concept, and through those, we chose Atlan as our data catalog.”
In the early stages of their Data Mesh journey, Atlan has proved to be a crucial partner as Brainly migrates to a new data platform that will better support their new way of working.
“We’ve onboarded into Atlan all of our data sets from both our legacy platform and our new data platform,” Kasia explained. “And for migration purposes, we’re using Atlan to identify these sources of the products that are to be migrated to the new data platform, as well as to figure out the downstream objects affected by decommissioning in the legacy platform.”
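To make the idea of "downstream objects affected by decommissioning" concrete, here is a minimal sketch of lineage-based impact analysis. It is not Atlan's API or Brainly's implementation: the lineage graph is a hand-written dictionary, and every asset name in it is hypothetical. It only illustrates the underlying technique, a breadth-first walk over lineage edges.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets listed in its value.
# All asset names are invented for illustration only.
LINEAGE = {
    "legacy.payments_raw": ["legacy.payments_clean"],
    "legacy.payments_clean": ["legacy.revenue_daily", "legacy.tutoring_finance"],
    "legacy.revenue_daily": ["dashboard.revenue_overview"],
    "legacy.tutoring_finance": [],
    "dashboard.revenue_overview": [],
}


def downstream_assets(lineage, start):
    """Return every asset reachable downstream of `start`, breadth-first."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        # Assets with no recorded lineage are treated as having no children.
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Which assets would be affected if this legacy table were decommissioned?
affected = downstream_assets(LINEAGE, "legacy.payments_clean")
print(affected)
```

In practice a catalog would supply the lineage edges from its own metadata, so a team planning a decommission would only need to pick the starting asset and review the resulting list with the owning domains.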
Brainly’s new data platform consumes a vast array of data sources, both structured and unstructured. Data lands in a raw data lake in AWS S3, is processed with Spark and Glue, and flows into a data mart in Snowflake, with Tableau or Metabase used for visualization and analysis.
As users migrate from the legacy platform to the new one, Atlan serves as their gateway to understanding and applying data and the new platform’s capabilities. “Obviously, we have users consuming all of this. We have all of our assets in Atlan; everything integrated into Atlan. Basically, Atlan is the place where all data sharing happens,” Kasia explained.
Ten domain teams at Brainly are already using their new data platform, and increasingly depend on Atlan for crucial context about available data. “We see more and more people defaulting to the data catalog instead of going through a lengthy process of going back-and-forth (with questions),” Kasia explained.
While Brainly still has work to do before Data Mesh is fully implemented, Kasia believes there are several potential challenges and best practices that peer data leaders should keep in mind, should they decide Data Mesh is right for them.
Data Ownership Has to be Clear
“Since Data Governance is federated in the Data Mesh model, ownership is critical. And it might sound weird, but the key to keeping governance federated is to have a strong central team. We need to enforce our processes somehow, and a central team can lead other teams to do that.”
“We have seen domains maturing in their process at different speeds. The central team needs to be flexible when implementing these changes, and what’s worked best for us, so far, is that we build our strategies separately for each domain.”
Culture Shift Takes Time
“Be patient. The most important part is the culture shift that has to happen, and it’s going to take more time than you anticipated in the first place.”