How Mistertemp Deprecated Two-Thirds of its Snowflake Assets with End-to-End Lineage

Updated: July 19th, 2023
Stack: Snowflake, Fivetran, Airflow, dbt, Atlan

Recruitment and Temporary Work Placements Leader Uses Automated Lineage to Deprecate Two-thirds of Data Warehouse Assets


At a Glance

  • Mistertemp, a leader in recruitment and temporary work based in France, sought to improve the navigability and usability of their newly implemented modern data stack (Snowflake, Fivetran, Looker, Airflow, and dbt).
  • By adopting Atlan, Mistertemp’s data team could use automated column-level lineage and popularity metrics to determine which of their data assets were used or could be deprecated.
  • As a result, Mistertemp was able to deprecate half of their Snowflake tables, representing two-thirds of their data assets, and over 60% of their Looker assets.

“The big difference now is that we are confident as a team when we’re talking about a data asset.”

Based in France, Mistertemp is a market leader in temporary work placements, servicing over 12,000 clients and 55,000 workers in 2022. As a broker between companies seeking talent and people seeking opportunity, data plays a key role in Mistertemp’s goal to align these parties as effectively as possible.

Driving that commitment to data is David Milosevic, who joined Mistertemp as Head of Data & Analytics in 2019. “My initial goal was to help find the right tools, organization, and solutions to help everyone in the company have a better understanding of data,” David shared.

Even after growing into a leader in its space, Mistertemp’s leadership refuses to be complacent. Amid the growth of remote work, changes in employee expectations, and the evolving needs of companies seeking great talent, the balance between Mistertemp, the companies they service, and the candidates they place is changing.

David explained data’s role in this transformation: “Our goal is to see how we can optimize all the exchanges we have with these different parties — sharing information from our needs to job boards, for example, or getting applications for those ads that we put on job boards. How do we optimize the information we get so that they can be matched with the needs of clients and vice versa?”

To navigate their changing market, it’s crucial that Mistertemp effectively use its data, and David’s team has been responsible for building solutions, adopting tools, and creating processes to support that journey. David encourages his team to take a proactive role in how Mistertemp uses its data, explaining, “Besides KPIs that you can put on our teams’ efforts, we are trying to go to the next step, which is to incorporate data into our processes to improve each of them.”


Mistertemp’s Modern Data Stack: Atlan + Snowflake, Fivetran, Looker, Airflow, and dbt

“In my area, we’re mostly focusing on what we call the Modern Data Stack,” David shared. Initially selecting Fivetran to ingest data, Mistertemp’s foundational choices for their stack included Snowflake as their data warehouse and Looker as their BI layer. Added later were Airflow and dbt.

Despite adopting best-in-breed tools to support their transformation, Mistertemp’s leadership felt that a piece was missing. “I have to give credit to our CTO [Francois-Emmanuel Piacentini]. His mindset was that until we have a way to not just document, but tag, identify, and quickly search for assets, we are not the owners of our data,” David shared. “This really resonated with our team. For a long time, we couldn’t put our finger on what was missing.”

Mistertemp needed a governance and collaboration layer, integrated with and capable of navigating their increasingly complex data stack. “We needed to add something to the equation to make sure that once a need appeared (being a product need, a marketing need, a financial need, a need from a client) that we could confidently say, okay, it was done in the past or not,” David explained.

Without this layer in place, David’s team was responsible for scouring their data estate, layer by layer, each time a question about their data assets was posed. The effort to determine what assets existed, let alone the nature of those assets or the efficacy of the data, was significant. “Answering those questions took us a lot of time,” David said. “Removing this from the equation, and having everything laid out and queryable was really necessary if we wanted to step up and implement all these future use cases.”

Mistertemp’s CTO effectively communicated his vision for how their data function would need to change. It was on David and his team to get it done.


Atlan Arrives

After a thorough search for an active metadata management platform, Mistertemp chose Atlan. “As soon as we got our hands on Atlan, the first step was to connect all our tools in our stack so that we had a big picture of everything in our area of work,” David shared. He quickly integrated Fivetran, Snowflake, dbt, and Looker with Atlan, as well as upstream systems like Salesforce and Postgres databases, offering a clear picture of Mistertemp’s data ecosystem.

“We wanted to have as much visibility as we could, and that was very easy. We only needed a couple days to set it up and make sure we were satisfied,” David added. “This was really easy and we were very satisfied to suddenly see all our assets available and queryable. We could just type ‘contract’ and find all tables or columns or reports that refer to that there.”

With a quick win in hand, and visibility into how data moved through their stack, David’s team was ready to put this newfound capability into practice. “The first step was really easy and very rewarding. But that was not just for the fun of it,” David explained, alluding to far bigger ambitions with Atlan.


Using Atlan to Resolve Well-intended Technical Debt

Atlan’s introduction into the Mistertemp ecosystem gave David the perspective and capability necessary to simplify their complex technical landscape.

While happy with their modern data stack, Mistertemp’s data team struggled with navigability and manageability prior to Atlan’s arrival. “A big goal we had, and want to continue to pursue, is that we want to ensure what we have in Snowflake or Looker are only data or reports that are useful,” David explained. “It’s so easy with modern data stack tools to basically connect everything you have and grab the data.”

Excited by the prospect of better servicing their business partners, and with business partners excited about freely available data, David’s team had spent previous years connecting numerous source systems and building numerous reports for one-off questions. “Back three years ago, the goal was to have all the data connected,” David shared.

Whenever a new data source was requested, David’s team once found it easiest to go to Fivetran and connect to the source system to reveal the available tables. Rather than diving into these systems to choose only relevant data, it was simpler and faster to recreate the data in Snowflake immediately, consuming what was relevant downstream.

“With tools like Fivetran, it’s very easy to add new connectors,” David said. And over time, decisions to connect and ingest data for each request multiplied into a more and more complex data estate. A request from Mistertemp’s development team meant that all Jira assets were synchronized, and a request from the support team led to synchronizing every Zendesk ticket. “Why not synchronize all the data right away? Maybe we’ll have some dashboards in place down the road,” David elaborated about their mindset at the time.

Mistertemp’s data team had been exceeding business needs and were well-intended. But without an active metadata management platform lending visibility into the consequences of synchronizing a high volume of data, they were building technical debt, with a ballooning Snowflake footprint and numerous unused but supported Looker reports.

“All those quick decisions created a lot of assets in Snowflake that basically without a business use were never really touched or never really documented or never really connected to our BI tool or any other tool. So they just stayed there being synchronized, costing us money.”

“It was very easy to create reports to showcase data as one-shots, but that creates a lot of debt, and a lot of overhead on our team. Our team is only four people,” David shared. “We wanted to say at some point whatever is connected and synchronized from Fivetran to Snowflake should be the minimum viable data. We wanted to make sure anything that we grab was connected downstream to a use case or report that is used by an end user.”

Where end-to-end visibility was once elusive, Atlan offered near instantaneous understanding of the work ahead, and David’s team were ready to fix Mistertemp’s long-simmering data estate complexity, once and for all.


Deprecating Two-thirds of Their Assets with Automated Column-level Lineage

Using Atlan’s automated lineage, David’s team got to work analyzing Fivetran and Snowflake, filtering assets by whether or not they had lineage, and quickly and easily determining which assets were, or were not, connected downstream. And with Atlan Popularity, a feature that shows users the frequency of usage and queries against a data asset, they could determine how often people used these assets, if at all.
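The filtering logic described above can be sketched in a few lines of Python. This is an illustration only, not Atlan's API: the `Asset` record, its field names, the sample table names, and the zero-query threshold are all assumptions made for the example. It treats an asset as a deprecation candidate when it has no downstream lineage and no recorded usage in the review window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """Hypothetical catalog record; field names are assumptions, not Atlan's schema."""
    name: str
    has_downstream_lineage: bool   # is anything downstream built on this asset?
    queries_last_12_months: int    # stand-in for a popularity metric


def deprecation_candidates(assets: list[Asset]) -> list[str]:
    """Return names of assets with no downstream lineage and no recorded queries."""
    return [
        a.name
        for a in assets
        if not a.has_downstream_lineage and a.queries_last_12_months == 0
    ]


# Invented sample data for illustration.
catalog = [
    Asset("CONTRACTS", has_downstream_lineage=True, queries_last_12_months=1840),
    Asset("JIRA_ISSUES", has_downstream_lineage=False, queries_last_12_months=0),
    Asset("ZENDESK_TICKETS", has_downstream_lineage=False, queries_last_12_months=0),
    # Connected downstream but never queried: kept here, worth a manual review.
    Asset("WORKERS", has_downstream_lineage=True, queries_last_12_months=0),
]

candidates = deprecation_candidates(catalog)
print(candidates)
```

Requiring both conditions is a deliberately conservative choice for the sketch: an asset that is connected downstream but never queried is left for a human to review rather than flagged automatically.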

For the first time, David’s team were able to understand the scale of what they had been maintaining. Of their 1,500 tables and 30,000 assets on Snowflake, fewer than half of the tables and one-third of the assets were used in the preceding 12 months. “After the cleanup, it went down to a little bit less than 600 [tables]. More than half our assets were cleaned up,” David shared.
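As a rough check on those figures, the arithmetic can be worked through directly. The sketch below uses only the numbers quoted above and treats "a little bit less than 600" as an upper bound of 600 and "one-third" as an exact fraction, both of which are simplifying assumptions.

```python
from fractions import Fraction

TABLES_BEFORE = 1_500
TABLES_AFTER_UPPER_BOUND = 600      # "a little bit less than 600"
ASSETS_BEFORE = 30_000
ASSETS_USED_SHARE = Fraction(1, 3)  # "one-third of the assets were used"

# At least this many tables were removed, since the final count is below 600.
min_tables_removed = TABLES_BEFORE - TABLES_AFTER_UPPER_BOUND
min_table_reduction = Fraction(min_tables_removed, TABLES_BEFORE)

# Split the asset total into used and unused portions.
assets_used = int(ASSETS_BEFORE * ASSETS_USED_SHARE)
assets_unused = ASSETS_BEFORE - assets_used

print(f"Tables removed: at least {min_tables_removed} ({float(min_table_reduction):.0%})")
print(f"Assets unused: about {assets_unused} of {ASSETS_BEFORE}")
```

Under these assumptions, the table count fell by at least 60%, and roughly 20,000 of the 30,000 assets, or two-thirds, had gone unused.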

“Everything downstream changed. We were able to see every existing connection in Fivetran. We could see what was actually used. We kept those, and for everything else, we would disconnect.”

Atlan’s column-level lineage and usage metrics also revealed that building one-off reports had exacted a cost. Mistertemp’s BI layer had ample opportunity for cleanup, with 60% of their assets, such as dashboards, views, dimensions, and measures, going unused.

“I think 60%, maybe 70% of Looker dashboards were not actively used and were creating a lot of overhead on the data analysts,” David said. Mistertemp’s analysts had been maintaining these unused reports as underlying assets evolved or systems changed upstream, driving distraction and unnecessary effort.


Increasing Context and Optimizing Data Processes, Now Available in Record Time

Even after deprecating as many as two-thirds of their assets, David continued to push his team to find more opportunities to optimize their data estate.

With the knowledge that what remained in Snowflake was useful to their business partners, Mistertemp’s data team began the process of properly tagging and documenting the remaining assets. “Before last year, before we started thinking of using Atlan or other tools, we thought of using Snowflake or Looker,” shared David. But with Atlan, asset documentation is accessible to colleagues who don’t use Snowflake or Looker, laying the groundwork for a single point of context for Mistertemp’s business data, accessible to all.

With a clear idea of how often assets are used, Mistertemp’s data team now optimizes how often data is synchronized, saving computing costs by choosing an appropriate cadence (monthly rather than hourly, for instance) that matches business needs. And with their newfound visibility into their Looker landscape, they could merge similar reports to reduce Mistertemp’s BI footprint and improve maintainability.
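One way to picture the cadence decision is as a simple mapping from observed usage to a sync frequency. The sketch below is hypothetical: the function name, the 30-day query count input, and every threshold are assumptions chosen for illustration, not values from Mistertemp, Fivetran, or Atlan.

```python
def choose_sync_cadence(queries_last_30_days: int) -> str:
    """Map a usage count to a sync cadence; thresholds are illustrative assumptions."""
    if queries_last_30_days < 0:
        raise ValueError("query count cannot be negative")
    if queries_last_30_days >= 300:
        return "hourly"
    if queries_last_30_days >= 30:
        return "daily"
    if queries_last_30_days >= 4:
        return "weekly"
    return "monthly"


# Invented usage figures for illustration.
usage = {"CONTRACTS": 950, "PAYROLL_EXPORT": 45, "ANNUAL_SURVEY": 1}
cadences = {table: choose_sync_cadence(count) for table, count in usage.items()}
print(cadences)
```

In practice the right cadence also depends on how fresh the business needs the data to be, so usage counts like these would be one input among several rather than the sole rule.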

And finally, by determining the popularity of their data assets, then deprecating the unused ones before tagging and defining terms, Mistertemp avoided unnecessarily adding context to hundreds of tables and assets. “That might not be the configuration for every company, but we have a lot of customers and only four people trying to catch up,” said David. “We needed to find an efficient way to help us scale, and not linearly.”


Creating a Transparent Data Estate with Atlan

Months after cleaning up their data estate with Atlan’s automated lineage and usage metrics, Mistertemp’s data team continues to reap the benefits.

“The big difference now is that we are confident as a team when we’re talking about a data asset.”

When asked about a data asset, David’s team can now, at a glance, determine whether or not it’s being used, where it’s being used, and how frequently it’s being used and synchronized. If assets or reports exist already, their business partners quickly get what they need to make more data-driven decisions. And if something new needs to be created, the data team can more quickly respond with a solution approach that includes the right data sources, the right documentation, and the right visualization.

“All of that is basically only in one place,” said David. “Before, it was a discussion we had to have with multiple people in the team. We needed to figure out basically from one tool to another tool. We went from being a little bit chaotic to a little bit more streamlined, and anyone in the team is able to answer questions.”

Regardless of where data lived or what form it took, Atlan became Mistertemp’s first step to resolving business needs. “We know once we have written this down, anyone that has a question can find the answer regardless of their layer,” David shared. “I’ll emphasize how much time this can save us, just reducing these discussions and making sure we spend more time on action.”

And with this greater focus, and time saved, David’s team is taking a more proactive role in improving the Mistertemp business. Most recently, they contributed to a project to improve Cost per Hiring, a key business metric.

“I think it’s one of those topics we have wanted to solve for as long as I have been here, for more than three years. We got tired of not being able to identify the things we needed to shift or solve or put together,” David explained. “I think with the help of Atlan, we were able to settle each of those arguments one by one by either having the proper definition put into the glossary, or by having the right lineage displayed in front of us so that everyone talks the same language. It’s a combination of tools we didn’t have before that helped us crack that equation that we were willing to do, but never found time, energy, or tools to solve.”


A More Confident Data Team

Reflecting on his and his team’s journey, David continues to return to the same feeling: confidence.

Mistertemp’s data team is transforming into a true business enabler, proactive in their approach to maintaining their data estate, and at the ready with the answers and solutions their business partners need. “It’s no more a question of ‘should we’. It’s more like ‘how can we?’,” David shared. “People rely on us a little bit more now that we can accurately give them answers to their questions, maybe not instantaneously, but very quickly.”

“We’re just at the beginning of our journey with Atlan,” David concluded. “Whether you’re a product owner, a developer, a financial person, a marketing person, we just want to make sure that everyone finds a way to improve their daily routine. It’s not only cleaning up for the data team to be confident, but it’s the first stone in order for everyone to be able to build on top of that.”


Photo by Alex Kotliarskyi on Unsplash

