Microsoft Fabric 101: A Comprehensive Overview of Microsoft’s New Data Platform
Microsoft Fabric is an end-to-end, cloud-based SaaS solution for data and analytics. It’s built on top of an open lakehouse (OneLake) and weaves several Microsoft tools together to streamline data and analytics workflows, from data integration and engineering to data science. Microsoft announced Fabric, in public preview, at Microsoft Build on May 23, 2023.
In this article, we’ll explore the architecture and components of Microsoft Fabric, followed by a quick guide to getting started with the tool.
We’ll also address the most common questions that data practitioners have been asking about Microsoft Fabric since its launch, from its pricing to its similarities with other analytics tools.
Table of contents
- What is Microsoft Fabric?
- Microsoft Fabric architecture
- Microsoft Fabric in action
- Frequently asked questions about Microsoft Fabric
- A glossary of Microsoft Fabric terms
- Microsoft Fabric: Related reads
What is Microsoft Fabric?
Microsoft Fabric is a cloud-based SaaS offering that brings together several data and analytics tools that organizations need. These include Data Factory, Synapse Data Warehouse, Synapse Data Engineering, Synapse Data Science, Synapse Real-Time Analytics, Power BI, and Data Activator (coming soon).
Fabric is built on an open, lake-centric design with a central, multi-cloud repository called OneLake.
Microsoft Fabric supports open data formats across all its workloads and tiers, caters to technical and business data practitioners, and has customers like T-Mobile, Ferguson, and Aon.
Here’s how Microsoft highlights its USP:
Microsoft Fabric brings together the best parts of data mesh, data fabric, and data hub to provide a one-stop shop for data integration, data engineering, real-time analytics, data science, and business intelligence needs without compromising the privacy and security of your data.
Microsoft Fabric and AI
Microsoft plans to infuse Fabric with Azure OpenAI Service at every layer so that data practitioners can use generative AI in their daily workflows.
Microsoft will also integrate GPT-powered Copilot into Fabric. According to Arun Ulagaratchagan, Corporate VP of Azure Data, with “Copilot in Microsoft Fabric in every data experience, users can use conversational language to create data flows and data pipelines, generate code and entire functions, build machine learning models, or visualize results.”
Before proceeding, let’s understand two elements of Fabric: experiences and workspaces.
Experiences in Microsoft Fabric
Each workload or capability that Microsoft Fabric offers is called an experience.
Experiences include Synapse Data Warehouse, Synapse Data Engineering, Synapse Data Science, Synapse Real-Time Analytics, Data Factory, and Power BI.
Workspaces in Microsoft Fabric
Microsoft Fabric lets you set up workspaces depending on your workflows and use cases. A workspace is where you can collaborate with others to create reports, notebooks, lakehouses, etc.
Here’s an image that shows what the workspace of a data engineer would look like in Microsoft Fabric.
Next, let’s look at the various components that make up Microsoft Fabric.
Microsoft Fabric architecture: The core components of Microsoft Fabric
Microsoft Fabric architecture has seven workloads that run on top of OneLake — the storage layer that can pull data from Microsoft’s platforms, Amazon S3, and eventually from Google Cloud Platform.
These workloads include:
- Data Factory: The data integration service
- Azure Synapse Analytics offerings: Azure Synapse Analytics tools have been integrated into Microsoft Fabric. These are:
  - Synapse Data Warehouse: An evolution of the existing Azure SQL Data Warehouse
  - Synapse Data Engineering: A Spark service for data transformations
  - Synapse Data Science: A service to build, deploy, and manage machine learning (ML) models
  - Synapse Real-Time Analytics: An observational data analytics service that collects data from streaming data sources
- Power BI: Microsoft’s flagship business intelligence service
- Data Activator: A real-time monitoring service (currently in private preview)
OneLake as the storage layer
OneLake stores tabular data in the open-source Delta Lake format, so the Fabric architecture is also open: you can integrate any product that can read Delta Lake tables.
OneLake’s data hub is the central place for finding, exploring, and using the various data assets within Fabric.
A handy feature of OneLake is that you can create shortcuts that point to other data locations, such as ADLS Gen2 or Amazon S3. As a result, you don’t have to make multiple copies of your data.
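To make the shortcut idea concrete, here is a small Python sketch that builds the ABFS-style URI OneLake uses to address an item’s tables and files. It assumes Microsoft’s documented name-based layout, abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<item>.<item type>/<path>; the workspace, lakehouse, and shortcut names are hypothetical. Because a shortcut shows up as an ordinary folder inside the lakehouse, it is addressed exactly like data stored in OneLake itself.

```python
ONELAKE_DFS_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_abfss_path(workspace: str, item: str, item_type: str, relative_path: str = "") -> str:
    """Build an ABFS URI for an item (or a path inside it) in OneLake.

    Assumed layout: abfss://<workspace>@<host>/<item>.<item_type>/<relative_path>
    """
    for label, value in (("workspace", workspace), ("item", item), ("item_type", item_type)):
        if not value or "/" in value:
            raise ValueError(f"{label} must be a non-empty name without '/'")
    base = f"abfss://{workspace}@{ONELAKE_DFS_HOST}/{item}.{item_type}"
    path = relative_path.strip("/")  # tolerate leading/trailing slashes
    return f"{base}/{path}" if path else base


# Hypothetical names: a shortcut "s3_sales" under the lakehouse's Tables folder
# (pointing at data in Amazon S3) is addressed like any table stored in OneLake.
shortcut_uri = onelake_abfss_path("SalesWorkspace", "SalesLakehouse", "Lakehouse", "Tables/s3_sales")
print(shortcut_uri)
# abfss://SalesWorkspace@onelake.dfs.fabric.microsoft.com/SalesLakehouse.Lakehouse/Tables/s3_sales

# In a Fabric Spark notebook, a Delta table at such a URI could then be read with:
# spark.read.format("delta").load(shortcut_uri)
```

Since any Delta-capable engine can read from this path, no copy of the S3 data is needed for Spark, SQL, or Power BI to use it.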
Microsoft Fabric in action: Data science and real-time analytics
Let’s see how to get started with Microsoft Fabric and then look at two use cases in data science and real-time analytics.
Getting started with Microsoft Fabric
Here’s a step-by-step guide to help you get started with your free, 60-day trial of Microsoft Fabric:
Step 1: Create an account to set up Fabric.
Step 2: Choose an experience — Power BI, Data Factory, Synapse Data Engineering, Synapse Data Science, Synapse Data Warehouse, or Synapse Real-Time Analytics.
Step 3: Select Start Trial.
Step 4: The Fabric workspace is customized to the experience you choose. For example, if you choose Data Engineering, you’ll see options at the top to create a Lakehouse, a Notebook, or a Spark job definition.
Use case #1: Training models and visualizing predictions using one platform
Here’s how a data scientist can use Microsoft Fabric notebooks to train models and then visualize the results using Power BI:
- Set up Microsoft Fabric notebooks from the Synapse Data Science experience.
- Ingest data into the lakehouse using Apache Spark.
- Clean and transform data using Apache Spark.
- Create experiments and runs to train a machine learning model.
- Register and track trained models using MLflow and the Microsoft Fabric UI.
- Run scoring at scale and save predictions and inference results to the lakehouse.
- Visualize predictions in Power BI.
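The flow above (ingest, clean, train, track, score) can be sketched locally in plain Python, with no Fabric workspace. This is a stand-in, not Fabric code: Spark DataFrames become lists of dicts, the model is a one-feature ordinary least-squares fit rather than a library model, and run_log is a hypothetical substitute for MLflow experiment tracking. The column names and values are made up.

```python
from statistics import mean

# Stand-in for data ingested into the lakehouse (hypothetical columns and values).
RAW_ROWS = [
    {"ad_spend": 1.0, "sales": 3.1},
    {"ad_spend": 2.0, "sales": 4.9},
    {"ad_spend": None, "sales": 6.0},  # missing value, dropped during cleaning
    {"ad_spend": 3.0, "sales": 7.2},
    {"ad_spend": 4.0, "sales": 8.8},
]


def clean(rows):
    """Cleaning step: keep only rows with no missing values."""
    return [r for r in rows if None not in r.values()]


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b * x with a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
    return y_bar - b * x_bar, b


def train_run(rows, run_log):
    """One training run: fit the model, compute MSE, and log params and metrics."""
    xs = [r["ad_spend"] for r in rows]
    ys = [r["sales"] for r in rows]
    a, b = fit_linear(xs, ys)
    mse = mean((a + b * x - y) ** 2 for x, y in zip(xs, ys))
    run_log.append({"params": {"intercept": a, "slope": b}, "metrics": {"mse": mse}})
    return a, b


def score(model, new_xs):
    """Batch scoring: return prediction rows ready to be written back as a table."""
    a, b = model
    return [{"ad_spend": x, "predicted_sales": a + b * x} for x in new_xs]


rows = clean(RAW_ROWS)
run_log = []  # hypothetical stand-in for MLflow run tracking
model = train_run(rows, run_log)
predictions = score(model, [5.0, 6.0])
print(run_log[0]["metrics"], predictions)
```

In Fabric, each function would map to a notebook step: the prediction rows would be saved to a lakehouse table, which Power BI can then visualize.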
Use case #2: Leveraging real-time analytics to interpret streaming data
Here’s how a data analyst can use Microsoft Fabric to observe data pouring in from streaming data sources:
- Create a KQL (Kusto Query Language) Database from the Real-Time Analytics experience.
- Create an Eventstream.
- Stream data from Eventstream to KQL Database.
- Check your data with sample queries.
- Save queries as a KQL Queryset.
- Create a Power BI report.
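Many of the sample queries in a workflow like this come down to time-windowed aggregation, such as the KQL pattern summarize count() by bin(Timestamp, 1m). The Python sketch below reproduces that idea locally as a tumbling-window count per device. The event fields (timestamp, device) are hypothetical, and the code illustrates the concept only; it is not how Eventstream or a KQL Database work internally.

```python
from collections import Counter
from datetime import datetime, timedelta


def bin_time(ts: datetime, size: timedelta) -> datetime:
    """Round a timestamp down to the start of its window, similar to KQL's bin()."""
    epoch = datetime(1970, 1, 1, tzinfo=ts.tzinfo)
    return ts - (ts - epoch) % size


def count_per_window(events, size=timedelta(minutes=1)):
    """Count events per (window start, device), i.e. tumbling-window aggregation."""
    counts = Counter((bin_time(e["timestamp"], size), e["device"]) for e in events)
    return dict(sorted(counts.items()))


# Hypothetical events, as they might arrive from a streaming source.
events = [
    {"timestamp": datetime(2023, 5, 23, 9, 0, 5), "device": "sensor-a"},
    {"timestamp": datetime(2023, 5, 23, 9, 0, 42), "device": "sensor-a"},
    {"timestamp": datetime(2023, 5, 23, 9, 0, 50), "device": "sensor-b"},
    {"timestamp": datetime(2023, 5, 23, 9, 1, 10), "device": "sensor-a"},
]

for (window_start, device), n in count_per_window(events).items():
    print(window_start.isoformat(), device, n)
# 2023-05-23T09:00:00 sensor-a 2
# 2023-05-23T09:00:00 sensor-b 1
# 2023-05-23T09:01:00 sensor-a 1
```

Counts like these are the kind of result you would save in a KQL Queryset and chart in a Power BI report.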
Frequently asked questions about Microsoft Fabric
Let’s look at some of the most common questions people have about the recent Microsoft Fabric announcement. We’ll update this section as we gain clarity on the software’s capabilities.
1. Why is Microsoft Fabric a big deal?
Microsoft Fabric brings together various services that handle everything from data movement to data science, real-time analytics, and business intelligence.
It aims to be an all-in-one analytics solution for enterprises, built on an open, lake-centric storage layer that lets you connect and curate data from different sources.
As of May 2023, Fabric is still in public preview, so not all features have been released yet.
2. Is Microsoft Fabric a PaaS or a SaaS? What’s the difference?
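Fabric is a SaaS offering. It combines existing Microsoft services (e.g., Synapse, Data Factory, and Power BI), several of which are PaaS offerings, into an integrated, end-to-end environment for all types of data users. The difference lies in who manages what: with PaaS, you provision and manage platform resources such as Synapse workspaces and pools yourself, whereas with SaaS, Microsoft runs the underlying infrastructure and you use the finished application.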
3. How is Microsoft Fabric different from Azure Synapse Analytics?
Microsoft Fabric is seen as a successor to Azure Synapse Analytics.
Unlike Synapse, which is a PaaS, Fabric is a SaaS. This primarily affects the Fabric architecture and pricing.
There is a lot of overlap between both solutions, in terms of warehousing, data engineering, data science, and real-time analytics capabilities.
This overlap can leave users unsure how the two differ in functionality.
However, it’s important to note that while Synapse focused on warehousing, Fabric aims to be a single platform for all data users and their daily workflows.
So, in addition to almost everything Synapse offers, Fabric further streamlines the user experience with a single store for all data types (OneLake, built around a lakehouse) and Power BI as its user interface.
4. Can I migrate my existing workloads from Synapse to Microsoft Fabric?
Currently, there is no way to automatically upgrade your existing Synapse workloads. You’ll have to manually migrate them by adjusting the notebooks, SQL scripts, pipelines, etc.
It’s also important to note that Microsoft Fabric doesn’t support several T-SQL commands. This can affect some of the warehouse-related migrations. Here’s a complete list of commands that Fabric doesn’t support yet.
So, you’ll have to create a workaround for these scenarios.
5. How is Microsoft Fabric different from Databricks and Snowflake?
Databricks offers a unified data analytics platform that combines the best aspects of a data warehouse and a data lake. The platform components include Delta Lake (storage), Databricks Runtime (processing), Workspace (the collaboration layer), Machine Learning, and SQL Analytics (BI).
Snowflake is a cloud-native data warehouse that supports different types of workloads via its Data Cloud.
Microsoft Fabric aims to bring everything for the various data practitioners under one roof — data integration, data engineering, data warehousing, real-time processing, analytics, and BI.
For instance, OneLake is like a “OneDrive for data”, and the UI is built on Power BI rather than Synapse Studio, with a focus on delivering a better user experience.
6. Can Microsoft Fabric be used on-premises?
As of now, Microsoft Fabric is available only as a cloud-based SaaS offering.
7. How much does Microsoft Fabric cost? Is it free?
Fabric comes with its own licensing; however, the details aren’t out yet. For now, you can apply for a 60-day trial without a credit card.
A key difference between the pricing models of Microsoft Fabric and Azure Synapse Analytics
Unlike Synapse, Fabric won’t use a pay-as-you-go approach. Instead, it will adopt a capacity-based pricing model.
A capacity is the ability of a resource to either perform an activity or to produce output. Capacity Units (CUs) will define this ability, representing a set of resources that you can use at any given time.
“Customers can purchase a single pool of compute that powers all Fabric workloads. The universal compute capacities significantly reduce costs, as any unused compute capacity in one workload can be used by any of the workloads.”
Arun Ulagaratchagan, the corporate VP of Azure Data, explains the reasoning behind switching to CUs with this example:
“Overnight, it might do a lot of data engineering and data science, maybe data integration. In the morning, the same compute flows to maybe BI and SQL as people walked into the office. Because all compute is virtualized, all compute is serverless in Fabric, it really allows you to reuse the capacity that you purchased. That is attractive for [enterprises].”
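The argument in this quote is essentially arithmetic: with a separate capacity per workload, each one has to be sized for its own peak, while a shared pool only has to cover the peak of the combined load. The Python sketch below compares the two using made-up hourly CU figures (heavy data engineering overnight, heavy BI and SQL during office hours). It models only this sizing argument, not Fabric’s actual billing rules.

```python
# Hypothetical hourly Capacity Unit (CU) usage for one day, hour 0 = midnight.
USAGE = {
    "data_engineering": [8 if hour < 6 else 1 for hour in range(24)],  # heavy overnight
    "bi_and_sql": [1 if hour < 8 else 7 for hour in range(24)],  # heavy in office hours
}


def dedicated_capacity(usage):
    """CUs needed if each workload gets its own capacity sized to its own peak."""
    return sum(max(hours) for hours in usage.values())


def shared_capacity(usage):
    """CUs needed if all workloads share one pool sized to their combined peak."""
    return max(sum(loads) for loads in zip(*usage.values()))


print(dedicated_capacity(USAGE), shared_capacity(USAGE))  # 15 9
```

Because the two peaks happen at different times of day, the shared pool needs 9 CUs instead of 15 in this example; the saving shrinks as workloads peak at the same time.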
Last, since this solution is very new, we’ve put together a glossary of Microsoft Fabric terms for you to skim through.
A glossary of Microsoft Fabric terms
- Capacity: Capacity refers to the ability of a resource to perform an activity or to produce output. The basic unit of measurement is a Capacity Unit (CU). Fabric offers capacity through Fabric SKUs and trials.
- Experience: Experiences are capabilities catering to specific functionality. The Fabric experiences include Synapse Data Warehouse, Synapse Data Engineering, Synapse Data Science, Synapse Real-Time Analytics, Data Factory, and Power BI.
- Item: An item is a set of capabilities within an experience; you can create, edit, and delete items. For example, the Data Engineering experience includes the lakehouse, notebook, and Spark job definition items.
- Tenant: A tenant is a single instance of Fabric. Each tenant is aligned with an Azure Active Directory (Azure AD) tenant.
- Workspace: A workspace is a space to collaborate with your colleagues. It contains a collection of items, such as lakehouses, warehouses, and reports.
- Shortcut: Shortcuts within OneLake point to other file store locations. So, you can connect to existing data without copying it.
- OneLake data hub: The OneLake data hub helps you find, explore, and use the Fabric data items in your organization.
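Several of these terms nest inside one another: a tenant holds workspaces, and a workspace holds items, each of which belongs to an experience. The Python dataclasses below model that hierarchy as a hypothetical sketch. The class and field names are illustrative and not part of any Fabric API, and find_items is only a toy version of the kind of discovery the OneLake data hub provides.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    name: str
    item_type: str   # e.g. "Lakehouse", "Notebook", "Spark job definition"
    experience: str  # e.g. "Synapse Data Engineering"


@dataclass
class Workspace:
    name: str
    items: list[Item] = field(default_factory=list)

    def create(self, item: Item) -> Item:
        self.items.append(item)
        return item

    def delete(self, item_name: str) -> None:
        self.items = [i for i in self.items if i.name != item_name]


@dataclass
class Tenant:
    """A single instance of Fabric, aligned with one Azure Active Directory tenant."""
    directory: str
    workspaces: list[Workspace] = field(default_factory=list)

    def find_items(self, item_type: str) -> list[tuple[str, str]]:
        """Toy discovery across workspaces: (workspace name, item name) pairs."""
        return [(ws.name, i.name) for ws in self.workspaces for i in ws.items if i.item_type == item_type]


# Hypothetical tenant, workspace, and item names.
tenant = Tenant(directory="contoso.onmicrosoft.com")
sales = Workspace("Sales analytics")
tenant.workspaces.append(sales)
sales.create(Item("sales_lakehouse", "Lakehouse", "Synapse Data Engineering"))
sales.create(Item("clean_sales", "Notebook", "Synapse Data Engineering"))
print(tenant.find_items("Lakehouse"))  # [('Sales analytics', 'sales_lakehouse')]
```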
Microsoft Fabric: Related reads
- Data Fabric Architecture: Components, Tooling, and Deployment
- Data Fabric: Can it Future-Proof Your Architecture, Unify Your Data, and Save Costs?
- Implementing a Data Fabric: A Scalable and Secure Solution for Maximizing the Value of Your Data
- Data Mesh vs. Data Fabric: How do you choose the best approach for your business needs?
- Data Fabric Use Cases: Understanding its Suitability & Applicability for Your Business