Pinterest Querybook 101: A Step-by-Step Tutorial and Explainer for Mastering the Platform's Analytics Tool
Querybook is an open-source, flexible, and intuitive SQL-based data analysis and visualization application. It was designed by Pinterest for SQL querying and managing data workflows.
This article aims to provide readers with a comprehensive understanding of Querybook’s features, performance, integrations, and ease of use. It explains how Querybook can be used to streamline data analysis and visualization processes.
Table of contents
- What is Querybook?
- Querybook origins
- Querybook architecture
- Key features of Querybook
- Key benefits of using Querybook
- How to use Querybook
- How Pinterest uses Querybook
- Querybook alternatives
- Querybook vs. Looker
- In summary
- Querybook: Related reads
What is Querybook?
Querybook is a web-enabled data analysis tool that allows users to convert SQL queries into DataDocs (i.e., natural language reports and graphs).
With Querybook, you can analyze large volumes of data, visualize insights, and create customizable dashboards for streamlined data analysis.
Here’s how Oliver John Bright, a Quantitative Researcher at GrandRounds, puts it:
“I know I am so incredibly late to the party, but I made my first DataDoc yesterday and I think I’m in love…”
As one of the highest-rated internal tools at Pinterest, Querybook overcomes the shortcomings of other querying interfaces, such as:
- Limited functionality: Unlike interfaces like PostgreSQL Administrator, Querybook offers extensive functionality suitable for both beginner and experienced users. The platform comes equipped with features that simplify complex queries for easy understanding.
- Ineffective visualizations: As opposed to other interfaces, such as pgAdmin or DBeaver Community Edition, Querybook comes with intuitive charting capabilities that make it easier for you to understand and interpret your data.
- Lacklustre collaborative features: Unlike popular querying tools like Navicat Premium Essentials and HeidiSQL, Querybook offers robust collaborative elements, ensuring seamless teamwork while tackling intricate projects. Moreover, you get immediate updates when alterations are made to shared content.
- Steep learning curve: Compared to competitors like MySQL Workbench, Querybook is easy to use and navigate. So, you spend less time trying to get acquainted with its functionalities.
Querybook origins: From an intern project in 2017 to a full-fledged open-source, next-gen IDE by 2021
Querybook started as an internship project within the Data Engineering team at Pinterest to build a query UI in 2017.
However, experienced professionals such as Lena Ryoo (Software Engineer, Analytics Platform), Charles Gu (Engineering Manager, Analytics Platform), Justin Mejorada-Pier (Head of the Analytics Platform), and Dave Burgess (Head of Data Engineering) identified its potential to be a solution for effective data management.
By 2018, Querybook had become the official solution for big data querying at Pinterest.
Charles Gu explains the team’s motivations behind developing Querybook:
“We built Querybook to provide a responsive and simple web UI for such analysis so data scientists, product managers, and engineers can discover the right data, compose their queries, and share their findings.”
In 2021, Pinterest open-sourced Querybook to transform it into a community-driven project. The goal is to build a tool that “brings together the power of metadata with the simplicity of a note-taking app for a better querying interface, where teams can compose queries and write analyses all in one place.”
The contributions from various developers and organizations have helped to improve the platform’s functionality, scalability, and customization options.
Initially concentrating on optimizing financial reporting processes, Querybook’s first iterations successfully addressed common complaints about complicated spreadsheet upkeep.
Encouraged by favorable feedback, further advancements followed, expanding horizontally and vertically.
Later in 2021, the platform branched out from its original scope, catering to a broader range of functions essential to modern organizations.
By implementing robust self-service functionality, intelligent document generation methods, and broad compatibility options, Querybook transformed into a multifunctional toolset ideal for countless scenarios throughout varying industries.
Let’s look at some use cases across industries:
- The healthcare industry can use Querybook to track illnesses and treatment outcomes based on factors such as age and location.
- Marketers may make better decisions by analyzing large sets of structured and unstructured data about customers, product preferences, and buying histories.
- In the financial sector, Querybook can support analysts in examining markets and trades, finding new opportunities, and detecting fraudulent activities through pattern detection.
- Scientists working in environmental science could gather and process vast amounts of climatic data from sensors spread around the Earth.
- Government agencies like census bureaus can conduct surveys and collect information via modern methods like web scraping or API connections rather than going door-to-door.
However, Querybook can only handle large volumes of data if its architecture is robust, which makes the architecture critical to the tool’s success.
So, let’s look at Querybook’s architecture.
Querybook architecture: Key components and infrastructure setup
Querybook’s architecture defines how its primary components interact to provide users with data exploration and visualization capabilities. According to Pinterest’s Head of Data Engineering, Dave Burgess, Querybook is a human-friendly IDE.
The infrastructure on which Querybook runs supports these components and enables them to operate efficiently and securely.
The three primary components are the web server, the worker, and the scheduler. Let’s delve into the specifics of each component:
- Web server: The web server is responsible for handling HTTP requests, sending and receiving WebSocket (i.e., a bi-directional, real-time communication protocol for web browsers) messages, and serving static assets for the Querybook web interface.
- Worker: The worker component executes long-running queries and scheduled DataDocs. It also performs auxiliary tasks, such as updating Elasticsearch (i.e., an open-source search engine) documents and analyzing query lineage (i.e., capturing the complete history of a query).
- Scheduler: The scheduler reads the task schedule from the database and sends it to the Celery workers (i.e., task executing processes within a Celery distributed task queue). Its main function is to ensure that scheduled tasks are executed efficiently and on time.
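The hand-off between the scheduler and the workers can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's standard-library queue and threads in place of Celery, and the task names and the SCHEDULE list are hypothetical stand-ins for rows read from the database. It is not Querybook's actual code.

```python
import queue
import threading

# Hypothetical schedule rows, standing in for the task schedule stored in the database.
SCHEDULE = [
    {"task": "run_datadoc", "args": {"doc_id": 1}},
    {"task": "update_search_index", "args": {"table": "sales"}},
]


def scheduler(task_queue, schedule):
    """Read the schedule and hand each task to the workers via the queue."""
    for row in schedule:
        task_queue.put(row)


def worker(task_queue, results):
    """Pull tasks off the queue until a None sentinel arrives."""
    while True:
        row = task_queue.get()
        if row is None:
            break
        # A real worker would run a query or refresh a search document here.
        results.append(f"{row['task']} done with {row['args']}")


def run_once(schedule, num_workers=2):
    """Start the workers, dispatch the schedule, and wait for completion."""
    task_queue = queue.Queue()
    results = []
    threads = [
        threading.Thread(target=worker, args=(task_queue, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    scheduler(task_queue, schedule)
    for _ in threads:
        task_queue.put(None)  # one stop signal per worker
    for t in threads:
        t.join()
    return sorted(results)


if __name__ == "__main__":
    for line in run_once(SCHEDULE):
        print(line)
```

The queue decouples the two roles: the scheduler only decides what should run, and any free worker picks the task up.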
The infrastructure that supports these Querybook components includes a database (like MySQL), a search interface (Elasticsearch), remote storage, and a data management solution (like Redis).
Let’s look into the specifics:
Database (MySQL)
At its core, Querybook is a tool for querying databases, so it relies heavily on database infrastructure to function. This component is responsible for storing data related to DataDocs, query execution history, and other data required by Querybook.
While any SQLAlchemy-compatible database can be used, Querybook recommends using MySQL.
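To make the idea of a backing store concrete, here is a minimal sketch that keeps DataDocs and query execution history in a relational database. It assumes SQLite (from Python's standard library) purely so the example runs anywhere, and the table and column names are hypothetical; Querybook's real schema differs.

```python
import sqlite3


def init_store(conn):
    """Create two hypothetical tables; Querybook's real schema differs."""
    conn.executescript(
        """
        CREATE TABLE data_doc (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL
        );
        CREATE TABLE query_execution (
            id INTEGER PRIMARY KEY,
            doc_id INTEGER REFERENCES data_doc(id),
            query TEXT NOT NULL,
            status TEXT NOT NULL
        );
        """
    )


def record_execution(conn, doc_id, query, status):
    """Append one row to the execution history and return its id."""
    cur = conn.execute(
        "INSERT INTO query_execution (doc_id, query, status) VALUES (?, ?, ?)",
        (doc_id, query, status),
    )
    return cur.lastrowid


def history_for_doc(conn, doc_id):
    """Return (query, status) pairs for a DataDoc, oldest first."""
    rows = conn.execute(
        "SELECT query, status FROM query_execution WHERE doc_id = ? ORDER BY id",
        (doc_id,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_store(conn)
    conn.execute("INSERT INTO data_doc (id, title) VALUES (1, 'Sales review')")
    record_execution(conn, 1, "SELECT 1", "done")
    print(history_for_doc(conn, 1))
```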
Search interface (Elasticsearch)
Elasticsearch provides search across the data assets that Querybook knows about, so that you don’t end up with unconnected silos of analytics.
Querybook does so by providing an elastic metadata layer on top of various relational databases, files, and cloud storage services that have been cataloged according to a common schema that is flexible enough for modern analytics.
Remote storage
Remote storage is a significant part of Querybook’s framework because it can hold large amounts of data. It is convenient and cost-effective, and it lets organizations access information across geographical boundaries, which supports efficient collaboration.
Data management (Redis)
A key aspect of managing a large number of asynchronous tasks within Querybook involves utilizing the power of Redis. Specifically, this tool serves two main purposes:
- Storing cached versions of live queries for parallelized editing amongst multiple team members working together
- Facilitating communication between separate instances of Querybook
By leveraging this in-memory data management system, teams gain quick access to shared query definitions while enabling smooth interactions.
Querybook’s architecture in action
Imagine you want to find data from many different sources — databases, spreadsheets, and more.
You could use Querybook to ask questions (like “How many sales did we make last year?”) and search across all of your data assets.
With Querybook, you can:
- Write your questions in a specific way (by creating a DataDoc)
- Send your questions to special computers (workers)
- Use these computers to talk to big programs (database engines) and look for the data you need
- See the answers you were looking for with the help of the metadata store and/or an output
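The flow above can be sketched end to end. This is a toy model, not Querybook's implementation: the DataDoc class and run_datadoc function are hypothetical names, and an in-memory SQLite database stands in for the database engine.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class DataDoc:
    """Hypothetical stand-in for a DataDoc: a title plus ordered text and query cells."""
    title: str
    cells: list = field(default_factory=list)

    def add_text(self, text):
        self.cells.append(("text", text))

    def add_query(self, sql):
        self.cells.append(("query", sql))


def run_datadoc(doc, conn):
    """Play the worker's role: run each query cell and collect its rows."""
    outputs = []
    for kind, body in doc.cells:
        if kind == "query":
            outputs.append(conn.execute(body).fetchall())
    return outputs


def demo():
    """Build a small sales table, write a DataDoc about it, and run the doc."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (year INTEGER, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [(2022, 100), (2022, 250), (2023, 75)],
    )
    doc = DataDoc("Sales review")
    doc.add_text("How many sales did we make last year?")
    doc.add_query("SELECT COUNT(*), SUM(amount) FROM sales WHERE year = 2022")
    return run_datadoc(doc, conn)


if __name__ == "__main__":
    print(demo())
```

Keeping the question (text cell) next to the query that answers it is the note-taking idea behind DataDocs.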
What are the key features of Querybook?
The eight key features of Querybook are:
- Collaborative querying
- Advanced SQL editor
- Query execution history
- Data visualization
- Powerful query engine
- Customizability and extensibility
- Data discovery and collaboration
- Integration with other tools in your data stack
Collaborative querying
Querybook’s real-time collaboration features make it easy for multiple users to work on queries together, with changes being propagated in real-time. Users can easily share queries and collaborate seamlessly, speeding up workflows and enabling faster insights.
Advanced SQL editor
With Querybook’s advanced SQL editor, coding is simpler. Syntax highlighting distinguishes valid code from invalid code, so you can spot mistakes as you type.
You’re able to finish scripts much quicker by using intelligent autocomplete and custom keyboard shortcuts. Since there might be many programmers operating on similar big data infrastructure, Querybook’s multi-language offering guarantees everyone remains productive.
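As a rough idea of how prefix-based autocomplete can work, the sketch below keeps candidates in a sorted list and uses binary search to find the first match. The keyword list and table names are illustrative assumptions, and this is not how Querybook's editor is actually implemented.

```python
import bisect

# A small illustrative subset of SQL keywords.
SQL_KEYWORDS = ["FROM", "GROUP BY", "JOIN", "ORDER BY", "SELECT", "WHERE"]


def build_index(keywords, table_names):
    """Return one sorted, lowercased list of completion candidates."""
    return sorted({w.lower() for w in list(keywords) + list(table_names)})


def complete(index, prefix, limit=5):
    """Return up to `limit` candidates that start with `prefix`."""
    prefix = prefix.lower()
    start = bisect.bisect_left(index, prefix)
    matches = []
    for word in index[start:]:
        if not word.startswith(prefix):
            break  # sorted order means no later word can match
        matches.append(word)
        if len(matches) == limit:
            break
    return matches


if __name__ == "__main__":
    # Hypothetical table names that a metadata store might supply.
    index = build_index(SQL_KEYWORDS, ["sales", "sessions", "users"])
    print(complete(index, "se"))
```

Mixing table names into the same index as keywords is what lets an editor suggest schema objects as well as syntax.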
Query execution history
Querybook’s intuitive UI streamlines query execution for faster results, eliminating wasted effort digging through history.
If you want to look at what has happened previously, then sharing queries (and their respective outcomes) could help you and your team. Increased visibility into historical projects means that teams can now collaborate effectively and keep ideas moving forward.
Data visualization
Querybook enables you to generate graphics and tables that show real-time trends, relationships, and patterns through dynamic updates. This makes it easier to share outputs with others and provides greater transparency during review cycles when seeking clarification or feedback.
Additionally, you can export data to compatible software applications. This makes examining and interpreting data less laborious, ultimately resulting in more informed decisions based on accurate and well-understood datasets.
Powerful query engine
With its ability to seamlessly integrate with popular databases (like Presto, Hive, and SparkSQL), Querybook offers unparalleled flexibility and power.
Whether you need to run simple or complex queries, Querybook ensures that your data analysis process runs smoothly and efficiently. And thanks to advanced optimization algorithms, it can process even massive datasets.
Customizability and extensibility
Querybook offers a high degree of customizability. This includes plugin integrations that streamline specific tasks and UI configurations tailored to your team’s preferences.
Furthermore, its code base benefits from community contributions. For example, members of the community have identified bugs and made feature requests that have been incorporated into Querybook’s code base.
In another example, a community member who used Querybook’s DataDoc Run All feature to execute multiple queries at once asked for a way to customize the notification preferences for that feature.
Data discovery and collaboration
Querybook provides a metadata store that enables discovery and search for data sources and tables, leading to quicker insight generation for users.
Furthermore, it supports effective collaboration among users, streamlining query processing and swiftly delivering findings. These features include:
- Shared queries
- Comments and annotations
- Query version control
- Query history
- Access control
Users can save and share queries, leave comments and annotations, track query changes, view query history, and set access control options for team members.
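The discovery side of a metadata store can be illustrated with a tiny inverted index over table names and descriptions. Everything here is an assumption made for illustration: the tables are invented, and a production deployment would delegate this work to a search engine rather than an in-process dictionary.

```python
from collections import defaultdict

# Hypothetical table metadata; names and descriptions are made up for illustration.
TABLES = {
    "sales.orders": "One row per customer order with amount and date",
    "sales.refunds": "Refunded orders and the refund reason",
    "web.sessions": "Visitor sessions with device and country",
}


def build_search_index(tables):
    """Map each lowercase token to the set of tables that mention it."""
    index = defaultdict(set)
    for name, description in tables.items():
        tokens = name.replace(".", " ").split() + description.split()
        for token in tokens:
            index[token.lower()].add(name)
    return index


def search(index, text):
    """Return tables matching every word in `text`, sorted by name."""
    words = text.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


if __name__ == "__main__":
    index = build_search_index(TABLES)
    print(search(index, "orders"))
    print(search(index, "sales refund"))
```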
Integration with other tools in your data stack
Querybook’s versatility extends beyond its query engine and customizable interface.
Querybook integrates with other tools, such as Jupyter Notebook, PowerBI, and Excel, so that you can access your data from a variety of sources without needing to switch between different platforms.
As a result, you can work using your preferred tools and collaborate effectively with your team without leaving Querybook.
What are the key benefits of using Querybook?
The main benefits of using Querybook are:
- Time savings
- Quicker insights and analysis
- Error reduction
- Improved collaboration and scalability
- Enhanced security control
- Real-time insights for decision-making
- Quick deployment and adaptation
- High flexibility
Time savings
Querybook automates repetitive tasks and reduces the manual effort spent on tedious activities, freeing up valuable hours per week that can be allocated towards higher value work, such as:
- Prioritizing analyzing and interpreting data: Automation frees up time for data analysts to delve deeper into data, uncover hidden insights, and inform better decisions.
- Building and testing complex queries: Efficiently construct and fine-tune queries using Querybook’s powerful features, allowing data teams to optimize data quality and query performance.
- Developing new data projects: Leverage spare capacity gained by automation to embark upon ambitious data initiatives like pipeline construction, ML implementation, and KPI tracking.
- Enhancing cross-functional collaboration: Foster tighter working relationships among diverse stakeholders through easier collaboration with colleagues and partners alike.
- Sharpening data governance and security measures: Implementing stringent rules/protocols and safeguarding sensitive info via Querybook’s various connectors, permissions structures, and monitoring utilities.
With Querybook, you are able to build, test, and deploy queries faster than ever before, enabling you to get insights in a fraction of the time it would take with traditional Business Intelligence (BI) tools.
Quicker insights and analysis
With Querybook, you can instantly pivot tables and charts with drag & drop interactions, giving you a real-time view of your data. This eliminates the need to spend time pulling together individual reports before being able to start working with real data.
Querybook lets you quickly analyze data and get insights that can be used to make informed decisions. For example, marketing teams can effortlessly examine sales figures and pinpoint successful items and marketing strategies.
Additionally, the drag & drop functionality can be used to create custom data models or machine learning models by combining data from multiple sources.
For instance, a data scientist can use the drag & drop feature to easily combine data from different databases and use it to train a machine learning model to make predictions or generate recommendations.
Error reduction
By minimizing human involvement in reporting processes, Querybook significantly lowers the likelihood of mistakes creeping into analyses. This is because Querybook automates many of the data processing steps.
For instance, in addition to automatically handling tasks that could lead to human error, Querybook provides built-in quality checks before any dataset leaves its system.
These precautions reduce the chance of incorrect information entering the workflow, further decreasing the likelihood of errors leading to unreliable conclusions and decisions based on bad data.
Improved collaboration and scalability
As everyone works on one common data set inside Querybook, teams can share and build upon each other’s efforts seamlessly. This leads to more efficient and effective collaboration, as well as greater scalability for your data analysis efforts.
Moreover, as business requirements grow, Querybook makes it easy to expand usage to new groups or regions without adding complexity to IT management.
Enhanced security control
Querybook provides fine-grained access settings to help meet GDPR and other regulatory mandates while still facilitating knowledge exchange.
This means that sensitive data can remain protected even when sharing information with others. By granting access only to specific sheets within DataDocs, you can ensure that your data stays secure.
Real-time insights for decision-making
Live connectivity provided via WebSocket technology enables you to explore the most recent statistics immediately. With Querybook, you can make faster and more informed decisions based on the most up-to-date information available.
Quick deployment and adaptation
Querybook delivers functional capabilities rapidly — generally within weeks — thereby allowing organizations to capitalize on their investment faster.
Unlike traditional BI tools that require complex and lengthy implementations, Querybook is quick to deploy and easy to adapt to your changing business needs.
Thanks to its architecture, Querybook operates alongside existing solutions already in use, which removes the obsolescence concerns (risks related to technology becoming outdated) associated with traditional BI overhauls.
High flexibility
Querybook is highly flexible and can be easily integrated with your existing solutions, allowing you to leverage your existing investments and maximize ROI.
How to use Querybook
To install Querybook, there are several steps you must take:
- Check the prerequisites
- Clone the repository
- Install dependencies
- Configure the database
- Build the Docker image
- Start Querybook
Check the prerequisites: Before running setup commands for Querybook, verify that the following system requirements are met:
- Check if Python 3.6+ is installed on your computer. If it isn’t, follow the official Python installation guide to download the latest version.
- Next, install Node.js and run npm install -g npm.
- Additionally, create a MySQL database server instance using a tool such as Docker or Vagrant.
- Lastly, configure the docker-compose file to start the Querybook application.
Only after these prerequisites have been met should you proceed with installing Querybook.
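If you prefer to check these prerequisites with a script, a minimal sketch could look like the following. It only tests whether the Python version is at least 3.6 and whether the node, npm, and docker executables are on your PATH; the function name and the tool list are assumptions for illustration, and the script does not verify tool versions or a running MySQL instance.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 6), tools=("node", "npm", "docker")):
    """Report whether Python is new enough and each tool is on PATH."""
    report = {"python": sys.version_info[:2] >= min_python}
    for tool in tools:
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```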
Clone the repository: The first step is to clone the Querybook repository from GitHub. You can do this using the following command in your terminal:
git clone https://github.com/pinterest/querybook.git
- Install dependencies: Once you have cloned the repository, navigate to the querybook/webapp directory and run the following command to install the required dependencies:
Then, navigate to the querybook/api directory and run the following command to install the required Python dependencies:
pip install -r requirements.txt
- Configure the database: Querybook uses a MySQL database to store its data. You need to create a new database and user for Querybook and then update the configuration file with the appropriate values. You can find the configuration file at querybook/api/querybook/config.py.
- Build the Docker image: If you plan to use Querybook in a containerized environment, you need to build the Docker image. Navigate to the root directory of the Querybook repository and run the following command:
- Start Querybook: You can start Querybook using the following command:
This command will start the Querybook web server, worker, scheduler, and other components. You can access Querybook by navigating to http://localhost:3000 in your web browser.
How Pinterest uses Querybook for data discovery, analysis, and sharing
Pinterest primarily uses Querybook for data discovery, analysis, and sharing, and it does so by leveraging its collaboration features.
How to use Querybook for data discovery, analysis, and sharing
- Access Querybook: After setting up Querybook (as specified in the tutorial above), you can go to the Querybook website and join their Slack community.
- Connect to data sources: Navigate to the “Connections” tab and select the relevant data source. Querybook supports various data sources, including PostgreSQL, MySQL, BigQuery, and more.
- Discover data: After connecting your data sources, you can discover data by browsing through the available tables and views. Alternatively, you can use Querybook’s search feature to find the relevant data quickly.
- Analyze data: Once you have identified the relevant data, you can start analyzing it using Querybook’s data visualization tools, filtering, and querying capabilities. Querybook supports various querying languages, such as SQL and Hive.
- Collaborate with teams: Querybook enables you to collaborate with your teams by sharing data insights and analysis with other members of the organization. The information shared with other members can be in the form of queries, views, and charts, which can help them make informed decisions based on data insights.
- Document analysis processes: Querybook supports the documentation of data analysis processes so that you can keep track of your progress and standardize analysis workflows. You can use Querybook’s customizable workflows and templates to automate and streamline data analysis processes, ensuring consistency across the organization.
- Export data: Lastly, Querybook enables you to export your data to various formats such as CSV, JSON, and Excel. This feature allows you to share your data insights with stakeholders outside of the organization or use them for further analysis.
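Once results have been exported, or if you want to reproduce the export step yourself, converting a result set to CSV or JSON takes only a few lines. The sketch below assumes a result set is available as a list of column names plus a list of row tuples; the function names are hypothetical and do not correspond to Querybook's API.

```python
import csv
import io
import json


def to_csv(columns, rows):
    """Serialize a result set to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(columns, rows):
    """Serialize a result set to a JSON array of objects."""
    return json.dumps([dict(zip(columns, row)) for row in rows])


if __name__ == "__main__":
    columns = ["year", "amount"]
    rows = [(2022, 350), (2023, 75)]
    print(to_csv(columns, rows))
    print(to_json(columns, rows))
```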
Querybook alternatives: Comparing Querybook with other data analysis tools
| Tool | Installation | Data Structure Abstraction | Multi-tenancy | Cost | Performance | Language | Integrations & Features | Commercial Support | Ease of Use & Scalability |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Querybook | Open Source | Yes | Yes | Free | High | Python | GitHub, Slack, Jira | No | User-friendly |
| Tableau | On-Premises, Cloud | Yes | Yes | Paid | High | Proprietary | Salesforce, AWS, Google Cloud, Microsoft Azure | Yes | Easy to use but can be limited |
| Power BI | Cloud, On-Premises | Yes | Yes | Paid | High | Proprietary | Microsoft, Salesforce | Yes | User-friendly, scalable |
| Looker | Cloud, On-Premises | Yes | Yes | Paid | High | SQL | GitHub, Slack, Jira | Yes | User-friendly, scalable |
| Mode Analytics | Cloud, On-Premises | Yes | Yes | Paid | High | SQL | GitHub, Slack, Jira | Yes | User-friendly, scalable |
| Databricks | Cloud, On-Premises | Yes | Yes | Paid | High | Scala, Python, R | AWS, Azure, Google Cloud | Yes | User-friendly, scalable |
| Apache Superset | Open Source | Yes | Yes | Free | High | Python | Redshift, BigQuery, Snowflake, Presto | No | User-friendly, scalable |
| Metabase | Open Source | Yes | Yes | Free | Medium | SQL | MySQL, PostgreSQL, Redshift, BigQuery, Snowflake, MongoDB | No | User-friendly, but limited |
| Redash | Open Source | Yes | Yes | Free | High | SQL | GitHub, Slack, Jira, Zapier | No | User-friendly, scalable |
| Grafana | Open Source | No | Yes | Free | High | Go | Prometheus, Elasticsearch, InfluxDB, Graphite | Yes | User-friendly, scalable |
Note: The information provided in this table is subject to change and should be verified on the respective websites of each tool. Table by Atlan
Querybook stands out from the competition because of:
- A user-friendly interface
- Efficient data processing and analysis
- Multi-tenancy capability (i.e., a feature allowing one instance of software to support multiple client environments while isolating them from each other)
- An open-source platform
- Robust integration with popular tools such as GitHub, Slack, and Jira, making it convenient to use for software development teams
As mentioned in the above table, there are several alternatives to Querybook, such as Looker, Tableau, Power BI, and Grafana. Let’s compare Querybook to the most popular alternative — Looker.
Querybook vs. Looker: What’s the difference?
To compare Querybook with Looker, we’ll consider their features, performance, integrations, and ease of use.
Features
Querybook and Looker are both powerful business intelligence (BI) tools that offer similar core functionalities such as dashboarding, reporting, data exploration, and visualization.
However, there are some key differences in their feature sets:
- Data modeling: Looker has strong support for modeling complex and dynamic relationships between tables/entities using LookML, its proprietary language. This allows users to build more robust BI models than those possible with SQL alone. Querybook, on the other hand, relies solely on standard SQL for data modeling, which may limit its flexibility compared to Looker’s approach.
- Analysis and visualization: While both platforms allow for interactive exploration of data, Looker emphasizes this capability with stronger drill-down functionality from high-level dashboards down to transactional details. Meanwhile, Querybook places greater focus on streamlined report generation by automatically generating common chart types and layout elements. Ultimately, these differences come down to personal preference and use case.
Performance
Both Querybook and Looker are built to handle large datasets and scale with growing usage. Performance comparisons can vary depending on the specific hardware configuration and workload scenarios involved.
In general, however, both products have solid track records for handling BI needs within enterprises.
As always, careful planning and monitoring infrastructure resources remain crucial factors when evaluating any software system’s suitability for your environment.
Integrations
In terms of integration capabilities, both Querybook and Looker rely heavily on open APIs enabling connections to many external systems and databases beyond their native connectors.
These include cloud services like AWS Redshift, Azure Synapse, Google BigQuery, Snowflake, etc., along with popular on-premise relational database vendors like Oracle, Microsoft SQL Server, MySQL, PostgreSQL, etc.
By maintaining relatively small footprints and working primarily via API connectivity, both tools minimize disruptions to existing workflows while maximizing flexibility for client implementations.
Ease of use
Both platforms have an intuitive interface allowing users to quickly analyze large amounts of data and offer powerful data discovery capabilities, making it easier than ever to gain valuable insights.
Overall, choosing between Querybook and Looker largely depends on individual comfort level with available training materials, demos, support communities, and trial periods.
In summary
Pinterest’s Querybook is a robust and user-friendly analytics tool that is specifically designed to meet the modern business needs of data analysts and developers. Its drag & drop interface, collaboration features, and machine learning integration make it a powerful asset for any organization looking to make sense of its data.
The article highlights the benefits of using Querybook, including time savings, quicker insights, error reduction, improved collaboration and scalability, enhanced security control, and real-time insights for decision-making. The article also provides instructions on how to use Querybook and compares it with other data analysis tools in the market.
Querybook: Related reads
- Open Source Data Catalog - List of 6 Popular Tools to Consider in 2023
- Amundsen Data Catalog: Understanding Architecture, Features, Ways to Setup & More
- Netflix’s Metacat: Open Source Federated Metadata Service for Data Discovery
- DataHub: LinkedIn’s Open-Source Tool for Data Discovery, Catalog, and Metadata Management
- Open Source Data Governance - 7 Best Tools to Consider in 2023
- What Is a Data Catalog? & Do You Need One?
- AI Data Catalog: Exploring the Possibilities That Artificial Intelligence Brings to Your Metadata Applications & Data Interactions
- 8 Ways AI-Powered Data Catalogs Save Time Spent on Documentation, Tagging, Querying & More
- 15 Essential Data Catalog Features to Look For in 2023