Data Warehouses versus Data Marts
In the worlds of business intelligence and outcome modeling, the terms data warehouse and data mart are often used interchangeably. The differences are worth knowing, though, so in this post we’ll compare and contrast the two. For an in-depth analysis of data warehouses, please see our article on the Key Concepts of a Data Warehouse.
What is a data warehouse?
A data warehouse (DW) is a central data store, created by extracting and combining data from multiple sources into a single target. Its fundamental purpose is to support strategic decision making through historical and predictive analytics. Data warehouses are primarily used to fuel historical data analytics for business intelligence (BI). However, innovations in cloud data warehouses have enabled teams to use the warehouse to manage machine learning inputs, unlocking predictive analytics on top of the data warehouse. For example, Google’s BigQuery has a set of ML features built in. The data warehouse is sometimes referred to as an enterprise data warehouse (EDW).
What is a data mart?
A data mart is a subset of the total information held in a data warehouse.
In practice, a data mart is curated for a specific line of research, serving the needs of a single department or business goal. Given their smaller scope and storage footprint, data marts are usually cheaper and faster to query.
Conceptually, the data warehouse is data-oriented, whereas the data mart is project-oriented. The warehouse, as the name suggests, aggregates data for an entire business, while the mart aims to satisfy a niche group of users.
Unsurprisingly, given its larger scope, the process of designing a data warehouse is complicated and takes a good deal of time. However, the effort put into a data warehouse pays off when designing a data mart. Given that the warehouse data sources are well understood, designing a data mart is often a straightforward process of cherry-picking the data.
Comparing a data warehouse to a data mart
While the above may satisfy a cursory need for understanding the differences between data warehouses and data marts, let’s delve into more detail. We’ll cover things generally here, but note that your specific needs and implementation may differ.
Scope of collection
As mentioned above, the process of collecting data for a warehouse has great reach, spanning many different sources. Cleansing, sanity-checking, and transforming the collected data into a well-defined aggregate takes time, network and computing bandwidth, and money.
Extracting a subset of this cleansed data from the data warehouse into a data mart is relatively trivial by comparison.
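To make the "cherry-picking" step concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a real warehouse. The table names (customers, orders, sales_mart) and the build_sales_mart helper are hypothetical and chosen only for illustration; a production warehouse would use its own engine and schema.

```python
import sqlite3


def build_sales_mart(conn: sqlite3.Connection) -> None:
    """Rebuild a small sales mart from (hypothetical) warehouse tables."""
    conn.executescript("""
        DROP TABLE IF EXISTS sales_mart;
        CREATE TABLE sales_mart AS
        SELECT c.region,
               SUM(o.amount) AS total_revenue,
               COUNT(*)      AS order_count
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id
        GROUP BY c.region;
    """)


# An in-memory database plays the role of the already-cleansed warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'EMEA'), (2, 'APAC');
    INSERT INTO orders VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0);
""")

build_sales_mart(conn)
mart_rows = conn.execute(
    "SELECT region, total_revenue, order_count FROM sales_mart ORDER BY region"
).fetchall()
print(mart_rows)
```

Because the warehouse tables are already cleansed and joined on a known key, the mart is just one aggregate query. Rerunning build_sales_mart drops and rebuilds the table, which is one simple way to refresh it.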
Audience
A data warehouse and a data mart have different audiences. The warehouse is a resource available to the entire organization. It holds inputs for machine learning and supports strategic decisions across the business through model generation and data analytics. In short, the data warehouse holds all of the data required to support business intelligence (BI) needs.
The data mart, being a curated subset of all the data, is extracted from the warehouse with a specific research goal, for a specific department, or to support a single business goal. There may be data marts for sales, finance, marketing, and engineering.
In both cases the data is read-only: consumers can query and sample the data, but they cannot change it.
Objectives
The lengthy, challenging task of designing and implementing a data warehouse is necessary to provide a single integrated data source that paints a comprehensive, coherent view of the historical data and decisions made by the business.
A data mart, on the other hand, is designed to provide a single business division with exactly the data required to make an informed decision on a single topic or a series of related topics.
It is precisely because the data warehouse captures a large part of the business surface area, which usually comprises many systems working with their own native data formats, that the undertaking is formidable. A data mart takes advantage of this foundational work done on the warehouse, and is relatively trivial to design, implement, and populate.
Decision types
Different types of decisions depend on different types of data. The data warehouse supports strategic decisions. The data mart does the same for tactical decisions.
A strategic plan describes an organization's vision and mission. It takes a broad, long-term view, drawing on information from finance, operations, and a clear understanding of the external business environment.
A tactical plan answers the question of how to achieve an element of the strategic plan. It consists of short-term, narrowly focused action items, targeted at business units or departments. Data marts are therefore often used for executive dashboards, scorecard reports, and the like.
Data variability
Many different types of data are stored within a data warehouse. This is because future needs aren’t yet known, so “everything” needs to be captured, resulting in a heterogeneous variety of data types and schemas.
A data mart has a more homogeneous data schema because it’s built for a particular need and contains only a subset of the warehouse’s data.
Data storage topology
A data warehouse is an integrated, time-variant, and non-volatile collection of data. “Time-variant” means the warehouse’s data is tied to a particular time period. It may be loaded daily, hourly, or on some other periodic schedule. Within that period of time, though, the data is consistent and does not change.
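As a rough illustration of "time-variant" and "non-volatile", the sketch below models a store that keeps one immutable snapshot per load period. The SnapshotStore class and its method names are hypothetical; real warehouses achieve this with partitioned tables, load timestamps, or similar mechanisms rather than an in-memory dictionary.

```python
from datetime import date


class SnapshotStore:
    """Toy append-only store: one immutable snapshot per load period."""

    def __init__(self) -> None:
        self._snapshots: dict[date, tuple] = {}

    def load(self, period: date, rows: list[tuple]) -> None:
        # Non-volatile: a period that has been loaded is never overwritten.
        if period in self._snapshots:
            raise ValueError(f"snapshot for {period} already loaded")
        self._snapshots[period] = tuple(rows)

    def as_of(self, period: date) -> tuple:
        # Time-variant: every read is tied to a specific load period.
        return self._snapshots[period]


store = SnapshotStore()
store.load(date(2024, 1, 1), [("acme", 100)])
store.load(date(2024, 1, 2), [("acme", 120)])
jan_first = store.as_of(date(2024, 1, 1))
```

Loading a second day does not alter what was recorded for the first, so an analysis run against a given period sees the same data every time.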
Consolidating so many different data structures from a wide variety of sources requires a more elaborate storage design. It’s not uncommon to use complex designs, like star, centipede, or snowflake schemas.
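For readers who haven't met a star schema, here is a minimal sketch, again using Python's sqlite3 module as a stand-in for a warehouse. A central fact table holds the measurements and points to small dimension tables that describe them. All table and column names (fact_sales, dim_date, dim_product) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes, one row per entity.
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);

    -- Fact table: numeric measures plus a key into each dimension.
    CREATE TABLE fact_sales (
        date_key    INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        units       INTEGER,
        revenue     REAL
    );

    INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
    INSERT INTO dim_product VALUES (1, 'hardware'), (2, 'software');
    INSERT INTO fact_sales VALUES
        (20240101, 1, 2, 200.0),
        (20240101, 2, 1, 30.0),
        (20240201, 2, 4, 120.0);
""")

# A typical analytical query joins the fact table out to its dimensions.
revenue_by_category = conn.execute("""
    SELECT p.category, d.month, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()
print(revenue_by_category)
```

A snowflake schema follows the same idea but further normalizes the dimensions, for example by splitting category out of dim_product into its own table.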
Because data marts often span data from multiple sources (e.g. event data, billing data, CRM data), data modeling tools like [dbt](https://www.getdbt.com/) are often used to split the computation of the data mart into more manageable and reusable chunks.
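The idea of splitting a mart into reusable chunks can be sketched without dbt itself. The example below is not dbt code; it approximates the layering with plain SQLite views, where each intermediate view plays the role of a small, reusable model and the final view combines them. The raw tables, the stg_ and mart_ prefixes, and the column names are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Raw warehouse tables (hypothetical).
    CREATE TABLE raw_events (user_id INTEGER, event TEXT);
    CREATE TABLE raw_billing (user_id INTEGER, amount_cents INTEGER);
    INSERT INTO raw_events VALUES (1, 'login'), (1, 'purchase'), (2, 'login');
    INSERT INTO raw_billing VALUES (1, 1999), (1, 501), (2, 1000);

    -- Reusable intermediate steps, one per source.
    CREATE VIEW stg_event_counts AS
        SELECT user_id, COUNT(*) AS event_count
        FROM raw_events
        GROUP BY user_id;
    CREATE VIEW stg_revenue AS
        SELECT user_id, SUM(amount_cents) / 100.0 AS revenue
        FROM raw_billing
        GROUP BY user_id;

    -- The mart itself only combines the intermediate steps.
    CREATE VIEW mart_user_summary AS
        SELECT e.user_id, e.event_count, r.revenue
        FROM stg_event_counts AS e
        JOIN stg_revenue AS r ON r.user_id = e.user_id;
""")

summary = conn.execute(
    "SELECT user_id, event_count, revenue FROM mart_user_summary ORDER BY user_id"
).fetchall()
print(summary)
```

Each intermediate view can be tested on its own and reused by other marts, which is the main benefit of this layered approach.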
Data marts — pieces of a warehouse
Data warehouses and data marts are essential to the strategic and tactical decision-making process of a business. While they both support business intelligence analysis, large-scale data collection has to be broken down into manageable subsets for particular use cases. This fractional dataset is represented in the data mart, which can feed a specific team or department with the data required for their tactical decision making. A well-designed data warehouse can provide the modular slices of the whole data pie on a case-by-case basis to the data marts.
Conclusion
In summary, data marts are smaller, aggregated, and periodically refreshed datasets derived from the raw data in the warehouse. Depending on the warehouse technology used, data marts are often stored in the data warehouse itself to provide easy plug-and-play functionality with external BI and dashboarding solutions.