What is Warehouse-First Architecture

2020 may well have been the year of the cloud data warehouse (CDW). Snowflake had the biggest software IPO ever (at the time), with a blistering growth rate to match, and it became a household name in IT and tech. 2021 looks set to continue that trend: Databricks has already raised $1B in funding at a $28B valuation. CDWs have become a huge, growing business.

A cohort of applications built on top of CDWs has emerged alongside this growth. Kleiner Perkins’ A 2020 Perspective named the trend one of the most impactful ideas of 2020. Applications such as Observe use Snowflake both to process all of their data and as their central data store, driving the cost of storage down to “little more than the cost of Amazon S3.”

What is the Warehouse-First Architecture

Building on top of Snowflake is certainly something we will see more of, but a new crop of modern cloud companies is taking it a step further and enabling their customers to build on top of their own CDWs and data lakes.

At RudderStack, we take the warehouse-first approach by building your customer data lake on your own data warehouse, but the architecture’s value applies across the data stack. For example, our friends at Panther, who build SIEM tooling for cloud-focused security teams, use the warehouse-first approach in the security space. At the other end of the spectrum, MessageGears is a warehouse-first customer marketing platform that runs on its customers’ CDWs.

This is the warehouse-first architecture: instead of providing (and charging for) storage infrastructure and running features in black boxes, warehouse-first tools are architected to build the data lake in the user’s CDW.

What are the Benefits of Warehouse-First Software

Warehouse-first may seem like a minor difference in the way applications operate compared to the traditional model where SaaS providers store their customers’ data in their own (often proprietary) databases, but the implications are huge.

Improved Data Control

The data lake that warehouse-first applications build and operate on top of is stored in their customers’ data warehouses. So, if you use warehouse-first tools, you don’t have to rely on the vendor to protect your sensitive data. It’s in your data warehouse, and you have control of its security and privacy.

Additionally, you don’t have to deal with black boxes that use your data in ways you can’t inspect. Identity resolution is a great example: third-party vendors make assumptions about how to resolve identities using a copy of your data that they store. If your use case doesn’t match their assumptions, you won’t get as much value out of the tool. With warehouse-first tools, though, you can see and modify functionality around things like the identity graph because it lives in your own CDW.
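
To make the identity resolution point concrete, here is a minimal sketch of what owning the identity graph can look like. It assumes a hypothetical identity-mapping table whose rows link two identifiers seen together, and it uses a generic union-find grouping, which is not any specific vendor's algorithm. The `ignored_prefixes` parameter is an illustrative rule of the kind you could change yourself, for example to stop a shared device from merging unrelated people.

```python
from collections import defaultdict

# Hypothetical rows from an identity-mapping table in your own warehouse:
# each row links two identifiers observed together.
ID_EDGES = [
    ("anon:a1", "email:kim@example.com"),
    ("anon:b2", "email:kim@example.com"),
    ("anon:c3", "user:42"),
    ("anon:d4", "device:shared-kiosk"),
    ("anon:e5", "device:shared-kiosk"),
]


def resolve_identities(edges, ignored_prefixes=()):
    """Group identifiers into profiles using union-find.

    Links that involve an identifier starting with one of `ignored_prefixes`
    are not used for merging.
    """
    parent = {}
    ignored = tuple(ignored_prefixes)

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for left, right in edges:
        if left.startswith(ignored) or right.startswith(ignored):
            # Keep the non-ignored identifiers, but as separate profiles.
            for identifier in (left, right):
                if not identifier.startswith(ignored):
                    find(identifier)
            continue
        union(left, right)

    profiles = defaultdict(set)
    for identifier in parent:
        profiles[find(identifier)].add(identifier)
    return sorted(sorted(members) for members in profiles.values())


default_profiles = resolve_identities(ID_EDGES)
strict_profiles = resolve_identities(ID_EDGES, ignored_prefixes=("device:",))
```

With the default rule, the two visitors on the shared kiosk collapse into one profile; with the `device:` prefix ignored, they stay separate. A vendor-hosted black box would make that choice for you.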

Increased Flexibility with no Duplicated Data

Many customers want the flexibility to use the data from data-intensive applications for analysis and activation in other tools. This flexibility isn’t possible with traditional vendors because most don’t allow direct access to their data lake. So if you want to perform analysis on your data or enrich it and use it for activation in another tool, you have to export it to your data warehouse first. This data duplication is expensive, inefficient, and unnecessary.

Warehouse-first applications give you this flexibility without requiring you to duplicate your data. The data lakes these applications build are already in your warehouse. So you can analyze your data, enrich it, and connect it to other tools without any data duplication.
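
As a rough illustration of analysis without duplication, the sketch below joins event data written by a tool with a table you already maintain, using plain SQL and no export step. Python's built-in `sqlite3` stands in for a cloud data warehouse, and the table and column names (`events`, `crm_accounts`, `plan`) are hypothetical rather than a real vendor schema.

```python
import sqlite3

# sqlite3 stands in for a cloud data warehouse; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (user_id TEXT, event TEXT);       -- written by the tool
    CREATE TABLE crm_accounts (user_id TEXT, plan TEXT);  -- your own table
    INSERT INTO events VALUES
        ('u1', 'page_view'), ('u1', 'checkout'), ('u2', 'page_view');
    INSERT INTO crm_accounts VALUES ('u1', 'enterprise'), ('u2', 'free');
    """
)


def events_per_plan(connection):
    """Join tool-written events with your own data, in place."""
    rows = connection.execute(
        """
        SELECT c.plan, COUNT(*) AS event_count
        FROM events AS e
        JOIN crm_accounts AS c ON c.user_id = e.user_id
        GROUP BY c.plan
        ORDER BY c.plan
        """
    ).fetchall()
    return dict(rows)


plan_counts = events_per_plan(conn)
```

Because both tables live in the same warehouse, the join is an ordinary query rather than a pipeline.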

From a pipeline management standpoint, you don’t have to create and manage infrastructure whose only purpose is moving data from a proprietary vendor database to your CDW—it’s already there and ready to use.

Lower Costs

Since warehouse-first applications don’t store their customers’ data, they can’t charge for it. This has resulted in significantly lower pricing compared to traditional vendors.

You still have to pay your CDW bill. However, because your data is in your warehouse, you know how much you’re storing and what it costs. This means you can make your own decisions on data retention and forecast expenses more accurately.
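
The forecasting arithmetic is simple enough to sketch. The function below estimates steady-state monthly storage cost from daily volume, retention period, and a per-terabyte rate. Every number in the example is made up for illustration, and the model ignores compression and compute charges, so treat it as a starting point and plug in the figures from your own CDW bill.

```python
def monthly_storage_cost(gb_per_day, retention_days, usd_per_tb_month):
    """Estimate steady-state monthly storage cost for one dataset.

    At steady state the warehouse holds `retention_days` worth of data.
    Ignores compression and compute; all inputs are illustrative.
    """
    stored_tb = gb_per_day * retention_days / 1000  # decimal terabytes
    return stored_tb * usd_per_tb_month


# Hypothetical inputs: 10 GB/day at an assumed rate of $20 per TB-month.
long_retention = monthly_storage_cost(10, retention_days=400, usd_per_tb_month=20)
short_retention = monthly_storage_cost(10, retention_days=100, usd_per_tb_month=20)
```

Under these assumed inputs, cutting retention from 400 days to 100 days drops the estimate from $80 to $20 per month, which is the kind of trade-off you can only evaluate when the storage is yours to measure.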

The resulting cost savings are substantial. At RudderStack, customers frequently tell us they save up to 66% when compared to Segment!

Choose Warehouse-First Applications

Data silos and inflated bills from vendors of data-intensive applications are artifacts of design decisions made before CDWs were a viable option for their data lakes. Engineering teams had to build their own data stores, but that isn’t the case anymore. The capabilities and low cost of CDWs make them the perfect foundation on which to build modern applications.

That’s why you should search for and choose warehouse-first applications. It’s a modern design decision based on modern technology that makes applications more secure, flexible, and cost-effective.

Sign up for Free and Start Sending Data

Test out our event stream, ELT, and reverse-ETL pipelines. Use our HTTP source to send data in less than 5 minutes, or install one of our 12 SDKs in your website or app. Get started.

Published: February 16, 2021

Warehouse-First, the More Secure, Flexible, and Cost-Effective Application Architecture
