What is Data Activation?
Data activation is a hot new term in the world of data engineering and data science. As is often the case with hyped terms in our fast-moving space, there is some confusion around it. In this post I’ll provide some clarity, looking specifically at how data activation relates to data and engineering teams. I’ll also provide a guide that covers different data activation workflows and when to use them.
What is Data Activation?
The definition of data activation is simple. Data activation is the execution of business activities that are informed and fueled by data. The goal of data activation is to improve business outcomes. The cause of the confusion around the term is its close link to data delivery.
Data delivery involves sending data from some source into the systems used by business teams like marketing, product, and customer success for data activation. With the proper data in their tools, teams can build business use cases to achieve outcomes like reducing churn, increasing conversion, or prioritizing lists of customers. Data activation, then, is best thought of in the context of a larger process, which we call the data activation lifecycle, that accounts for all of the engineering work required to make data activation possible.
Before founding RudderStack, I spent over a decade as a data professional working to enable data activation. Here are some of the common business use cases that my teams and I built out:
- Sending discounts to high LTV customers who abandoned cart – Enabling the marketing team to send 20% discount coupons to loyal customers who don’t complete a purchase can drive significant revenue. I’ve implemented solutions for both rule-based LTV (customers who have spent more than $X historically) as well as predictive LTV (using an ML model to predict which customers will be high LTV).
- Enabling customer success to mitigate churn – The old adage about the cost of retaining a customer is true: it’s far cheaper to enable customer success to engage with customers who are likely to churn than it is to acquire new customers. These use cases tend to combine multiple types of activity data (app, website, usage, etc.) with customer support ticket data and, many times, sentiment analysis.
- Helping sales teams prioritize accounts – This use case typically takes the form of a lead score or account score. Similar to my LTV work, I delivered both basic, rule-based lead scores (e.g., a customer exceeds a usage threshold or adopts certain features) as well as ML models for predictive scoring.
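To make the rule-based variants above a little more concrete, here is a minimal Python sketch of the abandoned-cart audience. The field names (`total_spend`, `abandoned_cart`), the sample customers, and the spend threshold are all assumptions made for illustration, not details of any real implementation.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical fields, chosen only for this illustration.
    customer_id: str
    total_spend: float      # historical spend in dollars
    abandoned_cart: bool    # True if the latest cart was not purchased


def is_high_ltv(customer: Customer, threshold: float) -> bool:
    """Rule-based LTV: historical spend above a fixed threshold."""
    return customer.total_spend > threshold


def discount_audience(customers: list[Customer], threshold: float) -> list[str]:
    """Return IDs of high-LTV customers who abandoned a cart."""
    return [
        c.customer_id
        for c in customers
        if c.abandoned_cart and is_high_ltv(c, threshold)
    ]


customers = [
    Customer("c1", 1200.0, True),   # high spend, abandoned cart
    Customer("c2", 80.0, True),     # low spend, abandoned cart
    Customer("c3", 2500.0, False),  # high spend, completed purchase
]
audience = discount_audience(customers, threshold=500.0)
```

A predictive version would likely swap `is_high_ltv` for a model score while leaving the audience logic unchanged.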
These data activation use cases aren’t new. Companies have been trying to solve for them since before I started my career, and before modern data infrastructure existed. So what changed recently, and why is there so much interest in data activation?
A brief history of data activation
Before the advent of cloud data warehouses and data lakes like Snowflake, Redshift, BigQuery, and Databricks, most data teams weren’t building many data applications beyond analytics. On-prem warehouses like Teradata didn’t scale well, and building data applications on data systems like Hadoop (whether on-prem or cloud) required substantial engineering effort. Those limitations meant that only large enterprises could invest in more advanced data applications and data management platforms to drive use cases.
Big enterprises that did invest in Hadoop/Spark and other tools to build advanced data applications typically ran them on-prem behind a VPN and a cloud firewall. It was extremely challenging to connect to these applications from the outside. Meanwhile, starting in the early 2000s, business systems like marketing automation platforms, CRMs, and ticketing systems were rapidly moving to the cloud. Getting on-prem data into these SaaS applications often required additional engineering work.
The result was that many companies were forced to use time-consuming, manual, error-prone processes to export data from data systems and load it into SaaS systems where business teams could use it. Many enterprise companies today still feed marketing with CSV files that represent customer segments. This was not only inefficient, it also created a significant data silo problem, where each SaaS tool had a different, limited view of the customer. Business teams faced a painful workflow of brittle ‘direct’ integrations between SaaS platforms, managing segments by spreadsheet, and reconciling endless custom fields across systems.
The advent of cloud data warehouses changed things. Modern warehouses were much easier to scale, and many companies, even startups, began setting them up to collect and process data and to build data applications for use cases like LTV and lead scoring. Furthermore, because they run on the cloud, it was easier to connect them to all kinds of data sources. The cloud data warehouse was a major step in breaking data silos and centralizing data in one store.
Once the data was collected, though, it still needed to be moved from the warehouse into business tools. Several solutions emerged. Teams built custom pipelines, and systems like CRMs (Salesforce) and customer engagement platforms (Braze) rolled out the ability to connect directly to cloud data warehouses and leverage the data there. A new software category called Reverse ETL also emerged to facilitate this data movement and replace the custom data pipelines that many data teams had already written. Additionally, entirely new types of systems were built specifically to run on the warehouse. Growthloop, for example, is a marketing automation platform that runs completely on the warehouse.
The innovation around the cloud data warehouse, which came as digital commerce exploded, unlocked the potential for companies to leverage their customer data to drive value at a scale never before seen. Today, new ways to connect to the warehouse enable business teams to build out high-impact use cases more easily and quickly. This new potential also led to more pressure on data teams, and higher expectations of them, when it comes to driving business outcomes.
Moreover, while modern data infrastructure is powerful, its costs can scale quickly. We have spoken to customers who were spending $1M for on-prem data warehouses and, after moving to the cloud, are spending 2-5x more.
There’s no doubt that the user experience is better, analytics are delivered faster, and business teams can be supported more efficiently and more effectively, but CFOs are asking to see demonstrable justification for the increased investment. In short, more data activation means data leaders need to prove the ROI of their stack. Now, we’ll consider the different approaches to data activation you can use to do just that.
A guide to data activation
There are many different data activation workflows, and each can be set up in several ways. In this guide, I’ll cover the different methods you can use and when to use them.
Note: while reverse ETL has become synonymous with data activation because of marketing from some reverse ETL vendors, this is a misnomer. Reverse ETL tools provide pipelines for data delivery from your warehouse into business tools, one of the important data activation workflows I’ll cover below. This is just one workflow, though. In many cases, data activation does not require reverse ETL and, in fact, a survey about data activation confirms that over half of respondents believe data activation is running data-driven campaigns, no matter how the data is delivered or what the destination is.
Activating first-party data from websites, apps, and server-side systems
First-party user behavior data is valuable for teams companywide. It can be used to fuel analytics as well as use cases for data-driven marketing, product, and customer success teams. Consider a simple user sign up. After this event, the marketing team will want to send a welcome email and trigger other marketing campaigns, the product team will want to track adoption, and the customer success team will want to assign the user/account to a rep. First-party data collection is a critical piece of data activation for this cohesive customer experience.
There are multiple ways to implement this data activation flow:
Custom code → single SaaS API
In this setup, developers write custom code that sends data from their website or app to an API endpoint provided by the SaaS tool. While this gives engineers ultimate flexibility, integration maintenance becomes a major burden over time, creating low-value data activation work, especially as tools are added and APIs are changed/updated.
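As a rough sketch of what this custom code can look like, the Python example below builds a sign-up event request for a SaaS API. The endpoint URL, payload shape, and bearer-token header are hypothetical; every real tool defines its own, which is exactly why this approach becomes a maintenance burden. The request is built but not sent, so the example runs without network access.

```python
import json
import urllib.request

# Hypothetical endpoint and payload shape; a real SaaS tool defines its own.
API_URL = "https://api.example.com/v1/events"


def build_signup_request(user_id: str, email: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request describing a sign-up event."""
    payload = {"event": "user_signed_up", "user_id": user_id, "email": email}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_signup_request("u_123", "ada@example.com", api_key="test-key")
# Sending it would look like: urllib.request.urlopen(request, timeout=5)
```

Each additional tool needs its own version of this function, with its own authentication, payload format, and error handling.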
Single SaaS SDK/embed → single SaaS platform
In this setup, you embed code from the SaaS tool used for data activation in your website or app and that code sends data directly back to the tool.
Sales and marketing tools often provide embeddable forms that collect submissions and create users/leads. Analytics tools provide their own SDKs that send data into reports and dashboards. More modern platforms like Braze offer a combined approach, where their SDK can both create new users and capture granular user behaviors.
The challenge with this approach is that embedding tons of third-party code in your app or website slows things down. It also perpetuates issues with data silos because data is not being centralized; it’s sent to each system separately.
Single SDK (CDP) → multiple SaaS platforms (one-to-many)
To solve the challenges of both custom integrations and 1-1 embeds, many customer data platforms (like RudderStack) offer a one-to-many solution, in which customer interactions are captured once by the CDP’s SDK, and distributed to many different business systems (CRM, marketing automation, analytics, and even warehouse/data lake). These are popular tools because they enable you to sync the entire customer journey to many tools at once.
In this setup, the CDP handles all of the integration work and API changes and reduces the amount of code in your app or website, making it much easier to deliver marketing data to marketing channels for optimization.
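The one-to-many idea can be sketched in a few lines of Python. This is a toy illustration, not how RudderStack or any other CDP is implemented: the `EventRouter` class, the destination names, and the choice to skip failing destinations are all assumptions made for the example.

```python
from typing import Callable

Event = dict
Destination = Callable[[Event], None]


class EventRouter:
    """Toy one-to-many router: capture an event once, deliver it to every destination."""

    def __init__(self) -> None:
        self._destinations: dict[str, Destination] = {}

    def register(self, name: str, handler: Destination) -> None:
        self._destinations[name] = handler

    def track(self, event: Event) -> list[str]:
        """Deliver the event to all destinations; return the names that succeeded."""
        delivered = []
        for name, handler in self._destinations.items():
            try:
                handler(dict(event))  # copy so handlers cannot mutate a shared object
                delivered.append(name)
            except Exception:
                # A failing destination should not block the others.
                continue
        return delivered


# Plain lists stand in for real destinations such as a CRM or a warehouse.
crm_inbox, warehouse_rows = [], []


def broken(event: Event) -> None:
    raise RuntimeError("destination unavailable")


router = EventRouter()
router.register("crm", crm_inbox.append)
router.register("warehouse", warehouse_rows.append)
router.register("broken", broken)

delivered = router.track({"event": "user_signed_up", "user_id": "u_123"})
```

The application code calls `track` once; adding or removing a destination happens in the router configuration rather than in the app itself.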
Activating data from one SaaS business tool in a separate SaaS business tool
Sending data from one SaaS tool to another is one of the most widely used data activation workflows. A common use case within this workflow is syncing contact records between tools (e.g., a lead is created in a marketing platform, and then synced to the sales CRM). There is a long tail of SaaS-to-SaaS data activation use cases because of the proliferation of business tools, especially in the martech ecosystem (which often includes social media tools and e-commerce platforms).
While these kinds of connections for data activation can become unmanageably complex if used too heavily in a data stack, they are useful for certain types of data syncs where full pipelines aren’t necessary, and for managing one-off data integration needs (as opposed to full data sets).
There are multiple ways to implement this data activation flow:
SaaS → SaaS direct integration
Many SaaS systems, especially in the marketing and CRM categories, have direct integrations. Salesforce has managed integrations with Pardot, Marketo, and hundreds of other SaaS tools. In addition to syncing various business objects, like leads and accounts, direct integrations often also support syncing individual data points for data activation. Syncing data points generally involves some form of field syncing for data activation, where a field name/label in one system (number_of_employees) is mapped to the corresponding field in the other system (employee_count).
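Field syncing of this kind boils down to renaming keys according to a mapping. Here is a minimal Python sketch; the `number_of_employees` to `employee_count` pair comes from the example above, while the second mapping, the sample record, and the choice to drop unmapped fields are assumptions for illustration.

```python
# Mapping from source-system field names to destination field names.
# Only the first pair comes from the text; the second is hypothetical.
FIELD_MAP = {
    "number_of_employees": "employee_count",
    "company_name": "account_name",
}


def map_fields(record: dict, field_map: dict) -> dict:
    """Rename mapped fields and drop fields that have no mapping."""
    return {
        dest: record[src]
        for src, dest in field_map.items()
        if src in record
    }


source_record = {
    "number_of_employees": 250,
    "company_name": "Acme",
    "internal_note": "vip",  # no mapping, so it is not synced
}
synced = map_fields(source_record, FIELD_MAP)
```

Dropping unmapped fields is one possible policy; a real integration might instead pass them through or flag them for review.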
SaaS → Low/no code iPaaS → SaaS
Because there are so many business tools out there, it’s impossible for vendors to support integrations across the entire landscape. An entire industry has developed to support the long tail of API integrations for data activation across tools. This category is called iPaaS, which stands for Integration Platform as a Service. Some of the most popular tools are low/no-code solutions like Zapier and Workato. The workflow can become complex, but generally data activation is simple: choose your source system and the data you want to activate, then choose your destination system, map the data fields, and the iPaaS service will take care of syncing the data.
SaaS → technical iPaaS → SaaS
Low-code solutions are often insufficient for complex use cases. Robust systems like Mulesoft provide technical toolsets that extend beyond data syncing and allow users to activate data by managing integrations with code or APIs and more technical workflows.
Activating data from a data warehouse or data lake in SaaS business tools
Companies often turn raw data into analytics (also called business intelligence), metrics, customer profiles, and audience segments in a central data store, most often a data warehouse.
Historically, these decision-making customer insights stayed trapped in the data warehouse because moving that data into business tools required a significant amount of engineering effort.
The mass adoption of the data warehouse as a source of truth (and for data modeling) has changed things, and today there are multiple ways to deliver data from the warehouse to downstream tools where it can be activated to create business value.
There are multiple ways to implement this data activation flow:
Custom pipeline → SaaS
Many companies still build custom pipelines to push data from data stores into business tools for activation. Snowflake to Salesforce is a common example here. These connections are incredibly common, but data teams are increasingly replacing them with vendor-managed integrations, whether provided by the warehouse or a SaaS tool, or pipelines as a service.
Native warehouse app → SaaS via direct push integration
Modern data warehouses like Snowflake offer native apps that can automatically push data from warehouse tables into business tools for activation. The sync is initiated from the warehouse and is most often managed as part of the warehouse implementation.
Warehouse → SaaS via direct pull integration
Modern business tools like Salesforce and Braze give users the ability to pull data from the warehouse directly into the business tool itself. The sync is initiated from the business tool and is most often managed by the team that owns that tool.
Warehouse → SaaS via managed pipelines (reverse ETL)
Data teams often want the ability to sync a table to multiple tools (as opposed to just one). They also often want more control than direct integrations provide but don’t want to build and manage custom pipelines. For this use case, several companies, including RudderStack, offer managed pipelines, often referred to as reverse ETL, that can sync warehouse data to SaaS tools in a variety of ways.
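To illustrate the general shape of such a pipeline, here is a small Python sketch that syncs one table to two destinations and only sends rows that changed since the previous run. It is a simplified assumption of how a sync could work, not a description of any vendor's product: an in-memory SQLite database stands in for the warehouse, plain lists stand in for the SaaS tools, and the `lead_scores` table is hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for the warehouse (an assumption for
# this sketch; a real pipeline would query a cloud data warehouse).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE lead_scores (email TEXT PRIMARY KEY, score INTEGER)")
warehouse.executemany(
    "INSERT INTO lead_scores VALUES (?, ?)",
    [("ada@example.com", 90), ("bob@example.com", 40)],
)


def sync_table(conn: sqlite3.Connection, last_synced: dict, destinations: dict) -> int:
    """Send new or changed rows to every destination; return how many rows changed."""
    changed = 0
    rows = conn.execute("SELECT email, score FROM lead_scores ORDER BY email").fetchall()
    for email, score in rows:
        if last_synced.get(email) == score:
            continue  # unchanged since the previous sync, so skip it
        for send in destinations.values():
            send({"email": email, "score": score})
        last_synced[email] = score
        changed += 1
    return changed


crm, ads = [], []  # lists stand in for two SaaS destinations
destinations = {"crm": crm.append, "ads": ads.append}
state = {}  # remembers the last value sent for each row

first = sync_table(warehouse, state, destinations)   # both rows are new
warehouse.execute("UPDATE lead_scores SET score = 75 WHERE email = 'bob@example.com'")
second = sync_table(warehouse, state, destinations)  # only the updated row is sent
```

Tracking sync state is what keeps repeated runs from resending the whole table, which matters when destination APIs are rate limited.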
Activating data from APIs or data stores in websites and apps
While activating data in SaaS tools is extremely common, customer data also often needs to be activated directly in websites or apps to drive personalization, recommendations, and other user experiences. This generally requires a website or app to ‘consume’ information from an API or low-latency data store (often via API).
There are multiple ways to implement this data activation flow:
3rd-party API → website/app
Many apps consume information from third-party APIs. For example, a fitness tracking app might consume weather data from a weather API in order to give its users rain alerts.
1st-party data store/API → website/app
In addition to activating 3rd-party data, websites and apps often need first-party data to create user experiences that are informed by the full customer profile and demographics. RudderStack’s Activation API gives developers real-time access to a 360-degree view of their customer for personalization and other user experiences based on traits like lifetime value.
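The pattern of consuming profile traits in an app can be sketched as follows. To be clear, this is not RudderStack's Activation API: the in-memory `PROFILES` store, the function names, the trait names, and the threshold are all hypothetical stand-ins for a low-latency first-party profile lookup.

```python
# Hypothetical in-memory profile store standing in for a low-latency
# first-party profile API. This is not RudderStack's Activation API.
PROFILES = {
    "u_123": {"lifetime_value": 1800.0, "plan": "pro"},
    "u_456": {"lifetime_value": 40.0, "plan": "free"},
}


def get_profile(user_id: str) -> dict:
    """Return the stored traits for a user, or an empty dict if unknown."""
    return PROFILES.get(user_id, {})


def choose_banner(user_id: str, high_ltv_threshold: float = 1000.0) -> str:
    """Pick which banner to render based on the user's lifetime value trait."""
    ltv = get_profile(user_id).get("lifetime_value", 0.0)
    return "vip_offer" if ltv >= high_ltv_threshold else "default_offer"


banner_for_known_vip = choose_banner("u_123")
banner_for_unknown_user = choose_banner("u_999")
```

Falling back to a default experience for unknown users keeps the page rendering even when a profile is missing or the lookup fails.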
Get a demo of RudderStack’s end-to-end data activation solution
RudderStack gives data teams a tool to enable multiple data activation workflows in a single platform:
- Real-time event streaming and integrations allow you to send data from your websites and apps to over 200 business tools and data warehouses automatically
- Reverse ETL pipelines make it easy to sync your warehouse data to business tools
- Activation APIs give your websites and apps access to your customer 360 in real-time
Reach out today to get a demo and learn about our scalable, volume-based pricing.
Soumyadeb Mitra
Founder and CEO of RudderStack
Eric Dodds
Senior Director of Product Strategy