
Dogfooding at RudderStack: Tracking Plans Part 1

Benji Walvoord

What are Tracking Plans, & why do you need one?

At RudderStack, we talk a lot about the importance of owning your own data and the competitive advantage that can come from building robust analytics with complete data. Data trust is fundamental to this. To trust the data, you must trust the tools that provide it. That’s why we built our Data Governance API and the new Tracking Plans feature, which we are getting ready to release in beta.

RudderStack Tracking Plans are the latest offering within our Data Governance API and have been one of our most requested features to date. Unlike RudderStack Transformations, which allow you to transform your data in flight, Tracking Plans allow you to plan and prescribe what your data should look like in the first place. Tracking Plans address three fundamental issues related to streaming data:

  1. Missing or improperly configured data breaks downstream SaaS applications and data warehouses. This causes problems like poorly executed automated campaigns and broken dashboards.
  2. Poorly named and duplicative events and properties. This creates confusion and mismapping in downstream tools and data warehouses.
  3. Upstream data providers make changes that alter event streams. This leads to issues 1 and 2 above with little to no advance notice or ability to fix them.

How Tracking Plans solve these issues

RudderStack Tracking Plans allow you to define the specific event names and properties for each of your track, group and identify calls. In addition, you can assign the type of data associated with each property or attribute and specify whether that property or attribute is required. The Tracking Plans API also supports versioning for better control of your data streaming.
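To make this concrete, here is a minimal Python sketch of how such a plan could be modeled in memory. The class names (`PropertyRule`, `EventRule`, `TrackingPlan`), their fields, and the example event are all hypothetical and chosen for illustration; this is not the Tracking Plans API format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyRule:
    """One property or attribute allowed on an event (hypothetical shape)."""
    name: str
    type: str              # e.g. "string", "number", "boolean"
    required: bool = False


@dataclass(frozen=True)
class EventRule:
    """One planned event for a track, group, or identify call."""
    call_type: str         # "track", "group", or "identify"
    event_name: str
    properties: tuple[PropertyRule, ...] = ()

    def required_names(self) -> set[str]:
        """Names of the properties marked as required."""
        return {p.name for p in self.properties if p.required}


@dataclass(frozen=True)
class TrackingPlan:
    """A named, versioned collection of event rules."""
    name: str
    version: int
    events: tuple[EventRule, ...] = ()

    def find(self, call_type: str, event_name: str) -> EventRule | None:
        """Return the rule for this call type and event name, if planned."""
        for rule in self.events:
            if (rule.call_type, rule.event_name) == (call_type, event_name):
                return rule
        return None


# Illustrative plan; the event and property names are made up.
plan = TrackingPlan(
    name="example_plan",
    version=1,
    events=(
        EventRule(
            call_type="track",
            event_name="example_event",
            properties=(
                PropertyRule("example_id", "string", required=True),
                PropertyRule("example_note", "string"),
            ),
        ),
    ),
)
```

Keeping a `version` on the plan mirrors the versioning idea described above: a change to the rules would produce a new plan version rather than mutate the old one.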

With your Tracking Plans in place, you can use the existing Data Governance APIs to evaluate your inbound events, payload samples and metadata and compare them against your plans. You can also use the RudderTyper tool we’re releasing alongside Tracking Plans. RudderTyper generates strongly-typed RudderStack analytics library wrappers based on your published tracking plan specs, meaning your data will conform to your defined schema upon capture.

What does the future hold?

Well, that’s where we need your help. We are currently working with a few Alpha customers and using Tracking Plans ourselves on our own production instance of RudderStack. Still on the roadmap are decisions about which types of errors or schema violations we want to track and how to handle them. Although not set in stone, here is a sneak peek at what we’re thinking so far:

| Violation Type | Description | Action Taken |
|---|---|---|
| Unplanned Events | An event for which no schema has been defined. | |
| Unplanned Properties | The event is defined but the property or attribute does not exist in the schema. | |
| Mismatched Data Type | Data type for a particular property does not match what is defined in the schema. | |
| Required Field Missing | The payload is missing a property set as required in the schema. | |
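As a rough illustration of the four violation types, the Python sketch below checks one event against a plan and labels each problem it finds. The plan format (event name mapped to property name, Python type, and a required flag), the function name `find_violations`, and the sample event are assumptions made for this example, not how RudderStack implements the checks.

```python
# Hypothetical plan format: event name -> {property name: (python type, required)}
PLAN = {
    "example_event": {
        "example_id": (str, True),
        "example_count": (int, False),
    },
}


def find_violations(plan, event_name, properties):
    """Return a list of (violation_type, detail) tuples for one event."""
    schema = plan.get(event_name)
    if schema is None:
        # No schema has been defined for this event at all.
        return [("Unplanned Events", event_name)]

    violations = []
    for name, value in properties.items():
        if name not in schema:
            # The event is planned, but this property is not in its schema.
            violations.append(("Unplanned Properties", name))
        elif not isinstance(value, schema[name][0]):
            # The property is planned, but the value has the wrong type.
            violations.append(("Mismatched Data Type", name))

    for name, (_, required) in schema.items():
        if required and name not in properties:
            # A property marked as required is absent from the payload.
            violations.append(("Required Field Missing", name))
    return violations
```

An event that matches its schema returns an empty list, so callers can treat any non-empty result as a signal to apply whatever handling policy they choose.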

Once this feature is fully built, the actions taken on each of these violations could include one or more of the following:

  1. Rejecting the entire payload
  2. Accepting the entire payload and sending it to downstream destinations with a warning flag
  3. Rerouting the entire payload to an S3 bucket (aka “dead letter queue”)
  4. Removing the additional properties from the payload that are not defined in the schema
  5. Inserting default values for required fields missing in the schema
  6. More advanced options based on schema comparisons outlined here: https://ajv.js.org/options.html
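Since none of this is final, the following Python sketch only shows what a few of the candidate actions above might look like in principle. The function names, the `violations` warning key, and the in-memory list standing in for an S3 dead letter queue are all hypothetical.

```python
def strip_unplanned(properties, allowed_names):
    """Option 4: drop properties that are not defined in the schema."""
    return {k: v for k, v in properties.items() if k in allowed_names}


def fill_defaults(properties, required_defaults):
    """Option 5: insert default values for required fields that are missing.

    Values already present in the payload always win over the defaults.
    """
    filled = dict(required_defaults)
    filled.update(properties)
    return filled


def flag_or_reroute(payload, violations, dead_letter, reroute=False):
    """Option 3 when reroute is True, otherwise option 2.

    dead_letter is a plain list standing in for an S3 bucket (assumption).
    Returns the payload to forward downstream, or None if it was rerouted.
    """
    if not violations:
        return payload
    if reroute:
        dead_letter.append(payload)
        return None
    flagged = dict(payload)
    flagged["violations"] = list(violations)  # hypothetical warning flag key
    return flagged
```

Each function returns a new dictionary instead of mutating its input, so the original payload stays available if it later needs to be replayed or inspected.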

So, where do you start?

Connect with us on Slack or shoot us an email if you are interested in participating.

In the meantime, let’s take a look at how your typical SaaS business would walk through the steps of designing and implementing Tracking Plans with RudderStack. As a part of the RudderStack Data Governance API, Tracking Plans are first and foremost managed through code, but we understand that designing the plan will be a collaborative effort involving developers and non-developers, so we designed a Tracking Plans Template Google Sheet to help get teams started.

The first step is to get your hands on a copy of the RudderStack Tracking Plans Template, which will be available soon. This will help you and your team organize the various events and fields you want to capture from each of your RudderStack sources. The sheet requires a user access token for your account. For help creating a user token, check out our Access Token user documentation.

The next step is to create a wish list of events and properties you think you might need. The goal of this first pass is not to create the be-all-end-all list, but primarily to see where data needs intersect amongst the various stakeholders and to begin building out the data architecture for your company. During this exercise, it can be helpful to start with existing higher-level paradigms like the sales and marketing funnel or executive summary reports as the underlying metrics for these are generally already agreed upon. Starting with what you already know you need to measure is a great way to begin drilling into how you measure it and, more specifically, where the data comes from in the first place and what properties or attributes will be measured (i.e., required keys and data types).

For example, let’s take a sample SaaS business that has a funnel measuring the following:

| Stage | Team |
|---|---|
| Unique Site Visitors | Marketing - Paid Digital |
| Leads | Marketing - Engagement |
| MQLs | Marketing - Engagement |
| Opportunities / Free Trials | Sales - Outbound |
| Product activation / POC | Sales - Sales Engineering |
| Customers | Sales - Coffee Drinkers |
| Product usage | Customer Success |

Now that we have each stage defined, let’s dive deeper into exactly which data elements need to be created and tracked to reproduce our funnel, and assign a source for each. In some cases, such as defining a Marketing Qualified Lead (MQL), multiple sources of information may contribute to qualifying any one lead. In this table, though, we are defining which system retains that information. Should we ever need to perform an audit, Salesforce (in this example) is the system where we would confirm whether a particular lead was flagged as an MQL. As we define each metric, we will assign it to a tracking plan on our Google Sheet.

| Funnel Step | Source | Metric | Tracking Plan |
|---|---|---|---|
| Visitor | Marketing Website & App | Count of Distinct Anonymous ID | Page View (Marketing), Page View (Application) |
| Lead | Marketing Website & App | Count of Distinct Email Addresses per domain | Form Submit (Marketing), App Signup (Application) |
| MQL | Salesforce | Count of Salesforce Leads (not deleted) with MQL checked | N/A (SFDC Cloud Extract) |
| Opportunity / Free Trial | Salesforce | Count of Opportunities where Opp Type = Initial | N/A (SFDC Cloud Extract) |
| Product Activation | App | Has the User Created a Connection | Connections Created |
| Customer | Salesforce | Opportunity = Close Won | Opportunity Won |
| Product Usage | App | Total Event Volume | N/A (aggregated from warehouse tables) |
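To show why the properties on each event matter, here is a small Python sketch of the first two metrics in the table computed from raw events. The field names `anonymousId` and `email`, and the idea of computing these in application code instead of in the warehouse, are assumptions for illustration only.

```python
from collections import defaultdict


def count_visitors(page_views):
    """Visitor metric: count of distinct anonymous IDs across page views."""
    return len({event["anonymousId"] for event in page_views})


def count_leads_per_domain(signups):
    """Lead metric: count of distinct email addresses, grouped by domain."""
    emails_by_domain = defaultdict(set)
    for event in signups:
        # Normalize case and whitespace so duplicates collapse to one lead.
        email = event["email"].strip().lower()
        domain = email.split("@")[-1]
        emails_by_domain[domain].add(email)
    return {domain: len(emails) for domain, emails in emails_by_domain.items()}
```

If `anonymousId` or `email` were missing or mistyped on some events, both counts would silently drift, which is exactly the kind of problem a required-property rule is meant to catch.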

Some of our metrics will come from RudderStack Cloud Extract sources or other non-RudderStack tables in our data warehouse and therefore will not be defined in our Tracking Plan for event data.

Building out Tracking Plans

In the funnel map above we defined six different events and three different tracking plans that we want to build. This by no means defines the totality of your tracking plans but will be enough to get you started using the tools.

| RudderStack Source (Tracking Plan) | User Action Name | RudderStack Event Name |
|---|---|---|
| Marketing Site | Page View | page_view |
| Marketing Site | Form Submit | form_submit |
| Application | Page View | page_view |
| Application | App Signup | app_signup |
| Application | Connection Created | connection_created |
| Salesforce Webhook* | Opportunity Won | opp_won |

*Typically, Salesforce and other SaaS tools will have data extracted using RudderStack Cloud Extract every 24 hours. However, critical events like marking an Opportunity as won are important enough to trigger a real-time event passed back through a Webhook source.

With the sources and events defined, we now need to identify the properties and property types for each event. These should now be added to the Tracking Plans Google Sheet. Each Source should have its own tab copied from the “Import Template”. The tab below is a copy of the Marketing Site tab we created.

| Event Name | Description |
|---|---|
| page_view | User visits a page |
| form_submit | User submits a form |

| Property name | Property type | Property description | Req'd (R = required, O = optional) |
|---|---|---|---|
| link_source | string | Value of UTM parameter defined as ?link_source={value} | O |
| page_title | string | Title of the page | R |
| page_URL | string | URL of the page | R |
| form_id | string | The ID of the form (configured in Sanity) | R |
| label | string | Label for Google Analytics events (if needed) | O |
| category | string | Category for Google Analytics events (if needed) | O |
| utm_source | string | Optional utm parameters | O |
| utm_medium | string | Optional utm parameters | O |
| utm_campaign | string | Optional utm parameters | O |
| utm_content | string | Optional utm parameters | O |
| utm_term | string | Optional utm parameters | O |
| raid | string | Optional utm parameters | O |
| search_text | string | The text the user typed into the search field | R |
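Because the more advanced options we are considering are based on ajv, a JSON Schema validator, it can help to picture the sheet as JSON Schema. The Python sketch below groups sheet-style rows into one schema per event. The row layout, the function name `rows_to_schemas`, and in particular the assignment of properties to events are assumptions for illustration; this is not the format the Google Sheet uploads.

```python
def rows_to_schemas(rows):
    """Group (event, property, type, req'd) rows into a JSON Schema per event."""
    schemas = {}
    for event, prop, prop_type, reqd in rows:
        schema = schemas.setdefault(
            event,
            {
                "type": "object",
                "properties": {},
                "required": [],
                # Reject properties that the plan does not define.
                "additionalProperties": False,
            },
        )
        schema["properties"][prop] = {"type": prop_type}
        if reqd == "R":
            schema["required"].append(prop)
    return schemas


# Sample rows; which property belongs to which event is assumed here.
rows = [
    ("page_view", "page_title", "string", "R"),
    ("page_view", "page_URL", "string", "R"),
    ("page_view", "link_source", "string", "O"),
    ("form_submit", "form_id", "string", "R"),
]

schemas = rows_to_schemas(rows)
```

Setting `additionalProperties` to `False` is one way to express the "unplanned properties" rule in JSON Schema, while the `required` list covers the "required field missing" case.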

With the basics of our Marketing Site source plan created, we can now upload it to RudderStack by configuring additional settings in the Google Sheet (more on this when we release the feature).

One exciting part of the Tracking Plans Google Sheet is that you can download the latest version of a tracking plan from the RudderStack Tracking Plan API, then upload any changes you make, ensuring everyone working on the plan has the most recent set of changes.

Once a Tracking Plan has been uploaded to the API via the Google Sheet, you are ready to begin using RudderTyper. Download instructions and tutorials will be made available to beta participants.

Tracking Plans are only one piece of the puzzle

As useful as RudderStack Tracking Plans will be (and already are for our team and beta users), it should also be noted that there will always be scenarios where you still need to transform the data once it arrives from the source, either for enrichment, filtering or massaging based on the needs of the various downstream destination tools. Tracking Plans and Transformations go hand-in-hand to ensure a stable and trustworthy data feed.

There may also be times when you aren’t sure what to do with particular variations of events streamed from your sources. In these cases, sending them to a backup bucket such as Amazon S3 or Google Cloud Storage is an elegant solution. Check out our documentation for more information on how to leverage a variety of Cloud Storage Platforms.

Beta registration

As we continue our mission of giving developers full control over their data and their tools, we recognize and appreciate the commitments our customers have made to help improve the product and we thank you. If you would like more information on how to get signed up, please contact katie@rudderstack.com or hit us up on Slack.

About the author
Benji Walvoord
Benji has a long history of building data-driven startups in the healthcare and e-commerce industries. He's also an ultra-marathoner, which he claims is only to compensate for being an Arsenal supporter.