Getting Started with Tracking Plans
In this tutorial, we’ll walk through the steps for getting started with Tracking Plans, covering all the available implementation options. But first, what are Tracking Plans, and why should we care?
Tracking Plans allow us to plan and prescribe what our data should look like before it is sent to our destinations. Data quality is one of the main areas data engineering teams struggle with, especially when data is ingested from different sources in different shapes and formats.
There’s also the question of what the data means to the different teams that use it. Each team might have a different spec for the data, and that spec could also differ by industry. By collaborating with the different data owners and agreeing on a standard for what the data needs to look like, we can ensure that non-compliant ingested data is either discarded or rerouted. Tracking plans help different teams come together and define what non-compliant data looks like (an unplanned event, an unplanned schema, or both) and what actions to take once it is detected.
Tracking plans are an essential part of a healthy event instrumentation process. You mainly want to think about tracking plans either in the beginning stages of your instrumentation or when you have gone through multiple iterations of testing and know what your primary event streams look like.
RudderStack Tracking plans consist of:
- The Tracking Plan API (optional): You can create your tracking plans either directly from the sheet or by making calls to the Tracking Plan API. The API is especially helpful when you want to incorporate tracking plans into your CI/CD pipelines.
- The Data Governance API (optional): This can be used to get metrics and metadata on your event sources, including the properties each event has, and to derive event names from an existing source for your tracking plans. You can call this API separately, but the tracking plan sheet also calls it implicitly when creating a plan from source data.
- The Tracking Plan sheet: This is the core component of a tracking plan. Think of it as a reference guide: it is where you’ll outline your events and their properties and associate them with a source in your RudderStack account.
- RudderTyper (optional): A free, open-source tool that developers can run within their IDEs to make it easier to enforce tracking plan event specs through IntelliSense autocomplete features.
There are two approaches we can take when it comes to Tracking Plans:
- Creating a tracking plan from scratch using data that is entered manually. This approach doesn’t require the Data Governance API and can be used alongside RudderTyper to enforce the tracking plan schema.
- Creating a tracking plan from existing event stream source data. This approach requires the Data Governance API and is the recommended approach to Tracking Plans.
Let’s look into the first approach.
Create a tracking plan from scratch:
Uploading a tracking plan:
This is the best option for users on the free or pro tier in RudderStack.
1. You will start with a tracking plan spreadsheet, which you can find in our official Tracking Plans documentation.
In this approach, it is up to your team to decide what your event spec should look like. Every tracking plan spreadsheet comes with an “Example Tracking plan” template that you can use as a reference. You can start with a template and add or remove events based on your event spec requirements. It’s really intended to give you a starting point and a best practice for how to think about event collection and the event properties you want to track.
Every event spec sheet will need to have two categories:
- Event Definition: the names of the track events that you want in your tracking plan.
- Property Definition: the properties of your track events, each marked as either optional or required.
If you are not sure where to start when authoring your event spec, make sure to check out this best practices blog post.
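To make the two categories concrete, here is a minimal Python sketch of how an event spec could be modelled in code. The class names, the property names (product_id, price, coupon), and the type labels are assumptions for illustration only; they are not the sheet’s actual column layout or a RudderStack API.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyDefinition:
    """One property of a track event (hypothetical structure, for illustration)."""
    name: str
    type: str            # assumed type label, e.g. "string", "number", "boolean"
    required: bool = False


@dataclass
class EventDefinition:
    """One planned track event and the properties it may carry."""
    name: str
    properties: list[PropertyDefinition] = field(default_factory=list)

    def required_properties(self) -> list[str]:
        # Names of the properties that must be present on every event.
        return [p.name for p in self.properties if p.required]


# Example spec with a single event; the property names are made up.
product_added = EventDefinition(
    name="Product Added",
    properties=[
        PropertyDefinition("product_id", "string", required=True),
        PropertyDefinition("price", "number", required=True),
        PropertyDefinition("coupon", "string"),
    ],
)

# A plan is then just a lookup from event name to its definition.
tracking_plan = {product_added.name: product_added}
```

Writing the spec down in this shape before filling in the sheet can help a team agree on which properties are required and which are optional.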
2. Start from the “Homepage” tab at the bottom. At first glance, it might seem that there is a lot of information to provide, but all you really need to input is the following:
- Name of the sheet you want to use as your tracking plan: in my example this is called “Example Tracking Plan”
- Personal access token: you can get this by:
- Logging into your RudderStack account
- Clicking Settings in the left-hand menu, scrolling down, and selecting “Generate a new token”. Make sure you copy it to a safe place because you won’t be able to access it again.
3. Once that is done, we have all the information we need. Click on RudderStack Tracking Plan from the menu and select “Upload Tracking Plan”.
4. Once I get a message saying it was uploaded, I can click “Show All Tracking Plans”. I can then go over to “List of Tracking plans” in the bottom menu and see that my tracking plan is listed there with a tracking plan ID and a version number.
Note: Tracking plans support versioning, which lets us iterate on a tracking plan event spec and monitor how it changes. Tracking plans are intended to be an iterative process and a part of every sprint, so versioning helps keep track of all the changes.
Testing Tracking plans
If I head over to the RudderStack dashboard and click on Monitor and then Tracking plans on the left-hand side, I can also see my tracking plan listed there along with the version that was uploaded.
- Now that my tracking plan has been uploaded, I can select it and link it to a source. Here I want to link it to my JavaScript source so I can flag any schema violations in track events occurring on my website (based on the events I have listed in my event spec sheet).
- Next, we need to decide what to flag as a violation. Is it an unplanned event? An unplanned property? Or both? Other violations could also occur, such as a type mismatch or missing required fields. It’s important to be very specific in what we define here because we don’t want to drop events that we actually need.
Note: The default option here would be to keep all events that have violations in them, but propagate and flag the errors so we can decide which action to take based on the violation.
- Tracking plan violations appear in the destination payload of the event that is sent. Typically, we will want those violations to be flagged in the data warehouse. To test this out in this example, we will use a webhook destination; you can use any webhook testing site here. We will be using webhook.site to view the incoming payload after firing off events from the JavaScript source.
- The sheet I uploaded had only one event, “Product Added”. After firing a track event from my site with the name “Cart Viewed” (which isn’t a part of my plan), I can see an extra object in my payload that contains details of the violation errors along with the ID of the tracking plan that was used to detect them.
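To illustrate the violation categories discussed above, here is a small, self-contained Python sketch that checks an event against a plan locally. It is not RudderStack’s validation logic, and the violation labels, the PLAN layout, and the property names are hypothetical; it only mirrors the example of a plan containing “Product Added” and an incoming “Cart Viewed” event.

```python
# Hypothetical local plan: event name -> property rules (not a RudderStack format).
PLAN = {
    "Product Added": {
        "product_id": {"type": str, "required": True},
        "price": {"type": (int, float), "required": True},
        "coupon": {"type": str, "required": False},
    }
}


def _matches(value, expected):
    # bool is a subclass of int in Python, so reject it for non-boolean fields.
    if isinstance(value, bool) and expected is not bool:
        return False
    return isinstance(value, expected)


def find_violations(event_name, properties, plan=PLAN):
    """Return a list of violations for one track event (labels are made up)."""
    if event_name not in plan:
        return [{"type": "unplanned_event", "detail": event_name}]

    spec = plan[event_name]
    violations = []

    # Check every planned property for presence and type.
    for name, rule in spec.items():
        if name not in properties:
            if rule["required"]:
                violations.append({"type": "required_missing", "detail": name})
        elif not _matches(properties[name], rule["type"]):
            violations.append({"type": "type_mismatch", "detail": name})

    # Anything sent that the plan does not list is an unplanned property.
    for name in properties:
        if name not in spec:
            violations.append({"type": "unplanned_property", "detail": name})

    return violations


cart_viewed = find_violations("Cart Viewed", {})
clean_event = find_violations("Product Added", {"product_id": "sku-1", "price": 9.99})
messy_event = find_violations("Product Added", {"product_id": 1, "color": "red"})
```

Returning the violations instead of dropping the event matches the default behaviour described in the note above: the event is kept, and the flagged errors let us decide later which action to take.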
Create a tracking plan from an existing source:
In the previous approach, we saw how we can author our own sheets with the event spec that needs to be tracked through our tracking plan. This works if I am new to using event streams and only looking to collect a few events. As my event volume grows over time, things can get really messy, and it becomes much more difficult to track all the events, especially when things change quickly.
Here we can use the second approach, which is creating a tracking plan from an existing source. This is similar to the previous approach, but it uses the Data Governance API.
The Data Governance API is an Enterprise tier feature that lets us pull event metadata from the events ingested through the source. Note that we will not be calling the API ourselves; those calls are made by the macros built into the tracking plan sheet. For this to work, however, we need to be on the right tier.
1. In the tracking plan sheet, select “Additional Details” in the bottom menu. Here we’ll have to enter a few more details:
- Data Plane URL: make sure to remove the “http://” part
- Write Key of the source that we want to pull the event schema from
- Username and password: these are provided by our team when you upgrade from the free tier to the pro or enterprise tier.
- Tracking Plan name: In this example it’s “RS Tracking Plan”
2. Once I fill in all that information, I can then select “Create Tracking Plan from Existing Source Events” from the RudderStack Tracking Plan menu bar.
3. This makes a call to the Data Governance API and fetches the current event names and properties that are tracked from that source. When the script finishes running, you will notice that a new sheet has been added to the bottom bar.
4. Now you can repeat from Step 2 in “Create a tracking plan from scratch” to upload the tracking plan to your source in RudderStack.
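Conceptually, building a plan from an existing source means collecting the event names and properties that have already been observed. The Python sketch below illustrates that idea on a hand-written sample of events; it does not call the Data Governance API, and the function name and the sample events are assumptions made up for this example.

```python
from collections import defaultdict


def draft_plan_from_events(events):
    """Collect event names and, per property, the Python type names seen."""
    seen = defaultdict(lambda: defaultdict(set))
    for event in events:
        props = seen[event["event"]]  # registers the event even with no properties
        for key, value in event.get("properties", {}).items():
            props[key].add(type(value).__name__)
    # Convert to plain dicts with sorted type lists for stable, readable output.
    return {
        name: {key: sorted(types) for key, types in props.items()}
        for name, props in seen.items()
    }


# Hand-written sample standing in for events already ingested by a source.
sample_events = [
    {"event": "Product Added", "properties": {"product_id": "sku-1", "price": 9.99}},
    {"event": "Product Added", "properties": {"product_id": "sku-2", "price": 12}},
    {"event": "Cart Viewed", "properties": {}},
]

draft = draft_plan_from_events(sample_events)
```

A draft like this is only a starting point: a property that shows up with more than one type (here, price as both float and int) is exactly the kind of thing to review before uploading the plan.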
Conclusion
Tracking plans are essential to ensuring we are instrumenting pipelines with high-quality data. In this guide, we have seen how to implement tracking plans as a part of our event stream instrumentation process using the different available approaches.