Dealing with event data can be dirty work. A code change can cause developers to send events that contain errors. Errors can also creep in when the data engineering team changes the data warehouse schema, because such changes can lead to data type conflicts. How can you deal with all the different event data issues that might arise in a production environment? This blog discusses how RudderStack handles event filtering and value aggregation without introducing manual errors.
An expressive environment like RudderStack offers endless possibilities for how a data engineering team can interact with the data. In this blog post, we will explore just two of the most common use cases we’ve encountered in the RudderStack community. Event filtering and value aggregation are universal and simple to implement, yet very powerful.
User Transformation for Event Filtering and Value Aggregation
You can define user transformations in the Configuration Plane of your RudderStack setup. A few sample user transformations are available on our GitHub. This blog walks through one such sample transformation that you can use for:
- Event filtering: This stops events from passing to a destination. You might need to filter events when an organization employs multiple tools/platforms to address different business requirements and you want to route only specific events to specific tool/platform destinations.
- Value aggregation: This aggregates values on specific attributes of particular event types. You might need to aggregate values when an organization does not want a tool/platform to perform transaction-level record keeping and/or analysis, and instead wants consolidated records/analytics. This kind of transformation reduces network traffic and request/message volume, because the system can replace multiple events of a particular type with a single event of the same type carrying the aggregated value(s). It also helps reduce cost when the destination platform charges by volume of events/messages.
You can view the sample transformation on our GitHub page.
You need to contain all logic within the `transform` function, which takes an array of events as input and returns an array of transformed events. The `transform` function is the entry-point function for all user transformations.
The code snippet above shows how you can use the
A variation of this code is also possible. Here, the values in the array of event names are the ones you want to retain, and you remove the not (`!`) condition from the `return` statement in the penultimate line.
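As a rough illustration of name-based filtering, the sketch below assumes that each event carries its name in an `event` field and uses JavaScript's `Array.prototype.filter` inside `transform`. The event names in the list and the `transformRetainOnly` helper are hypothetical, chosen only for this example.

```javascript
// Hypothetical list of event names; replace with your own.
const filterEventNames = ["game_load_time", "lobby_fps"];

// Entry point: takes an array of events, returns the events to forward.
function transform(events) {
  return events.filter((event) => {
    const eventName = event.event; // assumed field holding the event name
    // The `!` drops the listed names; everything else passes through.
    return !(eventName && filterEventNames.includes(eventName));
  });
}

// Variation: the list holds the names to retain, so the `!` is removed.
function transformRetainOnly(events) {
  return events.filter((event) => {
    const eventName = event.event;
    return Boolean(eventName && filterEventNames.includes(eventName));
  });
}
```

Note that in the first form an event without a name passes through, while in the retain-only form it is dropped.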
The code below shows event removal based not on a simple check such as an event name match, but on more complex logic that checks whether a related attribute has a value.
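One way such a check might look is sketched here. It assumes, purely for illustration, that events whose name contains `spin` should be kept only when a `total_payments` value under `userProperties` is present and greater than zero. Both of those field names and the function name are assumptions, not confirmed parts of the sample.

```javascript
// Keep non-spin events as they are; keep spin events only when the
// related attribute (hypothetical: userProperties.total_payments) has a
// positive numeric value.
function removeSpinEventsWithoutPayment(events) {
  return events.filter((event) => {
    const eventName = String(event.event || "").toLowerCase();
    if (!eventName.includes("spin")) {
      return true; // not a spin event, so this check does not apply
    }
    const payments = event.userProperties && event.userProperties.total_payments;
    return typeof payments === "number" && payments > 0;
  });
}
```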
As you can see from the above examples, you can use the filtered array available as output from one step as the input to the next. As a result, you can daisy-chain the transformation conditions.
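The daisy-chaining idea can be sketched as a list of steps, where each step receives the array returned by the previous one. The two step functions, the `lobby_fps` event name, and the `userId` field are hypothetical stand-ins for whatever conditions you need.

```javascript
// Each step takes an array of events and returns a (possibly smaller) array.
const dropByName = (events) =>
  events.filter((event) => event.event !== "lobby_fps"); // hypothetical name

const dropAnonymous = (events) =>
  events.filter((event) => Boolean(event.userId)); // assumed userId field

// Feed the output of each step into the next one, in order.
function transform(events) {
  const steps = [dropByName, dropAnonymous];
  return steps.reduce((remaining, step) => step(remaining), events);
}
```

Keeping each condition in its own step makes it easy to add, remove, or reorder conditions without touching the others.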
Finally, the following code shows how you can prepare aggregates for specific attributes across all events of a particular type present in a batch. The code then returns a single event of that type, carrying the aggregated values for the corresponding attributes.
In the above snippet:
- First, the code collects the `spin_result` events into an array.
- Then, the code aggregates the values for three attributes (including `no_of_spin`) by iterating over the elements of that array.
- After this, the code assigns the aggregated values to the respective attributes of the first `spin_result` event in the array.
- Now, the code separates the events that are not of the target type (`spin_result` in this case) into another array. If there are no such events, an empty array is created.
- Finally, the code adds the single `spin_result` event to the array created in the previous step, and the result is returned.
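The steps above can be sketched roughly as follows. This is not the original sample: it assumes the event name lives in `event.event` and the attributes in `event.properties`. Only `no_of_spin` is named in the text, so `bet_amount` and `win_amount` are placeholder attribute names. Unlike the description, the sketch writes the totals onto a copy of the first `spin_result` event rather than modifying it in place.

```javascript
const TARGET_EVENT = "spin_result";
// `no_of_spin` is named in the text; the other two names are placeholders.
const SUM_FIELDS = ["bet_amount", "win_amount", "no_of_spin"];

function transform(events) {
  // Collect the spin_result events into one array.
  const spinResults = events.filter((event) => event.event === TARGET_EVENT);
  // Everything else goes into a separate (possibly empty) array.
  const otherEvents = events.filter((event) => event.event !== TARGET_EVENT);

  if (spinResults.length === 0) {
    return otherEvents; // nothing to aggregate
  }

  // Sum each attribute across the collected events; missing or
  // non-numeric values count as 0.
  const totals = {};
  for (const field of SUM_FIELDS) {
    totals[field] = spinResults.reduce(
      (sum, event) => sum + (Number(event.properties && event.properties[field]) || 0),
      0
    );
  }

  // Write the totals onto a copy of the first spin_result event.
  const first = spinResults[0];
  const aggregated = { ...first, properties: { ...first.properties, ...totals } };

  // Append the single aggregated event to the other events and return.
  return [...otherEvents, aggregated];
}
```

With this shape, a batch containing many `spin_result` events leaves the transformation with exactly one, which is where the reduction in message volume comes from.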