Customer Data Pipelines Play a Key Role in Data Privacy

By Gavin Johnson/March 07, 2021

Customer data pipelines play a critical role in the privacy of your customer data. They are one of the primary and most expansive collectors of your customers’ personally identifiable information (PII). They are also one of the most expansive sharers of customer data: one of their primary use cases is streaming events to what is often a large library of destination integrations.

Due to their specialized role of collecting and sharing customer data, customer data pipelines can either help ensure your data privacy or wreak havoc on it.

In this post, we’ll explain how your customer data pipeline can help improve your data privacy and how to ensure your data privacy with RudderStack.

Data Privacy vs. Data Security

To remove one common vector of confusion before we launch into this post, we want to make sure the difference between data privacy and data security is clear.

Data privacy, the focus of this post, is about what customer data is collected and stored, how long it is retained, and what is shared. What customer data are you collecting and storing, and how are you using that data?

Data security is about how collected data is protected - where data is stored, who has access, whether data is encrypted, etc. How are you keeping the customer data you store safe?

Your Customer Data Pipeline Can Improve Your Data Privacy

Your customer data pipeline can give you fine-grained control over what data you are sending to which tools and what data you are storing. This can help you avoid data privacy issues before they ever occur.

The three processes below are designed to help you ensure that your customer data stays private. You can implement all of them with a robust customer data pipeline tool like RudderStack.

Data Masking

Data masking obfuscates fields in your event data, most often to hide PII. Your customer data pipeline can mask your PII before it is ever sent to a destination or stored in your warehouse.

For example, if your event payload includes the following attributes…

"globalUserId": "XYJ458907432AAC",
"userId": "contactUser",
"userFirstName": "Rudder",
"userLastName": "Stack",
"userEmail": "contact@rudderstack.com",
"userSSN": "123-45-6789",
"eventType": "newsletter-sign-up"

One level of data masking would mask the directly identifying PII, like SSN and email address.

"globalUserId": "XYJ458907432AAC",
"userId": "contactUser",
"userFirstName": "Rudder",
"userLastName": "Stack",
"userEmail": "XXXXXXX@XXXXXXXXXXX.XXX",
"userSSN": "XXX-XX-XXXX",
"eventType": "newsletter-sign-up"

Another, more stringent level of data masking would mask every attribute the destination does not need. Since most of the attributes in this payload are identifiers in one way or another, only the global identifier and event type would be left unmasked.

"globalUserId": "XYJ458907432AAC",
"userId": "XXXXXXXXXXX",
"userFirstName": "XXXXXX",
"userLastName": "XXXXX",
"userEmail": "XXXXXXX@XXXXXXXXXXX.XXX",
"userSSN": "XXX-XX-XXXX",
"eventType": "newsletter-sign-up"
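Both masking levels above can be sketched in a few lines of plain JavaScript. This is a minimal illustration, not the code from the RudderStack template: `maskValue` and `maskFields` are hypothetical helper names, and wiring them into an actual Transformation is not shown.

```javascript
// Hypothetical helper: replace every letter and digit with "X", keeping
// punctuation such as "@", "." and "-" so the shape of the value stays readable.
function maskValue(value) {
  return String(value).replace(/[A-Za-z0-9]/g, "X");
}

// Hypothetical helper: return a copy of the event with the named fields masked.
// The original event object is left untouched.
function maskFields(event, fieldsToMask) {
  const masked = { ...event };
  for (const field of fieldsToMask) {
    if (field in masked) {
      masked[field] = maskValue(masked[field]);
    }
  }
  return masked;
}

const sampleEvent = {
  globalUserId: "XYJ458907432AAC",
  userId: "contactUser",
  userFirstName: "Rudder",
  userLastName: "Stack",
  userEmail: "contact@rudderstack.com",
  userSSN: "123-45-6789",
  eventType: "newsletter-sign-up",
};

// Level one: mask only the directly identifying PII.
const levelOne = maskFields(sampleEvent, ["userEmail", "userSSN"]);

// Stringent level: mask everything except the global identifier and event type.
const stringent = maskFields(sampleEvent, [
  "userId",
  "userFirstName",
  "userLastName",
  "userEmail",
  "userSSN",
]);

console.log(levelOne);
console.log(stringent);
```

Note that this style of masking preserves the length and punctuation of each value, which matches the examples above but still reveals a little about the original. Replacing the value with a fixed-length placeholder is a stricter alternative.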

Attribute Removal

Similar to data masking, attribute removal selectively removes attributes from your event data. Not every application you send event data to needs all of the customer data you collect in your events. Attribute removal can be used to remove PII, or to remove unnecessary customer data and reduce payload size.

Using the same event example, if you wanted to activate that data by having it trigger an email send in your email/marketing automation tool, you would remove the attributes that are not needed to send an email: userId and userSSN.

"globalUserId": "XYJ458907432AAC",
"userFirstName": "Rudder",
"userLastName": "Stack",
"userEmail": "contact@rudderstack.com",
"eventType": "newsletter-sign-up"
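The removal step above could look something like the following in JavaScript. Treat it as a sketch under assumptions: `removeAttributes` is a hypothetical helper, not a RudderStack function, and the event is assumed to be a flat object as in the example.

```javascript
// Hypothetical helper: return a copy of the event without the named attributes.
// The original event object is left untouched.
function removeAttributes(event, attributesToRemove) {
  const trimmed = { ...event };
  for (const attribute of attributesToRemove) {
    delete trimmed[attribute];
  }
  return trimmed;
}

const fullEvent = {
  globalUserId: "XYJ458907432AAC",
  userId: "contactUser",
  userFirstName: "Rudder",
  userLastName: "Stack",
  userEmail: "contact@rudderstack.com",
  userSSN: "123-45-6789",
  eventType: "newsletter-sign-up",
};

// The email tool does not need userId or userSSN to send an email.
const emailToolEvent = removeAttributes(fullEvent, ["userId", "userSSN"]);

console.log(emailToolEvent);
```

This version uses a deny list, so any attribute added to the event later is passed through by default. If you would rather new attributes be withheld until you approve them, an allow list (copying only named attributes into a new object) is the safer design.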

Event Filtering

Not all of the tools you stream events to need every type of event. Event filtering is the process of removing events from an event stream based on filtering criteria. It ensures that each tool receives only the events you want to act on there, so you don’t overshare customer data with tools that use only a small portion of it.

Using the same event example, if you filtered to where eventType = "newsletter-sign-up", the sample event would be included. If you filtered to where eventType != "newsletter-sign-up", the sample event would be excluded.
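The two filters described above can be sketched as follows. The helper name `filterEvents` and the second event type, "page-view", are assumptions made for illustration; only "newsletter-sign-up" comes from the example event.

```javascript
// Hypothetical helper: keep only events whose eventType is in the allowed list.
function filterEvents(events, allowedEventTypes) {
  const allowed = new Set(allowedEventTypes);
  return events.filter((event) => allowed.has(event.eventType));
}

const eventStream = [
  { globalUserId: "XYJ458907432AAC", eventType: "newsletter-sign-up" },
  // "page-view" is a made-up event type used only to show an excluded event.
  { globalUserId: "XYJ458907432AAC", eventType: "page-view" },
];

// eventType = "newsletter-sign-up": the sample event is included.
const signUpsOnly = filterEvents(eventStream, ["newsletter-sign-up"]);

// eventType != "newsletter-sign-up": the sample event is excluded.
const everythingElse = eventStream.filter(
  (event) => event.eventType !== "newsletter-sign-up"
);

console.log(signUpsOnly);
console.log(everythingElse);
```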

RudderStack Transformations Keeps Your Customer Data Private

RudderStack Transformations allows you to transform your event data in flight: after collection and before delivery. Transformations are reusable functions, written in JavaScript, that are applied to the data in your event streams before it is delivered to a destination tool or your data warehouse.

With RudderStack Transformations, you can implement all three of the data privacy processes detailed above, plus any other type of data transformation you can code in JavaScript. Transformations are applied on a destination-by-destination basis, so you can implement specific privacy processes for each tool you use and your data warehouse - only sharing the exact customer data you need to share. And they are reusable, so it’s easy to apply the same transformation to multiple destinations. Write it once and apply it everywhere.
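To make the "write it once and apply it everywhere" idea concrete, here is a rough sketch of reusable steps assigned per destination. It is not RudderStack's configuration format or API: the destination names, the `stepsByDestination` map, and `prepareFor` are all hypothetical, and how a Transformation is attached to a destination inside RudderStack is not shown here.

```javascript
// Reusable steps (hypothetical). Each takes an event and returns a new event,
// or null to drop the event entirely.
const maskSsn = (event) =>
  "userSSN" in event ? { ...event, userSSN: "XXX-XX-XXXX" } : event;

const removeIds = (event) => {
  const { userId, userSSN, ...rest } = event;
  return rest;
};

const onlySignUps = (event) =>
  event.eventType === "newsletter-sign-up" ? event : null;

// Hypothetical destination names mapped to the steps each one needs.
// maskSsn is written once and reused for two destinations.
const stepsByDestination = {
  "email-tool": [onlySignUps, removeIds],
  "analytics-tool": [maskSsn],
  "warehouse": [maskSsn],
};

function prepareFor(destination, event) {
  const steps = stepsByDestination[destination];
  if (!steps) {
    // Fail closed: never send data to a destination with no privacy rules.
    throw new Error(`No steps configured for destination "${destination}"`);
  }
  let current = event;
  for (const step of steps) {
    current = step(current);
    if (current === null) return null; // a step dropped the event
  }
  return current;
}

const incoming = {
  globalUserId: "XYJ458907432AAC",
  userId: "contactUser",
  userEmail: "contact@rudderstack.com",
  userSSN: "123-45-6789",
  eventType: "newsletter-sign-up",
};

console.log(prepareFor("email-tool", incoming));
console.log(prepareFor("warehouse", incoming));
```

Throwing on an unknown destination is a deliberate choice in this sketch: passing events through unchanged by default would quietly overshare data whenever a new destination is added without rules.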

We maintain an open source repository of Transformations templates that implement a wide variety of data transformations - from data masking, attribute removal, and event filtering to event enrichment. The JavaScript code for individual transformations is stored in this repo. You can copy it, edit it to work with your data, and paste it into RudderStack Transformations.

  • Data masking template
  • Attribute removal template
  • Event filtering template

If you want more details about how to use RudderStack Transformations, read our step-by-step guide on adding custom Transformations.

If you want more details about how to mask PII with RudderStack Transformations, read our blog post Protect Personally Identifiable Information (PII) in Your Apps Using RudderStack.

Try RudderStack Today

Start using a smarter customer data pipeline that builds your customer data lake on your data warehouse. Use all your customer data. Answer more difficult questions. Send insights to your whole customer data stack. Sign up for RudderStack Cloud Free today.

Join our Slack to chat with our team, check out our open source repos on GitHub, subscribe to our blog, and follow us on social: Twitter, LinkedIn, dev.to, Medium, YouTube. Don’t miss out on any updates. Subscribe to our blogs today!

Gavin Johnson
Product Marketer at RudderStack. Ex-PMM at New Relic & AT&T. Ex-consultant at Deloitte. Ex-sys admin. (Sometimes) Ex-developer.
