How Pachyderm uses RudderStack to master lead qualification
Pachyderm is a data science platform that combines Data Lineage with End-to-End Pipelines on Kubernetes
Challenges
- Building an efficient data tracking pipeline
- Getting siloed data into a centralized data store for analysis
- Gaining deeper insights into user product behavior and optimizing UX to increase customer adoption
Results
- Unified real-time user events, product usage data, and data from cloud sources into a centralized data warehouse
- Leveraging enriched, transformed warehouse data for analytics and product optimization
- Routing enriched data back to downstream tools for inbound and outbound marketing and sales
Pachyderm's Data Stack
- Data Collection and Synchronization: RudderStack Event Stream SDKs, Warehouse Actions, & RudderStack Cloud Extract
- Data Warehouse: Google BigQuery
- Data Transformation: dbt
- Business Intelligence: Sigma
- Cloud Toolset for Activation Use-cases: HubSpot, Google Analytics, Facebook Pixel, Intercom, Google Tag Manager, Slack, Salesforce
Pachyderm's data challenges
Pachyderm is a data science platform that lets you easily build and manage your data science pipelines, regardless of their scale and complexity. It allows you to track your data lineage and implement version control for your data. You can set up Pachyderm in your development environment, on the cloud, or use Pachyderm Hub - their fully-managed SaaS platform.
Pachyderm generates gigabytes of data, both from tracking its users' product interactions and from its cloud sources. Previously, all of this data was siloed in cloud tools like HubSpot, so the data team had to do a lot of plumbing to move it into their other marketing and sales tools. As the company evolved and needed more customer insights to grow, they knew they needed a better approach.
They wanted to get all of their data into a centralized data store that they could leverage for product analytics and more efficient marketing.
"RudderStack has given us better access to our data. Our data was siloed in cloud sources. Now we have it all in a warehouse, making it accessible to everyone."
Dan Baker, Marketing Ops Manager at Pachyderm
Single Source of Truth for Customer Data
Pachyderm’s data engineering team uses Sigma - a warehouse-focused BI tool - to aggregate and transform the data collected from various sources to build a single source of truth for all their customers’ information.
They then use RudderStack’s Warehouse Actions feature to route this transformed, enriched customer data to downstream destinations like HubSpot (their inbound lead system).
Advanced, Behavior-driven Lead Qualification
When a user first signs up for Pachyderm, the first suggested action is to create a workspace. Pachyderm's customer team encourages this action with drip emails. Once the user has created a workspace, the application backend sends an event to their data warehouse.
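To make the backend event concrete, here is a minimal sketch of what such a payload might look like. It is modeled loosely on a track-style event. The event name "Workspace Created", the workspace_id property, and the helper function are hypothetical and do not come from Pachyderm's actual implementation.

```python
from datetime import datetime, timezone


def build_workspace_created_event(user_id: str, workspace_id: str) -> dict:
    """Build a track-style event describing a workspace creation.

    The event name and property keys are illustrative assumptions,
    not Pachyderm's real schema.
    """
    if not user_id:
        raise ValueError("user_id is required")
    return {
        "type": "track",
        "userId": user_id,
        "event": "Workspace Created",  # hypothetical event name
        "properties": {"workspace_id": workspace_id},  # hypothetical property
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


event = build_workspace_created_event("user-123", "ws-456")
print(event["event"], event["properties"])
```

In practice the backend would hand a payload like this to an event-stream SDK, which delivers it to the warehouse. Only the payload is built here so that the sketch stays self-contained.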
The team then uses Sigma to compute the total number of workspaces created and the number created since the last run, and materializes the results in their data warehouse. RudderStack Warehouse Actions then sends this information back to HubSpot, their inbound lead system, where it is synced with Salesforce, their outbound lead system. Once the behavioral data from the application reaches their CRM, they use Outreach.io to drive personalized messaging and email campaigns. In this example, they stop sending drip emails to a user who has created a workspace.
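The qualification logic described here can be sketched in a few lines. The Python below is a stand-in for the aggregation the team performs in Sigma, not their actual workbook or SQL. The function names, the field names, and the rule "stop drip emails once at least one workspace exists" are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime


def summarize_workspaces(events, last_run):
    """Aggregate (user_id, created_at) pairs into per-user counts.

    Returns {user_id: {"total_workspaces": n, "new_since_last_run": m}}.
    """
    summary = defaultdict(
        lambda: {"total_workspaces": 0, "new_since_last_run": 0}
    )
    for user_id, created_at in events:
        row = summary[user_id]
        row["total_workspaces"] += 1
        if created_at > last_run:
            row["new_since_last_run"] += 1
    return dict(summary)


def should_send_drip_email(user_id, summary):
    """Keep sending drip emails only while the user has no workspace."""
    return summary.get(user_id, {}).get("total_workspaces", 0) == 0


# Hypothetical sample data: two users, one run boundary.
events = [
    ("u1", datetime(2024, 1, 1)),
    ("u1", datetime(2024, 1, 10)),
    ("u2", datetime(2024, 1, 3)),
]
summary = summarize_workspaces(events, last_run=datetime(2024, 1, 5))
print(summary)
print(should_send_drip_email("u3", summary))
```

Each row of the summary corresponds to the kind of per-user record that would be materialized in the warehouse and then synced to the CRM, where the drip campaign can be switched off for users who already have a workspace.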
Pachyderm Data Stack
Sources: JS SDK
Destinations: Google Analytics
Warehouse: GCP BigQuery