How Pachyderm uses RudderStack to master lead qualification

Pachyderm is a data science platform that combines Data Lineage with End-to-End Pipelines on Kubernetes.
Challenges
- Building an efficient data tracking pipeline
- Getting siloed data into a centralized data store for analysis
- Gaining deeper insights into user product behavior and optimizing UX to increase customer adoption
Results
- Unified real-time user events, product usage data, and data from cloud sources into a centralized data warehouse
- Leveraging enriched, transformed warehouse data for analytics and product optimization
- Routing enriched data back to downstream tools for inbound and outbound marketing and sales
Pachyderm's Data Stack
- Data Collection and Synchronization: RudderStack Event Stream SDKs, Warehouse Actions, & RudderStack Cloud Extract
- Data Warehouse: Google BigQuery
- Data Transformation: dbt
- Business Intelligence: Sigma
- Cloud Toolset for Activation Use-cases: HubSpot, Google Analytics, Facebook Pixel, Intercom, Google Tag Manager, Slack, Salesforce
Pachyderm's data challenges
Pachyderm is a data science platform that lets you build and manage data science pipelines, regardless of their scale and complexity. It lets you track your data lineage and apply version control to your data. You can set up Pachyderm in your development environment or on the cloud, or use Pachyderm Hub, their fully managed SaaS platform.
Pachyderm generates gigabytes of data by tracking their users’ product interactions and collecting data from their cloud sources. Previously, all of this data was siloed in cloud tools like HubSpot, so their data team had to do a lot of plumbing to move it into their other marketing and sales tools. As the company evolved and needed more customer insights to grow, they knew they needed a better approach.
They wanted to get all of their data into a centralized data store which they could leverage for product analytics and more efficient marketing.
"RudderStack has given us better access to our data. Our data was siloed in cloud sources. Now we have it all in a warehouse, making it accessible to everyone."
Single Source of Truth for Customer Data
Pachyderm’s data engineering team uses Sigma - a warehouse-focused BI tool - to aggregate and transform the data collected from various sources to build a single source of truth for all their customers’ information.
They then use RudderStack’s Warehouse Actions feature to route this transformed, enriched customer data to downstream destinations like HubSpot (their inbound lead system).
Advanced, Behavior-driven Lead Qualification
When a user first signs up on Pachyderm, the first course of action suggested is to create a workspace. Pachyderm’s customer team encourages this action with drip emails. Once the user has created a workspace, an event is sent from the application backend to their data warehouse.
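As a rough illustration of the backend event described here, the sketch below builds a track-style payload for a newly created workspace. It uses only the Python standard library and does not call any SDK. The event name `Workspace Created`, the `workspace_id` property, and the function name are assumptions made for this example, not Pachyderm's actual schema.

```python
from datetime import datetime, timezone


def build_workspace_created_event(user_id, workspace_id, created_at=None):
    """Build a track-style event dict for a newly created workspace.

    The event name and property names are hypothetical; a real
    implementation would follow the team's own tracking plan.
    """
    # Default to the current UTC time when no timestamp is supplied.
    created_at = created_at or datetime.now(timezone.utc)
    return {
        "type": "track",
        "event": "Workspace Created",  # assumed event name
        "userId": user_id,
        "properties": {"workspace_id": workspace_id},  # assumed property
        "timestamp": created_at.isoformat(),
    }


if __name__ == "__main__":
    print(build_workspace_created_event("user-123", "ws-456"))
```

In practice, a payload like this would be handed to whichever event-streaming SDK the backend uses, which then delivers it to the warehouse.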
The team then uses Sigma to determine the total number of workspaces created and the number created since the last run, and materializes this data in their data warehouse. This information is then sent back to HubSpot with RudderStack Warehouse Actions. Once in HubSpot (their inbound lead system), this data is synced with Salesforce (their outbound lead system). After the behavioral data from the application has made its way into their CRM, they use Outreach.io to drive their personalized messaging and email campaigns. In this example, they stop sending drip emails to a user who has created a workspace.
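The aggregation and drip-email decision in this flow can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the team's actual Sigma workbook or warehouse model: the field names (`user_id`, `created_at`), the output keys, and the function name are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def summarize_workspaces(user_ids, events, last_run):
    """Summarize workspace creation per user.

    user_ids: every signed-up user (hypothetical input).
    events:   dicts with "user_id" and "created_at" (hypothetical schema).
    last_run: timestamp of the previous aggregation run.
    """
    total = Counter()
    since_last_run = Counter()
    for event in events:
        total[event["user_id"]] += 1
        if event["created_at"] > last_run:
            since_last_run[event["user_id"]] += 1

    return {
        user: {
            "workspaces_total": total[user],
            "workspaces_since_last_run": since_last_run[user],
            # Drip emails continue only until the first workspace exists.
            "keep_drip_emails": total[user] == 0,
        }
        for user in user_ids
    }


if __name__ == "__main__":
    utc = timezone.utc
    events = [
        {"user_id": "a", "created_at": datetime(2021, 1, 1, tzinfo=utc)},
        {"user_id": "a", "created_at": datetime(2021, 1, 3, tzinfo=utc)},
    ]
    print(summarize_workspaces(["a", "b"], events, datetime(2021, 1, 2, tzinfo=utc)))
```

Each per-user record corresponds to a row that could be materialized in the warehouse and synced to the CRM, where the `keep_drip_emails` flag would decide whether the user stays in the drip campaign.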