Quickstart
Create a Profiles project using the Profile Builder (PB) tool.
Prerequisites
- Python 3 installed on your computer.
- Admin privileges on your computer.
1. Install Profile Builder
Create and activate a virtual environment:
python3 -m venv .venv
source .venv/bin/activate
Install the Profile Builder (PB) package:
pip install profiles-rudderstack
Verify the installation was successful:
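pb version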
2. Create warehouse connection
Profiles supports Snowflake, Redshift, BigQuery and Databricks. The user in your connection configuration must have permission to read from schemas containing source data and write to the schema created for Profiles.
Create a warehouse connection:
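pb init connection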
Follow the prompts to enter details about your warehouse connection:
Select a warehouse: s # Enter s for Snowflake
Enter Connection Name: my_warehouse
Enter target: (default:dev): # Press enter, leaving it to default
Enter account: company.us-east-1
Enter warehouse: RUDDER_WAREHOUSE
Enter dbname: RUDDER_EVENTS
Enter schema: RUDDER_PROFILES
Enter user: <YOUR_USERNAME> # Optional for Databricks
Enter password: <YOUR_PASSWORD>
Enter role: <YOUR_ROLE>
Append to /Users/<user_name>/.pb/siteconfig.yaml? [y/N] y
This creates a local site configuration file in your home directory: ~/.pb/siteconfig.yaml. Your Profiles project uses this file to access the warehouse, git credentials, and other details. If you don’t see the file, enable the view hidden files option.
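For reference, the appended connection entry for a Snowflake target looks roughly like this (a sketch based on the prompts above; your pb version may name or order fields slightly differently):
connections:
  my_warehouse:
    target: dev
    outputs:
      dev:
        type: snowflake
        account: company.us-east-1
        warehouse: RUDDER_WAREHOUSE
        dbname: RUDDER_EVENTS
        schema: RUDDER_PROFILES
        user: <YOUR_USERNAME>
        password: <YOUR_PASSWORD>
        role: <YOUR_ROLE>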
3. Create Profiles project
Initialize a new Profiles project:
pb init pb-project -o my-project
This creates a new project in the my-project folder with the initial project structure.
If you would like to create a project in the current folder instead, run:
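pb init pb-project -o . # Assuming the -o output flag accepts a directory path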
Open the pb_project.yaml file at the root of your project and set the value of connection: to the connection name you defined in the previous step.
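For example, with the connection created above:
connection: my_warehouse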
4. Define inputs
Open the inputs.yaml file in the models folder. This file is where you define your inputs.
Each input has a name, a source table (in the format database.schema.table), an occurred_at_col (timestamp column, if available), and a list of entity ids. Each ID includes a select (column name), a type (ID type as defined in pb_project.yaml), and an entity name (defined in pb_project.yaml).
Edit the two sample inputs to use your own data. This may only require changing the table fields to point to your identifies and tracks tables. If your identifies table does not include email addresses, remove that ID. If the email address is stored under a different column name, change the select field accordingly.
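For reference, an edited input might look like this (a sketch using the fields described above; the exact nesting, such as the app_defaults block, can vary across Profiles versions):
inputs:
  - name: rsIdentifies
    app_defaults:
      table: RUDDER_EVENTS.RUDDER_PROFILES.IDENTIFIES
      occurred_at_col: timestamp
      ids:
        - select: user_id
          type: user_id
          entity: user
        - select: email
          type: email
          entity: user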
5. Validate warehouse access
Validate your warehouse connection has the necessary access:
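pb validate access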
If there are no errors, proceed to the next step. If you see errors, grant the necessary warehouse permissions and validate again.
6. Run project
Run the project on your data warehouse:
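pb run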
This compiles your project into SQL files stored in the output folder, then runs that SQL on your warehouse.
7. Add feature
Open the profiles.yaml file in the models folder. There should be a var group with four entity vars, two of which are features (included in feature views).
Add a new entity var / feature that gets the event count for each user:
- entity_var:
    name: event_count
    select: COUNT(*)
    from: inputs/rsTracks
    description: No. of events tracked for customer
Profiles automatically groups data by customer, so you only need to specify the aggregate function.
If you run the project again, you will see this event_count feature in your feature views.
8. Add cohort
The feature views in your warehouse contain features for all customers. A cohort can be used to create feature views for a subset of customers.
In the profiles.yaml file, add a cohort for power users:
models:
  - name: power_users
    model_type: entity_cohort
    model_spec:
      extends: user/all
      materialization:
        output_type: view
      filter_pipeline:
        - type: include
          value: "{{ user.event_count }} > 45"
This power_users cohort uses the event_count entity var defined above, including only customers with more than 45 events tracked.
Feature views for this cohort will be automatically generated during the next project run.
Questions? Contact us by email or on Slack.