Profile Builder CLI

Create a Profiles project using the Profile Builder (PB) tool.

Profile Builder (PB) is a command-line interface (CLI) tool that simplifies data transformation within your warehouse. It generates customer profiles by stitching data together from multiple sources.

This guide lists the detailed steps to install and use the Profile Builder (PB) tool to create, configure, and run a new project.

Prerequisites

You must have:

  • Python 3 installed on your machine.
  • Admin privileges on your machine.

Steps

To set up a project using the PB tool, follow these steps:

1: Install PB

info

RudderStack recommends using a Python virtual environment to maintain an isolated and clean environment.

pipx install profiles-rudderstack

After installation, verify the Profile Builder version:

pb version

See also: Setup and installation FAQ

2: Create warehouse connection

warning
RudderStack supports Snowflake, Redshift, BigQuery, and Databricks warehouses for Profiles. You must grant the warehouse permissions that let RudderStack read from the schema containing the source tables (for example, the tracks and identifies tables generated by Event Stream sources) and write data to a new schema created for Profiles.

Create a warehouse connection to allow PB to access your data:

pb init connection

Then, follow the prompts to enter details about your warehouse connection.

This creates a site configuration file inside your home directory: ~/.pb/siteconfig.yaml. If you don’t see the file, enable the View hidden files option.
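
If the file is hard to spot in a file browser, checking from the terminal may be quicker. The sketch below assumes a POSIX shell and the default path given above; check_siteconfig is a hypothetical helper name, not part of PB.

```shell
# Hypothetical helper (not part of PB): report whether the site
# configuration file exists at the default path named in this guide.
check_siteconfig() {
  path="${1:-$HOME/.pb/siteconfig.yaml}"
  if [ -f "$path" ]; then
    echo "found: $path"
  else
    echo "missing: $path (run 'pb init connection' to create it)" >&2
    return 1
  fi
}
```

Calling check_siteconfig with no argument checks the default location; pass a path to check a different file.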

3: Create project

Run the following command to create a sample project:

pb init pb-project -o MyProfilesProject

The above command creates a new project in the MyProfilesProject folder with the following structure:

[Image: Project structure]

See Project structure for more information on the PB project files.

Navigate to the pb_project.yaml file and set the value of connection: to the connection name as defined in the previous step.
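
To confirm the edit, you can print the connection entry from the terminal. This is a minimal sketch that assumes connection: is a top-level key in pb_project.yaml; show_connection is a hypothetical helper name.

```shell
# Hypothetical helper: print the top-level `connection:` line of a
# PB project file (defaults to pb_project.yaml in the current folder).
show_connection() {
  grep -E '^connection:' "${1:-pb_project.yaml}"
}
```

The printed name should match the connection name you entered in the previous step.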

4: Change input sources

  • Navigate to your project and open the models/inputs.yaml file. Here, you will see a list of tables/views along with their respective ID types.
  • Replace the placeholder table names with the actual table names in the table field.

See Project structure for more information on setting these values.
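
Before moving on, it can help to list the table fields and check that no placeholder names remain. The sketch below assumes you run it from the project folder; list_input_tables is a hypothetical helper name.

```shell
# Hypothetical helper: list every `table:` field in the inputs file with
# its line number, so remaining placeholder names are easy to locate.
list_input_tables() {
  grep -n 'table:' "${1:-models/inputs.yaml}"
}
```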

5: Validate project

Navigate to your project and validate your warehouse connection and input sources:

pb validate access

If there are no errors, proceed to the next step. In case of errors, check if your warehouse schemas and tables have the required permissions.

warning
Currently, this command is not supported for BigQuery warehouses.

6: Generate SQL files

Compile the project:

pb compile

This generates SQL files in the output/ folder that you can run directly on the warehouse.

7: Generate output tables

Run the project and generate material tables:

pb run

This command generates and runs the SQL files in the warehouse, creating the material tables.
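
Steps 5 through 7 can be chained so that a failure stops the sequence early. The following is a sketch, not an official workflow: run_pb_pipeline and the PB_BIN override are hypothetical names introduced here, while the three pb subcommands are the ones used in this guide.

```shell
# Hypothetical wrapper: validate, compile, and run a PB project in order,
# stopping at the first command that fails.
# PB_BIN is an assumed override for the executable name (defaults to pb).
run_pb_pipeline() {
  project_dir="${1:-.}"
  pb_bin="${PB_BIN:-pb}"
  (
    cd "$project_dir" || exit 1
    "$pb_bin" validate access &&
      "$pb_bin" compile &&
      "$pb_bin" run
  )
}
```

For example, run_pb_pipeline MyProfilesProject runs all three commands inside that folder. On BigQuery, where pb validate access is not supported, the first command would need to be dropped.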

8: View generated tables

info
The view user_default_id_stitcher always points to the latest generated ID stitcher, and the view user_profile always points to the latest feature table.

You can run the pb show models command to get the exact name and path of the generated ID stitcher/feature table. See show command for more information.

Then, run the following query to view the generated tables in the warehouse:

select * from <table_name> limit 10;

Here’s what the columns imply:

Migrate your existing project

To migrate an existing PB project to the schema version supported by your PB binary, navigate to your project’s folder. Then, run the following command, which migrates the project in place and replaces the contents of the existing folder with the migrated version:

pb migrate auto --inplace

A confirmation message appears on screen indicating that the migration is complete. For example, a user migrating their project from version 25 to version 44 sees the following message:

2023-10-17T17:48:33.104+0530	INFO	migrate/migrate.go:161	
Project migrated from version 25 to version 44
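
Because the in-place migration replaces the contents of the existing folder, you may want a copy of the project first. The sketch below is one way to do that under stated assumptions: migrate_with_backup, the .backup suffix, and the PB_BIN override are hypothetical and not part of PB.

```shell
# Hypothetical helper: copy the project folder, then migrate it in place.
# Pass the folder by name (for example, MyProfilesProject), not ".".
migrate_with_backup() {
  project_dir="${1:-}"
  project_dir="${project_dir%/}"
  backup_dir="$project_dir.backup"
  if [ ! -d "$project_dir" ] || [ "$project_dir" = "." ]; then
    echo "usage: migrate_with_backup <project-folder>" >&2
    return 1
  fi
  if [ -e "$backup_dir" ]; then
    echo "backup already exists: $backup_dir" >&2
    return 1
  fi
  cp -R "$project_dir" "$backup_dir" || return 1
  # PB_BIN is an assumed override for the executable name (defaults to pb).
  ( cd "$project_dir" && "${PB_BIN:-pb}" migrate auto --inplace )
}
```

The helper refuses to overwrite an existing backup, so a second call on the same folder exits with an error instead of discarding the earlier copy.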

