IDs

Learn about IDs and ID types in Profiles.

This guide introduces you to the concept of IDs in Profiles and shows how to define id_types and ids in your Profiles project.

Overview

Creating a Customer 360 (C360) is primarily a matter of combining data from different sources, or inputs. ID fields define how the tables from those inputs are connected.

You can create the features of a C360 by chaining these IDs from different inputs together. Note that IDs are not always consistent between inputs in name or format.

Profiles has two concepts related to IDs:

  • id: The field within an input that identifies a given entity, for example a user name, an email, or an anonymous ID.
  • id_type: The category an ID belongs to. Inputs whose ID fields share the same id_type can be joined on those IDs.

SQL Keyword: JOIN ON

Requirements

  • An ID must uniquely identify a member of an entity.
  • An ID must exist in more than one input, so that the inputs can be joined on it.

Usage

You can define id_types in the pb_project.yaml file, located at the top level of every Profiles project, as shown:

name: llm_sdr_email_content_generation
schema_version: 84
connection: llm-recommend-dev
model_folders:
  - models
entities:
  - name: user
    id_stitcher: models/user_id_stitcher
############### ID Types #################
    id_types:
      - user_id
      - anonymous_id
##########################################
    feature_views:                # Optional
      using_ids:
        - id: anonymous_id
          name: anonymous_id_360
          
############## ID Type Definitions #######
id_types:
  - name: user_id
  - name: anonymous_id
    filters:                      # Optional
      - type: exclude
        value: ""
##########################################
python_requirements:              # Optional
  - profiles_mlcorelib==0.4.1

You can then define ids with these ID types in the inputs.yaml file. Each ID in an input is labeled with one of the id_types defined in pb_project.yaml.

inputs:
  - name: rsIdentifies
    app_defaults:
      table: rudder_autotrack_data.autotrack.identifies
      occurred_at_col: timestamp
########## Map ID to ID Type #############
      ids:
        - select: "user_id"
          type: user_id
          entity: user
        - select: "anonymous_id"
          type: anonymous_id
          entity: user
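        - select: "lower(email)"   # lowercase the value so emails match across inputs
          type: email              # email must also be declared under id_types in pb_project.yaml
          entity: user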
##########################################
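
ID columns often have different names in different inputs. As a sketch of how a second input could map a differently named column to the same id_type, consider the following. The input name rsOrders, its table, and the created_at and customer_id columns are hypothetical and serve only as illustration:

```yaml
inputs:
  - name: rsOrders                          # hypothetical input
    app_defaults:
      table: shop_data.public.orders        # hypothetical table
      occurred_at_col: created_at           # hypothetical timestamp column
      ids:
        # The column is called customer_id here, but it is assumed to hold
        # the same identifier as user_id in rsIdentifies, so it gets the
        # same id_type.
        - select: "customer_id"
          type: user_id
          entity: user
```

Because both inputs label an ID with the user_id type, they can be joined on it even though the column names differ.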

Best practices

  • You can define id_types for the id_graph or to connect additional tables for features.
  • Only include id_types that are relevant to your entities and the features you plan to create. For example, if you are looking at customer support tickets, you do not need to include IDs related to your Customer Success systems.
  • If you use email address as an id_type, remove test and internal domains.
  • Ensure the id_types you choose are unique. For example, first_name, last_name, or cat(first_name, last_name) cannot reasonably be expected to be unique across all users, which makes them unsuitable identifiers for a user entity. However, user_id or email could be suitable, depending on your product.
  • When picking id_types, consider the granularity of the entity. At the user grain, pick unique id_types of the same grain. For higher-level grains such as organization or account, you can include user-level id_types as well as org-level id_types, as long as each user belongs to only one org or account.
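
As an illustration of the email guidance above, the following sketch declares an email id_type with filters in pb_project.yaml. It assumes that filters accept a regex key in addition to the value key shown earlier, and the domain names are placeholders. Check both against your Profiles version and your own data:

```yaml
id_types:
  - name: email
    filters:
      # Assumption: a regex key is supported alongside value.
      # Keep only values that look like email addresses.
      - type: include
        regex: '.+@.+'
      # Drop addresses from internal and test domains (placeholder domains).
      - type: exclude
        regex: '.+@(internal\.example\.com|test\.example\.com)'
```

Single quotes keep the backslashes in the pattern literal, since YAML applies no escape processing inside single-quoted strings.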

