Data Graph YAML Reference

YAML schema reference for defining a Data Graph with the Rudder CLI — entities, events, and relationships.
Available Plans
  • growth
  • enterprise

This reference documents the YAML schema for defining a Data Graph with the Rudder CLI. Use it alongside the CLI to author, version-control, and sync data graph definitions as code.

File structure

A data graph YAML file has the following top-level structure:

version: "rudder/v1"
kind: "data-graph"
metadata:
  name: "ecommerce-data-graph"
spec:
  id: "ecommerce-data-graph"
  account_id: "<warehouse-account-id>"
  models:
    - ...

Top-level fields

| Field | Required | Type | Description |
| --- | --- | --- | --- |
| version | Required | String | Schema version. Use rudder/v1. |
| kind | Required | String | Resource kind. Must be data-graph. |
| metadata.name | Required | String | Human-readable name for the data graph. |
| spec.id | Required | String | Unique ID for the data graph. Used as its stable identifier across syncs. |
| spec.account_id | Required | String | The ID of the warehouse account the data graph reads from. |
| spec.models | Required | List | List of entity and event models that make up the data graph. See Models for more information. |

Models

The spec.models list contains all the entities and events the data graph exposes to the Audience Builder. Each model points at a warehouse table and optionally declares relationships to other models.

Model fields

| Field | Required | Type | Description |
| --- | --- | --- | --- |
| id | Required | String | Unique ID for the model within this data graph. Used as the target of relationships (see Relationships). |
| display_name | Required | String | Name shown in the Audience Builder UI (for example, Customers, Sales). |
| type | Required | String | Either entity (dimension-style table) or event (timestamped fact table). |
| table | Required | String | Fully qualified warehouse table name, for example, ECOMMERCE_DB.E_MART.DIM_CUSTOMERS. |
| description | Optional | String | Human-readable description of the model. Shown as a tooltip in the builder. |
| primary_id | Conditional | String | Column that uniquely identifies a row in the table. Required for entities, optional for events. |
| timestamp | Conditional | String | Column holding the event timestamp. Required when type: event, optional for entities. Used for time-window filtering in the Audience Builder. |
| relationships | Optional | List | List of relationships this model has to other models. See Relationships for more information. |
| columns | Optional | List | Per-column overrides that give warehouse columns a marketer-friendly alias (display_name) and optional description, surfaced in the Audience Builder. See Column metadata for more information. |

Entity vs. event

  • Entity: A dimension-like table representing a business object (Customers, Products, Stores). Use type: entity and set primary_id.
  • Event: A fact-like table where each row represents something that happened at a point in time (Sales, Customer Interactions, Loyalty Points). Use type: event and set timestamp. Events can be filtered with a time window in the Audience Builder.
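
As a minimal sketch of the distinction above, the following fragment declares one entity and one event. The model IDs, table names, and column names (DIM_STORES, STORE_KEY, FACT_CUSTOMER_INTERACTIONS, INTERACTION_TS) are hypothetical placeholders chosen for illustration and do not come from a real schema.

```yaml
models:
  # Entity: one row per store, identified by primary_id.
  - id: "stores"
    display_name: "Stores"
    type: "entity"
    table: "ECOMMERCE_DB.E_MART.DIM_STORES"                  # hypothetical table
    primary_id: "STORE_KEY"                                  # hypothetical column

  # Event: one row per interaction, positioned in time by timestamp.
  - id: "interactions"
    display_name: "Customer Interactions"
    type: "event"
    table: "ECOMMERCE_DB.E_MART.FACT_CUSTOMER_INTERACTIONS"  # hypothetical table
    timestamp: "INTERACTION_TS"                              # hypothetical column
```

The entity sets primary_id and omits timestamp, while the event sets timestamp and omits primary_id, which this reference lists as optional for events.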

Relationships

Relationships connect two models so marketers can filter one model using conditions on related records (for example, “customers with 3 or more orders”). Relationships are declared on the source model under its relationships list.

Relationship fields

| Field | Required | Type | Description |
| --- | --- | --- | --- |
| id | Required | String | Unique ID for the relationship within the source model. |
| display_name | Required | String | Name shown in the Audience Builder UI (for example, Has Sales, Belongs To Account). |
| cardinality | Required | String | One of one-to-many, many-to-one, or one-to-one. See Current limitations. |
| target | Required | String | Reference to the target model in the form #data-graph-model:<model-id>. |
| source_join_key | Required | String | Column on the source model used in the join. |
| target_join_key | Required | String | Column on the target model used in the join. |

Target reference format

Relationship targets use the #data-graph-model:<model-id> reference format, where <model-id> is the id of another model in the same data graph. For example:

target: "#data-graph-model:sales"
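
To show where the target reference sits in context, here is a sketch of a source model that declares a one-to-many relationship to an event model. The stores model, its table, and the STORE_KEY join column are hypothetical and assumed to exist on both tables; only the field names and the reference format come from this reference.

```yaml
models:
  - id: "stores"
    display_name: "Stores"
    type: "entity"
    table: "ECOMMERCE_DB.E_MART.DIM_STORES"   # hypothetical table
    primary_id: "STORE_KEY"                   # hypothetical column
    relationships:
      - id: "store-has-sales"
        display_name: "Has Sales"
        cardinality: "one-to-many"            # one store row joins to many sales rows
        target: "#data-graph-model:sales"     # resolves to the model whose id is "sales"
        source_join_key: "STORE_KEY"          # column on the source table (DIM_STORES)
        target_join_key: "STORE_KEY"          # column on the target table (assumed to exist)

  # The model that the target reference points at.
  - id: "sales"
    display_name: "Sales"
    type: "event"
    table: "ECOMMERCE_DB.E_MART.FACT_SALES"
    timestamp: "CREATED_AT"
```

The part after the colon in target must match the id of a model in the same file, not its display_name or table name.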

Column metadata

By default, the Audience Builder shows the raw warehouse column names (for example, EMAIL_ADDRESS or CREATED_TS). Use the optional columns block on a model to give specific columns a marketer-friendly alias (display_name) and an optional description. Both surface when building audiences and expressions, making the underlying warehouse columns easier to read and choose.

The columns block is sparse — list only the columns you want to override. Columns you don’t list keep their raw warehouse names.

models:
  - id: "customers"
    display_name: "Customers"
    type: "entity"
    table: "ECOMMERCE_DB.E_MART.DIM_CUSTOMERS"
    primary_id: "CUSTOMER_KEY"
    columns:
      - name: "EMAIL_ADDRESS"                         # Warehouse column name (must match the table).
        display_name: "Email"                         # Friendly name shown in the Audience Builder.
        description: "Primary contact email"
      - name: "CUSTOMER_KEY"
        display_name: "Customer ID"                   # Alias only — no description.
      - name: "LOYALTY_NOTES"
        description: "Free-form loyalty notes"        # Description only — no alias.

Column fields

| Field | Required | Type | Description |
| --- | --- | --- | --- |
| name | Required | String | Warehouse column name. Must match a column in the model’s table. |
| display_name | Conditional | String | Friendly name shown in the Audience Builder instead of the raw column name. Required unless description is set. Maximum 255 characters. Must be unique within the model, compared case-insensitively. |
| description | Conditional | String | Human-readable note shown alongside the column in the Audience Builder. Required unless display_name is set. Maximum 255 characters. |

Note that:

  • Each columns entry must set at least one of display_name or description.
  • To clear one field while keeping the other, omit it from the entry.
  • To remove all metadata for a column, drop its entry — the next apply clears it, since the columns block is the source of truth.
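
The notes above can be sketched as an edited version of the earlier customers example. The comments restate the behavior described in this section and are not independently verified against the CLI.

```yaml
models:
  - id: "customers"
    display_name: "Customers"
    type: "entity"
    table: "ECOMMERCE_DB.E_MART.DIM_CUSTOMERS"
    primary_id: "CUSTOMER_KEY"
    columns:
      # description omitted from this entry: after the next apply,
      # only the alias should remain for EMAIL_ADDRESS.
      - name: "EMAIL_ADDRESS"
        display_name: "Email"
      # Unchanged entry: alias only.
      - name: "CUSTOMER_KEY"
        display_name: "Customer ID"
      # The LOYALTY_NOTES entry has been dropped entirely, so the next apply
      # should clear its metadata and show the raw column name again.
```

An entry that sets neither display_name nor description is not valid, so a column with nothing left to override is removed from the list rather than left empty.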

Complete example

The following example defines a small e-commerce data graph with two entities (Customers, Accounts), one event (Sales), and the relationships between them:

version: "rudder/v1"
kind: "data-graph"
metadata:
  name: "ecommerce-data-graph"
spec:
  id: "ecommerce-data-graph"
  account_id: "<warehouse-account-id>" # RudderStack generates this ID when you connect a warehouse to your RudderStack workspace.
  models:
    # --- Customers (entity) ---
    - id: "customers"
      display_name: "Customers"
      type: "entity"
      table: "ECOMMERCE_DB.E_MART.DIM_CUSTOMERS"
      description: "Customers with demographics and loyalty info"
      primary_id: "CUSTOMER_KEY"
      columns:
        - name: "EMAIL_ADDRESS"
          display_name: "Email"
          description: "Primary contact email"
        - name: "LOYALTY_TIER"
          display_name: "Loyalty Tier"
      relationships:
        - id: "customer-has-sales"
          display_name: "Has Sales"
          cardinality: "one-to-many"
          target: "#data-graph-model:sales"
          source_join_key: "CUSTOMER_KEY"
          target_join_key: "CUSTOMER_KEY"
        - id: "customer-belongs-to-account"
          display_name: "Belongs To Account"
          cardinality: "many-to-one"
          target: "#data-graph-model:accounts"
          source_join_key: "ACCOUNT_KEY"
          target_join_key: "ACCOUNT_KEY"

    # --- Accounts (entity) ---
    - id: "accounts"
      display_name: "Accounts"
      type: "entity"
      table: "ECOMMERCE_DB.E_MART.DIM_ACCOUNTS"
      description: "Customer account records for individual, household, and corporate grouping"
      primary_id: "ACCOUNT_KEY"

    # --- Sales (event) ---
    - id: "sales"
      display_name: "Sales"
      type: "event"
      table: "ECOMMERCE_DB.E_MART.FACT_SALES"
      description: "Sales transactions with amounts, status, and store/channel links"
      timestamp: "CREATED_AT"

Validate the data graph

Validate your data graph YAML before syncing it to your workspace:

rudder-cli validate -l data-graph.yaml

This command returns validation errors and warnings if the YAML is invalid.

Validation rules

| Rule | Type | Rule ID |
| --- | --- | --- |
| Relationship cardinality must be valid for the source and target model types | Semantic error | datagraph/data-graph/relationship-cardinality-valid |
| Relationship target references must resolve to existing models | Semantic error | datagraph/data-graph/relationship-refs-valid |
| At most one relationship is allowed per source-target model pair | Semantic error | datagraph/data-graph/relationship-unique-pair |
| Data Graph spec syntax must be valid | Syntactic error | datagraph/data-graph/spec-syntax-valid |
| Model and relationship names must be unique within a Data Graph | Semantic error | datagraph/data-graph/unique-names-valid |
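
As an illustration of two of the rules listed in this section, the fragment below is deliberately invalid. Going by the rule descriptions alone, it would be expected to trip relationship-unique-pair and relationship-refs-valid. This is a sketch, not captured CLI behavior: the customer-has-refunds and customer-has-orders relationships and the REFUND_CUSTOMER_KEY and ORDER_CUSTOMER_KEY columns are hypothetical, and the exact error messages are not documented here.

```yaml
models:
  - id: "customers"
    display_name: "Customers"
    type: "entity"
    table: "ECOMMERCE_DB.E_MART.DIM_CUSTOMERS"
    primary_id: "CUSTOMER_KEY"
    relationships:
      - id: "customer-has-sales"
        display_name: "Has Sales"
        cardinality: "one-to-many"
        target: "#data-graph-model:sales"
        source_join_key: "CUSTOMER_KEY"
        target_join_key: "CUSTOMER_KEY"
      # Second relationship for the same customers -> sales pair:
      # expected to violate relationship-unique-pair.
      - id: "customer-has-refunds"
        display_name: "Has Refunds"
        cardinality: "one-to-many"
        target: "#data-graph-model:sales"
        source_join_key: "CUSTOMER_KEY"
        target_join_key: "REFUND_CUSTOMER_KEY"   # hypothetical column
      # No model with id "orders" is defined in this file:
      # expected to violate relationship-refs-valid.
      - id: "customer-has-orders"
        display_name: "Has Orders"
        cardinality: "one-to-many"
        target: "#data-graph-model:orders"
        source_join_key: "CUSTOMER_KEY"
        target_join_key: "ORDER_CUSTOMER_KEY"    # hypothetical column

  - id: "sales"
    display_name: "Sales"
    type: "event"
    table: "ECOMMERCE_DB.E_MART.FACT_SALES"
    timestamp: "CREATED_AT"
```

Removing the second and third relationships, or adding an orders model and pointing the duplicate at a different target, should bring the fragment back in line with both rules.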

Sync to your workspace

Once validation passes, sync the data graph to your workspace:

rudder-cli apply -l data-graph.yaml
