You are viewing documentation for an older version.

Profiles 0.10.x Changelog

Changelog for Profiles v0.10.x.

Version 0.10.6

19 January 2024

An internal fix to address issues that arose from a recent update by Snowflake.

Version 0.10.5

9 January 2024

Some internal fixes to make py-native models more robust.

Version 0.10.4

15 December 2023

This release adds several new features, along with improvements and bug fixes.

What’s New

  • Vars as models: Earlier, vars could only be defined inside a feature table, under its vars: section. Now, vars are defined independently of feature tables. The model specs file has a new top-level key called var_groups, in which you can create multiple groups of vars that can then be used in various models (e.g. in a feature table). All vars in a var group must have the same entity, so if you have 2 entities, you need at least 2 var groups. You can, however, create multiple var groups for each entity, for example churn_vars, revenue_vars, and engagement_vars, which makes the vars easier to navigate and maintain. Each var group has a name, an entity_key, and vars (a list of objects). This is in line with the Profiles design philosophy of treating everything as a model.
  • User-defined model types via Python [Experimental feature]: Ever wondered what it would take to implement a new model type yourself? Custom model types can now be implemented in Python. Check out this library for some officially supported model types with their Python source. Note that this is an experimental feature, so the implementation and interfaces can change significantly in upcoming versions. To use a Python package model in your project, list it under python_requirements in pb_project.yaml, much like an entry in requirements.txt. The BuildSpec structure is defined using JSON schema within the Python package. The following snippet shows how the package requirement and the model configuration, such as train_config, can be specified in the project:
    entities:
      - name: user
    python_requirements:
      - profiles_rudderstack_pysql==0.2.0 # registers the py_sql_model model type
    models:
      - name: test_py_native_model
        model_type: py_sql_model
        model_spec:
          occurred_at_col: insert_ts
          train_config:
            prop1: "prop1"
            prop2: "prop2"
  • Default ID stitcher: Until now, when a new project was created using pb init pb-project, the file profiles.yaml had specifications for creating a custom ID stitcher. This approach has a few limitations when edge sources span multiple packages. We also observed that several of our users made few changes to the ID stitcher other than making it incremental. As a solution, a “default ID stitcher” is now created for all projects. It runs on all the input sources and ID types defined. For quickstart purposes, users need not make any changes to the project to get the ID stitcher working. If changes are needed, a user can create a custom ID stitcher, as in earlier versions.

  • Default ID types: Common concepts like ID types can now be loaded from packages, so they need not be defined in every new project. We have moved the common ID type definitions into a library project called profiles-corelib. As a result, when you create a new project, the key id_types is not created by default. If you wish to use a custom list of ID types that differs from the default one, you can define it as in earlier versions.

  • Override packages: Continuing from the previous point, packages now have an overrides materialization spec. If you wish to add custom ID types to the default list or modify an existing one, you can extend the package to include your specifications. For the corresponding id_type, add the key extends: followed by the name of the same or a different id_type that you wish to extend, along with the corresponding filters with include/exclude values. An example follows:

    packages:
      - name: foo-bar
        url: "https://github.com/rudderlabs/package-555"
    id_types:
      # Extend the id_type of the same name, excluding one value
      - name: user_id
        extends: user_id
        filters:
          - type: exclude
            value: 123456
      # Define an id_type that extends a different one
      - name: customer_id
        extends: user_id
        filters:
          - type: include
            regex: sample
  • entity_var tags: You can now define a list of tags in the project file under the tags: key. Then, you can add a tag to each entity_var.
  • Redshift: We have added support for the RA3 node type. Users on RA3 clusters can now cross-reference objects in another database/schema.
  • Schema version in the project file has been updated from 44 to 49.
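
As an illustration of the var_groups key described under "Vars as models" above, a model specs file might look like the sketch below. Only var_groups, name, entity_key, vars, and entity_var come from these notes. The group contents, var names, input names, and the select/from fields inside each entity_var are assumptions for illustration.

    var_groups:
      # Every var in a group shares the same entity.
      - name: churn_vars
        entity_key: user
        vars:
          - entity_var:
              name: last_seen            # hypothetical var
              select: max(timestamp)     # assumed field
              from: inputs/rsTracks      # hypothetical input
      # A second group for the same entity keeps related vars together.
      - name: revenue_vars
        entity_key: user
        vars:
          - entity_var:
              name: total_revenue        # hypothetical var
              select: sum(amount)        # assumed field
              from: inputs/orders        # hypothetical input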

Improvements

  • Generated IDs are now more stable. They are unlikely to change when ID clusters merge, which creates a more accurate profile of your users.
  • By default, every entity_var is a feature, unless specified otherwise using is_feature: false. So now, you need not explicitly add them to the features: list.
  • You can now add escape characters to an entity_var’s description.
  • Several internal refactorings to improve overall working of the application.
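
To illustrate the is_feature default described above, the sketch below marks one entity_var as a non-feature. The group name, var names, input name, and the select/from fields are hypothetical, and placing is_feature directly inside entity_var is an assumption rather than something this changelog states.

    var_groups:
      - name: engagement_vars
        entity_key: user
        vars:
          - entity_var:
              name: first_seen           # a feature by default
              select: min(timestamp)     # assumed field
              from: inputs/rsTracks      # hypothetical input
          - entity_var:
              name: event_count          # hypothetical helper var
              select: count(*)
              from: inputs/rsTracks
              is_feature: false          # opt out of being a feature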

Bug Fixes

  • An entity_var having a description with special characters was failing during project re-runs. This has now been resolved.
  • We have fixed the bug where two entity_vars across different entities in the same project couldn’t have the same name.
  • Fixed some bugs related to vars as models, auto migration of projects, and ID lookup.

Known Issues

  • Redshift: If two different users create material objects on the same schema, our tool throws an error when trying to drop views created by the other user, such as user_var_table.
  • Some commands such as insert do not work on Redshift and Databricks.
  • For a few clusters, cross DB references can fail on Redshift.
  • If you refer to a public package in the project and get an ssh: handshake failed error, you will have to manually clear the WhtGitCache folder to make it work.
  • The code for validity_time is redundant and should be removed.
  • You may sometimes have to install both pip packages separately (profiles-rudderstack and profiles-rudderstack-bin).
  • You may have to execute the compile command once, before executing validate access. Otherwise, you can get a seq_no error.