
Profiles 0.18.x Changelog

Changelog for Profiles v0.18.x.

Version 0.18

26 September 2024
Schema version: 80

What’s New

  • The cohort model now lets you filter using a filter_expression followed by an AND/OR list of expressions, for example:
models:
   - name: high_value_us_residents
     model_type: cohort
     model_spec:
       ...
       filter_expression:
         AND:
           - "{{ user.Var('country') }} = 'US'"
           - "{{ user.Var('salary') }} > 10000"
  • You can now define a retention_period for each model in a project. The pb cleanup materials --expired command then cleans up materials that are older than the defined retention period.
  • Referring to other entity_vars/input_vars is now simpler. You can use {{entityName.entity_varName}} instead of the earlier {{entityName.Var("entity_varName")}}. The earlier syntax still works.
  • You can use features of an SQL model while using a cohort. To do so, specify the entity_key or entity_cohort in the model_spec of an SQL model.
  • pb cleanup materials --concurrency: A new flag that enables concurrent cleanup by setting the number of concurrent workers. The default value is 1.
  • The default offset value for the pb run command is now 0. It was previously 30 minutes.
  • A new --end_time_offset flag is added to the compile/run commands. It applies an offset, given in a human-readable format, to the end timestamp, so that RudderStack ignores any data loaded into the warehouse after the offset end time for that run. For example, pb run --end_time_offset=45m ensures that RudderStack does not use any data loaded in the 45 minutes before the run’s start time. Note that you can’t use this flag with the seq_no or end_time flags.
  • You can now import packages using SSH URLs, for example, ssh://git@host:port/path.git.
  • You can run or import projects hosted on S3 as packages by adding block_store_creds to your site configuration file. To run the project, execute the pb run -p s3://<url> command.
  • Running a project with the --migrate_on_load flag now stores generated artifacts in the output subfolder instead of migrations.
  • For an entity_var/input_var, the default key has been renamed to default_value.
  • Simplified the project created using pb init pb-project by removing the dependency on the corelib package, the sample SQL model, the model contracts, and the CSVs in the inputs file.
  • RudderStack now uses INNER JOIN instead of RIGHT JOIN when calculating entity_vars. This results in performance improvement and also prevents some values from getting lost.
  • A feature view model with main_id as an identifier is now created by default.
  • Schema has been updated from 72 to 80.
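
As a rough illustration of two of the changes above (the default_value key and the shorter reference syntax), a pair of entity_vars might look like the sketch below. The group name user_vars, the entity_var names total_orders and has_orders, and the input inputs/orders are hypothetical and do not come from this changelog; treat the fragment as a sketch and check the Profiles YAML reference for the exact keys your version supports.

```yaml
# Sketch only: all names below are hypothetical, chosen for illustration.
var_groups:
  - name: user_vars                 # hypothetical group name
    entity_key: user
    vars:
      - entity_var:
          name: total_orders        # hypothetical entity_var
          select: count(*)
          from: inputs/orders       # hypothetical input
          default_value: 0          # key was named `default` before this release
      - entity_var:
          name: has_orders          # hypothetical entity_var
          # Short reference form; {{user.Var("total_orders")}} still works.
          select: "case when {{user.total_orders}} > 0 then 1 else 0 end"
```

The select expression is quoted so that the leading or embedded {{ ... }} is read as a plain string rather than as a YAML flow mapping.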

Improvements

  • By default, RudderStack ignores all the blank values in the ID stitcher model.
  • The HTML reports generated using the pb show idstitcher-report command have minor visual improvements.
  • Relevant errors are now thrown if you specify an unknown YAML key in the model definition.

Bug Fixes

  • The validity_time key has been removed.
  • The pb validate access command, for Databricks, now checks only for the necessary permissions and not for ALL the privileges.

Known Issues

BigQuery

  • The pb validate access command does not work for BigQuery.

Redshift

  • If two different users create material objects in the same schema, RudderStack gives an error during cleanup when it tries to drop views created by the other user, such as user_var_table.
  • Cross database references can fail on Redshift for a few clusters.
  • While creating Activations, validation for Redshift does not work correctly in the RudderStack dashboard.

Databricks

  • Concurrency does not work for cleanup.

Other issues

  • Linux users might see this warning for all command runs - you can ignore it: WARN[0000]log.go:228 gosnowflake.(*defaultLogger).Warn DBUS_SESSION_BUS_ADDRESS envvar looks to be not set, this can lead to runaway dbus-daemon processes. To avoid this, set envvar DBUS_SESSION_BUS_ADDRESS=$XDG_RUNTIME_DIR/bus (if it exists) or DBUS_SESSION_BUS_ADDRESS=/dev/null.
  • pb insert does not work for Redshift, Databricks, and BigQuery.
  • If you refer to a public package in the project and get an ssh: handshake failed error, you’ll have to manually remove the entire folder from WhtGitCache to make it work.
  • Timegrains is an experimental feature. There might be some undiscovered issues.
