You are viewing documentation for an older version.

Profiles 0.11.x Changelog

Changelog for Profiles v0.11.x.

Version 0.11.5

16 April 2024

Bug Fixes

  • Fixed an issue in Redshift where the driver version wasn’t populated correctly.
  • Improved cleanup functionality for Redshift by dropping the procedures that were used for creating entity_vars.
  • Resolved an issue where files in the migrated project folder weren’t getting deleted.
  • Fixed a bug where the pb run --rebase_incremental command was taking edges from previous runs.
  • A few internal refactorings in how the data types of columns are returned.

Version 0.11.3

1 April 2024

What’s New

  • Added an optional parameter column_data_type to specify the data type for an entity_var/input_var.
  • Support for programmatic credentials for Redshift.
  • The schema version in the project YAML file has been updated from 53 to 54.
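
As a rough sketch of how column_data_type might appear in a model spec: the fragment below assumes the usual var_groups layout. The group name, entity, var name, select expression, input path, and the integer type value are all hypothetical placeholders, so check the entity_var reference for the data types your warehouse accepts.

    var_groups:
      - name: user_vars                  # hypothetical group name
        entity_key: user                 # hypothetical entity
        vars:
          - entity_var:
              name: total_sessions       # hypothetical var name
              select: count(distinct session_id)   # hypothetical expression
              from: inputs/rsTracks      # hypothetical input model
              column_data_type: integer  # optional; the type value is an assumption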

Improvements

  • Better error propagation in case of concurrency.
  • A few internal refactorings to improve overall operation.

Bug Fixes

  • Resolved the relation still open error when accessing external tables in Redshift.
  • Fixed some bugs when getting the latest seq_no for a material in BigQuery.
  • Resolved row-ID conflicts on very large datasets in BigQuery.
  • The begin and end times of all models are now in the UTC timezone, which fixes a few inconsistencies in models.
  • Resolved a concurrency issue that occurred with two different root models having the same name.

Version 0.11.2

15 March 2024

What’s New

  • You can now run a project with parallel processing using the --concurrency flag. Currently, this is supported only for the Snowflake warehouse. Use this option judiciously, as a very large value can impact your system resources.
  • Redshift users can now access external tables in their data catalog(s).

Improvements

  • Projects created using pb init pb-project now work for all warehouses.

Bug Fixes

  • Fixed issues encountered while running BigQuery projects on Windows.
  • Resolved errors that occurred when an entity var name matched an input column name.
  • Resolved bugs related to inserting seq_no.

Version 0.11.1

7 March 2024

  • Includes bug fixes related to creating vars on ID models and nil model remapping.

Version 0.11.0

1 March 2024

What’s New

  • RudderStack now supports BigQuery (beta), offering the same seamless experience as on other data warehouses.
  • CSV models (Experimental): In the input specs, you can now read data from a CSV file instead of a database table/view. The file can be in local storage or on S3. Under app_defaults, instead of table/view, use csv (local storage) or s3 (S3), followed by the path to the CSV file. Note that this feature is experimental, and RudderStack currently supports S3 on Snowflake and Redshift. Sample code:
    app_defaults:
      csv: "../common.xtra/Temp_tbl_a.csv"

    app_defaults:
      s3: "s3://s3-wht-input-test-bucket/test/Temp_tbl_d.csv"
  • Filter IDs: You can now filter out a large number of IDs using SQL. For example, to exclude all blacklisted IDs listed in an input model named csv_email_blacklist, and user IDs from an SQL model named sql_exclusion_model, you can edit your project file as follows:
    id_types:
      - name: email
        filters:
          - type: exclude
            sql:
              select: email
              from: inputs/csv_email_blacklist
      - name: user_id
        filters:
          - type: exclude
            sql:
              select: user_id
              from: models/sql_exclusion_model
  • Pre and Post Hooks: A pre hook lets you execute SQL before running a model, for example, to change DB access or create a DB object. Likewise, a post hook lets you execute SQL after running a model. The SQL can also be templatized. Here’s an example code snippet:
    models:
      - name: test_id_stitcher
        model_type: id_stitcher
        hooks:
          pre_run: "CREATE OR REPLACE VIEW {{warehouse.ObjRef('V1')}} AS (SELECT * from {{warehouse.ObjRef('Temp_tbl_a')}});"
          post_run: 'CREATE OR REPLACE VIEW {{warehouse.ObjRef("V2")}} AS (SELECT * from {{warehouse.ObjRef("Temp_tbl_a")}});'
        model_spec:
  • pb show models: You can now view the output in JSON format by passing the --json flag.
  • For Databricks, RudderStack now supports the pb validate access command.
  • RudderStack has reverted to having a custom ID stitcher in a new project created using pb init pb-project.
  • When creating a new connection in Redshift, you’ll now be asked to input sslmode. You can enter either disable (default) or require. This lets RudderStack’s tool work with Redshift databases that require SSL mode to be enabled.
  • RudderStack now supports triggering tasks through a URL and reading their status back.
  • RudderStack dashboard now supports Git Projects hosted on BitBucket and GitLab.
  • In model specs, a materialization’s enable_status value is now written in snake_case. That is, enable_status: mustHave becomes enable_status: must_have.
  • Schema version in the project file has been updated from 49 to 53.
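
For an existing project, the last two items could translate into edits along these lines. This is a sketch only: the file names in the comments, the schema_version key name, and the surrounding keys are assumptions about a typical project layout, not taken from this changelog.

    # Project file (assumed to be pb_project.yaml)
    schema_version: 53

    # Model spec (assumed to live under models/), inside a model's model_spec
    materialization:
      enable_status: must_have   # previously mustHave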

Improvements

  • Better error messages are shown in case of incorrect/missing/duplicate entity-vars.
  • Error handling has been improved in a few places in Python models.
  • Model path refs are now case insensitive.
  • The command pb migrate auto can now handle the case where model folders aren’t present.
  • Specific messages are now shown for errors in the material registry.
  • Due to limitations of Databricks SQL, RudderStack has added restrictions on using the hive_metastore catalog. If a user on that catalog tries to use RudderStack’s tool, an error is thrown.

Bug Fixes

  • Resolved an intermittent issue in Redshift that threw the error ptr_to_latest_seqno_cache does not exist.
  • Bugs in the pb show idstitcher-report, pb show user-lookup, and pb cleanup materials commands have been rectified. pb show idstitcher-report is still flaky; the RudderStack team is working on improving it.
  • Fixed a bug in packages where entityvars/inputvars weren’t able to refer to SQL models.
  • Resolved erroneous queries for the validate access command when privileges are missing.
  • Resolved the issue where the Git repo wasn’t getting cloned when cache_dir in siteconfig was written using tilde notation.
  • Fixed some bugs related to begin_time of models.
  • Resolved a few issues when cloning Git projects in the web app.
  • Several fixes in gRPC, making it more stable.
  • The remapping: key is removed (if it exists) from models/inputs.yaml, as it was redundant.
  • Resolved some bugs in the incremental ID stitcher.

Known Issues

  • The pb validate access command does not work for BigQuery.
