Timegrains
Learn about Timegrains in Profiles.
This guide introduces you to the concept of Timegrains in Profiles and shows how to define them in your Profiles project.
Overview
Timegrains let you control how often certain models run, irrespective of how often you trigger project runs.
You can define the following timegrains in your model’s model_spec file: tick, tenminutes, hour, day, week, month, year
For example, suppose you have defined a feature that does not need the most up-to-date data, such as a feature computed for each user once a month. You can then set time_grain: month and run the project. Profiles Builder ensures that this feature is computed only once a month. The values computed at the month boundary are used in all subsequent feature_views. Any user newly identified after the start of the month will have the value of that feature set to null.
Note that:
- You will get the same output if you specify a model’s time_grain to be day and you run this model multiple times a day.
- If a model’s time_grain is specified to be day, it will only consider input data until that day’s boundary (00:00 UTC). Any data ingested after this timestamp is not considered in the computation.
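The boundary behavior in the notes above can be illustrated with a short Python sketch. It is not Profiles Builder's implementation: the function name grain_boundary is hypothetical, and two details are assumptions the text does not confirm, namely that a week starts on Monday and that tick leaves the timestamp unchanged. Only the 00:00 UTC day boundary comes from the text.

```python
from datetime import datetime, timedelta, timezone


def grain_boundary(ts: datetime, grain: str) -> datetime:
    """Truncate a timezone-aware timestamp to the start of its time grain (UTC).

    Hypothetical helper for illustration; not part of Profiles Builder.
    """
    ts = ts.astimezone(timezone.utc)
    if grain == "tick":
        return ts  # assumption: tick applies no truncation
    if grain == "tenminutes":
        return ts.replace(minute=ts.minute - ts.minute % 10, second=0, microsecond=0)
    if grain == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if grain == "day":
        return midnight  # 00:00 UTC, as stated in the note above
    if grain == "week":
        # assumption: weeks start on Monday
        return midnight - timedelta(days=midnight.weekday())
    if grain == "month":
        return midnight.replace(day=1)
    if grain == "year":
        return midnight.replace(month=1, day=1)
    raise ValueError(f"unknown time grain: {grain}")


# Two runs on the same day resolve to the same boundary, so a model with
# time_grain: day sees the same input data in both runs.
morning_run = datetime(2024, 3, 14, 9, 30, tzinfo=timezone.utc)
evening_run = datetime(2024, 3, 14, 21, 45, tzinfo=timezone.utc)
same_boundary = grain_boundary(morning_run, "day") == grain_boundary(evening_run, "day")
```

Because both runs map to the same boundary, data ingested between them falls after the cutoff and does not change the output.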
Usage
You can specify the timegrain in the model_spec of any model. An example is shown below:
models:
  - name: users_with_valid_email
    model_type: entity_cohort
    model_spec:
      extends: user/all
      time_grain: "day"
      filter_expression:
        AND:
          - "NOT {{ user.Var('id_type_email_count') }} = 0"
      feature_views:
        name: users_with_valid_email_feature_view
You can also specify it at the var_group level. For example:
- name: weekly_user_vars
  entity_cohort: models/new_users
  time_grain: "week"
  vars:
    - entity_var:
        name: campaign_sources
        select: "{{list_agg('context_campaign_source', ',')}}"
        from: inputs/rsTracks
Note that if you have specified time_grain on any model, PB requires you to specify a default_time_grain in your pb_project.yaml file. Its value should be equal to or finer than the finest time_grain you have used in any of the models.
For example, if you have specified day, week, and month as your timegrains for various models, you will need to specify default_time_grain: day. You can also set default_time_grain to tick, tenminutes, or hour; however, this is not recommended as it may negatively impact performance.
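The rule for choosing default_time_grain can be expressed as a small check. This is a sketch only: the helper names finest_grain and is_valid_default are hypothetical and not part of PB, and it assumes the grains are ordered from finest to coarsest in the same order they are listed in the Overview.

```python
# Assumed ordering, finest to coarsest, taken from the list in the Overview.
GRAINS = ["tick", "tenminutes", "hour", "day", "week", "month", "year"]


def finest_grain(used):
    """Return the finest time grain among those used by the models."""
    return min(used, key=GRAINS.index)


def is_valid_default(default, used):
    """True if `default` is equal to or finer than every grain in `used`."""
    return GRAINS.index(default) <= GRAINS.index(finest_grain(used))


used_grains = ["day", "week", "month"]
recommended = finest_grain(used_grains)           # the coarsest valid default
hour_allowed = is_valid_default("hour", used_grains)   # valid, but not recommended
week_allowed = is_valid_default("week", used_grains)   # coarser than day, so invalid
```

Picking the finest grain in use, rather than anything finer, keeps the default valid without taking on the performance cost the text warns about.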
As a general rule, you should set default_time_grain to match your run schedule. If you are scheduling PB to run daily, it's best to keep default_time_grain: day. You can then schedule the run at 00:00 UTC so that the latest data is considered in the run.