Configurable Alerts

Set up notifications for critical data issues.

RudderStack’s smart alerting capabilities let you set up notifications for critical data issues so you can take immediate action before they escalate into major problems.

With RudderStack’s alerting feature, you can:

  • Configure alerts for event delivery failures, pre-sync or sync failures, event volume drops, and tracking plan violations.
  • Set failure thresholds at the workspace and resource level, so you are alerted only when necessary.
  • Set up alert delivery channels of your choice like email, Slack, Microsoft Teams, or a custom webhook. Once the alert threshold is hit, RudderStack automatically delivers alerts to these channels.

Set up alerting

RudderStack lets you set up alerts both at the workspace and resource level.

Workspace level alerts

info
Only Org Admins can set up workspace-level alerts.

Go to Settings > Workspace and click the Alerts tab to set up workspace-level alerts, that is, alerts for sources and destinations across your Event Stream and Reverse ETL pipelines for that workspace.

Alerts option in RudderStack dashboard

Note the following:

  • RudderStack automatically delivers alerts on the configured channels if the failures exceed the threshold percentage within the last one hour.
  • If you set the error threshold to 0%, even a single failure in processing or delivering events will trigger an alert.
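
As an illustration, the threshold check can be thought of as a simple failure-rate comparison over the trailing hour. The following Python sketch is hypothetical: the function and parameter names are not part of any RudderStack API, and the exact calculation RudderStack performs may differ.

```python
def should_alert(failed_events: int, total_events: int, threshold_pct: float) -> bool:
    """Illustrative only: check whether the failure rate over the last hour
    exceeds the configured threshold percentage.

    With threshold_pct = 0, a single failure (failure rate > 0%) already
    triggers an alert, matching the behavior described above.
    """
    if total_events == 0:
        return False  # nothing was processed in the window, so nothing to alert on
    failure_rate = (failed_events / total_events) * 100
    return failure_rate > threshold_pct

# Example: 3 failed events out of 1,000 with a 0% threshold -> the alert fires.
print(should_alert(failed_events=3, total_events=1000, threshold_pct=0))  # True
```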

Event Stream

In this section, you can configure alerts and set thresholds for the following incidents:

| Failure type | Description | Applicable to |
| --- | --- | --- |
| Cloud destination failures | Failures in processing or delivering events to a destination due to incorrect credentials, destination downtime, network error, or any other reason. | Event Stream destinations |
| Warehouse pre-sync failures | Failures in processing or storing the events in object storage before forwarding them to the warehouse destination. | Warehouse destinations |
| Warehouse sync failures | Failures in syncing events to a warehouse destination, that is, syncs to a warehouse destination are aborted. Possible reasons include incorrect warehouse connection credentials, warehouse settings changed or updated midway through the syncs, and source/destination downtime or network errors. | Warehouse destinations |
| Low event volume (Beta) | Event volume drop in the last one hour is more than the configured threshold, as compared to the same period from the last week. See Low event volume alerts for more details. | Event Stream sources |
| Tracking plan violations | Tracking plan violations for a particular source, that is, the incoming source events and properties do not comply with the tracking plan connected to that source. | Event Stream sources |

Low event volume alerts

RudderStack triggers the Low event volume alert if the event volume drop for an Event Stream source in the last one hour is more than the configured threshold compared to the event volume for the same time period in the last week.

RudderStack triggers the Low event volume alert based on the following formula:

Volume drop (%) = ((last_week_window_count - current_window_count) / last_week_window_count) × 100

RudderStack triggers the alert when this volume drop percentage is greater than or equal to the configured threshold.

For example, you will get an alert if:

  • An Event Stream source ingests 450 events within the last hour (for example, 11-12 p.m.), but it ingested 1000 events from 11-12 p.m. a week before.
  • The alert threshold was set to 50%.

In this case, RudderStack triggers an alert as the volume drop percentage (55%) exceeds the configured threshold (50%).

info

Note that:

  • A 0% threshold indicates that RudderStack triggers an alert even if the number of ingested events in the past one hour is the same as last week. That is, last_week_window_count = current_window_count.
  • A 100% threshold indicates that RudderStack triggers an alert if the source ingested no events in the last one hour but some events (>0) exactly a week before. That is, current_window_count = 0 and last_week_window_count > 0.
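
To make the formula and the threshold semantics above concrete, here is a minimal Python sketch. The function name is hypothetical (it is not part of any RudderStack API), and the greater-than-or-equal comparison is inferred from the 0% and 100% threshold notes:

```python
def low_event_volume_alert(current_window_count: int,
                           last_week_window_count: int,
                           threshold_pct: float) -> bool:
    """Illustrative only: fire when the event volume drop in the last hour,
    relative to the same hour a week earlier, meets or exceeds the threshold."""
    if last_week_window_count == 0:
        return False  # no baseline from last week to compare against
    drop_pct = ((last_week_window_count - current_window_count)
                / last_week_window_count) * 100
    return drop_pct >= threshold_pct

# Worked example from above: 450 events this hour vs. 1000 a week ago
# with a 50% threshold -> a 55% drop, so the alert fires.
print(low_event_volume_alert(450, 1000, 50))  # True
```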

Reverse ETL

In this section, you can configure alerts for the following incidents applicable to all your Reverse ETL sources:

| Failure type | Description |
| --- | --- |
| Partial row failures | Failures in syncing records from the warehouse source to the connected destination. |
| Fatal syncs | Fatal errors causing a running sync to be aborted. Possible reasons include incorrect warehouse connection credentials, warehouse settings changed or updated midway through the syncs, and source/destination downtime or network errors. |

Custom alerts

Click the Custom alerts setting below each failure type to view the resources for which custom alert overrides are configured:

Custom alert setting

The resulting sidebar lists all the resources with custom alerts categorized by failure type. You also see the following information:

  • Name: The resource name.
  • Subscribed: Whether alerts are on or off for that failure type.
  • Threshold: Custom alert threshold value set for that resource.

Click on a resource to change these settings.

Custom alerts

Resource level alerts

info

Note the following:

  • Only members with Org Admin or Connections Admin permissions can set up resource-level alerts.
  • Once you set alert overrides for a particular resource, any changes to the workspace-level settings will no longer apply to that resource.
  • You cannot change the alert delivery channels for a particular resource.

Go to the resource (source or destination) for which you want to customize the alert settings. Then, click the Settings tab and scroll down to the alerts section.

Resource-level alert settings

If configured, you will see the workspace-level alert settings and thresholds enabled for the resource by default. You can change these settings and set custom thresholds for this resource.

Once you change the settings, you will see the following message pop up:

Resource-level alerts
Resource-specific alert types

The following table lists the alert types applicable to a particular Event Stream or Reverse ETL resource:

| Resource type | Alert type |
| --- | --- |
| Event Stream source | Low event volume, Tracking plan violations |
| Event Stream destination | Cloud destination failures |
| Warehouse destination | Warehouse pre-sync failures, Warehouse sync failures |
| Reverse ETL source | Partial row failures, Fatal syncs |

Set up alert delivery channels

You can set up dedicated alert channels to get notified whenever your sources or destinations have failures or errors. This allows you to take proactive measures to fix the problems before they escalate into major issues.

info

Note that:

  • You can set up separate alert delivery channels for your Event Stream and Reverse ETL pipelines.
  • Toggling off alerts for a channel automatically removes all the configurations. You will have to reconfigure the channel to use it again.
  • RudderStack limits the alert delivery to one alert per resource per alert type for each configured channel every 24 hours.

RudderStack provides the following options to set up alert delivery channels:

Slack

Toggle on the Slack setting to receive alerts on your preferred Slack channel.

Slack channel configuration for alerts

Set the Slack channel and authorize RudderStack to post the alerts by clicking Allow. Note that you must be an admin of the Slack workspace to grant RudderStack the necessary permissions to post to that channel.

info
While setting the Slack channel, you will see a This app is not approved by Slack ribbon at the top. This is because Slack has not reviewed the app yet. However, it is completely safe to install.
Slack channel configuration for alerts

Once the alert is triggered, RudderStack automatically sends a notification on the specified Slack channel. Click Review on RudderStack to go to the specific resource (source or destination) to investigate and fix the errors.

Slack alerts

Microsoft Teams

info
To use MS Teams for alert delivery, you must create an incoming webhook for the Teams channel you wish to use.

Toggle on the MS Teams setting and enter the incoming webhook URL to receive alerts on your preferred Teams channel.

Teams channel configuration for alerts

Once the alert is triggered, RudderStack automatically sends a notification on the specified Teams channel:

Teams alerts

Click Review on RudderStack to go to the specific resource (source or destination) to investigate and fix the errors.

Webhook

Toggle on the Webhook setting to forward the alerts to custom webhook channels.

RudderStack sends the alerts as a POST request to the configured endpoint, following the Prometheus alerting format.

info
Prometheus is a widely accepted monitoring and alerting tool. Its alert format is compatible with various other monitoring and incident management tools like Squadcast, PagerDuty, etc.

A sample webhook payload is shown below:

{
  "alerts": [{
    "endsAt": "0001-01-01T00:00:00Z",
    "labels": {
      "severity": "critical",
      "alertname": "partial-row-failures",
      "workspace": "<workspace_name>",
      "destination": "Failing Webhook",
      "workspaceId": "<workspace_id>",
      "organization": "<org_name>",
      "destinationId": "<destination_id>",
      "organizationId": "<org_id>",
      "configuredThreshold": 61
    },
    "status": "firing",
    "startsAt": "2024-02-05T00:02:49.933Z",
    "annotations": {
      "description": "Errors in processing or delivering events to Failing Webhook destination have exceeded the configured threshold of 61% within last 1 hour"
    },
    "fingerPrint": "d9885cc7f11b8db0"
  }],
  "status": "firing"
}
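
If you point the Webhook channel at your own service, the receiver only needs to parse this Prometheus-style payload. The following Flask sketch is illustrative only and is based on the sample payload above; the endpoint path and port are arbitrary choices, not RudderStack requirements.

```python
from flask import Flask, request

app = Flask(__name__)

# Arbitrary path for illustration; point the Webhook channel's URL at this endpoint.
@app.route("/rudderstack-alerts", methods=["POST"])
def handle_alert():
    payload = request.get_json(force=True)
    # Each delivery carries one or more alerts in the Prometheus-style "alerts" array.
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        description = alert.get("annotations", {}).get("description", "")
        print(f"[{labels.get('severity')}] {labels.get('alertname')} "
              f"in workspace {labels.get('workspace')}: {description}")
    return "", 204

if __name__ == "__main__":
    app.run(port=8080)  # arbitrary port for illustration
```
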
Send alerts to downstream tools

You can also forward the alerts to any downstream tool supported by RudderStack:

  1. Set up a webhook source. Note the webhook URL containing the source write key parameter.
Webhook source
  2. Set up a destination integration, for example, PagerDuty. Connect it to the webhook source created in Step 1.
  3. Once you set up the connection, specify the webhook source URL obtained in Step 1 in the Enter URL field where RudderStack forwards the alerts.
Webhook configuration for alerts

Email

Toggle on the Email setting and specify comma-separated email addresses of the users who should receive the alerts.

Email alerts

Once the alert is triggered, these users automatically receive email alerts so they can investigate and fix the errors.

Email alerts

Alert frequency

RudderStack limits the alert delivery to one alert per resource (source or destination) per alert type for each configured channel every 24 hours.

This alerting logic ensures you are not spammed with notifications, especially in cases where you have configured multiple alert types for your pipelines and some resources have their own overrides (custom alert settings) in place.

Use case

Suppose you get a Partial row failures alert for a particular Reverse ETL source. You will not get another alert for the same failure type for the next 24 hours, even if your data syncs run more frequently (for example, every one, five, or 12 hours).

However, if that source encounters another failure type like a fatal sync, RudderStack will trigger an alert.
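
Conceptually, this rate limiting behaves like deduplication on a (resource, alert type, channel) key over a 24-hour window. The following Python sketch is purely illustrative of that idea and does not reflect RudderStack's internal implementation:

```python
from datetime import datetime, timedelta

SUPPRESSION_WINDOW = timedelta(hours=24)

# Tracks when the last alert was sent per (resource, alert type, channel).
_last_sent: dict[tuple[str, str, str], datetime] = {}

def should_deliver(resource_id: str, alert_type: str, channel: str,
                   now: datetime) -> bool:
    """Illustrative only: allow at most one alert per resource, alert type,
    and channel in any 24-hour window."""
    key = (resource_id, alert_type, channel)
    last = _last_sent.get(key)
    if last is not None and now - last < SUPPRESSION_WINDOW:
        return False  # an alert of this type already went out in the last 24 hours
    _last_sent[key] = now
    return True

# A second "partial-row-failures" alert for the same source is suppressed,
# but a "fatal-syncs" alert for that source still goes through.
t0 = datetime(2024, 2, 5, 0, 0)
print(should_deliver("src_1", "partial-row-failures", "slack", t0))                       # True
print(should_deliver("src_1", "partial-row-failures", "slack", t0 + timedelta(hours=5)))  # False
print(should_deliver("src_1", "fatal-syncs", "slack", t0 + timedelta(hours=5)))           # True
```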


