Snowpipe Streaming Destination Beta

Sync real-time data streams from RudderStack to Snowflake using the Snowpipe Streaming API.

Snowpipe Streaming is a powerful tool for handling real-time data streams. It allows you to stream data rows directly in Snowflake tables with minimal latency.

Find the open source code for this destination in the GitHub repository.

When to use Snowpipe Streaming

Snowpipe Streaming is ideal for the following scenarios:

  • Continuous data streams: If your data sources produce a steady stream of data in small batches, for example, clickstream data, event logs, etc.
  • Low latency requirements: When your application requires immediate/frequent updates to the Snowflake table as new data arrives. Some use cases include real-time dashboards, streaming analytics, etc.
  • Cost optimization for streaming data: Snowpipe Streaming can potentially reduce costs for real-time pipelines as it allows micro-batch ingestion without needing a traditional Snowflake warehouse running constantly.

Prerequisites

Before you set up Snowpipe Streaming as a destination in RudderStack, make sure to set the correct user permissions in Snowflake to allow RudderStack to send the data correctly.

warning
To enable RudderStack access, make sure to have the ACCOUNTADMIN role or an account with MANAGE GRANTS privilege.

The following sections show you how to set up a virtual warehouse, a database, a role, and an user in Snowflake:

Create warehouse

In your Snowflake console, create an X-Small warehouse.

info
RudderStack does not use the warehouse for data loading. It only uses the warehouse to adapt the schema and the update the tables in case of new properties in the events (or entirely new events).
Create a virtual warehouse

Alternatively, run the following SQL commands to create a new warehouse:

CREATE WAREHOUSE "<WAREHOUSE_NAME>"
  WITH WAREHOUSE_SIZE = 'XSMALL'
    WAREHOUSE_TYPE = 'STANDARD'
    AUTO_SUSPEND = 600
    AUTO_RESUME = TRUE;
warning
Make sure to set AUTO_SUSPEND to ~10 minutes and enable AUTO_RESUME to avoid any extra costs.

Create database

RudderStack recommends creating a new database to avoid conflicts with your existing data. Note that RudderStack creates its own tables within this database while storing your events.

Create a database

Alternatively, you can create a new database by running the following SQL command:

CREATE DATABASE "<DATABASE_NAME>";

Create role for RudderStack

Run the following SQL commands in the exact order to create a new role with the required permissions to load your data into the warehouse created above.

Make sure to replace the placeholder names with your preferred names.

  1. Create a new role called <ROLE>:
CREATE ROLE "<ROLE>";
  1. Grant access to the warehouse <WAREHOUSE_NAME>:
GRANT USAGE ON WAREHOUSE "<WAREHOUSE_NAME>" TO ROLE "<ROLE>";
  1. Grant access to the database <DATABASE_NAME>:
GRANT USAGE ON DATABASE "<DATABASE_NAME>" TO ROLE "<ROLE>";
GRANT CREATE SCHEMA ON DATABASE "<DATABASE_NAME>" TO ROLE "<ROLE>";
GRANT ALL ON ALL SCHEMAS IN DATABASE "<DATABASE_NAME>" TO ROLE "<ROLE>";
info

You can also create a custom role and specify it in the dashboard settings while setting up the Snowpipe Streaming destination in RudderStack.

Note that this role must have the necessary permissions for RudderStack to load the data into your warehouse.

Create user

Use the below query to create a Snowflake user that connects RudderStack to the newly created warehouse.

Make sure to replace the placeholder names with your preferred values.

CREATE USER "<USER_NAME>"
  MUST_CHANGE_PASSWORD = FALSE
  DEFAULT_ROLE = "<ROLE>"
  PASSWORD = "<STRONG_PASSWORD>";
GRANT ROLE "<ROLE>" TO USER "<USER_NAME>";

Setup

  1. In your RudderStack dashboard, add a source. Then select Snowpipe Streaming from the list of destinations.
  2. Assign a name to your destination and click Continue.

Connection settings

  • Account: Enter the account ID of your Snowflake warehouse - this ID is part of the Snowflake URL. The below table illustrates the slight differences in the account IDs depending on the various cloud providers. See the Snowflake documentation for more information on the account locator formats depending on region or cloud provider.
Account ID exampleSnowflake URLSnowflake cloud provider
qya56091.us-east-1

qya56091.us-east-2.aws
https://qya56091.us-east-1.snowflakecomputing.com

https://qya56091.us-east-2.aws.snowflakecomputing.com
AWS

Note: In the case of AWS, .aws is present in the account locator of some region accounts and must be included in this setting.
rx18795.east-us-2.azurehttps://rx18795.east-us-2.azure.snowflakecomputing.comMicrosoft Azure
ah76025.us-central1.gcphttps://ah76025.us-central1.gcp.snowflakecomputing.comGoogle Cloud Platform
  • Database: Enter the name of the database created in the Create database section.
  • Warehouse: Enter the name of the warehouse created in the Create warehouse section.
  • User: Enter the name of the user created in the Create user section.
  • Role: Specify the role to be assigned to the above user. If not specified, RudderStack uses the default role. Make sure your role has the necessary permissions for RudderStack to load the data into the warehouse.
  • Private Key: Generate private key and specify it in this field. Make sure to include the delimiters.
info

See the following sections in the Snowflake documentation to generate and use the key pair:

  • Private Key Passphrase: Specify the password you set while encrypting the private key. Leave this field blank if your private key is not encrypted.
danger
The user authentication will fail if your private key is encrypted and you do not specify the passphrase.
  • Namespace: Enter the schema name for the warehouse where RudderStack creates all tables. If not specified, RudderStack sets the namespace to the source name by default. Note that you cannot change the namespace later.

Advanced settings

RudderStack provides the following advanced settings:

  • Skip Tracks Table: Toggle on this setting to skip sending events to the tracks table.
  • JSON Columns: This setting lets you ingest semi-structured event data not defined by a fixed schema. You can specify the required JSON column paths in the dot notation, separated by commas. See the JSON Column Support guide for more information.
  • Consent management settings: Configure the consent management settings for the specified source by choosing the Consent management provider from the dropdown and entering the relevant consent category IDs. See Consent Management in RudderStack for more information on this feature.

IPs to be allowlisted

To enable network access to RudderStack, allowlist the following RudderStack IPs depending on your region and RudderStack Cloud plan:

Plan
Region
US
EU
Free, Starter, and Growth
  • 3.216.35.97
  • 18.214.35.254
  • 23.20.96.9
  • 34.198.90.241
  • 34.211.241.254
  • 52.38.160.231
  • 54.147.40.62
  • 3.123.104.182
  • 3.125.132.33
  • 18.198.90.215
  • 18.196.167.201
Enterprise
  • 3.216.35.97
  • 34.198.90.241
  • 44.236.60.231
  • 54.147.40.62
  • 100.20.239.77
  • 3.66.99.198
  • 3.64.201.167
  • 3.123.104.182
  • 3.125.132.33
info
All the outbound traffic is routed through these RudderStack IPs.

FAQ

What is the difference between Snowpipe Streaming and traditional Snowflake warehouse?

The following table highlights the key differences between Snowpipe Streaming and the traditional Snowflake warehouse:

FeatureSnowpipe StreamingTraditional Snowflake warehouse
Use caseReal-time data useAnalytical and batch processing
LatencyReal-time or near real-timeHigher latency (batch-oriented)
Data volumeSmall, continuous data streamsLarge, periodic batches
Cost efficiencyOptimized for streaming and real-time ingestionOptimized for batch processing

Why am I not seeing the users table in the schema for identify events?

This integration sends the identify events exclusively to the identifies table and skips the users table entirely.


Questions? Contact us by email or on Slack