How to load data from Stripe to Snowflake

Extract your Stripe’s Data

Stripe is an API-first product, it’s a unified set of APIs and tools that instantly enables businesses to accept and manage online payments. It is a web API following the RESTful principles, they try to use as many as possible HTTP built-in features to make it accessible to off-the-shelf HTTP clients and the serialization they support for their responses is JSON.

They also have two different types of keys used for authentication, one for testing mode and one for live mode, using the testing mode key it becomes easy to test every aspect of the API without messing with your real data. Also, keep in mind that the calls you make to the Stripe API have to be over HTTPS only for security reasons, plain HTTP calls will fail, same happens for non-authenticated calls, so do not forget to use your testing mode key in case you want to experiment with the API.

Currently, the Stripe API is built around the following ten core resources:

  • Balance – an object that represents your stripe balance.
  • Charges – to charge a credit or debit card you create a charge
  • Customers – Customer objects allow you to perform recurring charges and track multiple charges that are associated with the same customer.
  • Dispute – A dispute occurs when a customer questions your charge with their bank or credit card company.
  • Events – Events are our way of letting you know when something interesting happens in your account.
  • File uploads – There are various times when you’ll want to upload files to Stripe (for example, when uploading dispute evidence).
  • Refunds – Refund objects allow you to refund a charge that has previously been created but not yet refunded.
  • Tokens – Tokens can be created with your publishable API key.
  • Transfers – When Stripe sends you money or you initiate a transfer to a bank account
  • Transfer reversals – A previously created transfer can be reversed if it has not yet been paid out.

All of the above resources support CRUD operations by using HTTP verbs on their associated endpoints. As a web API, you can access it using by using tools like CURL or Postman or your favorite http client for the language or framework of your choice. Some options are the following:

  • Apache HttpClient for Java
  • Spray-client for Scala
  • Hyper for Rust
  • Ruby rest-client
  • Python http-client

There’s also a large number of libraries that wrap around the Stripe API and offer an easier way to interact with it, both community developed and from Stripe. For more information, you can check the libraries section in the API documentation.

Stripe and any other service that you might be using, has figured out (hopefully) the optimal model for its operations, but when we fetch data originated from them we usually want to answer questions or do things that are not part of the context that these services operate, something that makes these models sub-optimal for your analytic needs.

For this reason, we should always keep in mind that when we work with data coming from external services we need to remodel it and bring it to the right form for our needs.

So let’s assume that we want to perform some churn analysis for our company and to do that we need customer data that indicates when they have canceled their subscriptions. To do that we’ll have to request the customer objects that Stripe holds for our company. We can do that with the following command:

JAVASCRIPT
curl https://api.stripe.com/v1/charges?limit=3
-u sk_test_BQokikJOvBiI2HlWgH4olfQ2:

and a typical response will look like the following:

JAVASCRIPT
{
"object": "list",
"url": "/v1/charges",
"has_more": false,
"data": [
{
"id": "ch_17SY5f2eZvKYlo2CiPfbfz4a",
"object": "charge",
"amount": 500,
"amount_refunded": 0,
"application_fee": null,
"balance_transaction": "txn_17KGyT2eZvKYlo2CoIQ1KPB1",
"captured": true,
"created": 1452627963,
"currency": "usd",
"customer": null,
"description": "thedude@grepinnovation.com Account Credit",
"destination": null,
"dispute": null,
"failure_code": null,
"failure_message": null,
"fraud_details": {
}

Inside the customer object there’s a list of subscription objects that look like the following JSON document:

JAVASCRIPT
{
"id": "sub_7hy2fgATDfYnJS",
"object": "subscription",
"application_fee_percent": null,
"cancel_at_period_end": false,
"canceled_at": null,
"current_period_end": 1455306419,
"current_period_start": 1452628019,
"customer": "cus_7hy0yQ55razJrh",
"discount": null,
"ended_at": null,
"metadata": {
},
"plan": {
"id": "gold2132",
"object": "plan",
"amount": 2000,
"created": 1386249594,
"currency": "usd",
"interval": "month",
"interval_count": 1,
"livemode": false,
"metadata": {
},
"name": "Gold ",
"statement_descriptor": null,
"trial_period_days": null
},
"quantity": 1,
"start": 1452628019,
"status": "active",
"tax_percent": null,
"trial_end": null,
"trial_start": null
}

These objects together with part of the customer object, contain the information we need to perform churn analysis. Of course, we’ll have to extract all the information we need, map it to the schema of our data warehouse repository and then load every data to it following the instructions of this post.

Stream data using the Stripe API to Snowflake

It is also possible to setup a streaming data infrastructure that will collect data by a Stripe source and push them into a warehouse of data in a streaming fashion. This can be achieved by using the webhooks functionality that Stripe supports, you register some events to it and every time something happens, Stripe will push a message to your webhook.

For more information about that, check the API documentation on webhooks.

Stripe Data Preparation for Snowflake

The first step, before you start ingesting data you own into a Snowflake warehouse instance, is to have a well-defined schema of them.

Data in Snowflake is organized around tables with a well defined set of columns with each one having a specific data type.

Snowflake supports a rich set of data types. It is worth mentioning that a number of semi-structured data types is also supported. With Snowflake, is possible to load directly data in JSON, Avro, ORC, Parquet, or XML format. Hierarchical data is treated as a first-class citizen, similar to what Google BigQuery offers.

There is also one notable common data type that is not supported by Snowflake. LOB or large object data type is not supported, instead, you should use a BINARY or VARCHAR type instead. But these types are not that useful for data warehouse use cases.


A typical strategy for loading data from Stripe to Snowflake, is to create a schema where you will map each API endpoint to a table.

Each key inside the Stripe API endpoint response should be mapped to a column of that table and you should ensure the right conversion to a Snowflake data type.

Of course, you will have to ensure that as data types from the Stripe API might change, you will adapt database tables accordingly, there’s no such thing as automatic data type casting.

After you have a complete and well-defined data model or schema for Snowflake, you can move forward and start loading data into a database.

Load from Stripe to Snowflake

Usually, data is loaded into Snowflake in a bulk way, using the COPY INTO command. Files containing data, usually in JSON format, are stored in a local file system or in Amazon S3 buckets. Then a COPY INTO command is invoked on the Snowflake instance and data is copied into a warehouse of data.

The files can be pushed into Snowflake using the PUT command, into a staging environment before the COPY command is invoked.

Another alternative is to upload data directly into a service like Amazon S3 from where Snowflake can access data directly.

Finally, Snowflake offers a web interface as a data loading wizard where someone can visually setup and copy any data into a data warehouse. Just keep in mind that the functionality of this wizard is limited compared to the rest of the methods.

Snowflake in contrast to other technologies like Redshift does not require a data schema to be packed together with data that will be copied. Instead, the schema is part of the query that will copy every data into the data warehouse. This simplifies the data loading process and offers more flexibility on datatype management.

Updating your Stripe data on Snowflake

As you will be generating more data on Stripe, you will be obliged to update your older data on Snowflake. This includes new records together with updates to older records that for any reason have been updated on Stripe.

You will have to periodically check Stripe for new data and repeat the process that has been described previously while updating your currently available data if needed. Updating an already existing row on a Snowflake table is achieved by creating UPDATE statements.

Snowflake has a great tutorial on the different ways of handling updates, especially using primary keys.

Another issue that you need to take care of is the identification and removal of any duplicate records on your own database. Either because Stripe does not have a mechanism to identify new and updated records or because of errors on your own data pipelines, duplicate records might be introduced to the database.

In general, ensuring the quality of data that is inserted into your database is a big and difficult issue.

The best way to load data from Stripe to Snowflake

So far we just scraped the surface of what you can do with Snowflake and how to load data into it. Things can get even more complicated if you want to integrate data coming from different sources.

Are you striving to achieve results right now?

Instead of writing, hosting, and maintaining a flexible data infrastructure use Rudderstack that can handle everything automatically for you.

Rudderstack, with one click, integrates with sources or services, creates analytics-ready data, and syncs your Stripe to Snowflake right away.

Sign Up For Free And Start Sending Data
Test out our event stream, ELT, and reverse-ETL pipelines. Use our HTTP source to send data in less than 5 minutes, or install one of our 12 SDKs in your website or app.
Don't want to go through the pain of direct integration? RudderStack's Stripe integration makes it easy to send data from Stripe to Snowflake.