How to load data from Stripe to SQL Data Warehouse
Extract your data from Stripe
Stripe is an API-first product: a unified set of APIs and tools that lets businesses accept and manage online payments. Its web API follows RESTful principles and uses as many built-in HTTP features as possible, which makes it accessible to off-the-shelf HTTP clients, and it serializes its responses as JSON. Stripe provides two types of keys for authentication, one for test mode and one for live mode. With the test mode key you can exercise every aspect of the API without touching your real data. Keep in mind that, for security reasons, calls to the Stripe API must be made over HTTPS. Plain HTTP calls fail, and so do unauthenticated calls, so remember to use your test mode key when you experiment with the API.
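As a rough illustration of the authentication described above: Stripe accepts the secret key as the username of HTTP Basic authentication, with an empty password. The sketch below builds that header for Node.js 18 or later, where `fetch` is built in. The helper names `stripeAuthHeader` and `stripeGet` are hypothetical and not part of any Stripe library.

```javascript
// Build the HTTP Basic auth header Stripe expects: the key is the
// username and the password is empty (hence the trailing colon).
function stripeAuthHeader(apiKey) {
  return 'Basic ' + Buffer.from(apiKey + ':').toString('base64');
}

// Hypothetical helper: GET a Stripe resource over HTTPS with the given key.
async function stripeGet(path, apiKey) {
  const response = await fetch('https://api.stripe.com' + path, {
    headers: { Authorization: stripeAuthHeader(apiKey) },
  });
  if (!response.ok) {
    // Unauthenticated or otherwise invalid calls come back with a non-2xx status.
    throw new Error('Stripe request failed with status ' + response.status);
  }
  return response.json();
}
```

While experimenting, you would pass your test mode key, for example `stripeGet('/v1/charges?limit=3', testKey)`.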
Currently, the Stripe API is built around the following ten core resources:
- Balance – An object that represents your Stripe balance.
- Charges – To charge a credit or debit card, you create a charge object.
- Customers – Customer objects allow you to perform recurring charges and track multiple charges that are associated with the same customer.
- Disputes – A dispute occurs when a customer questions your charge with their bank or credit card company.
- Events – Events are Stripe's way of letting you know when something interesting happens in your account.
- File Uploads – There are various times when you’ll want to upload files to Stripe (for example, when uploading dispute evidence).
- Refunds – Refund objects allow you to refund a charge that has previously been created but not yet refunded.
- Tokens – Tokens can be created with your publishable API key.
- Transfers – When Stripe sends you money or you initiate a transfer to a bank account, a transfer object is created.
- Transfer Reversals – A previously created transfer can be reversed if it has not yet been paid out.
All of the above resources support CRUD operations through HTTP verbs on their associated endpoints. Because it is a web API, you can access it with tools like cURL, Postman, or Apirise, or with your favorite HTTP client for the language or framework of your choice.
There is also a large number of libraries that wrap the Stripe API and offer an easier way to interact with it, both community-developed ones and official ones from Stripe. For more information, check the libraries section of the API documentation.
Stripe, like any other service you might be using, has (hopefully) figured out the optimal data model for its own operations. When we fetch data from such services, however, we usually want to answer questions that fall outside the context in which they operate, which makes their models sub-optimal for our analytic needs. For this reason, we should always keep in mind that data coming from external services has to be remodeled into the right form for our needs.
So let's assume that we want to perform some churn analysis for our company, and to do that we need customer data that indicates when customers have canceled their subscriptions. For that we have to request the customer objects that Stripe holds for our company, which are served by the /v1/customers endpoint. All list endpoints are queried in the same way. For example, the following command lists the three most recent charges:
SH
curl "https://api.stripe.com/v1/charges?limit=3" -u sk_test_BQokikJOvBiI2HlWgH4olfQ2:
and a typical response will look like the following:
JSON
{
  "object": "list",
  "url": "/v1/charges",
  "has_more": false,
  "data": [
    {
      "id": "ch_17SY5f2eZvKYlo2CiPfbfz4a",
      "object": "charge",
      "amount": 500,
      "amount_refunded": 0,
      "application_fee": null,
      "balance_transaction": "txn_17KGyT2eZvKYlo2CoIQ1KPB1",
      "captured": true,
      "created": 1452627963,
      "currency": "usd",
      "customer": null,
      "description": "thedude@grepinnovation.com Account Credit",
      "destination": null,
      "dispute": null,
      "failure_code": null,
      "failure_message": null,
      "fraud_details": {},
      …
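The `has_more` field in a list response signals whether further pages exist. Below is a minimal sketch of cursor-based paging, assuming Stripe's `starting_after` list parameter. The caller-supplied `fetchPage` function is a hypothetical name for whatever performs the actual HTTP request.

```javascript
// Collect every object from a Stripe list endpoint by following the cursor.
// fetchPage(startingAfter) is a hypothetical caller-supplied function that
// requests one page (passing starting_after when it is defined) and resolves
// to the parsed JSON list: { data: [...], has_more: boolean }.
async function listAll(fetchPage) {
  const all = [];
  let startingAfter; // undefined requests the first page
  while (true) {
    const page = await fetchPage(startingAfter);
    all.push(...page.data);
    if (!page.has_more || page.data.length === 0) break;
    // The id of the last object on this page is the cursor for the next one.
    startingAfter = page.data[page.data.length - 1].id;
  }
  return all;
}
```

Keeping the HTTP call outside `listAll` means the same loop can be reused for charges, customers, or events, and exercised without network access.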
Inside each customer object there is a list of subscription objects, each of which looks like the following JSON document:
JSON
{
  "id": "sub_7hy2fgATDfYnJS",
  "object": "subscription",
  "application_fee_percent": null,
  "cancel_at_period_end": false,
  "canceled_at": null,
  "current_period_end": 1455306419,
  "current_period_start": 1452628019,
  "customer": "cus_7hy0yQ55razJrh",
  "discount": null,
  "ended_at": null,
  "metadata": {},
  "plan": {
    "id": "gold2132",
    "object": "plan",
    "amount": 2000,
    "created": 1386249594,
    "currency": "usd",
    "interval": "month",
    "interval_count": 1,
    "livemode": false,
    "metadata": {},
    "name": "Gold ",
    "statement_descriptor": null,
    "trial_period_days": null
  },
  "quantity": 1,
  "start": 1452628019,
  "status": "active",
  "tax_percent": null,
  "trial_end": null,
  "trial_start": null
}
These objects, together with part of the customer object, contain the information we need to perform churn analysis. Of course, we will have to extract the information we need, map it to the schema of our data warehouse, and then load it following the instructions in this post.
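The mapping step could look roughly like the sketch below, which flattens one subscription object into a row. The field names it reads come from the subscription document above. The function names, the target column names, and the rule that a set `canceled_at` means churn are assumptions made for illustration.

```javascript
// Convert a Unix timestamp in seconds (as Stripe returns it) to an ISO 8601
// string, preserving null for fields that are not set.
function toIso(unixSeconds) {
  return unixSeconds == null ? null : new Date(unixSeconds * 1000).toISOString();
}

// Flatten one subscription object into a row for a hypothetical
// "subscriptions" table; the column names are illustrative.
function subscriptionToRow(sub) {
  return {
    subscription_id: sub.id,
    customer_id: sub.customer,
    plan_id: sub.plan ? sub.plan.id : null,
    plan_amount: sub.plan ? sub.plan.amount : null,
    status: sub.status,
    started_at: toIso(sub.start),
    canceled_at: toIso(sub.canceled_at),
    ended_at: toIso(sub.ended_at),
    // Assumption: a subscription counts as churned once it has a
    // cancellation timestamp.
    churned: sub.canceled_at != null,
  };
}
```

Applied to the subscription shown above, the row would have `churned` set to `false`, because `canceled_at` is `null`.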
Stream Data From the Stripe API to Your Data Warehouse
It is also possible to set up a streaming data infrastructure that collects data from Stripe and pushes it into your data warehouse as it is produced. This can be achieved with the webhooks functionality that Stripe supports: you register a webhook endpoint for the events you care about, and every time one of them occurs, Stripe pushes a message to your endpoint. For more information, check the API documentation on webhooks.
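To make the webhook flow more concrete, here is a minimal sketch of the part of a handler that decides what to keep from an incoming event. It assumes the standard shape of Stripe event objects (`id`, `type`, `created`, and the affected object under `data.object`). The function name `eventToRecord` and the output record are hypothetical. A production endpoint should also verify that the request really came from Stripe, which this sketch omits.

```javascript
// Turn one webhook event into a record for the warehouse, or return null
// when the event is not relevant to churn analysis.
function eventToRecord(event) {
  const interesting = new Set([
    'customer.subscription.created',
    'customer.subscription.updated',
    'customer.subscription.deleted',
  ]);
  if (!interesting.has(event.type)) return null; // ignore everything else
  const subscription = event.data.object;
  return {
    event_id: event.id,
    event_type: event.type,
    occurred_at: new Date(event.created * 1000).toISOString(),
    subscription_id: subscription.id,
    customer_id: subscription.customer,
    status: subscription.status,
  };
}
```

The HTTP server that receives the POST request would parse the JSON body, pass it to `eventToRecord`, and append any non-null result to the data waiting to be loaded.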
Load Data from Stripe to SQL Data Warehouse
SQL Data Warehouse supports numerous options for loading data, such as:
- PolyBase
- Azure Data Factory
- BCP command-line utility
- SQL Server Integration Services (SSIS)
Because we are interested in loading data from online services through their exposed HTTP APIs, we will not consider the BCP command-line utility or SQL Server Integration Services in this guide. Instead, we will upload our data as Azure Storage blobs and then use PolyBase to load it into SQL Data Warehouse.
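Before the data can be uploaded as blobs it has to be written in a format PolyBase can read, such as delimited text. A minimal sketch follows, assuming a pipe character as the field terminator and an empty field for missing values. The function name `toDelimitedText` and the choice to replace delimiter characters inside values with spaces are illustrative.

```javascript
// Serialize rows to pipe-delimited text, one line per row.
// Assumption: the external file format uses '|' as the field terminator
// and an empty field stands for a missing value.
function toDelimitedText(rows, columns) {
  return rows
    .map((row) =>
      columns
        .map((col) => {
          const value = row[col];
          if (value === null || value === undefined) return '';
          // Replace the delimiter and line breaks so each row stays on one line.
          return String(value).replace(/[|\r\n]/g, ' ');
        })
        .join('|')
    )
    .join('\n');
}
```

The explicit `columns` argument fixes the column order, which has to match the column order of the external table defined later.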
Access to these services happens through HTTP APIs, so once again APIs play an important role in both extracting data and loading it into our data warehouse. You can reach these APIs with a tool like cURL, Postman, or Apirise, or use the libraries that Microsoft provides for your favorite language. Before you upload any data you have to create a container, a concept similar to an Amazon S3 bucket. Creating a container is a straightforward operation, and you can do it by following the instructions in Microsoft's Blob storage documentation. As an example, the following code creates a container in Node.js.
JAVASCRIPT
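// Assumes the legacy azure-storage package, with credentials read from the
// AZURE_STORAGE_CONNECTION_STRING environment variable.
var azure = require('azure-storage');
var blobSvc = azure.createBlobService();

blobSvc.createContainerIfNotExists('mycontainer', function(error, result, response){
  if(!error){
    // Container exists and is private (no public access level was requested)
  }
});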
Once the container exists, you can start uploading data to it, again using the SDK of your choice in a similar fashion:
JAVASCRIPT
// blobSvc is the blob service client returned by azure.createBlobService()
blobSvc.createBlockBlobFromLocalFile('mycontainer', 'myblob', 'test.txt', function(error, result, response){
  if(!error){
    // File uploaded
  }
});
When you are done putting your data into Azure blobs, you are ready to load it into SQL Data Warehouse using PolyBase. To do that, follow the directions in the Load with PolyBase documentation. In summary, the required steps are the following:
- create a database master key
- create a database scoped credential
- create an external file format
- create an external data source
PolyBase’s ability to transparently parallelize loads from Azure Blob Storage will make it the fastest tool for loading data. After configuring PolyBase, you can load data directly into your SQL Data Warehouse by simply creating an external table that points to your data in storage and then mapping that data to a new table within SQL Data Warehouse.
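The steps above, plus the external table and the final load, could be sketched as a list of T-SQL statements generated from Node.js and then run with the SQL client of your choice. Every object name here (`AzureStorageCredential`, `StripeBlobStorage`, `PipeDelimited`, `dbo.subscriptions_external`, `dbo.subscriptions`), the column list, and the function name `polyBaseStatements` are assumptions for illustration. Check the exact options against the PolyBase documentation for your service version.

```javascript
// Return the T-SQL statements for a PolyBase load, in execution order.
// All object names and columns are illustrative, not prescribed by Azure.
function polyBaseStatements({ account, container, storageKey, folder }) {
  return [
    // 1. Master key that protects the credential secret.
    'CREATE MASTER KEY;',
    // 2. Credential holding the storage account key.
    `CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
       WITH IDENTITY = 'user', SECRET = '${storageKey}';`,
    // 3. External data source pointing at the blob container.
    `CREATE EXTERNAL DATA SOURCE StripeBlobStorage
       WITH (TYPE = HADOOP,
             LOCATION = 'wasbs://${container}@${account}.blob.core.windows.net',
             CREDENTIAL = AzureStorageCredential);`,
    // 4. File format matching the pipe-delimited text files.
    `CREATE EXTERNAL FILE FORMAT PipeDelimited
       WITH (FORMAT_TYPE = DELIMITEDTEXT,
             FORMAT_OPTIONS (FIELD_TERMINATOR = '|'));`,
    // 5. External table over the files; timestamps are kept as text here.
    `CREATE EXTERNAL TABLE dbo.subscriptions_external (
         subscription_id NVARCHAR(64), customer_id NVARCHAR(64),
         status NVARCHAR(32), canceled_at NVARCHAR(32))
       WITH (LOCATION = '${folder}', DATA_SOURCE = StripeBlobStorage,
             FILE_FORMAT = PipeDelimited);`,
    // 6. Load into a regular table with CREATE TABLE AS SELECT.
    `CREATE TABLE dbo.subscriptions
       WITH (DISTRIBUTION = ROUND_ROBIN)
       AS SELECT * FROM dbo.subscriptions_external;`,
  ];
}
```

Because the storage key is interpolated directly into the statement text, this is suitable only as a sketch. A real deployment should keep the secret out of logs and source control.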
Of course, you will need to establish a recurring process that extracts any newly created data from your service, uploads it as Azure blobs, and initiates the PolyBase process to import it into SQL Data Warehouse. One way of doing this is with the Azure Data Factory service. If you would like to follow this path, you can read the documentation on how to move data to and from Azure SQL Data Warehouse using Azure Data Factory.
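For the extraction half of such a recurring process, one option is to ask Stripe only for objects created since the last successful run. The sketch below builds the query string for that, assuming the list endpoint supports Stripe's `created` filter. The function name `incrementalQuery` and the idea of storing the last synced timestamp somewhere between runs are assumptions.

```javascript
// Build the query string for fetching only objects created after the last
// successful sync. lastSyncedUnix is a Unix timestamp in seconds, or null
// on the very first run, in which case no filter is applied.
function incrementalQuery(lastSyncedUnix, limit = 100) {
  const params = new URLSearchParams();
  params.set('limit', String(limit));
  if (lastSyncedUnix != null) {
    params.set('created[gt]', String(lastSyncedUnix));
  }
  return params.toString();
}
```

The result can be appended to a list URL such as `https://api.stripe.com/v1/customers?`, and the largest `created` value seen in the response becomes the timestamp for the next run.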
The best way to load data from Stripe to SQL Data Warehouse and possible alternatives
So far we have just scratched the surface of what can be done with Microsoft Azure SQL Data Warehouse and how to load data into it. The way to proceed depends heavily on the data you want to load, the service it comes from, and the requirements of your use case. Things can get even more complicated if you want to integrate data coming from different sources. A possible alternative to writing, hosting, and maintaining a flexible data infrastructure yourself is to use a product like RudderStack that can handle this kind of problem automatically for you.
RudderStack integrates with multiple sources or services like databases, CRM, email campaigns, analytics, and more. Quickly and safely move all your data from Stripe into SQL Data Warehouse and start generating insights from your data.