How to load data from Shopify to SQL Data Warehouse

How to Extract my data from Shopify?

Shopify exposes its complete platform to developers through its API. Thousands of developers use it to create applications that are then sold through the Shopify marketplace.

As a Web API following the RESTful architecture principles, it can be accessed through HTTP. As a RESTful API, interacting with it can be achieved by using tools like CURL or Postman or by using http clients for your favorite language or framework. A few suggestions:

Shopify also offers a number of SDKs that are officially supported and maintained by them and that can be used to access their platform for different uses. For example, by using the iOS and Android Buy SDK it is possible to add Shopify checkout capabilities to your mobile application. Other SDKs that are offered are:

Shopify API Authentication

There are two different types of applications in Shopify regarding authentication, private and public apps.

Private apps will only function on individual stores, so you don’t need to authenticate them through Oauth. You can get Private app credentials through your Partner dashboard or by logging into any Shopify admin. You can also use this API key to manipulate your store using the API console without building a fully functional app.

The apps you create in your Partners dashboard function as public apps, which can be made available for download in Shopify’s App Store if you meet the criteria.

Shopify rate limiting

The API call limit operates using a “leaky bucket” algorithm as a controller. This allows for infrequent bursts of calls and allows your app to continue to make an unlimited amount of calls over time. The bucket size is 40 calls (which cannot be exceeded at any given time), with a “leak rate” of 2 calls per second that continually empties the bucket. If your app averages 2 calls per second, it will never trip a 429 error (“bucket overflow”).

Endpoints and available resources

Shopify exposes 35+ endpoints covering all the possible touchpoints of e-commerce. Some of the most important resources that can be accessed through these endpoints are the following:

  • Abandoned checkouts: used to return abandoned checkouts. A checkout is considered abandoned when a customer has entered their billing & shipping info but has yet to complete the purchase.
  • ApplicationCharge: Request to charge a shop a one-time fee by issuing this call.
  • Article: Operations concerning articles in your Blog.
  • Asset: files that make up the theme of a shop.
  • Blog: Shopify and the e-commerce features also offer an environment where the merchant can create a Blog for her shop.
  • CarrierService: A Carrier Service (also known as a Carrier Calculated Service or Shipping Service) provides real-time shipping rates to Shopify.
  • Collect: An object that connects a product to a custom collection.
  • Customer: A customer resource instance represents a customer account with the shop.
  • Event: Events are generated by specific Shopify resources when specific things happen, such as the creation of an article.
  • Order: An order is a customer’s completed request to purchase one or more products from a shop.
  • Product: A product is an individual item for sale in a Shopify shop.
  • Transaction: Transactions are created for every order that results in an exchange of money.

For a complete list of endpoints, you can see them here.

It is clear that with such a rich platform and API, the data that can be pulled out of Shopify are both valuable and come in large quantities. So, let’s assume that we want to pull all events out of Shopify to use them for further analysis. To do so, we need to request the Event endpoint like this.

JAVASCRIPT
GET /admin/events.json?filter=Product,Order

This request will get us back to

JAVASCRIPT
HTTP/1.1 200 OK
{
"events": [
{
"id": 677313116,
"subject_id": 921728736,
"created_at": "2008-01-10T08:00:00-05:00",
"subject_type": "Product",
"verb": "create",
"arguments": [
"IPod Touch 8GB"
],
"body": null,
"message": "created a new product: <a href=\"\/admin\/products\/921728736\">IPod Touch 8GB<\/a>.",
"author": "Shopify",
"description": "created a new product: IPod Touch 8GB.",
"path": "\/admin\/products\/921728736"
},
{
"id": 365755215,
"subject_id": 632910392,
"created_at": "2008-01-10T07:00:00-05:00",
"subject_type": "Product",
"verb": "create",
"arguments": [
"IPod Nano - 8GB"
],
"body": null,
"message": "created a new product: <a href=\"\/admin\/products\/632910392\">IPod Nano - 8GB<\/a>.",
"author": "Shopify",
"description": "created a new product: IPod Nano - 8GB.",
"path": "\/admin\/products\/632910392"
}
]
}

all the events that are related to Products and Orders for our shop. The response will be in JSON and will look like this:

Inside the response, there will be an array of objects with each one representing one Order or Product.

Events are generated for the following resources:

  • Articles
  • Blogs
  • Custom Collections
  • Comments
  • Orders
  • Pages
  • Products
  • Smart Collections

Using the “limit” and “page” parameters, it is possible to page your results if you need to pull out a large number of events. Additionally, there are a number of ways that you can filter the results, for example, based on a date, so in a continuous data extraction process, it is possible to pull only new data and avoid duplicates. This can be achieved by using the “created_at_min” and “created_at_max” parameters in your GET request.

After successfully pulling your data from the Shopify API, you are ready to extract and prepare them for SQL Data Warehouse. Of course, the above process is only for one of the available resources, and if you would like to have a complete view of all the available data, then you will have to create a much more complex ETL process, including the majority of the 35+ resources that Shopify has.

How can I Load my data from Shopify to SQL Data Warehouse?

SQL Data Warehouse support numerous options for loading data, such as:

  • PolyBase
  • Azure Data Factory
  • BCP command-line utility
  • SQL Server integration services

As we are interested in loading data from online services by using their exposed HTTP APIs, we will not consider the usage of BCP command-line utility or SQL server integration in this guide. We’ll consider the case of loading our data as Azure storage Blobs and then use PolyBase to load the data into SQL Data Warehouse.

Accessing these services happens through HTTP APIs, as we see again APIs play an important role in both the extraction but also the loading of data into our data warehouse. You can access these APIs by using a tool like CURL or Postman. Or use the libraries provided by Microsoft for your favorite language. Before you actually upload any data you have to create a container which is something similar to a concept to the Amazon AWS Bucket, creating a container is a straightforward operation and you can do it by following the instructions found on the Blog storage documentation from Microsoft. As an example, the following code can create a container in Node.js.

JAVASCRIPT
blobSvc.createContainerIfNotExists('mycontainer', function(error, result, response){
if(!error){
// Container exists and allows
// anonymous read access to blob
// content and metadata within this container
}
});

After the creation of the container you can start uploading data to it by using again the given SDK of your choice in a similar fashion:

JAVASCRIPT
blobSvc.createBlockBlobFromLocalFile('mycontainer', 'myblob', 'test.txt', function(error, result, response){
if(!error){
// file uploaded
}
});

When you are done putting your data into Azure Blobs you are ready to load it into SQL Data Warehouse using PolyBase. To do that you should follow the directions in the Load with PolyBase documentation. In summary, the required steps to do it, are the following:

  • create a database master key
  • create a database scoped credentials
  • create an external file format
  • create an external data source

PolyBase’s ability to transparently parallelize loads from Azure Blob Storage will make it the fastest tool for loading data. After configuring PolyBase, you can load data directly into your SQL Data Warehouse by simply creating an external table that points to your data in storage and then mapping that data to a new table within SQL Data Warehouse.

Of course, you will need to establish a recurrent process that will extract any newly created data from your service, load them in the form of Azure Blobs and initiate the PolyBase process for importing the data again into SQL Data Warehouse. One way of doing this is by using the Azure Data Factory service. In case you would like to follow this path you can read some good documentation on how to move data to and from Azure SQL Warehouse using Azure Data Factory.



What is the best way to load data from Shopify to SQL Data Warehouse? Which are the possible alternatives?

So far, we just scraped the surface of what can be done with Microsoft Azure SQL Data Warehouse and how to load data into it. The way to proceed relies heavily on the data you want to load, from which service they are coming from, and the requirements of your use case. Things can get even more complicated if you want to integrate data coming from different sources. Instead of writing, hosting, and maintaining a flexible data infrastructure, a possible alternative is to use a product like RudderStack that can automatically handle this kind of problem for you.

RudderStack integrates with multiple sources or services like databases, CRM, email campaigns, analytics, and more.

Sign Up For Free And Start Sending Data
Test out our event stream, ELT, and reverse-ETL pipelines. Use our HTTP source to send data in less than 5 minutes, or install one of our 12 SDKs in your website or app.