How to load data from Mailchimp to SQL Data Warehouse

Extract your data from MailChimp

First of all, Mailchimp updated its API to v3 recently, so make sure that anything you do will be with this version as the previous are all deprecated although still supported, for more information visit MailChimp API v3.0 documentation. MailChimp was always a promoter of APIs and encouraged integration with other systems. It has a rich API that exposes a large number of endpoints for interacting with the resources of the applications, more specifically there are endpoints for the following resources:

  • Automations – offers functionality related to automated tasks that we define on MailChimp
  • Batch operations – for managing batch processes on our MailChimp account
  • Campaign folders – helps you organize your campaigns into groups
  • Campaigns – for managing your campaigns
  • Conversations – helps you track threads of emails with specific users
  • File manager files – for managing your static assets like images
  • File manager folders – for creating folders to organize your assets
  • Lists CRUD operations on lists of users
  • Reports – for accessing reports with statistics on your campaigns
  • Template folders – operations on creating folders for organizing your templates for your emails
  • Templates – operations on templates for your emails

Something interesting to note about the MailChimp API is that we don’t see a root-level resource for users anywhere. This makes sense of course, if we take into consideration that MailChimp is all about mail campaigns so users or to put in the correct context, subscribers, are not a stand-alone resource but instead they exist only inside the lists we manage. It is important to understand that every service perceives the world from a different perspective which of course it’s relevant to the value it offers, so even if what we care about is information about our users, it makes complete sense for someone like MailChimp to organize everything around lists and campaigns. If we check the model of another service, like Intercom, for example, we’ll notice that the user is at the top resources, which again makes sense because Intercom is all about one-to-one communication with users.

MailChimp and any other service that you might be using, has figured out (hopefully) the optimal model for its operations, but when we fetch data from them we usually want to answer questions or do things that are not part of the context that these services operate, something that makes these models sub-optimal for your analytics needs. For this reason, we should always keep in mind that when we work with data coming from external services we need to remodel it and bring it to the right form for our needs.

Interacting with the MailChimp REST API can be done by using tools like CURL or Postman or by using HTTP clients for your favorite language or framework. A few suggestions:

The MailChimp REST API supports oAuth 2.0 authentication, more information can be found in the Authorised apps section of the API documentation. After you successfully authenticate with the REST API, you have to start interacting with its resources and start fetching data from it in order to load them on your data warehouse.

Extract your User Data from the MailChimp API

So let’s assume that we want to get all the information we got on MailChimp for our users in order to enrich our user records inside our data warehouse. To do that we need to do the following.

First, we need to fetch all the lists that we have created on MailChimp, we can do this by performing a GET request to the appropriate endpoint.

SH
curl --request GET --url 'https://usX.api.mailchimp.com/3.0/lists' --user 'anystring:apikey' —include

We should get back a response like the following:

JSON
{ "lists": [ { "id": "57afe96172", "name": "Freddie's Jokes", "contact": { "company": "MailChimp", "address1": "675 Ponce De Leon Ave NE", "address2": "Suite 5000", "city": "Atlanta", "state": "GA", "zip": "30308", "country": "US", "phone": "" },……

Using the ID we get from the response for each list, we iterate through all the lists and make requests to the appropriate end-points to get the members for a list.

SH
curl --request GET --url 'https://usX.api.mailchimp.com/3.0/lists/57afe96172/members' --user 'anystring:apikey' —include

and we should get a response back like the following:

JSON
{ "members": [ { "id": "f777bbffab8d1ceca8b757df63c47cb8", "email_address": "urist.mcvankab+1@freddiesjokes.co", "unique_email_id": "882e9bec19", "email_type": "html", "status": "subscribed", "status_if_new": "", "merge_fields": { "FNAME": "", "LNAME": "" }, "interests": { "9143cf3bd1": true, "3a2a927344": false, "f9c8f5f0ff": false, "f231b09abc": true, "bd6e66465f": false },…………

Keep in mind that a user might appear in more than one list, which means that you also need to reduplicate your results based on this fact. After we get the results we need and collect all the user-related information from various other resources we need to map it to the model of our Data Warehouse repository before we do the actual loading.

Load Data from Mailchimp to SQL Data Warehouse

SQL Data Warehouse support numerous options for loading data, such as:

  • PolyBase
  • Azure Data Factory
  • BCP command-line utility
  • SQL Server integration services

As we are interested in loading data from online services by using their exposed HTTP APIs, we are not going to consider the usage of BCP command-line utility or SQL server integration in this guide. We’ll consider the case of loading our data as Azure storage Blobs and then use PolyBase to load the data into SQL Data Warehouse.

Accessing these services happens through HTTP APIs, as we see again APIs play an important role in both the extraction but also the loading of data into our data warehouse. You can access these APIs by using a tool like CURL or Postman. Or use the libraries provided by Microsoft for your favorite language. Before you actually upload any data you have to create a container which is something similar as a concept to the Amazon AWS Bucket, creating a container is a straightforward operation and you can do it by following the instructions found on the Blob storage documentation from Microsoft. As an example, the following code can create a container in Node.js.

JAVASCRIPT
blobSvc.createContainerIfNotExists('mycontainer', function(error, result, response){ if(!error){ // Container exists and allows // anonymous read access to blob // content and metadata within this container } });

After the creation of the container you can start uploading data to it by using again the given SDK of your choice in a similar fashion:

JAVASCRIPT
blobSvc.createBlockBlobFromLocalFile('mycontainer', 'myblob', 'test.txt', function(error, result, response){ if(!error){ // file uploaded } });

When you are done putting your data into Azure Blobs you are ready to load it into SQL Data Warehouse using PolyBase. To do that you should follow the directions in the Load with PolyBase documentation. In a summary the required steps to do it, are the following:

  1. create a database master key
  2. create a database scoped credentials
  3. create an external file format
  4. create an external data source

PolyBase’s ability to transparently parallelize loads from Azure Blob Storage will make it the fastest tool for loading data. After configuring PolyBase, you can load data directly into your SQL Data Warehouse by simply creating an external table that points to your data in storage and then mapping that data to a new table within SQL Data Warehouse.

Of course, you will need to establish a recurrent process that will extract any newly created data from your service, load them in the form of Azure Blobs and initiate the PolyBase process for importing the data again into SQL Data Warehouse. One way of doing this is by using the Azure Data Factory service. In case you would like to follow this path you can read some good documentation on how to move data to and from Azure SQL Warehouse using Azure Data Factory.

The best way to load data from Mailchimp to SQL Data Warehouse and possible alternatives

So far we just scraped the surface of what can be done with Microsoft Azure SQL Data Warehouse and how to load data into it. The way to proceed relies heavily on the data you want to load, from which service they are coming from, and the requirements of your use case. Things can get even more complicated if you want to integrate data coming from different sources. A possible alternative, instead of writing, hosting, and maintaining a flexible data infrastructure, is to use a product like RudderStack that can handle this kind of problem automatically for you.

RudderStack integrates with multiple sources or services like databases, CRM, email campaigns, analytics, and more. Quickly and safely move all your data from MailChimp into SQL Data Warehouse and start generating insights from your data.

Sign Up For Free And Start Sending Data
Test out our event stream, ELT, and reverse-ETL pipelines. Use our HTTP source to send data in less than 5 minutes, or install one of our 12 SDKs in your website or app.