How to load data from Intercom to Snowflake

Extract data from Intercom

Intercom exposes a rich REST API for interacting with its services. There are two main reasons to use the Intercom REST API:

  1. To pull data from it and use it for analytic purposes
  2. To push data into it, enriching the information it holds about your customers and enhancing its service

In this post, we’ll focus mainly on pulling data from the API and using it to enrich our data warehouse with data generated from our interactions with our customers.

The main entities of the Intercom API are the following:

  • Users – the primary entity you interact with in Intercom.
  • Leads – represent logged-out users of your application.
  • Companies – allow you to represent commercial organizations using your product.
  • Tags – A tag allows you to label your users and companies and list them using that tag.
  • Segments – A segment is a group of your users defined by rules that you set.
  • Notes – Notes allow you to annotate and comment on your users.
  • Events – Events are how you can submit user activity to Intercom.
  • Counts – You can get counts of users and companies filtered by certain criteria.
  • Conversations – Conversations are how you can communicate with users in Intercom.

It has to be noted that not all of the above entities can be pulled out of the Intercom API. For example, Events can only be pushed into Intercom; it’s not possible to extract them again. So if you also plan to use Intercom for tracking the behavior of your users, keep that in mind, because, contrary to services like Mixpanel, it’s not possible to pull user events back out of the system.

A common workaround is to track user activity through a dedicated service that pushes it to all the destinations you need, so that you always retain access to your data.

Intercom exposes a RESTful web API, which means that we interact with its resources through HTTP verbs like POST and GET by using an HTTP client. Intercom also offers a number of SDKs built around an HTTP client for many popular languages.

Pull data from the Intercom REST API

A typical use case for pulling data out of Intercom is to fetch all your users together with all the conversations you have had with each one. You can then load this information into your data warehouse of choice and enrich your analytics with the interactions you have had with your customers. To do that, you first need to get all your users. With cURL you can do that in the following way:

SH
curl https://api.intercom.io/users \
  -u pi3243fa:da39a3ee5e6b4b0d3255bfef95601890afd80709 \
  -H 'Accept: application/json'

A typical result will look like this:

JSON
{
  "type": "user.list",
  "total_count": 105,
  "users": [
    {
      "type": "user",
      "id": "530370b477ad7120001d",
      ...
    },
    ...
  ],
  "pages": {
    "next": "https://api.intercom.io/users?per_page=50&page=2",
    "page": 1,
    "per_page": 50,
    "total_pages": 3
  }
}
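
The response is paginated, so to get every user you have to follow the pages.next link until it runs out. Below is a minimal sketch in Python using the requests library, assuming the same placeholder credentials as the cURL example:

PYTHON
import requests

# Placeholder credentials from the cURL example; use your own app id and key.
AUTH = ("pi3243fa", "da39a3ee5e6b4b0d3255bfef95601890afd80709")

def fetch_all_users():
    """Follow pages.next until every page of users has been fetched."""
    users = []
    url = "https://api.intercom.io/users"
    while url:
        response = requests.get(url, auth=AUTH,
                                headers={"Accept": "application/json"})
        response.raise_for_status()
        payload = response.json()
        users.extend(payload["users"])
        # pages.next is absent or null on the last page, which ends the loop.
        url = payload.get("pages", {}).get("next")
    return users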

Now we can also extract the full list of conversations that have taken place on Intercom by doing the following:

SH
curl 'https://api.intercom.io/conversations?type=admin&admin_id=25&open=true' \
  -u pi3243fa:da39a3ee5e6b4b0d3255bfef95601890afd80709 \
  -H 'Accept: application/json'

and a typical result will look like the following:

JSON
{
  "type": "conversation.list",
  "conversations": [
    {
      "type": "conversation",
      "id": "147",
      "created_at": 1400850973,
      "updated_at": 1400857494,
      "user": {
        "type": "user",
        "id": "536e564f316c83104c000020"
      },
      "assignee": {
        "type": "admin",
        "id": "25"
      },
      "conversation_message": {
        "type": "conversation_message",
        "subject": "",
        "body": "<p>Hi Alice,</p>\n\n<p>We noticed you using our Product, do you have any questions?</p>\n<p>- Jane</p>",
        "author": {
          "type": "admin",
          "id": "25"
        },
        "attachments": [
          {
            "name": "signature",
            "url": "http://someurl.com/signature.jpg"
          }
        ]
      }
    }
  ]
}

As we can see, each conversation contains a user object with an id, which lets us associate the conversations with the users we extracted earlier. Of course, to do that in our data warehouse, we have to map the above structures to the data model that the warehouse follows, respecting both the schema and the data types.

We can then write a pipeline that extracts the data, transforms it into the model of our warehouse, and loads it by following the instructions below. Of course, if something changes in the Intercom API, the pipeline will break, and we will have to take care of it.
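
As an illustration of such a transformation step, the sketch below flattens each conversation into a row keyed by the user id, ready to be serialized as newline-delimited JSON; the chosen columns are an assumption you would adapt to your own model:

PYTHON
import json

def conversation_to_row(conversation):
    """Flatten one Intercom conversation into a warehouse-ready dict."""
    message = conversation.get("conversation_message", {})
    return {
        "conversation_id": conversation["id"],
        "user_id": conversation["user"]["id"],  # joins back to the users we extracted
        "assignee_id": conversation.get("assignee", {}).get("id"),
        "created_at": conversation["created_at"],  # Unix epoch seconds
        "updated_at": conversation["updated_at"],
        "subject": message.get("subject"),
        "body": message.get("body"),
    }

def write_rows(conversations, path="conversations.json"):
    """Write rows as newline-delimited JSON, a format Snowflake can load."""
    with open(path, "w") as f:
        for conversation in conversations:
            f.write(json.dumps(conversation_to_row(conversation)) + "\n")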

Use webhooks to push events from Intercom to Snowflake

Intercom also supports webhooks, with which you can subscribe to certain events, and the system will push a message to your endpoint whenever those events are triggered. For example, you might define a webhook that pushes a message every time a new conversation takes place with one of your customers.

By doing this, it is possible to create a near real-time streaming load process for your data warehouse. To implement something like that, though, you must consider the limitations of both ends and ensure the delivery semantics that your use case requires from the data management infrastructure you will build.
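
As a minimal sketch of the receiving end, assuming Flask as the HTTP framework and a hypothetical load_to_snowflake routine built with one of the loading methods described below:

PYTHON
from flask import Flask, request

app = Flask(__name__)

@app.route("/intercom-webhook", methods=["POST"])
def intercom_webhook():
    """Accept an Intercom notification and hand it to the loading pipeline."""
    event = request.get_json(force=True)
    load_to_snowflake(event)  # hypothetical; see the loading section below
    return "", 200

def load_to_snowflake(event):
    # Placeholder: stage the payload (e.g. to S3) and COPY it into Snowflake.
    raise NotImplementedError

if __name__ == "__main__":
    app.run(port=8000)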

For more information, you can check the webhooks section on the Intercom API documentation.

Intercom Data Preparation for Snowflake

Before you start ingesting data into a Snowflake data warehouse instance, the first step is to have a well-defined schema for your data.

Data in Snowflake is organized around tables with a well-defined set of columns, with each one having a specific data type.

Snowflake supports a rich set of data types. It is worth mentioning that a number of semi-structured data types are also supported. With Snowflake, it is possible to load data directly in JSON, Avro, ORC, Parquet, or XML format. Hierarchical data is treated as a first-class citizen, similar to what Google BigQuery offers.
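
To give a sense of how that works, the sketch below creates a table with a single VARIANT column and queries a nested field with Snowflake's colon notation; it assumes the snowflake-connector-python package and placeholder credentials:

PYTHON
import snowflake.connector

# Placeholder credentials; fill in your own account details.
conn = snowflake.connector.connect(
    user="MY_USER", password="MY_PASSWORD", account="MY_ACCOUNT",
    warehouse="MY_WH", database="MY_DB", schema="PUBLIC",
)
cur = conn.cursor()

# A single VARIANT column can hold one raw Intercom JSON document per row.
cur.execute("CREATE TABLE IF NOT EXISTS raw_users (payload VARIANT)")

# Nested JSON fields are addressed directly with colon notation.
cur.execute("SELECT payload:id::string, payload:email::string FROM raw_users")
for user_id, email in cur:
    print(user_id, email)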

There is one notable common data type that Snowflake does not support: LOB, or large object. Instead, you should use a BINARY or VARCHAR type, although these types are not that useful for data warehouse use cases.

A typical strategy for loading data from Intercom to Snowflake is to create a schema where you map each API endpoint to a table.

Each key inside the Intercom API endpoint response should be mapped to a column of that table, and you should ensure the right conversion to a Snowflake data type.

Of course, since any data type in the Intercom API might change, you will have to adapt your database tables accordingly; there’s no such thing as automatic data type casting.
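
As a concrete sketch, a users table derived from the /users endpoint might look like this; the columns and types are assumptions to adapt to your own model, with connection setup as in the earlier sketch:

PYTHON
import snowflake.connector

# Connection setup as in the earlier sketch (placeholder credentials).
cur = snowflake.connector.connect(user="MY_USER", password="MY_PASSWORD",
                                  account="MY_ACCOUNT", database="MY_DB",
                                  schema="PUBLIC", warehouse="MY_WH").cursor()

# Hypothetical mapping of the /users endpoint onto typed columns.
cur.execute("""
    CREATE TABLE IF NOT EXISTS users (
        id            VARCHAR,
        email         VARCHAR,
        name          VARCHAR,
        created_at    TIMESTAMP_NTZ,  -- converted from Intercom's epoch seconds
        updated_at    TIMESTAMP_NTZ,
        session_count NUMBER
    )
""")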

After you have a complete and well-defined data model or schema for Snowflake, you can move forward and start loading data into the database.

Load data from Intercom to Snowflake

Usually, data is loaded into Snowflake in bulk, using the COPY INTO command. Files containing the data, usually in JSON format, are stored in a local file system or in Amazon S3 buckets. Then a COPY INTO command is invoked on the Snowflake instance, and the data is copied into the data warehouse.

The files can be pushed into a Snowflake stage using the PUT command before the COPY INTO command is invoked.
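
Sketched with snowflake-connector-python, that two-step flow might look like the following; the local file path and the single-VARIANT-column raw_users table are assumptions:

PYTHON
import snowflake.connector

# Connection setup as in the earlier sketches (placeholder credentials).
cur = snowflake.connector.connect(user="MY_USER", password="MY_PASSWORD",
                                  account="MY_ACCOUNT", database="MY_DB",
                                  schema="PUBLIC", warehouse="MY_WH").cursor()

# Stage the local newline-delimited JSON file into the table's internal stage.
cur.execute("PUT file:///tmp/users.json @%raw_users")

# Copy the staged file into the table; raw_users holds one VARIANT column.
cur.execute("""
    COPY INTO raw_users
    FROM @%raw_users
    FILE_FORMAT = (TYPE = 'JSON')
""")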

Another alternative is to upload the data to a service like Amazon S3, from where Snowflake can access it directly.

Finally, Snowflake offers a data loading wizard in its web interface, where you can visually set up and copy data into the warehouse. Just keep in mind that the wizard’s functionality is limited compared to the other methods.

Snowflake, in contrast to other technologies like Redshift, does not require a data schema to be packed together with the data that will be copied. Instead, the schema is part of the query that copies the data into the warehouse. This simplifies the data loading process and offers more flexibility in data type management.

Updating your Intercom data on Snowflake

As you generate more data on Intercom, you will need to update your older data on Snowflake. This includes both new records and older records that, for any reason, have been updated on Intercom.

You will have to periodically check Intercom for new data and repeat the process described previously, updating your currently available data if needed. Updating an existing row in a Snowflake table is achieved with UPDATE statements.
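
Snowflake also supports MERGE, which combines the insert and update cases in one statement. A minimal sketch, assuming freshly extracted records have been staged into a hypothetical users_staging table:

PYTHON
import snowflake.connector

# Connection setup as in the earlier sketches (placeholder credentials).
cur = snowflake.connector.connect(user="MY_USER", password="MY_PASSWORD",
                                  account="MY_ACCOUNT", database="MY_DB",
                                  schema="PUBLIC", warehouse="MY_WH").cursor()

# Upsert staged records into the main table, keyed on the Intercom id.
cur.execute("""
    MERGE INTO users AS t
    USING users_staging AS s ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET
        t.email = s.email, t.name = s.name, t.updated_at = s.updated_at
    WHEN NOT MATCHED THEN
        INSERT (id, email, name, created_at, updated_at)
        VALUES (s.id, s.email, s.name, s.created_at, s.updated_at)
""")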

Snowflake has a great tutorial on the different ways of handling updates, especially using primary keys.

Another issue that you must take care of is the identification and removal of any duplicate records in your database. Either because Intercom does not have a mechanism to identify new and updated records or because of errors in your own data pipelines, duplicate records might be introduced into your database.
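
One way to handle this, sketched below, is to keep only the most recent row per Intercom id using Snowflake's QUALIFY clause; the table and column names follow the earlier assumptions:

PYTHON
import snowflake.connector

# Connection setup as in the earlier sketches (placeholder credentials).
cur = snowflake.connector.connect(user="MY_USER", password="MY_PASSWORD",
                                  account="MY_ACCOUNT", database="MY_DB",
                                  schema="PUBLIC", warehouse="MY_WH").cursor()

# Rebuild the table, keeping only the most recent row per id.
cur.execute("""
    CREATE OR REPLACE TABLE users AS
    SELECT * FROM users
    QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) = 1
""")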

In general, ensuring the quality of data that is inserted in a specific database is a big and difficult issue.

The best way to load data from Intercom to Snowflake

So far, we have just scratched the surface of what you can do with Snowflake and how to load data into it. Things can get even more complicated if you want to integrate data coming from different sources.

Do you need results right now?

Instead of writing, hosting, and maintaining a flexible data infrastructure, use RudderStack, which can handle everything for you automatically.

With one click, RudderStack integrates with sources and services, creates analytics-ready data, and syncs your Intercom data to Snowflake right away.

Sign Up For Free And Start Sending Data
Test out our event stream, ELT, and reverse-ETL pipelines. Use our HTTP source to send data in less than 5 minutes, or install one of our 12 SDKs in your website or app.
Don't want to go through the pain of direct integration? RudderStack's Intercom integration makes it easy to send data from Intercom to Snowflake.