How to load data from Pipedrive to SQL Data Warehouse

How to Extract my data from Pipedrive?

Pipedrive exposes its complete platform to developers through its API. As a Web API following RESTful architecture principles, it can be accessed through HTTP.

Because it is a RESTful API, you can interact with it using tools like cURL or Postman, or using an HTTP client for your favorite language or framework.

Pipedrive API Authentication

Pipedrive API Authentication is API-Key-based. You acquire an API Key from the platform, and you can use it to authenticate to the API securely. All the calls are executed over secure HTTPS.

Pipedrive Rate Limiting

Rate limiting is applied per API token. The API allows 100 requests per 10 seconds.

Every API response includes the following headers:

  1. X-RateLimit-Limit: the number of requests the current API token can perform in the 10-second window.
  2. X-RateLimit-Remaining: the number of requests left in the 10-second window.
  3. X-RateLimit-Reset: the number of seconds before the limit resets.

If the limit is exceeded for the time window, the Pipedrive API will return an error response with HTTP code 429 and a Retry-After header that indicates the number of seconds before the limit resets.
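
A client can use these headers to decide how long to pause between requests. The following is a minimal sketch, not an official Pipedrive client: the function name `retryDelayMs` is hypothetical, and it assumes header names arrive lowercased (as most Node.js HTTP clients expose them) and that Retry-After is given in seconds, as described above.

```javascript
// Decide how long to wait (in milliseconds) before the next request,
// based on the rate-limit headers described above.
function retryDelayMs(status, headers) {
  // Read a header as a non-negative number of seconds, or null if absent.
  const seconds = (name) => {
    const value = Number(headers[name]);
    return Number.isFinite(value) && value >= 0 ? value : null;
  };

  if (status === 429) {
    // Limit exceeded: prefer Retry-After, fall back to X-RateLimit-Reset,
    // and finally to the full 10-second window.
    const wait = seconds('retry-after') ?? seconds('x-ratelimit-reset') ?? 10;
    return wait * 1000;
  }

  if (seconds('x-ratelimit-remaining') === 0) {
    // The budget for this window is spent: wait until it resets.
    return (seconds('x-ratelimit-reset') ?? 10) * 1000;
  }

  return 0; // Safe to send the next request immediately.
}
```

Calling `retryDelayMs(response.status, response.headers)` after each response and sleeping for the returned duration keeps a sequential extraction job within the limit.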

Endpoints and Available Resources

Pipedrive exposes a large number of endpoints through which we can interact with the platform. These endpoints can be used to execute commands, like adding a new person to our contact list, but also to pull data out of it. A distinctive characteristic of the Pipedrive API is that many resources have a companion resource that manages the custom fields you might have created for them. In this way, maximum flexibility is offered to the users of the platform. The list of available resources follows:

  • Activities: Activities are appointments, tasks, and events in general that can be associated with a deal and your sales pipeline.
  • Activity Fields: custom fields created for your activities.
  • Activity Types: user-defined types for your activities.
  • Authorization: Authorization objects can be fetched without an API token but using an email and password.
  • Currencies: Supported currencies that can represent the monetary value of a Deal or a value of any monetary type custom field.
  • Deals: Deals represent ongoing, lost, or won sales to an organization or a Person.
  • Deal Fields: DealFields represent the near-complete schema for a Deal in the context of the company of the authorized user.
  • Email Messages: EmailMessages represent e-mail messages sent or received through the Pipedrive-designated e-mail account.
  • Email Threads: EmailThreads represent e-mail message threads that contain individual e-mail messages.
  • Files: Files are documents of any kind (images, spreadsheets, text files, etc.) that are uploaded to Pipedrive.
  • Filters: Each filter is essentially a set of data validation conditions.
  • Goals: Goals help your team meet your sales targets.
  • Mail Messages: MailMessages represent mail messages synced with Pipedrive using the 2-way sync or the Smart Email BCC feature.
  • Mail Threads: MailThreads represent mail threads that contain individual mail messages.
  • Notes: Notes are pieces of textual (HTML-formatted) information that can be attached to Deals, Persons, and Organizations.
  • Note Fields: Custom fields for Notes.
  • Organizations: Organizations are companies and other kinds of organizations you are making Deals with.
  • Organization Fields: OrganizationFields represent the near-complete schema for an Organization in the authorized user’s company.
  • Persons: Persons are your contacts, the customers you are doing Deals with.
  • Person Fields: Custom fields for persons.
  • Pipelines: Pipelines are essentially ordered collections of Stages.
  • Products: Products are the goods or services you are dealing with.
  • Product Fields: ProductFields represent the near-complete schema for a Product.
  • Stages: Stage is a logical component of a Pipeline and essentially a bucket that can hold a number of Deals.
  • Users: Users are people with access to your Pipedrive account.

For a detailed list of all endpoints, together with a way to make requests to them without a client and see the data they return (if you have a Pipedrive account), please check the Pipedrive API documentation.

It is clear that with such a rich platform and API, the data that can be pulled out of Pipedrive is both valuable and plentiful. So, let’s assume that we want to pull all the persons out of Pipedrive to use the associated data for further analysis. To do so, we need to make a GET request with our favorite client to the Persons endpoint, like this:

BATCHFILE
GET https://api.pipedrive.com/v1/persons?start=0&api_token=YOUR_KEY

The response headers and the actual response will look like the following:

JSON
{
  "server": "nginx",
  "date": "Tue, 06 Sep 2016 15:46:38 GMT",
  "content-type": "application/json",
  "transfer-encoding": "chunked",
  "connection": "keep-alive",
  "x-frame-options": "SAMEORIGIN",
  "x-xss-protection": "1; mode=block",
  "x-ratelimit-limit": "100",
  "x-ratelimit-remaining": "99",
  "x-ratelimit-reset": "10",
  "access-control-allow-origin": "*"
}
JSON
{
  "success": true,
  "data": [
    {
      "id": 1,
      "company_id": 1180166,
      "owner_id": {
        "id": 1682699,
        "name": "Kostas",
        "email": "costas.pardalis@gmail.com",
        "has_pic": true,
        "pic_hash": "39bf355364aacbde4fdfed3cef8a4589",
        "active_flag": true,
        "value": 1682699
      },
      "org_id": null,
      "name": "Fotiz",
      "first_name": null,
      "last_name": "Fotiz",
      "open_deals_count": 0,
      "closed_deals_count": 0,
      "participant_open_deals_count": 0,
      "participant_closed_deals_count": 0,
      "email_messages_count": 0,
      "activities_count": 0,
      "done_activities_count": 0,
      "undone_activities_count": 0,
      "reference_activities_count": 0,
      "files_count": 0,
      "notes_count": 0,
      "followers_count": 1,
      "won_deals_count": 0,
      "lost_deals_count": 0,
      "active_flag": true,
      ……

Inside the response, there will be an array of objects, each representing one Person as stored in Pipedrive. Please note that all data is serialized as JSON.

After you have successfully pulled your data from the Pipedrive API, you are ready to extract and prepare it for SQL Data Warehouse. Of course, the above process covers only one of the available resources. If you would like to have a complete view of all the available data, then you will have to create a much more complex ETL process that includes the majority of the resources that Pipedrive has. Alternatively, you can check RudderStack, which can simplify the whole process so that your Pipedrive data is available for analysis in a matter of minutes.
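
A single request returns only one page of persons, so pulling all of them means repeating the call with a new `start` offset. The sketch below shows one way to do that. It assumes the v1 response carries an `additional_data.pagination` object with `more_items_in_collection` and `next_start` fields; verify this against the API documentation for your account. The helper names and the `PIPEDRIVE_API_TOKEN` environment variable are hypothetical, and `fetch` requires Node.js 18 or later.

```javascript
// Collect every Person by following offset pagination.
// `fetchPage(start)` must resolve to the parsed JSON body of
// GET /v1/persons?start=<start>&api_token=...
async function fetchAllPersons(fetchPage) {
  const persons = [];
  let start = 0;

  while (true) {
    const body = await fetchPage(start);
    if (!body.success) {
      throw new Error(`Pipedrive request failed at start=${start}`);
    }
    persons.push(...(body.data || [])); // data may be null when there are no records

    // Assumed shape: additional_data.pagination.{more_items_in_collection, next_start}
    const page = body.additional_data && body.additional_data.pagination;
    if (!page || !page.more_items_in_collection) {
      return persons;
    }
    start = page.next_start;
  }
}

// One possible fetchPage implementation (hypothetical helper).
// The token is read from an environment variable chosen for this example.
function pipedrivePage(start) {
  const token = process.env.PIPEDRIVE_API_TOKEN;
  const url = `https://api.pipedrive.com/v1/persons?start=${start}&api_token=${token}`;
  return fetch(url).then((res) => res.json());
}
```

With both pieces in place, `fetchAllPersons(pipedrivePage)` resolves to a single array of Person objects. Passing the page fetcher as a parameter keeps the loop easy to test with canned responses.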

How can I load my data from Pipedrive to SQL Data Warehouse?

SQL Data Warehouse supports numerous options for loading data, such as:

  • PolyBase
  • Azure Data Factory
  • BCP command-line utility
  • SQL Server Integration Services (SSIS)

As we are interested in loading data from online services by using their exposed HTTP APIs, we will not consider the BCP command-line utility or SQL Server Integration Services in this guide. Instead, we’ll consider the case of uploading our data as Azure Storage blobs and then using PolyBase to load it into SQL Data Warehouse.
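
PolyBase reads file formats such as delimited text rather than the nested JSON that Pipedrive returns, so the records need to be flattened before they are uploaded. The following is a rough sketch of that step under several assumptions: the column selection is a hypothetical subset of the Person fields shown earlier, a pipe character is used as the field delimiter, and delimiters and line breaks inside values are simply replaced with spaces instead of being escaped.

```javascript
// Columns to export: a hypothetical selection of Person fields.
const COLUMNS = ['id', 'name', 'first_name', 'last_name', 'owner_id', 'open_deals_count', 'active_flag'];

// Render a single value as one field of a pipe-delimited line.
function toField(value) {
  if (value === null || value === undefined) return '';
  if (typeof value === 'boolean') return value ? '1' : '0';
  // Nested references such as owner_id are objects; keep only their id.
  if (typeof value === 'object') return toField(value.id);
  // Replace the delimiter and line breaks so every record stays on one line.
  return String(value).replace(/[|\r\n]+/g, ' ');
}

// Turn an array of Person objects into pipe-delimited text, one record per line.
function personsToDelimited(persons, columns = COLUMNS) {
  return persons
    .map((person) => columns.map((col) => toField(person[col])).join('|'))
    .join('\n');
}
```

Writing the result of `personsToDelimited(persons)` to a local file produces something that can be uploaded as a blob. Array-valued fields are not handled here and would need their own flattening rule.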

Accessing these services happens through HTTP APIs. As we see again, APIs play an important role in both the extraction and the loading of data into our data warehouse. You can access these APIs by using a tool like cURL or Postman, or use the libraries provided by Microsoft for your favorite language.

Before you upload any data, you have to create a container, which is similar in concept to an Amazon S3 bucket. Creating a container is a straightforward operation, and you can do it by following the instructions found in the Blob storage documentation from Microsoft. As an example, the following code creates a container in Node.js.

JAVASCRIPT
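// Assumes the legacy azure-storage npm package, with credentials
// supplied in the AZURE_STORAGE_CONNECTION_STRING environment variable
const azure = require('azure-storage');
const blobSvc = azure.createBlobService();

blobSvc.createContainerIfNotExists('mycontainer', function (error, result, response) {
  if (!error) {
    // Container exists. No publicAccessLevel option was passed,
    // so it does not allow anonymous access.
  }
});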

After creating the container, you can start uploading data to it, again using the SDK of your choice, in a similar fashion:

JAVASCRIPT
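// Upload the local file test.txt as the blob 'myblob',
// using the same blobSvc client that created the container
blobSvc.createBlockBlobFromLocalFile('mycontainer', 'myblob', 'test.txt', function (error, result, response) {
  if (!error) {
    // file uploaded
  }
});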

When you are done putting your data into Azure Blobs, you can load it into SQL Data Warehouse using PolyBase. To do that, you should follow the directions in the Load with PolyBase documentation. In summary, the required steps to do it are the following:

  • create a database master key
  • create a database scoped credential
  • create an external file format
  • create an external data source

PolyBase’s ability to transparently parallelize loads from Azure Blob Storage makes it the fastest tool for loading data. After configuring PolyBase, you can load data directly into your SQL Data Warehouse by simply creating an external table that points to your data in storage and then mapping it to a new table within SQL Data Warehouse.
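
To make the steps above more concrete, the sketch below builds the corresponding T-SQL statements as strings in Node.js, ready to be sent through whichever SQL client you use. Everything named here is a hypothetical placeholder: the credential, data source, file format, and table names, the column list, and the assumption that the blobs hold pipe-delimited text. Check the exact options against the Load with PolyBase documentation before running anything.

```javascript
// Build the T-SQL statements for a PolyBase load from Azure Blob Storage.
// accessKey is interpolated as-is, so it must not contain single quotes.
function polybaseStatements({ account, container, accessKey, folder }) {
  return [
    // 1. Master key that protects the credential secret.
    'CREATE MASTER KEY;',
    // 2. Credential holding the storage account key (IDENTITY is not used for auth here).
    `CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
  WITH IDENTITY = 'user', SECRET = '${accessKey}';`,
    // 3. External data source pointing at the blob container.
    `CREATE EXTERNAL DATA SOURCE PipedriveBlobs
  WITH (TYPE = HADOOP,
        LOCATION = 'wasbs://${container}@${account}.blob.core.windows.net',
        CREDENTIAL = AzureStorageCredential);`,
    // 4. File format describing the delimited text files.
    `CREATE EXTERNAL FILE FORMAT PipeDelimitedText
  WITH (FORMAT_TYPE = DELIMITEDTEXT,
        FORMAT_OPTIONS (FIELD_TERMINATOR = '|'));`,
    // 5. External table that points at the data in storage.
    `CREATE EXTERNAL TABLE dbo.PersonsExternal (
  id INT, name NVARCHAR(255), first_name NVARCHAR(255), last_name NVARCHAR(255),
  owner_id INT, open_deals_count INT, active_flag BIT)
  WITH (LOCATION = '/${folder}/',
        DATA_SOURCE = PipedriveBlobs,
        FILE_FORMAT = PipeDelimitedText);`,
    // 6. New warehouse table populated from the external table.
    `CREATE TABLE dbo.Persons
  WITH (DISTRIBUTION = ROUND_ROBIN)
  AS SELECT * FROM dbo.PersonsExternal;`,
  ];
}
```

The first four statements are one-time setup, while the last two are the part that actually moves data and would be adapted for each resource you load.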

Of course, you will need to establish a recurring process that extracts any newly created data from your service, uploads it as Azure blobs, and initiates the PolyBase process to import the data into SQL Data Warehouse again. One way of doing this is by using the Azure Data Factory service. If you would like to follow this path, you can read the documentation on how to move data to and from Azure SQL Data Warehouse using Azure Data Factory.

What is the best way to load data from Pipedrive to SQL Data Warehouse? Which are the possible alternatives?

So far, we have just scratched the surface of what can be done with Microsoft Azure SQL Data Warehouse and how to load data into it. The way to proceed relies heavily on the data you want to load, which service it comes from, and the requirements of your use case. Things can get even more complicated if you want to integrate data coming from different sources. Instead of writing, hosting, and maintaining a flexible data infrastructure, a possible alternative is to use a product like RudderStack to handle this kind of problem automatically.

RudderStack integrates with multiple sources or services like databases, CRM, email campaigns, analytics, and more.
