How to load data from the HubSpot to PostgreSQL

Extract your data from HubSpot

HubSpot APIs are following the REST architecture, that can be accessed through HTTP. As a RESTful API, interacting with it can be achieved by using tools like CURL or Postman or by using HTTP clients for your favorite language or framework.

A few suggestions:

  • Apache HttpClient for Java
  • Spray-client for Scala
  • Hyper for Rust
  • Ruby rest-client
  • Python http-client

Responses of the API are all in JSON, including errors although HTTP codes are also returned to indicate errors. HubSpot to maintain some official SDKs for their APIs, like the hapipy client for python, also unofficial clients can be found.

HubSpot API Authentication

The API of HubSpot allows two types of authentication. OAuth and API keys. HubSpot encourages the use of oAuth for any serious integration and suggests using basic authentication with API keys only for testing and rapid prototyping purposes.

HubSpot API Rate Limiting

HubSpot public endpoints are powered by the same underlying technology that powers the core HubSpot application. As a result, HubSpot engineering closely monitors the usage of the public APIs to ensure a quality experience for users of the HubSpot applications.

Below, you’ll find the limits by which a single integration (as identified by an access token) can consume HubSpot public APIs.

  1. Integrations cannot consume HubSpot API at a rate greater than 10 requests/second.
  2. Polling of HubSpot APIs should occur at intervals of 5 minutes or more.
  3. Total requests to HubSpot APIs should not exceed 10,000 in a 24 hour period.
  4. Failed requests to HubSpot APIs may not exceed 5% of total syncs.
  5. All data passed to HubSpot must be properly encoded, and use application/json formatting.
  6. Integrations should use HubSpot’s OAuth protocol.
  7. Integrators must store time-to-live (TTL) data for OAuth access tokens as well as refresh tokens. Unauthorized (401) requests are not a valid indicator that a new access token must be retrieved.
  8. Integrators should use their own public and documented APIs when working with HubSpot APIs.
  9. We reserve the right to change or deprecate the APIs over time – we will provide developers ample notification in those cases.

Endpoints and Available Resources

HubSpot API is actually a collection of a large number of APIs. Something that makes sense if we consider that HubSpot is actually more than one product and each one of these products is quite complex on its own. The APIs are the following:

  • Calendar API. Anything that has to do with calendars in HubSpot and their possible operations.
  • Companies APIs. When we talk about business and customers we talk about companies, so here is your API for interacting with companies.
  • Companies properties APIs. Companies are important for HubSpot and there’s a lot of functionality around them so there’s an auxiliary API just for working with properties of companies.
  • Contacts APIs. Contacts are the fundamental building block to HubSpot – they store lead-specific data that makes it possible to leverage much of the functionality in HubSpot, from marketing automation, to lead scoring to smart content.
  • Contact Lists APIs. API for managing the lists of your contacts.
  • Contact properties APIs. Similar to companies, this API allows you to interact with the properties of your contacts.
  • COS Blog API. Interact with Blogs through the HubSpot platform.
  • COS Blog Authors API. API for interacting with the authors of your blogs.
  • COS Blog Comments API. Exposes functionality about the comments of your APIs.
  • COS Blog Posts API. Anything related to the posts of your Blog.
  • COS Blog Topics API. Manage the topics of your Blog.
  • COS Domains API. Manage your Domains through this API.
  • COS Files API. Operations regarding the file management inside HubSpot.
  • COS Layouts API. Manage the layouts of your pages through this API.
  • COS Page Publishing API. Operations related to publishing content through HubSpot platform.
  • COS Sitemaps API. Operations related to the management of sitemaps for the sites you create through the HubSpot platform.
  • COS Templates API. API for managing the templates of your sites.
  • COS URL Mappings API. Operations related to URL mappings for the sites inside HubSpot.
  • Deals API. Anything that has to do with deals inside your CRM.
  • Deal Pipelines API. Manage the sales pipelines through this API.
  • Deals Properties API. Again, manage the properties of your deals.
  • Email API. Anything about emailing from within the HubSpot platform.
  • Email Events API. Track and interact with events that happen inside emails.
  • Engagements API. Anything related to customer engagement inside the HubSpot platform.
  • Events API. Event handling for HubSpot.
  • Forms API. Manage custom forms that you create inside the platform.
  • Keywords API. Operations related to keywords for SEO.
  • Owners API. Anything about the Owner.
  • Social Media API. API for interacting with Social media through the HubSpot platform.
  • Transactional Email API. The transactional email functionality of HubSpot.
  • Workflows API. Define and manage sales and marketing workflows.

From all the above endpoints we can pull data out of the platform, so it is easy to understand the richness of the data we can get from a platform like HubSpot. Let’s assume, as an example, that we want to get all the Deals data. By executing a GET request like this GET /deals/v1/deal/recent/modified we can get all the recently modified deals. The parameters that we can pass to the call are the following:

  • count: for specifying the number of results per page of the response.
  • offset: for paginating through all available results.
  • since: a timestamp for defining from which exact time you would like to fetch data from.

As we said earlier, results from HubSpot’s API are always in JSON, so if we successfully execute the above query we’ll get the following results back:

JSON
{
"results": [
{
"portalId": 62515,
"dealId": 1030663,
"isDeleted": false,
"associations": {
"associatedVids": [
27316
],
"associatedCompanyIds": [
],
"associatedDealIds": [
]
},
"properties": {
"dealstage": {
"value": "closedwon",
"timestamp": 1417686612442,
"source": "API",
"sourceId": null,
………

The API offers you the opportunity to get very granular data about your accounting activities and use it for analytic and reporting purposes.

HubSpot Data Preparation for PostgreSQL

To populate a PostgreSQL database instance with data, first, you need to have a well-defined data model or schema that describes the data. As a relational database, PostgreSQL organizes data around tables.

Each table is a collection of columns with a predefined data type as an integer or VARCHAR. PostgreSQL, like any other SQL database, supports a wide range of different data types.

A typical strategy for loading data from HubSpot to PostgreSQL is to create a schema where you will map each API endpoint to a table. Each key inside the API of HubSpot endpoint response should be mapped to a column of that table and you should ensure the right conversion to a PostgreSQL compatible data type.

For example, if an endpoint from HubSpot returns a value as String, you should convert it into a VARCHAR with a predefined max size or TEXT data type. tables can then be created on your database using the CREATE SQL statement.

Of course, you will need to ensure that as the data types from HubSpot’s API might change, you will adapt your database tables accordingly, there’s no such thing as automatic data typecasting.

After you have a complete and well-defined data model or schema for Postgres, you can move forward and start loading your data into the database.

Load data from HubSpot to PostgreSQL

types, you can start loading data into your database.

The most straightforward way to insert data into a PostgreSQL database is by creating and executing INSERT statements. With INSERT statements, you will be adding data row-by-row directly to a table. It is the most basic and straightforward way of adding data into a table but it doesn’t scale very well with larger data sets.

The preferred way for adding larger datasets into a PostgreSQL database is by using the COPY command. COPY is copying data from a file on a file system that is accessible by the PostgreSQL instance, in this way much larger datasets can be inserted into the database in less time.

You should also consult the documentation of PostgreSQL on how to populate a database with data. It includes a number of very useful best practices on how to optimize the process of loading data into your PostgreSQL database.

COPY requires physical access to a file system in order to load data. Nowadays, with cloud-based, fully managed databases, getting direct access to a file system is not always possible.

If this is the case and you cannot use a COPY statement, then another option is to use PREPARE together with INSERT, to end up with optimized and more performant INSERT queries.

Updating your HubSpot data on PostgreSQL

As you will be generating more data on HubSpot, you will need to update your older data on PostgreSQL. This includes new records together with updates to older records that for any reason have been updated on HubSpot.

You will need to periodically check HubSpot for new data and repeat the process that has been described previously while updating your currently available data if needed. Updating an already existing row on a PostgreSQL table is achieved by creating UPDATE statements.

Another issue that you need to take care of is the identification and removal of any duplicate records on your database. Either because HubSpot does not have a mechanism to identify new and updated records or because of errors on your data pipelines, duplicate records might be introduced to your database.

In general, ensuring the quality of the data that is inserted in your database is a big and difficult issue and PostgreSQL features like TRANSACTIONS can help tremendously, although they do not solve the problem in the general case.

The best way to load data from HubSpot to PostgreSQL

So far we just scraped the surface of what you can do with PostgreSQL and how to load data into it. Things can get even more complicated if you want to integrate data coming from different sources.

Are you striving to achieve results right now?

Instead of writing, hosting, and maintaining a flexible data infrastructure use RudderStack that can handle everything automatically for you.

RudderStack with one click integrates with sources or services, creates analytics-ready data, and syncs your HubSpot to PostgreSQL right away.

Sign Up For Free And Start Sending Data
Test out our event stream, ELT, and reverse-ETL pipelines. Use our HTTP source to send data in less than 5 minutes, or install one of our 12 SDKs in your website or app.
Don't want to go through the pain of direct integration? RudderStack's HubSpot integration makes it easy to send data from HubSpot to PostgreSQL.