How to load data from Xero to PostgreSQL

Extract your data from Xero

Xero has an excellent API, or to be more precise, a number of APIs, and encourages developers to build applications that can be sold on their add-on marketplace. The APIs that they expose are the following:

  • Xero Core (Accounting) API - exposes accounting and related functions of the main Xero application and can be used for various purposes, such as creating transactions like invoices and credit notes, right through to extracting accounting data via our reports endpoint.
  • Xero Payroll API exposes payroll-related functions of Payroll in Xero and can be used for various purposes, such as syncing employee details, importing timesheets, etc.
  • Files API – provides access to the files, folders, and the association of files within a Xero organization.
  • Fixed Assets API – which is under review. This feature is not yet available, but users can vote for it to become publicly available.
  • Xero Practice Manager API – a recently released product built on the WorkflowMax product. Which is an API for managing workflows

In this post, we’ll focus on the Xero Core (Accounting) API, which exposes the core accounting functionalities of the Xero product. The Xero API is a RESTful web service and uses the OAuth (v1.0a) protocol to authenticate 3rd party applications. As an API, you can interact by using tools like CURL, Postman, or using HTTP clients for your favorite language or framework. A few suggestions:

  • Apache HttpClient for Java
  • Spray-client for Scala
  • Hyper for Rust
  • Ruby rest-client
  • Python http-client

As a product and consequently an API that has to deal with sensitive data, Xero API takes really good care of security. For this reason, there are a number of different applications that can be developed and integrate with it, where the main difference is how the application authenticates, how often the tokens expire, and in general security-related aspects.

For more about the different application types, you can consult the application types guides on their documentation.

Xero API requests limits

The Xero API has three different types of limits that enforce the usage of their API. It’s extremely important to keep those in mind when developing against its API and a reason for many headaches when someone attempts to build an infrastructure for extracting data from it.

  • Daily limit – of 1000 API calls per organization.
  • Requests per minute – each OAuth access token can be used up to 60 times in any 60 second period. This rate limit is based on a rolling 60-second window.
  • Request Size Limit – A single POST to the Accounting or Payroll APIs has a size limit of 5MB.

For more information about the API limitations, please consult the documentation for API limits.

Xero API Resources

The Xero API has a very rich data model of 31 resources. It is important to know that by default, the response type of the API calls is of type text/xml, but you can override this option and request JSON responses if preferred.

Requesting data from the Xero API

Let’s assume that you would like to retrieve all the invoices that you have issued through Xero and put the information in your data warehouse to perform analytics and reporting. To do that you should perform a GET request to the https://api.xero.com/api.xro/2.0/Invoices endpoint. A typical result, in XML, from performing such action is like the following:

JAVASCRIPT
<Invoices>
<Invoice>
<Type>ACCREC</Type>
<Contact>
<ContactID>025867f1-d741-4d6b-b1af-9ac774b59ba7</ContactID>
<ContactStatus>ACTIVE</ContactStatus>
<Name>City Agency</Name>
<Addresses>
<Address>
<AddressType>STREET</AddressType>
</Address>
<Address>
<AddressType>POBOX</AddressType>
<AddressLine1>L4, CA House</AddressLine1>
<AddressLine2>14 Boulevard Quay</AddressLine2>
<City>Wellington</City>
<PostalCode>6012</PostalCode>
</Address>
</Addresses>
<Phones>
<Phone>
<PhoneType>DEFAULT</PhoneType>
</Phone>
<Phone>
<PhoneType>DDI</PhoneType>
</Phone>
<Phone>
<PhoneType>MOBILE</PhoneType>
</Phone>
<Phone>
<PhoneType>FAX</PhoneType>
</Phone>
</Phones>
<UpdatedDateUTC>2009-08-15T00:18:43.473</UpdatedDateUTC>
<IsSupplier>false</IsSupplier>
<IsCustomer>true</IsCustomer>
</Contact>
<Date>2009-05-27T00:00:00</Date>
<DueDate>2009-06-06T00:00:00</DueDate>
<Status>AUTHORISED</Status>
<LineAmountTypes>Exclusive</LineAmountTypes>
<LineItems>
<LineItem>
<Description>Onsite project management </Description>
<Quantity>1.0000</Quantity>
<UnitAmount>1800.00</UnitAmount>
<TaxType>OUTPUT</TaxType>
<TaxAmount>225.00</TaxAmount>
<LineAmount>1800.00</LineAmount>
<AccountCode>200</AccountCode>
<Tracking>
<TrackingCategory>
<TrackingCategoryID>e2f2f732-e92a-4f3a9c4d-ee4da0182a13</TrackingCategoryID>
<Name>Activity/Workstream</Name>
<Option>Onsite consultancy</Option>
</TrackingCategory>
</Tracking>
<LineItemID>52208ff9-528a-4985-a9ad-b2b1d4210e38</LineItemID>
</LineItem>
</LineItems>
<SubTotal>1800.00</SubTotal>
<TotalTax>225.00</TotalTax>
<Total>2025.00</Total>
<UpdatedDateUTC>2009-08-15T00:18:43.457</UpdatedDateUTC>
<CurrencyCode>NZD</CurrencyCode>
<InvoiceID>243216c5-369e-4056-ac67-05388f86dc81</InvoiceID>
<InvoiceNumber>OIT00546</InvoiceNumber>
<Payments>
<Payment>
<Date>2009-09-01T00:00:00</Date>
<Amount>1000.00</Amount>
<PaymentID>0d666415-cf77-43fa-80c7-56775591d426</PaymentID>
</Payment>
</Payments>
<AmountDue>1025.00</AmountDue>
<AmountPaid>1000.00</AmountPaid>
<AmountCredited>0.00</AmountCredited>
</Invoice>
</Invoices>

It is possible to paginate your results by using the paging support of the Xero API, which is very useful when you have to work with a large number of invoices. Also, it is possible to request from the API only the latest invoices. This is done by providing the “Modified After” parameter on the GET request to the API.

The ModifiedAfter filter is an HTTP header: ‘If-Modified-Since‘.

A UTC timestamp (yyyy-mm-ddThh:mm:ss) . Only invoices created or modified since this timestamp will be returned e.g. 2009-11-12T00:00:00.

Xero exposes a very rich API which offers you the opportunity to get very granular data about your accounting activities and use it for analytics and reporting purposes. This richness comes with a price, though many resources have to be handled where some of them allow fetching updates and some others not.

Xero Data Preparation for PostgreSQL

To populate a PostgreSQL database instance with data, you need to have a well-defined data model or schema that describes the data. As a relational database, PostgreSQL organizes data around tables.

Each table is a collection of columns with a predefined data type like an integer or VARCHAR. PostgreSQL, like any other SQL database, supports a wide range of different data types.

A typical strategy for loading data from Xero to a PostgreSQL database is to create a schema where you will map each API endpoint to a table. Each key inside the Xero API endpoint response should be mapped to a column of that table, and you should ensure the right conversion to a PostgreSQL compatible data type. For example, if an endpoint from Xero returns a value as String, you should convert it into a VARCHAR with a predefined max size or TEXT data type. Tables can then be created on your database using the CREATE SQL statement.

Of course, you will need to ensure that as the data types from the Xero API might change, you will adapt your database tables accordingly. There’s no such thing as automatic data type casting.

After you have a complete and well-defined data model or schema for PostgreSQL, you can move forward and start loading your data into the database.

Load data from Xero to PostgreSQL

Once you have defined your schema and created your tables with the proper data types, you can start loading data into your database.

The most straightforward way to insert data into a PostgreSQL database is by creating and executing INSERT statements. With INSERT statements, you will be adding data row-by-row directly to a table. It is the most basic and straightforward way of adding data into a table, but it doesn’t scale very well with larger datasets.

The preferred way for adding larger datasets into a PostgreSQL database is by using the COPY command. COPY is copying data from a file on a file system accessible by the PostgreSQL instance. In this way, much larger datasets can be inserted into the database in less time.

You should also consult the documentation of PostgreSQL on how to populate a database with data. It includes a number of very useful best practices on optimizing the process of loading data into your PostgreSQL database.

COPY requires physical access to a file system to load data. Nowadays, with cloud-based, fully managed databases, getting direct access to a file system is not always possible.

If this is the case and you cannot use a COPY statement, then another option is to use PREPARE and INSERT to end up with optimized and more performant INSERT queries.

Updating your Xero data on PostgreSQL

As you will be generating more data on Xero, you will need to update your older data on PostgreSQL. This includes new records and updates to older records that have been updated on Xero for any reason.

You will need to periodically check Xero for new data and repeat the previously described process while updating your currently available data if needed. Updating an already existing row on a PostgreSQL table is achieved by creating UPDATE statements.

Another issue that you need to take care of is identifying and removing any duplicate records on your database. Either because Xero does not have a mechanism to identify new and updated records or because of errors on your data pipelines, duplicate records might be introduced to your database.

In general, ensuring the quality of the data inserted in your database is a big and difficult issue, and PostgreSQL features like TRANSACTIONS can help tremendously. However, they do not solve the problem in the general case.

The best way to load data from Xero to PostgreSQL

So far, we just scraped the surface of what you can do with PostgreSQL and how to load data into it. Things can get even more complicated if you want to integrate data coming from different sources.

Are you striving to achieve results right now?

Instead of writing, hosting, and maintaining a flexible data infrastructure, RudderStack can handle everything automatically for you.

RudderStack, with one click, integrates with sources or services, creates analytics-ready data, and syncs your Xero to PostgreSQL right away.

Sign Up For Free And Start Sending Data
Test out our event stream, ELT, and reverse-ETL pipelines. Use our HTTP source to send data in less than 5 minutes, or install one of our 12 SDKs in your website or app.
Don't want to go through the pain of direct integration? RudderStack's Xero integration makes it easy to send data from Xero to PostgreSQL.