How to load data from Xero to Google BigQuery

Extract data from Xero

Xero has an excellent API, or to be more precise, a number of APIs, and encourages developers to build applications that can be sold on their add-on marketplace. The APIs that they expose are the following:

  1. Xero Core (Accounting) API exposes accounting and related functions of the main Xero application. It can be used for various purposes, such as creating transactions like invoices and credit notes, right through to extracting accounting data via its reports endpoint.
  2. Xero Payroll API exposes payroll–related functions of Payroll in Xero and can be used for various purposes, such as syncing employee details, importing timesheets, etc.
  3. Files API – provides access to the files, folders, and the association of files within a Xero organization.
  4. Fixed Assets API – currently under review. This feature is not yet available, but users can vote for it to be made publicly available.
  5. Xero Practice Manager API – a recently released API, built on the WorkflowMax product, for managing workflows.

In this post, we’ll focus on the Xero Core (Accounting) API, which exposes the core accounting functionalities of the Xero product. The Xero API is a RESTful web service and uses the OAuth (v1.0a) protocol to authenticate 3rd-party applications. As a RESTful API, you can interact with it using cURL or Postman, or with an HTTP client for your favorite language or framework. A few suggestions (a minimal request sketch in Python follows the list):

  1. Apache HttpClient for Java
  2. Spray-client for Scala
  3. Hyper for Rust
  4. rest-client for Ruby
  5. http.client for Python
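
If you are working in Python, for example, a minimal sketch of such a request, using the requests and requests_oauthlib libraries, could look like the following. The endpoint URL is Xero’s Accounting API; the credential values are placeholders, and the exact OAuth 1.0a signing configuration (e.g. RSA signatures for private applications) depends on the application type you registered.

PYTHON
import requests
from requests_oauthlib import OAuth1

XERO_INVOICES_URL = "https://api.xero.com/api.xro/2.0/Invoices"

# Placeholder credentials -- substitute the keys and tokens of your own Xero app.
auth = OAuth1(
    client_key="YOUR_CONSUMER_KEY",
    client_secret="YOUR_CONSUMER_SECRET",
    resource_owner_key="YOUR_ACCESS_TOKEN",
    resource_owner_secret="YOUR_ACCESS_TOKEN_SECRET",
)

# Ask for JSON instead of the default text/xml response.
response = requests.get(
    XERO_INVOICES_URL,
    auth=auth,
    headers={"Accept": "application/json"},
)
response.raise_for_status()
invoices = response.json().get("Invoices", [])
print(f"Fetched {len(invoices)} invoices")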

As a product, and consequently an API, that has to deal with sensitive data, Xero takes security very seriously. For this reason, there are a number of different application types that can be developed to integrate with it; the main differences between them are how the application authenticates, how often tokens expire, and other security-related aspects. For more about the different application types, you can consult the application types guides in their documentation.

Xero API request limits

The Xero API enforces three different types of limits on its usage. It is extremely important to keep these in mind when developing against the API; they are a common source of headaches when building an infrastructure for extracting data out of it.

  1. Daily limit – 1,000 API calls per organization per day.
  2. Requests per minute – each OAuth access token can be used up to 60 times in any 60 second period. This rate limit is based on a rolling 60-second window.
  3. Request Size Limit – A single POST to the Accounting or Payroll APIs has a size limit of 5MB.

For more information about the API limitations, please consult the documentation for API limits.
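
To stay within these limits while extracting data, you will typically want some throttling and retry logic around your requests. The following Python sketch is purely illustrative: it keeps calls under a rolling 60-per-minute window and backs off when the API signals that a limit has been reached. The exact status codes and headers Xero returns in that situation should be checked against the current documentation.

PYTHON
import time
from collections import deque

import requests

MAX_CALLS_PER_MINUTE = 60
_call_times = deque()  # timestamps of recent calls within the rolling window


def throttled_get(url, auth, **kwargs):
    # Drop timestamps that have fallen outside the rolling 60-second window.
    now = time.monotonic()
    while _call_times and now - _call_times[0] > 60:
        _call_times.popleft()

    # If the window is full, sleep until the oldest call expires from it.
    if len(_call_times) >= MAX_CALLS_PER_MINUTE:
        time.sleep(60 - (now - _call_times[0]))

    _call_times.append(time.monotonic())
    response = requests.get(url, auth=auth, **kwargs)

    # Back off and retry once if the API reports that a limit was exceeded.
    if response.status_code in (429, 503):
        retry_after = int(response.headers.get("Retry-After", 60))
        time.sleep(retry_after)
        response = requests.get(url, auth=auth, **kwargs)
    return response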

Xero API Resources

The Xero API has a very rich data model with 31 resources. It is important to know that, by default, the response content type of API calls is text/xml, but you can override this and request JSON responses if preferred.

Requesting data from the Xero API

Let’s assume that you would like to retrieve all the invoices you have issued through Xero and put the information in a data warehouse you own to perform analytics and reporting. To do that, you should perform a GET request to the https://api.xero.com/api.xro/2.0/Invoices endpoint. A typical XML result from such a request looks like the following:

XML
<Invoices>
  <Invoice>
    <Type>ACCREC</Type>
    <Contact>
      <ContactID>025867f1-d741-4d6b-b1af-9ac774b59ba7</ContactID>
      <ContactStatus>ACTIVE</ContactStatus>
      <Name>City Agency</Name>
      <Addresses>
        <Address>
          <AddressType>STREET</AddressType>
        </Address>
        <Address>
          <AddressType>POBOX</AddressType>
          <AddressLine1>L4, CA House</AddressLine1>
          <AddressLine2>14 Boulevard Quay</AddressLine2>
          <City>Wellington</City>
          <PostalCode>6012</PostalCode>
        </Address>
      </Addresses>
      <Phones>
        <Phone>
          <PhoneType>DEFAULT</PhoneType>
        </Phone>
        <Phone>
          <PhoneType>DDI</PhoneType>
        </Phone>
        <Phone>
          <PhoneType>MOBILE</PhoneType>
        </Phone>
        <Phone>
          <PhoneType>FAX</PhoneType>
        </Phone>
      </Phones>
      <UpdatedDateUTC>2009-08-15T00:18:43.473</UpdatedDateUTC>
      <IsSupplier>false</IsSupplier>
      <IsCustomer>true</IsCustomer>
    </Contact>
    <Date>2009-05-27T00:00:00</Date>
    <DueDate>2009-06-06T00:00:00</DueDate>
    <Status>AUTHORISED</Status>
    <LineAmountTypes>Exclusive</LineAmountTypes>
    <LineItems>
      <LineItem>
        <Description>Onsite project management</Description>
        <Quantity>1.0000</Quantity>
        <UnitAmount>1800.00</UnitAmount>
        <TaxType>OUTPUT</TaxType>
        <TaxAmount>225.00</TaxAmount>
        <LineAmount>1800.00</LineAmount>
        <AccountCode>200</AccountCode>
        <Tracking>
          <TrackingCategory>
            <TrackingCategoryID>e2f2f732-e92a-4f3a-9c4d-ee4da0182a13</TrackingCategoryID>
            <Name>Activity/Workstream</Name>
            <Option>Onsite consultancy</Option>
          </TrackingCategory>
        </Tracking>
        <LineItemID>52208ff9-528a-4985-a9ad-b2b1d4210e38</LineItemID>
      </LineItem>
    </LineItems>
    <SubTotal>1800.00</SubTotal>
    <TotalTax>225.00</TotalTax>
    <Total>2025.00</Total>
    <UpdatedDateUTC>2009-08-15T00:18:43.457</UpdatedDateUTC>
    <CurrencyCode>NZD</CurrencyCode>
    <InvoiceID>243216c5-369e-4056-ac67-05388f86dc81</InvoiceID>
    <InvoiceNumber>OIT00546</InvoiceNumber>
    <Payments>
      <Payment>
        <Date>2009-09-01T00:00:00</Date>
        <Amount>1000.00</Amount>
        <PaymentID>0d666415-cf77-43fa-80c7-56775591d426</PaymentID>
      </Payment>
    </Payments>
    <AmountDue>1025.00</AmountDue>
    <AmountPaid>1000.00</AmountPaid>
    <AmountCredited>0.00</AmountCredited>
  </Invoice>
</Invoices>


It is possible to paginate your results by using the Xero API’s paging support, which is very useful when working with a large number of invoices. It is also possible to request only recently changed invoices by providing the ModifiedAfter filter on the GET request; this filter is passed as the If-Modified-Since HTTP header. Its value is a UTC timestamp in the format yyyy-mm-ddThh:mm:ss, e.g. 2009-11-12T00:00:00, and only invoices created or modified after this timestamp will be returned.
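
Putting paging and the ModifiedAfter filter together, an incremental extraction loop in Python might look like the sketch below. The auth object is assumed to be the OAuth1 credentials from the earlier example, and the timestamp value is purely illustrative.

PYTHON
import requests


def fetch_invoices_since(auth, modified_after="2009-11-12T00:00:00"):
    invoices, page = [], 1
    while True:
        response = requests.get(
            "https://api.xero.com/api.xro/2.0/Invoices",
            auth=auth,
            params={"page": page},
            headers={
                "Accept": "application/json",
                # The ModifiedAfter filter travels as the If-Modified-Since header.
                "If-Modified-Since": modified_after,
            },
        )
        response.raise_for_status()
        batch = response.json().get("Invoices", [])
        if not batch:
            break  # an empty page means there is nothing left to fetch
        invoices.extend(batch)
        page += 1
    return invoices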

Xero exposes a very rich API which offers you the opportunity to get very granular data about your accounting activities and use it for analytics and reporting purposes. This richness comes at a price, though: many resources have to be handled, and while some of them allow fetching incremental updates, others do not.

Prepare your Xero Data for Google BigQuery

Before you load any data into BigQuery, you should make sure that it is presented in a format supported by it. For example, if the API you pull data from returns XML, as Xero’s does by default, you first have to transform it into a serialization that BigQuery understands. Currently, two data formats are supported for loading:

  • CSV
  • JSON (newline-delimited)

You also need to make sure that the data types you are using are the ones supported by BigQuery, which are the following:

  • STRING
  • INTEGER
  • FLOAT
  • BOOLEAN
  • RECORD
  • TIMESTAMP

For more information, please check the Preparing Data for BigQuery page in the documentation.
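
As an illustration of such a transformation, the following Python sketch converts the XML invoice response shown earlier into newline-delimited JSON with BigQuery-friendly types. It maps only a handful of fields, and the column and file names are assumptions made for the example; a real pipeline would map the full resource, possibly into nested RECORD columns.

PYTHON
import json
import xml.etree.ElementTree as ET


def invoices_xml_to_ndjson(xml_text, out_path="invoices.json"):
    """Write one JSON object per line, as BigQuery expects for JSON loads."""
    root = ET.fromstring(xml_text)
    with open(out_path, "w") as out:
        for invoice in root.iter("Invoice"):
            row = {
                "invoice_id": invoice.findtext("InvoiceID"),          # STRING
                "invoice_number": invoice.findtext("InvoiceNumber"),  # STRING
                "status": invoice.findtext("Status"),                 # STRING
                "total": float(invoice.findtext("Total", "0")),       # FLOAT
                "currency_code": invoice.findtext("CurrencyCode"),    # STRING
                "updated_utc": invoice.findtext("UpdatedDateUTC"),    # TIMESTAMP
            }
            out.write(json.dumps(row) + "\n")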

Load Data from Xero to Google BigQuery

If you want to load Xero data to Google BigQuery, you have to use one of the following supported data sources.

  1. Google Cloud Storage
  2. Send data directly to BigQuery with a POST request
  3. Google Cloud Datastore Backup
  4. Streaming insert
  5. App Engine log files
  6. Cloud Storage logs

From the above list of sources, 5 and 6 are not applicable in our case.

For Google Cloud Storage, you first must load your data into it. There are a few options for doing this: for example, you can use the console directly, as described here, keeping the best practices in mind. Another option is to post your data through the JSON API; APIs play an important role in both the extraction and the loading of data into our data warehouse. In its simplest case, it’s just a matter of one HTTP POST request, using a tool like cURL or Postman. It should look like the following example.

HTTP
POST /upload/storage/v1/b/myBucket/o?uploadType=media&name=myObject HTTP/1.1
Host: www.googleapis.com
Content-Type: application/text
Content-Length: number_of_bytes_in_file
Authorization: Bearer your_auth_token

your Xero data

If everything went OK, you should get something like the following as a response from the server:

HTTP
HTTP/1.1 200
Content-Type: application/json

{
  "name": "myObject"
}

Working with cURL or Postman is good only for testing. If you would like to automate loading data into Google BigQuery, you should write some code to send your data to Google Cloud Storage. In case you are developing on Google App Engine, you can use the client library that is available for the languages it supports:

  1. Python
  2. Java
  3. PHP
  4. Go

If you are using one of the above languages and are not coding for Google App Engine, you can still access Cloud Storage from your environment. Interacting with such a feature-rich product as Cloud Storage can become quite complicated, depending on your use case. For more details on the different options that exist, you can check the Google Cloud Storage documentation. If you are looking for a less hands-on way of using Cloud Storage, you can consider a solution like RudderStack.
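
For example, if you are working in Python outside of App Engine, a minimal sketch of such an upload with the google-cloud-storage client library could look like the following. The bucket and object names are placeholders, and authentication is assumed to come from your environment’s default Google Cloud credentials.

PYTHON
from google.cloud import storage


def upload_to_gcs(local_path="invoices.json",
                  bucket_name="my-xero-bucket",
                  object_name="xero/invoices.json"):
    # Uses the application default credentials of the environment.
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_name)
    blob.upload_from_filename(local_path, content_type="application/json")
    return f"gs://{bucket_name}/{object_name}"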

After you have loaded your data into Cloud Storage, you have to create a load job for BigQuery to import it. The job points to the source data in Cloud Storage that has to be imported; this happens by providing source URIs that identify the appropriate objects.
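
A minimal sketch of such a load job, using the google-cloud-bigquery Python client and the newline-delimited JSON object uploaded earlier, might look like this; the project, dataset, and table names are placeholders, and schema auto-detection is used only to keep the example short.

PYTHON
from google.cloud import bigquery


def load_invoices_from_gcs(uri="gs://my-xero-bucket/xero/invoices.json",
                           table_id="my-project.xero.invoices"):
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,  # in production, define an explicit schema instead
    )
    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()  # block until the load job completes
    return client.get_table(table_id).num_rows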

The previous method described a POST request to the Google Cloud Storage API for storing the data there and then loading it into BigQuery. Another way to go is to make a direct HTTP POST request to BigQuery with the data you would like to load. This approach is similar to how we loaded data to Google Cloud Storage through the JSON API, but it uses the appropriate endpoints of BigQuery to load the data there directly. The way to interact with it is quite similar.
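
As a sketch of this direct approach, the google-cloud-bigquery client wraps BigQuery’s streaming insert endpoint, so you can push rows without going through Cloud Storage first. The table name and the row contents below are placeholders.

PYTHON
from google.cloud import bigquery


def stream_invoices(rows, table_id="my-project.xero.invoices"):
    client = bigquery.Client()
    # insert_rows_json issues the underlying tabledata.insertAll request.
    errors = client.insert_rows_json(table_id, rows)
    if errors:
        raise RuntimeError(f"BigQuery rejected some rows: {errors}")


# Example usage with a single, illustrative row:
stream_invoices([{
    "invoice_id": "243216c5-369e-4056-ac67-05388f86dc81",
    "invoice_number": "OIT00546",
    "status": "AUTHORISED",
    "total": 2025.00,
    "currency_code": "NZD",
    "updated_utc": "2009-08-15T00:18:43.457",
}])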

More information can be found in the Google BigQuery API Reference and on the page that describes how you can load data into BigQuery using POST. You can interact with it using the HTTP client library of the language or framework of your choice; any of the clients suggested earlier in this post will work.

The best way to load data from Xero to BigQuery

So far, we have just scratched the surface of what you can do with BigQuery and how to load data into it. Things can get even more complicated if you want to integrate data coming from different sources.

Instead of writing, hosting, and maintaining a flexible data infrastructure, you can use RudderStack, which handles everything automatically for you.

RudderStack, with one click, integrates with sources or services, creates analytics-ready data, and syncs your Xero data to BigQuery right away.

Sign Up For Free And Start Sending Data
Test out our event stream, ELT, and reverse-ETL pipelines. Use our HTTP source to send data in less than 5 minutes, or install one of our 12 SDKs in your website or app.
Don't want to go through the pain of direct integration? RudderStack's Xero integration makes it easy to send data from Xero to Google BigQuery.