How to load data from Xero to Snowflake

Extract your data from Xero

Xero has an excellent API, or to be more precise a number of APIs, and encourages developers to build applications that can be sold on their add-on marketplace. The APIs that they expose are the following:

  • Xero Core (Accounting) API – exposes accounting and related functions of the main Xero application and can be used for a variety of purposes, from creating transactions such as invoices and credit notes to extracting accounting data via the reports endpoint.
  • Xero Payroll API – exposes payroll-related functions of Payroll in Xero and can be used for a variety of purposes such as syncing employee details, importing timesheets, etc.
  • Files API – provides access to the files, folders, and the association of files within a Xero organization.
  • Fixed Assets API – currently under review. It is not yet publicly available, but users can vote for it to be released.
  • Xero Practice Manager API – an API for managing workflows, exposed by Xero Practice Manager, a recently released product built on WorkflowMax.

In this post, we’ll focus on the Xero Core (Accounting) API, which exposes the core accounting functionality of the Xero product. The Xero API is a RESTful web service and uses the OAuth (v1.0a) protocol to authenticate third-party applications. As with any RESTful API, you can interact with it using tools like cURL or Postman, or using an HTTP client for your favorite language or framework. A few suggestions:

  • Apache HttpClient for Java
  • Spray-client for Scala
  • Hyper for Rust
  • Ruby rest-client
  • Python http.client

Because the product, and consequently the API, deals with sensitive data, Xero takes security very seriously. For this reason, there are several types of applications that can be developed to integrate with it. They differ mainly in how the application authenticates, how often tokens expire, and other security-related aspects.

For more about the different application types, you can consult the application types guides in their documentation.

Xero API requests limits

The Xero API enforces three different types of usage limits. It’s extremely important to keep them in mind when developing against the API, as they are a common source of headaches for anyone building infrastructure to extract data from it.

  • Daily limit – 1000 API calls per organization per day.
  • Requests per minute – each OAuth access token can be used up to 60 times in any 60-second period. This rate limit is based on a rolling 60-second window.
  • Request Size Limit – A single POST to the Accounting or Payroll APIs has a size limit of 5MB.

For more information about the API limitations, please consult the documentation for API limits.
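To stay under the per-minute limit, a client can track the timestamps of its recent calls and pause when the rolling window is full. The following is a minimal sketch of that idea, not Xero's own tooling. The class name `RollingWindowLimiter` and its parameters are hypothetical, and the daily limit would need a separate counter that is not shown here.

```python
import time
from collections import deque


class RollingWindowLimiter:
    """Allow at most `max_calls` calls in any rolling `window` seconds."""

    def __init__(self, max_calls=60, window=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock      # injectable so the limiter can be tested
        self.sleep = sleep
        self.calls = deque()    # timestamps of recent calls, oldest first

    def wait_time(self):
        """Return the seconds to wait before the next call is allowed."""
        now = self.clock()
        # Forget calls that have already left the rolling window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.window - (now - self.calls[0])

    def acquire(self):
        """Block until a call is allowed, then record it."""
        delay = self.wait_time()
        if delay > 0:
            self.sleep(delay)
            self.wait_time()    # purge entries that expired while sleeping
        self.calls.append(self.clock())
```

Calling `limiter.acquire()` right before each API request keeps a single access token within 60 calls per rolling minute, assuming all requests for that token go through the same limiter instance.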

Xero API Resources

The Xero API has a very rich data model of 31 resources. It is important to know that, by default, API responses are of type text/xml, but you can override this and request JSON responses if you prefer.
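One common way to ask a REST API for JSON is the standard HTTP `Accept` header. The sketch below builds such a request with Python's standard library, without sending it. We assume here that Xero honors `Accept: application/json`. The `auth_headers` argument is a hypothetical placeholder for whatever headers your OAuth flow produces.

```python
import urllib.request

# Base URL of the Accounting API; "Invoices" is one of its resources.
XERO_API_BASE = "https://api.xero.com/api.xro/2.0/"


def build_json_request(resource, auth_headers):
    """Build (but do not send) a GET request that asks for a JSON response."""
    headers = dict(auth_headers)            # hypothetical: output of your OAuth flow
    headers["Accept"] = "application/json"  # override the default text/xml
    return urllib.request.Request(
        XERO_API_BASE + resource, headers=headers, method="GET"
    )
```

The returned object can be passed to `urllib.request.urlopen` once real authentication headers are in place.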

Requesting data from the Xero API

Let’s assume that you would like to retrieve all the invoices that you have issued through Xero and put the information in your data warehouse to perform analytics and reporting. To do that, you should perform a GET request to the https://api.xero.com/api.xro/2.0/Invoices endpoint. A typical XML response looks like the following:

<Invoices>
  <Invoice>
    <Type>ACCREC</Type>
    <Contact>
      <ContactID>025867f1-d741-4d6b-b1af-9ac774b59ba7</ContactID>
      <ContactStatus>ACTIVE</ContactStatus>
      <Name>City Agency</Name>
      <Addresses>
        <Address>
          <AddressType>STREET</AddressType>
        </Address>
        <Address>
          <AddressType>POBOX</AddressType>
          <AddressLine1>L4, CA House</AddressLine1>
          <AddressLine2>14 Boulevard Quay</AddressLine2>
          <City>Wellington</City>
          <PostalCode>6012</PostalCode>
        </Address>
      </Addresses>
      <Phones>
        <Phone>
          <PhoneType>DEFAULT</PhoneType>
        </Phone>
        <Phone>
          <PhoneType>DDI</PhoneType>
        </Phone>
        <Phone>
          <PhoneType>MOBILE</PhoneType>
        </Phone>
        <Phone>
          <PhoneType>FAX</PhoneType>
        </Phone>
      </Phones>
      <UpdatedDateUTC>2009-08-15T00:18:43.473</UpdatedDateUTC>
      <IsSupplier>false</IsSupplier>
      <IsCustomer>true</IsCustomer>
    </Contact>
    <Date>2009-05-27T00:00:00</Date>
    <DueDate>2009-06-06T00:00:00</DueDate>
    <Status>AUTHORISED</Status>
    <LineAmountTypes>Exclusive</LineAmountTypes>
    <LineItems>
      <LineItem>
        <Description>Onsite project management </Description>
        <Quantity>1.0000</Quantity>
        <UnitAmount>1800.00</UnitAmount>
        <TaxType>OUTPUT</TaxType>
        <TaxAmount>225.00</TaxAmount>
        <LineAmount>1800.00</LineAmount>
        <AccountCode>200</AccountCode>
        <Tracking>
          <TrackingCategory>
            <TrackingCategoryID>e2f2f732-e92a-4f3a-9c4d-ee4da0182a13</TrackingCategoryID>
            <Name>Activity/Workstream</Name>
            <Option>Onsite consultancy</Option>
          </TrackingCategory>
        </Tracking>
        <LineItemID>52208ff9-528a-4985-a9ad-b2b1d4210e38</LineItemID>
      </LineItem>
    </LineItems>
    <SubTotal>1800.00</SubTotal>
    <TotalTax>225.00</TotalTax>
    <Total>2025.00</Total>
    <UpdatedDateUTC>2009-08-15T00:18:43.457</UpdatedDateUTC>
    <CurrencyCode>NZD</CurrencyCode>
    <InvoiceID>243216c5-369e-4056-ac67-05388f86dc81</InvoiceID>
    <InvoiceNumber>OIT00546</InvoiceNumber>
    <Payments>
      <Payment>
        <Date>2009-09-01T00:00:00</Date>
        <Amount>1000.00</Amount>
        <PaymentID>0d666415-cf77-43fa-80c7-56775591d426</PaymentID>
      </Payment>
    </Payments>
    <AmountDue>1025.00</AmountDue>
    <AmountPaid>1000.00</AmountPaid>
    <AmountCredited>0.00</AmountCredited>
  </Invoice>
</Invoices>
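A response like the one above has to be flattened before it can be loaded into a table. Here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The function name `invoices_to_rows` and the selection of fields are our own choices for illustration; nested parts such as line items and payments would need their own tables or a semi-structured column.

```python
import xml.etree.ElementTree as ET


def invoices_to_rows(xml_text):
    """Turn an <Invoices> XML document into a list of flat dicts, one per invoice."""
    root = ET.fromstring(xml_text)
    rows = []
    for inv in root.iter("Invoice"):
        rows.append({
            # findtext() with a bare tag only looks at direct children, so
            # the invoice-level Date and UpdatedDateUTC are picked up here,
            # not the ones nested under Contact or Payment.
            "InvoiceID": inv.findtext("InvoiceID"),
            "InvoiceNumber": inv.findtext("InvoiceNumber"),
            "Type": inv.findtext("Type"),
            "Status": inv.findtext("Status"),
            "ContactID": inv.findtext("Contact/ContactID"),
            "Date": inv.findtext("Date"),
            "Total": inv.findtext("Total"),
            "AmountDue": inv.findtext("AmountDue"),
            "UpdatedDateUTC": inv.findtext("UpdatedDateUTC"),
        })
    return rows
```

All values come back as strings (or `None` when a field is missing), so type conversion is left to the loading step.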

You can paginate your results using the paging support of the Xero API, which is very useful when you have to work with a large number of invoices. It is also possible to request only the latest invoices from the API by providing the “Modified After” parameter on the GET request.

The ModifiedAfter filter is actually an HTTP header: ‘If-Modified-Since’. Its value is a UTC timestamp (yyyy-mm-ddThh:mm:ss), e.g. 2009-11-12T00:00:00, and only invoices created or modified since this timestamp will be returned.
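Paging and the If-Modified-Since header can be combined in one extraction loop. The sketch below assumes that the paging parameter is a query string parameter named `page` starting at 1 and that an empty page marks the end of the results. Both are assumptions worth checking against the Xero documentation. `fetch_page` is a hypothetical callable that you supply: it performs the authenticated request and returns the list of invoices on that page.

```python
from urllib.parse import urlencode

INVOICES_URL = "https://api.xero.com/api.xro/2.0/Invoices"


def invoice_query(page, modified_after=None):
    """Return (url, headers) for one page of invoices."""
    url = INVOICES_URL + "?" + urlencode({"page": page})
    headers = {}
    if modified_after is not None:
        # UTC timestamp in the yyyy-mm-ddThh:mm:ss form described above.
        headers["If-Modified-Since"] = modified_after.strftime("%Y-%m-%dT%H:%M:%S")
    return url, headers


def iter_invoices(fetch_page, modified_after=None):
    """Yield invoices page by page until an empty page comes back."""
    page = 1
    while True:
        url, headers = invoice_query(page, modified_after)
        invoices = fetch_page(url, headers)  # hypothetical: does the HTTP call
        if not invoices:
            break
        yield from invoices
        page += 1
```

Passing the timestamp of the last successful run as `modified_after` (a `datetime` in UTC) turns a full extraction into an incremental one.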

Xero exposes a very rich API that gives you the opportunity to get very granular data about your accounting activities and use it for analytics and reporting. This richness comes at a price, though: a large number of resources have to be handled, and some of them allow fetching only updates while others do not.

Xero Data Preparation for Snowflake

The first step, before you start ingesting your data into a Snowflake data warehouse instance, is to have a well-defined schema of your data.

Data in Snowflake is organized around tables with a well-defined set of columns with each one having a specific data type.

Snowflake supports a rich set of data types. It is worth mentioning that a number of semi-structured data types are also supported. With Snowflake, it is possible to load data directly in JSON, Avro, ORC, Parquet, or XML format. Hierarchical data is treated as a first-class citizen, similar to what Google BigQuery offers.

There is also one notable common data type that Snowflake does not support: LOB, or large object. Instead, you should use a BINARY or VARCHAR type, although these types are not that useful for data warehouse use cases.

A typical strategy for loading data from Xero to Snowflake is to create a schema in which you map each API endpoint to a table.

Each key inside the Xero API endpoint response should be mapped to a column of that table and you should ensure the right conversion to a Snowflake data type.
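As an illustration of this mapping, the sketch below keeps the field-to-column decisions in one place and generates the table definition from them. The column names, the chosen Snowflake types, and the table name `xero_invoices` are all hypothetical; they show one plausible mapping for the invoice fields seen earlier, not a recommended schema.

```python
# (Xero field, Snowflake column, Snowflake type) -- a hypothetical mapping.
INVOICE_COLUMNS = [
    ("InvoiceID", "invoice_id", "VARCHAR(36)"),
    ("InvoiceNumber", "invoice_number", "VARCHAR"),
    ("Type", "invoice_type", "VARCHAR"),
    ("Status", "status", "VARCHAR"),
    ("Date", "invoice_date", "TIMESTAMP_NTZ"),
    ("DueDate", "due_date", "TIMESTAMP_NTZ"),
    ("Total", "total", "NUMBER(18,2)"),
    ("AmountDue", "amount_due", "NUMBER(18,2)"),
    ("CurrencyCode", "currency_code", "VARCHAR(3)"),
    ("UpdatedDateUTC", "updated_at", "TIMESTAMP_NTZ"),
    ("LineItems", "line_items", "VARIANT"),  # nested data kept semi-structured
]


def create_table_sql(table, columns):
    """Build a CREATE TABLE statement from the mapping above."""
    cols = ",\n".join(f"  {col} {sql_type}" for _, col, sql_type in columns)
    return f"CREATE TABLE IF NOT EXISTS {table} (\n{cols}\n);"
```

Keeping the mapping as data means that a change in the API response only requires editing one list before regenerating the DDL.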

Of course, since the data types returned by the Xero API might change, you must adapt your database tables accordingly. There’s no such thing as automatic data type casting.

After you have a complete and well-defined data model or schema for Snowflake, you can move forward and start loading your data into the database.

Load data from Xero to Snowflake

Usually, data is loaded into Snowflake in bulk, using the COPY INTO command. Files containing the data, usually in JSON format, are stored in a local file system or in Amazon S3 buckets. A COPY INTO command is then invoked on the Snowflake instance, and the data is copied into the data warehouse.

Files on a local file system can be pushed into a Snowflake staging area using the PUT command before the COPY INTO command is invoked.
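The PUT and COPY INTO steps can be sketched as follows. The code writes rows as newline-delimited JSON and produces the two statements as strings, which you would then run through a Snowflake client such as SnowSQL. It assumes a Unix-style absolute path, uses the table's own stage (`@%table`), and relies on the JSON keys matching the table's column names so that `MATCH_BY_COLUMN_NAME` can line them up. The function names are hypothetical.

```python
import json


def write_ndjson(rows, path):
    """Write one JSON object per line to `path`."""
    with open(path, "w", encoding="utf-8") as fh:
        for row in rows:
            fh.write(json.dumps(row) + "\n")
    return path


def load_statements(path, table):
    """Return the PUT and COPY INTO statements for loading `path` into `table`."""
    stage = f"@%{table}"  # the table stage; a named stage would work as well
    return [
        f"PUT file://{path} {stage}",
        f"COPY INTO {table} FROM {stage} "
        "FILE_FORMAT = (TYPE = 'JSON') MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE",
    ]
```

For example, `load_statements("/tmp/invoices.json", "xero_invoices")` yields a PUT of `file:///tmp/invoices.json` to `@%xero_invoices` followed by the matching COPY INTO.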

Another alternative is to upload the data directly to a service like Amazon S3, from where Snowflake can access it directly.

Finally, Snowflake offers a web interface with a data loading wizard where you can visually set up and copy data into a data warehouse. Just keep in mind that the functionality of this wizard is limited compared to the other methods.

Snowflake, in contrast to other technologies like Redshift, does not require a data schema to be packed together with the data that will be copied. Instead, the schema is part of the query that copies the data into the data warehouse. This simplifies the data loading process and offers more flexibility in data type management.
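To make "the schema is part of the query" concrete, a COPY INTO statement can select fields out of each staged JSON document and cast them on the way in. The sketch below generates such a statement from a list of (JSON key, column, type) triples. The stage name `@xero_stage`, the table name, and the function name are hypothetical.

```python
def copy_with_casts_sql(table, stage, columns):
    """Build a COPY INTO that casts JSON fields to typed columns.

    `columns` is a list of (json_key, column_name, snowflake_type) triples.
    """
    col_list = ", ".join(col for _, col, _ in columns)
    # $1 refers to the staged JSON document; ::TYPE casts the extracted field.
    select_list = ", ".join(
        f"$1:{key}::{sql_type}" for key, _, sql_type in columns
    )
    return (
        f"COPY INTO {table} ({col_list}) "
        f"FROM (SELECT {select_list} FROM {stage}) "
        "FILE_FORMAT = (TYPE = 'JSON')"
    )
```

With this approach the staged files stay untyped, and a type change only means regenerating the statement.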

Updating your Xero data on Snowflake

As you generate more data in Xero, you will have to update your older data on Snowflake. This includes new records as well as older records that, for any reason, have been updated in Xero.

You will need to periodically check Xero for new data and repeat the process described previously, updating your existing data if needed. Updating an already existing row in a Snowflake table is achieved with UPDATE statements.

Snowflake has a great tutorial on the different ways of handling updates, especially using primary keys.
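Besides plain UPDATE statements, one alternative is to load each new batch into a staging table and then use Snowflake's MERGE statement to update existing rows and insert new ones in a single pass. The sketch below generates such a statement. It is one possible approach rather than the method described in the tutorial, and the table and column names in the usage example are hypothetical.

```python
def merge_sql(target, staging, key, columns):
    """Build a MERGE that upserts rows from `staging` into `target` on `key`."""
    updates = ", ".join(f"t.{c} = s.{c}" for c in columns if c != key)
    col_list = ", ".join(columns)
    values = ", ".join(f"s.{c}" for c in columns)
    return (
        f"MERGE INTO {target} t USING {staging} s ON t.{key} = s.{key} "
        f"WHEN MATCHED THEN UPDATE SET {updates} "
        f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({values})"
    )
```

For instance, `merge_sql("xero_invoices", "xero_invoices_staging", "invoice_id", ["invoice_id", "status", "total"])` matches rows on `invoice_id`, overwrites `status` and `total` for invoices that already exist, and inserts the rest.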

Another issue that you need to take care of is the identification and removal of any duplicate records in your database. Duplicate records might be introduced either because a Xero resource does not provide a mechanism to identify new and updated records or because of errors in your data pipelines.
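Duplicates can also be removed before the data ever reaches Snowflake. The following sketch keeps only the most recent version of each record in a batch, using the InvoiceID and UpdatedDateUTC fields from the invoice response. It assumes every row carries both fields and that all timestamps share the same ISO-style format, so that comparing them as strings gives the right order.

```python
def dedupe_latest(rows, key="InvoiceID", version="UpdatedDateUTC"):
    """Keep one row per `key`, preferring the highest `version` value."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        # Same-format ISO timestamps sort correctly as plain strings.
        if current is None or row[version] >= current[version]:
            latest[row[key]] = row
    return list(latest.values())
```

Rows are returned in the order in which each key was first seen, which keeps the output stable between runs over the same input.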

In general, ensuring the quality of data that is inserted in your database is a big and difficult issue.

The best way to load data from Xero to Snowflake

So far we have just scratched the surface of what you can do with Snowflake and how to load data into it. Things can get even more complicated if you want to integrate data coming from different sources.

Are you striving to achieve results right now?

Instead of writing, hosting, and maintaining a flexible data infrastructure, use RudderStack, which can handle everything automatically for you.

RudderStack, with one click, integrates with sources or services, creates analytics-ready data, and syncs your Xero to Snowflake right away.
