How to load data from AdWords to Google BigQuery
Data extraction from Google AdWords
The AdWords API allows applications to interact directly with the AdWords platform. You can build applications to more efficiently manage large or complex AdWords accounts and campaigns. Contrary to the rest of the APIs that we have covered in this series of posts, the Google AdWords API is implemented only using the SOAP protocol, and it doesn’t offer a RESTful web implementation.
Nevertheless, they offer a number of client libraries that you can use for your language or framework of choice. They officially support clients for the following languages
- Java
- .Net
- PHP
- PERL
- Python
- Ruby
The AdWords API is a quite complex product that exposes many functionalities to the user, ranging from reporting to do the bidding and programmatic advertisement. As the scope of this post is the extraction of data out of it, to load any data to a data warehouse for further analysis, we’ll focus only on that part of the Google AdWords API.
There are many ways of interacting with data that adWords API gathers. One way is to link your Google Analytics and AdWords accounts and enrich your analytics data with data coming from AdWords. The other possible way, if you have the luxury to afford a Google analytics premium account, is to load every data directly to Google BigQuery. From there, you can either do your analysis from BigQuery or export data to another data warehouse.
We’ll assume that you do not have a Google Analytics premium account. To be honest, if you had, you wouldn’t be looking at this post anyway, but you still want to extract data and load it to your data warehouse solution. To do that, we’ll utilize the Report related functionality of the AdWords API. The API supports a huge number of reports that you can request, and it is possible to change the granularity of your results by passing specific parameters. Defining what kind of data you want to get back as part of your report can be done in two different ways.
- Using an XML-based report definition.
- Using an AWQL-based report definition.
If you want to use an XML based report definition you have to include a parameter named “__rdxml” that will contain an XML serialised definition of the report you want to retrieve.
MARKDOWN
<reportDefinition xmlns="https://adwords.google.com/api/adwords/cm/v201509"> <selector> <fields>CampaignId</fields> <fields>Id</fields> <fields>Impressions</fields> <fields>Clicks</fields> <fields>Cost</fields> <predicates> <field>Status</field> <operator>IN</operator> <values>ENABLED</values> <values>PAUSED</values> </predicates> </selector> <reportName>Custom Adgroup Performance Report</reportName> <reportType>ADGROUP_PERFORMANCE_REPORT</reportType> <dateRangeType>LAST_7_DAYS</dateRangeType> <downloadFormat>CSV</downloadFormat> </reportDefinition>
AWQL is a SQL-like language for performing queries against most common AdWords API services. Any service with a query method is supported; queryable fields for each service are listed here.
As a comparison you can see the difference between using XML and AWQL bellow:
XML
MARKDOWN
<serviceSelector> <fields>Id</fields> <fields>Name</fields> <predicates> <field>Status</field> <operator>EQUALS</operator> <values>ENABLED</values> </predicates> <ordering> <field>Name</field> <sortOrder>ASCENDING</sortOrder> </ordering> <paging> <startIndex>0</startIndex> <numberResults>50</numberResults> </paging> </serviceSelector>
AWQL
SH
CampaignPage p = campaignService.query("SELECT Id, Name WHERE Status = 'ENABLED' ORDER BY Name DESC LIMIT 0,50");
As we can see, the Google AdWords API has a very expressive way of defining what data we want to get from it and various options to do that. If you feel more comfortable with SQL-like languages you can use AWQL, or if you prefer XML you can use that for defining your reports.
Regarding the format of the results you get from the API, there are also multiple options supported.
- CSVFOREXCEL – Microsoft Excel compatible format
- CSV – comma-separated output format
- TSV – tab separated output format
- XML – xml output format
- GZIPPED-CSV – compressed csv
- GZIPPED-XML – compressed xml
Google AdWords, exposes a very rich API which offers you the opportunity to get very granular data about your accounting activities and use it for analytic and reporting purposes. This richness comes with a price though, a large number of complex resources that have to be handled through an also complex protocol.
Prepare your data to be sent from Google AdWords to Google BigQuery
Before you load any data into BigQuery, you should make sure that it is presented in a format supported by it. For example, if the API you pull data out and returns XML, you have to first transform it into a serialization that BigQuery understands. Currently, two data formats are supported:
You also need to make sure that the data types you are using are the ones supported by BigQuery, which are the following:
- STRING
- INTEGER
- FLOAT
- BOOLEAN
- RECORD
- TIMESTAMP
for more information please check the Preparing Data for BigQuery page on the documentation.
Load Data from Google AdWords to Google BigQuery
If you want to load Google AdWords data to Google BigQuery, you have to use one of the following supported data sources.
- Google Cloud Storage
- Sent data directly to BigQuery with a POST request
- Google Cloud Datastore Backup
- Streaming insert
- App Engine log files
- Cloud Storage logs
From the above list of sources, 5 and 6 are not applicable in our case.
For Google Cloud Storage, you first must load data you own into it, there are a few options on how to do this, for example, you can use the console directly as it is described here and do not forget to follow the best practices. Another option is to post data through the JSON API, as we see again APIs play an important role in both the extraction but also the loading of data into our data warehouse. In its simplest case it’s just a matter of one HTTP POST request using a tool like CURL or Postman or Aprise. It should look like the following example.
MARKDOWN
POST /upload/storage/v1/b/myBucket/o?uploadType=media&name=myObjectHTTP/1.1 Host: www.googleapis.comContent-Type: application/textContent-Length: number_of_bytes_in_fileAuthorization: Bearer your_auth_token your Google AdWords data
and if everything went ok, you should get something like the following as a response from the server:
MARKDOWN
HTTP/1.1 200 Content-Type: application/json { "name": "myObject" }
Working with Curl or Postman, is good only for testing, if you would like to automate the process of loading any data into Google Bigquery, you should write some code to send your own data to Google Cloud Storage. In case you are developing on the Google App Engine you can use the library that is available for the languages that are supported by it:
If you are using one of the above languages and you are not coding for the Google App Engine, you can use it to access the Cloud Storage from your environment. Interacting such a feature-rich product like Google Storage on the Cloud can become quite complicated depending on your use case. For more details on the different options that exist, you can check Google Cloud Storage documentation. If you are looking for a less engaged and more neutral way of using it, you can consider a solution like RudderStack.
After you have loaded data into Google Cloud Storage, you have to create a Load Job for BigQuery to actually load the data into it. This Job should point to the source data in Cloud Storage that have to be imported. This happens by providing source URIs that point to the appropriate objects.
The previous method described used a POST request to the Google Cloud Storage API for storing data there and then load it into BigQuery. Another way to go is to do a direct HTTP POST request to BigQuery with the data you would like to query. This approach is similar to how we loaded every data to Google Storage on the Cloud through the JSON API, but it uses the appropriate end-points of BigQuery to load the data there directly. The way to interact with it is quite similar, for more information can be found on the Google BigQuery API Reference and on the page that describes how you can load data into BigQuery using POST. You can interact with it using the HTTP client library of the language or framework of your choice, a few options are:
- Apache HttpClient for Java
- Spray-client for Scala
- Hyper for Rust
- Ruby rest-client
- Python http-client
The best way to load from Google AdWords to BigQuery
So far we just scraped the surface of what you can do with BigQuery and how to load data into it. Things can get even more complicated if you want to integrate data coming from different sources.
Are you striving to achieve results right now?
Instead of writing, hosting, and maintaining a flexible data infrastructure use RudderStack that can handle everything automatically for you.
RudderStack with one click integrates with sources or services, creates analytics-ready data and syncs your Google AdWords to BigQuery right away. Don't want to go through the pain of direct integration? RudderStack’s AdWords to Google BigQuery integration makes it easy to send data from AdWords to Google BigQuery.