How to load data from Google Search Console to Google BigQuery
Access Your Data On Google Search Console
The first step in loading your Google search Console data to any kind of data warehouse solution, is to access your data and start extracting it.
You access your data for the Google Search Console through the Search Console APIs. There are two APIs available there,
- Search Console API
- URL Testing Tools API
From the two, we are interested in the first API, which allows us to access the data we are interested for.
As every other Google product, you need to authorize yourself to get access to the API through an implementation of the OAuth 2.0 protocol. The API is web-based following a REST-like architecture, but Google also offers some SDKs that you can use for some popular languages like Java and Python.
The things that you have to keep in mind when dealing with any API like the one the Google Search Console has, are:
- Rate limits. Every API has some rate limits that you have to respect.
- Authentication. You authenticate on Google using an OAuth.
- Paging and dealing with big amount of data. Platforms like Google tend to generate a lot of data. Pulling big volumes of data out of an API might be difficult, especially when considering and respecting any rate limits that the API has.
Transform And Prepare Your Google Search Console Data
After you have accessed your data on Google Search Console, you will have to transform it based on two main factors,
- The limitations of the database that the data will be loaded onto
- The type of analysis that you plan to perform
Each system has specific limitations on the data types and data structures that it supports. If you want to push data into Google BigQuery, you can send nested data like JSON directly. But when you are dealing with tabular data stores, like PostgreSQL, this is not an option. Instead, you will have to flatten out your data before loading into the database.
Also, you have to choose the right data types. Again, depending on the system you will send the data and the data types that the API exposes to you, you will have to make the right choices. These choices are important because they can limit the expressivity of your queries and limit your analysts on what they can do directly out of the database.
Google Search Console data is modeled around the concept of a report, just like Google Analytics but with a much more limited number of dimensions and metrics.
At the end, you will need to map one report to a table on your database and make sure that all the data is stored into it. Dimensions and metrics will become columns of the tables.
You need to take special care of the fact that the reports you will be getting from Google Search Console, do not have primary keys given by Google, in order to avoid duplicates.
For more information on how you can query your Search Analytics data, please see here.
Load Data From Google Search Console To Google Bigquery
If you want to load Google Search Console data to Google BigQuery, you have to use one of the following supported data sources.
- Google Cloud Storage
- Sent data directly to BigQuery with a POST request
- Google Cloud Datastore Backup
- Streaming insert
- App Engine log files
- Cloud Storage logs
From the above list of sources, 5 and 6 are not applicable in our case.
For Google Cloud Storage, you first have to load your data into it. There are a few options on how to do this. For example, you can use the console directly as described here and do not forget to follow the best practices.
Another option is to post your data through the JSON API, as we see again, APIs play an important role in both the extraction and the loading of data into our data warehouse. It’s just a matter of one HTTP POST request using a tool like CURL or Postman in its simplest case.
After you have loaded your data into Google Cloud Storage, you have to create a Load Job for BigQuery to actually load the data into it. This Job should point to the source data in Cloud Storage that have to be imported. This happens by providing source URIs that point to the appropriate objects.
The Best Way To Load Data From Google Search Console To Google Bigquery And Possible Alternatives
So far, we just scraped the surface of what can be done with Google BigQuery and how to load data into it. The way to proceed relies heavily on the data you want to load, from which service they are coming from, and the requirements of your use case.
Things can get even more complicated if you want to integrate data coming from different sources. Instead of writing, hosting, and maintaining a flexible data infrastructure, a possible alternative is to use a product like Rudderstack that can automatically handle this kind of problem for you.
Rudderstack integrates with multiple sources or services like databases, CRM, email campaigns, analytics, and more. Quickly and safely move all your data from Google Search Console into Google BigQuery and start generating insights from your data. Don't want to go through the pain of direct integration? RudderStack’s Google Search Console to Google BigQuery integration makes it easy to send data from Google Search Console to Google BigQuery.