After you have accessed your data on Analytics, you will have to transform it based on two main factors:
- The limitations of the database that the data will be loaded onto
- The type of analysis that you plan to perform
Each system has specific limitations on the data types and data structures that it supports. If, for example, you want to push data into Google BigQuery, you can send nested data such as JSON directly. Keep in mind, however, that the data you get from Analytics comes as a tabular report, closer to what a CSV file or a spreadsheet looks like.
Of course, when you are dealing with tabular data stores such as Microsoft SQL Server, loading nested data directly is not an option. Instead, you will have to flatten any nested structure, JSON for example, into rows and columns before loading it into the database.
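As a rough illustration of the flattening step, the sketch below turns a nested record into a single-level row whose keys can serve as column names. The record shape, the `flatten` helper, and the underscore separator are assumptions made for this example, not something the Analytics API prescribes.

```python
def flatten(record, parent_key="", sep="_"):
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat = {}
    for key, value in record.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat


# Hypothetical nested record, used only to demonstrate the helper.
nested = {
    "date": "20240101",
    "device": {"category": "mobile", "browser": "Chrome"},
    "sessions": 42,
}
row = flatten(nested)
```

Each key of `row` now corresponds to one column of the target table. Lists inside a record would need their own rule, such as one row per element, which this sketch does not cover.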
You also have to choose the right data types. The right choices depend on the system that you will send the data to and on the data types that the API exposes to you. These choices are important because they can limit the expressiveness of your queries and what your analysts can do directly in the database. Analytics has a very limited set of available data types, which makes these mappings easier and more straightforward, but they are no less important than with any other data source.
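One way to keep these choices explicit is a small lookup table. The sketch below assumes the metric types reported by the Analytics Reporting API v4 (INTEGER, FLOAT, CURRENCY, PERCENT, TIME) and PostgreSQL as the target. The PostgreSQL types on the right are an illustrative choice, and the helper names are hypothetical.

```python
from decimal import Decimal

# Assumed mapping from Analytics metric types to PostgreSQL column types.
# Dimensions and any unknown type fall back to TEXT.
PG_TYPES = {
    "INTEGER": "BIGINT",
    "FLOAT": "DOUBLE PRECISION",
    "CURRENCY": "NUMERIC",
    "PERCENT": "DOUBLE PRECISION",
    "TIME": "DOUBLE PRECISION",  # durations, assumed to be in seconds
}


def pg_type_for(analytics_type):
    """Return a PostgreSQL type for an Analytics type, defaulting to TEXT."""
    return PG_TYPES.get(analytics_type.upper(), "TEXT")


def convert(value, analytics_type):
    """Convert a raw string value into a Python value that fits the column."""
    kind = analytics_type.upper()
    if kind == "INTEGER":
        return int(value)
    if kind in ("FLOAT", "PERCENT", "TIME"):
        return float(value)
    if kind == "CURRENCY":
        # Decimal avoids binary floating-point rounding on money values.
        return Decimal(value)
    return value
```

Storing a metric as TEXT would still load, but analysts would then have to cast it in every query, which is the kind of limitation the paragraph above warns about.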
To understand and correctly model your Analytics data, you need to understand that the data coming out of it is in the form of a report. A report is like a spreadsheet and maps naturally to a table, so you will usually end up with a one-to-one mapping between a report and a table in your database.
Keep in mind, too, that because the data comes as a report, it contains no primary keys that you can use for deduplication and reference. You have to construct such keys yourself by understanding the nature of your report data.
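A common way to construct such a key is to hash the dimension values of each row, since the dimensions identify what the row describes while the metrics are the measurements. The sketch below assumes rows are already dicts keyed by column name. The column names and the `row_key` helper are hypothetical.

```python
import hashlib


def row_key(row, dimension_columns):
    """Build a deterministic key from the dimension values of a report row."""
    parts = [str(row[column]) for column in dimension_columns]
    # The unit separator keeps ("ab", "c") distinct from ("a", "bc").
    joined = "\x1f".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


dimensions = ["date", "country", "device_category"]
first_pull = {"date": "20240101", "country": "Greece",
              "device_category": "mobile", "sessions": 120}
second_pull = {"date": "20240101", "country": "Greece",
               "device_category": "mobile", "sessions": 118}

# Same dimensions give the same key, even though the metric differs.
same_key = row_key(first_pull, dimensions) == row_key(second_pull, dimensions)
```

Because the key ignores metric values, pulling the same report again yields the same key for the same row, so the new values can replace the old ones instead of creating duplicates.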
Also, because Google Analytics can sample the data to generate a report, you might see slightly different values if you pull the same report, for the same period, more than once.
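It can help to record whether a given pull was sampled. The sketch below assumes the response shape of the Analytics Reporting API v4, where a report's `data` object carries `samplesReadCounts` and `samplingSpaceSizes` only when sampling was applied. Other API versions expose this differently, and the `is_sampled` helper and the example values are hypothetical.

```python
def is_sampled(report):
    """Return True if the report's data section carries sampling counters."""
    data = report.get("data", {})
    return bool(data.get("samplesReadCounts"))


# Hand-written stand-ins for two parsed report objects.
sampled_report = {
    "data": {
        "rows": [],
        "samplesReadCounts": ["499630"],
        "samplingSpaceSizes": ["15328013"],
    }
}
unsampled_report = {"data": {"rows": []}}
```

Storing this flag next to the loaded rows lets analysts tell apart differences caused by sampling from real changes in the underlying data.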
Each table is a collection of columns, each with a predefined data type such as INTEGER or VARCHAR. PostgreSQL, like other SQL databases, supports a wide range of data types.
A typical strategy for loading data from Analytics into a Postgres database is to create a schema in which each API endpoint maps to a table. Each key in the Analytics API endpoint response should map to a column of that table, and you should ensure that each value is converted to a Postgres-compatible data type.
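The mapping from response keys to columns can be sketched as a function that builds the table definition. The example below only produces the SQL text and leaves execution to whichever Postgres client you use. The table name, the `ga:`-prefixed field names, the chosen column types, and the `row_key` column for a constructed primary key are all assumptions for illustration.

```python
import re


def column_name(api_name):
    """Turn an API field name such as 'ga:sessions' into a safe column name."""
    return re.sub(r"[^a-z0-9_]", "_", api_name.lower())


def create_table_sql(table, columns, key_column="row_key"):
    """Build a CREATE TABLE statement from (api_name, postgres_type) pairs.

    `table` and `key_column` are assumed to be trusted identifiers.
    """
    definitions = [f"    {key_column} CHAR(64) PRIMARY KEY"]
    for api_name, pg_type in columns:
        definitions.append(f"    {column_name(api_name)} {pg_type}")
    body = ",\n".join(definitions)
    return f"CREATE TABLE IF NOT EXISTS {table} (\n{body}\n);"


ddl = create_table_sql(
    "ga_sessions_by_country",
    [("ga:date", "DATE"), ("ga:country", "TEXT"), ("ga:sessions", "BIGINT")],
)
print(ddl)
```

Generating the statement from a list of pairs keeps the table definition next to the list of fields you request, so adding a field to the report and adding its column happen in one place.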