After you have accessed your data on Google Sheets, you will have to transform it based on two main factors,
- The limitations of the database where the data is loaded
- The type of analysis that you plan to perform
Each system has specific limitations on the data types and data structures that it supports. If you want to push data into Google BigQuery, you can send nested data like JSON directly, but keep in mind that the data you get from Google Sheets are in the form of a tabular report just like a CSV.
When dealing with tabular data stores, like Microsoft SQL Server, this is not an option. You can consider flattening out your data - similar to the case in JSON - before loading it into the database.
Also, you have to choose the right data types. Again, depending on the system you will send the data to and the data types that the API exposes to you, you will have to make the right choices. These choices are essential because they can limit the expressivity of your queries and limit your analysts on what they can do directly out of the database. Google Sheets has a minimal set of available data types, which means that your work to do these mappings is much easier and straightforward but equally important with any other data source case.
Amazon Redshift is built around industry-standard SQL with added functionality to manage huge data sets and high-performance analysis. So, to load your data into it, you will have to follow its data model, which is a typical relational database model. The data you extract from your data source should be mapped into tables and columns where you can consider the table as a map to the resource you want to store and columns the attributes of that resource.
Also, each attribute should adhere to the data types that Redshift supports.
As your data is probably coming in a representation like JSON - that supports a much smaller range of data types - you have to be careful about what data you feed into Redshift. You also need to ensure that you have mapped your types into one of the data that Redshift supports.
Designing a Schema for Redshift and mapping the data from your data source to it is crucial as it can affect your cluster’s performance and the questions you can answer. It’s always a good idea to have in your mind the best practices that Amazon has published regarding the design of a Redshift database. When you have designed your database, you need to load your data on one of the data sources that Redshift supports as input, which include: