After you have accessed data on BaseCRM, you will have to transform it based on two main factors,
- The limitations of the database that data will be loaded onto
- The type of analysis that you plan to perform
Each system has specific limitations on data types and structures that it supports. If for example you want to push into Google BigQuery, then you can send nested data like JSON directly.
Also, you have to choose the right data types. Again, depending on the system that you will send the BaseCRM data to and the data types that the API exposes to you, you will have to make the right choices. These choices are important because they can limit the expressivity of your queries and limit your analysts on what they can do directly out of the database.
Also, you have to consider that the reports you’ll get from BaseCRM are like CSV files in terms of their structure and you need to identify what and how to map a table into your database.
Amazon Redshift is built around industry-standard SQL with added functionality to manage very large data sets and high-performance analysis. So, in order to load data from BaseCRM to Redshift, you will have to follow its model which is a typical relational database model. The information you extract from your data source should be mapped into tables and columns. Where you can consider the table as a map to the resource you want to store and columns the attributes of that resource.
Also, each attribute should adhere to data types that are supported by Redshift.
As your BaseCRM data is probably coming in a representation like JSON that supports a much smaller range of data types you have to be really careful about what data you feed into Redshift and make sure that you have mapped your types into one of the data types that are supported by Redshift.
Designing a Schema for Redshift and mapping the data from your data source to it is a process that you should take seriously as it can both affect the performance of your cluster and the questions that you can answer. It’s always a good idea to have in your mind the best practices that Amazon has published regarding the design of a Redshift database. When you have concluded on the design of the database you need to load data on one of the data sources that are supported as input by Redshift, these are the following: