How to load data from Magento to MS SQL Server
Extract your data from Magento
Magento exposes its platform through both a REST and a SOAP interface. Both can be used to pull data from it, which is also the scope of this article but also to interact with the platform. By using these interfaces, developers create rich applications and plugins for Magento. In this post, we will use the REST version of the Magento platform.
As a Web API following the RESTful architecture principles, it can be accessed through HTTP. As a RESTful API, interacting with it can be achieved by using tools like CURL or Postman or by using HTTP clients for your favorite language or framework. A few suggestions:
- Apache HttpClient for Java
- Spray-client for Scala
- Hyper for Rust
- Ruby rest-client
- Python http-client
Magento does not publish official SDKs but by using the SOAP interface it is possible to automatically generate clients that can act as SDKs for your favorite language or platform. For example in Java, you can create a client in Eclipse by providing the WSDL file that Magento exposes after you set up the platform.
Magento API Authentication
Magento is a self-hosted platform unless you are using the Enterprise cloud edition, so you have much more control over its access than other solutions, but if you want to access its data through the REST API that it has, then you will have to use oAuth for authentication which is supported by Magento.
Magento Rate Limiting
As a platform hosted on your own premises it doesn’t really impose any rate-limiting. In any case, as you would like to avoid stressing your e-commerce platform that is facing your customers you should make sure that your pipeline process does not overstress your Magento installation. But this is completely at your discretion.
Endpoints and Available Resources
Magento exposes the following resources:
- Products. Retrieve the list of products, create, update, delete a product.
- Product categories. Retrieve the list of categories assigned to a product, assign and unassign the category from a product.
- Product images. Retrieve the list of websites assigned to a product, assign, unassign a website to/from a product.
- Customers. Retrieve the list of customers, create, delete a customer, and update the customer information.
- Customer Addresses. Retrieve the list of customer addresses, create, update, and delete the customer address.
- Inventory. Retrieve the list of stock items, update required stock items.
- Sales Orders. Retrieve the list of sales orders with detailed information on order addresses, items, and comments.
- Order Addresses. Retrieve information on the specified order comments.
- Order Items. Retrieve information on specified order items.
The API is possible to return either JSON or XML responses, this is something that you can control by providing the appropriate Accept header content type.
For all the above resources we can request from the Magento platform to pull out a list of results with all the associated data, so ideally we would like to pull all the data and make sure that we keep them up to date on our analytics platform of choice for further analysis. For this post we will just see how we can pull data for one resource, the Sales Orders, the process is the same for all other resources.
To pull data for the Sales Orders, we need to execute a get request to the following endpoint:
TEXT
http://magentohost/api/rest/orders
As a platform hosted by you, you need to replace the “magentohost” part of the URL with the actual URL of the host that has Magento running. The rest of the URL is the same as the above. The default response is in XML and looks like the following:
MARKDOWN
<?xml version="1.0"?><magento_api><data_item_1><customer_id>3</customer_id><base_discount_amount>0.0000</base_discount_amount><base_shipping_amount>455.0000</base_shipping_amount><base_shipping_tax_amount>0.0000</base_shipping_tax_amount><base_subtotal>13650.0000</base_subtotal><base_tax_amount>0.0000</base_tax_amount><base_total_paid></base_total_paid><base_total_refunded></base_total_refunded><tax_amount>0.0000</tax_amount><total_paid></total_paid><total_refunded></total_refunded><base_shipping_discount_amount>0.0000</base_shipping_discount_amount><base_subtotal_incl_tax>13650.0000</base_subtotal_incl_tax><base_total_due>14105.0000</base_total_due><total_due>14105.0000</total_due><base_currency_code>USD</base_currency_code><tax_name></tax_name><tax_rate></tax_rate><addresses><data_item><region>Palau</region><postcode>19103</postcode><lastname>Doe</lastname><street>2356 Jody Road Philadelphia, PA 19103</street><city>PA</city><telephone>610-634-1181</telephone><country_id>US</country_id><firstname>John</firstname><address_type>billing</address_type><prefix></prefix><middlename></middlename><suffix></suffix><company></company></data_item><data_item><region>Palau</region><postcode>19103</postcode><lastname>Doe</lastname><street>2356 Jody Road Philadelphia, PA 19103</street><city>PA</city><telephone>610-634-1181</telephone><country_id>US</country_id><firstname>John</firstname><address_type>shipping</address_type><prefix></prefix><middlename></middlename><suffix></suffix><company></company></data_item></addresses><order_items><data_item><sku>Sunglasses_1</sku><price>150.0000</price><base_price>150.0000</base_price><base_original_price>150.0000</base_original_price><tax_percent>0.0000</tax_percent><tax_amount>0.0000</tax_amount><base_tax_amount>0.0000</base_tax_amount><base_discount_amount>0.0000</base_discount_amount><base_row_total>13650.0000</base_row_total><base_price_incl_tax>150.0000</base_price_incl_tax><base_row_total_incl_tax>13650.0000</base_row_total_incl_tax></data_item></order_items></data_item_1><data_item_2><customer_id>3</customer_id><base_discount_amount>0.0000</base_discount_amount><base_shipping_amount>95.0000</base_shipping_amount><base_shipping_tax_amount>0.0000</base_shipping_tax_amount><base_subtotal>3350.0000</base_subtotal><base_tax_amount>0.0000</base_tax_amount><base_total_paid>2445.0000</base_total_paid><base_total_refunded>1845.0000</base_total_refunded><tax_amount>0.0000</tax_amount><total_paid>2445.0000</total_paid><total_refunded>1845.0000</total_refunded><base_shipping_discount_amount>0.0000</base_shipping_discount_amount><base_subtotal_incl_tax>3350.0000</base_subtotal_incl_tax><base_total_due>1000.0000</base_total_due><total_due>1000.0000</total_due><base_currency_code>USD</base_currency_code><tax_name></tax_name><tax_rate></tax_rate><addresses><data_item><region>Palau</region><postcode>19103</postcode><lastname>Doe</lastname><street>2356 Jody Road Philadelphia, PA 19103</street><city>PA</city><telephone>610-634-1181</telephone><country_id>US</country_id><firstname>John</firstname><address_type>billing</address_type><prefix></prefix><middlename></middlename><suffix></suffix><company></company></data_item><data_item><region>Palau</region><postcode>19103</postcode><lastname>Doe</lastname><street>2356 Jody Road Philadelphia, PA 19103</street><city>PA</city><telephone>610-634-1181</telephone><country_id>US</country_id><firstname>John</firstname><address_type>shipping</address_type><prefix></prefix><middlename></middlename><suffix></suffix><company></company></data_item></addresses><order_items><data_item><sku>Sunglasses_1</sku><price>150.0000</price><base_price>150.0000</base_price><base_original_price>150.0000</base_original_price><tax_percent>0.0000</tax_percent><tax_amount>0.0000</tax_amount><base_tax_amount>0.0000</base_tax_amount><base_discount_amount>0.0000</base_discount_amount><base_row_total>1350.0000</base_row_total><base_price_incl_tax>150.0000</base_price_incl_tax><base_row_total_incl_tax>1350.0000</base_row_total_incl_tax></data_item><data_item><sku>Sun_glasses</sku><price>200.0000</price><base_price>200.0000</base_price><base_original_price>200.0000</base_original_price><tax_percent>0.0000</tax_percent><tax_amount>0.0000</tax_amount><base_tax_amount>0.0000</base_tax_amount><base_discount_amount>0.0000</base_discount_amount><base_row_total>2000.0000</base_row_total><base_price_incl_tax>200.0000</base_price_incl_tax><base_row_total_incl_tax>2000.0000</base_row_total_incl_tax></data_item></order_items></data_item_2></magento_api>
As we can see, we get back a list of items with each one representing an order, that contains all the information that we would like to use for further analysis. information like the discount that we might have applied, the taxes paid the base price of the order, etc. As we might have many order objects to retrieve, we should paginate through the results. To do that, we need to provide the “page” and “limit” parameters to our GET request.
Now that we have the results from our Magento shop, we can further process them before we are able to load them into the BI platform of our choice.
Magento Data Preparation for Microsoft SQL Server
As in every relational database, SQL Server requires a well-defined database schema before we start populating with data. Data is organized in schemas, which are distinct namespaces where database objects belong to.
The most common database objects are of course tables which have a number of columns with each one having a declared data type. MS SQL Server supports a large number of different data types. This gives us great flexibility in expressing the data that we have and at the same time optimizing our data warehouse.
When working with data coming from web services, where data is usually serialized in JSON, it is important to correctly map the data to the right data types. As changing the data types in the future is a process that might cost in downtime of your database, it is important to spend enough time thinking about the proper data type assignments.
For example, dates in JSON are just strings, but when storing date objects in a database, we can enhance analytics with great capabilities by transforming the raw string data into an appropriate date type. A typical strategy for loading data from Magento to an SQL Server database is to create a schema where you will map each API endpoint to a table. Each key inside the Magento API endpoint response should be mapped to a column of that table and you should ensure the right conversion to an SQL Server compatible data type.
Of course, you will need to ensure that as the data types from the Magento API might change, you will adapt your database tables accordingly, there’s no such thing as automatic data typecasting. After you have a complete and well-defined data model or schema for Microsoft SQL Server, you can move forward and start loading your data into the database.
Load data from Magento to MS SQL Server
As a feature-rich and mature product, MS SQL Server offers a large and diverse set of methods for loading data into a database. One way of importing data into your database is by using the SQL Server Import and export Wizard. With it and through a visual interface you will be able to bulk load data from a number of data sources that are supported.
You can import data from another SQL Server, from an Oracle database, from Flat Files, from an Access Data Source, PostgreSQL, MySQL, and finally Azure Blob Storage. Especially if you are using a managed version of MS SQL Server on Azure, you should definitely consider utilizing the Azure Blob Storage connection.
In this way, you will be loading data as Blobs on Azure and your MS SQL Server database will sync with it through the Import and Export Wizard.
Another way for importing bulk data into an SQL Server, both on Azure and on-premises, is by using the bcp utility. This is a command-line tool that is built specifically for bulk loading and unloading of data from an MS SQL database.
Finally and for compatibility reasons, especially if you are managing databases from different vendors, you can BULK INSERT SQL statements.
In a similar way and as it happens with the rest of the databases, you can also use the standard INSERT statements, where you will be adding data row-by-row directly to a table. It is the most basic and straightforward way of adding data into a table but it doesn’t scale very well with larger data sets.
So for bulk datasets, you better consider one of the previous methods.
Updating your Magento data on MS SQL Server
As you will be generating more data on Magento, you will need to update your older data on an MS SQL Server database. This includes new records together with updates to older records that for any reason have been updated on Magento.
You will need to periodically check Magento for new data and repeat the process that has been described previously while updating your currently available data if needed. Updating an already existing row on a SQL Server table is achieved by creating UPDATE statements.
Another issue that you need to take care of is the identification and removal of any duplicate records on your database. Either because Magento does not have a mechanism to identify new and updated records or because of errors on your data pipelines, duplicate records might be introduced to your database.
In general, ensuring the quality of the data that is inserted in your database is a big and difficult issue and MS SQL Server features like TRANSACTIONS can help tremendously, although they do not solve the problem in the general case.
The best way to load data from Magento to MS SQL Server and possible alternatives
So far we just scraped the surface of what can be done with MS SQL Server and how to load data into it. The way to proceed relies heavily on the data you want to load, from which service they are coming from, and the requirements of your use case.
Things can get even more complicated if you want to integrate data coming from different sources. A possible alternative, instead of writing, hosting, and maintaining a flexible data infrastructure, is to use a product like RudderStack that can handle this kind of problem automatically for you.
RudderStack integrates with multiple sources or services like databases, CRM, email campaigns, analytics, and more.