How to Access and Query Your Amazon Redshift Data Using Python and R

Overview: In this post, we will see how to access and query your Amazon Redshift data using Python. We follow two steps in this process: connecting to the Redshift warehouse instance and loading the data using Python, and querying the data and storing the results for analysis. Since Redshift is compatible with other databases such as PostgreSQL,

How to Access and Query Your Google BigQuery Data Using Python and R

Overview: In this post, we see how to load Google BigQuery data using Python and R, followed by querying the data to get useful insights. We use the Google Cloud BigQuery library to connect to BigQuery from Python, and the bigrquery library to do the same from R. We also look into the two steps

Clickstream Data Mining Techniques: An Introduction

Overview: In this post, we cover two key algorithms for mining clickstream data: Markov chains and the cSPADE algorithm. These techniques allow you to leverage clickstream data to get a 360-degree view of your customers and personalize their overall product experience. We also focus on the two key problems that these

Why Single-platform Analytics Tools Don’t Scale Well

We recently came across this question on Quora: What are the benefits of a data warehouse for a web startup over third party analytics tools like Google Analytics and Mixpanel? It’s a good question, and the answer isn’t necessarily simple, in large part due to the problem of scale. Your analytics needs in the early stages of

Why Event-based or MTU-based Pricing is Broken

At RudderStack, we believe in sharing ways to build quality customer data infrastructure. Sharing an open-source product is at the heart of what we do. We also offer a premium and paid tier for a hosted enterprise feature. This includes advantages such as SSO, high availability, and dedicated support. After team discussions and evaluations about

RudderStack Releases Support For, Slack, Webhooks, and Azure Event Hubs

RudderStack is proud to announce its support for a new source –, as well as 3 new destinations namely Slack, Webhooks, and Azure Event Hubs. While RudderStack already supports as a destination, we decided to add additional support for it as a source – where events captured by are sent directly to

RudderStack – An Open-source Customer Data Infrastructure: Podcast with Soumyadeb Mitra

RudderStack is an open-source Customer Data Infrastructure for collecting and routing your customer data for analytics. With a special focus on data privacy, security, and reliability, RudderStack is enterprise-ready and gives you the flexibility of transforming your event data to suit your business requirements. In this interview with Software Engineering Daily, Soumyadeb Mitra – the

Building a Reliable Customer Data Infrastructure

Customer Data Infrastructure (CDI) is a typical example of a data-intensive application. Martin Kleppmann’s book Designing Data-Intensive Applications does an amazing job of explaining what a data-intensive application is. CDI, at its core, is an infrastructure for capturing, processing, and routing streams of events from applications. Routing in Customer Data Infrastructure: Routing might not be the most common

Simplifying Event Filtering and Value Aggregation with RudderStack

Dealing with event data is dirty work at times. Developers may transmit events with errors because of a code change. Errors can also be introduced when the data engineering team changes the data warehouse schema, which may cause data type conflicts. How