Kafka Vs. PostgreSQL: How We Implemented Our Queuing System Using PostgreSQL

Overview In our previous post, we discussed why Apache Kafka wasn’t the right solution for RudderStack’s core streaming/queuing engine. Instead, we built our own streaming engine on top of PostgreSQL. This article discusses the internals of our implementation using the queuing system in more detail. Introduction to Queuing Systems The core concept behind any queuing

Why RudderStack Used Postgres Over Apache Kafka for Streaming Engine

Overview In this post, we answer the all-important question – “Why we did not prefer Apache Kafka over PostgreSQL for building RudderStack”. We discuss some of the challenges with using Apache Kafka over our implemented solution that uses PostgreSQL. RudderStack is a Queue At its core, RudderStack is a queuing system. It gets events from

The Complete Customer Data Stack

Overview In this article, we break down the ideal architecture for “the complete customer data stack” from the perspective of the data engineer. With new customer data software tools being launched every day and unclear definitions around terms like “customer data platform,” we make the argument that these individual tools are always part of a

How to Access and Query Your Amazon Redshift Data Using Python and R

Overview In this post, we will see how to access and query your Amazon Redshift data using Python. We follow two steps in this process: connecting to the Redshift warehouse instance and loading the data using Python, then querying the data and storing the results for analysis. Since Redshift is compatible with other databases such as PostgreSQL,

How to Access and Query Your Google BigQuery Data Using Python and R

Overview In this post, we see how to load Google BigQuery data using Python and R, followed by querying the data to get useful insights. We use the Google Cloud BigQuery library to connect to BigQuery from Python, and the bigrquery library to do the same with R. We also look into the two steps

Clickstream Data Mining Techniques: An Introduction

Overview In this post, we cover two key algorithms for mining clickstream data: Markov chains and the cSPADE algorithm. These techniques allow you to leverage the clickstream data to get a 360-degree view of your customers and personalize their overall product experience. We also focus on the two key problems that these

Why Single-platform Analytics Tools Don’t Scale Well

We recently came across this question on Quora: What are the benefits of a data warehouse for a web startup over third party analytics tools like Google Analytics and Mixpanel? It’s a good question, and the answer isn’t necessarily simple, in large part due to the problem of scale. Your analytics needs in the early stages of

Why Event-based or MTU-based Pricing is Broken

At RudderStack, we believe in sharing ways to build quality customer data infrastructure. Sharing an open-source product is at the heart of what we do. We also offer a premium and paid tier for a hosted enterprise feature. This includes advantages such as SSO, high availability, and dedicated support. After team discussions and evaluations about