Better Customer Data Integration Management For Growing Teams
Integrating the various systems that hold customer information is a critical step toward improving your company’s understanding of its customers and products. With a comprehensive customer view in your data warehouse (or data lakehouse), you can deliver better context to help your team answer critical business questions. Instead of working with sales, product, and marketing information in data silos, you can leverage a single source of truth to answer questions across the business and deliver customer insights.

Many teams develop their own methods and build custom systems to manage the complexity of collecting and unifying data from their various sites, apps, and source systems. In some cases this makes sense: at some companies the integration work is straightforward, and others have a staff large enough to manage the development.
However, these systems can be difficult to manage and lead to lots of trade-offs and technical debt.
These data integration challenges come in all shapes and sizes. A director may decide they want to switch out their system after the data has finally been integrated (this has happened to me), or a growing number of streaming sources could make it difficult to manage all of your streaming events. Whatever the flavor, these challenges can cause issues for even a decently developed data integration system. To reduce these issues and develop a better integration system, you’ll need to create a customer data integration strategy and build a scalable framework.
Before we cover how to develop a better data integration management system, let’s define data integration.
What is data integration?
Data integration can take many forms. Teams can integrate data via iPaaS solutions that help sync data across operational systems, they can integrate data in their customer data platform (CDP), or they can use ETL jobs and data pipelines to mesh data together during pre-processing. Each of these integration methods, in one way or another, takes data of various formats from different sources and syncs it together.
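To make that concrete, here’s a minimal sketch of the pipeline approach in Python, merging a hypothetical CRM export with product event data on a normalized email key. The file names and column layouts are assumptions for illustration, not a prescribed design.

```python
import pandas as pd

# Hypothetical exports from two source systems. The file names and
# column layouts are assumptions for illustration.
crm = pd.read_csv("crm_contacts.csv")        # columns: id, email, plan
events = pd.read_csv("product_events.csv")   # columns: user_email, event, ts

# Normalize the join key so records from both systems line up.
crm["email"] = crm["email"].str.strip().str.lower()
events["user_email"] = events["user_email"].str.strip().str.lower()

# Merge the two sources into a single customer view.
unified = crm.merge(
    events,
    left_on="email",
    right_on="user_email",
    how="left",
)

# Land the unified view in a staging area for the warehouse
# (the destination is an assumption; swap in your own loader).
unified.to_parquet("staging/unified_customers.parquet", index=False)
```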
In turn, analysts can easily work with said data to answer critical questions and help the business truly become data driven. This also impacts other end users. For example, operations teams can further integrate the data into other systems, such as a CRM, to streamline business processes. What’s more, with Reverse ETL, data integrated in the warehouse can easily be operationalized and delivered to downstream teams where they can use it to drive optimizations, increase engagement, and deliver a better customer experience.
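As a rough illustration of the Reverse ETL idea, the sketch below reads that unified customer view and pushes each record into an operational system over HTTP. The CRM endpoint and token are placeholders, not a real vendor API.

```python
import pandas as pd
import requests

# Read the customer view integrated in the warehouse. The parquet file
# stands in for a warehouse query here.
customers = pd.read_parquet("staging/unified_customers.parquet")

# Placeholder endpoint and token; this is not a real vendor API.
CRM_URL = "https://api.example-crm.com/v1/contacts"
API_TOKEN = "YOUR_TOKEN"

for record in customers.to_dict(orient="records"):
    # Push each enriched record back into the operational system so
    # downstream teams can act on it.
    response = requests.post(
        CRM_URL,
        json={"email": record["email"], "plan": record.get("plan")},
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=10,
    )
    response.raise_for_status()
```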
How many developers build their integration systems
When engineers are asked to build an integration, their first response is often to build it themselves. There are a lot of pros to building your own integrations. If you do it yourself, you don’t need to get approval from procurement or a finance director, and you don’t need to learn a new tool; you can rely on whatever programming language you’re already comfortable with. You can move fast. The build-it-yourself approach allows you to quickly go from idea to integration.
But challenges often arise with this approach. Whether you are working on event integrations or standard batch processing, one thing doesn’t change: the need for integrations will grow. Other internal teams will start asking for new data sources and integrations. Moreover, older integrations may break, or their APIs and SDKs might change. This creates a hamster wheel effect where you’re constantly playing catch-up as you grow. For large teams, this isn’t always a problem, but smaller teams will often struggle to make progress on more important objectives because they’re busy running on the integration management hamster wheel.
Options for building a scalable framework
Building a scalable data integration framework is critical as your company grows and becomes more complex. In fact, most large tech companies have a custom framework dedicated to managing constant data requests. There are different approaches you can take here, and with modern data tooling it’s easier than ever to create a system for data collection, unification, and activation that is both scalable and sustainable. I’ll detail these different approaches below.
Build a custom batch framework
Over the past few decades, many individuals and companies have developed their own custom batch frameworks to orchestrate data workflows. Some have remained popular. Others stopped being maintained as soon as the engineers who developed them left the company.
The truth is, if you’re going to build your own internal system, it needs to be far more robust and flexible than your company currently requires. This was part of the premise of Airflow. Lots of workflow orchestrators were being developed at other companies, but there was a constant need to update and replace them as they stopped meeting the company’s needs. That led to expensive redevelopment, maintenance, and migrations to new frameworks.
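For reference, here’s what a minimal batch workflow looks like in Airflow itself (assuming Airflow 2.4+, where `schedule` replaced `schedule_interval`). The DAG ID, schedule, and task bodies are illustrative, not a prescribed layout.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    # Placeholder: pull data from source systems.
    print("extracting")

def load():
    # Placeholder: load data into the warehouse.
    print("loading")

# A minimal daily batch workflow; real pipelines add retries,
# alerting, and many more tasks.
with DAG(
    dag_id="customer_data_integration",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task
```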
Build a custom event-based framework
Customer event data, or clickstream data, comes from multiple platforms. Emails, product usage, and website interactions all generate this event data. As companies grow, the number of event data sources typically grows along with them. This data also becomes increasingly important in making optimizations to products and marketing efforts as companies mature because it’s the primary way companies gain insight into customer behavior. That’s why it’s often referred to as behavioral data. Batch processing isn’t always good enough for these use cases. Although batch can often be easier to implement, sometimes your systems need data to be processed immediately. In turn, an excellent customer data platform (CDP) should be capable of delivering real-time customer behavior insights.
It should ideally have a self-learning algorithm that can accommodate any changes in customer behavior across different platforms as soon as they happen. While these features make it difficult to build your own custom CDP, some teams do develop their own event integration systems. A custom system provides far more flexibility and often lower costs (at least from a vendor standpoint) but not without tradeoffs. With a custom system, your data team will constantly need to juggle requests for new fields and other changes in tables while still delivering your current reports. This can become very time consuming as your company grows. It’s just another hamster wheel in the way of progress on more important objectives.
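One small way to tame that field-change hamster wheel is to validate event payloads at the edge of the pipeline. The sketch below shows the idea with a hypothetical page-view event; the required fields and the transport (Kafka, Kinesis, webhooks) are assumptions for illustration.

```python
from datetime import datetime, timezone

# Fields every incoming event must carry. Optional fields can be added
# freely; changes to required fields are where custom systems hurt.
REQUIRED_FIELDS = {"user_id", "event_name", "timestamp"}

def validate_event(event: dict) -> dict:
    """Reject malformed clickstream events before they enter the pipeline."""
    missing = REQUIRED_FIELDS - event.keys()
    if missing:
        raise ValueError(f"event missing required fields: {sorted(missing)}")
    return event

def handle_event(event: dict) -> None:
    # Stand-in for a real-time consumer; the actual transport is
    # left out of this sketch.
    clean = validate_event(event)
    print(f"processed {clean['event_name']} for user {clean['user_id']}")

# Example payload shaped like a typical page-view event.
handle_event({
    "user_id": "u_123",
    "event_name": "page_view",
    "timestamp": datetime.now(timezone.utc).isoformat(),
})
```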
Managing data integration with the modern data stack
Not all companies have large data teams or the ability to constantly invest in both building and maintaining their integration systems. Still others struggle with finding enough talent as their company scales quickly. This is where the modern data stack shines.
In these cases, using third-party data integration tools to manage both batch and event-based integrations provides a lot of lift for resource-strapped teams. When implemented correctly, they provide a robust, scalable, and sustainable solution to the data integration problem and can mitigate the risk of integration debt impacting your data team.
Many of these solutions keep up to date with changing event schemas for you, so your data team can spend less time maintaining and updating data integrations and more time delivering ROI to management. Another benefit is that they allow your team to quickly unify data across multiple sources, and they manage a lot more than just pulling in data. They can even help standardize fields and handle identity resolution for you.
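To give a feel for what identity resolution involves, here’s a deliberately naive sketch that stitches records from different sources together on a normalized email address. The source names and fields are made up for illustration; commercial tools use far richer signals.

```python
from collections import defaultdict

# Records from different sources that may describe the same person.
records = [
    {"source": "crm", "email": "Ana@Example.com", "name": "Ana"},
    {"source": "web", "email": "ana@example.com", "last_page": "/pricing"},
    {"source": "support", "email": "bo@example.com", "tickets": 2},
]

# Naive resolution: group on a normalized email. Real tools also use
# device IDs, fuzzy matching, and identity graphs.
profiles = defaultdict(dict)
for record in records:
    key = record["email"].strip().lower()
    profiles[key].update({k: v for k, v in record.items() if k != "email"})

for email, profile in profiles.items():
    print(email, profile)
```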
Additionally, because most of these frameworks are both lower code and more widely accessible, future hires won’t have to spend their weekends trying to figure out your custom built integrations.
Of course, the build vs. buy decision is nuanced. There are good reasons for each, and in truth, most companies will end up with a combination of built and bought solutions before it’s all said and done. In a previous article I discussed the build vs. buy problem and put together a quick checklist. Let’s review it here:
When To Buy
✅ Your team’s main focus is not building software, and they don’t have a track record of delivering large-scale solutions.
✅ Your team has budget limitations and there are tools that can meet said budget.
✅ You have a tight timeline and need to turn around value quickly.
✅ Your team has limited resources or technical knowledge for the specific solution they would need to build. For example, if you need to build a model to detect fraud, but no one on your team has done it before, it might be time to look for a solution.
When To Build
✅ Your executive team needs a unique function or ability that no solutions currently offer.
✅ You have a bigger scope and vision for the solution and plan to sell it externally.
✅ You don’t have a tight timeline (Yeah right).
✅ Your team is proficient in delivering large-scale projects.
Conclusion
As companies continue to focus on their data strategies, the pipelines and integrations that get their data into a usable state are a key first step. There are 1000 different ways to develop your data integration systems. Many of these become costly over time due to migrations and a constant need to increase functionality. In the end, it comes down to building a data infrastructure that fits your company's needs today with future growth in mind, and the sooner you implement a scalable integration layer, the better.
When it comes to data integration, there are pros and cons to building and there are pros and cons to buying tools off the shelf. If your team needs to focus more energy on delivering ROI quickly, architecting a stack with the right tools is a smart way to get you off of the data integration hamster wheel so you can focus on those higher-impact objectives.
Overcome your data integration challenges
Download our Data Maturity Guide for details on architecting a stack to overcome your data integration challenges.

Benjamin Rogojan
Seattle Data Guy, Data Science and Data Engineering Consultant