iVendi's Journey from Snowplow to a Fully Integrated, Real-time Stack
Steve Flitcroft
VP of Software at iVendi
Eric Dodds
Senior Director of Product Strategy
In this technical session, Steve Flitcroft, VP of Software at iVendi, shares how they built a fully integrated, real-time customer data stack by migrating from Snowplow to RudderStack, leaving their data infrastructure concerns behind.
- The challenges & limitations of Snowplow
- iVendi’s end goal: to easily collect & send data across the entire stack
- iVendi’s current stack
- The migration process
- What’s next
Eric Dodds:
Great. Steve, we've been trying to do this for some time, but you and I both, I guess, are very busy people. We're really excited to have you on this webinar today.
Steve Flitcroft:
Thank you very much. Yeah, it's been a long time coming.
Eric Dodds:
Yeah. And remind me, when did we start working with you? It's got to be, is it a year, maybe a little less than a year?
Steve Flitcroft:
It's over a year. I think it was first March 2020.
Eric Dodds:
Wow.
Steve Flitcroft:
At the start of the pandemic. I was [inaudible], found Rudderstack, and started to speak to you.
Eric Dodds:
Yeah. Very cool. Well, today we're going to talk about the journey that you took the company you're with on, from using SnowPlow to really building a fully integrated real-time stack. So we'll do some introductions, and then I think this is going to be really interesting because you and I have talked a lot over the last year, it's amazing to say that, about just some of the challenges that you had with SnowPlow and some of the ways that Rudderstack unlocked those. And then I'm really interested in understanding just the way you've architected your stack and learning more about that. I think it's always great just to learn how smart people are building their data stacks. And then we'll talk about what was the migration process like, and then of course we want to know what you're going to be up to next. So that's what we'll cover.
Eric Dodds:
So Steve, can you give us a little background on yourself, and then tell us about iVendi.
Steve Flitcroft:
Yeah, sure. So, I've been VP of software and R&D at iVendi coming on nine years now. We do software and web services for the automotive industry in the UK and also in Germany. We act as a connector between the vehicle dealer, the consumer, and also the financier. So we sit in the middle of all three, and we try to provide services that bring everyone together and allow people to purchase vehicles online in a safe, reliable way.
Eric Dodds:
Yeah, that's great. And yeah, I know just from working with you that it's in some ways marketplace software, which tends to be fairly complex, and we can talk about the implications of that for your data stack. I'm Eric. I run customer success at RudderStack and am excited to be chatting about data stuff today.
Eric Dodds:
Okay, so let's just dig right in. You were using SnowPlow, how long had you been using SnowPlow, and then let's just talk through these points that you pulled together around your challenges.
Steve Flitcroft:
Not sure if you've got the right slides, by the way.
Eric Dodds:
I think that those are, huh, oh maybe we are missing that. Maybe Brooks, can you, we'll just keep chatting through this, maybe Brooks, you want to go in and update the slides? The latest one? But yeah, why don't we just go ahead and start talking about it, and maybe Brooks can dump those bullet points in there.
Steve Flitcroft:
Yeah, sure, so we've been running SnowPlow, and still are, for probably six, seven years now. We first talked about the need to host our own and own our own data rather than giving everything to Google. We heard that they were moving into the automotive space, so we really wanted to limit the exposure of our data to Google. So at the time, SnowPlow was a great alternative to host our analytics. Of course, at that time, SnowPlow only had a self-hosted version available. Have you found those slides yet?
Eric Dodds:
Yup. Got it. Sorry for the technical hiccup. Here we go.
Steve Flitcroft:
I maybe edited the wrong one.
Eric Dodds:
Not a problem at all. Let me just pull these up here. Okay, we are in business again. So self-hosted SnowPlow because you had some pretty detailed analytics needs. The hosted version wasn't available when you started using it which makes sense, starting six or seven years ago.
Steve Flitcroft:
Yeah, so the SnowPlow estate was AWS only in those days. I know it's a bit more flexible now, but it meant hosting lots of separate services on AWS: EC2 instances, Redshift, Elasticsearch, Kibana. Pop forward a few slides, two slides. Yeah. So all of those are individual pieces we needed to stand up, upgrade, look after, and monitor. For a small data team inside a small SME, we found we were hesitating to upgrade it because of the many different parts. We found the data team was spending more time looking after SnowPlow than looking after the data it was generating.
Eric Dodds:
That's a common thing that we hear a lot with companies who try to build their own internal stuff, not even using an external tool. They realize, okay, our engineering team is actually spending so much time working on the infrastructure and the pipelines, which we kind of call low-level plumbing problems. They're just trying to move data from one point to another point. And it turns out that it takes an immense amount of time to maintain those pipelines, especially when you scale. When you start moving lots of data, you run into latency issues, ordering issues, there's just a lot of stuff going on.
Steve Flitcroft:
I think another problem was, because you're not inside of it every day, when something does go wrong there's the mental work to try and get back in there [crosstalk] to pinpoint the bit that's breaking and then how to fix it, how to remedy it.
Eric Dodds:
Yup. Yeah, that's a great point. It's not the core software you're building. It's not the products that you're in every day, and I think that's a really good point. But they did roll out the hosted option, and you looked at that but decided that was just going to be too much.
Steve Flitcroft:
Yeah, so we had brief conversations with them, probably a few months before I spoke to Rudderstack, but I think the pricing didn't align with our expectations, especially when COVID started to happen. We had to watch the purse strings, so it wasn't a viable option for us, and we had to look for another.
Eric Dodds:
Yup. And let's talk a little bit about this: the analytics were good, and you sort of said, we need to get away from Google Analytics, there are lots of limitations there. But flexibility was a really big deal. And we had talked, when we were preparing, about how from a basic analytics use case it was fine, but your stack had grown in complexity, and you needed to make a lot more connections.
Steve Flitcroft:
Yeah. So if you flip through, again, and again. Yeah, so SnowPlow, I mean it markets itself on being strict on data governance, reliable, and scalable. But you'll notice a missing word there is flexible. And for us, in order to create a new event schema, a new event type, we just felt the process was onerous: having to curate that schema, publish the schema within SnowPlow's Iglu registry, and then also hand-craft the schema within Redshift as well. And all those parts had to align for it to then enable the data to stream through. If you got any of those parts slightly wrong, it wouldn't work. So when we were trying to embark on a new project with lots of new event schemas, the thought of having to do it in SnowPlow just didn't fill us with joy at all.
Steve Flitcroft:
Other issues that we found, and I must stress it might have been our implementation of SnowPlow, but we found any new event type obviously meant the creation of a new table in Redshift, and that side table, as it were, always had to join back to the core central event table, which meant after many years' worth of data, queries degraded over time. That became a limiting factor as well. Once your event table gets to a certain size, the querying and processing in Redshift starts to take longer and longer.
Steve Flitcroft:
Another thing that made us look away from SnowPlow is they announced a Relay initiative in 2018 which came with a promise of an in-built ability for these events to be passed over to third parties, different tools, but then a couple of years later it had seemingly stalled because only one Relay was available at the time, which was a Relay into the Indicative product analytics tool. But that also meant you had to stand up your own lambdas, which had to interact with the pipeline, so you would still actually have to put in manual effort to enable the integration. So again, I was looking for a tool, a system, an offering that meant our engineers could spend their time looking after the data and understanding the data rather than trying to get the data to where it should be.
Eric Dodds:
Yup.
Steve Flitcroft:
And again, I'd been reading about the data landscape, the change away from ETL jobs and Hadoop and more towards real-time streaming and data pipelines, and we just thought it was a nice time. We had a new project that was coming up, and we understood Redshift wasn't the prime database to enable the analytics that we wanted. So we started to look at alternatives like ClickHouse and thought, right, how are we going to stream the data into ClickHouse? We can't do it from SnowPlow unless we write lots and lots of integration code. So I was looking for a tool that could enable all that for us.
Eric Dodds:
And that happened to be Rudderstack. We're glad you found us. And could you just, I'd love to hear just a brief overview of ClickHouse. Most people run analytics on the big three: Redshift, Snowflake obviously, and then BigQuery. ClickHouse is a really interesting tool though. And you've done some really interesting things with it. Can you just talk briefly about ClickHouse?
Steve Flitcroft:
Yeah, so ClickHouse is the engine behind Yandex, which is the Russian Google. And I was aware of its presence for quite a while from the performance boasts that they were coming out with. I read Hacker News and the Twitter streams of data engineers, so I was aware of it. So I took a look into it, just the sort of use cases and query performance, and compared to the likes of Redshift the difference was massive. 20, 30, 40 times quicker.
Eric Dodds:
Wow.
Steve Flitcroft:
And especially, we were embarking on a new search engine offering, and we had to provide statistics for each product that came back in the search results page. So you can imagine a hundred results per search. We had to store that information in order to give an impression count per-
Eric Dodds:
Oh interesting, right.
Steve Flitcroft:
Per product. The search tool of choice for us is Algolia, and their analytics suite is limited on every data point to the top 1,000. So if we're storing 200,000 vehicles inside of Algolia, the top 1,000 isn't good enough for us, because we need to give real data analysis, analytics for each vehicle: how many impressions, how many clicks each one is getting. And I noticed that Yandex had a project that is based off the Yandex search engine analytics and hosted with ClickHouse. Some of the graphs, some of the search interface, was exactly what we wanted to give our customers for this new search engine offering. So it seemed a great choice.
Eric Dodds:
Yeah. And we had a fun time actually working with you to sort of architect the ClickHouse integration which is now used by lots of our customers, so that was a really fun project.
Steve Flitcroft:
Yeah, that was good too. I'll touch on that in a later slide, I think, so
Eric Dodds:
Yeah. All right, well tell us about the end goal. So we've talked about some of the limitations of SnowPlow and some of the new projects you had, but you're leading the engineering team, managing all of these moving parts, and of course managing relationships with internal stakeholders like marketing and product experience and all that. What's sort of your end goal when you look at the big picture as it relates to the data stack at iVendi and what you're building?
Steve Flitcroft:
Yeah. Like I said, SnowPlow had us frozen. When product teams delivered new features, it took a long time for a centralized data team to try and extract any meaningful information to give back to the product teams to show them how this new feature or how this AB test was performing. So there was a massive time lag because they weren't able in any way to self-serve reports out of Redshift. So my end goal, my ultimate goal, was to enable self-serve data for the product teams. And also not to always use the same tool, to allow flexibility and freedom of choice across the product teams. Because maybe some of them are skilled in Amplitude or Indicative, we wanted a tool that would flow data into multiple destinations so we weren't tied into one technology.
Eric Dodds:
Yeah, that's an interesting dynamic especially as an organization grows because, and this is an oversimplification, but a lot of organizations start with Google Analytics, and then you add maybe another analytics tool in the mix, and then you realize, okay, well that's not good enough and identity resolution's a problem, so then you start with event streaming, and then you realize, okay, well we need to get all that into the warehouse as well so that the analysts can write SQL. So you can kind of get into a sticky situation as far as analytics go, and different versions of the truth as an organization grows, if you don't have all of that solved at the data layer.
Steve Flitcroft:
Of course. There was a massive shift, well there still is, to server-side tagging away from traditional client-side tagging, due to performance and ad blockers and Safari and all those different issues. So with GA and SnowPlow, you're going to find it harder if you're just using those stacks. You don't get 20, 30, 40 percent of the traffic, depending on your market.
Eric Dodds:
Sure. Right, and I mean that's enough to make or break a business. All right, so walk us through the details. You had these new projects starting with ClickHouse, SnowPlow wasn't cutting it, you needed deeper analytics and the flexibility for people to self-serve based on the same data. But walk us through these specifics as you evaluated the decision and sort of built your criteria.
Steve Flitcroft:
Yeah, so as we were doing proof of concepts for all this, trying to do a POC inside of SnowPlow, you've got that strict regime of an event schema that has to be created and published, and your database has got to be updated and kept in sync. Now, I wanted a pipeline that I could just throw a schema at, and it would end up in my warehouse. So of course Rudderstack will take whatever you throw at it and it will curate the tables and schemas for you. And not [inaudible] at the schema it creates, it's good enough for you to then create [inaudible] views off the back of it.
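To make that concrete, here is a minimal sketch of the kind of call Steve is describing, using the RudderStack JavaScript SDK's track method. The event name and properties are hypothetical examples, not iVendi's actual schema.

```javascript
// Hypothetical event sent through the RudderStack JavaScript SDK (the global
// `rudderanalytics` object the SDK exposes once loaded). No schema is registered anywhere.
rudderanalytics.track("Vehicle Search Performed", {
  searchTerm: "used hatchback", // free-form properties, named here for illustration only
  resultCount: 100,
  financeProductShown: true,
});
// With a warehouse destination connected, RudderStack creates or extends a table for
// this event and adds columns for any new properties, so there is no Iglu-style schema
// publishing or hand-written DDL to keep in sync.
```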
Steve Flitcroft:
And we also, I wanted product teams to be able to go into Rudderstack and say, "hey, I'm interested in these events you're streaming in," and point them to their product analytics tool of choice with very little or no engineering at all. They can just say, "all right, I'm interested in this event type from this team over here, stream it into my Indicative." And that's fully self-serve, nobody has to get involved apart from the product manager who can just say, "yup, I'll have some of that." So that obviously allows quicker turnaround time. Everyone's [inaudible] now, they want fast feedback loops, so the ability for product teams to push a feature or an AB test on their side and get the results within minutes inside of Indicative, without the data team having been involved, is a win.
Eric Dodds:
Yeah.
Steve Flitcroft:
I was also, internally at iVendi we use event sourcing throughout the stack, so we love the ability to move the checkpoint right back to the start of time, to replay those events to either rebuild projections or produce new projections on the back of historic event data. So Rudderstack saying that they enable replay of events that have come in is a great win as well. That just means I don't have to replay my events [crosstalk] too.
Eric Dodds:
Sure.
Steve Flitcroft:
I will say at this point, we did look at other alternatives to Rudderstack. I think I probably had five phone calls with Segment, with different people. At that point, I still didn't even have a price. But from reading Hacker News and around the web, if you've got a large amount of anonymous, not-logged-in users, Segment ends up being very costly. We probably see about five million consumers in the UK pass through all our services. If we were to stream all that data into Segment, it would cost us a lot of money.
Eric Dodds:
Yeah, I mean especially in an e-commerce type context where consumers are doing research and so there's very heavy browsing. With a higher price point item like a car, they're going to create a lot of browsing data before they actually identify, and then those that do identify are probably a fairly small percentage of the total traffic.
Steve Flitcroft:
Yup, yup, yup. We also had another big requirement in that we work for our customers, multiple big car dealers throughout the UK and Germany. They all of course have their own websites and their own Google Analytics accounts and tooling, so it's nice for us to be able to... we were looking for a piece of software that was able to stream that data not only to our own warehouse but also to their accounts as well, to keep it a seamless, engineering-light pipeline.
Eric Dodds:
That is such an interesting use case and actually, something that we've seen more and more where there's either a customer or vendor relationship or even an advertising relationship where there's some subset of the data that is being collected on your side that needs to be shared, which is interesting, or technology that runs on lots of customer websites, that needs to syndicate data or analytics back to them. And then another one that came up recently that was interesting is impression data for ads. So you can actually track impression data for ads and then syndicate that back to the company that's running those ads is another interesting use case. Just out of curiosity, are you using individual destinations and then transformations to kind of filter that to make sure that the correct data feed is getting to the correct customer?
Steve Flitcroft:
Yeah, that's right. So we would analyze the various IDs on our side, and then we'll either filter out the event or we'll allow it through.
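As a rough illustration of that kind of filtering, here is what a RudderStack user transformation could look like. The dealerId property and the allow-list are hypothetical stand-ins for whatever IDs iVendi actually checks.

```javascript
// Hypothetical sketch of a user transformation attached to one customer's destination.
// The property name and the allowed IDs are illustrative, not iVendi's real fields.
const ALLOWED_DEALER_IDS = new Set(["dealer-123", "dealer-456"]);

export function transformEvent(event, metadata) {
  const dealerId = event.properties && event.properties.dealerId;
  // Returning the event forwards it to this destination; returning null drops it,
  // so each customer only receives the slice of the stream that belongs to them.
  return ALLOWED_DEALER_IDS.has(dealerId) ? event : null;
}
```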
Eric Dodds:
Very cool. I mean that's pretty neat that you can just use the existing data feed and then filter it to provide analytics to your customers, that's really cool.
Steve Flitcroft:
Yup, yup.
Eric Dodds:
Okay, walk us through the current stack. So you've been through a lot of this journey, so tell us how you're running your stack and what you've put into it.
Steve Flitcroft:
Yup, so we use the Rudderstack JavaScript SDK, and we proxy the calls through our own servers, purely so it's not considered a third party, so we always get the tracks coming through and the identifiers coming through. We also pipe server-side events through our C# stack, and we also use, in a few places, the HTTP webhook integration as well. So it's multiple sources going into Rudderstack, and then obviously Rudderstack facilitates the relays of those to a combination of ClickHouse, Indicative, Slack, and Redis as well in order to build up caches. A really interesting part of this SnowPlow to Rudderstack and ClickHouse move is that we can run them side by side, because Rudderstack's Warehouse Actions allow us to pull existing, legacy data out of the SnowPlow database, and then I can put that into ClickHouse as well. So that means I don't have to do a big bang migration here; I can run them side by side and piecemeal migrate the apps over.
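A minimal sketch of the first-party setup Steve mentions might look like this, assuming the JavaScript SDK snippet is already on the page; the write key and proxy URL are placeholders, not iVendi's real values.

```javascript
// Point the SDK at a data plane URL on your own domain, which proxies requests on to
// the RudderStack data plane. Values below are placeholders for illustration.
rudderanalytics.load("<WRITE_KEY>", "https://events.your-own-domain.example");

// Because events now go to a first-party endpoint, they are less likely to be blocked
// as third-party trackers, so track and identify calls (and their identifiers) keep flowing.
rudderanalytics.identify("user-123", { market: "UK" });
rudderanalytics.track("Vehicle Viewed", { vehicleId: "abc-001" });
```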
Eric Dodds:
Oh, interesting!
Steve Flitcroft:
Yup.
Eric Dodds:
Yeah, that's a really cool use for Warehouse Actions where you sort of incrementally migrate as opposed to having to dump years of historical data at one time, which there are all sorts of considerations around that.
Steve Flitcroft:
Yeah, and it's useful. We've also got various lookup tables inside of Redshift and RDS, so it's very useful, those actions, just to pull those into ClickHouse [crosstalk]
Eric Dodds:
Do you use Warehouse Actions for, I know you were talking about, I mean this is kind of the replays case, do you use it for anything else?
Steve Flitcroft:
Not at the moment, no.
Eric Dodds:
Got it, yeah, super interesting. That's a really cool way to approach migration.
Steve Flitcroft:
Yup. The last piece in the puzzle is we use Superset via preset.io, which is great. You know, it's Airbnb's open-source BI visualization tool, and it fits nicely on top of ClickHouse and also supports most of the other persistence engines as well. We were using, still are, QuickSight, AWS's QuickSight, on top of Redshift, and it worked for us, but it's not the ideal tool. Airbnb invested a lot of time and resources, so Superset is more feature-rich than QuickSight is.
Eric Dodds:
Yup, and that's sort of your highest level business intelligence visualization, yup.
Steve Flitcroft:
Yup, yup, as well as the Indicative product analysis side of things as well.
Eric Dodds:
Yeah, that's really common. I mean, I think that's where a lot of companies are going with their stack when it comes to analytics: you have some sort of layer on top of the warehouse, so Superset, Looker, Tableau obviously has been around for a long time. But some sort of layer that's sitting on top of the raw data and maybe doing some sort of processing, though actually we're seeing a lot of teams do the processing with dbt and then just produce it for the visualization layer, and then the downstream teams can use whatever other tools they want. And at large organizations actually, you'll see multiple product tools, certainly multiple marketing analytics tools, so that seems to be a very common architecture when it comes to self-serve analytics.
Steve Flitcroft:
Yeah, and I'm not saying ClickHouse is our end destination, but with Rudder, I can obviously start to use Snowflake in places if the need arises and it proves to be the best tool for the job.
Eric Dodds:
Yeah.
Steve Flitcroft:
I'm not tied in, I've got a flexible architecture.
Eric Dodds:
Yeah, I think that's the great thing. I think when self-serve analytics first became a thing, people thought, okay, well it's a BI layer on the warehouse, but it's actually a data feed that can pipe to any warehouse and any analytics tool, sort of flexible for any team, which is interesting.
Eric Dodds:
Okay, so let's talk about Rudderstack. It's been so fun to work with you, but we'd just love to know the specific ways that Rudderstack helped you.
Steve Flitcroft:
Yeah, so the best win for us is the ease of use, the ability to send a new event into it and have it end up in ClickHouse within half an hour. I've got that data, and I don't have to do a thing apart from send the event over to Rudderstack. I don't have to curate anything, hand-roll anything, create schemas, or do anything in it. And that allows me to be super flexible and run proofs of concept, trial and error, without having to do a lot of work. You can also show the power of it: for any events flooding in, you can single them out and send some into Slack, so people inside the business can instantly see, oh, you know, here are all these events for a certain time flooding into a Slack channel, and I've only had to do a couple of clicks inside of Rudderstack to enable that. And people think that's great. I don't have to hand-roll my own code, I don't have to stand up [crosstalk] anywhere, I don't have to do anything.
Eric Dodds:
Yeah, that's one of those things that's funny where, as a business usually you kind of say, well can't I just get notified in Slack when this thing happens? But from an engineering standpoint, it's like, well the number of pipelines you have to manage to make something that seems so simple happen if you hand roll it is actually pretty wild.
Steve Flitcroft:
Yup, yup, yup. So you can actually see now, people are talking, oh, that feature's going to be Q4, that team's got to speak to this team, and then we've got to deploy that. I can watch this conversation happening and I can enable it within a couple of hours. By doing a few simple things to send Rudder some events, I can pipe it back into a different API or a different team very quickly, and we've got an integration between the two. It may not be the perfect architecture, but at least it gets them going and they can get the first cut of functionality out there pretty quickly. So it enables the cross-team collaboration of data without a lot of engineering effort, and it doesn't have to go to the back of the backlog and wait for a few months' time. You can enable things a lot faster.
Steve Flitcroft:
Last but not least, I just wanted to show the power of Rudder into ClickHouse, and there you can see a small view of the array data I was talking about earlier. So each one of those rows in there is a list of 40 vehicles stored on one row; all the information for those search results is held in a single row inside of ClickHouse instead of 40 rows duplicating all of the data, so it's very efficient from a data storage point of view. But at query time I can unroll those arrays in order to produce a normal, standard table view as if it wasn't in arrays. So it's a very powerful way of storing data and querying massive amounts of search result information.
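For readers unfamiliar with the pattern, here is a hedged sketch of how that unrolling could look from Node using ClickHouse's ARRAY JOIN, via the @clickhouse/client package. The table and column names are hypothetical, not iVendi's actual schema.

```javascript
// Sketch: one row per search, with parallel arrays holding the ~40 results.
// ARRAY JOIN expands the arrays back into one row per vehicle at query time.
import { createClient } from "@clickhouse/client";

const clickhouse = createClient({ url: "http://localhost:8123" });

const result = await clickhouse.query({
  query: `
    SELECT search_id, vehicle_id, price
    FROM search_results              -- hypothetical table: one row per search
    ARRAY JOIN vehicle_ids AS vehicle_id, prices AS price
    LIMIT 10
  `,
  format: "JSONEachRow",
});
console.log(await result.json()); // reads like a normal flat table to the consumer
```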
Eric Dodds:
Yeah, when you think about e-commerce categories that contain sets of related information...
Steve Flitcroft:
Title, anything like that.
Eric Dodds:
Yup, absolutely. Very cool. Well, let’s talk about the migration process really quickly. So we talked about the challenges, we talked about your end goal, some of the things that you love about Rudderstack and that we've enabled you to do, but you actually, you had to do the work of migrating, so tell us what that was like.
Steve Flitcroft:
Yeah, I mean I've touched on it, but we didn't have to do a big bang; it wasn't one day SnowPlow was there and the next day it was going to disappear. We can run the two side by side. On our first approach to Rudderstack, you didn't have the ClickHouse integration available. I think another customer of yours had asked for it, I came along, and you decided to put some engineering effort into it, and I think it was roughly three weeks before you had it turned around and enabled. So that was super cool, to find a partner who is willing to go that extra mile to turn around a new integration within a matter of weeks. It sent all the right vibes. Similarly, I think I put Indicative in touch with you guys as well, and you turned that around pretty quickly, which was really helpful.
Eric Dodds:
Yeah. That's, I mean that is a fun part about where we're at as a company, is that we work very closely with our customers to prioritize the roadmap based on needs. So when multiple customers need a destination and enough ask for it, we can prioritize that on the integrations roadmap. Because, I mean, with the thousands of tools out there, the integrations roadmap is basically infinite. But I would also say another thing for us, which was really fun to work with you and the other customers on as far as ClickHouse goes, is that there are lots of marketing tools and lots of analytics tools, but generally the most important ones tend to be core infrastructure tools. So working on things like data warehouses, tools like Redis, streaming tools like Kafka or Kinesis, these are sort of the parts of the data stack that manage customer data internally that are non-negotiable and are core. And so, because Rudderstack builds for engineering teams who then serve the downstream teams of marketing and product, etc., those are things that we want to make sure we support, because it just makes integrating your entire stack and your existing infrastructure so much easier.
Steve Flitcroft:
Yup, yup. Yeah, so while you didn't have the ClickHouse integration, I was able to use another one you did have, Azure Event Hubs, and spin up a little process that would take all the events off Event Hubs and batch load them into ClickHouse, so I only needed to stand that up for a few weeks whilst I waited for your engineers to finish. So currently, as I mentioned, we're running both simultaneously. There's no big rush for us to migrate everything off, and like I said, we can pull data out of Redshift using Rudderstack and stick that into ClickHouse, so there's no mad rush for us to work through all of our web applications that are sending SnowPlow events and convert them to Rudderstack. We're going to do it in a methodical way.
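A small bridge process like the one Steve describes could be sketched roughly as follows, using the @azure/event-hubs consumer client and the ClickHouse Node client; the connection string, hub name, and table name are placeholders, and the event bodies are assumed to be JSON.

```javascript
// Rough sketch of a temporary bridge: read events off Azure Event Hubs and batch
// insert them into ClickHouse. All names and connection strings are placeholders.
import { EventHubConsumerClient } from "@azure/event-hubs";
import { createClient } from "@clickhouse/client";

const clickhouse = createClient({ url: "http://localhost:8123" });
const consumer = new EventHubConsumerClient(
  "$Default",                          // consumer group
  "<EVENT_HUBS_CONNECTION_STRING>",
  "rudder-events"                      // hypothetical event hub name
);

consumer.subscribe({
  async processEvents(events) {
    if (events.length === 0) return;
    // Each event body is assumed to be a JSON object matching the target table.
    await clickhouse.insert({
      table: "raw_events",             // hypothetical destination table
      values: events.map((e) => e.body),
      format: "JSONEachRow",
    });
  },
  async processError(err) {
    console.error("Event Hubs consumer error:", err);
  },
});
```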
Eric Dodds:
Yup. Makes total sense.
Steve Flitcroft:
We can run them at the same time as well, so I don't have to remove the SnowPlow events, I can just add in Rudderstack ones, run them side by side.
Eric Dodds:
Yeah. Yeah, and since you're running SnowPlow on your own infrastructure, it's not like you need to continuously develop on it, you just need it to maintain stasis while you migrate everything over.
Steve Flitcroft:
Yup, yup, yup.
Eric Dodds:
Yup.
Steve Flitcroft:
Yeah, and you don't lose anything. As long as you're using the same customer identifiers inside of Rudderstack and SnowPlow, it's easy to join the data together.
Eric Dodds:
Yeah. Okay, well I want to leave enough time for questions, so let's talk about what's next, what are you, now that you sort of have your stack in a place where it's flexible, your engineers aren't spending so much time on plumbing, you've made the product and marketing teams happy, what's next?
Steve Flitcroft:
Yeah, so obviously with the added flexibility that you get with Rudderstack, we've lost a bit of data governance; I don't have to be as fastidious about event schemas. I'd like to add that back in. There are various tools out there that can facilitate that, from the design of new events through to handing that information over to your delivery teams, your different product teams, all the way through to your data engineers. And I believe you may have something coming up that can help with that.
Eric Dodds:
Oh yeah. I'm excited, and your feedback actually has been really helpful in thinking through that. Data governance is a very important but actually pretty tricky problem. And different tools have sort of solved it in different ways. Of course, you have the SnowPlow approach where there's a very rigid process to deploy, and you sort of have this paradigm of things being strongly typed, etc., and we looked at that pretty hard. But ultimately when we talked with just tons and tons of data engineers doing work on the front lines, they said the same thing you did, which was, I need to deliver value pretty quickly, and this doesn't help me, but I also need better data governance.
Eric Dodds:
And so we're looking at integrating with tools, data governance tools that are out there via API, and then we've already started building several of our tools that will actually allow you to enact data governance on the stream via API in some pretty cool ways. Because it turns out that there are certain components that a UI is very helpful for, but things like version control, alerting, and other components like that where you already have infrastructure and process in your development life cycle, will really make data governance I think a lot easier to manage because it's just so difficult.
Steve Flitcroft:
So to dumb it down there, essentially if an event schema that's been coming through suddenly changes for whatever reason, there'll be some sort of interceptor inside of Rudder that will notify and help you identify that.
Eric Dodds:
Yup. Yeah, we're pretty excited. And there's a team working really hard on that, and so just keep an eye out in the next couple of months. Steve, of course, you've helped give us feedback, and so hopefully you can give us feedback and maybe an early version, but for everyone keep an eye out because in the second half of this year, we'll have some stuff coming out that I think will be pretty neat.
Steve Flitcroft:
Good news.
Eric Dodds:
All right. Sorry, go ahead.
Steve Flitcroft:
Yeah, the next one was, as a company, we're moving over to HubSpot, so the already-in-place Rudderstack integration with HubSpot is going to come in useful. And with Mailchimp we can start to automate marketing campaigns and various features like that with, again, very little engineering effort. We want to roll out this ability for all teams to send their data into Rudderstack and allow them to self-serve into their tools of choice so they can get feedback on their features, the data, the AB tests, quickly; they don't have to have a conversation with different teams, which would hold them back. So hopefully we can move faster with product development as a whole.
Eric Dodds:
Awesome. Okay, well thank you so much for joining us, everyone. And we have some time left for questions, so you can raise your hand, pop something into the Q&A, and I'd be happy to, if you want to ask a question, you can raise your hand and I will unmute you. One question here for you Steve, how big is your data team, and how do you have it structured? What are the different roles?
Steve Flitcroft:
Yeah, our data team is quite small. It's probably three or four people at the moment, but we do billions of finance quotes every month across all our websites, across all our services. We have a lot of data, we have a lot of data points, we have a lot of analytics. So it's a very small team for the estate we have. We're trying to grow the team, but we'll see. Data engineers are hard to come by, especially in the UK, never mind within our geographical area, you know, in Manchester in the UK; they're pretty hard to find. There's lots of competition, so we are trying to grow the team.
Eric Dodds:
Yeah, absolutely. Well if there's anyone listening in the UK, and you're interested in working for a great data team, feel free to reach out. Another question, what were you using for marketing before HubSpot?
Steve Flitcroft:
Our old CRM was...I've completely gone blank.
Eric Dodds:
Well, that means it wasn't that inspiring.
Steve Flitcroft:
I'll come back to that one, I'll just find it.
Eric Dodds:
Yeah, yeah, no problem.
Steve Flitcroft:
I'm embarrassed.
Eric Dodds:
Well, you're...you lead the engineering team, I don't know if-
Steve Flitcroft:
I'm not in the CRM that often.
Eric Dodds:
Yeah. All right, well I don't-
Steve Flitcroft:
Capsule, Capsule.
Eric Dodds:
Okay, gotcha. Got it. Yeah, you don't see that one super often. I don't see any more questions coming through, so I'll just ask you the last question. Any high-level advice for someone building out a data stack or starting on the journey you were starting on?
Steve Flitcroft:
I'll just say, I mean using Rudderstack, even the free cloud-only version, allows you to proof-of-concept things very quickly without a lot of effort, and it's perfect, especially if you've got a small data team or no data team; it's a perfect starting point. Get in there, maybe trial a few of the warehouses and just try them out, with Rudderstack as the fulcrum. It's very easy to switch and choose and grow over time. Everything's a little bit trial and error, trying to work things out, because it's really hard to say, right, I need a data pipeline, I'm going to choose this X, Y, and Z. Be focused. You can read Hacker News. I think there was a post in 2019 that asked, "what's your BI stack?" and I think you got a couple of hundred different answers.
Eric Dodds:
Yeah, yeah.
Steve Flitcroft:
It is that diverse. There are so many products and tools and databases available that it's really hard just to say, here's our stack, because every use case is different. You'll need to analyze that and work out the best tools for what you need at this moment in time, and what can probably take you into the future, if you can think that far ahead.
Eric Dodds:
Yeah. Yeah, that's great advice.
Steve Flitcroft:
Flexibility is key: the ability to pivot and shift if you have to.
Eric Dodds:
Yeah, I agree, I agree. Even if you...I think one example of that we've seen a lot is, okay I signed an annual contract with this analytics tool, well guess what, when the year's up people are going to say whether they like it or not, and if they don't, you're going to have to change it out. And that is non-trivial if you've done direct instrumentation. Well Steve this has been really helpful. Thank you to everyone who joined. We will send out a recording of this for those who didn't join and thank you for your time.
Steve Flitcroft:
Excellent, thank you very much.