Real-time eCommerce Analytics with BigQuery and RudderStack
What we will cover:
- The evolution of the eCommerce stack: how have you modernized your data stack over the last 5 years?
- Warehouse-first analytics vs. traditional SaaS analytics for eCommerce
- eCommerce performance: how to achieve site speed and real-time analytics
- Serving marketing and data science from the warehouse
- What's next: real-time recommendations with RudderStack and Redis
- Q&A
Speakers
Eric Dodds
Senior Director of Product Strategy
David Annez
Head of Engineering
David is a leader and generalist engineer focused on business outcomes over technology. He leads cross-functional agile teams of engineers, designers and product owners.
Transcript
Eric Dodds (00:02)
Thank you for joining this webinar with RudderStack and LoveHolidays. So, here's a quick overview of what we are going to cover today. So, the topic is real-time e-commerce analytics with BigQuery and RudderStack. We'll actually talk about more complex parts of the stack, as well, but we'll get into that in a bit. So, we'll do introductions. You'll get to meet David from LoveHolidays. We'll talk about the evolution of the e-commerce stack, so how were things done previously, and then how are they done today by forward-thinking companies like LoveHolidays. David will talk us through what warehouse-first analytics is and how that compares to SaaS analytic solutions for e-commerce.
Eric Dodds (00:50)
We'll talk about site speed, which is a big deal. It's been a big deal in e-commerce for a while. It's becoming critical for e-commerce and then also, really, any consumer-facing experience. And then David's going to walk us through how he feeds both marketing and data science from the warehouse model that we're seeing more and more in modern companies. And then we'll talk through some of their plans around doing some real-time recommendations. So, let's dive in. David, welcome. Thank you for joining the webinar. It's great to have you here.
David Annez (01:25)
Thanks for having me, Eric. And yeah, good to be here. I guess just an intro for me, I'm currently head of engineering at LoveHolidays, or in the US I should say LoveVacations. We currently sell holidays and hotels in the UK and in Ireland, and we generally have around 20 million unique users a month, so pretty big scale. And data is a key part of basically everything that we do, and especially when we're trying to monitor customers. And I'm responsible for how we apply it, not how we analyze it. Luckily, that's our data science team. And then, also, how we feed that in from the website and how we filter all of those events into some good views for the business.
Eric Dodds (02:17)
Very cool. And just a quick question. What were you doing before you joined LoveHolidays to lead engineering?
David Annez (02:23)
Yeah. So, I was head of engineering for Uswitch, which is a price comparison website. And I actually did a lot of the same there. In fact, integrated with similar approaches at Uswitch. And I spent seven years there [inaudible 00:02:41] core infrastructure and the website together with about eight engineering teams.
Eric Dodds (02:47)
Wow. Tons of experience. Can't wait to learn more. A quick introduction for me. I'm Eric Dodds, and I lead customer success here at RudderStack. So, let's dive in. Let's talk about the evolution of the e-commerce stack. And this is a subject that David and I chatted about before the webinar as we were preparing, and I think it's really interesting. So, David, let's just talk through some shifts that we see. And just to set the context here, we're talking about, really, modern companies and pretty large-scale companies. Right? So, this doesn't necessarily apply to maybe a smaller e-commerce business running on a platform. But before a lot of companies, even really large scale, and you still see today a lot of companies running platforms, but we see a larger move to owned infrastructure. So, that's the migration from Shopify to modern JavaScript frameworks. Do you want to talk a little bit about that, David, and why we see that happening?
David Annez (03:55)
Yeah. So, I guess it depends on where you're at, at that business level. Right? You start up, scale up, and scale out. But I think a lot of the time, you end up building some basic systems, they do what you need them to do, and once you start growing, you want to differentiate yourself and you want to start building out something that is potentially more custom and dealing with your demographics in a better way. And additionally, you want to move quicker, right? Maybe you have your own engineering team at that point and you're building several other engineering teams. Once you have that, there's probably a desire to start building out your own platform. And very much catering on the e-commerce side, you're building out... That's your differentiator, right? I think, especially in the LoveHolidays world, we've decided to build our own e-commerce platform because that's how we become the best holiday finder for customers. And I think the same can be applied for most e-commerce sites as you start scaling up and then scaling out.
Eric Dodds (05:04)
Sure. Yeah. I think an interesting way to summarize that would be platforms are great. I mean, Shopify is, really, an incredible tool in many ways, but by nature, it has to do many things for many types of users. And as you grow and focus, you reach the limitations of that.
David Annez (05:24)
Exactly. And I think it's the classic build versus buy and figuring out at what point in your time, I guess in the growth of your business, that you decide to do that. But because of the accessibility of building your own platform and owning that, I think it means that a lot more people are building versus buying because of the versatility it gives them.
Eric Dodds (05:48)
Sure. All right. So, third-party vendors in black-box machine learning and AI, I put that in quotes because that's a subject unto itself, but I think we're seeing a shift towards... from some of those tools, again, as we think about companies that are scaling to massive volumes with more focus, more to an owned data science function. And one example I'll give here is recommendations. I mean, there are tons and tons of third-party SaaS tools and plugins and everything for e-commerce, and a common one you see is recommendations. Right? Like make the best recommendations, and it's an outsourced algorithm of sorts. But more and more we're seeing people move towards their own data science, and I think that's something that's happening inside of LoveHolidays. Can you tell us just a little bit about that?
David Annez (06:38)
Yeah. I think it's a really interesting one. Right? I think that preset machine learning models and black box AI vendors, et cetera, could only get you so far, and I think that once you start understanding the domain and the needs of your customer and the data that you have, it starts becoming quite apparent that having an in-house data science team to build out those models is going to get you there a lot faster over time. I think like [inaudible 00:07:10] and that's where LoveHolidays is at now. Right? It takes a lot of time initially to gear up and get that all set up and make sure that you have the right people to build out those models, but also to think about what we should be looking at in the future.
David Annez (07:27)
But once you have that, you can move a lot more quickly because everything's in-house, the models and the underlying technology's understood, and we can make tweaks and we can continuously improve it to actually cater much better for our customers. I think the black box recommendation engines that you just feed it data, but in fact, if you think about a holiday and all of the different combinations and the... Actually, we know that, for example, people that are in the south of England have a larger propensity to book Greek islands versus in the north of England, and once you start thinking about the complexities of demographics and holidays, it made a lot more sense for us not to potentially buy that system, but actually build it in-house so that we can own it and evolve it over time.
Eric Dodds (08:16)
Sure. Very cool. I'll quickly go through number three, siloed data before, to unified data in the warehouse. This is pretty common. I mean, there are still a huge number of companies who are in the process of making the shift or haven't made it yet, but I think most of our audience will really have a good handle on why that's important. The example here is many times, especially in an e-commerce context, you're using Google Analytics very heavily. It was the standard tool for optimization and e-commerce. And then some sort of combination of internal platforms, pulling data from the database and somehow combining all of that information to produce some sort of business intelligence, which is a technical mess. Versus leveraging a modern data warehouse like BigQuery and just dumping all of your data in there, which works way better.
Eric Dodds (09:12)
So, let's skip to the last one because I want to be conscious of time. It's common knowledge in e-commerce, speed equals better performance, better conversion rates, et cetera, and it's been that way for a while. But speed is really becoming mandatory. Recently, Google released news that they're going to have even more strict requirements around site speed. So, David, talk to us about your experience dealing with that. Sounds like you've had a decade of dealing with the front-end user experience in an e-commerce context, so tell us about site speed and how things have changed.
David Annez (09:49)
Yeah. It's quite funny, I think that we've put ourselves into this position by building complex applications driven by JavaScript, but that's [inaudible 00:10:01] topic [crosstalk 00:10:03].
Eric Dodds (10:03)
We blame ourselves.
David Annez (10:04)
Yeah. That's a whole other webinar. I think that performance is something that [inaudible 00:10:11] LoveHolidays, we proved that it had a significant impact on conversion. But it's not just about the conversion side of things now. It's actually about SEO visibility. Over my 10 years working in the front-end I've focused drastically on performance, not just because it gives you a better experience, but you tend to build better systems if you focus on [inaudible 00:10:37] performance systems. And nowadays, everyone is trying to chase this performance goal, and now that Google has set it, it's obvious that we need to ensure that we're at least adhering to some sort of principles around it. Driving a better experience is one thing, but also ensuring that you're building something that is maintainable for the future.
David Annez (10:58)
And the Google requirements are just that last key point where I think a lot of companies that have probably been doing it, maybe ad hoc, have now said, "Well, actually, we now need to do some serious work on this because, in fact, it's going to affect our SEO," which then, of course, in turn, affects your overall revenues as you're relying on that traffic source.
Eric Dodds (11:19)
Yep. Kind of making performance the first principle as opposed to a project.
David Annez (11:25)
Exactly. Yeah.
Eric Dodds (11:27)
Well, let's talk about warehouse-first analytics versus SaaS analytics, so going back to the point earlier around heavily using Google Analytics and all the other e-commerce SaaS analytics tools versus the warehouse. So, David, I don't want to go through bullet by bullet, I'd just love to hear you talk to these points around what are the challenges and even benefits of SaaS analytics, and then why did you make the strategic decision to move to warehouse-first analytics where you're streaming events into the warehouse and then doing what you need to do there in a variety of contexts, but a primary one being the analytics use case?
David Annez (12:13)
Yeah, absolutely. So, I mean, I think just to quickly talk about the benefits, when you've got SaaS analytics, you drop in a script and, hey presto, you've probably got some sort of view on your customers and maybe conversion and even performance metrics through some dashboard that Google Analytics or some other provider has given you. Which is great, right? I think that's a really great start to start to understand the data that you have and the website usage.
David Annez (12:39)
But the problem becomes that once you start wanting to do more with that data, once you start wanting to use that data with other parts of data and actually build out potentially more complex dashboards or, in fact, enhance the data that you have on the site, things become a bit more brittle. And because SaaS analytics tries to generalize for all, you end up with a mishmash of datasets, you probably end up with some of your custom queries, and you probably end up trying to build views on top of things that were never meant to be built upon. And I think that that gets quite hard over time because, as you want to absorb more data and as you become maybe more data assisted in your day to day, you probably want to have that beautified view of everything and you want to have that deeper dive into that data. So, that's like where the limitations lie.
David Annez (13:31)
And I think the other part is because with SaaS analytics, it's trying to do everything, and when it's trying to do everything, they focus more on features versus performance, which means that actually you're probably loading in loads and loads of JavaScript to potentially just track some page views and you don't need it. But because that's been built up over time, all of those additional features are just there and you can't really do much about it. And naturally, you can't really contact Google Analytics and tell them, "Hey, can you give me something different, please?" That's, I think, where it gets quite hard, and then you start thinking, "Well, do I build my own, or do I do something different?" And I think that's where warehouse-first analytics really come into play and where we've seen the big benefits from it.
Eric Dodds (14:19)
Sure. Yeah. It is interesting going back to just thinking about our conversation around the platform versus modern JavaScript infrastructure for e-commerce, it seems like SaaS analytics is similar where there's a great place for it, very useful, but as you get into scale and specificity, the tools that are all things to all e-commerce users don't really make sense, or they make more complex optimizations untenable.
David Annez (14:51)
Yeah, absolutely. And I think that it's like one of the things that you end up trying to do at a larger scale when you're a bigger company and when you want to have that visibility, is unify your data and unify and visualize it in the same way so that you communicate in the same way to everybody in the business. And unfortunately, if you have 25 dashboards trying to tell you one individual thing, maybe one's for performance and one's for web analytics and the other one's for conversions, you end up not being able to create that cohesive view for the business and for yourself.
Eric Dodds (15:25)
Sure, sure. Very interesting. And then let's talk quickly about cost. So, one thing that is really common is you start out using a SaaS provider and it is very cost-effective initially, but the unit economics, especially in e-commerce, can have a very, very low tolerance for cost at scale. Right? Where, at a lower scale, the pricing models make sense, but then as you scale up, you're looking at, basically, what are the unit economics around each transaction? So, is that something you've experienced with SaaS tools that you've used in any of your previous roles or at LoveHolidays?
David Annez (16:12)
Yeah. So, I think that when we start talking, I think specifically, there are some products out there that are great but, unfortunately, charge you for that, what they call, an active user or a user ID. And when you start growing and when you start adding customers to, I guess, your visits, then it becomes pretty hard to afford that sort of cost. And I think some systems go to some ridiculous numbers, which makes it really, really tough to justify. I think that the flip of that, as well, is that the SaaS analytics, where maybe it's free for a good period of time, or it's free for forever for all numbers, then actually don't give you any sort of features that you can use to then further analyze it. Right? So, it's like you have those two models where one of them is that if you're at scale, you pay hundreds of thousands, or you pick a free system that then actually to do anything with, you probably need to pay hundreds of thousands to get the data out of it. I think Google Analytics is a good example of that.
Eric Dodds (17:23)
Sure. Yeah. With 360.
David Annez (17:25)
Yeah.
Eric Dodds (17:26)
Yep. Well, speaking of 360, so this was the stack that you had before, so do you just want to give us a quick run through and then we can look at the stack that you're running today?
David Annez (17:37)
Yeah, sure. So, when I joined LoveHolidays, we had a very complex conversion rate model and attribution model, which I'll talk about in a second, but all of the data and all of the analysis was based on Google Analytics. And we paid for Analytics 360 which, if you don't know, lets you export data into BigQuery. Now, you have no control over said export, and actually, that export, it happens every day, but you don't know if it's going to break or not because, in fact, you have no visibility into what Google is doing. And I think, on average, out of seven days a week, we would end up with at least four days a week with delays in reporting. You can imagine the cost of that to the business because, in fact, this was our way of reporting on attribution and conversion, and sometimes we weren't even getting our conversion reports the next day.
David Annez (18:31)
So, the CEO was waking up in the morning, looking into dashboards and the dashboards hadn't updated so we had no idea about our performance. And then to top that all off we were using, like Tableau, was feeding into that, and Tableau is fine, but unfortunately, we couldn't scale that out to the entire business so there was limited company visibility into some of the deep analysis or the deep attribution conversion rate models that we built. So, it was very hard to... And this just kept becoming an issue over time, so it was clear that we needed to move away from this.
Eric Dodds (19:05)
Sure. Yeah. I can imagine days of delay in conversion reporting during a peak season for travel e-commerce is very damaging-
David Annez (19:21)
Yeah. Yeah.
Eric Dodds (19:22)
... from a business standpoint.
David Annez (19:23)
And I think just to add to that, we were running loads of AB tests and [inaudible 00:19:29] to be able to see that performance and that [inaudible 00:19:31] we had different service looking at that information. We also moved to our own system and then we didn't have the same visibility as quickly as possible because we were still using Google Analytics to report on conversions.
Eric Dodds (19:46)
Right. Which becomes a significant efficiency problem. Right? Because if you're running an AB test and if you don't have the results even one or two days of something not working, you want to turn it off once you see the statistically significant results. Okay. The warehouse-first stack, so talk us through what you run today.
David Annez (20:10)
Yeah. When I joined, we didn't even have a mobile app, now we do. And we currently feed all of our data through RudderStack, all of our web events and all of our app events, and even some of our server-side events from some of our applications. We have a destination that's set up to BigQuery, that destination gets updated every 15 minutes, and then we have multiple different use cases for that. And we chose BigQuery as our store, primarily, because that's where we can do a lot of different things off the top of BigQuery.
David Annez (20:47)
The plan for that, which I'll talk about in a bit, is to move to PubSub, but currently, we then feed all of our BigQuery into Data Studio. And every team has different views on top of all of these events and they can create their own visualizations depending on what they want to see from the web events that we send. And then we have Looker, which is kind of... We're not thinking of moving everything potentially to Looker, but Looker is way more powerful for when you want to slice and dice the data and potentially add different data sets that may not be in BigQuery at that given point. So, we have that for some deeper dive views, especially when we talk about conversion rate.
David Annez (21:28)
And actually, the big thing that we do off the back of all of this data is we have financial data, et cetera, that we need to process to add to our attribution report to understand how much money we've made, et cetera. So, we run a job in BigQuery, we export to Avro, we put it into cloud storage, and then we run it through Apache Spark, where Apache Spark connects to a lot of different databases and a lot of different endpoints where we gather data from different parts of the company, but also external. And then that actually processes what we would call user sessions in our attribution report and then filters it back into BigQuery, so we have this great view of our conversion and our attribution, but it's also always readily available. Right? Most people now have the ability to see data within several hours versus, I think, it's sometimes several days.
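Illustration (not from the talk): the step of turning raw events into "user sessions" for a conversion report can be sketched in a few lines. This is a minimal sketch in plain Python rather than Spark; the 30-minute inactivity gap, the event names, and the function names are all assumptions made for the example, not details of the LoveHolidays pipeline.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: a session ends after 30 minutes of inactivity.
SESSION_GAP = timedelta(minutes=30)


def sessionize(events):
    """Group (user_id, timestamp, event_name) tuples into per-user sessions."""
    by_user = defaultdict(list)
    for user_id, ts, name in events:
        by_user[user_id].append((ts, name))

    sessions = []
    for user_id, rows in by_user.items():
        rows.sort(key=lambda r: r[0])  # order each user's events by time
        current = [rows[0]]
        for prev, row in zip(rows, rows[1:]):
            if row[0] - prev[0] > SESSION_GAP:
                # Gap too long: close the current session and start a new one.
                sessions.append((user_id, current))
                current = []
            current.append(row)
        sessions.append((user_id, current))
    return sessions


def conversion_rate(sessions, goal="order_completed"):
    """Share of sessions that contain at least one goal event."""
    if not sessions:
        return 0.0
    converted = sum(any(name == goal for _, name in rows) for _, rows in sessions)
    return converted / len(sessions)


# Hypothetical events: u1 converts in a first session, then returns hours later.
t0 = datetime(2021, 1, 1, 9, 0)
events = [
    ("u1", t0, "page_view"),
    ("u1", t0 + timedelta(minutes=5), "order_completed"),
    ("u1", t0 + timedelta(hours=3), "page_view"),
    ("u2", t0, "page_view"),
]
sessions = sessionize(events)
rate = conversion_rate(sessions)
```

In a real pipeline the same grouping would run as a distributed job over the exported files, with financial data joined in before the result is written back to the warehouse.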
Eric Dodds (22:23)
Sure. Yeah. I mean, we call this real-time e-commerce analytics, and real-time is an interesting term for sure. So, you're doing event stream via RudderStack into BigQuery, and you went from every 24 hours via Google Analytics 360 to 15-minute syncs to all your downstream tools.
David Annez (22:48)
Yeah. Yeah. And I think-
Eric Dodds (22:49)
That's incredible.
David Annez (22:53)
Yeah, it's pretty crazy. Right? And I think it's something that we celebrate regularly because what we actually did for a long time with the data engineering team was if the reports didn't come in by 9:00 AM, it would be an incident, so you could imagine most of the time the data engineering team was responding to an incident because the reports weren't coming in. And it wasn't under their control, but I think it's a big win to go from, I guess, 60% a week incidents to no incidents because now we have that clear and very quick way of reporting.
Eric Dodds (23:27)
Very cool. That's incredible. And really cool that you're using... I mean, BigQuery is a really cool tool. That's probably another webinar because I'd love to nerd out on that. But Data Studio is a great tool for many things. And then the combination of Data Studio and Looker for different purposes I think is really smart in many ways. Because Data Studio can be really good for self-serve type things, but it has its limits. So, it's kind of cool to see that analytic stack there.
David Annez (24:00)
Yeah. Yeah. I think Data Studio definitely has its limits. I think it's getting better by the day and they're constantly adding to it. But I do see that as the smaller child of Looker where it's great for that quick visualization, and then if you want to go out and do some bigger visualizations and connect to other things, then Looker is your choice.
Eric Dodds (24:24)
Sure. Very cool. All right, let's talk about site speed because we mentioned this before, but we actually had the chance to work with you to do some pretty cool things with the SDK. So, initially, when we started talking, you had some pretty significant requirements around speed, so I'd love to just talk about the limitations of the SDKs you were using before and then just talk us through the two methodologies that we worked together to have a significant step-wise decrease in load times.
David Annez (25:07)
Yeah. So, I guess prior to RudderStack, we'd actually started migrating to that warehouse-first analytics with Segment. And I think Segment does some things really well, but unfortunately, with our clear focus on performance, we noticed that the Segment JavaScript snippet wasn't particularly performant and was also doing a lot of things that we didn't need it to do. We have, I think, quite a straightforward warehouse-first integration. I think others don't. But, at the same time, we wanted to be able to change it, improve it, make it better, and it didn't seem that we could.
David Annez (25:49)
And unfortunately, we were seeing that generally in [lighthouse 00:25:52] and performance scores, saying it was one of the key outliers, along with Google Tag Manager and Google Analytics. But in fact, we felt that what we were doing with Segment, was causing us problems, and we didn't really see a way out of that. Segment has gone for that Jack-of-all-trades trying to do everything and with all of the destination [inaudible 00:26:15] it has, but we felt like it wasn't configurable enough to give us that performance. So, that's why we started looking for an alternative, where RudderStack came in.
David Annez (26:27)
And I think that to talk about where we came from, so we worked really closely with RudderStack, which was awesome at the start, to define what we thought was a good JavaScript SDK and give some context around the performance and what was needed and why we thought that this was the key part for us going forward. And I think there were two key areas. Right? One of them was generally like the size of the bundle, so download as little as possible depending on what you're using it for, which was really well understood because I think what had happened is RudderStack had built out the destination support for all of the key destinations, but it was still loading that all for everybody. And with that kind of context of, "Well, actually, we could just choose to what we load in..."
David Annez (27:25)
And I argue that most companies have some control over their code so you want an engineer to potentially connect to destination [inaudible 00:27:35], which means that we can then just say, "Well, we'll require that destination explicitly," [inaudible 00:27:40] RudderStack move towards this model of it'll only load in libraries needed for the destinations that you're defining, which reduced the bundle size drastically. I can't even remember the numbers. It was something like 200 kilobytes to 60 kilobytes or something like that, which was a huge win, and also reduced the parsing time and the execution time of the script.
David Annez (28:04)
There was another piece, actually, that isn't necessarily called out here, but I think it's super cool because it's something that we do with all of our JavaScript, is the RudderStack SDK only sends modern JavaScript to modern browsers and legacy JavaScript to older browsers, which means that your users who are using a more modern browser, which is always going to be over 90%, are actually getting a much smaller bundle than the users that are... because it's heavily optimized for that modern browser, which I think is awesome. I think more companies should be doing stuff like that because you can start focusing on building something that's performant for the 90% versus for the 10%.
David Annez (28:43)
And then, I think the second one was then starting to think about... Well, we send a lot of events when you land on a page, and those events can be synchronous because the default XHR request is always synchronous. How do we ensure that we don't block the main thread, and how do we ensure that the rest of our experience continues to happen and we can still get all of those events from RudderStack? And that's where we started trialing the Beacon, which is a widely supported API, but actually, in fact, a lot of SDKs still don't use it because of legacy browsers potentially, but also because there are some quirks with it. But we wanted to start using Beacon, one, because it's asynchronous, especially when you're offloading events after you move from one page to the other, and, two, because it improves the performance from the sending of those events. And we actually ran an AB test of this, and we were trying to measure what the impact was on loading time, on total blocking time, and on first input delay.
David Annez (29:57)
And if you don't know, first input delay is one of the key Core Web Vitals that Google will start reporting on as a key one for SEO scores [inaudible 00:30:06]. And you can see the graph that actually shows that the first input delay of the Beacon, so using the same Beacon API, was just dramatically lower. Right? So, I think it was like 200 and something, nearly 300 milliseconds, actually, to under 20 milliseconds first input delay by moving off the synchronous XHR request to the Beacon. And naturally, we can just turn this on. Right? So, for people that still want to use the XHR, because there are some reliability pieces, then you can choose to. But the performance impact was huge, and now we are confident that this isn't going to affect our first input delay for the future.
Eric Dodds (30:49)
Incredible. Well, I can say from experience, it was really fun to work with you on building those out, and we still talk internally about the performance improvements because it's pretty incredible. Okay. Let's talk quickly. So, one of the interesting things about a warehouse-first approach is that instead of leveraging multiple SaaS tools, you can actually feed marketing with the analytics use cases we talked about, but you can also feed data science. So, can you just quickly talk us through what types of data or products are you delivering to marketing and product teams, and then also the data science team?
David Annez (31:31)
Yeah. So, I think the product is BigQuery data, really, but we then have managed to overlay that with some really great reports for marketing and product around intraday conversion rate reporting, which they've never had before. So, being able to see that hourly view has been really great. And then I think the other one that I've already mentioned is that attribution reporting, which is always available on time. And actually, then, will alert to any kind of anomalies that we've seen.
David Annez (32:01)
But speaking of anomalies, I think the more interesting one is really where data science is starting to feed into this, and data science is now using this data that's practically... I mean, 15 minutes to us is real-time and is using the data and consuming it for multiple different uses. One of them is we are consuming that data for anomaly detection. We have our own anomaly detection system called Helios, and we actually feed it all of the BigQuery data in different cuts so that it starts learning and it actually alerts us on Slack if there's anything off. And that's been really awesome because prior to that, we could only really use the anomaly detection for anything that was somewhat fresh data. And it wasn't really to do with the users on the site. It may have been site errors and things like that because we use different services for that, but not actually conversion rate, which I think is our bread and butter. That's what we use to measure.
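Illustration (not from the talk): the idea of alerting when a fresh conversion-rate reading looks "off" can be sketched with a simple z-score check against recent history. This is only a toy stand-in; it is not how Helios works, and the threshold, the sample values, and the function name are assumptions for the example.

```python
from statistics import mean, stdev


def is_anomalous(history, latest, threshold=3.0):
    """Flag `latest` if it lies more than `threshold` sample standard
    deviations away from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Hypothetical hourly conversion rates from the warehouse.
hourly_conversion = [0.031, 0.029, 0.030, 0.032, 0.030, 0.028]

sudden_drop = is_anomalous(hourly_conversion, 0.012)   # far below recent hours
normal_hour = is_anomalous(hourly_conversion, 0.031)   # within the usual range
```

A production system would also need to account for seasonality (time of day, day of week) before raising an alert, which a plain z-score does not do.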
David Annez (33:03)
And then I think two other points for data science that have been really awesome was now we have... Our AB testing platform is fully in-house now, and because we have this regular, fresh data, we can actually refresh that data hourly. We have some very complex models behind how we figure out if an AB test is winning or not because we run lots of different ones. But now, we can actually understand them way faster than we could before. And we have that visibility for the entire company of all of the tests that we're running across the entire business and seeing that update on a regular interval.
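Illustration (not from the talk): one of the simplest ways to decide whether a variant is "winning" on conversion is a two-proportion z-test, re-run each time fresh data lands. This is a minimal sketch of that textbook test only; the in-house models described above are said to be considerably more complex, and the counts and function name here are hypothetical.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion
    rate between control A and variant B, using a pooled standard error."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no conversions (or all conversions) in both arms
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


# Hypothetical counts: 3.0% vs 3.6% conversion on 10,000 sessions each.
z, p_value = two_proportion_z_test(300, 10_000, 360, 10_000)
```

Checking a test like this every hour and stopping at the first significant result inflates false positives, which is one reason teams tend to build more careful sequential or Bayesian models on top.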
David Annez (33:39)
I think the last one that I didn't put down there is that we're building our smart sort order and that smart sort order is all based off the RudderStack data that happens on a 15-minute interval. So, we're now building a model that is going to try and beat the current sort order that we have by taking in all the impression data, the product click data, the product detail data that we feed RudderStack and then start actually applying a different sort on an hourly basis to try and understand what's the most optimal sort order for finding a holiday.
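As a rough illustration of sorting by impression and click data, the sketch below ranks items by a smoothed click-through rate, so that items with very few impressions are not ranked on noise alone. The real smart sort order is a model that is not described in the webinar; the function, the prior values, and the hotel stats here are assumptions.

```python
def rank_hotels(stats, prior_clicks=1.0, prior_impressions=50.0):
    """Order hotels by smoothed CTR.
    `stats` maps hotel id -> (impressions, clicks); all names are hypothetical."""
    def score(item):
        _, (impressions, clicks) = item
        # Adding a prior pulls low-volume hotels toward a baseline rate.
        return (clicks + prior_clicks) / (impressions + prior_impressions)
    ranked = sorted(stats.items(), key=score, reverse=True)
    return [hotel for hotel, _ in ranked]

# Made-up numbers for illustration.
stats = {
    "hotel_a": (1000, 40),   # 4% raw CTR
    "hotel_b": (10, 2),      # 20% raw CTR, but only 10 impressions
    "hotel_c": (2000, 120),  # 6% raw CTR
}
order = rank_hotels(stats)
print(order)
```

With these priors, `hotel_b` no longer outranks `hotel_c` on the strength of two clicks; recomputing the ranking on each hourly refresh would mirror the cadence described above.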
Eric Dodds (34:14)
Very cool. That's so cool. I love nerding out on stuff like that. Quick question for you on the AB testing infrastructure. What were the main drivers of bringing that inside of the company and owning it, as opposed to using a third party? I mean, bringing a lot of things in-house is non-trivial, but you start to get into some pretty gutsy statistical significance modeling, et cetera, all that with AB testing infrastructure.
David Annez (34:43)
Yeah. It's a really interesting one. I think that there were multiple different reasons. One of them, I think going back to that SaaS model, was cost. With the sheer size that we're at, we're talking in the half a million cost a year for some of the AB testing services out there that we're [inaudible 00:35:05] for our use, but that wasn't just one of them. I think in addition to that, we wanted to control, one, the way that we were doing our AB tests, but we also wanted to feed it with our own data. And you can't really do that with external AB testing systems. And I think one of my pet peeves is I don't really trust external AB testing systems most of the time, so I'd rather have myself and a data scientist spend some time looking at the numbers and having control of that.
David Annez (35:35)
But when you start thinking about the next steps of AB testing, you start talking about what you call multi-armed bandit testing, which is smart multivariate testing that will change the traffic allocation depending on how well the test is performing. But you can only really do that if you own the AB testing framework because you want to use your own models to understand how tests are doing better. And that's when it starts getting very interesting. And that was one of our core parts of the business is running AB tests because that's how we test and learn when we're not user researching. That was just one of the key things where we felt we can build this in-house, we can probably build it better, and we can always upgrade it with our own pieces, much like the warehouse-first model. And we felt like it was a clear one that we would do in-house.
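The traffic-reallocation idea behind multi-armed bandit testing can be sketched with Thompson sampling, one standard bandit algorithm. The webinar does not say which algorithm LoveHolidays uses, so treat the class below, the arm names, and the simulated conversion rates as assumptions for illustration.

```python
import random

class ThompsonBandit:
    """Bernoulli Thompson sampling with a Beta(1, 1) prior per arm."""

    def __init__(self, arms, seed=0):
        self.rng = random.Random(seed)
        # Each arm stores [alpha, beta] = [successes + 1, failures + 1].
        self.params = {arm: [1, 1] for arm in arms}

    def choose(self):
        # Sample a plausible conversion rate per arm, then pick the best sample.
        samples = {
            arm: self.rng.betavariate(a, b)
            for arm, (a, b) in self.params.items()
        }
        return max(samples, key=samples.get)

    def update(self, arm, converted):
        if converted:
            self.params[arm][0] += 1
        else:
            self.params[arm][1] += 1

def simulate(true_rates, rounds=2000, seed=0):
    """Run the bandit against made-up true conversion rates."""
    bandit = ThompsonBandit(true_rates.keys(), seed=seed)
    env = random.Random(seed + 1)  # separate stream for simulated visitors
    pulls = {arm: 0 for arm in true_rates}
    for _ in range(rounds):
        arm = bandit.choose()
        pulls[arm] += 1
        bandit.update(arm, env.random() < true_rates[arm])
    return pulls

pulls = simulate({"A": 0.05, "B": 0.20})
print(pulls)
```

Unlike a fixed 50/50 split, the share of traffic sent to each arm shifts as evidence accumulates, which is why owning the allocation logic matters for this kind of test.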
Eric Dodds (36:28)
Sure. Yeah. I think this is a really interesting example, and we'll move on to the next slide in just a second. I think it's a really interesting example of how the complexity of both the process and requirements, but also the technology, change when you get to scale. Right? SaaS tools are great up until a certain point, but when the requirements reach a certain scale and level of complexity, having ownership is a big deal. That's something we actually talked about a lot where when companies aren't at that scale yet, it actually is still helpful to make the investment of unifying your data in the warehouse now so that you have that data to build models in the future.
David Annez (37:15)
Yeah.
Eric Dodds (37:18)
Okay. Let's talk about a project you have coming up, and I know this is still early, but it's pretty cool. So, real-time recommendations with RudderStack and Redis, and I can just talk through... This is just a basic, this isn't necessarily reflective of your exact stack, but this is just a basic view for the audience. So, there are two components that we usually see. One, we call the behavior-based for page-to-page personalization, in which you're running an SDK in your website or app that's coming through the RudderStack event stream. That can feed Redis. And then you're running a model that makes recommendations, and Redis uses that information that's coming in real-time to modify the user experience. And a classic example we give there is, someone makes a purchase and you want to recommend additional products that they might want to purchase, and that's this page-to-page. Right? When the confirmation page loads and they see other products that they want to purchase, you can drive that using the event stream and Redis, which is really cool.
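The page-to-page flow described here can be sketched as a co-occurrence counter updated from the event stream. To keep the example self-contained, plain in-memory dictionaries stand in for Redis (a real build might keep the counts in Redis sorted sets), and the event fields `user_id` and `product_id` are assumed names, not RudderStack's actual payload schema.

```python
from collections import Counter, defaultdict

class CoViewRecommender:
    """Recommend products that were seen by the same users (illustrative only)."""

    def __init__(self):
        self.seen = defaultdict(list)           # user_id -> distinct products seen
        self.co_counts = defaultdict(Counter)   # product -> co-seen product counts

    def track(self, event):
        """Consume one event from the stream and update co-occurrence counts."""
        user, product = event["user_id"], event["product_id"]
        history = self.seen[user]
        if product in history:
            return  # ignore repeat views by the same user
        for other in history:
            self.co_counts[product][other] += 1
            self.co_counts[other][product] += 1
        history.append(product)

    def recommend(self, product, k=3):
        """Top-k products most often seen alongside `product`."""
        return [p for p, _ in self.co_counts[product].most_common(k)]

# Made-up events for illustration.
rec = CoViewRecommender()
for user, product in [("u1", "a"), ("u1", "b"), ("u2", "a"),
                      ("u2", "b"), ("u3", "a"), ("u3", "c")]:
    rec.track({"user_id": user, "product_id": product})

suggestions = rec.recommend("a", k=2)
print(suggestions)
```

The lookup in `recommend` is a cheap read, which is the property that makes a fast key-value store a natural fit for serving recommendations as the next page loads.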
Eric Dodds (38:26)
The other thing we see that's interesting is using warehouse actions for more demographic-focused or user profile-focused personalization. So, for example, a lot of companies, maybe when you're thinking about enriching a user profile with some sort of, maybe, third-party data source or some computed trait in the warehouse, you can actually push that back from the warehouse using RudderStack warehouse actions into Redis so there's a profile store. And the example we give here is let's say you have a computed trait that says someone's more likely to respond to a particular offer in a particular season and so you want to serve that to them the next time they open the app. Well, if you push that from the warehouse into Redis and Redis is driving the model that opens it up, or we even see companies using downstream SaaS tools like Braze that manage [inaudible 00:39:22] experiences, or a lot of internal builds as well, you can drive that with computed traits on a profile level as well, which is really interesting. But I want to hear about what you're thinking around personalization at LoveHolidays.
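The profile-store pattern can be sketched as two steps: upsert computed traits per user, then read them back when the app opens. A plain dictionary stands in for Redis here (a real build might store one hash per user key), and the row shape, key format, and trait names such as `preferred_offer` are assumptions, not the actual RudderStack warehouse actions output.

```python
def sync_traits(rows, store):
    """Upsert computed traits from warehouse rows into a profile store.
    `rows` is a list of dicts that each include a `user_id` (assumed shape)."""
    for row in rows:
        key = f"user:{row['user_id']}"
        traits = {k: v for k, v in row.items() if k != "user_id"}
        # Merge into any existing profile rather than overwriting it.
        store.setdefault(key, {}).update(traits)

def pick_offer(user_id, store, default="generic_offer"):
    """Choose which offer to show, falling back when no profile exists."""
    profile = store.get(f"user:{user_id}", {})
    return profile.get("preferred_offer", default)

# Made-up warehouse output for illustration.
store = {}
sync_traits(
    [{"user_id": "u1", "preferred_offer": "summer_spain", "home_region": "north_england"}],
    store,
)
known = pick_offer("u1", store)
unknown = pick_offer("u2", store)
print(known, unknown)
```

Keeping the read path to a single key lookup with a default means an unknown or anonymous visitor still gets a sensible experience.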
David Annez (39:37)
Yeah. Yeah. I mean, it's super interesting stuff, really, with what you can do with it. And I think one of the things, actually, that we've recently, we tested and figured out, is we have this concept of what we call recently viewed hotels, and a lot of customers that come to us, use us to browse a lot of hotels and to get an understanding of what options they can go on and what hotel they want to potentially take a look at. And interestingly enough, what we found out was we have a very dumb feature on the homepage that stores all of those recently viewed hotels on your browser, and then you can see them again on the homepage. And in fact, it has a pretty significant conversion rate impact if we remove that.
David Annez (40:25)
So, what the hypothesis behind that is, actually, people like to go back to the things that they've seen, but actually, now, we want to think about, "Well, what about if we suggest to them the next hotels that they should be looking at based on their viewing behavior?" And this is really where this real-time personalization comes in. We know that customers tend to browse multiple different hotels, and then they actually come back to the homepage to potentially search for their next holiday. And that's where we can start surfacing them the recommended hotels based on their profile overlaid with other profiles of similar matching style. Maybe they're from the north of England, they like to go to Spain, therefore we'll show them a grouping of hotels in Spain. The first attempt is going to be to personalize the homepage in that recently viewed view so that, actually, we can try and suggest potentially different hotels than the ones that they'd been looking at, maybe ones that they've not found yet, and see what their behavior is around that.
David Annez (41:25)
I find that homepage recommendations are very powerful. I mean, Amazon is the king of that, right? And I think that for holidays, it's an interesting one because unlike a purchase and a recommendation after a purchase, you don't really buy a holiday after a holiday, a vacation after a vacation. Right? So, in this situation, we want to find, hopefully, the intersection of the best vacation for someone, but actually also the best vacation for us. Right? Be it from a margin perspective or from a [inaudible 00:41:56] perspective. And I think that's where it gets very interesting.
David Annez (41:59)
The second part that we're going to do with personalization is that the recommendation is actually going to be when customers don't find the holiday, the vacation they want. So, when you're searching, maybe you've searched for different criteria and maybe that criteria isn't in our library, maybe we don't have that hotel available on that day, or maybe we don't have any flights, but what we'll use is we'll use the profile that you've built, that we built up of you when you've been searching to then live to recommend you some different hotels that potentially other customers have been looking at and have booked so that you don't get that dead-end feel of, "I can't find my perfect vacation." Actually, here's a bunch of other options that you can take a look at and surface that.
Eric Dodds (42:49)
Yeah, absolutely. And talk us through the stack a little bit. So, I know that you use Redis and we know you're using RudderStack for the event stream, but are you going to use Spark in the equation to drive modeling and leverage the data that you have in other data stores? What's it going to look like?
David Annez (43:12)
Yeah. So, I think much like the attribution model piece that we had a diagram of, I believe that what we're going to be doing is using Spark to model that stuff out and then push that into Redis. I think we have a couple of other things that we're looking at. Google does have its own, but I think we've heavily invested in Spark. I think we really love its flexibility and its versatility, so we are going to use that to pull data from different data sets and then use that with RudderStack and warehouse actions to pull that back into Redis, which I think is going to be super powerful and build that great profile view of our customers, which actually we've never had before. We tend to just treat customers as unknown anonymous users on the site, and that'll be a big upgrade on that.
Eric Dodds (44:00)
Sure. That's really exciting. And one thing I do need to say is, I hope one day I get to a point where I book a holiday and then right after I book another holiday.
David Annez (44:12)
We hope people get to that, as well. Maybe right after COVID, that's when people really feel like they need a holiday right after the next one.
Eric Dodds (44:21)
Yeah. When restrictions lift, I'm sure you're going to see a lot of... Well, actually, your anomaly detector is going to be very active, I would guess for a while.
David Annez (44:31)
Yes, it is going on fire right now [inaudible 00:44:35] on a daily basis. We're hoping that people will be booking multiple holidays in the future because of the current situation.
Eric Dodds (44:44)
Sure. Great. Well, that is really exciting stuff. Let's quickly do a Q&A, so feel free to jump into the live chat. Let's see. Okay. Interesting question. So, you're the head of engineering and you interact with data engineering and data science. The question is how do you structure those teams in terms of collaboration being the head of engineering?
David Annez (45:16)
Yeah. So, I think what we tend to work on are... So, I think the data science team works on quite a few things that never really have any impact on the front end and all of that. So, data science, for example, will work very closely with PPC on models to figure out how to optimize our bidding. But when it comes to working with the customer, we try and always build our initiatives around solving the problem for the customer and how does data science fit into that? So, I think the perfect example is actually what we're working on right now, which is this live sort order, and that is very much a collaborative effort where a couple of the data scientists are working in weekly sprints with the engineers from digital product and the platform team to figure out how we're integrating the sort order and how also we're thinking about building it for the future.
David Annez (46:13)
I think there's a lot of different ways that you can enhance the sort order capability by giving it things like geolocation from your CDN, et cetera. So, we generally work in cross-functional teams and we'll, I guess, surround ourselves in... We'll create a team to go after a specific project once we need data scientists, et cetera, and go after it like that. I think, generally, it's a very collaborative process where each of the teams will get together and have an actual team of teams stand up and figure out what they need to work on next to integrate all of the different moving parts.
Eric Dodds (46:56)
Sure. All right. Another question here, which I think is a great question, is how do you decide when you need to move from a SaaS tool to building something internally?
David Annez (47:12)
I think when you've asked the question like four times in a short period of time is probably a good indicator. I think it's when you start asking yourself that question, it's time to build out the case for and against it. I'm usually of the opinion of buy versus build unless you're in the business of doing that. And I think that when it comes to these sort of things, it's really about when you start seeing the upper limitations of what you have, and you're starting to grow over a certain size, then I think that that's when you start looking into that warehouse-first. But really, you can always start with it. I think that nowadays the entry-level of it and the ability to just start from doing that is great. Figuring it out, I think, there's a balance of cost, resource, and then also thinking two steps into the future, are we going to need this in six months’ time? Well, if you are, start thinking about doing it now, because it takes time, but it's worth it. I think the output of it is invaluable.
Eric Dodds (48:23)
Sure. Yeah. That's interesting... We had a webinar around how, a couple of weeks ago, around how the data stack changes over time throughout the life cycle of a company, and one of the interesting things that came out of it was that the barrier of entry to start building this stuff now has dropped drastically even in the last couple of years, from a data infrastructure standpoint and, really, even from a data warehouse standpoint. The cost of actually building your own infrastructure pipeline, warehouse-first approaches, both from a tooling standpoint and from a resources standpoint, it's a great time to be doing it just because it's become so much more accessible.
David Annez (49:08)
Yeah. Yeah. I couldn't agree more. I think the entry-level now to all of these things is just... Yeah. It's so easy and I think it's just going to get easier. If you're starting from scratch, I highly recommend that you consider that warehouse-first model because it will give you flexibility in the future. And if you don't know anything about analytics and all that, it's probably a good time to learn a little bit more about how you can pull data from places like BigQuery and analyze it.
Eric Dodds (49:37)
Sure. All right. Well, we are at the time. We don't have time for any additional questions, unfortunately. But we really appreciate you joining us on the webinar, David. Tons of interesting stuff here, tons of learnings. And let's hop on another webinar once you get some of the personalization stuff built out, and we'll walk through that stack once it's up and running.
David Annez (50:07)
Yeah. Sounds great. Look forward to that. And thank you for having me again.
Eric Dodds (50:11)
All right. Talk soon, David.
David Annez (50:13)
Bye.