Replacing Google Analytics With RudderStack

What we will cover:

  • The limitations of Google Analytics
  • How to migrate to RudderStack to replace GA
  • How to replicate the built-in GA analytics functionality
  • Tracking the full customer journey with RudderStack
  • Leveraging the full power of RudderStack to capture data about users from the first time they view your website to their usage inside your product
  • Tracking ad campaigns and conversions with RudderStack
  • Q&A

Speakers

Eric Dodds

Senior Director of Product Strategy

Alex Dovenmuehle

Co-Founder at Big Time Data

Alex is obsessed with driving business impact with data and automation. He's always looking to create automated and scalable business processes, supported by software and data, that allow businesses to scale their operations.

Transcript

Eric Dodds (00:00)

Thank you for joining us. We'll do some intros here, but we're going to talk about how to replace Google Analytics with RudderStack, a very provocative subject that I'm really excited to dive into because I've used both tools extensively, even before I joined RudderStack. So here's what we're going to cover today. We're going to talk about the limitations of Google Analytics, some of which may seem obvious, but some of which may not. And then we'll also talk about what Google Analytics is good for, because it is really helpful for some particular things in the stack.

We're going to talk about how to migrate to RudderStack to replace GA. Alex has done this multiple times, so he'll talk us through that. And then really, I think the most exciting part is unlocking the ability to track a full customer journey with the data you get through RudderStack. And then we'll wrap it up by talking about tracking ad campaigns and conversions. You can get some really, really detailed reporting on cost per acquisition by campaign, even down to sort of the ad level if you want, which is cool.

Quick introductions. I will give Alex a monumental introduction. But Alex and I met, gosh, almost a year ago now before I joined RudderStack, and you were at Mattermost, and you had joined Mattermost, you'd been there, I guess, a couple of months after coming from Heroku and did the whole RudderStack rotation at Mattermost, and now you run a consultancy called Big Time Data, where you help people build out their data stacks. Any important points that I missed about ...

Alex Dovenmuehle (01:43)

No, I think that's it. I would say that our main thing at Big Time Data is just that, what we came to realize is like, everybody really has all the same problems with data, understanding it, getting the value out of it. And so we, me and my co-founder met at Heroku, we moved to Mattermost, built all that out at both those places and then at Big Time Data, we're just kind of taking that show on the road. And like, we just want to help everybody make sense of their data and really unlock the value of it.

Eric Dodds (02:16)

Awesome. And I am Eric, Director of Customer Success at RudderStack. And let's dive into it. So Google Analytics limitations and use cases, this is probably a really familiar view to anyone who's used Google Analytics. And it's kind of funny, I just pulled an image from Google image search and of course, Neil Patel is like the top image search result. Makes sense. He's like the SEO optimization master.

Okay. I'm going to start and do the first couple of bullet points here, and then let you take it over Alex. So first of all, Google Analytics is not really good for user level views of data. Now, we'll talk about GA4 in a minute, but you're mainly looking at aggregate data. Of course, you can sort of drill down into the data a little bit and look at events, but in terms of tracking an individual's customer journey, or getting deep sort of attribute level information, Google Analytics just doesn't really do that. And the limited functionality it has isn't that helpful, at least for moderately complex use cases. And raw data. Actually, Alex, you want to talk about this? Because I think you've felt the pain of this. 

Alex Dovenmuehle (03:34)

Yeah. I mean, we've had some times with Google Analytics. But the thing that always happens with Google Analytics is like, it provides you that initial like, oh, wow, this is so cool. It gives me sessions, and pages, and users, and stuff. But then it's like when you want to get that level deeper and start really unlocking the whole power, and we're going to go into this later, but to map all this to like your full customer journey, especially once they get into your product and stuff, not only can you not get the raw data that would give you the ability to like see exactly what that user journey was, but I mean, if you did, it's like crazy expensive. And yeah, it's just been a nightmare to deal with.

Eric Dodds (04:19)

Inflexible event structure. This is really simple. So you have sort of action and label for events in Google Analytics classic, and you sort of have to align all of your event tracking to that. A lot of people track events that they're sending to other properties, like Facebook or Google conversions, separately from what they're doing in Google Analytics. And so you have multiple sorts of events happening, it just becomes really messy, and they sort of force you to align to their structure in Google Analytics classic.

And then, full-funnel reporting is complicated cross-property, and cross-property applies both to Google Analytics properties and to technical properties. So for example, we have rudderstack.com, which is our marketing site. And we have app.rudderstack.com, which is our actual app that users log into. And so having all of that data in one Google Analytics property gets really noisy, but if you separate them, you're looking at different reports for different parts of the funnel. And so it just creates challenges, and it's pretty hard to keep clean.

Dashboarding and custom reports aren't that great on GA classic; we'll talk about how it's a little bit better in GA4. And then Alex, talk about when people upgrade sites or create additional GA properties, et cetera, that gets real messy, real fast.

Alex Dovenmuehle (05:44)

Yeah. And so for an example of this, at Mattermost we had www, the marketing site, mattermost.com. Then we have like docs, you got support, you've got the product. We've got sort of a cloud interface just for buying the on-prem license, that's all different. And so it's like, if you have to split all this stuff up and then group all this stuff that you're looking at, you're not going to be able to really get a true grasp of what's going on across these things.

And really, like tracking them across in your marketing site, you're going to have things that link to your docs and vice versa and all that kind of stuff. So it's like, what's really going on here? Google Analytics isn't going to tell you, so that's where RudderStack's going to come in and make it a lot easier for you. 

Eric Dodds (06:34)

I think one practical way that this problem shows up is, you just keep dumping more and more ops time into really becoming an advanced Google Analytics implementation expert. And it's sort of a never-ending challenge, and there's a point at which there are diminishing returns. Let's talk about what Google Analytics is good at. It is great for aggregated views that individual users who want self-serve analysis can leverage. So content performance, social media teams, content teams, let's look at the time on page, the number of blog posts viewed per session, incoming traffic, and time on site from particular social channels. That's where out-of-the-box sessionization and other components like that are really, it's just really helpful.

And frankly, just really easy. You put the script on the site, if it's pretty clean and you're looking at one particular property, it's a great self-serve analytics tool for those purposes. And then the other point I would make is that, for aggregate eCommerce analytics, it's actually pretty good because they include a lot of the eCommerce spec out of the box, and they have the basic funnel built in. You can get really advanced, and there are better ways to do more complex eCommerce reporting. But for a lot of basic eCommerce use cases, it's pretty dang good for your basic funnel optimization and the understanding that revolves around it.

Again, the user-level stuff, you can't really get that deep, which is where you can actually sort of uncover things like lifetime value, but for the aggregate view, it's pretty good for eCommerce. Anything else that comes to mind, Alex, that I didn't mention?

Alex Dovenmuehle (08:31)

I was just going to say, I mean, we had another webinar about the data stack journey and the blog post that I wrote about it, too. And this is where like, no tool, most tools have some value. And obviously, Google Analytics does have value just like you said, in this specific case. And it honestly even has some value, again, if you're just looking for aggregate views of this stuff. It's really once you're sort of past that point on your data stack needs, where it's like, okay, like cool, we have some aggregate views of stuff, things are starting to get too complicated for Google Analytics to come, like to make sense of it anymore, like what now?

And I think what we're trying to get across here is like, once you hit that inflection point, that's like, hey, we need RudderStack. Let's use RudderStack to solve this problem.

Eric Dodds (09:22)

Sure, yeah. I think there's a point at which you're asking Google Analytics to do things that it really wasn't designed to do.

Alex Dovenmuehle (09:28)

Yeah. And then once you start getting into that as you said, once you start going into that, it turns into a, like you have to be some crazy ops guy and you're just pouring time into it. That's really not meant or not really getting you the value that you should be getting.

Eric Dodds (09:44)

Yep. Okay, so one of the questions that we discuss a lot when it comes to Google Analytics is, but what about GA4? So here's the hot take on GA4. It's a huge improvement on flexibility as far as events go. So it tracks events actually more similar to the paradigm of sort of a RudderStack-esque approach, where you have an event and then you have attributes or properties related to that event, way better there. And you can get a way more detailed look at sort of specific user behaviors, which is good.

And then their custom reporting and dashboarding is way better, you can get way more details, it's just way more flexible from that standpoint. So I think what will happen with GA4 is, it will take earlier companies further in their ability to sort of leverage the tool to get insight. But there are still major challenges, especially for companies that scale or want more granular data. So you still can't get the raw data. 

And then this, I think, is one of the big things with Google Analytics in general for most companies, is the data is actually still trapped in Google Analytics and can't be activated to support other business functions. So it's really just sort of a dead end, if you will, for analysis, which is fine, but once you have that analysis, if you don't have the raw data and the ability to sort of activate that, there's a huge amount of manual labor that goes into that. And then, of course, it's actually still really hard to do the full-funnel reporting cross-property, which is sort of a fundamental challenge.

Okay. How do we migrate to RudderStack to replace GA? I thought, first, what would be good is to talk about the why. And I'll start by saying, it's really not either-or, RudderStack or Google Analytics. Most companies run both of them, they just use Google Analytics for what it's really good at, which is sort of the aggregate self-serve side of things. And they use RudderStack's Event Stream. But Alex, walk us through some of the other reasons. Why would you sort of migrate from using Google Analytics as your main detailed analytics tool to RudderStack's Event Stream?

Alex Dovenmuehle (12:16)

I think it's really, once you get to that place, especially for B2B kind of SaaS companies. You have a marketing site where you're trying to convert people to sign up for your product. And then what you want to know after that is, well, what are they doing in my product or are they being successful with my product? And then you can do so many different things. I think fundamentally, it almost comes to the idea of RudderStack's warehouse-first approach, which is something we've always been a fan of, and try to preach the gospel of.

It's like, if we can get all this raw data into our warehouse, then that really unlocks the full power of it, as opposed to oh, I've got all this data over in Google Analytics and I've got other data over here, and I don't know how to mesh it together to give me anything. So by having it in a warehouse, that's where you can use like the DBT models, which we'll go into, you can put BI tools on top of it. And then with RudderStack's Warehouse Actions, then it's like, okay, well, I have this data, I've transformed it, I've analyzed it, now I want to like go push it out to other tools like Salesforce or other CRMs, Mixpanel. I mean, the possibilities become a lot more endless than, oh, I can just look at like my marketing site's performance in Google Analytics right here.

The other thing I would say too that is important is that ability to have more of that context on the events that you are sending to RudderStack, is then all of a sudden you can have like product identifiers, and like, what user is this in my product database and things like that, where you can really start mapping together all the things. And it just makes it ... it unlocks the power. 

Eric Dodds (14:11)

Yeah, the payloads can be much, much richer. I think that's a great point, right? Whereas with Google Analytics, you got this a little bit better in GA4, but with a RudderStack payload coming from your website, you can add any sort of dynamic properties, right? We often say if it's in the DOM, you can pass it through as a property. And so you can get as much rich diagnostic information about any behavior as you want, and then use all of that downstream in interesting ways.

I think one other point, and I remember sort of going through this, gosh, five or six years ago, going through this exact process of, wow, you don't have to be beholden to just sort of one analytics tool that you can't get data out of. And I think one of the big things for me was realizing, oh, wow, like I can track an event once and send it to Google Analytics, but then also send it to Amplitude for product. I can also send it to the Kafka stream that feeds whatever data science [clauser 00:15:18] they're working on. I can send it to, et cetera, et cetera, which is really cool.

And that was really sort of paradigm-shifting where it's really, like we said before, not Google Analytics or RudderStack, but you can actually do both. And you can use RudderStack to feed Google Analytics, but then it can also feed any other tool in your stack, including the warehouse. And so you can use point solutions for analytics for different teams, right? So if the product team uses Mixpanel, just route the feed to Mixpanel. And then, of course, you can see down here on the slide, we have a Snowflake table, and then you can go into a tool like Looker or if you're ... A lot of companies are using Data Studio, which is getting better and better, for an earlier stage.

Data Studio is pretty powerful. But then you can have the raw data and you're not beholden to the prepackaged reports, you can do whatever you want with it to meet the needs of your business. 

Alex Dovenmuehle (14:11)

Okay.

Eric Dodds (16:18)

Okay. Let's get technical.

Alex Dovenmuehle (16:22)

Let's do it. So I think the thing that people are probably thinking to themselves is like, wow, guys, it sounds so amazing. I'd love to do that. But how do I do it? Right? So I think it is pretty straightforward. Obviously, I come from an engineering background. So to me, this is all pretty basic. But I mean, it's as simple as basically how you're loading Google Analytics itself, right? It's a similar kind of thing, include some JavaScript on your page. And I mean, just out of the box, that will automatically start sending those page events, which will give you, honestly, it gives you a lot of value, just even that one thing. Like I mean, it's literally like, I don't know, three lines of code or something that you have to put in.

I think the thing that starts to really take Rudder analytics over the top is when you start adding more of those properties and context to those events. So you can add those track calls to, like you have in this example, the form submit, or when they click a button, or whatever the conversion target might be. You can see there, you're getting all sorts of stuff into that, which again, that feeds your downstream warehouse and then you can do whatever you want with that. The other thing I will say is for the sort of SaaS model, it's like when that user does sign up, and like, or say, identify themselves really in any way, like if you capture their email or whatever, that's what you're going for.

Call that identify call, and then we'll get into some of the DBT modeling that you can do. But you can then start attributing, like not only the first time that that user visited your site when they were anonymous, but now you can tie that together to everything they've done after they identified themselves. So you really have this full picture of like, well, how long did it take this person to convert? And how are they using the product afterward? So it's really, really powerful. And I think the other thing about it is, not only can you use RudderStack on all your different properties, so then you're tracking essentially the same person across all those properties, whether it's like your docs site, your support, if you're running Zendesk or whatever, but then also like, once you get into your product.

And even like, the thing that I think ends up becoming even more interesting is when you then have events that are sent from your back end servers, which is sort of over the top for this. But I think that can also unlock a lot of power as well as track even more events. RudderStack is the kind of thing where it's like, once you kind of like get it in there, you're like, oh, we should use it on this thing. We should use it on this thing, we should use it for that thing. And so it sort of just kind of spreads around everywhere. And honestly, that's for the best because the more data that you get into your warehouse, then you can really unlock the power of it. 

Eric Dodds (19:15)

Sure. Yeah. And I think one thing that may be helpful is I'll just quickly walk through this payload here. This is a track call for a form submit. So a track call would be the equivalent of what Google Analytics would treat as an event. So you can see that this is tracking a form submit, and we're packing a bunch of information in here. So the page that it's on, so we're calling document title, we've got the page, the page URL, and some of these are included in the payload by default, it's just easier to have them in the properties object. We're grabbing the form ID. So on this particular website, there are form IDs that sort of identify a blog subscription versus a demo request, et cetera.

And then one other interesting thing is that you can see there's a property there called a label. And by default, RudderStack will send that to Google Analytics as the label. So we're using the form ID as the label. And so this is where you can see earlier we talked about it, it's not an either-or, we actually use this payload to feed Google Analytics and populate an event in Google Analytics. And the label is the form ID, which is pretty cool. And then, there are various other properties there. And then we'll talk about campaign tracking, you can see that we're actually pulling the UTM parameters into the form submit payload as properties. This is really, really useful. 

In a lot of cases, people will do things that are hard to scale, like create hidden fields and pass UTM parameters in the hidden fields, so that they're written to the contact record. But then if the user performs multiple actions, they get overwritten. So then you have to do logic and it becomes really crazy. Whereas one thing that's just so convenient, especially when you think about a user level, or sort of the events in GA not being tied to a specific user, is you have each distinct form submit that someone's performed with the UTM parameters, so that you have the attribution for each of those same actions. So when you think about building multi-touch attribution models, et cetera, it becomes way easier when you get the raw data, and then you can package all of the information you need in each payload, which is really cool.
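
To make the payload described here a little more concrete, the following is a minimal Python sketch of how the properties for a form-submit track call might be assembled, with the form ID reused as the label and the UTM parameters pulled from the page URL. The function name, property names, and example URL are assumptions chosen for illustration; on a real site this would be built in JavaScript in the browser and handed to the SDK's track call.

```python
from urllib.parse import urlparse, parse_qs

# Standard UTM query parameters used for campaign tracking.
UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content")


def build_form_submit_properties(page_title, page_url, form_id):
    """Assemble the properties for a hypothetical 'Form Submit' track event."""
    query = parse_qs(urlparse(page_url).query)
    properties = {
        "page": page_title,
        "page_url": page_url,
        "form_id": form_id,
        # The form ID doubles as the label, as described in the walkthrough.
        "label": form_id,
    }
    # Copy only the UTM parameters that are actually present on the URL.
    for key in UTM_KEYS:
        if key in query:
            properties[key] = query[key][0]
    return properties


example_properties = build_form_submit_properties(
    "Request a Demo",
    "https://example.com/request-demo?utm_source=google&utm_medium=cpc&utm_campaign=ga_replacement",
    "demo-request",
)
print(example_properties)
```

Because every form submit carries its own copy of the UTM values, nothing is overwritten when the same person submits a second form from a different campaign.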

Alex Dovenmuehle (21:38)

Yeah. And one last thing I'll say about the default parameters that end up getting sent on a lot of these, you also get all the like browser version and browser type, and all that kind of stuff. So then you can have all those graphs that are like, oh, we have 40% of people are using Chrome, and they're using this version. And these people are using their mobile phones and all that kind of stuff. So it sort of gives you like, like literally, you have the raw data of everything that Google Analytics has, it's just now you actually have access to that raw data. And obviously, it's not costing you a gazillion dollars.

Eric Dodds (22:12)

[inaudible 00:22:12] page view and every single behavior, which is actually really interesting. You can think about Google Analytics, you do that by sort of creating lots of drill-downs, but if you think about the context of having all of that in a set of warehouse tables, simple pivots can give you much more dynamic reporting, which is really cool.
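
As a rough illustration of the kind of simple pivot mentioned here, the Python sketch below computes browser share from a handful of page events. The `browser` field name and the sample rows are assumptions made for the example; in practice this would more likely be a GROUP BY over a warehouse table.

```python
from collections import Counter


def browser_share(page_events):
    """Return the fraction of page events seen per browser."""
    counts = Counter(event["browser"] for event in page_events)
    total = sum(counts.values())
    return {browser: count / total for browser, count in counts.items()}


# Hypothetical rows, shaped like a tiny extract of a page-view table.
sample_page_events = [
    {"browser": "Chrome"},
    {"browser": "Chrome"},
    {"browser": "Safari"},
    {"browser": "Safari"},
    {"browser": "Firefox"},
]
shares = browser_share(sample_page_events)
print(shares)
```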

Alex Dovenmuehle (22:34)

Yeah. Okay, so basically like we said, like with the basic RudderStack setup, you really have all the data that Google Analytics is capturing. So you're capturing the same exact stuff. And this is where I want to get into sort of the warehouse first approach and using DBT, which stands for the data build tool, which is a tool that we're always espousing in tandem with RudderStack. These are kind of like the things that we always come to our clients with, you guys should be using DBT and RudderStack.

RudderStack themselves have created a couple of packages related to all this stuff to make it really easy to build the data models with DBT and these are vetted, created by RudderStack, so you can get the ... You take this raw data that's coming from RudderStack and then all of a sudden, on the other end, you run DBT through these packages on it, and you have this actionable valuable data. So you have the customer journeys, this sessionization, and the identity resolution one actually is a pretty big one as well. And so by, why is it such a hard word to say? 

Eric Dodds (23:52)

It's hard.

Alex Dovenmuehle (23:53)

Sessionization, it's just a hard word, I guess.

Eric Dodds (23:55)

It is.

Alex Dovenmuehle (23:57)

So sessionization is basically like trying to tag a set of events that a user has completed into like one session. So then you can start doing session analytics that says like, well, how many average page views do we have per session? How long does an average session last? And then you can even get into some of that funnel stuff that you're showing that Google Analytics does as well, where it's like, this session ended with cart signup, or whatever checkout thing.
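
The sessionization idea described here can be sketched in a few lines of Python. This is only an illustration, not the logic of the RudderStack DBT package: it assumes a 30-minute inactivity timeout, and the `anonymous_id` and `timestamp` field names are assumptions about how the events are shaped.

```python
from datetime import datetime, timedelta

# Assumed inactivity gap that closes a session.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(events):
    """Tag each event with a per-visitor session number."""
    ordered = sorted(events, key=lambda e: (e["anonymous_id"], e["timestamp"]))
    last_seen = {}
    session_counter = {}
    result = []
    for event in ordered:
        visitor = event["anonymous_id"]
        previous = last_seen.get(visitor)
        # Start a new session on the first event or after a long gap.
        if previous is None or event["timestamp"] - previous > SESSION_TIMEOUT:
            session_counter[visitor] = session_counter.get(visitor, 0) + 1
        last_seen[visitor] = event["timestamp"]
        result.append({**event, "session_number": session_counter[visitor]})
    return result


def average_events_per_session(sessionized):
    """Average number of events across all (visitor, session) pairs."""
    sessions = {(e["anonymous_id"], e["session_number"]) for e in sessionized}
    return len(sessionized) / len(sessions) if sessions else 0.0


raw_events = [
    {"anonymous_id": "a", "timestamp": datetime(2021, 1, 1, 10, 0)},
    {"anonymous_id": "a", "timestamp": datetime(2021, 1, 1, 10, 5)},
    {"anonymous_id": "a", "timestamp": datetime(2021, 1, 1, 11, 0)},
    {"anonymous_id": "b", "timestamp": datetime(2021, 1, 1, 10, 0)},
]
sessionized_events = sessionize(raw_events)
print([e["session_number"] for e in sessionized_events])
print(average_events_per_session(sessionized_events))
```

Once every event carries a session number, metrics like average page views per session or average session length become simple aggregations.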

And then the identity resolution one, I think, is the real key, because this is the one that allows you to tie together all the events, even when they were anonymous, to when they signed up for your product and you actually have them, or if they submitted a form. When they identify themselves, now you can go back and look in the past at what were all the things that this user did before they converted, which can be super valuable. And then did you want to talk about the attribution one?
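
A deliberately simplified Python sketch of identity resolution follows. It assumes each event has an `anonymous_id` and, once the visitor has identified themselves, a `user_id`; those field names and the sample data are assumptions for illustration. Production-grade models, including the RudderStack DBT package mentioned above, have to handle harder cases such as one person with several devices, which this sketch ignores.

```python
def resolve_identities(events):
    """Backfill a resolved user ID onto events that were recorded anonymously."""
    known_users = {}
    # First pass: learn which anonymous IDs were later identified.
    for event in events:
        if event.get("user_id"):
            known_users[event["anonymous_id"]] = event["user_id"]
    # Second pass: attach the resolved ID to every event, including earlier ones.
    return [
        {**event, "resolved_user_id": known_users.get(event["anonymous_id"])}
        for event in events
    ]


journey = [
    {"anonymous_id": "anon-1", "event": "page", "user_id": None},
    {"anonymous_id": "anon-1", "event": "form_submit", "user_id": None},
    {"anonymous_id": "anon-1", "event": "identify", "user_id": "u_42"},
    {"anonymous_id": "anon-2", "event": "page", "user_id": None},
]
resolved = resolve_identities(journey)
print(resolved)
```

The first two events were anonymous when they happened, but after the identify event they can be attributed to the same user, which is what makes the pre-conversion journey visible.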

Eric Dodds (25:00)

Yes, actually, let's talk about that. I think I have that towards the end, [inaudible 00:25:03].

Alex Dovenmuehle (25:00)

Okay, all right.

Eric Dodds (25:05)

But what I do want to ask, so how much work is this? Because I'll play devil's advocate a little bit here, because my background, I come from marketing. And so my kind of gut response would be yeah, sounds awesome to have all the information you're talking about, but man, that sounds like a ton of work to like get DBT up and running, is this something I'm going to have to repeatedly look at for sessionization? How much work is it to sort of get this basic, get this foundational ... Really, what we're talking about is this foundational set of derived tables in the warehouse done so that you can actually start to have an analyst do some of the reporting pieces of it.

Alex Dovenmuehle (25:52)

Yeah. So I wish Rachel was here, she would be able to totally kill this because she's a little less technical. But I think the thing is, DBT has their own cloud SaaS product, so you can just sign up for that. It's really easy. Connect it to your warehouse, that should hopefully be relatively easy. And then downloading these packages that you guys have already created is also, I mean, read some docs, it's fairly easy. And then, you can just leverage DBT.

So I mean, in my opinion, it's not that bad. And I think the beauty of DBT has been that they've really designed it for more of the analytics-type person instead of the hardcore engineer-type person. So it's not like you need to go hire some crazy expensive engineer to do all this stuff, you can have an analyst, and really anybody who has even a relatively basic knowledge of SQL will be able to accomplish this stuff.

Eric Dodds (26:56)

So after you get the data piping into the warehouse, you're talking about a couple of days of work to set up the sort of derived tables that allow you to build all this amazing reporting.

Alex Dovenmuehle (27:09)

Mm-hmm (affirmative). And then I mean, connecting that stuff to your BI tool, whether or not you're using Data Studio or Looker, obviously, I feel like if you're using Looker, you probably have somebody who knows Looker. So like they'll be able to hook that up easy peasy. But even with Data Studio, it's like if you just point your BI tool at some of these tables that the RudderStack packages have built for you, it's going to be relatively straightforward to build visualizations and things.

And then, the nice thing is, again, like RudderStack will grow with you. So it's like, once I add more like product data and I want to get even more advanced with how I'm capturing these events, and like, the metadata that I'm sending with them, you can build upon what you have, instead of with Google Analytics it's like, well, this is cool, but I'm stuck. Like, I can't really go any further anymore.

Eric Dodds (28:10)

Yep, I agree. I would say the best practice that we talk about a lot with our customers is that trying to build all of this just so you can look at maybe aggregate metrics that you're already looking at in Google Analytics might not be worth the effort, because you can already look at a lot of those things. I think the real value comes when you start to look at things like the customer journey and identity resolution pieces.

Alex Dovenmuehle (28:43)

Yes, the customer journey. This to me is like the real power of RudderStack. And I've kind of talked about it already before. But it's really like once they get into your product, you've identified them as a user. And now you can see all of the metrics about what happened before and what happened after because then you can start doing all sorts of, I mean, once you get to, you can start taking this in crazy directions, where it's like, oh, well, they looked at this docs page and this marketing page. And when they converted, then they used the product in some specific way. You can do a sort of cohort analysis. And then you can kind of figure out, oh, these are the people that are coming into my product, I should really target my marketing more at them, or I need to spend more on them.

And also giving you, which I guess we'll get into as well, the ad campaign attribution, and am I spending my marketing dollars in the right place? Do I have the right content that's converting? I think the other thing is you can start saying, am I getting the right users that I want to have? So yeah, the other thing I would say is, especially if they sign up for your cloud product and you create them a user record in your database, send that internal ID in your identify call. This just makes everybody's life a whole lot easier on the back end, because that's where you can really start tying together all the events from your internal product data to your product event data to your marketing webpage data. And so you have this whole thing across everything that now ties together.

And then once you use RudderStack to send the data to Salesforce or some CRM, HubSpot, whatever, you have all these identifiers that map up so you can tell when I'm talking about this user in HubSpot, I'm talking about this user of the product. I'm talking about this visitor on the webpage.

Eric Dodds (31:09)

Yeah, that's pretty amazing. When you think about, sort of transactional data from your app database, event data from product usage, and then, of course, sort of the top of the funnel marketing journey to the website. Being able to have all of that information in the warehouse, and then sort of reconstructing for your most valuable customers, what was the journey that actually led them there, and understanding those pathways is super powerful.

So one example of that is discovering things like, well, the marketing team was really excited because these paid search campaigns had a really efficient cost per lead. But when you tie that together with transactional data over the lifetime of usage in the product, then you can see, well, those users over-index on churning after three to four months. And so in reality, the money that we were spending there was losing money. Because even though initially it was efficient, they're churning at a much higher rate. It's impossible to do that with Google Analytics.
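
The paid search example can be sketched as a small Python calculation that compares churn rate by acquisition campaign once campaign data and product data sit side by side. The record shape, campaign names, and numbers below are invented purely for illustration.

```python
def churn_rate_by_campaign(customers):
    """Return the share of churned customers for each acquisition campaign."""
    totals = {}
    churned = {}
    for customer in customers:
        campaign = customer["campaign"]
        totals[campaign] = totals.get(campaign, 0) + 1
        if customer["churned"]:
            churned[campaign] = churned.get(campaign, 0) + 1
    return {campaign: churned.get(campaign, 0) / total for campaign, total in totals.items()}


# Hypothetical customers joined from campaign attribution and product data.
sample_customers = [
    {"campaign": "paid_search", "churned": True},
    {"campaign": "paid_search", "churned": True},
    {"campaign": "paid_search", "churned": True},
    {"campaign": "paid_search", "churned": False},
    {"campaign": "organic", "churned": True},
    {"campaign": "organic", "churned": False},
    {"campaign": "organic", "churned": False},
    {"campaign": "organic", "churned": False},
]
churn_rates = churn_rate_by_campaign(sample_customers)
print(churn_rates)
```

In this made-up data the paid search cohort churns three times as often as the organic one, which is the kind of signal a cheap cost per lead can hide.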

But once you unlock some of those insights, it can be really, really powerful. Okay. Yes, this is an actual report from Looker, I just pulled it up at random because we use our own DBT model for this ourselves, which is kind of cool. And so there's just a bunch of different stuff on there. But, why don't you talk about this, because you've got way more in this reporting than I have.

Alex Dovenmuehle (32:46)

Yeah, so what we do at RudderStack, the events that we're getting from the web pages, and even when we're trying to track conversions in the product pages, you have all the UTM parameters, you have the Google Ads campaign IDs. We basically have written a bunch of reports, we also leverage Looker and DBT. We should really open-source some of that stuff, more of that stuff. But anyway, that's where you can really start looking at leveraging the Google Ads, like metadata about the campaigns. And then you can join that together with, well, how are the campaigns actually performing, which gives you some, in the basic form charts like this, but then you can really do even more with it.

And then the other thing on the back end, it's like, okay, what if my marketing team, they're like, we have to use Google Analytics, we have to do all this stuff because I want to, it's like, all right, fine. That's where you can sort of co-exist with Google Analytics as well. And so you can use RudderStack Warehouse Actions to just, okay, we'll send you the, maybe it's a conversion event, or whatever it happens to be that they're looking for, to send that data back to Google Analytics. And that's actually sort of happening at Mattermost as we speak, is what we're dealing with right now.

So it's just nice to be able to have that functionality already there to assuage things when some new marketing person comes in and is like, no, we're a Google Analytics shop, and that's what we're going to do. It's like, okay, fine, but we're going to still use RudderStack.

Eric Dodds (34:30)

Yeah, no, I think that is, having the flexibility from sort of the data engineering standpoint, in some ways data engineering can have their cake and eat it too. Where the downstream team wants to use this and we can coexist peacefully for that because we can sort of get data in and then get data back out, which is really interesting. The other thing is from a traditional ETL standpoint, which, I know we're talking about event streaming and Google Analytics, but you can use ETL, like RudderStack Cloud Extract, to pull campaign performance data from a variety of different ad platforms, Google Ads, Facebook Ads, Pinterest, et cetera, into the warehouse as well.

Then you have your spend data. And because you have all these UTMs, you can start to do a really interesting cost per acquisition analysis by campaign, which is really, really cool. And several of our customers have really neat setups for this. And actually, I would say the real utility here is breaking out of the need to rely on what are almost always over-reported conversion numbers in Google and Facebook, which is a whole other subject we can talk about. When you have the raw data and the spend numbers, and you align that in your warehouse, you know exactly how many people came from this campaign.

And you can really get your true cost per acquisition, and sort of triangulating against the platform just gives you much, much better insight into actual real performance, as opposed to, like, oh, this seems high or this seems low and ROI conversions getting to Google ads, et cetera. You already know. And you can sort of track your own conversions by triangulating the data in your warehouse, as opposed to relying on the Google Ads script sending that. You still want to do that to feed their algorithm. But if you want to know the true cost per acquisition doing that in a warehouse with all the data is really, really cool.
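A minimal sketch of the cost-per-acquisition calculation described above, assuming two warehouse tables have already been loaded as rows: ad spend pulled from the ad platforms and conversions carrying the UTM campaign they came from. The table shapes, column names, and sample numbers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical rows standing in for warehouse tables.
# ad_spend: spend per campaign, as pulled from the ad platforms.
ad_spend = [
    {"utm_campaign": "brand_search", "spend": 500.0},
    {"utm_campaign": "brand_search", "spend": 300.0},
    {"utm_campaign": "retargeting", "spend": 400.0},
    {"utm_campaign": "display_test", "spend": 250.0},
]

# signups: one row per converted user, with the UTM campaign captured
# on the events that led to the conversion.
signups = [
    {"user_id": "u1", "utm_campaign": "brand_search"},
    {"user_id": "u2", "utm_campaign": "brand_search"},
    {"user_id": "u3", "utm_campaign": "brand_search"},
    {"user_id": "u4", "utm_campaign": "brand_search"},
    {"user_id": "u5", "utm_campaign": "retargeting"},
]


def cost_per_acquisition(spend_rows, conversion_rows):
    """Join spend and conversions on utm_campaign and compute CPA."""
    spend = defaultdict(float)
    for row in spend_rows:
        spend[row["utm_campaign"]] += row["spend"]

    # Count distinct users so a user who converts twice is not double counted.
    converted = defaultdict(set)
    for row in conversion_rows:
        converted[row["utm_campaign"]].add(row["user_id"])

    report = {}
    for campaign, total in spend.items():
        count = len(converted.get(campaign, ()))
        report[campaign] = {
            "spend": total,
            "conversions": count,
            # No conversions means CPA is undefined, not zero.
            "cpa": total / count if count else None,
        }
    return report


report = cost_per_acquisition(ad_spend, signups)
for campaign, row in sorted(report.items()):
    print(campaign, row)
```

In practice this would more likely be a SQL or DBT model in the warehouse; the point is only that the conversion count comes from your own event data rather than from the ad platform's reported number.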

Alex Dovenmuehle (36:45)

Yeah. And that actually just made me think about how RudderStack sort of starts to change how you almost think about all of this stuff. You used to have like, oh, well, we need Google Analytics for this, and we need Mixpanel for that. And we need just all these tools, you just have hundreds of tools. Oh, I forgot my login to this one, I do this one and all this stuff. It's like RudderStack, having so much of this functionality, sort of just in this one platform, it makes it a lot easier to choose, well, we'll just use RudderStack for this. I need my Google campaign ad data into my warehouse, use RudderStack for that. I need stuff synced out to Google Analytics, use RudderStack for that. It makes your life a whole lot easier, just from an operational perspective.

And also again, having the warehouse-first approach really allows you to unlock all the value of your data because it is your data. I think that's the biggest difference between Segment and RudderStack: RudderStack's philosophy is that it's your data, we're not trying to lock it behind anything. We want you to get the value out of your data.

Eric Dodds (38:04)

Before I joined RudderStack, I think we've talked about this, but I was running a consultancy that was sort of the more marketing-ish version of Big Time Data but really focusing on the warehouse and data. And I remember actually finding RudderStack from way long ago, their initial Hacker News post. And I was like, yes, this makes everything I'm trying to do for my clients 1000 times easier. Cool. All right. Well, we still have plenty of time for questions. And we may be able to give people some time back depending on how many questions there are.

So feel free to type a question into the Q&A box. Or if you want to raise your hand, I can unmute you and you can ask a question. We're happy to answer any questions. I mean, we've both used Google Analytics pretty extensively and RudderStack pretty extensively so. All right, Dan, I'm going to allow you to talk here Dan.

Dan (39:13)

Hey Eric.

Alex Dovenmuehle (39:15)

Hey Dan.

Eric Dodds (39:15)

How's it going?

Dan (39:17)

Yeah, not so bad. Great presentation both of you, superb. This is kind of exactly along the lines that we're going out of pocket. I'm using RudderStack, totally loving the experience. And it really does everything that we kind of wanted it to. I've got a little question for you, Alex, in terms of how you go about, we're talking about how we're reinventing the wheel and doing what Google Analytics is already doing. And the time that it takes to do that sort of stuff.

I know that we've had particular struggles with doing referrer categorization to kind of match what you're getting in GA. Have you got any tips as to how to go about doing that in an easy sense. So we get that kind of paid search organic search type distinction.

Alex Dovenmuehle (40:06)

That was actually going to be one of the ones I was like, that's one thing Google Analytics is good at, the referrer stuff.

Dan (40:12)

Yeah. They obviously have a huge data set that they can just go-

Alex Dovenmuehle (40:17)

See, that's the thing. I think in some ways, Google Analytics is kind of cheating. Basically, you wonder how much other information they have about that person skulking around that they can leverage. I think with the data that we've been able to leverage, the referrer stuff is, I don't want to say good enough, but good enough. That's where we sort of landed on it.
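A rough sketch of the kind of "good enough" referrer categorization discussed here, assuming each page view carries the page URL (with any UTM parameters) and the referrer URL. The rules, the domain lists, and the channel names are illustrative assumptions, not Google Analytics' actual channel definitions, which are far more extensive.

```python
from urllib.parse import urlparse, parse_qs

# Small illustrative lists; a real setup would maintain much longer ones.
SEARCH_ENGINES = {"google", "bing", "duckduckgo", "yahoo"}
SOCIAL_NETWORKS = {"facebook", "twitter", "linkedin"}
SOCIAL_HOSTS = {"t.co"}
PAID_MEDIUMS = {"cpc", "ppc", "paidsearch"}


def classify_channel(page_url, referrer):
    """Return a coarse channel name for one page view."""
    page = urlparse(page_url)
    params = parse_qs(page.query)
    medium = params.get("utm_medium", [""])[0].lower()
    ref_host = urlparse(referrer).hostname or "" if referrer else ""
    labels = set(ref_host.split("."))

    # Paid search: tagged with a paid medium or a Google Ads click ID.
    if medium in PAID_MEDIUMS or "gclid" in params:
        return "paid_search"
    if not ref_host:
        return "direct"
    if ref_host == (page.hostname or ""):
        return "internal"
    if labels & SEARCH_ENGINES:
        return "organic_search"
    if labels & SOCIAL_NETWORKS or ref_host in SOCIAL_HOSTS:
        return "social"
    return "referral"


examples = [
    ("https://example.com/?utm_medium=cpc&utm_source=google", "https://www.google.com/"),
    ("https://example.com/pricing", "https://www.google.com/"),
    ("https://example.com/", ""),
    ("https://example.com/", "https://t.co/abc"),
    ("https://example.com/", "https://blog.other.io/post"),
]
for page_url, referrer in examples:
    print(classify_channel(page_url, referrer), "<-", referrer or "(none)")
```

The same rules could live in a DBT model as a CASE expression; keeping the domain lists in a shared table is one way the community data set Dan suggests next could plug in.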

Dan (40:52)

Do you kind of aggregate that across all of your clients so you're able to build a bigger data set out of what you've got going on there? And if so, is this something that we could look to, as a community, to open source, to be able to provide that as an additional tool to RudderStack users looking to kind of categorize URLs?

Alex Dovenmuehle (41:15)

That's a really interesting idea.

Eric Dodds (41:16)

That is a really cool idea.

Alex Dovenmuehle (41:20)

Yeah. Because you could have it stored in some shared, I don't know, Snowflake data set, or BigQuery.

Dan (41:30)

Yeah, that's something I've been thinking about for a while. And I think that would really help us. And obviously, we've got a huge amount of data that we're processing and can definitely contribute towards that sort of stuff. And with the kind of data science experience that we've got as a company, we could really feed into that quite well. Something worth maybe chatting about offline.

The other thing I was going to mention is the point that you made, Eric, about the PPC data, and validating real-world data against Google Analytics, and this is such a huge thing. You blindly believe what the Google Ads platform is telling you and what Google Analytics is telling you, and what you find in the real world, or certainly what we found, is that that data is massively different. It's hugely inflated on Google's side, and RudderStack has really allowed us to get some transparency in that data and even feed back to Google and say, look, this isn't real. What you're giving us here isn't real. And request refunds on ad costs. So loads of problems on that front and money. So thank you, guys. Amazing.

Alex Dovenmuehle (42:39)

Thank you.

Eric Dodds (42:42)

Absolutely. Well, thanks, Dan. And I'm going to mute you and then jump into a couple more questions here. We've got some questions coming in, which is great. Disabled talking. Sorry, Dan. Okay, a question from George. How can RudderStack help in managing accounts and companies instead of solely users? Great question. And I'm really excited to tell you that the answer is pretty simple.

There's a group method that you can run. Identify is what Alex mentioned, in order to identify a user. If you want to associate that user with an account, let's just use a simple example of maybe a form fill and there's a company field on the form fill, you can also run a group call where the company name is the group, and then you can associate that user with a company. And we will, actually, I think Brooks is on this, we'll send these answers. If there are any relevant links to documentation, et cetera, we'll send that in the follow-up email.
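As an illustration of the form-fill example, here is a sketch of the two events involved, written as plain dictionaries rather than calls to any SDK. The field names follow the general shape of identify and group events (`userId`, `anonymousId`, `groupId`, `traits`), but the helper functions, the use of the email as the user ID, and the way the group ID is derived from the company name are assumptions made for this example.

```python
def build_identify(anonymous_id, user_id, traits):
    """Illustrative identify payload: ties traits to a known user."""
    return {
        "type": "identify",
        "anonymousId": anonymous_id,
        "userId": user_id,
        "traits": traits,
    }


def build_group(user_id, group_id, traits):
    """Illustrative group payload: associates the user with an account."""
    return {
        "type": "group",
        "userId": user_id,
        "groupId": group_id,
        "traits": traits,
    }


def events_for_form_fill(anonymous_id, form):
    """Turn one form submission into an identify event and a group event."""
    # Assumption: the email doubles as the user ID until the app assigns one.
    user_id = form["email"]
    # Assumption: the group ID is a simple slug of the company field.
    group_id = "-".join(form["company"].lower().split())
    return [
        build_identify(anonymous_id, user_id, {"email": form["email"]}),
        build_group(user_id, group_id, {"name": form["company"]}),
    ]


events = events_for_form_fill(
    "anon-1", {"email": "pat@acme.example", "company": "Acme Inc"}
)
for event in events:
    print(event)
```

With both events in the warehouse, the group ID gives you the account-level key to join users against, much like accounts and contacts in a CRM.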

So we'll send documentation on that group method as well, but it works great. So if you're thinking about sort of, like, Salesforce accounts, contacts, et cetera, it works really well. Fabian, or Fabienne: is there a way to avoid ad blockers when using RudderStack? We just set up for product analytics. Cool. Excited to have you onboard. We'd like to get data on all our users. Yes, we'll include this in the follow-up email as well. Most of our customers actually proxy the JavaScript SDK and deliver it via their CDN. So it's essentially a first-party API call and it's extremely resilient to ad blockers. And we have very deep documentation on how to do it.

Alex Dovenmuehle (44:41)

Yep, it's pretty easy to set up. I did it for Mattermost, not bad at all. And I would honestly suggest doing that if you are going to do RudderStack, obviously, you can always change it later. But I think that's the way to go.

Eric Dodds (44:58)

We recommend that as the best practice. King Mike asked, does RudderStack do website visitor identification, or do you need GA or some other tool for that to pull the data in through RudderStack? We do. So every page view that comes in, we assign a unique anonymous ID to that user. And we associate all of their page view events, and then any other behavioral events you track, with that anonymous ID.

And then when you identify them, say they sign up for the product or do a form fill, you identify them. And let's say they sign up for the product and you have an internal ID from your app, that becomes their user ID. We associate the anonymous ID with that user ID and then in your warehouse, you can key on the anonymous ID to get a full picture of their events, both pre- and post-identification. It's pretty awesome actually. It makes tying anonymous events to what becomes a known user way, way easier. Great question. And we can send some docs on that as well, anonymous ID and identification. George had another question: would syncing user data across different apps in a company be a planned use for RudderStack?
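The anonymous-to-known stitching Eric describes above can be sketched in a few lines, assuming the events have been loaded from the warehouse as rows with an anonymous ID, an optional user ID, and an event name. The row shape and column names are hypothetical.

```python
# Hypothetical event rows, in the order they were received.
events = [
    {"anonymous_id": "a1", "user_id": None, "type": "page", "name": "page:/"},
    {"anonymous_id": "a1", "user_id": None, "type": "page", "name": "page:/pricing"},
    {"anonymous_id": "a2", "user_id": None, "type": "page", "name": "page:/"},
    {"anonymous_id": "a1", "user_id": "u42", "type": "identify", "name": "identify"},
    {"anonymous_id": "a1", "user_id": "u42", "type": "track", "name": "track:trial_started"},
]


def stitch_journeys(rows):
    """Group events per person, keyed on user ID when one is known."""
    # Pass 1: learn which anonymous IDs were later identified.
    anon_to_user = {}
    for row in rows:
        if row["type"] == "identify" and row["user_id"]:
            anon_to_user[row["anonymous_id"]] = row["user_id"]

    # Pass 2: attribute every event, including the pre-identification ones,
    # to the known user; visitors never identified stay under their anonymous ID.
    journeys = {}
    for row in rows:
        key = anon_to_user.get(row["anonymous_id"], row["anonymous_id"])
        journeys.setdefault(key, []).append(row["name"])
    return journeys


journeys = stitch_journeys(events)
for person, names in journeys.items():
    print(person, names)
```

The two-pass structure is the important part: the mapping has to be built before events are attributed, because the page views happen before the identify call.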

Alex Dovenmuehle (46:11)

Yeah. I would say so.

Eric Dodds (46:15)

George, I'm going to allow you to talk, pardon me, to clarify that a little bit more, because we can do some of that already.

George (46:25)

Okay, can you hear me?

Alex Dovenmuehle (46:26)

Yep.

Eric Dodds (46:27)

Yes.

George (46:28)

Yes, well, the problem that we're seeing currently, so we have different apps across a company, we have a CRM, we have a customer success management product, we have chat, the Intercom communications package, and we have data that is produced by all these systems. And we have a lot of problems actually just making sense of it all and syncing it all. Okay. Well, we currently do that through automation, we have an automation wizard that pushes data from one application, when something happens, to the other one.

And then when you need to push data from another application to another one, we use all this automation that will push data from all these apps to each other. And there's very little coordination and sense of it all. So it's a bit of a big mess, sometimes we don't even know where the data came from, what updated it. I know that's more of a data administration issue than anything else. But yeah, it has to do also with technology, or how we are moving the data from where it's created to where it's needed. So that's why we're considering tools like RudderStack to help us on that. And another consideration as well is that we are very privacy-oriented.

So for us, to keep the data is a no-no. For example, just to track the user when they arrive anonymously to our website, we try to keep as little data from there as possible, and the fewer repositories where that data is stored, the better for us. Okay. So we're not yet at the point where we want to centralize all the data and use a central repository for keeping all that data.

Alex Dovenmuehle (48:35)

Can I take this one?

Eric Dodds (48:37)

Yeah, absolutely.

Alex Dovenmuehle (48:38)

Okay. This is like what we do at Big Time Data all day. This is the situation that everybody finds themselves in: they've got HubSpot and a CRM and Intercom and Marketo and blah, blah, blah, blah, blah, all the tools, and they're trying to figure out, how do I make sense of this? And how do I track things across all these various properties? I guess there are two parts to your question. The privacy piece: at Mattermost we're also very privacy-conscious in the way that Mattermost wanted to capture their analytics.

I would say that, obviously, if you're trying to not include really any identifying information, then there are at least some things that I would consider not private, like, the anonymous ID that RudderStack generates for that user with the... what pages are they viewing on your marketing site? To me that I don't know if that would be I mean, moving on, but that would be okay to me. That's not identifying really.

George (49:47)

Yeah, I mean, for us, it doesn't matter if we only identify the user until they create a trial with us when they sign the agreements and then we can track him appropriately across the app.

Alex Dovenmuehle (50:00)

Okay, I got you.

George (50:01)

Yes, so that's not a problem. Well yeah, we try to reduce the amount of data that is stored everywhere, because that's also another repository that we need to take care of, and that needs to comply with GDPR laws. And how long that data can be there, and the deletion requests. So yeah, that's the concern, minimization of where the data is stored. Yeah.

Alex Dovenmuehle (50:30)

And that's where I would say, honestly, having a warehouse-first approach actually helps you. Because if you think about it, I'm going to pipe all this data into my warehouse. And that's where this data lives, the one place that I know, like, oh, I know they converted to a trial. So I have their email in the data warehouse. And if you have to go back and deal with GDPR, if you update that in your warehouse, and your warehouse is the source from which all this data about that user is getting synced to Salesforce or Intercom or whatever, then when that GDPR request comes in, you can null all that identifying information out just in that one place, and then your reverse ETL processes will automatically propagate out all that sort of data scrubbing.

And then you don't have to deal with it in like, oh, this person came up with the GDPR request so I got to go talk to like eight different people because he's the admin of Salesforce, she's the admin of Intercom and blahdy blahdy, blah. Really think about the warehouse first approach, because I think this actually could help you.
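To make the warehouse-first deletion flow concrete, here is a small sketch, assuming the warehouse user table is the single source that reverse ETL syncs read from. The in-memory dictionary, the list of identifying fields, and the destination names are stand-ins for illustration; a real setup would run an UPDATE in the warehouse and let the configured syncs pick up the change.

```python
# Hypothetical warehouse user table, keyed by user ID.
warehouse_users = {
    "u42": {"email": "pat@example.com", "name": "Pat", "plan": "trial"},
    "u43": {"email": "sam@example.com", "name": "Sam", "plan": "paid"},
}

# Assumption: these are the columns considered identifying.
PII_FIELDS = ("email", "name")


def scrub_user(users, user_id):
    """Null out identifying fields for one user, in the one place they live."""
    record = users[user_id]
    for field in PII_FIELDS:
        record[field] = None
    return record


def build_sync_payloads(users, destinations):
    """Stand-in for reverse ETL: every destination reads the same rows."""
    rows = [dict(record, user_id=user_id) for user_id, record in users.items()]
    return {destination: [dict(row) for row in rows] for destination in destinations}


# A deletion request comes in for u42: scrub once, then let the syncs run.
scrub_user(warehouse_users, "u42")
payloads = build_sync_payloads(warehouse_users, ["salesforce", "intercom"])
for destination, rows in payloads.items():
    print(destination, rows)
```

Because every destination is fed from the same rows, the scrub only has to happen once; non-identifying fields such as the plan survive for aggregate reporting.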

And then it also lets you get your mind around all those interactions that they're having in all those different apps, like you're saying. If you can use sort of that, I mean, you kind of have to coalesce around an identifier for that user. But once you coalesce around that identifier, then you can say, oh, this is what happened with this user in Salesforce, this is what happened in Intercom, this is what happened in Zendesk. Now you have all the places that this user has sort of engaged with your company or product anywhere, sorry, I went kind of ham on-

Eric Dodds (52:09)

No, that's great. Just a couple of things I'd add there, George. One, RudderStack doesn't store any data. So we collect it, we process it, and then we route it, but we don't store any of your data. So that's a core principle for us. We don't want to create another data silo for you to deal with. And then as far as connecting all of your apps, one of the key values that RudderStack brings is you instrument the SDK for RudderStack, and then collect data one time and then send it to all your downstream applications. So that could be website behavior data, it could be form fills that create contacts in Intercom, HubSpot, customer success, et cetera.

So all of that happens simultaneously just with one tracking event with RudderStack, but then you can also connect to your stack in other ways. So you can pull in data from SaaS applications. So let's say you have Zendesk for customer support, you can actually pull in all of your customer support data from Zendesk into your warehouse and then sort of analyze, are there certain types of customers that create more support tickets, et cetera.

And then reverse ETL, our reverse ETL feature is called Warehouse Actions, you can actually take that data and then push it back out to tools. So let's say you want to flag customers who have created more than five tickets as high-risk customers and you want to push that to Salesforce or something like that, you can use reverse ETL. And RudderStack is the set of pipelines that connects all of those. So it makes it way, way easier to make sense of all the different tools in your stack that sort of have varying data sets, we help you unify all that.
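The high-risk flag in this example boils down to a count and a threshold. A minimal sketch, assuming support tickets have been pulled into the warehouse with a customer ID on each row; the row shape and the output record are hypothetical, and the step that pushes the result to a CRM is left out.

```python
from collections import Counter

# Hypothetical ticket rows pulled from the support tool into the warehouse.
tickets = (
    [{"ticket_id": f"t{i}", "customer_id": "acme"} for i in range(6)]
    + [{"ticket_id": f"g{i}", "customer_id": "globex"} for i in range(5)]
    + [{"ticket_id": "i0", "customer_id": "initech"}]
)


def flag_high_risk(ticket_rows, threshold=5):
    """Flag customers who have created more than `threshold` tickets."""
    counts = Counter(row["customer_id"] for row in ticket_rows)
    return [
        {
            "customer_id": customer_id,
            "ticket_count": count,
            "high_risk": count > threshold,
        }
        for customer_id, count in sorted(counts.items())
    ]


# These rows are what a reverse ETL sync would then push to the CRM.
flags = flag_high_risk(tickets)
for row in flags:
    print(row)
```

Note the strict comparison: a customer with exactly five tickets is not flagged, matching "more than five" in the example.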

George (53:49)

Okay, great.

Eric Dodds (53:53)

Awesome, great question. We had, let's see, feel free to keep your questions coming in. I know we're getting close to time here. Dan asked, is there a simple way to exclude bot traffic, spiders, et cetera? We're currently pulling apart the user agent to find this. Alex, I'd be interested in your answer. We see a lot of customers do this with transformations, where you can take the payload and then block events. As you see this happen, you can kind of update the transformation as you see new unique bot traffic, and then our Data Governance API soon will support blocking as well. So you can handle that in a version-controlled repo like GitHub, which would be really nice, but we'd love your thoughts on that, Alex?

Alex Dovenmuehle (54:36)

Yeah, we've actually just been filtering that with DBT on the DBT side. So I mean, I guess that's pretty much akin to doing it via RudderStack transformations. I always err towards give me more data in my warehouse, and then I'll deal with it. But I think actually, I mean, the Data Governance API is something I need to look more into, but that could definitely be something that would be super cool to start playing around with.

Eric Dodds (55:07)

Yeah, that's a great point. If you want all of the raw data, leveraging something like DBT is really helpful because you still get the data in the warehouse. But there also are some times where, if the bad traffic is just causing so much noise or sort of causing problems for downstream APIs, for SaaS tools or whatever, blocking it in the actual live event stream can help.
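A sketch of the user-agent approach to filtering bots in the event stream, written as a plain function that receives an event and either returns it or returns nothing to drop it. The function name, the location of the user agent at `context.userAgent`, and the pattern list are assumptions for illustration; check the transformations documentation for the exact function signature and payload shape before adapting it.

```python
import re

# Illustrative pattern; extend it as new bot traffic shows up.
# Note that a broad term like "bot" can also match legitimate user agents.
BOT_PATTERN = re.compile(r"bot|crawler|spider|headless", re.IGNORECASE)


def is_bot(user_agent):
    """Return True when the user agent string looks like automated traffic."""
    return bool(user_agent) and BOT_PATTERN.search(user_agent) is not None


def transform_event(event, metadata=None):
    """Drop events from bot-like user agents; pass everything else through."""
    # Assumption: the user agent lives under context.userAgent in the payload.
    user_agent = event.get("context", {}).get("userAgent", "")
    if is_bot(user_agent):
        return None  # returning nothing is treated here as "drop this event"
    return event


human = {
    "type": "page",
    "context": {
        "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    },
}
crawler = {
    "type": "page",
    "context": {
        "userAgent": "Mozilla/5.0 (compatible; Googlebot/2.1; "
        "+http://www.google.com/bot.html)"
    },
}

print(transform_event(human) is not None)
print(transform_event(crawler) is None)
```

The same `is_bot` rule can be applied after the fact in a DBT model instead, which keeps the raw events in the warehouse, as Alex prefers.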

Cool. Any other questions? I know we're out of time here, but if someone wants to sneak in one more question, we might be able to answer it. I don't see anything coming through. Thank you so much for joining us. We'll send a follow-up email with the information we mentioned, and feel free to reach out to us if you have any questions.

Alex Dovenmuehle (55:53)

Yeah, that was fun. Thanks, Eric.

Eric Dodds (55:56)

That was great. Yeah, always a pleasure, Alex. Thanks.

Alex Dovenmuehle (55:59)

Yep.