Google has recently completed a significant overhaul in web measurement, marking the end of Universal Analytics – a long-standing industry standard for the past nine years. The change encompasses more than just a revamped user interface, as analysts must now adapt to new dimensions, metrics, default attribution models and conversion modeling. In this article, we will delve into the intricacies of Google Analytics 4’s modeling process, providing a detailed explanation. Furthermore, we will conduct a comprehensive analysis, comparing its approach and attribution outcomes with an unbiased Roivenue AI Attribution model.

Contents

  1. What is the Attribution Problem
  2. Attribution Modeling in GA4
  3. Customer Journeys
  4. Models in GA4
  5. Missing Conversions Modeling in GA4
  6. Attribution Modeling in Roivenue
  7. Comparison of GA4 vs. Roivenue
  8. Conclusion

What is the Attribution Problem?

The concept of attribution, often referred to as the “attribution problem,” is a persistent challenge that marketers face. It involves determining how to allocate credit to various marketing activities. Let’s imagine a scenario involving your customer:

  1. The customer encounters your ad on Instagram.
  2. They visit your website for the first time by clicking on a Facebook ad.
  3. They return to your website through a remarketing banner and ultimately make a €100 purchase.

Now the question arises: which ad should receive credit for driving that purchase? Traditionally, Google Analytics (GA) would attribute 100% of the credit to the last touchpoint in the customer’s journey, which, in this case, is the remarketing banner. However, this approach tends to overvalue marketing activities at the bottom of the funnel while undervaluing other touchpoints. Other rule-based models have similar issues, as they may undervalue or overvalue certain groups of channels. On the flip side, these models are relatively easy to implement and understand due to their straightforward logic.
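To make the difference between rule-based models concrete, here is a minimal Python sketch that splits the €100 purchase from the scenario above under last-click, first-click and linear rules. The function name, channel labels and the decision to treat all three ad contacts as equal touchpoints are our own simplifications for illustration, not how any particular tool implements these models.

```python
from collections import defaultdict

def attribute(journey, value, model):
    """Split `value` across the touchpoints of `journey` using a rule-based model."""
    if not journey:
        raise ValueError("journey must contain at least one touchpoint")
    n = len(journey)
    if model == "last_click":
        weights = [0.0] * (n - 1) + [1.0]
    elif model == "first_click":
        weights = [1.0] + [0.0] * (n - 1)
    elif model == "linear":
        weights = [1.0 / n] * n
    else:
        raise ValueError(f"unknown model: {model}")
    credit = defaultdict(float)
    for channel, weight in zip(journey, weights):
        credit[channel] += weight * value
    return dict(credit)

# Hypothetical labels for the three touchpoints in the example scenario.
journey = ["instagram_ad", "facebook_ad", "remarketing_banner"]
for model in ("last_click", "first_click", "linear"):
    print(model, attribute(journey, 100.0, model))
```

Each rule is a fixed weight vector, which is why these models are easy to explain and why they systematically favour whichever position in the journey the rule rewards.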

A more advanced approach to tackling this issue is the data-driven approach, where the entire customer journey is analysed, and a model attempts to estimate the impact of each touchpoint. There are various methods for consolidating customer journey data, as well as multiple data-driven models available. In our exploration, we will delve into how GA4 addresses this challenge and evaluate its performance in both theory and real-world scenarios.

Attribution Modeling in GA4

First and foremost, it’s worth noting that Google has been providing data-driven attribution modelling for quite some time, so it’s not an entirely new concept in GA4. However, the major shift is that it has now become the default attribution model, making it the most widely used among marketers.

The second notable change is that Google has discontinued all other models, such as first-click and linear, leaving only last-click and data-driven. This decision stems from Google’s finding that these additional models were not widely used. Consequently, analysts now have fewer tools at their disposal when seeking to comprehend the role of different channels within their marketing mix.

Customer Journeys

Google Analytics effectively measures website visits for users who have consented to cookie collection and do not have anti-tracking mechanisms installed in their browsers. Server-side measurement is an option for gathering additional data, although that is a topic for a separate article.

One notable feature offered by Google Analytics is the utilization of Google Signals for tracking customer journeys. This functionality enables cross-device tracking when users are logged into their Google accounts on multiple devices. However, it’s important to note that this feature needs to be activated as an optional setting and may not function optimally for iOS 14.5 and subsequent versions.

Models in GA4

Both models in GA4 (last-click and data-driven) share a common approach to processing Direct visits. In the post-processing stage, the significance of Direct is minimized. When a customer journey includes both Direct and any other channel, all the credit is allocated exclusively to the other channel. As a result, in GA4, Direct receives credit only in cases where the customer journey consists solely of Direct visits.
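For the last-click case, the Direct rule described above can be sketched in a few lines of Python. This is an illustration of the rule as stated in this article, not Google’s implementation; the function name and the `"direct"` channel label are assumptions.

```python
def last_non_direct_click(journey):
    """Return the channel that receives 100% of the credit.

    Direct is skipped whenever any other channel appears in the journey;
    it only wins when the journey consists solely of Direct visits.
    """
    if not journey:
        raise ValueError("journey must contain at least one visit")
    non_direct = [channel for channel in journey if channel != "direct"]
    return non_direct[-1] if non_direct else "direct"

print(last_non_direct_click(["social", "email", "direct"]))
print(last_non_direct_click(["direct", "direct"]))
```

The first call credits `email`, the last non-Direct channel, and the second credits `direct` because nothing else is available.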


GA4 Data-driven Model

In the following section, we will provide a detailed explanation of the underlying logic of GA4 attribution modeling. However, if you prefer to read the official documentation prepared by Google, we have included a link for your reference.

Google analyzes the impact of each interaction on conversion probability. In other words, how likely is a user to convert at any given point in their journey? It uses factors such as:

  1. Time from conversion
  2. Device type
  3. Number of ad interactions
  4. The order of ad exposure 
  5. The type of creative assets 

The data-driven attribution model assigns credit based on how the addition of each ad interaction to the path changes the estimated conversion probability. 

Example:

In the following high-level illustration, the combination of Ad Exposure #1 (Paid search), Ad Exposure #2 (Social), Ad Exposure #3 (Affiliate), and Ad Exposure #4 (Search) leads to a 3% probability of conversion. When Ad Exposure #4 does not occur, the probability drops to 2%, so we know that Ad Exposure #4 drives +50% conversion probability. We repeat this for each ad interaction and use the learned contributions as attribution weights.

the combination of ad exposure and probability of conversion
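The arithmetic behind this example is short enough to spell out. In the Python sketch below, only the 3% and 2% figures come from the example; the counterfactual probabilities for the other three exposures are invented, and normalising the probability drops into weights is our assumption about one simple way to turn contributions into attribution weights, not a description of Google’s model.

```python
def relative_lift(p_with, p_without):
    """Relative change in conversion probability caused by one exposure."""
    if p_without <= 0:
        raise ValueError("p_without must be positive")
    return (p_with - p_without) / p_without

# From the example: 3% with Ad Exposure #4, 2% without it.
lift = relative_lift(0.03, 0.02)
print(f"Ad Exposure #4 lift: {lift:+.0%}")

# Repeating the counterfactual for every exposure.
# Only the "search" value is taken from the example; the rest are hypothetical.
p_full = 0.03
p_without = {"paid_search": 0.025, "social": 0.028, "affiliate": 0.027, "search": 0.02}

# Assumption: credit is proportional to the drop in probability when
# the exposure is removed, normalised so the weights sum to 1.
drops = {channel: p_full - p for channel, p in p_without.items()}
total = sum(drops.values())
weights = {channel: drop / total for channel, drop in drops.items()}
print(weights)
```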

It is important to note that Google does not incorporate parameters related to visit quality. Consequently, it may not accurately identify channels that commonly appear in conversion journeys but have minimal impact on the user’s decision-making process. The impact of this will be discussed later when we compare the attribution results of different models.

A significant drawback of the attribution modeling GA uses is its inherent limitation in accounting for the post-view impact of brand-awareness campaigns and other upper funnel activities. These types of campaigns often have low click-through rates but play a crucial role in influencing buyers over the medium to long term.


Last-click model and other options in GA4

The last-click model ignores direct traffic and attributes 100% of the conversion value to the last channel that the customer clicked through (or, for YouTube, watched as an engaged view) before converting.

Google provides alternative perspectives to analyze the data, such as a model that attributes all the credit to the final interaction with Google Ads (known as Google Paid Channels Last Click). However, we will not discuss these models in greater detail as they inherently skew the results.

Last-click results from Google BigQuery data

Google provides an intriguing option for analysts who use BigQuery: the Streaming Export feature. This feature captures the raw, unprocessed data directly from measurements and saves it in Google BigQuery. Analysts can then access this unaltered data and apply their own customized rules and analyses. In the comparative section of this article, we will explore and compare the processed data within GA4 with the raw data obtained through BigQuery.

Missing Conversions Modeling in GA4

In today’s web measurement landscape, a portion of conversions is unobservable and many customer journeys are incomplete. The most common reasons are:

  1. Non-consent to cookie collection: buyers who do not provide consent for the collection of cookies hinder the ability to track and observe their complete journey.
  2. Multi-device customer journeys: when customer journeys span multiple devices it becomes difficult to accurately track and attribute conversions.
  3. iOS 14.5+ device limitations: iOS 14.5 and later versions require developers to obtain explicit permission to access certain information from other apps and websites, further impacting the ability to observe and track conversions.

To address these limitations, Google employs machine learning techniques to model and estimate the missing conversions. Google’s machine learning models analyse trends between directly observed conversions and those that are unattributed. By identifying similarities between attributed conversions on one browser and unattributed conversions on another, the machine learning model can predict overall attribution. This prediction allows for the aggregation of both modeled and observed conversions.

While Google’s intent to provide a more comprehensive view is understandable, it also introduces an additional layer of complexity for analysts. Some reports in Google Analytics (GA) include modeled conversions, while others do not. This lack of differentiation between observed and modeled conversions within GA reports can pose challenges for analysts seeking to understand the data accurately.

Attribution Modeling in Roivenue

Roivenue serves as a specialized attribution tool for marketers, aiming to enhance marketing ROI. Its core functionality revolves around integrating data from multiple sources and employing advanced modeling techniques to achieve precise attribution. This ensures that all digital channels are fairly represented in the attribution process. In this part, we will provide an in-depth explanation of Roivenue’s methodology and conduct a thorough comparison with GA4, assessing their respective approaches and methodologies.

Customer Journeys

The first notable distinction in modeling between Roivenue and GA4 lies in the underlying customer journey data used for attribution. While Roivenue utilizes visit-level data from Google Analytics 4 as a starting point, it goes beyond this by incorporating impression-level data from Demand-side Platforms (DSPs). This inclusion enables the evaluation of the post-view impact of real-time buying campaigns and the accurate measurement of conversions through direct deals with publishers.

Additionally, Roivenue provides a unique solution for measuring post-view and cross-device conversions from walled gardens such as Meta (Facebook, Instagram), TikTok, Twitter, YouTube, and others. While the process of connecting visit-level data is similar to GA4, the incorporation of impression-level data requires a deeper exploration to fully comprehend its added value.

Visits from GA

Roivenue retrieves events from Google Analytics to reconstruct visit-level customer journeys. It is worth highlighting that Roivenue also captures multiple qualitative parameters of the visit, such as bounces, pageviews, events and more. This comprehensive data allows Roivenue to assess the true impact of each specific visit on conversions more accurately. By considering these qualitative parameters, Roivenue can provide a more nuanced understanding of the effectiveness and significance of individual visits in driving conversions.

True Impressions

Demand-side Platforms (DSPs) are essential tools for marketers to purchase ad space in real time across the internet. They not only facilitate real-time buying but also provide performance measurement for inventory obtained through direct deals with publishers. DSPs often offer pixel-based measurement and employ advanced techniques to track conversions effectively. Additionally, these platforms grant access to granular data, including browser identifiers and timestamps, for each served ad.

By integrating this detailed impression-level data with Google Analytics data, Roivenue can enhance visits with precise information about impressions. This integration allows for a comprehensive view of the customer journey, ensuring that even campaigns with zero direct visits receive fair credit if they contribute to increased awareness and impact conversions further down the funnel.
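As a rough illustration of what this integration involves, the following Python sketch merges visits and impressions into one time-ordered journey per browser. It assumes, purely for illustration, that both sources share a common browser identifier and use comparable timestamps; the function name and record layout are hypothetical and do not describe Roivenue’s actual pipeline.

```python
from collections import defaultdict

def build_journeys(visits, impressions):
    """Merge visits and impressions into one time-ordered journey per browser.

    Both inputs are lists of (browser_id, timestamp, channel) tuples.
    """
    journeys = defaultdict(list)
    for browser_id, timestamp, channel in visits:
        journeys[browser_id].append((timestamp, "visit", channel))
    for browser_id, timestamp, channel in impressions:
        journeys[browser_id].append((timestamp, "impression", channel))
    # Sorting by timestamp interleaves the two sources chronologically.
    return {browser_id: sorted(touchpoints) for browser_id, touchpoints in journeys.items()}

# Hypothetical data: timestamps are plain integers for brevity.
visits = [("b1", 30, "organic"), ("b1", 50, "direct")]
impressions = [("b1", 10, "display_dsp"), ("b2", 20, "display_dsp")]
print(build_journeys(visits, impressions))
```

In this toy data, browser `b1` has a display impression before its first organic visit, so the display campaign becomes part of the journey even though it never produced a click.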

Synthetic Touchpoints

This represents a significant breakthrough in the field of attribution. In the past, tracking the post-view performance of walled gardens posed a challenge because these platforms did not allow pixel-based tracking. As a result, marketers had to rely on Google Analytics data, which couldn’t capture the full brand effect of campaigns, or trust the potentially inflated results reported by advertising platforms that have a vested interest in selling more ad space. Moreover, these platforms typically count the entire conversion for themselves within their attribution windows, leading to duplicate reporting if a person is targeted across multiple platforms.

Roivenue introduces a solution to this dilemma. By integrating the most granular data available from platforms (typically hourly data on the ad level) and combining it with conversions tracked in Google Analytics, Roivenue can identify matches between reported conversions. When a match is found, Roivenue generates synthetic touchpoints for all the platforms claiming involvement in a given conversion. While Roivenue cannot directly observe the real impression-level data from these platforms, it can indirectly assess their post-view impact on conversions by leveraging their reporting.

As a result, the customer journey consists of visits, true impressions, and synthetic touchpoints, all of which are used in the attribution calculation. This comprehensive approach allows for a more accurate understanding of the contribution made by each platform and facilitates a more insightful attribution analysis.
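The matching idea can be illustrated with a deliberately simplified Python sketch. It assumes that platform reports arrive as hourly rows and that a platform "claims" a conversion when it reports at least one conversion in the same hour as a conversion tracked in Google Analytics. The matching key, the function name and the data layout are all our assumptions for illustration; Roivenue’s real matching logic is not described at this level of detail here and is certainly more involved.

```python
from datetime import datetime

def synthetic_touchpoints(ga_conversions, platform_reports):
    """Map each GA conversion to the platforms claiming a conversion in the same hour.

    ga_conversions: list of (conversion_id, datetime)
    platform_reports: list of (platform, hour_start datetime, reported_conversions)
    """
    claimed = {}
    for platform, hour_start, count in platform_reports:
        if count > 0:
            claimed.setdefault(hour_start, set()).add(platform)
    result = {}
    for conversion_id, timestamp in ga_conversions:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        # One synthetic touchpoint per platform that reported a conversion that hour.
        result[conversion_id] = sorted(claimed.get(hour, set()))
    return result

# Hypothetical data.
ga = [("order-1", datetime(2023, 7, 1, 14, 25)), ("order-2", datetime(2023, 7, 1, 16, 5))]
reports = [
    ("meta", datetime(2023, 7, 1, 14), 1),
    ("tiktok", datetime(2023, 7, 1, 14), 2),
    ("meta", datetime(2023, 7, 1, 15), 0),
]
print(synthetic_touchpoints(ga, reports))
```

Here `order-1` receives synthetic touchpoints from both platforms, which is the point: each platform would otherwise count the whole conversion for itself, whereas in a shared journey they have to split the credit.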

Methodology

Roivenue employs an AI attribution model based on recurrent neural networks (RNNs) and places a strong emphasis on transparency regarding its methodology. The model utilizes the aforementioned customer journey data and aims to predict the probability of a conversion. It then assigns credit to each touchpoint based on the extent to which it contributed to increasing the likelihood of a conversion. The process consists of three key steps:

  1. Model Training: In this initial step, the model is exposed to a vast dataset consisting of hundreds of thousands of customer journeys. By analysing these journeys, the model learns the behavioural patterns associated with converting and non-converting users. It memorises significant patterns and identifies factors that increase the chances of conversion.
  2. Conversion Probability Estimation: This step is crucial, as it involves sequentially presenting each customer journey to the model, touchpoint by touchpoint. After each touchpoint exposure, the model estimates the probability of a user converting at that specific touchpoint. By repeating this process for all touchpoints, Roivenue can determine the contribution of each touchpoint in increasing the likelihood of conversion.
  3. Touchpoint Scoring: In the final step, the probability estimates obtained in step 2 are transformed into scores. Roivenue applies partial conversions to each touchpoint within the customer journey, considering their individual impact on increasing the likelihood of conversion. This scoring mechanism allows for a more granular and accurate allocation of credit to each touchpoint.

Example of step 3 – how conversion likelihood estimates are converted into scores

how conversion likelihood estimates are converted into scores
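Steps 2 and 3 can be approximated with a small Python sketch. It assumes that the model has already produced a conversion-probability estimate after each touchpoint, and that each touchpoint’s score is its positive increase in probability, normalised so that one conversion is fully distributed. That normalisation rule is our assumption for illustration; the exact transformation Roivenue applies may differ.

```python
def touchpoint_scores(probabilities, baseline=0.0):
    """Turn sequential conversion-probability estimates into partial conversions.

    probabilities[i] is the estimated conversion probability after touchpoint i.
    Returns one score per touchpoint; the scores sum to 1 for a non-empty journey.
    """
    if not probabilities:
        return []
    increments = []
    previous = baseline
    for p in probabilities:
        # Assumption: a touchpoint that lowers the estimate gets no credit.
        increments.append(max(p - previous, 0.0))
        previous = p
    total = sum(increments)
    if total == 0:
        # No touchpoint raised the probability: fall back to an even split.
        return [1.0 / len(probabilities)] * len(probabilities)
    return [increment / total for increment in increments]

# Hypothetical estimates after each of three touchpoints.
print(touchpoint_scores([0.10, 0.15, 0.40]))
```

With these made-up estimates, the third touchpoint earns the largest share because it produced the largest jump in conversion probability, regardless of its position in the journey.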

By following this systematic approach, Roivenue’s AI attribution model leverages machine learning techniques to effectively assess the significance of touchpoints and provide valuable insights into the attribution process.

Comparison of GA4 vs. Roivenue

Now that we have explored the methodologies of both GA4 and Roivenue, it is crucial to compare them and highlight the differences in their results. Ultimately, marketers rely on these results to make informed decisions. Any additional attribution model must therefore offer a distinct view of the results; otherwise there is no point in deploying it alongside the existing model.

Methodology comparison

  1. Data Integration: While GA4 uses its own visit-level data measured on the customer’s website, Roivenue integrates data from multiple sources including Google Analytics and impression-level data from DSPs and walled gardens. As a result, Roivenue offers solutions for measuring post-view and cross-device conversions from walled gardens.
  2. Attribution Logic: GA4 employs a data-driven attribution model that focuses on estimating the conversion probability of journeys in which a specific channel is included in comparison with journeys where it is not included. Roivenue’s AI attribution model utilizes recurrent neural networks to predict the likelihood of conversion and assigns credit based on the contribution of each touchpoint in increasing the chances of conversion.
    • The main difference here is the inclusion of qualitative data about each visit. GA4 treats a visit that bounced within a few seconds the same as a visit that lasted half an hour. Roivenue can recognize the difference and assign credit accordingly.
  3. Transparency and Customization: Roivenue emphasises transparency in its methodology, allowing marketers to understand and adjust the attribution model according to their specific needs. GA4, on the other hand, provides a default attribution model with limited customization options and very limited visibility into how the model really works.
  4. Post-View Attribution: Roivenue addresses the challenge of post-view attribution by leveraging impression-level data and generating synthetic touchpoints. GA4, in comparison, may face limitations in capturing the true impact of such activities and therefore will overvalue channels which appear later in the customer journey.
  5. Cross-device Attribution: Both tools can partially address this issue. Google Analytics can use Google Signals to connect visits from multiple devices if a user was logged into the same Google account on both and tracking was allowed. Roivenue, on the other hand, can use a similar feature implemented by walled gardens (e.g. Meta) to evaluate the impact of impressions and clicks served to a user across the devices on which they are logged into the platform. Neither approach solves this challenge entirely, and both will struggle on devices with iOS 14.5+.

This is a brief summary of the differences in the methodologies, but now let’s dive into the results!

Results comparison

To conduct a comprehensive comparison, we have collected data from six distinct online retailers operating across multiple countries and various industries, including fashion, furniture, professional equipment and more. Our aim was to ensure a balanced representation of retailers with different customer life cycles, varied marketing mixes and diverse strategies. By including this diverse range of retailers, we can obtain a holistic view of how attribution models perform across various business contexts, providing valuable insights into its strengths and weaknesses.

We have decided to use Last Click results from Google BigQuery as the baseline, so it always appears as 100%. All the other models are then expressed as a percentage of the BigQuery last-click result. We chose this baseline because it also provides an interesting view into how post-processing of the data can influence the results you see in GA4. All the data are in percentages. You will find a summary of the patterns frequently found in the data in the next chapter, followed by a description for each client if you want to deep-dive into the comparison.
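For readers who want to reproduce this kind of indexing on their own data, the calculation is sketched below in Python. The model names, channel names and conversion counts are hypothetical and are not taken from the client data in this article.

```python
def index_to_baseline(results, baseline_model="bigquery_last_click"):
    """Express each model's per-channel result as a percentage of the baseline model.

    results: {model_name: {channel: conversions}}
    Channels missing from the baseline, or with a zero baseline, are skipped.
    """
    baseline = results[baseline_model]
    indexed = {}
    for model, by_channel in results.items():
        indexed[model] = {
            channel: 100.0 * value / baseline[channel]
            for channel, value in by_channel.items()
            if baseline.get(channel)
        }
    return indexed

# Hypothetical conversion counts per model and channel.
results = {
    "bigquery_last_click": {"organic": 50, "walled_gardens": 20},
    "ga4_data_driven": {"organic": 65, "walled_gardens": 18},
    "roivenue_ai": {"organic": 40, "walled_gardens": 50},
}
print(index_to_baseline(results))
```

A value above 100 means the model assigns the channel more credit than raw last-click data does, and a value below 100 means it assigns less.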

Common Patterns

attribution common patterns

When combining all of the results, we can see some interesting patterns:

  • Google/cpc – Concerns have been raised among marketers regarding Google’s new attribution model, fearing that it might inflate the contribution of Google Ads campaigns. However, based on the data and our analysis, we can conclude that Google does not seem to be doing so, at least for now. Google has a separate attribution model within Google Ads, where marketers predominantly spend their time, and the reported results there are considerably higher than those in Google Analytics. This suggests that there is likely no need for Google to skew the data in GA to favour its own campaigns.
  • Organic – This channel raises some suspicion as it consistently receives high scores in comparison to last-click attribution in Google Analytics. It is plausible that Google’s attribution model might be facing challenges in accurately capturing the impact of other channels or that there could be an intentional attempt to present organic search in a more positive light, especially given the fact that it is primarily related to Google’s search engine.
  • Direct – On average, both GA and Roivenue tend to redistribute a significant portion of Direct’s credit to other channels. For unknown reasons, GA increases the value of Direct for Client 1 and Client 4. Otherwise the average decrease in this channel would be even more significant.
  • Walled Gardens – The analysis highlights a key finding regarding the “Walled Gardens” channel. Google’s new data-driven model fails to provide substantial credit to walled gardens and, on average, it further reduces their perceived value. This comes as a surprise, considering the fact that advertising on platforms like Meta/TikTok is typically associated with upper-funnel activities. In contrast, Roivenue proves valuable in capturing the post-impression effect of campaigns run within walled gardens, offering a more accurate representation of their impact.
  • Affiliate – GA4 tends to credit affiliates more due to their presence in high-conversion customer journeys. In contrast, Roivenue decreases their value, recognizing that they often appear towards the end of the customer journey, where the visit adds little to the decision-making process. GA’s model lacks data about visit quality, leading to differences in credit assignment. This is particularly noticeable for certain affiliates offering cashbacks or coupons, which typically receive last-click credit but target users who have already made their purchase decision.
  • Referral – An intriguing behavior emerges in GA4 concerning this channel, as a substantial increase in credit allocation is observed. It appears that the attribution modeling GA provides redistributes credit predominantly from “Direct” to “Organic” towards “Referral.” Unfortunately, the lack of detailed documentation for GA’s model leaves us speculating about the underlying reasons for this phenomenon, warranting further investigation and clarity.

Granular Data for Each Client

Client 1
Client 1 attribution

Key Findings

  • Walled Gardens – The GA4 data-driven model shows a mere 10% increase in performance compared with the last-click approach. In contrast, the Roivenue AI model measures 2.5 times more conversions from these Walled Gardens thanks to synthetic touchpoints.
  • Google Organic – GA4 tends to overvalue channels like Google Organic, while Roivenue decreases their value because its synthetic touchpoints reveal walled-garden exposure that GA cannot see. In GA, such journeys appear as one-touchpoint journeys from Organic or Direct, even when customers interacted with advertising on platforms like Instagram.
  • Affiliate – GA4 tends to credit affiliate channels more due to their presence in high-conversion customer journeys. Roivenue decreases their value because they often appear towards the end of the customer journey and the visits are of lower quality. The GA model cannot capture this, as it lacks data about the quality of the visit.

Client 2
Client 2 attribution

Key Findings

  • Walled Gardens – An interesting revelation emerges when examining walled gardens in GA4, as the model substantially diminishes their value despite their usual role in upper-funnel activities. In sharp contrast, Roivenue recognizes the true worth of walled gardens, assigning them significantly higher credit.
  • Google Organic – Across all clients, a similar pattern emerges for Google Organic, where overvaluation becomes a recurring theme. This phenomenon is a widespread occurrence among various businesses.
  • Referral – Here we can observe one of the biggest discrepancies between the raw measured data and the post-processing in GA4. We would need more details from Google to be able to interpret the model’s behaviour for this channel.

Client 3

We can see a combination of previously described patterns, specifically:

  • Organic – Clearly, there is a significant overvaluation in GA for this channel.
  • Walled Gardens – Roivenue effectively captures the true contribution of walled gardens, while GA’s data-driven model fails to attribute more credit to it compared to last-click attribution.
  • Affiliate – GA tends to seriously overvalue the affiliate channel, primarily due to inherent limitations in its attribution model.
  • Referral – Across all models, referral consistently receives more credit compared to last-click attribution. However, GA noticeably places a stronger emphasis on this channel.

Furthermore, an observation can be made regarding the “Direct” channel. Google acknowledges that this channel lacks substantial insights for marketers, so the model minimises its value by design, which is a useful feature. Roivenue allows marketers to configure the approach to this channel, granting greater flexibility in the analysis.

Client 4

This client confirms some of the previous observations but also shows some less frequent results:

  • Organic – As expected, GA significantly overvalued the Organic channel, aligning with our previous findings.
  • Walled Gardens – The observed decrease in value compared to last-click attribution in GA raises suspicion. However, even with Roivenue’s analysis, this channel shows minimal added value. This could indicate an underlying measurement issue or, in this specific case, suggest that the channel may not hold significant additional value.
  • Referral – Similar to other clients, GA tends to overvalue the Referral channel in this dataset.
  • Direct – We encounter results similar to those observed in Client 1, where the minimization of Direct’s value does not function optimally. GA assigns more credit to this channel during post-processing than what is initially measured in the raw data.

Client 5

We can see a combination of previously described patterns, specifically:

  • Organic – Once again, we find that GA significantly overvalued the Organic channel.
  • Walled Gardens – Roivenue proves effective in capturing the true contribution of walled gardens, while GA’s data-driven model further diminishes their perceived impact.
  • Direct – Both GA and Roivenue demonstrate the minimization effect in this case, downplaying the value of the Direct channel.

These observations reinforce the consistent patterns observed across different channels in GA and Roivenue. The significant overvaluation of Organic in GA4 and the minimizing effect on Direct in both platforms highlight the importance of comprehensive attribution models that can accurately represent the true impact of each channel. Additionally, the contrasting treatment of Walled Gardens between the two platforms may indicate inherent differences in their attribution methodologies and underscores the significance of choosing the most suitable tool for accurate marketing analysis.

Client 6

The description for Client 6 would be the same as for Client 5; all the patterns are very similar.

Conclusion

In this article, we explored the world of data-driven attribution in Google Analytics 4 (GA4) and compared it with the independent data-driven attribution model offered by Roivenue AI. The attribution problem, which marketers face, involves allocating credit to various marketing activities in a customer’s journey. Traditional models, such as last-click attribution, tend to undervalue touchpoints leading up to the final conversion.

GA4’s data-driven attribution model, which has now become the default in the platform, attempts to estimate the impact of each touchpoint by analyzing factors like time from conversion, device type, ad interactions, and creative assets. However, it has limitations in accounting for post-view impact and cross-device conversions from walled gardens.

Roivenue AI’s attribution model stands out by integrating visit-level data from Google Analytics with impression-level data from Demand-side Platforms (DSPs) and walled gardens. This allows Roivenue to accurately measure post-view impacts and cross-device conversions, providing a more comprehensive view of customer journeys. The AI-powered recurrent neural network model in Roivenue predicts conversion probabilities and assigns credit to each touchpoint based on its contribution to increasing the likelihood of conversion.

In comparing the results of both models across various business contexts and industries, we discovered some interesting patterns. Google’s data-driven model did not significantly inflate the contribution of Google Ads campaigns, but it showed challenges in accurately capturing the impact of other channels, such as organic search and walled gardens. Roivenue, on the other hand, excelled in recognizing the post-impression effect of walled garden campaigns, giving a more accurate representation of their impact.

Overall, Roivenue AI’s attribution model demonstrated greater transparency, customization options, and a more comprehensive approach to attribution, making it a valuable tool for marketers seeking to enhance their understanding of customer journeys and marketing ROI. However, both models have their strengths and weaknesses, and the choice between them depends on the specific needs and goals of each marketer.



