How to Better Handle Collateral Effects of Experimentation: Dynamic Allocation vs Sequential Testing

When talking about web experimentation, the topics that often come up are learning and earning. However, it’s important to remember that a big part of experimentation is dealing with risks and losses. Although losses can be a touchy topic, talking about failed tests openly and destigmatizing them encourages problem-solving, thinking outside your comfort zone, and finding ways to mitigate risk.

Therefore, we will take a look at the shortcomings of classic hypothesis testing and explore other options. Basic hypothesis testing follows a rigid protocol:

  • Creating the variation according to the hypothesis
  • Waiting a given amount of time 
  • Analyzing the result
  • Decision-making (implementing the variant, keeping the original, or proposing a new variant)

This rigid protocol and simple approach to testing doesn’t say anything about how to handle losses, which raises the question: what happens if something goes wrong? Additionally, the classic statistical tools used for analysis are not meant to be used before the end of the experiment.

If we consider a very general rule of thumb, let’s say that out of every 10 experiments, 8 will be neutral (show no real difference), one will be positive, and one will be negative. Classic hypothesis testing suggests that you simply accept this as a collateral effect of the optimization process, hoping it evens out in the long term. It may feel like crossing a street blindfolded.

For many, that may not cut it. Let’s take a look at two approaches that try to better handle this problem: 

  • Dynamic allocation – also known as “Multi-Armed Bandit” (MAB). Traffic allocation changes for each variation according to its performance, implicitly limiting losses.
  • Sequential testing – a method that allows you to stop a test as soon as possible, given a risk aversion threshold.

These approaches are statistically sound, but they come with their own assumptions. We will go through their pros and cons within the context of web optimization.

First, we’ll look into the classic version of these two techniques and their properties and give tips on how to mitigate some of their problems and risks. Then, we’ll finish this article with some general advice on which techniques to use depending on the context of the experiment.

Dynamic allocation (DA)

Dynamic allocation’s main idea is to use statistical formulas that adjust the number of visitors exposed to each variation depending on that variation’s performance.

This means a poor-performing variation will end up receiving little traffic, which can be seen as a way to save conversions while still searching for the best-performing variation. The formulas ensure the best compromise between avoiding losses and finding the real best-performing variation. However, this relies on a lot of assumptions that are not always met, which makes DA a risky option.
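To make the idea concrete, here is a minimal sketch of one well-known dynamic allocation scheme, Thompson sampling on a Beta-Bernoulli model. The variation names, the uniform prior, and the overall structure are illustrative assumptions rather than the exact formula used by AB Tasty or any other tool:

```python
# A minimal sketch of dynamic allocation via Thompson sampling on a
# Beta-Bernoulli model. Variation names and the uniform prior are
# illustrative assumptions, not the formula of any specific tool.
import random

variations = {
    "original":  {"conversions": 0, "visitors": 0},
    "variant_b": {"conversions": 0, "visitors": 0},
}

def pick_variation():
    """Sample a plausible conversion rate for each variation from its
    posterior and send the next visitor to the highest sample."""
    samples = {}
    for name, stats in variations.items():
        alpha = 1 + stats["conversions"]                      # successes + prior
        beta = 1 + stats["visitors"] - stats["conversions"]   # failures + prior
        samples[name] = random.betavariate(alpha, beta)
    return max(samples, key=samples.get)

def record_visit(name, converted):
    """Update the counts once the visitor's outcome is known."""
    variations[name]["visitors"] += 1
    if converted:
        variations[name]["conversions"] += 1
```

Poor performers are sampled less and less often because their posterior rarely produces the highest draw, which is exactly how the allocation "saves" conversions.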

There are two main concerns, both of which are linked to the time aspect of the experimentation process: 

  • The DA formula does not take time into account 

If there is a noticeable delay between exposure to the variation and the conversion, the algorithm can go wrong: a visitor is considered a ‘failure’ until they convert, so the time between the visit and the conversion is falsely counted as a failure.

As a result, the DA will feed the wrong conversion information into its formula: any variation gaining traffic will automatically show a (false) performance drop, because the algorithm sees a growing number of non-converting visitors, and traffic to that variation will then be reduced.

The reverse may also be true: a variation with decreasing traffic will receive hardly any new visitors, while its existing visitors may still eventually convert. Results would then indicate a (false) rise in conversions even though there are no new visitors, which is highly misleading.
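A toy calculation (with assumed numbers, purely for illustration) shows the mechanism: if conversions only land two “days” after the visit, a variation that just gained traffic looks worse than an identical one with flat traffic, even though both convert at the same true rate.

```python
# Toy illustration of how a conversion delay distorts what the
# allocation formula sees. Both variations truly convert at 10%,
# but conversions only land DELAY days after the visit, so the most
# recent visitors are still counted as failures.
TRUE_RATE = 0.10
DELAY = 2  # days between a visit and its conversion

def observed_rate(daily_visitors, today):
    """Conversion rate seen on `today`, counting every visitor from the
    last DELAY days as a non-converter because they haven't converted yet."""
    visitors = sum(daily_visitors[:today + 1])
    converted = TRUE_RATE * sum(daily_visitors[:max(0, today + 1 - DELAY)])
    return converted / visitors if visitors else 0.0

flat   = [100, 100, 100, 100, 100, 100]   # stable traffic
ramped = [100, 100, 100, 400, 400, 400]   # traffic just increased by the algorithm

for day in range(6):
    print(day, round(observed_rate(flat, day), 3), round(observed_rate(ramped, day), 3))
# The ramped variation's observed rate dips much further below 10% right
# after it gains traffic, so the algorithm would wrongly cut its traffic again.
```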

DA gained popularity in the advertising industry, where the delay between an ad exposure and its potential conversion (a click) is short, which is why it works so well in that context. In CRO, Dynamic Allocation must only be used when the conversion delay is short.

In other words, DA should only be used in scenarios where visitors convert quickly. It’s not recommended for e-commerce, except for short-term campaigns such as flash sales or when there’s not enough traffic for a classic A/B test. It can also be used if the conversion goal is clicking on an ad on a media website.

  • DA and the different days of the week 

It’s very common to see different visitor behavior depending on the day of the week. Typically, customers may behave differently on weekends than during weekdays.  

With DA, you may be sampling days unevenly, implicitly giving more weight to some days for some variations. However, each day should carry the same weight because, in reality, every weekday occurs equally often. You should only use Dynamic Allocation if you know that the optimized KPI is not sensitive to fluctuations across the week.

The conclusion is that DA should be considered only when you expect too few total visitors for classic A/B testing. Another requirement is that the KPI under experimentation has a very short conversion time and no dependence on the day of the week. Taking all this into account, Dynamic Allocation should not be used as a way to secure conversions.

Sequential Testing (ST)

Sequential Testing uses a specific statistical formula that enables you to stop an experiment early, depending on the performance of the variations, with given guarantees on the risk of false positives.

The Sequential Testing approach is designed to secure conversions by stopping a variation as soon as its underperformance is statistically proven. 
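As an illustration of the kind of stopping rule involved, here is a minimal sketch based on Wald’s sequential probability ratio test (SPRT) applied to a single conversion rate. The rates, risk levels, and structure are textbook assumptions, not the exact procedure of any particular CRO platform:

```python
# Minimal sketch of a sequential stopping rule: Wald's SPRT applied to a
# variation's conversion rate. The rates and risk levels below are
# illustrative assumptions, not any vendor's actual implementation.
import math

ALPHA, BETA = 0.05, 0.20     # tolerated false-positive / false-negative risk
P0, P1 = 0.10, 0.07          # reference rate vs. the underperformance we want to detect

UPPER = math.log((1 - BETA) / ALPHA)   # crossing this proves underperformance (accept H1)
LOWER = math.log(BETA / (1 - ALPHA))   # crossing this means no proven underperformance (accept H0)

def sequential_check(outcomes):
    """outcomes: iterable of 0/1 conversions for the variation, in order.
    Returns a decision as soon as a boundary is crossed."""
    llr = 0.0
    for converted in outcomes:
        llr += math.log(P1 / P0) if converted else math.log((1 - P1) / (1 - P0))
        if llr >= UPPER:
            return "stop: the variation is underperforming"
        if llr <= LOWER:
            return "stop: no proven underperformance"
    return "continue collecting data"
```

The key property is that the decision boundaries are chosen so the false positive risk stays controlled no matter how often you peek at the data.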

However, it still has some limitations when it comes to effect size estimation, which may be biased in two ways:

  • Bad variations will be seen as worse than they really are. This is not a problem in CRO because the false positive risk is still guaranteed: in the worst-case scenario, you discard not a strictly losing variation but perhaps an even one, which still makes sense in CRO.
  • Good variations will be seen as better than they really are. This may be a problem in CRO since not all winning variations are useful for business, and the effect size estimation is key to business decision-making. It can easily be mitigated by using sequential testing to stop losing variations only. Winning variations, for their part, should be continued until the planned end of the experiment, ensuring both a correct effect size estimation and an even sampling of each day of the week.
    It’s important to note that not all CRO software uses this hybrid approach. Most tools use ST to stop both winning and losing variations, which, as we’ve just seen, is wrong.

As we’ve seen, stopping a losing variation in the middle of the week carries the risk of discarding a variation that could still have won.

However, for a variation to actually end up winning after ST has shown it to be underperforming, it would first have to perform well enough to catch up with the reference, and then well enough to outperform it, all within a few days. This scenario is highly unlikely.

Therefore, it’s safe to stop a losing variation with Sequential Testing, even if all weekdays haven’t been evenly sampled.

The best of both worlds in CRO 

Dynamic Allocation is a better approach than static allocation when you expect a small volume of traffic. It should only be used in the context of a ‘short delay’ KPI and with no known weekday effect (for example, flash sales). However, it’s not a way to mitigate risk in a CRO strategy.

To run experiments with all the needed guarantees, you need a hybrid system: Sequential Testing to stop losing variations, and a classic fixed-horizon analysis to validate a winning variation at the planned end of the test. This gives you the best of both worlds.
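A compact sketch of this hybrid rule is shown below. It reuses the verdict string from the sequential check sketched earlier and a classic one-sided two-proportion z-test at the planned horizon; the function names and the 5% threshold are illustrative assumptions:

```python
# Sketch of the hybrid rule described above: the sequential test may only
# stop a losing variation early; a winner is only declared with a classic
# fixed-horizon test once the planned sample size is reached.
import math

def one_sided_z_test(conv_ref, n_ref, conv_var, n_var):
    """p-value of a one-sided two-proportion z-test for 'variation beats reference'."""
    p_pool = (conv_ref + conv_var) / (n_ref + n_var)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_ref + 1 / n_var))
    z = (conv_var / n_var - conv_ref / n_ref) / se
    return 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))  # P(Z > z)

def hybrid_decision(sequential_verdict, planned_end_reached,
                    conv_ref, n_ref, conv_var, n_var, alpha=0.05):
    if sequential_verdict == "stop: the variation is underperforming":
        return "stop early and keep the original"
    if not planned_end_reached:
        return "keep collecting data"
    # Classic analysis at the planned horizon, with full weeks sampled
    p_value = one_sided_z_test(conv_ref, n_ref, conv_var, n_var)
    return "implement the variation" if p_value < alpha else "keep the original"
```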

 


Harmony or Dissonance: Decoding Data Divergence Between AB Tasty and Google Analytics

The world of data collection has grown exponentially over the years, providing companies with crucial information to make informed decisions. However, within this complex ecosystem, a major challenge arises: data divergence. 

Two analytics tools, even if they seem to be following the same guidelines, can at times produce different results. Why do they differ? How do you leverage both sets of data for your digital strategy?

In this article, we’ll use a concrete example of a user journey to illustrate differences in attribution between AB Tasty and Google Analytics. GA is a powerful tool for gathering and measuring data across the entire user journey. AB Tasty lets you easily make changes to your site and measure the impact on specific goals. 

Understanding these differences in attribution strategies will explain why the figures can differ across report types. Both are worth looking at, and which one you focus on depends on your objectives:

  • Specific improvements in cross-session user experiences 
  • Holistic analysis of user behavior

Let’s dive in! 

Breaking it down with a simple use case

We’re going to build our analysis on a deliberately simple use case: the journey of a single visitor.

Campaign A is launched before the visitor’s first session and remains live until after their third session.

Here’s an example of the user journey we’ll be looking at in the rest of this article: 

  • Session 1:  first visit, Campaign A is not triggered (the visitor didn’t match all of the targeting conditions)
  • Session 2:  second visit, Campaign A is triggered (the visitor matched all of the targeting conditions)
  • Session 3:  third visit, no re-triggering of Campaign A which is still live, and the user carries out a transaction.

NB: A visitor triggers a campaign as soon as they meet all the targeting conditions:

  • They meet the segmentation conditions
  • During their session, they visit at least one of the targeted pages 
  • They meet the session trigger condition.

In A/B testing, a visitor exposed to a variation of a specific test will continue to see the same variation in future sessions, as long as the test campaign is live. This guarantees reliable measurement of potential changes in behavior across all sessions.
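One common way to implement this persistence (assumed here for illustration, not necessarily AB Tasty’s documented internals) is deterministic bucketing: hashing a stable user ID from the cookie together with the campaign ID so the same visitor always lands on the same variation.

```python
# Illustrative sketch of sticky variation assignment across sessions via
# deterministic hashing of a cookie-based user ID. This is a common
# pattern, assumed here for clarity.
import hashlib

def assign_variation(user_id, campaign_id, variations):
    """Hash the (campaign, user) pair so the same visitor always gets
    the same variation for as long as the campaign is live."""
    digest = hashlib.sha256(f"{campaign_id}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variations)
    return variations[bucket]

# The same user ID always maps to the same variation.
print(assign_variation("visitor-123", "campaign-A", ["original", "variation-1"]))
print(assign_variation("visitor-123", "campaign-A", ["original", "variation-1"]))  # identical result
```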

We will now describe how this user journey will be taken into account in the various AB Tasty and GA reports. 

Analysis in AB Tasty

In AB Tasty, there is only one report and therefore only one attribution per campaign.

The user journey above will be reported as follows for Campaign A:

  • Total Users (Unique visitors) = 1, based on a unique user ID contained in a cookie; here there is only one user in our example.
  • Total Session = 2: s2 and s3, the sessions that took place during and after the display of Campaign A, are counted even though s3 didn’t re-trigger Campaign A
  • Total Transaction = 1: the s3 transaction is counted even though s3 did not re-trigger Campaign A.

In short, AB Tasty’s Campaign A report will collect and display all of the visitor’s sessions and events from the moment the visitor first triggered the campaign.

Analysis in Google Analytics

The classic way to analyze A/B test results in GA is to create an analysis segment and apply it to your reports. 

However, this segment can be designed using two different methods, corresponding to two different scopes, and depending on the scope chosen, the reports will not present the same data.

Method 1: On a user segment/user scope

Here we detail the user scope, which will include all user data corresponding to the segment settings. 

In our case, the segment setup might look something like this: 

This segment will therefore include all data from all sessions of all users who, at some point during the analysis date range, have received an event with the parameter event action = Campaign A.

We can then see in the GA report for our user journey example: 

  • Total User = 1, based on a user ID contained in a cookie (like AB Tasty); here there is only one user in our example
  • Total Session = 3: s1, s2 and s3, because the user enters the segment and all of their sessions are therefore included
  • Total Transaction = 1: the s3 transaction will be counted as it took place in session s3, after the campaign was triggered.

In short, in this scenario, Google Analytics will count and display all the sessions and events linked to this single visitor (over the selected date range), even those prior to the launch of Campaign A.

Method 2: On a session segment/session scope 

The second segment scope detailed below is the session scope. This includes only the sessions that correspond to the settings.

In this second case, the segment setup could look like this: 

This segment will include all data from sessions that have, at some point during the analysis date range, received an event with the parameter event action = Campaign A.

As you can see, this setting will include fewer sessions than the previous one. 

In the context of our example:

  • Total User = 1, based on a user ID contained in a cookie (like AB Tasty), here there’s only one user in our example
  • Total Session = 1, only s2 triggers campaign A and therefore sends the campaign event 
  • Total Transaction = 0, the s3 transaction took place in the s3 session, which does not trigger campaign A and therefore does not send an event, so it is not taken into account. 

In short, in this case, Google Analytics will count and display all the sessions – and the events linked to these sessions – that triggered campaign A, and only these.

Attribution model

Here is what each tool and scope counts in the selected timeframe:

  • AB Tasty – all sessions and events that took place after the visitor first triggered Campaign A
  • Google Analytics, user scope – all sessions and events of a user who triggered Campaign A at least once during one of their sessions
  • Google Analytics, session scope – only the sessions that triggered Campaign A
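To make the three attribution models tangible, here is a small sketch that recomputes the figures for the example journey (s1, s2, s3). The data structure and field names are assumptions for illustration only:

```python
# Sketch recomputing the example journey under the three attribution
# models above. Session and field names are illustrative assumptions.
sessions = [
    {"id": "s1", "triggered_campaign_A": False, "transactions": 0},
    {"id": "s2", "triggered_campaign_A": True,  "transactions": 0},
    {"id": "s3", "triggered_campaign_A": False, "transactions": 1},
]

def ab_tasty(sessions):
    """Everything from the first session that triggered Campaign A onwards."""
    first = next(i for i, s in enumerate(sessions) if s["triggered_campaign_A"])
    kept = sessions[first:]
    return len(kept), sum(s["transactions"] for s in kept)

def ga_user_scope(sessions):
    """All sessions of a user who triggered Campaign A at least once."""
    if any(s["triggered_campaign_A"] for s in sessions):
        return len(sessions), sum(s["transactions"] for s in sessions)
    return 0, 0

def ga_session_scope(sessions):
    """Only the sessions that themselves triggered Campaign A."""
    kept = [s for s in sessions if s["triggered_campaign_A"]]
    return len(kept), sum(s["transactions"] for s in kept)

print(ab_tasty(sessions))          # (2, 1) -> 2 sessions, 1 transaction
print(ga_user_scope(sessions))     # (3, 1)
print(ga_session_scope(sessions))  # (1, 0)
```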

 

Different attribution for different objectives

Because each report uses a different attribution, we can observe different figures even though the underlying tracking is the same.

The only metric that always remains constant is the number of Users (Unique visitors in AB Tasty). It is calculated in a similar (but not identical) way by the two tools. It’s therefore the benchmark metric, and the most reliable one for detecting genuine malfunctions between A/B testing tools and analytics tools that use different calculations.

On the other hand, the attribution of sessions or events (e.g. a transaction) can be very different from one report to another. All the more so as it’s not possible in GA to recreate a report with an attribution model similar to that of AB Tasty. 

Ultimately, A/B test performance analysis relies heavily on data attribution, and our exploration of the differences between AB Tasty and Google Analytics highlighted significant distinctions in the way these tools attribute user interactions. These divergences are the result of different designs and distinct objectives.

From campaign performance to holistic analysis: Which is the right solution for you?

AB Tasty, as a solution dedicated to the experimentation and optimization of user experiences, stands out for its more specialized approach to attribution. It offers a clear and specific view of A/B test performance, by grouping attribution data according to campaign objectives. 

Making a modification on a platform and testing it aims to measure the impact of that modification on the platform’s performance and metrics, both during the current session and during future sessions of the same user.

On the other hand, Google Analytics focuses on the overall analysis of site activity. It’s a powerful tool for gathering data on the entire user journey, from traffic sources to conversions. However, its approach to attribution is broader, encompassing all session data, which can lead to different data cross-referencing and analysis than AB Tasty, as we have seen in our example.

It’s essential to note that one is not necessarily better than the other, but rather adapted to different needs. 

  • Teams focusing on the targeted improvement of cross-session user experiences will find significant value in the attribution offered by AB Tasty. 
  • On the other hand, Google Analytics remains indispensable for the holistic analysis of user behavior on a site.

The key to effective use of these solutions lies in understanding their differences in attribution, and the ability to exploit them in complementary ways. Ultimately, the choice will depend on the specific objectives of your analysis, and the alignment of these tools with your needs will determine the quality of your insights.