Minimal Detectable Effect: The Essential Ally for Your A/B Tests

In CRO (Conversion Rate Optimization), a common dilemma is not knowing what to do with a test that shows a small and non-significant gain. 

Should we declare it a “loser” and move on? Or should we collect more data in the hope that it will reach the set significance threshold? 

Unfortunately, we often make the wrong choice, influenced by what is called the “sunk cost fallacy.” We have already put so much energy into creating this test and waited so long for the results that we don’t want to stop without getting something out of this work. 

However, CRO’s very essence is experimentation, which means accepting that some experiments will yield nothing. Yet, some of these failures could be avoided before even starting, thanks to a statistical concept: the MDE (Minimal Detectable Effect), which we will explore together.

MDE: The Minimal Detectable Threshold

In statistical testing, samples have always been valuable, perhaps even more so in surveys than in CRO. Indeed, conducting interviews to survey people is much more complex and costly than setting up an A/B test on a website. Statisticians have therefore created formulas that link the main parameters of an experiment for planning purposes:

  • The number of samples (or visitors) per variation
  • The baseline conversion rate
  • The magnitude of the effect we hope to observe

This allows us to estimate the cost of collecting samples. The problem is that, among these three parameters, only one is known: the baseline conversion rate.

We don’t really know the number of visitors we’ll send per variation. It depends on how much time we allocate to data collection for this test, and ideally, we want it to be as short as possible. 

Finally, the conversion gain we will observe at the end of the experiment is certainly the biggest unknown, since that’s precisely what we’re trying to determine.
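
To make the relationship concrete, here is a minimal sketch of the textbook sample-size formula for comparing two conversion rates. It assumes a 5% two-sided significance level and 80% power, which are common planning defaults rather than the settings of any particular testing platform:

```python
from scipy.stats import norm

def visitors_per_variation(baseline_cr, relative_uplift, alpha=0.05, power=0.80):
    """Visitors needed per variation to detect a relative uplift on a
    conversion rate (textbook two-proportion z-test approximation)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)   # significance + power quantiles
    absolute_effect = baseline_cr * relative_uplift
    variance = 2 * baseline_cr * (1 - baseline_cr)  # both arms approximated at the baseline rate
    return z ** 2 * variance / absolute_effect ** 2

# A 3% baseline and a hoped-for 10% relative uplift:
print(round(visitors_per_variation(0.03, 0.10)))    # ~51,000 visitors per variation
```

Plug in the uplift you hope for and you get the number of visitors to budget per variation; the catch is that this uplift is precisely the unknown we just mentioned.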

So, how do we proceed with so many unknowns? The solution is to estimate what we can using historical data. For the others, we create several possible scenarios:

  • The number of visitors can be estimated from past traffic, and we can make projections in weekly blocks.
  • The conversion rate can also be estimated from past data.
  • For each scenario built from the previous parameters, we can calculate the minimum conversion gain (the MDE) needed to reach the significance threshold.

For example, with traffic of 50,000 visitors and a conversion rate of 3% (measured over 14 days), here’s what we get:

MDE uplift curve: the horizontal axis shows the number of days and the vertical axis shows the MDE corresponding to that duration.

The leftmost point of the curve tells us that if we achieve a 10% conversion gain after 14 days, then this test will be a winner, as this gain can be considered significant. Typically, it will have a 95% chance of being better than the original. If we think the change we made in the variation has a chance of improving conversion by ~10% (or more), then this test is worth running, and we can hope for a significant result in 14 days.

On the other hand, if the change is minor and the expected gain is less than 10%, then 14 days will not be enough. To find out more, we move the curve’s slider to the right. This corresponds to adding days to the experiment’s duration, and we then see how the MDE evolves. Naturally, the MDE curve decreases: the more data we collect, the more sensitive the test becomes to smaller effects.

For example, by adding another week, making it a 21-day experiment, we see that the MDE drops to 8.31%. Is that sufficient? If so, we can validate the decision to create this experiment.

MDE Graph

If not, we continue to explore the curve until we find a value that matches our objective. Continuing along the curve, we see that a gain of about 5.44% would require waiting 49 days.

Minimum Detectable Uplift Graph

That’s the time needed to collect enough data to declare this gain significant. If that’s too long for your planning, you’ll probably decide to run a more ambitious test to hope for a bigger gain, or simply not do this test and use the traffic for another experiment. This will prevent you from ending up in the situation described at the beginning of this article, where you waste time and energy on an experiment doomed to fail.
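
The curve itself can be sketched by inverting the same textbook formula: fix the daily traffic and the baseline conversion rate, then compute the smallest detectable relative uplift for each duration. The snippet below assumes 50,000 visitors per variation every 14 days and the same default significance level and power as above; it lands close to, though not exactly on, the values discussed here, since AB Tasty's calculator may use slightly different conventions.

```python
from math import sqrt
from scipy.stats import norm

def relative_mde(visitors_per_variation, baseline_cr, alpha=0.05, power=0.80):
    """Smallest relative uplift detectable for a given sample size
    (the same textbook approximation, inverted)."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    absolute_effect = z * sqrt(2 * baseline_cr * (1 - baseline_cr) / visitors_per_variation)
    return absolute_effect / baseline_cr

visitors_per_day = 50_000 / 14                      # assumed traffic per variation per day
for days in (14, 21, 49):
    print(f"{days} days -> MDE ~ {relative_mde(visitors_per_day * days, 0.03):.1%}")
# 14 days -> MDE ~ 10.1%
# 21 days -> MDE ~ 8.2%
# 49 days -> MDE ~ 5.4%
```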

From MDE to MCE

Another approach to MDE is to see it as MCE: Minimum Caring Effect. 

This doesn’t change the methodology, only the meaning you give to your test’s minimal sensitivity threshold. So far, we’ve considered it as an estimate of the effect the variation could produce. But it can also be interesting to set the minimal sensitivity based on its operational relevance: the MCE.

For example, imagine you can quantify the development and deployment costs of the variation and compare them to the conversion gain over a year. You could then say that an increase in the conversion rate of less than 6% would take more than a year to cover the implementation costs. So, even if you have enough traffic for a 6% gain to be significant, it may not have operational value, in which case it’s pointless to run the experiment beyond the duration corresponding to that 6%.
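
As a rough illustration of that reasoning, here is a sketch with entirely hypothetical figures for annual traffic, baseline conversion rate, average order value, and implementation cost, chosen so that the break-even uplift lands near the 6% of this example:

```python
def break_even_relative_uplift(annual_visitors, baseline_cr, avg_order_value,
                               implementation_cost):
    """Relative conversion-rate uplift needed for one year of extra revenue
    to cover the implementation cost (all inputs are illustrative)."""
    baseline_annual_revenue = annual_visitors * baseline_cr * avg_order_value
    return implementation_cost / baseline_annual_revenue

# Hypothetical inputs: 1.3M visitors/year, 3% baseline CR, an average order
# value of 80, and a total implementation cost of 190,000.
mce = break_even_relative_uplift(1_300_000, 0.03, 80, 190_000)
print(f"Minimum Caring Effect ~ {mce:.1%}")         # ~ 6.1%
```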

MDE graph

In our case, we can therefore conclude that it’s pointless to go beyond 42 days of experimentation because beyond that duration, if the measured gain isn’t significant, it means the real gain is necessarily less than 6% and thus has no operational value for you.

Conclusion

AB Tasty’s MDE calculator feature will allow you to know the sensitivity of your experimental protocol based on its duration. It’s a valuable aid when planning your test roadmap. This will allow you to make the best use of your traffic and resources.

Looking for a minimalistic MDE calculator to try? Check out our free Minimal Detectable Effect calculator here.


Transaction Testing With AB Tasty’s Report Copilot

Transaction testing, which focuses on increasing the rate of purchases, is a crucial strategy for boosting your website’s revenue. 

To begin, it’s essential to differentiate between conversion rate (CR) and average order value (AOV), as they provide distinct insights into customer behavior. Understanding these metrics helps you implement meaningful changes to improve transactions.

In this article, we’ll delve into the complexities of transaction metrics analysis and introduce our new tool, the “Report Copilot,” designed to simplify report analysis. Read on to learn more.

Transaction Testing

To understand how test variations impact total revenue, focus on two key metrics:

  • Conversion Rate (CR): This metric indicates whether sales are increasing or decreasing. Tactics to improve CR include simplifying the buying process, adding a “one-click checkout” feature, using social proof, or creating urgency through limited inventory.
  • Average Order Value (AOV): This measures how much each customer is buying. Strategies to enhance AOV include cross-selling or promoting higher-priced products.

By analyzing CR and AOV separately, you can pinpoint which metrics your variations impact and make informed decisions before implementation. For example, creating urgency through low inventory may boost CR but could reduce AOV by limiting the time users spend browsing additional products. After analyzing these metrics individually, evaluate their combined effect on your overall revenue.

Revenue Calculation

The following formula illustrates how CR and AOV influence revenue:

Revenue = Number of Visitors × Conversion Rate × AOV

In the first part of the equation (Number of Visitors × Conversion Rate), you determine how many visitors become customers. The second part (× AOV) calculates the total revenue from these customers.

Consider these scenarios:

  • If both CR and AOV increase, revenue will rise.
  • If both CR and AOV decrease, revenue will fall.
  • If either CR or AOV increases while the other remains stable, revenue will increase.
  • If either CR or AOV decreases while the other remains stable, revenue will decrease.
  • If CR and AOV move in opposite directions, the effect on revenue depends on the relative size of each change and is hard to predict in advance.

The last scenario, where CR and AOV move in opposite directions, is particularly complex due to the variability of AOV. Current statistical tools struggle to provide precise insights on AOV’s overall impact, as it can experience significant random fluctuations. For more on this, read our article “Beyond Conversion Rate.”
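
To make that last case concrete, here is a toy calculation with made-up figures: two trade-offs of similar magnitude between CR and AOV push revenue in opposite directions.

```python
def revenue(visitors, conversion_rate, aov):
    """Revenue = Number of Visitors × Conversion Rate × AOV."""
    return visitors * conversion_rate * aov

baseline = revenue(100_000, 0.03, 80)                    # 240,000

# CR up 5%, AOV down 4%: revenue still rises (+0.8%)
mixed_up = revenue(100_000, 0.03 * 1.05, 80 * 0.96)      # 241,920

# CR up 3%, AOV down 5%: revenue falls (-2.2%)
mixed_down = revenue(100_000, 0.03 * 1.03, 80 * 0.95)    # 234,840

for label, value in [("baseline", baseline),
                     ("CR +5% / AOV -4%", mixed_up),
                     ("CR +3% / AOV -5%", mixed_down)]:
    print(f"{label}: {value:,.0f}")
```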

While these concepts may seem intricate, our goal is to simplify them for you. Recognizing that this analysis can be challenging, we’ve created the “Report Copilot” to automatically gather and interpret data from variations, offering valuable insights.

Report Copilot

The “Report Copilot” from AB Tasty automates data processing, eliminating the need for manual calculations. This tool empowers you to decide which tests are most beneficial for increasing revenue.

Here are a few examples from real use cases.

Winning Variation:

The left screenshot provides a detailed analysis, helping users draw conclusions about their experiment results. Experienced users may prefer the summarized view on the right, also available through the Report Copilot.

Complex Use Case:


The screenshot above demonstrates a case where CR and AOV have opposite trends, which requires a deeper understanding of the context.

It’s important to note that the Report Copilot doesn’t make decisions for you; it highlights the most critical parts of your analysis, allowing you to make informed choices.

Conclusion

Transaction analysis is complex, requiring a breakdown of components like conversion rate and average order value to better understand their overall effect on revenue. 

We’ve developed the Report Copilot to assist AB Tasty users in this process. This feature leverages AB Tasty’s extensive experimentation dashboard to provide comprehensive, summarized analyses, simplifying decision-making and enhancing revenue strategies.