Mastering Revenue Metrics: Understand the Power and Practical Use of RevenueIQ

Revenue is the cornerstone of any e-commerce business, yet most optimization efforts focus only on improving conversion rates.

Average Order Value (AOV), an equally important driver of revenue, is often overlooked because it’s difficult to measure accurately with standard statistical tools. This gap can lead to missed opportunities and slow decision-making.

RevenueIQ addresses this challenge by providing a robust, reliable way to measure and optimize revenue directly—combining conversion and AOV into a single, actionable metric.

Here’s how RevenueIQ changes the way you approach experimentation and business growth.

Discover how to accurately measure and optimize revenue in your experiments with our patented feature.

The most important KPI in e-commerce is revenue. In an optimization context, this means focusing on two key areas:

  • Conversion Rate (CR): Turning as many visitors as possible into customers.
  • Average Order Value (AOV): Generating as much value as possible per customer.

However, Conversion Rate Optimization (CRO) often remains focused on conversion, while AOV is frequently neglected due to its statistical complexity. Accurately estimating AOV with classic tests (such as the t-test or Mann-Whitney) is challenging because purchase distributions are highly skewed and have no upper bound.

RevenueIQ offers a robust test that directly estimates the distribution of the effect on revenue (through a refined estimation of AOV), providing both the probability of gain (“chance to win”) and consistent confidence intervals.

In benchmarks, RevenueIQ maintains a correct false positive rate, has power close to Mann-Whitney, and produces confidence intervals four times narrower than the t-test. By combining the effects of AOV and CR, it delivers an RPV (Revenue Per Visitor) impact and then an actionable revenue projection.

Curious to learn more details? Please read our RevenueIQ Whitepaper for a full scientific explanation written by our Data Scientist, Hubert Wassner.

Context & Problem

In CRO, we often optimize CR for lack of suitable tools for revenue. Yet Revenue = Visitors × CR × AOV; ignoring AOV gives a distorted picture.
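The imbalance is easy to see with a toy calculation (all numbers hypothetical): a variant can win on conversion rate yet lose on revenue once AOV is accounted for.

```python
# Hypothetical illustration: a higher conversion rate does not
# guarantee higher revenue once AOV moves the other way.
visitors = 100_000

cr_a, aov_a = 0.030, 80.0   # variant A: 3.0% CR, €80 AOV
cr_b, aov_b = 0.031, 74.0   # variant B: higher CR, lower AOV

revenue_a = visitors * cr_a * aov_a
revenue_b = visitors * cr_b * aov_b

# B "wins" on conversion rate alone, yet A earns more overall.
```

Judging B only on its conversion lift would deploy the lower-revenue variant.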

AOV is misleading because:

  • It is unbounded (someone can buy many items).
  • It is highly right-skewed (many small orders, a few very large ones).
  • A few “large and rare” values can dominate the average.
  • In random A/B splits, these large orders can be unevenly distributed, leading to huge variance in observed AOV.
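The last point can be checked with a quick simulation (illustrative only, using hypothetical log-normal orders): splitting the same orders at random, with zero true effect, still produces sizable spurious AOV differences.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical skewed order values: log-normal, heavy right tail.
orders = rng.lognormal(mean=3.5, sigma=1.0, size=2000)

# Repeatedly split the SAME orders at random into two halves.
# The true AOV difference is zero, yet the observed one fluctuates.
diffs = []
for _ in range(500):
    mask = rng.random(orders.size) < 0.5
    diffs.append(orders[mask].mean() - orders[~mask].mean())
diffs = np.array(diffs)

spread = diffs.std()     # typical size of a purely spurious AOV "effect"
mean_aov = orders.mean()
```

The spread of these spurious differences is a meaningful fraction of the AOV itself, which is exactly why raw AOV comparisons mislead.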

Limitations of Classic Tests

t-test

Assumes normality (or relies on the Central Limit Theorem for the mean). On highly skewed e-commerce data, the CLT variance formula is unreliable at realistic volumes. The result: very low power (detects ~15% of true winners in the benchmark) and very wide confidence intervals, leading to slow and imprecise decisions.

Mann-Whitney (MW)

Robust to non-normality (works on ranks), so much more powerful (~80% detection in the benchmark). But it only provides a p-value (thus only trend information), not an estimate of effect size (no confidence interval), making it impossible to quantify the business case.
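The asymmetry between the two tests is visible directly in SciPy (a sketch on simulated skewed data with hypothetical parameters): the t-test is built around a mean-difference estimate, while `mannwhitneyu` returns only a statistic and a p-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical skewed order values, with a real +€5 shift on B.
a = rng.lognormal(mean=3.5, sigma=1.2, size=3000)
b = rng.lognormal(mean=3.5, sigma=1.2, size=3000) + 5.0

# Welch t-test: estimates the mean difference, but on heavy-tailed
# data the estimate is noisy and its CI is very wide.
t_stat, t_pvalue = stats.ttest_ind(a, b, equal_var=False)

# Mann-Whitney: rank-based, robust to the skew -- but it returns
# only a p-value, with no effect size and no confidence interval.
mw_stat, mw_pvalue = stats.mannwhitneyu(a, b, alternative="two-sided")
```

Neither output alone supports a business decision: one is imprecise, the other unquantified.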

RevenueIQ: Principle

RevenueIQ combines two innovative approaches:

  1. Bootstrap Technique: Studies the variability of a measure with unknown statistical behavior.
  2. Basket Difference Measurement: Instead of measuring the difference in average baskets, it measures the average of basket differences. It compares sorted order differences between variants A and B, weighting pairs by density (approximately log-normal) to favor “comparable” pairs. This sidesteps the problem of very large observed value differences in such data.

RevenueIQ then provides:

  • The Chance to Win (probability that the effect is > 0), which is easy for decision-makers to interpret.
  • Narrow and reliable confidence intervals on the AOV effect as well as on revenue.
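RevenueIQ's exact estimator (the density-weighted sorted-difference construction) is detailed in the whitepaper. As a rough intuition for how a resampling distribution yields both outputs at once, here is a plain percentile-bootstrap sketch — not the patented method, with all parameters hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

def bootstrap_aov_diff(orders_a, orders_b, n_boot=2000):
    """Bootstrap distribution of AOV(B) - AOV(A).

    Plain percentile bootstrap, shown for intuition only --
    RevenueIQ's weighted sorted-difference estimator is more refined.
    """
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        ra = rng.choice(orders_a, size=orders_a.size, replace=True)
        rb = rng.choice(orders_b, size=orders_b.size, replace=True)
        diffs[i] = rb.mean() - ra.mean()
    return diffs

# Hypothetical skewed orders with a true +€5 effect on B.
a = rng.lognormal(3.5, 1.0, size=5000)
b = rng.lognormal(3.5, 1.0, size=5000) + 5.0

diffs = bootstrap_aov_diff(a, b)
chance_to_win = (diffs > 0).mean()                # P(effect > 0)
ci_low, ci_high = np.percentile(diffs, [2.5, 97.5])
```

The same resampled distribution yields both the chance to win and the confidence interval, which is what makes the output directly actionable.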

Benchmarks (AOV)

  • Alpha validity (on AA tests): Good control of false positives. Using a typical 95% threshold exposes only a 5% false positive risk.
  • Statistical power measurement: 1000 AB tests with a known effect of +€5
    • MW Test: 796/1000 winners, ~80% power.
    • t-test: 146/1000, only 15% power.
    • RevenueIQ: 793/1000 (≈ equivalent to MW). ~80% power.
  • Confidence interval (CI): RevenueIQ produces CIs about €8 wide, narrow enough to be actionable against a true effect of €5. The t-test, with an average CI width of €34, is effectively useless here.
  • CI coverage: The validity of the confidence intervals was verified. A 95% CI indeed has a 95% chance of containing the true effect value (i.e., €0 for AA tests and €5 for AB tests).

From AOV KPI to Revenue

Beyond techniques and formulas, the key point is that RevenueIQ uses a Bayesian method for AOV analysis, allowing this metric to be merged with conversion. Competitors use frequentist methods, at least for AOV, making any combination of results impossible. Under the hood, RevenueIQ combines conversion and AOV results into a central metric: visitor value (RPV). With precise knowledge of RPV, revenue (in € or other currency) is then projected by multiplying by the targeted traffic for a given period.
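Conceptually, the merge works because both metrics exist as posterior samples. A minimal sketch of the idea (with made-up numbers, a plain Beta posterior for conversion, and synthetic Gaussian draws standing in for the AOV analysis — not AB Tasty's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical A/B data: visitors and conversions per variant.
visitors_a, conv_a = 20_000, 600
visitors_b, conv_b = 20_000, 660

# Stand-in posterior draws for AOV (the real analysis is RevenueIQ's).
aov_a = rng.normal(80.0, 2.0, size=10_000)
aov_b = rng.normal(83.0, 2.0, size=10_000)

# Beta posterior draws for each conversion rate (uniform prior).
cr_a = rng.beta(conv_a + 1, visitors_a - conv_a + 1, size=10_000)
cr_b = rng.beta(conv_b + 1, visitors_b - conv_b + 1, size=10_000)

# Because both metrics are posterior samples, they multiply directly
# into a posterior on revenue per visitor (RPV).
rpv_diff = cr_b * aov_b - cr_a * aov_a

chance_to_win = (rpv_diff > 0).mean()
# Projected revenue impact for, e.g., 1M targeted visitors:
projected_gain = 1_000_000 * rpv_diff.mean()
```

With frequentist point estimates there is no such sample-by-sample multiplication, which is why merging the two KPIs requires the Bayesian framing.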

Real Case (excerpt)

Here is a textbook case for RevenueIQ:

  • The conversion gain has a 92% chance to win (CTW): encouraging, but not “significant” by the standard threshold.
  • The AOV gain is at 80% CTW. Taken separately, this too is not enough to declare a winner.
  • The combination of these two metrics gives a CTW of 95.9% for revenue, enabling a simple and immediate decision, where a classic approach would have required additional data collection while waiting for one of the two KPIs (CR or AOV) to become significant.
  • For an advanced business decision, RevenueIQ provides an estimated average gain of +€50k, with a confidence interval [-€6,514; +€107,027], allowing identification of minimal risk and substantial gain.

What This Changes for Experimentation

  • Without RevenueIQ: “inconclusive” results (or endless tests) lead to missed opportunities.
  • With RevenueIQ: Faster, quantified decisions (probability, effect, CI), at the revenue level (RPV then projected revenue).

Practical Recommendations

  • Stop interpreting observed AOV without safeguards: it is highly volatile.
  • Avoid filtering or Winsorizing “extreme values”: arbitrary thresholds introduce bias.
  • Measure CR & AOV jointly and reason in RPV to reflect business reality.
  • Use RevenueIQ to obtain chance to win + CI on AOV, RPV, and revenue projection.
  • Decide via projected revenue (average gain, lower CI bound) rather than isolated p-values.


Conclusion

RevenueIQ brings a robust and quantitative statistical test to monetary metrics (AOV, RPV, revenue), where:

  • t-test is weak and imprecise on e-commerce data,
  • Mann-Whitney is powerful but not quantitative.

RevenueIQ enables faster detection, quantification of business impact, and prioritization of deployments with explicit confidence levels.

Original information can be found in AB Tasty’s documentation, “Understanding the practical use of RevenueIQ.”



Is Your Average Order Value (AOV) Misleading You?

Average Order Value (AOV) is a widely used metric in Conversion Rate Optimization (CRO), but it can be surprisingly deceptive. While the formula itself is simple—summing all order values and dividing by the number of orders—the real challenge lies within the data itself.

The problem with averaging

AOV is not a “democratic” measure. A single high-spending customer can easily spend 10 or even 100 times more than your average customer. These few extreme buyers can heavily skew the average, giving a limited number of visitors disproportionate impact compared to hundreds or thousands of others. This is problematic because you can’t truly trust the significance of an observed AOV effect if it’s tied to just a tiny fraction of your audience.

Let’s look at a real dataset to see just how strong this effect can be. Consider the order value distribution:

  • The horizontal axis represents the order value.
  • The vertical axis represents the frequency of that order value.
  • The blue surface is a histogram, while the orange outline is a log-normal distribution approximation.

This graph shows that the most frequent order values are small, around €20. As the order value increases, the frequency of such orders decreases. This is a “long/heavy tail distribution,” meaning very large values can occur, albeit rarely.

A single strong buyer with an €800 order value is worth 40 times more than a frequent buyer when looking at AOV. This is an issue because a slight change in the behavior of 40 visitors is a stronger indicator than a large change from one unique visitor. While not fully visible on this scale, even more extreme buyers exist. 

The next graph, using the same dataset, illustrates this better:

  • The horizontal axis represents the size of the growing dataset of order values (roughly indicating time).
  • The vertical axis represents the maximum order value in the growing dataset, in €.

At the beginning of data collection, the maximum order value is quite small (close to the most frequent value of ~€20). However, we see that it grows larger as time passes and the dataset expands. With a dataset of 10,000 orders, the maximum order value can exceed €5,000. This means any buyer with an order above €5,000 (they might have multiple) holds 250 times the power of a frequent buyer at €20. At the maximum dataset size, a single customer with an order over €20,000 can influence the AOV more than 2,000 other customers combined.
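The same growth pattern can be reproduced with a simulated log-normal order stream (hypothetical parameters, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated log-normal order values (median around €33).
orders = rng.lognormal(mean=3.5, sigma=1.1, size=10_000)

# Running maximum as the dataset grows over "time".
running_max = np.maximum.accumulate(orders)

early_max = running_max[99]     # maximum after 100 orders
late_max = running_max[-1]      # maximum after 10,000 orders

# How much one extreme buyer outweighs a typical one in the average:
leverage = late_max / np.median(orders)
```

Even in this tame simulation, the largest single order carries tens of times the weight of a typical one, and the ratio keeps growing with the dataset.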

When looking at your e-commerce metrics, AOV should not be used as a standalone basis for decisions.


The challenge of A/B test splitting

The problem intensifies when considering the random splits used in A/B tests.

Imagine you have only 10 very large spenders whose collective impact equals that of 10,000 medium buyers. There’s a high probability that the random split of such a small group will be uneven. Even when the overall dataset splits evenly, these few high spenders have a disproportionate impact on AOV, so their small segment deserves specific attention. Since you can’t predict which visitor will become a customer or how much they will spend, you cannot guarantee an even split of these high-value users.

This phenomenon can artificially inflate or deflate AOV in either direction, even without a true underlying effect, simply depending on which variation these few high spenders land on.
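A small simulation makes the split risk concrete: assign 10 hypothetical high spenders to variations at random, many times over, and count how often the split is anything other than 5/5.

```python
import numpy as np

rng = np.random.default_rng(5)

# 10 "whales" randomly assigned to variation A or B, 100,000 times.
n_whales, n_trials = 10, 100_000
assignments = rng.random((n_trials, n_whales)) < 0.5
whales_in_a = assignments.sum(axis=1)

# How often is the whale split uneven (not exactly 5/5)?
p_uneven = (whales_in_a != 5).mean()
```

The exact probability of an uneven split is 1 − C(10,5)/2¹⁰ ≈ 75%: three times out of four, the group that dominates your AOV is unevenly distributed before the test even starts.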

What’s the solution?

If AOV is such an unreliable metric, how can we work with it effectively? The answer is similar to how you approach conversion rates and experimentation.

You don’t trust raw conversion data—one more conversion on variation B doesn’t automatically make it a winner, nor do 10 or 100. Instead, you rely on a statistical test to determine when a difference is significant. The same principle applies to AOV. Tools like AB Tasty offer the Mann-Whitney test, a statistical method robust against extreme values and well-suited for long-tail distributions.

AOV behavior can be confusing because you’re likely accustomed to the more intuitive statistics of conversion rates. Conversion data and their corresponding statistics usually align; a statistically significant increase in conversion rate typically means a visibly large difference in the number of conversions, consistent with the statistical test. However, this isn’t always the case with AOV. It’s not uncommon to see the AOV trend and the statistical results pointing in different directions. Your trust should always be placed in the statistical test.

The root cause: Heavy tail distributions

You now understand that the core issue stems from the unique shape of order value distributions: long-tail distributions that produce rare, extreme values.

It’s important to note that the problem isn’t just the existence of extreme values. If these extreme values were frequent, the AOV would naturally be higher, and their impact would be less dramatic because the difference between the AOV and these values would be smaller. Similarly, for the splitting problem, a larger number of extreme values would ensure a more even split.

At this point, you might think your business has a different order distribution shape and isn’t affected. However, this shape emerges whenever these two conditions are met:

  • You have a price list with more than several dozen different values.
  • Visitors can purchase multiple products at once.

Needless to say, these conditions are ubiquitous and apply to nearly every e-commerce business. The e-commerce revolution itself was fueled by the ability to offer vast catalogues.

Furthermore, shipping costs naturally encourage users to group their purchases to minimize those costs. This means nearly all e-commerce businesses are affected; the only exceptions are subscription-based businesses with limited pricing options, where most purchases are for a single service.

Here’s a glimpse into the order value distribution across various industries, demonstrating the pervasive nature of the “long tail distribution”:

[Figures: order value distributions for cosmetics, transportation, B2B packaging (selling packaging for e-commerce), fashion, and online flash sales.]

AOV, despite its simple definition and apparent ease of understanding, is a misleading metric. Its magnitude is easy to grasp, leading people to confidently make intuitive decisions based on its fluctuations. However, the reality is far more complex; AOV can show dramatic changes even when there’s no real underlying effect.

Conversely, significant changes can go unnoticed. A strong negative effect could be masked by just a few high-spending customers landing in a poorly performing variation. So, now you know: just as you do for conversion rates, rely on statistical tests for your AOV decisions.