When you hear ‘A/B Testing’, do you think straight away of revenue gain? Uplift? A dollars and cents outcome? David Mannheim, CEO of the Conversion Rate Optimization (CRO) agency User Conversion, argued at our most recent CRO on Tap event that you probably do – and shouldn’t. Here’s why.

Why do we Experiment?

David kicked off his presentation by asking the audience to dial back a few assumptions. He asked a seemingly obvious – but as it turns out, rather contentious – question: Why do we experiment?

In theory, experimentation should look something like this:

[Image: ROI frustration backlog]

Ideally, he argued, experimentation is about learning, prioritizing and minimizing risk – and the tests we run should be designed accordingly.

But he’s found that, far too often, it’s the key stakeholders (the HiPPOs – the Highest Paid Person’s Opinion) who decide which tests get implemented first – and their first concern is unwaveringly to see a neat revenue uplift.

[Image: experiment backlog example]

This tendency has led David to the following theory:

The ROI of experimentation is impossible to achieve because the industry is conditioned to think that A/B testing is about gain only.

Context, Context, Context

The reader might be asking themselves at this point, What’s so bad about expecting revenue uplift from A/B tests? I mean, it’s normal to expect clear ROI, right?

Perhaps – but as David explained, the issue just isn’t that clear-cut.

It’s more the expectation of a neat “we invested X, we need to get Y” formula that gets in the way. The industry has been exposed to misleading ‘CRO myths’, propagated by sites like ‘Which Test Won’ and folklore about the Obama campaign’s $60 million experiment. Stakeholders have come to (erroneously) believe that every test they run should work like this – which has set unrealistic expectations for conversion optimization practitioners.

The industry has cultivated this mindset. Because we say everything is measurable online – now people also believe that you can put an exact number on it. But that’s not possible. That’s the downside of believing that you can measure everything online.

Annemarie Klassen, Conversion Manager, Tix.nl

What people often overlook, David explained, is the complexity of the context in which they are running their tests – and assessing their ROI. He walked through three of the biggest challenges involved:

Challenge #1: Forecasting

The first challenge in assessing the ROI of experimentation is forecasting. A huge range of factors limits an analyst’s ability to accurately project revenue uplift from any given test:

  • Paid traffic strategy
  • Online and offline marketing
  • Newsletters
  • Offers
  • Bugs
  • Device traffic evolution
  • Season
  • What your competitors are doing
  • Societal factors (e.g., Brexit)

Mathieu Fauveaux, one of the CRO specialists David interviewed during his quest to better understand the issue, had this to say:

In terms of estimating the revenue projection for the next 12 months from a single experiment, it’s impossible. We can only imagine a trend or an average amount and I never met a team with this capability or vision.

Expecting a perfectly accurate and precise prediction for each test you run just isn’t realistic – the context is too complex.
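
To make that concrete, here’s a minimal sketch – in Python, with entirely hypothetical numbers, not a model from the talk – of what happens when you project a single test’s measured uplift over twelve months while even a few of the factors above are allowed to vary:

```python
import random

# All numbers below are hypothetical, for illustration only.
baseline_monthly_revenue = 1_000_000  # assumed baseline, in dollars
measured_uplift = 0.03                # relative uplift observed in the test

random.seed(42)
projections = []
for _ in range(10_000):
    # The test only gives a noisy estimate of the "true" uplift.
    true_uplift = random.gauss(measured_uplift, 0.015)
    annual = 0.0
    for month in range(12):
        # Seasonality, campaigns, competitors, bugs, etc. each nudge
        # monthly revenue; modeled crudely here as +/-20% noise.
        context = random.uniform(0.8, 1.2)
        # Assume the effect decays as the novelty wears off (an assumption).
        decay = max(0.0, 1 - 0.05 * month)
        annual += baseline_monthly_revenue * context * true_uplift * decay
    projections.append(annual)

projections.sort()
print(f"median projection: ${projections[5_000]:,.0f}")
print(f"80% interval:      ${projections[1_000]:,.0f} to ${projections[9_000]:,.0f}")
```

Even with these fairly gentle noise assumptions, the plausible range spans a multiple of the median – which is exactly Fauveaux’s point: you can imagine a trend or an average, but not a single reliable number.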

Challenge #2: Working with averages

Then there’s the issue that, when it comes to reporting, your CRO team is more often than not working with averages – in fact, averages of averages.

Let’s say you’ve run an excellent experiment on a specific audience segment – and you saw a strong uplift in conversion rate. If you then look at the global conversion rate for your entire site, there’s a very good chance that uplift will be swallowed up in the averaging it takes to arrive at that number. Your revenue wave will have shrunk to an undetectable ripple. And this is a big issue when trying to assess overall conversion rate or revenue uplift – there are just too many external factors to get a perfectly accurate picture.

Bottom line? What you’re doing is shifting an average. On average, an average customer, exposed to an average A/B test will perform…averagely. There’s nothing earth shattering about that. Averages fundamentally lie.

Craig Sullivan, Optimal Visit
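
The dilution is easy to show with back-of-the-envelope arithmetic. The numbers in this sketch are made up, but the shape of the result is general:

```python
# Hypothetical traffic mix: the test lifts conversion for one segment by a
# healthy 15% relative, but that segment is only 8% of sitewide traffic.
segment_share = 0.08
segment_cr = 0.040   # segment conversion rate before the test
rest_cr = 0.025      # conversion rate for everyone else
uplift = 0.15        # relative uplift measured within the segment

global_before = segment_share * segment_cr + (1 - segment_share) * rest_cr
global_after = (segment_share * segment_cr * (1 + uplift)
                + (1 - segment_share) * rest_cr)

print(f"segment uplift:        +{uplift:.0%}")
print(f"global CR before:      {global_before:.4%}")
print(f"global CR after:       {global_after:.4%}")
print(f"global relative shift: +{global_after / global_before - 1:.2%}")
# A 15% segment win moves the sitewide average by under 2% -- small enough
# to drown in week-to-week noise from all the other factors listed above.
```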

Challenge #3: Multiple tests

And then there’s the issue of running multiple tests at a time and trying to aggregate the results. Again, it’s tempting to run simple math equations to get a clear-cut answer as to your gain, but the reality is more complicated, as Stephen Pavlovich, CEO of Conversion.com, explains:

I’d be hesitant on grouping together multiple experiments and grouping the impact. You are grouping fuzzy results, which adds a lot of… fuzz. It tends to be how a lot of people look at it and it’s misleading.
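
A quick simulation illustrates the fuzz. In this sketch (made-up numbers, and it doesn’t even model interactions between concurrent tests, which make things worse), five tests each produce a noisy uplift estimate, and we naively add them up:

```python
import random

random.seed(7)

# Hypothetical: five concurrent tests, each with a small true uplift but a
# noisy measurement (standard error comparable to the effect itself).
true_uplifts = [0.02, 0.00, 0.03, 0.01, 0.00]
std_error = 0.02

naive_totals = []
for _ in range(10_000):
    measured = [random.gauss(u, std_error) for u in true_uplifts]
    # The naive approach: add the measured uplifts together.
    naive_totals.append(sum(measured))

naive_totals.sort()
print(f"true combined uplift:  {sum(true_uplifts):+.1%}")
print(f"median naive estimate: {naive_totals[5_000]:+.1%}")
print(f"90% of naive sums fall between "
      f"{naive_totals[500]:+.1%} and {naive_totals[9_500]:+.1%}")
# Any single quarter's naive sum can land anywhere in a wide band --
# grouping fuzzy results really does just add fuzz.
```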

Should it Always be “Revenue First”?

And really, when you step back and think about it, it doesn’t make sense for conversion optimizers to expect revenue gain, and only revenue gain, to be the primary indicator of success driving their entire experimentation program.

David contextualized the issue by asking the audience what would happen if all businesses always put revenue first.

Well, that would mean no free returns for an e-commerce site (returns don’t increase gain!), no free sweets in the delivery packaging (think ASOS), the cheapest possible product photography on the site, and so on.

And if you were to put immediate revenue gain first – as stakeholders so often want to do in an experimentation context – the implications are even more unsavory: you would offer the skimpiest customer service to cut costs; you would push ‘buy now!’ offers unendingly; you would discount everything and forget any kind of brand loyalty initiatives, etc.

In short, focusing too heavily on immediate, clearly measurable revenue gain inevitably cannibalizes the customer experience. And this, in turn, will diminish your revenue in the long run.

So, What Should A/B Testing be About?

One big thing experimenters can do is work with binomial metrics. Avoid the fuzziness and much of the complexity by running tests that aim to give you a yes/no, black-or-white answer:

[Image: binomial metrics examples]
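
As an illustration of what a binomial metric buys you – this is a generic two-proportion z-test sketch with hypothetical counts, not tooling from the talk – the question collapses to “did the variant convert more visitors, yes or no?”:

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return p_b - p_a, p_value

# Hypothetical counts: 12,000 visitors per arm.
diff, p = two_proportion_ztest(conv_a=480, n_a=12_000, conv_b=545, n_b=12_000)
print(f"difference in conversion rate: {diff:+.2%}")
print(f"p-value: {p:.3f} -> "
      f"{'the variant converted better' if p < 0.05 else 'no clear winner'}")
```

Either the variant moved the needle or it didn’t – a far cleaner answer than a projected revenue figure.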

In a similar vein, be extremely clear and deliberate with your hypothesis, and be savvy with your secondary metrics:

[Image: secondary metrics]
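
One lightweight way to stay deliberate – a sketch of a possible convention, not a prescription from the talk – is to write the hypothesis, the single binomial primary metric, and the guardrail metrics down before the test ever runs:

```python
from dataclasses import dataclass, field

@dataclass
class ExperimentPlan:
    """A minimal test spec: one clear hypothesis, one binomial primary
    metric, and a handful of secondary/guardrail metrics."""
    hypothesis: str
    primary_metric: str                 # the yes/no question the test answers
    secondary_metrics: list = field(default_factory=list)

# All names below are hypothetical, for illustration.
plan = ExperimentPlan(
    hypothesis=("Showing delivery costs on the product page will reduce "
                "checkout abandonment for first-time visitors."),
    primary_metric="reached_order_confirmation (binomial: did they, or not?)",
    secondary_metrics=[
        "add_to_cart_rate",      # did the change scare people off earlier?
        "average_order_value",   # guardrail: trading value for volume?
        "support_contact_rate",  # guardrail: did we create confusion?
    ],
)
print(plan.hypothesis)
```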

And use experimentation to avoid losses and minimize risk – not only to chase gains.

But perhaps the best thing you can do is modify your expectations. Instead of saying, “experimentation should unfailingly lead to a clear revenue gain, each and every time,” you might want to start saying, “experimentation will allow us to make better decisions.” And these better decisions – combined with all of the other efforts the company is making – will move your business in a better direction, one that includes revenue gain:

[Image: good experimentation model]

With this in mind, David was able to come to a satisfactory conclusion for everyone in the room, by slightly modifying his original theory:

I believe that the ROI of experimentation is difficult to achieve and should be contextualized for different stakeholders and businesses. We should not move completely away from a pounds-and-pence way of thinking, but we should deprioritize it.