A/B Testing: Different Tools, Different Results?


Recently, one of our clients ran their A/B test data through Kissmetrics’ significance calculator to see how the results compared with those displayed in AB Tasty. Imagine their surprise when they got two completely different results for the exact same data!

Here’s a sample:

Variation   Visitors diverted   Converted visitors   Conversion rate
A           10,299              1,439                13.97%
B           10,505              1,495                14.23%


At first glance, these figures suggest that variation B is beating variation A. The question is whether B is actually outperforming A, or whether the difference is due to chance.

This is precisely what the statistical algorithms used in A/B testing try to determine. How likely is it that the result is due to chance? Or, conversely, how likely is it that the result reflects reality?

In this test, AB Tasty’s algorithm displays a confidence rate of 70.4%, which our client decided to compare with the results of other online tools, namely splittestcalculator.com and getdatadriven.com (the latter being powered by Kissmetrics, a rather reliable source of information!). The following table adds a few more sources.

Tool                       Confidence rate
AB Tasty                   70.02%
Split Test Calculator      42.23%
Get Data Driven            70%
Hubspot                    70.43%
Evan Miller’s calculator   41%

Why is there a difference?

The tools rely on two different calculation methods: the chi-squared method gives roughly 42%, whereas the Bayesian approach gives roughly 70%.

Choosing one method over the other is arbitrary, so let’s dig a little deeper (do keep reading: no degree in astrophysics required!).

To put it simply, there are two things to consider when placing a bet:

  • The probability that there is a difference (A beats B)
  • The gain (A is 20% better than B)

Both calculation methods take these parameters into account but weight them differently, which leads to different results. The chi-squared method only considers the probability of a difference, whereas the Bayesian method is based on both the probability and the gain (or loss).
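To make the gap concrete, here is a minimal Python sketch that derives both kinds of figure from the sample data. It rests on two assumptions: that the chi-squared tools report 1 minus the p-value of a 2×2 chi-squared test without continuity correction, and that the Bayesian tools report the probability that B’s true conversion rate exceeds A’s, using uniform Beta(1, 1) priors. The function names are hypothetical, and real tools may use other priors or corrections, so their exact figures can differ slightly.

```python
import math
import random


def chi_squared_confidence(visitors_a, conv_a, visitors_b, conv_b):
    """Return 1 - p-value of a 2x2 chi-squared test (no continuity correction).

    Assumption: this is what the 'chi-squared' calculators report.
    """
    observed = [
        [conv_a, visitors_a - conv_a],
        [conv_b, visitors_b - conv_b],
    ]
    total = visitors_a + visitors_b
    row_totals = [visitors_a, visitors_b]
    col_totals = [conv_a + conv_b, total - conv_a - conv_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return 1 - p_value


def bayesian_prob_b_beats_a(visitors_a, conv_a, visitors_b, conv_b,
                            draws=100_000, seed=42):
    """Monte Carlo estimate of P(rate_B > rate_A).

    Assumption: uniform Beta(1, 1) priors, so each posterior is
    Beta(1 + conversions, 1 + non-conversions).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + conv_a, 1 + visitors_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + visitors_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws


if __name__ == "__main__":
    data = (10_299, 1_439, 10_505, 1_495)  # visitors/conversions for A, then B
    print(f"chi-squared confidence: {chi_squared_confidence(*data):.1%}")
    print(f"Bayesian P(B beats A):  {bayesian_prob_b_beats_a(*data):.1%}")
```

On the figures from the first table, the two functions come out at roughly 41% and 70%, in line with the two clusters of results in the comparison table. They answer different questions, which is why the same data can produce two different "confidence" figures.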

Bottom line: both are correct, even though they differ.

How do you make informed decisions?

“In most cases, focusing on a single source of information to qualify the difference between two variations leads to making blind decisions”, says AB Tasty’s Chief Data Scientist Hubert Wassner, “it is like saying Usain Bolt wins the race”.

[Image 1: Usain Bolt wins the competition]
[Image 2: Usain Bolt smashed it]

The first image shows Bolt receiving a prize for winning the race; the second shows his margin of victory, in other words the size of the difference. Given how far ahead he is when he crosses the finish line, Bolt is very likely to win the next race as well.

The same goes for A/B testing: Bayesian statistics offer an estimate of the potential gain (or loss), whereas chi-squared statistics only provide a confidence rate.

[Image: The AB Tasty report gives upper and lower limits around the gain]

Conversion rates (here 13.98% and 14.24%) and the gain (1.89%), as displayed in most A/B testing tools, give the impression that they say something about how reliable the result is. In fact, they only indicate the conversion rates observed so far. The “real” conversion rates remain unknown.

The most valuable information here is the pair of limits around the gain (-4.8% and +8.85%). They should be read as follows: with a 95% confidence rate, the real value of the gain lies between -4.8% and +8.85%. The higher the lower limit, the safer the decision.
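Limits of this kind can be approximated with a short simulation. The sketch below is an assumption about how such an interval can be produced, not AB Tasty’s actual algorithm: it draws both conversion rates from Beta posteriors with uniform priors, computes the relative gain of B over A for each draw, and keeps the central 95% of the simulated gains. The function name `gain_interval` is hypothetical.

```python
import random


def gain_interval(visitors_a, conv_a, visitors_b, conv_b,
                  level=0.95, draws=100_000, seed=7):
    """Equal-tailed interval for the relative gain (rate_B / rate_A - 1).

    Assumption: uniform Beta(1, 1) priors, so each conversion rate is drawn
    from Beta(1 + conversions, 1 + non-conversions).
    """
    rng = random.Random(seed)
    # Simulate many plausible (rate_A, rate_B) pairs and record the gain of each.
    gains = sorted(
        rng.betavariate(1 + conv_b, 1 + visitors_b - conv_b)
        / rng.betavariate(1 + conv_a, 1 + visitors_a - conv_a)
        - 1
        for _ in range(draws)
    )
    # Cut off (1 - level) / 2 of the simulated gains on each side.
    tail = (1 - level) / 2
    lower = gains[int(tail * draws)]
    upper = gains[int((1 - tail) * draws) - 1]
    return lower, upper


if __name__ == "__main__":
    low, high = gain_interval(10_299, 1_439, 10_505, 1_495)
    print(f"95% interval for the gain: {low:+.2%} to {high:+.2%}")
```

On the sample data this gives an interval of roughly -5% to +9%, close to the limits shown in the report. Because the lower limit is below zero, the data is still compatible with B performing worse than A.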

Bottom line: the confidence rate only indicates when it is time to make a decision (there is a difference between the two variations and it is not due to chance), while the limits indicate which decision to make. You need both to get the best predictions from your test results and to spot variations worthy of Usain Bolt.
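The two-step reading described above can be sketched as a small decision helper. The 95% threshold and the function name `decide` are hypothetical, illustrative choices rather than AB Tasty settings; the confidence rate is taken to mean the confidence that a difference exists at all.

```python
def decide(confidence, gain_lower, gain_upper, threshold=0.95):
    """Combine a confidence rate with the limits around the gain.

    confidence: confidence that the difference is not due to chance (0..1).
    gain_lower, gain_upper: limits around the relative gain of B over A.
    threshold: hypothetical confidence level required before acting.
    """
    # Step 1: the confidence rate says *when* a decision can be made.
    if confidence < threshold:
        return "keep the test running"
    # Step 2: the limits say *which* decision to make.
    if gain_lower > 0:
        return "deploy B"
    if gain_upper < 0:
        return "keep A"
    return "no clear winner yet"


if __name__ == "__main__":
    # AB Tasty's 70.4% confidence rate and the -4.8% / +8.85% limits quoted above.
    print(decide(0.704, -0.048, 0.0885))
```

With the figures from this test, the helper recommends letting the test run: under this hypothetical threshold, none of the confidence rates in the comparison table is high enough to act on yet.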
