A/B testing technology: how to measure the impact on load speed

This is my personal response to a study by ConversionXL and an introduction to an unbiased benchmark of the impact of testing tools on site speed.

Rémi Aubert, CEO at AB Tasty.

Context

On May 18th, digital agency OrangeValley, in collaboration with ConversionXL, a blog about conversion rate optimisation, published a study that tackles the impact A/B testing has on site speed.

Confident in the knowledge that we have made significant and continuous improvements to loading speed to meet the highest of market expectations, I approached the article with great enthusiasm. Boy, was I in for a mighty surprise!

Long story short, the study alleges that my company’s testing tool, AB Tasty, adds to page load time noticeably more than those of our competitors…!

Knowing that our clients pay close attention to page load time, the impact our technology has on site speed is something to which we devote considerable effort and naturally take very seriously. Despite OrangeValley’s and ConversionXL’s good intentions, given the impact the study has on the industry and the sheer amount of hard work that goes into building and maintaining an enterprise-level conversion optimisation tool, I feel obliged to respond to the claims set out in the article.

Why? For the past 4 years, I’ve personally been running comparative tests to ensure that AB Tasty excels in performance and deserves a top place among the world’s best A/B testing platforms. With my own results in mind, I cast serious doubt on the accuracy of the study: the applied methodology lacks coherence and logic, effectively leading to flawed results. Let me explain what they did wrong and what one should do to get accurate results from comparative tests.

Lack of transparency and unstable testing environment

Although ConversionXL has tried to detail the actual conditions of the test, the information provided is not sufficiently transparent. They should have provided the reader with:

  • tested page URLs
  • baseline page URLs (not tested) for comparison. Were they even used?
  • real-time third-party tools’ results

The tests were run only in December, and only a very small number of them (around 80) were made. Just as with A/B testing, you can’t draw conclusions from such a small sample, as the risk of false positives is much too high.
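To see why roughly 80 measurements are so fragile, here is a minimal sketch in TypeScript, with entirely made-up load times and jitter that are not from the study: it simulates two tools with identical true performance and counts how often one of them comes out noticeably “faster” purely by chance.

```typescript
// A minimal sketch with made-up numbers: two "tools" share the SAME true average
// load time, yet a handful of noisy measurements often makes one look faster.
function sampleLoadTimeMs(meanMs: number, jitterMs: number): number {
  // Uniform jitter keeps the sketch simple; real load times are noisier still.
  return meanMs + (Math.random() * 2 - 1) * jitterMs;
}

function meanOfSamples(count: number, meanMs: number, jitterMs: number): number {
  let total = 0;
  for (let i = 0; i < count; i++) total += sampleLoadTimeMs(meanMs, jitterMs);
  return total / count;
}

const trials = 10_000;
let falseWins = 0;
for (let t = 0; t < trials; t++) {
  // Both tools have an identical true mean of 1000 ms with ±400 ms jitter.
  const toolA = meanOfSamples(80, 1000, 400);
  const toolB = meanOfSamples(80, 1000, 400);
  if (toolA < toolB - 20) falseWins++; // A looks >20 ms faster by pure chance
}
console.log(`Tool A looks >20 ms faster in ${(100 * falseWins / trials).toFixed(1)}% of trials`);
```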

To address the lack of transparency, I have launched a test website where you can find unbiased results in real time. My methodology is clearly detailed, and all information is available for verification. In addition to the tools I usually use to benchmark site speed, I have also implemented the tool (SpeedCurve) that ConversionXL claims to have used for the study.

To find out more, check out the website directly. You will be surprised by how different the results are!

Comparing apples and oranges

SiteSpect, ranked first in the study, is a server-side (or DNS-side) technology, whereas every other tool in the comparison is JavaScript-based. Consequently, ConversionXL is comparing apples and oranges. This is the first, and perhaps the most telling, clue as to why the comparison is flawed: the tools are simply too different to be compared on speed as the one and only criterion.

Why? Speed comes at a cost… Server-side tools put speed first; all the others put security and safety first. If a server-side tool is down, so are its clients’ websites! Nasty, right? On the other hand, if AB Tasty (or any JavaScript-based provider using a CDN load balancer, which isn’t the case for most of our competitors) were to go silent, visitors would simply be served the original page.
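To make the trade-off concrete, here is a purely hypothetical sketch (not AB Tasty’s actual snippet; the URL and timeout are placeholders) of how a JavaScript tag can fail safe: the tag loads asynchronously, and if the vendor never answers, the visitor simply keeps the original page.

```typescript
// Hypothetical sketch of a fail-safe JavaScript tag: load it asynchronously and
// give up after a time budget, so an unreachable vendor never blocks the page.
function loadTestingTag(src: string, timeoutMs: number): void {
  const script = document.createElement('script');
  script.src = src;
  script.async = true;

  const timer = window.setTimeout(() => {
    // Vendor unreachable within the budget: drop the tag, keep the original page.
    script.remove();
  }, timeoutMs);

  script.onload = () => window.clearTimeout(timer);
  script.onerror = () => window.clearTimeout(timer);

  document.head.appendChild(script);
}

// Placeholder URL and a 2-second budget, purely for illustration.
loadTestingTag('https://testing-vendor.example/tag.js', 2000);
```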

Traffic allocation matters

Additionally, the study doesn’t give any information about traffic allocation. Is it a 50/50 split, or is 100% of the traffic diverted to the variation? We know for a fact that the test set up with AB Tasty diverts 100% of the traffic to the variation, a feature the other tools don’t offer, so they simply couldn’t have been tested using the same method. This clearly reveals that different rules were applied to different tools.

Seasoned testers know that loading a variation takes longer than loading a control page. Here is an example. Let’s say loading times are stable: 1 second for the original, 1.5 seconds for the variation.

Say you run 80 tests with a 50/50 split. The average loading time would be as follows:

(1*40 + 1.5*40) / 80 = 1.25s

Now let’s say you divert 100% of the traffic to the variation and run 80 tests: the average loading time would be 1.5s.

This hasn’t been taken into account at all!
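For the record, the arithmetic above boils down to a simple weighted average. Here is a minimal sketch of it; the 1 s and 1.5 s figures are the illustrative values from this example, not measurements.

```typescript
// Average load time as a weighted average of the original and the variation,
// weighted by how much traffic is sent to the variation.
function averageLoadTime(
  originalSec: number,
  variationSec: number,
  shareOnVariation: number, // 0.5 for a 50/50 split, 1.0 for 100% on the variation
): number {
  return originalSec * (1 - shareOnVariation) + variationSec * shareOnVariation;
}

console.log(averageLoadTime(1, 1.5, 0.5)); // 1.25 — the 50/50 split
console.log(averageLoadTime(1, 1.5, 1.0)); // 1.5  — 100% of traffic on the variation
```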

Taking backend loading time into account

When a web page loads, the first request is made to the website’s server, followed by requests for third-party scripts. The delay before a third-party script is even called is obviously not in the hands of the A/B testing provider (if you have problems there, you should probably look into tag management platforms).

ConversionXL did not subtract back-end loading time from its calculations, despite it clearly affecting the overall loading time. Back-end loading time varies and can therefore differ significantly from one test run to another. Again, ConversionXL ended up drawing conclusions based on irrelevant information.
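For anyone who wants to check this on their own pages, here is a minimal browser-side sketch (the variable names are mine, not the study’s methodology) that uses the standard Navigation Timing API to separate back-end time from everything that happens after the first byte; a fair tool comparison would subtract the former before comparing the latter.

```typescript
// Split page load into back-end time (waiting on the site's own server) and
// front-end time (parsing, rendering, third-party tags) using Navigation Timing.
window.addEventListener('load', () => {
  // loadEventEnd is only populated once the load event has finished, hence the setTimeout.
  setTimeout(() => {
    const [nav] = performance.getEntriesByType('navigation') as PerformanceNavigationTiming[];
    if (!nav) return;

    const backendMs = nav.responseStart - nav.requestStart;  // roughly the time to first byte
    const frontendMs = nav.loadEventEnd - nav.responseStart; // everything after the first byte

    console.log(`back-end: ${backendMs.toFixed(0)} ms, front-end: ${frontendMs.toFixed(0)} ms`);
  }, 0);
});
```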

So, how should you assess the impact of testing tools?

My point here is not to show that AB Tasty has the best loading time (security, remember?). I am, however, very confident in my team’s efforts to improve page speed and reduce the inevitable impact of A/B testing, which is why I offer you an unbiased way of assessing your A/B testing tool yourself.

On this website, you’ll find:

  • tests running on verifiable pages
  • a baseline page
  • Google PageSpeed results in real time
  • SpeedCurve results in real time
  • Pingdom results in real time

The testing environment is fully detailed here.

I hope that makes things a little clearer. Happy testing!

Comments? Leave a comment here or reach out to me personally at remi@abtasty.com.
