Article

9min read

A/B, Split or Multivariate Test: How to Choose the Right One

In the fast-paced world of digital marketing, settling for anything less than the best user experience is simply not an option.

Every marketing strategy has room for improvement and unlocking more comes from recognizing hidden opportunities.

With analytics data and a little bit of creativity, you can uncover some valuable insights on how to optimize your conversion rate on your website or campaign landing pages. However, achieving structured and streamlined data from your assumptions requires diligent testing.

Marketing professionals have steadily used different testing methodologies such as A/B testing, split testing, multivariate testing and multipage testing to increase conversion rates and enhance digital performance.

Experimenting and testing are essential as they eliminate opinions and bias from the decision-making process, ensuring data-driven decisions.

With the availability of many diverse testing options, it can be challenging to find your starting point. In this article, we’ll dive into the specifics of different forms of testing to help you navigate this testing landscape.

What is A/B testing?

flowers-366155_1280

A/B testing is a method of website optimization where you are comparing two versions of the same page: variation A and variation B.  For the comparison, it’s common to look at the conversion rates and metrics that matter to your business (clicks, page views, purchases, etc) while using live traffic.

It’s also possible to do an A/B/C/D test when you need to test more than two content variations. The A/B/C/D method will allow you to test three or more variations of a page at once instead of testing only one variation against the control version of the page.

When to use A/B tests?

A/B tests are an excellent method to test radically different ideas for conversion rate optimization or small changes on a page.

A/B testing is the right method to choose if you don’t have a large amount of traffic to your site. Why is this? A/B tests can deliver reliable data very quickly, without a large amount of traffic. This is a great approach to experimentation to maximize test time to achieve fast results.

If you have a high-traffic website, you can evaluate the performance of a much broader set of variations. However, there is no need to test 20 different variations of the same element, even if you have adequate traffic. It’s important to have a strategy when approaching experimentation.

Want to start testing? AB Tasty is the best-in-class experience optimization platform that empowers you to create a richer digital experience – fast. From experimentation to personalization, this solution can help you activate and engage your audience to boost your conversions.

Split testing vs A/B testing

A/B tests and split tests are essentially the same concept.

“A/B” refers to the two variations of the same URL where changes are made “live” using Javascript on the original page. SaaS tools that provide you with a visual editor, like AB Tasty, allow you to create these changes quickly without technical knowledge.

Meanwhile, “split” refers to the traffic redirection towards one variation or another, each hosted on its own URL and fully redesigned in the code.

You can consider A/B tests to work the same as split tests.

The variation page may differ in many aspects depending on the testing hypothesis you put forth and your industry goals (layout, design, pictures, headlines, sub-headlines, calls to action, offers, button colors, etc.).

In any case, the number of conversions on each page’s variation is compared once each variation gets enough visitors.

In A/B tests, the impact of the design as a whole is tracked, not individual elements – even though many design elements might be changed on variations simultaneously.

TIP: Keep in mind that testing is all about comparing the performances of variations. It’s recommended not to make too many changes between the control and variation versions of the page at the same time. You should limit the number of changes to better understand the impact of the results. In the long term, a continuous improvement process will lead to better and lasting performance.

What is multivariate testing?

magic-cube-378543_1280

Multivariate tests or multi-variant tests are the same as A/B tests in their core mechanism and philosophy. The difference is that multivariate testing allows you to compare a higher number of variables and the interactions between each other. In other words, you can test and track changes to multiple sections on a single page.

For multivariate testing, you need to identify a few key page sections and then create variations for those sections specifically. You aren’t creating variations of a whole page as you do while A/B testing.

TIP: Use multivariate testing when several element combinations on your website or landing page are called into question.

Multivariate testing reveals more information about how these changes to multiple sections interact with one another. In multivariate tests, website traffic is split into each possible combination of a page – where the effectiveness of the changes is measured.

It’s very common to use multivariate testing to optimize an existing website or landing page without making a significant investment in redesign.

Although this type of testing can be perceived as an easier way of experimentation – keep in mind that multivariate testing is more complicated than traditional A/B testing.

Multivariate tests are best suited for more advanced testers because they give many more possibilities of combinations for visitors to experience on your website. Too many changes on a page at once can quickly add up. You don’t want to be left with a very large number of combinations that must be tested.

Multivariate test example

Let’s say that you’ve decided to run a multivariate test on one of your landing pages. You choose to change two elements on your landing page. On the first variation, you swap an image for a video, and on the second variation, you swap the image for a slider.

For each page variation, you add another version of the headline. This means that now you have three versions of the main content and two versions of the headline. This is equal to six different combinations of the landing page.

Image Video Slider
Headline 1 Combination 1 Combination 2 Combination 3
Headline 2 Combination 4 Combination 5 Combination 6

After only changing two sections, you quickly have six variations. This is where multivariate testing can get tricky.

When to use multivariate testing?

Multivariate tests are recommended for sites with a large amount of daily traffic. You will need a site with a high volume of traffic to test multiple combinations, and it will take a longer time to obtain meaningful data from the test.

mvt
AB Tasty’s reporting allows you to weigh up each element’s impact on the conversion rate

The multivariate testing method will allow you to incrementally improve an existing design, while the test results can be used to apply to a larger website or landing page redesign.

What is multipage testing?

Multipage testing is an experimentation method similar to standard A/B testing. As we’ve discussed, in A/B testing, changes can be made to one specific page or to a group of pages.

If the changed element appears on several pages, you can choose whether or not to change it on each page. However, if the element is on several pages but it’s not identical, appears at a different place or has a different name, you’ll have to set up a multipage test.

Multipage tests allow you to implement changes consistently over several pages. 

This means that multipage tests allow you to link together variations of different pages and are especially useful when funnel testing.

In multipage tests, site visitors are funneled into one funnel version or the other. You need to track the way visitors interact with the different pages they are shown so you can determine which funnel variation is the most effective.

You must ensure that the users see a consistent variation of changes throughout a set of pages. This is key to getting usable data and allows one variation to be fairly tested against another.

Multipage test example

Let’s say you want to conduct a multipage test with a free shipping coupon displayed in the funnel at different places. You want to run the results of this test against the original purchase funnel without a coupon.

For example, you could offer visitors a free shipping coupon on a product category page – where they can see “Free shipping over €50” as a static banner on the page. Once the visitor adds a product to the shopping cart,  you can show them a new dynamic message according to the cart balance – “Add €X to your cart for free shipping”.

In this case, you can experiment with the location of the message (near the “Proceed to checkout” button, near the “Continue shopping” button, near the shipping cost for his order or somewhere else) and with the call-to-action variations of the message.

This kind of test will help you understand visitors’ purchase behavior better – i.e. how does the placement of a free shipping coupon reduce shopping cart abandonment and increase sales? After enough visitors come to the end of the purchase funnel through the different designs, you will be able to compare the effect of design styles easily and effectively.

How to test successfully?

Remember that the pages being tested need to receive substantial traffic so the tests will give you some relevant data to analyze.

Whether you use A/B testing, split testing, multivariate testing or multipage testing to increase your conversion rate or performance, remember to use them wisely.

Each type of test has its own requirements and is uniquely suited to specific situations, with advantages and disadvantages.

Using the proper test for the right situation will help you get the most out of your site and the best return on investment for your testing campaign. Even though testing follows a scientific method, there is no need for a degree in statistics when working with AB Tasty.

Related: How long you should run a test and how statistics calculation works with AB Tasty

You might also like...

Subscribe to
our Newsletter

bloc Newsletter EN

We will process and store your personal data to respond to send you communications as described in our  Privacy Policy.

Article

10min read

How Long Should You Run an A/B Test?

One of the most popular questions when starting with experimentation is: How long should an A/B test run before you can draw conclusions from it?

Determining the ideal A/B test duration can be a challenge for most businesses. You have to factor in your business cycles, traffic flow, the sample size needed and be aware of other business campaigns.

Even if you reach your sample size in a few days… is it okay to end your test then? How long should you really wait?

In this article, we will discuss potential mishaps if your testing cycle is too short, give insights into which factors you need to consider and share advice on finding the best duration for your A/B tests.

Looking for fast statistical reliability? At AB Tasty, we provide a free A/B test duration calculator, which also has capabilities for a sample size calculator.

What happens if you end an A/B test too soon?

The underlying question is a crucial one and can be summed up as follows: At what point can you end a test that appears to be yielding results?

The answer depends on the relevance of the analysis and on the actual benefits of the test.

In fact, it’s not all that unusual to see tests yield good results during the trial phase and no longer see those results once the modifications are introduced.

In most cases, a disappointing observation of this nature comes down to an error during the trial phase: the test was ended too soon and the results at that point were misleading.

Let’s look at an example that illustrates the nature of the problem.

How long to run an A/B test

The graph above shows the change in the conversion rate of two versions of a page that were the subject of a test. The first version appears to break away and perform extremely well. The discrepancy between the two versions is gradually eroded as time goes by – two weeks after the starting point there’s hardly any observable difference at all.

This phenomenon where the results converge is a typical situation: the modification made does not have a real impact on conversion.

There is a simple explanation for the apparent outperformance at the start of the test:  it’s unusual for the samples to be representative of your audience when the test starts. You need time for your samples to incorporate all internet user profiles, and therefore, all of their behaviors.

If you end the test too soon and allow your premature data to be the deciding factor, your results will quickly show discrepancies.

How to determine the duration of your A/B test

Now that the problem has been aired let’s have a look at how you can avoid falling into this trap.

The average recommended A/B testing time is 2 weeks, but you should always identify key factors relevant to your own conversion goals to determine the best length for a test that will meet your goals.

Let’s discuss several criteria you should use as a foundation to determine when you can trust the results you see in your A/B testing:

  • The statistical confidence level
  • The size of the sample
  • The representativeness of your sample
  • The test period and the device being tested

1.  The statistical confidence level

All A/B testing solutions show a statistical reliability indicator that measures the probability of the difference in the results observed between each sample not being a matter of chance.

This indicator, which is calculated using the Chi-squared test, is the first indicator that should be used as a basis. It is used by statisticians to assert that a test is deemed reliable when the rate is 95% or higher.  So, it is acceptable to make a mistake in 5% of cases and for the results of the two versions to be identical.

Yet, it would be a mistake to use this indicator alone as a basis for assessing the appropriate time to end a test.

For the purposes of devising the conditions necessary to assess the reliability of a test, this is not sufficient. In other words, if you have not reached this threshold then you cannot make the decision. Additionally, once this threshold has been reached, you still need to take certain precautions.

It’s also important to understand what the Chi-squared test actually is: a way of rejecting or not rejecting what is referred to as the null hypothesis.

This, when applied to A/B testing, is when you say that two versions produce identical results (therefore, there’s no difference between them).

If the conclusion of the test leads you to reject the null hypothesis then it means that there is a difference between the results.

However, the test is in no way an indication of the extent of this difference.

Related: A/B Test Hypothesis Definition, Tips and Best Practices

2. The size of the sample

There are lots of online tools that you can use to calculate the value of Chi-squared by giving, as the input parameters, the four elements necessary for its calculation (within the confines of a test with two versions).

AB Tasty can provide you with our own sample size calculator for you to find the value of Chi-squared.

By using this tool, we have taken an extreme example in order to illustrate this exact problem.

Sample size required for A/B testing

In this diagram, the Chi-squared calculation suggests that sample 2 converts better than sample 1 with a 95% confidence level. Having said that, the input values are very low indeed and there is no guarantee that if 1,000 people were tested, rather than 100, you would still have the same 1 to 3 ratio between the conversion rates.

It’s like flipping a coin. If there is a 50% probability that the coin will land heads-up or tails-up, then it’s possible to get a 70 / 30 distribution by flipping it just 10 times. It’s only when you flip the coin a very large number of times that you get close to the expected ratio of 50 / 50.

So, in order to have faith in the Chi-squared test, you are advised to use a significant sample size.

You can calculate the size of this sample before beginning the test to get an indication of the point at which it would be appropriate to look at the statistical reliability indicator. There are several tools online that you could use to calculate this sample size.

In practice, this can turn out to be difficult, as one of the parameters to be given is the % improvement expected (which is not easy to evaluate). But, it can be a good exercise to assess the pertinence of the modifications being envisaged.

Pro Tip: The lower the expected improvement rate, the greater the sample size needed to be able to detect a real difference.  

If your modifications have a very low impact, then a lot of visitors will need to be tested. This serves as an argument in favor of introducing radical or disruptive modifications that would probably have a greater impact on the conversion.

img_548ab40f4fb20

3. The representativeness of your sample

If you have a lot of traffic, then getting a sufficiently large sample size is not a problem and you will be able to get a statistical reliability rate in just a few days, sometimes just two or three.

Related: How to Deal with Low Traffic in CRO

Having said that, ending a test as soon as the sample size and statistical reliability conditions have been met is no guarantee that results in a real-life situation are being reproduced.

The key point is to test for as long as you need to in order for all of your audience segments to be included.

Actually, the statistical tests operate on the premise that your samples are distributed in an identical fashion. In other words, the conversion probability is the same for all internet users.

But this is not the case: the probability varies in accordance with different factors such as the weather, the geographical location and also user preferences.

There are two very important factors that must be taken into account here: your business cycles and traffic sources.

Your business cycles 

Internet users do not make a purchase as soon as they come across your site. They learn more, they compare, and their thoughts take shape.  One, two or even three weeks might elapse between the time they are the subject of one of your tests and the point at which they convert.

If your purchasing cycle is three weeks long and you have only run the test for one week, then your sample will not be representative. As the tool records visits from all internet users, they may not record the conversions of those that are impacted by your test.

Therefore, you’re advised to test over at least one business cycle and ideally two.

Your traffic sources 

Your sample must incorporate all of your traffic sources including emails, sponsored links and social networks. You need to make sure that no single source is over-represented in your sample.

Let’s take a concrete situation:  if the email channel is a weak source of traffic but significant in terms of revenue and you carry out a test during an email campaign, then you are going to include internet users who have a stronger tendency to make a purchase in your sample.

This would no longer be a representative sample. It’s also crucial to know about major acquisition projects and, if possible, not to test during these periods.

The same goes for tests during sales or other significant promotional periods that attract atypical internet users. You will often see less marked differences in the results if you re-do the tests outside these periods.

It turns out that it’s quite difficult to make sure that your sample is representative, as you have little control over the kind of internet users who take part in your test.

Thankfully, there are two ways of overcoming this problem.

  • The first is to extend the duration of your test more than is necessary in order to get closer to the normal spread of your internet users.
  • The second is to target your tests so that you only include a specific population group in your sample. For example, you could exclude all internet users who have come to you as a result of your email campaigns from your samples, if you know that this will distort your results. You could also target only new visitors so that you do not include visitors who have reached an advanced stage in their purchasing process (AKA visitors who are likely to convert regardless of which variation they see).

4. Other elements to keep in mind

There are other elements to bear in mind in order to be confident that your trial conditions are as close as they can be to a real-life situation: timing and the device.

Conversion rates can vary massively on different days of the week and even at different times of the day. Therefore, you’re advised to run the test over complete periods.

In other words, if you launch the test on a Monday morning then it should be stopped on a Sunday evening so that a normal range of conversions is respected.

In the same way, conversion rates can vary enormously between mobiles, tablets and desktop computers. So with devices, you’re advised to test your sites or pages specifically for each device. This is easy to accomplish by using the targeting features to include or exclude the devices if your users show very different browsing and purchasing behavior patterns.

These elements should be taken into account so that you do not end your tests too soon and get led astray by a faulty analysis of the results.

They also explain why certain A/A tests carried out over a period of time that is too short, or during a period of unusual activity, can present differences in results and also differences in statistical reliability, even when you may not have made any modifications at all.

The ideal A/B test duration

Running and A/B testing requires a thorough consideration of various factors such as your personal conversion goals, statistical significance, sample size, seasonality, campaigns, traffic sources, etc. All factors deserve equal attention when determining the best practices for your business.

Just remember to be patient, even if you reach your sample size early. You may be surprised by the ending results.

As A/B testing is an iterative process,  continuous experimentation and conversion rate optimization will lead to better results over time.