
How to Rebrand Your Site Using Experimentation in 5 Easy Steps

 

We invited Holly Ingram from our partner REO Digital, an agency dedicated to customer experience, to talk us through the practical ways you can use experimentation when doing a website redesign.

 

Testing an entire site redesign in one go is a huge risk: executed poorly, it can throw away years of incremental gains in UX and site performance. Redesigns not only commonly fail to achieve their goals, they often fail even to reach parity with the old design. That’s why an incremental approach, where you can isolate changes and accurately measure their impact, is most commonly recommended. That being said, some scenarios warrant an entire redesign, in which case you need a robust, evidence-driven process to reduce the risk. 

Step 1 – Generative research to inform your redesign 

With the level of collaboration involved in a redesign, changes must be based on evidence rather than opinion. There is usually a range of stakeholders who all have their own ideas about how the website should be improved, and despite their best intentions this often leads to prioritizing what they feel is important, which doesn’t always align with customers’ goals. The first step is therefore to carry out research to see your site as your customers do and identify areas of struggle. 

It’s important here to use a combination of quantitative research (to understand how your users behave) and qualitative research (to understand why). Start broad, using quantitative research to identify the worst-performing areas of the site, looking for high drop-off rates and poor conversion. Once you have your areas of focus, you can look at more granular metrics to gather more context on the points of friction. 

  • Scroll maps: Are users missing key information as it’s placed below the fold?  
  • Click maps: Where are people clicking? Where are they not clicking? 
  • Traffic analysis: What traffic source(s) are driving users to that page? What is the split between new and returning? 
  • Usability testing: What do users that fit your target audience think of these pages? What helps them? What doesn’t help? 
  • Competitor analysis: How do your competitors present themselves? How do they tackle the same issues you face?

Each research method has its pros and cons, so keep in mind the hierarchy of evidence. The hierarchy is usually depicted as a pyramid, with the lowest-quality research methods (highest risk of bias) at the bottom and the highest-quality methods (lowest risk of bias) at the top. When reviewing your findings, place more weight on findings from methods at the top of the pyramid (e.g. previous A/B test results) than on those from methods at the bottom (e.g. competitor analysis).
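On the quantitative side, even a few lines of code over an analytics export can rank funnel steps by drop-off and point you at the worst-performing areas. Here is a minimal sketch; the step names and session counts are hypothetical placeholders, so substitute your own export.

```python
# Minimal sketch: ranking funnel steps by drop-off from an analytics export.
# The step names and session counts below are hypothetical placeholders.
funnel = [
    ("Product page", 50_000),
    ("Basket", 12_000),
    ("Delivery details", 7_500),
    ("Payment", 5_200),
    ("Order confirmation", 4_100),
]

for (step, sessions), (next_step, next_sessions) in zip(funnel, funnel[1:]):
    drop_off = 1 - next_sessions / sessions
    print(f"{step} -> {next_step}: {drop_off:.1%} drop-off")

print(f"Overall funnel conversion: {funnel[-1][1] / funnel[0][1]:.1%}")
```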

Step 2 – Prioritize areas that should be redesigned 

Once you have gathered your data and prioritized your findings based on quality of evidence, you should be able to see which areas to focus on first. You should also have an idea of how you might want to improve them. This is where the fun part comes in, and you can start brainstorming ideas. Collaboration is key here to ensure a range of potential solutions is considered. Try to get the perspective of designers, developers, and key stakeholders. Not only will you discover more ideas, but you will also save time as everyone will have context on the changes. 

It’s not only about design. A common mistake people make during a redesign is focusing purely on making the page look ‘prettier’ and not changing the content. Through research, you should have identified content that performs well and content that could do with an update. Make sure you consider this when brainstorming.

Step 3 – Pilot your redesign through a prototype 

Once you’ve come up with great ideas, it can be tempting to go ahead and launch them. But even if you are certain the new page will perform miles better than the original, you’d be surprised how often you’re wrong. Before you invest a lot of time and money into building your new page, it’s a good idea to get outside opinions from your target audience. The quickest way to do this is to build a prototype and get users to give feedback on it through user testing. See what their attention is drawn to, and whether there’s anything on the page they don’t like or think is missing. It’s much quicker to make these changes before launching than after. 

Step 4 – A/B test your redesign to know with statistical confidence whether it performs better

Now that you have done all this work conducting research, defining problem statements, coming up with hypotheses, ideating solutions, and getting feedback, you want to see whether your solution actually works better!

However, do not make the mistake of jumping straight into launching on your website. Yes, it will be quicker, but you will never be able to quantify the difference all of that work has made to your key metrics. You may see conversion rate increase, but how do you know that is due to the redesign and nothing else (e.g. a marketing campaign or special offer deployed around the same time)? Or worse, you see conversion rate decrease and automatically assume it must be down to the redesign when in fact it isn’t.  

With an A/B test you can rule out outside noise. For simplicity, imagine you have launched your redesign and, in reality, it made no difference, but a successful marketing campaign running at the same time drove an increase in conversion rate. If you had launched your redesign as an A/B test, you would see no difference between the control and the variant, as both would have been equally affected by the campaign. 

This is why it is crucial to A/B test your redesign. Not only will you be able to quantify the difference your redesign has made, you will also be able to tell whether that change is statistically significant, i.e. how unlikely it is that the difference you observed is due to random chance alone. This helps minimize the risk that redesigns often bring.  
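To illustrate what "statistically significant" means in practice, here is a minimal sketch of a two-proportion z-test on hypothetical control and variant numbers. Your testing tool will run its own, typically more sophisticated, analysis; this only shows the underlying idea.

```python
# Minimal sketch: two-proportion z-test on hypothetical A/B test results.
# Visitor and conversion counts are made up purely for illustration.
from math import sqrt, erf

def z_test(conv_a, n_a, conv_b, n_b):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return p_a, p_b, z, p_value

p_a, p_b, z, p_value = z_test(conv_a=410, n_a=10_000, conv_b=470, n_b=10_000)
print(f"Control: {p_a:.2%}  Variant: {p_b:.2%}  z = {z:.2f}  p = {p_value:.3f}")
# A p-value below your chosen threshold (commonly 0.05) is treated as
# statistically significant: the observed lift is unlikely under chance alone.
```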

Once you have your results you can then choose whether you want to launch the redesign to 100% of users, which you can do through the testing tool whilst you wait for the changes to be hardcoded. As the redesign has already been built for the A/B test, hardcoding it should be a lot quicker!

Step 5 – Evaluative research to validate how your redesign performs 

Research shouldn’t stop once the redesign has been launched. We recommend conducting post-launch analysis to evaluate how it performs over time. This especially helps measure metrics that have a longer lead time, such as returns or cancellations.

Redesigns are susceptible to visitor bias: rolling out a completely different experience can be jarring and uncomfortable for your returning visitors. They are also susceptible to novelty effects, where users react more positively just because something looks new and shiny. In either case, these effects wear off with time. That’s why it’s important to monitor performance after its deployment; a small sketch of one way to do this follows the list below.

Things to look out for: 

  • Bounce rate 
  • On-page metrics (scroll rate, click-throughs, heatmap, mouse tracking) 
  • Conversion rate 
  • Funnel progression 
  • Difference in performance for new vs. returning users 
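One simple way to spot a fading novelty effect is to track conversion rate for new vs. returning visitors week by week after launch. A minimal sketch, using hypothetical weekly aggregates:

```python
# Minimal sketch: weekly post-launch conversion rate, split new vs. returning.
# The aggregates below are hypothetical; pull the real ones from your analytics tool.
weekly = [
    # (week, segment, visitors, conversions)
    (1, "new", 8_000, 340), (1, "returning", 12_000, 620),
    (2, "new", 8_200, 348), (2, "returning", 11_800, 560),
    (3, "new", 7_900, 332), (3, "returning", 12_100, 520),
]

for week, segment, visitors, conversions in weekly:
    print(f"Week {week} ({segment}): {conversions / visitors:.2%}")

# A conversion rate that starts high and drifts down, especially among returning
# visitors, suggests the initial lift was partly a novelty effect.
```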

Redesigns are all about preparation. The process may seem exhaustive, but it should be for such a big change. Follow the right process and you can dramatically increase sales and conversions; get it wrong and you may waste serious time, effort, and money. Don’t skimp on the research, keep a user-centered approach, and you could create a website your audience loves.

If you want to find out more about how a redesign worked for a real customer of AB Tasty and REO, take a look at this webinar where La Redoute details how they tested the redesign of their site and sought continuous improvement.

How to Better Handle Collateral Effects of Experimentation: Dynamic Allocation vs Sequential Testing

When talking about web experimentation, the topics that often come up are learning and earning. However, it’s important to remember that a big part of experimentation is encountering risks and losses. Although losses can be a touchy topic, it’s important to talk about and destigmatize failed tests in experimentation because it encourages problem-solving, thinking outside of your comfort zone and finding ways to mitigate risk. 

Therefore, we will take a look at the shortcomings of classic hypothesis testing and look into other options. Basic hypothesis testing follows a rigid protocol: 

  • Creating the variation according to the hypothesis
  • Waiting a given amount of time 
  • Analyzing the result
  • Decision-making (implementing the variant, keeping the original, or proposing a new variant)

This rigid protocol and simple approach to testing says nothing about how to handle losses, which raises the question: what happens if something goes wrong? Additionally, the classic statistical tools used for analysis are not meant to be used before the end of the experiment.

As a very general rule of thumb, out of every 10 experiments, 8 will be neutral (show no real difference), one will be positive, and one will be negative. Classic hypothesis testing suggests you simply accept this as a collateral effect of the optimization process and hope it evens out in the long term. It can feel like crossing a street blindfolded.

For many, that may not cut it. Let’s take a look at two approaches that try to better handle this problem: 

  • Dynamic allocation – also known as “Multi-Armed Bandit” (MAB). This is where traffic allocation changes for each variation according to its performance, implicitly lowering the losses.
  • Sequential testing – a method that allows you to stop a test as soon as possible, given a risk aversion threshold.

These approaches are statistically sound, but they come with their own assumptions. We will go through their pros and cons within the context of web optimization.

First, we’ll look into the classic version of these two techniques and their properties and give tips on how to mitigate some of their problems and risks. Then, we’ll finish this article with some general advice on which techniques to use depending on the context of the experiment.

Dynamic allocation (DA)

Dynamic allocation’s main idea is to use statistical formulas that modify the number of visitors exposed to a variation depending on that variation’s performance. 

This means a poor-performing variation ends up receiving little traffic, which can be seen as a way to save conversions while still searching for the best-performing variation. The formulas ensure the best compromise between avoiding loss and finding the true best performer. However, this implies a lot of assumptions that are not always met, which makes DA a risky option. 
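The exact formulas vary by tool, but a common way to implement this idea is Thompson sampling on a Beta-Bernoulli model: each new visitor is routed to the variation that looks best when conversion rates are sampled from their posteriors. A minimal illustrative sketch (not any vendor's actual implementation):

```python
# Minimal sketch of dynamic allocation via Thompson sampling (Beta-Bernoulli).
# Counts are hypothetical and this is not any vendor's actual formula.
import random

variations = {
    "A": {"conversions": 40, "visitors": 1_000},
    "B": {"conversions": 55, "visitors": 1_000},
}

def choose_variation():
    # Sample a plausible conversion rate for each variation from its posterior
    # Beta(1 + conversions, 1 + non-conversions), then route the visitor to the
    # variation with the highest sampled rate.
    sampled = {
        name: random.betavariate(1 + s["conversions"],
                                 1 + s["visitors"] - s["conversions"])
        for name, s in variations.items()
    }
    return max(sampled, key=sampled.get)

# Over many visitors, the better-performing variation receives most of the
# traffic, which is what "saves" conversions during the experiment.
allocation = {"A": 0, "B": 0}
for _ in range(10_000):
    allocation[choose_variation()] += 1
print(allocation)
```

The catch, as discussed next, is that the allocation formula only "knows" about conversions that have already arrived.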

There are two main concerns, both of which are linked to the time aspect of the experimentation process: 

  • The DA formula does not take time into account 

If there is a noticeable delay between exposure to a variation and the conversion, the algorithm can go wrong: a visitor is counted as a ‘failure’ until they convert, so the period between a visit and a conversion is falsely recorded as a failure.

The DA formula then works with the wrong conversion information: any variation gaining traffic will automatically show a (false) performance drop, because the algorithm sees a growing number of not-yet-converted visitors, and traffic to that variation will be reduced.  

The reverse may also be true: a variation with decreasing traffic will no longer receive new visitors, while its existing visitors may eventually convert. Results would then indicate a (false) rise in conversions even though there are no new visitors, which is highly misleading.

DA gained popularity within the advertising industry, where the delay between an ad exposure and its potential conversion (a click) is short, which is why it works perfectly well in that context. Dynamic Allocation in CRO should therefore be used only in low-conversion-delay contexts.

In other words, DA should only be used in scenarios where visitors convert quickly. It’s not recommended for e-commerce except for short-term campaigns such as flash sales or when there’s not enough traffic for a classic AB test. It can also be used if the conversion goal is clicking on an ad on a media website.

  • DA and the different days of the week 

It’s very common to see different visitor behavior depending on the day of the week. Typically, customers may behave differently on weekends than during weekdays.  

With DA, you may be sampling days unevenly, implicitly giving more weight to some days for some variations. However, each day should carry the same weight because, in reality, every weekday occurs equally often. You should only use Dynamic Allocation if you know that the optimized KPI is not sensitive to fluctuations across the week.

The conclusion is that DA should be considered only when you expect too few total visitors for classic A/B testing. Another requirement is that the KPI under experimentation needs a very short conversion time and no dependence on the day of the week. Taking all this into account: Dynamic Allocation should not be used as a way to secure conversions.

Sequential Testing (ST)

Sequential Testing uses a specific statistical formula that enables you to stop an experiment early, depending on the performance of the variations, with given guarantees on the risk of false positives. 

The Sequential Testing approach is designed to secure conversions by stopping a variation as soon as its underperformance is statistically proven. 
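The exact formula differs between tools, but a simple illustration is a one-sided sequential probability ratio test (SPRT) that monitors a variation as visitors accumulate and stops it once underperformance against an assumed baseline rate is statistically established. The following is a rough sketch under simplified assumptions (known baseline rate, fixed minimal effect of interest), not the formula used by any particular CRO platform:

```python
# Minimal sketch: SPRT-style sequential check to stop a losing variation early.
# Simplified assumptions (not any specific CRO tool's formula): the baseline
# conversion rate is treated as known, and we test "variant converts 20% worse"
# against "variant matches the baseline".
import random
from math import log

P0 = 0.040                        # baseline (control) conversion rate, assumed known
P1 = 0.032                        # "losing" rate we want to detect (20% relative drop)
ALPHA, BETA = 0.05, 0.20          # tolerated false-positive / false-negative risk
UPPER = log((1 - BETA) / ALPHA)   # cross this -> declare the variant a loser
LOWER = log(BETA / (1 - ALPHA))   # cross this -> no evidence of underperformance

def monitor(visitor_outcomes):
    """visitor_outcomes: iterable of 1 (converted) / 0 (did not convert)."""
    llr = 0.0   # log-likelihood ratio of "losing" vs. "matches baseline"
    for i, converted in enumerate(visitor_outcomes, start=1):
        llr += log(P1 / P0) if converted else log((1 - P1) / (1 - P0))
        if llr >= UPPER:
            return f"stop: variant is underperforming (after {i} visitors)"
        if llr <= LOWER:
            return f"no evidence of underperformance (after {i} visitors)"
    return "keep collecting data"

# Example with simulated traffic that really does convert at the losing rate:
random.seed(42)
print(monitor(1 if random.random() < P1 else 0 for _ in range(50_000)))
```

In line with the hybrid approach discussed below, a rule like this would only be used to stop losing variations; winning variations run to the planned end of the test.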

However, it still has some limitations. The effect size estimate may be wrong in two ways: 

  • Bad variations will be seen as worse than they really are. This is not a problem in CRO because the false-positive risk is still guaranteed: in the worst case, you discard not a strictly losing variation but perhaps an even one, which still makes sense in CRO.
  • Good variations will be seen as better than they really are. This may be a problem in CRO, since not all winning variations are useful for the business, and the effect size estimate is key to decision-making. It can easily be mitigated by using sequential testing to stop losing variations only. Winning variations, for their part, should run until the planned end of the experiment, ensuring both a correct effect size estimate and even sampling of each day of the week.
    It’s important to note that not all CRO software uses this hybrid approach. Most use ST to stop both winning and losing variations, which, as we’ve just seen, is wrong.

As we’ve seen, by stopping a losing variation in the middle of the week, there’s a risk you may be discarding a possible winning variation. 

However, for a variation to turn out to be a winner after ST has shown it is underperforming, it would first have to perform well enough to pull even with the reference, then well enough to outperform it, and all of that within a few days. This scenario is highly unlikely.

Therefore, it’s safe to stop a losing variation with Sequential Testing, even if all weekdays haven’t been evenly sampled.

The best of both worlds in CRO 

Dynamic Allocation is a better approach than static allocation when you expect a small volume of traffic, but it should be used only with ‘short-delay’ KPIs and no known weekday effect (for example, flash sales). It is not a way to mitigate risk in a CRO strategy.

To run experiments with all the needed guarantees, you need a hybrid system: Sequential Testing to stop losing variations, and a classic fixed-horizon analysis for winning variations. This method gives you the best of both worlds.