If your website traffic numbers aren’t as high as you may hope for, that’s no reason to give up on your conversion rate optimization (CRO) goals.
By now you must have noticed that most CRO advice is tailored for high-traffic websites. Luckily, this doesn’t mean you can’t optimize your website even if you have lower traffic.
The truth is, any website can be optimized – you just need to tailor your optimization strategy to suit your unique situation.
In this article, we will cover:
- The most common complaint about CRO (95% threshold and where it comes from)
- An appropriate threshold for low traffic
- Ideas on how to optimize your website with a smaller traffic flow
- The CUPED testing technique
- When CUPED works and doesn’t work
In order to make this article easier to understand, let’s start with an analogy. Imagine that instead of measuring two variants and picking a winner, we are measuring the performance of two boxers and placing bets on who will win the next 10 rounds.
So, how will we place our bet on who will win?
Imagine that boxer A and boxer B are both newbies that no one knows. After the first round, you have to make your choice. In the end, you will most likely place your bet on the boxer who won the first round. It might be risky if the winning margin is small, but in the end, you have no other way to base your decision.
Imagine now that boxer A is known to be a champion, and boxer B is a challenger that you don’t know. Your knowledge about boxer A is what we would call a prior – information you have before that influences your decision.
Based on the prior, you will be more likely to bet on boxer A as the champion for the next few rounds, even if boxer B wins the first round with a very small margin.
Furthermore, you will only choose boxer B as your predicted champion if they win the first round by a large margin. The stronger your prior, the larger the margin needs to be in order to convince you to change your bet.
Are you following? If so, the following paragraphs will be easy to grasp and you will understand where this “95% threshold” comes from.
Now, let’s move on to tips for optimizing your website with low traffic.
1. Solving the problem: “I never reach the 95% significance”
This is the most common complaint about CRO for websites with lower traffic and for lower traffic pages on bigger websites.
Before we dig into this most common problem, let’s start by answering the question, where does this 95% “golden rule” come from?
The origin of the 95% threshold
Let’s start our explanation with a very simple idea: What if optimization strategies were applied from day one? If two variants with no previous history were created at the same time, there would be no “original” version challenged by a newcomer.
This would force you to choose the best one from the beginning.
In this setting, any small difference in performance could be measured for decision-making. After a short test, you will choose the variant with the higher performance. It would not be good practice to pick the variant that had lower performance and furthermore, it would be foolish to wait for a 95% threshold to pick a winner.
But in practice, optimization is done well after the launch of a business.
So, in most real-life situations, there is a version A that already exists and a new challenger (version B) that is created.
If the new challenger, version B, comes along and the performance difference between the two variants is not significant, you will have no issues declaring version B “not a winner.”
Statistical tests are symmetric. So if we reverse the roles, swapping A and B in the statistical test will tell you that the original is not significantly better than the challenger. The “inconclusiveness” of the test is symmetric.
So, why do you set 100% of traffic toward the original at the end of an inconclusive test, implicitly declaring A as a winner? Because you have three priors:
- Version A was the first choice. This choice was made by the initial creator of the page.
- Version A has already been implemented and technically trusted. Version B is typically a mockup.
- Version A has a lot of data to prove its value, whereas B is a challenger with limited data that is only collected during the test period.
Points 1 & 2 are the bases of a CRO strategy, so you will need to go beyond these two priors. Point 3 explains that version A has more data to back its performance. This explains why you trust version A more than version B. Version A has data.
Now you understand that this 95% confidence rule is a way of explaining a strong prior. And this prior mostly comes from historical data.
Therefore, when optimizing a page with low traffic, your decision threshold should be below 95% because your prior on A is weaker due to its traffic and seniority.
The threshold should be set according to the volume of traffic that went through the original from day one. However, the problem with this approach is that we know that the conversion rates are not stable and can change over time. Think of seasonality – i.e. black Friday rush, vacation days, Christmas time increases in activity, etc. Because of the seasonal changes, you can’t compare performances in different periods.
This is why practitioners only take into account data for version A and version B taken at the same period of time and set a high threshold (95%) to accept the challenger as a winner in order to formalize a strong prior toward version A.
What is the appropriate threshold for low traffic?
It’s hard to suggest an exact number to focus on because it depends on your risk acceptance.
According to the hypothesis protocol, you should structure a time frame for the data collection period in advance.
This means that the “stop” criteria of a test are not a statistical measure or based on a certain number. The “stop” criteria should be a timeframe coming to an end. Once the period is over, then you should look at the stats to make an appropriate decision.
AB Tasty, our customer experience optimization and feature management software, uses the Bayesian framework which produces a “chances to win” index which encourages a direct interpretation instead of a p-value, which has a very complex meaning.
In other words, the “chances to win index” is the probability for a given variation to be better than the original.
Therefore, a 95% “chance to win” means that there is a 95% probability that the given variation will be the winner. This is assuming that we don’t have any prior knowledge or specific trust for the original.
The 95% threshold itself is also a default compromise between the prior you have on the original and a given level of risk acceptance (it could have even been a 98% threshold).
Although it is hard to give an exact number, let’s make a rough scale for your threshold:
- New A & B variations: If you have a case where variation A and variation B are both new, the threshold could be as low as 50%. If there is no past data on the variations’ performance and you must make a choice for implementation, even a 51% chance to win is better than 49%.
- New website, low traffic: If your website is new and has very low traffic, you likely have very little prior on variation A (the original variation in this case). In that case, setting 85% as a threshold is reasonable. Since it means that if you put aside the little you know about the original you still have 85% to pick up the winner and only 15% to pick a variation that is equivalent to the original, and a lesser chance that it performs worse. So depending on the context, such a bet can make sense.
- Mature business, low traffic: If your business has a longer history, but still lower traffic, 90% is a reasonable threshold. This is because there is still little prior on the original.
- Mature business, high traffic: Having a lot of prior, or data, on variation A suggests a 95% threshold.
The original 95% threshold is far too high if your business has low traffic because there’s little chance that you will reach it. Consequently, your CRO strategy will have no effect and data-driven decision-making becomes impossible.
By using AB Tasty as your experimentation platform, you will be given a report that includes the “chance to win” along with other statistical information regarding your web experiments. A report from AB Tasty would also include the confidence interval on the estimated gain as an important indicator. The boundaries around the estimated gain are also computed in a Bayesian way, which means it can be interpreted as the best and the worst scenario.
The importance of Bayesian statistics
Now you understand the exact meaning of the well-known 95% “significance” level and are able to select appropriate thresholds corresponding to your particular case.
It’s important to remember that this approach only works with Bayesian statistics since frequentist approaches give statistical indices (such as p-Values and confidence intervals that have a totally different meaning and are not suited to the explained logic).
2. Are the stats valid with small numbers?
Yes, they are valid as long as you do not stop the test depending on the result.
Remember the testing protocol says once you decide on a testing period, the only reason to stop a test is when the timeframe has ended. In this case, the stat indices (“chances to win” & confidence interval) are true and usable.
You may be thinking: “Okay, but then I rarely reach the 95% significance level…”
Remember that the 95% threshold doesn’t need to be the magic number for all cases. If you have low traffic, chances are that your website is not old. If you refer back to the previous point, you can take a look at our suggested scale for different scenarios.
If you’re dealing with lower traffic as a newer business, you can certainly switch to a lower threshold (like 90%). The threshold is still higher because it’s typical to have more trust in an original rather than a variant because it’s used for a longer time.
If you’re dealing with two completely new variants, at the end of your testing period, it will be easier to pick the variant with the higher conversions (without using a stat rest) since there is no prior knowledge of the performance of A or B.
3. Go “upstream”
Sometimes the traffic problem is not due to a low-traffic website, but rather the webpage in question. Typically, pages with lower traffic are at the end of the funnel.
In this case, a great strategy is to work on optimizing the funnel closer to the user’s point of entry. There may be more to uncover with optimization in the digital customer journey before reaching the bottom of the funnel.
4. Is the CUPED technique real?
What is CUPED?
Controlled Experiment Using Pre-Experiment Data is a newer buzzword in the experimentation world. CUPED is a technique that claims to produce up to 50% faster results. Clearly, this is very appealing to small-traffic websites.
Does CUPED really work that well?
Not exactly, for two reasons: one is organizational and the other is applicability.
The organizational constraint
What’s often forgotten is that CUPED means Controlled experiment Using Pre-Experiment Data.
In practice, the ideal period of “pre-experiment data” is two weeks in order to hope for a 50% time reduction.
So, for a 2-week classic test, CUPED claims that you can end the test in only 1 week.
However, in order to properly see your results, you will need two weeks of pre-experiment data. So in fact, you must have three weeks to implement CUPED in order to have the same accuracy as a classic 2-week test.
Yes, you are reading correctly. In the end, you will need three weeks time to run the experiment.
This means that it is only useful if you already have two weeks of traffic data that is unexposed to any experiment. Even if you can schedule two weeks of no experimentations into your experimentation planning to collect data, this will be blocking traffic for other experiments.
The applicability constraint
In addition to the organizational/2-week time constraint, there are two other prerequisites in order for CUPED to be effective:
- CUPED is only applicable to visitors browsing the site during both the pre-experiment and experiment periods.
- These visitors need to have the same behavior regarding the KPI under optimization. Visitors’ data must be correlated between the two periods.
You will see in the following paragraph that these two constraints make CUPED virtually impossible for e-commerce websites and only applicable to platforms.
Let’s go back to our experiment settings example:
- Two weeks of pre-experiment data
- Two weeks of experiment data (that we hope will only last one week as there is a supposed 50% time reduction)
- The optimization goal is a transaction: raising the number of conversions.
Constraint number 1 states that we need to have the same visitors in pre-experiment & experiment, but the visitor’s journey in e-commerce is usually one week.
In other words, there is very little chance that you see visitors in both periods. In this context, only a very limited effect of CUPED is to be expected (up to the portion of visitors that are seen in both periods).
Constraint number 2 states that the visitors must have the same behavior regarding the conversion (the KPI under optimization). Frankly, that constraint is simply never met in e-commerce.
The e-commerce conversion occurs either during the pre-experiment or during the experiment but not in both (unless your customer frequently purchases several times during the experiment time).
This means that there is no chance that the visitors’ conversions are correlated between the periods.
In summary: CUPED is simply not applicable for e-commerce websites to optimize transactions.
It is clearly stated in the original scientific paper, but for the sake of popularity, this buzzword technique is being misrepresented in the testing industry.
In fact, and it is clearly stated in scientific literature, CUPED works only on multiple conversions for platforms that have recurring visitors performing the same actions.
Great platforms for CUPED would be search engines (like Bing, where it has been invented) or streaming platforms where users come daily and do the same repeated actions (playing a video, clicking on a link in a search result page, etc).
Even if you try to find an application of CUPED for e-commerce, you’ll find out that it’s not possible.
- One may say that you could try to optimize the number of products seen, but the problem of constraint 1 still applies: a very little number of visitors will be present on both datasets. And there is a more fundamental objection – this KPI should not be optimized on its own, otherwise you are potentially encouraging hesitation between products.
- You cannot even try to optimize the number of products ordered by visitors with CUPED because constraint number 2 still holds. The act of purchase can be considered as instantaneous. Therefore, it can only happen in one period or the other – not both. If there is no visitor behavior correlation to expect then there is also no CUPED effect to expect.
Conclusion about CUPED
CUPED does not work for e-commerce websites where a transaction is the main optimization goal. Unless you are Bing, Google, or Netflix — CUPED won’t be your secret ingredient to help you to optimize your business.
This technique is surely a buzzword spiking interest fast, however, it’s important to see the full picture before wanting to add CUPED into your roadmap. E-commerce brands will want to take into account that this testing technique is not suited for their business.
Optimization for low-traffic websites
Brands with lower traffic are still prime candidates for website optimization, even though they might need to adapt to a less-than-traditional different approach.
Whether optimizing your web pages means choosing a page that’s higher up in the funnel or adopting a slightly lower threshold, continuous optimization is crucial.
Want to start optimizing your website? AB Tasty is the best-in-class experience optimization platform that empowers you to create a richer digital experience – fast. From experimentation to personalization, this solution can help you activate and engage your audience to boost your conversions.