
Frequentist vs Bayesian Methods in A/B Testing


When you’re running A/B tests, you’re making a choice—whether you know it or not.

Two statistical methods power how we interpret test results: Frequentist vs Bayesian A/B testing. The debates are fierce. The stakes are real. And at AB Tasty, we’ve picked our side.

If you’re shopping for an A/B testing platform, new to experimentation, or just trying to make sense of your results, understanding these methods matters. It’s the difference between guessing and knowing. Between implementing winners and chasing false positives.

Let’s break it down.


What is Inferential Statistics?

Both Frequentist and Bayesian methods live under the umbrella of inferential statistics.

Unlike descriptive statistics—which simply describes what already happened—inferential statistics help you forecast what’s coming. They let you extrapolate results from a sample to a larger population.

Here’s the question we’re answering: Would version A or version B perform better when rolled out to your entire audience?

A Quick Example

Let’s say you’re studying Olympic swimmers. With descriptive statistics, you could calculate:

  • Average height of the team
  • Height variance across athletes
  • Distribution above or below average

That’s useful, but limited.

Inferential statistics let you go further. Want to know the average height of all men on the planet? You can’t measure everyone. But you can infer that average from smaller, representative samples.

That’s where Frequentist vs Bayesian methods come in. Both help you make predictions from incomplete data—but they do it differently, especially when applied to A/B testing.

What is the Frequentist Statistics Method in A/B Testing?

The Frequentist approach is the classic. You’ve probably seen it in college stats classes or in most A/B testing tools.

This is one of the main Frequentist vs Bayesian A/B testing comparisons: Frequentist statistics focus on long-run frequencies and fixed hypotheses.

Here’s how it works:

The Hypothesis

You start by assuming there is no difference between version A and version B. This is called the null hypothesis.

At the end of your test, you get a P-Value (probability value). The P-Value tells you the probability of seeing your results—or more extreme results—if there really is no difference between your variations. In other words, how likely is it that your results happened by chance?

The smaller the P-Value, the more confident you can be that there’s a real difference between your A/B testing variations.
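To make the mechanics concrete, here is a minimal, illustrative sketch of the classic calculation behind many Frequentist tools: a two-proportion z-test. This is not AB Tasty’s exact implementation, and the visitor numbers are made up for the example.

```python
from math import sqrt, erfc

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the null hypothesis: A and B convert at the same rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pooled = (conv_a + conv_b) / (n_a + n_b)            # best estimate under "no difference"
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se                                   # standardized observed difference
    return erfc(abs(z) / sqrt(2))                          # probability of a difference at least this extreme

# Hypothetical experiment: 5,000 visitors per variation
p_value = two_proportion_p_value(conv_a=250, n_a=5000, conv_b=290, n_b=5000)
print(f"p-value = {p_value:.3f}")   # smaller = stronger evidence of a real difference
```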

What is the Bayesian Statistics Method in A/B Testing?

The Bayesian approach takes a different route—and we think it’s a smarter one for many A/B testing scenarios.

Bayes’ theorem formula: P(A | B) = P(B | A) × P(A) / P(B)

Named after British mathematician Thomas Bayes, this method allows you to incorporate prior information (a “prior”) into your analysis. It’s built around three overlapping concepts:

The Three Pillars of Bayesian Analysis

  • Prior: Information from previous experiments. At the start, we use a “non-informative” prior—essentially a blank slate.
  • Evidence: The data from your current experiment.
  • Posterior: Updated information combining the prior and evidence. This is your result.
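In terms of the formula above, the posterior is simply the prior reweighted by how well each hypothesis explains the evidence:

P(hypothesis | evidence) ∝ P(evidence | hypothesis) × P(hypothesis)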

Here’s the game-changer: Bayesian A/B testing is designed for ongoing experiments.  Every time you check your data, the previous results become the “prior,” and new incoming data becomes the “evidence.”

That means data peeking is built into the design. Each time you look, the analysis is valid.

Even better? Bayesian statistics let you estimate the actual gain of a winning variation—not just that it won—making Frequentist vs Bayesian methods in A/B testing very different from a decision-making perspective.
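To make the prior/evidence/posterior loop concrete, here is an illustrative sketch using a Beta-Binomial model. This is a common way to implement Bayesian A/B testing, not necessarily AB Tasty’s exact engine, and the visitor counts are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Non-informative prior: Beta(1, 1), i.e. a blank slate
prior_alpha, prior_beta = 1, 1

# Evidence from the current experiment (hypothetical numbers)
conv_a, n_a = 250, 5000   # original
conv_b, n_b = 290, 5000   # variation

# Posterior = prior updated with the evidence (sampled here for easy comparison)
post_a = rng.beta(prior_alpha + conv_a, prior_beta + n_a - conv_a, size=200_000)
post_b = rng.beta(prior_alpha + conv_b, prior_beta + n_b - conv_b, size=200_000)

prob_b_beats_a = (post_b > post_a).mean()
relative_gain = (post_b - post_a) / post_a          # gain of B over A, in relative terms

low, median, high = np.percentile(relative_gain, [2.5, 50, 97.5])
print(f"P(B > A) = {prob_b_beats_a:.1%}")
print(f"Median gain = {median:+.1%}, 95% credible interval = [{low:+.1%}, {high:+.1%}]")
```

Because the posterior from one look becomes the prior for the next, re-running this update whenever new data arrives is exactly the peeking workflow described above, and the credible interval on the relative gain is the kind of range shown in the reporting example below.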

Bayesian Pros

  • Peek freely: Check your data during a test without compromising accuracy. Stop losing variations early or switch to winners faster.
  • See the gain: Know the actual improvement range, not just which version won.
  • Fewer false positives: The method naturally rules out many misleading results in A/B testing.

Bayesian Cons

  • More computational power: Requires a sampling loop, which demands more CPU load at scale (though this doesn’t affect users).

Frequentist vs Bayesian A/B Testing: The Comparison

Let’s be clear: both methods are statistically valid. But when you compare Frequentist vs Bayesian A/B testing, the practical implications are very different.

At AB Tasty, we have a clear preference for the Bayesian A/B testing approach.

Here’s why.

Gain Size Matters

With Bayesian A/B testing, you don’t just know which version won—you know by how much.

This is critical in business. When you run an A/B test, you’re deciding whether to switch from version A to version B.

That decision involves:

  • Implementation costs (time, resources, budget)
  • Associated costs (vendor licenses, maintenance)

Example: You’re testing a chatbot on your pricing page. Version B (with chatbot) outperforms version A. But implementing version B requires two weeks of developer time plus a monthly chatbot license.

You need to know if the math adds up. Bayesian statistics give you that answer by quantifying the gain from your A/B testing experiment.

Real Example from AB Tasty Reporting

Let’s look at a test measuring three variations against an original, with “CTA clicks” as the KPI.

[Screenshot: AB Tasty reporting dashboard showing transaction rates and growth metrics across four variations, with a performance trend graph.]

Variation 3 wins with a 34.1% conversion rate (vs. 25% for the original).

But here’s where it gets interesting:

  • Median gain: +36.4%
  • Lowest possible gain: +2.25%
  • Highest possible gain: +48.40%

In 95% of cases, your gain will fall between +2.25% and +48.40%.
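As a quick sanity check, the median gain lines up with the conversion rates shown above: (34.1% − 25%) / 25% ≈ +36.4%.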

This granularity helps you decide whether to roll out the winner:

  • Both ends positive? Great sign.
  • Narrow interval? High confidence. Go for it.
  • Wide interval but low implementation cost? Probably safe to proceed.
  • Wide interval with high implementation cost? Wait for more data.

This is a concrete illustration of how Frequentist vs Bayesian methods in A/B testing lead to different levels of decision-making insight.

When to Trust Your Results?

At AB Tasty, we recommend waiting until you’ve hit these benchmarks:

  • At least 5,000 unique visitors per variation
  • Test runs for at least 14 days (two business cycles)
  • 300 conversions on your main goal

These thresholds apply regardless of whether you use a Frequentist or Bayesian method, but Bayesian A/B testing gives you more interpretable outputs once you reach them.
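If you want to automate these checks in your own reporting, a minimal sketch could look like this. The thresholds are the recommendations above; the function name and data are hypothetical.

```python
def test_is_ready(visitors_per_variation, days_running, conversions_on_main_goal):
    """Return True once all of the recommended reliability thresholds are met."""
    return (
        min(visitors_per_variation) >= 5_000   # at least 5,000 unique visitors per variation
        and days_running >= 14                 # at least two business cycles
        and conversions_on_main_goal >= 300    # at least 300 conversions on the main goal
    )

print(test_is_ready(visitors_per_variation=[5200, 5150], days_running=15,
                    conversions_on_main_goal=340))   # True
```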

Data Peeking: A Bayesian Advantage

Here’s a scenario: You’re running an A/B test for a major e-commerce promotion. Version B is tanking—losing you serious money.

With Bayesian A/B testing, you can stop it immediately. No need to wait until the end.

Conversely, if version B is crushing it, you can switch all traffic to the winner earlier than with Frequentist methods.

This is the logic behind our Dynamic Traffic Allocation feature—and it wouldn’t be possible without Bayesian statistics.

How Does Dynamic Traffic Allocation Work?

Dynamic Traffic Allocation balances exploration (gathering data) with exploitation (maximizing conversions).

[Screenshot: AB Tasty traffic allocation interface with slider controls and a pie chart showing the test split between the original and variations.]

In practice, you simply:

  • Check the Dynamic Traffic Allocation box.
  • Pick your primary KPI.
  • Let the algorithm decide when to send more traffic to the winner.

This approach shines when:

  • Testing micro-conversions over short periods
  • Running time-limited campaigns (holiday sales, flash promotions)
  • Working with low-traffic pages
  • Testing 6+ variations simultaneously

Again, this is where Frequentist vs Bayesian methods in A/B testing diverge: Frequentist statistics are not naturally designed for safe continuous monitoring and dynamic allocation in the same way.
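AB Tasty doesn’t publish the internals of Dynamic Traffic Allocation, but the exploration/exploitation trade-off it describes is typically handled by a Bayesian “bandit” algorithm. Here is a minimal Thompson-sampling sketch under that assumption; the variation names, conversion rates, and traffic numbers are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

class ThompsonAllocator:
    """Bayesian bandit: route more traffic to variations that look like winners."""

    def __init__(self, n_variations):
        self.successes = np.ones(n_variations)   # Beta(1, 1) prior for each variation
        self.failures = np.ones(n_variations)

    def choose(self):
        # Sample a plausible conversion rate per variation, pick the best one
        samples = rng.beta(self.successes, self.failures)
        return int(np.argmax(samples))

    def update(self, variation, converted):
        if converted:
            self.successes[variation] += 1
        else:
            self.failures[variation] += 1

# Simulated campaign: variation 1 truly converts better (6% vs 4%)
true_rates = [0.04, 0.06]
allocator = ThompsonAllocator(n_variations=2)
for _ in range(20_000):
    v = allocator.choose()
    allocator.update(v, converted=rng.random() < true_rates[v])

traffic_share = (allocator.successes + allocator.failures - 2) / 20_000
print("Traffic share per variation:", np.round(traffic_share, 2))
```

In the simulation, most of the traffic ends up on the genuinely better variation, which is the behavior you want in a time-limited campaign.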

Bayesian False Positives Explained

A false positive occurs when test results suggest version B improves performance—but in reality, it doesn’t. Usually, version B simply performs the same as version A rather than worse.

False positives happen with both Frequentist and Bayesian methods in A/B testing. But here’s the difference:

How Does Bayesian Testing Limit False Positives?

Because Bayesian A/B testing provides a gain interval, you’re less likely to implement a false positive in the first place.

Example: Your test shows version B wins with 95% confidence, but the median improvement is only 1%. Even if this is a false positive, you probably won’t implement it—the resources needed don’t justify such a small gain.

With Frequentist methods, you don’t see the gain interval. You might implement that false positive, wasting time and energy on changes that bring zero return.

[Chart: Gain probability using Bayesian statistics]

The standard rule of thumb is 95% confidence—you’re 95% sure version B performs as indicated, with a 5% risk it doesn’t.

For most campaigns, 95% confidence works just fine. But when the stakes are high—think major product launches or business-critical tests—you can dial up your confidence threshold to 97%, 98%, or even 99%.

Just know this: whether you’re using Frequentist or Bayesian methods, higher confidence means you’ll need more time and traffic to reach statistical significance. It’s a trade-off worth making when precision matters most.

While this seems like a safe bet – and it is the right choice for high-stakes campaigns – it’s not something to apply across the board.

This is because:

  • To reach a higher threshold, you’ll have to wait longer for results, leaving you less time to reap the rewards of a positive outcome.
  • You will implicitly only declare winners with bigger gains (which are rarer), letting go of smaller improvements that could still be impactful.
  • If your web page receives relatively little traffic, you may want to consider a different approach.

Conclusion

So which is better—Frequentist or Bayesian?

Both are sound statistical methods. But when you look at Frequentist vs Bayesian methods in A/B testing, we’ve chosen the Bayesian approach because it helps teams make better business decisions.

Here’s what you get:

  • Flexibility: Peek at data without compromising accuracy.
  • Actionable insights: Know the gain size, not just the winner.
  • Maximized returns: Dynamic Traffic Allocation optimizes automatically.
  • Fewer false positives: Built-in safeguards against misleading results.

When you’re shopping for an A/B testing platform, find one that gives you results you can trust—and act on.

Want to see Bayesian A/B testing in action? AB Tasty makes it easy to set up tests, gather insights via an ROI dashboard, and determine which changes will increase your revenue. 

Ready to go further? Let’s build better experiences together →

FAQs

What’s the main difference between Bayesian and Frequentist A/B testing?

When you compare Frequentist vs Bayesian methods in A/B testing, Frequentist methods test whether there’s a difference between variations using a P-Value at the end of the experiment. Bayesian methods estimate the size of the gain and let you update results continuously as new data comes in.

Can I peek at my A/B test results early?

With Bayesian A/B testing, yes. The method is designed for ongoing analysis. With Frequentist methods, peeking early creates misleading results because it effectively turns one experiment into multiple experiments.

What is a false positive in A/B testing?

A false positive occurs when test results suggest version B improves performance, but in reality it doesn’t. Bayesian methods help limit false positives by showing the gain interval, making it less likely you’ll implement a variation with minimal or no real improvement.

What confidence level should I use for my A/B tests?

95% confidence is standard for most marketing campaigns. For high-stakes A/B testing, you can increase to 97%, 98%, or 99%—but this requires more time and traffic to reach statistical significance, regardless of whether you use Frequentist or Bayesian methods.

How long should I run my A/B test?

At AB Tasty, we recommend running tests for at least 14 days (two business cycles) and collecting at least 5,000 unique visitors per variation and 300 conversions on your main goal. These benchmarks help both Frequentist and Bayesian approaches produce reliable insights.

What is Dynamic Traffic Allocation?

Dynamic Traffic Allocation is an automated feature that balances data exploration with conversion maximization in A/B testing. Once the algorithm identifies a winning variation with confidence, it automatically sends more traffic to that version—helping you maximize returns while still gathering reliable data using Bayesian methods.


16 Experimentation Influencers You Should Follow

Building a culture of experimentation requires an appetite for iteration, a fearless approach to failure and a test-and-learn mindset. The 1000 Experiments Club podcast digs into all of that and more with some of the most influential voices in the industry. 

From CEOs and Founders to CRO Managers and more, these experts share the lessons they’ve learned throughout their careers in experimentation at top tech companies and insights on where the optimization industry is heading. 

Whether you’re an A/B testing novice or a seasoned pro, here are some of our favorite influencers in CRO and experimentation that you should follow:

Ronny Kohavi

Ronny Kohavi, a pioneer in the field of experimentation, brings over three decades of experience in machine learning, controlled experiments, AI, and personalization.

He was a Vice President and Technical Fellow at Airbnb. Prior to that, he was Technical Fellow and Corporate Vice President at Microsoft, where he led the analysis and experimentation team (ExP). Before that, he was Director of Personalization and Data Mining at Amazon.

Ronny teaches an online interactive course on Accelerating Innovation with A/B Testing, which has been attended by over 800 students.

Ronny’s work has helped lay the foundation for modern online experimentation, influencing how some of the world’s biggest companies approach testing and decision-making.

He advocates for a gradual rollout approach over the typical 50/50 split at launch:

“One thing that turns out to be really useful is to start with a small ramp-up. Even if you plan to go to 50% control and 50% treatment, start at 2%. If something egregious happens—like a metric dropping by 10% instead of the 0.5% you’re monitoring for—you can detect it in near real time.”

This slow ramp-up helps teams catch critical issues early and protect user experience.

Follow Ronny

Talia Wolf

Talia Wolf is a conversion optimization specialist and founder & CEO of Getuplift, where she helps businesses boost revenue, leads, engagement, and sales through emotional targeting, persuasive design, and behavioral data.

She began her career at a social media agency, where she was introduced to CRO, then served as Marketing Director at monday.com before launching her first agency, Conversioner, in 2013.

Talia teaches companies to optimize their online presence using emotionally-driven strategies. She emphasizes that copy and visuals should address customers’ needs rather than focusing solely on the product.

For Talia, emotional marketing is inherently customer-centric and research-based. From there, experiments can be built into A/B testing platforms using a clear North Star metric—whether checkouts, sign-ups, or add-to-carts—to validate hypotheses and drive growth.

Follow Talia

Elissa Quinby

Elissa Quinby is the Head of Product Marketing at e-commerce acceleration platform Pattern, with a career rooted in retail, marketing, and customer experience.

Before joining Pattern, she led retail marketing as Senior Director at Quantum Metric. She began her career as an Assistant Buyer at American Eagle Outfitters, then spent two years at Google as a Digital Marketing Strategist. Elissa went on to spend eight years at Amazon, holding roles across marketing, program management, and product.

Elissa emphasizes the importance of starting small to build trust with new customers. “The goal is to offer value in exchange for data,” she explains, pointing to first-party data as the “secret sauce” behind many successful companies.

She encourages brands to experiment with creative ways of gathering customer information—always with trust at the center—so they can personalize experiences and deepen customer understanding over time.

Follow Elissa

Lukas Vermeer

Lukas Vermeer, Director of Experimentation at Vista, is an expert in designing, implementing, and scaling experimentation programs. He previously spent over eight years at Booking.com, where he held roles as a product manager, data scientist, and ultimately Director of Experimentation.

With a background in machine learning and AI, Lukas specializes in building the infrastructure and processes needed to scale testing and drive business growth. He also consults with companies to help them launch and accelerate their experimentation efforts.

Given today’s fast-changing environment, Lukas believes that roadmaps should be treated as flexible guides rather than rigid plans:
“I think roadmaps aren’t necessarily bad, but they should acknowledge the fact that there is uncertainty. The deliverable should be clarifications of that uncertainty, rather than saying, ‘In two months, we’ll deliver feature XYZ.’”

Instead of promising final outcomes, Lukas emphasizes embracing uncertainty to make better, data-informed decisions.

Follow Lukas

Jonny Longden

Jonny Longden is the Chief Growth Officer at Speero, with over 17 years of experience improving websites through data and experimentation. He previously held senior roles at Boohoo Group, Journey Further, Sky, and Visa, where he led teams across experimentation, analytics, and digital product.

Jonny believes that smaller companies and startups—especially in their early, exploratory stages—stand to benefit the most from experimentation. Without testing, he argues, most ideas are unlikely to succeed.

“Without experimentation, your ideas are probably not going to work,” Jonny says. “The things that seem obvious often don’t deliver results, and the ideas that seem unlikely or even a bit silly can sometimes have the biggest impact.”

For Jonny, experimentation isn’t just a tactic—it’s the only reliable way to uncover what truly works and drive meaningful, data-backed progress.

Follow Jonny

Ruben de Boer

Ruben de Boer is a Lead CRO Manager at Online Dialogue and founder of Conversion Ideas, with over 14 years of experience in data and optimization.

At Online Dialogue, he leads the team of Conversion Managers—developing skills, maintaining quality, and setting strategy and goals. Through his company, Conversion Ideas, Ruben helps people launch their careers in CRO and experimentation by offering accessible, high-quality courses and resources.

Ruben believes experimentation shouldn’t be judged solely by outcomes. “Roughly 25% of A/B tests result in a winner, meaning 75% of what’s built doesn’t get released—and that can feel like failure if you’re only focused on output,” he explains.

Instead, he urges teams to shift their focus to customer-centric insights. When the goal becomes understanding the user—not just releasing features—the entire purpose of experimentation evolves.

Follow Ruben

David Mannheim

David Mannheim is a digital experience strategist with over 15 years of expertise helping brands like ASOS, Sports Direct, and Boots elevate their conversion strategies.

He is the CEO and founder of Made With Intent, focused on advancing innovative approaches to personalization through AI. Previously, he founded User Conversion, which became one of the UK’s largest independent CRO consultancies.

David recently authored a book exploring what he calls the missing element in modern personalization: the person. “Remember the first three syllables of personalization,” he says. “That often gets lost in data.”

He advocates for shifting focus from short-term gains to long-term customer value—emphasizing metrics like satisfaction, loyalty, and lifetime value over volume-based wins.

“More quality than quantity,” David explains, “and more recognition of the intangibles—not just the tangibles—puts brands in a much better place.”

Follow David

Marianne Stjernvall

Marianne Stjernvall has over a decade of experience in CRO and experimentation, having executed more than 500 A/B tests and helped over 30 organizations grow their testing programs.

Marianne is the founder of Queen of CRO and co-founder of ConversionHub, Sweden’s most senior CRO agency. As an established CRO consultant, she helps organizations build experimentation-led cultures grounded in data and continuous learning.

Marianne also teaches regularly, sharing her expertise on the full spectrum of CRO, A/B testing, and experimentation execution.

She stresses the importance of a centralized testing approach:

“If each department runs experiments in isolation, you risk making decisions based on three different data sets, since teams will be analyzing different types of data. Having clear ownership and a unified framework ensures the organization works cohesively with tests.”

Follow Marianne

Ben Labay

Ben Labay is the CEO of Speero, blending academic rigor in statistics with deep expertise in customer experience and UX.

Holding degrees in Evolutionary Behavior and Conservation Research Science, Ben began his career as a staff researcher at the University of Texas, specializing in data modeling and research.

This foundation informs his work at Speero, where he helps organizations leverage customer data to make better decisions.

Ben emphasizes that insights should lead to action and reveal meaningful patterns. “Every agency and in-house team collects data and tests based on insights, but you can’t stop there.”

Passionate about advancing experimentation, Ben focuses on developing new models, applying game theory, and embracing bold innovation to uncover bigger, disruptive insights.

Follow Ben

André Morys

André Morys, CEO and founder of konversionsKRAFT, has nearly three decades of experience in experimentation, digital growth, and e-commerce optimization.

Fueled by a deep fascination with user and customer experience, André guides clients through the experimentation process using a blend of data, behavioral economics, consumer psychology, and qualitative research.

He believes the most valuable insights lie beneath the surface. “Most people underestimate the value of experimentation because of the factors that are hard to measure,” André explains.

“You cannot measure the influence of experimentation on your company’s culture, yet that impact may be ten times more important than the immediate uplift you create.”

This philosophy is central to his “digital experimentation framework,” which features his signature “Iceberg Model” to capture both measurable and intangible effects of testing.

Follow André

Jeremy Epperson

Jeremy Epperson is the founder of Thetamark and has dedicated 14 years to conversion rate optimization and startup growth. He has worked with some of the fastest-growing unicorn startups in the world, researching, building, and implementing CRO programs for more than 150 growth-stage companies.

By gathering insights from diverse businesses, Jeremy has developed a data-driven approach to identify testing roadblocks, allowing him to optimize CRO processes and avoid the steep learning curves often associated with new launches.

In his interview, Jeremy emphasizes focusing on customer experience to drive growth. He explains, “We will do better as a business when we give the customer a better experience, make their life easier, simplify conversion, and eliminate the roadblocks that frustrate them and cause abandonment.”

His ultimate goal with experimentation is to create a seamless process from start to finish.

Follow Jeremy

Chad Sanderson

Chad Sanderson is the CEO and founder of Gable, a B2B data infrastructure SaaS company, and a renowned expert in digital experimentation and large-scale analysis.

He is also a product manager, public speaker, and writer who has lectured on topics such as the statistics of digital experimentation, advanced analysis techniques, and small-scale testing for small businesses.

Chad previously served as Senior Program Manager for Microsoft’s AI platform and was the Personalization Manager for Subway’s experimentation team.

He advises distinguishing between front-end (client-side) and back-end metrics before running experiments. Client-side metrics, such as revenue per transaction, are easier to track but may narrow focus to revenue growth alone.

“One set of metrics businesses mess up is relying only on client-side metrics like revenue per purchase,” Chad explains. “While revenue is important, focusing solely on it can drive decisions that overlook the overall impact of a feature.”

Follow Chad

Carlos Gonzalez de Villaumbrosia

Carlos Gonzalez de Villaumbrosia has spent the past 12 years building global companies and digital products.

With a background in Global Business Management and Marketing, Computer Science, and Industrial Engineering, Carlos founded Floqq—Latin America’s largest online education marketplace.

In 2014, he founded Product School, now the global leader in Product Management training.

Carlos believes experimentation has become more accessible and essential for product managers. “You no longer need a background in data science or engineering to be effective,” he says.

He views product managers as central figures at the intersection of business, design, engineering, customer success, data, and sales. Success in this role requires skills in experimentation, roadmapping, data analysis, and prototyping—making experimentation a core competency in today’s product landscape.

Follow Carlos

Bhavik Patel

Bhavik Patel is the Data Director at Huel, an AB Tasty customer, and the founder of CRAP Talks, a meetup series connecting CRO professionals across Conversion Rate, Analytics, and Product.

Previously, he served as Product Analytics & Experimentation Director at Lean Convert, where he led testing and optimization strategies for top brands. With deep expertise in experimentation, personalization, and data-driven decision-making, Bhavik helps teams evolve from basic A/B testing to strategic, high-impact programs and create better digital experiences.

His philosophy centers on disruptive testing—bold experiments aimed at breaking past local maximums to deliver statistically meaningful results. “Once you’ve nailed the fundamentals, it’s time to make bigger bets,” he says.

Bhavik also stresses the importance of identifying the right problem before jumping to solutions: “The best solution for the wrong problem isn’t going to have any impact.”

Follow Bhavik

Rand Fishkin

Rand Fishkin is the co-founder and CEO of SparkToro, creators of audience research software designed to make audience insights accessible to all.

He also founded Moz and co-founded Inbound.org with Dharmesh Shah, which was later acquired by HubSpot in 2014. Rand is a frequent global keynote speaker on marketing and entrepreneurship, dedicated to helping people improve their marketing efforts.

Rand highlights the untapped potential in niche markets:
“Many founders don’t consider the power of serving a small, focused group of people—maybe only a few thousand—who truly need their product. If you make it for them, they’ll love it. There’s tremendous opportunity there.”

A strong advocate for risk-taking and experimentation, Rand encourages marketers to identify where their audiences are and engage them directly there.

Follow Rand

Shiva Manjunath

Shiva Manjunath is the Senior Web Product Manager of CRO at Motive and host of the podcast From A to B. With experience at companies like Gartner, Norwegian Cruise Line, and Edible, he’s spent years digging into user behavior and driving real results through experimentation.

Shiva is known for challenging the myth of “best practices,” emphasizing that optimization requires context, not checklists. “If what you believe is this best practice checklist nonsense, all CRO is just a checklist of tasks to do on your site. And that’s so incorrect,” he says.

At Gartner, a simplified form (typically seen as a CRO win) led to a drop in conversions, reinforcing his belief that true experimentation is about understanding why users act, not just what they do.

Through his work and podcast, Shiva aims to demystify CRO and encourage practitioners to think deeper, test smarter, and never stop asking questions.

Follow Shiva


Heatmaps: Your Team’s Secret Weapon for Uncovering Website Gold

What are heatmaps? (and why your team needs them)

Think of heatmaps as your website’s truth-teller. They’re visual snapshots showing exactly where visitors click, scroll, and linger. No guesswork required.

Here’s how they work: Warm colors (reds, oranges) highlight the hotspots where users engage most. Cool colors (blues, greens) reveal the overlooked zones that might need attention.

The best part? Your visitors do all the heavy lifting. They show you what’s working and what’s not, so your team can make changes that actually move the needle.

Spot the signals: When to bring heatmaps into play

Heatmaps aren’t just pretty pictures—they’re your optimization toolkit’s MVP. Here’s how they deliver the biggest impact:

Measuring real engagement

Writing content that no one reads? Heatmaps show you exactly where readers drop off. If only 10% of visitors reach your CTA, it’s time to shake things up.

Tracking what matters: Actions

Are people clicking where you want them to? Heatmaps reveal if visitors complete your desired actions—or where they’re getting stuck instead.

Highlighting where attention sticks (and slips)

What grabs your attention first? What images distract from your main message? Heatmaps answer these questions so you can double down on what works.

Once you have these insights, bigger questions become easier to tackle:

  • Where should we place our most important content?
  • How can we use images and videos more effectively?
  • What’s pulling attention away from our goals?

The essential heatmap lineup every team needs

Most modern heatmap tools offer multiple views of user behavior. We partner closely with some of the major players already. Let’s break down the most common ones you’ll come across.

Click Heatmaps: The Action Tracker

These maps show every click on your page, with dense concentrations appearing as bright white areas surrounded by warm colors. Think of them as your conversion reality check.

What it tells you: Whether people click where you want them to—or if they’re trying to click non-clickable elements that look interactive.

How to use it: Look for clicks scattered around non-interactive text or images. These “frustrated clicks” signal design problems. If users are clicking on underlined text that isn’t a link, or images they expect to be clickable, you need to either make those elements functional or redesign them to look less interactive.

Pro tip: Compare click density on your primary CTA versus other page elements. If secondary elements are getting more clicks than your main conversion button, it’s time to redesign your visual hierarchy.

Scroll Heatmaps: The Attention Meter

See how far down visitors scroll and what percentage of users reach each section of your page. This is crucial for understanding whether your important content is actually being seen.

What it tells you: If users actually see your important content or bail before reaching your CTA. Most importantly, it shows you the “fold line”—where 50% of users stop scrolling.

How to use it: Identify the scroll percentage where you lose half your audience, then ensure all critical elements (value propositions, CTAs, key benefits) appear above that line. If your main CTA is only seen by 20% of visitors, move it higher or add secondary CTAs above the fold.

Pro tip: Use scroll maps to optimize content length. If 80% of users stop reading halfway through your blog post, either shorten the content or add more engaging elements (images, subheadings, interactive elements) to keep them scrolling.

Click Percentage Maps: The Element Analyzer

This view breaks down clicks by specific elements, showing exactly how many people clicked each button, image, or link as a percentage of total visitors.

What it tells you: Which elements deserve prime real estate and which ones are dead weight. You’ll see precise engagement rates for every clickable element on your page.

How to use it: Rank your page elements by click percentage to understand what’s actually driving engagement. If your newsletter signup gets 15% clicks but your main product CTA only gets 3%, you might need to redesign your primary call-to-action or reconsider your page goals.

Pro tip: Use this data to inform A/B tests. If one button consistently outperforms others, test applying its design (color, size, copy) to underperforming elements.

Confetti Maps: The Individual Click Tracker

Instead of showing click density, these maps display each individual click as a colored dot. Perfect for spotting users trying to click non-clickable areas or understanding click patterns in detail.

What it tells you: Where to add functionality or remove confusion. Each dot represents a real user’s intent to interact with something on your page.

How to use it: Look for clusters of dots over non-interactive elements—these represent frustrated users trying to click things that don’t work. Also watch for dots scattered far from any actual buttons or links, which might indicate responsive design issues or accidental clicks.

Pro tip: Filter confetti maps by traffic source or user segment. Mobile users might have different click patterns than desktop users, and organic traffic might behave differently than paid traffic.

Mobile-Specific Heatmaps: The Touch Tracker

Modern tools capture mobile-specific actions like taps, swipes, pinches, and multi-touch gestures—because mobile behavior is fundamentally different from desktop.

What it tells you: How to optimize for the majority of your traffic (since mobile often dominates). Mobile users have different interaction patterns, attention spans, and conversion behaviors.

How to use it: Create separate heatmaps for mobile and desktop traffic. Mobile users typically scroll faster, have shorter attention spans, and interact differently with buttons and forms. Use this data to optimize button sizes, reduce form fields, and adjust content layout for mobile-first experiences.

Pro tip: Pay special attention to thumb-reach zones on mobile heatmaps. Elements that are easy to tap with a thumb (bottom third of screen, right side for right-handed users) typically get higher engagement rates.

Learn more about best practices for designing for mobile experiences with our Mobile Optimization Guide.

Eyes vs. clicks: Understanding the key differences

While heatmaps track mouse movements and clicks, eye-tracking follows actual gaze patterns. Eye-tracking gives deeper insights but requires specialized equipment most teams don’t have.

The good news? AI-powered tools like Feng-Gui and EyeQuant now simulate eye-tracking through algorithms, making this technology more accessible.

Bottom line: Start with heatmaps. They’re easier to implement and give you actionable insights right away.

Features that make or break your heatmapping game

Not all heatmap tools are created equal. Here’s what your team should prioritize:

Must-have features:

  • Audience Segmentation: Create maps for specific user groups (new vs. returning visitors, mobile vs. desktop)
  • Map Comparison: Easily compare results across different segments
  • Page Templates: Aggregate data for similar page types (crucial for e-commerce sites)
  • Mobile Optimization: Track touch, scroll, and swipe behaviors
  • Export Capabilities: Share results with your team effortlessly
  • Dynamic Element Tracking: Capture interactions with dropdowns, sliders, and AJAX-loaded content
  • Historical Data: Preserve old heatmaps even after design changes

Test smarter with heatmap insights

Here’s where things get exciting. Heatmaps show you the problems, but how do you know if your fixes actually work?

Enter A/B testing.

This three-step approach turns insights into results:

  • Identify problems with heatmaps
  • Test potential solutions with A/B testing
  • Choose the highest-performing solution based on data

Real Example:

Nonprofit UNICEF France wanted to better understand how visitors perceived its homepage ahead of a major redesign.

Their move: UNICEF France combined on-site surveys with heatmapping to gather both qualitative feedback and visual behavioral data.

The result: Heatmaps showed strong engagement with the search bar, while surveys confirmed it was seen as the most useful element. Less-used features, like social share icons, were removed in the redesign—resulting in a cleaner, more user-focused homepage.

Continue reading this case study

Connect the dots and act with confidence

Ready to put heatmaps to work? Here’s your game plan:

Start small. Pick one high-traffic page and run your first heatmap analysis.

Look for patterns. Are users clicking where you expect? Scrolling to your key content? Getting stuck somewhere?

Test your hunches. Use A/B testing to validate any changes before rolling them out site-wide.

Iterate forward. Heatmaps aren’t a one-and-done tool but part of your ongoing optimization process.

Remember: every click tells a story. Every scroll reveals intent. Your visitors are already showing you how to improve—you just need to listen.


Ready to see what your visitors are really doing? Heatmaps give you the insights. A/B testing helps you act on them. Together, they’re your path to better conversions and happier users.



Transaction Testing With AB Tasty’s Report Copilot

Transaction testing, which focuses on increasing the rate of purchases, is a crucial strategy for boosting your website’s revenue. 

To begin, it’s essential to differentiate between conversion rate (CR) and average order value (AOV), as they provide distinct insights into customer behavior. Understanding these metrics helps you implement meaningful changes to improve transactions.

In this article, we’ll delve into the complexities of transaction metrics analysis and introduce our new tool, the “Report Copilot,” designed to simplify report analysis. Read on to learn more.

Transaction Testing

To understand how test variations impact total revenue, focus on two key metrics:

  • Conversion Rate (CR): This metric indicates whether sales are increasing or decreasing. Tactics to improve CR include simplifying the buying process, adding a “one-click checkout” feature, using social proof, or creating urgency through limited inventory.
  • Average Order Value (AOV): This measures how much each customer is buying. Strategies to enhance AOV include cross-selling or promoting higher-priced products.

By analyzing CR and AOV separately, you can pinpoint which metrics your variations impact and make informed decisions before implementation. For example, creating urgency through low inventory may boost CR but could reduce AOV by limiting the time users spend browsing additional products. After analyzing these metrics individually, evaluate their combined effect on your overall revenue.

Revenue Calculation

The following formula illustrates how CR and AOV influence revenue:

Revenue = Number of Visitors × Conversion Rate × AOV

In the first part of the equation (Number of Visitors × Conversion Rate), you determine how many visitors become customers. The second part (× AOV) calculates the total revenue from these customers.
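A quick worked example with hypothetical numbers shows how the two levers combine:

```python
def revenue(visitors, conversion_rate, aov):
    """Revenue = Number of Visitors × Conversion Rate × AOV."""
    return visitors * conversion_rate * aov

# Hypothetical scenario: the variation raises CR but slightly lowers AOV
original  = revenue(visitors=100_000, conversion_rate=0.030, aov=80.0)   # 240,000
variation = revenue(visitors=100_000, conversion_rate=0.033, aov=76.0)   # 250,800

print(f"Original:  {original:,.0f}")
print(f"Variation: {variation:,.0f}  ({variation / original - 1:+.1%})")
```

Here the CR gain outweighs the AOV drop, but the opposite can just as easily happen, which is why the mixed scenario in the list below is the tricky one.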

Consider these scenarios:

  • If both CR and AOV increase, revenue will rise.
  • If both CR and AOV decrease, revenue will fall.
  • If either CR or AOV increases while the other remains stable, revenue will increase.
  • If either CR or AOV decreases while the other remains stable, revenue will decrease.
  • If CR and AOV move in opposite directions, the impact on revenue depends on the relative size of each change.

The last scenario, where CR and AOV move in opposite directions, is particularly complex due to the variability of AOV. Current statistical tools struggle to provide precise insights on AOV’s overall impact, as it can experience significant random fluctuations. For more on this, read our article “Beyond Conversion Rate.”

While these concepts may seem intricate, our goal is to simplify them for you. Recognizing that this analysis can be challenging, we’ve created the “Report Copilot” to automatically gather and interpret data from variations, offering valuable insights.

Report Copilot

The “Report Copilot” from AB Tasty automates data processing, eliminating the need for manual calculations. This tool empowers you to decide which tests are most beneficial for increasing revenue.

Here are a few examples from real use cases.

Winning Variation:

The left screenshot provides a detailed analysis, helping users draw conclusions about their experiment results. Experienced users may prefer the summarized view on the right, also available through the Report Copilot.

Complex Use Case:


The screenshot above demonstrates a case where CR and AOV show opposite trends, which requires a deeper understanding of the context.

It’s important to note that the Report Copilot doesn’t make decisions for you; it highlights the most critical parts of your analysis, allowing you to make informed choices.

Conclusion

Transaction analysis is complex, requiring a breakdown of components like conversion rate and average order value to better understand their overall effect on revenue. 

We’ve developed the Report Copilot to assist AB Tasty users in this process. This feature leverages AB Tasty’s extensive experimentation dashboard to provide comprehensive, summarized analyses, simplifying decision-making and enhancing revenue strategies.


The Past, Present, and Future of Experimentation | Bhavik Patel

What is the future of experimentation? Bhavik Patel highlights the importance of strategic planning and innovation to achieve meaningful results.

A thought leader in the worlds of CRO and experimentation, Bhavik Patel founded the popular UK-based meetup community CRAP (Conversion Rate, Analytics, Product) Talks seven years ago to fill a gap in the event market, covering a broad range of optimization topics from CRO, data analysis, and product management to data science, marketing, and user experience.

After following his passion throughout the industry from acquisition growth marketing to experimentation and product analytics, Bhavik landed the role of Product Analytics & Experimentation Director at product measurement consultancy, Lean Convert, where his interests have converged. Here he is scaling a team and supporting their development in data and product thinking, as well as bringing analytical and experimentation excellence into the organization.

AB Tasty’s CMO Marylin Montoya spoke with Bhavik about the future of experimentation and how we might navigate the journey from the current mainstream approach to the potentialities of AI technology.

Here are some of the key takeaways from their conversation.

The evolution of experimentation: a scientific approach.

Delving straight to the heart of the conversation, Bhavik talks us through the evolution of A/B testing, from its roots in the scientific method, to recent and even current practices – which involve a lot of trial and error to test basic variables. When projecting into the future, we need to consider everything from people, to processes, and technology.

Until recently, conversion rate optimization has mostly been driven by marketing teams, with a focus on optimizing the basics such as headlines, buttons, and copy. Over the last few years, product development has started to become more data driven. Within the companies taking this approach, the product teams are the recipients of the A/B test results, but the people behind these tests are the analytical and data science teams, who are crafting new and advanced methods, from a statistical standpoint.

Rather than making a change on the homepage and trying to measure its impact on outcome metrics, such as sales or new customer acquisition, certain organizations are taking an alternative approach modeled by their data science teams: focusing on driving current user activity and then building new products based on that data.

The future of experimentation is born from an innovative mindset, but also requires critical thinking when it comes to planning experiments. Before a test goes live, we must consider the hypothesis that we’re testing, the outcome metric or leading indicators, how long we’re going to run it, and make sure that we have measurement capabilities in place. In short, the art of experimentation is transitioning from a marketing perspective to a science-based approach.

Why you need to level up your experiment design today.

While it may be a widespread challenge to shift the mindset around data and analyst teams from being cost centers to profit-enablement centers, the slowing economy might have a silver lining: people taking the experimentation process a lot more seriously. 

We know that with proper research and design, an experiment can achieve a great ROI, and even prevent major losses when it comes to investing in new developments. However, it can be difficult to convince leadership of the impact, efficiency and potential growth derived from experimentation.

Given the current market, demonstrating the value of experimentation is more important than ever, as product and marketing teams can no longer afford to make mistakes by rolling out tests without validating them first, explains Bhavik. 

Rather than watching your experiment fail slowly over time, it’s important to have a measurement framework in place: a baseline, a solid hypothesis, and a proper experiment design. With experimentation communities making up a small fraction of the overall industry, not everyone appreciates the ability to validate, quantify, and measure the impact of their work. However, Bhavik hopes this will evolve in the near future.

Disruptive testing: high risk, high reward.

On the spectrum of innovation, at the very lowest end is incremental innovation, such as small tests and continuous improvements, which hits a local maximum very quickly. In order to break through that local maximum, you need to try something bolder: disruptive innovation. 

When an organization is looking for bigger results, they need to switch out statistically significant micro-optimizations for experiments that will bring statistically meaningful results.

Once you’ve achieved better baseline practices – hypothesis writing, experiment design, and planning – it’s time to start making bigger bets and find other ways to measure it.

Now that you’re performing statistically meaningful tests, the final step in the evolution of experimentation is reverse-engineering solutions by identifying the right problem to solve. Bhavik explains that while we often focus on prioritizing solutions, by implementing various frameworks to estimate their reach and impact, we ought to take a step back and ask ourselves if we’re solving the right problem.

With a framework based on quality data and research, we can identify the right problem and then work on the solution, “because the best solution for the wrong problem isn’t going to have any impact,” says Bhavik.

What else can you learn from our conversation with Bhavik Patel?

  • The common drivers of experimentation and the importance of setting realistic expectations with expert guidance.
  • The role of A/B testing platforms in the future of experimentation: technology and interconnectivity.
  • The potential use of AI in experimentation: building, designing, analyzing, and reporting experiments, as well as predicting test outcomes. 
  • The future of pricing: will AI enable dynamic pricing based on the customer’s behavior?

About Bhavik Patel

A seasoned CRO expert, Bhavik Patel is the Product Analytics & Experimentation Director at Lean Convert, leading a team of optimization specialists to create better online experiences for customers through experimentation, personalization, research, data, and analytics.
In parallel, Bhavik is the founder of CRAP Talks, an acronym that stands for Conversion Rate, Analytics and Product, which unites CRO enthusiasts with thought leaders in the field through inspiring meetup events – where members share industry knowledge and ideas in an open-minded community.

About 1,000 Experiments Club

The 1,000 Experiments Club is an AB Tasty-produced podcast hosted by John Hughes, Head of Marketing at AB Tasty. Join John as he sits down with the experts in the world of experimentation to uncover their insights on what it takes to build and run successful experimentation programs.


Mutually Exclusive Experiments: Preventing the Interaction Effect

What is the interaction effect?

If you’re running multiple experiments at the same time, you may find their interpretation to be more difficult because you’re not sure which variation caused the observed effect. Worse still, you may fear that the combination of multiple variations could lead to a bad user experience.

It’s easy to imagine a negative cumulative effect of two visual variations. For example, if one variation changes the background color, and another modifies the font color, it may lead to illegibility. While this result seems quite obvious, there may be other negative combinations that are harder to spot.

Imagine launching an experiment that offers a price reduction for loyal customers, whilst in parallel running another that aims to test a promotion on a given product. This may seem like a non-issue until you realize that there’s a general rule applied to all visitors, which prohibits cumulative price reductions – leading to a glitch in the purchase process. When the visitor expects two promotional offers but only receives one, they may feel frustrated, which could negatively impact their behavior.

What is the level of risk?

With the previous examples in mind, you may think that such issues could be easily avoided. But it’s not that simple. Building several experiments on the same page becomes trickier when you consider code interaction, as well as interactions across different pages. So, if you’re interested in running 10 experiments simultaneously, you may need to plan ahead.

A simple solution would be to run these tests one after the other. However, this strategy is very time consuming, as your typical experiment requires two weeks to be performed properly in order to sample each day of the week twice.

It’s not uncommon for a large company to have 10 experiments in the pipeline and running them sequentially will take at least 20 weeks. A better solution would be to handle the traffic allocated to each test in a way that renders the experiments mutually exclusive.

This may sound similar to a multivariate test (MVT), except the goal of an MVT is almost the opposite: to find the best interaction between unitary variations.

Let’s say you want to explore the effect of two variation ideas: text and background color. The MVT will compose all combinations of the two and expose them simultaneously to isolated chunks of the traffic. The isolation part sounds promising, but the “all combinations” part is exactly what we’re trying to avoid: typically, a combination where the text and the background end up the same color will occur. So an MVT is not the solution here.

Instead, we need a specific feature: A Mutually Exclusive Experiment.

What is a Mutually Exclusive Experiment (M2E)?

AB Tasty’s Mutually Exclusive Experiment (M2E) feature enacts an allocation rule that blocks visitors from entering selected experiments depending on the previous experiments already displayed. The goal is to ensure that no interaction effect can occur when a risk is identified.

How and when should we use Mutually Exclusive Experiments?

We don’t recommend setting up all experiments to be mutually exclusive because it reduces the number of visitors for each experiment. This means it will take longer to achieve significant results and the detection power may be less effective.

The best process is to identify the different kinds of interactions you may have and compile them in a list. If we continue with the cumulative promotion example from earlier, we could create two M2E lists: one for user interface experiments and another for customer loyalty programs. This strategy will avoid negative interactions between experiments that are likely to overlap, but doesn’t waste traffic on hypothetical interactions that don’t actually exist between the two lists.
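M2E is a built-in AB Tasty feature, so you don’t implement it yourself. Purely to illustrate the underlying idea, here is a hypothetical sketch of mutually exclusive bucketing, where each visitor is deterministically assigned to exactly one experiment within a given list (all names are made up):

```python
import hashlib

def assign_exclusive_experiment(visitor_id, list_name, experiments):
    """Deterministically map a visitor to exactly one experiment within a mutually exclusive list."""
    digest = hashlib.sha256(f"{list_name}:{visitor_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(experiments)
    return experiments[bucket]

# Two separate M2E lists, as in the example above (names are hypothetical)
ui_experiments = ["new_background_color", "new_font_color"]
loyalty_experiments = ["loyalty_discount", "product_promotion"]

visitor = "visitor-12345"
print(assign_exclusive_experiment(visitor, "ui", ui_experiments))
print(assign_exclusive_experiment(visitor, "loyalty", loyalty_experiments))
```

Salting the hash with the list name keeps the two lists independent, so a visitor sees at most one experiment per list but can still be exposed to experiments from different lists, which is how traffic is preserved.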

What about data quality?

With the help of an M2E, we have prevented any functional issues that may arise due to interactions, but you might still have concerns that the data could be compromised by subtle interactions between tests.

Would an upstream winning experiment induce false discovery on downstream experiments? Alternatively, would a bad upstream experiment make you miss an otherwise downstream winning experiment? Here are some points to keep in mind:

  • Remember that roughly eight tests out of 10 are neutral (show no effect), so most of the time you can’t expect an interaction effect – if no effect exists in the first place.
  • In the case where an upstream test has an effect, the affected visitors will still be randomly assigned to the downstream variations. This evens out the effect, allowing the downstream experiment to correctly measure its potential lift. It’s interesting to note that the average conversion rate following an impactful upstream test will be different, but this does not prevent the downstream experiment from correctly measuring its own impact.
  • Remember that the statistical test exists precisely to account for any drift in the random split process. The drift we’re referring to here is the possibility that more of the visitors impacted by the upstream test end up in one variation of the downstream test, creating the illusion of an effect. The gain probability estimate and the confidence interval around the measured effect are there to remind you that there is some randomness in the process. In fact, the upstream test is just one example among a long list of possible interfering events – such as visitors using different computers, different connection quality, etc.

All of these theoretical explanations are supported by an empirical study from the Microsoft Experiment Platform team. This study reviewed hundreds of tests on millions of visitors and saw no significant difference between effects measured on visitors that saw just one test and visitors that saw an additional upstream test.
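To see this point in action, here is a small illustrative simulation with made-up numbers: the upstream test shifts the overall conversion level of half the visitors, but because those visitors are still split randomly downstream, the downstream lift is measured essentially correctly.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 2_000_000                                      # simulated visitors

# Upstream test: half of the visitors see a variation that genuinely lifts their conversion rate
upstream_b = rng.random(n) < 0.5
base_rate = np.where(upstream_b, 0.055, 0.050)     # +0.5 point upstream effect

# Downstream test: independent random split, true lift of +0.3 point for its variation B
downstream_b = rng.random(n) < 0.5
rate = base_rate + np.where(downstream_b, 0.003, 0.0)

converted = rng.random(n) < rate
measured_lift = converted[downstream_b].mean() - converted[~downstream_b].mean()
print(f"Measured downstream lift: {measured_lift:+.4f} (true lift: +0.0030)")
```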

Conclusion

While experiment interaction is possible in a specific context, there are preventative measures that you may take to avoid functional loss. The most efficient solution is the Mutually Exclusive Experiment, allowing you to eliminate the functional risks of simultaneous experiments, make the most of your traffic and expedite your experimentation process.

References:

https://www.microsoft.com/en-us/research/group/experimentation-platform-exp/articles/a-b-interactions-a-call-to-relax/

 


A/A Testing: What is it and When Should You Use it?

A/A tests are a legacy from the early days of A/B testing. An A/A test is essentially an A/B test in which two identical versions of a web page or element are tested against each other: variation B is just a copy of A, without any modification.

One of the goals of A/A tests is to check the effectiveness and accuracy of testing tools. The expectation is that, if no winner is declared, the test is a success, whereas detecting a statistically significant difference would mean a failure, indicating a problem somewhere in the pipeline.

But it’s not always that simple. We’ll dive into this type of testing and the statistics and tech behind the scenes. We’ll look at why a failed A/A test is not proof of a pipeline failure, and why a successful A/A test isn’t a foolproof sanity check.

What is tested during an A/A test?

Why is there so much buzz around A/A testing? An A/A test can be a way to verify two components of an experimentation platform: 

  1. The statistical tool: It may be possible that the formulas chosen don’t fit the real nature of the data, or may contain bugs.
  2. The traffic allocation: The split between variations must be random and respect the proportions it has been given. When a problem occurs, we talk about Sample Ratio Mismatch (SRM); that is, the observed traffic does not match the allocation setting. This means that the split has some bias impacting the analysis quality.
Let’s explore this in more detail.

Statistical tool test

Let’s talk about a “failed” A/A test

The most common idea behind A/A tests is that the statistical tool should yield no significant difference. The A/A test is considered “failed” if it detects a difference in performance.

However, to understand how weak this conclusion is, you need to understand how statistical tests work. Let’s say that your significance threshold is 95%. This means that there is still a 5% chance that the difference you see is a statistical fluke and no real difference exists between the variations. So even with a perfectly working statistical tool, you still have one chance in twenty (1/20=5%) that you will have a “failed” A/A test and you might start looking for a problem that may not exist.

With that in mind, a more acceptable statistical procedure would be to perform 20 A/A tests and expect 19 of them to yield no statistical difference and one to detect a significant difference. Even then, if two or more tests show significant results, it may be a sign of a real problem. In other words, having one successful A/A test is not enough to validate a statistical tool. To validate it fully, you need to show that the tests are successful about 95% of the time (19/20).

Therefore, a meaningful approach would be to perform hundreds of A/A tests and expect ~5% of them to “fail”. It’s worth noting that if they “fail” noticeably less than 5% of the time, that’s also a problem, possibly indicating that the statistical test simply says “no” too often, leading to a strategy that never detects any winning variation. So one “failed” A/A test doesn’t tell you much in reality.
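
As a rough illustration of that ~5% figure, here is a sketch (assuming a simple two-sided z-test on proportions and made-up traffic numbers) that simulates many A/A tests and counts how often they “fail”:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n_tests, visitors, true_rate = 1_000, 20_000, 0.05
failed = 0

for _ in range(n_tests):
    # Two identical variations: the same true conversion rate for A and B.
    conv_a = rng.binomial(visitors, true_rate)
    conv_b = rng.binomial(visitors, true_rate)
    p_a, p_b = conv_a / visitors, conv_b / visitors
    pooled = (conv_a + conv_b) / (2 * visitors)
    se = np.sqrt(pooled * (1 - pooled) * 2 / visitors)
    p_value = 2 * (1 - norm.cdf(abs(p_b - p_a) / se))
    failed += p_value < 0.05  # a "failed" A/A test

print(f"'failed' A/A tests: {failed / n_tests:.1%}")  # expect roughly 5%
```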

What if it’s a “successful A/A test”? 

A “successful” A/A test (yielding no difference) is not proof that everything is working as it should. To understand why, you need to check another important tool in an A/B test: the sample size calculator.

In a typical sample size calculator, we see that with a 5% conversion rate, you need around 30k visitors per variation to reach the 95% significance level for a variation with a 10% MDE (Minimal Detectable Effect).

But in the context of an A/A test, the Minimal Detectable Effect (MDE) is in fact 0%. Using the same formula, we’ll plug 0% as MDE.

At this point, you will discover that the form does not let you put 0% here, so let’s try a very small number instead. In this case, you get almost 300M visitors.

In fact, to be confident that there is exactly no difference between two variations, you need an infinite number of visitors, which is why the form does not let you set 0% as MDE.
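
To see how quickly the required traffic grows, here is a sketch using the classic two-proportion sample-size approximation (a textbook formula with made-up inputs, not necessarily the exact one behind any particular calculator): as the MDE shrinks toward 0%, the required sample explodes.

```python
from scipy.stats import norm

def sample_size_per_variation(baseline, relative_mde, alpha=0.05, power=0.80):
    """Classic two-proportion sample-size approximation (two-sided test)."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    pooled = (p1 + p2) / 2
    numerator = (z_alpha * (2 * pooled * (1 - pooled)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return numerator / (p1 - p2) ** 2

for mde in (0.10, 0.01, 0.001):
    n = sample_size_per_variation(0.05, mde)
    print(f"MDE {mde:.1%}: ~{n:,.0f} visitors per variation")
# Roughly 31k, 3M and 300M visitors: the required traffic grows with 1/MDE²,
# so an MDE of exactly 0% would require infinite traffic.
```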

Therefore, a successful A/A test only tells you that the difference between the two variations is smaller than a given number but not that the two variations perform exactly the same.

This problem comes from another principle in statistical tests: the power. 

The power of a test is the chance that you discover a difference if there is any. In the context of an A/A test, this refers to the chance you discover a statistically significant discrepancy between the two variations’ performance. 

The more power, the more chance you will discover a difference. To raise the power of a test you simply raise the number of visitors.

Sample size calculators usually assume that tests are powered at 80%. This means that even if a difference in performance exists between the variations, 20% of the time you will miss it. So one “successful” A/A test (yielding no statistical difference) may just be an occurrence of this 20%. In other words, having just one successful A/A test doesn’t ensure the efficiency of your experimentation tool: you may have a problem and there is a 20% chance that you missed it. Additionally, reaching 100% power would require an infinite number of visitors, making it impractical.
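
Using the same kind of approximation (and made-up numbers), here is a quick sketch of the power side of the equation: with around 31k visitors per variation, a real 10% relative lift on a 5% baseline is detected only about 80% of the time.

```python
from scipy.stats import norm

def power_two_proportions(p1, p2, n_per_variation, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test."""
    se = ((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_variation) ** 0.5
    z_alpha = norm.ppf(1 - alpha / 2)
    z = abs(p2 - p1) / se
    return norm.cdf(z - z_alpha) + norm.cdf(-z - z_alpha)

# A real 5% -> 5.5% lift is missed roughly 20% of the time with this traffic.
print(f"power: {power_two_proportions(0.05, 0.055, 31_000):.0%}")
```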

How do we make sure we can trust the statistical tool then? If you are using a platform that is used by thousands of other customers, chances are that the problem would have already been discovered. 

Because statistical software does not change very often and is not affected by the variation content (whereas the traffic allocation might be, as we will see later), the best option is to trust your provider, or to double-check the results with an independent provider. You can find a lot of independent calculators on the web. They only need the number of visitors and the number of conversions for each variation to produce a result, which makes the check quick to run.
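
Such an independent double-check can be as simple as a chi-square test on the raw counts, for example (the numbers below are hypothetical):

```python
from scipy.stats import chi2_contingency

# Hypothetical raw counts exported from your testing tool.
visitors_a, conversions_a = 20_000, 1_020
visitors_b, conversions_b = 20_000, 1_105

table = [
    [conversions_a, visitors_a - conversions_a],
    [conversions_b, visitors_b - conversions_b],
]
chi2, p_value, _, _ = chi2_contingency(table)
print(f"p-value: {p_value:.3f}")  # compare with what your platform reports
```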

Traffic allocation test

In this part, we only focus on traffic, not conversions. 

The question is: does the splitting operation work as it should? We call this kind of failure an SRM, or Sample Ratio Mismatch. You may ask yourself how a simple random choice could fail. In fact, the failure happens either before or after the random choice itself.

The following demonstrates two examples where that can happen:

  • The variation contains a bug that may crash some browsers. In this case, the corresponding variation will lose visitors. The bug might depend on the browser, and then you will end up with bias in your data.
  • If the variation gives a discount coupon (or any other advantage), and some users find a way to force their browser to run the variation (to get the coupon), then you will have an excess of visitors for that variation that is not due to random chance, which results in biased data.


It’s hard to detect with the naked eye because the allocation is random, so you never get sharp numbers. 

For instance, a 50/50 allocation never precisely splits the traffic in groups with the exact same size. As a result, we would need statistical tools to check if the split observed corresponds with the desired allocation. 

SRM tests exist. They work more or less like an A/B test, except that the SRM formula indicates whether there is a difference between the desired allocation and what really happened. If an SRM is detected, there is a good chance that this difference is not due to pure randomness. This means that some data was lost or a bias occurred during the experiment, undermining trust in future (real) experiments.
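
As a sketch, an SRM check on a supposed 50/50 split boils down to a chi-square goodness-of-fit test on the visitor counts (hypothetical numbers below):

```python
from scipy.stats import chisquare

# Hypothetical visitor counts for a test configured as a 50/50 split.
observed = [50_950, 49_050]
total = sum(observed)
expected = [total * 0.5, total * 0.5]

chi2, p_value = chisquare(f_obs=observed, f_exp=expected)
if p_value < 0.01:
    print(f"possible SRM (p = {p_value:.2e}): investigate before trusting the results")
else:
    print(f"no SRM detected (p = {p_value:.3f})")
```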

On the one hand, detecting an SRM during an A/A test sounds like a good idea. On the other hand, if you think about it operationally, it might not be that useful because the chance of an SRM is low.

Even if some reports say that SRMs are more frequent than you may think, most of the time they happen on complex tests. In that sense, checking for an SRM within an A/A test will not help you prevent one on a more complex experiment later.

If you find a Sample Ratio Mismatch on a real experiment or in an A/A test, the actions remain the same: find the cause, fix it, and restart the experiment. So why waste time and traffic on an A/A test that will give you no extra information? A real experiment would have given you real information if it worked fine on the first try, and if a problem does occur, you would detect it even in a real experiment, since the SRM check only considers traffic and not conversions.

A/A tests are also unnecessary since most trustworthy A/B testing platforms (like AB Tasty) do SRM checks on an automated basis. So if an SRM occurs, you will be notified anyway. 

So where does this “habit” of practicing A/A tests come from?

Over the years, it’s something that engineers building A/B testing platforms have done. It makes sense in this case because they can run a lot of automated experiments, and even simulate users if they don’t have enough at hand, performing a sound statistical approach to A/A tests. 

They have reasons to doubt the platform in the works and they have the programming skills to automatically create hundreds of A/A tests to test it properly. Since these people can be seen as pioneers, their voice on the web is loud when they explain what an A/A test is and why it’s important (from an engineering perspective).

However, for a platform user/customer, the context is different: they’ve paid for a ready-to-use and trusted platform and want to start a real experiment as soon as possible to get a return on investment. Therefore, it makes little sense to waste time and traffic on an A/A test that won’t provide any valuable information.

Why sometimes it might be better to skip A/A tests

We can conclude that a failed A/A test is not a problem and that a successful one is not proof of sanity.

In order to gain valuable insights from A/A tests, you would need to perform hundreds of them with an infinite number of visitors. Moreover, an efficient platform like AB Tasty does the corresponding checks for you.

That’s why, unless you are developing your own A/B testing platform, running an A/A test may not give you the insights you’re looking for. A/A tests require a considerable amount of time and traffic that could otherwise be used to conduct A/B tests that could give you valuable insights on how to optimize your user experience and increase conversions. 

When it makes sense to run an A/A test

It may seem that running A/A tests may not be the right call after all. However, there may be a couple of reasons why it might still be useful to perform A/A tests. 

The first is when you want to check the data you are collecting and compare it to data already collected with other analytics tools. Keep in mind that you will never get exactly the same results, because metric definitions vary between tools. Nonetheless, this comparison is an important onboarding step to ensure that the data is properly collected.

The other reason to perform an A/A test is to know the reference value for your main metrics so you can establish a baseline to analyze your future campaigns more accurately. For example, what is your base conversion rate and/or bounce rate? Which of these metrics need to be improved and are, therefore, a good candidate for your first real A/B test?

This is why AB Tasty has a feature that helps users build A/A tests dedicated to reaching these goals while avoiding the pitfalls of “old school” methods that are not useful anymore. With our new A/A test feature, A/A test data is collected in one variant (not two); let’s call this an “A test”.

This allows you to have a more accurate estimation of these important metrics as the more data you have, the more accurate the measurements are. Meanwhile, in a classic A/A test, data is collected in two different variants which provides less accurate estimates since you have less data for each variant.

With this approach, AB Tasty enables users to automatically set up A/A tests, which gives better insights than classic “handmade” A/A tests.

Article

8min read

10 Generative AI Ideas for Your Experimentation Roadmap

Artificial intelligence has been a recurring theme for decades. However, it’s no longer science fiction – it’s a reality.

Since OpenAI launched its own form of generative AI, ChatGPT, in November 2022, the world has yet to stop talking about its striking capabilities. It’s particularly fascinating to see just how easy it is to get results from this bot, which is built on deep-learning algorithms for natural language processing.

Even Google quickly followed by launching a new and experimental project, Gemini, to revolutionize its own Search. By harnessing the power of generative AI and the capacity of large language models, Google is seeking to take its search process to the next level.

Given the rapid growth of this technological advancement over the past few months, it’s time that we talk about generative AI in the context of A/B testing and experimentation.

Whether you’re curious about how AI can impact your experiments or are ready for inspiration, we’ll discuss some of our ideas around using AI for A/B testing, personalization, and conversion rate optimization.

What is generative AI?

Generative AI is a type of artificial intelligence that is not limited to a fixed set of pre-programmed outputs, which allows it to generate new content (think ChatGPT). Instead of simply retrieving answers from a specific, pre-existing dataset, generative AI learns by indexing extensive data, focusing on patterns and using deep learning techniques and neural networks to create human-like content based on its learnings.

The way algorithms capture ideas is similar to how humans gather inspiration from previous experiences to create something unique. Based on the large amounts of data used to craft generative AI’s learning abilities, it’s capable of outputting high-quality responses that are similar to what a human would create.

However, some concerns need to be addressed:

  • Biased information: Artificial intelligence is only as good as the datasets used to train it. Therefore if the data used to train it has biases, it may create “ideas” that are equally biased or flawed.
  • Spreading misinformation: There are many concerns about the ethics of generative AI and sharing information directly from it. It’s best practice to fact-check any content written by AI to avoid putting out false or misleading information.
  • Content ownership: Since content generated with AI is not created by a human, can you ethically claim it as your own idea? In a similar sense, the same idea could potentially be generated elsewhere using a similar prompt. Copyright and ownership are then called into question.
  • Data and privacy: Data privacy is always a top-of-mind concern. With the new capabilities of artificial intelligence, data handling becomes even more challenging. It’s always best practice to avoid using sensitive information with any form of generative AI.

By keeping these limitations in mind, generative AI has the potential to streamline processes and revolutionize the way we work – just as technology has always done in the past.

10 generative AI uses for A/B testing

In the A/B testing world, we are very interested in how one can harness these technological breakthroughs for experimentation. We are brainstorming a few approaches to re-imagine the process of revolutionizing digital customer experiences to ultimately save time and resources.

Just like everyone else, we started to wonder how generative AI could impact the world of experimentation and our customers. Here are some ideas, some of them concrete and some more abstract, as to how artificial intelligence could help our industry:

DISCLAIMER: Before uploading information into any AI platform, ensure that you understand their privacy and security practices. While AI models strive to maintain a privacy standard, there’s always the risk of data breaches. Always protect your confidential information. 

1. Homepage optimization

Your homepage is likely the first thing your visitors will see, so optimization is key to staying ahead of your competitors. If you want a quick comparison of the content on your homepage versus your competitors’, you can feed this information into generative AI to give it a basis for understanding. Once the AI is loaded with information about your competitors, you can ask for a list of best practices to inform new tests for your own website.

2.  Analyze experimentation results

Reporting and analyzing are crucial to progressing on your experimentation roadmap, but it’s also time-consuming. By collecting a summary of testing logs, generative AI can help highlight important findings, summarize your results, and potentially even suggest future steps. Ideally, you can feed your A/B test hypothesis as well as the results to show your thought process and organization. After it recognizes this specific thought process and desired results, it could aid in generating new test hypotheses or suggestions.

3. Recommend optimization barriers

Generative AI can help you prioritize your efforts and identify the most impactful barriers to your conversion rate. Uploading your nonsensitive website performance data gathered from your analytics platforms can give AI the insight it needs into your performance. Whether it suggests that you update your title tags or compress images on your homepage, AI can quickly spot where you have the biggest drop-offs to suggest areas for optimization.

4. Client reviews

User feedback is your own treasure trove of information for optimization. One of the great benefits of AI that we already see is that it can understand large amounts of data quickly and summarize it. By uploading client reviews, surveys and other consumer feedback into the database, generative AI can assist you in creating detailed summaries of your users’ pain points, preferences and levels of satisfaction. The more detailed your reviews – the better the analysis will be.

5. Chatbots

Chatbots are a popular way to communicate with website visitors. As generative AI is a large language model, it can quickly generate conversational scripts, prompts and responses to reduce your brainstorming time. You can also use AI to filter and analyze conversations that your chatbot is already having to determine if there are gaps in the conversation or ways to enhance its interaction with customers.

6. Translation

Language barriers can limit a brand that has a presence in multiple regions. Whether you need translations for your chatbot conversations, CTAs or longer form copy, generative AI can provide you with translations in real time to save you time and make your content accessible in every region your brand reaches.

7. Google Adwords

Speed up brainstorming sessions by using generative AI to experiment with different copy variations. Based on the prompts you provide, AI can provide you with a series of ideas for targeting keywords and creating copy with a particular tone of voice to use with Google Adwords. Caution: be sure to double-check all keywords proposed to verify their intent. 

8. Personalization

Personalized content can be scaled at speed by leveraging artificial intelligence to produce variations of the same messages. By customizing your copy, recommendations, product suggestions and other messages based on past user interactions and consumer demographics, you can significantly boost your digital consumer engagement.

9. Product Descriptions

Finding the best wording to describe why your product is worth purchasing may be a challenge. With generative AI, you can get more ambitious with your product descriptions by testing out different variations of copy to see which version is the most promising for your visitors.

10. Predict User Behavior

Based on historical data from your user behavior, generative AI can predict behavior that can help you to anticipate your next A/B test. Tailoring your tests according to patterns and trends in user interaction can help you conduct better experiments. It’s important to note that predictions will be limited to patterns interpreted by past customer data collected and uploaded. Using generative AI is better when it’s used as a tool to guide you in your decision-making process rather than to be the deciding force alone.

The extensive use of artificial intelligence is a new and fast-evolving subject in the tech world. If you want to leverage it in the future, you need to start familiarizing yourself with its capabilities.

Keep in mind that it’s important to verify the facts and information AI generates just as you carefully verify data before you upload. Using generative AI in conjunction with your internal experts and team resources can assist in improving ideation and efficiency. However, the quality of the output from generative AI is only as good as what you put in.

Is generative AI a source of competitive advantage in A/B testing?

The great news is that this technology is accessible to everyone – from big industry leaders like Google to start-ups with a limited budget. However, the not-so-great news is that this is available to everyone. In other words, generative AI is not necessarily a source of competitive advantage.

Technology existing by itself does not create more value for a business. Rather, it’s the people driving the technology who are creating value by leveraging it in combination with their own industry-specific knowledge, past experiences, data collection and interpretation capabilities and understanding of customer needs and pain points.

While we aren’t here to say that generative AI is a replacement for human-generated ideas, this technology can definitely be used to complement and amplify your already-existing skills.

Leveraging generative AI in A/B testing

From education to copywriting or coding – all industries are starting to see the impact that these new software developments will have. Leveraging “large language models” is becoming increasingly popular as these algorithms can generate ideas, summarize long forms of text, provide insights and even translate in real-time.

Proper experimentation and A/B testing are at the core of engaging your audience, however, these practices can take a lot of time and resources to accomplish successfully. If generative AI can offer you ways to save time and streamline your processes, it might be time to use it as your not-so-secret weapon. In today’s competitive digital environment, continually enhancing your online presence should be at the top of your mind.

Want to start optimizing your website? AB Tasty is the best-in-class experience optimization platform that empowers you to create a richer digital experience – fast. From experimentation to personalization, this solution can help you activate and engage your audience to boost your conversions.

Article

17min read

AB Tasty’s JavaScript Tag Performance and Report Analysis

Hello! I am Léo, Product Manager at AB Tasty. I’m in charge, among several things, of our JavaScript tag that is currently running on thousands of websites for our clients. As you can guess, my roadmap is full of topics around data collection, privacy and… performance.

In today’s article, we are going to talk about JavaScript tag performance, open-data monitoring and competition. Let’s go!

Performance investigation

As performance has become a big and hot topic over the past few years, mainly thanks to Google’s initiative to deploy their Core Web Vitals, my team and I have focused a lot on it. We’ve changed a lot of things, improved many parts of our tag and reached excellent milestones. Many of our users have expressed their satisfaction with these improvements. I have already written a (long) series of blog articles about this here. Sorry though, it’s only in French.

From time to time, we get tickled by competitors about a specific report around performance that seems to show us as underperforming based on some metrics. Some competitors claim that they are up to 4 times faster than us! And that’s true, I mean, that’s what the report shows.

You can easily imagine how devastating this can be for the image of my company and how hard it could be for our sales team when a client draws this card. This is especially demoralizing for me and my team after all the work we’ve pushed through this topic during the last few years.

Though it was the first feeling I got when seeing this report, I know for a fact that our performance is excellent. We’ve reached tremendous improvements after the release of several projects and optimizations. Today all the benchmarks and audits I run over our customers’ websites show very good performance and a small impact on the famous Core Web Vitals.

Also, it’s very rare that a customer complains about our performance. It can happen, that’s for sure, but most of the time all their doubts disappear after a quick chat, some explanations and hints about optimization best practices.

But that report is still there, right? So maybe I’m missing something. Maybe I’m not looking at the correct metric. Maybe I’ve only audited customers where everything is good, but there’s a huge army of customers that don’t complain that our tag is drastically slowing their website down.

One easy way to tackle that would be to say that we are doing more with our tag than our competitors do.

Is CRO the same as analytics? 

On the report (I promise I will talk about it in depth below), we are grouped in the Analytics category. However, Conversion Rate Optimization isn’t the same as analytics. An analytics tool only collects data, while we activate campaigns, run personalizations, implement widgets, add pop-ins and more. In this sense, our impact will be higher.

Let’s talk about our competitors: even though we have the best solution out there, our competitors do more or less the same things as us, using the same techniques with the same limits and issues. Therefore, it’s legitimate to compare us on the same metrics. It might be true that we do a bit more than they do, but in the end, this shouldn’t explain a 4x difference in performance.

Back then, and before digging into the details, I took the results of the report with humility. My ambition was to crawl the data, analyze websites where their tag is running and try to find what they do better than us. We call that reverse engineering, and I find it healthy, as it would help everyone get a faster website.

My engagement with my management was to find where we had a performance leak and solve it to be able to decrease our average execution time and get closer to our competitors.

But first, I needed to analyze the data. And, wow, I wasn’t prepared for that.

The report

The report is a dataset generated monthly by The HTTP Archive. Here is a quote from their About page:

“Successful societies and institutions recognize the need to record their history – this provides a way to review the past, find explanations for current behavior, and spot emerging trends. In 1996, Brewster Kahle realized the cultural significance of the Internet and the need to record its history. As a result he founded the Internet Archive which collects and permanently stores the Web’s digitized content.”

“In addition to the content of web pages, it’s important to record how this digitized content is constructed and served. The HTTP Archive provides this record. It is a permanent repository of web performance information such as size of pages, failed requests, and technologies utilized. This performance information allows us to see trends in how the Web is built and provides a common data set from which to conduct web performance research.”

Every month, they run a Lighthouse audit on millions of websites and generate a dataset containing the raw results.

As it is open-source and legit, it can be used by anyone to draw data visualization and ease access to this type of data.

That’s what Patrick Hulce, one of the creators of Google Lighthouse, has done. Through his website, whose code is available on GitHub, he provides a nice visualization of this huge dataset and allows anyone to dig into the details through several categories such as Analytics, Ads, Social Media and more. As I said, you’ll find the CRO tools in the Analytics category.

The website is fully open-source. The methodology is known and can be accessed.

So, what’s wrong with the report?

Well, there’s nothing technically wrong with it. We could find it disappointing that the dataset isn’t automatically updated every month, but the repository is open-source, so anyone motivated could do it.

However, this is only displaying the data in a fancy manner and not providing any insights or deep analysis of it. Any flaw or inconsistency will remain hidden and it could lead to a situation where a third party is seen as having bad performance compared to others when it is not necessarily the case.

One issue though, not related to the report itself, is the flaw that an average can carry with it. That’s something we are all aware of but tend to forget. If you take 10 people, where 9 of them earn €800 a month but one earns €12 million a month, the average salary is about €1.2 million per month. Statistically correct, but it sounds a bit wrong, doesn’t it? More on that in a minute.

Knowing that, it was time to get my hands a bit dirty. With my team, we downloaded the full dataset from February 2023 to run our own audit and understand where we had performance leaks.

Note that downloading the full dataset is something we have been doing regularly for about one and a half years to monitor our trend. However, this time I decided to dig into the February 2023 report in particular.

The analysis

On this dataset, we could find the full list of websites running AB Tasty that have been crawled and the impact our tag had on them. To be more accurate, we have the exact measured execution time of our tag, in milliseconds.

This is what we extracted. The pixellated column is the website URL. The last column is the execution time in milliseconds.

With the raw data, we were able to calculate a lot of useful metrics.

Keep in mind that I am not a mathematician or anything close to a statistics expert. My methodology might sound odd, but it’s adequate for this analysis.

  • Average execution time

This is the first metric I get — the raw average for all the websites. That’s probably very close, if not equal, to what is used by the thirdpartyweb.today website. We already saw the downside of having an average, however, it’s still an interesting value to monitor.

  • Mean higher half and mean lower half

Then, I split the dataset in half. If I have 2,000 rows, I create two groups of 1,000 rows: the “higher” one and the “lower” one. It gives me a view of the websites where we perform the worst compared to those where we perform the best. Then, I calculate the average of each half.

  • The difference between the two halves

The difference between the two halves is important as it shows the disparity within the dataset. The closer the two are, the fewer extreme values we have.

  • The number of websites with a value above 6k ms

It’s just an internal metric we follow to give us a mid-term goal of having 0 websites above this value.

  • The evolution since the last dataset

I compute the evolution between the previous dataset I have and the current one. It helps me see whether we are improving in general, as well as how many websites are leaving or entering the chart. A rough sketch of how these metrics can be computed is shown below.
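
For the curious, here is roughly how these metrics can be computed from the raw export with pandas (the file name and column names below are placeholders, not the actual HTTP Archive schema):

```python
import pandas as pd

# Placeholder file and column names for the monthly extract.
df = pd.read_csv("httparchive_abtasty_2023_02.csv")
times = df["execution_time_ms"].sort_values(ascending=False)

average = times.mean()
half = len(times) // 2
mean_higher_half = times.iloc[:half].mean()  # worst-performing websites
mean_lower_half = times.iloc[half:].mean()   # best-performing websites
above_6k = (times > 6_000).sum()

print(f"average: {average:.0f} ms")
print(f"higher half: {mean_higher_half:.0f} ms, lower half: {mean_lower_half:.0f} ms")
print(f"websites above 6,000 ms: {above_6k}")
```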

The results

These are the results that we have:

Here are their corresponding graphs:

This is the evolution between October 2022 and February 2023:

Watch out: Logarithmic scale! Sorted by February 2023 execution time from left to right.

The figures say it all. But, if I can give a global conclusion, it’s that we made tremendous improvements in the first six months and then stalled a bit with finer adjustments (the famous Pareto 80/20 rule).

However, after the initial fall, two key figures are important.

First of all, the difference between the two halves is getting very close. This means that we don’t have a lot of potential performance leaks anymore (features that lead to an abnormal increase in the execution time). This is our first recent win.

Then, the evolution shows that in general, and except for the worst cases, it is steady or going down. Another recent win.

Digging into the details

What I have just shared is the raw results without having a look at the details of each row and each website that is being crawled.

However, as we say, the devil is in the details. Let’s dig in a bit.

Let’s focus on the websites where AB Tasty takes more than six seconds to execute.

Six seconds might sound like a lot (and it is), but don’t forget that the audit simulates a low-end CPU which is not representative of the average device. Instead, it shows the worst-case scenario.

In the February 2023 report, there are 33 of them, with an average execution time of 19,877 ms. I quickly identified that:

  • 27 of them are from the same AB Tasty customer
  • One of them is abtasty.com, and the total execution time of resources coming from *abtasty.com on this website is very high
  • Two others are also coming from one singular AB Tasty customer

In the end, we have only 5 customers on this list (but still 33 websites, don’t get me wrong).

Let’s now try to group up these two customers with duplicates to see the impact on the average. The customer with 27 duplicates also has websites that are below the 6k ms mark, but I’m going to ignore it for now (and to ease things up).

For each of the two customers with duplicates, I’m going to compute the average of all their duplicates. For the first one, the result is 21671 ms. For the second, the result is 14708 ms.

I’m also going to remove abtasty.com, which is not relevant.

With the new list, I went from 1223 ms for the full-list average to 1005 ms. I just improved our average by more than 200 ms!

Wait, what? But you’re just removing the worst websites. Obviously, you are getting better…

Yep, that’s true. That’s cheating for sure! But, the point of this whole article is to demonstrate that data doesn’t say it all.

Let’s talk first about what is happening with this customer that has 27 duplicates.

The same tag has been deployed on more than 50 very different websites! You might not be very familiar with AB Tasty, so let me explain why this is an issue.

You might have several websites which have the same layout (that’s often the case when you have different languages). It makes sense to have the same tag on these different domains to be able to deploy the same personalizations on all of them at once. That’s not the optimal way of doing it, but as of today, it’s the easiest way to do it with our tool.

However, if your websites are all different, there is absolutely no point in doing that. You are going to create a lot of campaigns (in this case, hundreds!) that will almost never be executed on the website (because it’s not the correct domain) but are still at least partially included in the tag. So our tag is going to spend its time checking hundreds of campaigns that have no chance to execute as the URL is rarely going to be valid.

Though we are working on a way to block this behavior (as we have alternatives and better options), it will take months before it disappears from the report.

Note: If you start using AB Tasty, you will not be advised to do that. Furthermore, the performance of your tag will be far better than that.

Again, I didn’t take the time to group all the duplicated domains as it would be pointless; the goal was to demonstrate that it is easy to show better performance if you exclude anomalies that are not representative. We can imagine that we would improve by more than 200 ms by keeping only one domain per customer.

I took the most obvious case, but a quick look at the rest of the dataset showed me some other examples.

The competitors’ figures

Knowing these facts and how our score might look worse than it is because of one single anomaly, I started looking into our competitors’ figures to see if they have the same type of issue.

I’m going to say it again: I’m not trying to say that we are better (or worse) than any of our competitors here, that’s not my point. I’m just trying to show you why statistics should be deeply analyzed to avoid any interpretation mistakes.

Let’s start by comparing AB Tasty’s figures for February 2023 with the same metrics for one of them.

Competitor's figures

In general, they look a bit better, right? A better average, and even the means for each half are better (and the lower half by a lot!).

However, between the two halves, the factor is huge: 24! Does it mean that depending on your usage, the impact of their tag might get multiplied by 24?

If I wanted to tease them a little bit, I would say that when testing the tag on your website, you might find excellent performance but when starting to use it intensely you might face serious performance drops.

But, that would be interpreting a very small part of what the data said.

Also, they have more than twice the number of websites that are above the 6k ms mark (again: this mark is an AB Tasty internal thing). And that is by keeping the duplicates in AB Tasty’s dataset that we discussed just before! They also have duplicates, but not as many as we do.

A first (and premature) conclusion is that they have more websites with a big impact on performance but at the same time, their impact is lower in general.

Now that I know that in our case we have several customers that have duplicates, I wanted to check if our competitors have the same. And this one does – big time.

Among the 2,537 websites that have been crawled, 40% of them belong to the same customer. This represents 1,016 subdomains of the same domain.

How does this impact their score?

Well, their customer wasn’t using the solution at the moment the data was collected (I made sure of it by visiting some of the subdomains). This means that the tag wasn’t doing anything at all. It was there, but inactive.

The average execution time of these 1,016 rows in the dataset is 59 ms! It also has a max value of 527 ms and a min value of 25 ms.

I don’t need to explain why this “anomaly” interestingly pulls down their average, right?

The 1,016 subdomains are not fake websites at all. I’m not implying that this competitor cheated on purpose to look better. I’m sure they didn’t. It is just a very convenient coincidence for them, whether they are aware of it or not.

To finish, let’s compare the average of our two datasets after removing these 1,016 subdomains.

AB Tasty is at 1223 ms (untouched list) when this competitor is now at… 1471 ms.

They went from 361 ms better to 248 ms worse. I told you that I can make the figures say whatever I want.

I would have a lot of other things to say about these datasets, but I didn’t run all the analysis that could have been done here. I already spent too much time on it, to be honest.

Hopefully, though, I’ve made my point of showing that the same dataset can be interpreted in a lot of different manners.

What can we conclude from all of this?

The first thing I want to say is: TEST IT.

Our solution is very easy to implement. You simply put the tag on your website and run an audit. To compare, you can put another tool’s tag on your website and run the same audit. Run it several times with the same conditions and compare. Is the second tool better on your website? Fine, then it will probably perform better for your specific case.

Does a random report on the web say that one solution is better than another? Alright, that’s one insight, but you should either crunch the data to challenge it or avoid paying too much attention to it. Just accepting the numbers as they are displayed (or worse, advertised) might make you miss a big part of the story.

Does AB Tasty have a bad performance?

No, it doesn’t. Most of our customers never complained about performance and some are very grateful for the latest improvements we’ve released on this topic.

So, some customers are complaining?

Yes. This is because sometimes AB Tasty can have a lower performance depending on your usage. But, we provide tools to help you optimize everything directly from our platform. We call this the Performance Center. It is a full section inside the platform and is dedicated to showing you which campaign is impacting your performance and what you can do to improve it. Just follow the guidelines and you’ll be good. It’s a very innovative and unique feature in the market, and we are very proud of it.

Though, I must admit that a few customers (only a few) have unrealistic expectations about performance. AB Tasty is a JS tag that is doing DOM manipulations, asynchronous checks, data collection and a lot of fancy stuff. Of course, it will impact your website more than a simple analytics tool will. The goal for you is to make sure that the effect of optimizing your conversions is higher than what it costs you in terms of performance. And it will be the same, whatever the CRO tool you are using, except if you use a server-side tool like Flagship by AB Tasty, for example.

I am convinced that we should aim towards a faster web. I am very concerned about my impact on the environment, and I’m trying to keep my devices as long as possible. My smartphone is 7 years old (and I’m currently switching to another one that is 10 years old) and my laptop isn’t very recent either. So, I know that a slow website can be a pain.

Final Remarks

Let me assure you that at AB Tasty we are fully committed to improving our performance because our customers expect us to, because I am personally motivated to do it, and because it is a very fun and interesting challenge for the team (and also because my management asks me to do it).

Also, kudos to HTTP Archive which does very important work in gathering all this data and especially sharing it with everyone. Kudos to Patrick Hulce who took the time to build a very interesting website that helps people have a visual representation of HTTP Archive’s data. Kudos to anyone that works to build a better, faster and more secure web, often for free and because that’s what they believe in.

Want to test our tool for yourself? AB Tasty is the complete platform for experimentation, content personalization, and AI-powered recommendations equipped with the tools you need to create a richer digital experience for your customers — fast. With embedded AI and automation, this platform can help you achieve omnichannel personalization and revolutionize your brand and product experiences.

Article

13min read

How to Deal with Low Traffic in CRO

If your website traffic numbers aren’t as high as you may hope for, that’s no reason to give up on your conversion rate optimization (CRO) goals.

By now you must have noticed that most CRO advice is tailored for high-traffic websites. Luckily, this doesn’t mean you can’t optimize your website even if you have lower traffic.

The truth is, any website can be optimized – you just need to tailor your optimization strategy to suit your unique situation.

In this article, we will cover how to adapt your CRO strategy when your traffic is limited.

CRO analogy

In order to make this article easier to understand, let’s start with an analogy. Imagine that instead of measuring two variants and picking a winner, we are measuring the performance of two boxers and placing bets on who will win the next 10 rounds.

So, how will we place our bet on who will win?

Imagine that boxer A and boxer B are both newbies that no one knows, and after the first round you have to make your choice. You will most likely place your bet on the boxer who won the first round. It might be risky if the winning margin is small, but you have no other information on which to base your decision.

Imagine now that boxer A is known to be a champion, and boxer B is a challenger you don’t know. Your knowledge about boxer A is what we would call a prior: information you have beforehand that influences your decision.

Based on the prior, you will be more likely to bet on boxer A as the champion for the next few rounds, even if boxer B wins the first round with a very small margin.

Furthermore, you will only choose boxer B as your predicted champion if they win the first round by a large margin. The stronger your prior, the larger the margin needs to be in order to convince you to change your bet.

Are you following? If so, the following paragraphs will be easy to grasp and you will understand where this “95% threshold” comes from.

Now, let’s move on to tips for optimizing your website with low traffic.

1. Solving the problem: “I never reach the 95% significance”

This is the most common complaint about CRO for websites with lower traffic and for lower traffic pages on bigger websites.

Before we dig into this most common problem, let’s start by answering the question, where does this 95% “golden rule” come from?

The origin of the 95% threshold

Let’s start our explanation with a very simple idea: What if optimization strategies were applied from day one? If two variants with no previous history were created at the same time, there would be no “original” version challenged by a newcomer.

This would force you to choose the best one from the beginning.

In this setting, any small difference in performance could be measured for decision-making. After a short test, you will choose the variant with the higher performance. It would not be good practice to pick the variant that had lower performance and furthermore, it would be foolish to wait for a 95% threshold to pick a winner.

But in practice, optimization is done well after the launch of a business.

So, in most real-life situations, there is a version A that already exists and a new challenger (version B) that is created.

If the new challenger, version B, comes along and the performance difference between the two variants is not significant, you will have no issues declaring version B “not a winner.”

Statistical tests are symmetric. So if we reverse the roles, swapping A and B in the statistical test will tell you that the original is not significantly better than the challenger. The “inconclusiveness” of the test is symmetric.

So, why do you set 100% of traffic toward the original at the end of an inconclusive test, implicitly declaring A as a winner? Because you have three priors:

  1. Version A was the first choice. This choice was made by the initial creator of the page.
  2. Version A has already been implemented and technically trusted. Version B is typically a mockup.
  3. Version A has a lot of data to prove its value, whereas B is a challenger with limited data that is only collected during the test period.

Points 1 & 2 are the bases of a CRO strategy, so you will need to go beyond these two priors. Point 3 explains that version A has more data to back its performance. This explains why you trust version A more than version B: version A has data.

Now you understand that this 95% confidence rule is a way of explaining a strong prior. And this prior mostly comes from historical data.

Therefore, when optimizing a page with low traffic, your decision threshold should be below 95%, because your prior on A is weaker due to its lower traffic and seniority.

The threshold should be set according to the volume of traffic that went through the original from day one. The problem with this approach is that conversion rates are not stable and can change over time. Think of seasonality: the Black Friday rush, vacation days, the Christmas increase in activity, etc. Because of these seasonal changes, you can’t compare performances across different periods.

This is why practitioners only take into account data for version A and version B taken at the same period of time and set a high threshold (95%) to accept the challenger as a winner in order to formalize a strong prior toward version A.

What is the appropriate threshold for low traffic?

It’s hard to suggest an exact number to focus on because it depends on your risk acceptance.

According to the hypothesis protocol, you should structure a time frame for the data collection period in advance.

This means that the “stop” criterion of a test is not a statistical measure or a certain number being reached. The “stop” criterion should be the timeframe coming to an end. Once the period is over, you should then look at the stats to make an appropriate decision.

AB Tasty, our customer experience optimization and feature management software, uses a Bayesian framework that produces a “chance to win” index. This index allows a direct interpretation, unlike a p-value, which has a much more complex meaning.

In other words, the “chances to win index” is the probability for a given variation to be better than the original.

Therefore, a 95% “chance to win” means that there is a 95% probability that the given variation will be the winner. This is assuming that we don’t have any prior knowledge or specific trust for the original.

The 95% threshold itself is also a default compromise between the prior you have on the original and a given level of risk acceptance (it could have even been a 98% threshold).

Although it is hard to give an exact number, let’s make a rough scale for your threshold:

  • New A & B variations: If you have a case where variation A and variation B are both new, the threshold could be as low as 50%. If there is no past data on the variations’ performance and you must make a choice for implementation, even a 51% chance to win is better than 49%.
  • New website, low traffic: If your website is new and has very low traffic, you likely have very little prior on variation A (the original variation in this case). In that case, setting 85% as a threshold is reasonable, since it means that if you put aside the little you know about the original, you still have an 85% chance of picking the winner and only a 15% chance of picking a variation that is merely equivalent to the original, with an even smaller chance that it performs worse. So depending on the context, such a bet can make sense.
  • Mature business, low traffic: If your business has a longer history, but still lower traffic, 90% is a reasonable threshold. This is because there is still little prior on the original.
  • Mature business, high traffic: Having a lot of prior, or data, on variation A suggests a 95% threshold.

The original 95% threshold is far too high if your business has low traffic because there’s little chance that you will reach it. Consequently, your CRO strategy will have no effect and data-driven decision-making becomes impossible.

By using AB Tasty as your experimentation platform, you will be given a report that includes the “chance to win” along with other statistical information about your web experiments. The report also includes the confidence interval on the estimated gain, an important indicator. The boundaries around the estimated gain are computed in a Bayesian way too, which means they can be interpreted as the best-case and worst-case scenarios.
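
To make the “chance to win” and the Bayesian gain interval less abstract, here is a minimal Monte Carlo sketch using Beta posteriors and made-up numbers. This is a common way to compute these quantities, not necessarily AB Tasty’s exact implementation.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical results: visitors and conversions for the original (A) and the variation (B).
visitors_a, conversions_a = 4_000, 200
visitors_b, conversions_b = 4_000, 228

# Non-informative Beta(1, 1) prior, updated with the observed data.
samples_a = rng.beta(1 + conversions_a, 1 + visitors_a - conversions_a, size=200_000)
samples_b = rng.beta(1 + conversions_b, 1 + visitors_b - conversions_b, size=200_000)

chance_to_win = (samples_b > samples_a).mean()
relative_gain = (samples_b - samples_a) / samples_a
worst, best = np.percentile(relative_gain, [2.5, 97.5])

print(f"chance to win: {chance_to_win:.1%}")
print(f"95% credible interval on the relative gain: [{worst:+.1%}, {best:+.1%}]")
```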

The importance of Bayesian statistics

Now you understand the exact meaning of the well-known 95% “significance” level and are able to select appropriate thresholds corresponding to your particular case.

It’s important to remember that this approach only works with Bayesian statistics, since frequentist approaches give statistical indices (such as p-values and confidence intervals) that have a totally different meaning and are not suited to the logic explained above.

2. Are the stats valid with small numbers?

Yes, they are valid as long as you do not stop the test depending on the result.

Remember that the testing protocol says that once you decide on a testing period, the only reason to stop a test is that the timeframe has ended. In this case, the statistical indices (“chance to win” and the confidence interval) are valid and usable.

You may be thinking: “Okay, but then I rarely reach the 95% significance level…”

Remember that the 95% threshold doesn’t need to be the magic number for all cases. If you have low traffic, chances are that your website is not old. If you refer back to the previous point, you can take a look at our suggested scale for different scenarios.

If you’re dealing with lower traffic as a newer business, you can certainly switch to a lower threshold (like 90%). The threshold is still higher because it’s typical to have more trust in an original rather than a variant because it’s used for a longer time.

If you’re dealing with two completely new variants, at the end of your testing period it will be easier to simply pick the variant with the higher conversions (without using a statistical test), since there is no prior knowledge of the performance of A or B.

3. Go “upstream”

Sometimes the traffic problem is not due to a low-traffic website, but rather the webpage in question. Typically, pages with lower traffic are at the end of the funnel.

In this case, a great strategy is to work on optimizing the funnel closer to the user’s point of entry. There may be more to uncover with optimization in the digital customer journey before reaching the bottom of the funnel.

4. Is the CUPED technique real?

What is CUPED?

Controlled Experiment Using Pre-Experiment Data is a newer buzzword in the experimentation world. CUPED is a technique that claims to produce up to 50% faster results. Clearly, this is very appealing to small-traffic websites.

Does CUPED really work that well?

Not exactly, for two reasons: one is organizational and the other is applicability.

The organizational constraint

What’s often forgotten is that CUPED means Controlled experiment Using Pre-Experiment Data.

In practice, the ideal period of “pre-experiment data” is two weeks in order to hope for a 50% time reduction.

So, for a 2-week classic test, CUPED claims that you can end the test in only 1 week.

However, in order to properly see your results, you will need two weeks of pre-experiment data. So in fact, you must have three weeks to implement CUPED in order to have the same accuracy as a classic 2-week test.

Yes, you are reading that correctly. In the end, you will need three weeks to run the experiment.

This means that CUPED is only useful if you already have two weeks of traffic data that is unexposed to any experiment. Even if you can schedule two weeks without experiments into your experimentation planning to collect this data, it will block traffic for other experiments.

The applicability constraint

In addition to the organizational/2-week time constraint, there are two other prerequisites in order for CUPED to be effective:

  1. CUPED is only applicable to visitors browsing the site during both the pre-experiment and experiment periods.
  2. These visitors need to have the same behavior regarding the KPI under optimization. Visitors’ data must be correlated between the two periods.

You will see in the following paragraph that these two constraints make CUPED virtually impossible for e-commerce websites and only applicable to platforms.

Let’s go back to our experiment settings example:

  • Two weeks of pre-experiment data
  • Two weeks of experiment data (that we hope will only last one week as there is a supposed 50% time reduction)
  • The optimization goal is a transaction: raising the number of conversions.

Constraint number 1 states that we need to have the same visitors in pre-experiment & experiment, but the visitor’s journey in e-commerce is usually one week.

In other words, there is very little chance that you see visitors in both periods. In this context, only a very limited effect of CUPED is to be expected (up to the portion of visitors that are seen in both periods).

Constraint number 2 states that the visitors must have the same behavior regarding the conversion (the KPI under optimization). Frankly, that constraint is simply never met in e-commerce.

The e-commerce conversion occurs either during the pre-experiment or during the experiment but not in both (unless your customer frequently purchases several times during the experiment time).

This means that there is no chance that the visitors’ conversions are correlated between the periods.

In summary: CUPED is simply not applicable for e-commerce websites to optimize transactions.

It is clearly stated in the original scientific paper, but for the sake of popularity, this buzzword technique is being misrepresented in the testing industry.

In fact, and it is clearly stated in scientific literature, CUPED works only on multiple conversions for platforms that have recurring visitors performing the same actions.

Great platforms for CUPED would be search engines (like Bing, where it was invented) or streaming platforms where users come daily and do the same repeated actions (playing a video, clicking on a link in a search results page, etc.).
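
For reference, the core of CUPED is a variance-reduction adjustment using each visitor’s pre-experiment metric. The sketch below uses a made-up, platform-style recurring metric, which is exactly the setting where the correlation constraint can be met.

```python
import numpy as np

def cuped_adjust(y, x):
    """Remove the part of the experiment metric y explained by the pre-experiment metric x."""
    theta = np.cov(x, y)[0, 1] / np.var(x)
    return y - theta * (x - x.mean())

rng = np.random.default_rng(1)
x = rng.poisson(5, 10_000).astype(float)  # pre-experiment activity per visitor
y = x + rng.normal(0, 2, 10_000)          # experiment-period activity, correlated with x

y_adjusted = cuped_adjust(y, x)
print(f"variance before: {y.var():.2f}, after CUPED: {y_adjusted.var():.2f}")
# Lower variance means narrower intervals, hence the claimed speed-up,
# but only when the pre- and in-experiment metrics are actually correlated.
```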

Even if you try to find an application of CUPED for e-commerce, you’ll find out that it’s not possible.

  • One may say that you could try to optimize the number of products seen, but the problem of constraint 1 still applies: very few visitors will be present in both datasets. And there is a more fundamental objection: this KPI should not be optimized on its own, otherwise you are potentially encouraging hesitation between products.
  • You cannot even try to optimize the number of products ordered by visitors with CUPED because constraint number 2 still holds. The act of purchase can be considered as instantaneous. Therefore, it can only happen in one period or the other – not both. If there is no visitor behavior correlation to expect then there is also no CUPED effect to expect.

Conclusion about CUPED

CUPED does not work for e-commerce websites where a transaction is the main optimization goal. Unless you are Bing, Google, or Netflix — CUPED won’t be your secret ingredient to help you to optimize your business.

This technique is surely a buzzword spiking interest fast, however, it’s important to see the full picture before wanting to add CUPED into your roadmap. E-commerce brands will want to take into account that this testing technique is not suited for their business.

Optimization for low-traffic websites

Brands with lower traffic are still prime candidates for website optimization, even though they might need to adapt to a less traditional approach.

Whether optimizing your web pages means choosing a page that’s higher up in the funnel or adopting a slightly lower threshold, continuous optimization is crucial.

Want to start optimizing your website? AB Tasty is the best-in-class experience optimization platform that empowers you to create a richer digital experience – fast. From experimentation to personalization, this solution can help you activate and engage your audience to boost your conversions.