
A/A Testing: What is it and When Should You Use it?

A/A tests are a legacy from the early days of A/B testing. An A/A test is essentially an A/B test in which two identical versions of a web page or element are tested against each other: variation B is simply a copy of A, with no modification.

One of the goals of A/A tests is to check the effectiveness and accuracy of testing tools. The expectation is that the test is a success if no winner is declared. Detecting a statistically significant difference, on the other hand, would mean a failure, indicating a problem somewhere in the pipeline.

But it’s not always that simple. We’ll dive into this type of testing and the statistics and tech behind the scenes. We’ll look at why a “failed” A/A test is not proof of a pipeline failure, and why a “successful” A/A test isn’t a foolproof sanity check.

What is tested during an A/A test?

Why is there so much buzz around A/A testing? An A/A test can be a way to verify two components of an experimentation platform: 

  1. The statistical tool: It may be possible that the formulas chosen don’t fit the real nature of the data, or may contain bugs.
  2. The traffic allocation: The split between variations must be random and must respect the configured proportions. When it does not, we talk about a Sample Ratio Mismatch (SRM): the observed traffic does not match the allocation setting, which means the split is biased and the quality of the analysis suffers.
Let’s explore this in more detail.

Statistical tool test

Let’s talk about a “failed” A/A test

The most common idea behind A/A tests is that the statistical tool should yield no significant difference; if a difference in performance is detected, the test is considered “failed”.

However, to understand how weak this conclusion is, you need to understand how statistical tests work. Let’s say that your significance threshold is 95%. This means that there is still a 5% chance that the difference you see is a statistical fluke and no real difference exists between the variations. So even with a perfectly working statistical tool, you still have one chance in twenty (1/20=5%) that you will have a “failed” A/A test and you might start looking for a problem that may not exist.

With that in mind, you might think an acceptable procedure would be to perform 20 A/A tests and expect 19 of them to yield no statistical difference, with one detecting a significant difference. But even then, chance dominates: seeing 2 significant results out of 20 can easily happen with a perfectly healthy tool. In other words, having 1 successful A/A test is not nearly enough to validate a statistical tool. To validate it fully, you need to show that the tests are non-significant about 95% of the time (=19/20).

Therefore, a meaningful approach would be to perform hundreds of A/A tests and expect ~5% of them to “fail”. It’s worth noting that “failing” noticeably less than 5% of the time is also a problem: it may indicate that the statistical test says “no” too often, leading to a strategy that never detects any winning variation. So one “failed” A/A test doesn’t tell you much in reality.
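The arithmetic above can be checked empirically. Below is a minimal, standard-library-only sketch (the conversion rate, visitor counts, and number of tests are illustrative, not taken from the article): it simulates many A/A tests with two truly identical variations and counts how often a two-proportion z-test declares significance at the 95% level.

```python
import random
from statistics import NormalDist

def aa_test_p_value(conv_a, conv_b, visitors):
    """Two-sided two-proportion z-test for two equal-sized variations."""
    p_pool = (conv_a + conv_b) / (2 * visitors)
    se = (p_pool * (1 - p_pool) * 2 / visitors) ** 0.5
    if se == 0:
        return 1.0
    z = (conv_a - conv_b) / visitors / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate_aa_failure_rate(n_tests=400, visitors=2_000, rate=0.05, seed=1):
    """Fraction of identical-vs-identical tests declared significant at alpha=0.05."""
    rng = random.Random(seed)
    failed = 0
    for _ in range(n_tests):
        # Both variations draw from the SAME 5% conversion rate.
        conv_a = sum(rng.random() < rate for _ in range(visitors))
        conv_b = sum(rng.random() < rate for _ in range(visitors))
        if aa_test_p_value(conv_a, conv_b, visitors) < 0.05:
            failed += 1
    return failed / n_tests

print(simulate_aa_failure_rate())  # hovers around 0.05: ~5% of A/A tests "fail"
```

Even with a flawless statistical engine, the simulated “failure” rate stays near 5%, never 0, which is exactly why a single failed A/A test proves nothing on its own.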

What if it’s a “successful A/A test”? 

A “successful” A/A test (yielding no difference) is not proof that everything is working as it should. To understand why, you need to look at another important tool in A/B testing: the sample size calculator.

For example, starting from a 5% conversion rate, you need around 30k visitors per variation to reach the 95% significance level for a 10% MDE (Minimal Detectable Effect).

But in the context of an A/A test, the Minimal Detectable Effect (MDE) is in fact 0%. Using the same formula, we can try plugging in 0% as the MDE.

At this point, you will discover that the form does not let you put 0% here, so let’s try a very small number instead. In this case, you get almost 300M visitors.

In fact, to be confident that there is exactly no difference between two variations, you need an infinite number of visitors, which is why the form does not let you set 0% as MDE.
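This blow-up follows directly from the standard two-proportion sample-size approximation. The sketch below reproduces the orders of magnitude discussed above (exact formulas vary between calculators; this one assumes a two-sided 95% significance level and 80% power):

```python
import math
from statistics import NormalDist

def sample_size_per_variation(base_rate, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed per variation for a two-proportion z-test."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for a two-sided 95% test
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

print(sample_size_per_variation(0.05, 0.10))    # 10% MDE: roughly 30k visitors
print(sample_size_per_variation(0.05, 0.001))   # 0.1% MDE: hundreds of millions
```

As the MDE approaches 0%, the `(p2 - p1) ** 2` denominator approaches zero and the required sample size diverges, which is the “infinite number of visitors” the form is protecting you from.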

Therefore, a successful A/A test only tells you that the difference between the two variations is smaller than a given number but not that the two variations perform exactly the same.

This problem stems from another principle of statistical testing: power.

The power of a test is the chance that you discover a difference if one exists. In the context of an A/A test, this is the chance that you discover a statistically significant discrepancy between the two variations’ performance.

The higher the power, the higher the chance that you will discover a difference. To raise the power of a test, you simply raise the number of visitors.

You may have noticed that sample size calculators usually default to 80% power. This means that even if a difference in performance exists between the variations, you will miss it 20% of the time. So one “successful” A/A test (yielding no statistical difference) may just be an occurrence of this 20%. In other words, a single successful A/A test doesn’t ensure the efficiency of your experimentation tool: you may have a problem, and there is a 20% chance that you missed it. Additionally, reaching 100% power would require an infinite number of visitors, making it impractical.

How do we make sure we can trust the statistical tool then? If you are using a platform that is used by thousands of other customers, chances are that the problem would have already been discovered. 

Because statistical software does not change very often and is not affected by the variation content (whereas the traffic allocation might be, as we will see later), the best option is to trust your provider, or to double-check the results with an independent one. You can find many independent calculators on the web. They only need the number of visitors and the number of conversions for each variation to provide results, which makes the check quick to perform.
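As an illustration of how little such an independent check needs, here is a sketch of the underlying computation, a two-proportion z-test fed with hypothetical visitor and conversion counts (other calculators may use slightly different formulas):

```python
from statistics import NormalDist

def two_proportion_p_value(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided p-value for the difference between two conversion rates."""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = (p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b)) ** 0.5
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical export from a testing tool: visitors and conversions per variation.
p = two_proportion_p_value(500, 10_000, 510, 10_000)
print(round(p, 3))  # a large p-value: no significant difference between 5.0% and 5.1%
```

If your platform reports a significant result while an independent calculator fed the same four numbers does not (or vice versa), that discrepancy is worth investigating.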

Traffic allocation test

In this part, we only focus on traffic, not conversions. 

The question is: does the splitting operation work as it should? We call this kind of failure an SRM, or Sample Ratio Mismatch. You may wonder how a simple random choice could fail. In fact, the failure happens either before or after the random choice itself.

Here are two examples of where that can happen:

  • The variation contains a bug that crashes some browsers. In this case, the corresponding variation loses visitors. Since the bug may depend on the browser, you end up with biased data.
  • If the variation grants a discount coupon (or any other advantage), and some users find a way to force their browser to run that variation (to get the coupon), you will have an excess of visitors for that variation that is not due to random chance, which also results in biased data.

An SRM is hard to detect with the naked eye: because the allocation is random, you never get perfectly sharp numbers.

For instance, a 50/50 allocation never splits the traffic into two groups of exactly the same size. As a result, we need statistical tools to check whether the observed split corresponds to the desired allocation.

SRM tests exist. They work more or less like an A/B test, except that the SRM formula indicates whether the observed split differs from the desired allocation. If the test flags an SRM, the difference is unlikely to be due to pure randomness: some data was lost or some bias occurred during the experiment, undermining trust in future (real) experiments.
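A basic SRM check is a chi-square goodness-of-fit test on the visitor counts alone. Here is a standard-library sketch for a two-variation split (the visitor numbers are made up for illustration):

```python
from statistics import NormalDist

def srm_p_value(visitors_a, visitors_b, expected_share_a=0.5):
    """Chi-square goodness-of-fit p-value (1 degree of freedom) for a two-way split."""
    total = visitors_a + visitors_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((visitors_a - expected_a) ** 2 / expected_a
            + (visitors_b - expected_b) ** 2 / expected_b)
    # With 1 degree of freedom, the chi-square tail reduces to the normal tail.
    return 2 * (1 - NormalDist().cdf(chi2 ** 0.5))

print(srm_p_value(50_250, 49_750))  # mild wobble: consistent with a fair 50/50 split
print(srm_p_value(52_000, 48_000))  # large imbalance: almost certainly an SRM
```

In practice, SRM checks are often run with a much stricter threshold than 0.05 (for example, 0.001), so that routine random wobble across many experiments doesn’t trigger false alarms.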

On the one hand, detecting an SRM during an A/A test sounds like a good idea. On the other hand, operationally it might not be that useful, because the chance of an SRM is low.

Even if some reports say that SRMs are more frequent than you might think, most of the time they happen on complex tests. In that sense, checking for an SRM within an A/A test will not help you prevent one on a more complex experiment later.

If you find a Sample Ratio Mismatch, whether in a real experiment or in an A/A test, the required actions are the same: find the cause, fix it, and restart the experiment. So why waste time and traffic on an A/A test that gives you no extra information? A real experiment would have given you real insights if it worked fine on the first try, and if a problem does occur, it will be detected there too, since SRM checks only consider traffic, not conversions.

A/A tests are also unnecessary because most trustworthy A/B testing platforms (like AB Tasty) run SRM checks automatically. So if an SRM occurs, you will be notified anyway.

So where does this “habit” of practicing A/A tests come from?

Over the years, it’s something that engineers building A/B testing platforms have done. It makes sense in their case: they can run a lot of automated experiments, and even simulate users if they don’t have enough at hand, which makes a statistically sound approach to A/A testing possible.

They have reasons to doubt the platform in the works and they have the programming skills to automatically create hundreds of A/A tests to test it properly. Since these people can be seen as pioneers, their voice on the web is loud when they explain what an A/A test is and why it’s important (from an engineering perspective).

However, for a platform user/customer, the context is different: they’ve paid for a ready-to-use, trusted platform and want to start a real experiment as soon as possible to get a return on investment. Therefore, it makes little sense to waste time and traffic on an A/A test that won’t provide any valuable information.

Why sometimes it might be better to skip A/A tests

We can conclude that a failed A/A test is not necessarily a sign of a problem, and that a successful one is not proof of sanity.

In order to gain valuable insights from A/A tests, you would need to perform hundreds of them with an infinite number of visitors. Moreover, an efficient platform like AB Tasty does the corresponding checks for you.

That’s why, unless you are developing your own A/B testing platform, running an A/A test may not give you the insights you’re looking for. A/A tests require a considerable amount of time and traffic that could otherwise be used to conduct A/B tests that could give you valuable insights on how to optimize your user experience and increase conversions. 

When it makes sense to run an A/A test

By now, running A/A tests may not seem like the right call after all. However, there are a couple of scenarios where they can still be useful.

The first is when you want to check the data you are collecting and compare it to data already collected with other analytics tools. Keep in mind that you will never get exactly the same results, because metric definitions vary from one tool to another. Nonetheless, this comparison is an important onboarding step to ensure that the data is properly collected.

The other reason to perform an A/A test is to know the reference value for your main metrics so you can establish a baseline to analyze your future campaigns more accurately. For example, what is your base conversion rate and/or bounce rate? Which of these metrics need to be improved and are, therefore, a good candidate for your first real A/B test?

This is why AB Tasty has a feature that helps users build A/A tests dedicated to reaching these goals while avoiding the pitfalls of “old school” methods that are no longer useful. With our new A/A test feature, A/A test data is collected in one variant (not two); let’s call this an “A test”.

This allows for a more accurate estimation of these important metrics: the more data you have, the more accurate the measurements are. In a classic A/A test, by contrast, data is collected in two different variants, which yields less accurate estimates since each variant receives less data.
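The accuracy gain follows from the standard error of an estimated proportion, which shrinks as the sample grows. A small sketch with made-up traffic numbers:

```python
import math

def conversion_rate_se(rate, visitors):
    """Standard error of a conversion rate estimated from a given number of visitors."""
    return math.sqrt(rate * (1 - rate) / visitors)

total_visitors = 100_000
se_one_variant = conversion_rate_se(0.05, total_visitors)    # all traffic pooled ("A test")
se_split = conversion_rate_se(0.05, total_visitors // 2)     # classic A/A: half per variant
print(se_one_variant, se_split)  # the split estimate's error is ~41% larger (a factor of sqrt(2))
```

Pooling all traffic into a single variant therefore gives a tighter baseline estimate than splitting the same traffic across two identical variants.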

With this approach, AB Tasty enables users to automatically set up A/A tests, which gives better insights than classic “handmade” A/A tests.


How AI Can Enhance Your Experimentation Roadmap

Artificial intelligence has been a recurring theme for decades. However, it’s no longer science fiction – it’s a reality.

Since OpenAI launched its generative AI chatbot, ChatGPT, in November 2022, the world has yet to stop talking about its striking capabilities. It’s particularly fascinating to see just how easy it is to get results from this bot, which is built on deep-learning algorithms for natural language processing.

Even Google quickly followed by launching a new and experimental project, Bard, to revolutionize its own Search. By harnessing the power of generative AI and the capacity of large language models, Google is seeking to take its search process to the next level.

Given the rapid growth of this technological advancement over the past few months, it’s time that we talk about generative AI in the context of A/B testing and experimentation.

Whether you’re curious about how AI can impact your experiments or are simply looking for inspiration, we’ll discuss some of our ideas around using AI for A/B testing, personalization, and conversion rate optimization.

What is generative AI?

Generative AI is a type of artificial intelligence that can generate new content (think ChatGPT). Instead of following a fixed, pre-programmed ruleset, generative AI learns from indexing extensive data, identifying patterns and using deep-learning techniques and neural networks to create human-like content based on what it has learned.

The way algorithms capture ideas is similar to how humans gather inspiration from previous experiences to create something unique. Based on the large amounts of data used to craft generative AI’s learning abilities, it’s capable of outputting high-quality responses that are similar to what a human would create.

However, some concerns need to be addressed:

  • Biased information: Artificial intelligence is only as good as the datasets used to train it. Therefore, if the training data has biases, it may create “ideas” that are equally biased or flawed.
  • Spreading misinformation: There are many concerns about the ethics of generative AI and sharing information directly from it. It’s best practice to fact-check any content written by AI to avoid putting out false or misleading information.
  • Content ownership: Since content generated with AI is not created by a human, can you ethically claim it as your own idea? In a similar sense, the same idea could potentially be generated elsewhere using a similar prompt. Copyright and ownership are called into question here.
  • Data and privacy: Data privacy is always a top-of-mind concern. With the new capabilities of artificial intelligence, data handling becomes even more challenging. It’s always best practice to avoid using sensitive information with any form of generative AI.

By keeping these limitations in mind, generative AI has the potential to streamline processes and revolutionize the way we work – just as technology has always done in the past.

10 generative AI uses for A/B testing

In the A/B testing world, we are very interested in how one can harness these technological breakthroughs for experimentation. We are brainstorming a few approaches to re-imagine the process of revolutionizing digital customer experiences to ultimately save time and resources.

Just like everyone else, we started to wonder how generative AI could impact the world of experimentation and our customers. Here are some ideas, some of them concrete and some more abstract, as to how artificial intelligence could help our industry:

DISCLAIMER: Before uploading information into any AI platform, ensure that you understand their privacy and security practices. While AI models strive to maintain a privacy standard, there’s always the risk of data breaches. Always protect your confidential information. 

1. Homepage optimization

Your homepage is likely the first thing your visitors will see so optimization is key to staying ahead of your competitors. If you want a quick comparison of content on your homepage versus your competitors, you can feed this information into generative AI to give it a basis for understanding. Once your AI is loaded with information about your competitors, you can ask for a list of best practices to employ to make new tests for your own website.

2. Analyze experimentation results

Reporting and analyzing are crucial to progressing on your experimentation roadmap, but it’s also time-consuming. By collecting a summary of testing logs, generative AI can help highlight important findings, summarize your results, and potentially even suggest future steps. Ideally, you can feed your A/B test hypothesis as well as the results to show your thought process and organization. After it recognizes this specific thought process and desired results, it could aid in generating new test hypotheses or suggestions.

3. Recommend optimization barriers

Generative AI can help you prioritize your efforts and identify the most impactful barriers to your conversion rate. Uploading your nonsensitive website performance data gathered from your analytics platforms can give AI the insight it needs into your performance. Whether it suggests that you update your title tags or compress images on your homepage, AI can quickly spot where you have the biggest drop-offs to suggest areas for optimization.

4. Client reviews

User feedback is your own treasure trove of information for optimization. One of the great benefits of AI that we already see is that it can understand large amounts of data quickly and summarize it. By uploading client reviews, surveys and other consumer feedback into the database, generative AI can assist you in creating detailed summaries of your users’ pain points, preferences and levels of satisfaction. The more detailed your reviews – the better the analysis will be.

5. Chatbots

Chatbots are a popular way to communicate with website visitors. As generative AI is a large language model, it can quickly generate conversational scripts, prompts and responses to reduce your brainstorming time. You can also use AI to filter and analyze conversations that your chatbot is already having to determine if there are gaps in the conversation or ways to enhance its interaction with customers.

6. Translation

Language barriers can limit a brand that has a presence in multiple regions. Whether you need translations for your chatbot conversations, CTAs or longer-form copy, generative AI can provide translations in real time, saving you time and making your content accessible in every region your brand reaches.

7. Google Adwords

Speed up brainstorming sessions by using generative AI to experiment with different copy variations. Based on the prompts you provide, AI can provide you with a series of ideas for targeting keywords and creating copy with a particular tone of voice to use with Google Adwords. Caution: be sure to double-check all keywords proposed to verify their intent. 

8. Personalization

Personalized content can be scaled at speed by leveraging artificial intelligence to produce variations of the same messages. By customizing your copy, recommendations, product suggestions and other messages based on past user interactions and consumer demographics, you can significantly boost your digital consumer engagement.

9. Product Descriptions

Finding the best wording to describe why your product is worth purchasing may be a challenge. With generative AI, you can get more ambitious with your product descriptions by testing out different variations of copy to see which version is the most promising for your visitors.

10. Predict User Behavior

Based on historical data from your user behavior, generative AI can predict behavior that can help you to anticipate your next A/B test. Tailoring your tests according to patterns and trends in user interaction can help you conduct better experiments. It’s important to note that predictions will be limited to patterns interpreted by past customer data collected and uploaded. Using generative AI is better when it’s used as a tool to guide you in your decision-making process rather than to be the deciding force alone.

The extensive use of artificial intelligence is a new and fast-evolving subject in the tech world. If you want to leverage it in the future, you need to start familiarizing yourself with its capabilities.

Keep in mind that it’s important to verify the facts and information AI generates just as you carefully verify data before you upload. Using generative AI in conjunction with your internal experts and team resources can assist in improving ideation and efficiency. However, the quality of the output from generative AI is only as good as what you put in.

Is generative AI a source of competitive advantage in A/B testing?

The great news is that this technology is accessible to everyone – from big industry leaders like Google to start-ups with a limited budget. However, the not-so-great news is that this is available to everyone. In other words, generative AI is not necessarily a source of competitive advantage.

Technology existing by itself does not create more value for a business. Rather, it’s the people driving the technology who are creating value by leveraging it in combination with their own industry-specific knowledge, past experiences, data collection and interpretation capabilities and understanding of customer needs and pain points.

While we aren’t here to say that generative AI is a replacement for human-generated ideas, this technology can definitely be used to complement and amplify your already-existing skills.

Leveraging generative AI in A/B testing

From education to copywriting or coding – all industries are starting to see the impact that these new software developments will have. Leveraging “large language models” is becoming increasingly popular as these algorithms can generate ideas, summarize long forms of text, provide insights and even translate in real-time.

Proper experimentation and A/B testing are at the core of engaging your audience, however, these practices can take a lot of time and resources to accomplish successfully. If generative AI can offer you ways to save time and streamline your processes, it might be time to use it as your not-so-secret weapon. In today’s competitive digital environment, continually enhancing your online presence should be at the top of your mind.

Want to start optimizing your website? AB Tasty is the best-in-class experience optimization platform that empowers you to create a richer digital experience – fast. From experimentation to personalization, this solution can help you activate and engage your audience to boost your conversions.