10 Generative AI Ideas for Your Experimentation Roadmap

Artificial intelligence has been a recurring theme for decades. However, it’s no longer science fiction – it’s a reality.

Since OpenAI launched its own form of generative AI, ChatGPT, in November 2022, the world has yet to stop talking about its striking capabilities. It’s particularly fascinating to see just how easy it is to get results from this bot, which is built on deep-learning algorithms for natural language processing.

Even Google quickly followed by launching a new and experimental project, Gemini, to revolutionize its own Search. By harnessing the power of generative AI and the capacity of large language models, Google is seeking to take its search process to the next level.

Given the rapid growth of this technological advancement over the past few months, it’s time that we talk about generative AI in the context of A/B testing and experimentation.

Whether you’re curious about how AI can impact your experiments or are ready for inspiration, we’ll discuss some of our ideas around using AI for A/B testing, personalization, and conversion rate optimization.

What is generative AI?

Generative AI is a type of artificial intelligence that can create new content rather than simply retrieving pre-programmed answers (think ChatGPT). Instead of following a specific, pre-existing dataset, generative AI learns by indexing extensive data, identifying patterns and using deep-learning techniques and neural networks to create human-like content based on its learnings.

The way algorithms capture ideas is similar to how humans gather inspiration from previous experiences to create something unique. Based on the large amounts of data used to craft generative AI’s learning abilities, it’s capable of outputting high-quality responses that are similar to what a human would create.

However, some concerns need to be addressed:

  • Biased information: Artificial intelligence is only as good as the datasets used to train it. Therefore if the data used to train it has biases, it may create “ideas” that are equally biased or flawed.
  • Spreading misinformation: There are many concerns about the ethics of generative AI and sharing information directly from it. It’s best practice to fact-check any content written by AI to avoid putting out false or misleading information.
  • Content ownership: Since content generated with AI is not generated by a human, can you ethically claim it as your own idea? In a similar sense, the same idea could potentially be generated elsewhere by using a similar prompt. Copyright and ownership are then called into question here.
  • Data and privacy: Data privacy is always a top-of-mind concern. With the new capabilities of artificial intelligence, data handling becomes even more challenging. It’s always best practice to avoid using sensitive information with any form of generative AI.

By keeping these limitations in mind, generative AI has the potential to streamline processes and revolutionize the way we work – just as technology has always done in the past.

10 generative AI uses for A/B testing

In the A/B testing world, we are very interested in how one can harness these technological breakthroughs for experimentation. We are brainstorming a few approaches to re-imagine digital customer experiences and ultimately save time and resources.

Just like everyone else, we started to wonder how generative AI could impact the world of experimentation and our customers. Here are some ideas, some of them concrete and some more abstract, as to how artificial intelligence could help our industry:

DISCLAIMER: Before uploading information into any AI platform, ensure that you understand its privacy and security practices. While AI models strive to maintain a privacy standard, there’s always the risk of data breaches. Always protect your confidential information.

1. Homepage optimization

Your homepage is likely the first thing your visitors will see, so optimization is key to staying ahead of your competitors. If you want a quick comparison of content on your homepage versus your competitors, you can feed this information into generative AI to give it a basis for understanding. Once your AI is loaded with information about your competitors, you can ask for a list of best practices to employ to make new tests for your own website.

2. Analyze experimentation results

Reporting and analyzing are crucial to progressing on your experimentation roadmap, but they’re also time-consuming. By collecting a summary of testing logs, generative AI can help highlight important findings, summarize your results, and potentially even suggest future steps. Ideally, you can feed your A/B test hypothesis as well as the results to show your thought process and organization. After it recognizes this specific thought process and desired results, it could aid in generating new test hypotheses or suggestions.

3. Identify optimization barriers

Generative AI can help you prioritize your efforts and identify the most impactful barriers to your conversion rate. Uploading your nonsensitive website performance data gathered from your analytics platforms can give AI the insight it needs into your performance. Whether it suggests that you update your title tags or compress images on your homepage, AI can quickly spot where you have the biggest drop-offs to suggest areas for optimization.

4. Client reviews

User feedback is your own treasure trove of information for optimization. One of the great benefits of AI that we already see is that it can understand large amounts of data quickly and summarize it. By uploading client reviews, surveys and other consumer feedback into the database, generative AI can assist you in creating detailed summaries of your users’ pain points, preferences and levels of satisfaction. The more detailed your reviews, the better the analysis will be.

5. Chatbots

Chatbots are a popular way to communicate with website visitors. As generative AI is a large language model, it can quickly generate conversational scripts, prompts and responses to reduce your brainstorming time. You can also use AI to filter and analyze conversations that your chatbot is already having to determine if there are gaps in the conversation or ways to enhance its interaction with customers.

6. Translation

Language barriers can limit a brand that has a presence in multiple regions. Whether you need translations for your chatbot conversations, CTAs or longer form copy, generative AI can provide you with translations in real time to save you time and make your content accessible across every region your brand reaches.

7. Google AdWords

Speed up brainstorming sessions by using generative AI to experiment with different copy variations. Based on the prompts you provide, AI can provide you with a series of ideas for targeting keywords and creating copy with a particular tone of voice to use with Google AdWords. Caution: be sure to double-check all proposed keywords to verify their intent.

8. Personalization

Personalized content can be scaled at speed by leveraging artificial intelligence to produce variations of the same messages. By customizing your copy, recommendations, product suggestions and other messages based on past user interactions and consumer demographics, you can significantly boost your digital consumer engagement.

9. Product descriptions

Finding the best wording to describe why your product is worth purchasing may be a challenge. With generative AI, you can get more ambitious with your product descriptions by testing out different variations of copy to see which version is the most promising for your visitors.

10. Predict user behavior

Based on historical data on your users’ behavior, generative AI can make predictions that help you anticipate your next A/B test. Tailoring your tests according to patterns and trends in user interaction can help you conduct better experiments. It’s important to note that predictions will be limited to the patterns present in the customer data you have collected and uploaded. Generative AI works best as a tool that guides your decision-making process rather than as the deciding force on its own.

The extensive use of artificial intelligence is a new and fast-evolving subject in the tech world. If you want to leverage it in the future, you need to start familiarizing yourself with its capabilities.

Keep in mind that it’s important to verify the facts and information AI generates just as you carefully verify data before you upload. Using generative AI in conjunction with your internal experts and team resources can assist in improving ideation and efficiency. However, the quality of the output from generative AI is only as good as what you put in.

Is generative AI a source of competitive advantage in A/B testing?

The great news is that this technology is accessible to everyone – from big industry leaders like Google to start-ups with a limited budget. However, the not-so-great news is that this is available to everyone. In other words, generative AI is not necessarily a source of competitive advantage.

Technology existing by itself does not create more value for a business. Rather, it’s the people driving the technology who are creating value by leveraging it in combination with their own industry-specific knowledge, past experiences, data collection and interpretation capabilities and understanding of customer needs and pain points.

While we aren’t here to say that generative AI is a replacement for human-generated ideas, this technology can definitely be used to complement and amplify your already-existing skills.

Leveraging generative AI in A/B testing

From education to copywriting to coding, all industries are starting to see the impact that these new software developments will have. Leveraging large language models is becoming increasingly popular as these algorithms can generate ideas, summarize long-form text, provide insights and even translate in real time.

Proper experimentation and A/B testing are at the core of engaging your audience; however, these practices can take a lot of time and resources to accomplish successfully. If generative AI can offer you ways to save time and streamline your processes, it might be time to use it as your not-so-secret weapon. In today’s competitive digital environment, continually enhancing your online presence should be at the top of your mind.

Want to start optimizing your website? AB Tasty is the best-in-class experience optimization platform that empowers you to create a richer digital experience – fast. From experimentation to personalization, this solution can help you activate and engage your audience to boost your conversions.



How Long Should You Run an A/B Test?

One of the most popular questions when starting with experimentation is: How long should an A/B test run before you can draw conclusions from it?

Determining the ideal A/B test duration can be a challenge for most businesses. You have to factor in your business cycles, traffic flow, the sample size needed and be aware of other business campaigns.

Even if you reach your sample size in a few days… is it okay to end your test then? How long should you really wait?

In this article, we will discuss potential mishaps if your testing cycle is too short, give insights into which factors you need to consider and share advice on finding the best duration for your A/B tests.

Looking for fast statistical reliability? At AB Tasty, we provide a free A/B test duration calculator, which also functions as a sample size calculator.

What happens if you end an A/B test too soon?

The underlying question is a crucial one and can be summed up as follows: At what point can you end a test that appears to be yielding results?

The answer depends on the relevance of the analysis and on the actual benefits of the test.

In fact, it’s not all that unusual to see tests yield good results during the trial phase and no longer see those results once the modifications are introduced.

In most cases, a disappointing observation of this nature comes down to an error during the trial phase: the test was ended too soon and the results at that point were misleading.

Let’s look at an example that illustrates the nature of the problem.

How long to run an A/B test

The graph above shows the change in the conversion rate of two versions of a page that were the subject of a test. The first version appears to break away and perform extremely well. The discrepancy between the two versions is gradually eroded as time goes by – two weeks after the starting point there’s hardly any observable difference at all.

This phenomenon where the results converge is a typical situation: the modification made does not have a real impact on conversion.

There is a simple explanation for the apparent outperformance at the start of the test: it’s unusual for the samples to be representative of your audience when the test starts. You need time for your samples to incorporate all internet user profiles, and therefore, all of their behaviors.

If you end the test too soon and allow your premature data to be the deciding factor, your results will quickly show discrepancies.
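This convergence effect is easy to reproduce in a quick simulation. The sketch below is a hypothetical illustration (not tied to any testing platform): it draws visitors for two variants that share the same true conversion rate and measures the observed gap between them, which looks deceptively large early on and shrinks as traffic accumulates.

```python
import random

def observed_gap(n_visitors, true_rate=0.05, seed=42):
    """Simulate two variants with the SAME true conversion rate and
    return the absolute difference between their observed rates."""
    rng = random.Random(seed)
    conv_a = sum(rng.random() < true_rate for _ in range(n_visitors))
    conv_b = sum(rng.random() < true_rate for _ in range(n_visitors))
    return abs(conv_a - conv_b) / n_visitors

# Early in the test, noise alone can make one variant look like a winner;
# with more traffic, two identical variants converge.
early = observed_gap(200)
late = observed_gap(50_000)
print(f"gap after 200 visitors:    {early:.4f}")
print(f"gap after 50,000 visitors: {late:.4f}")
```

With a larger sample, the observed gap between the two identical variants is almost certain to be tiny, which is exactly why premature readings mislead.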

How to determine the duration of your A/B test

Now that the problem has been aired, let’s have a look at how you can avoid falling into this trap.

The average recommended A/B testing time is 2 weeks, but you should always identify key factors relevant to your own conversion goals to determine the best length for a test that will meet your goals.

Let’s discuss several criteria you should use as a foundation to determine when you can trust the results you see in your A/B testing:

  • The statistical confidence level
  • The size of the sample
  • The representativeness of your sample
  • The test period and the device being tested

1. The statistical confidence level

All A/B testing solutions show a statistical reliability indicator that measures the probability of the difference in the results observed between each sample not being a matter of chance.

This indicator, which is calculated using the Chi-squared test, is the first one that should be used as a basis. Statisticians deem a test reliable when the rate is 95% or higher. In other words, you accept a 5% risk of being wrong – of concluding there is a difference when the results of the two versions are in fact identical.

Yet, it would be a mistake to use this indicator alone as a basis for assessing the appropriate time to end a test.

Reaching this threshold is a necessary condition for assessing the reliability of a test, but it is not a sufficient one. In other words, if you have not reached this threshold, you cannot make the decision; and once it has been reached, you still need to take certain precautions.

It’s also important to understand what the Chi-squared test actually is: a way of rejecting or not rejecting what is referred to as the null hypothesis.

This, when applied to A/B testing, is when you say that two versions produce identical results (therefore, there’s no difference between them).

If the conclusion of the test leads you to reject the null hypothesis then it means that there is a difference between the results.

However, the test is in no way an indication of the extent of this difference.
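For illustration, here is a minimal, stdlib-only Python sketch of the Chi-squared test on a two-version experiment. It is a simplified example, not the actual calculation performed by any particular testing solution; the function name and sample figures are ours.

```python
from math import erfc, sqrt

def chi_squared_p_value(conv_a, visitors_a, conv_b, visitors_b):
    """p-value of the Chi-squared test (1 degree of freedom) on a
    2x2 contingency table of conversions vs. non-conversions."""
    table = [
        [conv_a, visitors_a - conv_a],
        [conv_b, visitors_b - conv_b],
    ]
    total = visitors_a + visitors_b
    row_totals = [visitors_a, visitors_b]
    col_totals = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the survival function is erfc(sqrt(x / 2)).
    return erfc(sqrt(stat / 2))

# Example: 500 conversions out of 10,000 vs. 600 out of 10,000.
p = chi_squared_p_value(500, 10_000, 600, 10_000)
print(f"p-value: {p:.4f} -> reliable at 95%? {p < 0.05}")
```

A p-value below 0.05 corresponds to the 95% reliability threshold described above; note that, as the article says, it tells you nothing about the size of the difference.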

Related: A/B Test Hypothesis Definition, Tips and Best Practices

2. The size of the sample

There are lots of online tools that you can use to calculate the value of Chi-squared by giving, as the input parameters, the four elements necessary for its calculation (within the confines of a test with two versions).

AB Tasty provides its own sample size calculator that you can use to find the value of Chi-squared.

By using this tool, we have taken an extreme example in order to illustrate this exact problem.

Sample size required for A/B testing

In this diagram, the Chi-squared calculation suggests that sample 2 converts better than sample 1 with a 95% confidence level. Having said that, the input values are very low indeed and there is no guarantee that if 1,000 people were tested, rather than 100, you would still have the same 1 to 3 ratio between the conversion rates.

It’s like flipping a coin. If there is a 50% probability that the coin will land heads-up or tails-up, then it’s possible to get a 70 / 30 distribution by flipping it just 10 times. It’s only when you flip the coin a very large number of times that you get close to the expected ratio of 50 / 50.
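A few lines of Python make the coin-flip intuition concrete (a toy simulation, with an arbitrary seed purely for reproducibility):

```python
import random

def heads_ratio(flips, seed=7):
    """Proportion of heads over a given number of fair coin flips."""
    rng = random.Random(seed)
    return sum(rng.random() < 0.5 for _ in range(flips)) / flips

print(f"10 flips:      {heads_ratio(10):.2f}")
print(f"100,000 flips: {heads_ratio(100_000):.3f}")
```

Over just 10 flips the ratio can land far from 50 / 50, while over 100,000 flips it reliably settles very close to it.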

So, in order to have faith in the Chi-squared test, you are advised to use a significant sample size.

You can calculate the size of this sample before beginning the test to get an indication of the point at which it would be appropriate to look at the statistical reliability indicator. There are several tools online that you could use to calculate this sample size.

In practice, this can turn out to be difficult, as one of the parameters to be given is the % improvement expected (which is not easy to evaluate). But, it can be a good exercise to assess the pertinence of the modifications being envisaged.

Pro Tip: The lower the expected improvement rate, the greater the sample size needed to be able to detect a real difference.

If your modifications have a very low impact, then a lot of visitors will need to be tested. This serves as an argument in favor of introducing radical or disruptive modifications that would probably have a greater impact on the conversion.
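As a sketch of the calculation those online tools perform, here is a hypothetical helper based on the standard normal approximation for comparing two proportions. The defaults assume a two-sided test at 95% confidence and 80% power, which are common choices; the function name and example figures are ours.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_variant(baseline_rate, expected_uplift,
                            alpha=0.05, power=0.80):
    """Visitors needed per variant to detect a relative uplift on a
    baseline conversion rate (two-sided test, normal approximation)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + expected_uplift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)

# The smaller the expected improvement, the larger the sample needed:
for uplift in (0.20, 0.10, 0.05):
    n = sample_size_per_variant(0.05, uplift)
    print(f"+{uplift:.0%} uplift on a 5% baseline -> {n:,} visitors per variant")
```

Running this shows the Pro Tip in action: halving the expected uplift roughly quadruples the sample required, which is why small modifications demand a lot of traffic.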


3. The representativeness of your sample

If you have a lot of traffic, then getting a sufficiently large sample size is not a problem and you will be able to get a statistical reliability rate in just a few days, sometimes just two or three.

Related: How to Deal with Low Traffic in CRO

Having said that, ending a test as soon as the sample size and statistical reliability conditions have been met is no guarantee that results in a real-life situation are being reproduced.

The key point is to test for as long as you need to in order for all of your audience segments to be included.

In fact, statistical tests operate on the premise that your samples are identically distributed. In other words, the conversion probability is the same for all internet users.

But this is not the case: the probability varies in accordance with different factors such as the weather, the geographical location and also user preferences.

There are two very important factors that must be taken into account here: your business cycles and traffic sources.

Your business cycles 

Internet users do not make a purchase as soon as they come across your site. They learn more, they compare, and their thoughts take shape. One, two or even three weeks might elapse between the time they are the subject of one of your tests and the point at which they convert.

If your purchasing cycle is three weeks long and you have only run the test for one week, then your sample will not be representative. As the tool records visits from all internet users, it may not record the conversions of those who are impacted by your test.

Therefore, you’re advised to test over at least one business cycle and ideally two.

Your traffic sources 

Your sample must incorporate all of your traffic sources including emails, sponsored links and social networks. You need to make sure that no single source is over-represented in your sample.

Let’s take a concrete situation: if the email channel is a weak source of traffic but significant in terms of revenue and you carry out a test during an email campaign, then you are going to include internet users who have a stronger tendency to make a purchase in your sample.

This would no longer be a representative sample. It’s also crucial to know about major acquisition projects and, if possible, not to test during these periods.

The same goes for tests during sales or other significant promotional periods that attract atypical internet users. You will often see less marked differences in the results if you re-do the tests outside these periods.

It turns out that it’s quite difficult to make sure that your sample is representative, as you have little control over the kind of internet users who take part in your test.

Thankfully, there are two ways of overcoming this problem.

  • The first is to extend the duration of your test beyond what is strictly necessary in order to get closer to the normal spread of your internet users.
  • The second is to target your tests so that you only include a specific population group in your sample. For example, you could exclude all internet users who have come to you as a result of your email campaigns from your samples, if you know that this will distort your results. You could also target only new visitors so that you do not include visitors who have reached an advanced stage in their purchasing process (AKA visitors who are likely to convert regardless of which variation they see).

4. Other elements to keep in mind

There are other elements to bear in mind in order to be confident that your trial conditions are as close as they can be to a real-life situation: timing and the device.

Conversion rates can vary massively on different days of the week and even at different times of the day. Therefore, you’re advised to run the test over complete periods.

In other words, if you launch the test on a Monday morning then it should be stopped on a Sunday evening so that a normal range of conversions is respected.

In the same way, conversion rates can vary enormously between mobiles, tablets and desktop computers. So with devices, you’re advised to test your sites or pages specifically for each device. This is easy to accomplish by using the targeting features to include or exclude the devices if your users show very different browsing and purchasing behavior patterns.

These elements should be taken into account so that you do not end your tests too soon and get led astray by a faulty analysis of the results.

They also explain why certain A/A tests carried out over a period of time that is too short, or during a period of unusual activity, can present differences in results and also differences in statistical reliability, even when you may not have made any modifications at all.

The ideal A/B test duration

Running an A/B test requires thorough consideration of various factors such as your own conversion goals, statistical significance, sample size, seasonality, campaigns, traffic sources, etc. All factors deserve equal attention when determining the best practices for your business.

Just remember to be patient, even if you reach your sample size early. You may be surprised by the final results.

As A/B testing is an iterative process, continuous experimentation and conversion rate optimization will lead to better results over time.