Statistical significance is a widely used concept in statistical hypothesis testing. It indicates how unlikely it is that an observed difference or relationship between a variation and a control is due to chance alone.
According to SPSS Tutorials, it is “the probability of finding a given variation from the null hypothesis in a sample”. This probability is referred to as the p-value.
Simply put, a statistically significant experiment means you can be reasonably confident that the results you observed are not due to chance.
For any given statistical experiment – including A/B testing – statistical significance is based on several parameters:
- The confidence level (i.e. how sure you can be that the results are statistically relevant, e.g. 95%)
- Your sample size (small effects measured in small samples tend to be unreliable)
- Your minimum detectable effect (i.e. the minimum effect that you want to observe with that experiment)
As you may have guessed already, your sample size is correlated with your minimum detectable effect: the smaller your minimum detectable effect, the larger the sample size you will need.
Your sample size and confidence level are also correlated. Larger samples mean that you can be more confident in your tests’ statistical significance.
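The claim that small samples produce unreliable estimates is easy to see in a quick simulation. The sketch below is illustrative only: it assumes a hypothetical page with a true 5% conversion rate and shows how much the measured rate swings by chance at different sample sizes.

```python
import random

# Hypothetical example: a page whose true conversion rate is 5%.
# We estimate that rate from samples of different sizes to see how
# much the estimates fluctuate purely by chance.
TRUE_RATE = 0.05
random.seed(42)

def estimated_rate(sample_size):
    """Simulate `sample_size` visitors and return the observed conversion rate."""
    conversions = sum(random.random() < TRUE_RATE for _ in range(sample_size))
    return conversions / sample_size

for n in (100, 1_000, 10_000):
    estimates = [estimated_rate(n) for _ in range(5)]
    print(f"n={n:>6}: {[round(e, 3) for e in estimates]}")
```

With 100 visitors, the estimates scatter widely around 5%; with 10,000, they cluster tightly near the true rate, which is why larger samples support higher confidence.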
Statistical significance and digital business applications
Although many industries utilize statistics, digital marketers have started using them more and more with the rise of A/B testing. And it’s no surprise.
Online marketers seek more accurate, proven methods of running online experiments. These experiments can target conversions, average order value, cart abandonment and many other key performance indicators.
As a marketer, you want to be certain about the results you get. You want your results to be reliable so that they will lead you to informed decisions.
When your company runs an A/B test, that test is essentially an online controlled experiment. You’re trying to establish a causal link between your actions and the observed results.
A/B testing example
Let’s say that your company wants to A/B test a landing page in order to increase its performance at generating leads.
You decide to test whether or not changing your CTA’s color will affect the number of clicks on it. This is your hypothesis.
As an informed marketer, you know that people behave very differently from one cohort to another. Chance alone can produce differences that look like the effect of your changes.
That’s why you’ll need randomized samples and A/B tests.
At that point:
- The null hypothesis: your current CTA color performs just as well, i.e. the change has no effect on clicks
- The alternative hypothesis: your new CTA color changes the number of clicks
In order to get statistically relevant results, we’ll then have to work on 2 variables:
- The p-value: the probability of observing an effect at least as large as the one measured, assuming the null hypothesis is true
- The confidence interval: the range of plausible values for the true effect, derived from your confidence level
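As an illustration, here is a minimal sketch of how a p-value and a confidence interval could be computed for an A/B test using a standard two-proportion z-test. The numbers are hypothetical, and real testing tools may use different statistical methods:

```python
from math import sqrt
from statistics import NormalDist

def ab_test_summary(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Two-proportion z-test: a p-value for the observed difference in
    conversion rates and a confidence interval around it. A simplified
    sketch, not a full-featured testing tool."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled standard error under the null hypothesis (no difference).
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided test
    # Unpooled standard error for the confidence interval on the lift.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p_b - p_a
    return p_value, (diff - z_crit * se, diff + z_crit * se)

# Hypothetical test: 200/5,000 conversions on control, 250/5,000 on variation.
p, ci = ab_test_summary(conv_a=200, n_a=5_000, conv_b=250, n_b=5_000)
print(f"p-value: {p:.4f}, 95% CI for the lift: ({ci[0]:.4f}, {ci[1]:.4f})")
```

In this made-up example the p-value falls below 0.05 and the interval excludes zero, so the lift would be called statistically significant at the 95% confidence level.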
From there on, there are two parameters that you’d probably want to determine:
- The size of your sample (i.e. how many people should be included)
- The duration of your test (i.e. how long you should run your test)
Understanding sample size
Sample size is a key parameter: it tells you how many people you need to test for your results to reach statistical significance. We have a sample size calculator you can use to estimate the required sample size.
It is based on 3 other parameters:
- Your current conversion rate
- The minimum detectable effect (MDE)
- The statistical significance
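As a rough illustration of how these three parameters interact, here is a sketch based on the standard two-proportion sample-size formula. The 80% power figure is an added assumption, and actual calculators may use different methods:

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(base_rate, mde, significance=0.95, power=0.80):
    """Visitors needed per variation to detect a relative lift of `mde`
    over `base_rate`. Standard two-proportion formula; a sketch, not the
    exact method any particular calculator uses."""
    p1 = base_rate
    p2 = base_rate * (1 + mde)  # MDE expressed as a relative lift
    z_alpha = NormalDist().inv_cdf(1 - (1 - significance) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return ceil(n)

# Example: 5% baseline conversion rate, 10% relative MDE, 95% significance.
print(required_sample_size(0.05, 0.10))  # roughly 31,000 visitors per variation
```

Halving the MDE roughly quadruples the required sample, and raising the confidence level also increases it, which matches the tradeoffs described above.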
Understanding minimum detectable effect
You might estimate a projected minimum detectable effect, but there are many parameters that can affect your projection.
Well aware of this issue, we’ve created an MDE calculator to compute your minimum detectable effect. This calculation is based on your current conversion rate and the number of visitors to your website.
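One way such a computation can work, sketched below under stated assumptions (a hypothetical helper, 80% power, the standard two-proportion formula), is to invert the sample-size formula numerically: given your traffic, find the smallest relative lift it can detect.

```python
from statistics import NormalDist

def minimum_detectable_effect(base_rate, visitors_per_variation,
                              significance=0.95, power=0.80):
    """Smallest relative lift detectable with the given traffic. Inverts
    the standard two-proportion sample-size formula by bisection; a
    sketch of the idea, not any specific calculator's method."""
    z_alpha = NormalDist().inv_cdf(1 - (1 - significance) / 2)
    z_beta = NormalDist().inv_cdf(power)

    def required_n(mde):
        p1, p2 = base_rate, base_rate * (1 + mde)
        variance = p1 * (1 - p1) + p2 * (1 - p2)
        return (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2

    # required_n decreases as the effect grows, so bisect on the lift.
    lo, hi = 1e-6, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if required_n(mid) > visitors_per_variation:
            lo = mid  # effect too small to detect with this traffic
        else:
            hi = mid
    return hi

# Example: 5% baseline rate, 10,000 visitors per variation.
print(f"{minimum_detectable_effect(0.05, 10_000):.1%}")
```

With these hypothetical inputs, only fairly large lifts (on the order of 15–20% relative) are detectable; more traffic shrinks the MDE.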
A note on multivariate testing and statistical significance
Multivariate tests consist of testing several variants against a control version in a single test.
Because they allow bigger potential gains, marketers often run multivariate tests as an A/B/n type of test. They can be an effective way to generate large improvements in conversions and other goals.
Although these tests can rapidly yield big results, bear in mind that they require larger sample sizes and longer test durations in order to be statistically significant.
Simply put, the risk of getting false positives increases as the number of variations tested grows.
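This compounding risk is easy to quantify: if each variant is compared to the control at a 5% significance level, the chance of at least one false positive across the whole test grows with the number of variants. A small sketch:

```python
# If each comparison is run at a 5% significance level, the probability
# of at least one false positive across k independent comparisons is
# 1 - (1 - alpha)^k, which grows quickly with k.
alpha = 0.05
for variants in (1, 2, 5, 10):
    family_wise = 1 - (1 - alpha) ** variants
    print(f"{variants:>2} variant(s): {family_wise:.1%} chance of a false positive")
```

With ten variants, the family-wise false positive rate climbs to roughly 40%, which is why multivariate tests demand larger samples and more caution.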
Lastly, regardless of the type of tests you run, there are tradeoffs you should keep in mind. These tradeoffs between statistical significance and test sensitivity include:
- Lower detectable effect = larger sample = slower testing
- Increased statistical significance = larger sample = slower testing
- Increased sample size / test duration = higher level of confidence
Want to know more?
Download our ebook Demystifying A/B Testing Statistics
In this ebook, you’ll learn how to understand different statistical methods with side-by-side comparisons and how to read A/B test results with confidence.