Sample size calculator
Calculate the minimum sample size as well as the ideal duration of your A/B tests based on your audience, conversions and other factors like the Minimum Detectable Effect. Check out the FAQ below to learn more!
How many users do you need?
How long should your test run?
Our statistical significance calculator also gives you an idea of the duration of your A/B test. For this A/B test duration calculator to work, please fill in the information above, as well as your average daily traffic on the tested page and your number of variations – including the control version. Read our post to learn more about how many users you need for A/B testing.
How to figure out the Minimum Detectable Effect?
If you have absolutely no clue how to choose this MDE, we've got you covered!
The goal is to provide a simple way to compute the number of visitors a test needs to be statistically significant (i.e. how many visitors you need before a lift or loss of x% can be trusted with 95% confidence).
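As an illustration, the textbook sample-size formula for comparing two conversion rates with a two-sided z-test can be sketched as follows. This is a standard approximation, assumed here for illustration; it is not necessarily the exact formula behind this calculator. `baseline_rate` is the control's conversion rate and `relative_mde` the Minimum Detectable Effect expressed as a relative lift; both names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variation(baseline_rate: float, relative_mde: float,
                              alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed in EACH variation (two-sided two-proportion z-test).

    baseline_rate: control conversion rate, e.g. 0.05 for 5%.
    relative_mde: smallest relative lift worth detecting, e.g. 0.10 for +10%.
    alpha: accepted false-positive risk (0.05 matches a 95% threshold).
    power: probability of detecting the lift if it really exists.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical scenario: 5% baseline, detect a +10% relative lift (5% -> 5.5%)
print(sample_size_per_variation(0.05, 0.10))
```

With these inputs the result is roughly 31,000 visitors per variation. Note how strongly the MDE drives the result: halving the MDE roughly quadruples the required sample size.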
In “frequentist” statistical tests, the null hypothesis is the default assumption that there is no difference between variations (hence the name “null”).
When a test rejects the null hypothesis (a statistically significant result), it means there is evidence of a difference: we are negating the null hypothesis. Conversely, when the test fails to reject it, the data show no significant difference between variations, which does not prove that there is none.
This is linked to the concept of p-value.
The p-value is the probability of observing a result at least as extreme as the one measured in the A/B test, assuming the null hypothesis is true.
In short, if the p-value is low (smaller than 0.05), the observed result would be unlikely if the null hypothesis were true, so we reject it and conclude that there is a difference between variations.
Conversely, if the p-value is high (greater than 0.05), you cannot reject the null hypothesis: the data are consistent with there being no real difference between the variations, but they do not prove it. At the very least, you cannot conclude at this point and may need more data to further the analysis.
The p-value only indicates whether a difference exists; it gives no information about its size or about whether A > B or B > A.
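A minimal sketch of how a two-sided p-value can be computed for two conversion rates, assuming a standard pooled two-proportion z-test (a textbook method, which may differ from AB Tasty's exact implementation). The conversion counts below are hypothetical. Swapping A and B returns the same p-value, which illustrates that it says nothing about the direction of the difference.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test (hypothetical helper)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Under the null hypothesis both variations share one conversion rate
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variance: conversion rates of 0% or 100% in both groups")
    z = (p_b - p_a) / se
    # Probability of a |Z| at least as large as the observed one, assuming no difference
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical data: 500 vs 560 conversions out of 10,000 visitors each
p_ab = two_proportion_p_value(500, 10_000, 560, 10_000)
p_ba = two_proportion_p_value(560, 10_000, 500, 10_000)
print(round(p_ab, 4), round(p_ba, 4))
```

Both values are identical and slightly above 0.05, so with a 95% threshold this result would not be significant despite a +12% observed lift.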
Notation: since the p-value formulation is a bit confusing, it is often translated into a “confidence index” expressed as a percentage: (1 - p-value)*100.
Reaching statistical significance means that the confidence index is equal to or greater than a given threshold. Theory dictates that this threshold is fixed once, before the start of the experiment.
For the confidence index, a conventional threshold for its statistical significance is 95% (corresponding to a p-value of 0.05), but it is only a convention.
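The conversion from p-value to confidence index, and the comparison with a threshold chosen before the test, can be sketched in a few lines. The function names are hypothetical; the default threshold of 95 simply mirrors the convention mentioned above.

```python
def confidence_index(p_value: float) -> float:
    """Confidence index in percent: (1 - p-value) * 100."""
    return (1 - p_value) * 100


def is_significant(p_value: float, threshold: float = 95.0) -> bool:
    """True when the confidence index reaches the threshold.

    The threshold should be fixed once, before the experiment starts,
    and not adjusted after looking at the results.
    """
    return confidence_index(p_value) >= threshold


print(confidence_index(0.03), is_significant(0.03))              # 97% index: significant at 95%
print(is_significant(0.03, threshold=99.0))                      # not significant at a stricter 99%
```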
This threshold should be set with the distinctive characteristics of each business in mind, as it is directly linked to the risk deemed reasonable for the experiment.
Also remember that with a 95% threshold, about 1 in every 20 tests run between variations that do not actually differ will still come out significant, and there is no way to tell which ones from the results alone.
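This 1-in-20 figure can be checked with a quick simulation of A/A tests, where both “variations” share the same true conversion rate, so every significant result is a false positive. The sketch assumes a pooled two-proportion z-test and a 10% conversion rate; all parameters are hypothetical and chosen only to keep the run short.

```python
import random
from math import sqrt
from statistics import NormalDist


def p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(num_tests: int = 2000, visitors: int = 1000,
                        rate: float = 0.10, seed: int = 42) -> float:
    """Share of simulated A/A tests declared significant at the 95% threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_tests):
        # Same true conversion rate in both groups: any "winner" is a false positive
        conv_a = sum(rng.random() < rate for _ in range(visitors))
        conv_b = sum(rng.random() < rate for _ in range(visitors))
        if p_value(conv_a, visitors, conv_b, visitors) < 0.05:
            hits += 1
    return hits / num_tests


fp_rate = false_positive_rate()
print(f"false positives: {fp_rate:.1%}")  # expected to land close to 5%
```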
The algorithm is currently based on an extrapolation of the z-statistic formula, usually used for the normal distribution. AB Tasty also offers Bayesian A/B Testing.
Statistical power is the ability of a test to detect an effect if the effect actually exists, i.e. to detect a difference between variations when a real difference exists.
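For a two-sided two-proportion z-test, power can be approximated from the baseline rate, the lift to detect and the number of visitors per variation. The sketch below uses the standard normal approximation and ignores the negligible chance of a significant result in the wrong direction; the function name and scenario are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(baseline_rate: float, relative_mde: float,
                          n_per_variation: int, alpha: float = 0.05) -> float:
    """Approximate probability of detecting a real lift of relative_mde."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    # Standard error of the difference between the two observed rates
    se = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_variation)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    return NormalDist().cdf(abs(p2 - p1) / se - z_alpha)


# Hypothetical scenario: 5% baseline, real +10% relative lift
for n in (10_000, 20_000, 40_000):
    print(n, round(power_two_proportions(0.05, 0.10, n), 2))
```

With too few visitors, a real +10% lift will be missed most of the time; power grows with the sample size.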
When testing a hypothesis, there are two types of errors. For A/B tests, a type I error, also called “false positive”, is declaring a bad variation as the winner, while a type II error, also called “false negative”, is missing a winning variation.
The distinction is not just theoretical: type I and type II errors often don’t carry the same cost! It is then desirable to handle them differently.
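In practice, the two risks are controlled by two separate settings: the significance level α (type I risk, 5% for a 95% threshold) and β = 1 − power (type II risk). The sketch below, based on the standard two-proportion sample-size approximation and a hypothetical 5% to 5.5% scenario, shows that lowering either risk is paid for in visitors.

```python
from math import ceil
from statistics import NormalDist


def n_per_variation(p1: float, p2: float, alpha: float, power: float) -> int:
    """Approximate visitors per variation for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # controls the type I risk
    z_beta = NormalDist().inv_cdf(power)           # controls the type II risk
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical scenario: 5% baseline, aiming to detect 5.5%
for alpha, power in [(0.05, 0.80), (0.01, 0.80), (0.05, 0.90), (0.01, 0.90)]:
    print(f"type I risk={alpha:.2f}  type II risk={1 - power:.2f}  "
          f"n={n_per_variation(0.05, 0.055, alpha, power)}")
```

If shipping a bad variation is expensive, tighten α; if missing a winner is the bigger cost, raise the power instead.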
One-sided and two-sided tests (also named one- and two-tailed tests) differ in the scope of their result:
- One-sided tests only look for a difference in one direction chosen in advance (e.g. B > A). A difference in the other direction, however large, cannot be detected.
- Two-sided tests look for a difference in either direction (A != B), so they can detect both A > B and A < B.
This is really important for A/B testing as the direction of a difference, if any, is generally unknown before an experiment starts.
Two-sided tests are safer to use and this is what we use at AB Tasty.
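The difference can be illustrated from the same z statistic (computed as B minus A). The sketch assumes a normal approximation; the helper name and the z values are hypothetical. A one-sided “B > A” test gives a smaller p-value when B is ahead, but it stays blind when B is just as far behind, whereas the two-sided test flags both cases.

```python
from statistics import NormalDist


def p_values_from_z(z: float) -> tuple[float, float]:
    """Return (two-sided, one-sided 'B > A') p-values for a z statistic of B - A."""
    std_normal = NormalDist()
    two_sided = 2 * (1 - std_normal.cdf(abs(z)))  # H1: A != B
    one_sided = 1 - std_normal.cdf(z)             # H1: B > A only
    return two_sided, one_sided


for z in (2.2, -2.2):
    two, one = p_values_from_z(z)
    print(f"z={z:+.1f}  two-sided p={two:.4f}  one-sided (B > A) p={one:.4f}")
```

For z = -2.2, the two-sided p-value is below 0.05 (B is significantly worse) while the one-sided p-value is close to 1, so the one-sided test reports nothing.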