In any experiment that is carried out, we often rely on probabilities to prove (or disprove) a hypothesis.
When carrying out an A/B test, for example, we are often seeking statistically significant results.
We are great advocates of testing in production and so A/B testing is one effective way to test your features on a select number of users to make sure that they’re working as they should before rolling them out to everyone else.
However, because such tests are based on probabilities, no hypothesis test can ever be 100% certain. As a result, we may sometimes arrive at the wrong conclusion, which leads to what are known as type I and type II errors.
Statistical significance
We mentioned the term ‘statistical significance’ which is what any experiment is seeking to find. In the experiments you run, you want to make sure that a relationship actually exists between the variables proposed in your hypothesis, which is the purpose of an A/B test.
You are ultimately seeking to ensure that your A/B tests achieve statistical significance before making any decisions.
If you’ve often carried out A/B tests, then you’re probably familiar with this term as it gives you the tools necessary to make informed decisions to meet your business goals.
For the sake of further clarification, a statistically significant result in such tests means that the result is highly unlikely to have occurred randomly and is instead attributed to a specific cause or trend.
Simply put, statistical significance indicates that the gap or difference between a variation and the control is unlikely to be due to chance alone. The threshold you choose reflects your risk tolerance and confidence level.
In other words, when you run an A/B test at a 95% confidence level (a 5% significance level), you accept a 5% chance of declaring a winning variation when there is actually no real difference between the variations.
However, as with any hypothesis test based on statistics and probabilities, two types of errors can show up in your results.
Hypothesis testing
Before we delve deeper into type I errors, it would be worthwhile to give an overview of what hypothesis testing is.
Hypothesis testing is when a hypothesis is tested against its opposite to determine which of the two the data supports. In this case, you have two competing statements: the null hypothesis and the alternative hypothesis.
Therefore, a statistical hypothesis test is used to determine a possible conclusion from two different and conflicting hypotheses.
The null hypothesis posits that there is no relationship between the two proposed phenomena, while the alternative hypothesis states the opposite of the null hypothesis.
P-values used in statistical testing help decide whether to reject the null hypothesis. The smaller the p-value, the stronger the evidence against the null hypothesis. In other words, the p-value tells you how likely it is that you would see results at least as extreme as yours if the null hypothesis were true.
The threshold for declaring statistical significance is most commonly set at p < 0.05.
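To make the p-value concrete, here is a minimal sketch in Python of one common way to compute it for an A/B test: a two-sided two-proportion z-test. The function name, the visitor and conversion counts, and the choice of a z-test are illustrative assumptions, not something this article prescribes.

```python
import math


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate: the best estimate of the shared rate if the null hypothesis is true.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # no variation at all, so no evidence against the null
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


ALPHA = 0.05  # significance threshold (assumed, matching p < 0.05)

# Hypothetical data: control converts 200 of 2000, variation 250 of 2000.
p_value = two_proportion_p_value(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"p-value = {p_value:.4f}, significant = {p_value < ALPHA}")
```

With these made-up numbers the p-value falls below 0.05, so the null hypothesis would be rejected. Identical conversion rates in both groups would give a p-value of 1.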
However, in any statistical test there is always a degree of uncertainty, so there is always some risk of committing an error.
The following table depicts these errors in relation to the null hypothesis:
Type 1 error
One such error is type 1 (or type I) error, also referred to as false positive, which is the wrong rejection of a null hypothesis even though it’s true. In other words, you conclude that the results are statistically significant when they are simply a result of chance or due to unrelated factors.
Simply put, a type 1 error occurs when the tester validates a statistically significant difference when there isn’t one.
In an A/B test, a type 1 error is when you declare a variation the winner even though it does not actually outperform the control. In other words, as a false positive, it leads you to believe that a variation has made a statistically significant difference when it has not.
Type 1 errors have a probability of “α” or alpha, which is determined by the confidence level you set. For example, if you set a confidence level of 95%, then alpha is 5%: when there is no real difference between the variations, there is a 5% chance that you will get a type 1 error.
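The link between alpha and type 1 errors can be checked with a small simulation. The sketch below assumes a two-proportion z-test and a hypothetical 10% conversion rate. It runs many A/A tests, where both groups are identical and the null hypothesis is therefore true, and counts how often the result is declared significant anyway.

```python
import math
import random


def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a two-proportion z-test (an assumed test choice)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / std_err
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(true_rate=0.10, visitors=500, runs=2000, alpha=0.05, seed=42):
    """Share of A/A tests that are wrongly declared significant."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(runs):
        # Both groups share the same true conversion rate: any "winner" is chance.
        conv_a = sum(rng.random() < true_rate for _ in range(visitors))
        conv_b = sum(rng.random() < true_rate for _ in range(visitors))
        if p_value(conv_a, visitors, conv_b, visitors) < alpha:
            false_positives += 1
    return false_positives / runs


rate = false_positive_rate()
print(f"Share of A/A tests declared significant: {rate:.3f}")
```

The printed share should land close to the chosen alpha of 0.05, which is what a 95% confidence level implies.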
Consequence of type 1 errors
A type 1 error means wrongly concluding that your variation had an effect when it did not. The main reason to remain on the lookout for such errors is that they may end up costing your company a lot of money, as they could lead to a loss in sales.
Suppose, for example, that you test a change in the color of a button on your homepage and notice early on that the new button leads to more clicks. Convinced that this variation made a difference, you end the test early and wrongly conclude that there is indeed a relationship between the change in color and conversion rates.
Thus, you end up deploying this variation to all your users to find that, surprise, it didn’t actually have an impact. The end result is that you could risk hurting your customer conversion rate in the long run.
The best way to avoid such errors may be to increase the test duration and sample size to ensure that your variation really does outperform the control in the long run.
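Why ending a test early is risky can be illustrated with a rough simulation. The sketch below assumes a two-proportion z-test, a hypothetical 10% conversion rate in both groups (so there is no real effect), and ten evenly spaced looks at the data. It compares stopping at the first significant look with waiting for the planned final sample.

```python
import math
import random


def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a two-proportion z-test (an assumed test choice)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / std_err
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_peeking(true_rate=0.10, visitors=1000, looks=10, runs=500, alpha=0.05, seed=7):
    """Return (false positive rate when peeking, false positive rate at the final look)."""
    rng = random.Random(seed)
    step = visitors // looks
    flagged_when_peeking = 0
    flagged_at_end = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        significant = False
        significant_at_any_look = False
        for look in range(1, looks + 1):
            # Collect the next batch of visitors for both identical groups.
            conv_a += sum(rng.random() < true_rate for _ in range(step))
            conv_b += sum(rng.random() < true_rate for _ in range(step))
            n = look * step
            significant = p_value(conv_a, n, conv_b, n) < alpha
            significant_at_any_look = significant_at_any_look or significant
        flagged_when_peeking += significant_at_any_look
        flagged_at_end += significant  # value from the final look only
    return flagged_when_peeking / runs, flagged_at_end / runs


peeking_rate, final_rate = simulate_peeking()
print(f"False positives when stopping at the first significant look: {peeking_rate:.3f}")
print(f"False positives when waiting for the full sample: {final_rate:.3f}")
```

Because the final look is one of the looks, the peeking rate can never be lower than the final rate, and in practice it tends to be noticeably higher.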
Type 2 error
Type 2 (or type II) errors, also referred to as false negatives, occur when you fail to reject the null hypothesis even though it is actually false, so you end up wrongly discarding your own hypothesis and variation. Type 2 errors have a probability of β or beta.
In an A/B test, this means that you fail to conclude there was an effect when there indeed was and so no conclusive winner is declared among the control and variations even though there should be one.
In other words, you mistakenly accept the null hypothesis: you believe that a variation has made no statistically significant difference and that a relationship doesn’t exist when it does.
A type 2 error is inversely related to the statistical power of a test, where power is the probability that a test can detect an effect that actually exists. The higher the statistical power, the lower the probability of committing a type 2 error.
Statistical power usually depends on three factors: sample size, significance level, and the “true” value of your tested parameter (the size of the effect you are trying to detect).
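How these three factors interact can be sketched with a standard normal approximation for comparing two conversion rates. The function name and the example rates below are hypothetical, and the formula is one common textbook approximation rather than the only way to compute power.

```python
import math
from statistics import NormalDist


def power_two_proportions(base_rate, variant_rate, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test."""
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    diff = abs(variant_rate - base_rate)
    pooled = (base_rate + variant_rate) / 2
    # Standard error of the difference under the null and under the alternative.
    se_null = math.sqrt(2 * pooled * (1 - pooled) / n_per_group)
    se_alt = math.sqrt(
        (base_rate * (1 - base_rate) + variant_rate * (1 - variant_rate)) / n_per_group
    )
    # Probability that the observed difference clears the critical threshold,
    # ignoring the very small chance of significance in the wrong direction.
    return 1 - normal.cdf((z_crit * se_null - diff) / se_alt)


# Hypothetical scenario: 10% baseline conversion, 12% for the variation.
low_power = power_two_proportions(0.10, 0.12, n_per_group=1000)
high_power = power_two_proportions(0.10, 0.12, n_per_group=4000)
print(f"Power with 1000 visitors per group: {low_power:.2f}")
print(f"Power with 4000 visitors per group: {high_power:.2f}")
```

Under these assumptions, the larger sample has much higher power, and the probability of a type 2 error (beta) is simply one minus the power.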
Consequence of type 2 errors
Just like type I errors, type II errors can lead to false assumptions and poor decision-making, especially when a test is concluded too early.
Furthermore, getting false negatives and failing to notice the effect of your variations means missed opportunities to increase your conversion rate.
To reduce the risk of such an error, make sure you increase the statistical power of your test, for example by using a big enough sample size. This entails gathering more data over a longer period of time, which helps you avoid the false conclusion that your experiment didn’t have an impact when it did.
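What counts as a "big enough" sample can be estimated before the test starts. The following sketch uses a common normal-approximation formula for comparing two proportions. The function name, the 80% power target, and the example conversion rates are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_group(base_rate, variant_rate, alpha=0.05, power=0.80):
    """Approximate visitors needed per group for a two-sided two-proportion z-test."""
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)  # guards against type 1 errors
    z_beta = normal.inv_cdf(power)           # guards against type 2 errors
    pooled = (base_rate + variant_rate) / 2
    numerator = (
        z_alpha * math.sqrt(2 * pooled * (1 - pooled))
        + z_beta * math.sqrt(
            base_rate * (1 - base_rate) + variant_rate * (1 - variant_rate)
        )
    ) ** 2
    return math.ceil(numerator / (variant_rate - base_rate) ** 2)


# Hypothetical goal: detect a lift from 10% to 12% conversion.
needed = sample_size_per_group(0.10, 0.12)
print(f"Visitors needed per group: {needed}")
```

Smaller expected lifts or a higher power target push the required sample size up quickly, which is why underpowered tests so often end in false negatives.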
The probability of making type I and type II errors is depicted in the image below, where the null hypothesis distribution shows all possible results if the null hypothesis is true while the alternative hypothesis shows all possible results if the alternative hypothesis is true:
As can be seen, type I and type II errors occur where these two distributions overlap.
Let’s sum up…
Let’s consider these two scenarios:
- If your results demonstrate statistical significance, this suggests that there is a difference between the variations. In that case, you may reject the null hypothesis. However, this could sometimes be a type 1 error.
- If your results don’t show statistical significance then the null hypothesis cannot be rejected. This could also sometimes be a type 2 error.
In the end, it’s important to strike a balance between the risks of type 1 and type 2 errors. Many argue that type I errors may be more damaging, as they could lead to changes that waste resources, time, and money, while type 2 errors are more about ‘missed opportunities’ (though these can also have significant consequences).
The essential thing to remember is that A/B tests are based on statistical probabilities meaning that the results obtained are never 100% certain.
Nevertheless, these tests serve as a valuable tool to help marketers increase sales and conversion rate so even if your results may not be as certain as you’d like them to be, you can still increase the probability of the test result being true by avoiding the aforementioned errors.
To reduce the probability of error, the key is to increase the sample size and run the test long enough to collect data that is as accurate as possible, which increases the credibility of your test results.
Read more about A/B testing statistics in our A/B testing guide.
What is trunk-based development?
Trunk-based development (TBD) is a Git branching strategy where developers collaborate in a single branch called the ‘trunk’ and make smaller changes more frequently. In this case, developers rarely branch and, when they do, the branches are usually short-lived, typically lasting no more than a few hours.
Thus, the underlying idea behind this strategy is to limit long-lasting branches and as a result avoid what is referred to as ‘merge hell’. Instead, the process of trunk-based development revolves around the concept that features should always be in a state ready for release.
How is this done exactly? Put simply, each developer splits their work into small batches and commits them directly to the trunk (or mainline), usually without creating branches. This means that the trunk is the one branch that all developers commit to.
Trunk-based development and CI/CD
Trunk-based development is a key enabler of, and a requirement for, continuous integration and continuous delivery. Because developers are making changes to the trunk multiple times a day and any branches are merged as frequently as possible, they are adhering to the practice of continuous integration. This makes the process of releasing new features quicker, hence making continuous delivery possible.
So, in this sense, trunk-based development creates the conditions necessary for continuous integration, as commits occur multiple times a day.
The two concepts are so intertwined that the terms are sometimes used interchangeably, as both rest on the same idea of continuously integrating into the trunk. Put another way, you need trunk-based development to implement continuous integration.
When to implement trunk-based development?
Trunk-based development usually works best when you have a team of experienced, senior developers as this kind of workflow gives them the autonomy they need to get the job done.
However, it’s less recommended when you have a team of junior developers whose work you need to monitor closely. It is also not highly recommended if you are running an open-source project where you might need stricter control over any changes made since in this kind of project anyone can contribute.
Product maturity is another factor to consider in the decision to use trunk-based development. If the product is just starting out, the priority is usually to get it up and running in no time. Meanwhile, for a more mature product, you might want to monitor any changes closely, lest you lose a great amount of money depending on how much your well-established product is worth. In this case, you might want to consider feature branching.
Trunk-based development is often combined with feature flags (or feature toggles) so that any new features can be deployed as soon as they are ready and rolled back easily in case of bugs.
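As a rough illustration of the idea, here is a minimal feature flag sketch in Python. The in-memory `FLAGS` dictionary, the `new_checkout` flag, and the `is_enabled` helper are all hypothetical names invented for this example. A real setup would typically read flags from a configuration service or a dedicated feature management tool.

```python
import hashlib

# Hypothetical in-memory flag store; real systems usually load this remotely.
FLAGS = {
    "new_checkout": {"enabled": True, "rollout_percent": 20},
}


def is_enabled(flag_name, user_id, flags=FLAGS):
    """Return True if the flag is on for this user."""
    flag = flags.get(flag_name)
    if flag is None or not flag["enabled"]:
        return False  # unknown or switched-off flags fall back to the old behavior
    # Hash the flag and user together so each user lands in a stable bucket (0 to 99).
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < flag["rollout_percent"]


def checkout(user_id):
    # The new code is already merged into the trunk but only runs behind the flag.
    if is_enabled("new_checkout", user_id):
        return "new checkout flow"
    return "existing checkout flow"


print(checkout("user-123"))
```

Setting `enabled` to `False` acts as the rollback: the unfinished or buggy code path stays in the trunk but stops running for everyone, with no revert or redeploy of the code itself.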
Benefits of trunk-based development
Escape from merge hell
Unlike feature branching, branches in trunk-based development are short-lived, lasting no more than a few hours. This eliminates the risky task of merging long-lived branches that have diverged from the trunk, which means fewer conflicts and an escape from what we referred to earlier as ‘merge hell’. This leads us to the second advantage of TBD.
Fewer merge conflicts
By implementing continuous integration, developers push their changes to the shared branch at least once a day, so the likelihood of merge conflicts is significantly reduced.
This is because developers are not waiting for long periods of time to integrate their changes, which makes it easier to resolve any conflicts that may arise. Bugs also become easier to spot and are likely to be fixed more quickly, ensuring a smooth integration into the mainline.
Additionally, as there are usually no branches, developers push code directly into the trunk, which removes the need for, and hassle of, merging long-lived branches.
Quicker feedback
Continuously integrating changes into the trunk allows for continuous improvements to be made to features, as developers are able to validate their changes against those of their colleagues.
This is unlike feature branching where each developer works independently within a branch and any changes made within that branch would only be seen after merging into the main branch, which may take days or even weeks.
Consequently, trunk-based development offers greater visibility allowing developers to ensure that their changes are aligned and work seamlessly with other recent commits.
Quick releases
Quicker feedback naturally leads to quick releases. Committing changes to the trunk on a regular basis leads to more efficient release management and more frequent deployments. A continuous stream of small changes integrated into the trunk makes for a more stable release, one that is ready for deployment at any given moment.
In a fast-paced environment, rolling out constant updates and upgrades to features is a given. Trunk-based development is indispensable to get these releases out faster and not just any release but a high-quality release with significantly fewer bugs. Therefore, new features reach the end-user much faster. Basically, use trunk-based development when time is of the essence.
Use with caution
Trunk-based development is not without its weaknesses. When you have developers constantly adding changes to trunk, you end up in a constant state of churn. It is important to ensure that developers are regularly pulling from the trunk so that they don’t end up tripping over one another while adding new commits.
As mentioned earlier, there are also instances where implementing trunk-based development is not prudent. In these cases, developers should consider other development methods such as feature branching.
To sum it up
In the end, the benefits of this technique far outweigh its weaknesses. Choosing the right method for managing source code depends on what is right for your team. However, if you’re looking to deliver high-quality releases to your customers while promoting efficient collaboration between developers then the simple strategy is to implement this practice within your business.