Using Failed A/B Test Results to Drive Innovation

“Failure” can feel like a dirty word in the world of experimentation. Your team spends time thinking through a hypothesis and crafting a test, and when it finally rolls out … it falls flat. While it can feel daunting to see negative results from your A/B tests, you have gained valuable insights that can help you make data-driven, strategic decisions for your next experiment. Your “failure” becomes a learning opportunity.

Embracing the risk of negative results is a necessary part of building a culture of experimentation. On the first episode of the 1,000 Experiments Club podcast, Ronny Kohavi (formerly of Airbnb, Microsoft, and Amazon) shared that experimentation is a time where you will “fail fast and pivot fast.” As he learned while leading experimentation teams for the largest tech companies, your idea might fail. But it is your next idea that could be the solution you were seeking.

“There’s a lot to learn from these experiments: Did it work very well for the segment you were going after, but it affected another one? Learning what happened and why will lead to developing future strategies and being successful,” shares Ronny.

In order to build a culture of experimentation, you need to embrace the failures that come with it. By viewing negative results as learning opportunities, you build trust within your team and encourage them to seek creative solutions rather than playing it safe. Here are just a few benefits to embracing “failures” in experimentation:

  1. Encourage curiosity: With AB Tasty, you can test your ideas quickly and easily. You can bypass lengthy implementations and complex coding. Every idea can be explored immediately and if it fails, you can get the next idea up and running without losing speed, saving you precious time and money.
  2. Reduce risk without a blind rollout: Testing changes on a few pages or with a small audience helps you gather insights in a controlled environment before planning a larger-scale rollout.
  3. Strengthen hypotheses: It’s easy to fall prey to confirmation bias when you are afraid of failure. Testing a hypothesis with A/B testing and receiving negative results confirms that your control is still your strongest performer, and you’ll have data to show you are moving in the right direction.
  4. Validate existing positive results: Experimentation helps determine which small changes can drive a big impact with your audience. Comparing negative A/B test results against positive results from similar experiments can help determine whether the positive metrics stand the test of time or whether an isolated event skewed the results.

In a controlled, time-limited environment, your experiment can help you learn very quickly whether the changes you have made support your hypothesis. Whether your experiment produces positive or negative results, you will gain valuable insights about your audience. As long as you leverage those new insights to build new hypotheses, your negative results will never be a “failure.” Instead, the biggest risk would be allowing the status quo to go unchecked.

“Your ability to iterate quickly is a differentiation,” shares Ronny. “If you’re able to run more experiments and a certain percentage are pass/fail, this ability to try ideas is key.”

Below are some examples of real-world A/B tests and the crucial learnings that came from each experiment:

Lesson learned: Removing “Add to Basket” CTAs decreased conversion

In this experiment, our beauty/cosmetics client tested removing the “Add to Basket” CTA from their product pages. The idea behind this was to test if users would be more interested in clicking through to the individual pages, leading to a higher conversion rate. The results? While there was a 0.4% increase in visitors clicking “Add to Basket,” conversions were down by 2%. The team took this as proof that the original version of the website was working properly, and they were able to reinvest their time and effort into other projects.

Beauty client add to basket use case

Lesson learned: Busy form fields led to decreased leads

A banking client wanted to test whether adjusting their standard request form would drive progression to step 2 and ultimately increase the number of leads from form submissions. The test focused on the mandatory business identification number field, adding a pop-up explaining what the field meant in the hopes of reducing form abandonment. The results? They saw a 22% decrease in leads as well as a 16% decrease in the number of visitors continuing to step 2 of the form. The team’s takeaway from this experiment was that, in trying to be helpful and explain this field, they had overwhelmed their visitors with information. The original version was the winner of this experiment, and the team saved themselves a huge potential loss from hardcoding the new form field.

Banking client form use case

Lesson learned: Product availability couldn’t drive transactions

The team at this beauty company designed an experiment to test whether displaying a message about product availability on the basket page would lead to an increase in conversions by appealing to the customer’s sense of FOMO. Instead, the results proved inconclusive. The conversion rate increased by 1%, but access to checkout and the average order value decreased by 2% and 0.7% respectively. The team determined that without the desired increase in their key metrics, it was not worth investing the time and resources needed to implement the change on the website. Instead, they leveraged their experiment data to help drive their website optimization roadmap and identify other areas of improvement.

Beauty client availability use case

Despite negative results, the teams in all three experiments leveraged these valuable insights to quickly readjust their strategy and identify other places for improvement on their website. By reframing the negative results of failed A/B tests as learning opportunities, these teams made the customer experience their driver for innovation instead of untested ideas from an echo chamber.

Jeff Copetas, VP of E-Commerce & Digital at Avid, stresses the importance of figuring out who you are listening to when building out an experimentation roadmap. “[At Avid] we had to move from a mindset of ‘I think …’ to ‘let’s test and learn,’ by taking the albatross of opinions out of our decision-making process,” Jeff recalls. “You can make a pretty website, but if it doesn’t perform well and you’re not learning what drives conversion, then all you have is a pretty website that doesn’t perform.”

Through testing you are collecting data on how customers are experiencing your website, which will always prove more valuable than never testing the status quo. Are you seeking inspiration for your next experiment? We’ve gathered insights from 50 trusted brands around the world to understand the tests they’ve tried, the lessons they’ve learned, and the successes they’ve had.


How Feature Flags Support Your CI/CD Pipeline by Increasing Velocity and Decreasing Risk

As more modern software development teams adopt DevOps practices that emphasize speed of delivery while maintaining product quality, they have had to put processes in place that allow them to deliver releases in small batches for quicker feedback and faster time to market.

Continuous integration (CI) and continuous delivery (CD), implemented in the development pipeline, embody a set of practices that enable modern development teams to deliver quickly and more frequently.

We’ll start by breaking down these terms to have a clearer understanding of how these processes help shorten the software development lifecycle and bring about the continuous delivery of features.

What is CI/CD?

A CI/CD pipeline first starts with continuous integration. This software development practice is where developers merge their changes into a shared trunk multiple times a day through trunk-based development – a modern git branching strategy well-suited for fast turnaround.

This method enables developers to integrate small changes frequently, giving them quick feedback: they can see all the changes being merged by other developers and avoid the merge conflicts that arise when multiple developers attempt to merge long-lived branches simultaneously.

This also ensures that bugs are detected and fixed rapidly through the automated tests that are triggered with each commit to the trunk.

Afterwards, continuous delivery keeps the software that has made it through the CI pipeline in a constantly releasable state, decreasing time to market since code is always ready to be deployed to users.

During CI/CD, software goes through a series of automated tests, from unit tests to integration tests and more, which verify the build and detect any errors so they can be fixed early on.

This saves time and boosts productivity, as repetitive tasks can now be automated, allowing developers to focus on developing high-quality code faster.

We may also add continuous deployment to the pipeline, which goes one step further and deploys code automatically, its purpose being to automate the whole release process. With continuous delivery, meanwhile, teams manually release the code to the production environment.

To sum up, CI and CD have many advantages, including shortening the software development cycle and allowing for a constant feedback loop that helps developers improve their work, resulting in higher-quality code.

However, they work even better when combined with feature flags. We could go further and argue that you cannot implement a true CI/CD pipeline without feature flags.

So what are feature flags?

Before we go further, we will provide a brief overview of feature flags and their value in software development processes.

Feature flags are a software development tool that enables the decoupling of release from deployment, giving you full control over the release process.

Feature flags range from a simple IF statement to more complex decision trees, which act upon different variables. Feature flags essentially act as switches that enable you to remotely modify application behavior without changing code.
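In its simplest form, that IF statement looks something like the sketch below (the flag names and the in-memory store are illustrative only; in a real system, flag state is usually fetched from a remote configuration service rather than hardcoded):

```python
# A minimal in-memory flag store. Real systems fetch this state
# remotely so flags can be flipped without redeploying.
FLAGS = {
    "new_checkout": True,
    "beta_search": False,
}

def is_enabled(flag_name: str) -> bool:
    """Return the current state of a flag, defaulting to off."""
    return FLAGS.get(flag_name, False)

def render_checkout() -> str:
    # The flag is a simple IF statement around the new code path.
    if is_enabled("new_checkout"):
        return "new checkout page"   # new behavior, behind the flag
    return "legacy checkout page"    # existing, stable behavior
```

Flipping `FLAGS["new_checkout"]` to `False` instantly routes every user back to the legacy page, with no code change or redeploy.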

Most importantly, feature flags allow you to decouple feature rollout from code deployment which means that code deployment is not equal to a release. This decoupling or separation gives you control over who sees your features and when.

Therefore, they help you ship releases safely and quickly: any unfinished changes can be wrapped in a flag, while features that are ready can be progressively deployed to pre-defined groups of users and eventually released to the rest of your user base.
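One common way to implement such a progressive rollout is to hash each user ID into a stable bucket and enable the flag for a growing percentage of users. The sketch below assumes this hashing scheme, which is just one of several options:

```python
import hashlib

def in_rollout(user_id: str, flag_name: str, percentage: int) -> bool:
    """Deterministically place a user in a bucket from 0-99 and compare
    it to the rollout percentage, so the same user always gets the
    same answer for the same flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percentage

# Start the rollout at 10%, then widen it over time to 50%, then 100%.
# At 100% every user is included; at 0% no one is.
assert in_rollout("user-42", "new_checkout", 100)
assert not in_rollout("user-42", "new_checkout", 0)
```

Because the bucket is derived from the user ID rather than chosen randomly per request, a user who sees the new feature keeps seeing it as the percentage grows.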

As a result, feature flags allow teams to deliver more features with less risk. They let product teams, in particular, test out their ideas, through A/B testing for example, to see what works and discard what doesn’t before rolling a feature out to all users.

Therefore, feature flags offer many advantages, and their value extends to a wide variety of use cases, including:

  • Running experiments and testing in production
  • Progressive delivery
  • User targeting
  • Kill switch

Ultimately, there is one common underlying theme and purpose behind those use cases, which is risk mitigation.

Incorporating feature flags into your CI/CD pipeline

Feature flags are especially useful as part of the CI/CD pipeline as they represent a safety net to help you ship features quickly and safely and keep things moving across your pipeline.

As we’ve already seen, CI and CD will help shorten the software development cycle allowing you to release software faster but these processes aren’t without their risks. 

That’s where feature flags come in handy. Feature flags will allow you to enable or disable features and roll back in case anything goes wrong.

This way you can test your new features by targeting them to specific user groups and measure their impact in relation to the relevant KPIs set at the beginning of the experiment.

In other words, by the time you release your features to all users you’d have already tested them and so you’re confident that they will perform well.

To better understand how CI and CD are better with feature flags, we will look at each process individually and discuss how feature flags help improve the efficiency of CI and CD. 

Feature flags and CI

You’re only undertaking true continuous integration when you integrate early and often. However, without feature flags, developers who have finished their changes would have to wait until all the other developers on the team have also completed theirs before merging and deploying.

Then, another issue arises when they don’t integrate often enough, as this results in long-lived feature branches that may lead to merge conflicts and, in the worst case, merge hell.

Things become even more complicated as your developer team grows. With such delays, the purpose of CI would be defeated.

This is where feature flags step in.

Feature flags allow developers to release their ready features without having to wait for others to finish, as any unfinished features can be wrapped in a flag and disabled so they don’t disrupt the next step, which is continuous delivery.

Thus, feature flags allow developers to turn off portions of code that are incomplete or causing issues after being integrated. This way, other developers can still integrate their changes often, as soon as they’re ready, without disrupting the CI process.

Furthermore, practicing CI means you have to integrate frequently, often several times a day. But what happens when a build fails? Feature flags allow you to roll back buggy features until they are fixed, then toggle them back on when they are ready.

Thus, any features that fail the automated tests upon integration can be simply turned off. This also helps to keep your master branch healthy and bug-free as you’re able to disable the portions of code that are causing problems. 
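This kill-switch pattern can be sketched as follows (the flag and function names are hypothetical): a feature that misbehaves after integration is simply toggled off, and the code path falls back to stable behavior with no revert of the underlying commits.

```python
# Flag state for the trunk; in practice this lives in a remote service.
FLAGS = {"experimental_ranking": True}

def rank_results(results: list) -> list:
    if FLAGS.get("experimental_ranking", False):
        # New, still-unstable ranking code lives behind the flag.
        return sorted(results, reverse=True)
    # Stable behavior: this path ships even while the new one is off.
    return sorted(results)

# If the new ranking breaks a build or a test, flip the flag off
# instead of reverting commits; the trunk stays healthy.
FLAGS["experimental_ranking"] = False
assert rank_results([3, 1, 2]) == [1, 2, 3]
```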

Feature flags and CD

The essence of continuous delivery is speed: you should always be ready to deliver in small, frequent increments. If a feature is slowing you down or contains bugs, you cannot deploy, and the whole momentum of CD is lost.

Again, this is where feature flags come in.

If developers haven’t finished working on their code, it can be turned off until it’s ready, and the release can still proceed instead of being delayed indefinitely, which would result in disgruntled customers.

Any complete features can then be turned on in the trunk, while the others remain unaffected and stay disabled until they’re complete as well.

In other words, feature flags allow you to deploy your code even with an incomplete feature on board: users won’t be able to access the functionality because it is turned off. Only when the flag is activated, making the feature visible, can users finally access it.

Continuous delivery’s purpose is to keep code in a deployable state but if you’re not confident about the release and you’re worried about its impact on your users, what’s the solution?

Well, what if you don’t have to ship the release to all users? What if you can target specific users, for example internally within your organization, before releasing it to everyone else?

With feature flags, you can target certain user groups so that you test your new features in production without impacting all users.

Thus, you choose who you want to test on by using feature flags. If a feature isn’t working as it should while testing in production, you can turn it off until you figure out the issue.
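Such targeting can be sketched as a rule attached to the flag, here restricting a feature to internal users first. The user attributes and group names below are assumptions for illustration, not a specific product’s API:

```python
def is_enabled_for(user: dict, flag: dict) -> bool:
    """Enable a flag only for users matching its targeting rule.
    An empty group list means the flag applies to everyone."""
    if not flag["enabled"]:
        return False
    allowed_groups = flag.get("groups", [])
    return not allowed_groups or user.get("group") in allowed_groups

# Test in production with internal users only, before a wider release.
beta_flag = {"enabled": True, "groups": ["internal"]}
assert is_enabled_for({"group": "internal"}, beta_flag)
assert not is_enabled_for({"group": "customer"}, beta_flag)
```

When the internal test looks good, widening the release is just a matter of editing the flag’s `groups` rule, not shipping new code.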

Feature flags + CI/CD = the answer to fast and risk-free deployments

Feature flags, then, help keep your features moving within your pipeline in a quick and safe manner.

Using feature flags means you no longer need to do a full rollback of a release while you fix issues, a process that could take so long you risk losing customers.

To put it simply, feature flags give you a safety net when integrating and delivering features by giving you control over what portions of code you enable or disable.

The key to success in modern software development is speed in order to keep up with rapidly changing consumer demands. Otherwise, you risk losing the race to competitors.

However, if not managed carefully, feature flags can become more burdensome than valuable. Thus, they require careful management and monitoring to reap their benefits without bearing their potentially heavy costs.

When we talk about heavy costs, we refer to the potential of feature flags accumulating into what is known as ‘technical debt’. If you don’t have a system in place to manage all your flags then feature flags can quickly become a liability.

This is why using a feature flag solution becomes crucial. Such sophisticated platforms give you a way to track and manage all the flags in your system throughout their entire lifecycle.

For example, AB Tasty’s flagging feature has a flag tracking dashboard that lists all the flags you have set up, their current values (on/off), and the campaigns that reference them. This allows you to keep track of every flag’s purpose and ultimately to clean up any stale flags you’re no longer using, which would otherwise result in technical debt.
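Even without a dedicated platform, a lightweight version of that tracking can be sketched: record when each flag was created and which campaigns still reference it, then surface cleanup candidates. The field names and the 90-day threshold below are assumptions for illustration:

```python
from datetime import date, timedelta

# A tiny flag registry; a real one would live alongside your flag config.
flags = [
    {"name": "new_checkout", "created": date(2023, 1, 10), "campaigns": []},
    {"name": "beta_search",  "created": date.today(),      "campaigns": ["exp-7"]},
]

def stale_flags(flag_list: list, max_age_days: int = 90) -> list:
    """A flag is a cleanup candidate when it is older than the cutoff
    and no campaign references it anymore."""
    cutoff = date.today() - timedelta(days=max_age_days)
    return [f["name"] for f in flag_list
            if f["created"] < cutoff and not f["campaigns"]]

# Here only "new_checkout" is old and unreferenced, so it is flagged
# for removal; "beta_search" is recent and still in use.
assert stale_flags(flags) == ["new_checkout"]
```

Running a check like this periodically, or in CI, keeps forgotten flags from silently accumulating into technical debt.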