Article

Aug 22, 2023

11min read

CRO Metrics: Navigating Pitfalls and Counterintuitive KPIs

Hubert Wassner

Metrics play an essential role in measuring performance and influencing decision-making.

However, relying on certain metrics alone can lead you to misguided conclusions and poor strategic choices. Potentially misleading metrics are often referred to as “pitfall metrics” in the world of Conversion Rate Optimization.

Pitfall metrics are data indicators that can give you a distorted or incomplete view of your performance if analyzed in isolation. They can even set your performance back if you’re not careful about how you evaluate them.

Metrics are typically split into two categories:

Session metrics: Any metrics that are measured per session rather than per visitor
Count metrics: Metrics that count events (for instance, the number of pages viewed)

Some metrics fall into both categories. That is the worst case, for a few main reasons: no real statistical model fits a metric that belongs to both categories, there is no direct or simple link to business objectives, and these metrics may not be suited to standard optimization.

While metrics are very valuable for business decisions, it’s crucial to use them wisely and be mindful of potential pitfalls in your data collection and analysis. In this article, we will explore and explain why some metrics are unwise to use in practice in CRO.

Session-based metrics vs visitors

One problem with session-based metrics is that “power users” (AKA users returning for multiple sessions during the experiment) will bias the results.

Let’s remember that during experimentation, the traffic split between the variations is a random process.

You typically think of a traffic split as random but still producing even groups. When we talk about big groups of users, this is typically true. However, when you consider a small group, it’s very unlikely that you will have an even split in terms of visitor behaviors, intentions and types.

Let’s say that you have 12 power users that need to be randomly divided between two variations, and that these power users have 10x more sessions than the average user. You may well end up with a 4 and 8 split, a 2 and 10 split, or another uneven split: a perfectly even 6 and 6 split happens less than a quarter of the time. You will then likely end up in one of two situations:

Situation 1: Very few users may make you believe you have a winning variation (which doesn’t yet exist)
Situation 2: The winning variation is masked because it received too few of these power users
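
To put numbers on how uneven a small random split can be, here is a minimal sketch. It assumes each of the 12 power users is assigned independently with a 50/50 chance; the function and variable names are ours, for illustration only.

```python
from math import comb

def split_probability(n_users: int, k_in_a: int) -> float:
    """Probability that exactly k_in_a of n_users land in variation A (50/50 assignment)."""
    return comb(n_users, k_in_a) / 2 ** n_users

N_POWER_USERS = 12

# Perfectly even split: 6 power users in each variation.
p_even = split_probability(N_POWER_USERS, 6)

# A 4 and 8 split or anything more lopsided, in either direction.
p_lopsided = 2 * sum(split_probability(N_POWER_USERS, k) for k in range(0, 5))

print(f"P(6 and 6 split)           = {p_even:.3f}")      # 0.226
print(f"P(4 and 8 or more uneven)  = {p_lopsided:.3f}")  # 0.388
```

Under these assumptions, a 4 and 8 split or worse is noticeably more likely than a perfectly even split, and each power user carries ten times the weight of a normal visitor in a session-based metric.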

Another problem with session-based metrics is that a session-based approach blurs the meaning of important metrics like transaction rates. The recurring problem here is that not all visitors display the same type of behavior. If average buyers need 3 sessions to make a purchase while some need 10, this is a difference in user behavior and does not have anything to do with your variation. If your slow buyers are not evenly split between the variations, then you will see a discrepancy in the transaction rate that doesn’t actually exist.

Moreover, the metric itself will lose part of its intuitive meaning. If your real conversion rate is around 3% per unique visitor and visitors average three sessions each, you will likely see only about a 1% conversion rate once it is counted by session.

This is not only disappointing but very confusing.

Imagine a variation urging visitors to buy sooner by using “stress marketing” techniques. Let’s say this leads to a purchase in one session instead of three. You will see a huge gain (3x) in the conversion rate per session. But this “gain” is not an actual gain: the number of conversions is unchanged, so it has no effect on the revenue earned. It’s also good to keep in mind that visitors under pressure may not feel happy or comfortable with such a quick purchase and may not return.
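
The arithmetic behind this illusion can be sketched in a few lines. The figures are hypothetical (1,000 visitors and 30 buyers per variation), and we assume for simplicity that every visitor has the same number of sessions.

```python
def rates(visitors: int, buyers: int, sessions_per_visitor: int) -> tuple[float, float]:
    """Return (conversion rate per visitor, conversion rate per session)."""
    sessions = visitors * sessions_per_visitor
    return buyers / visitors, buyers / sessions

# Hypothetical numbers: same visitors and same buyers in both variations.
original = rates(visitors=1000, buyers=30, sessions_per_visitor=3)
pressured = rates(visitors=1000, buyers=30, sessions_per_visitor=1)

print(f"original : {original[0]:.1%} per visitor, {original[1]:.1%} per session")
print(f"pressured: {pressured[0]:.1%} per visitor, {pressured[1]:.1%} per session")
```

The per-visitor rate is 3% in both cases, while the per-session rate jumps from 1% to 3% even though not a single extra purchase was made.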

It’s best practice to avoid using session-based metrics unless you don’t have another choice as they can be very misleading.

Understanding count metrics

We will come back to our comparison of these two types of metrics. But for now, let’s get on the same page about “count metrics.” To understand why count metrics are harder to assess, you need to have more context on how to measure accuracy and where exactly the measure comes from.

To model the accuracy of a measured rate, we use the beta distribution. In the graph below, we see the measure of two conversion ratios, one blue and one orange. The X-axis is the rate and the Y-axis is the likelihood. When trying to measure the probability that the two rates are different, we implicitly explore the part of the two curves that overlaps.

In this case, the two curves have very little overlap. Therefore, the probability that these two rates are actually different is quite high.

The narrower or more compact the distributions are, the easier it is to see that they’re different.
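
One way to turn the overlap of two beta curves into a number is to draw from both and count how often one rate comes out higher. The sketch below is an illustration under our own assumptions (a uniform Beta(1, 1) prior, made-up traffic figures, a hypothetical function name), not a description of any particular platform’s engine.

```python
import random

def prob_b_beats_a(conv_a: int, n_a: int, conv_b: int, n_b: int,
                   draws: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(rate_B > rate_A) with Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior of a conversion rate: Beta(successes + 1, failures + 1).
        rate_a = rng.betavariate(conv_a + 1, n_a - conv_a + 1)
        rate_b = rng.betavariate(conv_b + 1, n_b - conv_b + 1)
        wins += rate_b > rate_a
    return wins / draws

# Hypothetical data: little overlap between the curves (10% vs 15% on 1,000 visitors each).
clear_difference = prob_b_beats_a(100, 1000, 150, 1000)

# Identical data: the curves overlap completely, so the estimate is close to 50%.
no_difference = prob_b_beats_a(100, 1000, 100, 1000)

print(f"P(B > A), distinct curves : {clear_difference:.3f}")
print(f"P(B > A), identical curves: {no_difference:.3f}")
```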

The fundamental difference between conversion and count distributions

Conversion metrics are bounded in [0, 1] as a rate or [0%, 100%] as a percentage. For count metrics, however, the range is open-ended: counts lie in [0, +infinity).

The following figure shows a gamma distribution (in orange) that may be used with this kind of data, along with a beta distribution (in blue).

These two distributions are based on the same data: 10 visitors and 5 successes. This is a 0.5 success rate (or 50%) when considering unique conversions. In the context of multiple conversions, it’s a process with an average rate of 0.5 conversions per visitor.

Notice that the orange curve (for the count metric) is non-zero above x = 1. This clearly shows that it expects that sometimes there will be more than 1 conversion per visitor.

We will see that comparisons between this kind of metric depend on whether we consider it as a count metric or as a rate. There are two options:

Either we consider that the process is a conversion process, using a beta distribution (in blue), which is naturally bounded in [0, 1].
Or we consider that the process is a count process, using a gamma distribution (in orange), which is not bounded on the right side.

On the graph, we see an inherent property of count data distributions: they are asymmetric. The right tail approaches 0 more slowly than the left one. This makes the gamma distribution naturally more spread out than the beta distribution.

Since both curves are distributions, the area under each curve must be 1.

As you can see, the beta distribution (in blue) has a higher peak than the gamma distribution (in orange). This shows that the gamma distribution is more spread out than the beta distribution. This is a hint that count distributions are harder to estimate accurately than conversion distributions. This is also why we need more visitors to assess a difference when using count metrics than when using conversion metrics.
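
The difference in spread can be checked numerically for the 10 visitors and 5 successes used above. The parameterization here is our assumption, since the article does not state one: Beta(6, 6) for the conversion view (uniform prior) and Gamma(shape = 5, rate = 10) for the count view, both centered on 0.5.

```python
from math import exp, factorial, sqrt

VISITORS, SUCCESSES = 10, 5

def beta_std(successes: int, visitors: int) -> float:
    """Standard deviation of Beta(successes + 1, failures + 1)."""
    a, b = successes + 1, visitors - successes + 1
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

def gamma_std(events: int, visitors: int) -> float:
    """Standard deviation of Gamma(shape=events, rate=visitors)."""
    return sqrt(events) / visitors

def gamma_tail(events: int, visitors: int, x: float) -> float:
    """P(X > x) for Gamma(shape=events, rate=visitors); closed form for integer shape."""
    lam_x = visitors * x
    return exp(-lam_x) * sum(lam_x ** k / factorial(k) for k in range(events))

spread_beta = beta_std(SUCCESSES, VISITORS)          # about 0.139
spread_gamma = gamma_std(SUCCESSES, VISITORS)        # about 0.224
mass_above_one = gamma_tail(SUCCESSES, VISITORS, 1.0)  # about 0.029

print(f"beta std  = {spread_beta:.3f}")
print(f"gamma std = {spread_gamma:.3f}")
print(f"gamma mass above x = 1: {mass_above_one:.3f}")
```

With these assumed parameters the gamma curve is roughly 60% wider than the beta curve, and it keeps a small amount of probability above 1, where the beta curve has none.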

To understand this problem, imagine two gamma distribution curves, one for each variation of an experiment. Then gradually shift one to the right, showing an increasing difference between the two distributions (see figure below).

Since both curves are right-skewed, the overlap region will occur on at least one of the skewed parts of the distributions.

This means that differences will be harder to assess with count data than with conversion data. This comes from the fact that count data works on an open range, whereas conversion rates work on a closed range.

Do count metrics need more visitors to get accurate results?

No, it is more complex than that in the CRO context. Typical statistical tests for count metrics are not suited for CRO in practice.

Most of these tests come from the industrial world. A classic usage of count metrics is counting the number of failures of a machine in a given timeframe. In this context, the risk of failure doesn’t depend on previous events. If a machine already had one failure and has been repaired, the chance of a second failure is considered to be the same.

This hypothesis is not suited to the number of pages viewed by a visitor. In reality, a visitor who has seen two pages is more likely to see a third page than a visitor who has seen just one page is to see a second (since single-page visitors have a high probability of “bouncing”).

The industrial model does not fit in the CRO context since it deals with human behavior, making it much more complex.
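
A quick way to see this on your own data is to compare how likely visitors are to continue after one page versus after two pages. The sample below is entirely hypothetical, and the helper name is ours.

```python
from statistics import fmean, pvariance

# Hypothetical sample: pages viewed by 100 visitors (many bounces, a long tail).
page_views = [1] * 60 + [2] * 15 + [3] * 10 + [5] * 8 + [10] * 5 + [25] * 2

def continue_probability(views: list[int], pages_seen: int) -> float:
    """Share of visitors who reached `pages_seen` pages and then viewed at least one more."""
    reached = sum(1 for v in views if v >= pages_seen)
    continued = sum(1 for v in views if v >= pages_seen + 1)
    return continued / reached

after_one = continue_probability(page_views, 1)   # 40 of 100 continue -> 0.4
after_two = continue_probability(page_views, 2)   # 25 of 40 continue -> 0.625

# For independent events (Poisson), the variance equals the mean, so this ratio would be near 1.
dispersion = pvariance(page_views) / fmean(page_views)

print(f"P(next page | 1 page seen)  = {after_one:.3f}")
print(f"P(next page | 2 pages seen) = {after_two:.3f}")
print(f"variance / mean             = {dispersion:.2f}")
```

In this made-up sample, the chance of viewing another page rises once a visitor gets past the first page, and the variance is several times the mean, which is not what the “machine failure” model would predict.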

Not all conversions have the same value

The next CRO struggle also comes from the direct exploitation of formulas from the industrial world.

If you run a plant that produces goods with machines, and you test a new kind of machine that produces more goods per day on average, you will conclude that these new machines are a good investment. Because the value of a machine is linear with its average production, each extra product adds the same value to the business.

But this is not the same in CRO.

Imagine this experiment result for a media company:

Variation B yields 1,000 more page views than the original A. Based on that data, you put variation B in production. But let’s say that variation B lost 500 people who each saw 2 pages and won 20 people who saw 100 pages each. That still makes a net benefit of 1,000 page views for variation B.

But what about the value? These 20 people, even if they spent a lot of time on the site, may not be worth as much as 500 people who come regularly.

In CRO, each extra unit added to a count metric does not have the same value, so you cannot trust a measured increment as a direct added value.

In applied statistics, one adds an extra layer to the analysis: a utility function, which links extra counts to value. This utility function is very specific to the problem and is unknown for most CRO problems. Even if you get some more conversions in a count metric context, you are unsure about the real value of this gain (if any).
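
To illustrate how much the conclusion depends on the utility function, here is a sketch applied to the media example above. The logarithmic utility is purely hypothetical, chosen only to represent diminishing returns per visitor; your real utility function is likely different and, as noted, usually unknown.

```python
from math import log
from typing import Callable

def net_change(value_of: Callable[[int], float]) -> float:
    """Net value for variation B: 20 visitors won at 100 pages, 500 visitors lost at 2 pages."""
    won = 20 * value_of(100)
    lost = 500 * value_of(2)
    return won - lost

# Linear utility: every page view is worth the same (the "industrial" assumption).
linear = net_change(lambda pages: pages)

# Hypothetical concave utility: each extra page from the same visitor is worth less.
concave = net_change(lambda pages: log(1 + pages))

print(f"linear utility : {linear:+.0f}")
print(f"concave utility: {concave:+.1f}")
```

With a linear utility, variation B is ahead by 1,000. With this assumed concave utility, the same data shows a loss, because 500 regular visitors outweigh 20 heavy ones.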

Some count metrics are not meant to be optimized

Let’s see some examples where raising the number of a count metric might not be a good thing:

Page views: If the count of page views rises, you can think it’s a good thing because people are seeing more of your products. However, you can also think that people get lost and need to browse more pages to find what they need.
Items added to cart: We have the same idea for the number of products added to the cart. If you do not check how many products remain in the cart at the checkout stage, you don’t know if the variation helps to sell more or if it just makes the product selection harder.
Products purchased: Even the number of products purchased may be misleading if used alone as a business objective in an optimization context. Visitors could be buying two cheaper products instead of one high-quality (and more expensive) product.

You can’t tell just by looking at these KPIs if your variation or change is good for your business or not. There is more that needs to be considered when looking at these numbers.

How do we use this count data then?

We see in this article how counterintuitive optimization based on sessions is. And even worse, we see how misleading count metrics are in CRO.

Unless you have both business and statistics experts available, it’s best practice to avoid them, at least as your only KPI.

As a workaround, you can use several conversion metrics with specific triggers using business knowledge to set the thresholds. For instance:

Use one conversion metric for counts in the range [1, 5], called “light users.”
Use another conversion metric for the range [6, 10], called “medium users.”
Use another one for the range [11, +infinity), called “heavy users.”

Splitting up the conversion metrics in this way will give you a clearer signal about where you gain or lose conversions.
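
The bucketing above can be sketched as follows. The thresholds are the example ones from the list; in practice they should come from your business knowledge. The function names and sample counts are hypothetical.

```python
from collections import Counter
from typing import Optional

# Example thresholds from the list above; None means no upper bound.
BUCKETS = {
    "light users": (1, 5),
    "medium users": (6, 10),
    "heavy users": (11, None),
}

def bucket_for(count: int) -> Optional[str]:
    """Return the bucket name for a visitor's count, or None if the count is 0."""
    for name, (low, high) in BUCKETS.items():
        if count >= low and (high is None or count <= high):
            return name
    return None

def bucket_rates(counts_per_visitor: list[int]) -> dict[str, float]:
    """Conversion rate of each bucket, computed over all unique visitors."""
    hits = Counter(bucket_for(c) for c in counts_per_visitor)
    total = len(counts_per_visitor)
    return {name: hits[name] / total for name in BUCKETS}

# Hypothetical page-view counts for 10 visitors of one variation.
variation_counts = [0, 0, 1, 3, 5, 6, 10, 11, 40, 2]
result = bucket_rates(variation_counts)
print(result)  # {'light users': 0.4, 'medium users': 0.2, 'heavy users': 0.2}
```

Each bucket is now an ordinary conversion rate per unique visitor, so it can be compared between variations with the same tools as any other conversion metric.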

Another piece of advice is to use several KPIs to have a broader view.

For instance, although analyzing the product views alone is not a good idea – you can check the overall conversion rate and average order value at the same time. If product views and conversion KPIs are going up and the average order value is stable or goes up, then you can conclude that your new product page layout is a success.

Counterintuitive Metrics in CRO

Now you see that except for conversions counted on a unique visitor basis, nearly all other metrics can be very counterintuitive to use in CRO. Mistakes can happen because of statistics that work differently, and also because the meaning of these metrics and their evolutions may have several interpretations.

It’s important to understand that CRO skill is a mix of statistics, business and UX knowledge. Since it’s very rare to have all this within one person, the key is to have the needed skills spread across a team with good communication.

Article

Aug 10, 2023

7min read

Four Ways to Use GA4 to Power Your Web Experimentation Programs

AB Tasty

We invited Oliver Walker from our partner Hookflash to talk us through the practical ways you can use GA4 with your experimentation.

Although many people are talking about GA4 as a different platform from the previous version (Universal Analytics), conceptually it lets you do largely the same things. Its primary functions are to help you to understand and optimize your media; to understand and optimize your website; and to understand and segment your website visitors into audiences. However, with GA4 several features can really help you to power an experimentation program.

Here we’ll outline how to use GA4 to its full potential to drive results for your testing program.

Understanding User Behavior

At its core, Google Analytics has always been great for helping website owners to understand their website traffic. Whether it’s where they started their journey or where they ended their digital journey, or whether they sought help halfway through, there are a few options to know about. What we know about GA4 already is that it’s not the most intuitive tool in the world so here are some quick tips on that front:

Landing Pages – use Explorations. Although there is a default report for landing pages, it’s not the best: there’s a known bug resulting in an empty row, and it doesn’t have the most useful metrics, e.g. bounce rate or engagement rate. If you build a report in Explorations, you can use a different dimension (called “Landing page + query string”) and choose the metrics you’d find useful.

Exit rate – similar to the above, you no longer get Exits (or Exit Rate) in the default Pages & Screens report. Again, rebuilding the report in Explorations gives you both the ability to add Exits as a metric, and you can choose your preferred pages dimension. The default dimension in the Pages and Screens report does not include query strings but if you’d prefer to use the one that does, choose the dimension “Page path + query string”.
Site search – and finally, where’s the Site Search report gone!? There’s no longer a default report for this but you can rebuild this in Explorations. You can understand which search terms were most often looked for, by building an Exploration with the dimension of “Search term” and the metric “Event count”.

Understanding User Flow

What Universal Analytics was not particularly good at was visualizing how people traverse a website. The flow reports were heavily sampled and merely teased you as to what you could have had. GA4 has on-the-fly path exploration reports that can be used and tweaked very flexibly. You can find these within Explorations too: just choose Path Exploration and then tweak, as per the following:

Get the pages view – for some unhelpful reason, the default view is Event Name, within each step. In the visualization, click the drop-down underneath Step +1 and change Event Name to be your preferred page dimension to get a view of how users move from page to page.
Double-click the page you are interested in to see where users go next. You can also click the +15 more (or whichever number) link at the bottom of each column to get the longer tail.
Choosing a dimension to “break down” by lets you easily compare routes through the site for different users, for example mobile vs. desktop or each of the different browsers. Likewise, you can use segments here to review a certain audience type, e.g. non-UK traffic or Purchasers.

Audience targeting & triggers

Speaking of audiences, this was always a great feature of Universal Analytics, and when Google Optimize was in its pomp, the ability to share audiences from UA to Optimize was one of its prime features. With GA4 you get the same ability to build audiences and to share them natively with other Google Marketing Platform (GMP) products, plus some neat additional elements:

The ability to use user behavior to trigger new types of goals. For example, if you’re a publisher and you want to engage people to read a certain number of articles in a particular time frame, it’s possible to create an audience for this and then have that set of behaviors trigger a new event. This feature is called audience triggers. It becomes a powerful new metric with which to optimize your testing campaigns, by importing that conversion into your chosen testing tool.
The ability to export audiences from GA4 to other platforms. Namely, this is something that the new Google Analytics Data API supports. This is big news. Whilst it’s to be expected that other platforms will catch up, at the moment AB Tasty is the only one to have published their mechanism for pulling GA4 audiences into their platform.

This is generally a great leap forward, as GA4 also has the concept of users being added to, and removed from, audience groups, whereas most testing tools don’t have this feature.

Advanced analysis using BigQuery

The final area where GA4 really steps forward beyond its predecessor is that all GA4 accounts have a native integration with Google BigQuery. Whilst the integration itself is free, it’s worth noting that you do incur costs by storing or processing data in BigQuery, although a good partner will be able to advise on what that might look like for you.

So where does BigQuery help? The data schema provided by integrating GA4 and BigQuery is raw-level data – that means each row is effectively an event, with a time stamp, and all the associated parameters. It lets you have a greater degree of flexibility over what you analyze, provided you’re able to query the data (using SQL, or your friendly AI-driven chat tool.) For example:

If you want to understand how long it takes a user to complete a particular flow or set of actions. It’s worth noting that Google Analytics does batch events, so this isn’t perfect, but it is easier than within the interface.
If you want to look at user flows at an even greater level of detail, for example, how users traverse the site having landed on a particular page.
If you want to stitch together any data that GA didn’t capture but that also exists in Google Cloud, e.g. following a lead from submission through to outcome.
If you want to conduct a deeper analysis within your post-experiment analyses. All testing platforms will pass events and parameters to denote whether a user was part of an experiment and the variation they saw, so GA4 is a powerful additional tool to deep-dive into results.
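
As an illustration of the first use case, here is a minimal sketch that measures the time between two events per user. It works on a small in-memory list shaped like rows of the GA4 BigQuery export (the `user_pseudo_id`, `event_name` and `event_timestamp` fields, with timestamps in microseconds); the rows themselves and the function name are hypothetical, and in practice you would express the same logic in SQL against your export tables.

```python
# Hypothetical rows, shaped like the GA4 BigQuery export (timestamps in microseconds).
events = [
    {"user_pseudo_id": "u1", "event_name": "add_to_cart", "event_timestamp": 1_700_000_000_000_000},
    {"user_pseudo_id": "u2", "event_name": "add_to_cart", "event_timestamp": 1_700_000_010_000_000},
    {"user_pseudo_id": "u2", "event_name": "add_to_cart", "event_timestamp": 1_700_000_020_000_000},
    {"user_pseudo_id": "u1", "event_name": "purchase", "event_timestamp": 1_700_000_090_000_000},
    {"user_pseudo_id": "u3", "event_name": "add_to_cart", "event_timestamp": 1_700_000_100_000_000},
    {"user_pseudo_id": "u2", "event_name": "purchase", "event_timestamp": 1_700_000_310_000_000},
]

def seconds_between(rows: list[dict], start_event: str, end_event: str) -> dict[str, float]:
    """Seconds from each user's first start_event to their first end_event after it."""
    first_start: dict[str, int] = {}
    first_end: dict[str, int] = {}
    for row in sorted(rows, key=lambda r: r["event_timestamp"]):
        user, name, ts = row["user_pseudo_id"], row["event_name"], row["event_timestamp"]
        if name == start_event and user not in first_start:
            first_start[user] = ts
        elif name == end_event and user in first_start and user not in first_end:
            first_end[user] = ts
    # Users who never reach end_event (u3 here) are left out.
    return {user: (first_end[user] - first_start[user]) / 1_000_000 for user in first_end}

durations = seconds_between(events, "add_to_cart", "purchase")
print(durations)  # {'u1': 90.0, 'u2': 300.0}
```

Because Google Analytics batches events, the resulting durations should be read as approximations rather than exact timings.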

It’s not all doom and gloom

Yup, GA4 does have some limitations: it’s a big change to a tool that lots of people loved, and it’s hard to pick up. BUT when you start to understand certain concepts and familiarize yourself with its capabilities, there are lots of features to help you with your experimentation program.

Article

Aug 7, 2023

7min read

Understanding shopping engagement software: How do virtual shopping assistants work?

AB Tasty

Every visitor shopping online wants to find a product that precisely meets their expectations quickly and efficiently. To achieve this, you can offer your potential customers purchasing advice to guide them throughout their buying journey.

In this article, you will discover the different forms of virtual shopping assistants available in e-commerce and the advantages they bring to you and your customers.

What are virtual shopping assistants?

Virtual shopping assistants, enabled by shopping engagement software, provide your shoppers with support in their product selection through an interactive and personalized exchange. Guided by precise questions, your customers can find products that align with their wishes and needs more quickly.

This approach is based on the purchase advice provided in brick-and-mortar retail, aiming to overcome the impersonal components of online shops and enhance the individual user experience.

How do virtual shopping assistants differ from faceted search?

With faceted search, your customers can filter their search results in the online shop to view the products that interest them. For example, when searching through an e-commerce apparel shop they can use faceted navigation to select features, such as women’s blue capris in size 40, providing a user-friendly experience.

However, customers need to already know exactly what they want to buy to filter accordingly. If a customer is uncertain about their purchase or unsure about the specific product features they desire, they require support in the form of virtual shopping assistants.

What kinds of virtual shopping assistants are available?

There are various formats of virtual shopping assistants in e-commerce that can be integrated at different points of the customer journey. Let’s take a closer look at two categories: person-to-person communication tools and automated tools that can handle multiple customer inquiries in real time.

Virtual shopping assistants with human-to-human communication

Below, we present two examples of virtual shopping assistants that utilize human-to-human communication:

Live chat

Live chat is a messenger tool that allows your customers to directly contact an employee of your online shop. Typically integrated as a pop-up window on the company website, it facilitates one-to-one communication, resembling the experience of brick-and-mortar retail.

Video consultation

Video consultation is a rising trend in the e-commerce industry.

Customers visiting your e-commerce site may still be exploring their needs, making phone, chat or email interactions insufficient. With video consulting, customers can engage in face-to-face conversations with an employee of your online shop, ask questions, and receive individual advice on your products and processes.

For instance, customers can share their screens and present their ideas and inspiration to the sales representative, leading to a more targeted sales pitch. This combination of online shopping with personalized attention replicates the experience of boutique purchases and ultimately boosts customer loyalty and satisfaction.

The advantage: Your customers receive immediate, personalized answers to their questions about products and processes while they browse your shop. Especially for complex products that require explanation, customer-oriented live chat can positively influence purchase decisions. Additionally, you can offer appointments for individual purchase advice.

Virtual shopping assistants with AI-based tools

Now, let’s explore two examples of online consulting software that utilize AI-based tools for real-time interactions with multiple customers at once.

AI-based chatbots

Chatbots using artificial intelligence can respond to hundreds of customer inquiries simultaneously and in real time.

With the emergence of large language model chatbots such as OpenAI’s ChatGPT and Google’s Bard, brands have the potential to revolutionize how they engage with their customers online.

Depending on how the tool is programmed, it can recognize natural language, generate suitable answers from text blocks and databases on your website, and even escalate queries to a human employee if necessary. This lets you automate various processes while easing the load on your staff.

Guided Selling

Guided Selling involves guiding your customers through the product selection process to facilitate a confident purchase decision. This is particularly useful for potential buyers who may not possess enough knowledge about the products to make an informed choice.

For instance, when it comes to purchasing a stroller, expectant parents can feel overwhelmed by the countless models available. Guided Selling assists them in narrowing down the selection through targeted questions, leading to the ideal stroller. This can be seen in the example from babymarkt.de, which uses Guided Selling from AB Tasty to provide better shopping experiences for its customers.

This form of assistance, where a customer is guided step-by-step through the consultation process based on specific questions, is especially suitable for products that require explanation and mirrors the experience of a sales pitch in brick-and-mortar retail. Guided Selling can also be used for self-explanatory products, where customers can find the right product selection by selecting certain tags.

What makes Guided Selling special is that the results can be personalized to display suitable products based on the individual click and buying behavior of your customer. This ensures that your customer receives not only products that match their desired features and requirements but also their unique preferences.

Why is good customer engagement important in e-commerce?

Customers who feel well-advised are happy to come back. This applies to both brick-and-mortar stores and e-commerce shops. In addition, there are other reasons for using shopping engagement software like virtual shopping assistants.

Personalized shopping experience

When potential buyers walk into a brick-and-mortar store, they can approach the on-site sales consultants to find the right product.

By integrating this service into your online shop in the form of live chats, video advice or Guided Selling, you enable your customers to recreate the feeling of an interactive, personalized shopping experience.

Shoppers become customers

Virtual shopping assistants help you convert potential buyers into customers. By putting customers in direct contact with your team or catalog, they get answers to their questions that can positively influence their purchase decision.

For very personal products such as mattresses, a virtual shopping assistant tool helps visitors to find the one that exactly meets their needs from the multitude of models.

A better user experience

Your visitors appreciate positive experiences throughout their customer journey.

Support through virtual shopping assistants gives them a secure feeling when choosing a product and more frequently leads to a purchase decision. In addition, virtual shopping assistants make shopping easier: You present your customers with suitable solutions, they feel understood and the positive user experience is anchored in their memory.

Higher conversion

With virtual shopping assistants and shopper engagement software, you can reduce lost sales opportunities and thus increase your conversions. Sometimes potential buyers leave a shop because they didn’t find a product that is actually there. If they can easily ask a sales representative about the product via live chat, it will improve their shopping experience.

Your potential customers have already added products to their shopping cart, so why are they abandoning the checkout process? One possible reason: They had a question about a process that was not answered quickly enough. With an AI-based chatbot available during the checkout, these questions can be solved quickly and efficiently.

Higher customer satisfaction

The personalized service of a virtual shopping assistant creates an intimate atmosphere – a 1:1 exchange reminiscent of brick-and-mortar experiences. This not only strengthens potential buyers’ trust in your company but also their satisfaction. And satisfied customers turn into loyal customers.

Fewer Returns

Implementing virtual shopping assistants in your shop reduces the risk of returns. The two most common reasons for returns are that the product didn’t fit or that the customer didn’t like it.

With personal, targeted advice, you can help your customers to choose the right products that meet their wishes and needs as precisely as possible. This reduces your costs and makes your returns management easier.

Conclusion: Virtual shopping assistants make e-commerce more human

Virtual shopping assistants are a must-have in e-commerce. They offer advantages for you as an e-commerce marketer as well as for your customers.

Live chats or chatbots, video advice and Guided Selling make it easier for potential buyers to select a product and improve their user experience. In a 1:1 exchange, they receive personalized answers to their questions – the online shop becomes more human. At the same time, you benefit from higher customer loyalty and fewer returns, which means you can increase your sales.

Article

Jul 25, 2023

16min read

Feature Flags: Should I Build or Buy?

Rowan Haddad

The concept of feature flags is quite straightforward and easy to implement, at least at first. In the beginning, you would usually manage one feature flag by modifying a configuration file, but when you start using multiple flags, they become harder to manage and to keep in sync across different functions.

Undoubtedly, feature flags become increasingly important as engineering and product teams begin to see their benefits. By separating code deployment from feature release, teams can now deliver new functionalities to users safely and quickly.

Feature flags are also extremely versatile and their uses can extend to a number of different scenarios to achieve various tasks by all teams across your organization. As feature flags help developers release faster with lower risk, it makes sense that teams would want to extend their usage across these additional use cases.

We can look at feature flag implementation as a journey that starts with one simple use case and then evolves to more advanced implementations by different stakeholders. This article will illustrate this journey by introducing you to the different use cases of feature flags, from simple to more complex, and help you consider whether it is in your best interest to build or buy a feature flag management system according to your goals.

The value of feature flags

Before we go deeper into the build vs buy topic, it’s important to highlight exactly why you need feature flags in your daily workflows and the value they can bring to your teams.

As we’ve mentioned, feature flags can be used across a range of use cases. Here’s a quick overview of when feature flags are especially handy:

User targeting and feature testing: When you have a new feature but you’re not yet ready for a big bang release; instead, you want to have the control to target who sees this new feature to collect necessary feedback for optimization purposes.
Testing in production: When you want to test in production by gradually rolling out a new feature or change to validate it.
Kill switch: When you want to have the ability to quickly roll back a feature in case anything goes wrong and turn it off while the issue is being fixed.
Migrations: Feature flags offer a low-risk way to perform architectural or database migrations such as migrating from a monolith architecture to microservices.

This means that feature flags are a great way to continuously (and progressively) roll out releases with minimal risk by controlling who gets access to your releases and when.

The journey begins with a simple step: if/else statements

A feature flag in code is essentially an IF statement. Here is a very straightforward, basic example:
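As a minimal sketch of such a check, assuming Python and a hypothetical `NEW_CHECKOUT_ENABLED` flag (neither the flag name nor the function comes from a specific product):

```python
# Hypothetical flag: in a real system this value would come from configuration.
NEW_CHECKOUT_ENABLED = True


def render_checkout(new_checkout_enabled: bool) -> str:
    """Return the checkout variant selected by the flag."""
    if new_checkout_enabled:
        return "new checkout"  # code path hidden behind the flag
    else:
        return "old checkout"  # existing behaviour


print(render_checkout(NEW_CHECKOUT_ENABLED))
```

Flipping the boolean changes which code path users see without changing the deployed code around it.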

You can start off with a simple if/else statement. This approach suits short-lived flags, but less so flags that you plan to keep around for a long time or more advanced use cases that require more sophistication. Feature flags have evolved beyond one use case and can serve a variety of purposes. Inserting a few IF statements is easy; maintaining a feature flag management system is the hard part, and it requires time, resources and commitment.

You can implement a feature flag by reading from a config file in order to control which code path will be exposed to your subset of users. Using a config file at the beginning may seem like a viable solution but in the long-term may not be so practical, resulting in technical debt that accumulates over time.
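A config-file approach could look roughly like the sketch below. The file name `flags.json`, the flag names and the helper functions are all assumptions made for illustration; the example writes its own temporary config file so that it can run on its own.

```python
import json
import os
import tempfile


def load_flags(path: str) -> dict:
    """Read flag values from a JSON config file."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def flag_is_on(flags: dict, name: str) -> bool:
    """Unknown flags default to off, so a typo never exposes a feature."""
    return bool(flags.get(name, False))


# Write an example config file (hypothetical flag names).
config_path = os.path.join(tempfile.mkdtemp(), "flags.json")
with open(config_path, "w", encoding="utf-8") as handle:
    json.dump({"new_checkout": True, "dark_mode": False}, handle)

flags = load_flags(config_path)
print(flag_is_on(flags, "new_checkout"))
```

Every change to such a file is a manual edit, which is one way the technical debt described above starts to build up.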

Keep reading: Best practices on storing feature flags.

Once you reach that point, a simple flagging solution will not suffice and you will need a more advanced one. Implementing it in-house can be quite costly and requires a lot of maintenance, so you may want to turn to a third-party option.

Bumps along the road: Evolving use cases

When you’re just starting out, you’ll implement a feature flag from a config file with an easy on/off toggle to test and roll out new features. Sounds simple enough. Then, one flag turns into 10 or 20, and as you keep adding flags you run into the aforementioned technical debt issue, as it becomes harder to pinpoint which of your active feature flags need to be removed. In this case, a proactive approach to managing your flags is essential, in the form of a comprehensive feature flag management system.

Therefore, at the start of your feature flag journey, you may simply be employing one use case which is experimentation through release management but over time, you may want to implement feature flags across a variety of use cases once you’ve seen first-hand the difference feature flags are making to your releases.

Test in production

You may, for example, want to test in production but only internally, so you expose the feature to people within your organization. You may also use feature flags to manage entitlements, where only a small subset of users can access your feature, such as users with a premium subscription to your product or service. These types of flags are referred to as permission toggles. To support them, you will need to build a system that can handle different levels of permissions for different users.
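A permission toggle of this kind can be sketched as follows. The `User` fields, the `example.com` company domain and the `"premium"` plan name are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    email: str
    plan: str  # e.g. "free" or "premium" (hypothetical plan names)


INTERNAL_DOMAIN = "example.com"  # hypothetical company domain


def can_see_feature(user: User) -> bool:
    """Permission toggle: internal staff and premium subscribers only."""
    is_internal = user.email.endswith("@" + INTERNAL_DOMAIN)
    is_premium = user.plan == "premium"
    return is_internal or is_premium


print(can_see_feature(User("dev@example.com", "free")))
```

Unlike a simple on/off switch, the result depends on who is asking, which is why these flags need user context at evaluation time.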

Indeed, one of the most important uses for feature flags is the ability to toggle features on or off for a certain subset of users. This use case is referred to as testing in production, which is essential to verify that your feature works as it should in real time and allows you to easily roll back the feature if it turns out to be buggy resulting in safer releases.

To be able to carry out such controlled roll-outs, your feature flagging system should enable you to make such context-specific flagging decisions, for example, for carrying out A/B tests.

So, for example, you might want to expose your feature to 5, 10 or 15% of your users or you might be looking to test this feature on users from a certain region. A good feature management system provides the means to take such specific contexts into account when making flagging decisions. These contexts can include additional information about the user, as well as the server handling the request and the geographic market the request is linked to.
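One common way to implement a percentage rollout is to hash each user into a stable bucket, so the same user always gets the same answer. The sketch below assumes this hashing approach; the function names and parameters are illustrative rather than taken from any particular tool.

```python
import hashlib
from typing import Optional, Set


def bucket(user_id: str, flag_name: str) -> int:
    """Map a user to a stable bucket in [0, 100) for a given flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled_for(
    user_id: str,
    region: str,
    flag_name: str,
    rollout_percent: int,
    allowed_regions: Optional[Set[str]] = None,
) -> bool:
    """Context-specific decision: region targeting first, then percentage."""
    if allowed_regions is not None and region not in allowed_regions:
        return False
    return bucket(user_id, flag_name) < rollout_percent


# Expose the (hypothetical) flag to 10% of users in France only.
print(is_enabled_for("user-42", "FR", "new_checkout", 10, {"FR"}))
```

Because the bucket is derived from the user ID, raising the percentage from 5 to 15 only adds users; nobody who already had the feature loses it.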

Choose your players

As a result, feature flags allow you to choose who you want to release your feature to, so the new code can be targeted to a specific group of users whose feedback you need. This would require you to have a system in place that would allow you to perform feature experimentation on those users and attach KPIs to your releases to monitor their reception. However, some companies may not have the time or resources or even experience to collect this kind of rich data.

Kill switches

Feature flags can be used to kill off non-essential features or disable any broken features in production. As soon as your team logs an error, they can turn off the feature with the click of a button while they investigate the issue. This would require a two-way communication pathway between your monitoring tools and the internal flag system, which could be complex to set up and maintain. The feature can then just as easily be turned on again once it’s ready for deployment. Such kill switches usually require a mature feature flag platform.
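The link between monitoring and the flag system can be pictured with the small sketch below. The `FlagStore` class, the `handle_error_alert` hook and the error threshold are all hypothetical; a production setup would persist flag state and receive alerts from a real monitoring tool.

```python
class FlagStore:
    """In-memory flag store; a real system would persist and replicate this."""

    def __init__(self, flags: dict):
        self._flags = dict(flags)

    def is_on(self, name: str) -> bool:
        return self._flags.get(name, False)

    def kill(self, name: str) -> None:
        self._flags[name] = False

    def restore(self, name: str) -> None:
        self._flags[name] = True


def handle_error_alert(store: FlagStore, flag_name: str, error_count: int,
                       threshold: int = 5) -> None:
    """Hypothetical monitoring hook: switch the feature off past a threshold."""
    if error_count >= threshold:
        store.kill(flag_name)


store = FlagStore({"recommendations": True})
handle_error_alert(store, "recommendations", error_count=7)
print(store.is_on("recommendations"))
```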

Feature flag hell

We can conclude that when implementing feature flags, you must continuously be aware of the state of each of your feature flags. Otherwise, you could find yourself overwhelmed by the number of flags in your system and lose control of them because you’re unable to keep track of and maintain them properly. Things could get complicated fast as you add more code to your codebase, so you need to make sure that the system you have in place is well-equipped to handle and reduce those costs.

You’ve probably already come across the term ‘merge hell’ but there’s also such a thing as ‘feature flag hell’. This is what happens when you add so many feature flags that your code turns into a nightmare to read, test and maintain.

As mentioned above, you can start off with a simple if/else statement but more sophistication will be needed to implement these more advanced use cases.

It is also important to be able to manage the configuration of your in-house system. Any small configuration change can have a major impact on the production environment. Therefore, your system will need to have access controls, audit logs and custom permissions to restrict who can make changes.

Your system will also need to have an environment-aware configuration that supports a flag configuration from one environment to the next. Most systems should be able to create two kinds of environments: one for development and one for production, each with its own SDK key. Then you would be able to control the flag’s value depending on which of these environments it’s being used in. For example, the flag could be ‘true’ in development but ‘false’ in production.
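An environment-aware configuration can be as simple as one value per environment for each flag. In this sketch the `FLAG_CONFIG` structure and the environment names are assumptions made for illustration.

```python
# Hypothetical per-environment configuration for a single flag.
FLAG_CONFIG = {
    "new_checkout": {"development": True, "production": False},
}


def flag_value(name: str, environment: str) -> bool:
    """Look up a flag for an environment; anything unknown stays off."""
    per_environment = FLAG_CONFIG.get(name, {})
    return per_environment.get(environment, False)


print(flag_value("new_checkout", "development"))
print(flag_value("new_checkout", "production"))
```

Defaulting unknown flags and environments to off is a deliberate choice here: a missing entry should not expose an unfinished feature.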

Having different environments prevents you from accidentally exposing something in production before you are prepared. When you have all these flags across different environments, it becomes harder to keep everyone in sync, which leads us back to the issue of ‘feature flag hell’ if you don’t have the right system in place.

Feature flags categorization

With such sophisticated use cases, it would not make sense to place feature flags under one category and call it a day. Thus, here we will talk about feature flags when it comes to their longevity and dynamism.

Static vs dynamic

The configuration for some flags will need to be more dynamic than for others. Flipping a toggle can be a simple on/off switch. However, other categories of toggle are more dynamic and require more sophisticated, context-specific flagging decisions, which are needed for advanced use cases such as A/B testing. For example, permission toggles, usually used for the entitlements mentioned earlier, tend to be the most dynamic type of flag as their state depends on the current user and they are evaluated on a per-user basis.

Long- vs short-lived

We can also categorize flags based on how long the decision logic for that flag will remain in the codebase. On the one hand, some flags may be transient in nature, such as release toggles, which can be removed within a few days and whose decision logic can be implemented through a simple if/else statement. On the other hand, for flags that will last longer, you’ll need to use more maintainable implementation techniques. Such flags include permission toggles and kill switches.

Therefore, it is important that your feature management solution can keep track of all the flags by determining which flag is which and indicating which flags are no longer needed or in use and should be removed.
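Tracking of this kind can be sketched as a small registry that records each flag’s owner, type and creation date. The record fields and the 14-day limit for release toggles are hypothetical values picked for the example.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class FlagRecord:
    name: str
    owner: str
    kind: str  # "release", "permission" or "kill_switch"
    created: date


# Hypothetical policy: only short-lived release toggles have an age limit.
MAX_AGE = {"release": timedelta(days=14)}


def stale_flags(records: List[FlagRecord], today: date) -> List[str]:
    """Return the names of flags that have outlived their expected lifetime."""
    stale = []
    for record in records:
        limit = MAX_AGE.get(record.kind)
        if limit is not None and today - record.created > limit:
            stale.append(record.name)
    return stale


registry = [
    FlagRecord("new_checkout", "team-a", "release", date(2023, 1, 1)),
    FlagRecord("premium_reports", "team-b", "permission", date(2022, 6, 1)),
]
print(stale_flags(registry, today=date(2023, 2, 1)))
```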

Challenges of an in-house system

As use cases grow, so do the challenges of developing an in-house feature flagging system. The challenges organizations face when developing such a system include:

Many organizations will start out with a basic implementation where the config change for every release has to be made manually, which is time-consuming. Similarly, when rolling out releases, compiling customer IDs will also be done manually, so keeping track of the features rolled out to each user would prove to be a major challenge.

Most of these manual processes would be carried out by the engineering team so product managers would be unable to make changes from their end and so will be dependent on engineers to make those changes for them.

The preceding point also raises the question of what you want your engineers to devote their time to. Your engineers will need to dedicate a large chunk of their time maintaining your in-house feature flagging tool which could divert their attention from building new features that could drive revenue for your company.

This ties in with the lack of a UI that could serve as an audit log to monitor when changes are made and by whom. The lack of a UI also means that only engineers can control feature rollouts, while product managers cannot do such deployments themselves or view which features are rolled out to which users. Thus, a centralized dashboard is needed so that all relevant stakeholders can monitor feature impact.

As mentioned previously, monitoring and cleaning up old flags will become increasingly difficult as more flags are generated. When flag adoption increases, people across your organization will find it more difficult to track which flags are still active.

Eventually, if your team does not remove these flags from the system, technical debt would become a major issue. Even keeping track of who created which flag and for what purpose could become a problem if the in-house system doesn’t provide such data.

Thus, while the advantages of feature flags are numerous, they can be outweighed by the technical debt you accumulate over time, which could slow you down if you are not able to take control and keep track of your feature flags’ lifecycles.

There are often high costs associated with maintaining such in-house tools, as well as costs associated with upgrades, so you will see these costs grow over time alongside your technical debt.
Besides the rising costs, building and maintaining a feature flagging system requires ample resources and a high degree of technical expertise, as such systems need a solid infrastructure to handle large amounts of data and traffic, which many smaller organizations lack.

Such in-house tools are usually built initially to address one pain point so they have minimal functionality and thus cannot be used widely across teams and lack the scalability required to handle a wide range of uses and functions.

Time taken to develop feature flag solutions could be time lost that you could have spent developing features for your customers so you will need to consider how much time you are willing to dedicate to developing such a system.

On the other hand:

Buying a platform from a third-party vendor can be cost-effective because you avoid the costs associated with building one. There are also ongoing costs associated with buying a platform but, with many options out there, companies can find a platform that suits their needs and budget.
Third-party systems typically come with ongoing support and maintenance from the vendor, including comprehensive documentation, so you wouldn’t have to worry about handling the upkeep yourself or about the cost of maintaining the platform for large-scale implementations.
Perhaps one of the biggest advantages of buying a solution is its immediate availability and market readiness as the solution is ready-made with expert support and pre-existing functionalities. Thus, you can save valuable time and your teams can quickly implement feature flags in their daily workflows to accelerate releases and time-to-market.
Time dedicated to building and maintaining your in-house solution could otherwise be spent developing innovative and new revenue-generating features.

Safe landing: How to proceed

To ensure a safe arrival at the final spot of your feature flag journey (depending on why and how you’re using feature flags), you will need to decide whether in-house or a third-party solution is what’s right for you. With each additional use case, maintaining an in-house solution may become burdensome. In other words, as the scope of the in-house system grows so do the challenges of building and maintaining your in-house system.

Let’s consider some scenarios where the “buy” end of the argument wins:

Your flag requirements are widening: your company is experiencing high growth, your teams are growing and different teams beyond development and engineering are becoming more involved in your feature flag journey, each with different requirements.
With increasing flag usage and build-up, it’s become harder to keep track of all the flags in your system, eventually leading to messy code.
You’re now working with multiple languages, so maintaining SDKs may become highly complex.
You have an expanding customer base, which means a higher volume of demand and release velocity, leading to strained home-grown systems.
You need more advanced features that can handle the needs of more complex use cases. In-house systems usually lack advanced functionalities as they are usually built for immediate needs unlike third-party tools that come equipped with sophisticated features.

All these different scenarios illustrate the growing scope of feature flag usage which in turn means an increase in scope for your feature flagging system, which could pose a serious burden on in-house solutions that often lack the advanced functionalities to grow as you grow.

Many third-party feature flagging platforms come equipped with a user-friendly UI dashboard that teams can easily use to manage their feature flag usage.

With AB Tasty’s Feature Experimentation and Rollouts, all teams within an organization, from development to product, can streamline their software development and delivery processes. Product teams can run sophisticated omnichannel experiments to get critical feedback from real-world users while development teams can continuously deploy new features and test in production to validate them.

Teams also have full visibility over all the flags in their system in our “flag tracking dashboard” where they can control who gets access to each flag so when the time comes they can retire unused flags to avoid build-up of technical debt.

Feature flag system is a must

At this point, you may decide that using a third-party feature flag management tool is the right choice for you. Which one you opt for will largely depend on your needs. As already pointed out, implementing your own solution is possible at first but it can be quite costly and troublesome to maintain.

Keep in mind the following before selecting a feature flag solution:

Pain points: What are your goals? What issues are you currently facing in your development and/or production process?
Use cases: We’ve already covered the many use cases where feature flags can be employed so you need to consider what you will be using feature flags for. You also need to consider who will be using it (is it just your developers or are there stakeholders involved beyond developers such as Product, Sales, etc?)
Needs and resources: Carefully weigh the build vs buy decision, taking into account factors such as total costs and budget, the time required to build the platform, the scope of your solution (consider the long-term plan of your system) and whether there is support across multiple programming languages. The more languages you use, the more tools you will need to support them.

Following the aforementioned points, your feature flagging management system will need to be: stable, scalable, flexible, highly-supported and multi-language compatible.

It’s more than fine to start simple but don’t lose sight of the higher value feature flags can bring to your company, well beyond the use case of decoupling deploy from release. To better manage and monitor your flags, the general consensus is to rely on a feature flag management tool. This will make feature flags management a piece of cake and can help speed up your development process.

With AB Tasty, formerly known as Flagship, we take feature flagging to the next level where we offer you more than just switching features on and off, offering high performance and highly scalable managed services. Our solution is catered not just to developers but can be widely used across different teams within your organization. Sign up for a free trial today to learn how you can ship with confidence anytime anywhere.

Download our build vs buy debate e-book to help guide you further in your decision.

Article

Jul 13, 2023

10min read

Rollout and Deployment Strategies: Definition, Types and the Role of Feature Flags in Your Deployment Process

Rowan Haddad

How teams decide to deploy software is an important consideration before starting the software development process.

This means long before the code is written and tested, teams need to carefully plan the deployment process of new features and/or updates to ensure it won’t negatively impact the user experience.

Having an efficient deployment strategy in place is crucial to ensure that high quality software is delivered in a quick, efficient, consistent and safe way to your intended users with minimal disruptions.

In this article, we’ll go through what a deployment strategy is, the different types of strategies you can implement in your own processes and the role of feature flags in successful rollouts.

What is a deployment strategy?

A deployment strategy is a technique adopted by teams to successfully launch and deploy new application versions or features. It helps teams plan the processes and tools they will need to successfully deliver code changes to production environments.

It’s worth noting that there’s a difference between deployment and release though they may seem synonymous at first.

Deployment is the process of rolling out code to a test or live environment while release is the process of shipping a specific version of your code to end-users and the moment they get access to your new features. Thus, when you deploy software, you’re not necessarily exposing it to real-world users yet.

In that sense, a deployment strategy is the process by which code is pushed from one environment into another to test and validate the software and then eventually release it to end-users. It’s basically the steps involved in making your software available to its intended users.

This strategy is now more important than ever as modern standards for software development are demanding and require continuous deployment to keep up with customer demands and expectations.

Having the right strategy will help ensure minimal downtime and will reduce the risk of errors or bugs so users get the best experience possible. Otherwise, you may find yourself dealing with high costs due to the number of bugs that need to be fixed resulting in disgruntled customers which could severely damage your company’s reputation.

Types of deployment strategies

Teams have a number of deployment strategies to choose from, each with their own pros and cons depending on the team objectives.

The deployment strategy an organization opts for will depend on various factors including team size, the resources available as well as how complex your software is and the frequency of your deployment and/or releases.

Below, we’ll highlight some of the most common deployment strategies that are often used by modern software development and DevOps teams.

Recreate deployment

A recreate deployment strategy involves developers scaling down the previous version of the software to zero so that it can be removed and a new one uploaded. This requires a shutdown of the initial version of the application to replace it with the updated version.

This is considered to be a simple approach as developers only have to deal with one scaling process at a time without having to manage parallel application deployments.

However, this strategy will require the application to be inaccessible for some time and could have significant consequences for users. This means it’s not suited for critical applications that always need to be available and works best for applications that have relatively low traffic where some downtime wouldn’t be a major issue.

Rolling deployment

A rolling deployment strategy involves updating running instances of the software with the new release.

Rolling deployments offer more flexibility in scaling up to the new software version before scaling down the old version. In other words, updates are rolled out to subsets of instances one at a time; the window size refers to the number of instances updated at a time. Each subset is validated before the next update is deployed to ensure the system remains functioning and stable throughout the deployment process.

This type of deployment strategy prevents disruptions in service because you update incrementally, which means fewer users are affected by any faulty update, and you direct traffic to the updated deployment only after it’s ready to accept traffic. If any issue is detected during a subset deployment, it can be stopped while the issue is fixed.
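The batch-by-batch logic can be simulated in a few lines. This is a toy sketch: the instance list, the `is_healthy` callback and the version strings are stand-ins for real infrastructure and health checks.

```python
from typing import Callable, List, Tuple


def rolling_update(
    instances: List[str],
    new_version: str,
    window_size: int,
    is_healthy: Callable[[str], bool],
) -> Tuple[List[str], bool]:
    """Update `window_size` instances at a time; stop at the first unhealthy batch."""
    updated = list(instances)
    for start in range(0, len(updated), window_size):
        batch = range(start, min(start + window_size, len(updated)))
        for index in batch:
            updated[index] = new_version
        # Validate this subset before touching the next one.
        if not all(is_healthy(updated[index]) for index in batch):
            return updated, False
    return updated, True


fleet = ["v1"] * 5
print(rolling_update(fleet, "v2", window_size=2, is_healthy=lambda version: True))
```

If the health check fails on the first batch, the remaining instances keep running the old version, which is what limits the number of affected users.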

However, rollback may be slow as it also needs to be done gradually.

Blue-green deployment

A blue/green deployment strategy consists of setting up two identical production environments nicknamed “blue” and “green” which run side-by-side, but only one is live, receiving user transactions. The other is up but idle.

Thus, at any given time, only one of them is the live environment receiving user transactions. In this example, that is the blue environment, which runs the current application version. Meanwhile, teams use the idle green system as the test or staging environment to conduct the final round of testing when preparing to release a new feature.

Afterwards, once they’ve validated the new feature, the load balancer or traffic router switches all traffic from the blue to the green environment where users will be able to see the updated application.

The blue environment is maintained as a backup until you are able to verify that your new active environment is bug-free. If any issues are discovered, the router can switch back to the original environment, the blue one in this case, which has the previous version of the code.
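The switch and the rollback can be modelled with a tiny router object. The `BlueGreenRouter` class below is a hypothetical illustration of the idea, not the API of any load balancer.

```python
class BlueGreenRouter:
    """Toy traffic router: exactly one of two environments is live."""

    def __init__(self, versions: dict, live: str = "blue"):
        self.versions = dict(versions)
        self.live = live

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy_to_idle(self, version: str) -> None:
        """Install a new version on the environment that receives no traffic."""
        self.versions[self.idle] = version

    def switch(self) -> None:
        """Send all traffic to the other environment (also used to roll back)."""
        self.live = self.idle

    def serve(self) -> str:
        return self.versions[self.live]


router = BlueGreenRouter({"blue": "v1", "green": "v1"})
router.deploy_to_idle("v2")  # test the new version on green while blue is live
router.switch()              # cut over: green is now live
print(router.live, router.serve())
```

Calling `switch()` a second time returns traffic to blue, which still holds the previous version.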

This strategy has the advantage of easy rollbacks. Because you have two separate but identical production environments, you can easily make the shift between the two environments, switching all traffic immediately to the original (for example, blue) environment if issues arise.

Teams can also seamlessly switch between previous and updated versions and cutover occurs rapidly with no downtime. However, for that reason this strategy may be very costly as it requires a well-built infrastructure to maintain two identical environments and facilitate the switch between them.

Canary deployment

Canary deployment is a strategy that significantly reduces the risk of releasing new software by allowing you to release the software gradually to a small subset of users. Traffic is directed to the new version using a load balancer or feature flag while the rest of your users will see the current version.

This set of users identifies bugs, broken features, and unintuitive features before your software gets wider exposure. These users could be early adopters, a demographically targeted segment or a random sample.

Therefore, you start testing on this subset of users then as you gain more confidence in your release, you widen your release and direct more users to it.

Canary deployments are less risky than blue-green deployments as you’re adopting a gradual approach to deployment instead of switching from one environment to the next.

While blue/green deployments are ideal for minimizing downtime and when you have the resources available to support two separate environments, canary deployments are better suited for testing a new feature in a production environment with minimal risk and are much more targeted.

In that sense, canary deployments are a great way to test in production on live users but on a smaller scale to avoid the risks of a big bang release. It also has the advantage of a fast rollback should anything go wrong by redirecting users back to the older version.
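The widen-or-roll-back decision can be sketched as a single function. The traffic steps and the 1% error threshold below are arbitrary assumptions chosen for the example; real thresholds depend on your own metrics.

```python
from typing import Sequence


def next_canary_percent(
    current_percent: int,
    error_rate: float,
    max_error_rate: float = 0.01,
    steps: Sequence[int] = (1, 5, 25, 50, 100),
) -> int:
    """Return the next traffic share for the canary, or 0 to roll back."""
    if error_rate > max_error_rate:
        return 0  # redirect everyone back to the older version
    for step in steps:
        if step > current_percent:
            return step  # widen the release to the next step
    return 100  # already fully rolled out


print(next_canary_percent(5, error_rate=0.002))
print(next_canary_percent(25, error_rate=0.08))
```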

However, deployment is done in increments, which is less risky but also requires monitoring for a considerable period of time which may delay the overall release.

A/B testing

A/B testing, also known as split testing, involves comparing two versions of a web page or application to see which performs better, where variations A and B are presented randomly to users. In other words, users are divided into two groups with each group receiving a different variation of the software application.

A statistical analysis of the results then determines which version, A or B, performed better, according to certain predefined indicators.
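To make the idea of a statistical analysis concrete, here is a sketch of one classical option, a two-proportion z-test on conversion rates. This is an assumption made for illustration: the article does not say which statistical method is used, and testing tools may rely on different approaches (Bayesian ones, for instance). The visitor and conversion counts are invented.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    conversions_a: int, visitors_a: int, conversions_b: int, visitors_b: int
) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    standard_error = math.sqrt(
        pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b)
    )
    z = (rate_b - rate_a) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented example data: A converts 100 of 1,000 visitors, B converts 150 of 1,000.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between A and B is unlikely to be due to chance alone, provided the predefined indicator and sample size were fixed before the test started.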

A/B testing enables teams to make data-driven decisions based on the performance of each variation and allows them to optimize the user experience to achieve better outcomes.

It also gives them more control over which users get access to the new feature while monitoring results in real-time so if results are not as expected, they can redirect visitors back to the original version.

However, A/B tests require a representative sample of your users and they also need to run for a significant period to gain statistically significant results. Moreover, determining the validity of the results without a knowledge database can be challenging as several factors may skew these results.

AB Tasty is an example of an A/B testing tool that allows you to quickly set up tests with low code implementation of front-end or UX changes on your web pages, gather insights via an ROI dashboard, and determine which route will increase your revenue.

Feature flags: The perfect companion for your deployment strategy

Whichever deployments you choose, feature flags can be easily implemented with each of these strategies to improve the speed and quality of the software delivery process while minimizing risk.

By decoupling deployment from release, feature flags enable teams to choose which set of users get access to which features to gradually roll out new features.

For example, feature flags can help you manage traffic in blue-green deployments as they can work in conjunction with a load balancer to manage which users see which application updates and feature subsets.

Instead of switching over entire applications to shift to the new environment all at once, you can cut over to the new application and then gradually turn individual features on and off on the live and idle systems until you’ve completely upgraded.

Feature flags also allow for control at the feature level. Instead of rolling back an entire release if one feature is broken, you can use feature flags to roll back and switch off only the faulty feature. The same applies for canary deployments, which operate on a larger scale. Feature flags can help prevent a full rollback of a deployment; if anything goes wrong, you only need to kill that one feature instead of the entire deployment.

Feature flags also offer great value when it comes to running experiments and feature testing by setting up A/B tests by allowing for highly granular user targeting and control over individual features.

Put simply, feature flags are a powerful tool to enable the progressive rollout and deployment of new features, run A/B testing and test in production.

What is the right deployment strategy?

Choosing the right deployment strategy is imperative to ensure efficient, safe and seamless delivery of features and updates of your application to end-users.

There are plenty of strategies to choose from, and while there is no right or wrong choice, each comes with its own advantages and disadvantages.

Whichever strategy you opt for will depend on several factors: the needs and objectives of the business, the complexity of your application and the type of targeting you’re looking to implement, i.e. whether you want to test a new feature on a select group of users to validate it before a wider release.

No matter your deployment strategy, AB Tasty is your partner for easier and low risk deployments with Feature Experimentation and Rollouts. Sign up for a free trial to explore how AB Tasty can help you improve your software delivery processes.

Article

Jul 11, 2023

6min read

Put Data in the Driver’s Seat | Marianne Stjernvall

Rowan Haddad

Marianne Stjernvall explains the evolution of CRO and the importance of centralizing your CRO Program to create a data-driven organization

Before becoming a leading specialist in CRO and A/B testing, Marianne Stjernvall was studying computer and systems science when a company reached out to her on LinkedIn for a position as a CRO specialist, which for her turned out to be the perfect mix of logic, programming, data, business and people.

Since then she founded the Queen of CRO where Marianne acts as an independent CRO consultant helping many organizations with experimentation, CRO, personalization and creating a data-driven culture for growth.

Previously, Marianne worked for companies such as iProspect, TUI and Coop Sverige where she spearheaded their CRO roadmap and developed a culture of experimentation. Additionally, she was awarded CRO Practitioner of the Year in 2020.

AB Tasty’s VP Marketing Marylin Montoya spoke with Marianne on the importance of contextualizing A/B test data to make better-informed decisions. Marianne also shared her own take on the much debated build vs buy topic and some wise advice from her years of experience with CRO and experimentation.

Here are some key takeaways from their conversation.

The importance of contextualizing data

For Marianne, CRO is becoming a big part of product development and delivery. She highlights the importance of this methodology when it comes to collecting data and acting on it in order to drive decisions.

Marianne stresses the importance of putting data into context and deriving insights from that data. This means companies need to be able to answer why they’re collecting certain information and what they plan to do with that information or data.

CRO is the key to unlocking many of those insights from the vast amount of data organizations have at hand and to pinpoint exactly what they need to optimize.

“What are you going to do with that information? You need context to provide insights and that, I think, is what CRO actually is about,” Marianne says.

This is what makes CRO so powerful as it enables organizations to take more valuable actions based on the insights derived from data.

When done right, testing within the spectrum of CRO can help move organizations off the path they were on before and onto a more innovative and transformative journey.

Centralize and standardize your experimentation processes first

When companies are just starting to create their experimentation or CRO program, Marianne recommends centralizing parts of it and running tests within a framework or process, so that teams don’t run their own tests on top of each other.

Otherwise, you could have different teams, such as marketing, product development and CRO teams, executing tests with no set process in place which could potentially lead to chaos.

“You will be taking decisions on A/B tests on basically three different data sets because you will be checking different kinds of data. So having an ownership of that to produce this framework and process, this is how the organization should work with these kinds of tests,” says Marianne.

With established frameworks and processes in place, organizations can set rules on how to carry out tests to get better value out of them and create ownership for the entire organization. The trick is to start small with one team and build in these processes over time onto the next team and so on.

This is especially important as Marianne argues that many organizations cannot increase their test velocity because they don’t have set processes to act on the data they get from their A/B tests. This includes how they’re calculating the tests, how they’re determining the winning or losing variation and what kind of goals or KPIs they’ve set up.

In other words, experimentation needs to be democratized as a starting point to allow an organization to naturally evolve around CRO.

Putting people at the center of your CRO program

When it comes to the build vs buy debate, Marianne argues that an A/B testing tool will not automatically solve everything.

“A great A/B testing tool can make you comfortable in that we have all the grounds covered with that. Now we can actually execute on this, but the rest is people and the organization. That’s the big work.”

In fact, companies tend to blame the tech side of things when their A/B testing is not going as planned. For Marianne, that has nothing to do with the tool; the issue primarily lies with people and processes.

As for the build vs buy debate, before deciding to build a tool in-house, companies should first ask themselves why they want to build their own tool beyond the belief that it’s more cost-efficient. These tools need time to get set up and running, so building your own may not be as cost-effective as many tend to think.

Marianne believes that companies should focus their energy and time on building processes and educating teams on these processes instead. In other words, it’s about people first and foremost; that’s where the real investment lies.

Nevertheless, before starting the journey of building their own tool, companies should evaluate themselves internally to understand how teams are utilizing and incorporating data obtained from tests into their feature releases.

If you’re just starting on your CRO journey, it’s largely about organizing your teams and involving them in these processes you’re building. The idea is to build engagement across all teams so that this journey happens in the organization as a whole. (An opinion that was shared by 1,000 Experiments Club podcast guest Ben Labay).

What else can you learn from our conversation with Marianne Stjernvall?

What to consider when choosing the right A/B testing tool
Her own learnings from experiments she’s run
How to get HIPPOs more involved during A/B testing
How “failed” tests and experiments can be a learning experience

About Marianne Stjernvall

Having worked with CRO and experiments for a decade and executed more than 500 A/B tests, Marianne Stjernvall has helped over 30 organizations grow their CRO programs. Today, Marianne has turned her passion for creating experimental organisations with a data-driven culture into a career as a CRO consultant at her own company, the Queen of CRO. She also regularly teaches at schools to pass on her CRO knowledge and show the full spectrum of what it takes to execute on CRO, A/B testing and experimentation.

About 1,000 Experiments Club

The 1,000 Experiments Club is an AB Tasty-produced podcast hosted by Marylin Montoya, VP of Marketing at AB Tasty. Join Marylin and the Marketing team as they sit down with the most knowledgeable experts in the world of experimentation to uncover their insights on what it takes to build and run successful experimentation programs.


Article

5min read

The Future of Digital Personalization: EmotionsAI by AB Tasty

John Hughes

At AB Tasty, we understand the importance of personalization in reaching your audience. We also know that up to 80% of consumers are more likely to complete an online purchase with brands that offer personalized customer experiences.

We have worked extensively to enable businesses to dynamically customize website content, product recommendations and promotional offers based on individual user preferences, behavior and demographics.

However, website experiences have not lived up to customer expectations when it comes to feeling understood by brands. If brands can’t bring relevance to their audience, at the very least they should reduce frustration and negative emotions.

The role of emotions

Emotions have a big impact on the entire purchasing journey. Brands not only need to understand customer preferences, but they also need to understand the emotional impact behind each decision. People are not always rational when it comes to making buying decisions – and not all people react in the same way.

Emotions play a huge role in how we make our decisions. In fact, once we start to think of the customer journey as a succession of micro-decisions (e.g. clicking on a CTA is one of them), we can easily understand how important it is to serve a personalized experience depending on emotional profiles.

What if you could understand your customers beyond the surface level? What if you could make concrete data-driven decisions based on the abstract notion of emotional needs, in order to connect with audiences like never before? What if you were equipped with more knowledge and data on your customers’ behaviors, and had the language to describe different shopper personalities?

How can you optimize according to the distinct desires of each person?

The next step in digital personalization: AB Tasty’s EmotionsAI

Hundreds of behavioral patterns uncover your buyers’ emotional needs and train our EmotionsAI algorithm.

At AB Tasty, we love to push the boundaries of digital experiences which is why we are excited to launch our most recent acquisition. With EmotionsAI, you can experiment with unique, personalized messages for each visitor type, delve into data to understand their needs, conduct tests to identify effective messaging and construct personalized journeys targeting specific emotional needs.

Formerly known under the name Dotaki, this new technology is based on years of psychographic modeling, customer journey mapping and AI technology combined with real-time interactions on your site and device usage.

Brands are already using EmotionsAI and AB Tasty to:

Understand the emotional needs of audiences to bolster their Experience Optimization roadmap with effective messages, designs and CTAs that activate their visitors.
Have more winning variations by digging deeper into what works and for which type of personality with analytics.
Personalize campaigns by targeting based on emotional needs in the AB Tasty Audience Builder.

Customer Segmentation By Personality Type

EmotionsAI can help you understand what type of visitor is on your site. For instance, if they were classed as a “Competitive” visitor, they would react strongly to either social proof or labels that indicated previous sales or limited stock on products. If they were considered a “Safety” visitor – they would be looking for a clear, secure payment system, with easy reassurance along the way. Pragmatic visitors, who are looking for “immediacy” want the quickest route to order completion, with as few blocking points as possible.

Results

Once you are able to classify visitors with EmotionsAI, you can then start using winning variations to address their specific needs.

You can instantly identify when a variation meets the emotional need of a portion of the audience. The impact on the test success rate is impressive: with EmotionsAI, it is possible to detect a significant impact on sales in 3 times more A/B tests. This opens the door to easily implementing personalizations that target visitors on the most relevant criterion: their emotional needs.

In addition, the emotional segments make it possible to identify which stages of the online journey do not respond well enough to the emotional needs of the audience and generate a shortfall. This gives you ideas for future tests, for example, adding a reassurance strip to a basket stage. A/B tests based on these emotional insights have a success rate twice as high as the average.

We have seen a massive increase in revenue from previous customers. More than 60% of test variations show a successful business impact compared to 10% without EmotionsAI. Additionally, personalization campaigns using EmotionsAI have driven revenue increases ranging from 5% to 10%.

Stay ahead of the curve with the next step in experience optimization by mastering emotional personalization with EmotionsAI. Let your audience be seen by incorporating learning algorithms to map customer behaviors for predictable buying profiles.

EmotionsAI is an AI-Powered Segmentation Tool by AB Tasty, allowing for better personalization and higher conversion rates.

Want to find out more? Get in touch with us today!


Article

Jun 29, 2023

18min read

The Many Uses of Feature Flags to Control Your Releases

Rowan Haddad

The use of feature flags has evolved and expanded as teams now recognize the value they can bring to their releases.

First, let’s start with a simple definition of feature flags. A feature flag is a software development technique that lets you turn functionalities on and off to test new features in production without changing code.

This means that feature flags significantly accelerate software development processes while giving teams greater control and autonomy over releases.

Keep reading: Our complete guide to feature flagging

This is a technique that can be employed by all teams in an organization across a wide range of use cases, from the most simple to more advanced uses to improve their daily workflows.

In this article, we will explore these different uses to illustrate what feature flags can do across different contexts depending on your pain points and objectives.

Feature flags examples and use cases

Many of the use cases outlined below allow teams to take back control of releases and enable them to deliver new features quickly and safely. A feature may have a bug in production and you want to turn that feature off without delaying the release. Or you may have second thoughts about a feature and not be ready for all your users to see it, so you’d rather test it on a subset of users.

Feature flags also increase productivity and speed of teams. You’re no longer waiting to merge your code if other changes are incomplete; you just put it behind a flag until it’s ready. With this, you get more predictability to your releases. There’s no need to delay your release cycle for any last-minute bugs detected.

Therefore, we will see how the use cases outlined below bring these benefits to your team.

Prepare for launch
Hassle-free deployments: Release anytime by decoupling release from deployment
Experience rollouts and progressive delivery
Time your launch
Running experiments and A/B testing
Continuous integration and continuous delivery
Managing access: User targeting
Risk mitigation
Test in production
Feature flags and mobile app deployment: Bypass app store validation
Kill Switch: Feature rollback
Sunsetting features
Managing migrations
Feature flags as circuit breakers
Bottomline: Use feature flags often but proceed with caution

Prepare for launch

Hassle-free deployments: Release anytime by decoupling release from deployment

Feature flags allow you to deploy whenever you and your team see fit. You no longer need to delay your releases. Any changes to a feature that are not yet ready can be toggled off with a switch.

What feature flags do in this scenario is separate code deployment from release. This is done through a release toggle, which allows specific parts of a feature to be activated or deactivated so any unfinished features will remain invisible to users until they are ready to be released.

Why is the distinction between deployment and release significant? To answer this question, it is worth noting the difference between the two terms:

Deployment is the process of putting code in its final destination on a server or any other place in your infrastructure where your code will run.
Release is exposing your code to your end-users and so it is the moment when they get access to your new features.

This difference is why we talk about decoupling deployment from release because once you do that, you can push code anywhere, anytime, without impacting your users. Then, you can release gradually and selectively whenever you’re ready through progressive and controlled rollouts as we will see below.
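To make the idea concrete, here is a minimal sketch of a release toggle. The in-memory `FLAGS` dictionary, the `new_checkout` flag name and the `checkout_page` function are all hypothetical; a real setup would usually read flags from a configuration service rather than a hard-coded dictionary.

```python
# Hypothetical in-memory flag store, used only for illustration.
FLAGS = {"new_checkout": False}


def is_enabled(flag_name: str) -> bool:
    """Return the flag state; unknown flags default to off so unfinished code stays hidden."""
    return FLAGS.get(flag_name, False)


def checkout_page() -> str:
    # The new code path is already deployed, but users only see it
    # once the flag is switched on (the release).
    if is_enabled("new_checkout"):
        return "new checkout"
    return "legacy checkout"
```

Flipping `FLAGS["new_checkout"]` to `True` releases the feature without another deployment, and flipping it back hides it again.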

Experience rollouts and progressive delivery

With feature flags, you are in complete control. This means once you have a feature ready for release, you can control which subset of users will see this feature through phased rollout of releases.

When we talk about experience rollouts, we’re referring to the risk-free deployment of features that improve and optimize the customer experience.

This is usually achieved through progressive rollouts, which build on continuous delivery by using feature flags to gradually introduce features to your users.

Rather than releasing to all your users, which is often risky, you may want to release to just 5% or 10% of your users. These users should represent your overall users. Meanwhile, the team observes how these users respond to the new feature before rolling out to everyone else.

One progressive rollout technique is known as canary deployment. This is where you test how good your feature is on a small group of users and if there’s any issue, you can immediately fix it before it’s exposed to a larger number of users. This sort of gradual rollout helps mitigate the risk of a so-called big bang release. It also helps ease the pressure on your server in case it cannot handle the load.
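A common way to implement this kind of percentage rollout is to hash each user into a stable bucket, so the same user always gets the same answer and raising the percentage only ever adds users. The sketch below assumes string user IDs; the function names and the `new_search` flag used in the usage note are made up for illustration.

```python
import hashlib


def bucket(user_id: str, flag_name: str) -> int:
    """Map a user to a stable bucket in the range 0-99 for a given flag."""
    # Including the flag name means different flags get different random splits.
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def in_rollout(user_id: str, flag_name: str, percentage: int) -> bool:
    """True if the user falls inside the first `percentage` buckets."""
    return bucket(user_id, flag_name) < percentage
```

Starting with `in_rollout(user_id, "new_search", 5)` and later moving to 10 or 50 keeps the first 5% of users in the rollout while gradually adding more.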

You may also carry out what is known as ‘ring deployments.’ This technique is used to limit the impact on end-users by gradually rolling out new features to different groups. These groups are represented by an expanding series of rings, hence the name, where you start with a small group and keep expanding this group to eventually encompass all users.

In a ring deployment, you choose a group of users based on similar attributes and then make the features available to this group.

Rings and feature flags work together where feature flags can help you hide certain parts of your feature if they’re not ready in any of the deployment rings.
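As a rough sketch, rings can be modelled as an ordered list where a feature is visible to every ring up to the one it has been released to. The ring names below are assumptions chosen for the example, not a standard.

```python
from typing import Optional

# Hypothetical rings, ordered from the innermost (smallest) group outwards.
RINGS = ["internal", "early_adopters", "everyone"]


def feature_visible(user_ring: str, released_up_to: Optional[str]) -> bool:
    """A user sees the feature if their ring is at or inside the released ring."""
    if released_up_to is None:
        # Not released to any ring yet.
        return False
    # RINGS.index raises ValueError for an unknown ring name.
    return RINGS.index(user_ring) <= RINGS.index(released_up_to)
```

Expanding the release is then a matter of moving `released_up_to` from `"internal"` to `"early_adopters"` and finally to `"everyone"`.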

The advantage of such controlled rollouts is the feedback you gather from users, especially for releases you’re less than confident about. With that feedback, you can improve your product accordingly.

Time your launch

We know at this point that feature flags give you the control to release at any time you deem suitable. Feature flags, then, are important because you always decide the ‘when.’ As such, with feature flags, you can aim for a timed launch where you push your feature for people in your trusted circle, such as your QA team, to test in production.

Afterwards, when it’s time to launch, you simply turn on the feature for everyone else without any fuss, with the added advantage that you feel much more confident about the wider release.

This significantly reduces stress among your team because you’ve tested the feature before the official launch and you’ve made sure it’s working as it should before going ahead with a wider release.

Running experiments and A/B testing

Feature flags are great for A/B tests, where you can assign a subset of users to a feature variation and see which performs better.

This is a great use for product and marketing teams who can easily test new ideas and eliminate them if they don’t fulfil the hypothesis defined upon creation of the test.

For example, feature flags would allow your product and marketing teams to send 50% of users to the new variation of a feature and the other 50% to the original one, then compare which variation performs better against the goals and KPIs set.

Using feature flags to run A/B tests only works well when a feature receives enough traffic to generate statistically reliable results. So, as a cautionary note, keep in mind that not everything can be an A/B test when it comes to feature flags.
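A 50/50 split like this can be sketched in a few lines. The example below assigns each user to a variation deterministically, so a returning visitor always sees the same version, and includes a helper for the simplest possible KPI, a conversion rate. The function names and the experiment name are hypothetical, and a real test would also need a statistical analysis of the results.

```python
import hashlib


def assign_variation(user_id: str, experiment: str) -> str:
    """Deterministically assign a user to 'original' or 'variation' (roughly 50/50)."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # The first byte of the hash is even or odd with equal probability.
    return "variation" if digest[0] % 2 else "original"


def conversion_rate(conversions: int, visitors: int) -> float:
    """Share of visitors who converted; 0.0 when there is no traffic yet."""
    return conversions / visitors if visitors else 0.0
```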

In this sense, you can look at feature flags like a light switch. You decide when you want to turn on the feature, when to turn it off and which users have access to it. This allows you to continuously test in production until you’re satisfied with the end-result which you can then roll out to the rest of your users.

Continuous integration and continuous delivery

Feature flags mean developers no longer need multiple long-lived feature branches, which more often than not lead to merge conflicts.

Let’s imagine you are all set to release but one developer’s changes have not yet been integrated into the main feature branch. Does this mean you need to wait, even though time is precious when releasing to impatient customers?

With feature flags, developers can integrate their changes, or commit code in small increments, much more frequently, perhaps even several times a day. Through trunk-based development, a key enabler of continuous integration, developers can merge their changes directly into the master trunk, helping them move faster and ensuring that code is always in a ready-to-be-released state.

Consequently, we can deduce that feature flags also facilitate the process of continuous delivery.

Feature flags are essential to maintain the momentum of CI/CD processes because, as mentioned, they decouple deployment from release. Unfinished features can be merged and hidden behind a flag so users don’t see them, while other changes can still be delivered without waiting on those unfinished features.

In other words, feature flags allow you to keep deploying your code continuously: even if a feature is incomplete, users won’t be able to access it because it is hidden behind a flag.

Managing access: User targeting

You don’t just choose the when, you also choose to whom.

Feature flags, as we’ve seen, give you a lot of control over the release process by putting the power of when to release in your hands.

It’s worth mentioning yet another form of power feature flags can give you, which is the ability to choose which users can access the feature. When you are testing in production, having the option to choose who you want to test on is extremely valuable depending on the kind of feedback you’re seeking.

Giving early access

We’ve seen in canary deployment that sometimes the sample you pick can be completely random. Other times, however, you might decide to carefully handpick a select group of users to give them early access to your new feature.

Why is this important? Because these are the users that are considered to be ‘early adopters.’ They are users you trust and whose feedback is top priority and who are most interested in this particular feature. These users are also the most forgiving should anything go wrong with the release.

With feature flags, you can release the feature to these early adopters who are more than willing to provide the kind of feedback you need to improve your product. This technique works well if you have a very risky release that you’re hesitant to release to a wider audience.

Power to the users: beta testing

Beta testing is another side to early access where in this scenario users willingly opt-in to test out your new features before they are released to the rest of your users.

As a result, the customers who opt-in get to see and test the feature by turning it on in their accounts and should they wish to back out they can easily disable the feature, which makes these users more inclined to opt-in in the first place as it makes them feel more in control.

This is an important use-case because it shows your customers that you’re really listening to their feedback by asking them to test your release.

The users who opt-in are those who you’re targeting with this feature so how they react to the feature will be of extreme use to you. Hence, you get to test out your new feature and you deliver value to your customers by responding to their feedback; it’s a win-win situation!

Dogfooding

This term refers to eating your own dog food, or in this case refers to an organization using its own product or service. You can, therefore, look at it as a way to test in production on internal teams.

It’s a form of alpha testing that you can run on internal users (within the organization) to make sure that the software meets all expectations and is working as it should. Thus, it represents an opportunity to evaluate the performance and functionality of a product release as well as obtain feedback from technical users.

This is a great way of testing to obtain meaningful feedback especially when you’re introducing new features or major changes that you’re not fully confident about.

This way, you are taking fewer risks because only people within your organization can see the releases, as opposed to your actual, external users who may be more unforgiving if things take a bad turn during a release.

No trespassing allowed: blocking users

Just as you can pick users who you want to access your feature, you can also block users from seeing it. For example, you can block certain users from a particular country or organization.

What feature flags would allow you to do is hide some features from users who might not give you the right sort of feedback while giving access to the relevant target consumers who would be most impacted by the new feature. You can also target certain features for a certain type of user to provide a more personalized experience for them.
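Targeting of this kind usually comes down to evaluating a rule against a few user attributes. Here is a minimal sketch in which block rules take precedence over allow rules; the `TargetingRule` class and the `country` and `segment` attributes are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TargetingRule:
    """Hypothetical rule: who may see a feature and who is explicitly blocked."""
    allowed_segments: set = field(default_factory=set)
    blocked_countries: set = field(default_factory=set)

    def allows(self, user: dict) -> bool:
        # Block rules win over allow rules.
        if user.get("country") in self.blocked_countries:
            return False
        return user.get("segment") in self.allowed_segments
```

For example, `TargetingRule(allowed_segments={"beta"}, blocked_countries={"FR"})` would show the feature to beta users everywhere except France.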

Managing entitlements

With feature flags, you can manage which groups of users get access to different features. This is especially common in SaaS companies that offer various membership plans and so with entitlements, you can dictate which features each plan can access. This way, you would be offering different experiences to your users.

Let’s take the example of Spotify. Spotify offers free and paid plans. With the free membership, you can stream music but with advertisements while with the premium membership, you can stream unlimited music with no ads. You also get unlimited skips and you can download music to listen to offline. There are also different levels of premium to choose from including student and family plans. Consequently, with each plan, you are entitled to different content and features.

With feature flags, you can wrap a flag around a feature and release it to a particular customer depending on their subscription plan. These types of flags are usually referred to as permission toggles. They also allow you to move features easily between plans, for example between the paid and free versions.

Managing entitlements is considered to be an advanced use case as it requires careful coordination across teams and involves working with multiple flags to control permissions for the features. The person who manages entitlements is usually on the product team, so careful planning is required, along with monitoring of each change and who performed it.
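A permission toggle can be as simple as a lookup from plan to the set of features it unlocks. The plan and feature names below are hypothetical, loosely modelled on the streaming example above, and are not taken from any real product’s configuration.

```python
# Hypothetical mapping of subscription plans to the features they unlock.
PLAN_FEATURES = {
    "free": {"streaming"},
    "premium": {"streaming", "ad_free", "unlimited_skips", "offline_downloads"},
}


def is_entitled(plan: str, feature: str) -> bool:
    """True if the given plan includes the feature; unknown plans get nothing."""
    return feature in PLAN_FEATURES.get(plan, set())
```

Moving a feature to another plan is then a data change rather than a code change, for instance `PLAN_FEATURES["free"].add("unlimited_skips")`.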

There should also be a seamless process in place to move users from one plan to another. Thus, this use case requires vigilant implementation.

Product demos and free trials

On a similar note, product and sales teams may be looking for a way to offer prospective customers a free trial or a specialized demo of a feature.

Feature flags are a great way to give prospects temporary access to features from the higher pricing plans so they can decide if an upgrade is worth it. You simply toggle these features on with a flag for a live demo, then turn them off once the demo is complete.

Risk mitigation

Test in production

Through the use of feature flags, teams can confidently ship their releases by testing code directly in production to validate new features on a subset of users.

Unlike testing in a staging environment, when you test in a production environment, you can collect real-world feedback from live users to ensure that teams are building a viable pipeline of products.

Testing in production also allows you to uncover any bugs that you may have missed in the development stages and discern whether your new feature can handle a high volume of real-world usage and then optimize your features accordingly.

Feature flags and mobile app deployment: Bypass app store validation

This is when we use A/B testing to test different experiences within mobile apps. Imagine you’ve just released a brand new app or introduced a new shiny update to your app.

How can you make sure your app or this update is running smoothly or that you haven’t unintentionally introduced an update full of bugs that crashes on your users? Anything that goes wrong will involve a lengthy review process that will set back your entire release as you attempt to locate and resolve the issue.

Without feature flags, you need to wait for app store approval, which could take some time, and the changes are released to all users instead of smaller segments.

Instead, with remote config implemented through feature flags, any changes can be made instantly and remotely and then released to a small subset of users to get feedback before releasing it to everyone else. Therefore, you can upgrade your app continuously in real-time based on feedback from your users without waiting on app store approval.
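On the client side, remote config typically means shipping safe defaults inside the app and overriding them with whatever the server sends. The sketch below assumes the server returns a JSON object; the keys `new_onboarding` and `banner_text` are invented for the example, and the network call itself is left out.

```python
import json
from typing import Optional

# Defaults bundled with the app, used when no remote config is available.
DEFAULTS = {"new_onboarding": False, "banner_text": "Welcome"}


def load_config(remote_payload: Optional[str]) -> dict:
    """Merge a remote JSON payload over the bundled defaults, ignoring unknown keys."""
    config = dict(DEFAULTS)
    if remote_payload is None:
        return config
    try:
        remote = json.loads(remote_payload)
    except json.JSONDecodeError:
        # A malformed payload must never break the app: fall back to defaults.
        return config
    if isinstance(remote, dict):
        config.update({key: value for key, value in remote.items() if key in DEFAULTS})
    return config
```

Falling back to the bundled defaults keeps the app usable offline or when the config service is unreachable.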

It’s also a good way to personalize experiences for different types of users, depending on the demographics you set forth, rather than creating a unified experience for all users.

As a result, with feature flags, you can roll out different versions of your mobile app to monitor their impact by releasing different features to different groups of users. Afterwards, you can decide on what features will be incorporated in the final release of your app.

Using feature flags to test out your mobile app is an excellent way to generate buzz around your release by giving exclusive access to a select number of users.

Kill Switch: Feature rollback

Using feature flags will allow you to disable a feature if it’s not working as it should. This is done by using a kill switch. Whenever anything goes wrong in production, you can turn the feature off immediately while your team fixes the issue. This saves you from having to roll back the entire release, so other changes can still be deployed and released on schedule.

With a kill switch, you can switch off a specific, troublesome feature so you can decrease the number of users who can see it, including turning it off for all users if necessary until the issue is analyzed and resolved by your team. This way, you won’t have to go through the entire code review process to locate the issue.

Kill switches therefore give you even more control over the release process. This not only empowers your team of developers but also marketing and production teams with no software development experience who can now easily test in production and kill a feature without having to rely on engineering support.

AB Tasty offers an automatic rollback option that enables you to stop the deployment of a feature and to revert all the changes that have been made in order to ensure that your feature isn’t breaking your customer experience. This is done by defining the business KPI you want to track that would indicate the performance of your feature. If it reaches a certain value, the rollback is automatically triggered.
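The general idea behind a threshold-based rollback can be sketched as follows. This is not AB Tasty’s API: the `GuardedFeature` class is hypothetical, and it uses an error rate as a stand-in for whatever business KPI you choose to track.

```python
class GuardedFeature:
    """Hypothetical flag that switches itself off when a tracked KPI crosses a threshold."""

    def __init__(self, max_error_rate: float, min_requests: int = 50):
        self.enabled = True
        self.max_error_rate = max_error_rate
        self.min_requests = min_requests  # avoid reacting to a handful of early requests
        self.requests = 0
        self.errors = 0

    def record(self, success: bool) -> None:
        """Record one outcome and trigger the rollback if the error rate is too high."""
        self.requests += 1
        if not success:
            self.errors += 1
        enough_data = self.requests >= self.min_requests
        if enough_data and self.errors / self.requests > self.max_error_rate:
            self.enabled = False  # automatic rollback

    def kill(self) -> None:
        """Manual kill switch."""
        self.enabled = False
```

The `min_requests` guard is a design choice: it stops a single early failure from tripping the rollback before there is enough data to judge the feature.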

Sunsetting features

Feature flags can also enable the ‘sunsetting’ of features. For example, with time, you might see your usage of feature flags increasing and widening to encompass a number of features. However, this accumulation of features may eventually turn into a heavy debt.

This is why it is important to continuously keep track of which features you are using and which features have run their time and need to be retired from your system.

Sunsetting, then, enables you to kill off features that are no longer being used. Feature flags give you an idea of how much certain features are used, which helps you determine whether it’s time to kill them off, lest you end up with the dreaded technical debt.

Removing unused features and clearing up old flags is the best way to keep such hidden costs in check. You should therefore have a careful plan to remove flags once they have served their purpose. This will require an efficient feature flag management system to track down ‘stale’ flags.
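One simple way to spot stale flags is to record when each flag was last evaluated and list the ones that have been idle for too long. The sketch below assumes you already have that data as a dictionary of flag names to dates; the 90-day default is an arbitrary choice for the example.

```python
from datetime import date, timedelta


def stale_flags(last_evaluated: dict, today: date, max_idle_days: int = 90) -> list:
    """Return the names of flags not evaluated within the last `max_idle_days` days."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(name for name, seen in last_evaluated.items() if seen < cutoff)
```

The resulting list is a starting point for review, since a flag that is rarely evaluated may still guard something important, such as a kill switch.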

Managing migrations

Feature flags can be used to safely and effectively migrate to a new database as business requirements change and evolve. Before feature flags, organizations would typically perform a one-time migration and hope for the best, as rollbacks are usually a painful process.

Obviously, the biggest risk that comes with switching databases is loss of data. Therefore, developers need a way to test that the data will remain intact during the migration process.

Enter feature flags. They let you facilitate the migration and, should something go wrong, disable it by simply toggling the flag off.

A percentage rollout can then be implemented using feature flags to validate the new database, and any changes can be reversed by using the flag as a kill switch.
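A percentage rollout is often implemented by hashing each user into a stable bucket, so the same user always lands on the same side of the split. The sketch below assumes that approach; the `new_db_migration` flag name and the `bucket`, `use_new_database` and `read_customer` functions are hypothetical, and the strings stand in for real database calls.

```python
import hashlib


def bucket(user_id, flag_name):
    """Map a user to a stable bucket in the range 0..99 for a given flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def use_new_database(user_id, rollout_percent):
    # rollout_percent=0 acts as the kill switch; 100 completes the migration.
    return bucket(user_id, "new_db_migration") < rollout_percent


def read_customer(user_id, rollout_percent):
    if use_new_database(user_id, rollout_percent):
        return "read from new database"
    return "read from old database"


users = [f"user-{i}" for i in range(1000)]
at_10 = {u for u in users if use_new_database(u, 10)}
at_50 = {u for u in users if use_new_database(u, 50)}
```

Because buckets are stable, raising the percentage only adds users to the new database and never moves anyone back, while setting it to 0 returns everyone to the old one.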

Feature flags as circuit breakers

Feature flags are particularly useful when your system is experiencing heavy load during times of exceptionally high traffic.

In particular, the on/off switch of feature flags (operational toggles) can be used as a circuit breaker to disable non-critical features that add stress to the system. This helps your website run better and avoids the backlash from any downtime caused by a heavy load.

For example, many e-commerce websites experience heavy traffic during Black Friday. To avoid a potential system outage or failure, development teams can use feature flags to keep critical features on and turn off the rest, shedding some of the load from the system until the period of heavy traffic passes.
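The load-shedding behavior described above can be sketched as a function that switches off every non-critical flag once load passes a limit. The `shed_load` function, the feature names and the load figures are illustrative assumptions, not measurements or a real API.

```python
def shed_load(flags, critical, current_load, max_load):
    """Return a copy of flags with non-critical features off when load exceeds max_load."""
    if current_load <= max_load:
        return dict(flags)
    # Under heavy load, a feature stays on only if it was on and is marked critical.
    return {name: enabled and name in critical for name, enabled in flags.items()}


flags = {"checkout": True, "search": True, "recommendations": True, "live_chat": True}
critical = {"checkout", "search"}

normal = shed_load(flags, critical, current_load=400, max_load=1000)
peak = shed_load(flags, critical, current_load=2500, max_load=1000)
```

Returning a new dictionary instead of mutating the original makes it easy to restore the previous flag state once traffic drops again.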

Bottom line: Use feature flags often but proceed with caution

As we’ve seen so far, many of the use cases can be easily implemented. Others, however, require detailed, complex and context-specific decisions, which call for a more advanced feature flagging system.

Regardless of what you decide to use feature flags for, one thing is clear: feature flags put you in the driver’s seat when it comes to releases. You are in complete control of when and to whom you release. They also allow you to experiment to your heart’s content without the risks, especially when a release doesn’t go as expected.

Working with feature flags also increases productivity among teams. As the use cases above show, it’s not only developers who have control over and access to the release process: product and operations teams can also release and roll back as needed.

You can use feature flags for many things across different contexts. Some flags may remain in place for a long time, while others need to be removed as soon as possible so as not to accumulate technical debt.

Thus, the general advice would be to use feature flags often but keep in mind that proactive flag management and implementation will be needed to maximize the benefits while minimizing the costs.

Don’t just take our word for it. Start your feature flag journey and see for yourself what feature flags can do for you by signing up for a free trial at AB Tasty.

Article

Jun 23, 2023

1min read

Client-Side vs. Server-Side Experimentation [Infographic]

Rowan Haddad

The further you go in your digital transformation journey, the more necessary it may become to expand your experimentation capabilities. Our server-side tool was created to enrich our conversion rate optimisation platform so that you can carry out more sophisticated tests beyond the scope of UI or cosmetic changes.

But which solution is the right one for your team and use cases? This infographic serves as a way to answer this question by comparing client- vs server-side experimentation so you can come to a decision on which solution works for your individual case.

Article

1min read

Feature Flags: Essential List of Dos and Don’ts

Rowan Haddad

Undoubtedly, feature flags provide a lot of value for software development teams looking to mitigate risk and have more control over the release process. By wrapping code (or features) in a flag, teams can enable and disable a feature, thereby controlling who has (and doesn’t have) access to it. Feature flags also allow you to progressively roll out new features and test in production to obtain feedback, so you can optimize accordingly before a general release.

However, feature flags do come with some degree of risk and so certain best practices apply in order to get their full benefits. Here is a checklist of dos and don’ts to follow to ensure your feature flagging usage is as smooth and as productive as possible.
