Article

7min read

Sample Ratio Mismatch: What Is It and How Does It Happen?

A/B testing can bring out a few types of experimental flaws.

Yes, you read that right – A/B testing is important for your business, but only if you have trustworthy results. To get reliable results, you must be on the lookout for errors that might occur while testing.

Sample ratio mismatch (SRM) is a term that is thrown around in the A/B testing world. It’s essential to understand its importance during experimentation.

In this article, we will break down what sample ratio mismatch means, when it is and isn’t a problem, why it happens, and how to detect it.

Sample ratio mismatch overview

Sample ratio mismatch is an experimental flaw in which the expected traffic allocation doesn’t match the observed number of visitors in each test variation.

In other words, an SRM is evidence that something went wrong.

Being aware of sample ratio mismatch is crucial in A/B testing.

Now that you have the basic idea, let’s break this concept down piece by piece.

What is a “sample”?

The “sample” portion of SRM refers to the traffic allocation.

Traffic allocation refers to how the traffic is split toward each test variation. Typically, the traffic will be split equally (50/50) during an A/B test. Half of the traffic will be shown the new variation and the other half will go toward the control version.

This is how an equal traffic allocation will look for a basic A/B test with only one variant:

[Image: equal traffic allocation in a basic A/B test]

If your test has two or even three variants, the traffic is still allocated equally so that each version receives the same amount of traffic. An equal traffic allocation in an A/B/C test, for example, is a 33/33/33 split.

For both A/B and A/B/C tests, traffic can also be split unevenly, such as 60/40, 30/70 or 20/30/50. Although this is possible, it is not recommended if you want accurate and trustworthy results from your experiment.
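To make the idea of traffic allocation concrete, here is a minimal sketch of how a testing tool might split traffic deterministically by hashing a user ID into a bucket. The function and weights are illustrative assumptions, not AB Tasty’s actual implementation:

```python
import hashlib

def assign_variant(user_id: str, weights: dict) -> str:
    """Deterministically map a user to a variant according to the target split."""
    # Hash the user ID into a number in [0, 1) so the same user always
    # lands in the same bucket, no matter how many times they come back.
    digest = hashlib.md5(user_id.encode()).hexdigest()
    bucket = int(digest, 16) / 16**32

    cumulative = 0.0
    for variant, weight in weights.items():
        cumulative += weight
        if bucket < cumulative:
            return variant
    return variant  # guard against floating-point rounding at the boundary

print(assign_variant("user-42", {"A": 0.5, "B": 0.5}))            # 50/50 A/B test
print(assign_variant("user-42", {"A": 1/3, "B": 1/3, "C": 1/3}))  # 33/33/33 A/B/C test
```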

Even if you follow this best-practice guideline, an equal traffic allocation will not eliminate the chance of an SRM. This type of mismatch can still occur and must be checked for no matter the circumstances of the test.

Define sample ratio mismatch (SRM)

Now that we have a clear picture of what the “sample” is, we can build a better understanding of what SRM means:

  • SRM happens when the sample ratio does not match the planned 50/50 (or 33/33/33) traffic allocation
  • SRM occurs when the observed traffic allocation to each variant does not match the allocation chosen for the test
  • The control version and the variation receive mismatched samples that don’t reflect the planned split

Whichever words you choose to describe SRM, we can now understand our original definition with more confidence:

“Sample ratio mismatch is an experimental flaw in which the expected traffic allocation doesn’t match the observed number of visitors in each test variation.”


Is SRM always a problem?

To put it simply, SRM occurs when one test version receives a noticeably different number of visitors than was originally expected.

Imagine that you have set up a classic A/B test: two variations with a 50/50 traffic allocation. You notice at one point that version A has received 10,000 visitors and version B has received 10,100 visitors.

Is this truly a problem? What exactly happened in this scenario?

The catch is that during an A/B test, the allocation scheme can never be followed exactly, because assignment has to be random. The small difference in traffic in the example above is something we would typically call a “non-problem.”

If you are seeing a similar traffic allocation on your A/B test in the final stages, there is no need to panic.

A randomly generated traffic split has no way of knowing in advance exactly how many visitors will stumble upon the A/B test during its time frame, and it assigns each visitor independently. Small deviations from the planned allocation toward the end of the test are therefore expected by chance, even though the vast majority of traffic is correctly allocated.

When is SRM a problem?

Some tests, however, do end up with an SRM because of how the experiment is set up.

When the SRM is a big problem, there will be a noticeable difference in traffic allocation.

If you see 1,000 visitors directed to one variant and only 200 directed to the other, that is an issue. In a case like this, spotting the SRM does not require any dedicated mathematical formula; it is evident enough on its own.

However, such an extreme difference in traffic allocation is rare. Most mismatches are subtler, which is why it’s essential to check the visitor counts for SRM before each test analysis.
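In practice, the standard way to check is a chi-square goodness-of-fit test comparing the observed counts with the planned split. Here is a minimal sketch using Python and scipy; the p-value threshold of 0.001 is a commonly used convention for SRM alerts, not a universal rule:

```python
from scipy.stats import chisquare

def srm_check(observed, expected_ratio, alpha=0.001):
    """Flag a possible SRM when the observed counts are very unlikely
    under the planned traffic allocation."""
    total = sum(observed)
    expected = [total * ratio for ratio in expected_ratio]
    stat, p_value = chisquare(f_obs=observed, f_exp=expected)
    print(f"chi-square = {stat:.1f}, p-value = {p_value:.2e}")
    return p_value < alpha

# The extreme example above: 1,000 vs. 200 visitors on a planned 50/50 split.
print(srm_check([1000, 200], [0.5, 0.5]))  # True: clear sample ratio mismatch
```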

Does SRM occur frequently?

Sample ratio mismatch can happen more often than we think. According to a study by Microsoft and Booking.com, about 6% of experiments suffer from this problem.

Furthermore, if the test includes a redirect to an entirely new page, SRM can be even more likely.

Since we heavily rely on tests and trust their conclusions to make strategic business decisions, it’s important that you are able to detect SRM as early as possible when it happens during your A/B test.

Can SRM still affect tests using Bayesian statistics?

The reality is that everyone needs to be on the lookout for SRM, no matter what type of statistical test they are running. This includes experiments using the Bayesian method.

No approach is exempt from a statistically significant mismatch between the observed and expected traffic of a test. No matter the test, if its underlying assumptions are not met, the results will be unreliable.

Sample ratio mismatch: why it happens

Sample ratio mismatch can have a variety of root causes. Here we will discuss three common examples that cause SRM.

One common example is when the redirection to one variant isn’t working properly for visitors with a poor connection.

Another classic example is when a direct link to one variant is shared on social media, which sends everyone who clicks the link straight to that variant. This bypasses random allocation, so the traffic is no longer properly distributed among the variants.

In a more complex case, it’s also possible that JavaScript code included in the test crashes one variant for certain visitor configurations. In this situation, some of the visitors sent to the crashing variant are never collected and indexed properly, which leads to SRM.

All of these examples introduce a selection bias: a non-random subset of visitors is excluded. These visitors arrived directly from a link shared on social media, have a poor connection, or landed on a crashing variant.

In any case, when these issues occur, the SRM is an indication that something went wrong and you cannot trust the numbers and the test conclusion.

Checking for SRM in your A/B tests

Something important to be aware of when doing an SRM check is that the priority metric needs to be “users” and not “visitors.” Users are the distinct people allocated to each variation, whereas the visitors metric counts the number of sessions each user makes.

It’s important to differentiate between users and visitors because results may be skewed if the same person comes back to their variation multiple times. An SRM detected on the “visitors” metric may not be reliable, but an SRM detected on the “users” metric is solid evidence of a problem.
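As a minimal sketch of that distinction, assuming a simple log of (user_id, variant) assignment events (hypothetical field names), count each user once per variant before feeding the totals into the chi-square check above:

```python
from collections import defaultdict

def user_counts(assignment_events):
    """Count distinct users per variant so repeat sessions aren't double-counted."""
    seen = defaultdict(set)
    for user_id, variant in assignment_events:
        seen[variant].add(user_id)
    return {variant: len(users) for variant, users in seen.items()}

events = [("u1", "A"), ("u1", "A"), ("u2", "B"), ("u3", "A")]  # u1 came back twice
print(user_counts(events))  # {'A': 2, 'B': 1}, ready for the SRM check
```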

SRM in A/B testing

Testing for sample ratio mismatch may seem a bit complicated or unnecessary at first glance. In reality, it’s quite the opposite.

Understanding what SRM is, why it happens, and how it can affect your results is crucial in A/B testing. Running an A/B test to help make key decisions is only helpful for your business if you have reliable data from those tests.

Want to get started on A/B testing for your website? AB Tasty is a great example of an A/B testing tool that allows you to quickly set up tests with low code implementation of front-end or UX changes on your web pages, gather insights via an ROI dashboard, and determine which route will increase your revenue.


Article

6min read

How AB Tasty Delivers High-Quality Risk-Free Releases with Feature Flags

With its feature flagging functionality, AB Tasty was able to safely and quickly launch new changes to end users without compromising quality, thanks to progressive delivery and continuous feedback loops.

In the world of SaaS, velocity and quality are of utmost importance. This is an industry that is constantly evolving and companies must work diligently to keep pace with consumers’ fast-changing needs and to maintain competitive advantage.

AB Tasty has seen high growth in users all around the world. Consequently, the company had to accelerate its development processes, which meant growing its development and feature teams to build more features and deliver them to market faster.

The challenges of CI/CD

However, with such high growth and scaling, the company was faced with many growing pains and bottlenecks that significantly slowed down its CI/CD processes. This increased the risk of issues piling up, which defeats the initial purpose of accelerating time-to-market.

Even with mature CI/CD processes, developer and product teams are not immune to pitfalls that impact speed of delivery and go-to-market.

With these challenges in mind, the team at AB Tasty set four main objectives:

  • Accelerate time-to-market.
  • Increase speed of delivery without sacrificing quality.
  • Allow teams to remain autonomous to avoid delays.
  • Reduce risk by avoiding big bang releases.

The team essentially needed a stress-free way to push code into production and an easy-to-use interface. Development teams could use it to release features as soon as they were ready, eliminating bottlenecks, while product teams could gain more control over the release process by running safe experiments in production to gather useful feedback.

This is when the team at AB Tasty turned to its own flagging feature.

Feature flags were a way for the team to safely test and deploy new changes to any users of their choice while keeping them turned off for everyone else.

The team at AB Tasty was able to do this by, first, defining a flag in the feature management interface whose value is controlled remotely by the tool’s API.

The team can then set targeting rules, that is, the specific conditions for the flag to be triggered, based on the user ID. Feature flags allow highly granular targeting using any user attribute the team has access to.

Then, in AB Tasty’s own codebase, teams can simply condition the activation of the feature that interests them, or its behavior, on the value of the flag, using a simple conditional branch.
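A minimal sketch of what such a conditional branch can look like; the flag configuration, helper function and flag name below are hypothetical stand-ins, not AB Tasty’s actual SDK:

```python
# Hypothetical in-memory stand-in for a remotely managed flag configuration.
FLAGS = {
    "new-vertical-navigation": {"enabled": True, "allowed_user_ids": {"u-123", "u-456"}},
}

def get_flag(flag_name, user_id, default=False):
    """Return whether the flag is on for this user; unknown flags fall back
    to `default` (off) so a misconfiguration fails safe."""
    config = FLAGS.get(flag_name)
    if config is None:
        return default
    return config["enabled"] and user_id in config["allowed_user_ids"]

def render_navigation(user_id):
    # The conditional branch: targeted users see the new vertical navigation,
    # everyone else keeps the current UI.
    if get_flag("new-vertical-navigation", user_id):
        return "new vertical navigation"
    return "legacy navigation"

print(render_navigation("u-123"))  # new vertical navigation
print(render_navigation("u-999"))  # legacy navigation
```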

At the time, the company was working on a key project to revamp a major part of the UI, namely the navigation system. The project included a new vertical navigation and new responsive grids for new personalization campaigns, with the goal of making the product easier for users to understand.

For a project of this scope, there were big changes tied to many dependencies, such as the database, and so AB Tasty needed a way to progressively deploy these new changes to obtain initial feedback and avoid a big bang release.

Progressively deliver features

With such a large project, the goal was to mitigate risk by avoiding deploying major changes to all users at once. With feature flags, teams are able to reduce the number of users who can access the new changes.

In particular, the ON/OFF deployment logic of feature toggles in the feature management interface works like a switch: teams can progressively roll out features to users matching pre-set targeting rules while keeping them turned off for everyone else.
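One common way to implement this kind of progressive rollout is to pair the ON/OFF switch with a rollout percentage and bucket users deterministically, so the same user always gets the same answer. A minimal sketch, with hypothetical parameters:

```python
import hashlib

def in_rollout(flag_name, user_id, enabled, rollout_percent):
    """Progressive rollout check: the flag acts as a master ON/OFF switch and
    only the first `rollout_percent` of the user population sees the feature."""
    if not enabled:
        # Kill switch: toggling the flag off hides the feature for everyone.
        return False
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 100) per user and flag
    return bucket < rollout_percent

# Start with 10% of users, then raise the percentage as feedback comes in.
print(in_rollout("new-vertical-navigation", "u-123", enabled=True, rollout_percent=10))
```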

Easily set up and manage beta and early adopter lists

After testing internally, the product team was looking for a way to easily manage its early adopter list before releasing to the rest of its users. This would enable the team to receive quicker feedback from the most relevant (and more forgiving) users.

With AB Tasty’s flagging functionality, teams can simply add these early adopters’ account IDs to the flag’s targeting, giving them immediate and exclusive access to the new feature before anyone else.

Release without stress by ensuring that developers are ready to tackle any issues

Since most of the development team was based in France, the new changes were initially rolled out to that region so developers could make sure everything worked and quickly fix any bugs before deploying to other regions (and time zones).

Should anything go wrong, teams can easily roll back the release with a kill switch by immediately toggling a flag off within the feature flagging platform interface so that the feature is no longer visible.

Enable continuous feedback loops

Teams can now test in production on end users, optimize features, and iterate faster based on real production data. As a result, they can launch the end product to all users with the reassurance that they have identified and fixed any issues.

This also empowers teams to become more innovative, as they now have a safe way to test and receive feedback on their ideas, and are no longer limited in their scope of work.

Accelerate go-to-market

Furthermore, the ON/OFF deployment logic allows teams to release at their own pace. This accelerates time-to-market, as developers no longer need to wait for all changes to be ready before releasing their own, resulting in fewer delays and fewer disgruntled customers.

This speed does not come at the expense of quality: with continuous feedback loops, teams can iterate on releases, ensuring that only high-quality products reach users.

Teams can send features to production whenever they’re ready, make them visible to some users and officially launch to market once all deliverables are ready and good to go!