Article

11min read

CRO Metrics: Navigating Pitfalls and Counterintuitive KPIs

Metrics play an essential role in measuring performance and influencing decision-making.

However, relying on certain metrics alone can lead you to misguided conclusions and poor strategic choices. Potentially misleading metrics are often referred to as “pitfall metrics” in the world of Conversion Rate Optimization.

Pitfall metrics are data indicators that can give you a distorted version of reality or an incomplete view of your performance if analyzed in isolation. Pitfall metrics can even cause you to backtrack in your performance if you’re not careful about how you evaluate these metrics.

Metrics are typically split into two categories:

  • Session metrics: Any metrics that are measured on a per-session basis rather than a per-visitor basis
  • Count metrics: Metrics that count events (for instance, the number of pages viewed)

Some metrics fall into both categories. That is the worst case, for a few main reasons: no real statistical model fits a metric that is both session-based and a count, there is no direct or simple link to business objectives, and these metrics may not be meant to be optimized in the usual way.

While metrics are very valuable for business decisions, it’s crucial to use them wisely and be mindful of potential pitfalls in your data collection and analysis. In this article, we will explain why some metrics are unwise to use in practice in CRO.

Session-based metrics vs visitors

One problem with session-based metrics is that “power users” (AKA users returning for multiple sessions during the experiment) will bias the results.

Let’s remember that during experimentation, the traffic split between the variations is a random process.

We tend to think of a traffic split as producing random but evenly balanced groups. For big groups of users, this is typically true. However, for a small group, it’s very unlikely that you will get an even split in terms of visitor behaviors, intentions and types.

Let’s say that you have 12 power users who are randomly divided between two variations, and that these power users have 10x more sessions than the average user. It’s quite likely that you will end up with a 4 and 8 split, a 2 and 10 split, or another uneven split; a perfectly even 6 and 6 split occurs less than a quarter of the time. You will then end up in one of two situations:

  • Situation 1: A handful of users make you believe you have a winning variation (when no real winner exists)
  • Situation 2: The winning variation is masked because it received too few of these power users
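To put numbers on the 12 power users example, here is a minimal sketch that assumes each power user is assigned independently with a 50/50 chance (the function name is purely illustrative). Under that assumption, an exact 6 and 6 split happens about 23% of the time, while a split of 4 and 8 or worse happens about 39% of the time.

```python
from math import comb


def split_probability(n_users: int, k_in_a: int) -> float:
    """Probability that exactly k_in_a of n_users land in variation A (fair 50/50 split)."""
    return comb(n_users, k_in_a) / 2 ** n_users


n = 12

# Chance of a perfectly even 6 and 6 split.
p_even = split_probability(n, 6)

# Chance of a split at least as uneven as 4 and 8, in either direction.
p_lopsided = sum(
    split_probability(n, k) for k in range(n + 1) if abs(k - n / 2) >= 2
)

print(f"exactly 6 and 6:      {p_even:.3f}")
print(f"4 and 8 or more uneven: {p_lopsided:.3f}")
```

Since each of these users carries ten times the sessions of an average user, even a 4 and 8 split shifts a noticeable share of sessions to one variation.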

Another problem with session-based metrics is that a session-based approach blurs the meaning of important metrics like transaction rates. The recurring problem here is that not all visitors display the same type of behavior. If average buyers need 3 sessions to make a purchase while some need 10, this is a difference in user behavior and does not have anything to do with your variation. If your slow buyers are not evenly split between the variations, then you will see a discrepancy in the transaction rate that doesn’t actually exist.

Moreover, the metric itself loses part of its intuitive meaning. If your real conversion rate is around 3% when counted by unique visitors, you will likely see only about 1% when it is counted by session instead, because every extra session a visitor makes is added to the denominator.

This is not only disappointing but very confusing.

Imagine a variation urging visitors to buy sooner by using “stress marketing” techniques. Let’s say this leads to a purchase in one session instead of three. You will see a huge gain (3x) in conversions per session. BUT this “gain” is not an actual gain: the number of conversions is unchanged, so it has no effect on the revenue earned. It’s also good to keep in mind that visitors under pressure may not feel very happy or comfortable with such a quick purchase and may not return.
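The arithmetic behind this can be sketched as follows. The numbers are made up, and the sketch assumes, for simplicity, that every visitor in an arm makes the same number of sessions; the `Arm` class is only an illustration, not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    visitors: int
    sessions_per_visitor: int  # simplifying assumption: identical for all visitors
    buyers: int                # unique visitors who purchased

    @property
    def rate_per_visitor(self) -> float:
        return self.buyers / self.visitors

    @property
    def rate_per_session(self) -> float:
        return self.buyers / (self.visitors * self.sessions_per_visitor)


# Hypothetical data: same visitors and same buyers, only the session count differs.
original = Arm(visitors=1000, sessions_per_visitor=3, buyers=30)
pressured = Arm(visitors=1000, sessions_per_visitor=1, buyers=30)

session_uplift = pressured.rate_per_session / original.rate_per_session
visitor_uplift = pressured.rate_per_visitor / original.rate_per_visitor

print(f"per-session rates: {original.rate_per_session:.1%} vs {pressured.rate_per_session:.1%}")
print(f"per-visitor rates: {original.rate_per_visitor:.1%} vs {pressured.rate_per_visitor:.1%}")
print(f"uplift per session: x{session_uplift:.1f}, per visitor: x{visitor_uplift:.1f}")
```

The per-session rate triples while the per-visitor rate, and the number of purchases, does not move at all.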

Because they can be very misleading, it’s best practice to avoid session-based metrics unless you have no other choice.

Understanding count metrics

We will come back to our comparison of these two types of metrics. But for now, let’s get on the same page about “count metrics.” To understand why count metrics are harder to assess, you need to have more context on how to measure accuracy and where exactly the measure comes from.

To model the accuracy of a rate measurement, we use a beta distribution. In the graph below, we see the measure of two conversion ratios – one blue and one orange. The X-axis is the rate and the Y-axis is the likelihood. When trying to measure the probability that the two rates are different, we implicitly explore the part where the two curves overlap.

In this case, the two curves have very little overlap. Therefore, the probability that these two rates are actually different is quite high.

The narrower the distributions are, the easier it is to see that the rates are different.
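One way to turn the overlap idea into a number is to draw samples from each beta distribution and count how often one rate exceeds the other. The sketch below is an illustration under stated assumptions, not a description of any platform’s method: it assumes a uniform Beta(1, 1) prior, uses made-up visitor and conversion counts, and the function name is hypothetical.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=50_000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) from two beta distributions."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # The "+ 1" terms come from the assumed uniform Beta(1, 1) prior.
        rate_a = rng.betavariate(conv_a + 1, n_a - conv_a + 1)
        rate_b = rng.betavariate(conv_b + 1, n_b - conv_b + 1)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical data: little overlap (10% vs 15% on 1,000 visitors each).
p_clear = prob_b_beats_a(100, 1000, 150, 1000)

# Hypothetical data: heavy overlap (10% vs 10.5% on 1,000 visitors each).
p_close = prob_b_beats_a(100, 1000, 105, 1000)

print(f"little overlap: P(B > A) = {p_clear:.3f}")
print(f"heavy overlap:  P(B > A) = {p_close:.3f}")
```

When the curves barely overlap, the estimate is close to 1; when they overlap heavily, it drifts toward 0.5, meaning the data cannot tell the two rates apart.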


The fundamental difference between conversion and count distributions

Conversion metrics are bounded in [0, 1] as a rate or [0%, 100%] as a percentage. Count metrics, however, have an open range: the counts lie in [0, +∞).

The following figure shows a gamma distribution (in orange) that may be used with this kind of data, along with a beta distribution (in blue).

These two distributions are based on the same data: 10 visitors and 5 successes. This is a 0.5 success rate (or 50%) when considering unique conversions. In the context of multiple conversions, it’s a process with an average rate of 0.5 conversions per visitor.

Notice that the orange curve (for the count metric) is non-zero above x = 1. This clearly shows that it expects that sometimes there will be more than 1 conversion per visitor.

We will see that comparisons between this kind of metric depend on whether we consider it as a count metric or as a rate. There are two options:

  • Either we consider that the process is a conversion process, using a beta distribution (in blue), which is naturally bounded in [0, 1].
  • Or we consider that the process is a count process, using a gamma distribution (in orange), which is not bounded on the right side.

On the graph, we see an inherent property of count data distributions: they are asymmetric. The right tail goes to 0 more slowly than the left part, which makes the gamma distribution naturally more spread out than the beta distribution.

Since both curves are distributions, the area under each curve must be 1.

As you can see, the beta distribution (in blue) has a higher peak than the gamma distribution (in orange). This shows that the gamma distribution is more spread out than the beta distribution. It is a hint that count distributions are harder to estimate accurately than conversion distributions. This is also why we need more visitors to assess a difference when using count metrics than when using conversion metrics.
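The difference in spread can be checked numerically for the 10 visitors and 5 successes example. The parameterization below is an assumption, since the article does not state it: Beta(6, 6), which corresponds to a uniform prior, and a gamma with shape 5 and rate 10, which has the same 0.5 mean. Under those assumptions the beta standard deviation is about 0.14, the gamma standard deviation is about 0.22, and roughly 3% of the gamma’s mass lies above 1.

```python
from math import exp, factorial, sqrt

visitors, conversions = 10, 5

# Beta(a, b) for a unique-conversion rate; the "+ 1" terms are an assumed uniform prior.
a, b = conversions + 1, visitors - conversions + 1
beta_sd = sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

# Gamma(shape, rate) for conversions per visitor; this parameterization is an assumption.
shape, rate = conversions, visitors
gamma_mean = shape / rate
gamma_sd = sqrt(shape) / rate


def gamma_tail(x: float, shape: int, rate: float) -> float:
    """P(X > x) for a gamma distribution with integer shape (closed-form Erlang sum)."""
    return exp(-rate * x) * sum((rate * x) ** i / factorial(i) for i in range(shape))


# Probability mass the count model puts on more than 1 conversion per visitor.
mass_above_one = gamma_tail(1.0, shape, rate)

print(f"beta sd:  {beta_sd:.3f}")
print(f"gamma sd: {gamma_sd:.3f} (mean {gamma_mean:.2f})")
print(f"gamma mass above 1: {mass_above_one:.3f}")
```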

To understand this problem, imagine two gamma distribution curves, one for each variation of an experiment. Then gradually shift one to the right, showing an increasing difference between the two distributions (see figure below).

Since both curves are right-skewed, the overlap region will occur on at least one of the skewed parts of the distributions.

This means that differences will be harder to assess with count data than with conversion data. This comes from the fact that count data works on an open range, whereas conversion rates work on a closed range.

Do count metrics need more visitors to get accurate results?

No, it is more complex than that in the CRO context. Typical statistical tests for count metrics are not suited for CRO in practice.

Most of these tests come from the industrial world. A classic usage of count metrics is counting the number of failures of a machine in a given timeframe. In this context, the risk of failure doesn’t depend on previous events. If a machine already had one failure and has been repaired, the chance of a second failure is considered to be the same.

This hypothesis does not hold for the number of pages viewed by a visitor. In reality, a visitor who has seen two pages is more likely to see a third page than a visitor who has seen only one page is to see a second (since a one-page visitor has a high probability of “bouncing”).

The industrial model does not fit in the CRO context since it deals with human behavior, making it much more complex.
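A rough way to check this on your own data is to compute, for each step, the share of visitors who reached page k and went on to page k + 1. If the chance of one more page view did not depend on the pages already seen, that share would be the same at every step. The sketch below uses made-up page-view counts with many bounces, and the function name is illustrative.

```python
from collections import Counter


def continuation_rates(pages_per_visitor):
    """For each k, share of visitors with at least k pages who also saw page k + 1."""
    counts = Counter(pages_per_visitor)
    max_pages = max(counts)
    rates = {}
    for k in range(1, max_pages):
        reached_k = sum(n for pages, n in counts.items() if pages >= k)
        reached_next = sum(n for pages, n in counts.items() if pages >= k + 1)
        rates[k] = reached_next / reached_k
    return rates


# Hypothetical sample: 60 visitors bounce after one page, 40 browse further.
sample = [1] * 60 + [2] * 10 + [3] * 10 + [4] * 10 + [5] * 10
rates = continuation_rates(sample)

for k, rate in rates.items():
    print(f"saw {k} page(s) -> saw page {k + 1}: {rate:.0%}")
```

In this sample only 40% of visitors move past the first page, but 75% of those who reach a second page go on to a third, which is the dependence on past behavior that the industrial model ignores.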

Not all conversions have the same value

The next CRO struggle also comes from the direct exploitation of formulas from the industrial world.

If you run a plant that produces goods with machines, and you test a new kind of machine that produces more goods per day on average, you will conclude that these new machines are a good investment. Because the value of a machine is linear with its average production, each extra product adds the same value to the business.

But this is not the same in CRO.

Imagine this experiment result for a media company:

Variation B yields 1,000 more page views than the original A. Based on that data, you put variation B in production. Let’s say that variation B lost 500 people who saw 2 pages each and won 20 people who saw 100 pages each. That makes a net benefit of 1,000 page views for variation B.

But what about the value? These 20 people, even if they spent a lot of time on the site, may not be worth as much as 500 people who come regularly.

In CRO, each extra unit added to a count metric does not carry the same value, so you cannot treat a measured increase as direct added value.

In applied statistics, one adds an extra layer to the analysis: a utility function, which links extra counts to value. This utility function is very specific to the problem and is unknown for most CRO problems. Even if you get some more conversions in a count metric context, you are unsure about the real value of this gain (if any).
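To make the role of a utility function concrete, here is a small sketch using the media example above. The capped utility (pages beyond the 10th add no value) is entirely hypothetical and only serves to show how the conclusion can flip; a real utility function would have to come from business knowledge.

```python
def net_change(segments, value_of_pages):
    """Sum visitor_delta * value_of_pages(pages) over (visitor_delta, pages) segments."""
    return sum(delta * value_of_pages(pages) for delta, pages in segments)


# Variation B versus A: 500 visitors at 2 pages lost, 20 visitors at 100 pages won.
segments = [(-500, 2), (20, 100)]

# Linear "industrial" view: every page view is worth the same.
raw_page_views = net_change(segments, lambda pages: pages)

# Hypothetical utility: pages beyond the 10th add no extra value.
capped_value = net_change(segments, lambda pages: min(pages, 10))

print(f"net page views: {raw_page_views:+d}")
print(f"net value under the capped utility: {capped_value:+d}")
```

The raw count says variation B gains 1,000 page views, while this particular utility says it loses 800 units of value. Neither answer is “the” right one without knowing the true utility, which is the point of the section.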

Some count metrics are not meant to be optimized

Let’s see some examples where raising the number of a count metric might not be a good thing:

  • Page views: If the count of page views rises, you can think it’s a good thing because people are seeing more of your products. However, you can also think that people get lost and need to browse more pages to find what they need.
  • Items added to cart: We have the same idea for the number of products added to the cart. If you do not check how many products remain in the cart at the checkout stage, you don’t know if the variation helps to sell more or if it just makes the product selection harder.
  • Products purchased: Even the number of products purchased may be misleading if used alone as a business objective in an optimization context. Visitors could be buying two cheaper products instead of one high-quality (and more expensive) product.

You can’t tell just by looking at these KPIs if your variation or change is good for your business or not. There is more that needs to be considered when looking at these numbers.

How do we use this count data then?

We have seen in this article how counterintuitive optimization based on sessions is. Even worse, we have seen how misleading count metrics can be in CRO.

Unless you have both business and statistics expertise available, it’s best practice to avoid count metrics, at least as your only KPI.

As a workaround, you can use several conversion metrics with specific triggers using business knowledge to set the thresholds. For instance:

  • Use one conversion metric for counts in the range [1, 5], called “light users.”
  • Use another conversion metric for the range [6, 10], called “medium users.”
  • Use another one for the range [11, +∞), called “heavy users.”

Splitting up the conversion metrics in this way will give you a clearer signal about where you gain or lose conversions.
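The bucketing above can be sketched as follows. Each bucket becomes a yes/no conversion per visitor, so it can be analyzed like any other conversion rate. The thresholds are the ones from the list, while the per-visitor counts and the function name are made up for illustration.

```python
def bucket_conversion_rates(counts_per_visitor, buckets):
    """Share of all visitors whose count falls inside each [low, high] bucket."""
    total = len(counts_per_visitor)
    return {
        name: sum(low <= count <= high for count in counts_per_visitor) / total
        for name, (low, high) in buckets.items()
    }


BUCKETS = {
    "light users": (1, 5),
    "medium users": (6, 10),
    "heavy users": (11, float("inf")),
}

# Hypothetical counts (for example, page views) for ten visitors per variation.
variation_a = [0, 0, 1, 2, 3, 6, 7, 12, 0, 4]
variation_b = [0, 1, 1, 2, 8, 9, 15, 20, 0, 3]

rates_a = bucket_conversion_rates(variation_a, BUCKETS)
rates_b = bucket_conversion_rates(variation_b, BUCKETS)

for name in BUCKETS:
    print(f"{name}: A = {rates_a[name]:.0%}, B = {rates_b[name]:.0%}")
```

In this toy data the light and medium rates are identical and only the heavy bucket moves, which is the kind of localized signal a single average count would hide.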

Another piece of advice is to use several KPIs to have a broader view.

For instance, although analyzing the product views alone is not a good idea – you can check the overall conversion rate and average order value at the same time. If product views and conversion KPIs are going up and the average order value is stable or goes up, then you can conclude that your new product page layout is a success.

Counterintuitive Metrics in CRO

Now you see that, except for conversions counted on a unique visitor basis, nearly all other metrics can be very counterintuitive to use in CRO. Mistakes can happen because the statistics work differently, and also because these metrics and their changes can be interpreted in several ways.

It’s important to understand that CRO skill is a mix of statistics, business and UX knowledge. Since it’s very rare to have all this within one person, the key is to have the needed skills spread across a team with good communication.


Article

7min read

Four Ways to Use GA4 to Power Your Web Experimentation Programs

We invited Oliver Walker from our partner Hookflash to talk us through the practical ways you can use GA4 with your experimentation. 

Although many people are talking about GA4 as a different platform from the previous version (Universal Analytics), conceptually it lets you do largely the same things. Its primary functions are to help you to understand and optimize your media; to understand and optimize your website; and to understand and segment your website visitors into audiences. However, with GA4 several features can really help you to power an experimentation program.

Here we’ll outline how to use GA4 to its full potential to drive results for your testing program.

Understanding User Behavior

At its core, Google Analytics has always been great for helping website owners to understand their website traffic. Whether it’s where they started their journey or where they ended their digital journey, or whether they sought help halfway through, there are a few options to know about. What we know about GA4 already is that it’s not the most intuitive tool in the world so here are some quick tips on that front:

  • Landing Pages – use Explorations – although there is a default report for landing pages, it’s not the best. Not just because there’s a known bug resulting in an empty row, but also because it doesn’t have the most useful metrics, i.e. bounce rate or engagement rate. If you build a report in Explorations, you can use a different dimension (called “Landing page + query string”) and choose the metrics you’d find useful.

  • Exit rate – similar to the above, you no longer get Exits (or Exit Rate) in the default Pages and Screens report. Again, rebuilding the report in Explorations lets you add Exits as a metric and choose your preferred pages dimension. The default dimension in the Pages and Screens report does not include query strings, but if you’d prefer to use the one that does, choose the dimension “Page path + query string”.
  • Site search – and finally, where’s the Site Search report gone!? There’s no longer a default report for this but you can rebuild this in Explorations. You can understand which search terms were most often looked for, by building an Exploration with the dimension of “Search term” and the metric “Event count”.

Understanding User Flow

What Universal Analytics was not particularly good at was visualizing how people traverse a website. The flow reports were horribly sampled and merely teased you as to what you could have had. GA4 has on-the-fly path exploration reports that can be used and tweaked very flexibly. You can find these within Explorations too: just choose Path Exploration and then tweak, as per the following:

  • Get the pages view – for some unhelpful reason, the default view is Event Name, within each step. In the visualization, click the drop-down underneath Step +1 and change Event Name to be your preferred page dimension to get a view of how users move from page to page.
  • Double-click the page you are interested in to see where users go next. You can also click the +15 more (or whichever number) link at the bottom of each column to get the longer tail.
  • Choosing a dimension to “break down” by lets you easily compare routes through the site for different users, for example mobile vs. desktop or each of the different browsers. Likewise, you can use segments here to review a certain audience type, e.g. non-UK traffic or Purchasers.

Audience targeting & triggers

Speaking of audiences, this was always a great feature of Universal Analytics, and when Google Optimize was in its pomp, the ability to share audiences from UA to Optimize was one of its prime features. With GA4 you get the same ability to build audiences and to share them natively with other Google Marketing Platform (GMP) products, plus some neat additional elements:

  • The ability to use user behavior to trigger new types of goals. For example, if you’re a publisher and you want to engage people to read a certain number of articles in a particular time frame, it’s possible to create an audience for this and then have that set of behaviors trigger a new event. These are called audience triggers. This becomes a powerful new metric with which to optimize your testing campaigns, by importing that conversion into your chosen testing tool.
  • The ability to export audiences from GA4 to other platforms. This is something that the new Google Analytics Data API supports, and it is big news. Whilst it’s to be expected that other platforms will catch up, at the moment AB Tasty is the only one to have published their mechanism for pulling GA4 audiences into their platform.

This is generally a great leap forward, as GA4 also has the concept of users being added to and removed from audience groups, whereas most testing tools don’t have this feature.

Advanced analysis using BigQuery

The final area where GA4 really steps forward beyond its predecessor is that all GA4 accounts have a native integration with Google BigQuery. Whilst the integration itself is free, it’s worth noting that you do incur costs by storing or processing data in BigQuery, although a good partner will be able to advise on what that might look like for you. 

So where does BigQuery help? The data schema provided by integrating GA4 and BigQuery is raw-level data, which means each row is effectively an event, with a timestamp and all the associated parameters. It gives you a greater degree of flexibility over what you analyze, provided you’re able to query the data (using SQL, or your friendly AI-driven chat tool). For example:

  • If you want to understand how long it takes a user to complete a particular flow or set of actions. It’s worth noting that Google Analytics batches events, so this isn’t perfect, but it is easier than within the interface.
  • If you want to look at user flows at an even greater level of detail, for example, how users traverse the site having landed on a particular page.
  • If you want to stitch together any data that GA didn’t capture but that also exists in Google Cloud, e.g. following a lead from submission through to outcome.
  • If you want to conduct a deeper analysis within your post-experiment analyses. All testing platforms will pass events and parameters to denote whether a user was part of an experiment and the variation they saw, so GA4 is a powerful additional tool to deep-dive into results.
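As an illustration of the first use case, here is a minimal sketch of a time-to-complete calculation on event rows that have already been pulled out of BigQuery. In practice you would more likely do this in SQL; the Python below only shows the logic. The field names `user_pseudo_id`, `event_name` and `event_timestamp` (in microseconds) follow the GA4 BigQuery export schema, while the rows, the choice of `begin_checkout` and `purchase` as the flow, and the function name are assumptions made for the example.

```python
from statistics import median


def flow_durations_seconds(events, start_event, end_event):
    """Seconds from each user's first start_event to their first later end_event."""
    first_start = {}
    durations = {}
    for row in sorted(events, key=lambda r: r["event_timestamp"]):
        user = row["user_pseudo_id"]
        if row["event_name"] == start_event and user not in first_start:
            first_start[user] = row["event_timestamp"]
        elif (
            row["event_name"] == end_event
            and user in first_start
            and user not in durations
        ):
            # event_timestamp is in microseconds in the GA4 export.
            durations[user] = (row["event_timestamp"] - first_start[user]) / 1_000_000
    return durations


# Made-up rows shaped like the GA4 export (only the fields used here).
events = [
    {"user_pseudo_id": "u1", "event_name": "begin_checkout", "event_timestamp": 1_700_000_000_000_000},
    {"user_pseudo_id": "u1", "event_name": "purchase", "event_timestamp": 1_700_000_090_000_000},
    {"user_pseudo_id": "u2", "event_name": "begin_checkout", "event_timestamp": 1_700_000_010_000_000},
    {"user_pseudo_id": "u2", "event_name": "purchase", "event_timestamp": 1_700_000_310_000_000},
    {"user_pseudo_id": "u3", "event_name": "begin_checkout", "event_timestamp": 1_700_000_020_000_000},
]

durations = flow_durations_seconds(events, "begin_checkout", "purchase")
median_seconds = median(durations.values())

print(durations)
print(f"median time to complete: {median_seconds:.0f}s")
```

Users who start the flow but never finish (like `u3` here) are simply left out of the durations, so it is worth reporting the completion rate alongside the median.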

It’s not all doom and gloom

Yup, GA4 does have some limitations: it’s a big change to a tool that lots of people loved, and it’s hard to pick up. BUT when you start to understand certain concepts and familiarize yourself with its capabilities, there are lots of features to help you with your experimentation program.