In the next installment of our series on a data-driven approach to customer-centric marketing, we spoke with our partner Raoul Doraiswamy, Founder & Managing Director of Conversionry, to understand the flow of a customer-centric experimentation process and why it is critical to tap into insights from experimentation to make better decisions.
What do you find is the biggest gap in the marketing & growth knowledge among brands right now?
Many brands today have the right tools and technology investments, or the right people with marketing expertise. However, brands often face the issue of not knowing how to meet customer needs – how to give their customers what they want, whether on their website, in their app or through digital advertising – in other words, how can these brands increase conversions? Raoul identifies a lack of customer understanding as the core of this gap and suggests that brands adopt a customer-centric, customer-driven process that enables a flow of customer insights, complemented by experimentation.
Which key activities deliver the best insights into customer problems?
Raoul believes that to build a strategy that puts the customer at the core, it is important to have the right data-gathering approach for generating insights. It’s the foundation of any experimentation program, but it can be applied to all marketing channels.
“Imagine you are an air traffic controller. You have multiple screens constantly feeding you information about where the planes are and when they might crash into each other. From all these constant insights, the person in front of the screens has to make the right decisions,” he shares. “However, there are also inconsequential insights, such as baggage holds being full – and it is up to the decision-makers to pick out the critical data and make use of it.”
Raoul uses this analogy to describe the role of marketing decision-makers, who typically have a dashboard with metrics like revenue, conversion rate, abandoned cart and more. An insights dashboard helps marketers better understand their customers by combining this real-time data with customer feedback from sources like analytics, heatmaps, session recordings, social media comments and user testing. Solid research can be done through a critical analysis of session recordings and user poll forms, and the main takeaways can be fed into this dashboard. How empowering is that for a marketing decision-maker?
Where are the best sources for experimentation ideas?
Raoul asserts that a combination of quantitative and qualitative analysis is key. Heuristic analysis and competitor analysis are also gold when coming up with experimentation ideas. He continues, “Don’t limit yourself to looking at competitors, look at other industries too. For example, for a $90M trade tools client we had to solve the problem of increasing user sign-ins to their loyalty program. By researching Expedia and Qantas, we got the idea to show users points instead of cash to pay for items.” Raoul shares, “Do heat map analysis, look at session recordings, user polls, run surveys to email databases, and user testing. User testing is critical in understanding the full picture.”
After distilling customer problems and coming up with some rough experimentation ideas, the next step is to flesh out your experiment ideas fully. “Going back to the analogy of the Air Traffic Controller, one person on the team is seeing a potential crash but might have limited experience in dealing with this situation. That’s when more perspectives can be brought in by, let’s say, a supervisor, to make a more well-rounded decision. In the same way, when you are ideating, you do not want to just limit it to yourself but rather have a workshop where you discuss ideas with your internal team. If you are working with an agency, you can still have a workshop with both the agency and the client present, or have your CRO team and product team come together to share ideas. This way, you can get multiple stakeholders involved, each of them being able to provide expertise based on their experience with customers,” says Raoul.
Is there value in running data-gathering experiments (as opposed to improving conversion / driving a specific metric)?
“Yes, absolutely,” replies Raoul. “Aligning growth levers with clients every quarter while working with CRO and Experimentation teams on the experimentation process is important. When working towards the goal of increasing conversions, there are KPIs and predictive models to project the goals.
“On the other hand, if the focus of the program is on product feature validation or reducing the risk of revenue due to untested features, there will be a separate metric for that,” he continues. “It is key to have specific micro KPIs for the tests that are running to generate a constant flow of insights, which then allows us to make better decisions.”
Data-gathering experiments can also open the door to features such as personalization, which in turn can have a positive impact on website conversions.
What do brands need to get started?
“To begin, you need to start running experiments. Every day without a test is a day lost in revenue!” urges Raoul. “For marketing leaders who have yet to start running experiments, you can begin by pinpointing customer problems and establishing a flow of insights. You can gather these insights from Google Analytics – more specifically, by looking at your funnel. From there, identify the drop-off point and observe the Next Page Path to see where users go next.
“Take for example an eCommerce platform. If users are dropping off at the product page instead of adding to the cart and moving on to the shipping page, this may indicate that they are confused about the shipping requirements. This alone can tell you a lot about what goes through the user’s mind. Look at heat maps and session recordings to understand the customer’s problems. The next step is to solve the issue, and to do that, you will need an A/B testing platform. Use the A/B testing platform to build tests and launch them as quickly as possible.”
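To make the funnel review Raoul describes more concrete, here is a minimal sketch in Python, assuming you have exported step-level session counts from your analytics tool; the step names and figures below are invented for illustration, not taken from the interview.

```python
# A minimal sketch of a funnel drop-off check, assuming an export of
# step-level session counts (the steps and numbers here are invented).
import pandas as pd

funnel = pd.DataFrame({
    "step": ["product_page", "add_to_cart", "shipping", "payment", "confirmation"],
    "sessions": [42000, 9800, 6100, 4300, 3900],
})

# Share of sessions lost between each step and the next one.
funnel["drop_off_rate"] = 1 - funnel["sessions"].shift(-1) / funnel["sessions"]

print(funnel)
# The step with the highest drop-off is the first candidate for heatmap
# and session-recording review, and ultimately for an A/B test.
print("Biggest leak after:", funnel.loc[funnel["drop_off_rate"].idxmax(), "step"])
```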
As for established marketing teams who are already doing some testing, Raoul recommends gathering insights and customer problems as they come in every month. “Then to make sense of the data you’ve collected, you need conversion optimization analysts like our experts at Conversionry who are experienced in distilling data down to problems.”
Identifying customer problems is key. If the issues your customers encounter stay unaddressed, your initiatives can flatline despite months of experimentation. Instead, by keeping customer feedback top of mind, you can move through design, development and testing – working with experience optimization platforms like AB Tasty to build the experiments – then gather insights and repeat the cycle to see what wins and what doesn’t.
Get started building your A/B tests today with the best-in-class software solution, AB Tasty. With embedded AI and automation, this experimentation and personalization platform creates richer digital experiences for your customers, fast.
This is the fifth part of our series on a data-driven approach to customer-centric marketing. We met with our partner Sophie D’Souza, Vice President of Optimization at Spiralyze, and Rémi Aubert, Co-CEO & Co-Founder of AB Tasty, who talk about what a customer-centric culture really means, why it’s so important for companies to foster one, the data that enables such a culture, and the challenges and benefits involved.
How would you define a customer-centric culture?
In this data series, we’ve discussed ways to use and analyze data, metrics, and experimentation to better understand your customers, meet their needs and forge emotional connections with them. All of these things contribute to the ultimate goal of building a customer-centric vision and culture for brands.
But what defines a customer-centric culture? For Sophie, “Being customer-centric means that the customer is at the nucleus of the business – the shared collection of values, expectations, practices, and decisions that guide and inform team members are centered around the customer and the needs of the customer. And a big part of achieving that is ensuring data isn’t siloed – it’s not segmented to any one department like upper management or customer success; it permeates every aspect of the company; formal and informal systems, behaviors, business decisions and values all revolve around the customer.”
In Rémi’s opinion, customer-centricity is also very much about “Prioritizing customers above prospects in your day-to-day work. It’s easiest when you’re a small business, but it’s vital to keep this spirit while you grow. Acquiring new customers is important, but we need to remember that our existing customers have already given us their trust. It’s our job to repay them for that with positive experiences, or at least excellent customer support so we can maintain positive experiences and turn any negative experiences into positive ones to ensure we retain them.
“Above all, being customer-centric means not being mercenary: it’s the foundation of organic growth, where word-of-mouth from satisfied customers spreads and turns prospects into new customers.”
Why is the democratization of data important?

“Data democratization is essential for building a customer-centric culture,” explains Sophie. “Shared, accessible data that isn’t siloed to any one department is the best way to gain customer knowledge. Equally important is a system for gathering, storing, interpreting, and acting upon this data whenever possible.”
“Constant product and website experimentation has shed light on the value of feedback – both qualitative and quantitative – and proven its value for providing insights to the organization. Companies now understand the meaning of a data-driven culture, and the dissemination of these insights across the entire organization is what drives customer-centricity.”
Rémi notes that during the last ten years, the emphasis has been on collecting data. “But today, we’re in a phase of interpreting data in order to act upon it – and this is a mature phase, we know the right KPIs to use to bring value; tomorrow, we’ll be able to automate this data, but few organizations have attained that capability yet.”
What types of data are needed to build a customer-centric culture?
“A customer-centric culture is a data-led business model, where both qualitative and quantitative data are essential – and experimentation plays a vital role,” says Sophie. “Quantitative data gives us brilliant direction. It’s often dictated by product centricity – how customers are interacting with products, and the actions they’re taking. Qualitative data, on the other hand, is dictated by customer needs. Pairing them will provide tons of valuable information. You can gather this from many different sources: engagement and community building (e.g., encouraging customers to leave reviews, asking questions on social channels, etc.).”
“But experimentation is a core part of this, allowing us to directly measure how individuals coming to our product or our website are interacting with us and what actions should accordingly be taken.”
Rémi agrees: “Even if we understand the quantitative aspect or the qualitative aspect of our data, we won’t be able to measure the impact of customer behavior if we’re not able to change those behaviors. This is where testing and personalization come into play.
“It’s fine to identify issues, but if we can’t propose solutions and measure their efficacy, we won’t be able to adapt our culture of customer centricity to new needs. The complementarity between quantitative and qualitative data is essential. Quantitative data helps us identify problems, while qualitative data usually helps find solutions.”
Sophie’s on board: “Experimentation lets us put the customer first because we can test different solutions based on the problems we’ve identified. So rather than rolling out an idea we’ve deemed internally to be the best, experimentation lets the customer guide our actions, and in that way, we know we’re responding to real needs.”
Are there problems associated with acquiring the necessary data?
Rémi says the main problem is related to faulty data collection: “We sometimes see biased data due to incomplete data collection. Biased data is useless. Another issue we often see is that of overcollection: people collect far more data than they need, then find themselves lost in a data deluge that’s impossible to analyze and from which they can’t extract insights. The enemy of good data is too much data because you can’t orchestrate it.”
“We’ve learned that too much data equals clutter and distraction,” says Sophie. “There’s a lack of central systems in place that are efficient enough to process that much data and make it actionable. Designing systems to capture the information we need at scale and disseminate it while minimizing variance by individual interpretation is the objective for businesses today.”
What are the challenges to achieving a customer-centric culture?
Rémi tells a story about a client, a top-tier luxury jeweler. “It’s very difficult for brands like that, which have strict visual identity and editorial guidelines, to be customer-centric, as they have little flexibility for testing. These brands are very powerful: you can’t make the slightest modification without validation by the entire brand team. So even if you know you can improve the customer journey or experience on the website, you can’t implement any changes because brand policy prohibits it. The result? Even if you have data proving a given change will improve customer satisfaction, brand ‘integrity’ won’t allow it.”
Sophie sees a lot of progress being made, but certain barriers remain. “To be a data-driven organization, you need an open mind and an experimentation mindset, because a customer-centric culture is premised on innovation and constant change to meet customer needs. A big challenge today is that not everyone in a given organization has a data-driven mindset, although website and product experimentation and personalization are paving the way to its adoption.”
Rémi and Sophie agree that in a data-driven organization, people at every level are empowered to contribute, because it’s data, not experience, that matters. A new hire can propose a test hypothesis just as valuable as one suggested by a CEO. This kind of democratization is happening at Hanna Andersson, a children’s clothing manufacturer where all employees have a voice and are encouraged to submit test ideas. The best ones are acted upon, as in this AB Tasty case study, where a small change to a product image led to a big impact.
How does a customer-centric culture benefit businesses/brands?
According to research by Deloitte and Touche, customer-centric businesses are 60% more profitable than their product-focused counterparts. Companies that put the customer at the center of their organization enjoy increased customer lifetime value and reduced churn.
“There’s a plethora of concrete benefits, including increased retention, customer loyalty, referrals… Operational efficiency is a major benefit, and it’s fueled by experimentation. This means that we’re not just guessing, but spending our time where it’s most valuable: on meeting real customer needs.
“Then there’s innovation. When we receive customer feedback, whether online or off, the products are iterated upon accordingly. It allows us to be more creative with solutions for customer problems rather than small iterations.”
Rémi adds that there’s also an important internal benefit to being customer-centric. “When your experiments have been successful and you’ve increased customer satisfaction, your clients are happy and so are your teams. That boosts their confidence in the product they’ve developed. It’s very rewarding.”
Sophie enthusiastically agrees: “It rallies everyone around the customer. No matter what role you play in an organization, you can see the benefit of your work.”
For the fourth installment in our series on a data-driven approach to customer-centric marketing, we got together with Filip von Reiche, CTO of Integrated Customer Experiences at Wunderman Thompson, and Gaetan Philippot, Data Scientist at AB Tasty. We discussed the pros and cons of vanity metrics, how they’re different from actionable metrics, and the roles all types of metrics play when measuring a brand’s digital impact.
Let’s begin with digital transformation. What is it, and why have companies been so focused on it over the past few years?
Digital transformation, as defined by Salesforce, is the process of using digital technologies to create new – or modify existing – business processes, culture, and customer experiences to meet changing business and market requirements. It began in the late 20th century and underwent rapid acceleration in the first two decades of the 21st century, spreading across almost all industries.
Resisting digital transformation is risky. TechTarget tells the fateful story of Blockbuster LLC, a once-global entity with video rental stores throughout the US and the world. But its presence and relevance precipitously declined from about 2005, as Netflix harnessed emerging technologies and capitalized on consumer appetite for on-demand entertainment delivered by the then newly-available streaming services.
But digital transformation can also be seen as a buzzword, says Filip, “in the sense that people think it’s something they need to do. The original impetus behind digital transformation was that brands were trying to be more competitive – in how they grew their market share, how they were perceived, and so on. And digital transformation was the engine that enabled them to achieve these things, to react faster, and to be able to measure their impact.
“Initially, it was focused on giving brands an online presence, and of course, it has achieved that, but over time, it has acquired new uses. Its latest purpose is to help brands create personalized experiences by providing them with the right content and flow which allows them to have better conversations with their customers, and that leads to more conversions.”
For Gaetan, “Part of it is imitative: people say ‘Amazon is doing a thousand experiments a year, so we have to do the same,’ but not everyone has the vast resources of Amazon, or can hope for the same results.”
But if the objective is to create personalized brand experiences, Amazon isn’t the model to copy – it isn’t a website where people want to spend much time. “On the contrary, people go to Amazon because they can get in, buy what they want, and get out fast. It’s totally impersonal,” explains Filip. “However, the reason I spend more time with a brand is that I want a specific product or service they offer, and I expect personalization from brands I’m engaged with.”
For personalization to be successful, there must be constant validation of your perceptions before going live with any website or campaign. “More than half of all campaigns that customers perform using AB Tasty have to do with personalization or experimenting with personalization,” remarks Gaetan. “They’re the foundation on which everything else is built.”
What are the differences between vanity metrics and actionable metrics?
The use of vanity metrics varies across different verticals at different levels and from client to client. The one constant is that vanity metrics are very alluring because they provide what Filip calls “A dopamine rush that lights up your brain – and in some cases, depending on what you’re trying to achieve with your personalization, that ‘rush’ might be sufficient. But ideally, you want to know what the long-range impact will be.”
The problem is that this impact is not always easy to measure. “Let’s take real estate as an example. It’s unfortunately not as simple as: the target sees a personalized message, the target clicks, the target purchases a house. Wouldn’t that be great? But in reality, the lapse of time between that initial personalization and the purchase might be 30, 60, 90 days, or even longer. In some cases, you do need a vanity metric such as page likes, favorites, shares, etc., as an indicator to tell you where things are going, but it’s always better to have a conversion metric in the background to tell you what it all really means,” explains Filip.
“This is where more in-depth analytics come into play. If you have a customer who is engaged but not converting, you need to find out what the barrier is and find a way to get around it. If you can propose a solution using personalization that meets the consumer’s needs and knocks down that barrier, great. But you always have to respect the trust the consumer has placed in you by giving you the data you need for personalization. You can’t just pop out and say, ‘Hi! We see you’re looking at our website!’ That’s creepy. But you can indicate that you, as a brand, are present and listening to your consumers’ needs. It’s a delicate balance.”
Can vanity metrics be transformed into actionable metrics?
It should be emphasized that the use of a “superficial” or vanity metric can be justified when there is a notable response, whether positive or negative, because it may prompt a company to dig deeper and analyze further; to do so, they turn to actionable metrics for answers.
Gaetan remarks, “But it’s important to remember that not everything is actionable immediately: sometimes the payoff will be further along. The value of each type of metric varies according to industry and also according to client maturity. For example, e-commerce clients that are just starting out will test all sorts of things before they learn which key metrics are the most useful and offer the best results for their businesses.”
“The entire metric discussion needs to begin as soon as you devise your personalization or testing strategy,” says Filip. “You’ll have a goal in mind: to achieve a certain type of awareness or engagement, or a certain number of conversions, etc. Everything you test that you want to use as a measure of success must align with that goal. If a vanity metric can support that goal, then it’s sufficient. If the final conversion is needed to prove my point, then we need to figure out how to get it. Sometimes that can be more complicated and involve offline integrations, but that’s usually how it works.”
What questions should companies ask to find the right metrics to track?
For Filip, a vital question concerns the scope of the project you’re undertaking. Are you measuring an entire campaign or are you breaking it down into individual parts? A high-level scope is easier to measure, meaning fewer metrics are needed, generally speaking. A detailed scope is more complex, as measuring on an individual basis raises questions of how to determine identity, how to relate conversions back to specific individuals, etc., especially when using data from a Customer Data Platform (CDP). But the most fundamental question is: ‘Should I be testing and personalizing my experiences?’ And Filip’s answer is “Hell yes! But there are lots of different paths to take to do these things. One way is to ask a company like Wunderman Thompson to help you in doing analysis, acting as a consultant to show you what’s working and what isn’t, where there are blockages, places for improvement, etc. (Sorry for the sales pitch).
“But if you’d rather appeal to consumers on your own, from a consumer experience point of view, you need to test to discover what the best way is to have a conversation with them. How can you show them you want to help them without being intrusive? It may help companies to think of this in terms of a retail store experience by asking themselves, ‘How do I, as a customer, want to be welcomed, assisted, guided?’ Understanding this is the best way to start their personalization framework.”
How is Customer Lifetime Value measured?
Customer lifetime value (CLTV) is the profit margin a company expects to earn over the entirety of its business relationship with the average customer. A CleverTap article explains further: “Because CLTV is a financial projection, it requires a business to make informed assumptions. For example, in order to calculate CLTV, a business must estimate the value of the average sale, average number of transactions, and the duration of the business relationship with a given customer. Established businesses with historical customer data can more accurately calculate their customer lifetime value.” A bit blunt, but that’s how it works.
A visual example of calculating customer lifetime value using sale, transactions, and retention metrics – all of which can be impacted by experimentation.
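As a rough illustration of the arithmetic behind that visual, here is a back-of-the-envelope sketch with invented figures; real CLTV models usually add cohorts, churn curves and discounting, so treat this only as the simplest version of the calculation described above.

```python
# Simplest CLTV arithmetic: average sale value x transactions per year x years
# retained, multiplied by profit margin. All numbers below are invented.
average_sale_value = 80.0    # average order value
purchases_per_year = 4       # average number of transactions per customer per year
retention_years = 3          # average duration of the business relationship
profit_margin = 0.25         # share of revenue kept as profit

lifetime_revenue = average_sale_value * purchases_per_year * retention_years
cltv = lifetime_revenue * profit_margin
print(f"Lifetime revenue: {lifetime_revenue:.2f}, CLTV: {cltv:.2f}")
# Lifetime revenue: 960.00, CLTV: 240.00
```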
Now, where to find this precious historical customer data?
“CDPs play an essential role in measuring CLTV because they can combine data from dozens of sources to retrace a customer’s entire history of interactions with a brand, from their web and mobile experiences to their in-store and support experiences. And with this data, you can measure how long you’ve been engaging with that customer, what the value of that engagement has been, what things you offer that they’re interested in,” says Filip.
“Obviously, if a consumer has been engaging with a particular brand for a very long time, they’re going to expect a certain level of personalization from you. They’re going to expect the warm embrace and friendly conversation you have with someone you’ve known for years, not just the quick hello and small talk you’d offer to someone you just met. And it’s worth offering this level of personalization because the better you know your customers, the longer you can continue your conversation with them, which results in loyalty and retention and hopefully, referrals.”
There are techniques to maximize CLTV, including segmenting, personalization, increasing marketing channels, cross-selling, and up-selling, to mention but a few.
In today’s economy, where the markets are crowded with competitors vying for the same customers, engagement and conversion are crucial to the success of any business.
Watch for the fifth installment in our Customer-Centric Data Series in two weeks!
For the third blog in our series on a data-driven approach to customer-centric marketing, we talked with our partner Matt Wright, Director of Behavioral Science at Widerfunnel, and Alex Anquetil, Manager of North America Customer Success at AB Tasty, who discuss what emotional connection means in a marketing context, why it’s critical for brands to forge emotional connections with their customers, and how data can be used to both build and measure the efficacy of these connections.
What do we mean when we talk about creating an “emotional connection” in a marketing context?
Simply put, emotions are the driving force behind every purchase. People don’t buy from a given brand because they need a product they could easily find elsewhere, but because they feel an affinity, a sense of trust, well-being, or inclusion with or loyalty to that brand.
In such a crowded market, forging deep emotional connections is essential for marketers to attract and retain customers today. Marketers can’t merely “appeal to emotions”; they need to understand customers’ behaviors and motivations and ensure that their own missions and messages align with customers’ emotions and needs.
Matt prefers to reframe the question: “What’s the role of emotional decision-making in marketing? People build mental models around their emotions, experiences, and cultural associations. They think of some things as ‘good’ or ‘bad’… they tie emotion to them. The key for marketers is to understand which emotions resonate with which group of people. And this is where A/B testing can help you find clues as to what works and what doesn’t. Creating strong emotional connections is paramount, and through experimentation, you can create them all throughout your sales funnel.”
“Our brains have limited bandwidth,” remarks Alex, “so we tend to save our resources for the important things. When we make a simple purchase, we take shortcuts. We grab what’s available from the wheel of our basic emotions – happiness, anger, surprise – to enable us to make quick decisions. If brands can leverage these emotions, whether positive or negative, and align their sales tactics to them, they can create frictionless experiences. The fact that every purchase is emotional is the reason why we don’t have ‘one perfect user interface’ or ‘one ideal sales funnel’: every brand, product, and user is different.”
Matt says, “That’s a great analogy. Usability is the foundation, but you need to build upon it. Even if your UI is ugly, in the right circumstances, it will convert. For example, if your website is for a charity, people don’t want you to spend your money on making it look beautiful. They want the money to go to the cause – so they may negatively judge you if you have a digital masterpiece for a website. But if you’re designing for a chic brand, people want it to look and feel exclusive. This is what A/B testing teaches us: it’s not about win or lose, it’s about gathering insights, which I think is often overlooked at a base level of experimentation.”
Why are emotional connections with customers so important for brands?
For Matt, emotion is especially important for positioning. “It’s not something people typically do experimentation around – I wish they did – because the data you can glean from testing things like value propositions or copywriting is extremely valuable for successfully positioning a product. Also, as customers move through their journeys, they’re going to have different emotions at different moments, including doubts, so give them signals to reassure them they’ve made the right decisions. By doing that, you’ll strengthen their loyalty to you.”
Alex thinks that first impressions matter, and if you don’t connect on the first day, you may not get another shot. “People look for meaning in what they buy, even when it’s something as banal as a pack of batteries. Utilitarian products can have ‘the right’ signals attached to them (think of the Energizer bunny, and the tradition and reliability attached to it). No one wants to buy products that have negative connotations. When it comes to clothing or luxury items, these are 100% emotional, and it’s essential for marketers to confer the correct image and status by selling to the right groups (because, of course, there are in-groups and out-groups by the brand’s standards) and by attaching the right emotions and motivators specific to each brand and product.”
Should brands create different types of emotional connections for different audiences?
Again, Matt has a preliminary question to reposition how we approach the subject: “Is it worth it to build multiple experiences? The best way to decide is to start small then go deeper, and keep testing until the data leads you to a value proposition. If the data shows you it’s worth it, then build different approaches, yes.”
But Alex, who’s familiar with both the French and US markets, says yes right away. “When looking at short term and long term outcomes, I think there have to be different types of emotional connections for different cultural or geographical audiences. The question is, do you want the emotions to serve sales or marketing at all costs? In other words, do you want your value proposition to associate your brand with specific emotions? When brands expand to new markets, they may require different approaches. For example, certain French luxury brands sell product collections only in France and entirely different ones in the US. With perfume, US customers tend to buy larger bottles, while the French buy smaller ones, due to different cultural priorities and motivators.”
Examples of motivators and leverage:
Source: HBR.org, “The New Science of Customer Emotions,” November 2015, Scott Magids, Alan Zorfas, and Daniel Leemon
“You can analyze your own market data to find out what your highest-value group is and what their motivators are, then push that to the market and take everyone on your journey, or you can do it the other way around, and make sales your ultimate objective.”
Matt thinks the brand will usually lead and cites the example of Netflix. “There’s a debate going on right now to decide whether, in order to keep growing, Netflix should sell ads. Now, they can probably run an A/B test and find out they’ll make more money if they do sell ads, obviously. But how will that affect their brand image in the short, medium, and long term? They might not lose money, but on an emotional level, they might lose a lot of their historical appeal.
“When dabbling with emotions, it’s not as simple as just an A/B test. When making strategic decisions, experimentation can certainly help incrementally optimize things, but it can do bigger things, including helping you make key decisions, better understand your customers, innovate, take risks… Not enough people realize the power of advanced testing. Companies that use it see exponential improvements.”
Talking about experimentation tools, Matt explains: “Early on in the industry, we talked about A/B testing in pretty much only an optimization win-or-lose mindset. And it’s so much more than that. When you make this investment, it’s going to help you make decisions, not just find tiny, incremental bits of revenue for your company. There’s a resourcing problem: conversion rate improvement isn’t the only thing you can do; there’s a huge range of other things you can achieve, and teams need more than a CRO manager to realize the full capabilities. It’s a key competitive differentiator.”
How can data be used to create emotional connections in marketing?
It’s a lot harder to target audiences today due to cookie policy changes and new regulations. But as Matt says (and everyone else agrees), “First-party data will lead to strong positioning and really good ads that connect with users. Because it’s owned by brands, it’s going to be the best quality data for testing hypotheses and segmenting data so brands can offer personalized, exclusive experiences.”
Alex puts it this way: “At the end of the day, you’re still going to be tracking conversions and clicks, so you need to do the groundwork in marketing. It’s more advanced than usability testing. To test for emotions, you have to do some groundwork and some guesswork. You need to know your brand; you need to work with market research. And when you find an emotion aligned with what you want your brand to represent, you need to identify a segment of high-potential customers. Then you find the motivators you associate with that segment, thanks to qualitative research and feedback; then you need to quantify all of that to see if you’re correct. Then you push motivators, measure results, see what boosts efficiency, retention, loyalty, customer lifetime value… and discover whether you’ve got a winning proposition.”
Matt grins: “I wouldn’t call any part of that approach ‘guesswork’. You’re simply combining qualitative with quantitative to come up with better hypotheses for testing. It’s the heart of good experimentation.”
The next installment in our Customer-Centric Data Series will be out in two weeks. Don’t miss it!
For the second blog in our series on a data-driven approach to customer-centric marketing, we talked with our partner Ryan Lucht, Director of Strategy at Cro Metrics, and AB Tasty data expert Hubert Wassner, who explore the evolution of customer needs, why understanding them is so important for brands today, and the role experimentation plays in meeting those needs. Be sure to check out the series introduction and part 1 if you missed those.
What do we mean when we say “customer needs”?
Today’s savvy customers expect a lot from brands: connected journeys, personalization, innovation, data protection. They’re used to seamless online interactions. They want to find products easily. They want a frictionless experience and flexible payment methods. And if a brand doesn’t deliver, they’ll switch to one that does. “It’s a good thing, it’s forced us to become a lot more customer-centric and make our websites easier to use,” says Ryan.
And it pays to meet customer needs and deliver great customer experiences. In a 2020 CX poll, 91% of respondents said they were more likely to make a repeat purchase after a positive experience, and 71% said they’d made a purchase decision based on experience quality.
Why is it important for businesses and brands to understand customer needs?
Ryan believes it’s vital to offer customers the right information at the right time. “Customer needs look different at different stages in the customer journey. When you’re running an experimentation program, the order in which information is presented matters. If you dive into the details too soon, you risk overloading or boring your visitors, but if you wait too long to introduce core information like a refund policy, or a piece of your value proposition, you’ll lose business.
“One of our clients, a leading chain of gyms in the United States, is a good example. A crucial part of their strategy is reassuring customers they’ll feel comfortable in their gyms – that their gyms are inclusive, body-positive, and everyone fits in. It’s an important customer need. But on their homepage, the first thing people wanted to find was ‘Where’s the closest gym to me?’ So they shifted to putting location-centric information first, and establishing their brand afterward: It increased subscriptions immediately. It’s bottom-line revenue impact: if you’re not solving for the right needs at the right time, you’re leaving money on the table.”
Hubert adds, “In online business, your competitors are just a click away, so if users get bored on your site, or it takes them too long to find what they need, they’ll simply go elsewhere. The experience you offer needs to be frustration-free. There are growth results to be found at every stage of the purchase journey, so always think competitively; someone else is surely doing it better somewhere.”
Ryan agrees: “When it comes to understanding your customers’ needs, the more you leverage experimentation and advanced forms of data science, the better your competitive advantage. The reason is that customer needs are unique from one brand to another. An established brand can sell itself differently than a niche brand or a newcomer brand.”
Where can the most valuable data about customer needs be accessed?
There are tools that automate data access. “At AB Tasty,” says Hubert, “we’d begin by installing a tag to gather transaction information, allowing developers to see, for example, when users visit a site and when they leave, in order to measure bounce rate. Agencies can also gather information about products and purchases, as well as metrics like conversion rate, access to the cart page, dates of sale, and how much conversion rates change across engagement levels. Once you have enough data, you can begin to test different metrics to see which ones increase the engagement levels of different audience segments.”
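As a rough sketch of the kind of engagement segmentation Hubert mentions (not AB Tasty’s actual tag or data model), the snippet below groups invented visit-level data by engagement level and compares conversion rates:

```python
# Hypothetical visit-level data; a real setup would pull this from your
# analytics or experimentation platform rather than hard-coding it.
import pandas as pd

visits = pd.DataFrame({
    "visitor_id":   [1, 2, 3, 4, 5, 6, 7, 8],
    "pages_viewed": [1, 2, 6, 9, 1, 4, 12, 3],
    "converted":    [0, 0, 1, 1, 0, 0, 1, 0],
})

# Bucket visitors into engagement levels by pages viewed (thresholds are arbitrary).
visits["engagement"] = pd.cut(
    visits["pages_viewed"],
    bins=[0, 2, 5, float("inf")],
    labels=["low", "medium", "high"],
)

# Conversion rate per engagement level.
print(visits.groupby("engagement", observed=True)["converted"].mean())
```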
Does your site adapt to frequent users or does it treat them as new each time?
Ryan likes to ask executives, “Do you think someone coming to your website for the very first time has the same needs as someone coming for the fifth time? Of course, the answer is ‘no.’ But think about ‘new’ vs ‘returning’ visitors, or traffic sources – people come to the website from different places: email, advertisements, Facebook, TikTok, Google. All these people have very different contexts, and this gives us a hundred different test ideas.” Yet when he asks that same group whether they treat a new visitor differently than a returning visitor, unfortunately, the answer is still usually ‘no.’ This is just one of the countless opportunities for brands to use even the most basic data points to start to differentiate experiences.
“It can get very complex very quickly,” Ryan adds. “If you’re going to use an AI or ML model, it’s important to understand which of the variables we’re asking the model to look at are actually correlative and which are just noise.”
How can data from disparate sources be organized/prioritized for testing?
To combine data in order to act on it in a meaningful way, Hubert explains that the most common tools are CRM plug-ins. “If your user is identified, you can automatically gather more first-party information to save along with the customer journey, which can help you to segment for experimentation a posteriori.”
For Ryan, using disparate data sources isn’t just about prioritizing test ideas, it’s about better understanding test outcomes. “I’ve never met anybody who can use data to predict which test ideas will win. If they could, everyone would have staggering win rates, but even when you look at Microsoft, Google and Netflix, only one in ten ideas pans out. The more impactful side of using disparate data sources occurs after you run an experiment, because experiment results are really only one specific learning; it’s how you tie those results back to other sources of customer data or analytics to piece together a narrative you think might be true and how you use those results to inform your next test idea.”
How can experimentation help you gain insight into customer needs?
For Ryan, insight can be gained on two levels. “An individual experiment can tell you whether you’re getting closer to or further away from fulfilling customer needs, based on if they’re doing the thing you hoped they’d do, but there’s also a meta-analysis level, where once you’ve run a lot of experiments, you start to recognize the patterns. For example, every time we emphasize pricing and make it clearer, we get better outcomes, so perhaps one customer need is better price understanding. Individual experiments help us, but we really learn the broader themes once we’ve run a lot of experiments.”
Hubert adds, “In my mind, when you build the test, you have to answer a question, not focus on making an improvement, because maybe you won’t improve anything, but you will learn something, you’ll increase your knowledge, and that’s the real goal.”
How can insights about customer needs contribute to a better customer experience?
“The better and more granularly you understand your customers and their needs, the more it will seem to them that you’re able to read their minds; you’ll please them and win their loyalty and win in the marketplace. The businesses that are doing this – doing a lot of experiments and doing them well – are the businesses that win. It’s as simple as that,” states Ryan.
“When testing, we shouldn’t be asking business questions, but user questions and only then adding business KPIs,” says Hubert. “It’s vital to ask customer-centric questions first, not business-centric ones. This way, you can boost your business while better satisfying your customers.”
Ryan agrees: “It’s a very positive way to frame it. This is how we win with our customers because if we’re better at meeting their needs, everyone wins.”
Want more coverage of data topics? Be sure to come back in two weeks for the third installment in our Customer-Centric Data Series!
For the first blog in our series on the different ways you can utilize data to help brands build a more customer-centric vision, our partner Aimee Bos, VP of Analytics and Data Strategy at Zion & Zion, and AB Tasty’s Chief Data Scientist, Hubert Wassner, delve into how experimentation data can help you better understand your customers. They explore the who, what, and when of testing, discuss key customer insight metrics, the importance of audience sizes, where your best ideas for testing are lurking, and more.
Why is experimentation important for understanding customers?
Put simply, experimentation enables brands to “perfect” their products. By improving upon the value that’s already been developed, the customer experience is improved. And each time a new feature or option is added to a product, consistent A/B testing shows how customers actually react to it. Experimentation operates in a feedback loop with customers, moving beyond conversions or acquisition to improving adoption and retention, eventually making your product indispensable to your customers.
Which key metrics deliver the best insights about customers?
Hubert says, “Basically, the metrics that deliver reliable customer insights are conversion rate and average cart value, segmented on meaningful criteria such as geolocation, or CRM data. But there are others that are interesting, such as revenue per visitor (RPV). It’s a low-value metric but important to monitor.
“And average order value (AOV) is another. This metric varies enormously over time, so it shouldn’t be taken at face value. Seasonality (think Christmas or Black Friday, for example), or even one huge buyer, can skew the statistics. It needs to be viewed in multiple contexts to get a better understanding of progress – not just year over year, but month over month and even week over week.
“AOV and RPV are important because their omission can lead to data bias. People often forget to analyze metrics about non-converting visitors. Of course, AOV only gives you data about those who actually make it fully through the purchase cycle.”
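To make the distinction concrete, here is a small sketch with invented visitor-level revenue showing how conversion rate, AOV and RPV relate, and why RPV also reflects the non-converting visitors that AOV ignores:

```python
# Revenue recorded per visit, with zero for visitors who didn't purchase.
# The numbers are invented for illustration.
revenue_per_visit = [0, 0, 0, 55.0, 0, 0, 120.0, 0, 0, 35.0]

visitors = len(revenue_per_visit)
orders = [r for r in revenue_per_visit if r > 0]

conversion_rate = len(orders) / visitors        # converters / all visitors
aov = sum(orders) / len(orders)                 # average order value (converters only)
rpv = sum(revenue_per_visit) / visitors         # revenue per visitor (includes non-converters)

print(f"Conversion rate: {conversion_rate:.1%}, AOV: {aov:.2f}, RPV: {rpv:.2f}")
# RPV accounts for visitors who never purchase, which AOV alone hides.
```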
And Aimee agrees, “Well, win rate, of course. For e-commerce it’s conversions, value, RPV, how they’re moving the needle, are they increasing the value of the average order? We want as much data as possible at the most granular level possible for lead generation, gated content, and micro-conversions… These smaller tests can be tied to more customer-centric metrics, as opposed to larger business-level metrics such as revenue, growth, number of customers, ROI, etc.”
Where are the best sources for experimentation ideas?
Aimee has her own process. “I start by asking myself what my business objectives are (micro/macro). Then I check Google Analytics and ask myself, ‘Where are conversions not happening?’ For experimentation ideas, I check tools like Hotjar, voice-of-customer (VoC) data, and Qualtrics data to see actual customer feedback, and I turn to user panelists: give them choices, ask what they prefer. Always hypothesize friction points – these will give you your best ideas for testing!”
Hubert likes to get his ideas from NPS scores. “Net promoter score (NPS) has useful information and comments and can be a good starting point for fact-based rather than random hypotheses, which are a waste of time. NPS can add some real focus to well-designed tests. It’s based on a single question: On a scale of 0 to 10, how likely are you to recommend this company’s product or service to a friend or a colleague? NPS is a good way to identify areas that need improvement, but as a signifier of a company’s CX score, it needs to be paired with qualitative insights to understand the context behind the score.”
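For reference, the standard NPS arithmetic is simply the share of promoters (scores of 9-10) minus the share of detractors (0-6); the survey scores below are invented for illustration.

```python
# Standard NPS calculation on a -100 to +100 scale (invented scores).
scores = [10, 9, 9, 8, 7, 6, 10, 3, 9, 8, 10, 5]

promoters = sum(1 for s in scores if s >= 9)
detractors = sum(1 for s in scores if s <= 6)
nps = 100 * (promoters - detractors) / len(scores)

print(f"NPS: {nps:.0f}")  # here: (6 - 3) / 12 * 100 = 25
```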
How do I pull everything together? What do I need to carry out my tests?
Obviously, you need a tool to run your A/B tests and collect the data necessary to make good hypotheses, but a good way to add a big boost to your testing program – and help drive more ROI – is with tools like Contentsquare or Fullstory, which offer more data on customer behavior and experience to focus your testing data. Designed to bridge the gap between the digital experiences companies think they’re offering their customers and what customers are actually getting, analytics platforms can provide real opportunities for useful testing hypotheses by offering more educated guesses about variables for testing to improve CX.
Aimee has an important note about initial data collection, too. “You also need three months of data before you begin testing if you want reliable results, and you need to be sure it’s accurate. Most people rely on Google Analytics (GA), and that’s a lot of data to handle and organize. A Customer Data Platform (CDP) represents a significant investment, but centralizing your data in one is extremely useful for customer segmentation and detailed analysis. The sooner you can invest in a tool like a CDP, the better for a sustainable data architecture strategy.”
I’m ready to test, but I have several hypotheses. How to begin?
According to Aimee, “When that happens, we break large problems into smaller ones. We have a customer that wants to triple their business and also wants a CDP this year, among other goals. It’s a lot! To help them, we build out a customer journey roadmap to see what influences the client’s goals. We select five or six high-level goals (landing page, navigation measured against click-through rate, for example), then test various aspects of each of these goals.”
Hubert notes, “It’s possible to test more than one hypothesis at once if your sample size is big enough. But first, you need to know what the statistical power of your experiment is. Small sample sizes can only detect big effects: it’s important to know the order of magnitude in order to carry out meaningful experiments. It’s always best to test your variables on a large audience, with varied behaviors and needs, in order to get the most reliable and informed results.”
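As a companion to Hubert’s point about statistical power, here is a minimal sample-size sketch using statsmodels; the baseline conversion rate, the minimum lift worth detecting, and the 80% power target are invented assumptions, not figures from the interview.

```python
# Rough sample-size check for a two-sided test on conversion rates.
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

baseline_rate = 0.030   # current conversion rate (assumed)
target_rate = 0.033     # smallest lift worth detecting (+10% relative, assumed)

effect_size = proportion_effectsize(target_rate, baseline_rate)
visitors_per_variation = NormalIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.8, alternative="two-sided"
)
print(f"Visitors needed per variation: {visitors_per_variation:,.0f}")
```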
Is there value in running data-gathering experiments (as opposed to improving conversion / driving a specific metric)?
Hubert is a full believer in testing no matter what you think may happen. “Testing is always useful because a good test teaches you something, win or lose – as long as you have a hypothesis. For instance, measuring the effect of a (supposed) selling feature (like an ad or sale) is useful. You know how much an ad or a sale costs, but without experimenting you don’t know how much it pays.
“Or say you have a 100% win rate. That means you’re not learning anymore. So you test to gain new information in other areas; you don’t just stand still. You minimize losses to maximize wins.”
At AB Tasty, we think, breathe, eat, drink, and sleep experimentation – all in the name of improving digital experiences. Over the coming months, we’re going to peel back a layer to take a closer look at what’s under the hood of experimentation: the data. Data drives the experimentation cycle – all of the ideation, hypotheses, statistical management, and test-result analysis. It’s no secret that today’s world runs on data, and the development of your digital experience should be no different.
Customers today – whether speaking of a business buyer or an everyday consumer – prioritize experience over other aspects of a brand. Over the coming months, we’ll be talking with some of our partners and data experts at AB Tasty to explore how brands can use data to better profile customers, understand their needs, and forge valuable emotional connections with them, as well as to measure overall digital impact and build a data-based, customer-centric vision.
Before we jump right in, let’s take a moment to center our discussions of data within a privacy-conscious scope.
Every marketer knows that nothing is more precious than customer data, but acquiring it has become increasingly thorny. Europe’s far-reaching General Data Protection Regulation (GDPR), enforced in 2018, was a game-changer, requiring companies to get consent before collecting personal data. The California Consumer Privacy Act (CCPA) soon followed, giving consumers the right, among other things, to opt-out from the sale of their data.
Even if you think your business isn’t subject to such regulations, you might need to consider compliance anyway. E-commerce has erased national borders, allowing goods and services to be purchased with little regard for their origin. The globalization of brands means that an influencer in Pennsylvania who posts about your products could drive Parisian customers to your site, and suddenly you’re collecting data subject to GDPR guidelines – which require customer consent for use.
Leveraging the right customer data
Understanding your customers and their needs and changing behaviors is key to delivering timely, relevant messages that boost loyalty and drive revenue. Whether your company sells yoga mats, yams, or yacht insurance, you need data to enhance their experience with you and strengthen your relationship with them.
But how can you leverage the data you need while ensuring your customers continue to trust you? In recent years, consumers have grown skeptical of handing over their personal data. According to a 2021 survey by KPMG, 86% of consumers questioned said they feel a growing concern about data privacy. And they should be concerned: the same survey showed that 62% of business leaders felt that their companies should do more to protect customer data.
Thanks to the well-deserved death of third-party cookies, marketers are now seeking the data they need by forging consent-driven first-party relationships with their audiences. While this is a step in the right direction, data privacy needs to go further.
Enhancing brand value through consent- and privacy-oriented processes
Consumers are more likely to buy from companies with transparent privacy practices that clearly explain how personal data is collected, used, and stored. Giving or withholding consent for the use of their data should be effortless, and if requested, customers should know that brands will not only delete all the data they’ve stored, but also remove any access privileges they may have granted to partners or third parties.
By making consent and preferences easily manageable, a multitude of data can be shared at every customer touchpoint, revealing customer behaviors, preferences, attitudes, and values. To deal with this omnichannel data, a Consent Management Platform (CMP) can help you collect and handle personal information in a privacy-first way. A CMP enables you to maintain consent logs across all customer-facing channels, ensuring that the personal data processing is always in line with the data subject’s preferences, adding an ethical dimension to the customer experience.
Ethical handling of customer data is mission-critical if brands are to succeed today. From big tech to retail, companies of every stripe are taking an ethical and privacy-centered approach to data, because, as an article in the Harvard Business Review aptly put it, “Privacy is to the digital age what product safety was to the Industrial Age.”
Customer data can help you deliver relevant, personalized, and innovative experiences.
It can build your brand by generating new leads, predicting sales and marketing trends, and enabling you to create the personalized messages that customers love. But unless your data is protected and unbreachable, your customer base is at risk.
At AB Tasty, we’re actively committed to ensuring compliance with all relevant privacy regulations and to being entirely transparent with our users with regard to the consensual first-party and impersonal statistical data we collect when they visit our site. We strive to ensure that our partner agencies and SMEs take accountability and responsibility for the use of their customers’ personal data and respond rapidly should customers want to opt-out or be forgotten.
In this series of articles, we’ll be looking at using data to get value from anonymous visitors, using experimentation to discover customer needs, creating emotional connections to customers with data, and using data to measure your digital impact – all of this featuring data experts from the industry to guide us on our journey. See you soon!
The debate about the best way to interpret test results is becoming increasingly relevant in the world of conversion rate optimization.
Torn between two inferential statistical methods (Bayesian vs. frequentist), practitioners debate fiercely over which is the “best.” At AB Tasty, we’ve carefully studied both of these approaches and there is only one winner for us.
There are a lot of discussions regarding the optimal statistical method: Bayesian vs. frequentist
But first, let’s dive in and explore the logic behind each method and the main differences and advantages that each one offers.
What is hypothesis testing?
The statistical hypothesis testing framework in digital experimentation can be expressed as two opposite hypotheses:
H0 states that there is no difference between the treatment and the original, meaning the treatment has no effect on the measured KPI.
H1 states that there is a difference between the treatment and the original, meaning that the treatment has an effect on the measured KPI.
The goal is to compute indicators that will help you make the decision of whether to keep or discard the treatment (a variation, in the context of AB Tasty) based on the experimental data. We first determine the number of visitors to test, collect the data, and then check whether the variation performed better than the original.
There are two hypotheses in the statistical hypothesis framework (Source)
Essentially, there are two approaches to statistical hypothesis testing:
Frequentist approach: Comparing the data to a model.
Bayesian approach: Comparing two models (that are built from data).
From the outset, AB Tasty chose the Bayesian approach for our reporting and experimentation.
What is the frequentist approach?
In this approach, we will build a model Ma for the original (A) that gives the probability P of seeing some data Da. It is a function of the data:
Ma(Da) = p
Then we can compute a p-value, Pv, from Ma(Db), which is the probability to see the data measured on variation B if it was produced by the original (A).
Intuitively, if Pv is high, this means that the data measured on B could also have been produced by A (supporting hypothesis H0). On the other hand, if Pv is low, this means that there are very few chances that the data measured on B could have been produced by A (supporting hypothesis H1).
A widely used threshold for Pv is 0.05. This is equivalent to considering that, for the variation to have had an effect, there must be less than a 5% chance that the data measured on B could have been produced by A.
This approach’s main advantage is that you only need to model A. This is interesting because it is the original variation, and the original exists for a longer time than B. So it would make sense to believe you could collect data on A for a long time in order to build an accurate model from this data. Sadly, the KPI we monitor is rarely stationary: Transactions or click rates are highly variable over time, which is why you need to build the model Ma and collect the data on B during the same period to produce a valid comparison. Clearly, this advantage doesn’t apply to a digital experimentation context.
This approach is called frequentist, as it measures how frequently specific data is likely to occur given a known model.
It is important to note that, as we have seen above, this approach does not compare the two processes.
Note: since p-values are not intuitive, they are often converted into a probability like this:
p = 1 – Pv
This is then wrongly presented as the probability that H1 is true (i.e. that a difference between A and B exists). In fact, it is the probability that the data collected on B was not produced by process A.
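To make this concrete, here is a minimal sketch of the frequentist framing (assuming SciPy, and reusing the example numbers from the Bayesian section below: 1,000 visitors each, 100 conversions on A and 130 on B). It is a simplified, one-sided illustration, not AB Tasty’s implementation; a production-grade frequentist analysis would more typically use a two-proportion or chi-squared test.

```python
# Simplified sketch of the frequentist framing above (not AB Tasty's code):
# treat A's observed rate as the model Ma and ask how likely B's data is under it.
from scipy.stats import binom

visitors_a, conversions_a = 1_000, 100   # original (A)
visitors_b, conversions_b = 1_000, 130   # variation (B)

rate_a = conversions_a / visitors_a      # model Ma: binomial with A's rate

# One-sided p-value: probability of observing at least 130 conversions in
# 1,000 trials if B's data were actually produced by process A.
p_value = binom.sf(conversions_b - 1, visitors_b, rate_a)

print(f"p-value: {p_value:.4f}")         # compare against the 0.05 threshold
```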
What is the Bayesian approach (used at AB Tasty)?
In this approach, we will build two models, Ma and Mb (one for each variation), and compare them. These models, which are built from experimental data, produce random samples corresponding to each process, A and B. We use these models to produce samples of possible rates and compute the difference between these rates in order to estimate the distribution of the difference between the two processes.
Contrary to the first approach, this one does compare two models. It is referred to as the Bayesian approach or method.
Now, we need to build a model for A and B.
Clicks can be represented as binomial distributions, whose parameters are the number of tries and a success rate. In the digital experimentation field, the number of tries is the number of visitors and the success rate is the click or transaction rate. In this case, it is important to note that the rates we are dealing with are only estimates on a limited number of visitors. To model this limited accuracy, we use beta distributions (which are the conjugate prior of binomial distributions).
These distributions model the likelihood of a success rate measured on a limited number of trials.
Let’s take an example:
1,000 visitors on A with 100 successes
1,000 visitors on B with 130 successes
We build the model Ma = beta(1 + success_a, 1 + failures_a) where success_a = 100 and failures_a = visitors_a – success_a = 900.
You may have noticed a +1 for success and failure parameters. This comes from what is called a “prior” in Bayesian analysis. A prior is something you know before the experiment; for example, something derived from another (previous) experiment. In digital experimentation, however, it is well documented that click rates are not stationary and may change depending on the time of the day or the season. As a consequence, this is not something we can use in practice; and the corresponding prior setting, +1, is simply a flat (or non-informative) prior, as you have no previous usable experiment data to draw from.
For the three following graphs, the horizontal axis is the click rate while the vertical axis is the likelihood of that rate knowing that we had an experiment with 100 successes in 1,000 trials.
(Source: AB Tasty)
What this shows is that a rate of 10% is the most likely, 5% or 15% are very unlikely, and 11% is about half as likely as 10%.
The model Mb is built the same way with data from experiment B:
Mb = beta(1 + success_b, 1 + failures_b) = beta(1 + 130, 1 + 870)
(Source: AB Tasty)
For B, the most likely rate is 13%, and the width of the curve is similar to the previous one.
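As a minimal sketch (assuming SciPy; not AB Tasty’s production code), the two models above can be built directly from the counts:

```python
# Minimal sketch of the two Beta models described above; a flat prior
# simply adds +1 to the successes and failures. Not AB Tasty's production code.
from scipy.stats import beta

m_a = beta(1 + 100, 1 + 900)   # A: 100 successes, 900 failures
m_b = beta(1 + 130, 1 + 870)   # B: 130 successes, 870 failures

# 95% credible interval for each rate: the "width" of the curves above.
print(m_a.interval(0.95))      # roughly (0.08, 0.12): most likely rate near 10%
print(m_b.interval(0.95))      # roughly (0.11, 0.15): most likely rate near 13%
```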
Then we compare A and B rate distributions.
Blue is for A and orange is for B (Source: AB Tasty)
We see an overlapping area around a 12% conversion rate, where both models have roughly the same likelihood. To estimate the size of this overlap, we need to sample from both models and compare them.
We draw samples from distribution A and B:
s_a[i] is the i-th sample from A
s_b[i] is the i-th sample from B
Then we apply a comparison function to these samples:
the relative gain: g[i] = 100 * (s_b[i] – s_a[i]) / s_a[i] for all i.
It is the difference between the possible rates for A and B, relative to A (multiplied by 100 for readability in %).
We can now analyze the samples g[i] with a histogram:
The horizontal axis is the relative gain, and the vertical axis is the likelihood of this gain (Source: AB Tasty)
We see that the most likely value for the gain is around 30%.
The yellow line shows where the gain is 0, meaning no difference between A and B. Samples below this value (a negative gain) correspond to cases where A > B; samples on the other side are cases where A < B.
We then define the gain probability as:
GP = (number of samples > 0) / total number of samples
With 1,000,000 (10^6) samples for g, we have 982,296 samples that are >0, making B>A ~98% probable.
We call this the “chances to win” or the “gain probability” (the probability that you will win something).
The gain probability is shown here (see the red rectangle) in the report:
(Source: AB Tasty)
Using the same sampling method, we can compute classic analysis metrics like the mean, the median, percentiles, etc.
Looking back at the previous chart, the vertical red lines indicate where most of the blue area is, intuitively which gain values are the most likely.
We have chosen to expose a best- and worst-case scenario with a 95% confidence interval. It excludes 2.5% of extreme best and worst cases, leaving out a total of 5% of what we consider rare events. This interval is delimited by the red lines on the graph. We consider that the real gain (as if we had an infinite number of visitors to measure it) lies somewhere in this interval 95% of the time.
In our example, this interval is [1.80%; 29.79%; 66.15%], meaning that it is quite unlikely that the real gain is below 1.80%, and it is also quite unlikely that it is above 66.15%. And there is an equal chance that the real gain is above or below the median, 29.79%.
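The whole sampling procedure can be sketched in a few lines (a simplified illustration assuming NumPy, not AB Tasty’s production code); with this example data, the output lands close to the numbers quoted above:

```python
# Simplified sketch of the Bayesian sampling comparison described above.
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1_000_000

# Posterior rate samples for each variation (flat prior: +1 on each parameter).
samples_a = rng.beta(1 + 100, 1 + 900, n_samples)   # Ma: 100 / 1,000
samples_b = rng.beta(1 + 130, 1 + 870, n_samples)   # Mb: 130 / 1,000

# Relative gain of B over A, in %.
gain = 100 * (samples_b - samples_a) / samples_a

gain_probability = (gain > 0).mean()                 # the "chances to win"
low, median, high = np.percentile(gain, [2.5, 50, 97.5])

print(f"Gain probability: {gain_probability:.1%}")   # close to 98%
print(f"95% interval: [{low:.2f}%; {median:.2f}%; {high:.2f}%]")  # close to [1.8; 29.8; 66.2]
```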
The confidence interval is shown here (in the red rectangle) in the report (on another experiment):
(Source: AB Tasty)
What are “priors” for the Bayesian approach?
Bayesian frameworks use the term “prior” to refer to the information you have before the experiment. For instance, a common piece of knowledge tells us that e-commerce transaction rates are mostly under 10%.
It would have been very interesting to incorporate this, but these assumptions are hard to make in practice due to the seasonality of data having a huge impact on click rates. In fact, it is the main reason why we do data collection on A and B at the same time. Most of the time, we already have data from A before the experiment, but we know that click rates change over time, so we need to collect click rates at the same time on all variations for a valid comparison.
It follows that we have to use a flat prior, meaning that the only thing we know before the experiment is that rates are in [0%, 100%], and that we have no idea what the gain might be. This is the same assumption as the frequentist approach, even if it is not formulated.
Challenges in statistics testing
As with any testing approach, the goal is to eliminate errors. There are two types of errors that you should avoid:
False positive (FP): When you pick a winning variation that is not actually the best-performing variation.
False negative (FN): When you miss a winner. Either you declare no winner or declare the wrong winner at the end of the experiment.
Performance on both these measures depends on the threshold used (p-value or gain probability), which depends, in turn, on the context of the experiment. It’s up to the user to decide.
Another important parameter is the number of visitors used in the experiment, since this has a strong impact on the false negative errors.
From a business perspective, the false negative is an opportunity missed. Mitigating false negative errors is all about the size of the population allocated to the test: basically, throwing more visitors at the problem.
The main problem then is false positives, which mainly occur in two situations:
Very early in the experiment: Before reaching the targeted sample size, the gain probability may already rise above 95%. Some users are too impatient and draw conclusions too quickly, without enough data; this is exactly when false positives occur.
Late in the experiment: When the targeted sample size is reached, but no significant winner is found. Some users believe in their hypothesis too much and want to give it another chance.
Both of these problems can be eliminated by strictly respecting the testing protocol: Setting a test period with a sample size calculator and sticking with it.
At AB Tasty, we provide a visual checkmark called “readiness” that tells you whether you respect the protocol (a period that lasts a minimum of 2 weeks and has at least 5,000 visitors). Any decision outside these guidelines should respect the rules outlined in the next section to limit the risk of false positive results.
This screenshot shows how the user is informed as to whether they can take action.
(Source: AB Tasty)
Looking at the report during the data collection period (without the “readiness” checkmark) should be limited to checking that data collection is working correctly and to watching for extreme cases that require emergency action, not to making business decisions.
When should you finalize your experiment?
Early stopping
“Early stopping” is when a user wants to stop a test before reaching the allocated number of visitors.
A user should wait for the campaign to reach at least 1,000 visitors and only stop if a very big loss is observed.
If a user wants to stop early for a supposed winner, they should wait at least two weeks, and only use full weeks of data. This tactic is reasonable if and when the business cost of a false positive is acceptable, since it is more likely that the performance of the supposed winner would be close to the original rather than a loss.
Again, if this risk is acceptable from a business strategy perspective, then this tactic makes sense.
If a user sees a winner (with a high gain probability) at the beginning of a test, they should ensure a margin for the worst-case scenario. A lower bound on the gain that is near or below 0% has the potential to evolve and end up below or far below zero by the end of a test, undermining the perceived high gain probability at its beginning. Avoiding stopping early with a low left confidence bound will help rule out false positives at the beginning of a test.
For instance, a situation with a gain probability of 95% and a confidence interval like [-5.16%; 36.48%; 98.02%] is characteristic of early stopping. The gain probability is above the accepted standard, so one might be willing to push 100% of the traffic to the winning variation. However, the worst-case scenario (-5.16%) is relatively far below 0%. This indicates a possible false positive, and is at any rate a risky bet, with a worst case that loses 5% of conversions. It is better to wait until the lower bound of the confidence interval is at least above 0%, and a little margin on top would be even safer.
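In other words, the early-stopping decision can be reduced to a simple check. The helper below is hypothetical (not an AB Tasty feature): it only allows an early stop when the gain probability is high and the lower bound of the interval is safely above zero.

```python
# Hypothetical helper illustrating the early-stopping rule described above:
# stop only if the gain probability is high AND the worst-case bound of the
# confidence interval sits above a safety margin.
def safe_to_stop_early(gain_probability, ci_lower_pct, margin_pct=0.0):
    return gain_probability >= 0.95 and ci_lower_pct > margin_pct

print(safe_to_stop_early(0.95, -5.16))  # False: the worst case loses conversions
print(safe_to_stop_early(0.98, 1.80))   # True: even the worst case is a gain
```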
Late stopping
“Late stopping” is when, at the end of a test, without finding a significant winner, a user decides to let the test run longer than planned. Their hypothesis is that the gain is smaller than expected and needs more visitors to reach significance.
When deciding whether to extend a test beyond the planned protocol, one should weigh the confidence interval more than the gain probability.
If the user wants to test longer than planned, we advise to only extend very promising tests. This means having a high best-scenario value (the right bound of the gain confidence interval should be high).
For instance, a gain probability of 99% with a confidence interval of [0.42%; 3.91%] is typical of a test that shouldn’t be extended past its planned duration: a great gain probability, but not a high best-case scenario (only 3.91%).
Consider that with more samples, the confidence interval will shrink. This means that if there is indeed a winner at the end, its best-case scenario will probably be smaller than 3.91%. So is it really worth it? Our advice is to go back to the sample size calculator and see how many visitors will be needed to achieve such accuracy.
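For a rough feel of what a sample size calculator does, here is a hedged sketch based on the classic normal-approximation formula. It is a generic textbook estimate, not AB Tasty’s calculator, and the baseline rate and minimal detectable effect below are purely illustrative assumptions.

```python
# Hedged sketch of a classic sample-size estimate (normal approximation);
# a generic textbook formula, not AB Tasty's calculator.
from scipy.stats import norm

def sample_size_per_variation(baseline_rate, relative_mde, alpha=0.05, power=0.8):
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)   # rate we hope to detect
    z_alpha = norm.ppf(1 - alpha / 2)         # two-sided significance
    z_power = norm.ppf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int((z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2) + 1

# Illustrative assumption: a 10% baseline rate and a 4% relative lift.
print(sample_size_per_variation(0.10, 0.04))  # roughly 90,000 visitors per variation
```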
Note: These numerical examples come from a simulation of A/A tests, selecting the failed ones.
Confidence intervals are the solution
Using the confidence interval instead of only looking at the gain probability will strongly improve decision-making. Even setting aside false positives, it matters for the business: every variation needs to cover the cost of its implementation in production. Keep in mind that the original is already there and has no additional cost, so there is always an implicit, practical bias toward the original.
Any optimization strategy should have a minimal threshold on the size of the gain.
Another type of problem may arise when testing more than two variations, known as the multiple comparison problem. In this case, a Holm-Bonferroni correction is applied.
Why AB Tasty chose the Bayesian approach
Wrapping up, which is better: the Bayesian vs. frequentist method?
As we’ve seen in the article, both are perfectly sound statistical methods. AB Tasty chose the Bayesian statistical model for the following reasons:
Using a probability index that corresponds better to what the users think, and not a p-value or a disguised one;
Providing confidence intervals for more informed business decisions (not all winners are worth pushing to production). It’s also a means to mitigate false positive errors.
At the end of the day, it makes sense that the frequentist method was originally adopted by so many companies when it first came into play. After all, it’s an off-the-shelf solution that’s easy to code and can be easily found in any statistics library (this is a particularly relevant benefit, seeing as how most developers aren’t statisticians).
Nonetheless, even though it was a great resource when it was introduced into the experimentation field, there are better options now — namely, the Bayesian method. It all boils down to what each option offers you: While the frequentist method shows whether there’s a difference between A and B, the Bayesian one actually takes this a step further by calculating what the difference is.
To sum up, when you’re conducting an experiment, you already have the values for A and B. Now, you’re looking to find what you will gain if you change from A to B, something which is best answered by a Bayesian test.
Statistical significance is a powerful yet often underutilized digital marketing tool.
A concept that is theoretical and practical in equal measure, statistical significance can be used to optimize many of your business’s core marketing activities (A/B testing included).
A/B testing is integral to improving the user experience (UX) of a consumer-facing touchpoint (a landing page, checkout process, mobile application, etc.) and increasing its performance while encouraging conversions.
By creating two versions of a particular marketing asset, both with slightly different functions or elements, and analyzing their performance, it’s possible to develop an optimized landing page, email, web app, etc. that yields the best results. This methodology is also referred to as two-sample hypothesis testing.
When it comes to success in A/B testing, statistical significance plays an important role. In this article, we will explore the concept in more detail and consider how statistical significance can enhance the A/B testing process.
But before we do that, let’s look at the meaning of statistical significance.
What is statistical significance and why does it matter?
According to Investopedia, statistical significance is defined as:
“The claim that a result from data generated by testing or experimentation is not likely to occur randomly or by chance but is instead likely to be attributable to a specific cause.”
In that sense, statistical significance gives you the tools to drill down into a specific cause, thereby making informed decisions that are likely to benefit the business. In essence, it’s the opposite of shooting in the dark.
Calculating statistical significance
To calculate statistical significance accurately, most people use Pearson’s chi-squared test or distribution.
Invented by Karl Pearson, the chi-squared test (chi, χ, is a Greek letter) squares the differences between the observed and expected counts to measure how far your data deviates from what chance alone would produce.
This methodology is based on counts (whole numbers). For instance, chi-squared is often used to test marketing conversions: a clear-cut scenario where users either take the desired action or they don’t.
In a digital marketing context, people apply Pearson’s chi-squared method using the following formula:
Statistically significant = Probability (p) < Threshold (α)
Based on this notion, a test or experiment is viewed as statistically significant if the probability (p) turns out lower than the appointed threshold (α), also referred to as the alpha. In plainer terms, a test will prove statistically significant if there is a low probability that a result has happened by chance.
Statistical significance is important because applying it to your marketing efforts will give you confidence that the adjustments you make to a campaign, website, or application will have a positive impact on engagement, conversion rates, and other key metrics.
Essentially, statistically significant results aren’t based on chance and depend on two primary variables: sample size and effect size.
Statistical significance and digital marketing
At this point, it’s likely that you have a grasp of the role that statistical significance plays in digital marketing.
Without validating your data or giving your discoveries credibility, you will probably end up taking promotional actions that offer very little value or return on investment (ROI), particularly when it comes to A/B testing.
Despite the wealth of data available in the digital age, many marketers are still making decisions based on their gut.
While this shot-in-the-dark approach may yield positive results on occasion, to create campaigns or assets that resonate with your audience on a meaningful level, making intelligent decisions based on watertight insights is crucial.
That said, when conducting tests or experiments based on key elements of your digital marketing activities, taking a methodical approach will ensure that every move you make offers genuine value, and statistical significance will help you do so.
Using statistical significance for A/B testing
Now we move on to A/B testing, or more specifically, how you can use statistical significance techniques to enhance your A/B testing efforts.
Testing uses
Before we consider its practical applications, let’s consider what A/B tests you can run using statistical significance:
Email clicks, open rates, and engagement
Landing page conversion rates
Notification responses
Push notification conversions
Customer reactions and browsing behaviors
Product launch reactions
Website calls to action (CTAs)
The statistical steps
To conduct successful A/B tests using statistical significance (the chi-squared test), you should follow these definitive steps:
1. Set a null hypothesis
The null hypothesis is the statement you expect to return no significant result. For example, a null hypothesis might be that there is no evidence that your audience prefers your new checkout journey to the original checkout journey. This hypothesis or statement is used as an anchor or benchmark.
2. Create an alternative theory or hypothesis
Once you’ve set your null hypothesis, you should create an alternative theory, one that you’re looking to prove, definitively. In this context, the alternative statement could be: our audience does favor our new checkout journey.
3. Set your testing threshold
With your hypotheses in place, you should set a percentage threshold (the alpha, α) that will dictate the validity of your theory. The lower you set the threshold, the stricter the test will be. If your test is based on a wider asset such as an entire landing page, then you might set a higher threshold than if you’re analyzing a very specific metric or element like a CTA button, for instance.
For conclusive results, it’s imperative to set your threshold prior to running your A/B test or experiment.
4. Run your A/B test
With your theories and threshold in place, it’s time to run the A/B test. In this example, you would run two versions (A and B) of your checkout journey and document the results.
Here you might compare cart abandonment and conversion rates to see which version has performed better. If checkout journey B (the newer version) has outperformed the original (version A), then your alternative theory or hypothesis is supported.
5. Apply the chi-squared method
Armed with your discoveries, you will be able to apply the chi-squared test to determine whether the actual results differ from the expected results.
By applying chi-squared calculations to your results, you will be able to determine whether the outcome is statistically significant (i.e. whether your p-value is lower than your α threshold), thereby gaining confidence in your decisions, activities, or initiatives.
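As a minimal sketch (assuming SciPy and purely hypothetical checkout numbers), this is roughly what the chi-squared calculation looks like in practice:

```python
# Minimal sketch of applying Pearson's chi-squared test to A/B results.
# The visitor and conversion counts below are hypothetical.
from scipy.stats import chi2_contingency

# Rows: checkout journeys A and B; columns: converted, did not convert.
observed = [
    [420, 4580],   # version A: 420 conversions out of 5,000 visitors
    [495, 4505],   # version B: 495 conversions out of 5,000 visitors
]

chi2, p_value, dof, expected = chi2_contingency(observed)

alpha = 0.05                       # threshold chosen before the test
print(f"p-value: {p_value:.4f}")
print("Statistically significant" if p_value < alpha else "Not significant")
```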
6. Put theory into action
If you’ve arrived at a statistically significant result, then you should feel confident transforming theory into practice.
In this particular example, if our checkout journey theory shows a statistically significant relationship, then you would make the informed decision to launch the new version (version B) to your entire consumer base or population, rather than certain segments of your audience.
If your results are not labelled as statistically significant, then you would run another A/B test using a bigger sample.
At first, running statistical significance experiments can prove challenging, but there are free online calculation tools that can help to simplify your efforts.
Statistical significance and A/B testing: what to avoid
While it’s important to understand how to apply statistical significance to your A/B tests effectively, knowing what to avoid is equally vital.
Here is a rundown of common A/B testing mistakes to ensure that you run your experiments and calculations successfully:
Unnecessary usage: If your marketing initiatives or activities are low cost or reversible, then you needn’t apply statistical significance to your A/B tests, as this will ultimately cost you time. If you’re testing something irreversible or which requires a definitive answer, then you should apply chi-squared testing.
Lack of adjustments or comparisons: When applying statistical significance to A/B testing, you should account for multiple variations or multiple comparisons (for example, with a correction such as the Holm-Bonferroni method mentioned earlier). Failing to do so will skew or narrow your results, rendering them unusable in some instances.
Creating biases: When conducting A/B tests of this type, it’s common to unwittingly introduce biases into your experiments, the kind that don’t consider the population or consumer base as a whole.
To avoid doing this, you must examine your test with a fine-tooth comb before launch to ensure that there aren’t any variables that could push or pull your results in the wrong direction. For example, is your test skewed towards a specific geographical region or narrow user demographic? If so, it might be time to make adjustments.
Statistical significance plays a pivotal role in A/B testing and, if handled correctly, will offer a level of insight that can help catalyze business success across industries.
While you shouldn’t rely solely on statistical significance for insight or validation, it’s certainly a tool that you should have in your digital marketing toolkit.
We hope that this guide has given you all you need to get started with statistical significance. If you have any wisdom to share, please do so by leaving a comment.
Note: This article was written by Hubert Wassner, Chief Data Scientist at AB Tasty.
Some of you may have noticed Google’s recent release of a free version of Google Optimize and wondered whether it will change the market for SaaS A/B testing tools such as AB Tasty.
Well, history tells us that when Google enters a market, the effects are often disruptive – especially when the tool is free, as with Google Analytics or Google Tag Manager. To be clear, this new offer is a free version of Google Optimize, with the premium version starting at around $150,000 per year. Also, note that neither the free nor the paid-for version of Google Optimize offers multi-page testing (e.g. testing consistency across a funnel) and that Google Optimize is not compatible with native applications.
Before going any further, a disclaimer: I’m the chief data scientist at AB Tasty, the leading European solution for A/B testing and personalization and, therefore, in direct competition with Google Optimize. Nevertheless, I’ll do my best to be fair in the following comparison. I’m not going to list and compare all features offered by the two tools. Rather, I’d like to focus on the data side of things – I’m a data scientist after all!
Let’s dig into it:
To me, Google Optimize’s first and main limitation is that it is based on Google Analytics’ infrastructure and thus doesn’t take the notion of visitor unicity into account. Google looks at sessions. By default, a session lasts 30 minutes and can be extended up to a maximum of 4 hours. This means that if a visitor visits a website twice with one day in between, or visits first in the morning and a second time in the evening, Google Analytics will log two different sessions and count them as two different visitors.
This way of counting has two immediate consequences:
Conversion rates are much lower than they should be. Perhaps a little annoying, but we can deal with it.
Gains are much more difficult to measure. Now, this is a real issue!
Let’s have a closer look…
Conversion rates are much lower
People will normally visit a website several times before converting. For one conversion, Google Analytics (and by extension Google Optimize) records several different sessions. Only the visit during which the visitor converted is recorded as a ‘success’. All the others are considered ‘failures’. Consequently, the success rate is lower as the denominator grows. For Google, conversion rate is based on visits instead of visitors.
You can put up with this limitation if you make decisions based on relative values instead of absolute values. After all, the objective of testing is first and foremost to gauge the difference, whatever the exact value. The Bayesian model for statistics used by Google Optimize (and AB Tasty) does this very well.
Say 100 visitors saw each variation, 10 converted on A and 15 on B.
Based on these numbers, variation A has a 14% chance of being the best, while variation B has an 86% chance.
Now say that these conversions occur after 2 visits on average. This doubles the number of trials and simulates a conversion rate per session instead of per visitor.
Results are very similar as there is just a 1% difference between the two experiments. So, if the goal is to see if there is a significant difference between two variations (but not the size of the difference), then taking the session as reference value works just fine.
NB: This conclusion stays true as long as the number of visits per unique visitor is stable across all variations – which is not certain.
It’s impossible to measure confidence intervals for gain with the session approach
Confidence intervals for gain are crucial when interpreting results and in making sound decisions. They predict worst and best case scenarios that could occur once changes are no longer in a test environment.
See results below for the same sample as previously:
100 visits, 10 successes on variation A
100 visits, 15 successes on variation B
This curve shows the probability distribution of the real value of the gain linked to variation B.
The 95% confidence interval is [-0.05; +0.15], which means that with a 95% confidence rate, the actual value of the gain is above -0.05 and below +0.15.
Since the interval is mostly positive, we can draw the same conclusion as before: B is probably the winning variation, but there is still some doubt.
Now let’s say that there are 2 visits before conversion on average. The number of trials is doubled, like previously – this is the kind of data Google Optimize would have.
Here is the curve showing the probability distribution of the real value of the gain.
This distribution is much narrower than the other, and the confidence interval is much smaller: [-0.025; +0.08]. It gives the impression that it’s more precise – but as the sample is the exact same, it’s not! The bigger the number of sessions before conversion, the more striking this effect would be.
The root of the problem is that the number of sessions for a unique visitor is unknown and varies between segments, business models and industries. Calculating a confidence interval is, therefore, impossible – although it’s essential we draw accurate conclusions.
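To illustrate this narrowing effect with a quick simulation (a hedged sketch reusing the Bayesian sampling idea from earlier in this article, not Google’s or AB Tasty’s actual computation):

```python
# Hedged sketch of the effect described above: the same 10 vs. 15 conversions,
# counted per visitor (100 trials each) and then per session (200 trials each,
# assuming 2 visits per conversion on average).
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

def gain_interval(trials_a, succ_a, trials_b, succ_b):
    a = rng.beta(1 + succ_a, 1 + trials_a - succ_a, n)
    b = rng.beta(1 + succ_b, 1 + trials_b - succ_b, n)
    return np.percentile(b - a, [2.5, 97.5])   # 95% interval on the absolute gain

print(gain_interval(100, 10, 100, 15))  # visitor-based: close to [-0.05; +0.15]
print(gain_interval(200, 10, 200, 15))  # session-based: close to [-0.025; +0.08]
```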
To conclude, the session-based approach promises to identify which variation is best but doesn’t help estimate gain. To me, this is highly limiting.
Then, why has Google made this (bad) choice?
To track a visitor over multiple sessions, Google would have to store the information server-side, and it would represent a huge amount of data. Given that Google Analytics is free, it is very likely that they try to save as much storage space as they can. Google Optimize is based on Google Analytics, so it’s no surprise they made the same decision for Google Optimize. We shouldn’t expect this to change anytime soon.
I’d say Google Optimize is very likely to gain substantial market share with small websites. Just as they chose Google Analytics, they will go for Google Optimize because it’s free. More mature websites tend to see conversion rate optimization as a game changer and generally prefer technology that can provide more accuracy – results based on unique visitors, real customers.
Overall, the introduction of Google Optimize represents a great opportunity for the market as a whole. As the tool is free, it will likely speed up awareness and optimization skills across the digital industry. Perhaps even the general understanding of statistics will increase! As marketers put tests in place and realize results don’t always follow outside the testing environment, they may very well look for more advanced and precise solutions.