According to research by UserTesting, up to 80% of A/B tests fail to produce statistically significant results. In other words, only about one in five tests yields a clear winner.
Say you’re a chef trying out a new recipe. The first time you make it, it turns out perfectly delicious, just as you hoped. However, if you only cook it once, can you say it’s a foolproof recipe? What if the next three times you make it, it’s a disaster?
That’s where statistical significance comes into play in CRO. A single ‘successful’ test doesn’t give you the complete picture. Just like you’d need to cook that dish multiple times to confirm it’s a winner, you need statistically significant results to confidently say your CRO changes are effective.
What Does “Statistically Significant” Mean?
Statistical significance is a measure that quantifies the likelihood that the results you observe in your CRO tests are not due to random chance.
In simpler terms, it helps you determine whether the changes you made to your website—like altering a call-to-action button or revising product descriptions—have genuinely improved conversions or if you’re just seeing a temporary spike or dip.
In the context of Conversion Rate Optimization, achieving statistical significance means that you can be confident that your test results are reliable and repeatable. It’s not a one-off or a fluke; it’s a change that, when implemented, will likely continue to deliver improved performance.
Understanding and considering statistical significance is not just a best practice; it’s necessary for anyone serious about optimizing their website’s performance. It’s the difference between making informed decisions and shooting in the dark.
Frequentist vs. Bayesian Approach To Analyzing CRO Results
When it comes to analyzing CRO results, two primary methodologies are often discussed: the Frequentist and Bayesian approaches. While both aim to provide a framework for making data-driven decisions, their underlying philosophies and methods differ.
The Frequentist approach is what many people first encounter in introductory statistics courses. It relies on fixed sample sizes and formal hypothesis testing. In this method, you set up null and alternative hypotheses.
You then collect data and use a statistical test to decide whether to reject the null hypothesis at a predetermined significance level, commonly 0.05 (for 95% confidence) or 0.01 (for 99% confidence).
Let’s say you’re testing two headlines for your product page to see which drives more clicks. You decide to run the test until you have data from 1,000 visitors for each headline. At the end of the test, you use a statistical formula to determine if one headline is significantly better than the other.
This approach is straightforward but rigid. You set your sample size and confidence level (often 95% or 99%) beforehand, and the test concludes once those criteria are met. There’s no room for adapting as you go along.
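To make this concrete, here is a minimal sketch of the fixed-horizon test in Python, using a standard two-proportion z-test. The visitor and conversion counts are hypothetical, chosen only to illustrate the headline example above.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test; returns (z statistic, p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis (no difference)
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical results: 1,000 visitors saw each headline
z, p = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")  # p < 0.05, so the null is rejected at 95% confidence
```

Note that in the strict frequentist framework, this calculation is run once, only after the predetermined 1,000 visitors per variant have been collected.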
On the other hand, the Bayesian approach takes a more dynamic view of probability. Instead of fixed sample sizes and rigid hypothesis testing, this method updates the probability estimate for the hypothesis as more data becomes available.
Let’s say you’re testing the same two headlines, but this time using the Bayesian approach. After the first 100 visitors, you notice that one headline is performing better. The Bayesian method updates the probability that this headline is genuinely more effective, rather than simply benefiting from early random variation. As more data comes in, these probability estimates continue to update.
This approach is more flexible and adapts to new information. It’s particularly useful in real-world scenarios where you’re continuously collecting data and want the option to make decisions before reaching a fixed sample size.
Why Does This Matter?
Understanding the differences between these two approaches is important for anyone involved in CRO. With its fixed sample sizes, the Frequentist approach may be more straightforward to implement but can be rigid. The Bayesian approach offers more flexibility but requires a deeper understanding of statistical models.
Choosing the right approach depends on your needs, the nature of your tests, and the available resources.
When you understand the nuances of these methodologies, you’ll be better equipped to interpret your CRO results accurately and make more informed decisions.
How to Determine if Your CRO Results Are Statistically Significant
Gather Enough Data
First things first, you need data—enough of it to make a reliable conclusion. Think of this as gathering ingredients for a recipe; skimping on quality or quantity will affect the final dish.
In CRO terms, this means tracking conversions, user behavior, and other relevant metrics over a sufficient period. The more data you have, the more confident you can be in your results.
Tools like Google Analytics can be invaluable here. Make sure to track the metric you’re trying to improve (like conversion rate) and any other metrics that could be affected (like bounce rate or average time on page).
Set Up Hypotheses
Once you have your data, you can set up your hypotheses. You’ll need a null hypothesis, which is your baseline assumption that any changes you make won’t have an effect.
For example, “Changing the call-to-action button color will not affect conversion rates.” Then, you’ll also have an alternative hypothesis you want to prove, like “Changing the call-to-action button color will increase conversion rates.”
These hypotheses give you a clear framework for what you’re testing.
Choose a Significance Level
Before running the test, decide on a significance level, usually denoted by alpha (α). Common choices are 0.05 for 95% confidence or 0.01 for 99% confidence. This level will serve as the threshold for determining whether your results are statistically significant.
Selecting a significance level is essentially choosing how much risk you’re willing to accept. A significance level of 0.05 means you’re willing to accept a 5% chance of making a Type I error, which is rejecting a true null hypothesis.
On the flip side, a significance level of 0.01 means you’re only willing to accept a 1% chance of making that error, making your test more stringent.
It’s crucial to set this level before you start your test. Changing your significance level mid-way through testing can skew your results and undermine the integrity of the entire process.
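One way to build intuition for what α = 0.05 really means is a quick simulation: run many A/A tests, where both “variants” are identical, and count how often the test falsely declares a winner. The traffic numbers below are arbitrary, chosen only to keep the simulation fast.

```python
import random
from math import sqrt
from statistics import NormalDist

def aa_false_positive_rate(true_rate=0.10, n=500, trials=1000, alpha=0.05, seed=7):
    """Simulate A/A tests (identical variants) and count false positives."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(trials):
        # Both 'variants' share the exact same true conversion rate
        conv_a = sum(rng.random() < true_rate for _ in range(n))
        conv_b = sum(rng.random() < true_rate for _ in range(n))
        pooled = (conv_a + conv_b) / (2 * n)
        se = sqrt(pooled * (1 - pooled) * (2 / n))
        if se == 0:
            continue
        z = (conv_b - conv_a) / (n * se)
        p = 2 * (1 - NormalDist().cdf(abs(z)))
        false_positives += p < alpha
    return false_positives / trials

print(aa_false_positive_rate())  # hovers around 0.05, just as alpha predicts
```

Roughly one A/A test in twenty comes back “significant” even though nothing changed, which is exactly the 5% Type I error risk you accepted when you chose α = 0.05.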
Run Tests and Interpret the P-Value
Now comes the nitty-gritty: performing statistical tests to evaluate your hypotheses. You can use various tests, like the chi-square test for categorical data or the t-test for continuous data.
You’ll compare the performance of your control group (no changes) to your test group (with the changes). There are plenty of software tools designed to run these tests for you. FigPii, Optimizely, VWO, and even some advanced features in Google Analytics can handle the heavy lifting.
You input your data—how many people visited each version of your page, how many converted, etc.—and the software will run the test and spit out a p-value for you.
The p-value is the probability of seeing results at least as extreme as yours if your change actually had no effect (that is, if the null hypothesis were true).
Finally, you’ll need to interpret what these numbers mean. A low p-value (typically below 0.05) indicates that your results are statistically significant, meaning it’s very unlikely you’d see a difference this large by chance alone.
On the other hand, a high p-value suggests that you can’t be confident the changes had a real impact. But remember, “statistically significant” doesn’t always mean “practically significant.”
Even if the numbers say your new button color increases conversions, if the increase is only 0.01%, it might not be worth the resources to implement the change.
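The chi-square test mentioned above can be sketched directly for a simple control-vs-variant comparison. The traffic and conversion counts here are hypothetical, and the lift calculation at the end shows how to check practical significance alongside the p-value.

```python
from math import erfc, sqrt

def chi_square_2x2(conv_a, n_a, conv_b, n_b):
    """Chi-square test of independence for a 2x2 conversion table (1 degree of freedom)."""
    observed = [conv_a, n_a - conv_a, conv_b, n_b - conv_b]
    pooled = (conv_a + conv_b) / (n_a + n_b)
    expected = [n_a * pooled, n_a * (1 - pooled), n_b * pooled, n_b * (1 - pooled)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df, expressed via the normal distribution
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical results: 500 visitors each for control and variant
chi2, p = chi_square_2x2(conv_a=40, n_a=500, conv_b=70, n_b=500)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
lift = 70 / 500 - 40 / 500
print(f"Absolute lift: {lift:.1%}")  # both statistically and practically significant here
```

In this made-up case the result is significant and the lift is a full six percentage points, so it clears both bars; a significant result with a 0.01% lift would clear only the first.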
Why is Statistical Significance Important?
Here are reasons why statistical significance is a cornerstone of any successful conversion optimization strategy.
Without statistical significance, you risk making business decisions based on unreliable data. This could lead to wasted resources and a decline in conversions.
Knowing which changes are statistically significant allows you to allocate your resources more effectively. You can focus on implementing changes likely to have a lasting positive impact.
When you base your decisions on statistically significant results, it adds credibility to your CRO efforts. Stakeholders and team members are more likely to trust and invest in your optimization strategies.
Achieving statistical significance is akin to building on a solid foundation. It ensures that the improvements you see are likely to be sustainable over the long term rather than being short-lived anomalies.
Interview With A CRO Expert
We had the opportunity to sit down with a seasoned CRO expert, Andrew Marshall, to delve deeper into the intricacies of Conversion Rate Optimization. Here are some key takeaways that can help you navigate the complex landscape of CRO more effectively.
Ensuring Statistical Significance in Quick Decisions
Question: In a fast-paced business environment where quick decisions are often required, how do you ensure that your A/B tests reach a level of statistical significance that you’re comfortable with?
Andrew’s Answer: We use test calculators to pre-determine how long tests will need to run based on a client’s amount of traffic and conversions. While the test is running, A/B testing platforms provide indicators for current confidence levels. Ultimately, it’s up to you and your team to stay accountable to the data rather than making decisions based on instinct.
Advanced Statistical Methods
Question: Beyond basic p-value calculations and confidence intervals, what advanced statistical methods or tools do you employ?
Andrew’s Answer: Setting up additional goals and subgoals helps us understand user interactions with our experiments and subsequent pages. This ensures that not just one metric but overall conversions are increasing as well.
Seasonal Trends and External Events
Question: How do you account for seasonal trends, holidays, and external events that can skew A/B test results?
Andrew’s Answer: Testing during events like Black Friday has become easier in recent years. The buying period has stretched out, making the data less anomalous, especially if you’re consistently running other promotions throughout the year.
Question: Could you share an example where an A/B test yielded statistically significant results but didn’t translate into practical significance?
Andrew’s Answer: This happens occasionally, and usually, it’s because we failed to plan ahead and account for upcoming hurdles. Site redesigns, new product launches, and high-traffic promotions often skew or invalidate our results completely, but with proper planning and communication, we can avoid most of these complications.
Determining Optimal Sample Size
Question: How do you determine the optimal sample size for each test?
Andrew’s Answer: Calculators are a good starting point, but we also look at Google Analytics data. We try to run all of our tests for at least a week to account for fluctuations in traffic between weekdays and weekends. For example, running a test from Monday to Friday and then shutting it down wouldn’t tell you how it would perform on a Sunday morning.
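The test calculators Andrew mentions are typically built on a standard power calculation for comparing two proportions. A minimal sketch, assuming a two-sided test at 95% confidence and 80% power; the baseline rate and minimum detectable effect below are illustrative:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_variant(baseline, mde_abs, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant for a two-sided two-proportion test."""
    p1, p2 = baseline, baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return ceil(n)

# Hypothetical goal: detect a lift from a 5% to a 6% conversion rate
print(sample_size_per_variant(baseline=0.05, mde_abs=0.01))
```

Notice how sensitive the result is to the effect size: halving the detectable lift roughly quadruples the required traffic, which is why small improvements are so expensive to verify.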
In a nutshell, understanding statistical significance is important if you want to make informed decisions in Conversion Rate Optimization. It’s the yardstick that helps you measure whether the changes you’ve made to your website are genuinely effective or just a result of random chance.
However, it’s crucial not to get tunnel vision. Statistical significance is important, but it’s just one part of a much larger CRO puzzle. When interpreting your data and making decisions, you should also consider factors like customer feedback, market trends, and overall business objectives.