Landing Page Testing – The Ultimate Guide To Test Statistics
In this guide, I will explain three calculations that are critical to designing and evaluating your landing page tests:
- How many conversions you need before you can be confident in a given landing page’s conversion rate.
- How to determine whether you can be confident in the results of your landing page test.
- How to calculate in advance how many test versions your landing page can support, as well as how long it will take you to get to confident results.
Summary: Use In Case of Math Allergy
For those of you who are overwhelmed by statistics, here are some high-level rules of thumb:
Question: How many conversions do I need before I can be confident in a given landing page’s conversion rate?
Answer: Between 25 and 50 conversions are required to be somewhat confident in a given landing page’s reported conversion rate.
Question: Are the results of my landing page test statistically significant?
Answer: Typically, you need 25 to 50 conversions per test version to be somewhat confident in your test results.
Question: How many test versions can my landing page support, and how long will it take me to get confident results?
Answer: Use Marketo’s Online Testing Calculator! Or, take the number of conversions you get per day and divide it by 20. Then take your testing period in weeks. Multiply the two results together to get the number of versions you can confidently test in your testing period. (By the way, don’t be surprised that this is higher than 25 to 50 conversions per version, since this takes into account avoiding false negatives as well as false positives).
A note about creating and testing landing pages: According to MarketingSherpa, getting landing pages built and tested is one of the top five challenges faced by B2B marketers. If you face this challenge, check out Marketo Landing Pages. It lets you easily create landing pages with no IT or code, and then automatically rotate between test versions to learn which works best.
A note about testing other things: The following calculations apply just as well to testing things besides landing pages, like ad copy or emails. Just replace “page views” with “impressions” or “emails sent”, and “conversion rate” with “click through rate”.
Feel free to stop here, unless you really want to dig into the statistics…
Part 1: Confidence in Individual Statistics
If after a certain number of page views (say 500) I observe that a landing page has a conversion rate of 10%, how confident can I be that the “real” value would be close to 10%? (By “real”, I mean the value I would observe if I ran the test for a much longer time.) If statistically it could easily be somewhere between 0% and 20%, then I’m not confident. If it could be between 5% and 15%, I’m still not very confident. If it could be between 7.5% and 12.5%, then perhaps I can argue that I’m starting to get confident.
Therefore, for these calculations I will say that I am “confident” in a given conversion rate if there is a “sufficiently high” probability that the conversion rate I’ve seen so far is actually within ¼ of the “real” long-term conversion rate. Using ¼ is somewhat arbitrary; others might want to look for effects that are within 0.1 of the true rate.
There are two main ways we can calculate this, an easier way and a more accurate way. (Of course, when it comes to statistics, even the “easy” way is debatable!)
a. The Easier Way: Binomial Distributions
To calculate the confidence in a given conversion rate, we first need to estimate the standard deviation of the observed conversion rate.
Introduction to Standard Deviation
The standard deviation is often used in calculating a confidence interval. For example, let’s say I run a test with 400 page views and I get 40 conversions. In this case, the observed conversion rate is 10% with a standard deviation of 1.5% (I’ll explain how I got 1.5% in a minute). A standard deviation of 1.5% means that if I had another 400 page views, then there is a 68% probability that I’d see a conversion rate somewhere between 8.5% and 11.5% and a 95% probability I’d see a rate between 7% and 13%.
Where did I get those probabilities? Given a normal distribution (which is typically a valid approximation after 10-20 conversions), there is a 68% probability of being within 1 standard deviation of the calculated rate, and 95% probability of being within 2 standard deviations. Note: You can get these numbers yourself using a probability calculator (input -1, 1 between) or enter the following formula into Excel: =NORMSDIST(Z)-NORMSDIST(-Z) (using Z=1 or Z=2) which gives the cumulative probability of being between -Z and Z standard deviations from the mean.
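If you would rather check these probabilities outside of Excel, here is a minimal Python sketch. The function names normal_cdf and prob_within are illustrative, not part of any library; normal_cdf is meant to play the role of NORMSDIST.

```python
import math

def normal_cdf(z: float) -> float:
    """Cumulative probability of the standard normal (stand-in for Excel's NORMSDIST)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_within(z: float) -> float:
    """Probability of landing between -z and +z standard deviations of the mean."""
    return normal_cdf(z) - normal_cdf(-z)

print(f"{prob_within(1):.3f}")  # 0.683
print(f"{prob_within(2):.3f}")  # 0.954
```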
Now, because a conversion is a binary observation, we can use the Binomial Distribution to calculate the standard deviation. If we define p as the conversion rate we’ve seen to date (known as the sample proportion or observed value) and n as the number of page views, then the standard deviation σ is equal to:
$$\sigma = \sqrt{\frac{p(1-p)}{n}}$$
For example, if the conversion rate is 10% and we have 400 page views, the standard deviation is √(0.10 × 0.90 / 400) = 1.50%, so we would report the conversion rate as 10.0% ± 1.50%.
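The same arithmetic as a short Python sketch, assuming the binomial standard deviation of a proportion, sqrt(p(1-p)/n). The function name proportion_std_dev is made up for illustration.

```python
import math

def proportion_std_dev(p: float, n: int) -> float:
    """Standard deviation of an observed conversion rate p after n page views."""
    return math.sqrt(p * (1.0 - p) / n)

sigma = proportion_std_dev(0.10, 400)
print(f"{sigma:.2%}")  # 1.50%
```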
Calculating Confidence Using Hypothesis Testing
Now, say we want to know the probability that our true conversion rate is actually within ¼ of the 10% value (i.e. between 7.5% and 12.5%) when we have 500 page views and 50 conversions (which yields a standard deviation of 1.34%). This is the same as asking, “Am I confident in my observed value of 10%?”
We can test this using a hypothesis test. Basically, this means I will first try to show that the real long-term conversion rate is likely less than 12.5%, and next that it is likely greater than 7.5%.
To do this, first I’ll hypothesize that the real conversion rate is actually 12.5% (this is called the null hypothesis). Then I calculate the probability that I might get an observed value of 10% or less assuming the true value were 12.5%. If this is unlikely, then I can conclude that the null hypothesis is probably false (i.e. the true value is NOT 12.5%), which will lend weight to the alternative hypothesis, namely that the true conversion rate is less than 12.5%.
Got it? To calculate this probability, we use what is called a z-transformation:
$$z = \frac{X - p}{\sigma} = \frac{12.5\% - 10\%}{1.34\%} = 1.86$$
where X is the “true” conversion rate, p is the calculated rate, and σ (sigma) is the standard deviation defined above.
The purpose of the z-transformation is that the probability of getting a z-value of at least 1.86 is equal to the probability of getting an observed conversion rate of 10% or less after 500 page views, assuming the true conversion rate were 12.5%. Using =1‑NORMSDIST(Z) in Excel yields a probability of 3.1%. This tells us that there is less than a 5% chance that the null hypothesis is true. In other words, the real conversion rate is likely less than 12.5%.
However, we are not yet finished. We also need to calculate the probability that the real conversion rate is greater than 7.5%. This time the null hypothesis is that the real conversion rate is 7.5%, and disproving it will lend weight to the alternative that the true conversion rate is > 7.5%. We get the inverse of the last calculation:
$$z = \frac{7.5\% - 10\%}{1.34\%} = -1.86$$
Here, the probability of getting a value less than 7.5% is also 3.1%.
To finish, what is the probability of getting a Z value greater than -1.86 AND less than 1.86? This is given by the area between our test statistics: NORMSDIST(Z) – NORMSDIST(-Z) = 93.8%.
Put more simply, given the fact that the observed conversion rate was 10% after 500 page views, there is a 93.8% probability that the true conversion rate is between 7.5% and 12.5%.
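To make the two-step test above easier to repeat with your own numbers, here is a rough Python sketch. It assumes the normal approximation used in this section and the ¼ tolerance chosen earlier; the function names are illustrative.

```python
import math

def normal_cdf(z: float) -> float:
    """Stand-in for Excel's NORMSDIST."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def confidence_in_rate(conversions: int, page_views: int, tolerance: float = 0.25):
    """Return (z, probability that the true rate is within +/- tolerance of the observed rate)."""
    p = conversions / page_views
    sigma = math.sqrt(p * (1.0 - p) / page_views)
    z = tolerance * p / sigma          # same as (12.5% - 10%) / sigma in the example
    return z, normal_cdf(z) - normal_cdf(-z)

z, confidence = confidence_in_rate(50, 500)
print(f"z = {z:.2f}, confidence = {confidence:.1%}")  # z = 1.86, confidence = 93.8%
```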
There are different standards about what probability we need to be confident (aka “confidence level”). The choice depends on the cost of being wrong. In a drug test, being wrong can kill people so researchers aim for 99.99% confidence. In general statistics, researchers aim for 95%. In marketing, being wrong means we may lose some money, but nobody gets hurt. So for our purposes, we’ll say that 80% confidence is acceptable. Our calculated probability of 93.8% is above this threshold, so we conclude that we ARE confident that the true conversion rate is between 7.5% and 12.5%.
RULE OF THUMB: You will reach the 80% confidence level with this approach when you have about 25 responses. This is generally true regardless of your response rate.
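One way to sanity-check this rule of thumb is to hold the number of conversions at 25 and vary the response rate. The sketch below does that under the same normal approximation and ¼ tolerance; the rates tried and the function names are arbitrary choices for illustration.

```python
import math

def normal_cdf(z: float) -> float:
    """Stand-in for Excel's NORMSDIST."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def confidence_at(conversions: float, rate: float, tolerance: float = 0.25) -> float:
    """Confidence that the true rate is within +/- tolerance of the observed rate."""
    page_views = conversions / rate
    sigma = math.sqrt(rate * (1.0 - rate) / page_views)
    z = tolerance * rate / sigma
    return normal_cdf(z) - normal_cdf(-z)

# With 25 conversions, confidence stays close to 80% across very different rates.
for rate in (0.01, 0.05, 0.10):
    print(f"rate {rate:.0%}: confidence {confidence_at(25, rate):.1%}")
```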
b. Advanced: Chi-Square Distributions
You can skip this section unless you are a statistics purist. There is disagreement about whether Z-tests are valid for proportions (i.e. percentages), especially when the value of p is close to 0 — which can be common in marketing, especially with click through rates. A different approach that is universally accepted is to use a Chi-Square Test with Yates Correction. This approach tends to require more conversions before reaching statistical confidence.
We start with the following null hypothesis: The real click-through rate is equal to 0.75 times our calculated click-through rate.
Wikipedia gives this formula for the test statistic:
where our results are as shown:
Given this, we can use the following values to test whether the “true” result is equal to 3/4 of the calculated result:
Calculating this for a conversion rate of 10% with 500 impressions, we get:
$$\chi^2_{\text{Yates}} = 1.66$$
We need to calculate the probability value (p-value) that because of chance alone we would get a value of χ² as “extreme as or more extreme” than the one we got, assuming the hypothesis were true. The χ² statistic “normalizes” the calculation to the Chi-Square distribution, so we just need to calculate the area to the right of the value 1.66 on that distribution. (Note: Because we are testing only one variable, we should use a distribution with one degree of freedom.) Using a chi-square calculator or =CHIDIST(χ²,1) in Excel, we get a p-value of 19.8%.
This tells us that there is a 19.8% chance the hypothesis is true, or inversely, an 80.2% chance that the real conversion is within ¼ of the one we observed.
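For readers who want to reproduce the 1.66 and 19.8% figures, here is a hedged Python sketch. It assumes the standard 2x2 form of the Yates-corrected statistic and treats the hypothesized rate (7.5% of 500 impressions) as if it were a second sample of the same size. That reading is an assumption, chosen because it reproduces the numbers above. The function names are illustrative.

```python
import math

def yates_chi_square_2x2(a: float, b: float, c: float, d: float) -> float:
    """Yates-corrected chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2.0) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

def chi_square_p_value_1df(chi2: float) -> float:
    """Area to the right of chi2 with one degree of freedom (like CHIDIST(x, 1))."""
    return math.erfc(math.sqrt(chi2 / 2.0))

# Observed: 50 conversions, 450 non-conversions out of 500 impressions.
# Hypothesized (assumed second row): 0.75 * 50 = 37.5 conversions, 462.5 non-conversions.
chi2 = yates_chi_square_2x2(50, 450, 37.5, 462.5)
p_value = chi_square_p_value_1df(chi2)
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.1%}")  # chi-square = 1.66, p-value = 19.8%
```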
RULE OF THUMB: Generally, regardless of the response rate, you will reach the 80% confidence level with this approach when you have about 50 responses.
Part 2: Evaluating Landing Page Test Results
In this section, I will show you how to determine whether you can be confident in the results of your landing page test (or other types of tests). Suppose I used testing to compare two landing pages against each other. Here are my observed results (so far):
Landing Page 1
Page Views (n1): 1000
Conversion Rate (p1): 10%
Landing Page 2
Page Views (n2): 800
Conversion Rate (p2): 8%
Can I be confident that landing page 1 beats landing page 2? Again, there are two ways we can calculate this, an easier way using normal distributions, and a harder way using Chi-Square.
a. Using Normal Distributions
The exact calculation we use depends on a subtle distinction about how we approached our test. This distinction will tell us whether we need to do a “one-sided” test or a “two-sided” test.
One-Sided Tests vs. Two-Sided Tests
As its name implies, a one-sided test evaluates one thing: whether a champion landing page beats a challenger landing page. In contrast, a two-sided test evaluates two things: the likelihood that landing page 1 would beat landing page 2, as well as the likelihood landing page 2 beats landing page 1. A “two-sided” test is required if we did not hypothesize in advance whether one page (a champion) would beat the other (a challenger). Unfortunately, a two-sided test requires more conversions before it returns confident results. Note: it is not valid to simply pick the winner after the fact and call it the champion, since that introduces bias into the test.
The One-Sided Test: Champion vs. Challenger
In this example, let’s say that we supposed in advance that landing page 1 was the champion, so the only thing we want to calculate is the probability that the challenger (landing page 2) might beat the champion in the long run.
As before, we will use hypothesis testing. In a one-sided test, here are our hypotheses:
Null Hypothesis: There is no difference between the challenger and champion response rate (and the fact that we observed a difference is due to chance).
Alternative Hypothesis: The Champion has a higher response rate than the Challenger.
Again, we want to calculate the probability that the null hypothesis is true. If this has a low probability, then we can conclude that the null hypothesis is probably false and the alternative hypothesis, namely that the Champion will win in the long run, is true.
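To evaluate the null hypothesis, we again need to calculate a test statistic. Here is the test statistic we will use:
$$z = \frac{p_1 - p_2}{\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}}$$
Entering in the values for the champion and the challenger, we get:
$$z = \frac{0.10 - 0.08}{\sqrt{\frac{0.10 \times 0.90}{1000} + \frac{0.08 \times 0.92}{800}}} = 1.48$$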
Entering this z-value into =1-NORMSDIST(Z) yields a probability of 6.9%. This means there is a low probability the null hypothesis is true, or in other words, a 93.1% probability that the champion will beat the challenger. This is greater than our 80% standard, so we can confidently pick a winner.
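Here is a small Python sketch of the one-sided champion vs. challenger calculation. It assumes the unpooled standard error (each page contributes its own p(1-p)/n term), which is the form that reproduces the z-value of 1.48; the function names are illustrative.

```python
import math

def normal_cdf(z: float) -> float:
    """Stand-in for Excel's NORMSDIST."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_proportion_z(p1: float, n1: int, p2: float, n2: int) -> float:
    """z statistic for the difference between two observed conversion rates."""
    standard_error = math.sqrt(p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2)
    return (p1 - p2) / standard_error

z = two_proportion_z(0.10, 1000, 0.08, 800)
p_null = 1.0 - normal_cdf(z)  # same as =1-NORMSDIST(Z)
print(f"z = {z:.2f}, null probability = {p_null:.1%}, champion wins = {1 - p_null:.1%}")
# z = 1.48, null probability = 6.9%, champion wins = 93.1%
```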
USEFUL LINK: Split Test Calculator, Using Normal Distribution (note: this calculator uses a slightly different calculation than the one I show above).
The Two-Sided Test: Champion vs. Challenger
In this case, we still evaluate whether landing page 1 beats landing page 2, but we also need to include the probability that landing page 2 beats landing page 1.
The first calculation, whether page 1 beats page 2, is the same as the one-sided test. The z-value will be 1.48 and the probability that page 1 beats page 2 is 93.1%.
For the second calculation, here are our hypotheses:
Null Hypothesis: There is no difference between the response rate of landing page 1 and landing page 2 (and the fact that I calculated a difference is due to chance).
Alternative Hypothesis: Landing page 2 has a higher response rate than page 1.
This generates the inverse test statistic:
$$z = \frac{p_2 - p_1}{\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}} = -1.48$$
This is associated with a probability of 93.1%, meaning there is only a 6.9% chance that landing page 2 will beat landing page 1.
To get the complete results of the two-sided test, we need to know the probability that the null hypothesis is false in both cases. This is equal to 93.1% – 6.9% = 86.2%. So, in conclusion, there is an 86.2% chance that the two landing pages have DIFFERENT conversion rates, meaning we can still conclude with at least an 80% standard that landing page 1 beats landing page 2.
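The two-sided result can be sketched in Python as the area between -z and +z, which is the same as 93.1% minus 6.9%. As before, the unpooled standard error is an assumption and the function names are made up for illustration.

```python
import math

def normal_cdf(z: float) -> float:
    """Stand-in for Excel's NORMSDIST."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_sided_confidence(p1: float, n1: int, p2: float, n2: int) -> float:
    """Probability that the two pages have different long-run conversion rates."""
    standard_error = math.sqrt(p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2)
    z = abs(p1 - p2) / standard_error
    return normal_cdf(z) - normal_cdf(-z)

confidence = two_sided_confidence(0.10, 1000, 0.08, 800)
print(f"{confidence:.1%}")  # 86.2%
```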
b. Advanced: Using Chi-Square Distributions
As before, there is disagreement about the use of normal distributions for low response rates. Again, let’s use the more valid Chi-Square distribution with the Yates correction. (We will again see that this approach requires more conversions to get to the same level of confidence.)
The hypotheses are the same as when we use the normal distribution, and the test statistic is calculated the same way as section 1(b):
Plugging in the same numbers as before for landing page 1 and landing page 2, we get a Chi-Square with Yates correction statistic of 1.912.
Note: The Chi-Square test is naturally a two-sided test. We can simply plug this into CHIDIST(x,1) to get a probability of 16.7%. This says there is a 16.7% chance that the two landing pages have the same conversion rate, or an 83.3% chance they are different. Since this is above the 80% standard, we are confident we have a winner.
For a one-sided test in which we had picked a champion in advance, we can divide the 16.7% probability in half (about 8.3%) to yield a 91.7% chance that the champion will beat the challenger in the long run.
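The Chi-Square numbers for the two pages can be reproduced with the sketch below. It assumes the standard 2x2 form of the Yates-corrected statistic, with conversions and non-conversions for each page as the four cells (100 and 900 for page 1, 64 and 736 for page 2). The function names are illustrative.

```python
import math

def yates_chi_square_2x2(a: float, b: float, c: float, d: float) -> float:
    """Yates-corrected chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (abs(a * d - b * c) - n / 2.0) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

def chi_square_p_value_1df(chi2: float) -> float:
    """Area to the right of chi2 with one degree of freedom (like CHIDIST(x, 1))."""
    return math.erfc(math.sqrt(chi2 / 2.0))

chi2 = yates_chi_square_2x2(100, 900, 64, 736)
p_two_sided = chi_square_p_value_1df(chi2)
one_sided_confidence = 1.0 - p_two_sided / 2.0  # champion picked in advance
print(f"chi-square = {chi2:.3f}")                    # chi-square = 1.912
print(f"two-sided p-value = {p_two_sided:.1%}")      # two-sided p-value = 16.7%
print(f"one-sided confidence = {one_sided_confidence:.1%}")  # one-sided confidence = 91.7%
```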
USEFUL LINK: Split Test Calculator, Using Chi-Square (note, this calculator uses a 90% confidence level, so returns that the above test is not yet confident).
Part 3: How Many Landing Page Tests Can I Run?
a. Minimum Sample Size To Get Confident on Individual Statistics
We can work backward from the approach in Part 1(a) to estimate the minimum sample size needed to reach a particular level of confidence for an individual statistic. Let’s say that a represents the size of the effect we want to observe, expressed as a fraction of p (i.e. ±25% in the previous examples). Then the general formula for the Z stat is:
$$z = \frac{a p}{\sigma} = \frac{a p}{\sqrt{p(1-p)/n}}$$
If we square both sides, we get:
$$z^2 = \frac{a^2 p^2 n}{p(1-p)} = \frac{a^2 p n}{1-p}$$
Solving for n gives:
$$n = \frac{z^2 (1-p)}{a^2 p}$$
If we want 80% confidence then z=1.2816. (This is because the value of 1-NORMSDIST(1.2816) is 10%, and since the test is two sided, we need 1-10%-10% = 80%.) If p=10% and a=25% (like before), then we get:
$$n = \frac{1.2816^2 \times 0.90}{0.25^2 \times 0.10} \approx 236.5$$
This tells us that if we expect a conversion rate of 10%, then we need at least 237 page views (23.7 conversions) to be 80% confident that the observed rate is truly within 25% of the real rate.
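As a quick check of the 237 figure, here is a Python sketch of the sample-size formula n = z²(1-p) / (a²p) that follows from the normal approximation in Part 1(a). The function name min_page_views is illustrative.

```python
import math

def min_page_views(p: float, a: float, z: float) -> float:
    """Page views needed so the observed rate is within +/- a*p of the true rate."""
    return z ** 2 * (1.0 - p) / (a ** 2 * p)

n = min_page_views(p=0.10, a=0.25, z=1.2816)
print(math.ceil(n))            # 237 page views
print(f"{n * 0.10:.1f}")       # 23.7 conversions
```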
This can also tell us how long it will take to get confident. Say that after running for 3 weeks (21 days) we have 150 impressions and 15 conversions (for an observed conversion rate of 10%). We are not yet confident in this value, but can calculate how many days it will take to get confident:
$$\text{days} = \frac{237}{150 / 21} \approx 33.2$$
So, I need a total of 33.2 days of data, and I have 21 days, so I expect to reach confidence in another 12.2 days.
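The days-to-confidence arithmetic can be sketched the same way, assuming traffic continues at the rate observed so far. The function name days_to_confidence is hypothetical.

```python
def days_to_confidence(required_views: float, views_so_far: float, days_so_far: float):
    """Return (total days needed, additional days needed) at the current traffic rate."""
    views_per_day = views_so_far / days_so_far
    total_days = required_views / views_per_day
    return total_days, max(0.0, total_days - days_so_far)

total_days, extra_days = days_to_confidence(237, 150, 21)
print(f"{total_days:.1f} total days, {extra_days:.1f} more days")  # 33.2 total days, 12.2 more days
```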
b. Minimum Sample Size To Get Confident on Tests
The situation is a little more complicated if we want to calculate the minimum sample size required to be confident that we will observe a certain effect in a one-sided test between a champion and a challenger.
Let’s repeat the process of starting with the formula for z and solving for n.
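Our formula for z is:
$$z = \frac{p_1 - p_2}{\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}}$$
Again, let’s say a represents the ratio of the size of the effect we want to observe. We can therefore simplify the equation by saying that:
$$p_1 - p_2 = a p$$
Let’s simplify things further by saying each test version receives an equal number of page views, e.g. if the total number of page views across both versions is N, then n1=n2=N/2. Since there are two sources of variation, we use pooled standard deviation:
$$z = \frac{a p}{\sqrt{\frac{2 p (1-p)}{N/2}}} = \frac{a p}{\sqrt{\frac{4 p (1-p)}{N}}}$$
Squaring gives us:
$$z^2 = \frac{a^2 p N}{4 (1-p)}$$
Solving for N gives:
$$N = \frac{4 z^2 (1-p)}{a^2 p}$$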
where p is the expected proportion (e.g. click through rate), a is the size of the effect we want to be able to detect (expressed as a % of p), and z is the z-value for the confidence level (alpha) we want to achieve.
Unfortunately, this formula is still incomplete since it only takes Type I error into account (aka false positives). In other words, it gives the probability that we would incorrectly see a difference between the pages when in reality there was no difference. However, we need to take Type II errors (aka false negatives) into account as well, i.e. the probability we would incorrectly say there was no difference between the pages when in reality there was a difference. This is governed by the statistical power of the test, since power measures the probability of detecting a real effect. To account for it, we add a second z-value, Zβ, to the formula:
$$N = \frac{4 (Z_\alpha + Z_\beta)^2 (1-p)}{a^2 p}$$
Some common values of Zβ are:
- Zβ = 0, for a 50-50 chance of seeing the “effect” of the chosen size
- Zβ = 0.674 for only a 25% chance of missing the effect (75% power)
- Zβ = 0.841 for only a 20% chance of missing the effect (80% power)
For example, let’s say I have two landing pages that I’m testing against each other. I expect a conversion rate of 10% but I want to be able to determine an “effect” of 2.5 percentage points (i.e. be able to detect a difference of 7.5% versus 10%). Using 80% confidence for Za (1.2816) and Zβ (0.841) gives:
$$N = \frac{4 \times (1.2816 + 0.841)^2 \times 0.90}{0.25^2 \times 0.10} \approx 2{,}595$$
This means I need 2,595 page views (equivalent to 259 conversions), spread across the two test versions, to be confident that if I see a difference of 2.5% between my pages, then the difference is valid and real.
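To try other conversion rates or effect sizes, here is a hedged Python sketch of the power-corrected formula, N = 4(Za + Zβ)²(1-p) / (a²p), which is the form that reproduces the 2,595 figure. The function name total_page_views is illustrative.

```python
def total_page_views(p: float, a: float, z_alpha: float, z_beta: float) -> float:
    """Total page views across two equally split versions to detect an effect of a*p."""
    return 4.0 * (z_alpha + z_beta) ** 2 * (1.0 - p) / (a ** 2 * p)

N = total_page_views(p=0.10, a=0.25, z_alpha=1.2816, z_beta=0.841)
print(round(N))          # 2595 page views
print(int(N * 0.10))     # 259 conversions
```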
c. How Many Test Versions Can I Run?
Finally, we can calculate the number of test versions a given landing page can support. The z-statistic is similar to the prior section, although instead of dividing N by 2 we divide it by T, the number of test versions. Assuming we still want to do pairwise comparisons (i.e. compare one challenger to a champion), we still only have two sources of variation. This gives:
$$z = \frac{a p}{\sqrt{\frac{2 p (1-p)}{N/T}}}$$
where T is the number of test versions we want to run. Solving for N, and again adding the correction for Type II errors, gives:
$$N = \frac{2 T (Z_\alpha + Z_\beta)^2 (1-p)}{a^2 p}$$
If I is the number of impressions per day, and D is the number of days we run the test, then N=ID. Plugging this in gives:
$$I D = \frac{2 T (Z_\alpha + Z_\beta)^2 (1-p)}{a^2 p}$$
and solving for T:
$$T = \frac{I D a^2 p}{2 (Z_\alpha + Z_\beta)^2 (1-p)}$$
Alternately, I times p is the number of responses per day (e.g. clicks per day). If we say that R=I times p, we get:
$$T = \frac{D R a^2}{2 (Z_\alpha + Z_\beta)^2 (1-p)}$$
So, finally we have the formula that tells us how many test versions we can run!
For example, let’s assume that we have a Landing Page Test Group that gets R=20 conversions per day. Let’s continue to use a=25% and p=10%, and use Za=1.28 and Zb=0.841. In this case, how many landing page test versions can we run if we want significant results within 2 weeks (D=14)? Plugging this into the formula yields 2.2 tests.
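The first example can be checked with a short Python sketch. It assumes the formula T = D·R·a² / (2(Za + Zβ)²(1-p)), which is consistent with the 0.0077 × D × R shortcut used in the rule of thumb; the function name supported_versions is hypothetical.

```python
def supported_versions(days: float, responses_per_day: float, p: float = 0.10,
                       a: float = 0.25, z_alpha: float = 1.28, z_beta: float = 0.841) -> float:
    """Number of test versions that can reach confident results in the given period."""
    return days * responses_per_day * a ** 2 / (2.0 * (z_alpha + z_beta) ** 2 * (1.0 - p))

T = supported_versions(days=14, responses_per_day=20)
print(f"{T:.1f}")  # 2.2
```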
As another example, let’s say that we have 4 test versions and we get 20 conversions per day. How long will it take to get valid results? Plugging in the values to the equation and solving for D gives 19.5 days.
RULE OF THUMB: Using a=25% and p=10%, and using Za=1.28 and Zb=0.841, the formula reduces to T = 0.0077 x D x R. Divide D by 7 to turn it into W (weeks). This gives:
$$T \approx 0.054 \times W \times R \approx \frac{W \times R}{20}$$
Thus, to get the number of versions you can confidently test, take the number of conversions you get per day and divide it by 20. Then take your testing period in weeks. Multiply the two results together, and you’ll estimate the number of versions you can confidently test.
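Expressed as code, the rule of thumb is a one-liner. This sketch simply restates the divide-by-20 shortcut; the function name is made up for illustration.

```python
def versions_rule_of_thumb(conversions_per_day: float, weeks: float) -> float:
    """Rough number of test versions: (conversions per day / 20) x weeks of testing."""
    return conversions_per_day / 20.0 * weeks

print(versions_rule_of_thumb(20, 2))  # 2.0
print(versions_rule_of_thumb(60, 4))  # 12.0
```

Because the shortcut rounds 0.054 down to 1/20, it gives slightly more conservative answers than the full formula (2.0 versus 2.2 versions for 20 conversions per day over 2 weeks).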
This guide is sponsored by Marketo Landing Pages. Easily create multiple test versions of each landing page using our PowerPoint-like editor (no IT or code required). Then, have our system automatically rotate between the versions to test which one drives the most conversions. Check it out for yourself by signing up for our landing pages free trial!