Statistical Significance: most marketers have a rough idea of what it means; maybe you even knew the formula back in college. When I graduated as an economics major and landed in marketing, I was naive enough to think that every decision must be made with this calculation in mind…
I was wrong.
The vast majority of marketing decisions are made by gut feeling.
Why? Lack of data.
Sure, Google has enough traffic to A/B test the relative impact of 50 shades of blue in their text links (result: +$200m), but very few businesses have that much traffic. Even at Google, any decision outside of the core search experience probably doesn’t have enough data to reach a significant result. If they A/B tested the color of their employees’ ID badges, it’d take something in the region of 22 million times longer to conclude that test (1.2 trillion searches per year vs 54,000 employees).
Let me explain:
When you run an experiment, your results can sometimes be untrustworthy.
Flip a coin 10 times. Did you get exactly 5 heads and 5 tails? No? Wait, does that mean the coin is biased to one side? Of course not. Flip it 1000 more times and you’re much more likely to get a 50:50 result.
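You can watch this happen in a quick simulation (a sketch in Python; the exact counts depend on the random seed):

```python
import random

def heads_ratio(flips: int, seed: int = 0) -> float:
    """Flip a fair coin `flips` times and return the share of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    return heads / flips

print(heads_ratio(10))    # small samples swing wildly
print(heads_ratio(1000))  # large samples settle near 0.50
```

Run it a few times with different seeds: the 10-flip result bounces all over the place, while the 1,000-flip result barely moves.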
It works the same with marketing experiments.
That’s why we calculate statistical significance: to see how ‘trustworthy’ the result of our experiment is. Without diving too deep into the formula, statistical significance is a function of three things:
- Sample Size – more ‘observations’ (how many times the test is run) means a higher likelihood you’ll reach statistical significance. i.e. more coin flips.
- Difference – a big performance difference means a quicker conclusion. i.e. it’s faster to spot a coin weighted 75% to heads than to prove a coin will hit heads 50% of the time.
- Confidence – the level of certainty you need to accept the result – typically 95%. i.e. if there’s truly no difference, 1 in 20 tests will still throw up a ‘significant’ result by pure chance.
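For the curious, here’s roughly what a significance calculator does under the hood, sketched as a standard two-proportion z-test (the numbers in the example call are made up for illustration):

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # rate if both arms were identical
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# 500 vs 430 sign-ups out of 5,000 visitors each
p = two_proportion_p_value(500, 5000, 430, 5000)
print(p, "significant at 95%" if p < 0.05 else "not significant yet")
```

Shrink those samples to 500 visitors each and the same 10% vs 8.6% split is nowhere near significant — that’s the sample-size lever in action.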
To reach a statistically significant result you need a lot of data and/or a big impact. Even then you STILL have a 1 in 20 chance of being wrong…
Well that sucks. Why bother in the first place?
If you’ve ever spent time actually tracking the outcome of your decisions, you’ll see that you’re wrong more often than you’re right. When we looked at the outcomes of 330 marketing tests we ran at Ladder, we had a failure rate of 57%.
The industry average failure rate is even worse: 85%!
When we can’t trust our own judgement, it’s imperative that we use data to make better decisions… but how?
How do we still make a data-driven decision when we don’t have enough data?
My solution? Cascading Significance.
How Cascading Significance works
Cascading Significance means starting with your gut decision then building your confidence in the decision over time by actively gathering data.
Done correctly, you’ll get all the benefits of being scientific and data-driven without sacrificing the speed of making gut decisions.
To explain how this works, I’ll go through a real-world example: testing a new marketing channel.
Visualize your marketing funnel (i.e. the stages your users go through to become a customer). Using the example numbers we’ll work with below, it’ll look something like this:

- Visits website: 11,883
- Signs up: 1,000
- Buys product: 150
- Refers a friend: 15
If you wanted to run a test to get more people to complete sign-up, you’d be able to show the test to 11,883 people (your website visitors), giving you 1,000 observations (sign-ups).
Now if you wanted to run a test that increased purchases, you’d only be able to show the test to, at most, 1,000 people and would expect only 150 observations. That means a conversion test would take you 11x longer to reach a statistically significant result than an activation test!
It gets even worse further down the funnel, where only 1 in 10 customers refer a friend and you have to stick around for a month and a half to see whether a user is retained or not.
Now if you think about it, it would be easier to conclude a test at the top of this funnel: it’s relatively cheap (~$100) to get 10,000 people to see your Facebook ad, and you should expect around 116 clicks for that budget. When you’re paying $0.85 per observation (a click), it’s much easier to stomach the cost than when you’re paying $10 (a sign-up) or $67 (a purchase).
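Those per-observation costs fall straight out of the funnel numbers — a quick sanity check, assuming the $0.85 cost per click from above:

```python
# Funnel numbers from the example above
visitors, signups, purchases = 11_883, 1_000, 150
cost_per_click = 0.85  # each paid click becomes a website visitor

# Every observation further down the funnel costs the full traffic above it
cost_per_signup = cost_per_click * visitors / signups
cost_per_purchase = cost_per_click * visitors / purchases

print(f"${cost_per_signup:.2f} per sign-up")     # roughly $10
print(f"${cost_per_purchase:.2f} per purchase")  # roughly $67
```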
So back to our example of testing a new marketing channel.
If you insist that the customers need to be retained before you trust that marketing channel works for you, your test is going to take a very long time (and a lot of cash) to conclude.
Modeling Sample Size
Let’s run some numbers. #sorrynotsorry #mathisfun
With the above marketing funnel our conversion from ‘visits website’ to ‘buys product’ is only 1.26%. Now if we want the new marketing channel to be within 20% of the effectiveness of our existing channel, and we are willing to drive 1,000 visitors a day ($850 daily budget), how long would we have to run the test…?
Answer: 63 days!
That’s a testing budget of 63 x $850 = $53,550 in total just to find out if a new marketing channel ‘works’ for us or not!
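My model has a few extra assumptions baked in, but a textbook two-proportion sample-size formula gets you into the same ballpark. Here’s a sketch — I’m assuming 95% confidence and 80% statistical power for illustration; the exact day count shifts with those settings:

```python
from math import ceil
from statistics import NormalDist

def required_days(base_rate: float, rel_diff: float, daily_visitors: int,
                  confidence: float = 0.95, power: float = 0.80) -> int:
    """Days of traffic needed to detect a relative drop in conversion rate."""
    z_a = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_b = NormalDist().inv_cdf(power)
    p1, p2 = base_rate, base_rate * (1 - rel_diff)
    # Standard normal-approximation sample size per test arm
    n_per_arm = ((z_a + z_b) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2))
                 / (p1 - p2) ** 2)
    return ceil(2 * n_per_arm / daily_visitors)

print(required_days(0.0126, 0.20, 1_000), "days")  # around two months
```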
Note that you could cut the time in half by doubling the daily budget, but either way you’re still forking out more than $50k to get a clean test.
Worse, when you do get a statistically significant number of conversions, you’ll still need to wait around for months to see if they’re retained.
Who has the time or money for that? You need to make important business decisions and can’t afford to suffer from analysis paralysis while you wait for a clean test conclusion.
Applying Cascading Significance
The first step is to relax your requirements. Don’t go for the golden goose of 794 fully retained customers straight away.
First try seeing if you can even get customers to your website for 85 cents.
You only need a couple hundred dollars to test that hypothesis, and one of three things will happen:
- You’ll be nowhere near (more than 50% above your target cost) and you can safely move on
- You’ll be within range (anywhere from 50% above to 50% below target), which gives you confidence to go on
- You’ll be onto something big (more than 50% below target) and drop everything to double down
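That decision rule is simple enough to write down. A sketch, using the 50% bands from the list above as the thresholds:

```python
def next_step(observed_cost: float, target_cost: float) -> str:
    """Classify a channel test result against its target cost per observation."""
    ratio = observed_cost / target_cost
    if ratio > 1.5:
        return "nowhere near - move on"
    if ratio < 0.5:
        return "onto something big - double down"
    return "within range - proceed to the next funnel stage"

print(next_step(2.10, 0.85))  # far too expensive
print(next_step(0.95, 0.85))  # close enough to keep going
```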
If you get result 2 or 3, you go to the next step.
Can you even drive a single email sign-up for ~$10? If so, it’s time to crank up the spend and see if you can get a statistically significant result.
Maybe you don’t even worry about 95% significance at first: try hitting the 80% bar, then keep testing toward 95% if it’s still looking good.
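Relaxing confidence buys you real speed. With the same normal-approximation sample-size math as before (80% power assumed for illustration), dropping from 95% to 80% confidence cuts the required sample by roughly 40% for the same effect size:

```python
from statistics import NormalDist

def relative_sample_size(confidence: float, power: float = 0.80) -> float:
    """Sample size scales with (z_alpha/2 + z_beta)^2 for a fixed effect."""
    z_a = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_b = NormalDist().inv_cdf(power)
    return (z_a + z_b) ** 2

saving = 1 - relative_sample_size(0.80) / relative_sample_size(0.95)
print(f"~{saving:.0%} fewer observations at 80% confidence")
```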
Once you’re confident about the emails, you might be tempted to move on to conversions. However, the gap between activation and conversion is large (an 85% drop-off). To bridge that gap, we can look at indicator metrics.
Before converting, users will take actions on the website that indicate their intention to convert. It could be something as strong as looking at the pricing page or booking a demo, or as soft as simply not ‘bouncing’ from the site after sign-up or using the product in their first week.
You can cycle through these indicator metrics, building confidence the whole time, moving from soft to hard indicators until you’re ready to commit to a statistically significant conversion test.
Of course, from conversion you can then move to concluding based on whether the customers are retained in month one, month two, etc., until you’re finally fully confident in your channel.
You’ll still have spent the same amount to fully prove the channel, but you’ve given yourself a number of chances to opt out and deploy those resources elsewhere. You’ve been calculating and updating the odds the whole time.
It’s like counting cards, and just as effective if you do it right.
At every funnel stage you were checking significance, updating your beliefs and moving on. For the more technical, this is a form of Bayesian inference. But you don’t have to be technical to know this makes sense because it’s exactly how we make decisions in our everyday lives.
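For the curious, here’s what that updating can look like with a Beta-Binomial model — a sketch of the idea, not my exact process. You start with a prior belief about the conversion rate, then fold in each batch of results as it arrives:

```python
class ConversionBelief:
    """Beta-Binomial belief about a conversion rate, updated as data arrives."""

    def __init__(self, prior_conversions: float = 1, prior_misses: float = 1):
        self.a = prior_conversions  # Beta(1, 1) is a flat, know-nothing prior
        self.b = prior_misses

    def update(self, conversions: int, visitors: int) -> None:
        self.a += conversions
        self.b += visitors - conversions

    @property
    def mean(self) -> float:
        return self.a / (self.a + self.b)

belief = ConversionBelief()
for conversions, visitors in [(9, 1_000), (14, 1_000), (11, 1_000)]:
    belief.update(conversions, visitors)
    print(f"estimated conversion rate: {belief.mean:.3%}")
```

Each batch nudges the estimate, and the more data you accumulate, the less any single batch can move it — which is exactly the “building confidence over time” behavior cascading significance relies on.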
I give my friends the benefit of the doubt based on all our past shared experiences, but if they start repeatedly bailing on me or the quality of our time together drops, I’ll naturally start deciding to see them less and less. It may even lead to the ending of our friendship if my confidence in them drops enough or they drop the ball in a major way.
In short, I have optionality. I have other friends that I’d go see instead. Unless I didn’t, in which case I’d stick with them for a lot longer.
It works the same with cascading significance.
If the only hope for your startup is getting this marketing channel to work, then you’ll stick with it for as long as possible, even if you’re seeing negative results.
If you already have a bunch of marketing channels working and you were just toying with this channel on a whim, you don’t have to go that far down the rabbit hole; at the first sign of failure you can break off the test and deploy those resources elsewhere.
Normally you should be juggling several of these tests each week, cutting the ones that you’re least confident in and redeploying that budget to those that are showing the most promise.
That iterative process will eventually help you explore all the opportunities available to you without wasting an enormous amount of time and budget dogmatically sticking to the science.
It’ll help you avoid analysis paralysis and move at the speed your business needs you to move, without that guilty nagging feeling that you should be more data-driven.
Other Tips to Speed up Testing
Tip #1: Go Big Or Go Home
Remember that a bigger difference in performance is easier and quicker to spot than a smaller difference. Don’t waste time tweaking button colors – go radical with your designs instead. Once you find a broad ‘type’ of page that works, drill down further and further into elements in descending order of importance to eventually land on a fully optimized page.
Tip #2: Measure Twice, Cut Once
If you run a similar calculation to the one above for each test your team (or boss, or client) wants you to do, you can avoid a lot of heartache. You can reset their expectations on what’s possible and avoid tests that are doomed to fail or unlikely to move the needle from the start.
Tip #3: Success Is Optional
Once you’re armed with a model of the test, like the one I did above, it becomes very difficult to argue with you. Because you showed the initiative and smarts to debunk the original plan, you’ll be in the driver’s seat when suggesting what to do next. Use this power to flex your optionality and test the things you really should be testing.
At Ladder, we build software and offer services to help high-potential businesses accelerate their growth. Is that something you need?