[BackChannel] Why You Should Test (Almost) Everything
BY Benjamin Simon and Jim Pugh | Monday, May 6, 2013
techPresident's Backchannel series is an ongoing conversation between practitioners and close observers at the intersection of technology and politics. Jim Pugh is the CEO of ShareProgress. He previously ran the digital analytics program at Organizing for America.
Ben Simon was formerly the director of new media campaigns at the DNC and OFA, and he is currently working as an independent consultant.
One of the best things to come out of the post-campaign coverage of OFA 2012 has been a renewed focus on analytics -- and in particular on randomized testing and experimentation -- as a crucial part of any good digital program. It’s something we’ve both been preaching for years, and we're excited to see its growing proliferation beyond just the largest programs.
Randomized testing is an incredibly valuable tool. It lets you use data to determine which messages resonate the most and drive your supporters to take action, rather than needing to make guesses based on your gut instinct (which we can attest will often be inaccurate). Applying the results of these tests can increase the impact of your digital program by a substantial margin.
However, it's important to recognize that there's an opportunity cost associated with any test you run. Even the simplest email subject line test requires time and effort to plan and execute. And more complicated tests take even more work -- executing a 4x3 email test (four different emails, each with three subject lines, for twelve variants in all) requires writing four separate drafts, coding each one up separately, and analyzing a lot more response data.
There may be a credibility cost to testing as well. Many organization directors are still skeptical about the value of testing, so every experiment that takes time without yielding useful results could be a strike against the cause of testing more generally.
A good test can be well worth it -- and pay off handsomely by increasing the impact of your program. But to gain useful, actionable results, your experiment needs to provide you with enough data to see statistically significant differences between the approaches that you're testing -- ideally with 95% confidence or more.
If you're the Obama campaign, MoveOn.org, or Avaaz and have an email list of millions of people to contact, it won't be hard to collect enough responses to reach this threshold. But for smaller organizations with more limited reach, it may be much more difficult.
How can you check in advance to see if you'll reach statistical significance? Here's what you need to do:
- Identify what it is you're trying to maximize. If this is an email, it should be your ultimate action (donations or petition signatures, for example), rather than simply opens or clicks. If it's a webpage test, it should be whatever action you want people to take on the page.
- Figure out how many people you intend to reach with each different approach that you're testing.
- Estimate, based on past performance, what percentage of the people you're reaching you expect to take action. This percentage will be very different depending on what your action is (for example, you'll probably have a much lower action percentage when asking people to make a donation than when asking them to sign a petition).
- Make an educated guess about how much difference in response you might see between your approaches. Will one version get 5% more actions than another? 10%? 30%?
- Using your estimates from the previous steps, calculate how many actions you expect for each of your different approaches, then plug your reach and action numbers into a statistical significance calculator to estimate the expected confidence level of your results.
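If you'd rather not rely on an online calculator, the steps above can be sketched in a few lines of code. This is a rough pre-check using a standard two-proportion z-test (one common way to compare response rates between two versions -- the article doesn't prescribe a specific method, so this is our assumption), and all the numbers in the example -- list size, baseline action rate, expected lift -- are hypothetical placeholders you'd swap for your own estimates.

```python
import math

def expected_confidence(n_a, rate_a, n_b, rate_b):
    """Estimate the confidence level you'd see if the test came out
    exactly at your estimated action rates, via a two-sided
    two-proportion z-test."""
    # Expected number of actions for each version.
    actions_a = n_a * rate_a
    actions_b = n_b * rate_b
    # Pooled action rate across both versions.
    pooled = (actions_a + actions_b) / (n_a + n_b)
    # Standard error of the difference between the two rates.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = abs(rate_a - rate_b) / se
    # Two-sided p-value from the normal distribution (via erf).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))
    return 1 - p_value

# Hypothetical example: 10,000 recipients per version, a 1% baseline
# action rate, and a guess that one version will do 20% better (1.2%).
conf = expected_confidence(10_000, 0.010, 10_000, 0.012)
print(f"Expected confidence: {conf:.0%}")
```

In this hypothetical case the expected confidence comes out well under 95%, which tells you in advance that a list of that size probably can't distinguish a 20% lift on a 1% action rate -- exactly the situation where you'd skip the test and spend the time elsewhere.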
Is the difference between your approaches significant? If yes -- go ahead with your test! There's a good chance your results will be able to tell you which approach is best.
But if the answer's no? Then don’t run the test. Without statistical significance, it's very possible that the approach with the highest number of actions may not actually be the best one, and the test therefore doesn't provide you with useful information. The time it took to set up and run the experiment could have been more effectively spent elsewhere.