This article was originally published on the Programmatic Guide
Author: Mike Hans
Lift testing is one of the most commonly used methods for evaluating programmatic efficacy. It's common to see large programmatic budgets of $20M+, especially amongst DR marketers, distributed to vendors purely on the basis of incremental lift.
The premise is simple: split your audience into test (also known as treatment) and control groups, serve a PSA ad to the control group and your branded creative to the test group, measure their respective performance, and finally use the difference to calculate lift.
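The arithmetic behind that premise can be sketched in a few lines. This is an illustrative implementation, not a standard the article prescribes, and the conversion numbers are hypothetical:

```python
def incremental_lift(test_conversions, test_users, control_conversions, control_users):
    """Relative lift of the test (branded creative) group over the control (PSA) group."""
    test_cr = test_conversions / test_users
    control_cr = control_conversions / control_users
    return (test_cr - control_cr) / control_cr

# Hypothetical results: 1,800 conversions from 900k test users
# vs. 150 conversions from 100k control users.
lift = incremental_lift(1800, 900_000, 150, 100_000)
print(f"{lift:.0%}")  # → 33%
```

The 0.20% test conversion rate versus the 0.15% control rate yields a 33% relative lift; everything that follows in this article is about making sure that difference reflects the creative and nothing else.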
Note: some vendors offer ghost ads as an alternative to serving PSAs; this research paper provides a comprehensive look at that option and why it should be considered when available.
A Valid Approach With Design Flaws
This methodology has been pushed by marketers looking to prove ROAS, ad tech companies hoping to show that their attribution isn't claiming undue credit, and agencies wanting to demonstrate unbiased results. To be clear, lift testing makes sense, and marketers should be using objective measures of performance, but is it actually unbiased?
The fundamental principle of lift testing is that the test and control groups must hold all else equal, except for the creative. In programmatic media terms, this means targeting and buying variables such as geography, audience, inventory, price, duration, viewability, and much more need to match up in order to get clean results. Failure to keep all else equal introduces testing bias. As a simple example of testing bias: if the control has a much higher CPM than the test, it will achieve lower comparative reach, meaning less opportunity for control users to convert, thereby inflating the lift attributed to the test.
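To make the CPM example concrete, here is a hypothetical sketch. Both groups convert at the same true rate among exposed users (i.e., the creative has zero real effect), but the control pays a higher CPM, buys less reach, and therefore shows fewer conversions. All budgets, CPMs, and rates below are invented for illustration:

```python
def reach(budget_dollars, cpm):
    """Impressions a budget buys at a given CPM (assume one impression per user)."""
    return budget_dollars / cpm * 1000

def measured_lift(test_budget, test_cpm, control_budget, control_cpm,
                  test_size, control_size, exposed_cr):
    # Assume conversions come only from exposed users, and that the creative
    # has zero true effect: both exposed populations convert at the same rate.
    test_rate = min(reach(test_budget, test_cpm), test_size) / test_size * exposed_cr
    control_rate = min(reach(control_budget, control_cpm), control_size) / control_size * exposed_cr
    return (test_rate - control_rate) / control_rate

# 90/10 split of 1M users, proportional budgets, but the control's CPM is double:
# the test reaches 50% of its group while the control reaches only 25%.
lift = measured_lift(test_budget=2_250, test_cpm=5.0,
                     control_budget=250, control_cpm=10.0,
                     test_size=900_000, control_size=100_000,
                     exposed_cr=0.002)
print(f"{lift:.0%}")  # → 100%: phantom lift from price alone
```

A 100% "lift" appears even though user behavior is identical in both groups, which is exactly why CPM parity belongs in the test design.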
Dynamic media CPMs are just one of the many variables that make lift testing inherently susceptible to manipulation. Given the never-ending revelations about questionable programmatic practices, such as this damning Ad Exchanger report, marketers need to do everything possible to safeguard themselves. When it comes to lift testing, extreme care should be taken to ensure that tests are well designed, unbiased, and that the results are independently measurable and auditable. The following keys are designed to help marketers run effective and valid lift tests.
10 Keys To Success for Programmatic Lift Testing
Create test and control groups at random. If you're testing a pre-defined audience, such as a CRM file or an audience from your DMP, pre-split the audiences before sending to the DSP. If the DSP is doing a broader test/control group spanning tactics, be sure to vet their methodology.
Use a large enough control group. The industry standard is 90%/10% test/control, but that split can leave the control too small to yield significant results, or so large that PSA spend is wasted. Use a sample size calculator to set a baseline, then consider increasing it 2-3X to allow for match rates.
Confirm that each group is excluded from the other. This is an often-overlooked step that, if missed, can immediately muddy the results.
Plan a sufficiently long test. The duration should be at least as long as a full buying cycle (two would be ideal). Many marketers rush lift testing, leading to inconclusive results.
Set clear media expectations for both groups and introduce measurement standards to ensure all else stays equal. Forge Group recommends that CPMs, viewability rates, brand safety standards, ad fraud rates, frequency, and domain reporting match up between both test & control to avoid manipulation. Any deviation will introduce significant bias and render the end results questionable at best.
Be sure to test only one variable - creative. While it can be tempting to look at many other variables, true lift testing only tests branded creative versus PSA creative. The utmost care needs to be taken that all other variables remain equal. Set up additional testing scenarios (A/B testing in a test & learn framework), using these same best practices, if you would like to evaluate other variables.
Ensure you have complete, contractual transparency with the DSP across: price, media, data, and service (read more here). Without full transparency it will be impossible to access the necessary data to validate the results.
Before testing, notify the DSP that full log data will be required to validate results. Forge Group recommends pre-defining all data that should be included in the logs to avoid downstream frustration.
Monitor optimizations via access to change logs and daily summaries from the service team.
Use a 3rd party measurement tool, such as your ad server or MTA partner, as the measurement system of record. Results should directionally align with the DSP, but a neutral, 3rd party should be the ultimate source of truth.
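The first two keys above can be sketched in code: a deterministic, hash-based split assigns each user to test or control before any IDs are sent to the DSP, and a standard two-proportion power calculation sets the baseline group size. The salt, conversion rate, expected lift, and power below are illustrative assumptions, not figures from the article:

```python
import hashlib
from math import ceil, sqrt
from statistics import NormalDist

def assign_group(user_id: str, control_pct: float = 0.10, salt: str = "lift-test-q1") -> str:
    """Deterministically bucket a user into control or test (repeatable and auditable)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # uniform in [0, 1]
    return "control" if bucket < control_pct else "test"

def min_group_size(baseline_cr: float, expected_lift: float,
                   alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-group sample size for a two-proportion z-test."""
    p1, p2 = baseline_cr, baseline_cr * (1 + expected_lift)
    p_bar = (p1 + p2) / 2
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    n = ((z_a * sqrt(2 * p_bar * (1 - p_bar))
          + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / (p2 - p1) ** 2
    return ceil(n)

# Assumed 0.15% baseline conversion rate, hoping to detect a 20% lift;
# then apply the article's 2-3X headroom for match rates.
n = min_group_size(baseline_cr=0.0015, expected_lift=0.20)
print(n, "per group;", n * 3, "with 3X headroom")
```

Because the split is salted and hashed rather than random-at-serve-time, anyone with the ID list can independently reproduce and audit the assignment, which supports keys 1, 3, and 10.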
A Call To Action
Lift testing is a valuable tool to help marketers evaluate vendors and allocate dollars. Due to a lack of transparency, many marketers openly question the validity of previous lift tests, and they should. Using the 10 Keys listed above, consider designing new lift tests to establish a baseline understanding of where to invest future budgets. Forge Group sees a more rigorous approach to lift testing as a key way for brands to improve programmatic performance and reduce waste in 2018.
Forge Group is a consultancy that helps brands maximize value from programmatic and digital marketing technologies.
Driven by a belief in absolute transparency, and with a deep background in ad tech, Forge Group delivers Programmatic Transformation Consulting to marketers across industries and verticals.