Why experts are often bad at predicting test results

Áine Doris | Sep 06, 2019

Companies such as Amazon, Google, and Uber run hundreds of online tests every month, and the results guide their business decisions and strategies. They also regularly make predictions about what’s worth testing.

But research by University of California at Berkeley’s Stefano DellaVigna and Chicago Booth’s Devin G. Pope finds that while test results are generally trustworthy, experts’ forecasts and predictions are not. “Our findings suggest that people are not as skilled at forecasting results and determining what can be generalized as they assume. And that should be serious food for thought for companies,” says Pope.

The researchers first wanted to determine the extent to which test results can be trusted and generalized. Can a company have confidence in a test when its results suggest that a certain type of online banner ad leads to increased sales? Or would those results change depending on the demographics of people tested or the parameters of the test?

To find out, building on their own previous research on experimental methodology, DellaVigna and Pope devised a simple A/B button-pushing task and made various alterations to test the robustness of the results. In some cases, they changed the demographics of the volunteers performing the task. In other cases, they changed the task itself, from a button-pushing exercise to a coding exercise. Ultimately, they find that the test produces similar results despite the variations.

But DellaVigna and Pope also wondered how well people who are presumably experts in the field would predict these results. They asked 70 behavioral experts, as well as economics PhD students and the participants who had performed the tasks, to estimate how the results would vary. “The experts have, at best, a mixed record in their ability to predict how much design changes affect the results,” says Pope.

This has significant implications for academics, whose decisions about what to study and how to advise students are affected by the robustness of research findings. Moreover, if experts can’t be trusted to predict and interpret study results, businesses may want to rethink how much trust they put in their own tests. Even if results are trustworthy, the underlying assumptions that guided the tests might not be.