Why Experts Are Often Bad at Predicting Test Results

By Áine Doris
September 06, 2019
CBR - Behavioral Science

Companies such as Amazon, Google, and Uber run hundreds of online tests every month, with results guiding their business decisions and strategies. They also regularly make forecasts and predictions about what’s worth testing.

But research by University of California at Berkeley’s Stefano DellaVigna and Chicago Booth’s Devin G. Pope finds that while test results are generally trustworthy, experts’ forecasts and predictions are not. “Our findings suggest that people are not as skilled at forecasting results and determining what can be generalized as they assume. And that should be serious food for thought for companies,” says Pope.

The researchers first wanted to determine the extent to which test results can be trusted and generalized. Can a company have confidence in a test when its results suggest that a certain type of online banner ad leads to increased sales? Or would those results change depending on the demographics of people tested or the parameters of the test?

To find out, building on their own previous research on experimental methodology, DellaVigna and Pope devised a simple A/B button-pushing task and made various alterations to test the robustness of the results. In some cases, they changed the demographics of the volunteers performing the task. In other cases, they changed the task itself, from a button-pushing exercise to a coding exercise. Ultimately, they find that their test produced similar results despite the variations.

But DellaVigna and Pope also wondered how good people who are presumably experts in the field would be at predicting these results. They asked 70 behavioral experts, as well as economics PhD students and the participants who had performed the tasks, to estimate how much the results would vary. “The experts have, at best, a mixed record in their ability to predict how much design changes affect the results,” says Pope.

This has significant implications for academics, whose decisions about what to study and how to advise students are affected by the robustness of research findings. Moreover, if experts can’t be trusted to predict and interpret study results, businesses may want to rethink how much trust they put in their own tests. Even if results are trustworthy, the underlying assumptions that guided the tests might not be.

Works Cited

Stefano DellaVigna and Devin G. Pope, “Stability of Experimental Results: Forecasts and Evidence,” Working paper, May 2019.
