For more than a decade, academic research has been mired in a “replication crisis,” in which the findings of thousands of published studies have proven to be difficult if not impossible to reproduce. Particularly in medicine and the social sciences, some widely cited papers have been discredited due to concerns about their methodology and conclusions, and tens of thousands of published papers have been withdrawn due to flaws, according to journalists Ivan Oransky and Adam Marcus, authors of the Retraction Watch blog.
This has implications beyond academia. In December 2020, the UK government scrapped unconscious-bias training for civil servants in England after concluding that there was little scientific evidence that it improved workplace equality. In the United States, the replication crisis has been cited in attempts to prevent tighter environmental regulations.
But while flawed, the current academic publishing system may be more valuable to policy makers and society at large than some proposed alternatives, according to research by Chicago Booth’s Alexander P. Frankel and University of Oxford’s Maximilian Kasy. They argue that instead of overhauling the system, journals could improve it by making more explicit the criteria they use to select papers for publication.
Many observers have blamed the replication crisis on publication bias, the tendency of journals to prioritize papers that have surprising results, or that upset conventional wisdom. While only a tiny minority of researchers may manipulate their results in an attempt to get published, the fear is that publication bias could prompt even well-intentioned researchers to engage in p-hacking, or data-dredging: parsing data until statistically significant (and therefore publishable) results come up.
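To see why data-dredging works, consider a researcher who tests many unrelated outcomes on the same data and reports whichever clears the conventional 5 percent significance bar. The simulation below is a hypothetical illustration (the function names and the choice of 20 outcomes are ours, not from the research); every effect is truly zero, yet "significant" findings appear routinely.

```python
import random
import math

random.seed(42)

def significant(n=100):
    """One test of a true null effect: draw n points from N(0, 1) and
    check whether the sample mean looks 'significant' at the 5% level."""
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(sample) / n) * math.sqrt(n)  # z-stat under known unit variance
    return abs(z) > 1.96

def dredge(n_outcomes=20):
    """A 'study' that tests 20 unrelated null outcomes and reports
    success if any one of them happens to clear the bar."""
    return any(significant() for _ in range(n_outcomes))

trials = 500
hits = sum(dredge() for _ in range(trials))
print(f"{hits}/{trials} dredged studies find something 'significant'")
```

Each individual test has only a 5 percent false-positive rate, but testing 20 outcomes pushes the chance of at least one spurious hit to roughly 1 − 0.95²⁰, or about 64 percent.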
In response, a few scholarly journals have moved to a system of registered reports, where they accept papers for publication on the basis of methodology, regardless of the final results. Others have removed asterisks from the tables in published papers—used to call attention to figures deemed statistically significant—so as not to overemphasize these results.
In theory, it might seem ideal to publish every single study regardless of results and let the good science rise to the top. But in reality, there are financial and practical constraints—while all researchers can post their papers on the internet, few of those studies will be seen unless they get published by journals.
This highlights the role of journal editors as gatekeepers. Frankel and Kasy built a model to understand the optimal way for editors to select papers, recognizing that whenever a publication sets criteria for what papers it will publish, there will be trade-offs.
The researchers’ model takes an instrumental perspective, in which the value of publication lies in informing policy makers and guiding their decisions. While much academic research is conducted without policy implications in mind, the model focuses exclusively on studies with clear potential impact. In the model, the most valuable papers are those that prompt policy makers to change policies. The more surprising the result, the greater the likelihood it will change policy makers’ opinions, and therefore the greater its value. By contrast, in this perspective, there is no value in publishing null results that confirm conventional wisdom, since those results wouldn’t lead to policy changes.
The researchers therefore propose that the criteria for publication should be based not on the statistical significance of the findings but on the extent to which those findings can move existing beliefs and shape new policies. Instead of comparing results with what would happen without any policy intervention at all, findings should be compared with the effect of existing policies or conventional wisdom, Frankel and Kasy argue.
When a paper’s results could affect a binary policy choice, such as whether or not to introduce a government program, they should be subject to a “one-sided test” that demonstrates their potential to change the policy decision, Frankel and Kasy recommend. When it comes to what the researchers call “continuous” policy issues, such as choosing a tax rate, Frankel and Kasy recommend publishing papers that pass a “two-sided test”—for example, the results imply an optimal rate that is either far above or far below the existing tax rate.
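The contrast between the two tests can be sketched in code. This is a stylized illustration, not Frankel and Kasy's actual model; the function names, thresholds, and example numbers are all hypothetical.

```python
def publish_binary(estimated_effect, z_stat, threshold=1.64):
    """One-sided rule for a yes/no policy whose default is 'no' (e.g.,
    whether to introduce a program): only results suggesting the program
    works could flip the decision, so only those are worth publishing.
    The threshold is illustrative."""
    return estimated_effect > 0 and z_stat > threshold

def publish_continuous(estimated_optimum, status_quo, min_gap=2.0):
    """Two-sided rule for a continuous policy (e.g., a tax rate):
    publish if the estimated optimum is far from the status quo in
    either direction, since a surprise on either side would move
    policy. The gap size is illustrative."""
    return abs(estimated_optimum - status_quo) >= min_gap

# A finding that the optimal tax rate is 21% when the current rate is
# 20% confirms the status quo and would not be published; an estimate
# of 30% would be.
print(publish_continuous(21.0, 20.0))  # → False
print(publish_continuous(30.0, 20.0))  # → True
```

Note the asymmetry: the binary rule discards one entire tail (a finding that a never-adopted program doesn't work changes nothing), while the continuous rule discards only the middle, where the finding is too close to current practice to move it.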
The proposed system is meant to be self-correcting, Frankel and Kasy say, because researchers would still have an incentive to overturn inaccurate findings.