How to improve randomized trials

Jeff Cockrell | Sep 04, 2020


Randomized experiments have long been a cornerstone of scientific research. And many tech companies run randomized tests to learn from the huge amounts of data their customers generate. In 2019, Google conducted nearly half a million experiments on web search alone. 

In randomized testing, researchers randomly assign different treatments to different people (or other experimental units) and analyze the outcomes, which in theory are independent of one another. 

But in practice, one participant’s assignment can affect how another participant behaves, a phenomenon known as interference. If Facebook assigns two users to different experimental conditions, and those users happen to be friends, the behavior of one could affect the other’s, undermining the social media platform’s ability to draw classical causal conclusions.  

While this can be problematic, identifying how experimental units interact, and testing how these relationships affect outcomes, can also make experiments more informative and valuable. Research by Chicago Booth principal researcher David Puelz, Booth’s Panos Toulis, Stanford’s Guillaume Basse, and University of California at Berkeley’s Avi Feller demonstrates a graph-based method that can help experimenters incorporate interference into their causal analyses. 

In their method, the researchers construct a graph showing each member of the experimental group on one axis and every possible combination of treatment assignments on the other. If a social media company wanted to test how promotional content affects user engagement, for example, all the users in the experiment would be arrayed on one axis and all the content options, or combinations of options, would be on the other. 
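As a loose illustration of that structure (the numbers, the "friendship ring," and the exposure rule below are invented for the sketch, not drawn from the researchers' paper), the graph can be pictured as a boolean matrix with experimental units as rows and candidate assignments as columns:

```python
import numpy as np

rng = np.random.default_rng(0)

n_units = 6          # experimental units (e.g., users)
n_assignments = 8    # candidate treatment-assignment vectors

# Each column is one possible assignment: 1 = treated, 0 = control.
assignments = rng.integers(0, 2, size=(n_units, n_assignments))

# A toy stand-in for interference: say a unit is "exposed" if it or its
# neighbor on a hypothetical friendship ring is treated.
neighbor = np.roll(np.arange(n_units), -1)
exposed = (assignments | assignments[neighbor]).astype(bool)

# The units-by-assignments graph: connect unit i to assignment j when
# unit i's exposure under assignment j matches what the hypothesis
# being tested fixes (here, "not exposed"), making its outcome usable.
null_graph = ~exposed
print(null_graph.astype(int))
```

The matrix's 1s mark the unit-assignment pairs from which a relevant subset can later be extracted; the real method defines exposure from the experiment's actual interference structure rather than this toy rule.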

The researchers then use an algorithm to identify a subset of experimental units and treatment assignments, which they dub a clique, that are relevant to the hypothesis being tested. (For the social media example above, a clique might include females aged 18–29 with friends or followers who are exposed to a particular type of advertisement.) This clique is then used to conduct a randomization test—a statistical procedure used to determine how much variation should be expected between experimental units in each treatment group if the treatment has no effect. Any more variation than that suggests there’s a causal relationship between receiving the treatment and changes in the outcome. 

To illustrate how their method works, the researchers used data from a large-scale policing experiment in Medellín, Colombia. In the experiment, a different research team, led by Daniela Collazos of the Secretariat of Security of Bogotá, started with a list of 967 street segments (each one roughly a block) identified as hotspots for crime, and randomly selected 384 to receive an increased police presence for six months. Collazos’s team aimed to understand how additional policing would affect crime on the treated blocks themselves, as well as how this effect would spill over to other blocks that didn’t have increased policing. 

Puelz, Basse, Feller, and Toulis used their method to investigate how the treatment effect varied between blocks that didn’t receive the hotspot treatment but were within 125 meters of a block that did, and those that were at least 500 meters from a treated block. They created a graph with all of Medellín’s 37,000 street segments on one axis and about 10,000 possible combinations of policing assignments on the other. They then homed in on a specific clique of units and treatments relevant to their question. The analysis was complicated by the fact that treated blocks were heavily concentrated in Medellín’s city center. 

Using their technique, the researchers find evidence for a spillover effect, or “a decrease in crime of street segments surrounding an area with increased law enforcement/community policing,” which they note is consistent with the findings of prior research. 

Unlike more-traditional approaches to dealing with interference, which sometimes use models to estimate how different experimental assignments interact with each other, the graph-based approach doesn’t require experimenters to make modeling assumptions about how units affect one another. It can also be applied to a broad range of experimental settings, from e-commerce to policing to traffic congestion. “The method itself is completely general, and can be applied to virtually any setting where there is a randomized experiment and interference,” Puelz says.

That generality may be important for the growing number of businesses hoping to use randomized experiments to reduce the ambiguity around the effects of new decisions and ideas.