Law and order and data

Will algorithms fix what’s wrong with American justice, or make things worse?

Credit: Noma Bar

Jeff Cockrell | Mar 01, 2021

As it was for so many things, 2020 was a strange year for crime. Cities around the United States saw their overall crime rates drop considerably, according to data collected by University of Pennsylvania’s David S. Abrams, a trend perhaps driven by the general decline of economic activity and face-to-face interaction amid the COVID-19 pandemic. At the same time, Abrams’s data also show rates of shootings and homicides in many cities that were at or even well above historical averages.  

But one aspect of criminal justice in the US was disturbingly familiar in 2020: the deaths of Black Americans at the hands of police. The murder of George Floyd in May and a series of other high-profile incidents fueled nationwide protests and calls for change, including from then presidential candidate Joe Biden, whose platform included expanding the powers of the Department of Justice “to address systemic misconduct in police departments and prosecutors’ offices.” 

President Biden’s platform also included aspirations to reduce the prison population and “root out the racial, gender, and income-based disparities in the [criminal justice] system,” reflecting a sense that even apart from policing, America’s justice system is bloated and unfair.

These demands for reform intersect with advancements in technology, namely software that uses algorithms and, in particular, machine learning to analyze huge masses of data and make predictions relevant to criminal outcomes. Many police departments, municipalities, and states are adopting these tools, or increasing their use of them. Some observers hope such tools will reduce the kind of racial inequity that has historically plagued justice in the US, but others worry they will only exacerbate those problems. 

Algorithms are already being used in criminal-justice applications in many places, helping decide where police departments should send officers for patrol, as well as which defendants should be released on bail and how judges should hand out sentences. Research is exploring the potential benefits and dangers of these tools, highlighting where they can go wrong and how they can be prevented from becoming a new source of inequality. The findings of these studies prompt some important questions such as: Should artificial intelligence play some role in policing and the courts? If so, what role should it play? The answers, it appears, depend in large part on small details.  

A persistently flawed system

Even before the pandemic, crime in the US had broadly been on the decline for decades. Data from the Federal Bureau of Investigation show a 49 percent drop in the violent crime rate from 1993 to 2019 and a 55 percent drop in the property crime rate. Survey data from the Bureau of Justice Statistics, which include both reported and unreported crimes, show even steeper downward trends. 

But however encouraging some aggregate trends may be, criminal justice in the US suffers from massive shortcomings—in terms of both keeping people safe and treating people fairly. Despite the improvement in its crime rates, relative to other developed countries, the US still faces significant crime-related challenges, especially when it comes to its homicide rate: according to the Organisation for Economic Co-operation and Development, the US had 5.5 reported homicides per 100,000 people in the latest year for which data are available, as compared with other wealthy countries such as Canada (1.3 reported homicides per 100,000 people), Germany (0.5), and the United Kingdom (0.2). 

Meanwhile, police departments in many US cities have been found to engage in practices that discriminate against people of color or other groups, and police brutality in the US, particularly against Black people, is a source of persistent social upheaval. Add to this the fact that among countries for which data are available, the US maintains what is by far the world’s biggest prison system, numerically dominated by prisoners of color.

Public outrage over problems such as these has led to an often rancorous discourse over whether, and how, the US should reform its approach to fighting crime. The conversations are wide ranging. They encompass concerns such as the militarization of police departments; the allocation of public funds to policing relative to other social services, including mental-health care; the fundamental aspects of how police approach and build relationships with the communities they serve; and the debate over whether incarceration should focus on punishment for the crime or rehabilitation of the prisoner.

The use of algorithms by both police and the courts is becoming a more important part of these discussions. Algorithmic decision aids that use machine learning to predict the likelihood of a given outcome are increasingly pervasive in many settings. They play a role in weighty decisions such as who should receive a loan from a bank, or be interviewed for a job. But there may be few contexts in which their application has greater consequence than in the criminal-justice system, where they can help shape a community’s exposure to policing and play a part in decisions about freedom and imprisonment. 

Such tools work by analyzing massive stores of data, including data about past criminal events and outcomes, to predict where crimes will occur, who’s most likely to fail to appear in court, and who’s most likely to be a repeat or violent offender. The promise of these tools is better outcomes using fewer resources while locking up fewer people. The risk is that they will make existing inequalities even more pervasive and difficult to address.

The academic literature contains support for both notions: cautionary tales and warnings about unintended consequences, as well as promising glimpses of new possibilities. These widely varying outcomes hint at the heterogeneity that underlies the broad banner of machine learning.

University of Chicago Harris School of Public Policy’s Jens Ludwig illustrates this point by contrasting algorithms with vaccines. Unlike a vaccine—which is a single, clearly defined thing—A.I. is not a “thing” but rather an umbrella term for a collection of tools that are heterogeneous in their design and in their implications for crime, fairness, and the harm the justice system can inflict. “In practice, A.I. algorithms actually vary enormously,” he says, including in their overall quality and in the amount of attention that gets paid to anticipating and addressing concerns that may be general (such as discrimination) or specific to their use. “The key challenge for the field,” he says, “is figuring out how we can get more of the good ones and fewer of the bad ones.”

Predictive policing and its risks

The criminal-justice pipeline starts with policing, where machine learning is being deployed in some cases to detect or solve crimes. Noise sensors installed throughout many cities feed ML algorithms trained on audio data to listen for and report gunshots—a significant task, as research by Purdue’s Jillian B. Carr and Texas A&M’s Jennifer L. Doleac has found that the vast majority of gunshots go unreported. Some departments also use ML-powered facial recognition systems to help identify the perpetrators of crimes. 

ML is widely used for crime prediction too. The Chicago Police Department is using products sold by the policing-technology company ShotSpotter, including gunshot sensors but also a patrol management software called ShotSpotter Connect. Connect uses a combination of local crime and gunshot-detection data, as well as other data such as weather information, census data, and the locations of schools and parks, to identify discrete zones for police to patrol on the basis of the likelihood of a crime occurring there. The system creates risk assessments for numerous types of crime, from homicide to auto theft, and even suggests tactics officers should use on their patrol.

There are various concerns about algorithms in policing, and one is bias. The American Civil Liberties Union, among others, worries that because ML policing tools rely in part on historical data, they will be influenced by human bias and will exacerbate and further entrench historical patterns of inequality and disparate treatment in the justice system. This has some backing in research. 

“It is a common fallacy that police data is objective and reflects actual criminal behavior, patterns, or other indicators of concern to public safety in a given jurisdiction,” write Rutgers’s Rashida Richardson, NYU’s Jason M. Schultz, and Microsoft Research’s Kate Crawford. “In reality, police data reflects the practices, policies, biases, and political and financial accounting needs of a given department.”

Richardson, Schultz, and Crawford examined 13 jurisdictions in the US that adopted or used predictive-policing tools while subject to investigations, court-monitored settlements, memoranda of agreement, or consent decrees related to corrupt, biased, or otherwise illegal police practices. They find that in nine of those jurisdictions—including Chicago, New Orleans, and Maricopa County, Arizona—data that may have been shaped by such practices were available to train or otherwise affect the algorithms. “In these jurisdictions, this overlap presented at least some risk that these predictive systems could be influenced by or in some cases perpetuate the illegal and biased police practices reflected in dirty data,” the researchers write.

And if predictive-policing algorithms work in part by analyzing data on past police actions, they seem bound to absorb whatever bias shaped those data. As expressed by the ACLU and 16 other organizations in a 2016 joint statement on predictive policing, “the data driving predictive enforcement activities—such as the location and timing of previously reported crimes, or patterns of community- and officer-initiated 911 calls—is profoundly limited and biased.”

Richardson says that algorithms used to predict criminal outcomes will face inherent problems until we can trust the data used to fuel them. “I think if there is to be a future with automation in government decision-making and policy implementation, there also needs to be fundamental changes around data collection and use and sharing within government,” she says, adding that those changes will be especially challenging to effect given that they’ll involve organizations at the local, state, and federal levels.

ML could help identify at-risk officers

Early-intervention systems, designed to identify officers likely to use excessive violence or engage in other behaviors harmful to others or themselves, have been popular with police departments across the country for decades. Both academic researchers and private-sector businesses have begun exploring machine learning’s potential to perform this task.

There’s also the threat of predictive-policing systems creating feedback loops as they ingest the data generated by their own predictions. Crime data come in part from police observations of crimes—so if an algorithm sends a patrol to a particular neighborhood, it becomes more likely that new data will be generated for that location. The data will then be used to update the algorithm, which becomes more likely to send a future patrol to the same area, which can lead to overpolicing. 

Danielle Ensign, formerly of the University of Utah, Sorelle A. Friedler of Haverford College, University of Michigan PhD student Scott Neville, University of Arizona’s Carlos Scheidegger, and University of Utah’s Suresh Venkatasubramanian analyzed a model based on PredPol, a widely used policing platform, and find that it is vulnerable to such feedback loops.

They also find that by filtering the data used to update the algorithm over time, it’s possible to mitigate the problem. Under their proposed solution, the probability that a crime is fed back into the algorithm goes down as the probability of the area being patrolled goes up, thereby counteracting the tendency toward overpolicing. However, the findings underscore the threat posed by feedback, Venkatasubramanian says, “because we have to throttle the feedback to prevent the system from drifting away.”
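The feedback loop and the throttling idea described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the researchers' model and not PredPol's software: it imagines two hypothetical neighborhoods with made-up crime rates (`TRUE_RATES`) and sends one patrol a day in proportion to recorded crime. The function name `simulate_patrols` and its parameters are illustrative.

```python
import random

def simulate_patrols(true_rates, days, throttle, seed=0):
    """Return area 0's final share of recorded crime in a two-area toy model."""
    rng = random.Random(seed)
    recorded = [1.0, 1.0]  # assumed starting point: one recorded crime per area
    for _ in range(days):
        total = recorded[0] + recorded[1]
        patrol_prob = [recorded[0] / total, recorded[1] / total]
        # The patrol goes where the records point.
        area = 0 if rng.random() < patrol_prob[0] else 1
        # Crime is only observed in the area that is patrolled.
        if rng.random() >= true_rates[area]:
            continue
        if throttle:
            # The more likely the area was to be patrolled, the less likely
            # the observation is fed back into the records.
            if rng.random() >= 1.0 - patrol_prob[area]:
                continue
        recorded[area] += 1
    return recorded[0] / (recorded[0] + recorded[1])

TRUE_RATES = (0.3, 0.1)  # hypothetical daily chance of observing a crime on patrol
true_share = TRUE_RATES[0] / sum(TRUE_RATES)  # area 0's share of true crime
naive_share = simulate_patrols(TRUE_RATES, days=20000, throttle=False)
throttled_share = simulate_patrols(TRUE_RATES, days=20000, throttle=True)
print(f"true {true_share:.2f}, naive {naive_share:.2f}, throttled {throttled_share:.2f}")
```

In runs of this sketch, the unthrottled records tend to drift toward sending nearly all patrols to the higher-crime area, even though it accounts for only three-quarters of the true crime. The throttled version tends to settle near the true share.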

Encouraging evidence on bias

To investigate whether predictive policing in Los Angeles resulted in more arrests of Black and Latinx people, University of California at Los Angeles’ P. Jeffrey Brantingham, Louisiana State University’s Matthew Valasik, and George O. Mohler of Indiana University–Purdue University Indianapolis reviewed the data from a randomized controlled experiment conducted earlier by Mohler, Brantingham, and five coauthors. (Brantingham and Mohler are cofounders of PredPol.) In that 2014 experiment, officers in select divisions of the Los Angeles Police Department were given a list of 20 target areas to patrol and told that crime was expected to be highest in those locations. Some lists were generated by human crime analysts using “all of the technological and intelligence assets at their disposal,” and some were generated by an algorithmic forecasting tool. Whether officers received a human-generated list or an algorithm-generated list varied randomly by day. 

The algorithm outperformed the human analysts in terms of its impact on crime: for patrols of average duration, the algorithmically generated ones resulted in a 7.4 percent drop in crime, as compared with a 3.5 percent drop for human-directed patrols. Furthermore, in their follow-up analysis, Brantingham, Valasik, and Mohler find that “there is no significant difference in the arrest proportions of minority individuals” between the human and algorithmic patrols. 

ML could even make policing less biased, other research suggests. Stanford’s Sharad Goel, Booking.com’s Justin M. Rao, and NYU’s Ravi Shroff used ML to derive more effective guidelines for New York City’s controversial stop-and-frisk policy. Data from 2008 to 2012 show that the overwhelming majority of stop-and-frisk incidents involved people of color and didn’t result in any further action. Goel, Rao, and Shroff developed an algorithm that, had it been used, would have enabled the police to recover 90 percent of the weapons they confiscated while conducting only 58 percent of the stops. The algorithm would also have made the racial balance of stops more equitable.
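The trade-off behind those figures can be sketched as a ranking exercise: score each stop by its predicted chance of finding a weapon, then ask how far down the ranked list the police would need to go to recover a given share of the weapons. The code below is a minimal illustration with invented numbers, not the researchers' model or the New York data. The function `fraction_of_stops_needed` and the toy `stops` list are hypothetical.

```python
def fraction_of_stops_needed(stops, target_share):
    """Share of stops, taken highest score first, needed to recover
    `target_share` of all weapons found.

    stops: list of (predicted_hit_probability, weapon_found) pairs.
    """
    ranked = sorted(stops, key=lambda stop: stop[0], reverse=True)
    total_weapons = sum(1 for _, found in ranked if found)
    if total_weapons == 0:
        raise ValueError("no weapons were recovered in the data")
    recovered = 0
    for n, (_, found) in enumerate(ranked, start=1):
        recovered += found  # True counts as 1, False as 0
        if recovered >= target_share * total_weapons:
            return n / len(ranked)
    return 1.0

# Invented example: ten past stops, four of which turned up a weapon.
stops = [
    (0.90, True), (0.80, True), (0.70, False), (0.60, True), (0.50, False),
    (0.40, False), (0.30, True), (0.20, False), (0.10, False), (0.05, False),
]
share_needed = fraction_of_stops_needed(stops, 0.75)
print(share_needed)  # 0.4: three of the four weapons come from the top four stops
```

Because the scores in this toy data put most of the successful stops near the top of the list, a large share of the weapons is recovered from a minority of the stops. How much could be saved in practice depends on how well the scores actually separate productive stops from unproductive ones.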

In Chicago, ML is used in Strategic Decision Support Centers, which were introduced in 2017 following a civil-rights investigation into the police department by the Obama administration’s Department of Justice. The centers were meant to provide a way to combine better integration and use of technology with new management and deployment policies and procedures. District 7, in the Englewood neighborhood on the city’s south side, was one of the first to receive an SDSC; there, police leaders and analysts trained by the University of Chicago Crime Lab work side by side, using ShotSpotter products in real time to create more targeted deployment strategies for the district. Though gun violence in the city fell generally in 2017 after an exceptionally brutal 2016, it fell almost twice as steeply in District 7. The CPD has so far rolled out SDSCs to 21 of its 22 police districts. 

The deployment recommendations made in each SDSC don’t divert resources from other districts—they are used to determine how to allocate patrols within districts, not across them. Therefore, even if the people who live in a particular district are overwhelmingly Black—many neighborhoods in Chicago, as in other cities, are highly segregated by race—there’s reason to think the algorithm isn’t moving the needle much on the racial balance of police exposure. And to the extent the SDSC helps reduce crime within that district, ML is helping to reduce disparities in public safety within Chicago. 

Jay Stanley, a senior policy analyst at the ACLU, agrees that the more narrowly predictive policing is used, the less cause for concern there is. “It’s reasonable that changing the area reduces the concerns,” he says, “but I don’t think it eliminates them.”

Algorithms and the prison system

Algorithms’ reach into criminal justice goes beyond policing, extending into the US’s large and expensive prison system, and into decisions about who goes into it. 

Some judges use algorithmic tools to help them decide who should await trial at home. This decision is often made on the basis of the defendant’s flight risk, risk of recidivism, or both, and it has enormous consequences, as Chicago Harris’s Ludwig explained in a 2018 presentation as part of the Talks at Google lecture series. “If the judge jails you, on average you’ll spend two to four months in a place like the Cook County Jail,” he said. “You can imagine what that does to your job prospects. You can imagine what that does to your family.”

Greater predictive power could help focus pretrial detention on individuals who pose the greatest risk, and thereby minimize the number of people who have to endure this disruptive experience. 

In 2017, Cornell’s Jon Kleinberg, Harvard’s Himabindu Lakkaraju, and Stanford’s Jure Leskovec, with Ludwig and Chicago Booth’s Sendhil Mullainathan, constructed an algorithm to explore whether they could improve on the results of the pretrial-release system then being used in New York City, in which judges could reference risk assessments made by an older predictive tool (rolled out in 2003) as part of their decision-making. They find that, had their algorithm been used during the date range they studied, it could have offered substantial benefits over the existing system.

Kleinberg and his coresearchers examined arrest and bail data in New York between 2008 and 2013, and their findings indicate that pretrial judges were out of sync with the predictions of the researchers’ new algorithm. The judges released nearly half of the defendants the new algorithm picked out as most risky—more than 56 percent of whom then failed to appear in court. With the new algorithm’s help, the researchers find, the judges could have maintained the same failure-to-appear rate while jailing 40 percent fewer people, or lowered the failure-to-appear rate by 25 percent without jailing a greater number of people. And they could have done all this while reducing racial disparities.

That research served as a proof of concept that a new algorithm could potentially improve on pretrial judgments in practice. Following that study, New York City engaged the University of Chicago Crime Lab and a private company called Luminosity to develop a new algorithmic pretrial assessment tool, which it began using in 2019. The algorithm uses eight factors to generate a 26-point risk score judges can consider in their bail decisions—higher scores correspond to a greater predicted likelihood that defendants will appear in court. Data from November 2019 to March 2020 (at which time New York suspended pretrial court appearances due to COVID-19) indicate that these risk scores predicted defendants’ behavior with a high degree of accuracy: appearance rates tracked risk scores closely, with nearly 98 percent of those who received the highest possible score—a group that included roughly four in 10 defendants represented in the data—subsequently appearing in court. 

The algorithm recommended defendants in 85 percent of cases be “released on their own recognizance,” or without paying bail, in contrast to the city’s previous pretrial-release tool, which recommended such release in just 34 percent of cases. What’s more, the rate at which the new algorithm suggested ROR varied little across races: 83.5 percent of white defendants, 83.9 percent of Black defendants, and 85.8 percent of Hispanic defendants were recommended for ROR—in contrast with recommendations made by the old tool, which recommended ROR 30 percent more frequently for white defendants than Black defendants. The early evidence suggests judges’ decisions generally aligned with the new algorithm’s recommendations: nearly 90 percent of defendants recommended for ROR were ultimately released without bail.

Just as pretrial judges are often asked to predict defendants’ future behavior, trial judges may be asked during sentencing to forecast defendants’ risk of recidivism and future harm to the community. This opens another window for algorithmic involvement. 

In 2013, a Wisconsin judge relied on COMPAS in sentencing Eric Loomis for attempting to flee from the police after being found driving a car that had been used in a shooting. COMPAS is an algorithmic tool, also sometimes used in pretrial-release decisions, that predicts the risk that a criminal defendant will commit another crime in the future. Loomis argued that the sentencing violated his right to due process. The Wisconsin Supreme Court disagreed, and the US Supreme Court declined to hear the case.

Who is most likely to commit another crime?

In sentencing defendants, judges often factor in how likely a person is to commit another crime in the future. And as technology advances, this assessment sometimes involves algorithms.

Criminal-justice nonprofit Recidiviz’s Julia Dressel and University of California at Berkeley’s Hany Farid studied COMPAS, an algorithmic tool that predicts defendants’ risk of recidivism. Using a database of defendants from Broward County, Florida, the researchers find that COMPAS achieved roughly 65 percent accuracy in its predictions of who would commit another crime. However, a set of human predictors with no criminal-justice expertise was almost as accurate: participants the researchers recruited through Amazon Mechanical Turk averaged 62 percent accuracy.

The risks of allowing algorithms to weigh in on judicial decision-making—whether in pretrial decisions, sentencing, or parole decisions, where they’ve also been used—are obvious. Just as with predictive policing, biased data pose a threat to the equitability of outcomes. 

A 2016 ProPublica analysis of COMPAS risk scores highlights that concern: the results indicate that the tool was more likely to misidentify Black defendants as high risk than white defendants, and more likely to mislabel white defendants as low risk. The company that owns COMPAS has disputed those findings, and other analyses have called the broader conclusion that COMPAS produces racially biased results into question. 
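The two kinds of error at issue can be made concrete with a small calculation. The sketch below uses invented records, not ProPublica's data or code, and the group labels and the function name `error_rates_by_group` are hypothetical. For each group, it computes how often people who did not reoffend were flagged as high risk (false positives) and how often people who did reoffend were labeled low risk (false negatives).

```python
from collections import defaultdict

def error_rates_by_group(records):
    """records: iterable of (group, flagged_high_risk, reoffended) tuples."""
    tallies = defaultdict(lambda: {"fp": 0, "tn": 0, "fn": 0, "tp": 0})
    for group, flagged, reoffended in records:
        if reoffended:
            key = "tp" if flagged else "fn"
        else:
            key = "fp" if flagged else "tn"
        tallies[group][key] += 1
    rates = {}
    for group, t in tallies.items():
        did_not_reoffend = t["fp"] + t["tn"]
        did_reoffend = t["fn"] + t["tp"]
        rates[group] = {
            # Flagged high risk among those who did not reoffend.
            "false_positive_rate": t["fp"] / did_not_reoffend if did_not_reoffend else None,
            # Labeled low risk among those who did reoffend.
            "false_negative_rate": t["fn"] / did_reoffend if did_reoffend else None,
        }
    return rates

# Invented records for two hypothetical groups, eight people each.
records = (
    [("A", True, False)] * 2 + [("A", False, False)] * 2
    + [("A", True, True)] * 3 + [("A", False, True)] * 1
    + [("B", True, False)] * 1 + [("B", False, False)] * 3
    + [("B", True, True)] * 2 + [("B", False, True)] * 2
)
rates = error_rates_by_group(records)
print(rates["A"])  # false positive rate 0.5, false negative rate 0.25
print(rates["B"])  # false positive rate 0.25, false negative rate 0.5
```

In this invented example, the predictions are right equally often for both groups (five of eight cases each), yet the mistakes fall differently: group A is more often wrongly flagged, and group B is more often wrongly cleared. A single accuracy figure would hide that pattern.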

But Loomis’s case highlighted a concern beyond bias: transparency. He argued that it violated his constitutional right to due process for a trial court to rely on the results of a proprietary instrument. He couldn’t challenge the accuracy or science of COMPAS because he couldn’t see it, much less analyze its results. And what he did know about it concerned him: Loomis alleged that it unjustly took gender and race into account. 

Transparency is another issue that the ACLU and other watchdogs have raised in terms of predictive policing—transparency about the algorithmic systems themselves, and about the procurement processes by which those systems are selected. “A lack of transparency about predictive policing systems prevents a meaningful, well-informed public debate,” they write in a 2016 letter. 

“Whenever automated predictions are considered for policing, all stakeholders must understand what data is being used, what the system aims to predict, the design of the algorithm that creates the predictions, how predictions will be used in practice, and what relevant factors are not being measured or analyzed. The natural tendency to rush to adopt new technologies should be resisted until a true understanding is reached as to their short and long term effects.”

They further argue that products and vendors need to be subject to independent, ongoing scrutiny. Currently they receive no such scrutiny, and vendors in fact too often claim trade secrets.

“A lot of what we’re seeing is in the form of commercial products that are proprietary and opaque by design, and also probably way oversold,” says the ACLU’s Stanley. “I think some of the commercial products have relatively simple nuts and bolts but use secrecy to evoke magical results that aren’t really reflecting what’s going on under the hood.” 

And he warns that algorithms could aggravate discriminatory patterns by giving biased police and judges the appearance of digitally sanitized objectivity. “There’s a problem of people reifying the data and algorithms, and both obscuring and reifying decisions that are in fact highly questionable, but making them seem as though they’re objective,” he says. “That is the fundamental problem: a bunch of people running around playing with data and algorithms in all kinds of ways that have the potential for enormous destructivity. People are fast, loose, and out of control here.” 

Regulation is the factor

This, then, is the crux of algorithms and ML in criminal-justice applications: they’re tools that, like any other, can be well made or poorly made, used for good or used for harm. Concerns about bias, transparency, and more boil down to the contents of individual algorithms, some of which are better than others—and to how those algorithms are implemented. In theory, those contents and that implementation could be strongly guided by a robust set of rules.

“Part of the problem is that this technology is still so new, we haven’t yet developed the right regulations for guiding its use in public-sector policy applications,” says Ludwig. “We need to get those regulations right.”

Kleinberg, Ludwig, Mullainathan, and Harvard’s Cass R. Sunstein argue in 2019 research that algorithms have the advantage of being explicit in a way human decision-making can never be—as long as regulation is in place that encourages transparency. The researchers suggest that such regulation would require the producers of algorithms to store the data used to train their algorithms; to make the algorithms available for regulators to test in order to see how changing certain factors, such as a job applicant’s or criminal defendant’s race or gender, would affect the algorithms’ predictions; and to make clear the algorithms’ “objective function,” or the specific outcome they’re asked to predict, to scrutinize whether that outcome is fair and reasonable.
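The kind of test the researchers describe, changing one factor and watching the prediction, can be sketched in a few lines. Everything here is hypothetical: `toy_risk_score` and `blind_risk_score` stand in for a vendor's algorithm, the field names are invented, and `attribute_sensitivity` is an illustrative helper, not a procedure taken from the paper.

```python
def attribute_sensitivity(predict, records, field, alternatives):
    """For each record, vary only `field` and return the spread in predictions."""
    spreads = []
    for record in records:
        scores = [predict({**record, field: value}) for value in alternatives]
        spreads.append(max(scores) - min(scores))
    return spreads

def toy_risk_score(record):
    """Hypothetical scoring rule that uses gender directly."""
    points = 2 * record["prior_arrests"]
    if record["gender"] == "male":
        points += 1
    return points

def blind_risk_score(record):
    """Hypothetical scoring rule that ignores gender."""
    return 2 * record["prior_arrests"]

# Invented defendants for the audit.
defendants = [
    {"prior_arrests": 0, "gender": "female"},
    {"prior_arrests": 3, "gender": "male"},
]
genders = ["female", "male"]
direct_use = attribute_sensitivity(toy_risk_score, defendants, "gender", genders)
no_direct_use = attribute_sensitivity(blind_risk_score, defendants, "gender", genders)
print(direct_use)     # [1, 1]: the score moves when only gender changes
print(no_direct_use)  # [0, 0]: the score does not move
```

A spread of zero shows only that the attribute is not used directly. It says nothing about other inputs that may correlate with it, which is one reason the researchers also call for access to the training data and the objective function.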

Access to those things, the researchers argue, would bring much-needed clarity to some of the questions that are often unanswerable in cases of suspected discrimination: What factors were considered in making a particular decision, and why were those factors chosen? Algorithmic bias, under these conditions, becomes easier to detect than human bias.

Kleinberg, Ludwig, Mullainathan, and Sunstein note that when it comes to the elements that regulators need to scrutinize an algorithm, “at a minimum, these records and data should be stored for purposes of discovery.” In other words, they should be available to resolve legal questions, even if they’re not made public. 

But for algorithms used in the public sector, transparency can go well beyond storing information for private inspection. The New York Criminal Justice Agency, which administers the pretrial-release algorithm developed in part by the University of Chicago Crime Lab, maintains a website where the general public can see data about its performance, read about how it’s used in practice, and even use the tool themselves. The site also describes the agency’s plan for assessing and updating the algorithm over time.

Given that public-sector use of algorithms is an issue of not only technical and regulatory competence but also popular acceptance, this kind of visibility into the function, performance, and maintenance of algorithms could play a key role in making them palatable to a skeptical public. Another key could be greater public oversight of how algorithmic products are selected for use. Given these tools’ potentially high impact, susceptibility to negative unintended consequences, and variation in quality, a thorough and transparent process for deciding which algorithm to use, and how to use it, may be appropriate. “We should be wary about the government procuring algorithms the same way we procure phones for the police department,” Ludwig says. “Having a private company say, ‘We can’t tell you how the algorithm works—that’s our [intellectual property]’ is not an acceptable answer for algorithms.” 

The 2016 joint statement issued by the ACLU and its 16 cosigners echoes this sentiment:

Vendors must provide transparency, and the police and other users of these systems must fully and publicly inform public officials, civil society, community stakeholders, and the broader public on each of these points. Vendors must be subject to in-depth, independent, and ongoing scrutiny of their techniques, goals, and performance. Today, instead, many departments are rolling out these tools with little if any public input, and often, little if any disclosure.

Ludwig says one solution may be to rely less on the private sector and more on in-house or nonprofit development of algorithms. Mullainathan agrees that there’s still too little oversight of how A.I.-driven tools, from facial recognition systems to pretrial-decision algorithms, are selected by public decision makers, and that there should be far greater transparency about the performance of algorithms purchased by police departments and other public agencies. “The biggest gains in public governance that we’ve had in any country come from transparency and accountability, and we simply do not have that” when it comes to public-sector use of A.I., Mullainathan says.

Algorithms aren’t everything

As machine learning and other algorithms become more pervasive, their presence in and influence on criminal justice will likely continue to grow. Of course, algorithms are only one part of the increasingly complicated picture of law and order in the US. Embracing algorithms, or abolishing them, will not take the place of broad and thoughtful reconsideration of how the police should function within a community, what sort of equipment and tactics they should use, and how they should be held accountable for their actions.

The question, then, is whether ML and other algorithms can be part of the future. If algorithms are to help improve American justice, the people adopting and using the tools must be fully aware of the potential dangers in order to avoid them. 

Stanley concludes that while predictive policing and other examples of ML in criminal justice hold promise, it could take decades to work through the problems. Regulation is necessary, he says, but it is also a blunt tool, and legislators are rarely tech savvy.

He compares implementing ML to building the US transcontinental railroad system in the 19th century, which took many years and involved many train wrecks. “There’s no question there are ways that this could be socially useful and helpful, but it’s something that needs to be approached with great caution, great humility, and better transparency,” he says, adding that “a lot of the institutional patterns and incentives and cultures in law enforcement don’t lend themselves especially well to the kind of transparency that’s necessary. . . . It’s not foreordained that data and algorithms are going to bring some big social benefit compared to the nuts and bolts that need to be addressed to fix American policing.” 

Kleinberg, Ludwig, Mullainathan, and Sunstein acknowledge that algorithms are fallible because the humans who build them are fallible. “The Achilles’ heel of all algorithms is the humans who build them and the choices they make about outcomes, candidate predictors for the algorithm to consider, and the training sample,” they write. “A critical element of regulating algorithms is regulating humans.”

Getting this regulation right could be the key to realizing the often striking performance benefits of algorithmic systems without aggravating existing inequalities—and perhaps even while reducing them. But it remains to be seen whether regulatory structures will develop that can meet this goal. Such structures are, after all, maintained by humans.