It’s the seventh play of the game on a freezing night in November 2013, featuring a storied American football rivalry: the Green Bay Packers versus the Chicago Bears. Aaron Rodgers, the Packers’ star quarterback, takes the snap. He scrambles to his right and is caught by the Bears defensive end Shea McClellin, who takes him to the turf hard.
Rodgers stands up, grimaces, and jogs to the sidelines. The usually noisy Lambeau Field crowd goes quiet as Packers fans contemplate the implications of a possible Rodgers injury. But the real processing is yet to come. Soon the football world will learn that Rodgers has a fractured left collarbone and will not return for several weeks.
Upon the announcement, the media will speculate about the decisions that Mike McCarthy, head coach of the Packers, and his staff must make. The pundits will argue with a great deal of confidence, but regardless of what they say, they will keep their jobs and speculate another day.
McCarthy and his staff are confronted with a daunting situation. Rodgers, a former league MVP who is the core of the Packers’ offense, could be replaced by Seneca Wallace, a lesser-known, 33-year-old, fourth-round draft pick. The coaches must quickly determine if Wallace is the right man for the job or if they should seek a replacement and remake their offense around their new quarterback. How they navigate this transition will determine whether the Packers win or lose their late-season games, and perhaps even whether the coaches keep their jobs for another season.
With so much at stake, the leaders of the team need the best objective information that the world has to offer. And increasingly, coaches facing strategic decisions across sports can find some of that information in academia. Researchers at Chicago Booth as well as other elite institutions have investigated core questions that owners and coaches across multiple sports must answer when recruiting and managing a team.
How much does a superstar like Rodgers matter to a team?
Should a team trade a few good players for a single star?
If so, how many good players should the team trade so the deal is fair?
Once they have assembled a team that is powerful enough to win, how should the players behave tactically on the field?
For decades, statistical minds have tackled these and other questions. The results are finally coming together to provide a framework—one that can guide owners and coaches in their attempts to build and lead winning teams.
Superstars, not teams, win games
“There’s no ‘I’ in team,” is many a coach’s favorite phrase. But as a talented and cocky soccer player once retorted to his coach, “Yeah, but there is an ‘I’ in win.”
The player may have been more accurate than the coach. Across a variety of sports, there is strong evidence that a small number of star players contribute inordinately to their team’s wins.
The Aaron Rodgers injury afforded a rare opportunity to see the effect of an individual player. Before the injury, bookmakers taking bets and calculating odds in the gambling center of Las Vegas had already posted point spreads for the following week’s Packers-Philadelphia Eagles game, with the Packers expected to beat the Eagles by nine points, a wide margin in professional football. After the injury, the bookmakers adjusted the point spread so that Green Bay was expected to win by only one and a half points.
Point spreads are generally excellent predictors of which team will win, on average. Like coaches (and unlike pundits), people betting on games stand to lose something valuable if they are wrong. It turns out that taken as a group, gamblers can make pretty good guesses about the future.
A 1991 paper by Hal S. Stern, a statistician at the University of California at Irvine, shows how to convert these point spreads into a probability of winning for the Packers before and after the Rodgers injury. Using Stern’s formula, the Packers were expected to win the game against the Eagles about 74.4% of the time with Aaron Rodgers, and only 54.4% of the time with Wallace in Rodgers’s place. The betting market believed a Packers team led by Rodgers would win an extra two of 10 games compared to Wallace, all else equal. That’s an impressive impact for one player to have.
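Stern’s conversion can be sketched in a few lines. The sketch assumes his normal model: the favorite’s final margin is treated as normally distributed around the point spread, with a standard deviation of about 14 points (13.86 is the value commonly cited from his paper; treat it as an assumption here). The win probability is then the standard normal CDF evaluated at the spread divided by that standard deviation.

```python
import math

# Assumption: SD of the final margin around the point spread, as commonly cited from Stern (1991).
SIGMA = 13.86

def win_probability(spread, sigma=SIGMA):
    """P(favorite wins) if the margin ~ Normal(spread, sigma), i.e. Phi(spread / sigma)."""
    return 0.5 * (1.0 + math.erf(spread / (sigma * math.sqrt(2.0))))

before = win_probability(9.0)  # Packers favored by 9 with Rodgers
after = win_probability(1.5)   # favored by 1.5 with Wallace
print(f"with Rodgers: {before:.3f}  with Wallace: {after:.3f}  difference: {before - after:.3f}")
```

Under these assumptions the sketch lands at roughly 74% and 54%, close to the figures quoted above, and the gap of about 0.2 is the "extra two of 10 games."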
Opportunities to see such gambling-market reactions in sports are rare, since coaches keep their stars playing as much as possible. But economists and statisticians have come up with more creative ways to assess player impact.
But who wins championships?
Tobias J. Moskowitz, Fama Family Professor of Finance at Chicago Booth, and Sports Illustrated writer L. Jon Wertheim followed a simple but creative approach to determine the impact of star players on basketball teams in their book, Scorecasting. They compared two types of teams—those with stars, and those without—and observed how they fared, asking, “[H]ow likely is it that an NBA [National Basketball Association] team without a superstar wins a championship, makes it to the finals, or even makes it to the playoffs?”
Whichever definition of a star player they used (salary, MVP votes, or first-team all-star selections), the results were clear: stars are virtually required to win championships.
In fact, only about nine in 1,000 teams without a star are expected to win an NBA championship. (The results of their tests are summarized in “Star players matter to basketball teams.”)
They took this analysis a step further, since one could argue that teams with a solid line of above-average players can reliably beat teams with one or two superstars but a weaker supporting cast. In the NBA at least, the argument does not hold up. “Controlling for the average starting salary and winning percentage of teams . . . [the analysis] suggests that a superstar with a relatively weak supporting cast fares better than the team with five good players,” Moskowitz and Wertheim note.
Similar star-player effects are detectable in soccer, hockey, and football, but they are much less pronounced in baseball. Moskowitz and Wertheim attribute this to the frequency with which star players can make an impact. In soccer, basketball, hockey, and football, certain stars can touch the ball or puck in almost every play of every game. In baseball, however, even star pitchers rarely pitch full games and almost never pitch consecutive games. Similarly, star batters must wait for their turn in the batting lineup. Thus, the effects of stars in baseball, though they exist, are more muted.
Talent-spotting: A tough task
While the research indicates that stars matter, a coach still has to measure which players are the stars that matter, and gambling markets will rarely provide an estimate the way they did with Aaron Rodgers.
Robert B. Gramacy and Matt Taddy, respectively, assistant professor of econometrics and statistics and associate professor of econometrics and statistics at Chicago Booth, along with Wharton’s Shane T. Jensen, look closely at a common method of measuring the impact of individual hockey players, the plus-minus value, and find it wanting.
Plus-minus is calculated as follows: across a time period when player A is on the ice, sum up the goals scored by a team and subtract the goals scored against that team.
As the researchers noticed, there are two obvious drawbacks. First, a strong team has many players with high plus-minus values, due to its good overall performance. Second, players on teams whose schedule pits them against weak opponents will have high plus-minus values due to their relative strength. In both cases, relative team strength is more important than individual player contribution.
A third drawback is less obvious. Imagine that an average team has two superstars who are powerful contributors to the team’s overall plus-minus. They deserve the majority of the credit for goals scored and prevented. But imagine that a coach has a third below-average player whom he doesn’t want to play unless he has both superstars on the ice to pick up the slack. In this case, the weak player is almost always on the ice with the strongest players on the team. In terms of plus-minus, he looks almost exactly like the two superstars. Only by observing the few times when the superstars are on the ice and the weaker player is benched can one notice that the weaker player has little impact on the game.
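The plus-minus calculation, and this third drawback, can be seen in a minimal sketch. The lineups and goal counts are hypothetical, invented for illustration: a weak player "W" is only ever on the ice with stars "S1" and "S2," so his plus-minus ends up within one goal of theirs, and the single goal scored while he sat out is the only evidence that separates him from them.

```python
from collections import defaultdict

def plus_minus(goals):
    """Goals for minus goals against for every player on the ice.

    goals: iterable of (scoring_team, on_ice), where on_ice maps each team
    name to the list of its players on the ice when the goal was scored.
    """
    totals = defaultdict(int)
    for scoring_team, on_ice in goals:
        for team, players in on_ice.items():
            delta = 1 if team == scoring_team else -1
            for player in players:
                totals[player] += delta
    return dict(totals)

# Hypothetical season: "W" is only deployed alongside stars "S1" and "S2".
together = {"X": ["S1", "S2", "W"], "Y": ["O1", "O2"]}
stars_only = {"X": ["S1", "S2"], "Y": ["O1", "O2"]}
goals = (
    [("X", together)] * 10     # goals for with all three on the ice
    + [("Y", together)] * 4    # goals against with all three on the ice
    + [("X", stars_only)] * 1  # the rare shift where W sits out
)
print(plus_minus(goals))
```

The stars finish at +7 and the weak player at +6, so by plus-minus alone he looks nearly as valuable as they do.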
Gramacy, Jensen, and Taddy constructed a new method for looking at player contribution, controlling for the three weaknesses of plus-minus. First, they estimated each player’s contribution to every goal, rather than giving equal credit to all players on the ice when a goal was scored. Second, they eliminated team effects, essentially penalizing players who were on good teams or who played against bad teams and boosting the contributions of players on weak teams and those who played against strong teams. (See “Hockey’s team players,” which shows how team effects change players’ observed effectiveness during the 2007/8–2010/11 seasons.)
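A minimal sketch of the regression idea, under loud assumptions: each goal is one observation, the outcome is which side scored, and every player on the ice enters as a covariate (+1 for the home side, -1 for the away side). A ridge (squared) penalty fit by plain gradient descent stands in for the researchers’ own regularized logistic regression, which may differ in its penalty and other details; the sketch also omits their separate team terms. The penalty strength, step size, and toy data are all hypothetical.

```python
import math

def fit_player_effects(goals, lam=1.0, step=0.05, iters=2000):
    """Ridge-penalized logistic regression of 'which side scored' on who was on the ice.

    goals: list of (home_scored, home_players, away_players).
    A positive coefficient means the player tilts goals toward his own team.
    """
    players = sorted({p for _, home, away in goals for p in home + away})
    beta = dict.fromkeys(players, 0.0)
    for _ in range(iters):
        grad = {p: lam * beta[p] for p in players}  # gradient of (lam / 2) * sum(beta ** 2)
        for home_scored, home, away in goals:
            x = {**{p: 1.0 for p in home}, **{p: -1.0 for p in away}}
            y = 1.0 if home_scored else -1.0
            margin = y * sum(beta[p] * v for p, v in x.items())
            # Derivative of log(1 + exp(-margin)) with respect to beta[p] is coef * x[p].
            coef = -y / (1.0 + math.exp(margin))
            for p, v in x.items():
                grad[p] += coef * v
        for p in players:
            beta[p] -= step * grad[p]
    return beta

# Same hypothetical lineups as a plus-minus toy: "W" only plays alongside "S1" and "S2".
together = (["S1", "S2", "W"], ["O1", "O2"])
stars_only = (["S1", "S2"], ["O1", "O2"])
goals = (
    [(True, *together)] * 10
    + [(False, *together)] * 4
    + [(True, *stars_only)] * 1
)
effects = fit_player_effects(goals)
for name, value in sorted(effects.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:+.3f}")
```

Because goals are credited jointly rather than equally, the one goal scored without W pushes both stars’ estimates above his. The penalty splits credit evenly between S1 and S2, who are never apart and so cannot be told apart by these data at all.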
The resulting ranking of players from best to worst raises a question: How certain can a coach be that the “best player” in the league is actually better than the second best? The researchers’ conclusion: across the league, there is immense uncertainty about whether a player is better or worse than others of similar skill.
The best player in the league from 2007 to 2011, Pavel Datsyuk of the Detroit Red Wings, had only a 75% chance of contributing more to plus-minus than the second-best player, Alex Ovechkin of the Washington Capitals, according to the researchers’ estimates. Further, Datsyuk’s superiority over other players was not highly certain until he was compared with the 10th-ranked player.
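A probability like Datsyuk’s 75% edge can be read as the share of plausible values for two players’ effects in which one exceeds the other. The sketch below assumes each player’s uncertainty is summarized by random draws; the normal shapes, means, and spreads are hypothetical, not the researchers’ estimates, and the closed-form check additionally assumes the two effects are independent.

```python
import math
import random

def prob_better(draws_a, draws_b):
    """Share of paired draws in which player A's effect exceeds player B's."""
    wins = sum(a > b for a, b in zip(draws_a, draws_b))
    return wins / len(draws_a)

def normal_prob_better(mean_a, sd_a, mean_b, sd_b):
    """Closed form for independent normal effects: P(A - B > 0)."""
    z = (mean_a - mean_b) / math.sqrt(sd_a ** 2 + sd_b ** 2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

rng = random.Random(0)
n = 100_000
# Hypothetical: player A's effect is centered a bit higher, but both are noisy.
a = [rng.gauss(0.30, 0.15) for _ in range(n)]
b = [rng.gauss(0.20, 0.15) for _ in range(n)]
print(round(prob_better(a, b), 3), round(normal_prob_better(0.30, 0.15, 0.20, 0.15), 3))
```

Even though A’s effect is centered higher, the overlap between the two distributions leaves him ahead only about two-thirds of the time, which is the kind of uncertainty the researchers describe.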
The researchers also evaluated a player in the middle of the pack. Based on their analysis, it was difficult to tell whether he was significantly better or worse than 30 of the other top players in the National Hockey League.
Paying the right price for a superstar
If determining player contributions to wins after observing the player in action for multiple seasons is difficult, it seems a fool’s errand to determine the future success of an athlete who has never been observed playing at a professional level. But that is the task that National Football League (NFL) coaches must accomplish every year when selecting rookies in the draft. How well do they do? Richard H. Thaler, Charles R. Walgreen Distinguished Service Professor of Behavioral Science and Economics at Chicago Booth, along with Wharton’s Cade Massey, who earned an MBA and PhD from Booth, set out to answer that question.
The NFL draft is essentially a straightforward process. Nonprofessional players who wish to enter the NFL wait as teams select them one by one. The picks take place over seven rounds, and the ordering of picks is assigned to teams based on how they performed during the previous season. The worst teams get to pick first, while the best pick last.
However, teams often trade draft picks. For example, in the 2013 draft, the Oakland Raiders traded their No. 3 overall draft pick to the Miami Dolphins for the Dolphins’ No. 12 overall pick, as well as the 42nd overall pick.
In the early 1990s, Mike McCoy, an enterprising co-owner of the Dallas Cowboys, looked at past data to find a pattern behind teams’ draft picks. He charted the approximate value of all picks in the draft relative to the first, based on historical trades. He assigned the first overall pick 3,000 points, and the value of picks decreased as the draft continued. Using the chart, coaches could see that the first overall pick could theoretically be traded for the fourth overall pick as well as the 12th, since the total point value of the fourth and 12th picks was equivalent to that of the first.
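The chart arithmetic is easy to sketch. The point values below are the first 12 entries of the Dallas chart as it is commonly reproduced (it is often called the Jimmy Johnson chart); treat them as an assumption to check against the original. With them, the first overall pick is worth exactly the fourth plus the 12th, matching the example above.

```python
# Assumption: first 12 entries of the Dallas draft value chart as commonly reproduced.
CHART = {
    1: 3000, 2: 2600, 3: 2200, 4: 1800, 5: 1700, 6: 1600,
    7: 1500, 8: 1400, 9: 1350, 10: 1300, 11: 1250, 12: 1200,
}

def package_value(picks):
    """Total chart points for a list of overall pick numbers."""
    return sum(CHART[pick] for pick in picks)

def trade_balance(give, get):
    """Chart points received minus chart points given up (0 means 'fair' by the chart)."""
    return package_value(get) - package_value(give)

print(trade_balance(give=[1], get=[4, 12]))  # 0: an even trade by the chart
```

The chart only says what teams have historically paid; whether those prices match player productivity is the question Massey and Thaler take up.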
Over the next two decades, this pricing system spread and became the de facto standard for pricing NFL draft trades. Nobody questioned McCoy’s original math. The “prices” were assumed to be correct relative to player productivity, rather than proven correct by empirical evidence.
This is where Massey and Thaler stepped in. Familiar with psychological research, they had a hunch that early draft picks were overvalued by those who paid for them. They pointed to two core biases. First, people are too willing to make extreme forecasts. There has been just one Peyton Manning over the past two decades, but dozens of quarterback draft picks were considered to be “the next Peyton.” Second, experts such as NFL coaches and team owners are often overconfident about their predictive abilities. They think they can say with some assurance that this player will be a star and some other player will not. Therefore, the price chart was probably incorrect in a predictable way—coaches would overpay for early draft picks.
Massey and Thaler tested this hypothesis by looking at two attributes of football players: cost and value. They determined cost by looking at player-compensation costs. They then determined the “market” value of each player through a calculation that took into account the player’s position, years in the league, initial draft pick position, and productivity based on which of five groups players fell into: players who never started, backup players, occasional starters, regular starters, and Pro Bowl players. Players who never started were assumed to be the least valuable, while players elected to the Pro Bowl, essentially an all-star game, were assumed to be the highest-value players.
Average salaries within groups confirmed this assumption of value, as players who never started had the lowest average salaries, while Pro Bowl players had the highest average salaries. The researchers looked at this cost and productivity only for each player’s first five years in the league, before players commonly start to negotiate new contracts as free agents.
Massey and Thaler’s first finding is that teams do have some ability to predict quality. On average, players chosen in the first round of the draft perform better than players chosen in the second round, and so on. However, this ability is limited. Say you compare two players at the same position who were drafted consecutively, such as the third and fourth wide receivers taken. Massey and Thaler find that the chance the earlier pick turns out to be better than the later one is only 52%, barely better than a coin flip.
That means it can be a big mistake to give away a bunch of picks to get one of the players at the top of the draft. The research shows that the prices paid for early draft picks in terms of later draft picks are too high. In fact, the best average deals in the NFL draft go to teams that select players at the beginning of the second round. (See “Overpaying to pick first.”) Massey and Thaler note that picking football players is relatively easy compared to hiring employees in many other occupations. Teams get to watch the players in college and run them through a battery of physical and mental tests. Firms hiring a CEO from the outside rarely have that sort of information available to them. Caveat emptor.
Given a great team, coach for the win
The day of the 2006 FIFA World Cup final is remembered by soccer fans for an emotional scene: Zinedine Zidane of France, after some taunting, headbutted Italy’s Marco Materazzi. But the outcome of the game arose from a more rational process: penalty kicks.
In the match, only one of the 10 goals came from open play: Materazzi scored it in the 19th minute with a header from a corner kick. The remaining nine times the ball entered the goal were the result of penalty kicks: one, netted by Zidane, during the match itself, and the remaining eight in a shootout after the game ended 1–1.
Imagine being one of the nine players who had to take one of the penalty kicks that could determine the outcome of the most important game of your life. You would want to play to your strengths. If you are more comfortable with your right leg, odds are that you are most comfortable kicking the ball to the left side of the goal. If you had your way, you would do it every time.
But there’s a problem: the goalie. If taking a kick is stressful, being a professional goalie is far worse. You must stand on the goal line, waiting for one of the world’s best players to kick an 8.6 in. diameter ball into a 24 x 8 ft. goal that you are responsible for defending. As a goalie, though, you have one advantage. You know which foot your opponent kicks with. You should jump to the side that you believe the player is most likely to kick.
Unfortunately, the kicker also knows that you know his preference, and thus the circular logic begins. Economists have thought a lot about the theory behind such strategic interactions in which two (or sometimes more) players must make decisions based on what each knows and what each knows the other knows. They call such interactions “games.” In economics, the penalty-kick scenario is considered a simultaneous game, since both goalie and kicker make their decisions at the same time. In a paper published in 2002, Steven D. Levitt of the University of Chicago, Columbia University’s Pierre-André Chiappori, and UCLA’s Timothy Groseclose developed a game-theoretic model to analyze the choices facing kickers and goalies as they approach penalty kicks.
The perfect strategy for penalty kicks
The researchers created a model based on several assumptions derived from conversations with goalies. Essentially, kickers have a favorite direction to kick, and they are more likely to score if they kick in that direction. They are also more likely to score if the goalie doesn’t move in that direction. Goalies know this and adjust accordingly, but penalty-takers know goalies adjust, and change their play accordingly. The researchers also allow for kicks to the center of the goal, since the goalie may jump to one side, leaving the middle of the net clear.
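A simplified, two-direction version of this kind of game can be solved by hand. The sketch below is a textbook 2x2 zero-sum game, not the researchers’ full model: it drops the center option, and its four scoring probabilities are hypothetical, not their estimates. In a mixed-strategy equilibrium, the goalie randomizes so that the kicker is indifferent between sides, and the kicker randomizes so that the goalie is indifferent.

```python
def mixed_equilibrium(a, b, c, d):
    """Solve a 2x2 penalty-kick game given the kicker's scoring probabilities.

    a: kicker natural side, goalie jumps natural side
    b: kicker natural side, goalie jumps other side
    c: kicker other side,   goalie jumps natural side
    d: kicker other side,   goalie jumps other side

    Assumes a goalie who guesses right is the kicker's worst case on either
    side (a and d are each smaller than both b and c), so both players mix.
    Returns (p, q, v): p = P(kicker goes natural), q = P(goalie jumps natural),
    v = equilibrium scoring probability.
    """
    denom = b + c - a - d
    q = (b - d) / denom  # makes the kicker indifferent between sides
    p = (c - d) / denom  # makes the goalie indifferent between sides
    v = q * a + (1 - q) * b
    return p, q, v

# Hypothetical scoring probabilities, for illustration only.
p, q, v = mixed_equilibrium(a=0.70, b=0.95, c=0.90, d=0.55)
print(f"kicker natural side: {p:.3f}, goalie natural side: {q:.3f}, scoring rate: {v:.3f}")
```

With these made-up numbers the goalie dives to the kicker’s natural side more often than the kicker shoots there. In this two-direction setup, q - p = (b - c) / denom, so that ordering holds whenever the natural side is the more accurate one when the goalie guesses wrong (b > c).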
The assumption of a favorite direction turned out to be valid. Kickers score 94.4% of the time when they aim for their favored side and the goalie does not jump in that direction, compared with an 89.3% success rate when they kick to the other side and the goalie leaves that side open. Kickers also score more frequently when they kick to their strong side (and the goalie jumps that way) than when they kick to the weak side. With the assumptions confirmed, the researchers made other predictions and tested them using a sample of 459 penalty kicks.
Prediction one: Players shoot to the center more than goalies play to the center. Result: Correct. Players kicked to the center 17.2% of the time, while goalies stayed in the center 2.4% of the time.
Prediction two: Goalies jump to the players’ strong side more frequently than players kick in that direction. Result: Correct. Players kicked to their strong side 44.9% of the time, but goalies jumped that way 56.6% of the time—anticipating the strong-side kick.
Prediction three: Both goalies and penalty-takers go to the kickers’ strong side more than to their weaker side. Result: Correct. The goalies went to the kickers’ strong side 56.6% of the time compared to 40.9% of the time to the weaker side. Players kicked to their strong side 44.9% of the time and to their weaker side 37.9% of the time.
Of course, Levitt, Chiappori, and Groseclose’s predictions cannot guarantee a player can score on every penalty kick. Nor do they suggest a way in which goalies can block every penalty kick. However, they indicate a strategy for maximizing the total number of goals penalty-takers can score, and for maximizing the total number of saves for goalies.
The 2006 final, in which 88.9% of the shootout kicks were scored, was well above average: in the researchers’ sample, only 74.9% of penalties resulted in goals. Some might explain this by noting that the World Cup final showcases the world’s greatest soccer players, but there are two sides to every penalty kick. The players were taking those kicks at the highest level, but why weren’t the goalies blocking at the highest level? The results were abnormal, but to a statistically minded person, they look like good luck on the part of the kickers.
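How unusual is eight goals in nine kicks? A minimal sketch, assuming every kick is independent and scores at the sample-wide rate of 74.9% (a simplification that ignores who is kicking and who is in goal), computes the binomial probability of eight or more successes.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(at least k successes in n independent tries, each succeeding with probability p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

# 8 of the 9 shootout kicks in the 2006 final went in; the sample-wide rate was 74.9%.
print(round(prob_at_least(8, 9, 0.749), 3))
```

Under those assumptions the result comes out just under 30%, so a run like the 2006 shootout is above average but well within the reach of luck alone.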
The biggest statistical blunder in sports
Coaching is a difficult job, especially at the professional level. Decisions can be complex. But every once in a while, the numbers tell an obvious story about actions a coach should take. In even fewer cases, the action suggested by the numbers is the opposite of what professional sports coaches do.
The NFL has one such decision. This error is so prevalent that Nate Silver, a pop statistician now with ESPN, in a 2012 interview at Google, called it “the most statistically unsound tactic in professional sports.” What is it? Coaches go for it too infrequently on fourth down.
An explanation for non-football fans: when on offense (and more than 10 yards from the opponent’s goal line), teams have four attempts, called downs, to move the ball 10 yards from the spot where they started. If they succeed, they have achieved a first down, meaning they get another four downs to move the ball another 10 yards.
The tough decision arises when a team has moved less than 10 yards after using three downs. The team then has one down left and three options: punt the ball to the other team in the hopes of giving the opposing team a worse starting field position, attempt to kick the ball through the uprights for a field goal (worth three points), or attempt to continue moving the ball up the field to earn another first down, called converting. If the team fails to kick a field goal or convert, the opposing team takes the ball after fourth down wherever the previous play ended.
On fourth down, coaches almost always have their teams punt when the ball has moved less than halfway up the field, and almost always have them kick a field goal when the ball is within 30 yards of the opposing goal line. But research suggests that coaches choose to punt and kick field goals far too often when they should try to convert for a first down and continue their march toward a touchdown (worth six points, with an accompanying extra-point kick worth an additional point).
The research comes from David Romer of the University of California, Berkeley. Analyzing 700 NFL games, he determines the expected values of going for a fourth-down conversion versus punting or kicking a field goal.
Avid football fans are accustomed to watching teams face fourth-down decisions multiple times throughout the season and to seeing teams nearly always choose to kick or punt rather than attempt to convert. It may be difficult for fans to consider any other possibility, but they should, Romer argues.
He demonstrates what happens when both possibilities are in play. To make the math simple, imagine a coach faces fourth and goal from the 2-yard line seven times. If the coach kicks the field goal every time, the numbers suggest the team will score all seven times, for a total of 21 points. If the team goes for the touchdown, Romer’s numbers show, it will convert about three of seven times, for a total of 21 points (counting the extra points). If points were all that mattered, the two decisions would look equal.
But points are not all that matters. Coaches also consider field position after the play is done. In the 2012 season, the average kickoff return was 23.6 yards, so after a field goal and the ensuing kickoff, the opponent is expected to start at about its own 24-yard line. With the touchdown attempt, the team will fail to score four of seven times, leaving the opponent at its own 2-yard line, and will score three of seven times, leading to a kickoff and an expected opponent starting position at about its own 24-yard line. (See “Think before you kick.”)
So coaches should expect to score the same number of points but to keep significantly better field position when going for the touchdown on fourth down from the opponent’s 2-yard line. In other words, coaches do not go for a touchdown on fourth down as frequently as the numbers suggest they should.
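The arithmetic of the seven-try example fits in a few lines. The inputs come from the text (about three touchdowns in seven tries, 7 points per touchdown including the extra point, field goals that always succeed from that distance, and a 23.6-yard average kickoff return); treating every kickoff as fielded at the goal line and every failed attempt as handing the opponent the ball at its own 2 are simplifying assumptions.

```python
TRIES = 7
P_TD = 3 / 7          # about three touchdowns in seven tries, per the example
TD_POINTS = 7         # touchdown plus extra point
FG_POINTS = 3         # field goal from the 2, assumed always good
KICKOFF_START = 23.6  # opponent's start after a kickoff (2012 average return, fielded at the goal line)
FAILED_START = 2.0    # opponent takes over at its own 2-yard line after a failed try

def kick_every_time():
    """Total points over all tries and the opponent's average starting yard line."""
    return TRIES * FG_POINTS, KICKOFF_START

def go_for_it_every_time():
    """Expected total points and the opponent's average starting yard line."""
    points = TRIES * P_TD * TD_POINTS
    avg_start = P_TD * KICKOFF_START + (1 - P_TD) * FAILED_START
    return points, avg_start

for label, (points, start) in [("kick", kick_every_time()), ("go for it", go_for_it_every_time())]:
    print(f"{label}: {points:.0f} points, opponent starts at its own {start:.1f} on average")
```

Both choices yield about 21 points, but going for it leaves the opponent starting, on average, near its own 11 instead of its own 24.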
This example is concrete, but Romer looked at the expected payoff of going for it on fourth down from every point on the field, relative to the alternative, whether kicking a field goal or punting. (See “Fourth down: When to go for it.”)
Why so few fourth-down attempts?
There are two theories that attempt to explain why there aren’t more fourth-down attempts. The first argues that some combination of coaches, owners, players, or fans does not want to maximize wins using fourth-down conversions.
The second argues that stakeholders do want to maximize wins, but a subset doesn’t understand that going for it on fourth down is the statistically best choice.
The first may sound strange, but consider this argument: Romer’s paper was published in 2006, giving owners and coaches time to hear of his findings. Given the persuasiveness of the evidence, one would think that more than a couple of team owners or coaches would have heeded his advice.
But there are few examples of coaches who frequently attempt fourth-down conversions. Bill Belichick, head coach of the New England Patriots, attempts them more frequently than other coaches, according to Moskowitz and Wertheim in Scorecasting. However, they could point to only one high-school coach who has embraced the practice. The implication: it is possible that coaches and owners are familiar with the evidence but prefer to win without fourth-down conversions.
What could explain this? Perhaps the stakeholders prefer battles decided by fighting back and forth across the gridiron, rather than by making unpredictable fourth-down conversions. Perhaps the norm of fourth-down punts and field goals is so ingrained that disrupting that norm causes discomfort. Perhaps there are other reasons. Finding objective evidence to substantiate these hypotheses is difficult, if not impossible.
The prevailing theory in the second camp assumes coaches know that fourth-down attempts are the best decision, but others—such as owners, fans, or journalists—do not. In this case, it’s possible coaches fear losing their jobs if they attempt fourth-down conversions more frequently. There are plenty of phrases describing the phenomenon. A Japanese proverb describes it nicely: “The nail that sticks out gets hammered down.”
For the head coach of an NFL team, life is good. It’s difficult to reach a higher point in your career, though it’s easy to be fired for a bad decision. Even if there is a statistically obvious way to improve your overall results, you won’t look like a genius every time. A coach who goes for it on fourth and goal from the 2-yard line instead of kicking an easy field goal looks like he made the wrong decision four of seven times, even though the points scored over seven attempts should be about the same. Why risk looking like you made a mistake when others who have not calculated the risks involved will yell at you (in the best circumstance) or fire you (in the worst)?
Unless owners decide to embrace fourth-down conversions or protect coaches who make good statistical decisions (even if some individual calls look poor), coaches will have little incentive to fight the norm.
Numbers augment human judgment
Baseball led the way in the sports-analytics revolution. The movie Moneyball, starring Brad Pitt as Billy Beane, an enterprising general manager who dared to buck the establishment by hiring a young statistician to help build a baseball team by the numbers, focused on the conflict between the stodgy and the innovative. It depicted the old guard of scouts and by-the-gut coaches as the losers in a new system overtaken by a young crop of quants. But if baseball is a guide to the future of other sports, that dramatization will not mirror reality.
Rather, the lifetime experts who know the softer side of the game will likely work alongside the number-crunchers to create an ever more powerful recruiting and coaching force. Using the best tactics from both worlds—the subjective and the objective, the emotional and the rational—teams will reach heights of achievement and competition that neither the old guard nor the new could reach on their own.