Secrets of wait loss

As life moves faster, everyone’s patience is wearing thin. Who gets to the head of the line quickest?

Credit: Sam Peet

Chana R. Schoenberger | Nov 18, 2019

Stocks are processed in milliseconds. Chauffeured cars can be summoned almost instantly. Yet people are still routinely made to slow down and cool their heels. We wait in lines at the grocery store and the doctor’s office, on the phone with customer service, and virtually when buying concert tickets or waiting for a website to load. 

But as consumers are coming to expect ever-faster service, companies are looking for ways to keep lines moving. Customers are, too, and have found some shortcuts. For example, if you’re stuck on hold with an automated phone system, it may fast-track you if you swear, according to the technology news site TNW.

Researchers specializing in queuing theory may have better solutions. For most of the past century, operations researchers developed systems to reduce lines, and they were largely successful at streamlining factories, telephone exchanges, and other systems. But with the service sector now dominating the economy, some recognize that they need a new approach, one that takes into account human behavior. Many lines that form these days are affected by people’s quirks and biases, including their propensity to swear or hang up when frustrated by circuitous automated phone systems. Anticipating these reactions could help one person cut the line, but it could also help many people get what they’re waiting for more quickly.

Get in the queue

Queuing theory is a branch of mathematics that optimizes waiting times. Some of the earliest work was done at the turn of the 20th century by the Danish mathematician Agner Krarup Erlang, who tried to predict load times for the telephone exchange so that people could pick up a receiver and hear a dial tone rather than a busy signal. 

Since then, much work in the field has focused on waiting times in automated environments, such as manufacturing, or on fairness and scheduling in computer science. Consider a semiconductor factory, where silicon wafers are made for use in electronic devices. Research has developed ways to organize the factory so that each wafer is fabricated in the fastest and most efficient way. These factories are in some ways an ideal test case for wait times. The goods produced are delicate and expensive, giving companies an incentive to carefully but expeditiously move them through the production line and out the door. No company wants to build up unnecessary inventory and get stuck paying for storage. Traditional queuing theory assumed that jobs would wait forever to be processed. Those silicon wafers, after all, are inert and not going anywhere of their own accord.

But over the past few decades, the manufacturing sector has contracted while service has expanded to represent roughly two-thirds of US economic activity. The trend has taken hold across the globe: according to the World Bank, services value added made up 65 percent of overall GDP in 2017, versus 62 percent in 1997. As this has happened, queuing theorists have become more interested in the human side of queues. In hospitals, restaurants, and shops, lines comprise people, not inanimate objects. People are at the front of lines, too, routing patients to hospital beds, leading hungry patrons to tables, or ringing up purchases.

With $200 billion in annual revenues, according to outsourcing expert CustomerServ, the call-center business represents a particularly good example of an industry that involves both long wait times and the crucial factor of human reactions. At call centers, both customers and customer-service agents are masses of feelings, biases, and restlessness. “We need to develop formulas that take into account human behavior; for example, customers hanging up,” says Chicago Booth’s Amy R. Ward.

This requires different math than that used in the factory context. Most academics look at queuing theory by using stylized models that involve exponential distributions and require a finite-dimensional state. These models mostly reveal the way things should behave, but not the way things actually happen in the real world, says Amber Puha of California State University at San Marcos. Puha studies measure-valued processes, a tool that allows for writing equations to track how long every customer in a line has been waiting, and how long every customer being helped has been speaking to an agent.

To set up the kind of queuing problem that presents itself at a call center, Puha, Ward, and other operations researchers first divide customers into classes depending, for example, on the type of service desired. One person calling a bank might want to know her account balance, while another might want to report a stolen credit card. The goal is to design control policies that push customers through the process as quickly as possible. For most companies, the point at which someone calls in and needs to be paired with an agent is the start of the challenge. Quality of service is measured by short wait times, and the first point of contact affects the company’s reputation.

Customers who refuse to wait provide an example of how human behavior can short-circuit a traditional queuing model. People hang up when they’re frustrated. Wafers do not. “We’re interested in more fully understanding the implications of this, how to maximize revenue, minimize customer dissatisfaction, and maximize customers being happy with the service,” says Puha.

In one project, Puha and Ward set up a mathematical model that incorporates different types of customers and used a probability model to distribute them, seeking to capture the behavior of frustrated humans hanging up when on hold. Then they analyzed how the system works when the number of call-center agents is very large.
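To make the idea of callers abandoning a queue concrete, here is a minimal simulation sketch. It is not the Puha and Ward model, which handles several customer classes and far more general assumptions. This toy version assumes a single class of callers, random arrivals, and exponentially distributed service times and patience. The function name, parameters, and rates are all hypothetical.

```python
import heapq
import random

def simulate_call_center(arrival_rate, service_rate, patience_rate,
                         num_agents, num_calls, seed=0):
    """Simulate a first-come, first-served call center with impatient callers.

    Assumes exponential interarrival, service, and patience times (an
    illustrative simplification). Returns (fraction_abandoned,
    mean_wait_of_served_callers). Expects num_calls > 0.
    """
    rng = random.Random(seed)
    # Time at which each agent next becomes free; a list of equal values
    # is already a valid min-heap.
    agent_free_at = [0.0] * num_agents
    clock = 0.0
    abandoned = 0
    served_waits = []
    for _ in range(num_calls):
        clock += rng.expovariate(arrival_rate)      # next caller arrives
        patience = rng.expovariate(patience_rate)   # how long they will hold
        offered_wait = max(0.0, agent_free_at[0] - clock)
        if offered_wait > patience:
            abandoned += 1                          # caller hangs up first
            continue
        service_end = clock + offered_wait + rng.expovariate(service_rate)
        heapq.heapreplace(agent_free_at, service_end)  # earliest-free agent takes the call
        served_waits.append(offered_wait)
    mean_wait = sum(served_waits) / len(served_waits) if served_waits else 0.0
    return abandoned / num_calls, mean_wait

# Hypothetical rates: 10 calls per minute, 1-minute average calls,
# 2-minute average patience.
understaffed = simulate_call_center(10.0, 1.0, 0.5, num_agents=8, num_calls=20000)
well_staffed = simulate_call_center(10.0, 1.0, 0.5, num_agents=14, num_calls=20000)
print(understaffed, well_staffed)
```

Comparing the two runs shows the trade-off in miniature: with too few agents, hang-ups, rather than service, keep the line from growing without bound.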

The researchers thought first about what happens when call-center agents can do anything, such as speak every language. “Because this phenomenon of how abandonment [when customers hang up, or abandon a transaction] affects the overall system performance is not particularly well understood, it’s interesting to understand it even when agents are fully cross-trained,” Puha says. After that, they could study what happens when an agent has a specialization, such as speaking only English, or only Spanish. Would people calling in still hang up at the same rate?

Inequality on the line

A model that seeks to optimize the queue reflects a company’s goals and priorities, which typically include fairness. When establishing rules for how to route people, a company has to decide whether to treat customers more or less equally or to favor some over others. Will some groups of callers be forced to wait longer than others, on the basis of their frequent-flyer tier, hotel loyalty points, or order history? Say a customer has a credit card designed for big spenders, for which she pays a hefty annual fee. When she calls customer service and types in her credit-card number, should she be routed to a shorter queue for high-value customers? In many cases, companies have decided that yes, she should get this priority treatment.

Consequently, the conversation with the service representative who answers your call may be more pleasant and efficient when you are a member of a group the company wants to keep happy, and companies can use mathematical models to ensure this happens. “I can look at the solution to that optimization problem, rank the classes [by how valuable they are to me], and serve my highest priority customers first,” says Ward. “But that optimization problem is not capturing adverse consequences from treating customers unequally.”
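The ranking Ward describes can be sketched as a simple priority queue. This is an illustration only: the class names, ranks, and the `PriorityHoldQueue` helper are hypothetical, and a real routing system would weigh many more factors.

```python
import heapq
from itertools import count

class PriorityHoldQueue:
    """Hold queue that serves higher-ranked customer classes first."""

    def __init__(self, class_rank):
        self._class_rank = class_rank   # lower number = served sooner
        self._heap = []
        self._arrival = count()         # tie-breaker: first come, first served within a class

    def join(self, caller_id, customer_class):
        rank = self._class_rank[customer_class]
        heapq.heappush(self._heap, (rank, next(self._arrival), caller_id))

    def next_caller(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)

# Hypothetical classes: premium cardholders outrank standard ones.
queue = PriorityHoldQueue({"premium": 0, "standard": 1})
queue.join("A", "standard")
queue.join("B", "premium")
queue.join("C", "standard")
queue.join("D", "premium")
served_order = [queue.next_caller() for _ in range(len(queue))]
print(served_order)  # ['B', 'D', 'A', 'C']
```

Callers A and C arrived first but are served last, which is exactly the unequal treatment that a strict ranking produces and that the optimization problem does not penalize.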

Many of the problems Ward formulates focus on averages. If a center’s average wait time is about one minute, that could be because everyone waits for one minute, or it could be because 98 percent of customers have no wait at all, while 2 percent of them wait for an hour. Those tails can have an impact on a business, so it makes sense to design a problem that incorporates both average and variability, Ward says.
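A few lines of arithmetic show how much an average can hide. The wait times below are made up to match the two scenarios just described, for a hypothetical group of 100 callers. The skewed case works out to an average of 1.2 minutes, close to the one-minute figure, yet its spread is far larger.

```python
from statistics import mean, pstdev

# Hypothetical wait times in minutes for 100 callers (illustrative, not real data).
uniform_waits = [1.0] * 100               # everyone waits one minute
skewed_waits = [0.0] * 98 + [60.0] * 2    # 98 callers wait nothing, 2 wait an hour

def summarize(waits):
    """Return the mean, standard deviation, and longest wait."""
    return mean(waits), pstdev(waits), max(waits)

uniform_summary = summarize(uniform_waits)
skewed_summary = summarize(skewed_waits)
print(uniform_summary)  # mean 1.0, standard deviation 0.0, longest wait 1.0
print(skewed_summary)   # mean 1.2, standard deviation about 8.4, longest wait 60.0
```

The two averages are nearly identical, but the standard deviation and the longest wait tell the two centers apart immediately.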

The model Ward and Puha wrote can include fairness constraints. For example, given the number of agents staffed and the customer demand, there must be a certain amount of waiting, say 10 minutes on average. The question is: Should one customer group have one minute of waiting and the other group have nine minutes, or should the split be 50–50, or something else? In different situations, a company may need to be more or less worried about treating customers more or less equally, she says.
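One way to picture a fairness constraint is as a cap on how far apart the two groups’ waits may drift. The sketch below takes the 10-minute example literally, treating it as a fixed budget to divide between a priority group and everyone else. That framing, the function name, and the cap are assumptions made for illustration, not the constraint in Ward and Puha’s model.

```python
def feasible_splits(total_wait_minutes, max_gap_minutes):
    """List (priority_wait, other_wait) splits of a fixed waiting budget.

    Only whole-minute splits that favor the priority group and keep the
    gap between groups within max_gap_minutes are returned.
    """
    splits = []
    for priority_wait in range(total_wait_minutes // 2 + 1):
        other_wait = total_wait_minutes - priority_wait
        if other_wait - priority_wait <= max_gap_minutes:
            splits.append((priority_wait, other_wait))
    return splits

loose = feasible_splits(10, 8)   # tolerate a gap of up to eight minutes
strict = feasible_splits(10, 0)  # insist on equal treatment
print(loose)   # [(1, 9), (2, 8), (3, 7), (4, 6), (5, 5)]
print(strict)  # [(5, 5)]
```

Tightening the cap shrinks the menu of options, from the one-and-nine split down to an even five and five.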

This ties in to the more general problem of social inequality, an issue that isn’t accounted for in call-center models but is very much a concern in real life. Routing a VIP customer to a shorter queue may make business sense, but it’s another reason people have disparate experiences that cause them to have differing views about, say, access to resources.

The business decision a company has to make involves where on the axis of fairness it wants to sit. Taking into consideration a company’s priorities, call-center executives have to decide how many resources to allocate, such as the number of agents who answer the phones. In addition to where customers are placed in line, a model also determines what agent they are routed to. Pairing customers with agents requires companies to decide how many employees to hire at different skill levels. A call center’s scheduling algorithm can take into account which employees can serve which types of customers. Of course, it’s more expensive to hire only agents with top-level skills. Moreover, if the VIP customer is routed to a high-skilled agent when several other customers are waiting on hold, the decision affects everyone’s wait times, and those of future customers.

The next stage for managers is to give people incentives to do the most efficient thing. “The first step in solving the scheduling problem is to assume the employees will do exactly what I want,” says Ward. “Then I can deliver the same quality of service to different customer classes as somebody else, but with fewer employees, so my model would allow you to be more efficient. But what if the employees do not behave consistent with my solution?”

Agents are people too

Fairness and efficiency are concerns with respect to the way queuing models treat not only customers but also the call-center agents who interact with them. These representatives offer another set of potential behavioral issues that can crop up when humans don’t do what the algorithm dictates. Agents can have bad days, or be tired, or need coffee, or feel berated. They may work quickly or slowly. They have their own preferences for the types of calls they want to answer, and in some cases they even dictate whether they will see a variety of call types.

“If we think of higher-touch services where humans are providing the service, the human element kicks in both on the customer side and the server side,” says New York University’s Mor Armony. Armony has coauthored two papers with Ward, and they have studied how to route calls so that agents have time to take breaks. But here is the conundrum: Suppose you are the most-effective employee. Thanks to your efficiency, you tend to get more work routed your way, and you may take steps to protect yourself and your time if your manager fails to do so. An academic referee who produces quality reviews quickly will likely be asked to review more papers, so that person may decide to take longer to return reviews in order to keep the workload in check.

Raga Gopalakrishnan of Queen’s University in Kingston, Ontario, and Ward are conducting research together, and their investigations include how to account for agent burnout, how to design systems so that more-efficient agents don’t feel as though they are being punished when they are assigned more work than others, and how such considerations interact with customer behavior (less work being done by the agents means higher wait times for the customers).

Gopalakrishnan and Ward, together with Booth PhD student Yueyang Zhong, are currently looking at the strategic behavior of both customers and agents. In their model, an agent will choose how quickly to work. While customers in a call center can’t necessarily see where they are in a queue, customers in a grocery store often can, and both they and the cashiers sometimes make decisions accordingly, leading to complex interactions. Say one checkout line is shorter than several others. More customers may join the shorter line, and the cashier, seeing a longer line, may speed up. (See “Which type of grocery queue is better?” below.)

Gopalakrishnan, Ward, and Zhong are working to analyze how customer and agent behaviors interact with each other, in order to design service systems that would settle into, and operate at, an efficient equilibrium.

From search requests to hospital beds

In mathematical modeling in general, and queuing in particular, tools used in one context can be applied in another. This holds true in behavioral projects, where the same models used to optimize call centers can be used to solve other waiting-time and resource-allocation problems in areas from restaurants to retail. Armony is using queuing theory to study patients’ wait times in hospitals. Although this is a physical problem—patients are present, rather than waiting on the phone—it shares many characteristics with call-center issues.

People often think that emergency departments are busy because patients come in at random times, says Armony, but the distribution of patients by arrival time is actually fairly standard. In reality, while people don’t call a center or arrive at a hospital at a constant rate, there are patterns. Most people go to the ER at convenient times, such as after dropping their children at school, or on a Monday after getting sick on the weekend. Then these patients clog up the emergency department as they wait for a transfer to an equally overcrowded inpatient ward. Hospitals need to know how to get patients to beds faster, in order to treat patients expeditiously and make the best use of hospital resources.

“We know health-care costs are really high, and if people wait too long to get medical treatment, there could be severe effects. But [the problem] also translates into a mathematical theory that ends up being useful,” Armony says.

She observes that some hospitals have implemented tracking systems to winnow out the ER patients who arrive needing urgent care–type treatment, such as a prescription, rather than an inpatient bed. In the hospital she is studying, experienced triage nurses assign incoming patients a score of 1 to 5 on the basis of how likely they are to need an overnight stay in the hospital. The urgent-care patients are then treated quickly in a separate area and released. Most who stay in the ER are those who score a 3, while those scoring 1 or 2 may need surgery or a trip to the intensive-care unit.

Other research, including a recent paper Armony coauthored, deals with queuing for operating rooms, which involves optimizing the schedule so that patients can receive surgery quickly and find a recovery-room bed waiting when they are finished. When hospitals move patients into a nursing ward, they have to determine the order in which wards will take new patients, on the basis of the beds available and the patients’ medical problems. Hospitals can use queuing theory to make these assignments, but a real test is what happens within the hospital itself: Do nurses and doctors follow computerized rules implemented by the hospital or do they make their own decisions? And which group, the humans or the machines, makes better decisions for patients? Hospital executives know their trained staff have deep expertise in treating patients, but they also know that these staff members might not recognize the patterns a computer can see.

“The question is how to design the algorithm to support the decision makers in a way that would be helpful to them rather than intrusive or something they would just ignore,” says Armony.

A related issue is how hospitals route patients depending on the criteria that insurance companies and regulators use to judge the quality of care. Hospitals are penalized when patients who have been treated come back to the hospital too quickly, so some have an area for ER patients to wait under observation before they’re discharged, to make sure they are well enough to go home. This cuts down on readmission rates, thus presumably shortening future queues.

Health-care queuing is “not a problem that I think will ever be solved completely, as the state of the art continues to improve,” says Armony. Queuing problems can be solved mathematically, but researchers have much more to do in working toward better modeling of human behavior. That way, when an airline call-center rep finally picks up your call, you may not feel quite so angry about the amount of time you’ve been waiting.