Text-Reading Machines Can Predict Share Prices

A single word in a news report—a well-placed “undervalue,” for example—can drive a company’s stock price up or down. Investors can benefit if they can figure out which words matter within a few days, research suggests.

Investors and researchers have suspected for decades that text could be used to predict markets, and some have tried and failed. But by applying machine-learning techniques developed by computer scientists, Harvard’s Zheng Tracy Ke, Yale’s Bryan T. Kelly, and Chicago Booth’s Dacheng Xiu have built a model that in early tests outperformed a similar strategy based on scores from RavenPack, the leading vendor of news-sentiment scores.

Traditionally, finance researchers and market practitioners have relied on accounting data and fundamentals to predict where the market is headed. But quarterly reports arrive slowly for a market moving at warp speed, which has led researchers and traders to look for other sources of predictive information, including news. To find out whether news reports could be used to predict stock prices, Ke, Kelly, and Xiu borrowed machine-learning techniques used by computer scientists, who are increasingly training machines to understand text.

Efforts to predict market direction by parsing financial journalism date back to 1933, when economist and businessman Alfred Cowles III classified pieces in the Wall Street Journal as bullish, bearish, or neutral to inform trading strategies. That didn’t necessarily work—Cowles’s theoretical portfolio would have underperformed the market by more than 3 percent a year from 1902 to 1929, the researchers note—but other people have continued to pursue the idea of extracting useful information from text. Among them, Northwestern’s Scott R. Baker, Stanford’s Nicholas Bloom, and Chicago Booth’s Steven J. Davis analyzed years of newspaper articles to identify words associated with economic uncertainty, and have used those words to inform dozens of uncertainty-related indexes.

Some efforts to assess sentiment in text rely on preexisting dictionaries created for other purposes—such as the Harvard-IV Dictionary, a manually selected list of positive and negative psychosocial words, and the Loughran-McDonald Master Dictionary, developed to highlight meaningful words in financial texts and the sentiment associated with those words. The latter starts with word lists and uses US Securities and Exchange Commission filings to add terms relevant to the finance sector. For example, the dictionary added Scholes for the Black-Scholes modeling tool used with financial derivatives.

Ke, Kelly, and Xiu created a model that, in essence, automatically generates a dictionary of relevant words and allows for contextually specific sentiment scores. Using supervised machine learning and a method that required only a laptop and basic statistical capabilities, the researchers analyzed more than 22 million articles published from 1989 to 2017 by Dow Jones Newswires. Classifying words as either positive or negative, the researchers generated article-level sentiment scores to highlight how news likely to be perceived as positive or negative would affect stock prices.

The first step in the process involved screening articles for words frequently associated with positive or negative returns. “Undervalue,” “repurchase,” and “surpass” are good for a share’s price, and “shortfall,” “downgrade,” and “disappointing” are bad, the model establishes. Several of the most impactful words highlighted by the research, such as “repurchase,” don’t appear in the other dictionaries used to assess sentiment. Next, the model isolated and weighted terms most likely to be informative about a stock’s future price. Finally, it gave articles sentiment scores on the basis of the words assessed.
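The three steps described above can be illustrated with a deliberately simplified sketch. It is not the researchers' estimator: the screening rule, the thresholds `min_count` and `margin`, the weighting formula, and the toy articles and returns are all assumptions made up for illustration. The sketch screens for words that appear unusually often alongside positive or negative returns, gives each surviving word a weight between -1 and 1, and averages those weights to score a new article.

```python
from collections import Counter

def screen_words(articles, returns, min_count=2, margin=0.2):
    """Steps 1 and 2 (simplified): keep words whose articles coincide with
    positive returns unusually often or unusually rarely, and weight them.

    min_count and margin are hypothetical tuning values, not from the paper.
    """
    total = Counter()      # number of articles containing each word
    positive = Counter()   # number of those articles followed by a positive return
    for words, ret in zip(articles, returns):
        for word in set(words):
            total[word] += 1
            if ret > 0:
                positive[word] += 1

    tone = {}
    for word, n in total.items():
        if n < min_count:
            continue  # too rare to judge
        share = positive[word] / n
        if abs(share - 0.5) >= margin:
            tone[word] = 2 * share - 1  # -1 = always negative, +1 = always positive
    return tone

def score_article(words, tone):
    """Step 3 (simplified): average the weights of the screened words present."""
    hits = [tone[w] for w in words if w in tone]
    return sum(hits) / len(hits) if hits else 0.0

# Toy data, invented for illustration only.
articles = [
    ["company", "announces", "repurchase"],
    ["analysts", "say", "shares", "undervalue", "company"],
    ["repurchase", "plan", "surpass", "forecast"],
    ["company", "reports", "shortfall"],
    ["analysts", "downgrade", "company", "shortfall"],
    ["disappointing", "quarter", "downgrade"],
]
returns = [0.02, 0.01, 0.03, -0.02, -0.03, -0.01]

tone = screen_words(articles, returns)
good_news = score_article(["board", "approves", "repurchase"], tone)
bad_news = score_article(["broker", "downgrade", "after", "shortfall"], tone)
no_signal = score_article(["company", "holds", "meeting"], tone)
```

In this toy sample, a word such as "company" appears equally often with gains and losses, so the screen drops it, while "repurchase" and "shortfall" survive because they line up consistently with one direction. With only six articles the weights are extreme; a real corpus would produce far more graded values.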

Some funds have likely been using natural language processing to trade for several years, with dubious success. A 2016 article in MIT Technology Review called analyzing language data to predict markets “one of the most promising uses of new AI techniques,” but one of the handful of funds it mentioned, Sentient, liquidated in 2018. The research by Ke, Kelly, and Xiu provides an academic framework for applying such processing to markets.

To demonstrate their model’s predictive capacity, the researchers devised a simple trading strategy to buy assets associated with positive recent news sentiment and sell assets associated with articles containing negative sentiment. The resulting portfolio outperformed a similar strategy based on scores from RavenPack, the leading vendor of news-sentiment scores. Returns didn’t begin to even out between the two until five days after an article’s publication.
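A strategy of this kind can be sketched in a few lines. The version below is an assumption-laden illustration rather than the researchers' actual portfolio construction: the function names, the equal weighting, the choice of `n`, and the tickers, scores, and returns are all hypothetical. It buys the `n` stocks with the highest recent sentiment scores, sells short the `n` with the lowest, and computes the portfolio's return over the following period.

```python
def long_short_weights(scores, n=1):
    """Equal-weight long the n highest-scoring tickers, short the n lowest.

    scores maps ticker -> sentiment score; n is a hypothetical parameter.
    """
    if n < 1 or 2 * n > len(scores):
        raise ValueError("need at least 2 * n scored tickers and n >= 1")
    ranked = sorted(scores, key=scores.get)  # lowest sentiment first
    weights = {ticker: 0.0 for ticker in scores}
    for ticker in ranked[-n:]:   # most positive news: buy
        weights[ticker] += 1.0 / n
    for ticker in ranked[:n]:    # most negative news: sell short
        weights[ticker] -= 1.0 / n
    return weights

def portfolio_return(weights, next_returns):
    """Return of the zero-net-investment portfolio over the next period."""
    return sum(w * next_returns[ticker] for ticker, w in weights.items())

# Toy inputs, invented for illustration only.
scores = {"AAA": 0.8, "BBB": -0.6, "CCC": 0.1, "DDD": -0.2}
next_returns = {"AAA": 0.02, "BBB": -0.01, "CCC": 0.0, "DDD": 0.005}

weights = long_short_weights(scores, n=1)
result = portfolio_return(weights, next_returns)
```

Here the portfolio is long AAA and short BBB, so it gains both when the good-news stock rises and when the bad-news stock falls. Comparing two scoring methods, as the researchers did, amounts to running the same construction on each method's scores and comparing the resulting returns.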

Works Cited

Scott R. Baker, Nicholas Bloom, and Steven J. Davis, “Measuring Economic Policy Uncertainty,” Quarterly Journal of Economics, November 2016.
Alfred Cowles III, “Can Stock Market Forecasters Forecast?” Econometrica, July 1933.
Zheng Tracy Ke, Bryan T. Kelly, and Dacheng Xiu, “Predicting Returns with Text Data,” NBER working paper, August 2019.
