Risks of Predictive Analytics


Based on an analysis of more than a half million public posts on message boards, blogs, social media sites and news sources, IBM predicts that ‘steampunk,’ a sub-genre inspired by the clothing, technology and social mores of Victorian society, will be a major trend to bubble up, and take hold, of the retail industry. –Jan. 14, 2013 IBM Press Release

 

Really? Is this a good idea? Not steampunk fashion — that’s clearly a bad idea. But publicizing this data-driven prediction — is that a good idea? Could this press release actually cause an increase in rayguns and polished brass driving goggles?

I think this illustrates one of two important and potentially negative consequences of making and communicating statistical predictions. The first risk is that making predictions may sway people to follow them. The second risk is that making predictions may sway people toward inaction and complacency. Both risks may need to be actively managed to prevent advanced predictive modeling from causing more harm than good.

Recently, none other than Nate Silver indicated that if he thought his predictive models of elections were swaying the results, he would stop publishing them. There are longstanding questions about bandwagon and “back the winner” effects in polling and voting. If your predictions are widely seen as accurate, as Silver’s are, then your statements may increase votes for the perceived winner and decrease them for the perceived loser. It’s well known that more people report, after the fact, that they voted for a winning candidate than actually did so.

There are other ways that prediction can drive outcomes in unexpected or undesired directions, especially when predictions are tied to action. If your predictive model estimates increased automobile traffic between two locations, and you build a highway to speed that traffic, then the “induced demand” effect (added capacity causes increased use) will almost certainly prove your predictive model correct, even if the model was predicting only noise. The steampunk prediction may, sadly, fall into this category.
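To make the "even if the model was predicting only noise" point concrete, here is a small simulation sketch. Every number in it (500 routes, the size of the induced-demand effect, the noise levels) is a hypothetical assumption, not data, and `simulate` and `pearson_r` are illustrative helpers. The forecasts are pure random noise; when planners add capacity wherever the forecast is high and added capacity induces traffic, the noise forecasts still end up correlated with observed traffic.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient (avoids needing Python 3.10's statistics.correlation)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(act_on_forecast, n_routes=500, induced_per_unit=0.8, seed=0):
    """Correlation between a pure-noise traffic forecast and the traffic that follows.

    All parameters are hypothetical; this only illustrates the mechanism.
    """
    rng = random.Random(seed)
    # The "model" knows nothing: its forecasts are pure noise.
    forecasts = [rng.gauss(0.0, 1.0) for _ in range(n_routes)]
    if act_on_forecast:
        # Planners build extra capacity on routes forecast to get busier.
        added_capacity = [max(0.0, f) for f in forecasts]
    else:
        # Nobody acts on the forecast.
        added_capacity = [0.0] * n_routes
    # Assumed induced demand: traffic grows with added capacity, plus independent noise.
    traffic = [induced_per_unit * c + rng.gauss(0.0, 0.5) for c in added_capacity]
    return pearson_r(forecasts, traffic)


print(f"acting on forecast:     r = {simulate(True):.2f}")
print(f"not acting on forecast: r = {simulate(False):.2f}")
```

Under these assumed parameters the first correlation comes out strongly positive and the second stays near zero, so the model's apparent accuracy comes entirely from having acted on it.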

The other problem is exemplified by sales forecasts. If your predictions are read by the people whose effort is needed to realize the forecast results, the predictions may become less likely to come true. Your predictions are probably based on a number of assumptions, including that the sales team is putting in the same kind of effort it did last month or last year. But if the forecast is perceived as a “done deal,” the team may ease off, and that assumption will be violated. A prediction is not a target, and should not be seen or communicated as one.

How can these problems be mitigated? In some cases, by better communication strategies. Instead of providing a point estimate of sales (“we’re going to make $82,577.11 next week!”), you may be better off providing the endpoints of an 80% or 90% prediction interval: “if we slack off, we could make as little as $60,000, but if we work hard, we could make as much as $100,000.” Of course, if you have the sort of data where you can include sales effort as a predictor, you can do even better than that.
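Turning a forecast distribution into a range like this is a one-liner once you have the distribution. The sketch below assumes, purely for illustration, that next week's sales forecast is normal with a mean of $80,000 and a standard deviation of $15,000; `central_interval` is a hypothetical helper, and a real model's forecast distribution may well not be normal.

```python
from statistics import NormalDist


def central_interval(mean, sd, coverage=0.80):
    """Return (low, high) endpoints of a central prediction interval,
    assuming the forecast distribution is normal."""
    tail = (1.0 - coverage) / 2.0  # probability left in each tail
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


# Hypothetical weekly sales forecast: mean $80,000, standard deviation $15,000.
low, high = central_interval(80_000, 15_000, coverage=0.80)
print(f"We could make as little as ${low:,.0f} or as much as ${high:,.0f}.")
```

With these made-up numbers the 80% interval runs from roughly $60,800 to $99,200; passing `coverage=0.90` widens it.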

Another trick for keeping people motivated is to let them beat their targets most, but not all, of the time. How do you do this? Consider setting the target at the 20th percentile of the forecast distribution. If your model is well calibrated, actual results will meet or beat those targets about 80% of the time. There is extensive psychological and business research on the best way to set goals, and my (limited) understanding of it is that people are most engaged when they think they are doing well but still have room for improvement.
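A minimal sketch of the 20th-percentile rule, again assuming a hypothetical normal forecast (mean $80,000, standard deviation $15,000). The helper names are illustrative. The `hit_rate` check draws simulated outcomes from the same distribution the forecast assumes, which is what "well calibrated" means here, and counts how often the target is met or beaten.

```python
import random
from statistics import NormalDist


def target_from_forecast(mean, sd, percentile=0.20):
    """Set the target at a low percentile of the (assumed normal) forecast distribution."""
    return NormalDist(mu=mean, sigma=sd).inv_cdf(percentile)


def hit_rate(target, mean, sd, n_periods=10_000, seed=1):
    """Fraction of simulated periods whose outcome meets or beats the target.

    Outcomes are drawn from the forecast's own distribution, i.e. a perfectly
    calibrated model; a miscalibrated model would give a different rate.
    """
    rng = random.Random(seed)
    outcomes = (rng.gauss(mean, sd) for _ in range(n_periods))
    return sum(o >= target for o in outcomes) / n_periods


target = target_from_forecast(80_000, 15_000)
rate = hit_rate(target, 80_000, 15_000)
print(f"Target: ${target:,.0f}; met in {rate:.0%} of simulated weeks")
```

Comparing the realized hit rate against 80% on real historical forecasts is also a quick way to find out whether the model is actually calibrated.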

Returning to the upcoming steampunk sartorial catastrophe, perhaps IBM should have exercised some professional judgement, as Nate Silver seems to be doing, and just kept their big blue mouth shut on this one.

Harlan Harris has a PhD in Computer Science (Machine Learning) from the University of Illinois at Urbana-Champaign, and post-doctoral work in Cognitive Psychology at several universities. He is currently a Senior Data Scientist at Kaplan Test Prep, and co-organizes Data Science DC.


  • davidcroushore

    But IBM’s reputation improves if their predictions are seen as largely accurate. If sharing these predictions publicly increases the chances that they will be correct, then it is in IBM’s interest to share all of their predictions publicly.

    A couple of years ago, I saw an interesting workaround. We want to be able to evaluate our predictions after the fact, but public declaration might affect the outcome. To avoid this, a predictor can encrypt the prediction and post it publicly. After a specified date, the key can be supplied so that the validity of the public prediction can be assessed, but the content of the prediction cannot be known until after the fact. This way the predictor does not influence the outcome, but there is evidence that the prediction was made prior to the event.
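The workaround in this comment is a form of commit-reveal scheme. The sketch below swaps encryption for a salted SHA-256 hash commitment, which achieves the same goal using only Python's standard library: publish the hash now, then later reveal the prediction and the random nonce so anyone can recompute the hash. The function names `commit` and `verify` are illustrative, not from any particular tool.

```python
import hashlib
import hmac
import secrets


def commit(prediction: str) -> tuple[str, str]:
    """Return (commitment, nonce). Publish the commitment now; keep the
    nonce and the prediction private until the outcome is known."""
    # Random salt so short or guessable predictions can't be brute-forced from the hash.
    nonce = secrets.token_hex(16)
    digest = hashlib.sha256(f"{nonce}:{prediction}".encode("utf-8")).hexdigest()
    return digest, nonce


def verify(commitment: str, nonce: str, prediction: str) -> bool:
    """After the event, check a revealed prediction against the published commitment."""
    digest = hashlib.sha256(f"{nonce}:{prediction}".encode("utf-8")).hexdigest()
    return hmac.compare_digest(digest, commitment)


published, secret_nonce = commit("steampunk will be a major retail trend")
print("publish now:", published)
# ...after the specified date, reveal secret_nonce and the prediction text...
print(verify(published, secret_nonce, "steampunk will be a major retail trend"))
```

Unlike releasing a decryption key, this scheme needs no cipher at all; the nonce plays the role of the key that is revealed after the fact.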