Wednesday, November 9, 2016

Polls, Analytics and Backlashes

It's been a while since I've posted anything here (for various reasons), but the historic election we've just seen in the United States of America warrants jotting down a few thoughts. This election upset most, if not all, of the predictions made by pundits, pollsters and analytics experts. I believe this is because of the effect that massive leads and one-sided reporting have on the human psyche.

Nate Silver's (@NateSilver538) FiveThirtyEight.com has followed the election campaign closely and analyzed the available data in polls-plus forecasts ("What polls, the economy and historical data tell us about Nov. 8."), providing a view of the likely outcome of the election. Twenty-four hours ago, on November 8th, 2016, here's what that forecast looked like:
On November 9th, 2016, we now know that this forecast was incorrect and that Mr. Trump is President-Elect. So how can this prediction, rooted in polls, economics and historical data, have been so wrong?

First, let me bring another prediction and result to the discussion. Consider the Alberta general election of 2012:

Prediction
Source.
The numbers in white above are forecasts of the number of seats each party will win in the Provincial Legislature, and the bars indicate the possible range of seats for each party. Green is the Wildrose Party; blue is the Conservative Party.

Result
 Source.
The coloured numbers are the actual seats won. Notice how the Wildrose Party was limited to 17 seats while the Conservative Party won significantly more seats than forecast. The reason is that polling was not frequent enough, so pollsters missed a last-minute shift in voter intention, and by last-minute we mean the last two days of the campaign.
"Wildrose's support simply cratered, and to an extent that no model or method could have anticipated." Source. Eric Grenier, ThreeHundredEight.com
"There is the possibility the polls somewhat over-estimated Wildrose support in the final week of the campaign and that the swing was not as dramatic as the numbers would suggest. But it seems very likely that Danielle Smith would have won an election held last week – and that a large enough number of Albertans changed their minds and opted for the Tories to swing the election at the very last moment." Source. Eric Grenier, ThreeHundredEight.com
This is what I call a backlash vote. Pollsters indicated that an upstart party further to the right would replace the long-standing incumbent Conservatives. This may have caused a change in voter intention, and/or the mobilization of voters who did not originally intend to vote at all, in an effort to temper the predicted change.

I don't believe that Mr. Trump defied the odds. Rather, I believe that there was a backlash vote, as in Alberta, intended to send a message and to temper the forecasts and foregone conclusions of a landslide win. In fact, landslide wins of these proportions are not common and are often accompanied by some prevailing social context. In the case of this Presidential election, the context was that of an anti-establishment election, as noted by some in social media.

The bottom line is that, regardless of advances in technology and the ever-increasing use of social media, there will always be a human component that is relatively unpredictable and can thwart even the most sophisticated analysis and data modelling. Hopefully, the AI eventually used to analyze and predict election outcomes won't decide that we're too erratic to govern ourselves and decide to wipe us out...


Disclaimer: I am Canadian. This post is simply intended to point out how trusting analytics and data analysis without context and supporting anecdotal evidence can be dangerous. This is why we say that Data Scientists working in an industry should have industry-specific knowledge that can provide context to results.
