The failure of polls and bookmakers to predict the outcome of the Brexit referendum will push the financial sector, which relies on accurate information, to search for alternatives. Some claim to have obtained good results by scraping social media, which may have been the best way to predict the outcome of the June 23 vote.

Brevan Howard Asset Management, co-founded by the billionaire Alan Howard, reportedly reduced risk ahead of the vote after using artificial intelligence to study social network data. Its $16 billion macro fund gained 1 percent on the day the results were announced; hedge funds globally lost 1.6 percent. Other funds, Bloomberg News reports, are increasing investments in this technology.

For now, the use of "big data" to predict election results can only be hit or miss. Yet traditional pollsters' results are so often misleading that experiments with social media data deserve more attention.

The publicly available analyses of social network data around elections don't look especially rigorous. Most of them simply count tweets or posts that mention a certain candidate or cause. In an article analyzing the Instagram activity that preceded the referendum, Vyacheslav Polonski, of Oxford University, found that "leave" supporters far outnumbered and were more active than "remainers." Twitter provided similar data about user activity: The "leave" campaign spurred more discussion. The New York Times studied Facebook data and found that "leave" generated more user engagement (likes, shares and comments).
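A crude version of that counting is easy to sketch. The snippet below is my own illustration in Python; the posts and hashtags are hypothetical and not drawn from any of the studies mentioned, but tallying posts by campaign hashtag is roughly the level of analysis much of the public work operates at.

```python
# Illustrative only: tally posts by which campaign's hashtags they carry.
from collections import Counter

# Hypothetical posts; real studies pulled millions via the platforms' APIs.
posts = [
    "Time to take back control #voteleave",
    "We're stronger in Europe #strongerin",
    "Can't believe this debate #voteleave #brexit",
    "Remain is the sensible choice #remain",
]

# Hashtag lists are assumptions for the sketch, not a validated keyword set.
LEAVE_TAGS = {"#voteleave", "#brexit", "#leaveeu"}
REMAIN_TAGS = {"#strongerin", "#remain", "#voteremain"}

def tally(posts):
    counts = Counter()
    for post in posts:
        # Pull out hashtags, ignoring case and trailing punctuation.
        tags = {w.lower().rstrip(".,!?") for w in post.split() if w.startswith("#")}
        if tags & LEAVE_TAGS:
            counts["leave"] += 1
        if tags & REMAIN_TAGS:
            counts["remain"] += 1
    return counts

print(tally(posts))  # e.g. Counter({'leave': 2, 'remain': 2})
```

As the next paragraph notes, a tally like this says nothing about who is being sincere, sarcastic or merely loud.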

This raw data doesn't mean much, though: A vocal, angry minority often makes more noise than a level-headed majority. The same hashtags can be used by both sides, often sarcastically. The U.K. is particularly fond of sarcasm, making it especially difficult to determine a poster's sentiment by hashtag or keyword. And what does it matter how widely a campaign is discussed if it's mostly ridiculed?

Even if many people today get their news from social networks, the seeming advantage of one group may not foretell the final result because Twitter, Facebook and Instagram users tend to exist in silos, only interacting with those who agree with them.

Apart from the self-selection bias caused by anger - unhappy people are motivated to talk about their feelings - there is the self-selection bias of choosing to be on social networks at all (though in the U.K., about 33 million people are on Facebook).

And yet there's definitely something there. Andranik Tumasjan, a professor at the Technical University of Munich, was among the first to publish an analysis of how social network activity correlates with national election results. In 2010, Tumasjan and his co-authors claimed that the number of tweets about each party fairly accurately predicted the results of Germany's 2009 parliamentary elections. Other researchers later showed that the margin of error in Tumasjan's research was actually much bigger than in traditional polls.

It was followed by more sophisticated work that relied not just on the frequency of mentions, but also on different varieties of sentiment analysis. That's where artificial intelligence comes in: In the best-case scenario, researchers hire people to label thousands of posts as positive, negative or neutral and to mark whatever other nuances the designers of the study consider important. Then, an artificial neural network is "trained" on these inputs until it is capable of "grading" posts on its own (a minimal sketch of such a pipeline follows the quotation below). Yet problems remained. In 2012, Daniel Gayo-Avello of the University of Oviedo in Spain wrote of the body of work accumulated in the area:

"It's not prediction at all! I have not found a single paper predicting a future result. All of them claim that a prediction could have been made; i.e. they are post-hoc analysis and, needless to say, negative results are rare to find.'

That's still the case with most published work. Real forecasts have proved elusive. For example, a team of researchers from Cardiff and Manchester universities used Twitter to predict the outcome of the 2015 U.K. general election, which the pollsters got woefully wrong. So did this team, even though it used sophisticated sentiment analysis: Its model predicted a hung Parliament with the Labour Party winning the most seats.

In some cases, social network analysis has yielded slightly better results than the polls - but it suffers from the same shortcomings as traditional surveys. Such analysis underestimated, for example, the performance of France's hard-right National Front - a party whose backers are often reluctant to voice their sentiments publicly.

Other hard-to-overcome problems include the self-selection biases and the geographic factor: Accurate geolocation data are needed to predict the performance of regional parties such as the Scottish nationalists and how the regional vote will split. Such information often isn't available.

In other words, in this rather young (but already crowded) field, academics are still struggling with the optimal design of a survey using social network data. And yet, apparently, some tech-savvy investors are already making money by combining traditional public opinion research with a "big data" component.

Traditional pollsters should embrace the technology, too - that may help perfect the methodology - and start reporting the results along with those of phone and online surveys. It's getting harder to make the right predictions with the old toolbox. After all, in situations such as the U.K. referendum, every vote counts and poll data can affect the turnout. The new tech is imperfect, and it's not easy to figure out how to apply it, but the world is changing too fast to ignore it.