Can you scientifically quantify social media opinion?
Over at NBCNews.com, we’ve started publishing daily charts tracking what people are saying about the presidential and vice presidential candidates on Twitter and Facebook. Here’s today’s for the weekend (click here for the full-size version):
In my analysis, I write:
In recent weeks, Obama has generally led Romney by two to seven percentage points in national polls, which carefully select their samples to reflect Americans most engaged in the election and registered to vote.
The picture is different among Americans who have gone online to talk about the election, however — NBCPolitics.com’s analysis indicates that that narrower but more diverse sample of the country prefers Romney by 36 percent to 32 percent overall and by 51 percent to 49 percent when they’re compared head to head:
The report is unique among those being promoted by our competitors, whose social media analysis has largely focused on two metrics: “buzz,” or how much each candidate is talked about online in general, and “effectiveness,” or how extensively each candidate is using social media as a campaign tool.
Our analysis, by contrast, explores the actual content of what is being said, providing a glimpse at what issues are specifically driving people’s opinions. And that raises the question of whether an experiment like this is valid in any way. The short answer is “maybe.”
The long answer is that it won’t be long until full, statistically valid quantification of aggregated sentiment is common and easy. Several companies are working on it already, and we were impressed enough with Forsight, the tool built by Crimson Hexagon, to give it a test drive. It’s the same tool the Pew Research Center uses when it builds its reports on social media sentiment, which we considered an important endorsement.
We’ve actually been programming Crimson Hexagon and tracking the candidates since January. I’ve spent the past 7½ months combing through all its white papers, statistical models and documentation, building test programs and generally shaking it down. I, at least, am convinced that the data we’re getting are pretty solid.
Crimson Hexagon reports that its analysis of social media sentiment carries a statistical margin of error of plus or minus 3 percentage points. In my analysis, however, a number of factors can affect that figure, two in particular: the universe of posts it examines and the precision of the analysis monitors written by the client.
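For context, here is how a conventional margin of error for a simple random sample is computed. Crimson Hexagon's plus-or-minus-3-point figure comes from its own statistical model, so this is only a rough analogy; the function name and the sample numbers below are purely illustrative:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Classic binomial margin of error at roughly 95 percent confidence.

    p: observed proportion (e.g., 0.36 for 36 percent)
    n: number of classified posts in the sample
    """
    return z * math.sqrt(p * (1 - p) / n)

# Invented numbers: 100,000 classified posts, 36 percent for one candidate.
moe = margin_of_error(0.36, 100_000)
print(f"+/- {moe * 100:.2f} percentage points")
```

The point of the sketch is simply that the margin shrinks with the size of the examined sample, which is why the question of what universe that sample represents matters so much.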
Crimson Hexagon has a direct connection to Twitter and collects the full firehose of all tweets sent every day. It collects a much smaller proportion of Facebook posts because of limits in the Facebook API. This means the ratio of Twitter-to-Facebook posts can be as high as 3:1.
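To see why that skew matters, consider a toy example (all numbers invented) of how an unweighted average lets the over-collected platform dominate the combined result, and one possible correction:

```python
# Hypothetical counts reflecting a 3:1 Twitter-to-Facebook collection skew.
twitter = {"posts": 75_000, "pct_candidate_a": 0.34}
facebook = {"posts": 25_000, "pct_candidate_a": 0.40}

# Unweighted: Twitter dominates simply because more of it is collected.
total = twitter["posts"] + facebook["posts"]
raw = (twitter["posts"] * twitter["pct_candidate_a"]
       + facebook["posts"] * facebook["pct_candidate_a"]) / total

# One possible correction: weight the two platforms equally instead.
balanced = (twitter["pct_candidate_a"] + facebook["pct_candidate_a"]) / 2

print(f"raw: {raw:.3f}, balanced: {balanced:.3f}")
```

Whether equal weighting is the right correction is itself a judgment call; the sketch only shows that the collection ratio, not just the sentiment itself, shapes the combined number.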
Then the client has to devise very specific algorithms telling it what to look for and how to classify it. In our case, that programmer is me. My algorithms have been crafted to winnow out off-topic posts and straight news reports and to distinguish between different meanings of key words, like “race,” which can appear as generic references to the “race for the White House” or to the race of a person.
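As a toy illustration of that kind of winnowing: the actual monitors are built in Crimson Hexagon's own interface, so the regular expressions and categories below are hypothetical, not the real NBCPolitics.com rules:

```python
import re

# Hypothetical patterns, not the actual monitors.
OFF_TOPIC = re.compile(r"\b(horse race|race car|5k race)\b", re.I)
CAMPAIGN_RACE = re.compile(
    r"\brace for the white house\b|\bpresidential race\b", re.I)

def classify(post):
    """Very rough triage: drop off-topic uses of 'race', keep campaign ones."""
    if OFF_TOPIC.search(post):
        return "off-topic"
    if CAMPAIGN_RACE.search(post):
        return "campaign"
    return "needs review"

print(classify("Watched a great race car event today"))        # off-topic
print(classify("The race for the White House is tightening"))  # campaign
```

Real monitors are far more elaborate than two patterns, but the basic job is the same: separate the meanings a keyword can carry before any sentiment is counted.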
I’ve been running more than 30 daily monitors tracking the four candidates for more than seven months, reviewing both samples of the raw posts the tool is examining and the resulting classifications it’s spitting out. I’m comfortable with Crimson Hexagon’s assertion of a 3-point margin of error. But in light of its limited Facebook sample, I’m stating in in-house reports and published articles that the margin applies to the examined sample, which can run into hundreds of thousands of tweets and posts a day, but not necessarily to the universe of social media commentary.
By publishing these charts on the main NBCPolitics.com site, we’re making a big statement about the importance of social media in the political conversation in general and about the reliability of our data in particular. The rapidity with which such analysis is being refined means that by this time next year, what we’re doing could look primitive. But the technology is the best in the business today, and we decided that if anyone was going to be first to experiment with this kind of analysis in the daily news report, it should be us.
Your feedback would be enormously helpful, so please drop a note in the comments.