M. Alex Johnson – Journalist at Large

An analog journalist in a digital world

Can you scientifically quantify social media opinion?

Over at NBCNews.com, we’ve started publishing daily charts tracking what people are saying about the presidential and vice presidential candidates on Twitter and Facebook. Here’s today’s chart, covering the weekend (click here for the full-size version):

Full social media chart Aug 26 2012

In my analysis, I write:

In recent weeks, Obama has generally led Romney by two to seven percentage points in national polls, which carefully select their samples to reflect Americans most engaged in the election and registered to vote.

The picture is different among Americans who have gone online to talk about the election, however — NBCPolitics.com’s analysis indicates that that narrower but more diverse sample of the country prefers Romney by 36 percent to 32 percent overall and by 51 percent to 49 percent when they’re compared head to head:

'Intent to vote' sentiment Aug 26 2012

The report is unique among those being promoted by our competitors, whose social media analysis has largely focused on two metrics: “buzz,” or how much each candidate is talked about online in general, and “effectiveness,” or how extensively each candidate is using social media as a campaign tool.

Our analysis, by contrast, explores the actual content of what is being said, providing a glimpse at what issues are specifically driving people’s opinions. And that raises the question of whether an experiment like this is valid in any way. The short answer is “maybe.”

The long answer is that it won’t be long until full, statistically valid quantification of aggregated sentiment is common and easy. Several companies are working on it already, and we were impressed enough with ForSight, the tool built by Crimson Hexagon, to give it a test drive. It’s the same tool the Pew Research Center uses when it builds its reports on social media sentiment, which we considered an important endorsement.

We’ve actually been programming Crimson Hexagon and tracking the candidates since January. I’ve spent the past 7½ months combing through all its white papers, statistical models and documentation, building test programs and generally shaking it down. I, at least, am convinced that the data we’re getting are pretty solid.

Crimson Hexagon reports that its analysis of social media sentiment carries a statistical margin of error of plus or minus 3 percentage points. My own analysis, however, is that a lot of factors can affect that figure, two in particular: the universe of posts the tool examines and the precision of the analysis monitors written by the client.
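
For a sense of what a 3-point margin means, here is a minimal sketch of the textbook margin-of-error formula for a simple random sample. It is an illustration only: it assumes simple random sampling and a 95 percent confidence level (z = 1.96), and nothing in Crimson Hexagon's published figure confirms that it is derived this way. The function names are hypothetical.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Textbook margin of error for a simple random sample.

    Assumes a 95 percent confidence level (z = 1.96) and the worst-case
    proportion of 0.5 unless told otherwise. This is NOT confirmed to be
    how Crimson Hexagon computes its reported 3-point figure.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


def sample_size_for(margin, proportion=0.5, z=1.96):
    """Smallest simple random sample that achieves the given margin."""
    if margin <= 0:
        raise ValueError("margin must be positive")
    return math.ceil(z ** 2 * proportion * (1 - proportion) / margin ** 2)


if __name__ == "__main__":
    needed = sample_size_for(0.03)
    print(f"Posts needed for a 3-point margin: {needed}")
    print(f"Margin at that size: {margin_of_error(needed):.4f}")
```

Under those assumptions, roughly a thousand randomly drawn posts are enough for a 3-point margin. The formula covers sampling error only; it says nothing about which posts were available to be sampled in the first place or about how accurately each post was classified.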

Crimson Hexagon has a direct connection to Twitter and collects the full firehose of all tweets sent every day. It collects a much smaller proportion of Facebook posts because of limits in the Facebook API. This means the ratio of Twitter-to-Facebook posts can be as high as 3:1.

Then the client has to devise very specific algorithms telling it what to look for and how to classify it. In our case, that programmer is me. My algorithms have been crafted to winnow out off-topic posts and straight news reports and to distinguish between different meanings of key words, like “race,” which can appear as generic references to the “race for the White House” or to the race of a person.
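
To make the “race” problem concrete, here is a toy sketch of keyword disambiguation using hand-written rules. It is not ForSight's method and not the monitors described above; the patterns, labels and function name are all hypothetical, chosen only to show why the surrounding words matter.

```python
import re

# Hypothetical patterns: phrases in which "race" clearly means the contest.
ELECTION_CONTEXT = re.compile(
    r"\brace\s+(?:for|to)\s+(?:the\s+)?(?:white\s+house|president|presidency)\b"
    r"|\b(?:presidential|senate|tight|close)\s+race\b",
    re.IGNORECASE,
)
RACE_WORD = re.compile(r"\brace\b", re.IGNORECASE)


def classify_race_mention(post):
    """Label how a post uses the word "race".

    Returns "no_mention" if the word is absent, "election_contest" if it
    appears in one of the election phrases above, and "unresolved" when
    the rules cannot tell which meaning is intended.
    """
    if not RACE_WORD.search(post):
        return "no_mention"
    if ELECTION_CONTEXT.search(post):
        return "election_contest"
    return "unresolved"


if __name__ == "__main__":
    for text in (
        "The race for the White House is tightening",
        "His comments about race drew criticism",
        "Watching the convention tonight",
    ):
        print(classify_race_mention(text), "|", text)
```

A real monitor would need far more than a couple of patterns, which is why reviewing samples of the raw posts against the resulting classifications matters.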

I’ve been running more than 30 daily monitors tracking the four candidates for more than seven months, reviewing both samples of the raw posts the tool is examining and the resulting classifications it’s spitting out. I’m comfortable with Crimson Hexagon’s assertion of a 3-point margin of error, but in light of its limited Facebook sample, I’m stating in in-house reports and published articles that the margin should be considered to apply to the examined sample (which can run into hundreds of thousands of tweets and posts a day) but not necessarily to the universe of social media commentary.

By publishing these charts on the main NBCPolitics.com site, we’re making a big statement about the importance of social media in the political conversation in general and about the reliability of our data in particular. The rapidity with which such analysis is being refined means that by this time next year, what we’re doing could look primitive. But the technology is the best in the business today, and we decided that if anyone was going to be first to experiment with this kind of analysis in the daily news report, it should be us.

Your feedback would be enormously helpful, so please drop a note in the comments.

28 Responses

  1. […] error of plus or minus 3 percentage points among a self-selected social media audience. Click here for a detailed […]

    • also use the evolution is just a different religion lolz argument to try to bolster their viewpoint. Like you just did with climate scientists and those who accept their findings. To a climate change denier the term falsifiable hypothesis is unheard of, it seems. Meanwhile almost the entire body of climate scientists are unequivocally showing us that the world is getting warmer due primarily to human actions. But who cares what those eggheads have to say, right?


      May 20, 2012 at 1:45 pm

  3. […] (The analysis, which ran from 3 p.m. ET Wednesday, when ABC News promoted an interview with Obama, to 3 p.m. ET Thursday, used a tool called ForSight, a natural-language data platform developed by Crimson Hexagon Inc. For this type of opinion analysis, Crimson Hexagon reports a margin of sampling error of plus or minus 3 percentage points among a self-selected social media audience. Click here for a detailed explanation.) […]

  4. Although the idea is great, some would look at this as a bad image, and it could influence others in forming a major opinion on whether a candidate is or isn’t electable. But this is just for politics; in any other domain, this “popularity scale” is a nice thing.


    May 15, 2012 at 10:38 am

  5. […] Explainer: Can you scientifically quantify social media opinion? […]

  9. I believe that you could scientifically quantify social media opinion, with caveats. Most scientific inquiry begins with a question, then observation and analysis. You’d need to make sure that all of the analysis pertains to that portion of the population that uses or has access to social media (i.e. your demographics). If your analysis is constrained to Facebook and Twitter, list those constraints at the beginning, because those are not the only forms of social media. Your results should be clearly presented along with the methods used to compile those results. This way, your results and conclusions could be independently confirmed. Please also be certain to describe the margin of error and what that error is attributable to. Sentiment carries a margin of error of 3 percent; well, which sentiment?


    October 5, 2012 at 2:00 pm

  14. I understand your desire to tap into the social media universe as a means of getting a handle on what the public is thinking on important issues such as the election. However, your analysis is subject to a number of types of bias. Having taught research methods and statistics at the university level for years, I know how tempting it can be, since the data are out there. These data from social media websites are neither random nor representative of public opinion. The main reason is self-selection bias. People who comment on the debates have a strong opinion one way or the other to begin with, so they are not a viable option for inclusion in a sample of opinion for the country at large. It would be like standing in front of an entrance to a mall and assuming that by picking every third person who came out of the mall you were creating a random sample.

    In addition to bias, there are both internal and external threats to the validity of any conclusions drawn from these “samples”. In terms of internal validity, what is the reliability of the measures you are using for your causal model? How have you accounted for plausible rival explanations to your conclusions? In particular, your “experiment” is vulnerable to history threats to its internal validity. Debates are shaped by outside events, and the opinions of those on Twitter are no less, indeed probably more, influenced by outside events. What about environmental factors that affect the dependent variable, which I assume is a person’s opinion about who won the debate? The people tweeting are not objective in any classical sense of the word. In fact, many may, indeed probably do, have a favored candidate and therefore are likely to view the answers of the opposing candidate in a less than objective fashion.

    More fundamentally, in terms of external validity, this is not an “experiment” in the classic sense of the term. You may be using a data analysis tool, but you are not “experimenting,” so any conclusions you draw are not predictive in any scientific sense of the word. In general, the best you could say would be that this represents an interesting method for tapping into what those who are watching the debate are tweeting at the time, not who won or lost the debate.


    October 11, 2012 at 7:54 pm

  15. Patrick:

    All of this is valid, and I’m careful to try to acknowledge it in every piece I do for NBCNews.com. Each one includes this passage well near the top:

    > For this report, NBCPolitics.com analyzed XXXXX social media posts using ForSight, a data platform developed by Crimson Hexagon Inc., which many research and business organizations have adopted to gauge public opinion in new media.

    > It isn’t the same as a traditional survey, which seeks to reflect national opinion; instead, it’s a broad, non-predictive snapshot of what’s being said by Americans who follow politics and are active on Facebook, Twitter or both at a particular moment in time, and why they’re saying it.

    I think we agree to a great extent. We’ve never promoted these stories as anything other than “hey, here’s something interesting, but don’t draw too many conclusions.” And I hope you’ll have noticed that this explainer post doesn’t actually come to a definitive answer to the question posed by its headline.


    October 11, 2012 at 8:08 pm

    • Alex,

      Thanks for replying to my post so quickly. I would agree that your posting is not definitive, and I also commend you for trying to make use of an interesting dataset as opposed to running the usual “horse race” story. I believe that in the future we will be able to bring a more robust set of statistical tools to bear on these datasets. In particular, I use social network analysis (SNA) to deal with the issues concerning nonlinearity, heteroscedasticity and nonnormality of sampling distributions commonly found in these social media datasets. You may be aware of the work of James Fowler, who just published a big study using about 600 million Facebook data points to describe the influence of Facebook on the likelihood of voting. In my estimation, this is where some important academic work is headed. It could also provide some interesting results, as SNA produces quite striking graphs.


      October 11, 2012 at 9:49 pm
