Friday, February 17, 2012

Twitter Sentiment Analysis: A Brief Bibliography

Twitter’s TweetStream is a Big Data mother lode that you can filter and mine to gain business intelligence about companies, brands, politicians and other public personae. Social media data is a byproduct of cloud-based Software as a Service (SaaS) Internet sites targeted at consumers, such as Twitter, Facebook, LinkedIn and StackExchange. Filtered versions of this data are worth millions to retailers, manufacturers, pollsters, politicians and political parties, as well as marketing, financial and news analysts.

• Updated the original 11/26/2011 version on 2/17/2012 with added details of Twitter sentiment analysis by the Microsoft Codename “Social Analytics” Team, Microsoft Research personnel and Susan Etlinger:

Sentiment Analysis According to Wikipedia

Wikipedia’s Sentiment Analysis topic begins:

Sentiment analysis or opinion mining refers to the application of natural language processing, computational linguistics, and text analytics to identify and extract subjective information in source materials.

Generally speaking, sentiment analysis aims to determine the attitude of a speaker or a writer with respect to some topic or the overall contextual polarity of a document. The attitude may be his or her judgment or evaluation (see appraisal theory), affective state (that is to say, the emotional state of the author when writing), or the intended emotional communication (that is to say, the emotional effect the author wishes to have on the reader).

Subtasks

A basic task in sentiment analysis[1] is classifying the polarity of a given text at the document, sentence, or feature/aspect level — whether the expressed opinion in a document, a sentence or an entity feature/aspect is positive, negative, or neutral. Advanced, "beyond polarity" sentiment classification looks, for instance, at emotional states such as "angry," "sad," and "happy."

Early work in that area includes Turney [2] and Pang [3], who applied different methods for detecting the polarity of product reviews and movie reviews respectively. This work is at the document level. One can also classify a document's polarity on a multi-way scale, which was attempted by Pang [4] and Snyder [5], among others: Pang [4] expanded the basic task of classifying a movie review as either positive or negative to predicting star ratings on either a 3- or a 4-star scale, while Snyder [5] performed an in-depth analysis of restaurant reviews, predicting ratings for various aspects of the given restaurant, such as the food and atmosphere (on a five-star scale).

A different method for determining sentiment is the use of a scaling system whereby words commonly associated with a negative, neutral or positive sentiment are assigned a number on a -5 to +5 scale (most negative to most positive). When a piece of unstructured text is analyzed using natural language processing, the concepts it contains are identified, and each concept is given a score based on how sentiment words relate to it and on their associated scores. This allows movement to a more sophisticated understanding of sentiment based on an 11-point scale. Alternatively, texts can be given separate positive and negative sentiment strength scores if the goal is to determine the sentiment in a text rather than the overall polarity and strength of the text [6]. …
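To make the word-scoring approach concrete, here is a minimal Python sketch; the five-word lexicon is invented for the example, and real systems such as SentiStrength [6] combine far larger lexicons with rules for negation and boosters:

```python
# Minimal lexicon-based scorer; the word scores are invented examples
# on the -5..+5 scale described above.
LEXICON = {"excellent": 4, "good": 2, "okay": 0, "bad": -2, "awful": -4}

def score_text(text):
    """Average the scores of the known sentiment words in the text."""
    words = text.lower().split()
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return sum(hits) / len(hits) if hits else 0  # no hits -> neutral

print(score_text("The food was excellent but the service was bad"))  # 1.0
```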

The topic continues with Methods, Evaluation, Sentiment Analysis and Web 2.0 and See Also sections. From the References section:

    1. Michelle de Haaff (2010), Sentiment Analysis, Hard But Worth It!, CustomerThink, http://www.customerthink.com/blog/sentiment_analysis_hard_but_worth_it, retrieved 2010-03-12.
    2. Peter Turney (2002). "Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews". Proceedings of the Association for Computational Linguistics. pp. 417–424. arXiv:cs.LG/0212032.
    3. Bo Pang; Lillian Lee and Shivakumar Vaithyanathan (2002). "Thumbs up? Sentiment Classification using Machine Learning Techniques". Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP). pp. 79–86. http://www.cs.cornell.edu/home/llee/papers/sentiment.home.html.
    4. Bo Pang; Lillian Lee (2005). "Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales". Proceedings of the Association for Computational Linguistics (ACL). pp. 115–124. http://www.cs.cornell.edu/home/llee/papers/pang-lee-stars.home.html.
    5. Benjamin Snyder; Regina Barzilay (2007). "Multiple Aspect Ranking using the Good Grief Algorithm". Proceedings of the Joint Human Language Technology/North American Chapter of the ACL Conference (HLT-NAACL). pp. 300–307. http://people.csail.mit.edu/regina/my_papers/ggranker.ps.
    6. Thelwall, Mike; Buckley, Kevan; Paltoglou, Georgios; Cai, Di; Kappas, Arvid (2010). "Sentiment strength detection in short informal text". Journal of the American Society for Information Science and Technology 61 (12): 2544–2558. http://www.scit.wlv.ac.uk/~cm1993/papers/SentiStrengthPreprint.doc.

The Thelwall et al. paper, which describes analyzing MySpace comments, is the most germane to Twitter sentiment analysis.


Twitter Sentiment Analysis at Microsoft Research

Microsoft Research has maintained active research projects in natural language processing (NLP) for many years, first at its San Francisco campus and later at Microsoft Research Asia, headquartered in Beijing, China.

The Social Analytics Team described “Lab Bonus! Enhanced Sentiment Analysis for Twitter from Microsoft Research” in a 2/2/2012 post:

What’s new?

In this release, we enhanced our sentiment software by upgrading to the latest version of code from the same Microsoft Research team whose work we used in prior releases. The major improvement in this version of their code is a new classifier specifically trained on tweets. The sentiment analysis code we used in prior releases from Microsoft Research was trained on short sentences and paragraphs. We expect the accuracy of sentiment analysis in Social Analytics to improve by using the tweet-trained classifier for Twitter content items. We will continue to use the sentence and paragraph classifiers on all other content.

The tweet classifier was trained on nearly 4 million tweets from over a year’s worth of English Twitter data. It is based on a study of how people express their moods on Twitter with mood-indicating hashtags. We mapped over 150 different mood-bearing hashtags to positive and negative affect, and used the hashtags as a training signal to learn which words and word pairs in a tweet are highly correlated with positive or negative affect.
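To illustrate the training-signal idea, here is a hedged Python sketch of hashtag-based labeling; the hashtag sets below are invented stand-ins for the 150+ mood-bearing hashtags the team curated, and the real pipeline is surely more elaborate:

```python
import re

# Invented stand-ins for the 150+ mood-bearing hashtags mapped to affect.
POSITIVE_TAGS = {"#happy", "#excited", "#blessed"}
NEGATIVE_TAGS = {"#sad", "#angry", "#depressed"}

def label_tweet(tweet):
    """Return (text without mood hashtags, label), or None if no signal.

    The mood hashtag supplies the training label and is stripped from the
    text, so a classifier must learn from the remaining words and pairs.
    """
    tags = set(re.findall(r"#\w+", tweet.lower()))
    if tags & POSITIVE_TAGS:
        label = "positive"
    elif tags & NEGATIVE_TAGS:
        label = "negative"
    else:
        return None
    return re.sub(r"#\w+", "", tweet).strip(), label

print(label_tweet("Finally shipped the project #happy"))
# ('Finally shipped the project', 'positive')
```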

How we use the sentiment software in Social Analytics

We use the MSR sentiment software to assess the tone of all content items as part of our enrichment process. When the assessment is complete, we store both that ranking and the reliability of that assessment in these two fields, respectively: CalculatedToneID and ToneReliability. Our API and sample Silverlight client UI expose a content item as either positive or negative if the sentiment engine scores the item with a reliability percentage over a threshold we determine.

Here is a simple explanation of the three fields related to sentiment in the ContentItem table:

CalculatedToneID: The sentiment (or tone) of the content item as determined by the sentiment software:

  • 3 = Neutral
  • 5 = Positive
  • 6 = Negative

ToneReliability: The reliability of the tone calculation as determined by the sentiment software. The reliability thresholds are currently 80% for positive sentiment and 90% for negative sentiment. If we’re below the reliability threshold, the CalculatedToneID will be set to neutral.

ToneID: If a user sets the sentiment manually in our UI or through an API call, the tone they set is stored in this field. If ToneID is set, we show ToneID rather than CalculatedToneID in the UI and return it in API calls.

For more details on this Microsoft Research project, check out http://research.microsoft.com/tweetaffect! [See following article.]
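The threshold behavior described in the field list above is easy to express in code. A minimal Python sketch, assuming the engine returns a polarity label and a reliability percentage; the function and variable names are mine, not the Social Analytics API:

```python
NEUTRAL, POSITIVE, NEGATIVE = 3, 5, 6  # CalculatedToneID values from the table

def calculated_tone(label, reliability):
    """Map the engine's (label, reliability %) output to a CalculatedToneID;
    below the documented thresholds, the tone falls back to neutral."""
    if label == "positive" and reliability >= 80:
        return POSITIVE
    if label == "negative" and reliability >= 90:
        return NEGATIVE
    return NEUTRAL

def displayed_tone(tone_id, calculated_tone_id):
    """A manually set ToneID, when present, overrides CalculatedToneID."""
    return tone_id if tone_id is not None else calculated_tone_id

print(calculated_tone("negative", 85))  # 3: below the 90% negative threshold
```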


Scott Counts (@scottjcounts), Munmun De Choudhury (@munmun10) and Michael Gamon of Microsoft Research described Affect detection in tweets in a January 2012 project description:

The case for automatic affect detection:

Detecting affect in free text has a wide range of possible applications:

  • What are the positive and negative talking points of your customers?
  • What opinions are out there on products and services (on Twitter, Facebook, in product reviews, etc.)?
  • How does mood and sentiment trend over time, geography and populations?

Similarly, there are different techniques to automatically detect affect: some systems use hand-curated word lists of positive and negative opinion terms, others use statistical models that are trained on opinion-heavy text. The challenge is to come up with a system that works reasonably well across various domains and types of content. In other cases, though, it would be better to use a classifier specific to a particular task, in which case the challenge is in creating, or finding, enough annotated text to train a classifier.

People:

Scott Counts, Munmun De Choudhury and Michael Gamon

Recently we conducted a study based on the psychological literature in which we identified over 150 different mood hashtags that people use on Twitter. We mapped these hashtags to positive and negative affect and used them as a training signal to identify affect from the tweet text. We collected nearly four million tweets spanning one year and trained a text classifier on this data.

How the classifier works:

The classifier is trained on text with known affect (positive or negative). For each such text, words and word pairs are extracted and counted. At training time, the classification algorithm (a maximum entropy classifier) assigns numerical weights to the words and word pairs depending on how strongly they correlate with positive or negative opinion. At runtime, a new text is passed in and its words and word pairs are extracted. These are passed into the classifier, the weights for the words/pairs are looked up and combined in a classifier-specific mathematical formulation, and the output is a prediction (positive or negative) and a probability.
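A sketch of that train-then-predict cycle using scikit-learn, whose LogisticRegression is a maximum entropy classifier for this binary case; the four-tweet training set is invented, and ngram_range=(1, 2) approximates the words-plus-word-pairs features:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy examples standing in for millions of hashtag-labeled tweets.
texts = ["love this phone", "best day ever",
         "hate the traffic", "worst service ever"]
labels = ["positive", "positive", "negative", "negative"]

# Extract and count words and word pairs (unigrams and bigrams).
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(texts)

# Training assigns each word/word-pair feature a numerical weight.
clf = LogisticRegression().fit(X, labels)

# Runtime: featurize new text, combine the weights, and output a
# prediction plus a probability.
new = vectorizer.transform(["best phone ever"])
print(clf.predict(new)[0], clf.predict_proba(new).max())
```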

[The original project description includes diagrams of the training-time and runtime workflows.]

Related References:

  • Diakopoulos, N., De Choudhury, M., and Naaman, M. (2012). “Finding and Assessing Social Media Information Sources in the Context of Journalism.” Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Austin, TX, USA, May 5-10, 2012). CHI 2012.
  • De Choudhury, M., Diakopoulos, N., and Naaman, M. (2012). “Unfolding the Event Landscape on Twitter: Classification and Exploration of User Categories.” Proceedings of the 15th ACM Conference on Computer Supported Cooperative Work (Seattle, WA, USA, Feb 11-15, 2012). CSCW 2012.
  • De Choudhury, M., Counts, S., and Czerwinski, M. (2011). “Find Me the Right Content! Diversity-Based Sampling of Social Media Spaces for Topic-Centric Search.” Proceedings of the 22nd ACM Conference on Hypertext and Hypermedia (Eindhoven, The Netherlands, Jun 6-9, 2011). HT 2011.


Meredith Ringel Morris, Scott Counts, Aaron Hoff, Asta Roseway, and Julia Schwarz described “Tweeting is Believing? Understanding Microblog Credibility Perceptions” in a February 2012 paper presented at ACM CSCW 2012. From the Abstract:

Twitter is now used to distribute substantive content such as breaking news, increasing the importance of assessing the credibility of tweets. As users increasingly access tweets through search, they have less information on which to base credibility judgments as compared to consuming content from direct social network connections. We present survey results regarding users’ perceptions of tweet credibility. We find a disparity between features users consider relevant to credibility assessment and those currently revealed by search engines. We then conducted two experiments in which we systematically manipulated several features of tweets to assess their impact on credibility ratings. We show that users are poor judges of truthfulness based on content alone, and instead are influenced by heuristics such as user name when making credibility assessments. Based on these findings, we discuss strategies tweet authors can use to enhance their credibility with readers (and strategies astute readers should be aware of!). We propose design improvements for displaying social search results so as to better convey credibility.

The complete paper is available in PDF format here.


The QuickView Project at Microsoft Research Asia

According to a July 2011 paper by the Microsoft QuickView Team, presented at SIGIR ’11, July 24-28, 2011, Beijing, China (ACM 978-1-4503-0757-4/11/07), QuickView is an advanced search service that uses natural language processing (NLP) technologies, including sentiment analysis, to extract meaningful information from tweets and index them in real time. From the paper’s Extended Abstract:

With tweets being a comprehensive repository for super-fresh information, tweet search becomes increasingly popular. However, existing tweet search services, e.g., Twitter, offer only simple keyword-based search. Owing to the noisy and informal nature of tweets, the returned list does not contain meaningful information in many cases.

This demonstration introduces QuickView, an advanced tweet search service to address this issue. It adopts a series of natural language processing technologies to extract useful information from a large volume of tweets. Specifically, for each tweet, it first conducts tweet normalization, followed by named entity recognition (NER). Our NER component is a combination of a k-nearest neighbors (KNN) classifier (to collect global information across recently labeled tweets) with a Conditional Random Fields (CRF) labeler (to exploit information from a single tweet and the gazetteers). Then it extracts opinions (e.g., positive or negative comments about a product). After that it conducts semantic role labeling (SRL) to get predicate-argument structures (e.g., verbs and their agents or patients), which are further converted into events (i.e., triples of who did what). We follow Liu et al. [1] to construct our SRL component. Next, tweets are classified into predefined categories. Finally, non-noisy tweets together with the mined information are indexed. On top of the index, QuickView enables the following two brand-new scenarios, allowing users to effectively access the tweets or the fine-grained mined information they are interested in.
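The stage ordering the abstract describes can be summarized as a pipeline skeleton. Every function below is a trivial stand-in for the real component named in its comment (KNN + CRF NER, SRL, and so on), so this sketch only illustrates the flow, not the techniques themselves:

```python
# Schematic of the QuickView pipeline order; each stage is a toy stand-in.

def normalize(tweet):            # tweet normalization
    return tweet.strip()

def recognize_entities(text):    # KNN classifier + CRF labeler (NER)
    return [w for w in text.split() if w[:1].isupper()]

def extract_opinions(text):      # opinion extraction (pos/neg comments)
    return ["positive"] if "love" in text.lower() else []

def label_roles(text):           # SRL -> (who, did, what) event triples
    words = text.split()
    return [(words[0], words[1], " ".join(words[2:]))] if len(words) >= 3 else []

def categorize(text):            # classification into predefined categories
    return "uncategorized"

def process(tweet):
    """Run one tweet through the stages in the order the abstract gives."""
    text = normalize(tweet)
    return {
        "text": text,
        "entities": recognize_entities(text),
        "opinions": extract_opinions(text),
        "events": label_roles(text),
        "category": categorize(text),
    }

print(process("Obama visited Beijing today"))
```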

Categorized Browsing. As illustrated in Figure 1(a), QuickView shows recent popular tweets, entities, events, opinions and so on, organized by categories. It also extracts and classifies URL links in tweets to allow users to check out popular links in a categorized way.


Fig. 1(a) A screenshot of the categorized browsing scenario.

Advanced Search. As shown in Figure 1(b), QuickView provides four advanced search functions: 1) search results are clustered so that tweets about the same or similar topics are grouped together, and for each cluster only the informative tweets are kept; 2) when the query is a person or a company name, two bars are presented followed by the words that strongly suggest opinion polarity, with each bar’s width proportional to the number of associated opinions; 3) similarly, the top 6 most frequent words that most clearly express event occurrences are presented; 4) users can search tweets with opinions or events, e.g., search tweets containing any positive/negative opinion about Obama or any event involving Obama.


Fig. 1(b) A screenshot of the advanced search scenario.

Twahpic: Twitter topic modeling

"Twahpic" shows what tweets on Twitter™ are about in terms of both topics (like sports, politics, Internet, etc) and axes of Substance, Social, Status, and Style. Twahpic uses Partially Labeled Latent Dirichlet Analysis (PLDA) to identify 200 topics used on Twitter. Try the demo and see what topics someone uses or what a hashtag is all about!

Try the demo: http://twahpic.cloudapp.net/
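Packaged PLDA implementations are scarce, but plain LDA with gensim shows the underlying topic-extraction idea; the toy corpus below is invented and far too small to yield meaningful topics, and PLDA additionally constrains some topics with known labels such as hashtags:

```python
from gensim import corpora, models

# Tiny invented corpus; Twahpic trains on real tweets at scale.
docs = [
    ["windows", "phone", "release", "microsoft"],
    ["game", "score", "team", "win"],
    ["windows", "update", "microsoft", "beta"],
    ["team", "season", "game", "coach"],
]

# Build the vocabulary and bag-of-words corpus.
dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

# Plain (unlabeled) LDA with two topics; PLDA would tie topics to labels.
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)
for topic_id, words in lda.print_topics():
    print(topic_id, words)
```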

[Screenshot of the Twahpic interface]


Here’s a screen capture of Twahpic displaying Tweets about Windows 8 on 2/16/2012:

[Screen capture]


Twitter Sentiment, a Demo Using the Google App Engine

Stanford University students Alec Go, Richa Bhayani and Lei Huang developed a Google App Engine project called Twitter Sentiment “to research the sentiment for a brand, product, or topic.” Here are the results for Windows 8 on a relatively small Tweet sample:

[Screen capture]

Note: Tweets are color-coded pink (negative) and light green (positive).

According to the Twitter Sentiment help file:

What is Twitter Sentiment?

Twitter Sentiment allows you to research the sentiment for a brand, product, or topic.

Twitter Sentiment is a class project from Stanford University. We explored various aspects of sentiment analysis classification in the final projects for several of our classes.

What are the use cases?

  1. Brand management (e.g. windows 7)
  2. Polling (e.g. obama)
  3. Purchase planning (e.g. kindle)
Who created this?

Twitter Sentiment was created by three Computer Science graduate students at Stanford University: Alec Go, Richa Bhayani, and Lei Huang. Twitter Sentiment is an academic project. It is neither encouraged nor discouraged by our employers.

How does this work?

You can read about our approach in our technical report: Twitter Sentiment Classification using Distant Supervision. We also added new pieces that aren't described in this paper. For the latest in this field, read the papers that cite us.
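The distant-supervision idea in that report, as I read it, is to use emoticons as noisy training labels in place of hand annotation. A minimal Python sketch; the emoticon sets are a small illustrative subset:

```python
# Emoticons as noisy labels, per the distant-supervision approach;
# these sets are a small illustrative subset.
POSITIVE = (":)", ":-)", ":D")
NEGATIVE = (":(", ":-(")

def distant_label(tweet):
    """Label a tweet by its emoticon, then strip the emoticon so a
    classifier can't simply memorize it."""
    if any(e in tweet for e in POSITIVE):
        label = "positive"
    elif any(e in tweet for e in NEGATIVE):
        label = "negative"
    else:
        return None  # no emoticon, no training signal
    for e in POSITIVE + NEGATIVE:
        tweet = tweet.replace(e, "")
    return tweet.strip(), label

print(distant_label("New kindle arrived today :)"))
# ('New kindle arrived today', 'positive')
```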

Can you help me?

Sure, we really like helping people with machine learning, natural language processing, or social media analysis questions. Feel free to contact us if you need help.

How is this different?

Our approach is different from other sentiment analysis sites because:

  • We use classifiers built from machine learning algorithms. Some other sites use a simpler keyword-based approach, which may have higher precision but lower recall.
  • We are transparent in how we classify individual tweets. Other sites do not show you the classification of individual tweets and only show aggregated numbers, which makes it difficult to assess how accurate their classifiers are.
Related work

If you like Twitter Sentiment, you might like Twitter Earth, which allows you to visualize tweets on Google Earth.


Presentations about Social Media Analytics by Altimeter Group’s Susan Etlinger

Susan Etlinger (@setlinger) is an Industry Analyst with Altimeter Group, where she focuses on helping clients develop strategic and actionable social media measurement and listening strategies. Susan is regularly interviewed and asked to speak on social strategy and best practices for business.

The following presentations and report cover a wide range of social media analytic techniques with multiple data sources, including Twitter:

[Embedded slide presentations]

