CS 498 | Mar 6
SENTIMENT ANALYSIS
Macbeth, Scene 1, Act 2 from Wordle
my Citeulike page
Brad Paley’s TextArc.
Fernanda Viégas’s Themail.
Martin Wattenberg’s recent Word Tree visualization, showing Alberto Gonzales’s testimony.
PNNL’s ThemeRiver.
PNNL’s IN-SPIRE.
tools
you know...practical stuff
Stanford’s list
http://nlp.stanford.edu/links/statnlp.html
LIWC
http://www.liwc.net
SentiWordNet
http://sentiwordnet.isti.cnr.it
Pang & Lee’s data at Cornell
http://www.cs.cornell.edu/People/pabo/movie-review-data
http://www.cs.cornell.edu/home/llee/data/convote.html
analysis & design
how we might use it and why
“If I could figure out a way to determine whether people are more fearful or changing to more euphoric … I can forecast the economy better than any way I know. The trouble is, we can't figure that out.”
— Alan Greenspan, Jan 2008
1841
Nasdaq vs. LiveJournal “anxious” moods, Jan 3 – Oct 26, 2007.
BOOSTED DECISION TREE CLASSIFIER
- 1. nerv*
- 2. wor*
- 3. anx*
- 4. hop*
- 5. you*
- 6. scar*
- 7. tomorrow
- 8. fun
- 9. war
- 10. your*
- 11. going
- 12. be*
- 13. interview
Other notables:
- 16. lov*
- 21. hospital
- 36. awesome
- 51. yay
- 89. exam*
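The entries above read like word-prefix patterns, where "nerv*" covers "nervous", "nerves", and so on. A minimal sketch of turning such patterns into per-post counts follows, assuming that "*" means "any suffix". The pattern subset, the function names, and the sample post are illustrative and not taken from the slides.

```python
import re

# Illustrative subset of the patterns listed above; "*" is assumed to mean "any suffix".
PATTERNS = ["nerv*", "wor*", "anx*", "hop*", "scar*", "tomorrow", "fun"]


def compile_pattern(pattern):
    """Build a whole-word regex; 'nerv*' matches any word starting with 'nerv'."""
    if pattern.endswith("*"):
        return re.compile(r"\b" + re.escape(pattern[:-1]) + r"\w*\b")
    return re.compile(r"\b" + re.escape(pattern) + r"\b")


COMPILED = {p: compile_pattern(p) for p in PATTERNS}


def stem_features(text):
    """Count the words in a post that match each pattern."""
    lowered = text.lower()
    return {p: len(rx.findall(lowered)) for p, rx in COMPILED.items()}


# Hypothetical post text, for illustration only.
features = stem_features("So nervous and worried about tomorrow, hoping it goes ok")
```

Counts like these could serve as inputs to a classifier. Broad prefixes such as "wor*" also match unrelated words like "work", which is a known cost of this kind of matching.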
Bagged Naive Bayes classifier
Anxious true positive rate: 28%
Anxious false positive rate: 3.4%
Boosted Decision Tree classifier
Anxious true positive rate: ~30%
Anxious false positive rate: ~6%
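For reference, the true positive rate is the share of truly anxious posts that the classifier labels anxious, and the false positive rate is the share of non-anxious posts that it labels anxious. A minimal sketch of both computations follows. The label strings and the toy data are made up for illustration and are not the slides' evaluation data.

```python
def rates(y_true, y_pred, positive="anxious"):
    """Return (true positive rate, false positive rate) for the positive label."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Toy labels: 4 anxious posts and 6 other posts.
y_true = ["anxious"] * 4 + ["other"] * 6
y_pred = ["anxious", "other", "other", "other"] + ["anxious"] + ["other"] * 5
tpr, fpr = rates(y_true, y_pred)
```

A low false positive rate matters here because non-anxious posts vastly outnumber anxious ones, so even a small rate adds many mislabeled posts to the anxious count.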
All LiveJournal blog posts
posts per minute: ~107
Figure: Percentage of anxious posts in each 10-min period, shown with the Adapted Wald adjustment (lower bound on 95% CI) and a 60-min moving average.
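The slides do not spell out the adjustment. A minimal sketch follows, assuming it is the adjusted Wald (Agresti-Coull) interval for a proportion and that the 60-min moving average is a trailing mean over six 10-min periods. The function names and the example counts are illustrative.

```python
import math
from collections import deque


def adjusted_wald_lower(successes, n, z=1.96):
    """Lower bound of the adjusted Wald (Agresti-Coull) interval; z=1.96 gives ~95%."""
    n_adj = n + z * z
    p_adj = (successes + z * z / 2.0) / n_adj
    half_width = z * math.sqrt(p_adj * (1.0 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width)


def moving_average(values, window=6):
    """Trailing mean over the last `window` points (six 10-min periods = 60 min)."""
    recent = deque(maxlen=window)
    out = []
    for v in values:
        recent.append(v)
        out.append(sum(recent) / len(recent))
    return out


# Illustrative counts: 50 posts classified anxious out of 1,070 in one 10-min period.
lower = adjusted_wald_lower(50, 1070)
smoothed = moving_average([1, 2, 3, 4, 5, 6, 7])
```

Plotting the lower bound rather than the raw percentage is a conservative choice: a period with few posts cannot produce a large spike on its own.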
Figure: Percentage of anxious blog posts (5% to 15%) and Dow Jones daily close (11.6K to 13.2K), Jan to Jun 2008. Legend: Dow Jones daily close; 7-day exponential moving average; percentage anxious blog posts; μ + 6σ.
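The legend mentions a 7-day exponential moving average and a μ + 6σ level. A minimal sketch of both follows, under two assumptions that the slides do not confirm: the smoothing factor is 2 / (span + 1), and μ and σ are the mean and standard deviation of the whole series. The function names and the toy series are illustrative.

```python
from statistics import mean, pstdev


def ema(values, span=7):
    """Exponential moving average; alpha = 2 / (span + 1) is an assumed convention."""
    if not values:
        return []
    alpha = 2.0 / (span + 1)
    out = [float(values[0])]
    for v in values[1:]:
        out.append(alpha * v + (1.0 - alpha) * out[-1])
    return out


def spikes(values, k=6.0):
    """Indices of points above mean + k * standard deviation of the whole series."""
    threshold = mean(values) + k * pstdev(values)
    return [i for i, v in enumerate(values) if v > threshold]


# Toy series: 99 quiet periods around 5% anxious posts, then one jump to 15%.
series = [0.05, 0.051, 0.049] * 33 + [0.15]
spike_indices = spikes(series)
smoothed = ema([0, 4])
```

A threshold this strict only fires on long series, because a single point can never sit more than sqrt(n - 1) population standard deviations above the mean of the n points that contain it.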
Jan 26
Of three predictive spikes, this is the furthest from a local maximum, which appears 5 trading days later on Feb 1. The SC primary happens on this date. The Fed lowers rates 4 days before and 4 days after this date.

Feb 24
This spike comes three days before the second most critical maximum over this six-month period. The Dow takes nearly two months to recover. After searching newspapers near this date, it is not clear what event may have caused this spike. Consumer confidence and poor business/housing reports follow in the next two days.

Mar 25
This spike is probably noise, although it does preface a steep decline. Detecting important blogs and topics may eliminate spikes like these. The Conference Board’s consumer confidence came out this day and could be responsible for the spike.

May 16
This spike appears three days before the most important local maximum. As of June 24, the Dow has still not recovered from May 19, dropping nearly 10% to date. Michigan’s consumer sentiment index came out this day, along with unexpectedly poor housing numbers. May 19 followed with many poor business reports (2.5 s.d. anxiety spikes on May 19).
11.5M posts
OUR BLOG COMMENT DATASET
5 blog genres
33 top blogs
1,094 blog comments
“Great post and I really like the video. This is extremely similar to the approach I use in writing almost anything …”
“Just wait until hackers exploit the print layer to this mesh stuff enough to grab root and start injecting python code …”
ProBlogger, Scobleizer
AGREE/DISAGREE/NEITHER
Proportions of agreement (Wald method, p < 0.05)
agree 39.2%
disagree 11.1%
neither 49.4%
LEXICAL
uni/bi/trigrams
TF-IDF
POS
raw tags
combo lexical
SENTIMENT
congressional floor
rotten tomatoes
LIWC
SEMANTIC
sim to post
ESA
NAMED ENTITY
organizations
people
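Two of the features above compare a comment with the post it responds to. A minimal sketch of TF-IDF weighting and cosine similarity in plain Python follows, assuming a simple regex tokenizer and a smoothed IDF. Neither choice is confirmed by the slides, and the sample texts are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer; a deliberately simple stand-in."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumption) so that no term gets a zero or negative weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}
    return [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]


def dot(a, b):
    """Dot product of two sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def cosine(a, b):
    """Cosine similarity; 0.0 when either vector is empty."""
    norm_a, norm_b = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Illustrative texts: a post, an on-topic comment, and an off-topic comment.
post, on_topic, off_topic = tfidf_vectors([
    "writing a blog post with video",
    "great post and I really like the video",
    "hackers will grab root",
])
sim_on = cosine(post, on_topic)
sim_off = cosine(post, off_topic)
```

Here `dot` plays the role of the "tf-idf dot product with post" feature and `cosine` the "sim to post" feature. The cosine version normalizes away comment length.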
Features + Info Gain
Feature                           Class        Info gain
LIWC pos. emotion words           agree        0.079
LIWC affect words                 agree        0.049
exclamations                      agree        0.043
adjectives                        agree        0.041
@                                 neither      0.041
ellipsis                          !disagree    0.038
great                             agree        0.035
is tech blog                      neither      0.034
cosine similarity to post         !disagree    0.034
great [noun]                      agree        0.03
personal pronouns                 !disagree    0.028
present tense verbs               neither      0.026
[prepos] [poss pronoun]           agree        0.026
tf-idf dot product with post      !neither     0.026
coordinating conjunctions         agree        0.026
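Information gain measures how much knowing a feature reduces uncertainty about the class label: IG = H(class) - H(class | feature). A minimal sketch for a discrete feature follows, measured in bits. The slides do not state the units or how continuous features were binned, so both are assumptions here, and the toy labels are illustrative.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def info_gain(feature_values, labels):
    """H(labels) minus the weighted entropy of labels within each feature value."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    total = len(labels)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data: the feature fires on both "agree" comments and on neither of the others.
labels = ["agree", "agree", "disagree", "neither"]
has_feature = [1, 1, 0, 0]
gain = info_gain(has_feature, labels)
```

Ranking features by this score, as the table does, shows which single features carry the most information about agreement. It does not account for redundancy between features.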