Sentiment Analysis: A Baseline Algorithm
Dan Jurafsky

Sentiment Classification in Movie Reviews

Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
Bo Pang and Lillian Lee. 2004. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. ACL, 271-278.

• Polarity detection:
  • Is an IMDB movie review positive or negative?
• Data: Polarity Data 2.0:
  • http://www.cs.cornell.edu/people/pabo/movie-review-data

IMDB data in the Pang and Lee database

✓ (positive review)
when _star wars_ came out some twenty years ago , the image of traveling throughout the stars has become a commonplace image . […]
when han solo goes light speed , the stars change to bright lines , going towards the viewer in lines that converge at an invisible point .
cool .
_october sky_ offers a much simpler image–that of a single white dot , traveling horizontally across the night sky . [. . . ]

✗ (negative review)
“ snake eyes ” is the most aggravating kind of movie : the kind that shows so much potential then becomes unbelievably disappointing .
it’s not just because this is a brian depalma film , and since he’s a great director and one who’s films are always greeted with at least some fanfare .
and it’s not even because this was a film starring nicolas cage and since he gives a brauvara performance , this film is hardly worth his talents .

Baseline Algorithm (adapted from Pang and Lee)
• Tokenization
• Feature Extraction
• Classification using different classifiers
  • Naïve Bayes
  • MaxEnt
  • SVM

Sentiment Tokenization Issues
• Deal with HTML and XML markup
• Twitter mark-up (names, hash tags)
• Capitalization (preserve for words in all caps)
• Phone numbers, dates
• Emoticons
• Useful code:
  • Christopher Potts sentiment tokenizer
  • Brendan O’Connor twitter tokenizer

Potts emoticons:

    [<>]?                       # optional hat/brow
    [:;=8]                      # eyes
    [\-o\*\']?                  # optional nose
    [\)\]\(\[dDpP/\:\}\{@\|\\]  # mouth
    |                           #### reverse orientation
    [\)\]\(\[dDpP/\:\}\{@\|\\]  # mouth
    [\-o\*\']?                  # optional nose
    [:;=8]                      # eyes
    [<>]?                       # optional hat/brow
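As a rough illustration of how the emoticon pattern and the all-caps rule might fit together, here is a minimal Python sketch. The emoticon expression is transcribed from the slide. The surrounding `TOKEN_RE` pattern, the `tokenize` function, and the rule "keep case only for all-caps tokens longer than one character" are assumptions made for this example, not the Potts or O'Connor tokenizers themselves.

```python
import re

# Emoticon pattern transcribed from the slide, wrapped in a non-capturing group.
EMOTICON = r"""
    (?:
      [<>]?                       # optional hat/brow
      [:;=8]                      # eyes
      [\-o\*\']?                  # optional nose
      [\)\]\(\[dDpP/\:\}\{@\|\\]  # mouth
      |                           # reverse orientation
      [\)\]\(\[dDpP/\:\}\{@\|\\]  # mouth
      [\-o\*\']?                  # optional nose
      [:;=8]                      # eyes
      [<>]?                       # optional hat/brow
    )"""

# Hypothetical token pattern: emoticons first, then words, then single punctuation marks.
TOKEN_RE = re.compile(EMOTICON + r"| [\w']+ | [^\w\s]", re.VERBOSE)


def tokenize(text):
    """Split text into tokens, keeping emoticons whole."""
    tokens = TOKEN_RE.findall(text)
    # Preserve case for all-caps tokens longer than one character; lowercase the rest.
    return [t if (t.isupper() and len(t) > 1) else t.lower() for t in tokens]


print(tokenize("I LOVED this Movie :-) but the ending :("))
```

One known limitation of this sketch: because `8` counts as eyes and `p` as a mouth, the pattern also fires on strings such as the `8p` in `8pm`, so a production tokenizer would need extra boundary checks.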

Extracting Features for Sentiment Classification
• How to handle negation
  • I didn’t like this movie
    vs
  • I really like this movie
• Which words to use?
  • Only adjectives
  • All words
    • All words turns out to work better, at least on this data

Negation

Das, Sanjiv and Mike Chen. 2001. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the Asia Pacific Finance Association Annual Conference (APFA).
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.

Add NOT_ to every word between negation and following punctuation:

    didn’t like this movie , but I
    didn’t NOT_like NOT_this NOT_movie but I
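The NOT_ rule can be sketched in a few lines of Python. The list of negation words and the set of punctuation marks that end a negated span are assumptions for this example; the slide does not enumerate them. Unlike the slide's output line, this sketch keeps the punctuation token in the result.

```python
import re

# Assumed negation cues: a few common words plus any token ending in n't.
NEGATION_RE = re.compile(r"^(?:not|no|never|\w+n['’]t)$", re.IGNORECASE)
# Assumed clause-ending punctuation.
PUNCT_RE = re.compile(r"^[.,:;!?]+$")


def mark_negation(tokens):
    """Prefix NOT_ to every token between a negation cue and the next punctuation."""
    out = []
    negating = False
    for tok in tokens:
        if PUNCT_RE.match(tok):
            negating = False          # punctuation closes the negated span
            out.append(tok)
        elif negating:
            out.append("NOT_" + tok)
        else:
            out.append(tok)
            if NEGATION_RE.match(tok):
                negating = True       # start marking from the next token
    return out


print(" ".join(mark_negation("didn't like this movie , but I".split())))
```

The effect is that "like" and "NOT_like" become separate features, so the classifier can learn opposite weights for them.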

Reminder: Naïve Bayes

$$c_{NB} = \operatorname*{argmax}_{c_j \in C} P(c_j) \prod_{i \in positions} P(w_i \mid c_j)$$

$$\hat{P}(w \mid c) = \frac{count(w, c) + 1}{count(c) + |V|}$$
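In practice the product of many small probabilities tends to underflow, so implementations usually compare log scores instead. This is a standard reformulation rather than something stated on the slide: because the logarithm is monotonically increasing, it leaves the argmax unchanged, and add-one smoothing keeps every $\hat{P}(w \mid c)$ strictly positive so the logarithms are defined.

```latex
c_{NB} = \operatorname*{argmax}_{c_j \in C}
  \left[ \log P(c_j) + \sum_{i \in positions} \log P(w_i \mid c_j) \right]
```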

Binarized (Boolean feature) Multinomial Naïve Bayes
• Intuition:
  • For sentiment (and probably for other text classification domains)
  • Word occurrence may matter more than word frequency
    • The occurrence of the word fantastic tells us a lot
    • The fact that it occurs 5 times may not tell us much more.
• Boolean Multinomial Naïve Bayes
  • Clips all the word counts in each document at 1

Boolean Multinomial Naïve Bayes: Learning
• From training corpus, extract Vocabulary
• Calculate P(c_j) terms
  • For each c_j in C do
      docs_j ← all docs with class = c_j
      $$P(c_j) \leftarrow \frac{|docs_j|}{|\text{total \# documents}|}$$
• Calculate P(w_k | c_j) terms
  • Remove duplicates in each doc:
    • For each word type w in doc_j
      • Retain only a single instance of w
  • Text_j ← single doc containing all docs_j
  • For each word w_k in Vocabulary
      n_k ← # of occurrences of w_k in Text_j
      $$P(w_k \mid c_j) \leftarrow \frac{n_k + \alpha}{n + \alpha\,|Vocabulary|}$$
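The learning procedure above could be sketched in Python roughly as follows. The function name `train_boolean_nb`, its argument layout, and the default `alpha=1.0` are choices made for this example. The sketch also assumes that n is the total number of tokens in Text_j after duplicates are removed, which the slide leaves implicit.

```python
from collections import Counter


def train_boolean_nb(docs, labels, alpha=1.0):
    """Estimate priors and smoothed likelihoods from binarized documents.

    docs is a list of token lists; labels is a parallel list of class labels.
    """
    vocabulary = {w for doc in docs for w in doc}
    priors, likelihoods = {}, {}
    for c in set(labels):
        class_docs = [doc for doc, y in zip(docs, labels) if y == c]
        priors[c] = len(class_docs) / len(docs)
        # set(doc) removes duplicates within each doc; pooling the results gives Text_j.
        counts = Counter(w for doc in class_docs for w in set(doc))
        n = sum(counts.values())  # assumed: total tokens in the deduplicated Text_j
        likelihoods[c] = {
            w: (counts[w] + alpha) / (n + alpha * len(vocabulary))
            for w in vocabulary
        }
    return priors, likelihoods
```

With this definition each class's likelihoods sum to 1 over the vocabulary, which is a quick sanity check on the denominator.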

Boolean Multinomial Naïve Bayes on a test document d
• First remove all duplicate words from d
• Then compute NB using the same equation:

$$c_{NB} = \operatorname*{argmax}_{c_j \in C} P(c_j) \prod_{i \in positions} P(w_i \mid c_j)$$
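A minimal sketch of the test-time step, assuming the model is stored as two dictionaries (class priors and per-class word likelihoods). The function name, the hand-set parameter values, and the decision to skip words outside the training vocabulary are all assumptions for illustration. Scores are summed in log space to avoid underflow.

```python
import math


def classify_boolean_nb(tokens, priors, likelihoods):
    """Return the class with the highest log score for one test document."""
    unique_words = set(tokens)  # first remove all duplicate words from d
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for w in unique_words:
            # Assumption: words never seen in training are skipped.
            if w in likelihoods[c]:
                score += math.log(likelihoods[c][w])
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Hypothetical hand-set parameters, only to exercise the function.
priors = {"pos": 0.5, "neg": 0.5}
likelihoods = {
    "pos": {"fantastic": 0.6, "boring": 0.1, "movie": 0.3},
    "neg": {"fantastic": 0.1, "boring": 0.6, "movie": 0.3},
}
print(classify_boolean_nb("fantastic fantastic fantastic movie".split(), priors, likelihoods))
```

Because duplicates are removed first, the three occurrences of "fantastic" contribute exactly as much evidence as one.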

Normal vs. Boolean Multinomial NB

Normal
          Doc  Words                                Class
Training  1    Chinese Beijing Chinese              c
          2    Chinese Chinese Shanghai             c
          3    Chinese Macao                        c
          4    Tokyo Japan Chinese                  j
Test      5    Chinese Chinese Chinese Tokyo Japan  ?

Boolean
          Doc  Words                                Class
Training  1    Chinese Beijing                      c
          2    Chinese Shanghai                     c
          3    Chinese Macao                        c
          4    Tokyo Japan Chinese                  j
Test      5    Chinese Tokyo Japan                  ?
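To see what binarization changes, the two tables can be worked through by hand. The arithmetic below is a sketch that assumes add-one smoothing (as in the reminder slide) and the six-word vocabulary {Chinese, Beijing, Shanghai, Macao, Tokyo, Japan}. The priors are P(c) = 3/4 and P(j) = 1/4 in both cases.

```latex
\begin{align*}
% Normal counts: class c has 8 tokens (Chinese appears 5 times), class j has 3 tokens
\text{Normal:}\quad
& \hat{P}(\text{Chinese} \mid c) = \tfrac{5+1}{8+6} = \tfrac{3}{7}, \quad
  \hat{P}(\text{Tokyo} \mid c) = \hat{P}(\text{Japan} \mid c) = \tfrac{0+1}{8+6} = \tfrac{1}{14} \\
& \hat{P}(\text{Chinese} \mid j) = \hat{P}(\text{Tokyo} \mid j) = \hat{P}(\text{Japan} \mid j) = \tfrac{1+1}{3+6} = \tfrac{2}{9} \\
& \text{score}(c) = \tfrac{3}{4} \cdot \left(\tfrac{3}{7}\right)^{3} \cdot \left(\tfrac{1}{14}\right)^{2} \approx 0.00030 \\
& \text{score}(j) = \tfrac{1}{4} \cdot \left(\tfrac{2}{9}\right)^{5} \approx 0.00014 \\[6pt]
% Boolean counts: class c has 6 tokens (Chinese appears 3 times), class j is unchanged
\text{Boolean:}\quad
& \hat{P}(\text{Chinese} \mid c) = \tfrac{3+1}{6+6} = \tfrac{1}{3}, \quad
  \hat{P}(\text{Tokyo} \mid c) = \hat{P}(\text{Japan} \mid c) = \tfrac{0+1}{6+6} = \tfrac{1}{12} \\
& \text{score}(c) = \tfrac{3}{4} \cdot \tfrac{1}{3} \cdot \left(\tfrac{1}{12}\right)^{2} = \tfrac{1}{576} \approx 0.0017 \\
& \text{score}(j) = \tfrac{1}{4} \cdot \left(\tfrac{2}{9}\right)^{3} = \tfrac{2}{729} \approx 0.0027
\end{align*}
```

Under these assumptions the normal model assigns document 5 to class c, while the Boolean model assigns it to class j: once the repeated "Chinese" tokens are clipped, they no longer outweigh the evidence from "Tokyo" and "Japan".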

Binarized (Boolean feature) Multinomial Naïve Bayes

B. Pang, L. Lee, and S. Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
V. Metsis, I. Androutsopoulos, G. Paliouras. 2006. Spam Filtering with Naive Bayes – Which Naive Bayes? CEAS 2006 - Third Conference on Email and Anti-Spam.
K.-M. Schneider. 2004. On word frequency information and negative evidence in Naive Bayes text classification. ICANLP, 474-485.
JD Rennie, L Shih, J Teevan. 2003. Tackling the poor assumptions of naive bayes text classifiers. ICML 2003

• Binary seems to work better than full word counts
  • This is not the same as Multivariate Bernoulli Naïve Bayes
    • MBNB doesn’t work well for sentiment or other text tasks
• Other possibility: log(freq(w))

Cross-Validation
• Break up data into 10 folds
  • (Equal positive and negative inside each fold?)
• For each fold
  • Choose the fold as a temporary test set
  • Train on 9 folds, compute performance on the test fold
• Report average performance of the 10 runs

[Figure: iterations 1 to 5, each using a different fold as Test and the remaining folds as Training]
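The cross-validation loop might be sketched as below, using only the standard library. The function names, the `train_fn` and `eval_fn` callback interface, and the choice to balance classes across folds (one possible answer to the parenthetical question on the slide) are assumptions for this example.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Return k lists of document indices with each class spread evenly across folds."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)  # deal each class out round-robin
    return folds


def cross_validate(docs, labels, train_fn, eval_fn, k=10, seed=0):
    """Average eval_fn's score over k runs, each holding out one fold as the test set."""
    folds = stratified_folds(labels, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        model = train_fn([docs[t] for t in train_idx], [labels[t] for t in train_idx])
        scores.append(
            eval_fn(model, [docs[t] for t in test_idx], [labels[t] for t in test_idx])
        )
    return sum(scores) / len(scores)
```

Here `train_fn(docs, labels)` is expected to return a model and `eval_fn(model, docs, labels)` a numeric score such as accuracy; any classifier, including the Naïve Bayes variants above, can be plugged in through those two callbacks.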
