A Joint Model for Chinese Microblog Sentiment Analysis Yuhui Cao, - PowerPoint PPT Presentation

A Joint Model for Chinese Microblog Sentiment Analysis
Yuhui Cao, Zhao Chen, Ruifeng Xu, Tao Chen
Harbin Institute of Technology, Shenzhen Graduate School

Content
I. Introduction
II. Data preprocessing
III. Word feature based classifier
IV. CNN-based SVM classifier
V. Classification results merging
VI. Experimental results and analysis
VII. Conclusion

Introduction
Task: Topic-Based Chinese Message Polarity Classification
Task description:
• Classify each message as expressing positive, negative, or neutral sentiment towards the given topic.
• For messages conveying both positive and negative sentiment towards the topic, choose whichever sentiment is stronger.

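Introduction
Task characteristics:
• Real, noisy data
• Imbalanced data across classes
• Short but meaningful messages
Examples:
• 好看？吗？ // 【Galaxy S6：三星证明自己能做出好看的手机】 http://t.cn/RwHRsIb (分享自 @今日头条)
  ("Good-looking? Really? // [Galaxy S6: Samsung proves it can make a good-looking phone] (shared from @今日头条)")
• #三星 Galaxy S6# 三星 GALAXY S6 三星，挺中意 [酷][酷] [位置] 芒砀路
  ("#Samsung Galaxy S6# Samsung GALAXY S6 Samsung, I quite like it [cool][cool] [location] 芒砀路")
• 雾霾是什么？面对纯蓝的天，相机失焦了。 [位置] 北门街
  ("What is smog? Facing a pure blue sky, the camera lost focus. [location] 北门街")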

Introduction
Framework of our model
• Data preprocessing: rule-based processing
• Word feature based SVM classifier: unigrams + bigrams + sentiment words
• CNN-based SVM classifier: word embeddings + convolutional neural network
• Integration strategy: fusion of the results of multiple classifiers

Introduction
Framework of our model
[Figure: training and testing data → data preprocessing → word feature based SVM classifier / CNN-based SVM classifier → merging rules → classification results]

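Data preprocessing
Data preprocessing rules with illustrations (each rule: raw text → processed text)
• Sharing news with personal comments
  Raw: 好看？吗？ // 【Galaxy S6：三星证明自己能做出好看的手机】 http://t.cn/RwHRsIb (分享自 @今日头条)
  ("Good-looking? Really? // [Galaxy S6: Samsung proves it can make a good-looking phone] (shared from @今日头条)")
  Processed: 好看？吗？
• Removing HashTag
  Raw: #三星 Galaxy S6# 三星 GALAXY S6，挺中意 [酷][酷] [位置] 芒砀路
  ("#Samsung Galaxy S6# Samsung GALAXY S6, I quite like it [cool][cool] [location] 芒砀路")
  Processed: 三星 GALAXY S6，挺中意 [酷][酷]
• Removing URL
  Raw: 699 欧元起传三星 Galaxy S6/S6 Edge 售价获证实（分享自 @新浪科技） http://t.cn/RwTo3on
  ("Rumoured Samsung Galaxy S6/S6 Edge price starting at 699 euros confirmed (shared from @新浪科技)")
  Processed: 699 欧元起传三星 Galaxy S6/S6 Edge 售价获证实（分享自 @新浪科技）
• Removing nickname
  Raw: 玻璃取代塑料，更美 Galaxy S6 的 5 大妥协 http://t.cn/RwHY6Az 罗永浩我去小米和三星这是要闹哪样，，，老罗。。不能忍啊，，，，， @锤子科技营销帐号 @罗永浩
  ("Glass replaces plastic: 5 big compromises of the more beautiful Galaxy S6 ... 罗永浩, geez, what are Xiaomi and Samsung up to... Lao Luo.. this is unbearable @锤子科技营销帐号 @罗永浩")
  Processed: http://t.cn/RwHY6Az 罗永浩我去小米和三星这是要闹哪样，，，老罗。。不能忍啊，，，，，
• Removing information sources
  Raw: 【视频：三星 S6 对比苹果 iPhone6 MWC2015 @youtube 科技 ~】 http://t.cn/RwHQzJ8 （来自于优酷安卓客户端）
  ("[Video: Samsung S6 vs. Apple iPhone6 MWC2015 @youtube tech ~] (from the Youku Android client)")
  Processed: 【视频：三星 S6 对比苹果 iPhone6 MWC2015 @youtube 科技 ~】 http://t.cn/RwHQzJ8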
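The rules above can be approximated with regular expressions. This is a minimal sketch, not the authors' implementation: the patterns (treating "//" as the separator between a personal comment and the shared news, recognising information sources by 分享自/来自, dropping trailing @nicknames, and dropping [位置] location tags as seen in the HashTag example) are assumptions inferred from the examples, and `preprocess` is a hypothetical name. Unlike the slide, which illustrates one rule per row, the function applies all rules in sequence.

```python
import re

# Hypothetical patterns inferred from the slide's examples, not the authors' rules.
URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#[^#]+#")                            # e.g. #三星 Galaxy S6#
SOURCE_RE = re.compile(r"[（(]\s*(?:分享自|来自)[^）)]*[）)]")   # e.g. （来自于优酷安卓客户端）
LOCATION_RE = re.compile(r"\[位置\]\s*\S+")                    # e.g. [位置] 芒砀路
TRAILING_MENTIONS_RE = re.compile(r"(?:\s*@\S+)+\s*$")         # e.g. @锤子科技营销帐号 @罗永浩


def preprocess(text: str) -> str:
    """Apply all preprocessing rules in sequence and normalise whitespace."""
    # Remove URLs first, so the "//" inside "http://" is not taken as a separator.
    text = URL_RE.sub(" ", text)
    # Shared news with a personal comment: keep only the comment before "//".
    if "//" in text:
        text = text.split("//", 1)[0]
    text = SOURCE_RE.sub(" ", text)            # information sources
    text = HASHTAG_RE.sub(" ", text)           # #hashtags#
    text = LOCATION_RE.sub(" ", text)          # [位置] location tags
    text = TRAILING_MENTIONS_RE.sub("", text)  # trailing @nicknames
    return re.sub(r"\s+", " ", text).strip()
```

Removing only trailing @nicknames keeps mentions that are part of the content, such as "@youtube 科技" in the last example, which the slide's processed text also retains.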

Word Feature based Classifier
[Figure: framework of the word feature based classifier]

Word Feature based Classifier
Sentiment lexicon expansion: to expand an existing sentiment lexicon, POS tags, word frequency, mutual information and context entropy are used to mine new sentiment words from twenty million microblog texts.
Positive words: 人气王，亮骚，人气爆棚，卖萌，傲娇，共赢，典藏版，劲爆，劲歌热舞，力挺，牛逼，完爆，给力，炫酷，靠谱，重磅，利好
Negative words: 人渣，吐槽，坑爹，仆街，伤退，伪娘，作孽，做空，偷腥，偷食，傻冒，傻叉，傻帽，傻缺，利空，劳神，卖腐，厚黑，脑殘，无语
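The slide names mutual information and context entropy but not their formulas. Below is a sketch under a common formulation for Chinese new-word discovery (an assumption): pointwise mutual information (PMI) measures how strongly the parts of a candidate stick together, and left/right context entropy measures how freely it combines with neighbouring characters. A candidate scoring high on both behaves like a standalone word. The function names and the toy corpus are hypothetical; POS filtering and frequency thresholds are omitted, and candidates must have at least two characters.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(corpus: List[str], max_n: int = 4) -> Counter:
    """Count all character n-grams of length 1..max_n."""
    counts = Counter()
    for text in corpus:
        for n in range(1, max_n + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    return counts


def min_pmi(word: str, counts: Counter, total: int) -> float:
    """Smallest PMI over all two-way splits of the candidate.
    Probabilities are approximated as count / total number of characters."""
    p_word = counts[word] / total
    return min(
        math.log(p_word / ((counts[word[:k]] / total) * (counts[word[k:]] / total)))
        for k in range(1, len(word))
    )


def _entropy(counter: Counter) -> float:
    n = sum(counter.values())
    return -sum(c / n * math.log(c / n) for c in counter.values()) if n else 0.0


def context_entropy(word: str, corpus: List[str]) -> float:
    """min(left entropy, right entropy) of the characters adjacent to the word."""
    left, right = Counter(), Counter()
    for text in corpus:
        start = text.find(word)
        while start != -1:
            if start > 0:
                left[text[start - 1]] += 1
            end = start + len(word)
            if end < len(text):
                right[text[end]] += 1
            start = text.find(word, start + 1)
    return min(_entropy(left), _entropy(right))
```

Taking the minimum over splits and over the two sides is a conservative choice: a candidate passes only if every split is cohesive and both of its boundaries are free.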

Word Feature based Classifier
• Word features: unigrams, bigrams, part-of-speech unigrams, part-of-speech bigrams, sentiment lexicons
• Feature selection methods: chi-square (CHI) test, TF-IDF
• Imbalanced data problem: undersample the majority (neutral) class and oversample the minority classes with the SMOTE algorithm
• Classifier: SVM with a linear kernel
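SMOTE itself only oversamples: it creates synthetic minority examples by interpolating between a minority example and one of its k nearest minority-class neighbours, while undersampling the majority class is a separate step. A pure-Python sketch of both follows; the function names, k = 5 and the fixed seed are hypothetical, since the slides give no sampling ratios or parameters. With the training set's 394 positive, 538 negative and 3973 neutral messages, one might, for instance, shrink the neutral class and grow the other two toward a common size.

```python
import random
from typing import List, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[Vector]:
    """Create n_new synthetic minority samples (needs at least 2 samples)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x within the minority class, excluding x itself.
        neighbours = sorted((j for j in range(len(minority)) if j != i),
                            key=lambda j: _sq_dist(x, minority[j]))[:k]
        nn = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        # The new point lies on the segment between x and the chosen neighbour.
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


def undersample(majority: Sequence[Sequence[float]], n_keep: int,
                seed: int = 0) -> List[Vector]:
    """Randomly keep n_keep majority-class samples, without replacement."""
    return [list(v) for v in random.Random(seed).sample(list(majority), n_keep)]
```

Because every synthetic point is an interpolation between two real minority samples, SMOTE avoids the exact duplicates produced by naive oversampling.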

CNN-based SVM Classifier

CNN-based SVM Classifier
1. Word embedding
• Train a CBOW model on 16 GB of Chinese microblog text
• Obtain 200-dimensional word embeddings for Chinese microblog text
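CBOW (continuous bag-of-words) predicts a word from the average of its context words' vectors. The sketch below performs single training steps with a full softmax over a toy vocabulary. Practical word2vec training typically uses negative sampling or hierarchical softmax instead, plus a sliding window over the segmented corpus and 200-dimensional vectors rather than the tiny sizes here; the slides do not say which variant was used, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def init_cbow(vocab_size: int, dim: int, seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Input (embedding) vectors start small and random; output vectors start at zero."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in range(vocab_size)]
    w_out = [[0.0] * dim for _ in range(vocab_size)]
    return w_in, w_out


def cbow_step(w_in: Matrix, w_out: Matrix, context: Sequence[int], target: int,
              lr: float = 0.5) -> float:
    """One SGD step of CBOW with a full softmax; returns the loss before the update."""
    dim = len(w_in[0])
    # Hidden layer: average of the context words' input vectors.
    h = [sum(w_in[c][d] for c in context) / len(context) for d in range(dim)]
    # Softmax over the whole vocabulary, shifted by the max score for stability.
    scores = [sum(w[d] * h[d] for d in range(dim)) for w in w_out]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    loss = -math.log(probs[target])
    # Output error: predicted distribution minus the one-hot target.
    err = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
    # Gradient with respect to h, computed before the output vectors change.
    grad_h = [sum(err[i] * w_out[i][d] for i in range(len(w_out))) for d in range(dim)]
    for i, row in enumerate(w_out):
        for d in range(dim):
            row[d] -= lr * err[i] * h[d]
    # Each context vector receives an equal share of the hidden-layer gradient.
    for c in context:
        for d in range(dim):
            w_in[c][d] -= lr * grad_h[d] / len(context)
    return loss
```

Repeating the step over (context, target) pairs drawn from the corpus gradually turns the rows of `w_in` into the word embeddings.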

CNN-based SVM Classifier
2. CNN-based SVM classifier
• Input: a matrix composed of the word embeddings of the words in a microblog
• Features: a CNN builds a distributed, paragraph-level feature representation
• Classifier: SVM with a linear kernel
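The slides do not detail the CNN architecture. As an assumption, the sketch below uses a common sentence-CNN design: each filter spans a few consecutive word vectors of the input matrix, a tanh nonlinearity follows, and max-over-time pooling keeps one feature per filter. The resulting fixed-length vector is the kind of representation a linear SVM would then consume. Filter training is not shown, and `embed` and `conv_max_pool` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[Vector]


def embed(tokens: Sequence[str], table: Dict[str, Vector], dim: int) -> Matrix:
    """Stack word embeddings into a (len(tokens) x dim) input matrix.
    Out-of-vocabulary tokens get a zero vector (an assumption)."""
    return [list(table.get(tok, [0.0] * dim)) for tok in tokens]


def conv_max_pool(sentence: Matrix, filters: List[Matrix],
                  biases: Sequence[float]) -> Vector:
    """Return one feature per filter: convolve over word windows, apply tanh,
    then take the maximum over all window positions."""
    features = []
    for filt, bias in zip(filters, biases):
        width = len(filt)  # number of consecutive words the filter spans
        activations = []
        for start in range(len(sentence) - width + 1):
            total = bias
            for offset in range(width):
                total += sum(w * x for w, x in zip(filt[offset], sentence[start + offset]))
            activations.append(math.tanh(total))
        # Sentences shorter than the filter yield 0.0 for that feature.
        features.append(max(activations) if activations else 0.0)
    return features
```

Max-over-time pooling makes the feature vector length depend only on the number of filters, so microblogs of any length map to inputs of the same size for the SVM.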

CNN-based SVM Classifier
2. CNN-based SVM classifier
[Figure: architecture of the CNN-based SVM classifier]
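Both classifiers end in an SVM with a linear kernel. As a self-contained stand-in, the sketch trains a linear SVM with Pegasos-style stochastic sub-gradient descent on the L2-regularised hinge loss, folding the bias in as a constant feature. This training method is swapped in for illustration; the slides do not say which solver was used. It is binary, so the three-way task would need a wrapper such as one-vs-rest. Function names and defaults are hypothetical.

```python
import random
from typing import List, Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 50, seed: int = 0) -> List[float]:
    """Pegasos-style training; y holds +1/-1 labels.
    A constant 1.0 feature is appended so the last weight acts as the bias."""
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            # Shrink step from the L2 regulariser.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss sub-gradient step if the example is inside the margin.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def decision(w: Sequence[float], x: Sequence[float]) -> float:
    """Signed score; positive means the +1 class."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
```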

Outputs merging
• If the two classifiers output the same label, that label is the final output.
• If the two outputs differ, the final result is determined by the merging rules.
These rules are based on a statistical analysis of each individual classifier's performance on the training dataset.
Classifier 1    Classifier 2    Final result
positive        neutral         neutral
negative        neutral         neutral
neutral         positive        neutral
neutral         negative        neutral
positive        negative        negative
negative        positive        positive
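The merging rules amount to a lookup keyed on the two classifiers' outputs: agreement passes through, a disagreement involving neutral yields neutral, and a positive/negative conflict takes Classifier 2's label. A sketch with hypothetical names:

```python
from typing import Dict, Tuple

# (Classifier 1 output, Classifier 2 output) -> final label, for disagreements only.
MERGE_RULES: Dict[Tuple[str, str], str] = {
    ("positive", "neutral"): "neutral",
    ("negative", "neutral"): "neutral",
    ("neutral", "positive"): "neutral",
    ("neutral", "negative"): "neutral",
    ("positive", "negative"): "negative",
    ("negative", "positive"): "positive",
}


def merge(label1: str, label2: str) -> str:
    """Agreement passes through unchanged; otherwise apply the merging table."""
    if label1 == label2:
        return label1
    return MERGE_RULES[(label1, label2)]
```

Keeping the rules as data rather than nested conditionals makes it easy to re-derive them from new training-set statistics without touching the code.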

Experiments  Data set Training data: 4905 microblogs (394 positive, 538 negative and 3973 neutral), 5 topics Testing data: 19469 microblogs, 20 topics  Metrics System . Correct  P r ecision System . Output System . Correct  Re call Human . Labeled   2 Pr ecision Re call  F 1  Pr ecision Re call 16

Experiments
Performance in the restricted resource subtask (P = precision, R = recall)
Team name    All (P / R / F1)     Positive (P / R / F1)   Negative (P / R / F1)
TICS-dm      0.83 / 0.83 / 0.83   0.62 / 0.51 / 0.56      0.82 / 0.46 / 0.59
NEUDM2       0.74 / 0.74 / 0.74   0.31 / 0.08 / 0.13      0.44 / 0.08 / 0.13
LCYS_TEAM    0.72 / 0.64 / 0.68   0.26 / 0.05 / 0.09      0.40 / 0.10 / 0.16
HLT_HITSZ    0.68 / 0.68 / 0.68   0.21 / 0.40 / 0.28      0.45 / 0.60 / 0.52

Experiments
Performance in the unrestricted resource subtask (P = precision, R = recall)
Team name    All (P / R / F1)     Positive (P / R / F1)   Negative (P / R / F1)
TICS-dm      0.85 / 0.85 / 0.85   0.58 / 0.62 / 0.60      0.79 / 0.61 / 0.69
xk0          0.74 / 0.74 / 0.74   0.19 / 0.01 / 0.03      0.40 / 0.05 / 0.09
NEUDM1       0.74 / 0.74 / 0.74   0.26 / 0.11 / 0.16      0.46 / 0.33 / 0.38
HLT_HITSZ    0.71 / 0.71 / 0.71   0.24 / 0.41 / 0.30      0.51 / 0.54 / 0.53

Experiments
Performance of the individual classifiers in the unrestricted resource subtask (P = precision, R = recall)
Approach       All (P / R / F1)     Positive (P / R / F1)   Negative (P / R / F1)
Classifier 1   0.67 / 0.67 / 0.67   0.20 / 0.42 / 0.27      0.44 / 0.49 / 0.46
Classifier 2   0.60 / 0.60 / 0.60   0.18 / 0.61 / 0.28      0.42 / 0.67 / 0.52
Merging        0.71 / 0.71 / 0.71   0.24 / 0.41 / 0.30      0.51 / 0.54 / 0.53

Conclusion
• Data preprocessing
• Word feature based SVM classifier
• CNN-based SVM classifier
• Integration strategy
• Ranked second on micro-averaged F1
• Ranked fourth on macro-averaged F1

Q&A

Thanks
