User Level Sentiment Analysis Incorporating Social Networks

SLIDE 1

User Level Sentiment Analysis Incorporating Social Networks

Chenhao Tan

Department of Computer Science
Cornell University
Joint work with: Lillian Lee, Jie Tang, Long Jiang, Ming Zhou and Ping Li

May 18, 2011

Chenhao Tan Microsoft Research Asia

SLIDE 2

Outline

1. Motivation
2. Problem Setting in Twitter
3. Data Collection
4. Observation
5. Model
6. Approach
7. Experiment
8. Conclusion

SLIDE 3

Motivation

User-level sentiment analysis
Network information
  • Accessibility
  • Homophily or Attention

SLIDE 4

Twitter as the basis

Text information: Tweets
Network information
  • Follow Network
  • @ Network
    ⋄ directed
    ⋄ mutual

SLIDE 5

Semi-supervised Learning in Twitter

Full labels are hard to obtain
Given a graph and the labels of some of its nodes, classify the remaining users in the graph

SLIDE 6

Data Collection

Traditional Annotation by Tweets

SLIDE 7

Data Collection

Traditional Annotation by Tweets (failed)

SLIDE 8

Data Collection

Traditional Annotation by Tweets (failed)
User Biographical Information

SLIDE 9

Final Data Set

1,414,340 users
1,414,211 user profiles
480,435,500 tweets
274,644,047 t-follow edges
58,387,964 @-edges

SLIDE 10

Sharing Label conditioned on being connected

Probability that two users have the same label, conditioned on whether or not they are connected
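The statistic on this slide can be estimated by counting user pairs. The following is a minimal sketch, not the procedure used in the study: the function name `same_label_rates`, the toy labels, and the toy edge list are all hypothetical, and edges are treated as undirected for simplicity.

```python
from itertools import combinations


def same_label_rates(labels, edges):
    """Return (P(same label | connected), P(same label | not connected)).

    labels: dict mapping user -> label
    edges:  iterable of (user, user) pairs, treated as undirected here
    """
    connected = {frozenset((u, v)) for u, v in edges if u != v}
    same_conn = total_conn = same_rest = total_rest = 0
    for u, v in combinations(labels, 2):
        same = labels[u] == labels[v]
        if frozenset((u, v)) in connected:
            total_conn += 1
            same_conn += same
        else:
            total_rest += 1
            same_rest += same
    return same_conn / total_conn, same_rest / total_rest


# Hypothetical toy data: four users, three undirected edges.
toy_labels = {"a": "pos", "b": "pos", "c": "neg", "d": "neg"}
toy_edges = [("a", "b"), ("c", "d"), ("b", "c")]
p_connected, p_unconnected = same_label_rates(toy_labels, toy_edges)
```

The all-pairs loop is quadratic in the number of users, so at the scale of the data set described earlier one would sample unconnected pairs rather than enumerate them.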

SLIDE 11

Connectedness conditioned on labels

Probability that two users are connected, conditioned on whether or not they have the same label
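This conditional runs in the opposite direction: user pairs are grouped by whether their labels agree, and the fraction of connected pairs is measured within each group. A minimal sketch under the same caveats (hypothetical function name `connection_rates`, hypothetical toy data, undirected edges):

```python
from itertools import combinations


def connection_rates(labels, edges):
    """Return {True: P(connected | same label), False: P(connected | different label)}."""
    connected = {frozenset((u, v)) for u, v in edges if u != v}
    # same_label -> [number of connected pairs, number of pairs]
    counts = {True: [0, 0], False: [0, 0]}
    for u, v in combinations(labels, 2):
        same = labels[u] == labels[v]
        counts[same][1] += 1
        if frozenset((u, v)) in connected:
            counts[same][0] += 1
    return {k: (c / n if n else float("nan")) for k, (c, n) in counts.items()}


# Hypothetical toy data: four users, three undirected edges.
toy_labels = {"a": "pos", "b": "pos", "c": "neg", "d": "neg"}
toy_edges = [("a", "b"), ("c", "d"), ("b", "c")]
rates = connection_rates(toy_labels, toy_edges)
```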

SLIDE 12

Model Framework

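User-Tweet Factor

\[
f_{k,l}(y_{v_i}, y_t) =
\begin{cases}
\dfrac{w_{\text{labeled}}}{|\text{tweet}_{v_i}|} & y_{v_i} = k,\ y_t = l,\ v_i \text{ labeled} \\[2ex]
\dfrac{w_{\text{unlabeled}}}{|\text{tweet}_{v_i}|} & y_{v_i} = k,\ y_t = l,\ v_i \text{ unlabeled} \\[2ex]
0 & \text{otherwise}
\end{cases}
\]

User-User Factor

\[
h_{k,l}(y_{v_i}, y_{v_j}) =
\begin{cases}
\dfrac{w_{\text{relation}}}{|\text{Neighbors}_{v_i}|} & y_{v_i} = k,\ y_{v_j} = l \\[2ex]
0 & \text{otherwise}
\end{cases}
\]

Objective Function

\[
\log P(Y) = \sum_{v_i \in V} \Bigg[ \sum_{t \in \text{tweet}_{v_i},\, k,\, l} \mu_{k,l}\, f_{k,l}(y_{v_i}, y_t) + \sum_{v_j \in \text{Neighbors}_{v_i},\, k,\, l} \lambda_{k,l}\, h_{k,l}(y_{v_i}, y_{v_j}) \Bigg] - \log Z
\]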
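To make the objective concrete, the sketch below evaluates its unnormalized part (everything except the log Z term) for one label assignment on a toy graph. It is an illustration under stated assumptions, not the authors' implementation: the function name, the weight values (`w_labeled=1.0`, `w_unlabeled=0.5`, `w_relation=1.0`), the parameter tables `mu` and `lam`, and the toy data are all hypothetical.

```python
def unnormalized_log_score(user_labels, tweet_labels, neighbors, labeled,
                           mu, lam, w_labeled=1.0, w_unlabeled=0.5,
                           w_relation=1.0):
    """Sum of mu*f and lam*h over all users (log P(Y) without the -log Z term).

    user_labels:  dict user -> label y_v
    tweet_labels: dict user -> list of tweet labels y_t
    neighbors:    dict user -> list of neighboring users
    labeled:      set of users whose label is given
    mu, lam:      dicts (k, l) -> parameter value
    """
    score = 0.0
    for v, y_v in user_labels.items():
        tweets = tweet_labels.get(v, [])
        w = w_labeled if v in labeled else w_unlabeled
        for y_t in tweets:
            # Only the (k, l) = (y_v, y_t) factor is nonzero for this pair.
            score += mu[(y_v, y_t)] * w / len(tweets)
        nbrs = neighbors.get(v, [])
        for u in nbrs:
            # Only the (k, l) = (y_v, y_u) factor is nonzero for this pair.
            score += lam[(y_v, user_labels[u])] * w_relation / len(nbrs)
    return score


# Hypothetical toy data: parameters reward agreement and ignore disagreement.
agree = {(k, l): 1.0 if k == l else 0.0 for k in (0, 1) for l in (0, 1)}
toy_score = unnormalized_log_score(
    user_labels={"a": 1, "b": 1, "c": 0},
    tweet_labels={"a": [1, 1], "b": [1, 0], "c": [0]},
    neighbors={"a": ["b"], "b": ["a", "c"], "c": ["b"]},
    labeled={"a"},
    mu=agree,
    lam=agree,
)
```

Dividing by the tweet count and the neighbor count, as the factors do, keeps prolific or highly connected users from dominating the sum.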

SLIDE 13

Approach

Parameter Estimation
  • Direct estimation from simple statistics
  • SampleRank

Inference
  • Loopy belief propagation
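For readers unfamiliar with loopy belief propagation, here is a minimal sum-product sketch on a generic pairwise model with two labels. It is not the model from these slides: the function name `loopy_bp`, the potentials, the fixed iteration count, and the assumption that tweet evidence has already been folded into per-user unary potentials are all simplifications made for illustration.

```python
import math


def loopy_bp(unary, edges, pairwise, iters=20):
    """Approximate marginals for a pairwise model with labels {0, 1}.

    unary:    dict node -> [potential for label 0, potential for label 1]
    edges:    list of undirected (node, node) pairs
    pairwise: 2x2 symmetric table, pairwise[k][l] is the edge potential
    """
    nbrs = {v: [] for v in unary}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    # msgs[(u, v)] is the message from u to v, initialised uniformly.
    msgs = {(u, v): [1.0, 1.0] for u in nbrs for v in nbrs[u]}
    for _ in range(iters):
        new = {}
        for (u, v) in msgs:
            out = []
            for l in (0, 1):
                total = 0.0
                for k in (0, 1):
                    # Product of messages into u from all neighbors except v.
                    incoming = math.prod(
                        msgs[(w, u)][k] for w in nbrs[u] if w != v)
                    total += unary[u][k] * pairwise[k][l] * incoming
                out.append(total)
            z = sum(out)
            new[(u, v)] = [x / z for x in out]
        msgs = new
    beliefs = {}
    for v in nbrs:
        b = [unary[v][k] * math.prod(msgs[(w, v)][k] for w in nbrs[v])
             for k in (0, 1)]
        z = sum(b)
        beliefs[v] = [x / z for x in b]
    return beliefs


# Hypothetical toy graph: a triangle where only "a" has informative evidence.
toy_beliefs = loopy_bp(
    unary={"a": [0.1, 0.9], "b": [0.5, 0.5], "c": [0.5, 0.5]},
    edges=[("a", "b"), ("b", "c"), ("a", "c")],
    pairwise=[[2.0, 1.0], [1.0, 2.0]],
)
```

On a tree the result matches the exact marginals; on graphs with cycles, such as the triangle above, it is only an approximation and convergence is not guaranteed in general.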

SLIDE 14

Methods

Training set: 50 positive users and 50 negative users; the others for testing

Labels of Tweets
  • SpecificSVM

Labels of Users
  • Majority Vote
  • HGM-NoLearning
  • HGM-Learning
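The Majority Vote baseline can be read as assigning each user the most common label among that user's tweets. The sketch below assumes the per-tweet labels come from a tweet-level classifier; the function name and the tie-breaking rule (fall back to a default label) are assumptions, since the slides do not specify them.

```python
from collections import Counter


def majority_vote(tweet_labels, tie_break="pos"):
    """Return the most frequent tweet label, or tie_break on a tie or empty input."""
    if not tweet_labels:
        return tie_break
    ranked = Counter(tweet_labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_break
    return ranked[0][0]


# Hypothetical per-tweet predictions for one user.
user_label = majority_vote(["pos", "neg", "pos"])
```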

SLIDE 15

Case Study

[Figure panels: (a) Ground Truth, (b) Text-Only Approach, (c) Our Algorithm]

SLIDE 16

Case Study

Sample tweets of users classified correctly only with network information

SLIDE 17

Overall Performance

Beats the baseline!
Follow is better than @
Directed is better than Undirected
NoLearning performs the same as Learning

SLIDE 18

Performance Per Topic

Sparseness of graph
Size of graph or #Tweets per user
SVM Classifier Performance

SLIDE 19

Adding More Unlabeled Data

Learning better than NoLearning

SLIDE 20

Conclusion

Empirical analyses of the correlation between networks and sentiment
Propose a heterogeneous graphical model
Validate the effectiveness of incorporating network information

SLIDE 21

Future Work

More data sets
Better models and semi-supervised learning algorithms
Find the helpful parts of networks
Build a theory of why and how users correlate on different topics in different kinds of networks

SLIDE 22

The End

Thank you! Questions?
