http://cs224w.stanford.edu Observations Models Algorithms Small - - PowerPoint PPT Presentation



slide-1
SLIDE 1

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

http://cs224w.stanford.edu

slide-2
SLIDE 2

10/9/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 2

Observations

  • Small diameter, Edge clustering
  • Patterns of signed edge creation
  • Viral Marketing, Blogosphere, Memetracking
  • Scale-Free
  • Densification power law, Shrinking diameters
  • Strength of weak ties, Core-periphery

Models

  • Erdős-Rényi model, Small-world model
  • Structural balance, Theory of status
  • Independent cascade model, Game theoretic model
  • Preferential attachment, Copying model
  • Microscopic model of evolving networks
  • Kronecker Graphs

Algorithms

  • Decentralized search
  • Models for predicting edge signs
  • Influence maximization, Outbreak detection, LIM
  • PageRank, Hubs and authorities
  • Link prediction, Supervised random walks
  • Community detection: Girvan-Newman, Modularity

slide-3
SLIDE 3

• Networks with positive and negative relationships
• Our basic unit of investigation will be signed triangles
• First we talk about undirected networks, then directed
• Plan for today:
  • Model: Consider two social theories of signed networks
  • Data: Reason about them in large online networks
  • Application: Predict whether A and B are linked with + or –

slide-4
SLIDE 4

• Networks with positive and negative relationships
• Consider an undirected complete graph
• Label each edge as either:
  • Positive: friendship, trust, positive sentiment, …
  • Negative: enemy, distrust, negative sentiment, …
• Examine triples of connected nodes A, B, C

slide-5
SLIDE 5

• Start with the intuition [Heider ’46]:
  • Friend of my friend is my friend
  • Enemy of my enemy is my friend
  • Enemy of my friend is my enemy
• Look at connected triples of nodes:
  • Balanced: consistent with the “friend of a friend” or “enemy of the enemy” intuition
  • Unbalanced: inconsistent with the “friend of a friend” or “enemy of the enemy” intuition

slide-6
SLIDE 6

• Graph is balanced if every connected triple of nodes has:
  • All 3 edges labeled +, or
  • Exactly 1 edge labeled +
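
The triple rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming signs are encoded as +1 and -1 and that the complete graph is stored as a dictionary keyed by unordered node pairs; the function names are hypothetical and not part of the lecture. With that encoding, "3 or exactly 1 positive edges" is the same as "product of the three signs is positive".

```python
from itertools import combinations


def is_balanced_triangle(s_ab, s_bc, s_ca):
    """Signs are +1 or -1. A triangle with 3 or 1 positive edges has product +1."""
    return s_ab * s_bc * s_ca > 0


def all_triangles_balanced(nodes, sign):
    """sign maps frozenset({u, v}) -> +1/-1 for every pair (complete graph assumed)."""
    for a, b, c in combinations(nodes, 3):
        if not is_balanced_triangle(sign[frozenset((a, b))],
                                    sign[frozenset((b, c))],
                                    sign[frozenset((c, a))]):
            return False
    return True
```

For example, a triangle with signs (+, –, –) passes the check, while (+, +, –) and (–, –, –) do not.
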
slide-7
SLIDE 7

• Balance implies global coalitions [Cartwright-Harary]
• If all triangles are balanced, then either:
  • The network contains only positive edges, or
  • Nodes can be split into 2 sets where negative edges only point between the sets
[Figure: nodes split into two sets, L and R]

slide-8
SLIDE 8

[Figure: proof sketch. Node A with nodes B, C, D, E; the friends of A form set L and the enemies of A form set R]
  • Any 2 nodes in L are friends
  • Any 2 nodes in R are friends
  • Every node in L is an enemy of every node in R

slide-9
SLIDE 9

• International relations:
  • Positive edge: alliance
  • Negative edge: animosity
• Separation of Bangladesh from Pakistan in 1971: US supports Pakistan. Why?
  • USSR was enemy of China
  • China was enemy of India
  • India was enemy of Pakistan
  • US was friendly with China
  • China vetoed Bangladesh from U.N.
[Figure: signed graph on nodes P, R, I, C, U and B, with some edge signs marked as open questions (+?, –?)]

slide-10
SLIDE 10

slide-11
SLIDE 11

slide-12
SLIDE 12

slide-13
SLIDE 13

slide-14
SLIDE 14

slide-15
SLIDE 15

slide-16
SLIDE 16

• So far we talked about complete graphs
• When is a graph that is not complete balanced?
  • Def 1 (local view): Fill in the missing edges to achieve balance
  • Def 2 (global view): Divide the graph into two coalitions
  • The 2 definitions are equivalent!

slide-17
SLIDE 17

• Graph is balanced if and only if it contains no cycle with an odd number of negative edges
• How to compute this?
  • Find connected components on + edges
  • If we find a component of nodes on + edges that contains a – edge ⇒ Unbalanced
  • For each component create a super-node
  • Connect components A and B if there is a negative edge between the members
  • Assign super-nodes to sides using BFS
[Figure: an even-length cycle and an odd-length cycle of negative edges]
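
The procedure on this slide can be sketched as follows. This is one possible reading, not the lecture's reference code: it assumes undirected edges given as (u, v, sign) triples with sign +1 or -1, uses a union-find structure for the positive components (any connected-components routine would do), and the name `is_balanced` is hypothetical.

```python
from collections import deque


def is_balanced(nodes, edges):
    """edges: iterable of (u, v, sign) with sign in {+1, -1}, undirected."""
    edges = list(edges)

    # Step 1: connected components on + edges (union-find).
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v, s in edges:
        if s > 0:
            parent[find(u)] = find(v)

    # Steps 2-4: a negative edge inside a component means unbalanced;
    # otherwise it connects two super-nodes (component representatives).
    neg = {}
    for u, v, s in edges:
        if s < 0:
            cu, cv = find(u), find(v)
            if cu == cv:
                return False
            neg.setdefault(cu, set()).add(cv)
            neg.setdefault(cv, set()).add(cu)

    # Step 5: BFS assigns sides; two super-nodes joined by a negative
    # edge must end up on opposite sides.
    side = {}
    for start in neg:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            c = queue.popleft()
            for d in neg[c]:
                if d not in side:
                    side[d] = 1 - side[c]
                    queue.append(d)
                elif side[d] == side[c]:
                    return False
    return True
```

The BFS step is a standard bipartiteness test on the super-node graph, which matches the "no cycle with an odd number of negative edges" condition: an odd cycle of negative edges between super-nodes cannot be split into two sides.
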

slide-18
SLIDE 18

slide-19
SLIDE 19

slide-20
SLIDE 20

slide-21
SLIDE 21

• Using BFS assign each node a side
• Graph is unbalanced if any two super-nodes connected by a negative edge are assigned the same side
[Figure: super-nodes labeled L, R, R, L, L, L, R. Unbalanced!]

slide-22
SLIDE 22


slide-23
SLIDE 23

 Each link AB is explicitly tagged with a sign:

  • Epinions: Trust/Distrust
  • Does A trust B’s product reviews?

(only positive links are visible)

  • Wikipedia: Support/Oppose
  • Does A support B to become

Wikipedia administrator?

  • Slashdot: Friend/Foe
  • Does A like B’s comments?
  • Other examples:
  • Online multiplayer games

10/9/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 23

+ + + + + + + + – – – – – – – [CHI ‘10]

slide-24
SLIDE 24

• Does structural balance hold?
  • Compare frequencies of signed triads in real and “shuffled” data

Triad (edge signs)   Balanced?   Epinions          Wikipedia
                                 P(T)     P0(T)    P(T)     P0(T)
+ + +                yes         0.87     0.62     0.70     0.49
+ – –                yes         0.07     0.05     0.21     0.10
+ + –                no          0.05     0.32     0.08     0.49
– – –                no          0.007    0.003    0.011    0.010

P(T) … fraction of triads of type T in the real data
P0(T) … fraction of triads of type T if the signs were random (shuffled data)
[CHI ‘10]
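
A rough sketch of the real-vs-shuffled comparison, assuming an undirected signed network stored as a dictionary from unordered node pairs to +1/-1. The shuffling shown here keeps the edges and the overall fraction of + signs fixed and only permutes which edge gets which sign; that is one plausible reading of “shuffled” and is an assumption, as are the function names.

```python
import random
from collections import Counter
from itertools import combinations


def triad_fractions(sign):
    """sign: dict mapping frozenset({u, v}) -> +1/-1 (undirected edges).
    Returns {number of + edges in the triad (0..3): fraction of triads}."""
    adj = {}
    for e in sign:
        u, v = tuple(e)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    counts = Counter()
    for u in adj:
        for v, w in combinations(adj[u], 2):
            if w in adj[v]:
                # Every triangle is visited once per corner (3 times),
                # which leaves the fractions unchanged.
                plus = sum(sign[frozenset(p)] > 0
                           for p in ((u, v), (u, w), (v, w)))
                counts[plus] += 1
    total = sum(counts.values())
    return {k: counts[k] / total for k in range(4)} if total else {}


def shuffled(sign, seed=0):
    """Same edges and same number of + signs, signs reassigned at random."""
    rng = random.Random(seed)
    edges = list(sign)
    signs = list(sign.values())
    rng.shuffle(signs)
    return dict(zip(edges, signs))
```

Comparing `triad_fractions(sign)` with `triad_fractions(shuffled(sign))` gives the analogue of P(T) versus P0(T); keys 3 and 1 are the balanced triad types, keys 2 and 0 the unbalanced ones.
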

slide-25
SLIDE 25

• Intuitive picture of social network in terms of densely linked clusters
• How does structure interact with links?
• Embeddedness of link (A,B): Number of shared neighbors
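
Embeddedness as defined here is a set intersection. A minimal sketch, assuming the network is given as a dictionary from each node to the set of its neighbors with edge directions ignored (the data layout and function name are illustrative assumptions):

```python
def embeddedness(adj, a, b):
    """adj: dict node -> set of neighbors (directions ignored).
    Returns the number of neighbors shared by a and b."""
    return len((adj[a] & adj[b]) - {a, b})
```

For instance, if A and B are both linked to C and D, the link (A,B) has embeddedness 2.
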

slide-26
SLIDE 26

• Embeddedness of ties:
  • Positive ties tend to be more embedded
• Positive ties tend to be more clumped together
  • Public display of signs (votes) in Wikipedia further attenuates this
[Figure panels: Epinions, Wikipedia] [CHI ‘10]

slide-27
SLIDE 27

• Clustering:
  • +net: More clustering than baseline
  • –net: Less clustering than baseline
• Size of max. component:
  • +/–net: Smaller than the baseline
[CHI ‘10]


slide-28
SLIDE 28

• New setting: Links are directed and created over time
• How many of the 16 signed directed triads are now explained by balance?
  • Only half (8 out of 16)
  • (In directed networks people traditionally applied balance by ignoring edge directions)
• Is there a better explanation? Yes. Status.
[Figure: the 16 signed directed triads on nodes A, B, X] [CHI ‘10]

slide-29
SLIDE 29

• Status in a network [Davis-Leinhardt ’68]
  • A ⟶ B with sign + :: B has higher status than A
  • A ⟶ B with sign – :: B has lower status than A
  • (Note the notion of status is now implicit)
  • Apply this principle transitively over paths
  • Can replace each negative edge A ⟶ B with a positive edge A ⟵ B
  • Obtain an all-positive network with same status interpretation
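
The edge-flipping step can be illustrated with a short sketch. It assumes directed edges are given as (a, b, sign) triples with sign +1 or -1 and returns plain (lower-status, higher-status) pairs; the function name is hypothetical.

```python
def to_all_positive(edges):
    """edges: iterable of (a, b, sign) for a directed edge a -> b.
    A negative a -> b says b has lower status than a, which carries the
    same status information as a positive edge b -> a.
    Returns a list of (lower_status, higher_status) pairs."""
    return [(a, b) if s > 0 else (b, a) for a, b, s in edges]
```

For example, a negative edge from C to D becomes the pair (D, C), read as "C has higher status than D".
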

10/9/2012 Jure Leskovec: How people evaluate each other in social media 29


[CHI ‘10]

slide-30
SLIDE 30

[Figure: two triads on nodes A, B, X with signed directed edges] [CHI ‘10]
  • Triad 1: Balance: +, Status: –
  • Triad 2: Balance: +, Status: –
• Status and balance give different predictions!

slide-31
SLIDE 31

At a global level:

• Status ⇒ Hierarchy
  • All-positive directed network should be (approximately) acyclic
• Balance ⇒ Coalitions
  • Balance ignores directions and implies that subgraph of negative edges should be (approximately) bipartite
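
The status side of this picture can be tested on an all-positive directed network (each edge pointing from lower to higher status) with a topological-sort check. The sketch below uses Kahn's algorithm and only answers the exact question "is it acyclic?"; measuring how close a real network is to acyclic, as the slide's "approximately" suggests, would need more than this. The function name and input layout are assumptions.

```python
from collections import deque


def is_acyclic(nodes, pos_edges):
    """nodes: iterable of node ids; pos_edges: iterable of directed (a, b) pairs.
    Returns True if the directed graph has no cycle (Kahn's algorithm)."""
    indeg = {v: 0 for v in nodes}
    out = {v: [] for v in indeg}
    for a, b in pos_edges:
        out[a].append(b)
        indeg[b] += 1
    queue = deque(v for v in indeg if indeg[v] == 0)
    removed = 0
    while queue:
        v = queue.popleft()
        removed += 1
        for w in out[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    # Nodes on a directed cycle never reach in-degree 0.
    return removed == len(indeg)
```

A chain A ⟶ B ⟶ C is a consistent hierarchy; adding C ⟶ A creates a cycle and the check fails.
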

slide-32
SLIDE 32

• Edges are directed and created over time
  • X has links to A and B
  • Now, A links to B (triad A-B-X)
  • How does the sign of A⟶B depend on the signs from/to X? P(A⟶B | X) vs. P(A⟶B)
• We need to formalize:
  • 1) Links are embedded in triads: Triads provide context for signs
  • 2) Users are heterogeneous in their linking behavior
[Figure: edge A⟶B with unknown sign inside the triad A-B-X vs. the edge A⟶B on its own] [CHI ‘10]

slide-33
SLIDE 33

• Link A⟶B appears in context X: A⟶B | X
• 16 possible contexts (2 directions × 2 signs for each of the two edges joining X to A and to B):
[Figure: the 16 triad contexts] [CHI ‘10]

slide-34
SLIDE 34

• Users differ in the fraction of + links they give/receive
• For a user U:
  • Generative baseline: Fraction of + given by U
  • Receptive baseline: Fraction of + received by U

Basic question:

• How do different link contexts cause users to deviate from their baselines?
  • Link contexts as modifiers on a person’s predicted behavior
  • Surprise: How much behavior of A/B deviates from his/her baseline when A/B is in context X
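
The two baselines are simple per-user fractions. A minimal sketch, assuming directed signed edges as (a, b, sign) triples with sign +1 or -1; users who give (or receive) no edges simply do not appear in the corresponding dictionary. The function name is hypothetical.

```python
from collections import defaultdict


def baselines(edges):
    """edges: iterable of (a, b, sign) for a directed edge a -> b.
    Returns (generative, receptive): dicts user -> fraction of + links
    the user gives, and fraction of + links the user receives."""
    given = defaultdict(lambda: [0, 0])      # user -> [number of +, total]
    received = defaultdict(lambda: [0, 0])
    for a, b, s in edges:
        given[a][0] += s > 0
        given[a][1] += 1
        received[b][0] += s > 0
        received[b][1] += 1
    gen = {u: p / n for u, (p, n) in given.items()}
    rec = {u: p / n for u, (p, n) in received.items()}
    return gen, rec
```
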

slide-35
SLIDE 35

• Surprise: How much behavior of user deviates from baseline in context X
  • Baseline: For every user Ai:
    • pg(Ai) … generative baseline of Ai
    • Fraction of times Ai gives a plus
  • Context: (A1, B1 | X1), …, (An, Bn | Xn) … all instances of triad context X
    • (Ai, Bi, Xi) … an instance where, when user Ai links to user Bi, a triad of type X is created
  • Say k of those triads closed with a plus
    • k out of n times: Ai ⟶ Bi is a plus
[Figure: edge A⟶B inside context X vs. the edge A⟶B on its own] [CHI ‘10]

slide-36
SLIDE 36

• Surprise: How much behavior of user deviates from baseline in context X
  • Generative surprise of context X:

    $$ s_g(X) = \frac{k - \sum_{i=1}^{n} p_g(A_i)}{\sqrt{\sum_{i=1}^{n} p_g(A_i)\,\bigl(1 - p_g(A_i)\bigr)}} $$

  • pg(Ai) … generative baseline of Ai
  • Context X: (A1, B1 | X1), …, (An, Bn | Xn)
  • k … number of instances of triad X closed with a plus edge
  • Receptive surprise is similar, just use pr(Bi), the receptive baseline of Bi
[CHI ‘10]
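
Read as a statistic, the generative surprise is the number of standard deviations by which k, the observed count of plus edges, differs from its expectation when each Ai gives a plus independently with probability pg(Ai). A small sketch under that reading follows; the function name and the choice to raise an error when the variance is zero are assumptions.

```python
import math


def surprise(baseline_probs, k):
    """baseline_probs: the baselines p(A_1), ..., p(A_n) for the n instances
    of a context; k: how many of those instances closed with a plus.
    Returns (k - expected) / standard deviation."""
    expected = sum(baseline_probs)
    variance = sum(p * (1 - p) for p in baseline_probs)
    if variance == 0:
        raise ValueError("all baselines are 0 or 1; surprise is undefined")
    return (k - expected) / math.sqrt(variance)
```

With four instances whose baselines are all 0.5, the expectation is 2 and the standard deviation is 1, so observing k = 4 pluses gives a surprise of 2 and k = 2 gives 0. Passing receptive baselines instead yields the receptive surprise.
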

slide-37
SLIDE 37

• Assume status is at work
• What happens?
[Figure: two triads on nodes A, B, X with signed directed edges]
  • Triad 1: Gen. surprise of A: –; Rec. surprise of B: –
  • Triad 2: Gen. surprise of A: –; Rec. surprise of B: –

slide-38
SLIDE 38

• X positively endorses A and B
• Now A links to B

A puzzle:

• In our data we observe: Fraction of positive links deviates
  • Above generative baseline of A: Sg(X) > 0
  • Below receptive baseline of B: Sr(X) < 0
• Why?
[Figure: X links to A and to B, both with +; sign of A⟶B unknown] [CHI ‘10]

slide-39
SLIDE 39

• Ask every node: How does skill of B compare to yours?
  • Build a signed directed network
• We haven’t asked A about B
• But we know that X thinks A and B are both better than him
• What can we infer about A’s answer?
[Figure: X links to A and to B, both with +; sign of A⟶B unknown] [CHI ‘10]

slide-40
SLIDE 40

• A’s viewpoint:
  • Since B has positive evaluation, B is high status
  • Thus, evaluation A gives is more likely to be positive than the baseline
• How does A evaluate B?
  • A is evaluating someone who is better than average ⇒ A is more positive than average
[Figure: A evaluating B (positively evaluated by X) vs. A evaluating Y, where Y is an average node]

slide-41
SLIDE 41

• B’s viewpoint:
  • Since A has positive evaluation, A is high status
  • Thus, evaluation B receives is less likely to be positive than the baseline
• How is B evaluated by A?
  • B is evaluated by someone better than average ⇒ They will be more negative to B than average
• Sign of A⟶B deviates in different directions depending on the viewpoint!
[Figure: B evaluated by A (positively evaluated by X) vs. B evaluated by Y, where Y is an average node]

slide-42
SLIDE 42

• Determine node status:
  • Assign X status 0
  • Based on signs and directions of edges set status of A and B
• Surprise is status-consistent, if:
  • Gen. surprise is status-consistent if it has same sign as status of B
  • Rec. surprise is status-consistent if it has the opposite sign from the status of A
• Surprise is balance-consistent, if:
  • If it completes a balanced triad
• Example: X links to A and to B, both with +, so A and B both get status +1. Status-consistent if:
  • Gen. surprise > 0
  • Rec. surprise < 0
[CHI ‘10]
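
The status bookkeeping on this slide can be written out as a short sketch. It assumes X is fixed at status 0, that a + edge points toward the higher-status node and a – edge toward the lower-status node, and that surprises are plain numbers; both function names are hypothetical.

```python
def status_relative_to_x(x_to_node, sign):
    """Status (+1 or -1) of A or B with X fixed at status 0.
    x_to_node: True if the edge points X -> node, False if node -> X.
    sign: +1 or -1 on that edge."""
    # X -> node with +: node is above X. node -> X with +: node is below X.
    return sign if x_to_node else -sign


def status_consistent(gen_surprise, rec_surprise, status_a, status_b):
    """Returns (generative consistent?, receptive consistent?)."""
    gen_ok = gen_surprise * status_b > 0   # same sign as status of B
    rec_ok = rec_surprise * status_a < 0   # opposite sign from status of A
    return gen_ok, rec_ok
```

In the example where X positively links to both A and B, both statuses are +1, so a positive generative surprise and a negative receptive surprise are the status-consistent outcome.
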

slide-43
SLIDE 43

• Predictions:
[Table: triad types ti (t14, t15, t16, t3, t2 shown) with columns Sg(ti), Sr(ti), Bg, Br, Sg, Sr, and a final row “Mistakes”] [CHI ‘10]

slide-44
SLIDE 44

• At a global level:
  • Balance ⇒ Coalitions
  • Status ⇒ Hierarchy
• Observations:
  • No evidence for global balance beyond the random baselines
    • Real data is 80% consistent vs. 80% consistency under random baseline
  • Evidence for global status beyond the random baselines
    • Real data is 80% consistent, but 50% consistency under random baseline

slide-45
SLIDE 45

Edge sign prediction problem

• Given a network and signs on all but one edge, predict the missing sign
• Friend recommendation:
  • Predicting whether you know someone vs. predicting what you think of them
[Figure: signed network in which the sign of edge (u, v) is unknown] [WWW ‘10]

slide-46
SLIDE 46

• Problem Formulation:
  • Predict sign of edge (u,v)
• Class label:
  • +1: positive edge
  • -1: negative edge
• Learning method:
  • Logistic regression
  • Each feature “votes” for/against a positive edge.
• Dataset:
  • Balanced: 50% +edges
• Evaluation:
  • Accuracy
• Features for learning:
  • Next slide
[WWW ‘10]

slide-47
SLIDE 47

For each edge (A,B) create a set of features:

• Triad counts of edge (A,B):
  • In what types of triads does our red edge participate?
[Figure: edge (A,B) drawn in red together with the signed triads it takes part in] [WWW ‘10]
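
One way to compute such triad-count features is sketched below. It assumes the network is a dictionary from directed pairs (u, v) to +1/-1 and describes each triad through a third node W by the direction and sign of the edge between A and W and of the edge between W and B, which gives 4 × 4 = 16 types. The key layout and the function name are illustrative assumptions, and any further features a full model might use are left out.

```python
from collections import Counter


def triad_features(edges, a, b):
    """edges: dict mapping directed pair (u, v) -> +1/-1.
    Returns a Counter keyed by (dir_aw, sign_aw, dir_wb, sign_wb), where
    dir is "fwd" for a -> w (resp. w -> b) and "back" for the reverse."""
    nodes = {u for e in edges for u in e}
    feats = Counter()
    for w in nodes - {a, b}:
        # Edges between a and w, in either direction.
        aw = [("fwd", edges[(a, w)])] if (a, w) in edges else []
        if (w, a) in edges:
            aw.append(("back", edges[(w, a)]))
        # Edges between w and b, in either direction.
        wb = [("fwd", edges[(w, b)])] if (w, b) in edges else []
        if (b, w) in edges:
            wb.append(("back", edges[(b, w)]))
        for d1, s1 in aw:
            for d2, s2 in wb:
                feats[(d1, s1, d2, s2)] += 1
    return feats
```

The resulting 16 counts can be laid out as a fixed-length vector and fed to a logistic regression, so that each triad type gets a learned weight that “votes” for or against a positive edge.
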

slide-48
SLIDE 48

• Classification Accuracy:
  • Epinions: 93.5%
  • Slashdot: 94.4%
  • Wikipedia: 81%
• Signs can be modeled from local network structure alone!
  • Trust propagation model of [Guha et al. ‘04] has 14% error on Epinions
• Triad features perform less well for less embedded edges
• Wikipedia is harder to model:
  • Votes are publicly visible
[Figure panels: Epin, Slash, Wiki] [WWW ‘10]

slide-49
SLIDE 49

slide-50
SLIDE 50

• Do people use these very different linking systems by obeying the same principles?
  • How generalizable are the results across the datasets?
  • Train on row “dataset”, predict on “column”
• Nearly perfect generalization of the models even though networks come from very different applications!

slide-51
SLIDE 51

• Suppose we are only interested in predicting whether there is a positive edge or no edge
• Does knowing negative edges help? YES!
[Figure: network with both positive and negative edges vs. the same network with positive edges only] [WWW ‘10]

slide-52
SLIDE 52

• Signed networks provide insight into how social computing systems are used:
  • Status vs. Balance
  • Role of embeddedness and public display
  • More evidence that networks are globally organized based on status
• Sign of relationship can be reliably predicted from the local network context
  • ~90% accuracy in predicting the sign of the edge
  • People use signed edges consistently regardless of particular application
    • Near perfect generalization of models across datasets