CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University
10/9/2012 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 2
Observations
- Small diameter, Edge clustering
- Patterns of signed edge creation
- Viral Marketing, Blogosphere, Memetracking
- Scale-Free
- Densification power law, Shrinking diameters
- Strength of weak ties, Core-periphery
Models
- Erdős-Rényi model, Small-world model
- Structural balance, Theory of status
- Independent cascade model, Game theoretic model
- Preferential attachment, Copying model
- Microscopic model of evolving networks
- Kronecker Graphs
Algorithms
- Decentralized search
- Models for predicting edge signs
- Influence maximization, Outbreak detection, LIM
- PageRank, Hubs and authorities
- Link prediction, Supervised random walks
- Community detection: Girvan-Newman, Modularity
Networks with positive and
negative relationships
Our basic unit of investigation
will be signed triangles
First we talk about undirected
networks then directed
Plan for today:
- Model: Consider two social theories of signed networks
- Data: Reason about them in large online networks
- Application: Predict if A and B are linked with + or -
Networks with positive and negative
relationships
Consider an undirected complete graph
Label each edge as either:
- Positive: friendship, trust, positive sentiment, …
- Negative: enemy, distrust, negative sentiment, …
Examine triples of connected nodes A, B, C
Start with the intuition [Heider ’46]:
- Friend of my friend is my friend
- Enemy of enemy is my friend
- Enemy of friend is my enemy
Look at connected triples of nodes:
- Balanced (+ + + or + – –): Consistent with the “friend of a friend” or “enemy of the enemy” intuition
- Unbalanced (+ + – or – – –): Inconsistent with the “friend of a friend” or “enemy of the enemy” intuition
Graph is balanced if every connected triple of nodes has:
- All 3 edges labeled +, or
- Exactly 1 edge labeled +
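The triangle condition above can be checked directly on a small complete graph. The following is a minimal sketch, assuming signs are stored in a dict keyed by `frozenset({u, v})` with values +1 or -1 (this representation and the function name are illustrative, not from the lecture). It relies on the fact that a triangle has 3 or 1 positive edges exactly when the product of its three signs is positive.

```python
from itertools import combinations

def is_balanced_complete(nodes, sign):
    """Return True if every triangle of a complete signed graph is balanced.

    sign maps frozenset({u, v}) -> +1 or -1 for every pair of nodes.
    """
    for a, b, c in combinations(nodes, 3):
        product = (sign[frozenset((a, b))]
                   * sign[frozenset((b, c))]
                   * sign[frozenset((a, c))])
        # Product is -1 for triangles with exactly 2 or exactly 0 positive edges.
        if product < 0:
            return False
    return True
```

The check visits all triples, so it is only practical for small graphs.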
Balance implies global coalitions [Cartwright-Harary]
If all triangles are balanced, then either:
- The network contains only positive edges, or
- Nodes can be split into 2 sets where negative edges only point between the sets
Proof sketch: pick any node A
- L: A together with the friends of A
- R: the enemies of A
- Any 2 nodes in L are friends
- Any 2 nodes in R are friends
- Every node in L is enemy of every node in R
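The coalition argument can be sketched in code: pick an arbitrary node A, put A and its friends in L, everyone else in R, and then verify that positive edges stay inside a set while negative edges cross between sets. This is an illustrative sketch under the same assumed representation as before (a dict keyed by `frozenset({u, v})`); the function name is hypothetical.

```python
def split_into_coalitions(nodes, sign):
    """Try to split a complete signed graph into two coalitions (L, R).

    Returns (L, R) if the split is consistent with all edge signs,
    or None if the graph is not balanced.
    """
    nodes = list(nodes)
    a = nodes[0]  # arbitrary starting node A
    left = {a} | {v for v in nodes[1:] if sign[frozenset((a, v))] > 0}
    right = set(nodes) - left
    # Verify: an edge is positive exactly when both endpoints share a side.
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            same_side = (u in left) == (v in left)
            if (sign[frozenset((u, v))] > 0) != same_side:
                return None
    return left, right
```

If the network contains only positive edges, R comes back empty, which matches the first case of the theorem.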
International relations:
- Positive edge: alliance
- Negative edge: animosity
Separation of Bangladesh from Pakistan in
1971: US supports Pakistan. Why?
- USSR was enemy of China
- China was enemy of India
- India was enemy of Pakistan
- US was friendly with China
- China vetoed Bangladesh's admission to the U.N.
[Figure: signed network on nodes P, R, I, C, U and B; two edges carry tentative signs, +? and –?]
So far we talked about complete graphs
Balanced?
- Def 1 (local view): Fill in the missing edges to achieve balance
- Def 2 (global view): Divide the graph into two coalitions
The 2 definitions are equivalent!
Graph is balanced if and only if it contains no
cycle with an odd number of negative edges
How to compute this?
- Find connected components on + edges
- If a component connected by + edges contains a – edge ⇒ Unbalanced
- For each component create a super-node
- Connect components A and B if there is a negative edge between their members
- Assign super-nodes to sides using BFS
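The steps above can be sketched as follows. This is a minimal illustration, assuming the graph is given as a node list plus undirected `(u, v, sign)` triples, and that every edge endpoint appears in the node list; the function name and input format are assumptions, not part of the lecture.

```python
from collections import defaultdict, deque

def is_balanced(nodes, edges):
    """Balance test for a (possibly incomplete) undirected signed graph.

    edges: iterable of (u, v, s) with s = +1 or -1.
    """
    edges = list(edges)
    pos = defaultdict(set)
    for u, v, s in edges:
        if s > 0:
            pos[u].add(v)
            pos[v].add(u)

    # Step 1: connected components on + edges (each becomes a super-node).
    comp = {}
    for start in nodes:
        if start in comp:
            continue
        comp[start] = start
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in pos[u]:
                if v not in comp:
                    comp[v] = start
                    queue.append(v)

    # Step 2: a negative edge inside a component means unbalanced;
    # otherwise negative edges connect super-nodes.
    super_adj = defaultdict(set)
    for u, v, s in edges:
        if s < 0:
            cu, cv = comp[u], comp[v]
            if cu == cv:
                return False
            super_adj[cu].add(cv)
            super_adj[cv].add(cu)

    # Step 3: BFS two-colouring of the super-node graph.
    side = {}
    for start in set(comp.values()):
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            c = queue.popleft()
            for d in super_adj[c]:
                if d not in side:
                    side[d] = 1 - side[c]
                    queue.append(d)
                elif side[d] == side[c]:
                    return False  # odd cycle of negative edges
    return True
```

The two-colouring fails exactly when the super-node graph has an odd cycle, which corresponds to a cycle with an odd number of negative edges in the original graph.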
[Figure: super-node graphs whose negative edges form an even length cycle (balanced) and an odd length cycle (unbalanced)]
Using BFS, assign each super-node a side
Graph is unbalanced if any two connected super-nodes are assigned the same side
Each link A⟶B is explicitly tagged with a sign:
- Epinions: Trust/Distrust
- Does A trust B’s product reviews?
(only positive links are visible)
- Wikipedia: Support/Oppose
- Does A support B to become
Wikipedia administrator?
- Slashdot: Friend/Foe
- Does A like B’s comments?
- Other examples:
- Online multiplayer games
[CHI ‘10]
Does structural balance hold?
- Compare frequencies of signed triads
in real and “shuffled” data
Triad    Epinions           Wikipedia          Balance
         P(T)     P0(T)     P(T)     P0(T)
+ + +    0.87     0.62      0.70     0.49      Balanced
+ – –    0.07     0.05      0.21     0.10      Balanced
+ + –    0.05     0.32      0.08     0.49      Unbalanced
– – –    0.007    0.003     0.011    0.010     Unbalanced

P(T) … fraction of triads of type T in the real data
P0(T) … fraction of triads of type T if the signs were random (shuffled data)
[CHI ‘10]
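The comparison between real and shuffled data can be sketched as below. This is an illustrative sketch, not the code used for the [CHI ‘10] numbers: it assumes an undirected signed graph stored as a dict keyed by `frozenset({u, v})`, node labels that can be ordered, and a shuffle that permutes signs over the fixed set of edges.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations

def triad_fractions(sign):
    """Fraction of triangles having 0, 1, 2 or 3 positive edges."""
    adj = defaultdict(set)
    for edge in sign:
        u, v = tuple(edge)
        adj[u].add(v)
        adj[v].add(u)
    counts = Counter()
    for u in adj:
        # Only look at higher-labelled neighbours so each triangle is counted once.
        higher = sorted(n for n in adj[u] if n > u)
        for v, w in combinations(higher, 2):
            if frozenset((v, w)) in sign:
                plus = sum(sign[frozenset(p)] > 0
                           for p in ((u, v), (u, w), (v, w)))
                counts[plus] += 1
    total = sum(counts.values())
    return {k: counts[k] / total for k in range(4)} if total else {}

def shuffled(sign, rng):
    """Keep the edges fixed and randomly permute the signs among them."""
    values = list(sign.values())
    rng.shuffle(values)
    return dict(zip(sign.keys(), values))
```

Calling `triad_fractions(sign)` gives P(T), and averaging `triad_fractions(shuffled(sign, random.Random(seed)))` over several seeds gives an estimate of P0(T).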
Intuitive picture of social network in terms of densely linked clusters
How does structure interact with links?
Embeddedness of link (A,B): Number of shared neighbors
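Embeddedness as defined above is a one-line computation on adjacency sets. A minimal sketch, assuming the undirected network is stored as a dict from node to its set of neighbors (an assumed representation):

```python
def embeddedness(adj, a, b):
    """Number of neighbors shared by a and b (a and b themselves excluded)."""
    return len((adj[a] & adj[b]) - {a, b})
```

For example, if A and B are both linked to C and D, then `embeddedness(adj, "A", "B")` is 2.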
Embeddedness of ties:
- Positive ties tend to be more embedded
Positive ties tend to be more clumped together
- Public display of signs (votes) in Wikipedia further attenuates this
[Figure: panels for Epinions and Wikipedia] [CHI ‘10]
Clustering:
- +net (positive edges only): More clustering than baseline
- –net (negative edges only): Less clustering than baseline
Size of max. component:
- +/–net: Smaller than the baseline
[CHI ‘10]
New setting:
Links are directed and created over time
How many are now
explained by balance?
- Only half (8 out of 16)
Is there a better explanation? Yes. Status.
[Figure: the 16 signed directed triads on nodes A, B, X] [CHI ‘10]
(In directed networks people traditionally applied balance by ignoring edge directions)
Status in a network [Davis-Leinhardt ’68]
- Positive link A ⟶ B :: B has higher status than A
- Negative link A ⟶ B :: B has lower status than A
- (Note the notion of status is now implicit)
- Apply this principle transitively over paths
- Can replace each negative link A ⟶ B with a positive link A ⟵ B
- Obtain an all-positive network with same status interpretation
10/9/2012 Jure Leskovec: How people evaluate each other in social media 29
[Figure: two directed triads on nodes A, B, X, each annotated “Balance: +, Status: –”] [CHI ‘10]
Status and balance give different predictions!
At a global level:
Status ⇒ Hierarchy
- All-positive directed network
should be (approximately) acyclic
Balance ⇒ Coalitions
- Balance ignores directions and
implies that subgraph of negative edges should be (approximately) bipartite
Edges are directed and created over time
- X has links to A and B
- Now, A links to B (triad A-B-X)
- How does the sign of A⟶B depend on the signs of the links from/to X?
Compare P(A⟶B | X) vs. P(A⟶B)
We need to formalize:
- 1) Links are embedded in triads:
Triads provide context for signs
- 2) Users are heterogeneous in
their linking behavior
Link A⟶B appears in context X: A⟶B | X
16 possible contexts: each of the links A-X and B-X can point in either direction and carry either sign (2 · 2 · 2 · 2 = 16)
[CHI ‘10]
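The count of 16 contexts follows from two links (A-X and B-X), each with two possible directions and two possible signs. A small sketch that enumerates them; the labels `"to_X"` and `"from_X"` are hypothetical names chosen for illustration:

```python
from itertools import product

DIRECTIONS = ("to_X", "from_X")  # hypothetical labels: node -> X, or X -> node
SIGNS = ("+", "-")

def triad_contexts():
    """Enumerate every (direction, sign) combination for the A-X and B-X links."""
    return [
        {"A": (dir_a, sign_a), "B": (dir_b, sign_b)}
        for dir_a, sign_a, dir_b, sign_b
        in product(DIRECTIONS, SIGNS, DIRECTIONS, SIGNS)
    ]
```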
Users differ in the fraction of + links they give/receive
For a user U:
- Generative baseline: Frac. of + given by U
- Receptive baseline: Frac. of + received by U
Basic question:
How do different link contexts cause users to
deviate from their baselines?
- Link contexts as modifiers on a person’s
predicted behavior
- Surprise: How much behavior of A/B deviates
from his/her baseline when A/B is in context X
Surprise: How much behavior of user deviates from baseline in context X
- Baseline: For every user A_i:
- p_g(A_i) … generative baseline of A_i, the fraction of times A_i gives a plus
- Context: (A_1, B_1 | X_1), …, (A_n, B_n | X_n) … all instances of triad context X
- (A_i, B_i, X_i) … an instance where, when user A_i links to user B_i, a triad of type X is created
- Say k of those triads closed with a plus
- k out of n times: A_i ⟶ B_i is positive
[CHI ‘10]
Surprise: How much behavior of user deviates from baseline in context X
- Generative surprise of context X:
$$ s_g(X) = \frac{k - \sum_{i=1}^{n} p_g(A_i)}{\sqrt{\sum_{i=1}^{n} p_g(A_i)\,\bigl(1 - p_g(A_i)\bigr)}} $$
- p_g(A_i) … generative baseline of A_i
- Context X: (A_1, B_1 | X_1), …, (A_n, B_n | X_n)
- k of the n instances of triad X closed with a plus edge
- Receptive surprise is similar, just use the receptive baselines p_r(B_i)
[CHI ‘10]
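To make the surprise definition concrete, here is a minimal sketch. It treats each instance as an independent coin flip with success probability p_g(A_i), so the expected number of pluses is the sum of the baselines and the variance is the sum of p(1 − p); surprise is then the number of standard deviations by which the observed count k differs from the expectation. The edge format `(src, dst, sign)` and the function names are assumptions for illustration.

```python
from math import sqrt

def generative_baseline(edges, user):
    """Fraction of + links among the links given by `user`.

    edges: iterable of (src, dst, sign) with sign = +1 or -1.
    Raises ZeroDivisionError if the user gives no links.
    """
    given = [s for (src, _dst, s) in edges if src == user]
    return sum(1 for s in given if s > 0) / len(given)

def generative_surprise(baselines, k):
    """Surprise of a context with n instances, k of which closed with a plus.

    baselines: the values p_g(A_i), one per instance of the context.
    """
    expected = sum(baselines)
    variance = sum(p * (1 - p) for p in baselines)
    if variance == 0:
        raise ValueError("surprise is undefined when all baselines are 0 or 1")
    return (k - expected) / sqrt(variance)
```

Receptive surprise can reuse `generative_surprise` by passing receptive baselines instead.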
Assume status is at work. What happens?
[Figure: two triads on nodes A, B, X]
- Gen. surprise of A: –
- Rec. surprise of B: –
- Gen. surprise of A: –
- Rec. surprise of B: –
X positively endorses A and B
Now A links to B
A puzzle:
In our data we observe:
Fraction of positive links deviates
- Above generative baseline of A: Sg(X) >0
- Below receptive baseline of B: Sr(X) < 0
Why?
[Figure: X ⟶ A and X ⟶ B are both +; the sign of A ⟶ B is marked “?”] [CHI ‘10]
Ask every node: How does skill of B compare to yours?
- Build a signed directed network
We haven’t asked A about B
But we know that X thinks A and B are both better than him
What can we infer about A’s answer?
[CHI ‘10]
A’s viewpoint: How does A evaluate B?
- Since B has positive evaluation, B is high status
- Thus, evaluation A gives is more likely to be positive than the baseline
- A is evaluating someone who is better than average, so A is more positive than average
(Y … average node)
B’s viewpoint: How is B evaluated by A?
- Since A has positive evaluation, A is high status
- Thus, evaluation B receives is less likely to be positive than the baseline
- B is evaluated by someone better than average, who will be more negative to B than average
(Y … average node)
Sign of A⟶B deviates in different directions depending on the viewpoint!
Determine node status:
- Assign X status 0
- Based on signs and directions of edges set status of A and B
Surprise is status-consistent, if:
- Gen. surprise is status-consistent if it has same sign as status of B
- Rec. surprise is status-consistent if it has the opposite sign from the status of A
Surprise is balance-consistent, if:
- It completes a balanced triad
Example: X ⟶ A and X ⟶ B are both +, so A and B both get status +1
Status-consistent if:
- Gen. surprise > 0
- Rec. surprise < 0
[CHI ‘10]
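The status rules can be written down as a short sketch. It assumes X has status 0, that a positive link points toward the higher-status node and a negative link toward the lower-status node, and it uses hypothetical labels `"from_X"` (X links to the node) and `"to_X"` (the node links to X). A surprise of exactly zero is treated as not consistent here, which is a choice of this sketch.

```python
def relative_status(direction, sign):
    """Status (+1 or -1) of a node relative to X, whose status is 0."""
    s = 1 if sign > 0 else -1
    # X -> N with +: N is above X. N -> X with +: X is above N, so N is below X.
    return s if direction == "from_X" else -s

def status_consistent(gen_surprise, rec_surprise, status_a, status_b):
    """Return (generative_ok, receptive_ok) under the status rules above."""
    generative_ok = gen_surprise * status_b > 0   # same sign as status of B
    receptive_ok = rec_surprise * status_a < 0    # opposite sign from status of A
    return generative_ok, receptive_ok
```

For the example where X positively links to both A and B, both statuses are +1, so positive generative surprise and negative receptive surprise are status-consistent.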
Predictions:
[Table: columns Sg(ti), Sr(ti), Bg, Br, Sg, Sr for each triad type ti; the slide labels t2, t3, t14, t15, t16 under “Mistakes”] [CHI ‘10]
At a global level:
- Balance ⇒ Coalitions
- Status ⇒ Hierarchy
Observations:
- No evidence for global balance
beyond the random baselines
- Real data is 80% consistent vs. 80%
consistency under random baseline
- Evidence for global status beyond
the random baselines
- Real data is 80% consistent, but 50%
consistency under random baseline
Edge sign prediction problem
Given a network and
signs on all but one edge, predict the missing sign
Friend recommendation:
- Predicting whether you know
someone vs. Predicting what you think of them
[WWW ‘10]
Problem Formulation:
- Predict sign of edge (u,v)
Class label:
- +1: positive edge
- -1: negative edge
Learning method:
- Logistic regression
- Each feature “votes”
for/against a positive edge.
Dataset:
- Balanced: 50% +edges
Evaluation:
- Accuracy
Features for learning:
- Next slide
[WWW ‘10]
For each edge (A,B) create a set of features:
Triad counts of edge (A,B):
- In what types of triads does the edge (A,B) participate?
[WWW ‘10]
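Triad-count features for an edge can be sketched as follows. This is an illustration only, assuming directed signed edges stored as a dict `(u, v) -> +1 or -1`; each common neighbor W of A and B contributes one count to a feature keyed by the direction and sign of the A-W and B-W links. The key format and function names are hypothetical, and the resulting counts would then be fed to a classifier such as logistic regression (not shown).

```python
from collections import Counter

def links(edges, n, w):
    """All (direction, sign) links between n and w, seen from n's side."""
    found = []
    if (n, w) in edges:
        found.append(("out", edges[(n, w)]))
    if (w, n) in edges:
        found.append(("in", edges[(w, n)]))
    return found

def neighbors(edges, n):
    """Nodes linked to n in either direction."""
    return {u if v == n else v for (u, v) in edges if n in (u, v)}

def triad_features(edges, a, b):
    """Count the triad types that the edge (a, b) participates in."""
    feats = Counter()
    common = (neighbors(edges, a) & neighbors(edges, b)) - {a, b}
    for w in common:
        for dir_a, sign_a in links(edges, a, w):
            for dir_b, sign_b in links(edges, b, w):
                feats[(dir_a, sign_a, dir_b, sign_b)] += 1
    return feats
```

With two directions and two signs per link there are 16 possible keys, matching the 16 triad contexts.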
Classification Accuracy:
- Epinions: 93.5%
- Slashdot: 94.4%
- Wikipedia: 81%
Signs can be modeled from
local network structure alone!
- Trust propagation model of [Guha et al. ‘04] has 14% error on Epinions
Triad features perform less
well for less embedded edges
Wikipedia is harder to model:
- Votes are publicly visible
[Figure: results for Epinions, Slashdot and Wikipedia] [WWW ‘10]
Do people use these very different linking
systems by obeying the same principles?
- How generalizable are the results across the
datasets?
- Train on row “dataset”, predict on “column”
Nearly perfect generalization of the models
even though networks come from very different applications!
Suppose we are only interested in predicting
whether there is a positive edge or no edge
Does knowing negative edges help?
YES!
[WWW ‘10]
Signed networks provide insight into how social
computing systems are used:
- Status vs. Balance
- Role of embeddedness and public display
- More evidence that networks are globally organized based on status
Sign of relationship can be reliably
predicted from the local network context
- ~90% accuracy in predicting the sign of the edge
- People use signed edges consistently regardless of
particular application
- Near perfect generalization of models across datasets