SLIDE 1

CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University

http://cs224w.stanford.edu

SLIDE 2

10/18/16 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu 2

Observations
• Small diameter, Edge clustering
• Patterns of signed edge creation
• Viral Marketing, Blogosphere, Memetracking
• Scale-Free
• Densification power law, Shrinking diameters
• Strength of weak ties, Core-periphery

Models
• Erdős-Rényi model, Small-world model
• Structural balance, Theory of status
• Independent cascade model, Game theoretic model
• Preferential attachment, Copying model
• Microscopic model of evolving networks
• Kronecker Graphs

Algorithms
• Decentralized search
• Models for predicting edge signs
• Influence maximization, Outbreak detection, LIM
• PageRank, Hubs and authorities
• Link prediction, Supervised random walks
• Community detection: Girvan-Newman, Modularity

SLIDE 3

• Spreading through networks:
  - Cascading behavior
  - Diffusion of innovations
  - Network effects
  - Epidemics
• Behaviors that cascade from node to node like an epidemic
• Examples:
  - Biological:
    - Diseases via contagion
  - Technological:
    - Cascading failures
    - Spread of information
  - Social:
    - Rumors, news, new technology
    - Viral marketing

SLIDE 4


[Figure: a story spreading across outlets: Obscure tech story, Small tech blog, Wired, Slashdot, Engadget, CNN, NYT, BBC]

SLIDE 5

SLIDE 6

SLIDE 7

• Product adoption:
  - Senders and followers of recommendations

SLIDE 8

SLIDE 9

• Contagion that spreads over the edges of the network
• It creates a propagation tree, i.e., a cascade


[Figure: a network and the cascade (propagation graph) that spreads over it]

Terminology:
  • Stuff that spreads: Contagion
  • “Infection” event: Adoption, infection, activation
  • We have: Infected/active nodes, adopters
SLIDE 10

• Decision based models (today!):
  - Models of product adoption, decision making
  - A node observes decisions of its neighbors and makes its own decision
  - Example:
    - You join demonstrations if k of your friends do so too
• Probabilistic models (on Thursday):
  - Models of influence or disease spreading
  - An infected node tries to “push” the contagion to an uninfected node
  - Example:
    - You “catch” a disease with some probability from each active neighbor in the network

SLIDE 11
SLIDE 12

• Collective Action [Granovetter, ‘78]
  - Model where everyone sees everyone else’s behavior (that is, we assume a complete graph)
  - Examples:
    - Clapping or getting up and leaving in a theater
    - Keeping your money or not in a stock market
    - Neighborhoods in cities changing ethnic composition
    - Riots, protests, strikes
• How does the number of people participating in a given activity grow or shrink over time?


[Granovetter ‘78]

SLIDE 13

• n people; everyone observes all actions
• Each person i has a threshold $t_i$ ($0 \le t_i \le 1$)
  - Node i will adopt the behavior iff at least a $t_i$ fraction of people have already adopted:
    - Small $t_i$: early adopter
    - Large $t_i$: late adopter
  - Time moves in discrete steps
• The population is described by $\{t_1, \dots, t_n\}$
  - $F(x)$: fraction of people with threshold $t_i \le x$
  - $F(x)$ is given to us; it is a property of the contagion


[Figure: P(adoption) for node i as a function of the fraction of adopters, a step from 0 to 1 at $t_i$]

SLIDE 14

• $F(x)$: fraction of people with threshold $t_i \le x$
  - $F(x)$ is non-decreasing: $F(x + \varepsilon) \ge F(x)$ for $\varepsilon > 0$
• The model is dynamic:
  - Step-by-step change in the number of people adopting the behavior:
    - $F(x)$: fraction of people with threshold $\le x$
    - $s(t)$: fraction of people participating at time t
  - Simulate:
    - $s(0) = 0$
    - $s(1) = F(0)$
    - $s(2) = F(s(1)) = F(F(0))$


[Figure: F(x), the fraction of people with threshold ≤ x, plotted against threshold x together with the line y = x; F(0), s(0), and s(1) are marked]

SLIDE 15

• Step-by-step change in the number of people participating:
  - $F(x)$: fraction of people with threshold $\le x$
  - $s(t)$: fraction of people participating at time t
• Easy to simulate:
  - $s(0) = 0$
  - $s(1) = F(0)$
  - $s(2) = F(s(1)) = F(F(0))$
  - $s(t+1) = F(s(t)) = F^{t+1}(0)$, i.e., F applied t+1 times to 0
• Fixed point: $F(x) = x$
  - Updates to $s(t)$ converge to a stable fixed point
  - There could be other fixed points, but starting from 0 we only reach the first one
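The iteration is easy to try out. Below is a minimal Python sketch, assuming a finite population whose thresholds are given as exact fractions; `make_F` and `simulate` are hypothetical helper names, not part of the course material. It builds the empirical F(x) and applies s(t+1) = F(s(t)) from s(0) = 0 until F(s) = s.

```python
from fractions import Fraction


def make_F(thresholds):
    """Return F(x): the fraction of people whose threshold t_i is <= x."""
    n = len(thresholds)

    def F(x):
        return Fraction(sum(1 for t in thresholds if t <= x), n)

    return F


def simulate(thresholds, max_steps=10_000):
    """Iterate s(t+1) = F(s(t)) from s(0) = 0 and return [s(0), s(1), ...].

    The list stops at the first value with F(s) == s (a fixed point).
    """
    F = make_F(thresholds)
    s = Fraction(0)
    trajectory = [s]
    for _ in range(max_steps):
        nxt = F(s)
        if nxt == s:  # F(s) = s: no one else crosses their threshold
            break
        trajectory.append(nxt)
        s = nxt
    return trajectory


# Hypothetical population of 6 people (thresholds are fractions of the population).
thresholds = [Fraction(0), Fraction(1, 10), Fraction(2, 10),
              Fraction(5, 10), Fraction(6, 10), Fraction(9, 10)]
print([str(s) for s in simulate(thresholds)])  # ['0', '1/6', '1/3', '1/2', '2/3', '5/6']
```

Because F is non-decreasing and the process starts at 0, the trajectory never decreases, so the loop halts at the first fixed point above 0; the person with threshold 9/10 never joins.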


[Figure: iterating between the curve y = F(x) and the line y = x, starting from F(0), until the iteration reaches the fixed point]
SLIDE 16

• What if we start the process somewhere else?
  - We move up or down to the next fixed point
  - How is the market going to change?


[Figure: F(x) (fraction of population) and the line y = x plotted against threshold x]

Note: we are assuming a fully connected graph

SLIDE 17


[Figure: F(x) and y = x plotted against threshold x, with a stable fixed point and an unstable fixed point marked]
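To see stable and unstable fixed points, and what happens when the process starts somewhere other than 0, here is a minimal sketch using a hypothetical S-shaped F(x) = x - (x - 0.1)(x - 0.5)(x - 0.9), chosen only because it is non-decreasing on [0, 1] and has fixed points at 0.1, 0.5, and 0.9. A fixed point x* of the map s(t+1) = F(s(t)) is treated as stable when |F'(x*)| < 1, with the slope estimated by a central difference; `iterate` and `classify` are illustrative names.

```python
def F(x):
    # Hypothetical S-shaped F with fixed points at 0.1, 0.5 and 0.9.
    # It is non-decreasing on [0, 1], with F(0) = 0.045 and F(1) = 0.955.
    return x - (x - 0.1) * (x - 0.5) * (x - 0.9)


def iterate(F, s0, tol=1e-12, max_steps=100_000):
    """Apply s <- F(s) starting from s0 until the updates become negligible."""
    s = s0
    for _ in range(max_steps):
        nxt = F(s)
        if abs(nxt - s) < tol:
            return nxt
        s = nxt
    return s


def classify(F, x_star, h=1e-6):
    """A fixed point of s(t+1) = F(s(t)) is stable when |F'(x*)| < 1."""
    slope = (F(x_star + h) - F(x_star - h)) / (2 * h)
    return "stable" if abs(slope) < 1 else "unstable"


for x_star in (0.1, 0.5, 0.9):
    print(x_star, classify(F, x_star))
for s0 in (0.0, 0.4, 0.6):
    print(f"start at {s0}: converges to {iterate(F, s0):.4f}")
```

Here 0.1 and 0.9 are stable and 0.5 is unstable. Starting at 0 or 0.4 the process settles at 0.1; starting at 0.6 it climbs to 0.9, so the unstable point acts as a tipping point between the two outcomes.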

SLIDE 18

• Each threshold $t_i$ is drawn independently from some distribution F(x) = Pr[thresh ≤ x]
  - Suppose: truncated normal with µ=45 and standard deviation σ


[Figure: threshold distributions for small σ, Normal(45, 10), and large σ, Normal(45, 27)]

SLIDE 19


Bigger variance lets you build a bridge from early adopters to the mainstream

[Figure: F(x) for small σ, Normal(45, 10), with no cascades, and for medium σ, Normal(45, 27), with small cascades (the fixed point is low)]

SLIDE 20


But if we increase the variance further, the fixed point starts going down!

[Figure: F(x) for big σ, Normal(45, 33), with big cascades (the fixed point is high), and for huge σ, Normal(45, 50), where the fixed point gets lower]
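The effect of σ can be checked numerically. The following is a minimal sketch under two assumptions not stated on the slides: thresholds are measured in percent (0 to 100), and "truncated" means draws outside [0, 100] are clipped to the boundary, so a small mass of people has threshold 0 and F(0) > 0 (with a renormalized truncation F(0) would be exactly 0 and nothing would ever start). It iterates s(t+1) = 100 · F(s(t)) for the four σ values on the slides; `final_participation` is an illustrative name.

```python
import math

MU = 45.0  # mean threshold from the slides, read here as a percentage


def F(x, sigma, mu=MU):
    """Fraction (0..1) of people whose threshold is <= x percent.

    Assumption: thresholds ~ Normal(mu, sigma), with draws below 0 clipped to 0
    and draws above 100 clipped to 100 (so F(0) > 0 and F(100) = 1).
    """
    if x < 0:
        return 0.0
    if x >= 100:
        return 1.0
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def final_participation(sigma, tol=1e-9, max_steps=1_000_000):
    """Iterate s(t+1) = 100 * F(s(t)), with s in percent, starting from s(0) = 0."""
    s = 0.0
    for _ in range(max_steps):
        nxt = 100.0 * F(s, sigma)
        if abs(nxt - s) < tol:
            return nxt
        s = nxt
    return s


for sigma in (10, 27, 33, 50):
    print(f"sigma = {sigma:2d}: s converges to about {final_participation(sigma):.1f}%")
```

Under these assumptions the run follows the qualitative pattern on the slides: essentially no adoption for σ = 10, a small cascade (around 9%) for σ = 27, a large one (above 90%) for σ = 33, and a lower fixed point (around 67%) for σ = 50.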

SLIDE 21

• No notion of social network:
  - Some people are more influential
  - It matters who the early adopters are, not just how many
• Models people’s awareness of the size of participation, not just the actual number of people participating
  - Modeling perceptions of who is adopting the behavior vs. who you believe is adopting
  - Non-monotone behavior: dropping out if too many people adopt
  - People get “locked in” to a certain choice over a period of time
• Modeling thresholds
  - Richer distributions
  - Deriving thresholds from more basic assumptions
    - Game theoretic models


SLIDE 22

• Dictator tip: pluralistic ignorance, i.e., erroneous estimates about the prevalence of certain opinions in the population
  - A survey conducted in the U.S. in 1970 showed that while a clear minority of white Americans at that point favored racial segregation, significantly more than 50% believed that it was favored by a majority of white Americans in their region of the country

SLIDE 23
SLIDE 24

• Based on a 2-player coordination game
  - 2 players, each chooses technology A or B
  - Each person can only adopt one “behavior”, A or B
  - You gain more payoff if your friend has adopted the same behavior as you


[Morris 2000]

[Figure: local view of the network of node v]

SLIDE 25

SLIDE 26

SLIDE 27

• Payoff matrix:
  - If both v and w adopt behavior A, they each get payoff a > 0
  - If v and w adopt behavior B, they each get payoff b > 0
  - If v and w adopt opposite behaviors, they each get 0
• In some large network:
  - Each node v is playing a copy of the game with each of its neighbors
  - Payoff: sum of the node’s payoffs over all the games it plays

SLIDE 28


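• Let v have d neighbors
• Assume a fraction p of v’s neighbors adopt A
  - $\text{Payoff}_v = a \cdot p \cdot d$ if v chooses A
  - $\text{Payoff}_v = b \cdot (1-p) \cdot d$ if v chooses B
• Thus v chooses A if $a \cdot p \cdot d > b \cdot (1-p) \cdot d$, i.e., if

$p > \frac{b}{a+b} = q$

Threshold: v chooses A if p > q, where p is the fraction of v’s neighbors with A and q is the payoff threshold.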
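The threshold rule can be checked against the raw payoffs. A minimal sketch, assuming integer or rational payoffs a and b so that the comparison is exact with `fractions.Fraction`; `payoff` and `chooses_A` are illustrative names. It confirms that comparing the two totals gives the same decision as p > q with q = b/(a+b).

```python
from fractions import Fraction


def payoff(a, b, p, d, choice):
    """Total payoff of node v with d neighbors, a fraction p of which use A."""
    if choice == "A":
        return a * p * d          # earns a in each game against an A-neighbor
    return b * (1 - p) * d        # earns b in each game against a B-neighbor


def chooses_A(a, b, p):
    """v adopts A iff p > q, with payoff threshold q = b / (a + b)."""
    q = Fraction(b) / (Fraction(a) + Fraction(b))
    return p > q


a, b, d = 3, 2, 10                # illustrative payoffs; here q = 2/5
for k in range(d + 1):
    p = Fraction(k, d)
    # The threshold rule and the direct payoff comparison always agree.
    assert chooses_A(a, b, p) == (payoff(a, b, p, d, "A") > payoff(a, b, p, d, "B"))
print(chooses_A(a, b, Fraction(2, 5)), chooses_A(a, b, Fraction(1, 2)))  # False True
```

Because the rule uses a strict inequality, a node sitting exactly at p = q keeps B.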

SLIDE 29

Scenario:

• Graph where everyone starts with B
• Small set S of early adopters of A
  - Hard-wire S: they keep using A no matter what the payoffs tell them to do
• Assume payoffs are set in such a way that nodes say: “If more than 50% of my friends take A, I’ll also take A”

This means: a = b-ε (ε>0, small positive constant) and q>1/2

SLIDE 30


If more than q=50% of my friends are red, I’ll be red

S = {u, v}

[Figure: example network in which the seed nodes S = {u, v} start out red (using A)]

SLIDE 36

• Observation: use of A spreads monotonically (nodes only switch B→A, but never back to B)
• Why? Proof sketch:
  - Nodes keep switching from B to A: B→A
  - Now suppose some node switched back from A to B; consider the first node u (not in S) to do so (say at time t)
  - Earlier, at some time t’ (t’ < t), the same node u switched B→A
  - So at time t’, u was above the threshold for A
  - But up to time t no node switched back to B, so at time t node u could only have more neighbors using A than at time t’. There was no reason for u to switch back in the first place!


!! Contradiction !!

[Figure: example network with nodes 1 to 6 and node u]

SLIDE 37

• Consider an infinite graph G
  - (but each node has a finite number of neighbors!)
• We say that a finite set S causes a cascade in G with threshold q if, when S adopts A, eventually every node in G adopts A
• Example: path


$q = \frac{b}{a+b}$: v chooses A if p > q, where p is the fraction of v’s neighbors with A and q is the payoff threshold

[Figure: infinite path with seed set S]

If q < 1/2 then a cascade occurs

SLIDE 38


• Infinite tree: if q < 1/3 then a cascade occurs
• Infinite grid: if q < 1/4 then a cascade occurs

[Figure: infinite tree and infinite grid, each with a seed set S]
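The path and grid claims can be tried on finite stand-ins for the infinite graphs. A minimal sketch, assuming synchronous updates in which the seed set S stays hard-wired to A and every other node uses A in the next round iff strictly more than a q fraction of its neighbors currently use A. Finite graphs have boundary nodes with fewer neighbors, so this only illustrates the infinite case; `simulate`, `path`, and `grid` are hypothetical helpers.

```python
from fractions import Fraction


def simulate(adj, seeds, q, max_rounds=10_000):
    """Synchronous threshold cascade; the seed set is hard-wired to A.

    A non-seed node uses A in the next round iff strictly more than a q
    fraction of its neighbors use A now. Returns the A-set of every round.
    """
    seeds = frozenset(seeds)
    active = seeds
    history = [active]
    for _ in range(max_rounds):
        nxt = set(seeds)
        for v, nbrs in adj.items():
            if v not in seeds and nbrs:
                if Fraction(sum(u in active for u in nbrs), len(nbrs)) > q:
                    nxt.add(v)
        nxt = frozenset(nxt)
        if nxt == active:
            break
        active = nxt
        history.append(active)
    return history


def path(n):
    """Finite path 0 - 1 - ... - (n-1), as an adjacency dict."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def grid(n):
    """Finite n x n grid in which each node has up to 4 neighbors."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {(r, c): [(r + dr, c + dc) for dr, dc in steps
                     if 0 <= r + dr < n and 0 <= c + dc < n]
            for r in range(n) for c in range(n)}


# Path: a single seed in the middle spreads for q = 2/5 but not for q = 1/2.
print(len(simulate(path(11), {5}, Fraction(2, 5))[-1]))     # 11
print(len(simulate(path(11), {5}, Fraction(1, 2))[-1]))     # 1
# Grid: a single central seed spreads for q = 1/5 but not for q = 1/4.
print(len(simulate(grid(5), {(2, 2)}, Fraction(1, 5))[-1]))  # 25
print(len(simulate(grid(5), {(2, 2)}, Fraction(1, 4))[-1]))  # 1
```

Keeping the A-set of every round also makes the monotonicity observation visible: in `simulate(path(11), {5}, Fraction(2, 5))` each round's set contains the previous one.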

SLIDE 39

• Def:
  - The cascade capacity of a graph G is the largest q for which some finite set S can cause a cascade
• Fact:
  - There is no (infinite) G where cascade capacity > ½
• Proof idea:
  - Suppose such G exists: q > ½ and a finite S causes a cascade
  - Show a contradiction: argue that nodes stop switching after a finite number of steps

SLIDE 40

• Fact: There is no G where cascade capacity > ½
• Proof sketch:
  - Suppose such G exists: q > ½ and a finite S causes a cascade
  - Contradiction: switching stops after a finite number of steps
    - Define a “potential energy”
    - Argue that it starts finite (non-negative) and strictly decreases at every step
    - “Energy” := |d_out(X)|, the number of outgoing edges of the active set X
    - The only nodes that switch have a strict majority of their neighbors in X (since q > ½), so |d_out(X)| strictly decreases: when a node with d neighbors, k > d/2 of them in X, joins X, |d_out(X)| changes by (d - k) - k < 0
    - It can do so only for a finite number of steps
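The potential-energy argument can be traced on a small example. A minimal sketch on a hypothetical four-node graph (edges 0-2, 1-2, 2-3) with seeds {0, 1} and q = 0.6 > ½. Nodes are activated one at a time, an assumption made so that each switch shows up as its own step; `d_out` and `activate_one_at_a_time` are illustrative names. The sketch records |d_out(X)| after every switch.

```python
def d_out(adj, X):
    """|d_out(X)|: number of edges with exactly one endpoint in X."""
    return sum(1 for v in X for u in adj[v] if u not in X)


def activate_one_at_a_time(adj, seeds, q):
    """Switch one eligible node per step; record |d_out(X)| after every step.

    A node outside X is eligible when strictly more than a q fraction of its
    neighbors are in X. With q > 1/2 this is a strict majority.
    """
    X = set(seeds)
    energy = [d_out(adj, X)]
    while True:
        eligible = [v for v in adj
                    if v not in X and adj[v]
                    and sum(u in X for u in adj[v]) > q * len(adj[v])]
        if not eligible:
            return X, energy
        X.add(eligible[0])          # any eligible node would do
        energy.append(d_out(adj, X))


# Hypothetical graph: edges 0-2, 1-2, 2-3; seeds {0, 1}; q = 0.6.
adj = {0: [2], 1: [2], 2: [0, 1, 3], 3: [2]}
X, energy = activate_one_at_a_time(adj, {0, 1}, 0.6)
print(sorted(X), energy)  # [0, 1, 2, 3] [2, 1, 0]
```

Node 2 joins with k = 2 of its d = 3 neighbors in X, so the energy drops by 2k - d = 1; node 3 then joins with k = d = 1 and the energy drops by 1 again.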

SLIDE 41

• What prevents cascades from spreading?
• Def: A cluster of density ρ is a set of nodes C where each node in the set has at least a ρ fraction of its edges in C


[Figure: example clusters of density ρ=3/5 and ρ=2/3]

SLIDE 42

• Let S be an initial set of adopters of A
• All nodes apply threshold q to decide whether to switch to A
• Two facts:
  1) If G\S contains a cluster of density > (1-q), then S cannot cause a cascade
  2) If S fails to create a cascade, then there is a cluster of density > (1-q) in G\S


[Figure: seed set S next to a cluster of density ρ=3/5: no cascade if q > 2/5]
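Here is a small worked case of Fact 1. A minimal sketch on a hypothetical graph: a triangle {a, b, c} plus a seed node x adjacent to all three. Each triangle node has 2 of its 3 edges inside the triangle, so the triangle is a cluster of density 2/3, and Fact 1 says it blocks the cascade when 2/3 > 1 - q, i.e., when q > 1/3. The sketch uses synchronous updates with the seed hard-wired to A; `density` and `final_adopters` are illustrative names.

```python
from fractions import Fraction


def density(adj, C):
    """Largest rho such that every node in C has >= rho of its edges inside C."""
    return min(Fraction(sum(u in C for u in adj[v]), len(adj[v])) for v in C)


def final_adopters(adj, seeds, q):
    """Synchronous threshold dynamics with the seeds hard-wired to A."""
    active = frozenset(seeds)
    while True:
        nxt = frozenset(seeds) | {
            v for v in adj
            if Fraction(sum(u in active for u in adj[v]), len(adj[v])) > q}
        if nxt == active:
            return active
        active = nxt


# Hypothetical graph: triangle a-b-c, plus seed x adjacent to a, b and c.
adj = {"x": ["a", "b", "c"], "a": ["b", "c", "x"],
       "b": ["a", "c", "x"], "c": ["a", "b", "x"]}
C = {"a", "b", "c"}
print(density(adj, C))                             # 2/3
for q in (Fraction(2, 5), Fraction(3, 10)):
    blocked = density(adj, C) > 1 - q              # Fact 1's condition
    print(q, blocked, sorted(final_adopters(adj, {"x"}, q)))
```

With q = 2/5 the condition holds and only x ever uses A; with q = 3/10 the triangle is no longer dense enough and all four nodes end up with A.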

SLIDE 43
SLIDE 44

• So far:
  - Behaviors A and B compete
  - Nodes only get utility from neighbors with the same behavior: A-A get a, B-B get b, A-B get 0
• Let’s add an extra strategy “AB”
  - AB-A: gets a
  - AB-B: gets b
  - AB-AB: gets max(a, b)
  - Also: a cost c for the effort of maintaining both strategies, subtracted once from the node’s total payoff (summed over all its interactions)
  - Note: by playing AB, a node can receive a from one neighbor and b from another, which is why it could be worth the cost c

SLIDE 45

• Every node in an infinite network starts with B
• Then a finite set S initially adopts A
• Run the model for t = 1, 2, 3, …
  - Each node selects the behavior that will optimize its payoff (given what its neighbors did at time t-1)
• How will nodes switch from B to A or AB?


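[Figure: payoffs for strategies A, B, and AB: A-A gives a, B-B gives b, AB-A gives a, AB-B gives b, AB-AB gives max(a, b), and AB pays the cost c]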
  • c
SLIDE 46

• Path graph: start with Bs, a > b (A is better)
• One node switches to A. What happens?
  - With just A and B: A spreads if a > b
  - With A, B, AB: does A spread?
• Example: a=3, b=2, c=1


[Figure: path graph with edge payoffs a=3 and b=2; the B node next to the A nodes switches to AB (payoff reduced by c=1), and then the cascade stops]

SLIDE 47

• Example: a=5, b=3, c=1


[Figure: path graph with edge payoffs a=5 and b=3; B nodes at the frontier switch to AB (payoff reduced by c=1), the AB nodes behind them switch to A, and the cascade never stops!]
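The two path examples can be replayed in code. A minimal sketch, assuming a finite path with node 0 hard-wired to A, all other nodes starting with B, synchronous best responses to the neighbors' strategies at t-1, the cost c charged once per AB node, and ties broken in favor of the node's current strategy (an assumption, not stated on the slides). `game`, `best_response`, and `run_path` are hypothetical names.

```python
def game(s, t, a, b):
    """Payoff to a node playing s against one neighbor playing t (cost c excluded)."""
    if s == "AB":
        return {"A": a, "B": b, "AB": max(a, b)}[t]
    if t == s or t == "AB":
        return a if s == "A" else b
    return 0


def best_response(current, nbr_strategies, a, b, c):
    """Best strategy given the neighbors' strategies; AB pays c once."""
    scores = {s: sum(game(s, t, a, b) for t in nbr_strategies) - (c if s == "AB" else 0)
              for s in ("A", "B", "AB")}
    best = max(scores.values())
    if scores[current] == best:   # assumption: ties keep the current strategy
        return current
    return max(scores, key=scores.get)


def run_path(n, a, b, c, max_rounds=1_000):
    """Synchronous best responses on a path; node 0 is hard-wired to A."""
    state = ["A"] + ["B"] * (n - 1)
    for _ in range(max_rounds):
        nxt = ["A"] + [
            best_response(state[i],
                          [state[j] for j in (i - 1, i + 1) if 0 <= j < n],
                          a, b, c)
            for i in range(1, n)
        ]
        if nxt == state:
            break
        state = nxt
    return state


print(run_path(6, a=3, b=2, c=1))  # ['A', 'AB', 'B', 'B', 'B', 'B']
print(run_path(6, a=5, b=3, c=1))  # ['A', 'A', 'A', 'A', 'A', 'A']
```

With a=3, b=2, c=1 the node next to A moves to AB (payoff 3 + 2 - 1 = 4), and the next node is indifferent between B and AB (4 each), so under the tie rule it keeps B and the frontier stops. With a=5, b=3, c=1, AB (payoff 7) beats B (6) at the frontier, and A (10) beats AB (9) behind it, so the whole path ends up with A.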

SLIDE 48

• Let’s solve the model in a general case:
  - Infinite path, start with all Bs
  - Payoffs for w (with one A neighbor and one B neighbor, and b normalized to 1): A: a, B: 1, AB: a+1-c
• For what pairs (c, a) does A spread?
  - We need to analyze two cases for node w: based on the values of a and c, what would w do?


[Figure: the two cases for node w: A-w-B and AB-w-B]

SLIDE 49

• Infinite path, start with Bs
• Payoffs for w: A: a, B: 1, AB: a+1-c
• What does node w in A-w-B do?


[Figure: (a, c) plane for node w in A-w-B, split by the lines a = 1 (B vs A), a+1-c = 1 (AB vs B), and a+1-c = a (AB vs A) into regions where w chooses B, A, or AB]

SLIDE 50

• Infinite path, start with Bs
• Payoffs for w: A: a, B: 1, AB: a+1-c
• What does node w in A-w-B do?


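[Figure: the same (a, c) plane for w in A-w-B, with its regions labeled B, A, and AB]

Comparing the payoffs pairwise:
  - B vs A: w prefers A iff a > 1
  - AB vs B: w prefers AB iff a+1-c > 1, i.e., c < a
  - AB vs A: w prefers AB iff a+1-c > a, i.e., c < 1

So w stays with B if a < 1 and c > a; w chooses A if a > 1 and c > 1; and AB is optimal for w if c < 1 and c < a.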

SLIDE 51

• Same reward structure as before, but now the payoffs for w change: A: a, B: 1+1, AB: a+1-c
• Notice: now AB also spreads
• What does node w in AB-w-B do?


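[Figure: (a, c) plane for node w in AB-w-B, split by the comparisons B vs A, AB vs A, and AB vs B into regions where w chooses B, A, or AB; axis marks at 1 and 2]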

SLIDE 52

• Same reward structure as before, but now the payoffs for w change: A: a, B: 1+1, AB: a+1-c
• Notice: now AB also spreads
• What does node w in AB-w-B do?


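[Figure: the same (a, c) plane for w in AB-w-B, with its regions labeled B, A, and AB]

Comparing the payoffs pairwise (taking the AB-AB payoff max(a, 1) = a, as the formula AB: a+1-c does):
  - B vs A: w prefers A iff a > 2
  - AB vs B: w prefers AB iff a+1-c > 2, i.e., c < a-1
  - AB vs A: w prefers AB iff a+1-c > a, i.e., c < 1

So w stays with B if a < 2 and c > a-1; w chooses A if a > 2 and c > 1; and AB is optimal for w if c < 1 and c < a-1.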

SLIDE 53

• Joining the two pictures:


[Figure: the two (a, c) diagrams overlaid, with regions labeled B, AB, B→AB→A, and A; axis marks at 1 and 2]
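The case analysis can be evaluated for any (a, c). A minimal sketch with b = 1, as on the slides; `responses` (a hypothetical name) returns w's best response in the three local configurations that matter on the path: A-w-B, AB-w-B, and A-w-AB (whether A takes over behind an AB frontier). Ties are broken by the order A, B, AB, an arbitrary choice.

```python
def best_response(nbrs, a, c, b=1.0):
    """w's best strategy against its neighbors' strategies; AB pays c once.

    Ties are broken by the order A, B, AB (an arbitrary choice).
    """
    def game(s, t):
        if s == "AB":
            return {"A": a, "B": b, "AB": max(a, b)}[t]
        if t == s or t == "AB":
            return a if s == "A" else b
        return 0.0

    scores = {s: sum(game(s, t) for t in nbrs) - (c if s == "AB" else 0.0)
              for s in ("A", "B", "AB")}
    return max(scores, key=scores.get)


def responses(a, c):
    """Best response of w in the three local configurations on the path (b = 1)."""
    return {
        "A-w-B": best_response(["A", "B"], a, c),
        "AB-w-B": best_response(["AB", "B"], a, c),
        "A-w-AB": best_response(["A", "AB"], a, c),
    }


for a, c in [(0.5, 2.0), (3.0, 2.0), (1.5, 0.2), (0.8, 0.2)]:
    print((a, c), responses(a, c))
```

For (1.5, 0.2), AB first moves into B territory and A then replaces it behind the frontier; for (0.8, 0.2), the B node next to A switches to AB but the next one keeps B. The A-w-B results match the pairwise comparisons in the case analysis; for a < 1 the AB-w-B payoff a+1-c no longer applies because max(a, 1) = 1, which the code handles through `max(a, b)`.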

SLIDE 54

• B is the default throughout the network until a new/better A comes along. What happens?
  - Infiltration: if B is too compatible, then people will take on both and then drop the worse one (B)
  - Direct conquest: if A makes itself not compatible, people on the border must choose. They pick the better one (A)
  - Buffer zone: if you choose an optimal level, then you keep a static “buffer” between A and B


[Figure: the (a, c) plane divided into regions: B stays, B→AB, B→AB→A, and A spreads (B → A)]

SLIDE 55

SLIDE 56

• Terms:
  - Course project use only
  - Delete the data after the course is over
  - Non-commercial research use only
  - Don’t share the data
• See the Piazza post about how to get access to the data

SLIDE 57

Pinterest

[Figure: Pinterest overview (labels: Internet, Offsite, Save, Do)]

SLIDE 58
SLIDE 59

[Figure: a pin and a board]

SLIDE 60
SLIDE 61

• Pinner: user_id
  - Total users: 10,389,475
  - TSV file size: 123M
• Food Boards: board_id, board_name, board_description, user_id, board_create_time
  - Total number of boards: 12,466,754
  - The users in this table are the creators of the boards
  - TSV file size: 493M
• Pins: board_id, pin_title, pin_create_time
  - Total number of entries (rows): 736,030,700
  - Total number of pins (# of distinct pin_ids): 20,049,957 (the same pin can belong to multiple boards)
  - TSV file size: 19G
• Follows: board_id, user_id, created_date
  - Total number of entries (rows): 48,309,378
  - Number of followers (# of distinct user_ids): 14,097,700
  - TSV file size: 1.3G
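The files are large (the pins table alone is 19G), so they are best processed as a stream rather than loaded whole. A minimal Python sketch for the follows table, assuming tab-separated rows with no header and the columns in the order listed above (board_id, user_id, created_date); the file name `follows.tsv` in the comment and the helper `scan_follows` are hypothetical. Only the set of distinct user_ids is kept in memory.

```python
import csv
import io


def scan_follows(f):
    """Stream a follows table and return (row count, distinct user_id count).

    Assumptions: tab-separated, no header row, columns in the order
    board_id, user_id, created_date.
    """
    rows = 0
    users = set()
    for record in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
        rows += 1
        users.add(record[1])  # user_id column
    return rows, len(users)


# Tiny in-memory stand-in for the real file.
sample = io.StringIO("b1\tu1\t2012-01-01\nb2\tu1\t2012-01-02\nb1\tu2\t2012-01-03\n")
print(scan_follows(sample))  # (3, 2)

# For the real data (hypothetical file name):
# with open("follows.tsv", newline="", encoding="utf-8") as f:
#     print(scan_follows(f))
```

`csv.QUOTE_NONE` keeps stray quote characters (for example in free-text fields such as pin titles) from being interpreted as field delimiters.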
