

SLIDE 1

CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University
http://cs224w.stanford.edu

SLIDE 2

[Liben-Nowell & Kleinberg '03]

 Link prediction task:
  • Given G[t0, t0'], a graph on edges up to time t0'
  • Output a ranked list L of links (not in G[t0, t0']) that are predicted to appear in G[t1, t1']

 Evaluation:
  • n = |Enew|: # of new edges that appear during the test period [t1, t1']
  • Take the top n elements of L and count the correct edges

12/01/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu
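The evaluation protocol above fits in a few lines. A minimal sketch, assuming we already have the ranked list L and the set of test-period edges Enew (the names `ranked_links` and `new_edges` are illustrative):

```python
def precision_at_n(ranked_links, new_edges):
    """Take the top n = |Enew| elements of the ranked list L and
    count how many actually appeared during the test period [t1, t1']."""
    n = len(new_edges)
    top = ranked_links[:n]
    correct = sum(1 for edge in top if edge in new_edges)
    return correct / n
```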


SLIDE 4

[Liben-Nowell & Kleinberg '03]

 Predict links in an evolving collaboration network
 Core: since the network data is very sparse
  • Consider only nodes with in-degree and out-degree of at least 3

SLIDE 5

[Liben-Nowell & Kleinberg '03]

 For every pair of nodes (x,y) compute a score c(x,y)
   Γ(x) … neighborhood of node x
 Sort the pairs by score and predict the top n pairs as new links
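As an illustrative sketch (one of several scores from the paper), the common-neighbors score |Γ(x) ∩ Γ(y)| can be computed and ranked like this, with the hypothetical `adj` mapping each node to its neighbor set Γ(x):

```python
import itertools

def predict_links(adj, n):
    """Score every unlinked pair (x, y) by the number of common
    neighbors |Gamma(x) & Gamma(y)| and return the top n pairs."""
    scores = {}
    for x, y in itertools.combinations(adj, 2):
        if y in adj[x]:
            continue  # pair is already an edge in G[t0, t0']
        scores[(x, y)] = len(adj[x] & adj[y])
    return sorted(scores, key=scores.get, reverse=True)[:n]
```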

SLIDE 6

[Liben-Nowell & Kleinberg '03]

 Rank potential links (x,y) based on:
   Γ(x) … neighborhood of node x

SLIDE 7

SLIDE 8

[Liben-Nowell & Kleinberg '03]

SLIDE 9

 Improvement over #common neighbors

SLIDE 10

 Recommend a list of possible friends
 Supervised machine learning setting:
  • Training example:
   • For every node s we have a list of nodes she will create links to {v1, …, vk}
  • Problem:
   • Learn a model that will, for a given node s, rank nodes {v1, …, vk} higher than other nodes in the network
 How to combine node/edge attributes and network structure?
  • Let's learn how to bias random walks!

SLIDE 11

[WSDM '11]

 Let s be the center node
 Let fw(u,v) be a function that assigns a strength to each edge:

   auv = fw(u,v) = exp(−w·Ψuv)

  • Ψuv is a feature vector:
   • Features of node u
   • Features of node v
   • Features of edge (u,v)
  • w is the parameter vector we want to learn
 Do a random walk from s where transitions are according to edge strengths
 How to learn fw(u,v)?

(figure: random walk from center node s over nodes v1, v2, v3, marked as positive and negative examples)
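The edge-strength function auv = fw(u,v) = exp(−w·Ψuv) is direct to write down. A minimal sketch (the real system learns w; here it is just an input):

```python
import math

def edge_strength(w, psi):
    """a_uv = f_w(u, v) = exp(-w . Psi_uv): w is the parameter vector,
    Psi_uv the feature vector of edge (u, v)."""
    return math.exp(-sum(wi * fi for wi, fi in zip(w, psi)))
```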

SLIDE 12

[WSDM '11]

 Random walk transition matrix:
   Q'uv = auv / Σk auk
 PageRank transition matrix:
  • with prob. α jump back to s:
   Quv = (1 − α)·Q'uv + α·1{v = s}
 Compute PageRank vector: pᵀ = pᵀQ
 Rank nodes by pu
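A minimal power-iteration sketch of this biased walk, assuming every node has at least one out-edge; the hypothetical `strengths[u][v]` holds auv and α is the restart probability:

```python
def pagerank_with_restart(strengths, s, alpha=0.15, iters=100):
    """Row-normalize edge strengths a_uv into a random-walk matrix Q',
    mix in an alpha-probability jump back to s, and power-iterate
    p = p^T Q.  strengths: {u: {v: a_uv}}; returns {node: p_u}."""
    nodes = list(strengths)
    p = dict.fromkeys(nodes, 1.0 / len(nodes))
    for _ in range(iters):
        nxt = dict.fromkeys(nodes, 0.0)
        for u in nodes:
            total = sum(strengths[u].values())
            for v, a in strengths[u].items():
                nxt[v] += (1 - alpha) * p[u] * (a / total)
        nxt[s] += alpha  # restart mass (p sums to 1)
        p = nxt
    return p
```

Ranking candidate nodes by their score p_u then gives the recommendation list.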

SLIDE 13

[WSDM '11]

 Each node u has a score pu
 Destination nodes D = {v1, …, vk}
 No-link nodes L = {the rest}
 What do we want? pd > pl for every d ∈ D and l ∈ L
 Hard constraints; make them soft

SLIDE 14

[WSDM '11]

 Want to minimize:
   F(w) = Σd∈D, l∈L h(pl − pd)
  • Loss: h(x) = 0 if x < 0, x² else
 How to minimize F?
  • pl and pd depend on w:
   • Given w, assign edge weights auv = fw(u,v)
   • Using the transition matrix Q = [auv], compute PageRank scores pu
   • Want to set w such that pl < pd
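Under these definitions the soft objective can be sketched as follows, assuming the PageRank scores p have already been computed for the current w:

```python
def h(x):
    """Soft penalty: 0 when the constraint p_l < p_d holds (x < 0),
    quadratic x^2 otherwise."""
    return 0.0 if x < 0 else x * x

def soft_loss(p, D, L):
    """Sum of h(p_l - p_d) over destination nodes D and no-link
    nodes L; minimizing this over w pushes p_d above p_l."""
    return sum(h(p[l] - p[d]) for d in D for l in L)
```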

SLIDE 15

[WSDM '11]

 How to minimize F?
  • Take the derivative!
 We know: pᵀ = pᵀQ, i.e. pu = Σj pj Qju
 So: ∂pu/∂w = Σj (Qju ∂pj/∂w + pj ∂Qju/∂w)
 Looks like the PageRank equation!

SLIDE 16

[WSDM '11]

 Iceland Facebook network:
  • 174,000 nodes (55% of population)
  • Avg. degree 168
  • Avg. person added 26 new friends/month
 For every node s:
  • Positive examples:
   • D = { new friendships of s in Nov '09 }
  • Negative examples:
   • L = { other nodes s did not create new links to }

SLIDE 17

 Node and edge features for learning:
  • Node:
   • Age
   • Gender
   • Degree
  • Edge:
   • Age of an edge
   • Communication
   • Profile visits
   • Co-tagged photos
 Baselines:
  • Decision trees and logistic regression:
   • Above features + 10 network features (PageRank, common friends)
 Evaluation:
  • AUC and precision at Top20

SLIDE 18

 Facebook: predicting future friends

SLIDE 19

 Arxiv Hep-Ph collaboration network

SLIDE 20

 Results:
  • 2.3x improvement over the previous FB-PYMK system
 How to scale to FB size?
  • FB network:
   • >500 million people, >65 billion edges
  • 40 machines, each with 72GB of RAM (total 2.8TB)
  • System makes 8.6 million suggestions per second

SLIDE 21

 Many social or information networks are implicit or hard to observe:
  • Hidden/hard-to-reach populations:
   • Network of needle sharing between drug injection users
  • Implicit connections:
   • Network of information propagation in online news media
 But we can observe results of the processes taking place on such (invisible) networks:
  • Virus propagation:
   • Drug users get sick, and we observe when they see the doctor
  • Information networks:
   • We observe when media sites mention information
 Question: Can we infer the hidden networks?

SLIDE 22

 There is a directed social network over which diffusions take place:
 (figure: example network on nodes a, b, c, d, e)
 But we do not observe the edges of the network
 We only see the time when a node gets infected:
  • Cascade c1: (a, 1), (c, 2), (b, 6), (e, 9)
  • Cascade c2: (c, 1), (a, 4), (b, 5), (d, 8)
 Task: inferring the underlying network

SLIDE 23

 Virus propagation:
  • Viruses propagate through the network
  • We only observe when people get sick
  • But NOT who infected whom
 Word of mouth & viral marketing:
  • Recommendations and influence propagate
  • We observe when people buy products
  • But NOT who influenced whom
 The process is hidden
 Can we infer the underlying network?

SLIDE 24

 Continuous-time cascade diffusion model:
  • Cascade c reaches node u at time tu and spreads to u's neighbors:
  • With probability β the cascade propagates along edge (u, v), and we determine the infection time of node v:
    tv = tu + Δ,   e.g. Δ ~ Exponential or Power-law
 (figure: infections at times tu, tb, tc with delays Δ1, Δ2)
 We assume each node v has only one parent!
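A toy simulation of this generative model helps make it concrete. This sketch assumes exponential delays and lets each node be infected at most once, so it has exactly one parent (the first neighbor whose coin flip succeeds):

```python
import random

def simulate_cascade(graph, source, beta=0.6, rate=1.0):
    """With probability beta the cascade crosses edge (u, v); the delay
    is Delta ~ Exponential(rate), so t_v = t_u + Delta.
    graph: {u: [out-neighbors]}; returns {node: infection time}."""
    times = {source: 0.0}
    queue = [source]
    while queue:
        u = queue.pop(0)
        for v in graph.get(u, []):
            if v not in times and random.random() < beta:
                times[v] = times[u] + random.expovariate(rate)
                queue.append(v)
    return times
```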

SLIDE 25

 Probability that cascade c propagates from node u to node v:
   Pc(u, v) ∝ P(tv − tu), with tv > tu
 Since not all nodes get infected by the diffusion process, we introduce the external influence node m: Pc(m, v) = ε
 Prob. that cascade c propagates in a tree pattern T:
   P(c | T) = Π(u,v)∈T Pc(u, v)
 (figure: external node m with ε-edges into the network)
 Tree pattern T on cascade c: (a, 1), (b, 2), (c, 4), (e, 8)

SLIDE 26

 There are many possible propagation trees that are consistent with the observed data:
  • c: (a, 1), (c, 2), (b, 3), (e, 4)
 (figure: several propagation trees over nodes a, b, c, d, e consistent with cascade c)
 Need to consider all possible propagation trees T supported by the graph G:
   P(c | G) = ΣT P(c | T)
 Likelihood of a set of cascades C:
   P(C | G) = Πc∈C P(c | G)
 Want to find a graph: G* = argmaxG P(C | G)

SLIDE 27

 We consider only the most likely tree:
 Maximum log-likelihood for a cascade c under a graph G:
   Fc(G) = maxT log P(c | T)
 Log-likelihood of G given a set of cascades C:
   FC(G) = Σc∈C Fc(G)

SLIDE 28

 Given a cascade c, what is the most likely propagation tree?
   T* = argmaxT Σ(u,v)∈T log Pc(u, v)
 A maximum directed spanning tree (MDST):
  • The sub-graph of G induced by the nodes in the cascade c is a DAG
   • Because edges point forward in time
  • For each node, just pick an in-edge of max weight
 Greedy parent selection for each node gives the globally optimal tree!
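Because edges point forward in time, the per-node greedy choice above is globally optimal. A sketch, where `log_pc(u, v)` is an illustrative stand-in for the log edge weight:

```python
def best_tree(times, log_pc):
    """Maximum directed spanning tree of a cascade: for each infected
    node pick the max-weight in-edge among earlier-infected nodes.
    times: {node: infection time}; log_pc(u, v): log P_c(u, v).
    Returns {node: parent}; the earliest node has no parent."""
    parent = {}
    for v, tv in times.items():
        earlier = [u for u, tu in times.items() if tu < tv]
        if earlier:
            parent[v] = max(earlier, key=lambda u: log_pc(u, v))
    return parent
```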

SLIDE 29

 Theorem:
   Log-likelihood Fc(G) of cascade c is monotonic, and submodular in the edges of the graph G:
   Fc(A ∪ {e}) − Fc(A) ≥ Fc(B ∪ {e}) − Fc(B)   for A ⊆ B ⊆ V×V
  • Gain of adding an edge e to a "small" graph A is at least the gain of adding it to a "large" graph B
 Log-likelihood FC(G) is a sum of submodular functions, so it is submodular too

SLIDE 30

 Use greedy hill-climbing to maximize FC(G):
  • For i = 1…k:
   • At every step, pick the edge that maximizes the marginal improvement
 (figure: marginal gains of candidate edges at each greedy step)
 Benefits:
  • 1. Approximation guarantee (≈ 0.63 of OPT)
  • 2. Tight on-line bounds on the solution quality
  • 3. Speed-ups:
   • Lazy evaluation (by submodularity)
   • Localized update (by the structure of the problem)
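The greedy loop itself is a few lines. A sketch over an abstract objective `F` (lazy evaluation and the on-line bounds are omitted; the test below uses a simple additive F only to exercise the mechanics):

```python
def greedy_edges(candidates, F, k):
    """Hill-climbing for a monotone submodular F: start from the empty
    edge set and k times add the edge with the largest marginal gain,
    which guarantees ~0.63 of the optimal value.
    candidates: set of edges; F: set-of-edges -> score."""
    chosen = set()
    for _ in range(k):
        base = F(chosen)
        best = max(candidates - chosen, key=lambda e: F(chosen | {e}) - base)
        chosen.add(best)
    return chosen
```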

SLIDE 31

 We validate our method on:
  • Synthetic data:
   • Generate a graph G on k edges
   • Generate cascades, record node infection times, reconstruct G
  • Real data:
   • MemeTracker: 172M news articles, Aug '08 – Sept '09, 343M textual phrases (quotes)
 Questions:
  • How many edges of G can we find?
   • Precision-recall, break-even point
  • How well do we optimize the likelihood Fc(G)?
  • How fast is the algorithm?
  • How many cascades do we need?

SLIDE 32

 (figure: precision-recall for a 1024-node hierarchical Kronecker graph with exponential transmission model and a 1000-node Forest Fire graph (α = 1.1) with power-law transmission model)
 Performance does not depend on the network structure:
  • Synthetic networks: Forest Fire, Kronecker, etc.
  • Transmission time distribution: exponential, power law
  • Break-even point of > 90%

SLIDE 33

 We achieve ≈ 90% of the best possible network!

SLIDE 34

 With 2x as many infections as edges, the break-even point is already 0.8 – 0.9!

SLIDE 35

 5,000 news sites:
 (figure: inferred diffusion network of blogs and mainstream media)

SLIDE 36

 (figure: blogs and mainstream media, continued)

SLIDE 37

 Poster session:
  • Tuesday 3–6pm in Gates lobby
  • Come early to pick up poster boards in the lobby
  • At least 2 (out of 5) course staff should see your poster
  • There will be cookies 
 Project writeups:
  • Due Wednesday midnight

SLIDE 38

 CS246: Mining Massive Datasets (Winter 2011)
  • How to deal with big datasets
  • Emphasis on parallel processing, large-scale machine learning, web and social network data
 CS341: Special Topics in Data Mining (Spring 2011)
  • Project-oriented large-scale data mining
  • Hadoop, MapReduce, unlimited access to Amazon's cloud
 Workshop on Social Networks (SOC 317W, Ecu317X)
  • Discussion-oriented class where students present their own research and provide feedback on others' work