

SLIDE 1

CS224W: Social and Information Network Analysis
Jure Leskovec, Stanford University
http://cs224w.stanford.edu

SLIDE 2

[Liben-Nowell & Kleinberg '03]

 Link prediction task:
  • Given G[t0, t0'], a graph on edges up to time t0'
  • Output a ranked list L of links (not in G[t0, t0']) that are predicted to appear in G[t1, t1']

 Evaluation:
  • n = |Enew|: # of new edges that appear during the test period [t1, t1']
  • Take the top n elements of L and count the correct edges

12/01/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu
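The evaluation protocol above fits in a few lines. A minimal sketch, assuming we already have the ranked list L and the set of test-period edges Enew (the names `ranked_links` and `new_edges` are illustrative):

```python
def precision_at_n(ranked_links, new_edges):
    """Take the top n = |Enew| elements of the ranked list L and
    count how many actually appeared during the test period [t1, t1']."""
    n = len(new_edges)
    top = ranked_links[:n]
    correct = sum(1 for edge in top if edge in new_edges)
    return correct / n
```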


SLIDE 4

[Liben-Nowell & Kleinberg '03]

 Predict links in an evolving collaboration network
 Core: since the network data is very sparse
  • Consider only nodes with in-degree and out-degree of at least 3

SLIDE 5

[Liben-Nowell & Kleinberg '03]

 For every pair of nodes (x,y) compute a score c(x,y)
   Γ(x) … neighborhood of node x
 Sort the pairs by score and predict the top n pairs as new links
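As an illustrative sketch (one of several scores from the paper), the common-neighbors score |Γ(x) ∩ Γ(y)| can be computed and ranked like this, with the hypothetical `adj` mapping each node to its neighbor set Γ(x):

```python
import itertools

def predict_links(adj, n):
    """Score every unlinked pair (x, y) by the number of common
    neighbors |Gamma(x) & Gamma(y)| and return the top n pairs."""
    scores = {}
    for x, y in itertools.combinations(adj, 2):
        if y in adj[x]:
            continue  # pair is already an edge in G[t0, t0']
        scores[(x, y)] = len(adj[x] & adj[y])
    return sorted(scores, key=scores.get, reverse=True)[:n]
```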

SLIDE 6

[Liben-Nowell & Kleinberg '03]

 Rank potential links (x,y) based on:
   Γ(x) … neighborhood of node x

SLIDE 7

SLIDE 8

[Liben-Nowell & Kleinberg '03]

SLIDE 9

 Improvement over #common neighbors

SLIDE 10

 Recommend a list of possible friends
 Supervised machine learning setting:
  • Training example:
   • For every node s we have a list of nodes she will create links to {v1, …, vk}
  • Problem:
   • Learn a model that will, for a given node s, rank nodes {v1, …, vk} higher than other nodes in the network
 How to combine node/edge attributes and network structure?
  • Let's learn how to bias random walks!

SLIDE 11

[WSDM '11]

 Let s be the center node
 Let fw(u,v) be a function that assigns a strength to each edge:

   auv = fw(u,v) = exp(−w·Ψuv)

  • Ψuv is a feature vector:
   • Features of node u
   • Features of node v
   • Features of edge (u,v)
  • w is the parameter vector we want to learn
 Do a random walk from s where transitions are according to edge strengths
 How to learn fw(u,v)?

(figure: random walk from center node s over nodes v1, v2, v3, marked as positive and negative examples)
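The edge-strength function auv = fw(u,v) = exp(−w·Ψuv) is direct to write down. A minimal sketch (the real system learns w; here it is just an input):

```python
import math

def edge_strength(w, psi):
    """a_uv = f_w(u, v) = exp(-w . Psi_uv): w is the parameter vector,
    Psi_uv the feature vector of edge (u, v)."""
    return math.exp(-sum(wi * fi for wi, fi in zip(w, psi)))
```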

SLIDE 12

[WSDM '11]

 Random walk transition matrix:
   Q'uv = auv / Σk auk
 PageRank transition matrix:
  • with prob. α jump back to s:
   Quv = (1 − α)·Q'uv + α·1{v = s}
 Compute PageRank vector: pᵀ = pᵀQ
 Rank nodes by pu
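A minimal power-iteration sketch of this biased walk, assuming every node has at least one out-edge; the hypothetical `strengths[u][v]` holds auv and α is the restart probability:

```python
def pagerank_with_restart(strengths, s, alpha=0.15, iters=100):
    """Row-normalize edge strengths a_uv into a random-walk matrix Q',
    mix in an alpha-probability jump back to s, and power-iterate
    p = p^T Q.  strengths: {u: {v: a_uv}}; returns {node: p_u}."""
    nodes = list(strengths)
    p = dict.fromkeys(nodes, 1.0 / len(nodes))
    for _ in range(iters):
        nxt = dict.fromkeys(nodes, 0.0)
        for u in nodes:
            total = sum(strengths[u].values())
            for v, a in strengths[u].items():
                nxt[v] += (1 - alpha) * p[u] * (a / total)
        nxt[s] += alpha  # restart mass (p sums to 1)
        p = nxt
    return p
```

Ranking candidate nodes by their score p_u then gives the recommendation list.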

SLIDE 13

[WSDM '11]

 Each node u has a score pu
 Destination nodes D = {v1, …, vk}
 No-link nodes L = {the rest}
 What do we want? pd > pl for every d ∈ D and l ∈ L
 Hard constraints; make them soft

SLIDE 14

[WSDM '11]

 Want to minimize:
   F(w) = Σd∈D, l∈L h(pl − pd)
  • Loss: h(x) = 0 if x < 0, x² else
 How to minimize F?
  • pl and pd depend on w:
   • Given w, assign edge weights auv = fw(u,v)
   • Using the transition matrix Q = [auv], compute PageRank scores pu
   • Want to set w such that pl < pd
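Under these definitions the soft objective can be sketched as follows, assuming the PageRank scores p have already been computed for the current w:

```python
def h(x):
    """Soft penalty: 0 when the constraint p_l < p_d holds (x < 0),
    quadratic x^2 otherwise."""
    return 0.0 if x < 0 else x * x

def soft_loss(p, D, L):
    """Sum of h(p_l - p_d) over destination nodes D and no-link
    nodes L; minimizing this over w pushes p_d above p_l."""
    return sum(h(p[l] - p[d]) for d in D for l in L)
```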

SLIDE 15

[WSDM '11]

 How to minimize F?
  • Take the derivative!
 We know: pᵀ = pᵀQ, i.e. pu = Σj pj Qju
 So: ∂pu/∂w = Σj (Qju ∂pj/∂w + pj ∂Qju/∂w)
 Looks like the PageRank equation!

SLIDE 16

[WSDM '11]

 Iceland Facebook network:
  • 174,000 nodes (55% of population)
  • Avg. degree 168
  • Avg. person added 26 new friends/month
 For every node s:
  • Positive examples:
   • D = { new friendships of s in Nov '09 }
  • Negative examples:
   • L = { other nodes s did not create new links to }

SLIDE 17

 Node and edge features for learning:
  • Node:
   • Age
   • Gender
   • Degree
  • Edge:
   • Age of an edge
   • Communication
   • Profile visits
   • Co-tagged photos
 Baselines:
  • Decision trees and logistic regression:
   • Above features + 10 network features (PageRank, common friends)
 Evaluation:
  • AUC and precision at Top20

SLIDE 18

 Facebook: predicting future friends

SLIDE 19

 Arxiv Hep-Ph collaboration network

SLIDE 20

 Results:
  • 2.3x improvement over the previous FB-PYMK system
 How to scale to FB size?
  • FB network:
   • >500 million people, >65 billion edges
  • 40 machines, each with 72GB of RAM (total 2.8TB)
  • System makes 8.6 million suggestions per second

SLIDE 21

 Many social or information networks are implicit or hard to observe:
  • Hidden/hard-to-reach populations:
   • Network of needle sharing between drug injection users
  • Implicit connections:
   • Network of information propagation in online news media
 But we can observe results of the processes taking place on such (invisible) networks:
  • Virus propagation:
   • Drug users get sick, and we observe when they see the doctor
  • Information networks:
   • We observe when media sites mention information
 Question: Can we infer the hidden networks?

SLIDE 22

 There is a directed social network over which diffusions take place:
 (figure: example network on nodes a, b, c, d, e)
 But we do not observe the edges of the network
 We only see the time when a node gets infected:
  • Cascade c1: (a, 1), (c, 2), (b, 6), (e, 9)
  • Cascade c2: (c, 1), (a, 4), (b, 5), (d, 8)
 Task: inferring the underlying network

SLIDE 23

 Virus propagation:
  • Viruses propagate through the network
  • We only observe when people get sick
  • But NOT who infected whom
 Word of mouth & viral marketing:
  • Recommendations and influence propagate
  • We observe when people buy products
  • But NOT who influenced whom
 The process is hidden
 Can we infer the underlying network?

SLIDE 24

 Continuous-time cascade diffusion model:
  • Cascade c reaches node u at time tu and spreads to u's neighbors:
  • With probability β the cascade propagates along edge (u, v), and we determine the infection time of node v:
    tv = tu + Δ,   e.g. Δ ~ Exponential or Power-law
 (figure: infections at times tu, tb, tc with delays Δ1, Δ2)
 We assume each node v has only one parent!
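A toy simulation of this generative model helps make it concrete. This sketch assumes exponential delays and lets each node be infected at most once, so it has exactly one parent (the first neighbor whose coin flip succeeds):

```python
import random

def simulate_cascade(graph, source, beta=0.6, rate=1.0):
    """With probability beta the cascade crosses edge (u, v); the delay
    is Delta ~ Exponential(rate), so t_v = t_u + Delta.
    graph: {u: [out-neighbors]}; returns {node: infection time}."""
    times = {source: 0.0}
    queue = [source]
    while queue:
        u = queue.pop(0)
        for v in graph.get(u, []):
            if v not in times and random.random() < beta:
                times[v] = times[u] + random.expovariate(rate)
                queue.append(v)
    return times
```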

SLIDE 25

 Probability that cascade c propagates from node u to node v:
   Pc(u, v) ∝ P(tv − tu), with tv > tu
 Since not all nodes get infected by the diffusion process, we introduce the external influence node m: Pc(m, v) = ε
 Prob. that cascade c propagates in a tree pattern T:
   P(c | T) = Π(u,v)∈T Pc(u, v)
 (figure: external node m with ε-edges into the network)
 Tree pattern T on cascade c: (a, 1), (b, 2), (c, 4), (e, 8)

SLIDE 26

 There are many possible propagation trees that are consistent with the observed data:
  • c: (a, 1), (c, 2), (b, 3), (e, 4)
 (figure: several propagation trees over nodes a, b, c, d, e consistent with cascade c)
 Need to consider all possible propagation trees T supported by the graph G:
   P(c | G) = ΣT P(c | T)
 Likelihood of a set of cascades C:
   P(C | G) = Πc∈C P(c | G)
 Want to find a graph: G* = argmaxG P(C | G)

SLIDE 27

 We consider only the most likely tree:
 Maximum log-likelihood for a cascade c under a graph G:
   Fc(G) = maxT log P(c | T)
 Log-likelihood of G given a set of cascades C:
   FC(G) = Σc∈C Fc(G)

SLIDE 28

 Given a cascade c, what is the most likely propagation tree?
   T* = argmaxT Σ(u,v)∈T log Pc(u, v)
 A maximum directed spanning tree (MDST):
  • The sub-graph of G induced by the nodes in the cascade c is a DAG
   • Because edges point forward in time
  • For each node, just pick an in-edge of max weight
 Greedy parent selection for each node gives the globally optimal tree!
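Because edges point forward in time, the per-node greedy choice above is globally optimal. A sketch, where `log_pc(u, v)` is an illustrative stand-in for the log edge weight:

```python
def best_tree(times, log_pc):
    """Maximum directed spanning tree of a cascade: for each infected
    node pick the max-weight in-edge among earlier-infected nodes.
    times: {node: infection time}; log_pc(u, v): log P_c(u, v).
    Returns {node: parent}; the earliest node has no parent."""
    parent = {}
    for v, tv in times.items():
        earlier = [u for u, tu in times.items() if tu < tv]
        if earlier:
            parent[v] = max(earlier, key=lambda u: log_pc(u, v))
    return parent
```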

SLIDE 29

 Theorem:
   Log-likelihood Fc(G) of cascade c is monotonic, and submodular in the edges of the graph G:
   Fc(A ∪ {e}) − Fc(A) ≥ Fc(B ∪ {e}) − Fc(B)   for A ⊆ B ⊆ V×V
  • Gain of adding an edge e to a "small" graph A is at least the gain of adding it to a "large" graph B
 Log-likelihood FC(G) is a sum of submodular functions, so it is submodular too

SLIDE 30

 Use greedy hill-climbing to maximize FC(G):
  • For i = 1…k:
   • At every step, pick the edge that maximizes the marginal improvement
 (figure: marginal gains of candidate edges at each greedy step)
 Benefits:
  • 1. Approximation guarantee (≈ 0.63 of OPT)
  • 2. Tight on-line bounds on the solution quality
  • 3. Speed-ups:
   • Lazy evaluation (by submodularity)
   • Localized update (by the structure of the problem)
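The greedy loop itself is a few lines. A sketch over an abstract objective `F` (lazy evaluation and the on-line bounds are omitted; the test below uses a simple additive F only to exercise the mechanics):

```python
def greedy_edges(candidates, F, k):
    """Hill-climbing for a monotone submodular F: start from the empty
    edge set and k times add the edge with the largest marginal gain,
    which guarantees ~0.63 of the optimal value.
    candidates: set of edges; F: set-of-edges -> score."""
    chosen = set()
    for _ in range(k):
        base = F(chosen)
        best = max(candidates - chosen, key=lambda e: F(chosen | {e}) - base)
        chosen.add(best)
    return chosen
```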

SLIDE 31

 We validate our method on:
  • Synthetic data:
   • Generate a graph G on k edges
   • Generate cascades, record node infection times, reconstruct G
  • Real data:
   • MemeTracker: 172M news articles, Aug '08 – Sept '09, 343M textual phrases (quotes)
 Questions:
  • How many edges of G can we find?
   • Precision-recall, break-even point
  • How well do we optimize the likelihood Fc(G)?
  • How fast is the algorithm?
  • How many cascades do we need?

SLIDE 32

 (figure: precision-recall for a 1024-node hierarchical Kronecker graph with exponential transmission model and a 1000-node Forest Fire graph (α = 1.1) with power-law transmission model)
 Performance does not depend on the network structure:
  • Synthetic networks: Forest Fire, Kronecker, etc.
  • Transmission time distribution: exponential, power law
  • Break-even point of > 90%

SLIDE 33

 We achieve ≈ 90% of the best possible network!

SLIDE 34

 With 2x as many infections as edges, the break-even point is already 0.8 – 0.9!

SLIDE 35

 5,000 news sites:
 (figure: inferred diffusion network of blogs and mainstream media)

SLIDE 36

 (figure: blogs and mainstream media, continued)

SLIDE 37

 Poster session:
  • Tuesday 3–6pm in Gates lobby
  • Come early to pick up poster boards in the lobby
  • At least 2 (out of 5) course staff should see your poster
  • There will be cookies 
 Project writeups:
  • Due Wednesday midnight

SLIDE 38

 CS246: Mining Massive Datasets (Winter 2011)
  • How to deal with big datasets
  • Emphasis on parallel processing, large-scale machine learning, web and social network data
 CS341: Special Topics in Data Mining (Spring 2011)
  • Project-oriented large-scale data mining
  • Hadoop, MapReduce, unlimited access to Amazon's cloud
 Workshop on Social Networks (SOC 317W, Ecu317X)
  • Discussion-oriented class where students present their own research and provide feedback on others' work