Purnamrita Sarkar (Carnegie Mellon), Deepayan Chakrabarti (Yahoo! Research), Andrew W. Moore (Google, Inc.)

• Which pair of nodes {i,j} should be connected?
• Variant: node i is given
[Figure: graph with nodes Alice, Bob, Charlie]
Examples: friend suggestion in Facebook, movie recommendation in Netflix

• Predict link between nodes
  • With the minimum number of hops
  • With max common neighbors (length 2 paths)
[Figure: Alice, Bob, and Charlie with two kinds of common friends. A prolific common friend with 1000 followers gives less evidence; a less prolific common friend with 8 followers gives much more evidence.]
The Adamic/Adar score gives more weight to low-degree common neighbors.

• Predict link between nodes
  • With the minimum number of hops
  • With more common neighbors (length 2 paths)
  • With larger Adamic/Adar
  • With more short paths (e.g. length 3 paths)
  • …
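The two neighbor-based heuristics in this list can be sketched in a few lines. This is a minimal illustration, not the authors' code: the toy graph, the node names `hub` and `fan0`..`fan3`, and the helper names are all hypothetical. It uses the usual Adamic/Adar definition, a sum of 1/log(degree) over common neighbors.

```python
import math

def build_adjacency(edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def common_neighbors(adj, i, j):
    """Nodes linked to both i and j (the middle nodes of length 2 paths)."""
    return adj[i] & adj[j]

def adamic_adar(adj, i, j):
    """Sum of 1/log(degree) over common neighbors of i and j.

    A common neighbor is linked to both i and j, so its degree is at
    least 2 and log(degree) is strictly positive.
    """
    return sum(1.0 / math.log(len(adj[k])) for k in common_neighbors(adj, i, j))

# Hypothetical toy graph: "hub" is a high-degree common friend,
# "bob" is a low-degree common friend of alice and charlie.
edges = [("alice", "hub"), ("charlie", "hub"), ("alice", "bob"), ("charlie", "bob")]
edges += [("hub", f"fan{n}") for n in range(4)]
adj = build_adjacency(edges)

shared = common_neighbors(adj, "alice", "charlie")
score = adamic_adar(adj, "alice", "charlie")
```

In this toy graph `bob` (degree 2) contributes more to the score than `hub` (degree 6), which is the "low-degree common neighbors count more" behavior described above.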

How do we justify these observations?
[Figure: link prediction accuracy* for Random, Shortest Path, Common Neighbors, Adamic/Adar, and Ensemble of short paths. Annotation: especially if the graph is sparse.]
*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

Raftery et al.'s model:
• Nodes are uniformly distributed in a latent space (unit volume universe).
• Points close in this space are more likely to be connected.
The problem of link prediction is to find the nearest neighbor who is not currently linked to the node.
• Equivalent to inferring distances in the latent space

Two sources of randomness
• Point positions: uniform in D-dimensional space
• Linkage probability: logistic with parameters α, r
• α, r and D are known
[Figure: probability of linking versus distance. The probability is higher (close to 1) within radius r; α determines the steepness.]
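A minimal sketch of the two sources of randomness, under stated assumptions: the slide only says the linkage probability is logistic with a steepness parameter and a radius r, so the specific form 1 / (1 + exp(alpha * (d - r))) is an assumption here, as are the function names, the use of the unit cube as the unit-volume universe, and the fixed seed.

```python
import math
import random

def link_probability(d, r, alpha):
    """Assumed logistic link probability: 1 / (1 + exp(alpha * (d - r))).

    Written in a numerically stable form so that a very large alpha
    (the step-function limit) does not overflow.
    """
    z = alpha * (d - r)
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

def sample_latent_graph(n, dim, r, alpha, seed=0):
    """Sample positions uniformly in the unit cube, then sample links."""
    rng = random.Random(seed)
    pos = [tuple(rng.random() for _ in range(dim)) for _ in range(n)]
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            p = link_probability(math.dist(pos[i], pos[j]), r, alpha)
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return pos, adj

# With a huge alpha the logistic behaves like a step function at distance r.
pos, adj = sample_latent_graph(n=50, dim=2, r=0.2, alpha=1e9)
```

At d = r the assumed probability is exactly 1/2, and larger `alpha` makes the transition around r sharper.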

[Figure, repeated: link prediction accuracy* for Random, Shortest Path, Common Neighbors, Adamic/Adar, and Ensemble of short paths. Annotation: especially if the graph is sparse.]
*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

[Figure: nodes i and j]
• Pr_2(i,j) = Pr(common neighbor | d_ij)
Product of two logistic probabilities, integrated over a volume determined by d_ij.
As α → ∞, logistic → step function. Much easier to analyze!

Everyone has the same radius r. Unit volume universe.
[Figure: nodes i and j]
η = number of common neighbors
V(r) = volume of a ball of radius r in D dims
Empirical Bernstein → bounds on distance
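To make the link between common-neighbor counts and distance concrete, here is a small sketch for the step-function model. It computes the standard volume of a D-dimensional ball, and, for D = 2 only, the area of the region where a common neighbor can lie (the overlap of two radius-r discs). It then inverts that area to get a plug-in point estimate of d_ij. This is an illustration under assumptions, not the empirical Bernstein bound from the slide; the function names and the restriction to two dimensions are choices made here.

```python
import math

def ball_volume(r, dim):
    """Volume of a ball of radius r in `dim` dimensions."""
    return math.pi ** (dim / 2) / math.gamma(dim / 2 + 1) * r ** dim

def lens_area(r, d):
    """Area of overlap of two radius-r discs whose centers are d apart (D = 2)."""
    if d >= 2 * r:
        return 0.0
    return 2 * r * r * math.acos(d / (2 * r)) - 0.5 * d * math.sqrt(4 * r * r - d * d)

def distance_from_common_neighbors(eta, n, r, tol=1e-12):
    """Find d such that lens_area(r, d) matches the observed fraction eta / n.

    lens_area is decreasing in d, so bisection on [0, 2r] is enough.
    """
    target = eta / n
    lo, hi = 0.0, 2 * r
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if lens_area(r, mid) > target:
            lo = mid  # overlap still too large, so the true d is larger
        else:
            hi = mid
    return (lo + hi) / 2

# Round trip: the expected count at d = 0.1 maps back to d = 0.1.
n_nodes, radius = 10000, 0.2
expected_eta = n_nodes * lens_area(radius, 0.1)
estimate = distance_from_common_neighbors(expected_eta, n_nodes, radius)
```

More common neighbors means a larger overlap area and therefore a smaller estimated distance, which is why ranking by common neighbors tracks ranking by latent distance.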

• OPT = node closest to i
• MAX = node with max common neighbors with i
• Theorem: w.h.p. d_OPT ≤ d_MAX ≤ d_OPT + 2[…
Common neighbors is an asymptotically optimal heuristic as N → ∞.

• Node k has radius r_k.
• i → k if d_ik ≤ r_k (directed graph)
• r_k captures popularity of node k
Type 1: i → k ← j, with common-neighbor volume A(r_k, r_k, d_ij)
Type 2: i ← k → j, with common-neighbor volume A(r_i, r_j, d_ij)
[Figure: nodes i, j, k with radii r_i, r_j, r_k for the two types of common neighbor]
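The directed rule and the two kinds of common neighbor can be written out directly. This sketch assumes the rule reads "i links to k when d_ik is at most r_k", with Type 1 meaning both i and j link to k and Type 2 meaning k links to both i and j. The positions, radii, and node names are a hypothetical toy example.

```python
import math

def links_to(pos, radius, i, k):
    """i -> k when i lies within node k's radius (assumed rule: d_ik <= r_k)."""
    return math.dist(pos[i], pos[k]) <= radius[k]

def type1_common_neighbors(pos, radius, i, j):
    """Nodes k with i -> k and j -> k; only k's own radius matters."""
    return {k for k in pos if k not in (i, j)
            and links_to(pos, radius, i, k) and links_to(pos, radius, j, k)}

def type2_common_neighbors(pos, radius, i, j):
    """Nodes k with k -> i and k -> j; only the radii of i and j matter."""
    return {k for k in pos if k not in (i, j)
            and links_to(pos, radius, k, i) and links_to(pos, radius, k, j)}

# Hypothetical toy layout: "b" is far away but popular (large radius).
pos = {"i": (0.0, 0.0), "j": (0.2, 0.0), "a": (0.1, 0.0), "b": (0.1, 0.3)}
radius = {"i": 0.15, "j": 0.15, "a": 0.12, "b": 0.4}

type1 = type1_common_neighbors(pos, radius, "i", "j")
type2 = type2_common_neighbors(pos, radius, "i", "j")
```

Here the popular node `b` is a Type 1 common neighbor even though it is far from both i and j, so its presence says little about d_ij, while the low-radius node `a` can only be a common neighbor when i and j are close.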

Example graph:
• N_1 nodes of radius r_1 and N_2 nodes of radius r_2
• r_1 << r_2
η_1 ~ Bin[N_1, A(r_1, r_1, d_ij)]
η_2 ~ Bin[N_2, A(r_2, r_2, d_ij)]
[Figure: nodes i, j, k]
Maximize Pr[η_1, η_2 | d_ij] = product of two binomials
w(r_1) E[η_1 | d*] + w(r_2) E[η_2 | d*] = w(r_1) η_1 + w(r_2) η_2
RHS ↑ → LHS ↑ → d* ↓
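The maximization on this slide can be tried numerically. The sketch below assumes D = 2 and the step-function model, so that A(r, r, d) is the overlap area of two radius-r discs, and it finds the most likely distance by a plain grid search over the product of two binomial likelihoods. The grid search, the function names, and the numbers are illustrative choices, not the authors' method.

```python
import math

def lens_area(r, d):
    """A(r, r, d) for D = 2: overlap area of two radius-r discs, centers d apart."""
    if d >= 2 * r:
        return 0.0
    return 2 * r * r * math.acos(d / (2 * r)) - 0.5 * d * math.sqrt(4 * r * r - d * d)

def log_likelihood(d, groups):
    """Log of the product of binomials, one per radius group.

    groups is a list of (eta, n, r): eta common neighbors out of n nodes
    of radius r. Binomial coefficients are dropped since they do not
    depend on d. Assumes radii small enough that every area is below 1.
    """
    total = 0.0
    for eta, n, r in groups:
        p = lens_area(r, d)
        total += eta * math.log(p) + (n - eta) * math.log(1.0 - p)
    return total

def most_likely_distance(groups, steps=20000):
    """Grid search for d* over (0, 2 * smallest radius), where every area is positive."""
    d_max = 2 * min(r for _, _, r in groups)
    grid = (d_max * (s + 0.5) / steps for s in range(steps))
    return max(grid, key=lambda d: log_likelihood(d, groups))

# Feed in the expected counts at a known distance and recover that distance.
true_d = 0.06
groups = [(1000 * lens_area(0.05, true_d), 1000, 0.05),
          (1000 * lens_area(0.20, true_d), 1000, 0.20)]
d_star = most_likely_distance(groups)
```

Setting the derivative of this log-likelihood to zero gives a weighted sum of counts on one side and a weighted sum of expected counts on the other, which is the form of the equation shown on the slide.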

[Figure: weight as a function of radius r, with labels Jacobian, Variance, Adamic/Adar, and 1/r. Annotations:]
• Small variance → presence is more surprising
• Small variance → absence is more surprising
• r is close to max radius
• Real-world graphs generally fall in this range

[Figure, repeated: link prediction accuracy* for Random, Shortest Path, Common Neighbors, Adamic/Adar, and Ensemble of short paths. Annotation: especially if the graph is sparse.]
*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

• Common neighbors = 2 hop paths
• Analysis of longer paths: two components
1. Bounding E(η_ℓ | d_ij) [η_ℓ = # ℓ hop paths]
   • Bounds Pr_ℓ(i,j) by using the triangle inequality on a series of common neighbor probabilities.
2. η_ℓ ≈ E(η_ℓ | d_ij)
[Figure: triangulation]

• Common neighbors = 2 hop paths
• Analysis of longer paths: two components
1. Bounding E(η_ℓ | d_ij) [η_ℓ = # ℓ hop paths]
   • Bounds Pr_ℓ(i,j) by using the triangle inequality on a series of common neighbor probabilities.
2. η_ℓ ≈ E(η_ℓ | d_ij)
   • Bounded dependence of η_ℓ on the position of each node → can use McDiarmid's inequality to bound |η_ℓ − E(η_ℓ | d_ij)|
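The quantity being bounded, the number of ℓ hop paths between i and j, can be counted directly on a small graph. This is a minimal sketch that counts simple paths (no repeated nodes) by depth-first search; the function name and the toy graph are hypothetical, and the approach is only practical for short paths.

```python
def count_paths(adj, i, j, hops):
    """Count simple paths from i to j that use exactly `hops` edges."""
    def walk(node, remaining, visited):
        if remaining == 0:
            return 1 if node == j else 0
        total = 0
        for nxt in adj[node]:
            if nxt in visited:
                continue  # keep the path simple
            if nxt == j and remaining != 1:
                continue  # j may only appear as the final node
            total += walk(nxt, remaining - 1, visited | {nxt})
        return total
    return walk(i, hops, {i})

# Hypothetical toy graph as an adjacency map.
adj = {
    "i": {"a", "b", "c"},
    "a": {"i", "j", "b"},
    "b": {"i", "j", "a"},
    "c": {"i", "d"},
    "d": {"c", "j"},
    "j": {"a", "b", "d"},
}

two_hop = count_paths(adj, "i", "j", 2)    # common neighbors: a and b
three_hop = count_paths(adj, "i", "j", 3)  # i-a-b-j, i-b-a-j, i-c-d-j
```

With `hops=2` the count equals the number of common neighbors, matching the first bullet above.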

• Bound d_ij as a function of η_ℓ using McDiarmid's inequality.
• For ℓ′ > ℓ we need η_ℓ′ >> η_ℓ to obtain similar bounds.
• Also, we can obtain much tighter bounds for long paths if shorter paths are known to exist.
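For reference, McDiarmid's inequality states that if changing the i-th input of a function changes its value by at most c_i, then the function deviates from its mean by t or more with probability at most 2 exp(−2t² / Σ c_i²). The sketch below only evaluates that general tail bound. The bounded-difference constants for η_ℓ are not given on the slides, so the values passed in here are purely hypothetical.

```python
import math

def mcdiarmid_tail(t, diffs):
    """Two-sided McDiarmid bound: P(|f - E f| >= t) <= 2 exp(-2 t^2 / sum(c_i^2)).

    `diffs` holds the bounded-difference constants c_i, one per
    independent input. The result is capped at 1.
    """
    return min(1.0, 2.0 * math.exp(-2.0 * t * t / sum(c * c for c in diffs)))

# Hypothetical setting: 100 independent inputs, each able to change f by at most 1.
bound = mcdiarmid_tail(20.0, [1.0] * 100)
```

Larger constants c_i weaken the bound, so a longer path count, where one node's position can affect many more paths, needs a much larger η_ℓ to give a comparably tight statement about d_ij.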

• The extra factor gives a weak bound for the logistic model.
• It can be made tighter as the logistic approaches the step function.

• Three key ingredients
1. Closer points are likelier to be linked. (Small World Model: Watts & Strogatz, 1998; Kleinberg, 2001)
2. Triangle inequality holds → necessary to extend to ℓ hop paths
3. Points are spread uniformly at random → otherwise properties will depend on location as well as distance

[Figure: link prediction accuracy* for Random, Shortest Path, Common Neighbors, Adamic/Adar, and Ensemble of short paths]
• In sparse graphs, paths of length 3 or more help in prediction.
• Differentiating between different degrees is important.
• For large dense graphs, common neighbors are enough.
• The number of paths matters, not the length.
*Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

[Figure: link prediction heuristics (most likely neighbor of node i: node a or node b?) compared with a generative model built on a few properties]
→ Can justify the empirical observations
→ We also offer some new prediction algorithms

• Combine bounds from different radii
• But there might not be enough data to obtain individual bounds from each radius
• New sweep estimator
• Q_r = fraction of nodes with radius ≤ r which are common neighbors
• Higher Q_r → smaller d_ij w.h.p.

• Q_r = fraction of nodes with radius ≤ r which are common neighbors
  • Larger Q_r → smaller d_ij w.h.p.
• T_R = fraction of nodes with radius ≥ R which are common neighbors
  • Smaller T_R → larger d_ij w.h.p.

[Figure: number of common neighbors of a given radius r]
• Q_r = fraction of nodes with radius ≤ r which are common neighbors. Large Q_r → small d_ij.
• T_R = fraction of nodes with radius ≥ R which are common neighbors. Small T_R → large d_ij.
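The two sweep quantities are simple fractions and can be computed as below. This sketch assumes Q is taken over nodes whose radius is at most the threshold and T over nodes whose radius is at least the threshold; the function name and the input format (a list of radius and common-neighbor flag pairs) are hypothetical, and turning these fractions into actual bounds on d_ij is not shown.

```python
def sweep_fractions(nodes, threshold):
    """Return (Q, T) at one threshold.

    nodes: list of (radius, is_common_neighbor) pairs.
    Q: fraction of nodes with radius <= threshold that are common neighbors.
    T: fraction of nodes with radius >= threshold that are common neighbors.
    A fraction is None when no node falls on that side of the threshold.
    """
    small = [flag for rad, flag in nodes if rad <= threshold]
    large = [flag for rad, flag in nodes if rad >= threshold]
    q = sum(small) / len(small) if small else None
    t = sum(large) / len(large) if large else None
    return q, t

# Hypothetical data: three of the five nodes are common neighbors of i and j.
nodes = [(0.1, True), (0.2, True), (0.3, False), (0.4, True), (0.5, False)]
q_low, t_low = sweep_fractions(nodes, 0.2)
q_mid, t_mid = sweep_fractions(nodes, 0.3)
```

Pooling all nodes on one side of a threshold is what lets the estimator work when any single radius has too few nodes to give a bound on its own.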
