Purnamrita Sarkar (Carnegie Mellon), Deepayan Chakrabarti (Yahoo! Research), Andrew W. Moore (Google, Inc.): PowerPoint presentation

  1. Purnamrita Sarkar (Carnegie Mellon), Deepayan Chakrabarti (Yahoo! Research), Andrew W. Moore (Google, Inc.)

  2. Which pair of nodes {i,j} should be connected? Variant: node i is given, and we must choose its new neighbor j.
     Examples: friend suggestion in Facebook; movie recommendation in Netflix. (Figure: a small social graph with nodes Alice, Bob, and Charlie.)

  3. Predict a link between the nodes:
     • with the minimum number of hops;
     • with the maximum number of common neighbors (length-2 paths).
     Not all common neighbors are equal. A prolific common friend such as Alice, with 1000 followers, is less evidence of a link; a less prolific one such as Bob, with 8 followers, is much more evidence. (Figure: Alice, Bob, and Charlie.) The Adamic/Adar score therefore gives more weight to low-degree common neighbors.
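
A minimal sketch of the two scores on slide 3, using the usual Adamic/Adar definition $\sum_{k \in \Gamma(i) \cap \Gamma(j)} 1/\log|\Gamma(k)|$, where $\Gamma(k)$ is the neighbor set of k. The graph is treated as undirected (followers counted as degree), and the dictionary-of-sets representation, the helper names, and the toy graph (nodes u1, v1, u2, v2 and the filler nodes) are hypothetical; only the counts 1000 and 8 come from the slide.

```python
import math

def common_neighbors(adj, i, j):
    """Nodes adjacent to both i and j; adj maps each node to a set of neighbors."""
    return adj[i] & adj[j]

def adamic_adar(adj, i, j):
    """Sum of 1/log(degree) over common neighbors, so low-degree neighbors count more."""
    # Degree-1 nodes are skipped because log(1) == 0; a genuine common neighbor
    # of two distinct nodes always has degree >= 2.
    return sum(1.0 / math.log(len(adj[k]))
               for k in common_neighbors(adj, i, j) if len(adj[k]) > 1)

def add_star(adj, hub, leaves):
    """Hypothetical helper: connect hub to every leaf (undirected)."""
    for leaf in leaves:
        adj.setdefault(hub, set()).add(leaf)
        adj.setdefault(leaf, set()).add(hub)

adj = {}
# Prolific common friend: "alice" has 1000 neighbors, including the pair (u1, v1).
add_star(adj, "alice", ["u1", "v1"] + [f"a{n}" for n in range(998)])
# Less prolific common friend: "bob" has 8 neighbors, including the pair (u2, v2).
add_star(adj, "bob", ["u2", "v2"] + [f"b{n}" for n in range(6)])

print(len(common_neighbors(adj, "u1", "v1")), len(common_neighbors(adj, "u2", "v2")))  # 1 1
print(round(adamic_adar(adj, "u1", "v1"), 3), round(adamic_adar(adj, "u2", "v2"), 3))  # 0.145 0.481
```

Both pairs have exactly one common neighbor, so the common-neighbor count cannot separate them, while Adamic/Adar scores the pair linked through Bob more than three times higher.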

  4. Predict a link between the nodes:
     • with the minimum number of hops;
     • with more common neighbors (length-2 paths);
     • with a larger Adamic/Adar score;
     • with more short paths (e.g. length-3 paths);
     • …
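
The path-based heuristics on slide 4 can be read off the adjacency matrix. The sketch below, in plain Python on a hypothetical 4-cycle, computes the hop distance by breadth-first search and uses powers of the adjacency matrix: entry $(A^l)_{ij}$ counts length-l walks from i to j, so $A^2$ gives common neighbors. For $l \ge 3$ these are walks, which may revisit nodes, rather than simple paths; treating them as "short paths" is an assumption of this sketch.

```python
from collections import deque

def mat_mult(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def walk_counts(adj, length):
    """Return A^length; entry [i][j] counts walks of that length from i to j."""
    result = adj
    for _ in range(length - 1):
        result = mat_mult(result, adj)
    return result

def hops(adj, src, dst):
    """Shortest-path length in hops via BFS, or None if dst is unreachable."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u]
        for v, linked in enumerate(adj[u]):
            if linked and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None

# Hypothetical 4-cycle 0-1-2-3-0.
A = [[0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0]]
print(hops(A, 0, 2), walk_counts(A, 2)[0][2], walk_counts(A, 3)[0][1])  # 2 2 4
```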

  5. (Figure: link prediction accuracy* of Random, Shortest Path, Common Neighbors, Adamic/Adar, and an Ensemble of short paths, especially if the graph is sparse.) How do we justify these observations?
     *Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

  6. Raftery et al.'s model: nodes are uniformly distributed in a latent space (a unit-volume universe), and points close in this space are more likely to be connected. The problem of link prediction is to find the nearest neighbor that is not currently linked to the node, which is equivalent to inferring distances in the latent space.

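  7. Two sources of randomness:
     • Point positions: uniform in a D-dimensional space.
     • Linkage probability: logistic with parameters α and r, where α, r, and D are known:
       $P(i \sim j \mid d_{ij}) = \dfrac{1}{1 + e^{\alpha (d_{ij} - r)}}$
     The probability of linking is higher within radius r, and α determines how steeply the curve falls from 1 toward 0 around $d_{ij} = r$.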
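
A minimal generator for the model on slides 6 and 7, assuming the linkage probability $1/(1 + e^{\alpha(d_{ij} - r)})$ and a unit hypercube $[0,1]^D$ as the unit-volume universe (the slides do not say how boundaries are handled). The function names and parameter values are illustrative.

```python
import math
import random

def link_prob(d, r, alpha):
    """Logistic linkage probability: 1/2 at d == r, steeper as alpha grows."""
    z = alpha * (d - r)
    if z > 700:  # math.exp would overflow; the probability is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(z))

def sample_graph(n, dim, r, alpha, seed=0):
    """Sample n uniform points in [0,1]^dim and link each pair independently."""
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(dim)) for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < link_prob(math.dist(points[i], points[j]), r, alpha):
                edges.add((i, j))
    return points, edges

points, edges = sample_graph(n=200, dim=2, r=0.1, alpha=50)
print(len(edges))
```

As α grows, `link_prob` approaches a step function that links exactly the pairs within distance r.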

  8. (Figure: link prediction accuracy* of Random, Shortest Path, Common Neighbors, Adamic/Adar, and an Ensemble of short paths, especially if the graph is sparse.)
     *Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

  9. For nodes i and j, $\Pr_2(i,j) = \Pr(\text{common neighbor} \mid d_{ij})$: a product of two logistic probabilities, integrated over a volume determined by $d_{ij}$. As $\alpha \to \infty$, the logistic becomes a step function, which is much easier to analyze.
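
To see the step-function limit numerically, the sketch below assumes D = 2, places i and j symmetrically in the middle of the unit square (so boundary effects can be ignored for small r), and estimates $\Pr_2(i,j)$ by Monte Carlo over a uniformly placed third node k. For a step function, the common neighbors are exactly the nodes in the intersection of two discs of radius r, whose area for $d_{ij} < 2r$ is the standard lens formula $2r^2\arccos\frac{d_{ij}}{2r} - \frac{d_{ij}}{2}\sqrt{4r^2 - d_{ij}^2}$. Function names and the chosen values of α are illustrative.

```python
import math
import random

def logistic(d, r, alpha):
    """Logistic linkage probability 1 / (1 + exp(alpha * (d - r)))."""
    z = alpha * (d - r)
    return 0.0 if z > 700 else 1.0 / (1.0 + math.exp(z))

def lens_area(r, d):
    """Area of the intersection of two discs of radius r with centers d apart."""
    if d >= 2 * r:
        return 0.0
    return 2 * r * r * math.acos(d / (2 * r)) - (d / 2) * math.sqrt(4 * r * r - d * d)

def pr2_monte_carlo(d, r, alpha, samples=100_000, seed=0):
    """Estimate Pr_2(i,j) = E_k[P(i~k) * P(j~k)] for k uniform in the unit square."""
    rng = random.Random(seed)
    i, j = (0.5 - d / 2, 0.5), (0.5 + d / 2, 0.5)
    total = 0.0
    for _ in range(samples):
        k = (rng.random(), rng.random())
        total += logistic(math.dist(i, k), r, alpha) * logistic(math.dist(j, k), r, alpha)
    return total / samples

for alpha in (20, 200):
    print(alpha, round(pr2_monte_carlo(0.1, 0.1, alpha), 4), round(lens_area(0.1, 0.1), 4))
```

With the steeper curve, the Monte Carlo estimate should sit close to the lens area, illustrating why the step-function case is a convenient stand-in.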

  10. Everyone has the same radius r, in a unit-volume universe. With η = number of common neighbors of i and j, and V(r) = volume of a D-dimensional ball of radius r, empirical Bernstein bounds give bounds on the distance $d_{ij}$.
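
The mechanics of turning a common-neighbor count into bounds on $d_{ij}$ can be sketched under stated assumptions: D = 2 (so the common-neighbor probability is the lens area of two discs), η treated as $\mathrm{Bin}(n, A(r, r, d_{ij}))$ over n candidate nodes, and, in place of the empirical Bernstein bound named on slide 10, a simpler Hoeffding bound. Because the lens area decreases in $d_{ij}$, an interval on A maps to an interval on $d_{ij}$ by bisection. All function names are hypothetical.

```python
import math

def lens_area(r, d):
    """A(r, r, d) in 2-D: area shared by two discs of radius r whose centers are d apart."""
    if d >= 2 * r:
        return 0.0
    return 2 * r * r * math.acos(d / (2 * r)) - (d / 2) * math.sqrt(4 * r * r - d * d)

def invert_area(target, r, tol=1e-10):
    """Bisection for d in [0, 2r] with lens_area(r, d) == target (area decreases in d)."""
    target = min(max(target, 0.0), math.pi * r * r)
    lo, hi = 0.0, 2 * r
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if lens_area(r, mid) > target:
            lo = mid  # too much overlap: the true distance is larger
        else:
            hi = mid
    return (lo + hi) / 2

def distance_interval(eta, n, r, delta=0.05):
    """Point estimate and (1 - delta) Hoeffding interval for d_ij from eta ~ Bin(n, A)."""
    a_hat = eta / n
    eps = math.sqrt(math.log(2 / delta) / (2 * n))
    d_low = invert_area(a_hat + eps, r)   # more overlap => closer
    d_high = invert_area(a_hat - eps, r)  # less overlap => farther
    return d_low, invert_area(a_hat, r), d_high

r, n = 0.1, 100_000
eta = round(n * lens_area(r, 0.1))  # expected count if the true distance is 0.1
low, d_hat, high = distance_interval(eta, n, r)
print(round(low, 4), round(d_hat, 4), round(high, 4))
```

Hoeffding's width ignores that the variance $A(1 - A)$ is tiny when A is small, so the interval comes out wide; a variance-sensitive bound such as empirical Bernstein is presumably why the slide uses it.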

  11. Let OPT be the node closest to i, and MAX the node with the most common neighbors with i.
     Theorem: w.h.p., $d_{OPT} \le d_{MAX} \le d_{OPT} + 2[\ldots]$. Hence common neighbors is an asymptotically optimal heuristic as $N \to \infty$.
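
A small simulation of the theorem's setting, assuming the step-function limit of the model, D = 2, and a unit square; the number of nodes, the radius, and the seed are arbitrary. Following slide 6, OPT is taken as the closest node not yet linked to i, and MAX as the non-linked node with the most common neighbors. Comparing the two distances for growing n illustrates the claim; it does not prove it.

```python
import math
import random

def opt_vs_max(n=1000, r=0.06, seed=1):
    """Return (d_OPT, d_MAX) for the node nearest the center of the unit square."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    # Step-function linkage (alpha -> infinity): link iff distance <= r.
    nbrs = [set() for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            if math.dist(pts[a], pts[b]) <= r:
                nbrs[a].add(b)
                nbrs[b].add(a)
    # Pick a node near the center to keep boundary effects small.
    i = min(range(n), key=lambda k: math.dist(pts[k], (0.5, 0.5)))
    candidates = [k for k in range(n) if k != i and k not in nbrs[i]]
    opt = min(candidates, key=lambda k: math.dist(pts[i], pts[k]))
    best = max(candidates, key=lambda k: len(nbrs[i] & nbrs[k]))
    return math.dist(pts[i], pts[opt]), math.dist(pts[i], pts[best])

for n in (250, 1000):
    print(n, opt_vs_max(n=n))
```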

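  12. Node k has radius $r_k$, and $i \to k$ if $d_{ik} \le r_k$ (a directed graph), so $r_k$ captures the popularity of node k. Here $A(r_a, r_b, d_{ij})$ is the volume of the intersection of two balls of radii $r_a$ and $r_b$ whose centers are $d_{ij}$ apart.
     • Type 1: $i \to k \leftarrow j$. Both i and j must lie within $r_k$ of k, which has probability $A(r_k, r_k, d_{ij})$.
     • Type 2: $i \leftarrow k \to j$. k must lie within $r_i$ of i and within $r_j$ of j, which has probability $A(r_i, r_j, d_{ij})$.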
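
The directed rule on slide 12 can be checked directly. In the sketch below (hypothetical helper names, 2-D points), a node k is a Type 1 common neighbor when both i and j fall inside k's own radius, and a Type 2 common neighbor when k falls inside both $r_i$ and $r_j$. The tiny example shows a popular node with a large radius that is a Type 1 but not a Type 2 common neighbor.

```python
import math

def links_to(points, radii, src, dst):
    """Directed edge src -> dst iff src lies within dst's radius: d(src, dst) <= r_dst."""
    return math.dist(points[src], points[dst]) <= radii[dst]

def common_neighbors_by_type(points, radii, i, j):
    """Split common neighbors of (i, j) into Type 1 (i -> k <- j) and Type 2 (i <- k -> j)."""
    type1, type2 = [], []
    for k in range(len(points)):
        if k in (i, j):
            continue
        if links_to(points, radii, i, k) and links_to(points, radii, j, k):
            type1.append(k)  # both i and j are inside k's radius r_k
        if links_to(points, radii, k, i) and links_to(points, radii, k, j):
            type2.append(k)  # k is inside both r_i and r_j
    return type1, type2

# Hypothetical example: nodes 0 and 1 are the pair; node 3 is farther away but popular.
points = [(0.0, 0.0), (0.2, 0.0), (0.1, 0.0), (0.1, 0.3)]
radii = [0.12, 0.12, 0.15, 0.5]
print(common_neighbors_by_type(points, radii, 0, 1))  # ([2, 3], [2])
```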

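  13. Example graph: $N_1$ nodes of radius $r_1$ and $N_2$ nodes of radius $r_2$, with $r_1 \ll r_2$. The common-neighbor counts for the pair (i, j) are
     $\eta_1 \sim \mathrm{Bin}[N_1, A(r_1, r_1, d_{ij})]$ and $\eta_2 \sim \mathrm{Bin}[N_2, A(r_2, r_2, d_{ij})]$.
     Maximize $\Pr[\eta_1, \eta_2 \mid d_{ij}]$, the product of two binomials. The maximizer $d^*$ satisfies
     $w(r_1)\,E[\eta_1 \mid d^*] + w(r_2)\,E[\eta_2 \mid d^*] = w(r_1)\,\eta_1 + w(r_2)\,\eta_2$.
     If the RHS increases, the LHS must increase, so $d^*$ decreases.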
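
Rather than solving the weighted equation on slide 13, the sketch below maximizes the product of the two binomial likelihoods directly by grid search, assuming D = 2 (so $A(r, r, d)$ is the lens area of two discs) and ignoring boundary effects. The group sizes, radii, and true distance are made up, and the observed counts are set to their expected values, so the maximizer should land near the true distance. Function names are hypothetical.

```python
import math

def lens_area(r, d):
    """A(r, r, d) in 2-D: area shared by two discs of radius r whose centers are d apart."""
    if d >= 2 * r:
        return 0.0
    return 2 * r * r * math.acos(d / (2 * r)) - (d / 2) * math.sqrt(4 * r * r - d * d)

def log_binom_lik(eta, n, p):
    """log Pr[Bin(n, p) = eta], dropping the constant log C(n, eta)."""
    if p <= 0.0:
        return 0.0 if eta == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if eta == n else -math.inf
    return eta * math.log(p) + (n - eta) * math.log(1 - p)

def mle_distance(groups, steps=4000):
    """groups: list of (n_nodes, radius, eta). Grid-search d maximizing the binomial product."""
    d_limit = 2 * max(r for _, r, _ in groups)
    best_d, best_ll = None, -math.inf
    for s in range(1, steps):
        d = d_limit * s / steps
        ll = sum(log_binom_lik(eta, n, lens_area(r, d)) for n, r, eta in groups)
        if ll > best_ll:
            best_d, best_ll = d, ll
    return best_d

true_d = 0.03
groups = [(n, r, round(n * lens_area(r, true_d))) for n, r in [(100_000, 0.02), (100_000, 0.1)]]
d_star = mle_distance(groups)
print(groups, round(d_star, 4))
```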

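  14. (Figure: the weight w(r) as a function of radius r, built from a Jacobian term and a variance term, compared with the Adamic/Adar weight, which behaves like 1/r.)
     • For small r, the variance is small, so the presence of a common neighbor is more surprising.
     • When r is close to the maximum radius, the variance is also small, so the absence of a common neighbor is more surprising.
     Real-world graphs generally fall in this range.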

  15. (Figure: link prediction accuracy* of Random, Shortest Path, Common Neighbors, Adamic/Adar, and an Ensemble of short paths, especially if the graph is sparse.)
     *Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

  16. Common neighbors = 2-hop paths. The analysis of longer paths has two components:
     1. Bounding $E(\eta_l \mid d_{ij})$, where $\eta_l$ = number of l-hop paths. This bounds $\Pr_l(i,j)$ by applying the triangle inequality to a series of common-neighbor probabilities.
     2. Showing $\eta_l \approx E(\eta_l \mid d_{ij})$.
     (Figure: triangulation.)

  17. Common neighbors = 2-hop paths. The analysis of longer paths has two components:
     1. Bounding $E(\eta_l \mid d_{ij})$, where $\eta_l$ = number of l-hop paths. This bounds $\Pr_l(i,j)$ by applying the triangle inequality to a series of common-neighbor probabilities.
     2. Showing $\eta_l \approx E(\eta_l \mid d_{ij})$: $\eta_l$ has bounded dependence on the position of each node, so McDiarmid's inequality can bound $|\eta_l - E(\eta_l \mid d_{ij})|$.
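
Slide 17 relies on McDiarmid's inequality: if changing any single input (here, one node's position) changes $\eta_l$ by at most $c_m$, then $\Pr(|\eta_l - E\eta_l| \ge t) \le 2\exp(-2t^2/\sum_m c_m^2)$. The helpers below (hypothetical names) compute that tail bound and the deviation t at which it equals a chosen δ. The values of $c_m$ are left to the caller, since the slides do not state them; the all-ones vector in the example is purely illustrative.

```python
import math

def mcdiarmid_tail(c, t):
    """Upper bound on Pr(|f - E f| >= t) when input m can move f by at most c[m]."""
    return min(1.0, 2 * math.exp(-2 * t * t / sum(cm * cm for cm in c)))

def mcdiarmid_deviation(c, delta):
    """Deviation t at which the McDiarmid tail bound equals delta."""
    return math.sqrt(math.log(2 / delta) * sum(cm * cm for cm in c) / 2)

c = [1.0] * 1000  # hypothetical: moving any one node changes eta_l by at most 1
t = mcdiarmid_deviation(c, delta=0.05)
print(round(t, 2), mcdiarmid_tail(c, t))
```

With every $c_m = 1$, t grows like $\sqrt{N}$, so the bound is informative only when $\eta_l$ itself is much larger than $\sqrt{N}$.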

  18. • Bound $d_{ij}$ as a function of $\eta_l$ using McDiarmid's inequality.
      • For $l' > l$, we need $\eta_{l'} \gg \eta_l$ to obtain similar bounds.
      • We can also obtain much tighter bounds for long paths if shorter paths are known to exist.

  19. The resulting factor gives only a weak bound for the logistic model; the bound can be made tighter as the logistic approaches the step function.

  20. Three key ingredients:
     1. Closer points are likelier to be linked (small-world model: Watts & Strogatz, 1998; Kleinberg, 2001).
     2. The triangle inequality holds, which is necessary to extend the analysis to l-hop paths.
     3. Points are spread uniformly at random; otherwise, properties would depend on location as well as distance.

  21. (Figure: link prediction accuracy* of Random, Shortest Path, Common Neighbors, Adamic/Adar, and an Ensemble of short paths.)
     • In sparse graphs, paths of length 3 or more help in prediction.
     • Differentiating between different degrees is important.
     • For large dense graphs, common neighbors are enough.
     • The number of paths matters, not the length.
     *Liben-Nowell & Kleinberg, 2003; Brand, 2005; Sarkar & Moore, 2007

  22. Summary. (Figure: link prediction connecting a generative model and heuristics through a few properties; which of node a and node b is the most likely neighbor of node i?) Comparing the two:
     • We can justify the empirical observations.
     • We also offer some new prediction algorithms.

  23. • Combine bounds from different radii.
      • But there might not be enough data to obtain individual bounds from each radius.
      • New sweep estimator: $Q_r$ = fraction of nodes with radius $\le r$ that are common neighbors. Higher $Q_r$ ⇒ smaller $d_{ij}$ w.h.p.

  24. • $Q_r$ = fraction of nodes with radius $\le r$ that are common neighbors; larger $Q_r$ ⇒ smaller $d_{ij}$ w.h.p.
      • $T_R$ = fraction of nodes with radius $\ge R$ that are common neighbors; smaller $T_R$ ⇒ larger $d_{ij}$ w.h.p.

  25. (Figure: number of common neighbors of a given radius r, with the sweep regions for $Q_r$ and $T_R$.)
      • $T_R$ = fraction of nodes with radius $\ge R$ that are common neighbors; small $T_R$ ⇒ large $d_{ij}$.
      • $Q_r$ = fraction of nodes with radius $\le r$ that are common neighbors; large $Q_r$ ⇒ small $d_{ij}$.
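
A minimal sketch of the two sweep statistics on slides 23 to 25, assuming the caller already knows each candidate node's radius and whether it is a common neighbor of the pair (i, j). It only computes $Q_r$ and $T_R$ at every observed radius; how these fractions become high-probability bounds on $d_{ij}$ is not shown on the slides and is not attempted here. Function names and data are hypothetical.

```python
def fraction_common(radii, is_common, keep):
    """Fraction of nodes whose radius passes `keep` that are common neighbors."""
    pool = [c for rk, c in zip(radii, is_common) if keep(rk)]
    return sum(pool) / len(pool) if pool else None

def sweep(radii, is_common):
    """For each observed radius x, return (x, Q_x, T_x)."""
    out = []
    for x in sorted(set(radii)):
        q = fraction_common(radii, is_common, lambda rk: rk <= x)  # Q_x: radius <= x
        t = fraction_common(radii, is_common, lambda rk: rk >= x)  # T_x: radius >= x
        out.append((x, q, t))
    return out

radii = [0.01, 0.01, 0.05, 0.05, 0.2, 0.2]
is_common = [True, False, True, True, True, False]
for x, q, t in sweep(radii, is_common):
    print(x, round(q, 3), round(t, 3))
```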
