
Local Algorithms and Large Scale Graph Mining (Silvio Lattanzi)

Local Algorithms and Large Scale Graph Mining. Silvio Lattanzi (Google Research NY). Charles River Workshop on Private Analysis of Social Networks. Outline: Problem and challenges. Graph clustering, computation limitations. Local random walk and


  1. Extension: Can we identify competitors of an Ads campaign in a specific category? (Figure labels: Campaigns, Queries.) In this setting too, some pre-computation lets us compute the Personalized PageRank (PPR) efficiently.

  2. Local random walk and clustering in practice. Joint work with: Raimondas Kiveris (Google Research NY), Vahab Mirrokni (Google Research NY).

  3. Some basic intuitions: It would be nice to have the number and the length of all the possible paths between two nodes.

  4. Some basic intuitions: It would be nice to have the number and the length of all the possible paths between two nodes. Infeasible.

  5. Some basic intuitions: It would be nice to have the number and the length of all the possible paths between two nodes. Infeasible. We are interested only in strong relationships, so we can sample.

  6. Truncated random walk techniques: Run several truncated random walks of a specific length.

  7. Truncated random walk techniques: Run several truncated random walks of a specific length. Local algorithms based on this intuition: truncated random walk, Personalized PageRank, evolving set.
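
The sampling idea on the slides above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the talk: the function name `truncated_walk_scores`, the adjacency-list format, and the toy graph are all assumptions made for the example. It runs many short walks from a source node and uses visit frequencies as a rough relatedness score.

```python
import random
from collections import Counter

def truncated_walk_scores(adj, source, walk_length, num_walks, seed=0):
    """Estimate relatedness to `source` from visit counts of short random walks.

    `adj` maps each node to a list of neighbors (hypothetical input format).
    """
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(num_walks):
        node = source
        for _ in range(walk_length):
            neighbors = adj[node]
            if not neighbors:          # dead end: stop this walk early
                break
            node = rng.choice(neighbors)
            visits[node] += 1
    total = sum(visits.values())
    # Normalize visit counts so the scores sum to 1.
    return {u: c / total for u, c in visits.items()} if total else {}

# Toy graph (an assumption): two triangles joined by the single edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
scores = truncated_walk_scores(adj, source=0, walk_length=3, num_walks=2000)
inside = sum(scores.get(u, 0.0) for u in (0, 1, 2))
print(f"mass on the source's triangle: {inside:.2f}")
```

Most of the sampled mass stays on the triangle containing the source, which is the behavior the local algorithms listed on slide 7 rely on.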

  8. Nice experimental properties of PPR: We can approximate it efficiently in MapReduce by analyzing short random walks recursively.

  9. Nice experimental properties of PPR: We can approximate it efficiently in MapReduce by analyzing short random walks recursively. It works well in synthetic settings.

  10. Nice experimental properties of PPR: We can approximate it efficiently in MapReduce by analyzing short random walks recursively. It works well in synthetic settings. It works well in practice:
      * On public graphs with 8M nodes: Overlapping Clustering and Distributed Computation (WSDM'11, Andersen, Gleich, Mirrokni)
      * On the YouTube co-watch graph with 100M nodes, using 100s of machines: Large-scale Community Detection on Youtube graph (ICWSM'11, Gargi, Lu, Mirrokni, Yoon)
      * For sybil detection in social networks: The evolution of Sybil Defense via Social Networks (S&P'13, Alvisi, Clement, Epasto, Lattanzi, Panconesi)

  11. Why does it work? Suppose we have a set C with few edges going outside. (Figure labels: C, v.)

  12. Why does it work? Suppose we have a set C with few edges going outside. Most of the time a random walk will stay in C. (Figure labels: C, v.)

  16. Why does it work? Suppose we have a set C with few edges going outside. Most of the time a random walk will stay in C. It is possible to bound the amount of score that goes outside C. (Figure labels: C, v.)

  17. Local clustering via random walk

  18. How should we define a cluster? Good clusters have low cut conductance $\phi$: $\phi = \frac{|\mathrm{cut}(C, V - C)|}{\min(\mathrm{Vol}(C), \mathrm{Vol}(V - C))}$

  19. How should we define a cluster? Good clusters have low cut conductance $\phi$: $\phi = \frac{|\mathrm{cut}(C, V - C)|}{\min(\mathrm{Vol}(C), \mathrm{Vol}(V - C))}$ (Figure labels: 1, 17.)
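
A small sketch may help make the cut conductance definition concrete. It assumes an undirected graph stored as an adjacency list and takes Vol(C) to be the sum of degrees in C; the function names and the toy graph are hypothetical.

```python
def volume(adj, nodes):
    """Vol(nodes): sum of the degrees of the given nodes."""
    return sum(len(adj[u]) for u in nodes)

def conductance(adj, cluster):
    """Cut conductance |cut(C, V - C)| / min(Vol(C), Vol(V - C))."""
    cluster = set(cluster)
    # Count edges with exactly one endpoint in the cluster.
    cut = sum(1 for u in cluster for w in adj[u] if w not in cluster)
    vol_c = volume(adj, cluster)
    vol_rest = volume(adj, adj.keys()) - vol_c
    denom = min(vol_c, vol_rest)
    return cut / denom if denom else float("inf")

# Toy graph (an assumption): two triangles joined by the single edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
phi = conductance(adj, {0, 1, 2})   # one cut edge, volume 7 on each side
print(phi)
```

Here the triangle {0, 1, 2} has one outgoing edge and volume 7, so its conductance is 1/7, while a single node such as {0} has conductance 1.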

  20. Set of minimum conductance: The problem is NP-hard. Algorithms:
      - Spectral algorithms [Jerrum&Sinclair'89]: $\phi(S) = O(\sqrt{\phi})$
      - [Leighton-Rao'99]: $\phi(S) = O(\log n)\,\phi$
      - [Arora-Rao-Vazirani'04]: $\phi(S) = O(\sqrt{\log n})\,\phi$

  21. Set of minimum conductance: The problem is NP-hard. Algorithms:
      - Spectral algorithms [Jerrum&Sinclair'89]: $\phi(S) = O(\sqrt{\phi})$
      - [Leighton-Rao'99]: $\phi(S) = O(\log n)\,\phi$
      - [Arora-Rao-Vazirani'04]: $\phi(S) = O(\sqrt{\log n})\,\phi$
      Running time is at least linear in the size of the graph...

  22. Local Graph Clustering

  23. Local Graph Clustering: Do we really need to explore the whole graph?

  24. Local Clustering Algorithm: Given a good node v, the algorithm:
      - Returns a set around v of good conductance
      - Runs in time proportional to the size of the output
      - Explores only the local neighborhood of v
      - Returns a set with roughly the same size as S

  25. Previous results (approximation guarantee; running time):
      - Truncated random walk [Spielman-Teng'04]: $O(\phi^{1/3} \log^{2/3} n)$; $\tilde{O}\left(\mathrm{Vol}(S) / \phi^{5/3}\right)$
      - Truncated random walk [Spielman-Teng'08]: $O(\sqrt{\phi} \log^{3/2} n)$; $\tilde{O}\left(\mathrm{Vol}(S) / \phi^{2}\right)$
      - PageRank random walk [Andersen-Chung-Lang'06]: $O(\sqrt{\phi \log n})$; $\tilde{O}\left(\mathrm{Vol}(S) / \phi\right)$
      - Evolving Set [Andersen-Peres'08]: $O(\sqrt{\phi \log n})$; $\tilde{O}\left(\mathrm{Vol}(S) / \sqrt{\phi}\right)$
      - Evolving Set [Gharan-Trevisan'12]: $O(\sqrt{\phi / \epsilon})$; $\tilde{O}\left(\mathrm{Vol}(S)^{1+\epsilon} / \sqrt{\phi}\right)$

  26. Previous results: same table as slide 25. Annotation on the approximation guarantees: Cheeger's inequality barrier.

  27. Previous results: same table as slide 25. Annotations: Cheeger's inequality barrier (approximation guarantees); running time depends only on $\phi$ and $S$.

  29. Clustering using PPR: Compute the approximate Personalized PageRank vector for v. (Figure node scores: 0.01, 0.07, 0.09, 0.002, 0.09, 0.09, 0.03, 0.06, 0.08; seed node v.)
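
One common way to compute an approximate Personalized PageRank vector locally is the push procedure of Andersen, Chung and Lang. The sketch below is an illustration under stated assumptions, not necessarily the variant used in the talk: it uses the lazy-walk push rule, assumes an undirected graph without isolated nodes or self-loops, and the names `approximate_ppr`, `alpha`, and `eps` are chosen for the example.

```python
from collections import deque

def approximate_ppr(adj, seed, alpha=0.15, eps=1e-4):
    """Push-style approximate PPR; returns (p, r) with p the estimate, r the residual.

    Only nodes whose residual reaches eps * degree are ever touched, so the
    work depends on the explored neighborhood rather than the whole graph.
    """
    p = {}
    r = {seed: 1.0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        deg_u = len(adj[u])
        ru = r.get(u, 0.0)
        if ru < eps * deg_u:
            continue
        # Push: keep alpha * ru as score, leave half of the remainder on u,
        # and spread the other half evenly over u's neighbors.
        p[u] = p.get(u, 0.0) + alpha * ru
        r[u] = (1 - alpha) * ru / 2
        share = (1 - alpha) * ru / (2 * deg_u)
        for w in adj[u]:
            old = r.get(w, 0.0)
            r[w] = old + share
            # Enqueue w only when it crosses the threshold (avoids duplicates).
            if old < eps * len(adj[w]) <= r[w]:
                queue.append(w)
        if r[u] >= eps * deg_u:
            queue.append(u)
    return p, r

# Toy graph (an assumption): two triangles joined by the single edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
p, r = approximate_ppr(adj, seed=0)
print(sorted(p.items()))
```

Each push conserves total mass, so the entries of `p` and `r` always sum to 1, and on exit every residual is below `eps` times the node's degree.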

  30. Clustering using PPR: Compute the approximate Personalized PageRank vector for v. Sort the nodes according to their normalized score $\mathrm{ppr}(v, u) / d(u)$. (Figure node scores after normalization: 0.005, 0.035, 0.0225, 0.001, 0.03, 0.03, 0.01, 0.02, 0.04; seed node v.)

  31. Clustering using PPR: Compute the approximate Personalized PageRank vector for v. Sort the nodes according to their normalized score. Select the sweep cut of best conductance. (Figure node scores after normalization: 0.005, 0.035, 0.0225, 0.001, 0.03, 0.03, 0.01, 0.02, 0.04; seed node v.)
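
The sort-and-sweep step can be sketched as follows. This is a minimal illustration: the function name `sweep_cut`, the toy graph, and the score vector `ppr` are made up for the example (they are not the values from the slide figure), and the graph is assumed undirected with no self-loops.

```python
def sweep_cut(adj, ppr):
    """Return the prefix of the degree-normalized ordering with the best conductance."""
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    # Sort by normalized score ppr(u) / d(u), highest first.
    order = sorted(ppr, key=lambda u: ppr[u] / len(adj[u]), reverse=True)
    in_set = set()
    vol = 0
    cut = 0
    best_phi, best_size = float("inf"), 0
    for i, u in enumerate(order, start=1):
        in_set.add(u)
        vol += len(adj[u])
        # Edges to nodes already in the set stop being cut edges;
        # the remaining edges of u become new cut edges.
        inside = sum(1 for w in adj[u] if w in in_set)
        cut += len(adj[u]) - 2 * inside
        denom = min(vol, total_vol - vol)
        if denom > 0 and cut / denom < best_phi:
            best_phi, best_size = cut / denom, i
    return set(order[:best_size]), best_phi

# Toy graph (an assumption): two triangles joined by the single edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
# Hypothetical PPR scores for seed 0, invented for this example.
ppr = {0: 0.30, 1: 0.20, 2: 0.28, 3: 0.12, 4: 0.05, 5: 0.05}
cluster, phi = sweep_cut(adj, ppr)
print(cluster, phi)
```

Updating the cut incrementally keeps the sweep linear in the volume of the scanned nodes, which matters when the score vector touches only a small part of a large graph.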

  32. Local clustering beyond Cheeger's barrier. Joint work with: Vahab Mirrokni (Google Research NY), Zeyuan Allen Zhu (MIT). ICML 2013.

  33. How should we define a cluster? Good clusters have low cut conductance $\phi$: $\phi = \frac{|\mathrm{cut}(C, V - C)|}{\min(\mathrm{Vol}(C), \mathrm{Vol}(V - C))}$

  34. How should we define a cluster? Good clusters have low cut conductance $\phi$: $\phi = \frac{|\mathrm{cut}(C, V - C)|}{\min(\mathrm{Vol}(C), \mathrm{Vol}(V - C))}$ Is it enough to define a good cluster?

  35. How should we define a cluster? Good clusters have low cut conductance $\phi$: $\phi = \frac{|\mathrm{cut}(C, V - C)|}{\min(\mathrm{Vol}(C), \mathrm{Vol}(V - C))}$ Is it enough to define a good cluster? Same cut conductance...

  36. How should we define a cluster? Good clusters have low cut conductance $\phi$: $\phi = \frac{|\mathrm{cut}(C, V - C)|}{\min(\mathrm{Vol}(C), \mathrm{Vol}(V - C))}$ Good clusters have good set conductance $\psi$: $\psi = \min_{S \subseteq C} \frac{|\mathrm{cut}(S, C - S)|}{\min(\mathrm{Vol}(S), \mathrm{Vol}(C - S))}$

  37. How should we define a cluster? Good clusters have low cut conductance $\phi$: $\phi = \frac{|\mathrm{cut}(C, V - C)|}{\min(\mathrm{Vol}(C), \mathrm{Vol}(V - C))}$ Good clusters have good set conductance $\psi$: $\psi = \min_{S \subseteq C} \frac{|\mathrm{cut}(S, C - S)|}{\min(\mathrm{Vol}(S), \mathrm{Vol}(C - S))}$ Can we do better when $\psi \gg \phi$?
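
To see how set conductance separates clusters that look alike from the outside, here is a brute-force sketch for very small clusters. It is exponential in the cluster size and meant only as an illustration. Two assumptions are made that the slides do not spell out: volumes are measured with degrees in the full graph, and only non-empty proper subsets S of C are considered. The function name and toy graph are hypothetical.

```python
from itertools import combinations

def set_conductance(adj, cluster):
    """Brute-force psi: min over splits of C of |cut(S, C-S)| / min(Vol(S), Vol(C-S))."""
    nodes = sorted(cluster)
    cluster = set(nodes)
    deg = {u: len(adj[u]) for u in nodes}   # assumption: degrees in the full graph
    vol_c = sum(deg.values())
    best = float("inf")
    # Keep nodes[0] on the S side so each split {S, C - S} is examined once.
    first, rest = nodes[0], nodes[1:]
    for k in range(len(rest)):              # leaves at least one node in C - S
        for extra in combinations(rest, k):
            s = {first, *extra}
            # Count only edges that stay inside the cluster.
            cut = sum(1 for u in s for w in adj[u]
                      if w in cluster and w not in s)
            vol_s = sum(deg[u] for u in s)
            best = min(best, cut / min(vol_s, vol_c - vol_s))
    return best

# Toy graph (an assumption): two triangles joined by the single edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
psi_triangle = set_conductance(adj, {0, 1, 2})
psi_whole = set_conductance(adj, set(adj))
print(psi_triangle, psi_whole)
```

The triangle is well connected internally (set conductance 2/3), while the six-node set splits along the bridge edge and has set conductance only 1/7.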

  38. Previous results: recap of the table on slides 25-27 (approximation guarantee and running time for [Spielman-Teng'04], [Spielman-Teng'08], [Andersen-Chung-Lang'06], [Andersen-Peres'08], [Gharan-Trevisan'12]). Annotations: Cheeger's inequality barrier; running time depends only on $\phi$ and $S$.

  39. Our hypothesis: We study the problem when $\frac{\phi}{\psi^2} < O\left(\frac{1}{\log n}\right)$.

  40. Our hypothesis: We study the problem when $\frac{\phi}{\psi^2} < O\left(\frac{1}{\log n}\right)$. A similar problem was studied by Makarychev et al. in STOC'12. They assume that $\phi < C \lambda_1$ and give a global SDP that can find communities with cut conductance $\phi$.

  41. Can we obtain the same results locally? Can we obtain a similar result using the Personalized PageRank? Theorem: If there exists a cluster of cut conductance $\phi$ and set conductance $\psi$, then normalized Personalized PageRank finds a cluster with conductance $\tilde{O}\left(\frac{\phi}{\psi}\right)$.

  42. Main proof ideas: Bound the probability of leaving a set in t steps, knowing that in each step we leave with probability $\phi$. (Figure label: V.)

  43. Main proof ideas: Bound the probability of leaving a set in t steps, knowing that in each step we leave with probability $\phi$. Suppose that we are mixed inside C; then we would leak $\phi$ probability mass at each step.

  44. Main proof ideas: Bound the probability of leaving a set in t steps, knowing that in each step we leave with probability $\phi$. So in $\frac{1}{\alpha}$ steps, we would leak $\frac{\phi}{\alpha}$.

  45. Main proof ideas: Bound the probability of leaving a set in t steps, knowing that in each step we leave with probability $\phi$. If we start from a good node: $\sum_{u \notin S} \mathrm{pr}(u) < \frac{2\phi}{\alpha}$
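
The leak argument on slides 42-45 can be written out as a short calculation. The following is a sketch under assumptions the slides do not state explicitly: $\mathrm{pr}$ is the PageRank vector with teleport probability $\alpha$, $W$ is the random walk matrix, and $s$ is a start distribution that is already mixed inside $S$. The factor 2 on slide 45 presumably comes from moving from this mixed start to a single good start node.

```latex
% Assumed notation: s = start distribution (mixed inside S), W = walk matrix,
% alpha = teleport probability, phi = cut conductance of S.
\begin{align*}
\mathrm{pr} &= \alpha \sum_{t \ge 0} (1-\alpha)^t \, s W^t, \\
\sum_{u \notin S} (s W^t)(u) &\le t\,\phi
  \qquad \text{(at most $\phi$ leaks per step)}, \\
\sum_{u \notin S} \mathrm{pr}(u)
  &\le \alpha \sum_{t \ge 0} (1-\alpha)^t \, t\,\phi
   = \frac{(1-\alpha)\,\phi}{\alpha}
   \le \frac{\phi}{\alpha}.
\end{align*}
```

The middle equality uses $\sum_{t \ge 0} t x^t = x/(1-x)^2$ with $x = 1 - \alpha$, which matches the intuition that a walk of expected length about $1/\alpha$ leaks about $\phi/\alpha$.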

  46. Main proof ideas: Inside S the random walk would be mixed in $\frac{1}{\psi^2}$ steps.

  47. Main proof ideas: Inside S the random walk would be mixed in $\frac{1}{\psi^2}$ steps. So after $\frac{1}{\psi^2}$ steps each node $u$ would have a score $\frac{d(u)}{\mathrm{Vol}(S)}$.

  48. Main proof ideas: Inside S the random walk would be mixed in $\frac{1}{\psi^2}$ steps. We can express the score of a node inside S as: $\mathrm{pr}(v) \ge \tilde{\mathrm{pr}}(v) - \mathrm{pr}_l(v)$

  49. Main proof ideas: Inside S the random walk would be mixed in $\frac{1}{\psi^2}$ steps. We can express the score of a node inside S as: $\mathrm{pr}(v) \ge \tilde{\mathrm{pr}}(v) - \mathrm{pr}_l(v)$ But we have a bound: $\sum_{v \in S} \mathrm{pr}_l(v) = \sum_{z \notin S} \mathrm{ppr}(z) \le \frac{2\phi}{\psi^2} < O\left(\frac{1}{\log n}\right)$

  50. Main proof ideas: We can prove that we find a set that partially overlaps with S:
      - Most of the nodes in the cluster have a high score
      - Most of the nodes outside the cluster have a low score

  51. Main proof ideas: We can prove that we find a set that partially overlaps with S. This implies a bound on conductance!

  52. Can we do better? Theorem 2: If there exists a cluster of cut conductance $\phi$ and set conductance $\psi$, then normalized Personalized PageRank finds a cluster with conductance $\Omega\left(\frac{\phi}{\psi}\right)$.

  53. Results:
      - Theorem 1: If there exists a cluster of cut conductance $\phi$ and set conductance $\psi$, then normalized Personalized PageRank finds a cluster with conductance $\tilde{O}\left(\frac{\phi}{\psi}\right)$.
      - Theorem 2: If there exists a cluster of cut conductance $\phi$ and set conductance $\psi$, then normalized Personalized PageRank finds a cluster with conductance $\Omega\left(\frac{\phi}{\psi}\right)$.

  54. Experiments

  56. Experiments: Experiments using the Watts-Strogatz model for the set S. As the gap decreases, precision increases.

  57. Conclusion and open problems

  58. Conclusion and open problems:
      - Random walk based techniques can be used to efficiently solve the similarity and clustering problems.
      - Internal connectivity is very important for random walk techniques.
      - Can we say something when the gap between internal and external connectivity is smaller?
