Local Algorithms and Large Scale Graph Mining, Silvio Lattanzi



slide-1
SLIDE 1

Local Algorithms and Large Scale Graph Mining

Silvio Lattanzi (Google Research NY) Charles River Workshop on Private Analysis of Social Networks

slide-2
SLIDE 2

Outline

  • Problem and challenges: graph clustering, computational limitations.
  • Local random walk and node similarities: Personalized PageRank to detect similar nodes in a graph.
  • Local random walk and clustering in practice: Personalized PageRank and distributed clustering in practice.
  • Local clustering beyond Cheeger’s inequality: a local algorithm for finding well-connected clusters.

Charles River Workshop on Private Analysis of Social Networks

slide-3
SLIDE 3

Problem and challenges

Charles River Workshop on Private Analysis of Social Networks

slide-6
SLIDE 6

Local graph algorithms

Local algorithms: algorithms based on local message passing among nodes.

Advantages

  • Applicable to large-scale graphs
  • Fast and easy to implement in parallel (MapReduce, Hadoop, Pregel...)

slide-10
SLIDE 10

Problems

Similarity: construct a lightweight, robust similarity measure between non-adjacent nodes.

  • Motivation: the goal is to find a list of similar nodes.
  • Connection with link prediction: random-walk-based techniques, number of paths, Jaccard similarity...
  • Several variations: bipartite graphs, directed graphs...

slide-14
SLIDE 14

Problems

Clustering: find good clusters quickly in parallel.

  • New challenges: very large graphs, need for parallelizable solutions
  • Few approaches: random walks, hierarchical clustering, agglomerative clustering...
  • Different constraints: balanced clustering, size-constrained clustering...

slide-15
SLIDE 15

A useful technique

Similarity: construct a lightweight, robust similarity measure between non-adjacent nodes.

Clustering: find good clusters quickly in parallel.

A common approach based on random walks solves both problems.

slide-16
SLIDE 16

Local random walk and node similarities

Joint work with: Alessandro Epasto (Sapienza University), Jon Feldman (Google Research NY), Stefano Leonardi (Sapienza University), Vahab Mirrokni (Google Research NY). WWW 2014.

Charles River Workshop on Private Analysis of Social Networks

slide-18
SLIDE 18

A real world problem

Can we identify competitors of an Ads campaign?

[Figure: bipartite graph between campaigns and queries]

slide-24
SLIDE 24

A real world problem

Various approaches

[Figure: bipartite campaigns-queries graph with two campaigns u and v]

  • Common neighbors: 2
  • Jaccard similarity: $\frac{1}{2}$
  • Number of paths: 2
  • Short random walk (PPR)
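The first three measures above can be sketched in a few lines. This is only an illustration: the figure is not recoverable from the transcript, so the graph below is a hypothetical one chosen to reproduce the values on the slide, and the function names are assumptions.

```python
# Hypothetical bipartite graph: each campaign maps to the set of queries it is connected to.
# The query names are made up; the original figure is not available.
campaign_queries = {
    "u": {"q1", "q2", "q3"},
    "v": {"q2", "q3", "q4"},
}


def common_neighbors(a, b, adj):
    """Number of queries shared by campaigns a and b."""
    return len(adj[a] & adj[b])


def jaccard(a, b, adj):
    """Shared queries divided by all queries touched by either campaign."""
    union = adj[a] | adj[b]
    return len(adj[a] & adj[b]) / len(union) if union else 0.0


def two_step_paths(a, b, adj):
    """Paths a - q - b of length two; in a simple unweighted bipartite
    graph there is exactly one such path per shared query."""
    return common_neighbors(a, b, adj)


print(common_neighbors("u", "v", campaign_queries))
print(jaccard("u", "v", campaign_queries))
print(two_step_paths("u", "v", campaign_queries))
```

All three measures look only at the immediate neighborhood of u and v, which is why the talk moves on to a short random walk that also picks up longer connections.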

slide-25
SLIDE 25

Personalized PageRank

Personalized PageRank(v, u): the probability of visiting u in the following lazy random walk. At each step:

– With probability $\frac{1}{2}\alpha$, go back to v.
– With probability $\frac{1}{2}(1 - \alpha)$, go to a neighbor uniformly at random.

[Figure: random walk restarting at node v]
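A minimal sketch of how such scores can be computed exactly on a small graph by power iteration. It makes simplifying assumptions that are not from the talk: the walk is plain (non-lazy) with restart probability `alpha`, the graph is undirected with no isolated nodes, and the function name, the toy graph and the value 0.15 are purely illustrative.

```python
def personalized_pagerank(adj, source, alpha=0.15, iters=100):
    """Iterate p <- alpha * e_source + (1 - alpha) * p * W,
    where W is the uniform random-walk transition matrix of adj."""
    p = {n: 0.0 for n in adj}
    p[source] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in adj}
        nxt[source] += alpha  # restart mass returns to the source
        for n in adj:
            share = (1 - alpha) * p[n] / len(adj[n])
            for m in adj[n]:
                nxt[m] += share  # the rest moves to a uniformly random neighbor
        p = nxt
    return p


# Hypothetical toy graph: a triangle v-a-b with a pendant node c attached to b.
toy_graph = {
    "v": ["a", "b"],
    "a": ["v", "b"],
    "b": ["v", "a", "c"],
    "c": ["b"],
}

scores = personalized_pagerank(toy_graph, "v")
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```

The scores form a probability distribution concentrated around the source, so ranking nodes by score gives a list of candidates similar to v.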

slide-31
SLIDE 31

We had ground truth data

Experimental comparison

Charles River Workshop on Private Analysis of Social Networks

slide-33
SLIDE 33

Approximate personalized PageRank

Approximate efficiently. Two main approaches:

  • Monte Carlo techniques
  • Push-score techniques
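The Monte Carlo approach can be sketched as follows: simulate many short walks from the source, stop each walk with probability `alpha` at every step, and use the empirical distribution of the end points as the estimate. This is an illustration under assumptions (plain non-lazy walk, fixed seed, hypothetical toy graph and function name), not the implementation used in the talk.

```python
import random


def monte_carlo_ppr(adj, source, alpha=0.15, walks=20000, seed=0):
    """Estimate personalized PageRank from `source` by sampling walk end points.
    A walk of length t is produced with probability alpha * (1 - alpha)**t,
    which matches the weights in the PageRank series."""
    rng = random.Random(seed)
    visits = {n: 0 for n in adj}
    for _ in range(walks):
        node = source
        while rng.random() >= alpha:  # continue with probability 1 - alpha
            node = rng.choice(adj[node])
        visits[node] += 1
    return {n: c / walks for n, c in visits.items()}


# Hypothetical toy graph: a triangle v-a-b with a pendant node c attached to b.
toy_graph = {
    "v": ["a", "b"],
    "a": ["v", "b"],
    "b": ["v", "a", "c"],
    "c": ["b"],
}

estimate = monte_carlo_ppr(toy_graph, "v")
print(estimate)
```

Each walk touches only the nodes it visits, so the cost depends on the number and length of the walks rather than on the size of the graph.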

Charles River Workshop on Private Analysis of Social Networks

slide-35
SLIDE 35

Large scale computations

The campaigns-queries graph is lopsided: millions of campaigns and hundreds of millions of queries.

[Figure: bipartite graph between campaigns and queries]

slide-39
SLIDE 39

Large scale computations

We can reduce to a computation only on campaigns

[Figure: campaigns-queries graph with campaigns u, v, w, z, and the reduced graph on the campaigns only]

$w(u, v) = \sum_{q \in N(u) \cap N(v)} \frac{w(u, q)\, w(q, v)}{d(q)}$

$\mathrm{PPR}_B(u, v) = \frac{1}{2 - \alpha}\, \mathrm{PPR}_G(u, v)$
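The weight formula for the campaign-only graph can be sketched as below. It assumes that d(q) is the weighted degree of query q (the slide does not say), and the edge-list format, function name and example weights are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def reduce_to_campaigns(weights):
    """weights maps (campaign, query) -> edge weight in the bipartite graph.
    Returns w(u, v) = sum over shared queries q of w(u, q) * w(q, v) / d(q)."""
    by_query = defaultdict(dict)
    for (campaign, query), w in weights.items():
        by_query[query][campaign] = w

    reduced = defaultdict(float)
    for incident in by_query.values():
        d_q = sum(incident.values())  # assumption: d(q) is the weighted degree of q
        for u, v in combinations(sorted(incident), 2):
            reduced[(u, v)] += incident[u] * incident[v] / d_q
    return dict(reduced)


# Hypothetical example: q1 is shared by u and v, q2 by u, v and w.
example = {
    ("u", "q1"): 1.0, ("v", "q1"): 1.0,
    ("u", "q2"): 1.0, ("v", "q2"): 1.0, ("w", "q2"): 1.0,
}
print(reduce_to_campaigns(example))
```

Iterating over queries rather than over campaign pairs means only pairs that actually share a query are ever materialized.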

slide-41
SLIDE 41

Can we identify competitors of an Ads campaign in a specific category?

Extensions

Charles River Workshop on Private Analysis of Social Networks

Campaigns Queries

New York and sport

slide-42
SLIDE 42

Can we identify competitors of an Ads campaign in a specific category?

Extensions

Charles River Workshop on Private Analysis of Social Networks

Campaigns Queries

New York and pizza

slide-43
SLIDE 43

Can we identify competitors of an Ads campaign in a specific category?

Extension

Charles River Workshop on Private Analysis of Social Networks

Campaigns Queries

In this setting too, some pre-computation lets us compute the PPR efficiently.

slide-44
SLIDE 44

Local random walk and clustering in practice

Joint work with: Raimondas Kiveris (Google Research NY) Vahab Mirrokni (Google Research NY)

Charles River Workshop on Private Analysis of Social Networks

slide-47
SLIDE 47

It would be nice to have the number and the length of all the possible paths between two nodes. Infeasible. We are interested only in strong relationships, so we can sample.

Some basic intuitions

Charles River Workshop on Private Analysis of Social Networks

slide-49
SLIDE 49

Run several truncated random walks of a specific length. Local algorithms based on this intuition: truncated random walk, Personalized PageRank, evolving set.

Truncated random walk techniques

Charles River Workshop on Private Analysis of Social Networks

slide-52
SLIDE 52

Nice experimental properties of PPR

We can approximate it efficiently in MapReduce by analyzing short random walks recursively.

It works well in synthetic settings.

It works well in practice:

  • On public graphs with 8M nodes: Overlapping Clustering and Distributed Computation (WSDM'11, Andersen, Gleich, Mirrokni)
  • On the YouTube co-watch graph with 100M nodes, using 100s of machines: Large-scale Community Detection on Youtube graph (ICWSM'11, Gargi, Lu, Mirrokni, Yoon)
  • For sybil detection in social networks: The evolution of Sybil Defense via Social Networks (S&P’13, Alvisi, Clement, Epasto, Lattanzi, Panconesi)

Charles River Workshop on Private Analysis of Social Networks

slide-58
SLIDE 58

Why does it work?

Suppose we have a set C with few edges going outside. Most of the time a random walk will stay in C. It is possible to bound the amount of score that goes outside C.

[Figure: cluster C containing the node v]

Charles River Workshop on Private Analysis of Social Networks

slide-59
SLIDE 59

Local clustering via random walk

Charles River Workshop on Private Analysis of Social Networks

slide-61
SLIDE 61

How should we define a cluster?

Good clusters have cut conductance φ:

$\phi = \frac{|\mathrm{cut}(C, V - C)|}{\min(\mathrm{Vol}(C), \mathrm{Vol}(V - C))}$

[Figure: example cluster with $\phi = \frac{1}{17}$]
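As a concrete reading of the definition, here is a small sketch that computes the cut conductance of a node set in an unweighted, undirected graph stored as adjacency lists. The function name and the example graph (two triangles joined by one edge) are illustrative assumptions.

```python
def conductance(adj, cluster):
    """|cut(C, V - C)| / min(Vol(C), Vol(V - C)), where Vol is the sum of degrees."""
    cluster = set(cluster)
    cut = sum(1 for u in cluster for v in adj[u] if v not in cluster)
    vol_c = sum(len(adj[u]) for u in cluster)
    vol_rest = sum(len(adj[u]) for u in adj if u not in cluster)
    denom = min(vol_c, vol_rest)
    return cut / denom if denom else float("inf")


# Hypothetical example: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
two_triangles = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
print(conductance(two_triangles, {0, 1, 2}))
```

For the left triangle there is one cut edge and both sides have volume 7, so the conductance is 1/7.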

Charles River Workshop on Private Analysis of Social Networks

slide-63
SLIDE 63

Set of minimum conductance

The problem is NP-hard. Algorithms:

  • Spectral algorithms [Jerrum&Sinclair’89]: $\phi(S) = O(\sqrt{\phi})$
  • [Leighton-Rao’99]: $\phi(S) = O(\log n)\,\phi$
  • [Arora-Rao-Vazirani’04]: $\phi(S) = O(\sqrt{\log n})\,\phi$

Running time is at least linear in the size of the graph...

Charles River Workshop on Private Analysis of Social Networks

slide-65
SLIDE 65

Local Graph Clustering

Do we really need to explore all the graph?!?

Charles River Workshop on Private Analysis of Social Networks

slide-66
SLIDE 66

Local Clustering Algorithm

Given a good node v inside a target cluster S, the algorithm:

  • Returns a set around v of good conductance
  • Runs in time proportional to the size of the output
  • Explores only the local neighborhood of v
  • Returns a set of roughly the same size as S

Charles River Workshop on Private Analysis of Social Networks

slide-70
SLIDE 70

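Previous results

Approximation guarantee and running time of each algorithm:

  • Truncated random walk [Spielman-Teng’04]: approximation $\phi^{1/3} \log^{2/3} n$, running time $\tilde{O}\left(\frac{\mathrm{Vol}(S)}{\phi^{5/3}}\right)$
  • Truncated random walk [Spielman-Teng’08]: approximation $\sqrt{\phi} \log^{3/2} n$, running time $\tilde{O}\left(\frac{\mathrm{Vol}(S)}{\phi^{2}}\right)$
  • PageRank random walk [Andersen-Chung-Lang’06]: approximation $\sqrt{\phi \log n}$, running time $\tilde{O}\left(\frac{\mathrm{Vol}(S)}{\phi}\right)$
  • Evolving Set [Andersen-Peres’08]: approximation $\sqrt{\phi \log n}$, running time $\tilde{O}\left(\frac{\mathrm{Vol}(S)}{\sqrt{\phi}}\right)$
  • Evolving Set [Gharan-Trevisan’12]: approximation $\sqrt{\phi / \epsilon}$, running time $\tilde{O}\left(\frac{\mathrm{Vol}(S)^{1+\epsilon}}{\sqrt{\phi}}\right)$

Running time depends only on S and φ.

Cheeger’s inequality barrier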

Charles River Workshop on Private Analysis of Social Networks

slide-72
SLIDE 72

Clustering using PPR

Approximate Personalized PageRank vector for v. Sort the nodes according to their normalized score $\frac{\mathrm{ppr}(v, u)}{d(u)}$.

[Figure: graph around v with the normalized score of each node]

Charles River Workshop on Private Analysis of Social Networks

slide-73
SLIDE 73

Clustering using PPR

  • Approximate Personalized PageRank vector for v
  • Sort the nodes according to their normalized score
  • Select the sweep cut of best conductance

[Figure: graph around v with the normalized score of each node and the selected sweep cut]
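The three steps above can be sketched as a sweep over the nodes in decreasing order of score divided by degree, tracking the conductance of each prefix. This is a sketch under assumptions: the graph is simple, undirected and unweighted, the score vector is taken as given (the values below are made up, not computed by PPR), and the function name is hypothetical.

```python
def sweep_cut(adj, scores):
    """Return the prefix of the degree-normalized score ordering with the best conductance."""
    order = sorted(
        (n for n in scores if scores[n] > 0),
        key=lambda n: scores[n] / len(adj[n]),
        reverse=True,
    )
    total_vol = sum(len(adj[n]) for n in adj)
    in_set, vol, cut = set(), 0, 0
    best_set, best_phi = None, float("inf")
    for n in order:
        in_set.add(n)
        vol += len(adj[n])
        # Edges from n to nodes already inside stop being cut edges;
        # edges from n to nodes still outside become cut edges.
        inside = sum(1 for m in adj[n] if m in in_set)
        cut += len(adj[n]) - 2 * inside
        denom = min(vol, total_vol - vol)
        if denom > 0 and cut / denom < best_phi:
            best_phi, best_set = cut / denom, set(in_set)
    return best_set, best_phi


# Hypothetical example: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3,
# with made-up scores concentrated on the left triangle.
two_triangles = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
made_up_scores = {0: 0.30, 1: 0.25, 2: 0.30, 3: 0.09, 4: 0.03, 5: 0.03}
print(sweep_cut(two_triangles, made_up_scores))
```

Because the cut size is updated incrementally, the sweep costs time proportional to the volume of the nodes with nonzero score, which is what keeps the procedure local.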

Charles River Workshop on Private Analysis of Social Networks

slide-74
SLIDE 74

Local clustering beyond Cheeger’s barrier

Joint work with: Vahab Mirrokni (Google Research NY), Zeyuan Allen Zhu (MIT). ICML 2013.

Charles River Workshop on Private Analysis of Social Networks

slide-77
SLIDE 77

How should we define a cluster?

Good clusters have cut conductance φ. Is it enough to define a good cluster? Same cut conductance...

$\phi = \frac{|\mathrm{cut}(C, V - C)|}{\min(\mathrm{Vol}(C), \mathrm{Vol}(V - C))}$

Charles River Workshop on Private Analysis of Social Networks

slide-79
SLIDE 79

How should we define a cluster?

Good clusters have cut conductance φ. Good clusters have good set conductance ψ. Can we do better when ψ >> φ?

$\phi = \frac{|\mathrm{cut}(C, V - C)|}{\min(\mathrm{Vol}(C), \mathrm{Vol}(V - C))}$

$\psi = \min_{S \subseteq C} \frac{|\mathrm{cut}(S, C - S)|}{\min(\mathrm{Vol}(S), \mathrm{Vol}(C - S))}$
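To make the set conductance ψ concrete, here is a brute-force sketch that tries every nonempty proper subset S of a small cluster C. It assumes volumes are measured inside the subgraph induced by C (the slide does not spell this out), it is exponential in the size of C, and the function name and examples are hypothetical.

```python
from itertools import combinations


def set_conductance(adj, cluster):
    """min over nonempty proper S of C of |cut(S, C - S)| / min(Vol(S), Vol(C - S)),
    with cut and volume taken in the subgraph induced by C (an assumption)."""
    cluster = sorted(cluster)
    members = set(cluster)
    inner = {u: [v for v in adj[u] if v in members] for u in cluster}
    total = sum(len(inner[u]) for u in cluster)
    best = float("inf")
    for size in range(1, len(cluster)):
        for subset in combinations(cluster, size):
            s = set(subset)
            cut = sum(1 for u in s for v in inner[u] if v not in s)
            vol = sum(len(inner[u]) for u in s)
            denom = min(vol, total - vol)
            if denom > 0:
                best = min(best, cut / denom)
    return best


# Hypothetical examples: a triangle is well connected inside, a path is not.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(set_conductance(triangle, {0, 1, 2}))
print(set_conductance(path, {0, 1, 2, 3}))
```

Two clusters can have the same cut conductance toward the rest of the graph while differing sharply in ψ, which is the distinction the slide draws.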

Charles River Workshop on Private Analysis of Social Networks

slide-82
SLIDE 82

Our hypothesis

We study the problem when $\frac{\phi}{\psi^2} < O\left(\frac{1}{\log n}\right)$.

A similar problem was studied by Makarychev et al. in STOC'12. They assume that $\frac{\phi}{\lambda_1} < C$ and give a global SDP that can find communities with cut conductance $\phi$.

Charles River Workshop on Private Analysis of Social Networks

slide-83
SLIDE 83

Can we obtain the same results locally?

Can we obtain a similar result using the Personalized PageRank?

Theorem: If there is a cluster of cut conductance $\phi$ and set conductance $\psi$, then normalized Personalized PageRank finds a cluster with conductance $\tilde{O}\left(\frac{\phi}{\psi}\right)$.

Charles River Workshop on Private Analysis of Social Networks

slide-85
SLIDE 85

Main proof ideas

Bound the probability of leaving a set in t steps, knowing that in each step we leave with probability $\phi$. Suppose that we are mixed inside C; then we would leak $\phi$ probability mass at each step.

Charles River Workshop on Private Analysis of Social Networks

slide-86
SLIDE 86

Main proof ideas

Bound the probability of leaving a set in t steps, knowing that in each step we leave with probability $\phi$. So in $\frac{1}{\alpha}$ steps, we would leak $\frac{\phi}{\alpha}$.

Charles River Workshop on Private Analysis of Social Networks

slide-87
SLIDE 87

Main proof ideas

Bound the probability of leaving a set in t steps, knowing that in each step we leave with probability $\phi$. If we start from a good node, the bound is:

$\sum_{u \notin S} \mathrm{pr}(u) < \frac{2\phi}{\alpha}$

Charles River Workshop on Private Analysis of Social Networks

slide-89
SLIDE 89

Main proof ideas

Inside S the random walk would be mixed in $\frac{1}{\psi^2}$ steps. So after $\frac{1}{\psi^2}$ steps each node would have a score of $\frac{d(u)}{\mathrm{Vol}(S)}$.

Charles River Workshop on Private Analysis of Social Networks

slide-91
SLIDE 91

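Main proof ideas

Inside S the random walk would be mixed in $\frac{1}{\psi^2}$ steps. We can express the score of a node inside S as:

$\mathrm{pr}(v) \ge \tilde{\mathrm{pr}}(v) - \mathrm{pr}_l(v)$

But we have a bound:

$\sum_{v \in S} \mathrm{pr}_l(v) = \sum_{z \notin S} \mathrm{ppr}(z) \le 2\,\frac{\phi}{\psi^2} < O\left(\frac{1}{\log n}\right)$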

Charles River Workshop on Private Analysis of Social Networks

slide-92
SLIDE 92

Main proof ideas

We can prove that we find a set that partially overlaps with S:

  • Most nodes in the cluster have a high score
  • Most nodes outside the cluster have a low score

Charles River Workshop on Private Analysis of Social Networks

slide-93
SLIDE 93

We can prove that we find a set that partially overlaps with S. This implies a bound on the conductance!

Main proof ideas

Charles River Workshop on Private Analysis of Social Networks

slide-94
SLIDE 94

Can we do better?

Theorem 2: If there is a cluster of cut conductance $\phi$ and set conductance $\psi$, then normalized Personalized PageRank finds a cluster with conductance $\Omega\left(\frac{\phi}{\psi}\right)$.

Charles River Workshop on Private Analysis of Social Networks

slide-95
SLIDE 95

Results

Theorem 1: If there is a cluster of cut conductance $\phi$ and set conductance $\psi$, then normalized Personalized PageRank finds a cluster with conductance $\tilde{O}\left(\frac{\phi}{\psi}\right)$.

Theorem 2: If there is a cluster of cut conductance $\phi$ and set conductance $\psi$, then normalized Personalized PageRank finds a cluster with conductance $\Omega\left(\frac{\phi}{\psi}\right)$.

Charles River Workshop on Private Analysis of Social Networks

slide-98
SLIDE 98

Experiments

Experiments using the Watts-Strogatz model for the set S. As the gap decreases, precision increases.

Charles River Workshop on Private Analysis of Social Networks

slide-100
SLIDE 100

Conclusion and open problems

  • Random-walk-based techniques can be used to solve the similarity and the clustering problems efficiently.
  • Internal connectivity is very important for random walk techniques.
  • Can we say something when the gap between internal and external connectivity is smaller?

Charles River Workshop on Private Analysis of Social Networks

slide-101
SLIDE 101

Thanks!

Charles River Workshop on Private Analysis of Social Networks