Exact Recovery for a Family of Community-Detection Generative Models

Paolo Penna
joint work with Joachim Buhmann, Luca Corinzia, Luca Mondada
ISE Group, Inst. for Machine Learning, ETH Zurich
A Toy Problem

pick a random triangle: its three edges get weights ∼ N(µ, σ²)
add noise: every other edge gets a weight ∼ N(0, σ²)
Find the triangle? Return the heaviest triangle — signal vs noise
a planted random graph model
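The toy problem fits in a few lines of simulation. A minimal sketch (all function names are my own, not from the talk): the planted triangle's three edges get mean-µ Gaussian weights, every other edge gets mean-0 noise, and the estimator returns the heaviest triangle.

```python
import itertools
import random

def planted_triangle_instance(n, mu, sigma, seed=0):
    """Toy model from the slide: pick a random triangle, give its three
    edges mean-mu Gaussian weights, and give every other edge of the
    complete graph on n nodes mean-0 noise with the same variance."""
    rng = random.Random(seed)
    planted = tuple(sorted(rng.sample(range(n), 3)))
    weights = {}
    for e in itertools.combinations(range(n), 2):
        mean = mu if set(e) <= set(planted) else 0.0
        weights[e] = rng.gauss(mean, sigma)
    return planted, weights

def heaviest_triangle(n, weights):
    """The slide's estimator: the triangle of maximum total edge weight."""
    return max(itertools.combinations(range(n), 3),
               key=lambda t: sum(weights[e] for e in itertools.combinations(t, 2)))

planted, weights = planted_triangle_instance(n=12, mu=6.0, sigma=1.0)
print(heaviest_triangle(12, weights) == planted)  # usually True when mu >> sigma
```

With µ large relative to σ the heaviest triangle is almost always the planted one; as µ shrinks toward the noise level, recovery starts to fail — exactly the signal-vs-noise trade-off the slide points at.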
Flavor of the Problem

Recover the planted solution? A planted random model: planted solution + random noise.
Many variants: planted clique, planted bisection, stochastic block model, ...
Planted Random Models

Random graph with a planted community of size k:
Weighted Stochastic Block Model
Densest k-Subgraph Problem
Our Model

N nodes; solutions of size k; hyperedges of size h
The Simplest Model

Random Energy Model (REM): solutions 1, 2, . . . , M with independent weights ∼ N(0, σ²)
Planted REM (P-REM): one planted solution with weight ∼ N(µ, σ²)
Recover the planted solution? Return the max-weight one = Maximum Likelihood (ML)
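The P-REM and its ML estimator can be sketched directly from the slide's definitions (putting the planted solution at index 0 is an arbitrary convention of this sketch):

```python
import random

def prem_instance(M, mu, sigma, seed=0):
    """P-REM: M independent N(0, sigma^2) solution weights, with the
    planted solution (index 0 by convention) drawn from N(mu, sigma^2)."""
    rng = random.Random(seed)
    weights = [rng.gauss(0.0, sigma) for _ in range(M)]
    weights[0] += mu  # shift index 0 to mean mu: the planted solution
    return weights

def ml_recovery(weights):
    """Maximum likelihood in this model is simply the max-weight solution."""
    return max(range(len(weights)), key=weights.__getitem__)

weights = prem_instance(M=1000, mu=8.0, sigma=1.0)
print(ml_recovery(weights))  # returns 0 whenever the signal beats the noise maximum
```

Because the weights are independent Gaussians with a common variance, the likelihood of a candidate solution is monotone in its weight, which is why argmax-of-weights is exactly ML here.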
“Simple vs Hard”

Random Graph vs P-REM, both with M = (N choose k) solutions:
P-REM: solution weights independent — search is hard
Random Graph: solution weights dependent (solutions share edges) — search maybe easier
Signal-to-Noise Ratio

γ = µ̂ / σ̂  (rescaled signal and noise)

P-REM with solutions 1, 2, . . . , M — recover the planted solution?
As M → ∞: γ < 1 ⇒ P(success) → 0;  γ > 1 ⇒ P(success) → 1
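The M → ∞ dichotomy can be checked numerically. A sketch, assuming the standard REM extreme-value scaling µ = γ·σ·√(2 ln M) (this scaling is not stated on the slide; it is what places the threshold at γ = 1):

```python
import math
import random

def success_rate(gamma, M=10000, sigma=1.0, trials=100, seed=0):
    """Monte Carlo estimate of P(ML recovers the planted solution) in the
    P-REM. Assumption: mu = gamma * sigma * sqrt(2 ln M), the usual REM
    extreme-value scaling, so gamma plays the role of the rescaled SNR."""
    rng = random.Random(seed)
    mu = gamma * sigma * math.sqrt(2.0 * math.log(M))
    wins = 0
    for _ in range(trials):
        planted = rng.gauss(mu, sigma)                                # planted weight
        noise_max = max(rng.gauss(0.0, sigma) for _ in range(M - 1))  # best impostor
        wins += planted > noise_max
    return wins / trials

# below gamma = 1 recovery mostly fails, above it recovery mostly succeeds
print(success_rate(0.5), success_rate(1.5))
```

The comparison works because the maximum of M − 1 independent N(0, σ²) variables concentrates around σ√(2 ln M), so the planted weight wins precisely when γ exceeds 1.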
Overview of Contribution

A family of problems — N nodes, solutions of size k, hyperedges of size h — that “collapses” to the P-REM:
h = k: the P-REM itself, with solutions 1, 2, . . . , M
h = 1: a k-planted REM
Same sharp-threshold picture as for the P-REM, with thresholds γfail (below: fail) and γsucc (above: success)
Technique: a “reduction” to the P-REM
Bounds

(recovery of the planted solution by maximum likelihood, for k = o(log N))

  h           Model        γfail             γsucc
  1           k-P-REM      1                 1
  2           Graph        (k−1)/2           k−1
  2 < h < k   Hypergraph   C(k−1, h−1)/2     C(k−1, h−1)
  k           P-REM        1                 1

γ < γfail ⇒ fail,  γ > γsucc ⇒ success;  recovery easier
Connections to Other Works

(Bernoulli edge probabilities: p inside the planted part, q outside)

Planted Clique (Bernoulli, p = 1, q = 1/2): recoverable at k = Θ(log N); sum-of-squares lower bound up to k = √N — “A nearly tight sum-of-squares lower bound for the planted clique problem” (Barak et al., FOCS ’16)

Planted Bisection / Stochastic Block Model (Bernoulli, k = N/2):
“Consistency thresholds for the planted bisection model” (Mossel, Neeman, and Sly, STOC ’15)
“Exact recovery in the stochastic block model” (Abbe, Bandeira, and Hall, IEEE Transactions on Information Theory ’16)
“Community detection in general stochastic block models: fundamental limits and efficient algorithms for recovery” (Abbe and Sandon, FOCS ’15)

Weighted Stochastic Block Model (generic distributions, k = N/c):
“Information-theoretic bounds for exact recovery in weighted stochastic block models using the Renyi divergence” (Jog and Loh, arXiv 2015)

Hypergraph Stochastic Block Model (Bernoulli):
“Consistency of spectral partitioning of uniform hypergraphs under planted partition model” (Ghoshdastidar and Dukkipati, NIPS ’14)
“Consistency of spectral hypergraph partitioning under planted partition model” (Ghoshdastidar et al., The Annals of Statistics ’17)

Weighted Hypergraph Stochastic Block Model (Gaussian, k = N/2):
“Community detection in hypergraphs, spiked tensor models, and Sum-of-Squares” (Kim, Bandeira, and Goemans, Int. Conf. on Sampling Theory and Applications ’17)
Open Questions