Exact Recovery for a Family of Community-Detection Generative Models (PowerPoint PPT Presentation)



SLIDE 1

Exact Recovery for a Family of Community-Detection Generative Models

Paolo Penna
joint work with Joachim Buhmann, Luca Corinzia, Luca Mondada
ISE Group, Institute for Machine Learning, ETH Zurich

SLIDE 2

A Toy Problem

Pick a random triangle, then add noise. Can we find the triangle?

SLIDE 7

A Toy Problem

Pick a random triangle and add Gaussian noise: the three triangle edges get weights ∼ N(µ, σ²), all other edges ∼ N(0, σ²). Find the triangle? Return the heaviest triangle. Whether this succeeds is a question of signal vs noise, and the construction defines a random graph model.
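A minimal simulation of this toy problem (a sketch; the function names are my own): sample a planted triangle on the complete graph, draw Gaussian edge weights as above, and recover by brute-force search for the heaviest triangle.

```python
import itertools
import random

def planted_triangle_instance(n, mu, sigma, seed=0):
    """Complete graph on n nodes: the 3 planted edges ~ N(mu, sigma^2),
    every other edge ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    planted = tuple(sorted(rng.sample(range(n), 3)))
    weights = {}
    for e in itertools.combinations(range(n), 2):
        in_triangle = set(e) <= set(planted)
        weights[e] = rng.gauss(mu if in_triangle else 0.0, sigma)
    return planted, weights

def heaviest_triangle(n, weights):
    """Maximum-likelihood guess: the triangle of maximum total edge weight."""
    triangles = itertools.combinations(range(n), 3)
    return max(triangles,
               key=lambda t: sum(weights[e] for e in itertools.combinations(t, 2)))
```

With a strong signal (large µ/σ) the heaviest triangle is the planted one with high probability; as the signal shrinks, recovery starts to fail.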

SLIDE 15

Flavor of the Problem

A planted random model = random noise + a planted solution. Can we recover the planted solution? Many variants: planted clique, planted bisection, stochastic block model, ...

SLIDE 17

Triangles ⇓ Our General Model

SLIDE 18

Generalization #1

Triangle (3 nodes) ⟹ planted community of k nodes.

SLIDE 19

Planted Random Models

Random graph with a planted community of k nodes:
  • Weighted Stochastic Block Model
  • Densest k-Subgraph Problem

SLIDE 22

Generalization #2

Edge (2 nodes) ⟹ hyperedge of h nodes.

SLIDE 23

Our Model

N nodes, a planted solution of k nodes, hyperedges of h nodes.

SLIDE 24

The Simplest Model

Random Energy Model (REM): M solutions 1, 2, · · · , M with independent weights ∼ N(0, σ²).

Planted REM (P-REM): the planted solution instead has weight ∼ N(µ, σ²). Recover the planted solution? Return the maximum-weight one: Maximum Likelihood (ML).
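The P-REM and its ML rule are easy to simulate (a sketch; the function names are my own): draw M independent weights, boost the planted one, and check how often the maximum lands on it.

```python
import random

def sample_prem(M, mu, sigma, rng):
    """P-REM instance: M independent weights ~ N(0, sigma^2);
    index 0 is the planted solution, with weight ~ N(mu, sigma^2)."""
    weights = [rng.gauss(0.0, sigma) for _ in range(M)]
    weights[0] = rng.gauss(mu, sigma)
    return weights

def ml_recover(weights):
    """Maximum likelihood: return the index of the maximum-weight solution."""
    return max(range(len(weights)), key=weights.__getitem__)

def success_rate(M, mu, sigma, trials=2000, seed=0):
    """Empirical P(success) of ML recovery over repeated instances."""
    rng = random.Random(seed)
    hits = sum(ml_recover(sample_prem(M, mu, sigma, rng)) == 0
               for _ in range(trials))
    return hits / trials
```

The empirical success rate climbs from about 1/M (no signal) to near 1 as µ/σ grows, which is the fail/success picture quantified on the later slides.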

SLIDE 31

“Simple vs Hard”

Random Graph vs. P-REM with M = C(N, k) solutions 1, 2, · · · , M. In the P-REM the solution weights are independent, and search is hard; in the random graph the weights are dependent (solutions share edges), and search is maybe easier.
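To see why exhaustive ML search over all solutions is hard, note how fast M = C(N, k) grows (a quick illustration using Python's math.comb):

```python
from math import comb

# M = C(N, k): the number of candidate solutions the ML rule must compare.
for N, k in [(20, 3), (100, 10), (1000, 20)]:
    print(f"N={N}, k={k}: M={comb(N, k)}")
```

Already at N = 100 and k = 10 there are more than 10¹³ candidate solutions.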

SLIDE 36

Our Contribution

SLIDE 37

Signal-to-Noise Ratio = µ̂/σ̂

P-REM with solutions 1, 2, · · · , M: recover the planted solution? Sharp threshold at SNR = 1: as M → ∞, P(success) → 0 for SNR < 1 (fail) and P(success) → 1 for SNR > 1 (success).

SLIDE 39

Overview of Contribution

N nodes, solution size k, hyperedge size h: this family of problems “collapses” to the P-REM.
  • h = k: the model is the P-REM itself (solutions 1, 2, · · · , M).
  • h = 1: the k-planted REM.
Each model has thresholds γfail and γsucc: P(success) → 0 for SNR < γfail (fail) and P(success) → 1 for SNR > γsucc (success).
Technique: a “reduction” to the P-REM.

SLIDE 45

Bounds

Recovery is by maximum likelihood (ML), in the regime k = o(log N); C(a, b) denotes the binomial coefficient.

  h           Model        γfail            γsucc
  1           k-P-REM      1                1
  2           Graph        1/(k−1)          2/(k−1)
  2 < h < k   Hypergraph   1/C(k−1, h−1)    2h/C(k−1, h−1)
  k           P-REM        1                1

P(success) → 0 for SNR < γfail (fail) and P(success) → 1 for SNR > γsucc (success). For intermediate h the thresholds are smaller, so recovery is easier.
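A small helper encoding these thresholds (my own reconstruction of the bounds table; C(a, b) is computed with Python's math.comb):

```python
from math import comb

def thresholds(h, k):
    """(gamma_fail, gamma_succ) for hyperedge size h and solution size k,
    per the bounds table (regime k = o(log N))."""
    if h == 1 or h == k:              # k-P-REM (h = 1) and P-REM (h = k) endpoints
        return 1.0, 1.0
    if h == 2:                        # graph case
        return 1 / (k - 1), 2 / (k - 1)
    if 2 < h < k:                     # hypergraph case
        c = comb(k - 1, h - 1)
        return 1 / c, 2 * h / c
    raise ValueError("expected 1 <= h <= k")
```

For intermediate h the binomial denominator is large, so both thresholds drop, matching the “recovery easier” annotation on the slide.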

SLIDE 51

Connections to Other Works

Edge distributions inside the community (p) vs. outside (q):

  • Planted Clique (Bernoulli, p = 1, q = 1/2; k from Θ(log N) up to √N): “A nearly tight sum-of-squares lower bound for the planted clique problem” (Barak et al., FOCS ’16).
  • Planted Bisection (Bernoulli, k = N/2): “Consistency thresholds for the planted bisection model” (Mossel, Neeman, and Sly, STOC ’15).
  • Stochastic Block Model (Bernoulli): “Exact recovery in the stochastic block model” (Abbe, Bandeira, and Hall, IEEE Transactions on Information Theory ’16); “Community detection in general stochastic block models: fundamental limits and efficient algorithms for recovery” (Abbe and Sandon, FOCS ’15).
  • Weighted Stochastic Block Model (generic distributions, k = N/c): “Information-theoretic bounds for exact recovery in weighted stochastic block models using the Rényi divergence” (Jog and Loh, arXiv 2015).
  • Hypergraph Stochastic Block Model (Bernoulli): “Consistency of spectral partitioning of uniform hypergraphs under planted partition model” (Ghoshdastidar and Dukkipati, NIPS ’14); “Consistency of spectral hypergraph partitioning under planted partition model” (Ghoshdastidar et al., The Annals of Statistics ’17).
  • Weighted Hypergraph Stochastic Block Model (Gaussian, k = N/2): “Community detection in hypergraphs, spiked tensor models, and Sum-of-Squares” (Kim, Bandeira, and Goemans, Int. Conf. on Sampling Theory and Applications ’17).

SLIDE 58

Open Questions

Computational Aspects, Other Problems

  • trade-off (dependency, recoverability, hardness)
  • our technique (“reduce” to REM) applied to other problems

SLIDE 59

Thank You!!