How Robust are Thresholds for Community Detection? (Ankur Moitra)


SLIDE 1

How Robust are Thresholds for Community Detection?

Ankur Moitra (MIT)

Robust Statistics Summer School

SLIDE 2

Let me tell you a story about the success of belief propagation and statistical physics…

SLIDE 4

THE STOCHASTIC BLOCK MODEL

Introduced by Holland, Laskey and Leinhardt (1983):

• k communities
• connection probabilities

Q = [ Q11 Q12 Q13
      Q12 Q22 Q23
      Q13 Q23 Q33 ]

• edges independent: a node in community i and a node in community j are joined with probability Qij

Ubiquitous model studied in statistics, computer science, information theory, statistical physics
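The generative process on this slide can be written down directly; here is a minimal sampling sketch (the function name and interface are my own, not from the talk):

```python
import random

def sample_sbm(sizes, Q, seed=0):
    """Sample a graph from the stochastic block model.

    sizes[c] is the number of nodes in community c, and Q[c][d] is the
    probability that a node in community c links to one in community d
    (Q is symmetric). Each edge is drawn independently.
    """
    rng = random.Random(seed)
    # Assign the first sizes[0] nodes to community 0, the next to 1, etc.
    labels = [c for c, s in enumerate(sizes) for _ in range(s)]
    n = len(labels)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if rng.random() < Q[labels[i]][labels[j]]]
    return labels, edges
```

For example, `sample_sbm([2, 2], [[1.0, 1.0], [1.0, 1.0]])` yields the complete graph on 4 nodes, while an all-zero Q yields no edges.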

SLIDE 10

Testbed for a diverse range of algorithms:

(1) Combinatorial methods, e.g. degree counting [Bui, Chaudhuri, Leighton, Sipser '87]
(2) Spectral methods, e.g. [McSherry '01]
(3) Markov chain Monte Carlo (MCMC), e.g. [Jerrum, Sorkin '98]
(4) Semidefinite programs, e.g. [Boppana '87]

These algorithms succeed in some ranges of parameters. Can we reach the fundamental limits of the SBM?

SLIDE 15

Following Decelle, Krzakala, Moore and Zdeborová (2011), let's study the sparse regime: two communities, with connection probability a/n within a community and b/n across, where a, b = O(1), so that there are O(n) edges.

Remark: The degree of each node is Poi(a/2 + b/2), hence there are many isolated nodes whose community we cannot find.

Goal (Partial Recovery): Find a partition that has agreement better than ½ with the true community structure.

Conjecture: Partial recovery is possible iff (a-b)² > 2(a+b)

The conjecture is based on fixed points of belief propagation…
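The conjectured condition is easy to evaluate for concrete a and b; a tiny illustrative helper (mine, not from the talk):

```python
def partial_recovery_possible(a, b):
    """Conjectured (and since proven) threshold for the sparse
    two-community SBM: partial recovery is possible iff
    (a - b)^2 > 2(a + b)."""
    return (a - b) ** 2 > 2 * (a + b)
```

For instance, (a, b) = (5, 1) gives 16 > 12, above the threshold, while (a, b) = (3, 2) gives 1 ≤ 10, below it.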

SLIDE 16

OUTLINE

Part I: Introduction
• The Stochastic Block Model
• Belief Propagation and its Predictions
• Semi-Random Models
• Our Results

Part II: Broadcast Tree Model
• The Kesten-Stigum Bound
• A First Semi-Random vs. Random Separation
• Our Results, continued

Part III: Above Average-Case?

SLIDE 18

BELIEF PROPAGATION

Introduced by Judea Pearl (1982):

“For fundamental contributions … to probabilistic and causal reasoning”

SLIDE 21

Adapted to community detection:

Message v → u: "Probability I think I am community #1, community #2, …" (do the same for all nodes)

u updates its beliefs, then sends back:

Message u → v: "New probability I think I am community #1, community #2, …" (do the same for all nodes)
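The message-passing scheme sketched on these slides can be written out for two communities. This is a toy sketch under stated simplifications (all names are mine; it keeps only the edge factors, dropping the non-edge correction, which is lower order in the sparse regime):

```python
import random

def bp_messages(n, edges, a, b, iters=20, seed=0):
    """Toy belief propagation for the two-community sparse SBM.

    m[(v, u)] is v's belief that it is "red", computed from all of v's
    neighbors except u. Edge factors: a for same community, b for
    different (the common 1/n scaling cancels in the normalization).
    """
    rng = random.Random(seed)
    nbrs = {v: set() for v in range(n)}
    for x, y in edges:
        nbrs[x].add(y)
        nbrs[y].add(x)
    # Small random perturbations around 1/2: starting exactly at the
    # trivial fixed point, BP would never move.
    m = {(v, u): 0.5 + 0.1 * (rng.random() - 0.5)
         for v in nbrs for u in nbrs[v]}
    for _ in range(iters):
        new = {}
        for (v, u) in m:
            p_red = p_blue = 1.0
            for w in nbrs[v]:
                if w == u:
                    continue
                p_red *= a * m[(w, v)] + b * (1.0 - m[(w, v)])
                p_blue *= b * m[(w, v)] + a * (1.0 - m[(w, v)])
            new[(v, u)] = p_red / (p_red + p_blue)
        m = new
    # Final beliefs combine messages from all neighbors.
    belief = {}
    for v in nbrs:
        p_red = p_blue = 1.0
        for w in nbrs[v]:
            p_red *= a * m[(w, v)] + b * (1.0 - m[(w, v)])
            p_blue *= b * m[(w, v)] + a * (1.0 - m[(w, v)])
        belief[v] = p_red / (p_red + p_blue)
    return belief
```

On a graph made of two dense clusters joined by a single edge, the beliefs separate the two sides (up to the global red/blue symmetry).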

SLIDE 28

THE TRIVIAL FIXED POINT

Belief propagation has a trivial fixed point where it gets stuck: every node u reports Pr[red] = ½, Pr[blue] = ½ to all of its neighbors.

Claim: No one knows anything, so you never have to update your beliefs.

Fact: If (a-b)² > 2(a+b) then the trivial fixed point is unstable.

Hope: Whatever it finds solves partial recovery. (Evidence based on simulations.)

And if (a-b)² ≤ 2(a+b) and it does get stuck, then maybe partial recovery is information theoretically impossible?

SLIDE 32

CONJECTURE IS PROVED!

Mossel, Neeman and Sly (2013) and Massoulié (2013):

Theorem: It is possible to find a partition that is correlated with the true communities iff (a-b)² > 2(a+b)

Later attempts based on SDPs only get to (a-b)² > C(a+b), for some C > 2.

Are nonconvex methods better than convex programs? How do the predictions of statistical physics and of SDPs compare?

SLIDE 33

OUTLINE

Part I: Introduction
• The Stochastic Block Model
• Belief Propagation and its Predictions
• Semi-Random Models
• Our Results

Part II: Broadcast Tree Model
• The Kesten-Stigum Bound
• A First Semi-Random vs. Random Separation
• Our Results, continued

Part III: Above Average-Case?

SLIDE 40

SEMI-RANDOM MODELS

Introduced by Blum and Spencer (1995), Feige and Kilian (2001):

(1) Sample a graph from the SBM
(2) An adversary can add edges within communities and delete edges crossing between them

Algorithms can no longer over-tune to the distribution.

SLIDE 44

A NON-ROBUST ALGORITHM

Consider the following SBM: connection probability ½ within each community and ¼ across.

Expected number of common neighbors:

Nodes from the same community: (½)²(n/2) + (¼)²(n/2)

Nodes from different communities: (½)(¼)n
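The two expectations on this slide can be checked numerically; a small helper of my own:

```python
def expected_common_neighbors(n, p_in, p_out):
    """Expected common neighbors of a pair of nodes in a two-community
    SBM with communities of size n/2, within-community edge probability
    p_in and cross-community probability p_out.
    Returns (same_community_pair, different_community_pair)."""
    # A common neighbor in either community contributes the product of
    # the two relevant edge probabilities.
    same = p_in ** 2 * (n / 2) + p_out ** 2 * (n / 2)
    diff = p_in * p_out * n
    return same, diff
```

With p_in = ½, p_out = ¼ and n = 32, same-community pairs expect 5 common neighbors and cross pairs only 4, so thresholding on common-neighbor counts separates the communities.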

SLIDE 48

A NON-ROBUST ALGORITHM

Semi-random adversary: add a clique to the red community, so the connection probabilities become 1 within red, ½ within blue, and ¼ across.

Expected number of common neighbors:

Nodes from the blue community: (½)²(n/2) + (¼)²(n/2)

Nodes from different communities: (¼)(n/2) + (½)(¼)(n/2)

Now a red-blue pair has more expected common neighbors than a blue-blue pair, so counting common neighbors misclassifies the blue nodes.
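Repeating the calculation after the adversary's "helpful" clique shows the counts cross over (helper names are mine):

```python
def counts_after_red_clique(n, p_blue=0.5, p_out=0.25):
    """Expected common neighbors after the red community becomes a
    clique (internal edge probability 1): blue-blue pairs vs. red-blue
    pairs."""
    blue_pair = p_blue ** 2 * (n / 2) + p_out ** 2 * (n / 2)
    # A red common neighbor of a red-blue pair now appears with
    # probability 1 * p_out; a blue one with probability p_blue * p_out.
    cross_pair = 1.0 * p_out * (n / 2) + p_blue * p_out * (n / 2)
    return blue_pair, cross_pair
```

At n = 32 a blue-blue pair expects 5 common neighbors while a red-blue pair expects 6, so the common-neighbor rule now groups blue nodes with red ones.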

SLIDE 49

OUTLINE

Part I: Introduction
• The Stochastic Block Model
• Belief Propagation and its Predictions
• Semi-Random Models
• Our Results

Part II: Broadcast Tree Model
• The Kesten-Stigum Bound
• A First Semi-Random vs. Random Separation
• Our Results, continued

Part III: Above Average-Case?

SLIDE 56

OUR RESULTS

"Helpful" changes can hurt:

Theorem: Community detection in the semirandom model is impossible for (a-b)² ≤ C_{a,b}(a+b), for some C_{a,b} > 2

But SDPs continue to work in the semirandom model (following the same blueprint as [Guédon, Vershynin]; see [Makarychev, Makarychev, Vijayaraghavan] for SDP-based robustness guarantees for k > 2 communities).

Reaching the information theoretic threshold requires exploiting the structure of the noise. This is the first separation between what is possible in random vs. semirandom models.

SLIDE 57

OUTLINE

Part I: Introduction
• The Stochastic Block Model
• Belief Propagation and its Predictions
• Semi-Random Models
• Our Results

Part II: Broadcast Tree Model
• The Kesten-Stigum Bound
• A First Semi-Random vs. Random Separation
• Our Results, continued

Part III: Above Average-Case?

SLIDE 59

Let’s start with a simpler model originating from genetics…

SLIDE 68

BROADCAST TREE MODEL

(1) The root is either red or blue
(2) Each node gives birth to Poi(a/2) nodes of the same color and Poi(b/2) nodes of the opposite color
(3) Goal: From the leaves and the unlabeled tree, guess the color of the root with probability > ½, independent of n (the number of levels)

This is the natural analogue of partial recovery.

For what values of a and b can we guess the root?
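The model above is easy to simulate; a minimal sketch (function names and the {+1, -1} color encoding are mine):

```python
import math
import random

def poisson(rng, lam):
    """Sample Poi(lam) by inversion (fine for the small rates here)."""
    u, k, p = rng.random(), 0, math.exp(-lam)
    cum = p
    while u > cum:
        k += 1
        p *= lam / k
        cum += p
    return k

def sample_broadcast_tree(a, b, levels, seed=0):
    """Broadcast tree model: each node gives birth to Poi(a/2) children
    of the same color and Poi(b/2) of the opposite color.
    Returns (root_color, leaf_colors) with colors in {+1, -1}."""
    rng = random.Random(seed)
    root = rng.choice([+1, -1])
    frontier = [root]
    for _ in range(levels):
        nxt = []
        for color in frontier:
            nxt += [color] * poisson(rng, a / 2)
            nxt += [-color] * poisson(rng, b / 2)
        frontier = nxt
    return root, frontier
```

With b = 0 no color ever flips, so every leaf matches the root; with a = b the leaves carry no information about it.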
slide-69
SLIDE 69

THE KESTEN STIGUM BOUND

“Best way to reconstruct root from leaves is majority vote”

slide-70
SLIDE 70

THE KESTEN STIGUM BOUND

“Best way to reconstruct root from leaves is majority vote” Theorem [Kesten, Stigum, ‘66]: Majority vote of the leaves succeeds with probability > ½ iff (a-b)2 > 2(a+b)

slide-71
SLIDE 71

THE KESTEN STIGUM BOUND

“Best way to reconstruct root from leaves is majority vote” Theorem [Kesten, Stigum, ‘66]: Majority vote of the leaves succeeds with probability > ½ iff (a-b)2 > 2(a+b) More generally, gave a limit theorem for multi-type branching processes

slide-72
SLIDE 72

THE KESTEN STIGUM BOUND

“Best way to reconstruct root from leaves is majority vote” Theorem [Kesten, Stigum, ‘66]: Majority vote of the leaves succeeds with probability > ½ iff (a-b)2 > 2(a+b) More generally, gave a limit theorem for multi-type branching processes Theorem [Evans et al., ‘00]: Reconstruction is information theoretically impossible if (a-b)2 ≤ 2(a+b)

slide-73
SLIDE 73

THE KESTEN STIGUM BOUND

“Best way to reconstruct root from leaves is majority vote” Theorem [Kesten, Stigum, ‘66]: Majority vote of the leaves succeeds with probability > ½ iff (a-b)2 > 2(a+b) More generally, gave a limit theorem for multi-type branching processes Theorem [Evans et al., ‘00]: Reconstruction is information theoretically impossible if (a-b)2 ≤ 2(a+b) Local view in SBM = Broadcast Tree
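Majority vote of the leaves can be simulated end to end. A self-contained sketch (sampler and all names are mine; not the talk's code):

```python
import math
import random

def poisson(rng, lam):
    # Inversion sampling from Poi(lam); fine for the small rates here.
    u, k, p = rng.random(), 0, math.exp(-lam)
    cum = p
    while u > cum:
        k += 1
        p *= lam / k
        cum += p
    return k

def majority_vote_success_rate(a, b, levels, trials, seed=0):
    """Estimate the probability that the majority color of the leaves
    equals the root color in the broadcast tree model (ties and extinct
    trees count as failures)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        root = rng.choice([+1, -1])
        frontier = [root]
        for _ in range(levels):
            nxt = []
            for color in frontier:
                nxt += [color] * poisson(rng, a / 2)
                nxt += [-color] * poisson(rng, b / 2)
            frontier = nxt
        if sum(frontier) * root > 0:
            wins += 1
    return wins / trials
```

With (a, b) = (8, 1), well above the Kesten-Stigum bound since (8-1)² = 49 > 2·9 = 18, the empirical success rate comes out well above ½.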

SLIDE 74

OUTLINE

Part I: Introduction
• The Stochastic Block Model
• Belief Propagation and its Predictions
• Semi-Random Models
• Our Results

Part II: Broadcast Tree Model
• The Kesten-Stigum Bound
• A First Semi-Random vs. Random Separation
• Our Results, continued

Part III: Above Average-Case?

SLIDE 80

SEMIRANDOM BROADCAST TREE MODEL

Definition: A semirandom adversary can cut edges between nodes of opposite colors and remove the entire subtree below the cut.

This is analogous to cutting edges between communities, and thereby changing the local neighborhoods, in the SBM.

Can the adversary usually flip the majority vote?

SLIDE 84

Key Observation: Some node's descendants vote the opposite way. Near the Kesten-Stigum bound, this happens everywhere. By cutting these edges, the adversary can usually flip the majority vote.
slide-85
SLIDE 85

This breaks majority vote, but how do we move the information theoretic threshold?

slide-86
SLIDE 86

This breaks majority vote, but how do we move the information theoretic threshold? Need carefully chosen adversary where we can prove things about the distribution we get after he’s done

slide-87
SLIDE 87

This breaks majority vote, but how do we move the information theoretic threshold? Need carefully chosen adversary where we can prove things about the distribution we get after he’s done e.g. If we cut every subtree where this happens, would mess up independence properties More likely to have red children, given his parent is red and he was not cut

slide-88
SLIDE 88

This breaks majority vote, but how do we move the information theoretic threshold? Need carefully chosen adversary where we can prove things about the distribution we get after he’s done Need to design adversary that puts us back into nice model e.g. a model on a tree where a sharp threshold is known

slide-89
SLIDE 89

This breaks majority vote, but how do we move the information theoretic threshold? Need carefully chosen adversary where we can prove things about the distribution we get after he’s done Need to design adversary that puts us back into nice model e.g. a model on a tree where a sharp threshold is known Following [Mossel, Neeman, Sly] we can embed the lower bound for semi-random BTM in semi-random SBM

slide-90
SLIDE 90

This breaks majority vote, but how do we move the information theoretic threshold? Need carefully chosen adversary where we can prove things about the distribution we get after he’s done Need to design adversary that puts us back into nice model e.g. a model on a tree where a sharp threshold is known Following [Mossel, Neeman, Sly] we can embed the lower bound for semi-random BTM in semi-random SBM e.g. Usual complication: once I reveal colors at boundary

  • f neighborhood, need to show there’s little information

you can get from rest of graph

slide-91
SLIDE 91

Part I: Introduction Ÿ The Stochastic Block Model Ÿ Belief Propagation and its Predictions Ÿ Semi-Random Models Ÿ Our Results Part II: Broadcast Tree Model Ÿ The Kesten-Stigum Bound Ÿ A First Semi-Random vs. Random Separation Ÿ Our Results, continued

OUTLINE

Part III: Above Average-Case?

slide-92
SLIDE 92

Part I: Introduction Ÿ The Stochastic Block Model Ÿ Belief Propagation and its Predictions Ÿ Semi-Random Models Ÿ Our Results Part II: Broadcast Tree Model Ÿ The Kesten-Stigum Bound Ÿ A First Semi-Random vs. Random Separation Ÿ Our Results, continued

OUTLINE

Part III: Above Average-Case?

SLIDE 95

SEMIRANDOM BROADCAST TREE MODEL

"Helpful" changes can hurt:

Theorem: Reconstruction in the semi-random broadcast tree model is impossible for (a-b)² ≤ C_{a,b}(a+b), for some C_{a,b} > 2

Is there any algorithm that succeeds in the semirandom BTM?

Theorem: Recursive majority succeeds in the semi-random broadcast tree model if (a-b)² > (2 + o(1))(a+b) log((a+b)/2)
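Recursive majority replaces one global vote of the leaves with local votes propagated up the tree. A minimal sketch (the tree encoding and names are my own):

```python
def recursive_majority(children, color, node):
    """Label a node by the majority over its children's recursive
    labels; leaves report their own observed color. Colors are +1/-1,
    ties are broken toward +1."""
    kids = children.get(node, [])
    if not kids:
        return color[node]
    s = sum(recursive_majority(children, color, k) for k in kids)
    return +1 if s >= 0 else -1
```

For example, on a root "r" with children "a" (itself a majority-red subtree), "b" (red leaf) and "c" (blue leaf), the recursive vote labels the root red.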

SLIDE 96

OUTLINE

Part I: Introduction
• The Stochastic Block Model
• Belief Propagation and its Predictions
• Semi-Random Models
• Our Results

Part II: Broadcast Tree Model
• The Kesten-Stigum Bound
• A First Semi-Random vs. Random Separation
• Our Results, continued

Part III: Above Average-Case?

SLIDE 102

Recursive majority is used in practice, despite the fact that it is known not to achieve the KS bound. Why?

Models are a measuring stick to compare algorithms, but are we studying the right ones?

Average-case models: when we have many algorithms, can we find the best one?

Semi-random models: when recursive majority works, it is not exploiting the structure of the noise. This is an axis on which recursive majority is superior.

SLIDE 105

BETWEEN WORST-CASE AND AVERAGE-CASE

Spielman and Teng (2001): "Explain why algorithms work well in practice, despite bad worst-case behavior." Usually called Beyond Worst-Case Analysis.

Semirandom models as Above Average-Case Analysis?

What else are we missing, if we only study problems in the average case?
slide-106
SLIDE 106

Let M be an unknown, low-rank matrix

≈ + … +

M

+

comedy drama sports

THE NETFLIX PROBLEM

slide-107
SLIDE 107

Let M be an unknown, low-rank matrix

≈ + … +

M

+

comedy drama sports Model: We are given random observations Mi,j for all i,j Ω

THE NETFLIX PROBLEM

slide-108
SLIDE 108

Let M be an unknown, low-rank matrix

≈ + … +

M

+

comedy drama sports Model: We are given random observations Mi,j for all i,j Ω Is there an efficient algorithm to recover M?

THE NETFLIX PROBLEM

SLIDE 110

CONVEX PROGRAMMING APPROACH

[Fazel], [Srebro, Shraibman], [Recht, Fazel, Parrilo], [Candès, Recht], [Candès, Tao], [Candès, Plan], [Recht], …

(P):  min ‖X‖_*  s.t.  |X_{i,j} - M_{i,j}| ≤ η for all (i,j) ∈ Ω

Here ‖X‖_* is the nuclear norm, i.e. the sum of the singular values of X.

Theorem: If M is n × n, has rank r, and is C-incoherent, then (P) recovers M exactly from C⁶ n r log² n observations.

SLIDE 113

ALTERNATING MINIMIZATION

[Keshavan, Montanari, Oh], [Jain, Netrapalli, Sanghavi], [Hardt]

Repeat:
  U ← argmin_U Σ_{(i,j) ∈ Ω} |(UVᵀ)_{i,j} - M_{i,j}|²
  V ← argmin_V Σ_{(i,j) ∈ Ω} |(UVᵀ)_{i,j} - M_{i,j}|²

Theorem: If M is n × n, has rank r, and is C-incoherent, then alternating minimization approximately recovers M from Cnr² ‖M‖_F²/σ_r² observations.

Running time and space complexity are better than for the convex program.
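The two alternating least-squares updates above can be sketched directly in NumPy. This is a plain sketch of the iteration, with no incoherence checks or careful spectral initialization (all names are mine):

```python
import numpy as np

def alternating_minimization(M_obs, mask, r, iters=100, seed=0):
    """Alternating least squares for matrix completion.

    With V fixed, each row of U solves a least-squares problem over that
    row's observed entries (mask == 1), and symmetrically for V.
    Returns the completed matrix U V^T.
    """
    rng = np.random.default_rng(seed)
    n1, n2 = M_obs.shape
    U = rng.standard_normal((n1, r))
    V = rng.standard_normal((n2, r))
    for _ in range(iters):
        for i in range(n1):
            cols = np.flatnonzero(mask[i])
            if cols.size:
                # Fit row i of U against the observed entries of row i.
                U[i], *_ = np.linalg.lstsq(V[cols], M_obs[i, cols],
                                           rcond=None)
        for j in range(n2):
            rows = np.flatnonzero(mask[:, j])
            if rows.size:
                # Fit row j of V against the observed entries of col j.
                V[j], *_ = np.linalg.lstsq(U[rows], M_obs[rows, j],
                                           rcond=None)
    return U @ V.T
```

On a small rank-one matrix with most entries observed, the iteration drives the residual on the observed entries to (numerically) zero.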

SLIDE 117

What if an adversary reveals more entries of M?

Convex program: (P) still works; the extra entries are just more constraints.

Alternating minimization: the analysis completely breaks down, since the observed matrix is no longer a good spectral approximation to M.

Are there variants that work in semi-random models?

SLIDE 118

Summary:
• "Helpful" adversaries can make the problem harder
• Gave the first random vs. semi-random separations
• Can we go above average-case analysis?

SLIDE 119

Thanks! Any Questions?
