Introduction to the Low-Degree Polynomial Method
Alex Wein

Courant Institute, New York University

1 / 31

Part I: Why Low-Degree Polynomials?

2 / 31

Problems in High-Dimensional Statistics

Example: finding a large clique in a random graph
◮ Detection: distinguish between a random graph and a graph with a planted clique
◮ Recovery: given a graph with a planted clique, find the clique
◮ Optimization: given a random graph (with no planted clique), find as large a clique as possible

Common to have information-computation gaps, e.g. planted k-clique (either detection or recovery)

What makes problems easy vs hard?

3 / 31

The Low-Degree Polynomial Method

A framework for predicting/explaining average-case computational complexity

Originated from sum-of-squares literature (for detection)

[Barak, Hopkins, Kelner, Kothari, Moitra, Potechin ’16] [Hopkins, Steurer ’17] [Hopkins, Kothari, Potechin, Raghavendra, Schramm, Steurer ’17] [Hopkins ’18 (PhD thesis)]

Today: self-contained motivation (without SoS)

4 / 31

The Low-Degree Polynomial Method

Study a restricted class of algorithms: low-degree polynomials
◮ Multivariate polynomial f : R^N → R^M
◮ Input: e.g. graph Y ∈ {0, 1}^(n choose 2)
◮ Output: e.g. b ∈ {0, 1} or v ∈ R^n
◮ “Low” means O(log n) where n is dimension

Examples of low-degree algorithms (input Y ∈ R^{n×n}):
◮ Power iteration: Y^k 1 or Tr(Y^k), k = O(log n)
◮ Approximate message passing: v ← Y h(v), O(1) rounds
◮ Local algorithms on sparse graphs, radius O(1)
◮ Or any of the above applied to Ỹ = g(Y), deg g = O(1)

5 / 31
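
To make these examples concrete, here is a minimal numpy sketch (not from the slides; the sizes n, k and the clique size are arbitrary) evaluating two of the low-degree statistics above, the power-iteration vector Y^k 1 and the trace Tr(Y^k), on a ±1 adjacency matrix with and without a planted clique.

```python
import numpy as np

rng = np.random.default_rng(0)

def pm_adjacency(n, clique=None):
    """Symmetric +/-1 "adjacency" matrix (zero diagonal) of G(n, 1/2);
    optionally force the edges inside a planted clique to +1."""
    Y = np.where(rng.random((n, n)) < 0.5, 1.0, -1.0)
    Y = np.triu(Y, 1)
    Y = Y + Y.T
    if clique is not None:
        idx = np.array(list(clique))
        Y[np.ix_(idx, idx)] = np.ones((len(idx), len(idx))) - np.eye(len(idx))
    return Y

def power_iteration_stat(Y, k):
    """Entries of Y^k 1: each coordinate is a degree-k polynomial of the input."""
    v = np.ones(Y.shape[0])
    for _ in range(k):
        v = Y @ v
    return v

def trace_stat(Y, k):
    """Tr(Y^k): a single degree-k polynomial (signed count of closed k-walks)."""
    return np.trace(np.linalg.matrix_power(Y, k))

n, k, clique_size = 300, 10, 60
Y_null = pm_adjacency(n)
Y_planted = pm_adjacency(n, clique=range(clique_size))
print("Tr(Y^k), null:    %.3e" % trace_stat(Y_null, k))
print("Tr(Y^k), planted: %.3e" % trace_stat(Y_planted, k))
print("||Y^k 1||, null vs planted: %.3e vs %.3e"
      % (np.linalg.norm(power_iteration_stat(Y_null, k)),
         np.linalg.norm(power_iteration_stat(Y_planted, k))))
```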

The Low-Degree Polynomial Method

Claim: low-degree polynomials provide a unified explanation of information-computation gaps in detection/recovery/optimization.

For all of these problems...

planted clique, sparse PCA, community detection, tensor PCA, planted CSPs, spiked Wigner/Wishart, planted submatrix, planted dense subgraph, p-spin optimization, max independent set

...it is the case that
◮ the best known poly-time algorithms are low-degree (spectral/AMP/local)
◮ low-degree polynomials fail in the “hard” regime

“Low-degree conjecture” (informal): low-degree polynomials are as powerful as all poly-time algorithms for “natural” high-dimensional problems [Hopkins ’18]

6 / 31

Overview

This talk: techniques to prove that all low-degree polynomials fail
◮ Gives evidence for computational hardness

Settings:
◮ Detection

[Hopkins, Steurer ’17] [Hopkins, Kothari, Potechin, Raghavendra, Schramm, Steurer ’17] [Hopkins ’18] (PhD thesis) [Kunisky, W., Bandeira ’19] (survey)

◮ Recovery

[Schramm, W. ’20]

◮ Optimization

[Gamarnik, Jagannath, W. ’20]

7 / 31

Part II: Detection

8 / 31

Detection (e.g. [Hopkins, Steurer ’17])

Goal: hypothesis test with error probability o(1) between:
◮ Null model Y ∼ Q_n, e.g. G(n, 1/2)
◮ Planted model Y ∼ P_n, e.g. G(n, 1/2) ∪ {random k-clique}

Look for a degree-D polynomial f : R^{n×n} → R that distinguishes P from Q
◮ f(Y) is “big” when Y ∼ P and “small” when Y ∼ Q

Compute the “advantage”:

    Adv≤D := max_{deg f ≤ D}  E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²])     (mean in P over fluctuations in Q)

◮ Adv≤D = ω(1): “degree-D polynomials succeed”
◮ Adv≤D = O(1): “degree-D polynomials fail”

9 / 31

Detection (e.g. [Hopkins, Steurer ’17])

Prototypical result (planted clique):

Theorem [BHKKMP16, Hop18]: For a planted k-clique in G(n, 1/2),
◮ if k = Ω(√n) then Adv≤D = ω(1) for some D = O(log n)
  (low-degree polynomials succeed when k ≳ √n)
◮ if k = O(n^{1/2−ε}) then Adv≤D = O(1) for any D = O(log n)
  (low-degree polynomials fail when k ≪ √n)

Sometimes can rule out polynomials of degree D = n^δ

Extended low-degree conjecture [Hopkins ’18]: degree-D polynomials ⇔ n^{Θ̃(D)}-time algorithms; D = n^δ ⇔ exp(n^{δ±o(1)}) time

10 / 31
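
The easy regime in this theorem is already witnessed by the simplest degree-1 polynomial, the edge count. A small Monte Carlo sketch (assuming numpy; the sizes are arbitrary) comparing the edge count under G(n, 1/2) and under a planted k-clique with k a few times √n:

```python
import numpy as np

rng = np.random.default_rng(1)

def edge_count(n, k=0):
    """Sample G(n, 1/2), optionally plant a k-clique, and return the number of
    edges -- a degree-1 polynomial in the edge indicator variables."""
    A = np.triu(rng.random((n, n)) < 0.5, 1)        # each potential edge once
    if k > 0:
        A[:k, :k] = A[:k, :k] | np.triu(np.ones((k, k), dtype=bool), 1)
    return int(A.sum())

n, k, trials = 2000, 200, 20                         # here k is about 4.5 * sqrt(n)
null = [edge_count(n) for _ in range(trials)]
planted = [edge_count(n, k) for _ in range(trials)]
print("null:    mean %.0f  std %.0f" % (np.mean(null), np.std(null)))
print("planted: mean %.0f  std %.0f" % (np.mean(planted), np.std(planted)))
# Planting raises the mean by roughly (1/2) * C(k,2) ~ k^2/4 = 10^4, while the
# null fluctuations are only ~ sqrt(C(n,2))/2 ~ 700, so this degree-1 statistic
# already separates P from Q once k >> sqrt(n).
```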

Detection (e.g. [Hopkins, Steurer ’17])

Goal: compute

    Adv≤D := max_{deg f ≤ D}  E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²])

Suppose Q is i.i.d. Unif(±1)

Write f(Y) = Σ_{|S|≤D} f̂_S Y^S, where Y^S := Π_{i∈S} Y_i for S ⊆ [m]

{Y^S}_{S⊆[m]} are orthonormal: E_{Y∼Q}[Y^S Y^T] = 1_{S=T}

Numerator: E_{Y∼P}[f(Y)] = Σ_{|S|≤D} f̂_S E_{Y∼P}[Y^S] =: ⟨f̂, c⟩

Denominator: E_{Y∼Q}[f(Y)²] = Σ_{|S|≤D} f̂_S² = ‖f̂‖²   (orthonormality)

    Adv≤D = max_{f̂} ⟨f̂, c⟩ / ‖f̂‖ = ⟨c, c⟩ / ‖c‖ = ‖c‖ = √( Σ_{|S|≤D} E_{Y∼P}[Y^S]² )

Optimizer: f̂* = c

11 / 31
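
A brute-force numerical sketch of this computation (assuming numpy/itertools; the toy sizes n, k, D and the Monte Carlo budget are arbitrary): estimate c_S = E_{Y∼P}[Y^S] for every edge subset with |S| ≤ D in a tiny ±1 planted-clique model and report ‖c‖.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n, k, D, samples = 7, 4, 2, 20000
edges = list(itertools.combinations(range(n), 2))       # the m = C(n,2) coordinates of Y

def sample_P():
    """Planted model: i.i.d. +/-1 edge labels with a random k-clique forced to +1."""
    y = rng.choice([-1.0, 1.0], size=len(edges))
    clique = set(rng.choice(n, size=k, replace=False))
    for t, (i, j) in enumerate(edges):
        if i in clique and j in clique:
            y[t] = 1.0
    return y

# All monomials Y^S with |S| <= D, and c_S = E_{Y~P}[Y^S] estimated by Monte Carlo.
subsets = [S for d in range(D + 1)
           for S in itertools.combinations(range(len(edges)), d)]
Y = np.array([sample_P() for _ in range(samples)])       # samples x m
c = np.array([Y[:, list(S)].prod(axis=1).mean() for S in subsets])

print("number of monomials with |S| <= D:", len(subsets))
print("estimated Adv<=D = ||c|| = %.3f" % np.linalg.norm(c))
# Note: sampling noise inflates ||c|| a little (each c_S^2 picks up ~1/samples),
# so this only illustrates the formula; it is not a sharp value.
```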

Detection (e.g. [Hopkins, Steurer ’17])

Remarks:
◮ Best test is the likelihood ratio (Neyman–Pearson lemma): L(Y) = dP/dQ(Y)
◮ Best degree-D test (maximizer of Adv≤D) is f* = L^{≤D} := projection of L onto the degree-D subspace
  (orthogonal projection w.r.t. ⟨f, g⟩ := E_{Y∼Q}[f(Y)g(Y)])
  — the “low-degree likelihood ratio”
◮ Adv≤D = ‖L^{≤D}‖, where ‖f‖ := √⟨f, f⟩ = √(E_{Y∼Q}[f(Y)²])
  — the “norm of the low-degree likelihood ratio”

Proof: L̂_S = E_{Y∼Q}[L(Y) Y^S] = E_{Y∼P}[Y^S], and f̂*_S = E_{Y∼P}[Y^S] 1_{|S|≤D}

12 / 31

Detection (e.g. [Hopkins, Steurer ’17])

User-friendly results:
◮ Additive Gaussian model: P : Y = X + Z vs Q : Y = Z

    Adv²≤D = Σ_{d=0}^{D} (1/d!) E_{X,X′}⟨X, X′⟩^d

◮ Rademacher model Y ∈ {±1}^m: P : E[Y | X] = X vs Q : E[Y] = 0

    Adv²≤D ≤ Σ_{d=0}^{D} (1/d!) E_{X,X′}⟨X, X′⟩^d

13 / 31
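
A quick Monte Carlo sketch of the additive Gaussian formula (assuming numpy; the values of n, λ, ρ, D are arbitrary), applied to the planted-submatrix signal X = λvv^⊤ with v_i ∼ Bernoulli(ρ), for which ⟨X, X′⟩ = λ²⟨v, v′⟩².

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(3)
n, lam, rho, D, trials = 100, 0.3, 0.05, 6, 200000

# For X = lam * v v^T with v_i ~ Bernoulli(rho), two independent copies satisfy
# <X, X'> = lam^2 * <v, v'>^2.
v = rng.random((trials, n)) < rho
vp = rng.random((trials, n)) < rho
inner = lam**2 * (v & vp).sum(axis=1).astype(float) ** 2    # <X, X'>

adv_sq = sum(np.mean(inner**d) / factorial(d) for d in range(D + 1))
print("Monte Carlo estimate of Adv^2<=D:", adv_sq)
# Caveat: the d-th term is dominated by rare large overlaps <v, v'>, so the
# Monte Carlo estimate gets rough as d grows; this only shows how the formula
# is used, not a precise value.
```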

Detection (e.g. [Hopkins, Steurer ’17])

Recap (detection):
◮ Given P, Q, can compute (via linear algebra)

    Adv≤D = ‖L^{≤D}‖ = max_{deg f ≤ D}  E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²])

◮ Need to know orthogonal polynomials w.r.t. Q
  ◮ Possible when Q has independent coordinates
◮ To predict computational complexity: for D ≈ log n, Adv≤D = ω(1) ⇒ “easy”, O(1) ⇒ “hard”
◮ These predictions are “correct” for: planted clique, sparse PCA, community detection, tensor PCA, spiked Wigner/Wishart, ...
  [BHKKMP16, HS17, HKPRSS17, Hop18, BKW19, KWB19, DKWB19]

14 / 31

Part III: Recovery

15 / 31

Recovery [Schramm, W. ’20]

Example (planted submatrix): observe an n × n matrix Y = X + Z
◮ Signal: X = λvv^⊤, λ > 0, v_i ∼ Bernoulli(ρ)
◮ Noise: Z i.i.d. N(0, 1)

Detection: distinguish P : Y = X + Z vs Q : Y = Z w.h.p.
Recovery: given Y ∼ P, recover v

If you can recover then you can detect (poly-time reduction)
◮ How: run the recovery algorithm to get v̂ ∈ {0, 1}^n; check v̂^⊤ Y v̂

So if Adv≤D = O(1), this suggests recovery is hard. But planted submatrix has a detection–recovery gap.

How to show hardness of recovery when detection is easy?

16 / 31

Recovery [Schramm, W. ’20]

Example (planted submatrix): observe an n × n matrix Y = X + Z
◮ Signal: X = λvv^⊤, λ > 0, v_i ∼ Bernoulli(ρ)
◮ Noise: Z i.i.d. N(0, 1)

Goal: given Y, estimate v_1 via a polynomial f : R^{n×n} → R

Low-degree minimum mean squared error:

    MMSE≤D = min_{deg f ≤ D} E[(f(Y) − v_1)²]

Equivalent to low-degree maximum correlation:

    Corr≤D = max_{deg f ≤ D} E[f(Y) · v_1] / √(E[f(Y)²])

Fact: MMSE≤D = E[v_1²] − Corr²≤D

17 / 31
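
One crude but direct way to see MMSE≤D numerically is to fit the best degree-D polynomial by least squares over Monte Carlo samples; a minimal sketch for D = 1 on a small planted-submatrix instance (assuming numpy; all sizes are arbitrary, and only the degree-≤1 monomials are used to keep the feature matrix small).

```python
import numpy as np

rng = np.random.default_rng(4)
n, lam, rho, samples = 20, 2.0, 0.3, 5000

def sample():
    """One draw of (Y, v_1) from the planted submatrix model."""
    v = (rng.random(n) < rho).astype(float)
    Y = lam * np.outer(v, v) + rng.standard_normal((n, n))
    return Y.ravel(), v[0]

draws = [sample() for _ in range(samples)]
Ys = np.array([y for y, _ in draws])                     # samples x n^2
v1 = np.array([t for _, t in draws])

# Degree-<=1 monomials: the constant 1 and each entry Y_ij.  Least squares over
# the samples approximates the best degree-1 estimator of v_1.
features = np.hstack([np.ones((samples, 1)), Ys])
coef, *_ = np.linalg.lstsq(features, v1, rcond=None)
mse = np.mean((v1 - features @ coef) ** 2)
print("empirical degree-1 MMSE:", mse)
print("trivial MSE (predict E[v_1]): rho*(1-rho) =", rho * (1 - rho))
# In-sample least squares is slightly optimistic (it overfits the 1 + n^2
# coefficients), but it shows how MMSE<=D is defined as a regression problem.
```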

Recovery [Schramm, W. ’20]

For hardness, want an upper bound on

    Corr≤D = max_{deg f ≤ D} E[f(Y) · v_1] / √(E[f(Y)²])

Same proof as detection? Write f = Σ_{|S|≤D} f̂_S Y^S

Numerator: E[f(Y) · v_1] = Σ_{|S|≤D} f̂_S E[Y^S · v_1] =: ⟨f̂, c⟩

Denominator: E[f(Y)²] = Σ_{S,T} f̂_S f̂_T E[Y^S · Y^T] = f̂^⊤ M f̂

    Corr≤D = max_{f̂} ⟨f̂, c⟩ / √(f̂^⊤ M f̂) = √(c^⊤ M^{−1} c)

18 / 31

Recovery [Schramm, W. ’20]

For hardness, want an upper bound on

    Corr≤D = max_{deg f ≤ D} E[f(Y) · v_1] / √(E[f(Y)²])

Trick: bound the denominator via Jensen’s inequality on the “signal” X:

    E[f(Y)²] = E_Z E_X[f(X + Z)²] ≥ E_Z (E_X f(X + Z))²

Why is this tight? In the hard regime, f depends mostly on Z

This simplifies the expression enough to find a closed form:

    Corr≤D ≤ max_{f̂} ⟨f̂, c⟩ / ‖M f̂‖ = ‖c^⊤ M^{−1}‖   where M is upper triangular (can invert)

19 / 31

Recovery [Schramm, W. ’20]

End result:

Theorem [Schramm, W. ’20]: Additive Gaussian model Y = X + Z, scalar value to recover: x

    Corr²≤D ≤ Σ_{|S|≤D} κ_S²

where κ_S is the joint cumulant of {x} ∪ {Y_i : i ∈ S}

Corollary (tight bounds for planted submatrix recovery)
◮ if λ ≪ min{1, 1/(ρ√n)} then MMSE≤D ≈ ρ(1 − ρ) for D = n^{Ω(1)}
  (low-degree polynomials have trivial MSE in the “hard” regime)
◮ if λ ≫ min{1, 1/(ρ√n)} then MMSE≤D = o(ρ) for some D = O(log n)
  (low-degree polynomials succeed in the “easy” regime)

20 / 31

Part IV: Optimization

21 / 31

Optimization [Gamarnik, Jagannath, W. ’20]

Example (spherical spin glass): for Y ∈ R^{n×n×n×n} i.i.d. N(0, 1),

    max_{‖v‖=1} H(v),   where   H(v) := (1/√n) ⟨Y, v^{⊗4}⟩

Optimum value: OPT = max_{‖v‖=1} H(v) = Θ(1) [ABC’13]

Best known algorithms achieve value ALG < OPT [Subag ’18, EMS ’20]

Result: no low-degree polynomial can achieve value OPT − ε

Theorem [Gamarnik, Jagannath, W. ’20]: For some ε > 0, no f : R^{n×n×n×n} → R^n of degree polylog(n) achieves both of the following with probability 1 − exp(−n^{Ω(1)}):
◮ Objective: H(f(Y)) ≥ OPT − ε
◮ Normalization: ‖f(Y)‖ ≈ 1

22 / 31
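
To make the objective concrete, here is a small numpy sketch (not from the talk) that evaluates H(v) = (1/√n)⟨Y, v^{⊗4}⟩ on the unit sphere and runs naive projected gradient ascent; this baseline is only illustrative and is not one of the cited algorithms [Subag ’18, EMS ’20]. The size n and the step settings are arbitrary.

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(5)
n = 12
Y = rng.standard_normal((n, n, n, n))

# Symmetrizing Y leaves <Y, v^{x4}> unchanged and makes the gradient simple.
Ysym = sum(np.transpose(Y, p) for p in permutations(range(4))) / 24.0

def H(v):
    """H(v) = (1/sqrt(n)) <Y, v (x) v (x) v (x) v>."""
    return np.einsum('ijkl,i,j,k,l->', Ysym, v, v, v, v) / np.sqrt(n)

def grad_H(v):
    """Gradient of H at v (valid because Ysym is fully symmetric)."""
    return 4.0 * np.einsum('ijkl,j,k,l->i', Ysym, v, v, v) / np.sqrt(n)

v = rng.standard_normal(n)
v /= np.linalg.norm(v)
start_value = H(v)
for _ in range(300):                       # naive projected gradient ascent
    v = v + 0.05 * grad_H(v)
    v /= np.linalg.norm(v)
print("H at a random unit vector: %.3f" % start_value)
print("H after naive ascent:      %.3f" % H(v))
```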

Optimization [Gamarnik, Jagannath, W. ’20]

Example (max independent set): given a sparse graph G(n, d/n),

    max_{S⊆[n]} |S|   s.t. S independent

OPT = 2 (log d / d) n    ALG = (log d / d) n

Result: no low-degree polynomial can achieve (1 + 1/√2)(log d / d) n

Theorem [Gamarnik, Jagannath, W. ’20]: No polynomial f : {0, 1}^(n choose 2) → R^n of degree polylog(n) achieves both of the following with probability 1 − exp(−n^{Ω(1)}):
◮ f_i(Y) ∈ [0, 1/3] ∪ [2/3, 1] for most i
◮ {i : f_i(Y) ∈ [2/3, 1]} is a near-independent set of size (1 + 1/√2)(log d / d) n

Forthcoming: improve 1 + 1/√2 → 1 + ε (optimal)

23 / 31
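
For reference, the (log d / d) n benchmark ALG is the scale reached (for large d) by the simple greedy algorithm on G(n, d/n); a minimal numpy sketch of that baseline, with arbitrary n and d.

```python
import numpy as np

rng = np.random.default_rng(6)
n, d = 10000, 20

# Sample G(n, d/n) as adjacency lists.
adj = [[] for _ in range(n)]
for i in range(n):
    for j in (np.nonzero(rng.random(n - i - 1) < d / n)[0] + i + 1):
        adj[i].append(j)
        adj[j].append(i)

# Greedy: visit vertices in random order; take a vertex unless one of its
# neighbors has already been taken.
taken = np.zeros(n, dtype=bool)
blocked = np.zeros(n, dtype=bool)
for v in rng.permutation(n):
    if not blocked[v]:
        taken[v] = True
        for u in adj[v]:
            blocked[u] = True

print("greedy independent set size:", int(taken.sum()))
print("(log d / d) n   =", n * np.log(d) / d)
print("2 (log d / d) n =", 2 * n * np.log(d) / d)
```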

Optimization [Gamarnik, Jagannath, W. ’20]

How to prove failure of low-degree polynomials for optimization? Same proof as before?

    max_{deg f ≤ D} E[H(f(Y))] = max_{deg f ≤ D} E[(1/√n)⟨Y, f(Y)^{⊗4}⟩]

No! High-degree in f̂

Instead, use 2 ingredients:
◮ Stability of low-degree polynomials
◮ Overlap gap property (OGP)
  [Gamarnik, Sudan ’13] [Chen, Gamarnik, Panchenko, Rahman ’17] [Gamarnik, Jagannath ’19]

24 / 31

Optimization [Gamarnik, Jagannath, W. ’20]

“Low-degree polynomials are stable”

Y ∼ i.i.d. Bernoulli(p)

Interpolation path: Y^(0) → Y^(1) → Y^(2) → · · · → Y^(m−1) → Y^(m)

f : {0, 1}^m → R^n of degree D

Definition: Index i is “c-bad” if ‖f(Y^(i)) − f(Y^(i−1))‖² > c · E_Y‖f(Y)‖²

Theorem: Pr_{Y^(0),...,Y^(m)}[∄ c-bad i] ≥ p^{4D/c}

With non-trivial probability (over the path), f’s output is “smooth”

25 / 31
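
An empirical illustration of the stability statement (assuming numpy, and assuming a path that resamples one coordinate of Y at a time, which is one natural choice; the polynomial f below is an arbitrary degree-2 example, not anything from the paper): walk along Y^(0), …, Y^(m) and record the largest jump ‖f(Y^(i)) − f(Y^(i−1))‖² relative to E‖f(Y)‖².

```python
import numpy as np

rng = np.random.default_rng(7)
m, n_out, p = 400, 50, 0.5

# A fixed degree-2 polynomial f : {0,1}^m -> R^{n_out}, namely
# f(Y) = (A Y) * (B Y) entrywise for two fixed random linear maps A, B.
A = rng.standard_normal((n_out, m)) / np.sqrt(m)
B = rng.standard_normal((n_out, m)) / np.sqrt(m)
f = lambda y: (A @ y) * (B @ y)

# Reference scale E ||f(Y)||^2, by Monte Carlo over fresh samples.
ref = np.mean([np.sum(f((rng.random(m) < p).astype(float)) ** 2)
               for _ in range(2000)])

# Interpolation path: resample one coordinate at a time and track the largest
# single-step jump of f along the way.
y = (rng.random(m) < p).astype(float)
prev, max_jump = f(y), 0.0
for i in range(m):
    y[i] = float(rng.random() < p)
    cur = f(y)
    max_jump = max(max_jump, float(np.sum((cur - prev) ** 2)))
    prev = cur

print("largest single-step jump / E||f(Y)||^2 = %.4f" % (max_jump / ref))
# For a degree-2 f, changing one of the m coordinates moves the output only a
# little; this smoothness is what the theorem quantifies.
```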

Optimization [Gamarnik, Jagannath, W. ’20]

Overlap gap property (OGP): with high probability, Y ∼ G(n, d/n) has no occurrence of
◮ S, T independent sets
◮ |S|, |T| ≈ (1 + 1/√2)Φ
◮ |S ∩ T| ≈ Φ

Proof: first moment method [Gamarnik, Sudan ’13]

26 / 31

Optimization [Gamarnik, Jagannath, W. ’20]

Ensemble OGP: with high probability, for all i, j on the interpolation path Y^(0) → Y^(1) → · · · → Y^(m), there is no occurrence of
◮ S independent set in Y^(i)
◮ T independent set in Y^(j)
◮ |S|, |T| ≈ (1 + 1/√2)Φ
◮ |S ∩ T| ≈ Φ

27 / 31

Optimization [Gamarnik, Jagannath, W. ’20]

Proof that low-degree polynomials fail:

Suppose f(Y) outputs independent sets of size (1 + 1/√2)Φ

Interpolation path: Y^(0) → Y^(1) → Y^(2) → · · · → Y^(m−1) → Y^(m)

Separation: f(Y^(0)) and f(Y^(m)) are “far apart”

Stability: with probability n^{−D}, there are no big “jumps” f(Y^(i)) → f(Y^(i+1))

Contradicts OGP

28 / 31

Future Directions?

◮ (Detection) bound Adv≤D when Q is not a product measure

◮ E.g. random regular graphs

◮ (Recovery) bound MMSE≤D when not “signal + noise”

◮ E.g. sparse regression, phase retrieval

◮ (Recovery) precise value of MMSE≤D

◮ Matching AMP?

◮ (Optimization) prove tight results for new settings

◮ E.g. p-spin optimization

◮ Implications for other algorithms?

◮ E.g. convex programming, MCMC

29 / 31

References

◮ Detection (survey article): Kunisky, W., Bandeira, “Notes on Computational Hardness of Hypothesis Testing: Predictions using the Low-Degree Likelihood Ratio”, arXiv:1907.11636

◮ Recovery: Schramm, W., “Computational Barriers to Estimation from Low-Degree Polynomials”, arXiv:2008.02269

◮ Optimization: Gamarnik, Jagannath, W., “Low-Degree Hardness of Random Optimization Problems”, arXiv:2004.12063

30 / 31
