

Computational Barriers to Estimation from Low-Degree Polynomials

Alex Wein
Courant Institute, New York University

Joint work with: Tselil Schramm (Stanford)

Part I: Why Low-Degree Polynomials?

Problems in High-Dimensional Statistics

Example: planted k-clique in a random graph G(n, 1/2)

◮ Detection/testing: distinguish between a random graph and a graph with a planted clique
◮ Recovery/estimation: given a graph with a planted clique, find the clique

Both problems have an information-computation gap.

What makes problems easy vs. hard?

The Low-Degree Polynomial Method

A framework for predicting/explaining average-case computational complexity.

Originated from the sum-of-squares literature (for detection):
[Barak, Hopkins, Kelner, Kothari, Moitra, Potechin ’16] [Hopkins, Steurer ’17] [Hopkins, Kothari, Potechin, Raghavendra, Schramm, Steurer ’17] [Hopkins ’18 (PhD thesis)]

Today: self-contained motivation (without SoS).

The Low-Degree Polynomial Method

Study a restricted class of algorithms: low-degree polynomials.
◮ Multivariate polynomial f : ℝ^N → ℝ^M
◮ Input: e.g. a graph Y ∈ {0, 1}^(n choose 2)
◮ Output: b ∈ {0, 1} (detection) or v ∈ ℝ^n (recovery)
◮ “Low” means O(log n), where n is the dimension

Examples of low-degree algorithms (input Y ∈ ℝ^{n×n}):
◮ Power iteration: Y^k 1 or Tr(Y^k)  [k = O(log n)]
◮ Approximate message passing: v ← Y h(v)  [O(1) rounds]
◮ Local algorithms on sparse graphs  [radius O(1)]
◮ Any of the above applied to Ỹ = g(Y)  [deg g = O(1)]
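For concreteness, here is a minimal sketch (my own toy code, not from the talk) of two of the statistics above. Each coordinate of Y^k 1, and likewise Tr(Y^k), is literally a polynomial of degree k = O(log n) in the entries of Y.

import numpy as np

def power_iteration_vector(Y, k):
    # Each coordinate of Y^k 1 is a degree-k polynomial in the entries of Y.
    v = np.ones(Y.shape[0])
    for _ in range(k):
        v = Y @ v
    return v

def trace_statistic(Y, k):
    # Tr(Y^k) = sum over closed k-step walks: a single degree-k polynomial.
    return np.trace(np.linalg.matrix_power(Y, k))

n = 500
k = max(1, int(np.log(n)))          # "low" degree: k = O(log n)
Y = np.random.default_rng(0).standard_normal((n, n))
Y = (Y + Y.T) / np.sqrt(2)          # symmetric input matrix
print(trace_statistic(Y, k), np.linalg.norm(power_iteration_vector(Y, k)))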

Optimality of Low-Degree Polynomials?

Low-degree polynomials seem to be optimal for many problems! For all of these problems...

planted clique, sparse PCA, community detection, tensor PCA, spiked Wigner/Wishart, planted submatrix, planted dense subgraph, ...

...it is the case that
◮ the best known poly-time algorithms are captured by O(log n)-degree polynomials (spectral/AMP)
◮ low-degree polynomials fail in the “hard” regime

“Low-degree conjecture” (informal) [Hopkins ’18]: for “natural” problems, if low-degree polynomials fail then all poly-time algorithms fail.

Caveat: Gaussian elimination for planted XOR-SAT (a poly-time algorithm that succeeds where low-degree polynomials fail).

Overview

This talk: techniques to prove that all low-degree polynomials fail.
◮ Gives evidence for computational hardness

Settings:
◮ Detection (prior work)
  [Hopkins, Steurer ’17] [Hopkins, Kothari, Potechin, Raghavendra, Schramm, Steurer ’17] [Hopkins ’18 (PhD thesis)] [Kunisky, W., Bandeira ’19 (survey)]
◮ Recovery (this work)
  [Schramm, W. ’20]
◮ Optimization
  [Gamarnik, Jagannath, W. ’20]

Relation to Other Frameworks

◮ Sum-of-squares lower bounds [BHKKMP16, ...]
  ◮ Actually for certification
  ◮ Connected to low-degree [HKPRSS17]
◮ Statistical query lower bounds [FGRVX12, ...]
  ◮ Need i.i.d. samples
  ◮ Equivalent to low-degree [BBHLS20]
◮ Approximate message passing (AMP) [DMM09, LKZ15, ...]
  ◮ AMP algorithms are low-degree
  ◮ AMP can be sub-optimal (e.g. tensor PCA) [MR14]
◮ Overlap gap property / MCMC lower bounds [GS13, GZ17, ...]
  ◮ MCMC algorithms are not low-degree (?)
  ◮ MCMC can be sub-optimal (e.g. tensor PCA) [BGJ18]
◮ Average-case reductions [BR13, ...]
  ◮ Need to argue that the starting problem is hard [BB20]

Part II: Detection

Detection (e.g. [Hopkins, Steurer ’17])

Goal: hypothesis test with error probability o(1) between:
◮ Null model Y ∼ Q_n, e.g. G(n, 1/2)
◮ Planted model Y ∼ P_n, e.g. G(n, 1/2) ∪ {random k-clique}

Look for a degree-D polynomial f : ℝ^{n×n} → ℝ that distinguishes P from Q:
◮ f(Y) is “big” when Y ∼ P and “small” when Y ∼ Q

Compute the “advantage” (mean in P over fluctuations in Q):

    Adv_{≤D} := max_{deg f ≤ D} E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²])

If Adv_{≤D} = ω(1), “degree-D polynomials succeed”; if Adv_{≤D} = O(1), “degree-D polynomials fail”.
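As a toy illustration of this setup (my example, not from the slides): the edge count is the simplest degree-1 polynomial in the edge indicators, and it separates G(n, 1/2) from the planted-clique model once k is comfortably above √n.

import numpy as np

rng = np.random.default_rng(0)

def sample_graph(n, k=0):
    # Adjacency matrix of G(n, 1/2), optionally with a planted k-clique.
    A = np.triu(rng.integers(0, 2, size=(n, n)), 1)
    A = A + A.T
    if k > 0:
        S = rng.choice(n, size=k, replace=False)
        A[np.ix_(S, S)] = 1
        np.fill_diagonal(A, 0)
    return A

def edge_count(A):
    # A degree-1 polynomial in the edge indicators.
    return A.sum() / 2

n, k, trials = 400, 80, 50
null = [edge_count(sample_graph(n)) for _ in range(trials)]
planted = [edge_count(sample_graph(n, k)) for _ in range(trials)]
t = (np.mean(null) + np.mean(planted)) / 2       # threshold between the two means
err = (sum(x > t for x in null) + sum(x < t for x in planted)) / (2 * trials)
print(f"null ~ {np.mean(null):.0f}±{np.std(null):.0f}, "
      f"planted ~ {np.mean(planted):.0f}, error rate {err:.2f}")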

Detection (e.g. [Hopkins, Steurer ’17])

Prototypical result (planted clique):

Theorem [BHKKMP16, Hop18]: For a planted k-clique in G(n, 1/2),
◮ if k = Ω(√n) then Adv_{≤D} = ω(1) for some D = O(log n)
  (low-degree polynomials succeed when k ≳ √n)
◮ if k = O(n^{1/2−ε}) then Adv_{≤D} = O(1) for any D = O(log n)
  (low-degree polynomials fail when k ≪ √n)

Sometimes one can even rule out polynomials of degree D = n^δ.

Extended low-degree conjecture [Hopkins ’18]: degree-D polynomials ⇔ n^{Θ̃(D)}-time algorithms; D = n^δ ⇔ exp(n^{δ±o(1)}) time.

Detection (e.g. [Hopkins, Steurer ’17])

Goal: compute Adv_{≤D} := max_{deg f ≤ D} E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²]).

Suppose Q is i.i.d. Unif(±1). Write

    f(Y) = Σ_{|S|≤D} f̂_S Y^S,  where Y^S := Π_{i∈S} Y_i for S ⊆ [m].

The monomials {Y^S}_{S⊆[m]} are orthonormal: E_{Y∼Q}[Y^S Y^T] = 𝟙_{S=T}.

Numerator: E_{Y∼P}[f(Y)] = Σ_{|S|≤D} f̂_S E_{Y∼P}[Y^S] =: ⟨f̂, c⟩, where c_S := E_{Y∼P}[Y^S].

Denominator: E_{Y∼Q}[f(Y)²] = Σ_{|S|≤D} f̂_S² = ‖f̂‖²  (orthonormality).

Therefore

    Adv_{≤D} = max_{f̂} ⟨f̂, c⟩ / ‖f̂‖ = ⟨c, c⟩ / ‖c‖ = ‖c‖ = √( Σ_{|S|≤D} E_{Y∼P}[Y^S]² ),

with optimizer f̂* = c.
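A small numerical check of Adv_{≤D} = ‖c‖ (my own toy example, not from the talk): take Q = i.i.d. Unif(±1) on m bits and, as the planted model, i.i.d. ±1 bits with mean μ, so that c_S = E_P[Y^S] = μ^{|S|} exactly and Adv²_{≤D} = Σ_{d≤D} C(m, d) μ^{2d}.

from itertools import combinations
from math import comb
import numpy as np

rng = np.random.default_rng(1)
m, D, mu = 10, 3, 0.2

# Monte Carlo samples from the planted model P: i.i.d. +-1 bits with mean mu.
samples = 2 * rng.binomial(1, (1 + mu) / 2, size=(200_000, m)) - 1

adv_sq = 0.0
for d in range(D + 1):
    for S in combinations(range(m), d):
        c_S = samples[:, list(S)].prod(axis=1).mean()   # estimate of E_P[Y^S]
        adv_sq += c_S ** 2

exact = sum(comb(m, d) * mu ** (2 * d) for d in range(D + 1))
print(f"Adv ~ {adv_sq ** 0.5:.4f} (exact {exact ** 0.5:.4f})")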

Detection (e.g. [Hopkins, Steurer ’17])

Remarks:
◮ The best test is the likelihood ratio (Neyman–Pearson lemma): L(Y) = (dP/dQ)(Y)
◮ The best degree-D test (the maximizer of Adv_{≤D}) is f* = L^{≤D} := the projection of L onto the degree-≤D subspace, the “low-degree likelihood ratio”
  (orthogonal projection w.r.t. ⟨f, g⟩ := E_{Y∼Q}[f(Y)g(Y)])
◮ Adv_{≤D} = ‖L^{≤D}‖, the “norm of the low-degree likelihood ratio”, where ‖f‖ := √⟨f, f⟩ = √(E_{Y∼Q}[f(Y)²])

Proof: L̂_S = E_{Y∼Q}[L(Y) Y^S] = E_{Y∼P}[Y^S], while f̂*_S = E_{Y∼P}[Y^S] · 𝟙_{|S|≤D}.
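To spell out the one-line proof (my expansion, in the notation of the previous slide):

\[
\widehat{L}_S \;=\; \langle L, Y^S \rangle \;=\; \mathbb{E}_{Y \sim Q}\!\left[ \tfrac{dP}{dQ}(Y)\, Y^S \right] \;=\; \mathbb{E}_{Y \sim P}[Y^S] \;=\; c_S ,
\]

so the projection is \( L^{\leq D} = \sum_{|S| \leq D} c_S\, Y^S \): its coefficient vector is exactly the optimizer \( \hat f^* = c \) found on the previous slide, and \( \mathrm{Adv}_{\leq D} = \|c\| = \|L^{\leq D}\| \).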

Part III: Recovery

Planted Submatrix

Example (planted submatrix): observe an n × n matrix Y = X + Z
◮ Signal: X = λvv⊤, where λ > 0 and v_i ∼ Bernoulli(ρ)
◮ Noise: Z i.i.d. N(0, 1)

Regime: 1/√n ≪ ρ ≪ 1

Detection: distinguish P: Y = X + Z vs Q: Y = Z w.h.p.
◮ Sum of all entries succeeds when λ ≫ (ρ√n)^{−2}

Recovery: given Y ∼ P, recover v
◮ Leading eigenvector succeeds when λ ≫ (ρ√n)^{−1}
◮ Exhaustive search succeeds when λ ≫ (ρn)^{−1/2}

⇒ a detection-recovery gap
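A quick simulation sketch of this model and the two poly-time statistics above (the parameters are my own illustrative choices):

import numpy as np

rng = np.random.default_rng(2)
n, rho, lam = 2000, 0.05, 1.0

v = rng.binomial(1, rho, size=n).astype(float)
Y = lam * np.outer(v, v) + rng.standard_normal((n, n))

# Detection: sum of all entries, normalized so the null has mean 0 and std 1.
# Succeeds when lam >> (rho * sqrt(n))^(-2).
print("sum statistic:", round(float(Y.sum() / n), 1), "(null: mean 0, std 1)")

# Recovery: leading eigenvector of the symmetrized matrix.
# Succeeds when lam >> (rho * sqrt(n))^(-1).
u = np.linalg.eigh((Y + Y.T) / 2)[1][:, -1]
overlap = abs(u @ v) / np.linalg.norm(v)
print("eigenvector overlap with v:", round(float(overlap), 3))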

Recovery Hardness from Detection Hardness?

If you can recover then you can detect (poly-time reduction):
◮ How: run the recovery algorithm to get v̂ ∈ {0, 1}^n; check whether v̂⊤Y v̂ is large

So if Adv_{≤D} = O(1), this suggests recovery is hard.

But how to show hardness of recovery when detection is easy?

Attempt: choose a better null distribution?
◮ Match the mean of the planted distribution?
◮ A Gaussian matching the first 2 moments of the planted distribution?

This closes the detection-recovery gap partially, but not all the way.
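A sketch of the reduction in code (the eigenvector-rounding recovery routine, the parameters, and the calibration comment are my own illustration, not the talk's):

import numpy as np

def recover(Y, rho):
    # Toy recovery: round the top eigenvector to its rho*n largest coordinates.
    u = np.linalg.eigh((Y + Y.T) / 2)[1][:, -1]
    u = u if u.sum() >= 0 else -u
    k = max(1, int(rho * Y.shape[0]))
    v_hat = np.zeros(Y.shape[0])
    v_hat[np.argsort(u)[-k:]] = 1.0
    return v_hat

def detection_statistic(Y, rho):
    # Reduction: run recovery to get v_hat in {0,1}^n, then report v_hat^T Y v_hat.
    v_hat = recover(Y, rho)
    return float(v_hat @ Y @ v_hat)

rng = np.random.default_rng(3)
n, rho, lam = 1000, 0.05, 3.0
v = rng.binomial(1, rho, size=n).astype(float)
for name, Y in [("planted", lam * np.outer(v, v) + rng.standard_normal((n, n))),
                ("null   ", rng.standard_normal((n, n)))]:
    print(name, "v_hat^T Y v_hat =", round(detection_statistic(Y, rho), 1))
# Declare "planted" when the statistic exceeds a threshold calibrated by
# simulating it under Q (v_hat adapts to the noise, so calibration matters).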

Low-Degree Recovery

Example (planted submatrix): observe an n × n matrix Y = X + Z
◮ Signal: X = λvv⊤, where λ > 0 and v_i ∼ Bernoulli(ρ)
◮ Noise: Z i.i.d. N(0, 1)

Goal: given Y, estimate v_1 via a polynomial f : ℝ^{n×n} → ℝ.

Low-degree minimum mean squared error:

    MMSE_{≤D} = min_{deg f ≤ D} E[(f(Y) − v_1)²]

Equivalent to low-degree maximum correlation:

    Corr_{≤D} = max_{deg f ≤ D} E[f(Y) · v_1] / √(E[f(Y)²])

Fact: MMSE_{≤D} = E[v_1²] − Corr²_{≤D}
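A short derivation of the Fact (my expansion; write g := f / √E[f(Y)²] for the normalized estimator):

\[
\min_{\alpha \in \mathbb{R}} \mathbb{E}\big[(\alpha\, g(Y) - v_1)^2\big] \;=\; \mathbb{E}[v_1^2] - \mathbb{E}[g(Y)\, v_1]^2 \qquad \text{when } \mathbb{E}[g(Y)^2] = 1,
\]

attained at \( \alpha = \mathbb{E}[g(Y) v_1] \). Optimizing over degree-D polynomials f is the same as optimizing over unit-norm degree-D g and then over the scale \( \alpha \), so \( \mathrm{MMSE}_{\leq D} = \mathbb{E}[v_1^2] - \mathrm{Corr}_{\leq D}^2 \).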

Hardness of Recovery

For hardness, we want an upper bound on Corr_{≤D} = max_{deg f ≤ D} E[f(Y) · v_1] / √(E[f(Y)²]).

Same proof as detection? Write f = Σ_{|S|≤D} f̂_S Y^S.

Numerator: E[f(Y) · v_1] = Σ_{|S|≤D} f̂_S E[Y^S · v_1] =: ⟨f̂, c⟩, now with c_S := E[Y^S · v_1].

Denominator: E[f(Y)²] = Σ_{S,T} f̂_S f̂_T E[Y^S · Y^T] = f̂⊤M f̂
(all expectations are now over the planted model, so the monomials are no longer orthonormal and a Gram matrix M appears).

    Corr_{≤D} = max_{f̂} ⟨f̂, c⟩ / √(f̂⊤M f̂) = √(c⊤M⁻¹c)
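Where the last equality comes from (my expansion; it assumes M is positive definite, the generic case for this Gram matrix):

\[
\max_{\hat f} \frac{\langle \hat f, c\rangle}{\sqrt{\hat f^{\top} M \hat f}}
\;=\; \max_{g} \frac{\langle g,\, M^{-1/2} c\rangle}{\|g\|}
\;=\; \|M^{-1/2} c\|
\;=\; \sqrt{c^{\top} M^{-1} c},
\]

substituting \( g = M^{1/2} \hat f \) and applying Cauchy–Schwarz; the maximizer is \( \hat f^* = M^{-1} c \). Unlike in detection, M is not the identity here, which is what the Jensen trick on the next slide works around.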

Hardness of Recovery

For hardness, we want an upper bound on Corr_{≤D} = max_{deg f ≤ D} E[f(Y) · v_1] / √(E[f(Y)²]).

Trick: bound the denominator from below via Jensen’s inequality over the “signal” X:

    E[f(Y)²] = E_Z E_X[f(X + Z)²] ≥ E_Z (E_X[f(X + Z)])²

Why is this tight? In the hard regime, f depends mostly on Z.

This simplifies the expression enough to find a closed form:

    Corr_{≤D} ≤ max_{f̂} ⟨f̂, c⟩ / ‖M f̂‖ = ‖c⊤M⁻¹‖,  where M is upper triangular (can invert)

Main Result

Theorem [Schramm, W. ’20]: Additive Gaussian model Y = X + Z; scalar value to recover: x. Then

    Corr²_{≤D} ≤ Σ_{|S|≤D} κ_S²,

where κ_S is the joint cumulant of {x} ∪ {Y_i : i ∈ S}.

Corollary (tight bounds for planted submatrix recovery):
◮ if λ ≪ min{1, 1/(ρ√n)} then MMSE_{≤ n^Ω(1)} ≈ ρ(1 − ρ)
  (low-degree polynomials have trivial MSE in the “hard” regime)
◮ if λ ≫ min{1, 1/(ρ√n)} then MMSE_{≤ O(log n)} = o(ρ)
  (low-degree polynomials succeed in the “easy” regime)
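The joint cumulants κ_S can be computed from joint moments by summing over set partitions. Below is a self-contained sketch (my own illustration, not the paper's code) estimating one κ_S by Monte Carlo for the planted submatrix model, with x = v_1 and S = {(1,2), (1,3)}; the closed-form comparison value is my own side calculation.

from math import factorial
import numpy as np

def partitions(elems):
    # All set partitions of a list (recursive; fine for |S| up to ~8).
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for p in partitions(rest):
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]
        yield [[first]] + p

def joint_cumulant(samples):
    # kappa(X_1,...,X_k) = sum over partitions pi of
    #   (-1)^(|pi|-1) * (|pi|-1)! * prod_{B in pi} E[prod_{i in B} X_i],
    # with the joint moments estimated from the sample rows.
    k = samples.shape[1]
    total = 0.0
    for p in partitions(list(range(k))):
        moments = np.prod([samples[:, B].prod(axis=1).mean() for B in p])
        total += (-1) ** (len(p) - 1) * factorial(len(p) - 1) * moments
    return total

rng = np.random.default_rng(4)
rho, lam, N = 0.3, 2.0, 1_000_000
v = rng.binomial(1, rho, size=(N, 3)).astype(float)
Z = rng.standard_normal((N, 2))
cols = np.column_stack([v[:, 0],                              # x = v_1
                        lam * v[:, 0] * v[:, 1] + Z[:, 0],    # Y_12
                        lam * v[:, 0] * v[:, 2] + Z[:, 1]])   # Y_13
# A direct calculation (mine) gives lam^2 * (rho^3 - 3 rho^4 + 2 rho^5) ~ 0.030 here;
# the independent Gaussian noise drops out of this mixed cumulant.
print("kappa(x, Y_12, Y_13) ~", round(joint_cumulant(cols), 3))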

Future Directions?

◮ (Detection) Bound Adv_{≤D} when Q is not a product measure
  ◮ e.g. random regular graphs
◮ (Recovery) Bound MMSE_{≤D} for models that are not “signal + noise”
  ◮ e.g. sparse regression, phase retrieval
◮ (Recovery) Sharp threshold for planted submatrix
  ◮ AMP succeeds when λ > (ρ√(en))⁻¹ [Hajek, Wu, Xu ’15]
◮ Implications for other algorithms?
  ◮ e.g. convex programming, MCMC

References

◮ Detection (survey article): “Notes on Computational Hardness of Hypothesis Testing: Predictions using the Low-Degree Likelihood Ratio”, Kunisky, W., Bandeira, arXiv:1907.11636
◮ Recovery: “Computational Barriers to Estimation from Low-Degree Polynomials”, Schramm, W., arXiv:2008.02269
◮ Optimization: “Low-Degree Hardness of Random Optimization Problems”, Gamarnik, Jagannath, W., arXiv:2004.12063
