
The Power of Low-Degree Polynomials for Solving Statistical Problems

Alex Wein, Courant Institute, New York University

Based on joint work with: David Gamarnik (MIT), Aukosh Jagannath (Waterloo), Tselil Schramm (Stanford)

1 / 10

Problems in High-Dimensional Statistics

Example: finding a large clique in a random graph

◮ Detection: distinguish between a random graph and a graph with a planted clique
◮ Recovery: given a graph with a planted clique, find the clique
◮ Optimization: given a random graph (with no planted clique), find as large a clique as possible

It is common to have information-computation gaps, e.g. for planted k-clique (either detection or recovery). What makes problems easy vs. hard?

2 / 10
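To make the three tasks concrete, here is a minimal Python sketch (my illustration, not from the talk) of sampling the null and planted models for planted k-clique; the function names and parameters are hypothetical.

```python
# Hedged sketch: sampling the two models in the planted k-clique problem.
import numpy as np

rng = np.random.default_rng(0)

def sample_null(n):
    """Adjacency matrix of G(n, 1/2): each edge present independently w.p. 1/2."""
    A = np.triu(rng.integers(0, 2, size=(n, n)), 1)  # strict upper triangle
    return A + A.T                                   # symmetric, zero diagonal

def sample_planted(n, k):
    """G(n, 1/2) with a clique planted on k uniformly random vertices."""
    A = sample_null(n)
    S = rng.choice(n, size=k, replace=False)
    A[np.ix_(S, S)] = 1                              # force all edges inside S
    np.fill_diagonal(A, 0)
    return A, S

# Detection: given A alone, decide which sampler produced it.
# Recovery: given A from sample_planted, estimate the hidden set S.
A0 = sample_null(200)
A1, S = sample_planted(200, 20)
```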

The Low-Degree Polynomial Method

A framework for understanding computational complexity. Originated in the sum-of-squares literature (for detection) [Barak, Hopkins, Kelner, Kothari, Moitra, Potechin '16; Hopkins, Steurer '17; Hopkins, Kothari, Potechin, Raghavendra, Schramm, Steurer '17; Hopkins '18 (PhD thesis)].

Study a restricted class of algorithms: low-degree polynomials.

◮ Multivariate polynomial f : R^N → R^M
◮ "Low" means degree O(log n), where n is the dimension

Some low-degree algorithms:

◮ Spectral methods (power iteration)
◮ Approximate message passing (AMP) [DMM09]

For many problems, low-degree algorithms are as powerful as the best known polynomial-time algorithms: planted clique, sparse PCA, community detection, tensor PCA, constraint satisfaction, spiked matrix [BHKKMP16, HS17, HKPRSS17, Hop18, BKW19, KWB19, DKWB19].

3 / 10
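As a concrete instance of a low-degree algorithm (my example under the setup above, not one the slides single out): the signed triangle count is a degree-3 polynomial of the centered edge variables and is a classic test statistic for planted clique detection.

```python
# Hedged sketch: a degree-3 polynomial of the input, the signed triangle count
#   sum_{i<j<k} Y_ij * Y_jk * Y_ik   with   Y_ij = A_ij - 1/2.
# Under G(n, 1/2) it has mean 0; a planted clique inflates it.
import numpy as np

def signed_triangle_count(A):
    Y = A - 0.5
    np.fill_diagonal(Y, 0.0)
    # For symmetric Y with zero diagonal, trace(Y^3) counts each unordered
    # triangle 6 times, so divide by 6.
    return np.trace(Y @ Y @ Y) / 6.0
```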

Overview

This talk: techniques for proving that all low-degree polynomials fail. Such a proof constitutes evidence for computational hardness.

Settings:

◮ Detection (prior work)
◮ Recovery: Schramm, W. "Computational Barriers to Estimation from Low-Degree Polynomials", arXiv, 2020
◮ Optimization: Gamarnik, Jagannath, W. "Low-Degree Hardness of Random Optimization Problems", FOCS 2020

4 / 10

Detection (e.g. [Hopkins, Steurer ’17])

Goal: hypothesis test with error probability o(1) between:

◮ Null model Y ∼ Q_n, e.g. G(n, 1/2)
◮ Planted model Y ∼ P_n, e.g. G(n, 1/2) ∪ {random k-clique}

Look for a degree-D (multivariate) polynomial f : R^{n×n} → R that distinguishes P from Q, in the sense that f(Y) is "big" when Y ∼ P and "small" when Y ∼ Q.

Compute the ratio of the mean in P to the fluctuations in Q:

$$\max_{f \,\text{deg}\, D} \frac{\mathbb{E}_{Y \sim P}[f(Y)]}{\sqrt{\mathbb{E}_{Y \sim Q}[f(Y)^2]}} \quad \begin{cases} = \omega(1): & \text{degree-}D \text{ polynomials succeed} \\ = O(1): & \text{degree-}D \text{ polynomials fail} \end{cases}$$

5 / 10
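For intuition, here is a hedged Monte Carlo sketch of the ratio above for one fixed (non-optimized) degree-1 polynomial, the centered edge count; it reuses the hypothetical samplers from the earlier sketch, and since it does not maximize over f it only lower-bounds the benchmark.

```python
# Hedged sketch: empirical E_P[f] / sqrt(E_Q[f^2]) for f = centered edge count
# (degree 1). The true benchmark maximizes over all degree-D f; this is only
# one candidate polynomial.
import numpy as np

def f_edges(A):
    iu = np.triu_indices_from(A, k=1)
    return np.sum(A[iu] - 0.5)

n, k, trials = 100, 20, 200
fP = [f_edges(sample_planted(n, k)[0]) for _ in range(trials)]
fQ = [f_edges(sample_null(n)) for _ in range(trials)]
print(np.mean(fP) / np.sqrt(np.mean(np.square(fQ))))  # grows with clique size k
```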

Detection (e.g. [Hopkins, Steurer ’17])

Define the inner product $\langle f, g \rangle = \mathbb{E}_{Y \sim Q}[f(Y)g(Y)]$, the norm $\|f\| = \sqrt{\langle f, f \rangle}$, and the likelihood ratio $L(Y) = \frac{dP}{dQ}(Y)$. Then

$$\max_{f \,\text{deg}\, D} \frac{\mathbb{E}_{Y \sim P}[f(Y)]}{\sqrt{\mathbb{E}_{Y \sim Q}[f(Y)^2]}} = \max_{f \,\text{deg}\, D} \frac{\mathbb{E}_{Y \sim Q}[L(Y) f(Y)]}{\sqrt{\mathbb{E}_{Y \sim Q}[f(Y)^2]}} = \max_{f \,\text{deg}\, D} \frac{\langle L, f \rangle}{\|f\|} = \|L^{\le D}\|.$$

Maximizer: f = L^{≤D} := projection of L onto the degree-D subspace; the optimal value is the norm of the low-degree likelihood ratio.

To project: expand L in orthogonal polynomials w.r.t. Q.

◮ Works if Q has independent entries

6 / 10
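Spelling out the step the slide leaves implicit (a standard computation in this literature, stated here under the assumption that {φ_α} is an orthonormal polynomial basis w.r.t. Q): since ⟨L, φ_α⟩ = E_{Y∼Q}[L(Y) φ_α(Y)] = E_{Y∼P}[φ_α(Y)], the norm of the projection has the explicit form

$$\|L^{\le D}\|^2 = \sum_{\deg \varphi_\alpha \le D} \langle L, \varphi_\alpha \rangle^2 = \sum_{\deg \varphi_\alpha \le D} \Big( \mathbb{E}_{Y \sim P}\big[\varphi_\alpha(Y)\big] \Big)^2,$$

which is why independent entries under Q (and hence a product basis of orthogonal polynomials) make the computation tractable.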

Recovery [Schramm, W. ’20]

Example (planted submatrix): observe an n × n matrix Y = X + Z.

◮ Signal: X = λ v v^⊤, where λ > 0 and v_i ∼ Bernoulli(ρ) i.i.d.
◮ Noise: Z has i.i.d. N(0, 1) entries

Goal: given Y, estimate v_1 via a polynomial f : R^{n×n} → R.

Low-degree minimum mean squared error:

$$\mathrm{MMSE}_{\le D} = \min_{f \,\text{deg}\, D} \mathbb{E}\big[(f(Y) - v_1)^2\big]$$

Equivalent to low-degree maximum correlation:

$$\mathrm{Corr}_{\le D} = \max_{f \,\text{deg}\, D} \frac{\mathbb{E}[f(Y) \cdot v_1]}{\sqrt{\mathbb{E}[f(Y)^2]}}$$

Fact: $\mathrm{MMSE}_{\le D} = \mathbb{E}[v_1^2] - \mathrm{Corr}_{\le D}^2$.

7 / 10
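A minimal Python sketch of this model (my illustration; the naive degree-1 estimator below is for orientation only, not the optimal polynomial):

```python
# Hedged sketch: the planted submatrix observation Y = lam * v v^T + Z,
# plus a crude degree-1 guess for v_1 from the first row sum.
import numpy as np

rng = np.random.default_rng(1)

def sample_planted_submatrix(n, lam, rho):
    v = (rng.random(n) < rho).astype(float)   # v_i ~ Bernoulli(rho), i.i.d.
    Z = rng.standard_normal((n, n))           # noise: i.i.d. N(0, 1)
    return lam * np.outer(v, v) + Z, v

n, lam, rho = 500, 1.0, 0.1
Y, v = sample_planted_submatrix(n, lam, rho)

# E[sum_j Y_1j] = lam * v_1 * sum_j v_j, roughly lam * v_1 * rho * n, so rescale:
v1_hat = Y[0].sum() / (lam * rho * n)
print(v1_hat, v[0])
```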

Recovery [Schramm, W. ’20]

For hardness, we want an upper bound on

$$\mathrm{Corr}_{\le D} = \max_{f \,\text{deg}\, D} \frac{\mathbb{E}[f(Y) \cdot v_1]}{\sqrt{\mathbb{E}[f(Y)^2]}}.$$

Same proof as detection? Issue: we would need orthogonal polynomials for the planted distribution.

Trick: bound the denominator via Jensen's inequality:

$$\mathbb{E}[f(Y)^2] = \mathbb{E}_Z \mathbb{E}_X\big[f(X+Z)^2\big] \ge \mathbb{E}_Z \big(\mathbb{E}_X f(X+Z)\big)^2$$

◮ This simplifies the expression enough to find a closed form
◮ Yields tight bounds for the planted submatrix problem

8 / 10
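To spell out why the trick helps (my paraphrase of the step, in the notation of the slide): writing g(Z) := E_X f(X + Z), Jensen's inequality yields

$$\mathrm{Corr}_{\le D} \le \max_{f \,\text{deg}\, D} \frac{\mathbb{E}[f(Y) \cdot v_1]}{\sqrt{\mathbb{E}_Z\big[g(Z)^2\big]}},$$

and g is a polynomial of degree at most D in Z alone. Since Z has i.i.d. Gaussian entries, the denominator can now be handled with Hermite (orthogonal) polynomials, sidestepping the need for orthogonal polynomials under the planted distribution.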

Optimization [Gamarnik, Jagannath, W. ’20]

Example (spherical spin glass): for a tensor Y ∈ R^{n×n×n} with i.i.d. N(0, 1) entries, find a unit vector v maximizing

$$H(v) = \frac{1}{\sqrt{n}} \langle Y, v^{\otimes 3} \rangle.$$

Optimum value: $\mathrm{OPT} = \max_{\|v\| = 1} H(v) = \Theta(1)$.

Our result: no constant-degree polynomial can achieve value OPT − ε.

◮ Best known algorithms are constant-degree [Sub18, Mon18, EMS20]
◮ Proof ingredients: low-degree polynomials are stable; the overlap gap property [GS13, CGPR17, GJ19]
◮ Open: show that no low-degree polynomial can achieve the precise objective value achieved by [Sub18]

Theorem (GJW'20). For some ε > 0, no degree-O(1) polynomial f : R^{n×n×n} → R^n achieves both of the following with probability 1 − o(1):

◮ Objective: H(f(Y)) ≥ OPT − ε
◮ Normalization: ‖f(Y)‖ ≈ 1

9 / 10
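For concreteness, a hedged Python sketch of evaluating the objective H(v) on a random tensor (my illustration of the setup; it does not attempt the optimization itself):

```python
# Hedged sketch: H(v) = <Y, v (x) v (x) v> / sqrt(n) for Y with i.i.d. N(0,1) entries.
import numpy as np

rng = np.random.default_rng(2)
n = 30
Y = rng.standard_normal((n, n, n))

def H(Y, v):
    v = v / np.linalg.norm(v)                 # enforce the unit-norm constraint
    n = v.shape[0]
    return np.einsum('ijk,i,j,k->', Y, v, v, v) / np.sqrt(n)

# A random unit vector typically achieves only H = O(n^{-1/2});
# OPT = Theta(1), and approaching it is where the hardness lives.
print(H(Y, rng.standard_normal(n)))
```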



References

◮ Detection (survey article): Kunisky, W., Bandeira. "Notes on Computational Hardness of Hypothesis Testing: Predictions using the Low-Degree Likelihood Ratio", arXiv:1907.11636
◮ Recovery: Schramm, W. "Computational Barriers to Estimation from Low-Degree Polynomials", arXiv:2008.02269
◮ Optimization: Gamarnik, Jagannath, W. "Low-Degree Hardness of Random Optimization Problems", arXiv:2004.12063

10 / 10