Introduction to the Low-Degree Polynomial Method
Alex Wein

Courant Institute, New York University

1 / 31

Part I: Why Low-Degree Polynomials?

2 / 31

Problems in High-Dimensional Statistics

Example: finding a large clique in a random graph
◮ Detection: distinguish between a random graph and a graph with a planted clique
◮ Recovery: given a graph with a planted clique, find the clique
◮ Optimization: given a random graph (with no planted clique), find as large a clique as possible

Common to have information-computation gaps, e.g. planted k-clique (either detection or recovery)

What makes problems easy vs hard?

3 / 31

The Low-Degree Polynomial Method

A framework for predicting/explaining average-case computational complexity

Originated from sum-of-squares literature (for detection)

[Barak, Hopkins, Kelner, Kothari, Moitra, Potechin ’16] [Hopkins, Steurer ’17] [Hopkins, Kothari, Potechin, Raghavendra, Schramm, Steurer ’17] [Hopkins ’18 (PhD thesis)]

Today: self-contained motivation (without SoS)

4 / 31

The Low-Degree Polynomial Method

Study a restricted class of algorithms: low-degree polynomials
◮ Multivariate polynomial f : R^N → R^M
◮ Input: e.g. graph Y ∈ {0, 1}^(n choose 2)
◮ Output: e.g. b ∈ {0, 1} or v ∈ R^n
◮ “Low” means O(log n) where n is dimension

Examples of low-degree algorithms (input Y ∈ R^{n×n}):
◮ Power iteration: Y^k 1 or Tr(Y^k), k = O(log n)
◮ Approximate message passing: v ← Y h(v), O(1) rounds
◮ Local algorithms on sparse graphs, radius O(1)
◮ Or any of the above applied to Ỹ = g(Y), deg g = O(1)

5 / 31
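
To make these examples concrete, here is a minimal numpy sketch (not from the slides; the sizes n, k and the clique size are arbitrary) evaluating two of the low-degree statistics above, the power-iteration vector Y^k 1 and the trace Tr(Y^k), on a ±1 adjacency matrix with and without a planted clique.

```python
import numpy as np

rng = np.random.default_rng(0)

def pm_adjacency(n, clique=None):
    """Symmetric +/-1 "adjacency" matrix (zero diagonal) of G(n, 1/2);
    optionally force the edges inside a planted clique to +1."""
    Y = np.where(rng.random((n, n)) < 0.5, 1.0, -1.0)
    Y = np.triu(Y, 1)
    Y = Y + Y.T
    if clique is not None:
        idx = np.array(list(clique))
        Y[np.ix_(idx, idx)] = np.ones((len(idx), len(idx))) - np.eye(len(idx))
    return Y

def power_iteration_stat(Y, k):
    """Entries of Y^k 1: each coordinate is a degree-k polynomial of the input."""
    v = np.ones(Y.shape[0])
    for _ in range(k):
        v = Y @ v
    return v

def trace_stat(Y, k):
    """Tr(Y^k): a single degree-k polynomial (signed count of closed k-walks)."""
    return np.trace(np.linalg.matrix_power(Y, k))

n, k, clique_size = 300, 10, 60
Y_null = pm_adjacency(n)
Y_planted = pm_adjacency(n, clique=range(clique_size))
print("Tr(Y^k), null:    %.3e" % trace_stat(Y_null, k))
print("Tr(Y^k), planted: %.3e" % trace_stat(Y_planted, k))
print("||Y^k 1||, null vs planted: %.3e vs %.3e"
      % (np.linalg.norm(power_iteration_stat(Y_null, k)),
         np.linalg.norm(power_iteration_stat(Y_planted, k))))
```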

The Low-Degree Polynomial Method

Claim: low-degree polynomials provide a unified explanation of information-computation gaps in detection/recovery/optimization.

For all of these problems...

planted clique, sparse PCA, community detection, tensor PCA, planted CSPs, spiked Wigner/Wishart, planted submatrix, planted dense subgraph, p-spin optimization, max independent set

...it is the case that
◮ the best known poly-time algorithms are low-degree (spectral/AMP/local)
◮ low-degree polynomials fail in the “hard” regime

“Low-degree conjecture” (informal): low-degree polynomials are as powerful as all poly-time algorithms for “natural” high-dimensional problems [Hopkins ’18]

6 / 31

Overview

This talk: techniques to prove that all low-degree polynomials fail
◮ Gives evidence for computational hardness

Settings:
◮ Detection

[Hopkins, Steurer ’17] [Hopkins, Kothari, Potechin, Raghavendra, Schramm, Steurer ’17] [Hopkins ’18] (PhD thesis) [Kunisky, W., Bandeira ’19] (survey)

◮ Recovery

[Schramm, W. ’20]

◮ Optimization

[Gamarnik, Jagannath, W. ’20]

7 / 31

Part II: Detection

8 / 31

Detection (e.g. [Hopkins, Steurer ’17])

Goal: hypothesis test with error probability o(1) between:
◮ Null model Y ∼ Q_n, e.g. G(n, 1/2)
◮ Planted model Y ∼ P_n, e.g. G(n, 1/2) ∪ {random k-clique}

Look for a degree-D polynomial f : R^{n×n} → R that distinguishes P from Q
◮ f(Y) is “big” when Y ∼ P and “small” when Y ∼ Q

Compute the “advantage”:

    Adv≤D := max_{deg f ≤ D}  E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²])     (mean in P over fluctuations in Q)

◮ Adv≤D = ω(1): “degree-D polynomials succeed”
◮ Adv≤D = O(1): “degree-D polynomials fail”

9 / 31

Detection (e.g. [Hopkins, Steurer ’17])

Prototypical result (planted clique):

Theorem [BHKKMP16, Hop18]: For a planted k-clique in G(n, 1/2),
◮ if k = Ω(√n) then Adv≤D = ω(1) for some D = O(log n)
  (low-degree polynomials succeed when k ≳ √n)
◮ if k = O(n^{1/2−ε}) then Adv≤D = O(1) for any D = O(log n)
  (low-degree polynomials fail when k ≪ √n)

Sometimes can rule out polynomials of degree D = n^δ

Extended low-degree conjecture [Hopkins ’18]: degree-D polynomials ⇔ n^{Θ̃(D)}-time algorithms; D = n^δ ⇔ exp(n^{δ±o(1)}) time

10 / 31
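
The easy regime in this theorem is already witnessed by the simplest degree-1 polynomial, the edge count. A small Monte Carlo sketch (assuming numpy; the sizes are arbitrary) comparing the edge count under G(n, 1/2) and under a planted k-clique with k a few times √n:

```python
import numpy as np

rng = np.random.default_rng(1)

def edge_count(n, k=0):
    """Sample G(n, 1/2), optionally plant a k-clique, and return the number of
    edges -- a degree-1 polynomial in the edge indicator variables."""
    A = np.triu(rng.random((n, n)) < 0.5, 1)        # each potential edge once
    if k > 0:
        A[:k, :k] = A[:k, :k] | np.triu(np.ones((k, k), dtype=bool), 1)
    return int(A.sum())

n, k, trials = 2000, 200, 20                         # here k is about 4.5 * sqrt(n)
null = [edge_count(n) for _ in range(trials)]
planted = [edge_count(n, k) for _ in range(trials)]
print("null:    mean %.0f  std %.0f" % (np.mean(null), np.std(null)))
print("planted: mean %.0f  std %.0f" % (np.mean(planted), np.std(planted)))
# Planting raises the mean by roughly (1/2) * C(k,2) ~ k^2/4 = 10^4, while the
# null fluctuations are only ~ sqrt(C(n,2))/2 ~ 700, so this degree-1 statistic
# already separates P from Q once k >> sqrt(n).
```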

Detection (e.g. [Hopkins, Steurer ’17])

Goal: compute

    Adv≤D := max_{deg f ≤ D}  E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²])

Suppose Q is i.i.d. Unif(±1)

Write f(Y) = Σ_{|S|≤D} f̂_S Y^S, where Y^S := Π_{i∈S} Y_i for S ⊆ [m]

{Y^S}_{S⊆[m]} are orthonormal: E_{Y∼Q}[Y^S Y^T] = 1_{S=T}

Numerator: E_{Y∼P}[f(Y)] = Σ_{|S|≤D} f̂_S E_{Y∼P}[Y^S] =: ⟨f̂, c⟩

Denominator: E_{Y∼Q}[f(Y)²] = Σ_{|S|≤D} f̂_S² = ‖f̂‖²   (orthonormality)

    Adv≤D = max_{f̂} ⟨f̂, c⟩ / ‖f̂‖ = ⟨c, c⟩ / ‖c‖ = ‖c‖ = √( Σ_{|S|≤D} E_{Y∼P}[Y^S]² )

Optimizer: f̂* = c

11 / 31
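
A brute-force numerical sketch of this computation (assuming numpy/itertools; the toy sizes n, k, D and the Monte Carlo budget are arbitrary): estimate c_S = E_{Y∼P}[Y^S] for every edge subset with |S| ≤ D in a tiny ±1 planted-clique model and report ‖c‖.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n, k, D, samples = 7, 4, 2, 20000
edges = list(itertools.combinations(range(n), 2))       # the m = C(n,2) coordinates of Y

def sample_P():
    """Planted model: i.i.d. +/-1 edge labels with a random k-clique forced to +1."""
    y = rng.choice([-1.0, 1.0], size=len(edges))
    clique = set(rng.choice(n, size=k, replace=False))
    for t, (i, j) in enumerate(edges):
        if i in clique and j in clique:
            y[t] = 1.0
    return y

# All monomials Y^S with |S| <= D, and c_S = E_{Y~P}[Y^S] estimated by Monte Carlo.
subsets = [S for d in range(D + 1)
           for S in itertools.combinations(range(len(edges)), d)]
Y = np.array([sample_P() for _ in range(samples)])       # samples x m
c = np.array([Y[:, list(S)].prod(axis=1).mean() for S in subsets])

print("number of monomials with |S| <= D:", len(subsets))
print("estimated Adv<=D = ||c|| = %.3f" % np.linalg.norm(c))
# Note: sampling noise inflates ||c|| a little (each c_S^2 picks up ~1/samples),
# so this only illustrates the formula; it is not a sharp value.
```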

Detection (e.g. [Hopkins, Steurer ’17])

Remarks:
◮ Best test is the likelihood ratio (Neyman–Pearson lemma): L(Y) = dP/dQ(Y)
◮ Best degree-D test (maximizer of Adv≤D) is f* = L^{≤D} := projection of L onto the degree-D subspace
  (orthogonal projection w.r.t. ⟨f, g⟩ := E_{Y∼Q}[f(Y)g(Y)])
  — the “low-degree likelihood ratio”
◮ Adv≤D = ‖L^{≤D}‖, where ‖f‖ := √⟨f, f⟩ = √(E_{Y∼Q}[f(Y)²])
  — the “norm of the low-degree likelihood ratio”

Proof: L̂_S = E_{Y∼Q}[L(Y) Y^S] = E_{Y∼P}[Y^S], and f̂*_S = E_{Y∼P}[Y^S] 1_{|S|≤D}

12 / 31

Detection (e.g. [Hopkins, Steurer ’17])

User-friendly results:
◮ Additive Gaussian model: P : Y = X + Z vs Q : Y = Z

    Adv²≤D = Σ_{d=0}^{D} (1/d!) E_{X,X′}⟨X, X′⟩^d

◮ Rademacher model Y ∈ {±1}^m: P : E[Y | X] = X vs Q : E[Y] = 0

    Adv²≤D ≤ Σ_{d=0}^{D} (1/d!) E_{X,X′}⟨X, X′⟩^d

13 / 31
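
A quick Monte Carlo sketch of the additive Gaussian formula (assuming numpy; the values of n, λ, ρ, D are arbitrary), applied to the planted-submatrix signal X = λvv^⊤ with v_i ∼ Bernoulli(ρ), for which ⟨X, X′⟩ = λ²⟨v, v′⟩².

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(3)
n, lam, rho, D, trials = 100, 0.3, 0.05, 6, 200000

# For X = lam * v v^T with v_i ~ Bernoulli(rho), two independent copies satisfy
# <X, X'> = lam^2 * <v, v'>^2.
v = rng.random((trials, n)) < rho
vp = rng.random((trials, n)) < rho
inner = lam**2 * (v & vp).sum(axis=1).astype(float) ** 2    # <X, X'>

adv_sq = sum(np.mean(inner**d) / factorial(d) for d in range(D + 1))
print("Monte Carlo estimate of Adv^2<=D:", adv_sq)
# Caveat: the d-th term is dominated by rare large overlaps <v, v'>, so the
# Monte Carlo estimate gets rough as d grows; this only shows how the formula
# is used, not a precise value.
```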

Detection (e.g. [Hopkins, Steurer ’17])

Recap (detection):
◮ Given P, Q, can compute (via linear algebra)

    Adv≤D = ‖L^{≤D}‖ = max_{deg f ≤ D}  E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²])

◮ Need to know orthogonal polynomials w.r.t. Q
  ◮ Possible when Q has independent coordinates
◮ To predict computational complexity: for D ≈ log n, Adv≤D = ω(1) ⇒ “easy”, O(1) ⇒ “hard”
◮ These predictions are “correct” for: planted clique, sparse PCA, community detection, tensor PCA, spiked Wigner/Wishart, ...
  [BHKKMP16, HS17, HKPRSS17, Hop18, BKW19, KWB19, DKWB19]

14 / 31

Part III: Recovery

15 / 31

Recovery [Schramm, W. ’20]

Example (planted submatrix): observe an n × n matrix Y = X + Z
◮ Signal: X = λvv^⊤, λ > 0, v_i ∼ Bernoulli(ρ)
◮ Noise: Z i.i.d. N(0, 1)

Detection: distinguish P : Y = X + Z vs Q : Y = Z w.h.p.
Recovery: given Y ∼ P, recover v

If you can recover then you can detect (poly-time reduction)
◮ How: run the recovery algorithm to get v̂ ∈ {0, 1}^n; check v̂^⊤ Y v̂

So if Adv≤D = O(1), this suggests recovery is hard. But planted submatrix has a detection–recovery gap.

How to show hardness of recovery when detection is easy?

16 / 31

Recovery [Schramm, W. ’20]

Example (planted submatrix): observe an n × n matrix Y = X + Z
◮ Signal: X = λvv^⊤, λ > 0, v_i ∼ Bernoulli(ρ)
◮ Noise: Z i.i.d. N(0, 1)

Goal: given Y, estimate v_1 via a polynomial f : R^{n×n} → R

Low-degree minimum mean squared error:

    MMSE≤D = min_{deg f ≤ D} E[(f(Y) − v_1)²]

Equivalent to low-degree maximum correlation:

    Corr≤D = max_{deg f ≤ D} E[f(Y) · v_1] / √(E[f(Y)²])

Fact: MMSE≤D = E[v_1²] − Corr²≤D

17 / 31
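
One crude but direct way to see MMSE≤D numerically is to fit the best degree-D polynomial by least squares over Monte Carlo samples; a minimal sketch for D = 1 on a small planted-submatrix instance (assuming numpy; all sizes are arbitrary, and only the degree-≤1 monomials are used to keep the feature matrix small).

```python
import numpy as np

rng = np.random.default_rng(4)
n, lam, rho, samples = 20, 2.0, 0.3, 5000

def sample():
    """One draw of (Y, v_1) from the planted submatrix model."""
    v = (rng.random(n) < rho).astype(float)
    Y = lam * np.outer(v, v) + rng.standard_normal((n, n))
    return Y.ravel(), v[0]

draws = [sample() for _ in range(samples)]
Ys = np.array([y for y, _ in draws])                     # samples x n^2
v1 = np.array([t for _, t in draws])

# Degree-<=1 monomials: the constant 1 and each entry Y_ij.  Least squares over
# the samples approximates the best degree-1 estimator of v_1.
features = np.hstack([np.ones((samples, 1)), Ys])
coef, *_ = np.linalg.lstsq(features, v1, rcond=None)
mse = np.mean((v1 - features @ coef) ** 2)
print("empirical degree-1 MMSE:", mse)
print("trivial MSE (predict E[v_1]): rho*(1-rho) =", rho * (1 - rho))
# In-sample least squares is slightly optimistic (it overfits the 1 + n^2
# coefficients), but it shows how MMSE<=D is defined as a regression problem.
```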

Recovery [Schramm, W. ’20]

For hardness, want an upper bound on

    Corr≤D = max_{deg f ≤ D} E[f(Y) · v_1] / √(E[f(Y)²])

Same proof as detection? Write f = Σ_{|S|≤D} f̂_S Y^S

Numerator: E[f(Y) · v_1] = Σ_{|S|≤D} f̂_S E[Y^S · v_1] =: ⟨f̂, c⟩

Denominator: E[f(Y)²] = Σ_{S,T} f̂_S f̂_T E[Y^S · Y^T] = f̂^⊤ M f̂

    Corr≤D = max_{f̂} ⟨f̂, c⟩ / √(f̂^⊤ M f̂) = √(c^⊤ M^{−1} c)

18 / 31

Recovery [Schramm, W. ’20]

For hardness, want an upper bound on

    Corr≤D = max_{deg f ≤ D} E[f(Y) · v_1] / √(E[f(Y)²])

Trick: bound the denominator via Jensen’s inequality on the “signal” X:

    E[f(Y)²] = E_Z E_X[f(X + Z)²] ≥ E_Z (E_X f(X + Z))²

Why is this tight? In the hard regime, f depends mostly on Z

This simplifies the expression enough to find a closed form:

    Corr≤D ≤ max_{f̂} ⟨f̂, c⟩ / ‖M f̂‖ = ‖c^⊤ M^{−1}‖   where M is upper triangular (can invert)

19 / 31

Recovery [Schramm, W. ’20]

End result:

Theorem [Schramm, W. ’20]: Additive Gaussian model Y = X + Z, scalar value to recover: x

    Corr²≤D ≤ Σ_{|S|≤D} κ_S²

where κ_S is the joint cumulant of {x} ∪ {Y_i : i ∈ S}

Corollary (tight bounds for planted submatrix recovery)
◮ if λ ≪ min{1, 1/(ρ√n)} then MMSE≤D ≈ ρ(1 − ρ) for D = n^{Ω(1)}
  (low-degree polynomials have trivial MSE in the “hard” regime)
◮ if λ ≫ min{1, 1/(ρ√n)} then MMSE≤D = o(ρ) for some D = O(log n)
  (low-degree polynomials succeed in the “easy” regime)

20 / 31

Part IV: Optimization

21 / 31

Optimization [Gamarnik, Jagannath, W. ’20]

Example (spherical spin glass): for Y ∈ R^{n×n×n×n} i.i.d. N(0, 1),

    max_{‖v‖=1} H(v),   where   H(v) := (1/√n) ⟨Y, v^{⊗4}⟩

Optimum value: OPT = max_{‖v‖=1} H(v) = Θ(1) [ABC’13]

Best known algorithms achieve value ALG < OPT [Subag ’18, EMS ’20]

Result: no low-degree polynomial can achieve value OPT − ε

Theorem [Gamarnik, Jagannath, W. ’20]: For some ε > 0, no f : R^{n×n×n×n} → R^n of degree polylog(n) achieves both of the following with probability 1 − exp(−n^{Ω(1)}):
◮ Objective: H(f(Y)) ≥ OPT − ε
◮ Normalization: ‖f(Y)‖ ≈ 1

22 / 31
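
To make the objective concrete, here is a small numpy sketch (not from the talk) that evaluates H(v) = (1/√n)⟨Y, v^{⊗4}⟩ on the unit sphere and runs naive projected gradient ascent; this baseline is only illustrative and is not one of the cited algorithms [Subag ’18, EMS ’20]. The size n and the step settings are arbitrary.

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(5)
n = 12
Y = rng.standard_normal((n, n, n, n))

# Symmetrizing Y leaves <Y, v^{x4}> unchanged and makes the gradient simple.
Ysym = sum(np.transpose(Y, p) for p in permutations(range(4))) / 24.0

def H(v):
    """H(v) = (1/sqrt(n)) <Y, v (x) v (x) v (x) v>."""
    return np.einsum('ijkl,i,j,k,l->', Ysym, v, v, v, v) / np.sqrt(n)

def grad_H(v):
    """Gradient of H at v (valid because Ysym is fully symmetric)."""
    return 4.0 * np.einsum('ijkl,j,k,l->i', Ysym, v, v, v) / np.sqrt(n)

v = rng.standard_normal(n)
v /= np.linalg.norm(v)
start_value = H(v)
for _ in range(300):                       # naive projected gradient ascent
    v = v + 0.05 * grad_H(v)
    v /= np.linalg.norm(v)
print("H at a random unit vector: %.3f" % start_value)
print("H after naive ascent:      %.3f" % H(v))
```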

Optimization [Gamarnik, Jagannath, W. ’20]

Example (max independent set): given a sparse graph G(n, d/n),

    max_{S⊆[n]} |S|   s.t. S independent

OPT = 2 (log d / d) n    ALG = (log d / d) n

Result: no low-degree polynomial can achieve (1 + 1/√2)(log d / d) n

Theorem [Gamarnik, Jagannath, W. ’20]: No polynomial f : {0, 1}^(n choose 2) → R^n of degree polylog(n) achieves both of the following with probability 1 − exp(−n^{Ω(1)}):
◮ f_i(Y) ∈ [0, 1/3] ∪ [2/3, 1] for most i
◮ {i : f_i(Y) ∈ [2/3, 1]} is a near-independent set of size (1 + 1/√2)(log d / d) n

Forthcoming: improve 1 + 1/√2 → 1 + ε (optimal)

23 / 31
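
For reference, the (log d / d) n benchmark ALG is the scale reached (for large d) by the simple greedy algorithm on G(n, d/n); a minimal numpy sketch of that baseline, with arbitrary n and d.

```python
import numpy as np

rng = np.random.default_rng(6)
n, d = 10000, 20

# Sample G(n, d/n) as adjacency lists.
adj = [[] for _ in range(n)]
for i in range(n):
    for j in (np.nonzero(rng.random(n - i - 1) < d / n)[0] + i + 1):
        adj[i].append(j)
        adj[j].append(i)

# Greedy: visit vertices in random order; take a vertex unless one of its
# neighbors has already been taken.
taken = np.zeros(n, dtype=bool)
blocked = np.zeros(n, dtype=bool)
for v in rng.permutation(n):
    if not blocked[v]:
        taken[v] = True
        for u in adj[v]:
            blocked[u] = True

print("greedy independent set size:", int(taken.sum()))
print("(log d / d) n   =", n * np.log(d) / d)
print("2 (log d / d) n =", 2 * n * np.log(d) / d)
```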

Optimization [Gamarnik, Jagannath, W. ’20]

How to prove failure of low-degree polynomials for optimization? Same proof as before?

    max_{deg f ≤ D} E[H(f(Y))] = max_{deg f ≤ D} E[(1/√n)⟨Y, f(Y)^{⊗4}⟩]

No! High-degree in f̂

Instead, use 2 ingredients:
◮ Stability of low-degree polynomials
◮ Overlap gap property (OGP)
  [Gamarnik, Sudan ’13] [Chen, Gamarnik, Panchenko, Rahman ’17] [Gamarnik, Jagannath ’19]

24 / 31

Optimization [Gamarnik, Jagannath, W. ’20]

“Low-degree polynomials are stable”

Y ∼ i.i.d. Bernoulli(p)

Interpolation path: Y^(0) → Y^(1) → Y^(2) → · · · → Y^(m−1) → Y^(m)

f : {0, 1}^m → R^n of degree D

Definition: Index i is “c-bad” if ‖f(Y^(i)) − f(Y^(i−1))‖² > c · E_Y‖f(Y)‖²

Theorem: Pr_{Y^(0),...,Y^(m)}[∄ c-bad i] ≥ p^{4D/c}

With non-trivial probability (over the path), f’s output is “smooth”

25 / 31
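
An empirical illustration of the stability statement (assuming numpy, and assuming a path that resamples one coordinate of Y at a time, which is one natural choice; the polynomial f below is an arbitrary degree-2 example, not anything from the paper): walk along Y^(0), …, Y^(m) and record the largest jump ‖f(Y^(i)) − f(Y^(i−1))‖² relative to E‖f(Y)‖².

```python
import numpy as np

rng = np.random.default_rng(7)
m, n_out, p = 400, 50, 0.5

# A fixed degree-2 polynomial f : {0,1}^m -> R^{n_out}, namely
# f(Y) = (A Y) * (B Y) entrywise for two fixed random linear maps A, B.
A = rng.standard_normal((n_out, m)) / np.sqrt(m)
B = rng.standard_normal((n_out, m)) / np.sqrt(m)
f = lambda y: (A @ y) * (B @ y)

# Reference scale E ||f(Y)||^2, by Monte Carlo over fresh samples.
ref = np.mean([np.sum(f((rng.random(m) < p).astype(float)) ** 2)
               for _ in range(2000)])

# Interpolation path: resample one coordinate at a time and track the largest
# single-step jump of f along the way.
y = (rng.random(m) < p).astype(float)
prev, max_jump = f(y), 0.0
for i in range(m):
    y[i] = float(rng.random() < p)
    cur = f(y)
    max_jump = max(max_jump, float(np.sum((cur - prev) ** 2)))
    prev = cur

print("largest single-step jump / E||f(Y)||^2 = %.4f" % (max_jump / ref))
# For a degree-2 f, changing one of the m coordinates moves the output only a
# little; this smoothness is what the theorem quantifies.
```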

Optimization [Gamarnik, Jagannath, W. ’20]

Overlap gap property (OGP): with high probability, Y ∼ G(n, d/n) has no occurrence of
◮ S, T independent sets
◮ |S|, |T| ≈ (1 + 1/√2)Φ
◮ |S ∩ T| ≈ Φ

Proof: first moment method [Gamarnik, Sudan ’13]

26 / 31

Optimization [Gamarnik, Jagannath, W. ’20]

Ensemble OGP: with high probability, for all i, j on the interpolation path Y^(0) → Y^(1) → · · · → Y^(m), there is no occurrence of
◮ S independent set in Y^(i)
◮ T independent set in Y^(j)
◮ |S|, |T| ≈ (1 + 1/√2)Φ
◮ |S ∩ T| ≈ Φ

27 / 31

Optimization [Gamarnik, Jagannath, W. ’20]

Proof that low-degree polynomials fail:

Suppose f(Y) outputs independent sets of size (1 + 1/√2)Φ

Interpolation path: Y^(0) → Y^(1) → Y^(2) → · · · → Y^(m−1) → Y^(m)

Separation: f(Y^(0)) and f(Y^(m)) are “far apart”

Stability: with probability n^{−D}, there are no big “jumps” f(Y^(i)) → f(Y^(i+1))

Contradicts OGP

28 / 31

Future Directions?

◮ (Detection) bound Adv≤D when Q is not a product measure

◮ E.g. random regular graphs

◮ (Recovery) bound MMSE≤D when not “signal + noise”

◮ E.g. sparse regression, phase retrieval

◮ (Recovery) precise value of MMSE≤D

◮ Matching AMP?

◮ (Optimization) prove tight results for new settings

◮ E.g. p-spin optimization

◮ Implications for other algorithms?

◮ E.g. convex programming, MCMC

29 / 31

References

◮ Detection (survey article): Kunisky, W., Bandeira, “Notes on Computational Hardness of Hypothesis Testing: Predictions using the Low-Degree Likelihood Ratio”, arXiv:1907.11636

◮ Recovery: Schramm, W., “Computational Barriers to Estimation from Low-Degree Polynomials”, arXiv:2008.02269

◮ Optimization: Gamarnik, Jagannath, W., “Low-Degree Hardness of Random Optimization Problems”, arXiv:2004.12063

30 / 31
