
Introduction to the Low-Degree Polynomial Method
Alex Wein, Courant Institute, New York University

Part I: Why Low-Degree Polynomials?

Problems in High-Dimensional Statistics
Example: finding a large clique in a random graph


The Low-Degree Polynomial Method

Claim: low-degree polynomials provide a unified explanation of information-computation gaps in detection/recovery/optimization.

For all of these problems (planted clique, sparse PCA, community detection, tensor PCA, planted CSPs, spiked Wigner/Wishart, planted submatrix, planted dense subgraph, p-spin optimization, max independent set), it is the case that
- the best known poly-time algorithms are low-degree (spectral/AMP/local)
- low-degree polynomials fail in the "hard" regime

"Low-degree conjecture" (informal): low-degree polynomials are as powerful as all poly-time algorithms for "natural" high-dimensional problems [Hopkins '18]


Overview

This talk: techniques to prove that all low-degree polynomials fail
- Gives evidence for computational hardness

Settings:
- Detection [Hopkins, Steurer '17] [Hopkins, Kothari, Potechin, Raghavendra, Schramm, Steurer '17] [Hopkins '18] (PhD thesis) [Kunisky, W., Bandeira '19] (survey)
- Recovery [Schramm, W. '20]
- Optimization [Gamarnik, Jagannath, W. '20]

Part II: Detection


Detection (e.g. [Hopkins, Steurer '17])

Goal: hypothesis test with error probability o(1) between:
- Null model: Y ∼ Q_n, e.g. G(n, 1/2)
- Planted model: Y ∼ P_n, e.g. G(n, 1/2) ∪ {random k-clique}

Look for a degree-D polynomial f : R^{n×n} → R that distinguishes P from Q:
- f(Y) is "big" when Y ∼ P and "small" when Y ∼ Q

Compute the "advantage" (mean in P over fluctuations in Q):

    Adv_{≤D} := max_{f deg D} E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²])

Interpretation: Adv_{≤D} = ω(1) means "degree-D polynomials succeed"; Adv_{≤D} = O(1) means "degree-D polynomials fail".
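To make the "big vs. small" idea concrete, here is a minimal simulation sketch (my illustration, not from the talk) of the simplest degree-1 statistic for planted clique, the centered edge count: planting a k-clique shifts it by roughly k²/2 while its null fluctuations are of order n, so the two models separate once k is a large multiple of √n. All parameters below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_null(n):
    """Y ~ Q: G(n, 1/2), with edges encoded as +1 and non-edges as -1 (upper triangle only)."""
    return np.triu(rng.choice([-1, 1], size=(n, n)), k=1)

def sample_planted(n, k):
    """Y ~ P: G(n, 1/2) with a planted k-clique (those edges forced to +1)."""
    Y = sample_null(n)
    clique = rng.choice(n, size=k, replace=False)
    for i in clique:
        for j in clique:
            if i < j:
                Y[i, j] = 1
    return Y

def edge_count(Y):
    """A degree-1 polynomial in the +/-1 edge variables: f(Y) = sum_{i<j} Y_ij."""
    return Y.sum()

n, k, trials = 400, 60, 200
null_vals = [edge_count(sample_null(n)) for _ in range(trials)]
planted_vals = [edge_count(sample_planted(n, k)) for _ in range(trials)]

# Under Q: mean 0, std = sqrt(n(n-1)/2), about 282 here.
# Planting shifts the mean by about k(k-1)/2, about 1770 here.
print("null:    mean %.0f, std %.0f" % (np.mean(null_vals), np.std(null_vals)))
print("planted: mean %.0f, std %.0f" % (np.mean(planted_vals), np.std(planted_vals)))
```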


Detection (e.g. [Hopkins, Steurer '17])

Prototypical result (planted clique):

Theorem [BHKKMP16, Hop18]: For a planted k-clique in G(n, 1/2),
- if k = Ω(√n), then Adv_{≤D} = ω(1) for some D = O(log n)
  (low-degree polynomials succeed when k ≳ √n)
- if k = O(n^{1/2 − ε}), then Adv_{≤D} = O(1) for any D = O(log n)
  (low-degree polynomials fail when k ≪ √n)

Sometimes one can rule out polynomials of degree D = n^δ.

Extended low-degree conjecture [Hopkins '18]: degree-D polynomials ⇔ n^{Θ̃(D)}-time algorithms; in particular, D = n^δ ⇔ exp(n^{δ ± o(1)}) time.


Detection (e.g. [Hopkins, Steurer '17])

Goal: compute Adv_{≤D} := max_{f deg D} E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²])

Suppose Q is i.i.d. Unif(±1).

Write f(Y) = Σ_{|S|≤D} f̂_S Y^S, where Y^S := Π_{i∈S} Y_i for S ⊆ [m].

The monomials {Y^S}_{S⊆[m]} are orthonormal: E_{Y∼Q}[Y^S Y^T] = 1[S = T].

Numerator:   E_{Y∼P}[f(Y)] = Σ_{|S|≤D} f̂_S E_{Y∼P}[Y^S] =: ⟨f̂, c⟩
Denominator: E_{Y∼Q}[f(Y)²] = Σ_{|S|≤D} f̂_S² = ‖f̂‖²   (orthonormality)

So Adv_{≤D} = max_{f̂} ⟨f̂, c⟩ / ‖f̂‖.

Optimizer: f̂* = c, giving

    Adv_{≤D} = ⟨c, c⟩ / ‖c‖ = ‖c‖ = √( Σ_{|S|≤D} E_{Y∼P}[Y^S]² )
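As a sanity check on the formula Adv_{≤D} = ‖c‖ (my illustration, not from the talk): take the simplest planted model on {±1}^m, where Q is i.i.d. Unif(±1) and P is i.i.d. with mean ε. Then c_S = E_{Y∼P}[Y^S] = ε^{|S|}, so Adv²_{≤D} = Σ_{d=0}^{D} C(m,d) ε^{2d}. The sketch below evaluates this both from the closed form and by brute-force enumeration over subsets S.

```python
import itertools
from math import comb, sqrt

def adv_closed_form(m, eps, D):
    """Adv_{<=D} = sqrt( sum_{d=0}^{D} C(m,d) * eps^(2d) ) for the i.i.d. biased model."""
    return sqrt(sum(comb(m, d) * eps ** (2 * d) for d in range(D + 1)))

def adv_brute_force(m, eps, D):
    """Adv_{<=D} = ||c||, with c_S = E_{Y~P}[Y^S] = eps^{|S|}, enumerating all |S| <= D."""
    total = 0.0
    for d in range(D + 1):
        for S in itertools.combinations(range(m), d):
            c_S = eps ** len(S)   # coordinates are independent under P, each with mean eps
            total += c_S ** 2
    return sqrt(total)

m, eps, D = 10, 0.3, 3
print(adv_closed_form(m, eps, D))   # the two outputs agree
print(adv_brute_force(m, eps, D))
```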


Detection (e.g. [Hopkins, Steurer '17])

Remarks:
- The best test is the likelihood ratio (Neyman-Pearson lemma): L(Y) = dP/dQ(Y)
- The best degree-D test (the maximizer of Adv_{≤D}) is f* = L^{≤D} := projection of L onto the degree-≤D subspace, where the orthogonal projection is w.r.t. ⟨f, g⟩ := E_{Y∼Q}[f(Y) g(Y)]; this is the "low-degree likelihood ratio"
- Adv_{≤D} = ‖L^{≤D}‖, where ‖f‖ := √⟨f, f⟩ = √(E_{Y∼Q}[f(Y)²]); this is the "norm of the low-degree likelihood ratio"

Proof: f̂*_S = L̂_S · 1[|S| ≤ D], and L̂_S = E_{Y∼Q}[L(Y) Y^S] = E_{Y∼P}[Y^S].
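The proof line is terse; written out (my expansion, using only the definitions above and the coefficient vector c from the previous slide), it reads:

```latex
% For |S| <= D, the coefficient of L on the monomial Y^S is
\hat{L}_S \;=\; \langle L, Y^S \rangle
          \;=\; \mathbb{E}_{Y \sim Q}\!\left[ \tfrac{dP}{dQ}(Y)\, Y^S \right]
          \;=\; \mathbb{E}_{Y \sim P}\!\left[ Y^S \right] \;=\; c_S .
% So the projection L^{<=D} has coefficient vector (c_S)_{|S| <= D}, i.e. exactly the
% optimizer \hat{f}^* = c found earlier, and therefore
\mathrm{Adv}_{\le D} \;=\; \| c \| \;=\; \| L^{\le D} \| .
```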


Detection (e.g. [Hopkins, Steurer '17])

User-friendly results (here X, X' denote two independent draws of the signal):
- Additive Gaussian model, P: Y = X + Z vs Q: Y = Z:

      Adv²_{≤D} = E_{X,X'} Σ_{d=0}^{D} (1/d!) ⟨X, X'⟩^d

- Rademacher model, Y ∈ {±1}^m, P: E[Y | X] = X vs Q: E[Y] = 0:

      Adv²_{≤D} ≤ E_{X,X'} Σ_{d=0}^{D} (1/d!) ⟨X, X'⟩^d
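To show how such a formula gets used (my illustration, not from the talk), the sketch below estimates the additive Gaussian expression by Monte Carlo for the planted submatrix signal X = λvv^⊤ with v_i ∼ Bernoulli(ρ), which reappears in Part III. The parameters, sample sizes, and function names are arbitrary choices for the illustration; the heuristic is that a value remaining O(1) as n grows (for D ≈ log n) is the "hard" prediction.

```python
import numpy as np

rng = np.random.default_rng(1)

def adv_sq_additive_gaussian(sample_signal, D, trials=2000):
    """Monte Carlo estimate of Adv_{<=D}^2 = E_{X,X'} sum_{d=0}^{D} <X,X'>^d / d!
    for the additive Gaussian model Y = X + Z, with X, X' independent signal draws."""
    total = 0.0
    for _ in range(trials):
        X, Xp = sample_signal(), sample_signal()
        r = float(np.sum(X * Xp))        # <X, X'>
        term, acc = 1.0, 1.0             # d = 0 term of the series
        for d in range(1, D + 1):
            term *= r / d                # term = r^d / d!
            acc += term
        total += acc
    return total / trials

def planted_submatrix_signal(n=100, lam=0.1, rho=0.05):
    """X = lam * v v^T with v_i ~ Bernoulli(rho)."""
    v = (rng.random(n) < rho).astype(float)
    return lam * np.outer(v, v)

for D in (2, 4, 6):
    print(D, adv_sq_additive_gaussian(planted_submatrix_signal, D))
```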


Detection (e.g. [Hopkins, Steurer '17])

Recap (detection):
- Given P, Q, one can compute (via linear algebra)

      Adv_{≤D} = ‖L^{≤D}‖ = max_{f deg D} E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²])

- Need to know the orthogonal polynomials w.r.t. Q
  (possible when Q has independent coordinates)
- To predict computational complexity: for D ≈ log n,
      Adv_{≤D} = ω(1) ⇒ "easy",   Adv_{≤D} = O(1) ⇒ "hard"
- These predictions are "correct" for: planted clique, sparse PCA, community detection, tensor PCA, spiked Wigner/Wishart, ... [BHKKMP16, HS17, HKPRSS17, Hop18, BKW19, KWB19, DKWB19]

Part III: Recovery


Recovery [Schramm, W. '20]

Example (planted submatrix): observe an n×n matrix Y = X + Z
- Signal: X = λvv^⊤, with v_i ∼ Bernoulli(ρ) and λ > 0
- Noise: Z i.i.d. N(0, 1)

Detection: distinguish P: Y = X + Z vs Q: Y = Z w.h.p.
Recovery: given Y ∼ P, recover v

If you can recover then you can detect (poly-time reduction)
- How: run the recovery algorithm to get v̂ ∈ {0, 1}^n, then check whether v̂^⊤ Y v̂ is large (see the sketch below)

So if Adv_{≤D} = O(1), this suggests recovery is hard.

But planted submatrix has a detection-recovery gap.

How to show hardness of recovery when detection is easy?
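A minimal sketch of that reduction (my illustration, not from the talk): the "recovery algorithm" here is just a naive row-sum thresholding heuristic standing in for whatever recovery procedure is assumed, and all parameters are arbitrary illustrative choices. If recovery succeeds, the statistic v̂^⊤ Y v̂ is far larger under P than its typical value under Q, so thresholding it gives a detection test.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample(n, lam, rho, planted):
    """Planted submatrix model: Y = lam * v v^T + Z under P, or Y = Z under Q."""
    Z = rng.normal(size=(n, n))
    if not planted:
        return Z
    v = (rng.random(n) < rho).astype(float)
    return lam * np.outer(v, v) + Z

def naive_recovery(Y, k):
    """Stand-in recovery algorithm: guess the k indices with the largest row sums."""
    vhat = np.zeros(Y.shape[0])
    vhat[np.argsort(-Y.sum(axis=1))[:k]] = 1.0
    return vhat

def reduction_statistic(Y, k):
    """The reduction from the slide: recover vhat, then report vhat^T Y vhat."""
    vhat = naive_recovery(Y, k)
    return float(vhat @ Y @ vhat)

n, lam, rho, trials = 300, 1.5, 0.1, 50
k = int(rho * n)
p_vals = [reduction_statistic(sample(n, lam, rho, True), k) for _ in range(trials)]
q_vals = [reduction_statistic(sample(n, lam, rho, False), k) for _ in range(trials)]
print("under P: mean %.0f" % np.mean(p_vals))
print("under Q: mean %.0f +/- %.0f" % (np.mean(q_vals), np.std(q_vals)))
```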


Recovery [Schramm, W. '20]

Example (planted submatrix): observe an n×n matrix Y = X + Z
- Signal: X = λvv^⊤, with v_i ∼ Bernoulli(ρ) and λ > 0
- Noise: Z i.i.d. N(0, 1)

Goal: given Y, estimate v_1 via a polynomial f : R^{n×n} → R

Low-degree minimum mean squared error:

    MMSE_{≤D} = min_{f deg D} E[(f(Y) − v_1)²]

Equivalent to low-degree maximum correlation:

    Corr_{≤D} = max_{f deg D} E[f(Y) · v_1] / √(E[f(Y)²])

Fact: MMSE_{≤D} = E[v_1²] − Corr²_{≤D}
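The Fact is a one-line Hilbert space computation; here it is written out (my expansion, not from the slides), using that the degree-≤D polynomials form a linear subspace V_D and that the MMSE-optimal f is the orthogonal projection of v_1 onto V_D:

```latex
% Let g be the orthogonal projection of v_1 onto V_D (w.r.t. <f, h> = E[f h]).
% Since v_1 - g is orthogonal to V_D, for every f in V_D we have E[f v_1] = E[f g], so
\mathrm{Corr}_{\le D}
  \;=\; \max_{f \in V_D} \frac{\mathbb{E}[f(Y)\, v_1]}{\sqrt{\mathbb{E}[f(Y)^2]}}
  \;=\; \max_{f \in V_D} \frac{\mathbb{E}[f\, g]}{\sqrt{\mathbb{E}[f^2]}}
  \;=\; \sqrt{\mathbb{E}[g^2]}
  \qquad \text{(Cauchy--Schwarz, with equality at } f = g\text{)} .
% The minimizer in the MMSE is the projection g itself, and
\mathrm{MMSE}_{\le D}
  \;=\; \mathbb{E}\,(g - v_1)^2
  \;=\; \mathbb{E}[v_1^2] - 2\,\mathbb{E}[g\, v_1] + \mathbb{E}[g^2]
  \;=\; \mathbb{E}[v_1^2] - \mathbb{E}[g^2]
  \;=\; \mathbb{E}[v_1^2] - \mathrm{Corr}_{\le D}^2 .
```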


Recovery [Schramm, W. '20]

For hardness, we want an upper bound on Corr_{≤D} = max_{f deg D} E[f(Y) · v_1] / √(E[f(Y)²]) (all expectations are now under the planted distribution P).

Same proof as detection? Write f(Y) = Σ_{|S|≤D} f̂_S Y^S.

Numerator:   E[f(Y) · v_1] = Σ_{|S|≤D} f̂_S E[Y^S · v_1] =: ⟨f̂, c⟩
Denominator: E[f(Y)²] = Σ_{S,T} f̂_S f̂_T E[Y^S · Y^T] = f̂^⊤ M f̂
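Unlike in detection, the monomials Y^S are not orthonormal under P, so the matrix M need not be the identity and the denominator does not simplify to ‖f̂‖². A tiny numerical sketch of this (my illustration; parameters arbitrary) estimates a 2×2 block of M for two degree-1 monomials in the planted submatrix model:

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_planted(n, lam, rho):
    """Planted submatrix model: Y = lam * v v^T + Z with v_i ~ Bernoulli(rho), Z_ij ~ N(0,1)."""
    v = (rng.random(n) < rho).astype(float)
    return lam * np.outer(v, v) + rng.normal(size=(n, n))

# Estimate M_{S,T} = E_P[Y^S Y^T] for the two degree-1 monomials Y_{01} and Y_{02}.
n, lam, rho, trials = 10, 2.0, 0.3, 20000
vals = np.array([[Y[0, 1], Y[0, 2]] for Y in (sample_planted(n, lam, rho) for _ in range(trials))])
M = vals.T @ vals / trials
# Diagonal entries are roughly 1 + lam^2 * rho^2; the off-diagonal entry is roughly
# lam^2 * rho^3 > 0, so M is not (a multiple of) the identity.
print(M)
```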
