
Hardness of Certification for Constrained PCA
Alex Wein, Courant Institute, NYU
Joint work with: Afonso Bandeira (NYU) and Tim Kunisky (NYU)

Part I: Statistical-to-Computational Gaps and the Low-Degree Method


The Low-Degree Method

Suppose we want to hypothesis test (with error probability o(1)) between two distributions:
- Null model: Y ~ Q_n, e.g. G(n, 1/2)
- Planted model: Y ~ P_n, e.g. G(n, 1/2) ∪ {planted k-clique}

Look for a degree-D multivariate polynomial f that distinguishes P from Q:

$$\max_{f \in \mathbb{R}[Y]_D} \frac{\mathbb{E}_{Y \sim P}[f(Y)]}{\sqrt{\mathbb{E}_{Y \sim Q}[f(Y)^2]}}$$

We want f(Y) to be big when Y ~ P and small when Y ~ Q.
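To make the two models concrete, here is a minimal simulation of the planted-clique example using the simplest possible polynomial test statistic, the total edge count (a degree-1 polynomial in the entries of Y). This sketch is not from the talk; the sizes n, k and the choice of statistic are illustrative only.

```python
import numpy as np

def adjacency(n, k=0, rng=None):
    """Adjacency matrix of G(n, 1/2), with an optional planted k-clique."""
    rng = np.random.default_rng(rng)
    U = np.triu(rng.integers(0, 2, size=(n, n)), 1)
    A = U + U.T
    if k > 0:
        S = rng.choice(n, size=k, replace=False)   # vertices of the planted clique
        A[np.ix_(S, S)] = 1
        np.fill_diagonal(A, 0)
    return A

def edge_count(A):
    """Total number of edges: a degree-1 polynomial in the adjacency entries."""
    return A.sum() / 2

n, k, trials = 2000, 150, 10                       # illustrative sizes (k well above sqrt(n))
rng = np.random.default_rng(0)
null    = [edge_count(adjacency(n, 0, rng)) for _ in range(trials)]
planted = [edge_count(adjacency(n, k, rng)) for _ in range(trials)]
print("null    mean, std:", np.mean(null),    round(np.std(null), 1))
print("planted mean, std:", np.mean(planted), round(np.std(planted), 1))
# The planted clique adds roughly k^2/4 extra edges, while the null fluctuation is only about
# n/(2*sqrt(2)) edges, so this crude statistic already separates the models for k a large enough
# multiple of sqrt(n). Better low-degree statistics (e.g. spectral ones) work down to k of order
# sqrt(n), below which the problem is conjectured to be hard.
```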

The Low-Degree Method (cont.)

$$\max_{f \in \mathbb{R}[Y]_D} \frac{\mathbb{E}_{Y \sim P}[f(Y)]}{\sqrt{\mathbb{E}_{Y \sim Q}[f(Y)^2]}}
= \max_{f \in \mathbb{R}[Y]_D} \frac{\mathbb{E}_{Y \sim Q}[L(Y)\, f(Y)]}{\sqrt{\mathbb{E}_{Y \sim Q}[f(Y)^2]}}
= \max_{f \in \mathbb{R}[Y]_D} \frac{\langle L, f \rangle}{\|f\|}
= \|L^{\le D}\|$$

where
- R[Y]_D is the subspace of polynomials of degree ≤ D,
- L(Y) = dP/dQ(Y) is the likelihood ratio,
- ⟨f, g⟩ = E_{Y~Q}[f(Y) g(Y)] and ‖f‖ = √⟨f, f⟩.

The maximizer is f = L^{≤D} := proj_{R[Y]_D}(L), so the optimal value is the norm of the low-degree likelihood ratio.

The Low-Degree Method (cont.)

Conclusion:

$$\max_{f \in \mathbb{R}[Y]_D} \frac{\mathbb{E}_{Y \sim P}[f(Y)]}{\sqrt{\mathbb{E}_{Y \sim Q}[f(Y)^2]}} = \|L^{\le D}\|$$

Heuristically:
- ‖L^{≤D}‖ = ω(1): some degree-D polynomial can distinguish Q, P
- ‖L^{≤D}‖ = O(1): degree-D polynomials fail

Degree-O(log n) polynomials ⇔ polynomial-time algorithms:
- Spectral method: distinguish via the top eigenvalue of a matrix M = M(Y) whose entries are polynomials of degree O(1) in Y
- Log-degree distinguisher: f(Y) = Tr(M^q) with q = Θ(log n)
- Spectral methods ⇔ sum-of-squares [HKPRSS '17]

Conjecture (informal variant of [Hopkins '18]). For "nice" Q, P, if ‖L^{≤D}‖ = O(1) for D = log^{1+Ω(1)}(n), then no polynomial-time algorithm can distinguish Q, P with success probability 1 − o(1).
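The Tr(M^q) distinguisher can be illustrated on the planted-clique example. The sketch below is an assumption-laden illustration rather than anything prescribed by the talk: M is taken to be the ±1-centered adjacency matrix scaled by 1/√n (a Wigner-type matrix with λ_max ≈ 2 under the null), and n, k, q are illustrative values. A clique of size k comfortably above 2√n produces an outlier eigenvalue of roughly k/√n, which dominates Tr(M^q) once q is of order log n.

```python
import numpy as np

def centered_adjacency(n, k=0, rng=None):
    """M with off-diagonal entries (2*A_ij - 1)/sqrt(n), where A is G(n, 1/2) plus an optional k-clique."""
    rng = np.random.default_rng(rng)
    U = np.triu(rng.integers(0, 2, size=(n, n)), 1)
    A = U + U.T
    if k > 0:
        S = rng.choice(n, size=k, replace=False)
        A[np.ix_(S, S)] = 1
    M = (2.0 * A - 1.0) / np.sqrt(n)
    np.fill_diagonal(M, 0.0)
    return M

def trace_power(M, q):
    """f(Y) = Tr(M^q): a polynomial of degree q in the entries of Y (evaluated here via eigenvalues)."""
    return float(np.sum(np.linalg.eigvalsh(M) ** q))

n, k = 1500, 120                            # illustrative; k is comfortably above 2*sqrt(n) ≈ 77
q = 2 * int(np.ceil(np.log(n)))             # even degree of order log n
rng = np.random.default_rng(0)
null    = [trace_power(centered_adjacency(n, 0, rng), q) for _ in range(5)]
planted = [trace_power(centered_adjacency(n, k, rng), q) for _ in range(5)]
print("null   :", null)
print("planted:", planted)
# Under the null the spectrum follows the semicircle law on [-2, 2], so Tr(M^q) is roughly
# n * 2^q up to factors polynomial in q. The planted clique creates an eigenvalue near
# k/sqrt(n) > 2, whose q-th power dominates the trace for q of order log n.
```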

Advantages of the Low-Degree Method
- Can actually calculate/bound ‖L^{≤D}‖ for many problems.
- And the predictions are correct! (i.e. they match widely-believed conjectures)
  - planted clique, sparse PCA, stochastic block model, tensor PCA, ...
- Heuristically, the low-degree prediction matches the performance of sum-of-squares,
  - but the low-degree calculation is much easier than proving SOS lower bounds.
- By varying the degree D, one can explore the power of subexponential-time algorithms:
  - degree-n^δ polynomials ⇔ time-2^{n^δ} algorithms, for δ ∈ (0, 1).

How to Compute ‖L^{≤D}‖

Additive Gaussian noise model: P: Y = X + Z versus Q: Y = Z, where X is drawn from an arbitrary prior distribution over R^N and Z has i.i.d. N(0, 1) entries.

$$L(Y) = \frac{dP}{dQ}(Y) = \frac{\mathbb{E}_X \exp\!\left(-\tfrac{1}{2}\|Y - X\|^2\right)}{\exp\!\left(-\tfrac{1}{2}\|Y\|^2\right)} = \mathbb{E}_X \exp\!\left(\langle Y, X \rangle - \tfrac{1}{2}\|X\|^2\right)$$

Write L = Σ_α c_α h_α, where {h_α} are the Hermite polynomials (an orthonormal basis with respect to Q). Then

$$\|L^{\le D}\|^2 = \sum_{|\alpha| \le D} c_\alpha^2, \qquad c_\alpha = \langle L, h_\alpha \rangle = \mathbb{E}_{Y \sim Q}[L(Y)\, h_\alpha(Y)]$$

Carrying out this calculation gives the result:

$$\|L^{\le D}\|^2 = \sum_{d=0}^{D} \frac{1}{d!}\, \mathbb{E}_{X, X'}\!\left[\langle X, X' \rangle^d\right]$$

where X' is an independent copy of X.
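As a numerical sanity check on the final formula, here is a small Monte Carlo sketch (the prior, X uniform on {±θ/√N}^N, and all parameters are illustrative choices, not from the talk). As D grows, the truncated sum Σ_{d≤D} E[⟨X, X'⟩^d]/d! should increase toward the full squared norm ‖L‖² = E_{X,X'}[exp⟨X, X'⟩].

```python
import numpy as np
from math import factorial

def low_degree_norm_sq(samples_X, samples_Xp, D):
    """Monte Carlo estimate of ||L^{<=D}||^2 = sum_{d<=D} E[<X, X'>^d] / d!
    for the additive Gaussian noise model, using independent prior draws X, X'."""
    overlaps = np.einsum("ij,ij->i", samples_X, samples_Xp)   # <X, X'> for each pair
    return sum(np.mean(overlaps ** d) / factorial(d) for d in range(D + 1))

# Illustrative prior (not from the talk): X uniform on {±theta/sqrt(N)}^N.
rng = np.random.default_rng(0)
N, theta, trials = 50, 2.0, 100_000
X  = theta * rng.choice([-1.0, 1.0], size=(trials, N)) / np.sqrt(N)
Xp = theta * rng.choice([-1.0, 1.0], size=(trials, N)) / np.sqrt(N)

for D in [1, 2, 4, 8, 16]:
    print("D =", D, " ||L^{<=D}||^2 ≈", low_degree_norm_sq(X, Xp, D))

# The D -> infinity limit is the full squared norm ||L||^2 = E[exp(<X, X'>)],
# which we can also estimate directly from the same overlaps:
overlaps = np.einsum("ij,ij->i", X, Xp)
print("||L||^2 ≈", np.mean(np.exp(overlaps)))
```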

Part II: Hardness of Certification for Constrained PCA Problems

Constrained PCA

Let W ∼ GOE(n), the Gaussian orthogonal ensemble:
- an n × n random symmetric matrix with W_ij = W_ji ∼ N(0, 1/n) for i ≠ j and W_ii ∼ N(0, 2/n)
- its eigenvalues follow the semicircle law on [−2, 2]

PCA:

$$\max_{\|x\| = 1} x^\top W x = \lambda_{\max}(W) \to 2 \quad \text{as } n \to \infty$$

Constrained PCA:

$$\varphi(W) := \max_{x \in \{\pm 1/\sqrt{n}\}^n} x^\top W x$$

Statistical physics: the Sherrington–Kirkpatrick spin glass model
- φ(W) → 2P* ≈ 1.5264 as n → ∞ [Parisi '80; Talagrand '06]
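A short sketch to go with these definitions (illustrative sizes only; the brute-force search is feasible only for tiny n, where both quantities still fluctuate far from their asymptotic values): it samples GOE(n), checks that λ_max(W) is near 2 for moderately large n, and computes φ(W) exactly for a small instance.

```python
import itertools
import numpy as np

def sample_goe(n, rng=None):
    """GOE(n): symmetric, off-diagonal entries N(0, 1/n), diagonal entries N(0, 2/n)."""
    rng = np.random.default_rng(rng)
    G = rng.normal(size=(n, n)) / np.sqrt(n)
    return (G + G.T) / np.sqrt(2)

def phi_bruteforce(W):
    """phi(W) = max over x in {±1/sqrt(n)}^n of x^T W x, by exhaustive search (tiny n only)."""
    n = W.shape[0]
    best = -np.inf
    for signs in itertools.product([-1.0, 1.0], repeat=n - 1):
        x = np.array((1.0,) + signs) / np.sqrt(n)   # fix x_1 = +1: x and -x give the same value
        best = max(best, x @ W @ x)
    return best

rng = np.random.default_rng(1)

# Moderately large n: lambda_max(W) concentrates near 2 (the semicircle edge).
print("lambda_max, n = 1000:", np.linalg.eigvalsh(sample_goe(1000, rng))[-1])

# Tiny n: phi(W) <= lambda_max(W) always; asymptotically phi -> 2P* ≈ 1.53 while lambda_max -> 2.
for _ in range(3):
    W = sample_goe(12, rng)
    print("n = 12   phi(W) =", round(phi_bruteforce(W), 3),
          "  lambda_max(W) =", round(np.linalg.eigvalsh(W)[-1], 3))
```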

Search vs Certification

$$\varphi(W) := \max_{x \in \{\pm 1/\sqrt{n}\}^n} x^\top W x, \qquad W \sim \mathrm{GOE}(n)$$

Two computational problems:
- Search: given W, find x ∈ {±1/√n}^n with large x^⊤Wx.
  - This proves a lower bound on φ(W).
- Certification: given W, prove φ(W) ≤ B for some bound B.
  - Formally: an algorithm {f_n} outputs f_n(W) ∈ R such that
    (i) φ(W) ≤ f_n(W) for all W ∈ R^{n×n}, and
    (ii) if W ∼ GOE(n), then f_n(W) ≤ B + o(1) with probability 1 − o(1).
  - Note: one cannot simply output f_n(W) = 2P* + ε, since a constant does not upper-bound φ(W) for every input W.

Search vs Certification: Prior Work

Perfect search is possible in polynomial time:
- One can find x ∈ {±1/√n}^n such that x^⊤Wx ≥ 2P* − ε [Montanari '18]
- Related: optimization of full-RSB models [Subag '18]

Trivial spectral certification:

$$\varphi(W) \le \max_{\|x\| = 1} x^\top W x = \lambda_{\max}(W) \to 2$$

Can we do better (in polynomial time)? With a convex relaxation? With sum-of-squares?

Answer: no! In particular, any convex relaxation fails.
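The gap between search and certification can be seen numerically. The sketch below is not Montanari's algorithm (which is AMP-based); the "search" step is just naive sign-rounding of the top eigenvector, used only to produce some valid lower bound on φ(W), while the certificate is the trivial spectral bound λ_max(W) ≈ 2. All sizes are illustrative.

```python
import numpy as np

def sample_goe(n, rng=None):
    rng = np.random.default_rng(rng)
    G = rng.normal(size=(n, n)) / np.sqrt(n)
    return (G + G.T) / np.sqrt(2)

def spectral_certificate(W):
    """Valid for every symmetric W: phi(W) <= max_{||x||=1} x^T W x = lambda_max(W)."""
    return np.linalg.eigvalsh(W)[-1]

def rounding_lower_bound(W):
    """A crude search heuristic: round the top eigenvector to {±1/sqrt(n)}^n."""
    n = W.shape[0]
    v = np.linalg.eigh(W)[1][:, -1]
    x = np.sign(v) / np.sqrt(n)
    return x @ W @ x

W = sample_goe(2000, np.random.default_rng(0))
print("certified upper bound :", spectral_certificate(W))   # ≈ 2
print("rounding lower bound  :", rounding_lower_bound(W))   # some value below 2P* ≈ 1.53
# The main result says no polynomial-time certifier can improve the upper bound below 2 - eps,
# even though phi(W) concentrates near 2P* ≈ 1.53 and a (more sophisticated) polynomial-time
# search algorithm can find solutions with value close to that.
```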

Main Result

Theorem (informal). Conditional on the low-degree method, for any ε > 0, no polynomial-time algorithm can certify an upper bound of 2 − ε on φ(W).
- In fact, essentially exponential time, 2^{n^{1−o(1)}}, is needed.
- The result also holds for constraint sets other than {±1/√n}^n.

Proof outline:
(i) Reduce a hypothesis testing problem (the negatively-spiked Wishart model) to the certification problem.
(ii) Use the low-degree method to show that the hypothesis testing problem is hard.

Spiked Wishart Model

Q: observe N independent samples y_1, ..., y_N with y_i ∼ N(0, I_n).

P: a planted vector x ∼ Unif({±1/√n}^n); observe y_1, ..., y_N with y_i ∼ N(0, I_n + β xx^⊤).

Parameters: n/N → γ and β ∈ [−1, ∞).

Spectral threshold: if β² > γ, one can distinguish Q from P using the top/bottom eigenvalue of the sample covariance matrix Y = (1/N) Σ_i y_i y_i^⊤ [Baik, Ben Arous, Péché '05].

Using the low-degree method, we show: if β² < γ, one cannot distinguish Q from P (unless given exponential time).
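Here is a small simulation of the spectral test in the detectable regime β² > γ (all parameters are illustrative): with a negative spike, the planted model pushes an eigenvalue of the sample covariance below the Marchenko–Pastur lower edge (1 − √γ)², while under the null the smallest eigenvalue stays at the edge.

```python
import numpy as np

def wishart_sample_cov(n, N, beta=0.0, rng=None):
    """Sample covariance (1/N) sum_i y_i y_i^T with y_i ~ N(0, I_n + beta * x x^T),
    x uniform on {±1/sqrt(n)}^n (beta = 0 recovers the null model Q)."""
    rng = np.random.default_rng(rng)
    Z = rng.normal(size=(N, n))                                # rows are i.i.d. N(0, I_n)
    if beta != 0.0:
        x = rng.choice([-1.0, 1.0], size=n) / np.sqrt(n)
        Z = Z + (np.sqrt(1.0 + beta) - 1.0) * np.outer(Z @ x, x)   # yields covariance I + beta*x x^T
    return (Z.T @ Z) / N

n, N = 400, 800                      # gamma = n/N = 0.5
gamma = n / N
beta = -0.9                          # negative spike with beta^2 = 0.81 > gamma: spectrally detectable
rng = np.random.default_rng(0)

lam_min_null    = np.linalg.eigvalsh(wishart_sample_cov(n, N, 0.0,  rng))[0]
lam_min_planted = np.linalg.eigvalsh(wishart_sample_cov(n, N, beta, rng))[0]

print("Marchenko-Pastur lower edge (1 - sqrt(gamma))^2 :", (1 - np.sqrt(gamma)) ** 2)
print("smallest eigenvalue under Q                     :", lam_min_null)
print("smallest eigenvalue under P (beta = -0.9)       :", lam_min_planted)
# With beta^2 > gamma, the planted model produces an eigenvalue strictly below the
# Marchenko-Pastur bulk: this is the [BBP '05] spectral test. With beta^2 < gamma the
# spectra of P and Q look the same, and by the low-degree argument no subexponential-time
# test is expected to succeed.
```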

Negatively-Spiked Wishart Model

Our case of interest: β = −1 (technically β > −1 with β ≈ −1).
