Computational Barriers to Estimation from Low-Degree Polynomials

Alex Wein, Courant Institute, New York University
Joint work with Tselil Schramm (Stanford)

Part I: Why Low-Degree Polynomials?

Problems in High-Dimensional Statistics


Optimality of Low-Degree Polynomials?

Low-degree polynomials seem to be optimal for many problems. For all of the following problems (planted clique, sparse PCA, community detection, tensor PCA, spiked Wigner/Wishart, planted submatrix, planted dense subgraph, ...) it is the case that
◮ the best known poly-time algorithms are captured by O(log n)-degree polynomials (spectral methods, AMP), and
◮ low-degree polynomials fail in the "hard" regime.

"Low-degree conjecture" (informal): for "natural" problems, if low-degree polynomials fail then all poly-time algorithms fail [Hopkins '18].

Caveat: Gaussian elimination solves planted XOR-SAT even in the regime where low-degree polynomials fail.

Overview

This talk: techniques to prove that all low-degree polynomials fail, which gives evidence for computational hardness.

Settings:
◮ Detection (prior work): [Hopkins, Steurer '17], [Hopkins, Kothari, Potechin, Raghavendra, Schramm, Steurer '17], [Hopkins '18] (PhD thesis), [Kunisky, W., Bandeira '19] (survey)
◮ Recovery (this work): [Schramm, W. '20]
◮ Optimization: [Gamarnik, Jagannath, W. '20]

Relation to Other Frameworks

◮ Sum-of-squares lower bounds [BHKKMP16, ...]
  - Actually for certification
  - Connected to low-degree [HKPRSS17]
◮ Statistical query lower bounds [FGRVX12, ...]
  - Need i.i.d. samples
  - Equivalent to low-degree [BBHLS20]
◮ Approximate message passing (AMP) [DMM09, LKZ15, ...]
  - AMP algorithms are low-degree
  - AMP can be sub-optimal (e.g. tensor PCA) [MR14]
◮ Overlap gap property / MCMC lower bounds [GS13, GZ17, ...]
  - MCMC algorithms are not low-degree (?)
  - MCMC can be sub-optimal (e.g. tensor PCA) [BGJ18]
◮ Average-case reductions [BR13, ...]
  - Need to argue that the starting problem is hard [BB20]

Part II: Detection

Detection (e.g. [Hopkins, Steurer '17])

Goal: hypothesis test with error probability o(1) between:
◮ Null model: Y ∼ Q_n, e.g. G(n, 1/2)
◮ Planted model: Y ∼ P_n, e.g. G(n, 1/2) ∪ {random k-clique}

Look for a degree-D polynomial f : R^{n×n} → R that distinguishes P from Q:
◮ f(Y) is "big" when Y ∼ P and "small" when Y ∼ Q.

Compute the "advantage"

    Adv_{≤D} := max_{deg f ≤ D}  E_{Y∼P}[f(Y)] / sqrt( E_{Y∼Q}[f(Y)²] ),

whose numerator is the mean in P and whose denominator measures the fluctuations in Q. If Adv_{≤D} = ω(1), degree-D polynomials succeed; if Adv_{≤D} = O(1), degree-D polynomials fail.
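Not on the slide, but a worked instance of this criterion: in the planted k-clique example, encode edges as ±1 variables (as in the computation later in this talk) and take f to be the sum of all edge variables, the simplest degree-1 polynomial.

```latex
f(Y) = \sum_{i<j} Y_{ij}, \qquad
\mathbb{E}_{Y\sim P}[f(Y)] = \binom{k}{2}, \qquad
\mathbb{E}_{Y\sim Q}\!\left[f(Y)^2\right] = \binom{n}{2},
\qquad\text{so}\qquad
\frac{\mathbb{E}_{Y\sim P}[f(Y)]}{\sqrt{\mathbb{E}_{Y\sim Q}[f(Y)^2]}}
= \frac{\binom{k}{2}}{\sqrt{\binom{n}{2}}} = \Theta\!\left(\frac{k^2}{n}\right).
```

This single polynomial already achieves advantage ω(1) once k ≫ √n; the theorem below says that degree-O(log n) polynomials succeed for k = Ω(√n) and that all low-degree polynomials fail well below √n.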

Detection (e.g. [Hopkins, Steurer '17])

Prototypical result (planted clique):

Theorem [BHKKMP16, Hop18]: For a planted k-clique in G(n, 1/2),
◮ if k = Ω(√n) then Adv_{≤D} = ω(1) for some D = O(log n): low-degree polynomials succeed when k ≳ √n;
◮ if k = O(n^{1/2−ε}) then Adv_{≤D} = O(1) for any D = O(log n): low-degree polynomials fail when k ≪ √n.

Sometimes one can even rule out polynomials of degree D = n^δ.

Extended low-degree conjecture [Hopkins '18]: degree-D polynomials correspond to n^{Θ̃(D)}-time algorithms; in particular, degree D = n^δ corresponds to exp(n^{δ ± o(1)})-time algorithms.
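Not from the slides: a minimal simulation sketch (illustrative parameter choices only) of the simplest low-degree test for planted clique, the degree-1 edge count, which separates the two models once k is somewhat above √n.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_graph(n, k=0):
    """Adjacency matrix of G(n, 1/2), with an optional planted k-clique."""
    A = rng.random((n, n)) < 0.5
    A = np.triu(A, 1)
    A = A | A.T
    if k > 0:
        clique = rng.choice(n, size=k, replace=False)
        A[np.ix_(clique, clique)] = True
        np.fill_diagonal(A, False)
    return A

def edge_count(A):
    """The simplest degree-1 statistic: total number of edges."""
    return A.sum() / 2

n, k, trials = 400, 60, 200   # illustrative sizes; k is about 3*sqrt(n)
null = [edge_count(sample_graph(n)) for _ in range(trials)]
planted = [edge_count(sample_graph(n, k)) for _ in range(trials)]

# The clique adds roughly C(k,2)/2 edges above the null mean C(n,2)/2, while
# the null fluctuations are on the order of n, so the two histograms separate
# once k^2 is much larger than n (i.e. k >> sqrt(n)).
print("null mean ± std:    %.1f ± %.1f" % (np.mean(null), np.std(null)))
print("planted mean ± std: %.1f ± %.1f" % (np.mean(planted), np.std(planted)))
```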

Detection (e.g. [Hopkins, Steurer '17])

Goal: compute Adv_{≤D} := max_{deg f ≤ D} E_{Y∼P}[f(Y)] / sqrt( E_{Y∼Q}[f(Y)²] ).

Suppose Q is i.i.d. Unif(±1). Write f(Y) = Σ_{S ⊆ [m], |S| ≤ D} f̂_S Y^S, where Y^S := Π_{i∈S} Y_i.

The monomials {Y^S}_{S ⊆ [m]} are orthonormal: E_{Y∼Q}[Y^S Y^T] = 1{S = T}.

Numerator:  E_{Y∼P}[f(Y)] = Σ_{|S| ≤ D} f̂_S E_{Y∼P}[Y^S] =: ⟨f̂, c⟩, where c_S := E_{Y∼P}[Y^S].

Denominator:  E_{Y∼Q}[f(Y)²] = Σ_{|S| ≤ D} f̂_S² = ‖f̂‖²  (orthonormality).

Therefore

    Adv_{≤D} = max_{f̂} ⟨f̂, c⟩ / ‖f̂‖ = ⟨c, c⟩ / ‖c‖ = ‖c‖ = sqrt( Σ_{|S| ≤ D} E_{Y∼P}[Y^S]² ),

with optimizer f̂* = c.
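Not from the slides: a minimal brute-force sketch of this closed form on an assumed toy model (m independent ε-biased ±1 bits as P, uniform ±1 bits as Q), evaluating Adv_{≤D} = ‖c‖ by enumerating E_{Y∼P}[Y^S] over all |S| ≤ D and checking it against the formula this product structure implies.

```python
import itertools
import math
import numpy as np

m, D, eps = 10, 3, 0.3   # illustrative toy sizes

# Toy planted model P: m independent ±1 bits with E[Y_i] = eps.
# Null model Q: m independent uniform ±1 bits.
# For this product model E_{Y~P}[Y^S] = eps^{|S|}, so the enumeration can be
# checked against the closed form  Adv_{<=D}^2 = sum_{d<=D} C(m,d) eps^{2d}.

outcomes = list(itertools.product([-1, 1], repeat=m))
probs = [np.prod([(1 + eps * yi) / 2 for yi in y]) for y in outcomes]  # P(Y = y)

def E_P_monomial(S):
    """E_{Y~P}[prod_{i in S} Y_i], by exact enumeration over {±1}^m."""
    return sum(p * np.prod([y[i] for i in S]) for y, p in zip(outcomes, probs))

adv_sq = sum(E_P_monomial(S) ** 2
             for d in range(D + 1)
             for S in itertools.combinations(range(m), d))

closed_form = sum(math.comb(m, d) * eps ** (2 * d) for d in range(D + 1))
print("Adv_{<=D} by enumeration:", math.sqrt(adv_sq))
print("Adv_{<=D} closed form:   ", math.sqrt(closed_form))
```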

Detection (e.g. [Hopkins, Steurer '17])

Remarks:
◮ The best test is the likelihood ratio L(Y) = dP/dQ(Y) (Neyman–Pearson lemma).
◮ The best degree-D test (the maximizer of Adv_{≤D}) is f* = L^{≤D}, the projection of L onto the degree-D subspace, where the orthogonal projection is with respect to ⟨f, g⟩ := E_{Y∼Q}[f(Y) g(Y)]. This is the "low-degree likelihood ratio".
◮ Adv_{≤D} = ‖L^{≤D}‖, where ‖f‖ := sqrt(⟨f, f⟩) = sqrt(E_{Y∼Q}[f(Y)²]): the "norm of the low-degree likelihood ratio".

Proof: f̂*_S = L̂_S · 1{|S| ≤ D}, and L̂_S = E_{Y∼Q}[L(Y) Y^S] = E_{Y∼P}[Y^S].
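Not on the slide: the proof chain written out, using the notation c_S := E_{Y∼P}[Y^S] from the previous slide.

```latex
\widehat{L}_S
  \;=\; \langle L, Y^S \rangle
  \;=\; \mathbb{E}_{Y\sim Q}\!\left[\tfrac{dP}{dQ}(Y)\, Y^S\right]
  \;=\; \mathbb{E}_{Y\sim P}\!\left[Y^S\right]
  \;=\; c_S,
\qquad\text{so}\qquad
\widehat{f^*}_S \;=\; \widehat{L^{\le D}}_S \;=\; c_S\,\mathbb{1}\{|S|\le D\}
\quad\text{and}\quad
\mathrm{Adv}_{\le D} \;=\; \|L^{\le D}\| \;=\; \Big(\sum_{|S|\le D} \mathbb{E}_{Y\sim P}[Y^S]^2\Big)^{1/2}.
```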

Part III: Recovery

Planted Submatrix

Example (planted submatrix): observe an n × n matrix Y = X + Z, where
◮ Signal: X = λ v v^⊤ with v_i ∼ Bernoulli(ρ) i.i.d. and λ > 0;
◮ Noise: Z has i.i.d. N(0, 1) entries.

Regime: 1/√n ≪ ρ ≪ 1.

Detection: distinguish P: Y = X + Z vs Q: Y = Z w.h.p.
◮ The sum of all entries succeeds when λ ≫ (ρ√n)^{−2}.

Recovery: given Y ∼ P, recover v.
◮ The leading eigenvector succeeds when λ ≫ (ρ√n)^{−1}.
◮ Exhaustive search succeeds when λ ≫ (ρn)^{−1/2}.

So there is a detection-recovery gap.
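Not from the slides: a minimal simulation sketch of the two statistics mentioned above on the planted submatrix model; the parameter values below are illustrative assumptions chosen to sit above both thresholds.

```python
import numpy as np

rng = np.random.default_rng(1)

n, rho, lam = 2000, 0.05, 1.0   # illustrative choices with 1/sqrt(n) << rho << 1

# Planted model: Y = lam * v v^T + Z with v_i ~ Bernoulli(rho), Z_ij ~ N(0, 1).
v = (rng.random(n) < rho).astype(float)
Y = lam * np.outer(v, v) + rng.standard_normal((n, n))

# Detection statistic: sum of all entries (a degree-1 polynomial in Y).
# Standardized by the null std n, it is ~ N(0, 1) under Q and has mean
# roughly lam * (rho * n)^2 / n under P.
print("standardized sum of entries:", Y.sum() / n)

# Recovery: leading eigenvector of the symmetrized matrix, compared with v.
eigvals, eigvecs = np.linalg.eigh((Y + Y.T) / 2)
top = eigvecs[:, -1]
overlap = abs(top @ v) / (np.linalg.norm(top) * np.linalg.norm(v))
print("overlap of leading eigenvector with v:", overlap)
```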

Recovery Hardness from Detection Hardness?

If you can recover then you can detect (a poly-time reduction):
◮ How: run the recovery algorithm to get v̂ ∈ {0, 1}^n, then check whether v̂^⊤ Y v̂ is large.

So if Adv_{≤D} = O(1), this suggests recovery is hard.

But how can we show hardness of recovery when detection is easy?
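Not from the slides: a sketch of this recovery-to-detection reduction on the planted submatrix model from the previous slide. The `recover` routine (a thresholded leading eigenvector) is only an illustrative stand-in for an arbitrary recovery algorithm, and the parameters are the same illustrative assumptions as above.

```python
import numpy as np

rng = np.random.default_rng(2)
n, rho, lam = 2000, 0.05, 1.0   # same illustrative parameters as above

def recover(Y, rho):
    """Illustrative stand-in for a recovery algorithm: threshold the
    leading eigenvector to a 0/1 vector with roughly rho*n ones."""
    _, eigvecs = np.linalg.eigh((Y + Y.T) / 2)
    scores = np.abs(eigvecs[:, -1])
    v_hat = np.zeros(n)
    v_hat[np.argsort(scores)[-int(round(rho * n)):]] = 1.0
    return v_hat

def detection_statistic(Y, rho):
    """Reduction from recovery to detection: run recovery, then report
    v_hat^T Y v_hat, which should be large under P and small under Q."""
    v_hat = recover(Y, rho)
    return v_hat @ Y @ v_hat

v = (rng.random(n) < rho).astype(float)
Y_planted = lam * np.outer(v, v) + rng.standard_normal((n, n))
Y_null = rng.standard_normal((n, n))

print("statistic under planted model:", detection_statistic(Y_planted, rho))
print("statistic under null model:   ", detection_statistic(Y_null, rho))
```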
