

Computational Barriers to Estimation from Low-Degree Polynomials

Alex Wein
Courant Institute, New York University

Joint work with: Tselil Schramm (Stanford)

Part I: Why Low-Degree Polynomials?

Problems in High-Dimensional Statistics

Example: planted k-clique in a random graph G(n, 1/2)

◮ Detection/testing: distinguish between a random graph and a graph with a planted clique
◮ Recovery/estimation: given a graph with a planted clique, find the clique

Both problems have an information-computation gap.

What makes problems easy vs. hard?

The Low-Degree Polynomial Method

A framework for predicting/explaining average-case computational complexity.

Originated from the sum-of-squares literature (for detection):
[Barak, Hopkins, Kelner, Kothari, Moitra, Potechin ’16] [Hopkins, Steurer ’17] [Hopkins, Kothari, Potechin, Raghavendra, Schramm, Steurer ’17] [Hopkins ’18 (PhD thesis)]

Today: self-contained motivation (without SoS).

The Low-Degree Polynomial Method

Study a restricted class of algorithms: low-degree polynomials.
◮ Multivariate polynomial f : ℝ^N → ℝ^M
◮ Input: e.g. a graph Y ∈ {0, 1}^(n choose 2)
◮ Output: b ∈ {0, 1} (detection) or v ∈ ℝ^n (recovery)
◮ “Low” means O(log n), where n is the dimension

Examples of low-degree algorithms (input Y ∈ ℝ^{n×n}):
◮ Power iteration: Y^k 1 or Tr(Y^k)  [k = O(log n)]
◮ Approximate message passing: v ← Y h(v)  [O(1) rounds]
◮ Local algorithms on sparse graphs  [radius O(1)]
◮ Any of the above applied to Ỹ = g(Y)  [deg g = O(1)]
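For concreteness, here is a minimal sketch (my own toy code, not from the talk) of two of the statistics above. Each coordinate of Y^k 1, and likewise Tr(Y^k), is literally a polynomial of degree k = O(log n) in the entries of Y.

import numpy as np

def power_iteration_vector(Y, k):
    # Each coordinate of Y^k 1 is a degree-k polynomial in the entries of Y.
    v = np.ones(Y.shape[0])
    for _ in range(k):
        v = Y @ v
    return v

def trace_statistic(Y, k):
    # Tr(Y^k) = sum over closed k-step walks: a single degree-k polynomial.
    return np.trace(np.linalg.matrix_power(Y, k))

n = 500
k = max(1, int(np.log(n)))          # "low" degree: k = O(log n)
Y = np.random.default_rng(0).standard_normal((n, n))
Y = (Y + Y.T) / np.sqrt(2)          # symmetric input matrix
print(trace_statistic(Y, k), np.linalg.norm(power_iteration_vector(Y, k)))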

Optimality of Low-Degree Polynomials?

Low-degree polynomials seem to be optimal for many problems! For all of these problems...

planted clique, sparse PCA, community detection, tensor PCA, spiked Wigner/Wishart, planted submatrix, planted dense subgraph, ...

...it is the case that
◮ the best known poly-time algorithms are captured by O(log n)-degree polynomials (spectral/AMP)
◮ low-degree polynomials fail in the “hard” regime

“Low-degree conjecture” (informal) [Hopkins ’18]: for “natural” problems, if low-degree polynomials fail then all poly-time algorithms fail.

Caveat: Gaussian elimination for planted XOR-SAT (a poly-time algorithm that succeeds where low-degree polynomials fail).

Overview

This talk: techniques to prove that all low-degree polynomials fail.
◮ Gives evidence for computational hardness

Settings:
◮ Detection (prior work)
  [Hopkins, Steurer ’17] [Hopkins, Kothari, Potechin, Raghavendra, Schramm, Steurer ’17] [Hopkins ’18 (PhD thesis)] [Kunisky, W., Bandeira ’19 (survey)]
◮ Recovery (this work)
  [Schramm, W. ’20]
◮ Optimization
  [Gamarnik, Jagannath, W. ’20]

Relation to Other Frameworks

◮ Sum-of-squares lower bounds [BHKKMP16, ...]
  ◮ Actually for certification
  ◮ Connected to low-degree [HKPRSS17]
◮ Statistical query lower bounds [FGRVX12, ...]
  ◮ Need i.i.d. samples
  ◮ Equivalent to low-degree [BBHLS20]
◮ Approximate message passing (AMP) [DMM09, LKZ15, ...]
  ◮ AMP algorithms are low-degree
  ◮ AMP can be sub-optimal (e.g. tensor PCA) [MR14]
◮ Overlap gap property / MCMC lower bounds [GS13, GZ17, ...]
  ◮ MCMC algorithms are not low-degree (?)
  ◮ MCMC can be sub-optimal (e.g. tensor PCA) [BGJ18]
◮ Average-case reductions [BR13, ...]
  ◮ Need to argue that the starting problem is hard [BB20]

Part II: Detection

Detection (e.g. [Hopkins, Steurer ’17])

Goal: hypothesis test with error probability o(1) between:
◮ Null model Y ∼ Q_n, e.g. G(n, 1/2)
◮ Planted model Y ∼ P_n, e.g. G(n, 1/2) ∪ {random k-clique}

Look for a degree-D polynomial f : ℝ^{n×n} → ℝ that distinguishes P from Q:
◮ f(Y) is “big” when Y ∼ P and “small” when Y ∼ Q

Compute the “advantage” (mean in P over fluctuations in Q):

    Adv_{≤D} := max_{deg f ≤ D} E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²])

If Adv_{≤D} = ω(1), “degree-D polynomials succeed”; if Adv_{≤D} = O(1), “degree-D polynomials fail”.
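As a toy illustration of this setup (my example, not from the slides): the edge count is the simplest degree-1 polynomial in the edge indicators, and it separates G(n, 1/2) from the planted-clique model once k is comfortably above √n.

import numpy as np

rng = np.random.default_rng(0)

def sample_graph(n, k=0):
    # Adjacency matrix of G(n, 1/2), optionally with a planted k-clique.
    A = np.triu(rng.integers(0, 2, size=(n, n)), 1)
    A = A + A.T
    if k > 0:
        S = rng.choice(n, size=k, replace=False)
        A[np.ix_(S, S)] = 1
        np.fill_diagonal(A, 0)
    return A

def edge_count(A):
    # A degree-1 polynomial in the edge indicators.
    return A.sum() / 2

n, k, trials = 400, 80, 50
null = [edge_count(sample_graph(n)) for _ in range(trials)]
planted = [edge_count(sample_graph(n, k)) for _ in range(trials)]
t = (np.mean(null) + np.mean(planted)) / 2       # threshold between the two means
err = (sum(x > t for x in null) + sum(x < t for x in planted)) / (2 * trials)
print(f"null ~ {np.mean(null):.0f}±{np.std(null):.0f}, "
      f"planted ~ {np.mean(planted):.0f}, error rate {err:.2f}")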

Detection (e.g. [Hopkins, Steurer ’17])

Prototypical result (planted clique):

Theorem [BHKKMP16, Hop18]: For a planted k-clique in G(n, 1/2),
◮ if k = Ω(√n) then Adv_{≤D} = ω(1) for some D = O(log n)
  (low-degree polynomials succeed when k ≳ √n)
◮ if k = O(n^{1/2−ε}) then Adv_{≤D} = O(1) for any D = O(log n)
  (low-degree polynomials fail when k ≪ √n)

Sometimes one can even rule out polynomials of degree D = n^δ.

Extended low-degree conjecture [Hopkins ’18]: degree-D polynomials ⇔ n^{Θ̃(D)}-time algorithms; D = n^δ ⇔ exp(n^{δ±o(1)}) time.

Detection (e.g. [Hopkins, Steurer ’17])

Goal: compute Adv_{≤D} := max_{deg f ≤ D} E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²]).

Suppose Q is i.i.d. Unif(±1). Write

    f(Y) = Σ_{|S|≤D} f̂_S Y^S,  where Y^S := Π_{i∈S} Y_i for S ⊆ [m].

The monomials {Y^S}_{S⊆[m]} are orthonormal: E_{Y∼Q}[Y^S Y^T] = 𝟙_{S=T}.

Numerator: E_{Y∼P}[f(Y)] = Σ_{|S|≤D} f̂_S E_{Y∼P}[Y^S] =: ⟨f̂, c⟩, where c_S := E_{Y∼P}[Y^S].

Denominator: E_{Y∼Q}[f(Y)²] = Σ_{|S|≤D} f̂_S² = ‖f̂‖²  (orthonormality).

Therefore

    Adv_{≤D} = max_{f̂} ⟨f̂, c⟩ / ‖f̂‖ = ⟨c, c⟩ / ‖c‖ = ‖c‖ = √( Σ_{|S|≤D} E_{Y∼P}[Y^S]² ),

with optimizer f̂* = c.
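A small numerical check of Adv_{≤D} = ‖c‖ (my own toy example, not from the talk): take Q = i.i.d. Unif(±1) on m bits and, as the planted model, i.i.d. ±1 bits with mean μ, so that c_S = E_P[Y^S] = μ^{|S|} exactly and Adv²_{≤D} = Σ_{d≤D} C(m, d) μ^{2d}.

from itertools import combinations
from math import comb
import numpy as np

rng = np.random.default_rng(1)
m, D, mu = 10, 3, 0.2

# Monte Carlo samples from the planted model P: i.i.d. +-1 bits with mean mu.
samples = 2 * rng.binomial(1, (1 + mu) / 2, size=(200_000, m)) - 1

adv_sq = 0.0
for d in range(D + 1):
    for S in combinations(range(m), d):
        c_S = samples[:, list(S)].prod(axis=1).mean()   # estimate of E_P[Y^S]
        adv_sq += c_S ** 2

exact = sum(comb(m, d) * mu ** (2 * d) for d in range(D + 1))
print(f"Adv ~ {adv_sq ** 0.5:.4f} (exact {exact ** 0.5:.4f})")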

Detection (e.g. [Hopkins, Steurer ’17])

Remarks:
◮ The best test is the likelihood ratio (Neyman–Pearson lemma): L(Y) = (dP/dQ)(Y)
◮ The best degree-D test (the maximizer of Adv_{≤D}) is f* = L^{≤D} := the projection of L onto the degree-≤D subspace, the “low-degree likelihood ratio”
  (orthogonal projection w.r.t. ⟨f, g⟩ := E_{Y∼Q}[f(Y)g(Y)])
◮ Adv_{≤D} = ‖L^{≤D}‖, the “norm of the low-degree likelihood ratio”, where ‖f‖ := √⟨f, f⟩ = √(E_{Y∼Q}[f(Y)²])

Proof: L̂_S = E_{Y∼Q}[L(Y) Y^S] = E_{Y∼P}[Y^S], while f̂*_S = E_{Y∼P}[Y^S] · 𝟙_{|S|≤D}.
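To spell out the one-line proof (my expansion, in the notation of the previous slide):

\[
\widehat{L}_S \;=\; \langle L, Y^S \rangle \;=\; \mathbb{E}_{Y \sim Q}\!\left[ \tfrac{dP}{dQ}(Y)\, Y^S \right] \;=\; \mathbb{E}_{Y \sim P}[Y^S] \;=\; c_S ,
\]

so the projection is \( L^{\leq D} = \sum_{|S| \leq D} c_S\, Y^S \): its coefficient vector is exactly the optimizer \( \hat f^* = c \) found on the previous slide, and \( \mathrm{Adv}_{\leq D} = \|c\| = \|L^{\leq D}\| \).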

Part III: Recovery

Planted Submatrix

Example (planted submatrix): observe an n × n matrix Y = X + Z
◮ Signal: X = λvv⊤, where λ > 0 and v_i ∼ Bernoulli(ρ)
◮ Noise: Z i.i.d. N(0, 1)

Regime: 1/√n ≪ ρ ≪ 1

Detection: distinguish P: Y = X + Z vs Q: Y = Z w.h.p.
◮ Sum of all entries succeeds when λ ≫ (ρ√n)^{−2}

Recovery: given Y ∼ P, recover v
◮ Leading eigenvector succeeds when λ ≫ (ρ√n)^{−1}
◮ Exhaustive search succeeds when λ ≫ (ρn)^{−1/2}

⇒ a detection-recovery gap
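A quick simulation sketch of this model and the two poly-time statistics above (the parameters are my own illustrative choices):

import numpy as np

rng = np.random.default_rng(2)
n, rho, lam = 2000, 0.05, 1.0

v = rng.binomial(1, rho, size=n).astype(float)
Y = lam * np.outer(v, v) + rng.standard_normal((n, n))

# Detection: sum of all entries, normalized so the null has mean 0 and std 1.
# Succeeds when lam >> (rho * sqrt(n))^(-2).
print("sum statistic:", round(float(Y.sum() / n), 1), "(null: mean 0, std 1)")

# Recovery: leading eigenvector of the symmetrized matrix.
# Succeeds when lam >> (rho * sqrt(n))^(-1).
u = np.linalg.eigh((Y + Y.T) / 2)[1][:, -1]
overlap = abs(u @ v) / np.linalg.norm(v)
print("eigenvector overlap with v:", round(float(overlap), 3))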

Recovery Hardness from Detection Hardness?

If you can recover then you can detect (poly-time reduction):
◮ How: run the recovery algorithm to get v̂ ∈ {0, 1}^n; check whether v̂⊤Y v̂ is large

So if Adv_{≤D} = O(1), this suggests recovery is hard.

But how to show hardness of recovery when detection is easy?

Attempt: choose a better null distribution?
◮ Match the mean of the planted distribution?
◮ A Gaussian matching the first 2 moments of the planted distribution?

This closes the detection-recovery gap partially, but not all the way.
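A sketch of the reduction in code (the eigenvector-rounding recovery routine, the parameters, and the calibration comment are my own illustration, not the talk's):

import numpy as np

def recover(Y, rho):
    # Toy recovery: round the top eigenvector to its rho*n largest coordinates.
    u = np.linalg.eigh((Y + Y.T) / 2)[1][:, -1]
    u = u if u.sum() >= 0 else -u
    k = max(1, int(rho * Y.shape[0]))
    v_hat = np.zeros(Y.shape[0])
    v_hat[np.argsort(u)[-k:]] = 1.0
    return v_hat

def detection_statistic(Y, rho):
    # Reduction: run recovery to get v_hat in {0,1}^n, then report v_hat^T Y v_hat.
    v_hat = recover(Y, rho)
    return float(v_hat @ Y @ v_hat)

rng = np.random.default_rng(3)
n, rho, lam = 1000, 0.05, 3.0
v = rng.binomial(1, rho, size=n).astype(float)
for name, Y in [("planted", lam * np.outer(v, v) + rng.standard_normal((n, n))),
                ("null   ", rng.standard_normal((n, n)))]:
    print(name, "v_hat^T Y v_hat =", round(detection_statistic(Y, rho), 1))
# Declare "planted" when the statistic exceeds a threshold calibrated by
# simulating it under Q (v_hat adapts to the noise, so calibration matters).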

Low-Degree Recovery

Example (planted submatrix): observe an n × n matrix Y = X + Z
◮ Signal: X = λvv⊤, where λ > 0 and v_i ∼ Bernoulli(ρ)
◮ Noise: Z i.i.d. N(0, 1)

Goal: given Y, estimate v_1 via a polynomial f : ℝ^{n×n} → ℝ.

Low-degree minimum mean squared error:

    MMSE_{≤D} = min_{deg f ≤ D} E[(f(Y) − v_1)²]

Equivalent to low-degree maximum correlation:

    Corr_{≤D} = max_{deg f ≤ D} E[f(Y) · v_1] / √(E[f(Y)²])

Fact: MMSE_{≤D} = E[v_1²] − Corr²_{≤D}
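A short derivation of the Fact (my expansion; write g := f / √E[f(Y)²] for the normalized estimator):

\[
\min_{\alpha \in \mathbb{R}} \mathbb{E}\big[(\alpha\, g(Y) - v_1)^2\big] \;=\; \mathbb{E}[v_1^2] - \mathbb{E}[g(Y)\, v_1]^2 \qquad \text{when } \mathbb{E}[g(Y)^2] = 1,
\]

attained at \( \alpha = \mathbb{E}[g(Y) v_1] \). Optimizing over degree-D polynomials f is the same as optimizing over unit-norm degree-D g and then over the scale \( \alpha \), so \( \mathrm{MMSE}_{\leq D} = \mathbb{E}[v_1^2] - \mathrm{Corr}_{\leq D}^2 \).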

Hardness of Recovery

For hardness, we want an upper bound on Corr_{≤D} = max_{deg f ≤ D} E[f(Y) · v_1] / √(E[f(Y)²]).

Same proof as detection? Write f = Σ_{|S|≤D} f̂_S Y^S.

Numerator: E[f(Y) · v_1] = Σ_{|S|≤D} f̂_S E[Y^S · v_1] =: ⟨f̂, c⟩, now with c_S := E[Y^S · v_1].

Denominator: E[f(Y)²] = Σ_{S,T} f̂_S f̂_T E[Y^S · Y^T] = f̂⊤M f̂
(all expectations are now over the planted model, so the monomials are no longer orthonormal and a Gram matrix M appears).

    Corr_{≤D} = max_{f̂} ⟨f̂, c⟩ / √(f̂⊤M f̂) = √(c⊤M⁻¹c)
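Where the last equality comes from (my expansion; it assumes M is positive definite, the generic case for this Gram matrix):

\[
\max_{\hat f} \frac{\langle \hat f, c\rangle}{\sqrt{\hat f^{\top} M \hat f}}
\;=\; \max_{g} \frac{\langle g,\, M^{-1/2} c\rangle}{\|g\|}
\;=\; \|M^{-1/2} c\|
\;=\; \sqrt{c^{\top} M^{-1} c},
\]

substituting \( g = M^{1/2} \hat f \) and applying Cauchy–Schwarz; the maximizer is \( \hat f^* = M^{-1} c \). Unlike in detection, M is not the identity here, which is what the Jensen trick on the next slide works around.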

Hardness of Recovery

For hardness, we want an upper bound on Corr_{≤D} = max_{deg f ≤ D} E[f(Y) · v_1] / √(E[f(Y)²]).

Trick: bound the denominator from below via Jensen’s inequality over the “signal” X:

    E[f(Y)²] = E_Z E_X[f(X + Z)²] ≥ E_Z (E_X[f(X + Z)])²

Why is this tight? In the hard regime, f depends mostly on Z.

This simplifies the expression enough to find a closed form:

    Corr_{≤D} ≤ max_{f̂} ⟨f̂, c⟩ / ‖M f̂‖ = ‖c⊤M⁻¹‖,  where M is upper triangular (can invert)

Main Result

Theorem [Schramm, W. ’20]: Additive Gaussian model Y = X + Z; scalar value to recover: x. Then

    Corr²_{≤D} ≤ Σ_{|S|≤D} κ_S²,

where κ_S is the joint cumulant of {x} ∪ {Y_i : i ∈ S}.

Corollary (tight bounds for planted submatrix recovery):
◮ if λ ≪ min{1, 1/(ρ√n)} then MMSE_{≤ n^Ω(1)} ≈ ρ(1 − ρ)
  (low-degree polynomials have trivial MSE in the “hard” regime)
◮ if λ ≫ min{1, 1/(ρ√n)} then MMSE_{≤ O(log n)} = o(ρ)
  (low-degree polynomials succeed in the “easy” regime)
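The joint cumulants κ_S can be computed from joint moments by summing over set partitions. Below is a self-contained sketch (my own illustration, not the paper's code) estimating one κ_S by Monte Carlo for the planted submatrix model, with x = v_1 and S = {(1,2), (1,3)}; the closed-form comparison value is my own side calculation.

from math import factorial
import numpy as np

def partitions(elems):
    # All set partitions of a list (recursive; fine for |S| up to ~8).
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for p in partitions(rest):
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]
        yield [[first]] + p

def joint_cumulant(samples):
    # kappa(X_1,...,X_k) = sum over partitions pi of
    #   (-1)^(|pi|-1) * (|pi|-1)! * prod_{B in pi} E[prod_{i in B} X_i],
    # with the joint moments estimated from the sample rows.
    k = samples.shape[1]
    total = 0.0
    for p in partitions(list(range(k))):
        moments = np.prod([samples[:, B].prod(axis=1).mean() for B in p])
        total += (-1) ** (len(p) - 1) * factorial(len(p) - 1) * moments
    return total

rng = np.random.default_rng(4)
rho, lam, N = 0.3, 2.0, 1_000_000
v = rng.binomial(1, rho, size=(N, 3)).astype(float)
Z = rng.standard_normal((N, 2))
cols = np.column_stack([v[:, 0],                              # x = v_1
                        lam * v[:, 0] * v[:, 1] + Z[:, 0],    # Y_12
                        lam * v[:, 0] * v[:, 2] + Z[:, 1]])   # Y_13
# A direct calculation (mine) gives lam^2 * (rho^3 - 3 rho^4 + 2 rho^5) ~ 0.030 here;
# the independent Gaussian noise drops out of this mixed cumulant.
print("kappa(x, Y_12, Y_13) ~", round(joint_cumulant(cols), 3))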

Future Directions?

◮ (Detection) Bound Adv_{≤D} when Q is not a product measure
  ◮ e.g. random regular graphs
◮ (Recovery) Bound MMSE_{≤D} for models that are not “signal + noise”
  ◮ e.g. sparse regression, phase retrieval
◮ (Recovery) Sharp threshold for planted submatrix
  ◮ AMP succeeds when λ > (ρ√(en))⁻¹ [Hajek, Wu, Xu ’15]
◮ Implications for other algorithms?
  ◮ e.g. convex programming, MCMC

References

◮ Detection (survey article): “Notes on Computational Hardness of Hypothesis Testing: Predictions using the Low-Degree Likelihood Ratio”, Kunisky, W., Bandeira, arXiv:1907.11636
◮ Recovery: “Computational Barriers to Estimation from Low-Degree Polynomials”, Schramm, W., arXiv:2008.02269
◮ Optimization: “Low-Degree Hardness of Random Optimization Problems”, Gamarnik, Jagannath, W., arXiv:2004.12063
