Understanding Statistical-vs-Computational Tradeoffs via the Low-Degree Likelihood Ratio

slide-1
SLIDE 1

Understanding Statistical-vs-Computational Tradeoffs via the Low-Degree Likelihood Ratio Alex Wein

Courant Institute, NYU Joint work with: Afonso Bandeira (ETH Zurich) Yunzi Ding (NYU) Tim Kunisky (NYU)

1 / 27

slide-2
SLIDE 2

Motivation

2 / 27

slide-3
SLIDE 3

Motivation

Imagine we have a large noisy dataset and want to extract some kind of hidden “signal”

2 / 27

slide-4
SLIDE 4

Motivation

Imagine we have a large noisy dataset and want to extract some kind of hidden “signal”, e.g.,

◮ determine which combination of genes cause a certain disease

2 / 27

slide-5
SLIDE 5

Motivation

Imagine we have a large noisy dataset and want to extract some kind of hidden “signal”, e.g.,

◮ determine which combination of genes cause a certain disease ◮ find “communities” in a social network

2 / 27

slide-6
SLIDE 6

Motivation

Imagine we have a large noisy dataset and want to extract some kind of hidden “signal”, e.g.,

◮ determine which combination of genes cause a certain disease ◮ find “communities” in a social network ◮ predict which users will click on which ads

2 / 27

slide-7
SLIDE 7

Motivation

Imagine we have a large noisy dataset and want to extract some kind of hidden “signal”, e.g.,

◮ determine which combination of genes cause a certain disease ◮ find “communities” in a social network ◮ predict which users will click on which ads ◮ etc.

2 / 27

slide-8
SLIDE 8

Motivation

Imagine we have a large noisy dataset and want to extract some kind of hidden “signal”, e.g.,

◮ determine which combination of genes cause a certain disease ◮ find “communities” in a social network ◮ predict which users will click on which ads ◮ etc.

There are many potential solutions

2 / 27

slide-9
SLIDE 9

Motivation

Imagine we have a large noisy dataset and want to extract some kind of hidden “signal”, e.g.,

◮ determine which combination of genes cause a certain disease ◮ find “communities” in a social network ◮ predict which users will click on which ads ◮ etc.

There are many potential solutions. The naïve algorithm would check all possibilities: too slow!

◮ “curse of dimensionality”

2 / 27

slide-10
SLIDE 10

Motivation

Imagine we have a large noisy dataset and want to extract some kind of hidden “signal”, e.g.,

◮ determine which combination of genes cause a certain disease ◮ find “communities” in a social network ◮ predict which users will click on which ads ◮ etc.

There are many potential solutions. The naïve algorithm would check all possibilities: too slow!

◮ “curse of dimensionality”

Is there a “smarter” algorithm that can find the solution efficiently?

2 / 27

slide-11
SLIDE 11

Motivation

Imagine we have a large noisy dataset and want to extract some kind of hidden “signal”, e.g.,

◮ determine which combination of genes cause a certain disease ◮ find “communities” in a social network ◮ predict which users will click on which ads ◮ etc.

There are many potential solutions. The naïve algorithm would check all possibilities: too slow!

◮ “curse of dimensionality”

Is there a “smarter” algorithm that can find the solution efficiently? Goal: develop a theory to understand which statistical tasks can be solved efficiently (and which ones cannot)

2 / 27

slide-12
SLIDE 12

Part I: Statistical-to-Computational Gaps and the “Low-Degree Method”

3 / 27

slide-13
SLIDE 13

Statistical-to-Computational Gaps

◮ Planted clique: G(n, 1/2) ∪ {k-clique}

4 / 27

slide-14
SLIDE 14

Statistical-to-Computational Gaps

◮ Planted clique: G(n, 1/2) ∪ {k-clique}

◮ n vertices 4 / 27

slide-15
SLIDE 15

Statistical-to-Computational Gaps

◮ Planted clique: G(n, 1/2) ∪ {k-clique}

◮ n vertices ◮ Each of the (n choose 2) edges occurs with probability 1/2

4 / 27

slide-16
SLIDE 16

Statistical-to-Computational Gaps

◮ Planted clique: G(n, 1/2) ∪ {k-clique}

◮ n vertices ◮ Each of the (n choose 2) edges occurs with probability 1/2

◮ Planted clique on k vertices 4 / 27

slide-17
SLIDE 17

Statistical-to-Computational Gaps

◮ Planted clique: G(n, 1/2) ∪ {k-clique}

◮ n vertices ◮ Each of the (n choose 2) edges occurs with probability 1/2

◮ Planted clique on k vertices 4 / 27

slide-18
SLIDE 18

Statistical-to-Computational Gaps

◮ Planted clique: G(n, 1/2) ∪ {k-clique}

◮ n vertices ◮ Each of the (n choose 2) edges occurs with probability 1/2

◮ Planted clique on k vertices ◮ Goal: find the clique 4 / 27
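To make the model concrete, here is a minimal numpy sketch (illustrative, not from the talk; function name and parameters are my own) of sampling a planted-clique instance G(n, 1/2) ∪ {k-clique}:

```python
import numpy as np

def sample_planted_clique(n, k, seed=None):
    """Adjacency matrix of G(n, 1/2) with a k-clique planted on random vertices."""
    rng = np.random.default_rng(seed)
    A = np.triu(rng.random((n, n)) < 0.5, 1)   # each possible edge present with prob 1/2
    A = (A | A.T).astype(int)
    clique = rng.choice(n, size=k, replace=False)
    A[np.ix_(clique, clique)] = 1              # plant the clique
    np.fill_diagonal(A, 0)
    return A, clique

A, clique = sample_planted_clique(n=200, k=20, seed=0)
```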

slide-19
SLIDE 19

Statistical-to-Computational Gaps

◮ Planted clique: G(n, 1/2) ∪ {k-clique}

5 / 27

slide-20
SLIDE 20

Statistical-to-Computational Gaps

◮ Planted clique: G(n, 1/2) ∪ {k-clique}

◮ Statistically, can find planted clique of size (2 + ε) log2 n 5 / 27

slide-21
SLIDE 21

Statistical-to-Computational Gaps

◮ Planted clique: G(n, 1/2) ∪ {k-clique}

◮ Statistically, can find planted clique of size (2 + ε) log2 n ◮ In polynomial time, we only know how to find clique of size

Ω(√n) [Alon, Krivelevich, Sudakov ’98]

5 / 27

slide-22
SLIDE 22

Statistical-to-Computational Gaps

◮ Planted clique: G(n, 1/2) ∪ {k-clique}

◮ Statistically, can find planted clique of size (2 + ε) log2 n ◮ In polynomial time, we only know how to find clique of size

Ω(√n) [Alon, Krivelevich, Sudakov ’98]

5 / 27

slide-23
SLIDE 23

Statistical-to-Computational Gaps

◮ Planted clique: G(n, 1/2) ∪ {k-clique}

◮ Statistically, can find planted clique of size (2 + ε) log2 n ◮ In polynomial time, we only know how to find clique of size

Ω(√n) [Alon, Krivelevich, Sudakov ’98]

◮ Other examples of stat-comp gaps

5 / 27

slide-24
SLIDE 24

Statistical-to-Computational Gaps

◮ Planted clique: G(n, 1/2) ∪ {k-clique}

◮ Statistically, can find planted clique of size (2 + ε) log2 n ◮ In polynomial time, we only know how to find clique of size

Ω(√n) [Alon, Krivelevich, Sudakov ’98]

◮ Other examples of stat-comp gaps

◮ Sparse PCA 5 / 27

slide-25
SLIDE 25

Statistical-to-Computational Gaps

◮ Planted clique: G(n, 1/2) ∪ {k-clique}

◮ Statistically, can find planted clique of size (2 + ε) log2 n ◮ In polynomial time, we only know how to find clique of size

Ω(√n) [Alon, Krivelevich, Sudakov ’98]

◮ Other examples of stat-comp gaps

◮ Sparse PCA ◮ Community detection in graphs (stochastic block model) 5 / 27

slide-26
SLIDE 26

Statistical-to-Computational Gaps

◮ Planted clique: G(n, 1/2) ∪ {k-clique}

◮ Statistically, can find planted clique of size (2 + ε) log2 n ◮ In polynomial time, we only know how to find clique of size

Ω(√n) [Alon, Krivelevich, Sudakov ’98]

◮ Other examples of stat-comp gaps

◮ Sparse PCA ◮ Community detection in graphs (stochastic block model) ◮ Random constraint satisfaction problems (e.g. 3-SAT) 5 / 27

slide-27
SLIDE 27

Statistical-to-Computational Gaps

◮ Planted clique: G(n, 1/2) ∪ {k-clique}

◮ Statistically, can find planted clique of size (2 + ε) log2 n ◮ In polynomial time, we only know how to find clique of size

Ω(√n) [Alon, Krivelevich, Sudakov ’98]

◮ Other examples of stat-comp gaps

◮ Sparse PCA ◮ Community detection in graphs (stochastic block model) ◮ Random constraint satisfaction problems (e.g. 3-SAT) ◮ Tensor PCA 5 / 27

slide-28
SLIDE 28

Statistical-to-Computational Gaps

◮ Planted clique: G(n, 1/2) ∪ {k-clique}

◮ Statistically, can find planted clique of size (2 + ε) log2 n ◮ In polynomial time, we only know how to find clique of size

Ω(√n) [Alon, Krivelevich, Sudakov ’98]

◮ Other examples of stat-comp gaps

◮ Sparse PCA ◮ Community detection in graphs (stochastic block model) ◮ Random constraint satisfaction problems (e.g. 3-SAT) ◮ Tensor PCA ◮ Tensor decomposition 5 / 27

slide-29
SLIDE 29

Statistical-to-Computational Gaps

◮ Planted clique: G(n, 1/2) ∪ {k-clique}

◮ Statistically, can find planted clique of size (2 + ε) log2 n ◮ In polynomial time, we only know how to find clique of size

Ω(√n) [Alon, Krivelevich, Sudakov ’98]

◮ Other examples of stat-comp gaps

◮ Sparse PCA ◮ Community detection in graphs (stochastic block model) ◮ Random constraint satisfaction problems (e.g. 3-SAT) ◮ Tensor PCA ◮ Tensor decomposition

Different from theory of NP-hardness: average-case

5 / 27

slide-30
SLIDE 30

Statistical-to-Computational Gaps

◮ Planted clique: G(n, 1/2) ∪ {k-clique}

◮ Statistically, can find planted clique of size (2 + ε) log2 n ◮ In polynomial time, we only know how to find clique of size

Ω(√n) [Alon, Krivelevich, Sudakov ’98]

◮ Other examples of stat-comp gaps

◮ Sparse PCA ◮ Community detection in graphs (stochastic block model) ◮ Random constraint satisfaction problems (e.g. 3-SAT) ◮ Tensor PCA ◮ Tensor decomposition

Different from theory of NP-hardness: average-case Q: What fundamentally makes a problem easy or hard?

5 / 27

slide-31
SLIDE 31

How to Show that a Problem is Hard?

We don’t know how to prove that average-case problems are hard, but various forms of “rigorous evidence”:

6 / 27

slide-32
SLIDE 32

How to Show that a Problem is Hard?

We don’t know how to prove that average-case problems are hard, but various forms of “rigorous evidence”:

◮ Reductions (e.g. from planted clique) [Berthet, Rigollet ’13; Brennan, Bresler,...]

6 / 27

slide-33
SLIDE 33

How to Show that a Problem is Hard?

We don’t know how to prove that average-case problems are hard, but various forms of “rigorous evidence”:

◮ Reductions (e.g. from planted clique) [Berthet, Rigollet ’13; Brennan, Bresler,...] ◮ Failure of MCMC [Jerrum ’92]

6 / 27

slide-34
SLIDE 34

How to Show that a Problem is Hard?

We don’t know how to prove that average-case problems are hard, but various forms of “rigorous evidence”:

◮ Reductions (e.g. from planted clique) [Berthet, Rigollet ’13; Brennan, Bresler,...] ◮ Failure of MCMC [Jerrum ’92] ◮ Shattering of solution space [Achlioptas, Coja-Oghlan ’08]

6 / 27

slide-35
SLIDE 35

How to Show that a Problem is Hard?

We don’t know how to prove that average-case problems are hard, but various forms of “rigorous evidence”:

◮ Reductions (e.g. from planted clique) [Berthet, Rigollet ’13; Brennan, Bresler,...] ◮ Failure of MCMC [Jerrum ’92] ◮ Shattering of solution space [Achlioptas, Coja-Oghlan ’08] ◮ Failure of local algorithms [Gamarnik, Sudan ’13]

6 / 27

slide-36
SLIDE 36

How to Show that a Problem is Hard?

We don’t know how to prove that average-case problems are hard, but various forms of “rigorous evidence”:

◮ Reductions (e.g. from planted clique) [Berthet, Rigollet ’13; Brennan, Bresler,...] ◮ Failure of MCMC [Jerrum ’92] ◮ Shattering of solution space [Achlioptas, Coja-Oghlan ’08] ◮ Failure of local algorithms [Gamarnik, Sudan ’13] ◮ Statistical physics, belief propagation [Decelle, Krzakala, Moore, Zdeborová ’11]

6 / 27

slide-37
SLIDE 37

How to Show that a Problem is Hard?

We don’t know how to prove that average-case problems are hard, but various forms of “rigorous evidence”:

◮ Reductions (e.g. from planted clique) [Berthet, Rigollet ’13; Brennan, Bresler,...] ◮ Failure of MCMC [Jerrum ’92] ◮ Shattering of solution space [Achlioptas, Coja-Oghlan ’08] ◮ Failure of local algorithms [Gamarnik, Sudan ’13] ◮ Statistical physics, belief propagation [Decelle, Krzakala, Moore, Zdeborová ’11]

◮ Optimization landscape, Kac-Rice formula [Auffinger, Ben Arous, Černý ’10]

6 / 27

slide-38
SLIDE 38

How to Show that a Problem is Hard?

We don’t know how to prove that average-case problems are hard, but various forms of “rigorous evidence”:

◮ Reductions (e.g. from planted clique) [Berthet, Rigollet ’13; Brennan, Bresler,...] ◮ Failure of MCMC [Jerrum ’92] ◮ Shattering of solution space [Achlioptas, Coja-Oghlan ’08] ◮ Failure of local algorithms [Gamarnik, Sudan ’13] ◮ Statistical physics, belief propagation [Decelle, Krzakala, Moore, Zdeborová ’11]

◮ Optimization landscape, Kac-Rice formula [Auffinger, Ben Arous, Černý ’10]

◮ Statistical query lower bounds [Feldman, Grigorescu, Reyzin, Vempala, Xiao ’12]

6 / 27

slide-39
SLIDE 39

How to Show that a Problem is Hard?

We don’t know how to prove that average-case problems are hard, but various forms of “rigorous evidence”:

◮ Reductions (e.g. from planted clique) [Berthet, Rigollet ’13; Brennan, Bresler,...] ◮ Failure of MCMC [Jerrum ’92] ◮ Shattering of solution space [Achlioptas, Coja-Oghlan ’08] ◮ Failure of local algorithms [Gamarnik, Sudan ’13] ◮ Statistical physics, belief propagation [Decelle, Krzakala, Moore, Zdeborová ’11]

◮ Optimization landscape, Kac-Rice formula [Auffinger, Ben Arous, Černý ’10]

◮ Statistical query lower bounds [Feldman, Grigorescu, Reyzin, Vempala, Xiao ’12] ◮ Sum-of-squares lower bounds [Barak, Hopkins, Kelner, Kothari, Moitra, Potechin ’16]

6 / 27

slide-40
SLIDE 40

How to Show that a Problem is Hard?

We don’t know how to prove that average-case problems are hard, but various forms of “rigorous evidence”:

◮ Reductions (e.g. from planted clique) [Berthet, Rigollet ’13; Brennan, Bresler,...] ◮ Failure of MCMC [Jerrum ’92] ◮ Shattering of solution space [Achlioptas, Coja-Oghlan ’08] ◮ Failure of local algorithms [Gamarnik, Sudan ’13] ◮ Statistical physics, belief propagation [Decelle, Krzakala, Moore, Zdeborová ’11]

◮ Optimization landscape, Kac-Rice formula [Auffinger, Ben Arous, Černý ’10]

◮ Statistical query lower bounds [Feldman, Grigorescu, Reyzin, Vempala, Xiao ’12] ◮ Sum-of-squares lower bounds [Barak, Hopkins, Kelner, Kothari, Moitra, Potechin ’16] ◮ This talk: “low-degree method”

[Barak, Hopkins, Kelner, Kothari, Moitra, Potechin ’16; Hopkins, Steurer ’17; Hopkins, Kothari, Potechin, Raghavendra, Schramm, Steurer ’17; Hopkins ’18 (PhD thesis)] 6 / 27

slide-41
SLIDE 41

The Low-Degree Method (e.g. [Hopkins, Steurer ’17])

Suppose we want to hypothesis test with error probability o(1) between two distributions:

7 / 27

slide-42
SLIDE 42

The Low-Degree Method (e.g. [Hopkins, Steurer ’17])

Suppose we want to hypothesis test with error probability o(1) between two distributions:

◮ Null model Y ∼ Qn

e.g. G(n, 1/2)

7 / 27

slide-43
SLIDE 43

The Low-Degree Method (e.g. [Hopkins, Steurer ’17])

Suppose we want to hypothesis test with error probability o(1) between two distributions:

◮ Null model Y ∼ Qn

e.g. G(n, 1/2)

◮ Planted model Y ∼ Pn

e.g. G(n, 1/2) ∪ {random k-clique}

7 / 27

slide-44
SLIDE 44

The Low-Degree Method (e.g. [Hopkins, Steurer ’17])

Suppose we want to hypothesis test with error probability o(1) between two distributions:

◮ Null model Y ∼ Qn

e.g. G(n, 1/2)

◮ Planted model Y ∼ Pn

e.g. G(n, 1/2) ∪ {random k-clique}

Look for a degree-D (multivariate) polynomial f : Rn×n → R that distinguishes P from Q:

7 / 27

slide-45
SLIDE 45

The Low-Degree Method (e.g. [Hopkins, Steurer ’17])

Suppose we want to hypothesis test with error probability o(1) between two distributions:

◮ Null model Y ∼ Qn

e.g. G(n, 1/2)

◮ Planted model Y ∼ Pn

e.g. G(n, 1/2) ∪ {random k-clique}

Look for a degree-D (multivariate) polynomial f : Rn×n → R that distinguishes P from Q: Want f (Y ) to be big when Y ∼ P and small when Y ∼ Q

7 / 27

slide-46
SLIDE 46

The Low-Degree Method (e.g. [Hopkins, Steurer ’17])

Suppose we want to hypothesis test with error probability o(1) between two distributions:

◮ Null model Y ∼ Qn

e.g. G(n, 1/2)

◮ Planted model Y ∼ Pn

e.g. G(n, 1/2) ∪ {random k-clique}

Look for a degree-D (multivariate) polynomial f : R^(n×n) → R that distinguishes P from Q: want f(Y) to be big when Y ∼ P and small when Y ∼ Q

Compute   max_{deg f ≤ D}  E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²])   (numerator: mean in P; denominator: fluctuations in Q)

7 / 27
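As a toy illustration (my own example, not from the talk), this ratio can be estimated by Monte Carlo for a simple degree-1 statistic in the planted clique model, the centered edge count; the ratio scales roughly like k²/n, hinting at why √n-size cliques are where simple statistics start to work.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, trials = 400, 40, 200

def sample_adjacency(planted):
    """0/1 adjacency matrix of G(n, 1/2), optionally with a planted k-clique."""
    A = np.triu(rng.random((n, n)) < 0.5, 1)
    A = (A | A.T).astype(float)
    if planted:
        S = rng.choice(n, size=k, replace=False)
        A[np.ix_(S, S)] = 1.0
        np.fill_diagonal(A, 0.0)
    return A

def edge_count_stat(A):
    """A degree-1 polynomial in the edge indicators: the centered edge count."""
    iu = np.triu_indices(A.shape[0], 1)
    return np.sum(A[iu] - 0.5)

f_P = np.array([edge_count_stat(sample_adjacency(True)) for _ in range(trials)])
f_Q = np.array([edge_count_stat(sample_adjacency(False)) for _ in range(trials)])
# advantage ~ E_P[f] / sqrt(E_Q[f^2]); values >> 1 mean f separates P from Q
print(f_P.mean() / np.sqrt(np.mean(f_Q ** 2)))
```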

slide-47
SLIDE 47

The Low-Degree Method (e.g. [Hopkins, Steurer ’17])

max_{deg f ≤ D}  E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²])

8 / 27

slide-48
SLIDE 48

The Low-Degree Method (e.g. [Hopkins, Steurer ’17])

max_{deg f ≤ D}  E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²])

⟨f, g⟩ = E_{Y∼Q}[f(Y)g(Y)],   ‖f‖ = √⟨f, f⟩

8 / 27

slide-49
SLIDE 49

The Low-Degree Method (e.g. [Hopkins, Steurer ’17])

max_{deg f ≤ D}  E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²])  =  max_{deg f ≤ D}  E_{Y∼Q}[L(Y)f(Y)] / √(E_{Y∼Q}[f(Y)²])

⟨f, g⟩ = E_{Y∼Q}[f(Y)g(Y)],   ‖f‖ = √⟨f, f⟩

Likelihood ratio: L(Y) = dP/dQ(Y)

8 / 27

slide-50
SLIDE 50

The Low-Degree Method (e.g. [Hopkins, Steurer ’17])

max_{deg f ≤ D}  E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²])  =  max_{deg f ≤ D}  E_{Y∼Q}[L(Y)f(Y)] / √(E_{Y∼Q}[f(Y)²])  =  max_{deg f ≤ D}  ⟨L, f⟩ / ‖f‖

⟨f, g⟩ = E_{Y∼Q}[f(Y)g(Y)],   ‖f‖ = √⟨f, f⟩

Likelihood ratio: L(Y) = dP/dQ(Y)

8 / 27

slide-51
SLIDE 51

The Low-Degree Method (e.g. [Hopkins, Steurer ’17])

max_{deg f ≤ D}  E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²])  =  max_{deg f ≤ D}  E_{Y∼Q}[L(Y)f(Y)] / √(E_{Y∼Q}[f(Y)²])  =  max_{deg f ≤ D}  ⟨L, f⟩ / ‖f‖

⟨f, g⟩ = E_{Y∼Q}[f(Y)g(Y)],   ‖f‖ = √⟨f, f⟩

Likelihood ratio: L(Y) = dP/dQ(Y)

Maximizer: f = L≤D := projection of L onto degree-D subspace

8 / 27

slide-52
SLIDE 52

The Low-Degree Method (e.g. [Hopkins, Steurer ’17])

max_{deg f ≤ D}  E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²])  =  max_{deg f ≤ D}  E_{Y∼Q}[L(Y)f(Y)] / √(E_{Y∼Q}[f(Y)²])  =  max_{deg f ≤ D}  ⟨L, f⟩ / ‖f‖  =  ‖L≤D‖

⟨f, g⟩ = E_{Y∼Q}[f(Y)g(Y)],   ‖f‖ = √⟨f, f⟩

Likelihood ratio: L(Y) = dP/dQ(Y)

Maximizer: f = L≤D := projection of L onto degree-D subspace

‖L≤D‖ = norm of the low-degree likelihood ratio

8 / 27

slide-53
SLIDE 53

The Low-Degree Method

Conclusion:   max_{deg f ≤ D}  E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²])  =  ‖L≤D‖

9 / 27

slide-54
SLIDE 54

The Low-Degree Method

Conclusion:   max_{deg f ≤ D}  E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²])  =  ‖L≤D‖

Heuristically: ‖L≤D‖ = ω(1) ⇒ some degree-D polynomial can distinguish Q, P;  ‖L≤D‖ = O(1) ⇒ degree-D polynomials fail

9 / 27

slide-55
SLIDE 55

The Low-Degree Method

Conclusion:   max_{deg f ≤ D}  E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²])  =  ‖L≤D‖

Heuristically: ‖L≤D‖ = ω(1) ⇒ some degree-D polynomial can distinguish Q, P;  ‖L≤D‖ = O(1) ⇒ degree-D polynomials fail

Conjecture (informal variant of [Hopkins ’18])

For “nice” Q, P, if ‖L≤D‖ = O(1) for some D = ω(log n) then no polynomial-time algorithm can distinguish Q, P with success probability 1 − o(1).

9 / 27

slide-56
SLIDE 56

The Low-Degree Method

Conclusion:   max_{deg f ≤ D}  E_{Y∼P}[f(Y)] / √(E_{Y∼Q}[f(Y)²])  =  ‖L≤D‖

Heuristically: ‖L≤D‖ = ω(1) ⇒ some degree-D polynomial can distinguish Q, P;  ‖L≤D‖ = O(1) ⇒ degree-D polynomials fail

Conjecture (informal variant of [Hopkins ’18])

For “nice” Q, P, if ‖L≤D‖ = O(1) for some D = ω(log n) then no polynomial-time algorithm can distinguish Q, P with success probability 1 − o(1).

Degree-O(log n) polynomials ⇔ Polynomial-time algorithms

9 / 27

slide-57
SLIDE 57

Formal Consequences of the Low-Degree Method

The case D = ∞: If L = O(1) (as n → ∞) then no test can distinguish Q from P (with success probability 1 − o(1))

◮ Classical second moment method

10 / 27

slide-58
SLIDE 58

Formal Consequences of the Low-Degree Method

The case D = ∞: If L = O(1) (as n → ∞) then no test can distinguish Q from P (with success probability 1 − o(1))

◮ Classical second moment method

If L≤D = O(1) for some D = ω(log n) then no spectral method can distinguish Q from P (in a particular sense) [Kunisky, W, Bandeira ’19]

◮ Spectral method: threshold top eigenvalue of poly-size matrix

M = M(Y ) whose entries are O(1)-degree polynomials in Y

10 / 27

slide-59
SLIDE 59

Formal Consequences of the Low-Degree Method

The case D = ∞: If L = O(1) (as n → ∞) then no test can distinguish Q from P (with success probability 1 − o(1))

◮ Classical second moment method

If L≤D = O(1) for some D = ω(log n) then no spectral method can distinguish Q from P (in a particular sense) [Kunisky, W, Bandeira ’19]

◮ Spectral method: threshold top eigenvalue of poly-size matrix

M = M(Y ) whose entries are O(1)-degree polynomials in Y

◮ Proof: consider polynomial f (Y ) = Tr(Mq) with q = Θ(log n)

10 / 27

slide-60
SLIDE 60

Formal Consequences of the Low-Degree Method

The case D = ∞: If L = O(1) (as n → ∞) then no test can distinguish Q from P (with success probability 1 − o(1))

◮ Classical second moment method

If L≤D = O(1) for some D = ω(log n) then no spectral method can distinguish Q from P (in a particular sense) [Kunisky, W, Bandeira ’19]

◮ Spectral method: threshold top eigenvalue of poly-size matrix

M = M(Y ) whose entries are O(1)-degree polynomials in Y

◮ Proof: consider polynomial f (Y ) = Tr(Mq) with q = Θ(log n) ◮ Spectral methods are believed to be as powerful as

sum-of-squares for average-case problems [HKPRSS ’17]

10 / 27
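A schematic of what “spectral method” means here, as a sketch under my own simplifications (M(Y) = Y rather than a higher-degree matrix polynomial), using the spiked Wigner model that appears later in the talk:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

def wigner(n):
    """Symmetric noise matrix with entries ~ N(0, 1/n)."""
    W = rng.normal(0.0, 1.0 / np.sqrt(n), (n, n))
    return (W + W.T) / np.sqrt(2)

def spectral_test(M, q=8):
    """Generic spectral method: threshold the top eigenvalue of a matrix M = M(Y).
    Tr(M^q) for even q ~ log n is a polynomial (of degree q * deg M in Y) that
    tracks lambda_max(M)^q, which is how such tests fit the low-degree framework."""
    lam_max = np.linalg.eigvalsh(M)[-1]
    tr_pow = np.trace(np.linalg.matrix_power(M, q))
    return lam_max, tr_pow

x = rng.normal(size=n)
x /= np.linalg.norm(x)
Y_null = wigner(n)                               # Q: pure noise
Y_spiked = 2.0 * np.outer(x, x) + wigner(n)      # P: planted rank-one signal
print(spectral_test(Y_null)[0], spectral_test(Y_spiked)[0])   # ~2.0 vs ~2.5
```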

slide-61
SLIDE 61

Low-Degree Method: Recap

Given a hypothesis testing question Qn vs Pn

11 / 27

slide-62
SLIDE 62

Low-Degree Method: Recap

Given a hypothesis testing question Qn vs Pn Take D ≈ log n

11 / 27

slide-63
SLIDE 63

Low-Degree Method: Recap

Given a hypothesis testing question Qn vs Pn Take D ≈ log n Compute/bound L≤D in the limit n → ∞

11 / 27

slide-64
SLIDE 64

Low-Degree Method: Recap

Given a hypothesis testing question Qn vs Pn Take D ≈ log n Compute/bound L≤D in the limit n → ∞

◮ If L≤D = ω(1), suggests that the problem is poly-time

solvable

11 / 27

slide-65
SLIDE 65

Low-Degree Method: Recap

Given a hypothesis testing question Qn vs Pn Take D ≈ log n Compute/bound L≤D in the limit n → ∞

◮ If L≤D = ω(1), suggests that the problem is poly-time

solvable

◮ If L≤D = O(1), suggests that the problem is NOT poly-time

solvable (and gives rigorous evidence: spectral methods fail)

11 / 27

slide-66
SLIDE 66

Advantages of the Low-Degree Method

◮ Possible to calculate/bound L≤D for many problems

12 / 27

slide-67
SLIDE 67

Advantages of the Low-Degree Method

◮ Possible to calculate/bound L≤D for many problems ◮ Predictions seem “correct”!

12 / 27

slide-68
SLIDE 68

Advantages of the Low-Degree Method

◮ Possible to calculate/bound L≤D for many problems ◮ Predictions seem “correct”!

◮ Planted clique, sparse PCA, stochastic block model, ... 12 / 27

slide-69
SLIDE 69

Advantages of the Low-Degree Method

◮ Possible to calculate/bound L≤D for many problems ◮ Predictions seem “correct”!

◮ Planted clique, sparse PCA, stochastic block model, ...

◮ (Relatively) simple

12 / 27

slide-70
SLIDE 70

Advantages of the Low-Degree Method

◮ Possible to calculate/bound L≤D for many problems ◮ Predictions seem “correct”!

◮ Planted clique, sparse PCA, stochastic block model, ...

◮ (Relatively) simple

◮ Much simpler than sum-of-squares lower bounds 12 / 27

slide-71
SLIDE 71

Advantages of the Low-Degree Method

◮ Possible to calculate/bound L≤D for many problems ◮ Predictions seem “correct”!

◮ Planted clique, sparse PCA, stochastic block model, ...

◮ (Relatively) simple

◮ Much simpler than sum-of-squares lower bounds

◮ Detection vs certification

12 / 27

slide-72
SLIDE 72

Advantages of the Low-Degree Method

◮ Possible to calculate/bound L≤D for many problems ◮ Predictions seem “correct”!

◮ Planted clique, sparse PCA, stochastic block model, ...

◮ (Relatively) simple

◮ Much simpler than sum-of-squares lower bounds

◮ Detection vs certification ◮ General: no assumptions on Q, P

12 / 27

slide-73
SLIDE 73

Advantages of the Low-Degree Method

◮ Possible to calculate/bound L≤D for many problems ◮ Predictions seem “correct”!

◮ Planted clique, sparse PCA, stochastic block model, ...

◮ (Relatively) simple

◮ Much simpler than sum-of-squares lower bounds

◮ Detection vs certification ◮ General: no assumptions on Q, P ◮ Captures sharp thresholds [Hopkins, Steurer ’17]

12 / 27

slide-74
SLIDE 74

Advantages of the Low-Degree Method

◮ Possible to calculate/bound L≤D for many problems ◮ Predictions seem “correct”!

◮ Planted clique, sparse PCA, stochastic block model, ...

◮ (Relatively) simple

◮ Much simpler than sum-of-squares lower bounds

◮ Detection vs certification ◮ General: no assumptions on Q, P ◮ Captures sharp thresholds [Hopkins, Steurer ’17] ◮ By varying degree D, can explore runtimes other than

polynomial

◮ Conjecture (Hopkins ’18): degree-D polynomials ⇔ time-n^Θ̃(D) algorithms

12 / 27

slide-75
SLIDE 75

Advantages of the Low-Degree Method

◮ Possible to calculate/bound L≤D for many problems ◮ Predictions seem “correct”!

◮ Planted clique, sparse PCA, stochastic block model, ...

◮ (Relatively) simple

◮ Much simpler than sum-of-squares lower bounds

◮ Detection vs certification ◮ General: no assumptions on Q, P ◮ Captures sharp thresholds [Hopkins, Steurer ’17] ◮ By varying degree D, can explore runtimes other than

polynomial

◮ Conjecture (Hopkins ’18): degree-D polynomials ⇔ time-n^Θ̃(D) algorithms

◮ No ingenuity required

12 / 27

slide-76
SLIDE 76

Advantages of the Low-Degree Method

◮ Possible to calculate/bound L≤D for many problems ◮ Predictions seem “correct”!

◮ Planted clique, sparse PCA, stochastic block model, ...

◮ (Relatively) simple

◮ Much simpler than sum-of-squares lower bounds

◮ Detection vs certification ◮ General: no assumptions on Q, P ◮ Captures sharp thresholds [Hopkins, Steurer ’17] ◮ By varying degree D, can explore runtimes other than

polynomial

◮ Conjecture (Hopkins ’18): degree-D polynomials ⇔ time-n^Θ̃(D) algorithms

◮ No ingenuity required ◮ Interpretable

12 / 27

slide-77
SLIDE 77

How to Compute L≤D

Additive Gaussian noise: P : Y = X + Z vs Q : Y = Z, where X is drawn from an arbitrary prior over R^N and Z is i.i.d. N(0, 1)

13 / 27

slide-78
SLIDE 78

How to Compute L≤D

Additive Gaussian noise: P : Y = X + Z vs Q : Y = Z, where X is drawn from an arbitrary prior over R^N and Z is i.i.d. N(0, 1)

L(Y) = dP/dQ(Y) = E_X[exp(−½‖Y − X‖²)] / exp(−½‖Y‖²) = E_X exp(⟨Y, X⟩ − ½‖X‖²)

13 / 27

slide-79
SLIDE 79

How to Compute L≤D

Additive Gaussian noise: P : Y = X + Z vs Q : Y = Z, where X is drawn from an arbitrary prior over R^N and Z is i.i.d. N(0, 1)

L(Y) = dP/dQ(Y) = E_X[exp(−½‖Y − X‖²)] / exp(−½‖Y‖²) = E_X exp(⟨Y, X⟩ − ½‖X‖²)

Expand L = Σ_α c_α h_α where {h_α} are Hermite polynomials (orthonormal basis w.r.t. Q)

13 / 27

slide-80
SLIDE 80

How to Compute L≤D

Additive Gaussian noise: P : Y = X + Z vs Q : Y = Z, where X is drawn from an arbitrary prior over R^N and Z is i.i.d. N(0, 1)

L(Y) = dP/dQ(Y) = E_X[exp(−½‖Y − X‖²)] / exp(−½‖Y‖²) = E_X exp(⟨Y, X⟩ − ½‖X‖²)

Expand L = Σ_α c_α h_α where {h_α} are Hermite polynomials (orthonormal basis w.r.t. Q)

‖L≤D‖² = Σ_{|α|≤D} c_α²  where  c_α = ⟨L, h_α⟩ = E_{Y∼Q}[L(Y) h_α(Y)]

13 / 27

slide-81
SLIDE 81

How to Compute L≤D

Additive Gaussian noise: P : Y = X + Z vs Q : Y = Z, where X is drawn from an arbitrary prior over R^N and Z is i.i.d. N(0, 1)

L(Y) = dP/dQ(Y) = E_X[exp(−½‖Y − X‖²)] / exp(−½‖Y‖²) = E_X exp(⟨Y, X⟩ − ½‖X‖²)

Expand L = Σ_α c_α h_α where {h_α} are Hermite polynomials (orthonormal basis w.r.t. Q)

‖L≤D‖² = Σ_{|α|≤D} c_α²  where  c_α = ⟨L, h_α⟩ = E_{Y∼Q}[L(Y) h_α(Y)]

· · ·

13 / 27

slide-82
SLIDE 82

How to Compute L≤D

Additive Gaussian noise: P : Y = X + Z vs Q : Y = Z, where X is drawn from an arbitrary prior over R^N and Z is i.i.d. N(0, 1)

L(Y) = dP/dQ(Y) = E_X[exp(−½‖Y − X‖²)] / exp(−½‖Y‖²) = E_X exp(⟨Y, X⟩ − ½‖X‖²)

Expand L = Σ_α c_α h_α where {h_α} are Hermite polynomials (orthonormal basis w.r.t. Q)

‖L≤D‖² = Σ_{|α|≤D} c_α²  where  c_α = ⟨L, h_α⟩ = E_{Y∼Q}[L(Y) h_α(Y)]

· · ·

Result:  ‖L≤D‖² = Σ_{d=0}^{D} (1/d!) E_{X,X′}[⟨X, X′⟩^d]

13 / 27
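The boxed result can be evaluated numerically. Below is a hedged sketch (the function names and the toy prior are mine) that Monte Carlo approximates ‖L≤D‖² = Σ_{d≤D} E[⟨X, X′⟩^d]/d! for an additive Gaussian model Y = X + Z with a vector-valued signal; mapping the rank-one matrix problems onto this formula takes extra bookkeeping and is not attempted here.

```python
import numpy as np
from math import factorial

def low_degree_norm_sq(sample_X, D, trials=20000, seed=0):
    """Monte Carlo estimate of ||L^{<=D}||^2 = sum_{d=0}^{D} E[<X, X'>^d] / d!
    for an additive Gaussian model Y = X + Z (the formula on the slide above)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        inner = float(np.dot(sample_X(rng), sample_X(rng)))   # <X, X'>, independent copies
        total += sum(inner ** d / factorial(d) for d in range(D + 1))
    return total / trials

def toy_sparse_signal(rng, n=200, k=10, theta=1.0):
    """Toy vector-valued prior: theta times a k-sparse Rademacher unit vector."""
    x = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x[support] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)
    return theta * x

print(low_degree_norm_sq(toy_sparse_signal, D=4))
```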

slide-83
SLIDE 83

References

For more on the low-degree method...

◮ Samuel B. Hopkins, PhD thesis ’18: “Statistical Inference and

the Sum of Squares Method”

◮ Connection to SoS

◮ Survey article: Kunisky, W, Bandeira, “Notes on

Computational Hardness of Hypothesis Testing: Predictions using the Low-Degree Likelihood Ratio”, arxiv:1907.11636

14 / 27

slide-84
SLIDE 84

Part II: Sparse PCA

Based on: Ding, Kunisky, W., Bandeira, “Subexponential-Time Algorithms for Sparse PCA”, arxiv:1907.11635

15 / 27

slide-85
SLIDE 85

Spiked Wigner Model

Observe n × n matrix Y = λxx⊤ + W. Signal: x ∈ R^n, ‖x‖ = 1. Noise: W ∈ R^(n×n) with entries W_ij = W_ji ∼ N(0, 1/n) i.i.d. λ > 0: signal-to-noise ratio.

16 / 27

slide-86
SLIDE 86

Spiked Wigner Model

Observe n × n matrix Y = λxx⊤ + W. Signal: x ∈ R^n, ‖x‖ = 1. Noise: W ∈ R^(n×n) with entries W_ij = W_ji ∼ N(0, 1/n) i.i.d. λ > 0: signal-to-noise ratio.

Goal: given Y, estimate the signal x

16 / 27

slide-87
SLIDE 87

Spiked Wigner Model

Observe n × n matrix Y = λxx⊤ + W. Signal: x ∈ R^n, ‖x‖ = 1. Noise: W ∈ R^(n×n) with entries W_ij = W_ji ∼ N(0, 1/n) i.i.d. λ > 0: signal-to-noise ratio.

Goal: given Y, estimate the signal x

Or, even simpler: distinguish (w.h.p.) Y from pure noise W

16 / 27

slide-88
SLIDE 88

Spiked Wigner Model

Observe n × n matrix Y = λxx⊤ + W. Signal: x ∈ R^n, ‖x‖ = 1. Noise: W ∈ R^(n×n) with entries W_ij = W_ji ∼ N(0, 1/n) i.i.d. λ > 0: signal-to-noise ratio.

Goal: given Y, estimate the signal x

Or, even simpler: distinguish (w.h.p.) Y from pure noise W

Structure: suppose x is drawn from some prior, e.g.

16 / 27

slide-89
SLIDE 89

Spiked Wigner Model

Observe n × n matrix Y = λxx⊤ + W. Signal: x ∈ R^n, ‖x‖ = 1. Noise: W ∈ R^(n×n) with entries W_ij = W_ji ∼ N(0, 1/n) i.i.d. λ > 0: signal-to-noise ratio.

Goal: given Y, estimate the signal x

Or, even simpler: distinguish (w.h.p.) Y from pure noise W

Structure: suppose x is drawn from some prior, e.g.

◮ spherical (uniform on unit sphere)

16 / 27

slide-90
SLIDE 90

Spiked Wigner Model

Observe n × n matrix Y = λxx⊤ + W. Signal: x ∈ R^n, ‖x‖ = 1. Noise: W ∈ R^(n×n) with entries W_ij = W_ji ∼ N(0, 1/n) i.i.d. λ > 0: signal-to-noise ratio.

Goal: given Y, estimate the signal x

Or, even simpler: distinguish (w.h.p.) Y from pure noise W

Structure: suppose x is drawn from some prior, e.g.

◮ spherical (uniform on unit sphere) ◮ Rademacher (i.i.d. ±1/√n)

16 / 27

slide-91
SLIDE 91

Spiked Wigner Model

Observe n × n matrix Y = λxx⊤ + W. Signal: x ∈ R^n, ‖x‖ = 1. Noise: W ∈ R^(n×n) with entries W_ij = W_ji ∼ N(0, 1/n) i.i.d. λ > 0: signal-to-noise ratio.

Goal: given Y, estimate the signal x

Or, even simpler: distinguish (w.h.p.) Y from pure noise W

Structure: suppose x is drawn from some prior, e.g.

◮ spherical (uniform on unit sphere) ◮ Rademacher (i.i.d. ±1/√n) ◮ sparse

16 / 27

slide-92
SLIDE 92

PCA (Principal Component Analysis)

Y = λxxT + W

  • J. Baik, G. Ben Arous, S. Peche, AoP 2005.
  • D. Feral, S. Peche, CMP 2006.

17 / 27

slide-93
SLIDE 93

PCA (Principal Component Analysis)

Y = λxxT + W PCA: top eigenvalue λ1(Y ) and (unit-norm) eigenvector v1

  • J. Baik, G. Ben Arous, S. Peche, AoP 2005.
  • D. Feral, S. Peche, CMP 2006.

17 / 27

slide-94
SLIDE 94

PCA (Principal Component Analysis)

Y = λxxT + W PCA: top eigenvalue λ1(Y ) and (unit-norm) eigenvector v1

Theorem (BBP’05, FP’06)

Almost surely, as n → ∞,

  • J. Baik, G. Ben Arous, S. Peche, AoP 2005.
  • D. Feral, S. Peche, CMP 2006.

17 / 27

slide-95
SLIDE 95

PCA (Principal Component Analysis)

Y = λxxT + W PCA: top eigenvalue λ1(Y ) and (unit-norm) eigenvector v1

Theorem (BBP’05, FP’06)

Almost surely, as n → ∞,

◮ If λ ≤ 1: λ1(Y) → 2 and ⟨x, v1⟩ → 0

  • J. Baik, G. Ben Arous, S. Peche, AoP 2005.
  • D. Feral, S. Peche, CMP 2006.

17 / 27

slide-96
SLIDE 96

PCA (Principal Component Analysis)

Y = λxxT + W PCA: top eigenvalue λ1(Y ) and (unit-norm) eigenvector v1

Theorem (BBP’05, FP’06)

Almost surely, as n → ∞,

◮ If λ ≤ 1: λ1(Y) → 2 and ⟨x, v1⟩ → 0 ◮ If λ > 1: λ1(Y) → λ + 1/λ > 2 and ⟨x, v1⟩² → 1 − 1/λ² > 0

  • J. Baik, G. Ben Arous, S. Peche, AoP 2005.
  • D. Feral, S. Peche, CMP 2006.

17 / 27

slide-97
SLIDE 97

PCA (Principal Component Analysis)

Y = λxxT + W PCA: top eigenvalue λ1(Y ) and (unit-norm) eigenvector v1

Theorem (BBP’05, FP’06)

Almost surely, as n → ∞,

◮ If λ ≤ 1: λ1(Y) → 2 and ⟨x, v1⟩ → 0 ◮ If λ > 1: λ1(Y) → λ + 1/λ > 2 and ⟨x, v1⟩² → 1 − 1/λ² > 0

Sharp threshold: PCA can detect and recover the signal iff λ > 1

  • J. Baik, G. Ben Arous, S. Peche, AoP 2005.
  • D. Feral, S. Peche, CMP 2006.

17 / 27
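The BBP transition is easy to see numerically; a small sketch (parameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

def spiked_wigner(lam):
    """Y = lam * x x^T + W with ||x|| = 1 and symmetric noise entries ~ N(0, 1/n)."""
    x = rng.normal(size=n)
    x /= np.linalg.norm(x)
    W = rng.normal(0.0, 1.0 / np.sqrt(n), (n, n))
    W = (W + W.T) / np.sqrt(2)
    return lam * np.outer(x, x) + W, x

for lam in [0.5, 1.5, 3.0]:
    Y, x = spiked_wigner(lam)
    vals, vecs = np.linalg.eigh(Y)
    lam1, v1 = vals[-1], vecs[:, -1]
    # BBP: below lam = 1, lam1 ~ 2 and <x, v1>^2 ~ 0;
    # above, lam1 ~ lam + 1/lam and <x, v1>^2 ~ 1 - 1/lam^2
    print(lam, round(float(lam1), 3), round(float(np.dot(x, v1)) ** 2, 3))
```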

slide-98
SLIDE 98

Is PCA Optimal?

18 / 27

slide-99
SLIDE 99

Is PCA Optimal?

PCA does not exploit structure of signal x

18 / 27

slide-100
SLIDE 100

Is PCA Optimal?

PCA does not exploit structure of signal x Is the PCA threshold (λ = 1) optimal?

◮ Is it statistically possible to detect/recover when λ < 1?

18 / 27

slide-101
SLIDE 101

Is PCA Optimal?

PCA does not exploit structure of signal x Is the PCA threshold (λ = 1) optimal?

◮ Is it statistically possible to detect/recover when λ < 1?

Answer: it depends on the prior for x

18 / 27

slide-102
SLIDE 102

Is PCA Optimal?

PCA does not exploit structure of signal x Is the PCA threshold (λ = 1) optimal?

◮ Is it statistically possible to detect/recover when λ < 1?

Answer: it depends on the prior for x For some priors (e.g. spherical, Rademacher), detection and recovery are statistically impossible when λ < 1 [MRZ’14, DAM’15, PWBM’18]

18 / 27

slide-103
SLIDE 103

Is PCA Optimal?

PCA does not exploit structure of signal x Is the PCA threshold (λ = 1) optimal?

◮ Is it statistically possible to detect/recover when λ < 1?

Answer: it depends on the prior for x For some priors (e.g. spherical, Rademacher), detection and recovery are statistically impossible when λ < 1 [MRZ’14, DAM’15, PWBM’18] But what if x is sparse?

18 / 27

slide-104
SLIDE 104

Sparse PCA

Suppose x ∈ Rn is drawn from the k-sparse Rademacher prior:

◮ k random entries of x are nonzero ◮ the nonzero entries are drawn uniformly from {±1/√k}

Johnstone, Lu ’04, ’09 19 / 27

slide-105
SLIDE 105

Sparse PCA

Suppose x ∈ Rn is drawn from the k-sparse Rademacher prior:

◮ k random entries of x are nonzero ◮ the nonzero entries are drawn uniformly from {±1/√k}

Normalization: ‖x‖ = 1

Johnstone, Lu ’04, ’09 19 / 27

slide-106
SLIDE 106

Sparse PCA

Suppose x ∈ Rn is drawn from the k-sparse Rademacher prior:

◮ k random entries of x are nonzero ◮ the nonzero entries are drawn uniformly from {±1/√k}

Normalization: ‖x‖ = 1

As before, Y = λxx⊤ + W

Johnstone, Lu ’04, ’09 19 / 27

slide-107
SLIDE 107

Sparse PCA

Suppose x ∈ Rn is drawn from the k-sparse Rademacher prior:

◮ k random entries of x are nonzero ◮ the nonzero entries are drawn uniformly from {±1/√k}

Normalization: ‖x‖ = 1

As before, Y = λxx⊤ + W

Assume λ < 1 is a constant

◮ PCA fails

Johnstone, Lu ’04, ’09 19 / 27
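For reference, a minimal sketch (function name is mine, not the paper's) of sampling the k-sparse Rademacher spike and forming Y = λxx⊤ + W:

```python
import numpy as np

def sparse_pca_instance(n, k, lam, seed=None):
    """Sample x from the k-sparse Rademacher prior (||x|| = 1) and return
    Y = lam * x x^T + W with symmetric noise entries ~ N(0, 1/n)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x[support] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)
    W = rng.normal(0.0, 1.0 / np.sqrt(n), (n, n))
    W = (W + W.T) / np.sqrt(2)
    return lam * np.outer(x, x) + W, x

Y, x = sparse_pca_instance(n=500, k=50, lam=0.8, seed=0)
```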

slide-108
SLIDE 108

Maximum Likelihood Estimator

Let Sk := {v ∈ {0, ±1/√k}^n : ‖v‖₀ = k} (set of k-sparse Rademacher vectors)

20 / 27

slide-109
SLIDE 109

Maximum Likelihood Estimator

Let Sk := {v ∈ {0, ±1/√k}^n : ‖v‖₀ = k} (set of k-sparse Rademacher vectors)

MLE: x̂ = argmax_{v∈Sk} v⊤Yv

20 / 27

slide-110
SLIDE 110

Maximum Likelihood Estimator

Let Sk := {v ∈ {0, ±1/√k}^n : ‖v‖₀ = k} (set of k-sparse Rademacher vectors)

MLE: x̂ = argmax_{v∈Sk} v⊤Yv

Succeeds (x̂ = x with high probability) provided k ≲ n/log n

[PJ’12, VL’12, CMW’13] 20 / 27

slide-111
SLIDE 111

Maximum Likelihood Estimator

Let Sk := {v ∈ {0, ±1/√k}^n : ‖v‖₀ = k} (set of k-sparse Rademacher vectors)

MLE: x̂ = argmax_{v∈Sk} v⊤Yv

Succeeds (x̂ = x with high probability) provided k ≲ n/log n

[PJ’12, VL’12, CMW’13]

◮ For weak recovery, k < ρ∗n ≈ 0.09n

[LKZ’15, KXZ’16, DMK+’16, LM’19, EKJ’17] 20 / 27

slide-112
SLIDE 112

Maximum Likelihood Estimator

Let Sk := {v ∈ {0, ±1/√k}^n : ‖v‖₀ = k} (set of k-sparse Rademacher vectors)

MLE: x̂ = argmax_{v∈Sk} v⊤Yv

Succeeds (x̂ = x with high probability) provided k ≲ n/log n

[PJ’12, VL’12, CMW’13]

◮ For weak recovery, k < ρ∗n ≈ 0.09n

[LKZ’15, KXZ’16, DMK+’16, LM’19, EKJ’17]

Runtime: (n choose k) ≈ n^k ≈ exp(k)

20 / 27
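For very small instances the MLE can be run literally as stated, by enumerating all supports and sign patterns; a brute-force sketch (only feasible for tiny n and k, names are mine) follows.

```python
import numpy as np
from itertools import combinations, product

def mle_sparse_pca(Y, k):
    """Exhaustive-search MLE: maximize v^T Y v over all k-sparse Rademacher
    vectors v. Runtime ~ (n choose k) * 2^k, so only usable for tiny n, k."""
    n = Y.shape[0]
    best_val, best = -np.inf, None
    for support in combinations(range(n), k):
        Ys = Y[np.ix_(support, support)]
        for signs in product([-1.0, 1.0], repeat=k):
            s = np.array(signs)
            val = float(s @ Ys @ s)      # same argmax as for v = s / sqrt(k)
            if val > best_val:
                best_val, best = val, (support, s)
    x_hat = np.zeros(n)
    x_hat[list(best[0])] = best[1] / np.sqrt(k)
    return x_hat
```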

slide-113
SLIDE 113

Diagonal Thresholding

Diagonal thresholding algorithm [Johnstone, Lu ’09]:

21 / 27

slide-114
SLIDE 114

Diagonal Thresholding

Diagonal thresholding algorithm [Johnstone, Lu ’09]:

◮ Identify the largest k diagonal entries Yii

21 / 27

slide-115
SLIDE 115

Diagonal Thresholding

Diagonal thresholding algorithm [Johnstone, Lu ’09]:

◮ Identify the largest k diagonal entries Yii ◮ Report these indices i as the support of x

21 / 27

slide-116
SLIDE 116

Diagonal Thresholding

Diagonal thresholding algorithm [Johnstone, Lu ’09]:

◮ Identify the largest k diagonal entries Yii ◮ Report these indices i as the support of x ◮ (Easy to then recover x once you know the support)

21 / 27

slide-117
SLIDE 117

Diagonal Thresholding

Diagonal thresholding algorithm [Johnstone, Lu ’09]:

◮ Identify the largest k diagonal entries Yii ◮ Report these indices i as the support of x ◮ (Easy to then recover x once you know the support)

Succeeds (exact recovery) provided k ≲ √(n/log n) [Amini, Wainwright ’08]

21 / 27

slide-118
SLIDE 118

Diagonal Thresholding

Diagonal thresholding algorithm [Johnstone, Lu ’09]:

◮ Identify the largest k diagonal entries Yii ◮ Report these indices i as the support of x ◮ (Easy to then recover x once you know the support)

Succeeds (exact recovery) provided k ≲ √(n/log n) [Amini, Wainwright ’08]

Runtime: polynomial

21 / 27

slide-119
SLIDE 119

Diagonal Thresholding

Diagonal thresholding algorithm [Johnstone, Lu ’09]:

◮ Identify the largest k diagonal entries Yii ◮ Report these indices i as the support of x ◮ (Easy to then recover x once you know the support)

Succeeds (exact recovery) provided k ≲ √(n/log n) [Amini, Wainwright ’08]

Runtime: polynomial

Variant: covariance thresholding is poly-time and succeeds when k ≲ √n (removes log factor) [Krauthgamer, Nadler, Vilenchik ’15, Deshpande, Montanari ’14]

21 / 27
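Diagonal thresholding itself is a one-liner; a sketch with a quick synthetic check (the inline instance generator is illustrative, not from the paper):

```python
import numpy as np

def diagonal_thresholding_support(Y, k):
    """Johnstone-Lu-style diagonal thresholding: return the indices of the k
    largest diagonal entries of Y as the estimated support of the spike."""
    return set(int(i) for i in np.argsort(np.diag(Y))[-k:])

rng = np.random.default_rng(0)
n, k, lam = 2000, 5, 0.9
x = np.zeros(n)
S = rng.choice(n, size=k, replace=False)
x[S] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)
W = rng.normal(0.0, 1.0 / np.sqrt(n), (n, n))
Y = lam * np.outer(x, x) + (W + W.T) / np.sqrt(2)
found = diagonal_thresholding_support(Y, k)
print(len(found & set(S.tolist())), "of", k, "support indices recovered")
```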

slide-120
SLIDE 120

Hard Regime

To summarize:

22 / 27

slide-121
SLIDE 121

Hard Regime

To summarize: Statistically possible when k ≪ n

◮ Runtime exp(k)

22 / 27

slide-122
SLIDE 122

Hard Regime

To summarize: Statistically possible when k ≪ n

◮ Runtime exp(k)

Poly-time solvable when k ≪ √n

22 / 27

slide-123
SLIDE 123

Hard Regime

To summarize: Statistically possible when k ≪ n

◮ Runtime exp(k)

Poly-time solvable when k ≪ √n Believed “hard” when √n ≪ k ≪ n

22 / 27

slide-124
SLIDE 124

Hard Regime

To summarize: Statistically possible when k ≪ n

◮ Runtime exp(k)

Poly-time solvable when k ≪ √n Believed “hard” when √n ≪ k ≪ n

◮ Reduction from planted clique [BR’13, WBS’16, BBH’18, BB’19] ◮ Sum-of-squares lower bounds [MW’15, HKP+’17]

22 / 27

slide-125
SLIDE 125

Hard Regime

To summarize: Statistically possible when k ≪ n

◮ Runtime exp(k)

Poly-time solvable when k ≪ √n Believed “hard” when √n ≪ k ≪ n

◮ Reduction from planted clique [BR’13, WBS’16, BBH’18, BB’19] ◮ Sum-of-squares lower bounds [MW’15, HKP+’17]

Question: exactly how hard is the “hard” regime?

22 / 27

slide-126
SLIDE 126

Hard Regime

To summarize: Statistically possible when k ≪ n

◮ Runtime exp(k)

Poly-time solvable when k ≪ √n Believed “hard” when √n ≪ k ≪ n

◮ Reduction from planted clique [BR’13, WBS’16, BBH’18, BB’19] ◮ Sum-of-squares lower bounds [MW’15, HKP+’17]

Question: exactly how hard is the “hard” regime?

◮ Can you do better than exp(k)?

22 / 27

slide-127
SLIDE 127

Hard Regime

To summarize: Statistically possible when k ≪ n

◮ Runtime exp(k)

Poly-time solvable when k ≪ √n Believed “hard” when √n ≪ k ≪ n

◮ Reduction from planted clique [BR’13, WBS’16, BBH’18, BB’19] ◮ Sum-of-squares lower bounds [MW’15, HKP+’17]

Question: exactly how hard is the “hard” regime?

◮ Can you do better than exp(k)? ◮ Reduction from planted clique doesn’t rule out

quasipolynomial time nO(log n)

22 / 27

slide-128
SLIDE 128

Hard Regime

To summarize: Statistically possible when k ≪ n

◮ Runtime exp(k)

Poly-time solvable when k ≪ √n Believed “hard” when √n ≪ k ≪ n

◮ Reduction from planted clique [BR’13, WBS’16, BBH’18, BB’19] ◮ Sum-of-squares lower bounds [MW’15, HKP+’17]

Question: exactly how hard is the “hard” regime?

◮ Can you do better than exp(k)? Yes: exp(k2/n) ◮ Reduction from planted clique doesn’t rule out

quasipolynomial time nO(log n)

22 / 27

slide-129
SLIDE 129

Low-Degree Prediction

Hypothesis testing between:

◮ P : Y = λxx⊤ + W with x drawn from k-sparse prior ◮ Q : Y = W

23 / 27

slide-130
SLIDE 130

Low-Degree Prediction

Hypothesis testing between:

◮ P : Y = λxx⊤ + W with x drawn from k-sparse prior ◮ Q : Y = W

Theorem (Ding, Kunisky, W., Bandeira ’19)

Suppose λ = Θ(1).

◮ If λ < 1 and D ≪ k2/n then L≤D = O(1) (“hard”) ◮ If λ > 1 or D ≫ k2/n then L≤D = ω(1) (“easy”)

23 / 27

slide-131
SLIDE 131

Low-Degree Prediction

Hypothesis testing between:

◮ P : Y = λxx⊤ + W with x drawn from k-sparse prior ◮ Q : Y = W

Theorem (Ding, Kunisky, W., Bandeira ’19)

Suppose λ = Θ(1).

◮ If λ < 1 and D ≪ k2/n then L≤D = O(1) (“hard”) ◮ If λ > 1 or D ≫ k2/n then L≤D = ω(1) (“easy”)

So degree-D polynomials can distinguish iff λ > 1 or D ≫ k2/n

23 / 27

slide-132
SLIDE 132

Low-Degree Prediction

Hypothesis testing between:

◮ P : Y = λxx⊤ + W with x drawn from k-sparse prior ◮ Q : Y = W

Theorem (Ding, Kunisky, W., Bandeira ’19)

Suppose λ = Θ(1).

◮ If λ < 1 and D ≪ k2/n then L≤D = O(1) (“hard”) ◮ If λ > 1 or D ≫ k2/n then L≤D = ω(1) (“easy”)

So degree-D polynomials can distinguish iff λ > 1 or D ≫ k2/n. Suggests an algorithm of runtime n^(k2/n) ≈ exp(k2/n) (and no better)

23 / 27

slide-133
SLIDE 133

Low-Degree Prediction

Hypothesis testing between:

◮ P : Y = λxx⊤ + W with x drawn from k-sparse prior ◮ Q : Y = W

Theorem (Ding, Kunisky, W., Bandeira ’19)

Suppose λ = Θ(1).

◮ If λ < 1 and D ≪ k2/n then L≤D = O(1) (“hard”) ◮ If λ > 1 or D ≫ k2/n then L≤D = ω(1) (“easy”)

So degree-D polynomials can distinguish iff λ > 1 or D ≫ k2/n. Suggests an algorithm of runtime n^(k2/n) ≈ exp(k2/n) (and no better)

◮ Subexponential time: exp(nδ) with δ ∈ (0, 1)

23 / 27

slide-134
SLIDE 134

Low-Degree Prediction

Hypothesis testing between:

◮ P : Y = λxx⊤ + W with x drawn from k-sparse prior ◮ Q : Y = W

Theorem (Ding, Kunisky, W., Bandeira ’19)

Suppose λ = Θ(1).

◮ If λ < 1 and D ≪ k2/n then L≤D = O(1) (“hard”) ◮ If λ > 1 or D ≫ k2/n then L≤D = ω(1) (“easy”)

So degree-D polynomials can distinguish iff λ > 1 or D ≫ k2/n. Suggests an algorithm of runtime n^(k2/n) ≈ exp(k2/n) (and no better)

◮ Subexponential time: exp(nδ) with δ ∈ (0, 1)

And indeed we will find such an algorithm...

23 / 27

slide-135
SLIDE 135

The Algorithm

For now, consider the detection problem (P vs Q)

24 / 27

slide-136
SLIDE 136

The Algorithm

For now, consider the detection problem (P vs Q) Choose a parameter 1 ≤ ℓ ≤ k

24 / 27

slide-137
SLIDE 137

The Algorithm

For now, consider the detection problem (P vs Q). Choose a parameter 1 ≤ ℓ ≤ k. Let Sℓ := {v ∈ {0, ±1}^n : ‖v‖₀ = ℓ}

24 / 27

slide-138
SLIDE 138

The Algorithm

For now, consider the detection problem (P vs Q). Choose a parameter 1 ≤ ℓ ≤ k. Let Sℓ := {v ∈ {0, ±1}^n : ‖v‖₀ = ℓ}

Let T := max_{v∈Sℓ} v⊤Yv

24 / 27

slide-139
SLIDE 139

The Algorithm

For now, consider the detection problem (P vs Q). Choose a parameter 1 ≤ ℓ ≤ k. Let Sℓ := {v ∈ {0, ±1}^n : ‖v‖₀ = ℓ}

Let T := max_{v∈Sℓ} v⊤Yv

Algorithm: compute T and threshold it (large ⇒ P)

24 / 27

slide-140
SLIDE 140

The Algorithm

For now, consider the detection problem (P vs Q). Choose a parameter 1 ≤ ℓ ≤ k. Let Sℓ := {v ∈ {0, ±1}^n : ‖v‖₀ = ℓ}

Let T := max_{v∈Sℓ} v⊤Yv

Algorithm: compute T and threshold it (large ⇒ P)

◮ ℓ = k ⇒ exhaustive search (MLE) ◮ ℓ = 1 ⇒ diagonal thresholding max_i Yii

24 / 27

slide-141
SLIDE 141

The Algorithm

For now, consider the detection problem (P vs Q). Choose a parameter 1 ≤ ℓ ≤ k. Let Sℓ := {v ∈ {0, ±1}^n : ‖v‖₀ = ℓ}

Let T := max_{v∈Sℓ} v⊤Yv

Algorithm: compute T and threshold it (large ⇒ P)

◮ ℓ = k ⇒ exhaustive search (MLE) ◮ ℓ = 1 ⇒ diagonal thresholding max_i Yii

Runtime: (n choose ℓ) ≈ n^ℓ ≈ exp(ℓ)

24 / 27
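A brute-force sketch of the test statistic T (only practical for very small ℓ, matching the (n choose ℓ) runtime above up to the 2^ℓ sign enumeration; the function name is mine):

```python
import numpy as np
from itertools import combinations

def sparse_max_quadratic(Y, ell):
    """T = max of v^T Y v over ell-sparse vectors v with entries in {0, +1, -1},
    by brute force over supports and sign patterns; runtime ~ (n choose ell) * 2^ell."""
    n = Y.shape[0]
    best = -np.inf
    for support in combinations(range(n), ell):
        Ys = Y[np.ix_(support, support)]
        for bits in range(2 ** ell):
            s = np.array([1.0 if (bits >> i) & 1 else -1.0 for i in range(ell)])
            best = max(best, float(s @ Ys @ s))
    return best
```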

slide-142
SLIDE 142

Analysis of the Algorithm

Recall: algorithm thresholds T := max_{v∈Sℓ} v⊤Yv

25 / 27

slide-143
SLIDE 143

Analysis of the Algorithm

Recall: algorithm thresholds T := max_{v∈Sℓ} v⊤Yv

Analysis:

◮ Under P, Y = λxx⊤ + W , show T is large by considering a

‘good’ v (contained in x)

25 / 27

slide-144
SLIDE 144

Analysis of the Algorithm

Recall: algorithm thresholds T := max_{v∈Sℓ} v⊤Yv

Analysis:

◮ Under P, Y = λxx⊤ + W , show T is large by considering a

‘good’ v (contained in x)

◮ Under Q, Y = W , show T is small by Chernoff bound +

union bound over Sℓ

25 / 27

slide-145
SLIDE 145

Analysis of the Algorithm

Recall: algorithm thresholds T := max_{v∈Sℓ} v⊤Yv

Analysis:

◮ Under P, Y = λxx⊤ + W , show T is large by considering a

‘good’ v (contained in x)

◮ Under Q, Y = W , show T is small by Chernoff bound +

union bound over Sℓ Theorem (Ding, Kunisky, W., Bandeira ’19): algorithm succeeds if ℓ ≫ k2/n

25 / 27

slide-146
SLIDE 146

Analysis of the Algorithm

Recall: algorithm thresholds T := max_{v∈Sℓ} v⊤Yv

Analysis:

◮ Under P, Y = λxx⊤ + W , show T is large by considering a

‘good’ v (contained in x)

◮ Under Q, Y = W , show T is small by Chernoff bound +

union bound over Sℓ Theorem (Ding, Kunisky, W., Bandeira ’19): algorithm succeeds if ℓ ≫ k2/n For any given k, choose ℓ ≈ k2/n, get runtime exp(k2/n)

25 / 27

slide-147
SLIDE 147

From Detection to Recovery

Algorithm for recovering x from Y = λxx⊤ + W :

26 / 27

slide-148
SLIDE 148

From Detection to Recovery

Algorithm for recovering x from Y = λxx⊤ + W :

  • 1. Compute initial guess: u = argmax_{v∈Sℓ} v⊤Yv

26 / 27

slide-149
SLIDE 149

From Detection to Recovery

Algorithm for recovering x from Y = λxx⊤ + W :

  • 1. Compute initial guess: u = argmax_{v∈Sℓ} v⊤Yv

But u is too sparse...

26 / 27

slide-150
SLIDE 150

From Detection to Recovery

Algorithm for recovering x from Y = λxx⊤ + W :

  • 1. Compute initial guess: u = argmax_{v∈Sℓ} v⊤Yv

But u is too sparse...

  • 2. Let w = Yu

26 / 27

slide-151
SLIDE 151

From Detection to Recovery

Algorithm for recovering x from Y = λxx⊤ + W :

  • 1. Compute initial guess: u = argmax_{v∈Sℓ} v⊤Yv

But u is too sparse...

  • 2. Let w = Yu
  • 3. Construct x̂ ∈ {0, ±1/√k}^n by thresholding entries of w

26 / 27

slide-152
SLIDE 152

From Detection to Recovery

Algorithm for recovering x from Y = λxx⊤ + W :

  • 1. Compute initial guess: u = argmax_{v∈Sℓ} v⊤Yv

But u is too sparse...

  • 2. Let w = Yu
  • 3. Construct x̂ ∈ {0, ±1/√k}^n by thresholding entries of w

Theorem (Ding, Kunisky, W., Bandeira ’19): x̂ = x with high probability, provided ℓ ≫ k2/n (same as detection)

26 / 27

slide-153
SLIDE 153

From Detection to Recovery

Algorithm for recovering x from Y = λxx⊤ + W :

  • 1. Compute initial guess: u = argmax_{v∈Sℓ} v⊤Yv

But u is too sparse...

  • 2. Let w = Yu
  • 3. Construct x̂ ∈ {0, ±1/√k}^n by thresholding entries of w

Theorem (Ding, Kunisky, W., Bandeira ’19): x̂ = x with high probability, provided ℓ ≫ k2/n (same as detection)

Technically, need independent copies of Y for steps 1 & 2

◮ Y + W ′ and Y − W ′ where W ′ is independent copy of W

26 / 27
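Steps 2-3 can be sketched directly; this simplified version (my own, omitting the independent-copies trick with Y + W′ and Y − W′ mentioned above) thresholds w = Yu to the k largest-magnitude coordinates:

```python
import numpy as np

def recover_from_initial_guess(Y, u, k):
    """Steps 2-3 from the slide: w = Y u, then keep the k largest-magnitude
    entries of w, setting them to +/- 1/sqrt(k) according to their signs.
    (The paper uses independent copies of Y for the two steps; omitted here.)"""
    w = Y @ u
    support = np.argsort(np.abs(w))[-k:]
    x_hat = np.zeros_like(w)
    x_hat[support] = np.sign(w[support]) / np.sqrt(k)
    return x_hat
```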

slide-154
SLIDE 154

Summary

◮ Continuum of subexponential-time algorithms for sparse PCA

27 / 27

slide-155
SLIDE 155

Summary

◮ Continuum of subexponential-time algorithms for sparse PCA ◮ Smooth interpolation between diagonal thresholding and

exhaustive search

27 / 27

slide-156
SLIDE 156

Summary

◮ Continuum of subexponential-time algorithms for sparse PCA ◮ Smooth interpolation between diagonal thresholding and

exhaustive search

◮ Smooth tradeoff between sparsity and runtime: exp(k2/n)

27 / 27

slide-157
SLIDE 157

Summary

◮ Continuum of subexponential-time algorithms for sparse PCA ◮ Smooth interpolation between diagonal thresholding and

exhaustive search

◮ Smooth tradeoff between sparsity and runtime: exp(k2/n) ◮ Extensions:

27 / 27

slide-158
SLIDE 158

Summary

◮ Continuum of subexponential-time algorithms for sparse PCA ◮ Smooth interpolation between diagonal thresholding and

exhaustive search

◮ Smooth tradeoff between sparsity and runtime: exp(k2/n) ◮ Extensions:

◮ Allow λ ≪ 1; runtime exp(k2/(λ2n)) 27 / 27

slide-159
SLIDE 159

Summary

◮ Continuum of subexponential-time algorithms for sparse PCA ◮ Smooth interpolation between diagonal thresholding and

exhaustive search

◮ Smooth tradeoff between sparsity and runtime: exp(k2/n) ◮ Extensions:

◮ Allow λ ≪ 1; runtime exp(k2/(λ2n)) ◮ Spiked Wishart model 27 / 27

slide-160
SLIDE 160

Summary

◮ Continuum of subexponential-time algorithms for sparse PCA ◮ Smooth interpolation between diagonal thresholding and

exhaustive search

◮ Smooth tradeoff between sparsity and runtime: exp(k2/n) ◮ Extensions:

◮ Allow λ ≪ 1; runtime exp(k2/(λ2n)) ◮ Spiked Wishart model ◮ More general assumptions on x 27 / 27

slide-161
SLIDE 161

Summary

◮ Continuum of subexponential-time algorithms for sparse PCA ◮ Smooth interpolation between diagonal thresholding and

exhaustive search

◮ Smooth tradeoff between sparsity and runtime: exp(k2/n) ◮ Extensions:

◮ Allow λ ≪ 1; runtime exp(k2/(λ2n)) ◮ Spiked Wishart model ◮ More general assumptions on x

◮ Optimal: for a given k, the low-degree likelihood ratio

suggests that no better runtime is possible

27 / 27

slide-162
SLIDE 162

Summary

◮ Continuum of subexponential-time algorithms for sparse PCA ◮ Smooth interpolation between diagonal thresholding and

exhaustive search

◮ Smooth tradeoff between sparsity and runtime: exp(k2/n) ◮ Extensions:

◮ Allow λ ≪ 1; runtime exp(k2/(λ2n)) ◮ Spiked Wishart model ◮ More general assumptions on x

◮ Optimal: for a given k, the low-degree likelihood ratio

suggests that no better runtime is possible Thanks!

27 / 27