SLIDE 1 High-dimensional statistics: Some progress and challenges ahead
Martin Wainwright
UC Berkeley, Departments of Statistics and EECS
University College London, Master Class: Lecture 2
Joint work with: Alekh Agarwal, Arash Amini, Po-Ling Loh, Sahand Negahban, Garvesh Raskutti, Pradeep Ravikumar, Bin Yu
SLIDES 2-3 High-level overview
Last lecture: least-squares loss and ℓ1-regularization. The big picture: lots of other estimators have the same basic form:
$$\widehat{\theta} \in \arg\min_{\theta \in \Omega} \Big\{ \underbrace{\mathcal{L}(\theta; Z_1^n)}_{\text{Loss function}} + \lambda_n \underbrace{\mathcal{R}(\theta)}_{\text{Regularizer}} \Big\}$$
Past years have witnessed an explosion of results (compressed sensing, covariance estimation, block-sparsity, graphical models, matrix completion...). Question: Is there a common set of underlying principles?
SLIDE 4 Last lecture: Sparse linear regression
[Figure: observation model y = Xθ* + w, with design matrix X ∈ R^{n×p}, support S and complement S^c]
Set-up: noisy observations y = Xθ* + w with sparse θ*.
Estimator: Lasso program
$$\widehat{\theta} \in \arg\min_{\theta} \Big\{ \frac{1}{n} \sum_{i=1}^n (y_i - x_i^T \theta)^2 + \lambda_n \sum_{j=1}^p |\theta_j| \Big\}$$
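To make the Lasso program concrete, here is a minimal proximal-gradient (ISTA) sketch in Python/NumPy. The problem sizes, noise level, and the specific choice of λ_n are illustrative, not taken from the slides:

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise soft-thresholding: prox operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/n)||y - X theta||_2^2 + lam * ||theta||_1 via ISTA."""
    n, p = X.shape
    theta = np.zeros(p)
    L = 2.0 * np.linalg.norm(X, 2) ** 2 / n     # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ theta - y) / n
        theta = soft_threshold(theta - grad / L, lam / L)
    return theta

# Toy instance: s-sparse theta* in p dimensions, n noisy observations.
rng = np.random.default_rng(0)
n, p, s, sigma = 200, 500, 10, 0.1
X = rng.standard_normal((n, p))
theta_star = np.zeros(p); theta_star[:s] = 1.0
y = X @ theta_star + sigma * rng.standard_normal(n)
theta_hat = lasso_ista(X, y, lam=2 * sigma * np.sqrt(np.log(p) / n))
print("estimation error:", np.linalg.norm(theta_hat - theta_star))
```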
SLIDES 5-7 Block-structured extension
[Figure: observation model Y = XΘ* + W, with X ∈ R^{n×p}, Θ* ∈ R^{p×r}, non-zero rows S and zero rows S^c]
Signal Θ* is a p × r matrix, partitioned into non-zero rows S and zero rows S^c.
Various applications: multiple-view imaging, gene array prediction, graphical model fitting.
Row-wise ℓ1/ℓ2-norm:
$$|\!|\!|\Theta|\!|\!|_{1,2} = \sum_{j=1}^p \|\Theta_j\|_2$$
More complicated group structure (Obozinski et al., 2009):
$$|\!|\!|\Theta^*|\!|\!|_{\mathcal{G}} = \sum_{g \in \mathcal{G}} \|\Theta_g\|_2$$
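A small NumPy sketch of these two norms, with arbitrary toy dimensions and a made-up grouping for illustration:

```python
import numpy as np

def l1_l2_norm(Theta):
    """Row-wise l1/l2 norm: sum over rows j of ||Theta_j||_2."""
    return np.linalg.norm(Theta, axis=1).sum()

def group_norm(theta, groups):
    """More general group norm: sum of l2 norms over index groups."""
    return sum(np.linalg.norm(theta[g]) for g in groups)

rng = np.random.default_rng(1)
Theta = rng.standard_normal((6, 3))
print(l1_l2_norm(Theta))                                              # |||Theta|||_{1,2}
print(group_norm(Theta.ravel(), [np.arange(0, 9), np.arange(9, 18)]))  # |||.|||_G
```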
SLIDE 8 Example: Low-rank matrix approximation
[Figure: factorization Θ* = U D V^T, with U ∈ R^{p1×r}, D ∈ R^{r×r}, V^T ∈ R^{r×p2}]
Set-up: matrix Θ* ∈ R^{p1×p2} with rank r ≪ min{p1, p2}.
Estimator:
$$\widehat{\Theta} \in \arg\min_{\Theta} \Big\{ \frac{1}{n}\sum_{i=1}^n \big(y_i - \langle\!\langle X_i, \Theta \rangle\!\rangle\big)^2 + \lambda_n \sum_{j=1}^{\min\{p_1, p_2\}} \sigma_j(\Theta) \Big\}$$
Some past work: Fazel, 2001; Srebro et al., 2004; Recht, Fazel & Parrilo, 2007; Bach, 2008; Candès & Recht, 2008; Keshavan et al., 2009; Rohde & Tsybakov, 2010; Recht, 2009; Negahban & W., 2010...
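The regularizer here is the nuclear norm, the sum of singular values. A short NumPy sketch of the norm and of its prox operator (singular-value soft-thresholding), which is the key subroutine when solving such programs by proximal methods; the test matrix is arbitrary:

```python
import numpy as np

def nuclear_norm(Theta):
    """|||Theta|||_1: sum of the singular values of Theta."""
    return np.linalg.svd(Theta, compute_uv=False).sum()

def svt(Theta, tau):
    """Singular-value soft-thresholding: prox operator of tau * |||.|||_1."""
    U, s, Vt = np.linalg.svd(Theta, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 4))
print(nuclear_norm(A), nuclear_norm(svt(A, 0.5)))   # thresholding shrinks the norm
```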
SLIDE 9 Application: Collaborative filtering
[Figure: partially observed ratings matrix; entries marked ∗ are unobserved]
Universe of p1 individuals and p2 films. Observe n ≪ p1 p2 ratings. (e.g., Srebro, Alon & Jaakkola, 2004)
SLIDES 10-11 Security and robustness issues
[Figure: a "spiritual guide" item recommended alongside a sex manual]
Break-down of Amazon recommendation system, 2002.
SLIDES 12-13 Matrix decomposition: Low-rank plus sparse
Matrix Y ∈ R^{p1×p2} can be (approximately) decomposed into a sum:
$$Y \;\approx\; \underbrace{U D V^T}_{\Theta^*} + \;\Gamma^*,$$
where Θ* = U D V^T is low-rank (U ∈ R^{p1×r}, D ∈ R^{r×r}, V^T ∈ R^{r×p2}) and Γ* is sparse.
Exact decomposition: initially studied by Chandrasekaran, Sanghavi, Parrilo & Willsky, 2009. Subsequent work: Candès et al., 2010; Xu et al., 2010; Hsu et al., 2010; Agarwal et al., 2011.
Various applications:
◮ robust collaborative filtering
◮ robust PCA
◮ graphical model selection with hidden variables
SLIDES 14-15 Gauss-Markov models with hidden variables
[Figure: star graph with hidden variable Z connected to observed X1, X2, X3, X4]
Problems with hidden variables: conditioned on hidden Z, the vector X = (X1, X2, X3, X4) is Gauss-Markov.
Inverse covariance of X satisfies a {sparse, low-rank} decomposition:
$$\begin{bmatrix} 1-\mu & -\mu & -\mu & -\mu \\ -\mu & 1-\mu & -\mu & -\mu \\ -\mu & -\mu & 1-\mu & -\mu \\ -\mu & -\mu & -\mu & 1-\mu \end{bmatrix} \;=\; I_{4\times 4} - \mu\, \mathbf{1}\mathbf{1}^T.$$
(Chandrasekaran, Parrilo & Willsky, 2010)
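A quick numeric check of this structure: marginalizing out Z via the Schur complement of a star-graph joint precision matrix yields exactly the diagonal-minus-rank-one form above. The edge weight a and the Z-precision k are invented for illustration:

```python
import numpy as np

# Star graph: hidden Z attached to X1..X4; joint precision [[I, a*1], [a*1^T, k]].
a, k = 0.4, 1.0
K_XX, K_XZ, K_ZZ = np.eye(4), a * np.ones((4, 1)), np.array([[k]])
# Marginal precision of X = Schur complement of the K_ZZ block:
prec_X = K_XX - K_XZ @ np.linalg.inv(K_ZZ) @ K_XZ.T
mu = a ** 2 / k
assert np.allclose(prec_X, np.eye(4) - mu * np.ones((4, 4)))
print(prec_X)   # dense, but = sparse part (I) plus rank-one part (-mu * 11^T)
```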
SLIDE 16 Example: Sparse principal components analysis
[Figure: decomposition Σ = ZZ^T + D]
Set-up: covariance matrix Σ = ZZ^T + D, where the leading eigenspace Z has sparse columns.
Estimator (SDP relaxation):
$$\widehat{\Theta} \in \arg\min_{\Theta \succeq 0,\; \mathrm{tr}(\Theta) = 1} \Big\{ -\langle\!\langle \Theta, \widehat{\Sigma} \rangle\!\rangle + \lambda_n \sum_{j,k} |\Theta_{jk}| \Big\}$$
Some past work: Johnstone, 2001; Jolliffe et al., 2003; Johnstone & Lu, 2004; Zou et al., 2004; d'Aspremont et al., 2007; Johnstone & Paul, 2008; Amini & Wainwright, 2008
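The SDP is one route; a lighter-weight alternative cited on the slide is diagonal thresholding (Johnstone & Lu, 2004). A NumPy sketch on a toy spiked-covariance model, with all sizes and the spike strength invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
p, s, n_samp = 100, 5, 50
z = np.zeros(p); z[:s] = 1.0 / np.sqrt(s)          # sparse leading eigenvector
Sigma = 4.0 * np.outer(z, z) + np.eye(p)           # spiked covariance ZZ^T + D
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n_samp)
Sigma_hat = X.T @ X / n_samp

# Classical PCA: leading eigenvector of the sample covariance (dense, noisy when p > n).
v_dense = np.linalg.eigh(Sigma_hat)[1][:, -1]

# Diagonal thresholding: keep the s highest-variance coordinates,
# then do PCA on the corresponding submatrix.
idx = np.argsort(np.diag(Sigma_hat))[-s:]
sub = Sigma_hat[np.ix_(idx, idx)]
v_sparse = np.zeros(p); v_sparse[idx] = np.linalg.eigh(sub)[1][:, -1]

print("dense PCA overlap: ", abs(v_dense @ z))
print("sparse PCA overlap:", abs(v_sparse @ z))
```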
SLIDES 17-19 Motivation and roadmap
Many results on different high-dimensional models, all based on estimators of the type:
$$\widehat{\theta} \in \arg\min_{\theta \in \Omega} \Big\{ \underbrace{\mathcal{L}(\theta; Z_1^n)}_{\text{Loss function}} + \lambda_n \underbrace{\mathcal{R}(\theta)}_{\text{Regularizer}} \Big\}$$
Question: Is there a common set of underlying principles?
Answer: Yes, two essential ingredients:
(I) Restricted strong convexity of the loss function
(II) Decomposability of the regularizer
SLIDES 20-21 (I) Role of curvature
1. Curvature controls difficulty of estimation:
[Figure: loss difference δL versus error ∆; (a) high curvature: easy to estimate, (b) low curvature: harder]
2. Captured by a lower bound on the Taylor-series error T_L(∆; θ*):
$$\mathcal{L}(\theta^* + \Delta) - \mathcal{L}(\theta^*) - \langle \nabla \mathcal{L}(\theta^*), \Delta \rangle \;\ge\; \gamma^2 \|\Delta\|^2 \qquad \text{for all } \Delta \text{ around } \theta^*.$$
SLIDE 22 High dimensions: no strong convexity!
[Figure: loss surface with a flat (zero-curvature) direction]
When p > n, the Hessian ∇²L(θ; Z_1^n) has a nullspace of dimension at least p − n.
SLIDES 23-25 Restricted strong convexity
Definition. The loss function L_n satisfies restricted strong convexity (RSC) with respect to the regularizer R if
$$\underbrace{\mathcal{L}_n(\theta^* + \Delta) - \big\{ \mathcal{L}_n(\theta^*) + \langle \nabla \mathcal{L}_n(\theta^*), \Delta \rangle \big\}}_{\text{Taylor error } T_{\mathcal{L}}(\Delta;\, \theta^*)} \;\ge\; \underbrace{\gamma_\ell^2\, \|\Delta\|_e^2}_{\text{Lower curvature}} \;-\; \tau_\ell^2\, \mathcal{R}^2(\Delta)$$
for all ∆ in a suitable neighborhood of θ*.
Ordinary strong convexity:
◮ special case with tolerance τ_ℓ = 0
◮ does not hold for most loss functions when p > n
RSC enforces a lower bound on curvature, but only when R²(∆) ≪ ‖∆‖²_e.
SLIDES 26-29 Least-squares: RSC ≡ restricted eigenvalue
For the least-squares loss $\mathcal{L}(\theta) = \frac{1}{2n}\|y - X\theta\|_2^2$:
$$T_{\mathcal{L}}(\Delta; \theta^*) = \mathcal{L}_n(\theta^* + \Delta) - \mathcal{L}_n(\theta^*) - \langle \nabla \mathcal{L}_n(\theta^*), \Delta \rangle = \frac{1}{2n}\|X\Delta\|_2^2.$$
Restricted eigenvalue (RE) condition (van de Geer, 2007; Bickel et al., 2009):
$$\frac{\|X\Delta\|_2^2}{2n} \;\ge\; \gamma^2\, \|\Delta\|_2^2 \qquad \text{for all } \Delta \text{ with } \|\Delta_{S^c}\|_1 \le \|\Delta_S\|_1,$$
or, in relaxed form, for all ∆ ∈ R^p with ‖∆‖₁ ≤ 2√s ‖∆‖₂.
Holds with high probability for various sub-Gaussian designs when n ≳ s log(p/s); fairly strong dependency between covariates is possible.
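A numeric illustration of why RE-type conditions are plausible even though the full Hessian is degenerate; the sampling of sparse directions is a crude stand-in for a supremum over the cone, and all sizes are invented:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, s = 100, 300, 5
X = rng.standard_normal((n, p))

# The Hessian X^T X / n has a nullspace of dimension >= p - n, yet over sparse
# directions (which satisfy ||D||_1 <= sqrt(s) ||D||_2, inside the RE cone) the
# curvature ||X D||_2^2 / (2n ||D||_2^2) stays bounded away from zero.
ratios = []
for _ in range(2000):
    delta = np.zeros(p)
    support = rng.choice(p, size=s, replace=False)
    delta[support] = rng.standard_normal(s)
    ratios.append(np.linalg.norm(X @ delta) ** 2 / (2 * n * np.linalg.norm(delta) ** 2))
print("min curvature over sampled s-sparse directions:", min(ratios))
```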
SLIDES 30-32 Restricted strong convexity for GLMs
Generalized linear model linking covariates x ∈ R^p to output y ∈ R:
$$\mathbb{P}(y \mid x, \theta^*) \propto \exp\big\{ y\, \langle \theta^*, x \rangle - \Phi(\langle \theta^*, x \rangle) \big\}$$
Taylor series expansion involves the random Hessian
$$H(\theta) = \frac{1}{n} \sum_{i=1}^n \Phi''\big(\langle \theta, X_i \rangle\big)\, X_i X_i^T \;\in\; \mathbb{R}^{p \times p}.$$
Proposition (Negahban, W., Ravikumar & Yu, 2010). For zero-mean sub-Gaussian covariates X_i with covariance Σ and any GLM,
$$T_{\mathcal{L}}(\Delta; \theta^*) \;\ge\; c_3\, \|\Sigma^{1/2}\Delta\|_2 \Big\{ \|\Sigma^{1/2}\Delta\|_2 - c_4\, \kappa(\Sigma) \Big(\frac{\log p}{n}\Big)^{1/2} \|\Delta\|_1 \Big\}$$
with probability at least 1 − c₁ exp(−c₂ n). Here κ(Σ) = max_j Σ_jj.
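For a concrete GLM, here is a sketch of the Taylor error for logistic regression (Φ(t) = log(1 + e^t)); the data-generating choices are arbitrary, and by convexity the printed value must be nonnegative:

```python
import numpy as np

def logistic_loss(theta, X, y):
    """(1/n) sum_i [ log(1 + exp(<x_i, theta>)) - y_i <x_i, theta> ]."""
    z = X @ theta
    return np.mean(np.logaddexp(0.0, z) - y * z)

def taylor_error(theta_star, delta, X, y):
    """T_L(delta; theta*) = L(theta* + delta) - L(theta*) - <grad L(theta*), delta>."""
    z = X @ theta_star
    grad = X.T @ (1.0 / (1.0 + np.exp(-z)) - y) / len(y)
    return (logistic_loss(theta_star + delta, X, y)
            - logistic_loss(theta_star, X, y) - grad @ delta)

rng = np.random.default_rng(4)
n, p = 200, 50
X = rng.standard_normal((n, p))
theta_star = rng.standard_normal(p) / np.sqrt(p)
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ theta_star))).astype(float)
delta = rng.standard_normal(p) / np.sqrt(p)
print(taylor_error(theta_star, delta, X, y))   # nonnegative, by convexity
```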
SLIDES 33-35 (II) Decomposable regularizers
[Figure: subspace pair (M, M^⊥)]
Subspace M: approximation to model parameters. Complementary subspace M^⊥: undesirable deviations.
The regularizer R decomposes across (M, M^⊥) if
R(α + β) = R(α) + R(β) for all α ∈ M and β ∈ M^⊥.
Includes:
- (weighted) ℓ1-norms
- nuclear norm
- group-sparse norms
- sums of decomposable norms
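For the ℓ1 norm and a support-based subspace pair, decomposability is just the fact that the ℓ1 norm adds over disjoint supports; a one-line check with an arbitrary support set:

```python
import numpy as np

rng = np.random.default_rng(7)
p, S = 10, np.array([0, 1, 2])
alpha = np.zeros(p); alpha[S] = rng.standard_normal(S.size)   # alpha in M
beta = rng.standard_normal(p); beta[S] = 0.0                   # beta in M-perp
# l1 decomposes across this subspace pair: R(alpha + beta) = R(alpha) + R(beta)
assert np.isclose(np.abs(alpha + beta).sum(),
                  np.abs(alpha).sum() + np.abs(beta).sum())
```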
SLIDES 36-37 Main theorem
Estimator:
$$\widehat{\theta} \in \arg\min_{\theta \in \mathbb{R}^p} \big\{ \mathcal{L}(\theta; Z_1^n) + \lambda_n \mathcal{R}(\theta) \big\}$$
where L satisfies RSC(γ, τ) w.r.t. the regularizer R.
Theorem (Negahban, Ravikumar, W. & Yu, 2012). Suppose that θ* ∈ M. For any regularization parameter λ_n ≥ 2R*(∇L(θ*; Z_1^n)), the estimate θ̂_{λn} satisfies
$$\|\widehat{\theta}_{\lambda_n} - \theta^*\|^2 \;\lesssim\; \frac{1}{\gamma^2(\mathcal{L})} \Big[ \lambda_n^2\, \Psi^2(\mathcal{M}) + \tau^2(\mathcal{L}) \Big]$$
Quantities that control rates:
- curvature in RSC: γ_ℓ
- tolerance in RSC: τ
- dual norm of regularizer: R*(v) := sup_{R(u) ≤ 1} ⟨v, u⟩
- optimal subspace constant: Ψ(M) = sup_{θ ∈ M \ {0}} R(θ)/‖θ‖
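For R = ℓ1 and M the subspace of vectors supported on a fixed set of size s, the dual norm R* is ℓ∞ and Ψ(M) = √s. A quick numeric sanity check of the subspace constant, with arbitrary sizes:

```python
import numpy as np

rng = np.random.default_rng(5)
p, s = 100, 9
ratios = []
for _ in range(10000):
    theta = np.zeros(p)
    theta[:s] = rng.standard_normal(s)           # theta in M: supported on first s coords
    ratios.append(np.abs(theta).sum() / np.linalg.norm(theta))
print(max(ratios), "<= sqrt(s) =", np.sqrt(s))   # Psi(M) = sqrt(s), attained at sign vectors
```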
SLIDE 38 Main theorem (oracle version)
Estimator:
$$\widehat{\theta} \in \arg\min_{\theta \in \mathbb{R}^p} \big\{ \mathcal{L}(\theta; Z_1^n) + \lambda_n \mathcal{R}(\theta) \big\}$$
Theorem (oracle version). For any regularization parameter λ_n ≥ 2R*(∇L(θ*; Z_1^n)), the estimate θ̂ satisfies
$$\|\widehat{\theta} - \theta^*\|^2 \;\lesssim\; \frac{(\lambda'_n)^2}{\gamma^2(\mathcal{L})}\, \Psi^2(\mathcal{M}) + \frac{\lambda'_n}{\gamma(\mathcal{L})}\, \mathcal{R}\big(\Pi_{\mathcal{M}^\perp}(\theta^*)\big),$$
where λ'_n = max{λ_n, τ(L)}.
Quantities that control rates:
- curvature in RSC: γ_ℓ
- tolerance in RSC: τ
- dual norm of regularizer: R*(v) := sup_{R(u) ≤ 1} ⟨v, u⟩
- optimal subspace constant: Ψ(M) = sup_{θ ∈ M \ {0}} R(θ)/‖θ‖
SLIDES 39-40 Example: Linear regression (exact sparsity)
Lasso program:
$$\min_{\theta \in \mathbb{R}^p} \Big\{ \frac{1}{2n}\|y - X\theta\|_2^2 + \lambda_n \|\theta\|_1 \Big\}$$
RSC corresponds to a lower bound on restricted eigenvalues of X^T X ∈ R^{p×p}; for an s-sparse vector, we have ‖θ‖₁ ≤ √s ‖θ‖₂.
Corollary. Suppose that the true parameter θ* is exactly s-sparse. Under RSC, and with λ_n ≥ 2‖X^T w/n‖_∞, any Lasso solution satisfies
$$\|\widehat{\theta} - \theta^*\|_2^2 \;\le\; \frac{4}{\gamma^2}\, s\, \lambda_n^2.$$
Some stochastic instances recover known results:
- Compressed sensing: X_ij ∼ N(0, 1) and bounded noise ‖w‖₂ ≤ σ√n.
- Deterministic design: X with bounded columns and w_i ∼ N(0, σ²).
In both cases ‖X^T w/n‖_∞ ≲ σ √(log p / n) w.h.p., so that
$$\|\widehat{\theta} - \theta^*\|_2^2 \;\le\; \frac{12\, \sigma^2}{\gamma^2}\, \frac{s \log p}{n}.$$
(e.g., Candès & Tao, 2007; Huang & Zhang, 2008; Meinshausen & Yu, 2008; Bickel et al., 2008)
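A small simulation of the s log p / n scaling, using scikit-learn's Lasso (whose objective matches the (1/2n) normalization above); all problem sizes are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(6)
p, s, sigma = 500, 10, 0.5
for n in [200, 400, 800, 1600]:
    X = rng.standard_normal((n, p))
    theta_star = np.zeros(p); theta_star[:s] = 1.0
    y = X @ theta_star + sigma * rng.standard_normal(n)
    lam = 2 * sigma * np.sqrt(np.log(p) / n)          # lambda_n ~ sigma sqrt(log p / n)
    theta_hat = Lasso(alpha=lam, fit_intercept=False, max_iter=5000).fit(X, y).coef_
    err2 = np.sum((theta_hat - theta_star) ** 2)
    print(n, err2, err2 * n / (s * np.log(p)))        # rescaled error: roughly constant
```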
SLIDE 41 Example: Linear regression (weak sparsity)
For some q ∈ [0, 1], say θ* belongs to the ℓq-"ball"
$$\mathbb{B}_q(R_q) := \Big\{ \theta \in \mathbb{R}^p : \sum_{j=1}^p |\theta_j|^q \le R_q \Big\}$$
Corollary. For θ* ∈ B_q(R_q), any Lasso solution satisfies (w.h.p.)
$$\|\widehat{\theta} - \theta^*\|_2^2 \;\lesssim\; R_q \Big( \frac{\sigma^2 \log p}{n} \Big)^{1 - q/2}.$$
This rate is known to be minimax optimal (Raskutti, W. & Yu, 2011).
SLIDE 42 Example: Group-structured regularizers
Many applications exhibit sparsity with more structure...
[Figure: coefficient vector partitioned into groups G1, G2, G3]
Divide the index set {1, 2, ..., p} into groups G = {G1, G2, ..., GT}. For parameters ν_t ∈ [1, ∞], define the block norm
$$\|\theta\|_{\nu, \mathcal{G}} := \sum_{t=1}^T \|\theta_{G_t}\|_{\nu_t}$$
Group/block Lasso program:
$$\min_{\theta \in \mathbb{R}^p} \Big\{ \frac{1}{2n}\|y - X\theta\|_2^2 + \lambda_n \|\theta\|_{\nu, \mathcal{G}} \Big\}$$
Different versions studied by various authors (Wright et al., 2005; Tropp et al., 2006; Yuan & Li, 2006; Baraniuk, 2008; Obozinski et al., 2008; Zhao et al., 2008; Bach et al., 2009; Lounici et al., 2009).
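A minimal proximal-gradient sketch for the ν = 2 case, assuming disjoint groups that cover all coordinates (so the prox of the block norm decomposes group by group); the example sizes are invented:

```python
import numpy as np

def block_soft_threshold(v, tau):
    """Prox of tau * ||.||_2 on a single group: shrink the whole block toward zero."""
    nrm = np.linalg.norm(v)
    return np.zeros_like(v) if nrm <= tau else (1.0 - tau / nrm) * v

def group_lasso(X, y, groups, lam, n_iter=500):
    """Minimize (1/2n)||y - X theta||_2^2 + lam * sum_t ||theta_{G_t}||_2."""
    n, p = X.shape
    theta = np.zeros(p)
    L = np.linalg.norm(X, 2) ** 2 / n          # Lipschitz constant of the gradient
    for _ in range(n_iter):
        z = theta - X.T @ (X @ theta - y) / (n * L)
        for g in groups:                       # prox decomposes across disjoint groups
            theta[g] = block_soft_threshold(z[g], lam / L)
    return theta

# Example: 10 disjoint groups of size 3, one active group.
rng = np.random.default_rng(10)
n, T, m = 100, 10, 3
groups = [np.arange(t * m, (t + 1) * m) for t in range(T)]
X = rng.standard_normal((n, T * m))
theta_star = np.zeros(T * m); theta_star[groups[0]] = 1.0
y = X @ theta_star + 0.1 * rng.standard_normal(n)
theta_hat = group_lasso(X, y, groups, lam=0.05)
print("active groups:", [t for t in range(T)
                         if np.linalg.norm(theta_hat[groups[t]]) > 1e-8])
```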
SLIDES 43-45 Convergence rates for general group Lasso
Corollary. Say Θ* is supported on s_G groups, and X satisfies RSC. Then for regularization parameter
$$\lambda_n \;\ge\; 2 \max_{t = 1, 2, \ldots, T} \Big\| \Big( \frac{X^T w}{n} \Big)_{G_t} \Big\|_{\nu_t^*}, \qquad \text{where } \frac{1}{\nu_t^*} = 1 - \frac{1}{\nu_t},$$
any solution θ̂_{λn} satisfies
$$\|\widehat{\theta}_{\lambda_n} - \theta^*\|_2 \;\lesssim\; \frac{\Psi_\nu(S_{\mathcal{G}})}{\gamma_\ell}\, \lambda_n, \qquad \text{where } \Psi_\nu(S_{\mathcal{G}}) = \sup_{\theta \in \mathcal{M}(S_{\mathcal{G}}) \setminus \{0\}} \frac{\|\theta\|_{\nu, \mathcal{G}}}{\|\theta\|_2}.$$
Some special cases, with m ≡ max. group size:
1. ℓ1/ℓ2 regularization (group norm with ν = 2):
$$\|\widehat{\theta} - \theta^*\|_2^2 = O\Big( \frac{s_{\mathcal{G}}\, m}{n} + \frac{s_{\mathcal{G}} \log T}{n} \Big)$$
2. ℓ1/ℓ∞ regularization (group norm with ν = ∞):
$$\|\widehat{\theta} - \theta^*\|_2^2 = O\Big( \frac{s_{\mathcal{G}}\, m^2}{n} + \frac{s_{\mathcal{G}} \log T}{n} \Big)$$
SLIDES 46-48 Example: Low-rank matrices and nuclear norm
Matrix Θ* ∈ R^{p1×p2} that is exactly (or approximately) low-rank; noisy/partial observations of the form
$$y_i = \langle\!\langle X_i, \Theta^* \rangle\!\rangle + w_i, \quad i = 1, \ldots, n, \qquad w_i \text{ i.i.d. noise.}$$
Estimate by solving the semi-definite program (SDP):
$$\widehat{\Theta} \in \arg\min_{\Theta} \Big\{ \frac{1}{n}\sum_{i=1}^n \big(y_i - \langle\!\langle X_i, \Theta \rangle\!\rangle\big)^2 + \lambda_n \underbrace{\sum_{j=1}^{\min\{p_1, p_2\}} \sigma_j(\Theta)}_{|\!|\!|\Theta|\!|\!|_1} \Big\}$$
Applications:
◮ matrix completion
◮ rank-reduced multivariate regression
◮ time-series modeling (vector autoregressions)
◮ ...
Some past work: Fazel, 2001; Srebro et al., 2004; Recht et al., 2007; Candès & Recht, 2008; Recht, 2009; Negahban & W., 2010; Rohde & Tsybakov, 2010.
SLIDES 49-50 Rates for (near) low-rank estimation
For a parameter q ∈ [0, 1], define the set of near low-rank matrices:
$$\mathbb{B}_q(R_q) = \Big\{ \Theta^* : \sum_{j=1}^{\min\{p_1, p_2\}} |\sigma_j(\Theta^*)|^q \le R_q \Big\}$$
Corollary (Negahban & W., 2011). Under the RSC condition, with regularization parameter
$$\lambda_n \ge 16\sigma \Big( \sqrt{\frac{p_1}{n}} + \sqrt{\frac{p_2}{n}} \Big),$$
we have w.h.p.
$$|\!|\!|\widehat{\Theta} - \Theta^*|\!|\!|_F^2 \;\le\; c_0\, \frac{R_q}{\gamma_\ell^2} \Big( \frac{\sigma^2 (p_1 + p_2)}{n} \Big)^{1 - \frac{q}{2}}$$
For a rank-r matrix M:
$$|\!|\!|M|\!|\!|_1 = \sum_{j=1}^r \sigma_j(M) \;\le\; \sqrt{r}\, \sqrt{\sum_j \sigma_j^2(M)} \;=\; \sqrt{r}\, |\!|\!|M|\!|\!|_F.$$
Solve the nuclear-norm regularized program with
$$\lambda_n \;\ge\; 2\, \Big|\!\Big|\!\Big| \frac{1}{n} \sum_{i=1}^n w_i X_i \Big|\!\Big|\!\Big|_2 \qquad \text{(operator norm)}.$$
SLIDE 51 Noisy matrix completion (unrescaled)
[Plot: MSE versus raw sample size, q = 0, for matrix dimensions d² ∈ {40², 60², 80², 100²}]
SLIDE 52 Noisy matrix completion (rescaled)
[Plot: MSE versus rescaled sample size, q = 0, same dimensions]
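A toy noisy matrix-completion experiment in the spirit of these plots, solved by proximal gradient with singular-value thresholding; the dimensions, observation probability, noise level, and λ are invented for illustration:

```python
import numpy as np

def svt(M, tau):
    """Singular-value soft-thresholding: prox of tau * |||.|||_1 (nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def complete(Y_obs, mask, lam, n_iter=300):
    """Proximal gradient on (1/2)||mask * (Theta - Y)||_F^2 + lam * |||Theta|||_1."""
    Theta = np.zeros_like(Y_obs)
    for _ in range(n_iter):
        Theta = svt(Theta - mask * (Theta - Y_obs), lam)   # gradient is 1-Lipschitz
    return Theta

rng = np.random.default_rng(8)
d, r = 60, 2
Theta_star = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))
mask = rng.random((d, d)) < 0.3                            # observe ~30% of entries
Y_obs = mask * (Theta_star + 0.1 * rng.standard_normal((d, d)))
Theta_hat = complete(Y_obs, mask, lam=0.1)
print("relative error:",
      np.linalg.norm(Theta_hat - Theta_star) / np.linalg.norm(Theta_star))
```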
SLIDES 53-54 Summary
Unified framework for high-dimensional M-estimators:
◮ decomposability of the regularizer R
◮ restricted strong convexity of the loss function
Actual rates determined by:
◮ noise measured in the dual function R*
◮ subspace constant Ψ in moving from R to the error norm ‖·‖
◮ restricted strong convexity constant
Looking ahead to tomorrow: from parametric to non-parametric problems, using kernel-based methods in high dimensions.
SLIDE 55 Some papers (www.eecs.berkeley.edu/wainwrig)
1. S. Negahban, P. Ravikumar, M. J. Wainwright, and B. Yu (2012). A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Statistical Science, December 2012.
2. S. Negahban and M. J. Wainwright (2011). Estimation rates of (near) low-rank matrices with noise and high-dimensional scaling. Annals of Statistics, 39(1):1069-1097.
3. S. Negahban and M. J. Wainwright (2012). Restricted strong convexity and (weighted) matrix completion: Optimal bounds with noise. Journal of Machine Learning Research, May 2012.
4. G. Raskutti, M. J. Wainwright, and B. Yu (2011). Minimax rates for linear regression over ℓq-balls. IEEE Transactions on Information Theory, 57(10):6976-6994.
SLIDE 56 Significance of decomposability
[Figure: (a) the set C for an exact model (a cone); (b) the set C for an approximate model (star-shaped)]
Lemma. Suppose that L is convex, and R is decomposable w.r.t. M. Then as long as λ_n ≥ 2R*(∇L(θ*; Z_1^n)), the error ∆̂ = θ̂_{λn} − θ* belongs to
$$\mathcal{C}(\mathcal{M}, \mathcal{M}^\perp; \theta^*) := \big\{ \Delta : \mathcal{R}(\Pi_{\mathcal{M}^\perp}(\Delta)) \le 3\, \mathcal{R}(\Pi_{\mathcal{M}}(\Delta)) + 4\, \mathcal{R}(\Pi_{\mathcal{M}^\perp}(\theta^*)) \big\}$$
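A numeric check of cone membership for the Lasso with an exactly sparse θ* (so the second term vanishes and the lemma reduces to ‖∆_{S^c}‖₁ ≤ 3‖∆_S‖₁); scikit-learn's Lasso matches the (1/2n)-normalized objective, and all sizes are invented:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(9)
n, p, s = 200, 500, 10
X = rng.standard_normal((n, p))
theta_star = np.zeros(p); theta_star[:s] = 1.0   # theta* in M: exactly sparse
w = 0.5 * rng.standard_normal(n)
y = X @ theta_star + w
# For R = l1, R*(grad L(theta*)) = ||X^T w / n||_inf; take lambda_n = twice that.
lam = 2 * np.max(np.abs(X.T @ w)) / n
delta = Lasso(alpha=lam, fit_intercept=False, max_iter=5000).fit(X, y).coef_ - theta_star
print(np.abs(delta[s:]).sum(), "<=", 3 * np.abs(delta[:s]).sum())   # cone check
```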