High-dimensional statistics: Some progress and challenges ahead


  1. High-dimensional statistics: Some progress and challenges ahead
Martin Wainwright, UC Berkeley, Departments of Statistics and EECS
University College London, Master Class: Lecture 2
Joint work with: Alekh Agarwal, Arash Amini, Po-Ling Loh, Sahand Negahban, Garvesh Raskutti, Pradeep Ravikumar, Bin Yu.

  2. High-level overview
Last lecture: least-squares loss and ℓ1-regularization.
The big picture: lots of other estimators with the same basic form:

$$\hat{\theta}_{\lambda_n} \in \arg\min_{\theta \in \Omega} \Big\{ \underbrace{\mathcal{L}(\theta; Z_1^n)}_{\text{Loss function}} + \lambda_n \underbrace{\mathcal{R}(\theta)}_{\text{Regularizer}} \Big\}.$$

  3. High-level overview
Last lecture: least-squares loss and ℓ1-regularization.
The big picture: lots of other estimators with the same basic form:

$$\hat{\theta}_{\lambda_n} \in \arg\min_{\theta \in \Omega} \Big\{ \underbrace{\mathcal{L}(\theta; Z_1^n)}_{\text{Loss function}} + \lambda_n \underbrace{\mathcal{R}(\theta)}_{\text{Regularizer}} \Big\}.$$

Past years have witnessed an explosion of results (compressed sensing, covariance estimation, block-sparsity, graphical models, matrix completion, ...).
Question: Is there a common set of underlying principles?

  4. Last lecture: Sparse linear regression
[Figure: the observation model y = Xθ* + w, with X of size n × p and θ* supported on S, zero on S^c.]
Set-up: noisy observations y = Xθ* + w with sparse θ*.
Estimator: Lasso program

$$\hat{\theta} \in \arg\min_{\theta} \Big\{ \frac{1}{n} \sum_{i=1}^n (y_i - x_i^T \theta)^2 + \lambda_n \sum_{j=1}^p |\theta_j| \Big\}.$$
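As a concrete illustration of the Lasso program above, here is a minimal NumPy sketch using proximal gradient descent (ISTA). The function names, problem sizes, step size, and regularization level are illustrative choices, not taken from the slides.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: the prox operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Proximal gradient for (1/n)||y - X theta||_2^2 + lam * ||theta||_1."""
    n, p = X.shape
    step = n / (2.0 * np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of grad
    theta = np.zeros(p)
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ theta - y) / n
        theta = soft_threshold(theta - step * grad, step * lam)
    return theta

# Noisy observations y = X theta* + w with a sparse theta* (illustrative sizes).
rng = np.random.default_rng(0)
n, p, s = 100, 200, 5
X = rng.standard_normal((n, p))
theta_star = np.zeros(p)
theta_star[:s] = 1.0
y = X @ theta_star + 0.1 * rng.standard_normal(n)
theta_hat = lasso_ista(X, y, lam=0.1)
print("estimation error:", np.linalg.norm(theta_hat - theta_star))
```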

  5. Block-structured extension
[Figure: the model Y = XΘ* + W, with X of size n × p and Y, W of size n × r.]
Signal Θ* is a p × r matrix, partitioned into non-zero rows S and zero rows S^c.
Various applications: multiple-view imaging, gene array prediction, graphical model fitting.

  6. Block-structured extension
[Figure: the model Y = XΘ* + W, as above.]
Row-wise ℓ1/ℓ2-norm:

$$|||\Theta|||_{1,2} = \sum_{j=1}^p \|\Theta_j\|_2.$$

  7. Block-structured extension
[Figure: the model Y = XΘ* + W, as above.]
Row-wise ℓ1/ℓ2-norm:

$$|||\Theta|||_{1,2} = \sum_{j=1}^p \|\Theta_j\|_2.$$

More complicated group structure (Obozinski et al., 2009):

$$|||\Theta|||_{\mathcal{G}} = \sum_{g \in \mathcal{G}} \|\Theta_g\|_2.$$
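For concreteness, here is a small sketch of the row-wise ℓ1/ℓ2-norm and its proximal operator (block soft-thresholding), the basic computational primitive behind group-Lasso-type estimators. The helper names and sizes are my own.

```python
import numpy as np

def l1_l2_norm(Theta):
    """Row-wise l1/l2 norm: the sum over rows j of ||Theta_j||_2."""
    return np.sum(np.linalg.norm(Theta, axis=1))

def row_soft_threshold(Theta, t):
    """Prox operator of t * |||.|||_{1,2}: shrink each row as a block.
    Rows with norm below t are zeroed out exactly (row sparsity)."""
    norms = np.linalg.norm(Theta, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return scale * Theta

# A p x r matrix with 3 non-zero rows: mild shrinkage preserves the row support.
rng = np.random.default_rng(1)
Theta = np.zeros((10, 4))
Theta[:3] = rng.standard_normal((3, 4))
print(l1_l2_norm(Theta), l1_l2_norm(row_soft_threshold(Theta, 0.5)))
```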

  8. Example: Low-rank matrix approximation
[Figure: factorization Θ* = U D V^T, with U of size p1 × r, D of size r × r, and V of size p2 × r.]
Set-up: matrix Θ* ∈ R^{p1 × p2} with rank r ≪ min{p1, p2}.
Estimator:

$$\hat{\Theta} \in \arg\min_{\Theta} \Big\{ \frac{1}{n} \sum_{i=1}^n \big(y_i - \langle\!\langle X_i, \Theta \rangle\!\rangle\big)^2 + \lambda_n \sum_{j=1}^{\min\{p_1, p_2\}} \sigma_j(\Theta) \Big\}.$$

Some past work: Fazel, 2001; Srebro et al., 2004; Recht, Fazel & Parrilo, 2007; Bach, 2008; Candès & Recht, 2008; Keshavan et al., 2009; Rohde & Tsybakov, 2010; Recht, 2009; Negahban & W., 2010 ...
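The regularizer here is the nuclear norm (sum of singular values), whose proximal operator soft-thresholds the singular values; it plays the same role for low-rank problems that elementwise soft-thresholding plays for the Lasso. A minimal sketch with hypothetical helper names and illustrative sizes:

```python
import numpy as np

def nuclear_norm(Theta):
    """Sum of singular values sigma_j(Theta)."""
    return np.sum(np.linalg.svd(Theta, compute_uv=False))

def svd_soft_threshold(A, t):
    """Prox operator of t * nuclear norm: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - t, 0.0)) @ Vt

# Thresholding a noisy low-rank matrix suppresses the small "noise" singular
# values, so the result is (typically) close to the underlying rank r.
rng = np.random.default_rng(2)
p1, p2, r = 30, 20, 3
Theta_star = rng.standard_normal((p1, r)) @ rng.standard_normal((r, p2))
noisy = Theta_star + 0.1 * rng.standard_normal((p1, p2))
print(np.linalg.matrix_rank(svd_soft_threshold(noisy, 2.0)))
```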

  9. Application: Collaborative filtering
[Figure: a partially observed ratings matrix; numeric entries (e.g., 4, 3, 5) are observed ratings, ∗ marks missing ones.]
Universe of p1 individuals and p2 films.
Observe n ≪ p1 p2 ratings. (e.g., Srebro, Alon & Jaakkola, 2004)

  10. Security and robustness issues
[Figure: cover of a spiritual guide.]
Break-down of the Amazon recommendation system, 2002.

  11. Security and robustness issues
[Figure: covers of a spiritual guide and a sex manual.]
Break-down of the Amazon recommendation system, 2002.

  12. Matrix decomposition: Low-rank plus sparse
Matrix Y can be (approximately) decomposed into a sum:

$$Y = \underbrace{\Theta^*}_{\text{Low-rank component}} + \underbrace{\Gamma^*}_{\text{Sparse component}},$$

where the low-rank component factors as Θ* ≈ U D V^T with U of size p1 × r, D of size r × r, and V of size p2 × r.

  13. Matrix decomposition: Low-rank plus sparse
Matrix Y can be (approximately) decomposed into a sum:

$$Y = \underbrace{\Theta^*}_{\text{Low-rank component}} + \underbrace{\Gamma^*}_{\text{Sparse component}},$$

where the low-rank component factors as Θ* ≈ U D V^T with U of size p1 × r, D of size r × r, and V of size p2 × r.
Exact decomposition: initially studied by Chandrasekaran, Sanghavi, Parrilo & Willsky, 2009.
Subsequent work: Candès et al., 2010; Xu et al., 2010; Hsu et al., 2010; Agarwal et al., 2011.
Various applications:
◮ robust collaborative filtering
◮ robust PCA
◮ graphical model selection with hidden variables
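One simple heuristic for computing such a decomposition alternates two proximal operators on the residual: singular-value soft-thresholding for the low-rank part and elementwise soft-thresholding for the sparse part. This is a sketch under my own algorithmic and parameter choices, not the estimator analyzed in the papers cited above.

```python
import numpy as np

def soft_threshold(A, t):
    """Elementwise soft-thresholding (prox of the elementwise l1 norm)."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def svd_soft_threshold(A, t):
    """Singular-value soft-thresholding (prox of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - t, 0.0)) @ Vt

def low_rank_plus_sparse(Y, lam_lr, lam_sp, n_iter=100):
    """Heuristic alternating scheme for Y ~ Theta (low-rank) + Gamma (sparse):
    each step applies the prox of one regularizer to the current residual."""
    Theta = np.zeros_like(Y)
    Gamma = np.zeros_like(Y)
    for _ in range(n_iter):
        Theta = svd_soft_threshold(Y - Gamma, lam_lr)
        Gamma = soft_threshold(Y - Theta, lam_sp)
    return Theta, Gamma
```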

  14. Gauss-Markov models with hidden variables
[Figure: star graph with hidden variable Z connected to observed variables X1, X2, X3, X4.]
Problems with hidden variables: conditioned on the hidden Z, the vector X = (X1, X2, X3, X4) is Gauss-Markov.

  15. Gauss-Markov models with hidden variables
[Figure: star graph with hidden variable Z connected to observed variables X1, X2, X3, X4.]
Problems with hidden variables: conditioned on the hidden Z, the vector X = (X1, X2, X3, X4) is Gauss-Markov.
The inverse covariance of X satisfies a {sparse, low-rank} decomposition:

$$\begin{pmatrix} 1-\mu & -\mu & -\mu & -\mu \\ -\mu & 1-\mu & -\mu & -\mu \\ -\mu & -\mu & 1-\mu & -\mu \\ -\mu & -\mu & -\mu & 1-\mu \end{pmatrix} = I_{4 \times 4} - \mu \, \mathbf{1}\mathbf{1}^T.$$

(Chandrasekaran, Parrilo & Willsky, 2010)
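A quick numerical check of this structure: build the joint precision matrix of (X, Z) for the star graph, then marginalize out Z via the Schur complement. The edge weight a below is an arbitrary illustrative choice, with μ = a².

```python
import numpy as np

p, a = 4, 0.4                      # 4 observed variables, edge weight a (mu = a^2)
K = np.eye(p + 1)                  # joint precision over (X1..X4, Z)
K[:p, p] = K[p, :p] = -a           # edges X_i -- Z; the X-block stays diagonal

# Marginalizing out Z: the precision of X alone is the Schur complement
# K_XX - K_XZ K_ZZ^{-1} K_ZX = I - a^2 * 11^T, i.e. sparse minus low-rank.
Sigma = np.linalg.inv(K)
K_marg = np.linalg.inv(Sigma[:p, :p])
print(np.round(K_marg, 4))         # diagonal 1 - a^2 = 0.84, off-diagonal -a^2
```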

  16. Example: Sparse principal components analysis
[Figure: decomposition Σ = ZZ^T + D of the covariance into a sparse low-rank part plus a diagonal part.]
Set-up: covariance matrix Σ = ZZ^T + D, where the leading eigenspace Z has sparse columns.
Estimator:

$$\hat{\Theta} \in \arg\min_{\Theta} \Big\{ -\langle\!\langle \hat{\Sigma}, \Theta \rangle\!\rangle + \lambda_n \sum_{(j,k)} |\Theta_{jk}| \Big\}.$$

Some past work: Johnstone, 2001; Joliffe et al., 2003; Johnstone & Lu, 2004; Zou et al., 2004; d'Aspremont et al., 2007; Johnstone & Paul, 2008; Amini & Wainwright, 2008
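To make the set-up concrete, the sketch below draws samples from a spiked covariance Σ = ZZ^T + D with a sparse leading eigenvector, then recovers the spike with truncated power iteration, a simple heuristic standing in for the penalized estimator displayed above. All sizes and the sparsity level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
p, s, n = 50, 5, 100
z = np.zeros(p)
z[:s] = 1.0 / np.sqrt(s)                     # sparse, unit-norm leading direction
Sigma = 2.0 * np.outer(z, z) + np.eye(p)     # Sigma = ZZ^T + D (one spike)

samples = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Sigma_hat = samples.T @ samples / n          # sample covariance

# Truncated power iteration: multiply, keep the s largest entries, renormalize.
v = rng.standard_normal(p)
for _ in range(100):
    v = Sigma_hat @ v
    v[np.argsort(np.abs(v))[:-s]] = 0.0      # hard-threshold to sparsity s
    v /= np.linalg.norm(v)
print("overlap with true spike:", abs(v @ z))
```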

  17. Motivation and roadmap
Many results on different high-dimensional models, all based on estimators of the type:

$$\hat{\theta}_{\lambda_n} \in \arg\min_{\theta \in \Omega} \Big\{ \underbrace{\mathcal{L}(\theta; Z_1^n)}_{\text{Loss function}} + \lambda_n \underbrace{\mathcal{R}(\theta)}_{\text{Regularizer}} \Big\}.$$

  18. Motivation and roadmap
Many results on different high-dimensional models, all based on estimators of the type:

$$\hat{\theta}_{\lambda_n} \in \arg\min_{\theta \in \Omega} \Big\{ \underbrace{\mathcal{L}(\theta; Z_1^n)}_{\text{Loss function}} + \lambda_n \underbrace{\mathcal{R}(\theta)}_{\text{Regularizer}} \Big\}.$$

Question: Is there a common set of underlying principles?

  19. Motivation and roadmap
Many results on different high-dimensional models, all based on estimators of the type:

$$\hat{\theta}_{\lambda_n} \in \arg\min_{\theta \in \Omega} \Big\{ \underbrace{\mathcal{L}(\theta; Z_1^n)}_{\text{Loss function}} + \lambda_n \underbrace{\mathcal{R}(\theta)}_{\text{Regularizer}} \Big\}.$$

Question: Is there a common set of underlying principles?
Answer: Yes, two essential ingredients.
(I) Restricted strong convexity of the loss function
(II) Decomposability of the regularizer

  20. (I) Role of curvature
1. Curvature controls the difficulty of estimation:
[Figure: two loss functions L plotted against θ, each with perturbation Δ around the estimate. (a) High curvature: easy to estimate. (b) Low curvature: harder.]

  21. (I) Role of curvature
1. Curvature controls the difficulty of estimation:
[Figure: two loss functions L plotted against θ, each with perturbation Δ around the estimate. (a) High curvature: easy to estimate. (b) Low curvature: harder.]
2. Captured by a lower bound on the Taylor-series error:

$$\mathcal{T}_{\mathcal{L}}(\Delta; \theta^*) := \mathcal{L}(\theta^* + \Delta) - \mathcal{L}(\theta^*) - \langle \nabla \mathcal{L}(\theta^*), \Delta \rangle \;\ge\; \frac{\gamma}{2} \, \|\Delta\|^2$$

for all Δ around θ*.
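For the least-squares loss L(θ) = (1/n)‖y − Xθ‖², the Taylor-series error is exactly the quadratic term (1/n)‖XΔ‖², with no higher-order remainder. A quick numerical confirmation:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 50, 20
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
theta = rng.standard_normal(p)
Delta = rng.standard_normal(p)

L = lambda t: np.sum((y - X @ t) ** 2) / n            # least-squares loss
grad = -2.0 * X.T @ (y - X @ theta) / n               # its gradient at theta

# Taylor error: loss at theta + Delta minus its first-order expansion.
taylor_error = L(theta + Delta) - L(theta) - grad @ Delta
print(np.isclose(taylor_error, np.sum((X @ Delta) ** 2) / n))   # True
```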

  22. High dimensions: no strong convexity!
[Figure: surface plot of a loss function that is flat along some directions.]
When p > n, the Hessian ∇²L(θ; Z_1^n) has a nullspace of dimension p − n.
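A direct check of this rank deficiency for the least-squares loss (sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 30, 100
X = rng.standard_normal((n, p))
H = 2.0 * X.T @ X / n                 # Hessian of the least-squares loss
print(p - np.linalg.matrix_rank(H))   # nullspace dimension: p - n = 70 here
```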

  23. Restricted strong convexity
Definition. The loss function L_n satisfies restricted strong convexity (RSC) with respect to the regularizer R if

$$\underbrace{\mathcal{L}_n(\theta^* + \Delta) - \mathcal{L}_n(\theta^*) - \langle \nabla \mathcal{L}_n(\theta^*), \Delta \rangle}_{\text{Taylor error } \mathcal{T}_{\mathcal{L}}(\Delta; \theta^*)} \;\ge\; \underbrace{\frac{\gamma_\ell}{2} \, \|\Delta\|_e^2}_{\text{Lower curvature}} - \underbrace{\tau_\ell^2 \, \mathcal{R}^2(\Delta)}_{\text{Tolerance}}$$

for all Δ in a suitable neighborhood of θ*.

  24. Restricted strong convexity
Definition. The loss function L_n satisfies restricted strong convexity (RSC) with respect to the regularizer R if

$$\underbrace{\mathcal{L}_n(\theta^* + \Delta) - \mathcal{L}_n(\theta^*) - \langle \nabla \mathcal{L}_n(\theta^*), \Delta \rangle}_{\text{Taylor error } \mathcal{T}_{\mathcal{L}}(\Delta; \theta^*)} \;\ge\; \underbrace{\frac{\gamma_\ell}{2} \, \|\Delta\|_e^2}_{\text{Lower curvature}} - \underbrace{\tau_\ell^2 \, \mathcal{R}^2(\Delta)}_{\text{Tolerance}}$$

for all Δ in a suitable neighborhood of θ*.
Ordinary strong convexity:
◮ special case with tolerance τ_ℓ = 0
◮ does not hold for most loss functions when p > n

  25. Restricted strong convexity
Definition. The loss function L_n satisfies restricted strong convexity (RSC) with respect to the regularizer R if

$$\underbrace{\mathcal{L}_n(\theta^* + \Delta) - \mathcal{L}_n(\theta^*) - \langle \nabla \mathcal{L}_n(\theta^*), \Delta \rangle}_{\text{Taylor error } \mathcal{T}_{\mathcal{L}}(\Delta; \theta^*)} \;\ge\; \underbrace{\frac{\gamma_\ell}{2} \, \|\Delta\|_e^2}_{\text{Lower curvature}} - \underbrace{\tau_\ell^2 \, \mathcal{R}^2(\Delta)}_{\text{Tolerance}}$$

for all Δ in a suitable neighborhood of θ*.
Ordinary strong convexity:
◮ special case with tolerance τ_ℓ = 0
◮ does not hold for most loss functions when p > n
RSC enforces a lower bound on curvature, but only when R²(Δ) ≪ ‖Δ‖²_e.
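An empirical look at why RSC can hold despite the nullspace above: for a Gaussian design, the curvature ratio (1/n)‖XΔ‖²/‖Δ‖² stays bounded away from zero over random sparse directions Δ, while dense nullspace directions have curvature exactly zero. This is a simulation sketch, not a proof:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, s = 50, 200, 5
X = rng.standard_normal((n, p))

def curvature(delta):
    """Taylor error per unit squared norm: (1/n)||X delta||^2 / ||delta||^2."""
    return np.sum((X @ delta) ** 2) / n / np.sum(delta ** 2)

# Random s-sparse directions: curvature stays bounded away from zero.
vals = []
for _ in range(200):
    delta = np.zeros(p)
    delta[rng.choice(p, size=s, replace=False)] = rng.standard_normal(s)
    vals.append(curvature(delta))
print("min curvature over sparse directions:", min(vals))

# A nullspace direction of X (dense) has curvature exactly zero.
_, _, Vt = np.linalg.svd(X)
print("curvature along a nullspace direction:", curvature(Vt[-1]))
```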
