SLIDE 1

High-dimensional statistics: Some progress and challenges ahead

Martin Wainwright
UC Berkeley, Departments of Statistics and EECS

University College London, Master Class: Lecture 1

Joint work with: Alekh Agarwal, Arash Amini, Po-Ling Loh, Sahand Negahban, Garvesh Raskutti, Pradeep Ravikumar, Bin Yu

SLIDES 2-5

Introduction

Classical asymptotic theory: sample size n → +∞ with the number of parameters p fixed
◮ law of large numbers, central limit theorem
◮ consistency of maximum likelihood estimation

Modern applications in science and engineering:
◮ large-scale problems: both p and n may be large (possibly p ≫ n)
◮ need for a high-dimensional theory that provides non-asymptotic results in (n, p)

Curses and blessings of high dimensionality:
◮ exponential explosions in computational complexity
◮ statistical curses (sample complexity)
◮ concentration of measure

Key questions: What embedded low-dimensional structures are present in the data? How can they be exploited algorithmically?

SLIDES 6-9

Vignette I: High-dimensional matrix estimation

Goal: estimate a covariance matrix Σ ∈ R^{p×p} given i.i.d. samples X_i ∼ N(0, Σ), for i = 1, 2, ..., n.

Classical approach: estimate Σ via the sample covariance matrix

    \widehat{\Sigma}_n := \frac{1}{n} \sum_{i=1}^{n} X_i X_i^T,

an average of p × p rank-one matrices.

Reasonable properties (p fixed, n increasing):
◮ Unbiased: E[\widehat{\Sigma}_n] = Σ
◮ Consistent: \widehat{\Sigma}_n \xrightarrow{a.s.} Σ as n → +∞
◮ Asymptotic distributional properties available

An alternative experiment: fix some α > 0 and study behavior over sequences with p/n = α. Does \widehat{\Sigma}_n(p) converge to anything reasonable?

SLIDES 10-11

[Figure: empirical eigenvalue density of the (rescaled) sample covariance matrix versus the Marchenko-Pastur law, for aspect ratios α = 0.5 and α = 0.2. Axes: eigenvalue vs. density; curves: empirical histogram and theory.]

(Marchenko & Pastur, 1967)
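As a quick numerical companion (a sketch of my own, not from the slides), the following NumPy snippet samples the eigenvalues of \widehat{\Sigma}_n under p/n = α and compares their histogram against the Marchenko-Pastur density:

```python
import numpy as np

# Minimal sketch: eigenvalues of the sample covariance matrix when p/n = alpha,
# compared against the Marchenko-Pastur density on its support.
rng = np.random.default_rng(0)
n, alpha = 4000, 0.5
p = int(alpha * n)

X = rng.standard_normal((n, p))            # rows X_i ~ N(0, I_p)
eigs = np.linalg.eigvalsh(X.T @ X / n)     # spectrum of Sigma_hat_n

# MP density is supported on [(1 - sqrt(alpha))^2, (1 + sqrt(alpha))^2]
lo, hi = (1 - np.sqrt(alpha)) ** 2, (1 + np.sqrt(alpha)) ** 2
dens, edges = np.histogram(eigs, bins=50, density=True)
mids = 0.5 * (edges[:-1] + edges[1:])
mp = np.sqrt(np.maximum((hi - mids) * (mids - lo), 0)) / (2 * np.pi * alpha * mids)

print("eigenvalue range:", eigs.min(), eigs.max())
print("max deviation from MP density:", np.abs(dens - mp).max())
```

Even though each entry of \widehat{\Sigma}_n is a consistent estimate, the spectrum does not concentrate: the eigenvalues spread over the entire MP support instead of collapsing to 1.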

SLIDE 12

Low-dimensional structure: Gaussian graphical models

[Figure: a five-node graph alongside the zero pattern of its inverse covariance matrix.]

    P(x_1, x_2, \ldots, x_p) \propto \exp\Big( -\frac{1}{2} x^T \Theta^* x \Big).

The edges of the graph correspond to the non-zero off-diagonal entries of the inverse covariance Θ^*.

SLIDES 13-14

Maximum likelihood with ℓ1-regularization

Set-up: samples from a random vector with sparse covariance Σ or sparse inverse covariance Θ^* ∈ R^{p×p}.

Estimator (for the inverse covariance):

    \widehat{\Theta} \in \arg\min_{\Theta} \Big\{ \Big\langle \frac{1}{n} \sum_{i=1}^{n} x_i x_i^T, \; \Theta \Big\rangle - \log\det(\Theta) + \lambda_n \sum_{j \neq k} |\Theta_{jk}| \Big\}

Some past work: Yuan & Lin, 2006; d'Aspremont et al., 2007; Bickel & Levina, 2007; El Karoui, 2007; Rothman et al., 2007; Zhou et al., 2007; Friedman et al., 2008; Lam & Fan, 2008; Ravikumar et al., 2008; Zhou, Cai & Huang, 2009
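This estimator is implemented, for example, in scikit-learn. A minimal sketch (assuming scikit-learn is available; the chain-graph example is my own):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

# GraphicalLasso minimizes
#   <Sigma_hat, Theta> - log det(Theta) + alpha * sum_{j != k} |Theta_jk|,
# the l1-regularized Gaussian MLE displayed above (alpha plays the role of lambda_n).
rng = np.random.default_rng(0)
p, n = 20, 500

# Sparse ground truth: precision matrix of a chain graph
Theta_star = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
Sigma = np.linalg.inv(Theta_star)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

Theta_hat = GraphicalLasso(alpha=0.05).fit(X).precision_

# Compare the recovered zero pattern against the truth
support_hat = np.abs(Theta_hat) > 1e-3
support_true = Theta_star != 0
print("fraction of entries with correct pattern:", (support_hat == support_true).mean())
```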

SLIDES 15-16

Gauss-Markov models with hidden variables

[Figure: observed variables X_1, X_2, X_3, X_4, all connected to a hidden variable Z.]

Problems with hidden variables: conditioned on the hidden variable Z, the vector X = (X_1, X_2, X_3, X_4) is Gauss-Markov.

The inverse covariance of X then satisfies a {sparse, low-rank} decomposition:

    \begin{pmatrix} 1-\mu & -\mu & -\mu & -\mu \\ -\mu & 1-\mu & -\mu & -\mu \\ -\mu & -\mu & 1-\mu & -\mu \\ -\mu & -\mu & -\mu & 1-\mu \end{pmatrix} = I_{4 \times 4} - \mu \mathbf{1}\mathbf{1}^T,

i.e. a sparse (here diagonal) matrix minus a rank-one term. (Chandrasekaran, Parrilo & Willsky, 2010)

SLIDE 17

Vignette II: High-dimensional sparse linear regression

[Figure: observation model y = Xθ^* + w, with X an n × p design matrix and θ^* supported on S, zero on S^c.]

Set-up: noisy observations y = Xθ^* + w with sparse θ^*.

Estimator: Lasso program

    \widehat{\theta} \in \arg\min_{\theta} \Big\{ \frac{1}{n} \sum_{i=1}^{n} (y_i - x_i^T \theta)^2 + \lambda_n \sum_{j=1}^{p} |\theta_j| \Big\}

Some past work: Tibshirani, 1996; Chen et al., 1998; Donoho & Huo, 2001; Tropp, 2004; Fuchs, 2004; Efron et al., 2004; Meinshausen & Buhlmann, 2005; Candes & Tao, 2005; Donoho, 2005; Haupt & Nowak, 2005; Zhao & Yu, 2006; Zou, 2006; Koltchinskii, 2007; van de Geer, 2007; ...
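A minimal sketch of the Lasso in scikit-learn (my own toy example; note that sklearn's Lasso uses the 1/(2n) normalization of the squared error, matching the regularized program analyzed later in this lecture):

```python
import numpy as np
from sklearn.linear_model import Lasso

# sklearn's Lasso minimizes  (1/(2n)) ||y - X theta||_2^2 + alpha ||theta||_1.
rng = np.random.default_rng(1)
n, p, s = 200, 500, 10

theta_star = np.zeros(p)
theta_star[:s] = 1.0                        # s-sparse regression vector
X = rng.standard_normal((n, p))
w = 0.5 * rng.standard_normal(n)
y = X @ theta_star + w

lam = 2 * np.sqrt(np.log(p) / n)            # theory-suggested scale for lambda_n
theta_hat = Lasso(alpha=lam).fit(X, y).coef_

print("l2 error:", np.linalg.norm(theta_hat - theta_star))
print("estimated support size:", int((np.abs(theta_hat) > 1e-3).sum()))
```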

SLIDES 18-19

Application A: Compressed sensing (Donoho, 2005; Candes & Tao, 2005)

[Figure: measurement model y = Xβ^*, with X an n × p random projection matrix.]

(a) Image: vectorize to β^* ∈ R^p.
(b) Compute n random projections y = Xβ^*.

In practice, signals are sparse in a transform domain: θ^* := Ψβ^* is a sparse signal, where Ψ is an orthonormal matrix.

Reconstruct θ^* (and hence the image β^* = Ψ^T θ^*) by finding a sparse solution to the under-constrained linear system y = \tilde{X}θ, where \tilde{X} = XΨ^T is another random matrix.

SLIDE 20

Noiseless ℓ1 recovery: Unrescaled sample size

[Figure: probability of exact recovery versus raw sample size n (µ = 0), for p = 128, 256, 512.]

SLIDE 21

Application B: Graph structure estimation

Let G = (V, E) be an undirected graph on p = |V| vertices. A pairwise graphical model factorizes over the edges of the graph:

    P(x_1, \ldots, x_p; \theta) \propto \exp\Big( \sum_{(s,t) \in E} \theta_{st}(x_s, x_t) \Big).

Given n independent and identically distributed (i.i.d.) samples of X = (X_1, ..., X_p), identify the underlying graph structure.

SLIDE 22

Pseudolikelihood and neighborhood regression

Markov properties encode neighborhood structure:

    (X_s \mid X_{V \setminus s}) \stackrel{d}{=} (X_s \mid X_{N(s)}),

i.e. conditioning on the full graph is equivalent in distribution to conditioning on the Markov blanket, e.g. N(s) = {t, u, v, w}.

[Figure: node X_s shown conditioned on the full graph vs. conditioned only on its Markov blanket X_t, X_u, X_v, X_w.]

◮ basis of the pseudolikelihood method (Besag, 1974)
◮ basis of many graph learning algorithms (Friedman et al., 1999; Csiszar & Talata, 2005; Abbeel et al., 2006; Meinshausen & Buhlmann, 2006)

SLIDES 23-25

Graph selection via neighborhood regression

[Figure: binary data matrix, with the column X_s highlighted against the remaining columns X_{\setminus s}.]

Predict X_s based on X_{\setminus s} := {X_t, t ≠ s}.

1. For each node s ∈ V, compute the (regularized) maximum likelihood estimate

    \widehat{\theta}[s] := \arg\min_{\theta \in \mathbb{R}^{p-1}} \Big\{ \underbrace{-\frac{1}{n} \sum_{i=1}^{n} \mathcal{L}(\theta; X_{i,\setminus s})}_{\text{local log-likelihood}} + \underbrace{\lambda_n \|\theta\|_1}_{\text{regularization}} \Big\}.

2. Estimate the local neighborhood \widehat{N}(s) as the support of the regression vector \widehat{\theta}[s] ∈ R^{p-1}.
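A minimal sketch of this two-step procedure for binary data, using ℓ1-regularized logistic regression as the node-wise estimator (my own toy wrapper around scikit-learn; the C = 1/(n λ_n) mapping reflects sklearn's parameterization of the penalty):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def neighborhood_selection(X, lam, tol=1e-4):
    """X: (n, p) binary data matrix; lam: regularization weight lambda_n.
    Returns the edge set, combining node neighborhoods by the OR rule."""
    n, p = X.shape
    edges = set()
    for s in range(p):
        rest = np.delete(np.arange(p), s)
        # Step 1: l1-regularized local (logistic) likelihood at node s
        clf = LogisticRegression(penalty="l1", solver="liblinear",
                                 C=1.0 / (n * lam)).fit(X[:, rest], X[:, s])
        # Step 2: neighborhood = support of the regression vector
        for t in rest[np.abs(clf.coef_[0]) > tol]:
            edges.add((min(s, t), max(s, t)))
    return edges

# Usage on pure-noise data: with a reasonably strong lambda, few or no edges survive.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(500, 10))
print(neighborhood_selection(X, lam=0.1))
```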
SLIDE 26

US Senate network (2004-2006 voting)

[Figure: network over senators estimated from roll-call voting data.]

SLIDE 27

Outline

1. Lecture 1 (Today): Basics of sparse recovery
   ◮ Sparse linear systems: ℓ0/ℓ1 equivalence
   ◮ Noisy case: Lasso, ℓ2-bounds and variable selection

2. Lecture 2 (Tuesday): A more general theory
   ◮ A range of structured regularizers
     ⋆ Group sparsity
     ⋆ Low-rank matrices and nuclear norm regularization
     ⋆ Matrix decomposition and robust PCA
   ◮ Ingredients of a general understanding

3. Lecture 3 (Wednesday): High-dimensional kernel methods
   ◮ Curse of dimensionality for non-parametric regression
   ◮ Reproducing kernel Hilbert spaces
   ◮ A simple but optimal estimator

SLIDE 28

Noiseless linear models and basis pursuit

[Figure: under-determined n × p linear system y = Xθ^*, with support S and complement S^c.]

Under-determined linear system: unidentifiable without constraints. Say θ^* ∈ R^p is sparse: supported on S ⊂ {1, 2, ..., p}.

ℓ0-optimization (computationally intractable, NP-hard):

    \theta^* = \arg\min_{\theta \in \mathbb{R}^p} \|\theta\|_0 \quad \text{such that } X\theta = y

ℓ1-relaxation, known as basis pursuit (a linear program, easy to solve):

    \widehat{\theta} \in \arg\min_{\theta \in \mathbb{R}^p} \|\theta\|_1 \quad \text{such that } X\theta = y
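Basis pursuit can be handed to any LP solver. A minimal sketch (my own, assuming SciPy): splitting θ = u − v with u, v ≥ 0 turns min ‖θ‖_1 subject to Xθ = y into a standard-form linear program.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(X, y):
    """Solve min ||theta||_1 s.t. X theta = y via the LP reformulation
    min 1^T (u + v)  s.t.  X u - X v = y,  u >= 0, v >= 0,  theta = u - v."""
    n, p = X.shape
    c = np.ones(2 * p)                       # objective: sum(u) + sum(v)
    A_eq = np.hstack([X, -X])                # X u - X v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * p))
    assert res.status == 0, "LP solver failed"
    return res.x[:p] - res.x[p:]

# Usage: recover a 5-sparse vector from n = 60 random projections (p = 128)
rng = np.random.default_rng(0)
n, p, s = 60, 128, 5
theta_star = np.zeros(p)
theta_star[rng.choice(p, size=s, replace=False)] = rng.standard_normal(s)
X = rng.standard_normal((n, p))
theta_hat = basis_pursuit(X, X @ theta_star)
print("exact recovery:", np.allclose(theta_hat, theta_star, atol=1e-4))
```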

SLIDES 29-30

Noiseless ℓ1 recovery: Unrescaled vs. rescaled sample size

[Figure: probability of exact recovery (µ = 0) for p = 128, 256, 512, plotted against (a) the raw sample size n and (b) the rescaled sample size α := n / (s log(p/s)). After rescaling, the curves for different p align.]

SLIDES 31-33

Restricted nullspace: necessary and sufficient

Definition. For a fixed S ⊂ {1, 2, ..., p}, the matrix X ∈ R^{n×p} satisfies the restricted nullspace property w.r.t. S, or RN(S) for short, if

    \underbrace{\{ \Delta \in \mathbb{R}^p \mid X\Delta = 0 \}}_{N(X)} \cap \underbrace{\{ \Delta \in \mathbb{R}^p \mid \|\Delta_{S^c}\|_1 \le \|\Delta_S\|_1 \}}_{C(S)} = \{0\}.

(Donoho & Huo, 2001; Feuer & Nemirovski, 2003; Cohen et al., 2009)

Proposition. Basis pursuit ℓ1-relaxation is exact for all S-sparse vectors ⟺ X satisfies RN(S).

Proof (sufficiency):
(1) The error vector \Delta = \widehat{\theta} - \theta^* satisfies X\Delta = 0, and hence \Delta \in N(X).
(2) Show that \Delta \in C(S):
◮ Optimality of \widehat{\theta}:  \|\widehat{\theta}\|_1 \le \|\theta^*\|_1 = \|\theta^*_S\|_1.
◮ Sparsity of \theta^*:  \|\widehat{\theta}\|_1 = \|\theta^* + \Delta\|_1 = \|\theta^*_S + \Delta_S\|_1 + \|\Delta_{S^c}\|_1.
◮ Triangle inequality:  \|\theta^*_S + \Delta_S\|_1 + \|\Delta_{S^c}\|_1 \ge \|\theta^*_S\|_1 - \|\Delta_S\|_1 + \|\Delta_{S^c}\|_1.
Combining the three displays gives \|\Delta_{S^c}\|_1 \le \|\Delta_S\|_1, so \Delta \in C(S).
(3) Hence \Delta \in N(X) \cap C(S), and (RN) ⟹ \Delta = 0.
SLIDE 34

Illustration of restricted nullspace property

[Figure: the cone C(S) in R^3, drawn in the (∆_1, ∆_2) plane against the ∆_3 axis.]

Consider θ^* = (0, 0, θ^*_3), so that S = {3}. The error vector ∆ = \widehat{θ} − θ^* belongs to the set

    C(S; 1) := \{ (\Delta_1, \Delta_2, \Delta_3) \in \mathbb{R}^3 \mid |\Delta_1| + |\Delta_2| \le |\Delta_3| \}.

SLIDES 35-38

Some sufficient conditions

How can one verify the RN property for a given sparsity s? Two classical sufficient conditions (see the numerical sketch after this list):

1. Elementwise incoherence condition (Donoho & Huo, 2001; Feuer & Nemirovski, 2003):

    \max_{j,k = 1, \ldots, p} \Big| \Big[ \frac{X^T X}{n} - I_{p \times p} \Big]_{jk} \Big| \le \frac{\delta_1}{s}.

   [Figure: the columns x_1, ..., x_p of the n × p matrix X, with a pair (x_j, x_k) highlighted.]

   For matrices with i.i.d. sub-Gaussian entries: holds w.h.p. for n = Ω(s^2 log p).

2. Restricted isometry, or submatrix incoherence (Candes & Tao, 2005):

    \max_{|U| \le 2s} \Big\| \Big[ \frac{X^T X}{n} - I_{p \times p} \Big]_{UU} \Big\|_{\mathrm{op}} \le \delta_{2s}.

   [Figure: a subset U of the columns of X.]

   For matrices with i.i.d. sub-Gaussian entries: holds w.h.p. for n = Ω(s log(p/s)).
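A quick numerical check of the elementwise incoherence quantity (my own sketch), which also previews the correlated design of the next slides:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 200

# i.i.d. Gaussian design: max_{j,k} |[X^T X / n - I]_{jk}| is small (~ sqrt(log p / n))
X = rng.standard_normal((n, p))
G = X.T @ X / n - np.eye(p)
print("incoherence, i.i.d. design:", np.abs(G).max())

# mu-correlated design Sigma = (1 - mu) I + mu 1 1^T: the same quantity
# concentrates around mu, no matter how large n is.
mu = 0.5
Sigma = (1 - mu) * np.eye(p) + mu * np.ones((p, p))
Xc = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Gc = Xc.T @ Xc / n - np.eye(p)
print("incoherence, correlated design:", np.abs(Gc).max())
```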

SLIDES 39-42

Violating matrix incoherence (elementwise/RIP)

Important: incoherence/RIP conditions imply RN, but they are far from necessary, and very easy to violate.

Form a random design matrix with i.i.d. rows X_i ∼ N(0, Σ):

    X = \begin{pmatrix} x_1 & x_2 & \cdots & x_p \end{pmatrix} = \begin{pmatrix} X_1^T \\ X_2^T \\ \vdots \\ X_n^T \end{pmatrix} \in \mathbb{R}^{n \times p}

(p columns x_j, n rows X_i^T).

Example: for some µ ∈ (0, 1), consider the covariance matrix Σ = (1 − µ) I_{p×p} + µ 1 1^T.

Elementwise incoherence is violated: for any j ≠ k,

    \mathbb{P}\Big[ \frac{\langle x_j, x_k \rangle}{n} \ge \mu - \epsilon \Big] \ge 1 - c_1 \exp(-c_2 n \epsilon^2).

RIP constants tend to infinity as (n, |S|) increases:

    \mathbb{P}\Big[ \Big\| \frac{X_S^T X_S}{n} - I_{s \times s} \Big\|_2 \ge \mu(s - 1) - 1 - \epsilon \Big] \ge 1 - c_1 \exp(-c_2 n \epsilon^2).
SLIDE 43

Noiseless ℓ1 recovery for µ = 0.5

[Figure: probability of exact recovery versus rescaled sample size α := n / (s log(p/s)), for p = 128, 256, 512 under the µ = 0.5 design. The curves behave just as in the µ = 0 case, even though incoherence/RIP fail for this design.]

SLIDES 44-45

Direct result for restricted nullspace/eigenvalues

Theorem (Raskutti, Wainwright & Yu, 2010). Consider a random design X ∈ R^{n×p} with each row X_i ∼ N(0, Σ) i.i.d., and define κ(Σ) = \max_{j=1,2,\ldots,p} \Sigma_{jj}. Then for universal constants c_1, c_2,

    \frac{\|X\theta\|_2}{\sqrt{n}} \ge \frac{1}{2} \|\Sigma^{1/2}\theta\|_2 - 9\,\kappa(\Sigma) \sqrt{\frac{\log p}{n}}\, \|\theta\|_1 \quad \text{for all } \theta \in \mathbb{R}^p,

with probability greater than 1 − c_1 exp(−c_2 n).

Much less restrictive than incoherence/RIP conditions; many interesting matrix families are covered:
◮ Toeplitz dependency
◮ constant µ-correlation (the previous example)
◮ the covariance matrix Σ can even be degenerate
◮ extensions to sub-Gaussian matrices (Rudelson & Zhou, 2012)

Related results hold for generalized linear models.

SLIDES 46-48

Easy verification of restricted nullspace

For any ∆ ∈ C(S), we have

    \|\Delta\|_1 = \|\Delta_S\|_1 + \|\Delta_{S^c}\|_1 \le 2\|\Delta_S\|_1 \le 2\sqrt{s}\, \|\Delta\|_2.

Applying the previous result:

    \frac{\|X\Delta\|_2}{\sqrt{n}} \ge \underbrace{\Big( \lambda_{\min}(\sqrt{\Sigma}) - 18\,\kappa(\Sigma) \sqrt{\frac{s \log p}{n}} \Big)}_{\gamma(\Sigma)} \|\Delta\|_2.

The right-hand side is strictly positive once n ≳ s log p, which establishes restricted nullspace; in fact, we have proven much more than restricted nullspace...

Definition. A design matrix X ∈ R^{n×p} satisfies the restricted eigenvalue (RE) condition over S (denoted RE(S)) with parameters α ≥ 1 and γ > 0 if

    \frac{\|X\Delta\|_2}{\sqrt{n}} \ge \gamma \|\Delta\|_2 \quad \text{for all } \Delta \in \mathbb{R}^p \text{ such that } \|\Delta_{S^c}\|_1 \le \alpha \|\Delta_S\|_1.

(van de Geer, 2007; Bickel, Ritov & Tsybakov, 2008)

SLIDE 49

Lasso and restricted eigenvalues

Turning to noisy observations...

[Figure: observation model y = Xθ^* + w, with X an n × p design and θ^* supported on S, zero on S^c.]

Estimator: Lasso program

    \widehat{\theta}_{\lambda_n} \in \arg\min_{\theta \in \mathbb{R}^p} \Big\{ \frac{1}{2n} \|y - X\theta\|_2^2 + \lambda_n \|\theta\|_1 \Big\}.

Goal: obtain bounds on \|\widehat{\theta}_{\lambda_n} - \theta^*\|_2 that hold with high probability.

SLIDES 50-54

Lasso bounds: Four simple steps

Let's analyze the constrained version:

    \min_{\theta \in \mathbb{R}^p} \frac{1}{2n} \|y - X\theta\|_2^2 \quad \text{such that } \|\theta\|_1 \le R = \|\theta^*\|_1.

(1) By optimality of \widehat{\theta} and feasibility of \theta^*:

    \frac{1}{2n} \|y - X\widehat{\theta}\|_2^2 \le \frac{1}{2n} \|y - X\theta^*\|_2^2.

(2) Derive a basic inequality: re-arranging in terms of \widehat{\Delta} = \widehat{\theta} - \theta^*:

    \frac{1}{n} \|X\widehat{\Delta}\|_2^2 \le \frac{2}{n} \langle \widehat{\Delta}, X^T w \rangle.

(3) Restricted eigenvalue for the LHS; Hölder's inequality for the RHS:

    \gamma \|\widehat{\Delta}\|_2^2 \le \frac{1}{n} \|X\widehat{\Delta}\|_2^2 \le \frac{2}{n} \langle \widehat{\Delta}, X^T w \rangle \le 2 \|\widehat{\Delta}\|_1 \Big\| \frac{X^T w}{n} \Big\|_\infty.

(4) As before, \widehat{\Delta} \in C(S), so that \|\widehat{\Delta}\|_1 \le 2\sqrt{s} \|\widehat{\Delta}\|_2, and hence

    \|\widehat{\Delta}\|_2 \le \frac{4\sqrt{s}}{\gamma} \Big\| \frac{X^T w}{n} \Big\|_\infty.
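A minimal simulation (my own) checking the conclusion of step (4): with the oracle choice λ_n = 2‖X^T w / n‖_∞ (computable here because the simulation knows the noise vector w), the error should track √(s log p / n):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
p, s, sigma = 400, 8, 1.0

for n in [200, 400, 800, 1600]:
    theta_star = np.zeros(p)
    theta_star[:s] = 1.0
    X = rng.standard_normal((n, p))
    w = sigma * rng.standard_normal(n)
    y = X @ theta_star + w

    lam = 2 * np.abs(X.T @ w / n).max()       # oracle lambda_n from the analysis
    theta_hat = Lasso(alpha=lam).fit(X, y).coef_
    err = np.linalg.norm(theta_hat - theta_star)
    rate = np.sqrt(s * np.log(p) / n)
    print(f"n={n:5d}  error={err:.3f}  sqrt(s log p / n)={rate:.3f}")
```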
SLIDES 55-57

Lasso error bounds for different models

Proposition. Suppose that the vector θ^* has support S, with cardinality s, and the design matrix X satisfies RE(S) with parameter γ > 0. For the constrained Lasso with R = \|θ^*\|_1, or the regularized Lasso with λ_n = 2\|X^T w / n\|_\infty, any optimal solution \widehat{θ} satisfies the bound

    \|\widehat{\theta} - \theta^*\|_2 \le \frac{4\sqrt{s}}{\gamma} \Big\| \frac{X^T w}{n} \Big\|_\infty.

◮ this is a deterministic result on the set of optimizers
◮ various corollaries for specific statistical models:
  ◮ Compressed sensing: X_{ij} ∼ N(0, 1) and bounded noise \|w\|_2 ≤ σ√n
  ◮ Deterministic design: X with bounded columns and w_i ∼ N(0, σ^2)

In either case,

    \Big\| \frac{X^T w}{n} \Big\|_\infty \le \sqrt{\frac{3\sigma^2 \log p}{n}} \; \text{w.h.p.} \implies \|\widehat{\theta} - \theta^*\|_2 \le \frac{4\sigma}{\gamma} \sqrt{\frac{3 s \log p}{n}}.

SLIDES 58-60

Look-ahead to Lecture 2: A more general theory

Recap, thus far:
◮ derived error bounds for basis pursuit and the Lasso (ℓ1-relaxation)
◮ seen the importance of restricted nullspace and restricted eigenvalues

The big picture: lots of other estimators have the same basic form:

    \underbrace{\widehat{\theta}_{\lambda_n}}_{\text{estimate}} \in \arg\min_{\theta \in \Omega} \Big\{ \underbrace{\mathcal{L}(\theta; Z_1^n)}_{\text{loss function}} + \lambda_n \underbrace{\mathcal{R}(\theta)}_{\text{regularizer}} \Big\}.

Past years have witnessed an explosion of results (compressed sensing, covariance estimation, block-sparsity, graphical models, matrix completion, ...).

Question: Is there a common set of underlying principles?

SLIDE 61

Some papers (www.eecs.berkeley.edu/wainwrig)

1. M. J. Wainwright (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso). IEEE Transactions on Information Theory, May 2009.

2. S. Negahban, P. Ravikumar, M. J. Wainwright, and B. Yu (2012). A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Statistical Science, December 2012.

3. G. Raskutti, M. J. Wainwright, and B. Yu (2011). Minimax rates for linear regression over ℓq-balls. IEEE Transactions on Information Theory, October 2011.

4. G. Raskutti, M. J. Wainwright, and B. Yu (2010). Restricted nullspace and eigenvalue properties for correlated Gaussian designs. Journal of Machine Learning Research, August 2010.