

SLIDE 1

Outline: Linear Discriminant Analysis, Penalized LDA, Connections

The Many Flavors of Penalized Linear Discriminant Analysis

Daniela M. Witten
Assistant Professor of Biostatistics, University of Washington

May 9, 2011
Fourth Erich L. Lehmann Symposium, Rice University


SLIDE 2

Overview

◮ There has been a great deal of interest in the past 15+ years in penalized regression,

  minimize_β { ||y − Xβ||² + P(β) },

  especially in the setting where the number of features p exceeds the number of observations n.

◮ P is a penalty function. It could be chosen to promote
  ◮ sparsity: e.g. the lasso, P(β) = λ||β||₁
  ◮ smoothness
  ◮ piecewise constancy...

◮ How can we extend the concepts developed for regression when p > n to other problems?

◮ A case study: penalized linear discriminant analysis. (A minimal lasso sketch follows below.)
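To make this concrete, here is a minimal lasso sketch in the p > n regime, using scikit-learn's Lasso solver on synthetic data (the data, dimensions, and tuning value are assumptions for illustration only):

# scikit-learn's Lasso solves
#   minimize_beta { (1/(2n)) ||y - X beta||^2 + alpha ||beta||_1 },
# a rescaled version of the penalized regression criterion above.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 50, 200                        # more features than observations
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 2.0                   # only 5 features truly matter
y = X @ beta_true + 0.5 * rng.standard_normal(n)

fit = Lasso(alpha=0.1).fit(X, y)      # alpha plays the role of the tuning parameter
print("nonzero coefficients:", int((fit.coef_ != 0).sum()))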



SLIDE 4

The classification problem

◮ The set-up:
  ◮ We are given n training observations x1, . . . , xn ∈ ℝᵖ, each of which falls into one of K classes.
  ◮ Let y ∈ {1, . . . , K}ⁿ contain the class memberships of the training observations.
  ◮ Let X be the n × p matrix whose ith row is xiᵀ.
  ◮ Each column of X (feature) is centered to have mean zero.

◮ The goal:
  ◮ We wish to develop a classifier, based on the training observations x1, . . . , xn ∈ ℝᵖ, that we can use to classify a test observation x∗ ∈ ℝᵖ.

◮ A classical approach: linear discriminant analysis. (A small sketch of this set-up follows below.)
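To fix ideas, a minimal sketch of this set-up with synthetic placeholder data (the dimensions and generating mechanism are assumptions; labels are 0-indexed in code):

# n training observations in R^p, each in one of K classes; every column
# of X is centered to have mean zero, as required above.
import numpy as np

rng = np.random.default_rng(1)
n, p, K = 60, 10, 3
y = rng.integers(0, K, size=n)                 # class memberships
X = rng.standard_normal((n, p)) + y[:, None]   # class-shifted synthetic features
X = X - X.mean(axis=0)                         # center each feature (column)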

SLIDE 5

Linear discriminant analysis


SLIDE 6

LDA via the normal model

◮ Fit a simple normal model to the data:

  xi | yi = k ∼ N(µk, Σw).

◮ Apply Bayes’ theorem to obtain a classifier: assign x∗ to the class for which δk(x∗) is largest, where

  δk(x∗) = x∗ᵀ Σw⁻¹ µk − ½ µkᵀ Σw⁻¹ µk + log πk.

(A numpy sketch of these scores follows below.)
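A minimal numpy sketch of these discriminant scores, assuming n > p so that the pooled within-class covariance estimate is invertible (function and variable names are mine):

# delta_k(x*) = x*^T Sw^{-1} mu_k - (1/2) mu_k^T Sw^{-1} mu_k + log(pi_k),
# computed with plug-in MLEs for mu_k, Sigma_w, and pi_k.
import numpy as np

def lda_scores(X, y, x_star):
    classes = np.unique(y)
    n, p = X.shape
    Sw = np.zeros((p, p))
    mus, pis = [], []
    for k in classes:
        Xk = X[y == k]
        mu = Xk.mean(axis=0)
        Sw += (Xk - mu).T @ (Xk - mu)    # pooled within-class scatter
        mus.append(mu)
        pis.append(len(Xk) / n)
    Sw_inv = np.linalg.inv(Sw / n)       # MLE of Sigma_w; singular if p >> n
    return np.array([x_star @ Sw_inv @ mu - 0.5 * mu @ Sw_inv @ mu + np.log(pi)
                     for mu, pi in zip(mus, pis)])

Assign x∗ to classes[np.argmax(lda_scores(X, y, x_star))].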


SLIDE 7

Fisher’s discriminant

A geometric perspective: project the data to achieve good classification.

SLIDE 11

Fisher’s discriminant and the associated criterion

Look for the discriminant vector β ∈ ℝᵖ that maximizes βᵀ Σ̂b β subject to βᵀ Σ̂w β ≤ 1.

◮ Σ̂b is an estimate of the between-class covariance matrix.
◮ Σ̂w is an estimate of the within-class covariance matrix.
◮ This is a generalized eigenproblem; we can obtain multiple discriminant vectors.
◮ To classify, multiply the data by the discriminant vectors and perform nearest-centroid classification in the reduced space.
◮ If we use K − 1 discriminant vectors, we get the LDA classification rule. If we use fewer than K − 1, we get reduced-rank LDA. (A sketch of the generalized eigenproblem follows below.)
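A minimal scipy sketch of this generalized eigenproblem (assumes Σ̂w is nonsingular; names are mine):

# The discriminant vectors maximize b^T Sb b subject to b^T Sw b <= 1;
# they are the leading generalized eigenvectors of the pencil (Sb, Sw).
import numpy as np
from scipy.linalg import eigh

def fisher_discriminants(Sb, Sw, n_vectors):
    evals, evecs = eigh(Sb, Sw)          # solves Sb v = lambda * Sw v
    order = np.argsort(evals)[::-1]      # eigh returns eigenvalues ascending
    return evecs[:, order[:n_vectors]]   # at most K - 1 are informative

Projecting the data onto these vectors and running nearest-centroid classification reproduces the LDA rule (with K − 1 vectors) or reduced-rank LDA (with fewer).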


SLIDE 12

LDA via optimal scoring

◮ Classification is such a bother. Isn’t regression so much nicer?
◮ It wouldn’t make sense to solve

  minimize_β { ||y − Xβ||² },

  since the labels in y are arbitrary codes for the classes, not quantities to predict.
◮ But can we formulate classification as a regression problem in some other way?


SLIDE 13

LDA via optimal scoring

◮ Let Y be an n × K matrix of dummy variables: Yik = 1{yi = k}.

  minimize_{β,θ} { ||Yθ − Xβ||² } subject to θᵀ Yᵀ Y θ = 1.

◮ We are choosing the optimal scoring θ of the class labels in order to recast the classification problem as a regression problem.
◮ The resulting β is proportional to the discriminant vector in Fisher’s discriminant problem.
◮ We can obtain the LDA classification rule, or reduced-rank LDA. (A sketch of the alternating algorithm follows below.)
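One way to compute the solution is to alternate over β and θ. The sketch below is an illustrative algorithm under my own naming, not the canonical one; it relies on the columns of X being centered (slide 4), which keeps the iterates away from the trivial constant scoring of the labels.

# Alternate: (i) rescale theta so theta^T Y^T Y theta = 1; (ii) beta-step:
# least squares of the scored labels Y theta on X; (iii) theta-step: the
# constrained minimizer is the unconstrained one rescaled to the constraint.
import numpy as np

def optimal_scoring(X, y, K, n_iter=20, seed=0):
    n, p = X.shape
    Y = np.zeros((n, K))
    Y[np.arange(n), y] = 1.0                   # dummy matrix: Y_ik = 1{y_i = k}
    YtY = Y.T @ Y                              # needs every class present in y
    theta = np.random.default_rng(seed).standard_normal(K)
    for _ in range(n_iter):
        theta /= np.sqrt(theta @ YtY @ theta)  # enforce theta^T Y^T Y theta = 1
        beta, *_ = np.linalg.lstsq(X, Y @ theta, rcond=None)
        theta = np.linalg.solve(YtY, Y.T @ (X @ beta))
    theta /= np.sqrt(theta @ YtY @ theta)
    beta, *_ = np.linalg.lstsq(X, Y @ theta, rcond=None)
    return beta, theta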


SLIDE 14

Linear discriminant analysis


SLIDE 15

LDA when p ≫ n

When p ≫ n, we cannot apply LDA directly, because the estimated within-class covariance matrix is singular. There is also an interpretability issue:

◮ All p features are involved in the classification rule.
◮ We want an interpretable classifier: for instance, a classification rule that is a
  ◮ sparse,
  ◮ smooth, or
  ◮ piecewise constant
  linear combination of the features.


SLIDE 16

Penalized LDA

◮ We could extend LDA to the high-dimensional setting by applying (convex) penalties, in order to obtain an interpretable classifier.
◮ For concreteness, in this talk we will use ℓ1 penalties in order to obtain a sparse classifier.
◮ Which version of LDA should we penalize, and does it matter?


SLIDE 17

Penalized LDA via the normal model

◮ The classification rule for LDA is

  x∗ᵀ Σ̂w⁻¹ µ̂k − ½ µ̂kᵀ Σ̂w⁻¹ µ̂k,

  where Σ̂w and µ̂k denote MLEs for Σw and µk.
◮ When p ≫ n, we cannot invert Σ̂w.
◮ We can use a regularized estimate of Σw, such as the diagonal estimate

  Σwᴰ = diag(σ̂1², σ̂2², . . . , σ̂p²).

(A sketch of this diagonal estimate follows below.)


SLIDE 18

Interpretable class centroids in the normal model

◮ For a sparse classifier, we need zeros in the estimate of Σw⁻¹ µk.
◮ An interpretable classifier:
  ◮ Use Σwᴰ, and estimate µk according to

    minimize_{µk} { Σ_{j=1..p} Σ_{i: yi=k} (Xij − µkj)² / σ̂j² + λ ||µk||₁ }.

  ◮ Apply Bayes’ theorem to obtain a classification rule.
◮ This is the nearest shrunken centroids proposal, which yields a sparse classifier because we are using a diagonal estimate of the within-class covariance matrix and a sparse estimate of the class mean vectors. (A sketch of the resulting soft-thresholding update follows below.)

Citation: Tibshirani et al. 2003, Statistical Science
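The criterion above separates across coordinates, so each µkj can be computed in closed form by soft-thresholding the class mean. A minimal sketch of that update (a simplified illustration of the nearest shrunken centroids idea, with my own names and scaling, not the published implementation):

# Coordinate-wise the criterion is n_k (mu_kj - xbar_kj)^2 / sigma_j^2 +
# lambda |mu_kj| (up to constants), so the minimizer soft-thresholds the
# class mean: mu_kj = S(xbar_kj, lambda * sigma_j^2 / (2 n_k)).
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def shrunken_centroid(Xk, sigma2, lam):
    """Xk: rows of X in class k; sigma2: feature-wise variances."""
    n_k = Xk.shape[0]
    return soft_threshold(Xk.mean(axis=0), lam * sigma2 / (2.0 * n_k))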


SLIDE 19

Penalized LDA via optimal scoring

◮ We can easily extend the optimal scoring criterion:

  minimize_{β,θ} { (1/n) ||Yθ − Xβ||² + λ ||β||₁ } subject to θᵀ Yᵀ Y θ = 1.

◮ An efficient iterative algorithm will find a local optimum. (A sketch follows below.)
◮ We get sparse discriminant vectors, and hence classification using a subset of the features.

Citation: Clemmensen, Hastie, Witten, and Ersboll 2011, submitted
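A minimal sketch of one such iterative scheme: reuse the alternating algorithm from the unpenalized case, but swap the least-squares β-step for a lasso fit (illustrative only, not the exact algorithm of Clemmensen et al.; assumes λ is small enough that β stays nonzero):

# The l1 penalty zeroes out coordinates of beta, giving a sparse
# discriminant vector; Y is the n x K dummy matrix from before.
import numpy as np
from sklearn.linear_model import Lasso

def sparse_optimal_scoring(X, Y, lam, n_iter=20, seed=0):
    YtY = Y.T @ Y
    theta = np.random.default_rng(seed).standard_normal(Y.shape[1])
    for _ in range(n_iter):
        theta /= np.sqrt(theta @ YtY @ theta)
        beta = Lasso(alpha=lam, fit_intercept=False).fit(X, Y @ theta).coef_
        theta = np.linalg.solve(YtY, Y.T @ (X @ beta))
    return beta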


SLIDE 20

Penalized LDA via Fisher’s discriminant problem

◮ A simple formulation:

  maximize_β { βᵀ Σ̂b β − λ ||β||₁ } subject to βᵀ Σ̃w β ≤ 1,

  where Σ̃w is some full-rank estimate of Σw.
◮ This is a non-convex problem, because βᵀ Σ̂b β isn’t concave in β.
◮ Can we find a local optimum?

Citation: Witten and Tibshirani 2011, JRSSB


SLIDE 21

Maximizing a function via minorization

To maximize a difficult objective, we repeatedly maximize a surrogate function that lies everywhere below the objective and touches it at the current iterate; each such step cannot decrease the objective.

SLIDE 29

Minorization

◮ Key point: choose a minorizing function that is easy to maximize.
◮ Minorization allows us to efficiently find a local optimum of Fisher’s discriminant problem with any convex penalty. (A sketch for the ℓ1 case follows below.)
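A minimal sketch for the ℓ1 case with a diagonal, full-rank estimate of Σw: since βᵀ Σ̂b β is convex, its tangent at the current iterate minorizes it, and each resulting subproblem has a closed-form solution via soft-thresholding and rescaling. This is a sketch in the spirit of Witten and Tibshirani (2011), not their exact implementation; all names are mine.

# Minorize-maximize for: maximize b^T Sb b - lam ||b||_1 s.t. b^T Dw b <= 1.
# Tangent minorization: b^T Sb b >= 2 b^T Sb b_t - b_t^T Sb b_t, so each step
# maximizes 2 b^T (Sb b_t) - lam ||b||_1 over the ellipsoid b^T Dw b <= 1.
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def penalized_fisher(Sb, dw, lam, n_iter=50, seed=0):
    """dw: diagonal of a full-rank within-class estimate; returns one vector."""
    b = np.random.default_rng(seed).standard_normal(len(dw))
    b /= np.sqrt(b @ (dw * b))
    for _ in range(n_iter):
        c = 2.0 * Sb @ b                  # coefficient of the linear minorizer
        v = soft_threshold(c, lam) / dw   # coordinate-wise KKT solution, unscaled
        norm = np.sqrt(v @ (dw * v))
        if norm == 0.0:                   # penalty zeroed everything out
            return v
        b = v / norm                      # rescale to the constraint boundary
    return b

Each step can only increase the penalized objective, which is how minorization delivers a local optimum.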


SLIDE 30

Connections between flavors of penalized LDA


SLIDE 31

Connections between flavors of penalized LDA

1. Normal model + ℓ1: use a diagonal estimate for Σw and then apply an ℓ1 penalty to the class mean vectors.

2. Optimal scoring + ℓ1: apply an ℓ1 penalty to the discriminant vectors.

3. Fisher’s discriminant problem + ℓ1: apply an ℓ1 penalty to the discriminant vectors.

So are (1) and (3) different? And are (2) and (3) the same?


SLIDE 32

Normal Model + ℓ1 and Fisher’s + ℓ1


SLIDE 33

Normal Model + ℓ1 and Fisher’s + ℓ1

[Figure: the matrix referred to below; image not extracted.]

◮ Normal model + ℓ1 penalizes the elements of this matrix.
◮ Fisher’s + ℓ1 penalizes its left singular vectors.
◮ Clearly these are different...
◮ ...but if K = 2, then they are (essentially) the same.


SLIDE 34

Normal Model + ℓ1 and Fisher’s + ℓ1


SLIDE 35

Fisher’s + ℓ1 and Optimal Scoring + ℓ1

Both problems involve “penalizing the discriminant vectors” so they must be the same, right?


SLIDE 36

Fisher’s + ℓ1 and Optimal Scoring + ℓ1

Theorem: For any value of the tuning parameter for FD + ℓ1, there exists some tuning parameter for OS + ℓ1 such that the solution to one problem is a critical point of the other.

◮ In other words, there is a correspondence between the critical points, though not necessarily between the solutions.
◮ So the resulting “sparse discriminant vectors” may be different!


SLIDE 37

Connections


SLIDE 38

Pros and Cons

Penalized LDA via the normal model:
◮ (+) In the case of a diagonal estimate for Σw and ℓ1 penalties on the mean vectors, it is well motivated and simple.
◮ (-) No obvious extension to non-diagonal estimates of Σw.
◮ (-) Cannot obtain a “low-rank” classifier.

Penalized LDA via Fisher’s discriminant problem:
◮ (+) Any convex penalties can be applied to the discriminant vectors.
◮ (+) Can use any full-rank estimate of Σw.
◮ (+) Can obtain a “low-rank” classifier.

Penalized LDA via optimal scoring:
◮ (+) An extension of regression.
◮ (+) Any convex penalties can be applied to the discriminant vectors.
◮ (+) Can obtain a “low-rank” classifier.
◮ (-) Cannot use an arbitrary estimate of Σw.
◮ (-) The usual motivation for OS is that it yields the same discriminant vectors as Fisher’s problem. That is not true when penalized!


SLIDE 39

Conclusions

◮ There is a sensible way to regularize regression when p ≫ n:

  minimize_β { ||y − Xβ||² + P(β) }.

◮ One could argue that this is the way to penalize regression.
◮ But as soon as we step away from regression, even to a closely related problem like LDA, the situation becomes much more complex: there is no longer a “single way” to approach the problem.
◮ And the situation only becomes more complex for more complex statistical methods!
◮ We need a principled framework to determine which penalized extension of an established statistical method is “best”.


SLIDE 40

References

◮ Witten, D. and Tibshirani, R. (2011). Penalized classification using Fisher’s linear discriminant. To appear in Journal of the Royal Statistical Society, Series B.
◮ Clemmensen, L., Hastie, T., Witten, D., and Ersboll, B. (2011). Sparse discriminant analysis. Submitted.