PCA by Projection Pursuit

The Package pcaPP
Heinrich Fritz
Vienna University of Technology, Austria
Vienna, Austria, June 2006

Joint work with . . .
• P. Filzmoser, Department of Statistics and Probability Theory, Vienna University of Technology, Austria
• C. Croux, Department of Applied Economics, K.U. Leuven, Belgium
• M.R. Oliveira, Department of Mathematics, Instituto Superior Técnico, Lisbon, Portugal
• K. Kalcher, Vienna University of Technology, Austria

Agenda
• Principal components
• Robust approaches
• The implementation
• Supporting methods
• Covariance estimation by PCAs

Principal Component Analysis (PCA)
[Figure: scatter plot of the example data, x and y axes from 0 to 4]

Principal Component Analysis (PCA)
[Figures: the example scatter plot with the directions PC1 and PC2 drawn in, x and y axes from 0 to 4]

Outliers
[Figures: the scatter plot with outliers added and the resulting directions PC1 and PC2, x and y axes from 0 to 4]

Outliers
[Figure: scatter plot with outliers; two sets of PC1 and PC2 directions are drawn, x and y axes from 0 to 4]

The Classical Approach
• PCA by decomposition of the covariance matrix
  $\Sigma = \Gamma \Lambda \Gamma^t$
  $Y = (X - 1\bar{x}^t)\,\hat{\Gamma}$
• Robustness due to robust covariance estimates.
  – package rrcov : covMCD , covMest
  – package robustbase : covGK , covOGK

PCA by Projection Pursuit
• No covariance estimation necessary
• Especially for high dimensional data
• Procedure
  – Define a data center ( mean , median , l1median , . . . )
  – Search for promising directions by maximizing a spread estimator ( sd , mad , qn ) of the data projected onto these directions
  – Reduce the number of candidate directions

Defining the Data Center
[Figure: scatter plot of the example data, x and y axes from 0 to 4]
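The classical approach above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the sample covariance matrix stands in for the covariance estimate, the built-in swiss data set is used as input, and the variable names are chosen here for readability.

```r
# Classical PCA via eigen-decomposition of the sample covariance matrix.
# Uses only base R and the built-in swiss data set.
X <- as.matrix(swiss)

xbar   <- colMeans(X)                      # data center (column means)
Sigma  <- cov(X)                           # sample covariance matrix
eig    <- eigen(Sigma, symmetric = TRUE)
Gamma  <- eig$vectors                      # loadings, one column per component
Lambda <- diag(eig$values)                 # variances of the components

# Scores: Y = (X - 1 xbar^t) Gamma
Y <- sweep(X, 2, xbar) %*% Gamma

# The decomposition reproduces Sigma, and the scores are uncorrelated
# with variances equal to the eigenvalues.
print(all.equal(Sigma, Gamma %*% Lambda %*% t(Gamma), check.attributes = FALSE))
print(all.equal(cov(Y), Lambda, check.attributes = FALSE))
```

Replacing cov() by a robust covariance estimate gives the robust variant of this approach; the projection pursuit procedure avoids the covariance matrix altogether.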

Maximizing Spread
[Figures: four panels, x and y axes from -1 to 5, each annotated with the values s and MAD]
• s = 0.62, MAD = 0.54
• s = 0.62, MAD = 0.46
• s = 0.63, MAD = 0.4
• s = 0.63, MAD = 0.32
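The quantity being maximized is the spread of the data projected onto a direction. A minimal base R sketch, assuming toy data generated here (not the data from the slides) and a hypothetical helper `proj_spread`; on data of this kind one would expect sd to favour the direction of the outliers and mad to favour the direction of the bulk of the data.

```r
# Spread of the data projected onto a direction a (normalised to length 1).
# proj_spread is a hypothetical helper, not a pcaPP function.
proj_spread <- function(x, a, center = colMeans(x), spread = sd) {
  a <- a / sqrt(sum(a^2))
  spread(drop(sweep(x, 2, center) %*% a))
}

set.seed(1)
clean <- cbind(rnorm(50), rnorm(50, sd = 0.2))        # elongated along the first axis
outl  <- rbind(clean, cbind(rep(0, 5), rep(10, 5)))   # five outliers far along the second axis

ex  <- c(1, 0)                                        # first axis
ey  <- c(0, 1)                                        # second axis (outlier direction)
med <- apply(outl, 2, median)                         # coordinate-wise median as center

# Classical spread (mean, sd) versus robust spread (median, mad)
sd_xy  <- c(x = proj_spread(outl, ex), y = proj_spread(outl, ey))
mad_xy <- c(x = proj_spread(outl, ex, med, mad), y = proj_spread(outl, ey, med, mad))
print(sd_xy)
print(mad_xy)
```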

Maximizing Spread
[Figure: panel annotated with s = 0.62, MAD = 0.54, x and y axes from -1 to 5]

PCAproj
[Figures: three scatter plots of the example data, x and y axes from 0 to 4]
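PCAproj evaluates a spread estimator over a finite set of candidate directions. The following base R sketch assumes the simplest candidate set, one direction through the center per observation, and no update step. The function `pp_pca_eachobs` and its arguments are hypothetical names, and the code is not the C implementation in pcaPP.

```r
# Projection pursuit PCA with one candidate direction per observation.
# A simplified sketch: coordinate-wise center, no update step.
pp_pca_eachobs <- function(x, k = 2, center = median, spread = mad) {
  x <- as.matrix(x)
  x <- sweep(x, 2, apply(x, 2, center))               # center the data
  loadings <- matrix(0, ncol(x), k)
  sdev <- numeric(k)
  for (j in seq_len(k)) {
    norms <- sqrt(rowSums(x^2))
    keep  <- norms > 1e-12 * max(norms)               # skip points at the center
    cand  <- x[keep, , drop = FALSE] / norms[keep]    # unit candidate directions
    s     <- apply(x %*% t(cand), 2, spread)          # spread along each candidate
    a     <- cand[which.max(s), ]
    loadings[, j] <- a
    sdev[j] <- max(s)
    x <- x - (x %*% a) %*% t(a)                       # remove direction a from the data
  }
  list(sdev = sdev, loadings = loadings)
}

res <- pp_pca_eachobs(swiss, k = 2)
print(res$sdev)
print(round(crossprod(res$loadings), 10))             # loadings are orthonormal
```

Because each chosen direction is projected out of the data before the next search, later candidates are automatically orthogonal to the earlier components.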

PCAproj
Candidate Directions:
• each data point
• additionally random directions through center
• additional directions by linear combinations of data points
• update algorithm (based on eigenvalues)
[Figures: scatter plots of the example data, x and y axes from 0 to 4]

PCAgrid
Grid Algorithm: Optimization is done on a regular grid in the plane.
• select two variables
• optimization on the grid
• select other variables
• . . .
[Figure: scatter plot of the example data, x and y axes from 0 to 4]

Implementation
• Implementation in C
• Wrapping functions
  – PCAproj (x, k = 2, method = c("sd", "mad", "qn"), CalcMethod = c("eachobs", "lincomb", "sphere"), nmax = 1000, update = TRUE, scores = TRUE, maxit = 5, maxhalf = 5, control, ...)
  – PCAgrid (x, k = 2, method = c("sd", "mad", "qn"), maxiter = 10, splitcircle = 10, scores = TRUE, anglehalving = TRUE, fact2dim = 10, control, ...)
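The grid algorithm's inner step, optimization on a regular grid in the plane, can be sketched as follows for two variables. This is a base R illustration under assumptions: the function name `grid_direction_2d` is hypothetical, the arguments `splitcircle` and `maxiter` only loosely mirror the PCAgrid parameters of the same name, and the cycling over further pairs of variables is left out.

```r
# Search for the direction in the plane that maximises a spread estimator.
# The angle is refined on a regular grid whose width is halved in each pass.
grid_direction_2d <- function(z, spread = mad, splitcircle = 10, maxiter = 10) {
  theta <- 0
  width <- pi                                   # directions a and -a are equivalent
  for (it in seq_len(maxiter)) {
    angles <- theta + seq(-width / 2, width / 2, length.out = splitcircle + 1)
    s <- sapply(angles, function(t) spread(drop(z %*% c(cos(t), sin(t)))))
    theta <- angles[which.max(s)]
    width <- width / 2                          # angle halving
  }
  c(cos(theta), sin(theta))
}

# Toy data spread mainly along the diagonal (1, 1)
t <- seq(-1, 1, length.out = 41)
z <- cbind(t + 0.1 * cos(5 * t), t - 0.1 * cos(5 * t))
a <- grid_direction_2d(z, spread = sd)
print(a)
```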

Common Parameters
• x : Data matrix (data frame)
• k : Number of principal components
• method : Spread estimator for projection pursuit
• scores : Return scores-matrix?
• control : Control-structure
• ... : Passed to ScaleAdv

PCAproj - Individual Parameters
• CalcMethod : "eachobs" , "lincomb" or "sphere"
• nmax : Max directions to search in each step (for "lincomb" or "sphere" )
• update : Perform update steps?
  – maxhalf : Maximum number of steps for angle halving
  – maxit : Maximum number of iterations

PCAgrid - Individual Parameters
• splitcircle : Number of directions
• anglehalving : Perform anglehalving
• fact2dim : Behavior in 2 dimensional case.
• maxiter : Maximum number of iterations.

Return Structure
• (S3) class pcaPP derived from princomp :
  – sdev : Spread of principal components
  – loadings : Matrix containing the loadings
  – center : Center applied to the data matrix
  – scale : Scale applied to the data matrix
  – n.obs : Number of observations
  – scores : Matrix containing the scores
  – call : Function call
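A short usage sketch for the parameters and return structure listed above. It assumes that the pcaPP package is installed and that PCAgrid accepts the arguments shown on the slides; the call is skipped when the package is not available.

```r
# Assumes the pcaPP package is installed; skipped otherwise.
if (requireNamespace("pcaPP", quietly = TRUE)) {
  res <- pcaPP::PCAgrid(as.matrix(swiss), k = 2, method = "mad")
  print(class(res))            # S3 class of the result
  print(res$sdev)              # spread of the components
  print(dim(res$loadings))     # dimensions of the loadings matrix
  print(head(res$scores, 3))   # scores of the first three observations
}
```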

Additional Functions
• l1median(X, MaxStep = 200, ItTol = 10^-8)
  Robust center estimator
• qn(x)
  Robust scale estimator
• ScaleAdv(x, center = mean, scale = sd)
  Advanced scaling method (takes functions or vectors as input values)

Robust Covariance Estimation
• Robust covariance estimation based on PCs
  $\hat{\Sigma} = \hat{\Gamma} \hat{\Lambda} \hat{\Gamma}^t$
• covPCAproj(x, control)
• covPCAgrid(x, control)
• covPC (x, k, method) (under construction . . . )

Example
  > library(pcaPP)
  > data(swiss)
  > result = PCAproj(swiss, k = 6, method = "mad")
  > summary(result)
  Importance of components:
                             Comp.1     Comp.2      Comp.3     Comp.4      Comp.5
  Standard deviation     44.1005199 41.0302723 17.09114152 6.92022550 4.619893062
  Proportion of Variance  0.4859749  0.4206639  0.07299087 0.01196649 0.005333229
  Cumulative Proportion   0.4859749  0.9066387  0.97962962 0.99159611 0.996929342
                              Comp.6
  Standard deviation     3.505520822
  Proportion of Variance 0.003070658
  Cumulative Proportion  1.000000000

Example
  > screeplot(result)
[Figure: Scree-plot, variances (axis from 0 to 1500) of Comp.1 to Comp.6]
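The robust center estimator l1median listed under Additional Functions computes an L1-median (spatial median). A minimal base R sketch of one common way to compute it, Weiszfeld iterations, is given below. This is an assumption about the general technique, not pcaPP's implementation; `l1_median_sketch`, `max_step` and `tol` are hypothetical names that only echo MaxStep and ItTol.

```r
# Spatial (L1) median by Weiszfeld iterations: the point minimising the
# sum of Euclidean distances to the observations.
l1_median_sketch <- function(x, max_step = 200, tol = 1e-8) {
  x <- as.matrix(x)
  m <- apply(x, 2, median)                     # start at the coordinate-wise median
  for (i in seq_len(max_step)) {
    d <- sqrt(rowSums(sweep(x, 2, m)^2))       # distances to the current estimate
    w <- 1 / pmax(d, .Machine$double.eps)      # guard against division by zero
    m_new <- colSums(x * w) / sum(w)           # weighted mean with weights 1/d
    if (sqrt(sum((m_new - m)^2)) < tol) return(m_new)
    m <- m_new
  }
  m
}

# Four points in convex position: the spatial median is the intersection
# of the diagonals, here (1.5, 1.5).
quad <- rbind(c(0, 0), c(6, 0), c(4, 4), c(0, 2))
print(l1_median_sketch(quad))
```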

Example
  > biplot(result)
[Figure: Biplot of Comp.1 against Comp.2; the observations are labelled with the Swiss district names, together with the variables Fertility, Agriculture, Examination, Education, Catholic and Infant.Mortality]

Covariance Estimation
  > library (covrob)
  > covswiss.mad <- covrob (swiss, method="covPCAproj", control = list (k=6,method="mad"))
  > covswiss.sd <- covrob (swiss, method="covPCAproj", control = list (k=6,method="sd"))
  > plot (covswiss.mad, covswiss.sd)

Covariance Estimation
[Figure: comparison plot of the two covariance estimates for the variables Fertility, Agriculture, Examination, Education, Catholic and Infant.Mortality. Legend: Robust cov-estimation based on PCs (projection mode - sd); Robust cov-estimation based on PCs (projection mode - mad)]
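The formula behind covariance estimation based on PCs, $\hat{\Sigma} = \hat{\Gamma} \hat{\Lambda} \hat{\Gamma}^t$, can be checked with classical PCA in base R. The helper `cov_from_pcs` is a hypothetical name; presumably the robust estimators plug robust loadings and spreads into the same formula, but that is an assumption and this sketch only verifies the formula itself.

```r
# Covariance matrix from PCA output: Sigma = Gamma Lambda Gamma^t with
# Lambda = diag(sdev^2). Works with any result that has $sdev and $loadings.
cov_from_pcs <- function(pc) {
  G <- unclass(pc$loadings)
  S <- G %*% diag(pc$sdev^2, nrow = length(pc$sdev)) %*% t(G)
  dimnames(S) <- list(rownames(G), rownames(G))
  S
}

# Check with classical PCA (stats::princomp). All six components are kept,
# so the result equals the covariance matrix used by princomp, which has
# divisor n rather than n - 1.
pc <- princomp(swiss)
n  <- nrow(swiss)
print(all.equal(cov_from_pcs(pc), cov(swiss) * (n - 1) / n))
```

With fewer than all components (k smaller than the number of variables), the same formula gives a reduced-rank approximation of the covariance matrix.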
