Reduced-Rank Singular Value Decomposition for Dimension Reduction

Reduced-Rank Singular Value Decomposition for Dimension Reduction with High-Dimensional Data
Maxime Turgeon
June 12th, 2017
McGill University, Department of Epidemiology, Biostatistics, and Occupational Health
1/17

Acknowledgements
• Stepan Grinek (BC Cancer Agency)
• Celia Greenwood (McGill University)
• Aurélie Labbe (HEC Montréal)
2/17

Introduction
• Modern genomics brings an abundance of high-dimensional, correlated measurements $Y$.
• We are interested in describing the relationship between such a $Y$ and a set of covariates $X$.
• Our approach is to summarise this relationship using the largest root $\lambda$ of a double Wishart problem: $\det(A - \lambda(A + B)) = 0$.
3/17

Double Wishart Problem
There are many well-known examples:
• Multivariate Analysis of Variance (MANOVA);
• Canonical Correlation Analysis (CCA);
• Testing for independence of two multivariate samples;
• Testing for the equality of covariance matrices of two independent samples from multivariate normal distributions;
• Principal Component of Explained Variance (PCEV).
4/17

Main contribution In this work: 1. We explain how to solve the double Wishart problem in a high-dimensional setting. 2. We provide a heuristic for assessing the significance of the largest root of the determinantal equation. In what follows, we illustrate this approach using PCEV, but it is applicable to any double Wishart problem (e.g. CCA). 5/17

Methods

PCEV: Statistical model
We assume a linear relationship: $Y = \beta^T X + \varepsilon$. The total variance of the outcome can then be decomposed as $\mathrm{Var}(Y) = \mathrm{Var}(\beta^T X) + \mathrm{Var}(\varepsilon) = V_M + V_R$.
6/17
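
The decomposition above has a direct sample analogue. The following is a minimal sketch, assuming ordinary least squares on centred data and a divisor of n; the slides do not specify the estimators, and the function name `variance_components` is hypothetical.

```python
import numpy as np

def variance_components(Y, X):
    """Sample analogues of V_M and V_R from a least-squares fit of Y on X.

    Assumption: OLS on centred data with divisor n (not stated on the slides).
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(Y), -1)
    Yc = Y - Y.mean(axis=0)          # centre the outcomes
    Xc = X - X.mean(axis=0)          # centre the covariates
    coef, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    fitted = Xc @ coef               # part of Y explained by X
    resid = Yc - fitted              # residual part
    n = Y.shape[0]
    V_M = fitted.T @ fitted / n      # model variance component
    V_R = resid.T @ resid / n        # residual variance component
    return V_M, V_R
```

Because fitted values and residuals are orthogonal, `V_M + V_R` equals the sample covariance of the centred outcomes under these assumptions.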

PCEV: Statistical model
The PCEV framework seeks a linear combination $w^T Y$ such that the proportion of variance explained by $X$ is maximised; this proportion is defined as the following Rayleigh quotient: $h(w) = \frac{w^T V_M w}{w^T (V_M + V_R) w}$. For the corresponding Wishart problem, we have $A = V_M$, $B = V_R$. We also have $\lambda = \max_w h(w)$.
7/17
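
The step from the Rayleigh quotient to the determinantal equation can be written out. A short derivation, using the shorthand $\Sigma = V_M + V_R$ (introduced here for brevity, not on the slides) and assuming $V_M$ and $\Sigma$ are symmetric with $w^T \Sigma w > 0$:

```latex
\begin{align*}
\nabla_w h(w)
  &= \frac{2\, V_M w \,(w^T \Sigma w) - 2\,(w^T V_M w)\, \Sigma w}{(w^T \Sigma w)^2} = 0 \\
\Longrightarrow \quad V_M w &= h(w)\, \Sigma w \\
\Longrightarrow \quad \bigl( V_M - \lambda (V_M + V_R) \bigr) w &= 0,
  \qquad \lambda = h(w), \; w \neq 0 \\
\Longrightarrow \quad \det\bigl( V_M - \lambda (V_M + V_R) \bigr) &= 0 .
\end{align*}
```

Every stationary value of $h$ is therefore a root of the determinantal equation with $A = V_M$ and $B = V_R$, and the maximum of $h$ is the largest root.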

Singular Value Decomposition
From the theory of SVD, we know there exists an orthogonal matrix $T$ such that $D := T^T (V_R + V_M) T$ is diagonal. When $p > n$, the diagonal matrix $D$ is singular, with rank $r < p$.
Solution: Focus only on the nonzero diagonal elements.
8/17

Reduced-Rank SVD
Let $\tilde{T} = T_{[r]} D_{[r]}^{-1/2}$. Therefore we get: $\tilde{T}^T (V_R + V_M) \tilde{T} = I_r$.
Similarly, we can diagonalise $\tilde{T}^T V_M \tilde{T}$ via an orthogonal transformation $S$: $S^T \left( \tilde{T}^T V_M \tilde{T} \right) S = \Lambda$.
The largest root $\lambda$ of the double Wishart problem is the largest element on the diagonal of $\Lambda$.
Note: the vector $w$ maximising the proportion of variance $h(w)$ is the column of $\tilde{T} S$ corresponding to the largest root.
9/17
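
The two diagonalisation steps can be sketched in NumPy. This is an illustration under stated assumptions rather than the authors' implementation: the function name `largest_root`, the relative tolerance `tol` used to decide which diagonal elements count as nonzero, and the use of `numpy.linalg.eigh` for both orthogonal diagonalisations are choices made here.

```python
import numpy as np

def largest_root(V_M, V_R, tol=1e-10):
    """Largest root of det(V_M - lambda (V_M + V_R)) = 0, reduced-rank version.

    Assumptions: V_M and V_R are symmetric positive semi-definite; `tol` is a
    hypothetical relative cut-off for "nonzero" diagonal elements of D.
    """
    total = V_M + V_R
    # Orthogonal T with T^T (V_R + V_M) T = D diagonal (eigenvalues ascending).
    d, T = np.linalg.eigh(total)
    keep = d > tol * d.max()                   # nonzero diagonal elements only
    # T_tilde = T_[r] D_[r]^{-1/2}: rescale each retained column.
    T_tilde = T[:, keep] / np.sqrt(d[keep])
    # Orthogonal S with S^T (T_tilde^T V_M T_tilde) S = Lambda.
    reduced = T_tilde.T @ V_M @ T_tilde
    lam, S = np.linalg.eigh(reduced)
    w = T_tilde @ S[:, -1]                     # column of T_tilde S for the largest root
    return lam[-1], w

# Small synthetic example with a singular V_M + V_R (rank 5 in dimension 10).
rng = np.random.default_rng(1)
E = rng.standard_normal((10, 5))
C = rng.standard_normal((5, 2))
A_demo = E @ C @ C.T @ E.T
B_demo = E @ E.T
lam_demo, w_demo = largest_root(A_demo, B_demo)
```

In this synthetic example the returned `w_demo` satisfies $h(w) = \lambda$, which gives a quick sanity check on the output.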

Inference
There is evidence in the literature that the null distribution of the largest root $\lambda$ should be related to the Tracy-Widom distribution.
• Johnstone: $(\log(\lambda) - \mu)/\sigma \to TW$ when $p < n$.
• Turgeon et al.: The null distribution of $\lambda$ is asymptotically the same as the largest root of a scaled Wishart.
• The null distribution of the largest root of a Wishart is also related to $TW$.
• More generally, random matrix theory suggests that the Tracy-Widom distribution is key in central-limit-like theorems for random matrices.
10/17

Inference – Heuristic
Estimate the null distribution:
1. Perform a small number of permutations (∼25) on the rows of $Y$.
2. For each permutation, compute the largest root statistic.
3. Fit a location-scale variant of the Tracy-Widom distribution.
Numerical investigations support this approach for computing p-values. The main advantage over a traditional permutation strategy is the computation time.
11/17
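
The three steps can be sketched as follows. Several things here are assumptions, not taken from the slides: the fit uses a method-of-moments match to the Tracy-Widom law of order 1 (the slides do not say how the fit is done or which order is used), the function name `fit_tw_location_scale` is hypothetical, and `statistic` is any user-supplied function returning the largest root statistic on whatever scale is being fitted. Turning the fit into a p-value needs a Tracy-Widom CDF, which is left out here.

```python
import numpy as np

# Mean and standard deviation of the standard Tracy-Widom law of order 1.
TW1_MEAN = -1.2065335745820
TW1_SD = 1.607781034581 ** 0.5

def fit_tw_location_scale(Y, X, statistic, n_perm=25, seed=0):
    """Permute rows of Y, recompute the statistic, and fit mu + sigma * TW1.

    Assumption: method-of-moments fit (swapped in for illustration).
    """
    rng = np.random.default_rng(seed)
    null_stats = np.empty(n_perm)
    for b in range(n_perm):
        perm = rng.permutation(Y.shape[0])       # step 1: permute the rows of Y
        null_stats[b] = statistic(Y[perm], X)    # step 2: statistic under the null
    # Step 3: match the first two moments of mu + sigma * TW1.
    sigma = null_stats.std(ddof=1) / TW1_SD
    mu = null_stats.mean() - sigma * TW1_MEAN
    return mu, sigma, null_stats
```

With only about 25 permutations the empirical tail is too coarse for small p-values, which is why a parametric fit to the permuted statistics is attractive.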

Simulations

Simulation setting
• We compared 4 different approaches:
  • PCEV with reduced-rank SVD
  • Lasso
  • Elastic net
  • Principal Component Regression
• We simulated $p = 500, 750, \ldots, 2000$ outcomes, 100 observations, one binary covariate.
• Covariance structure is block-diagonal:
  • 10 uncorrelated blocks of equal size
  • Within block is autoregressive (with parameter $\rho$) with baseline correlation $\alpha$
• 25% of the outcomes in each block are associated with the covariate, with a fixed effect size of 0.333.
12/17
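
A data generator in the spirit of this setting can be sketched as below. The slides do not say how $\rho$ and $\alpha$ combine, so the within-block correlation $\alpha + (1 - \alpha)\rho^{|i-j|}$ is an assumption made purely for illustration, as are unit variances, Gaussian noise, placing the associated outcomes first in each block, rounding 25% down, and the function names.

```python
import numpy as np

def block_ar_correlation(p, n_blocks=10, rho=0.5, alpha=0.2):
    """Block-diagonal correlation matrix with equal-sized, uncorrelated blocks.

    Assumption: within-block correlation is alpha + (1 - alpha) * rho**|i - j|.
    """
    size = p // n_blocks
    idx = np.arange(size)
    ar = rho ** np.abs(idx[:, None] - idx[None, :])   # AR(1) pattern
    block = alpha + (1 - alpha) * ar                  # add baseline correlation
    return np.kron(np.eye(n_blocks), block)

def simulate(p=500, n=100, n_blocks=10, rho=0.5, alpha=0.2, effect=0.333, seed=0):
    """Simulate n observations of p outcomes and one binary covariate."""
    rng = np.random.default_rng(seed)
    corr = block_ar_correlation(p, n_blocks, rho, alpha)
    x = rng.integers(0, 2, size=n)                    # binary covariate
    size = p // n_blocks
    n_assoc = size // 4                               # 25% of each block, rounded down
    beta = np.zeros(p)
    for b in range(n_blocks):
        beta[b * size : b * size + n_assoc] = effect  # assumed: first outcomes in block
    chol = np.linalg.cholesky(corr)
    noise = rng.standard_normal((n, p)) @ chol.T      # correlated Gaussian noise
    Y = np.outer(x, beta) + noise
    return Y, x, beta
```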

Simulation results: Power analysis
[Figure: power (0 to 1) as a function of p (500 to 2000) for each method (Enet, Lasso, PCEV, PCR); panels by alpha (0, 0.2) and rho (0, 0.2, 0.5, 0.7).]
13/17

Data analysis

Data
• DNA methylation measured with Illumina 450k on 120 cell-separated samples
• We focus on monocytes only.
• 18 controls; 35 rheumatoid arthritis, 24 lupus, 43 scleroderma
• We group CpGs by KEGG pathways
• On average about 1500 CpGs per pathway; max of 21,800.
• We compare PCEV to Lasso and Elastic net.
14/17

Results
[Figure: proportion of selected CpGs (0 to 1) against PCEV p-value (−log10 scale, 0 to 6), by penalty (Enet, Lasso).]
15/17

Results
Pathway                                 PCEV p-value              Lasso Prop.   Enet Prop.
Vitamin B6 metabolism                   $< 3.4 \times 10^{-8}$    0.12          0.32
Primary bile acid biosynthesis          $< 3.4 \times 10^{-8}$    0.10          0.28
Fatty acid biosynthesis                 $< 3.4 \times 10^{-8}$    0.07          0.25
Ascorbate and aldarate metabolism       $< 3.4 \times 10^{-8}$    0.10          0.24
Steroid biosynthesis                    $< 3.4 \times 10^{-8}$    0.08          0.22
Glycosphingolipid biosynthesis          $< 3.4 \times 10^{-8}$    0.06          0.21
Histidine metabolism                    $< 3.4 \times 10^{-8}$    0.07          0.20
Thiamine metabolism                     $< 3.4 \times 10^{-8}$    0.10          0.19
Folate biosynthesis                     $< 3.4 \times 10^{-8}$    0.10          0.19
Other types of O-glycan biosynthesis    $< 3.4 \times 10^{-8}$    0.09          0.19
16/17

Conclusion
• Data summary is an important feature in data analysis, and this can be achieved using dimension reduction techniques.
• In a high-dimensional setting, estimation and inference are more challenging:
  • Estimation: Reduced-rank SVD;
  • Inference: Fitted location-scale Tracy-Widom.
• Our approach is computationally simple and provides good power.
• Simulations and data analyses confirm its advantage over a more traditional approach using PCA, as well as other high-dimensional approaches such as Lasso and Elastic net regression.
17/17

Questions or comments? For more information and updates, visit maxturgeon.ca. 17/17

