SLIDE 1

Nonlinear Dimension Reduction to Improve Predictive Accuracy in Genomic and Neuroimaging Studies

Maxime Turgeon June 5, 2018

McGill University Department of Epidemiology, Biostatistics, and Occupational Health 1/21

SLIDE 2

Acknowledgements

This (ongoing) work has been done under the supervision of:

  • Celia Greenwood (McGill University)
  • Aurélie Labbe (HEC Montréal)

SLIDE 3

Motivation

  • Modern genomics and neuroimaging bring an abundance of high-dimensional, correlated measurements X.
  • We are interested in predicting a clinical outcome Y based on the observed covariates X.
  • However, the collected data typically contain thousands of covariates, whereas the sample size is at most a few hundred.
  • We also want to capture the potentially complex, nonlinear associations between X and Y, and among the covariates themselves.

SLIDE 4

Motivation

  • With a low to medium signal-to-noise ratio, the information contained in the data should be used sparingly.
  • Moreover, from a clinical perspective, we need to account for the possibility of similar clinical profiles leading to different outcomes.
  • We want prediction, not classification.

SLIDE 5

Proposed approach

This work investigates the properties of the following approach:

  • Let X be p-dimensional and Y binary.
  • Using nonlinear dimension reduction methods, extract K components L̂1, . . . , L̂K.
  • Predict Y using a logistic regression model of the form

    logit(E(Y | L̂1, . . . , L̂K)) = β0 + β1 L̂1 + · · · + βK L̂K.
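The two-stage approach above can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the choice of Isomap as the NLDR method, K = 2, the `n_neighbors` setting, and the outcome rule on sklearn's built-in Swiss roll are all assumptions made for the example.

```python
# Sketch of the proposed approach: extract nonlinear components, then fit
# a logistic regression of Y on them.
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.linear_model import LogisticRegression
from sklearn.manifold import Isomap

X, t = make_swiss_roll(n_samples=300, random_state=0)
Y = (t > np.median(t)).astype(int)        # a binary outcome driven by the latent t

# Step 1: extract K nonlinear components L_hat from the observed covariates.
K = 2
L_hat = Isomap(n_neighbors=10, n_components=K).fit_transform(X)

# Step 2: logistic regression of Y on the extracted components.
model = LogisticRegression().fit(L_hat, Y)
probs = model.predict_proba(L_hat)[:, 1]  # fitted P(Y = 1 | L_hat)
```

Any of the NLDR methods discussed later can be swapped in for Isomap in step 1.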

SLIDE 6

Nonlinear dimension reduction

SLIDE 7

General principle

  • In PCA and ICA, we learn a linear transformation from the latent structure to the observed variables (and back).
  • On the other hand, nonlinear dimension reduction (NLDR) methods try to learn the manifold underlying the latent structure.
  • NLDR methods are non-generative, i.e. they do not learn the transformation.
  • The main approach: preserve local structures in the data.

SLIDE 8

Multidimensional Scaling

  • Main principle: manifolds can be described by pairwise distances.
  • Let D = (dij) be the matrix of pairwise distances for the observed values X1, . . . , Xn.
  • The goal is to find L1, . . . , Ln in a lower-dimensional space such that

    ( Σ_{i≠j} (dij − ‖Li − Lj‖)² )^{1/2}

    is minimized.
  • The objective function can also be weighted in such a way that preserving small distances is prioritized.
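The objective above (the stress) can be computed by hand once an embedding is in place. A minimal sketch, with the embedding itself coming from `sklearn.manifold.MDS`; the toy data and seed are arbitrary.

```python
# Compute the unweighted MDS stress from the slide on toy data.
import numpy as np
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))     # observed points X_1, ..., X_n
D = pairwise_distances(X)        # pairwise distances d_ij

# Embed into a 2-dimensional space.
L = MDS(n_components=2, random_state=1).fit_transform(X)
D_low = pairwise_distances(L)    # ||L_i - L_j|| in the low-dimensional space

# Stress: sqrt of the sum over i != j of (d_ij - ||L_i - L_j||)^2.
stress = np.sqrt(np.sum((D - D_low) ** 2))
```

A weighted variant would multiply each squared term by a weight w_ij that decreases with d_ij, so that small distances dominate the objective.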

SLIDE 9

Other methods

Other methods that are considered in this work:

  • Isomap;
  • Laplacian Eigenmaps (Spectral Embedding, SE);
  • kernel PCA;
  • Locally Linear Embedding (LLE);
  • t-distributed Stochastic Neighbour Embedding (t-SNE).

All methods are implemented in the Python module scikit-learn.
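For reference, these are the corresponding scikit-learn classes (the class names are real; the `n_components` and kernel settings shown are arbitrary choices for the example).

```python
# The NLDR methods above, as implemented in scikit-learn.
from sklearn.decomposition import KernelPCA
from sklearn.manifold import (Isomap, LocallyLinearEmbedding,
                              SpectralEmbedding, TSNE)

methods = {
    "isomap": Isomap(n_components=2),
    "se": SpectralEmbedding(n_components=2),   # Laplacian Eigenmaps
    "kpca": KernelPCA(n_components=2, kernel="rbf"),
    "lle": LocallyLinearEmbedding(n_components=2),
    "tsne": TSNE(n_components=2),
}
```

Each estimator exposes the same `fit_transform(X)` interface, which makes it easy to loop over all of them in a simulation study.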

SLIDE 10

Simulations

SLIDE 11

General framework

[Diagram: the latent variables L1, . . . , LK generate both the observed covariates X1, . . . , Xp and the outcome Y.]
SLIDE 12

Performance metrics

We want to measure two key properties:

  1. Calibration: using the Brier score (lower is better);
  2. Discrimination: using the AUROC (higher is better).
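Both metrics are available in scikit-learn. A quick sketch on made-up labels and probabilities, purely for illustration:

```python
# Calibration (Brier score) and discrimination (AUROC) on toy predictions.
from sklearn.metrics import brier_score_loss, roc_auc_score

y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

brier = brier_score_loss(y_true, y_prob)   # mean squared error of probabilities
auroc = roc_auc_score(y_true, y_prob)      # P(score of a case > score of a control)
```

Note that the Brier score rewards well-calibrated probabilities, while the AUROC only depends on the ranking of the predictions; this is exactly why both are needed.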

SLIDE 13
1. Swiss roll

  • We first generate two uniform variables L1 ∼ U(0, 10) and L2 ∼ U(−1, 1).
  • We then generate a binary outcome Y:

    logit(E(Y | L1, L2)) = −5 + L1 − L2.

  • Finally, we generate three covariates X1, X2, X3:

    (X1, X2, X3) = (L1 cos(L1), L2, L1 sin(L1)).

  • We fix n = 500 and repeat the simulation B = 250 times.
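One replicate of this simulation can be sketched as follows (numpy only; the seed is arbitrary):

```python
# Generate one replicate of the Swiss roll simulation.
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Latent variables: L1 ~ U(0, 10), L2 ~ U(-1, 1).
L1 = rng.uniform(0, 10, size=n)
L2 = rng.uniform(-1, 1, size=n)

# Binary outcome: logit(E(Y | L1, L2)) = -5 + L1 - L2.
p = 1 / (1 + np.exp(-(-5 + L1 - L2)))
Y = rng.binomial(1, p)

# Observed covariates: the latent plane rolled up into three dimensions.
X = np.column_stack([L1 * np.cos(L1), L2, L1 * np.sin(L1)])
```

Repeating this B = 250 times, fitting each method on each replicate, yields the distributions of the Brier score and AUROC reported below.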

SLIDE 14
1. Swiss roll

[Figure: the simulated Swiss roll data in three dimensions.]
SLIDE 15
1. Swiss roll

We compared 10 approaches:

  1. Oracle: logistic regression with L1, L2 (i.e. the true model);
  2. Baseline: logistic regression with X1, X2, X3;
  3. Classical linear methods: PCA, ICA;
  4. Manifold learning methods: kernel PCA, Multidimensional Scaling (MDS), Isomap, Locally Linear Embedding (LLE), Spectral Embedding (SE), and t-distributed Stochastic Neighbour Embedding (tSNE).

SLIDE 16
1. Swiss roll – Results

[Figure: boxplots of the Brier score and AUROC for each method: baseline, pca, ica, kpca, mds, lle, isomap, se, tsne, and oracle.]
SLIDE 17
2. Random quadratic forms

  • We first generate K latent variables L1, . . . , LK.
  • All p covariates are generated as random quadratic forms of the latent variables:
    1. Select a random subset of the K latent variables (e.g. L1 and L4).
    2. Form all possible quadratic combinations of the selected variables (e.g. L1², L1L4, L4²).
    3. Sample coefficients from a standard normal and sum all terms (e.g. Xi = −0.5L1² − 0.1L1L4 + 0.7L4²).
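The covariate-generation recipe can be sketched for a single covariate as follows (K, the subset size of 2, and the seed are arbitrary choices for this example):

```python
# Generate one covariate as a random quadratic form of the latent variables.
import numpy as np
from itertools import combinations_with_replacement

rng = np.random.default_rng(7)
K, n = 5, 100
L = rng.normal(size=(n, K))      # latent variables L_1, ..., L_K

# 1. Select a random subset of the latent variables (e.g. L_1 and L_4).
subset = rng.choice(K, size=2, replace=False)

# 2. Form all quadratic combinations of the selected variables
#    (for a pair {a, b}: L_a^2, L_a L_b, L_b^2).
terms = [L[:, a] * L[:, b] for a, b in combinations_with_replacement(subset, 2)]

# 3. Sample standard-normal coefficients and sum all terms.
coefs = rng.normal(size=len(terms))
X_i = sum(c * t for c, t in zip(coefs, terms))
```

Repeating this for each of the p covariates yields a design matrix that is a highly nonlinear function of the latent variables.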

SLIDE 18
2. Random quadratic forms

  • The association between Y and L1, . . . , L5 is defined via

    logit(E(Y | L1, . . . , L5)) = β1L1 + · · · + β5L5, where βi = (−1)^i · 2/√5.

  • The sample size varies as n = 100, 150, 250, 300.
  • The distribution of the covariates:
    • standard normal;
    • folded standard normal;
    • exponential with mean 1.
  • The simulation was repeated B = 50 times.

SLIDE 19
2. Random quadratic forms

We compared 12 approaches:

  1. Oracle: logistic regression with only the first five covariates (i.e. the true model);
  2. Baseline: logistic regression with all p variables;
  3. Lasso regression using all p variables;
  4. Elastic-net regression using all p variables;
  5. Classical methods and nonlinear extensions: PCA, ICA, kernel PCA, and Multidimensional Scaling (MDS);
  6. Manifold learning methods: Isomap, Locally Linear Embedding (LLE), Spectral Embedding (SE), and t-distributed Stochastic Neighbour Embedding (tSNE).

SLIDE 20
2. Random quadratic forms – Results

[Figure: AUROC and Brier score versus the number of covariates (10 to 50), for methods including baseline, enet, ica, isomap, lasso, lle, mds, pca, se, and oracle.]
SLIDE 21

Discussion

slide-22
SLIDE 22

Summary

  • The Swiss roll example shows that manifold learning methods recover the latent structure, which leads to good predictive performance.
  • The random quadratic form example shows that highly complex models can lead to worse performance than classical PCR.
  • NLDR methods have known limitations:
    • trouble with manifolds with non-trivial homology (holes and self-intersections);
    • sensitivity to the choice of neighbourhoods.
  • Where is the boundary between both regimes?

SLIDE 23

Theoretical results

  • Whitney’s and Nash’s embedding theorems guarantee that any (smooth or Riemannian) manifold can be embedded without intersections in a Euclidean space of high enough dimension.
  • Johnson–Lindenstrauss lemma: we can project high-dimensional data points and preserve pairwise distances if the dimension of the lower space is high enough.
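The Johnson–Lindenstrauss bound is directly available in scikit-learn. A quick sketch (the sample size and distortion level below are arbitrary):

```python
# Minimum target dimension that preserves all pairwise distances of
# n points to within a (1 +/- eps) factor, per the JL lemma.
from sklearn.random_projection import johnson_lindenstrauss_min_dim

k = johnson_lindenstrauss_min_dim(n_samples=1000, eps=0.1)
```

Notably, the bound depends only on the number of points and the tolerated distortion, not on the original dimension.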

SLIDE 24

Final remarks

  • Where does nature fit in all this? What kind of latent structures may underlie neuroimaging or genomic data?
  • Future work: find a low-dimensional example with low performance, and a high-dimensional example with good performance.
  • The latter implies finding a way to generate a high-dimensional structure with no self-intersection.

SLIDE 25

Questions or comments? For more information and updates, visit maxturgeon.ca.
