

SLIDE 1

Geometric perspectives for supervised dimension reduction

A Tale of Two Manifolds

S. Mukherjee, K. Mao, F. Liang, Q. Wu, D.-X. Zhou, J. Guinney

Department of Statistical Science, Institute for Genome Sciences & Policy, Department of Computer Science, and Department of Mathematics, Duke University

December 11, 2009


SLIDE 3

Supervised dimension reduction

Information and sufficiency

A fundamental idea in statistical thought is to reduce data to relevant information. This was the paradigm of R. A. Fisher (beloved Bayesian) and goes back at least to Adcock 1878 and Edgeworth 1884. For example, X₁, ..., Xₙ drawn iid from a Gaussian can be reduced to (µ, σ²).


SLIDE 5

Supervised dimension reduction

Regression

Assume the model Y = f(X) + ε, E[ε] = 0, with X ∈ X ⊂ R^p and Y ∈ R.

Data: D = {(x_i, y_i)}_{i=1}^n iid ∼ ρ(X, Y).


SLIDE 7

Supervised dimension reduction

Dimension reduction

If the data live in a p-dimensional space X ∈ R^p, replace X with Θ(X) ∈ R^d, d ≪ p.

My belief: physical, biological and social systems are inherently low dimensional, and the variation of interest in these systems can be captured by a low-dimensional submanifold.


SLIDE 9

Supervised dimension reduction

Supervised dimension reduction (SDR)

Given response variables Y₁, ..., Yₙ ∈ R and explanatory variables or covariates X₁, ..., Xₙ ∈ X ⊂ R^p with

Y_i = f(X_i) + ε_i,  ε_i iid ∼ No(0, σ²).

Is there a submanifold S ≡ S_{Y|X} such that Y ⊥⊥ X | P_S(X)?

SLIDE 10

Supervised dimension reduction

Visualization of SDR

[Figure: (a) the data in (x, y, z); (b) diffusion map; (c) GOP; (d) GDM; each embedding is shown in two dimensions.]


SLIDE 12

Supervised dimension reduction

Linear projections capture nonlinear manifolds

In this talk P_S(X) = BᵀX where B = (b₁, ..., b_d).

Semiparametric model:

Y_i = f(X_i) + ε_i = g(b₁ᵀX_i, ..., b_dᵀX_i) + ε_i,

where span B is the dimension reduction (d.r.) subspace.


SLIDE 14

Learning gradients

SDR model

Semiparametric model:

Y_i = f(X_i) + ε_i = g(b₁ᵀX_i, ..., b_dᵀX_i) + ε_i,

where span B is the dimension reduction (d.r.) subspace. Assume the marginal distribution ρ_X is concentrated on a manifold M ⊂ R^p of dimension d ≪ p.


SLIDE 16

Learning gradients

Gradients and outer products

Given a smooth function f the gradient is

∇f(x) = ( ∂f(x)/∂x₁, ..., ∂f(x)/∂x_p )ᵀ.

Define the gradient outer product (GOP) matrix Γ with entries

Γ_ij = ∫_X (∂f/∂x_i)(x) (∂f/∂x_j)(x) dρ_X(x),

i.e. Γ = E[(∇f) ⊗ (∇f)].



SLIDE 19

Learning gradients

GOP captures the d.r. space

Suppose y = f(X) + ε = g(b₁ᵀX, ..., b_dᵀX) + ε.

Note that for B = (b₁, ..., b_d) the columns are eigenvectors of Γ: λ_i b_i = Γ b_i. For any direction v orthogonal to span B the directional derivative vanishes,

∂f(x)/∂v = vᵀ∇f(x) = 0  ⇒  vᵀΓv = 0,

so if w ⊥ b_i for all i then wᵀΓw = 0.
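As a concrete illustration, the sketch below (assuming analytic gradients of a toy single-index model, not the gradient estimator introduced later in the talk) builds the empirical GOP and recovers the d.r. direction from its top eigenvector.

```python
# A minimal sketch: empirical GOP from known gradients of f(x) = sin(b1^T x),
# for which ∇f(x) = cos(b1^T x) * b1, so Γ has rank one with eigenvector b1.
import numpy as np

rng = np.random.default_rng(0)
p, n = 10, 500
b1 = np.zeros(p); b1[0] = 1.0             # true d.r. direction (assumed)
X = rng.normal(size=(n, p))

grads = np.cos(X @ b1)[:, None] * b1[None, :]   # rows: ∇f(x_i)

Gamma = grads.T @ grads / n               # Γ ≈ (1/n) Σ_i ∇f(x_i) ∇f(x_i)^T
evals, evecs = np.linalg.eigh(Gamma)      # eigenvectors of Γ
b_hat = evecs[:, -1]                      # top eigenvector spans the d.r. space
print(np.abs(b_hat @ b1))                 # ≈ 1: recovers b1 up to sign
```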


SLIDE 21

Learning gradients

Statistical interpretation

Linear case: y = βᵀx + ε, ε iid ∼ No(0, σ²). Let Ω = cov(E[X|Y]), Σ_X = cov(X), σ_Y² = var(Y). Then

Γ = σ_Y² (1 − σ²/σ_Y²)² Σ_X⁻¹ Ω Σ_X⁻¹ ≈ σ_Y² Σ_X⁻¹ Ω Σ_X⁻¹.

SLIDE 22

Learning gradients

Statistical interpretation

For smooth f(x): y = f(x) + ε, ε iid ∼ No(0, σ²). Here the meaning of Ω = cov(E[X|Y]) is not so clear.


SLIDE 28

Learning gradients

Nonlinear case

Partition X into sections and compute local quantities:

X = ⋃_{i=1}^I χ_i,
Ω_i = cov(E[X_{χ_i} | Y_{χ_i}]),
Σ_i = cov(X_{χ_i}),
σ_i² = var(Y_{χ_i}),
m_i = ρ_X(χ_i).

Then

Γ ≈ ∑_{i=1}^I m_i σ_i² Σ_i⁻¹ Ω_i Σ_i⁻¹.
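A hedged sketch of this localization, with the partition chosen by k-means and each Ω_i estimated by slicing Y within the cell; both choices are assumptions for illustration, not the construction from the talk.

```python
# Sketch: local SIR-style estimate of Γ on a partition of the input space.
import numpy as np
from sklearn.cluster import KMeans

def local_gop(X, y, n_parts=5, n_slices=5, reg=1e-3):
    n, p = X.shape
    labels = KMeans(n_clusters=n_parts, n_init=10, random_state=0).fit_predict(X)
    Gamma = np.zeros((p, p))
    for i in range(n_parts):
        Xi, yi = X[labels == i], y[labels == i]
        if len(yi) < 2 * n_slices:
            continue                                  # skip tiny cells
        mi = len(yi) / n                              # m_i: empirical mass of χ_i
        Sigma_inv = np.linalg.inv(np.cov(Xi.T) + reg * np.eye(p))
        # Ω_i = cov(E[X|Y]) estimated by slicing y and averaging X per slice
        order = np.argsort(yi)
        slice_means = [Xi[idx].mean(axis=0)
                       for idx in np.array_split(order, n_slices)]
        Omega = np.cov(np.array(slice_means).T)
        Gamma += mi * yi.var() * Sigma_inv @ Omega @ Sigma_inv
    return Gamma
```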


SLIDE 30

Learning gradients

Estimating the gradient

Taylor expansion: for x_i ≈ x_j,

y_i ≈ f(x_i) ≈ f(x_j) + ⟨∇f(x_j), x_i − x_j⟩ ≈ y_j + ⟨∇f(x_j), x_i − x_j⟩.

Let f⃗ ≈ ∇f; then the following should be small:

∑_{i,j} w_ij ( y_i − y_j − ⟨f⃗(x_j), x_i − x_j⟩ )²,  w_ij = s^{−(p+2)} exp(−‖x_i − x_j‖²/2s²),

where the weights w_ij enforce x_i ≈ x_j.


SLIDE 32

Learning gradients

Estimating the gradient

The gradient estimate:

f⃗_D = argmin_{f⃗ ∈ H^p} [ (1/n²) ∑_{i,j=1}^n w_ij ( y_i − y_j − f⃗(x_j)ᵀ(x_i − x_j) )² + λ ‖f⃗‖²_K ],

where ‖f⃗‖_K is a smoothness penalty, the reproducing kernel Hilbert space norm. Go to board.
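The sketch below is a simplified pointwise variant of this objective: each ∇f(x_j) is fit by weighted ridge regression, swapping the RKHS penalty λ‖f⃗‖²_K for a per-point ridge term. It is not the authors' RKHS estimator, only an illustration of the weighted difference objective.

```python
# Sketch: pointwise weighted-ridge gradient estimates for the objective above.
import numpy as np

def gradient_estimates(X, y, s=0.5, lam=1e-2):
    n, p = X.shape
    grads = np.zeros((n, p))
    for j in range(n):
        D = X - X[j]                                # rows: x_i - x_j
        w = np.exp(-np.sum(D**2, axis=1) / (2 * s**2))
        r = y - y[j]                                # y_i - y_j
        # minimize Σ_i w_ij (r_i - g^T d_i)^2 + λ ||g||^2 for g = ∇f(x_j)
        A = (D * w[:, None]).T @ D + lam * np.eye(p)
        grads[j] = np.linalg.solve(A, (D * w[:, None]).T @ r)
    return grads                                    # row j ≈ ∇f(x_j)
```

Stacking the rows then gives the empirical GOP, Γ̂ ≈ (1/n) ∑_j ∇̂f(x_j) ∇̂f(x_j)ᵀ.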


SLIDE 34

Learning gradients

Computational efficiency

The computation requires fewer than n² parameters and is O(n⁶) time and O(pn) memory:

f⃗_D(x) = ∑_{i=1}^n c_{i,D} K(x_i, x),

with coefficients c_{i,D} ∈ R^p stacked as the columns of c_D = (c_{1,D}, ..., c_{n,D}) ∈ R^{p×n}. Define the Gram matrix K with K_ij = K(x_i, x_j). Then

Γ̂ = c_D K c_Dᵀ.
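A minimal sketch of assembling Γ̂ from the kernel expansion, assuming a Gaussian kernel and coefficients already computed and stored as the columns of a p × n array:

```python
# Sketch: Γ̂ = c_D K c_D^T from expansion coefficients (assumed given).
import numpy as np

def gop_from_coefficients(C, X, s=0.5):
    # C: p × n coefficient array; X: n × p data; Gaussian kernel assumed
    sq = np.sum((X[:, None, :] - X[None, :, :])**2, axis=2)
    K = np.exp(-sq / (2 * s**2))          # Gram matrix K_ij = K(x_i, x_j)
    return C @ K @ C.T                    # p × p GOP estimate
```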

SLIDE 35

Learning gradients

Estimates on manifolds

The marginal distribution ρ_X is concentrated on a compact d-dimensional Riemannian manifold M with isometric embedding ϕ : M → R^p, geodesic metric d_M, and uniform measure dµ on M. Assume a regular distribution:

(i) The density ν(x) = dρ_X(x)/dµ exists and is Hölder continuous: there are c₁ > 0 and 0 < θ ≤ 1 with

|ν(x) − ν(u)| ≤ c₁ d_M(x, u)^θ  for all x, u ∈ M.

(ii) The measure along the boundary is small: there is c₂ > 0 with

ρ_M( { x ∈ M : d_M(x, ∂M) ≤ t } ) ≤ c₂ t  for all t > 0.

SLIDE 36

Learning gradients

Convergence to gradient on manifold

Theorem. Under the above regularity conditions on ρ_X and for f ∈ C²(M), with probability 1 − δ,

‖(dϕ)* f⃗_D − ∇_M f‖²_{L²(ρ_M)} ≤ C log(1/δ) n^{−1/d},

where (dϕ)* (projection onto the tangent space) is the dual of the map dϕ.


SLIDE 38

Learning gradients

Multi-task learning

Definition (single-task notation). n_t samples (x_i, y_i) with x_i ∈ R^d and y_i ∈ {−1, 1} for classification. Assume we are working in the d ≫ n_t paradigm.

Definition (multi-task learning (MTL) formulation). Given T tasks with t ∈ {1, ..., T},

F_t(x) = f₀(x) + f_t(x) + ε,  ε iid ∼ No(0, σ²).


SLIDE 42

Learning gradients

Multi-task gradient learning

Estimate not just the functions {f₀, f₁, ..., f_T} but the gradients as well: {(f₀, ∇f₀), (f_t, ∇f_t)_{t=1}^T}.

This provides us with T + 1 matrices:

1. Γ̂₀, the GOP estimate across all the tasks;
2. Γ̂₁, ..., Γ̂_T, the task-specific GOP estimates.


SLIDE 44

Bayesian Mixture of Inverses

Principal components analysis (PCA)

Algorithmic view of PCA:

1. Given X = (X₁, ..., Xₙ), a p × n matrix, construct Σ̂ = (X − X̄)(X − X̄)ᵀ.
2. Eigen-decompose Σ̂: λ_i v_i = Σ̂ v_i.
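A minimal numpy transcription of this algorithmic view (columns of X are samples):

```python
# Sketch of the two-step algorithmic view of PCA above.
import numpy as np

def pca(X, d):
    Xc = X - X.mean(axis=1, keepdims=True)   # center: X - X̄
    Sigma = Xc @ Xc.T                        # Σ̂ = (X - X̄)(X - X̄)^T, p × p
    evals, evecs = np.linalg.eigh(Sigma)     # λ_i v_i = Σ̂ v_i (ascending order)
    return evecs[:, ::-1][:, :d]             # top-d principal directions
```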


SLIDE 46

Bayesian Mixture of Inverses

Probabilistic PCA

X ∈ R^p is characterized by a multivariate normal

X ∼ No(µ + Aν, ∆),  ν ∼ No(0, I_d),

with µ ∈ R^p, A ∈ R^{p×d}, ∆ ∈ R^{p×p}, ν ∈ R^d. Here ν is a latent variable.

SLIDE 47

Bayesian Mixture of Inverses

SDR model

Semiparametric model:

Y_i = f(X_i) + ε_i = g(b₁ᵀX_i, ..., b_dᵀX_i) + ε_i,

where span B is the dimension reduction (d.r.) subspace.


SLIDE 50

Bayesian Mixture of Inverses

Principal fitted components (PFC)

Define X_y ≡ (X | Y = y) and specify the multivariate normal distribution

X_y ∼ No(µ_y, ∆),  µ_y = µ + Aν_y,

with µ ∈ R^p, A ∈ R^{p×d}, ν_y ∈ R^d. The d.r. directions are B = ∆⁻¹A.

PFC captures global linear predictive structure. It does not generalize to manifolds.


SLIDE 53

Bayesian Mixture of Inverses

Mixture models and localization

A driving idea in manifold learning is that manifolds are locally Euclidean. A driving idea in probabilistic modeling is that mixture models are flexible and can capture "nonparametric" distributions. Mixture models can therefore capture local nonlinear predictive manifold structure.

SLIDE 54

Bayesian Mixture of Inverses

Model specification

X_y ∼ No(µ_yx, ∆),  µ_yx = µ + Aν_yx,  ν_yx ∼ G_y,

where G_y is a density indexed by y having multiple clusters, µ ∈ R^p, ε ∼ No(0, ∆) with ∆ ∈ R^{p×p}, A ∈ R^{p×d}, and ν_yx ∈ R^d.

SLIDE 55

Bayesian Mixture of Inverses

Dimension reduction space

Proposition. For this model the d.r. space is the span of B = ∆⁻¹A:

Y | X  =_d  Y | (∆⁻¹A)ᵀX.

SLIDE 56

Bayesian Mixture of Inverses

Sampling distribution

Define ν_i ≡ ν_{y_i x_i}. The sampling distribution for the data is

x_i | (y_i, µ, ν_i, A, ∆) ∼ No(µ + Aν_i, ∆),  ν_i ∼ G_{y_i}.


SLIDE 59

Bayesian Mixture of Inverses

Categorical response: modeling G_y

Y = {1, ..., C}, so each category has a distribution:

ν_i | (y_i = c) ∼ G_c,  c = 1, ..., C.

Each ν_i is modeled as a mixture of the C distributions G₁, ..., G_C, with a Dirichlet process model for each distribution: G_c ∼ DP(α₀, G₀). Go to board.
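For illustration, a truncated stick-breaking draw from G_c ∼ DP(α₀, G₀), a standard construction assuming G₀ = No(0, I_d); this is not the sampler used in the talk.

```python
# Sketch: truncated stick-breaking approximation to a draw G_c ~ DP(α0, G0).
import numpy as np

def draw_dp(alpha0=1.0, d=2, trunc=50, rng=np.random.default_rng(0)):
    betas = rng.beta(1.0, alpha0, size=trunc)              # stick proportions
    w = betas * np.concatenate(([1.0], np.cumprod(1 - betas)[:-1]))
    atoms = rng.normal(size=(trunc, d))                    # atoms drawn from G0
    return w / w.sum(), atoms        # discrete (weights, atoms) approx. to G_c
```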

SLIDE 60

Bayesian Mixture of Inverses

Likelihood

Lik(data | θ) ≡ Lik(data | A, ∆, ν₁, ..., νₙ, µ):

Lik(data | θ) ∝ det(∆⁻¹)^{n/2} exp( −(1/2) ∑_{i=1}^n (x_i − µ − Aν_i)ᵀ ∆⁻¹ (x_i − µ − Aν_i) ).
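A direct sketch evaluating this log-likelihood up to an additive constant (dense numpy, for illustration):

```python
# Sketch: log-likelihood of the model above, up to an additive constant.
import numpy as np

def log_lik(X, mu, A, nu, Delta_inv):
    # X: n × p data; nu: n × d latent coordinates; Delta_inv: p × p precision
    R = X - mu - nu @ A.T                      # residuals x_i - µ - Aν_i
    n = X.shape[0]
    sign, logdet = np.linalg.slogdet(Delta_inv)
    quad = np.einsum('ij,jk,ik->', R, Delta_inv, R)   # Σ_i r_i^T ∆^{-1} r_i
    return 0.5 * n * logdet - 0.5 * quad
```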
SLIDE 64

Bayesian Mixture of Inverses

Posterior inference

Given data,

P_θ ≡ Post(θ | data) ∝ Lik(data | θ) × π(θ).

1. P_θ provides an estimate of (un)certainty on θ.
2. Requires a prior on θ.
3. How do we sample from P_θ?

SLIDE 67

Bayesian Mixture of Inverses

Markov chain Monte Carlo

There is no closed form for P_θ.

1. Specify a Markov transition kernel K(θ_t, θ_{t+1}) with stationary distribution P_θ.
2. Run the Markov chain to obtain θ₁, ..., θ_T.

SLIDE 69

Bayesian Mixture of Inverses

Sampling from the posterior

Inference consists of drawing samples θ_(t) = (µ_(t), A_(t), ∆⁻¹_(t), ν_(t)) from the posterior. Define

θ/µ_(t) ≡ (A_(t), ∆⁻¹_(t), ν_(t)),
θ/A_(t) ≡ (µ_(t), ∆⁻¹_(t), ν_(t)),
θ/∆⁻¹_(t) ≡ (µ_(t), A_(t), ν_(t)),
θ/ν_(t) ≡ (µ_(t), A_(t), ∆⁻¹_(t)).

SLIDE 73

Bayesian Mixture of Inverses

Gibbs sampling

Conditional probabilities can be used to sample µ, ∆⁻¹ and A:

µ_(t+1) | (data, θ/µ_(t)) ∼ No(data, θ/µ_(t)),
∆⁻¹_(t+1) | (data, θ/∆⁻¹_(t)) ∼ InvWishart(data, θ/∆⁻¹_(t)),
A_(t+1) | (data, θ/A_(t)) ∼ No(data, θ/A_(t)).

Sampling ν_(t) is more involved.
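The overall sweep can be sketched as below; the draw_* update functions are hypothetical placeholders standing in for the full conditionals named above, not the authors' code.

```python
# Skeleton of one Gibbs sweep per iteration over the blocks (µ, ∆^{-1}, A, ν).
def gibbs(data, theta0, n_iter, draw_mu, draw_Delta_inv, draw_A, draw_nu):
    theta = dict(theta0)                    # keys: 'mu', 'A', 'Delta_inv', 'nu'
    draws = []
    for _ in range(n_iter):
        theta['mu'] = draw_mu(data, theta)                 # normal conditional
        theta['Delta_inv'] = draw_Delta_inv(data, theta)   # (inverse-)Wishart
        theta['A'] = draw_A(data, theta)                   # normal conditional
        theta['nu'] = draw_nu(data, theta)                 # DP step, involved
        draws.append(dict(theta))
    return draws
```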


SLIDE 75

Bayesian Mixture of Inverses

Posterior draws from the Grassmann manifold

Given samples (∆⁻¹_(t), A_(t))_{t=1}^m, compute B_(t) = ∆⁻¹_(t) A_(t).

Each B_(t) spans a subspace, which is a point in the Grassmann manifold G(d, p). There is a Riemannian metric on this manifold. This has two implications.


SLIDE 77

Bayesian Mixture of Inverses

Posterior mean and variance

Given draws (B_(t))_{t=1}^m, the posterior mean and variance should be computed with respect to the Riemannian metric. Given two subspaces W and U spanned by orthonormal bases X and Y, the distance entering the Karcher mean is computed from the principal angles:

(I − X(XᵀX)⁻¹Xᵀ) Y (XᵀY)⁻¹ = UΣVᵀ,  Θ = atan(Σ),  dist(W, U) = √(Tr Θ²).
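A sketch of this subspace distance, using the equivalent arccos form of the principal angles (from the singular values of XᵀY) rather than the tangent-space formula above:

```python
# Sketch: geodesic (principal-angle) distance between two subspaces of R^p.
import numpy as np

def grassmann_dist(X, Y):
    # X, Y: p × d matrices with orthonormal columns spanning the subspaces
    sigma = np.linalg.svd(X.T @ Y, compute_uv=False)
    theta = np.arccos(np.clip(sigma, -1.0, 1.0))   # principal angles Θ
    return np.sqrt(np.sum(theta**2))               # dist = sqrt(Tr Θ^2)
```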

SLIDE 79

Bayesian Mixture of Inverses

Posterior mean and variance

The posterior mean subspace:

B_Bayes = argmin_{B ∈ G(d,p)} ∑_{i=1}^m dist(B_i, B).

Uncertainty:

var({B₁, ..., B_m}) = (1/m) ∑_{i=1}^m dist(B_i, B_Bayes).
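Minimizing over G(d, p) requires iterative optimization. A common extrinsic shortcut, sketched below under the assumption that it only approximates the Karcher mean, averages the projection matrices of the draws and re-extracts a rank-d basis:

```python
# Sketch: extrinsic approximation to the mean subspace of posterior draws.
import numpy as np

def extrinsic_mean_subspace(Bs, d):
    # Bs: list of p × d draws B_(t); orthonormalized before averaging
    p = Bs[0].shape[0]
    P_bar = np.zeros((p, p))
    for B in Bs:
        Q, _ = np.linalg.qr(B)                 # orthonormal basis of span(B)
        P_bar += Q @ Q.T / len(Bs)             # average projection matrices
    evals, evecs = np.linalg.eigh(P_bar)
    return evecs[:, ::-1][:, :d]               # basis for the mean subspace
```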

SLIDE 80

Bayesian Mixture of Inverses

Distribution theory on Grassmann manifolds

If B is the linear span of d central normal vectors in R^p with covariance matrix Σ, the density of the Grassmannian distribution G_Σ with respect to the reference measure G_I is

(dG_Σ/dG_I)(X) = ( det(XᵀX) / det(XᵀΣ⁻¹X) )^{d/2},

where X ≡ span(X) with X = (x₁, ..., x_d).

SLIDE 81

Results on data: Swiss roll

Swiss roll

X₁ = t cos(t),  X₂ = h,  X₃ = t sin(t),  X₄, ..., X₁₀ iid ∼ No(0, 1),

where t = (3π/2)(1 + 2θ), θ ∼ Unif(0, 1), h ∼ Unif(0, 1), and

Y = sin(5πθ) + h² + ε,  ε ∼ No(0, 0.01).
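A direct numpy transcription of this simulation (interpreting No(0, 0.01) as variance 0.01, i.e. standard deviation 0.1):

```python
# Sketch: generate the 10-dimensional Swiss roll regression data above.
import numpy as np

rng = np.random.default_rng(0)
n = 600
theta = rng.uniform(0, 1, n)
h = rng.uniform(0, 1, n)
t = 1.5 * np.pi * (1 + 2 * theta)              # t = (3π/2)(1 + 2θ)

X = np.column_stack([
    t * np.cos(t), h, t * np.sin(t),           # X1, X2, X3: the manifold
    rng.normal(size=(n, 7)),                   # X4..X10: noise dimensions
])
Y = np.sin(5 * np.pi * theta) + h**2 + rng.normal(0, 0.1, n)
```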

SLIDE 82

Results on data: Swiss roll

Pictures

SLIDE 83

Results on data: Swiss roll

Metric

Projection of the estimated d.r. space B̂ = (b̂₁, ..., b̂_d) onto the true B:

(1/d) ∑_{i=1}^d ‖P_B b̂_i‖² = (1/d) ∑_{i=1}^d ‖(BBᵀ) b̂_i‖².
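A sketch of this metric, assuming the columns of B are orthonormal so that P_B = BBᵀ:

```python
# Sketch: accuracy of an estimated d.r. space (1.0 = perfect recovery).
import numpy as np

def dr_accuracy(B_hat, B):
    # B, B_hat: p × d matrices; columns of B assumed orthonormal
    P = B @ B.T                                # projection onto span(B)
    return np.mean(np.sum((P @ B_hat)**2, axis=0))   # (1/d) Σ ||P_B b̂_i||²
```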

SLIDE 84

Results on data: Swiss roll

Comparison of algorithms

[Figure: accuracy versus sample size (200 to 600) for BMI, BAGL, SIR, LSIR, PHD, and SAVE.]

SLIDE 85

Results on data: Swiss roll

Posterior variance

[Figure: boxplot of the distances.]

SLIDE 86

Results on data: Swiss roll

Error as a function of d

[Figure: error versus number of e.d.r. directions (1 to 10), two panels.]

SLIDE 87

Results on data: Digits

Digits

[Figure: sample images of handwritten digits.]


SLIDE 89

Results on data: Digits

Two classification problems

3 vs. 8 and 5 vs. 8, with 100 training samples from each class.

SLIDE 90

Results on data: Digits

BMI

[Figure: two-dimensional BMI projection of the digits; both axes on a 10⁻⁴ scale.]

SLIDE 91

Results on data: Digits

3, 5, 8 classification problem

Goal: learn features for a predictive model on three problems: 3 vs 8, 5 vs 8, and 3 and 5 vs 8.

SLIDE 92

Results on data: Digits

3, 5, 8 classification problem

[Figure: sample images of the digits 3, 5 and 8.]

SLIDE 93

Results on data: Digits

Top features: 3 and 5 vs 8

SLIDE 94

Results on data: Digits

Top features: 3 vs 8

SLIDE 95

Results on data: Digits

Top features: 5 vs 8

SLIDE 96

Results on data: Digits

All ten digits

digit     Nonlinear         Linear
0         0.04 (± 0.01)     0.05 (± 0.01)
1         0.01 (± 0.003)    0.03 (± 0.01)
2         0.14 (± 0.02)     0.19 (± 0.02)
3         0.11 (± 0.01)     0.17 (± 0.03)
4         0.13 (± 0.02)     0.13 (± 0.03)
5         0.12 (± 0.02)     0.21 (± 0.03)
6         0.04 (± 0.01)     0.0816 (± 0.02)
7         0.11 (± 0.01)     0.14 (± 0.02)
8         0.14 (± 0.02)     0.20 (± 0.03)
9         0.11 (± 0.02)     0.15 (± 0.02)
average   0.09              0.14

Table: Average classification error rate and standard deviation on the digits data.

SLIDE 97

Results on data: Cancer

Cancer classification

n = 38 samples with expression levels for p = 7129 genes or ESTs. 19 samples are Acute Myeloid Leukemia (AML) and 19 are Acute Lymphoblastic Leukemia (ALL); the ALL samples fall into two subclusters, B-cell and T-cell.

SLIDE 98

Results on data: Cancer

Substructure captured

[Figure: two-dimensional projection of the leukemia samples; both axes on a 10⁴ scale.]

SLIDE 99

The end

Funding

IGSP Center for Systems Biology at Duke; NSF DMS-0732260.