  1. Geometric perspectives for supervised dimension reduction: A Tale of Two Manifolds. S. Mukherjee, K. Mao, F. Liang, Q. Wu, D-X. Zhou, J. Guinney. Department of Statistical Science; Institute for Genome Sciences & Policy; Department of Computer Science; Department of Mathematics. Duke University, December 11, 2009.

  2. Supervised dimension reduction: Information and sufficiency. A fundamental idea in statistical thought is to reduce data to relevant information. This was the paradigm of R.A. Fisher (beloved Bayesian) and goes back to at least Adcock 1878 and Edgeworth 1884. For example, $X_1, \dots, X_n$ drawn iid from a Gaussian can be reduced to $(\mu, \sigma^2)$.
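To make the reduction concrete, here is a minimal sketch (a toy example of my own, not from the talk): for an iid Gaussian sample, the pair of sufficient statistics carries everything the data say about the parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
# X_1, ..., X_n drawn iid from a Gaussian (mu = 2, sigma = 3 are arbitrary)
x = rng.normal(loc=2.0, scale=3.0, size=1000)

# The n numbers reduce to two sufficient statistics: all information
# about (mu, sigma^2) in the sample is contained in this pair.
mu_hat, sigma2_hat = x.mean(), x.var(ddof=1)
print(mu_hat, sigma2_hat)
```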

  3. Supervised dimension reduction: Regression. Assume the model $Y = f(X) + \varepsilon$, $\mathbb{E}\varepsilon = 0$, with $X \in \mathcal{X} \subset \mathbb{R}^p$ and $Y \in \mathbb{R}$. Data: $D = \{(x_i, y_i)\}_{i=1}^{n} \stackrel{iid}{\sim} \rho(X, Y)$.

  4. Supervised dimension reduction: Dimension reduction. If the data live in a p-dimensional space, $X \in \mathbb{R}^p$, replace $X$ with $\Theta(X) \in \mathbb{R}^d$, $d \ll p$. My belief: physical, biological, and social systems are inherently low dimensional, and the variation of interest in these systems can be captured by a low-dimensional submanifold.

  5. Supervised dimension reduction: Supervised dimension reduction (SDR). Given response variables $Y_1, \dots, Y_n \in \mathbb{R}$ and explanatory variables or covariates $X_1, \dots, X_n \in \mathcal{X} \subset \mathbb{R}^p$ with $Y_i = f(X_i) + \varepsilon_i$, $\varepsilon_i \stackrel{iid}{\sim} \mathrm{No}(0, \sigma^2)$. Is there a submanifold $S \equiv S_{Y|X}$ such that $Y \perp\!\!\!\perp X \mid P_S(X)$?

  6. Supervised dimension reduction: Visualization of SDR. [Figure: four panels comparing embeddings of the same data set: (a) the raw three-dimensional data (axes x, y, z); (b) Diffusion map; (c) GOP; (d) GDM, each plotted as Dimension 1 vs. Dimension 2.]

  7. Supervised dimension reduction: Linear projections capture nonlinear manifolds. In this talk $P_S(X) = B^T X$ where $B = (b_1, \dots, b_d)$. Semiparametric model: $Y_i = f(X_i) + \varepsilon_i = g(b_1^T X_i, \dots, b_d^T X_i) + \varepsilon_i$, where $\mathrm{span}(B)$ is the dimension reduction (d.r.) subspace.
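To fix ideas, here is a minimal data-generating sketch for this semiparametric model (all names and the particular link $g$ are my own choices, not the talk's):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 500, 10, 2

# An orthonormal basis B = (b_1, b_2) for the d.r. subspace in R^p
B = np.linalg.qr(rng.normal(size=(p, d)))[0]

X = rng.normal(size=(n, p))
Z = X @ B                              # the d relevant coordinates b_k^T X
y = np.sin(Z[:, 0]) + Z[:, 1] ** 2 + 0.1 * rng.normal(size=n)
# Y depends on X only through B^T X, so Y is independent of X given P_S(X)
```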

  8. Learning gradients: SDR model. Semiparametric model: $Y_i = f(X_i) + \varepsilon_i = g(b_1^T X_i, \dots, b_d^T X_i) + \varepsilon_i$, where $\mathrm{span}(B)$ is the d.r. subspace. Assume the marginal distribution $\rho_X$ is concentrated on a manifold $M \subset \mathbb{R}^p$ of dimension $d \ll p$.

  9. Learning gradients: Gradients and outer products. Given a smooth function $f$, the gradient is $\nabla f(x) = \left( \frac{\partial f(x)}{\partial x_1}, \dots, \frac{\partial f(x)}{\partial x_p} \right)^T$. Define the gradient outer product (GOP) matrix $\Gamma$ by $\Gamma_{ij} = \int_{\mathcal{X}} \frac{\partial f(x)}{\partial x_i} \frac{\partial f(x)}{\partial x_j} \, d\rho_X(x)$, that is, $\Gamma = \mathbb{E}\left[ (\nabla f) \otimes (\nabla f) \right]$.
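Continuing the toy model above (a hypothetical setup in which the gradient is available in closed form), $\Gamma$ can be approximated by a Monte Carlo average of gradient outer products:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 5000, 10, 2
B = np.linalg.qr(rng.normal(size=(p, d)))[0]
X = rng.normal(size=(n, p))
Z = X @ B

# For f(x) = sin(b_1^T x) + (b_2^T x)^2 the chain rule gives
# grad f(x) = cos(b_1^T x) b_1 + 2 (b_2^T x) b_2, which lies in span(B).
grads = np.cos(Z[:, [0]]) * B[:, 0] + 2.0 * Z[:, [1]] * B[:, 1]   # n x p

# Monte Carlo approximation of Gamma = E[(grad f)(grad f)^T]
Gamma = grads.T @ grads / n
```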

  10. Learning gradients: GOP captures the d.r. space. Suppose $y = f(X) + \varepsilon = g(b_1^T X, \dots, b_d^T X) + \varepsilon$. Note that the columns of $B = (b_1, \dots, b_d)$ are eigenvectors of $\Gamma$: $\Gamma b_i = \lambda_i b_i$. For $i = 1, \dots, d$, $\frac{\partial f(x)}{\partial b_i} = b_i^T \nabla f(x) \not\equiv 0 \;\Rightarrow\; b_i^T \Gamma b_i \neq 0$, and if $w \perp b_i$ for all $i$ then $w^T \Gamma w = 0$. Hence the top $d$ eigenvectors of $\Gamma$ span the d.r. subspace.
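A short check of this claim under the same toy setup (a sketch; the principal-angle diagnostic is my own choice):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 5000, 10, 2
B = np.linalg.qr(rng.normal(size=(p, d)))[0]
X = rng.normal(size=(n, p))
Z = X @ B
grads = np.cos(Z[:, [0]]) * B[:, 0] + 2.0 * Z[:, [1]] * B[:, 1]
Gamma = grads.T @ grads / n                  # as in the previous sketch

# The top d eigenvectors of Gamma should span the d.r. subspace.
evals, evecs = np.linalg.eigh(Gamma)         # eigenvalues in ascending order
top = evecs[:, -d:]

# Cosines of the principal angles between span(top) and span(B): near 1.
print(np.linalg.svd(top.T @ B, compute_uv=False))

# Any w orthogonal to span(B) gives w^T Gamma w = 0 (up to sampling error).
v = rng.normal(size=p)
w = v - B @ (B.T @ v)
w /= np.linalg.norm(w)
print(w @ Gamma @ w)
```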

  11. Learning gradients: Statistical interpretation, linear case. $y = \beta^T x + \varepsilon$, $\varepsilon \stackrel{iid}{\sim} \mathrm{No}(0, \sigma^2)$. Let $\Omega = \mathrm{cov}(\mathbb{E}[X|Y])$, $\Sigma_X = \mathrm{cov}(X)$, $\sigma_Y^2 = \mathrm{var}(Y)$. Then $\Gamma = \sigma_Y^2 \left( 1 - \frac{\sigma^2}{\sigma_Y^2} \right)^2 \Sigma_X^{-1} \Omega \Sigma_X^{-1} \approx \sigma_Y^2 \Sigma_X^{-1} \Omega \Sigma_X^{-1}$.
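A rough numerical check of the approximate version of this identity (a sketch: the SIR-style slicing estimator of $\Omega = \mathrm{cov}(\mathbb{E}[X|Y])$ and all constants are my own choices, with $\Sigma_X = I$):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20000, 5
beta = np.array([1.0, -2.0, 0.5, 0.0, 0.0])
X = rng.normal(size=(n, p))                  # so Sigma_X = I
y = X @ beta + 0.3 * rng.normal(size=n)

# Estimate Omega = cov(E[X|Y]) by slicing y and covarying slice means of X
H = 20
slices = np.array_split(np.argsort(y), H)
means = np.array([X[s].mean(axis=0) for s in slices])
probs = np.array([len(s) / n for s in slices])
mbar = probs @ means
Omega = (means - mbar).T @ (probs[:, None] * (means - mbar))

Sigma_inv = np.linalg.inv(np.cov(X.T))
Gamma_approx = y.var() * Sigma_inv @ Omega @ Sigma_inv

# For linear f, Gamma = E[(grad f)(grad f)^T] = beta beta^T exactly.
print(np.round(Gamma_approx, 1))
print(np.round(np.outer(beta, beta), 1))
```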

  12. Learning gradients: Statistical interpretation, smooth case. For smooth $f(x)$: $y = f(x) + \varepsilon$, $\varepsilon \stackrel{iid}{\sim} \mathrm{No}(0, \sigma^2)$. Here the relation to $\Omega = \mathrm{cov}(\mathbb{E}[X|Y])$ is not so clear.

  13. Learning gradients: Nonlinear case. Partition $\mathcal{X}$ into sections, $\mathcal{X} = \bigcup_{i=1}^{I} \chi_i$, and compute local quantities: $\Omega_i = \mathrm{cov}(\mathbb{E}[X_{\chi_i} \mid Y_{\chi_i}])$, $\Sigma_i = \mathrm{cov}(X_{\chi_i})$, $\sigma_i^2 = \mathrm{var}(Y_{\chi_i})$, $m_i = \rho_X(\chi_i)$. Then $\Gamma \approx \sum_{i=1}^{I} m_i \sigma_i^2 \Sigma_i^{-1} \Omega_i \Sigma_i^{-1}$.
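A rough sketch of this local construction under stated assumptions (partitioning by k-means and estimating each $\Omega_i$ by slicing are my own choices, not necessarily the talk's):

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
n, p = 20000, 5
X = rng.normal(size=(n, p))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=n)

def local_omega(Xc, yc, H=10):
    """Slice y and covary the slice means of X: estimates cov(E[X|Y])."""
    slices = np.array_split(np.argsort(yc), H)
    means = np.array([Xc[s].mean(axis=0) for s in slices])
    probs = np.array([len(s) / len(yc) for s in slices])
    mbar = probs @ means
    return (means - mbar).T @ (probs[:, None] * (means - mbar))

# Partition the covariate space into I cells (the chi_i) via k-means.
I = 20
_, labels = kmeans2(X, I, minit="++", seed=0)
Gamma = np.zeros((p, p))
for i in range(I):
    mask = labels == i
    if mask.sum() < 100:                 # skip cells too small to estimate
        continue
    Xi, yi = X[mask], y[mask]
    Sigma_inv = np.linalg.pinv(np.cov(Xi.T))
    Gamma += mask.mean() * yi.var() * Sigma_inv @ local_omega(Xi, yi) @ Sigma_inv

# Gradient mass should concentrate on the first two coordinates.
print(np.round(np.diag(Gamma), 2))
```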

  14. Learning gradients: Estimating the gradient. Taylor expansion: $y_i \approx f(x_i) \approx f(x_j) + \langle \nabla f(x_j), x_i - x_j \rangle \approx y_j + \langle \nabla f(x_j), x_i - x_j \rangle$ if $x_i \approx x_j$. So for an estimate $\vec{f} \approx \nabla f$ the following should be small: $\sum_{i,j} w_{ij} \big( y_i - y_j - \langle \vec{f}(x_j), x_i - x_j \rangle \big)^2$, where $w_{ij} = s^{-(p+2)} \exp\big( -\|x_i - x_j\|^2 / 2s^2 \big)$ enforces $x_i \approx x_j$.
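Before the RKHS estimator on the next slide, a simplified sketch of this idea (my own reduction, not the talk's estimator: minimize the weighted sum independently at each $x_j$, with a small ridge term, giving a local least-squares gradient estimate):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, s = 300, 3, 0.5
X = rng.uniform(-1, 1, size=(n, p))
y = np.sin(np.pi * X[:, 0]) + 0.05 * rng.normal(size=n)

D = X[:, None, :] - X[None, :, :]            # D[i, j] = x_i - x_j
W = np.exp(-np.sum(D ** 2, axis=-1) / (2 * s ** 2)) / s ** (p + 2)
dy = y[:, None] - y[None, :]                 # dy[i, j] = y_i - y_j

# At each x_j, minimize sum_i w_ij (y_i - y_j - g^T (x_i - x_j))^2 over g,
# with a tiny ridge term for numerical stability (my addition).
grads = np.empty((n, p))
for j in range(n):
    A = D[:, j, :]                           # rows are x_i - x_j
    Wj = W[:, j]
    lhs = A.T @ (Wj[:, None] * A) + 1e-6 * np.eye(p)
    grads[j] = np.linalg.solve(lhs, A.T @ (Wj * dy[:, j]))

# Squared-gradient mass concentrates on x_1, the only active coordinate.
print(np.round((grads ** 2).mean(axis=0), 3))
```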

  15. Learning gradients: Estimating the gradient. The gradient estimate is $\vec{f}_D = \arg\min_{\vec{f} \in \mathcal{H}_K^p} \Big[ \frac{1}{n^2} \sum_{i,j=1}^{n} w_{ij} \big( y_i - y_j - \vec{f}(x_j)^T (x_i - x_j) \big)^2 + \lambda \| \vec{f} \|_K^2 \Big]$, where $\| \vec{f} \|_K$, the reproducing kernel Hilbert space norm, is a smoothness penalty. [Go to board.]
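A compact sketch of this estimator via the representer theorem (assuming a Gaussian kernel applied componentwise; the bandwidth, $\lambda$, and the Kronecker-product assembly of the normal equations are my own choices, one way to solve the quadratic problem rather than necessarily the authors'):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 60, 3
X = rng.uniform(-1, 1, size=(n, p))
y = np.sin(np.pi * X[:, 0]) + 0.05 * rng.normal(size=n)

s, lam = 0.5, 1e-3                                # bandwidth and lambda, my choices
D = X[:, None, :] - X[None, :, :]                 # D[i, j] = x_i - x_j
sq = np.sum(D ** 2, axis=-1)
W = np.exp(-sq / (2 * s ** 2))                    # w_ij (the 1/s^(p+2) constant only
K = np.exp(-sq / (2 * s ** 2))                    #   rescales lambda, so it is dropped)
dy = y[:, None] - y[None, :]

# Representer theorem: f_D(x) = sum_a K(x, x_a) c_a with c_a in R^p.
# With C = (c_1 ... c_n), the objective is quadratic in vec(C); assemble
# the (np x np) normal equations using Kronecker-product identities.
A = lam * np.kron(K, np.eye(p))
b = np.zeros(n * p)
for j in range(n):
    Mj = D[:, j, :].T @ (W[:, j, None] * D[:, j, :])   # sum_i w_ij d_ij d_ij^T
    A += np.kron(np.outer(K[:, j], K[:, j]), Mj) / n ** 2
    rj = D[:, j, :].T @ (W[:, j] * dy[:, j])           # sum_i w_ij (y_i - y_j) d_ij
    b += np.kron(K[:, j], rj) / n ** 2
C = np.linalg.solve(A, b).reshape(n, p).T              # p x n coefficient matrix

grads = (C @ K).T                                 # estimated gradients at the x_j
Gamma = grads.T @ grads / n                       # empirical GOP from the estimate
print(np.round(np.linalg.eigh(Gamma)[1][:, -1], 2))    # top direction close to e_1
```

The estimated gradients then feed the GOP machinery from the earlier slides: eigendecomposing the empirical $\Gamma$ recovers the d.r. directions.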
