
STAT 209: Dimensionality Reduction, November 26, 2019, Colin Reimer Dawson



  1. STAT 209: Dimensionality Reduction. November 26, 2019. Colin Reimer Dawson

  2. Outline: Dimensionality Reduction

  3. High Dimensional Data
     ● Modern datasets often have huge numbers of variables
     ● E.g., images, biomarker data, measurements at fine-grained time points, social networks, product preferences
     ● Clustering can be a useful way to find “groups” of similar observations
     ● However, distance measures have some strange properties in high dimensions
     ● Can be useful to try to extract a few dimensions that carry most of the “signal”
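One of the "strange properties" of distance in high dimensions can be seen in a small simulation. The sketch below is not from the slides: it assumes points drawn uniformly from the unit cube, and `distance_ratio` is a hypothetical helper name. It compares the smallest and largest pairwise Euclidean distances as the number of variables grows.

```r
# Simulated illustration (not from the slides): how pairwise Euclidean
# distances behave as the number of variables grows.
set.seed(209)  # arbitrary seed, for reproducibility only

# Hypothetical helper: ratio of the smallest to the largest pairwise
# distance among n points drawn uniformly from the unit cube in p dimensions.
distance_ratio <- function(p, n = 100) {
  X <- matrix(runif(n * p), nrow = n, ncol = p)
  d <- as.vector(dist(X))  # all n * (n - 1) / 2 pairwise distances
  min(d) / max(d)
}

dims <- c(2, 10, 100, 1000)
ratios <- sapply(dims, distance_ratio)
print(data.frame(dimensions = dims, min_over_max = round(ratios, 3)))
```

Under these assumptions the ratio moves toward 1 as the dimension increases, meaning the "nearest" and "farthest" pairs end up at nearly the same distance. That makes distance-based grouping less informative, which is one motivation for reducing to a few dimensions first.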

  4. Images Have Many Variables, but maybe only a few meaningful “features”

  5. High dimensional inputs: comprehensible when arranged this way...

  6. “Eigenfaces”

  7. Finding the “Main Direction” of Variation
     [Figure: scatterplot of QuizCentered (vertical axis) against MidtermCentered (horizontal axis), both axes running from −20 to 20]

  8. Finding the “Eigen-features”

     ## dplyr provides select(), mutate(), and %>%; magrittr provides extract2()
     library(dplyr)
     library(magrittr)

     ## Here I am pulling out the perpendicular directions in (Midterm,Quiz)
     ## space that align with the ellipse on the scatterplot.
     ## If you know some linear algebra:
     ## These are the eigenvectors of the covariance matrix
     directions <- select(Scores, Midterm, Quiz) %>%
       cov() %>%
       eigen()
     directions %>%
       extract2("vectors") %>%
       round(digits = 2)

           [,1]  [,2]
     [1,] -0.97  0.24
     [2,] -0.24 -0.97

     ## Creating two new variables that are a weighted sum and weighted
     ## difference of the midterm and quiz score, with weights chosen so
     ## that the new variables are uncorrelated
     ## (V1 uses the first direction with its sign flipped; the axis is the same)
     Scores_augmented <- mutate(Scores,
       V1 = 0.97 * Midterm + 0.24 * Quiz,
       V2 = 0.24 * Midterm - 0.97 * Quiz)
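The claim that the new variables are uncorrelated can be checked numerically. The sketch below is only an illustration: the `Scores` data are not available here, so it simulates a stand-in (`scores_sim`, with a made-up relationship between the two columns) and uses base R only. It also compares the result with `prcomp()`, which is a swapped-in alternative route to the same directions and is not used on the slide.

```r
# Simulated stand-in for the Scores data (assumption: the real data
# are not available; the numbers below are invented for illustration).
set.seed(209)
n <- 200
midterm <- rnorm(n, mean = 75, sd = 10)
quiz <- 0.25 * midterm + rnorm(n, mean = 50, sd = 4)
scores_sim <- data.frame(Midterm = midterm, Quiz = quiz)

# Eigen-decomposition of the covariance matrix, as on the slide
decomp <- eigen(cov(scores_sim))
V <- decomp$vectors

# Project the centered data onto the eigenvectors (unrounded weights)
centered <- scale(scores_sim, center = TRUE, scale = FALSE)
new_vars <- centered %*% V
colnames(new_vars) <- c("V1", "V2")

# The new variables are uncorrelated, and their variances are the eigenvalues
print(round(cor(new_vars), 10))
print(rbind(variance = apply(new_vars, 2, var), eigenvalue = decomp$values))

# prcomp() finds the same directions, possibly with flipped signs
pca <- prcomp(scores_sim)
print(round(abs(pca$rotation) - abs(V), 10))
```

With the unrounded eigenvectors the correlation between V1 and V2 is zero up to floating-point error. Weights rounded to two digits, as in the `mutate()` call, would give variables that are only approximately uncorrelated.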
