
Multidimensional Scaling: Applied Multivariate Statistics, Spring 2012



  1. Multidimensional Scaling Applied Multivariate Statistics – Spring 2012

  2. Outline
     - Fundamental Idea
     - Classical Multidimensional Scaling
     - Non-metric Multidimensional Scaling

  3. Basic Idea
     - How to represent in two dimensions?

  4. Idea 1: Projection

  5. Idea 2: Squeeze on table
     - Close points stay close

  6. Which idea is better?

  7. Idea of MDS
     - Represent a high-dimensional point cloud in few (usually 2) dimensions, keeping the distances between points similar
     - Classical/Metric MDS: Use a clever projection (R: cmdscale)
     - Non-metric MDS: Squeeze data on table (R: isoMDS)

  8. Classical MDS
     - Problem: Given Euclidean distances among points, recover the positions of the points!
     - Example: Road distances between 21 European cities (almost Euclidean, but not quite) …

  9. Classical MDS
     - First try:

  10. Classical MDS
      - Can identify points only up to
        - shift
        - rotation
        - reflection
      - Flip axes:
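
The ambiguity named on this slide can be checked numerically. The sketch below uses plain Python and made-up points (the coordinates, shift, and rotation angle are assumptions for illustration, not lecture data). It shows that shifting, rotating, and reflecting a configuration leaves every pairwise distance unchanged, which is why distances alone cannot determine these three things.

```python
import math

def distance_matrix(points):
    """All pairwise Euclidean distances between 2-D points."""
    return [[math.dist(p, q) for q in points] for p in points]

def shift(points, dx, dy):
    """Move every point by the same offset."""
    return [(x + dx, y + dy) for x, y in points]

def rotate(points, angle):
    """Rotate every point around the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def reflect(points):
    """Flip the first axis."""
    return [(-x, y) for x, y in points]

def same_distances(a, b, tol=1e-9):
    """True if both configurations have the same distance matrix."""
    da, db = distance_matrix(a), distance_matrix(b)
    return all(abs(u - v) < tol for ra, rb in zip(da, db) for u, v in zip(ra, rb))

# Hypothetical configuration, not taken from the lecture data.
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (1.0, 5.0)]
moved = reflect(rotate(shift(points, 2.0, -1.0), math.pi / 3))

print(same_distances(points, moved))
```

Stretching one axis, by contrast, does change the distances, so it is not one of the invariances.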

  11. Classical MDS
      - Another example: Air pollution in US cities
      - The range of manu and popul is much bigger than the range of wind
      - Need to standardize to give every variable equal weight
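
A minimal sketch of why standardizing matters before distances are computed. Only the variable names manu, popul, and wind come from the slide; the numbers are invented for illustration, and the helper names are hypothetical.

```python
import math
import statistics

def standardize(column):
    """Rescale one variable to mean 0 and standard deviation 1."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)  # sample standard deviation (divides by n - 1)
    return [(value - mean) / sd for value in column]

# Made-up values for three cities; only the variable names come from the slide.
raw = {
    "manu": [35.0, 460.0, 1200.0],
    "popul": [70.0, 500.0, 1500.0],
    "wind": [8.0, 9.5, 11.0],
}

scaled = {name: standardize(values) for name, values in raw.items()}

def city_distance(data, i, j):
    """Euclidean distance between cities i and j over all variables."""
    return math.sqrt(sum((col[i] - col[j]) ** 2 for col in data.values()))

print(city_distance(raw, 0, 1))     # dominated by manu and popul
print(city_distance(scaled, 0, 1))  # every variable contributes on the same scale
```

On the raw data the wind difference of 1.5 is negligible next to differences in the hundreds, so wind has almost no influence on the distance until the columns are rescaled.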

  12. Classical MDS

  13. Classical MDS: Theory
      - Input: Euclidean distances between n objects in p dimensions
      - Output: Positions of the points up to rotation, reflection, shift
      - Two steps:
        - Compute the inner products matrix B from the distances
        - Compute the positions from B

  14. Classical MDS: Theory – Step 1
      - Inner products matrix $B = XX^T$
      - Connect to distance: $d_{ij}^2 = b_{ii} + b_{jj} - 2 b_{ij}$
      - Center points to avoid shift invariance
      - Invert relationship: $b_{ij} = -\frac{1}{2}\left(d_{ij}^2 - d_{i\cdot}^2 - d_{\cdot j}^2 + d_{\cdot\cdot}^2\right)$ (“doubly centered”), where $d_{i\cdot}^2$, $d_{\cdot j}^2$ and $d_{\cdot\cdot}^2$ are the row, column, and overall means of the squared distances
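
Step 1 can be sketched in a few lines of plain Python. The function name `double_center` and the example points are assumptions made for illustration. The check at the end compares the doubly centered matrix with the inner products computed directly from points that are already centered at the origin.

```python
import math

def double_center(dist):
    """Return B with b_ij = -1/2 (d_ij^2 - row mean_i - column mean_j + overall mean)."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(sq[i]) / n for i in range(n)]
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - col_mean[j] + total_mean)
             for j in range(n)] for i in range(n)]

# Hypothetical points, already centered (each coordinate sums to 0).
points = [(-2.0, -1.5), (2.0, -1.5), (2.0, 1.5), (-2.0, 1.5)]
dist = [[math.dist(p, q) for q in points] for p in points]

b_from_distances = double_center(dist)
b_from_points = [[p[0] * q[0] + p[1] * q[1] for q in points] for p in points]

for row_d, row_p in zip(b_from_distances, b_from_points):
    print([round(v, 6) for v in row_d], [round(v, 6) for v in row_p])
```

Both matrices agree entry by entry, so the inner products are recovered from the distances alone.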

  15. Classical MDS: Theory – Step 2
      - Since $B = XX^T$, we need the “square root” of B
      - B is a symmetric and positive semidefinite n×n matrix
      - Thus, B can be diagonalized: $B = V \Lambda V^T$
        - $\Lambda$ is a diagonal matrix with $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$ on the diagonal (“eigenvalues”)
        - V contains the normalized eigenvectors as columns
      - Some eigenvalues will be zero; drop them: $B = V_1 \Lambda_1 V_1^T$
      - Take the “square root”: $X = V_1 \Lambda_1^{1/2}$
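
A rough sketch of Step 2 in plain Python, assuming B is symmetric with non-negative eigenvalues. To stay dependency-free it uses power iteration with deflation as a stand-in for a full eigendecomposition; this is not how R's cmdscale computes it. The function names and the example points are hypothetical.

```python
import math
import random

def top_eigenpairs(b, m, iterations=500):
    """Approximate the m largest eigenpairs of a symmetric PSD matrix
    by power iteration with deflation."""
    n = len(b)
    a = [row[:] for row in b]
    rng = random.Random(0)  # fixed seed so the run is reproducible
    pairs = []
    for _component in range(m):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _step in range(iterations):
            w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v.
        lam = sum(v[i] * a[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        # Deflate: remove the component just found before the next pass.
        for i in range(n):
            for j in range(n):
                a[i][j] -= lam * v[i] * v[j]
    return pairs

def coordinates(b, m):
    """X = V_1 Lambda_1^(1/2), keeping the m largest eigenvalues."""
    pairs = top_eigenpairs(b, m)
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs]
            for i in range(len(b))]

# B = X X^T for hypothetical centered points (corners of a 4 x 3 rectangle).
points = [(-2.0, -1.5), (2.0, -1.5), (2.0, 1.5), (-2.0, 1.5)]
b = [[p[0] * q[0] + p[1] * q[1] for q in points] for p in points]

x = coordinates(b, 2)
original = [[math.dist(p, q) for q in points] for p in points]
recovered = [[math.dist(p, q) for q in x] for p in x]
```

The recovered coordinates may be rotated or reflected relative to `points`, so the comparison is made on the distance matrices, which agree up to numerical error.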

  16. Classical MDS: Low-dim representation
      - Keep only the few (e.g. 2) largest eigenvalues and the corresponding eigenvectors
      - The resulting X will be the low-dimensional representation we were looking for
      - Goodness of fit (GOF) if we reduce to m dimensions (should be at least 0.8): $\mathrm{GOF} = \frac{\sum_{i=1}^{m} \lambda_i}{\sum_{i=1}^{n} \lambda_i}$
      - Finds the “optimal” low-dim representation: minimizes $S = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( d_{ij}^2 - \left(d_{ij}^{(m)}\right)^2 \right)$
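
The goodness-of-fit rule can be turned into a small helper. This is a sketch with invented eigenvalues; `goodness_of_fit` and `dimensions_needed` are hypothetical names, and the 0.8 threshold is the rule of thumb from the slide.

```python
def goodness_of_fit(eigenvalues, m):
    """Share of the eigenvalue sum captured by the m largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:m]) / sum(ordered)

def dimensions_needed(eigenvalues, threshold=0.8):
    """Smallest m whose goodness of fit reaches the threshold."""
    for m in range(1, len(eigenvalues) + 1):
        if goodness_of_fit(eigenvalues, m) >= threshold:
            return m
    return len(eigenvalues)

# Hypothetical non-negative eigenvalues of B.
eigenvalues = [16.0, 9.0, 4.0, 1.0, 0.0]

print(goodness_of_fit(eigenvalues, 1))  # 16 / 30
print(goodness_of_fit(eigenvalues, 2))  # 25 / 30
print(dimensions_needed(eigenvalues))
```

With these values one dimension captures about 53% of the total and two dimensions about 83%, so two dimensions are the first to pass the 0.8 threshold.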

  17. Classical MDS: Pros and Cons
      + Optimal for Euclidean input data
      + Still optimal if B has non-negative eigenvalues (pos. semidefinite)
      + Very fast
      - No guarantees if B has negative eigenvalues. However, in practice, it is still used then.
      - New measures for goodness of fit:
        - $\mathrm{GOF} = \frac{\sum_{i=1}^{m} \lambda_i^2}{\sum_{i=1}^{n} \lambda_i^2}$
        - $\mathrm{GOF} = \frac{\sum_{i=1}^{m} |\lambda_i|}{\sum_{i=1}^{n} |\lambda_i|}$
        - $\mathrm{GOF} = \frac{\sum_{i=1}^{m} \max(0, \lambda_i)}{\sum_{i=1}^{n} \max(0, \lambda_i)}$
      - Used in R function “cmdscale”
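
To see why modified measures are needed, the sketch below evaluates the three formulas from the slide on an invented spectrum that contains a negative eigenvalue. The function names are hypothetical, and this does not reproduce the exact output of R's cmdscale.

```python
def gof_squared(eigenvalues, m):
    """Ratio of squared eigenvalues: first m over all n."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(l ** 2 for l in ordered[:m]) / sum(l ** 2 for l in ordered)

def gof_absolute(eigenvalues, m):
    """Ratio of absolute eigenvalues: first m over all n."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(abs(l) for l in ordered[:m]) / sum(abs(l) for l in ordered)

def gof_positive(eigenvalues, m):
    """Ratio with negative eigenvalues replaced by zero."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(max(0.0, l) for l in ordered[:m]) / sum(max(0.0, l) for l in ordered)

# Hypothetical eigenvalues of a B that is not positive semidefinite.
eigenvalues = [6.0, 3.0, 1.0, 0.0, -2.0]

naive = sum(eigenvalues[:2]) / sum(eigenvalues)  # 9 / 8, larger than 1
print(naive)
print(gof_squared(eigenvalues, 2))   # 45 / 50
print(gof_absolute(eigenvalues, 2))  # 9 / 12
print(gof_positive(eigenvalues, 2))  # 9 / 10
```

The plain ratio exceeds 1 here because the negative eigenvalue shrinks the denominator, so it no longer reads as a share of the total. Each of the three alternatives stays between 0 and 1.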

  18. Non-metric MDS: Idea
      - Sometimes, there is no strict metric on the original points
      - Example: How much do you like the portraits? (1: Not at all, 10: Very much)
      - (Figure: portraits with example ratings)

  19. Non-metric MDS: Idea
      - Absolute values are not that meaningful
      - Ranking is important
      - Non-metric MDS finds a low-dimensional representation that respects the ranking of distances

  20. Non-metric MDS: Theory
      - $\delta_{ij}$ is the true dissimilarity, $d_{ij}$ is the distance in the representation
      - Minimize STRESS ($\theta$ is an increasing function): $S = \frac{\sum_{i<j} \left(\theta(\delta_{ij}) - d_{ij}\right)^2}{\sum_{i<j} d_{ij}^2}$
      - Optimize over both the positions of the points and $\theta$
      - $\hat{d}_{ij} = \theta(\delta_{ij})$ is called “disparity”
      - Solved numerically (isotonic regression); classical MDS as starting value; very time consuming
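
The STRESS formula can be evaluated for a fixed configuration as sketched below. The disparities are obtained by isotonic regression, implemented here with the pool-adjacent-violators scheme. This covers only the inner step (the best increasing function for given points); the full method also moves the points. The names `isotonic_fit` and `stress`, the example numbers, and the assumption of no ties among the dissimilarities are all illustrative.

```python
def isotonic_fit(values):
    """Least-squares non-decreasing fit to a sequence (pool adjacent violators)."""
    blocks = []  # each block holds [sum of values, number of values]
    for value in values:
        blocks.append([value, 1])
        # Merge neighbouring blocks while their means are out of order.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fit = []
    for total, count in blocks:
        fit.extend([total / count] * count)
    return fit

def stress(dissimilarities, distances):
    """STRESS as on the slide: sum (disparity - d)^2 / sum d^2 over pairs i < j.

    Both lists run over the same pairs; ties in the dissimilarities
    are not treated specially (assumption).
    """
    order = sorted(range(len(dissimilarities)), key=lambda k: dissimilarities[k])
    d_sorted = [distances[k] for k in order]
    disparities = isotonic_fit(d_sorted)
    numerator = sum((dh - d) ** 2 for dh, d in zip(disparities, d_sorted))
    denominator = sum(d ** 2 for d in d_sorted)
    return numerator / denominator

# Hypothetical dissimilarities for four pairs and two candidate representations.
delta = [1.0, 2.0, 3.0, 4.0]
same_ranking = [0.5, 0.9, 2.0, 7.0]    # distances increase with delta
broken_ranking = [1.0, 3.0, 2.0, 4.0]  # second and third pair are swapped

print(stress(delta, same_ranking))
print(stress(delta, broken_ranking))
```

A representation whose distances have the same ranking as the dissimilarities gets STRESS 0, no matter how different the absolute values are. Only rank violations are penalized.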

  21. Non-metric MDS: Example for intuition (only)
      - True points in high-dimensional space: A, B, C with $d_{AB} < d_{BC} < d_{AC}$
      - Compute best representation: AB = 2, BC = 3, AC = 5; STRESS = 19.7

  22. Non-metric MDS: Example for intuition (only)
      - True points in high-dimensional space: A, B, C with $d_{AB} < d_{BC} < d_{AC}$
      - Compute best representation: AB = 2, BC = 2.7, AC = 4.8; STRESS = 20.1

  23. Non-metric MDS: Example for intuition (only)
      - True points in high-dimensional space: A, B, C with $d_{AB} < d_{BC} < d_{AC}$
      - Compute best representation: AB = 2, BC = 2.9, AC = 5.2; STRESS = 18.9
      - Stop if minimal STRESS is found. We will finally represent the distances $d_{AB} = 2$, $d_{BC} = 2.9$, $d_{AC} = 5.2$

  24. Non-metric MDS: Pros and Cons
      + Fulfills a clear objective without many assumptions (minimize STRESS)
      + Results don’t change with rescaling or monotonic variable transformation
      + Works even if you only have rank information
      - Slow in large problems
      - Usually only a local (not global) optimum is found
      - Only gets the ranks of distances right

  25. Non-metric MDS: Example
      - Do people in the same party vote alike?
      - Agreement of 15 congressmen in 19 votes …

  26. Non-metric MDS: Example

  27. Concepts to know
      - Classical MDS:
        - Finds a low-dim projection that respects distances
        - Optimal for Euclidean distances
        - No clear guarantees for other distances
        - Fast
      - Non-metric MDS:
        - Squeezes data points on table
        - Respects only the rankings of distances
        - (Locally) solves a clear objective
        - Slow

  28. R commands to know
      - cmdscale: included in the standard R distribution
      - isoMDS: from package “MASS”
