
Multidimensional Scaling: Applied Multivariate Statistics, Spring 2013


  1. Multidimensional Scaling Applied Multivariate Statistics – Spring 2013

  2. Outline
     - Fundamental Idea
     - Classical Multidimensional Scaling
     - Non-metric Multidimensional Scaling
     Appl. Multivariate Statistics - Spring 2013

  3. Basic Idea: How to represent in two dimensions?

  4. Idea 1: Projection

  5. Idea 2: Squeeze on table
     - Close points stay close

  6. Which idea is better?

  7. Idea of MDS
     - Represent a high-dimensional point cloud in few (usually 2) dimensions while keeping the distances between points similar
     - Classical/Metric MDS: Use a clever projection (R: cmdscale)
     - Non-metric MDS: Squeeze data on table, only conserve ranks (R: isoMDS)

  8. Classical MDS
     - Problem: Given Euclidean distances among points, recover the positions of the points!
     - Example: Road distances between 21 European cities (almost Euclidean, but not quite) …

  9. Classical MDS
     - First try: [figure]

  10. Classical MDS
      - Can identify points only up to shift, rotation, and reflection
      - Flip axes: [figure]
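
The invariance on this slide can be checked numerically. The following is a minimal Python sketch (standard library only) that rotates, mirrors, and shifts a small configuration and compares the distance matrices. The three points, the angle, and the shift are hypothetical values chosen for illustration, and the helper names `pairwise_distances` and `transform` are not from the slides.

```python
import math

def pairwise_distances(points):
    """Euclidean distance matrix for a list of 2-D points."""
    return [[math.dist(p, q) for q in points] for p in points]

def transform(points, angle, shift, reflect=False):
    """Optionally flip the sign of x, rotate by `angle` (radians), then shift."""
    out = []
    for x, y in points:
        if reflect:
            x = -x
        xr = math.cos(angle) * x - math.sin(angle) * y
        yr = math.sin(angle) * x + math.cos(angle) * y
        out.append((xr + shift[0], yr + shift[1]))
    return out

# Hypothetical toy configuration, not data from the slides.
points = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
moved = transform(points, angle=0.7, shift=(10.0, -2.0), reflect=True)

D_original = pairwise_distances(points)
D_moved = pairwise_distances(moved)

# Largest entry-wise difference between the two distance matrices.
max_diff = max(abs(a - b)
               for row_a, row_b in zip(D_original, D_moved)
               for a, b in zip(row_a, row_b))
```

Because the two distance matrices agree up to rounding, a method that only sees distances cannot tell the two configurations apart.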

  11. Classical MDS
      - Another example: Air pollution in US cities
      - Range of manu and popul is much bigger than range of wind
      - Need to standardize to give every variable equal weight
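
Standardizing before computing distances can be sketched as follows. This is a plain-Python illustration that assumes "standardize" means centering each column and dividing by its sample standard deviation. The numbers are made-up stand-ins for (manu, popul, wind), not values from the air pollution data set.

```python
import math

def standardize(rows):
    """Scale every column to mean 0 and sample standard deviation 1."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

# Hypothetical values for three cities: (manu, popul, wind).
raw = [[50.0, 100.0, 8.0],
       [500.0, 600.0, 9.0],
       [3000.0, 3500.0, 10.0]]
scaled = standardize(raw)
```

After scaling, every column has the same spread, so no single variable dominates the Euclidean distances.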

  12. Classical MDS

  13. Classical MDS: Theory
      - Input: Euclidean distances between n objects in p dimensions
      - Output: Position of points up to rotation, reflection, shift
      - Two steps:
        - Compute inner products matrix B from distance
        - Compute positions from B

  14. Classical MDS: Theory – Step 1
      - $X$: $n \times q$ data matrix
      - Inner products matrix: $B = XX^T$, with $b_{ij} = \sum_{k=1}^{q} x_{ik} x_{jk}$
      - Connect to distance: $d_{ij}^2 = \sum_{k=1}^{q} (x_{ik} - x_{jk})^2 = \dots = b_{ii} + b_{jj} - 2 b_{ij}$
      - Center points to avoid shift invariance: $\bar{x} = 0 \rightarrow \sum_{i=1}^{n} x_{ik} = 0 \rightarrow \sum_{i} b_{ij} = \sum_{j} b_{ij} = 0$
      - Invert relationship: $b_{ij} = -\frac{1}{2} \left( d_{ij}^2 - d_{i\cdot}^2 - d_{\cdot j}^2 + d_{\cdot\cdot}^2 \right)$ ("doubly centered"), where $d_{i\cdot}^2$, $d_{\cdot j}^2$, and $d_{\cdot\cdot}^2$ are the row, column, and overall means of the squared distances
      - (Hint for middle of page 108: Plug in (4.3) and equations on top of page 108 to show that the expression involving d's is equal to $b_{ij}$)
      - Thus, we obtained B from the distance matrix
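
Step 1 can be sketched in a few lines of plain Python. The function name `double_center` and the toy distance matrix (three points on a line at 0, 1, and 3) are assumptions made for illustration only.

```python
def double_center(D):
    """Turn a matrix of Euclidean distances D into the inner-product matrix B."""
    n = len(D)
    D2 = [[d * d for d in row] for row in D]                 # squared distances
    row_mean = [sum(row) / n for row in D2]                  # d^2_{i.}
    col_mean = [sum(D2[i][j] for i in range(n)) / n for j in range(n)]  # d^2_{.j}
    total_mean = sum(row_mean) / n                           # d^2_{..}
    return [[-0.5 * (D2[i][j] - row_mean[i] - col_mean[j] + total_mean)
             for j in range(n)] for i in range(n)]

# Hypothetical toy example: three points on a line at 0, 1 and 3.
D = [[0.0, 1.0, 3.0],
     [1.0, 0.0, 2.0],
     [3.0, 2.0, 0.0]]
B = double_center(D)
```

For this example the centered coordinates are -4/3, -1/3, and 5/3, and B equals their outer product, which is what $B = XX^T$ predicts for centered points.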

  15. Classical MDS: Theory – Step 2
      - Since $B = XX^T$, we need the "square root" of B
      - B is a symmetric and positive semidefinite $n \times n$ matrix
      - Thus, B can be diagonalized: $B = V \Lambda V^T$
        - $\Lambda$ is a diagonal matrix with $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_n$ on the diagonal ("eigenvalues")
        - V contains the normalized eigenvectors as columns
      - Some eigenvalues will be zero; drop them: $B = V_1 \Lambda_1 V_1^T$
      - Take "square root": $X = V_1 \Lambda_1^{1/2}$
      - Thus we obtained the positions of the points from the distances between all points
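
Step 2 can be illustrated without a linear algebra library. The sketch below uses power iteration with deflation as a stand-in eigen-solver. That is a choice made here for self-containedness, not the method used by `cmdscale`, and it is only suitable for small positive semidefinite matrices. The names `top_eigenpairs` and `mds_coordinates` and the rectangle example are hypothetical.

```python
import math
import random

def top_eigenpairs(B, m, iters=1000, seed=0):
    """m largest eigenpairs of a symmetric PSD matrix via power iteration + deflation."""
    n = len(B)
    A = [row[:] for row in B]            # working copy that gets deflated
    rng = random.Random(seed)
    pairs = []
    for _ in range(m):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:             # nothing left but (numerically) zero eigenvalues
                break
            v = [x / norm for x in w]
        Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(a * x for a, x in zip(Av, v))   # Rayleigh quotient, v has unit length
        pairs.append((lam, v))
        for i in range(n):               # deflate: remove the component just found
            for j in range(n):
                A[i][j] -= lam * v[i] * v[j]
    return pairs

def mds_coordinates(B, m):
    """Coordinates X = V_1 * Lambda_1^(1/2), restricted to the m largest eigenvalues."""
    pairs = top_eigenpairs(B, m)
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs]
            for i in range(len(B))]

# Hypothetical example: corners of a centered 2 x 1 rectangle.
pts = [(-1.0, -0.5), (1.0, -0.5), (1.0, 0.5), (-1.0, 0.5)]
B = [[p[0] * q[0] + p[1] * q[1] for q in pts] for p in pts]   # B = X X^T
X = mds_coordinates(B, 2)
```

The recovered `X` may differ from `pts` by a rotation or reflection, but the pairwise distances between rows of `X` match those between the original points.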

  16. Classical MDS: Low-dim representation
      - Keep only the few (e.g. 2) largest eigenvalues and corresponding eigenvectors
      - The resulting X will be the low-dimensional representation we were looking for
      - Goodness of fit (GOF) if we reduce to m dimensions: $\mathrm{GOF} = \frac{\sum_{i=1}^{m} \lambda_i}{\sum_{i=1}^{n} \lambda_i}$ (should be at least 0.8)
      - Finds "optimal" low-dim representation: Minimizes $S = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( d_{ij}^2 - (d_{ij}^{(m)})^2 \right)$
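
The goodness-of-fit rule can be written directly from the formula. The sketch below is in plain Python. The eigenvalues are hypothetical, and the helper `smallest_adequate_dimension` is an illustrative addition that applies the 0.8 rule of thumb from the slide.

```python
def goodness_of_fit(eigenvalues, m):
    """Share of the eigenvalue sum captured by the m largest eigenvalues."""
    lam = sorted(eigenvalues, reverse=True)
    return sum(lam[:m]) / sum(lam)

def smallest_adequate_dimension(eigenvalues, threshold=0.8):
    """Smallest m whose GOF reaches the threshold (0.8 as suggested on the slide)."""
    for m in range(1, len(eigenvalues) + 1):
        if goodness_of_fit(eigenvalues, m) >= threshold:
            return m
    return len(eigenvalues)

# Hypothetical eigenvalues of B, all non-negative (Euclidean case).
eigenvalues = [6.0, 3.0, 1.0, 0.0]
gof_1 = goodness_of_fit(eigenvalues, 1)   # 6 / 10
gof_2 = goodness_of_fit(eigenvalues, 2)   # 9 / 10
m_needed = smallest_adequate_dimension(eigenvalues)
```

With these values one dimension captures 60% and two dimensions 90% of the eigenvalue sum, so two dimensions pass the 0.8 threshold.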

  17. Classical MDS: Pros and Cons
      + Optimal for Euclidean input data
      + Still optimal if B has non-negative eigenvalues (pos. semidefinite)
      + Very fast
      - No guarantees if B has negative eigenvalues
      However, in practice, it is still used then. New measures for goodness of fit:
      $\mathrm{GOF} = \frac{\sum_{i=1}^{m} \lambda_i^2}{\sum_{i=1}^{n} \lambda_i^2}$, $\quad \mathrm{GOF} = \frac{\sum_{i=1}^{m} |\lambda_i|}{\sum_{i=1}^{n} |\lambda_i|}$, $\quad \mathrm{GOF} = \frac{\sum_{i=1}^{m} \max(0, \lambda_i)}{\sum_{i=1}^{n} \max(0, \lambda_i)}$
      Used in R function "cmdscale"
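
The three alternative measures differ only in how each eigenvalue is transformed before summing. The Python sketch below makes that explicit. The function names and the eigenvalues (including one negative value, as can happen for non-Euclidean input) are hypothetical.

```python
def _gof(eigenvalues, m, transform):
    """Ratio of transformed eigenvalue sums: m largest over all n."""
    lam = sorted(eigenvalues, reverse=True)
    return sum(transform(l) for l in lam[:m]) / sum(transform(l) for l in lam)

def gof_squared(eigenvalues, m):
    return _gof(eigenvalues, m, lambda l: l * l)

def gof_abs(eigenvalues, m):
    return _gof(eigenvalues, m, abs)

def gof_positive(eigenvalues, m):
    return _gof(eigenvalues, m, lambda l: max(0.0, l))

# Hypothetical eigenvalues with one negative entry.
eigenvalues = [5.0, 3.0, 1.0, -1.0]
results = (gof_squared(eigenvalues, 2),    # 34 / 36
           gof_abs(eigenvalues, 2),        # 8 / 10
           gof_positive(eigenvalues, 2))   # 8 / 9
```

The measures give different numbers for the same spectrum, so it matters which one is being reported when eigenvalues are negative.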

  18. Non-metric MDS: Idea
      - Sometimes, there is no strict metric on original points
      - Example: How beautiful are these persons? (1: Not at all, 10: Very much)
      - [Figure: pictures with example ratings]

  19. Non-metric MDS: Idea
      - Absolute values are not that meaningful
      - Ranking is important
      - Non-metric MDS finds a low-dimensional representation which respects the ranking of distances

  20. Non-metric MDS: Theory
      - $\delta_{ij}$ is the true dissimilarity, $d_{ij}$ is the distance in the representation
      - Minimize STRESS ($\theta$ is an increasing function): $S = \frac{\sum_{i<j} \left( \theta(\delta_{ij}) - d_{ij} \right)^2}{\sum_{i<j} d_{ij}^2}$
      - Optimize over both the positions of the points and $\theta$
      - $\hat{d}_{ij} = \theta(\delta_{ij})$ is called "disparity"
      - Solved numerically (isotonic regression); Classical MDS as starting value; very time consuming
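
For a fixed configuration, the inner part of this optimization can be sketched in plain Python: the disparities come from an isotonic (monotone) regression of the representation distances on the order of the dissimilarities, here via the pool-adjacent-violators algorithm, and STRESS follows from the formula on the slide. This sketch does not move the points, so it is only one step of the full procedure. The function names and the representation distances are hypothetical.

```python
def isotonic_regression(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks = []                              # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge while the previous block's mean exceeds the last block's mean.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def stress(dissimilarities, distances):
    """Return (S, disparities) with S = sum (d_hat - d)^2 / sum d^2."""
    order = sorted(range(len(dissimilarities)), key=lambda k: dissimilarities[k])
    fitted = isotonic_regression([distances[k] for k in order])
    disparities = [0.0] * len(distances)
    for pos, k in enumerate(order):
        disparities[k] = fitted[pos]
    num = sum((dh - d) ** 2 for dh, d in zip(disparities, distances))
    den = sum(d * d for d in distances)
    return num / den, disparities

# Pairs in the order (AB, BC, AC); the representation distances are made up.
delta = [2.0, 3.0, 5.0]
d_repr = [2.0, 3.5, 3.0]                     # BC and AC are in the wrong order
S, d_hat = stress(delta, d_repr)
```

Here the two out-of-order distances are pooled to their mean 3.25, and S measures how far the configuration is from any monotone transformation of the dissimilarities. A configuration whose distances already follow the dissimilarity order has S = 0.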

  21. Non-metric MDS: Example for intuition (only)
      - True points in high-dimensional space: A, B, C with $\delta_{AB} < \delta_{BC} < \delta_{AC}$
      - Compute best representation (figure labels: distances 5, 3, 2 between points A, B, C; STRESS = 19.7)

  22. Non-metric MDS: Example for intuition (only)
      - True points in high-dimensional space: A, B, C with $\delta_{AB} < \delta_{BC} < \delta_{AC}$
      - Compute best representation (figure labels: distances 4.8, 2.7, 2 between points A, B, C; STRESS = 20.1)

  23. Non-metric MDS: Example for intuition (only)
      - True points in high-dimensional space: A, B, C with $\delta_{AB} < \delta_{BC} < \delta_{AC}$
      - Compute best representation (figure labels: distances 5.2, 2.9, 2 between points A, B, C; STRESS = 18.9)
      - Stop if minimal STRESS is found.
      - We will finally represent the "transformed true distances" (called disparities) $\hat{d}_{AB} = 2$, $\hat{d}_{BC} = 2.9$, $\hat{d}_{AC} = 5.2$ instead of the true distances $\delta_{AB} = 2$, $\delta_{BC} = 3$, $\delta_{AC} = 5$

  24. Non-metric MDS: Pros and Cons
      + Fulfills a clear objective without many assumptions (minimize STRESS)
      + Results don't change with rescaling or monotonic variable transformation
      + Works even if you only have rank information
      - Slow in large problems
      - Usually only local (not global) optimum found
      - Only gets ranks of distances right
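
The invariance claim rests on the fact that a strictly increasing transformation does not change the rank order of the dissimilarities. A minimal Python check, with hypothetical dissimilarities and an arbitrarily chosen transformation:

```python
import math

def ranks(values):
    """Rank of each value (1 = smallest); assumes no ties for simplicity."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0] * len(values)
    for rank, k in enumerate(order, start=1):
        r[k] = rank
    return r

# Hypothetical dissimilarities between four pairs of objects.
dissimilarities = [2.0, 3.0, 5.0, 4.0]

# Any strictly increasing map works here; log(1 + 10 * d) is an arbitrary choice.
transformed = [math.log(1.0 + 10.0 * d) for d in dissimilarities]

ranks_before = ranks(dissimilarities)
ranks_after = ranks(transformed)
```

Since non-metric MDS only uses this ordering, both inputs lead to the same target for the optimization.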

  25. Non-metric MDS: Example
      - Do people in the same party vote alike?
      - Number of votes where 15 congressmen disagreed in 19 votes …
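
A dissimilarity of this kind can be built by counting, for each pair of members, the votes on which they differ. The sketch below assumes that simple counting rule and uses three made-up voting records over five votes. It is not the congressional data from the slide.

```python
def disagreement_matrix(votes):
    """votes[i][k] is member i's vote on bill k; entry (i, j) counts differing votes."""
    n = len(votes)
    return [[sum(a != b for a, b in zip(votes[i], votes[j])) for j in range(n)]
            for i in range(n)]

# Hypothetical records: Y = yes, N = no.
votes = [
    "YYNYN",
    "YYNNN",
    "NNYYY",
]
D = disagreement_matrix(votes)
```

The resulting symmetric matrix with a zero diagonal has the form of a dissimilarity matrix that an MDS routine such as `isoMDS` takes as input.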

  26. Non-metric MDS: Example

  27. Concepts to know
      - Classical MDS:
        - Finds low-dim projection that respects distances
        - Optimal for Euclidean distances
        - No clear guarantees for other distances
        - Fast
      - Non-metric MDS:
        - Squeezes data points on table
        - Respects only rankings of distances
        - (Locally) solves clear objective
        - Slow

  28. R commands to know
      - cmdscale, included in standard R distribution
      - isoMDS, from package "MASS"
