SLIDE 1

Multidimensional Scaling

Max Turgeon

STAT 4690–Applied Multivariate Analysis

SLIDE 2

Recap: PCA

  • We discussed several interpretations of PCA.
  • Pearson: PCA gives the best linear approximation to the data (at a fixed dimension).
  • We also used PCA to visualise multivariate data (see the sketch below):
  • Fit PCA.
  • Plot PC1 against PC2.
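A minimal sketch of this recipe, using the built-in swiss data that reappears later in the deck:

pca <- prcomp(swiss)              # fit PCA on the raw data
plot(pca$x[, 1], pca$x[, 2],
     xlab = "PC1", ylab = "PC2")  # plot PC1 against PC2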

SLIDE 3

Multidimensional scaling

  • Multidimensional scaling is a method that addresses these two goals explicitly.
  • It has PCA as a special case.
  • But it is much more general.
  • The input of MDS is a dissimilarity matrix $\Delta$, and it aims to represent the data in a lower-dimensional space such that the resulting dissimilarities $\tilde{\Delta}$ are as close as possible to the original dissimilarities:
  • $\Delta \approx \tilde{\Delta}$.

SLIDE 4

Example of dissimilarities

  • Dissimilarities measure how different two observations are.
  • The larger the dissimilarity, the more different the observations.
  • Therefore, any distance measure can be used as a dissimilarity measure:
  • Euclidean distance in $\mathbb{R}^p$.
  • Mahalanobis distance.
  • Driving distance between cities.
  • Graph-based distance.
  • Any similarity measure can be turned into a dissimilarity measure using a monotone decreasing transformation (see the sketch below).
  • E.g. $r_{ij} \Rightarrow 1 - r_{ij}^2$.
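A quick sketch of this last point, assuming we measure the similarity between the swiss variables by their correlation:

# Hypothetical example: turn correlations (similarities) into
# dissimilarities via the monotone decreasing map r -> 1 - r^2
R <- cor(swiss)
D <- as.dist(1 - R^2)    # dissimilarity matrix, ready for MDS
cmdscale(D, k = 2)       # embed the six variables in the plane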

SLIDE 5

Two types of MDS

  • Metric MDS
  • The embedding in the lower dimensional space uses the same dissimilarity measure as in the original space.
  • Nonmetric MDS
  • The embedding in the lower dimensional space only uses the rank information from the original space.

SLIDE 6

Metric MDS–Algorithm

  • Input: An $n \times n$ matrix $\Delta$ of dissimilarities.
  • Output: An $n \times r$ matrix $\tilde{X}$, with $r < p$.

Algorithm:
  1. Create the matrix D containing the squares of the entries in $\Delta$.
  2. Create S by centering both the rows and the columns of D, and multiplying by $-\frac{1}{2}$.
  3. Compute the eigenvalue decomposition $S = U \Lambda U^T$.
  4. Let $\tilde{X}$ be the matrix containing the first r columns of $U \Lambda^{1/2}$ (see the sketch below).
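The next slides walk through these steps on the swiss data; as a compact sketch, the whole algorithm fits in one small helper (metric_mds is a hypothetical name, not part of the deck):

metric_mds <- function(Delta, r) {
  D <- as.matrix(Delta)^2                            # step 1: squared entries
  S <- scale(D, center = TRUE, scale = FALSE)        # step 2: centre columns...
  S <- t(scale(t(S), center = TRUE, scale = FALSE))  # ...centre rows...
  S <- -0.5 * S                                      # ...and multiply by -1/2
  decomp <- eigen(S)                                 # step 3
  Lambda <- diag(pmax(decomp$values[1:r], 0), nrow = r)
  decomp$vectors[, 1:r] %*% sqrt(Lambda)             # step 4
}
# metric_mds(dist(swiss), 2) should match cmdscale(dist(swiss), k = 2)
# up to the sign of each column.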

SLIDE 7

Example i

Delta <- dist(swiss)
D <- Delta^2
# Center columns
B <- scale(D, center = TRUE, scale = FALSE)
# Center rows
B <- t(scale(t(B), center = TRUE, scale = FALSE))
B <- -0.5 * B

SLIDE 8

Example ii

decomp <- eigen(B)
Lambda <- diag(pmax(decomp$values, 0))
X_tilde <- decomp$vectors %*% Lambda^0.5
plot(X_tilde)

SLIDE 9

Example iii

[Figure: scatter plot of X_tilde[,1] (x-axis) against X_tilde[,2] (y-axis).]

SLIDE 10

Example iv

mds <- cmdscale(Delta, k = 2)
all.equal(X_tilde[, 1:2], mds, check.attributes = FALSE)
## [1] TRUE

SLIDE 11

Example v

library(tidyverse)
# Let's add annotations
dimnames(X_tilde) <- list(rownames(swiss),
                          paste0("MDS", seq_len(ncol(X_tilde))))
X_tilde <- as.data.frame(X_tilde) %>%
  rownames_to_column("District")

SLIDE 12

Example vi

X_tilde <- X_tilde %>%
  mutate(Canton = case_when(
    District %in% c("Courtelary", "Moutier",
                    "Neuveville") ~ "Bern",
    District %in% c("Broye", "Glane", "Gruyere",
                    "Sarine", "Veveyse") ~ "Fribourg",
    District %in% c("Conthey", "Entremont", "Herens",
                    "Martigwy", "Monthey", "St Maurice",
                    "Sierre", "Sion") ~ "Valais"))

SLIDE 13

Example vii

X_tilde <- X_tilde %>%
  mutate(Canton = case_when(
    !is.na(Canton) ~ Canton,
    District %in% c("Boudry", "La Chauxdfnd", "Le Locle",
                    "Neuchatel", "ValdeTravers",
                    "Val de Ruz") ~ "Neuchatel",
    District %in% c("V. De Geneve", "Rive Droite",
                    "Rive Gauche") ~ "Geneva",
    District %in% c("Delemont", "Franches-Mnt",
                    "Porrentruy") ~ "Jura",
    TRUE ~ "Vaud"))

SLIDE 14

Example viii

library(ggrepel)
X_tilde %>%
  ggplot(aes(MDS1, MDS2)) +
  geom_point(aes(colour = Canton)) +
  geom_label_repel(aes(label = District)) +
  theme_minimal() +
  theme(legend.position = "top")

SLIDE 15

Example ix

[Figure: labelled MDS plot of the swiss districts (MDS1 vs MDS2), points coloured by canton: Bern, Fribourg, Geneva, Jura, Neuchatel, Valais, Vaud.]

SLIDE 16

Figure 1

SLIDE 17

Another example i

library(psych)
cities[1:5, 1:5]
##      ATL  BOS ORD  DCA  DEN
## ATL    0  934 585  542 1209
## BOS  934    0 853  392 1769
## ORD  585  853   0  598  918
## DCA  542  392 598    0 1493
## DEN 1209 1769 918 1493    0

SLIDE 18

Another example ii

mds <- cmdscale(cities, k = 2)
colnames(mds) <- c("MDS1", "MDS2")
mds <- mds %>%
  as.data.frame %>%
  rownames_to_column("Cities")

SLIDE 19

Another example iii

mds %>%
  ggplot(aes(MDS1, MDS2)) +
  geom_point() +
  geom_label_repel(aes(label = Cities)) +
  theme_minimal()

SLIDE 20

Another example iv

[Figure: labelled MDS plot of the 11 cities (MDS1 vs MDS2).]

SLIDE 21

Another example v

mds %>%
  mutate(MDS1 = -MDS1, MDS2 = -MDS2) %>%
  ggplot(aes(MDS1, MDS2)) +
  geom_point() +
  geom_label_repel(aes(label = Cities)) +
  theme_minimal()

SLIDE 22

Another example vi

[Figure: labelled MDS plot of the cities after flipping both axes (MDS1 vs MDS2).]

SLIDE 23

Why does it work? i

  • The algorithm may seem like black magic…
  • Double centering? Eigenvectors of distances?
  • Let's try to justify it.
  • Let $Y_1, \ldots, Y_n$ be a set of points in $\mathbb{R}^p$.
  • Recall that in $\mathbb{R}^p$, the Euclidean distance and the scalar product are related as follows:

SLIDE 24

Why does it work? ii

$$d(Y_i, Y_j)^2 = \langle Y_i - Y_j, Y_i - Y_j \rangle = (Y_i - Y_j)^T (Y_i - Y_j) = Y_i^T Y_i - 2 Y_i^T Y_j + Y_j^T Y_j.$$

  • In other words, the scalar product between $Y_i$ and $Y_j$ is given by
$$Y_i^T Y_j = -\frac{1}{2} \left( d(Y_i, Y_j)^2 - Y_i^T Y_i - Y_j^T Y_j \right).$$

SLIDE 25

Why does it work? iii

  • Let S be the matrix whose (i, j)-th entry is $Y_i^T Y_j$, and note that D is the matrix whose (i, j)-th entry is $d(Y_i, Y_j)^2$.
  • Now, assume that the dataset $Y_1, \ldots, Y_n$ has sample mean $\bar{Y} = 0$ (i.e. it is centred). The average of the i-th row of D is

SLIDE 26

Why does it work? iv

$$\begin{aligned}
\frac{1}{n} \sum_{j=1}^n d(Y_i, Y_j)^2
&= \frac{1}{n} \sum_{j=1}^n \left( Y_i^T Y_i - 2 Y_i^T Y_j + Y_j^T Y_j \right) \\
&= Y_i^T Y_i - \frac{2}{n} \sum_{j=1}^n Y_i^T Y_j + \frac{1}{n} \sum_{j=1}^n Y_j^T Y_j \\
&= Y_i^T Y_i - 2 Y_i^T \bar{Y} + \frac{1}{n} \sum_{j=1}^n Y_j^T Y_j \\
&= S_{ii} + \frac{1}{n} \sum_{j=1}^n S_{jj}.
\end{aligned}$$

SLIDE 27

Why does it work? v

  • Similarly, the average of the j-th column of D is given by
$$\frac{1}{n} \sum_{i=1}^n d(Y_i, Y_j)^2 = \frac{1}{n} \sum_{i=1}^n S_{ii} + S_{jj}.$$
  • We can then deduce that the mean of all the entries of D is given by
$$\frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n d(Y_i, Y_j)^2 = \frac{1}{n} \sum_{i=1}^n S_{ii} + \frac{1}{n} \sum_{j=1}^n S_{jj}.$$

SLIDE 28

Why does it work? vi

  • Putting all of this together, we now have that
$$Y_i^T Y_i + Y_j^T Y_j = \frac{1}{n} \sum_{j=1}^n d(Y_i, Y_j)^2 + \frac{1}{n} \sum_{i=1}^n d(Y_i, Y_j)^2 - \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n d(Y_i, Y_j)^2.$$

SLIDE 29

Why does it work? vii

  • In other words, we can recover the scalar products from the squared distances through double centering and scaling (see the numerical check below).
  • Moreover, since we assumed the data was centred, the SVD of the matrix S is related to the SVD of the sample covariance matrix.
  • In this context, up to a constant, MDS and PCA give the same results.
  • Note: This idea that double centering allows us to go from dissimilarities to scalar products will come back again in the next lecture on kernel methods.
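A small numerical check of this conclusion on the swiss data (a sketch; this code is not in the original deck):

# Double centering the squared Euclidean distances of centred data
# should recover the matrix of scalar products Y Y^T
Y <- scale(as.matrix(swiss), center = TRUE, scale = FALSE)
S <- Y %*% t(Y)                                    # scalar products
D <- as.matrix(dist(Y))^2                          # squared distances
B <- scale(D, center = TRUE, scale = FALSE)        # centre columns
B <- t(scale(t(B), center = TRUE, scale = FALSE))  # centre rows
B <- -0.5 * B
all.equal(B, S, check.attributes = FALSE)          # should return TRUE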

SLIDE 30

Further comments

  • In PCA, we performed an eigendecomposition of the sample covariance matrix.
  • This is a p × p matrix.
  • In MDS, we performed an eigendecomposition of the doubly centred and scaled matrix of squared distances.
  • This is an n × n matrix.
  • If our dissimilarities are computed using the Euclidean distance, both methods will give the same answer (see the sketch below).
  • BUT: the smaller matrix will be faster to compute and faster to decompose.
  • $n > p \Rightarrow$ PCA; $n < p \Rightarrow$ MDS.
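A quick sketch of this equivalence on the swiss data; the two sets of coordinates should agree up to the sign of each column:

pca <- prcomp(swiss)                 # p x p eigendecomposition
mds <- cmdscale(dist(swiss), k = 2)  # n x n eigendecomposition
all.equal(abs(unname(pca$x[, 1:2])), abs(unname(mds)),
          check.attributes = FALSE)  # should return TRUE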

SLIDE 31

Stress function i

  • Nonmetric MDS approaches the problem a bit differently.
  • We still have the same input $\Delta$ of dissimilarities, but we also have an objective function called the stress function.
  • Recall that our goal is to represent the data in a lower-dimensional space such that the resulting dissimilarities $\tilde{\Delta}$ are as close as possible to the original dissimilarities:
  • $\Delta_{ij} \approx \tilde{\Delta}_{ij}$, for all i, j.

SLIDE 32

Stress function ii

  • The stress function is defined as
$$\mathrm{Stress}(\tilde{\Delta}; r) = \frac{\sum_{i,j=1}^n w_{ij} (\Delta_{ij} - \tilde{\Delta}_{ij})^2}{c},$$
where
  • $w_{ij}$ are nonnegative weights;
  • c is a normalising constant.
  • Note that the stress function depends on both the dimension r of the lower space and the distances $\tilde{\Delta}$.
  • Goal: Find points in $\mathbb{R}^r$ such that their dissimilarities minimise the stress function (see the sketch below).
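A minimal sketch of this definition in R; the weights w and normalising constant c are left to the user (the defaults below are hypothetical, not from the deck):

# Weighted stress between original and embedded dissimilarities
# (both can be dist objects of the same size)
stress_fn <- function(Delta, Delta_tilde, w = 1, c = 1) {
  sum(w * (Delta - Delta_tilde)^2) / c
}
# e.g. stress_fn(dist(swiss), dist(cmdscale(dist(swiss), k = 2)))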

SLIDE 33

Sammon’s Nonlinear Mapping

  • The stress function is
$$\mathrm{Stress}(\tilde{\Delta}; r) = \frac{1}{c} \sum_{i < j} \frac{(\Delta_{ij} - \tilde{\Delta}_{ij})^2}{\Delta_{ij}}, \quad \text{where } c = \sum_{i < j} \Delta_{ij}.$$
  • We don't make any assumption on the dissimilarities $\Delta$, but we assume that $\tilde{\Delta}$ arises from the Euclidean distance in $\mathbb{R}^r$.
  • This makes the minimisation problem easier and amenable to Newton's method (see the sketch below).

SLIDE 34

Example i

library(MASS)
Delta <- dist(swiss)
mds <- sammon(Delta, k = 2)
## Initial stress        : 0.01959
## stress after  0 iters: 0.01959

SLIDE 35

Example ii

plot(mds$points)

SLIDE 36

Example iii

[Figure: scatter plot of mds$points[,1] (x-axis) against mds$points[,2] (y-axis).]

SLIDE 37

Example iv

# Fit for different values of k
stresses <- sapply(seq(2, 10), function(k) {
  sammon(Delta, k = k, trace = FALSE)$stress
})
plot(seq(2, 10), stresses, type = 'b')

SLIDE 38

Example v

[Figure: Sammon stress (y-axis, 0.000 to 0.020) against k = seq(2, 10) (x-axis).]

SLIDE 39

Example vi

library(scatterplot3d)
mds <- sammon(Delta, k = 3)
## Initial stress        : 0.00243
## stress after  10 iters: 0.00095, magic = 0.500
## stress after  20 iters: 0.00094, magic = 0.500
scatterplot3d(mds$points, xlab = "MDS1",
              ylab = "MDS2", zlab = "MDS3")

SLIDE 40

Example vii

[Figure: 3D scatter plot of the Sammon embedding (MDS1, MDS2, MDS3).]

SLIDE 41

Kruskal’s Nonmetric MDS

  • Kruskal’s approach is based on ranks.
  • In other words: instead of fjnding points in Rr with

similar distances, his method tries to preserve the relative

  • rdering of the dissimilarities.
  • The most dissimilar points in Rp should be represented

by the most dissimilar points in Rr, but the actual magnitude is irrelevant.

  • This is achieved by allowing a monotone transformation f
  • f the dissimilarities. We thus get

Stress( ˜ ∆; r) =

  • n

i=1,i<j(f(∆ij) − ˜

∆ij)2

n

i=1,i<j ˜

∆ij .
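The fitted monotone transformation f can be examined with MASS::Shepard(), which compares the embedded distances to the original dissimilarities (a sketch, reusing Delta from the earlier examples):

library(MASS)
mds_k <- isoMDS(Delta, k = 2, trace = FALSE)
shep <- Shepard(Delta, mds_k$points)
plot(shep$x, shep$y, pch = ".",
     xlab = "Original dissimilarity",
     ylab = "Embedded distance")
lines(shep$x, shep$yf, type = "S")  # fitted monotone transformation f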

SLIDE 42

Example (cont’d) i

mds_s <- sammon(Delta, k = 2)
## Initial stress        : 0.01959
## stress after  0 iters: 0.01959
mds_k <- isoMDS(Delta, k = 2)

SLIDE 43

Example (cont’d) ii

## initial  value 5.463800
## iter   5 value 4.499103
## iter   5 value 4.495335
## iter   5 value 4.492669
## final  value 4.492669
## converged
par(mfrow = c(1, 2))
plot(mds_s$points, main = "Sammon",
     xlab = "MDS1", ylab = "MDS2")
plot(mds_k$points, main = "Kruskal",
     xlab = "MDS1", ylab = "MDS2")

SLIDE 44

Example (cont’d) iii

[Figure: side-by-side MDS plots (MDS1 vs MDS2); left panel "Sammon", right panel "Kruskal".]

SLIDE 45

Example (cont’d) iv

# Sammon and Kruskal have different
# optimal k
stresses <- sapply(seq(2, 10), function(k) {
  isoMDS(Delta, k = k, trace = FALSE)$stress
})
plot(seq(2, 10), stresses, type = 'b')

SLIDE 46

Example (cont’d) v

[Figure: Kruskal stress (y-axis, roughly 1 to 4) against k = seq(2, 10) (x-axis).]

SLIDE 47

Example (cont’d) vi

mds_opt_s <- sammon(Delta, k = 3, trace = FALSE)
mds_opt_k <- isoMDS(Delta, k = 6, trace = FALSE)
# Let's cluster in the MDS space
cluster_s <- kmeans(mds_opt_s$points, centers = 2)
cluster_k <- kmeans(mds_opt_k$points, centers = 2)

SLIDE 48

Example (cont’d) vii

par(mfrow = c(1, 2))
plot(mds_s$points, main = "Sammon",
     xlab = "MDS1", ylab = "MDS2",
     col = cluster_s$cluster)
plot(mds_k$points, main = "Kruskal",
     xlab = "MDS1", ylab = "MDS2",
     col = cluster_k$cluster)

SLIDE 49

Example (cont’d) viii

[Figure: side-by-side MDS plots (MDS1 vs MDS2) with points coloured by cluster; left panel "Sammon", right panel "Kruskal".]

SLIDE 50

Example (cont’d) ix

# More interestingly, you can use MDS to
# cluster data where you only have distances
stresses <- sapply(seq(2, 6), function(k) {
  isoMDS(as.matrix(cities), k = k, trace = FALSE)$stress
})
plot(seq(2, 6), stresses, type = 'b')

SLIDE 51

Example (cont’d) x

[Figure: Kruskal stress (y-axis, roughly 0.1050 to 0.1065) against k = seq(2, 6) (x-axis).]

SLIDE 52

Example (cont’d) xi

mds_cities <- isoMDS(as.matrix(cities), k = 6,
                     trace = FALSE)
cluster_cities <- kmeans(mds_cities$points, centers = 2)
plot(mds_cities$points, main = "Kruskal",
     xlab = "MDS1", ylab = "MDS2", type = 'n')
text(mds_cities$points, colnames(cities),
     col = cluster_cities$cluster)

SLIDE 53

Example (cont’d) xii

[Figure: "Kruskal" MDS plot of the cities (MDS1 vs MDS2), city codes coloured by cluster.]

SLIDE 54

Summary

  • Multidimensional scaling is mainly a method for visualising multivariate data.
  • It works by finding points in a lower dimensional space whose dissimilarities are similar to those in the original space.
  • It only requires a matrix of dissimilarities.
  • Therefore, it allows us to visualise data with limited information.
  • MDS is an example of a nonlinear dimension reduction method.
