Statistics and learning
Multivariate statistics 2 and clustering

Emmanuel Rachelson and Matthieu Vignes
ISAE SupAero
Wednesday 2nd and 9th October 2013

E. Rachelson & M. Vignes (ISAE), SAD 2013
Link to the previous session

Goal
◮ review PCA, if needed?
◮ introduce Multidimensional scaling (MDS) as a factor analysis of a distance matrix
◮ introduce Canonical correlation analysis (CCA): for 2 groups of quantitative variables
◮ introduce Correspondence analysis (CA): for 2 qualitative variables
◮ introduce clustering methods like hierarchical clustering or K-means
Multidimensional scaling (MDS)

◮ Now only a proximity index between individuals is known; the variables themselves are not available.
◮ Goal: represent the cloud of points in a low-dimensional subspace.
◮ MDS = PCA on the distance matrix!
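The identity "MDS = PCA on the distance matrix" can be illustrated with a minimal classical-MDS sketch. This is not from the slides: it uses synthetic data and plain numpy, double-centering the squared distances to recover the Gram matrix, then comparing the resulting coordinates with PCA scores.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))          # 20 individuals, 5 (hidden) variables

# Pairwise Euclidean distances: the only input MDS is allowed to see.
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))

# Double-centering -D^2/2 recovers the Gram matrix of the centered data.
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J

# Top eigenvectors of B, scaled by sqrt(eigenvalue), give the MDS coordinates.
w, V = np.linalg.eigh(B)
idx = np.argsort(w)[::-1]
coords = V[:, idx[:2]] * np.sqrt(w[idx[:2]])

# PCA scores on the centered data agree with the MDS coordinates up to sign,
# which is exactly the "MDS = PCA on the distance matrix" claim.
Xc = X - X.mean(0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_scores = U[:, :2] * s[:2]
print(np.allclose(np.abs(coords), np.abs(pca_scores), atol=1e-6))  # True
```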
Canonical correlation analysis (CCA)

◮ Uses techniques close to PCA to achieve a kind of multiple-output regression.
◮ Goal: linking 2 groups of variables (X and Y) measured on the same individuals.
◮ Example from yesterday: the study of fatty acids and gene expression levels in mice.
◮ Consists in looking for a couple of vectors, one related to X (gene expressions) and one related to Y (fatty acids), whose associated linear combinations are maximally correlated.
◮ Variables can be represented in either basis; it does not change the analysis.
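The "couple of vectors" can be computed directly. The sketch below is not the course's implementation: it uses hypothetical synthetic stand-ins for the mice data (a shared latent signal z linking a "gene expression" block X to a "fatty acid" block Y) and the standard QR-then-SVD construction of the first canonical pair.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
# Hypothetical stand-in for the mice data: one latent signal drives both blocks.
z = rng.normal(size=n)
X = np.column_stack([z + 0.5 * rng.normal(size=n) for _ in range(4)])
Y = np.column_stack([z + 0.5 * rng.normal(size=n) for _ in range(3)])

def first_canonical_pair(X, Y):
    """Return (a, b, rho): directions for X and Y maximising corr(Xa, Yb)."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    # Orthonormalise each block; the singular values of Qx^T Qy are then
    # exactly the canonical correlations.
    Qx, Rx = np.linalg.qr(Xc)
    Qy, Ry = np.linalg.qr(Yc)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)
    a = np.linalg.solve(Rx, U[:, 0])   # map back to the original X variables
    b = np.linalg.solve(Ry, Vt[0])     # map back to the original Y variables
    return a, b, s[0]

a, b, rho = first_canonical_pair(X, Y)
u, v = (X - X.mean(0)) @ a, (Y - Y.mean(0)) @ b
print(rho)  # first canonical correlation, equal to |corr(u, v)|
```

With a strong shared signal, rho lands close to 1; the two projections u and v are the "maximally correlated" pair the bullet describes.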
Correspondence analysis (CA)

◮ CA becomes AFC (analyse factorielle des correspondances) in French.
◮ Similar concept to PCA: represent the joint distribution of the 2 qualitative variables.
◮ This is a double PCA (on row and column profiles) of the matrix X_ij = f_ij / (f_i. f_.j) − 1, where f_ij are the relative frequencies of the contingency table and f_i., f_.j its margins.
◮ Note that the χ² statistic then writes χ² = n Σ_{i,j} f_i. f_.j X_ij².
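The χ² identity above is easy to verify numerically. The sketch below uses a hypothetical 3×4 contingency table (not from the course data) and checks that the classic (O − E)²/E form agrees with n Σ f_i. f_.j X_ij².

```python
import numpy as np

# Hypothetical contingency table for two qualitative variables.
N = np.array([[20, 10,  5, 15],
              [10, 30, 10, 10],
              [ 5, 10, 25, 10]], dtype=float)

n = N.sum()
f = N / n                        # relative frequencies f_ij
fi = f.sum(1, keepdims=True)     # row margins f_i.
fj = f.sum(0, keepdims=True)     # column margins f_.j

# The matrix CA decomposes: relative departures from independence.
X = f / (fi * fj) - 1

# chi2 computed two ways: textbook (O-E)^2/E, and n * sum f_i. f_.j X_ij^2.
E = n * fi * fj
chi2_classic = ((N - E) ** 2 / E).sum()
chi2_from_X = n * (fi * fj * X ** 2).sum()
print(np.isclose(chi2_classic, chi2_from_X))  # True
```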
◮ "Clustering: unsupervised classification": dissimilarities, hierarchical and centroid-based methods.
◮ Keep in mind that this is still exploratory statistics, so the best clustering is not uniquely defined.
◮ End of the practical session on the mice data set.
◮ And a new guided session on multivariate stats: CA on presidential election data.
Clustering

◮ Task of grouping objects so that objects belonging to the same group are more similar to each other than to objects in other groups.
◮ Several algorithms can do the job, their differences mainly lying in the notion of cluster they use.
◮ Possibly different parameters as well (initialisation, distance used, stopping criterion).
Families of clustering methods

◮ hierarchical clustering, with dissimilarity min → single linkage, max → complete linkage
◮ centroid models (e.g. K-means clustering)
◮ distribution models (statistical definition, e.g. multivariate Gaussian mixtures)
◮ graph or density models (e.g. cliques)
◮ . . .
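The single vs. complete linkage distinction can be sketched in a few lines. This is a toy agglomerative implementation for illustration only (quadratic, numpy-based, not the course's code): "single" merges the pair of clusters with the smallest minimum inter-point dissimilarity, "complete" uses the maximum.

```python
import numpy as np

def agglomerative(D, k, linkage="single"):
    """Merge clusters greedily until k remain; D is a symmetric dissimilarity matrix."""
    clusters = [[i] for i in range(len(D))]
    agg = min if linkage == "single" else max  # single vs complete linkage
    while len(clusters) > k:
        # Find the closest pair of clusters under the chosen linkage.
        a, b = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: agg(D[i][j] for i in clusters[pq[0]] for j in clusters[pq[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Two well-separated 1-D groups: both linkages recover them.
pts = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])
D = np.abs(pts[:, None] - pts[None, :])
print(sorted(map(sorted, agglomerative(D, 2, "single"))))  # [[0, 1, 2], [3, 4, 5]]
```

On well-separated data the linkages agree; they diverge on chained or elongated clusters, which is one of the "differences between algorithms" the previous slide alludes to.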
Dissimilarity and distance

◮ Define a similarity (symmetry, self-similarity, boundedness), then derive a dissimilarity from it.
◮ A distance needs additional properties: d(i, j) = 0 ⇒ i = j and the triangle inequality d(i, j) ≤ d(i, k) + d(k, j).
◮ A dissimilarity d′ can be rescaled to [0, 1] via d(i, j) = d′(i, j) / max_{k,l} d′(k, l).
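These properties can be checked mechanically. The sketch below is an illustration under an assumption about the intended normalisation (dividing by the overall maximum): it builds a squared-Euclidean dissimilarity, which is symmetric and self-dissimilar but not a distance in general, and rescales it to [0, 1].

```python
import numpy as np

rng = np.random.default_rng(2)
pts = rng.normal(size=(10, 3))

# Squared Euclidean distances: a dissimilarity (it can violate the
# triangle inequality), not a distance.
d_raw = ((pts[:, None] - pts[None, :]) ** 2).sum(-1)

# Rescale to [0, 1] by the maximum (assumed normalisation).
d = d_raw / d_raw.max()

assert np.allclose(d, d.T)           # symmetry
assert np.allclose(np.diag(d), 0)    # self-dissimilarity is zero
assert d.max() <= 1.0                # bounded after rescaling
print("ok")
```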