SLIDE 1

Hierarchical Clustering on Principal Components

(HCPC)

LE RAY Guillaume, MOLTO Quentin

Students at AGROCAMPUS OUEST majoring in applied statistics

[Figure: factor map with the hierarchical tree drawn above it (vertical axis: height); the 23 cities are grouped into cluster 1, cluster 2 and cluster 3.]

SLIDE 2

Context

  • R: free, open-source software for statistics (1875 packages).
  • FactoMineR: an R package, developed at Agrocampus-Ouest, dedicated to factorial analysis.
  • The aim is to create a tool that complements this package, dedicated to clustering, especially after a factorial analysis.
  • Wide range of choices and uses, results, and graphical representations.

Le Ray, Molto - Agrocampus-Ouest Students - Feb. 09 - "HCPC"

SLIDE 3

Clustering and factorial analysis

  • Factorial analysis and hierarchical clustering are highly complementary tools for exploring data.
  • Removing the last factors of a factorial analysis removes noise and makes the clustering more robust.

Escofier, Pagès (2008), Analyses factorielles simples et multiples, 4th edition
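To make the idea of dropping the last factors concrete, here is a minimal Python sketch. It runs a non-scaled PCA by power iteration on a tiny made-up dataset, computes the share of inertia carried by each axis, and keeps only the first two coordinates for a later clustering step. The data, the function name `pca_axes` and the use of power iteration are assumptions made for this illustration; none of it comes from FactoMineR, which is an R package.

```python
import math


def pca_axes(rows, n_axes, iters=200):
    """Non-scaled PCA by power iteration with deflation (illustrative only).

    Returns (eigenvalues, axes, centered data, total inertia).
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    X = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Covariance matrix with divisor n; its trace is the total inertia.
    C = [[sum(r[a] * r[b] for r in X) / n for b in range(p)] for a in range(p)]
    total = sum(C[j][j] for j in range(p))
    eigenvalues, axes = [], []
    for _ in range(n_axes):
        v = [float(j + 1) for j in range(p)]  # arbitrary non-zero start
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no inertia left to extract
                break
            v = [x / norm for x in w]
        norm_v = math.sqrt(sum(x * x for x in v))
        v = [x / norm_v for x in v]
        lam = sum(v[a] * C[a][b] * v[b] for a in range(p) for b in range(p))
        eigenvalues.append(lam)
        axes.append(v)
        # Deflation: remove the inertia explained by this axis.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    return eigenvalues, axes, X, total


# Made-up data: two informative columns and one small "noise" column.
rows = [[3.0, 1.0, 0.1], [-3.0, 1.0, -0.1], [3.0, -1.0, -0.1], [-3.0, -1.0, 0.1]]
eigenvalues, axes, centered, total = pca_axes(rows, n_axes=3)
shares = [100 * lam / total for lam in eigenvalues]

# Keep only the first two axes; the clustering would then use `coords`.
kept = 2
coords = [[sum(x[j] * axes[d][j] for j in range(len(x))) for d in range(kept)]
          for x in centered]
print([round(s, 1) for s in shares])
print(coords)
```

In this toy case the third axis carries a negligible share of the inertia, so clustering on `coords` ignores it entirely.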

SLIDE 4

Program structure

  • Factorial analysis: PCA, MCA, MFA…
  • Hierarchical clustering: Ward, Euclidean
  • Cutting the tree: partition
  • Consolidation: K-means
  • Description of clusters and factor maps

SLIDE 5

Statistical methods (1)

  • Hierarchical clustering:
    – Function agnes
    – Euclidean distance
    – Ward criterion = d²(i,j) × (m_i · m_j) / (m_i + m_j)
  • Suggested level to cut the tree:
    – Intra-cluster inertia
    – Partition comparison: Q = (I_{n+1} - I_n) / I_{n+1}
    – Max = number of individuals / 2
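The Ward criterion and the intra-cluster inertia listed above can be sketched in a few lines of Python. This is a naive illustration on made-up points with unit masses, not the agnes implementation. The function name `ward_tree` is hypothetical, and inertia is taken here as the within-cluster sum of squares without dividing by the number of individuals, which is an assumption. The sketch stops at the inertia for each number of clusters; it does not attempt to reproduce the rule that turns Q into a suggested cut.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_tree(points):
    """Agglomerate unit-mass individuals with the Ward criterion.

    Returns two dicts keyed by the number of clusters k:
    partitions[k] (lists of individual indices) and intra[k]
    (intra-cluster inertia, here the within-cluster sum of squares).
    """
    # Each cluster is (member indices, center of gravity, mass).
    clusters = [([i], list(p), 1) for i, p in enumerate(points)]
    partitions = {len(clusters): [c[0] for c in clusters]}
    intra = {len(clusters): 0.0}
    inertia = 0.0
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                _, ga, ma = clusters[a]
                _, gb, mb = clusters[b]
                # Ward criterion: d²(i,j) x (m_i . m_j) / (m_i + m_j)
                cost = sq_dist(ga, gb) * (ma * mb) / (ma + mb)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        ia, ga, ma = clusters[a]
        ib, gb, mb = clusters[b]
        center = [(ma * x + mb * y) / (ma + mb) for x, y in zip(ga, gb)]
        merged = (sorted(ia + ib), center, ma + mb)
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
        # Each merge raises the intra-cluster inertia by exactly its cost.
        inertia += cost
        intra[len(clusters)] = inertia
        partitions[len(clusters)] = [c[0] for c in clusters]
    return partitions, intra


# Made-up 2-D points standing in for individuals on the first two axes.
toy_points = [(0, 0), (0, 1), (5, 0), (5, 1), (10, 0)]
partitions, intra = ward_tree(toy_points)
print(partitions[3])  # cutting the tree at 3 clusters
for k in sorted(intra):
    print(k, round(intra[k], 3))
```

Reading `partitions[k]` corresponds to cutting the tree at a chosen number of clusters, and `intra` holds the values that an inertia bar plot would display.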

[Figure: bar plot of inertia against the number of clusters (1 to 10).]

SLIDE 6

Statistical methods (2)

Consolidation

  • The partition obtained by cutting the tree is not optimal.
  • K-means is run starting from the cluster centers.
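The consolidation step can be illustrated with a small Python sketch: K-means is started from the centers of an existing partition and reassigns individuals until nothing moves. The starting partition below is hypothetical (it is not the output of an actual tree cut), and the function names are invented for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Center of gravity of a non-empty list of points."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def within_inertia(points, labels):
    """Within-cluster sum of squares of a partition."""
    total = 0.0
    for c in set(labels):
        members = [p for p, l in zip(points, labels) if l == c]
        g = centroid(members)
        total += sum(sq_dist(p, g) for p in members)
    return total


def consolidate(points, labels, max_iter=10):
    """K-means initialized with the centers of the given partition.

    Assumes labels are 0..k-1 and every cluster is non-empty at the start.
    """
    k = max(labels) + 1
    labels = list(labels)
    centers = [centroid([p for p, l in zip(points, labels) if l == c])
               for c in range(k)]
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # no individual changed cluster
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = centroid(members)
    return labels


# Hypothetical partition in which the point 7 sits in the "wrong" cluster.
points = [(0,), (1,), (2,), (7,), (10,), (11,)]
tree_labels = [0, 0, 0, 0, 1, 1]
final_labels = consolidate(points, tree_labels)
print(final_labels)
print(within_inertia(points, tree_labels), within_inertia(points, final_labels))
```

Because K-means only moves an individual to a closer center, the within-cluster inertia of the consolidated partition cannot exceed that of the starting partition.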


SLIDE 8

Statistical methods (3)

Clusters description

  • Description by individuals:
    – Use real individuals to characterize clusters.
  • Description by variables:
    – Give the list of variables typical of each cluster.
  • Description by axes:
    – As in factorial analysis.

SLIDE 9

Dataset presentation

SLIDE 10

Factorial Analysis

[Figure: individuals factor map, Dim 1 (82.9%) vs Dim 2 (15.4%). Individuals: Amsterdam, Athens, Berlin, Brussels, Budapest, Copenhagen, Dublin, Helsinki, Kiev, Krakow, Lisbon, London, Madrid, Minsk, Moscow, Oslo, Paris, Prague, Reykjavik, Rome, Sarajevo, Sofia, Stockholm.]

[Figure: variables factor map, Dimension 1 (82.9%) vs Dimension 2 (15.4%). Variables: January, February, March, April, May, June, July, August, September, October, November, December.]

SLIDE 11

Hierarchical Clustering

[Figure: hierarchical tree of the cities with a bar plot of inertia gain, annotated with the suggested level of cutting.]

Click to cut the tree.

Option:

  • Sort the individuals according to their order on the first component.

SLIDE 12

Hierarchical Clustering

[Figure: hierarchical tree of the cities, with the clusters outlined.]

Colored rectangles are drawn around the clusters. We keep the same color for each cluster in the next graphs (function rect).

Options:

  • Cut the tree automatically at the suggested level,
  • Cut at the level giving a chosen number of clusters.

SLIDE 13

Factor map and clusters

[Figure: factor map, Dim 1 (82.9%) vs Dim 2 (15.4%), with the cities colored by cluster (cluster 1, cluster 2, cluster 3).]

Options:

  • Draw other axes,
  • Remove the names, the centers.

SLIDE 14

Factor map, clusters, and tree

[Figure: factor map with the tree drawn above it; axes Dim 1 (82.9%) and height; cities colored by cluster (cluster 1, cluster 2, cluster 3).]

Options:

  • Draw only a part of the tree,
  • Draw other axes,
  • Remove the names,
  • Change the height.

SLIDE 15

Cluster description (1)

By individuals

[Figure: factor map, Dim 1 (82.9%) vs Dim 2 (15.4%), with the cities colored by cluster (cluster 1, cluster 2, cluster 3).]

Option: the number of individuals for each cluster (here 2)

SLIDE 16

Cluster description (1)

By individuals

[Figure: the same factor map, Dim 1 (82.9%) vs Dim 2 (15.4%), with Berlin, Oslo and Rome labeled separately from the other cities.]

Option: the number of individuals for each cluster (here 2)

SLIDE 17

Cluster description (2)

By individuals

[Figure: factor map, Dim 1 (82.9%) vs Dim 2 (15.4%), with the cities colored by cluster (cluster 1, cluster 2, cluster 3).]

Option: the number of individuals for each cluster (here 2)

SLIDE 18

Cluster description (3)

By variables

This is the result of a catdes; it describes the different clusters by the variables (the mean in the category, the v.test…).

Option: the p-value (here 0.05).
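The slides do not spell out how the v.test is computed. As a hedged illustration, the Python sketch below uses a common definition for a quantitative variable: the gap between the cluster mean and the overall mean, divided by the standard deviation of the mean of a sample of that size drawn without replacement. The use of the overall variance with divisor N is an assumption, the function name `v_test` is hypothetical, and the data are made up. Under a normal approximation, an absolute value above about 1.96 corresponds to a two-sided p-value below 0.05.

```python
import math


def v_test(values, in_cluster):
    """Test value of a quantitative variable for one cluster (illustrative).

    values: the variable for all N individuals.
    in_cluster: booleans flagging the individuals of the cluster.
    """
    n = len(values)
    members = [v for v, m in zip(values, in_cluster) if m]
    n_q = len(members)
    mean_all = sum(values) / n
    mean_q = sum(members) / n_q
    # Assumption: overall variance with divisor N (population variance).
    var_all = sum((v - mean_all) ** 2 for v in values) / n
    # Standard deviation of the mean of n_q individuals drawn without replacement.
    std_error = math.sqrt(var_all / n_q * (n - n_q) / (n - 1))
    return (mean_q - mean_all) / std_error


# Made-up variable: the last three individuals form the cluster of interest.
values = [1, 2, 3, 10, 11, 12]
flags = [False, False, False, True, True, True]
v_in = v_test(values, flags)
v_out = v_test(values, [not f for f in flags])
print(round(v_in, 3), round(v_out, 3))
```

A positive value means the cluster mean lies above the overall mean, a negative value below it.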

SLIDE 19

Cluster description (3)

By axes

This is the result of a catdes; it describes the different clusters by the axes (the mean in the category, the v.test…).

Option: the p-value (here 0.05).

SLIDE 20

Conclusion

This function was presented with a PCA, but it also accepts:

  – MCA and MFA results,
  – a quantitative dataset directly (non-scaled PCA),
  – a continuous variable to be divided into modalities.

[Figure: a normal distribution divided into 3 clusters.]

SLIDE 21

Function plot.catdes

It is a graphical representation of the desc.var results.

[Figure: bar plots of v.test values (axis from -4 to 4). By variable: the twelve months for cluster 1 and for cluster 3. By axis: Dim.1 for cluster 1, Dim.3 for cluster 2, Dim.1 for cluster 3.]

Option:

  • Show only the quantitative variables, only the qualitative variables, or all.