slide-1
SLIDE 1

Stairstep-like dendrogram cut: a permutation test approach

Dario Bruzzese (dbruzzes@unina.it), Department of Preventive Medical Sciences, University of Naples, Italy
Domenico Vistocco (vistocco@unicas.it), Department of Economics, University of Cassino, Italy

All computations and graphics were done using the R system (packages: cluster, clusterGeneration, ggplot2).
Slides were composed using LaTeX (beamer class) and the Sweave tool.

D. Bruzzese, D. Vistocco, Stairstep-like dendrogram cut, UseR 2009


(a not necessarily regular cut for a dendrogram)

slide-3
SLIDE 3

Motivation

The rep1HighNoise dataset

Yeung KY, Medvedovic M, Bumgarner RE: Clustering gene-expression data with repeated measurements. Genome Biology, 2003, 4:R34.

n = 200, p = 20

It is a synthetic data set with error distributions derived from real array data.

slide-4
SLIDE 4

Motivation

Horizontal cut

  • k = 2 (red clusters)
  • k = 3 (green clusters)
  • k = 4 (blue clusters)
  • . . .
  • k = 7 (brown clusters)

An alternative cut

  • k = 3 (rainbow clusters)
  • k = 4 (rainbow clusters)
  • k = 5 (rainbow clusters), shown together with the annotation α = 0.01, 5 clusters

slide-19
SLIDE 19

The reference framework

Tools

Statistics

  • Hierarchical clustering
  • Permutation tests

R

  • hclust, plot.hclust {stats}
  • genRandomClust {clusterGeneration}
  • qplot, ggplot {ggplot2}

slide-22
SLIDE 22

La Carte

  1. A (? simple ?) idea
  2. A (? not so ?) simple procedure
  3. Some results
  4. The Wishlist

slide-24
SLIDE 24

The (? not so ?) simple idea - notation

Let:

  • $n$ be the number of objects to classify;
  • $C^k_L$ and $C^k_R$ be the two classes merged at level $k$ ($k = 1, \dots, n-1$);
  • $h(C^k_L \cup C^k_R)$ be the height necessary to merge $C^k_L$ and $C^k_R$;
  • $h(C^k_j)$ be the height at which $C^k_j$ has been obtained ($j \in \{L, R\}$).

The slides illustrate each quantity on the dendrogram for $k = 1, 2, 3$.

slide-41
SLIDE 41

The (? simple ?) idea

Input: A dataset and its related dendrogram
Output: A partition of the dataset

initialization:
    aggregationLevelsToVisit ← $h(C^1_L \cup C^1_R)$
    permClusters ← [ ]
    i ← 1
repeat
    if $C^i_L \equiv C^i_R$ then
        add $C^i_L \cup C^i_R$ to permClusters
    else
        add $h(C^i_L)$ and $h(C^i_R)$ to aggregationLevelsToVisit
        sort aggregationLevelsToVisit in descending order
    end
    remove the first element from aggregationLevelsToVisit
    i ← i+1
until aggregationLevelsToVisit is empty
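The pseudocode for the stairstep cut can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' R implementation: `Node`, `stairstep_cut`, `leaf`, `merge` and the `keep_merged` callback (a stand-in for the permutation test of H0) are hypothetical names, and the treatment of single-object leaves is an addition that the slides do not discuss.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    """A dendrogram node; a leaf has height 0 and no children."""
    height: float
    members: List[str]
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def stairstep_cut(root: Node, keep_merged: Callable[[Node], bool]) -> List[List[str]]:
    """Visit merges from the highest to the lowest height.

    keep_merged(node) stands in for "H0 is not rejected" for the two
    children of node (hypothetical callback, assumed for this sketch).
    """
    to_visit = [root]     # plays the role of aggregationLevelsToVisit
    perm_clusters = []    # plays the role of permClusters
    while to_visit:
        to_visit.sort(key=lambda nd: nd.height, reverse=True)
        node = to_visit.pop(0)                  # highest merge still to examine
        if node.left is None or node.right is None:
            perm_clusters.append(node.members)  # assumption: a leaf is its own cluster
        elif keep_merged(node):
            perm_clusters.append(node.members)  # H0 kept: the union is a cluster
        else:
            to_visit.extend([node.left, node.right])  # H0 rejected: descend
    return perm_clusters


def leaf(name: str) -> Node:
    return Node(0.0, [name])


def merge(a: Node, b: Node, height: float) -> Node:
    return Node(height, a.members + b.members, a, b)


# Toy dendrogram: {e, f} merges at height 7, above the merge of {a, b, c, d} at 6.
ab = merge(leaf("a"), leaf("b"), 1.0)
cd = merge(leaf("c"), leaf("d"), 1.5)
abcd = merge(ab, cd, 6.0)
ef = merge(leaf("e"), leaf("f"), 7.0)
root = merge(abcd, ef, 10.0)

# Toy decision rule replacing the permutation test: these unions are kept.
kept = {frozenset("ab"), frozenset("cd"), frozenset("ef")}
partition = stairstep_cut(root, lambda nd: frozenset(nd.members) in kept)
print(partition)  # [['e', 'f'], ['c', 'd'], ['a', 'b']]
```

In this toy tree no horizontal cut yields the same partition: any cut below height 6 also splits e from f, which is the situation a stairstep-like cut is meant to handle.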

slide-46
SLIDE 46

The (? not so ?) simple idea in action

Iteration i ← 1

  • aggregationLevelsToVisit: $h(C^1_L \cup C^1_R)$
  • permClusters: (empty)
  • clusters to compare: $C^1_L$ and $C^1_R$
  • $H_0 : C^1_L \equiv C^1_R$ → reject

Iteration i ← 2

  • aggregationLevelsToVisit: $h(C^1_R)$, $h(C^1_L)$
  • permClusters: (empty)
  • level examined: $h(C^1_R)$
  • clusters to compare: $C^2_L$ and $C^2_R$
  • $H_0 : C^2_L \equiv C^2_R$ → reject

Iteration i ← 3

  • aggregationLevelsToVisit: $h(C^1_L)$, $h(C^2_R)$, $h(C^2_L)$
  • permClusters: (empty)
  • level examined: $h(C^1_L)$
  • clusters to compare: $C^3_L$ and $C^3_R$
  • $H_0 : C^3_L \equiv C^3_R$ → reject

Iteration i ← 4

  • aggregationLevelsToVisit: $h(C^3_R)$, $h(C^2_R)$, $h(C^2_L)$, $h(C^3_L)$
  • level examined: $h(C^3_R)$
  • clusters to compare: $C^4_L$ and $C^4_R$
  • $H_0 : C^4_L \equiv C^4_R$ → accept
  • permClusters: $C^4_L \cup C^4_R \Leftrightarrow C^3_R$

. . .

Iteration i ← 9

  • permClusters: $C^3_L$, $C^3_R$, $C^2_L$, $C^4_L$, $C^4_R$


slide-61
SLIDE 61

The (? not so ?) simple procedure

For each $k$, the difference between $\max_{j \in \{L,R\}} h(C^k_j)$ and $\min_{j \in \{L,R\}} h(C^k_j)$ can be considered as the minimum cost necessary to merge the two classes.

The difference between $h(C^k_L \cup C^k_R)$ and $\max_{j \in \{L,R\}} h(C^k_j)$ can instead be considered as the cost actually incurred for merging $C^k_L$ and $C^k_R$.

The ratio between these two costs:

$$\mathrm{cost}\left(C^k_L \cup C^k_R\right) = \frac{\max_{j \in \{L,R\}} h\left(C^k_j\right) - \min_{j \in \{L,R\}} h\left(C^k_j\right)}{h\left(C^k_L \cup C^k_R\right) - \max_{j \in \{L,R\}} h\left(C^k_j\right)}$$

is thus a measure that characterizes the aggregation process resulting in the new class $C^k_L \cup C^k_R$.
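The ratio between the minimum merging cost and the cost actually incurred can be computed directly from three dendrogram heights. The snippet below is a small sketch of that arithmetic; the function name `merge_cost` and the guard against a non-positive denominator are assumptions added for illustration and are not taken from the slides.

```python
def merge_cost(h_left: float, h_right: float, h_union: float) -> float:
    """Ratio (max - min) / (h_union - max) of the three heights.

    h_left, h_right: heights at which the two classes were obtained.
    h_union: height at which the two classes are merged.
    """
    highest = max(h_left, h_right)
    lowest = min(h_left, h_right)
    actual = h_union - highest      # cost actually incurred for the merge
    if actual <= 0:
        # Assumption of this sketch: the union must lie above both classes.
        raise ValueError("h_union must be larger than both class heights")
    return (highest - lowest) / actual   # minimum cost over actual cost


# Worked example: classes obtained at heights 2 and 5, merged at height 11.
example = merge_cost(h_left=2.0, h_right=5.0, h_union=11.0)
print(example)  # (5 - 2) / (11 - 5) = 0.5
```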

slide-65
SLIDE 65

The (? not so ?) simple procedure: detail

The algorithm descends the tree, starting from the root of the dendrogram, where all objects are classified in a unique cluster.

For every $k$, a permutation test is designed to test the null hypothesis that the two classes $C^k_L$ and $C^k_R$ really belong to the same cluster, i.e.:

$$H_0 : C^k_L \equiv C^k_R$$

Under $H_0$, mixing up (permuting) the statistical units of $C^k_L$ and $C^k_R$ should not alter the aggregation process resulting in their merging.

Let ${}^{m}C^k_L$ and ${}^{m}C^k_R$ be the two new classes obtained by permuting the elements in $C^k_L$ and $C^k_R$. For each of them a new dendrogram is generated. The heights at which the two classes are built up again, $h({}^{m}C^k_L)$ and $h({}^{m}C^k_R)$, clearly correspond to the heights of the root nodes of the corresponding dendrograms.
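One way to read "permuting the elements" of the two classes is to pool their units, shuffle the pool, and split it again into two classes. The sketch below follows that reading and keeps the original class sizes, which is an assumption: the slides do not say how the permuted classes are sized. The function name `permute_classes` is hypothetical, and building the dendrogram of each permuted class is left out.

```python
import random
from typing import List, Sequence, Tuple


def permute_classes(left: Sequence, right: Sequence,
                    rng: random.Random) -> Tuple[List, List]:
    """Pool the units of both classes, shuffle, and split them again.

    Assumption of this sketch: the permuted classes keep the sizes of
    the original left and right classes.
    """
    pooled = list(left) + list(right)
    rng.shuffle(pooled)                      # random relabelling of the units
    return pooled[:len(left)], pooled[len(left):]


rng = random.Random(2009)                    # fixed seed for reproducibility
new_left, new_right = permute_classes(["a", "b", "c"], ["d", "e"], rng)
print(len(new_left), len(new_right))         # 3 2
```

Each permuted class would then be clustered on its own, and the height of the root of its dendrogram would be used as the height of that permuted class.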

slide-71
SLIDE 71

The (? not so ?) simple procedure: detail

The ratio:

$$\mathrm{cost}\left({}^{m}C^k_L \cup {}^{m}C^k_R\right) = \frac{\max_{j \in \{L,R\}} h\left({}^{m}C^k_j\right) - \min_{j \in \{L,R\}} h\left({}^{m}C^k_j\right)}{h\left(C^k_L \cup C^k_R\right) - \max_{j \in \{L,R\}} h\left({}^{m}C^k_j\right)}$$

is thus a measure that characterizes the aggregation process resulting in the new (potential) class ${}^{m}C^k_L \cup {}^{m}C^k_R$.

Under $H_0$ the aggregation process resulting in the new cluster $C^k_L \cup C^k_R$ should be very similar to the one that potentially produces ${}^{m}C^k_L \cup {}^{m}C^k_R$; thus the two values $\mathrm{cost}({}^{m}C^k_L \cup {}^{m}C^k_R)$ and $\mathrm{cost}(C^k_L \cup C^k_R)$ should be close enough.

The permutation procedure is repeated $M$ times, and each time a new couple ${}^{m}C^k_L$, ${}^{m}C^k_R$ is obtained. The Monte Carlo p-value is thus computed as:

$$p = \frac{\#\left\{\mathrm{cost}\left({}^{m}C^k_L \cup {}^{m}C^k_R\right) \le \mathrm{cost}\left(C^k_L \cup C^k_R\right)\right\} + 1}{M + 1}$$
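The Monte Carlo p-value is a simple count over the M permuted costs. The sketch below reproduces that count; the function name `monte_carlo_p_value` and the example values are hypothetical and only illustrate the arithmetic.

```python
from typing import Sequence


def monte_carlo_p_value(observed_cost: float,
                        permuted_costs: Sequence[float]) -> float:
    """(number of permuted costs <= observed cost, plus 1) / (M + 1)."""
    m = len(permuted_costs)                              # M permutations
    count = sum(1 for c in permuted_costs if c <= observed_cost)
    return (count + 1) / (m + 1)


# Illustrative values only: M = 4 permuted costs, two of them <= 0.5.
p = monte_carlo_p_value(0.5, [0.2, 0.4, 0.9, 1.3])
print(p)  # (2 + 1) / (4 + 1) = 0.6
```

Because of the added 1 in the numerator and denominator, the p-value can never be 0: with M = 999, the setting used in the results, its smallest attainable value is 1/1000.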


slide-75
SLIDE 75

Some results

The yeast galactose dataset

Ideker T, Thorsson V, Ranish JA, Christmas R, Buhler J, Eng JK, Bumgarner RE, Goodlett DR, Aebersold R, Hood L: Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science, 2001, 292:929-934.

n = 205, p = 80

It is a subset of 205 genes that reflect four functional categories in the Gene Ontology listings.

Settings

  • distanceMethod = euclidean
  • aggregationMethod = Ward
  • α = 0.05
  • M = 999

slide-78
SLIDE 78

Some results

The diabetes dataset

Banfield JD, Raftery AE: Model-based Gaussian and non-Gaussian clustering. Biometrics, 1993, 49:803-821.

n = 145, p = 3

It contains 145 subjects, described by three variables and divided into three groups (normal, chemical diabetes, overt diabetes) on the basis of their oral glucose tolerance.

Settings

  • distanceMethod = euclidean
  • aggregationMethod = Ward
  • α = 0.05
  • M = 999

slide-81
SLIDE 81

Some results... for 5 variables

genRandomClust

  • numClust = 2:7
  • numNonNoisy = 5
  • sepVal = 0.01

Settings

  • distanceMethod = euclidean
  • aggregationMethod = Ward
  • M = 999
  • α = 0.1, 0.05, 0.01

Results are shown for 100 replications.

slide-87
SLIDE 87

Some results... for 10 variables

genRandomClust

  • numClust = 2:7
  • numNonNoisy = 10
  • sepVal = 0.01

Settings

  • distanceMethod = euclidean
  • aggregationMethod = Ward
  • M = 999
  • α = 0.1, 0.05, 0.01

Results are shown for 100 replications.

slide-93
SLIDE 93

Some results... for 15 variables

genRandomClust

  • numClust = 2:7
  • numNonNoisy = 15
  • sepVal = 0.01

Settings

  • distanceMethod = euclidean
  • aggregationMethod = Ward
  • M = 999
  • α = 0.1, 0.05, 0.01

Results are shown for 100 replications.


slide-100
SLIDE 100

The wishlist

Statistical issues

  • Quality measures of the obtained partition
  • Use of different types of clusters
      ◮ different cardinality of clusters
      ◮ different type of cluster generation
  • Study of the stability of the results with respect to the number of Monte Carlo replications
  • Computational complexity

R issues

  • profiling and optimizing the R code
  • use of compiled code
  • use of S3/S4 methods
  • deploying a package