

SLIDE 1

IRDM β€˜15/16

Jilles Vreeken

Chapter 5-2: Clustering

12 Nov 2015. Revision 1, November 20th: typos fixed (dendrogram). Revision 2, December 10th: clarified that we do consider a point x as a member of its own Ξ΅-neighbourhood.

SLIDE 2

The First Midterm Test

When: November 19th, 2015
Where: GΓΌnter-Hotz HΓΆrsaal (E2.2)
Material: the first four lectures, the first two homeworks

You are allowed to bring one (1) sheet of A4 paper with handwritten or printed notes on both sides. No other material (notes, books, course materials) or devices (calculator, notebook, cell phone, toothbrush, etc.) are allowed. Bring an ID; either your UdS card, or passport.

SLIDE 3

The Final Exam

Preliminary dates: February 15th and 16th, 2016

Oral exam. Can only be taken when you have passed two out of three mid-term tests. More details later.

SLIDE 4

IRDM Chapter 5, overview

1. Basic idea
2. Representative-based clustering
3. Probabilistic clustering
4. Validation
5. Hierarchical clustering
6. Density-based clustering
7. Clustering high-dimensional data

You’ll find this covered in Aggarwal Ch. 6–7 and Zaki & Meira Ch. 13–15.

SLIDE 5

IRDM Chapter 5, today

1. Basic idea
2. Representative-based clustering
3. Probabilistic clustering
4. Validation
5. Hierarchical clustering
6. Density-based clustering
7. Clustering high-dimensional data

You’ll find this covered in Aggarwal Ch. 6–7 and Zaki & Meira Ch. 13–15.

SLIDE 6

Chapter 5.5:

Hierarchical Clustering

Aggarwal Ch. 6.4

SLIDE 7

The basic idea

Create a clustering for each number of clusters k = 1, 2, …, n. The clusterings must be hierarchical

 every cluster of a k-clustering is a union of some clusters of an l-clustering, for all k < l
 i.e. for all l, and for all k > l, every cluster of a k-clustering is a subset of some cluster of the l-clustering

Example: k = 6

SLIDE 8

The basic idea

Same definition as on slide 7; the example continues with k = 5.

SLIDE 9

The basic idea

Same definition as on slide 7; the example continues with k = 4.

SLIDE 10

The basic idea

Same definition as on slide 7; the example continues with k = 3.

SLIDE 11

The basic idea

Same definition as on slide 7; the example continues with k = 2.

SLIDE 12

The basic idea

Same definition as on slide 7; the example ends with k = 1.

SLIDE 13

Dendrograms

The difference in height between the tree and its subtrees shows the distance between the two branches

(Figure: example dendrogram; the two annotated branches merge at distance β‰ˆ 0.7.)
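To make this concrete, here is a minimal sketch, not from the slides, assuming NumPy, SciPy, and matplotlib are installed; the blob data and all parameter values are invented for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# three synthetic 2-D blobs (invented example data)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal((0, 0), 0.3, (20, 2)),
               rng.normal((2, 0), 0.3, (20, 2)),
               rng.normal((0, 2), 0.3, (20, 2))])

Z = linkage(X, method='single')                  # agglomerative, single link
labels = fcluster(Z, t=3, criterion='maxclust')  # cut the tree at k = 3

dendrogram(Z)                                    # merge height = cluster distance
plt.show()
```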

SLIDE 14

Dendrograms and clusters

SLIDE 15

Dendrograms, revisited

Dendrograms show the hierarchy of the clustering. The number of clusters can be deduced from a dendrogram

 look at the higher branches

Outliers can be detected from a dendrogram

 single points that are far from all others

SLIDE 16

Agglomerative and Divisive

Agglomerative: bottom-up

 start with n clusters
 combine the two closest clusters into one bigger cluster

Divisive: top-down

 start with 1 cluster
 divide the largest (per diameter) cluster into smaller clusters

SLIDE 17

Cluster distances

The distance between two points x and y is d(x, y). What is the distance between two clusters? There are many intuitive definitions, but no universal truth

 different cluster distances yield different clusterings
 which cluster distance to choose depends on the application

Some distances between clusters A and B:

 minimum distance
d(A, B) = min { d(x, y) : x ∈ A and y ∈ B }

 maximum distance
d(A, B) = max { d(x, y) : x ∈ A and y ∈ B }

 average distance
d(A, B) = avg { d(x, y) : x ∈ A and y ∈ B }

 distance of centroids
d(A, B) = d(ΞΌ_A, ΞΌ_B), where ΞΌ_A is the centroid of A and ΞΌ_B is the centroid of B
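As a sketch of these four definitions (assuming NumPy and SciPy; the two toy clusters are invented):

```python
import numpy as np
from scipy.spatial.distance import cdist

def cluster_distances(A, B):
    D = cdist(A, B)                    # all pairwise d(x, y), x in A, y in B
    return {'minimum':  D.min(),       # single link
            'maximum':  D.max(),       # complete link
            'average':  D.mean(),      # group average
            'centroid': np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))}

A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = np.array([[3.0, 0.0], [4.0, 1.0]])
print(cluster_distances(A, B))         # e.g. minimum = 2.0
```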

SLIDE 18

Single link

The distance between two clusters is the distance between the closest points

 d(A, B) = min { d(x, y) : x ∈ A and y ∈ B }

SLIDE 19

Strength of single-link

Can handle non-spherical clusters of unequal size

SLIDE 20

Weaknesses of single-link

Sensitive to noise and outliers

Produces elongated clusters

SLIDE 21

Complete link

The distance between two clusters is the distance between the furthest points

 d(A, B) = max { d(x, y) : x ∈ A and y ∈ B }

SLIDE 22

Strengths of complete link

Less susceptible to noise and outliers

SLIDE 23

Weaknesses of complete-link

Breaks largest clusters

Biased towards spherical clusters

SLIDE 24

Group average and Mean distance

Group average is the average of pairwise distances

 d(A, B) = avg { d(x, y) : x ∈ A and y ∈ B } = ( Ξ£_{x ∈ A, y ∈ B} d(x, y) ) / ( |A| Β· |B| )

Mean distance is the distance of the cluster centroids

 d(A, B) = d(ΞΌ_A, ΞΌ_B)

SLIDE 25

Properties of group average

A compromise between single and complete link

Less susceptible to noise and outliers

 similar to complete link

Biased towards spherical clusters

 similar to complete link

SLIDE 26

Ward’s method

Ward’s distance between clusters A and B is the increase in the sum of squared errors (SSE) when the two clusters are merged

 the SSE of cluster A is SSE_A = Ξ£_{x ∈ A} β€–x βˆ’ ΞΌ_Aβ€–Β²
 the difference for merging clusters A and B into cluster AB is then
d(A, B) = Ξ”SSE_AB = SSE_AB βˆ’ SSE_A βˆ’ SSE_B
 or, equivalently, the weighted mean distance
d(A, B) = ( |A| Β· |B| / (|A| + |B|) ) Β· β€–ΞΌ_A βˆ’ ΞΌ_Bβ€–Β²
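A quick numeric check of the claimed equivalence (a sketch assuming NumPy; the two random clusters are invented):

```python
import numpy as np

def sse(C):
    # sum of squared distances to the cluster centroid
    return ((C - C.mean(axis=0)) ** 2).sum()

rng = np.random.default_rng(1)
A, B = rng.normal(0, 1, (5, 2)), rng.normal(3, 1, (7, 2))

delta_sse = sse(np.vstack([A, B])) - sse(A) - sse(B)
weighted  = (len(A) * len(B)) / (len(A) + len(B)) \
            * np.linalg.norm(A.mean(axis=0) - B.mean(axis=0)) ** 2

print(delta_sse, weighted)   # the two values coincide
```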

SLIDE 27

Discussion on Ward’s method

Less susceptible to noise and outliers

Biased towards spherical clusters

Hierarchical analogue of k-means

 hence many shared pros and cons
 can be used to initialise k-means

SLIDE 28

Comparison

(Figure: single link, group average, complete link, and Ward’s method compared on an example dataset.)

SLIDE 29

Comparison

(Figure: the same four methods on another example dataset.)

SLIDE 30

Comparison

(Figure: the same four methods on another example dataset.)

SLIDE 31

Comparison

(Figure: the same four methods on another example dataset.)

SLIDE 32

Lance-Williams formula

After merging clusters A and B into cluster AB we need to compute AB’s distance to another cluster C. The Lance-Williams formula provides a general equation for this:

d(AB, C) = Ξ±_A d(A, C) + Ξ±_B d(B, C) + Ξ² d(A, B) + Ξ³ |d(A, C) βˆ’ d(B, C)|

                  Ξ±_A                          Ξ±_B                          Ξ²                        Ξ³
Single link       1/2                          1/2                          0                        βˆ’1/2
Complete link     1/2                          1/2                          0                        1/2
Group average     |A| / (|A|+|B|)              |B| / (|A|+|B|)              0                        0
Mean distance     |A| / (|A|+|B|)              |B| / (|A|+|B|)              βˆ’|A||B| / (|A|+|B|)Β²     0
Ward’s method     (|A|+|C|) / (|A|+|B|+|C|)    (|B|+|C|) / (|A|+|B|+|C|)    βˆ’|C| / (|A|+|B|+|C|)     0
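A tiny sketch of the update (plain Python, invented distance values); with the single-link coefficients it reproduces min(d(A, C), d(B, C)):

```python
def lance_williams(d_AC, d_BC, d_AB, alpha_A, alpha_B, beta, gamma):
    # d(AB, C) after merging A and B, per the formula above
    return (alpha_A * d_AC + alpha_B * d_BC
            + beta * d_AB + gamma * abs(d_AC - d_BC))

# single link: alpha_A = alpha_B = 1/2, beta = 0, gamma = -1/2
print(lance_williams(2.0, 5.0, 1.0, 0.5, 0.5, 0.0, -0.5))  # 2.0 = min(2, 5)
```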

SLIDE 33

Computational complexity

Takes 𝑃(π‘œ3) time in most cases

 π‘œ steps  in each step, π‘œ2 distance matrix must be updated and searched

𝑃(π‘œ2 log (π‘œ)) time for some approaches that use appropriate data structures

 e.g. keep distances in a heap  each step takes 𝑃(π‘œ log π‘œ) time

𝑃(π‘œ2) space complexity

 have to store the distance matrix
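To see where the cubic cost comes from, a deliberately naive single-link sketch (assuming NumPy; data invented). A real implementation would maintain a cluster-level distance matrix, e.g. via Lance-Williams updates, rather than rescanning point pairs:

```python
import numpy as np

def naive_single_link(X, k):
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # point distances
    clusters = [[i] for i in range(len(X))]              # start: n singletons
    while len(clusters) > k:                             # n - k merge steps
        best, pair = np.inf, None
        for i in range(len(clusters)):                   # search all pairs
            for j in range(i + 1, len(clusters)):
                d = min(D[p, q] for p in clusters[i] for q in clusters[j])
                if d < best:
                    best, pair = d, (i, j)
        i, j = pair
        clusters[i] += clusters.pop(j)                   # merge closest pair
    return clusters

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [10, 0]], float)
print(naive_single_link(X, k=3))   # [[0, 1], [2, 3], [4]]
```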

SLIDE 34

Chapter 5.6:

Grid and Density-based

Aggarwal Ch. 6.6

SLIDE 35

The idea

Representative-based clustering can find only convex clusters

 data may contain interesting non-convex clusters

In density-based clustering a cluster is a β€˜dense area of points’

 how to define β€˜dense area’?

SLIDE 36

Grid-based Clustering

Algorithm GENERICGRID(data D, num-ranges p, min-density Ο„):

 discretise each dimension of D into p ranges
 determine the cells with density β‰₯ Ο„
 create a graph G with a node per dense cell, and add an edge if two cells are adjacent
 determine the connected components of G
 return the points in each component as a cluster
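A compact sketch of GENERICGRID (assuming NumPy; here β€˜adjacent’ is read as cell indices differing by at most one in every dimension, which is one reasonable interpretation):

```python
import numpy as np
from collections import defaultdict

def generic_grid(X, p, tau):
    lo, hi = X.min(axis=0), X.max(axis=0)
    cell = np.minimum(((X - lo) / (hi - lo + 1e-12) * p).astype(int), p - 1)
    members = defaultdict(list)                 # cell -> point indices
    for idx, c in enumerate(map(tuple, cell)):
        members[c].append(idx)
    dense = [c for c in members if len(members[c]) >= tau]
    # connected components over adjacent dense cells (flood fill)
    label, next_id = {}, 0
    for start in dense:
        if start in label:
            continue
        label[start] = next_id
        stack = [start]
        while stack:
            c = stack.pop()
            for d in dense:
                if d not in label and max(abs(a - b) for a, b in zip(c, d)) <= 1:
                    label[d] = next_id
                    stack.append(d)
        next_id += 1
    clusters = defaultdict(list)                # component -> points
    for c in dense:
        clusters[label[c]].extend(members[c])
    return list(clusters.values())
```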

SLIDE 37

Discussing Grid-based clustering

The Good

 we don’t have to specify k
 we can find arbitrarily shaped clusters

The Bad

 we have to specify a global minimal density Ο„
 only points in dense cells are part of clusters; all points in neighbouring sparse cells are ignored

The Ugly

 we consider only a single, global, rectangular-shaped grid
 the number of grid cells increases exponentially with dimensionality

SLIDE 38

Some definitions

The Ξ΅-neighbourhood of a point x in data D is the set of points of D that are within distance Ξ΅ of x

 N_Ξ΅(x) = { y ∈ D : d(x, y) ≀ Ξ΅ }; note that we count x itself as well!
 the parameter Ξ΅ is set by the user

Point x ∈ D is a core point if |N_Ξ΅(x)| β‰₯ minpts

 minpts (aka Ο„) is a user-supplied parameter

Point x ∈ D is a border point if it is not a core point, but x ∈ N_Ξ΅(z) for some core point z

A point x ∈ D that is neither a core point nor a border point is called a noise point

(be aware: some definitions count a point as a member of its own Ξ΅-neighbourhood, some do not. Here we do.)
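These definitions translate directly into a few lines (a sketch assuming NumPy; the diagonal of the distance matrix makes each point its own neighbour, as above):

```python
import numpy as np

def point_types(X, eps, minpts):
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    neigh = D <= eps                    # x is in its own neighbourhood
    core = neigh.sum(axis=1) >= minpts
    border = ~core & (neigh & core[None, :]).any(axis=1)
    noise = ~core & ~border
    return core, border, noise
```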

SLIDE 39

Example

(minpts was 5 in an earlier revision, and is now 6, to make clear that we count x as an Ξ΅-neighbour of itself)

(Figure: example points x, y, z with minpts = 6; legend: core point, border point, noise point.)

SLIDE 40

Density reachability

Point π’š ∈ 𝑬 is direc ectly d y density r y reacha chabl ble e from point 𝒛 ∈ 𝑬 if

 𝒛 is a core point  π’š ∈ π‘‚πœ—(𝒛)

Point π’š ∈ 𝑬 is densi sity r y reach chable e from point 𝒛 ∈ 𝑬 if there is a chain of points π’š0, π’š1, … , π’šπ‘š s.t. π’š = π’š0, 𝒛 = π’šπ‘š, and π’šπ‘—βˆ’πŸ is directly density reachable from π’šπ‘— for all 𝑛 = 1, … , π‘š

 not a symmetric relationship (!)

Points π’š, 𝒛 ∈ 𝑬 are densi sity c y conne nected ed if there exists a core point π’œ s.t. both π’š and 𝒛 are density reachable from π’œ

SLIDE 41

Density-based clusters

A density-based cluster is a maximal set of density-connected points

(image from Wikipedia)

SLIDE 42

The DBSCAN algorithm

 for each unvisited point x in the data
   compute N_Ξ΅(x)
   if |N_Ξ΅(x)| β‰₯ minpts
     EXPANDCLUSTER(x, ++clusterID)

EXPANDCLUSTER(x, ID):

 assign x to cluster ID and set N ← N_Ξ΅(x)
 for each y ∈ N
   if y is not visited and |N_Ξ΅(y)| β‰₯ minpts
     N ← N βˆͺ N_Ξ΅(y)
   if y does not belong to any cluster
     assign y to cluster ID
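The same loop as runnable Python (a sketch assuming NumPy; it precomputes all Ξ΅-neighbourhoods with a quadratic scan rather than a spatial index):

```python
import numpy as np

def dbscan(X, eps, minpts):
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    neigh = [np.flatnonzero(row <= eps) for row in D]   # N_eps per point
    labels = np.full(len(X), -1)        # -1 = noise / not yet in a cluster
    visited = np.zeros(len(X), bool)
    cid = -1
    for i in range(len(X)):
        if visited[i] or len(neigh[i]) < minpts:
            continue                    # not an unvisited core point
        cid += 1                        # start a new cluster, then expand it
        visited[i] = True
        labels[i] = cid
        seeds = list(neigh[i])
        while seeds:
            j = seeds.pop()
            if not visited[j]:
                visited[j] = True
                if len(neigh[j]) >= minpts:
                    seeds.extend(neigh[j])   # j is core: grow the frontier
            if labels[j] == -1:
                labels[j] = cid
    return labels
```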

SLIDE 43

More on DBSCAN

DBSCAN can return either overlapping or non-overlapping clusters

 ties are broken arbitrarily

The main time complexity comes from computing the neighbourhoods

 O(n log n) in total with spatial index structures
 this won’t work in high dimensions; the worst case is O(nΒ²)

With the neighbourhoods known, DBSCAN only needs a single pass over the data

SLIDE 44

The parameters

DBSCAN requires two parameters, Ξ΅ and minpts

minpts controls the minimum size of a cluster

 minpts = 1 allows singleton clusters
 minpts = 2 makes DBSCAN essentially a single-link clustering
 higher values avoid the long-and-narrow clusters of single link

Ξ΅ controls the required density

 a single Ξ΅ is not enough if the clusters are of very different density

SLIDE 45

Chapter 5.7:

More Clustering Models

Aggarwal Ch. 6.7-6.8

SLIDE 46

More clustering models

So far we’ve seen

 representative-based clustering
 model-based clustering
 hierarchical clustering
 density-based clustering

There are many more types of clustering, including

 co-clustering
 graph clustering (Aggarwal Ch. 6.8)
 non-negative matrix factorisation (NMF) (Aggarwal Ch. 6.9)

But we’re not going to discuss these in IRDM.

 phew!

SLIDE 47

Chapter 5.8:

Clustering High-Dimensional Data

Aggarwal Ch. 7.4–7.4.2

SLIDE 48

Clustering High Dimensional Data

If we compute similarity over many dimensions, all points will be roughly equidistant. Hence there exist no clusters over many dimensions.

 or, are there?

Of course there are!

 data can have a much lower intrinsic dimensionality (SVD), i.e. many dimensions are noisy, irrelevant, or copies
 data can have clusters embedded in subsets of its dimensions

SLIDE 49

Spaces

The full space of data D is its set of attributes 𝒜. A subspace S of D is a subset of 𝒜, i.e. S βŠ† 𝒜

 there exist 2^|𝒜| βˆ’ 1 non-empty subspaces

A subspace cluster is a cluster C over a subspace S

 a group of points that is highly similar over subspace S

SLIDE 50

High-dimensional Grids

In full-dimensional grid-based methods, the grid cells are determined by intersecting the p discretisation ranges across all dimensions. What happens for high-dimensional data?

 many, many grid cells will be empty

CLIQUE is a generalisation of grid-based clustering to subspaces. In CLIQUE the ranges are determined over only a subset of the dimensions, keeping cells with density greater than Ο„.

SLIDE 51

CLustering In QUEst

CLIQUE is the first subspace clustering algorithm

 partition each dimension into p ranges
 for each subspace we now have grid cells of the same volume
 subspace clusters are connected dense cells in the grid

(Agrawal et al. 1998)

SLIDE 52

Finding dense cells

CLIQUE uses anti-monotonicity to find dense grid cells in subspaces: the higher the dimensionality, the sparser the cells

Main idea:

 every subspace we consider is a β€˜transaction database’, and every cell is a β€˜transaction’; if a cell is Ο„-dense, the subspace β€˜itemset’ has been β€˜bought’
 we now mine frequent itemsets with minsup = 1
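A toy sketch of this bottom-up, apriori-style search (assuming NumPy, and data scaled to [0, 1); all parameters and thresholds are invented):

```python
from collections import Counter
from itertools import combinations
import numpy as np

def dense_cells(X, dims, p, tau):
    # cell index of each point, restricted to subspace `dims`
    bins = np.minimum((X[:, dims] * p).astype(int), p - 1)
    return {c for c, n in Counter(map(tuple, bins)).items() if n >= tau}

def clique_dense(X, p, tau):
    d = X.shape[1]
    result = {(i,): cells
              for i in range(d)
              if (cells := dense_cells(X, [i], p, tau))}
    level, k = dict(result), 2
    while level:
        cand = {tuple(sorted(set(a) | set(b)))
                for a in level for b in level if len(set(a) | set(b)) == k}
        level = {}
        for s in cand:
            # anti-monotonicity: every (k-1)-subspace must have dense cells
            if all(sub in result for sub in combinations(s, k - 1)):
                cells = dense_cells(X, list(s), p, tau)
                if cells:
                    level[s] = cells
        result.update(level)
        k += 1
    return result   # subspace -> set of dense cells
```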

SLIDE 53

Example

A-priori for subspace clusters: for every level l in the subspace lattice, we check for all subspaces S βŠ† 𝒜 with |S| = l whether S contains dense cells, but only if all subspaces S' βŠ‚ S contain dense cells. If S contains dense cells, we report each group of adjacent dense cells as a cluster C over subspace S.

(Figure: a dense cluster in subspace A.)

SLIDE 54

Example

(A-priori procedure as on slide 53; the figure now shows a dense cluster in subspace B.)

SLIDE 55

Example

(A-priori procedure as on slide 53; the figure now shows two dense clusters in subspace AB.)

SLIDE 56

Example

(A-priori procedure as on slide 53.)

To find dense cells in a subspace, we only have to consider grid cells that are dense in all of its lower-dimensional subspaces.

SLIDE 57

Discussion of CLIQUE

CLIQUE was the first subspace clustering algorithm

 and it shows

It produces an enormous number of clusters

 just like frequent itemset mining
 nothing like β€˜a summary of your data’

This, however, is a general problem of subspace clustering

 there are exponentially many subspaces
 and for each subspace there are exponentially many clusters

SLIDE 58

Conclusions

Clustering is one of the most important and most used data analysis methods. There exist many different types of clustering

 we’ve seen representative-based, hierarchical, probabilistic, and density-based clustering

Analysis of clustering methods is often difficult. Always think about what you’re doing when you use clustering

 in fact, just always think about what you’re doing

SLIDE 59

Thank you!
