

slide-1
SLIDE 1

Clustering

DWML, 2007 1/27

slide-2
SLIDE 2

Density Based Clustering

DBSCAN

Idea: identify contiguous regions of high density.

DWML, 2007 2/27

slide-3
SLIDE 3

Density Based Clustering

Step 1: classification of points

1. Choose parameters ε, k.
2. Label as core points: points with at least k other points within distance ε.
3. Label as border points: non-core points within distance ε of a core point.
4. Label as isolated points: all remaining points.

DWML, 2007 3/27
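
A minimal Python/NumPy sketch of this classification step (Euclidean distance; function and variable names are illustrative, not from the slides):

    import numpy as np

    def classify_points(X, eps, k):
        """Step 1 of DBSCAN: label every point as 'core', 'border' or 'isolated'."""
        dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # pairwise distances
        n_within = (dists <= eps).sum(axis=1) - 1        # other points within eps (self excluded)
        core = n_within >= k
        labels = np.full(len(X), "isolated", dtype=object)
        labels[core] = "core"
        near_core = (dists[:, core] <= eps).any(axis=1)  # within eps of some core point
        labels[~core & near_core] = "border"
        return labels

    X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.25, 0.0], [5.0, 5.0]])
    print(classify_points(X, eps=0.2, k=2))              # core core core border isolated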


slide-10
SLIDE 10

Density Based Clustering

Step 2: Define Connectivity

  • 1. Two core points are directly connected if they are within distance ε of each other.
  • 2. Each border point is directly connected to one randomly chosen core point within distance ε.
  • 3. Each connected component of the "directly connected" relation (with at least one core point) is a cluster.

DWML, 2007 4/27
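
Continuing the sketch above, the clusters can be read off as connected components of the "directly connected" relation; a simple union-find version (again illustrative, not the slides' own code):

    import numpy as np

    def dbscan_clusters(X, eps, k):
        """Step 2 of DBSCAN: return a cluster id per point (-1 for isolated points)."""
        n = len(X)
        dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        core = (dists <= eps).sum(axis=1) - 1 >= k
        parent = list(range(n))
        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]
                a = parent[a]
            return a
        def union(a, b):
            parent[find(a)] = find(b)
        # 1. two core points are directly connected if within eps of each other
        for i in range(n):
            for j in range(i + 1, n):
                if core[i] and core[j] and dists[i, j] <= eps:
                    union(i, j)
        # 2. each border point is directly connected to one core point within eps
        for i in range(n):
            if not core[i]:
                near = np.flatnonzero(core & (dists[i] <= eps))
                if len(near) > 0:
                    union(i, int(near[0]))
        # 3. every connected component containing a core point is a cluster
        has_core = {find(j) for j in range(n) if core[j]}
        ids, labels = {}, np.full(n, -1)
        for i in range(n):
            r = find(i)
            if r in has_core:
                labels[i] = ids.setdefault(r, len(ids))
        return labels

    # e.g. dbscan_clusters(np.array([[0., 0.], [0.1, 0.], [0., 0.1], [5., 5.]]), eps=0.2, k=2)
    # -> array([ 0,  0,  0, -1])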


slide-13
SLIDE 13

Density Based Clustering

Setting k and ε

For fixed k there exist heuristic methods for choosing ε by considering the distribution in the data of the distance to the kth nearest neighbor.

Pros and Cons

+ Can detect clusters of highly irregular shape
+ Robust with respect to outliers
− Difficulties with clusters of varying density
− Parameters k, ε must be suitably chosen

DWML, 2007 5/27
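
One common form of this heuristic, sketched in Python (the assumption being that ε is then picked near the "knee" of the sorted k-distance curve, usually by inspection):

    import numpy as np

    def k_distances(X, k):
        """Distance of every point to its k-th nearest other point, sorted in decreasing order.
        The usual heuristic is to plot this curve and choose eps near its 'knee'."""
        dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        dists.sort(axis=1)          # per row: 0 (the point itself), then increasing distances
        return np.sort(dists[:, k])[::-1]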

slide-14
SLIDE 14

EM Clustering

Probabilistic Model for Clustering

Assumption:

  • Data a1, . . . , aN is generated by a mixture of k probability distributions P1, . . . , Pk, i.e.

    P(a) = Σ_{i=1}^{k} λi Pi(a)   (with Σ_{i=1}^{k} λi = 1)

  • Cluster label of instance = (index of) distribution from which instance was drawn
  • The Pi are not (exactly) known

DWML, 2007 6/27
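
To make the generative assumption concrete, a small sketch that samples from a one-dimensional mixture of three Gaussians (the weights are those used on the later slide; the means and standard deviations are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    lam = np.array([0.2, 0.3, 0.5])              # mixture weights (sum to 1)
    mu = np.array([-3.0, 0.0, 4.0])              # made-up component means
    sigma = np.array([1.0, 0.5, 1.5])            # made-up component standard deviations

    def sample_mixture(N):
        """Draw N instances: first pick a component per instance, then sample from it."""
        z = rng.choice(len(lam), size=N, p=lam)  # hidden cluster label of each instance
        return rng.normal(mu[z], sigma[z]), z

    a, z = sample_mixture(1000)                  # for clustering we only get to see a, not z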

slide-15
SLIDE 15

EM Clustering

Clustering principle

Try to find the most likely explanation of the data, i.e.

  • determine (parameters of) P1, . . . , Pk and λ1, . . . , λk such that the likelihood function

    P(a1, . . . , aN | P1, . . . , Pk, λ1, . . . , λk) = Π_{j=1}^{N} P(aj)

    is maximized.
  • instance a is assigned to cluster j = argmax_{i=1,...,k} λi Pi(a).

DWML, 2007 7/27
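
A sketch of this assignment rule for a fitted one-dimensional Gaussian mixture (the parameters are assumed known here; estimating them is the subject of the EM slides below):

    import numpy as np

    # parameters of a fitted 1-D mixture (weights as on the next slide, means/stds made up)
    lam = np.array([0.2, 0.3, 0.5])
    mu = np.array([-3.0, 0.0, 4.0])
    sigma = np.array([1.0, 0.5, 1.5])

    def normal_pdf(a, m, s):
        return np.exp(-0.5 * ((a - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

    def assign(a):
        """Cluster of instance a: the index i maximizing lambda_i * P_i(a)."""
        return int(np.argmax(lam * normal_pdf(a, mu, sigma)))

    print(assign(0.2), assign(5.0))   # -> 1 2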

slide-16
SLIDE 16

EM Clustering

Mixture of Gaussians

Mixture of three Gaussian distributions with weights λ = (0.2, 0.3, 0.5).

Pi(x | µ, Σ) = (2π)^{−d/2} |Σ|^{−1/2} exp( −(1/2) (x − µ)ᵀ Σ⁻¹ (x − µ) ),  where d is the dimension of x.

DWML, 2007 8/27
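
A direct transcription of this density into NumPy (illustrative only; no numerical safeguards):

    import numpy as np

    def gaussian_pdf(x, mu, Sigma):
        """Multivariate normal density; d is the dimension of x."""
        d = len(mu)
        diff = x - mu
        norm_const = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
        return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm_const

    print(gaussian_pdf(np.zeros(2), np.zeros(2), np.eye(2)))   # 1/(2*pi) ≈ 0.159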

slide-17
SLIDE 17

EM Clustering

Mixture Model → Data

Figures: equi-potential lines and centers of the mixture components; a sample from the mixture; the data we see.

DWML, 2007 9/27


slide-20
SLIDE 20

EM Clustering

Data → Clustering

Fit a mixture of three Gaussians to the data. Assign instances to their most probable mixture components.

DWML, 2007 10/27
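
If scikit-learn is available, this fit-then-assign procedure is only a few lines (a sketch on synthetic data; the slides do not prescribe any particular library):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal([0, 0], 0.5, (100, 2)),   # three synthetic blobs
                   rng.normal([4, 0], 0.5, (100, 2)),
                   rng.normal([2, 3], 0.5, (100, 2))])

    gmm = GaussianMixture(n_components=3).fit(X)        # fit a mixture of three Gaussians
    clusters = gmm.predict(X)                           # most probable component per instance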


slide-25
SLIDE 25

EM Clustering

Gaussian Mixture Models

Each mixture component is a Gaussian distribution. A Gaussian distribution is determined by

  • a mean vector (“center”)
  • a covariance matrix

Usually all components are assumed to have the same covariance matrix. Then, to fit the mixture to data, one needs to find the weights and mean vectors of the mixture components. If the covariance matrix is a diagonal matrix with constant entries on the diagonal, then fitting the Gaussian mixture model is equivalent to minimizing the within-cluster point scatter, i.e. the k-means algorithm effectively fits such a Gaussian mixture model.

DWML, 2007 11/27

slide-26
SLIDE 26

EM Clustering

Naive Bayes Mixture

Model (for discrete attributes): each mixture component is a distribution in which the attributes are independent (graph: class variable C with attribute children A1, . . . , A7). Model determined by parameters:

  • λ1, . . . , λk (prior probabilities of the class variable)
  • P(Aj = a | C = c)  (a ∈ States(Aj), c ∈ States(C))

Fitting the model: finding parameters that maximize the probability of the observed instances.

DWML, 2007 12/27

slide-27
SLIDE 27

EM Clustering

Clustering as fitting Incomplete Data

Clustering data as incompletely labeled data:

SL    SW    PL    PW    Cluster
5.1   3.5   1.4   0.2   ?
4.9   3.0   1.4   0.2   ?
6.3   2.9   6.0   2.1   ?
6.3   2.5   4.9   1.5   ?
...   ...   ...   ...   ...

SubAllCap   TrustSend   InvRet   ...   B'zambia'   Cluster
y           n           n        ...   n           ?
n           n           n        ...   n           ?
n           y           n        ...   n           ?
n           n           n        ...   n           ?
...         ...         ...      ...   ...         ...

DWML, 2007 13/27

slide-28
SLIDE 28

EM Clustering

Given:

  • incomplete data with unobserved Cluster variable
  • a mixture model for the joint distribution of the mixture component (= cluster variable) and the attributes; the model specifies the number of states of the cluster variable.

Want:

  • the (parameters of the) mixture distribution that best fits the data
  • (as a side effect): the index of the most likely mixture component for each instance

Can use the EM algorithm for parameter estimation from incomplete data! When applied to a Gaussian mixture model, EM proceeds in a way similar to k-means.

DWML, 2007 14/27
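
A compact sketch of EM for a one-dimensional Gaussian mixture, to make the E-step/M-step structure and the similarity to k-means visible (illustrative code, not the slides' own; no numerical safeguards; assumes the data a and the number of clusters k):

    import numpy as np

    def em_gmm(a, k, iters=100):
        """Fit a 1-D mixture of k Gaussians by EM; returns weights, means, stds, responsibilities."""
        rng = np.random.default_rng(0)
        lam = np.full(k, 1.0 / k)
        mu = rng.choice(a, size=k, replace=False)       # initialise means at random data points
        sigma = np.full(k, a.std())
        for _ in range(iters):
            # E-step: responsibility of each component for each instance
            dens = np.exp(-0.5 * ((a[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
            r = lam * dens
            r /= r.sum(axis=1, keepdims=True)
            # M-step: re-estimate weights, means and standard deviations
            nk = r.sum(axis=0)
            lam = nk / len(a)
            mu = (r * a[:, None]).sum(axis=0) / nk
            sigma = np.sqrt((r * (a[:, None] - mu) ** 2).sum(axis=0) / nk)
        return lam, mu, sigma, r

    # e.g. with a drawn from the earlier mixture-sampling sketch:
    # lam, mu, sigma, r = em_gmm(a, k=3); labels = r.argmax(axis=1)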

slide-29
SLIDE 29

Clustering: Evaluation

Scoring a Clustering

Goal of a clustering algorithm is to find a clustering that maximizes a given score function. These score functions are often highly domain- and problem-specific. The k-means algorithm tries to minimize the within-cluster point scatter

W(S1, . . . , Sk) := Σ_{i=1}^{k} Σ_{s,s′ ∈ Si} d(s, s′)

(but is not guaranteed to produce a global minimum). In clustering there is no gold standard (unlike in classification, where a classifier with 100% accuracy will be optimal according to every evaluation criterion)!

DWML, 2007 15/27
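
The score in code form, for clusters given as lists of points and Euclidean d (illustrative only):

    import numpy as np

    def within_cluster_scatter(clusters):
        """W(S_1,...,S_k): sum over clusters of all pairwise distances within the cluster."""
        W = 0.0
        for S in clusters:
            S = np.asarray(S)
            dists = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)
            W += dists.sum()      # counts each ordered pair (s, s'); halve if unordered pairs are meant
        return W

    print(within_cluster_scatter([[[0, 0], [1, 0]], [[5, 5]]]))   # 2.0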

slide-30
SLIDE 30

Clustering: Evaluation

Axioms for Clustering

Try to specify on an abstract level properties that a clustering algorithm should have. E.g.: for any cluster Si, s, s′ ∈ Si, and s′′ ∉ Si: d(s, s′) < d(s, s′′). Intuitive in many cases, but not fulfilled e.g. for the "correct" clustering of two concentric circles. Kleinberg [2002] shows that there does not exist a clustering method that simultaneously satisfies three seemingly intuitive axioms.

DWML, 2007 16/27

slide-31
SLIDE 31

Association Rules

DWML, 2007 17/27

slide-32
SLIDE 32

Association rules

Market basket data

A database:

Transaction   Items bought
1             Beer, Soap, Milk, Butter
2             Beer, Chips, Butter
3             Milk, Spaghetti, Butter, Tomatos
...           ...

The database consists of a list of itemsets, i.e. subsets of a set I of possible items. An alternative representation could be a (sparse!) 0/1-matrix:

Transaction   Aalborg Aquavit   ...   Beer   ...   Chips   ...   Milk   ...   ZZtop CD
1                               ...   1      ...          ...   1      ...
2                               ...   1      ...   1      ...          ...
3                               ...          ...          ...   1      ...
...

DWML, 2007 18/27
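
In code such a database is simply a list of item sets; a tiny sketch using the three transactions shown above:

    # the first three transactions from the table above, as Python sets
    transactions = [
        {"Beer", "Soap", "Milk", "Butter"},
        {"Beer", "Chips", "Butter"},
        {"Milk", "Spaghetti", "Butter", "Tomatos"},
    ]

    items = sorted(set().union(*transactions))
    # equivalent (dense) 0/1-matrix representation: one row per transaction, one column per item
    matrix = [[int(item in t) for item in items] for t in transactions]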

slide-33
SLIDE 33

Association rules

Rule structure

Example: {Beer, Chips} ⇒ {Pizza}. In general, an association rule is a pattern of the form

α : {Iα,1, . . . , Iα,j} ⇒ i

where the antecedent {Iα,1, . . . , Iα,j} ⊆ I, the consequent i ∈ I, and I is the set of items.

DWML, 2007 19/27

slide-34
SLIDE 34

Association rules


Issues to consider when learning association rules

Two issues to be addressed:

  • What characterizes an “interesting” association rule?
  • The number of possible association rules is huge: k · 2^{k−1}; for 100 items we would get 100 · 2^{99} ≈ 6.3 · 10^{31}.

DWML, 2007 19/27

slide-35
SLIDE 35

Association rules

Interesting association rules

We would like rules that either:

  • appear often,
  • have a high accuracy,
  • or possibly both.

The frequency of an itemset I ⊆ I in a database of itemsets (transactions) I1, I2, . . . , IN:

freq(I) := |{j | I ⊆ Ij}|.

Support of α (in the given database), writing I for the antecedent {Iα,1, . . . , Iα,j} and i for the consequent:

supp(α) := freq(I ∪ {i}) / N.

Confidence (or accuracy) of α:

conf(α) := freq(I ∪ {i}) / freq(I).

DWML, 2007 20/27
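
These three quantities are easy to compute directly from a transaction list (a sketch; the rule is represented as an antecedent set plus a single consequent item):

    def freq(itemset, transactions):
        """Number of transactions containing every item in itemset."""
        return sum(1 for t in transactions if set(itemset) <= t)

    def supp(antecedent, consequent, transactions):
        return freq(set(antecedent) | {consequent}, transactions) / len(transactions)

    def conf(antecedent, consequent, transactions):
        return freq(set(antecedent) | {consequent}, transactions) / freq(antecedent, transactions)

    db = [{"Beer", "Chips", "Pizza"}, {"Beer", "Chips"}, {"Milk"}]
    print(freq({"Beer", "Chips"}, db), supp({"Beer"}, "Pizza", db), conf({"Beer"}, "Pizza", db))
    # -> 2 (freq), 1/3 (supp), 1/2 (conf)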


slide-39
SLIDE 39

Association rules

Support, Confidence

(Table: transactions 1–7 over the items Asparagus, Beans, Broccoli, Corn, Pepper, Squash, Tomatos; 0/1 purchase matrix not reproduced.)

freq(Asparagus, Beans, Broccoli) = 1
support({Asparagus, Beans} ⇒ Broccoli) = 1/7
conf({Asparagus, Beans} ⇒ Broccoli) = 1/2

freq(Squash, Tomatos) = 2
support({Squash} ⇒ Tomatos) = 2/7
conf({Squash} ⇒ Tomatos) = 2/3

DWML, 2007 21/27


slide-42
SLIDE 42

Association rules

Support, Accuracy

Example: The rule {Beer, ZZtop CD, Olives, Milk} ⇒ {Eggs} may have a high confidence (only one transaction contained Beer, ZZtop CD, Olives, Milk, and that also contained Eggs), but is uninteresting due to low support.

Example: In fraud detection we would be interested in rules with high confidence although they have low support.

Association rule mining

Fix s, a ∈ R. Find all association rules with support at least s and accuracy at least a.

DWML, 2007 22/27

slide-43
SLIDE 43

Association rules

Frequent sets

Fix s ∈ R. Call an itemset I frequent if freq(I) ≥ s.

(Figure: the lattice of all itemsets over I = {i1, . . . , i4}, from ∅ up to {i1, i2, i3, i4}.)

An itemset can only be frequent if all its subsets are frequent.

DWML, 2007 23/27

slide-44
SLIDE 44

Association rules

The APriori Algorithm

Given a database over the item set I = {i1, . . . , im} and a frequency threshold s, compute all frequent itemsets (for k = 2, 3, . . .):

  • 1. Let Fk−1 be the frequent itemsets of size k − 1.
  • 2. Construct a set Ck of candidate itemsets of size k from Fk−1:
    (a) For all I, I′ ∈ Fk−1 s.t. |I ∪ I′| = k:
      i. If J ∈ Fk−1 for all J ⊆ I ∪ I′ with |J| = k − 1: Ck := Ck ∪ {I ∪ I′}.
  • 3. Construct Fk from Ck:
    (a) For all I ∈ Ck:
      i. If freq(I) ≥ s then Fk := Fk ∪ {I}.

DWML, 2007 24/27
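
A compact level-wise sketch of the whole loop (Python sets/frozensets; illustrative, not the slides' pseudocode verbatim):

    from itertools import combinations

    def apriori(transactions, s):
        """Return all itemsets with frequency >= s, level by level."""
        def freq(itemset):
            return sum(1 for t in transactions if itemset <= t)

        items = set().union(*transactions)
        frequent = {frozenset([i]) for i in items if freq(frozenset([i])) >= s}   # F_1
        all_frequent, k = set(frequent), 2
        while frequent:
            # step 2: candidates C_k are unions of two frequent (k-1)-sets of size k ...
            candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
            # ... kept only if every (k-1)-subset is itself frequent (pruning)
            candidates = {c for c in candidates
                          if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))}
            # step 3: F_k = candidates that are actually frequent in the data
            frequent = {c for c in candidates if freq(c) >= s}
            all_frequent |= frequent
            k += 1
        return all_frequent

    # e.g. apriori([{"Beer", "Chips"}, {"Beer", "Chips", "Pizza"}, {"Beer"}], s=2)
    # -> {frozenset({'Beer'}), frozenset({'Chips'}), frozenset({'Beer', 'Chips'})}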


slide-47
SLIDE 47

Association rules

The APriori Algorithm

Given data and frequency threshold s = 2:

(Table: the seven vegetable transactions from above.)

Frequent 1-item sets (F1): freq(Asparagus) = 4, freq(Beans) = 3, freq(Broccoli) = 2, freq(Corn) = 5, freq(Pepper) = 2, freq(Squash) = 3, freq(Tomatos) = 4.

Candidate 2-item sets (C2): all possible combinations.

Frequent 2-item sets (F2): the pairs with frequency ≥ 2 (e.g. freq(Asparagus, Beans) = 2, freq(Corn, Tomatos) = 3; full pairwise frequency table not reproduced).

DWML, 2007 25/27

slide-48
SLIDE 48

Association rules

Frequent sets → Association rules

Let I1, . . . , Ip be the itemsets with frequency at least s. To compute all association rules with support at least s and accuracy at least a:

for j = 1, . . . , p:
  for all i ∈ Ij:
    if conf(Ij \ {i} ⇒ i) = freq(Ij) / freq(Ij \ {i}) ≥ a: output Ij \ {i} ⇒ i

(Figure: the subsets Ij \ {i1}, Ij \ {i2}, . . . , Ij \ {ik} below Ij.)

DWML, 2007 26/27
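
The corresponding rule-generation loop in code (reusing the apriori sketch above for the frequent sets; illustrative):

    def rules_from_frequent_sets(frequent_sets, transactions, a):
        """For every frequent itemset Ij and every i in Ij: output (Ij - {i}) => i if confidence >= a."""
        def freq(itemset):
            return sum(1 for t in transactions if itemset <= t)

        rules = []
        for Ij in frequent_sets:
            if len(Ij) < 2:
                continue
            for i in Ij:
                antecedent = Ij - {i}
                if freq(Ij) / freq(antecedent) >= a:      # conf((Ij - {i}) => i)
                    rules.append((set(antecedent), i))
        return rules

    # e.g. rules_from_frequent_sets(apriori(db, s=2), db, a=0.8) for a transaction list db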

slide-49
SLIDE 49

Association rules

Not all rules are interesting

(Table: the seven vegetable transactions from above.)

Consider the rule {Broccoli} ⇒ Corn:

conf({Broccoli} ⇒ Corn) = freq(Broccoli, Corn) / freq(Broccoli) = 1/2.

But freq(Corn) = 5, so Corn appears in most transactions anyway!

Instead you may consider the confidence ratio:

conf-ratio({Broccoli} ⇒ Corn) = freq(Broccoli, Corn) / (freq(Broccoli) · freq(Corn)).

DWML, 2007 27/27
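
A sketch of the confidence ratio computed from a transaction list (illustrative; the comment relating it to lift is an observation about the formula above, not from the slides):

    def conf_ratio(antecedent, consequent, transactions):
        """Confidence ratio as on the slide: freq(A with i) / (freq(A) * freq(i)).
        Multiplying this value by the number of transactions N gives the commonly used lift."""
        def freq(itemset):
            return sum(1 for t in transactions if set(itemset) <= t)
        A = set(antecedent)
        return freq(A | {consequent}) / (freq(A) * freq({consequent}))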
