Data Warehousing and Machine Learning: Density-based clustering (PowerPoint PPT Presentation)


SLIDE 1

Data Warehousing and Machine Learning

Density-based clustering

Thomas D. Nielsen

Aalborg University Department of Computer Science

Spring 2008

Density-based Clustering DWML Spring 2008 1 / 29

SLIDE 2

Density-based Clustering

DBSCAN Idea: identify contiguous regions of high density.

Density-based Clustering DWML Spring 2008 2 / 29

SLIDE 3

Density-based Clustering

Step 1: classification of points

  1. Choose parameters ε, k.
  2. Label as core points: points with at least k other points within distance ε.
  3. Label as border points: points within distance ε of a core point.
  4. Label as isolated points: all remaining points.

Density-based Clustering DWML Spring 2008 3 / 29
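
To make Step 1 concrete, here is a minimal NumPy sketch of the classification. The function name, the brute-force distance computation, and the toy data are my own choices, not from the slides:

```python
import numpy as np

def classify_points(X, eps, k):
    """Label each row of X as 'core', 'border', or 'isolated' (Step 1 of DBSCAN)."""
    n = len(X)
    # Pairwise Euclidean distances (brute force; fine for small n).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Core: at least k OTHER points within distance eps (exclude the point itself).
    neighbor_counts = (dists <= eps).sum(axis=1) - 1
    is_core = neighbor_counts >= k
    labels = np.full(n, 'isolated', dtype=object)
    labels[is_core] = 'core'
    # Border: non-core points within eps of some core point.
    near_core = (dists[:, is_core] <= eps).any(axis=1)
    labels[~is_core & near_core] = 'border'
    return labels

# Example: two dense blobs plus one far-away outlier.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)),
               rng.normal(5, 0.3, (20, 2)),
               [[10.0, 10.0]]])
print(classify_points(X, eps=1.0, k=3))
```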

SLIDE 4

Density-based Clustering

Step 2: Define Connectivity

  1. Two core points are directly connected if they are within distance ε of each other.
  2. Each border point is directly connected to one randomly chosen core point within distance ε.
  3. Each connected component of the directly-connected relation (with at least one core point) is a cluster.

Density-based Clustering DWML Spring 2008 4 / 29
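
Step 2 amounts to a connected-components computation. Below is a hedged end-to-end sketch (the union-find helper and all names are mine; the slides leave the implementation open):

```python
import numpy as np

def dbscan_clusters(X, eps, k):
    """Steps 1 and 2: classify points, connect them, and return cluster labels
    (label -1 marks isolated points / components without a core point)."""
    n = len(X)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    is_core = (dists <= eps).sum(axis=1) - 1 >= k

    parent = list(range(n))                      # union-find over point indices
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    def union(i, j):
        parent[find(i)] = find(j)

    core_idx = np.flatnonzero(is_core)
    # Directly connect core points within eps of each other.
    for a in core_idx:
        for b in core_idx:
            if a < b and dists[a, b] <= eps:
                union(a, b)
    # Connect each border point to one core point within eps
    # (the slide says "randomly chosen"; we simply take the first).
    for p in range(n):
        if not is_core[p]:
            near = core_idx[dists[p, core_idx] <= eps]
            if len(near) > 0:
                union(p, near[0])
    # Components containing at least one core point are the clusters.
    roots = [find(p) for p in range(n)]
    has_core = {}
    for p in range(n):
        has_core[roots[p]] = has_core.get(roots[p], False) or bool(is_core[p])
    labels, comp = np.full(n, -1), {}
    for p in range(n):
        if has_core[roots[p]]:
            labels[p] = comp.setdefault(roots[p], len(comp))
    return labels
```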

SLIDE 5

Density-based Clustering

Setting k and ε

For fixed k there exist heuristic methods for choosing ε based on the distribution, in the data, of the distance to the k-th nearest neighbor.

Pros and Cons

  + Can detect clusters of highly irregular shape
  + Robust with respect to outliers
  − Difficulties with clusters of varying density
  − Parameters k, ε must be suitably chosen

Density-based Clustering DWML Spring 2008 5 / 29
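
One common version of this heuristic (my reading of the slide, which does not spell it out) is to sort the distances to each point's k-th nearest neighbor and pick ε near the "knee" of the resulting curve. A sketch:

```python
import numpy as np

def kth_neighbor_distances(X, k):
    """Distance from each point to its k-th nearest neighbor, sorted ascending.
    Plotting this curve and picking eps at the 'knee' is the usual heuristic."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    dists.sort(axis=1)               # column 0 is the distance to the point itself (0)
    return np.sort(dists[:, k])      # k-th OTHER neighbor

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
curve = kth_neighbor_distances(X, k=4)
# A crude knee estimate: the point of maximum increase along the sorted curve.
eps_guess = curve[np.argmax(np.diff(curve))]
print(f"suggested eps ≈ {eps_guess:.3f}")
```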

SLIDE 6

EM Clustering

Probabilistic Model for Clustering

Assumption:

  • Data a_1, ..., a_N are generated by a mixture of k probability distributions P_1, ..., P_k, i.e.

        P(a) = \sum_{i=1}^{k} \lambda_i P_i(a)    (with \sum_{i=1}^{k} \lambda_i = 1)

  • Cluster label of an instance = (index of the) distribution from which the instance was drawn
  • The P_i are not (exactly) known

Density-based Clustering DWML Spring 2008 6 / 29
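
To make the generative assumption concrete, here is a small sketch (all names mine) that samples from a one-dimensional mixture of k Gaussians: first draw a component index with probabilities λ, then draw from that component.

```python
import numpy as np

rng = np.random.default_rng(42)

# Mixture of k = 3 one-dimensional Gaussians: weights, means, standard deviations.
lam = np.array([0.2, 0.3, 0.5])     # must sum to 1
mu  = np.array([-3.0, 0.0, 4.0])
sig = np.array([0.5, 1.0, 1.5])

# Generate N instances: the hidden cluster label is the component index.
N = 1000
z = rng.choice(len(lam), size=N, p=lam)     # which P_i generated each instance
a = rng.normal(mu[z], sig[z])               # the observed data a_1, ..., a_N

# Mixture density P(a) = sum_i lam_i * P_i(a) at a single point.
def mixture_density(x):
    comp = np.exp(-(x - mu)**2 / (2 * sig**2)) / (np.sqrt(2 * np.pi) * sig)
    return float(np.sum(lam * comp))

print(mixture_density(0.0))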

SLIDE 7

EM Clustering

Clustering principle

Try to find the most likely explanation of the data, i.e.

  • determine (parameters of) P_1, ..., P_k and λ_1, ..., λ_k such that the likelihood function

        P(a_1, ..., a_N | P_1, ..., P_k, λ_1, ..., λ_k) = \prod_{j=1}^{N} P(a_j)

    is maximized;

  • instance a is assigned to cluster j = \arg\max_{i=1,...,k} λ_i P_i(a).

Density-based Clustering DWML Spring 2008 7 / 29
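
The likelihood and the assignment rule translate directly into code; a minimal sketch, continuing the 1-D Gaussian example above (names mine):

```python
import numpy as np

def log_likelihood(a, lam, mu, sig):
    """log P(a_1,...,a_N | parameters) = sum_j log sum_i lam_i P_i(a_j)."""
    # densities[j, i] = P_i(a_j) for 1-D Gaussian component i
    densities = (np.exp(-(a[:, None] - mu)**2 / (2 * sig**2))
                 / (np.sqrt(2 * np.pi) * sig))
    return float(np.sum(np.log(densities @ lam)))

def assign(a, lam, mu, sig):
    """Assign each instance to cluster argmax_i lam_i P_i(a)."""
    densities = (np.exp(-(a[:, None] - mu)**2 / (2 * sig**2))
                 / (np.sqrt(2 * np.pi) * sig))
    return np.argmax(lam * densities, axis=1)

lam = np.array([0.2, 0.3, 0.5])
mu  = np.array([-3.0, 0.0, 4.0])
sig = np.array([0.5, 1.0, 1.5])
a = np.array([-2.9, 0.3, 4.5])
print(log_likelihood(a, lam, mu, sig), assign(a, lam, mu, sig))
```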

SLIDE 8

EM Clustering

Standard normal distribution

[Figure: density curve of the standard normal distribution, x from −5 to 5, peak density ≈ 0.4 at x = 0]

A standard normal distribution (normal distribution with mean µ = 0 and standard deviation σ = 1):

        P(x | µ, σ) = \frac{1}{\sqrt{2\pi}\,σ} \exp\left(-\frac{(x-µ)^2}{2σ^2}\right)

Density-based Clustering DWML Spring 2008 8 / 29

SLIDE 9

EM Clustering

Bivariate normal distribution

[Figure: contour and surface plots of a bivariate normal density over x and y]

A bivariate normal distribution with mean µ and covariance matrix Σ:

        µ = \begin{pmatrix} 1 \\ 2 \end{pmatrix},    Σ = \begin{pmatrix} 2 & 0.5 \\ 0.5 & 0.5 \end{pmatrix}

        P(x | µ, Σ) = \frac{1}{(2\pi)^{N/2} |Σ|^{1/2}} \exp\left(-\frac{1}{2}(x-µ)^T Σ^{-1} (x-µ)\right)

Density-based Clustering DWML Spring 2008 9 / 29
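
Evaluating this density numerically takes only a few lines; a sketch (for real applications one would typically reach for scipy.stats.multivariate_normal instead):

```python
import numpy as np

def mvn_density(x, mu, Sigma):
    """Multivariate normal density P(x | mu, Sigma) for an N-dimensional x."""
    N = len(mu)
    diff = x - mu
    norm_const = 1.0 / ((2 * np.pi) ** (N / 2) * np.linalg.det(Sigma) ** 0.5)
    # Quadratic form diff^T Sigma^{-1} diff via a linear solve (no explicit inverse).
    return norm_const * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

# The example from the slide:
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 0.5]])
print(mvn_density(np.array([1.0, 2.0]), mu, Sigma))  # density at the mean
```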

SLIDE 10

EM Clustering

Mixture of Gaussians

Mixture of three Gaussian distributions with weights λ = (0.2, 0.3, 0.5), each component of the form

        P_i(x | µ_i, Σ_i) = \frac{1}{(2\pi)^{N/2} |Σ_i|^{1/2}} \exp\left(-\frac{1}{2}(x-µ_i)^T Σ_i^{-1} (x-µ_i)\right)

Density-based Clustering DWML Spring 2008 10 / 29

SLIDE 11

EM Clustering

Mixture Model → Data

[Figure sequence: equi-potential lines and centers of the mixture components; a sample from the mixture; the data we see]

Density-based Clustering DWML Spring 2008 11 / 29

SLIDE 12

EM Clustering

Data → Clustering

  • Fit a mixture of three Gaussians to the data
  • Assign instances to their most probable mixture components

Density-based Clustering DWML Spring 2008 12 / 29

SLIDE 13

EM Clustering

Gaussian Mixture Models

Each mixture component is a Gaussian distribution. A Gaussian distribution is determined by

  • a mean vector (“center”)
  • a covariance matrix

Usually all components are assumed to have the same covariance matrix; to fit the mixture to data, one then needs to find the weights and mean vectors of the mixture components. If the covariance matrix is a diagonal matrix with constant entries on the diagonal, then fitting the Gaussian mixture model is equivalent to minimizing the sum of squared errors (the within-cluster point scatter), i.e. the k-means algorithm effectively fits such a Gaussian mixture model.

Density-based Clustering DWML Spring 2008 13 / 29
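
To illustrate the claimed correspondence, a hedged experiment using scikit-learn (assuming it is installed; it is not part of the course material). Note that covariance_type='spherical' gives one isotropic variance per component, which is close to, but not exactly, the slide's shared-variance assumption; on well-separated blobs the two methods typically agree almost perfectly:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.5, (100, 2)) for m in ([0, 0], [4, 0], [2, 4])])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
gm = GaussianMixture(n_components=3, covariance_type='spherical',
                     random_state=0).fit(X)

# Compare partitions: fraction of point pairs on which the two clusterings agree.
km_l, gm_l = km.labels_, gm.predict(X)
same_km = km_l[:, None] == km_l[None, :]
same_gm = gm_l[:, None] == gm_l[None, :]
print("pairwise agreement:", (same_km == same_gm).mean())
```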

SLIDE 14

EM Clustering

Naive Bayes Mixture Model (for discrete attributes)

Each mixture component is a distribution in which the attributes are independent given the cluster variable C:

[Figure: naive Bayes structure with class node C and attribute nodes A1, ..., A7]

Model determined by parameters:

  • λ_1, ..., λ_k (prior probabilities of the class variable)
  • P(A_j = a | C = c)  for a ∈ States(A_j), c ∈ States(C)

Fitting the model: finding parameters that maximize the probability of the observed instances.

Density-based Clustering DWML Spring 2008 14 / 29

SLIDE 15

EM Clustering

Clustering as fitting Incomplete Data

Clustering data as incompletely labeled data:

  SL   SW   PL   PW   Cluster
  5.1  3.5  1.4  0.2  ?
  4.9  3.0  1.4  0.2  ?
  6.3  2.9  6.0  2.1  ?
  6.3  2.5  4.9  1.5  ?
  ...  ...  ...  ...  ...

  SubAllCap  TrustSend  InvRet  ...  B'zambia'  Cluster
  y          n          n       ...  n          ?
  n          n          n       ...  n          ?
  n          y          n       ...  n          ?
  n          n          n       ...  n          ?
  ...        ...        ...     ...  ...        ...

Density-based Clustering DWML Spring 2008 15 / 29

SLIDE 16

EM Clustering

Given:

  • incomplete data with unobserved Cluster variable
  • a mixture model for the joint distribution of the mixture component (= cluster variable) and the attributes; the model specifies the number of states of the cluster variable

Want:

  • the (parameters of the) mixture distribution that best fits the data
  • (as a side effect) the index of the most likely mixture component for each instance

Can use the EM algorithm for parameter estimation from incomplete data! When applied to a Gaussian mixture model, EM proceeds in a similar way as k-means.

Density-based Clustering DWML Spring 2008 16 / 29
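
For the Gaussian case, a compact EM sketch (entirely my own minimal implementation, shown only to illustrate the E-step/M-step structure): the E-step computes soft responsibilities, the M-step re-estimates weights, means and variances, much like the assign/update loop of k-means but with soft assignments.

```python
import numpy as np

def em_gmm_1d(a, k, iters=50, seed=0):
    """EM for a 1-D Gaussian mixture with k components (minimal sketch)."""
    rng = np.random.default_rng(seed)
    N = len(a)
    lam = np.full(k, 1.0 / k)
    mu = rng.choice(a, size=k, replace=False)   # init means at random data points
    sig = np.full(k, a.std())
    for _ in range(iters):
        # E-step: responsibilities r[j, i] = P(component i | a_j)
        dens = (np.exp(-(a[:, None] - mu)**2 / (2 * sig**2))
                / (np.sqrt(2 * np.pi) * sig))
        r = lam * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted re-estimation of the parameters
        Nk = r.sum(axis=0)
        lam = Nk / N
        mu = (r * a[:, None]).sum(axis=0) / Nk
        sig = np.sqrt((r * (a[:, None] - mu)**2).sum(axis=0) / Nk)
    return lam, mu, sig

rng = np.random.default_rng(1)
a = np.concatenate([rng.normal(-3, 0.5, 200), rng.normal(2, 1.0, 300)])
print(em_gmm_1d(a, k=2))
```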

SLIDE 17

Clustering: Evaluation

Scoring a Clustering

The goal of a clustering algorithm is to find a clustering that maximizes a given score function. These score functions are often highly domain- and problem-specific. The k-means algorithm tries to minimize the sum of squared errors

        W(S_1, ..., S_k) := \sum_{i=1}^{k} \sum_{s, s' \in S_i} d(s, s')

(but is not guaranteed to produce a global minimum). In clustering there is no gold standard (unlike in classification, where a classifier with 100% accuracy is optimal according to every evaluation criterion)!

Density-based Clustering DWML Spring 2008 17 / 29
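
The score W translates directly into code; a sketch, assuming squared Euclidean distance for d (matching the sum-of-squared-errors reading):

```python
import numpy as np

def within_cluster_scatter(clusters):
    """W(S_1,...,S_k) = sum_i sum_{s,s' in S_i} d(s, s'), with d = squared Euclidean."""
    W = 0.0
    for S in clusters:                       # each S is an (n_i, d) array
        diffs = S[:, None, :] - S[None, :, :]
        W += np.sum(diffs**2)                # sums d(s, s') over all ordered pairs
    return W

rng = np.random.default_rng(0)
clusters = [rng.normal(0, 1, (30, 2)), rng.normal(5, 1, (40, 2))]
print(within_cluster_scatter(clusters))
```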

SLIDE 18

Clustering: Evaluation

Axioms for Clustering

Try to specify, on an abstract level, properties that a clustering algorithm should have. For example: for any cluster S_i, points s, s' ∈ S_i, and s'' ∉ S_i:

        d(s, s') < d(s, s'')

This is intuitive in many cases, but not fulfilled e.g. by the “correct” clustering of two concentric circles. Kleinberg [2002] shows that there does not exist a clustering method that simultaneously satisfies three seemingly intuitive axioms.

Density-based Clustering DWML Spring 2008 18 / 29

SLIDE 19

Data Warehousing and Machine Learning

Association Rules Thomas D. Nielsen

Aalborg University Department of Computer Science

Spring 2008

Association Rules DWML Spring 2008 19 / 29

SLIDE 20

Association rules

Market basket data

A database:

  Transaction   Items bought
  1             Beer, Soap, Milk, Butter
  2             Beer, Chips, Butter
  3             Milk, Spaghetti, Butter, Tomatos
  ...           ...

The database consists of a list of itemsets, i.e. subsets of a set \mathcal{I} of possible items. An alternative representation would be a (sparse!) 0/1-matrix:

  Transaction   Aalborg Aquavit   ...   Beer   ...   Chips   ...   Milk   ...   ZZtop CD
  1                               ...   1      ...          ...   1      ...
  2                               ...   1      ...   1      ...          ...
  3                               ...          ...          ...   1      ...
  ...           ...               ...   ...    ...   ...    ...   ...    ...

Association Rules DWML Spring 2008 20 / 29

SLIDE 21

Association rules

Rule structure

Example: {Beer, Chips} ⇒ {Pizza}.

In general, an association rule is a pattern of the form

        α : {I_{α,1}, ..., I_{α,j}} ⇒ i

where the left-hand side {I_{α,1}, ..., I_{α,j}} ⊆ \mathcal{I} is the antecedent, the right-hand side i ∈ \mathcal{I} is the consequent, and \mathcal{I} is the set of items.

Issues to consider when learning association rules

Two issues to be addressed:

  • What characterizes an “interesting” association rule?
  • The number of possible association rules is huge: k · 2^{k−1}; for 100 items we would get 100 · 2^{99} ≈ 6.4 · 10^{31}.

Association Rules DWML Spring 2008 21 / 29

SLIDE 22

Association rules

Interesting association rules

We would like rules that either:

  • appear often,
  • have a high accuracy,
  • or possibly both.

The frequency of an itemset I ⊆ \mathcal{I} in a database of itemsets (transactions) I_1, I_2, ..., I_N:

        freq(I) := |{ j | I ⊆ I_j }|

Support of a rule α : I ⇒ i (in the given database):

        supp(α) := freq(I ∪ {i}) / N

Confidence (or accuracy) of α:

        conf(α) := freq(I ∪ {i}) / freq(I)

Association Rules DWML Spring 2008 22 / 29
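
freq, supp and conf take only a few lines of Python; a sketch (transactions represented as frozensets, function names mine, using the market basket data from Slide 20):

```python
def freq(itemset, transactions):
    """Number of transactions containing the itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)

def supp(antecedent, consequent, transactions):
    return freq(set(antecedent) | {consequent}, transactions) / len(transactions)

def conf(antecedent, consequent, transactions):
    return freq(set(antecedent) | {consequent}, transactions) / freq(antecedent, transactions)

db = [frozenset(t) for t in (
    {"Beer", "Soap", "Milk", "Butter"},
    {"Beer", "Chips", "Butter"},
    {"Milk", "Spaghetti", "Butter", "Tomatos"},
)]
print(supp({"Beer"}, "Butter", db))   # 2/3
print(conf({"Beer"}, "Butter", db))   # 1.0
```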

SLIDE 23

Association rules

Support, Confidence

[Table: seven transactions (rows 1–7) over the items Asparagus, Beans, Broccoli, Corn, Pepper, Squash, Tomatos; a 1 marks that the transaction contains the item]

        freq(Asparagus, Beans, Broccoli) = 1
        supp({Asparagus, Beans} ⇒ Broccoli) = 1/7
        conf({Asparagus, Beans} ⇒ Broccoli) = 1/2

        freq(Squash, Tomatos) = 2
        supp({Squash} ⇒ Tomatos) = 2/7
        conf({Squash} ⇒ Tomatos) = 2/3

Association Rules DWML Spring 2008 23 / 29

SLIDE 24

Association rules

Support, Accuracy

Example: The rule {Beer, ZZtop CD, Olives, Milk} ⇒ {Eggs} may have a high confidence (only one transaction contained Beer, ZZtop CD, Olives, Milk, and that transaction also contained Eggs), but it is uninteresting due to its low support.

Example: In fraud detection we would be interested in rules with high confidence even though they have low support.

Association rule mining: Fix s, a ∈ ℝ. Find all association rules with support at least s and accuracy at least a.

Association Rules DWML Spring 2008 24 / 29

SLIDE 25

Association rules

Frequent sets

Fix s ∈ ℝ. Call an itemset I frequent if freq(I) ≥ s.

[Figure: the lattice of all itemsets over \mathcal{I} = {i_1, ..., i_4}, from ∅ at the bottom to {i_1, i_2, i_3, i_4} at the top]

An itemset can only be frequent if all its subsets are frequent.

Association Rules DWML Spring 2008 25 / 29

SLIDE 26

Association rules

The APriori Algorithm

Given a database over \mathcal{I} = {i_1, ..., i_k} and frequency threshold s, compute all frequent itemsets:

  1. Let F_{k−1} be the frequent itemsets of size k − 1.
  2. Construct a set C_k of candidate itemsets of size k from F_{k−1}:
       For all I, I′ ∈ F_{k−1} s.t. |I ∪ I′| = k:
         If J ∈ F_{k−1} for all J ⊆ I ∪ I′ with |J| = k − 1, then C_k := C_k ∪ {I ∪ I′}.
  3. Construct F_k from C_k:
       For all I ∈ C_k:
         If freq(I) ≥ s, then F_k := F_k ∪ {I}.

Association Rules DWML Spring 2008 26 / 29
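
As an illustration, a compact level-wise implementation sketch (itemsets as Python frozensets; the pruning test mirrors step 2 above; all names and the toy database are mine):

```python
from itertools import combinations

def apriori(transactions, s):
    """Return all itemsets with frequency >= s, computed level by level."""
    def freq(I):
        return sum(1 for t in transactions if I <= t)

    items = {i for t in transactions for i in t}
    F = [{frozenset([i]) for i in items if freq(frozenset([i])) >= s}]  # F_1
    while F[-1]:
        prev = F[-1]
        k = len(next(iter(prev))) + 1
        # Candidates: unions of two frequent (k-1)-sets that have size k,
        # kept only if every (k-1)-subset is itself frequent.
        C = set()
        for I in prev:
            for J in prev:
                U = I | J
                if len(U) == k and all(frozenset(sub) in prev
                                       for sub in combinations(U, k - 1)):
                    C.add(U)
        F.append({I for I in C if freq(I) >= s})
    return [I for level in F for I in level]

db = [frozenset(t) for t in (
    {"Asparagus", "Corn", "Tomatos"},
    {"Beans", "Corn", "Squash"},
    {"Asparagus", "Beans", "Corn", "Tomatos"},
)]
print(sorted(apriori(db, s=2), key=len))
```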

SLIDE 27

Association rules

The APriori Algorithm

Given the data below and frequency threshold s = 2:

[Table: the seven-transaction vegetable database from Slide 23]

Frequent 1-item sets (F_1):

        freq(Asparagus) = 4, freq(Beans) = 3, freq(Broccoli) = 2, freq(Corn) = 5,
        freq(Pepper) = 2, freq(Squash) = 3, freq(Tomatos) = 4

Candidate 2-item sets (C_2): all possible combinations.

Frequent 2-item sets (F_2):

[Table: upper-triangular matrix of pairwise frequencies, e.g. freq(Asparagus, Beans) = 2, freq(Asparagus, Broccoli) = 1, freq(Corn, Tomatos) = 3, freq(Squash, Tomatos) = 2; the pairs with frequency ≥ s = 2 are the frequent 2-item sets]

Association Rules DWML Spring 2008 27 / 29

SLIDE 28

Association rules

Frequent sets → Association rules

Let I_1, ..., I_p be the itemsets with frequency at least s. To compute all association rules with support s and accuracy a:

        for j = 1, ..., p:
          for all i ∈ I_j:
            if conf(I_j \ {i} ⇒ i) = freq(I_j) / freq(I_j \ {i}) ≥ a:
              output I_j \ {i} ⇒ i

[Figure: the itemset I_j with its maximal subsets I_j \ {i_1}, ..., I_j \ {i_k}]

Association Rules DWML Spring 2008 28 / 29
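
The extraction loop in code, as a sketch with my own names; the frequent sets fed in are two of those that APriori with s = 2 produces on the toy database above (hand-picked for brevity):

```python
def rules_from_frequent_sets(frequent, transactions, a):
    """For each frequent itemset I_j and item i in it, output I_j \\ {i} => i
    whenever conf = freq(I_j) / freq(I_j \\ {i}) >= a."""
    def freq(I):
        return sum(1 for t in transactions if I <= t)

    for Ij in frequent:
        if len(Ij) < 2:
            continue                      # need a non-empty antecedent
        for i in Ij:
            ante = Ij - {i}
            confidence = freq(Ij) / freq(ante)
            if confidence >= a:
                yield ante, i, confidence

db = [frozenset(t) for t in (
    {"Asparagus", "Corn", "Tomatos"},
    {"Beans", "Corn", "Squash"},
    {"Asparagus", "Beans", "Corn", "Tomatos"},
)]
frequent = [frozenset(s) for s in ({"Corn", "Tomatos"},
                                   {"Asparagus", "Corn", "Tomatos"})]
for ante, i, c in rules_from_frequent_sets(frequent, db, a=0.9):
    print(set(ante), "=>", i, f"(conf={c:.2f})")
```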

SLIDE 29

Association rules

Not all rules are interesting

[Table: the seven-transaction vegetable database from Slide 23]

Consider the rule {Broccoli} ⇒ Corn:

        conf({Broccoli} ⇒ Corn) = freq(Broccoli, Corn) / freq(Broccoli) = 1/2

But freq(Corn) = 5, so Corn appears in most transactions anyway and the rule tells us little. Instead you may consider the confidence ratio:

        conf-ratio({Broccoli} ⇒ Corn) = freq(Broccoli, Corn) / (freq(Broccoli) · freq(Corn))

Association Rules DWML Spring 2008 29 / 29
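
The confidence ratio in code (a sketch; note that multiplying this quantity by the number of transactions N gives what the association-rule literature usually calls "lift"). The stand-in database below merely reproduces the slide's counts, since the original table is not fully recoverable:

```python
def conf_ratio(antecedent, consequent, transactions):
    """conf-ratio(I => i) = freq(I u {i}) / (freq(I) * freq({i}))."""
    def freq(I):
        I = frozenset(I)
        return sum(1 for t in transactions if I <= t)
    return (freq(set(antecedent) | {consequent})
            / (freq(antecedent) * freq({consequent})))

# Stand-in: freq(Broccoli) = 2, freq(Corn) = 5, freq(Broccoli, Corn) = 1.
db = [frozenset(s) for s in ({"Broccoli", "Corn"}, {"Broccoli"},
                             {"Corn"}, {"Corn"}, {"Corn"}, {"Corn"}, set())]
print(conf_ratio({"Broccoli"}, "Corn", db))   # 1 / (2 * 5) = 0.1
```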