

  1. DATA MINING LECTURE 8 The EM Algorithm Clustering Validation Sequence segmentation

  2. CLUSTERING

  3. What is a Clustering? • In general, a grouping of objects such that the objects in a group (cluster) are similar (or related) to one another and different from (or unrelated to) the objects in other groups • Intra-cluster distances are minimized • Inter-cluster distances are maximized

  4. Clustering Algorithms • K-means and its variants • Hierarchical clustering • DBSCAN

  5. MIXTURE MODELS AND THE EM ALGORITHM

  6. Model-based clustering • In order to understand our data, we will assume that there is a generative process (a model) that creates/describes the data, and we will try to find the model that best fits the data. • Models of different complexity can be defined, but we will assume that our model is a distribution from which data points are sampled • Example: the data is the height of all people in Greece • In most cases, a single distribution is not good enough to describe all data points: different parts of the data follow a different distribution • Example: the data is the height of all people in Greece and China • We need a mixture model • Different distributions correspond to different clusters in the data.

  7. Gaussian Distribution • Example: the data is the height of all people in Greece • Experience has shown that this data follows a Gaussian (Normal) distribution • Reminder, the Normal distribution: $P(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ • $\mu$ = mean, $\sigma$ = standard deviation
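The density above is straightforward to evaluate directly. A minimal sketch in Python (NumPy assumed; the height figures are illustrative, not values from the lecture):

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x, following the formula above."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

# Illustrative: density of a 180 cm height under a hypothetical N(177, 7)
print(normal_pdf(180.0, mu=177.0, sigma=7.0))
```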

  8. Gaussian Model • What is a model? • A Gaussian distribution is fully defined by the mean $\mu$ and the standard deviation $\sigma$ • We define our model as the pair of parameters $\theta = (\mu, \sigma)$ • This is a general principle: a model is defined as a vector of parameters $\theta$

  9. Fitting the model • We want to find the normal distribution that best fits our data • Find the best values for $\mu$ and $\sigma$ • But what does "best fit" mean?

  10. Maximum Likelihood Estimation (MLE) • Suppose that we have a vector $X = (x_1, \dots, x_n)$ of values and we want to fit a Gaussian $N(\mu, \sigma)$ model to the data • Probability of observing point $x_i$: $P(x_i) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}$ • Probability of observing all points (assuming independence): $P(X) = \prod_{i=1}^{n} P(x_i) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}$ • We want to find the parameters $\theta = (\mu, \sigma)$ that maximize the probability $P(X|\theta)$

  11. Maximum Likelihood Estimation (MLE) • The probability $P(X|\theta)$, viewed as a function of $\theta$, is called the Likelihood function: $L(\theta) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}$ • It is usually easier to work with the Log-Likelihood function: $LL(\theta) = -\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2 - \frac{n}{2}\log 2\pi - n\log\sigma$ • Maximum Likelihood Estimation: find the parameters $\mu, \sigma$ that maximize $LL(\theta)$: $\mu = \frac{1}{n}\sum_{i=1}^{n} x_i = \mu_X$ (the sample mean) and $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2 = \sigma_X^2$ (the sample variance)
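Because the MLE has a closed form here, fitting reduces to computing the sample mean and variance. A sketch, assuming NumPy and synthetic data (the generating parameters below are made up for illustration):

```python
import numpy as np

def gaussian_mle(x):
    """Closed-form MLE for a 1-D Gaussian: sample mean and sample std."""
    mu = x.mean()
    sigma = np.sqrt(((x - mu) ** 2).mean())  # note 1/n, not the unbiased 1/(n-1)
    return mu, sigma

def log_likelihood(x, mu, sigma):
    """LL(theta) exactly as in the formula above."""
    n = len(x)
    return (-((x - mu) ** 2).sum() / (2 * sigma ** 2)
            - 0.5 * n * np.log(2 * np.pi)
            - n * np.log(sigma))

x = np.random.default_rng(0).normal(177.0, 7.0, size=1000)  # synthetic heights
mu_hat, sigma_hat = gaussian_mle(x)
print(mu_hat, sigma_hat, log_likelihood(x, mu_hat, sigma_hat))
```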

  12. MLE • Note: these are also the most likely parameters given the data: $P(\theta|X) = \frac{P(X|\theta)\,P(\theta)}{P(X)}$ • If we have no prior information about $\theta$, then maximizing $P(X|\theta)$ is the same as maximizing $P(\theta|X)$

  13. Mixture of Gaussians • Suppose that you have the heights of people from Greece and China and the distribution looks like the figure below (dramatization)

  14. Mixture of Gaussians • In this case the data is the result of the mixture of two Gaussians • One for Greek people, and one for Chinese people • Identifying for each value which Gaussian is most likely to have generated it will give us a clustering.

  15. Mixture model • A value $x_i$ is generated according to the following process: • First select the nationality: with probability $\pi_G$ select Greece, with probability $\pi_C$ select China ($\pi_G + \pi_C = 1$) • We can also think of this as a hidden variable $Z$ that takes two values: Greece and China • Given the nationality, generate the point from the corresponding Gaussian: $P(x_i|\theta_G) \sim N(\mu_G, \sigma_G)$ if Greece, $P(x_i|\theta_C) \sim N(\mu_C, \sigma_C)$ if China • $\theta_G$: parameters of the Greek distribution, $\theta_C$: parameters of the Chinese distribution
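The generative story can be simulated directly: draw the hidden nationality $Z$ first, then a height from that component. A sketch with made-up parameter values (the lecture does not specify any):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_mixture(n, pi_G, mu_G, sigma_G, mu_C, sigma_C):
    """Draw n points: first the hidden Z (nationality), then x from that Gaussian."""
    z = rng.random(n) < pi_G                    # True -> Greece, False -> China
    x = np.where(z,
                 rng.normal(mu_G, sigma_G, n),  # Greek component
                 rng.normal(mu_C, sigma_C, n))  # Chinese component
    return x, z

# Illustrative parameters only
x, z = sample_mixture(1000, pi_G=0.5, mu_G=177.0, sigma_G=7.0,
                      mu_C=170.0, sigma_C=6.0)
```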

  16. Mixture Model • Our model has the following parameters: $\Theta = (\pi_G, \pi_C, \mu_G, \sigma_G, \mu_C, \sigma_C)$ • $\pi_G, \pi_C$: mixture probabilities • $\theta_G = (\mu_G, \sigma_G)$: parameters of the Greek distribution, $\theta_C = (\mu_C, \sigma_C)$: parameters of the Chinese distribution

  17. Mixture Model • Our model has the following parameters: $\Theta = (\pi_G, \pi_C, \mu_G, \sigma_G, \mu_C, \sigma_C)$ (mixture probabilities and distribution parameters) • For value $x_i$, we have: $P(x_i|\Theta) = \pi_G\,P(x_i|\theta_G) + \pi_C\,P(x_i|\theta_C)$ • For all values $X = (x_1, \dots, x_n)$: $P(X|\Theta) = \prod_{i=1}^{n} P(x_i|\Theta)$ • We want to estimate the parameters that maximize the likelihood of the data
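Evaluating the mixture likelihood is a direct translation of the two formulas above. A sketch (the normal_pdf helper from the earlier snippet is repeated so this stands alone):

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

def mixture_pdf(x, pi_G, mu_G, sigma_G, pi_C, mu_C, sigma_C):
    """P(x_i | Theta) = pi_G * P(x_i | theta_G) + pi_C * P(x_i | theta_C)."""
    return (pi_G * normal_pdf(x, mu_G, sigma_G)
            + pi_C * normal_pdf(x, mu_C, sigma_C))

def mixture_log_likelihood(x, *params):
    """log P(X | Theta): the product over points becomes a sum of logs."""
    return np.log(mixture_pdf(x, *params)).sum()
```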

  19. Mixture Models • Once we have the parameters $\Theta = (\pi_G, \pi_C, \mu_G, \mu_C, \sigma_G, \sigma_C)$ we can estimate the membership probabilities $P(G|x_i)$ and $P(C|x_i)$ for each point $x_i$ • This is the probability that point $x_i$ belongs to the Greek or the Chinese population (cluster) • $P(G|x_i) = \frac{P(x_i|G)\,P(G)}{P(x_i|G)\,P(G) + P(x_i|C)\,P(C)} = \frac{P(x_i|\theta_G)\,\pi_G}{P(x_i|\theta_G)\,\pi_G + P(x_i|\theta_C)\,\pi_C}$, where $P(x_i|G)$ is given by the Gaussian distribution $N(\mu_G, \sigma_G)$ for Greece
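This is Bayes' rule applied per point. A sketch, assuming the normal_pdf helper defined above:

```python
def membership_prob_G(x, pi_G, mu_G, sigma_G, pi_C, mu_C, sigma_C):
    """P(G | x_i): posterior probability that x_i came from the Greek component.
    P(C | x_i) is simply 1 - P(G | x_i)."""
    wG = pi_G * normal_pdf(x, mu_G, sigma_G)
    wC = pi_C * normal_pdf(x, mu_C, sigma_C)
    return wG / (wG + wC)
```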

  20. EM (Expectation Maximization) Algorithm • Initialize the values of the parameters in $\Theta$ to some random values • Repeat until convergence • E-Step: Given the parameters $\Theta$, estimate the membership probabilities $P(G|x_i)$ and $P(C|x_i)$ • M-Step: Compute the parameter values that (in expectation) maximize the data likelihood: $\pi_G = \frac{1}{n}\sum_{i=1}^{n} P(G|x_i)$ and $\pi_C = \frac{1}{n}\sum_{i=1}^{n} P(C|x_i)$ (the fraction of the population in each cluster); $\mu_G = \sum_{i=1}^{n} \frac{P(G|x_i)}{n\,\pi_G}\, x_i$ and $\mu_C = \sum_{i=1}^{n} \frac{P(C|x_i)}{n\,\pi_C}\, x_i$; $\sigma_G^2 = \sum_{i=1}^{n} \frac{P(G|x_i)}{n\,\pi_G} (x_i - \mu_G)^2$ and $\sigma_C^2 = \sum_{i=1}^{n} \frac{P(C|x_i)}{n\,\pi_C} (x_i - \mu_C)^2$ (these are the MLE estimates if the $\pi$'s were fixed)
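Putting the two steps together gives the full loop. A self-contained sketch for the two-component 1-D case; the min/max initialization heuristic and the fixed iteration count stand in for the "random values" and convergence test on the slide:

```python
import numpy as np

def em_two_gaussians(x, iters=100):
    """EM for a mixture of two 1-D Gaussians (the Greece/China example).
    A sketch only: no guard against degenerate (zero-weight) components."""
    # Initialization heuristic: pull the means apart, share the overall spread
    pi_G = pi_C = 0.5
    mu_G, mu_C = x.min(), x.max()
    sigma_G = sigma_C = x.std()
    n = len(x)

    def pdf(v, mu, sigma):
        return np.exp(-(v - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

    for _ in range(iters):
        # E-step: membership probabilities P(G|x_i) and P(C|x_i)
        wG = pi_G * pdf(x, mu_G, sigma_G)
        wC = pi_C * pdf(x, mu_C, sigma_C)
        pG = wG / (wG + wC)
        pC = 1.0 - pG

        # M-step: weighted re-estimates, matching the formulas on the slide
        pi_G, pi_C = pG.mean(), pC.mean()
        mu_G = (pG * x).sum() / (n * pi_G)
        mu_C = (pC * x).sum() / (n * pi_C)
        sigma_G = np.sqrt((pG * (x - mu_G) ** 2).sum() / (n * pi_G))
        sigma_C = np.sqrt((pC * (x - mu_C) ** 2).sum() / (n * pi_C))

    return pi_G, mu_G, sigma_G, pi_C, mu_C, sigma_C
```

On the synthetic sample from the earlier sketch this should recover parameters close to the generating ones, up to which component ends up labeled G or C; in practice one would monitor the log-likelihood and stop when it plateaus rather than run a fixed number of iterations.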

  21. Relationship to K-means • E-Step: Assignment of points to clusters • K-means: hard assignment, EM: soft assignment • M-Step: Computation of centroids • K-means assumes common fixed variance (spherical clusters) • EM: can change the variance for different clusters or different dimensions (ellipsoid clusters) • If the variance is fixed then both minimize the same error function

  22. CLUSTERING EVALUATION

  23. Clustering Evaluation • How do we evaluate the "goodness" of the resulting clusters? • But "clustering lies in the eye of the beholder"! • Then why do we want to evaluate them? • To avoid finding patterns in noise • To compare clusterings, or clustering algorithms • To compare against a "ground truth"

  24. Clusters found in Random Data • [Figure: four scatter plots of the same random points in the unit square: the original random points, and the "clusters" found in them by DBSCAN, K-means, and complete-link hierarchical clustering]

  25. Different Aspects of Cluster Validation 1. Determining the clustering tendency of a set of data, i.e., distinguishing whether non-random structure actually exists in the data. 2. Comparing the results of a cluster analysis to externally known results, e.g., to externally given class labels. 3. Evaluating how well the results of a cluster analysis fit the data without reference to external information (use only the data). 4. Comparing the results of two different sets of cluster analyses to determine which is better; this includes determining the "correct" number of clusters. 5. For 2, 3, and 4, we can further distinguish whether we want to evaluate the entire clustering or just individual clusters.
