CS6220: DATA MINING TECHNIQUES
Matrix Data: Clustering: Part 2
Instructor: Yizhou Sun, yzsun@ccs.neu.edu
October 19, 2014
Methods to Learn

| | Matrix Data | Set Data | Sequence Data | Time Series | Graph & Network |
|---|---|---|---|---|---|
| Classification | Decision Tree; Naïve Bayes; Logistic Regression; SVM; kNN | | HMM | | Label Propagation |
| Clustering | K-means; hierarchical clustering; DBSCAN; Mixture Models; kernel k-means | | | | SCAN; Spectral Clustering |
| Frequent Pattern Mining | | Apriori; FP-growth | GSP; PrefixSpan | | |
| Prediction | Linear Regression | | | Autoregression | |
| Similarity Search | | | | DTW | P-PageRank |
| Ranking | | | | | PageRank |
Matrix Data: Clustering: Part 2
- Revisit K-means
- Mixture Model and EM algorithm
- Kernel K-means
- Summary
Recall K-Means
- Objective function: $J = \sum_{j=1}^{k} \sum_{i: C(i)=j} \|x_i - c_j\|^2$
  - Total within-cluster variance
- Re-arrange the objective function: $J = \sum_{j=1}^{k} \sum_i w_{ij} \|x_i - c_j\|^2$
  - $w_{ij} \in \{0, 1\}$
  - $w_{ij} = 1$ if $x_i$ belongs to cluster $j$; $w_{ij} = 0$ otherwise
- Looking for:
  - The best assignment $w_{ij}$
  - The best centers $c_j$
Solution of K-Means
- Iterations
  - Step 1: Fix centers $c_j$, find the assignment $w_{ij}$ that minimizes $J$
    - => $w_{ij} = 1$ if $\|x_i - c_j\|^2$ is the smallest
  - Step 2: Fix assignment $w_{ij}$, find centers that minimize $J$
    - => set the first derivative of $J$ to 0: $\frac{\partial J}{\partial c_j} = -2 \sum_i w_{ij} (x_i - c_j) = 0$
    - => $c_j = \frac{\sum_i w_{ij}\, x_i}{\sum_i w_{ij}}$
    - Note $\sum_i w_{ij}$ is the total number of objects in cluster $j$
- Converges! Why? Each of the two steps can only decrease $J = \sum_{j=1}^{k} \sum_i w_{ij} \|x_i - c_j\|^2$, and $J \ge 0$ is bounded below, so the iteration converges (a code sketch of the two steps follows).
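As a minimal NumPy sketch of this alternation (the function name, defaults, and initialization scheme are illustrative, not from the slides):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Alternate the two steps above; each step can only decrease J."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # init centers from data
    for _ in range(n_iter):
        # Step 1: fix centers c_j, pick w_ij minimizing ||x_i - c_j||^2
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)  # n x k
        labels = d2.argmin(axis=1)
        # Step 2: fix assignments, set each center to its cluster mean
        for j in range(k):
            if (labels == j).any():          # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers
```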
Limitations of K-Means
- K-means has problems when clusters are of differing
  - Sizes
  - Densities
  - Non-spherical shapes
Limitations of K-Means: Different Density and Size
Limitations of K-Means: Non-Spherical Shapes
Demo
- http://webdocs.cs.ualberta.ca/~yaling/Cluster/Applet/Code/Cluster.html
Connections of K-means to Other Methods
- Gaussian Mixture Model: k-means is a special case (soft assignment with $\sigma^2 \to 0$, shown below)
- Kernel K-means: the same objective in a non-linearly mapped feature space
Matrix Data: Clustering: Part 2
- Revisit K-means
- Mixture Model and EM algorithm
- Kernel K-means
- Summary
Fuzzy Set and Fuzzy Cluster
- Clustering methods discussed so far
  - Every data object is assigned to exactly one cluster
- Some applications may need fuzzy or soft cluster assignment
  - Ex. An e-game could belong to both entertainment and software
- Methods: fuzzy clusters and probabilistic model-based clusters
- Fuzzy cluster: a fuzzy set S with membership function $F_S: X \to [0, 1]$ (value between 0 and 1)
Probabilistic Model-Based Clustering
- Cluster analysis is to find hidden categories.
- A hidden category (i.e., probabilistic cluster) is a distribution over the data space, which can be mathematically represented using a probability density function (or distribution function).
  - Ex. categories for digital cameras sold: consumer line vs. professional line; density functions f1, f2 for C1, C2, obtained by probabilistic clustering
- A mixture model assumes that a set of observed objects is a mixture of instances from multiple probabilistic clusters, and conceptually each observed object is generated independently
- Our task: infer a set of k probabilistic clusters that is most likely to generate D using the above data generation process
Mixture Model-Based Clustering
- A set C of k probabilistic clusters C1, ..., Ck with probability density functions f1, ..., fk, respectively, and their probabilities w1, ..., wk, where $\sum_j w_j = 1$
- Probability of an object $x_i$ generated by cluster $C_j$: $P(x_i, z_i = C_j) = w_j\, f_j(x_i)$
- Probability of $x_i$ generated by the set of clusters C: $P(x_i) = \sum_j w_j\, f_j(x_i)$
Maximum Likelihood Estimation
- Since objects are assumed to be generated independently, for a data set D = {x1, ..., xn}, we have
  - $P(D) = \prod_i P(x_i) = \prod_i \sum_j w_j\, f_j(x_i)$
- Task: Find a set C of k probabilistic clusters s.t. P(D) is maximized
The EM (Expectation Maximization) Algorithm
- The EM algorithm: a framework to approach maximum likelihood or maximum a posteriori estimates of parameters in statistical models.
- E-step: assigns objects to clusters according to the current fuzzy clustering or parameters of probabilistic clusters
  - $w_{ij}^t = p(z_i = C_j \mid \theta^t, x_i) \propto p(x_i \mid C_j, \theta_j^t)\, p(C_j^t)$
- M-step: finds the new clustering or parameters that maximize the expected likelihood
Case 1: Gaussian Mixture Model
- Generative model
  - For each object:
    - Pick its distribution component: $Z \sim \mathrm{Mult}(w_1, \ldots, w_k)$
    - Sample a value from the selected distribution: $X \sim N(\mu_Z, \sigma_Z^2)$
- Overall likelihood function
  - $L(D \mid \theta) = \prod_i \sum_j w_j\, p(x_i \mid \mu_j, \sigma_j^2)$
- Q: What is $\theta$ here?
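A small sketch of this generative process (here $\theta$ collects all the $w_j$, $\mu_j$, $\sigma_j^2$; the 1-D example and parameter values are illustrative):

```python
import numpy as np

def sample_gmm(n, w, mu, sigma2, seed=0):
    """Draw n points: Z ~ Mult(w_1,...,w_k), then X ~ N(mu_Z, sigma2_Z)."""
    rng = np.random.default_rng(seed)
    z = rng.choice(len(w), size=n, p=w)                       # component pick per object
    x = rng.normal(np.asarray(mu)[z], np.sqrt(np.asarray(sigma2)[z]))
    return x, z

# e.g. x, z = sample_gmm(500, w=[0.3, 0.7], mu=[-2.0, 3.0], sigma2=[1.0, 0.5])
```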
Estimating Parameters
- $L(D; \theta) = \sum_i \log \sum_j w_j\, p(x_i \mid \mu_j, \sigma_j^2)$
- Considering the first derivative with respect to $\mu_j$:
  - $\frac{\partial L}{\partial \mu_j} = \sum_i \frac{w_j}{\sum_{j'} w_{j'}\, p(x_i \mid \mu_{j'}, \sigma_{j'}^2)} \cdot \frac{\partial p(x_i \mid \mu_j, \sigma_j^2)}{\partial \mu_j}$
  - $= \sum_i \frac{w_j\, p(x_i \mid \mu_j, \sigma_j^2)}{\sum_{j'} w_{j'}\, p(x_i \mid \mu_{j'}, \sigma_{j'}^2)} \cdot \frac{1}{p(x_i \mid \mu_j, \sigma_j^2)} \cdot \frac{\partial p(x_i \mid \mu_j, \sigma_j^2)}{\partial \mu_j}$
  - $= \sum_i \frac{w_j\, p(x_i \mid \mu_j, \sigma_j^2)}{\sum_{j'} w_{j'}\, p(x_i \mid \mu_{j'}, \sigma_{j'}^2)} \cdot \frac{\partial \log p(x_i \mid \mu_j, \sigma_j^2)}{\partial \mu_j}$
- The fraction is exactly $w_{ij} = p(Z = j \mid X = x_i, \theta)$, multiplying $\partial \log p(x_i \mid \mu_j, \sigma_j^2)/\partial \mu_j$
- Setting these derivatives to zero directly is intractable!
- It looks like weighted likelihood estimation, but the weight is itself determined by the parameters!
Apply EM algorithm
- An iterative algorithm (at iteration t+1)
- E(expectation)-step
  - Evaluate the weight $w_{ij}$ when $\mu_j$, $\sigma_j$, $w_j$ are given:
  - $w_{ij}^t = \frac{w_j^t\, p(x_i \mid \mu_j^t, (\sigma_j^2)^t)}{\sum_{j'} w_{j'}^t\, p(x_i \mid \mu_{j'}^t, (\sigma_{j'}^2)^t)}$
- M(maximization)-step
  - Evaluate $\mu_j$, $\sigma_j$, $w_j$ when the $w_{ij}$'s are given, maximizing the weighted likelihood
  - This is equivalent to Gaussian distribution parameter estimation where each point carries a weight for each distribution (see the code sketch below):
  - $\mu_j^{t+1} = \frac{\sum_i w_{ij}^t\, x_i}{\sum_i w_{ij}^t}; \quad (\sigma_j^2)^{t+1} = \frac{\sum_i w_{ij}^t\, (x_i - \mu_j^{t+1})^2}{\sum_i w_{ij}^t}; \quad w_j^{t+1} \propto \sum_i w_{ij}^t$
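A minimal sketch of these E/M updates for a 1-D Gaussian mixture in NumPy (function name, initialization, and defaults are illustrative assumptions):

```python
import numpy as np

def em_gmm(x, k, n_iter=100, seed=0):
    """EM for a 1-D Gaussian mixture, implementing the E/M updates above."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    w = np.full(k, 1.0 / k)                       # mixing weights w_j
    mu = rng.choice(x, size=k, replace=False)     # init means from the data
    s2 = np.full(k, x.var())                      # init variances
    for _ in range(n_iter):
        # E-step: w_ij = w_j N(x_i | mu_j, s2_j) / sum_j' w_j' N(x_i | mu_j', s2_j')
        dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
        r = w * dens                              # n x k, unnormalized weights
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted Gaussian parameter estimation
        nj = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nj
        s2 = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nj
        w = nj / nj.sum()
    return w, mu, s2
```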
K-Means: A Special Case of Gaussian Mixture Model
- When each Gaussian component has covariance matrix $\sigma^2 I$
  - Soft K-means: $p(x_i \mid \mu_j, \sigma^2) \propto \exp\{-\|x_i - \mu_j\|^2 / 2\sigma^2\}$, so each weight $w_{ij}$ depends only on the distance from $x_i$ to $\mu_j$
- When $\sigma^2 \to 0$
  - Soft assignment becomes hard assignment
  - $w_{ij} \to 1$ iff $x_i$ is closest to $\mu_j$ (why? the nearest center's exponential term dominates all others; see the numerical check below)
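A quick numerical check of this limit (the point, centers, and $\sigma^2$ values are illustrative):

```python
import numpy as np

# Soft assignment w_ij ∝ exp(-||x_i - mu_j||^2 / (2 sigma^2)) hardens as sigma^2 -> 0
x, mus = 1.0, np.array([0.0, 3.0])                # one point, two centers
for s2 in [10.0, 1.0, 0.1, 0.01]:
    logits = -(x - mus) ** 2 / (2 * s2)
    wij = np.exp(logits - logits.max())           # stabilized softmax
    wij /= wij.sum()
    print(s2, wij)                                # weights approach [1, 0]
```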
Case 2: Multinomial Mixture Model
- Generative model
  - For each object:
    - Pick its distribution component: $Z \sim \mathrm{Mult}(w_1, \ldots, w_k)$
    - Sample a value from the selected distribution: $X \sim \mathrm{Mult}(\beta_{Z1}, \beta_{Z2}, \ldots, \beta_{Zm})$
- Overall likelihood function
  - $L(D \mid \theta) = \prod_i \sum_j w_j\, p(x_i \mid \beta_j)$
  - Constraints: $\sum_j w_j = 1$; $\sum_l \beta_{jl} = 1$
- Q: What is $\theta$ here?
Application: Document Clustering
- A vocabulary containing m words
- Each document i:
  - An m-dimensional count vector: $(c_{i1}, c_{i2}, \ldots, c_{im})$
  - $c_{il}$ is the number of occurrences of word l in document i
- Under the unigram assumption:
  - $p(x_i \mid \beta_j) = \frac{(\sum_l c_{il})!}{c_{i1}! \cdots c_{im}!}\, \beta_{j1}^{c_{i1}} \cdots \beta_{jm}^{c_{im}}$
  - The multinomial coefficient depends only on the length of the document, so it is constant with respect to all parameters
Estimating Parameters
- $L(D; \theta) = \sum_i \log \sum_j w_j \prod_l \beta_{jl}^{c_{il}}$
- Apply EM algorithm (see the code sketch below)
- E-step:
  - $w_{ij} = \frac{w_j\, p(x_i \mid \beta_j)}{\sum_{j'} w_{j'}\, p(x_i \mid \beta_{j'})}$
- M-step: maximize the weighted likelihood $\sum_j \sum_i w_{ij} \sum_l c_{il} \log \beta_{jl}$
  - $\beta_{jl} = \frac{\sum_i w_{ij}\, c_{il}}{\sum_{l'} \sum_i w_{ij}\, c_{il'}}$ (the weighted percentage of word l in cluster j); $\quad w_j \propto \sum_i w_{ij}$
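A minimal sketch of these updates for document clustering, assuming a dense n x m word-count matrix C (the smoothing constant and initialization are illustrative):

```python
import numpy as np

def em_multinomial_mixture(C, k, n_iter=100, seed=0):
    """EM for the multinomial mixture, on an n x m matrix of word counts C."""
    C = np.asarray(C, dtype=float)
    rng = np.random.default_rng(seed)
    n, m = C.shape
    w = np.full(k, 1.0 / k)                       # cluster priors w_j
    beta = rng.dirichlet(np.ones(m), size=k)      # k x m word distributions beta_jl
    for _ in range(n_iter):
        # E-step (log space): log w_ij = log w_j + sum_l c_il log beta_jl + const
        logr = np.log(w) + C @ np.log(beta).T     # n x k
        logr -= logr.max(axis=1, keepdims=True)   # stabilize before exponentiating
        r = np.exp(logr)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: beta_jl = weighted fraction of word l in cluster j
        counts = r.T @ C + 1e-12                  # k x m; tiny smoothing, illustrative
        beta = counts / counts.sum(axis=1, keepdims=True)
        w = r.sum(axis=0) / n
    return w, beta, r
```

The multinomial coefficient is dropped in the E-step because, as noted above, it is constant with respect to all parameters and cancels in the normalization.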
Better Way for Topic Modeling
- Topic: a word distribution
- Unigram multinomial mixture model
  - Once the topic of a document is decided, all its words are generated from that topic
- PLSA (probabilistic latent semantic analysis)
  - Every word of a document can be sampled from different topics
- LDA (Latent Dirichlet Allocation)
  - Assumes priors on the word distributions and/or document cluster distribution
Why EM Works?
- E-step: computes a tight lower bound f of the original objective function $\ell$ at $\theta^{old}$
- M-step: finds $\theta^{new}$ to maximize the lower bound
- $\ell(\theta^{new}) \ge f(\theta^{new}) \ge f(\theta^{old}) = \ell(\theta^{old})$, so the objective never decreases

*How to Find a Tight Lower Bound?
- Jensen's inequality
- When does "=" hold, giving a tight lower bound?
- $q(h) = p(h \mid x, \theta)$ (why?), where $q(h)$ is the distribution defining the lower bound we want to get and h is the hidden cluster variable
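To make the bound concrete, here is the standard Jensen's-inequality derivation for a single observation x, with h ranging over the hidden cluster assignments:

```latex
\log p(x \mid \theta)
  = \log \sum_h q(h)\,\frac{p(x, h \mid \theta)}{q(h)}
  \;\ge\; \sum_h q(h)\,\log \frac{p(x, h \mid \theta)}{q(h)}
  \;=\; f(\theta; q)
```

Equality holds when the ratio $p(x, h \mid \theta)/q(h)$ is constant in h, which happens exactly at $q(h) = p(h \mid x, \theta)$; plugging this in makes the lower bound touch the objective at the current $\theta$, which is why the E-step bound is tight.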
Advantages and Disadvantages of Mixture Models
- Strength
  - Mixture models are more general than partitioning
  - Clusters can be characterized by a small number of parameters
  - The results may satisfy the statistical assumptions of the generative models
- Weakness
  - Converges to a local optimum (overcome: run multiple times with random initialization)
  - Computationally expensive if the number of distributions is large, or the data set contains very few observed data points
  - Needs large data sets
  - Hard to estimate the number of clusters
Matrix Data: Clustering: Part 2
- Revisit K-means
- Mixture Model and EM algorithm
- Kernel K-means
- Summary
Kernel K-Means
- How to cluster data whose clusters are not linearly separable? (the original slide shows such a data set; figure omitted)
- A non-linear map $\phi: R^n \to F$
  - Map a data point into a higher/infinite dimensional space: $x \mapsto \phi(x)$
- Dot product matrix $K_{ij}$
  - $K_{ij} = \langle \phi(x_i), \phi(x_j) \rangle$
Typical Kernel Functions
- Recall kernel SVM; the same kernels apply here, e.g., the polynomial kernel $K(x, y) = (x^\top y + c)^d$ and the Gaussian (RBF) kernel $K(x, y) = \exp\{-\|x - y\|^2 / 2\sigma^2\}$ (standard examples; the original slide lists them in a table)
Solution of Kernel K-Means
- Objective function under the new feature space:
  - $J = \sum_{j=1}^{k} \sum_i w_{ij} \|\phi(x_i) - c_j\|^2$
- Algorithm
  - By fixing the assignment $w_{ij}$:
    - $c_j = \sum_i w_{ij}\, \phi(x_i) / \sum_i w_{ij}$
  - In the assignment step, assign the data points to the closest center:
    - $d(x_i, c_j) = \left\| \phi(x_i) - \frac{\sum_{i'} w_{i'j}\, \phi(x_{i'})}{\sum_{i'} w_{i'j}} \right\|^2 = \phi(x_i) \cdot \phi(x_i) - 2\, \frac{\sum_{i'} w_{i'j}\, \phi(x_i) \cdot \phi(x_{i'})}{\sum_{i'} w_{i'j}} + \frac{\sum_{i'} \sum_{i''} w_{i'j}\, w_{i''j}\, \phi(x_{i'}) \cdot \phi(x_{i''})}{(\sum_{i'} w_{i'j})^2}$
- We do not really need to know $\phi(x)$: every term is a dot product, so the kernel matrix $K_{ij}$ suffices (see the sketch below)
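A minimal sketch of this kernel-only assignment step in NumPy (function name, initialization, and the RBF choice in the usage note are illustrative assumptions):

```python
import numpy as np

def kernel_kmeans(K, k, n_iter=50, seed=0):
    """Kernel k-means using only the kernel (dot product) matrix K (n x n)."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    labels = rng.integers(k, size=n)              # random initial assignment
    for _ in range(n_iter):
        d2 = np.zeros((n, k))
        for j in range(k):
            mask = labels == j
            nj = mask.sum()
            if nj == 0:                           # empty cluster: never chosen
                d2[:, j] = np.inf
                continue
            # ||phi(x_i) - c_j||^2 expanded into kernel entries only
            d2[:, j] = (np.diag(K)
                        - 2 * K[:, mask].sum(axis=1) / nj
                        + K[np.ix_(mask, mask)].sum() / nj**2)
        labels = d2.argmin(axis=1)
    return labels

# Usage idea: build K from data X (n x d) with an RBF kernel,
# K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma**2)), then call kernel_kmeans(K, k).
```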
Advantages and Disadvantages of Kernel K-Means
- Advantages
  - The algorithm can identify non-linear cluster structures.
- Disadvantages
  - The number of cluster centers needs to be predefined.
  - The algorithm is more complex and its time complexity is large (the full kernel matrix alone takes $O(n^2)$ space).
- References
  - Kernel k-means and Spectral Clustering, by Max Welling.
  - Kernel k-means, Spectral Clustering and Normalized Cut, by Inderjit S. Dhillon, Yuqiang Guan, and Brian Kulis.
  - An Introduction to Kernel Methods, by Colin Campbell.
Matrix Data: Clustering: Part 2
- Revisit K-means
- Mixture Model and EM algorithm
- Kernel K-means
- Summary
Summary
- Revisit k-means
  - Derivative-based solution of the objective
- Mixture models
  - Gaussian mixture model; multinomial mixture model; EM algorithm; connection to k-means
- Kernel k-means
  - Objective function; solution; connection to k-means