CS145: INTRODUCTION TO DATA MINING
Instructor: Yizhou Sun
yzsun@cs.ucla.edu November 7, 2017
Clustering Evaluation and Practical Issues
Learnt Clustering Methods

Classification
  Vector Data: Logistic Regression; Decision Tree; KNN; SVM; NN
  Text Data: Naïve Bayes for Text
Clustering
  Vector Data: K-means; hierarchical clustering; DBSCAN; Mixture Models
  Text Data: PLSA
Prediction
  Vector Data: Linear Regression; GLM*
Frequent Pattern Mining
  Set Data: Apriori; FP growth
  Sequence Data: GSP; PrefixSpan
Similarity Search
  Sequence Data: DTW
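Extrinsic evaluation: compare a clustering against the ground truth using a clustering quality measure
Intrinsic evaluation: consider how well the clusters are separated, and how compact the clusters are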
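Purity: $\mathrm{purity}(C, \Omega) = \frac{1}{N} \sum_k \max_j |c_k \cap \omega_j|$
$C = \{c_1, \dots, c_K\}$ is the output clustering, $\Omega = \{\omega_1, \dots, \omega_J\}$ is the ground truth clustering (class), and $N$ is the number of data points
For each output cluster $c_k$, find the best-matched ground truth cluster $\omega_j$
Sum the overlap sizes of the matched clusters and divide by $N$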
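The purity computation can be sketched in a few lines of Python. This is an illustration, not code from the lecture: the function name `purity` is hypothetical, and the counts are the crosses/circles/diamonds contingency table that appears in the NMI example of this deck, with blank cells assumed to be 0.

```python
def purity(contingency):
    """contingency[j][k] = number of points of class j placed in output cluster k."""
    n = sum(sum(row) for row in contingency)
    n_clusters = len(contingency[0])
    # For each output cluster (column), take the size of its largest class overlap.
    matched = sum(max(row[k] for row in contingency) for k in range(n_clusters))
    return matched / n

# Rows: crosses, circles, diamonds. Columns: Cluster 1, Cluster 2, Cluster 3.
# Blank cells in the slide table are assumed to be 0.
counts = [
    [5, 1, 2],
    [1, 4, 0],
    [0, 1, 3],
]

purity_value = purity(counts)
print(purity_value)  # (5 + 4 + 3) / 17
```

Each cluster contributes only its majority class, so purity can be pushed toward 1 simply by producing many small clusters; that is one reason to look at it alongside measures such as NMI.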
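Normalized mutual information (NMI)
Mutual information: $I(C, \Omega) = \sum_k \sum_j P(c_k \cap \omega_j) \log \frac{P(c_k \cap \omega_j)}{P(c_k) P(\omega_j)}$
Entropy: $H(C) = -\sum_k P(c_k) \log P(c_k)$, and likewise $H(\Omega)$
NMI normalizes $I(C, \Omega)$ by the entropies $H(C)$ and $H(\Omega)$; common choices of normalizer are $\sqrt{H(C) H(\Omega)}$ and $(H(C) + H(\Omega))/2$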
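Estimating the probabilities by counts: $P(c_k \cap \omega_j) = \frac{|c_k \cap \omega_j|}{N}$, $P(c_k) = \frac{|c_k|}{N}$, $P(\omega_j) = \frac{|\omega_j|}{N}$
$I(C, \Omega) = \sum_k \sum_j \frac{|c_k \cap \omega_j|}{N} \log \frac{N |c_k \cap \omega_j|}{|c_k| \cdot |\omega_j|}$

Example:

            Cluster 1   Cluster 2   Cluster 3   sum
crosses         5           1           2        8
circles         1           4           0        5
diamonds        0           1           3        4
sum             6           6           5        N=17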
With the counts $|c_k \cap \omega_j|$, $|c_k|$, and $|\omega_j|$ read from the table:
NMI = 0.36
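As a check on the NMI = 0.36 figure, the calculation can be sketched in Python from the contingency counts. Several things here are assumptions rather than details confirmed by the slides: the helper names are hypothetical, blank table cells are taken to be 0, natural logarithms are used, and the normalizer is the geometric mean of the two entropies.

```python
from math import log, sqrt

def entropy(sizes, n):
    """Entropy of a partition given its group sizes (natural log)."""
    return -sum(s / n * log(s / n) for s in sizes if s > 0)

def mutual_information(contingency):
    """Mutual information between classes (rows) and clusters (columns)."""
    n = sum(sum(row) for row in contingency)
    row_sums = [sum(row) for row in contingency]
    col_sums = [sum(col) for col in zip(*contingency)]
    mi = 0.0
    for j, row in enumerate(contingency):
        for k, n_jk in enumerate(row):
            if n_jk > 0:  # empty cells contribute nothing
                mi += n_jk / n * log(n * n_jk / (row_sums[j] * col_sums[k]))
    return mi

def nmi(contingency):
    """NMI with the geometric-mean normalizer (an assumed choice)."""
    n = sum(sum(row) for row in contingency)
    h_class = entropy([sum(row) for row in contingency], n)
    h_cluster = entropy([sum(col) for col in zip(*contingency)], n)
    return mutual_information(contingency) / sqrt(h_class * h_cluster)

# Rows: crosses, circles, diamonds. Columns: Cluster 1, Cluster 2, Cluster 3.
counts = [
    [5, 1, 2],
    [1, 4, 0],
    [0, 1, 3],
]

nmi_value = nmi(counts)
print(round(nmi_value, 2))  # 0.36
```

Because the logarithm base cancels between numerator and denominator, the choice of natural log does not affect the result.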
Pair-based measures: every pair of data points falls into one of four cases

                    Same cluster   Different clusters
Same class               TP               FN
Different classes        FP               TN

Example:

Data points   Output clustering   Ground truth clustering (class)
a             1                   2
b             1                   2
c             2                   2
d             2                   1
TP = 1, FP = 1, FN = 2, TN = 2
RI = 0.5
P = 1/2, R = 1/3, F = 0.4
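The pair counts above can be reproduced with a short Python sketch that enumerates all pairs of the four points a, b, c, d. The function name `pair_counts` and the dictionary layout are illustrative assumptions, not part of the lecture material.

```python
from itertools import combinations

def pair_counts(output, truth):
    """Count TP, FP, FN, TN over all unordered pairs of data points."""
    tp = fp = fn = tn = 0
    for p, q in combinations(sorted(output), 2):
        same_cluster = output[p] == output[q]
        same_class = truth[p] == truth[q]
        if same_cluster and same_class:
            tp += 1
        elif same_cluster and not same_class:
            fp += 1
        elif same_class:  # different clusters, same class
            fn += 1
        else:             # different clusters, different classes
            tn += 1
    return tp, fp, fn, tn

output = {"a": 1, "b": 1, "c": 2, "d": 2}  # output clustering
truth = {"a": 2, "b": 2, "c": 2, "d": 1}   # ground truth clustering (class)

tp, fp, fn, tn = pair_counts(output, truth)
rand_index = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f_measure = 2 * precision * recall / (precision + recall)
print(tp, fp, fn, tn, rand_index, precision, recall, f_measure)
```

Only the pair (a, b) shares both cluster and class, and only (c, d) shares a cluster across different classes, which gives the single TP and the single FP.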