Introduction to Information Retrieval
Lecture 4 : Clustering
Prof. 楊立偉
wyang@ntu.edu.tw These slides are adapted from the slides for the book Introduction to Information Retrieval, Ch 16 & 17
– [Mountain height = cluster size]
Why might this happen?
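Assignments usually become roughly stable after 3 to 4 iterations (though this still depends on the data and the number of clusters)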
(sum over all di in cluster k)
The steps in each iteration can only make G smaller, never larger
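The claim that G only shrinks can be checked on a toy run. The sketch below assumes G is the sum of squared distances from each document to its cluster centroid; that definition, the function names, and the 2-D points are illustrative assumptions, not taken from the slides.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Component-wise mean (centroid) of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def kmeans_trace(points, seeds, max_iters=20):
    """Run K-means from the given seeds; return (assignment, G after each iteration)."""
    centroids = list(seeds)  # copy so the caller's list is not modified
    history = []
    assign = None
    for _ in range(max_iters):
        # Reassignment step: each point goes to its nearest centroid.
        new_assign = [min(range(len(centroids)),
                          key=lambda k: sq_dist(p, centroids[k]))
                      for p in points]
        # Recomputation step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, new_assign) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = mean(members)
        # Assumed objective: total squared distance to the assigned centroid.
        history.append(sum(sq_dist(p, centroids[a])
                           for p, a in zip(points, new_assign)))
        if new_assign == assign:
            break
        assign = new_assign
    return assign, history

# Hypothetical toy data: two well-separated groups, deliberately poor seeds.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
assign, g_history = kmeans_trace(points, [(0.0, 0.0), (1.0, 0.0)])
print(assign)
print(g_history)
```

On this data the recorded values of G form a non-increasing sequence, and the run stops once the assignment no longer changes.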
In the above, if you start with B and E as centroids, you converge to {A,B,C} and {D,E,F}. If you start with D and F, you converge to {A,B,D,E} and {C,F}.
Example showing sensitivity to seeds
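The seed sensitivity described here can be reproduced with a small sketch. The coordinates below are hypothetical: they were chosen so that the two seedings give the two outcomes stated above, and they are not read from the slide's figure. The function name is also illustrative.

```python
def kmeans_labels(points, seeds, max_iters=50):
    """K-means on a dict {label: (x, y)}; `seeds` lists the labels used as
    starting centroids. Returns the final clusters as sorted lists of labels."""
    centroids = [points[s] for s in seeds]
    assign = {}
    for _ in range(max_iters):
        # Assign every point to the nearest current centroid.
        new_assign = {}
        for label, (x, y) in points.items():
            dists = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            new_assign[label] = dists.index(min(dists))
        if new_assign == assign:
            break  # converged: no point changed cluster
        assign = new_assign
        # Move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [points[l] for l, a in assign.items() if a == k]
            if members:
                centroids[k] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    clusters = [sorted(l for l, a in assign.items() if a == k)
                for k in range(len(centroids))]
    return sorted(clusters)

# Hypothetical layout: A, B, C on the top row; D, E, F on the bottom row.
points = {"A": (0.0, 1.0), "B": (1.0, 1.0), "C": (2.2, 1.0),
          "D": (0.0, 0.0), "E": (1.0, 0.0), "F": (2.2, 0.0)}

print(kmeans_labels(points, ["B", "E"]))  # [['A', 'B', 'C'], ['D', 'E', 'F']]
print(kmeans_labels(points, ["D", "F"]))  # [['A', 'B', 'D', 'E'], ['C', 'F']]
```

Both runs are stable fixed points of the same procedure; only the starting centroids differ.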
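– Partition n docs into a predetermined number of clusters
Often we do not even know how many clusters there should be
– Given docs, partition into an “appropriate” number of subsets.
– E.g., for query results, the ideal value of K is not known up front, though the UI may impose limits. When clustering query results, we usually do not know in advance how many clusters to use.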
Ref: "Determining the number of clusters in a data set", Wikipedia.
Ref: "Clustering Indices", clusterCrit package, R project.
Recomputing the centroids right after each point is reassigned can speed up convergence
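One way to recompute centroids after every single reassignment, without rescanning all members, is a running-mean update. The sketch below is an assumption about how this could be done; the cluster representation (a dict with a member count `n` and a centroid `c`) and the function name are hypothetical.

```python
def move_point(x, src, dst):
    """Move vector x from cluster `src` to cluster `dst`, updating both
    centroids in place. Each cluster is a dict {'n': count, 'c': centroid}."""
    n_s, n_d = src["n"], dst["n"]
    if n_s > 1:
        # Remove x's contribution from the source mean.
        src["c"] = [(n_s * c - xi) / (n_s - 1) for c, xi in zip(src["c"], x)]
    # If the source becomes empty, its last centroid is simply kept.
    src["n"] = n_s - 1
    # Add x's contribution to the destination mean.
    dst["c"] = [(n_d * c + xi) / (n_d + 1) for c, xi in zip(dst["c"], x)]
    dst["n"] = n_d + 1

# Source holds (1,1), (2,2), (3,3); destination holds (9,9), (11,11).
src = {"n": 3, "c": [2.0, 2.0]}
dst = {"n": 2, "c": [10.0, 10.0]}
move_point((3.0, 3.0), src, dst)
print(src, dst)
```

Each update costs time proportional to the vector length, so the centroids stay current while the reassignment pass is still in progress.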
Number of points, broken down two ways:
– Columns: same cluster in clustering / different clusters in clustering
– Rows: same class in ground truth / different classes in ground truth
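The four cells of such a table can be filled by comparing every pair of points. The sketch below assumes the table tallies pairs and combines them into the Rand index, (agreeing pairs) / (all pairs); that reading, the function names, and the toy labels are assumptions rather than values from the slides.

```python
from itertools import combinations

def pair_counts(clusters, classes):
    """Count point pairs by agreement between a clustering and the ground truth.
    Returns (same class & same cluster, same class & different clusters,
             different classes & same cluster, different classes & different clusters)."""
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(clusters)), 2):
        same_cluster = clusters[i] == clusters[j]
        same_class = classes[i] == classes[j]
        if same_class and same_cluster:
            ss += 1
        elif same_class:
            sd += 1
        elif same_cluster:
            ds += 1
        else:
            dd += 1
    return ss, sd, ds, dd

def rand_index(clusters, classes):
    """Fraction of pairs on which the clustering and the ground truth agree."""
    ss, sd, ds, dd = pair_counts(clusters, classes)
    return (ss + dd) / (ss + sd + ds + dd)

# Hypothetical example: six points, two clusters, two ground-truth classes.
clusters = [0, 0, 0, 1, 1, 1]
classes = ["x", "x", "o", "o", "o", "x"]
print(pair_counts(clusters, classes))  # (2, 4, 4, 5)
print(rand_index(clusters, classes))   # 7/15
```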
Ref: Density-Based Spatial Clustering of Applications with Noise
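– Core points: points with enough neighbors (MinPts) within Eps. These are points that are at the interior of a cluster
– Border points: points that are not core points but lie in the neighborhood of a core point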
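The two definitions above can be turned into a small labeling routine. This is a sketch under stated assumptions: a point counts itself among its Eps-neighbors, "enough" means at least `min_pts`, and anything that is neither core nor border is called noise. The function name and the sample points are hypothetical.

```python
def classify_points(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'noise'."""
    def neighbors(i):
        # Indices of all points within distance eps of point i (including i).
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    hoods = [neighbors(i) for i in range(len(points))]
    core = {i for i, h in enumerate(hoods) if len(h) >= min_pts}
    labels = []
    for i, h in enumerate(hoods):
        if i in core:
            labels.append("core")
        elif any(j in core for j in h):
            labels.append("border")  # not core, but inside a core point's neighborhood
        else:
            labels.append("noise")
    return labels

# Hypothetical data: a unit square, one point hanging off it, one far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (5, 5)]
print(classify_points(points, eps=1.0, min_pts=3))
# ['core', 'core', 'core', 'core', 'border', 'noise']
```

The full algorithm would then connect core points that lie within Eps of each other into clusters and attach each border point to a neighboring core point's cluster.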
See http://www.cse.buffalo.edu/~jing/cse601/fa12/materials/clustering_density.pdf
Visualize the algorithm http://www.naftaliharris.com/blog/visualizing-dbscan-clustering/
After clustering.
A hierarchy can be built by repeatedly running a clustering algorithm at each level
animal
  vertebrate: fish, reptile, amphib., mammal
  invertebrate: worm, insect, crustacean
[Dendrogram over d1, d2, d3, d4, d5 with clusters {d1,d2}, {d4,d5}, and {d3,d4,d5}]
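A dendrogram like this one can be produced by bottom-up merging. The sketch below is a naive agglomerative procedure that uses single-link distance as one illustrative choice of cluster similarity; the 1-D positions for d1 to d5 are hypothetical, picked so that the merges match the clusters shown in the figure, and the function name is an assumption.

```python
def hac_single_link(points):
    """Naive agglomerative clustering on {name: position}.
    Returns the merges in order as (sorted member names, merge distance)."""
    clusters = [frozenset([name]) for name in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single link: distance between the closest pair of members.
                d = min(abs(points[a] - points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] | clusters[j]
        merges.append((sorted(merged), d))
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical positions on a line.
docs = {"d1": 0.0, "d2": 1.0, "d3": 5.0, "d4": 7.0, "d5": 7.5}
for members, dist in hac_single_link(docs):
    print(members, dist)
```

Reading the merge list from top to bottom corresponds to reading the dendrogram from the leaves up to the root.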
[Figure: example over documents d1, d2, d3, d4, d5, d6]
– moment of point to centroid > M × some cluster moment. Say M = 10.
[Figure labels: Centroid, Outlier]
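The outlier rule above leaves "moment" open. The sketch below picks one concrete reading as an assumption: a point's moment is its Euclidean distance to the centroid, and the cluster moment is the median of those distances. The function names and the grid data are hypothetical.

```python
import math
from statistics import median

def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def split_outliers(points, m=10.0):
    """Split points into (inliers, outliers): a point is an outlier when its
    distance to the centroid exceeds m times the median distance."""
    c = centroid(points)
    dists = [math.dist(p, c) for p in points]
    scale = median(dists)  # assumed stand-in for "some cluster moment"
    inliers = [p for p, d in zip(points, dists) if d <= m * scale]
    outliers = [p for p, d in zip(points, dists) if d > m * scale]
    return inliers, outliers

# Hypothetical cluster: a 5 x 4 grid of points plus one distant point.
grid = [(float(i % 5), float(i // 5)) for i in range(20)]
inliers, outliers = split_outliers(grid + [(200.0, 200.0)], m=10.0)
print(outliers)           # [(200.0, 200.0)]
print(centroid(inliers))  # (2.0, 1.5)
```

Recomputing the centroid from the inliers alone keeps a single distant point from dragging it away from the bulk of the cluster.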
brain computer data dna evolve gene genetic life
nerve neuron number …
Clustering words into topics