INF4820: Algorithms for AI and NLP Clustering
Milen Kouylekov & Stephan Oepen
Language Technology Group University of Oslo
Oct. 2, 2014
Agenda
◮ Yesterday: flat clustering; k-means
◮ Today: bottom-up hierarchical clustering
◮ The cardinality k (the number of clusters).
◮ The similarity function s.
◮ High intra-cluster similarity
◮ Low inter-cluster similarity
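The two goals can be made concrete by averaging pairwise similarities inside clusters and across clusters. The sketch below assumes, for illustration only, that the similarity function s is negated Euclidean distance and that "high" and "low" are judged by the mean over pairs; `intra_inter` and `neg_euclidean` are hypothetical helper names, not from the slides.

```python
import math
from itertools import combinations


def neg_euclidean(u, v):
    """Similarity as negated Euclidean distance (an illustrative choice of s)."""
    return -math.dist(u, v)


def _mean(xs):
    return sum(xs) / len(xs) if xs else 0.0


def intra_inter(clusters, sim=neg_euclidean):
    """Return (mean within-cluster similarity, mean between-cluster similarity)."""
    intra = [sim(x, y) for c in clusters for x, y in combinations(c, 2)]
    inter = [sim(x, y)
             for c1, c2 in combinations(clusters, 2)
             for x in c1 for y in c2]
    return _mean(intra), _mean(inter)


good = [[(0, 0), (0, 1)], [(5, 5), (5, 6)]]  # nearby points grouped together
bad = [[(0, 0), (5, 5)], [(0, 1), (5, 6)]]   # nearby points split apart
good_intra, good_inter = intra_inter(good)
bad_intra, bad_inter = intra_inter(bad)
print(good_intra, good_inter)
print(bad_intra, bad_inter)
```

For the "good" partition the within-cluster average is clearly above the between-cluster average; for the "bad" one the relation is reversed.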
◮ pick k random objects from the collection;
◮ pick k random points in the space;
◮ pick k sets of m random points and compute centroids for each set;
◮ compute a hierarchical clustering on a subset of the data to find k initial clusters.
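A minimal sketch of how the first three seeding strategies plug into a plain k-means loop. Assumptions, not from the slides: squared Euclidean distance, "random points in the space" drawn uniformly from the data's bounding box, m = 3, and an empty cluster keeping its old centroid. The hierarchical-clustering seeding is left out. All function names are hypothetical.

```python
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return tuple(sum(xs) / n for xs in zip(*points))


def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def init_centroids(points, k, strategy, rng, m=3):
    """Seed k centroids with one of three of the strategies listed above."""
    if strategy == "objects":  # k random objects from the collection
        return [tuple(p) for p in rng.sample(points, k)]
    if strategy == "points":  # k random points in the data's bounding box (assumption)
        lows = [min(xs) for xs in zip(*points)]
        highs = [max(xs) for xs in zip(*points)]
        return [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
                for _ in range(k)]
    if strategy == "sets":  # centroids of k sets of m random objects
        return [centroid(rng.sample(points, m)) for _ in range(k)]
    raise ValueError(f"unknown strategy: {strategy!r}")


def kmeans(points, k, strategy="objects", seed=0, max_iter=100):
    """Plain k-means; returns (cluster index per point, final centroids)."""
    rng = random.Random(seed)
    centroids = init_centroids(points, k, strategy, rng)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = centroid(members)
    return labels, centroids


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(data, k=2, strategy="objects", seed=0)
print(labels)
```

Seeding with actual objects guarantees every initial centroid sits on real data; random points in the space can land far from all objects and leave a cluster empty, which is one reason the slides list several alternatives.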
$$\arg\max_{\{c_j, c_k\} \subseteq C \,\wedge\, j \neq k} s(c_j, c_k)$$
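The argmax above is the greedy step of bottom-up clustering: start from singletons and repeatedly merge the most similar pair of distinct clusters. The sketch below is deliberately naive (it rescans all pairs every round) and takes the cluster similarity as a parameter; `agglomerate` and `single_link_1d` are hypothetical names, and the 1-D "negated gap" similarity is only an illustrative choice.

```python
from itertools import combinations


def agglomerate(objects, cluster_sim):
    """Naive bottom-up clustering; returns the merges in the order performed.

    cluster_sim(a, b) scores two clusters (lists of objects): higher = more similar.
    """
    clusters = [[o] for o in objects]  # start with one singleton per object
    merges = []
    while len(clusters) > 1:
        # argmax over unordered pairs {c_j, c_k} of distinct clusters (j < k)
        j, k = max(combinations(range(len(clusters)), 2),
                   key=lambda jk: cluster_sim(clusters[jk[0]], clusters[jk[1]]))
        merges.append((clusters[j], clusters[k]))
        merged = clusters[j] + clusters[k]
        del clusters[k]  # k > j, so deleting k first leaves index j valid
        del clusters[j]
        clusters.append(merged)
    return merges


def single_link_1d(a, b):
    """Example cluster similarity on numbers: negated gap between closest members."""
    return max(-abs(x - y) for x in a for y in b)


for left, right in agglomerate([0, 1, 5, 6.5, 20], single_link_1d):
    print(left, "+", right)
```

The sequence of merges is exactly the information a dendrogram displays; cutting it after n - k merges yields a flat clustering with k clusters.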
◮ Single-linkage
◮ Complete-linkage
◮ Centroid-linkage
◮ Average-linkage
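The four criteria differ only in how member-level similarities are combined into one cluster-level score. A sketch, assuming negated Euclidean distance as the member similarity and taking average-linkage as the mean over cross-cluster pairs (other definitions of the average exist); the function names are hypothetical.

```python
import math
from statistics import mean


def neg_euclidean(u, v):
    """Member similarity: negated Euclidean distance (illustrative choice)."""
    return -math.dist(u, v)


def centroid(c):
    """Component-wise mean of a cluster's vectors."""
    return tuple(mean(xs) for xs in zip(*c))


def single_link(a, b, sim=neg_euclidean):
    """Similarity of the two MOST similar members."""
    return max(sim(x, y) for x in a for y in b)


def complete_link(a, b, sim=neg_euclidean):
    """Similarity of the two LEAST similar members."""
    return min(sim(x, y) for x in a for y in b)


def centroid_link(a, b, sim=neg_euclidean):
    """Similarity of the two cluster centroids."""
    return sim(centroid(a), centroid(b))


def average_link(a, b, sim=neg_euclidean):
    """Mean similarity over all cross-cluster member pairs."""
    return mean(sim(x, y) for x in a for y in b)


a = [(0,), (1,)]
b = [(4,), (10,)]
for f in (single_link, complete_link, centroid_link, average_link):
    print(f.__name__, f(a, b))
```

Any of these can be passed as the cluster similarity in a greedy merge loop; single-linkage looks only at the closest pair, complete-linkage only at the farthest.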
◮ Let the nearest neighbor of cluster $c_k$ be in either $c_i$ or $c_j$. If we merge $c_i \cup c_j = c_l$, the nearest neighbor of $c_k$ will be in $c_l$.
◮ The distance of the two closest members is a local property that is not affected by merging.
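Because the closest-member distance is unaffected by merging, the single-linkage score of $c_k$ against a merged cluster $c_i \cup c_j$ is just the larger of its two previous scores, so it can be updated without revisiting any members. The check below assumes negated Euclidean distance as the member similarity and random 2-D clusters; the helper names are hypothetical.

```python
import math
import random


def sim(x, y):
    """Member similarity: negated Euclidean distance (illustrative choice)."""
    return -math.dist(x, y)


def single_link(a, b):
    """Single-linkage similarity: similarity of the two closest members."""
    return max(sim(x, y) for x in a for y in b)


def single_link_after_merge(s_ki, s_kj):
    """Score of c_k against c_i ∪ c_j, computed from the two old scores only."""
    return max(s_ki, s_kj)


rng = random.Random(42)


def random_cluster(n):
    """A cluster of n random points in the unit square."""
    return [(rng.random(), rng.random()) for _ in range(n)]


c_i, c_j, c_k = random_cluster(3), random_cluster(4), random_cluster(2)
recomputed = single_link(c_k, c_i + c_j)
updated = single_link_after_merge(single_link(c_k, c_i), single_link(c_k, c_j))
print(recomputed == updated)
```

The equality is exact, since the maximum over a union is the maximum of the two partial maxima.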
◮ Represent objects as vectors of features
◮ Calculate similarity between vectors
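As a sketch of these two steps for text, the example below assumes bag-of-words features (word counts) stored sparsely as dictionaries and cosine as the similarity between vectors; both choices, and the helper names, are illustrative assumptions.

```python
import math
from collections import Counter


def to_vector(tokens):
    """Bag-of-words feature vector: one feature per word type, valued by its count."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {feature: value} maps."""
    dot = sum(value * v.get(feature, 0) for feature, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


a = to_vector("the cat sat on the mat".split())
b = to_vector("the cat lay on the rug".split())
c = to_vector("stock prices fell sharply".split())
print(cosine(a, b), cosine(a, c))
```

Storing only non-zero features keeps the dot product proportional to the number of shared words rather than to the vocabulary size.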
◮ Partition the data into subsets, so that the similarity among members of the same group is high, while the similarity between the groups themselves is low.
◮ sequences
◮ labelled sequences
◮ trees