10601 Machine Learning: Hierarchical clustering
Reading: Bishop 9–9.2
Second half overview:
– Clustering: hierarchical, semi-supervised learning
– Graphical models: Bayesian networks, HMMs, reasoning under uncertainty
– Putting it…
What is clustering? Organizing objects into classes such that there is high similarity within a class and low similarity between classes; in other words, finding natural groupings among objects.
One example application: clustering gene expression data can suggest new functions for unknown genes. This is a very active research area (the most cited paper in PNAS, with 12,309 citations, is a clustering paper in this area!).
Clustering is one of several unsupervised learning techniques: the data come without labels. We will see another unsupervised learning method later in the course.
What is similarity? Webster's Dictionary: "The quality or state of being similar; likeness; resemblance; as, a similarity of features."
Similarity is hard to define, but… "we know it when we see it." The real meaning of similarity is a philosophical question; we will take a more pragmatic approach.
Definition: Let O1 and O2 be two objects from the universe of possible objects. The distance (dissimilarity) between O1 and O2 is a real number, denoted by D(O1, O2).
[Figure: a "black box" distance function takes a pair of objects (gene1, gene2) and returns a number, e.g. 0.23, 3, or 342.7.]
A few examples. Edit distance between two strings is defined by the recurrence
$d('', '') = 0$
$d(s, '') = d('', s) = |s|$
$d(s_1{+}ch_1,\ s_2{+}ch_2) = \min\{\, d(s_1, s_2) + [ch_1 \neq ch_2],\ d(s_1{+}ch_1, s_2) + 1,\ d(s_1, s_2{+}ch_2) + 1 \,\}$
where $[ch_1 \neq ch_2]$ is 0 if the two characters match and 1 otherwise.
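To make the recurrence concrete, here is a minimal bottom-up dynamic-programming sketch in Python (the literal recursion would be exponential; the function name edit_distance is our own):

```python
def edit_distance(s1: str, s2: str) -> int:
    """Edit (Levenshtein) distance via the recurrence above,
    computed bottom-up so each subproblem is solved once."""
    m, n = len(s1), len(s2)
    # d[i][j] = distance between the prefixes s1[:i] and s2[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                                # d(s, '') = |s|
    for j in range(n + 1):
        d[0][j] = j                                # d('', s) = |s|
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost,  # substitute (or match)
                          d[i - 1][j] + 1,         # delete from s1
                          d[i][j - 1] + 1)         # insert into s1
    return d[m][n]

print(edit_distance("kitten", "sitting"))  # 3
```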
Inside these black boxes: some function on two variables (might be simple or very complex).
Euclidean distance: $d(x, y) = \sqrt{\sum_i (x_i - y_i)^2}$
Pearson correlation coefficient: $s(x, y) = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sigma_x \, \sigma_y}$
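A small numpy sketch of both measures (the function names are ours; note that correlation is a similarity, not a distance):

```python
import numpy as np

def euclidean(x: np.ndarray, y: np.ndarray) -> float:
    """d(x, y) = sqrt(sum_i (x_i - y_i)^2)."""
    return float(np.sqrt(np.sum((x - y) ** 2)))

def pearson(x: np.ndarray, y: np.ndarray) -> float:
    """s(x, y): centered dot product, normalized by the standard deviations."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)))

gene1 = np.array([1.0, 2.0, 3.0, 4.0])
gene2 = np.array([2.0, 4.0, 6.0, 8.0])
print(euclidean(gene1, gene2))  # ~5.48: far apart in absolute level
print(pearson(gene1, gene2))    # 1.0: perfectly correlated profiles
```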
A desirable property of a clustering algorithm is that it needs minimal domain knowledge to determine input parameters (ideally, parameters are optional).
Two types of clustering techniques:
– Partitional: construct various partitions of the set of objects and then evaluate them by some criterion. Top down.
– Hierarchical: create a hierarchical decomposition of the set of objects using some criterion (focus of this class). Bottom up or top down.
The number of dendrograms with n leaves is $(2n-3)!\,/\,[2^{n-2}\,(n-2)!]$:

Number of leaves    Number of possible dendrograms
2                   1
3                   3
4                   15
5                   105
…                   …
10                  34,459,425
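A tiny sketch that evaluates this formula and reproduces the table (the count equals the double factorial $(2n-3)!!$):

```python
from math import factorial

def num_dendrograms(n: int) -> int:
    """Number of distinct rooted binary trees with n labeled leaves."""
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

for n in (2, 3, 4, 5, 10):
    print(n, num_dendrograms(n))  # 1, 3, 15, 105, 34459425
```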
Bottom-Up (agglomerative): Starting with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.
We begin with a distance matrix which contains the distances between every pair of objects in our dataset.
At each step, consider all possible merges of two clusters and choose the best one; repeat until everything is fused into a single cluster.
But how do we compute distances between clusters rather than between individual objects? Three common choices (all three are sketched in code below):
– Single linkage: the distance between two clusters is the distance between their two closest members. Tends to produce long and skinny clusters.
– Complete linkage: the distance between two clusters is the distance between their two most distant members. Tends to produce tight clusters.
– Average linkage: the distance between two clusters is the average distance between all pairs of members, one from each cluster. The most widely used measure; robust against noise.
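All three criteria are available off the shelf; a minimal sketch using scipy (the random 2-D data is purely illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.random((20, 2))            # 20 illustrative 2-D points
D = pdist(X)                       # condensed pairwise distance matrix

for method in ("single", "complete", "average"):
    Z = linkage(D, method=method)  # bottom-up merges under this criterion
    print(method, float(Z[-1, 2])) # height of the final merge
```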
Worked example (single linkage). We start from the distance matrix over objects 1–5:

      1    2    3    4    5
1     0    2    6   10    9
2     2    0    3    9    8
3     6    3    0    7    5
4    10    9    7    0    4
5     9    8    5    4    0

Step 1: the closest pair is (1, 2) at distance 2, so merge them. Under single linkage, the distance from the new cluster to each remaining object is the minimum over its members:
$d_{(1,2),3} = \min\{d_{1,3}, d_{2,3}\} = \min\{6, 3\} = 3$
$d_{(1,2),4} = \min\{d_{1,4}, d_{2,4}\} = \min\{10, 9\} = 9$
$d_{(1,2),5} = \min\{d_{1,5}, d_{2,5}\} = \min\{9, 8\} = 8$

         (1,2)   3    4    5
(1,2)      0     3    9    8
3          3     0    7    5
4          9     7    0    4
5          8     5    4    0

Step 2: the closest pair is now (1,2) and 3 at distance 3; merge them:
$d_{(1,2,3),4} = \min\{d_{(1,2),4}, d_{3,4}\} = \min\{9, 7\} = 7$
$d_{(1,2,3),5} = \min\{d_{(1,2),5}, d_{3,5}\} = \min\{8, 5\} = 5$

           (1,2,3)   4    5
(1,2,3)       0      7    5
4             7      0    4
5             5      4    0

Step 3: the closest pair is (4, 5) at distance 4; merge them:
$d_{(1,2,3),(4,5)} = \min\{d_{(1,2,3),4}, d_{(1,2,3),5}\} = \min\{7, 5\} = 5$

Step 4: the final merge joins (1,2,3) and (4,5) at distance 5, completing the dendrogram.
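The hand computation above can be checked with scipy; feeding it the same 5×5 matrix in condensed form reproduces the merge heights 2, 3, 4, 5:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

D = np.array([[ 0,  2,  6, 10,  9],
              [ 2,  0,  3,  9,  8],
              [ 6,  3,  0,  7,  5],
              [10,  9,  7,  0,  4],
              [ 9,  8,  5,  4,  0]], dtype=float)

Z = linkage(squareform(D), method="single")
print(Z[:, 2])  # [2. 3. 4. 5.] -- the four merge heights found by hand
```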
[Figure: dendrograms of the same 30 objects under average linkage and under single linkage. Height represents the distance between objects/clusters.]
In some cases we can determine the “correct” number of clusters. However, things are rarely this clear cut, unfortunately.
Outlier detection: a single isolated branch in the dendrogram is suggestive of a data point that is very different from all others.
Example application: we measure the expression of genes in different conditions and cluster genes with similar expression profiles; this can suggest functions for unknown genes. [Figure: genes plotted by their expression in condition 1 vs. expression in condition 2.]
K-means recap: re-assign objects to the nearest center and move the centers, repeating until no objects change membership. [Figure: three cluster centers k1, k2, k3.]
Gaussian mixture clustering
Comparing the three clustering methods:

                    Hierarchical                          K-means                           GMM
Running time        naively O(N^3)                        fastest (each iteration linear)   fast (each iteration linear)
Assumptions         requires a similarity/distance        strong assumptions                strongest assumptions
                    measure
Input parameters    none                                  K (number of clusters)            K (number of clusters)
Clusters            subjective (only a tree is returned)  exactly K clusters                exactly K clusters
How do we determine the right number of clusters? In general, this is an unsolved problem. However, there are many approximate methods; in the next few slides we will see an example.
When k = 1, the objective function is 873.0.
When k = 2, the objective function is 173.1.
When k = 3, the objective function is 133.6.
[Figure: objective function value (0 to 1,000) plotted against k for k = 1 to 6.]
We can plot the objective function values for k = 1 to 6. The abrupt change at k = 2 is highly suggestive of two clusters in the data. This technique for determining the number of clusters is known as "knee finding" or "elbow finding". Note that the results are not always as clear cut as in this toy example.
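A sketch of elbow finding with scikit-learn's KMeans (the two-blob data is synthetic, so the knee should appear at k = 2; the numbers will differ from the toy example above):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# two well-separated blobs => the objective should drop sharply at k = 2
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])

for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 1))  # objective: within-cluster sum of squares
```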
Another approach is to hold out part of the data, fit a model for each candidate number of clusters, and use the likelihood of the left-out data to determine which model (number of clusters) is more accurate.
$p(x_1 \ldots x_n \mid \Theta) = \prod_{j=1}^{n} \sum_{i=1}^{k} w_i \, p(x_j \mid C_i)$
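A numpy sketch that evaluates this held-out likelihood for a 1-D Gaussian mixture (the function name and all parameter values here are illustrative; in practice one compares the log-likelihoods of models fit with different k):

```python
import numpy as np
from scipy.stats import norm

def heldout_loglik(x, weights, means, stds):
    """log p(x_1..x_n | Theta) = sum_j log( sum_i w_i * p(x_j | C_i) )."""
    # dens[i, j] = p(x_j | C_i) under Gaussian component i
    dens = np.array([norm.pdf(x, m, s) for m, s in zip(means, stds)])
    return float(np.sum(np.log(np.asarray(weights) @ dens)))

x_heldout = np.array([0.1, -0.3, 4.8, 5.2])
# a k = 2 model vs. a k = 1 model on the same held-out points;
# the two-component model should score a higher log-likelihood here
print(heldout_loglik(x_heldout, [0.5, 0.5], [0.0, 5.0], [1.0, 1.0]))
print(heldout_loglik(x_heldout, [1.0], [2.5], [1.0]))
```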