Cluster Analysis
- Grouping the data items into a number of sets such that the members of each set have “more in common” with each other than with any members of any other set
– “More in common” can be defined in many ways, but some form of distance metric based on the characteristics of each data item is normal
– Data items belonging to a cluster will be nearer to each other in terms of this distance measure than to data items in any other cluster
- Clustering algorithms can be divided into 2 types
– Hierarchical
– Non-hierarchical
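To make the idea of a distance metric concrete, here is a minimal sketch that uses Euclidean distance, which is only one of many possible choices. The data points and the names `cluster_a`, `cluster_b` and `euclidean` are made up for illustration and do not come from the slides.

```python
import math

# Hypothetical 2-D data items, invented for this illustration
cluster_a = [(1.0, 1.0), (1.5, 2.0)]
cluster_b = [(8.0, 8.0), (9.0, 9.5)]

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

# Distance between the two members of cluster A
within_a = euclidean(cluster_a[0], cluster_a[1])

# Smallest distance from any member of A to any member of B
across = min(euclidean(p, q) for p in cluster_a for q in cluster_b)

# For a sensible clustering, members are nearer to each other
# than to members of any other cluster
print(within_a < across)
```

Other measures (Manhattan distance, for example) could be swapped in by replacing `euclidean`; the choice depends on the characteristics of the data.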
Hierarchical Clustering
- Hierarchical clustering produces a family of alternative clusterings
- If we have n data items then we start with n clusters, each containing a single item – this is our first clustering
- We merge the two clusters which are “closest” according to some metric to form n-1 clusters – this is our second clustering
- We continue to merge the closest pair of clusters – producing successive clusterings – until we have just one cluster which contains all of the data items
- The sequence of merges can be visualised in a dendrogram, a tree diagram in which each join marks the merging of two clusters
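The merging procedure described above can be sketched in a few lines of Python. The slides do not say how the distance between two clusters is measured, so this sketch assumes single linkage (the smallest distance between any pair of members) with Euclidean distance. The names `single_link` and `agglomerate` are hypothetical, and the brute-force search is meant for clarity rather than efficiency.

```python
import math

def single_link(a, b):
    """Assumed cluster distance: smallest pairwise distance between members."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerate(items, cluster_distance=single_link):
    """Return the family of clusterings, from n singletons down to one cluster."""
    clusters = [[item] for item in items]        # first clustering: n clusters
    clusterings = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        # Find the indices of the closest pair of clusters
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        # Replace the two merged clusters with their union
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        clusterings.append([list(c) for c in clusters])
    return clusterings

# Hypothetical data items, invented for this illustration
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
for clustering in agglomerate(points):
    print(len(clustering), clustering)
```

Each printed line is one clustering in the family, starting with n clusters and ending with one. In practice a library routine would normally be used; SciPy, for instance, provides `scipy.cluster.hierarchy.linkage` to compute the merges and `scipy.cluster.hierarchy.dendrogram` to draw them.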