1. Advanced Machine Learning Course IV - (Hierarchical) Clustering
L. Omar Chehab (1) and Frédéric Pascal (2)
(1) Parietal Team, Inria
(2) Laboratory of Signals and Systems (L2S), CentraleSupélec, University Paris-Saclay
l-emir-omar.chehab@inria.fr, frederic.pascal@centralesupelec.fr, http://fredericpascal.blogspot.fr
Dominante MDS (Mathématiques, Data Sciences), Sept. - Dec. 2020

2. Contents
1 Introduction - Reminders of probability theory and mathematical statistics (Bayes, estimation, tests) - FP
2 Robust regression approaches - EC / OC
3 Hierarchical clustering - FP / OC
4 Stochastic approximation algorithms - EC / OC
5 Nonnegative matrix factorization (NMF) - EC / OC
6 Mixture model fitting / model order selection - FP / OC
7 Inference on graphical models - EC / VR
8 Exam

3. Key references for this course
Tan, P. N., Steinbach, M. and Kumar, V. Introduction to Data Mining (chapter "Cluster Analysis: Basic Concepts and Algorithms"). 2013.
Bishop, C. M. Pattern Recognition and Machine Learning. Springer, 2006.
Hastie, T., Tibshirani, R. and Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second edition. Springer, 2009.
James, G., Witten, D., Hastie, T. and Tibshirani, R. An Introduction to Statistical Learning, with Applications in R. Springer, 2013.

4. Course 4 - (Hierarchical) Clustering

5. I. Introduction to clustering
II. Clustering algorithms
III. Clustering algorithm performance

6. What is Clustering?
Divide data into groups (clusters) that are meaningful and/or useful, i.e., that capture the natural structure of the data. The purpose of clustering is either understanding or utility.
Clustering for understanding: e.g., in biology, information retrieval (the web...), climate, psychology and medicine, business...
Clustering for utility:
- Summarization: dimension reduction → PCA, regression on high-dimensional data; work on cluster characteristics instead of all the data
- Compression, a.k.a. vector quantization
- Efficiently finding nearest neighbors
It is unsupervised learning, contrary to (supervised) classification!

7. Hierarchical vs Partitional
Partitional clustering: division of the set of data objects into non-overlapping subsets (clusters) such that each data object is in exactly one subset.
If clusters can have sub-clusters ⇒ hierarchical clustering: a set of nested clusters, organized as a tree. Each node (cluster) in the tree (except the leaf nodes) is the union of its children (subclusters). The root of the tree is the cluster containing all objects.
[Figure: (a) hierarchical clusters of points P1-P4; (b) the corresponding dendrogram]

8. Distinctions between sets of clusters
Exclusive vs non-exclusive (overlapping): separate clusters vs points that may belong to more than one cluster.
Fuzzy vs non-fuzzy: each observation x_i belongs to every cluster C_k with a given weight w_k ∈ [0, 1], where Σ_{k=1}^K w_k = 1 (similar to probabilistic clustering).
Partial vs complete: all data are clustered vs there may be non-clustered data, e.g., outliers, noise, "uninteresting background"...
Homogeneous vs heterogeneous: clusters with ≠ size, shape, density...

9. Types of clusters
Well-separated: any point in a cluster is closer (or more similar) to every other point in the cluster than to any point not in the cluster.
Prototype-based: an object in a cluster is closer (more similar) to the "center" of its cluster than to the center of any other cluster. Center = centroid (average) or medoid (most representative point).
Density-based: a dense region of points, separated from other regions of high density by low-density regions. Used when the clusters are irregular or intertwined, and when noise and outliers are present.
Others: graph-based...

10. Data set
The objective is to cluster the noisy data for a segmentation application in image processing.
[Figure: (c) tree data; (d) noisy tree data - the data on which the clustering algorithms are evaluated]
Should be easy...

11. I. Introduction to clustering
II. Clustering algorithms: K-means, hierarchical clustering, DBSCAN, HDBSCAN
III. Clustering algorithm performance

12. Clustering algorithms: K-means

13. K-means
It is a prototype-based clustering technique.
Notations: n unlabelled data vectors of R^p, denoted x = (x_1, ..., x_n), which should be split into K classes C_1, ..., C_K, with Card(C_k) = n_k and Σ_{k=1}^K n_k = n. The centroid of C_k is denoted m_k.
Optimal solution: the number of partitions of x into K subsets is
P(n, K) = (1/K!) Σ_{k=0}^K (-1)^{K-k} C_K^k k^n for K < n, where C_K^k = K! / (k! (K-k)!).
Example: P(100, 5) ≈ 10^68!
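That count grows explosively, which is why exhaustive search over partitions is hopeless. As a quick check, the closed form above can be evaluated directly; a minimal standard-library Python sketch (the function name is mine, not from the slides):

from math import comb, factorial

def n_partitions(n, K):
    # Number of partitions of n points into K non-empty subsets
    # (a Stirling number of the second kind).
    s = sum((-1) ** (K - k) * comb(K, k) * k ** n for k in range(K + 1))
    return s // factorial(K)  # the alternating sum is always divisible by K!

print(f"{n_partitions(100, 5):.3e}")  # ≈ 6.6e67, i.e. about 10^68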

14. K-means algorithm
Partitional clustering approach where the number K of clusters must be specified. Each observation is assigned to the cluster with the closest centroid. It minimizes the intra-cluster variance
V = Σ_k (1/n_k) Σ_{i | x_i ∈ C_k} ||x_i - m_k||^2.
The basic algorithm is very simple.
Algorithm 1: K-means
Input: x, the observation vectors, and the number K of clusters
Output: z = (z_1, ..., z_n), the labels of (x_1, ..., x_n)
Initialization: randomly select K points as the initial centroids
Repeat until convergence (define a criterion, e.g. error, label changes, centroid estimates...):
1. Form K clusters by assigning each x_i to the closest centroid: C_k = {x_i, ∀i ∈ {1, ..., n} | d(x_i, m_k) ≤ d(x_i, m_j), ∀j ∈ {1, ..., K}}
2. Recompute the centroids: ∀k ∈ {1, ..., K}, m_k = (1/n_k) Σ_{x_i ∈ C_k} x_i
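To make Algorithm 1 concrete, here is a minimal NumPy sketch (my own illustration, not the course's code), using random initialization and stopping when the labels no longer change:

import numpy as np

def kmeans(x, K, rng=np.random.default_rng(0)):
    # x: (n, p) array of observations; returns labels z and centroids m.
    n = len(x)
    m = x[rng.choice(n, size=K, replace=False)]  # random initial centroids
    z = np.full(n, -1)
    while True:
        # Step 1: assign each x_i to its closest centroid.
        d = np.linalg.norm(x[:, None, :] - m[None, :, :], axis=2)  # (n, K)
        z_new = d.argmin(axis=1)
        if np.array_equal(z_new, z):  # converged: labels unchanged
            return z, m
        z = z_new
        # Step 2: recompute each centroid as the mean of its cluster.
        # (An empty cluster would give NaN here -- one of the drawbacks
        # discussed on the next slide.)
        m = np.stack([x[z == k].mean(axis=0) for k in range(K)])

Calling kmeans on an (n, p) data array returns one label per observation together with the K fitted centroids.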

15. K-means drawbacks...
- Random initialization
- Empty clusters
- Suited to clusters with convex shape
- Sensitive to noise and outliers
- Computational cost
- ...
Several alternatives:
- K-means++: a seeding algorithm that initializes the centroids "spread out" throughout the data (see the sketch below)
- K-medoids: to address the robustness aspects
- Kernel K-means: to overcome the convex-shape restriction
- Many others...
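A hypothetical NumPy sketch of the K-means++ seeding step (the standard squared-distance sampling; the function name is mine): each new centroid is drawn with probability proportional to the squared distance to the nearest centroid already chosen.

import numpy as np

def kmeanspp_init(x, K, rng=np.random.default_rng(0)):
    # Returns K initial centroids "spread out" over the data x.
    centroids = [x[rng.integers(len(x))]]  # first centroid: uniform draw
    for _ in range(K - 1):
        # Squared distance from each point to its nearest chosen centroid.
        d2 = np.min([((x - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        # Far-away points are more likely to be picked next.
        centroids.append(x[rng.choice(len(x), p=d2 / d2.sum())])
    return np.stack(centroids)

These centroids can then be fed to the basic K-means loop in place of the uniform random draw.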

16. Correct initialization
[Figure: K-means iterations 1 to 6 from a correct initialization]

17. Correct initialization (continued)
[Figure: the six K-means iterations from the correct initialization, shown together]

18. Bad initialization
[Figure: K-means iterations 1 to 5 from a bad initialization]

19. Results on the data set
[Figure: clustering obtained with two different initialization techniques - (a) K-means++; (b) "Clusters"]
Comments...

20. Clustering algorithms: Hierarchical clustering

21. Hierarchical clustering
Two types of hierarchical clustering:
- Agglomerative: bottom-up - start with as many clusters as observations and iteratively merge them according to a given distance
- Divisive: top-down - start with one cluster containing all observations and iteratively split it into smaller clusters
Principles:
- Produces a set of nested clusters organized as a hierarchical tree
- Can be visualized as a dendrogram: a tree-like diagram that records the sequence of merges or splits, with branch length corresponding to inter-cluster distance
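In practice, the nested clusters and the dendrogram can be obtained with SciPy; a minimal sketch on toy data (my example, assuming Euclidean distance and single linkage):

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

rng = np.random.default_rng(0)
x = rng.normal(size=(20, 2))     # 20 toy observations in R^2

Z = linkage(x, method='single')  # bottom-up (agglomerative) merges
dendrogram(Z)                    # branch heights = merge distances
plt.show()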

22. Hierarchical clustering
[Figure: general principles - six points grouped into nested clusters, with the corresponding dendrogram (merge heights from 0.05 to 0.2)]

23. Inter-cluster distance
This is among the most popular clustering techniques.
Algorithm 2: Agglomerative hierarchical clustering
Input: x, the observation vectors, and a "cutting" threshold λ
Output: the set of merged clusters at each iteration and the inter-cluster distances
Initialization: number of clusters = n (the sample size)
While the number of clusters > 1:
1. Compute the distances between clusters
2. Merge the two nearest clusters
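A naive NumPy sketch of Algorithm 2 (my illustration; single linkage is assumed for the inter-cluster distance, i.e. the minimum over point pairs). It records every merge together with its distance; cutting that history at the threshold λ, i.e. keeping the clusters that exist just before the first merge with distance greater than λ, yields the final partition.

import numpy as np

def agglomerative(x):
    # Single-linkage agglomerative clustering; returns the merge history.
    clusters = [[i] for i in range(len(x))]  # start: one cluster per point
    d = np.linalg.norm(x[:, None] - x[None, :], axis=2)  # pairwise distances
    merges = []
    while len(clusters) > 1:
        # 1. Inter-cluster distance = min over the two clusters' point pairs.
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = d[np.ix_(clusters[a], clusters[b])].min()
                if dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        # 2. Merge the two nearest clusters and record the step.
        merges.append((clusters[a], clusters[b], dist))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges

This brute-force version costs O(n^3); library implementations (e.g. SciPy's linkage above) are far more efficient.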
