UNSUPERVISED LEARNING, CLUSTERING
UNSUPERVISED LEARNING
▸ Supervised learning: X–y pairs, f(x) function approximation
▸ Unsupervised learning: only X, no y
▸ Exploring the space of X measurements, understanding the data, identifying populations, problems, outliers (before modelling)
▸ Dimension reduction, important when working with high-dimensional data
▸ Usually part of exploratory data analysis, which may lead to measuring the “supervising” signal when interesting structure is found in the X data
▸ Not a well-defined problem
UNSUPERVISED LEARNING
DATA EXPLORATION, DIMENSIONALITY REDUCTION
▸ High-dimensional datasets (N_dim often >> N_data)
▸ Impossible to “visually” find structure, clusters, outliers, batch effects, etc.
▸ One way to explore the data is to embed it into a few dimensions that humans are capable of inspecting visually (1, 2, 3?)
▸ It is very important to know the internal structure of your data!
▸ Usually the first step with high-dimensional data is dimensionality reduction (in parallel with opening your data in a spreadsheet and just eyeballing it for a few hours :) )
UNSUPERVISED LEARNING
PCA - PRINCIPAL COMPONENT ANALYSIS
▸ PCA is a linear basis transformation from the original basis to a new basis dictated by the variation in the data itself
▸ The 1st component direction is along the largest variance in the data
▸ The 2nd component is the orthogonal direction to the 1st with the largest variance, and so on …
▸ The number of components is min(n_features, n_data)
▸ The projections of the original data points give the scores
▸ Projected data points (scores) are uncorrelated in PCA space
▸ The first components capture the largest variation in the data, the interesting things! We can reveal some structure of the data using only a few dimensions (see the code sketch below).
▸ (Figure: toy 2D data in x–y with the signal variance σ²_signal and the noise variance σ²_noise; image from Shlens)
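▸ A minimal PCA sketch in Python with scikit-learn (the random matrix X is just a made-up stand-in for real measurements):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # toy stand-in: 100 points, 5 features

pca = PCA()                               # default: min(n_features, n_data) components
scores = pca.fit_transform(X)             # fit_transform centres X and returns the scores
print(np.round(np.corrcoef(scores, rowvar=False), 2))   # ~identity matrix: scores are uncorrelated
print(pca.explained_variance_ratio_)      # variance captured by each component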
UNSUPERVISED LEARNING
PCA - PRINCIPAL COMPONENT ANALYSIS
▸ Standard use: 2D plots of projections
▸ Original basis directions may be useful to plot as well (biplot)
▸ (Figure: biplot of the first two principal components, with loading directions for Murder, Assault, UrbanPop and Rape)
▸ Outliers: sometimes components correspond to individual data points, i.e. outliers. These should be inspected and removed, and PCA repeated without the outliers.
UNSUPERVISED LEARNING
PCA - PRINCIPAL COMPONENT ANALYSIS
▸ How many components do you need? Look at the proportion of variance explained.
▸ Zero mean per dimension is assumed, do it! (Fitting an ellipse around the origin)
▸ If different quantities are measured, the units may not be comparable (number of fingers or height in cm?). In this case, normalise the original dimensions to have variance = 1 (see the sketch below).
▸ Only the line of direction is defined: sign flips (multiplication by -1) might occur!
▸ (Figure: biplots of the scaled vs. unscaled variables Murder, Assault, UrbanPop, Rape, and the proportion of variance explained per principal component)
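▸ A hedged sketch of scaling to variance 1 before PCA and reading off the proportion of variance explained (the two toy columns simulate incomparable units):

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = np.column_stack([rng.normal(0, 100, size=200),   # e.g. a quantity measured in the hundreds
                     rng.normal(0, 1, size=200)])    # e.g. a quantity of order one

X_scaled = StandardScaler().fit_transform(X)         # zero mean, variance 1 per dimension
pca = PCA().fit(X_scaled)
print(pca.explained_variance_ratio_)                 # proportion of variance explained per component
print(np.cumsum(pca.explained_variance_ratio_))      # cumulative curve: pick "enough" components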
UNSUPERVISED LEARNING
MORE DIMENSION REDUCTION, EMBEDDING
▸ MDS, multidimensional scaling (embed the points in low dimension, given their measured distances)
▸ t-SNE, t-distributed stochastic neighbour embedding (local embedding, usually works best with complex data)
▸ UMAP, Uniform Manifold Approximation and Projection (way, way faster than t-SNE)
▸ ICA, independent component analysis (PCA: uncorrelated, ICA: independent, e.g. EEG)
▸ NMF, non-negative matrix factorisation (e.g. mutations)
▸ And more; left figure source: http://scikit-learn.org/stable/modules/manifold.html
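▸ A rough t-SNE sketch with scikit-learn, using the digits dataset purely as a convenient example (UMAP, from the separate umap-learn package, is used the same way):

from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)                      # 64-dimensional images of digits
emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(X)
print(emb.shape)                                         # (n_samples, 2): coordinates to plot, coloured by y

# UMAP has a very similar interface:
# import umap; emb = umap.UMAP(n_components=2).fit_transform(X)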
CLUSTERING
CLUSTERING
▸ Data points can be meaningfully categorised: clusters
▸ Classification: we have labels (y) for the groups
▸ Clustering: labels are not measured, they are inferred from the (X) data
▸ Not a well-defined problem
▸ The clusters inferred should be validated (with measurements, new data)
CLUSTERING
K-MEANS CLUSTERING
▸ A priori fix the number of clusters
▸ Minimise the sum of intra-cluster distances
▸ Algorithm:
▸ 1. Randomly assign each data point to a cluster
▸ 2. Calculate the cluster centroids, reassign each data point to the closest centroid, repeat until convergence
▸ The distance metric is generally Euclidean
▸ A local minimum is found; repeat multiple times for the best solution and an assessment of stability
▸ Left: possible failure modes. Source: http://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_assumptions.html
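▸ A minimal k-means sketch with scikit-learn (the choice n_clusters=3 and the blob data are arbitrary illustrations; n_init repeats the algorithm from several random starts):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)   # toy data with 3 blobs
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)   # restarts help escape poor local minima
print(km.labels_[:10])                                        # inferred cluster of the first points
print(km.cluster_centers_)                                    # the centroids
print(km.inertia_)                                            # sum of squared distances to the closest centroid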
CLUSTERING
HIERARCHICAL CLUSTERING
▸ Number of clusters not fixed
▸ Iteratively agglomerate clusters from individual observations
▸ Algorithm:
▸ 1. Assign each data point to its own cluster
▸ 2. Join the two closest clusters, repeat
▸ The cluster distance metric (linkage) is super important
▸ Single (smallest pairwise distance), average, complete (maximal pairwise distance)
▸ The result is not a clustering, it is a dendrogram. A horizontal cut defines a clustering. Where to cut? Well…
▸ (Figure: dendrograms under average, complete and single linkage)
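▸ An agglomerative clustering sketch using SciPy (the average linkage and the cut into 3 clusters are arbitrary choices for illustration):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))                       # toy data

Z = linkage(pdist(X), method="average")            # also: "single", "complete"
labels = fcluster(Z, t=3, criterion="maxclust")    # one possible horizontal cut: keep 3 clusters
# dendrogram(Z)                                    # draws the full tree (needs matplotlib)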
CLUSTERING
MORE CLUSTERING
▸ DBSCAN: density thresholds define clusters
▸ Spectral clustering: using the eigenvectors of the pairwise distance matrix
▸ Gaussian mixture models
▸ And more; left figure source: http://scikit-learn.org/stable/modules/clustering.html
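▸ DBSCAN and a Gaussian mixture, sketched with scikit-learn on a toy dataset (eps, min_samples and n_components are data-dependent guesses here):

from sklearn.cluster import DBSCAN
from sklearn.mixture import GaussianMixture
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)  # two interleaved half-circles
db = DBSCAN(eps=0.3, min_samples=5).fit(X)                    # label -1 marks low-density "noise" points
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(db.labels_[:10], gmm.predict(X)[:10])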
SEMI-SUPERVISED LEARNING
SEMI-SUPERVISED LEARNING
▸ Few data points have labels, most others do not
▸ Exploit the data structure of the unlabelled examples for more effective supervised learning
▸ Use unsupervised learning to explore the data structure and clusters, and use the few labelled points to assign labels to the clusters
▸ Hot topic, as data labelling is often much more expensive than unlabelled data collection
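▸ One concrete flavour of this, sketched with scikit-learn's label propagation (masking labels with -1 below just simulates the "few labelled points" situation):

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.semi_supervised import LabelPropagation

X, y = make_blobs(n_samples=200, centers=3, random_state=0)
y_partial = np.full_like(y, -1)            # -1 marks an unlabelled point
y_partial[:10] = y[:10]                    # pretend only 10 labels were actually measured

model = LabelPropagation().fit(X, y_partial)
print((model.transduction_ == y).mean())   # fraction of points whose inferred label matches the true one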
SELF-SUPERVISED LEARNING
SELF-SUPERVISED LEARNING
▸ Unsupervised learning, where a part of the data is predicted from another part of the data
▸ Examples explain it best:
▸ Future video frame prediction
▸ Grayscale image colorisation
▸ Inpainting
▸ Jigsaw puzzle solving
▸ Motion direction prediction
▸ etc.
▸ Orders of magnitude more unsupervised data can be collected (images, videos)
▸ Human visual learning is supposedly unsupervised (maybe it is self-supervised)
▸ Images: Lotter et al., Zhang et al., Noroozi and Favaro, Walker et al.
REFERENCES
REFERENCES
▸ ISLR, chapter 10.
▸ ESL, chapter 14.
▸ http://scikit-learn.org/stable/modules/decomposition.html#decompositions
▸ http://scikit-learn.org/stable/modules/manifold.html
▸ http://scikit-learn.org/stable/modules/clustering.html#clustering
▸ https://umap-learn.readthedocs.io/en/latest/
▸ Shlens, J., 2014. A Tutorial on Principal Component Analysis. arXiv:1404.1100 [cs, stat].
▸ Walker, J., Gupta, A., Hebert, M., 2015. Dense Optical Flow Prediction from a Static Image. arXiv:1505.00295 [cs].
▸ Lotter, W., Kreiman, G., Cox, D., 2016. Deep Predictive Coding Networks for Video Prediction and Unsupervised Learning. arXiv:1605.08104 [cs, q-bio].
▸ Zhang, R., Isola, P., Efros, A.A., 2016. Colorful Image Colorization. arXiv:1603.08511 [cs].
▸ Noroozi, M., Favaro, P., 2016. Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles. arXiv:1603.09246 [cs].