  1. Unsupervised Learning
     Andrea Passerini (passerini@disi.unitn.it), Machine Learning

  2. Unsupervised Learning: Setting
     - Supervised learning requires the availability of labelled examples.
     - Labelling examples can be an extremely expensive process.
     - Sometimes we do not even know how to label examples.
     - Unsupervised techniques can be employed to group examples into clusters.

  3. k-means clustering
     Setting:
     - Examples are assumed to be grouped into $k$ clusters.
     - Each cluster $i$ is represented by its mean $\mu_i$.
     Algorithm:
     1. Initialize the cluster means $\mu_1, \ldots, \mu_k$.
     2. Iterate until no mean changes:
        a. Assign each example to the cluster with the nearest mean.
        b. Update the cluster means according to the assigned examples.
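A minimal NumPy sketch of the algorithm above, assuming the data come as an (n, d) array; the function name kmeans, the random initialization, and the iteration cap are illustrative choices, not prescribed by the slides.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize the cluster means with k randomly chosen examples.
    means = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each example goes to the cluster with the nearest mean.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Update step: recompute each mean from its assigned examples
        # (keep the old mean if a cluster ends up empty).
        new_means = np.array([X[assign == j].mean(axis=0) if np.any(assign == j)
                              else means[j] for j in range(k)])
        if np.allclose(new_means, means):  # no mean changed: stop
            break
        means = new_means
    return means, assign

# Toy usage: two well-separated blobs.
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
means, assign = kmeans(X, k=2)
```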

  4. How can we define (dis)similarity between examples?
     (Dis)similarity measures:
     - Standard Euclidean distance in $\mathbb{R}^d$:
       $d(x, x') = \sqrt{\sum_{i=1}^{d} (x_i - x'_i)^2}$
     - Generic Minkowski metric for $p \geq 1$:
       $d(x, x') = \left( \sum_{i=1}^{d} |x_i - x'_i|^p \right)^{1/p}$
     - Cosine similarity (cosine of the angle between the vectors):
       $s(x, x') = \frac{x^T x'}{\|x\|\,\|x'\|}$
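The three measures as short NumPy functions, a sketch with illustrative names:

```python
import numpy as np

def euclidean(x, y):
    # Standard Euclidean distance in R^d.
    return np.sqrt(np.sum((x - y) ** 2))

def minkowski(x, y, p=2):
    # Generic Minkowski metric, p >= 1 (p=2 recovers the Euclidean distance, p=1 the Manhattan one).
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def cosine_similarity(x, y):
    # Cosine of the angle between the two vectors.
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))
```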

  5. How can we define the quality of the obtained clusters?
     Sum-of-squared error criterion:
     - Let $n_i$ be the number of samples in cluster $D_i$.
     - Let $\mu_i$ be the cluster sample mean: $\mu_i = \frac{1}{n_i} \sum_{x \in D_i} x$
     - The sum-of-squared errors is defined as:
       $E = \sum_{i=1}^{k} \sum_{x \in D_i} \|x - \mu_i\|^2$
     - It measures the squared error incurred in representing each example with its cluster mean.
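A short sketch of the criterion, assuming an assignment vector and cluster means like those produced by the k-means sketch above; names are illustrative.

```python
import numpy as np

def sum_of_squared_errors(X, assign, means):
    # assign[i] is the index of the cluster example X[i] belongs to;
    # means[j] is the sample mean of cluster j.
    return sum(np.sum((X[assign == j] - means[j]) ** 2) for j in range(len(means)))
```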

  6. Gaussian Mixture Model (GMM): Setting
     - Cluster examples using a mixture of Gaussian distributions.
     - Assume the number of Gaussians is given.
     - Estimate the mean and possibly the variance of each Gaussian.

  7. Gaussian Mixture Model (GMM): Parameter estimation
     - Maximum-likelihood estimation cannot be applied directly, as the cluster assignment of the examples is unknown.
     - Expectation-Maximization approach:
       1. Compute the expected cluster assignments given the current parameter setting.
       2. Estimate the parameters given the cluster assignments.
       3. Iterate.

  8. Example: estimating the means of k univariate Gaussians. Setting
     - A dataset of examples $x_1, \ldots, x_n$ is observed.
     - For each example $x_i$, the cluster assignment is modelled by binary latent (i.e. unknown) variables $z_{i1}, \ldots, z_{ik}$: $z_{ij} = 1$ if Gaussian $j$ generated $x_i$, 0 otherwise.
     - The parameters to be estimated are the Gaussian means $\mu_1, \ldots, \mu_k$.
     - All Gaussians are assumed to have the same (known) variance $\sigma^2$.

  9. Example: estimating the means of k univariate Gaussians. Algorithm
     1. Initialize $h = \langle \mu_1, \ldots, \mu_k \rangle$.
     2. Iterate until the difference in maximum likelihood (ML) is below a certain threshold:
        E-step: calculate the expected value $E[z_{ij}]$ of each latent variable, assuming the current hypothesis $h = \langle \mu_1, \ldots, \mu_k \rangle$ holds.
        M-step: calculate a new ML hypothesis $h' = \langle \mu'_1, \ldots, \mu'_k \rangle$, assuming the values of the latent variables are the expected values just computed. Replace $h \leftarrow h'$.

  10. Example: estimating the means of k univariate Gaussians. E-step and M-step
      E-step: the expected value of $z_{ij}$ is the probability that $x_i$ was generated by Gaussian $j$, assuming the hypothesis $h = \langle \mu_1, \ldots, \mu_k \rangle$ holds:
      $E[z_{ij}] = \frac{p(x_i \mid \mu_j)}{\sum_{l=1}^{k} p(x_i \mid \mu_l)} = \frac{\exp\left(-\frac{1}{2\sigma^2}(x_i - \mu_j)^2\right)}{\sum_{l=1}^{k} \exp\left(-\frac{1}{2\sigma^2}(x_i - \mu_l)^2\right)}$
      M-step: the maximum-likelihood mean $\mu'_j$ is the weighted sample mean, each instance being weighted by its probability of being generated by Gaussian $j$:
      $\mu'_j = \frac{\sum_{i=1}^{n} E[z_{ij}]\, x_i}{\sum_{i=1}^{n} E[z_{ij}]}$
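A NumPy sketch of the full EM loop for this example (means only, known shared variance), implementing the E-step and M-step above; the initialization, convergence test, and names are illustrative.

```python
import numpy as np

def em_gaussian_means(x, k, sigma=1.0, n_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=k, replace=False)        # initial hypothesis h
    for _ in range(n_iter):
        # E-step: E[z_ij] is proportional to exp(-(x_i - mu_j)^2 / (2 sigma^2)).
        logits = -((x[:, None] - mu[None, :]) ** 2) / (2 * sigma ** 2)
        logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
        w = np.exp(logits)
        w /= w.sum(axis=1, keepdims=True)            # responsibilities E[z_ij]
        # M-step: each new mean is the weighted sample mean.
        new_mu = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)
        if np.max(np.abs(new_mu - mu)) < tol:
            break
        mu = new_mu
    return mu

# Toy usage: two univariate Gaussians with means -2 and 3.
x = np.concatenate([np.random.normal(-2, 1, 200), np.random.normal(3, 1, 200)])
print(em_gaussian_means(x, k=2))
```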

  11. Expectation-Maximization (EM): Formal setting
      - We are given a dataset made of an observed part $X$ and an unobserved part $Z$.
      - We wish to estimate the hypothesis maximizing the expected log-likelihood of the data, with the expectation taken over the unobserved data:
        $h^* = \mathrm{argmax}_h \; E_Z[\ln p(X, Z \mid h)]$
      - Problem: the unobserved data $Z$ must be treated as random variables governed by a distribution that depends on $X$ and $h$.

  12. Expectation-Maximization (EM): Generic algorithm
      1. Initialize the hypothesis $h$.
      2. Iterate until convergence:
         E-step: compute the expected likelihood of a hypothesis $h'$ for the full data, where the distribution of the unobserved data is modelled according to the current hypothesis $h$ and the observed data:
         $Q(h'; h) = E_Z[\ln p(X, Z \mid h') \mid h, X]$
         M-step: replace the current hypothesis with the one maximizing $Q(h'; h)$:
         $h \leftarrow \mathrm{argmax}_{h'} \, Q(h'; h)$
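A generic skeleton of this loop; e_step, m_step, and converged are hypothetical callbacks to be supplied for a concrete model (for instance, the Gaussian-means example of the surrounding slides).

```python
def expectation_maximization(X, h, e_step, m_step, converged, n_iter=100):
    for _ in range(n_iter):
        expected_Z = e_step(X, h)      # E-step: model the unobserved data under the current h
        h_new = m_step(X, expected_Z)  # M-step: hypothesis h' maximizing Q(h'; h)
        if converged(h, h_new):
            return h_new
        h = h_new
    return h
```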

  13. Example: estimating the means of k univariate Gaussians. Derivation
      The likelihood of an example is:
      $p(x_i, z_{i1}, \ldots, z_{ik} \mid h') = \frac{1}{\sqrt{2\pi}\sigma} \exp\left( -\sum_{j=1}^{k} z_{ij} \frac{(x_i - \mu'_j)^2}{2\sigma^2} \right)$
      The dataset log-likelihood is:
      $\ln p(X, Z \mid h') = \sum_{i=1}^{n} \left( \ln \frac{1}{\sqrt{2\pi}\sigma} - \sum_{j=1}^{k} z_{ij} \frac{(x_i - \mu'_j)^2}{2\sigma^2} \right)$

  14. Example: estimating the means of k univariate Gaussians. E-step
      The expected log-likelihood (using the linearity of the expectation operator):
      $E_Z[\ln p(X, Z \mid h')] = E_Z\left[ \sum_{i=1}^{n} \left( \ln \frac{1}{\sqrt{2\pi}\sigma} - \sum_{j=1}^{k} z_{ij} \frac{(x_i - \mu'_j)^2}{2\sigma^2} \right) \right] = \sum_{i=1}^{n} \left( \ln \frac{1}{\sqrt{2\pi}\sigma} - \sum_{j=1}^{k} E[z_{ij}] \frac{(x_i - \mu'_j)^2}{2\sigma^2} \right)$
      The expectation given the current hypothesis $h$ and the observed data $X$ is computed as:
      $E[z_{ij}] = \frac{p(x_i \mid \mu_j)}{\sum_{l=1}^{k} p(x_i \mid \mu_l)} = \frac{\exp\left(-\frac{1}{2\sigma^2}(x_i - \mu_j)^2\right)}{\sum_{l=1}^{k} \exp\left(-\frac{1}{2\sigma^2}(x_i - \mu_l)^2\right)}$

  15. Example: estimating the means of k univariate Gaussians. M-step
      Maximizing the expected likelihood gives:
      $\mathrm{argmax}_{h'} Q(h'; h) = \mathrm{argmax}_{h'} \sum_{i=1}^{n} \left( \ln \frac{1}{\sqrt{2\pi}\sigma} - \sum_{j=1}^{k} E[z_{ij}] \frac{(x_i - \mu'_j)^2}{2\sigma^2} \right) = \mathrm{argmin}_{h'} \sum_{i=1}^{n} \sum_{j=1}^{k} E[z_{ij}] (x_i - \mu'_j)^2$
      Zeroing the derivative with respect to each mean gives:
      $\frac{\partial}{\partial \mu'_j} \sum_{i=1}^{n} E[z_{ij}] (x_i - \mu'_j)^2 = -2 \sum_{i=1}^{n} E[z_{ij}] (x_i - \mu'_j) = 0$
      $\Rightarrow \quad \mu'_j = \frac{\sum_{i=1}^{n} E[z_{ij}]\, x_i}{\sum_{i=1}^{n} E[z_{ij}]}$

  16. How to choose the number of clusters? Elbow method: idea
      - Increasing the number of clusters allows for better modelling of the data.
      - The quality of the clusters needs to be traded off against their number.
      - Stop increasing the number of clusters when the advantage becomes limited.

  17. How to choose the number of clusters? Elbow method: approach
      1. Run the clustering algorithm for an increasing number of clusters.
      2. Plot a clustering evaluation metric (e.g. the sum of squared errors) for the different values of k.
      3. Choose k where there is an angle (an "elbow") in the plot, i.e. a drop in the gain.
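A sketch of the procedure, assuming scikit-learn and matplotlib are available; KMeans exposes the sum of squared errors as inertia_, while the toy data and the range of k are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Toy data: three well-separated blobs.
X = np.vstack([np.random.randn(100, 2) + c for c in ([0, 0], [6, 0], [0, 6])])

ks = range(1, 10)
sse = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_ for k in ks]

plt.plot(ks, sse, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("sum of squared errors")
plt.show()  # choose k at the 'elbow', where the drop in SSE levels off
```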

  18. How to choose the number of clusters? Elbow method: problem
      The elbow method can be ambiguous, with multiple candidate points (e.g. k = 2 and k = 4 in the figure).

  19. How to choose the number of clusters? Average silhouette method: idea
      - Increasing the number of clusters makes each cluster more homogeneous.
      - Increasing the number of clusters can also make different clusters more similar to each other.
      - Use a quality metric that trades off intra-cluster similarity against inter-cluster dissimilarity.

  20. How to choose the number of clusters? Silhouette coefficient for example i
      1. Compute the average dissimilarity between i and the examples of its own cluster C:
         $a_i = d(i, C) = \frac{1}{|C|} \sum_{j \in C} d(i, j)$
      2. Compute the average dissimilarity between i and the examples of each other cluster $C' \neq C$, and take the minimum:
         $b_i = \min_{C' \neq C} d(i, C')$
      3. The silhouette coefficient is:
         $s_i = \frac{b_i - a_i}{\max(a_i, b_i)}$
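A direct NumPy computation of the coefficient for a single example, following the definitions above (note it uses the slide's 1/|C| average, which includes d(i, i) = 0); names are illustrative.

```python
import numpy as np

def silhouette(i, X, assign):
    d = np.linalg.norm(X - X[i], axis=1)   # dissimilarities between i and every example
    a_i = d[assign == assign[i]].mean()    # average dissimilarity within i's own cluster C
    b_i = min(d[assign == c].mean()        # minimum average dissimilarity to another cluster
              for c in np.unique(assign) if c != assign[i])
    return (b_i - a_i) / max(a_i, b_i)
```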

  21. How to choose the number of clusters? Average silhouette method: approach
      1. Run the clustering algorithm for an increasing number of clusters.
      2. Plot the average (over the examples) silhouette coefficient for the different values of k.
      3. Choose k where the average silhouette coefficient is maximal.
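A sketch of the selection procedure, assuming scikit-learn; silhouette_score returns the silhouette coefficient averaged over all examples, and the toy data and range of k are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.vstack([np.random.randn(100, 2) + c for c in ([0, 0], [6, 0], [0, 6])])

scores = {}
for k in range(2, 10):                    # the silhouette needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)      # k with the maximal average silhouette
print(best_k, scores[best_k])
```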

  22. Hierarchical clustering: Setting
      - A clustering does not need to be flat.
      - The natural grouping of data is often hierarchical (e.g. biological taxonomies, topic taxonomies, etc.).
      - A hierarchy of clusters can be built on the examples:
        - Top-down approach: start from a single cluster containing all examples and recursively split clusters into subclusters.
        - Bottom-up approach: start with n clusters of individual examples (singletons) and recursively aggregate pairs of clusters.

  23. Dendrograms

  24. Agglomerative hierarchical clustering: Algorithm
      1. Initialize:
         - the final cluster number $k$ (e.g. $k = 1$)
         - the initial cluster number $\hat{k} = n$
         - the initial clusters $D_i = \{x_i\}$, $i = 1, \ldots, n$
      2. While $\hat{k} > k$:
         a. find the pairwise nearest clusters $D_i$, $D_j$
         b. merge $D_i$ and $D_j$
         c. update $\hat{k} = \hat{k} - 1$
      Note: the stopping criterion can also be a threshold on the pairwise similarity.
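A brute-force NumPy sketch of the bottom-up loop above, using the nearest-neighbour cluster distance; names and the exhaustive pairwise search are illustrative (real implementations use far more efficient data structures).

```python
import numpy as np

def agglomerative(X, k=1):
    clusters = [[i] for i in range(len(X))]          # start from n singleton clusters
    while len(clusters) > k:
        best, pair = np.inf, None
        # Find the pairwise nearest clusters (nearest-neighbour distance).
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a] += clusters[b]                   # merge the two nearest clusters
        del clusters[b]                              # and decrease the cluster count
    return clusters

# Toy usage: stop at 2 clusters.
X = np.vstack([np.random.randn(20, 2), np.random.randn(20, 2) + 5])
print(agglomerative(X, k=2))
```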

  25. Measuring cluster similarities
      Similarity measures:
      - Nearest-neighbour: $d_{\min}(D_i, D_j) = \min_{x \in D_i,\, x' \in D_j} \|x - x'\|$
      - Farthest-neighbour: $d_{\max}(D_i, D_j) = \max_{x \in D_i,\, x' \in D_j} \|x - x'\|$
      - Average distance: $d_{\mathrm{avg}}(D_i, D_j) = \frac{1}{n_i n_j} \sum_{x \in D_i} \sum_{x' \in D_j} \|x - x'\|$
      - Distance between means: $d_{\mathrm{mean}}(D_i, D_j) = \|\mu_i - \mu_j\|$
      $d_{\min}$ and $d_{\max}$ are more sensitive to outliers.
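The four cluster-distance measures written out as NumPy functions over lists or arrays of examples; names are illustrative, and any of them could be plugged into the agglomerative sketch above in place of its hard-coded nearest-neighbour distance.

```python
import numpy as np

def d_min(Di, Dj):
    # Nearest-neighbour (single linkage).
    return min(np.linalg.norm(x - y) for x in Di for y in Dj)

def d_max(Di, Dj):
    # Farthest-neighbour (complete linkage).
    return max(np.linalg.norm(x - y) for x in Di for y in Dj)

def d_avg(Di, Dj):
    # Average distance over all pairs (average linkage).
    return sum(np.linalg.norm(x - y) for x in Di for y in Dj) / (len(Di) * len(Dj))

def d_mean(Di, Dj):
    # Distance between the cluster means.
    return np.linalg.norm(np.mean(Di, axis=0) - np.mean(Dj, axis=0))
```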
