


  1. UNSUPERVISED MACHINE-LEARNING. Pr. Fabien MOUTARDE, Center for Robotics, MINES ParisTech, PSL Université Paris. Fabien.Moutarde@mines-paristech.fr, http://people.mines-paristech.fr/fabien.moutarde
     Machine-Learning TYPOLOGY

  2. Supervised vs UNsupervised learning.
     Learning is called "supervised" when there are "target" values for every example in the training dataset: examples = (input, output) pairs = (x1,y1), (x2,y2), …, (xn,yn). The goal is to build a (generally non-linear) approximate model for interpolation, in order to be able to GENERALIZE to input values other than those in the training set.
     "Unsupervised" = when there are NO target values: dataset = {x1, x2, …, xn}. The goal is typically either to do datamining (unveil structure in the distribution of examples in input space), or to find an output maximizing a given evaluation function.
     Examples of UNSUPERVISED Machine-Learning: datamining (clustering), generative learning (e.g. generated fake faces).

  3. UNSUPERVISED learning from data.
     Set of "input-only" (i.e. unlabeled) examples: X = {x1, x2, …, xn} (xi ∈ R^d, often with "large d"). The LEARNING ALGORITHM searches a family H of mathematical models [each h ∈ H gives y = h(x)] for an h ∈ H such that a criterion J(h,X) is verified or optimised; hyper-parameters control the training algorithm.
     Typical example: "clustering":
     • h(x) ∈ C = {1, 2, …, K} [each i corresponds to a "cluster"]
     • J(h,X): dist(xi,xj) smaller for xi,xj with h(xi) = h(xj) than for xi,xj with h(xi) ≠ h(xj) (a toy illustration of this criterion is sketched below)
     Clustering (in French: regroupement or partitionnement). Goal = identify structure in the data distribution: group together examples that are close/similar. Problem: groups are not always well-defined/delimited, can have arbitrary shape and fuzzy borders.
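
A minimal NumPy illustration of the clustering criterion J(h,X): for a "good" assignment h, pairwise distances within a cluster should be smaller than pairwise distances across clusters. The data and the candidate assignment below are made-up examples, not from the slides.

```python
import numpy as np

# Two synthetic blobs and a candidate clustering h(x_i) (illustrative only)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)),     # points of cluster 1
               rng.normal(2, 0.3, (20, 2))])    # points of cluster 2
h = np.array([0] * 20 + [1] * 20)               # candidate assignment h(x_i)

D = np.linalg.norm(X[:, None] - X[None, :], axis=2)       # all pairwise distances
same = (h[:, None] == h[None, :]) & ~np.eye(len(X), dtype=bool)
diff = h[:, None] != h[None, :]
print("mean intra-cluster distance:", D[same].mean())      # should be small
print("mean inter-cluster distance:", D[diff].mean())      # should be larger
```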

  4. Similarity and Distances.
     Similarity: the larger a similarity measure, the more similar the points are (≈ inverse of a distance). How to measure the distance d(x1; x2) between 2 points? (A short code sketch of these distances follows this slide.)
     • Euclidean distance: d²(x1;x2) = Σ_i (x1i − x2i)² = (x1−x2)·(x1−x2)^T [L2 norm]
     • Manhattan distance: d(x1;x2) = Σ_i |x1i − x2i| [L1 norm]
     • Sebestyen distance: d²(x1;x2) = (x1−x2)·W·(x1−x2)^T [with W a diagonal weight matrix]
     • Mahalanobis distance: d²(x1;x2) = (x1−x2)·C^-1·(x1−x2)^T [with C the covariance matrix]
     Typology of clustering techniques:
     • By agglomeration: Agglomerative Hierarchical Clustering, AHC [in French: Regroupement Hiérarchique Ascendant]
     • By partitioning: divisive (top-down) hierarchical partitioning; spectral partitioning (separation in the space of eigenvectors of the adjacency matrix); K-means
     • By modelling: Mixture of Gaussians (GMM); Self-Organizing (Kohonen) Maps, SOM [in French: Cartes de Kohonen]
     • Based on data density
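
A minimal NumPy sketch of the distances listed above. The dataset X and the diagonal weight matrix W are made-up examples, chosen only to show the formulas in use.

```python
import numpy as np

def euclidean2(x1, x2):
    """Squared Euclidean distance (L2 norm squared)."""
    d = x1 - x2
    return d @ d

def manhattan(x1, x2):
    """Manhattan distance (L1 norm)."""
    return np.sum(np.abs(x1 - x2))

def sebestyen2(x1, x2, W):
    """Squared Sebestyen distance, with W a diagonal weight matrix."""
    d = x1 - x2
    return d @ W @ d

def mahalanobis2(x1, x2, C):
    """Squared Mahalanobis distance, with C the covariance matrix of the data."""
    d = x1 - x2
    return d @ np.linalg.inv(C) @ d

# Toy usage (illustrative values only)
X = np.random.default_rng(0).normal(size=(100, 3))   # fake dataset
x1, x2 = X[0], X[1]
C = np.cov(X, rowvar=False)                          # covariance of the dataset
W = np.diag([1.0, 0.5, 2.0])                         # arbitrary per-feature weights
print(euclidean2(x1, x2), manhattan(x1, x2),
      sebestyen2(x1, x2, W), mahalanobis2(x1, x2, C))
```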

  5. Agglomerative Hierarchical Clustering (AHC).
     Principle: recursively, each point or cluster is absorbed by the nearest cluster.
     Algorithm:
     • Initialization: each example is a cluster with only one point; compute the matrix M of similarities for each pair of clusters.
     • Repeat: select in M the 2 most mutually similar clusters Ci and Cj; merge Ci and Cj into a more general cluster Cg; update the matrix M by computing the similarities between Cg and all pre-existing clusters. Until the fusion of the 2 last clusters.
     Distance between 2 clusters?
     • Min distance (between closest points): min( d(i,j), i ∈ C1 & j ∈ C2 )
     • Max distance: max( d(i,j), i ∈ C1 & j ∈ C2 )
     • Average distance: ( Σ_{i∈C1, j∈C2} d(i,j) ) / (card(C1) × card(C2))
     • Distance between the 2 centroids: d(b1; b2)
     • Ward distance: sqrt( n1·n2 / (n1+n2) ) × d(b1; b2) [where ni = card(Ci)]
     Each type of inter-cluster distance gives a specific variant of AHC: min distance (nearest neighbour) gives single-linkage; max distance gives complete-linkage. (A SciPy-based sketch follows this slide.)
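
One possible way to run AHC in practice, sketched with SciPy's hierarchical-clustering routines; the synthetic two-blob dataset and the choice of Ward linkage are assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

# Synthetic 2-D dataset: two loose blobs (illustrative only)
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.5, size=(30, 2)),
               rng.normal(3, 0.5, size=(30, 2))])

# 'single' = min distance, 'complete' = max distance,
# 'average' = average distance, 'ward' = Ward distance
Z = linkage(X, method="ward")

# Cut the hierarchy to obtain a flat clustering with 2 clusters
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)

# The dendrogram of the full hierarchy can be drawn with matplotlib:
# import matplotlib.pyplot as plt; dendrogram(Z); plt.show()
```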

  6. AHC output = dendrogram.
     • The dendrogram is a representation of the full hierarchy of successively grouped clusters.
     • The height from a cluster to its sub-clusters ≈ distance between the 2 merged clusters.
     Clustering by partitioning: the K-means algorithm.
     • Each cluster Ck is defined by its "centroid" ck, which is a "prototype" (a vector template in input space).
     • Each training example x is "assigned" to the cluster Ck(x) whose centroid is nearest to x: k(x) = ArgMin_k( dist(x, ck) ).
     ALGORITHM:
     • Initialization = randomly choose K distinct points c1, …, cK among the training examples {x1, …, xn}.
     • REPEAT until (approximate) stabilization of all ck:
       – assign each xi to the cluster Ck(i) with minimal dist(xi, ck(i));
       – recompute the centroids of the clusters: ck = ( Σ_{x∈Ck} x ) / card(Ck).
     [This minimizes D = Σ_{k=1..K} Σ_{x∈Ck} dist²(ck, x)]
     (A NumPy sketch of K-means is given after this slide.)
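
A minimal NumPy sketch of the K-means loop described above (assignment step, then centroid update, repeated until stabilization). The number of iterations, the seed and the toy data are illustrative assumptions.

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Plain K-means as described on the slide.
    X : (n, d) array of examples; K : number of clusters.
    Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialization: K distinct training examples used as centroids
    centroids = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each x_i to the cluster whose centroid is nearest
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                  else centroids[k] for k in range(K)])
        if np.allclose(new_centroids, centroids):   # stabilization of all c_k
            break
        centroids = new_centroids
    return centroids, labels

# Toy usage on synthetic data (illustrative only)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.4, (50, 2)), rng.normal(3, 0.4, (50, 2))])
centroids, labels = kmeans(X, K=2)
print(centroids)
```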

  7. Other partitioning method: SPECTRAL clustering.
     Principle = use the adjacency graph: nodes = input examples, edge values = similarities (in [0;1], so 1 ↔ same point). Graph-partitioning algorithms (min-cut, etc.) then allow to recursively split the graph into several connected components. Example: build the Minimal Spanning Tree, then remove edges in increasing order of similarity → single-linkage clusters.
     Spectral clustering algorithm:
     • Compute the Laplacian matrix L = D − A of the adjacency graph, with affinities A_ij = exp( −||x_i − x_j||² / 2σ² ) and D the diagonal degree matrix. For the 6-node example graph of the slide:
            x1    x2    x3    x4    x5    x6
        x1  1.5  -0.8  -0.6   0.0  -0.1   0.0
        x2 -0.8   1.6  -0.8   0.0   0.0   0.0
        x3 -0.6  -0.8   1.6  -0.2   0.0   0.0
        x4  0.0   0.0  -0.2   1.1  -0.4  -0.5
        x5 -0.1   0.0   0.0  -0.4   1.4  -0.9
        x6  0.0   0.0   0.0  -0.5  -0.9   1.4
     • L is symmetric, so it has real, non-negative eigenvalues and orthogonal eigenvectors.
     • Compute and sort the eigenvalues, then project the examples xi ∈ R^d onto the k eigenvectors associated with the smallest eigenvalues of L → new input space si ∈ R^k, in which the separation into clusters is easier.
     [Figure: the example points plotted in the original 2-D space and after projection onto the leading eigenvectors, where the clusters separate clearly.]
     (A NumPy sketch of this procedure follows this slide.)
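
A minimal NumPy sketch of unnormalized spectral clustering along the lines of the slide: Gaussian affinities, Laplacian L = D − A, embedding on the eigenvectors of the smallest eigenvalues, then a plain K-means in the embedded space. The value of sigma, the simple K-means loop and the toy data are assumptions made for illustration.

```python
import numpy as np

def spectral_clustering(X, k, sigma=1.0, seed=0):
    """Unnormalized spectral clustering sketch."""
    # Affinity matrix A_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    A = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Graph Laplacian L = D - A
    D = np.diag(A.sum(axis=1))
    L = D - A
    # L symmetric -> real eigenvalues, orthogonal eigenvectors (ascending order)
    eigvals, eigvecs = np.linalg.eigh(L)
    S = eigvecs[:, :k]                         # embedding s_i in R^k
    # Plain K-means in the embedded space (could reuse the kmeans() sketch above)
    rng = np.random.default_rng(seed)
    centroids = S[rng.choice(len(S), k, replace=False)]
    for _ in range(100):
        labels = np.linalg.norm(S[:, None] - centroids[None], axis=2).argmin(axis=1)
        new_c = np.array([S[labels == j].mean(axis=0) if np.any(labels == j)
                          else centroids[j] for j in range(k)])
        if np.allclose(new_c, centroids):
            break
        centroids = new_c
    return labels

# Toy usage: two well-separated blobs (illustrative only)
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(3, 0.3, (40, 2))])
print(spectral_clustering(X, k=2))
```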

  8. Other UNsupervised algorithms.
     • Learn the PROBABILITY DISTRIBUTION: Restricted Boltzmann Machine (RBM), etc.
     • Learn a kind of "PROJECTION" into a LOWER-DIMENSION SPACE ("Manifold Learning"): non-linear Principal Component Analysis (e.g. kernel PCA), auto-encoders, Kohonen topological Self-Organizing Maps (SOM), …
     Restricted Boltzmann Machine:
     • Proposed by Smolensky (1986) + Hinton (2005).
     • Learns the probability distribution of the examples.
     • Two-layer neural network with BINARY neurons and bidirectional connections.
     • Use: P(v,h) = e^(−E(v,h)) / Z, where E(v,h) = −a^T v − b^T h − v^T W h is the energy.
     • Training: maximize the product of probabilities Π_i P(v_i) by gradient descent (on the negative log-likelihood) with Contrastive Divergence: v′ = reconstruction from h, and h′ deduced from v′.
     (A Contrastive-Divergence sketch follows this slide.)
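
A minimal NumPy sketch of one Contrastive-Divergence (CD-1) update for a binary RBM, following the standard formulation rather than the exact notation of the slide; the learning rate, layer sizes and random binary data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v, W, a, b, lr=0.1):
    """One Contrastive-Divergence (CD-1) step for a binary RBM.
    v : (n_visible,) binary example; W : (n_visible, n_hidden) weights;
    a, b : visible / hidden biases."""
    # Positive phase: sample h from P(h = 1 | v)
    ph = sigmoid(b + v @ W)
    h = (rng.random(ph.shape) < ph).astype(float)
    # Negative phase: v' = reconstruction from h, h' deduced from v'
    pv_prime = sigmoid(a + h @ W.T)
    v_prime = (rng.random(pv_prime.shape) < pv_prime).astype(float)
    ph_prime = sigmoid(b + v_prime @ W)
    # Approximate log-likelihood gradient (positive minus negative statistics)
    W += lr * (np.outer(v, ph) - np.outer(v_prime, ph_prime))
    a += lr * (v - v_prime)
    b += lr * (ph - ph_prime)
    return W, a, b

# Toy usage: 6 visible units, 3 hidden units, random binary data (illustrative)
n_v, n_h = 6, 3
W = 0.01 * rng.normal(size=(n_v, n_h))
a, b = np.zeros(n_v), np.zeros(n_h)
data = (rng.random((100, n_v)) < 0.5).astype(float)
for epoch in range(20):
    for v in data:
        W, a, b = cd1_update(v, W, a, b)
```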

  9. Kohonen Self-Organizing Maps (SOM).
     Another specific type of neural network: a map of OUTPUT neurons connected to the inputs X1, X2, …, Xn, with a self-organizing training algorithm which generates a MAPPING from input space to the map THAT RESPECTS THE TOPOLOGY OF THE DATA. (A sketch of the training step is given below.)
     Inspiration and use of SOM:
     • Biological inspiration: self-organization of regions in the perception parts of the brain.
     • Use in DATA ANALYSIS: VISUALIZE (generally in 2D) the distribution of the data with a topology-preserving "projection" (2 points close in input space should be projected onto close cells of the SOM); CLUSTERING.
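
The slide only states the principle, so the following is a minimal sketch of the classical online SOM update (best-matching unit plus a Gaussian neighbourhood on the grid); the grid size, learning-rate and radius schedules, and the toy 3-D data are arbitrary assumptions.

```python
import numpy as np

def train_som(X, grid=(8, 8), n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Minimal online Kohonen SOM.
    X : (n, d) data; grid : shape of the 2-D map of output neurons.
    Returns the (rows, cols, d) array of prototype vectors."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    d = X.shape[1]
    # One prototype (weight vector in input space) per map cell
    W = rng.normal(size=(rows, cols, d))
    # Grid coordinates of every cell, used for the neighbourhood function
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    for t in range(n_iter):
        x = X[rng.integers(len(X))]
        # Best-matching unit = cell whose prototype is nearest to x
        dists = np.linalg.norm(W - x, axis=2)
        bmu = np.unravel_index(dists.argmin(), dists.shape)
        # Learning rate and neighbourhood radius decay over time
        lr = lr0 * np.exp(-t / n_iter)
        sigma = sigma0 * np.exp(-t / n_iter)
        # Cells close to the BMU on the grid are pulled more strongly toward x,
        # which is what preserves the topology of the data on the map
        grid_d2 = np.sum((coords - np.array(bmu)) ** 2, axis=2)
        h = np.exp(-grid_d2 / (2 * sigma ** 2))
        W += lr * h[..., None] * (x - W)
    return W

# Toy usage: map a 3-D point cloud onto an 8x8 grid (illustrative only)
X = np.random.default_rng(1).random((500, 3))
prototypes = train_som(X)
print(prototypes.shape)   # (8, 8, 3)
```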
