Clustering with k-means and Gaussian mixture distributions Machine - PowerPoint PPT Presentation

Clustering with k-means and Gaussian mixture distributions Machine Learning and Category Representation 2012-2013 Jakob Verbeek, November 23, 2012 Course website: http://lear.inrialpes.fr/~verbeek/MLCR.12.13

Objectives of visual recognition  Image classification: predict presence of objects in the image Car: present Cow: present Bike: not present Horse: not present …  Object localization: define the location and the category Category label + location Car Cow

Difficulties: appearance variation of same object  Variability in appearance of the same object: Viewpoint and illumination, ► occlusions, ► articulation of deformable objects ► ... ►

Difficulties: within-class variations

Visual category recognition  Robust image description Appropriate descriptors for objects and categories ► Local descriptors to be robust against occlusions ►  Machine learning techniques to learn models from examples  scene types (city, beach, mountains,...) : images  object categories (car, cat, person, ...) : cropped objects  human actions (run, sit-down, open-door, ...): video clips

Why machine learning?  Early approaches: simple features + handcrafted models  Can handle only few images, simple tasks L. G. Roberts, Ph.D. thesis Machine Perception of Three Dimensional Solids, MIT Department of Electrical Engineering, 1963.

Why machine learning?  Early approaches: manual programming of rules  Tedious, limited and does not take into account the data Y. Ohta, T. Kanade, and T. Sakai, “ An Analysis System for Scenes Containing objects with Substructures,” International Joint Conference on Pattern Recognition , 1978.

Bag-of-features image classification Excellent results in the presence of  background clutter, ► occlusion, ► lighting variations, ► viewpoint changes ► bikes books building cars people phones trees

Bag-of-features image classification in a nutshell 1) Extract local image regions For example using interest point detectors ► 2) Compute descriptors of these regions For example SIFT descriptors ► 3) Aggregate the local descriptors into global image representation This is where clustering techniques come in ► 4) Classification of the image based on this representation SVM or other classifier ►

Bag-of-features image classification in a nutshell 1) Extract local image regions For example using interest point detectors ► 2) Compute descriptors of these regions For example SIFT descriptors ► 3) Aggregate the local descriptors into bag-of-word histogram Map each local descriptor to one of K clusters (a.k.a. “visual words”) ► Use histogram of word counts to represent image ► Frequency in image Visual word index …..

Example visual words found by clustering Airplanes Motorbikes Faces Wild Cats Leafs People Bikes

Clustering  Finding a group structure in the data – Data in one cluster similar to each other – Data in different clusters dissimilar  Map each data point to a discrete cluster index – “flat” methods find K groups – “hierarchical” methods define a tree structure over the data

Hierarchical Clustering  Data set is organized into a tree structure  Top-down construction – Start all data in one cluster: root node – Apply “flat” clustering into k groups – Recursively cluster the data in each group  Bottom-up construction – Start with all points in separate cluster – Recursively merge “closest” clusters – Distance between clusters A and B • E.g. min, max, or mean distance between x in A, and y in B

Clustering descriptors into visual words  Offline clustering : Find groups of similar local descriptors Using many descriptors from many training images ►  Encoding a new image: – Detect local regions – Compute local descriptors – Count descriptors in each cluster [5, 2, 3] [3, 6, 1]

Definition of k-means clustering  Given: data set of N points x n , n=1,…,N  Goal: find K cluster centers m k , k=1,…,K that minimize the squared distance to nearest cluster centers K )= ∑ n = 1 N 2 E ({ m k } k = 1 min k ∈{ 1,... ,K } ∥ x n − m k ∥  Clustering = assignment of data points to nearest cluster center – Indicator variables r nk =1 if x n assgined to x n , r nk =0 otherwise  For fixed cluster centers , error criterion equals sum of squared distances between each data point and assigned cluster center N ∑ k = 1 K )= ∑ n = 1 K 2 E ({ m k } k = 1 r nk ∥ x n − m k ∥

Examples of k-means clustering  Data uniformly sampled in unit square  k-means with 5, 10, 15, and 25 centers

Minimizing the error function Goal find centers m k to minimize the error function • K )= ∑ n = 1 N 2 E ({ m k } k = 1 min k ∈{ 1,... ,K } ∥ x n − m k ∥ • Any set of assignments , not only the best assignment, gives an upper-bound on the error: N ∑ k = 1 K )= ∑ n = 1 K r nk ∥ x n − m k ∥ 2 F ({ m k } k = 1 • The iterative k-means algorithm minimizes this bound 1) Initialize cluster centers, eg. on randomly selected data points 2) Update assignments r nk for fixed centers m k 3) Update centers m k for fixed data assignments r nk 4) If cluster centers changed: return to step 2 5) Return cluster centers

Minimizing the error bound N ∑ k = 1 K )= ∑ n = 1 K r nk ∥ x n − m k ∥ 2 F ({ m k } k = 1 ∑ k r nk ∥ x n − m k ∥ ∑ k r nk ∥ x n − m k ∥ 2 2 • Update assignments r nk for fixed centers m k • Decouples over the data points • Constraint: exactly one r nk =1, rest zero • Solution: assign to closest center • Update centers m k for fixed assignments r nk • Decouples over the centers ∑ n r nk ∥ x n − m k ∥ 2 • Set derivative to zero • Put center at mean of assigned data points ∂ F = 2 ∑ n r nk ( x n − m k )= 0 ∂ m k m k = ∑ n r nk x n ∑ n r nk

Examples of k-means clustering  Several k-means iterations with two centers Error function

Minimizing the error function K )= ∑ n = 1 N 2 E ({ m k } k = 1 min k ∈{ 1,... ,K } ∥ x n − m k ∥ Goal find centers m k to minimize the error function • – Proceeded by iteratively minimizing the error bound N ∑ k = 1 K )= ∑ n = 1 K r nk ∥ x n − m k ∥ 2 F ({ m k } k = 1 • K-means iterations monotonically decrease error function since – Both steps reduce the error bound – Error bound matches true error after update of the assignments Bound #1 Bound #2 True error Error Placement of centers

Problems with k-means clustering  Solution depends heavily on initialization Several runs from different initializations ►

Problems with k-means clustering  Assignment of data to clusters is only based on the distance to center – No representation of the shape of the cluster – Implicitly assumes spherical shape of clusters

Clustering with Gaussian mixture density  Each cluster represented by Gaussian density – Parameters: center m, covariance matrix C – Covariance matrix encodes spread around center, can be interpreted as defining a non-isotropic distance around center Two Gaussians in 1 dimension A Gaussian in 2 dimensions

Clustering with Gaussian mixture density  Each cluster represented by Gaussian density – Parameters: center m, covariance matrix C – Covariance matrix encodes spread around center, can be interpreted as defining a non-isotropic distance around center Definition of Gaussian density in d dimensions  − 1 / 2 exp ( − 1 − 1 ( x − m ) ) T C − d / 2 ∣ C ∣ N ( x ∣ m,C )=( 2 π) 2 ( x − m ) Determinant of Quadratic function of covariance matrix C point x and mean m Mahanalobis distance

Mixture of Gaussian (MoG) density  Mixture density is weighted sum of Gaussian densities – Mixing weight: importance of each cluster K p ( x )= ∑ k = 1 π k N ( x ∣ m k , C k ) π k ≥ 0  Density has to integrate to 1, so we require K ∑ k = 1 π k = 1 Mixture in 2 dimensions Mixture in 1 dimension

Clustering with Gaussian mixture density  Given: data set of N points x n , n=1,…,N  Find mixture of Gaussians (MoG) that best explains data Maximize log-likelihood of fixed data set w.r.t. parameters of MoG ► Assume data points are drawn independently from MoG ► N N K L (θ)= ∑ n = 1 log p ( x n )= ∑ n = 1 log ∑ k = 1 π k N ( x n ∣ m k ,C k ) K θ={π k ,m k ,C k } k = 1  MoG learning very similar to k-means clustering – Also an iterative algorithm to find parameters – Also sensitive to initialization of paramters

Assignment of data points to clusters  As with k-means z n indicates cluster index for x n  To sample data point from MoG p ( z = k )=π k – Select cluster with probability given by mixing weight p ( x ∣ z = k )= N ( x ∣ m k ,C k ) – Sample point from the k-th Gaussian – MoG recovered if we marginalize over the unknown cluster index p ( x )= ∑ k p ( z = k ) p ( x ∣ z = k )= ∑ k π k N ( x ∣ m k ,C k ) Color coded model and data of each cluster Mixture model and data from it

Soft assignment of data points to clusters  Given data point x, infer cluster index z p ( z = k ∣ x )= p ( z = k , x ) p ( x ) π k N ( x ∣ m k ,C k ) p ( z = k ) p ( x ∣ z = k ) = ∑ k p ( z = k ) p ( x ∣ z = k )= ∑ k π k N ( x ∣ m k ,C k ) Color-coded MoG model Data soft-assignments

Clustering with k-means and Gaussian mixture distributions Machine - PowerPoint PPT Presentation

Clustering with k-means and Gaussian mixture distributions Machine Learning and Category Representation 2012-2013 Jakob Verbeek, November 23, 2012 Course website: http://lear.inrialpes.fr/~verbeek/MLCR.12.13 Objectives of visual recognition

Expectation Maximization Greg Mori - CMPT 419/726 Bishop PRML Ch. 9 K-Means Gaussian Mixture

Clustering with k-means and Gaussian mixture distributions Machine Learning and Object Recognition

Clustering with k-means and Gaussian mixture distributions Machine Learning and Category

Lecture 20 Lecture 20 Nov 12 th 2008 Clustering with Mixture of Gaussians Clustering with Mixture

Clustering and Dimensionality Reduction Preview Clustering K -means clustering

Introduction to Machine Learning, Clustering and EM Barnab s P czos Contents Clustering

Graph Clustering Graph Clustering What is clustering? What is clustering? Finding patterns

Bernoulli Mixture Models Victor Medina Researcher at SBIF DataCamp Mixture Models in R The

Structure of mixture models Victor Medina Researcher at SBIF DataCamp Mixture Models in R

K-MEANS++ OPTIMAL INITIALIZATION ALGORITHM An Improved K-means Clustering Method OVERVIEW

k -means clustering Method to automatically separate data sets into distinct groups. Clustering

Contents Clustering K-means Mixture of Gaussians Expectation Maximization

Gaussian Filter The Gaussian filter 1 2 1 A Gaussian kernel gives less 1 2 4 2 weight to

Subspace Clustering Ensemble Clustering Subspace Clustering, Ensemble Clustering, Alternative

Multi-variable Optimization K-means clustering K-means clustering on points is finding K

Deep Gaussian Mixture Models Cinzia Viroli (University of Bologna, Italy) joint with Geoff

Practical Bioinformatics Mark Voorhies 5/26/2015 Mark Voorhies Practical Bioinformatics Habits

Statistical Learning Marco Chiarandini Deptartment of Mathematics & Computer Science

LP Decoding of Regular LDPC Codes in Memoryless Channels Nissim Halabi Guy Even ISIT 2010 1

Maximum likelihood threshold of a graph Elizabeth Gross San Jos e State University Joint work

Max Likelihood for Log-Linear Models Daphne Koller Log-Likelihood for Markov Nets A B C

CSCE 478/878 Lecture 7: Bayesian Learning Stephen D. Scott (Adapted from Tom Mitchells slides)

Data Warehousing and Machine Learning Probabilistic Classifiers Thomas D. Nielsen Aalborg

Foundations of AI Why learning works 1 6 . Statistical Machine Learning Bayesian Learning and

Clustering with k-means and Gaussian mixture distributions Machine - PowerPoint PPT Presentation

Clustering with k-means and Gaussian mixture distributions Machine Learning and Category Representation 2012-2013 Jakob Verbeek, November 23, 2012 Course website: http://lear.inrialpes.fr/~verbeek/MLCR.12.13 Objectives of visual recognition

Expectation Maximization Greg Mori - CMPT 419/726 Bishop PRML Ch. 9 K-Means Gaussian Mixture

Clustering with k-means and Gaussian mixture distributions Machine Learning and Object Recognition

Clustering with k-means and Gaussian mixture distributions Machine Learning and Category

Lecture 20 Lecture 20 Nov 12 th 2008 Clustering with Mixture of Gaussians Clustering with Mixture

Clustering and Dimensionality Reduction Preview Clustering K -means clustering

Introduction to Machine Learning, Clustering and EM Barnab s P czos Contents Clustering

Graph Clustering Graph Clustering What is clustering? What is clustering? Finding patterns

Bernoulli Mixture Models Victor Medina Researcher at SBIF DataCamp Mixture Models in R The

Structure of mixture models Victor Medina Researcher at SBIF DataCamp Mixture Models in R

K-MEANS++ OPTIMAL INITIALIZATION ALGORITHM An Improved K-means Clustering Method OVERVIEW

k -means clustering Method to automatically separate data sets into distinct groups. Clustering

Contents Clustering K-means Mixture of Gaussians Expectation Maximization

Gaussian Filter The Gaussian filter 1 2 1 A Gaussian kernel gives less 1 2 4 2 weight to

Subspace Clustering Ensemble Clustering Subspace Clustering, Ensemble Clustering, Alternative

Multi-variable Optimization K-means clustering K-means clustering on points is finding K

Deep Gaussian Mixture Models Cinzia Viroli (University of Bologna, Italy) joint with Geoff

Practical Bioinformatics Mark Voorhies 5/26/2015 Mark Voorhies Practical Bioinformatics Habits

Statistical Learning Marco Chiarandini Deptartment of Mathematics &amp; Computer Science

LP Decoding of Regular LDPC Codes in Memoryless Channels Nissim Halabi Guy Even ISIT 2010 1

Maximum likelihood threshold of a graph Elizabeth Gross San Jos e State University Joint work

Max Likelihood for Log-Linear Models Daphne Koller Log-Likelihood for Markov Nets A B C

CSCE 478/878 Lecture 7: Bayesian Learning Stephen D. Scott (Adapted from Tom Mitchells slides)

Data Warehousing and Machine Learning Probabilistic Classifiers Thomas D. Nielsen Aalborg

Foundations of AI Why learning works 1 6 . Statistical Machine Learning Bayesian Learning and

Statistical Learning Marco Chiarandini Deptartment of Mathematics & Computer Science