K-Means + GMMs

10-601 Introduction to Machine Learning
Machine Learning Department, School of Computer Science, Carnegie Mellon University
K-Means + GMMs
Matt Gormley
Lecture 16
March 20, 2017
Clustering Readings: Murphy 25.5; Bishop 12.1, 12.3; HTF 14.3.0; Mitchell --
EM, GMM Readings: Murphy 11.4.1, 11.4.2, 11.4.4; Bishop 9; HTF 8.5 - 8.5.3; Mitchell 6.12 - 6.12.2

Reminders
• Homework 5: Readings / Application of ML
  – Release: Wed, Mar. 08
  – Due: Wed, Mar. 22 at 11:59pm

Peer Tutoring
[Figure: Tutor, Tutee]

K-MEANS

K-Means Outline
• Clustering: Motivation / Applications
• Optimization Background
  – Coordinate Descent
  – Block Coordinate Descent
• Clustering
  – Inputs and Outputs
  – Objective-based Clustering
• K-Means
  – K-Means Objective
  – Computational Complexity
  – K-Means Algorithm / Lloyd’s Method
• K-Means Initialization
  – Random
  – Farthest Point
  – K-Means++
(Slide annotations: the earlier items are marked "Last Lecture", the later items "This Lecture".)

Clustering, Informal Goals
Goal: Automatically partition unlabeled data into groups of similar datapoints.
Question: When and why would we want to do this?
Useful for:
• Automatically organizing data.
• Understanding hidden structure in data.
• Preprocessing for further analysis.
• Representing high-dimensional data in a low-dimensional space (e.g., for visualization purposes).
Slide courtesy of Nina Balcan

Applications (Clustering comes up everywhere…)
• Cluster news articles or web pages or search results by topic.
• Cluster protein sequences by function or genes according to expression profile.
• Cluster users of social networks by interest (community detection).
[Figures: Twitter network, Facebook network]
Slide courtesy of Nina Balcan

Applications (Clustering comes up everywhere…)
• Cluster customers according to purchase history.
• Cluster galaxies or nearby stars (e.g., Sloan Digital Sky Survey).
• And many, many more applications…
Slide courtesy of Nina Balcan

Optimization Background
Whiteboard:
– Coordinate Descent
– Block Coordinate Descent

Clustering
Question: Which of these partitions is “better”?

Clustering
Whiteboard:
– Inputs and Outputs
– Objective-based Clustering

K-Means
Whiteboard:
– K-Means Objective
– Computational Complexity
– K-Means Algorithm / Lloyd’s Method
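The whiteboard content for this slide is not part of the extracted text. As a reference point only, the standard textbook form of the k-means objective is sketched below, written with the symbols the later slides use (datapoints $x_1, \dots, x_n$, centers $c_1, \dots, c_k$). This is an assumption about what was derived in lecture, not a transcription of it.

```latex
\min_{c_1, \dots, c_k} \; \sum_{i=1}^{n} \; \min_{j \in \{1, \dots, k\}} \lVert x_i - c_j \rVert^2
```

With the centers held fixed, the inner minimum is attained by assigning each $x_i$ to its nearest center. With the assignment held fixed, the best $c_j$ is the mean of the points assigned to it. Lloyd’s method alternates these two steps, which is the block coordinate descent view listed in the outline.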

K-Means Initialization
Whiteboard:
– Random
– Furthest Traversal
– K-Means++

Lloyd’s method: Random Initialization
Example: Given a set of datapoints
• Select initial centers at random
• Assign each point to its nearest center
• Recompute optimal centers given a fixed clustering
Get a good quality solution in this example.
Slide courtesy of Nina Balcan
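The steps in this example can be sketched in a few lines of plain Python. This is a minimal illustration, not the lecture's code. It assumes that the initial centers are sampled from the datapoints, that an empty cluster keeps its old center, and that iteration stops when the centers no longer move. The function names are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Assignment step: index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points]


def update(points, labels, centers):
    """Update step: move each center to the mean of the points assigned to it."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            # Assumption: an empty cluster keeps its previous center.
            new_centers.append(old)
            continue
        new_centers.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
    return new_centers


def lloyd(points, k, seed=0, max_rounds=100):
    """Lloyd's method with random initialization; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k distinct datapoints as initial centers
    for _ in range(max_rounds):
        labels = assign(points, centers)
        new_centers = update(points, labels, centers)
        if new_centers == centers:  # nothing moved, so we are at a fixed point
            break
        centers = new_centers
    return centers, assign(points, centers)


# Two small, well-separated groups of 2-D points (made-up data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = lloyd(points, k=2)
print(centers, labels)
```

Each round costs one distance computation per (point, center) pair, which is where the O(nkd) per-round cost comes from.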

Lloyd’s method: Performance
It always converges, but it may converge to a local optimum that is different from the global optimum, and in fact could be arbitrarily worse in terms of its score.
Slide courtesy of Nina Balcan

Lloyd’s method: Performance
Local optimum: every point is assigned to its nearest center and every center is the mean value of its points.
Slide courtesy of Nina Balcan

Lloyd’s method: Performance
It is arbitrarily worse than the optimum solution.
Slide courtesy of Nina Balcan
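One way to see "arbitrarily worse" concretely is a four-point rectangle, a standard illustration and not necessarily the one drawn on the slide. Take the corners of a `width` by 1 rectangle with k = 2. Centers that split top from bottom form a local optimum with cost `width**2`, while the left/right split costs 1, so the ratio grows without bound as the rectangle gets wider. The sketch below checks both claims numerically. All names are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points, centers):
    """Sum over points of the squared distance to the nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


def lloyd_round(points, centers):
    """One assignment step followed by one update step."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
        clusters[nearest].append(p)
    # An empty cluster keeps its old center (assumption).
    return [tuple(sum(coord) / len(members) for coord in zip(*members)) if members else c
            for members, c in zip(clusters, centers)]


def rectangle_example(width):
    """Corners of a width-by-1 rectangle plus two candidate center sets."""
    points = [(0.0, 0.0), (0.0, 1.0), (width, 0.0), (width, 1.0)]
    stuck = [(width / 2, 0.0), (width / 2, 1.0)]  # splits top from bottom
    good = [(0.0, 0.5), (width, 0.5)]             # splits left from right
    return points, stuck, good


points, stuck, good = rectangle_example(10.0)
print(lloyd_round(points, stuck) == stuck)  # True when `stuck` is a fixed point
print(kmeans_cost(points, stuck), kmeans_cost(points, good))
```

Because a Lloyd round leaves `stuck` unchanged, the algorithm never escapes it once it lands there.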

Lloyd’s method: Performance
This bad performance can happen even with well-separated Gaussian clusters.
Some Gaussians are combined…
Slide courtesy of Nina Balcan

Lloyd’s method: Performance
• If we do random initialization, as k increases, it becomes more likely we won’t have perfectly picked one center per Gaussian in our initialization (so Lloyd’s method will output a bad solution).
• For k equal-sized Gaussians, Pr[each initial center is in a different Gaussian] $\approx \frac{k!}{k^k} \approx \frac{1}{e^k}$
• Becomes unlikely as k gets large.
Slide courtesy of Nina Balcan
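The count behind this probability: if each of the k initial centers lands in one of k equal-sized Gaussians independently and uniformly, there are $k^k$ equally likely outcomes and $k!$ of them put one center in each Gaussian. By Stirling's approximation, $k!/k^k \approx \sqrt{2\pi k}\, e^{-k}$, so the slide's $1/e^k$ is the same exponential decay with the polynomial factor dropped. A short sketch to compare the two numerically (the function name is hypothetical):

```python
import math


def prob_one_center_per_gaussian(k):
    """k! / k**k: chance that k uniform draws over k equal clusters hit each cluster once."""
    return math.factorial(k) / k ** k


for k in (2, 5, 10, 20):
    # Columns: k, exact ratio k!/k^k, the slide's approximation e^-k.
    print(k, prob_one_center_per_gaussian(k), math.exp(-k))
```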

Another Initialization Idea: Furthest Point Heuristic
Choose $c_1$ arbitrarily (or at random).
• For $j = 2, \dots, k$:
  – Pick $c_j$ among datapoints $x_1, x_2, \dots, x_n$ that is farthest from previously chosen $c_1, c_2, \dots, c_{j-1}$
Fixes the Gaussian problem. But it can be thrown off by outliers…
Slide courtesy of Nina Balcan
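A minimal sketch of the heuristic, reading "farthest from previously chosen centers" as "largest distance to the nearest already-chosen center" (the usual reading, stated here as an assumption). The data and function names are made up for illustration. The single far-away point shows the outlier sensitivity the slide mentions.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_point_init(points, k, seed=0):
    """Pick c_1 at random, then repeatedly take the point farthest from the chosen centers."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Score each point by its distance to the nearest chosen center; take the maximum.
        next_center = max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        centers.append(next_center)
    return centers


# Two tight groups plus one outlier at (100, 100).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (100.0, 100.0)]
print(farthest_point_init(points, k=2))
```

With k = 2 on this data the outlier always ends up as a center, whichever point is drawn first, so one of the two real groups is left without a center of its own.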

Furthest point heuristic does well on previous example
Slide courtesy of Nina Balcan

Furthest point initialization heuristic sensitive to outliers
Assume k = 3
[Figure: points labeled (0,1), (-2,0), (3,0), (0,-1)]
Slide courtesy of Nina Balcan

K-means++ Initialization: $D^2$ sampling [AV07]
• Interpolate between random and furthest point initialization.
• Let $D(x)$ be the distance between a point $x$ and its nearest center. Choose the next center with probability proportional to $D^2(x)$.
• Choose $c_1$ at random.
• For $j = 2, \dots, k$:
  – Pick $c_j$ among $x_1, x_2, \dots, x_n$ according to the distribution
    $\Pr(c_j = x_i) \propto \min_{j' < j} \lVert x_i - c_{j'} \rVert^2 = D^2(x_i)$
Theorem: K-means++ always attains an O(log k) approximation to the optimal k-means solution in expectation.
Running Lloyd’s can only further improve the cost.
Slide courtesy of Nina Balcan
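The seeding rule can be sketched with the standard library alone. This is an illustrative sketch, not a reference implementation. It assumes the data contain at least k distinct points (otherwise all weights become zero), and the function names are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points, k, seed=0):
    """D^2 sampling: each new center is drawn with probability proportional to D^2(x_i)."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]  # c_1 uniformly at random
    for _ in range(1, k):
        # D^2(x_i): squared distance from x_i to its nearest already-chosen center.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        # random.choices draws index i with probability weights[i] / sum(weights).
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (100.0, 100.0)]
print(kmeans_pp_init(points, k=3))
```

A point that is already a center has weight zero and cannot be drawn again. Far-away points are favored, but unlike the furthest point heuristic a single outlier is not guaranteed to be picked.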

K-means++ Idea: $D^2$ sampling
• Interpolate between random and furthest point initialization.
• Let $D(x)$ be the distance between a point $x$ and its nearest center. Choose the next center with probability proportional to $D^\alpha(x_i) = \min_{j' < j} \lVert x_i - c_{j'} \rVert^\alpha$.
• $\alpha = 0$: random sampling
• $\alpha = \infty$: furthest point (side note: it actually works well for k-center)
• $\alpha = 2$: k-means++
• Side note: $\alpha = 1$ works well for k-median
Slide adapted from Nina Balcan
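To see how the exponent interpolates between the two extremes, the sketch below computes the selection probabilities for a tiny one-dimensional example under several values of $\alpha$. Treating $\alpha = \infty$ as "all mass on the farthest point(s)" is the limiting case and is handled explicitly. The data and function names are assumptions made for illustration.

```python
import math


def nearest_distance(p, centers):
    """Euclidean distance from p to its nearest center."""
    return min(math.sqrt(sum((a - b) ** 2 for a, b in zip(p, c))) for c in centers)


def selection_probabilities(points, centers, alpha):
    """Probability of picking each point as the next center under D^alpha sampling."""
    dists = [nearest_distance(p, centers) for p in points]
    if math.isinf(alpha):
        # Limit alpha -> infinity: only the farthest point(s) can be chosen.
        farthest = max(dists)
        weights = [1.0 if d == farthest else 0.0 for d in dists]
    else:
        # Note: 0.0 ** 0 == 1.0 in Python, so alpha = 0 gives uniform sampling over all points.
        weights = [d ** alpha for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


points = [(0.0,), (1.0,), (2.0,), (10.0,)]
centers = [(0.0,)]  # one center already chosen
for alpha in (0, 1, 2, math.inf):
    print(alpha, selection_probabilities(points, centers, alpha))
```

As $\alpha$ grows, probability mass shifts toward the point at distance 10, from a uniform 1/4 at $\alpha = 0$ to certainty at $\alpha = \infty$.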

K-means++ Fix
[Figure: points labeled (0,1), (-2,0), (3,0), (0,-1)]
Slide courtesy of Nina Balcan

K-means++ / Lloyd’s Running Time
• K-means++ initialization: O(nd) and one pass over data to select next center. So O(nkd) time in total.
• Lloyd’s method
  Repeat until there is no change in the cost:
  – For each j: $C_j \leftarrow \{x \in S$ whose closest center is $c_j\}$
  – For each j: $c_j \leftarrow$ mean of $C_j$
  Each round takes time O(nkd).
• Exponential # of rounds in the worst case [AV07].
• Expected polynomial time in the smoothed analysis (non worst-case) model!
Slide courtesy of Nina Balcan

K-means++ / Lloyd’s Summary
• Running Lloyd’s can only further improve the cost.
• Exponential # of rounds in the worst case [AV07].
• Expected polynomial time in the smoothed analysis model!
• Does well in practice.
• K-means++ always attains an O(log k) approximation to the optimal k-means solution in expectation.
Slide courtesy of Nina Balcan
