Machine Learning Fall 2017: Unsupervised Learning (Clustering: k-means, EM, mixture models)


SLIDE 1

Machine Learning

Fall 2017

Professor Liang Huang

Unsupervised Learning

(Clustering: k-means, EM, mixture models)

(Chaps. 15-16 of CIML)

SLIDE 2

Roadmap

  • so far: (large-margin) supervised learning
  • online learning: avg perceptron/MIRA, convergence proof
  • SVMs: formulation, KKT, dual, convex, QP, SGD (Pegasos)

  • kernels and kernelized perceptron in dual; kernel SVM
  • briefly: k-NN and leave-one-out; RL and imitation learning
  • structured perceptron/SVM, HMM, MLE, Viterbi

  • what we left out: many classical algorithms
  • decision trees, logistic regression, linear regression, boosting, ...
  • next up: unsupervised learning
  • clustering: k-means, EM, mixture models, hierarchical
  • dimensionality reduction: PCA, non-linear (LLE, etc)

[Figure: example supervised outputs: binary labels y = −1 / y = +1; POS tags for “the man bit the dog”: DT NN VBD DT NN. Chapter map: so far CIML Chaps. 3, 4, 5, 7, 11, 17, 18; next CIML Chaps. 15, 16; left out CIML Chaps. 1, 9, 10, 13]

SLIDE 3

CIML book


1 Decision Trees
2 Limits of Learning
3 Geometry and Nearest Neighbors
4 The Perceptron
5 Practical Issues
6 Beyond Binary Classification
7 Linear Models
8 Bias and Fairness
9 Probabilistic Modeling
10 Neural Networks
11 Kernel Methods
12 Learning Theory
13 Ensemble Methods
14 Efficient Learning
15 Unsupervised Learning
16 Expectation Maximization
17 Structured Prediction
18 Imitation Learning

(week annotations: week 5b; week 1; week 2; weeks 3,4; week 5; weeks 7,8a; week 5b; next: week 8b,9a; next: week 8b)

extra topics covered: MIRA, aggressive MIRA, convex programming, quadratic programming, Pegasos, dual Pegasos, structured Pegasos; in retrospect: should start with k-NN, should cover logistic regression (important in DL)

SLIDE 4

Sup=>Unsup: k-NN => k-means

  • let’s review a supervised learning method: nearest neighbor
  • SVM, perceptron (in dual) and NN are all instance-based learning
  • instance-based learning: store a subset of examples for classification
  • compression rate: SVM: very high (keeps only support vectors), perceptron: medium-high (keeps only mistake examples), NN: 0 (keeps every example)


SLIDE 5

k-Nearest Neighbor

  • using k > 1 neighbors (majority vote) is one way to prevent overfitting => more stable results (see the sketch below)
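A minimal sketch of the k-NN vote (assuming NumPy arrays and Euclidean distance; the function name is illustrative, not from the slides):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training examples."""
    # Euclidean distance from x to every stored training example
    dists = np.linalg.norm(X_train - x, axis=1)
    # indices of the k closest examples
    nearest = np.argsort(dists)[:k]
    # majority vote over their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]
```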

SLIDE 6

NN Voronoi in 2D and 3D

SLIDE 7

Voronoi for Euclidean and Manhattan distances

SLIDE 8

Unsupervised Learning

  • cost of supervised learning
  • labeled data: expensive to annotate!
  • but there exist huge amounts of data w/o labels
  • unsupervised learning
  • can only hallucinate the labels
  • infer some “internal structures” of data
  • still the “compression” view of learning
  • too much data => reduce it!
  • clustering: reduce # of examples
  • dimensionality reduction: reduce # of dimensions

[Figure: panel (a), axes −2 to 2]

SLIDE 9

Challenges in Unsupervised Learning

  • how to evaluate the results?
  • there is no gold standard data!
  • internal metric?
  • how to interpret the results?
  • how to “name” the clusters?
  • how to initialize the model/guess?
  • a bad initial guess can lead to very bad results
  • unsup is very sensitive to initialization (unlike supervised)
  • how to do optimization => in general no longer convex!

[Figure: panel (a), axes −2 to 2]

SLIDE 10

k-means

  • (randomly) pick k points to be initial centroids
  • repeat the two steps until convergence
  • assignment to centroids: voronoi, like NN
  • recomputation of centroids based on the new assignment

[Figure: k-means, panel (a)]

SLIDE 11

k-means

  • (randomly) pick k points to be initial centroids
  • repeat the two steps until convergence
  • assignment to centroids: voronoi, like NN
  • recomputation of centroids based on the new assignment

[Figure: k-means, panel (b)]

SLIDE 12

k-means

  • (randomly) pick k points to be initial centroids
  • repeat the two steps until convergence
  • assignment to centroids: voronoi, like 1-NN
  • recomputation of centroids based on the new assignment

[Figure: k-means, panel (c)]

SLIDE 13

k-means

  • (randomly) pick k points to be initial centroids
  • repeat the two steps until convergence
  • assignment to centroids: voronoi, like NN
  • recomputation of centroids based on the new assignment

[Figure: k-means, panel (d)]

SLIDE 14

k-means

  • (randomly) pick k points to be initial centroids
  • repeat the two steps until convergence
  • assignment to centroids: voronoi, like NN
  • recomputation of centroids based on the new assignment

[Figure: k-means, panel (e)]

SLIDE 15

k-means

  • (randomly) pick k points to be initial centroids
  • repeat the two steps until convergence
  • assignment to centroids: voronoi, like NN
  • recomputation of centroids based on the new assignment

[Figure: k-means, panel (f)]

SLIDE 16

k-means

  • (randomly) pick k points to be initial centroids
  • repeat the two steps until convergence
  • assignment to centroids: voronoi, like NN
  • recomputation of centroids based on the new assignment

[Figure: k-means, panel (g)]

SLIDE 17

k-means

  • (randomly) pick k points to be initial centroids
  • repeat the two steps until convergence
  • assignment to centroids: voronoi, like NN
  • recomputation of centroids based on the new assignment

[Figure: k-means, panel (h)]

SLIDE 18

k-means

  • (randomly) pick k points to be initial centroids
  • repeat the two steps until convergence
  • assignment to centroids: voronoi, like NN
  • recomputation of centroids based on the new assignment

[Figure: k-means, panel (i)]

SLIDE 19

k-means

  • (randomly) pick k points to be initial centroids
  • repeat the two steps until convergence
  • assignment to centroids: voronoi, like NN
  • recomputation of centroids based on the new assignment
  • how to define convergence?
  • after a fixed number of iterations, or
  • assignments do not change, or
  • centroids do not change (equivalent?) or
  • change in objective function falls below a threshold (see the sketch below)
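A minimal NumPy sketch of the loop, using “assignments do not change” as the stopping test (function and variable names are illustrative, not from the slides):

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Alternate assignment and centroid recomputation until assignments stop changing."""
    rng = np.random.default_rng(seed)
    # (randomly) pick k data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    assign = None
    for _ in range(max_iter):
        # assignment step: each point goes to its nearest centroid (Voronoi / 1-NN)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)
        if assign is not None and np.array_equal(new_assign, assign):
            break  # assignments unchanged => converged
        assign = new_assign
        # recomputation step: each centroid becomes the mean of its assigned points
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = X[assign == j].mean(axis=0)
    return centroids, assign
```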

[Figure: k-means, panel (i)]

SLIDE 20

k-means objective function

  • residual sum of squares (RSS)
  • sum of squared distances from points to their centroids (formula below)
  • guaranteed to decrease monotonically
  • convergence proof: decrease + finite # of clusterings
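In symbols (writing μ_k for the centroid of cluster C_k; this notation is assumed here, and J matches the objective plotted on the slide):

```latex
J \;=\; \mathrm{RSS} \;=\; \sum_{k=1}^{K} \;\sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2
```

Both the assignment step and the recomputation step can only lower (or keep) J, and there are only finitely many possible clusterings, so the procedure must converge.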

[Figure: objective J vs. iteration 1-4]

SLIDE 21

k-means for image segmentation
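A common recipe for this (a sketch, assuming scikit-learn is available and the image is already loaded as an (H, W, 3) NumPy array; the function name is illustrative): cluster the pixel colors with k-means and repaint each pixel with its centroid color.

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_colors(img, k=4):
    """Segment an (H, W, 3) image by clustering its pixel colors with k-means."""
    h, w, c = img.shape
    pixels = img.reshape(-1, c).astype(float)
    km = KMeans(n_clusters=k, n_init=10).fit(pixels)
    # repaint every pixel with the centroid color of its cluster
    segmented = km.cluster_centers_[km.labels_]
    return segmented.reshape(h, w, c)
```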

SLIDE 22

Problems with k-means

  • problem: sensitive to initialization
  • the objective function is non-convex: many local minima
  • why?
  • k-means works well if
  • clusters are spherical
  • clusters are well separated
  • clusters of similar volumes
  • clusters have similar # of examples

SLIDE 23

Better (“soft”) k-means?

  • random restarts -- definitely helps
  • soft clusters => EM with Gaussian Mixture Model

[Figures: k-means result, panel (i); a 1-D density p(x) over x, panels (a) and (b), with component weights 0.5, 0.3, 0.2]

SLIDE 24

k-means

  • randomize k initial centroids
  • repeat the two steps until convergence
  • E-step: assign each example to its nearest centroid (Voronoi)
  • M-step: recompute centroids (based on the new assignment)

[Figure: panel (a)]

SLIDE 25

EM for Gaussian Mixtures

  • randomize k means, covariances, mixing coefficients
  • repeat the two steps until convergence
  • E-step: evaluate the responsibilities using current parameters
  • M-step: reestimate parameters using current responsibilities (see the sketch below)
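A minimal NumPy/SciPy sketch of the two alternating steps (the initialization scheme and the tiny regularization added to the covariances are assumptions of this sketch, not from the slides):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iter=50, seed=0):
    """EM for a Gaussian mixture: E-step computes responsibilities, M-step reestimates parameters."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # initialize: means at random data points, shared data covariance, uniform mixing coefficients
    means = X[rng.choice(n, size=k, replace=False)].copy()
    covs = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(k)])
    pis = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of component i for example j (fractional assignment)
        dens = np.column_stack([
            pis[i] * multivariate_normal.pdf(X, means[i], covs[i]) for i in range(k)
        ])
        resp = dens / dens.sum(axis=1, keepdims=True)      # shape (n, k)
        # M-step: reestimate parameters from the responsibilities
        Nk = resp.sum(axis=0)                              # effective counts per component
        means = (resp.T @ X) / Nk[:, None]
        for i in range(k):
            diff = X - means[i]
            covs[i] = (resp[:, i, None] * diff).T @ diff / Nk[i] + 1e-6 * np.eye(d)
        pis = Nk / n
    return means, covs, pis, resp
```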

[Figure: panel (a)]

SLIDE 26

EM for Gaussian Mixtures

  • randomize k means, covariances, mixing coefficients
  • repeat the two steps until convergence
  • E-step: evaluate the responsibilities using current parameters
  • M-step: reestimate parameters using current responsibilities

[Figure: panel (b)]
“fractional assignments”
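These “fractional assignments” are the E-step responsibilities; in the notation used later on slide 33 (P(c_i) for the prior of component c_i), they can be written as follows (the symbol γ_{ji} is introduced here for illustration, not on the slide):

```latex
\gamma_{ji} \;=\; P(c_i \mid x_j) \;=\; \frac{P(c_i)\, P(x_j \mid c_i)}{\sum_{i'} P(c_{i'})\, P(x_j \mid c_{i'})}
```

so each example x_j is split across all components in proportion to how well each component explains it.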

SLIDE 27

EM for Gaussian Mixtures

  • randomize k means, covariances, mixing coefficients
  • repeat the two steps until convergence
  • E-step: evaluate the responsibilities using current parameters
  • M-step: reestimate parameters using current responsibilities

[Figure: EM, panel (c), L = 1]

SLIDE 28

EM for Gaussian Mixtures

  • randomize k means, covariances, mixing coefficients
  • repeat the two steps until convergence
  • E-step: evaluate the responsibilities using current parameters
  • M-step: reestimate parameters using current responsibilities

[Figure: EM, panel (d), L = 2]

SLIDE 29

EM for Gaussian Mixtures

  • randomize k means, covariances, mixing coefficients
  • repeat the two steps until convergence
  • E-step: evaluate the responsibilities using current parameters
  • M-step: reestimate parameters using current responsibilities

[Figure: EM, panel (e), L = 5]

SLIDE 30

EM for Gaussian Mixtures

  • randomize k means, covariances, mixing coefficients
  • repeat the two steps until convergence
  • E-step: evaluate the responsibilities using current parameters
  • M-step: reestimate parameters using current responsibilities

[Figure: EM, panel (f), L = 20]

SLIDE 31

EM for Gaussian Mixtures

  • randomize k means, covariances, mixing coefficients
  • repeat the two steps until convergence
  • E-step: evaluate the responsibilities using current parameters
  • M-step: reestimate parameters using current responsibilities

[Figure: EM, panel (f), L = 20]

SLIDE 32

EM for Gaussian Mixtures

[Figure: panels (b) and (c), L = 1]

SLIDE 33

Convergence

  • EM converges much slower than k-means
  • can’t use “assignment doesn’t change” for convergence (assignments are fractional)
  • use log likelihood of the data
  • stop if increase in log likelihood smaller than threshold
  • or a maximum # of iterations has been reached

$$L = \log P(\text{data}) = \log \prod_j P(x_j) = \sum_j \log P(x_j) = \sum_j \log \sum_i P(c_i)\, P(x_j \mid c_i)$$

SLIDE 34

EM: pros and cons (vs. k-means)

  • EM: pros
  • doesn’t need the data to be spherical
  • doesn’t need the data to be well-separated
  • doesn’t need the clusters to be in similar sizes/volumes
  • EM: cons
  • converges much slower than k-means
  • per-iteration computation also slower
  • (to speed up EM): use k-means as burn-in (initialize EM from a k-means run)
  • (same as k-means) local minima!

SLIDE 35

k-means is a special case of EM

  • k-means is “hard” EM
  • covariance matrices are spherical (σ²I, a special case of diagonal)
  • with the variance σ² approaching 0 (see the limit below)
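A sketch of why, assuming a single spherical covariance σ²I shared by all components (an assumption of this sketch): the responsibilities become

```latex
\gamma_{ji}
\;=\;
\frac{\pi_i \exp\!\big(-\lVert x_j - \mu_i \rVert^2 / 2\sigma^2\big)}
     {\sum_{i'} \pi_{i'} \exp\!\big(-\lVert x_j - \mu_{i'} \rVert^2 / 2\sigma^2\big)}
\;\longrightarrow\;
\begin{cases}
1 & \text{if } i = \arg\min_{i'} \lVert x_j - \mu_{i'} \rVert^2 \\
0 & \text{otherwise}
\end{cases}
\qquad \text{as } \sigma^2 \to 0,
```

i.e., in the zero-variance limit each example is assigned entirely to its nearest centroid, which is exactly the hard assignment of k-means.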

[Figure: k-means result, panel (i), alongside EM result, panel (f), L = 20]

SLIDE 36

Why does EM increase p(data) iteratively?

SLIDE 37

Why does EM increase p(data) iteratively?

(annotations: convex auxiliary function; converges to a local maximum; KL-divergence = D)
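The picture behind these annotations is the standard EM lower-bound decomposition; a sketch in the usual notation (hidden variables Z and an arbitrary distribution q over them are assumptions of this write-up, not text from the slide):

```latex
\ln p(X \mid \theta) \;=\; \mathcal{L}(q, \theta) \;+\; \mathrm{KL}\!\big(q(Z) \,\Vert\, p(Z \mid X, \theta)\big), \qquad \mathrm{KL} \ge 0 .
```

Since the KL term is non-negative, L(q, θ) is a lower bound (auxiliary function) on ln p(X | θ); the E-step closes the gap at θold by setting q(Z) = p(Z | X, θold), and the M-step maximizes L over θ, so the data log-likelihood can never decrease from one iteration to the next.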

SLIDE 38

How to maximize the auxiliary function?

[Figure: the lower bound L(q, θ) and ln p(X|θ), with θold and θnew]

SLIDE 39

Part II: Dimensionality Reduction

SLIDE 40

Dimensionality Reduction

SLIDE 41

Dimensionality Reduction

[Figure: axes “Wrist rotation” and “Fingers extension”; from the Isomap paper (Tenenbaum, de Silva, Langford, Science 2000)]

SLIDE 42

Algorithms

  • linear methods
  • PCA - principal ...
  • ICA - independent ...
  • CCA - canonical ...
  • MDS - multidim. scaling
  • LEM - Laplacian eigenmaps
  • LDA1 - linear discriminant analysis
  • LDA2 - latent Dirichlet allocation


  • non-linear methods
  • kernelized PCA
  • isomap
  • LLE - locally linear embedding
  • SDE - semidefinite embedding

all are spectral methods! -- i.e., using eigenvalues

SLIDE 43

PCA

  • greedily find d orthogonal axes onto which the variance under projection is maximal
  • the “max variance subspace” formulation
  • 1st PC: direction of greatest variability in the data
  • 2nd PC: the next orthogonal max-var direction
  • remove all variance along the 1st PC, redo max-var
  • another equivalent formulation: “minimum reconstruction error”
  • find orthogonal vectors onto which the projection yields min MSE reconstruction

SLIDE 44

PCA optimization: max-var proj.

  • first translate data to zero mean
  • compute the covariance matrix
  • find top d eigenvalues and eigenvectors of covar matrix
  • project data onto those eigenvectors (see the sketch below)
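A minimal NumPy sketch of these four steps (the function name and return values are illustrative assumptions):

```python
import numpy as np

def pca(X, d):
    """Project X (n x D) onto its top-d principal components (max-variance directions)."""
    # 1. translate the data to zero mean
    Xc = X - X.mean(axis=0)
    # 2. compute the covariance matrix (D x D)
    cov = np.cov(Xc, rowvar=False)
    # 3. top-d eigenvalues/eigenvectors of the symmetric covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:d]]   # columns = top-d principal directions
    # 4. project the centered data onto those eigenvectors
    return Xc @ top, top
```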

SLIDE 45

PCA for k-means and whitening

  • rescaling to zero mean and unit variance as preprocessing
  • we did that in perceptron HW1/2 also!
  • but PCA can do more: whitening (sphering); see the sketch below
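A sketch of whitening along the same lines (assuming NumPy; the small eps that guards against division by zero is an assumption of this sketch, not from the slides):

```python
import numpy as np

def whiten(X, eps=1e-8):
    """Rotate into the PCA basis and rescale every direction to unit variance."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    # divide each principal direction by the square root of its variance
    return (Xc @ eigvecs) / np.sqrt(eigvals + eps)
```

Whitened data has unit variance in every direction, which matches the spherical-cluster setting where k-means works well.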

SLIDE 46

Eigendigits

SLIDE 47

Eigenfaces

SLIDE 48

Linear vs. non-Linear

[Figure: LLE or isomap]

SLIDE 49

Linear vs. non-Linear

[Figure: PCA]

SLIDE 50

Linear vs. non-Linear

[Figure: PCA]

SLIDE 51

Linear vs. non-Linear

[Figure: LLE]

SLIDE 52

Linear vs. non-Linear

[Figure: PCA]

SLIDE 53

Linear vs. non-Linear

[Figure: LLE]

SLIDE 54

PCA vs Kernel PCA
