T-61.3050 Machine Learning: Basic Principles, Clustering, Kai Puolamäki

SLIDE 1

T-61.3050 Machine Learning: Basic Principles

Clustering
Kai Puolamäki

Laboratory of Computer and Information Science (CIS) Department of Computer Science and Engineering Helsinki University of Technology (TKK)

Autumn 2007

SLIDE 2

Remaining Lectures

6 Nov: Dimensionality Reduction & Clustering (Alpaydin Ch 6 & 7)
13 Nov: Clustering & Algorithms in Data Analysis (PDF chapter)
20 Nov: Assessing Algorithms & Decision Trees (Alpaydin Ch 14 & 9)
27 Nov: Machine Learning @ Google / TBA (additionally, a Google recruitment talk in the afternoon at 16:00 in lecture hall T1, see http://www.cis.hut.fi/googletalk07/)
4 Dec: Decision Trees & Linear Discrimination (Alpaydin Ch 10)
(7 Dec: last problem session.)
11 Dec: Recap
The plan is preliminary and may still change.

SLIDE 3

About the Text Book

This course uses Alpaydin (2004) as its text book. The lecture slides (neither mine nor those on Alpaydin's site) are not meant to be a replacement for the text book; it is important to also read the book chapters. The library has some reading-room copies (and is planning to order some more). If nothing else, you should probably at least copy some key chapters.

SLIDE 4

Outline

1. Dimensionality Reduction
   Principal Component Analysis (PCA)
   Linear Discriminant Analysis (LDA)
2. Clustering
   Introduction
   K-means Clustering
   EM Algorithm

SLIDE 5

Principal Component Analysis (PCA)

PCA finds a low-dimensional linear subspace such that when x is projected onto it, the information loss (here defined as variance) is minimized; in other words, it finds the directions of maximal variance. Projection pursuit view: find a direction w such that some measure, here the variance Var(w^T x), is maximized. This is equivalent to finding the eigenvalues and eigenvectors of the covariance or correlation matrix.

SLIDE 6

Principal Component Analysis (PCA)

[Figure 6.1: Principal components analysis centers the sample and then rotates the axes to line up with the directions of highest variance. If the variance on z2 is too small, it can be ignored and we have dimensionality reduction from two to one. From: E. Alpaydın, 2004, Introduction to Machine Learning, © The MIT Press.]

SLIDE 7

Principal Component Analysis (PCA)

More formally: data X = {x^t}_{t=1}^N, x^t ∈ R^d.
Center the data: y^t = x^t − m, where m = (1/N) Σ_t x^t.
Two options:
  Use the covariance matrix S = (1/N) Σ_t y^t (y^t)^T.
  Use the correlation matrix R, where R_ij = S_ij / sqrt(S_ii S_jj).
Diagonalize S (or R), for example using Singular Value Decomposition (SVD): C^T S C = D, where C is an orthogonal (rotation) matrix satisfying C C^T = C^T C = 1, and D is a diagonal matrix whose diagonal elements are the eigenvalues λ_1 ≥ … ≥ λ_d ≥ 0. The ith column of C is the ith eigenvector.
Project the data vectors y^t onto the principal components: z^t = C^T y^t (equivalently y^t = C z^t).
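A minimal R sketch of these steps (assuming the data vectors x^t are the rows of a matrix X; all variable names are illustrative only):

  # Center the data: y^t = x^t - m
  m <- colMeans(X)
  Y <- sweep(X, 2, m)
  # Covariance matrix S = (1/N) sum_t y^t (y^t)^T and its eigendecomposition
  S <- crossprod(Y) / nrow(Y)
  eig <- eigen(S, symmetric = TRUE)
  C <- eig$vectors                   # ith column = ith eigenvector
  lambda <- eig$values               # lambda_1 >= ... >= lambda_d >= 0
  # Project onto the principal components: z^t = C^T y^t
  Z <- Y %*% C

The same projection can also be obtained directly with prcomp(X).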

SLIDE 8

Principal Component Analysis (PCA)

Observation: the covariance matrix of {z^t}_{t=1}^N is the diagonal matrix D whose diagonal elements are the variances:

S_z = (1/N) Σ_t z^t (z^t)^T = (1/N) Σ_t C^T y^t (y^t)^T C = C^T ( (1/N) Σ_t y^t (y^t)^T ) C = C^T S C = D,

where the diagonal elements of D are the variances D_ii = σ²_{z_i}. Eigenvalues λ_i ⇔ variances σ²_i.


SLIDE 9

Principal Component Analysis (PCA)

Idea: in the PC space (the z space), the first k principal components explain the data well enough, where k < d. “Well enough” means here that the reconstruction error is small enough.
More formally: project the data vectors y^t into R^k using ẑ^t = W^T y^t, where W ∈ R^{d×k} is the matrix containing the first k columns of C (in R: W <- C[, 1:k]). ẑ^t is a representation of y^t in k dimensions.
Project ẑ^t back to the y space: ŷ^t = W ẑ^t = W W^T y^t.
What is the average reconstruction error E = (1/N) Σ_t (ŷ^t − y^t)^T (ŷ^t − y^t)?
SLIDE 10

Principal Component Analysis (PCA)

What is the average reconstruction error E = (1/N) Σ_t (ŷ^t − y^t)^T (ŷ^t − y^t)?

E = Tr( E[ (ŷ − y)(ŷ − y)^T ] )
  = Tr( (W W^T − 1) E[ y y^T ] (W W^T − 1) )
  = Tr( W W^T C D C^T W W^T ) + Tr( C D C^T ) − 2 Tr( W^T C D C^T W )
  = Σ_{i=k+1}^d λ_i,

where we have used the fact that S = C D C^T = E[ y y^T ] and the cyclic property of the trace, Tr(AB) = Tr(BA).
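A quick numerical check of this identity in R (a sketch with simulated data; all names are illustrative):

  set.seed(1)
  N <- 1000; d <- 5; k <- 2
  X <- matrix(rnorm(N * d), N, d) %*% matrix(rnorm(d * d), d, d)   # correlated data
  Y <- sweep(X, 2, colMeans(X))                                    # centered data
  S <- crossprod(Y) / N
  eig <- eigen(S, symmetric = TRUE)
  W <- eig$vectors[, 1:k]                                          # first k eigenvectors
  Yhat <- Y %*% W %*% t(W)                                         # reconstruction W W^T y
  mean(rowSums((Yhat - Y)^2))                                      # average reconstruction error
  sum(eig$values[(k + 1):d])                                       # sum of discarded eigenvalues

The last two numbers should agree (up to floating-point error).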

SLIDE 11

Principal Component Analysis (PCA)

Result: PCA is a linear projection of the data from R^d into R^k such that the average reconstruction error E = E[ (ŷ − y)^T (ŷ − y) ] is minimized.
Proportion of Variance (PoV) explained: PoV = Σ_{i=1}^k λ_i / Σ_{i=1}^d λ_i. Some rules of thumb for finding a good k: PoV ≈ 0.9, or the PoV curve has an elbow.
Dimension reduction: it may be sufficient to use ẑ^t instead of x^t to train a classifier etc.
Visualization: plotting the data as ẑ^t using k = 2 (a first thing to do with new data).
Data compression: instead of storing the full data vectors y^t it may be sufficient to store only ẑ^t and then reconstruct the original data using ŷ^t = W ẑ^t, if necessary.
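In R, the PoV curve and the 0.9 rule of thumb might look as follows (a sketch that reuses the eigendecomposition eig from the snippets above):

  pov <- cumsum(eig$values) / sum(eig$values)   # PoV for k = 1, ..., d
  plot(pov, type = "b", xlab = "k", ylab = "PoV")
  k <- which(pov >= 0.9)[1]                     # smallest k with PoV >= 0.9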

SLIDE 12

Example: Optdigits

The Optdigits data set contains 5620 instances of digitized handwritten digits in the range 0–9. Each digit is a vector in R^64: 8 × 8 = 64 pixels, 16 gray levels.

[Example digit images.]

SLIDE 13

Example: Optdigits

[Figure from the lecture notes for E. Alpaydın (2004), Introduction to Machine Learning, © The MIT Press.]

SLIDE 14

[Figure from the lecture notes for E. Alpaydın (2004), Introduction to Machine Learning, © The MIT Press.]

SLIDE 15

Example: Fossils

Large European land mammals: 124 fossil find sites (dated 23–2 million years old), 139 taxa. Reconstruction of the site vectors given the PCA taxon representation for different k: ŷ = W ẑ = W W^T y, or x̂ = W W^T (x − m) + m.

[Figures: the fossil sites × taxa data matrix (“Cenozoic Large Land Mammals”), the proportion of variance explained as a function of k, and a scatter plot of the data on the first two principal components.]

SLIDE 16

Example: Fossils

Large European land mammals: 124 fossil find sites (dated 23–2 million years old), 139 taxa. Reconstruction of the site vectors given the PCA taxon representation for different k: ŷ = W ẑ = W W^T y, or x̂ = W W^T (x − m) + m.

[Figures: the original data matrix X, and the reconstructed data matrices for k = 2 and k = 52.]

SLIDE 17

Outline

1. Dimensionality Reduction
   Principal Component Analysis (PCA)
   Linear Discriminant Analysis (LDA)
2. Clustering
   Introduction
   K-means Clustering
   EM Algorithm

SLIDE 18

Linear Discriminant Analysis (LDA)

PCA is an unsupervised method (class information is not usually used). Linear Discriminant Analysis (LDA) is a supervised method for dimensionality reduction in classification problems. Like PCA, LDA can be accomplished with standard matrix algebra (eigenvalue decompositions etc.), which makes it relatively simple and useful. PCA is a good general-purpose dimensionality reduction method; LDA is a good alternative if we want to optimize the separability of the classes in a specific classification task and are happy with a dimensionality less than the number of classes (k < K).

SLIDE 19

Linear Discriminant Analysis (LDA)

[Slide from the lecture notes for E. Alpaydın (2004), Introduction to Machine Learning, © The MIT Press: find a low-dimensional space such that when x is projected onto it, the classes are well separated; find the w that maximizes Fisher's criterion.]
SLIDE 20

Linear Discriminant Analysis (LDA)

More formally: data X = {(r^t, x^t)}_{t=1}^N, where r^t_i is one if x^t is in class i and zero otherwise, and x^t ∈ R^d.
Within-class scatter: S_W = Σ_{i=1}^K S_i, where S_i = Σ_t r^t_i (x^t − m_i)(x^t − m_i)^T.
Between-class scatter: S_B = Σ_{i=1}^K N_i (m_i − m)(m_i − m)^T, where N_i = Σ_t r^t_i. (rank(S_B) < K.)
k = 1: find the w ∈ R^d that maximizes Fisher's discriminant J(w) = (w^T S_B w) / (w^T S_W w).
K > k > 1: find the W ∈ R^{d×k} that maximizes Fisher's discriminant J(W) = |W^T S_B W| / |W^T S_W W|.
The projection from R^d to R^k is given by ẑ = W^T (x − m).
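A small R sketch of these quantities (assuming a data matrix X with the x^t as rows, an integer class label vector r, and a target dimension k < K; names are illustrative only):

  m <- colMeans(X)
  SW <- matrix(0, ncol(X), ncol(X))
  SB <- matrix(0, ncol(X), ncol(X))
  for (i in sort(unique(r))) {
    Xi <- X[r == i, , drop = FALSE]
    mi <- colMeans(Xi)
    SW <- SW + crossprod(sweep(Xi, 2, mi))     # within-class scatter S_i
    SB <- SB + nrow(Xi) * tcrossprod(mi - m)   # between-class scatter term
  }
  eig <- eigen(solve(SW) %*% SB)               # eigenvectors of SW^{-1} SB (see next slide)
  W <- Re(eig$vectors[, 1:k])                  # at most K - 1 useful directions
  Z <- sweep(X, 2, m) %*% W                    # projection z = W^T (x - m)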

SLIDE 21

Find the W ∈ R^{d×k} that maximizes Fisher's discriminant J(W) = |W^T S_B W| / |W^T S_W W|.
Write V = S_W^{1/2} W ∈ R^{d×k}, where S_W^{1/2} is a matrix such that S_W^{1/2} S_W^{1/2} = S_W. Then J(V) = |V^T S_W^{−1/2} S_B S_W^{−1/2} V| / |V^T V|.
A determinant is a product of eigenvalues. To maximize J(V), V must contain the eigenvectors corresponding to the k largest eigenvalues of S_W^{−1/2} S_B S_W^{−1/2} (like in PCA!): S_W^{−1/2} S_B S_W^{−1/2} V = V D. Substituting V = S_W^{1/2} W gives S_W^{−1} S_B W = W D.
⇒ LDA is the k largest eigenvector decomposition of S_W^{−1} S_B (like PCA is of the covariance matrix). There are at most K − 1 non-zero eigenvalues, that is, one should choose k < K.

SLIDE 22

[Figure from the lecture notes for E. Alpaydın (2004), Introduction to Machine Learning, © The MIT Press.]

SLIDE 23

Outline

1. Dimensionality Reduction
   Principal Component Analysis (PCA)
   Linear Discriminant Analysis (LDA)
2. Clustering
   Introduction
   K-means Clustering
   EM Algorithm

SLIDE 24

Mixture densities

p(x) = Σ_{i=1}^k p(x | C_i) p(C_i)
Classification: the labels r^t are known in the training data. Task: predict r for new data vectors x.
Clustering: the data is unlabeled, that is, the r^t are unknown. Task: assign a cluster label r to new data vectors x.
Gaussian mixture model: [graphical model from Figure 5.3 of Alpaydin (2004): class indicator C with prior P(C), observation x drawn from a Gaussian N(µ, Σ), repeated for the N data points.]

SLIDE 25

[Slide from the lecture notes for E. Alpaydın (2004), Introduction to Machine Learning, © The MIT Press: “Classes vs. Clusters”. Supervised: labeled data X = {x^t, r^t} with K classes C_i; unsupervised: unlabeled data X = {x^t} with k clusters G_j.]

SLIDE 26

Outline

1. Dimensionality Reduction
   Principal Component Analysis (PCA)
   Linear Discriminant Analysis (LDA)
2. Clustering
   Introduction
   K-means Clustering
   EM Algorithm

SLIDE 27

k-means Clustering

The simplest Bayesian classifier was the nearest mean classifier: classify a data vector to the class whose mean is nearest.
k-means clustering: find k prototype vectors m_i (“means”) which best represent the data.
Error function: E({m_i}_{i=1}^k | X) = Σ_{t=1}^N min_i ||x^t − m_i||².
Task: find prototype vectors m_i such that the error E({m_i}_{i=1}^k | X) is minimized.
There is no direct probabilistic interpretation. k-means can be viewed as an approximation of the Bayesian nearest mean classifier where a data vector belongs to a class/cluster with probability 0 or 1 only.
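A direct R transcription of this error function (a sketch; X holds the data vectors as rows and M holds the k prototype vectors as rows, names illustrative):

  kmeans_error <- function(X, M) {
    # For each x^t, the squared distance to the nearest prototype, summed over t
    sum(apply(X, 1, function(x) min(colSums((t(M) - x)^2))))
  }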

SLIDE 28

k-means Clustering

The vectors are assigned to the nearest means. In R: cl <- kmeans(t(X),centers=3)

[Figures: a two-dimensional k-means example with k = 2, and the Cenozoic Large Land Mammals data clustered into k = 3 clusters together with the corresponding cluster prototypes.]

SLIDE 29

k-means Clustering

Compression: a real vector (an image etc.) can be represented by a number in {1, …, k}.
Dimensionality reduction: one can use the cluster indices instead of the real vectors to train a classifier etc.
Interpretation of the data: clusters often have a meaning (taxa from various time periods, customer segments, etc.).
Labeling of data: cluster indices may be used as class labels.

SLIDE 30

k-means Clustering

Example: image compression

[Figure 9.3 of Bishop (2006): an image compressed with K-means for different values of K, together with the original image.]

The data set is the set of pixels of an image; each pixel is a vector in three-dimensional RGB space. K-means is applied to this set of pixels. The compressed representation is then the set of prototype vectors together with a cluster index for each pixel.
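A sketch of this idea in R (assuming pixels is an N × 3 matrix of RGB values; variable names are illustrative):

  k <- 16
  cl <- kmeans(pixels, centers = k, nstart = 10)
  prototypes <- cl$centers            # k prototype colours
  indices <- cl$cluster               # one cluster index per pixel
  # Reconstruction: replace each pixel by its prototype colour
  reconstructed <- prototypes[indices, ]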

SLIDE 31

k-means Clustering

Lloyd’s algorithm

Lloyd's algorithm is the most famous algorithm for minimizing the k-means cost function. It is easy to understand and implement. It is sensitive to initialization: it should be run from several random initializations, and the result with the smallest cost chosen. In practice one should consider some more advanced method (type help(kmeans) in R for some suggestions).

Initialize m_i, i = 1, …, k, for example to k random x^t
Repeat
  For all x^t ∈ X: b^t_i ← 1 if ||x^t − m_i|| = min_j ||x^t − m_j||, 0 otherwise
  For all m_i, i = 1, …, k: m_i ← Σ_t b^t_i x^t / Σ_t b^t_i
Until the m_i converge

[Figure 7.3: k-means algorithm. From: E. Alpaydın, 2004, Introduction to Machine Learning, © The MIT Press.]

SLIDE 32

k-means Clustering

Lloyd’s algorithm

Initialize m_i, i = 1, …, k, randomly.
repeat
  for all t ∈ {1, …, N} do {E step}
    b^t_i ← 1 if i = arg min_j ||x^t − m_j||, 0 otherwise
  end for
  for all i ∈ {1, …, k} do {M step}
    m_i ← Σ_t b^t_i x^t / Σ_t b^t_i
  end for
until the error E({m_i}_{i=1}^k | X) does not change
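A compact R implementation of this loop (a sketch under the same assumptions as before: data vectors as the rows of X, and the kmeans_error function sketched earlier):

  lloyd <- function(X, k, max_iter = 100) {
    M <- X[sample(nrow(X), k), , drop = FALSE]    # initialize to k random x^t
    err_old <- Inf
    for (iter in 1:max_iter) {
      # E step: index of the nearest prototype for each x^t
      b <- apply(X, 1, function(x) which.min(colSums((t(M) - x)^2)))
      # M step: each prototype becomes the mean of the vectors assigned to it
      for (i in 1:k)
        if (any(b == i)) M[i, ] <- colMeans(X[b == i, , drop = FALSE])
      err <- kmeans_error(X, M)                   # E({m_i} | X)
      if (err == err_old) break                   # stop when the error does not change
      err_old <- err
    }
    list(centers = M, cluster = b, error = err)
  }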

SLIDE 33

k-means Clustering

Lloyd’s algorithm

[Figure 9.1 of Bishop (2006): the k-means algorithm illustrated step by step on a two-dimensional data set, panels (a)–(i).]

SLIDE 34

k-means Clustering

Lloyd’s algorithm

Observations:
An iteration cannot increase the error E({m_i}_{i=1}^k | X).
There is a finite number, k^N, of possible clusterings. It follows that the algorithm always stops after a finite time (it can take no more than k^N steps). Usually k-means is nevertheless relatively fast: “In practice the number of iterations is generally much less than the number of points.” (Duda, Hart & Stork, 2000)
The worst-case running time with really bad data and a really bad initialization is however 2^{Ω(√N)}; luckily this usually does not happen in real life. (Arthur D, Vassilvitskii S (2006) How slow is the k-means method? In Proc. 22nd SCG.)

SLIDE 35

k-means Clustering

Lloyd’s algorithm

Observations: the result can in the worst case be really bad. Example:
Four data vectors (N = 4) from R^d in X: x^1 = (0, 0, …, 0)^T, x^2 = (1, 0, …, 0)^T, x^3 = (0, 1, …, 1)^T and x^4 = (1, 1, …, 1)^T.
The optimal clustering into two clusters (k = 2) is given by the prototype vectors m_1 = (0.5, 0, …, 0)^T and m_2 = (0.5, 1, …, 1)^T, the error being E({m_i}_{i=1}^k | X) = 1.
Lloyd's algorithm can however converge also to m_1 = (0, 0.5, …, 0.5)^T and m_2 = (1, 0.5, …, 0.5)^T, the error being E({m_i}_{i=1}^k | X) = d − 1. (Check that the iteration stops here!)
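A quick R check of this example (a sketch using the kmeans_error function from above; d is arbitrary):

  d <- 10
  X <- rbind(rep(0, d),
             c(1, rep(0, d - 1)),
             c(0, rep(1, d - 1)),
             rep(1, d))
  M_good <- rbind(c(0.5, rep(0, d - 1)), c(0.5, rep(1, d - 1)))
  M_bad  <- rbind(c(0, rep(0.5, d - 1)), c(1, rep(0.5, d - 1)))
  kmeans_error(X, M_good)   # 1
  kmeans_error(X, M_bad)    # d - 1 = 9
  # With the bad prototypes each x^t is already closest to the prototype whose first
  # coordinate matches it, so the E and M steps change nothing: a fixed point of Lloyd's algorithm.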

SLIDE 36

k-means Clustering

Lloyd’s algorithm

Example: cluster the taxa into k = 6 clusters 1000 times with Lloyd's algorithm. The error E({m_i}_{i=1}^k | X) is different for different runs! You should try several random initializations and choose the solution with the smallest error.
For a cool initialization see Arthur D, Vassilvitskii S (2006) k-means++: The Advantages of Careful Seeding.

[Figures: a histogram of the error over 1000 runs (k = 6), the Cenozoic Large Land Mammals data clustered into k = 6 clusters, and the corresponding cluster prototypes.]

SLIDE 37

Outline

1. Dimensionality Reduction
   Principal Component Analysis (PCA)
   Linear Discriminant Analysis (LDA)
2. Clustering
   Introduction
   K-means Clustering
   EM Algorithm

SLIDE 38

EM Algorithm

The Expectation-Maximization (EM) algorithm: soft cluster assignments and a probabilistic interpretation.

SLIDE 39

EM Algorithm

[Figure 9.8 of Bishop (2006): the EM algorithm for a Gaussian mixture illustrated step by step, panels (a)–(f).]

The EM algorithm is like k-means, except that the cluster assignments are “soft”: each data point is a member of a given cluster with a certain probability. The hard assignments b^t_i ∈ {0, 1} become soft assignments h^t_i ∈ [0, 1].

SLIDE 40

EM Algorithm

Find the maximum likelihood solution of the mixture model L = log Π_{t=1}^N p(x^t | θ), where the parameters θ are the µ_i, Σ_i and π_i = P(G_i).
The maximum likelihood solution is found by the EM algorithm (which is essentially a generalization of Lloyd's algorithm to soft cluster memberships).
Idea: iteratively find the membership weights of each data vector in the clusters, and the parameter values; continue until convergence. The end result is intuitive.
[Graphical model as on the Mixture densities slide, with clusters G in place of classes C.]

SLIDE 41

EM Algorithm

Example: a soft Gaussian mixture with a fixed, shared diagonal covariance matrix Σ_i = s² 1.

Initialize m_i and π_i, i = 1, …, k, randomly.
repeat
  for all t ∈ {1, …, N} do {E step}
    h^t_i ← π_i exp( −||x^t − m_i||² / (2s²) ) / Σ_j π_j exp( −||x^t − m_j||² / (2s²) )
  end for
  for all i ∈ {1, …, k} do {M step}
    m_i ← Σ_t h^t_i x^t / Σ_t h^t_i
    π_i ← Σ_t h^t_i / N
  end for
until convergence
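A short R sketch of this E/M loop (illustrative only; data vectors as the rows of X, with a fixed bandwidth s):

  soft_em <- function(X, k, s = 1, max_iter = 100) {
    N <- nrow(X)
    M <- X[sample(N, k), , drop = FALSE]            # initialize means to random x^t
    p <- rep(1 / k, k)                              # initial mixing proportions pi_i
    for (iter in 1:max_iter) {
      # E step: h^t_i proportional to pi_i * exp(-||x^t - m_i||^2 / (2 s^2))
      d2 <- sapply(1:k, function(i) rowSums(sweep(X, 2, M[i, ])^2))
      H <- sweep(exp(-d2 / (2 * s^2)), 2, p, "*")
      H <- H / rowSums(H)
      # M step: weighted means and mixing proportions
      M <- sweep(t(H) %*% X, 1, colSums(H), "/")    # m_i = sum_t h^t_i x^t / sum_t h^t_i
      p <- colSums(H) / N                           # pi_i = sum_t h^t_i / N
    }
    list(means = M, proportions = p, responsibilities = H)
  }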

SLIDE 42

EM Algorithm

For the derivation, see Alpaydin (2004), section 7.4 (pages 139–144); for an alternative derivation, see Bishop (2006), section 9.4 (pages 450–455). A sketch follows.
Task: find an ML solution of a likelihood function given by p(X | θ) = Σ_Z p(X, Z | θ).

Σ_t log p(x^t | θ) ≥ Σ_t log p(x^t | θ) − Σ_t KL( h^t_i || p(z^t | x^t, θ) )
                   = Σ_t Σ_i h^t_i log p(x^t, z^t | θ) + Σ_t H(h^t_i),

where we have used the Kullback-Leibler (KL) divergence KL(q(i) || p(i)) = Σ_i q(i) log( q(i) / p(i) ). The KL divergence is always non-negative and it vanishes only when the distributions q and p are equal. The entropy is given by H(q(i)) = −Σ_i q(i) log q(i).

SLIDE 43

EM Algorithm

Expectation step (E step): find the h^t_i by minimizing the KL divergence.
Maximization step (M step): find θ by maximizing the expectation.


Figure 9.14 of Bishop (2006)
