Probability and Statistics for Computer Science Principal - PowerPoint PPT Presentation

Probability and Statistics ì for Computer Science Principal Component Analysis --- Exploring the data in less dimensions Credit: wikipedia Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 10.27.2020

Last time ✺ Review of Bayesian inference ✺ Visualizing high dimensional data & Summarizing data ✺ The covariance matrix

Objectives ✺ Principal Component Analysis ✺ Examples of PCA

Diagonalization of a symmetric matrix ✺ If A is an n × n symmetric square matrix, the eigenvalues are real. ✺ If the eigenvalues are also disSnct, their eigenvectors are orthogonal ✺ We can then scale the eigenvectors to unit length, and place them into an orthogonal matrix U = [ u 1 u 2 …. u n ] ✺ We can write the diagonal matrix such Λ = U T AU that the diagonal entries of Λ are λ 1 , λ 2 … λ n in that order.

Diagonalization example ✺ For � � 5 3 A = 3 5

Covariance for a pair of components in a data set ✺ For the jth and kth components of a data set {x} i ( x ( j ) − mean ( { x ( j ) } ))( x ( k ) − mean ( { x ( k ) } )) T � cov ( { x } ; j, k ) = i i N

Covariance matrix Data set { x } 7×8 { x } Covmat( ) 7×7 cov ( { x } ; 3 , 5) 1 2 3 4 5 6 7 1 2 3 4 5 6 7 8 1 * * * * * * * 1 * * * * * * * * 2 * * * * * * * 2 * * * * * * * * 3 * * * * * * * * 3 * * * * * * * { 4 * * * * * * * * 4 * * * * * * * 5 * * * * * * * * 5 * * * * * * * 6 * * * * * * * * 7 * * * * * * * * 6 * * * * * * * 7 * * * * * * *

Properties of Covariance matrix { x } Covmat( ) cov ( { x } ; j, j ) = var ( { x ( j ) } ) 7×7 1 2 3 4 5 6 7 ✺ The diagonal elements 1 * * * * * * * of the covariance matrix 2 * * * * * * * are just variances of 3 * * * * * * * each jth components 4 * * * * * * * ✺ The off diagonals are 5 * * * * * * * covariance between 6 * * * * * * * different components 7 * * * * * * *

Properties of Covariance matrix { x } Covmat( ) 7×7 cov ( { x } ; j, k ) = cov ( { x } ; k, j ) 1 2 3 4 5 6 7 ✺ The covariance 1 * * * * * * * matrix is symmetric ! 2 * * * * * * * ✺ And it’s posi6ve 3 * * * * * * * semi-definite , that is 4 * * * * * * * all λ i ≥ 0 5 * * * * * * * 6 * * * * * * * ✺ Covariance matrix is 7 * * * * * * * diagonalizable

Properties of Covariance matrix { x } Covmat( ) ✺ If we define x c as the 7×7 mean centered 1 2 3 4 5 6 7 matrix for dataset {x} 1 * * * * * * * 2 * * * * * * * Covmat ( { x } ) = x c × x T 3 * * * * * * * c N 4 * * * * * * * 5 * * * * * * * ✺ The covariance 6 * * * * * * * matrix is a d×d matrix 7 * * * * * * * d =7

Example: covariance matrix of a data set (I) What are the dimensions of the covariance matrix of this data? X (1) � � 5 4 3 2 1 A 0 = X (2) − 1 1 0 1 − 1 A) 2 by 2 B) 5 by 5 C) 5 by 2 D) 2 by 5

Example: covariance matrix of a data set Mean centering A 2 = A 1 A T (II) (I) 1 � � 5 4 3 2 1 Inner product of each pairs: A 0 = [1,1] = 10 − 1 1 0 1 − 1 A 2 [2,2] = 4 A 2 � � 2 1 0 − 1 − 2 [1,2] = 0 A 2 A 1 = − 1 1 0 1 − 1 (III) Divide the matrix with N – the number of data poits � � � � = 1 N A 2 = 1 10 0 2 0 Covmat( ) { x } = 0 . 8 0 4 0 5

What do the data look like when Covmat({x}) is diagonal? X (2) X (1) � � 5 4 3 2 1 A 0 = − 1 1 0 1 − 1 X (2) * * X (1) * * * � � � � { x } = 1 N A 2 = 1 10 0 2 0 Covmat( ) = 0 . 8 0 4 0 5

What is the correlation between the 2 components for the data m? � � 20 25 Covmat ( m ) = 25 40

Q. Is this true? Transforming a matrix with orthonormal matrix only rotates the data A. Yes B. No

Dimension Reduction ✺ In stead of showing more dimensions through visualizaSon, it’s a good idea to do dimension reducSon in order to see the major features of the data set. ✺ For example, principal component analysis help find the major components of the data set. ✺ PCA is essenSally about finding eigenvectors of the covariance matrix of the data set {x}

Dimension reduction from 2D to 1D Credit: Prof. Forsyth

Step 1: subtract the mean Credit: Prof. Forsyth

Step 2: Rotate to diagonalize the covariance Credit: Prof. Forsyth

Step 3: Drop component(s) Credit: Prof. Forsyth

Principal Components ✺ The columns of are the normalized eigenvectors of U the Covmat({x}) and are called the principal components of the data {x}

Principal components analysis ✺ We reduce the dimensionality of dataset { x } represented by matrix from d to s (s < d). D d × n ✺ Step 1. define matrix such that m = D − mean ( D ) m d × n r i = U T m i ✺ Step 2. define matrix such that r d × n Λ = U T Covmat ( { x } ) U Λ Where saSsfies , is U T the diagonalizaSon of with the eigenvalues Covmat ( { x } ) sorted in decreasing order, is the orthonormal U eigenvectors’ matrix ✺ Step 3. Define matrix such that is with the last p p d × n r d-s components of made zero. r

What happened to the mean? ✺ Step 1. mean ( m ) = mean ( D − mean ( D )) = 0 ✺ Step 2. mean ( r ) = U T mean ( m ) = U T 0 = 0 ✺ Step 3. mean ( p i ) = mean ( r i ) = 0 while i ∈ 1 : s mean ( p i ) = 0 while i ∈ s + 1 : d

What happened to the covariances? ✺ Step 1. Covmat ( m ) = Covmat ( D ) = Covmat ( { x } ) ✺ Step 2. Covmat ( r ) = U T Covmat ( m ) U = Λ ✺ Step 3. is with the last/smallest d-s Λ Covmat ( p ) diagonal terms turned to 0.

Sample covariance matrix ✺ In many staSsScal programs, the sample covariance matrix is defined to be Covmat ( m ) = m m T N − 1 ✺ Similar to what happens to the unbiased standard deviaSon

PCA an example ✺ Step 1. � � � � 3 − 4 7 1 − 4 − 3 0 D = ⇒ mean ( D ) = 7 − 6 8 − 1 − 1 − 7 0 � � 3 − 4 7 1 − 4 − 3 m = 7 − 6 8 − 1 − 1 − 7 ✺ Step 2. ✺ Step 3.

PCA an example ✺ Step 1. � � � � 3 − 4 7 1 − 4 − 3 0 D = ⇒ mean ( D ) = 7 − 6 8 − 1 − 1 − 7 0 � � 3 − 4 7 1 − 4 − 3 m = 7 − 6 8 − 1 − 1 − 7 ✺ Step 2. � � 20 25 Covmat ( m ) = λ 1 ≃ 57; λ 2 ≃ 3 ⇒ 25 40 � � � � 0 . 5606288 0 . 8280672 0 . 5606288 − 0 . 8280672 U T = ⇒ U = − 0 . 8280672 0 . 5606288 0 . 8280672 0 . 5606288 ✺ Step 3.

PCA an example ✺ Step 1. � � � � 3 − 4 7 1 − 4 − 3 0 D = ⇒ mean ( D ) = 7 − 6 8 − 1 − 1 − 7 0 � � 3 − 4 7 1 − 4 − 3 m = 7 − 6 8 − 1 − 1 − 7 ✺ Step 2. � � 20 25 Covmat ( m ) = λ 1 ≃ 57; λ 2 ≃ 3 ⇒ 25 40 � � � � 0 . 5606288 0 . 8280672 0 . 5606288 − 0 . 8280672 U T = ⇒ U = − 0 . 8280672 0 . 5606288 0 . 8280672 0 . 5606288 � � 7 . 478 − 7 . 211 10 . 549 − 0 . 267 − 3 . 071 − 7 . 478 ⇒ r = U T m = 1 . 440 − 0 . 052 − 1 . 311 − 1 . 389 2 . 752 − 1 . 440 ✺ Step 3.

PCA an example ✺ Step 1. � � � � 3 − 4 7 1 − 4 − 3 0 D = ⇒ mean ( D ) = 7 − 6 8 − 1 − 1 − 7 0 � � 3 − 4 7 1 − 4 − 3 m = 7 − 6 8 − 1 − 1 − 7 ✺ Step 2. � � 20 25 Covmat ( m ) = λ 1 ≃ 57; λ 2 ≃ 3 ⇒ 25 40 � � � � 0 . 5606288 0 . 8280672 0 . 5606288 − 0 . 8280672 U T = ⇒ U = − 0 . 8280672 0 . 5606288 0 . 8280672 0 . 5606288 � � 7 . 478 − 7 . 211 10 . 549 − 0 . 267 − 3 . 071 − 7 . 478 ⇒ r = U T m = 1 . 440 − 0 . 052 − 1 . 311 − 1 . 389 2 . 752 − 1 . 440 ✺ Step 3. � � 7 . 478 − 7 . 211 10 . 549 − 0 . 267 − 3 . 071 − 7 . 478 ⇒ p = 0 0 0 0 0 0

What is this matrix for the previous example? U T Covmat ( m ) U =?

What is this matrix for the previous example? U T Covmat ( m ) U =? � � 57 0 0 3

The Mean square error of the projection ✺ The mean square error is the sum of the smallest d-s eigenvalues in Λ d 1 1 ∥ r i − p i ∥ 2 = ( r ( j ) � � � i ) 2 N − 1 N − 1 j = s +1 i i

The Mean square error of the projection ✺ The mean square error is the sum of the smallest d-s eigenvalues in Λ d d 1 1 ∥ r i − p i ∥ 2 = 1 ( r ( j ) � � � i ) 2 N − 1( r ( j ) � � i ) 2 = N − 1 N − 1 j = s +1 i i j = s +1 i

The Mean square error of the projection ✺ The mean square error is the sum of the smallest d-s eigenvalues in Λ d d 1 1 ∥ r i − p i ∥ 2 = 1 ( r ( j ) � � � i ) 2 N − 1( r ( j ) � � i ) 2 = N − 1 N − 1 j = s +1 i i j = s +1 i d var ( r ( j ) � = i ) j = s +1

The Mean square error of the projection ✺ The mean square error is the sum of the smallest d-s eigenvalues in Λ d d 1 1 ∥ r i − p i ∥ 2 = 1 ( r ( j ) � � � i ) 2 N − 1( r ( j ) � � i ) 2 = N − 1 N − 1 j = s +1 i i j = s +1 i d var ( r ( j ) � = i ) j = s +1 d � λ j = j = s +1

Probability and Statistics for Computer Science Principal - PowerPoint PPT Presentation

Probability and Statistics for Computer Science Principal Component Analysis --- Exploring the data in less dimensions Credit: wikipedia Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 10.27.2020 Last time Review of Bayesian inference

Probability Basics Martin Emms October 1, 2020 Probability Basics Outline Probability

Continuing Probability. Wrap up: Total Probability and Conditional Probability. Continuing

Chapter 2 Probability 1. Definition of Probability 2. Probability of disjoint events 3.

Probability Basics Probability Background Martin Emms October 1, 2020 Probability Basics

Chapter 2 Probability 1. Definition of Probability 2. Probability of disjoint events 3.

Categorical Probability and Statistics Peter McCullagh Department of Statistics University of

Counting and Probability Whats to come? Counting and Probability Whats to come?

Unit 2: Probability and distributions Lecture 1: Probability and conditional probability

Which probability Which probability Which probability Which probability theory for cosmology?

Recap of Basic Probability Elements of basic probability theory probability theory The

1 2 3 4 Stopping Probability Visiting Probability 5 Stopping

ACMS 20340 Statistics for Life Sciences Chapter 9: Introducing Probability Why Consider

Statistics 1B Statistics 1B 1 (11) 0. Lecture 1. Introduction and probability review

Statistics 370 Probability and Statistics for Engineers Instructor: Peter Bloomfield Course

Chapter II.2: Basic Probability Theory and Statistics 1. What is a probability? 1.1. Probability

Official Statistics Matt Dray, Assistant Statistician Official Statistics 2 Official

JUST THE MATHS SLIDES NUMBER 13.2 INTEGRATION APPLICATIONS 2 (Mean values) & (Root

IN5550: Neural Methods in Natural Language Processing Lecture 2 Supervised Machine Learning:

Deep Learning-based Short Video Recommendation and Prefetching for Mobile Commuting Users Qian Li

Forecast verification 4th VALUE Training School Jonas Bhend, Sven Kotlarski Forecast verification

About interpolation on manifolds... How to interpolate points on curved spaces ? Light fast

Optimization and Simulation Statistical analysis and bootstrapping Michel Bierlaire Transport

Measuring inter- -annotator annotator Measuring inter agreement in GO agreement in GO

A S S O C I A T I O N O F S T A T E P U B L I C H E A L T H N U T R I T I O N I S T S A S S O

Probability and Statistics for Computer Science Principal - PowerPoint PPT Presentation

Probability and Statistics for Computer Science Principal Component Analysis --- Exploring the data in less dimensions Credit: wikipedia Hongye Liu, Teaching Assistant Prof, CS361, UIUC, 10.27.2020 Last time Review of Bayesian inference

Probability Basics Martin Emms October 1, 2020 Probability Basics Outline Probability

Continuing Probability. Wrap up: Total Probability and Conditional Probability. Continuing

Chapter 2 Probability 1. Definition of Probability 2. Probability of disjoint events 3.

Probability Basics Probability Background Martin Emms October 1, 2020 Probability Basics

Chapter 2 Probability 1. Definition of Probability 2. Probability of disjoint events 3.

Categorical Probability and Statistics Peter McCullagh Department of Statistics University of

Counting and Probability Whats to come? Counting and Probability Whats to come?

Unit 2: Probability and distributions Lecture 1: Probability and conditional probability

Which probability Which probability Which probability Which probability theory for cosmology?

Recap of Basic Probability Elements of basic probability theory probability theory The

1 2 3 4 Stopping Probability Visiting Probability 5 Stopping

ACMS 20340 Statistics for Life Sciences Chapter 9: Introducing Probability Why Consider

Statistics 1B Statistics 1B 1 (11) 0. Lecture 1. Introduction and probability review

Statistics 370 Probability and Statistics for Engineers Instructor: Peter Bloomfield Course

Chapter II.2: Basic Probability Theory and Statistics 1. What is a probability? 1.1. Probability

Official Statistics Matt Dray, Assistant Statistician Official Statistics 2 Official

JUST THE MATHS SLIDES NUMBER 13.2 INTEGRATION APPLICATIONS 2 (Mean values) &amp; (Root

IN5550: Neural Methods in Natural Language Processing Lecture 2 Supervised Machine Learning:

Deep Learning-based Short Video Recommendation and Prefetching for Mobile Commuting Users Qian Li

Forecast verification 4th VALUE Training School Jonas Bhend, Sven Kotlarski Forecast verification

About interpolation on manifolds... How to interpolate points on curved spaces ? Light fast

Optimization and Simulation Statistical analysis and bootstrapping Michel Bierlaire Transport

Measuring inter- -annotator annotator Measuring inter agreement in GO agreement in GO

A S S O C I A T I O N O F S T A T E P U B L I C H E A L T H N U T R I T I O N I S T S A S S O

JUST THE MATHS SLIDES NUMBER 13.2 INTEGRATION APPLICATIONS 2 (Mean values) & (Root