
Multivariate Data Multivariate Normal Distribution Multivariate Classification Multivariate Regression

Multivariate Parametric Methods

Steven J Zeil

Old Dominion Univ.

Fall 2010


Outline

1. Multivariate Data
2. Multivariate Normal Distribution
3. Multivariate Classification (Discriminants, Tuning Complexity, Discrete Features)
4. Multivariate Regression

Multivariate Data

d inputs (a.k.a. features, attributes); N instances (a.k.a. observations, examples):

    X = | x_1^1  x_2^1  ...  x_d^1 |
        | x_1^2  x_2^2  ...  x_d^2 |
        |  ...    ...   ...   ...  |
        | x_1^N  x_2^N  ...  x_d^N |

(Later we will consider what happens if some gaps are allowed in the observations.)

Basic Multivariate Statistics

Mean:  E[x] = µ = [µ_1, µ_2, ..., µ_d]^T

Covariance:  σ_ij ≡ Cov(x_i, x_j) = E[(x_i − µ_i)(x_j − µ_j)] = E[x_i x_j] − µ_i µ_j

Correlation:  Corr(x_i, x_j) ≡ ρ_ij = σ_ij / (σ_i σ_j)

Covariance matrix:

    Σ ≡ Cov(x) = E[(x − µ)(x − µ)^T] = | σ_1²  σ_12  ...  σ_1d |
                                       | σ_21  σ_2²  ...  σ_2d |
                                       |  ...   ...  ...   ... |
                                       | σ_d1  σ_d2  ...  σ_d² |


Multivariate Parameter Estimation

Sample mean m:  m_i = (1/N) Σ_{t=1}^N x_i^t,   i = 1, ..., d

Sample covariance matrix S:  s_ij = (1/N) Σ_{t=1}^N (x_i^t − m_i)(x_j^t − m_j)

Sample correlation matrix R:  r_ij = s_ij / (s_i s_j)
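These estimators can be sketched in NumPy. The dataset below is synthetic, and the covariance uses the 1/N normalization shown on this slide (note that `np.cov` defaults to 1/(N−1)):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))   # N = 100 instances, d = 3 attributes (synthetic)

N = X.shape[0]
m = X.mean(axis=0)              # sample mean, m_i
Xc = X - m                      # centered data
S = (Xc.T @ Xc) / N             # sample covariance, s_ij (1/N as on the slide)
s = np.sqrt(np.diag(S))         # per-variable standard deviations s_i
R = S / np.outer(s, s)          # sample correlation, r_ij = s_ij / (s_i s_j)
```

S is symmetric and R has ones on its diagonal, as the definitions require.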

Imputation

What if certain instances have missing attributes?

Throw out the entire instance? That is a problem if the sample is small.

Imputation: fill in the missing value.

  • Mean imputation: use the expected value of that attribute.
  • Imputation by regression: predict the missing value from the other attributes.
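Mean imputation is a one-liner with NumPy; the small data matrix below is hypothetical, with missing entries marked as NaN:

```python
import numpy as np

# Hypothetical data matrix with two missing entries.
X = np.array([[1.0,    2.0],
              [np.nan, 4.0],
              [3.0,    np.nan],
              [5.0,    6.0]])

# Mean imputation: replace each missing value by that attribute's sample mean,
# computed over the observed entries only.
col_means = np.nanmean(X, axis=0)
filled = np.where(np.isnan(X), col_means, X)
```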

Multivariate Normal Distribution

x ∼ N_d(µ, Σ):

    p(x) = 1 / ((2π)^{d/2} |Σ|^{1/2}) · exp[ −(1/2) (x − µ)^T Σ^{−1} (x − µ) ]
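The density formula translates directly into NumPy; a minimal sketch with a made-up µ and Σ (at x = µ the exponent vanishes, so the density equals the normalizing constant):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density of N_d(mu, Sigma) at x, computed straight from the formula."""
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.linalg.det(Sigma) ** 0.5
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm

mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
p_at_mean = mvn_pdf(mu, mu, Sigma)   # = 1 / ((2*pi)^{d/2} |Sigma|^{1/2})
```

Using `np.linalg.solve` rather than explicitly inverting Σ is the usual numerically safer choice.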

Slicing

Any slice (projection) along a single direction w is normal:

    w^T x ∼ N(w^T µ, w^T Σ w)

Any projection onto a linearly transformed set of axes of dimension ≤ d is multivariate normal.
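The slicing property can be checked empirically: sample from a made-up N(µ, Σ), project onto a direction w, and compare the sample mean and variance of the projection against w^T µ and w^T Σ w:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
w = np.array([0.6, 0.8])

X = rng.multivariate_normal(mu, Sigma, size=200_000)
proj = X @ w                    # one-dimensional slice along w

# Empirically, proj ~ N(w^T mu, w^T Sigma w):
#   proj.mean() is close to w @ mu
#   proj.var()  is close to w @ Sigma @ w
```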


Effects of Covariance

(Figure slide.)

Normalized Distance

z = (x − µ)/σ can be seen as the distance from µ to x in normalized, σ-sized units. Generalizing to d dimensions gives the Mahalanobis distance:

    (x − µ)^T Σ^{−1} (x − µ)

  • If x_i has a larger variance than x_j, then x_i gets lower weight in this distance.
  • If x_i and x_j are highly correlated, they get less weight than two less-correlated variables.
  • A small |Σ| indicates that the samples are close to µ and/or the variables are highly correlated.
  • If |Σ| is zero, then some of the variables are constant or there is a linear dependency among the variables. Either way, reduce the dimensionality by removing the unneeded variables.
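A small sketch of the variance-weighting effect, with a made-up diagonal Σ: the same Euclidean step counts for less along the high-variance axis.

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = x - mu
    return diff @ np.linalg.solve(Sigma, diff)

mu = np.zeros(2)
Sigma = np.array([[4.0, 0.0],
                  [0.0, 1.0]])   # x_1 has larger variance than x_2

d1 = mahalanobis_sq(np.array([2.0, 0.0]), mu, Sigma)  # 2^2 / 4 = 1
d2 = mahalanobis_sq(np.array([0.0, 2.0]), mu, Sigma)  # 2^2 / 1 = 4
```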

Special Cases of Mahalanobis Distance

    d(x) = (x − µ)^T Σ^{−1} (x − µ)

If the x_i are independent, the off-diagonal elements of Σ are zero and

    d(x) = Σ_{i=1}^d ((x_i − µ_i) / σ_i)²

If the variances are also equal, this reduces to (squared) Euclidean distance.



Multivariate Classification

If p(x|C_i) ∼ N(µ_i, Σ_i):

    p(x|C_i) = 1 / ((2π)^{d/2} |Σ_i|^{1/2}) · exp[ −(1/2) (x − µ_i)^T Σ_i^{−1} (x − µ_i) ]

The discriminants are

    g_i(x) = log p(x|C_i) + log P(C_i)
           = −(d/2) log 2π − (1/2) log |Σ_i| − (1/2) (x − µ_i)^T Σ_i^{−1} (x − µ_i) + log P(C_i)

Estimate as

    g_i(x) = −(d/2) log 2π − (1/2) log |S_i| − (1/2) (x − m_i)^T S_i^{−1} (x − m_i) + log P̂(C_i)

Quadratic Discriminant

    g_i(x) = −(d/2) log 2π − (1/2) log |S_i| − (1/2) (x − m_i)^T S_i^{−1} (x − m_i) + log P̂(C_i)
           ≃ −(1/2) log |S_i| − (1/2) (x − m_i)^T S_i^{−1} (x − m_i) + log P̂(C_i)
           = −(1/2) log |S_i| − (1/2) [ x^T S_i^{−1} x − 2 x^T S_i^{−1} m_i + m_i^T S_i^{−1} m_i ] + log P̂(C_i)
           = x^T W_i x + w_i^T x + w_{i0}

This is a quadratic in x.
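The expansion can be checked numerically. A minimal sketch with made-up m_i, S_i, and prior, verifying that the coefficients W_i, w_i, w_{i0} reproduce the direct formula (the constant −(d/2) log 2π is dropped from both, matching the ≃ step above):

```python
import numpy as np

def quadratic_params(m, S, prior):
    """Coefficients of g(x) = x^T W x + w^T x + w0 for one class."""
    Sinv = np.linalg.inv(S)
    W = -0.5 * Sinv
    w = Sinv @ m
    w0 = -0.5 * (np.log(np.linalg.det(S)) + m @ Sinv @ m) + np.log(prior)
    return W, w, w0

m = np.array([1.0, 2.0])            # made-up class mean
S = np.array([[1.5, 0.3],
              [0.3, 1.0]])          # made-up class covariance
prior = 0.4

W, w, w0 = quadratic_params(m, S, prior)

x = np.array([0.5, -1.0])
g_expanded = x @ W @ x + w @ x + w0
diff = x - m
g_direct = (-0.5 * np.log(np.linalg.det(S))
            - 0.5 * diff @ np.linalg.solve(S, diff)
            + np.log(prior))
# g_expanded equals g_direct up to floating point.
```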

(Figure slide: likelihoods and the posterior for C_i.)

Simplification: Shared Covariance

Share a common sample covariance S:

    S = Σ_i P̂(C_i) S_i

The discriminant simplifies to

    g_i(x) = −(1/2) (x − m_i)^T S^{−1} (x − m_i) + log P̂(C_i)

Although this function is quadratic in x, it yields a linear discriminant because the x^T S^{−1} x quadratic term is identical across all i.
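A sketch on synthetic two-class data: pool the class covariances with prior weights as above, then confirm that the quadratic term cancels between classes, so g_1(x) − g_2(x) is a linear function w^T x + b:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic two-class data with a genuinely shared covariance.
X1 = rng.multivariate_normal([0, 0], [[1, 0.3], [0.3, 1]], size=300)
X2 = rng.multivariate_normal([2, 1], [[1, 0.3], [0.3, 1]], size=200)
priors = np.array([300, 200]) / 500

# Prior-weighted pooled covariance (bias=True gives the 1/N estimate).
S = priors[0] * np.cov(X1.T, bias=True) + priors[1] * np.cov(X2.T, bias=True)
Sinv = np.linalg.inv(S)
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

def g(x, m, prior):
    d = x - m
    return -0.5 * d @ Sinv @ d + np.log(prior)

# The x^T Sinv x term cancels in g1 - g2, leaving a linear decision function:
w = Sinv @ (m1 - m2)
b = -0.5 * (m1 @ Sinv @ m1 - m2 @ Sinv @ m2) + np.log(priors[0] / priors[1])

x = np.array([1.0, 0.5])
# g(x, m1, priors[0]) - g(x, m2, priors[1]) equals w @ x + b
```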


Linear Discriminant

(Figure slide.)

Further Simplification: Independence

If we share a common sample covariance S and the variables are independent, then the off-diagonal elements of S are zero. The discriminant simplifies to

    g_i(x) = −(1/2) Σ_{j=1}^d ((x_j − m_ij) / s_j)² + log P̂(C_i)

This is the Naive Bayes classifier:

  • Each variable is an independent Gaussian.
  • Distance is measured in standard-deviation units.
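A minimal sketch of this discriminant; the class means, shared per-variable standard deviations, and priors below are made-up values, not estimates from data:

```python
import numpy as np

def naive_bayes_g(x, m_i, s, prior_i):
    """Naive Bayes discriminant with shared per-variable std devs s."""
    z = (x - m_i) / s                    # distance in standard-deviation units
    return -0.5 * np.sum(z ** 2) + np.log(prior_i)

# Hypothetical two-class setup with d = 3 independent variables.
s = np.array([1.0, 2.0, 0.5])
means = [np.array([0.0, 0.0, 0.0]), np.array([1.0, 2.0, 1.0])]
priors = [0.5, 0.5]

x = np.array([0.9, 1.8, 0.8])
scores = [naive_bayes_g(x, m, s, p) for m, p in zip(means, priors)]
label = int(np.argmax(scores))           # x is nearer class 1's mean in s-units
```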

Diagonal S

Ellipsoids are aligned with axes.


Further Simplification: Equal Variances

If the variances are also equal, the discriminant simplifies to

    g_i(x) = −(1/2) Σ_{j=1}^d ((x_j − m_ij) / s)² + log P̂(C_i)

This is the nearest-mean classifier.


Model Selection

    Assumption              Covariance matrix     # Parameters
    Equal variances         S_i = S = s²I         1
    Independent             S_i = S, s_ij = 0     d
    Shared covariance       S_i = S               d(d+1)/2
    Different covariances   S_i                   K·d(d+1)/2

Binary Features

x_j ∈ {0, 1},  p_ij = p(x_j = 1 | C_i)

If the x_j are independent (Naive Bayes):

    p(x|C_i) = Π_{j=1}^d p_ij^{x_j} (1 − p_ij)^{1 − x_j}

The discriminant is linear:

    g_i(x) = Σ_j [ x_j log p̂_ij + (1 − x_j) log (1 − p̂_ij) ] + log P̂(C_i)

Discrete Features

x_j ∈ {v_1, v_2, ..., v_{n_j}},  p_ijk = p(z_jk = 1 | C_i) = p(x_j = v_k | C_i)

If the x_j are independent:

    p(x|C_i) = Π_{j=1}^d Π_{k=1}^{n_j} p_ijk^{z_jk}

    g_i(x) = Σ_j Σ_k z_jk log p̂_ijk + log P̂(C_i)

Multivariate Regression

    r^t = g(x^t | w_0, w_1, ..., w_d) + ε

Multivariate linear model:

    w_0 + w_1 x_1^t + w_2 x_2^t + ... + w_d x_d^t

Error:

    E(w|X) = (1/2) Σ_t [ r^t − (w_0 + w_1 x_1^t + w_2 x_2^t + ... + w_d x_d^t) ]²

    D = | 1  x_1^1  x_2^1  ...  x_d^1 |        r = | r^1 |
        | 1  x_1^2  x_2^2  ...  x_d^2 |            | r^2 |
        | ...  ...   ...   ...   ...  |            | ... |
        | 1  x_1^N  x_2^N  ...  x_d^N |            | r^N |

    (D^T D) w = D^T r   ⟹   w = (D^T D)^{−1} D^T r
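The normal equations translate directly into NumPy; the sketch below fits a made-up linear model on synthetic data and solves (D^T D) w = D^T r:

```python
import numpy as np

rng = np.random.default_rng(3)
N, d = 50, 3
X = rng.normal(size=(N, d))
true_w = np.array([2.0, 1.0, -1.0, 0.5])   # [w0, w1, w2, w3], made up for the demo
r = true_w[0] + X @ true_w[1:] + 0.01 * rng.normal(size=N)

D = np.hstack([np.ones((N, 1)), X])        # N x (d+1) design matrix
w = np.linalg.solve(D.T @ D, D.T @ r)      # normal equations (D^T D) w = D^T r
```

In practice `np.linalg.lstsq(D, r)` (QR-based) gives the same solution with better conditioning than forming D^T D explicitly.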


Multivariate Regression

    (D^T D) w = D^T r   ⟹   w = (D^T D)^{−1} D^T r

The solution is the same as for univariate polynomial regression, but using the distinct variables instead of the different powers of a single variable.