
Multivariate Data Multivariate Normal Distribution Multivariate Classification Multivariate Regression

Multivariate Parametric Methods

Steven J Zeil

Old Dominion Univ.

Fall 2010


Outline

1. Multivariate Data
2. Multivariate Normal Distribution
3. Multivariate Classification (Discriminants, Tuning Complexity, Discrete Features)
4. Multivariate Regression

Multivariate Data

d inputs (a.k.a. features, attributes); N instances (a.k.a. observations, examples):

    X = | x_1^1  x_2^1  ...  x_d^1 |
        | x_1^2  x_2^2  ...  x_d^2 |
        |  ...    ...   ...   ...  |
        | x_1^N  x_2^N  ...  x_d^N |

(Later we will consider what happens if some gaps are allowed in the observations.)

Basic Multivariate Statistics

Mean:  E[x] = µ = [µ_1, µ_2, ..., µ_d]^T

Covariance:  σ_ij ≡ Cov(x_i, x_j) = E[(x_i − µ_i)(x_j − µ_j)] = E[x_i x_j] − µ_i µ_j

Correlation:  Corr(x_i, x_j) ≡ ρ_ij = σ_ij / (σ_i σ_j)

Covariance matrix:

    Σ ≡ Cov(x) = E[(x − µ)(x − µ)^T] = | σ_1²  σ_12  ...  σ_1d |
                                       | σ_21  σ_2²  ...  σ_2d |
                                       |  ...   ...  ...   ... |
                                       | σ_d1  σ_d2  ...  σ_d² |


Multivariate Parameter Estimation

Sample mean m:  m_i = (1/N) Σ_{t=1}^N x_i^t,   i = 1, ..., d

Sample covariance matrix S:  s_ij = (1/N) Σ_{t=1}^N (x_i^t − m_i)(x_j^t − m_j)

Sample correlation matrix R:  r_ij = s_ij / (s_i s_j)
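These estimators can be sketched in NumPy. The dataset below is synthetic, and the covariance uses the 1/N normalization shown on this slide (note that `np.cov` defaults to 1/(N−1)):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))   # N = 100 instances, d = 3 attributes (synthetic)

N = X.shape[0]
m = X.mean(axis=0)              # sample mean, m_i
Xc = X - m                      # centered data
S = (Xc.T @ Xc) / N             # sample covariance, s_ij (1/N as on the slide)
s = np.sqrt(np.diag(S))         # per-variable standard deviations s_i
R = S / np.outer(s, s)          # sample correlation, r_ij = s_ij / (s_i s_j)
```

S is symmetric and R has ones on its diagonal, as the definitions require.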

Imputation

What if certain instances have missing attributes?

Throw out the entire instance? That is a problem if the sample is small.

Imputation: fill in the missing value.

  • Mean imputation: use the expected value of that attribute.
  • Imputation by regression: predict the missing value from the other attributes.
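Mean imputation is a one-liner with NumPy; the small data matrix below is hypothetical, with missing entries marked as NaN:

```python
import numpy as np

# Hypothetical data matrix with two missing entries.
X = np.array([[1.0,    2.0],
              [np.nan, 4.0],
              [3.0,    np.nan],
              [5.0,    6.0]])

# Mean imputation: replace each missing value by that attribute's sample mean,
# computed over the observed entries only.
col_means = np.nanmean(X, axis=0)
filled = np.where(np.isnan(X), col_means, X)
```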

Multivariate Normal Distribution

x ∼ N_d(µ, Σ):

    p(x) = 1 / ((2π)^{d/2} |Σ|^{1/2}) · exp[ −(1/2) (x − µ)^T Σ^{−1} (x − µ) ]
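The density formula translates directly into NumPy; a minimal sketch with a made-up µ and Σ (at x = µ the exponent vanishes, so the density equals the normalizing constant):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density of N_d(mu, Sigma) at x, computed straight from the formula."""
    d = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (d / 2) * np.linalg.det(Sigma) ** 0.5
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm

mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
p_at_mean = mvn_pdf(mu, mu, Sigma)   # = 1 / ((2*pi)^{d/2} |Sigma|^{1/2})
```

Using `np.linalg.solve` rather than explicitly inverting Σ is the usual numerically safer choice.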

Slicing

Any slice (projection) along a single direction w is normal:

    w^T x ∼ N(w^T µ, w^T Σ w)

Any projection onto a linearly transformed set of axes of dimension ≤ d is multivariate normal.
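The slicing property can be checked empirically: sample from a made-up N(µ, Σ), project onto a direction w, and compare the sample mean and variance of the projection against w^T µ and w^T Σ w:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
w = np.array([0.6, 0.8])

X = rng.multivariate_normal(mu, Sigma, size=200_000)
proj = X @ w                    # one-dimensional slice along w

# Empirically, proj ~ N(w^T mu, w^T Sigma w):
#   proj.mean() is close to w @ mu
#   proj.var()  is close to w @ Sigma @ w
```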


Effects of Covariance

(Figure slide.)

Normalized Distance

z = (x − µ)/σ can be seen as the distance from µ to x in normalized, σ-sized units. Generalizing to d dimensions gives the Mahalanobis distance:

    (x − µ)^T Σ^{−1} (x − µ)

  • If x_i has a larger variance than x_j, then x_i gets lower weight in this distance.
  • If x_i and x_j are highly correlated, they get less weight than two less-correlated variables.
  • A small |Σ| indicates that the samples are close to µ and/or the variables are highly correlated.
  • If |Σ| is zero, then some of the variables are constant or there is a linear dependency among the variables. Either way, reduce the dimensionality by removing the unneeded variables.
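A small sketch of the variance-weighting effect, with a made-up diagonal Σ: the same Euclidean step counts for less along the high-variance axis.

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = x - mu
    return diff @ np.linalg.solve(Sigma, diff)

mu = np.zeros(2)
Sigma = np.array([[4.0, 0.0],
                  [0.0, 1.0]])   # x_1 has larger variance than x_2

d1 = mahalanobis_sq(np.array([2.0, 0.0]), mu, Sigma)  # 2^2 / 4 = 1
d2 = mahalanobis_sq(np.array([0.0, 2.0]), mu, Sigma)  # 2^2 / 1 = 4
```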

Special Cases of Mahalanobis Distance

    d(x) = (x − µ)^T Σ^{−1} (x − µ)

If the x_i are independent, the off-diagonal elements of Σ are zero and

    d(x) = Σ_{i=1}^d ((x_i − µ_i) / σ_i)²

If the variances are also equal, this reduces to (squared) Euclidean distance.



Multivariate Classification

If p(x|C_i) ∼ N(µ_i, Σ_i):

    p(x|C_i) = 1 / ((2π)^{d/2} |Σ_i|^{1/2}) · exp[ −(1/2) (x − µ_i)^T Σ_i^{−1} (x − µ_i) ]

The discriminants are

    g_i(x) = log p(x|C_i) + log P(C_i)
           = −(d/2) log 2π − (1/2) log |Σ_i| − (1/2) (x − µ_i)^T Σ_i^{−1} (x − µ_i) + log P(C_i)

Estimate as

    g_i(x) = −(d/2) log 2π − (1/2) log |S_i| − (1/2) (x − m_i)^T S_i^{−1} (x − m_i) + log P̂(C_i)

Quadratic Discriminant

    g_i(x) = −(d/2) log 2π − (1/2) log |S_i| − (1/2) (x − m_i)^T S_i^{−1} (x − m_i) + log P̂(C_i)
           ≃ −(1/2) log |S_i| − (1/2) (x − m_i)^T S_i^{−1} (x − m_i) + log P̂(C_i)
           = −(1/2) log |S_i| − (1/2) [ x^T S_i^{−1} x − 2 x^T S_i^{−1} m_i + m_i^T S_i^{−1} m_i ] + log P̂(C_i)
           = x^T W_i x + w_i^T x + w_{i0}

This is a quadratic in x.
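The expansion can be checked numerically. A minimal sketch with made-up m_i, S_i, and prior, verifying that the coefficients W_i, w_i, w_{i0} reproduce the direct formula (the constant −(d/2) log 2π is dropped from both, matching the ≃ step above):

```python
import numpy as np

def quadratic_params(m, S, prior):
    """Coefficients of g(x) = x^T W x + w^T x + w0 for one class."""
    Sinv = np.linalg.inv(S)
    W = -0.5 * Sinv
    w = Sinv @ m
    w0 = -0.5 * (np.log(np.linalg.det(S)) + m @ Sinv @ m) + np.log(prior)
    return W, w, w0

m = np.array([1.0, 2.0])            # made-up class mean
S = np.array([[1.5, 0.3],
              [0.3, 1.0]])          # made-up class covariance
prior = 0.4

W, w, w0 = quadratic_params(m, S, prior)

x = np.array([0.5, -1.0])
g_expanded = x @ W @ x + w @ x + w0
diff = x - m
g_direct = (-0.5 * np.log(np.linalg.det(S))
            - 0.5 * diff @ np.linalg.solve(S, diff)
            + np.log(prior))
# g_expanded equals g_direct up to floating point.
```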

(Figure slide: likelihoods and the posterior for C_i.)

Simplification: Shared Covariance

Share a common sample covariance S:

    S = Σ_i P̂(C_i) S_i

The discriminant simplifies to

    g_i(x) = −(1/2) (x − m_i)^T S^{−1} (x − m_i) + log P̂(C_i)

Although this function is quadratic in x, it yields a linear discriminant because the x^T S^{−1} x quadratic term is identical across all i.
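A sketch on synthetic two-class data: pool the class covariances with prior weights as above, then confirm that the quadratic term cancels between classes, so g_1(x) − g_2(x) is a linear function w^T x + b:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic two-class data with a genuinely shared covariance.
X1 = rng.multivariate_normal([0, 0], [[1, 0.3], [0.3, 1]], size=300)
X2 = rng.multivariate_normal([2, 1], [[1, 0.3], [0.3, 1]], size=200)
priors = np.array([300, 200]) / 500

# Prior-weighted pooled covariance (bias=True gives the 1/N estimate).
S = priors[0] * np.cov(X1.T, bias=True) + priors[1] * np.cov(X2.T, bias=True)
Sinv = np.linalg.inv(S)
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

def g(x, m, prior):
    d = x - m
    return -0.5 * d @ Sinv @ d + np.log(prior)

# The x^T Sinv x term cancels in g1 - g2, leaving a linear decision function:
w = Sinv @ (m1 - m2)
b = -0.5 * (m1 @ Sinv @ m1 - m2 @ Sinv @ m2) + np.log(priors[0] / priors[1])

x = np.array([1.0, 0.5])
# g(x, m1, priors[0]) - g(x, m2, priors[1]) equals w @ x + b
```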


Linear Discriminant

(Figure slide.)

Further Simplification: Independence

If we share a common sample covariance S and the variables are independent, then the off-diagonal elements of S are zero. The discriminant simplifies to

    g_i(x) = −(1/2) Σ_{j=1}^d ((x_j − m_ij) / s_j)² + log P̂(C_i)

This is the Naive Bayes classifier:

  • Each variable is an independent Gaussian.
  • Distance is measured in standard-deviation units.
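A minimal sketch of this discriminant; the class means, shared per-variable standard deviations, and priors below are made-up values, not estimates from data:

```python
import numpy as np

def naive_bayes_g(x, m_i, s, prior_i):
    """Naive Bayes discriminant with shared per-variable std devs s."""
    z = (x - m_i) / s                    # distance in standard-deviation units
    return -0.5 * np.sum(z ** 2) + np.log(prior_i)

# Hypothetical two-class setup with d = 3 independent variables.
s = np.array([1.0, 2.0, 0.5])
means = [np.array([0.0, 0.0, 0.0]), np.array([1.0, 2.0, 1.0])]
priors = [0.5, 0.5]

x = np.array([0.9, 1.8, 0.8])
scores = [naive_bayes_g(x, m, s, p) for m, p in zip(means, priors)]
label = int(np.argmax(scores))           # x is nearer class 1's mean in s-units
```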

Diagonal S

Ellipsoids are aligned with axes.


Further Simplification: Equal Variances

If the variances are also equal, the discriminant simplifies to

    g_i(x) = −(1/2) Σ_{j=1}^d ((x_j − m_ij) / s)² + log P̂(C_i)

This is the nearest-mean classifier.


Model Selection

    Assumption              Covariance matrix     # Parameters
    Equal variances         S_i = S = s²I         1
    Independent             S_i = S, s_ij = 0     d
    Shared covariance       S_i = S               d(d+1)/2
    Different covariances   S_i                   K·d(d+1)/2

Binary Features

x_j ∈ {0, 1},  p_ij = p(x_j = 1 | C_i)

If the x_j are independent (Naive Bayes):

    p(x|C_i) = Π_{j=1}^d p_ij^{x_j} (1 − p_ij)^{1 − x_j}

The discriminant is linear:

    g_i(x) = Σ_j [ x_j log p̂_ij + (1 − x_j) log (1 − p̂_ij) ] + log P̂(C_i)

Discrete Features

x_j ∈ {v_1, v_2, ..., v_{n_j}},  p_ijk = p(z_jk = 1 | C_i) = p(x_j = v_k | C_i)

If the x_j are independent:

    p(x|C_i) = Π_{j=1}^d Π_{k=1}^{n_j} p_ijk^{z_jk}

    g_i(x) = Σ_j Σ_k z_jk log p̂_ijk + log P̂(C_i)

Multivariate Regression

    r^t = g(x^t | w_0, w_1, ..., w_d) + ε

Multivariate linear model:

    w_0 + w_1 x_1^t + w_2 x_2^t + ... + w_d x_d^t

Error:

    E(w|X) = (1/2) Σ_t [ r^t − (w_0 + w_1 x_1^t + w_2 x_2^t + ... + w_d x_d^t) ]²

    D = | 1  x_1^1  x_2^1  ...  x_d^1 |        r = | r^1 |
        | 1  x_1^2  x_2^2  ...  x_d^2 |            | r^2 |
        | ...  ...   ...   ...   ...  |            | ... |
        | 1  x_1^N  x_2^N  ...  x_d^N |            | r^N |

    (D^T D) w = D^T r   ⟹   w = (D^T D)^{−1} D^T r
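The normal equations translate directly into NumPy; the sketch below fits a made-up linear model on synthetic data and solves (D^T D) w = D^T r:

```python
import numpy as np

rng = np.random.default_rng(3)
N, d = 50, 3
X = rng.normal(size=(N, d))
true_w = np.array([2.0, 1.0, -1.0, 0.5])   # [w0, w1, w2, w3], made up for the demo
r = true_w[0] + X @ true_w[1:] + 0.01 * rng.normal(size=N)

D = np.hstack([np.ones((N, 1)), X])        # N x (d+1) design matrix
w = np.linalg.solve(D.T @ D, D.T @ r)      # normal equations (D^T D) w = D^T r
```

In practice `np.linalg.lstsq(D, r)` (QR-based) gives the same solution with better conditioning than forming D^T D explicitly.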


Multivariate Regression

    (D^T D) w = D^T r   ⟹   w = (D^T D)^{−1} D^T r

The solution is the same as for univariate polynomial regression, but using the distinct variables instead of the different powers of a single variable.