Multivariate normal distribution

SLIDE 1

DataCamp Multivariate Probability Distributions in R

Multivariate normal distribution

MULTIVARIATE PROBABILITY DISTRIBUTIONS IN R

Surajit Ray

Reader, University of Glasgow

SLIDE 2

Univariate normal distribution

Univariate normal with mean 2 and variance 1
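The curve on this slide is the normal density with mean μ = 2 and variance σ² = 1. As a minimal base-R sketch (normal_density is a hypothetical helper written only for illustration), the closed-form density $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ can be evaluated directly and compared with dnorm(). Note that dnorm() takes the standard deviation, not the variance.

```r
# Hypothetical helper: univariate normal density written out from the formula
normal_density <- function(x, mu, sigma2) {
  exp(-(x - mu)^2 / (2 * sigma2)) / sqrt(2 * pi * sigma2)
}

x <- seq(-2, 6, length.out = 200)
by_hand <- normal_density(x, mu = 2, sigma2 = 1)
# dnorm() is parameterised by the standard deviation: sd = sqrt(variance)
builtin <- dnorm(x, mean = 2, sd = 1)

max(abs(by_hand - builtin))  # essentially zero
plot(x, builtin, type = "l", ylab = "density",
     main = "Univariate normal, mean 2, variance 1")
```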

SLIDE 3

Density shape of a bivariate normal

SLIDE 4

Bivariate normal density - 3D density plot

$\mu = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 2 \end{pmatrix}$

SLIDE 5

Bivariate normal density - contour plot

$\mu = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 2 \end{pmatrix}$
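A contour plot like this one can be drawn in base R by evaluating the density on a grid and passing it to contour(). The sketch below assumes the same μ = (1, 2)ᵀ and Σ as the 3D plot; bvn_density is a hypothetical helper that writes out the bivariate formula with the quadratic form expanded term by term, and outer() evaluates it at every grid point.

```r
# Hypothetical helper: bivariate normal density from the closed-form formula
bvn_density <- function(x, y, mu, Sigma) {
  Sinv <- solve(Sigma)
  dx <- x - mu[1]
  dy <- y - mu[2]
  # Quadratic form (x - mu)' Sigma^{-1} (x - mu), vectorised over x and y
  q <- Sinv[1, 1] * dx^2 + 2 * Sinv[1, 2] * dx * dy + Sinv[2, 2] * dy^2
  exp(-q / 2) / (2 * pi * sqrt(det(Sigma)))
}

mu <- c(1, 2)
Sigma <- matrix(c(1, 0.5, 0.5, 2), 2)
xs <- seq(-3, 5, length.out = 100)
ys <- seq(-3, 7, length.out = 100)
# outer() evaluates the density at every (x, y) pair of the grid
z <- outer(xs, ys, bvn_density, mu = mu, Sigma = Sigma)
contour(xs, ys, z, xlab = "x", ylab = "y")
```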

SLIDE 6

Bivariate normal density with a different mean

$\mu = \begin{pmatrix} -1 \\ -3 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 2 \end{pmatrix}$

SLIDE 7

Bivariate normal density with a different variance

$\mu = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$

SLIDE 8

Bivariate normal density with strong correlation

$\mu = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 1 & 0.95 \\ 0.95 & 1 \end{pmatrix}$
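How strongly the two variables move together is easier to read from the correlation ρ = σ₁₂ / √(σ₁₁ σ₂₂) than from the raw covariance entries. Base R's cov2cor() performs this rescaling; comparing the Σ of the earlier slides with the one on this slide is a quick sketch of the difference.

```r
sigma_moderate <- matrix(c(1, 0.5, 0.5, 2), 2)
sigma_strong   <- matrix(c(1, 0.95, 0.95, 1), 2)

# cov2cor() divides each entry by the product of the corresponding standard deviations
cov2cor(sigma_moderate)[1, 2]  # 0.5 / sqrt(1 * 2), about 0.354
cov2cor(sigma_strong)[1, 2]    # unit variances, so the correlation is 0.95 itself
```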

SLIDE 9

Functions for statistical distributions in R

SLIDE 10

Functions for statistical distributions in R

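The first letter denotes the type of function:
- p for "probability" (the cumulative distribution function)
- q for "quantile" (the inverse CDF)
- d for "density"
- r for "random" (random number generation)

It is followed by the distribution name: norm and t for the univariate normal and t (base R, stats package), mvnorm and mvt for their multivariate versions (mvtnorm package).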
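The naming scheme can be seen with the univariate normal from base R, which provides all four prefixes. The same pattern carries over to mvtnorm's dmvnorm(), pmvnorm(), qmvnorm() and rmvnorm(), and to dmvt(), pmvt(), qmvt() and rmvt() for the multivariate t.

```r
# d: density at a point
dnorm(0)            # 1 / sqrt(2 * pi), about 0.399
# p: cumulative probability P(X <= 0)
pnorm(0)            # 0.5 by symmetry
# q: quantile, the inverse of p
qnorm(pnorm(1.5))   # recovers 1.5
# r: random draws
set.seed(1)
draws <- rnorm(5)
```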

SLIDE 11

The rmvnorm function

Need to specify:
- n: the number of samples
- mean: the mean vector of the distribution
- sigma: the variance-covariance matrix

library(mvtnorm)
rmvnorm(n, mean, sigma)
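To see what a multivariate normal sampler has to do, one standard construction is: draw an n × p matrix Z of independent N(0, 1) values and set X = Z R + 1μᵀ, where R is the upper-triangular Cholesky factor with RᵀR = Σ. The base-R sketch below uses that construction; rmvnorm_chol is a hypothetical helper, and mvtnorm's own rmvnorm() defaults to an eigendecomposition (method = "eigen") rather than Cholesky, so its draws will differ for the same seed.

```r
# Hypothetical helper: multivariate normal sampling via the Cholesky factor
rmvnorm_chol <- function(n, mean, sigma) {
  p <- length(mean)
  R <- chol(sigma)                       # upper triangular, t(R) %*% R == sigma
  Z <- matrix(rnorm(n * p), nrow = n)    # n x p independent N(0, 1) draws
  # Each row z %*% R has covariance t(R) %*% R = sigma; then shift by the mean
  Z %*% R + matrix(mean, nrow = n, ncol = p, byrow = TRUE)
}

set.seed(34)
x <- rmvnorm_chol(n = 100000, mean = c(1, 2, -5),
                  sigma = matrix(c(1, 1, 0, 1, 2, 0, 0, 0, 5), 3, 3))
round(colMeans(x), 2)  # close to 1, 2, -5
round(cov(x), 2)       # close to sigma
```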

SLIDE 12

Using rmvnorm to generate random samples

Generate 1000 samples from a 3-dimensional normal with $\mu = \begin{pmatrix} 1 \\ 2 \\ -5 \end{pmatrix}$ and $\Sigma = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 5 \end{pmatrix}$

library(mvtnorm)
mu1 <- c(1, 2, -5)
sigma1 <- matrix(c(1, 1, 0,
                   1, 2, 0,
                   0, 0, 5), 3, 3)
set.seed(34)
rmvnorm(n = 1000, mean = mu1, sigma = sigma1)
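rmvnorm() needs sigma to be a valid covariance matrix: symmetric and positive (semi-)definite. A quick base-R check on sigma1 before sampling might look like this; it is a generic sketch, not a step the slide requires.

```r
sigma1 <- matrix(c(1, 1, 0,
                   1, 2, 0,
                   0, 0, 5), 3, 3)
# matrix() fills column by column; for a symmetric matrix this gives the
# same result as reading the layout above row by row
isSymmetric(sigma1)                         # TRUE
# Positive definite: every eigenvalue must be strictly positive
ev <- eigen(sigma1, symmetric = TRUE)$values
ev                                          # 5, (3 + sqrt(5))/2, (3 - sqrt(5))/2
# chol() stops with an error if the matrix is not positive definite
R <- chol(sigma1)
```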

SLIDE 13

Plot of generated samples

SLIDE 14

Let's practice simulating from a multivariate normal distribution!

SLIDE 15

Density of a multivariate normal distribution

SLIDE 16

Why calculate the density of a distribution?

SLIDE 17

Why calculate the density of a distribution?

SLIDE 18

Univariate normal functions dnorm()

SLIDE 19

Probability density of a bivariate normal

Standard bivariate normal with $\mu = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$

Density heights calculated at several locations (x, y coordinates)
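With μ = 0 and Σ = I the two coordinates are independent, so the density factorises into a product of two univariate standard normal densities, $f(x, y) = \frac{1}{2\pi}\exp\left(-\frac{x^2 + y^2}{2}\right)$. A small base-R sketch (std_bvn is a hypothetical helper) evaluates a few of these heights; the locations below are illustrative and not necessarily those on the slide.

```r
# Hypothetical helper: standard bivariate normal density (mu = 0, Sigma = I)
std_bvn <- function(x, y) exp(-(x^2 + y^2) / 2) / (2 * pi)

std_bvn(0, 0)     # 1 / (2 * pi), about 0.159 (the peak)
std_bvn(1, 1)     # exp(-1) / (2 * pi), about 0.0586
# Independence: same value as the product of two univariate densities
dnorm(1) * dnorm(1)
```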

SLIDE 20

Density using dmvnorm

x can be a vector (a single point) or a matrix with one point per row

library(mvtnorm)
# General form: dmvnorm(x, mean, sigma)
mu1 <- c(1, 2)
sigma1 <- matrix(c(1, .5, .5, 2), 2)
dmvnorm(x = c(0, 0), mean = mu1, sigma = sigma1)
[1] 0.0384
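The value 0.0384 can be checked by hand from the multivariate normal density of a d-dimensional vector x, using the numbers on this slide (d = 2, x = (0, 0)ᵀ, μ = (1, 2)ᵀ):

```latex
f(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}
  \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol\mu)^\top \Sigma^{-1} (\mathbf{x}-\boldsymbol\mu)\right)

|\Sigma| = 1 \cdot 2 - 0.5^2 = 1.75, \qquad
\Sigma^{-1} = \frac{1}{1.75}\begin{pmatrix} 2 & -0.5 \\ -0.5 & 1 \end{pmatrix}, \qquad
\mathbf{x}-\boldsymbol\mu = \begin{pmatrix} -1 \\ -2 \end{pmatrix}

(\mathbf{x}-\boldsymbol\mu)^\top \Sigma^{-1} (\mathbf{x}-\boldsymbol\mu)
  = \frac{2(-1)^2 - 2(0.5)(-1)(-2) + (-2)^2}{1.75} = \frac{4}{1.75} = \frac{16}{7}

f(\mathbf{x}) = \frac{e^{-8/7}}{2\pi\sqrt{1.75}} \approx \frac{0.3189}{8.312} \approx 0.0384
```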

SLIDE 21

Density at multiple points using dmvnorm

x <- rbind(c(0, 0), c(1, 1), c(0, 1))
x
     [,1] [,2]
[1,]    0    0
[2,]    1    1
[3,]    0    1
dmvnorm(x = x, mean = mu1, sigma = sigma1)
[1] 0.0384 0.0904 0.0679
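The same densities can be reproduced without mvtnorm as a cross-check. This base-R sketch assumes the standard density formula and uses stats::mahalanobis(), which returns the squared distance $(\mathbf{x}-\boldsymbol\mu)^\top \Sigma^{-1} (\mathbf{x}-\boldsymbol\mu)$ for each row; dmvnorm_base is a hypothetical name, not part of any package.

```r
# Hypothetical helper: density at each row of x, via stats::mahalanobis()
dmvnorm_base <- function(x, mean, sigma) {
  x <- matrix(x, ncol = length(mean))     # a single point becomes a 1-row matrix
  d <- length(mean)
  q <- mahalanobis(x, center = mean, cov = sigma)
  exp(-q / 2) / sqrt((2 * pi)^d * det(sigma))
}

mu1 <- c(1, 2)
sigma1 <- matrix(c(1, .5, .5, 2), 2)
x <- rbind(c(0, 0), c(1, 1), c(0, 1))
round(dmvnorm_base(x, mean = mu1, sigma = sigma1), 4)  # 0.0384 0.0904 0.0679
```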

SLIDE 22

Plotting bivariate densities with perspective plot

Steps:
1. Create a grid of x and y coordinates
2. Calculate the density on the grid

SLIDE 23

Plotting bivariate densities with perspective plot

Steps:
1. Create a grid of x and y coordinates
2. Calculate the density on the grid
3. Convert the densities into a matrix
4. Create a perspective plot using the persp() function

SLIDE 24

Code for plotting bivariate densities

library(mvtnorm)
# Create grid of x and y coordinates
xs <- seq(-3, 6, length.out = 50)
ys <- seq(-3, 6, length.out = 50)
d <- expand.grid(xs, ys)
# Calculate density on grid
dens1 <- dmvnorm(as.matrix(d), mean = c(1, 2),
                 sigma = matrix(c(1, .5, .5, 2), 2))
# Convert to matrix (rows index x, columns index y)
dens1 <- matrix(dens1, nrow = 50)
# Use perspective plot; passing xs and ys puts the axes in data units
persp(xs, ys, dens1, theta = 80, phi = 30, expand = 0.6, shade = 0.2,
      col = "lightblue", xlab = "x", ylab = "y", zlab = "dens")

SLIDE 25

Changing viewing angle in perspective plot

persp() with theta = 30, phi = 30
persp() with theta = 80, phi = 10
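To compare viewing angles side by side, the two persp() calls can share one plotting window via par(mfrow = c(1, 2)). The sketch below computes the density on the grid in base R (via mahalanobis()) so that it runs without mvtnorm; the grid and parameters follow the earlier code, and the side-by-side layout is an illustrative choice.

```r
# Density of N(mu, Sigma) on a grid, computed in base R
mu <- c(1, 2)
Sigma <- matrix(c(1, .5, .5, 2), 2)
xs <- seq(-3, 6, length.out = 50)
ys <- seq(-3, 6, length.out = 50)
grid <- as.matrix(expand.grid(xs, ys))
z <- exp(-mahalanobis(grid, mu, Sigma) / 2) / (2 * pi * sqrt(det(Sigma)))
z <- matrix(z, nrow = length(xs))

# theta sets the azimuthal viewing direction, phi the colatitude (tilt)
op <- par(mfrow = c(1, 2))
persp(xs, ys, z, theta = 30, phi = 30, col = "lightblue",
      xlab = "x", ylab = "y", zlab = "dens", main = "theta = 30, phi = 30")
persp(xs, ys, z, theta = 80, phi = 10, col = "lightblue",
      xlab = "x", ylab = "y", zlab = "dens", main = "theta = 80, phi = 10")
par(op)
```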

SLIDE 26

Let's practice!

SLIDE 27

Cumulative Distribution and Inverse CDF

SLIDE 28

When do we need to calculate CDF and inverse CDF?

SLIDE 29

When do we need to calculate CDF and inverse CDF?

Normal density with μ = 210 and σ = 10

SLIDE 30

When do we need to calculate CDF and inverse CDF?

Area under the curve for x < 200

SLIDE 31

When do we need to calculate CDF and inverse CDF?

pnorm(200, mean = 210, sd = 10)
[1] 0.159
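pnorm(200, mean = 210, sd = 10) equals the standard normal CDF evaluated at the z-score z = (200 − 210) / 10 = −1, so 200 g lies one standard deviation below the mean. A short base-R check:

```r
# Standardise: z = (x - mu) / sigma, then use the standard normal CDF
z <- (200 - 210) / 10                  # -1
pnorm(z)                               # same as pnorm(200, mean = 210, sd = 10), about 0.159
# Complement: proportion of jars above 200 g
1 - pnorm(200, mean = 210, sd = 10)    # about 0.841
```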

SLIDE 32

When do we need to calculate CDF and inverse CDF?

What is the x such that the cumulative probability at x is 0.95? ⇒ 95% of the coffee jars will have less than 226.45 grams of coffee

qnorm(p = 0.95, mean = 210, sd = 10)
[1] 226.45
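The quantile can likewise be written through the standard normal: x₀.₉₅ = μ + σ z₀.₉₅ with z₀.₉₅ ≈ 1.645, giving 210 + 10 × 1.645 ≈ 226.45. A base-R sketch of that identity, plus the round trip through pnorm():

```r
# The quantile is mu + sigma * z_0.95, where z_0.95 is the standard normal quantile
z95 <- qnorm(0.95)        # about 1.645
210 + 10 * z95            # about 226.45, same as qnorm(0.95, mean = 210, sd = 10)
# qnorm() inverts pnorm(): plugging the quantile back in returns 0.95
pnorm(qnorm(0.95, mean = 210, sd = 10), mean = 210, sd = 10)
```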

SLIDE 33

Cumulative distribution for a bivariate normal

Bivariate CDF at x = 2 and y = 4 for a normal with $\mu = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 2 \end{pmatrix}$

SLIDE 34

Cumulative distribution using pmvnorm

Bivariate CDF at x = 2 and y = 4 for a normal with $\mu = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 1 & 0.5 \\ 0.5 & 2 \end{pmatrix}$

mu1 <- c(1, 2)
sigma1 <- matrix(c(1, 0.5, 0.5, 2), 2)
pmvnorm(upper = c(2, 4), mean = mu1, sigma = sigma1)
[1] 0.79
attr(,"error")
[1] 1e-15
attr(,"msg")
[1] "Normal Completion"
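The value 0.79 is P(X ≤ 2, Y ≤ 4). A crude way to cross-check it without pmvnorm() (which computes the probability numerically and reports an error estimate) is Monte Carlo: simulate many draws and count the fraction that falls in the region. This base-R sketch draws via a Cholesky factor, an assumption of the example rather than what pmvnorm() does.

```r
set.seed(1)
mu1 <- c(1, 2)
sigma1 <- matrix(c(1, 0.5, 0.5, 2), 2)
n <- 200000
# Rows of Z %*% chol(sigma1) have covariance sigma1
draws <- matrix(rnorm(n * 2), ncol = 2) %*% chol(sigma1)
draws <- sweep(draws, 2, mu1, "+")          # add the mean to each column
# Fraction of draws with x <= 2 and y <= 4 estimates the bivariate CDF
p_hat <- mean(draws[, 1] <= 2 & draws[, 2] <= 4)
p_hat                                        # close to 0.79
# Rough Monte Carlo standard error of the estimate
sqrt(p_hat * (1 - p_hat) / n)
```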

SLIDE 35

Probability between two values using pmvnorm

Probability of 1 < x < 2 and 2 < y < 4

pmvnorm(lower = c(1, 2), upper = c(2, 4), mean = mu1, sigma = sigma1)

SLIDE 36

Probability between two values using pmvnorm

Probability of 1 < x < 2 and 2 < y < 4

pmvnorm(lower = c(1, 2), upper = c(2, 4), mean = mu1, sigma = sigma1)
[1] 0.163
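A rectangle probability follows from the CDF F by inclusion-exclusion, which is one way to see what pmvnorm() with both lower and upper computes (for a continuous distribution, < and ≤ give the same probability):

```latex
P(a_1 < X \le b_1,\ a_2 < Y \le b_2) = F(b_1, b_2) - F(a_1, b_2) - F(b_1, a_2) + F(a_1, a_2)

0.163 \approx F(2, 4) - F(1, 4) - F(2, 2) + F(1, 2)
```

Here the lower corner (1, 2) is the mean itself, so F(1, 2) is an orthant probability with the closed form 1/4 + arcsin(ρ)/(2π), where ρ = 0.5/√2 ≈ 0.354, giving about 0.3075; the other three terms need numerical integration.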

SLIDE 37

Inverse CDF for bivariate normal

Dark red ellipse is the 0.95 quantile

SLIDE 38

Implementing qmvnorm to calculate quantiles

With tail = "both", the quantile 2.24 is the half-width of the square −2.24 < x < 2.24, −2.24 < y < 2.24, which contains 0.95 of the probability

sigma1 <- diag(2)
sigma1
     [,1] [,2]
[1,]    1    0
[2,]    0    1
qmvnorm(p = 0.95, sigma = sigma1, tail = "both")
$quantile
[1] 2.24

$f.quantile
[1] -1.31e-06

attr(,"message")
[1] "Normal Completion"
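With tail = "both", qmvnorm() searches for an equicoordinate quantile: a single q with P(−q < X₁ < q, −q < X₂ < q) = p, so the region is a centred square. For Σ = I the coordinates are independent, which gives a closed form that this base-R sketch uses to confirm 2.24. The circle comparison relies on X₁² + X₂² following a chi-square distribution with 2 degrees of freedom.

```r
# For Sigma = I each interval (-q, q) must carry probability sqrt(0.95)
q_box <- qnorm((1 + sqrt(0.95)) / 2)
q_box                                  # about 2.24, matching qmvnorm()

# Check: probability of the square [-q, q] x [-q, q]
(pnorm(q_box) - pnorm(-q_box))^2       # 0.95

# A circle holding 0.95 of the probability needs a larger radius,
# sqrt(qchisq(0.95, df = 2)), since X1^2 + X2^2 is chi-square with 2 df
sqrt(qchisq(0.95, df = 2))             # about 2.45
```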

SLIDE 39

Let's practice!

SLIDE 40

Checking normality of multivariate data

SLIDE 41

Why check normality?

Classical statistical techniques that assume univariate/multivariate normality:
- Multivariate regression
- Discriminant analysis
- Model-based clustering
- Principal component analysis (PCA)
- Multivariate analysis of variance (MANOVA)

SLIDE 42

Review: univariate normality tests

If the values lie along the reference line, the distribution is close to normal

qqnorm(iris_raw[, 1])
qqline(iris_raw[, 1])

SLIDE 43

Review: univariate normality tests

If the values lie along the reference line, the distribution is close to normal. Deviation from the line might indicate:
- heavier tails
- skewness
- outliers
- clustered data

qqnorm(iris_raw[, 1])
qqline(iris_raw[, 1])
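qqnorm() plots the sorted data against standard normal quantiles taken at the plotting positions ppoints(n). The base-R sketch below makes that pairing explicit; it assumes iris_raw holds the same measurements as the built-in iris data, with Sepal.Length as column 1.

```r
# Assumption: iris_raw[, 1] corresponds to iris$Sepal.Length
y <- iris$Sepal.Length
n <- length(y)
# Theoretical quantiles at the plotting positions used by qqnorm()
theoretical <- qnorm(ppoints(n))
qq <- qqnorm(y, plot.it = FALSE)
# The i-th smallest observation is paired with the i-th theoretical quantile
all.equal(qq$x[order(y)], theoretical)   # TRUE
```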

SLIDE 44

qqnorm of all variables

uniPlot(iris_raw[, 1:4])

SLIDE 45

MVN package: multivariate normality test functions

Multivariate normality tests by:
- Mardia
- Henze-Zirkler
- Royston

Graphical approaches:
- chi-square Q-Q
- perspective plots
- contour plots

SLIDE 46

MVN package: multivariate normality test functions

Multivariate normality tests by:
- Mardia ✓
- Henze-Zirkler ✓
- Royston

Graphical approaches:
- chi-square Q-Q ✓
- perspective plots
- contour plots

SLIDE 47

Using mardiaTest to check multivariate normality

mardiaTest(iris_raw[, 1:4])
   Mardia's Multivariate Normality Test
   data : iris_raw[, 1:4]

   g1p            : 2.697
   chi.skew       : 67.43
   p.value.skew   : 4.758e-07

   g2p            : 23.74
   z.kurtosis     : -0.2301
   p.value.kurt   : 0.818

   chi.small.skew : 69.33
   p.value.small  : 2.342e-07

   Result         : Data are not multivariate normal.
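Mardia's test is built from two statistics: multivariate skewness $b_{1,p} = \frac{1}{n^2}\sum_{i}\sum_{j} m_{ij}^3$ and kurtosis $b_{2,p} = \frac{1}{n}\sum_{i} m_{ii}^2$, where $m_{ij} = (\mathbf{x}_i - \bar{\mathbf{x}})^\top S^{-1} (\mathbf{x}_j - \bar{\mathbf{x}})$. These correspond to g1p and g2p in the output above, and chi.skew = n · g1p / 6 (150 × 2.697 / 6 ≈ 67.43). The base-R sketch below uses a hypothetical helper, not the MVN implementation: it takes the divisor-n covariance from Mardia's definition and omits the small-sample correction, so its output may not match MVN digit for digit.

```r
# Hypothetical helper: Mardia's multivariate skewness and kurtosis in base R
mardia_stats <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X); p <- ncol(X)
  Xc <- scale(X, center = TRUE, scale = FALSE)    # centre each column
  S <- crossprod(Xc) / n                          # covariance with divisor n
  M <- Xc %*% solve(S) %*% t(Xc)                  # M[i, j] = m_ij
  g1p <- sum(M^3) / n^2                           # multivariate skewness
  g2p <- sum(diag(M)^2) / n                       # multivariate kurtosis
  chi_skew <- n * g1p / 6                         # ~ chi-square, df = p(p+1)(p+2)/6
  z_kurt <- (g2p - p * (p + 2)) / sqrt(8 * p * (p + 2) / n)  # ~ N(0, 1)
  list(g1p = g1p, g2p = g2p,
       p.value.skew = pchisq(chi_skew, df = p * (p + 1) * (p + 2) / 6,
                             lower.tail = FALSE),
       p.value.kurt = 2 * pnorm(-abs(z_kurt)))
}

mardia_stats(iris[, 1:4])
```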

SLIDE 48

Using qqplot from mardiaTest to check multivariate normality

mardiaTest(iris_raw[, 1:4], qqplot = TRUE)
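The chi-square Q-Q plot compares the squared Mahalanobis distances of the observations from the sample mean with chi-square quantiles on p degrees of freedom; under multivariate normality the points should lie close to the line y = x. A base-R sketch, assuming iris_raw holds the built-in iris measurements (the MVN plot may differ in details such as plotting positions):

```r
X <- as.matrix(iris[, 1:4])
n <- nrow(X); p <- ncol(X)
# Squared Mahalanobis distance of each observation from the sample mean
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))
# Under multivariate normality d2 is approximately chi-square with p df
qqplot(qchisq(ppoints(n), df = p), d2,
       xlab = "Chi-square quantiles", ylab = "Squared Mahalanobis distance")
abline(0, 1)
```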

SLIDE 49

Using hzTest to check multivariate normality

hzTest(iris_raw[, 1:4])
   Henze-Zirkler's Multivariate Normality Test
   data : iris_raw[, 1:4]

   HZ      : 2.333269
   p-value : 0

   Result  : Data are not multivariate normal.

SLIDE 50

Testing multivariate normality by species

mardiaTest(iris_raw[iris_raw$Species == "setosa", 1:4])
   Mardia's Multivariate Normality Test

   g1p            : 3.08
   chi.skew       : 25.7
   p.value.skew   : 0.177

   g2p            : 26.5
   z.kurtosis     : 1.29
   p.value.kurt   : 0.195

   chi.small.skew : 27.85973
   p.value.small  : 0.1127617

   Result         : Data are multivariate normal.

SLIDE 51

Let's make use of the tests for multivariate normality!
