Slide 1

“Advanced” topics from statistics

Anders Ringgaard Kristensen

Advanced Herd Management

Slide 2

Outline

  • Covariance and correlation
  • Random vectors and multivariate distributions
      • The multinomial distribution
      • The multivariate normal distribution
  • Hyper distributions and hyper parameters
  • Commonly used hyper distributions
  • Conjugate families

Slide 3

Covariance and correlation

Let X and Y be two random variables having expected values µX, µY and standard deviations σX and σY. The covariance between X and Y is defined as

  • Cov(X, Y) = σXY = E((X − µX)(Y − µY)) = E(XY) − µXµY

The correlation between X and Y is Corr(X, Y) = σXY / (σXσY). In particular we have Cov(X, X) = σX² and Corr(X, X) = 1. If X and Y are independent, then E(XY) = µXµY and therefore:

  • Cov(X, Y) = 0
  • Corr(X, Y) = 0
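A small Python sketch of the identities above (the data are assumed example values, not from the slides): it checks Cov(X, Y) = E(XY) − µXµY against a library estimate, and that Corr(X, X) = 1.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)   # y is correlated with x by construction

# Covariance via the identity E(XY) - mu_X * mu_Y (biased, i.e. divide by N)
cov_def = np.mean(x * y) - np.mean(x) * np.mean(y)
cov_np = np.cov(x, y, bias=True)[0, 1]   # the same quantity from numpy

corr_xx = np.corrcoef(x, x)[0, 1]        # Corr(X, X) should be 1

print(abs(cov_def - cov_np) < 1e-9)
print(abs(corr_xx - 1.0) < 1e-9)
```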

Slide 4

Random vectors I

Some experiments produce outcomes that are vectors. Such a vector is called a random vector. We write X = (X1 X2 … Xn)'. Each element Xi of X is a random variable having an expected value E(Xi) = µi and a variance Var(Xi) = σi². The covariance between two elements Xi and Xj is denoted σij. For convenience we may use the notation σii = σi².

Slide 5

Random vectors II

A random vector X = (X1 X2 … Xn)' has an expected value E(X) = µ, which is also a vector. It has a ”variance”, Σ, which is an n × n matrix. Σ is also called the variance-covariance matrix or just the covariance matrix. Since Cov(Xi, Xj) = Cov(Xj, Xi), we conclude that Σ is symmetric, i.e. σij = σji.

Slide 6

Random vectors III

Let X be a random vector of dimension n. Assume that E(X) = µ, and let Σ be the covariance matrix of X. Define Y = AX + b, where A is an m × n matrix and b is an m-dimensional vector. Then Y is an m-dimensional random vector with E(Y) = Aµ + b and covariance matrix AΣA' (compare with the corresponding rule for ordinary random variables).
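The rule E(Y) = Aµ + b, Cov(Y) = AΣA' can be checked by simulation; the matrices and vectors below are assumed example values, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.array([[1.0, 1.0],
              [2.0, -1.0]])
b = np.array([0.0, 3.0])

X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ A.T + b                   # Y = A X + b, applied to each sample row

mean_theory = A @ mu + b          # E(Y) = A mu + b
cov_theory = A @ Sigma @ A.T      # Cov(Y) = A Sigma A'

print(np.allclose(Y.mean(axis=0), mean_theory, atol=0.05))
print(np.allclose(np.cov(Y.T), cov_theory, atol=0.1))
```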

Slide 7

Multivariate distributions

The distribution of a random vector is called a multivariate distribution. Some multivariate distributions may be expressed by a certain function over the sample space. We shall consider two such common multivariate distributions:

  • The multinomial distribution (discrete)
  • The multivariate normal distribution (continuous)
Slide 8

The multinomial distribution I

Consider an experiment with categorical outcomes. Assume that there are k mutually exclusive and exhaustive outcomes.

  • Rolling a die → 1, 2, 3, 4, 5, 6 (k = 6)
  • Testing for somatic cell counts (SCC) at cow level →
      • SCC ≤ 200,000;
      • 200,000 < SCC ≤ 300,000;
      • 300,000 < SCC ≤ 400,000;
      • 400,000 < SCC (k = 4)

Assume that the probability of category i is pi and ∑i pi = 1. The experiment is repeated n times. Let X = (X1 X2 … Xk)' be a random vector defined so that Xi is the total number of outcomes belonging to category i. The sample space of the compound experiment is S = {x | xi ∈ {0, 1, …, n}, x1 + … + xk = n}

Slide 9

The multinomial distribution II

The random vector X is then said to have a multinomial distribution with parameters p = (p1 p2 … pk)' and n. The probability distribution for x ∈ S is

  • P(X = x) = n! / (x1! x2! ⋯ xk!) · p1^x1 p2^x2 ⋯ pk^xk

The expected value is E(X) = np. The covariance matrix Σ has diagonal elements σii = npi(1 − pi) and off-diagonal elements σij = −npipj.
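For the die example (k = 6, pi = 1/6) the multinomial moments E(Xi) = npi, Var(Xi) = npi(1 − pi) and Cov(Xi, Xj) = −npipj can be checked by simulation (an illustrative sketch, not part of the original slides).

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 60, np.full(6, 1 / 6)      # 60 rolls of a fair die per experiment

X = rng.multinomial(n, p, size=300_000)          # many repeated experiments

mean_theory = n * p                               # E(X) = n p
cov_theory = n * (np.diag(p) - np.outer(p, p))    # n p_i (1 - p_i) on the diagonal,
                                                  # -n p_i p_j off the diagonal

print(np.allclose(X.mean(axis=0), mean_theory, atol=0.1))
print(np.allclose(np.cov(X.T), cov_theory, atol=0.2))
```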

Slide 10

The multivariate normal distribution I

A k-dimensional random vector X with sample space S = Rᵏ has a multivariate normal distribution if it has a density function given as

  • f(x) = (2π)^(−k/2) |Σ|^(−1/2) exp(−½ (x − µ)' Σ⁻¹ (x − µ))

The expected value is E(X) = µ, and the covariance matrix is Σ.
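The density can be implemented directly; as a sanity check it should integrate to roughly 1 over a wide grid (µ and Σ below are assumed example values).

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Multivariate normal density at x for mean mu and covariance Sigma."""
    k = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (-k / 2) * np.linalg.det(Sigma) ** (-0.5)
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])

# Riemann sum over a wide 2-D grid; the total should be close to 1
step = 0.1
g = np.arange(-8, 8, step)
total = sum(mvn_pdf(np.array([a, b]), mu, Sigma) for a in g for b in g) * step**2
print(abs(total - 1.0) < 0.01)
```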

Slide 11

The multivariate normal distribution II

The density function of the 2-dimensional random vector X = (X1 X2)'. What is the sign of Cov(X1, X2)?

Slide 12

The multivariate normal distribution III

Conditional distribution of a subset:

  • Suppose that X = (X1 … Xk)' is N(µ, Σ) and we partition X into two sub-vectors X1 = (X1 … Xj)' and X2 = (Xj+1 … Xk)'. We partition the mean vector µ and the covariance matrix Σ accordingly.
  • Then X1 ∼ N(µ1, Σ11) and X2 ∼ N(µ2, Σ22)

Slide 13

The multivariate normal distribution IV

Conditional distribution, continued:

  • The matrix Σ12 = Σ'21 contains the covariances between the elements of the sub-vector X1 and the sub-vector X2.
  • Moreover, for X1 = x1 the conditional distribution of X2 is N(ν, Ψ), where
      • ν = µ2 + Σ21 Σ11⁻¹ (x1 − µ1)
      • Ψ = Σ22 − Σ21 Σ11⁻¹ Σ12
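A sketch of these conditioning formulas with assumed example numbers: a 3-dimensional normal is partitioned into X1 = (X_1) and X2 = (X_2, X_3), and the conditional mean µ2 + Σ21Σ11⁻¹(x1 − µ1) and conditional covariance Σ22 − Σ21Σ11⁻¹Σ12 are computed for an observed x1.

```python
import numpy as np

mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[1.0, 0.4, 0.2],
                  [0.4, 1.5, 0.3],
                  [0.2, 0.3, 2.0]])

j = 1                                        # X1 = (X_1), X2 = (X_2, X_3)
mu1, mu2 = mu[:j], mu[j:]
S11, S12 = Sigma[:j, :j], Sigma[:j, j:]
S21, S22 = Sigma[j:, :j], Sigma[j:, j:]

x1 = np.array([1.5])                         # observed value of X1

# Conditional mean: mu2 + S21 S11^-1 (x1 - mu1)
nu = mu2 + S21 @ np.linalg.solve(S11, x1 - mu1)
# Conditional covariance: S22 - S21 S11^-1 S12
cond_cov = S22 - S21 @ np.linalg.solve(S11, S12)

print(nu)        # pulled upward since x1 > mu1 and the covariances are positive
print(cond_cov)  # smaller variances than S22: observing X1 reduces uncertainty
```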

Slide 14

The multivariate normal distribution V

Example:

  • Let X1, X2, … X5 denote the first five lactations of a dairy cow.
  • It is then reasonable to assume that X = (X1 X2 … X5)' has a 5-dimensional normal distribution.
  • Having observed e.g. x1, x2 and x3 we can predict X4 and X5 according to the conditional formulas on the previous slide.

Slide 15

Hyper distributions: Motivation I

Until now, our approach to distributions has been under the assumption that a parameter has a fixed (but often unknown) value. When we for instance observe the number x of sows conceiving out of n inseminated, we have tested the hypothesis that x is drawn from a binomial distribution with parameter p = p0, where p0 is some fixed value (e.g. 0.84). For analysis of production results this is not really a problem.


Slide 16

Hyper distributions: Motivation II

Even though it is interesting to analyze whether an achieved conception rate is “satisfactory”, the real challenge is to use the information achieved for planning the future, and in such planning, predictions play an important role. If we mate 50 sows, how many can we expect to farrow in 115 days? This is a central question in production planning.

Slide 17

Hyper distributions: Motivation III

Prediction of the number of farrowings:

  • “Naïve”: Assume that you know the conception rate with certainty, e.g. p = 0.84. With 50 matings this gives an expected number of farrowings of np = 0.84 × 50 = 42.
  • “Semi-naïve”: Take the binomial variation into account. By use of the binomial probability function we can calculate the probability of any number of farrowings x ≤ 50.
  • “Correct”: Take the uncertainty about the true value of p into account as well.
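The “semi-naïve” calculation above can be sketched as follows, with p = 0.84 and n = 50 matings as on the slide: the binomial probability function gives the probability of every possible number of farrowings, and its mean recovers the naïve point prediction np = 42.

```python
import numpy as np
from math import comb

n, p = 50, 0.84
# Binomial pmf P(X = x) for every possible number of farrowings x
pmf = np.array([comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)])

expected = (np.arange(n + 1) * pmf).sum()   # mean of the distribution, n * p
print(round(expected, 2))                   # the naive point prediction, 42
```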

Slide 18

Binomial variation of prediction


Slide 19

Full variation of prediction

Slide 20

Can we take uncertainty into account?

In planning (prediction) we should take uncertainty into account. Can we do that? Yes, we can specify a distribution for the parameter (in this case the conception rate). Such a distribution for a parameter is called a hyper distribution.

Slide 21

Hyper distributions

In all kinds of planning, representation of uncertainty is important. We use hyper distributions to specify our belief in the true parameter value. Hyper distributions learn over time: as observations are collected, our uncertainty may decrease, and this is reflected in the hyper distribution. The parameters of a hyper distribution are called hyper parameters.


Slide 22

Conjugate families

Most standard distributions have corresponding families of hyper distributions that are suitable for representing our belief in parameter values. If the family of hyper distributions is closed under sampling it is called a conjugate family. To be ”closed under sampling” means that if our belief in a parameter value is represented by a distribution from a certain family, then the distribution after having taken further observations belongs to the same family.

Slide 23

Conjugate families: Binomial I

The conjugate family for the probability parameter p of a binomial distribution is the family of Beta distributions. The sample space of a Beta distribution is S = ]0; 1[. The Beta distribution has two parameters α and β. The density function is (where Γ is the gamma function):

  • f(p) = Γ(α + β) / (Γ(α)Γ(β)) · p^(α−1) (1 − p)^(β−1)

The expectation and variance are

  • E(p) = α / (α + β)
  • Var(p) = αβ / ((α + β)²(α + β + 1))

Slide 24

Conjugate families: Binomial II

Assume that our belief in the probability p of a binomial distribution is Beta(α, β) and we take a new observation from the binomial distribution in question. If we observe x successes out of n trials, the belief in p given the new observation is Beta(α + x, β + n − x). When we specify a Beta prior we may think of it this way:

  • What is our initial guess about the parameter? This guess is put equal to α/(α + β).
  • How certain are we about our guess? As if we had observed n = 10, 100, 1000 trials (cases)? We then put α + β = n, and from these two equations we can determine the parameters α and β.
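The elicitation recipe above can be sketched in code; the prior guess 0.84 with the certainty of 100 observed cases, and the observation x = 41 of n = 50, are assumed example numbers.

```python
# Elicit a Beta prior: initial guess 0.84, held with the certainty of
# having observed 100 cases (both assumed example numbers)
guess, n_cases = 0.84, 100
alpha = guess * n_cases          # from alpha / (alpha + beta) = guess
beta = n_cases - alpha           # from alpha + beta = n_cases

# Update with a hypothetical observation: x = 41 successes in n = 50 trials
x, n = 41, 50
alpha_post, beta_post = alpha + x, beta + n - x   # Beta(alpha + x, beta + n - x)

post_mean = alpha_post / (alpha_post + beta_post)
print(alpha, beta)               # the elicited prior parameters
print(post_mean)                 # lies between the prior guess and x / n
```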


Slide 25

Conjugate families: Poisson I

The conjugate family for the parameter λ of a Poisson distribution is the Gamma distribution with density function

  • f(λ) = β^α / Γ(α) · λ^(α−1) e^(−βλ)

The expectation and variance are

  • E(λ) = α / β
  • Var(λ) = α / β²

Slide 26

Conjugate families: Poisson II

Assume that our belief in the parameter λ of a Poisson distribution is Gamma(α, β) and we take new observations X1 = x1, …, Xn = xn from the Poisson distribution in question. Our belief in λ given the new observations is Gamma(α + ∑i xi, β + n). When we specify a Gamma prior we may think of it this way:

  • What is our initial guess about the parameter? This guess is put equal to α/β.
  • How certain are we about our guess? As if we had observed n = 10, 100, 1000 values? We then put β = n, and from these two equations we can determine the parameters α and β.
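The same recipe applies to the Gamma prior; the prior guess 2.0 held with the weight of 100 observed values, and the 20 simulated Poisson counts, are assumed example numbers.

```python
import numpy as np

# Elicit a Gamma prior for lambda: initial guess 2.0, held with the
# weight of 100 observed values (assumed example numbers)
guess, n_values = 2.0, 100
beta = float(n_values)           # from beta = n_values
alpha = guess * beta             # from alpha / beta = guess

# Update with n = 20 hypothetical new Poisson counts
rng = np.random.default_rng(3)
xs = rng.poisson(lam=2.0, size=20)

alpha_post = alpha + xs.sum()    # Gamma(alpha + sum of x_i, beta + n)
beta_post = beta + len(xs)

print(alpha_post / beta_post)    # posterior mean, still close to 2.0
```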

Slide 27

Conjugate priors, other distributions

  • Mean µ of a normal distribution → Normal distribution
  • Inverse variance (= precision) of a normal distribution → Gamma distribution
  • Parameter of an exponential distribution → Gamma distribution
  • Parameter of a uniform distribution → Pareto distribution
  • Parameters of a multinomial distribution → Dirichlet distribution