

SLIDE 1

Probability Review

Applied Bayesian Statistics

  • Dr. Earvin Balderama

Department of Mathematics & Statistics Loyola University Chicago

August 31, 2017


Applied Bayesian Statistics Last edited September 8, 2017 by Earvin Balderama <ebalderama@luc.edu>

SLIDE 2

Random Variables

Mathematically, a random variable is a function that maps a sample space into the real numbers: X : S → R. The sample space may be

1. countable (discrete), or
2. uncountable (continuous).

Example: 3 coin tosses.
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
We may want to create a random variable, X, defined as the number of tails, so that X ∈ {0, 1, 2, 3}.
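The coin-toss example can be verified by brute-force enumeration. A minimal Python sketch (illustrative; not part of the slides):

```python
from itertools import product
from collections import Counter

# Enumerate the sample space of 3 coin tosses: 8 equally likely outcomes.
sample_space = ["".join(seq) for seq in product("HT", repeat=3)]

# X = number of tails in each outcome.
counts = Counter(outcome.count("T") for outcome in sample_space)

# PMF of X: f(x) = Prob(X = x).
pmf = {x: n / len(sample_space) for x, n in sorted(counts.items())}
print(pmf)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```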


SLIDE 3

Probability

Mathematically, a probability function assigns numbers (between 0 and 1) to subsets of a sample space: P : B → [0, 1], ∀B ⊆ S. Two interpretations:

1. (Frequentist) Based on long-run relative frequencies of possible outcomes.
2. (Bayesian) Based on belief about how likely each possible outcome is.


SLIDE 4

Probability

Mathematically, a probability function assigns numbers (between 0 and 1) to subsets of a sample space: P : B → [0, 1], ∀B ⊆ S. Two interpretations:

1. (Frequentist) Based on long-run relative frequencies of possible outcomes.
2. (Bayesian) Based on belief about how likely each possible outcome is.

Regardless of interpretation, the same basic probability laws apply, e.g., P(A) ≥ 0, P(S) = 1, and P(A ∪ B) = P(A) + P(B) for mutually exclusive A and B.


SLIDE 5

Probability distributions

A probability distribution is a list of all possible values of a random variable and their corresponding probabilities.

1. Discrete random variable: probability mass function (PMF)
   PMF: f(x) = Prob(X = x) ≥ 0
   Mean: E(X) = Σ_x x f(x)
   Variance: V(X) = Σ_x [x − E(X)]² f(x)

2. Continuous random variable: probability density function (PDF)
   Prob(X = x) = 0 for all x
   PDF: f(x) ≥ 0, with Prob(X ∈ B) = ∫_B f(x) dx
   Mean: E(X) = ∫ x f(x) dx
   Variance: V(X) = ∫ [x − E(X)]² f(x) dx
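The discrete mean and variance formulas can be applied directly to any PMF stored as a table. A short Python sketch, reusing the 3-coin-toss PMF as a hypothetical example:

```python
# E(X) = sum_x x f(x) and V(X) = sum_x (x - E(X))^2 f(x),
# applied to the PMF of the number of tails in 3 fair coin tosses.
pmf = {0: 1 / 8, 1: 3 / 8, 2: 3 / 8, 3: 1 / 8}

mean = sum(x * f for x, f in pmf.items())
var = sum((x - mean) ** 2 * f for x, f in pmf.items())
print(mean, var)  # 1.5 0.75
```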


SLIDE 6

Parametric families of distributions

A statistical analysis typically proceeds by selecting a PMF (or PDF) that seems to match the distribution of a sample. We rarely know the PMF (or PDF) exactly, but we may assume it is from a parametric family of distributions and estimate the parameters.

1. Discrete random variables:
   Binomial (Bernoulli is a special case), Poisson, NegativeBinomial

2. Continuous random variables:
   Normal, Gamma (Exponential and χ² are special cases), InverseGamma, Beta (Uniform is a special case)


SLIDE 7

X ∼ Bernoulli(θ)

Only two outcomes (success/failure, 0/1, zero/nonzero, etc.), where θ is the probability of success. X ∈ {0, 1}.

PMF: f(x) = Prob(X = x) = 1 − θ if x = 0, and θ if x = 1
Mean: E(X) = Σ_x x f(x) = 0(1 − θ) + 1(θ) = θ
Variance: V(X) = Σ_x [x − θ]² f(x) = (0 − θ)²(1 − θ) + (1 − θ)²θ = θ(1 − θ)


SLIDE 8

X ∼ Binomial(n, θ)

X = number of “successes” in n independent “Bernoulli trials,” where θ is the probability of success on each trial. X ∈ {0, 1, . . . , n}.

PMF: f(x) = Prob(X = x) = (n choose x) θ^x (1 − θ)^(n−x)
Mean: E(X) = nθ
Variance: V(X) = nθ(1 − θ)
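The Binomial PMF and its mean can be checked numerically; `binomial_pmf` is a hypothetical helper name, and the values n = 10, θ = 0.3 are chosen only for illustration:

```python
from math import comb

def binomial_pmf(x, n, theta):
    """f(x) = C(n, x) * theta^x * (1 - theta)^(n - x)."""
    return comb(n, x) * theta**x * (1 - theta) ** (n - x)

n, theta = 10, 0.3
pmf = [binomial_pmf(x, n, theta) for x in range(n + 1)]

total = sum(pmf)                                  # should be 1
mean = sum(x * f for x, f in enumerate(pmf))      # should be n * theta
print(round(total, 6), round(mean, 6))  # 1.0 3.0
```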


SLIDE 9

X ∼ Poisson(λ)

X = number of events that occur in a unit of time. X ∈ {0, 1, . . .}.

PMF: f(x) = Prob(X = x) = λ^x e^(−λ) / x!
Mean: E(X) = λ
Variance: V(X) = λ

Note: Can be parameterized with λ = nθ, where θ is the expected number of events per unit time, so that E(X) = V(X) = nθ.


SLIDE 10

X ∼ NegativeBinomial(r, θ)

X = number of “failures” until r “successes” in a sequence of independent “Bernoulli trials,” where θ is the probability of success on each trial. X ∈ {0, 1, . . .}.

PMF: f(x) = Prob(X = x) = (x + r − 1 choose x) θ^r (1 − θ)^x
Mean: E(X) = r(1 − θ)/θ
Variance: V(X) = r(1 − θ)/θ²

Note: The geometric distribution is a special case: Geom(θ) = NB(1, θ).
Note: There are MANY different ways to specify the NB distribution. The important thing to note is that the NB is a discrete count distribution that is a more flexible model than the Poisson.
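The NB mean r(1 − θ)/θ can be checked the same way, truncating the infinite support; r = 3 and θ = 0.4 are arbitrary illustrative values:

```python
from math import comb

def negbin_pmf(x, r, theta):
    """f(x) = C(x + r - 1, x) * theta^r * (1 - theta)^x."""
    return comb(x + r - 1, x) * theta**r * (1 - theta) ** x

r, theta = 3, 0.4
xs = range(500)  # truncation; (1 - theta)^x is negligible well before x = 500
mean = sum(x * negbin_pmf(x, r, theta) for x in xs)
print(round(mean, 6))  # 4.5, which matches r(1 - theta)/theta
```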


SLIDE 11

X ∼ Normal(µ, σ2)

X ∈ (−∞, ∞).

PDF: f(x) = (1 / (√(2π) σ)) exp(−(1/2)((x − µ)/σ)²)
Mean: E(X) = µ
Variance: V(X) = σ²


SLIDE 12

X ∼ Gamma(a, b)

X ∈ (0, ∞).

PDF: f(x) = (b^a / Γ(a)) x^(a−1) e^(−bx)
Mean: E(X) = a/b
Variance: V(X) = a/b²
Parameters: shape a > 0, rate b > 0.
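The Gamma mean a/b can be checked by numerically integrating x·f(x); the grid, step size, and parameter values (a = 2, b = 0.5) below are arbitrary choices for illustration:

```python
from math import gamma, exp

def gamma_pdf(x, a, b):
    """f(x) = b^a / Gamma(a) * x^(a-1) * e^(-b x)."""
    return b**a / gamma(a) * x ** (a - 1) * exp(-b * x)

a, b = 2.0, 0.5
h = 0.001
# Midpoint rule on (0, 60); the tail beyond 60 is negligible for these parameters.
xs = [h * (i + 0.5) for i in range(60000)]
mean = sum(x * gamma_pdf(x, a, b) * h for x in xs)
print(round(mean, 3))  # 4.0, which matches a/b
```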


SLIDE 13

X ∼ InverseGamma(a, b)

If Y ∼ Gamma(a, b), then X = 1/Y ∼ InverseGamma(a, b). X ∈ (0, ∞).

PDF: f(x) = (b^a / Γ(a)) x^(−a−1) e^(−b/x)
Mean: E(X) = b/(a − 1), for a > 1
Variance: V(X) = b² / [(a − 1)²(a − 2)], for a > 2
Parameters: shape a > 0, rate b > 0.


SLIDE 14

X ∼ Beta(a, b)

X ∈ [0, 1].

PDF: f(x) = (Γ(a + b) / (Γ(a)Γ(b))) x^(a−1) (1 − x)^(b−1)
Mean: E(X) = a/(a + b)
Variance: V(X) = ab / [(a + b)²(a + b + 1)]
Parameters: a > 0, b > 0.
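The Beta PDF integrates to 1 and has mean a/(a + b); both can be checked numerically. The parameter values (a = 2, b = 3) and the midpoint grid are illustrative choices:

```python
from math import gamma

def beta_pdf(x, a, b):
    """f(x) = Gamma(a+b) / (Gamma(a) Gamma(b)) * x^(a-1) * (1-x)^(b-1)."""
    return gamma(a + b) / (gamma(a) * gamma(b)) * x ** (a - 1) * (1 - x) ** (b - 1)

a, b = 2.0, 3.0
h = 1e-4
xs = [h * (i + 0.5) for i in range(10000)]  # midpoint grid on (0, 1)

total = sum(beta_pdf(x, a, b) * h for x in xs)
mean = sum(x * beta_pdf(x, a, b) * h for x in xs)
print(round(total, 4), round(mean, 4))  # 1.0 0.4, with mean = a/(a + b)
```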


SLIDE 15

Joint distributions

A random vector of p random variables: X = (X1, X2, . . . , Xp). For now, suppose we have just p = 2 random variables, X and Y. (X, Y) can be discrete or continuous.


SLIDE 16

Joint distributions

1. Discrete (X, Y)
   joint PMF: f(x, y) = Prob(X = x, Y = y)
   marginal PMF for X: fX(x) = Prob(X = x) = Σ_y f(x, y)
   marginal PMF for Y: fY(y) = Prob(Y = y) = Σ_x f(x, y)

2. Continuous (X, Y)
   joint PDF: f(x, y), with Prob[(X, Y) ∈ B] = ∫∫_B f(x, y) dx dy
   marginal PDF for X: fX(x) = ∫ f(x, y) dy
   marginal PDF for Y: fY(y) = ∫ f(x, y) dx


SLIDE 17

Discrete random variables

Example: Patients are randomly assigned a dose and followed to determine whether they develop a tumor. X ∈ {5, 10, 20} is the dose; Y ∈ {0, 1} is 1 if a tumor develops and 0 otherwise. The joint PMF f(x, y) is given by

        X = 5   X = 10   X = 20
Y = 0   0.469   0.124    0.049
Y = 1   0.231   0.076    0.051


SLIDE 18

Discrete random variables

Example: Find the marginal PMFs of X and Y.


SLIDE 19

Discrete random variables

Example: Find the marginal PMFs of X and Y.

fY(0) = Σ_x f(x, 0) = 0.469 + 0.124 + 0.049 = 0.642
fY(1) = Σ_x f(x, 1) = 0.231 + 0.076 + 0.051 = 0.358


SLIDE 20

Discrete random variables

Example: Find the marginal PMFs of X and Y.

fY(0) = Σ_x f(x, 0) = 0.469 + 0.124 + 0.049 = 0.642
fY(1) = Σ_x f(x, 1) = 0.231 + 0.076 + 0.051 = 0.358
fX(5) = 0.7, fX(10) = 0.2, fX(20) = 0.1

        X = 5   X = 10   X = 20   fY(y)
Y = 0   0.469   0.124    0.049    0.642
Y = 1   0.231   0.076    0.051    0.358
fX(x)   0.7     0.2      0.1      1
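Marginalization amounts to summing the joint table over the other variable. A Python sketch of the same computation on the dose/tumor table:

```python
# Joint PMF from the slide, keyed by (x, y).
joint = {(5, 0): 0.469, (10, 0): 0.124, (20, 0): 0.049,
         (5, 1): 0.231, (10, 1): 0.076, (20, 1): 0.051}

f_X = {}
f_Y = {}
for (x, y), p in joint.items():
    f_X[x] = f_X.get(x, 0.0) + p  # f_X(x) = sum over y of f(x, y)
    f_Y[y] = f_Y.get(y, 0.0) + p  # f_Y(y) = sum over x of f(x, y)

print({x: round(p, 3) for x, p in f_X.items()})  # {5: 0.7, 10: 0.2, 20: 0.1}
print({y: round(p, 3) for y, p in f_Y.items()})  # {0: 0.642, 1: 0.358}
```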


SLIDE 21

Discrete random variables

conditional PMF of Y given X:

f(y | x) = Prob(Y = y | X = x) = Prob(X = x, Y = y) / Prob(X = x) = f(x, y) / fX(x)

conditional = joint / marginal


SLIDE 22

Discrete random variables

conditional PMF of Y given X:

f(y | x) = Prob(Y = y | X = x) = Prob(X = x, Y = y) / Prob(X = x) = f(x, y) / fX(x)

conditional = joint / marginal

Note: Here, x is treated as fixed, so f(y | x) is only a function of y.
Note: This is not Σ_x f(x, y) = fY(y), nor Σ_y f(x, y) = fX(x).
Note: To show that f(y | x) is a valid PMF:

Σ_y f(y | x) = Σ_y f(x, y) / fX(x) = [Σ_y f(x, y)] / fX(x) = fX(x) / fX(x) = 1


SLIDE 23

Discrete random variables

Example: Find f(y | x) and f(x | y). The joint PMF is given by

        X = 5   X = 10   X = 20   fY(y)
Y = 0   0.469   0.124    0.049    0.642
Y = 1   0.231   0.076    0.051    0.358
fX(x)   0.7     0.2      0.1      1



SLIDE 25

Discrete random variables

Example: Find f(y | x) and f(x | y). The joint PMF is given by

        X = 5   X = 10   X = 20   fY(y)
Y = 0   0.469   0.124    0.049    0.642
Y = 1   0.231   0.076    0.051    0.358
fX(x)   0.7     0.2      0.1      1

Prob(Y = 0 | X = 5) = 0.469/0.7 = 0.67
Prob(Y = 1 | X = 5) = 0.231/0.7 = 0.33
Prob(X = 5 | Y = 0) = 0.469/0.642 = 0.73
. . .
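The conditional probabilities follow the "conditional = joint / marginal" rule; a Python sketch on the same table:

```python
# Conditional PMFs from the joint table: f(y | x) = f(x, y) / f_X(x),
# and f(x | y) = f(x, y) / f_Y(y).
joint = {(5, 0): 0.469, (10, 0): 0.124, (20, 0): 0.049,
         (5, 1): 0.231, (10, 1): 0.076, (20, 1): 0.051}

f_X = {x: sum(p for (xx, y), p in joint.items() if xx == x) for x in (5, 10, 20)}
f_Y = {y: sum(p for (x, yy), p in joint.items() if yy == y) for y in (0, 1)}

p_y0_given_x5 = joint[(5, 0)] / f_X[5]  # f(y=0 | x=5)
p_x5_given_y0 = joint[(5, 0)] / f_Y[0]  # f(x=5 | y=0)
print(round(p_y0_given_x5, 2), round(p_x5_given_y0, 2))  # 0.67 0.73
```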


SLIDE 26

Continuous random variables

Example: Let X = birthweight and Y = gestational age, with X ∈ (2, 10) pounds and Y ∈ (20, 50) weeks. The joint PDF is given by f(x, y) = 0.26 exp(−|x − 7| − |y − 40|). Find Prob(X > 7, Y > 40).


SLIDE 27

Continuous random variables

Example: Let X = birthweight and Y = gestational age, with X ∈ (2, 10) pounds and Y ∈ (20, 50) weeks. The joint PDF is given by f(x, y) = 0.26 exp(−|x − 7| − |y − 40|). Find

Prob(X > 7, Y > 40) = ∫₄₀⁵⁰ ∫₇¹⁰ f(x, y) dx dy = . . . = 0.25
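The double integral can be approximated numerically; the midpoint rule below is one simple choice (grid resolution is arbitrary):

```python
from math import exp

def f(x, y):
    """Joint PDF from the slide: 0.26 * exp(-|x - 7| - |y - 40|)."""
    return 0.26 * exp(-abs(x - 7) - abs(y - 40))

# Midpoint-rule approximation of the double integral over (7, 10) x (40, 50).
hx, hy = 0.01, 0.01
xs = [7 + hx * (i + 0.5) for i in range(300)]
ys = [40 + hy * (j + 0.5) for j in range(1000)]
prob = sum(f(x, y) * hx * hy for x in xs for y in ys)
print(round(prob, 2))  # 0.25
```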


SLIDE 28

Continuous random variables

Example: Let X = birthweight and Y = gestational age, with X ∈ (2, 10) pounds and Y ∈ (20, 50) weeks. The joint PDF is given by f(x, y) = 0.26 exp(−|x − 7| − |y − 40|). Find fX(x).


SLIDE 29

Continuous random variables

Example: Let X = birthweight and Y = gestational age, with X ∈ (2, 10) pounds and Y ∈ (20, 50) weeks. The joint PDF is given by f(x, y) = 0.26 exp(−|x − 7| − |y − 40|). Find

fX(x) = ∫₂₀⁵⁰ f(x, y) dy = . . . = 0.52 e^(−|x−7|)


SLIDE 30

Continuous random variables

conditional PDF of Y given X:

f(y | x) = f(x, y) / fX(x)

conditional = joint / marginal


SLIDE 31

Continuous random variables

conditional PDF of Y given X:

f(y | x) = f(x, y) / fX(x)

conditional = joint / marginal

Note: Here, x is treated as fixed, so f(y | x) is only a function of y.
Note: This is not ∫ f(x, y) dx = fY(y), nor ∫ f(x, y) dy = fX(x).
Note: To show that f(y | x) is a valid PDF:

∫ f(y | x) dy = ∫ f(x, y) / fX(x) dy = [∫ f(x, y) dy] / fX(x) = fX(x) / fX(x) = 1


SLIDE 32

Continuous random variables

Example: Let X = birthweight and Y = gestational age, with X ∈ (2, 10) pounds and Y ∈ (20, 50) weeks. The joint PDF is given by f(x, y) = 0.26 exp(−|x − 7| − |y − 40|). Find f(y | x).


SLIDE 33

Bivariate normal distribution

The bivariate normal is the most common multivariate family. There are 5 parameters:

1. µX, the marginal mean of X.
2. µY, the marginal mean of Y.
3. σ²X, the marginal variance of X.
4. σ²Y, the marginal variance of Y.
5. ρXY, the correlation between X and Y.

The joint PDF is

f(x, y) = (1 / (2πσXσY√(1 − ρ²))) exp{ −[((x − µX)/σX)² + ((y − µY)/σY)² − 2ρ((x − µX)/σX)((y − µY)/σY)] / (2(1 − ρ²)) }


SLIDE 34

Bivariate normal distribution

Example: Suppose (X, Y) is bivariate normal with µX = µY = 0 and σX = σY = 1. Find the marginal distribution of X.


SLIDE 35

Bivariate normal distribution

Example: Suppose (X, Y) is bivariate normal with µX = µY = 0 and σX = σY = 1. Find the conditional distribution of Y given X.
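For the standard bivariate normal, a known result (not derived on this slide) is that Y | X = x ∼ Normal(ρx, 1 − ρ²). A Monte Carlo sketch that checks the conditional mean, with ρ = 0.6 chosen arbitrarily:

```python
import random

random.seed(1)
rho = 0.6
n = 200000

# Simulate standard bivariate normal pairs via Y = rho*X + sqrt(1 - rho^2)*Z,
# where X and Z are independent standard normals.
pairs = []
for _ in range(n):
    x = random.gauss(0, 1)
    z = random.gauss(0, 1)
    pairs.append((x, rho * x + (1 - rho**2) ** 0.5 * z))

# Condition on X near 1.0 and inspect the mean of Y.
ys = [y for x, y in pairs if abs(x - 1.0) < 0.05]
cond_mean = sum(ys) / len(ys)
print(round(cond_mean, 2))  # close to rho * 1.0 = 0.6
```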


SLIDE 36

Bayes’ theorem

Recall conditional distributions:

f(y | x) = f(x, y) / f(x)    (conditional = joint / marginal)

This can be extended to

f(y | x) = f(x, y) / f(x) = f(x | y) f(y) / f(x) = f(x | y) f(y) / Σ_{all y} f(x | y) f(y)

This is the form of the famous “Bayes’ theorem” (or “Bayes’ rule”).
Note: the denominator is simply a normalizing constant.
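Bayes' theorem in the discrete case can be illustrated with the dose/tumor table from earlier slides, treating fX as the prior and f(y | x) as the likelihood:

```python
# Bayes' rule: f(x | y) = f(y | x) f(x) / sum_x f(y | x) f(x).
# Prior f(x) and likelihoods f(y = 1 | x) come from the dose/tumor table.
prior = {5: 0.7, 10: 0.2, 20: 0.1}
lik_y1 = {5: 0.231 / 0.7, 10: 0.076 / 0.2, 20: 0.051 / 0.1}  # f(y=1 | x)

unnorm = {x: lik_y1[x] * prior[x] for x in prior}
norm = sum(unnorm.values())  # the normalizing constant f(y = 1)
posterior = {x: round(p / norm, 3) for x, p in unnorm.items()}
print(posterior)  # {5: 0.645, 10: 0.212, 20: 0.142}
```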


SLIDE 37

Bayes’ theorem

In a Bayesian data analysis, we select:

1. the prior f(θ), and
2. the likelihood f(y | θ).

Based on these, we must compute

3. the posterior f(θ | y).

Bayes’ theorem is the mathematical formula that converts the likelihood and prior to the posterior:

f(θ | y) = f(y | θ) f(θ) / f(y)

Posterior ∝ Likelihood × Prior

29

Applied Bayesian Statistics Last edited September 8, 2017 by Earvin Balderama <ebalderama@luc.edu>