SLIDE 1

Statistical Inference

Lecture 3: Common Families of Distributions

MING GAO
DASE @ ECNU
mgao@dase.ecnu.edu.cn (for course related communications)

Mar. 24, 2020
SLIDE 2

Outline

1 Discrete Distributions
2 Continuous Distributions
3 Exponential Family
4 Location and Scale Families
5 Take-aways

MING GAO (DaSE@ECNU) · Statistical Inference · Mar. 24, 2020 · 2 / 28

SLIDE 3

Discrete Distributions

Discrete distributions

A r.v. X is said to have a discrete distribution if the range of X is countable. In most situations, the r.v. has integer-valued outcomes.

Discrete uniform distribution: A r.v. X has a discrete uniform (1, N) distribution if

P(X = x \mid N) = \frac{1}{N}, \quad x = 1, 2, \cdots, N,

where N is a specified integer. This distribution puts equal mass on each of the outcomes 1, 2, · · · , N.

Recall that \sum_{i=1}^{k} i = \frac{k(k+1)}{2} and \sum_{i=1}^{k} i^2 = \frac{k(k+1)(2k+1)}{6}. Hence

E(X) = \sum_{x=1}^{N} x P(X = x \mid N) = \frac{N+1}{2};

Var(X) = E(X^2) - E(X)^2 = \frac{(N+1)(N-1)}{12}.

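These two closed forms are easy to verify by direct enumeration. A minimal Python sketch (not part of the slides; the function name is my own), using exact rational arithmetic:

```python
from fractions import Fraction

def discrete_uniform_moments(N):
    """Exact mean and variance of the discrete uniform(1, N) distribution."""
    p = Fraction(1, N)                                   # P(X = x) = 1/N for each x
    mean = sum(x * p for x in range(1, N + 1))
    var = sum((x - mean) ** 2 * p for x in range(1, N + 1))
    return mean, var

# Check against the closed forms (N+1)/2 and (N+1)(N-1)/12 for N = 10.
N = 10
mean, var = discrete_uniform_moments(N)
assert mean == Fraction(N + 1, 2)
assert var == Fraction((N + 1) * (N - 1), 12)
```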

SLIDE 5

Discrete Distributions

Bernoulli trials

Definition: Each performance of an experiment with two possible outcomes is called a Bernoulli trial. One outcome is called a success and the other a failure. If p is the probability of a success and q is the probability of a failure, then p + q = 1.

Let X = 1 on a success and X = 0 on a failure. Then

E(X) = 0 · (1 − p) + 1 · p = p and E(X^2) = 0^2 · (1 − p) + 1^2 · p = p;

Var(X) = E(X^2) − E(X)^2 = p(1 − p).


SLIDE 7

Discrete Distributions

Binomial distribution

Many problems reduce to finding the probability of k successes when an experiment consists of n mutually independent Bernoulli trials. Let the r.v. X_i record the outcome of the i-th trial (i = 1, 2, · · · , n):

X_i = 1 if the trial succeeds, with probability p; X_i = 0 otherwise, with probability 1 − p.

Let X = \sum_{i=1}^{n} X_i. Then

P(X = x \mid n, p) = \binom{n}{x} p^x (1 − p)^{n−x}, \quad x = 0, 1, 2, \cdots, n.

We call this function the binomial distribution, i.e., B(k; n, p) = P(X = k) = C(n, k) p^k q^{n−k}.


SLIDE 9

Discrete Distributions

Expected value of binomial r.v.s

Theorem: The expected number of successes when n mutually independent Bernoulli trials are performed, where p is the probability of success on each trial, is np.

Proof. Let X be the r.v. equal to the number of successes in n trials. We know that P(X = k) = C(n, k) p^k q^{n−k}. Hence, using k \binom{n}{k} = n \binom{n−1}{k−1},

E(X) = \sum_{k=0}^{n} k \cdot P(X = k)
     = \sum_{k=1}^{n} k \binom{n}{k} p^k q^{n−k}
     = \sum_{k=1}^{n} n \binom{n−1}{k−1} p^k q^{n−k}
     = np \sum_{j=0}^{n−1} \binom{n−1}{j} p^j q^{n−1−j}
     = np (p + q)^{n−1} = np.


SLIDE 11
Discrete Distributions

Variance of Binomial r.v.s

Question: Let r.v. X be the number of successes of n mutually independent Bernoulli trials, where p is the probability of success on each trial. What is the variance of X? Solution:

E(X 2) =

n

  • k=0

k2 · P(X = k) =

n

  • k=1

k(k − 1) · P(X = k) +

n

  • k=1

k · P(X = k) = n(n − 1)p2

n

  • k=2
  • n − 2

k − 2

  • pk−2qn−k + np

= n(n − 1)p2

n−2

  • j=0
  • n − 2

j

  • pjqn−2−j + np

= n(n − 1)p2(p + q)n−2 + np = n(n − 1)p2 + np, V (X) = E(X 2) − (E(X))2 = n(n − 1)p2 + np − (np)2 = np(1 − p).

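The pmf and the two moments E(X) = np and Var(X) = np(1 − p) derived above can be checked numerically. A small Python sketch (my own, not from the slides; `binom_pmf` is an illustrative name):

```python
from math import comb

def binom_pmf(k, n, p):
    # B(k; n, p) = C(n, k) p^k (1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 25, 0.6
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * q for k, q in zip(range(n + 1), pmf))
var = sum((k - mean) ** 2 * q for k, q in zip(range(n + 1), pmf))

assert abs(sum(pmf) - 1) < 1e-12          # pmf sums to 1
assert abs(mean - n * p) < 1e-9           # E(X) = np
assert abs(var - n * p * (1 - p)) < 1e-9  # Var(X) = np(1-p)
```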

SLIDE 14

Discrete Distributions

Geometric distribution

Let the r.v. Y be the number of trials until the first success is obtained in independent Bernoulli trials. Then

P(Y = k) = P(X_1 = 0 ∧ X_2 = 0 ∧ · · · ∧ X_{k−1} = 0 ∧ X_k = 1)
         = \prod_{i=1}^{k−1} P(X_i = 0) \cdot P(X_k = 1) = q^{k−1} p.

We call this function the geometric distribution, i.e., G(k; p) = p q^{k−1}. The geometric distribution is sometimes used to model "lifetimes" or "time until failure" of components. For example, if the probability is 0.001 that a light bulb will fail on any given day, what is the probability that it will last at least 30 days?

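The light-bulb question and the moment formulas can be answered numerically. A Python sketch (illustrative, not from the slides); here "lasts at least 30 days" is read as "no failure during the first 30 days", i.e. P(Y > 30) = q^30:

```python
def geom_pmf(k, p):
    # G(k; p) = q^(k-1) p, k = 1, 2, ...
    return (1 - p) ** (k - 1) * p

# Light-bulb question: failure probability 0.001 per day.
p = 0.001
p_survive_30 = (1 - p) ** 30                      # P(Y > 30) = q^30, about 0.970
tail = sum(geom_pmf(k, p) for k in range(31, 50001))
assert abs(p_survive_30 - tail) < 1e-9            # tail sum agrees with q^30
assert abs(p_survive_30 - 0.970) < 1e-3

# Truncated series approach E(Y) = 1/p and Var(Y) = q/p^2.
p = 0.2
mean = sum(k * geom_pmf(k, p) for k in range(1, 500))
var = sum(k * k * geom_pmf(k, p) for k in range(1, 500)) - mean**2
assert abs(mean - 1 / p) < 1e-6
assert abs(var - (1 - p) / p**2) < 1e-4
```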

SLIDE 17

Discrete Distributions

Expectation of geometric r.v.s

Theorem: If a r.v. X follows a geometric distribution, then E(X) = \frac{1}{p} and Var(X) = \frac{q}{p^2}, where p is the probability of success on each trial.

Proof. We know that P(X = k) = q^{k−1} p. Writing k = \sum_{m=1}^{k} 1 and swapping the order of summation,

E(X) = \sum_{k=1}^{∞} k \cdot q^{k−1} p
     = p \sum_{m=1}^{∞} \sum_{k=m}^{∞} q^{k−1}
     = p \sum_{m=1}^{∞} \frac{q^{m−1}}{1 − q}
     = \sum_{m=1}^{∞} q^{m−1} = \frac{1}{1 − q} = \frac{1}{p}.


SLIDE 19

Discrete Distributions

Variance of geometric r.v.s

Using k^2 = k(k − 1) + k and k(k − 1) = 2\sum_{j=1}^{k−1} j,

E(X^2) = \sum_{k=1}^{∞} k^2 \cdot P(X = k)
       = \sum_{k=1}^{∞} [k(k − 1) + k] \cdot P(X = k)
       = p \sum_{k=2}^{∞} \Big(2 \sum_{j=1}^{k−1} j\Big) q^{k−1} + \frac{1}{p}
       = 2p \sum_{j=1}^{∞} \sum_{k=j+1}^{∞} j q^{k−1} + \frac{1}{p}
       = \frac{2q}{p^2} + \frac{1}{p} = \frac{2q + p}{p^2},

V(X) = \frac{2q + p}{p^2} − \Big(\frac{1}{p}\Big)^2 = \frac{2q − (1 − p)}{p^2} = \frac{q}{p^2}.


SLIDE 20

Discrete Distributions

Hypergeometric distribution

Suppose we have a large urn filled with N balls that are identical in every way except that M are red and N − M are green. Let the r.v. X be the number of red balls in a sample of size K. The r.v. X has a hypergeometric distribution given by

P(X = x \mid N, M, K) = \frac{\binom{M}{x} \binom{N−M}{K−x}}{\binom{N}{K}}, \quad x = 0, 1, \cdots, K.

By the Vandermonde identity, \sum_{x=0}^{K} \binom{M}{x} \binom{N−M}{K−x} = \binom{N}{K}; hence

\sum_{x=0}^{K} P(X = x) = \sum_{x=0}^{K} \frac{\binom{M}{x} \binom{N−M}{K−x}}{\binom{N}{K}} = 1.


SLIDE 21

Discrete Distributions

Hypergeometric distribution cont'd

Expectation and variance: using x\binom{M}{x} = M\binom{M−1}{x−1} and \binom{N}{K} = \frac{N}{K}\binom{N−1}{K−1},

E(X) = \sum_{x=0}^{K} x \frac{\binom{M}{x}\binom{N−M}{K−x}}{\binom{N}{K}}
     = \sum_{x=1}^{K} \frac{M \binom{M−1}{x−1}\binom{N−M}{K−x}}{\frac{N}{K}\binom{N−1}{K−1}}
     = \frac{KM}{N} \sum_{y=0}^{K−1} \frac{\binom{M−1}{y}\binom{(N−1)−(M−1)}{(K−1)−y}}{\binom{N−1}{K−1}}
     = \frac{KM}{N},

since the last sum is the total mass of a hypergeometric(N − 1, M − 1, K − 1) distribution.

Var(X) = \frac{KM}{N} \cdot \frac{(N − M)(N − K)}{N(N − 1)}.

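Both moment formulas and the Vandermonde identity can be verified exactly with rational arithmetic. A Python sketch (my own, not from the slides):

```python
from math import comb
from fractions import Fraction

def hyper_pmf(x, N, M, K):
    # P(X = x) = C(M, x) C(N-M, K-x) / C(N, K), as an exact fraction
    return Fraction(comb(M, x) * comb(N - M, K - x), comb(N, K))

N, M, K = 20, 7, 5
pmf = [hyper_pmf(x, N, M, K) for x in range(K + 1)]
mean = sum(x * q for x, q in zip(range(K + 1), pmf))
var = sum((x - mean) ** 2 * q for x, q in zip(range(K + 1), pmf))

assert sum(pmf) == 1                       # Vandermonde identity
assert mean == Fraction(K * M, N)          # E(X) = KM/N
assert var == Fraction(K * M, N) * Fraction((N - M) * (N - K), N * (N - 1))
```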

SLIDE 22

Discrete Distributions

Poisson distribution

A r.v. X, taking values in the nonnegative integers, has a Poisson(λ) distribution if

P(X = x \mid λ) = \frac{λ^x}{x!} e^{−λ}, \quad x = 0, 1, 2, \cdots.

Note that \sum_{k=0}^{∞} \frac{λ^k}{k!} = e^{λ}, so the pmf sums to 1.

E(X) = \sum_{x=0}^{∞} x \frac{λ^x}{x!} e^{−λ} = λ e^{−λ} \sum_{x=1}^{∞} \frac{λ^{x−1}}{(x−1)!} = λ;

E(X^2) = \sum_{x=0}^{∞} x^2 \frac{λ^x}{x!} e^{−λ} = \sum_{x=1}^{∞} [x(x − 1) + x] \frac{λ^x}{x!} e^{−λ} = λ^2 + λ;

Var(X) = E(X^2) − E(X)^2 = λ.

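Both moments can be confirmed by truncating the series. A Python sketch (not from the slides; the truncation at 60 terms is an assumption that the tail is negligible for λ = 3.5):

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    # P(X = x) = lam^x / x! * e^(-lam)
    return lam**x / factorial(x) * exp(-lam)

lam = 3.5
pmf = [poisson_pmf(x, lam) for x in range(60)]   # tail beyond 60 is negligible here
mean = sum(x * q for x, q in enumerate(pmf))
ex2 = sum(x * x * q for x, q in enumerate(pmf))

assert abs(sum(pmf) - 1) < 1e-12
assert abs(mean - lam) < 1e-9                    # E(X) = lambda
assert abs(ex2 - (lam**2 + lam)) < 1e-9          # E(X^2) = lambda^2 + lambda
```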

SLIDE 23

Continuous Distributions

Continuous uniform distribution

The continuous uniform distribution is defined by spreading mass uniformly over an interval [a, b]. Its pdf is given by

f(x \mid a, b) = \frac{1}{b − a} if x ∈ [a, b]; 0 otherwise.

E(X) = \int_a^b \frac{x}{b − a} dx = \frac{a + b}{2};

Var(X) = \int_a^b \frac{(x − \frac{a+b}{2})^2}{b − a} dx = \frac{(b − a)^2}{12}.


SLIDE 24

Continuous Distributions

Gamma distribution

Note that, if α > 0, then \int_0^{+∞} t^{α−1} e^{−t} dt < ∞. Define the gamma function Γ(α) = \int_0^{+∞} t^{α−1} e^{−t} dt. It satisfies Γ(α + 1) = αΓ(α); Γ(n) = (n − 1)! for n ∈ Z^+; Γ(\frac{1}{2}) = \sqrt{π}.

The gamma distribution is defined on the interval [0, +∞). Its pdf is given by

f(x \mid α, β) = \frac{1}{Γ(α) β^α} x^{α−1} e^{−x/β}, \quad 0 ≤ x < ∞, α > 0, β > 0.

E(X) = \frac{1}{Γ(α) β^α} \int_0^{+∞} x^α e^{−x/β} dx = αβ;

Var(X) = \frac{1}{Γ(α) β^α} \int_0^{+∞} x^{α+1} e^{−x/β} dx − (αβ)^2 = αβ^2.

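The moment formulas can be checked by numerical integration. A rough midpoint-rule sketch in Python (my own, not from the slides; the cutoff at 200 and the step count are assumptions adequate for α = 3, β = 2):

```python
from math import gamma, exp, pi

def gamma_pdf(x, a, b):
    # f(x | alpha, beta) = x^(a-1) e^(-x/b) / (Gamma(a) b^a)
    return x ** (a - 1) * exp(-x / b) / (gamma(a) * b**a)

def integrate(f, lo, hi, n=400_000):
    # midpoint rule; adequate for this smooth, rapidly decaying integrand
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

# Gamma-function identities quoted on the slide
assert abs(gamma(0.5) ** 2 - pi) < 1e-12             # Gamma(1/2) = sqrt(pi)
assert abs(gamma(4.7) / (3.7 * gamma(3.7)) - 1) < 1e-12  # Gamma(a+1) = a Gamma(a)

a, b = 3.0, 2.0
hi = 200.0                                           # tail beyond 200 is negligible here
mass = integrate(lambda x: gamma_pdf(x, a, b), 0.0, hi)
mean = integrate(lambda x: x * gamma_pdf(x, a, b), 0.0, hi)
ex2 = integrate(lambda x: x * x * gamma_pdf(x, a, b), 0.0, hi)

assert abs(mass - 1) < 1e-5
assert abs(mean - a * b) < 1e-3                      # E(X) = alpha * beta
assert abs(ex2 - mean**2 - a * b**2) < 1e-2          # Var(X) = alpha * beta^2
```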

SLIDE 26

Continuous Distributions

Special cases of the gamma distribution

Chi-squared distribution: Let α = \frac{p}{2} with p ∈ Z^+, and β = 2. Then the pdf is

f(x \mid p) = \frac{1}{Γ(\frac{p}{2}) 2^{p/2}} x^{p/2 − 1} e^{−x/2}, \quad 0 ≤ x < ∞,

which is the chi-squared pdf with p degrees of freedom. Note that E(X) = p, Var(X) = 2p.

Exponential distribution: If we set α = 1 in the gamma distribution, the pdf becomes

f(x \mid β) = \frac{1}{β} e^{−x/β}, \quad 0 ≤ x < ∞.

Note that E(X) = β, Var(X) = β^2.


SLIDE 31

Continuous Distributions

Normal distribution / Gaussian distribution

The pdf of the normal distribution with mean µ and variance σ^2, denoted N(µ, σ^2), is given by

f(x \mid µ, σ^2) = \frac{1}{\sqrt{2π} σ} e^{−\frac{(x−µ)^2}{2σ^2}}, \quad −∞ < x < ∞.

E(X) = µ, Var(X) = σ^2;

The standard normal distribution is N(0, 1), with pdf f(x \mid 0, 1) = \frac{1}{\sqrt{2π}} e^{−x^2/2};

If X ∼ N(µ, σ^2), then the r.v. Z = \frac{X − µ}{σ} ∼ N(0, 1);

P(|X − µ| ≤ σ) = P(|Z| ≤ 1) = 0.6826;   (1)
P(|X − µ| ≤ 2σ) = P(|Z| ≤ 2) = 0.9544;  (2)
P(|X − µ| ≤ 3σ) = P(|Z| ≤ 3) = 0.9974.  (3)

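The three probabilities (the 68-95-99.7 rule) follow from the standard normal cdf Φ(z) = (1 + erf(z/√2))/2; to four decimals they are 0.6827, 0.9545, 0.9973, agreeing with the slide's values up to rounding. A Python check (not from the slides):

```python
from math import erf, sqrt

def std_normal_cdf(z):
    # Phi(z) for the standard normal, via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

def central_prob(k):
    # P(|X - mu| <= k*sigma) = P(|Z| <= k) for any N(mu, sigma^2)
    return std_normal_cdf(k) - std_normal_cdf(-k)

assert abs(central_prob(1) - 0.6827) < 1e-3
assert abs(central_prob(2) - 0.9545) < 1e-3
assert abs(central_prob(3) - 0.9973) < 1e-3
```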

SLIDE 33

Continuous Distributions

Normal approximation

Let X ∼ binomial(25, 0.6). We can approximate X with a normal r.v. Y with mean µ = 25 × 0.6 = 15 and standard deviation σ = \sqrt{25 × 0.6 × (1 − 0.6)} ≈ 2.45. Thus

P(X ≤ 13) ≈ P(Y ≤ 13) = P(Z ≤ \frac{13 − 15}{2.45})   (4)
          = P(Z ≤ −0.82) = 0.206;                      (5)

while the exact value is

P(X ≤ 13) = \sum_{x=0}^{13} \binom{25}{x} 0.6^x 0.4^{25−x} = 0.267.   (6)

In general, if X ∼ binomial(n, p), then E(X) = np and Var(X) = np(1 − p), and we can approximate the distribution of X with N(np, np(1 − p)).

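The two numbers can be reproduced directly. As an aside not on the slides, a continuity correction (using P(Y ≤ 13.5)) brings the approximation much closer to the exact value. A Python sketch:

```python
from math import comb, erf, sqrt

n, p = 25, 0.6
mu = n * p                           # 15
sigma = sqrt(n * p * (1 - p))        # about 2.449

# Exact binomial cdf at 13
exact = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(14))

# Normal approximation P(Y <= 13) = Phi((13 - mu)/sigma)
approx = 0.5 * (1 + erf((13 - mu) / (sigma * sqrt(2))))

# Continuity correction (an addition, not from the slides): P(Y <= 13.5)
corrected = 0.5 * (1 + erf((13.5 - mu) / (sigma * sqrt(2))))

assert abs(exact - 0.267) < 2e-3
assert 0.19 < approx < 0.22
assert abs(corrected - exact) < abs(approx - exact)   # correction helps
```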

SLIDE 37

Continuous Distributions

Beta distribution

The beta family of distributions is a continuous family on (0, 1) indexed by two parameters. The Beta(α, β) pdf is

f(x \mid α, β) = \frac{1}{B(α, β)} x^{α−1} (1 − x)^{β−1}, \quad 0 < x < 1, α > 0, β > 0,

where B(α, β) denotes the beta function, B(α, β) = \int_0^1 x^{α−1} (1 − x)^{β−1} dx.

B(α, β) = \frac{Γ(α) Γ(β)}{Γ(α + β)};

E(X) = \frac{α}{α + β}, and Var(X) = \frac{αβ}{(α + β)^2 (α + β + 1)};

E(X^n) = \frac{Γ(α + n) Γ(α + β)}{Γ(α + β + n) Γ(α)}.

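The relation B(α, β) = Γ(α)Γ(β)/Γ(α + β) and the moment formulas can be verified numerically. A Python sketch (not from the slides; the integration step count is an arbitrary choice):

```python
from math import gamma

def beta_fn(a, b):
    # B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)
    return gamma(a) * gamma(b) / gamma(a + b)

def beta_moment(n, a, b):
    # E(X^n) = Gamma(a + n) Gamma(a + b) / (Gamma(a + b + n) Gamma(a))
    return gamma(a + n) * gamma(a + b) / (gamma(a + b + n) * gamma(a))

a, b = 2.0, 3.0
mean = beta_moment(1, a, b)
var = beta_moment(2, a, b) - mean**2

assert abs(mean - a / (a + b)) < 1e-12
assert abs(var - a * b / ((a + b) ** 2 * (a + b + 1))) < 1e-12

# Midpoint-rule check of B(a, b) against its integral definition
n_steps = 100_000
h = 1.0 / n_steps
integral = sum(((i + 0.5) * h) ** (a - 1) * (1 - (i + 0.5) * h) ** (b - 1)
               for i in range(n_steps)) * h
assert abs(integral - beta_fn(a, b)) < 1e-6
```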

SLIDE 41

Continuous Distributions

Cauchy distribution

The Cauchy distribution is a symmetric, bell-shaped distribution on (−∞, +∞) with pdf

f(x \mid θ) = \frac{1}{π} \frac{1}{1 + (x − θ)^2}, \quad −∞ < x < +∞, −∞ < θ < +∞.

E|X| = \int_{−∞}^{+∞} \frac{1}{π} \frac{|x|}{1 + (x − θ)^2} dx = ∞, so the mean does not exist;

The parameter θ does measure the center of the distribution: it is the median, and P(X ≥ θ) = \frac{1}{2}.

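Since the cdf is F(x | θ) = 1/2 + arctan(x − θ)/π, the median claim is immediate, and the divergence of E|X| shows up as logarithmic growth of the truncated integral. A Python sketch (my own, not from the slides; the midpoint rule and cutoffs are arbitrary choices):

```python
from math import atan, pi

theta = 2.0

def cauchy_cdf(x, theta):
    # F(x | theta) = 1/2 + arctan(x - theta)/pi
    return 0.5 + atan(x - theta) / pi

assert abs(cauchy_cdf(theta, theta) - 0.5) < 1e-12   # theta is the median

def truncated_abs_mean(T, n=200_000):
    # midpoint-rule approximation of the integral of |x| f(x) over [-T, T];
    # this grows like (2/pi) log T, so it never settles to a finite mean
    h = 2 * T / n
    total = 0.0
    for i in range(n):
        x = -T + (i + 0.5) * h
        total += abs(x) / (pi * (1 + (x - theta) ** 2)) * h
    return total

assert truncated_abs_mean(1000) > truncated_abs_mean(100) + 1
```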

SLIDE 44

Continuous Distributions

Lognormal distribution

If X is a r.v. whose logarithm is normally distributed, that is, log X ∼ N(µ, σ^2), then X has a lognormal distribution. Its pdf is

f(x \mid µ, σ^2) = \frac{1}{\sqrt{2π} σ} \frac{1}{x} e^{−\frac{(\log x − µ)^2}{2σ^2}}, \quad x > 0, −∞ < µ < +∞, σ > 0.

E(X) = E(e^{\log X}) = e^{µ + σ^2/2};

Var(X) = e^{2(µ + σ^2)} − e^{2µ + σ^2}.


SLIDE 47

Exponential Family

Exponential family

A family of pdfs or pmfs is called an exponential family if it can be expressed as

f(x \mid θ) = h(x) c(θ) \exp\Big(\sum_{i=1}^{k} w_i(θ) t_i(x)\Big).

Here h(x) ≥ 0 and each t_i(x) is a real-valued function of the observation x; c(θ) ≥ 0 and each w_i(θ) is a real-valued function of the possibly vector-valued parameter θ.

To verify that a family of pdfs or pmfs is an exponential family, we must identify the functions h(x), c(θ), w_i(θ), and t_i(x) and show that the family has the above form.

The Bernoulli, Gaussian, binomial, Poisson, exponential, Weibull, Laplace, gamma, beta, multinomial, and Wishart distributions are all exponential families.


SLIDE 49

Exponential Family

Example: binomial distribution

Consider the binomial(n, p) family, where n ∈ Z^+ is fixed:

f(x \mid p) = \binom{n}{x} p^x (1 − p)^{n−x}
            = \binom{n}{x} (1 − p)^n \Big(\frac{p}{1 − p}\Big)^x
            = \binom{n}{x} (1 − p)^n \exp\Big(x \log \frac{p}{1 − p}\Big).   (7)

Define

h(x) = \binom{n}{x} for x ∈ {0, 1, \cdots, n}, 0 otherwise; \quad c(p) = (1 − p)^n;   (8)

w_1(p) = \log\Big(\frac{p}{1 − p}\Big), \quad t_1(x) = x.   (9)

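The factorization (7)-(9) can be checked pointwise: h(x) c(p) exp(w₁(p) t₁(x)) must reproduce the pmf at every x. A Python sketch (not from the slides):

```python
from math import comb, exp, log

n, p = 10, 0.3

def binom_pmf(x):
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Exponential-family pieces for binomial(n, p), following (8)-(9)
def h(x): return comb(n, x)          # support weight
c = (1 - p) ** n                     # normalizer c(p)
w1 = log(p / (1 - p))                # parameter function w1(p)
def t1(x): return x                  # sufficient statistic t1(x)

for x in range(n + 1):
    assert abs(h(x) * c * exp(w1 * t1(x)) - binom_pmf(x)) < 1e-12
```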

SLIDE 51

Exponential Family

Example: normal distribution

Consider the normal family N(µ, σ^2):

f(x \mid µ, σ^2) = \frac{1}{\sqrt{2π} σ} e^{−\frac{(x−µ)^2}{2σ^2}}
                 = \frac{1}{\sqrt{2π} σ} e^{−\frac{µ^2}{2σ^2}} e^{−\frac{x^2}{2σ^2} + \frac{µx}{σ^2}}.   (10)

Define

h(x) = 1, \quad c(θ) = c(µ, σ) = \frac{1}{\sqrt{2π} σ} e^{−\frac{µ^2}{2σ^2}},

w_1(µ, σ) = \frac{1}{σ^2}, \quad w_2(µ, σ) = \frac{µ}{σ^2}, \quad t_1(x) = −\frac{x^2}{2}, \quad t_2(x) = x.   (11)



SLIDE 55

Exponential Family

Theorem

If X is a r.v. with pdf or pmf of the exponential-family form, then

E\Big(\sum_{i=1}^{k} \frac{∂ w_i(θ)}{∂ θ_j} t_i(X)\Big) = −\frac{∂}{∂ θ_j} \log c(θ);

Var\Big(\sum_{i=1}^{k} \frac{∂ w_i(θ)}{∂ θ_j} t_i(X)\Big) = −\frac{∂^2}{∂ θ_j^2} \log c(θ) − E\Big(\sum_{i=1}^{k} \frac{∂^2 w_i(θ)}{∂ θ_j^2} t_i(X)\Big).

The advantage of these identities is that they replace integration or summation by differentiation, which is often more straightforward.


SLIDE 57

Exponential Family

Binomial mean and variance

For the binomial distribution, we have

\frac{d}{dp} w_1(p) = \frac{d}{dp} \log \frac{p}{1 − p} = \frac{1}{p(1 − p)},   (15)

\frac{d}{dp} \log c(p) = \frac{d}{dp} n \log(1 − p) = \frac{−n}{1 − p}.   (16)

Thus, by the theorem, we have

E\Big(\frac{X}{p(1 − p)}\Big) = \frac{n}{1 − p},

and multiplying both sides by p(1 − p) gives E(X) = np.

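The identity E[X/(p(1 − p))] = n/(1 − p), and hence E(X) = np, can be confirmed numerically. A Python sketch (not from the slides):

```python
from math import comb

n, p = 12, 0.35
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

# Left side of the theorem: E[(d/dp w1(p)) t1(X)] = E[X / (p(1-p))]
lhs = sum(x * q for x, q in enumerate(pmf)) / (p * (1 - p))

# Right side: -(d/dp) log c(p) = n / (1 - p)
rhs = n / (1 - p)

assert abs(lhs - rhs) < 1e-9
# Rearranging recovers E(X) = np
assert abs(sum(x * q for x, q in enumerate(pmf)) - n * p) < 1e-9
```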

SLIDE 58

Take-aways

Conclusions:
Discrete distributions
Continuous distributions
Exponential family
Location and scale families
Inequalities
