Statistical Inference Lecture 2: Transformations and Expectations - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Statistical Inference

Lecture 2: Transformations and Expectations

MING GAO
DaSE @ ECNU
mgao@dase.ecnu.edu.cn (for course-related communications)

Mar. 17, 2020
slide-2
SLIDE 2

Outline

1. Transformation: Functions of a r.v.; Monotone Transformations
2. Expectation: Properties of Expectations; Moment; Moment Generating Functions
3. Differentiating Under an Integral Sign
4. Take-aways

MING GAO (DaSE@ECNU), Statistical Inference, Mar. 17, 2020

slide-3
SLIDE 3

Outline (current section: Transformation, Functions of a r.v.)

slide-4
SLIDE 4

Transformation: Functions of a r.v.

Functions of a r.v.

If X is a r.v., then any function of X, say Y = g(X), is also a r.v. A natural question is whether we can describe the probabilistic behavior of Y in terms of that of X. That is, for any set A, we want P(Y ∈ A) = P(g(X) ∈ A).

We associate with g an inverse mapping, denoted g^{-1}, which maps subsets of 𝒴 to subsets of 𝒳 and is defined by

g^{-1}(A) = \{x \in \mathcal{X} : g(x) \in A\}.

For any set A ⊂ 𝒴,

P(Y \in A) = P(g(X) \in A) = P(X \in g^{-1}(A)).

slide-9
SLIDE 9

Transformation: Functions of a r.v.

Example of discrete transformation

Binomial transformation. A discrete r.v. X has a binomial distribution if its pmf is of the form

f_X(x) = P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \dots, n.

Consider the r.v. Y = g(X) = n − X, with 𝒴 = {y : y = g(x), x ∈ 𝒳} = {0, 1, …, n}. Then

f_Y(y) = \sum_{x \in g^{-1}(y)} f_X(x) = f_X(n-y) = \binom{n}{n-y} p^{n-y} (1-p)^{n-(n-y)} = \binom{n}{y} (1-p)^y p^{n-y},

so Y is again binomial, with success probability 1 − p.
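The claim that Y = n − X is again binomial with success probability 1 − p can be checked numerically. A minimal sketch using only the standard library (n = 10 and p = 0.3 are arbitrary example values):

```python
from math import comb

n, p = 10, 0.3

def pmf_X(x):
    # Binomial(n, p) pmf: C(n, x) p^x (1-p)^(n-x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

def pmf_Y(y):
    # Claimed pmf of Y = n - X: Binomial(n, 1 - p)
    return comb(n, y) * (1 - p)**y * p**(n - y)

# The derivation says f_Y(y) = f_X(n - y) for every y in {0, ..., n}
for y in range(n + 1):
    assert abs(pmf_Y(y) - pmf_X(n - y)) < 1e-12
```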

slide-12
SLIDE 12

Transformation: Functions of a r.v.

Example of continuous transformation

Uniform transformation. Suppose X has a uniform distribution on the interval (0, 2π), that is,

f_X(x) = \frac{1}{2\pi} for 0 < x < 2\pi, and 0 otherwise.

Consider the r.v. Y = \sin^2(X); then Y ∈ [0, 1], and

F_Y(y) = P(Y \le y) = P(X \le x_1) + P(x_2 \le X \le x_3) + P(X \ge x_4) = 2P(X \le x_1) + 2P(x_2 \le X \le \pi),

where x_1 and x_2 are the two solutions of \sin^2(x) = y for 0 < x < \pi, x_3 and x_4 are the corresponding solutions in (\pi, 2\pi), and the last equality uses the symmetry of \sin^2 about \pi.

slide-15
SLIDE 15

Outline (current section: Transformation, Monotone Transformations)

slide-16
SLIDE 16

Transformation: Monotone Transformations

Theorem for monotone transformations

Let X have cdf F_X(x), let Y = g(X), and define 𝒳 = {x : f_X(x) > 0} and 𝒴 = {y : y = g(x) for some x ∈ 𝒳}.

  • a. If g is increasing on 𝒳, then F_Y(y) = F_X(g^{-1}(y)) for y ∈ 𝒴;
  • b. If g is decreasing on 𝒳 and X is a continuous r.v., then F_Y(y) = 1 − F_X(g^{-1}(y)) for y ∈ 𝒴.

Proof. [a.] Since g is increasing, {x ∈ 𝒳 : g(x) ≤ y} = {x ∈ 𝒳 : x ≤ g^{-1}(y)}. Furthermore, we have

F_Y(y) = \int_{\{x \in \mathcal{X} : x \le g^{-1}(y)\}} f_X(x)\,dx = \int_{-\infty}^{g^{-1}(y)} f_X(x)\,dx = F_X(g^{-1}(y)).

slide-20
SLIDE 20

Transformation: Monotone Transformations

Uniform-exponential relationship

Suppose X ∼ f_X(x) = 1 if 0 < x < 1 and 0 otherwise, the uniform(0, 1) distribution. It is straightforward to check that F_X(x) = x for 0 < x < 1. Let Y = g(X) = − log X. Then

\frac{d}{dx} g(x) = \frac{d}{dx}(-\log x) = -\frac{1}{x} < 0 \quad \text{for } 0 < x < 1,

so g is a decreasing function, with 𝒳 = (0, 1) and 𝒴 = (0, ∞). For y > 0, y = − log x implies x = e^{-y}; therefore, by part (b) of the theorem,

F_Y(y) = 1 - F_X(g^{-1}(y)) = 1 - F_X(e^{-y}) = 1 - e^{-y}.

Of course, F_Y(y) = 0 for y ≤ 0.
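This uniform-to-exponential relationship is easy to verify by simulation; a small sketch (sample size and check points are arbitrary choices):

```python
import math
import random

random.seed(0)
# If U ~ uniform(0, 1), then Y = -log U should have cdf F_Y(y) = 1 - e^{-y}.
# Using 1 - random.random() keeps the argument of log in (0, 1].
samples = [-math.log(1.0 - random.random()) for _ in range(100_000)]

# Compare the empirical cdf to 1 - e^{-y} at a few points
for y in (0.5, 1.0, 2.0):
    empirical = sum(s <= y for s in samples) / len(samples)
    assert abs(empirical - (1.0 - math.exp(-y))) < 0.01
```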

slide-23
SLIDE 23

Transformation: Monotone Transformations

Theorem for continuous r.v.

Let X have pdf f_X(x) and let Y = g(X), where g is a monotone function. Suppose that f_X(x) is continuous on 𝒳 and that g^{-1}(y) has a continuous derivative on 𝒴. Then the pdf of Y is given by

f_Y(y) = f_X(g^{-1}(y)) \left| \frac{d}{dy} g^{-1}(y) \right| for y ∈ 𝒴, and 0 otherwise.

Proof. By the chain rule,

f_Y(y) = \frac{d}{dy} F_Y(y) = \begin{cases} f_X(g^{-1}(y)) \frac{d}{dy} g^{-1}(y), & \text{if } g \text{ is increasing;} \\ -f_X(g^{-1}(y)) \frac{d}{dy} g^{-1}(y), & \text{if } g \text{ is decreasing.} \end{cases}

In both cases this equals f_X(g^{-1}(y)) |d g^{-1}(y)/dy|, since the derivative of g^{-1} is negative when g is decreasing.

slide-25
SLIDE 25

Transformation: Monotone Transformations

Example

Inverted Gamma pdf. Question: Suppose X has the Gamma pdf

f(x) = \frac{1}{(n-1)!\,\beta^n} x^{n-1} e^{-x/\beta}, \quad 0 < x < \infty,

and we want to find the pdf of Y = g(X) = 1/X.

Solution: If we let y = g(x), then g^{-1}(y) = 1/y and \frac{d}{dy} g^{-1}(y) = -\frac{1}{y^2}. Applying the above theorem, for y ∈ (0, ∞) we get

f_Y(y) = f_X(g^{-1}(y)) \left| \frac{d}{dy} g^{-1}(y) \right| = \frac{1}{(n-1)!\,\beta^n} \left(\frac{1}{y}\right)^{n-1} e^{-1/(\beta y)} \frac{1}{y^2} = \frac{1}{(n-1)!\,\beta^n} \left(\frac{1}{y}\right)^{n+1} e^{-1/(\beta y)}.

slide-28
SLIDE 28

Transformation: Monotone Transformations

Theorem for non-monotone transformations

Let X have pdf f_X(x) and let Y = g(X). Suppose there exists a partition A_0, A_1, …, A_k of 𝒳 such that P(X ∈ A_0) = 0 and f_X(x) is continuous on each A_i. Further, suppose there exist functions g_1(x), …, g_k(x), defined on A_1, …, A_k respectively, such that

  • g(x) = g_i(x) for x ∈ A_i;
  • g_i(x) is monotone on A_i;
  • the set 𝒴 = {y : y = g_i(x) for some x ∈ A_i} is the same for each i = 1, …, k;
  • g_i^{-1}(y) has a continuous derivative on 𝒴 for each i = 1, …, k.

Then

f_Y(y) = \sum_{i=1}^{k} f_X(g_i^{-1}(y)) \left| \frac{d}{dy} g_i^{-1}(y) \right| for y ∈ 𝒴, and 0 otherwise.

slide-34
SLIDE 34

Transformation: Monotone Transformations

Normal-chi squared relationship

Suppose X has the standard normal distribution,

f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \quad -\infty < x < +\infty,

and let Y = g(X) = X^2. Since g(x) = x^2 is monotone on (−∞, 0) and on (0, +∞), applying the above theorem we take

A_0 = \{0\};
A_1 = (-\infty, 0), \quad g_1(x) = x^2, \quad g_1^{-1}(y) = -\sqrt{y};
A_2 = (0, +\infty), \quad g_2(x) = x^2, \quad g_2^{-1}(y) = \sqrt{y}.

Then the pdf of Y is

f_Y(y) = \frac{1}{\sqrt{2\pi}} e^{-(-\sqrt{y})^2/2} \left| -\frac{1}{2\sqrt{y}} \right| + \frac{1}{\sqrt{2\pi}} e^{-(\sqrt{y})^2/2} \left| \frac{1}{2\sqrt{y}} \right| = \frac{1}{\sqrt{2\pi}} \frac{1}{\sqrt{y}} e^{-y/2}, \quad 0 < y < \infty,

the chi-squared pdf with one degree of freedom.
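The derived density can be sanity-checked by simulation using only the standard library: for a standard normal X, P(X² ≤ y) = P(−√y ≤ X ≤ √y) = erf(√(y/2)).

```python
import math
import random

random.seed(1)
squares = [random.gauss(0.0, 1.0) ** 2 for _ in range(100_000)]

def chi2_1_cdf(y):
    # P(X^2 <= y) = P(-sqrt(y) <= X <= sqrt(y)) = erf(sqrt(y / 2)) for X ~ N(0, 1)
    return math.erf(math.sqrt(y / 2.0))

# Compare the empirical cdf of X^2 to the chi-squared(1) cdf at a few points
for y in (0.5, 1.0, 3.0):
    empirical = sum(v <= y for v in squares) / len(squares)
    assert abs(empirical - chi2_1_cdf(y)) < 0.01
```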

slide-36
SLIDE 36

Transformation: Monotone Transformations

Probability integral transformation

Let X have continuous cdf F_X(x) and define the r.v. Y = F_X(X). Then Y is uniformly distributed on [0, 1]; that is, P(Y ≤ y) = y for 0 < y < 1.

Proof. For Y = F_X(X) and 0 < y < 1 (assuming F_X is strictly increasing, so that F_X^{-1} is well defined), we have

P(Y \le y) = P(F_X(X) \le y) = P(F_X^{-1}(F_X(X)) \le F_X^{-1}(y)) = P(X \le F_X^{-1}(y)) = F_X(F_X^{-1}(y)) = y.
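The probability integral transformation can also be checked empirically: applying a continuous cdf to its own samples should produce uniform values. A sketch with X exponential(1), so F_X(x) = 1 − e^{−x}:

```python
import math
import random

random.seed(2)
# Draw X ~ exponential(1) by inversion, then apply its own cdf
xs = [-math.log(1.0 - random.random()) for _ in range(100_000)]
us = [1.0 - math.exp(-x) for x in xs]  # Y = F_X(X)

# Y should be approximately uniform on [0, 1]: P(Y <= y) ~ y
for y in (0.25, 0.5, 0.75):
    empirical = sum(u <= y for u in us) / len(us)
    assert abs(empirical - y) < 0.01
```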

slide-38
SLIDE 38

Expectation

Running example Cont'd

Suppose that, in order to raise income for a local senior citizens' home, the town council of Pickering decides to hold a charity lottery:

  • After reading the fine print, is this a good bet?
  • If 1,000 tickets are sold, is this a good bet?
  • If 10,000 tickets are sold, is this a good bet?
  • If 100,000 tickets are sold, is this a good bet?

slide-43
SLIDE 43

Expectation

Running example Cont'd

Question: If 1,000 tickets are sold, is this a good bet?

We can compute the average winnings per ticket as follows:

avg. = \frac{20000 + 20 \times 500}{1000} = 30 > 10.

(Equivalently, avg. = \frac{1}{1000} \times 20000 + \frac{20}{1000} \times 500 + \frac{979}{1000} \times 0.)

Hence, it is worth buying a ticket in this charity lottery. If 10,000 tickets are sold, how does your answer change?
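The slide's arithmetic implies one 20,000 prize, twenty 500 prizes, and a ticket price of 10 (from the comparison "30 > 10"); these are inferred assumptions, since the fine print itself is not shown. Under that assumed structure, the per-ticket expectation for each scenario is a one-liner:

```python
def expected_winnings(tickets_sold):
    # Prize structure inferred from the slide's arithmetic:
    # one prize of 20,000 and twenty prizes of 500 (all other tickets win 0)
    prizes = ((1, 20_000), (20, 500))
    return sum(count * value for count, value in prizes) / tickets_sold

TICKET_PRICE = 10  # inferred from the comparison "30 > 10" on the slide

for n in (1_000, 10_000, 100_000):
    ev = expected_winnings(n)
    verdict = "good bet" if ev > TICKET_PRICE else "bad bet"
    print(f"{n} tickets: expected winnings {ev:.2f} -> {verdict}")
```

With 1,000 tickets the expectation is 30 > 10; with 10,000 it drops to 3, and with 100,000 to 0.30, so only the first scenario favors the buyer.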

slide-48
SLIDE 48

Expectation

Expected value

Definition. The expected value or mean of a r.v. g(X), denoted E(g(X)), is

E(g(X)) = \int_{-\infty}^{+\infty} g(x) f_X(x)\,dx, if X is continuous;
E(g(X)) = \sum_{x \in \mathcal{X}} g(x) P(X = x), if X is discrete.

The deviation of X at ω ∈ Ω is X(ω) − E(X), the difference between the value of X and the mean of X. If E|g(X)| = ∞, we say that E(g(X)) does not exist.

slide-50
SLIDE 50

Expectation

Example I

Exponential mean. Question: Suppose X has an exponential(λ) distribution,

f(x) = \frac{1}{\lambda} e^{-x/\lambda}, \quad 0 \le x < +\infty, \; \lambda > 0.

What is E(X)?

Solution: E(X) is given by

E(X) = \int_0^{\infty} \frac{1}{\lambda} x e^{-x/\lambda}\,dx = -x e^{-x/\lambda}\Big|_0^{\infty} + \int_0^{\infty} e^{-x/\lambda}\,dx = \int_0^{\infty} e^{-x/\lambda}\,dx = \lambda.

slide-51
SLIDE 51

Expectation

Example II

Binomial mean. Question: Suppose X has a binomial distribution, with pmf

P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \dots, n.

Solution: Writing q = 1 − p,

E(X) = \sum_{x=0}^{n} x \cdot P(X = x) = \sum_{x=1}^{n} x \binom{n}{x} p^x q^{n-x} = \sum_{x=1}^{n} n \binom{n-1}{x-1} p^x q^{n-x} = np \sum_{j=0}^{n-1} \binom{n-1}{j} p^j q^{n-1-j} = np (p+q)^{n-1} = np,

using the identity x \binom{n}{x} = n \binom{n-1}{x-1} and the substitution j = x − 1.

slide-52
SLIDE 52

Expectation

Expected value of Geometric r.v.s

Theorem. If a r.v. X follows a Geometric distribution, the expected number of trials until the first success is 1/p, where p is the probability of success on each trial.

Proof. We know that P(X = k) = q^{k-1} p, where q = 1 − p. Hence, we have

E(X) = \sum_{k=1}^{\infty} k \, q^{k-1} p = p \sum_{m=1}^{\infty} \sum_{k=m}^{\infty} q^{k-1} = p \sum_{m=1}^{\infty} \frac{q^{m-1}}{1-q} = \sum_{m=1}^{\infty} q^{m-1} = \frac{1}{1-q} = \frac{1}{p},

where the first interchange writes k q^{k-1} as a sum of k copies of q^{k-1}, one for each m ≤ k.
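The series manipulation above can be checked numerically by truncating the sum at a large K; a sketch with p = 0.2 as an arbitrary example:

```python
p = 0.2
q = 1.0 - p

# Truncate E(X) = sum_{k>=1} k q^{k-1} p at K; the tail is negligible for large K
K = 2_000
ex = sum(k * q ** (k - 1) * p for k in range(1, K + 1))

# Should agree with the closed form 1/p
assert abs(ex - 1.0 / p) < 1e-9
```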

slide-55
SLIDE 55

Expectation

Cauchy mean

A classic example of a r.v. whose expected value does not exist is a Cauchy r.v., that is, one with pdf

f_X(x) = \frac{1}{\pi} \frac{1}{1 + x^2}, \quad -\infty < x < \infty.

It is straightforward to check that \int_{-\infty}^{+\infty} f_X(x)\,dx = 1, but E|X| = ∞:

E|X| = \int_{-\infty}^{+\infty} \frac{|x|}{\pi} \frac{1}{1 + x^2}\,dx = \frac{2}{\pi} \int_0^{+\infty} \frac{x}{1 + x^2}\,dx = \lim_{M \to \infty} \frac{2}{\pi} \int_0^{M} \frac{x}{1 + x^2}\,dx = \frac{1}{\pi} \lim_{M \to \infty} \log(1 + M^2) = \infty,

so E(X) does not exist.

slide-56
SLIDE 56

Outline (current section: Expectation, Properties of Expectations)

slide-57
SLIDE 57

Expectation: Properties of Expectations

Linearity of expectations

Theorem. Let X be a r.v. and let a, b and c be constants. Then for any functions g_1(x) and g_2(x) whose expectations exist:

  • a. E(a g_1(X) + b g_2(X) + c) = a E(g_1(X)) + b E(g_2(X)) + c;
  • b. If g_1(x) ≥ 0 for all x, then E(g_1(X)) ≥ 0;
  • c. If g_1(x) ≥ g_2(x) for all x, then E(g_1(X)) ≥ E(g_2(X));
  • d. If a ≤ g_1(x) ≤ b for all x, then a ≤ E(g_1(X)) ≤ b.

slide-58
SLIDE 58

Expectation: Properties of Expectations

Expected value of Bernoulli trials

Proof with linearity of expectations. The expected number of successes when n mutually independent Bernoulli trials are performed, where p is the probability of success on each trial, is np.

Proof. Let X_i be the number of heads in the i-th Bernoulli trial, and let X be the number of successes in the n trials. Then X = \sum_{i=1}^{n} X_i and E(X_i) = p, so

E(X) = E\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} E(X_i) = np.

slide-60
SLIDE 60

Expectation: Properties of Expectations

Uniform-exponential relationship

Suppose X ∼ f_X(x) = 1 if 0 ≤ x ≤ 1 and 0 otherwise, the uniform(0, 1) distribution, and let Y = g(X) = − log X. Then

E(g(X)) = E(-\log X) = \int_0^1 (-\log x)\,dx = (x - x \log x)\Big|_0^1 = 1.

We also saw that Y = − log X has cdf 1 − e^{-y} and pdf f_Y(y) = \frac{d}{dy}(1 - e^{-y}) = e^{-y}, 0 < y < ∞, a special case of the exponential pdf with λ = 1. Thus E(Y) = 1, and the two computations agree.

slide-61
SLIDE 61

Outline (current section: Expectation, Moment)

slide-62
SLIDE 62

Expectation: Moment

Moment

For each integer n, the n-th moment of X, \mu_n', is

\mu_n' = E(X^n).

The n-th central moment of X, \mu_n, is

\mu_n = E(X - \mu)^n, where \mu = \mu_1' = E(X).

Variance. The variance of a r.v. X is its second central moment, Var(X) = E(X − µ)^2. The positive square root of Var(X) is the standard deviation of X.

  • Var(X) = E(X^2) − (E(X))^2;
  • the variance gives a measure of the degree of spread of a distribution around its mean.

slide-64
SLIDE 64

Expectation: Moment

Exponential variance

Let X have the exponential(λ) distribution. Recalling E(X) = λ, we can calculate the variance by

Var(X) = E(X - \lambda)^2 = \int_0^{\infty} (x - \lambda)^2 \frac{1}{\lambda} e^{-x/\lambda}\,dx = \int_0^{\infty} (x^2 - 2\lambda x + \lambda^2) \frac{1}{\lambda} e^{-x/\lambda}\,dx.

To complete the integration, integrate each term separately, using integration by parts on the terms involving x and x^2. Upon doing this, we find that Var(X) = λ^2.

slide-65
SLIDE 65

Expectation: Moment

Variance of a Bernoulli trial

Question: A coin is flipped one time. Let Ω be the sample space of possible outcomes, and let X be the r.v. that assigns to an outcome the number of heads in that outcome. What is the variance of X if the coin is biased, with P({H}) = p?

Solution:

E(X^2) = 1^2 \cdot p + 0^2 \cdot (1-p) = p
E(X) = 1 \cdot p + 0 \cdot (1-p) = p
V(X) = E(X^2) - (E(X))^2 = p - p^2 = p(1-p)

slide-67
SLIDE 67

Expectation: Moment

Variance of Binomial r.v.s

Question: Let the r.v. X be the number of successes in n mutually independent Bernoulli trials, where p is the probability of success on each trial. What is the variance of X?

Solution: Writing q = 1 − p,

E(X^2) = \sum_{k=0}^{n} k^2 \cdot P(X = k) = \sum_{k=1}^{n} k(k-1) \cdot P(X = k) + \sum_{k=1}^{n} k \cdot P(X = k)
       = n(n-1)p^2 \sum_{k=2}^{n} \binom{n-2}{k-2} p^{k-2} q^{n-k} + np
       = n(n-1)p^2 \sum_{j=0}^{n-2} \binom{n-2}{j} p^j q^{n-2-j} + np
       = n(n-1)p^2 (p+q)^{n-2} + np = n(n-1)p^2 + np,

so

V(X) = E(X^2) - (E(X))^2 = n(n-1)p^2 + np - (np)^2 = np(1-p).
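The closed forms E(X²) = n(n−1)p² + np and V(X) = np(1−p) can be verified directly against the pmf; a sketch with arbitrary example values of n and p:

```python
from math import comb

n, p = 12, 0.35
q = 1.0 - p

# Binomial(n, p) pmf over its whole support
pmf = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]

ex = sum(k * pmf[k] for k in range(n + 1))        # E(X)
ex2 = sum(k * k * pmf[k] for k in range(n + 1))   # E(X^2)
var = ex2 - ex**2                                 # V(X)

assert abs(ex - n * p) < 1e-10
assert abs(ex2 - (n * (n - 1) * p**2 + n * p)) < 1e-10
assert abs(var - n * p * q) < 1e-10
```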

slide-69
SLIDE 69

Expectation: Moment

Nonlinearity of variance

Theorem. If X is a r.v. on Ω, and a and b are real numbers, then V(aX + b) = a^2 V(X).

Proof.

V(aX + b) = E((aX + b)^2) - (E(aX + b))^2
          = E(a^2 X^2 + 2abX + b^2) - (a^2 (E(X))^2 + 2ab E(X) + b^2)
          = a^2 E(X^2) + 2ab E(X) + b^2 - a^2 (E(X))^2 - 2ab E(X) - b^2
          = a^2 E(X^2) - a^2 (E(X))^2 = a^2 V(X).

slide-70
SLIDE 70

Expectation: Moment

Bienaymé's formula

Theorem. If X and Y are two independent r.v.s on a sample space Ω, then V(X + Y) = V(X) + V(Y). Furthermore, if X_i, i = 1, 2, …, n, with n a positive integer, are pairwise independent r.v.s on Ω, then

V\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} V(X_i).

Proof:

V(X + Y) = E((X + Y)^2) - [E(X + Y)]^2
         = E(X^2 + 2XY + Y^2) - ([E(X)]^2 + 2E(X)E(Y) + [E(Y)]^2)
         = E(X^2) + 2E(XY) + E(Y^2) - [E(X)]^2 - 2E(X)E(Y) - [E(Y)]^2
         = E(X^2) + 2E(X)E(Y) + E(Y^2) - [E(X)]^2 - 2E(X)E(Y) - [E(Y)]^2   (by independence, E(XY) = E(X)E(Y))
         = V(X) + V(Y).

slide-71
SLIDE 71

Expectation: Moment

Variance of Binomial r.v.s (via Bienaymé's formula)

Question: Let the r.v. X be the number of successes in n mutually independent Bernoulli trials, where p is the probability of success on each trial. What is the variance of X?

Solution: Let X_i be the number of successes in the i-th Bernoulli trial. Then X = \sum_{i=1}^{n} X_i, and X_i, X_j are independent for i ≠ j.

E(X_i^2) = 1^2 p + 0^2 (1-p) = p;
V(X_i) = E(X_i^2) - (E(X_i))^2 = p - p^2 = p(1-p);
V(X) = \sum_{i=1}^{n} V(X_i) = np(1-p).

slide-73
SLIDE 73

Expectation: Moment

Variance of the sample mean

Question: Let X_i, i = 1, 2, …, n, with n a positive integer, be independent and identically distributed r.v.s with V(X_i) = σ^2. What is the variance of \frac{1}{n} \sum_{i=1}^{n} X_i?

Solution:

V\left(\frac{1}{n} \sum_{i=1}^{n} X_i\right) = \left(\frac{1}{n}\right)^2 V\left(\sum_{i=1}^{n} X_i\right) = \left(\frac{1}{n}\right)^2 \sum_{i=1}^{n} V(X_i) = \left(\frac{1}{n}\right)^2 n\sigma^2 = \frac{\sigma^2}{n}.

That is, the variance of the mean decreases as n increases, which is a desirable property of the variance.
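A quick simulation illustrates the σ²/n law; a sketch assuming X_i ~ Normal(0, 2), so σ² = 4 (all constants here are arbitrary choices):

```python
import random
import statistics

random.seed(3)
n, trials = 25, 20_000
sigma2 = 4.0  # each X_i ~ Normal(0, 2), so V(X_i) = 4

# Draw many sample means of n observations and measure their variance
means = [
    statistics.fmean(random.gauss(0.0, 2.0) for _ in range(n))
    for _ in range(trials)
]
var_of_mean = statistics.pvariance(means)

# Should be close to sigma^2 / n = 0.16
assert abs(var_of_mean - sigma2 / n) < 0.01
```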

slide-76
SLIDE 76

Expectation: Moment

Expectation of independent r.v.s

Theorem. If X and Y are independent r.v.s on a sample space Ω, then E(XY) = E(X)E(Y).

Proof: We use the key observation that the event XY = r is the disjoint union of the events (X = r_1 ∧ Y = r_2) over all r_1 ∈ X(Ω) and r_2 ∈ Y(Ω) with r = r_1 r_2. We have

E(XY) = \sum_{r \in XY(\Omega)} r \cdot P(XY = r)
      = \sum_{r_1 \in X(\Omega), r_2 \in Y(\Omega)} r_1 r_2 \cdot P(X = r_1 \wedge Y = r_2)
      = \sum_{r_1 \in X(\Omega)} \sum_{r_2 \in Y(\Omega)} (r_1 \cdot P(X = r_1))(r_2 \cdot P(Y = r_2))
      = \sum_{r_1 \in X(\Omega)} \left( r_1 \cdot P(X = r_1) \sum_{r_2 \in Y(\Omega)} r_2 \cdot P(Y = r_2) \right)
      = \sum_{r_1 \in X(\Omega)} r_1 \cdot P(X = r_1) \, E(Y) = E(Y) \sum_{r_1 \in X(\Omega)} r_1 \cdot P(X = r_1) = E(X)E(Y),

where the third equality uses independence: P(X = r_1 ∧ Y = r_2) = P(X = r_1) P(Y = r_2).

slide-77
SLIDE 77

Outline (current section: Expectation, Moment Generating Functions)

slide-78
SLIDE 78

Expectation: Moment Generating Functions

Moment generating functions

Definition. Let X be a r.v. with cdf F_X(x). The moment generating function (mgf) of X (or of F_X), denoted M_X(t), is

M_X(t) = E(e^{tX}),

provided the expectation exists for t in some neighborhood of 0; that is, there is an h > 0 such that E(e^{tX}) exists for all t ∈ (−h, h). If the expectation does not exist in a neighborhood of 0, the moment generating function does not exist. The mgf can be computed as

  • X continuous: M_X(t) = \int_{-\infty}^{+\infty} e^{tx} f_X(x)\,dx;
  • X discrete: M_X(t) = \sum_{x} e^{tx} P(X = x).

slide-79
SLIDE 79

Expectation: Moment Generating Functions

Theorem

If X has moment generating function M_X(t), then

E(X^n) = M_X^{(n)}(0),

where we define M_X^{(n)}(0) = \frac{d^n}{dt^n} M_X(t)\Big|_{t=0}.

Proof. Assuming we can differentiate under the integral sign (see Section 3),

\frac{d}{dt} M_X(t) = \frac{d}{dt} \int_{-\infty}^{+\infty} e^{tx} f_X(x)\,dx = \int_{-\infty}^{+\infty} \frac{d}{dt}(e^{tx}) f_X(x)\,dx = \int_{-\infty}^{+\infty} x e^{tx} f_X(x)\,dx = E(X e^{tX}).

Thus,

\frac{d}{dt} M_X(t)\Big|_{t=0} = E(X e^{tX})\Big|_{t=0} = E(X).

Proceeding in an analogous manner, we can establish that

\frac{d^n}{dt^n} M_X(t)\Big|_{t=0} = E(X^n e^{tX})\Big|_{t=0} = E(X^n).

That is, the n-th moment equals the n-th derivative of M_X(t) evaluated at t = 0.

slide-81
SLIDE 81

Expectation Moment Generating Functions

Gamma moment generating function

The Gamma pdf is
\[ f(x) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\, x^{\alpha-1} e^{-x/\beta}, \quad 0 < x < \infty,\ \alpha > 0,\ \beta > 0. \]
The moment generating function is given by
\[ M_X(t) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}} \int_0^{\infty} e^{tx}\, x^{\alpha-1} e^{-x/\beta}\,dx = \frac{1}{\Gamma(\alpha)\beta^{\alpha}} \int_0^{\infty} x^{\alpha-1} e^{-x/\left(\frac{\beta}{1-\beta t}\right)}\,dx. \]
Note that $\int_0^{\infty} f_X(x)\,dx = 1$, hence $\int_0^{\infty} x^{\alpha-1} e^{-x/\beta}\,dx = \Gamma(\alpha)\beta^{\alpha}$.

SLIDE 82

Gamma moment generating function Cont'd

If $t < \frac{1}{\beta}$,
\[ M_X(t) = \frac{1}{\Gamma(\alpha)\beta^{\alpha}}\,\Gamma(\alpha)\left(\frac{\beta}{1-\beta t}\right)^{\alpha} = \left(\frac{1}{1-\beta t}\right)^{\alpha}. \]
The mean of the Gamma distribution is given by
\[ E(X) = \frac{d}{dt} M_X(t)\Big|_{t=0} = \frac{\alpha\beta}{(1-\beta t)^{\alpha+1}}\Big|_{t=0} = \alpha\beta. \]
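The same differentiation can be carried out symbolically; this sketch (illustrative, using sympy) confirms the mean $\alpha\beta$ and, from the second derivative, the variance $\alpha\beta^2$:

```python
import sympy as sp

# Sketch: differentiate the Gamma mgf (1/(1 - beta*t))**alpha
# at t = 0 to confirm E(X) = alpha*beta and Var(X) = alpha*beta**2.
t, a, b = sp.symbols('t alpha beta', positive=True)
M = (1 / (1 - b * t)) ** a

EX = sp.simplify(sp.diff(M, t).subs(t, 0))       # alpha*beta
EX2 = sp.simplify(sp.diff(M, t, 2).subs(t, 0))   # alpha*(alpha+1)*beta**2
Var = sp.simplify(EX2 - EX**2)                   # alpha*beta**2
```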

SLIDE 83

Moment generating functions vs. distributions

Theorem. Let $F_X(x)$ and $F_Y(y)$ be two cdfs all of whose moments exist.
- If X and Y have bounded support, then $F_X(u) = F_Y(u)$ for all u if and only if $E(X^r) = E(Y^r)$ for all integers $r = 0, 1, 2, \dots$
- If the moment generating functions exist and $M_X(t) = M_Y(t)$ for all t in some neighborhood of 0, then $F_X(u) = F_Y(u)$ for all u.

SLIDE 84

Convergence of moment generating functions

Suppose $\{X_i, i = 1, 2, \dots\}$ is a sequence of r.v.'s, each with mgf $M_{X_i}(t)$. Furthermore, suppose that, for all t in a neighborhood of 0,
\[ \lim_{i\to\infty} M_{X_i}(t) = M_X(t), \]
where $M_X(t)$ is an mgf. Then there is a unique cdf $F_X(x)$ whose moments are determined by $M_X(t)$ and, for all x where $F_X(x)$ is continuous, we have
\[ \lim_{i\to\infty} F_{X_i}(x) = F_X(x). \]
That is, convergence, for $|t| < h$, of mgfs to an mgf implies convergence of cdfs.
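A classical illustration of this theorem (not on the slide) is the binomial-to-Poisson limit: with $p = \lambda/n$, the binomial$(n,p)$ mgf $(1 - p + pe^t)^n$ converges to the Poisson$(\lambda)$ mgf $e^{\lambda(e^t - 1)}$ for each fixed t, so the binomial cdfs converge to the Poisson cdf. A numerical sketch (illustrative $\lambda$ and $t$):

```python
import numpy as np

# Sketch: with p = lam/n, the binomial(n, p) mgf approaches the
# Poisson(lam) mgf as n grows, for a fixed t.
lam, t = 2.0, 0.5

def binom_mgf(n):
    p = lam / n
    return (1 - p + p * np.exp(t)) ** n

poisson_mgf = np.exp(lam * (np.exp(t) - 1))
errs = [abs(binom_mgf(n) - poisson_mgf) for n in (10, 100, 10_000)]
```

The gap to the Poisson mgf shrinks steadily as n increases.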

SLIDE 85

Property of moment generating functions

For any constants a and b, the mgf of the r.v. $aX + b$ is given by
\[ M_{aX+b}(t) = e^{bt} M_X(at). \]
Proof.
\[ M_{aX+b}(t) = E\big(e^{(aX+b)t}\big) = E\big(e^{(aX)t} e^{bt}\big) = e^{bt} E\big(e^{(at)X}\big) = e^{bt} M_X(at). \]
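A Monte Carlo sketch of this property for $X \sim$ exponential(1), whose mgf is $1/(1-t)$ for $t < 1$ (the values of a, b, t below are illustrative; we need $at < 1$ so that $M_X(at)$ exists):

```python
import numpy as np

# Sketch: check M_{aX+b}(t) = e^{bt} * M_X(at) empirically for
# X ~ exponential(1), where M_X(t) = 1/(1 - t) for t < 1.
rng = np.random.default_rng(1)
x = rng.exponential(size=200_000)
a, b, t = 2.0, 3.0, 0.2            # a*t = 0.4 < 1, so the mgf exists

lhs = np.exp(t * (a * x + b)).mean()   # empirical mgf of aX + b at t
rhs = np.exp(b * t) / (1 - a * t)      # e^{bt} * M_X(at)
```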

SLIDE 86

Differentiating Under an Integral Sign

Leibnitz's Rule

If $f(x,\theta)$, $a(\theta)$ and $b(\theta)$ are differentiable w.r.t. $\theta$, then
\[ \frac{d}{d\theta}\int_{a(\theta)}^{b(\theta)} f(x,\theta)\,dx = f(b(\theta),\theta)\frac{db(\theta)}{d\theta} - f(a(\theta),\theta)\frac{da(\theta)}{d\theta} + \int_{a(\theta)}^{b(\theta)} \frac{\partial f(x,\theta)}{\partial\theta}\,dx. \]
Notice that if $a(\theta)$ and $b(\theta)$ are constant, we have a special case of Leibnitz's Rule:
\[ \frac{d}{d\theta}\int_a^b f(x,\theta)\,dx = \int_a^b \frac{\partial f(x,\theta)}{\partial\theta}\,dx. \]
If we have the integral of a differentiable function over a finite range, differentiating the integral poses no problem; if the range of integration is infinite, problems can arise.
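A numerical sketch of the rule (an illustration with an arbitrarily chosen integrand, not from the slides): take $f(x,\theta) = e^{-\theta x}$, $a(\theta) = 0$, $b(\theta) = \theta^2$, and compare the right-hand side of Leibnitz's Rule with a central finite difference of the integral.

```python
import numpy as np
from scipy.integrate import quad

# Sketch of Leibnitz's Rule for F(th) = integral of e^{-th*x}
# over [0, th^2]:
#   dF/dth = e^{-th*b} * b'(th) + integral of (-x)*e^{-th*x} over [0, b],
# with b(th) = th^2 (and a(th) = 0 constant, so its term vanishes).
def F(th):
    return quad(lambda x: np.exp(-th * x), 0.0, th**2)[0]

th = 1.3
b, db = th**2, 2 * th
leibniz = np.exp(-th * b) * db + quad(lambda x: -x * np.exp(-th * x), 0.0, b)[0]

h = 1e-6
numeric = (F(th + h) - F(th - h)) / (2 * h)   # central difference
```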

SLIDE 87

Interchanging differentiation and integration

Recall that if $f(x,\theta)$ is differentiable, then
\[ \frac{\partial f(x,\theta)}{\partial\theta} = \lim_{\delta\to 0} \frac{f(x,\theta+\delta) - f(x,\theta)}{\delta}, \]
so we have
\[ \int_{-\infty}^{+\infty} \frac{\partial f(x,\theta)}{\partial\theta}\,dx = \int_{-\infty}^{+\infty} \lim_{\delta\to 0} \frac{f(x,\theta+\delta) - f(x,\theta)}{\delta}\,dx, \]
while
\[ \frac{d}{d\theta}\int_{-\infty}^{+\infty} f(x,\theta)\,dx = \lim_{\delta\to 0} \int_{-\infty}^{+\infty} \frac{f(x,\theta+\delta) - f(x,\theta)}{\delta}\,dx. \]
The question of interchanging differentiation and integration is exactly the question of when these two expressions are equal.

SLIDE 88

Corollary of Lebesgue's Dominated Convergence Theorem

Suppose the function $h(x,y)$ is continuous at $y_0$ for each x, and there exists a function $g(x)$ satisfying
- $|h(x,y)| \le g(x)$ for all x and y;
- $\int_{-\infty}^{+\infty} g(x)\,dx < \infty$.
Then,
\[ \lim_{y\to y_0} \int_{-\infty}^{+\infty} h(x,y)\,dx = \int_{-\infty}^{+\infty} \lim_{y\to y_0} h(x,y)\,dx. \]
The key condition in this theorem is the existence of a dominating function $g(x)$ with a finite integral.

SLIDE 89

Corollary

Suppose the function $f(x,\theta)$ is differentiable at $\theta_0$, that is,
\[ \lim_{\delta\to 0} \frac{f(x,\theta_0+\delta) - f(x,\theta_0)}{\delta} = \frac{\partial f(x,\theta)}{\partial\theta}\Big|_{\theta=\theta_0} \]
exists for every x, and there exist a function $g(x,\theta_0)$ and a constant $\delta_0 > 0$ such that
- $\Big|\frac{f(x,\theta_0+\delta) - f(x,\theta_0)}{\delta}\Big| \le g(x,\theta_0)$ for all x and $|\delta| \le \delta_0$;
- $\int_{-\infty}^{+\infty} g(x,\theta_0)\,dx < \infty$.
Then,
\[ \frac{d}{d\theta}\int_{-\infty}^{+\infty} f(x,\theta)\,dx\,\Big|_{\theta=\theta_0} = \int_{-\infty}^{+\infty} \frac{\partial f(x,\theta)}{\partial\theta}\Big|_{\theta=\theta_0}\,dx. \]
The distinction between $\theta$ and $\theta_0$ is not stressed when $f(x,\theta)$ is differentiable at all $\theta$.

SLIDE 90

Example

Let X have the exponential($\lambda$) pdf $f(x) = \frac{1}{\lambda} e^{-x/\lambda}$, $0 < x < \infty$, and suppose we want to calculate
\[ \frac{d}{d\lambda} E(X^n) = \frac{d}{d\lambda}\int_0^{\infty} x^n \frac{1}{\lambda} e^{-x/\lambda}\,dx \]
for integer $n > 0$. If we could move the differentiation inside the integral, we would have
\[ \frac{d}{d\lambda} E(X^n) = \int_0^{\infty} \frac{\partial}{\partial\lambda}\Big(x^n \frac{1}{\lambda} e^{-x/\lambda}\Big)\,dx = \int_0^{\infty} \frac{x^n e^{-x/\lambda}}{\lambda^2}\Big(\frac{x}{\lambda} - 1\Big)\,dx = \frac{1}{\lambda^2} E(X^{n+1}) - \frac{1}{\lambda} E(X^n). \]
Hence
\[ E(X^{n+1}) = \lambda E(X^n) + \lambda^2 \frac{d}{d\lambda} E(X^n). \]
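Since $E(X^n) = n!\,\lambda^n$ for the exponential distribution, the recursion can be verified symbolically (an illustrative check with sympy, here at $n = 4$):

```python
import sympy as sp

# Sketch: for the exponential(lam) distribution, E(X^n) = n! * lam^n,
# so the recursion E(X^{n+1}) = lam*E(X^n) + lam^2 * d/dlam E(X^n)
# can be checked symbolically at a fixed n.
lam = sp.symbols('lam', positive=True)
n = 4

EXn = sp.factorial(n) * lam**n                              # E(X^n)
rhs = sp.simplify(lam * EXn + lam**2 * sp.diff(EXn, lam))   # recursion RHS
lhs = sp.factorial(n + 1) * lam**(n + 1)                    # E(X^{n+1})
```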

SLIDE 91

Justifying the interchange

We bound the derivative of $x^n \frac{1}{\lambda} e^{-x/\lambda}$. Now
\[ \Big|\frac{\partial}{\partial\lambda}\,\frac{x^n e^{-x/\lambda}}{\lambda}\Big| = \frac{x^n e^{-x/\lambda}}{\lambda^2}\Big|\frac{x}{\lambda} - 1\Big| \le \frac{x^n e^{-x/\lambda}}{\lambda^2}\Big(\frac{x}{\lambda} + 1\Big) \quad \Big(\text{since } \frac{x}{\lambda} > 0\Big). \]
For $0 < \delta_0 < \lambda$ and $|\lambda' - \lambda| \le \delta_0$, we take
\[ \Big|\frac{\partial}{\partial\lambda}\,\frac{x^n e^{-x/\lambda}}{\lambda}\Big|_{\lambda=\lambda'}\Big| \le \frac{x^n e^{-x/(\lambda-\delta_0)}}{(\lambda-\delta_0)^2}\Big(\frac{x}{\lambda-\delta_0} + 1\Big) = g(x, \lambda-\delta_0). \]
Since the exponential distribution has all of its moments, $\int_0^{+\infty} g(x, \lambda-\delta_0)\,dx < \infty$ as long as $\lambda - \delta_0 > 0$.

Thus, the example gives us a recursion relation for the moments of the exponential distribution.

SLIDE 92

Interchanging differentiation and summation

Suppose that the series $\sum_{x=0}^{\infty} h(\theta,x)$ converges for all $\theta$ in an interval $(a,b)$ of real numbers and
- $\frac{\partial}{\partial\theta} h(\theta,x)$ is continuous in $\theta$ for each x;
- $\sum_{x=0}^{\infty} \frac{\partial}{\partial\theta} h(\theta,x)$ converges uniformly on every closed bounded subinterval of $(a,b)$.
Then,
\[ \frac{d}{d\theta}\sum_{x=0}^{\infty} h(\theta,x) = \sum_{x=0}^{\infty} \frac{\partial}{\partial\theta} h(\theta,x). \]
The key condition in this theorem is the uniform convergence of the series of derivatives.

SLIDE 93

Example

Let X have the geometric distribution $P(X = x) = \theta(1-\theta)^x$, $x = 0, 1, \dots$, $0 < \theta < 1$. We have $\sum_{x=0}^{\infty} \theta(1-\theta)^x = 1$, and, since the left-hand side below is $\frac{d}{d\theta}1 = 0$,
\[ \frac{d}{d\theta}\sum_{x=0}^{\infty}\theta(1-\theta)^x = \sum_{x=0}^{\infty}\frac{d}{d\theta}\,\theta(1-\theta)^x = \sum_{x=0}^{\infty}\big[(1-\theta)^x - \theta x(1-\theta)^{x-1}\big] \]
\[ = \frac{1}{\theta}\sum_{x=0}^{\infty}\theta(1-\theta)^x - \frac{1}{1-\theta}\sum_{x=0}^{\infty} x\,\theta(1-\theta)^x = \frac{1}{\theta} - \frac{1}{1-\theta}E(X) = 0. \]
That is, $E(X) = \frac{1-\theta}{\theta}$.
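A numeric sketch (illustrative $\theta$ and truncation point): truncating the geometric pmf far enough out, the probabilities sum to 1 and the mean matches $(1-\theta)/\theta$.

```python
import numpy as np

# Sketch: truncate the geometric pmf P(X = x) = theta*(1-theta)^x
# at a large x and check total mass and mean against (1-theta)/theta.
theta = 0.3
x = np.arange(0, 500)
pmf = theta * (1 - theta) ** x

total = pmf.sum()        # should be ~1 (tail beyond 500 is negligible)
mean = (x * pmf).sum()   # should be ~(1 - theta)/theta
```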

SLIDE 94

Justifying the interchange

Let $h(\theta,x) = \theta(1-\theta)^x$. Then
\[ \frac{\partial}{\partial\theta} h(\theta,x) = (1-\theta)^x - \theta x(1-\theta)^{x-1}, \]
and we verify that $\sum_{x=0}^{\infty} \frac{\partial}{\partial\theta} h(\theta,x)$ converges uniformly. Calculate the partial sums $S_n(\theta)$ (using $-\theta x(1-\theta)^{x-1} = \theta\,\frac{\partial}{\partial\theta}(1-\theta)^x$):
\[ \lim_{n\to\infty} S_n(\theta) = \lim_{n\to\infty} \sum_{x=0}^{n}\big[(1-\theta)^x - \theta x(1-\theta)^{x-1}\big] \]
\[ = \lim_{n\to\infty}\Big[\frac{1-(1-\theta)^{n+1}}{\theta} + \theta\sum_{x=0}^{n}\frac{\partial}{\partial\theta}(1-\theta)^x\Big] = \lim_{n\to\infty}\Big[\frac{1-(1-\theta)^{n+1}}{\theta} + \theta\,\frac{\partial}{\partial\theta}\,\frac{1-(1-\theta)^{n+1}}{\theta}\Big] \]
\[ = \lim_{n\to\infty}(n+1)(1-\theta)^n = 0. \]
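The collapse of the partial sums to $(n+1)(1-\theta)^n$, and their convergence to 0, can be checked numerically (illustrative $\theta$):

```python
import numpy as np

theta = 0.3

def S(n):
    # partial sum of the term-by-term derivatives of theta*(1-theta)^x
    x = np.arange(0, n + 1)
    return ((1 - theta) ** x - theta * x * (1 - theta) ** (x - 1)).sum()

def closed(n):
    # the closed form derived above: (n + 1)*(1 - theta)^n
    return (n + 1) * (1 - theta) ** n

vals = [S(n) for n in (5, 20, 80)]
```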

SLIDE 95

Take-aways

Conclusions

- Transformation: Functions of a r.v.; Monotone Transformations
- Expectation: Properties of Expectations; Moments; Moment Generating Functions
- Differentiating Under an Integral Sign
