

slide-1
SLIDE 1

BS2247 Introduction to Econometrics Lecture 2: Fundamentals of Probability

  • Dr. Kai Sun

Aston Business School

1 / 30

slide-2
SLIDE 2

Why do we care about this topic?

◮ Economic variables (e.g., education, wage) are random variables, in the sense that each observation (i.e., realization) is a random draw from the entire population.

◮ Each random variable has a probability measure. Roughly speaking, a probability measure is a function that maps the occurrence of an event (e.g., a realization of a random variable) to the probability of that event.

2 / 30


slide-4
SLIDE 4

For example, the probability measure can tell us that the probability of, say, educ = 12 years (i.e., a realization of the random variable education) is, say, 0.2. This is the same as saying that 20% of the observations in the sample have educ = 12 years.

3 / 30

slide-5
SLIDE 5

Discrete random variables

◮ Random variables that take on only a finite number of values.

◮ For example, consider tossing a single coin. The two outcomes/events are heads and tails.

◮ A discrete random variable can then be defined as: x = 1 if the coin turns up heads, and x = 0 if it turns up tails. By tossing the coin many times, we can estimate the probabilities of x = 1 and x = 0.

4 / 30
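The coin-toss idea above can be sketched in a few lines of Python (not part of the original slides): simulate many tosses and use the relative frequency of each outcome as an estimate of its probability.

```python
import random

random.seed(0)  # fixed seed so the simulation is reproducible

# Simulate n coin tosses: x = 1 for heads, x = 0 for tails.
n = 100_000
tosses = [random.randint(0, 1) for _ in range(n)]

# The relative frequency of each outcome estimates its probability.
p_heads = sum(tosses) / n
p_tails = 1 - p_heads
```

For a fair coin, both estimates settle near 0.5 as n grows.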


slide-8
SLIDE 8

Probability Density Function (pdf)

◮ Generally, if a discrete random variable X takes on the n possible values {x1, . . . , xn}, then the probability measure is pi = P(X = xi), i = 1, 2, . . . , n, where 0 ≤ pi ≤ 1 and Σi pi = 1.

◮ P(·) is also called the probability density function (pdf) of X: “The probability of X = xi is equal to pi.”

5 / 30
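A discrete pdf can be stored as a simple mapping from values to probabilities. This sketch (using the numbers from the worked example later in the lecture) checks the two conditions just stated: each pi lies in [0, 1] and the pi sum to one.

```python
# A discrete pdf stored as {value: probability}.
pmf = {4: 0.1, 12: 0.2, 2: 0.5, 6: 0.2}

# Conditions on a pdf: 0 <= p_i <= 1 and sum_i p_i = 1.
assert all(0 <= p <= 1 for p in pmf.values())
assert abs(sum(pmf.values()) - 1) < 1e-9

prob_X_equals_2 = pmf[2]  # P(X = 2) = 0.5
```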


slide-10
SLIDE 10

The pdf of heads and tails from the coin-tossing example

This is essentially a histogram!

6 / 30

slide-11
SLIDE 11

Continuous random variables

◮ They are random variables that can take on a continuum of values.

◮ For example, wage should be a continuous random variable.

◮ In practice, education can also be treated as continuous, although in theory it may be discrete.

7 / 30

slide-12
SLIDE 12

Continuous random variables

◮ The pdf of a continuous random variable gives the probability of events (i.e., realizations) involving a range of values (not a particular value!).

◮ P(a ≤ X ≤ b) measures the probability that X falls between a and b.

◮ For example, P(16 ≤ wage ≤ 18) = 0.1 says that the probability that wage falls between 16 and 18 is 0.1. This is the same as saying that 10% of the observations in the sample have 16/hour ≤ wage ≤ 18/hour.

8 / 30
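As an illustration (not from the slides), here is how P(a ≤ X ≤ b) can be computed for a continuous variable. The assumption that wage follows a normal distribution with mean 17 and standard deviation 4 is purely hypothetical, chosen so the numbers resemble the example above.

```python
import math

def normal_cdf(x, mu, sigma):
    """F(x) = P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# Hypothetical: suppose wage ~ N(17, 4^2). Then P(16 <= wage <= 18)
# is the area under the pdf between 16 and 18.
p = normal_cdf(18, mu=17, sigma=4) - normal_cdf(16, mu=17, sigma=4)
```

Under these made-up parameters, p comes out close to 0.2, similar in spirit to the slide's example.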


slide-15
SLIDE 15

Use Histogram to illustrate the pdf of wage

Histogram of wage

[Figure: histogram of wage; x-axis wage (10 to 30), y-axis pdf of wage (0.00 to 0.10)]

9 / 30

slide-16
SLIDE 16

Use Density plot to illustrate the pdf of wage

Density of wage

[Figure: density plot of wage; x-axis wage (10 to 30), y-axis pdf of wage (0.00 to 0.10)]

10 / 30

slide-17
SLIDE 17

Normal Distribution

[Figure: normal density overlaid on the density of wage; x-axis wage (10 to 30), y-axis pdf of wage (0.00 to 0.10)]

11 / 30

slide-18
SLIDE 18

Other Distributions

12 / 30

slide-19
SLIDE 19

Cumulative Distribution Function (cdf)

◮ Sometimes it’s easier to work with the cumulative distribution function (cdf), defined as F(x) = P(X ≤ x), where X is a random variable (either discrete or continuous) and x is any real number.

◮ For a continuous random variable, F(x) is the area under the pdf to the left of the point x.

13 / 30


slide-21
SLIDE 21

Cumulative Distribution Function (cdf)

Properties of the cdf:
(1) P(X > c) = 1 − F(c), for any number c
(2) P(a < X ≤ b) = F(b) − F(a)
(3) For a continuous random variable, any of the above inequalities can be made strict, and vice versa.

From the previous example, if P(16 ≤ wage ≤ 18) = 0.1, then F(18) − F(16) = 0.1, where F is the cdf of wage.

14 / 30
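Properties (1) and (2) can be verified numerically on a discrete distribution (using the values from the lecture's running example): build F(x) by summing the pdf up to x, then compare against direct probability sums.

```python
# Discrete pdf from the running example.
pmf = {2: 0.5, 4: 0.1, 6: 0.2, 12: 0.2}

def F(x):
    """cdf: F(x) = P(X <= x), summing the pdf over values <= x."""
    return sum(p for v, p in pmf.items() if v <= x)

# Property (1): P(X > c) = 1 - F(c)
c = 4
p_greater = sum(p for v, p in pmf.items() if v > c)
assert abs(p_greater - (1 - F(c))) < 1e-9

# Property (2): P(a < X <= b) = F(b) - F(a)
a, b = 2, 6
p_between = sum(p for v, p in pmf.items() if a < v <= b)
assert abs(p_between - (F(b) - F(a))) < 1e-9
```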


slide-25
SLIDE 25

Features of Random Variables

Expected value (population mean)

◮ It is a weighted average of all possible values of X. The weights are determined by the pdf.

◮ Precisely, if X is discrete and can take values {x1, . . . , xn} with pdf pi = P(X = xi), then the expected value of X is E(X) = x1p1 + · · · + xnpn = Σi xi pi (where the xi are the realizations and the pi are the weights).

15 / 30


slide-27
SLIDE 27

Example Question: X = {4, 12, 2, 6}, P(X = 4) = 0.1, P(X = 12) = 0.2, P(X = 2) = 0.5, P(X = 6) = 0.2. Calculate E(X) and E(X²).

Answer: E(X) = 4 × 0.1 + 12 × 0.2 + 2 × 0.5 + 6 × 0.2 = 0.4 + 2.4 + 1 + 1.2 = 5.
E(X²) = 4² × 0.1 + 12² × 0.2 + 2² × 0.5 + 6² × 0.2 = 1.6 + 28.8 + 2 + 7.2 = 39.6.

16 / 30
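The same calculation in Python (an editorial sketch, not from the slides): E(X) and E(X²) are just probability-weighted sums over the pdf.

```python
# pdf from the example above.
pmf = {4: 0.1, 12: 0.2, 2: 0.5, 6: 0.2}

# E(X) = sum_i x_i * p_i  and  E(X^2) = sum_i x_i^2 * p_i
E_X = sum(x * p for x, p in pmf.items())
E_X2 = sum(x**2 * p for x, p in pmf.items())
```

This reproduces the slide's answers: E(X) = 5 and E(X²) = 39.6.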


slide-29
SLIDE 29

Features of Random Variables

Properties of expected values:
(1) E(c) = c, where c is a constant (not random!)
(2) E(aX + b) = aE(X) + b, where a and b are constants
(3) E(Σi ai xi) = Σi ai E(xi) (the expectation of a sum is the sum of the expectations)

17 / 30


slide-32
SLIDE 32

Features of Random Variables

Median
It is the value in the middle of an ordered sequence of realizations of a random variable.

Example Question: X = {4, 12, 2, 6}; find the median of X.
Answer: Ordered, X = {2, 4, 6, 12}; taking the average of the two middle numbers gives (4 + 6)/2 = 5.

18 / 30
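Python's standard library implements exactly this rule (averaging the two middle values when the number of observations is even), so the example can be checked directly:

```python
import statistics

X = [4, 12, 2, 6]
# sorted: [2, 4, 6, 12]; even count, so the median averages the middle two.
m = statistics.median(X)  # (4 + 6) / 2
```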


slide-35
SLIDE 35

Features of Random Variables

Variance: measuring the spread of the pdf
Var(X) = E[(X − E(X))²] = E(X²) − (E(X))²

Properties of variance:
(1) Var(aX + b) = a²Var(X)
(2) Var(aX ± bY) = a²Var(X) + b²Var(Y) ± 2abCov(X, Y) (where a and b are constants, and X and Y are random)
(3)* Var(Σi ai xi) = Σi ai² Var(xi) if Cov(xi, xj) = 0 ∀ i ≠ j (where the ai are constants and the xi are random)

19 / 30


slide-39
SLIDE 39

Standard deviation (sd) is the square root of the variance.
Property of standard deviation: sd(aX + b) = |a| sd(X)

20 / 30

slide-40
SLIDE 40

Example Question: X = {4, 12, 2, 6}, P(X = 4) = 0.1, P(X = 12) = 0.2, P(X = 2) = 0.5, P(X = 6) = 0.2. Find the variance and standard deviation of X.

Answer: Var(X) = E(X²) − (E(X))². We calculated that E(X²) = 39.6 and E(X) = 5, so Var(X) = 39.6 − 5² = 14.6, and sd(X) = √Var(X) = √14.6 ≈ 3.82.

21 / 30
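The shortcut formula Var(X) = E(X²) − (E(X))² makes this a two-line computation from the pdf (an editorial sketch continuing the same example):

```python
import math

pmf = {4: 0.1, 12: 0.2, 2: 0.5, 6: 0.2}
E_X = sum(x * p for x, p in pmf.items())        # E(X) = 5
E_X2 = sum(x**2 * p for x, p in pmf.items())    # E(X^2) = 39.6

# Var(X) = E(X^2) - (E(X))^2; sd(X) = sqrt(Var(X))
var_X = E_X2 - E_X**2
sd_X = math.sqrt(var_X)
```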


slide-42
SLIDE 42

Features of Random Variables

Covariance: measuring the association of two random variables
Cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y)

Properties of covariance:
(1) Cov(a1X + b1, a2Y + b2) = a1a2 Cov(X, Y)
(2) If X and Y are independent, then Cov(X, Y) = 0

22 / 30


slide-45
SLIDE 45

Features of Random Variables

Correlation coefficient: measuring the association of two random variables
It is the standardized covariance, in the sense that Corr(X, Y) = Cov(X, Y) / (sd(X) sd(Y))

Properties of the correlation coefficient:
(1) −1 ≤ Corr(X, Y) ≤ 1
(2) Corr(X, Y) = 0 ⇐⇒ Cov(X, Y) = 0
(3) Corr(a1X + b1, a2Y + b2) = Corr(X, Y)

23 / 30
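The sample analogues of covariance and correlation can be computed directly from paired data. The data below are made up for illustration; the point is that dividing the covariance by the two standard deviations always lands the result in [−1, 1].

```python
import math

# Hypothetical paired sample (x_i, y_i).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.5, 5.5, 8.0, 10.0]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Sample covariance and standard deviations (n - 1 in the denominator).
cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
sx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / (n - 1))
sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / (n - 1))

# Correlation = standardized covariance, guaranteed to lie in [-1, 1].
corr = cov / (sx * sy)
```

Because the made-up y values rise almost linearly with x, the correlation here is close to +1.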


slide-49
SLIDE 49

Standardizing a Random Variable

◮ We usually write X ∼ (µ, σ²), where µ is the mean of X, E(X), and σ² is the variance of X. Read as “a random variable X is distributed with mean µ and variance σ².”

◮ If we define a new random variable Z = (X − µ)/σ, we can show that E(Z) = 0 and Var(Z) = 1. So Z ∼ (0, 1); Z is called a standardized random variable.

◮ Continuing the previous example, E(X) = 5 and Var(X) = 14.6, so Z = (X − 5)/√14.6 is a standardized random variable.

24 / 30
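The claim E(Z) = 0 and Var(Z) = 1 can be checked exactly on the running discrete example: transform each value by z = (x − µ)/σ and recompute the mean and variance under the same pdf.

```python
import math

pmf = {4: 0.1, 12: 0.2, 2: 0.5, 6: 0.2}
mu = sum(x * p for x, p in pmf.items())               # E(X) = 5
var = sum(x**2 * p for x, p in pmf.items()) - mu**2   # Var(X) = 14.6
sigma = math.sqrt(var)

# Z = (X - mu) / sigma: same pdf, transformed values.
E_Z = sum((x - mu) / sigma * p for x, p in pmf.items())
Var_Z = sum(((x - mu) / sigma) ** 2 * p for x, p in pmf.items()) - E_Z**2
```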


slide-52
SLIDE 52

Conditional Expectation E(Y |x)

◮ Read as “the (conditional) expectation of Y given x.”

◮ Intuitively, this is E(Y) given a particular value of x.

◮ For example, E(wage|educ = 12) is the average wage for all people with 12 years of education. So E(wage|educ) is usually a function of educ, say E(wage|educ) = 1.05 + 0.45educ. From this example, E(wage|educ = 12) = 1.05 + 0.45 × 12 = 6.45 pounds/hour.

25 / 30
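The conditional mean function in the example (its coefficients 1.05 and 0.45 are the slide's illustrative values, not estimates) is just a function of educ:

```python
# Hypothetical conditional mean function from the example:
# E(wage | educ) = 1.05 + 0.45 * educ
def cond_exp_wage(educ):
    return 1.05 + 0.45 * educ

avg_wage_12yrs = cond_exp_wage(12)  # 6.45 pounds/hour
```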


slide-55
SLIDE 55

◮ P(Y |x) is the conditional probability density function (pdf) of Y given x.

◮ P(wage|educ = 12) is the distribution of wage among people in the population with 12 years of education. So P(16 ≤ wage ≤ 18|educ = 12) = 0.1 means that, of those with 12 years of education, 10% have 16/hour ≤ wage ≤ 18/hour.

26 / 30


slide-57
SLIDE 57

Conditional Expectation E(Y |x)

If Y is discrete and can take values y1, . . . , ym with conditional pdf pj = P(Y = yj|x), then the conditional expectation of Y given x is E(Y |x) = y1p1 + · · · + ympm = Σj yj pj (where the yj are the realizations and the pj are the weights).

27 / 30

slide-58
SLIDE 58

Example Question: Y = {4, 12, 2, 6}, P(Y = 4|x = 1) = 0.1, P(Y = 12|x = 1) = 0.2, P(Y = 2|x = 1) = 0.5, P(Y = 6|x = 1) = 0.2. Calculate E(Y |x = 1).

Answer: E(Y |x = 1) = 4 × 0.1 + 12 × 0.2 + 2 × 0.5 + 6 × 0.2 = 0.4 + 2.4 + 1 + 1.2 = 5.

28 / 30


slide-60
SLIDE 60

X and Y are random variables. Properties of conditional expectation:
(1) E[a(X)Y + b(X)|X] = a(X)E(Y |X) + b(X)
(“we know X, but we don’t know Y , and hence E(Y |X)”; “we know X, and so we know functions of X, a(X) and b(X)”)
(2) E[E(Y |X)] = E(Y ): the law of iterated expectations
(“the average of the average of Y given X is the same as the simple average of Y ”)
(3) If E(Y |X) = E(Y ), then Cov(X, Y ) = 0 and Corr(X, Y ) = 0
(“if knowing X doesn’t help to know Y , then X and Y are uncorrelated”)

29 / 30
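The law of iterated expectations, property (2), can be checked on a small joint distribution. The joint pdf below is made up for illustration: compute E(Y|X = x) for each x, average those over the distribution of X, and compare with the unconditional E(Y).

```python
# A small hypothetical joint pdf over pairs (x, y).
joint = {(0, 1): 0.2, (0, 3): 0.3, (1, 1): 0.1, (1, 3): 0.4}

# Marginal pdf of X.
xs = {x for x, _ in joint}
pX = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in xs}

# Conditional expectations E(Y | X = x) = sum_y y * P(X=x, Y=y) / P(X=x).
E_Y_given = {
    x: sum(y * p for (xx, y), p in joint.items() if xx == x) / pX[x]
    for x in xs
}

# Law of iterated expectations: E[E(Y|X)] should equal E(Y).
lie = sum(E_Y_given[x] * pX[x] for x in xs)
E_Y = sum(y * p for (_, y), p in joint.items())
```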


slide-63
SLIDE 63

Reading

Appendix B, Introductory Econometrics: A Modern Approach, 4th edition, J. Wooldridge.

30 / 30