SLIDE 1

Example: Bayes rule

 A drug test proposed by a company tests positive 99% of the time on drug consumers, and it tests negative 99% of the time on non-consumers. Let’s say the drug is consumed by 0.5% of the people. If a person tests positive for the drug, what is the probability (s)he is a drug consumer?

 Let C = event that a person is a drug consumer.
 Let + = event that a person tests positive.

SLIDE 2

Example: the false positive paradox

Given: $P(C) = 0.005$, $P(C^c) = 0.995$, $P(+\mid C) = 0.99$.

$$P(C \mid +) = \frac{P(+\mid C)\,P(C)}{P(+)} = \frac{P(+\mid C)\,P(C)}{P(+\mid C)\,P(C) + P(+\mid C^c)\,P(C^c)}$$

$$= \frac{0.99 \times 0.005}{0.99 \times 0.005 + (1 - 0.99) \times 0.995} \approx 33.22\%$$

An individual testing positive is most likely not a consumer – despite the apparent (99%) accuracy of the test! This is because the proportion of drug consumers is small, and hence the factor 0.995 outweighs the consumer probability. This is called the false positive paradox. For fewer false positives, we need more than 99% accuracy on non-consumers (e.g. 99.99%). More such examples are on Wikipedia.
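As a quick numerical check, here is a minimal Python sketch of the same computation (the variable names are mine):

```python
# Sketch: Bayes rule for the drug-test example.
p_c = 0.005          # P(C): prior probability that a person is a consumer
p_pos_c = 0.99       # P(+|C): test is positive on consumers
p_pos_nc = 1 - 0.99  # P(+|C^c): false-positive rate on non-consumers

# Total probability of testing positive (law of total probability).
p_pos = p_pos_c * p_c + p_pos_nc * (1 - p_c)

# Bayes rule: P(C|+) = P(+|C) P(C) / P(+).
print(p_pos_c * p_c / p_pos)   # ~0.3322, i.e. 33.22%
```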

SLIDE 3

The Birthday Paradox!

 Given n people in a room, what should be the least value of n such that the probability that at least 2 people in the room share the same birthday is 99.9%?

 Each person can have his/her birthday on any of the 365 days. For n people, there are 365^n outcomes.

 The number of outcomes resulting in no two people sharing a birthday is (365)(364)(363)…(365−n+1).

SLIDE 4

The Birthday Paradox!

 So the required probability is
$$1 - \frac{(365)(364)(363)\dots(365-n+1)}{365^n} = 0.999 \text{ (given)}$$
 This is satisfied for n as small as 70.
 For n = 20, it is around 41%.
 For n = 40, it is around 89%.
 For more information see the Wikipedia article on the birthday paradox.
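The following minimal Python sketch reproduces these numbers (the function name is mine):

```python
def p_shared(n: int) -> float:
    """P(at least two of n people share a birthday), 365 equally likely days."""
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (365 - i) / 365
    return 1 - p_distinct

n = 1
while p_shared(n) < 0.999:
    n += 1
print(n)                    # 70
print(p_shared(20))         # ~0.41
print(p_shared(40))         # ~0.89
```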

SLIDE 5

Random Variables

Fall 2017, Instructor: Ajit Rajwade

SLIDE 6

Topic Overview

 Random variable: definition
 Discrete and continuous random variables
 Probability density function (pdf) and cumulative distribution function (cdf)
 Joint and conditional pdfs
 Expectation and its properties
 Variance and covariance
 Markov’s and Chebyshev’s inequality
 Weak law of large numbers
 Moment generating functions

SLIDE 7

Random variable

 In many random experiments, we are not always interested in the observed values themselves, but in some numerical quantity determined by the observed values.

 Example: we may be interested in the sum of the values of two dice throws, or the number of heads appearing in n consecutive coin tosses.

 Any such quantity determined by the results of random experiments is called a random variable (it may also be the observation itself).

SLIDE 8

Random variable

Let X = sum of 2 dice throws; x denotes a value of X.

x  : P(X = x)
2  : 1/36
3  : 2/36
4  : 3/36
5  : 4/36
6  : 5/36
7  : 6/36
8  : 5/36
9  : 4/36
10 : 3/36
11 : 2/36
12 : 1/36

This is called the probability mass function (pmf) table of the random variable X. If S is the sample space, then P(S) = P(union of all events of the form X = x) = 1 (verify from the table).
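A minimal Python sketch reproduces this table by brute-force enumeration (note that Fraction reduces 2/36 to 1/18, and so on):

```python
# Sketch: pmf of X = sum of two dice, by enumerating all 36 outcomes.
from collections import Counter
from fractions import Fraction

counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
pmf = {x: Fraction(c, 36) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)             # 2 -> 1/36, 3 -> 1/18 (= 2/36), ..., 7 -> 1/6
print(sum(pmf.values()))    # 1, i.e. P(S) = 1
```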

SLIDE 9

Random variable: Notation

 A random variable is usually denoted by an uppercase letter.

 Individual values the random variable can acquire are denoted by lowercase.

SLIDE 10

Random variable: discrete

 Random variables whose values can be written as a finite or infinite sequence are called discrete random variables.

 Example: results of coin-toss or dice-throwing experiments.

 The probability that a random variable X takes on a value x, i.e. P(X = x), is called the probability mass function.

SLIDE 11

Random variable: continuous

 Random variables that can take on values within a continuum are called continuous random variables.

 Examples: the dimensions (length, height, width, weight) of an object are usually continuous quantities, as is the direction of a vector; the amount of water that can be stored in a 4-litre jar is a continuous random variable in the interval [0,4].

SLIDE 12

Random variable: continuous

 For a continuous random variable, the probability that it takes on any particular value within a continuum is zero!

 Why? Because there are infinitely many values – say in the interval [0,4] in the example on the previous slide – and each value will be equally likely.

 Note: zero probability in the case of continuous random variables does not mean the event will never occur! This differs from the discrete case.

SLIDE 13

Random variable: continuous

 Hence for a continuous random variable X, we consider the cumulative distribution function (cdf) $F_X(x)$, defined as $P\{X \le x\}$.

 The cdf is basically the probability that X takes on a value less than or equal to x.

 The cdf can be used to compute cumulative interval measures, that is, the probability that X takes on a value greater than a and less than or equal to b, i.e. $P(a < X \le b) = F_X(b) - F_X(a)$.

SLIDE 14

Random variable: continuous - example

 Consider a cdf of the form: $F_X(x) = 0$ for $x \le 0$, and $F_X(x) = 1 - \exp(-x^2)$ otherwise.

 To find: the probability that X exceeds 1.
 $P(X > 1) = 1 - P(X \le 1) = 1 - F_X(1) = e^{-1}$

SLIDE 15

Probability Density Function (pdf)

 The pdf of a random variable X at a value x is the derivative of its cumulative distribution function (cdf) at that value x.

 It is a non-negative function $f_X(x)$ such that for any set B of real numbers, we have
$$P\{X \in B\} = \int_B f_X(x)\,dx$$

 Properties:
$$\int_{-\infty}^{\infty} f_X(x)\,dx = 1$$
$$P(a < X \le b) = \int_a^b f_X(x)\,dx = F_X(b) - F_X(a)$$
$$P(X = a) = \int_a^a f_X(x)\,dx = 0$$

SLIDE 16

[Figure: a pdf $f_X(x)$ plotted against x, with the area between the lines x = a and x = b shaded.] The area beneath the curve in between the lines x = a and x = b is the cumulative interval measure $P(a < X \le b) = F_X(b) - F_X(a)$. $f_X(a)\,dx$ = probability that the random variable X takes on values between a and a + dx.

SLIDE 17

Probability Density Function

 Another way of looking at this concept:
$$P\{a - \varepsilon/2 \le X \le a + \varepsilon/2\} = \int_{a-\varepsilon/2}^{a+\varepsilon/2} f_X(x)\,dx \approx \varepsilon\, f_X(a)$$
$$f_X(a) = \lim_{\varepsilon \to 0} \frac{P\{a - \varepsilon/2 \le X \le a + \varepsilon/2\}}{\varepsilon}$$

SLIDE 18

Examples: Popular families of PDFs

 Gaussian (normal) pdf:
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x-\mu)^2/(2\sigma^2)}$$

SLIDE 19

Examples: Popular families of PDFs

 Bounded uniform pdf:
$$f_X(x) = \frac{1}{b-a} \text{ for } a \le x \le b, \quad 0 \text{ otherwise}$$

SLIDE 20

Expected Value (Expectation) of a random variable

 It is also called the mean value of the random variable.

 For a discrete random variable X, it is defined as:
$$E(X) = \sum_i x_i\, P(X = x_i)$$

 For a continuous random variable X, it is defined as:
$$E(X) = \int_{-\infty}^{\infty} x\, f_X(x)\,dx$$

 The expected value should not be (mis)interpreted to be the value that X usually takes on – it’s the average value, not the “most frequently occurring value”.

SLIDE 21

Expected Value (Expectation) of a random variable

 For some pdfs, the expected value is not always defined, i.e. the integral
$$E(X) = \int_{-\infty}^{\infty} x\, f_X(x)\,dx$$
may not have a finite value.

 One example is the pdf for the Pareto distribution (under some parameters), given as:
$$f_X(x \mid x_m, \alpha) = \frac{\alpha\, x_m^{\alpha}}{x^{\alpha+1}} \text{ for } x \ge x_m, \quad 0 \text{ otherwise}; \qquad E(X) = \infty \text{ if } \alpha \le 1$$

Here $x_m$ and $\alpha$ are parameters of the pdf for the Pareto distribution. Verify this result for E(X) on your own.

SLIDE 22

Expected Value (Expectation) of a random variable

 Likewise for some discrete random variables which take on infinitely many values, the expected value may not be defined, i.e. we may have
$$E(X) = \sum_i x_i\, P(X = x_i) = \infty$$

 Example:
$$P(X = x) = k/x^2 \text{ for } x \in \mathbb{Z}^+; \qquad E(X) = \sum_x x\, P(X = x) = \sum_x k/x = \infty$$
Note: $\sum_x P(X = x) = 1$ if $k = 6/\pi^2$. See here.

SLIDE 23

Expected Value: examples

 The expected value that shows up when you throw a

die is 1/6(1+2+3+4+5+6) = 3.5.

 The game of roulette consists of a ball and wheel with

38 numbered pockets on its side. The ball rolls and settles on one of the pockets. If the number in the pocket is the same as the one you guessed, you win $35 (probability 1/38), otherwise you lose $1 (probability 37/38). The expected value of the amount you earn after one trial is: (-1)37/38 +(35)1/38 = $-0.0526
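Both expectations in one minimal Python sketch:

```python
# Sketch: expected value of a fair die and of one roulette bet.
die_ev = sum(x * (1 / 6) for x in range(1, 7))
print(die_ev)                                  # 3.5

# Win $35 with probability 1/38, lose $1 with probability 37/38.
roulette_ev = 35 * (1 / 38) + (-1) * (37 / 38)
print(round(roulette_ev, 4))                   # -0.0526
```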

SLIDE 24

A Game of Roulette https://en.wikipedia.org/wiki/Roulette#/media/File:Roulette_casino.JPG

SLIDE 25

Expected value of a function of random variable

 Consider a function g(X) of a discrete random variable X. The expected value of g(X) is defined as:
$$E(g(X)) = \sum_i g(x_i)\, P(X = x_i)$$

 For a continuous random variable, the expected value of g(X) is defined as:
$$E(g(X)) = \int_{-\infty}^{\infty} g(x)\, f_X(x)\,dx$$

SLIDE 26

Properties of expected value

$$E(a\,g(X) + b) = \int_{-\infty}^{\infty} (a\,g(x) + b)\, f_X(x)\,dx = a \int_{-\infty}^{\infty} g(x)\, f_X(x)\,dx + b \int_{-\infty}^{\infty} f_X(x)\,dx = a\,E(g(X)) + b \quad \text{(why?)}$$

This property is called the linearity of the expected value. In general, a function f(x) is said to be linear in x if f(ax + b) = af(x) + b, where a and b are constants. In this case, the expected value is not a function but an operator (it takes a function as input). An operator E is said to be linear if E(af(x) + b) = aE(f(x)) + b.

SLIDE 27

Properties of expected value

Suppose you want to predict the value of a random variable with a known mean. On average, what value will yield the least squared error?

Let X be the random variable and c its predicted value. We want to find c such that $E((X-c)^2)$ is minimized. Let $\mu$ be the mean of X. Then
$$E((X-c)^2) = E(((X-\mu) + (\mu-c))^2) = E((X-\mu)^2) + 2(\mu-c)\,E(X-\mu) + (\mu-c)^2 = E((X-\mu)^2) + (\mu-c)^2,$$
since $E(X-\mu) = 0$. This is minimized when $c = \mu$: the expected value is the value that yields the least mean squared prediction error!
SLIDE 28

The median

 What minimizes the following quantity?
$$J(c) = \int_{-\infty}^{\infty} |x - c|\, f_X(x)\,dx$$
Splitting the integral at c:
$$J(c) = \int_{x \le c} (c - x)\, f_X(x)\,dx + \int_{x > c} (x - c)\, f_X(x)\,dx$$
$$= c\,F_X(c) - \int_{x \le c} x\, f_X(x)\,dx + \int_{x > c} x\, f_X(x)\,dx - c\,(1 - F_X(c))$$
SLIDE 29

The median

$$J(c) = c\,F_X(c) - \int_{x \le c} x\, f_X(x)\,dx + \int_{x > c} x\, f_X(x)\,dx - c\,(1 - F_X(c))$$
Define $q(x) = x\, f_X(x)$ and $Q(c) = \int_{-\infty}^{c} q(x)\,dx$. Then
$$J(c) = c\,F_X(c) - Q(c) + (Q(\infty) - Q(c)) - c\,(1 - F_X(c)) = 2c\,F_X(c) - 2Q(c) - c + Q(\infty)$$

In this derivation, we are assuming that the two definite integrals of q(x) exist! This proof won’t go through otherwise.

SLIDE 30

The median

$$J(c) = 2c\,F_X(c) - 2Q(c) - c + Q(\infty)$$
$$J'(c) = 2F_X(c) + 2c\,f_X(c) - 2q(c) - 1 = 2F_X(c) + 2c\,f_X(c) - 2c\,f_X(c) - 1 = 2F_X(c) - 1 = 0 \;\Rightarrow\; F_X(c) = 1/2$$

This is the median – by definition – and it minimizes J(c). We can double-check that J''(c) ≥ 0. Notice the peculiar definition of the median for the continuous case here! This definition is not conceptually different from the discrete case, though. Also, note that the median will not be unique if $F_X$ is not differentiable at c. This happens when $F_X$ is not strictly increasing in some interval – say K = [c, c+ε] or [c−ε, c]. In such cases, all y ∈ K will qualify as medians, and all of them will produce the same value of J(y). This is because $f_X(y) = 0$ for y ∈ K.

SLIDE 31

Variance

 The variance of a random variable X tells you how much its values deviate from the mean – on average.

 The definition of variance is:
$$Var(X) = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2\, f_X(x)\,dx$$

 The positive square root of the variance is called the standard deviation.

 Low-variance probability mass functions or probability densities tend to be concentrated around one point. High-variance densities are spread out.

SLIDE 32

Existence?

 For some distributions, the variance (and hence the standard deviation) may not be defined, because the integral may not have a finite value.

 Example: the Pareto distribution (see the slides on expectation for its definition) for α < 2.

 Note: in some cases the mean is defined, but the variance is not. In some cases both are undefined. However, if the mean is undefined, then the variance will be undefined too (why?).

SLIDE 33

Variance: Alternative expression

 The definition of variance is:
$$Var(X) = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2\, f_X(x)\,dx$$

 Alternative expression:
$$Var(X) = E[(X - \mu)^2] = E[X^2 - 2\mu X + \mu^2] = E[X^2] - 2\mu E[X] + \mu^2 \ \text{(why?)} = E[X^2] - \mu^2 = E[X^2] - (E[X])^2$$

SLIDE 34

Variance: properties

 Property:
$$Var(aX + b) = E[(aX + b - E(aX + b))^2] = E[(aX + b - a\mu - b)^2] = E[a^2 (X - \mu)^2] = a^2\, Var(X)$$

SLIDE 35

Probabilistic inequalities

 Sometimes we know the mean or variance of a random

variable, and want to guess the probability that the random variable can take on a certain value.

 The exact probability can usually not be computed as

the information is too less. But we can get upper or lower bounds on this probability which can influence

  • ur decision-making processes.

SLIDE 36

Probabilistic inequalities

 Example: Let’s say the average annual salary offered to a CSE Btech-4 student at IITB is $100,000. What’s the probability that you (i.e. a randomly chosen student) will get an offer of $110,000 or more? Additionally, if you were told that the variance of the salary was 50,000, what’s the probability that your package is between $90,000 and $110,000?

SLIDE 37

Markov’s inequality

 Let X be a random variable that takes only non-

negative values. For any a > 0, we have

 Proof: next slide

a X E a X P / ] [ } {  

SLIDE 38

Markov’s inequality

 Proof:
$$E[X] = \int_0^{\infty} x\, f_X(x)\,dx = \int_{x < a} x\, f_X(x)\,dx + \int_{x \ge a} x\, f_X(x)\,dx \ge \int_{x \ge a} x\, f_X(x)\,dx$$
$$\ge \int_{x \ge a} a\, f_X(x)\,dx = a \int_{x \ge a} f_X(x)\,dx = a\, P\{X \ge a\}$$
Hence $P\{X \ge a\} \le E[X]/a$.

SLIDE 39

Chebyshev’s inequality

 For a random variable X with mean μ and variance σ², we have for any value k > 0:
$$P\{|X - \mu| \ge k\} \le \sigma^2/k^2$$

 Proof: follows from Markov’s inequality. $(X - \mu)^2$ is a non-negative random variable, so
$$P\{(X - \mu)^2 \ge k^2\} \le E[(X - \mu)^2]/k^2 = \sigma^2/k^2 \;\Rightarrow\; P\{|X - \mu| \ge k\} \le \sigma^2/k^2$$

SLIDE 40

Chebyshev’s inequality: another form

 For a random variable X with mean μ and variance σ², we have for any value k > 0:
$$P\{|X - \mu| \ge k\} \le \sigma^2/k^2$$

 If I replace k by kσ, I get the following:
$$P\{|X - \mu| \ge k\sigma\} \le 1/k^2$$

SLIDE 41

Back to counting money! 

 Let X be the random variable indicating the annual salary offered to you when you reach Btech-4. 

 Then, by Markov’s inequality:
$$P\{X \ge 110K\} \le \frac{100K}{110K} = 0.9090\ldots \approx 90.9\%$$
and by Chebyshev’s inequality (with σ² = 50,000):
$$P\{|X - 100K| \ge 10K\} \le \frac{50{,}000}{(10{,}000)^2} = 0.0005 = 0.05\%$$
$$\Rightarrow P\{|X - 100K| < 10K\} \ge 1 - 0.05\% = 99.95\%$$
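A simulation sketch comparing Chebyshev’s bound with the empirical tail probability. It assumes, purely for illustration, that salaries are normally distributed with the stated mean and variance; the bound itself requires no such assumption:

```python
# Sketch: Chebyshev's bound is valid (and often very loose).
import numpy as np

rng = np.random.default_rng(1)
mu, var = 100_000.0, 50_000.0
x = rng.normal(mu, np.sqrt(var), size=1_000_000)   # hypothetical salaries

k = 10_000.0
print(np.mean(np.abs(x - mu) >= k))   # empirical: 0.0 (normal tails are thin)
print(var / k ** 2)                   # Chebyshev bound: 0.0005
```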

SLIDE 42

Back to the expected value

 When I tell you that the expected value of a random die variable is 3.5, what does this mean?

 If I throw the die n times and average the results, I should get a value close to 3.5, provided n is very large (not valid if n is small).

 As n increases, the average value should move closer and closer towards 3.5.

 That’s our basic intuition!

SLIDE 43

https://en.wikipedia.org/wiki/Law_of_large_numbers

SLIDE 44

Back to the expected value: weak law of large numbers

 This intuition has a rigorous theoretical justification in a theorem known as the weak law of large numbers.

 Let X1, X2, …, Xn be a sequence of independent and identically distributed random variables, each having mean μ. Then for any ε > 0, we have:
$$P\left\{\left|\frac{X_1 + X_2 + \dots + X_n}{n} - \mu\right| \ge \varepsilon\right\} \to 0 \text{ as } n \to \infty$$

SLIDE 45

Back to the expected value: weak law of large numbers

 Let X1, X2, …, Xn be a sequence of independent and identically distributed random variables, each having mean μ (and variance σ²). Then for any ε > 0, we have:
$$P\left\{\left|\frac{X_1 + X_2 + \dots + X_n}{n} - \mu\right| \ge \varepsilon\right\} \to 0 \text{ as } n \to \infty$$

 Proof: follows immediately from Chebyshev’s inequality.
$$E\left(\frac{X_1 + \dots + X_n}{n}\right) = \mu, \qquad Var\left(\frac{X_1 + \dots + X_n}{n}\right) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$
$$P\left\{\left|\frac{X_1 + \dots + X_n}{n} - \mu\right| \ge \varepsilon\right\} \le \frac{\sigma^2}{n\varepsilon^2} \;\Rightarrow\; \lim_{n\to\infty} P\left\{\left|\frac{X_1 + \dots + X_n}{n} - \mu\right| \ge \varepsilon\right\} = 0$$
$(X_1 + \dots + X_n)/n$ is the empirical (or sample) mean.
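A minimal die-throwing simulation of the weak law:

```python
# Sketch: the sample mean of n die throws approaches 3.5 as n grows.
import numpy as np

rng = np.random.default_rng(2)
for n in [10, 100, 10_000, 1_000_000]:
    rolls = rng.integers(1, 7, size=n)   # fair die: values 1..6
    print(n, rolls.mean())
```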

SLIDE 46

The strong law of large numbers

 The strong law of large numbers states the following:
$$P\left\{\lim_{n \to \infty} \frac{X_1 + X_2 + \dots + X_n}{n} = \mu\right\} = 1$$

 This is stronger than the weak law because it states that the probability of the desired event (that the empirical mean equals the actual mean in the limit) is equal to 1 given enough samples. The weak law states that it tends to 1.

 The proof of the strong law is formidable and beyond the scope of our course.

SLIDE 47

(The incorrect) Law of averages

 As laymen, we tend to believe that if something has been going wrong for quite some time, it will suddenly turn right – invoking the “law of averages”.

 This supposed law is actually a fallacy – it reflects wishful thinking, and the core mistake is that we mistake the distribution of samples among a small set of outcomes for the distribution of a larger set.

 This is also called the Gambler’s fallacy.

SLIDE 48

(The incorrect) Law of averages

 Let’s say a gambler independently tosses an unbiased coin 20 times, and gets a head each time. He now applies the “law of averages” and believes that it is more likely that the next coin toss will yield a tail.

 The mistake is as follows: the probability of getting all 21 heads is (1/2)^21. The probability of getting 20 heads followed by 1 tail is also (1/2)^21.

SLIDE 49

Joint distributions/pdfs/pmfs

SLIDE 50

Jointly distributed random variables

 Many times in statistics, one needs to model relationships between two or more random variables – for example, your CPI at IITB and the annual salary offered to you during placements!

 Another example: the average amount of sugar consumed per day and the blood sugar level recorded in a blood test.

 Another example: literacy level and crime rate.

SLIDE 51

Joint CDFs

 Given continuous random variables X and Y, their joint cumulative distribution function (cdf) is defined as:
$$F_{XY}(x,y) = P(X \le x, Y \le y)$$

 The distribution of either random variable (called the marginal cdf) can be obtained from the joint distribution as follows:
$$F_X(x) = P(X \le x, Y < \infty) = F_{XY}(x, \infty), \qquad F_Y(y) = P(X < \infty, Y \le y) = F_{XY}(\infty, y)$$
(I’ll explain this a few slides further down.)

 These definitions can be extended to handle more than two random variables as well.

SLIDE 52

Joint PMFs

 Given two discrete random variables X and Y, their joint probability mass function (pmf) is defined as:
$$p_{XY}(x_i, y_j) = P(X = x_i, Y = y_j)$$

 The pmf of either random variable (called the marginal pmf) can be obtained from the joint distribution as follows:
$$P\{X = x_i\} = P\left(\bigcup_j \{X = x_i, Y = y_j\}\right) = \sum_j P(X = x_i, Y = y_j) = \sum_j p_{XY}(x_i, y_j) \quad \text{(Why?)}$$

SLIDE 53

Joint PMFs: Example

 Consider that in a city, 15% of the families are childless, 20% have only one child, 35% have two children and 30% have three children. Let us suppose that male and female children are equally likely and independent.

 What is the probability that a randomly chosen family has no children?
 P(B = 0, G = 0) = 0.15 = P(no children)
 Has 1 girl child (and no boys)?
 P(B = 0, G = 1) = P(1 child) P(G = 1 | 1 child) = 0.2 × 0.5 = 0.1
 Has 3 girls?
 P(B = 0, G = 3) = P(3 children) P(G = 3 | 3 children) = 0.3 × (0.5)^3
 Has 2 boys and 1 girl?
 P(B = 2, G = 1) = P(3 children) P(B = 2, G = 1 | 3 children) = 0.3 × (1/8) × 3 = 0.1125 (all 8 combinations of 3 children are equally likely; out of these, there are 3 of the form 2 boys + 1 girl)
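A sketch that builds the whole joint pmf table for this example and checks the numbers above (the variable names are mine):

```python
# Sketch: joint pmf P(B = b, G = g) for the family example.
from fractions import Fraction
from math import comb

p_children = {0: Fraction(15, 100), 1: Fraction(20, 100),
              2: Fraction(35, 100), 3: Fraction(30, 100)}

pmf = {}
for n, pn in p_children.items():
    for g in range(n + 1):               # g girls (and b = n - g boys)
        pmf[(n - g, g)] = pn * comb(n, g) * Fraction(1, 2) ** n

print(pmf[(0, 0)])         # 3/20 = 0.15
print(pmf[(0, 1)])         # 1/10 = 0.1
print(pmf[(0, 3)])         # 3/80 = 0.3 * (0.5)^3
print(pmf[(2, 1)])         # 9/80 = 0.1125
print(sum(pmf.values()))   # 1
```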

SLIDE 54

Joint PDFs

 For two jointly continuous random variables X and Y, the joint pdf is a non-negative function $f_{XY}(x,y)$ such that for any set C in the two-dimensional plane, we have:
$$P\{(X,Y) \in C\} = \iint_{(x,y) \in C} f_{XY}(x,y)\,dx\,dy$$

 The joint cdf can be obtained from the joint pdf as follows:
$$F_{XY}(a,b) = \int_{-\infty}^{b} \int_{-\infty}^{a} f_{XY}(x,y)\,dx\,dy, \qquad f_{XY}(a,b) = \frac{\partial^2 F_{XY}(x,y)}{\partial x\, \partial y}\bigg|_{x=a,\ y=b}$$
SLIDE 55

[Figure: an arbitrary-shaped region C in the XY-plane.] The joint probability that (X,Y) belongs to any arbitrary-shaped region in the XY-plane is obtained by integrating the joint pdf of (X,Y) over that region (e.g. region C).

SLIDE 56

Joint and marginal PDFs

 The marginal pdf of a random variable can be obtained by integrating the joint pdf w.r.t. the other random variable(s):
$$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dy, \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dx$$
This is consistent with the marginal cdf:
$$F_X(a) = \int_{-\infty}^{a} f_X(x)\,dx = \int_{-\infty}^{a} \int_{-\infty}^{\infty} f_{XY}(x,y)\,dy\,dx = F_{XY}(a, \infty)$$
SLIDE 57

Independent random variables

 Two continuous random variables are said to be independent if and only if:
$$\forall x, y: \quad f_{XY}(x,y) = f_X(x)\, f_Y(y)$$
i.e., the joint pdf is equal to the product of the marginal pdfs.

 For independent random variables, the joint cdf is also equal to the product of the marginal cdfs:
$$F_{XY}(x,y) = F_X(x)\, F_Y(y)$$
Try proving this yourself!

SLIDE 58

Independent random variables

 n continuous random variables X1, X2, …, Xn are said to be mutually independent if and only if for any finite subset of k random variables $X_{i_1}, X_{i_2}, \dots, X_{i_k}$ and any finite sequence of numbers $x_1, x_2, \dots, x_k$, the events $X_{i_1} \le x_1, X_{i_2} \le x_2, \dots, X_{i_k} \le x_k$ are mutually independent.

 As a consequence,
$$\forall x_1, x_2, \dots, x_n: \quad f_{X_1, X_2, \dots, X_n}(x_1, x_2, \dots, x_n) = f_{X_1}(x_1)\, f_{X_2}(x_2) \cdots f_{X_n}(x_n)$$
i.e., the joint pdf is equal to the product of all n marginal pdfs.

 Note that this condition is stronger than pairwise independence:
$$\forall (x_i, x_j),\ 1 \le i \le n,\ 1 \le j \le n,\ i \ne j: \quad f_{X_i, X_j}(x_i, x_j) = f_{X_i}(x_i)\, f_{X_j}(x_j)$$
SLIDE 59

Independent random variables

 Mutual independence between n random variables implies that they are pairwise independent, or in fact, k-wise independent for any k < n.

 But pairwise independence does not necessarily imply mutual independence.

 Example: consider a sample space {1,2,3,4} where each singleton element is equally likely to be chosen.

SLIDE 60

Independent random variables

 Consider A = {1,2}, B = {1,3}, C = {1,4}.
 Then P(A) = P(B) = P(C) = 1/2, and P(ABC) = P({1}) = 1/4 ≠ P(A)P(B)P(C), implying that A, B, C are not mutually independent.
 But P(AB) = 1/4 = P(A)P(B), and likewise for AC and BC.

SLIDE 61

Concept of covariance

 The covariance of two random variables X and Y is defined as follows:
$$Cov(X,Y) = E[(X - \mu_X)(Y - \mu_Y)]$$

 Further expansion:
$$Cov(X,Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY - \mu_X Y - \mu_Y X + \mu_X \mu_Y] = E[XY] - \mu_X \mu_Y \ \text{(why?)} = E[XY] - E[X]\,E[Y]$$
SLIDE 62

Concept of covariance: properties

 Cov(X,Y) = Cov(Y,X)
 Cov(X,X) = Var(X) [verify this yourself!]
 Cov(aX,Y) = aCov(X,Y) [prove this!]
 Relationship with the correlation coefficient:
$$r(X,Y) = \frac{Cov(X,Y)}{\sqrt{Var(X)\,Var(Y)}}$$

SLIDE 63

Concept of covariance: properties

$$Cov(X + Z, Y) = Cov(X, Y) + Cov(Z, Y)$$
Proof:
$$Cov(X + Z, Y) = E[(X + Z)Y] - E[X + Z]\,E[Y] = E[XY] + E[ZY] - E[X]E[Y] - E[Z]E[Y] = Cov(X,Y) + Cov(Z,Y)$$

More generally:
$$Cov\left(\sum_i X_i,\ \sum_j Y_j\right) = \sum_i \sum_j Cov(X_i, Y_j)$$
Try proving this yourself! It goes along similar lines as the previous one.
SLIDE 64

Concept of covariance: properties

$$Var\left(\sum_i X_i\right) = Cov\left(\sum_i X_i,\ \sum_j X_j\right) = \sum_i \sum_j Cov(X_i, X_j) = \sum_i Var(X_i) + \sum_i \sum_{j \ne i} Cov(X_i, X_j)$$

Notice that the variance of the sum of random variables is not equal to the sum of their individual variances. This is quite unlike the mean!
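A numerical check of this identity for two correlated variables (the construction of y below is mine, for illustration):

```python
# Sketch: Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) on sample data.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=200_000)
y = 0.5 * x + rng.normal(size=200_000)   # correlated with x by construction

lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, ddof=0)[0, 1]
print(lhs, rhs)   # agree; both exceed np.var(x) + np.var(y) since Cov > 0
```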

SLIDE 65

Concept of covariance: properties

 For independent random variables X and Y, Cov(X,Y) = 0, i.e. E[XY] = E[X]E[Y].

 Proof (discrete case):
$$E[XY] = \sum_i \sum_j x_i y_j\, P\{X = x_i, Y = y_j\} = \sum_i \sum_j x_i y_j\, P\{X = x_i\}\, P\{Y = y_j\} = \left(\sum_i x_i P\{X = x_i\}\right)\left(\sum_j y_j P\{Y = y_j\}\right) = E[X]\,E[Y]$$
$$Cov(X,Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - E[X]E[Y] = 0$$
SLIDE 66

Concept of covariance: properties

 Given random variables X and Y, Cov(X,Y) = 0 does

not necessarily imply that X and Y are independent!

 Proof: Construct a counter-example yourself!
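For reference, here is one standard counter-example, sketched in Python; it is my choice and not necessarily the one the slides intend. Take X uniform on {−1, 0, 1} and Y = X²:

```python
# Sketch: Cov(X, Y) = 0 although Y is a deterministic function of X.
from fractions import Fraction

third = Fraction(1, 3)
pairs = {(-1, 1): third, (0, 0): third, (1, 1): third}   # (x, y) -> probability

ex = sum(p * x for (x, y), p in pairs.items())           # E[X] = 0
ey = sum(p * y for (x, y), p in pairs.items())           # E[Y] = 2/3
exy = sum(p * x * y for (x, y), p in pairs.items())      # E[XY] = E[X^3] = 0
print(exy - ex * ey)                                     # 0 -> Cov(X, Y) = 0

# Not independent: P(X=0, Y=1) = 0 but P(X=0) P(Y=1) = (1/3)(2/3) != 0.
```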

SLIDE 67

Conditional pdf/cdf/pmf

 Given random variables X and Y with joint pdf $f_{XY}(x,y)$, the conditional pdf of X given Y = y is defined as follows:
$$f_{X|Y}(x|y) = \frac{f_{XY}(x,y)}{f_Y(y)}$$

 Conditional cdf $F_{X|Y}(x|y)$:
$$F_{X|Y}(x|y) = \lim_{\varepsilon \to 0} P(X \le x \mid y \le Y \le y + \varepsilon) = \int_{-\infty}^{x} f_{X|Y}(z|y)\,dz = \int_{-\infty}^{x} \frac{f_{XY}(z,y)}{f_Y(y)}\,dz$$

http://math.arizona.edu/~jwatkins/m-conddist.pdf

SLIDE 68

Conditional pdf/cdf/pmf

 Conditional cdf $F_{X|Y}(x|y)$, in more detail:
$$P(X \le x \mid y \le Y \le y + \varepsilon) = \frac{P(X \le x,\ y \le Y \le y + \varepsilon)}{P(y \le Y \le y + \varepsilon)} = \frac{F_{XY}(x, y + \varepsilon) - F_{XY}(x, y)}{F_Y(y + \varepsilon) - F_Y(y)} = \frac{(F_{XY}(x, y + \varepsilon) - F_{XY}(x, y))/\varepsilon}{(F_Y(y + \varepsilon) - F_Y(y))/\varepsilon}$$
which tends to $\dfrac{\partial F_{XY}(x,y)/\partial y}{f_Y(y)}$ as $\varepsilon \to 0$. Differentiating w.r.t. x then gives
$$f_{X|Y}(x|y) = \frac{\partial}{\partial x} F_{X|Y}(x|y) = \frac{f_{XY}(x,y)}{f_Y(y)}$$
and the conditional pdf integrates to 1:
$$\int_{-\infty}^{\infty} f_{X|Y}(x|y)\,dx = \int_{-\infty}^{\infty} \frac{f_{XY}(x,y)}{f_Y(y)}\,dx = \frac{f_Y(y)}{f_Y(y)} = 1$$

http://math.arizona.edu/~jwatkins/m-conddist.pdf
SLIDE 69

Conditional mean and variance

 Conditional densities or distributions can be used to define the conditional mean (also called conditional expectation) and the conditional variance as follows:
$$E(X \mid Y = y) = \int_{-\infty}^{\infty} x\, f_{X|Y}(x|y)\,dx$$
$$Var(X \mid Y = y) = \int_{-\infty}^{\infty} (x - E(X \mid Y = y))^2\, f_{X|Y}(x|y)\,dx$$
SLIDE 70

Example

Let
$$f_{XY}(x,y) = 2.4\, x\,(2 - x - y) \text{ for } 0 < x < 1,\ 0 < y < 1, \quad 0 \text{ otherwise}$$
Find the conditional density of X given Y = y. Find the conditional mean of X given Y = y.
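A symbolic solution sketch using sympy; the printed expressions may differ from hand-derived forms up to algebraic rearrangement:

```python
# Sketch: conditional density and conditional mean for the example above.
import sympy as sp

x, y = sp.symbols('x y', positive=True)
f_xy = sp.Rational(12, 5) * x * (2 - x - y)        # 2.4 = 12/5

f_y = sp.integrate(f_xy, (x, 0, 1))                # marginal pdf of Y
f_x_given_y = sp.simplify(f_xy / f_y)              # f_{X|Y}(x|y)
e_x_given_y = sp.simplify(sp.integrate(x * f_x_given_y, (x, 0, 1)))

print(f_x_given_y)   # equivalent to 6*x*(2 - x - y)/(4 - 3*y)
print(e_x_given_y)   # equivalent to (5 - 4*y)/(2*(4 - 3*y))
```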

SLIDE 71

Moment Generating Functions

SLIDE 72

Definition

 The moment of a random variable X of order n is defined as $m_n = E(X^n)$.

 The moment generating function (MGF) of a random variable X is defined as follows:
$$\phi_X(t) = E(e^{tX}) = \sum_x e^{tx}\, P(X = x) \ \text{(discrete r.v.)}, \qquad \phi_X(t) = \int_{-\infty}^{\infty} e^{tx}\, f_X(x)\,dx \ \text{(continuous r.v.)}$$
SLIDE 73

Why is it so called?

 Because of:
$$\phi_X(t) = E(e^{tX}) = E\left(1 + tX + \frac{(tX)^2}{2!} + \frac{(tX)^3}{3!} + \dots\right) = 1 + t\,m_1 + \frac{t^2 m_2}{2!} + \frac{t^3 m_3}{3!} + \dots$$
where $m_i = E(X^i),\ i \ge 1$.
SLIDE 74

Key property

 Differentiating the MGF w.r.t. the parameter t yields the different moments of X:
$$\phi_X'(t) = \frac{d}{dt} E(e^{tX}) = E\left(\frac{d}{dt} e^{tX}\right) = E(X e^{tX}), \qquad \phi_X'(0) = E(X)$$
$$\phi_X^{(2)}(t) = \frac{d}{dt} E(X e^{tX}) = E(X^2 e^{tX}), \qquad \phi_X^{(2)}(0) = E(X^2), \quad \dots, \quad \phi_X^{(n)}(0) = E(X^n)$$
SLIDE 75

Other properties

 If Y = aX + b, then we have: $\phi_Y(t) = e^{tb}\, \phi_X(at)$

 If Y and X are independent, then: $\phi_{X+Y}(t) = \phi_X(t)\, \phi_Y(t)$

 Let X and Y be random variables. Let Z be a third r.v. which is equal to X with probability p, and equal to Y with probability 1 − p. Then we have: $\phi_Z(t) = p\, \phi_X(t) + (1 - p)\, \phi_Y(t)$
SLIDE 76

Uniqueness

 For a discrete random variable with finite range, the MGF and PMF uniquely determine each other.

 Proof:
$$\phi_X(t) = E(e^{tX}) = \sum_x P(X = x)\, e^{tx}$$
so the PMF uniquely determines the MGF. To prove the converse, consider that X takes on n values $x_1, \dots, x_n$, and consider n values $t_1, \dots, t_n$ of t as well. Then we have:
$$\phi_X(t_k) = \sum_{i=1}^{n} e^{t_k x_i}\, P(X = x_i), \quad \text{i.e.} \quad \boldsymbol{\phi} = M \mathbf{p}$$
where $\boldsymbol{\phi}$ and $\mathbf{p}$ are vectors with n elements and M is a matrix of size n × n. The matrix M has a special form that makes it invertible. Hence $\mathbf{p} = M^{-1} \boldsymbol{\phi}$ is uniquely determined. Proof here.
SLIDE 77

Uniqueness: Another proof

 If two discrete random variables X and Y have MGFs $\phi_X(t)$ and $\phi_Y(t)$ that both exist and $\phi_X(t) = \phi_Y(t)$ for all t, then X and Y have the same probability mass function.

 Proof for discrete random variables:
$$\phi_X(t) - \phi_Y(t) = \sum_x e^{tx}\, p(X = x) - \sum_y e^{ty}\, p(Y = y) = \sum_x e^{tx}\, (p(X = x) - p(Y = x)) = \sum_x c_x s^x, \quad \text{where } s = e^t,\ c_x = p(X = x) - p(Y = x)$$
This is a polynomial in s with coefficients $\{c_x\}$. The polynomial can be 0 for all values of s iff $c_x = 0$ for all x. Hence $p(X = x) = p(Y = x)$ for all x.

SLIDE 78

Uniqueness: Continuous case

 The uniqueness theorem is also applicable to continuous random variables, although we do not prove it here.
