

SLIDE 1

Families of Random Variables

Fall 2017
Instructor: Ajit Rajwade

SLIDE 2

Topic Overview

• We will study several useful families of random variables that arise in interesting scenarios in statistics.
• Discrete random variables: Bernoulli, binomial, Poisson, multinomial, hypergeometric
• Continuous random variables: Gaussian, uniform, exponential, chi-square

SLIDE 3

Bernoulli Distribution

SLIDE 4

Definition

• Let X be a random variable whose value is 1 when a coin toss produces heads, and 0 otherwise. If p is the probability that the coin toss produces heads, we have: P(X = 1) = p, P(X = 0) = 1 - p.
• This is called a Bernoulli pmf with parameter p, named after Jacob Bernoulli. X is called a Bernoulli random variable.
• Note here: the coin need not be unbiased any longer!

SLIDE 5

Properties

• E[X] = p(1) + (1-p)(0) = p.
• Var(X) = p(1-p)^2 + (1-p)(0-p)^2 = p(1-p).
• Median: 0.5 if p = 0.5; 1 if p > 0.5; 0 otherwise.
• Mode(s): {0,1} if p = 0.5; 1 if p > 0.5; 0 otherwise.
• MGF: ψ_X(t) = 1 - p + p e^t.

SLIDE 6

Binomial Distribution

SLIDE 7

Definition

• Let X be a random variable denoting the number of heads in a sequence of n independent coin tosses (or Bernoulli trials), each having success probability (i.e. probability of getting heads) p.
• Then the pmf of X is given as follows:

  P(X = i) = C(n,i) p^i (1-p)^{n-i}

• This is called the binomial pmf with parameters (n,p).

SLIDE 8

Definition

• The pmf of X is given as follows:

  P(X = i) = C(n,i) p^i (1-p)^{n-i},   where C(n,i) = n! / (i! (n-i)!)

• Explanation: Consider a particular sequence of trials with i successes and n-i failures. The probability that this sequence occurs is p^i (1-p)^{n-i}, by the product rule. But there are C(n,i) such sequences, so we add their individual probabilities using the sum rule.

SLIDE 9

Definition

• Example: In 5 coin tosses, if we had two heads and three tails, the C(5,2) = 10 possible sequences are: HHTTT, HTHTT, HTTHT, HTTTH, THHTT, THTHT, THTTH, TTHHT, TTHTH, TTTHH
• What's the probability that a sequence of Bernoulli trials produces a success only on the i-th trial? Note that this is not a binomial distribution.

SLIDE 10

Definition

• The pmf of X is given as follows:

  P(X = i) = C(n,i) p^i (1-p)^{n-i},   where C(n,i) = n! / (i! (n-i)!)

• To verify it is a valid pmf:

  sum_{i=0}^{n} P(X = i) = sum_{i=0}^{n} C(n,i) p^i (1-p)^{n-i} = (p + (1-p))^n = 1^n = 1

  Binomial theorem!

SLIDE 11

SLIDE 12

Example 1

• Let's say you design a smart self-driving car and you have tested it thoroughly. You have determined that the probability that your car collides is p. You go for an international competition and the rules say that you will win a prize if in k difficult tests, your car collides at most once. What's the probability that you will win the prize?
• Answer: Let X be the number of times your car collides. X is a binomial random variable with parameters (k,p). The probability that you win a prize is P(X <= 1) = (1-p)^k + C(k,1) p (1-p)^{k-1}.

SLIDE 13

Example 2

• At least half of an airplane's engines need to function for it to operate. If each engine independently fails with probability p, for what values of p is a 4-engine airplane more likely to operate than a 2-engine airplane?
• Answer: The number of functioning engines (X) is a binomial random variable with parameters (4, 1-p) or (2, 1-p), since 1-p is the probability that an engine functions.
• In the 4-engine case, the probability of operation is P(X=2) + P(X=3) + P(X=4) = 6p^2(1-p)^2 + 4p(1-p)^3 + (1-p)^4.
• In the 2-engine case, it is P(X=1) + P(X=2) = 2p(1-p) + (1-p)^2. Do the math! (A numerical check follows below.)
• The answer: the 4-engine airplane is more likely to operate for p < 1/3.
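
A quick numerical check of Example 2, as a small Python sketch (assuming NumPy is available; the variable names are ours, not from the slides):

import numpy as np

p = np.linspace(0.01, 0.99, 99)                     # failure probability of one engine
four = 6*p**2*(1-p)**2 + 4*p*(1-p)**3 + (1-p)**4    # P(at least 2 of 4 engines work)
two = 2*p*(1-p) + (1-p)**2                          # P(at least 1 of 2 engines works)
flip = np.where(np.diff(np.sign(four - two)) != 0)[0]
print(p[flip])   # the sign of (four - two) flips near p = 1/3, as derived above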

SLIDE 14

Properties

• Mean: Recall that a binomial random variable X is the sum of n random variables of the following form:

  Xi = 1 if the i-th trial yields a success (this occurs with probability p), 0 otherwise

• Hence E(X) = sum_{i=1}^{n} E(Xi) = np.
• Variance: Var(X) = sum_{i=1}^{n} Var(Xi) = np(1-p).

Notice how we are making use of the linearity of the expectation operator. These calculations would have been much harder had you tried to plug in the formulae for the binomial distribution.

SLIDE 15

Properties

• The CDF is given as follows:

  F_X(i) = P(X <= i) = sum_{k=0}^{i} C(n,k) p^k (1-p)^{n-k}

• MGF: Consider a binomial r.v. to be the sum of n independent Bernoulli trials. Hence using a property of the MGF (which one?), the MGF of the binomial r.v. is given by (1 - p + p e^t)^n.

SLIDE 16

SLIDE 17

Example

Bits are sent over a communications channel in packets of 12. If the probability of a bit being corrupted over this channel is 0.1 and such errors are independent, what is the probability that no more than 2 bits in a packet are corrupted? If 6 packets are sent over the channel, what is the probability that at least one packet will contain 3 or more corrupted bits? Let X denote the number of packets containing 3 or more corrupted bits. What is the probability that X will exceed its mean by more than 2 standard deviations?

SLIDE 18

Example

• We want P(X=0) + P(X=1) + P(X=2), where X is the number of corrupted bits in a packet.
• P(X=0) = (0.1)^0 (0.9)^12 = 0.282
• P(X=1) = C(12,1) (0.1)^1 (0.9)^11 = 0.377
• P(X=2) = C(12,2) (0.1)^2 (0.9)^10 = 0.230
• So the answer is 0.889.

SLIDE 19

Example

• The probability that 3 or more bits in a packet are corrupted is 1 - 0.889 = 0.111.
• Let Y = number of packets with 3 or more corrupted bits. Then we want P(Y >= 1) = 1 - P(Y=0) = 1 - C(6,0)(0.111)^0 (0.889)^6 = 1 - 0.4936 = 0.5064.
• Mean of Y is μ = 6(0.111) = 0.666.
• Standard deviation of Y is σ = [6(0.111)(0.889)]^0.5 = 0.77.
• We want P(Y > μ + 2σ) = P(Y > 2.2) = P(Y >= 3) = 1 - P(Y=0) - P(Y=1) - P(Y=2) = ?

SLIDE 20

Example

• We want P(Y > μ + 2σ) = P(Y > 2.2) = P(Y >= 3) = 1 - P(Y=0) - P(Y=1) - P(Y=2) = ?
• P(Y=0) = C(6,0) (0.111)^0 (0.889)^6 = 0.4936
• P(Y=1) = C(6,1) (0.111)^1 (0.889)^5 = 0.37
• P(Y=2) = C(6,2) (0.111)^2 (0.889)^4 = 0.115
• P(Y > μ + 2σ) = 1 - (0.4936 + 0.37 + 0.115) = 0.0214
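
The whole worked example can be reproduced in a few lines; this is a sketch assuming SciPy is available (it also confirms that P(Y >= 1) is about 0.506):

from scipy.stats import binom

p_ok = binom.cdf(2, 12, 0.1)              # P(at most 2 of 12 bits corrupted) ~ 0.889
p_bad = 1 - p_ok                          # P(3 or more corrupted bits) ~ 0.111
print(1 - binom.pmf(0, 6, p_bad))         # P(Y >= 1) ~ 0.506
mu = 6 * p_bad                            # mean of Y ~ 0.666
sigma = (6 * p_bad * (1 - p_bad))**0.5    # std. dev. of Y ~ 0.77
print(1 - binom.cdf(2, 6, p_bad))         # P(Y >= 3) ~ 0.021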

SLIDE 21

Related distributions

• In a sequence of Bernoulli trials, let X be the random variable for the trial number that gave the first success.
• Then X is called a geometric random variable and its pmf is given as:

  P(X = i) = p (1-p)^{i-1}

• Let X be the trial number for the kth success in a sequence of Bernoulli trials. Then X is called a negative binomial random variable. What is its pmf?

  P(X = n) = C(n-1, k-1) p^k (1-p)^{n-k}

SLIDE 22

Properties: Mode

• Let X ~ Binomial(n,p).
• Then P(X = k) >= P(X = k-1) ↔ k <= (n+1)p (prove this).
• Also P(X = k) >= P(X = k+1) ↔ k >= (n+1)p - 1 (prove this).
• Any integer-valued k which satisfies both the above conditions will be the mode (why?).
• If (n+1)p is not an integer, the mode is unique: floor((n+1)p). If (n+1)p is an integer, the binomial has two modes, (n+1)p and (n+1)p - 1, which are consecutive integers.

SLIDE 23

Poisson distribution

SLIDE 24

Definition – and genesis

• We have seen the binomial distribution before:

  P(X = i) = [n! / (i! (n-i)!)] p^i (1-p)^{n-i}

• Here p is the success probability. We can express it in the form

  p = λ/n,   λ = np = expected number of successes in n trials

• Hence

  P(X = i) = [n! / (i! (n-i)!)] (λ/n)^i (1 - λ/n)^{n-i}

SLIDE 25

Definition – and genesis

• We have

  P(X = i) = [n(n-1)...(n-i+1) / n^i] (λ^i / i!) (1 - λ/n)^n (1 - λ/n)^{-i}

• In the limit when n→∞ and p→0 such that λ = np, we have

  P(X = i) = e^{-λ} λ^i / i!

• This is called the Poisson pmf, and the above statement is called the Poisson limit theorem.
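
The Poisson limit theorem can be seen numerically: the Binomial(n, λ/n) pmf approaches the Poisson(λ) pmf as n grows. A sketch assuming SciPy is available:

from scipy.stats import binom, poisson

lam, i = 4.0, 3
for n in (10, 100, 1000, 10000):
    print(n, binom.pmf(i, n, lam/n))   # approaches the Poisson value below
print(poisson.pmf(i, lam))             # e^{-4} 4^3 / 3! ~ 0.1954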

SLIDE 26

Definition

• The Poisson distribution is used to model the number of successes of a long sequence of independent Bernoulli trials if the expected number of successes (i.e. λ) is known and constant.
• For a Poisson random variable, note that the expected number of successes λ is constant and the parameter of the pmf. This is unlike the binomial pmf, for which the success probability p of a single Bernoulli trial is constant and also a parameter of the pmf.

SLIDE 27

[Figure] Notice how the binomial resembles the Poisson more and more as n increases. Also notice that np (which is approximately the mode of the binomial) is constant despite the increase in n.

SLIDE 28

Properties

• To double check that it is indeed a valid pmf, we check that:

  sum_{i=0}^{∞} e^{-λ} λ^i / i! = e^{-λ} e^{λ} = 1

  (Using the Taylor series of the exponential function about 0.)

• The afore-mentioned analysis tells us that the expected number of successes is equal to λ. To prove this rigorously – see next slide.

SLIDE 29

Properties

• The afore-mentioned analysis tells us that the expected number of successes is equal to λ. To prove this rigorously:

  E(X) = sum_{i=0}^{∞} i e^{-λ} λ^i / i!
       = sum_{i=1}^{∞} i e^{-λ} λ^i / i!
       = λ e^{-λ} sum_{i=1}^{∞} λ^{i-1} / (i-1)!
       = λ e^{-λ} sum_{j=0}^{∞} λ^j / j!
       = λ e^{-λ} e^{λ}
       = λ

SLIDE 30

Properties

• Variance:

  Var(X) = E(X^2) - (E(X))^2 = [sum_{i=0}^{∞} i^2 e^{-λ} λ^i / i!] - λ^2 = λ

  (Detailed proof on the board. Also see here.)

• MGF:

  ψ_X(t) = E(e^{tX}) = sum_{i=0}^{∞} e^{ti} e^{-λ} λ^i / i! = e^{-λ} sum_{i=0}^{∞} (λ e^t)^i / i! = e^{-λ} e^{λ e^t} = e^{λ(e^t - 1)}

SLIDE 31

Properties

• Mode: an integer k is a mode if P(X = k) >= P(X = k-1), i.e. k <= λ, and P(X = k) >= P(X = k+1), i.e. k >= λ - 1.
• Thus we seek an integer k that satisfies both these conditions. Note that often λ is not an integer, in which case the mode is the unique integer floor(λ) in [λ-1, λ].

SLIDE 32

[Plot: Poisson pmfs for lambda = 1, 5, 10]

Notice: the mean and variance both increase with lambda.

SLIDE 33

Properties

• Consider independent Poisson random variables X and Y having parameters λ1 and λ2 respectively. Then Z = X+Y is also a Poisson random variable with parameter λ1 + λ2. (Detailed proof on the board and in tutorial 1.)
• PMF – recurrence relation:

  P(X = i) = e^{-λ} λ^i / i!,   P(X = i-1) = e^{-λ} λ^{i-1} / (i-1)!
  ⇒ P(X = i) = (λ/i) P(X = i-1) for i >= 1,   P(X = 0) = e^{-λ}
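
A simulation check of the additive property, as a sketch assuming NumPy and SciPy are available:

import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)
lam1, lam2, m = 2.0, 3.0, 200000
z = rng.poisson(lam1, m) + rng.poisson(lam2, m)            # sum of independent Poissons
for i in range(8):
    print(i, (z == i).mean(), poisson.pmf(i, lam1 + lam2))  # empirical vs. Poisson(5) pmf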

SLIDE 34

Properties

• If X ~ Poisson(λ) and Y | X = l ~ Binomial(l, p), where λ > 0 and 0 <= p <= 1, then Y ~ Poisson(λp). This is called thinning of a Poisson random variable by a binomial. We will cover this derivation in a tutorial.

SLIDE 35

Poisson distribution: examples

• The number of misprints in a book (assuming the probability p of a misprint is small, and the number n of letters typed is very large, with np = expected number of misprints remaining constant)
• Number of traffic rule violations in a typical city in the USA (assuming the probability p of a violation is small, and the number of vehicles is very large).
• In general, the Poisson distribution is used to model rare events, even though the event has plenty of "opportunities" to occur. (Sometimes called the law of rare events or the law of small numbers.)

SLIDE 36

Poisson distribution: examples

• Number of people in a country who live up to 100 years
• Number of wrong numbers dialed in a day
• Number of laptops that fail on the first day of use
• Number of photons of light counted by the detector element of a camera

SLIDE 37

Multinomial distribution

SLIDE 38

Definition

• Consider a sequence of n independent trials, each of which will produce one out of k possible outcomes, where the set of possible outcomes is the same for each trial.
• Assume that the probability of each of the k outcomes is known and constant and given by p1, p2, …, pk.

SLIDE 39

Definition

• Let X be a k-dimensional random variable for which the ith element represents the number of trials that produced the ith outcome (also known as the number of successes for the ith category).

Eg: in 20 throws of a die, you had 2 ones, 4 twos, 7 threes, 4 fours, 1 five and 2 sixes.

SLIDE 40

Definition

• Then the pmf of X is given as follows:

  P(X = (x1, x2, ..., xk)) = P(X1 = x1, X2 = x2, ..., Xk = xk)
    = [n! / (x1! x2! ... xk!)] p1^{x1} p2^{x2} ... pk^{xk},
  where 0 <= pi <= 1, sum_{i=1}^{k} pi = 1, and x1 + x2 + ... + xk = n.

• This is called the multinomial pmf.

(The factor n! / (x1! x2! ... xk!) is the number of ways to arrange n objects which can be divided into k groups of identical objects: there are x1 objects of type 1, x2 objects of type 2, ..., and xk objects of type k.)

SLIDE 41

Definition

• The success probabilities for each category, i.e. p1, p2, …, pk, are all parameters of the multinomial pmf.
• Remember: The multinomial random variable is a vector whose ith component is the number of successes of the ith category (i.e. the number of times that the trials produced a result of the ith category).
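
The 20-dice-throws example from Slide 39, evaluated with SciPy's multinomial (a sketch; a fair die is assumed, so each pi = 1/6):

from scipy.stats import multinomial

die = multinomial(20, [1/6]*6)
print(die.pmf([2, 4, 7, 4, 1, 2]))   # probability of exactly that outcome vector
print(die.mean())                    # mean vector (np1, ..., np6); see the next slide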

SLIDE 42

Properties

• Mean vector:

  E(X) = (np1, np2, ..., npk),   E(Xi) = n pi

• Variance of a component (assuming independent trials):

  Var(Xi) = sum_{j=1}^{n} Var(Xij) = n pi (1 - pi)

(Here Xij is a Bernoulli random variable which tells you whether or not there was a success in the ith category on the jth trial.)

SLIDE 43

Properties

• For vector-valued random variables, the variance is replaced by the covariance matrix. The covariance matrix C in this case will have size k x k, where we have:

  C(i,j) = E[(Xi - μi)(Xj - μj)] = Cov(Xi, Xj)

  Cov(Xi, Xj) = -n pi pj for i ≠ j,   Cov(Xi, Xi) = n pi (1 - pi)

  Proof: next page.

SLIDE 44

Proof: Xi = sum_{l=1}^{n} Xil = # successes in category i, and Xj = sum_{l=1}^{n} Xjl = # successes in category j, where the Xil are independent Bernoulli random variables, each representing the outcome of a trial (indexed by l).

  Cov(Xi, Xj) = Cov( sum_{l=1}^{n} Xil, sum_{l=1}^{n} Xjl )
              = sum_{l=1}^{n} sum_{m=1}^{n} Cov(Xil, Xjm)      [by linearity of covariance]
              = sum_{l=1}^{n} Cov(Xil, Xjl)                    [by independence of trials, Cov(Xil, Xjm) = 0 for l ≠ m]
              = sum_{l=1}^{n} ( E(Xil Xjl) - E(Xil) E(Xjl) )
              = sum_{l=1}^{n} ( 0 - pi pj )                    [since in a trial, success can be achieved only in one category, Xil Xjl = 0 for i ≠ j]
              = -n pi pj

SLIDE 45

MGF for a Multinomial

• For k = 2, the multinomial reduces to the binomial.
• Let us derive the MGF for k = 3 (trinomial). Here X = (X1, X2, n - X1 - X2) and

  P(X1 = x1, X2 = x2) = [n! / (x1! x2! (n - x1 - x2)!)] p1^{x1} p2^{x2} (1 - p1 - p2)^{n - x1 - x2}

  ψ(t1, t2) = E(e^{t1 X1 + t2 X2})
    = sum_{x1, x2} [n! / (x1! x2! (n - x1 - x2)!)] (p1 e^{t1})^{x1} (p2 e^{t2})^{x2} (1 - p1 - p2)^{n - x1 - x2}
    = (p1 e^{t1} + p2 e^{t2} + 1 - p1 - p2)^n

  This follows from the multinomial theorem.

SLIDE 46

MGF for a Multinomial

• For arbitrary k:

  ψ(t1, t2, ..., t_{k-1}) = E(e^{t1 X1 + t2 X2 + ... + t_{k-1} X_{k-1}})
    = (p1 e^{t1} + p2 e^{t2} + ... + p_{k-1} e^{t_{k-1}} + 1 - p1 - p2 - ... - p_{k-1})^n

• Multinomial theorem:

  (x1 + x2 + ... + xm)^n = sum_{k1 + k2 + ... + km = n} [n! / (k1! k2! ... km!)] prod_{i=1}^{m} xi^{ki}

SLIDE 47

Hypergeometric distribution

SLIDE 48

Sampling with and without replacement

• Suppose there are k objects, each of a different type.
• When you sample 2 objects from these with replacement, you pick a particular object with probability 1/k, and you place it back (replace it).
• The probability of picking an object of another type is again 1/k.
• When you sample without replacement, the probability that your first object was of so and so type is 1/k. The probability that your second object was of so and so type is now 1/(k-1), because you didn't put the first object back!

SLIDE 49

Definition

• Consider a set of objects of which N are of good quality and M are defective.
• Suppose you pick some n objects out of these without replacement.
• There are C(N+M, n) ways of doing this.
• Let X be a random variable denoting the number of good quality objects picked (out of a total of n).

SLIDE 50

Definition

• There are C(N,i) C(M, n-i) ways to pick i good quality objects and n-i bad objects.
• So we have

  P(X = i) = C(N,i) C(M, n-i) / C(N+M, n),   0 <= i <= n
  (with the convention that C(a,b) = 0 if b > a)

• Such a random variable X is called a hypergeometric random variable.
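
The hypergeometric pmf is available in SciPy; a sketch with hypothetical values (note SciPy's argument order: population size, number of good objects, number of draws):

from scipy.stats import hypergeom

N, M, n = 7, 5, 4                  # 7 good, 5 defective, draw 4 without replacement
rv = hypergeom(N + M, N, n)
print([rv.pmf(i) for i in range(n + 1)])
print(rv.mean(), n * N / (N + M))  # both give nN/(N+M); see the later slides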

SLIDE 51

Properties

• Consider the random variable Xi which has value 1 if the ith trial produces a good quality object and 0 otherwise.
• Now consider the following probabilities:

  P(X1 = 1) = N / (N+M)
  P(X2 = 1) = P(X2 = 1 | X1 = 1) P(X1 = 1) + P(X2 = 1 | X1 = 0) P(X1 = 0)
            = [(N-1)/(N+M-1)] [N/(N+M)] + [N/(N+M-1)] [M/(N+M)]
            = N / (N+M)
  In general, P(Xi = 1) = N / (N+M).

SLIDE 52

Properties

• Note that:

  X = sum_{i=1}^{n} Xi

  E(X) = sum_{i=1}^{n} E(Xi) = nN / (N+M)

  Var(X) = sum_{i=1}^{n} Var(Xi) + 2 sum_{i=1}^{n} sum_{j>i} Cov(Xi, Xj)

  Var(Xi) = P(Xi = 1)(1 - P(Xi = 1)) = NM / (N+M)^2

Each Xi is a Bernoulli random variable with parameter p = N/(N+M).

SLIDE 53

Properties

• Note that:

  Var(X) = sum_{i=1}^{n} Var(Xi) + 2 sum_{i=1}^{n} sum_{j>i} Cov(Xi, Xj),   Var(Xi) = NM / (N+M)^2

  Cov(Xi, Xj) = E(Xi Xj) - E(Xi) E(Xj)

  E(Xi Xj) = P(Xi = 1, Xj = 1) = P(Xj = 1 | Xi = 1) P(Xi = 1) = [N/(N+M)] [(N-1)/(N+M-1)]

SLIDE 54

Properties

• Note that:

  Cov(Xi, Xj) = E(Xi Xj) - E(Xi) E(Xj)
              = [N/(N+M)] [(N-1)/(N+M-1)] - [N/(N+M)]^2
              = -NM / [(N+M)^2 (N+M-1)]

SLIDE 55

Properties

• Note that:

  Var(X) = nNM/(N+M)^2 + 2 C(n,2) ( -NM / [(N+M)^2 (N+M-1)] )
         = [nNM/(N+M)^2] [1 - (n-1)/(N+M-1)]
         = np(1-p) [1 - (n-1)/(N+M-1)]
         ≈ np(1-p) when N and/or M is/are very large

Recall: Each Xi is a Bernoulli random variable with parameter p = N/(N+M).

SLIDE 56

Gaussian distribution

SLIDE 57

Definition

• A continuous random variable is said to be normally distributed with parameters mean μ and standard deviation σ if it has a probability density function given as:

  f(x) = [1/(σ sqrt(2π))] exp( -(x-μ)^2 / (2σ^2) ),   -∞ < x < ∞,   denoted as N(μ, σ^2)

• This pdf is symmetric about the mean μ and has the shape of the "bell curve".

SLIDE 58

[Figure: normal distribution pdfs]
https://upload.wikimedia.org/wikipedia/commons/7/74/Normal_Distribution_PDF.svg

SLIDE 59

Definition

• If μ = 0 and σ = 1, it is called the standard normal distribution:

  f(x) = [1/sqrt(2π)] exp(-x^2/2),   denoted as N(0,1)

• To verify that this is a valid pdf:

  [∫_{-∞}^{∞} e^{-x^2} dx]^2 = ∫∫ e^{-(x^2+y^2)} dx dy
    = ∫_{0}^{2π} ∫_{0}^{∞} e^{-r^2} r dr dθ      (change from (x,y) to polar coordinates: x = r cos θ, y = r sin θ)
    = 2π ∫_{0}^{∞} e^{-r^2} r dr = 2π (1/2) ∫_{0}^{∞} e^{-s} ds = π
  ⇒ ∫_{-∞}^{∞} e^{-x^2} dx = sqrt(π)

Note: e^{-x^2}/sqrt(π) is a Gaussian pdf with mean 0 and standard deviation 1/sqrt(2). Thus we have verified that this particular Gaussian function is a valid pdf. You can verify that Gaussians with arbitrary mean and variance are valid pdfs by a change of variables.

SLIDE 60

Properties

• Mean:

  E(X) = [1/(σ sqrt(2π))] ∫_{-∞}^{∞} x exp( -(x-μ)^2 / (2σ^2) ) dx
       = μ + [1/(σ sqrt(2π))] ∫_{-∞}^{∞} y exp( -y^2 / (2σ^2) ) dy     (substituting y = x - μ)
       = μ     (why? the remaining integrand is odd, so the integral vanishes)

SLIDE 61

Properties

• Variance:

  E((X-μ)^2) = [1/(σ sqrt(2π))] ∫_{-∞}^{∞} (x-μ)^2 exp( -(x-μ)^2 / (2σ^2) ) dx
             = [1/(σ sqrt(2π))] ∫_{-∞}^{∞} y^2 exp( -y^2 / (2σ^2) ) dy
             = σ^2     (why?)

SLIDE 62

Properties

• If X ~ N(μ, σ^2) and Y = aX + b, then Y ~ N(aμ + b, a^2 σ^2).

Proof on board. And in the book.

SLIDE 63

Properties

• Median = mean (why?) Because of symmetry of the pdf about the mean.
• Mode = mean – can be checked by setting the first derivative of the pdf to 0 and solving, and checking the sign of the second derivative.
• CDF for a 0-mean Gaussian with variance 1 is given by:

  F_X(x) = Φ(x) = [1/sqrt(2π)] ∫_{-∞}^{x} e^{-t^2/2} dt

SLIDE 64

Properties

• CDF – it is given by:

  F_X(x) = Φ(x) = [1/sqrt(2π)] ∫_{-∞}^{x} e^{-t^2/2} dt

• It is closely related to the error function erf(x), defined as:

  erf(x) = [2/sqrt(π)] ∫_{0}^{x} e^{-t^2} dt

• It follows that (verify for yourself):

  Φ(x) = (1/2) [1 + erf( x / sqrt(2) )]

SLIDE 65

Properties

• For a Gaussian with mean μ and standard deviation σ, it follows that:

  F_X(x) = (1/2) [1 + erf( (x-μ) / (σ sqrt(2)) )]

• The probability that a Gaussian random variable takes values from μ - nσ to μ + nσ is given by:

  Φ(n) - Φ(-n) = (1/2)[1 + erf(n/sqrt(2))] - (1/2)[1 + erf(-n/sqrt(2))] = erf( n / sqrt(2) )

SLIDE 66

Properties

• The probability that a Gaussian random variable takes values from μ - nσ to μ + nσ is given by:

  Φ(n) - Φ(-n) = erf( n / sqrt(2) )

  n    Φ(n) - Φ(-n)
  1    68.2%
  2    95.4%
  3    99.7%
  4    99.99366%
  5    99.9999%
  6    99.9999998%

• Hence a Gaussian random variable lies within ±3σ from its mean with more than 99% probability.
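
The table can be reproduced with the standard library's math.erf (a small sketch):

import math

for n in range(1, 7):
    print(n, math.erf(n / math.sqrt(2)))   # 0.6827, 0.9545, 0.9973, ...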

SLIDE 67

Properties

• MGF:

  ψ_X(t) = exp( μt + σ^2 t^2 / 2 )

Proof here.

SLIDE 68

A strange phenomenon

• Let's say you draw n = 2 values, called x1 and x2, from a [0,1] uniform random distribution and compute (where μ is the true mean of the uniform random distribution):

  y_j = sqrt(n) ( (1/n) sum_{i=1}^{n} x_i - μ )

  (Sampling index i, 1 <= i <= n; iteration index j, 1 <= j <= m)

• You repeat this process some m = 5000 times (say), and then plot the histogram of the computed values {y_j}, 1 <= j <= m.
• Now suppose you repeat the earlier two steps with larger and larger n.

SLIDE 69

A strange phenomenon

Now suppose you repeat the earlier two steps with larger and larger n.

It turns out that as n grows larger and larger, the histogram starts resembling a 0-mean Gaussian distribution with variance equal to that of the sampling distribution (i.e. the [0,1] uniform distribution).

Now suppose you repeat the experiment with samples drawn from any other distribution instead of the [0,1] uniform (i.e. you change the sampling distribution). The phenomenon still occurs, though the resemblance may start showing up at smaller or larger values of n.

This leads us to a very interesting theorem called the central limit theorem.

Demo code: http://www.cse.iitb.ac.in/~ajitvr/CS215_Fall2017/CLT/
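
The linked demo is separate; the following is a minimal Python sketch of the same experiment (assuming NumPy and Matplotlib are available):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
m, mu = 5000, 0.5                        # iterations; true mean of Uniform[0,1]
for n in (2, 10, 100):
    x = rng.random((m, n))               # m repetitions of n uniform draws
    y = np.sqrt(n) * (x.mean(axis=1) - mu)
    plt.hist(y, bins=50, density=True, alpha=0.5, label="n = %d" % n)
plt.legend()
plt.show()                               # the histograms approach N(0, 1/12)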

SLIDE 70

Central limit theorem

• Consider X1, X2, …, Xn to be a sequence of independent and identically distributed (i.i.d.) random variables, each with mean μ and variance σ^2 < ∞. Then as n→∞, the distribution (i.e. CDF) of the following quantity:

  Y_n = sqrt(n) ( (1/n) sum_{i=1}^{n} X_i - μ )

converges to that of N(0, σ^2). Or, we say Y_n converges in distribution to N(0, σ^2). This is called the Lindeberg-Levy central limit theorem.

SLIDE 71

Central limit theorem: some comments

• Note that the random variables X1, X2, …, Xn must be independent and identically distributed.
• Convergence in distribution means the following:

  lim_{n→∞} P(Y_n <= z) = Φ(z/σ)

• There is a version of the central limit theorem that requires only independence – and allows the random variables to belong to different distributions. This extension is called the Lindeberg central limit theorem, and is given on the next slide.

SLIDE 72

Lindeberg's Central limit theorem

• Consider X1, X2, …, Xn to be a sequence of independent random variables, each with mean μi and variance (σi)^2 < ∞. Then as n→∞, the distribution of the following quantity:

  Y_n = [ sum_{i=1}^{n} (X_i - μ_i) ] / s_n,   where s_n^2 = sum_{i=1}^{n} (σ_i)^2

converges to that of N(0, 1), provided for every ε > 0:

  lim_{n→∞} (1/s_n^2) sum_{i=1}^{n} E[ (X_i - μ_i)^2 · 1{|X_i - μ_i| > ε s_n} ] = 0

Indicator function: 1_A : X → {0,1}, with 1_A(x) = 1 if x ∈ A, 0 otherwise.

SLIDE 73

Lindeberg's Central limit theorem

• Informally speaking, the take-home message from the previous slide is that the CLT is valid even if the random variables emerge from different distributions.
• This provides a major motivation for the widespread usage of the Gaussian distribution.
• The errors in experimental observations are often modelled as Gaussian – because these errors often stem from many different independent sources, and are modelled as being weighted combinations of errors from each such source.

SLIDE 74

Central limit theorem versus law of large numbers

• The law of large numbers says that the empirical mean calculated from a large number of samples is equal (or very close) to the true mean μ (of the distribution from which the samples were drawn).
• The central limit theorem says that the empirical mean calculated from a large number of samples is a random variable drawn from a Gaussian distribution with mean equal to the true mean μ (of the distribution from which the samples were drawn). The empirical mean can take any of the values of this Gaussian!

SLIDE 75

Central limit theorem versus law of large numbers

• Is this a contradiction?

SLIDE 76

Central limit theorem versus law of large numbers

• The answer is NO!
• Go and look back at the central limit theorem.

  Y_n = sqrt(n) ( (1/n) sum_{i=1}^{n} x_i - μ ) ~ N(0, σ^2)
  ⇒ (1/n) sum_{i=1}^{n} x_i - μ ~ N(0, σ^2/n)   (why?)
  ⇒ (1/n) sum_{i=1}^{n} x_i ~ N(μ, σ^2/n)

This variance drops to 0 when n is very large! All the probability is now concentrated at the mean!

SLIDE 77

Proof of Lindeberg-Levy CLT using MGFs

• Consider the n i.i.d. random variables X1, X2, …, Xn with mean μ and variance σ^2. Let their sum be S_n.
• Then we have to prove that:

  lim_{n→∞} P( (S_n - nμ) / (σ sqrt(n)) <= x ) = [1/sqrt(2π)] ∫_{-∞}^{x} e^{-y^2/2} dy

• For that we will prove that the MGF of Z_n equals the MGF of the standard normal distribution (i.e. exp(t^2/2)), where

  Z_n = (S_n - nμ) / (σ sqrt(n))

SLIDE 78

Proof of Lindeberg-Levy CLT using MGFs

• By properties of the MGF, with ψ denoting the MGF of the standardized variable (X - μ)/σ (which has mean 0 and variance 1), we have:

  ψ_{Z_n}(t) = ( ψ( t / sqrt(n) ) )^n

• We need to prove that:

  lim_{n→∞} n log ψ( t / sqrt(n) ) = t^2 / 2

Recall: ψ_Y(t) = e^{tb} ψ_X(at) for Y = aX + b.
Recall: for a Gaussian r.v. with mean μ and std. dev. σ, ψ_X(t) = exp( μt + σ^2 t^2 / 2 ).

SLIDE 79

Proof of Lindeberg-Levy CLT using MGFs

• Labelling x = 1/sqrt(n), we have:

  lim_{n→∞} n log ψ(t/sqrt(n)) = lim_{x→0} log ψ(tx) / x^2
    = lim_{x→0} t ψ'(tx) / (2x ψ(tx))     [L'Hospital's rule]
    = lim_{x→0} t^2 ψ''(tx) / 2           [L'Hospital's rule again, using ψ(tx) → 1]
    = t^2 E(X^2) / 2 = t^2 / 2

Recall: ψ^{(r)}(0) = E(X^r), ψ(0) = 1; here X is standardized, so E(X) = 0 and E(X^2) = 1.

SLIDE 80

Application

• Your friend tells you that in 10000 successive independent unbiased coin tosses, he counted 5200 heads. Is (s)he serious or joking?
• Answer: Let X1, X2, …, Xn be the random variables indicating whether or not the coin toss was a success (a heads).
• These are i.i.d. random variables whose sum is a random variable with mean nμ = 10000(0.5) = 5000 and standard deviation σ n^{1/2} = sqrt(0.5(1-0.5)) sqrt(10000) = 50.

SLIDE 81

Application

Your friend tells you that in 10000 successive independent unbiased coin tosses, he counted 5200 heads. Is (s)he serious or joking?

Answer: The given number of heads is 5200, which is 4 standard deviations away from the mean.

The chance of that occurring is of the order of 0.00001 (see the slide on error functions), since the total number of heads is a Gaussian random variable (as per the central limit theorem).

So your friend is (most likely) joking.

Notice that this answer is much more principled than giving an answer purely based on some arbitrary threshold over |X - 5000|.

You will study much more of this when you do a topic called hypothesis testing.
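
The calculation, as a sketch using only the standard library:

import math

n, p, heads = 10000, 0.5, 5200
mu, sigma = n * p, math.sqrt(n * p * (1 - p))    # 5000 and 50
z = (heads - mu) / sigma                         # 4 standard deviations
print(z, math.erfc(z / math.sqrt(2)))            # P(|Z| >= 4) ~ 6.3e-5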

SLIDE 82

Binomial distribution and Gaussian distribution

SLIDE 83

Binomial distribution and Gaussian distribution

• The binomial distribution begins to resemble a Gaussian distribution with an appropriate mean for large values of n.
• In fact this resemblance begins to show up for surprisingly small values of n.
• Recall that a binomial random variable is the number of successes of n independent Bernoulli trials:

  X = sum_{i=1}^{n} Xi,   Xi = 1 if heads on the i-th trial, 0 else

SLIDE 84

Binomial distribution and Gaussian distribution

• Each Xi has a mean of p and a standard deviation of sqrt(p(1-p)).
• Hence the following random variable is (approximately) a standard normal random variable by the CLT:

  (X - np) / sqrt(np(1-p))

• Watch the animation here.

SLIDE 85

Binomial distribution and Gaussian distribution

• Another way of stating the afore-mentioned facts is that:

  When n → ∞, for a < b, we have P( a <= (X - np)/sqrt(np(1-p)) <= b ) → Φ(b) - Φ(a),   where X ~ Binomial(n,p)

• This is called the de Moivre-Laplace theorem and is a special case of the CLT. But its proof was published almost 80 years before that of the CLT!
slide-86
SLIDE 86

Distribution of the sample mean

 Consider independent and identically distributed

random variables X1, X2, …, Xn with mean μ and standard deviation σ.

 We know that the sample mean (or empirical

mean) is a random variable given by: n X X

n i i

1

Note yet again: The true mean μ is NOT a random variable. The sample mean is, and its value converges to the true mean μ by the law of large numbers.

86

SLIDE 87

Distribution of the sample mean

• Now we have:

  E(X̄) = (1/n) sum_{i=1}^{n} E(Xi) = μ
  Var(X̄) = (1/n^2) sum_{i=1}^{n} Var(Xi) = σ^2 / n

If X1, X2, …, Xn were normal random variables, then it can be proved that X̄ is also a normal random variable (how?). Otherwise, if X1, X2, …, Xn weren't normal random variables, X̄ would be only approximately normally distributed, as per the central limit theorem.

SLIDE 88

Distribution of the sample variance

• The sample variance is given by:

  S^2 = [ sum_{i=1}^{n} (Xi - X̄)^2 ] / (n-1) = [ sum_{i=1}^{n} Xi^2 - n X̄^2 ] / (n-1)

• The sample standard deviation is S.

SLIDE 89

Distribution of the sample variance

• The expected value of the sample variance is derived as follows:

  E( (n-1) S^2 ) = E( sum_{i=1}^{n} Xi^2 - n X̄^2 ) = n E(X1^2) - n E(X̄^2)

• Using E(W^2) = Var(W) + (E(W))^2:

  E( (n-1) S^2 ) = n Var(X1) + n (E(X1))^2 - n Var(X̄) - n (E(X̄))^2

SLIDE 90

Distribution of the sample variance

• The expected value of the sample variance is derived as follows:

  E( (n-1) S^2 ) = n Var(X1) + n (E(X1))^2 - n Var(X̄) - n (E(X̄))^2
                 = n σ^2 + n μ^2 - n (σ^2 / n) - n μ^2 = (n-1) σ^2

  ⇒ E(S^2) = σ^2

SLIDE 91

Distribution of the sample variance

• The expected value of the sample variance is derived as follows:

  E( (n-1) S^2 ) = (n-1) σ^2   ⇒   E(S^2) = σ^2

• If the sample variance were instead defined as S^2 = (1/n) sum_{i=1}^{n} (Xi - X̄)^2 = (1/n) sum_{i=1}^{n} Xi^2 - X̄^2, we would have:

  E(S^2) = (n-1) σ^2 / n

This is undesirable – as we would like the expected value of the sample variance to equal the true variance! Hence the 1/n version above is multiplied by n/(n-1) to correct for this anomaly, giving rise to our strange definition of sample variance (with n-1 in the denominator). This use of n-1 in place of n is called Bessel's correction.
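
A quick simulation of Bessel's correction (a sketch assuming NumPy):

import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 100000
x = rng.normal(0.0, 1.0, (m, n))          # m samples of size n, true variance 1
print(x.var(axis=1, ddof=1).mean())       # divide by n-1: average ~ 1.0 (unbiased)
print(x.var(axis=1, ddof=0).mean())       # divide by n: average ~ (n-1)/n = 0.8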

SLIDE 92

Distribution of the sample variance

• But the mean and the variance alone do not determine the distribution of a random variable.
• So what about the distribution of the sample variance?
• For that we need to study another distribution first – the chi-square distribution.

SLIDE 93

Chi-square distribution

• If Z1, Z2, …, Zn are independent standard normal random variables, then the following quantity is said to have a chi-square distribution with n degrees of freedom:

  X = Z1^2 + Z2^2 + ... + Zn^2

• The formula for its pdf is as follows:

  f_X(x) = e^{-x/2} x^{n/2 - 1} / [ 2^{n/2} Γ(n/2) ],
  where Γ(y) = (y-1)! (integer y),   Γ(y) = ∫_{0}^{∞} x^{y-1} e^{-x} dx (real y)

SLIDE 94

Chi-square distribution

• To obtain the expression for the chi-square distribution when n = 1: let Z ~ N(0,1) and X = Z^2. Then

  F_X(x) = P(Z^2 <= x) = P(-sqrt(x) <= Z <= sqrt(x)) = F_Z(sqrt(x)) - F_Z(-sqrt(x))

  f_X(x) = [1/(2 sqrt(x))] f_Z(sqrt(x)) + [1/(2 sqrt(x))] f_Z(-sqrt(x)) = [1/sqrt(2π)] x^{-1/2} e^{-x/2}

SLIDE 95

Chi-square distribution

• MGF of a chi-square distribution with n degrees of freedom:

  ψ_X(t) = (1 - 2t)^{-n/2}

• Please note that the aforementioned MGF is defined only for t < 1/2.

Proof on the board. And here.

SLIDE 96

SLIDE 97

Additive property

• If X1 and X2 are independent chi-square random variables with n1 and n2 degrees of freedom respectively, then X1 + X2 is also a chi-square random variable with n1 + n2 degrees of freedom. This is called the additive property.
• It is easy to prove this property by observing that X1 + X2 is basically the sum of squares of n1 + n2 independent standard normal random variables.

SLIDE 98

Chi-square distribution

• Tables for the chi-square distribution are available for different numbers of degrees of freedom, and for different values of the independent variable.

SLIDE 99

Back to the distribution of the sample variance

• We have:

  (n-1) S^2 / σ^2 = sum_{i=1}^{n} (Xi - X̄)^2 / σ^2

• Using sum_{i=1}^{n} (Xi - μ)^2 = sum_{i=1}^{n} (Xi - X̄)^2 + n (X̄ - μ)^2 and dividing by σ^2:

  sum_{i=1}^{n} ( (Xi - μ)/σ )^2 = (n-1) S^2 / σ^2 + ( (X̄ - μ) / (σ/sqrt(n)) )^2

• The left-hand side is the sum of squares of n standard normal random variables; the last term is the square of a standard normal random variable.

SLIDE 100

Back to the distribution of the sample variance

  sum_{i=1}^{n} ( (Xi - μ)/σ )^2 = (n-1) S^2 / σ^2 + ( (X̄ - μ) / (σ/sqrt(n)) )^2

• The left-hand side is the sum of squares of n standard normal random variables; the last term is the square of a standard normal random variable.
• It turns out that these two quantities on the right-hand side are independent random variables. The proof of this requires multivariate statistics and transformation of random variables, and is deferred to a later point in the course. If you are curious, you can browse this link, but it's not on the exam for now.
• Given this fact about independence, it then follows that the middle term, (n-1)S^2/σ^2, is a chi-square random variable with n-1 degrees of freedom.

SLIDE 101

Uniform distribution

SLIDE 102

Uniform distribution

• A uniform random variable over the interval [a,b] has a pdf given by:

  f_X(x) = 1/(b-a) if a <= x <= b, 0 otherwise

• Clearly, this is a valid pdf – it is non-negative and integrates to 1.
• It is easy to show that its mean and median are equal to (b+a)/2:

  E(X) = ∫_{a}^{b} x/(b-a) dx = [x^2 / (2(b-a))]_{a}^{b} = (b+a)/2

SLIDE 103

Uniform distribution

• Variance:

  E(X^2) = ∫_{a}^{b} x^2/(b-a) dx = [x^3 / (3(b-a))]_{a}^{b} = (b^3 - a^3)/(3(b-a)) = (a^2 + ab + b^2)/3

  Var(X) = E(X^2) - (E(X))^2 = (b-a)^2 / 12

• MGF:

  ψ_X(t) = E(e^{tX}) = ∫_{a}^{b} e^{tx}/(b-a) dx = (e^{tb} - e^{ta}) / (t(b-a)) for t ≠ 0,   ψ_X(0) = 1

SLIDE 104

Applications

• Uniform random variables, especially over the [0,1] interval, are very important in developing programs to draw samples from other distributions, including the Gaussian, Poisson, and others.
• You will study more of this later on in the semester.
• For now, we will study two applications. How do you draw a sample from a discrete distribution of the following form?

  P(X = x_i) = p_i, 1 <= i <= n,   sum_{i=1}^{n} p_i = 1

SLIDE 105

Applications

• For now, we will study two applications. How do you draw a sample from a discrete distribution with the following pmf?

  P(X = x_i) = p_i, 1 <= i <= n,   sum_{i=1}^{n} p_i = 1

• Draw u ~ Uniform(0,1).
  If u < p1, the sampled value is x1.
  If p1 <= u < p1 + p2, the sampled value is x2.
  ...
  If p1 + p2 + ... + p_{n-1} <= u < p1 + p2 + ... + p_n, the sampled value is x_n.
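
The scheme above as code – a sketch assuming NumPy (the function name is ours):

import numpy as np

def sample_discrete(values, probs, rng):
    # Draw one sample from P(X = values[i]) = probs[i] using a single uniform draw.
    u = rng.uniform()
    cdf = np.cumsum(probs)
    return values[np.searchsorted(cdf, u)]  # smallest i with u < p1 + ... + p_{i+1}

rng = np.random.default_rng(0)
draws = [sample_discrete([1, 2, 3], [0.2, 0.5, 0.3], rng) for _ in range(10000)]
print([draws.count(v) / 10000 for v in (1, 2, 3)])   # close to 0.2, 0.5, 0.3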

SLIDE 106

Applications

• Uniform random variables, especially over the [0,1] interval, are very important in developing programs to draw samples from other distributions, including the Gaussian, Poisson, and others.
• You will study more of this later on in the semester.
• For now, we will study how to generate a random permutation of n elements. That is what the "randperm" function in MATLAB does, and you have used it at least once so far!

SLIDE 107

Application: generating a random subset

• In fact, we will do something more than randperm – we will develop theory to generate a random subset of size k from a set A = {a1, a2, …, an} of size n, assuming all the C(n,k) subsets are equally likely.
• Let us define the following for each element j (1 <= j <= n):

  I_j = 1 if a_j ∈ B_k, 0 else   (B_k is the notation for the chosen subset of size k)

• Now we will sequentially pick each element of the subset randomly as follows.

SLIDE 108

Application: generating a random subset

• Notice that P(I1 = 1) = k/n. (Why? Because there are C(n,k) ways to pick k objects out of n. Let us say object 1 is one of them; then there are C(n-1, k-1) ways to pick the other k-1 objects, and C(n-1,k-1)/C(n,k) = k/n.)
• If I1 is 1, then P(I2 = 1) = (k-1)/(n-1). (why?)
• If I1 is 0, then P(I2 = 1) = k/(n-1). (why?)
• Thus P(I2 = 1 | I1) = (k - I1)/(n-1). (why?)
• Side question: what is P(I2 = 1)?

SLIDE 109

Application: generating a random subset

• Continuing this way, one can show that:

  P(I_j = 1 | I_1, I_2, ..., I_{j-1}) = ( k - sum_{i=1}^{j-1} I_i ) / ( n - (j-1) ),   2 <= j <= n

SLIDE 110

Application: generating a random subset

• This suggests the following procedure (a code sketch follows below):

  U_1 ~ Uniform(0,1); I_1 = 1 if U_1 < k/n, else 0.
  U_2 ~ Uniform(0,1); I_2 = 1 if U_2 < (k - I_1)/(n-1), else 0.
  ...
  U_j ~ Uniform(0,1); I_j = 1 if U_j < (k - I_1 - I_2 - ... - I_{j-1})/(n - j + 1), else 0.

When does this process stop? It stops at step #j:
• If I1 + I2 + … + Ij = k; the random subset Bk contains those indices whose I-values are 1. OR
• If the number of unfilled entries in the random subset Bk equals the number of remaining elements in A. In this case Bk = all remaining elements in A with index greater than i = the largest index in Bk. See figure 5.6 of the book.
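
The sequential procedure as code – a sketch assuming NumPy (the names are ours):

import numpy as np

def random_subset(a, k, rng):
    # Pick a size-k subset of the list a, with all C(n,k) subsets equally likely.
    n, chosen, remaining = len(a), [], k
    for j, elem in enumerate(a):
        # P(I_j = 1 | past) = (k - #already chosen) / (n - j), with j 0-indexed here
        if rng.uniform() < remaining / (n - j):
            chosen.append(elem)
            remaining -= 1
        if remaining == 0:      # first stopping rule: the subset is full
            break
    return chosen

rng = np.random.default_rng(0)
print(random_subset(list("abcdef"), 3, rng))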

SLIDE 111

Exponential Distribution

SLIDE 112

Motivation

• Consider a Poisson distribution with an average number of successes per unit time given by λ.
• So the expected number of successes in time u is λu.
• This is actually called a Poisson process.
• Now consider the time taken (T) for the first success – this is called the waiting time.

SLIDE 113

Motivation

• Let X ~ Poisson(λu) for the time interval (0,u).
• T is a random variable whose distribution we are going to seek to model here. Then:

  P(T <= u) = 1 - P(T > u) = 1 - P(X = 0) = 1 - e^{-λu} (λu)^0 / 0! = 1 - e^{-λu}

(The probability that the first success occurred after time u = the probability that there was no success in the time interval (0,u), i.e. X = 0 in that interval.)

SLIDE 114

Motivation

Such a random variable T is called an exponential random variable. It models the waiting time for a Poisson process. It has a parameter λ:

  F_T(u) = 1 - e^{-λu},   f_T(u) = λ e^{-λu} for u >= 0 (0 elsewhere)
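
A simulation check of this connection (a sketch assuming NumPy): the event "no success in (0,u)" for a Poisson process has the same probability as "an Exponential(λ) waiting time exceeds u".

import numpy as np

rng = np.random.default_rng(0)
lam, u, m = 2.0, 0.7, 100000
counts = rng.poisson(lam * u, m)                 # number of events in (0, u)
waits = rng.exponential(scale=1/lam, size=m)     # exponential waiting times
print((counts == 0).mean(), (waits > u).mean(), np.exp(-lam * u))   # all ~ e^{-λu}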

SLIDE 115

Properties: MGF

• MGF, defined for t < λ:

  ψ_T(t) = E(e^{tT}) = ∫_{0}^{∞} e^{tu} λ e^{-λu} du = λ ∫_{0}^{∞} e^{-(λ-t)u} du = λ / (λ - t)

SLIDE 116

Properties: Mean and Variance

• Mean:

  E(T) = ∫_{0}^{∞} t λ e^{-λt} dt = 1/λ

This is intuitive – a Poisson process with a large average rate should definitely lead to a lower expected waiting time.

• Variance:

  Var(T) = E(T^2) - (E(T))^2 = ∫_{0}^{∞} t^2 λ e^{-λt} dt - 1/λ^2 = 2/λ^2 - 1/λ^2 = 1/λ^2

SLIDE 117

SLIDE 118

Properties: Mode and Median

• Mode: always at 0.
• Median: the value u for which

  ∫_{0}^{u} λ e^{-λt} dt = 1/2   ⇒   u = ln 2 / λ

SLIDE 119

Properties: "Memorylessness"

• A non-negative random variable T is said to be memoryless if:

  P(T > s + u | T > u) = P(T > s),   for all s, u >= 0

• Meaning: given a waiting time for success of more than u, this is the probability that the waiting time will exceed s + u, i.e. that one would have to wait for s more time units for success.
• Another formula (equivalent to the earlier one):

  P(T > s + u, T > u) = P(T > s) P(T > u)

SLIDE 120

Properties: "Memorylessness"

• You can easily verify that this holds for the exponential distribution:

  P(T > u) = e^{-λu},   P(T > s) = e^{-λs},   P(T > s + u) = e^{-λ(s+u)} = P(T > s) P(T > u)

SLIDE 121

Example

• Suppose that the number of miles a car can run before its battery fails is exponentially distributed with an average of α. What is the probability that the car won't fail on a trip of k miles, given that it has already run for l miles?
• Solution: For the exponential distribution we know that

  P(T > k + l | T > l) = P(T > k + l, T > l) / P(T > l) = P(T > k) = e^{-k/α}

SLIDE 122

Example

• Suppose that the number of miles a car can run before its battery fails is exponentially distributed with an average of α. What is the probability that the car won't fail on a trip of k miles, given that it has already run for l miles?
• Solution: If the distribution were not exponential, then we would have

  P(T > k + l | T > l) = P(T > k + l, T > l) / P(T > l) = (1 - F_T(k + l)) / (1 - F_T(l))

SLIDE 123

Property: Minimum

• Consider independent exponentially distributed random variables X1, X2, …, Xn with parameters λ1, λ2, …, λn. Then min(X1, X2, …, Xn) is exponentially distributed:

  P( min(X1, X2, ..., Xn) > x ) = P(X1 > x, X2 > x, ..., Xn > x)
    = prod_{i=1}^{n} P(Xi > x)     [due to independence]
    = prod_{i=1}^{n} e^{-λi x} = e^{-(sum_{i=1}^{n} λi) x}

• So the minimum is exponential with parameter λ1 + λ2 + ... + λn.
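
A simulation check of this property (a sketch assuming NumPy):

import numpy as np

rng = np.random.default_rng(0)
lams = np.array([0.5, 1.0, 2.5])
x = rng.exponential(scale=1/lams, size=(100000, 3))  # column i ~ Exponential(lams[i])
mn = x.min(axis=1)
u = 0.3
print((mn > u).mean(), np.exp(-lams.sum() * u))      # P(min > u) vs e^{-(Σλi)u}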