Fall 2017 Instructor: Ajit Rajwade
Families of Random Variables
We will study several useful families of random variables that arise in interesting scenarios in statistics.
Discrete random variables: Bernoulli, binomial, Poisson, geometric, multinomial, hypergeometric.
Continuous random variables: Gaussian, uniform, exponential, chi-square.
Let X be a random variable whose value is 1 when a coin toss yields heads (with probability p), and 0 when it yields tails (with probability 1-p).
This is called a Bernoulli pmf with parameter p: P(X = 1) = p, P(X = 0) = 1 - p.
Note here: the coin need not be unbiased any longer!
E[X] = p(1) + (1-p)(0) = p.
$Var(X) = p(1-p)^2 + (1-p)(0-p)^2 = p(1-p)$
What's the median? It is 0 if p < 0.5, 1 if p > 0.5, and any value in [0,1] (commonly taken as 0.5) if p = 0.5.
What is (are) the mode(s)? The mode is 0 if p < 0.5, 1 if p > 0.5, and both {0,1} if p = 0.5.
What is its MGF? $\phi_X(t) = E[e^{tX}] = 1 - p + p e^t$
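To make this concrete, here is a minimal Python sketch (not from the slides; the value p = 0.3 is an arbitrary choice) that checks the mean p and variance p(1-p) by simulation:

import random

p = 0.3
n_samples = 100_000
# Each sample is 1 (success) with probability p, else 0.
samples = [1 if random.random() < p else 0 for _ in range(n_samples)]

mean = sum(samples) / n_samples
var = sum((s - mean) ** 2 for s in samples) / n_samples

print(f"empirical mean {mean:.4f} vs p = {p}")
print(f"empirical variance {var:.4f} vs p(1-p) = {p * (1 - p):.4f}")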
Let X be a random variable denoting the number of heads (successes) in n independent coin tosses (Bernoulli trials), each with success probability p.
Then the pmf of X is given as follows:
$P(X = i) = \binom{n}{i} p^i (1-p)^{n-i}, \quad 0 \le i \le n$
This is called the binomial pmf with parameters n and p.
The pmf of X is given as follows: $P(X = i) = \binom{n}{i} p^i (1-p)^{n-i}$.
Explanation: Consider a sequence of n trials with i successes and n-i failures. Any one particular such sequence occurs with probability $p^i (1-p)^{n-i}$, and there are $\binom{n}{i}$ such sequences.
Example: In 5 coin tosses, if we had two heads, one particular sequence is HHTTT.
What's the probability that a sequence of 5 tosses contains exactly two heads? Each particular sequence with two heads has probability $p^2(1-p)^3$, and there are $\binom{5}{2} = 10$ of them.
The pmf of X is given as follows: $P(X = i) = \binom{n}{i} p^i (1-p)^{n-i}$.
To verify it is a valid pmf:
$\sum_{i=0}^{n} \binom{n}{i} p^i (1-p)^{n-i} = (p + (1-p))^n = 1^n = 1$
Binomial theorem!
Let's say you design a smart self-driving car and you test it over n independent trips, each with some small probability of a collision.
Answer: X is the number of times your car collides. X is then a binomial random variable.
At least half of an airplane's engines need to function for it to fly successfully. Suppose each engine fails independently with probability p. Which is safer: a 4-engine plane or a 2-engine plane?
Answer: The number of functioning engines (X) is a binomial random variable with success probability 1-p.
In the 4-engine case, the probability of operation is $P(X \ge 2) = \binom{4}{2}(1-p)^2 p^2 + \binom{4}{3}(1-p)^3 p + (1-p)^4$.
In the 2-engine case, it is $P(X=1)+P(X=2) = 2p(1-p)+(1-p)^2$.
Answer: the 4-engine plane is preferable for p < 1/3.
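As a sanity check, here is a short Python sketch (not part of the slides) that computes both operational probabilities on a grid of failure probabilities; the crossover should appear at p = 1/3:

from math import comb

def p_operational(n_engines, p_fail):
    """Probability that at least half of the engines function."""
    q = 1 - p_fail   # probability a single engine functions
    need = (n_engines + 1) // 2
    return sum(comb(n_engines, i) * q**i * p_fail**(n_engines - i)
               for i in range(need, n_engines + 1))

for p in [0.1, 0.2, 1/3, 0.4, 0.5]:
    p4, p2 = p_operational(4, p), p_operational(2, p)
    print(f"p={p:.3f}: 4-engine {p4:.4f}, 2-engine {p2:.4f}")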
Mean: Recall that a binomial random variable X is the sum of n independent Bernoulli random variables:
$X = \sum_{i=1}^{n} X_i$, where $X_i = 1$ if trial i yields success (with probability p), 0 otherwise.
Hence $E[X] = \sum_{i=1}^{n} E[X_i] = np$.
Variance: $Var(X) = \sum_{i=1}^{n} Var(X_i) = np(1-p)$, using the independence of the trials.
Notice how we are making use of the linearity of the expectation operator. These calculations would have been much harder had you tried to work directly with the binomial pmf.
The CDF is given as follows:
$F_X(k) = P(X \le k) = \sum_{i=0}^{\lfloor k \rfloor} \binom{n}{i} p^i (1-p)^{n-i}$
MGF: Consider a binomial r.v. to be the sum of n independent Bernoulli r.v.s; hence $\phi_X(t) = (\phi_{X_1}(t))^n = (1 - p + pe^t)^n$.
Example: each bit in a 12-bit packet is corrupted independently with probability 0.1; what is the probability that at most 2 bits are corrupted?
We want P(X = 0) + P(X=1) + P(X=2).
$P(X=0) = (0.9)^{12} = 0.282$
$P(X=1) = \binom{12}{1} (0.1)(0.9)^{11} = 0.377$
$P(X=2) = \binom{12}{2} (0.1)^2(0.9)^{10} = 0.230$
So the answer is 0.889.
The probability that 3 or more bits are corrupted is 1 - 0.889 = 0.111.
Let Y = number of packets (out of 6 transmitted) with 3 or more corrupted bits; Y ~ Binomial(6, 0.111).
Mean of Y is μ = 6(0.111) = 0.666. Standard deviation of Y is σ = [6(0.111)(0.889)]^{0.5} ≈ 0.77.
We want P(Y > μ + 2σ) = P(Y > 2.2) = P(Y ≥ 3) = 1 - [P(Y=0) + P(Y=1) + P(Y=2)].
We want P(Y > μ + 2σ) = P(Y > 2.2) = P(Y ≥ 3):
$P(Y=0) = \binom{6}{0}(0.111)^0(0.889)^6 = 0.4936$
$P(Y=1) = \binom{6}{1}(0.111)(0.889)^5 = 0.370$
$P(Y=2) = \binom{6}{2}(0.111)^2(0.889)^4 = 0.115$
$P(Y > \mu + 2\sigma) = 1 - (0.4936 + 0.370 + 0.115) = 0.0214$
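A small Python sketch (an illustration, not from the slides) reproducing the two-stage computation with exact binomial pmfs rather than rounded intermediate values:

from math import comb

def binom_pmf(i, n, p):
    return comb(n, i) * p**i * (1 - p)**(n - i)

# Stage 1: probability a 12-bit packet has 3 or more corrupted bits (p = 0.1).
p_bad = 1 - sum(binom_pmf(i, 12, 0.1) for i in range(3))
print(f"P(packet has >= 3 corrupted bits) = {p_bad:.4f}")   # ~0.111

# Stage 2: Y ~ Binomial(6, p_bad); probability Y >= 3.
answer = 1 - sum(binom_pmf(i, 6, p_bad) for i in range(3))
print(f"P(Y >= 3) = {answer:.4f}")                          # ~0.021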
In a sequence of Bernoulli trials, let X be the random variable denoting the trial number of the first success.
Then X is called a geometric random variable and its pmf is:
$P(X = i) = (1-p)^{i-1} p, \quad i \ge 1$
Let X be the trial number for the kth success in a sequence of Bernoulli trials. Then X is called a negative binomial random variable with pmf:
$P(X = n) = \binom{n-1}{k-1} p^k (1-p)^{n-k}, \quad n \ge k$
Let X ~ Binomial(n, p). Then P(X = k) ≥ P(X = k-1) ⟺ k ≤ (n+1)p (prove this).
Also P(X = k) ≥ P(X = k+1) ⟺ k ≥ (n+1)p - 1 (prove this too).
Any integer-valued k which satisfies both the above inequalities is a mode of the binomial distribution.
If (n+1)p is an integer, then both (n+1)p and (n+1)p - 1 satisfy the inequalities, and both are modes of the distribution.
We have seen the binomial distribution before:
$P(X = i) = \binom{n}{i} p^i (1-p)^{n-i}$
Here p is the success probability. We can express it as p = λ/n. Hence:
$P(X = i) = \binom{n}{i} \left(\frac{\lambda}{n}\right)^i \left(1 - \frac{\lambda}{n}\right)^{n-i}$
We have $P(X = i) = \binom{n}{i} (\lambda/n)^i (1 - \lambda/n)^{n-i}$.
In the limit when n→∞ and p→0 such that λ = np stays fixed, we get:
$P(X = i) = \frac{e^{-\lambda} \lambda^i}{i!}, \quad i = 0, 1, 2, \ldots$
This is called the Poisson pmf, and the above argument shows that it is a limiting case of the binomial pmf.
The Poisson distribution is used to model the number of events occurring in a fixed interval of time (or space), when the events occur independently at a constant average rate λ.
For a Poisson random variable, note that the mean and the variance are both equal to λ (derived on the following slides).
Notice how the binomial increasingly resembles the Poisson as n grows. Also notice that np (which is approximately the mode of the binomial) stays constant despite the increase in n.
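The convergence can also be checked numerically. Below is a Python sketch (illustrative; λ = 5 is an arbitrary choice) that compares the Binomial(n, λ/n) pmf against the Poisson(λ) pmf over i = 0..n as n grows, building both pmfs with their recurrence relations to avoid overflow:

from math import exp

lam = 5.0
for n in [10, 50, 500]:
    p = lam / n
    pois = exp(-lam)          # P(Poisson = 0)
    binom = (1 - p) ** n      # P(Binomial = 0)
    max_diff = abs(binom - pois)
    for i in range(1, n + 1):
        pois *= lam / i                          # Poisson recurrence
        binom *= (n - i + 1) / i * p / (1 - p)   # binomial recurrence
        max_diff = max(max_diff, abs(binom - pois))
    print(f"n={n:4d}: max |binomial - Poisson| = {max_diff:.5f}")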
To double check that it is indeed a valid pmf, we perform the following analysis:
$\sum_{i=0}^{\infty} \frac{e^{-\lambda} \lambda^i}{i!} = e^{-\lambda} \sum_{i=0}^{\infty} \frac{\lambda^i}{i!} = e^{-\lambda} e^{\lambda} = 1$
Using Taylor's series for the exponential function about 0. The afore-mentioned analysis tells us that the Poisson pmf indeed sums to 1.
A similar analysis tells us that the mean of a Poisson random variable is λ:
$E[X] = \sum_{i=0}^{\infty} i\, \frac{e^{-\lambda} \lambda^i}{i!} = \lambda e^{-\lambda} \sum_{i=1}^{\infty} \frac{\lambda^{i-1}}{(i-1)!} = \lambda e^{-\lambda} \sum_{j=0}^{\infty} \frac{\lambda^j}{j!} = \lambda e^{-\lambda} e^{\lambda} = \lambda$
(substituting j = i - 1 in the second-to-last step).
Variance:
$Var(X) = E[X^2] - (E[X])^2 = \sum_{i=0}^{\infty} i^2\, \frac{e^{-\lambda} \lambda^i}{i!} - \lambda^2 = \lambda$
Detailed proof on the board.
MGF:
$\phi_X(t) = E[e^{tX}] = \sum_{i=0}^{\infty} e^{ti}\, \frac{e^{-\lambda} \lambda^i}{i!} = e^{-\lambda} \sum_{i=0}^{\infty} \frac{(\lambda e^t)^i}{i!} = e^{-\lambda} e^{\lambda e^t} = e^{\lambda(e^t - 1)}$
Mode: k is a mode if P(X = k) ≥ P(X = k-1) and P(X = k) ≥ P(X = k+1). Thus we seek an integer k that satisfies both of these, which gives λ - 1 ≤ k ≤ λ.
Note that if λ is an integer, then both λ and λ - 1 are modes.
[Figure: Poisson pmfs for λ = 1, 5 and 10.]
Notice: the mean and variance both increase with lambda.
Consider independent Poisson random variables $X_1 \sim \text{Poisson}(\lambda_1)$ and $X_2 \sim \text{Poisson}(\lambda_2)$. Then their sum $X_1 + X_2 \sim \text{Poisson}(\lambda_1 + \lambda_2)$.
Detailed proof on the board and in tutorial 1.
PMF – recurrence relation: since $P(X = i) = \frac{e^{-\lambda} \lambda^i}{i!}$ and $P(X = i-1) = \frac{e^{-\lambda} \lambda^{i-1}}{(i-1)!}$, we get:
$P(X = i) = \frac{\lambda}{i}\, P(X = i-1), \quad i \ge 1, \qquad P(X = 0) = e^{-\lambda}$
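Here is a small Python sketch (illustrative; λ1 = 2 and λ2 = 3 are arbitrary) that builds Poisson pmfs with the recurrence above and numerically checks the additivity property by convolving them:

from math import exp

def poisson_pmf(lam, imax):
    """pmf values P(X=0..imax) built with the recurrence P(i) = (lam/i) P(i-1)."""
    pmf = [exp(-lam)]
    for i in range(1, imax + 1):
        pmf.append(pmf[-1] * lam / i)
    return pmf

imax = 40
a, b, c = poisson_pmf(2.0, imax), poisson_pmf(3.0, imax), poisson_pmf(5.0, imax)
# Convolution of the pmfs of the two independent summands.
conv = [sum(a[j] * b[i - j] for j in range(i + 1)) for i in range(imax + 1)]
print(max(abs(conv[i] - c[i]) for i in range(imax + 1)))  # ~1e-16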
If X ~ Poisson(λ) and the conditional distribution of Y given X = l is Binomial(l, p), then Y ~ Poisson(λp).
The number of misprints in a book (assuming a very large number of characters, each misprinted independently with a small probability).
Number of traffic rule violations in a typical city in the course of a day.
In general, the Poisson distribution is used to model counts of events that occur rarely, independently, and at a constant average rate.
Number of people in a country who live up to 100 years of age.
Number of wrong numbers dialed in a day.
Number of laptops that fail on the first day of use.
Number of photons of light counted by a detector over a fixed time interval.
Consider a sequence of n independent trials, each of which can result in any one of k possible outcomes (categories).
Assume that the probability of each of the k outcomes is the same across trials: $p_1, p_2, \ldots, p_k$, with $\sum_{i=1}^{k} p_i = 1$.
Let X be a k-dimensional random variable for which the ith component $X_i$ counts the number of trials that resulted in outcome i.
E.g.: in 20 throws of a die, you record how many times each face came up – say 2 ones, ..., and 2 sixes.
Then the pmf of X is given as follows:
$P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k) = \frac{n!}{x_1!\, x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}, \quad \sum_{i=1}^{k} x_i = n$
This is called the multinomial pmf.
The coefficient $\frac{n!}{x_1!\, x_2! \cdots x_k!}$ is the number of ways to arrange n objects which can be divided into k groups of identical objects: there are $x_1$ objects of type 1, $x_2$ objects of type 2, ..., and $x_k$ objects of type k.
The success probabilities for each category, i.e. $p_1, \ldots, p_k$, must sum to 1.
Remember: the multinomial random variable is a vector of k counts (and the counts are not independent – they sum to n).
Mean vector: $E[X] = (np_1, np_2, \ldots, np_k)$.
Variance of a component:
$Var(X_i) = Var\left(\sum_{j=1}^{n} X_{ij}\right) = \sum_{j=1}^{n} Var(X_{ij}) = np_i(1-p_i)$
$X_{ij}$ is a Bernoulli random variable which tells you whether or not there was a success in the ith category on the jth trial. The second equality assumes independent trials.
For vector-valued random variables, the variance is replaced by a k × k covariance matrix C with entries:
$C(i,j) = Cov(X_i, X_j) = E[(X_i - \mu_i)(X_j - \mu_j)]$
For the multinomial: $Cov(X_i, X_j) = -np_i p_j$ for $i \ne j$, and $Cov(X_i, X_i) = Var(X_i) = np_i(1-p_i)$. Proof: next page.
Proof that $Cov(X_i, X_j) = -np_i p_j$ for $i \ne j$:
Write $X_i = \sum_{k=1}^{n} X_{ik}$ (# successes in category i) and $X_j = \sum_{l=1}^{n} X_{jl}$ (# successes in category j), where the $X_{ik}, X_{jl}$ are Bernoulli random variables – each representing the outcome of one trial.
By linearity (bilinearity) of covariance:
$Cov(X_i, X_j) = \sum_{k=1}^{n} \sum_{l=1}^{n} Cov(X_{ik}, X_{jl})$
By independence of trials, $Cov(X_{ik}, X_{jl}) = 0$ for $k \ne l$.
Since in a single trial l, success can be achieved in only one category, $X_{il} X_{jl} = 0$, and hence:
$Cov(X_{il}, X_{jl}) = E[X_{il} X_{jl}] - E[X_{il}] E[X_{jl}] = 0 - p_i p_j = -p_i p_j$
Summing over the n trials gives $Cov(X_i, X_j) = -np_i p_j$.
For k = 2, the multinomial reduces to the binomial.
Let us derive the MGF for k = 3 (trinomial). With $x_3 = n - x_1 - x_2$ and $p_3 = 1 - p_1 - p_2$:
$P(X_1 = x_1, X_2 = x_2) = \frac{n!}{x_1!\, x_2!\, (n - x_1 - x_2)!}\, p_1^{x_1} p_2^{x_2} (1 - p_1 - p_2)^{n - x_1 - x_2}$
$\phi_X(t_1, t_2) = E[e^{t_1 X_1 + t_2 X_2}] = \sum_{x_1, x_2} \frac{n!}{x_1!\, x_2!\, (n - x_1 - x_2)!}\, (p_1 e^{t_1})^{x_1} (p_2 e^{t_2})^{x_2} (1 - p_1 - p_2)^{n - x_1 - x_2} = (p_1 e^{t_1} + p_2 e^{t_2} + 1 - p_1 - p_2)^n$
This follows from the multinomial theorem.
Multinomial theorem:
$(x_1 + x_2 + \cdots + x_m)^n = \sum_{k_1 + k_2 + \cdots + k_m = n} \frac{n!}{k_1!\, k_2! \cdots k_m!}\, x_1^{k_1} x_2^{k_2} \cdots x_m^{k_m}$
For arbitrary k, the multinomial MGF is:
$\phi_X(t_1, \ldots, t_{k-1}) = (p_1 e^{t_1} + \cdots + p_{k-1} e^{t_{k-1}} + p_k)^n$
Suppose there are k objects, each of a different type. When you sample 2 objects from these with replacement, you pick a particular object with probability 1/k, and you place it back (replace it). The probability of picking an object of another type is again 1/k.
When you sample without replacement, the probability that your first object was of so-and-so type is 1/k. The probability that your second object was of so-and-so type is now 1/(k-1), because you didn't put the first object back!
Consider a set of objects of which N are of good quality and M are of inferior quality.
Suppose you pick some n objects out of these N + M objects, without replacement.
There are C(N+M, n) ways of doing this.
Let X be a random variable denoting the number of good-quality objects among the n picked.
There are C(N,i)C(M,n-i) ways to pick i good objects and n-i inferior ones.
So we have:
$P(X = i) = \frac{\binom{N}{i}\binom{M}{n-i}}{\binom{N+M}{n}}, \quad 0 \le i \le n$, with the convention $\binom{a}{b} = 0$ if $b > a$.
Such a random variable X is called a hypergeometric random variable.
Consider the random variable $X_i$ which has value 1 if the ith object picked is of good quality, and 0 otherwise.
Now consider the following probabilities:
$P(X_1 = 1) = \frac{N}{N+M}$
$P(X_2 = 1) = P(X_2 = 1 \mid X_1 = 1)P(X_1 = 1) + P(X_2 = 1 \mid X_1 = 0)P(X_1 = 0) = \frac{N-1}{N+M-1}\cdot\frac{N}{N+M} + \frac{N}{N+M-1}\cdot\frac{M}{N+M} = \frac{N}{N+M}$
In general, $P(X_i = 1) = \frac{N}{N+M}$.
Note that:
$X = \sum_{i=1}^{n} X_i$
$E[X] = \sum_{i=1}^{n} E[X_i] = \frac{nN}{N+M}$
$Var(X) = \sum_{i=1}^{n} Var(X_i) + 2\sum_{i=1}^{n}\sum_{j>i} Cov(X_i, X_j)$
$Var(X_i) = P(X_i = 1)(1 - P(X_i = 1)) = \frac{NM}{(N+M)^2}$
Each Xi is a Bernoulli random variable with parameter p=N/(N+M).
For the covariance term:
$Cov(X_i, X_j) = E[X_i X_j] - E[X_i] E[X_j]$
$E[X_i X_j] = P(X_i = 1, X_j = 1) = P(X_j = 1 \mid X_i = 1) P(X_i = 1) = \frac{N-1}{N+M-1}\cdot\frac{N}{N+M}$
Hence:
$Cov(X_i, X_j) = \frac{N(N-1)}{(N+M)(N+M-1)} - \left(\frac{N}{N+M}\right)^2 = \frac{-NM}{(N+M)^2 (N+M-1)}$
Putting it together:
$Var(X) = \frac{nNM}{(N+M)^2} - 2\binom{n}{2}\frac{NM}{(N+M)^2(N+M-1)} = \frac{nNM}{(N+M)^2}\left(1 - \frac{n-1}{N+M-1}\right)$
$= np(1-p)\left(1 - \frac{n-1}{N+M-1}\right) \approx np(1-p)$ when N and/or M is/are very large.
Recall: Each Xi is a Bernoulli random variable with parameter p=N/(N+M).
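A short Python sketch (illustrative; p = 0.4 and n = 10 are arbitrary choices) showing the variance formula above approach the binomial variance np(1-p) as the population size N+M grows:

n = 10
p = 0.4
for total in [20, 100, 1000, 100000]:
    N = int(p * total)          # good objects
    M = total - N               # inferior objects
    hyper_var = n * p * (1 - p) * (1 - (n - 1) / (N + M - 1))
    print(f"N+M={total:6d}: hypergeometric Var={hyper_var:.4f}, "
          f"binomial np(1-p)={n * p * (1 - p):.4f}")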
A continuous random variable is said to be normally (Gaussian) distributed with mean μ and variance σ² if its pdf has the following form:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, denoted as $N(\mu, \sigma^2)$
This pdf is symmetric about the mean μ and has a bell shape whose spread is controlled by σ.
[Figure: https://upload.wikimedia.org/wikipedia/commons/7/74/Normal_Distribution_PDF.svg]
If μ=0 and σ=1, it is called the standard normal distribution:
$f(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)$, denoted as $N(0, 1)$
To verify that this is a valid pdf, consider $I = \int_{-\infty}^{\infty} e^{-x^2}\,dx$. Then:
$I^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)}\,dx\,dy = \int_{0}^{2\pi}\int_{0}^{\infty} e^{-r^2}\, r\,dr\,d\theta = 2\pi \int_{0}^{\infty} \frac{1}{2} e^{-s}\,ds = \pi$
Note the change from (x, y) to polar coordinates (r, θ): x = r cos(θ), y = r sin(θ), followed by the substitution s = r².
Hence $\int_{-\infty}^{\infty} \frac{1}{\sqrt{\pi}} e^{-x^2}\,dx = 1$. This integrand is a Gaussian pdf with mean 0 and standard deviation 1/√2. Thus we have verified that this particular Gaussian function is a valid pdf. You can verify that Gaussians with arbitrary mean and variance are valid pdfs by a change of variables.
Mean:
$E[X] = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} x\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} (y+\mu)\, e^{-y^2/(2\sigma^2)}\,dy = \mu$ (why?)
(Substitute y = x - μ; the term involving $y\,e^{-y^2/(2\sigma^2)}$ integrates to 0 because the integrand is odd.)
Variance:
$E[(X-\mu)^2] = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} (x-\mu)^2\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} y^2\, e^{-y^2/(2\sigma^2)}\,dy$ (why?)
$= \sigma^2$
Proof on board. And in the book.
Median = mean (why? Because of the symmetry of the pdf about the mean).
Mode = mean – can be checked by setting the derivative of the pdf to zero.
CDF for a 0-mean Gaussian with variance 1 – it is given by:
$F_X(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt$
CDF – it is given by:
$F_X(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt$
It is closely related to the error function erf(x):
$\text{erf}(x) = \frac{2}{\sqrt{\pi}}\int_{0}^{x} e^{-t^2}\,dt$
It follows that:
$F_X(x) = \frac{1}{2} + \frac{1}{2}\,\text{erf}\left(\frac{x}{\sqrt{2}}\right)$
Verify for yourself.
For a Gaussian with mean μ and standard deviation σ, the CDF is:
$F_X(x) = \frac{1}{2}\left[1 + \text{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$
The probability that a Gaussian random variable lies within n standard deviations of its mean is:
$P(\mu - n\sigma \le X \le \mu + n\sigma) = \Phi(n) - \Phi(-n) = \text{erf}\left(\frac{n}{\sqrt{2}}\right)$
The probability that a Gaussian random variable lies within n standard deviations of its mean is $\Phi(n) - \Phi(-n) = \text{erf}(n/\sqrt{2})$:

n | Φ(n) - Φ(-n)
1 | 68.2%
2 | 95.4%
3 | 99.7%
4 | 99.99366%
5 | 99.9999%
6 | 99.9999998%

Hence a Gaussian random variable lies within ±3σ of its mean with more than 99% probability.
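The table can be regenerated in a couple of lines of Python using the erf relation above (a sketch, not from the slides):

from math import erf, sqrt

# P(mu - n*sigma <= X <= mu + n*sigma) = erf(n / sqrt(2))
for n in range(1, 7):
    print(f"within +/-{n} sigma: {100 * erf(n / sqrt(2)):.7f}%")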
MGF:
$\phi_X(t) = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)$
Proof here.
Let's say you draw n = 2 values, called x1 and x2, from the uniform distribution on [0,1] (whose mean is μ = 0.5).
You repeat this process some m = 5000 times (say), each time computing
$y_j = \frac{\sum_{i=1}^{n} x_{ij} - n\mu}{\sqrt{n}}$
(sampling index i, 1 ≤ i ≤ n; iteration index j, 1 ≤ j ≤ m), and then plot a histogram of the y values.
Now suppose you repeat the earlier two steps with larger and larger n.
It turns out that as n grows larger and larger, the histogram starts resembling a 0-mean Gaussian distribution with variance equal to that of the sampling distribution (i.e. the [0,1] uniform distribution).
Now if you repeat the experiment with samples drawn from any other distribution instead of the [0,1] uniform (i.e. you change the sampling distribution), the phenomenon still occurs, though the resemblance may start showing up at smaller or larger values of n.
This leads us to a very interesting theorem called the central limit theorem.
Demo code: http://www.cse.iitb.ac.in/~ajitvr/CS215_Fall2017/CLT/
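A rough Python stand-in for the linked demo (assuming the [0,1] uniform sampling distribution and m = 5000 repetitions as described above): the spread of y should stabilize near the sampling distribution's σ = sqrt(1/12) ≈ 0.2887 for every n, while the histogram's shape becomes Gaussian:

import random
from statistics import mean, stdev

mu = 0.5     # mean of the uniform [0,1] sampling distribution
m = 5000
for n in [2, 10, 100]:
    # One y value per iteration: centered, scaled sum of n uniform draws.
    ys = [(sum(random.random() for _ in range(n)) - n * mu) / n**0.5
          for _ in range(m)]
    print(f"n={n:3d}: mean(y)={mean(ys):+.4f}, std(y)={stdev(ys):.4f}")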
Consider X1, X2, …, Xn to be a sequence of independent and identically distributed random variables, each with mean μ and variance σ². Define:
$Y_n = \frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}}$
Then, as n → ∞, Yn converges in distribution to a standard normal random variable.
Note that the random variables X1, X2, …, Xn must be independent, and must have finite mean and variance.
Converging in distribution means the following:
$\lim_{n \to \infty} P(Y_n \le z) = \Phi(z)$ for every z, where Φ is the standard normal CDF.
There is a version of the central limit theorem that does not require the variables to be identically distributed (next slide).
Consider X1, X2, …, Xn to be a sequence of independent (but not necessarily identically distributed) random variables, with $E[X_i] = \mu_i$ and $Var(X_i) = \sigma_i^2$. Define:
$s_n^2 = \sum_{i=1}^{n} \sigma_i^2, \qquad Y_n = \frac{\sum_{i=1}^{n} (X_i - \mu_i)}{s_n}$
If for every ε > 0 the Lindeberg condition holds:
$\lim_{n \to \infty} \frac{1}{s_n^2} \sum_{i=1}^{n} E\left[(X_i - \mu_i)^2\, \mathbf{1}_{\{|X_i - \mu_i| > \varepsilon s_n\}}\right] = 0$
then Yn converges in distribution to a standard normal random variable.
Here $\mathbf{1}_A(x)$ is the indicator function: it equals 1 if $x \in A$ and 0 otherwise.
Informally speaking, the take-home message from the central limit theorem is that the sum of a large number of independent random variables is approximately Gaussian distributed.
This provides a major motivation for the widespread use of the Gaussian distribution in modeling.
The errors in experimental observations are often the combined effect of many small independent perturbations, and are hence commonly modeled as Gaussian.
The law of large numbers says that
the empirical mean calculated from a large number of samples is equal (or very close) to the true mean μ (of the distribution from which the samples were drawn).
The central limit theorem says that
the empirical mean calculated from a large number of samples is a random variable drawn from a Gaussian distribution with mean equal to the true mean μ (of the distribution from which the samples were drawn).
Empirical mean can take any of these values!
Is this a contradiction?
The answer is NO! Go and look back at the central limit theorem.
$Y_n = \frac{\sum_{i=1}^{n} x_i - n\mu}{\sigma\sqrt{n}} \sim N(0, 1)$
implies
$\frac{1}{n}\sum_{i=1}^{n} x_i \sim N(\mu, \sigma^2/n)$ (why?)
This variance drops to 0 when n is very large! All the probability is now concentrated at the mean!
Consider the n i.i.d. random variables X1, X2, …, Xn; for simplicity assume each has mean 0 and variance 1 (this can always be arranged by standardizing). Let $S_n = \sum_{i=1}^{n} X_i$ and $Z_n = S_n/\sqrt{n}$.
Then we have to prove that Zn converges in distribution to N(0, 1).
For that we will prove that the MGF of Zn equals $e^{t^2/2}$ – the MGF of the standard normal – in the limit n → ∞.
By properties of the MGF, we have:
$\phi_{S_n}(t) = (\phi_X(t))^n, \qquad \phi_{Z_n}(t) = \phi_{S_n}(t/\sqrt{n}) = (\phi_X(t/\sqrt{n}))^n$
We need to prove that:
$\lim_{n \to \infty} n \log \phi_X(t/\sqrt{n}) = t^2/2$
Recall: for a Gaussian r.v. with mean μ and std. dev. σ, $\phi(t) = \exp(\mu t + \sigma^2 t^2/2)$; for the standard normal this is $e^{t^2/2}$.
(Also recall that if $Y = aX + b$, then $\phi_Y(t) = e^{tb}\,\phi_X(at)$.)
Labelling $x = 1/\sqrt{n}$, we have:
$\lim_{n \to \infty} n \log \phi_X(t/\sqrt{n}) = \lim_{x \to 0} \frac{\log \phi_X(tx)}{x^2} = \lim_{x \to 0} \frac{t\,\phi_X'(tx)}{2x\,\phi_X(tx)} = \lim_{x \to 0} \frac{t\,\phi_X'(tx)}{2x} = \lim_{x \to 0} \frac{t^2\,\phi_X''(tx)}{2} = \frac{t^2 E[X^2]}{2} = \frac{t^2}{2}$
L'Hospital's rule (applied twice; we also use $\phi_X(tx) \to 1$).
Recall: $\phi_X^{(r)}(0) = E[X^r]$, $\phi_X(0) = 1$, and here $E[X] = 0$, $E[X^2] = 1$.
Your friend tells you that in 10000 successive independent unbiased coin tosses, he counted 5200 heads. Is (s)he serious or joking?
Answer: Let X1, X2, …, Xn be the random variables representing the tosses (1 for heads, 0 for tails).
These are i.i.d. random variables whose sum is approximately a Gaussian random variable (by the central limit theorem) with mean n/2 = 5000 and standard deviation $\sqrt{n \cdot 0.5 \cdot 0.5} = 50$.
Your friend tells you that in 10000 successive independent unbiased coin tosses, he counted 5200 heads. Is (s)he serious or joking?
Answer: The given number of heads is 5200 which is 4 standard deviations away from the mean.
The chance of that occurring is of the order of 0.00001 (see the slide on error functions) since the total number of heads is a Gaussian random variable (as per central limit theorem).
So your friend is (most likely) joking.
Notice that this answer is much more principled than giving an answer purely based on some arbitrary threshold over |X-5000|.
You will study much more of this when you do a topic called hypothesis testing.
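A sketch of the computation in Python (not from the slides): treat the number of heads as approximately N(5000, 50²) and compute the two-sided tail beyond 5200:

from math import erf, sqrt

mu, sigma, observed = 5000, 50, 5200
z = (observed - mu) / sigma
# P(|heads - mu| >= z*sigma) = 1 - erf(z / sqrt(2)) under the Gaussian model.
two_sided_tail = 1 - erf(z / sqrt(2))
print(f"z = {z}, P(|heads - 5000| >= 200) ~ {two_sided_tail:.2e}")  # ~6e-5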
The binomial distribution begins to resemble a Gaussian as n becomes large.
In fact this resemblance begins to show up for moderate n, and sooner when p is close to 0.5.
Recall that a binomial random variable is the sum of n i.i.d. Bernoulli random variables:
$X = \sum_{i=1}^{n} X_i$, where $X_i = 1$ if the ith trial yields heads, 0 else.
Each Xi has a mean of p and standard deviation $\sqrt{p(1-p)}$.
Hence the following random variable is approximately a standard normal for large n:
$\frac{X - np}{\sqrt{np(1-p)}}$
Watch the animation here.
Another way of stating the afore-mentioned facts:
$\lim_{n \to \infty} P\left(a \le \frac{X - np}{\sqrt{np(1-p)}} \le b\right) = \Phi(b) - \Phi(a)$
This is called the de Moivre-Laplace theorem.
Consider independent and identically distributed random variables X1, X2, …, Xn, each with (true) mean μ and variance σ².
We know that the sample mean (or empirical mean) is defined as:
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Note yet again: The true mean μ is NOT a random variable. The sample mean is, and its value converges to the true mean μ by the law of large numbers.
Now we have:
$E[\bar{X}] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \mu, \qquad Var(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n} Var(X_i) = \frac{\sigma^2}{n}$
If X1, X2, …, Xn were normal random variables, then it can be proved that $\bar{X}$ is also a normal random variable (how?).
Otherwise, if X1, X2, …, Xn weren't normal random variables, $\bar{X}$ would be approximately normally distributed, as per the central limit theorem.
The sample variance is given by:
$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n-1}$
The sample standard deviation is S.
The expected value of the sample variance is derived as follows:
$(n-1)S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - n\bar{X}^2$
$E\left[\sum_{i=1}^{n} X_i^2\right] = n(\sigma^2 + \mu^2)$, since $E[X_i^2] = Var(X_i) + (E[X_i])^2$.
$E[\bar{X}^2] = Var(\bar{X}) + (E[\bar{X}])^2 = \frac{\sigma^2}{n} + \mu^2$
Hence:
$(n-1)E[S^2] = n(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right) = (n-1)\sigma^2 \;\Rightarrow\; E[S^2] = \sigma^2$
Had we instead defined the sample variance as $\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$, its expected value would be $\frac{n-1}{n}\sigma^2$.
This is undesirable – as we would like the expected value of the estimate to equal the true variance σ². Hence the estimate above is multiplied by n/(n-1) to correct for this anomaly, giving rise to our strange definition of sample variance. This multiplication by n/(n-1) is called Bessel's correction.
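A quick Python sketch (illustrative; uniform [0,1] samples with true variance 1/12 ≈ 0.0833, and n = 5) showing the bias of the 1/n estimator and its removal by Bessel's correction:

import random

n, trials = 5, 200_000
biased_sum = unbiased_sum = 0.0
for _ in range(trials):
    xs = [random.random() for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    biased_sum += ss / n          # divide by n: biased
    unbiased_sum += ss / (n - 1)  # divide by n-1: Bessel-corrected

print(f"E[(1/n) sum (Xi - Xbar)^2]     ~ {biased_sum / trials:.5f}")   # ~ (n-1)/n * 1/12
print(f"E[(1/(n-1)) sum (Xi - Xbar)^2] ~ {unbiased_sum / trials:.5f}") # ~ 1/12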
But the mean and the variance alone do not characterize the full distribution of the sample variance.
So what about the distribution of the sample variance?
For that we need to study another distribution first: the chi-square distribution.
If Z1, Z2, …, Zn are independent standard normal random variables, then $X = Z_1^2 + Z_2^2 + \cdots + Z_n^2$ is called a chi-square random variable with n degrees of freedom.
The formula for its pdf is as follows:
$f_X(x) = \frac{e^{-x/2}\, x^{n/2 - 1}}{2^{n/2}\, \Gamma(n/2)}, \quad x \ge 0$
where $\Gamma(y) = \int_0^{\infty} x^{y-1} e^{-x}\,dx$ (real y > 0), and $\Gamma(y) = (y-1)!$ for integer y.
To obtain the expression for the chi-square pdf in the case n = 1, let $Z \sim N(0,1)$ and $X = Z^2$. Then for x ≥ 0:
$F_X(x) = P(Z^2 \le x) = P(-\sqrt{x} \le Z \le \sqrt{x}) = F_Z(\sqrt{x}) - F_Z(-\sqrt{x})$
Differentiating:
$f_X(x) = \frac{1}{2\sqrt{x}}\left(f_Z(\sqrt{x}) + f_Z(-\sqrt{x})\right) = \frac{1}{\sqrt{2\pi x}}\, e^{-x/2}$
MGF of a chi-square distribution with n degrees of freedom:
$\phi_X(t) = (1 - 2t)^{-n/2}$
Please note that the aforementioned MGF is defined only for t < 1/2.
Proof on the board. And here.
If X1 and X2 are independent chi-square random variables with n1 and n2 degrees of freedom respectively, then X1 + X2 is a chi-square random variable with n1 + n2 degrees of freedom.
It is easy to prove this property by observing that the MGF of the sum is the product of the individual MGFs: $(1-2t)^{-n_1/2}(1-2t)^{-n_2/2} = (1-2t)^{-(n_1+n_2)/2}$.
Tables for the chi-square distribution are available in statistics textbooks and online.
$\sum_{i=1}^{n} \frac{(X_i - \mu)^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2} + \frac{n(\bar{X} - \mu)^2}{\sigma^2} = \frac{(n-1)S^2}{\sigma^2} + \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2$
The left-hand side is the sum of squares of n standard normal random variables; the last term is the square of a standard normal random variable.
It turns out that $(n-1)S^2/\sigma^2$ and $\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2$ are independent random variables. The proof of this requires multivariate statistics and transformation of random variables, and is deferred to a later point in the course. If you are curious, you can browse this link, but it's not on the exam for now. Given this fact about independence, it then follows that the middle term is a chi-square distribution with n-1 degrees of freedom.
A uniform random variable over the interval [a, b] has the pdf:
$f(x) = \frac{1}{b-a}$ for $a \le x \le b$, and 0 otherwise.
Clearly, this is a valid pdf – it is non-negative and integrates to 1.
It is easy to show that its mean and median are both (a+b)/2:
$E[X] = \int_a^b \frac{x}{b-a}\,dx = \frac{x^2}{2(b-a)}\Big|_a^b = \frac{b^2 - a^2}{2(b-a)} = \frac{a+b}{2}$
Variance:
$E[X^2] = \int_a^b \frac{x^2}{b-a}\,dx = \frac{x^3}{3(b-a)}\Big|_a^b = \frac{b^3 - a^3}{3(b-a)} = \frac{a^2 + ab + b^2}{3}$
$Var(X) = E[X^2] - (E[X])^2 = \frac{(b-a)^2}{12}$
MGF:
$E[e^{tX}] = \int_a^b \frac{e^{tx}}{b-a}\,dx = \frac{e^{tb} - e^{ta}}{t(b-a)}$ for $t \ne 0$, and 1 for $t = 0$.
Uniform random variables, especially over the [0,1] interval, are the basic building block for generating samples from other distributions.
You will study more of this later on in the semester. For now, we will study two applications. How do you sample from a discrete distribution given by
$P(X = x_i) = p_i, \quad 1 \le i \le n, \quad \sum_{i=1}^{n} p_i = 1$?
To sample from the discrete distribution above:
Draw $u \sim \text{Uniform}(0, 1)$.
If $u < p_1$, the sampled value is $x_1$.
If $p_1 \le u < p_1 + p_2$, the sampled value is $x_2$.
...
If $p_1 + p_2 + \cdots + p_{n-1} \le u < 1$, the sampled value is $x_n$.
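A minimal Python sketch of this scheme (the values, probabilities, and the helper name sample_discrete are arbitrary illustrative choices):

import random

def sample_discrete(values, probs):
    """Return values[i] with probability probs[i] (probs must sum to 1)."""
    u = random.random()
    cumulative = 0.0
    for v, p in zip(values, probs):
        cumulative += p
        if u < cumulative:
            return v
    return values[-1]   # guard against floating-point round-off

values, probs = ['a', 'b', 'c'], [0.2, 0.5, 0.3]
counts = {v: 0 for v in values}
for _ in range(100_000):
    counts[sample_discrete(values, probs)] += 1
print(counts)   # roughly 20000 / 50000 / 30000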
Uniform random variables, especially over the [0,1] interval, are the basic building block for random sampling.
You will study more of this later on in the semester. For now, we will study how to generate a random subset of size k from a set of n elements (something like MATLAB's randperm).
In fact, we will do something more than randperm – we will generate a random subset of k distinct elements out of the n elements of an array A.
Let us define the following for each element j (1 ≤ j ≤ n):
$I_j = 1$ if element j is chosen for the subset, 0 otherwise.
Now we will sequentially pick each element of the subset.
$B_k$: notation for the chosen subset (of size k).
Notice that P(I1=1) = k/n. (Why? Because there are C(n,k) subsets in all, of which C(n-1,k-1) contain element 1, and C(n-1,k-1)/C(n,k) = k/n.)
If I1 is 1, then P(I2=1) = (k-1)/(n-1). (why?)
If I1 is 0, then P(I2=1) = k/(n-1). (why?)
Thus P(I2=1|I1) = (k-I1)/(n-1). (why?)
Side question: what is P(I2=1)?
Continuing this way, one can show that:
$P(I_{j+1} = 1 \mid I_1, I_2, \ldots, I_j) = \frac{k - (I_1 + I_2 + \cdots + I_j)}{n - j}$
This suggests the following procedure: for each successive element j+1, draw $u \sim \text{Uniform}(0,1)$ and set $I_{j+1} = 1$ if $u < \frac{k - (I_1 + \cdots + I_j)}{n - j}$, else set $I_{j+1} = 0$. (A code sketch follows.)
When does this process stop? It stops at step #j:
* If I1+I2+…+Ij = k, and the random subset Bk contains those indices whose I-values are 1, OR
* If the number of unfilled entries in the random subset Bk equals the number of remaining elements in A. In this case Bk = all remaining elements in A with index greater than i = largest index in Bk.
See figure 5.6 of the book.
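A Python sketch of the sequential procedure (illustrative; random_subset is a hypothetical helper name):

import random

def random_subset(n, k):
    """Random subset of k distinct elements from {1, ..., n}."""
    chosen = []
    for j in range(n):                           # j elements examined so far
        remaining_slots = k - len(chosen)
        remaining_elems = n - j
        if remaining_slots == remaining_elems:   # must take all the rest
            chosen.extend(range(j + 1, n + 1))
            break
        # Include element j+1 with probability (k - sum of I's) / (n - j).
        if random.random() < remaining_slots / remaining_elems:
            chosen.append(j + 1)                 # elements are numbered 1..n
        if len(chosen) == k:
            break
    return chosen

print(random_subset(10, 4))   # e.g. [2, 5, 6, 9]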
Consider a Poisson distribution with an average rate of λ successes per unit time.
So the expected number of successes in time u is λu; the count of successes in the interval (0, u) is Poisson distributed with parameter λu. This is actually called a Poisson process.
Now consider the time taken (T) for the first success to occur.
Let X ~ Poisson(λu) for time interval (0, u). T is a random variable whose distribution we are seeking.
The probability that the first success occurred after time u = probability that there was no success in the time interval (0, u), i.e. X = 0 in that interval:
$P(T > u) = P(X = 0) = e^{-\lambda u}$
$F_T(u) = P(T \le u) = 1 - e^{-\lambda u}, \qquad f_T(u) = \lambda e^{-\lambda u}, \quad u \ge 0$
This is called the exponential distribution with parameter λ.
MGF, defined for t < λ:
$\phi_T(t) = E[e^{tT}] = \int_0^{\infty} e^{tu}\, \lambda e^{-\lambda u}\,du = \frac{\lambda}{\lambda - t}$
Mean:
$E[T] = \frac{d\phi_T}{dt}\Big|_{t=0} = \frac{\lambda}{(\lambda - t)^2}\Big|_{t=0} = \frac{1}{\lambda}$
This is intuitive – a Poisson process with a large average rate should definitely lead to a lower expected waiting time.
Variance:
$Var(T) = E[T^2] - (E[T])^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}$
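Tying this back to uniform random variables: inverting the CDF $F(u) = 1 - e^{-\lambda u}$ gives $T = -\ln(U)/\lambda$ for U ~ Uniform(0,1). A Python sketch (illustrative; λ = 2 is arbitrary):

import random
from math import log
from statistics import mean, variance

lam = 2.0
# 1 - random.random() lies in (0, 1], so log() never sees 0.
ts = [-log(1.0 - random.random()) / lam for _ in range(100_000)]
print(f"mean {mean(ts):.4f} vs 1/lambda = {1/lam:.4f}")
print(f"variance {variance(ts):.4f} vs 1/lambda^2 = {1/lam**2:.4f}")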
Mode: always at 0.
Median: solve $F_T(u) = 1 - e^{-\lambda u} = 1/2$, giving $u = \frac{\ln 2}{\lambda}$.
A non-negative random variable T is said to be memoryless if:
$P(T > s + u \mid T > u) = P(T > s)$ for all $s, u \ge 0$
Meaning: this gives the probability that, having already waited u time units without a success, you wait at least s more – and it is the same as the probability of waiting at least s starting from scratch.
Another formula (equivalent to the earlier one):
$P(T > s + u) = P(T > s)\, P(T > u)$
You can easily verify that this holds for the exponential distribution:
$P(T > s + u) = e^{-\lambda(s+u)} = e^{-\lambda s}\, e^{-\lambda u} = P(T > s)\, P(T > u)$
Suppose that the number of miles a car can run before its battery wears out is exponentially distributed, and you want the probability that a car whose battery has already lasted u miles will complete a further trip of s miles without a battery change.
Solution: For the exponential distribution we know that it is memoryless, so the miles already run do not matter:
$P(T > u + s \mid T > u) = P(T > s) = e^{-\lambda s}$
Suppose instead that the number of miles a car can run before its battery wears out followed some other (non-exponential) distribution.
Solution: If the distribution were not exponential, the answer would depend on the miles u already run:
$P(T > u + s \mid T > u) = \frac{1 - F_T(u + s)}{1 - F_T(u)}$
Consider independent exponentially distributed random variables X1, X2, …, Xn, each with parameter λ. Their sum has the following pdf (called the Erlang, or gamma, pdf):
$f(x) = \frac{\lambda e^{-\lambda x} (\lambda x)^{n-1}}{(n-1)!}, \quad x \ge 0$