SLIDE 1
Random variables
DS-GA 1002 Statistical and Mathematical Models
http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall16
Carlos Fernandez-Granda

Motivation: Random variables model numerical quantities that are uncertain. They allow us to structure
SLIDE 2
SLIDE 3
Definition
Given a probability space (Ω, F, P), a random variable X is a function from the sample space Ω to the real numbers R.
We use uppercase letters to denote random variables: X, Y, ...
Once the outcome ω ∈ Ω is revealed, X(ω) is the realization of X.
We use lowercase letters to denote numerical values: x, y, ...
SLIDE 4
Characterization
Given a probability space (Ω, F, P), for any set S,
$P(X \in S) = P(\{\omega \,|\, X(\omega) \in S\})$
We will almost never construct probabilistic models like this!
SLIDE 5
◮ Discrete random variables
◮ Continuous random variables
◮ Conditioning on an event
◮ Functions of random variables
SLIDE 6
Discrete random variables
Discrete random variables take values in a finite or countably infinite subset of R, such as the integers.
The probability mass function (pmf) of X is defined as
$p_X(x) := P(\{\omega \,|\, X(\omega) = x\})$
In words, pX(x) is the probability that X equals x.
The pmf completely specifies a random variable.
SLIDE 7
Probability mass function
If D is the range of X, then $(D, 2^D, p_X)$ is a valid probability space.
Any pmf satisfies
$p_X(x) \ge 0$ for any $x \in D$,
$\sum_{x \in D} p_X(x) = 1$,
$P(X \in S) = \sum_{x \in S} p_X(x)$ for any $S \subseteq D$
SLIDE 8
Defining a discrete random variable
To define a discrete random variable X we just need
◮ A discrete range D
◮ A nonnegative function pX satisfying $\sum_{x \in D} p_X(x) = 1$
SLIDE 9
Bernoulli random variable
Experiment with two possible outcomes (coin flip with bias p):
$p_X(0) = 1 - p, \qquad p_X(1) = p$
Special case: indicator random variable of an event S,
$1_S(\omega) = \begin{cases} 1, & \text{if } \omega \in S,\\ 0, & \text{otherwise} \end{cases}$
It is Bernoulli with parameter P(S), and allows us to translate events to random variables.
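As a quick illustration (not part of the original slides), both a Bernoulli variable and the indicator of an event can be simulated directly. The die-roll sample space, the event S, and all function names below are my own illustrative choices; this is a minimal sketch rather than anything from the deck.

```python
import random

def bernoulli(p, rng):
    """Return 1 with probability p and 0 otherwise."""
    return 1 if rng.random() < p else 0

def indicator(omega, S):
    """Indicator random variable 1_S: 1 if the outcome omega lies in S, else 0."""
    return 1 if omega in S else 0

rng = random.Random(1)

# Sample space: a fair die roll, Omega = {1,...,6}; event S = even rolls.
# The indicator of S is then Bernoulli with parameter P(S) = 1/2.
S = {2, 4, 6}
samples = [indicator(rng.randint(1, 6), S) for _ in range(100_000)]
freq = sum(samples) / len(samples)        # empirical estimate of P(S)

# A plain Bernoulli with bias 0.3 for comparison.
b_freq = sum(bernoulli(0.3, rng) for _ in range(100_000)) / 100_000
```

The empirical frequency of ones should land close to P(S) = 1/2 for the indicator and close to p = 0.3 for the plain Bernoulli.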
SLIDE 10
Example: Coin flips
You flip a coin with bias p until you obtain heads (flips are independent).
If you model the number of flips as a random variable X, what is pX?
SLIDES 11–15
Example: Coin flips

$p_X(k) = P(k \text{ flips})$
$= P(\text{1st flip} = \text{tails}, \ldots, (k-1)\text{th flip} = \text{tails}, k\text{th flip} = \text{heads})$
$= P(\text{1st flip} = \text{tails}) \cdots P((k-1)\text{th flip} = \text{tails}) \, P(k\text{th flip} = \text{heads})$
$= (1-p)^{k-1} \, p$
SLIDE 16
Geometric random variable
The pmf of a geometric random variable with parameter p is
$p_X(k) = (1-p)^{k-1} \, p, \qquad k = 1, 2, \ldots$
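A short sketch (my own, not from the slides) checking the geometric pmf numerically against a simulated coin-flip experiment; the parameter value and names are illustrative.

```python
import random

def geometric_pmf(k, p):
    """pX(k) = (1-p)**(k-1) * p: k-1 tails followed by one head."""
    return (1 - p) ** (k - 1) * p

p = 0.2
# The pmf sums to 1 (geometric series); truncate far in the tail.
total = sum(geometric_pmf(k, p) for k in range(1, 500))

def flips_until_heads(p, rng):
    """Flip a coin with bias p until heads; return the number of flips."""
    k = 1
    while rng.random() >= p:
        k += 1
    return k

rng = random.Random(0)
counts = [flips_until_heads(p, rng) for _ in range(100_000)]
freq_3 = counts.count(3) / len(counts)    # empirical estimate of pX(3)
```

The empirical frequency of X = 3 should be close to pX(3) = (0.8)² · 0.2 = 0.128.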
SLIDES 17–19
Geometric random variable

[Plots: pmf pX(k) of geometric random variables with p = 0.2, 0.5 and 0.8, for k = 1 to 10]
SLIDE 20
Example: Coin flips
You flip a coin with bias p a total of n times (flips are independent).
If you model the number of heads as a random variable X, what is pX?
SLIDES 21–30
Example: Coin flips

What is the probability of getting k heads and then n − k tails?

$P(k \text{ heads, then } n-k \text{ tails})$
$= P(\text{1st} = \text{heads}, \ldots, k\text{th} = \text{heads}, (k+1)\text{th} = \text{tails}, \ldots, n\text{th} = \text{tails})$
$= P(\text{1st} = \text{heads}) \cdots P(k\text{th} = \text{heads}) \, P((k+1)\text{th} = \text{tails}) \cdots P(n\text{th} = \text{tails})$
$= p^k (1-p)^{n-k}$

Any fixed order of k heads and n − k tails has the same probability.
We are interested in the union of these events.
Can we just add their probabilities? Yes: the orderings are disjoint events.
How many possible orders are there?
$\binom{n}{k} := \frac{n!}{k!\,(n-k)!}$

$p_X(k) = \binom{n}{k} p^k (1-p)^{n-k}$
SLIDE 31
Binomial random variable
The pmf of a binomial random variable with parameters n and p is
$p_X(k) = \binom{n}{k} p^k (1-p)^{n-k}, \qquad k = 0, 1, 2, \ldots, n$
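A minimal sketch (mine, assuming Python's math.comb for the binomial coefficient) that mirrors the counting argument on the previous slides: C(n, k) equally likely orderings, each with probability p^k (1−p)^(n−k).

```python
from math import comb

def binomial_pmf(k, n, p):
    """C(n, k) equally likely orderings, each with probability p**k * (1-p)**(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
total = sum(pmf)                                 # the events k = 0,...,n partition the space
mode = max(range(n + 1), key=lambda k: pmf[k])   # most likely number of heads
```

For a fair coin and n = 20 the pmf is symmetric and peaks at k = 10, matching the plots that follow.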
SLIDES 32–34
Binomial random variable

[Plots: pmf pX(k) of binomial random variables with n = 20 and p = 0.2, 0.5 and 0.8]
SLIDE 35
Example: Call center
Model the number of calls received per day. Assumptions:
1. Each call occurs independently from every other call
2. A given call has the same probability of occurring at any given time of the day
3. Calls occur at a rate of λ calls per day
SLIDES 36–41
Example: Call center

◮ Discretize the day into n slots
◮ Probability of receiving m calls in one slot? $(\lambda/n)^m$
◮ If n is large enough, $\lambda/n \gg (\lambda/n)^m$ for all m > 1
◮ Assume that in each slot we either receive one call or none at all

What is the probability of k calls in a day? Binomial with parameters n and p := λ/n!
SLIDES 42–47
Example: Call center

$P(k \text{ calls during the day}) = \lim_{n \to \infty} P(k \text{ calls in } n \text{ small intervals})$
$= \lim_{n \to \infty} \binom{n}{k} p^k (1-p)^{n-k}$
$= \lim_{n \to \infty} \binom{n}{k} \left(\frac{\lambda}{n}\right)^{k} \left(1-\frac{\lambda}{n}\right)^{n-k}$
$= \lim_{n \to \infty} \frac{n! \, \lambda^k}{k! \, (n-k)! \, (n-\lambda)^k} \left(1-\frac{\lambda}{n}\right)^{n}$
$= \frac{\lambda^k e^{-\lambda}}{k!}$

Identity proved in the notes:
$\lim_{n \to \infty} \frac{n!}{(n-k)! \, (n-\lambda)^k} \left(1-\frac{\lambda}{n}\right)^{n} = e^{-\lambda}$
SLIDE 48
Poisson random variable
The pmf of a Poisson random variable with parameter λ is
$p_X(k) = \frac{\lambda^k e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \ldots$
SLIDES 49–51
Poisson random variable

[Plots: pmf pX(k) of Poisson random variables with λ = 10, 20 and 30]
SLIDE 52
Example: Call center
The pmf of a binomial with parameters n and p = λ/n converges to the pmf of a Poisson with parameter λ.
This is an example of convergence in distribution.
SLIDES 53–56
Binomial random variable, n = 40, 80 and 400 with p = 20/n

[Plots: the binomial pmf approaches the pmf of a Poisson random variable with λ = 20 as n grows]
SLIDE 57
Call-center data
◮ Assumptions do not hold over the whole day (why?)
◮ They do hold (approximately) for intervals of time
◮ Example: Data from a call center in Israel
◮ We compare the histogram of the number of calls received in an interval of 4 hours over 2 months and the pmf of a Poisson random variable fitted to the data
SLIDE 58
Call-center data
[Plot: histogram of the number of calls (real data) vs the pmf of the fitted Poisson distribution]
SLIDE 59
◮ Discrete random variables
◮ Continuous random variables
◮ Conditioning on an event
◮ Functions of random variables
SLIDE 60
Continuous random variables
Useful to model continuous quantities without discretizing.
Assigning nonzero probabilities to events of the form {X = x} for x ∈ R doesn't work!
Instead, we only consider events of the form {X ∈ S}, where S is a union of intervals (formally, a Borel set).
We cannot consider every possible subset of R for technical reasons.
SLIDE 61
Cumulative distribution function
The cumulative distribution function (cdf) of X is defined as
$F_X(x) := P(\{\omega \in \Omega \,|\, X(\omega) \le x\}) = P(X \le x)$
In words, FX(x) is the probability that X is at most x.
The cdf can be defined for both continuous and discrete random variables.
SLIDE 62
Cumulative distribution function
The cdf completely specifies the distribution of the random variable.
The probability of any interval (a, b] is given by
$P(a < X \le b) = P(X \le b) - P(X \le a) = F_X(b) - F_X(a)$
To define a continuous random variable we just need a valid cdf!
A valid underlying probability space exists, but we don't need to worry about it.
SLIDE 63
Properties of the cdf
$\lim_{x \to -\infty} F_X(x) = 0, \qquad \lim_{x \to \infty} F_X(x) = 1$
$F_X(b) \ge F_X(a)$ if $b > a$, i.e. $F_X$ is nondecreasing
SLIDE 64
Probability density function
When the cdf is differentiable, its derivative can be interpreted as a density.
Probability density function:
$f_X(x) := \frac{dF_X(x)}{dx}$
The pdf is not a probability measure! (It can be greater than 1)
SLIDE 65
Probability density function
By the fundamental theorem of calculus,
$P(a < X \le b) = F_X(b) - F_X(a) = \int_a^b f_X(x)\,dx$
Intuitively, for small $\Delta$, $P(X \in (x, x+\Delta)) \approx f_X(x)\,\Delta$
SLIDE 66
Properties of the pdf
For any union of intervals (any Borel set) S,
$P(X \in S) = \int_S f_X(x)\,dx$
In particular,
$\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
From the monotonicity of the cdf, $f_X(x) \ge 0$
SLIDE 67
Uniform random variable
Pdf of a uniform random variable with domain [a, b]:
$f_X(x) = \begin{cases} \frac{1}{b-a}, & \text{if } a \le x \le b,\\ 0, & \text{otherwise} \end{cases}$
SLIDE 68
Uniform random variable in [a, b]
[Plot: fX(x) equals 1/(b−a) on [a, b] and zero elsewhere]
SLIDE 69
Exponential random variable
Used to model waiting times (time until a certain event occurs).
Examples: decay of a radioactive particle, telephone call, mechanical failure of a device.
Pdf of an exponential random variable with parameter λ:
$f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{if } x \ge 0,\\ 0, & \text{otherwise} \end{cases}$
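A sketch (mine; the value of λ and the interval are arbitrary) showing how this pdf connects to probabilities of intervals: integrating the pdf over (a, b] matches FX(b) − FX(a) for the exponential cdf FX(x) = 1 − e^(−λx).

```python
from math import exp

lam = 1.5

def exp_pdf(x):
    """Exponential pdf with parameter lam; zero for x < 0."""
    return lam * exp(-lam * x) if x >= 0 else 0.0

def exp_cdf(x):
    """Exponential cdf: 1 - exp(-lam * x) for x >= 0."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0

# P(a < X <= b) two ways: via the cdf, and via a midpoint Riemann sum of the pdf.
a, b = 0.5, 2.0
via_cdf = exp_cdf(b) - exp_cdf(a)
n = 100_000
h = (b - a) / n
via_pdf = sum(exp_pdf(a + (i + 0.5) * h) for i in range(n)) * h
```

The two computations agree to high precision, illustrating that the pdf is just the derivative of the cdf.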
SLIDE 70
Exponential random variables
[Plot: exponential pdfs fX(x) for λ = 0.5, 1.0 and 1.5]
SLIDE 71
Call-center data
◮ Example: Data from a call center in Israel
◮ We compare the histogram of the inter-arrival times between calls occurring between 8 pm and midnight over two days and the pdf of an exponential random variable fitted to the data
SLIDE 72
Call center
[Plot: histogram of inter-arrival times in seconds (real data) vs the pdf of the fitted exponential distribution]
SLIDE 73
Gaussian or normal random variable
Extremely popular in probabilistic models and statistics.
Sums of independent random variables converge to Gaussian distributions under certain assumptions.
Pdf of a Gaussian random variable with mean µ and standard deviation σ:
$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
SLIDE 74
Gaussian random variables
[Plot: Gaussian pdfs fX(x) for (µ = 2, σ = 1), (µ = 0, σ = 2) and (µ = 0, σ = 4)]
SLIDE 75
Height data
◮ Example: Data from a population of 25,000 people
◮ We compare the histogram of the heights and the pdf of a Gaussian random variable fitted to the data
SLIDE 76
Height data
[Plot: histogram of heights in inches (real data) vs the pdf of the fitted Gaussian distribution]
SLIDE 77
Problem
◮ The Gaussian cdf does not have a closed-form expression
◮ This complicates computing the probability that a Gaussian belongs to a set
SLIDE 78
Standard Gaussian
If X is Gaussian with mean µ and standard deviation σ, then
$U := \frac{X - \mu}{\sigma}$
is a standard Gaussian, with mean zero and unit standard deviation.
$P(X \in [a, b]) = P\left(\frac{X-\mu}{\sigma} \in \left[\frac{a-\mu}{\sigma}, \frac{b-\mu}{\sigma}\right]\right) = \Phi\!\left(\frac{b-\mu}{\sigma}\right) - \Phi\!\left(\frac{a-\mu}{\sigma}\right)$
Φ is the cdf of a standard Gaussian, which is tabulated.
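Φ has no closed form but can be evaluated through the error function; the sketch below (my own, using Python's math.erf) computes P(X ∈ [a, b]) by standardizing, exactly as on the slide. Names and parameter values are illustrative.

```python
from math import erf, sqrt

def Phi(u):
    """Standard Gaussian cdf: Phi(u) = (1 + erf(u / sqrt(2))) / 2."""
    return 0.5 * (1.0 + erf(u / sqrt(2.0)))

def gaussian_prob(a, b, mu, sigma):
    """P(X in [a, b]) for X Gaussian with mean mu and standard deviation sigma."""
    return Phi((b - mu) / sigma) - Phi((a - mu) / sigma)

# About 68.3% of the mass lies within one standard deviation of the mean,
# regardless of mu and sigma: standardization removes both parameters.
p1 = gaussian_prob(-1, 1, mu=0, sigma=1)
p2 = gaussian_prob(1, 5, mu=3, sigma=2)
```

p1 and p2 coincide because both intervals are [µ − σ, µ + σ] after standardization.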
SLIDE 79
χ2 random variable
Very important in hypothesis testing.
If U1, U2, ..., Ud are d independent standard Gaussian random variables,
$X := \sum_{i=1}^{d} U_i^2$
is a χ2 random variable with d degrees of freedom.
SLIDE 80
χ2 random variable
The pdf of a χ2 random variable with d degrees of freedom is
$f_X(x) = \frac{x^{\frac{d}{2}-1} \exp\left(-\frac{x}{2}\right)}{2^{\frac{d}{2}} \, \Gamma\!\left(\frac{d}{2}\right)}$
if x > 0, and zero otherwise.
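This pdf can be coded directly with Python's math.gamma; the normalization check below is my own sketch with an arbitrary choice of d.

```python
from math import exp, gamma

def chi2_pdf(x, d):
    """Chi-squared pdf with d degrees of freedom; zero for x <= 0."""
    if x <= 0:
        return 0.0
    return x ** (d / 2 - 1) * exp(-x / 2) / (2 ** (d / 2) * gamma(d / 2))

# The pdf should integrate to 1; midpoint Riemann sum over a generous range,
# since the tail beyond x = 80 is negligible for small d.
d = 5
n, upper = 200_000, 80.0
h = upper / n
total = sum(chi2_pdf((i + 0.5) * h, d) for i in range(n)) * h
```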
SLIDE 81
χ2 random variables
[Plot: χ2 pdfs fX(x) for d = 1, 5 and 10]
SLIDE 82
Discrete random variables Continuous random variables Conditioning on an event Functions of random variables
SLIDE 83
Conditioning on an event
We usually define random variables using their pmf, cdf or pdf.
How can we incorporate the information that X ∈ S for some set S?
SLIDE 84
Conditional pmf
If X has pmf pX, the conditional pmf of X given X ∈ S is
$p_{X|X\in S}(x) := P(X = x \,|\, X \in S) = \begin{cases} \frac{p_X(x)}{\sum_{s \in S} p_X(s)}, & \text{if } x \in S,\\ 0, & \text{otherwise} \end{cases}$
It is a valid pmf in the new probability space restricted to the event {X ∈ S}.
SLIDE 85
Conditional cdf
If X has pdf fX, the conditional cdf of X given X ∈ S is
$F_{X|X\in S}(x) := P(X \le x \,|\, X \in S) = \frac{P(X \le x, X \in S)}{P(X \in S)} = \frac{\int_{u \le x,\, u \in S} f_X(u)\,du}{\int_{u \in S} f_X(u)\,du}$
It is a valid cdf in the new probability space restricted to the event {X ∈ S}.
SLIDES 86–91
Example: Geometric random variables are memoryless

We flip a coin repeatedly until we obtain heads, but pause after k0 flips (which were all tails).
What is the probability of obtaining heads in k more flips?

$p_{X|X>k_0}(k) = \frac{p_X(k)}{\sum_{m=k_0+1}^{\infty} p_X(m)} = \frac{(1-p)^{k-1}\,p}{\sum_{m=k_0+1}^{\infty} (1-p)^{m-1}\,p} = (1-p)^{k-k_0-1}\,p \quad \text{for } k > k_0$

Geometric series:
$\sum_{m=k_0+1}^{\infty} \alpha^{m-1} = \frac{\alpha^{k_0}}{1-\alpha}$ for any $|\alpha| < 1$

In words: conditioned on X > k0, the remaining number of flips X − k0 is again geometric with parameter p.
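Memorylessness can be verified numerically; this sketch (mine, with arbitrary p and k0) compares the conditional pmf with the unconditional pmf shifted by k0.

```python
p, k0 = 0.3, 4

def geom_pmf(k, p):
    """Geometric pmf: pX(k) = (1-p)**(k-1) * p."""
    return (1 - p) ** (k - 1) * p

# P(X > k0), truncating the geometric series far in the tail; equals (1-p)**k0.
tail = sum(geom_pmf(m, p) for m in range(k0 + 1, 2000))

# Conditional pmf given X > k0 vs the unconditional pmf shifted by k0.
gaps = [abs(geom_pmf(k, p) / tail - geom_pmf(k - k0, p)) for k in range(k0 + 1, k0 + 50)]
```

The gaps vanish up to floating-point error: waiting k0 flips does not change the distribution of the remaining wait.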
SLIDES 92–97
Example: Exponential random variables are memoryless

Assume email inter-arrival times are exponential with parameter λ.
You get an email, then no email for t0 minutes.
How is the waiting time T until the next email distributed now?

$F_{T|T>t_0}(t) = \frac{\int_{t_0}^{t} f_T(u)\,du}{\int_{t_0}^{\infty} f_T(u)\,du} = \frac{e^{-\lambda t_0} - e^{-\lambda t}}{e^{-\lambda t_0}} = 1 - e^{-\lambda(t-t_0)} \quad \text{for } t > t_0$

Differentiating with respect to t,
$f_{T|T>t_0}(t) = \lambda e^{-\lambda(t-t_0)} \quad \text{for } t > t_0$
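The same numerical check for the exponential case (my own sketch; λ and t0 are arbitrary): the conditional cdf evaluated at t0 + s should equal the unconditional cdf at s.

```python
from math import exp

lam, t0 = 0.8, 2.5

def exp_cdf(t):
    """Exponential cdf: 1 - exp(-lam * t) for t >= 0."""
    return 1 - exp(-lam * t) if t >= 0 else 0.0

def cond_cdf(t):
    """F_{T|T>t0}(t) = (F(t) - F(t0)) / (1 - F(t0)) for t > t0."""
    return (exp_cdf(t) - exp_cdf(t0)) / (1 - exp_cdf(t0))

# Memorylessness: waiting t0 does not change the distribution of the extra wait s.
gaps = [abs(cond_cdf(t0 + s) - exp_cdf(s)) for s in (0.1, 1.0, 3.0, 10.0)]
```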
SLIDE 98
◮ Discrete random variables
◮ Continuous random variables
◮ Conditioning on an event
◮ Functions of random variables
SLIDE 99
Functions of random variables
◮ For any deterministic function g and random variable X, Y := g(X) is a random variable
◮ Formally, X maps elements of Ω to R, so Y does too, since Y(ω) = g(X(ω))
SLIDE 100
Discrete random variables
If X is discrete,
$p_Y(y) = P(Y = y) = P(g(X) = y) = \sum_{\{x \,|\, g(x) = y\}} p_X(x)$
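A small sketch (mine) of this sum for a concrete g: with X geometric and g(x) = x mod 2, pY(1) collects the pmf over the odd values of x. The closed form pY(1) = 1/(2 − p) follows from summing the geometric series over odd x, and is used here only as a check.

```python
def geom_pmf(k, p):
    """Geometric pmf: pX(k) = (1-p)**(k-1) * p."""
    return (1 - p) ** (k - 1) * p

p = 0.4
# pY(y) = sum of pX(x) over {x | g(x) = y}, here with g(x) = x mod 2.
pY = {0: 0.0, 1: 0.0}
for x in range(1, 2000):   # truncate the countable range far in the tail
    pY[x % 2] += geom_pmf(x, p)
```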
SLIDE 101
Continuous random variables
If X is continuous,
$F_Y(y) = P(Y \le y) = P(g(X) \le y) = \int_{\{x \,|\, g(x) \le y\}} f_X(x)\,dx$
Then we can differentiate to obtain the pdf fY.
SLIDES 102–109
Gaussian random variable

If X is a Gaussian random variable with mean µ and standard deviation σ, derive the distribution of
$U := \frac{X - \mu}{\sigma}$

$F_U(u) = P\left(\frac{X-\mu}{\sigma} \le u\right) = \int_{(x-\mu)/\sigma \le u} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = \int_{-\infty}^{u} \frac{1}{\sqrt{2\pi}} e^{-\frac{w^2}{2}}\,dw$

by the change of variables $w = \frac{x-\mu}{\sigma}$.
To obtain the pdf we differentiate with respect to u:
$f_U(u) = \frac{1}{\sqrt{2\pi}} e^{-\frac{u^2}{2}}$

U is a standard Gaussian random variable.
SLIDES 110–118
χ2 with one degree of freedom

What is the pdf of a χ2 random variable X with one degree of freedom?
Recall that X := U^2, where U is a standard Gaussian random variable.

$F_X(x) = \int_{u^2 \le x} f_U(u)\,du = \int_{-\sqrt{x}}^{\sqrt{x}} \frac{1}{\sqrt{2\pi}} e^{-\frac{u^2}{2}}\,du$

To obtain the pdf we differentiate with respect to x, using
$\frac{d}{dt} \int_{-\infty}^{h(t)} g(u)\,du = g(h(t))\,h'(t)$

$f_X(x) = \frac{d}{dx}\left(\int_{-\infty}^{\sqrt{x}} \frac{1}{\sqrt{2\pi}} e^{-\frac{u^2}{2}}\,du - \int_{-\infty}^{-\sqrt{x}} \frac{1}{\sqrt{2\pi}} e^{-\frac{u^2}{2}}\,du\right)$
$= \frac{1}{\sqrt{2\pi}}\left(\frac{1}{2\sqrt{x}}\exp\left(-\frac{x}{2}\right) + \frac{1}{2\sqrt{x}}\exp\left(-\frac{x}{2}\right)\right)$
$= \frac{1}{\sqrt{2\pi x}}\exp\left(-\frac{x}{2}\right)$
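The derived pdf can be sanity-checked by simulation (my own sketch, not part of the slides): square standard Gaussian draws and compare an interval probability estimated from the samples with the integral of exp(−x/2)/√(2πx).

```python
import random
from math import exp, pi, sqrt

def chi2_1_pdf(x):
    """Derived pdf of X = U**2 for U standard Gaussian: exp(-x/2) / sqrt(2*pi*x), x > 0."""
    return exp(-x / 2) / sqrt(2 * pi * x)

rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) ** 2 for _ in range(200_000)]

# P(a < X <= b): empirical frequency vs a midpoint Riemann sum of the derived pdf.
a, b = 0.5, 2.0
empirical = sum(a < x <= b for x in samples) / len(samples)
n = 20_000
h = (b - a) / n
integral = sum(chi2_1_pdf(a + (i + 0.5) * h) for i in range(n)) * h
```

The empirical frequency and the integral of the derived pdf agree to within Monte Carlo error.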