SLIDE 1
Convergence of discrete random processes
DS-GA 1002 Statistical and Mathematical Models
http://www.cims.nyu.edu/~cfgranda/pages/DSGA1002_fall16
Carlos Fernandez-Granda
Aim: Define convergence for random processes. Characterize phenomena such as the Law of Large Numbers, the Central Limit Theorem and the convergence of Markov chains
SLIDE 2
SLIDE 3
Types of convergence Law of Large Numbers Central Limit Theorem Convergence of Markov chains
SLIDE 4
Convergence of deterministic sequences
A deterministic sequence of real numbers x1, x2, . . . converges to x ∈ ℝ,
lim_{i→∞} x_i = x,
if x_i is arbitrarily close to x as i grows:
for any ε > 0 there is an i0 such that for all i > i0, |x_i − x| < ε
Problem: random sequences do not have fixed values
SLIDE 5
Convergence with probability one
Consider a discrete random process X̃ and a random variable X defined on the same probability space
If we fix the outcome ω, X̃(i, ω) is a deterministic sequence and X(ω) is a constant
We can determine whether
lim_{i→∞} X̃(i, ω) = X(ω)
for that particular ω
SLIDE 6
Convergence with probability one
X̃ converges with probability one to X if
P({ω | ω ∈ Ω, lim_{i→∞} X̃(ω, i) = X(ω)}) = 1
That is, the set of outcomes for which deterministic convergence occurs has probability one
SLIDE 7
Puddle
Initial amount of water is uniform between 0 and 1 gallon
After a time interval i there is i times less water:
D̃(ω, i) := ω / i,   i = 1, 2, . . .
SLIDE 8
Puddle
[Plot: D̃(ω, i) for i = 1, . . . , 10 and outcomes ω = 0.31, 0.89, 0.52]
SLIDE 9
Puddle
If we fix ω ∈ (0, 1),
lim_{i→∞} D̃(ω, i) = lim_{i→∞} ω / i = 0
D̃ converges to zero with probability one
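The puddle example can be checked numerically; a minimal sketch (the function name `puddle` is ours):

```python
import random

def puddle(omega, i):
    """Amount of water left at time i for outcome omega: D(omega, i) = omega / i."""
    return omega / i

random.seed(0)
# For each fixed outcome omega in (0, 1), D(omega, i) is a deterministic
# sequence that tends to 0, so D converges to 0 with probability one.
for _ in range(5):
    omega = random.random()
    assert puddle(omega, 10**6) < 1e-6  # omega < 1, so omega / 10**6 < 1e-6
```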
SLIDE 10
Puddle
[Plot: D̃(ω, i) for i = 1, . . . , 50 and several outcomes ω]
SLIDE 11
Alternative idea
Idea: instead of fixing ω and checking deterministic convergence,
1. Measure how close X̃(i) and X are for a fixed i using a deterministic quantity
2. Check whether this quantity tends to zero as i → ∞
SLIDE 12
Convergence in mean square
The mean square E[(X − Y)²] measures how close the random variables X and Y are
If E[(X − Y)²] = 0 then X = Y with probability one
Proof: by Markov's inequality, for any ε > 0
P((Y − X)² > ε) ≤ E[(X − Y)²] / ε = 0
SLIDE 13
Convergence in mean square
X̃ converges to X in mean square if
lim_{i→∞} E[(X − X̃(i))²] = 0
SLIDE 14
Convergence in probability
Alternative measure: the probability that |Y − X| > ε for small ε
X̃ converges to X in probability if for any ε > 0
lim_{i→∞} P(|X − X̃(i)| > ε) = 0
SLIDE 15
Conv. in mean square implies conv. in probability
By Markov's inequality, for any ε > 0:
lim_{i→∞} P(|X − X̃(i)| > ε) = lim_{i→∞} P((X − X̃(i))² > ε²)
≤ lim_{i→∞} E[(X − X̃(i))²] / ε²
= 0
Convergence with probability one also implies convergence in probability
SLIDE 20
Convergence in distribution
The distribution of X̃(i) converges to the distribution of X
X̃ converges in distribution to X if
lim_{i→∞} p_{X̃(i)}(x) = p_X(x)   for all x ∈ R_X (discrete random variables)
or
lim_{i→∞} f_{X̃(i)}(x) = f_X(x)   for all x ∈ ℝ (continuous random variables)
SLIDE 21
Convergence in distribution
Convergence in distribution does not imply that X̃(i) and X are close as i → ∞!
Convergence in probability does imply convergence in distribution
SLIDE 22
Binomial tends to Poisson
◮ X̃(i) is binomial with parameters i and p := λ/i
◮ X is a Poisson random variable with parameter λ
◮ X̃(i) converges to X in distribution:
lim_{i→∞} p_{X̃(i)}(x) = lim_{i→∞} (i choose x) p^x (1 − p)^{i−x} = λ^x e^{−λ} / x! = p_X(x)
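This convergence can be checked numerically; a minimal sketch using only the Python standard library (function names are ours):

```python
from math import comb, exp, factorial

def binomial_pmf(i, p, x):
    """Pmf of a binomial with parameters i and p at x."""
    return comb(i, x) * p**x * (1 - p) ** (i - x)

def poisson_pmf(lam, x):
    """Pmf of a Poisson with parameter lam at x."""
    return lam**x * exp(-lam) / factorial(x)

lam = 4.0
# As i grows with p = lam / i, the binomial pmf approaches the Poisson pmf.
for i in [10, 100, 10_000]:
    gap = max(abs(binomial_pmf(i, lam / i, x) - poisson_pmf(lam, x))
              for x in range(20))
    print(f"i = {i:6d}  max pmf gap = {gap:.5f}")
```

The gap shrinks as i increases, illustrating convergence in distribution.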
SLIDE 23
Probability mass function of X̃(40)
[Plot: pmf over k = 0, . . . , 40]
SLIDE 24
Probability mass function of X̃(80)
[Plot: pmf over k = 0, . . . , 40]
SLIDE 25
Probability mass function of X̃(400)
[Plot: pmf over k = 0, . . . , 40]
SLIDE 26
Probability mass function of X
[Plot: pmf over k = 0, . . . , 40]
SLIDE 27
Types of convergence Law of Large Numbers Central Limit Theorem Convergence of Markov chains
SLIDE 28
Moving average
The moving average Ã of a discrete random process X̃ is
Ã(i) := (1/i) Σ_{j=1}^{i} X̃(j)
SLIDE 29
Weak law of large numbers
Let X̃ be an iid discrete random process with mean µ_X̃ := µ and bounded variance σ²
The moving average Ã of X̃ converges in mean square to µ
SLIDE 30
Proof
E[Ã(i)] = E[(1/i) Σ_{j=1}^{i} X̃(j)] = (1/i) Σ_{j=1}^{i} E[X̃(j)] = µ

Var[Ã(i)] = Var[(1/i) Σ_{j=1}^{i} X̃(j)] = (1/i²) Σ_{j=1}^{i} Var[X̃(j)] (by independence) = σ²/i

lim_{i→∞} E[(Ã(i) − µ)²] = lim_{i→∞} E[(Ã(i) − E[Ã(i)])²] = lim_{i→∞} Var[Ã(i)] = lim_{i→∞} σ²/i = 0
SLIDE 43
Strong law of large numbers
Let X̃ be an iid discrete random process with mean µ_X̃ := µ and bounded variance σ²
The moving average Ã of X̃ converges with probability one to µ
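The law of large numbers is easy to see in simulation; a minimal sketch for an iid standard Gaussian sequence (seed, sample size and checkpoints are arbitrary choices):

```python
import random

random.seed(0)
n = 100_000
total = 0.0
# Accumulate an iid standard Gaussian sequence and track its moving average.
for j in range(1, n + 1):
    total += random.gauss(0.0, 1.0)
    if j in (100, 10_000, 100_000):
        print(f"A({j}) = {total / j:+.4f}")  # drifts toward the mean, 0
avg = total / n
assert abs(avg) < 0.05  # standard error is 1 / sqrt(n), roughly 0.003
```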
SLIDE 44
Iid standard Gaussian
[Plot: moving average of an iid standard Gaussian sequence vs. its mean, i = 1 to 50]
SLIDE 45
Iid standard Gaussian
[Plot: moving average vs. mean of the iid sequence, i = 1 to 500]
SLIDE 46
Iid standard Gaussian
[Plot: moving average vs. mean of the iid sequence, i = 1 to 5000]
SLIDE 47
Iid geometric with p = 0.4
[Plot: moving average of an iid geometric sequence vs. its mean, i = 1 to 50]
SLIDE 48
Iid geometric with p = 0.4
[Plot: moving average vs. mean of the iid sequence, i = 1 to 500]
SLIDE 49
Iid geometric with p = 0.4
[Plot: moving average vs. mean of the iid sequence, i = 1 to 5000]
SLIDE 50
Iid Cauchy
[Plot: moving average of an iid Cauchy sequence vs. its median, i = 1 to 50]
SLIDE 51
Iid Cauchy
[Plot: moving average vs. median of the iid sequence, i = 1 to 500]
SLIDE 52
Iid Cauchy
[Plot: moving average vs. median of the iid sequence, i = 1 to 5000]
The Cauchy distribution has no mean, so the law of large numbers does not apply: the moving average does not converge
SLIDE 53
Types of convergence Law of Large Numbers Central Limit Theorem Convergence of Markov chains
SLIDE 54
Central Limit Theorem
Let X̃ be an iid discrete random process with mean µ_X̃ := µ and bounded variance σ²
√i (Ã(i) − µ) converges in distribution to a Gaussian random variable with mean 0 and variance σ²
The average Ã(i) is approximately Gaussian with mean µ and variance σ²/i
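The statement can be checked by simulating many realizations of the standardized average; a sketch using iid exponential variables with λ = 2 (seed, sample sizes and the choice of distribution are ours):

```python
import random
import statistics

random.seed(1)
lam, i, trials = 2.0, 500, 2000
mu, sigma = 1 / lam, 1 / lam  # exponential: mean 1/lam, std 1/lam
# Standardize sqrt(i) * (A(i) - mu) / sigma; by the CLT this is
# approximately a standard Gaussian for large i.
z = []
for _ in range(trials):
    a = sum(random.expovariate(lam) for _ in range(i)) / i
    z.append((i ** 0.5) * (a - mu) / sigma)
print("sample mean:", statistics.mean(z))
print("sample std :", statistics.stdev(z))
frac = sum(abs(v) <= 1 for v in z) / trials
print("fraction within one std:", frac)  # roughly 0.683 for a standard Gaussian
```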
SLIDE 55
Height data
◮ Example: data from a population of 25,000 people
◮ We compare the histogram of the heights to the pdf of a Gaussian random variable fitted to the data
SLIDE 56
Height data
[Plot: histogram of heights (60–76 inches) with the fitted Gaussian pdf overlaid]
SLIDE 57
Sketch of proof
The pdf of the sum of two independent random variables is the convolution of their pdfs:
f_{X+Y}(z) = ∫_{−∞}^{∞} f_X(z − y) f_Y(y) dy
Repeated convolutions of any pdf with bounded variance result in a Gaussian!
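The repeated-convolution argument can be illustrated numerically; a sketch that convolves a uniform pdf with itself on a grid (the grid spacing and the choice of a uniform starting pdf are ours):

```python
import numpy as np

dx = 0.001
x = np.arange(0.0, 1.0, dx)
f0 = np.ones_like(x)   # uniform pdf on [0, 1]
f = f0.copy()
# Convolve the uniform pdf with itself repeatedly; the result is the pdf
# of a sum of iid uniforms, which quickly looks Gaussian.
for _ in range(5):
    f = np.convolve(f, f0) * dx
grid = np.arange(len(f)) * dx
mean = np.sum(grid * f) * dx
var = np.sum((grid - mean) ** 2 * f) * dx
print(f"after 5 convolutions: mean = {mean:.3f}, variance = {var:.3f}")
# Sum of 6 iid Uniform(0, 1): mean 6 * 0.5 = 3, variance 6 / 12 = 0.5,
# matching the Gaussian limit predicted by the CLT.
```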
SLIDE 58
Repeated convolutions
[Plots: pdf after repeated convolutions, i = 1, . . . , 5]
SLIDE 59
Repeated convolutions
[Plots: pdf after repeated convolutions, i = 1, . . . , 5]
SLIDE 60
Iid exponential λ = 2, i = 10²
[Histogram of the moving average vs. the Gaussian approximation]
SLIDE 61
Iid exponential λ = 2, i = 10³
[Histogram of the moving average vs. the Gaussian approximation]
SLIDE 62
Iid exponential λ = 2, i = 10⁴
[Histogram of the moving average vs. the Gaussian approximation]
SLIDE 63
Iid geometric p = 0.4, i = 10²
[Histogram of the moving average vs. the Gaussian approximation]
SLIDE 64
Iid geometric p = 0.4, i = 10³
[Histogram of the moving average vs. the Gaussian approximation]
SLIDE 65
Iid geometric p = 0.4, i = 10⁴
[Histogram of the moving average vs. the Gaussian approximation]
SLIDE 66
Iid Cauchy, i = 10²
[Histogram of the moving average]
SLIDE 67
Iid Cauchy, i = 10³
[Histogram of the moving average]
SLIDE 68
Iid Cauchy, i = 10⁴
[Histogram of the moving average]
The Cauchy distribution has unbounded variance, so the central limit theorem does not apply: the moving average of iid Cauchy variables is itself Cauchy
SLIDE 69
Gaussian approximation to the binomial
X is binomial with parameters n and p
Computing the probability that X lies in a certain interval requires summing its pmf over the interval
The central limit theorem provides a quick approximation:
X = Σ_{i=1}^{n} B_i,   E[B_i] = p,   Var[B_i] = p (1 − p)
(1/n) X is approximately Gaussian with mean p and variance p (1 − p) / n
X is approximately Gaussian with mean np and variance np (1 − p)
SLIDE 70
Gaussian approximation to the binomial
Basketball player makes each shot with probability p = 0.4 (shots are iid)
What is the probability that she makes at least 420 shots out of 1000?
Exact answer:
P(X ≥ 420) = Σ_{x=420}^{1000} p_X(x) = Σ_{x=420}^{1000} (1000 choose x) 0.4^x 0.6^{1000−x} = 10.4 · 10⁻²
Approximation (U a standard Gaussian):
P(X ≥ 420) ≈ P(√(np (1 − p)) U + np ≥ 420) = P(U ≥ 1.29) = 1 − Φ(1.29) = 9.85 · 10⁻²
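Both numbers can be computed directly; a sketch using only the standard library (the pmf is evaluated in log space to avoid overflowing the binomial coefficient):

```python
from math import erfc, exp, lgamma, log, sqrt

n, p, k = 1000, 0.4, 420

def binom_pmf(n, p, x):
    """Binomial pmf evaluated in log space for numerical stability."""
    return exp(lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
               + x * log(p) + (n - x) * log(1 - p))

exact = sum(binom_pmf(n, p, x) for x in range(k, n + 1))
# Gaussian approximation: standardize k with mean np and std sqrt(np(1-p)).
z = (k - n * p) / sqrt(n * p * (1 - p))
approx = 0.5 * erfc(z / sqrt(2))   # 1 - Phi(z)
print(f"exact:  {exact:.4f}")   # roughly 0.104, as on the slide
print(f"approx: {approx:.4f}")  # roughly 0.0985, as on the slide
```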
SLIDE 74
Types of convergence Law of Large Numbers Central Limit Theorem Convergence of Markov chains
SLIDE 75
Convergence in distribution
If a Markov chain converges in distribution, its state vector converges to a constant vector:
p∞ := lim_{i→∞} p_{X̃(i)} = lim_{i→∞} T_X̃^i p_{X̃(0)}
SLIDE 76
Mobile phones
◮ Company releases a new mobile-phone model
◮ At the moment 90% of the phones are in stock, 10% have been sold locally and none have been exported
◮ Each day a phone in stock is sold with probability 0.2 and exported with probability 0.1
◮ Initial state vector and transition matrix (states ordered: in stock, sold, exported):
a := (0.9, 0.1, 0)ᵀ,   T_X̃ = [0.7 0 0; 0.2 1 0; 0.1 0 1]
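The evolution of the state vector can be simulated; a minimal numpy sketch (the zero entries of the transition matrix are inferred from the absorbing "sold" and "exported" states):

```python
import numpy as np

# States ordered (in stock, sold, exported); "sold" and "exported" absorb.
T = np.array([[0.7, 0.0, 0.0],
              [0.2, 1.0, 0.0],
              [0.1, 0.0, 1.0]])
a = np.array([0.9, 0.1, 0.0])   # initial state vector

state = a.copy()
for _ in range(100):            # apply the transition matrix repeatedly
    state = T @ state
print(np.round(state, 6))       # approaches (0, 0.7, 0.3)
```

Eventually every phone leaves the stock: 70% are sold locally and 30% exported.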
SLIDE 77
Mobile phones
[State diagram: In stock → Sold with prob. 0.2, In stock → Exported with prob. 0.1, In stock → In stock with prob. 0.7; Sold and Exported are absorbing (self-loop prob. 1)]
SLIDE 78
Mobile phones
[Plot: state-vector components (In stock, Sold, Exported) over days 1–20]
SLIDE 81
Mobile phones
The company wants to know how many phones are eventually sold locally and how many exported:
lim_{i→∞} p_{X̃(i)} = lim_{i→∞} T_X̃^i p_{X̃(0)} = lim_{i→∞} T_X̃^i a
SLIDE 82
Mobile phones
The transition matrix T_X̃ has three eigenvectors
q1 := (0, 0, 1)ᵀ,   q2 := (0, 1, 0)ᵀ,   q3 := (0.80, −0.53, −0.27)ᵀ
The corresponding eigenvalues are λ1 := 1, λ2 := 1 and λ3 := 0.7
Eigendecomposition of T_X̃:
T_X̃ := Q Λ Q⁻¹,   Q := (q1 q2 q3),   Λ := diag(λ1, λ2, λ3)
SLIDE 83
Mobile phones
We express the initial state vector a in terms of the eigenvectors:
Q⁻¹ p_{X̃(0)} = (0.3, 0.7, 1.122)ᵀ
so that
a = 0.3 q1 + 0.7 q2 + 1.122 q3
SLIDE 84
Mobile phones
lim_{i→∞} T_X̃^i a = lim_{i→∞} T_X̃^i (0.3 q1 + 0.7 q2 + 1.122 q3)
= lim_{i→∞} 0.3 T_X̃^i q1 + 0.7 T_X̃^i q2 + 1.122 T_X̃^i q3
= lim_{i→∞} 0.3 λ1^i q1 + 0.7 λ2^i q2 + 1.122 λ3^i q3
= lim_{i→∞} 0.3 q1 + 0.7 q2 + 1.122 (0.7)^i q3
= 0.3 q1 + 0.7 q2
= (0, 0.7, 0.3)ᵀ
Eventually 70% of the phones are sold locally and 30% are exported
SLIDE 91
Mobile phones
[Plot: state-vector components (In stock, Sold, Exported) over days 1–20, converging to (0, 0.7, 0.3)]
SLIDE 92
Mobile phones
lim_{i→∞} T_X̃^i p_{X̃(0)} = (Q⁻¹ p_{X̃(0)})_1 q1 + (Q⁻¹ p_{X̃(0)})_2 q2
b := (0.6, 0, 0.4)ᵀ,   Q⁻¹ b = (0.6, 0.4, 0.75)ᵀ   (1)
c := (0.4, 0.5, 0.1)ᵀ,   Q⁻¹ c = (0.23, 0.77, 0.50)ᵀ   (2)
SLIDE 93
Initial state vector b
[Plot: state-vector components (In stock, Sold, Exported) over days 1–20 for initial state vector b]
SLIDE 94
Initial state vector c
[Plot: state-vector components over days 1–20 for initial state vector c]
SLIDE 95
Stationary distribution
pstat is a stationary distribution of X̃ if
T_X̃ pstat = pstat
pstat is an eigenvector of T_X̃ with eigenvalue equal to one
If pstat is the initial state vector,
lim_{i→∞} p_{X̃(i)} = pstat
SLIDE 96
Reversibility
Let X̃(i) be distributed according to a state vector p ∈ ℝˢ (s = number of states)
X̃ is reversible with respect to p if
P(X̃(i) = x_j, X̃(i + 1) = x_k) = P(X̃(i) = x_k, X̃(i + 1) = x_j)   for all 1 ≤ j, k ≤ s
This is equivalent to the detailed-balance condition
(T_X̃)_{kj} p_j = (T_X̃)_{jk} p_k,   for all 1 ≤ j, k ≤ s
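Detailed balance is easy to verify numerically; a sketch with a small hypothetical chain whose entries were chosen by us to satisfy the condition with respect to p (columns of T sum to one, matching the deck's convention):

```python
import numpy as np

# A small chain built to satisfy detailed balance with respect to p
# (hypothetical numbers chosen for illustration).
p = np.array([0.5, 0.3, 0.2])
T = np.array([[0.7, 0.5, 0.0],
              [0.3, 0.3, 0.3],
              [0.0, 0.2, 0.7]])

# Detailed balance: T[k, j] * p[j] == T[j, k] * p[k] for all j, k,
# i.e. the matrix T * p (columns scaled by p) is symmetric.
assert np.allclose(T * p, (T * p).T)
# Reversibility implies stationarity: T p = p
assert np.allclose(T @ p, p)
```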
SLIDE 97
Reversibility implies stationarity
The detailed-balance condition provides a sufficient condition for stationarity: if X̃ is reversible with respect to p, then p is a stationary distribution of X̃
(T_X̃ p)_j = Σ_{k=1}^{s} (T_X̃)_{jk} p_k = Σ_{k=1}^{s} (T_X̃)_{kj} p_j = p_j Σ_{k=1}^{s} (T_X̃)_{kj} = p_j
since the columns of T_X̃ sum to one
SLIDE 102
Irreducible chains
Irreducible Markov chains have a single stationary distribution
This follows from the Perron–Frobenius theorem:
◮ The transition matrix of an irreducible Markov chain has a single eigenvector with eigenvalue equal to one
◮ This eigenvector has nonnegative entries
SLIDE 103
Irreducible chains
If X̃ is irreducible and aperiodic, its state vector converges to the stationary distribution pstat for any initial state vector p_{X̃(0)}
X̃ converges in distribution to a random variable with pmf given by pstat
SLIDE 104
Car rental
Aim: Model the location of cars
3 states: Los Angeles, San Francisco, San Jose
New cars are uniformly distributed among the 3 states
After that, the transition probabilities are:

              From SF   From LA   From SJ
To SF           0.6       0.1       0.3
To LA           0.2       0.8       0.3
To SJ           0.2       0.1       0.4
SLIDE 105
Car rental
What is the proportion of cars in each city eventually? Does this depend on the initial allocation?
SLIDE 106
Car rental
Markov chain with
p_{X̃(0)} := (1/3, 1/3, 1/3)ᵀ
T_X̃ := [0.6 0.1 0.3; 0.2 0.8 0.3; 0.2 0.1 0.4]
SLIDE 107
Car rental
[State diagram of the car-rental chain (SF, LA, SJ) with the transition probabilities above]
SLIDE 108
Car rental
The transition matrix has the following eigenvectors
q1 := (0.273, 0.545, 0.182)ᵀ,   q2 := (−0.577, 0.789, −0.211)ᵀ,   q3 := (−0.577, −0.211, 0.789)ᵀ
The eigenvalues are λ1 := 1, λ2 := 0.573 and λ3 := 0.227
No matter how the cars are initially allocated, eventually 27.3% end up in San Francisco, 54.5% in Los Angeles and 18.2% in San Jose
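The stationary distribution can be recovered numerically; a sketch using numpy's eigendecomposition (state order SF, LA, SJ as above):

```python
import numpy as np

# Transition matrix of the car-rental chain; column j holds the
# probabilities of moving out of state j (order: SF, LA, SJ).
T = np.array([[0.6, 0.1, 0.3],
              [0.2, 0.8, 0.3],
              [0.2, 0.1, 0.4]])

# The stationary distribution is the eigenvector with eigenvalue 1,
# normalized so its entries sum to one.
vals, vecs = np.linalg.eig(T)
k = int(np.argmin(np.abs(vals - 1)))
p_stat = np.real(vecs[:, k])
p_stat = p_stat / p_stat.sum()
print(np.round(p_stat, 3))  # approximately (0.273, 0.545, 0.182)

# Any initial allocation converges to p_stat (irreducible, aperiodic chain).
p0 = np.array([1 / 3, 1 / 3, 1 / 3])
assert np.allclose(np.linalg.matrix_power(T, 50) @ p0, p_stat)
```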
SLIDE 109
Car rental
[Plot: fraction of cars in SF, LA and SJ over 20 customers, converging to the stationary distribution]
SLIDE 110
Car rental
[Plot: fraction of cars in SF, LA and SJ over 20 customers]
SLIDE 111