SLIDE 1 Asymptotic Robustness of Estimators in Rare-Event Simulation
- P. L'Ecuyer, Université de Montréal, Canada
- J. H. Blanchet, Columbia University, USA
- P. W. Glynn, Stanford University, USA
- B. Tuffin, IRISA, Rennes, France
RESIM 2008, Rennes
SLIDE 2
Outline
◮ Rare-event setting and motivation.
◮ Asymptotic robustness properties: definitions.
◮ Markov chain model and zero-variance approximation.
◮ Examples: highly reliable Markovian systems.
◮ Examples: random walks.
SLIDE 5
Rare-event setting
We want to estimate a small quantity γ = γ(ε) > 0, where γ(ε) → 0 when ε → 0. We have an unbiased estimator (r.v.) Y = Y(ε) ≥ 0.
Example: Y(ε) is an indicator function, P[Y(ε) = 1] = γ(ε). Then the relative variance (squared relative error) blows up:
    Var[Y]/γ^2(ε) = (1 − γ(ε))/γ(ε) ≈ 1/γ(ε) → ∞ when ε → 0.
Standard Monte Carlo (MC): estimate γ by Ȳn, the average of n i.i.d. copies of Y. For a meaningful estimate, we need n = O(1/γ(ε)). If γ = 10^−10, for example, we need n = 10^14 for 1% relative error.
Two popular cases: γ(ε) ≈ e^{−η/ε} and γ(ε) ≈ poly(ε).
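The n = O(1/γ(ε)) requirement is easy to check by simulation. The sketch below (plain Python, with an arbitrary illustrative target probability of 10^−3) estimates the probability by standard MC and reports the theoretical relative error sqrt((1 − γ)/(nγ)) of the sample mean:

```python
import random

# Standard Monte Carlo for a rare event: estimate gamma = P[U < gamma_true]
# from n indicator samples. The relative error of the sample mean is
# sqrt((1 - gamma) / (n * gamma)), so n must grow like 1/gamma.
random.seed(42)

gamma_true = 1e-3
n = 100_000
hits = sum(1 for _ in range(n) if random.random() < gamma_true)
estimate = hits / n
rel_error = ((1 - gamma_true) / (n * gamma_true)) ** 0.5  # theoretical RE of the estimator
print(estimate, rel_error)  # RE is about 10% here, despite n = 100000 samples
```

With γ = 10^−3 and n = 10^5 the relative error is already ≈ 10%; shrinking γ to 10^−10 while keeping the same accuracy forces the sample size up by seven orders of magnitude.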
SLIDE 6
Some applications where this type of problem happens
◮ Expected amount of radiation that crosses a given protection shield.
◮ Probability of a large loss from an investment portfolio.
◮ Value-at-risk (quantile estimation).
◮ Ruin probability for an insurance firm.
◮ Probability that the completion time of a large project exceeds a given threshold.
◮ Probability of buffer overflow, or mean time to overflow, in a queueing system.
◮ Proportion of packets lost in a communication system.
◮ Air traffic control.
◮ Mean time to failure or other reliability or availability measure for a highly reliable system (e.g., fault-tolerant computers, safety systems).
SLIDE 8
Major techniques to improve this type of situation: importance sampling (IS) and splitting.
Commonly used robustness characterizations of Y(ε):
It has bounded relative error (BRE) (bounded relative variance) if
    lim sup_{ε→0} Var[Y(ε)]/γ^2(ε) < ∞.
It is logarithmically efficient (LE), or asymptotically optimal, if
    lim_{ε→0} ln E[Y^2(ε)] / (2 ln γ(ε)) = 1.
This means (roughly) that if γ(ε) → 0 at an exponential rate, then the standard deviation converges at least at the same exponential rate.
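The gap between BRE and LE can be made concrete with a hypothetical estimator family in which γ(ε) = e^{−1/ε} and E[Y^2(ε)] = (1/ε) e^{−2/ε}: the relative variance grows like 1/ε (so no BRE), yet the LE ratio tends to 1. A small numeric check:

```python
import math

# Hypothetical family: gamma(eps) = exp(-1/eps), E[Y^2(eps)] = (1/eps)*exp(-2/eps).
def le_ratio(eps):
    # ln E[Y^2] / (2 ln gamma) = (ln(1/eps) - 2/eps) / (-2/eps) = 1 - (eps/2) ln(1/eps)
    return (math.log(1.0 / eps) - 2.0 / eps) / (-2.0 / eps)

def relative_variance(eps):
    # Var[Y]/gamma^2 = E[Y^2]/gamma^2 - 1 = 1/eps - 1: unbounded, so no BRE
    return 1.0 / eps - 1.0

print(le_ratio(0.01), relative_variance(0.01))
```

As ε decreases, le_ratio approaches 1 from below while relative_variance diverges: LE holds, BRE fails.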
SLIDE 9
Are there other useful characterizations?
To get a meaningful variance estimator when ε → 0, we would like to control the relative error of the empirical variance. This involves the fourth moment of Y(ε).
If we use a CLT, we may want to show that the quality of the normal approximation remains good when ε → 0, by bounding the Berry-Esseen bound on the approximation error. This involves the third moment.
In some settings, we may be interested in bounding the relative moment of order 2 + δ for some small δ > 0. And so on.
SLIDE 12
BRM-k
For k ∈ [1, ∞), the relative moment of order k of Y(ε) is
    m_k(ε) = E[Y^k(ε)] / γ^k(ε).
For any fixed ε, m_k(ε) is nondecreasing in k (from Jensen's inequality).
Y(ε) has a bounded relative moment of order k (BRM-k) if
    lim sup_{ε→0} m_k(ε) < ∞.
Some properties:
(i) BRE is equivalent to BRM-2.
(ii) BRM-k implies BRM-k′ for 1 ≤ k′ < k.
(iii) For positive real numbers k, ℓ, m, if Y(ε) = X^ℓ(ε) is BRM-mk, then Y′(ε) = X^{mℓ}(ε) is BRM-k.
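For the indicator estimator (Y = 1 with probability γ, else 0), E[Y^k] = γ for every k, so the relative moments have the closed form m_k = γ^{1−k}; this makes Jensen's monotonicity in k, and the increasing cost of higher-order BRM, directly visible:

```python
# Relative moments of the indicator estimator: E[Y^k] = gamma for every k,
# so m_k = gamma^(1-k), which is nondecreasing in k as Jensen predicts.
def m_k(gamma, k):
    return gamma ** (1 - k)

gamma = 1e-3
moments = [m_k(gamma, k) for k in (1, 2, 3, 4)]
print(moments)  # m_1 = 1, then each extra order multiplies by 1/gamma = 1000
```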
SLIDE 13
LE-k
Y(ε) has logarithmic efficiency of order k (LE-k) if
    lim_{ε→0} ln E[Y^k(ε)] / (k ln γ(ε)) = 1.
SLIDE 15
Example: γ(ε) = exp[−η/ε] and E[Y^k(ε)] = q(1/ε) exp[−kη/ε] for some constant η > 0 and a slowly increasing function q (e.g., a polynomial). Then we have LE-k but not BRM-k.
Example: γ^k(ε) = q1(ε) = ε^{t1} + o(ε^{t1}) and E[Y^k(ε)] = q2(ε) = ε^{t2} + o(ε^{t2}), where t2 ≤ t1. Here,
    lim_{ε→0} ln E[Y^k(ε)] / (k ln γ(ε)) = t2/t1.
We have BRM-k iff t2 = t1 iff LE-k.
SLIDE 16 Bounded normal approximation
Berry-Esseen Theorem (one version): Let Y1, ..., Yn be i.i.d. r.v.'s with E[Y1] = 0, Var[Y1] = σ^2, and E[|Y1|^3] = β^3. Let Ȳn and S_n^2 be the empirical mean and variance, and Fn the distribution function of √n Ȳn / S_n. Then there is an absolute constant a < ∞ such that
    sup_{n≥2, x∈R} |Fn(x) − Φ(x)| ≤ a β^3 / (σ^3 √n).
Y(ε) is said to have bounded normal approximation (BNA) if
    lim sup_{ε→0} E[|Y(ε) − γ(ε)|^3] / σ^3(ε) < ∞   (Tuffin 1999).
This requires that the Berry-Esseen bound remains O(n^{−1/2}) when ε → 0.
SLIDE 17
BNA is not equivalent to BRM-3, because we divide by σ^3(ε) here. One can have BNA and not BRM-3, or vice versa.
There are more general versions of the Berry-Esseen inequality that require only a bounded moment of order 2 + δ for any δ ∈ (0, 1] instead of the third moment β^3; see, e.g., Petrov (1995). But then the bound on |Fn(x) − Φ(x)| is only O(n^{−δ/2}) instead of O(n^{−1/2}).
SLIDE 19 Robustness of the empirical variance
Let X1(ε), ..., Xn(ε) be an i.i.d. sample of X(ε). We consider the empirical variance Y(ε) = S_n^2(ε) as an estimator of the variance σ^2(ε). Let γ(ε) = E[X(ε)].
BRM-4 for X(ε) and BRE for S_n^2(ε) are both linked to E[X^4(ε)], but they are not equivalent. We have
    Var[S_n^2(ε)] / σ^4(ε) = Θ( E[(X(ε) − γ(ε))^4] / σ^4(ε) ),
which differs in general from
    Θ( E[X^4(ε)] / γ^4(ε) ).
Proposition: If σ^2(ε) = Θ(γ^2(ε)), then BRM-2k for X(ε) implies BRM-k for S_n^2(ε), for any k ≥ 1.
A similar observation applies to the equivalence between LE-4 for X(ε) and LE for S_n^2(ε).
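A quick simulation sketch of the setting above, with a Bernoulli X and an arbitrary non-rare γ chosen only for illustration: S_n^2 estimates σ^2 = γ(1 − γ), and its own accuracy is governed by the fourth central moment of X.

```python
import random

# Empirical variance S_n^2 of an i.i.d. Bernoulli(gamma) sample as an
# estimator of sigma^2 = gamma * (1 - gamma).
random.seed(1)
gamma = 0.2
n = 10_000
xs = [1.0 if random.random() < gamma else 0.0 for _ in range(n)]
mean = sum(xs) / n
s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)  # unbiased empirical variance
print(mean, s2)  # close to gamma = 0.2 and sigma^2 = 0.16
```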
SLIDE 22
Vanishing relative centered moments
The relative centered moment of order k of Y(ε) is
    c_k(ε) = E[|Y(ε) − γ(ε)|^k] / γ^k(ε).
Y(ε) has a vanishing relative centered moment of order k (VRCM-k) if
    lim sup_{ε→0} c_k(ε) = 0.
This holds if and only if lim sup_{ε→0} m_k(ε) = 1.
It has vanishing relative variance, or vanishing relative error (VRE), if
    lim sup_{ε→0} σ(ε)/γ(ε) = 0.
VRCM-k implies VRCM-k′ for 1 ≤ k′ ≤ k. It also implies BRM-k.
SLIDE 23 VRCM implies convergence to the zero-variance IS
Suppose γ(ε) = E_{Pε}[Y(ε)] = ∫ Y(ε, ω) dPε(ω). Here, we can get zero variance (in principle) by doing importance sampling with the measure Q*_ε defined by
    dQ*_ε(ω) = (Y(ε, ω)/γ(ε)) dPε(ω).
Proposition: If Y(ε) is VRCM-(1+δ) for some δ > 0, then
    lim_{ε→0} sup_A |Pε(A) − Q*_ε(A)| = 0.
SLIDE 24 Proof: For any measurable set A,
    sup_A |Q*_ε(A) − Pε(A)|
    = sup_A |E_{Pε}[(dQ*_ε/dPε) I(A)] − E_{Pε}[I(A)]|
    ≤ E_{Pε}|dQ*_ε/dPε − 1|
    ≤ E_{Pε}^{1/(1+δ)}[ |dQ*_ε/dPε − 1|^{1+δ} ]
    = E_{Pε}^{1/(1+δ)}[ |Y(ε)/γ(ε) − 1|^{1+δ} ]
    = [c_{1+δ}(ε)]^{1/(1+δ)} → 0 as ε → 0.
SLIDE 25
First-Passage Probability in a Markov Chain
Markov chain X = {Xj, j ≥ 0} with state space S and transition kernel K = {K(x, C) : x ∈ S, C ⊆ S}. For C ⊂ S, define τC = inf{j ≥ 0 : Xj ∈ C}. Given A and B, two disjoint subsets of S, and an initial state x0 ∈ (A ∪ B)^c, we want to estimate γ(x0), where
    γ(x) = γ(x, ε) = P[τB < τA | X0 = x].
We have γ(x) = 1 for x ∈ B and γ(x) = 0 for x ∈ A. Here, K, A, and B may depend on ε.
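As a concrete (hypothetical) instance of this setting, the gambler's-ruin chain on {0, ..., m} with A = {0}, B = {m}, and up-probability p has a closed-form γ(x), which a sketch can compare against crude simulation:

```python
import random

# Gambler's-ruin instance of the first-passage setting: states 0..m,
# A = {0}, B = {m}, up-probability p != 1/2.
# gamma(x) = P[tau_B < tau_A | X0 = x] = ((q/p)^x - 1) / ((q/p)^m - 1).
random.seed(7)

def gamma_exact(x, m, p):
    r = (1 - p) / p
    return (r ** x - 1) / (r ** m - 1)

def gamma_sim(x, m, p, n):
    hits = 0
    for _ in range(n):
        s = x
        while 0 < s < m:                       # walk until hitting A or B
            s += 1 if random.random() < p else -1
        hits += (s == m)
    return hits / n

m, p, x0 = 10, 0.4, 5
print(gamma_exact(x0, m, p), gamma_sim(x0, m, p, 20_000))
```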
SLIDE 27 Importance sampling (IS): replace K by another kernel. We get zero variance if we use the kernel
    K*(x, dy) = K(x, dy) γ(y)/γ(x).
Not practical. But it can be used to design a good IS scheme of the form
    K_v(x, dy) = K(x, dy) v(y)/w(x),
where v : S → [0, ∞) is a good approximation of γ(·) and w(x) = ∫ K(x, dy) v(y) is the appropriate normalizing constant, assumed finite for x ∈ (A ∪ B)^c. (When v = γ, we have w = v.)
SLIDE 29 IS estimator of γ(x0): Y = Y(ε) = I[τB < τA] L, where
    L = ∏_{k=1}^{τB} w(X_{k−1})/v(X_k) = (w(X0)/v(X_{τB})) ∏_{k=1}^{τB−1} w(X_k)/v(X_k).
We can take v(x) = 1 for x ∈ B and v(x) = 0 for x ∈ A.
To establish robustness properties such as LE-k, BRM-k, and VRCM-k, we need an asymptotic bound on E[Y^k(ε)]/γ^k(x0, ε).
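To see the kernel K_v and the likelihood ratio L in action, here is a sketch on the hypothetical gambler's-ruin chain (states 0..m, up-probability p): plugging the exact γ in for v makes every replicate return exactly γ(x0), i.e., zero variance.

```python
import random

# Zero-variance IS sketch on the gambler's-ruin chain: with v = gamma,
# w(x) = gamma(x) and L telescopes to gamma(x0) on every sample path.
random.seed(3)

m, p, x0 = 10, 0.4, 5
r = (1 - p) / p
gamma = [(r ** x - 1) / (r ** m - 1) for x in range(m + 1)]  # gamma[0]=0, gamma[m]=1

def is_estimate(v):
    s, L = x0, 1.0
    while 0 < s < m:
        up, down = p * v[s + 1], (1 - p) * v[s - 1]
        w = up + down                       # normalizing constant w(s)
        s_next = s + 1 if random.random() < up / w else s - 1
        L *= w / v[s_next]                  # likelihood-ratio factor w(X_{k-1})/v(X_k)
        s = s_next
    return L if s == m else 0.0

samples = [is_estimate(gamma) for _ in range(1000)]
print(min(samples), max(samples))  # all equal to gamma(x0): zero variance
```

Note that under K_γ the chain never moves toward A (the factor v(s−1) = γ(0) = 0 kills that transition), so every replicate hits B, exactly as under the zero-variance measure.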
SLIDE 30 Proposition: Bounds via Lyapunov inequalities. Suppose there are positive constants κ1 and κ2 and a function h_k : S → [0, ∞) such that v(x) ≥ κ1 and h_k(x) ≥ κ2 for each x ∈ B, and
    E_{v,x}[ (w(x)/v(x))^k h_k(X1) ] ≤ h_k(x)
for all x ∈ (A ∪ B)^c. Then, for all x ∈ (A ∪ B)^c,
    E_{v,x}[Y^k] ≤ v^k(x) h_k(x) / (κ1 κ2).
SLIDE 31
Corollary. Under the proposition's conditions:
(i) If lim_{ε→0} ( k ln v(x0, ε) + ln h_k(x0, ε) ) / (k ln γ(x0, ε)) = 1, then Y(ε) is LE-k.
(ii) If lim sup_{ε→0} [v(x0, ε)/γ(x0, ε)]^k h_k(x0, ε) < ∞, then Y(ε) is BRM-k.
(iii) If lim_{ε→0} [v(x0, ε)/γ(x0, ε)]^k h_k(x0, ε) / (κ1 κ2) = 1, then Y(ε) is VRCM-k.
SLIDE 32
HRMS-type Framework
Suppose S is finite and {Xj, j ≥ 0} has transition probabilities
    p(x, y, ε) = P[Xj = y | Xj−1 = x] = a(x, y) ε^{b(x,y)},
where a(x, y) and b(x, y) are non-negative constants.
Example: a multicomponent highly-reliable Markovian system (HRMS): A = {x0} is the single state where all components are up, and B is the set of states where the system is failed. We typically have b(x, y) > 0 for "failure" transitions and b(x, y) = 0 for "repair" transitions (Shahabuddin, Nakayama, ...). In this setting, γ(ε) = Θ(ε^r) for some r > 0.
SLIDE 36 Some proposed IS heuristics:
Balanced failure biasing (BFB) (Shahabuddin 1994) changes p to q as follows, for x ∉ B, where F(x) and R(x) denote the sets of failure and repair transitions out of x, pR(x) is the total repair probability, and ρ ∈ (0, 1) is a fixed constant:
    q(x, y) = 1/|F(x)|                  if y ∈ F(x) and pR(x) = 0;
    q(x, y) = ρ/|F(x)|                  if y ∈ F(x) and pR(x) > 0;
    q(x, y) = (1 − ρ) p(x, y)/pR(x)     if y ∈ R(x).
Simple failure biasing (SFB) (Shahabuddin 1988): replace 1/|F(x)| above by p(x, y) / Σ_{y′∈F(x)} p(x, y′).
SBLR (Alexopoulos and Shultes 2001) changes the probabilities in such a way that over any cycle in the states visited during the simulation, the cumulated likelihood ratio remains bounded.
These methods do not attempt to mimic zero-variance sampling.
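A minimal sketch of the BFB change of measure for one hypothetical state, with made-up transition probabilities (the identifiers f1, f2, r1, r2 are purely illustrative):

```python
# Balanced failure biasing for a single state x with pR(x) > 0: the failure
# transitions get total probability rho, split uniformly; the repair
# transitions share 1 - rho proportionally to their original probabilities.
rho = 0.5
p_fail = {'f1': 1e-3, 'f2': 2e-3}     # hypothetical failure-transition probs
p_repair = {'r1': 0.6, 'r2': 0.397}   # hypothetical repair-transition probs
pR = sum(p_repair.values())           # pR(x) > 0: second and third cases apply

q = {y: rho / len(p_fail) for y in p_fail}
q.update({y: (1 - rho) * p / pR for y, p in p_repair.items()})
print(q)  # failure transitions now have probability 0.25 each
```

The rare failure transitions are inflated from O(ε) to a constant, which is exactly what makes the failure paths reachable in a reasonable number of runs.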
SLIDE 39 Proposed approximation (ZVA)
Approximate γ by some easily computable function v, and plug it into the zero-variance formula.
For any state y ∉ B, let Γ(y) be the set of all paths π = (y = y0 → y1 → ··· → yk) where y1, ..., y_{k−1} ∉ B ∪ {0}, yk ∈ B, and with positive probability
    p(π) = ∏_{j=1}^{k} p(y_{j−1}, y_j) > 0.
Because these paths represent disjoint events, we have γ(y) = Σ_{π∈Γ(y)} p(π).
This last sum may contain a huge (perhaps infinite) number of terms.
SLIDE 41 A very crude approximation is to just take the path with the largest probability, i.e., approximate γ(y) = Σ_{π∈Γ(y)} p(π) by its lower bound
    v0(y) = max_{π∈Γ(y)} p(π).
Computing v0(y) amounts to computing a shortest path from y to B, where the length of a link y′ → y″ is −log p(y′, y″). Easy.
This would work fine if a single path dominated the sum (this may happen when failure transitions have very small probabilities), but this v0 may underestimate γ(y) significantly. Slight improvement: take the sum of the probabilities of a few disjoint paths.
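The shortest-path computation can be sketched with a standard Dijkstra on −log probabilities; the graph below is a made-up toy example with two routes to a failure state 'f'.

```python
import heapq
import math

# v0(y) = max over paths of p(pi), computed as a shortest path with edge
# lengths -log p(y', y''). 'graph' maps each state to (next_state, prob) pairs.
def v0(graph, source, B):
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, x = heapq.heappop(heap)
        if x in B:
            return math.exp(-d)          # convert length back to a probability
        if d > dist.get(x, math.inf):
            continue                     # stale heap entry
        for y, prob in graph.get(x, []):
            nd = d - math.log(prob)
            if nd < dist.get(y, math.inf):
                dist[y] = nd
                heapq.heappush(heap, (nd, y))
    return 0.0                           # B unreachable from source

# Toy chain: two disjoint routes to the failure state 'f'.
graph = {'a': [('b', 0.1), ('c', 0.9)], 'b': [('f', 0.5)], 'c': [('f', 0.01)]}
print(v0(graph, 'a', {'f'}))  # max(0.1*0.5, 0.9*0.01) = 0.05
```

Note that the most probable path (through 'b') is not the one starting with the most probable first step, which is why the shortest-path search is needed.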
SLIDE 43 Refinements
Typically, the farther we are from B, the more v0 underestimates γ. Close to B, things are fine, but not close to state 0. First simple correction:
- 1. Estimate γ(0) in preliminary runs with a crude IS strategy;
- 2. Find the constant α ≤ 1 such that (v0(0))^α equals this estimate;
- 3. Use v1(y) = (v0(y))^α as the approximation of γ(y).
This v1 matches γ for y ∈ B and matches its estimate at y = 0.
Second refinement: replace α by a state-dependent exponent
    α(y) = 1 + [α(0) − 1] log v0(y) / log v0(0),
where α(0) = α as above. This α(y) changes progressively from 1 near B to α(0) < 1 in state 0. The correction here is milder than in the previous case when we are close to B. Let v2(y) = (v0(y))^{α(y)} be the resulting approximation.
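The two corrections can be sketched numerically; all numbers below (the value of v0(0) and the preliminary estimate of γ(0)) are hypothetical.

```python
import math

# Sketch of the two corrections: v1 uses a single exponent alpha(0) fitted at
# state 0, while v2 interpolates the exponent from 1 near B to alpha(0) at 0.
v0_at_0 = 1e-12          # crude path-based approximation at state 0 (made up)
gamma0_est = 1e-10       # preliminary IS estimate of gamma(0) (made up)

alpha0 = math.log(gamma0_est) / math.log(v0_at_0)   # (v0(0))^alpha0 = estimate

def v1(v0_y):
    return v0_y ** alpha0

def alpha(v0_y):
    return 1 + (alpha0 - 1) * math.log(v0_y) / math.log(v0_at_0)

def v2(v0_y):
    return v0_y ** alpha(v0_y)

print(alpha0, v1(v0_at_0), v2(v0_at_0))  # both corrections hit gamma0_est at state 0
```

Near B, v0(y) is close to 1, so α(y) is close to 1 and v2 barely alters v0, exactly the "milder correction close to B" described above.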
SLIDE 45 Example: Three types of components
- c = 3 component types, with n1 = n2 = n3 components of each type.
- Exponential repair times with mean 1.
- Failure rate λi for component type i, with λ1 = ε, λ2 = 1.5ε, and λ3 = 2ε^2, for some small real number ε. We will try different values of (ni, ε).
- B = the set of states where at least one component type has fewer than 2 components in operation.
To define v0(y), we consider all three paths to B that result from failures of a single component type, and sum their probabilities.
The table contains results with n = 2^20 runs. The best estimate of γ(0) was obtained from a very large number of runs.
SLIDE 46

Mean:
ni   ε      γ(0)           v0(0)          BFB             SBLR
3    0.001  2.6 × 10^−3    1.3 × 10^−3    2.7 × 10^−3     2.6 × 10^−3
6    0.01   1.8 × 10^−7    3.4 × 10^−8    1.9 × 10^−7     [9.9 × 10^−7]
6    0.001  1.7 × 10^−11   3.4 × 10^−12   1.8 × 10^−11    (1.8 × 10^−16)
12   0.1    6.0 × 10^−8    3.2 × 10^−9    4.8 × 10^−8     1.3 × 10^−8
12   0.001  3.9 × 10^−28   3.5 × 10^−29   (1.8 × 10^−40)  (2.9 × 10^−45)

Variance:
ni   ε      BFB             SBLR
3    0.001  1.8 × 10^−2     8.0 × 10^−3
6    0.01   6.3 × 10^−11    (4.5 × 10^−16)
6    0.001  8.8 × 10^−19    (2.0 × 10^−26)
12   0.1    8.1 × 10^−10    1.7 × 10^−10
12   0.001  (3.2 × 10^−74)  (3.5 × 10^−84)
SLIDE 47

Mean:
ni   ε      γ(0)           ZVA(v0)        ZVA(v1)        ZVA(v2)
3    0.001  2.6 × 10^−3    2.6 × 10^−3    2.6 × 10^−3    2.6 × 10^−3
6    0.01   1.8 × 10^−7    1.8 × 10^−7    1.8 × 10^−7    1.8 × 10^−7
6    0.001  1.7 × 10^−11   1.7 × 10^−11   1.7 × 10^−11   1.7 × 10^−11
12   0.1    6.0 × 10^−8    6.0 × 10^−8    6.2 × 10^−8    6.7 × 10^−8
12   0.001  3.9 × 10^−28   3.9 × 10^−28   3.9 × 10^−28   3.9 × 10^−28

Variance:
ni   ε      α      ZVA(v0)        ZVA(v1)        ZVA(v2)        RE(v2)
3    0.001  0.906  6.5 × 10^−4    2.7 × 10^−3    9.3 × 10^−9    0.04
6    0.01   0.903  2.0 × 10^−14   1.2 × 10^−14   7.7 × 10^−15   0.48
6    0.001  0.939  1.2 × 10^−23   1.1 × 10^−23   7.6 × 10^−24   0.16
12   0.1    0.851  1.6 × 10^−10   2.9 × 10^−10   1.5 × 10^−11   64.50
12   0.001  0.963  1.4 × 10^−55   9.3 × 10^−56   9.4 × 10^−56   0.78
SLIDE 50
- Proposition. In this HRMS framework, with SFB, BRM-k and LE-k are equivalent. They are also equivalent for the g-th empirical moment.
- Proposition. Suppose we have an IS scheme such that p(x, y, ε) = Θ(ε^d) implies q(x, y, ε) = Θ(ε^ℓ) for some ℓ ≤ d. Let E[Y^g(ε)] = Θ(ε^{s_g}). Then we have BRM-k of the g-th empirical moment if and only if, for all integers m such that r ≤ m < k s_g and all sample paths (x0, ..., xn) leading to B and having probability Θ(ε^m),
    P*{(X0, ..., Xτ) = (x0, ..., xn)} = Θ(ε^ℓ)
for some ℓ ≤ k(mg − s_g)/(kg − 1).
- Proposition. With SFB or BFB, one cannot achieve VRCM-k.
SLIDE 52 ZVA
- Proposition. With our ZVA scheme, if we just take v(y) as the probability of the most probable path to failure from y, then v(y) = Θ(γ(y)) for all y and we have BRE.
- Proposition. If v(y)/γ(y) → 1 for each y when ε → 0, then we have VRCM-k. This holds if v(y) is the sum of the probabilities of all the dominant paths from y (those with the smallest power of ε).
SLIDE 53
Probability of a Large Deviation for a Random Walk
Let D1, D2, . . . be i.i.d. random variables, and Sj = D1 + · · · + Dj for j ≥ 0. Let ℓ > E[Dj], n = n(ε) = ⌈1/ε⌉, and γ(ε) = P[Sn/n ≥ ℓ]. We have γ(ε) → 0 when ε → 0.
SLIDE 55 State-Independent Exponential Twisting
Here, it is well known that an LE-2 estimator can be obtained via IS with exponential twisting, if Dj has a light-tailed distribution. Idea: multiply the density of Dj by e^{θx} and normalize. The IS estimator is
    Y(θ, ε) = I[Sn ≥ nℓ] · L(θ, Sn).
Let θ*_ℓ be a positive root of Ψ′(θ) = ℓ, where Ψ is the cumulant generating function of Dj. Let I(ℓ) = ℓθ*_ℓ − Ψ(θ*_ℓ), the large-deviation rate function.
Results of Sadowsky (1993) directly imply the following:
- Proposition. For any integer k ≥ 2 and any θ, Y(θ, ε) is not BRM-k. It is LE-k if and only if θ = θ*_ℓ.
- Proposition. The sample variance of m copies of Y(θ, ε), as an estimator of the true variance, is not BRM-k. It is LE-k if and only if θ = θ*_ℓ.
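A sketch of exponential twisting with standard normal increments, so Ψ(θ) = θ²/2, θ*_ℓ = ℓ, and L(θ, Sn) = exp(−θSn + nΨ(θ)); the parameter values and sample sizes below are purely illustrative.

```python
import math
import random

# Exponential twisting for P[S_n/n >= ell] with D_j ~ N(0, 1):
# sample D_j ~ N(theta, 1) under IS and weight each hit by
# L(theta, S_n) = exp(-theta*S_n + n*theta^2/2).
random.seed(11)

def twisted_estimate(n, ell, theta, runs):
    total = 0.0
    for _ in range(runs):
        s = sum(random.gauss(theta, 1.0) for _ in range(n))
        if s >= n * ell:
            total += math.exp(-theta * s + n * theta * theta / 2.0)
    return total / runs

n, ell = 20, 1.0   # gamma = P[N(0,1) >= sqrt(20)], roughly 4e-6
est = twisted_estimate(n, ell, theta=ell, runs=20_000)
print(est)
```

With the optimal twist θ = θ*_ℓ = ℓ, a few thousand runs suffice for a probability near 10^−6 that standard MC could not touch; with any other θ the estimator loses even logarithmic efficiency, per the propositions above.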
SLIDE 57
A State-Dependent IS Scheme Can Achieve BRM-k
We know that for n → ∞,
    γ(ε) = P[Sn ≥ nℓ] = ( exp[−nI(ℓ)] / ( [2πnΨ″(θ*_ℓ)]^{1/2} θ*_ℓ ) ) [1 + o(1)].
When the Markov chain is in state Xj = (j, Sj) = (j, s) = x, we can approximate γ(x) = γ(j, s) = P[Sn − Sj ≥ nℓ − s] by
    v(j, s) = exp[−(n − j)I(ℓj)] / ( [2π(n − j)Ψ″(θ*_{ℓj})]^{1/2} θ*_{ℓj} ),
where ℓj = (nℓ − s)/(n − j). Then we use the corresponding zero-variance approximation. By verifying the Lyapunov conditions, we can show that this IS estimator has BRM-k for all k.
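A numeric sketch of v(j, s) for standard normal increments (so Ψ(θ) = θ²/2, θ*_ℓ = ℓ, I(ℓ) = ℓ²/2, Ψ″ = 1); comparing v(0, 0) with the exact tail P[Sn ≥ nℓ] shows the approximation is already close at moderate n.

```python
import math

# State-dependent approximation v(j, s) for standard normal increments.
def v(j, s, n, ell):
    lj = (n * ell - s) / (n - j)   # remaining slope needed to reach n*ell
    return math.exp(-(n - j) * lj * lj / 2.0) / (math.sqrt(2 * math.pi * (n - j)) * lj)

def normal_tail(z):                # exact P[N(0,1) >= z]
    return 0.5 * math.erfc(z / math.sqrt(2.0))

n, ell = 100, 0.5
approx = v(0, 0, n, ell)
exact = normal_tail(math.sqrt(n) * ell)   # S_n ~ N(0, n)
print(approx, exact)  # agree within a few percent at n = 100
```

As the chain advances, ℓj tracks the remaining distance to the boundary, so the twist adapts at every step; this state dependence is what upgrades LE-k to BRM-k.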
SLIDE 58
Increments With Heavy Tails
Random walk with negative drift; probability that the walk ever exceeds a given positive threshold. One can obtain state-dependent IS having BRM-k for all k.