SLIDE 1 Asymptotic Robustness of Estimators in Rare-Event Simulation
- P. L'Ecuyer, Université de Montréal, Canada
- J. H. Blanchet, Columbia University, USA
- P. W. Glynn, Stanford University, USA
- B. Tuffin, IRISA, Rennes, France
RESIM 2008, Rennes
SLIDE 2
Outline
◮ Rare-event setting and motivation.
◮ Asymptotic robustness properties: definitions.
◮ Markov chain model and zero-variance approximation.
◮ Examples: highly reliable Markovian systems.
◮ Examples: random walks.
SLIDE 5
Rare-event setting
We want to estimate a small quantity γ = γ(ε) > 0, where γ(ε) → 0 when ε → 0. We have an unbiased estimator (r.v.) Y = Y(ε) ≥ 0.
Example: Y(ε) is an indicator function, P[Y(ε) = 1] = γ(ε). Then the relative variance (squared relative error) blows up:
    Var[Y]/γ^2(ε) = (1 − γ(ε))/γ(ε) ≈ 1/γ(ε) → ∞ when ε → 0.
Standard Monte Carlo (MC): estimate γ by Ȳn, the average of n i.i.d. copies of Y. For a meaningful estimate, we need n = O(1/γ(ε)). If γ = 10^−10, for example, we need n = 10^14 for 1% relative error.
Two popular cases: γ(ε) ≈ e^{−η/ε} and γ(ε) ≈ poly(ε).
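The n = O(1/γ(ε)) requirement is easy to check by simulation. The sketch below (plain Python, with an arbitrary illustrative target probability of 10^−3) estimates the probability by standard MC and reports the theoretical relative error sqrt((1 − γ)/(nγ)) of the sample mean:

```python
import random

# Standard Monte Carlo for a rare event: estimate gamma = P[U < gamma_true]
# from n indicator samples. The relative error of the sample mean is
# sqrt((1 - gamma) / (n * gamma)), so n must grow like 1/gamma.
random.seed(42)

gamma_true = 1e-3
n = 100_000
hits = sum(1 for _ in range(n) if random.random() < gamma_true)
estimate = hits / n
rel_error = ((1 - gamma_true) / (n * gamma_true)) ** 0.5  # theoretical RE of the estimator
print(estimate, rel_error)  # RE is about 10% here, despite n = 100000 samples
```

With γ = 10^−3 and n = 10^5 the relative error is already ≈ 10%; shrinking γ to 10^−10 while keeping the same accuracy forces the sample size up by seven orders of magnitude.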
SLIDE 6
Some applications where this type of problem happens
◮ Expected amount of radiation that crosses a given protection shield.
◮ Probability of a large loss from an investment portfolio.
◮ Value-at-risk (quantile estimation).
◮ Ruin probability for an insurance firm.
◮ Probability that the completion time of a large project exceeds a given threshold.
◮ Probability of buffer overflow, or mean time to overflow, in a queueing system.
◮ Proportion of packets lost in a communication system.
◮ Air traffic control.
◮ Mean time to failure or other reliability or availability measure for a highly reliable system (e.g., fault-tolerant computers, safety systems).
SLIDE 8
Major techniques to improve this type of situation: importance sampling (IS) and splitting.
Commonly used robustness characterizations of Y(ε):
It has bounded relative error (BRE) (bounded relative variance) if
    lim sup_{ε→0} Var[Y(ε)]/γ^2(ε) < ∞.
It is logarithmically efficient (LE), or asymptotically optimal, if
    lim_{ε→0} ln E[Y^2(ε)] / (2 ln γ(ε)) = 1.
This means (roughly) that if γ(ε) → 0 at an exponential rate, then the standard deviation converges at least at the same exponential rate.
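The gap between BRE and LE can be made concrete with a hypothetical estimator family in which γ(ε) = e^{−1/ε} and E[Y^2(ε)] = (1/ε) e^{−2/ε}: the relative variance grows like 1/ε (so no BRE), yet the LE ratio tends to 1. A small numeric check:

```python
import math

# Hypothetical family: gamma(eps) = exp(-1/eps), E[Y^2(eps)] = (1/eps)*exp(-2/eps).
def le_ratio(eps):
    # ln E[Y^2] / (2 ln gamma) = (ln(1/eps) - 2/eps) / (-2/eps) = 1 - (eps/2) ln(1/eps)
    return (math.log(1.0 / eps) - 2.0 / eps) / (-2.0 / eps)

def relative_variance(eps):
    # Var[Y]/gamma^2 = E[Y^2]/gamma^2 - 1 = 1/eps - 1: unbounded, so no BRE
    return 1.0 / eps - 1.0

print(le_ratio(0.01), relative_variance(0.01))
```

As ε decreases, le_ratio approaches 1 from below while relative_variance diverges: LE holds, BRE fails.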
SLIDE 9
Are there other useful characterizations?
To get a meaningful variance estimator when ε → 0, we would like to control the relative error of the empirical variance. This involves the fourth moment of Y(ε).
If we use a CLT, we may want to show that the quality of the normal approximation remains good when ε → 0, by bounding the Berry-Esseen bound on the approximation error. This involves the third moment.
In some settings, we may be interested in bounding the relative moment of order 2 + δ for some small δ > 0. And so on.
SLIDE 12
BRM-k
For k ∈ [1, ∞), the relative moment of order k of Y(ε) is
    m_k(ε) = E[Y^k(ε)] / γ^k(ε).
For any fixed ε, m_k(ε) is nondecreasing in k (from Jensen's inequality).
Y(ε) has a bounded relative moment of order k (BRM-k) if
    lim sup_{ε→0} m_k(ε) < ∞.
Some properties:
(i) BRE is equivalent to BRM-2.
(ii) BRM-k implies BRM-k′ for 1 ≤ k′ < k.
(iii) For positive real numbers k, ℓ, m, if Y(ε) = X^ℓ(ε) is BRM-mk, then Y′(ε) = X^{mℓ}(ε) is BRM-k.
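For the indicator estimator (Y = 1 with probability γ, else 0), E[Y^k] = γ for every k, so the relative moments have the closed form m_k = γ^{1−k}; this makes Jensen's monotonicity in k, and the increasing cost of higher-order BRM, directly visible:

```python
# Relative moments of the indicator estimator: E[Y^k] = gamma for every k,
# so m_k = gamma^(1-k), which is nondecreasing in k as Jensen predicts.
def m_k(gamma, k):
    return gamma ** (1 - k)

gamma = 1e-3
moments = [m_k(gamma, k) for k in (1, 2, 3, 4)]
print(moments)  # m_1 = 1, then each extra order multiplies by 1/gamma = 1000
```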
SLIDE 13
LE-k
Y(ε) has logarithmic efficiency of order k (LE-k) if
    lim_{ε→0} ln E[Y^k(ε)] / (k ln γ(ε)) = 1.
SLIDE 15
Example: γ(ε) = exp[−η/ε] and E[Y^k(ε)] = q(1/ε) exp[−kη/ε] for some constant η > 0 and a slowly increasing function q (e.g., a polynomial). Then we have LE-k but not BRM-k.
Example: γ^k(ε) = q1(ε) = ε^{t1} + o(ε^{t1}) and E[Y^k(ε)] = q2(ε) = ε^{t2} + o(ε^{t2}), where t2 ≤ t1. Here,
    lim_{ε→0} ln E[Y^k(ε)] / (k ln γ(ε)) = t2/t1.
We have BRM-k iff t2 = t1 iff LE-k.
SLIDE 16 Bounded normal approximation
Berry-Esseen Theorem (one version): Let Y1, ..., Yn be i.i.d. r.v.'s with E[Y1] = 0, Var[Y1] = σ^2, and E[|Y1|^3] = β^3. Let Ȳn and S_n^2 be the empirical mean and variance, and Fn the distribution function of √n Ȳn / S_n. Then there is an absolute constant a < ∞ such that
    sup_{n≥2, x∈R} |Fn(x) − Φ(x)| ≤ a β^3 / (σ^3 √n).
Y(ε) is said to have bounded normal approximation (BNA) if
    lim sup_{ε→0} E[|Y(ε) − γ(ε)|^3] / σ^3(ε) < ∞   (Tuffin 1999).
This requires that the Berry-Esseen bound remains O(n^{−1/2}) when ε → 0.
SLIDE 17
BNA is not equivalent to BRM-3, because we divide by σ^3(ε) here. One can have BNA and not BRM-3, or vice versa.
There are more general versions of the Berry-Esseen inequality that require only a bounded moment of order 2 + δ for any δ ∈ (0, 1] instead of the third moment β^3; see, e.g., Petrov (1995). But then the bound on |Fn(x) − Φ(x)| is only O(n^{−δ/2}) instead of O(n^{−1/2}).
SLIDE 19 Robustness of the empirical variance
Let X1(ε), ..., Xn(ε) be an i.i.d. sample of X(ε). We consider the empirical variance Y(ε) = S_n^2(ε) as an estimator of the variance σ^2(ε). Let γ(ε) = E[X(ε)].
BRM-4 for X(ε) and BRE for S_n^2(ε) are both linked to E[X^4(ε)], but they are not equivalent. We have
    Var[S_n^2(ε)] / σ^4(ε) = Θ( E[(X(ε) − γ(ε))^4] / σ^4(ε) ),
which differs in general from
    Θ( E[X^4(ε)] / γ^4(ε) ).
Proposition: If σ^2(ε) = Θ(γ^2(ε)), then BRM-2k for X(ε) implies BRM-k for S_n^2(ε), for any k ≥ 1.
A similar observation applies to the equivalence between LE-4 for X(ε) and LE for S_n^2(ε).
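A quick simulation sketch of the setting above, with a Bernoulli X and an arbitrary non-rare γ chosen only for illustration: S_n^2 estimates σ^2 = γ(1 − γ), and its own accuracy is governed by the fourth central moment of X.

```python
import random

# Empirical variance S_n^2 of an i.i.d. Bernoulli(gamma) sample as an
# estimator of sigma^2 = gamma * (1 - gamma).
random.seed(1)
gamma = 0.2
n = 10_000
xs = [1.0 if random.random() < gamma else 0.0 for _ in range(n)]
mean = sum(xs) / n
s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)  # unbiased empirical variance
print(mean, s2)  # close to gamma = 0.2 and sigma^2 = 0.16
```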
SLIDE 22
Vanishing relative centered moments
The relative centered moment of order k of Y(ε) is
    c_k(ε) = E[|Y(ε) − γ(ε)|^k] / γ^k(ε).
Y(ε) has a vanishing relative centered moment of order k (VRCM-k) if
    lim sup_{ε→0} c_k(ε) = 0.
This holds if and only if lim sup_{ε→0} m_k(ε) = 1.
It has vanishing relative variance, or vanishing relative error (VRE), if
    lim sup_{ε→0} σ(ε)/γ(ε) = 0.
VRCM-k implies VRCM-k′ for 1 ≤ k′ ≤ k. It also implies BRM-k.
SLIDE 23 VRCM implies convergence to the zero-variance IS
Suppose γ(ε) = E_{Pε}[Y(ε)] = ∫ Y(ε, ω) dPε(ω). Here, we can get zero variance (in principle) by doing importance sampling with the measure Q*_ε defined by
    dQ*_ε(ω) = (Y(ε, ω)/γ(ε)) dPε(ω).
Proposition: If Y(ε) is VRCM-(1+δ) for some δ > 0, then
    lim_{ε→0} sup_A |Pε(A) − Q*_ε(A)| = 0.
SLIDE 24 Proof: For any measurable set A,
    sup_A |Q*_ε(A) − Pε(A)|
    = sup_A |E_{Pε}[(dQ*_ε/dPε) I(A)] − E_{Pε}[I(A)]|
    ≤ E_{Pε}|dQ*_ε/dPε − 1|
    ≤ E_{Pε}^{1/(1+δ)}[ |dQ*_ε/dPε − 1|^{1+δ} ]
    = E_{Pε}^{1/(1+δ)}[ |Y(ε)/γ(ε) − 1|^{1+δ} ]
    = [c_{1+δ}(ε)]^{1/(1+δ)} → 0 as ε → 0.
SLIDE 25
First-Passage Probability in a Markov Chain
Markov chain X = {Xj, j ≥ 0} with state space S and transition kernel K = {K(x, C) : x ∈ S, C ⊆ S}. For C ⊂ S, define τC = inf{j ≥ 0 : Xj ∈ C}. Given A and B, two disjoint subsets of S, and an initial state x0 ∈ (A ∪ B)^c, we want to estimate γ(x0), where
    γ(x) = γ(x, ε) = P[τB < τA | X0 = x].
We have γ(x) = 1 for x ∈ B and γ(x) = 0 for x ∈ A. Here, K, A, and B may depend on ε.
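As a concrete (hypothetical) instance of this setting, the gambler's-ruin chain on {0, ..., m} with A = {0}, B = {m}, and up-probability p has a closed-form γ(x), which a sketch can compare against crude simulation:

```python
import random

# Gambler's-ruin instance of the first-passage setting: states 0..m,
# A = {0}, B = {m}, up-probability p != 1/2.
# gamma(x) = P[tau_B < tau_A | X0 = x] = ((q/p)^x - 1) / ((q/p)^m - 1).
random.seed(7)

def gamma_exact(x, m, p):
    r = (1 - p) / p
    return (r ** x - 1) / (r ** m - 1)

def gamma_sim(x, m, p, n):
    hits = 0
    for _ in range(n):
        s = x
        while 0 < s < m:                       # walk until hitting A or B
            s += 1 if random.random() < p else -1
        hits += (s == m)
    return hits / n

m, p, x0 = 10, 0.4, 5
print(gamma_exact(x0, m, p), gamma_sim(x0, m, p, 20_000))
```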
SLIDE 27 Importance sampling (IS): replace K by another kernel. We get zero variance if we use the kernel
    K*(x, dy) = K(x, dy) γ(y)/γ(x).
Not practical. But it can be used to design a good IS scheme of the form
    K_v(x, dy) = K(x, dy) v(y)/w(x),
where v : S → [0, ∞) is a good approximation of γ(·) and w(x) = ∫ K(x, dy) v(y) is the appropriate normalizing constant, assumed finite for x ∈ (A ∪ B)^c. (When v = γ, we have w = v.)
SLIDE 29 IS estimator of γ(x0): Y = Y(ε) = I[τB < τA] L, where
    L = ∏_{k=1}^{τB} w(X_{k−1})/v(X_k) = (w(X0)/v(X_{τB})) ∏_{k=1}^{τB−1} w(X_k)/v(X_k).
We can take v(x) = 1 for x ∈ B and v(x) = 0 for x ∈ A.
To establish robustness properties such as LE-k, BRM-k, and VRCM-k, we need an asymptotic bound on E[Y^k(ε)]/γ^k(x0, ε).
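To see the kernel K_v and the likelihood ratio L in action, here is a sketch on the hypothetical gambler's-ruin chain (states 0..m, up-probability p): plugging the exact γ in for v makes every replicate return exactly γ(x0), i.e., zero variance.

```python
import random

# Zero-variance IS sketch on the gambler's-ruin chain: with v = gamma,
# w(x) = gamma(x) and L telescopes to gamma(x0) on every sample path.
random.seed(3)

m, p, x0 = 10, 0.4, 5
r = (1 - p) / p
gamma = [(r ** x - 1) / (r ** m - 1) for x in range(m + 1)]  # gamma[0]=0, gamma[m]=1

def is_estimate(v):
    s, L = x0, 1.0
    while 0 < s < m:
        up, down = p * v[s + 1], (1 - p) * v[s - 1]
        w = up + down                       # normalizing constant w(s)
        s_next = s + 1 if random.random() < up / w else s - 1
        L *= w / v[s_next]                  # likelihood-ratio factor w(X_{k-1})/v(X_k)
        s = s_next
    return L if s == m else 0.0

samples = [is_estimate(gamma) for _ in range(1000)]
print(min(samples), max(samples))  # all equal to gamma(x0): zero variance
```

Note that under K_γ the chain never moves toward A (the factor v(s−1) = γ(0) = 0 kills that transition), so every replicate hits B, exactly as under the zero-variance measure.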
SLIDE 30 Proposition: Bounds via Lyapunov inequalities. Suppose there are positive constants κ1 and κ2 and a function h_k : S → [0, ∞) such that v(x) ≥ κ1 and h_k(x) ≥ κ2 for each x ∈ B, and
    E_{v,x}[ (w(x)/v(x))^k h_k(X1) ] ≤ h_k(x)
for all x ∈ (A ∪ B)^c. Then, for all x ∈ (A ∪ B)^c,
    E_{v,x}[Y^k] ≤ v^k(x) h_k(x) / (κ1 κ2).
SLIDE 31
Corollary. Under the proposition's conditions:
(i) If lim_{ε→0} ( k ln v(x0, ε) + ln h_k(x0, ε) ) / (k ln γ(x0, ε)) = 1, then Y(ε) is LE-k.
(ii) If lim sup_{ε→0} [v(x0, ε)/γ(x0, ε)]^k h_k(x0, ε) < ∞, then Y(ε) is BRM-k.
(iii) If lim_{ε→0} [v(x0, ε)/γ(x0, ε)]^k h_k(x0, ε) / (κ1 κ2) = 1, then Y(ε) is VRCM-k.
SLIDE 32
HRMS-type Framework
Suppose S is finite and {Xj, j ≥ 0} has transition probabilities
    p(x, y, ε) = P[Xj = y | Xj−1 = x] = a(x, y) ε^{b(x,y)},
where a(x, y) and b(x, y) are non-negative constants.
Example: a multicomponent highly-reliable Markovian system (HRMS): A = {x0} is the single state where all components are up, and B is the set of states where the system is failed. We typically have b(x, y) > 0 for "failure" transitions and b(x, y) = 0 for "repair" transitions (Shahabuddin, Nakayama, ...). In this setting, γ(ε) = Θ(ε^r) for some r > 0.
SLIDE 36 Some proposed IS heuristics:
Balanced failure biasing (BFB) (Shahabuddin 1994) changes p to q as follows, for x ∉ B, where F(x) and R(x) denote the sets of failure and repair transitions out of x, pR(x) is the total repair probability, and ρ ∈ (0, 1) is a fixed constant:
    q(x, y) = 1/|F(x)|                  if y ∈ F(x) and pR(x) = 0;
    q(x, y) = ρ/|F(x)|                  if y ∈ F(x) and pR(x) > 0;
    q(x, y) = (1 − ρ) p(x, y)/pR(x)     if y ∈ R(x).
Simple failure biasing (SFB) (Shahabuddin 1988): replace 1/|F(x)| above by p(x, y) / Σ_{y′∈F(x)} p(x, y′).
SBLR (Alexopoulos and Shultes 2001) changes the probabilities in such a way that over any cycle in the states visited during the simulation, the cumulated likelihood ratio remains bounded.
These methods do not attempt to mimic zero-variance sampling.
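A minimal sketch of the BFB change of measure for one hypothetical state, with made-up transition probabilities (the identifiers f1, f2, r1, r2 are purely illustrative):

```python
# Balanced failure biasing for a single state x with pR(x) > 0: the failure
# transitions get total probability rho, split uniformly; the repair
# transitions share 1 - rho proportionally to their original probabilities.
rho = 0.5
p_fail = {'f1': 1e-3, 'f2': 2e-3}     # hypothetical failure-transition probs
p_repair = {'r1': 0.6, 'r2': 0.397}   # hypothetical repair-transition probs
pR = sum(p_repair.values())           # pR(x) > 0: second and third cases apply

q = {y: rho / len(p_fail) for y in p_fail}
q.update({y: (1 - rho) * p / pR for y, p in p_repair.items()})
print(q)  # failure transitions now have probability 0.25 each
```

The rare failure transitions are inflated from O(ε) to a constant, which is exactly what makes the failure paths reachable in a reasonable number of runs.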
SLIDE 39 Proposed approximation (ZVA)
Approximate γ by some easily computable function v, and plug it into the zero-variance formula.
For any state y ∉ B, let Γ(y) be the set of all paths π = (y = y0 → y1 → ··· → yk) where y1, ..., y_{k−1} ∉ B ∪ {0}, yk ∈ B, and with positive probability
    p(π) = ∏_{j=1}^{k} p(y_{j−1}, y_j) > 0.
Because these paths represent disjoint events, we have γ(y) = Σ_{π∈Γ(y)} p(π).
This last sum may contain a huge (perhaps infinite) number of terms.
SLIDE 41 A very crude approximation is to just take the path with the largest probability, i.e., approximate γ(y) = Σ_{π∈Γ(y)} p(π) by its lower bound
    v0(y) = max_{π∈Γ(y)} p(π).
Computing v0(y) amounts to computing a shortest path from y to B, where the length of a link y′ → y″ is −log p(y′, y″). Easy.
This would work fine if a single path dominated the sum (this may happen when failure transitions have very small probabilities), but this v0 may underestimate γ(y) significantly. Slight improvement: take the sum of the probabilities of a few disjoint paths.
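The shortest-path computation can be sketched with a standard Dijkstra on −log probabilities; the graph below is a made-up toy example with two routes to a failure state 'f'.

```python
import heapq
import math

# v0(y) = max over paths of p(pi), computed as a shortest path with edge
# lengths -log p(y', y''). 'graph' maps each state to (next_state, prob) pairs.
def v0(graph, source, B):
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, x = heapq.heappop(heap)
        if x in B:
            return math.exp(-d)          # convert length back to a probability
        if d > dist.get(x, math.inf):
            continue                     # stale heap entry
        for y, prob in graph.get(x, []):
            nd = d - math.log(prob)
            if nd < dist.get(y, math.inf):
                dist[y] = nd
                heapq.heappush(heap, (nd, y))
    return 0.0                           # B unreachable from source

# Toy chain: two disjoint routes to the failure state 'f'.
graph = {'a': [('b', 0.1), ('c', 0.9)], 'b': [('f', 0.5)], 'c': [('f', 0.01)]}
print(v0(graph, 'a', {'f'}))  # max(0.1*0.5, 0.9*0.01) = 0.05
```

Note that the most probable path (through 'b') is not the one starting with the most probable first step, which is why the shortest-path search is needed.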
SLIDE 43 Refinements
Typically, the farther we are from B, the more v0 underestimates γ. Close to B, things are fine, but not close to state 0. First simple correction:
- 1. Estimate γ(0) in preliminary runs with a crude IS strategy;
- 2. Find the constant α ≤ 1 such that (v0(0))^α equals this estimate;
- 3. Use v1(y) = (v0(y))^α as the approximation of γ(y).
This v1 matches γ for y ∈ B and matches its estimate at y = 0.
Second refinement: replace α by a state-dependent exponent
    α(y) = 1 + [α(0) − 1] log v0(y) / log v0(0),
where α(0) = α as above. This α(y) changes progressively from 1 near B to α(0) < 1 in state 0. The correction here is milder than in the previous case when we are close to B. Let v2(y) = (v0(y))^{α(y)} be the resulting approximation.
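The two corrections can be sketched numerically; all numbers below (the value of v0(0) and the preliminary estimate of γ(0)) are hypothetical.

```python
import math

# Sketch of the two corrections: v1 uses a single exponent alpha(0) fitted at
# state 0, while v2 interpolates the exponent from 1 near B to alpha(0) at 0.
v0_at_0 = 1e-12          # crude path-based approximation at state 0 (made up)
gamma0_est = 1e-10       # preliminary IS estimate of gamma(0) (made up)

alpha0 = math.log(gamma0_est) / math.log(v0_at_0)   # (v0(0))^alpha0 = estimate

def v1(v0_y):
    return v0_y ** alpha0

def alpha(v0_y):
    return 1 + (alpha0 - 1) * math.log(v0_y) / math.log(v0_at_0)

def v2(v0_y):
    return v0_y ** alpha(v0_y)

print(alpha0, v1(v0_at_0), v2(v0_at_0))  # both corrections hit gamma0_est at state 0
```

Near B, v0(y) is close to 1, so α(y) is close to 1 and v2 barely alters v0, exactly the "milder correction close to B" described above.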
SLIDE 45 Example: Three types of components
- c = 3 component types, with n1 = n2 = n3 components of each type.
- Exponential repair times with mean 1.
- Failure rate λi for component type i, with λ1 = ε, λ2 = 1.5ε, and λ3 = 2ε^2, for some small real number ε. We will try different values of (ni, ε).
- B = the set of states where at least one component type has fewer than 2 components in operation.
To define v0(y), we consider all three paths to B that result from failures of a single component type, and sum their probabilities.
The table contains results with n = 2^20 runs. The best estimate of γ(0) was obtained from a very large number of runs.
SLIDE 46

Mean:
ni   ε      γ(0)           v0(0)          BFB             SBLR
3    0.001  2.6 × 10^−3    1.3 × 10^−3    2.7 × 10^−3     2.6 × 10^−3
6    0.01   1.8 × 10^−7    3.4 × 10^−8    1.9 × 10^−7     [9.9 × 10^−7]
6    0.001  1.7 × 10^−11   3.4 × 10^−12   1.8 × 10^−11    (1.8 × 10^−16)
12   0.1    6.0 × 10^−8    3.2 × 10^−9    4.8 × 10^−8     1.3 × 10^−8
12   0.001  3.9 × 10^−28   3.5 × 10^−29   (1.8 × 10^−40)  (2.9 × 10^−45)

Variance:
ni   ε      BFB             SBLR
3    0.001  1.8 × 10^−2     8.0 × 10^−3
6    0.01   6.3 × 10^−11    (4.5 × 10^−16)
6    0.001  8.8 × 10^−19    (2.0 × 10^−26)
12   0.1    8.1 × 10^−10    1.7 × 10^−10
12   0.001  (3.2 × 10^−74)  (3.5 × 10^−84)
SLIDE 47

Mean:
ni   ε      γ(0)           ZVA(v0)        ZVA(v1)        ZVA(v2)
3    0.001  2.6 × 10^−3    2.6 × 10^−3    2.6 × 10^−3    2.6 × 10^−3
6    0.01   1.8 × 10^−7    1.8 × 10^−7    1.8 × 10^−7    1.8 × 10^−7
6    0.001  1.7 × 10^−11   1.7 × 10^−11   1.7 × 10^−11   1.7 × 10^−11
12   0.1    6.0 × 10^−8    6.0 × 10^−8    6.2 × 10^−8    6.7 × 10^−8
12   0.001  3.9 × 10^−28   3.9 × 10^−28   3.9 × 10^−28   3.9 × 10^−28

Variance:
ni   ε      α      ZVA(v0)        ZVA(v1)        ZVA(v2)        RE(v2)
3    0.001  0.906  6.5 × 10^−4    2.7 × 10^−3    9.3 × 10^−9    0.04
6    0.01   0.903  2.0 × 10^−14   1.2 × 10^−14   7.7 × 10^−15   0.48
6    0.001  0.939  1.2 × 10^−23   1.1 × 10^−23   7.6 × 10^−24   0.16
12   0.1    0.851  1.6 × 10^−10   2.9 × 10^−10   1.5 × 10^−11   64.50
12   0.001  0.963  1.4 × 10^−55   9.3 × 10^−56   9.4 × 10^−56   0.78
SLIDE 50
- Proposition. In this HRMS framework, with SFB, BRM-k and LE-k are equivalent. They are also equivalent for the g-th empirical moment.
- Proposition. Suppose we have an IS scheme such that p(x, y, ε) = Θ(ε^d) implies q(x, y, ε) = Θ(ε^ℓ) for some ℓ ≤ d. Let E[Y^g(ε)] = Θ(ε^{s_g}). Then we have BRM-k of the g-th empirical moment if and only if, for all integers m such that r ≤ m < k s_g and all sample paths (x0, ..., xn) leading to B and having probability Θ(ε^m),
    P*{(X0, ..., Xτ) = (x0, ..., xn)} = Θ(ε^ℓ)
for some ℓ ≤ k(mg − s_g)/(kg − 1).
- Proposition. With SFB or BFB, one cannot achieve VRCM-k.
SLIDE 52 ZVA
- Proposition. With our ZVA scheme, if we just take v(y) as the probability of the most probable path to failure from y, then v(y) = Θ(γ(y)) for all y and we have BRE.
- Proposition. If v(y)/γ(y) → 1 for each y when ε → 0, then we have VRCM-k. This holds if v(y) is the sum of the probabilities of all the dominant paths from y (those with the smallest power of ε).
SLIDE 53
Probability of a Large Deviation for a Random Walk
Let D1, D2, . . . be i.i.d. random variables, and Sj = D1 + · · · + Dj for j ≥ 0. Let ℓ > E[Dj], n = n(ε) = ⌈1/ε⌉, and γ(ε) = P[Sn/n ≥ ℓ]. We have γ(ε) → 0 when ε → 0.
SLIDE 55 State-Independent Exponential Twisting
Here, it is well known that an LE-2 estimator can be obtained via IS with exponential twisting, if Dj has a light-tailed distribution. Idea: multiply the density of Dj by e^{θx} and normalize. The IS estimator is
    Y(θ, ε) = I[Sn ≥ nℓ] · L(θ, Sn).
Let θ*_ℓ be a positive root of Ψ′(θ) = ℓ, where Ψ is the cumulant generating function of Dj. Let I(ℓ) = ℓθ*_ℓ − Ψ(θ*_ℓ), the large-deviation rate function.
Results of Sadowsky (1993) directly imply the following:
- Proposition. For any integer k ≥ 2 and any θ, Y(θ, ε) is not BRM-k. It is LE-k if and only if θ = θ*_ℓ.
- Proposition. The sample variance of m copies of Y(θ, ε), as an estimator of the true variance, is not BRM-k. It is LE-k if and only if θ = θ*_ℓ.
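A sketch of exponential twisting with standard normal increments, so Ψ(θ) = θ²/2, θ*_ℓ = ℓ, and L(θ, Sn) = exp(−θSn + nΨ(θ)); the parameter values and sample sizes below are purely illustrative.

```python
import math
import random

# Exponential twisting for P[S_n/n >= ell] with D_j ~ N(0, 1):
# sample D_j ~ N(theta, 1) under IS and weight each hit by
# L(theta, S_n) = exp(-theta*S_n + n*theta^2/2).
random.seed(11)

def twisted_estimate(n, ell, theta, runs):
    total = 0.0
    for _ in range(runs):
        s = sum(random.gauss(theta, 1.0) for _ in range(n))
        if s >= n * ell:
            total += math.exp(-theta * s + n * theta * theta / 2.0)
    return total / runs

n, ell = 20, 1.0   # gamma = P[N(0,1) >= sqrt(20)], roughly 4e-6
est = twisted_estimate(n, ell, theta=ell, runs=20_000)
print(est)
```

With the optimal twist θ = θ*_ℓ = ℓ, a few thousand runs suffice for a probability near 10^−6 that standard MC could not touch; with any other θ the estimator loses even logarithmic efficiency, per the propositions above.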
SLIDE 57
A State-Dependent IS Scheme Can Achieve BRM-k
We know that for n → ∞,
    γ(ε) = P[Sn ≥ nℓ] = ( exp[−nI(ℓ)] / ( [2πnΨ″(θ*_ℓ)]^{1/2} θ*_ℓ ) ) [1 + o(1)].
When the Markov chain is in state Xj = (j, Sj) = (j, s) = x, we can approximate γ(x) = γ(j, s) = P[Sn − Sj ≥ nℓ − s] by
    v(j, s) = exp[−(n − j)I(ℓj)] / ( [2π(n − j)Ψ″(θ*_{ℓj})]^{1/2} θ*_{ℓj} ),
where ℓj = (nℓ − s)/(n − j). Then we use the corresponding zero-variance approximation. By verifying the Lyapunov conditions, we can show that this IS estimator has BRM-k for all k.
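A numeric sketch of v(j, s) for standard normal increments (so Ψ(θ) = θ²/2, θ*_ℓ = ℓ, I(ℓ) = ℓ²/2, Ψ″ = 1); comparing v(0, 0) with the exact tail P[Sn ≥ nℓ] shows the approximation is already close at moderate n.

```python
import math

# State-dependent approximation v(j, s) for standard normal increments.
def v(j, s, n, ell):
    lj = (n * ell - s) / (n - j)   # remaining slope needed to reach n*ell
    return math.exp(-(n - j) * lj * lj / 2.0) / (math.sqrt(2 * math.pi * (n - j)) * lj)

def normal_tail(z):                # exact P[N(0,1) >= z]
    return 0.5 * math.erfc(z / math.sqrt(2.0))

n, ell = 100, 0.5
approx = v(0, 0, n, ell)
exact = normal_tail(math.sqrt(n) * ell)   # S_n ~ N(0, n)
print(approx, exact)  # agree within a few percent at n = 100
```

As the chain advances, ℓj tracks the remaining distance to the boundary, so the twist adapts at every step; this state dependence is what upgrades LE-k to BRM-k.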
SLIDE 58
Increments With Heavy Tails
Random walk with negative drift; probability that the walk ever exceeds a given positive threshold. One can obtain state-dependent IS having BRM-k for all k.