SLIDE 1

On Stein’s Method for Infinite Dimensional Gaussian Approximation

Hsin-Hung Shih
Department of Applied Mathematics, National University of Kaohsiung, Kaohsiung, Taiwan

Joint work with Y.-J. Lee

SLIDE 2

Contents

  • Motivation
  • Stein’s lemma for abstract Wiener measures
  • Stein’s equation for abstract Wiener measures
  • Application: A basic central limit theorem

SLIDE 3

Part I. Motivation

  • In 1972, Charles Stein introduced a powerful method, now known as Stein’s method, for estimating the distance from a probability distribution on $\mathbb{R}$ to a Gaussian distribution.

An Outline of Stein’s Method

[Step 1.] Stein began with the following observation.

Stein’s Lemma: Let $Z$ be a real-valued random variable. Then $Z$ has a standard normal distribution if and only if
$$E[f'(Z)] = E[Zf(Z)]$$
for all continuous and piecewise continuously differentiable functions $f : \mathbb{R} \to \mathbb{R}$ with $E[|f'(Z)|] < +\infty$.

Hence the operator $A$ defined by $Af(x) = f'(x) - xf(x)$ is a characterizing operator: $X \sim Z$ if and only if $E[Af(X)] = 0$ for all smooth functions $f$.

[Step 2.] Construct the Stein equation
$$Af(x) = f'(x) - xf(x) = h(x) - E[h(Z)]$$
for any bounded function $h$ on $\mathbb{R}$, and find a solution of this equation. Stein observed that the function

SLIDE 4

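$$f_h(x) = e^{x^2/2}\int_{-\infty}^{x}\left\{h(t) - E[h(Z)]\right\} e^{-t^2/2}\, dt$$

satisfies such an equation.

[Step 3.] Then, for any class $\mathcal{H}$ of (bounded) test functions $h$, it follows that
$$d_{\mathcal{H}}(W, Z) \equiv \sup_{h\in\mathcal{H}}\left|E[h(W)] - E[h(Z)]\right| = \sup_{h\in\mathcal{H}}\left|E[f_h'(W) - Wf_h(W)]\right|.$$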
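As a numerical sanity check of Steps 1-3, the following Python sketch (our illustration, not part of the slides) evaluates expectations under $N(0,1)$ by the trapezoidal rule, checks Stein's lemma for $f = \sin$, and verifies that $f_h$ from Step 2 solves $f'(x) - xf(x) = h(x) - E[h(Z)]$ for the test function $h = \cos$, for which $E[h(Z)] = e^{-1/2}$. Truncating the real line to $[-10, 10]$, the grid sizes and the finite-difference step are assumptions chosen for illustration.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def trapezoid(g, a, b, n):
    """Composite trapezoidal rule for the integral of g over [a, b] with n panels."""
    step = (b - a) / n
    total = 0.5 * (g(a) + g(b))
    for k in range(1, n):
        total += g(a + k * step)
    return total * step


def normal_expect(g):
    """E[g(Z)] for Z ~ N(0, 1); the real line is truncated to [-10, 10]."""
    density = lambda z: g(z) * math.exp(-0.5 * z * z)
    return trapezoid(density, -10.0, 10.0, 2000) / SQRT_2PI


def stein_solution(h, mean_h, x):
    """f_h(x) = e^{x^2/2} * int_{-inf}^{x} (h(t) - E[h(Z)]) e^{-t^2/2} dt, lower limit cut at -10."""
    integrand = lambda t: (h(t) - mean_h) * math.exp(-0.5 * t * t)
    return math.exp(0.5 * x * x) * trapezoid(integrand, -10.0, x, 20000)


def stein_residual(h, x, delta=1e-3):
    """f_h'(x) - x f_h(x) - (h(x) - E[h(Z)]), with f_h' from a central difference."""
    mean_h = normal_expect(h)
    f = lambda y: stein_solution(h, mean_h, y)
    derivative = (f(x + delta) - f(x - delta)) / (2.0 * delta)
    return derivative - x * f(x) - (h(x) - mean_h)


# Stein's lemma with f = sin: E[f'(Z)] = E[cos Z] and E[Z f(Z)] = E[Z sin Z] should agree.
lhs = normal_expect(math.cos)
rhs = normal_expect(lambda z: z * math.sin(z))
print(lhs, rhs, math.exp(-0.5))

# The Stein equation residual should be close to zero at every x.
for x in (-1.5, 0.0, 0.7, 2.0):
    print(x, stein_residual(math.cos, x))
```

Replacing $N(0,1)$ by a non-normal law with the same variance (for instance a uniform law) makes the two sides of Stein's lemma differ, which is what makes the identity a characterization.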

Remark 1.

  • When $\mathcal{H} = \{h : \mathbb{R} \to \mathbb{R} : \|h\|_{ULip} = \sup_{x \neq y}|h(x) - h(y)|/|x - y| \le 1\}$, $d_{\mathcal{H}}$ is called the Wasserstein distance.
  • When $\mathcal{H} = \{1_{(-\infty, z]} : z \in \mathbb{R}\}$, $d_{\mathcal{H}}$ is called the Kolmogorov distance.

Reference.

  • C. Stein, Approximate Computation of Expectations, IMS Lecture Notes, Monograph Series, vol. 7, Institute of Mathematical Statistics, Hayward, CA, 1986.
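To make the Kolmogorov distance of Remark 1 concrete, the sketch below computes $d_{\mathcal{H}}(W, Z)$ exactly for a standardized symmetric binomial variable $W = (S_n - n/2)/\sqrt{n/4}$ with $S_n \sim \mathrm{Binomial}(n, 1/2)$; this choice of $W$ is ours, for illustration only. Because $W$ is discrete and $\Phi$ is increasing, the supremum over $z$ is attained at an atom of $W$, approached either from the left or at the atom itself.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kolmogorov_binomial(n):
    """Exact d_K(W, Z) for W = (S_n - n/2) / sqrt(n/4), S_n ~ Binomial(n, 1/2).

    F_W is a step function and Phi is increasing, so sup_z |F_W(z) - Phi(z)| is
    attained at an atom z_k, either as F_W(z_k-) - Phi(z_k) or F_W(z_k) - Phi(z_k).
    """
    sd = math.sqrt(n / 4.0)
    cdf_left = 0.0                  # F_W(z_k-) = P(S_n < k)
    distance = 0.0
    for k in range(n + 1):
        z = (k - n / 2.0) / sd
        p_k = math.comb(n, k) / 2.0 ** n
        phi = normal_cdf(z)
        distance = max(distance, abs(cdf_left - phi), abs(cdf_left + p_k - phi))
        cdf_left += p_k
    return distance


for n in (10, 100, 1000):
    d = kolmogorov_binomial(n)
    print(n, d, math.sqrt(n) * d)
```

The product $\sqrt{n}\, d_{\mathcal{H}}(W, Z)$ stays roughly constant in $n$, i.e. the distance decays at the $n^{-1/2}$ rate familiar from the Berry-Esseen theorem.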

SLIDE 5
  • In order to extend Stein’s method to Wiener process or other Markov process approximation, A. D. Barbour introduced the generator method.

An Outline of Barbour’s Generator Method

[Step 1.] If $\mu$ is the stationary distribution of a Markov process, then $X \sim \mu$ if and only if $E[Af(X)] = 0$ for all real-valued functions $f$ for which $Af$ is defined, where $A$ is the infinitesimal generator of this Markov process.

[Step 2.] Since
$$T_t h - h = A\left(\int_0^t T_u h\, du\right),$$
where $T_t f(x) = E[f(X_t) \mid X(0) = x]$ is the transition operator of the Markov process, formally taking limits yields
$$\int h\, d\mu - h = A\left(\int_0^\infty T_u h\, du\right),$$
if the right-hand side exists.

Remark 2. Barbour’s generator method gives both a Stein equation
$$Af = h - \int h\, d\mu$$
and a candidate for its solution, $-\int_0^\infty T_u h\, du$.

SLIDE 6

Example 3. The operator $Ah(x) = h''(x) - xh'(x)$ is the generator of the Ornstein-Uhlenbeck process with stationary distribution $\mu = N(0, 1)$. Putting $f = h'$ gives the classical Stein characterization for $N(0, 1)$.

References.

  • 1. A. D. Barbour, Stein’s method and Poisson process convergence, J. Appl. Probab. 25(A) (1988), 175-184.
  • 2. A. D. Barbour, Stein’s method for diffusion approximation, Probab. Theory Related Fields 84 (1990), 297-322.
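Example 3 and Remark 2 can be checked numerically for $\mu = N(0, 1)$. The Python sketch below (an illustration; the test function $h(x) = x^3$, for which $E[h(Z)] = 0$, the quadrature grids and the cutoff $u \le 20$ are our choices) evaluates $T_t h$ by the Mehler formula $T_t h(x) = E[h(e^{-t}x + \sqrt{1 - e^{-2t}}\,Z)]$, checks $\frac{d}{dt}T_t h = A(T_t h)$ for $Ah = h'' - xh'$, and compares Barbour's candidate $-\int_0^\infty T_u h\, du$ with the closed form $f(x) = -(x^3/3 + 2x)$, which satisfies $Af = x^3 = h - E[h(Z)]$.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def normal_expect(g, a=10.0, n=400):
    """E[g(Z)] for Z ~ N(0, 1), by the trapezoidal rule on [-a, a]."""
    step = 2.0 * a / n
    total = 0.0
    for k in range(n + 1):
        z = -a + k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * g(z) * math.exp(-0.5 * z * z)
    return total * step / SQRT_2PI


def ou_semigroup(h, x, t):
    """Mehler formula: T_t h(x) = E[h(e^{-t} x + sqrt(1 - e^{-2t}) Z)]."""
    a, b = math.exp(-t), math.sqrt(1.0 - math.exp(-2.0 * t))
    return normal_expect(lambda z: h(a * x + b * z))


def generator(g, x, dx=1e-3):
    """A g(x) = g''(x) - x g'(x), by central differences."""
    first = (g(x + dx) - g(x - dx)) / (2.0 * dx)
    second = (g(x + dx) - 2.0 * g(x) + g(x - dx)) / dx ** 2
    return second - x * first


def simpson(g, a, b, n=400):
    """Composite Simpson rule on [a, b]; n must be even."""
    step = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * g(a + k * step)
    return total * step / 3.0


h = lambda y: y ** 3        # test function with E[h(Z)] = 0
x, t, dt = 0.8, 0.5, 1e-4

# Example 3: d/dt T_t h(x) agrees with A(T_t h)(x).
lhs = (ou_semigroup(h, x, t + dt) - ou_semigroup(h, x, t - dt)) / (2.0 * dt)
rhs = generator(lambda y: ou_semigroup(h, y, t), x)
print(lhs, rhs)

# Remark 2: the candidate -int_0^inf T_u h du (cut at u = 20) versus its closed form.
f_candidate = -simpson(lambda u: ou_semigroup(h, x, u), 0.0, 20.0)
f_closed = lambda y: -(y ** 3 / 3.0 + 2.0 * y)
print(f_candidate, f_closed(x))
print(generator(f_closed, x), h(x))   # A f = h - E[h(Z)]
```

The closed form follows from $T_u h(x) = e^{-3u}x^3 + 3e^{-u}(1 - e^{-2u})x$, whose integral over $u \in [0, \infty)$ is $x^3/3 + 2x$.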

SLIDE 7

Part II. Stein’s Lemma for Abstract Wiener Measures

Review of Abstract Wiener Space

  • L. Gross introduced the notion of abstract Wiener space in the following paper:
  • L. Gross, Abstract Wiener spaces, Proc. 5th Berkeley Symp. Math. Statist. Probab., vol. 2 (1965), 31-42.
  • Let $H$ be a given real separable Hilbert space with the norm $|\cdot|_H$ induced by the inner product $\langle\cdot,\cdot\rangle$, and let $\|\cdot\|$ be another norm defined on $H$ which is weaker than the $|\cdot|_H$-norm.
  • Let $\mu$ be the Gauss cylinder set measure in $H$, i.e., the non-negative set function defined on the collection of cylinder sets such that if $E = \{x \in H : Px \in F\}$, then
$$\mu(E) = \left(\frac{1}{\sqrt{2\pi}}\right)^{n}\int_F e^{-|x|_H^2/2}\, dx,$$
where $P$ is a finite-dimensional orthogonal projection, $F$ is a Borel subset of $PH$, $n = \dim PH$, and $dx$ is the Lebesgue measure on $PH$.
  • Let $\mathcal{F}$ be the partially ordered set of finite-dimensional orthogonal projections $P$ of $H$, where $P > Q$ means $Q(H) \subset P(H)$ for $P, Q \in \mathcal{F}$.
  • If the $\|\cdot\|$-norm is measurable on $H$, i.e., for every $\epsilon > 0$ there exists $P_0 \in \mathcal{F}$ such that $\mu\{\|Px\| > \epsilon\} < \epsilon$ for all $P \in \mathcal{F}$ with $P \perp P_0$, then the triple $(i, H, B)$ is called an abstract Wiener space, where $B$ is the completion of $H$ with respect to the $\|\cdot\|$-norm and $i$ is the canonical embedding of $H$ into $B$.
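The following Python sketch (an illustration under our own assumptions) contrasts the two norms on a finite-dimensional projection $PH \cong \mathbb{R}^N$ carrying the Gauss cylinder measure, i.e. a standard Gaussian vector. The weaker norm is taken to be $\|x\| = |Ax|_H$ with the hypothetical Hilbert-Schmidt operator $A = \mathrm{diag}(1, 1/2, 1/3, \ldots)$. The mean of $|x|_H^2$ equals $N$ and grows without bound, while the mean of $\|x\|^2$ equals $\sum_{k \le N} k^{-2} < \pi^2/6$ uniformly in $N$; this uniform control over projections is what the measurability condition above asks for, and it is why the $|\cdot|_H$-norm itself cannot serve as $\|\cdot\|$.

```python
import math
import random


def mean_square_norms(dim, n_samples=2000, seed=0):
    """Monte Carlo means of |x|_H^2 and ||x||^2 = |A x|_H^2 for x ~ N(0, I_dim).

    A = diag(1, 1/2, ..., 1/dim) is a hypothetical Hilbert-Schmidt operator.
    """
    rng = random.Random(seed)
    weights = [1.0 / (k * k) for k in range(1, dim + 1)]  # squared diagonal entries of A
    h_total = weak_total = 0.0
    for _ in range(n_samples):
        xs = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        h_total += sum(x * x for x in xs)
        weak_total += sum(w * x * x for w, x in zip(weights, xs))
    return h_total / n_samples, weak_total / n_samples


for dim in (50, 200):
    h_sq, weak_sq = mean_square_norms(dim)
    exact_weak = sum(1.0 / (k * k) for k in range(1, dim + 1))
    print(dim, h_sq, weak_sq, exact_weak, math.pi ** 2 / 6)
```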

SLIDE 8
  • As $H$ is identified with a dense subspace of $B$, we identify the dual space $B^*$ of $B$ with a dense subspace of $H^* \approx H \subset B$ via the adjoint operator $i^*$ of $i$ as follows: for $x \in H$ and $\eta \in B^*$, $\langle x, i^*(\eta)\rangle = (i(x), \eta)$, where $(\cdot,\cdot)$ is the $B$-$B^*$ pairing.
  • Fact. $B$ carries a probability measure $p_t$ on $\mathcal{B}(B)$ such that for any $\eta \in B^*$,
$$\int_B e^{i(x,\eta)}\, p_t(dx) = e^{-\frac{t}{2}|\eta|_H^2}.$$
$p_t$ is called the abstract Wiener measure in $B$ with variance parameter $t > 0$. Thus $(\cdot,\eta)$ is a random variable over $(B, p_t)$ with mean $0$ and variance $t|\eta|_H^2$.
  • For any $h \in H$, let $\{\eta_n\}$ be a sequence in $B^*$ such that $|\eta_n - h|_H \to 0$ as $n \to \infty$. Then $\{(\cdot,\eta_n)\}$ forms a Cauchy sequence in $L^2(B, p_t)$, whose $L^2(B, p_t)$-limit is denoted by $\tilde{h}$. One notes that $\tilde{h}$ is independent of the choice of $\{\eta_n\}$ and that $\tilde{h}$ is distributed according to $N(0, t|h|_H^2)$.

Reference. H.-H. Kuo, Gaussian Measures in Banach Spaces, Lecture Notes in Math., vol. 463, Springer-Verlag, Berlin/New York, 1975.

SLIDE 9

Stein’s Lemma for Abstract Wiener Measures

  • $B$ is a real separable Banach space with the $\|\cdot\|$-norm;
  • $Z$ is a fixed Gaussian $B$-valued random variable with mean $0$, which means that the distribution $\mu_Z \equiv P(\{Z \in \cdot\})$ of $Z$ is a probability measure on $(B, \mathcal{B}(B))$ such that for each $\eta \in B^*$ the random variable $(\cdot,\eta)$ has a normal distribution with mean $0$ with respect to $\mu_Z$, where $(\cdot,\cdot)$ is the $B$-$B^*$ pairing.

Without loss of generality, we may assume that $\mu_Z$ is non-degenerate, that is, every non-empty open subset of $B$ has positive $\mu_Z$-measure. If not, we replace $B$ by the support of $\mu_Z$.

Kuelbs’s Theorem. Suppose $\mu$ is a Gaussian measure in a real separable Banach space $B$. Assume that every non-empty open subset of $B$ has positive $\mu$-measure. Then there exists a real separable Hilbert space $H$ such that $(i, H, B)$ is an abstract Wiener space and $\mu$ equals $p_1$.

By Kuelbs’s theorem, there exists a real separable Hilbert space $H$ such that $(i, H, B)$ is an abstract Wiener space and $\mu_Z$ is the abstract Wiener measure on $B$ with variance parameter $1$. Now, following the basic idea of Barbour’s generator method, we construct our infinite-dimensional setting for Gaussian approximation as follows.

[The first step] Look for a characterizing operator $A_Z$ for $\mu_Z$. Such an operator is defined on a sufficiently large class $D$ of complex-valued functions on $B$ such that a $B$-valued random variable $Y$ has the same distribution as $Z$ if and only if $E[(A_Z f)(Y)] = 0$ for all $f \in D$.

SLIDE 10

For each $t \ge 0$, let $O_t$ be the mapping from $B \times \mathcal{B}(B)$ into $[0, 1]$ given by
$$O_t(x, E) \equiv p_{1-e^{-2t}}(e^{-t}x, E) = \int_B 1_E\left(e^{-t}x + \sqrt{1 - e^{-2t}}\, y\right) \mu_Z(dy).$$

Then, for each $E \in \mathcal{B}(B)$, the mapping $x \in B \mapsto O_t(x, E)$ is $\mathcal{B}(B)$-measurable, and, for each $x \in B$, $\{O_t(x, \cdot); t \ge 0\}$ is a family of probability measures on $\mathcal{B}(B)$ satisfying the Chapman-Kolmogorov equations:
$$\int_B O_s(y, E)\, O_t(x, dy) = O_{s+t}(x, E), \quad \forall\, s, t \ge 0.$$

Thus $\{O_t(\cdot,\cdot); t \ge 0\}$ forms a temporally homogeneous Markov transition family. It is well known that there exist a probability measure $\Lambda_a$ on a measurable space $(\Omega, \mathcal{F})$ and a $B$-valued process $\Theta = \{\Theta(t); t \ge 0\}$ on that space which is a temporally homogeneous Markov process such that $\Lambda_a(\{\Theta(t) \in dy\}) = O_t(a, dy)$, with transition probability
$$\Lambda_a(\Theta(t) \in dy \mid \Theta(s)) = O_{t-s}(\Theta(s; \cdot), dy), \quad \forall\, 0 \le s \le t.$$
We call such a process $\Theta$ a canonical $B$-valued Ornstein-Uhlenbeck process starting at the point $a$.

[The second step] For any $h \in X_0$ ($X_0$ is some space of test functions), find a function $f_h \in D$ solving the so-called Stein equation $A_Z f = h - E[h(Z)]$.

For each $\mathcal{B}(B)$-measurable function $f$ and $t \ge 0$, let $T_t f(x) = E[f(\Theta(t)) \mid \Theta(0) = x]$, $x \in B$, with respect to $\Lambda_a$. Then
$$T_t f(x) \equiv \int_B f(y)\, O_t(x, dy) = \int_B f\left(e^{-t}x + \sqrt{1 - e^{-2t}}\, y\right) \mu_Z(dy), \quad t \ge 0,$$

SLIDE 11

provided that such an integral exists.

Fact A. The family $\{T_t; t \ge 0\}$ is a strongly continuous contraction semigroup on $L^\alpha_c(B, \mu_Z)$ for any $1 \le \alpha \le \infty$.

Fact B. $\mu_Z$ is the unique invariant measure for $\{O_t(\cdot,\cdot); t \ge 0\}$, i.e.,
$$\int_B O_t(x, E)\, \mu_Z(dx) = \mu_Z(E), \quad E \in \mathcal{B}(B),\ t \ge 0.$$

Let $L_\alpha$ be the infinitesimal generator of $\{T_t; t \ge 0\}$ on $L^\alpha_c(B, \mu_Z)$.
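In finite dimensions these facts can be checked by hand. If $B = \mathbb{R}^d$ and $\mu_Z = N(0, S)$ (an assumption made only for this sketch), then $O_t(x, \cdot) = N(e^{-t}x, (1 - e^{-2t})S)$, and one step of $O_t$ maps a Gaussian law $N(m, cS)$ to $N(e^{-t}m, (e^{-2t}c + 1 - e^{-2t})S)$. Tracking only the mean vector and the scalar $c$, the Python sketch below checks the Chapman-Kolmogorov equations and the invariance of $\mu_Z$ stated in Fact B.

```python
import math


def ou_step(mean, cov_scale, t):
    """Push the law N(mean, cov_scale * S) forward through the kernel O_t.

    If Y ~ N(mean, c S) and Y' = e^{-t} Y + sqrt(1 - e^{-2t}) Z' with Z' ~ N(0, S)
    independent of Y, then Y' ~ N(e^{-t} mean, (e^{-2t} c + 1 - e^{-2t}) S).
    The covariance S itself never needs to be stored.
    """
    a = math.exp(-t)
    return [a * m for m in mean], a * a * cov_scale + (1.0 - a * a)


x = [1.0, -2.0, 0.5]                       # starting point: the law N(x, 0 * S) = delta_x
two_steps = ou_step(*ou_step(x, 0.0, 0.3), 0.7)
one_step = ou_step(x, 0.0, 1.0)
print(two_steps, one_step)                 # O_{0.3} followed by O_{0.7} versus O_{1.0}

print(ou_step([0.0, 0.0, 0.0], 1.0, 2.5))  # mu_Z = N(0, S) is left unchanged (Fact B)
```

Working with the pair (mean, scale) is enough here because every law reachable from a point mass stays Gaussian with covariance proportional to $S$.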

  • By Fact B, $E[T_t f(Z)] = E[f(Z)]$ for any $f \in L^\alpha_c(B, \mu_Z)$ and $t \ge 0$, which implies that $E[L_\alpha h(Z)] = 0$ for any $h$ in a certain dense domain $\mathrm{Dom}(L_\alpha)$ of $L_\alpha$ in $L^\alpha_c(B, \mu_Z)$.
  • By standard results on strongly continuous contraction semigroups, the Bochner integral $\int_0^t T_u h\, du$ exists and is in $\mathrm{Dom}(L_\alpha)$; moreover, it satisfies
$$L_\alpha\left(\int_0^t T_u h\, du\right) = T_t h - h, \quad h \in \mathrm{Dom}(L_\alpha).$$
Formally letting $t \to \infty$ on both sides,
$$f_h(x) \equiv -\int_0^\infty \left(T_u h(x) - E[h(Z)]\right) du, \quad x \in B,$$
may serve as a solution of the following equation (with unknown function $f$):
$$L_\alpha f = h - E[h(Z)].$$

SLIDE 12

When $\alpha = 2$, $-L_\alpha$ is known as the number operator, whose representation was studied in

  • M. A. Piech, The Ornstein-Uhlenbeck semigroup in an infinite dimensional $L^2$ setting, J. Funct. Anal. 18 (1975), 271-285.

Piech’s Theorem. If $f \in L^2_c(B, \mu_Z)$ is such that $|Df(x)|_H$ exists for $\mu_Z$-a.e. $x \in B$ and is in $L^2_c(B, \mu_Z)$, and the Hilbert-Schmidt norm of $D^2f(x)$ is finite for $\mu_Z$-a.e. $x \in B$ and is in $L^2_c(B, \mu_Z)$, then $f$ belongs to $\mathrm{Dom}(L_2)$, and
$$-L_2 f(x) = (x, Df(x)) - \Delta_G f(x), \quad x \in B,$$
provided that $Df(x) \in B^*$ and $D^2f(x)$ is of trace class.

Remark 4. L. Gross introduced the notion of $H$-differentiation as follows. Let $f$ be a function from an open set $U$ of $B$ into a Banach space $W$. Then $f$ is said to be $H$-differentiable at a point $x \in U$ if the mapping $\varphi(h) = f(x + h)$, $h \in H$, regarded as a function defined in a neighborhood of the origin of $H$, is Fréchet differentiable at $0$. The Fréchet derivative $\varphi'(0)$ at $0 \in H$ is called the $H$-derivative of $f$ at $x \in B$. We denote the $H$-derivative of $f$ at $x$ in the direction $h \in H$ by $Df(x)h$. The $k$-th order $H$-derivatives of $f$ at $x$ are defined inductively and denoted by $D^kf(x)$ for $k \ge 2$, if they exist. Note that $D^kf(x)$ is a bounded $k$-linear mapping from the Cartesian product $H \times \cdots \times H$ of $k$ copies of $H$ into $W$ for any $k \in \mathbb{N}$. In particular, when $f$ is scalar-valued, $Df(x) \in H^* \approx H$, and $D^2f(x)$ is regarded as a bounded linear operator from $H$ into $H$ for any $x \in U$ via $\langle D^2f(x)h, k\rangle \equiv D^2f(x)(h, k)$, $h, k \in H$. Further, if $D^2f(x)$ is of trace class on $H$, we define the Gross Laplacian $\Delta_G f(x)$ of $f$ at $x$ by $\Delta_G f(x) = \mathrm{Tr}_H(D^2f(x))$, where $\mathrm{Tr}_H(A)$ denotes the trace of the operator $A$ on $H$.
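In a finite-dimensional toy case these objects are explicit. Assume (for this sketch only) $B = \mathbb{R}^d$ and $\mu_Z = N(0, S)$, so that $H = \sqrt{S}(\mathbb{R}^d)$ with $\langle\sqrt{S}x, \sqrt{S}y\rangle_H = \langle x, y\rangle$. If $S = QQ^T$ is a Cholesky factorization, then $Q = \sqrt{S}\,U$ with $U$ orthogonal, so the columns of $Q$ form an orthonormal basis of $H$ and $\Delta_G f(x) = \mathrm{Tr}(Q^T\nabla^2 f(x) Q) = \mathrm{Tr}(S\nabla^2 f(x))$. The Python sketch computes $\Delta_G f$ from second directional derivatives along these columns and compares it with the trace formula for a test function of our choosing; in this setting Piech's formula reads $-L_2 f(x) = \langle x, \nabla f(x)\rangle - \mathrm{Tr}(S\nabla^2 f(x))$.

```python
import math


def cholesky(S):
    """Lower-triangular Q with Q Q^T = S, for a symmetric positive definite matrix S."""
    n = len(S)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(Q[i][k] * Q[j][k] for k in range(j))
            Q[i][j] = math.sqrt(s) if i == j else s / Q[j][j]
    return Q


def gross_laplacian(f, x, S, eps=1e-4):
    """Delta_G f(x) = sum_i D^2 f(x)(u_i, u_i) over an orthonormal basis {u_i} of H.

    For H = sqrt(S)(R^d) the columns of the Cholesky factor of S form such a basis.
    Each second directional derivative is a central second difference.
    """
    Q = cholesky(S)
    d = len(x)
    total = 0.0
    for i in range(d):
        u = [Q[r][i] for r in range(d)]
        x_plus = [xk + eps * uk for xk, uk in zip(x, u)]
        x_minus = [xk - eps * uk for xk, uk in zip(x, u)]
        total += (f(x_plus) - 2.0 * f(x) + f(x_minus)) / eps ** 2
    return total


def f(x):
    """Test function f(x) = sin(x_0) x_1 + x_0^2 x_1^2 / 2 (our choice)."""
    return math.sin(x[0]) * x[1] + 0.5 * x[0] ** 2 * x[1] ** 2


def trace_formula(x, S):
    """Tr(S Hess f(x)), computed from the exact Hessian of f."""
    h00 = -math.sin(x[0]) * x[1] + x[1] ** 2
    h01 = math.cos(x[0]) + 2.0 * x[0] * x[1]
    h11 = x[0] ** 2
    return S[0][0] * h00 + 2.0 * S[0][1] * h01 + S[1][1] * h11


S = [[2.0, 0.6], [0.6, 1.0]]
x = [0.3, -1.2]
print(gross_laplacian(f, x, S), trace_formula(x, S))
```

Any orthonormal basis of $H$ gives the same trace; the Cholesky columns are used only because they avoid an eigendecomposition.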

SLIDE 13

Remark 5. If there is a constant $C > 0$ such that $|\langle Df(x), h\rangle| \le C\|h\|$ for any $h \in H$, then $Df(x)$ extends to an element of $B^*$ whose $B^*$-norm is not greater than $C$. We shall still denote this linear functional by $Df(x)$. In particular, if $f$ is twice Fréchet differentiable on $B$, then $Df(x)$ is automatically in $B^*$ and $D^2f(x)$ is a bounded linear operator from $B$ into $B^*$. By Goodman’s theorem below, the restriction of $D^2f(x)$ to $H$ is of trace class, so $\Delta_G f(x)$ is well defined.

Goodman’s Theorem. Let $A$ be a bounded linear operator from $B$ into $B^*$. Then the restriction of $A$ to $H$ is a trace class operator on $H$. Moreover,
$$\|A\|_{tr} \le \|A\|_{B,B^*}\int_B \|x\|^2\, p_1(dx),$$
where $\|\cdot\|_{tr}$ denotes the trace class norm.

Theorem. Let $X$ be a $B$-valued random variable with distribution $\mu_X$.
(i) If $B$ is finite-dimensional, then $\mu_X = \mu_Z$ if and only if the following identity holds:
$$E\left[(X, Df(X)) - \Delta_G f(X)\right] = 0$$
for any twice differentiable function $f$ on $B$ such that $E[\|D^2f(Z)\|_{tr}] < +\infty$.
(ii) If $B$ is infinite-dimensional, then $\mu_X = \mu_Z$ if and only if the above identity holds for any twice $H$-differentiable function $f$ on $B$ such that $Df(x) \in B^*$ for any $x \in B$, $E[\|D^2f(Z)\|_{tr}] < +\infty$, and $E[\|Df(Z)\|_{B^*}^\alpha] < +\infty$ for some $1 < \alpha < +\infty$. Here $\|\eta\|_{B^*} \equiv \sup\{|\langle h, \eta\rangle|/\|h\| : h \in H \setminus \{0\}\}$ for any $\eta \in B^*$.

SLIDE 14

Remark 6.

  • 1. For the case $B = \mathbb{R}^n$, the above identity can be rewritten as
$$E[\langle X, \nabla f(X)\rangle - \Delta f(X)] = 0,$$
where $\nabla f(t)$ and $\Delta f(t)$ are respectively the gradient and Laplacian of $f$ at $t \in \mathbb{R}^n$.
  • 2. If $B = \mathbb{R}$ and we replace $Df$ by a differentiable function $g$ on $\mathbb{R}$, then
$$E[\|D^2f(Z)\|_{tr}] = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}|g'(t)|\, e^{-t^2/2}\, dt;$$
thus statement (i) in the above Theorem is exactly the original Stein characterization of the standard normal distribution on $\mathbb{R}$.
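The characterization can be tested numerically in a two-dimensional toy case (our assumption): $B = \mathbb{R}^2$ and $\mu_Z = N(0, S)$ with $S = \begin{pmatrix}1 & 1/2\\ 1/2 & 1\end{pmatrix}$, so that $(x, Df(x)) = \langle x, \nabla f(x)\rangle$ and $\Delta_G f = \mathrm{Tr}(S\nabla^2 f)$; Remark 6 is the case $S = I$. For the test function $f(x) = x_1^4 + \cos(x_1 - x_2)$, the Python sketch evaluates $E[(X, Df(X)) - \Delta_G f(X)]$ for $X \sim N(0, S)$ by a tensor-product trapezoidal rule, and for a non-Gaussian $X = Q\varepsilon$ with the same mean and covariance, where $Q$ is the Cholesky factor of $S$ and $\varepsilon$ has independent uniform $\pm 1$ entries. The first value vanishes up to quadrature error; the second does not.

```python
import math

S = [[1.0, 0.5], [0.5, 1.0]]
Q00 = math.sqrt(S[0][0])                 # Cholesky factor Q with Q Q^T = S
Q10 = S[1][0] / Q00
Q11 = math.sqrt(S[1][1] - Q10 ** 2)


def stein_term(x):
    """<x, grad f(x)> - Tr(S Hess f(x)) for f(x) = x_1^4 + cos(x_1 - x_2)."""
    s, c = math.sin(x[0] - x[1]), math.cos(x[0] - x[1])
    grad = (4.0 * x[0] ** 3 - s, s)
    hess = ((12.0 * x[0] ** 2 - c, c), (c, -c))
    pairing = x[0] * grad[0] + x[1] * grad[1]
    gross_laplacian = sum(S[i][j] * hess[j][i] for i in range(2) for j in range(2))
    return pairing - gross_laplacian


def to_x(a, b):
    """Map standardized coordinates (a, b) to x = Q (a, b)^T."""
    return (Q00 * a, Q10 * a + Q11 * b)


def gaussian_mean(g, half_width=8.0, n=160):
    """E[g(Q xi)] for xi ~ N(0, I_2) by a tensor-product trapezoidal rule."""
    step = 2.0 * half_width / n
    nodes = []
    for k in range(n + 1):
        z = -half_width + k * step
        weight = (0.5 if k in (0, n) else 1.0) * step
        nodes.append((z, weight * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)))
    return sum(wa * wb * g(to_x(a, b)) for a, wa in nodes for b, wb in nodes)


def rademacher_mean(g):
    """E[g(Q eps)] for eps uniform on {-1, 1}^2: same mean and covariance as N(0, S)."""
    return sum(g(to_x(a, b)) for a in (-1.0, 1.0) for b in (-1.0, 1.0)) / 4.0


print(gaussian_mean(stein_term))      # approximately 0
print(rademacher_mean(stein_term))    # clearly nonzero, about -8.17
```

Matching the first two moments is therefore not enough: the identity detects the non-Gaussian law through the fourth moment of $X_1$.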

SLIDE 15

Part III. Solution of Stein’s Equation for Abstract Wiener Measures

From the above Theorem, the role of Stein’s equation for the abstract Wiener measure $\mu_Z$ should be played by the following differential equation (with unknown functional $f$):
$$\Delta_G f(x) - (x, Df(x)) = h(x) - E[h(Z)], \quad x \in B,$$
where $h$ is given in some class of test functionals.

  • Definition. A function $h : B \to \mathbb{R}$ (or $\mathbb{C}$) is called a uniformly Lipschitz-1 function, denoted by $h \in ULip\text{-}1(B)$, if
$$\|h\|_{ULip} \equiv \sup_{x \neq y \in B}|h(x) - h(y)|/\|x - y\| < +\infty.$$

Fix $h \in ULip\text{-}1(B)$, and let
$$f_h(x) \equiv -\int_0^\infty \left(T_t h(x) - E[h(Z)]\right) dt = -\int_0^\infty\int_B \left[h\left(e^{-t}x + \sqrt{1 - e^{-2t}}\, y\right) - E[h(Z)]\right] \mu_Z(dy)\, dt.$$
It is obvious that $f_h \in ULip\text{-}1(B)$ and $\|f_h\|_{ULip} \le \|h\|_{ULip}$.

Remark 7.

  • 1. The above integral exists, since for any $x \in B$,
$$\int_0^\infty\left|\int_B \left[h\left(e^{-t}x + \sqrt{1 - e^{-2t}}\, y\right) - E[h(Z)]\right] \mu_Z(dy)\right| dt \le \int_0^\infty\int_B \left|h\left(e^{-t}x + \sqrt{1 - e^{-2t}}\, y\right) - h(y)\right| \mu_Z(dy)\, dt$$
$$\le \|h\|_{ULip}\int_0^\infty\int_B e^{-t}\left\|x + \left(\sqrt{e^{2t} - 1} - e^{t}\right) y\right\| \mu_Z(dy)\, dt < +\infty.$$

SLIDE 16
  • 2. For any $t > 0$, $T_t(ULip\text{-}1(B))$ is contained in $ULip\text{-}1(B)$, and $\|T_t f\|_{ULip} \le e^{-t}\|f\|_{ULip}$ for any $f \in ULip\text{-}1(B)$.
  • 3. M. A. Piech studied properties of the solution of the Cauchy problem
$$\frac{\partial}{\partial t}u(x, t) = \Delta_G u(x, t) - (x, Du(x, t)) \quad (x \in B,\ t > 0), \qquad \lim_{t \to 0} u(x, t) = f(x)$$
uniformly for $x \in B$. See

  • M. A. Piech, Parabolic equations associated with the number operator, Trans. Amer. Math. Soc. 194 (1974), 213-222.

She assumed that $f$ is bounded and uniformly Lipschitz-1 on $B$, and proved that $u(x, t) = T_t f(x)$ is the unique solution. In fact, Piech’s result remains true after removing the boundedness condition.
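Item 2 of Remark 7 can be observed directly in the one-dimensional case $B = H = \mathbb{R}$, $\mu_Z = N(0, 1)$ (an assumption for this sketch). For $h(y) = |y|$, which has $\|h\|_{ULip} = 1$, the Mehler formula gives the closed form $T_t h(x) = E|a + bZ| = a(2\Phi(a/b) - 1) + 2b\varphi(a/b)$ with $a = e^{-t}x$ and $b = \sqrt{1 - e^{-2t}}$, where $\Phi$ and $\varphi$ are the standard normal distribution function and density. The Python sketch estimates the Lipschitz constant of $T_t h$ from chord slopes on a grid; the estimates stay below $e^{-t}$ and approach it, since $(T_t h)'(x) = e^{-t}(2\Phi(a/b) - 1) \to e^{-t}$ as $x \to \infty$.

```python
import math


def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ou_abs(x, t):
    """T_t h(x) for h(y) = |y|: E|a + bZ| with a = e^{-t} x, b = sqrt(1 - e^{-2t}), t > 0."""
    a = math.exp(-t) * x
    b = math.sqrt(1.0 - math.exp(-2.0 * t))
    return a * (2.0 * normal_cdf(a / b) - 1.0) + 2.0 * b * normal_pdf(a / b)


def lipschitz_estimate(g, lo=-60.0, hi=60.0, n=4000):
    """Largest chord slope |g(x_{k+1}) - g(x_k)| / step on a uniform grid (a lower estimate)."""
    step = (hi - lo) / n
    values = [g(lo + k * step) for k in range(n + 1)]
    return max(abs(values[k + 1] - values[k]) for k in range(n)) / step


for t in (0.1, 0.5, 2.0):
    print(t, lipschitz_estimate(lambda x: ou_abs(x, t)), math.exp(-t))
```

The grid extends to $|x| = 60$ because for larger $t$ the slope $e^{-t}(2\Phi(a/b) - 1)$ approaches its supremum only once $e^{-t}x$ is several standard deviations $b$ away from $0$.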

SLIDE 17

Proposition. For any $x \in B$ and $t > 0$,
$$\Delta_G(T_t h)(x) - (x, D(T_t h)(x)) = \frac{d}{dt}T_t h(x).$$
Moreover, for any $x \in B$,
$$\int_0^\infty \Delta_G(T_t h)(x)\, dt - \int_0^\infty (x, D(T_t h)(x))\, dt = E[h(Z)] - h(x).$$

Theorem.
(i) $f_h$ is twice $H$-differentiable at any $x \in B$. Further, $Df_h(x) = -\int_0^\infty D(T_t h)(x)\, dt$ as a $B^*$-valued Bochner integral, and $D^2f_h(x) = -\int_0^\infty D^2(T_t h)(x)\, dt$ as an $L(H, H)$-valued Bochner integral, where $L(H, H)$ is the normed space of all bounded linear operators from $H$ into itself with the operator norm $\|\cdot\|_{H,H}$.
(ii) $\|Df_h(x)\|_{B^*} \le \|h\|_{ULip}$ and $\|D^2f_h(x)\|_{H,H} \le c \cdot \frac{\pi}{2} \cdot \|h\|_{ULip}$ uniformly for $x \in B$, where $c$ is a constant such that $\|\cdot\| \le c\,|\cdot|_H$.
(iii) For any $x \in B$, $D^2f_h(x)$ is of trace class, and $\Delta_G f_h(x) = -\int_0^\infty \Delta_G(T_t h)(x)\, dt$.
(iv) $\|D^2f_h(x)\|_{tr} \le \frac{\pi}{2} \cdot \|h\|_{ULip}\int_B \|y\|\, \mu_Z(dy)$, uniformly for $x \in B$.
(v) $f_h$ solves the equation $\Delta_G f(x) - (x, Df(x)) = h(x) - E[h(Z)]$ (with unknown function $f$) for any $x \in B$.
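With $B = H = \mathbb{R}$ and $\mu_Z = N(0, 1)$ (so $c = 1$, $\Delta_G f = f''$ and $(x, Df(x)) = xf'(x)$), part (v) of the Theorem can be checked for the uniformly Lipschitz-1 test function $h = \cos$. Here $E[h(Z)] = e^{-1/2}$, and the Mehler formula gives $T_t h(x) = \cos(e^{-t}x)\, e^{-(1 - e^{-2t})/2}$ because $E[\cos(a + bZ)] = \cos(a)\, e^{-b^2/2}$. The Python sketch below (our illustration; the cutoff $t \le 30$ and the Simpson grid are assumptions) computes $f_h$, $f_h'$ and $f_h''$ from the integral formulas of part (i), which are ordinary integrals in this scalar case, and checks the Stein equation, the agreement of $f_h'$ with a finite difference of $f_h$, and the bound $|f_h'| \le \|h\|_{ULip} = 1$ from part (ii).

```python
import math

T_MAX = 30.0              # cutoff replacing the upper limit +infinity (assumption)
MEAN_H = math.exp(-0.5)   # E[h(Z)] for h = cos and Z ~ N(0, 1)


def simpson(g, a, b, n=8000):
    """Composite Simpson rule on [a, b]; n must be even."""
    step = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * g(a + k * step)
    return total * step / 3.0


def damp(t):
    """E[cos(b Z)] = e^{-b^2/2} with b^2 = 1 - e^{-2t}."""
    return math.exp(-0.5 * (1.0 - math.exp(-2.0 * t)))


# Mehler formula for h = cos: T_t h(x) = cos(e^{-t} x) * damp(t).
def f_h(x):
    """f_h(x) = -int_0^inf (T_t h(x) - E[h(Z)]) dt."""
    return -simpson(lambda t: math.cos(math.exp(-t) * x) * damp(t) - MEAN_H, 0.0, T_MAX)


def df_h(x):
    """f_h'(x) = -int_0^inf D(T_t h)(x) dt (differentiation under the integral sign)."""
    return simpson(lambda t: math.exp(-t) * math.sin(math.exp(-t) * x) * damp(t), 0.0, T_MAX)


def d2f_h(x):
    """f_h''(x) = -int_0^inf D^2(T_t h)(x) dt."""
    return simpson(lambda t: math.exp(-2.0 * t) * math.cos(math.exp(-t) * x) * damp(t), 0.0, T_MAX)


for x in (-3.0, -0.4, 0.0, 1.1, 2.5):
    residual = d2f_h(x) - x * df_h(x) - (math.cos(x) - MEAN_H)
    print(x, residual, df_h(x))
```

The residual is small for the reason given in the Proposition: the integrand of $f_h'' - xf_h'$ is exactly $-\frac{d}{dt}T_t h(x)$, so the time integral telescopes to $h(x) - E[h(Z)]$.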

SLIDE 18

Part IV. Application: A Basic Central Limit Theorem

From now on, $B$ will always denote a real separable Hilbert space, and $\langle\cdot,\cdot\rangle$ denotes the inner product on $B$ inducing the $\|\cdot\|$-norm. Each $x \in B$ induces an element $\eta_x$ in $B^*$ such that for any $y \in B$,
$$(y, \eta_x) = \langle y, x\rangle, \qquad (1)$$
where $(\cdot,\cdot)$ is the $B$-$B^*$ pairing. Conversely, by the Riesz representation theorem, each element of $B^*$ can be expressed as $\eta_x$ for some $x \in B$. Then $\langle\cdot, x\rangle$ has a normal distribution with mean $0$ with respect to $\mu_Z$.

Prohorov’s Theorem. There is a unique bounded linear operator $S : B \to B$, which is of trace class, positive definite, and self-adjoint, such that for any $x \in B$,
$$\int_B e^{i\langle y, x\rangle}\, \mu_Z(dy) = e^{-\frac{1}{2}\langle Sx, x\rangle}.$$
Such an operator $S$ is called an S-operator on $B$.

  • Fact. $\mu_Z$ is non-degenerate if and only if $S$ is injective.

We can employ the S-operator $S$ to recover $H$ as a subset of $B$ without using Kuelbs’s theorem, as follows.

  • Set $H = \sqrt{S}(B)$, endowed with the $|\cdot|_H$-norm induced by the inner product $\langle\cdot,\cdot\rangle_H$ defined by $\langle\sqrt{S}x, \sqrt{S}y\rangle_H := \langle x, y\rangle$ for any $x, y \in B$.
  • Let $\{e_n; n = 1, 2, \ldots\}$ be an orthonormal basis for $B$ consisting of eigenvectors of $S$ corresponding to positive eigenvalues $\lambda_n$, $n = 1, 2, \ldots$. Then $(H, |\cdot|_H)$ is a Hilbert space with an orthonormal basis $\{\sqrt{S}(e_n); n = 1, 2, \ldots\}$.
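Concretely, $\mu_Z$ can be sampled by the Karhunen-Loève expansion $Z = \sum_n \sqrt{\lambda_n}\,\xi_n e_n$ with i.i.d. standard normal $\xi_n$, so that $\langle Z, e_n\rangle = \sqrt{\lambda_n}\,\xi_n$ and $E\|Z\|^2 = \mathrm{Tr}(S)$. The Python sketch below assumes $B = \ell^2$ truncated to its first $50$ coordinates, $e_n$ the standard basis and the hypothetical eigenvalues $\lambda_n = 1/n^2$, and compares Monte Carlo estimates of $E\|Z\|^2$ and of the characteristic functional in Prohorov's theorem (whose imaginary part vanishes by symmetry) with $\mathrm{Tr}(S)$ and $e^{-\frac{1}{2}\langle Sx, x\rangle}$.

```python
import math
import random

N = 50                                             # truncation level (assumption)
LAMBDAS = [1.0 / n ** 2 for n in range(1, N + 1)]  # hypothetical eigenvalues of S
TEST_X = [1.0 / n for n in range(1, N + 1)]        # a fixed vector x in B


def sample_z(rng):
    """One draw of Z = sum_n sqrt(lambda_n) xi_n e_n with xi_n i.i.d. N(0, 1)."""
    return [math.sqrt(lam) * rng.gauss(0.0, 1.0) for lam in LAMBDAS]


def monte_carlo(n_samples=10000, seed=1):
    """Return (estimate of E||Z||^2, Tr S, estimate of E cos<Z, x>, exp(-<Sx, x>/2))."""
    rng = random.Random(seed)
    sq_norm = char = 0.0
    for _ in range(n_samples):
        z = sample_z(rng)
        sq_norm += sum(v * v for v in z)
        char += math.cos(sum(zv * xv for zv, xv in zip(z, TEST_X)))
    sx_x = sum(lam * xv * xv for lam, xv in zip(LAMBDAS, TEST_X))   # <Sx, x>
    return sq_norm / n_samples, sum(LAMBDAS), char / n_samples, math.exp(-0.5 * sx_x)


print(monte_carlo())
```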

SLIDE 19
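  • Since
$$\sum_{n=1}^\infty |\sqrt{S}(\sqrt{S}(e_n))|_H^2 = \sum_{n=1}^\infty \|\sqrt{S}(e_n)\|^2 = \sum_{n=1}^\infty \langle S(e_n), e_n\rangle = \mathrm{Tr}(S) < +\infty,$$
$\sqrt{S}|_H$, the restriction of $\sqrt{S}$ to $H$, is Hilbert-Schmidt. Also, for any $h \in H$, $\|h\|^2 = |\sqrt{S}h|_H^2$.

Fact. Let $A$ be a Hilbert-Schmidt operator on $H$. Then $\|x\| = |Ax|_H$, $x \in H$, is a measurable seminorm on $H$.

  • By this Fact, $\|\cdot\|$ is a measurable norm on $H$ and $(i, H, B)$ forms an abstract Wiener space, where the canonical embedding $i : H \to B$ is just the inclusion map. For any $x \in B$, $y \in H$, and $\eta_x$ given as in (1),
$$(y, \eta_x) = \langle y, x\rangle = \langle y, Sx\rangle_H \quad\text{and}\quad (y, \eta_x) = (i(y), \eta_x) = \langle y, i^*(\eta_x)\rangle_H.$$
Hence $i^* : B^* \to H^* \approx H$ is given by $i^*(\eta_x) = Sx$ for any $x \in B$, so $i^*(B^*) = S(B)$. Identifying $B^*$ with $S(B)$,
$$(y, Sx) = \langle y, Sx\rangle_H = \langle y, x\rangle \quad \text{for any } y \in H \text{ and } x \in B. \qquad (2)$$
In addition, for any $\eta \in B^*$,
$$\|\eta\|_{B^*}^2 = \sum_{n=1}^\infty \lambda_n^{-2}|\langle \eta, e_n\rangle|^2,$$
and for any $h \in H$,
$$|h|_H^2 = \sum_{n=1}^\infty \lambda_n^{-1}|\langle h, e_n\rangle|^2.$$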
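The coordinate formulas above are easy to check numerically. The Python sketch assumes (hypothetically) that $S$ is diagonal in the first $20$ standard basis vectors of $\ell^2$ with $\lambda_n = 1/n^2$, identifies $B^*$ with $S(B)$ as in (2), and verifies that $\|Sx\|_{B^*} = \|x\|$, $|\sqrt{S}x|_H = \|x\|$, $\langle y, Sx\rangle_H = \langle y, x\rangle$ for $y \in H$, and $\|h\| \le \sqrt{\lambda_1}\,|h|_H$. Since $\lambda_1$ is the largest eigenvalue here, $c = \sqrt{\lambda_1}$ is an admissible constant in part (ii) of the Theorem of Part III.

```python
import math

LAMBDAS = [1.0 / n ** 2 for n in range(1, 21)]   # hypothetical eigenvalues of S


def apply_s(x):
    return [lam * v for lam, v in zip(LAMBDAS, x)]


def apply_sqrt_s(x):
    return [math.sqrt(lam) * v for lam, v in zip(LAMBDAS, x)]


def b_inner(x, y):
    """<x, y> in B."""
    return sum(u * v for u, v in zip(x, y))


def h_inner(x, y):
    """<x, y>_H = sum_n lambda_n^{-1} <x, e_n> <y, e_n> for x, y in H."""
    return sum(u * v / lam for lam, u, v in zip(LAMBDAS, x, y))


def dual_norm_sq(eta):
    """||eta||_{B*}^2 = sum_n lambda_n^{-2} |<eta, e_n>|^2, with B* identified with S(B)."""
    return sum(v * v / lam ** 2 for lam, v in zip(LAMBDAS, eta))


x = [math.sin(n) for n in range(1, 21)]           # arbitrary vectors in B
u = [math.cos(2 * n) for n in range(1, 21)]
y = apply_sqrt_s(u)                                # an element of H = sqrt(S)(B)
print(dual_norm_sq(apply_s(x)), b_inner(x, x))     # ||Sx||_{B*}^2 = ||x||^2
print(h_inner(apply_sqrt_s(x), apply_sqrt_s(x)))   # |sqrt(S) x|_H^2 = ||x||^2
print(h_inner(y, apply_s(x)), b_inner(y, x))       # identity (2)
```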

SLIDE 20

Let $\{X_1, X_2, \ldots\}$ be a sequence of independent, identically distributed $B$-valued random variables satisfying the following conditions:
(a) $E[\langle X_1, x\rangle] = 0$, and
(b) $E[\langle X_1, x\rangle\langle X_1, y\rangle] = \langle Sx, y\rangle$,
for any $x, y \in B$. Here we remark that the $B$-valued Bochner integral $E[X_1]$ exists, since
$$E[\|X_1\|] \le \left(E[\|X_1\|^2]\right)^{1/2} = \left(\sum_{j=1}^\infty E[\langle X_1, e_j\rangle^2]\right)^{1/2} = \sqrt{\mathrm{Tr}(S)} < +\infty.$$
Thus, for any $x \in B$ and $n \in \mathbb{N}$, $\langle E[X_n], x\rangle = E[\langle X_n, x\rangle] = 0$.

Fix $n \in \mathbb{N}$. Let $W = \frac{1}{\sqrt{n}}\sum_{i=1}^n X_i$. Following Stein’s idea, we construct an auxiliary $B$-valued random variable $W^*$ on a joint probability space with $W$ such that $W^*$ has certain properties and is close to $W$. First, let $\{Y_1, Y_2, \ldots, Y_n\}$ be an independent copy of $\{X_1, X_2, \ldots, X_n\}$, and let $I$ be a random variable which is uniformly distributed over the index set $\{1, 2, \ldots, n\}$ and independent of $\{X_i\}$ and $\{Y_i\}$. Define
$$W^* = W + \frac{1}{\sqrt{n}}(Y_I - X_I) = \frac{1}{\sqrt{n}}\sum_{j=1}^n\left(Y_j + \sum_{i \neq j} X_i\right) 1_{\{I = j\}}.$$
Let $h \in ULip\text{-}1(B)$ and $f_h$ be defined as before.

Assumption A. $h$ is twice Fréchet differentiable on $B$.

Let $\varphi : \mathbb{R} \times B \times B \to \mathbb{C}$ be defined by $\varphi(r, x, y) = f_h(rx + (1 - r)y)$. Then, for either of those two cases, $\varphi$ is twice differentiable

SLIDE 21

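with respect to $r$, and by Taylor’s theorem,
$$f_h(W^*) - f_h(W) = \frac{\partial}{\partial r}\Big|_{r=0}\varphi(r, W^*, W) + \frac{1}{2}\frac{\partial^2}{\partial r^2}\Big|_{r=0}\varphi(r, W^*, W) + R,$$
where $R$ is the second-order error term. Observe that
$$\frac{\partial}{\partial r}\Big|_{r=0}\varphi(r, W^*, W) = (W^* - W, Df_h(W)),$$
where
$$E[(W^* - W, Df_h(W))] = \frac{1}{\sqrt{n}}E[(Y_I - X_I, Df_h(W))] = \frac{1}{n^{3/2}}\sum_{j=1}^n E[(Y_j - X_j, Df_h(W))].$$
By the independence of the $Y_j$'s and $W$, and condition (a), it follows that
$$E[(Y_j, Df_h(W))] = \int_B E[(Y_j, Df_h(x))]\, \mu_W(dx) = 0, \quad \forall\, j = 1, 2, \ldots,$$
where $\mu_W$ is the distribution of $W$ in $B$. Then
$$E\left[\frac{\partial}{\partial r}\Big|_{r=0}\varphi(r, W^*, W)\right] = -\frac{1}{n^{3/2}}\sum_{j=1}^n E[(X_j, Df_h(W))] = -\frac{1}{n}E[(W, Df_h(W))].$$
Observe that
$$\frac{1}{2}\frac{\partial^2}{\partial r^2}\Big|_{r=0}\varphi(r, W^*, W) = \frac{1}{2}\left(W^* - W, D^2f_h(W)(W^* - W)\right),$$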

SLIDE 22

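where $(\cdot,\cdot)$ is the $B$-$B^*$ pairing. Observe that
$$\frac{1}{2}\left(W^* - W, D^2f_h(W)(W^* - W)\right) = \frac{1}{2}\sum_{i,j=1}^\infty \langle W^* - W, e_i\rangle\langle W^* - W, e_j\rangle\,(e_i, D^2f_h(W)e_j)$$
$$= \frac{1}{n}\sum_{i,j=1}^\infty \langle Se_i, e_j\rangle\,(e_i, D^2f_h(W)e_j) + \frac{1}{n}\sum_{i,j=1}^\infty E_{i,j}\,(e_i, D^2f_h(W)e_j),$$
where, for any $i, j \in \mathbb{N}$,
$$E_{i,j} = \frac{1}{2}\langle Y_I - X_I, e_i\rangle\langle Y_I - X_I, e_j\rangle - \langle Se_i, e_j\rangle.$$
Since $\langle Se_i, e_j\rangle = \lambda_i\delta_{i,j}$ for any $i, j$, and $\{\sqrt{\lambda_i}\, e_i\} = \{\sqrt{S}(e_i)\}$ is an orthonormal basis of $H$, the first summand equals
$$\frac{1}{n}\sum_{i=1}^\infty \left(\sqrt{\lambda_i}\, e_i, D^2f_h(W)\sqrt{\lambda_i}\, e_i\right) = \frac{1}{n}\sum_{i=1}^\infty D^2f_h(W)\left(\sqrt{\lambda_i}\, e_i, \sqrt{\lambda_i}\, e_i\right) = \frac{1}{n}\Delta_G f_h(W),$$
and hence
$$\frac{1}{2}\frac{\partial^2}{\partial r^2}\Big|_{r=0}\varphi(r, W^*, W) = \frac{1}{n}\Delta_G f_h(W) + \frac{1}{n}\sum_{i,j=1}^\infty E_{i,j}\,(e_i, D^2f_h(W)e_j).$$
Notice that $E[f_h(W^*) - f_h(W)] = 0$, since $W^*$ has the same distribution as $W$. Then we have
$$E[(W, Df_h(W))] - E[\Delta_G f_h(W)] = E\left[\sum_{i,j=1}^\infty E_{i,j}\,(e_i, D^2f_h(W)e_j)\right] + n\cdot E[R].$$
Since $f_h$ solves the Stein equation, this implies that
$$E[h(Z)] - E[h(W)] = E\left[\sum_{i,j=1}^\infty E_{i,j}\,(e_i, D^2f_h(W)e_j)\right] + n\cdot E[R].$$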
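Two moment identities drive this computation: $E\left[\frac{1}{2}\langle W^* - W, e_i\rangle\langle W^* - W, e_j\rangle\right] = \frac{1}{n}\langle Se_i, e_j\rangle$ (equivalently $E[E_{i,j}] = 0$), and, for the quadratic test function $f(w) = \frac{1}{2}\langle w, a\rangle^2$, the first-order identity $E[(W^* - W, Df(W))] = -\frac{1}{n}E[(W, Df(W))]$, which reads $E[\langle W^* - W, a\rangle\langle W, a\rangle] = -\frac{1}{n}\langle Sa, a\rangle$. The Python sketch checks both by Monte Carlo in a hypothetical three-dimensional example: $S = \mathrm{diag}(1, 1/2, 1/4)$, $n = 5$, $a = (1, -1, 2)$, and $X_i$ with independent coordinates equal to $\sqrt{\lambda_j}$ times a uniform random sign, which satisfies conditions (a) and (b).

```python
import math
import random

LAMBDAS = [1.0, 0.5, 0.25]   # hypothetical eigenvalues of S, with S diagonal in e_1, e_2, e_3
N_TERMS = 5                  # n in W = n^{-1/2} (X_1 + ... + X_n)
A_DIR = [1.0, -1.0, 2.0]     # fixed test direction a


def draw_x(rng):
    """X with independent coordinates sqrt(lambda_j) * (random sign): E<X, x> = 0, Cov = S."""
    return [math.sqrt(lam) * (1.0 if rng.random() < 0.5 else -1.0) for lam in LAMBDAS]


def exchangeable_pair(rng):
    """Return (W, W*) with W* = W + n^{-1/2} (Y_I - X_I)."""
    xs = [draw_x(rng) for _ in range(N_TERMS)]
    i = rng.randrange(N_TERMS)   # the uniform index I
    y_i = draw_x(rng)            # Y_I; the other Y_j do not enter W*
    scale = 1.0 / math.sqrt(N_TERMS)
    w = [scale * sum(coords) for coords in zip(*xs)]
    w_star = [wk + scale * (yk - xk) for wk, yk, xk in zip(w, y_i, xs[i])]
    return w, w_star


def moments(n_samples=20000, seed=2):
    """Monte Carlo estimates of E<W* - W, e_j>^2 (j = 1, 2, 3) and E<W* - W, a><W, a>."""
    rng = random.Random(seed)
    second = [0.0] * len(LAMBDAS)
    regression = 0.0
    for _ in range(n_samples):
        w, w_star = exchangeable_pair(rng)
        diff = [u - v for u, v in zip(w_star, w)]
        for j, dj in enumerate(diff):
            second[j] += dj * dj
        regression += sum(d * a for d, a in zip(diff, A_DIR)) * sum(wk * a for wk, a in zip(w, A_DIR))
    return [s / n_samples for s in second], regression / n_samples


second, regression = moments()
print(second, [2.0 * lam / N_TERMS for lam in LAMBDAS])             # (2/n) <S e_j, e_j>
print(regression, -sum(lam * a * a for lam, a in zip(LAMBDAS, A_DIR)) / N_TERMS)  # -(1/n) <Sa, a>
```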

SLIDE 23

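By the Cauchy-Schwarz inequality, the fact that for any $i, j$,
$$E\left[\frac{1}{2}\langle Y_1 - X_1, e_i\rangle\langle Y_1 - X_1, e_j\rangle - \langle Se_i, e_j\rangle\right] = 0,$$
the independence of $Y_\ell$ and $X_\ell$, and the inequalities $p^3q \le \frac{3}{4}p^4 + \frac{1}{4}q^4$ and $p^2q^2 \le \frac{1}{2}(p^4 + q^4)$, valid for any $p, q \ge 0$, we obtain that
$$\left|E\left[\sum_{i,j=1}^\infty E_{i,j}\,(e_i, D^2f_h(W)e_j)\right]\right| \le \frac{1}{\sqrt{n}}\left(E\left[\|D^2f_h(W)\|_{HS}^2\right]\right)^{1/2}\left(4\cdot E[\|X_1\|^4] - \|S\|_{HS}^2\right)^{1/2}.$$
To estimate the error term $nE[R]$, we need the following assumption.

Assumption B. $\|D^2h\|_{ULip} \equiv \sup_{x \neq y \in B}\|D^2h(x) - D^2h(y)\|_{B,B^*}/\|x - y\| < +\infty$.

Then we obtain that
$$E[n|R|] \le \frac{2}{3\sqrt{n}}\cdot\|D^2h\|_{ULip}\cdot E[\|X_1\|^3].$$

  • Theorem. Let $h$ be uniformly Lipschitz-1 on $B$, twice differentiable on $B$, and $\|D^2h\|_{B,B^*} < +\infty$. Then
$$|E[h(Z)] - E[h(W)]| \le \frac{1}{\sqrt{n}}\left(E\left[\|D^2f_h(W)\|_{HS}^2\right]\right)^{1/2}\left(4\cdot E[\|X_1\|^4] - \|S\|_{HS}^2\right)^{1/2} + \frac{2}{3\sqrt{n}}\cdot\|D^2h\|_{ULip}\cdot E[\|X_1\|^3].$$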
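To see how the distribution of $X_1$ enters the final bound, take $X_1$ with independent coordinates equal to $\sqrt{\lambda_j}$ times a uniform random sign, so that conditions (a) and (b) hold with $S = \mathrm{diag}(\lambda_j)$ and $\|X_1\|^2 = \mathrm{Tr}(S)$ almost surely; hence $E\|X_1\|^4 = \mathrm{Tr}(S)^2$ and $E\|X_1\|^3 = \mathrm{Tr}(S)^{3/2}$. The Python sketch evaluates the right-hand side of the Theorem with the hypothetical eigenvalues $\lambda_j = 1/j^2$ (truncated at $j = 1000$). The factor $(E\|D^2f_h(W)\|_{HS}^2)^{1/2}$ and $\|D^2h\|_{ULip}$ are not computed; they are passed in as assumed constants `c2` and `m`.

```python
import math


def clt_bound(n, lambdas, c2, m):
    """Right-hand side of the Theorem for X_1 with coordinates sqrt(lambda_j) * (random sign).

    For this X_1, ||X_1||^2 = Tr(S) almost surely, so E||X_1||^4 = Tr(S)^2 and
    E||X_1||^3 = Tr(S)^{3/2}. The inputs c2 (a bound on (E||D^2 f_h(W)||_HS^2)^{1/2})
    and m (= ||D^2 h||_ULip) are assumed constants, not computed here.
    """
    trace = sum(lambdas)
    hs_sq = sum(lam * lam for lam in lambdas)          # ||S||_HS^2
    first = c2 * math.sqrt(4.0 * trace ** 2 - hs_sq)
    second = (2.0 / 3.0) * m * trace ** 1.5
    return (first + second) / math.sqrt(n)


lambdas = [1.0 / j ** 2 for j in range(1, 1001)]
for n in (100, 400, 1600):
    print(n, clt_bound(n, lambdas, c2=1.0, m=1.0))
```

Quadrupling $n$ halves the bound, which is the $n^{-1/2}$ rate of the Theorem; the distribution of $X_1$ affects only the constant through $\mathrm{Tr}(S)$ and $\|S\|_{HS}$.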
