

SLIDE 1

Statistical Inverse Problems and Instrumental Variables

Thorsten Hohage

Institut für Numerische und Angewandte Mathematik, University of Göttingen

Workshop on Inverse and Partial Information Problems: Methodology and Applications, RICAM, Linz, 27.–31.10.2008

SLIDE 2

Collaborators

  • Frank Bauer (Linz)
  • Laurent Cavalier (Marseille)
  • Jean-Pierre Florens (Toulouse)
  • Jan Johannes (Heidelberg)
  • Enno Mammen (Mannheim)
  • Axel Munk (Göttingen)
SLIDE 3

Outline

1. A Newton method for nonlinear statistical inverse problems
2. Oracle inequalities
3. Nonparametric instrumental variables and perturbed operators
SLIDE 4

statistical inverse problem

problem: Let X, Y be separable Hilbert spaces and F : D(F) ⊂ X → Y a Fréchet differentiable, one-to-one operator. Estimate a† given indirect observations in the form of a random process

Y = F(a†) + σξ + δζ.

F⁻¹ is not continuous!

  • ξ: normalized stochastic noise, a Hilbert space process satisfying Eξ = 0 and Cov ξ ≤ I
  • σ ≥ 0: stochastic noise level
  • ζ ∈ Y: normalized deterministic noise, ‖ζ‖ = 1
  • δ ≥ 0: deterministic noise level

SLIDE 5

the algorithm

The Newton equation

F'[â_k](â_{k+1} − â_k) = Y − F(â_k),  k = 0, 1, 2, . . .

is regularized in each step by Tikhonov regularization with initial guess a_0 and regularization parameters α_k = α_0 q^k, q ∈ (0, 1):

â_{k+1} := argmin_{a∈X} ‖F'[â_k](a − â_k) + F(â_k) − Y‖²_Y + α_{k+1} ‖a − a_0‖²_X
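To make the step concrete, here is a minimal finite-dimensional sketch (my own stand-in, not the authors' implementation): X = R^n, Y = R^m, and each Tikhonov-regularized Newton step is solved via its normal equations. All names (irgnm, F, Fprime, ...) are illustrative.

    import numpy as np

    def irgnm(F, Fprime, Yobs, a0, alpha0=1.0, q=0.5, K=10):
        """Iteratively regularized Gauss-Newton sketch.
        F: map R^n -> R^m; Fprime(a): Jacobian of F at a (m x n matrix)."""
        a = a0.copy()
        for k in range(K):
            T = Fprime(a)                    # T = F'[a_k]
            alpha = alpha0 * q ** (k + 1)    # alpha_{k+1} = alpha_0 * q^{k+1}
            # The minimizer of ||T(a' - a) + F(a) - Y||^2 + alpha ||a' - a0||^2
            # solves (T^T T + alpha I)(a' - a) = T^T (Y - F(a)) + alpha (a0 - a).
            rhs = T.T @ (Yobs - F(a)) + alpha * (a0 - a)
            a = a + np.linalg.solve(T.T @ T + alpha * np.eye(len(a0)), rhs)
        return a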

SLIDE 6

What is this for linear problems?

If F = T is linear, the iteration formula simplifies to

â_{k+1} := argmin_{a∈X} ‖Ta − Y‖²_Y + α_{k+1} ‖a − a_0‖²_X.

The iteration steps decouple in the sense that none of the previous iterates appears in the formula for â_{k+1}. Bias and variance must be balanced by a proper choice of the stopping index.

SLIDE 7

What if â_k ∉ D(F) for some k?

  • Since typically D(F) ≠ X and the stochastic noise σξ can be arbitrarily large, there is a positive probability that â_k ∉ D(F) in each Newton step.
  • "Emergency stop": If this happens, we stop the Newton iteration and return a_0 as estimator of a†.
  • We will have to show that the probability that such an emergency stop is necessary tends to 0 rapidly with the stochastic noise level σ.

SLIDE 8

Can we improve on the qualification of Tikhonov regularization?

Replace Tikhonov regularization by iterated Tikhonov regularization:

â^{(0)}_{k+1} := a_0

â^{(j)}_{k+1} := argmin_{a∈X} ‖F'[â_k](a − â_k) + F(â_k) − Y‖²_Y + α_{k+1} ‖a − â^{(j−1)}_{k+1}‖²_X,  j = 1, . . . , m

â_{k+1} := â^{(m)}_{k+1}

closed formula:

â_{k+1} := a_0 + g_{α_{k+1}}(F'[â_k]* F'[â_k]) F'[â_k]* (Y − F(â_k) + F'[â_k](â_k − a_0))

with the filter functions

r_α(λ) := (α/(α + λ))^m,  g_α(λ) := (1 − r_α(λ))/λ
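In practice the linearized step is applied through an SVD of the derivative; a small sketch (my construction, with a matrix T standing in for F'[â_k]):

    import numpy as np

    def iterated_tikhonov_apply(T, y, alpha, m):
        """Compute g_alpha(T^T T) T^T y for the m-times iterated Tikhonov
        filter r_alpha(l) = (alpha/(alpha+l))^m, g_alpha(l) = (1 - r_alpha(l))/l."""
        U, s, Vt = np.linalg.svd(T, full_matrices=False)
        lam = s ** 2                                   # eigenvalues of T^T T
        g = (1.0 - (alpha / (alpha + lam)) ** m) / lam
        return Vt.T @ (g * s * (U.T @ y))

For m = 1 this reduces to ordinary Tikhonov regularization, g_α(λ) = 1/(α + λ).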

SLIDE 9

references:

deterministic convergence analysis:

  • B. Kaltenbacher, A. Neubauer and O. Scherzer. Iterative Regularization Methods for Nonlinear Ill-Posed Problems. Radon Series on Computational and Applied Mathematics, de Gruyter, Berlin, 2008.
  • A. B. Bakushinsky and M. Y. Kokurin. Iterative Methods for Approximate Solution of Inverse Problems. Springer, Dordrecht, 2008.
  • A. B. Bakushinsky. The problem of the convergence of the iteratively regularized Gauss-Newton method. Comput. Math. Math. Phys., 32:1353–1359, 1992.

The following results are from:

  • F. Bauer, T. Hohage and A. Munk. Iteratively Regularized Gauss-Newton Method for Nonlinear Inverse Problems with Random Noise. Preprint, under revision for SIAM J. Numer. Anal.

SLIDE 10

error decomposition

Let T := F'[a†] and T_k := F'[â_k]. The error E_k = â_k − a† in the k-th Newton step can be decomposed into

  • an approximation error
    E^app_{k+1} := r_{α_{k+1}}(T*T) E_0,
  • a propagated data noise error
    E^noi_{k+1} := g_{α_{k+1}}(T_k* T_k) T_k* (δζ + σξ),
  • and a nonlinearity error
    E^nl_{k+1} := g_{α_{k+1}}(T_k* T_k) T_k* (F(a†) − F(â_k) + T_k E_k) + (r_{α_{k+1}}(T_k* T_k) − r_{α_{k+1}}(T*T)) E_0,

i.e. E_{k+1} = E^app_{k+1} + E^noi_{k+1} + E^nl_{k+1}.
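For completeness, here is a short check (my reconstruction of the standard argument, not verbatim from the talk) that the three terms sum to E_{k+1}. Inserting Y = F(a†) + σξ + δζ into the closed formula from SLIDE 8 gives

    \begin{align*}
    E_{k+1} &= E_0 + g_{\alpha_{k+1}}(T_k^*T_k)\,T_k^*\bigl(Y - F(\hat a_k) + T_k(\hat a_k - a_0)\bigr)\\
            &= r_{\alpha_{k+1}}(T_k^*T_k)\,E_0
             + g_{\alpha_{k+1}}(T_k^*T_k)\,T_k^*(\delta\zeta + \sigma\xi)\\
            &\quad + g_{\alpha_{k+1}}(T_k^*T_k)\,T_k^*\bigl(F(a^\dagger) - F(\hat a_k) + T_k E_k\bigr),
    \end{align*}

using $I - g_\alpha(T_k^*T_k)T_k^*T_k = r_\alpha(T_k^*T_k)$. Splitting
$r_{\alpha_{k+1}}(T_k^*T_k)E_0 = r_{\alpha_{k+1}}(T^*T)E_0 + \bigl(r_{\alpha_{k+1}}(T_k^*T_k) - r_{\alpha_{k+1}}(T^*T)\bigr)E_0$
yields exactly $E^{\mathrm{app}}_{k+1} + E^{\mathrm{noi}}_{k+1} + E^{\mathrm{nl}}_{k+1}$.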

SLIDE 11

crucial lemma

Lemma. Under certain assumptions discussed below there exists γ_nl > 0 such that

‖E^nl_k‖ ≤ γ_nl (‖E^app_k‖ + ‖E^noi_k‖),  k = 1, . . . , K_max.
SLIDE 12

assumptions of the lemma

  • source condition: There exists a sufficiently small "source" w ∈ Y such that a_0 − a† = T*w.
  • α_0 sufficiently large such that ‖E_0‖ ≤ q^{−m} ‖E^app_1‖.
  • Lipschitz condition: For all a_1, a_2 ∈ D(F), ‖F'[a_1] − F'[a_2]‖ ≤ L ‖a_1 − a_2‖.
  • choice of K_max: K_max := max { k ∈ ℕ : ‖E^noi_k‖ √α_k ≤ C_stop }.

SLIDE 13
on the proof of the lemma

  • The proof uses a straightforward induction argument in k.
  • The following properties of iterated Tikhonov regularization are used:
      • There exists γ_app > 0 such that for all k
        ‖E^app_{k+1}‖ ≤ ‖E^app_k‖ ≤ γ_app ‖E^app_{k+1}‖.
        This rules out methods with infinite qualification such as Landweber iteration!
      • The propagated data noise is an ordered process in the sense that ‖E^noi_k‖ ≤ ‖E^noi_{k+1}‖ for all k.

SLIDE 14
optimal deterministic rates

Corollary. For deterministic errors (σ = 0) define the optimal stopping index by

K* := min {K_max, K},  K := argmin_{k∈ℕ} (‖E^app_k‖ + δ/√α_k).

Then there exist constants C, δ_0 > 0 such that

‖â_{K*} − a†‖ ≤ C inf_{k∈ℕ} (‖E^app_k‖ + δ/√α_k)  for all δ ∈ (0, δ_0].

In particular, under the Hölder source condition a_0 − a† = (T*T)^μ w̃ with μ ∈ [1/2, m] we obtain

‖â_{K*} − a†‖ = O(‖w̃‖^{1/(2μ+1)} δ^{2μ/(2μ+1)}).
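The exponents come from a standard balancing computation; a sketch, assuming ‖E^app_k‖ ≤ C_μ α_k^μ ‖w̃‖ for μ ≤ m (this is where the qualification enters):

    \begin{align*}
    \alpha^\mu\|\tilde w\| \;\asymp\; \frac{\delta}{\sqrt{\alpha}}
    \;\Longleftrightarrow\;
    \alpha_* \;\asymp\; \Bigl(\frac{\delta}{\|\tilde w\|}\Bigr)^{\frac{2}{2\mu+1}},
    \qquad
    \alpha_*^\mu\|\tilde w\| \;=\; \|\tilde w\|^{\frac{1}{2\mu+1}}\,\delta^{\frac{2\mu}{2\mu+1}}.
    \end{align*}

Since the α_k = α_0 q^k are geometric, the discrete infimum over k achieves the same order.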
SLIDE 15

propagated data noise error

We make the following assumptions on the variance term V(a, α) := ‖g_α(F'[a]* F'[a]) F'[a]* ξ‖²:

  • There exists a known function φ_noi such that
    (E V(a, α))^{1/2} ≤ φ_noi(α)  for all α ∈ (0, α_0] and a ∈ D(F).
  • There are constants 1 < γ_noi ≤ γ̄_noi < ∞ such that
    γ_noi ≤ φ_noi(α_{k+1})/φ_noi(α_k) ≤ γ̄_noi  for all k ∈ ℕ_0.
  • (exponential inequality) There exist λ_1, λ_2 > 0 such that for all a ∈ D(F), α ∈ (0, α_0] and τ ≥ 1
    P{V(a, α) ≥ τ E V(a, α)} ≤ λ_1 e^{−λ_2 τ}.

SLIDE 16
optimal rates for known smoothness

Theorem. Assume that {a : ‖a − a_0‖ ≤ 2R} ⊂ D(F) and define the optimal stopping index

K := argmin_{k∈ℕ} (‖E^app_k‖ + δ/√α_k + σ φ_noi(α_k)).

If ‖â_k − a_0‖ ≤ 2R for k = 1, . . . , K, set K* := K, otherwise K* := 0. Then there exist constants C > 1 and δ_0, σ_0 > 0 such that

(E‖â_{K*} − a†‖²)^{1/2} ≤ C min_{k∈ℕ} (‖E^app_k‖ + δ/√α_k + σ φ_noi(α_k))

for all δ ∈ (0, δ_0] and σ ∈ (0, σ_0].

In short: the Newton method achieves the same rate as iterated Tikhonov regularization applied to the linearized problem.

SLIDE 17

Outline

1. A Newton method for nonlinear statistical inverse problems
2. Oracle inequalities
3. Nonparametric instrumental variables and perturbed operators
SLIDE 18

oracle parameter choice rules

Consider an inverse problem Y = F(a†) + σξ + δζ and a family {R_α : Y → X} of regularized inverses of F. An oracle parameter choice rule α_or for the method {R_α} and the solution a† is defined by

sup_{‖ζ‖≤1} E‖R_{α_or}(Y) − a†‖² = inf_α sup_{‖ζ‖≤1} E‖R_α(Y) − a†‖².

An oracle inequality for some given parameter choice rule α* = α*(Y, σ, δ) is an estimate of the form

sup_{‖ζ‖≤1} E‖R_{α*}(Y) − a†‖² ≤ χ(σ, δ) sup_{‖ζ‖≤1} E‖R_{α_or}(Y) − a†‖².

In the optimal case χ(σ, δ) → 1 as σ, δ → 0.

  • E. Candès. Modern statistical estimation via oracle inequalities. Acta Numerica, 15:257–325, 2006.

SLIDE 19

typical convergence results in deterministic regularization theory

  • In deterministic theory, convergence results for parameter choice rules typically contain a comparison with all other reconstruction methods R̃ : Y → X.
  • In this case one cannot consider only one a† ∈ X; otherwise the optimal method would be R̃(Y) ≡ a†.
  • Hence, estimates must be uniform over a smoothness class S ⊂ X, which is typically defined by a source condition, e.g.

sup_{a†∈S} sup_{‖ζ‖≤1} ‖R_{α*}(F(a†) + δζ) − a†‖ ≤ C inf_{R̃} sup_{a†∈S} sup_{‖ζ‖≤1} ‖R̃(F(a†) + δζ) − a†‖.

SLIDE 20

oracle inequalities are more precise

Proposition. Let R_α := (αI + T*T)^{−1} T* (Tikhonov regularization) and A = {(T*T)^μ w : ‖w‖ ≤ ρ} with ρ > 0 and μ ∈ (0, 1]. Then for all a† ∈ A

sup_{δ>0}  [ inf_{R̃} sup_{a∈A} sup_{‖ζ‖≤1} ‖R̃(Ta + δζ) − a‖ ] / [ inf_{α>0} sup_{‖ζ‖≤1} ‖R_α(Ta† + δζ) − a†‖ ]  =  ∞.

In other words: for every element a† in the smoothness class A there exists an error level δ > 0 for which the classical deterministic error bounds are suboptimal by an arbitrarily large factor! This is a deterministic analog of superefficiency.

  • T. T. Cai and M. G. Low. Nonparametric estimation over shrinking neighborhoods: superefficiency and adaptation. Ann. Stat., 33:184–213, 2005.

SLIDE 21

balancing principle for nonlinear inverse problems

  • Let â_0, â_1, . . . , â_{K_max} be estimators of a† such that
    ‖â_k − a†‖ ≤ Φ_noi(k) + Φ_app(k) + Φ_nl(k),  k ≤ K_max.
  • Φ_app is unknown and non-increasing.
  • Φ_noi is known and non-decreasing.
  • Φ_nl is unknown and satisfies for some γ_nl > 0
    Φ_nl(k) ≤ γ_nl (Φ_noi(k) + Φ_app(k)),  k = 0, . . . , K_max.

SLIDE 22

oracle inequality

Lepskiĭ balancing principle:

k_bal := min { k ≤ K_max : ‖â_k − â_m‖ ≤ 4(1 + γ_nl) Φ_noi(m) for m = k + 1, . . . , K_max }

Theorem (Bauer, Hohage, Munk). Assume that Φ_noi(k + 1) ≤ γ_noi Φ_noi(k) for some constant γ_noi < ∞. Then

‖â_{k_bal} − a†‖ ≤ 6(1 + γ_nl) γ_noi min_{k=1,...,K_max} ( Φ_app(k) + Φ_noi(k) ).
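A minimal sketch of the selection rule (my own rendering; the estimators and the known bound Φ_noi are assumed inputs):

    import numpy as np

    def lepskii_balancing(a_hats, Phi_noi, gamma_nl):
        """Return k_bal for estimators a_hats[0..Kmax] (arrays) and a known,
        non-decreasing noise bound Phi_noi(k)."""
        Kmax = len(a_hats) - 1
        for k in range(Kmax + 1):
            if all(np.linalg.norm(a_hats[k] - a_hats[m])
                   <= 4 * (1 + gamma_nl) * Phi_noi(m)
                   for m in range(k + 1, Kmax + 1)):
                return k
        return Kmax

Note that only the known Φ_noi enters; Φ_app and Φ_nl are never evaluated, which is the point of the principle.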

This extends a result for the linear case (γ_nl = 0) by

  • P. Mathé and S. Pereverzev. Regularization of some linear ill-posed problems with discretized random noisy data. Math. Comp., 75:1913–1929, 2006.

See also

  • O. V. Lepskiĭ. On a problem of adaptive estimation in Gaussian white noise. Theory Probab. Appl., 35:454–466, 1990.
  • P. Mathé. The Lepskiĭ principle revisited. Inverse Problems, 22:L11–L15, 2006.

SLIDE 23

deterministic errors, unknown smoothness

We return now to the Newton method for nonlinear inverse problems.

Corollary. Let Y = F(a†) + δζ. Then

‖â_{k_bal} − a†‖ ≤ 6(1 + γ_nl) γ_noi inf_{k∈ℕ} ( Φ_app(k) + δ/√α_k ).

SLIDE 24

stochastic noise, unknown smoothness

Corollary. Let Y = F(a†) + σξ + δζ. Furthermore, let k_bal be chosen by the Lepskiĭ balancing principle if â_k ∈ B_{2R}(a_0) for k = 1, . . . , K_max, and k_bal := 0 otherwise. Then there exists a constant C > 0 such that for σ, δ small enough

(E‖â_{k_bal} − a†‖²)^{1/2} ≤ C min_{k∈ℕ} ( ‖E^app_k‖ + δ/√α_k + (ln σ^{−1}) σ φ_noi(α_k) ).

SLIDE 25

Can the logarithmic factor be avoided?

In general, no! Counterexample:

  • A. Tsybakov. On the best rate of adaptive estimation in some inverse problems. C. R. Acad. Sci. Paris, 300:835–840, 2000.

However, for linear compact operators with polynomially decaying singular values, yes!

  • L. Cavalier, G. K. Golubev, D. Picard and A. B. Tsybakov. Oracle inequalities for inverse problems. Ann. Stat., 30:843–874, 2002.
  • L. Cavalier and A. Tsybakov. Sharp adaptation for inverse problems with random noise. Probab. Theory Relat. Fields, 123:323–354, 2002.
  • L. Cavalier and G. K. Golubev. Risk hull method for inverse problems. Ann. Stat., to appear.
SLIDE 26

Unbiased Risk Estimation

Let Y = Ta† + ε with y := Ta†, and let â_α := T* R_α Y be a linear estimator of a† depending on α > 0. To estimate the risk R(α, a†) := E‖â_α − a†‖², assume an independent copy ε̃ of the noise is available and consider

U(Y, α, ε̃) := ‖T* R_α Y‖² − 2 ⟨R_α(Y + ε̃), Y − ε̃⟩.

Then U(Y, α, ε̃) is an unbiased estimator of the risk up to an additive constant, since

E U(Y, α, ε̃) = E‖T* R_α Y‖² − 2 E⟨R_α(y + ε + ε̃), y + ε − ε̃⟩
             = E‖â_α‖² − 2 ⟨R_α y, y⟩ − 2 E⟨R_α ε, ε⟩ + 2 E⟨R_α ε̃, ε̃⟩
             = E‖â_α‖² − 2 ⟨T* R_α y, a†⟩
             = E‖â_α‖² − 2 E⟨T* R_α Y, a†⟩
             = R(α, a†) − ‖a†‖².
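A quick Monte Carlo sanity check of this identity (entirely my construction: a random finite-dimensional T, Gaussian noise, and the concrete choice R_α = (TT^T + αI)^{−1}, so that â_α = T^T R_α Y is the Tikhonov estimator):

    import numpy as np

    rng = np.random.default_rng(0)
    m, n, alpha, sigma = 40, 30, 0.1, 0.05
    T = rng.standard_normal((m, n)) / np.sqrt(m)
    a_true = rng.standard_normal(n)
    y = T @ a_true

    def R_alpha(v):
        return np.linalg.solve(T @ T.T + alpha * np.eye(m), v)

    U_vals, risks = [], []
    for _ in range(5000):
        eps = sigma * rng.standard_normal(m)
        eps_tilde = sigma * rng.standard_normal(m)   # independent noise copy
        Y = y + eps
        a_hat = T.T @ R_alpha(Y)
        U_vals.append(a_hat @ a_hat - 2 * (R_alpha(Y + eps_tilde) @ (Y - eps_tilde)))
        risks.append(np.sum((a_hat - a_true) ** 2))

    # E U should match E||a_hat - a||^2 - ||a||^2 up to Monte Carlo error:
    print(np.mean(U_vals), np.mean(risks) - a_true @ a_true)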

SLIDE 27

A condition for bounding the variance of U

To bound the variance of U, the following condition is used in the analysis:

tr( g_α(T*T)² ) ≤ C tr( (T*T)² g_α(T*T)⁴ ).

This condition is satisfied for the truncated singular value decomposition, but violated for Tikhonov regularization, Landweber iteration and ν-methods.
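A numerical illustration of this claim (my construction) for polynomially decaying singular values σ_j = j^{−1}: the ratio of the two traces equals 1 for truncated SVD but grows with the discretization dimension for Tikhonov, since g_α(λ) → 1/α > 0 as λ → 0 makes the left-hand trace diverge in infinite dimensions.

    import numpy as np

    alpha = 1e-4
    for N in [10**3, 10**4, 10**5]:
        lam = np.arange(1.0, N + 1) ** -2        # lambda_j = sigma_j^2 = j^{-2}
        for name, g in [("TSVD", np.where(lam >= alpha, 1 / lam, 0.0)),
                        ("Tikhonov", 1 / (lam + alpha))]:
            lhs = np.sum(g ** 2)                  # tr g_alpha(T*T)^2
            rhs = np.sum(lam ** 2 * g ** 4)       # tr (T*T)^2 g_alpha(T*T)^4
            print(f"N={N:6d} {name:8s} lhs/rhs = {lhs / rhs:.1f}")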

SLIDE 28

A modified iterated Tikhonov regularization

For given m = 2, 3, . . . compute an estimator by

â^{(0)}_α := −m argmin_{a∈X} ( ‖Ta − Y‖² + α‖a‖² )

and

â^{(l)}_α := argmin_{a∈X} ( ‖Ta − Y‖² + α‖a − â^{(l−1)}_α‖² ),  l = 1, . . . , m.

Then for exact data Y = Ta†

a† − â^{(m)}_α = r_α(T*T) a†  with  r_α(λ) := (α/(α + λ))^m · (α + (m + 1)λ)/(α + λ).

The method satisfies the usual assumptions and has qualification m − 1. Moreover, it satisfies the condition on the previous slide if the singular values of T decay polynomially.
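A direct transcription into code (my sketch; the matrix T stands in for the operator):

    import numpy as np

    def modified_iterated_tikhonov(T, Y, alpha, m):
        """Modified m-fold iterated Tikhonov as reconstructed above:
        start from -m times the plain Tikhonov estimator, then iterate."""
        M = T.T @ T + alpha * np.eye(T.shape[1])
        a = -m * np.linalg.solve(M, T.T @ Y)      # a^(0) := -m argmin(...)
        for _ in range(m):                        # a^(l), l = 1, ..., m
            a = np.linalg.solve(M, T.T @ Y + alpha * a)
        return a

For exact data Y = Ta† one can check numerically that a† − modified_iterated_tikhonov(T, T @ a, alpha, m) agrees with r_α(T^T T) a† for the r_α given above.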

SLIDE 29

Outline

1. A Newton method for nonlinear statistical inverse problems
2. Oracle inequalities
3. Nonparametric instrumental variables and perturbed operators
SLIDE 30

introduction

  • regression problem: Estimate a function a given n independent observations (X_i, Z_i), i = 1, . . . , n of random variables X, Z satisfying Z = a(X) + ε, where ε is an unobservable nuisance variable satisfying E(ε|X) = 0.
  • Often the assumption E(ε|X) = 0 is violated.
  • We will show that by solving an ill-posed inverse problem one can still estimate a if there exists another observable quantity W which is sufficiently correlated with X and satisfies E(ε|W) = 0 (a toy example is sketched below).
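To make the setting concrete, here is a toy data-generating process (entirely my construction; all coefficients are arbitrary) in which X is endogenous but W is a valid instrument:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 10_000
    W = rng.standard_normal(n)                  # instrument
    U = rng.standard_normal(n)                  # unobserved confounder
    X = 0.8 * W + 0.6 * U + 0.2 * rng.standard_normal(n)
    eps = U + 0.1 * rng.standard_normal(n)      # nuisance variable
    Z = np.sin(X) + eps                         # true regression function a(x) = sin(x)
    # E(eps | X) != 0 because X and eps share the confounder U,
    # but E(eps | W) = 0 since U is independent of W.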

SLIDE 31

Estimating hourly wages as a function of the education level
  • Z_i: hourly wage of individual i
  • X_i: level of education of individual i
  • unknown: a(X) := E(Z|X)

Here it seems unlikely that the education level X and the nuisance variable ε = Z − a(X) are uncorrelated, since there are other variables such as intelligence and stamina which influence both X and Z. However, we may choose W, e.g., as the distance of the individual's apartment from college and reasonably assume that E(ε|W) = 0.

  • P. Hall and J. L. Horowitz. Nonparametric methods for inference in the presence of instrumental variables. Ann. Stat., 33:2904–2929, 2005.

SLIDE 32

a linear first kind integral equation

From the observed data (X_i, Z_i, W_i) we can estimate the joint density f(x, z, w). We have

∫ f_{X|W}(x|w) g(x) dx = E(Z|W = w)  for all w,

where f_{X|W}(·|w) denotes the conditional density of X given W = w. Setting k(w, x) := f_{X|W}(x|w) and u(w) := E(Z|W = w) we obtain the linear integral equation

∫ k(w, x) g(x) dx = u(w).

Note that both the right-hand side and the kernel are noisy, since they have to be estimated from the data.
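An end-to-end sketch under strong simplifications (my construction; the kernel density estimates, the Nadaraya-Watson smoother for u, the uniform grid quadrature, and the fixed regularization parameter are all illustrative choices):

    import numpy as np
    from scipy.stats import gaussian_kde

    def solve_iv_linear(X, Z, W, grid, alpha=1e-2, h=0.3):
        """Estimate k(w, x) and u(w) from the sample, discretize the integral
        equation on a uniform grid, and solve by Tikhonov regularization."""
        f_xw = gaussian_kde(np.vstack([X, W]))         # joint density of (X, W)
        f_w = gaussian_kde(W)                          # marginal density of W
        K = np.array([[f_xw(np.array([x, w]))[0] / f_w(w)[0] for x in grid]
                      for w in grid])                  # k(w, x) = f_{X|W}(x|w)
        u = np.array([np.average(Z, weights=np.exp(-0.5 * ((W - w) / h) ** 2))
                      for w in grid])                  # u(w) ~ E(Z | W = w)
        A = K * (grid[1] - grid[0])                    # quadrature weights
        return np.linalg.solve(A.T @ A + alpha * np.eye(len(grid)), A.T @ u)

Both the kernel matrix and the right-hand side here are themselves estimates, which is exactly the "noisy operator" situation the talk addresses.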

SLIDE 33

a nonlinear integral equation

Often the assumption E(ε|W) = 0 can be replaced by the stronger independence assumption: ε and W are independent with Eε = 0. The independence of ε and W is equivalent to

∫ f(ε + a(x), x, w) dx = ∫ f_{Z,X}(ε + a(x), x) f_W(w) dx  for all ε, w,

where f_W and f_{Z,X} denote the marginal densities w.r.t. W and (Z, X), respectively. This is a nonlinear integral equation with a noisy kernel, which can be solved by regularized Newton methods.

joint work with J. P. Florens, J. Johannes and E. Mammen

SLIDE 34

related work also leading to a nonlinear integral equation:

  • J. L. Horowitz and S. Lee. Nonparametric instrumental variables estimation of a quantile regression model. Econometrica, 75:1191–1208, 2008.

The proof of convergence is modelled after the following paper:

  • N. Bissantz, T. Hohage and A. Munk. Consistency and rates of convergence of nonlinear Tikhonov regularization with random noise. Inverse Problems, 20:1773–1791, 2004.

It uses a Hölder source condition, which seems unnatural in this context since the estimated kernels of the integral operators are smooth.