SLIDE 1

On-line estimation of a smooth regression function

Liptser, R., jointly with L. Goldentyer
Tel Aviv University, Dept. of Electrical Engineering-Systems

December 19, 2002

SLIDE 2

SETTING

We consider a tracking problem for a smooth function f = f(t), 0 ≤ t ≤ T, under the observations

X_i^n = f(t_i^n) + σξ_i,   t_i^n = i/n,   n is large;

  • (ξ_i) is i.i.d. with Eξ_i = 0, Eξ_i² = 1;
  • σ² is a positive constant.

Without additional assumptions on f, it is difficult to create an estimator even when n is large.

SLIDE 3

Main assumption: f is k-times differentiable and the highest derivative is Lipschitz continuous.

Filtering approach (Bar-Shalom and Li): simulate f^(k)(t) with the help of WHITE NOISE:

d f^(k)(t)/dt = "white noise".

It sounds like nonsense but works rather well.

Nonparametric statistics approach: f ∈ Σ(k, α, L), the Stone-Ibragimov-Khasminskii class containing k-times differentiable functions with

|f^(k)(t″) − f^(k)(t′)| ≤ L|t″ − t′|^α,   0 < α ≤ 1.

SLIDE 4

Task: to combine both approaches.

Since the quality of estimation depends on n, every estimate of f is marked by n; that is, f̂_n^(j)(t) are estimates of f^(j)(t), j = 0, 1, ..., k, respectively.

It is known from Ibragimov and Khasminskii that for a wide class of losses L,

sup_{f ∈ Σ(k,α,L)} E L( n^{(k+α−j)/(2(k+α)+1)} ‖f̂_n^(j) − f^(j)‖_{L_p} ) < C,

and

n^{(k+α−j)/(2(k+α)+1)},   j = 0, 1, ..., k,

is the best rate, uniformly over the class, at which the estimation risk converges to zero as n → ∞.

SLIDE 5

In particular, the risks E|f̂_n^(j)(t) − f^(j)(t)|², j = 0, 1, ..., k, have the same rates in n:

sup_{f ∈ Σ(k,α,L)} lim_n E( n^{(k+α−j)/(2(k+α)+1)} |f̂_n^(j)(t) − f^(j)(t)| )² < C.

These rates cannot be exceeded uniformly on any nonempty open subset of (0, T).

Jointly with Khasminskii, we realize an on-line filter guaranteeing the optimal rates in n.

SLIDE 6

Here t_i^n and f̂_n^(j)(t_i^n) are identified with t_i and f̂^(j)(t_i). For j = 0, 1, ..., k − 1,

f̂^(j)(t_i) = f̂^(j)(t_{i−1}) + (1/n) f̂^(j+1)(t_{i−1}) + ( q_j / n^{(2(k+α)−j)/(2(k+α)+1)} ) ( X_i − f̂^(0)(t_{i−1}) ),

and for j = k,

f̂^(k)(t_i) = f̂^(k)(t_{i−1}) + ( q_k / n^{(2(k+α)−k)/(2(k+α)+1)} ) ( X_i − f̂^(0)(t_{i−1}) ).

The vector q with entries q_0, ..., q_k has to be chosen such that all roots of the characteristic polynomial

p_k(u, q) = u^{k+1} + q_0 u^k + q_1 u^{k−1} + ... + q_{k−1} u + q_k

are distinct and have negative real parts.
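As a concrete sketch, the recursion above can be written in a few lines of Python (our illustration, not the authors' code; the function name, the zero default initial condition, and the array layout are choices of ours):

```python
import numpy as np

def track(X, k, alpha, q, n, init=None):
    """On-line tracking of f and its derivatives (sketch of the recursion above).

    X    : observations X_i = f(i/n) + sigma*xi_i, i = 1..n
    q    : gain vector (q_0, ..., q_k)
    init : initial values (fhat^(0)(0), ..., fhat^(k)(0)); zeros by default
    Returns an (n+1) x (k+1) array whose column j tracks f^(j)(t_i).
    """
    beta = k + alpha
    # gain of the j-th component decays as n^{-(2*beta - j)/(2*beta + 1)}
    gains = q * n ** (-(2 * beta - np.arange(k + 1)) / (2 * beta + 1))
    est = np.zeros((n + 1, k + 1))
    if init is not None:
        est[0] = init
    for i in range(1, n + 1):
        prev = est[i - 1]
        resid = X[i - 1] - prev[0]      # innovation X_i - fhat^(0)(t_{i-1})
        est[i] = prev
        est[i, :k] += prev[1:] / n      # Euler step: fhat^(j) += fhat^(j+1)/n
        est[i] += gains * resid         # gain correction on all components
    return est
```

All k + 1 components are driven by the same innovation X_i − f̂^(0)(t_{i−1}), exactly as in the displayed recursion; only the j = k component has no Euler term.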

SLIDE 7

Two problems

1. Choice of appropriate initial conditions f̂^(0)(0), f̂^(1)(0), ..., f̂^(k)(0) to minimize the boundary layer.

2. Choice of the vector q such that the assumption about the roots of the polynomial p_k(u, q) remains valid and the bound C(q),

C(q) ≥ sup_{f ∈ Σ(k,α,L)} E( n^{(k+α−j)/(2(k+α)+1)} |f̂_n^(j)(t) − f^(j)(t)| )²,

is as small as possible.

To manage these problems we restrict ourselves to α = 1.

SLIDE 8

Boundary layer

The left-side boundary layer, of width c(q) n^{−1/(2β+1)} log n (β = k + α), where the optimal rates in n might be lost, is inevitable. This boundary layer is due to the on-line limitations of the above tracking system.

One can readily suggest an off-line modification with the same recursion in backward time, subject to boundary conditions independent of the observations X_i. This modification has a right-side boundary layer.

So, a combination of the forward- and backward-time tracking algorithms allows one to support the optimal rate in n on the whole of [0, T].
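For intuition, here is a toy Python sketch of such a forward/backward combination for the simplest case k = 0 (our illustration only: the constant gain g, the zero starting values, and the mid-point splice are hypothetical choices, not the authors' scheme):

```python
import numpy as np

def combined_track(X, n, g):
    """Run the same first-order recursion forward and backward in time,
    then use the backward pass near t = 0 (where the forward pass has its
    boundary layer) and the forward pass near t = T (toy sketch)."""
    fwd = np.empty(n)
    bwd = np.empty(n)
    acc = 0.0
    for i in range(n):                 # forward pass
        acc += g * (X[i] - acc)
        fwd[i] = acc
    acc = 0.0
    for i in range(n - 1, -1, -1):     # backward pass
        acc += g * (X[i] - acc)
        bwd[i] = acc
    m = n // 2                         # hypothetical splice point
    return np.concatenate([bwd[:m], fwd[m:]])
```

Each pass is accurate away from its own starting edge, so the splice keeps the well-converged half of each pass.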

SLIDE 9

Suitable choice of q

The vector q should satisfy multiple requirements regarding

  • C(q), the upper bound for the normalized risk;
  • c(q), the parameter of the boundary layer;
  • the roots of the polynomial p_k(u, q).

These requirements might contradict each other.

SLIDE 10

Example 1, Σ(0, 1, L)

The worst-case function is f(t) = f(0) ± Lt. Applying the Arzelà-Ascoli theorem we find that

C(q) = σq/2 + L²σ²/q²

and

q° := argmin_{q>0} C(q) = (2L)^{2/3} σ^{1/3}.

Hence, a reasonable estimator is

f̂(t_i) = f̂(t_{i−1}) + ( (2L)^{2/3} σ^{1/3} / n^{2/3} ) ( X_i − f̂(t_{i−1}) ).
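In Python this one-line tracker reads as follows (a sketch; the σ^{1/3} and n^{−2/3} factors in the gain are our reconstruction from the general recursion with k = 0, α = 1, and the function name is ours):

```python
import numpy as np

def track_k0(X, L, sigma, n, f0=0.0):
    """Example-1 tracker for Sigma(0, 1, L):
    fhat_i = fhat_{i-1} + g * (X_i - fhat_{i-1}),
    with gain g = q0 / n^{2/3}, q0 = (2L)^{2/3} sigma^{1/3}
    (n- and sigma-factors filled in from the general recursion, k=0, alpha=1)."""
    g = (2 * L) ** (2 / 3) * sigma ** (1 / 3) / n ** (2 / 3)
    fhat = np.empty(n + 1)
    fhat[0] = f0
    for i in range(1, n + 1):
        fhat[i] = fhat[i - 1] + g * (X[i - 1] - fhat[i - 1])
    return fhat
```

For moderate L and σ the gain is far below one, so the estimate relaxes geometrically toward the signal level.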

SLIDE 11

General case, Σ(k > 0, 1, L)

With the worst-case f(t) such that f^(k)(t) = f^(k)(0) ± Lt, applying the Arzelà-Ascoli theorem we find

C(q) = trace( P(q) + M(q)M*(q) ),

where M(q) = L(a − qA)^{−1}b and P(q) solves the Lyapunov equation

(a − qA)P(q) + P(q)(a − qA)* + σ²qq* = 0.

SLIDE 12

Here,

a = ( 0 1 0 ⋯ 0
      0 0 1 ⋯ 0
      ⋮        ⋮
      0 0 0 ⋯ 1
      0 0 0 ⋯ 0 )_{(k+1)×(k+1)}   (ones on the superdiagonal),

A = ( 1 0 ⋯ 0 )_{1×(k+1)},   b = ( 0 ⋯ 0 1 )*_{(k+1)×1}.

SLIDE 13

Conditional minimization

A direct minimization of C(q) is useless: a computer implementation is heavy enough, and even if q° = argmin_q C(q) is found, the main requirement, expressed in terms of the roots of the polynomial p_k(u, q), might not be satisfied (numerical computations show that).

So, some kind of conditional minimization procedure in the vector q is desirable. The main tool for such a minimization is an adaptation of Kalman filter design.

SLIDE 14

Kalman filter design

In the frame of the Bar-Shalom idea, set

f^(k)(t_i) = f^(k)(t_{i−1}) + n^{−(k+2)/(2(k+1)+1)} γ η_i,
X_i = f^(0)(t_{i−1}) + σξ_i,

where (η_i) is a white noise, independent of (ξ_i), with Eη_1 = 0, Eη_1² = 1, and γ is a free parameter.

For any γ ≠ 0, the Kalman filter possesses an asymptotic form as n → ∞ and, being applied to the original function f(t), guarantees the optimal rate in n → ∞ for the estimation risk. In other words, that Kalman filter coincides with our proposed filter.

The remarkable fact is that q = q(γ), and for any positive γ the roots of the polynomial p_k(u, q(γ)) are distinct and have negative real parts.

SLIDE 15

Thus,

q(γ) = Q(γ)A* / σ²,

with Q(γ) being the solution of the algebraic Riccati equation

aQ(γ) + Q(γ)a* + γ²bb* − Q(γ)A*AQ(γ)/σ² = 0,

which possesses a unique positive-definite solution since the block matrices

G1 = ( A; Aa; ...; Aa^k )   and   G2 = ( bb*, abb*, ..., a^k bb* )

are of full rank (the so-called observability and controllability conditions).
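Numerically, q(γ) can be obtained from this Riccati equation with a standard CARE solver; the following is a sketch of our mapping onto SciPy's solve_continuous_are (which solves XA + A^T X − XBR^{−1}B^T X + Qc = 0, so we pass A = a^T, B = A*, R = σ², Qc = γ²bb*):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def gain_vector(k, gamma, sigma):
    """Solve a Q + Q a* + gamma^2 b b* - Q A* A Q / sigma^2 = 0 and
    return q(gamma) = Q(gamma) A* / sigma^2."""
    a = np.eye(k + 1, k=1)                     # ones on the superdiagonal
    A = np.zeros((1, k + 1)); A[0, 0] = 1.0    # observe the first component
    b = np.zeros((k + 1, 1)); b[k, 0] = 1.0    # noise drives the last component
    Q = solve_continuous_are(a.T, A.T, gamma ** 2 * (b @ b.T),
                             np.array([[sigma ** 2]]))
    return (Q @ A.T / sigma ** 2).ravel()

# k = 2 with the minimizing gamma for L = 100, sigma = 0.25 (from the plot data)
q = gain_vector(2, 24.5325, 0.25)
roots = np.roots(np.concatenate(([1.0], q)))   # roots of p_k(u, q(gamma))
```

For these values q comes out close to (9.225, 42.550, 98.132), the gains used in Example 2 below, and all roots of p_2(u, q(γ)) have negative real parts, as the slides claim.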

SLIDE 16

C(q(γ))-minimization

We reduce the minimization of C(q) with respect to the vector q to the minimization of C(q(γ)) with respect to a positive parameter γ.

[Figure: C(q(γ)) versus γ in logarithmic scale for k = 2 and various L and σ (panels σ = 0.25, σ = 1, σ = 4; curves L = 1, 10, 100). The minimizers are:

L = 1,   σ = 0.25:  γ = 0.74082,  C = 5.278
L = 10,  σ = 0.25:  γ = 4.4817,   C = 102.1574
L = 100, σ = 0.25:  γ = 24.5325,  C = 2695.7356
L = 1,   σ = 1:     γ = 1,        C = 18.8975
L = 10,  σ = 1:     γ = 6.0496,   C = 260.7145
L = 100, σ = 1:     γ = 33.1155,  C = 5839.3352
L = 1,   σ = 4:     γ = 1.3499,   C = 92.0461
L = 10,  σ = 4:     γ = 8.1662,   C = 787.7197
L = 100, σ = 4:     γ = 49.4024,  C = 13850.9423 ]

SLIDE 17

Explicit minimization procedure

The entries of Q(γ) obey the following representation:

Q_ij(γ, σ) = U_ij σ² (γ/σ)^{(i+j+1)/(k+1)},   i, j = 0, 1, ..., k,

where the U_ij are the entries of the matrix U, itself the solution of the algebraic Riccati equation free of σ and γ:

aU + Ua* + bb* − UA*AU = 0.

We have

q_0(γ) = U_00 (γ/σ)^{1/(k+1)},
q_1(γ) = U_01 (γ/σ)^{2/(k+1)},
.................................
q_k(γ) = U_0k (γ/σ).
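As a sketch of how these formulas are used in practice (ours; we take the k = 2 row (2, 2, 1) of the U-table and the minimizing γ = 24.5325 for L = 100, σ = 0.25 from the plot data):

```python
import numpy as np

def gains_from_U(U_row, gamma, sigma):
    """q_j(gamma) = U_{0j} (gamma/sigma)^{(j+1)/(k+1)}, j = 0, ..., k."""
    k = len(U_row) - 1
    r = gamma / sigma
    return np.array([U_row[j] * r ** ((j + 1) / (k + 1)) for j in range(k + 1)])

# k = 2, U-row (2, 2, 1); gamma and sigma taken from the C(q(gamma)) minimization
q = gains_from_U([2.0, 2.0, 1.0], gamma=24.5325, sigma=0.25)
# q is approximately (9.225, 42.550, 98.132), the gains of Example 2
```

So the gain vector follows from a single table lookup plus the scalar ratio γ/σ, with no Riccati equation to solve at run time.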

SLIDE 18

For k = 0, ..., 4:

k | U_00      | U_01  | U_02      | U_03  | U_04
0 | 1         | NA    | NA        | NA    | NA
1 | √2        | 1     | NA        | NA    | NA
2 | 2         | 2     | 1         | NA    | NA
3 | √(4+√8)   | 2+√2  | √(4+√8)   | 1     | NA
4 | 1+√5      | 3+√5  | 3+√5      | 1+√5  | 1

SLIDE 19

Roots of p_k(u, q)

k = 0:  −(γ/σ)
k = 1:  (γ/σ)^{1/2} ( −1/√2 ± i/√2 )
k = 2:  (γ/σ)^{1/3} ( −1;  −1/2 ± i√3/2 )
k = 3:  (γ/σ)^{1/4} ( −0.924 ± i·0.383;  −0.383 ± i·0.924 )
k = 4:  (γ/σ)^{1/5} ( −1;  −0.809 ± i·0.588;  −0.309 ± i·0.951 )
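A quick numerical cross-check (ours) ties these roots to the U-table: by the explicit formulas, at γ/σ = 1 the coefficients of p_k(u, q(γ)) are the row U_00, ..., U_0k, and its roots turn out to be distinct, of unit modulus, and with the negative real parts listed above:

```python
import numpy as np

# Rows of the U-table: coefficients of p_k(u, q(gamma)) at gamma/sigma = 1,
#     p_k(u) = u^{k+1} + U_00 u^k + ... + U_0k
U_rows = {
    0: [1.0],
    1: [np.sqrt(2), 1.0],
    2: [2.0, 2.0, 1.0],
    3: [np.sqrt(4 + np.sqrt(8)), 2 + np.sqrt(2), np.sqrt(4 + np.sqrt(8)), 1.0],
    4: [1 + np.sqrt(5), 3 + np.sqrt(5), 3 + np.sqrt(5), 1 + np.sqrt(5), 1.0],
}

def pk_roots(k):
    """Roots of p_k(u, q(gamma)) for gamma/sigma = 1."""
    return np.roots([1.0] + U_rows[k])
```

For a general γ/σ the roots are simply rescaled by (γ/σ)^{1/(k+1)}, as the list above shows, so stability holds for every positive γ.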

SLIDE 20

Example 2: k = 2, L = 100, σ = 0.25.

f̂^(0)(t_i) = f̂^(0)(t_{i−1}) + (1/n) f̂^(1)(t_{i−1}) + (9.225/n^{6/7}) ( X_i − f̂^(0)(t_{i−1}) ),
f̂^(1)(t_i) = f̂^(1)(t_{i−1}) + (1/n) f̂^(2)(t_{i−1}) + (42.550/n^{5/7}) ( X_i − f̂^(0)(t_{i−1}) ),
f̂^(2)(t_i) = f̂^(2)(t_{i−1}) + (98.132/n^{4/7}) ( X_i − f̂^(0)(t_{i−1}) ).

SLIDE 21

Forward and backward time tracking with n = 2000

[Figure: four panels tracking a 3rd-order Lipschitz-continuous function. Panel 1: observations, signal, and forward-time tracking of the function; panels 2-4: signal, forward-time tracking, and combined-time tracking of the function, its 1st derivative, and its 2nd derivative.]

SLIDE 22

Comparison with a spline, n = 2000

[Figure: four panels comparing spline estimation with forward-time tracking for the same 3rd-order Lipschitz-continuous function, its 1st derivative, and its 2nd derivative.]

SLIDE 23

Application to Finance

[Figure: volatility, original and reconstructed data.]

Volatility is assumed to be a Lipschitz-continuous function.

SLIDE 24

[Figure: volatility, original and reconstructed data.]

Tracking with adaptation.
