

  1. On-line estimation of a smooth regression function
     Liptser, R., jointly with L. Goldentyer
     Tel Aviv University, Dept. of Electrical Engineering-Systems
     December 19, 2002

  2. SETTING
     We consider a tracking problem for a smooth function $f = f(t)$, $0 \le t \le T$, under the observations
     $$X_{in} = f(t_{in}) + \sigma \xi_i, \qquad t_{in} = \frac{i}{n}, \quad n \text{ large},$$
     where
     - $(\xi_i)$ is i.i.d. with $E\xi_i = 0$, $E\xi_i^2 = 1$;
     - $\sigma^2$ is a positive constant.
     Without additional assumptions on $f$, it is difficult to create an estimator even when $n$ is large.
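A minimal simulation of this observation scheme (the test function $f$ and the Gaussian choice for $\xi_i$ below are purely illustrative assumptions, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 1_000, 1.0
f = lambda t: np.cos(3 * t)                # hypothetical smooth f on [0, 1]
t = np.arange(1, n + 1) / n                # sampling grid t_in = i / n
X = f(t) + sigma * rng.standard_normal(n)  # X_in = f(t_in) + sigma * xi_i
```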

  3. Main assumption
     $f$ is $k$-times differentiable and the highest derivative is Hölder continuous.

     Filtering approach (Bar-Shalom and Li): simulate $f^{(k)}(t)$ with the help of WHITE NOISE,
     $$\frac{d}{dt} f^{(k)}(t) = \text{``white noise''}.$$
     It sounds like nonsense but works pretty well.

     Nonparametric statistics approach: $f \in \Sigma(k, \alpha, L)$, the Stone-Ibragimov-Khasminskii class of $k$-times differentiable functions with
     $$\big| f^{(k)}(t'') - f^{(k)}(t') \big| \le L |t'' - t'|^\alpha, \qquad 0 < \alpha \le 1.$$

  4. Task: to combine both approaches
     Since the quality of estimation depends on $n$, every estimate of $f$ is marked by $n$: $\hat f_n^{(j)}(t)$ denotes the estimate of $f^{(j)}(t)$, $j = 0, 1, \ldots, k$.

     It is known from Ibragimov and Khasminskii that, for a wide class of loss functions $\ell$,
     $$\sup_{f \in \Sigma(k,\alpha,L)} E\, \ell\Big( n^{\frac{k+\alpha-j}{2(k+\alpha)+1}} \big\| \hat f_n^{(j)} - f^{(j)} \big\|_{L_p} \Big) < C,$$
     and
     $$n^{-\frac{k+\alpha-j}{2(k+\alpha)+1}}, \qquad j = 0, 1, \ldots, k,$$
     is the best rate, uniformly in the class, at which the estimation risk can converge to zero as $n \to \infty$. For example, with $k = 1$, $\alpha = 1$, $j = 0$, the rate is $n^{-2/5}$.

  5. In particular, the risks
     $$E\big( \hat f_n^{(j)}(t) - f^{(j)}(t) \big)^2, \qquad j = 0, 1, \ldots, k,$$
     have the same rates in $n$:
     $$\sup_{f \in \Sigma(k,\alpha,L)} \limsup_n\, n^{\frac{2(k+\alpha-j)}{2(k+\alpha)+1}}\, E\big| \hat f_n^{(j)}(t) - f^{(j)}(t) \big|^2 < C.$$
     These rates cannot be exceeded uniformly on any nonempty open subset of $(0, T)$.

     Jointly with Khasminskii, we realize an on-line filter guaranteeing the optimal rates in $n$.

  6. Here $t_i$ and $\hat f^{(j)}(t_i)$ abbreviate $t_{in}$ and $\hat f_n^{(j)}(t_{in})$.
     For $j = 0, 1, \ldots, k-1$,
     $$\hat f^{(j)}(t_i) = \hat f^{(j)}(t_{i-1}) + \frac{1}{n}\, \hat f^{(j+1)}(t_{i-1}) + \frac{q_j}{n^{\frac{2(k+\alpha)-j}{2(k+\alpha)+1}}} \big( X_i - \hat f^{(0)}(t_{i-1}) \big),$$
     and for $j = k$,
     $$\hat f^{(k)}(t_i) = \hat f^{(k)}(t_{i-1}) + \frac{q_k}{n^{\frac{2(k+\alpha)-k}{2(k+\alpha)+1}}} \big( X_i - \hat f^{(0)}(t_{i-1}) \big).$$
     The vector $q$ with entries $q_0, \ldots, q_k$ has to be chosen such that all roots of the characteristic polynomial
     $$p_k(u, q) = u^{k+1} + q_0 u^k + q_1 u^{k-1} + \ldots + q_{k-1} u + q_k$$
     are distinct and have negative real parts.
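The recursion translates directly into code. Below is a minimal sketch (our own, not the authors' implementation); the helper name `track` and the zero initial conditions are illustrative assumptions:

```python
import numpy as np

def track(X, q, n, k, alpha=1.0, f0=None):
    """Run the on-line tracking recursion; returns est[i, j] ~ f^(j)(t_i)."""
    beta = k + alpha
    j = np.arange(k + 1)
    # gain_j = q_j / n^{(2*beta - j) / (2*beta + 1)}
    gains = np.asarray(q, float) / n ** ((2 * beta - j) / (2 * beta + 1))
    f = np.zeros(k + 1) if f0 is None else np.asarray(f0, float)
    est = np.empty((len(X), k + 1))
    for i, x in enumerate(X):
        innov = x - f[0]                 # X_i - hat f^(0)(t_{i-1})
        new = f + gains * innov          # gain correction, all j
        new[:-1] += f[1:] / n            # + (1/n) hat f^(j+1)(t_{i-1}), j < k
        f = new
        est[i] = f
    return est
```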

  7. Two problems
     1. Choice of appropriate initial conditions
        $$\hat f^{(0)}(0),\ \hat f^{(1)}(0),\ \ldots,\ \hat f^{(k)}(0)$$
        to minimize the boundary layer.
     2. Choice of the vector $q$ such that the assumption on the roots of the polynomial $p_k(u, q)$ remains valid and
        $$C(q) \ge \sup_{f \in \Sigma(k,\alpha,L)} n^{\frac{2(k+\alpha-j)}{2(k+\alpha)+1}}\, E\big( \hat f_n^{(j)}(t) - f^{(j)}(t) \big)^2$$
        is as small as possible.
     To manage these problems we have to restrict ourselves to $\alpha = 1$.

  8. Boundary layer
     A left-side boundary layer of width
     $$c(q)\, n^{-\frac{1}{2\beta+1}} \log n, \qquad \beta = k + \alpha,$$
     where the optimal rates in $n$ might be lost, is inevitable. This boundary layer is due to the on-line limitations of the above tracking system.

     One can readily suggest an off-line modification running the same recursion in backward time, subject to some boundary conditions independent of the observations $X_i$. This modification obeys a right-side boundary layer.

     So, a combination of the forward- and backward-time tracking algorithms allows one to maintain the optimal rate in $n$ on all of $[0, T]$, as sketched below.
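One plausible reading of this combination, as a sketch only: given observations `X` and a gain vector `q` as in the `track` helper above, run the same filter on the reversed data and stitch the two passes together. The switch point, the sign flip for odd derivatives under time reversal, and the layer-width estimate are our assumptions, not the authors' recipe:

```python
import numpy as np

# Forward pass: good away from t = 0; backward pass: good away from t = T.
fwd = track(X, q, n, k)
bwd = track(X[::-1], q, n, k)[::-1]
bwd[:, 1::2] *= -1          # d/dt flips sign under time reversal s = T - t
# Rough stand-in (dropping c(q) and log n) for the layer width in samples:
beta = k + 1.0              # alpha = 1
layer = min(int(n ** (2 * beta / (2 * beta + 1))), len(X) // 2)
est = np.where(np.arange(len(X))[:, None] < layer, bwd, fwd)
```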

  9. Suitable choice of q
     The vector $q$ should satisfy multiple requirements regarding
     - $C(q)$, the upper bound for the normalized risk;
     - $c(q)$, the parameter of the boundary layer;
     - the roots of the polynomial $p_k(u, q)$.
     These requirements might contradict each other.

  10. Example 1, $\Sigma(0, 1, L)$
      The worst case is $f(t) = f(0) \pm Lt$. Applying the Arzelà-Ascoli theorem we find that
      $$C(q) = \frac{\sigma q}{2} + \frac{L^2 \sigma^2}{q^2}$$
      and
      $$q^\circ := \operatorname*{argmin}_{q > 0} C(q) = (2L)^{2/3} \sigma^{1/3}.$$
      Hence, a reasonable estimator is
      $$\hat f(t_i) = \hat f(t_{i-1}) + \Big( \frac{2L}{n\sigma} \Big)^{2/3} \big( X_i - \hat f(t_{i-1}) \big).$$
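For this $k = 0$ case the filter is one line per observation. A sketch (the function name and the zero initialization are illustrative assumptions):

```python
import numpy as np

def track_k0(X, L, sigma, n, f0=0.0):
    """Scalar tracker for Sigma(0, 1, L) with the slide's gain (2L/(n*sigma))^(2/3)."""
    gain = (2 * L / (n * sigma)) ** (2 / 3)
    est, f = np.empty(len(X)), f0
    for i, x in enumerate(X):
        f += gain * (x - f)   # hat f(t_i) = hat f(t_{i-1}) + gain * innovation
        est[i] = f
    return est
```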

  11. General case, $\Sigma(k > 0, 1, L)$
      With the worst $f(t)$ such that
      $$f^{(k)}(t) = f^{(k)}(0) \pm Lt,$$
      applying the Arzelà-Ascoli theorem we find
      $$C(q) = \operatorname{trace}\big( P(q) + M(q) M^*(q) \big),$$
      where
      $$M(q) = L (a - qA)^{-1} b$$
      and $P(q)$ solves the Lyapunov equation
      $$(a - qA) P(q) + P(q) (a - qA)^* + \sigma^2 q q^* = 0.$$

  12. Here,
      $$a = \begin{pmatrix} 0 & 1 & 0 & \ldots & 0 \\ 0 & 0 & 1 & \ldots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \ldots & 1 \\ 0 & 0 & 0 & \ldots & 0 \end{pmatrix}_{(k+1) \times (k+1)},$$
      $$A = \begin{pmatrix} 1 & 0 & \ldots & 0 \end{pmatrix}_{1 \times (k+1)}, \qquad b = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}_{(k+1) \times 1}.$$
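With $a$, $A$, $b$ in hand, $C(q)$ can be evaluated numerically, e.g. via SciPy's Lyapunov solver. A sketch (the helper name `C_of_q` is ours; it assumes $a - qA$ is Hurwitz, i.e. the root condition above holds):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def C_of_q(q, L, sigma, k):
    """C(q) = trace(P(q) + M(q) M(q)*) for Sigma(k, 1, L)."""
    a = np.diag(np.ones(k), 1)                # ones on the superdiagonal
    A = np.eye(1, k + 1)                      # (1, 0, ..., 0), 1 x (k+1)
    b = np.zeros((k + 1, 1)); b[-1, 0] = 1.0  # (0, ..., 0, 1)^T
    q = np.asarray(q, float).reshape(-1, 1)
    F = a - q @ A
    # Lyapunov equation: F P + P F* = -sigma^2 q q*
    P = solve_continuous_lyapunov(F, -sigma**2 * (q @ q.T))
    M = L * np.linalg.solve(F, b)             # M(q) = L (a - qA)^{-1} b
    return float(np.trace(P + M @ M.T))
```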

  13. Conditional minimization
      A direct minimization of $C(q)$ is useless: its computer implementation is quite heavy, and even if
      $$q^\circ = \operatorname*{argmin}_q C(q)$$
      is found, the main requirement, expressed in terms of the roots of the polynomial $p_k(u, q)$, might not be satisfied (numerical computations show that).

      So, some kind of conditional minimization procedure in the vector $q$ is desirable. The main tool for such a minimization is an adaptation of Kalman filter design.

  14. Kalman filter design
      In the frame of the Bar-Shalom idea, set
      $$f^{(k)}(t_i) = f^{(k)}(t_{i-1}) + n^{-\frac{k+2}{2(k+1)+1}}\, \gamma\, \eta_i,$$
      $$X_i = f^{(0)}(t_{i-1}) + \sigma \xi_i,$$
      where $(\eta_i)$ is a white noise, independent of $(\xi_i)$, with $E\eta_1 = 0$, $E\eta_1^2 = 1$, and $\gamma$ is a free parameter.

      For any $\gamma \ne 0$, the Kalman filter possesses an asymptotic form as $n \to \infty$ and, being applied to the original function $f(t)$, guarantees the optimal rate in $n \to \infty$ for the estimation risk. In other words, that Kalman filter coincides with our proposed filter.

      The remarkable fact is that $q = q(\gamma)$ and, for any positive $\gamma$, the roots of the polynomial $p_k(u, q(\gamma))$ are distinct and have negative real parts.

  15. Thus,
      $$q(\gamma) = \frac{Q(\gamma) A^*}{\sigma^2},$$
      with $Q(\gamma)$ being the solution of the algebraic Riccati equation
      $$a Q(\gamma) + Q(\gamma) a^* + \gamma^2 b b^* - \frac{Q(\gamma) A^* A Q(\gamma)}{\sigma^2} = 0,$$
      which possesses a unique positive-definite solution since the block matrices
      $$G_1 = \begin{pmatrix} A \\ Aa \\ \vdots \\ Aa^k \end{pmatrix} \qquad \text{and} \qquad G_2 = \begin{pmatrix} bb^* & abb^* & \ldots & a^k bb^* \end{pmatrix}$$
      are of full rank (the so-called observability and controllability conditions).
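SciPy's CARE solver yields $Q(\gamma)$ after matching its convention $A^H X + X A - X B R^{-1} B^H X + Q = 0$, with $a^*$, $A^*$, $\gamma^2 bb^*$, $\sigma^2$ in the roles of $A$, $B$, $Q$, $R$. A sketch (the helper name `q_of_gamma` is ours):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def q_of_gamma(gamma, sigma, k):
    """q(gamma) = Q(gamma) A* / sigma^2 via the algebraic Riccati equation."""
    a = np.diag(np.ones(k), 1)
    A = np.eye(1, k + 1)
    b = np.zeros((k + 1, 1)); b[-1, 0] = 1.0
    # Solves a Q + Q a* + gamma^2 b b* - Q A* A Q / sigma^2 = 0.
    Q = solve_continuous_are(a.T, A.T, gamma**2 * (b @ b.T),
                             np.array([[sigma**2]]))
    return (Q @ A.T / sigma**2).ravel()
```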

  16. $C(q(\gamma))$-minimization
      We reduce the minimization of $C(q)$ with respect to the vector $q$ to the minimization of $C(q(\gamma))$ with respect to a positive parameter $\gamma$.

      [Figure: $C(q(\gamma))$ in logarithmic scale versus $\gamma$ for $k = 2$, in three panels $\sigma = 0.25, 1, 4$, each with curves $L = 1, 10, 100$. The minimizing $\gamma$ and the minimal $C$ are:]

      L      sigma    gamma       C
      1      0.25     0.74082     5.278
      10     0.25     4.4817      102.1574
      100    0.25     24.5325     2695.7356
      1      1        1           18.8975
      10     1        6.0496      260.7145
      100    1        33.1155     5839.3352
      1      4        1.3499      92.0461
      10     4        8.1662      787.7197
      100    4        49.4024     13850.9423
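A one-dimensional search over $\gamma$ then replaces the awkward minimization over $q$. A sketch reusing the hypothetical helpers `C_of_q` and `q_of_gamma` above (the search bracket is an arbitrary choice):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def best_gamma(L, sigma, k, lo=1e-2, hi=1e4):
    """Minimize C(q(gamma)) over gamma > 0, searching on a log scale."""
    res = minimize_scalar(
        lambda u: C_of_q(q_of_gamma(np.exp(u), sigma, k), L, sigma, k),
        bounds=(np.log(lo), np.log(hi)), method="bounded")
    return np.exp(res.x), res.fun
```

Under these modeling assumptions, `best_gamma(100, 0.25, 2)` should land near the tabulated pair $\gamma \approx 24.5$, $C \approx 2696$.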

  17. Explicit minimization procedure
      The entries of $Q(\gamma)$ obey the representation
      $$Q_{ij}(\gamma, \sigma) = U_{ij}\, \sigma^2 \Big( \frac{\gamma}{\sigma} \Big)^{\frac{i+j+1}{k+1}}, \qquad i, j = 0, 1, \ldots, k,$$
      where the $U_{ij}$ are the entries of the matrix $U$, itself the solution of the algebraic Riccati equation free of $\sigma$ and $\gamma$:
      $$a U + U a^* + b b^* - U A^* A U = 0.$$
      We have
      $$q_0(\gamma) = U_{00} \Big( \frac{\gamma}{\sigma} \Big)^{1/(k+1)},$$
      $$q_1(\gamma) = U_{01} \Big( \frac{\gamma}{\sigma} \Big)^{2/(k+1)},$$
      $$\ldots$$
      $$q_k(\gamma) = U_{0k} \Big( \frac{\gamma}{\sigma} \Big).$$
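Equivalently, one can solve the normalized Riccati equation once for $U$ and scale. A sketch (the helper name is ours):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def gains_explicit(gamma, sigma, k):
    """q_j(gamma) = U_{0j} (gamma/sigma)^{(j+1)/(k+1)} via the sigma-, gamma-free U."""
    a = np.diag(np.ones(k), 1)
    A = np.eye(1, k + 1)
    b = np.zeros((k + 1, 1)); b[-1, 0] = 1.0
    # aU + Ua* + bb* - U A*A U = 0  (the gamma = sigma = 1 case)
    U = solve_continuous_are(a.T, A.T, b @ b.T, np.eye(1))
    j = np.arange(k + 1)
    return U[0] * (gamma / sigma) ** ((j + 1) / (k + 1))
```

For $k = 2$, $\sigma = 0.25$ and the tabulated $\gamma = 24.5325$, this should reproduce (up to rounding) the gains 9.225, 42.550, 98.132 of Example 2 below.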

  18. For $k = 0, \ldots, 4$:

      k    U_00         U_01       U_02         U_03       U_04
      0    1            NA         NA           NA         NA
      1    √2           1          NA           NA         NA
      2    2            2          1            NA         NA
      3    √(4+√8)      2+√2       √(4+√8)      1          NA
      4    1+√5         3+√5       3+√5         1+√5       1

  19. Roots of $p_k(u, q)$:
      $$k = 0: \quad -\frac{\gamma}{\sigma}$$
      $$k = 1: \quad -\Big( \frac{\gamma}{\sigma} \Big)^{1/2} \Big( \frac{1}{\sqrt{2}} \pm i \frac{1}{\sqrt{2}} \Big)$$
      $$k = 2: \quad -\Big( \frac{\gamma}{\sigma} \Big)^{1/3} \Big( 1;\ \frac{1}{2} \pm i \frac{\sqrt{3}}{2} \Big)$$
      $$k = 3: \quad -\Big( \frac{\gamma}{\sigma} \Big)^{1/4} \big( 0.924 \pm i\, 0.383;\ 0.383 \pm i\, 0.924 \big)$$
      $$k = 4: \quad -\Big( \frac{\gamma}{\sigma} \Big)^{1/5} \big( 1;\ 0.809 \pm i\, 0.588;\ 0.309 \pm i\, 0.951 \big)$$
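A quick numerical check of, e.g., the $k = 2$ pattern, taking $\gamma/\sigma = 1$ and the gains $(q_0, q_1, q_2) = (2, 2, 1)$ from the $U$-table above:

```python
import numpy as np

# p_2(u, q) = u^3 + q0 u^2 + q1 u + q2 with (q0, q1, q2) = (2, 2, 1)
print(np.roots([1.0, 2.0, 2.0, 1.0]))
# expect -1 and -0.5 +/- i*sqrt(3)/2, matching the k = 2 line above
```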

  20. Example 2
      $k = 2$, $L = 100$, $\sigma = 0.25$:
      $$\hat f^{(0)}(t_i) = \hat f^{(0)}(t_{i-1}) + \frac{1}{n} \hat f^{(1)}(t_{i-1}) + \frac{9.225}{n^{6/7}} \big( X_i - \hat f^{(0)}(t_{i-1}) \big),$$
      $$\hat f^{(1)}(t_i) = \hat f^{(1)}(t_{i-1}) + \frac{1}{n} \hat f^{(2)}(t_{i-1}) + \frac{42.550}{n^{5/7}} \big( X_i - \hat f^{(0)}(t_{i-1}) \big),$$
      $$\hat f^{(2)}(t_i) = \hat f^{(2)}(t_{i-1}) + \frac{98.132}{n^{4/7}} \big( X_i - \hat f^{(0)}(t_{i-1}) \big).$$
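A sketch of this concrete filter on simulated data. The test function below is our own choice (its third derivative is bounded by $64 \le L$, so $f \in \Sigma(2, 1, 100)$); the gains are those printed on the slide:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 10_000, 0.25
t = np.arange(1, n + 1) / n
f = np.sin(4 * t) + 5 * t**2                  # hypothetical f in Sigma(2, 1, 100)
X = f + sigma * rng.standard_normal(n)

gains = np.array([9.225, 42.550, 98.132]) / n ** np.array([6/7, 5/7, 4/7])
est = np.zeros(3)                             # (hat f, hat f', hat f'') at t_0
for x in X:
    innov = x - est[0]
    new = est + gains * innov
    new[:2] += est[1:] / n                    # prediction terms for j = 0, 1
    est = new
print(est[0], f[-1])                          # tracked value vs. truth at t = T
```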
