

1. Adaptive Regularization Algorithms in Learning Theory – Case Study: Prediction of Blood Glucose Level. Sergei V. Pereverzev, Sivananthan Sampath, Huajun Wang, RICAM, Austria. Joint research with E. De Vito (Uni. Genova) and L. Rosasco (MIT, Boston). Workshop "Inverse and Partial Information Problems", RICAM, Linz, October 2008.

2. Learning from examples. Vapnik (1995), Evgeniou, Pontil, Poggio (2000), Cucker, Smale (2001): 1) Two sets of variables $X \subset \mathbb{R}^d$ and $Y \subset \mathbb{R}$ are related by a probabilistic relationship: each $x \in X$ is associated with an (unknown) probability distribution $\rho(\cdot \mid x)$ on $Y$. 2) Training data: $z = \{(x_1, y_1), \dots, (x_n, y_n)\} \in (X \times Y)^n$. The goal: provide an estimator $f = f_z : X \to Y$ to predict $y \in Y$ for any given $x \in X$.

3. EU project "DIAdvisor – diabetes adviser": glucose prediction using patient vital data. 1) Input: $x = x_i = (t_i, x_i^1, x_i^2, \dots, x_i^{d-1}) \in \mathbb{R}^d$, where $x_i^k$, $k = 1, 2, \dots, d-1$, are measurements of vital signs (e.g. glucose concentration, blood pH, temperature, ...) taken at the time $t = t_i$, $i = 1, 2, \dots, n$. 2) Output: $y$ is the blood glucose concentration at a time $t > t_n$ in the future. State of the art (R. Gillis et al., Abstract 0415-P, 2007, Santa Barbara, CA): "With the estimator blinded to meals one can accurately (i.e. with an error less than 2 mmol/l) predict glucose levels 45 minutes into the future. This is a promising result..."

4. "The Uncertainty... It is rather a matter of Efficiency" (David Mumford, "The Mathematics of Perception"). If the blood glucose concentration is assumed to be a function $y = y(t, x^1, x^2, \dots, x^{d-1}, x^d, \dots)$, then the training data are $(t_i, x_i^1, x_i^2, \dots, x_i^{d-1}, y_i)$ with $y_i = y(t_i, x_i^1, x_i^2, \dots, x_i^{d-1}, x_i^d, \dots)$, $i = 1, 2, \dots, n$. In the first phase of "DIAdvisor" only the data $(t_i, y_i)$, $i = 1, 2, \dots, n$, are available. The goal is to predict the value $y_m = y(t_m, \dots)$ for $t_m > t_n$ with $t_m - t_n > 45$ minutes.

5. Statistical framework. 1) $\rho_X(\cdot)$ is the (marginal) probability distribution on $X$ (which is also unknown). 2) Expected risk of an estimator $f : X \to Y$: $\mathcal{E}(f) = \int_X \int_Y (f(x) - y)^2\, \rho(y \mid x)\, \rho_X(x)\, dy\, dx$. 3) Regression function: $f_\rho = \operatorname{argmin}\{\mathcal{E}(f),\ f \in L_2(X, \rho_X dx)\}$, i.e. $f_\rho(x) = \int_Y y\, \rho(y \mid x)\, dy$.
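The optimality of the regression function can be checked numerically: since $\mathcal{E}(f) = \|f - f_\rho\|_\rho^2 + \mathcal{E}(f_\rho)$ (slide 6), any other estimator has a strictly larger risk. A minimal sketch on a hypothetical toy model (uniform design, sine regression function, Gaussian noise with $\sigma = 0.1$; all of these are illustrative assumptions, not data from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: x ~ Uniform(0,1), y | x ~ N(sin(2*pi*x), 0.1^2),
# so the regression function is f_rho(x) = sin(2*pi*x).
n = 100_000
x = rng.uniform(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=n)

def risk(f):
    """Monte Carlo estimate of the expected risk E(f) = E[(f(x) - y)^2]."""
    return np.mean((f(x) - y) ** 2)

f_rho = lambda t: np.sin(2 * np.pi * t)          # conditional mean
f_bad = lambda t: np.sin(2 * np.pi * t) + 0.05   # any other estimator

# E(f) = ||f - f_rho||_rho^2 + E(f_rho), so f_rho attains the smaller risk,
# and risk(f_rho) approximates the noise variance 0.01.
assert risk(f_rho) < risk(f_bad)
```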

6. Hypothesis space and target function. 1) $H$ is a Hilbert space and $J : H \hookrightarrow L_2(X, \rho_X dx)$ is the compact embedding. 2) $f_H = \operatorname{argmin}\{\mathcal{E}(f),\ f \in H\} = \operatorname{argmin}\{\|f - f_\rho\|_\rho,\ f \in H\}$, since $\mathcal{E}(f) = \|f - f_\rho\|_\rho^2 + \mathcal{E}(f_\rho)$, where $\|\cdot\|_\rho = \|\cdot\|_{L_2(X, \rho_X dx)}$. For all $f \in H$, $\|f - f_\rho\|_\rho = \|Jf - f_\rho\|_\rho$; with the adjoint $J^* : L_2(X, \rho_X dx) \to H$, the target $f_H$ solves $J^* J f = J^* f_\rho$.

7. Picard criterion and source conditions. Spectral decompositions: $T = J^* J = \sum_{i=1}^\infty t_i \langle \cdot, e_i \rangle_H\, e_i$ and $L = J J^* = \sum_{i=1}^\infty t_i \langle \cdot, l_i \rangle_\rho\, l_i$. Picard criterion: $f_H = \sum_{i=1}^\infty \frac{\langle l_i, f_\rho \rangle_\rho}{\sqrt{t_i}}\, e_i \in H \iff \sum_{i=1}^\infty \frac{\langle l_i, f_\rho \rangle_\rho^2}{t_i} < \infty$. Source condition: if there exists $\varphi : [0, t_1] \to \mathbb{R}_+$, $\varphi(0) = 0$, $\varphi$ increasing, such that $\sum_{i=1}^\infty \frac{\langle l_i, f_\rho \rangle_\rho^2}{t_i\, \varphi^2(t_i)} < \infty$, then $f_H = \varphi(T) v$ with $v = \sum_{i=1}^\infty \frac{\langle l_i, f_\rho \rangle_\rho}{\sqrt{t_i}\, \varphi(t_i)}\, e_i \in H$. Source set: $H_\varphi = \{f \in H : f = \varphi(T) v,\ v \in H\}$.
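In finite dimensions the spectral calculus $\varphi(T)$ is just a function applied to eigenvalues, which makes the source condition easy to illustrate. A sketch with an assumed stand-in for $T$ (a random symmetric PSD matrix with geometrically decaying spectrum) and the Hölder-type index function $\varphi(t) = t^r$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical finite-dimensional stand-in for T = J*J: symmetric positive
# semidefinite, eigenvalues t_1 > t_2 > ... decaying geometrically.
m = 8
t = 0.5 ** np.arange(1, m + 1)                 # eigenvalues t_i
Q, _ = np.linalg.qr(rng.normal(size=(m, m)))   # orthonormal eigenvectors e_i
T = Q @ np.diag(t) @ Q.T

# Source condition f_H = phi(T) v with phi(t) = t^r, i.e. f_H in H_phi.
r = 0.5
phi = lambda s: s ** r
v = rng.normal(size=m)
f_H = Q @ np.diag(phi(t)) @ Q.T @ v            # spectral calculus phi(T) v

# Membership in H_phi: the source element v is recovered by applying
# phi(T)^{-1} spectrally (all t_i > 0 here, so this is well defined).
v_rec = Q @ np.diag(1.0 / phi(t)) @ Q.T @ f_H
assert np.allclose(v_rec, v)
```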

8. Reproducing Kernel Hilbert Space $H = H_K$. 1) $K : X \times X \to \mathbb{R}$ is continuous, symmetric, positive semidefinite; $K_x = K(x, \cdot)$. 2) $\mathcal{H}_K = \{f : f = \sum_{j=1}^r c_j K_{x_j}\}$, $K_{x_j} = K(x_j, \cdot)$. 3) $\langle f, g \rangle_K = \langle \sum_{j=1}^r c_j K_{x_j}, \sum_{i=1}^s d_i K_{t_i} \rangle_K := \sum_{j=1}^r \sum_{i=1}^s c_j d_i K(x_j, t_i)$. 4) $H_K$ is the completion of $\mathcal{H}_K$ w.r.t. $\|\cdot\|_K$. Reproducing property: $\forall f \in H_K$, $f(x) = \langle K_x, f \rangle_K$.
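The reproducing property and the kernel norm can be verified directly for a finite expansion $f = \sum_j c_j K_{x_j}$. A sketch with an assumed Gaussian kernel (the nodes and coefficients below are arbitrary illustrative values):

```python
import numpy as np

# Assumed kernel for illustration: Gaussian K(s,t) = exp(-(s-t)^2 / 2)
def K(s, t, sigma=1.0):
    return np.exp(-((s - t) ** 2) / (2 * sigma ** 2))

# f = sum_j c_j K_{x_j} in the pre-Hilbert space spanned by kernel sections
x_nodes = np.array([0.0, 0.5, 1.2])
c = np.array([1.0, -2.0, 0.7])
f = lambda s: sum(cj * K(xj, s) for cj, xj in zip(c, x_nodes))

# Reproducing property: f(x) = <K_x, f>_K = sum_j c_j K(x_j, x)
x = 0.3
inner = sum(cj * K(xj, x) for cj, xj in zip(c, x_nodes))
assert np.isclose(f(x), inner)

# ||f||_K^2 = c^T G c with Gram matrix G_ij = K(x_i, x_j); nonnegative
# because K is positive semidefinite.
G = K(x_nodes[:, None], x_nodes[None, :])
norm_sq = c @ G @ c
assert norm_sq >= 0
```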

9. Discrete version of the equation $J^* J f = J^* f_\rho$ for $J = J_{H_K}$. Sample: $z = \{(x_i, y_i)\}_{i=1}^n$, $x = (x_i)_{i=1}^n$, $y = (y_i)_{i=1}^n \in \mathbb{R}^n$, with $\langle u, v \rangle_{\mathbb{R}^n} = \frac{1}{n} \sum_{i=1}^n u_i v_i$. Sampling operator $S_x : H_K \to \mathbb{R}^n$, $S_x f = (f(x_i))_{i=1}^n$; its adjoint $S_x^* : \mathbb{R}^n \to H_K$, $S_x^* y = \frac{1}{n} \sum_{i=1}^n y_i K_{x_i}$; and $T_x = S_x^* S_x = \frac{1}{n} \sum_{i=1}^n K_{x_i} \langle K_{x_i}, \cdot \rangle_K$. With $J_{H_K} : H_K \hookrightarrow L_2(X, \rho_X dx)$ and $T = J_{H_K}^* J_{H_K} : H_K \to H_K$, the equation $J_{H_K} f = f_\rho$ is discretized as $S_x f = y$, and $T f = J_{H_K}^* f_\rho$ as $T_x f = S_x^* y$.
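In coefficients, the discrete operators become matrix operations: for $f = \sum_j \gamma_j K_{x_j}$ one has $S_x f = G\gamma$ with Gram matrix $G_{ij} = K(x_i, x_j)$, $S_x^* u$ has coefficients $u/n$, and $T_x f$ has coefficients $G\gamma/n$. A sketch under an assumed Gaussian kernel and random nodes:

```python
import numpy as np

rng = np.random.default_rng(2)
K = lambda s, t: np.exp(-(s - t) ** 2)   # assumed Gaussian kernel

n = 5
x = rng.uniform(0, 1, n)
y = rng.normal(size=n)
G = K(x[:, None], x[None, :])            # Gram matrix G_ij = K(x_i, x_j)

S_x   = lambda g: G @ g                  # sampling: (f(x_1), ..., f(x_n))
Sstar = lambda u: u / n                  # S_x* u = (1/n) sum u_i K_{x_i}
T_x   = lambda g: Sstar(S_x(g))          # T_x = S_x* S_x: coefficients G g / n

# The discrete normal equation T_x f = S_x* y reads G g / n = y / n,
# i.e. G g = y (G is invertible for distinct nodes and a Gaussian kernel).
g_sol = np.linalg.solve(G, y)
assert np.allclose(T_x(g_sol), Sstar(y))
```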

10. Regularization of $T_x f = S_x^* y$. Poggio et al. (2000), ..., Smale, Zhou (2005): Tikhonov regularization $f_z^\lambda = \operatorname{argmin}\{\frac{1}{n} \sum_{i=1}^n (f(x_i) - y_i)^2 + \lambda \|f\|_K^2\} = (\lambda I + T_x)^{-1} S_x^* y$. General regularization scheme: $f_z^\lambda = g_\lambda(T_x) S_x^* y = \sum_{i=1}^n \gamma_i K_{x_i}$, where $g_\lambda(t) : [0, \|T_x\|] \to \mathbb{R}$ satisfies 1) $\sup_t |g_\lambda(t)| \le \frac{c_0}{\lambda}$; 2) $\exists p : \forall \nu \in [0, p]$, $\sup_t |(1 - g_\lambda(t) t)\, t^\nu| \le c_p \lambda^\nu$. For Tikhonov, $g_\lambda(t) = (\lambda + t)^{-1}$ and $p = 1$. Remark: De Vore et al. (2006), Maiorov (2006): $\lambda = 0$, $H$ a finite ball in a finite-dimensional space. Cortes, Vapnik (1995): other forms of the loss function $V(y_i, f(x_i))$.
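For Tikhonov regularization, matching coefficients of $K_{x_i}$ in $(\lambda I + T_x) f = S_x^* y$ gives the familiar kernel ridge regression system $(G + n\lambda I)\gamma = y$. A sketch under an assumed Gaussian kernel and a toy noisy-sine sample:

```python
import numpy as np

rng = np.random.default_rng(3)
K = lambda s, t: np.exp(-(s - t) ** 2 / 0.1)  # assumed Gaussian kernel

# Toy sample z = {(x_i, y_i)} (illustrative, not DIAdvisor data)
n = 50
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=n)

# f_z^lambda = (lambda I + T_x)^{-1} S_x* y = sum_i gamma_i K_{x_i};
# matching coefficients of K_{x_i} yields (G + n lambda I) gamma = y.
lam = 1e-3
G = K(x[:, None], x[None, :])
gamma = np.linalg.solve(G + n * lam * np.eye(n), y)
f_z = lambda s: K(s[:, None], x[None, :]) @ gamma

# Sanity check against the variational form: gamma minimizes the empirical
# Tikhonov functional, so its value is below that of the zero function.
obj = lambda g: np.mean((G @ g - y) ** 2) + lam * (g @ G @ g)
assert obj(gamma) <= obj(np.zeros(n))
```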

11. Basic Theorem. Assume: 1) $f_{H_K} \in H_\varphi$ with $\varphi \in F_p$, and $\kappa \ge \sup_{x \in X} \sqrt{K(x, x)}$; 2) $g_\lambda$ satisfies $\sup_t |(1 - g_\lambda(t) t)\, t^q| \le c \lambda^q$ for all $q \le p + 1/2$. Then for $f_z^\lambda = g_\lambda(T_x) S_x^* y$ with $\lambda \ge \frac{2\sqrt{2}\, \kappa}{\sqrt{n}} \log \frac{4}{h}$, with probability $1 - h$, $\|f_{H_K} - f_z^\lambda\|_\rho \le \left(c_1 \varphi(\lambda) \sqrt{\lambda} + \frac{c_2}{\sqrt{\lambda n}}\right) \log \frac{1}{h}$ and $\|f_{H_K} - f_z^\lambda\|_K \le \left(c_3 \varphi(\lambda) + \frac{c_4}{\lambda \sqrt{n}}\right) \log \frac{1}{h}$.

12. A priori parameter choice. Th. 1. Let $\theta(t) = \varphi(t)\, t$ and $f_{H_K} \in H_\varphi$. Under the assumptions of the Basic Theorem, for $\lambda_n = \theta^{-1}(n^{-1/2})$, with probability $1 - h$, $\|f_{H_K} - f_z^{\lambda_n}\|_\rho \le c\, \varphi(\theta^{-1}(n^{-1/2})) \sqrt{\theta^{-1}(n^{-1/2})} \log \frac{1}{h}$ and $\|f_{H_K} - f_z^{\lambda_n}\|_K \le c\, \varphi(\theta^{-1}(n^{-1/2})) \log \frac{1}{h}$. Remark 1: For $\varphi(t) = t^r$, $\|\cdot\|_\rho \sim n^{-\frac{2r+1}{4(r+1)}}$ and $\|\cdot\|_K \sim n^{-\frac{r}{2(r+1)}}$. Remark 2: Smale, Zhou (2005): $0 < r \le 1/2$; Caponnetto et al. (2005): $r > 1/2$, $\|\cdot\|_\rho \lesssim n^{-\frac{2r+1}{4(r+3/2)}}$.
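For the Hölder source $\varphi(t) = t^r$ the a priori rule is explicit: $\theta(t) = t^{r+1}$, so $\lambda_n = \theta^{-1}(n^{-1/2}) = n^{-1/(2(r+1))}$, and plugging this into the two bounds of the Basic Theorem reproduces the rates of Remark 1. A short numerical check of this algebra:

```python
import numpy as np

# For phi(t) = t^r, theta(t) = phi(t) * t = t^(r+1), hence
# lambda_n = theta^{-1}(n^{-1/2}) = n^{-1/(2(r+1))}.
def lambda_n(n, r):
    return n ** (-1.0 / (2.0 * (r + 1.0)))

n, r = 10_000, 1.0
lam = lambda_n(n, r)

# At lambda_n both terms of each bound balance, yielding the rates of Remark 1:
rate_rho = lam ** r * np.sqrt(lam)  # phi(lam)*sqrt(lam) ~ n^{-(2r+1)/(4(r+1))}
rate_K   = lam ** r                 # phi(lam)           ~ n^{-r/(2(r+1))}
assert np.isclose(rate_rho, n ** (-(2 * r + 1) / (4 * (r + 1))))
assert np.isclose(rate_K, n ** (-r / (2 * (r + 1))))
```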

13. Regularization in the empirical norm. $\|f\|_{\{x_i\}}^2 := \frac{1}{n} \sum_{i=1}^n f^2(x_i)$. Th. 2. For $f \in H_K$, with probability $1 - h$, $\left| \|f\|_\rho^2 - \|f\|_{\{x_i\}}^2 \right| \le \frac{c_1 \log \frac{1}{h}}{\sqrt{n}} \|f\|_K^2$. Moreover, under the assumptions of the Basic Theorem, with the same probability, $\|f_{H_K} - f_z^\lambda\|_{\{x_i\}} \le \left(c_5 \varphi(\lambda) \sqrt{\lambda} + \frac{c_6}{\sqrt{\lambda n}}\right) \log \frac{1}{h}$.
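The closeness of the empirical and the $\rho$-norm asserted by Th. 2 is easy to observe numerically. A sketch with an assumed uniform marginal $\rho_X$ and a test function whose $\|\cdot\|_\rho$ is known in closed form:

```python
import numpy as np

rng = np.random.default_rng(4)
f = lambda s: np.sin(2 * np.pi * s)

# Empirical norm ||f||_{x_i}^2 = (1/n) sum f(x_i)^2 with x_i ~ rho_X,
# assumed here to be Uniform(0,1).
n = 10_000
x = rng.uniform(0, 1, n)
emp_norm_sq = np.mean(f(x) ** 2)

# ||f||_rho^2 = integral_0^1 sin^2(2 pi s) ds = 1/2; the deviation is of
# order n^{-1/2}, in line with Th. 2.
assert abs(emp_norm_sq - 0.5) < 0.05
```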

14. Balancing Principle for Learning Theory. Consider the family $\{f_z^{\lambda_i}\}$, $\lambda_i = \lambda_0 q^i$, $i = 0, 1, \dots, M$, with $\lambda_0 = \frac{2\sqrt{2}\, \kappa}{\sqrt{n}} \log \frac{4}{h}$ and $q > 1$. Define $\lambda_{emp} = \max\{\lambda_k : \|f_z^{\lambda_k} - f_z^{\lambda_j}\|_{\{x_i\}} \le \frac{4 c_6 \log \frac{1}{h}}{\sqrt{\lambda_j n}},\ j = 0, 1, \dots, k-1\}$ and $\lambda_{H_K} = \max\{\lambda_k : \|f_z^{\lambda_k} - f_z^{\lambda_j}\|_K \le \frac{4 c_4 \log \frac{1}{h}}{\lambda_j \sqrt{n}},\ j = 0, 1, \dots, k-1\}$. Th. 3. Under the assumptions of the Basic Theorem, the choice $\lambda_+ = \min\{\lambda_{emp}, \lambda_{H_K}\}$ guarantees the optimal order of the risk without knowledge of the function $\varphi$ generating the source condition.
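The selection of $\lambda_{emp}$ can be sketched directly: compute Tikhonov estimators on a geometric grid and keep the largest $\lambda_k$ whose estimator stays within the prescribed tube around all earlier ones. The constant `C` below stands in for the unknown factor $4 c_6 \log\frac{1}{h}$, and kernel, data, and grid are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
K = lambda s, t: np.exp(-(s - t) ** 2 / 0.1)   # assumed Gaussian kernel

# Toy data and a geometric grid lambda_i = lambda_0 * q^i.
n = 60
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=n)
G = K(x[:, None], x[None, :])

lam0, q, M = 1e-4, 1.5, 20
lams = lam0 * q ** np.arange(M + 1)
gammas = [np.linalg.solve(G + n * l * np.eye(n), y) for l in lams]

def emp_dist(ga, gb):
    """Empirical norm ||f_a - f_b||_{x_i} of two kernel expansions."""
    d = G @ (ga - gb)
    return np.sqrt(np.mean(d ** 2))

# Balancing: largest lambda_k consistent with all smaller lambda_j.
C = 1.0   # stands in for 4*c_6*log(1/h), which is unknown in practice
k_sel = 0
for k in range(1, M + 1):
    if all(emp_dist(gammas[k], gammas[j]) <= C / np.sqrt(lams[j] * n)
           for j in range(k)):
        k_sel = k
    else:
        break
lam_emp = lams[k_sel]
assert lams[0] <= lam_emp <= lams[-1]
```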

15. Adaptive scheme. Both error bounds have the common form $\|f_{H_K} - f_z^\lambda\| \le c \left(\varphi(\lambda) + \frac{1}{\lambda^v \sqrt{n}}\right)$, $v = \frac{1}{2}, 1$, and for $\lambda_j \le \lambda_k$ below the balance point the triangle inequality gives $\|f_{H_K} - f_z^{\lambda_k}\| \le \|f_z^{\lambda_k} - f_z^{\lambda_j}\| + \|f_{H_K} - f_z^{\lambda_j}\| \le \frac{4c}{\lambda_j^v \sqrt{n}}$.
