

SLIDE 1

Adaptive Regularization Algorithms in Learning Theory – Case Study: Prediction of Blood Glucose Level

Sergei V. Pereverzev, Sivananthan Sampath, Huajun Wang

RICAM, Austria

Joint research with E. De Vito (Univ. Genova) and L. Rosasco (MIT, Boston). Workshop "Inverse and Partial Information Problems", RICAM, Linz, October 2008.

SLIDE 2

Learning from examples

Vapnik (1995), Evgeniou, Pontil, Poggio (2000), Cucker, Smale (2001):
1) Two sets of variables X ⊂ R^d, Y ⊂ R are related by a probabilistic relationship: each x ∈ X is associated with ρ(·|x), an (unknown) probability distribution on Y.
2) Training data: z = {(x_1, y_1), ..., (x_n, y_n)} ∈ (X × Y)^n.
The goal: provide an estimator f = f_z : X → Y to predict y ∈ Y for any given x ∈ X.

SLIDE 3

EU-project ”DIAdvisor – diabetes adviser”: Glucose Prediction using patient vital data.

1) Input: x = x_i = (t_i, x^i_1, x^i_2, ..., x^i_{d−1}) ∈ R^d, where x^i_k, k = 1, 2, ..., d−1, are the measurements of vital signs (e.g. glucose concentration, blood pH, temperature, ...) taken at the time t = t_i, i = 1, 2, ..., n.
2) Output: y is the blood glucose concentration at a time t > t_n in the future.
State of the art (R. Gillis et al., Abstract 0415-P, 2007, Santa Barbara, CA): "With the estimator blinded to meals one can accurately (i.e. with an error less than 2 mmol/l) predict glucose levels 45 minutes into the future. This is a promising result..."

SLIDE 4

"The Uncertainty ... It is rather a matter of Efficiency" (David Mumford, "The Mathematics of Perception")

If the blood glucose concentration is assumed to be a function y = y(t, x_1, x_2, ..., x_{d−1}, x_d, ...), then the training data are
(t_i, x^i_1, x^i_2, ..., x^i_{d−1}, y_i = y(t_i, x^i_1, x^i_2, ..., x^i_{d−1}, x^i_d, ...)), i = 1, 2, ..., n.
In the first phase of "DIAdvisor" only the data (t_i, y_i), i = 1, 2, ..., n, are available. The goal is to predict the value y_m = y(t_m, ...) for t_m > t_n, t_m − t_n > 45 minutes.

SLIDE 5

Statistical framework

1) ρ_X(·) is the (marginal) probability distribution on X (also unknown).
2) Expected risk of an estimator f : X → Y:
E(f) = ∫_X ∫_Y (f(x) − y)² ρ(y|x) ρ_X(x) dy dx.
3) Regression function:
f_ρ(x) = argmin{E(f), f ∈ L₂(X, ρ_X dx)} = ∫_Y y ρ(y|x) dy.

SLIDE 6

Hypothesis space and Target function

1) H is a Hilbert space and J : H ↪ L₂(X, ρ_X dx) is a compact embedding.
2) Target function: f_H = argmin{E(f), f ∈ H} = argmin{‖f − f_ρ‖_ρ, f ∈ H}, since
E(f) = ‖f − f_ρ‖²_ρ + E(f_ρ), where ‖·‖_ρ = ‖·‖_{L₂(X, ρ_X dx)} and ‖f − f_ρ‖_ρ = ‖Jf − f_ρ‖_ρ for all f ∈ H.
Thus f_H solves the normal equation J*J f = J* f_ρ, with J* : L₂(X, ρ_X dx) → H.

SLIDE 7

Picard criterion and Source conditions

T = J*J = Σ_i t_i ⟨·, e_i⟩_H e_i ;  L = J J* = Σ_i t_i ⟨·, l_i⟩_ρ l_i.
Picard criterion:
f_H = Σ_i (⟨l_i, f_ρ⟩_ρ / √t_i) e_i ∈ H  ⇔  Σ_i ⟨l_i, f_ρ⟩²_ρ / t_i < ∞.
Source condition: there exists φ : [0, t_1] → R₊, φ(0) = 0, φ increasing, such that
Σ_i ⟨l_i, f_ρ⟩²_ρ / (t_i φ²(t_i)) < ∞,
i.e. v = Σ_i (⟨l_i, f_ρ⟩_ρ / (√t_i φ(t_i))) e_i ∈ H, which gives f_H = φ(T)v.
H^φ = {f ∈ H : f = φ(T)v, v ∈ H}.

SLIDE 8

Reproducing Kernel Hilbert Space H = HK

1) K : X × X → R is continuous, symmetric, positive semidefinite; K_x = K(x, ·).
2) H⁰_K = {f : f = Σ_{j=1}^r c_j K_{x_j}}, K_{x_j} = K(x_j, ·).
3) ⟨f, g⟩_K = ⟨Σ_{j=1}^r c_j K_{x_j}, Σ_{i=1}^s d_i K_{t_i}⟩_K := Σ_{j=1}^r Σ_{i=1}^s c_j d_i K(x_j, t_i).
4) H_K is the completion of H⁰_K w.r.t. ‖·‖_K; for every f ∈ H_K, f(x) = ⟨K_x, f⟩_K (reproducing property).
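
The definitions above can be made concrete in a few lines. The sketch below is an illustration (not part of the talk): it evaluates a finite expansion f = Σ_j c_j K_{x_j}, computes the inner product ⟨f, g⟩_K, and checks the reproducing property, using the kernel that appears later in Numerical Test 1.

```python
import numpy as np

def kernel(x, t):
    # Kernel from the talk's numerical tests: K(x, t) = x*t + exp(-8(t - x)^2)
    return x * t + np.exp(-8.0 * (t - x) ** 2)

def make_f(nodes, coeffs):
    """A finite kernel expansion f = sum_j c_j K_{x_j} in H_K."""
    return lambda x: sum(c * kernel(xj, x) for xj, c in zip(nodes, coeffs))

def inner_K(nodes_f, c, nodes_g, d):
    """<f, g>_K = sum_j sum_i c_j d_i K(x_j, t_i)."""
    return sum(cj * di * kernel(xj, ti)
               for xj, cj in zip(nodes_f, c)
               for ti, di in zip(nodes_g, d))

# Reproducing property: f(x) = <K_x, f>_K, since K_x is itself an expansion
nodes, coeffs = [0.5, 1.0, 2.0], [1.0, -0.5, 0.25]
f = make_f(nodes, coeffs)
x0 = 1.3
assert abs(f(x0) - inner_K([x0], [1.0], nodes, coeffs)) < 1e-12
```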

SLIDE 9

Discrete version of the equation J*J f = J* f_ρ for J = J_{H_K}

z = {(x_i, y_i)}_{i=1}^n,  x = (x_i)_{i=1}^n,  y = (y_i)_{i=1}^n ∈ R^n;  ⟨u, v⟩_{R^n} = (1/n) Σ_{i=1}^n u_i v_i.
Sampling operator: S_x : H_K → R^n, S_x f = (f(x_i))_{i=1}^n; its adjoint S*_x : R^n → H_K,
S*_x y = (1/n) Σ_{i=1}^n y_i K_{x_i};  T_x = S*_x S_x = (1/n) Σ_{i=1}^n ⟨·, K_{x_i}⟩_K K_{x_i}.
J_{H_K} : H_K ↪ L₂(X, ρ_X dx), J_{H_K} f = f; its discretization is S_x f = y.
T = J*_{H_K} J_{H_K} : H_K → H_K; the equation T f = J*_{H_K} f_ρ is discretized as T_x f = S*_x y.

SLIDE 10

Regularization of T_x f = S*_x y

Poggio et al. (2000), ..., Smale, Zhou (2005): Tikhonov regularization
f^λ_z = argmin{(1/n) Σ_{i=1}^n (f(x_i) − y_i)² + λ‖f‖²_K} = (λI + T_x)^{−1} S*_x y.
General regularization scheme: f^λ_z = g_λ(T_x) S*_x y = Σ_{i=1}^n γ_i K_{x_i}, where g_λ : [0, ‖T_x‖] → R satisfies
1) sup_t |g_λ(t)| ≤ c₀/λ;
2) ∃p : ∀ν ∈ [0, p], sup_t |(1 − g_λ(t)t) t^ν| ≤ c_p λ^ν.
For Tikhonov regularization g_λ(t) = (λ + t)^{−1} and p = 1.
Remark: De Vore et al. (2006), Maiorov (2006): λ = 0, H a finite ball in a finite-dimensional space. Cortes, Vapnik (1995): other forms of the loss function V(y_i, f(x_i)).
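
In matrix form the Tikhonov estimator is straightforward to compute. A minimal sketch (my illustration, assuming the normalized inner product ⟨u, v⟩ = (1/n) Σ u_i v_i above, under which (λI + T_x)^{−1} S*_x y reduces to coefficients γ = (nλI + K)^{−1} y with Gram matrix K_ij = K(x_i, x_j)):

```python
import numpy as np

def kernel(x, t):
    # Kernel from Numerical Test 1: K(x, t) = x*t + exp(-8(t - x)^2)
    return x * t + np.exp(-8.0 * (t - x) ** 2)

def tikhonov(x, y, lam):
    """f^lam_z = (lam*I + T_x)^{-1} S_x^* y, written via the expansion
    f^lam_z = sum_i gamma_i K_{x_i}. With <u, v> = (1/n) sum_i u_i v_i,
    the coefficients solve (n*lam*I + K) gamma = y, K_ij = K(x_i, x_j)."""
    n = len(x)
    K = kernel(x[:, None], x[None, :])
    gamma = np.linalg.solve(n * lam * np.eye(n) + K, y)
    return lambda t: kernel(np.asarray(t)[:, None], x[None, :]) @ gamma

# Smoke check: for tiny lambda the estimator nearly interpolates the data
x = np.linspace(0.3, 2 * np.pi, 8)
y = np.sin(x)
f = tikhonov(x, y, 1e-8)
assert np.max(np.abs(f(x) - y)) < 1e-3
```

Larger λ trades this near-interpolation for stability, which is exactly what the parameter-choice rules below try to balance.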

SLIDE 11

Basic Theorem

Assume:
1) f_{H_K} ∈ H^φ_K with φ ∈ F_p, and κ > sup_{x∈X} K(x, x);
2) g_λ satisfies, for all λ, sup_t |(1 − g_λ(t)t) t^q| ≤ c λ^q with q ≥ p + 1/2.
Then for f^λ_z = g_λ(T_x) S*_x y and λ > (2√2/√n) κ log(4/h), with probability 1 − h:
‖f_{H_K} − f^λ_z‖_ρ ≤ (c₁ φ(λ)√λ + c₂/√(λn)) log(1/h),
‖f_{H_K} − f^λ_z‖_K ≤ (c₃ φ(λ) + c₄/(λ√n)) log(1/h).

SLIDE 12

A priori parameter choice

Th.1. Let θ(t) = φ(t)t and f_{H_K} ∈ H^φ_K. Under the assumptions of the Basic Theorem, for λ_n = θ^{−1}(n^{−1/2}), with probability 1 − h:
‖f_{H_K} − f^{λ_n}_z‖_ρ ≤ c φ(θ^{−1}(n^{−1/2})) √(θ^{−1}(n^{−1/2})) log(1/h),
‖f_{H_K} − f^{λ_n}_z‖_K ≤ c φ(θ^{−1}(n^{−1/2})) log(1/h).
Remark 1: For φ(t) = t^r:  ‖·‖_ρ ∼ n^{−(2r+1)/(4(r+1))},  ‖·‖_K ∼ n^{−r/(2(r+1))}.
Remark 2: Smale, Zhou (2005): 0 < r ≤ 1/2; Caponnetto et al. (2005): r > 1/2, ‖·‖_ρ ≤ n^{−(2r+1)/(4(r+3/2))}.
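
Remark 1 follows by direct substitution. As a check (a standard calculation, not on the original slide):

```latex
\varphi(t) = t^r \;\Rightarrow\; \theta(t) = \varphi(t)\,t = t^{r+1},
\qquad
\lambda_n = \theta^{-1}(n^{-1/2}) = n^{-\frac{1}{2(r+1)}},
```
```latex
\|\cdot\|_K \sim \varphi(\lambda_n) = n^{-\frac{r}{2(r+1)}},
\qquad
\|\cdot\|_\rho \sim \varphi(\lambda_n)\sqrt{\lambda_n}
  = n^{-\frac{r}{2(r+1)} - \frac{1}{4(r+1)}}
  = n^{-\frac{2r+1}{4(r+1)}}.
```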

SLIDE 13

Regularization in the empirical norm

Empirical norm: ‖f‖²_{{x_i}} := (1/n) Σ_{i=1}^n f²(x_i).
Th.2. For f ∈ H_K, with probability 1 − h:
|‖f‖²_ρ − ‖f‖²_{{x_i}}| ≤ c₁ (log(1/h)/√n) ‖f‖²_K.
Moreover, under the assumptions of the Basic Theorem, with the same probability
‖f_{H_K} − f^λ_z‖_{{x_i}} ≤ (c₅ φ(λ)√λ + c₆/√(λn)) log(1/h).

SLIDE 14

Balancing Principle for Learning Theory

Candidates: {f^{λ_i}_z}, λ_i = λ₀ q^i, i = 0, 1, ..., M;  λ₀ = (2√2/√n) κ log(4/h), q > 1.
λ_emp = max{λ_k : ‖f^{λ_k}_z − f^{λ_j}_z‖_{{x_i}} ≤ 4c₆ log(1/h)/√(λ_j n), j = 0, 1, ..., k−1},
λ_{H_K} = max{λ_k : ‖f^{λ_k}_z − f^{λ_j}_z‖_K ≤ 4c₄ log(1/h)/(λ_j √n), j = 0, 1, ..., k−1}.
Th.3. Under the assumptions of the Basic Theorem, the choice λ₊ = min{λ_emp, λ_{H_K}} guarantees the optimal order of the risk without knowledge of the function φ generating the source condition.
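
The selection rule itself is a short loop. A generic sketch (my illustration under assumed interfaces: `estimators[i]` holds f^{λ_i}_z in some vector representation, `norm` is the chosen norm, and `rhs(λ_j)` is the right-hand side, e.g. 4c₆ log(1/h)/√(λ_j n)):

```python
import numpy as np

def balancing_lambda(lambdas, estimators, norm, rhs):
    """Largest lambda_k whose estimator stays within the prescribed distance
    of every estimator with smaller regularization:
        ||f^{lambda_k} - f^{lambda_j}|| <= rhs(lambda_j),  j = 0, ..., k-1."""
    best = lambdas[0]
    for k in range(1, len(lambdas)):
        if all(norm(estimators[k] - estimators[j]) <= rhs(lambdas[j])
               for j in range(k)):
            best = lambdas[k]
        else:
            break
    return best

# Toy check: the jump at the last estimator disqualifies its lambda
lams = [1.0, 2.0, 4.0, 8.0]
fs = [np.array([0.0]), np.array([0.1]), np.array([0.2]), np.array([5.0])]
lam = balancing_lambda(lams, fs, norm=np.linalg.norm, rhs=lambda l: 1.0)
assert lam == 4.0
```

Th.3's λ₊ would then be the minimum of two such calls, one with the empirical norm and one with the RKHS norm.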

SLIDE 15

Adaptive scheme

‖f_{H_K} − f^λ_z‖ ≤ φ(λ) + c/(λ^v √n),  v = 1/2 or v = 1 (empirical and RKHS norm, respectively).
Hence, for λ_j ≤ λ_k with φ(λ_j) ≤ c/(λ_j^v √n) (which holds below the balancing value):
‖f^{λ_k}_z − f^{λ_j}_z‖ ≤ ‖f_{H_K} − f^{λ_k}_z‖ + ‖f_{H_K} − f^{λ_j}_z‖ ≤ 4c/(λ_j^v √n).

SLIDE 16

Heuristic Counterpart of the Balancing Principle for Learning Theory

Mathé, Pereverzev (2003), Pereverzev, Schock (2005): in the balancing principle it is enough to compare only consecutive differences ‖f^{λ_ν}_z − f^{λ_{ν−1}}_z‖, ν = 1, 2, ..., M, instead of all pairs ‖f^{λ_ν}_z − f^{λ_μ}_z‖, ν = 1, ..., M, μ = 0, 1, ..., ν−1.
Tikhonov, Glasko (1965): quasi-optimality criterion
σ(ν) = ‖f^{λ_ν}_z − f^{λ_{ν−1}}_z‖,  λ_QO = λ_k,  k = argmin{σ(ν), ν = 1, 2, ..., M}.
Quasi-balancing principle: λ^{QB}₊ = min{λ^{QO}_emp, λ^{QO}_{H_K}}, where
λ^{QO}_emp = λ_k,  k = argmin{σ_emp(ν) = ‖f^{λ_ν}_z − f^{λ_{ν−1}}_z‖_{{x_i}}, ν = 1, 2, ..., M},
λ^{QO}_{H_K} = λ_l,  l = argmin{σ_{H_K}(ν) = ‖f^{λ_ν}_z − f^{λ_{ν−1}}_z‖_K, ν = 1, 2, ..., M}.
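
Because only consecutive differences are needed, the quasi-balancing rule is cheap to implement. A minimal sketch (my illustration; `estimators` and the norm callables are assumed interfaces):

```python
import numpy as np

def quasi_optimality(lambdas, estimators, norm):
    """sigma(nu) = ||f^{lambda_nu} - f^{lambda_{nu-1}}||; return the lambda
    at the minimizing index (Tikhonov-Glasko quasi-optimality criterion)."""
    sigma = [norm(estimators[v] - estimators[v - 1])
             for v in range(1, len(estimators))]
    return lambdas[int(np.argmin(sigma)) + 1]

def quasi_balancing(lambdas, estimators, norm_emp, norm_K):
    """lambda^{QB}_+ = min{lambda^{QO}_emp, lambda^{QO}_{H_K}}."""
    return min(quasi_optimality(lambdas, estimators, norm_emp),
               quasi_optimality(lambdas, estimators, norm_K))

# Toy check: sigma = [3.0, 0.1, 1.9] is minimized at nu = 2
lams = [1.0, 2.0, 3.0, 4.0]
fs = [np.array([0.0]), np.array([3.0]), np.array([3.1]), np.array([5.0])]
assert quasi_optimality(lams, fs, np.linalg.norm) == 3.0
```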

SLIDE 17

Numerical Test 1.

Test example by Micchelli and Pontil (2005):
f_{H_K}(x) = (1/10)(x + 2(e^{−8(4π/3 − x)²} − e^{−8(π/2 − x)²} − e^{−8(3π/2 − x)²})),  x ∈ [0, 2π].
f_{H_K} ∈ H_K with K(x, t) = xt + e^{−8(t−x)²}.
z = {(x_i, y_i)}_{i=1}^n,  x_i = 2πi/m,  y_i = f_{H_K}(x_i) + ζ,  ζ uniformly sampled in [−0.02, 0.02].

Prediction of interpolation type: n = m. Prediction of extrapolation type: n < m.
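
This test setup is easy to reproduce. A sketch (the random generator and seed are my choices; the target function and sampling scheme follow the slide):

```python
import numpy as np

def f_target(x):
    """Micchelli-Pontil test function:
    f(x) = (1/10)(x + 2(e^{-8(4pi/3-x)^2} - e^{-8(pi/2-x)^2} - e^{-8(3pi/2-x)^2}))."""
    return 0.1 * (x + 2.0 * (np.exp(-8.0 * (4.0 * np.pi / 3.0 - x) ** 2)
                             - np.exp(-8.0 * (np.pi / 2.0 - x) ** 2)
                             - np.exp(-8.0 * (1.5 * np.pi - x) ** 2)))

def training_data(n, m, rng):
    """x_i = 2*pi*i/m, y_i = f(x_i) + zeta, zeta ~ U[-0.02, 0.02].
    n = m: interpolation-type prediction; n < m: extrapolation-type."""
    x = 2.0 * np.pi * np.arange(1, n + 1) / m
    y = f_target(x) + rng.uniform(-0.02, 0.02, size=n)
    return x, y

rng = np.random.default_rng(0)
x, y = training_data(20, 20, rng)               # Test 1.1 setting
assert np.max(np.abs(y - f_target(x))) <= 0.02  # noise stays in the stated band
```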

SLIDE 18

Numerical Test 1 (continuation).

f^λ_z = argmin{(1/n) Σ_{i=1}^n (f(x_i) − y_i)² + λ‖f‖²_K} = (λI + T_x)^{−1} S*_x y,
λ ∈ {λ_i = λ₀ q^i, i = 0, 1, ..., 20},  λ₀ = 10^{−6},  q = 1.5.
Test 1.1: n = m = 20,  λ^{QB}₊ = 1.5 · 10^{−6}.
Test 1.2: n = m = 50,  λ^{QB}₊ = 0.0033.
Test 1.3: n = m = 50,  λ = 10^{−5} (for comparison).

SLIDE 19

Numerical Test 1.1.

Figure: σ(ν) values in different norms when sampling frequency is T/20. The X-axis stands for the values of ν, the Y-axis stands for the values of σ(ν). The blue dots are empirical norms; the green crosses are RKHS norms.

SLIDE 20

Numerical Test 1.1 (continuation).

Figure: Approximation using λ^{QB}₊ = 1.5 · 10^{−6} for training data generated at sampling frequency T/20. The green line is the real signal; the blue dots are the training set {(t_i, f(t_i))}, i = 0, 1, ..., 20; the red line is the approximation obtained by the proposed method.

SLIDE 21

Numerical Test 1.2.

Figure: σ(ν) values in different norms when sampling frequency is T/50.

SLIDE 22

Numerical Test 1.2 (continuation).

Figure: λ^{QB}₊ = 0.0033 with sampling frequency T/50.

SLIDE 23

Numerical Test 1.3.

Figure: λ = 0.00001.

SLIDE 24

Application to the problem of adaptive kernel choice.

Micchelli and Pontil (2005): K ∈ 𝒦.  K_opt = K_MP(λ) = argmin{Q_λ(K), K ∈ 𝒦},
Q_λ(K) = inf{(1/|z|) Σ_{(x_i,y_i)∈z} (f(x_i) − y_i)² + λ‖f‖²_K,  f ∈ H_K}.
Note that K_MP is λ-dependent, so this criterion can be used only for an a priori given regularization parameter λ.

SLIDE 25

Combination with the (Quasi-)Balancing principle.

– Consider the map λ₊ : 𝒦 → R₊, K ↦ λ₊(K), where λ₊(K) is the parameter chosen for K ∈ 𝒦 in accordance with the (quasi-)balancing principle.
– Assume that λ* is a fixed point of the map λ ↦ λ₊(K_MP(λ)). Then K = K_MP(λ*) ∈ 𝒦 is optimal in the sense of Micchelli and Pontil.

SLIDE 26

Numerical Test 2. Prediction of an interpolation type.

The same test as in Micchelli and Pontil (2005): f ∈ H_K, K(x, t) = xt + e^{−8(t−x)²};
z = {(x_i, y_i)}_{i=1}^n,  x_i = 2πi/m,  y_i = f(x_i) + ζ,  ζ ∈ [−0.02, 0.02],  m = n = 20.
𝒦 = {K(x, t) = (xt)^β + e^{−j(x−t)²},  β ∈ {0.5, 1, 2, 3, 4},  j ∈ {1, 2, ..., 10}}.
λ* solves λ = λ₊(K_MP(λ)):  λ* = 0.0012,  K_MP(λ*) = xt + e^{−10(x−t)²}.

SLIDE 27

Numerical Test 2. Prediction of an interpolation type. (continuation).

Figure: Approximation using the kernel K(t₁, t₂) = t₁t₂ + e^{−10(t₁−t₂)²} and λ = 0.0012 for training data generated at sampling frequency T/20.

SLIDE 28

Numerical Test 2. Warning example.

Prediction of extrapolation type: f, f^λ_z ∈ H_K, K(x, t) = xt + e^{−8(t−x)²}, as in Micchelli and Pontil (2005);
z = {(x_i, y_i)}_{i=1}^n,  x_i = 2πi/m,  y_i = f(x_i) + ζ,  ζ ∈ [−0.02, 0.02],  m = 20, n = 15.

SLIDE 29

Possible remedy. New criterion

Let z_s ⊂ z, and let λ₊ = λ₊(K) be chosen in accordance with the (quasi-)balancing principle for K ∈ 𝒦 and z_s.
f^{λ₊}_{z_s}(K, ·) = argmin{(1/|z_s|) Σ_{(x_i,y_i)∈z_s} (f(x_i) − y_i)² + λ₊‖f‖²_K,  f ∈ H_K}.
K_opt = argmin{(1/|z\z_s|) Σ_{(x_i,y_i)∈z\z_s} (f^{λ₊}_{z_s}(K, x_i) − y_i)²,  K ∈ 𝒦}.
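
This criterion amounts to hold-out validation over the kernel family. A generic sketch (my illustration; the `select_lambda` and `fit` callables are assumed interfaces standing in for the (quasi-)balancing rule on z_s and the regularized estimator):

```python
import numpy as np

def holdout_kernel_choice(kernels, z_s, z_rest, select_lambda, fit):
    """For each K in the family: choose lambda_+ = lambda_+(K) on z_s,
    fit f^{lambda_+}_{z_s}(K, .), and keep the kernel minimizing the
    hold-out squared error on z \\ z_s."""
    xs, ys = z_s
    xv, yv = z_rest
    best_K, best_err = None, np.inf
    for K in kernels:
        lam = select_lambda(K, xs, ys)       # parameter choice on z_s only
        f = fit(K, xs, ys, lam)              # regularized estimator on z_s
        err = np.mean((f(xv) - yv) ** 2)     # empirical risk on z \ z_s
        if err < best_err:
            best_K, best_err = K, err
    return best_K, best_err

# Toy check with stand-in fits: the kernel tracking the data wins
xv = np.linspace(0.0, 1.0, 5)
fits = {"good": np.sin, "bad": np.zeros_like}
K_opt, _ = holdout_kernel_choice(
    ["good", "bad"], (None, None), (xv, np.sin(xv)),
    select_lambda=lambda K, xs, ys: 0.0,
    fit=lambda K, xs, ys, lam: fits[K])
assert K_opt == "good"
```

Unlike the Micchelli-Pontil criterion, the hold-out score is evaluated on data not used for fitting, which is what rescues the extrapolation-type warning example.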

SLIDE 30

Warning example revisited.

𝒦 = {K(x, t) = (xt)^β + e^{−j(x−t)²},  β ∈ [0.5, 4],  j ∈ [1, 10]};
f ∈ H_K with K(x, t) = xt + e^{−8(t−x)²};
f^λ_z ∈ H_{K_opt} with K_opt(x, t) = (xt)^{1.2} + e^{−2.6(t−x)²}.

SLIDE 31

Application to ”DIAdvisor”

Patient: ANA1235, Subject ID - 47

Day 1 (specification): β = 2.5, j = 0, λ₊ = 0.005, n = 10, m = 14.
Day 2 (prediction 1): λ₊ = 0.005, n = 10, m = 13.
Day 3 (prediction 2): λ₊ = 0.005, n = 10, m = 13.

SLIDE 32

Application to ”DIAdvisor”. (continuation)

Patient: ANA1235, Subject ID - 47

Day 1 (specification): β = 1.5, j = 0, λ₊ = 0.005, n = 6, m = 14.
Day 2 (prediction 1): λ₊ = 0.005, n = 6, m = 13.
Day 3 (prediction 2): λ₊ = 0.005, n = 6, m = 13.

SLIDE 33

Patient: ANA1235, Subject ID - 10

Figure: Day 1, Day 2, and Day 3 glucose profiles (plot data omitted).

Kernel K(x, t) = (xt)^{0.35} + 0.005 e^{−0.001(x−t)²} on the interval [15, 105]; λ₀ = 0.0001; for Day 2 and Day 3, λ₊ = 1.01 × 10^{−4}.
Kernel K(x, t) = (xt)^{0.01} + 0.1 e^{−0.00013(x−t)²} on the interval [75, 165]; λ₀ = 0.0001; for Day 2 and Day 3, λ₊ = 1.01 × 10^{−4}.