SLIDE 1

Class 5 Stability

Carlo Ciliberto Department of Computer Science, UCL November 8, 2017

SLIDE 2

Uniform Stability - Notation

Let $\mathcal{Z}$ be a set. For any set $S = \{z_1, \dots, z_n\} \in \mathcal{Z}^n$, any $z \in \mathcal{Z}$ and $i = 1, \dots, n$, we denote by $S^{i,z} = \{z_1, \dots, z_{i-1}, z, z_{i+1}, \dots, z_n\} \in \mathcal{Z}^n$ the set obtained by substituting the $i$-th element of $S$ with $z$.
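The substitution notation can be sketched in Python (a minimal illustration, using 0-based indices rather than the slide's 1-based convention):

```python
def swap_point(S, i, z):
    """Return S^{i,z}: a copy of the dataset S with its i-th element replaced by z."""
    T = list(S)  # copy, so the original sample S is left untouched
    T[i] = z
    return T

S = [(0.0, 1.0), (1.0, 0.0), (2.0, 2.0)]
S_iz = swap_point(S, 1, (5.0, 5.0))
# S_iz differs from S only in position i = 1
```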

SLIDE 3

Uniform Stability

We denote input-output pairs by $z = (x, y) \in \mathcal{Z} = \mathcal{X} \times \mathcal{Y}$ and, for any $f : \mathcal{X} \to \mathcal{Y}$, we write $\ell(f, z) = \ell(f(x), y)$. For an algorithm $A$ and any dataset $S = (z_i)_{i=1}^n$ we write $f_S = A(S)$.

Uniform $\beta$-stability. An algorithm $A$ is $\beta(n)$-stable, with $n \in \mathbb{N}$ and $\beta(n) > 0$, if for all $S \in \mathcal{Z}^n$, $z \in \mathcal{Z}$ and $i = 1, \dots, n$,

$$\sup_{\bar{z} \in \mathcal{Z}} \; \big|\ell(f_S, \bar{z}) - \ell(f_{S^{i,z}}, \bar{z})\big| \leq \beta(n)$$

SLIDE 4

Stability and Generalization Error

Theorem. Let $A$ be a uniformly $\beta(n)$-stable algorithm. For any dataset $S \in \mathcal{Z}^n$ denote $f_S = A(S)$. Then

$$\big|\, \mathbb{E}_{S \sim \rho^n}[\mathcal{E}(f_S) - \mathcal{E}_S(f_S)] \,\big| \leq \beta(n)$$

where $S \sim \rho^n$ denotes a random dataset of $n$ points sampled independently from $\rho$.

This result shows that uniform stability of an algorithm directly controls its generalization error in expectation. Note that it relies only on properties of the learning algorithm and does not require any knowledge of the complexity of the hypothesis space (although the two are indirectly related).

SLIDE 5

Stability and Generalization Error (Continued)

We begin by providing alternative formulations for two quantities. Below, $S' = (z_i')_{i=1}^n \sim \rho^n$ denotes an independent copy of the dataset.

1) The expectation of the empirical risk $\mathbb{E}_S[\mathcal{E}_S(f_S)]$:

$$\mathbb{E}_S[\mathcal{E}_S(f_S)] = \mathbb{E}_S\Big[\frac{1}{n}\sum_{i=1}^n \ell(f_S, z_i)\Big] = \frac{1}{n}\sum_{i=1}^n \mathbb{E}_S[\ell(f_S, z_i)] = \frac{1}{n}\sum_{i=1}^n \mathbb{E}_S \mathbb{E}_{z_i'}[\ell(f_S, z_i)]$$

$$= \frac{1}{n}\sum_{i=1}^n \mathbb{E}_S \mathbb{E}_{z_i'}\big[\ell(f_{S^{i,z_i'}}, z_i')\big] = \mathbb{E}_S \mathbb{E}_{S'}\Big[\frac{1}{n}\sum_{i=1}^n \ell(f_{S^{i,z_i'}}, z_i')\Big]$$

where the key step uses the fact that $z_i$ and $z_i'$ are i.i.d., so exchanging their roles leaves the expectation unchanged.

2) The expected risk $\mathcal{E}(f_S)$:

$$\mathcal{E}(f_S) = \mathbb{E}_{z'}\,\ell(f_S, z') = \frac{1}{n}\sum_{i=1}^n \mathbb{E}_{z_i'}\,\ell(f_S, z_i') = \mathbb{E}_{S'}\Big[\frac{1}{n}\sum_{i=1}^n \ell(f_S, z_i')\Big]$$

SLIDE 6

Stability and Generalization Error (Continued)

Putting the two together,

$$\mathbb{E}_S[\mathcal{E}(f_S) - \mathcal{E}_S(f_S)] = \mathbb{E}_S \mathbb{E}_{S'}\Big[\frac{1}{n}\sum_{i=1}^n \big(\ell(f_S, z_i') - \ell(f_{S^{i,z_i'}}, z_i')\big)\Big]$$

$$\leq \mathbb{E}_S \mathbb{E}_{S'}\,\frac{1}{n}\sum_{i=1}^n \big|\ell(f_S, z_i') - \ell(f_{S^{i,z_i'}}, z_i')\big| \leq \beta(n)$$

where the last step uses the definition of uniform $\beta(n)$-stability.
SLIDE 7

Stability of Tikhonov Regularization

In the following we focus on the Tikhonov regularization algorithm $A = A_\lambda$ with $\lambda > 0$. In particular, for any $S \in \mathcal{Z}^n$,

$$A(S) = f_S = \operatorname*{argmin}_{f \in \mathcal{H}} \; \frac{1}{n}\sum_{i=1}^n \ell(f, z_i) + \lambda \|f\|_\mathcal{H}^2$$

We will show that when $\mathcal{H}$ is a reproducing kernel Hilbert space (RKHS), Tikhonov regularization is $\beta(n)$-stable with $\beta(n) = O\big(\frac{1}{n\lambda}\big)$.
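For the square loss, the Tikhonov minimizer in an RKHS has a well-known closed form, which gives a concrete instance of the algorithm $A_\lambda$. A minimal sketch (the Gaussian kernel, data and $\lambda$ below are illustrative choices, not prescribed by the slides):

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    # k(x, x') = exp(-gamma * (x - x')^2); note sup_x k(x, x) = 1 here
    return np.exp(-gamma * (A[:, None] - B[None, :]) ** 2)

def tikhonov_fit(X, y, lam, gamma=1.0):
    # For the square loss, f_S = argmin (1/n) sum_i (f(x_i) - y_i)^2 + lam ||f||_H^2
    # has the form f_S = sum_i alpha_i k(x_i, .) with alpha = (K + lam*n*I)^{-1} y
    n = len(X)
    K = gaussian_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def tikhonov_predict(alpha, X_train, X_test, gamma=1.0):
    return gaussian_kernel(X_test, X_train, gamma) @ alpha

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 50)
y = np.sin(3 * X) + 0.1 * rng.standard_normal(50)
alpha = tikhonov_fit(X, y, lam=0.01)
```

Larger $\lambda$ shrinks $\|f_S\|_\mathcal{H}^2 = \alpha^\top K \alpha$, trading data fit for stability.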

SLIDE 8

Error Decomposition for Tikhonov Regularization

Define $f_\lambda = \operatorname*{argmin}_{f \in \mathcal{H}} \mathcal{E}(f) + \lambda \|f\|_\mathcal{H}^2$ and decompose the excess risk as

$$\mathcal{E}(f_S) - \mathcal{E}(f^*) = \mathcal{E}(f_S) \pm \mathcal{E}_S(f_S) \pm \mathcal{E}_S(f_\lambda) - \mathcal{E}(f^*) \pm \lambda \|f_\lambda\|_\mathcal{H}^2$$

Now, since

◮ $\mathcal{E}(f_S) - \mathcal{E}(f^*) \leq \mathcal{E}(f_S) - \mathcal{E}(f^*) + \lambda \|f_S\|_\mathcal{H}^2$,
◮ $f_S$ is the minimizer of the regularized empirical risk, so $\mathcal{E}_S(f_S) + \lambda \|f_S\|_\mathcal{H}^2 - \mathcal{E}_S(f_\lambda) - \lambda \|f_\lambda\|_\mathcal{H}^2 \leq 0$,
◮ $\mathbb{E}_S\, \mathcal{E}_S(f_\lambda) = \mathcal{E}(f_\lambda)$,

we can conclude

$$\mathbb{E}_S\, \mathcal{E}(f_S) - \mathcal{E}(f^*) \leq \mathbb{E}_S[\mathcal{E}(f_S) - \mathcal{E}_S(f_S)] + \mathcal{E}(f_\lambda) - \mathcal{E}(f^*) + \lambda \|f_\lambda\|_\mathcal{H}^2$$

SLIDE 9

Error Decomposition for Tikhonov Regularization

$$\mathbb{E}_S\, \mathcal{E}(f_S) - \mathcal{E}(f^*) \leq \underbrace{\mathbb{E}_S[\mathcal{E}(f_S) - \mathcal{E}_S(f_S)]}_{\text{Generalization Error}} + \underbrace{\mathcal{E}(f_\lambda) - \mathcal{E}(f^*) + \lambda \|f_\lambda\|_\mathcal{H}^2}_{\text{(related to) Interpolation and Approximation Error}}$$

The $O(1/(n\lambda))$ stability of Tikhonov regularization, together with the assumption that the interpolation/approximation error is bounded by $\lambda^s$ for some $s > 0$, leads to

$$\mathbb{E}_S\, \mathcal{E}(f_S) - \mathcal{E}(f^*) \leq O(1/(n\lambda)) + \lambda^s$$

We can then choose the optimal $\lambda(n)$ and obtain the (expected) error rate $\epsilon(n)$:

$$\lambda(n) = O\big(n^{-\frac{1}{s+1}}\big) \qquad \mathbb{E}_S\, \mathcal{E}(f_S) - \mathcal{E}(f^*) \leq O\big(n^{-\frac{s}{s+1}}\big)$$

Note. If $f^* \in \mathcal{H}$, it is easy to show that $s = 1$ and therefore that the expected excess risk goes to zero at least as $O(n^{-1/2})$.
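The balance between the $O(1/(n\lambda))$ stability term and the $\lambda^s$ approximation term can be checked numerically. A small sketch with all constants set to 1 (an illustrative assumption): setting the derivative of $1/(n\lambda) + \lambda^s$ to zero gives $\lambda = (sn)^{-1/(s+1)}$, consistent with the $O(n^{-1/(s+1)})$ choice above.

```python
import numpy as np

def excess_risk_bound(lam, n, s):
    # Proxy for the bound: stability term + approximation term (constants = 1)
    return 1.0 / (n * lam) + lam ** s

n, s = 10_000, 1.0
lams = np.logspace(-6, 0, 4000)
lam_grid = lams[np.argmin(excess_risk_bound(lams, n, s))]

# Closed-form minimizer: d/dlam [1/(n*lam) + lam^s] = 0  =>  lam = (s*n)^(-1/(s+1))
lam_closed = (s * n) ** (-1.0 / (s + 1))
```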

SLIDE 10

Stability of Tikhonov Regularization

Let $\mathcal{H}$ be an RKHS with associated kernel $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$. We want to show that for any $S \in \mathcal{Z}^n$, $z' \in \mathcal{Z}$ and $i = 1, \dots, n$,

$$\sup_{z \in \mathcal{Z}} \; \big|\ell(f_S, z) - \ell(f_{S^{i,z'}}, z)\big| \leq \frac{2 L^2 k^2}{n\lambda}$$

where $L > 0$ is the Lipschitz constant of $\ell(\cdot, y)$ (uniformly w.r.t. $y \in \mathcal{Y}$) and $k^2 = \sup_{x \in \mathcal{X}} k(x, x)$.

SLIDE 11

Reproducing Property

Recall the reproducing property of an RKHS $\mathcal{H}$: for all $f \in \mathcal{H}$ and $x \in \mathcal{X}$,

$$f(x) = \langle f, k(\cdot, x) \rangle_\mathcal{H}$$

In particular, $|f(x)| \leq \sqrt{k(x, x)}\, \|f\|_\mathcal{H}$.

Therefore,

$$\sup_{z \in \mathcal{Z}} \big|\ell(f_S, z) - \ell(f_{S^{i,z'}}, z)\big| \leq \sup_{x \in \mathcal{X}, y \in \mathcal{Y}} \big|\ell(f_S(x), y) - \ell(f_{S^{i,z'}}(x), y)\big| \leq L \sup_{x \in \mathcal{X}} \big|f_S(x) - f_{S^{i,z'}}(x)\big| \leq L k\, \|f_S - f_{S^{i,z'}}\|_\mathcal{H}$$

We need to control $\|f_S - f_{S^{i,z'}}\|_\mathcal{H}$. We will exploit the strong convexity of Tikhonov regularization.
SLIDE 12

Strong convexity of $\|\cdot\|_\mathcal{H}^2$

Technical observation. For any $f, g \in \mathcal{H}$ and $\theta \in [0, 1]$ we have

$$\|\theta f + (1 - \theta) g\|_\mathcal{H}^2 = \theta^2 \|f\|_\mathcal{H}^2 + (1 - \theta)^2 \|g\|_\mathcal{H}^2 + 2\theta(1 - \theta)\langle f, g \rangle_\mathcal{H}$$

$$= \theta(1 - (1 - \theta))\|f\|_\mathcal{H}^2 + (1 - \theta)(1 - \theta)\|g\|_\mathcal{H}^2 + 2\theta(1 - \theta)\langle f, g \rangle_\mathcal{H}$$

$$= \theta \|f\|_\mathcal{H}^2 + (1 - \theta)\|g\|_\mathcal{H}^2 - \theta(1 - \theta)\big(\|f\|_\mathcal{H}^2 + \|g\|_\mathcal{H}^2 - 2\langle f, g \rangle_\mathcal{H}\big)$$

$$= \theta \|f\|_\mathcal{H}^2 + (1 - \theta)\|g\|_\mathcal{H}^2 - \theta(1 - \theta)\|f - g\|_\mathcal{H}^2$$

In particular, for any convex $F' : \mathcal{H} \to \mathbb{R}$, if we denote $F(\cdot) = F'(\cdot) + \lambda \|\cdot\|_\mathcal{H}^2$, we have

$$F(\theta f + (1 - \theta) g) \leq \theta F(f) + (1 - \theta) F(g) - \lambda \theta(1 - \theta)\|f - g\|_\mathcal{H}^2$$

SLIDE 13

Strong convexity II

Let $\theta = 1/2$. Then we have

$$2 F\Big(\frac{f + g}{2}\Big) \leq F(f) + F(g) - \frac{\lambda}{2}\|f - g\|_\mathcal{H}^2$$

By subtracting $2F(f)$ from both sides and adding $\frac{\lambda}{2}\|f - g\|_\mathcal{H}^2$ we have

$$\frac{\lambda}{2}\|f - g\|_\mathcal{H}^2 + 2 F\Big(\frac{f + g}{2}\Big) - 2 F(f) \leq F(g) - F(f)$$

Finally, note that if $f = \operatorname*{argmin}_{f \in \mathcal{H}} F(f)$ we have $F\big(\frac{f + g}{2}\big) - F(f) \geq 0$, and therefore

$$\frac{\lambda}{2}\|f - g\|_\mathcal{H}^2 \leq F(g) - F(f)$$

SLIDE 14

Strong Convexity of Tikhonov Regularization

Let us now define

◮ $F_1(\cdot) = \mathcal{E}_S(\cdot) + \lambda \|\cdot\|_\mathcal{H}^2$ and
◮ $F_2(\cdot) = \mathcal{E}_{S^{i,z'}}(\cdot) + \lambda \|\cdot\|_\mathcal{H}^2$

Furthermore, to simplify the notation, denote $f_1 = f_S$ and $f_2 = f_{S^{i,z'}}$. Recall that by construction

$$f_1 = \operatorname*{argmin}_{f \in \mathcal{H}} F_1(f) \qquad \text{and} \qquad f_2 = \operatorname*{argmin}_{f \in \mathcal{H}} F_2(f)$$

SLIDE 15

Strong Convexity of Tikhonov Regularization II

By our previous observation on strong convexity,

$$\frac{\lambda}{2}\|f_1 - f_2\|_\mathcal{H}^2 \leq F_1(f_2) - F_1(f_1) \qquad \text{and} \qquad \frac{\lambda}{2}\|f_1 - f_2\|_\mathcal{H}^2 \leq F_2(f_1) - F_2(f_2)$$

Summing the two inequalities (and rearranging the terms),

$$\lambda \|f_1 - f_2\|_\mathcal{H}^2 \leq F_1(f_2) - F_2(f_2) + F_2(f_1) - F_1(f_1)$$

$$= \mathcal{E}_S(f_2) - \mathcal{E}_{S^{i,z'}}(f_2) + \mathcal{E}_{S^{i,z'}}(f_1) - \mathcal{E}_S(f_1)$$

$$= \frac{1}{n}\big(\ell(f_2, z_i) - \ell(f_2, z') + \ell(f_1, z') - \ell(f_1, z_i)\big)$$

$$= \frac{1}{n}\big(\ell(f_2, z_i) - \ell(f_1, z_i) + \ell(f_1, z') - \ell(f_2, z')\big) \leq \frac{2}{n}\sup_z \big|\ell(f_1, z) - \ell(f_2, z)\big|$$

where we have used the definitions of $F_1$ and $F_2$ and the fact that the risks $\mathcal{E}_S$ and $\mathcal{E}_{S^{i,z'}}$ differ in only one point: for any function $f : \mathcal{X} \to \mathcal{Y}$ we have $\mathcal{E}_S(f) - \mathcal{E}_{S^{i,z'}}(f) = \frac{1}{n}\big(\ell(f, z_i) - \ell(f, z')\big)$.

SLIDE 16

Stability of Tikhonov Regularization (Continued)

Since $\sup_z |\ell(f_1, z) - \ell(f_2, z)| \leq L k \|f_1 - f_2\|_\mathcal{H}$, we have

$$\lambda \|f_1 - f_2\|_\mathcal{H}^2 \leq \frac{2 k L}{n}\|f_1 - f_2\|_\mathcal{H}$$

which implies $\|f_1 - f_2\|_\mathcal{H} \leq \frac{2 k L}{n\lambda}$, from which we can conclude that

$$\sup_{z \in \mathcal{Z}} \big|\ell(f_1, z) - \ell(f_2, z)\big| \leq \frac{2 L^2 k^2}{n\lambda}$$

proving the $\beta(n) = \frac{2 L^2 k^2}{n\lambda}$ uniform stability of Tikhonov regularization.
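As a sanity check, $\|f_S - f_{S^{i,z'}}\|_\mathcal{H}$ can be measured directly for kernel ridge regression. The square loss is only locally Lipschitz, so this is an illustration of the $1/(n\lambda)$ scaling rather than a verification of the theorem's exact constants; the kernel, data and $\lambda$ are illustrative choices:

```python
import numpy as np

def kernel(A, B):
    return np.exp(-((A[:, None] - B[None, :]) ** 2))

def fit(X, y, lam):
    n = len(X)
    return np.linalg.solve(kernel(X, X) + lam * n * np.eye(n), y)

def rkhs_distance(X1, a1, X2, a2):
    # ||f1 - f2||_H^2 = a1'K11 a1 - 2 a1'K12 a2 + a2'K22 a2 (expand the inner product)
    d2 = (a1 @ kernel(X1, X1) @ a1
          - 2 * a1 @ kernel(X1, X2) @ a2
          + a2 @ kernel(X2, X2) @ a2)
    return max(float(d2), 0.0) ** 0.5  # clip tiny negative rounding error

def perturbation(n, lam, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1, 1, n)
    y = np.sin(3 * X) + 0.1 * rng.standard_normal(n)
    Xp, yp = X.copy(), y.copy()
    Xp[0], yp[0] = 0.5, -1.0  # replace the first point with an arbitrary z'
    return rkhs_distance(X, fit(X, y, lam), Xp, fit(Xp, yp, lam))

d_small, d_large = perturbation(50, 0.1), perturbation(500, 0.1)
# the distance shrinks as the sample grows, roughly like 1/n
```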

SLIDE 17

So far...

In previous classes we studied the excess risk of an estimator (in particular its sample error) by controlling the complexity of the space of functions from which the estimator was chosen (e.g. via covering numbers). In this class we have investigated an alternative approach that focuses exclusively on properties of the learning algorithm (rather than of the whole space). In particular, we have observed how the stability of an algorithm allows us to control its generalization error in expectation. We have shown that Tikhonov regularization is a stable algorithm, which allowed us to immediately derive excess risk bounds.
SLIDE 18

Stability and Generalization (in Probability)

OK, but... what about controlling the generalization error in probability rather than in expectation? We can exploit the following result.

McDiarmid's Inequality. Let $F : \mathcal{Z}^n \to \mathbb{R}$ be such that for any $i = 1, \dots, n$ there exists $c_i > 0$ for which $\sup_{S \in \mathcal{Z}^n, z \in \mathcal{Z}} |F(S) - F(S^{i,z})| \leq c_i$. Then,

$$\mathbb{P}_{S \sim \rho^n}\Big(\big|F(S) - \mathbb{E}_{S' \sim \rho^n} F(S')\big| \geq \epsilon\Big) \leq 2 \exp\Big(-\frac{2\epsilon^2}{\sum_{i=1}^n c_i^2}\Big)$$
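A quick Monte Carlo illustration with the simplest bounded-difference function, the sample mean of uniform $[0, 1]$ variables, for which $c_i = 1/n$ and the bound becomes $2\exp(-2n\epsilon^2)$; the sample size, trial count and $\epsilon$ below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials, eps = 100, 20_000, 0.1

# F(S) = mean of n points in [0, 1]: changing one point moves F by at most
# c_i = 1/n, so sum_i c_i^2 = 1/n and McDiarmid gives
# P(|F - E F| >= eps) <= 2 * exp(-2 * n * eps^2)
means = rng.random((trials, n)).mean(axis=1)
empirical = np.mean(np.abs(means - 0.5) >= eps)
mcdiarmid_bound = 2 * np.exp(-2 * n * eps ** 2)
# the empirical deviation probability sits well below the bound
```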

SLIDE 19

Stability and Generalization (Continued)

(Slides thanks to Lorenzo Rosasco and Tomaso Poggio)

For a $\beta(n)$ uniformly stable algorithm $A$, we will apply McDiarmid's inequality to the generalization error of the estimator returned by the algorithm, namely $F(S) = \mathcal{E}(f_S) - \mathcal{E}_S(f_S)$, where for any $S \in \mathcal{Z}^n$ we have denoted $f_S = A(S)$. Recall that $\big|\mathbb{E}_S[F(S)]\big| = \big|\mathbb{E}_S[\mathcal{E}(f_S) - \mathcal{E}_S(f_S)]\big| \leq \beta(n)$.

SLIDE 20

Stability and Generalization (Continued)

By McDiarmid, for any $\delta \in (0, 1]$ we have

$$\big|F(S) - \mathbb{E}_{S'} F(S')\big| \leq \sqrt{\frac{\sum_{i=1}^n c_i^2 \, \log(2/\delta)}{2}}$$

with probability no less than $1 - \delta$, where $\sup_{S \in \mathcal{Z}^n, z \in \mathcal{Z}} |F(S) - F(S^{i,z})| \leq c_i$ for $i = 1, \dots, n$.

SLIDE 21

Stability and Generalization (Continued)

In particular, since $|\mathbb{E}_{S'} F(S')| \leq \beta(n)$ and $F(S) = \mathcal{E}(f_S) - \mathcal{E}_S(f_S)$, we have

$$\big|\mathcal{E}_S(f_S) - \mathcal{E}(f_S)\big| \leq \beta(n) + \sqrt{\frac{\sum_{i=1}^n c_i^2 \, \log(2/\delta)}{2}}$$

with probability no less than $1 - \delta$. We now need to bound the $c_i$.

SLIDE 22

Bounding the Deviation of the Generalization Error

We have

$$\big|F(S) - F(S^{i,z})\big| = \big|\mathcal{E}(f_S) - \mathcal{E}_S(f_S) - \mathcal{E}(f_{S^{i,z}}) + \mathcal{E}_{S^{i,z}}(f_{S^{i,z}})\big|$$

$$\leq \big|\mathcal{E}(f_S) - \mathcal{E}(f_{S^{i,z}})\big| + \big|\mathcal{E}_S(f_S) - \mathcal{E}_{S^{i,z}}(f_{S^{i,z}})\big|$$

$$\leq \beta(n) + \frac{1}{n}\big|\ell(f_S, z_i) - \ell(f_{S^{i,z}}, z)\big| + \frac{1}{n}\sum_{j \neq i}\big|\ell(f_S, z_j) - \ell(f_{S^{i,z}}, z_j)\big| \leq 2\beta(n) + \frac{2}{n}\sup_{S \in \mathcal{Z}^n,\, i = 1, \dots, n} |\ell(f_S, z_i)|$$

where the first $\beta(n)$ bounds the difference of expected risks by uniform stability, the sum over $j \neq i$ is likewise bounded by $\beta(n)$, and the remaining term is bounded by $\frac{2}{n}\sup |\ell(f_S, z_i)|$.

Depending on the algorithm $A$ and the loss function $\ell$, we can control this last term. Let us assume that there exists $M > 0$ such that

$$\sup_{S \in \mathcal{Z}^n,\, i = 1, \dots, n} |\ell(f_S, z_i)| \leq M$$

We will later provide an estimate of $M$ for Tikhonov regularization.

SLIDE 23

Stability and Generalization (Continued)

We have shown that

$$\sum_{i=1}^n c_i^2 \leq \sum_{i=1}^n 4\big(\beta(n) + M/n\big)^2 = 4n\big(\beta(n) + M/n\big)^2$$

Plugging this into the previous bound, we have

$$\big|\mathcal{E}_S(f_S) - \mathcal{E}(f_S)\big| \leq \beta(n) + \big(n\beta(n) + M\big)\sqrt{\frac{2\log(2/\delta)}{n}}$$

with probability no less than $1 - \delta$.

SLIDE 24

Stability of Tikhonov Regularization

The last term we need to control is

$$\sup_{S \in \mathcal{Z}^n,\, i = 1, \dots, n} |\ell(f_S, z_i)|$$

We will bound it for Tikhonov regularization.

SLIDE 25

Stability of Tikhonov Regularization (Continued)

Assume that for any $y \in \mathcal{Y}$ the loss at zero is uniformly bounded: $\ell(0, y) \leq C_0$ for some constant $C_0 \geq 0$. Since $f_S$ is the minimizer of the Tikhonov regularized empirical risk, we have that for any $S \in \mathcal{Z}^n$,

$$\mathcal{E}_S(f_S) + \lambda \|f_S\|_\mathcal{H}^2 \leq \mathcal{E}_S(0) \leq C_0$$

In particular, if the loss $\ell : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}_+$ is non-negative, this implies

$$\|f_S\|_\mathcal{H} \leq \sqrt{\frac{C_0}{\lambda}}$$
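The bound $\|f_S\|_\mathcal{H} \leq \sqrt{C_0/\lambda}$ is easy to check numerically for kernel ridge regression, where $\ell(0, y) = y^2$; here $C_0$ is taken as the maximum of $y^2$ over the sample, and the data, kernel and $\lambda$ are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n, lam = 200, 0.1
X = rng.uniform(-1, 1, n)
y = np.sin(3 * X) + 0.1 * rng.standard_normal(n)

K = np.exp(-((X[:, None] - X[None, :]) ** 2))
alpha = np.linalg.solve(K + lam * n * np.eye(n), y)

norm_H = float(alpha @ K @ alpha) ** 0.5  # ||f_S||_H for f_S = sum_i alpha_i k(x_i, .)
C0 = float(np.max(y ** 2))                # bound on l(0, y) = y^2 over the sample
# minimality of f_S guarantees norm_H <= sqrt(C0 / lam)
```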

SLIDE 26

Stability of Tikhonov Regularization (Continued)

Therefore, for any $S \in \mathcal{Z}^n$ and $z \in \mathcal{Z}$,

$$|\ell(f_S, z)| \leq |\ell(f_S, z) - \ell(0, z)| + |\ell(0, z)| \leq k L \|f_S\|_\mathcal{H} + C_0 \leq k L \sqrt{\frac{C_0}{\lambda}} + C_0$$

SLIDE 27

Stability of Tikhonov Regularization (Continued)

By plugging our estimate $M = k L \sqrt{C_0/\lambda} + C_0$ and the $\beta(n) = \frac{2 k^2 L^2}{n\lambda}$ stability of Tikhonov regularization into the bound on the generalization error, we have

$$\big|\mathcal{E}_S(f_S) - \mathcal{E}(f_S)\big| \leq \frac{2 k^2 L^2}{n\lambda} + \Big(\frac{2 k^2 L^2}{\lambda} + k L \sqrt{\frac{C_0}{\lambda}} + C_0\Big)\sqrt{\frac{2\log(2/\delta)}{n}}$$

with probability no less than $1 - \delta$.
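The final high-probability bound can be evaluated directly. A small sketch with hypothetical constants $k = L = C_0 = 1$ and $\delta = 0.05$, chosen only for illustration:

```python
import numpy as np

k = L = C0 = 1.0
delta = 0.05

def generalization_bound(n, lam):
    # beta(n) + (n*beta(n) + M) * sqrt(2*log(2/delta)/n), with the Tikhonov
    # estimates beta(n) = 2 k^2 L^2 / (n*lam) and M = k L sqrt(C0/lam) + C0
    beta = 2 * k**2 * L**2 / (n * lam)
    M = k * L * np.sqrt(C0 / lam) + C0
    return beta + (n * beta + M) * np.sqrt(2 * np.log(2 / delta) / n)

# the bound tightens with n but loosens as lam (the amount of regularization) shrinks
```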

SLIDE 28

Stability of Tikhonov Regularization (Continued)

In particular, the generalization error of Tikhonov regularization will tighten as

$$\big|\mathcal{E}_S(f_S) - \mathcal{E}(f_S)\big| \leq O\Big(\frac{1}{\lambda\sqrt{n}}\Big)$$

with high probability. As expected, the bound on the generalization error will decrease as we observe more points but will increase if we regularize less (i.e. make the algorithm less stable). As already observed for the convergence in expectation, this can be combined with assumptions on the interpolation/approximation error in order to find the best-suited estimate for $\lambda$.
SLIDE 29

Wrapping up

This class:

◮ Stability & Generalization error
◮ Stability of Tikhonov Regularization

Next class: Stability of early stopping.