Spectral regularization methods for statistical inverse learning - PowerPoint PPT Presentation

Spectral regularization methods for statistical inverse learning problems. G. Blanchard, Universität Potsdam. van Dantzig seminar, 23/06/2016. Joint work with N. Mücke (U. Potsdam).


1. Spectral regularization methods for statistical inverse learning problems
   G. Blanchard, Universität Potsdam
   van Dantzig seminar, 23/06/2016
   Joint work with N. Mücke (U. Potsdam)

2. Outline
   1 General regularization and kernel methods
   2 Inverse learning/regression and relation to kernels
   3 Rates for linear spectral regularization methods
   4 Beyond the regular spectrum case

3. (Outline repeated: section divider.)

4. INTRODUCTION: RANDOM DESIGN REGRESSION
   ◮ Consider the familiar regression setting on a random design, $Y_i = f^*(X_i) + \varepsilon_i$, where $(X_i, Y_i)_{1 \le i \le n}$ is an i.i.d. sample from $P_{XY}$ on the space $\mathcal{X} \times \mathbb{R}$,
   ◮ with $\mathbb{E}[\varepsilon_i \mid X_i] = 0$.
   ◮ For an estimator $\hat f$ we consider the prediction error $\|\hat f - f^*\|_X^2 = \mathbb{E}\big[(\hat f(X) - f^*(X))^2\big]$, which we want to be as small as possible (in expectation or with large probability).
   ◮ We can also be interested in the squared reconstruction error $\|\hat f - f^*\|_{\mathcal{H}}^2$, where $\|\cdot\|_{\mathcal{H}}$ is a certain Hilbert norm of interest to the user.
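As a quick illustration of the random-design setting and the prediction error above, here is a minimal Python/NumPy sketch (not from the talk: the regression function, noise level, and design distribution are invented for illustration) that draws an i.i.d. sample, fits a crude estimator, and approximates $\|\hat f - f^*\|_X^2$ by Monte Carlo on fresh design points.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical choices (not from the slides): f*(x) = sin(2*pi*x),
# X uniform on [0, 1], Gaussian noise with E[eps | X] = 0.
f_star = lambda x: np.sin(2 * np.pi * x)
n, sigma = 200, 0.3

X = rng.uniform(0, 1, n)
Y = f_star(X) + sigma * rng.normal(size=n)

# A crude estimator: least-squares fit of a degree-5 polynomial.
coeffs = np.polyfit(X, Y, deg=5)
f_hat = lambda x: np.polyval(coeffs, x)

# Monte Carlo approximation of the prediction error
# ||f_hat - f*||_X^2 = E[(f_hat(X) - f*(X))^2] over fresh design points.
X_test = rng.uniform(0, 1, 100_000)
pred_error = np.mean((f_hat(X_test) - f_star(X_test)) ** 2)
print(f"estimated prediction error: {pred_error:.4f}")
```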

5. LINEAR CASE
   ◮ Very classical is the linear case: $\mathcal{X} = \mathbb{R}^p$, $f^*(x) = \langle x, \beta^* \rangle$, and in the usual matrix form ($X_i^t$ form the rows of the design matrix $X$): $Y = X \beta^* + \varepsilon$.
   ◮ The ordinary least squares solution is $\hat\beta_{\mathrm{OLS}} = (X^t X)^\dagger X^t Y$.
   ◮ Prediction error corresponds to $\mathbb{E}\big[\langle \beta^* - \hat\beta, X \rangle^2\big]$.
   ◮ Reconstruction error corresponds to $\|\beta^* - \hat\beta\|^2$.
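A minimal NumPy sketch of the linear case; the dimensions, design distribution, and true $\beta^*$ are illustrative assumptions. It computes $\hat\beta_{\mathrm{OLS}} = (X^t X)^\dagger X^t Y$ and estimates both error measures from the slide.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma = 500, 20, 0.5

beta_star = rng.normal(size=p)            # assumed true parameter
X = rng.normal(size=(n, p))               # rows X_i^t of the design matrix
Y = X @ beta_star + sigma * rng.normal(size=n)

# OLS via the Moore-Penrose pseudoinverse of X^t X.
beta_ols = np.linalg.pinv(X.T @ X) @ X.T @ Y

# Reconstruction error ||beta* - beta_hat||^2.
rec_error = np.sum((beta_star - beta_ols) ** 2)

# Prediction error E[<beta* - beta_hat, X>^2], estimated on fresh design points.
X_new = rng.normal(size=(100_000, p))
pred_error = np.mean((X_new @ (beta_star - beta_ols)) ** 2)

print(f"reconstruction error: {rec_error:.4f}, prediction error: {pred_error:.4f}")
```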

6. EXTENDING THE SCOPE OF LINEAR REGRESSION
   ◮ Common strategy to model more complex functions: map the input variable $x \in \mathcal{X}$ to a so-called "feature space" through $\tilde x = \Phi(x)$.
   ◮ Typical examples (say with $\mathcal{X} = [0,1]$) are $\tilde x = \Phi(x) = (1, x, x^2, \dots, x^p) \in \mathbb{R}^{p+1}$; $\tilde x = \Phi(x) = (1, \cos(2\pi x), \sin(2\pi x), \cos(3\pi x), \sin(3\pi x), \dots) \in \mathbb{R}^{2p+1}$.
   ◮ Problem: the large number of parameters to estimate requires regularization to avoid overfitting.
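The two feature maps above are easy to write down explicitly; the sketch below is only an illustration, and its Fourier map uses frequencies $2\pi k x$, a common convention that differs slightly from the frequencies listed on the slide.

```python
import numpy as np

def poly_features(x, p):
    """Phi(x) = (1, x, x^2, ..., x^p) for an array of scalar inputs x."""
    x = np.atleast_1d(x)
    return np.stack([x ** k for k in range(p + 1)], axis=-1)

def fourier_features(x, p):
    """Phi(x) = (1, cos(2*pi*x), sin(2*pi*x), cos(4*pi*x), sin(4*pi*x), ...),
    with p cosine/sine pairs (one of several possible conventions)."""
    x = np.atleast_1d(x)
    cols = [np.ones_like(x)]
    for k in range(1, p + 1):
        cols += [np.cos(2 * np.pi * k * x), np.sin(2 * np.pi * k * x)]
    return np.stack(cols, axis=-1)

x = np.linspace(0, 1, 5)
print(poly_features(x, p=3).shape)      # (5, 4)  -> features in R^{p+1}
print(fourier_features(x, p=3).shape)   # (5, 7)  -> features in R^{2p+1}
```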

7. REGULARIZATION METHODS
   ◮ The main idea of regularization is to replace $(X^t X)^\dagger$ by an approximate inverse, for instance:
   ◮ Ridge regression / Tikhonov: $\hat\beta_{\mathrm{Ridge}}(\lambda) = (X^t X + \lambda I_p)^{-1} X^t Y$
   ◮ PCA projection / spectral cut-off: restrict $X^t X$ to its $k$ first eigenvectors, $\hat\beta_{\mathrm{PCA}}(k) = (X^t X)^\dagger_{|k} X^t Y$
   ◮ Gradient descent / Landweber iteration / $L^2$ boosting:
     $\hat\beta_{\mathrm{LW}}(k) = \hat\beta_{\mathrm{LW}}(k-1) + X^t\big(Y - X \hat\beta_{\mathrm{LW}}(k-1)\big) = \sum_{i=0}^{k} (I - X^t X)^i \, X^t Y$
     (assuming $\|X^t X\|_{\mathrm{op}} \le 1$).
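All three regularized estimators have direct NumPy implementations. The sketch below is illustrative only: the data are simulated, the design is rescaled so that $\|X^t X\|_{\mathrm{op}} \le 1$, and the Landweber recursion is started from $\hat\beta_{\mathrm{LW}}(0) = 0$, an assumption not stated on the slide.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma = 300, 30, 0.5
beta_star = rng.normal(size=p)
X = rng.normal(size=(n, p))
X = X / np.linalg.norm(X.T @ X, 2) ** 0.5   # rescale so that ||X^t X||_op <= 1
Y = X @ beta_star + sigma * rng.normal(size=n)

A = X.T @ X                                  # the p x p matrix X^t X
b = X.T @ Y                                  # X^t Y

# Ridge / Tikhonov: (X^t X + lambda I_p)^{-1} X^t Y.
lam = 0.1
beta_ridge = np.linalg.solve(A + lam * np.eye(p), b)

# PCA projection / spectral cut-off: invert X^t X only on its k top eigenvectors.
k = 10
evals, evecs = np.linalg.eigh(A)             # eigenvalues in ascending order
top = np.argsort(evals)[::-1][:k]
beta_pca = evecs[:, top] @ (np.diag(1.0 / evals[top]) @ (evecs[:, top].T @ b))

# Landweber iteration / L2 boosting, started (here) from beta = 0.
beta_lw = np.zeros(p)
for _ in range(k):
    beta_lw = beta_lw + X.T @ (Y - X @ beta_lw)

for name, beta in [("ridge", beta_ridge), ("pca", beta_pca), ("landweber", beta_lw)]:
    print(name, float(np.sum((beta - beta_star) ** 2)))   # reconstruction errors
```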

8. GENERAL FORM: SPECTRAL REGULARIZATION
   ◮ General form of a regularization method: $\hat\beta_{\mathrm{Spec}}(\zeta, \lambda) = \zeta_\lambda(X^t X) \, X^t Y$ for some well-chosen function $\zeta_\lambda : \mathbb{R}_+ \to \mathbb{R}_+$ acting on the spectrum and "approximating" the function $x \mapsto 1/x$.
   ◮ $\lambda > 0$: regularization parameter; $\lambda \to 0$ $\Leftrightarrow$ less regularization.
   ◮ Notation of functional calculus: if $X^t X = Q^T \mathrm{diag}(\lambda_1, \dots, \lambda_p)\, Q$, then $\zeta(X^t X) := Q^T \mathrm{diag}(\zeta(\lambda_1), \dots, \zeta(\lambda_p))\, Q$.
   ◮ Many are well known from the inverse problem literature.
   ◮ Examples:
     ◮ Tikhonov: $\zeta_\lambda(t) = (t + \lambda)^{-1}$
     ◮ Spectral cut-off: $\zeta_\lambda(t) = t^{-1} \mathbf{1}\{t \ge \lambda\}$
     ◮ Landweber iteration: $\zeta_k(t) = \sum_{i=0}^{k} (1 - t)^i$.
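The functional-calculus notation translates almost literally into code: diagonalize the symmetric matrix $X^t X$, apply $\zeta_\lambda$ to the eigenvalues, and rotate back. The sketch below (simulated data, arbitrary $\lambda$) checks that the Tikhonov filter reproduces the explicit ridge formula; the other filters plug into the same helper.

```python
import numpy as np

def apply_spectral_filter(A, zeta):
    """Return zeta(A) for a symmetric matrix A via functional calculus:
    A = Q diag(lam) Q^T  ->  zeta(A) = Q diag(zeta(lam)) Q^T."""
    evals, Q = np.linalg.eigh(A)
    return Q @ np.diag(zeta(evals)) @ Q.T

rng = np.random.default_rng(3)
n, p, lam = 200, 15, 0.1
X = rng.normal(size=(n, p))
Y = rng.normal(size=n)
A = X.T @ X

# Filters from the slide (Landweber plugs in the same way).
zeta_tikhonov = lambda t: 1.0 / (t + lam)
zeta_cutoff = lambda t: np.where(t >= lam, 1.0 / np.maximum(t, lam), 0.0)

beta_spec = apply_spectral_filter(A, zeta_tikhonov) @ X.T @ Y
beta_ridge = np.linalg.solve(A + lam * np.eye(p), X.T @ Y)
print(np.allclose(beta_spec, beta_ridge))    # True: same estimator

beta_cut = apply_spectral_filter(A, zeta_cutoff) @ X.T @ Y
print(beta_cut.shape)                        # spectral cut-off estimator, shape (p,)
```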

9. COEFFICIENT EXPANSION
   ◮ A useful trick of functional calculus is the "shift rule": $\zeta(X^t X)\, X^t = X^t\, \zeta(X X^t)$.
   ◮ Interpretation: $\hat\beta_{\mathrm{Spec}}(\zeta,\lambda) = \zeta(X^t X)\, X^t Y = X^t\, \zeta(X X^t)\, Y = \sum_{i=1}^{n} \hat\alpha_i X_i$, with $\hat\alpha = \zeta(G)\, Y$, where $G = X X^t$ is the $(n, n)$ Gram matrix of $(X_1, \dots, X_n)$.
   ◮ This representation is more economical if $p \gg n$.
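For the Tikhonov filter the shift rule is the classical identity $(X^t X + \lambda I_p)^{-1} X^t = X^t (X X^t + \lambda I_n)^{-1}$, which can be checked numerically. The sketch below (arbitrary simulated data) verifies that the primal $p \times p$ computation and the dual $n \times n$ computation through the Gram matrix $G = X X^t$ give the same coefficient vector, which is the point of the expansion when $p \gg n$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, lam = 50, 500, 0.1          # p >> n: the dual (Gram) form is cheaper
X = rng.normal(size=(n, p))
Y = rng.normal(size=n)

# Primal form: zeta(X^t X) X^t Y with the Tikhonov filter, a p x p solve.
beta_primal = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

# Dual form via the shift rule zeta(X^t X) X^t = X^t zeta(X X^t):
# alpha = zeta(G) Y with G = X X^t (an n x n solve), then beta = sum_i alpha_i X_i.
G = X @ X.T
alpha = np.linalg.solve(G + lam * np.eye(n), Y)
beta_dual = X.T @ alpha

print(np.allclose(beta_primal, beta_dual))   # True
```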

10. THE "KERNELIZATION" ANSATZ
   ◮ Let $\Phi$ be a feature mapping into a (possibly infinite-dimensional) Hilbert feature space $\mathcal{H}$.
   ◮ Representing $\tilde x = \Phi(x) \in \mathcal{H}$ explicitly is cumbersome/impossible in practice, but if we can quickly compute the kernel $K(x, x') := \langle \tilde x, \tilde x' \rangle = \langle \Phi(x), \Phi(x') \rangle$, then the kernel Gram matrix $\tilde G_{ij} = \langle \tilde x_i, \tilde x_j \rangle = K(x_i, x_j)$ is accessible.
   ◮ We can hence directly "kernelize" any classical regularization technique using the implicit representation $\hat\beta_{\mathrm{Spec}}(\zeta,\lambda) = \sum_{i=1}^{n} \hat\alpha_i \tilde X_i$, with $\hat\alpha = \zeta(\tilde G)\, Y$;
   ◮ the value of $\hat f(x) = \langle \hat\beta, \tilde x \rangle$ can then be computed for any $x$: $\hat f(x) = \sum_{i=1}^{n} \hat\alpha_i K(X_i, x)$.
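A minimal sketch of the kernelization recipe with the Tikhonov/ridge filter and a Gaussian kernel; the kernel choice, bandwidth, target function, and the absence of any $1/n$ scaling in front of the Gram matrix are illustrative conventions, not prescriptions from the talk.

```python
import numpy as np

rng = np.random.default_rng(5)

def gaussian_kernel(x, y, sigma=0.2):
    """K(x, y) = exp(-|x - y|^2 / (2 sigma^2)) for 1-d inputs, broadcast over arrays."""
    return np.exp(-np.subtract.outer(x, y) ** 2 / (2 * sigma ** 2))

# Simulated data (illustrative): noisy observations of an unknown function.
f_star = lambda x: np.sin(2 * np.pi * x)
n, lam = 100, 1e-2
X = rng.uniform(0, 1, n)
Y = f_star(X) + 0.2 * rng.normal(size=n)

# Kernelized Tikhonov: alpha = zeta_lambda(G) Y = (G + lam I)^{-1} Y.
G = gaussian_kernel(X, X)
alpha = np.linalg.solve(G + lam * np.eye(n), Y)

# Prediction at new points: f_hat(x) = sum_i alpha_i K(X_i, x).
x_grid = np.linspace(0, 1, 200)
f_hat = gaussian_kernel(x_grid, X) @ alpha
print(float(np.mean((f_hat - f_star(x_grid)) ** 2)))  # approximate L2 error on the grid
```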

11. REPRODUCING KERNEL METHODS
   ◮ If $\mathcal{H}$ is a Hilbert feature space, it is useful to identify it as a space of real functions on $\mathcal{X}$ of the form $f(x) = \langle w, \Phi(x) \rangle$. The canonical feature mapping is then $\Phi(x) = K(x, \cdot)$ and the "reproducing kernel" property reads $f(x) = \langle f, \Phi(x) \rangle = \langle f, K(x, \cdot) \rangle$.
   ◮ Classical kernels on $\mathbb{R}^d$ include:
     ◮ Gaussian kernel: $K(x, y) = \exp\big(-\|x - y\|^2 / 2\sigma^2\big)$
     ◮ Polynomial kernel: $K(x, y) = (1 + \langle x, y \rangle)^p$
     ◮ Spline kernels, Matérn kernel, inverse quadratic kernel, ...
   ◮ The success of reproducing kernel methods since the early 00's is due to their versatility and ease of use: beyond vector spaces, kernels have been constructed on various non-Euclidean data (text, genome, graphs, probability distributions, ...).
   ◮ One of the tenets of "learning theory" is a distribution-free point of view; in particular, the sampling distribution (of the $X_i$'s) is unknown to the user and could be very general.
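A short sketch of the two closed-form kernels listed above, checking on random points that the resulting Gram matrices are positive semidefinite up to numerical tolerance; the parameters $\sigma$ and $p$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)

def gaussian_kernel(x, y, sigma=1.0):
    # K(x, y) = exp(-||x - y||^2 / (2 sigma^2)) for rows of x and y.
    sq_dist = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2 * sigma ** 2))

def polynomial_kernel(x, y, p=3):
    # K(x, y) = (1 + <x, y>)^p for rows of x and y.
    return (1.0 + x @ y.T) ** p

X = rng.normal(size=(50, 4))        # 50 points in R^4
for K in (gaussian_kernel(X, X), polynomial_kernel(X, X)):
    eigmin = np.linalg.eigvalsh(K).min()
    print(eigmin >= -1e-8)          # Gram matrix is PSD up to numerical error
```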

12. (Outline repeated: section divider.)

13. SETTING: "INVERSE LEARNING" PROBLEM
   ◮ We refer to "inverse learning" (or inverse regression) for an inverse problem where we have noisy observations at random design points:
     $(X_i, Y_i)_{i=1,\dots,n}$ i.i.d.: $Y_i = (Af^*)(X_i) + \varepsilon_i$.  (ILP)
   ◮ The goal is to recover $f^* \in \mathcal{H}_1$.
   ◮ Early works on closely related subjects: from the splines literature in the 80's (e.g. O'Sullivan '90).
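To make the model (ILP) concrete, here is a hedged toy simulation, entirely an assumption and not the talk's example: take $A$ to be the integration operator $(Af)(x) = \int_0^x f(t)\,dt$ on $[0,1]$ and $f^*(t) = \cos(2\pi t)$, so that $Af^*$ has the closed form $\sin(2\pi x)/(2\pi)$, observed with noise at random design points.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy instance of the inverse learning problem Y_i = (A f*)(X_i) + eps_i.
# A is the integration operator (A f)(x) = int_0^x f(t) dt on [0, 1];
# with f*(t) = cos(2*pi*t) we get (A f*)(x) = sin(2*pi*x) / (2*pi) in closed form.
f_star = lambda t: np.cos(2 * np.pi * t)
Af_star = lambda x: np.sin(2 * np.pi * x) / (2 * np.pi)

n, sigma = 200, 0.05
X = rng.uniform(0, 1, n)                     # random design points
Y = Af_star(X) + sigma * rng.normal(size=n)  # noisy indirect observations

# The statistician sees only (X_i, Y_i) and knowledge of A,
# and wants to recover f*, not A f*.
print(X[:3], Y[:3])
```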

14. MAIN ASSUMPTION FOR INVERSE LEARNING
   Model (ILP): $Y_i = (Af^*)(X_i) + \varepsilon_i$, $i = 1, \dots, n$, where $A : \mathcal{H}_1 \to \mathcal{H}_2$.
   Observe:
   ◮ $\mathcal{H}_2$ should be a space of real-valued functions on $\mathcal{X}$.
   ◮ The geometrical structure of the "measurement errors" will be dictated by the statistical properties of the sampling scheme; there is no need to assume or consider any a priori Hilbert structure on $\mathcal{H}_2$.
   ◮ The crucial structural assumption is the following:
   Assumption: the family of evaluation functionals $(S_x)_{x \in \mathcal{X}}$, defined by $S_x : \mathcal{H}_1 \to \mathbb{R}$, $f \mapsto S_x(f) := (Af)(x)$, is uniformly bounded, i.e., there exists $\kappa < \infty$ such that for any $x \in \mathcal{X}$, $|S_x(f)| \le \kappa \|f\|_{\mathcal{H}_1}$.

15. GEOMETRY OF INVERSE LEARNING
   ◮ Inverse learning under the previous assumption was essentially considered by Caponnetto et al. (2006).
   ◮ Riesz's theorem implies the existence, for any $x \in \mathcal{X}$, of $F_x \in \mathcal{H}_1$ such that $\forall f \in \mathcal{H}_1: (Af)(x) = \langle f, F_x \rangle$.
   ◮ $K(x, y) := \langle F_x, F_y \rangle$ defines a positive semidefinite kernel on $\mathcal{X}$, with associated reproducing kernel Hilbert space (RKHS) denoted $\mathcal{H}_K$.
   ◮ As a pure function space, $\mathcal{H}_K$ coincides with $\mathrm{Im}(A)$.
   ◮ Assuming $A$ is injective, $A$ is in fact an isometric isomorphism between $\mathcal{H}_1$ and $\mathcal{H}_K$.
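Continuing the toy integration-operator example (still an illustration, not from the talk): on $\mathcal{H}_1 = L^2([0,1])$ one has $(Af)(x) = \langle f, \mathbf{1}_{[0,x]} \rangle$, so the Riesz representer is $F_x = \mathbf{1}_{[0,x]}$ and the induced kernel is $K(x,y) = \langle F_x, F_y \rangle = \min(x,y)$. The sketch below checks this identity against a grid discretization of the $L^2$ inner product.

```python
import numpy as np

# For A f(x) = int_0^x f(t) dt on L^2([0, 1]):
#   (A f)(x) = <f, F_x>_{L^2} with representer F_x(t) = 1{t <= x},
# so the induced kernel is K(x, y) = <F_x, F_y> = min(x, y).
m = 10_000
t, dt = np.linspace(0, 1, m, endpoint=False), 1.0 / m   # grid for L^2 inner products

def F(x):
    return (t <= x).astype(float)       # discretized representer F_x = 1{[0, x]}

for x, y in [(0.2, 0.7), (0.5, 0.5), (0.9, 0.3)]:
    inner = np.sum(F(x) * F(y)) * dt    # grid approximation of <F_x, F_y>_{L^2}
    print(round(inner, 3), min(x, y))   # the two numbers agree up to grid error
```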
