Introduction to Statistical Learning and Kernel Machines Hichem - PowerPoint PPT Presentation

The Representer Theorem Supervised Learning (SVMs and SVRs) Kernel Design Unsupervised Learning (Kernel PCA and CCA) Introduction to Statistical Learning and Kernel Machines Hichem SAHBI CNRS UPMC June 2018 Hichem SAHBI Introduction to Statistical Learning and Kernel Machines

The Representer Theorem Supervised Learning (SVMs and SVRs) Kernel Design Unsupervised Learning (Kernel PCA and CCA) Outline Introduction to Statistical Learning Definitions Probability Tools Generalization Bounds Machine Learning Algorithms Kernel Machines : Supervised and Unsupervised Learning The Representer Theorem Supervised Learning (Support vector machines and regression) Kernel Design (kernel combination, cdk kernels,...) Unsupervised Learning (kernel PCA and CCA) Hichem SAHBI Introduction to Statistical Learning and Kernel Machines

The Representer Theorem Supervised Learning (SVMs and SVRs) Kernel Design Unsupervised Learning (Kernel PCA and CCA) The Representer Theorem 1 Supervised Learning (SVMs and SVRs) 2 Kernel Design 3 Unsupervised Learning (Kernel PCA and CCA) 4 Hichem SAHBI Introduction to Statistical Learning and Kernel Machines

The Representer Theorem Supervised Learning (SVMs and SVRs) Kernel Design Unsupervised Learning (Kernel PCA and CCA) Pattern Recognition Problems Given a pattern (observation) X ∈ X , the goal is to predict the unknown label Y of X . Character recognition (OCR) : X is an image, Y is a letter. Face detection (resp. recognition) : X is an image, Y indicates the presence of a face in the picture (resp. identity). Text classification : X is a text, Y is a category (topic, spam/non spam,...). Medical diagnosis : X is a set of features (age, genome, ...), Y is the risk. Hichem SAHBI Introduction to Statistical Learning and Kernel Machines

The Representer Theorem Supervised Learning (SVMs and SVRs) Kernel Design Unsupervised Learning (Kernel PCA and CCA) Section 1 The Representer Theorem Hichem SAHBI Introduction to Statistical Learning and Kernel Machines

The Representer Theorem Supervised Learning (SVMs and SVRs) Kernel Design Unsupervised Learning (Kernel PCA and CCA) Regularization, Kernel Methods and Representer Theorem min R n ( g ) + λ Ω( g ) , λ ≥ 0 ( Tikhonov ) g ∈ G For a particular regularizer Ω( g ) and class G , the solution to the above problem (Kimeldorf and Wahba, 1971) n � { ( X i , Y i ) } n g α ( . ) = α i k ( ., X i ) , i = 1 ⊆ X × Y is fixed i = 1 k is a kernel : symmetric, continuous on X × X and positive definite (Mercer, 1909), k ( X , X ′ ) = � Φ( X ) , Φ( X ′ ) � . n � ( g ( X i ) − Y i ) 2 (kernel regression), R n ( g ) = 1 n i = 1 n � ( sign [ g ( X i )] − Y i ) 2 (e.g., max margin 1 R n ( g ) = 2 n i = 1 classifier, SVMs). Hichem SAHBI Introduction to Statistical Learning and Kernel Machines

The Representer Theorem Supervised Learning (SVMs and SVRs) Kernel Design Unsupervised Learning (Kernel PCA and CCA) Section 2 Supervised Learning (SVMs and SVRs) Hichem SAHBI Introduction to Statistical Learning and Kernel Machines

The Representer Theorem Supervised Learning (SVMs and SVRs) Kernel Design Unsupervised Learning (Kernel PCA and CCA) Support Vector Machines Hichem SAHBI Introduction to Statistical Learning and Kernel Machines

The Representer Theorem Supervised Learning (SVMs and SVRs) Kernel Design Unsupervised Learning (Kernel PCA and CCA) Support Vector Machines (Large Margin Classifiers) Let { ( X 1 , Y 1 ) , . . . , ( X n , Y n ) } , be a training set generated i.i.d w t x + b = − 1 w t x + b = 0 w t x + b = + 1 w 2 � w � 1 � w ′ X i + b � 2 w ′ w min Y i − 1 ≥ 0 , ∀ i s.c. w , b Optimality conditions lead to w = � i α i Y i X i and dual form α i − 1 � � � max α i α j Y i Y j � X i , X j � 2 { α i } i i j � α i ≥ 0 , ∀ i α i Y i = 0 s.t. and i Hichem SAHBI Introduction to Statistical Learning and Kernel Machines

The Representer Theorem Supervised Learning (SVMs and SVRs) Kernel Design Unsupervised Learning (Kernel PCA and CCA) VC Dimension of Large Margin Classifiers The set of hyperplane classifiers with a margin at least M has a VC dimension upper bounded by : h ≤ r 2 / M 2 , here r is the radius of the smallest sphere containing all the patterns X . Hichem SAHBI Introduction to Statistical Learning and Kernel Machines

The Representer Theorem Supervised Learning (SVMs and SVRs) Kernel Design Unsupervised Learning (Kernel PCA and CCA) Interpretation of Lagrange Multipliers α i > 0 implies Y i ( w ′ X i + b ) = 1 : X i is a support vector. α i = 0 implies Y i ( w ′ X i + b ) > 1 : X i is useless vector. w t x + b = − 1 w t x + b = 0 w t x + b = + 1 w 2 � w � Hichem SAHBI Introduction to Statistical Learning and Kernel Machines

The Representer Theorem Supervised Learning (SVMs and SVRs) Kernel Design Unsupervised Learning (Kernel PCA and CCA) Classification Function w t x + b = − 1 w t x + b = 0 w t x + b = + 1 w Classification function g α ( X ) − b = � w , X � � � = α i � X i , X � − α i � X i , X � Y i =+ 1 Y i = − 1 Hichem SAHBI Introduction to Statistical Learning and Kernel Machines

The Representer Theorem Supervised Learning (SVMs and SVRs) Kernel Design Unsupervised Learning (Kernel PCA and CCA) Linear Soft-SVMs Introduce slack variables { ξ 1 , . . . , ξ n } to allow misclassification. Trade-off large margin and misclassification. n 1 � 2 w ′ w + C min ξ i w , b i = 1 Y i ( w ′ X i + b ) + ξ i ≥ 1 , ∀ i s.c. ξ i ≥ 0 w t x + b = − 1 w t x + b = 0 w t x + b = + 1 ξ i w 2 � w � Hichem SAHBI Introduction to Statistical Learning and Kernel Machines

The Representer Theorem Supervised Learning (SVMs and SVRs) Kernel Design Unsupervised Learning (Kernel PCA and CCA) Dual Formulation α i − 1 � � � max { α i } α i α j Y i Y j � X i , X j � 2 i i j � 0 ≤ α i ≤ C , ∀ i i α i Y i = 0 s.c. and Hichem SAHBI Introduction to Statistical Learning and Kernel Machines

The Representer Theorem Supervised Learning (SVMs and SVRs) Kernel Design Unsupervised Learning (Kernel PCA and CCA) Non-Linear SVMs Φ Φ() Φ() Φ() Φ() Φ() Φ() Φ() Φ() α i − 1 � � � max α i α j Y i Y j � Φ( X i ) , Φ( X j ) � 2 { α i } i i j � α i ≥ 0 , ∀ i α i Y i = 0 s.t. and i � � g α ( X ) − b = α i � Φ( X i ) , Φ( X ) � − α i � Φ( X i ) , Φ( X ) � Y i =+ 1 Y i = − 1 The product � Φ( X ) , Φ( X ′ ) � defines a kernel k ( X , X ′ ) . Hichem SAHBI Introduction to Statistical Learning and Kernel Machines

The Representer Theorem Supervised Learning (SVMs and SVRs) Kernel Design Unsupervised Learning (Kernel PCA and CCA) Kernels Kernels are symmetric and positive (semi) definite functions that measure similarity between data. Positive semi definite means k ( X , X ′ ) = � Φ( X ) , Φ( X ′ ) � . � ∀ X 1 , . . . , X n ∈ X , ∀ c 1 , . . . , c n ∈ R , c i c j k ( X i , X j ) ≥ 0 i , j Equivalently, the Gram (kernel) matrix K with K ij = k ( X i , X j ) has positive eigenvalues. Kernels on vectorial data : linear � X , X ′ � , polynomial � � X − X ′ � 2 � ( 1 + � X , X ′ � ) p , Gaussian exp − 1 , etc. σ 2 Kernels can be designed using closure operations (additions, products, exponentiation, etc.) Hichem SAHBI Introduction to Statistical Learning and Kernel Machines

The Representer Theorem Supervised Learning (SVMs and SVRs) Kernel Design Unsupervised Learning (Kernel PCA and CCA) Examples (Linear vs Gaussian) Hichem SAHBI Introduction to Statistical Learning and Kernel Machines

The Representer Theorem Supervised Learning (SVMs and SVRs) Kernel Design Unsupervised Learning (Kernel PCA and CCA) Gaussian Kernel � � X − X ′ � 2 � − 1 k ( X , X ′ ) = � Φ( X ) , Φ( X ′ ) � = exp . σ 2 The dimension of the output space R H is ∞ . The Gaussian kernel has good generalization properties but requires a good selection of the scale parameter σ using the tedious cross validation process. over-fitting Generalization error over-smoothing tradeoff scale parameter Hichem SAHBI Introduction to Statistical Learning and Kernel Machines

Introduction to Statistical Learning and Kernel Machines Hichem - PowerPoint PPT Presentation

The Representer Theorem Supervised Learning (SVMs and SVRs) Kernel Design Unsupervised Learning (Kernel PCA and CCA) Introduction to Statistical Learning and Kernel Machines Hichem SAHBI CNRS UPMC June 2018 Hichem SAHBI Introduction to

Kernel Machines Support Vector Machines 1 Kernel Machines Optimal Separating HyperPlanes Soft

Kernel Machines Steven J Zeil Old Dominion Univ. Fall 2010 1 Support Vector Machines Kernel

WARS OF THE WARS OF THE WARS OF THE WARS OF THE WARS OF THE WARS OF THE WARS OF THE WARS OF

Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel -means Clustering Manuel

Finite State Machines (FSM) Chapter 8 State Machines Introduction State Machines Mealy and

Learning From Data Lecture 26 Kernel Machines Popular Kernels The Kernel Measures Similarity

Introduction to Linux Kernel Modules Luca Abeni luca.abeni@santannapisa.it Linux Kernel Modules

Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel -means Clustering Manuel

Processes, Protection and the Kernel: Processes, Protection and the Kernel: Mode, Space, and

Black Kernel Rot Malady of Pecan B Wood, C Bock, l Wells, T Cottrell, M Hotchkiss Black Kernel

Kernel Properties - Convexity Leila Wehbe October 1st 2013 Leila Wehbe Kernel Properties -

Linux Kernel Debugging Your kernel just oopsed - What do you do, hotshot? Muli Ben-Yehuda

Kernel machines and sparsity 2 juillet, 2009 ENBIS09, Saint Etienne Stphane Canu &

Finite State Machines (FSM) AKA Finite State Automat on State Machines Introduction State

Statistical Classification with Fisher Zantedeschi Introduction Kernel Topic Models LDA PLSM

Silicon Kernel Learning Machines Gert Cauwenberghs Johns Hopkins University gert@jhu.edu

TCP Part 3: Performance, Fairness, & Modern Congestion Controllers 15-441 Guest Lecture

IND-CCA-secure Key Encapsulation Mechanism in the Quantum Random Oracle Model, Revisited Haodong

Large-Scale Sparse Kernel Canonical Correlation Analysis Viivi Uurtio 1 , Sahely Bhadra 2 , and

Large deviations and heterogeneities in driven or non-driven kinetically constrained models

MACHINE LEARNING Kernel Canonical Correlation Analysis 1 ADVANCED MACHINE LEARNING ADVANCED

Public-Key Cryptography Lecture 12 CCA Secure PKE Hybrid Encryption CCA Secure PKE In SKE, to get

Phase Lengths in the Cylic Cellular Automaton Kiran Tomlinson Department of Computer Science,

Spectral Methods for Natural Language Processing Karl Stratos Thesis Defense Committee David

Introduction to Statistical Learning and Kernel Machines Hichem - PowerPoint PPT Presentation

The Representer Theorem Supervised Learning (SVMs and SVRs) Kernel Design Unsupervised Learning (Kernel PCA and CCA) Introduction to Statistical Learning and Kernel Machines Hichem SAHBI CNRS UPMC June 2018 Hichem SAHBI Introduction to

Kernel Machines Support Vector Machines 1 Kernel Machines Optimal Separating HyperPlanes Soft

Kernel Machines Steven J Zeil Old Dominion Univ. Fall 2010 1 Support Vector Machines Kernel

WARS OF THE WARS OF THE WARS OF THE WARS OF THE WARS OF THE WARS OF THE WARS OF THE WARS OF

Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel -means Clustering Manuel

Finite State Machines (FSM) Chapter 8 State Machines Introduction State Machines Mealy and

Learning From Data Lecture 26 Kernel Machines Popular Kernels The Kernel Measures Similarity

Introduction to Linux Kernel Modules Luca Abeni luca.abeni@santannapisa.it Linux Kernel Modules

Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel -means Clustering Manuel

Processes, Protection and the Kernel: Processes, Protection and the Kernel: Mode, Space, and

Black Kernel Rot Malady of Pecan B Wood, C Bock, l Wells, T Cottrell, M Hotchkiss Black Kernel

Kernel Properties - Convexity Leila Wehbe October 1st 2013 Leila Wehbe Kernel Properties -

Linux Kernel Debugging Your kernel just oopsed - What do you do, hotshot? Muli Ben-Yehuda

Kernel machines and sparsity 2 juillet, 2009 ENBIS09, Saint Etienne Stphane Canu &amp;

Finite State Machines (FSM) AKA Finite State Automat on State Machines Introduction State

Statistical Classification with Fisher Zantedeschi Introduction Kernel Topic Models LDA PLSM

Silicon Kernel Learning Machines Gert Cauwenberghs Johns Hopkins University gert@jhu.edu

TCP Part 3: Performance, Fairness, &amp; Modern Congestion Controllers 15-441 Guest Lecture

IND-CCA-secure Key Encapsulation Mechanism in the Quantum Random Oracle Model, Revisited Haodong

Large-Scale Sparse Kernel Canonical Correlation Analysis Viivi Uurtio 1 , Sahely Bhadra 2 , and

Large deviations and heterogeneities in driven or non-driven kinetically constrained models

MACHINE LEARNING Kernel Canonical Correlation Analysis 1 ADVANCED MACHINE LEARNING ADVANCED

Public-Key Cryptography Lecture 12 CCA Secure PKE Hybrid Encryption CCA Secure PKE In SKE, to get

Phase Lengths in the Cylic Cellular Automaton Kiran Tomlinson Department of Computer Science,

Spectral Methods for Natural Language Processing Karl Stratos Thesis Defense Committee David

Kernel machines and sparsity 2 juillet, 2009 ENBIS09, Saint Etienne Stphane Canu &

TCP Part 3: Performance, Fairness, & Modern Congestion Controllers 15-441 Guest Lecture