Advanced Machine Learning Learning Kernels MEHRYAR MOHRI - PowerPoint PPT Presentation

Advanced Machine Learning Learning Kernels MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH ..

Outline Kernel methods. Learning kernels • scenario. • learning bounds. • algorithms. Advanced Machine Learning - Mohri@ page 2

Machine Learning Components user features sample algorithm h Advanced Machine Learning - Mohri@ page 3

Machine Learning Components user features sample main focus critical task algorithm of ML literature h Advanced Machine Learning - Mohri@ page 4

Kernel Methods Features implicitly de fi ned via the choice of a Φ : X → H PDS kernel K Φ ( x ) · Φ ( y ) = K ( x, y ) . ∀ x, y ∈ X, interpreted as a similarity measure. K Flexibility: PDS kernel can be chosen arbitrarily. Help extend a variety of algorithms to non-linear predictors, e.g., SVMs, KRR, SVR, KPCA. PDS condition directly related to convexity of optimization problem. Advanced Machine Learning - Mohri@ page 5

Example - Polynomial Kernels De fi nition : ∀ x, y ∈ R N , K ( x, y ) = ( x · y + c ) d , c > 0 . Example: for and , N =2 d =2 K ( x, y ) = ( x 1 y 1 + x 2 y 2 + c ) 2 x 2 y 2     1 1 x 2 y 2 2 2      √   √  2 x 1 x 2 2 y 1 y 2     = .   ·   √ √ 2 c x 1 2 c y 1         √ √ 2 c x 2 2 c y 2     c c Advanced Machine Learning - Mohri@ page 6

XOR Problem Use second-degree polynomial kernel with : c = 1 √ 2 x 1 x 2 x 2 √ √ √ ( − 1 , 1) (1 , 1) √ √ √ (1 , 1 , + 2 , − 2 , − 2 , 1) (1 , 1 , + 2 , + 2 , + 2 , 1) √ 2 x 1 x 1 (1 , − 1) ( − 1 , − 1) √ √ √ √ √ √ (1 , 1 , − 2 , − 2 , + 2 , 1) (1 , 1 , − 2 , + 2 , − 2 , 1) Linearly non-separable Linearly separable by x 1 x 2 = 0 . Advanced Machine Learning - Mohri@ page 7

Other Standard PDS Kernels Gaussian kernels : � || x � y || 2 � � K ( x, y ) = exp , σ � = 0 . 2 σ 2 • � x · x � Normalized kernel of � ( x , x � ) �� exp σ 2 . Sigmoid Kernels: K ( x, y ) = tanh( a ( x · y ) + b ) , a, b ≥ 0 . Advanced Machine Learning - Mohri@ page 8

SVM (Cortes and Vapnik, 1995; Boser, Guyon, and Vapnik, 1992) Primal: m 1 2 � w � 2 + C � � � min 1 � y i ( w · Φ K ( x i ) + b ) + . w ,b i =1 Dual: m m α i − 1 � � max α i α j y i y j K ( x i , x j ) 2 α i =1 i,j =1 m � subject to: 0 ≤ α i ≤ C ∧ α i y i = 0 , i ∈ [1 , m ] . i =1 Advanced Machine Learning - Mohri@ page 9

Kernel Ridge Regression (Hoerl and Kennard, 1970; Sanders et al., 1998) Primal: m ( w · Φ K ( x i ) + b � y i ) 2 . w λ � w � 2 + � min i =1 Dual: � ( K + λ I ) α + 2 α � y . max α � R m − α Advanced Machine Learning - Mohri@ page 10

Questions How should the user choose the kernel? • problem similar to that of selecting features for other learning algorithms. • poor choice learning made very di ffi cult. • good choice even poor learners could succeed. The requirement from the user is thus critical. • can this requirement be lessened? • is a more automatic selection of features possible? Advanced Machine Learning - Mohri@ page 11

Outline Kernel methods. Learning kernels • scenario. • learning bounds. • algorithms. Advanced Machine Learning - Mohri@ page 12

Standard Learning with Kernels user kernel K sample algorithm h Advanced Machine Learning - Mohri@ page 13

Learning Kernel Framework kernel user sample family K algorithm ( K, h ) Advanced Machine Learning - Mohri@ page 14

Kernel Families Most frequently used kernel families, , q ≥ 1 � µ 1 p � � � � . K q = K µ : K µ = µ k K k , µ = ∈ ∆ q . . µ p k =1 � � with ∆ q = µ : µ � 0 , � µ � q = 1 . Hypothesis sets:   � � h � H K : K � K q , � h � H K � 1 H q = . Advanced Machine Learning - Mohri@ page 15

Relation between Norms Lemma: for , the following holds: p, q ∈ (0 , + ∞ ] 1 p − 1 q � x � q . � x � R N , p � q � � x � q � � x � p � N Proof: for the left inequalities, observe that for , x � = 0  | x i |  | x i |  k x k p � p N � p N � q X X � = = 1 . k x k q k x k q k x k q i =1 i =1 | {z } ≤ 1 • Right inequalities follow immediately Hölder’s inequality: 1 � N � 1 � N q � N � p � 1 − p � q � p p q q p − 1 1 � � � | x i | p ( | x i | p ) q . � x � p = � (1) = � x � q N p q − p � � i =1 i =1 i =1 Advanced Machine Learning - Mohri@ page 16

Single Kernel Guarantee (Koltchinskii and Panchenko, 2002) Theorem: fi x . Then, for any , with probability at δ > 0 ρ > 0 least , the following holds for all , h ∈ H 1 1 − δ � � log 1 R ρ ( h ) + 2 Tr[ K ] R ( h ) ≤ � δ + 2 m . m ρ Advanced Machine Learning - Mohri@ page 17

Multiple Kernel Guarantee (Cortes, MM, and Rostamizadeh, 2010) Theorem: fi x . Let with . Then, for q + 1 1 ρ > 0 q, r ≥ 1 r =1 any , with probability at least , the following holds δ > 0 1 − δ for all and any integer : h ∈ H q 1 ≤ s ≤ r s p log 1 R ρ ( h ) + 2 s k u k s R ( h )  b δ + 2 m , m ρ with . u = (Tr[ K 1 ] , . . . , Tr[ K p ]) � Advanced Machine Learning - Mohri@ page 18

Proof Let with . 1 q + 1 q, r ≥ 1 r =1 h i m X R S ( H q ) = 1 b m E sup σ i h ( x i ) σ h 2 H q i =1 h i m X = 1 m E sup σ i α j K µ ( x i , x j ) σ µ 2 ∆ q , α > K µ α  1 i,j =1 h i h i = 1 = 1 σ > K µ α m E sup m E sup h σ , α i K 1 / 2 σ σ µ µ 2 ∆ q , α > K µ α  1 µ 2 ∆ q , k α k  1 K 1 / 2 µ q h i = 1 σ > K µ σ m E sup (Cauchy-Schwarz) σ µ 2 ∆ q h i ⇥ ⇤ = 1 p µ · u σ u σ = ( σ > K 1 σ , . . . , σ > K p σ ) > ) m E sup σ µ 2 ∆ q ⇥p ⇤ = 1 m E k u σ k r (definition of dual norm) . σ Advanced Machine Learning - Mohri@ page 19

Lemma (Cortes, MM, and Rostamizadeh, 2010) Lemma: Let be a kernel matrix for a fi nite sample. Then, K for any integer , r ⌘ r h ( σ > K σ ) r i ⇣ E r Tr[ K ] ≤ . σ Proof: combinatorial argument. Advanced Machine Learning - Mohri@ page 20

Proof For any , 1 ≤ s ≤ r ⇥p ⇤ R S ( H q ) = 1 b m E k u σ k r σ ⇥p ⇤  1 m E k u σ k s σ hh ( σ > K k σ ) s i 1 2 s i p X = 1 m E σ k =1 h h ( σ > K k σ ) s ii 1 p X  1 2 s (Jensen’s inequality) E m σ k =1 h h ( σ > K k σ ) s ii 1 p X = 1 2 s E m σ p k =1 h ⇣ ⌘ s i 1 p X s k u k s  1 2 s = s Tr[ K k ] (lemma) . m m k =1 Advanced Machine Learning - Mohri@ page 21

L 1 Learning Bound (Cortes, MM, and Rostamizadeh, 2010) Corollary: fi x . For any , with probability , the δ > 0 1 − δ ρ > 0 following holds for all : h ∈ H 1 r s p e d log p e max k =1 Tr[ K k ] log 1 R ρ ( h ) + 2 R ( h )  b δ + 2 m . m ρ • weak dependency on . p • bound valid for . p � m • Tr[ K k ] ≤ m max K k ( x, x ) . x Advanced Machine Learning - Mohri@ page 22

Proof For , the bound holds for any integer q = 1 s ≥ 1 s p log 1 R ρ ( h ) + 2 s k u k s R ( h )  b δ + 2 m , m ρ " p # 1 s p 1 X with Tr[ K k ] s s k u k s = s  sp max k =1 Tr[ K k ] . s k =1 1 The function reaches it minimum at . log p s �� sp s Advanced Machine Learning - Mohri@ page 23

Lower Bound Tight bound: • dependency cannot be improved. � log p • argument based on VC dimension or example. Observations: case . X = { - 1 , + 1 } p • canonical projection kernels . K k ( x , x � )= x k x � k • contains . J p = { x �� sx k : k � [1 , p ] , s � { - 1 , + 1 }} H 1 • . VCdim( J p )= Ω (log p ) • for and , . R ρ ( h )= � � h ∈ J p R ( h ) ρ =1 • VC lower bound: . �� VCdim( J p ) /m Ω Advanced Machine Learning - Mohri@ page 24

Pseudo-Dimension Bound (Srebro and Ben-David, 2006) Assume that for all . Then, for k ∈ [1 , p ] , K k ( x, x ) ≤ R 2 any , with probability at least , for any , δ > 0 1 − δ h ∈ H 1 ⇥ 2 + p log 128 em 3 R 2 + 256 R 2 ρ 2 log ρ em 8 R log 128 mR 2 + log(1 / δ ) ρ 2 p ρ 2 R ( h ) ≤ � R ρ ( h ) + 8 . m • bound additive in (modulo log terms). p • not informative for . p>m • based on pseudo-dimension of kernel family. • similar guarantees for other families. Advanced Machine Learning - Mohri@ page 25

Comparison ρ /R = . 2 Advanced Machine Learning - Mohri@ page 26

Advanced Machine Learning Learning Kernels MEHRYAR MOHRI - PowerPoint PPT Presentation

Advanced Machine Learning Learning Kernels MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH .. Outline Kernel methods. Learning kernels scenario. learning bounds. algorithms. Advanced Machine Learning -

MACHINE LEARNING Kernel Canonical Correlation Analysis 1 ADVANCED MACHINE LEARNING ADVANCED

Introduction to Machine Learning Introduction to Machine Learning Introduction to Machine

ADVANCED MACHINE LEARNING Kernel PCA 11 ADVANCED MACHINE LEARNING Overview Todays Lecture

ADVANCED MACHINE LEARNING Non-linear regression techniques 1 1 ADVANCED MACHINE LEARNING

Quantum Machine Learning Adam Brown, HEP-AI Quantum Computing Machine Learning Quantum

MICROSOFT AZURE MACHINE LEARNING Oscar Naim Microsoft Microsoft Azure Machine Learning What is

MACHINE LEARNING Overview 1 1 APPLIED MACHINE LEARNING 2011-2012 APPLIED MACHINE LEARNING

MACHINE LEARNING kernels 1 MACHINE LEARNING 2012 MACHINE LEARNING Kernels: Intuition How

A Machine Learning Approach A Machine Learning Approach A Machine Learning Approach A Machine

Welcome to the Machine Learning Toolbox! Machine Learning Toolbox Supervised learning caret

Discrete and Continuous Reinforcement Learning (not part of exam material) 1 1 ADVANCED

Introduction to Machine Learning COMPSCI 371D Machine Learning COMPSCI 371D Machine

INTRODUCTION TO MACHINE LEARNING Joseph C. Osborn CS 51A Spring 2020 Machine Learning is

Human and Machine Learning Tom Mitchell Machine Learning Department Carnegie Mellon University

Machine Learning Algorithms for Classification Machine Learning Algorithms for Classification

Machine Learning - Intro Aarti Singh Machine Learning 10-701/15-781 Sept 8, 2010 You tell me

THE YEAR OF CHANGE FY2020/2021 BUDGET PRESENTATION - 1 - THE DEPARTMENT OF PARKS &

Timetable 1. Coaches arrive by 9:30am West Bay Stone Farm Evening activities: 2. Children

READING BUDDIES PARTNERSHIP BETWEEN STAR UNITED METHODIST, BISCOE PRESBYTERIAN AND FIRST

Celebrating the 40 th Anniversary of the Groundbreaking at History On September 27, 1970 a group

NRC Annual Conference Hotel del Coronado, San Diego, CA Genesee & Wyoming Railroad Services

Convening a Community to Address Social Determinants of Health: Baton Rouge Vision of Health

BATON ROUGE AREA ECONOMIC OVERVIEW ADAM KNAPP, JULY 28, 2020 JOB LOSSES IN THE CAPITAL REGION ARE

INDUSTRY LEADING ENGINEERING & SUPPLIER TO TANK CAR MARKET Transquip Timeline 2001 1994

Advanced Machine Learning Learning Kernels MEHRYAR MOHRI - PowerPoint PPT Presentation

Advanced Machine Learning Learning Kernels MEHRYAR MOHRI MOHRI@ COURANT INSTITUTE & GOOGLE RESEARCH .. Outline Kernel methods. Learning kernels scenario. learning bounds. algorithms. Advanced Machine Learning -

MACHINE LEARNING Kernel Canonical Correlation Analysis 1 ADVANCED MACHINE LEARNING ADVANCED

Introduction to Machine Learning Introduction to Machine Learning Introduction to Machine

ADVANCED MACHINE LEARNING Kernel PCA 11 ADVANCED MACHINE LEARNING Overview Todays Lecture

ADVANCED MACHINE LEARNING Non-linear regression techniques 1 1 ADVANCED MACHINE LEARNING

Quantum Machine Learning Adam Brown, HEP-AI Quantum Computing Machine Learning Quantum

MICROSOFT AZURE MACHINE LEARNING Oscar Naim Microsoft Microsoft Azure Machine Learning What is

MACHINE LEARNING Overview 1 1 APPLIED MACHINE LEARNING 2011-2012 APPLIED MACHINE LEARNING

MACHINE LEARNING kernels 1 MACHINE LEARNING 2012 MACHINE LEARNING Kernels: Intuition How

A Machine Learning Approach A Machine Learning Approach A Machine Learning Approach A Machine

Welcome to the Machine Learning Toolbox! Machine Learning Toolbox Supervised learning caret

Discrete and Continuous Reinforcement Learning (not part of exam material) 1 1 ADVANCED

Introduction to Machine Learning COMPSCI 371D Machine Learning COMPSCI 371D Machine

INTRODUCTION TO MACHINE LEARNING Joseph C. Osborn CS 51A Spring 2020 Machine Learning is

Human and Machine Learning Tom Mitchell Machine Learning Department Carnegie Mellon University

Machine Learning Algorithms for Classification Machine Learning Algorithms for Classification

Machine Learning - Intro Aarti Singh Machine Learning 10-701/15-781 Sept 8, 2010 You tell me

THE YEAR OF CHANGE FY2020/2021 BUDGET PRESENTATION - 1 - THE DEPARTMENT OF PARKS &amp;

Timetable 1. Coaches arrive by 9:30am West Bay Stone Farm Evening activities: 2. Children

READING BUDDIES PARTNERSHIP BETWEEN STAR UNITED METHODIST, BISCOE PRESBYTERIAN AND FIRST

Celebrating the 40 th Anniversary of the Groundbreaking at History On September 27, 1970 a group

NRC Annual Conference Hotel del Coronado, San Diego, CA Genesee &amp; Wyoming Railroad Services

Convening a Community to Address Social Determinants of Health: Baton Rouge Vision of Health

BATON ROUGE AREA ECONOMIC OVERVIEW ADAM KNAPP, JULY 28, 2020 JOB LOSSES IN THE CAPITAL REGION ARE

INDUSTRY LEADING ENGINEERING &amp; SUPPLIER TO TANK CAR MARKET Transquip Timeline 2001 1994

THE YEAR OF CHANGE FY2020/2021 BUDGET PRESENTATION - 1 - THE DEPARTMENT OF PARKS &

NRC Annual Conference Hotel del Coronado, San Diego, CA Genesee & Wyoming Railroad Services

INDUSTRY LEADING ENGINEERING & SUPPLIER TO TANK CAR MARKET Transquip Timeline 2001 1994