CS480/680 Lecture 11: June 12, 2019 Kernel methods [D] Chap. 11

SLIDE 1

CS480/680 Lecture 11: June 12, 2019

Kernel methods [D] Chap. 11 [B] Sec. 6.1, 6.2 [M] Sec. 14.1, 14.2 [HTF] Chap. 6

CS480/680 Spring 2019 Pascal Poupart 1 University of Waterloo

SLIDE 2

Non-linear Models Recap

  • Generalized linear models:
  • Neural networks:

SLIDE 3

Kernel Methods

  • Idea: use a large (possibly infinite) set of fixed non-linear basis functions
  • Normally, complexity depends on the number of basis functions, but by a “dual trick”, complexity depends on the amount of data
  • Examples:
    – Gaussian Processes (next class)
    – Support Vector Machines (next week)
    – Kernel Perceptron
    – Kernel Principal Component Analysis

SLIDE 4

Kernel Function

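  • Let $\boldsymbol{\phi}(\mathbf{x})$ be a set of basis functions that map inputs $\mathbf{x}$ to a feature space.
  • In many algorithms, this feature space only appears in the dot product $\boldsymbol{\phi}(\mathbf{x})^T\boldsymbol{\phi}(\mathbf{x}')$ of input pairs $\mathbf{x}, \mathbf{x}'$.
  • Define the kernel function $k(\mathbf{x}, \mathbf{x}') = \boldsymbol{\phi}(\mathbf{x})^T\boldsymbol{\phi}(\mathbf{x}')$ to be the dot product of any pair $\mathbf{x}, \mathbf{x}'$ in feature space.
    – We only need to know $k(\mathbf{x}, \mathbf{x}')$, not $\boldsymbol{\phi}(\mathbf{x})$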

SLIDE 5

Dual Representations

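  • Recall linear regression objective

$$J(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\left(\mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_n) - t_n\right)^2 + \frac{\lambda}{2}\mathbf{w}^T\mathbf{w}$$

  • Solution: set gradient to 0

$$\nabla J(\mathbf{w}) = \sum_{n}\left(\mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_n) - t_n\right)\boldsymbol{\phi}(\mathbf{x}_n) + \lambda\mathbf{w} = 0 \quad\Rightarrow\quad \mathbf{w} = -\frac{1}{\lambda}\sum_{n}\left(\mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_n) - t_n\right)\boldsymbol{\phi}(\mathbf{x}_n)$$

∴ $\mathbf{w}$ is a linear combination of inputs in feature space $\{\boldsymbol{\phi}(\mathbf{x}_n) \mid 1 \le n \le N\}$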

SLIDE 6

Dual Representations

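  • Substitute $\mathbf{w} = \boldsymbol{\Phi}\mathbf{a}$
  • Where $\boldsymbol{\Phi} = [\boldsymbol{\phi}(\mathbf{x}_1)\;\; \boldsymbol{\phi}(\mathbf{x}_2)\;\; \dots\;\; \boldsymbol{\phi}(\mathbf{x}_N)]$, $\mathbf{a} = (a_1, a_2, \dots, a_N)^T$ and $a_n = -\frac{1}{\lambda}\left(\mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_n) - t_n\right)$
  • Dual objective: minimize $J$ with respect to $\mathbf{a}$

$$J(\mathbf{a}) = \frac{1}{2}\mathbf{a}^T\boldsymbol{\Phi}^T\boldsymbol{\Phi}\boldsymbol{\Phi}^T\boldsymbol{\Phi}\mathbf{a} - \mathbf{a}^T\boldsymbol{\Phi}^T\boldsymbol{\Phi}\mathbf{t} + \frac{\mathbf{t}^T\mathbf{t}}{2} + \frac{\lambda}{2}\mathbf{a}^T\boldsymbol{\Phi}^T\boldsymbol{\Phi}\mathbf{a}$$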
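The dual objective follows directly from the primal one. With $\mathbf{w} = \boldsymbol{\Phi}\mathbf{a}$ and $\mathbf{t} = (t_1, \dots, t_N)^T$, the vector of training predictions is $\boldsymbol{\Phi}^T\mathbf{w} = \boldsymbol{\Phi}^T\boldsymbol{\Phi}\mathbf{a}$, so expanding the squared error and the regularizer gives each term in turn:

$$
\begin{aligned}
J(\mathbf{a}) &= \frac{1}{2}\left\lVert\boldsymbol{\Phi}^T\boldsymbol{\Phi}\mathbf{a} - \mathbf{t}\right\rVert^2 + \frac{\lambda}{2}(\boldsymbol{\Phi}\mathbf{a})^T(\boldsymbol{\Phi}\mathbf{a}) \\
&= \frac{1}{2}\mathbf{a}^T\boldsymbol{\Phi}^T\boldsymbol{\Phi}\boldsymbol{\Phi}^T\boldsymbol{\Phi}\mathbf{a} - \mathbf{a}^T\boldsymbol{\Phi}^T\boldsymbol{\Phi}\mathbf{t} + \frac{\mathbf{t}^T\mathbf{t}}{2} + \frac{\lambda}{2}\mathbf{a}^T\boldsymbol{\Phi}^T\boldsymbol{\Phi}\mathbf{a}
\end{aligned}
$$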

SLIDE 7

Gram Matrix

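  • Let $\mathbf{K} = \boldsymbol{\Phi}^T\boldsymbol{\Phi}$ be the Gram matrix
  • Substitute in objective:

$$J(\mathbf{a}) = \frac{1}{2}\mathbf{a}^T\mathbf{K}\mathbf{K}\mathbf{a} - \mathbf{a}^T\mathbf{K}\mathbf{t} + \frac{\mathbf{t}^T\mathbf{t}}{2} + \frac{\lambda}{2}\mathbf{a}^T\mathbf{K}\mathbf{a}$$

  • Solution: set gradient to 0

$$\nabla J(\mathbf{a}) = \mathbf{K}\mathbf{K}\mathbf{a} - \mathbf{K}\mathbf{t} + \lambda\mathbf{K}\mathbf{a} = 0 \;\Rightarrow\; \mathbf{K}(\mathbf{K} + \lambda\mathbf{I})\mathbf{a} = \mathbf{K}\mathbf{t} \;\Rightarrow\; \mathbf{a} = (\mathbf{K} + \lambda\mathbf{I})^{-1}\mathbf{t}$$

  • Prediction:

$$y_* = \boldsymbol{\phi}(\mathbf{x}_*)^T\mathbf{w} = \boldsymbol{\phi}(\mathbf{x}_*)^T\boldsymbol{\Phi}\mathbf{a} = k(\mathbf{x}_*, \mathbf{X})(\mathbf{K} + \lambda\mathbf{I})^{-1}\mathbf{t}$$

where $(\mathbf{X}, \mathbf{t})$ is the training set and $(\mathbf{x}_*, y_*)$ is a test instance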
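The dual solution $\mathbf{a} = (\mathbf{K} + \lambda\mathbf{I})^{-1}\mathbf{t}$ and the prediction $y_* = k(\mathbf{x}_*, \mathbf{X})\mathbf{a}$ can be sketched in plain Python with no external libraries. This is a minimal sketch under stated assumptions: the Gaussian kernel is an arbitrary choice of $k$, the helper names `fit_dual`, `predict_dual` and `solve` are hypothetical, and the toy data are made up. Note that `predict_dual` only ever calls the kernel, never a feature map.

```python
import math


def gaussian_kernel(x, z, sigma=1.0):
    """Assumed kernel choice: k(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        y[r] = (M[r][n] - sum(M[r][c] * y[c] for c in range(r + 1, n))) / M[r][r]
    return y


def fit_dual(X, t, kernel, lam):
    """Dual coefficients a = (K + lam I)^{-1} t, where K[n][m] = k(x_n, x_m)."""
    N = len(X)
    K_reg = [[kernel(X[n], X[m]) + (lam if n == m else 0.0) for m in range(N)]
             for n in range(N)]
    return solve(K_reg, t)


def predict_dual(x_star, X, a, kernel):
    """y_* = k(x_*, X) a: a weighted sum of kernel evaluations."""
    return sum(kernel(x_star, x_n) * a_n for x_n, a_n in zip(X, a))


# Toy 1-D training set (made-up values).
X = [[0.0], [1.0], [2.0], [3.0]]
t = [0.0, 0.8, 0.9, 0.1]
a = fit_dual(X, t, gaussian_kernel, lam=0.1)
print(predict_dual([1.5], X, a, gaussian_kernel))
```

Building and solving the $N \times N$ system dominates the cost ($O(N^3)$ with elimination); in practice a library routine such as NumPy's `numpy.linalg.solve` would replace the hand-written `solve`.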

SLIDE 8

Dual Linear Regression

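  • Prediction: $y_* = \boldsymbol{\phi}(\mathbf{x}_*)^T\boldsymbol{\Phi}\mathbf{a} = k(\mathbf{x}_*, \mathbf{X})(\mathbf{K} + \lambda\mathbf{I})^{-1}\mathbf{t}$
  • Linear regression where we find the dual solution $\mathbf{a}$ instead of the primal solution $\mathbf{w}$.
  • Complexity:
    – Primal solution: depends on the number of basis functions $M$ (solve an $M \times M$ linear system)
    – Dual solution: depends on the amount of data $N$ (solve an $N \times N$ linear system)
  • Advantage: can use a very large number of basis functions
  • Just need to know the kernel $k$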
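To see that the primal and dual routes give the same predictor, here is a sketch with a hypothetical explicit feature map $\boldsymbol{\phi}(x) = (1, x, x^2)^T$ for scalar inputs ($M = 3$) and made-up data ($N = 5$). Following the convention that $\boldsymbol{\Phi}$ has columns $\boldsymbol{\phi}(\mathbf{x}_n)$, setting the gradient of the primal objective to zero in matrix form gives the $M \times M$ system $(\boldsymbol{\Phi}\boldsymbol{\Phi}^T + \lambda\mathbf{I})\mathbf{w} = \boldsymbol{\Phi}\mathbf{t}$, while the dual solves the $N \times N$ system $(\mathbf{K} + \lambda\mathbf{I})\mathbf{a} = \mathbf{t}$. All function names are illustrative.

```python
def phi(x):
    """Hypothetical feature map: M = 3 polynomial basis functions of a scalar x."""
    return [1.0, x, x * x]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def solve(A, b):
    """Solve A y = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (M[r][n] - sum(M[r][c] * y[c] for c in range(r + 1, n))) / M[r][r]
    return y


def primal_fit(xs, t, lam):
    """w = (Phi Phi^T + lam I)^{-1} Phi t: an M x M system."""
    feats = [phi(x) for x in xs]  # feats[n] is column n of Phi
    M = len(feats[0])
    A = [[sum(f[i] * f[j] for f in feats) + (lam if i == j else 0.0)
          for j in range(M)] for i in range(M)]
    b = [sum(f[i] * tn for f, tn in zip(feats, t)) for i in range(M)]
    return solve(A, b)


def dual_fit(xs, t, lam):
    """a = (K + lam I)^{-1} t with K[n][m] = phi(x_n)^T phi(x_m): an N x N system."""
    N = len(xs)
    A = [[dot(phi(xs[n]), phi(xs[m])) + (lam if n == m else 0.0)
          for m in range(N)] for n in range(N)]
    return solve(A, t)


xs = [-1.0, 0.0, 0.5, 1.0, 2.0]  # N = 5 made-up inputs
t = [1.2, 0.1, 0.3, 0.9, 4.1]    # made-up targets
lam = 0.5
w = primal_fit(xs, t, lam)
a = dual_fit(xs, t, lam)
x_star = 1.5
y_primal = dot(phi(x_star), w)  # phi(x_*)^T w
y_dual = sum(a_n * dot(phi(x_star), phi(x_n)) for a_n, x_n in zip(a, xs))  # k(x_*, X) a
print(y_primal, y_dual)
```

Here $M < N$, so the primal system is the smaller one; the dual pays off when $M$ is much larger than $N$ (or infinite, as for the Gaussian kernel).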

SLIDE 9

Constructing Kernels

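  • Two possibilities:
    – Find mapping $\boldsymbol{\phi}$ to feature space and let $\mathbf{K} = \boldsymbol{\Phi}^T\boldsymbol{\Phi}$
    – Directly specify $\mathbf{K}$
  • Can any function that takes two arguments serve as a kernel?
  • No, a valid kernel must be positive semi-definite: for every finite set of inputs, the Gram matrix $\mathbf{K}$ with entries $K_{nm} = k(\mathbf{x}_n, \mathbf{x}_m)$ must be symmetric positive semi-definite
    – In other words, $\mathbf{K}$ must factor into the product of a transposed matrix by itself (e.g., $\mathbf{K} = \boldsymbol{\Phi}^T\boldsymbol{\Phi}$)
    – Or, all eigenvalues of $\mathbf{K}$ must be greater than or equal to 0.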
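The eigenvalue condition suggests a quick numerical sanity check: build the Gram matrix of a candidate kernel on a few inputs and look at its smallest eigenvalue. This sketch uses a hand-written cyclic Jacobi eigenvalue routine (a standard method for symmetric matrices) so it runs without external libraries; the names `jacobi_eigenvalues`, `gram` and `looks_psd` are hypothetical. A negative eigenvalue on any sample proves a function is *not* a valid kernel; passing on one sample does not prove that it is.

```python
import math


def jacobi_eigenvalues(A, tol=1e-20, max_sweeps=50):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(A)
    A = [list(row) for row in A]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p] = c * akp - s * akq
                    A[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k] = c * apk - s * aqk
                    A[q][k] = s * apk + c * aqk
    return [A[i][i] for i in range(n)]


def gram(kernel, xs):
    """Gram matrix K[n][m] = kernel(x_n, x_m)."""
    return [[kernel(x, z) for z in xs] for x in xs]


def looks_psd(K, tol=1e-9):
    """True if every eigenvalue of the symmetric matrix K is >= -tol."""
    return min(jacobi_eigenvalues(K)) >= -tol


def linear(x, z):
    return sum(a * b for a, b in zip(x, z))


def squared_distance(x, z):
    """Symmetric and non-negative, but NOT positive semi-definite."""
    return sum((a - b) ** 2 for a, b in zip(x, z))


xs = [[0.0], [1.0], [3.0]]
print(looks_psd(gram(linear, xs)))            # True
print(looks_psd(gram(squared_distance, xs)))  # False: zero trace, nonzero matrix
```

The squared distance fails for a simple reason: its Gram matrix has a zero diagonal, so its eigenvalues sum to 0, and any nonzero such matrix must have a negative eigenvalue.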

SLIDE 10

Example

  • Let $k(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^T\mathbf{z})^2$
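For 2-D inputs, expanding the square exposes the implicit feature space (this is the worked example in [B] Sec. 6.2):

$$k(\mathbf{x}, \mathbf{z}) = (x_1 z_1 + x_2 z_2)^2 = x_1^2 z_1^2 + 2 x_1 z_1 x_2 z_2 + x_2^2 z_2^2 = \boldsymbol{\phi}(\mathbf{x})^T\boldsymbol{\phi}(\mathbf{z}), \qquad \boldsymbol{\phi}(\mathbf{x}) = (x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2)^T$$

The sketch below checks this numerically; the inputs are assumed to be 2-D and the function names are illustrative.

```python
import math


def k(x, z):
    """k(x, z) = (x^T z)^2, evaluated directly in the 2-D input space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def phi(x):
    """Explicit 3-D feature map with phi(x)^T phi(z) = (x^T z)^2."""
    return [x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


x, z = (1.0, 2.0), (3.0, -1.0)
print(k(x, z))              # x^T z = 1, so this prints 1.0
print(dot(phi(x), phi(z)))  # same value up to floating-point rounding
```

Evaluating `k` never builds the 3-D feature vector; for degree-$M$ polynomial kernels in $d$ dimensions this saving grows quickly.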

SLIDE 11

Constructing Kernels

  • Can we construct $k$ directly without knowing $\boldsymbol{\phi}$?
  • Yes, any positive semi-definite $k$ is fine since there is a corresponding implicit feature space. But positive semi-definiteness is not always easy to verify.
  • Alternatively, construct kernels from other kernels using rules that preserve positive semi-definiteness

SLIDE 12

Rules to construct Kernels

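  • Let $k_1(\mathbf{x}, \mathbf{x}')$ and $k_2(\mathbf{x}, \mathbf{x}')$ be valid kernels
  • The following kernels are also valid:

1. $k(\mathbf{x}, \mathbf{x}') = c\,k_1(\mathbf{x}, \mathbf{x}')$ for all $c > 0$
2. $k(\mathbf{x}, \mathbf{x}') = f(\mathbf{x})\,k_1(\mathbf{x}, \mathbf{x}')\,f(\mathbf{x}')$ for all functions $f$
3. $k(\mathbf{x}, \mathbf{x}') = q(k_1(\mathbf{x}, \mathbf{x}'))$ where $q$ is a polynomial with coefficients $\ge 0$
4. $k(\mathbf{x}, \mathbf{x}') = \exp(k_1(\mathbf{x}, \mathbf{x}'))$
5. $k(\mathbf{x}, \mathbf{x}') = k_1(\mathbf{x}, \mathbf{x}') + k_2(\mathbf{x}, \mathbf{x}')$
6. $k(\mathbf{x}, \mathbf{x}') = k_1(\mathbf{x}, \mathbf{x}')\,k_2(\mathbf{x}, \mathbf{x}')$
7. $k(\mathbf{x}, \mathbf{x}') = k_3(\boldsymbol{\phi}(\mathbf{x}), \boldsymbol{\phi}(\mathbf{x}'))$ where $k_3$ is a valid kernel on the range of $\boldsymbol{\phi}$
8. $k(\mathbf{x}, \mathbf{x}') = \mathbf{x}^T\mathbf{A}\mathbf{x}'$ where $\mathbf{A}$ is symmetric positive semi-definite
9. $k(\mathbf{x}, \mathbf{x}') = k_a(\mathbf{x}_a, \mathbf{x}_a') + k_b(\mathbf{x}_b, \mathbf{x}_b')$
10. $k(\mathbf{x}, \mathbf{x}') = k_a(\mathbf{x}_a, \mathbf{x}_a')\,k_b(\mathbf{x}_b, \mathbf{x}_b')$

where $\mathbf{x} = \begin{pmatrix}\mathbf{x}_a \\ \mathbf{x}_b\end{pmatrix}$ and $k_a$, $k_b$ are valid kernels over their respective spaces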
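Because the rules take kernels and return kernels, they can be sketched as higher-order functions. A minimal sketch, assuming kernels are plain Python functions of two input sequences; only rules 1, 2, 4, 5 and 6 are shown and all names are hypothetical. As a usage example, $q(u) = u^2 + 2u + 1$ has non-negative coefficients, so $(\mathbf{x}^T\mathbf{x}' + 1)^2$ is valid by rule 3; here it is assembled term by term from rules 1, 5 and 6, the linear kernel and the constant kernel $1$ (the dot product of the one-dimensional feature $\phi(\mathbf{x}) = 1$).

```python
import math


def linear(x, z):
    """Base kernel x^T z (rule 8 with A = I)."""
    return sum(a * b for a, b in zip(x, z))


def constant_one(x, z):
    """k = 1: dot product of the 1-D feature phi(x) = 1."""
    return 1.0


def scale(c, k1):
    """Rule 1: c * k1 for c > 0."""
    if c <= 0:
        raise ValueError("rule 1 needs c > 0")
    return lambda x, z: c * k1(x, z)


def sandwich(f, k1):
    """Rule 2: f(x) * k1(x, z) * f(z) for any real-valued f."""
    return lambda x, z: f(x) * k1(x, z) * f(z)


def exponentiate(k1):
    """Rule 4: exp(k1)."""
    return lambda x, z: math.exp(k1(x, z))


def add(k1, k2):
    """Rule 5: k1 + k2."""
    return lambda x, z: k1(x, z) + k2(x, z)


def multiply(k1, k2):
    """Rule 6: k1 * k2."""
    return lambda x, z: k1(x, z) * k2(x, z)


# (x^T z + 1)^2 = (x^T z)^2 + 2 x^T z + 1, built only from valid pieces.
poly2 = add(add(multiply(linear, linear), scale(2.0, linear)), constant_one)

x, z = [1.0, 2.0], [0.5, -1.0]
print(poly2(x, z), (linear(x, z) + 1.0) ** 2)  # 0.25 0.25
```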

SLIDE 13

Common Kernels

  • Polynomial kernel: $k(\mathbf{x}, \mathbf{x}') = (\mathbf{x}^T\mathbf{x}')^M$
    – $M$ is the degree
    – Feature space: all degree-$M$ products of entries in $\mathbf{x}$
    – Example: let $\mathbf{x}$ and $\mathbf{x}'$ be two images; then the feature space could be all products of $M$ pixel intensities
  • More general polynomial kernel:

$$k(\mathbf{x}, \mathbf{x}') = (\mathbf{x}^T\mathbf{x}' + c)^M \quad \text{with } c > 0$$

    – Feature space: all products of up to $M$ entries in $\mathbf{x}$
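The feature-space claim for $(\mathbf{x}^T\mathbf{x}' + c)^M$ can be checked numerically. One construction (an assumption of this sketch, based on the multinomial theorem) appends $\sqrt{c}$ to $\mathbf{x}$, so that $\mathbf{x}^T\mathbf{x}' + c$ becomes a plain dot product in $d + 1$ dimensions, then lists every monomial of total degree $M$ scaled by the square root of its multinomial coefficient. A monomial that uses $\sqrt{c}$ exactly $j$ times is a product of $M - j$ entries of $\mathbf{x}$, which is why the features cover all products of *up to* $M$ entries. Function names are hypothetical.

```python
import itertools
import math


def poly_kernel(x, z, c, M):
    """k(x, z) = (x^T z + c)^M, computed in O(d) time."""
    return (sum(a * b for a, b in zip(x, z)) + c) ** M


def poly_features(x, c, M):
    """Explicit feature vector whose dot products reproduce poly_kernel (c >= 0)."""
    xt = list(x) + [math.sqrt(c)]  # augmented input: x^T z + c = xt^T zt
    feats = []
    # One feature per multiset of M indices, i.e. per monomial of degree M in xt.
    for combo in itertools.combinations_with_replacement(range(len(xt)), M):
        coeff = math.factorial(M)
        for i in set(combo):
            coeff //= math.factorial(combo.count(i))  # multinomial coefficient
        value = 1.0
        for i in combo:
            value *= xt[i]
        feats.append(math.sqrt(coeff) * value)
    return feats


x, z = [0.5, -1.0, 2.0], [1.5, 0.25, -0.5]
fx, fz = poly_features(x, 1.0, 3), poly_features(z, 1.0, 3)
print(len(fx))  # 20 features for d = 3, M = 3
print(poly_kernel(x, z, 1.0, 3), sum(a * b for a, b in zip(fx, fz)))
```

The explicit vector has $\binom{d + M}{M}$ entries, while `poly_kernel` needs only one $d$-dimensional dot product; for a $16 \times 16$ image and $M = 3$ the gap is already large.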

SLIDE 14

Common Kernels

  • Gaussian kernel: $k(\mathbf{x}, \mathbf{x}') = \exp\left(-\frac{\lVert\mathbf{x} - \mathbf{x}'\rVert^2}{2\sigma^2}\right)$
  • Valid kernel because:
  • Implicit feature space is infinite!
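One standard validity argument (the one given in [B] Sec. 6.2) expands the squared distance, $\lVert\mathbf{x} - \mathbf{x}'\rVert^2 = \mathbf{x}^T\mathbf{x} - 2\mathbf{x}^T\mathbf{x}' + \mathbf{x}'^T\mathbf{x}'$, so that

$$k(\mathbf{x}, \mathbf{x}') = f(\mathbf{x})\,\exp\!\left(\frac{\mathbf{x}^T\mathbf{x}'}{\sigma^2}\right) f(\mathbf{x}'), \qquad f(\mathbf{x}) = \exp\!\left(-\frac{\mathbf{x}^T\mathbf{x}}{2\sigma^2}\right)$$

that is, the linear kernel scaled by $1/\sigma^2$ (rule 1), exponentiated (rule 4) and multiplied by $f$ on both sides (rule 2). Expanding $\exp(\mathbf{x}^T\mathbf{x}'/\sigma^2)$ as a power series with positive coefficients gives polynomial kernels of every degree, which is one way to see why the implicit feature space is infinite. The sketch checks the factorization numerically; function names and the values of sigma are assumptions.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gaussian_kernel(x, z, sigma):
    """k(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def gaussian_from_rules(x, z, sigma):
    """Same kernel assembled as f(x) * exp(x^T z / sigma^2) * f(z)."""
    def f(u):
        return math.exp(-dot(u, u) / (2.0 * sigma ** 2))
    return f(x) * math.exp(dot(x, z) / sigma ** 2) * f(z)


x, z = [1.0, -0.5], [0.25, 2.0]
print(gaussian_kernel(x, z, sigma=1.5), gaussian_from_rules(x, z, sigma=1.5))
```

The factored form serves the argument only: for inputs with large norms its individual factors overflow or underflow, so the direct formula is the one to compute with.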

SLIDE 15

Non-vectorial Kernels

  • Kernels can be defined with respect to things other than vectors, such as sets, strings or graphs
  • Example for strings: $k(d_1, d_2)$ = similarity between two documents (weighted sum over all non-contiguous substrings, i.e., subsequences, that appear in both documents $d_1$ and $d_2$).
  • Lodhi, Saunders, Shawe-Taylor, Cristianini, Watkins, Text Classification Using String Kernels, JMLR, pp. 419-444, 2002.
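The kernel of Lodhi et al. sums over common subsequences of a fixed length, with a decay factor that also penalizes gaps; its recursion is more involved than fits here. As a simpler relative (a swapped-in technique, not the one from the paper), the all-subsequences kernel below counts every pair of matching, possibly non-contiguous, subsequences of two strings, weighting a match of length $\ell$ by $\lambda^{2\ell}$. It is a valid kernel because it is the dot product of feature vectors indexed by subsequences $u$, with entries $\lambda^{|u|}$ times the number of occurrences of $u$. The function names and the `lam` parameter are hypothetical.

```python
def subsequence_kernel(s, t, lam=1.0):
    """Weighted count of pairs of matching (possibly non-contiguous) subsequences.

    Each matched character contributes a factor lam**2 (lam from each string);
    the empty subsequence contributes 1.
    """
    n, m = len(s), len(t)
    # K[i][j] = kernel between the prefixes s[:i] and t[:j].
    K = [[1.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        a = s[i - 1]
        for j in range(1, m + 1):
            # Pairs that do not use s[i-1], plus pairs whose last character is
            # s[i-1] matched with an equal character t[k-1] for some k <= j.
            ending_here = sum(K[i - 1][k - 1] for k in range(1, j + 1) if t[k - 1] == a)
            K[i][j] = K[i - 1][j] + lam * lam * ending_here
    return K[n][m]


def normalized(s, t, lam=1.0):
    """k(s, t) / sqrt(k(s, s) k(t, t)), a similarity in (0, 1]."""
    return subsequence_kernel(s, t, lam) / (
        subsequence_kernel(s, s, lam) * subsequence_kernel(t, t, lam)) ** 0.5


print(subsequence_kernel("ab", "ab"))  # "", "a", "b", "ab" match: 4.0
print(normalized("kernel", "kennel", lam=0.5))
```

The normalization is itself rule 2 with $f(s) = 1/\sqrt{k(s, s)}$, which keeps long documents from dominating; the dynamic program above costs $O(|s|\,|t|^2)$ time.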
