Kernels: CS678 Advanced Topics in Machine Learning, Thorsten Joachims


SLIDE 1

Kernels

CS678 Advanced Topics in Machine Learning
Thorsten Joachims
Spring 2003

Outline:

  • A representation of the hyperplane in terms of the training examples.
  • How to transform a linear learner into a non-linear learner!
  • How can kernels make high-dimensional spaces tractable?
  • How can kernels make non-vectorial data tractable?

Non-Linear Problems

Problem:

  • some tasks have non-linear structure
  • no hyperplane is sufficiently accurate

==> How can SVMs learn non-linear classification rules?

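Extending the Hypothesis Space

Idea: Map the input space into a feature space with a transformation Φ.
==> Find hyperplane in feature space!

Example: Φ maps the input space with attributes (a, b, c) to the feature space with attributes (a, b, c, aa, ab, ac, bb, bc, cc).
==> The separating hyperplane in feature space is a degree-two polynomial in input space.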

Example

Input Space: $x = (x_1, x_2)$ (2 attributes)

Feature Space: $\Phi(x) = (x_1^2,\ x_2^2,\ \sqrt{2}\,x_1,\ \sqrt{2}\,x_2,\ \sqrt{2}\,x_1 x_2,\ 1)$ (6 attributes)
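The example map can be sketched in a few lines of Python. This is only an illustration: the function names `phi` and `decision` and the weight vector `w` are ours, and the coefficients are assumed to be √2, the choice under which Φ(a)·Φ(b) equals the degree-two polynomial kernel [a·b + 1]².

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-attribute input (x1, x2)."""
    x1, x2 = x
    r2 = np.sqrt(2.0)
    return np.array([x1**2, x2**2, r2 * x1, r2 * x2, r2 * x1 * x2, 1.0])

# Illustrative weights (an assumption, not learned): w . phi(x) = x1^2 + x2^2 - 1,
# so the hyperplane in feature space is the unit circle in input space.
w = np.array([1.0, 1.0, 0.0, 0.0, 0.0, -1.0])

def decision(x, b=0.0):
    """Sign of the linear function w . phi(x) + b evaluated in feature space."""
    return np.sign(w @ phi(x) + b)
```

With these weights, `decision` separates points inside the unit circle from points outside it, which no hyperplane in the original two-attribute input space can do.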

SLIDE 2

Kernels

Problem: Very many parameters! Polynomials of degree p over N attributes in input space lead to $O(N^p)$ attributes in feature space!

Solution: [Boser et al.] The dual optimization problem (OP) depends only on inner products => Kernel Functions

$$K(a, b) = \Phi(a) \cdot \Phi(b)$$

Example: For $\Phi(x) = (x_1^2,\ x_2^2,\ \sqrt{2}\,x_1,\ \sqrt{2}\,x_2,\ \sqrt{2}\,x_1 x_2,\ 1)$, calculating

$$K(a, b) = [a \cdot b + 1]^2 = \Phi(a) \cdot \Phi(b)$$

gives the inner product in feature space.

We do not need to represent the feature space explicitly!
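A quick numerical check makes the point concrete. The sketch below assumes the √2 coefficients in Φ (the choice that makes the identity hold); the names `phi`, `poly2_kernel` and `n_monomials` are ours. It also counts monomials of degree at most p over N attributes, which is C(N + p, p) and grows like N^p for fixed p.

```python
import numpy as np
from math import comb

def phi(x):
    """Explicit 6-dimensional feature map for a 2-attribute input."""
    x1, x2 = x
    r2 = np.sqrt(2.0)
    return np.array([x1**2, x2**2, r2 * x1, r2 * x2, r2 * x1 * x2, 1.0])

def poly2_kernel(a, b):
    """Degree-2 polynomial kernel, computed entirely in input space."""
    return (np.dot(a, b) + 1.0) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, -1.0])
explicit = phi(a) @ phi(b)      # inner product in feature space
implicit = poly2_kernel(a, b)   # same value without building phi

def n_monomials(N, p):
    """Number of monomials of degree <= p in N variables."""
    return comb(N + p, p)
```

Both `explicit` and `implicit` evaluate to 4 here. For N = 100 attributes and p = 3, `n_monomials` already gives 176851 feature-space attributes, while the kernel still costs one inner product over 100 attributes.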

SVM with Kernels

Training: maximize

$$L(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$

s.t.

$$\sum_{i=1}^{n} \alpha_i y_i = 0 \quad \text{and} \quad 0 \le \alpha_i \le C$$

Classification: For new example x

$$h(x) = \operatorname{sign}\left( \sum_{x_i \in SV} \alpha_i y_i K(x_i, x) + b \right)$$

New hypotheses spaces through new Kernels:

  • Linear: $K(x_i, x_j) = x_i \cdot x_j$
  • Polynomial: $K(x_i, x_j) = [x_i \cdot x_j + 1]^d$
  • Radial Basis Functions: $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$
  • Sigmoid: $K(x_i, x_j) = \tanh(\gamma (x_i \cdot x_j) + c)$
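The four kernels and the kernelized decision rule can be sketched as follows. This is a minimal illustration, not a training algorithm: the function names and default parameter values are ours, the sigmoid kernel is written in its common dot-product form (an assumption), and the multipliers `alpha` and offset `b` are taken as given rather than obtained by solving the dual.

```python
import numpy as np

def linear(xi, xj):
    return np.dot(xi, xj)

def polynomial(xi, xj, d=2):
    return (np.dot(xi, xj) + 1.0) ** d

def rbf(xi, xj, gamma=1.0):
    diff = np.asarray(xi) - np.asarray(xj)
    return np.exp(-gamma * np.dot(diff, diff))

def sigmoid(xi, xj, gamma=1.0, c=0.0):
    # Assumed dot-product form: tanh(gamma * (xi . xj) + c)
    return np.tanh(gamma * np.dot(xi, xj) + c)

def dual_objective(alpha, y, G):
    """L(alpha) = sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j G_ij."""
    ay = alpha * y
    return alpha.sum() - 0.5 * ay @ G @ ay

def classify(x, support_vectors, alpha, y, b, kernel):
    """h(x) = sign(sum over support vectors of alpha_i y_i K(x_i, x) + b)."""
    s = sum(a_i * y_i * kernel(x_i, x)
            for a_i, y_i, x_i in zip(alpha, y, support_vectors))
    return np.sign(s + b)

# Hand-picked toy values (hypothetical, not the result of training)
sv = np.array([[1.0, 0.0], [-1.0, 0.0]])
alpha = np.array([0.5, 0.5])
y = np.array([1.0, -1.0])
G = np.array([[linear(u, v) for v in sv] for u in sv])
```

Swapping `kernel=linear` for `polynomial`, `rbf` or `sigmoid` in `classify` changes the hypothesis space without touching the rest of the code, which is the point of the slide.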

Example: SVM with Polynomial of Degree 2

Kernel: $K(x_i, x_j) = [x_i \cdot x_j + 1]^2$

Example: SVM with RBF-Kernel

Kernel: $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$

SLIDE 3

What is a Valid Kernel?

Definition: Let X be a nonempty set. A function $K(x_i, x_j)$ is a valid kernel in X if for all n and all $x_1, \ldots, x_n \in X$ it produces a Gram matrix

$$G_{ij} = K(x_i, x_j)$$

that is symmetric

$$G = G^T$$

and positive semi-definite

$$\forall \alpha: \quad \alpha^T G \alpha = \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j K(x_i, x_j) \ge 0$$
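The definition suggests a simple numerical test on any finite sample: build the Gram matrix, check symmetry, and check that the smallest eigenvalue is non-negative up to rounding. The sketch below is only a necessary check on one sample, not a proof of validity; the helper names `gram_matrix` and `looks_valid`, the tolerance, and the negative-squared-distance counterexample are ours.

```python
import numpy as np

def gram_matrix(kernel, xs):
    """G_ij = kernel(x_i, x_j) for a list of points xs."""
    n = len(xs)
    return np.array([[kernel(xs[i], xs[j]) for j in range(n)] for i in range(n)])

def looks_valid(G, tol=1e-9):
    """True if G is symmetric and has no eigenvalue below -tol."""
    symmetric = np.allclose(G, G.T)
    # eigvalsh expects a symmetric matrix, so symmetrize before calling it
    eigvals = np.linalg.eigvalsh((G + G.T) / 2.0)
    return bool(symmetric and eigvals.min() >= -tol)

points = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]

rbf = lambda a, b: np.exp(-np.sum((a - b) ** 2))
neg_sq_dist = lambda a, b: -np.sum((a - b) ** 2)  # symmetric, but not a valid kernel

rbf_ok = looks_valid(gram_matrix(rbf, points))
neg_ok = looks_valid(gram_matrix(neg_sq_dist, points))
```

The second function fails because its Gram matrix has a zero diagonal and non-zero off-diagonal entries, so its eigenvalues sum to zero and at least one must be negative.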

How to Construct Valid Kernels?

Theorem: Let $K_1$ and $K_2$ be valid Kernels over $X \times X$, $X \subseteq \Re^N$, $a \ge 0$, $0 \le \lambda \le 1$, $f$ a real-valued function on $X$, $\phi: X \to \Re^m$ with $K_3$ a kernel over $\Re^m \times \Re^m$, and $\mathbf{K}$ a symmetric positive semi-definite matrix. Then the following functions are valid Kernels:

  • $K(x, z) = \lambda K_1(x, z) + (1 - \lambda) K_2(x, z)$
  • $K(x, z) = a K_1(x, z)$
  • $K(x, z) = K_1(x, z) K_2(x, z)$
  • $K(x, z) = f(x) f(z)$
  • $K(x, z) = K_3(\phi(x), \phi(z))$
  • $K(x, z) = x^T \mathbf{K} z$

=> Construct complex Kernels from simple Kernels.
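The closure rules can be spot-checked numerically: combine two known kernels by each rule and confirm that the resulting Gram matrix on a random sample has no significantly negative eigenvalue. This is a sanity check under assumptions of our own choosing (the base kernels `k1` and `k2`, the function `f`, the map `tanh`, the matrix `B` and the sample are all illustrative), not a proof of the theorem.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))          # 8 sample points in R^3

def gram(kernel):
    return np.array([[kernel(x, z) for z in X] for x in X])

def min_eig(G):
    return np.linalg.eigvalsh((G + G.T) / 2.0).min()

k1 = lambda x, z: x @ z                                # linear kernel
k2 = lambda x, z: np.exp(-0.5 * np.sum((x - z) ** 2))  # RBF kernel
f = lambda x: np.sin(x[0])                             # any real-valued function
A = rng.normal(size=(3, 3))
B = A.T @ A                                            # symmetric positive semi-definite

combos = {
    "lam*K1 + (1-lam)*K2": lambda x, z: 0.3 * k1(x, z) + 0.7 * k2(x, z),
    "a*K1":                lambda x, z: 2.5 * k1(x, z),
    "K1*K2":               lambda x, z: k1(x, z) * k2(x, z),
    "f(x)*f(z)":           lambda x, z: f(x) * f(z),
    "K3(phi(x), phi(z))":  lambda x, z: k2(np.tanh(x), np.tanh(z)),
    "x^T B z":             lambda x, z: x @ B @ z,
}

# Smallest Gram-matrix eigenvalue for each construction
min_eigs = {name: min_eig(gram(k)) for name, k in combos.items()}
```

Every entry of `min_eigs` should be non-negative up to floating-point rounding; low-rank constructions such as `f(x)*f(z)` typically show tiny values near zero of either sign.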

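Kernels for Discrete and Structured Representations

Kernels for Sequences: Two sequences are similar if they have many common and consecutive subsequences.

Example [Lodhi et al., 2000]: For $0 \le \lambda \le 1$ consider the following feature space

$$
\begin{array}{l|cccccccc}
 & \text{c-a} & \text{c-t} & \text{a-t} & \text{b-a} & \text{b-t} & \text{c-r} & \text{a-r} & \text{b-r} \\
\hline
\phi(\text{cat}) & \lambda^2 & \lambda^3 & \lambda^2 & 0 & 0 & 0 & 0 & 0 \\
\phi(\text{car}) & \lambda^2 & 0 & 0 & 0 & 0 & \lambda^3 & \lambda^2 & 0 \\
\phi(\text{bat}) & 0 & 0 & \lambda^2 & \lambda^2 & \lambda^3 & 0 & 0 & 0 \\
\phi(\text{bar}) & 0 & 0 & 0 & \lambda^2 & 0 & 0 & \lambda^2 & \lambda^3
\end{array}
$$

=> $K(\text{car}, \text{cat}) = \lambda^4$, efficient computation via dynamic programming.

=> Fisher Kernels [Jaakkola & Haussler, 1998]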
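The feature space in this example can be built explicitly for short strings. The sketch below enumerates every sorted index sequence of length n, weights the resulting subsequence by λ raised to the span it covers in the string, and takes the inner product of two such feature maps. The function names are ours, and the enumeration is exponential in n, so it is only meant to illustrate the feature space, not to replace the dynamic program.

```python
from itertools import combinations
from collections import defaultdict

def subsequence_features(s, n, lam):
    """Map s to {u: sum of lam**span over index sequences that spell u}."""
    feats = defaultdict(float)
    for idx in combinations(range(len(s)), n):   # sorted index sequences
        u = "".join(s[k] for k in idx)
        span = idx[-1] - idx[0] + 1              # length of the covered stretch
        feats[u] += lam ** span
    return dict(feats)

def string_kernel_naive(s, t, n, lam):
    """Inner product of the two explicit feature maps."""
    fs = subsequence_features(s, n, lam)
    ft = subsequence_features(t, n, lam)
    return sum(v * ft[u] for u, v in fs.items() if u in ft)

lam = 0.5
cat_features = subsequence_features("cat", 2, lam)
k_car_cat = string_kernel_naive("car", "cat", 2, lam)
```

For "car" and "cat" the only shared feature is c-a, with weight λ² in both strings, so `k_car_cat` equals λ⁴ (0.0625 for λ = 0.5).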

Computing String Kernel (I)

Definitions:

  • $\Sigma^n$: sequences of length n over alphabet $\Sigma$
  • $i = (i_1, \ldots, i_n)$: index sequence (sorted)
  • $s(i)$: substring operator
  • $r(i) = i_n - i_1 + 1$: range of index sequence

Kernel: Average range of common subsequences of length n

$$K_n(s, t) = \sum_{u \in \Sigma^n} \ \sum_{i:\, u = s(i)} \ \sum_{j:\, u = t(j)} \lambda^{\,i_n + j_n - i_1 - j_1 + 2}$$

Auxiliary Function: Average range to end of sequence of common subsequences of length d

$$K'_d(s, t) = \sum_{u \in \Sigma^d} \ \sum_{i:\, u = s(i)} \ \sum_{j:\, u = t(j)} \lambda^{\,|s| + |t| - i_1 - j_1 + 2}$$
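As a worked instance of the kernel definition, take s = car, t = cat and n = 2. The only subsequence of length 2 that occurs in both strings is u = ca, produced by the index sequences i = (1, 2) in s and j = (1, 2) in t, so the triple sum has a single term whose exponent is the sum of the two ranges:

$$K_2(\text{car}, \text{cat}) = \lambda^{\,i_2 + j_2 - i_1 - j_1 + 2} = \lambda^{\,2 + 2 - 1 - 1 + 2} = \lambda^4$$

This agrees with the value of K(car, cat) given in the sequence-kernel example.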
SLIDE 4

Computing String Kernel (II)

Kernel:

$$K_n(s, t) = 0 \quad \text{if } \min(|s|, |t|) < n$$

$$K_n(sx, t) = K_n(s, t) + \sum_{j:\, t_j = x} K'_{n-1}(s, t[1 \ldots j-1])\, \lambda^2$$

Auxiliary:

$$K'_0(s, t) = 1$$

$$K'_d(s, t) = 0 \quad \text{if } \min(|s|, |t|) < d$$

$$K'_d(sx, t) = \lambda K'_d(s, t) + \sum_{j:\, t_j = x} K'_{d-1}(s, t[1 \ldots j-1])\, \lambda^{\,|t| - j + 2}$$
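The two recursions translate almost line by line into a memoized Python function. This is a sketch under our own conventions: the names `string_kernel`, `k` and `k_aux` are ours, prefixes of s and t are represented by their lengths, and memoization via `functools.lru_cache` stands in for an explicit dynamic-programming table. With memoization there are on the order of n·|s|·|t| states, each doing O(|t|) work.

```python
from functools import lru_cache

def string_kernel(s, t, n, lam):
    """Subsequence kernel K_n(s, t) computed via the K / K' recursions."""

    @lru_cache(maxsize=None)
    def k_aux(d, ls, lt):
        """K'_d evaluated on the prefixes s[:ls] and t[:lt]."""
        if d == 0:
            return 1.0
        if min(ls, lt) < d:
            return 0.0
        x = s[ls - 1]                        # last character of the s-prefix
        total = lam * k_aux(d, ls - 1, lt)
        for j in range(1, lt + 1):           # 1-based position in t
            if t[j - 1] == x:
                total += k_aux(d - 1, ls - 1, j - 1) * lam ** (lt - j + 2)
        return total

    @lru_cache(maxsize=None)
    def k(ls, lt):
        """K_n evaluated on the prefixes s[:ls] and t[:lt]."""
        if min(ls, lt) < n:
            return 0.0
        x = s[ls - 1]
        total = k(ls - 1, lt)
        for j in range(1, lt + 1):
            if t[j - 1] == x:
                total += k_aux(n - 1, ls - 1, j - 1) * lam ** 2
        return total

    return k(len(s), len(t))

k_car_cat = string_kernel("car", "cat", 2, 0.5)
```

For λ = 0.5 the call above returns 0.0625, that is λ⁴, matching the car/cat example from the sequence-kernel slide.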