CSE446: Kernels and Kernelized Perceptron Winter 2015 - PowerPoint PPT Presentation

CSE446: Kernels and Kernelized Perceptron
Winter 2015
Luke Zettlemoyer
Slides adapted from Carlos Guestrin

What if the data is not linearly separable?
Use features of features of features of features…
$$\phi(x) = \left(x_1, \ldots, x_n, \; x_1 x_2, \; x_1 x_3, \ldots, \; e^{x_1}, \ldots\right)^\top$$
Feature space can get really large really quickly!

Non-linear features: 1D input
• Datasets that are linearly separable with some noise work out great: [Figure: points along the $x$ axis, with 0 marked]
• But what are we going to do if the dataset is just too hard? [Figure: points along the $x$ axis, with 0 marked]
• How about mapping the data to a higher-dimensional space: [Figure: the data plotted with $x$ on one axis and $x^2$ on the other]

Feature spaces
• General idea: map to a higher-dimensional space
– if $x$ is in $\mathbb{R}^n$, then $\phi(x)$ is in $\mathbb{R}^m$ for $m > n$
– Can now learn feature weights $w$ in $\mathbb{R}^m$ and predict: $y = \mathrm{sign}(w \cdot \phi(x))$
– A linear function in the higher-dimensional space will be non-linear in the original space
[Figure: $x \to \phi(x)$]
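To make the idea concrete, here is a minimal sketch of a 1D dataset that no single threshold on $x$ separates, but that becomes linearly separable after the map $\phi(x) = (1, x, x^2)$. The data points, the constant feature, and the hand-picked weights are assumptions made for illustration; they do not come from the slides.

```python
def phi(x):
    """Map a scalar x to the feature vector (1, x, x^2)."""
    return [1.0, x, x * x]


def sign(z):
    # Convention assumed in this sketch: sign(0) is -1.
    return 1 if z > 0 else -1


def predict(w, x):
    """Linear prediction in feature space: y = sign(w . phi(x))."""
    return sign(sum(wi * fi for wi, fi in zip(w, phi(x))))


# Hypothetical 1D data: positives far from 0, negatives close to 0.
xs = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
ys = [1, 1, -1, -1, -1, 1, 1]

# Hand-picked weights so that w . phi(x) = x^2 - 1.
w = [-1.0, 0.0, 1.0]

predictions = [predict(w, x) for x in xs]
print(predictions == ys)
```

The decision rule is linear in $\phi(x)$ but quadratic in $x$, which is exactly the effect described in the last bullet.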

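Higher order polynomials
[Figure: number of monomial terms (vertical axis) against number of input dimensions (horizontal axis), with one curve each for d = 2, d = 3, and d = 4]
• m: number of input features
• d: degree of polynomial
• The number of terms grows fast! For d = 6 and m = 100 there are about 1.6 billion terms.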
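The 1.6 billion figure can be checked with a short calculation. The sketch below assumes the count refers to distinct monomials of degree exactly $d$ in $m$ variables, which is the binomial coefficient $\binom{m + d - 1}{d}$; the function name `num_monomials` is made up for this example.

```python
from math import comb


def num_monomials(m, d):
    """Number of distinct monomials of degree exactly d in m variables."""
    return comb(m + d - 1, d)


for d in (2, 3, 4, 6):
    print(d, num_monomials(100, d))
```

For d = 6 and m = 100 this gives 1,609,344,100, in line with the "about 1.6 billion" on the slide.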

Efficient dot-product of polynomials
Polynomials of degree exactly $d$
• $d = 1$:
$$\phi(u) \cdot \phi(v) = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \cdot \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = u_1 v_1 + u_2 v_2 = u \cdot v$$
• $d = 2$:
$$\phi(u) \cdot \phi(v) = \begin{pmatrix} u_1^2 \\ u_1 u_2 \\ u_2 u_1 \\ u_2^2 \end{pmatrix} \cdot \begin{pmatrix} v_1^2 \\ v_1 v_2 \\ v_2 v_1 \\ v_2^2 \end{pmatrix} = u_1^2 v_1^2 + 2 u_1 v_1 u_2 v_2 + u_2^2 v_2^2 = (u_1 v_1 + u_2 v_2)^2 = (u \cdot v)^2$$
• For any $d$ (we will skip the proof):
$$K(u, v) = \phi(u) \cdot \phi(v) = (u \cdot v)^d$$
• Cool! Taking a dot product and raising it to the power $d$ gives the same result as mapping into the high-dimensional space and then taking the dot product.
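The identity can be checked numerically for small $d$. This is a sketch under one assumption: the feature map lists every ordered product of $d$ coordinates (so $u_1 u_2$ and $u_2 u_1$ both appear, as in the $d = 2$ case above). The vectors `u` and `v` are arbitrary test values.

```python
from itertools import product
from math import isclose, prod


def phi(x, d):
    """All ordered products of d coordinates of x (len(x)**d features)."""
    return [prod(x[i] for i in idx) for idx in product(range(len(x)), repeat=d)]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def poly_kernel(u, v, d):
    """K(u, v) = (u . v)^d, computed without building phi."""
    return dot(u, v) ** d


u = [1.0, 2.0, -1.0]
v = [0.5, -1.0, 3.0]
for d in (1, 2, 3):
    explicit = dot(phi(u, d), phi(v, d))  # cost grows like len(u)**d
    implicit = poly_kernel(u, v, d)       # cost grows like len(u)
    print(d, explicit, implicit, isclose(explicit, implicit))
```

The explicit route touches $m^d$ features, while the kernel needs only one $m$-dimensional dot product and a power.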

The "Kernel Trick"
• A kernel function defines a dot product in some feature space:
$$K(u, v) = \phi(u) \cdot \phi(v)$$
• Example: 2-dimensional vectors $u = [u_1 \; u_2]$ and $v = [v_1 \; v_2]$; let $K(u, v) = (1 + u \cdot v)^2$. Need to show that $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$:
$$\begin{aligned} K(u, v) &= (1 + u \cdot v)^2 \\ &= 1 + u_1^2 v_1^2 + 2 u_1 v_1 u_2 v_2 + u_2^2 v_2^2 + 2 u_1 v_1 + 2 u_2 v_2 \\ &= [1, \; u_1^2, \; \sqrt{2}\, u_1 u_2, \; u_2^2, \; \sqrt{2}\, u_1, \; \sqrt{2}\, u_2] \cdot [1, \; v_1^2, \; \sqrt{2}\, v_1 v_2, \; v_2^2, \; \sqrt{2}\, v_1, \; \sqrt{2}\, v_2] \\ &= \phi(u) \cdot \phi(v) \end{aligned}$$
where $\phi(x) = [1, \; x_1^2, \; \sqrt{2}\, x_1 x_2, \; x_2^2, \; \sqrt{2}\, x_1, \; \sqrt{2}\, x_2]$
• Thus, a kernel function implicitly maps data to a high-dimensional space (without the need to compute each $\phi(x)$ explicitly).
• But it isn't obvious yet how we will incorporate it into actual learning algorithms…
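The expansion can be spot-checked numerically. The sketch below builds the six-dimensional $\phi$ from the example and compares it against $(1 + u \cdot v)^2$; the test vectors are arbitrary values chosen for illustration.

```python
from math import isclose, sqrt


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def phi(x):
    """Explicit feature map for K(u, v) = (1 + u . v)^2 on 2D inputs."""
    x1, x2 = x
    r2 = sqrt(2.0)
    return [1.0, x1 * x1, r2 * x1 * x2, x2 * x2, r2 * x1, r2 * x2]


def kernel(u, v):
    return (1.0 + dot(u, v)) ** 2


u = [3.0, -1.0]
v = [2.0, 4.0]
explicit = dot(phi(u), phi(v))
implicit = kernel(u, v)
print(explicit, implicit, isclose(explicit, implicit))
```

Both values come out at 9 up to floating-point rounding, since $u \cdot v = 2$ here. The $\sqrt{2}$ factors are what make the cross terms appear with coefficient 2.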

"Kernel trick" for the Perceptron!
• Never compute features explicitly!!!
– Compute dot products in closed form: $K(u, v) = \phi(u) \cdot \phi(v)$

Standard Perceptron:
• set $w_i = 0$ for each feature $i$
• For $t = 1..T$, $i = 1..n$:
– $y = \mathrm{sign}(w \cdot \phi(x_i))$
– if $y \neq y_i$: $w = w + y_i \phi(x_i)$

Kernelized Perceptron:
• set $a_i = 0$ for each example $i$
• For $t = 1..T$, $i = 1..n$:
– $y = \mathrm{sign}\left(\left(\sum_k a_k \phi(x_k)\right) \cdot \phi(x_i)\right) = \mathrm{sign}\left(\sum_k a_k K(x_k, x_i)\right)$
– if $y \neq y_i$: $a_i \mathrel{+}= y_i$

• At all times during learning: $w = \sum_k a_k \phi(x_k)$
• Exactly the same computations, but we can use $K(u, v)$ to avoid enumerating the features!!!
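The two algorithms can be run side by side to confirm the invariant $w = \sum_k a_k \phi(x_k)$. This is a minimal sketch, assuming the degree-2 kernel $K(u, v) = (u \cdot v)^2$ with its explicit four-feature map, the convention sign(0) = -1, and a small hypothetical dataset that is not from the slides.

```python
def sign(z):
    # Convention assumed here: sign(0) is -1.
    return 1 if z > 0 else -1


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def phi(x):
    """Explicit features whose dot product equals (u . v)^2 for 2D inputs."""
    x1, x2 = x
    return [x1 * x1, x1 * x2, x2 * x1, x2 * x2]


def K(u, v):
    return dot(u, v) ** 2


def standard_perceptron(X, Y, T):
    w = [0] * len(phi(X[0]))
    for _ in range(T):
        for x, y_true in zip(X, Y):
            if sign(dot(w, phi(x))) != y_true:
                w = [wi + y_true * fi for wi, fi in zip(w, phi(x))]
    return w


def kernel_perceptron(X, Y, T):
    a = [0] * len(X)
    for _ in range(T):
        for i, (x, y_true) in enumerate(zip(X, Y)):
            score = sum(a[k] * K(X[k], x) for k in range(len(X)))
            if sign(score) != y_true:
                a[i] += y_true
    return a


# Hypothetical integer data, so both runs are exact.
X = [[2, 1], [1, -1], [-1, -2], [0, 3], [3, 0]]
Y = [1, -1, 1, -1, 1]

w = standard_perceptron(X, Y, T=5)
a = kernel_perceptron(X, Y, T=5)

# Rebuild w from the dual coefficients: w = sum_k a_k * phi(x_k).
w_from_a = [sum(a[k] * phi(X[k])[j] for k in range(len(X))) for j in range(len(w))]
print(w, w_from_a, w == w_from_a)
```

The kernelized version stores one coefficient per training example instead of one weight per feature, which pays off when the feature space is much larger than the training set.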

• set $a_i = 0$ for each example $i$
• For $t = 1..T$, $i = 1..n$:
– $y = \mathrm{sign}\left(\sum_k a_k K(x_k, x_i)\right)$
– if $y \neq y_i$: $a_i \mathrel{+}= y_i$

Training data (one example per row; the columns are the two input coordinates and the label):
x_1   x_2   y
 1     1     1
-1     1    -1
-1    -1     1
 1    -1    -1

Kernel: $K(u, v) = (u \cdot v)^2$. Kernel matrix over the four examples $x_1, \ldots, x_4$:
$$K = \begin{pmatrix} 4 & 0 & 4 & 0 \\ 0 & 4 & 0 & 4 \\ 4 & 0 & 4 & 0 \\ 0 & 4 & 0 & 4 \end{pmatrix}$$
e.g., $K(x_1, x_2) = K([1, 1], [-1, 1]) = (1 \times (-1) + 1 \times 1)^2 = 0$

Initial:
• $a = [a_1, a_2, a_3, a_4] = [0, 0, 0, 0]$
t = 1, i = 1
• $\sum_k a_k K(x_k, x_1) = 0 \times 4 + 0 \times 0 + 0 \times 4 + 0 \times 0 = 0$, sign(0) = -1
• $a_1 \mathrel{+}= y_1 \rightarrow a_1 \mathrel{+}= 1$, new $a = [1, 0, 0, 0]$
t = 1, i = 2
• $\sum_k a_k K(x_k, x_2) = 1 \times 0 + 0 \times 4 + 0 \times 0 + 0 \times 4 = 0$, sign(0) = -1
t = 1, i = 3
• $\sum_k a_k K(x_k, x_3) = 1 \times 4 + 0 \times 0 + 0 \times 4 + 0 \times 0 = 4$, sign(4) = 1
t = 1, i = 4
• $\sum_k a_k K(x_k, x_4) = 1 \times 0 + 0 \times 4 + 0 \times 0 + 0 \times 4 = 0$, sign(0) = -1
t = 2, i = 1
• $\sum_k a_k K(x_k, x_1) = 1 \times 4 + 0 \times 0 + 0 \times 4 + 0 \times 0 = 4$, sign(4) = 1
…
Converged!!!
• $y = \sum_k a_k K(x_k, x)$
$= 1 \times K(x_1, x) + 0 \times K(x_2, x) + 0 \times K(x_3, x) + 0 \times K(x_4, x)$
$= K(x_1, x)$
$= K([1, 1], x)$ (because $x_1 = [1, 1]$)
$= (x_1 + x_2)^2$ (because $K(u, v) = (u \cdot v)^2$; here $x_1$ and $x_2$ are the coordinates of the input $x$)
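The trace above can be replayed in a few lines. The sketch below uses the four training points and the kernel $K(u, v) = (u \cdot v)^2$ from the example, with sign(0) = -1 as in the trace; the helper `classify` and the choice of T = 2 passes are assumptions made for this illustration.

```python
def sign(z):
    # Same convention as the trace: sign(0) is -1.
    return 1 if z > 0 else -1


def K(u, v):
    return (u[0] * v[0] + u[1] * v[1]) ** 2


X = [[1, 1], [-1, 1], [-1, -1], [1, -1]]
Y = [1, -1, 1, -1]
n = len(X)

# Precompute the kernel (Gram) matrix once.
gram = [[K(X[i], X[j]) for j in range(n)] for i in range(n)]

a = [0] * n
mistakes_per_pass = []
for t in range(1, 3):  # two passes over the data
    mistakes = 0
    for i in range(n):
        score = sum(a[k] * gram[k][i] for k in range(n))
        y_hat = sign(score)
        if y_hat != Y[i]:
            a[i] += Y[i]
            mistakes += 1
        print(f"t={t}, i={i + 1}: sum={score}, prediction={y_hat}, a={a}")
    mistakes_per_pass.append(mistakes)


def classify(x):
    """Predict with the learned dual coefficients."""
    return sign(sum(a[k] * K(X[k], x) for k in range(n)))


print(gram)
print(a, mistakes_per_pass)
```

The second pass makes no updates, which is what "Converged" means here: every training example is already classified correctly.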

Common kernels
• Polynomials of degree exactly $d$
• Polynomials of degree up to $d$
• Gaussian kernels
• Sigmoid
• And many others: very active area of research!
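The formulas for these kernels did not survive on this slide, so the sketch below uses their standard textbook forms: $(u \cdot v)^d$, $(u \cdot v + 1)^d$, $\exp(-\lVert u - v \rVert^2 / 2\sigma^2)$, and $\tanh(\eta\, u \cdot v + \nu)$. The function names and the parameter names `sigma`, `eta`, and `nu` are assumptions, and may not match the notation used in the lecture.

```python
from math import exp, tanh


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def poly_exact(u, v, d):
    """Polynomial kernel of degree exactly d: (u . v)^d."""
    return dot(u, v) ** d


def poly_up_to(u, v, d):
    """Polynomial kernel of degree up to d: (u . v + 1)^d."""
    return (dot(u, v) + 1) ** d


def gaussian(u, v, sigma):
    """Gaussian kernel: exp(-||u - v||^2 / (2 sigma^2))."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(u, v))
    return exp(-sq_dist / (2.0 * sigma ** 2))


def sigmoid_kernel(u, v, eta, nu):
    """Sigmoid kernel: tanh(eta * (u . v) + nu)."""
    return tanh(eta * dot(u, v) + nu)


u, v = [1.0, 1.0], [-1.0, -1.0]
print(poly_exact(u, v, 2))
print(poly_up_to(u, v, 2))
print(gaussian(u, v, sigma=1.0))
print(sigmoid_kernel(u, v, eta=0.5, nu=0.0))
```

Any of these can be dropped into a kernelized perceptron in place of $K$, because the algorithm only ever calls the kernel on pairs of inputs.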

Overfitting?
• Huge feature space with kernels, what about overfitting???
– Often robust to overfitting, e.g. if you don't make too many Perceptron updates
– SVMs (which we will see next) will have a clearer story for avoiding overfitting
– But everything overfits sometimes!!!
• Can control by:
– Choosing a better kernel
– Varying parameters of the kernel (width of Gaussian, etc.)

Kernels in logistic regression
$$P(Y = 0 \mid X = x, w, w_0) = \frac{1}{1 + \exp(w_0 + w \cdot x)}$$
• Define weights in terms of data points:
$$w = \sum_j \alpha_j \phi(x_j)$$
$$P(Y = 0 \mid X = x, w, w_0) = \frac{1}{1 + \exp\left(w_0 + \sum_j \alpha_j \phi(x_j) \cdot \phi(x)\right)} = \frac{1}{1 + \exp\left(w_0 + \sum_j \alpha_j K(x_j, x)\right)}$$
• Derive gradient descent rule on $\alpha_j$, $w_0$
• Similar tricks for all linear models: SVMs, etc.
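The slide leaves the update rule as an exercise. One possible version, shown here as a sketch and not as the derivation intended in the lecture, does gradient ascent on the conditional log-likelihood. With $z_i = w_0 + \sum_j \alpha_j K(x_j, x_i)$ and $P(Y = 1 \mid x_i) = 1 - P(Y = 0 \mid x_i)$, the gradients are $\partial \ell / \partial \alpha_j = \sum_i (y_i - P(Y = 1 \mid x_i)) K(x_j, x_i)$ and $\partial \ell / \partial w_0 = \sum_i (y_i - P(Y = 1 \mid x_i))$. The 0/1 labels, the kernel $(u \cdot v)^2$, the four data points, the step size, and the number of steps are all assumptions.

```python
from math import exp, log


def K(u, v):
    return (u[0] * v[0] + u[1] * v[1]) ** 2


def score(alpha, w0, X, x):
    """z = w0 + sum_j alpha_j K(x_j, x)."""
    return w0 + sum(a * K(xj, x) for a, xj in zip(alpha, X))


def prob_y0(alpha, w0, X, x):
    """P(Y = 0 | x) = 1 / (1 + exp(z)), the convention used on the slide."""
    return 1.0 / (1.0 + exp(score(alpha, w0, X, x)))


def log_likelihood(alpha, w0, X, Y):
    total = 0.0
    for x, y in zip(X, Y):
        p0 = prob_y0(alpha, w0, X, x)
        total += log(1.0 - p0) if y == 1 else log(p0)
    return total


def gradient_step(alpha, w0, X, Y, lr):
    """One gradient ascent step on alpha and w0."""
    residuals = [y - (1.0 - prob_y0(alpha, w0, X, x)) for x, y in zip(X, Y)]
    new_alpha = [
        a + lr * sum(r * K(xj, xi) for r, xi in zip(residuals, X))
        for a, xj in zip(alpha, X)
    ]
    new_w0 = w0 + lr * sum(residuals)
    return new_alpha, new_w0


# Hypothetical XOR-style data with labels in {0, 1}.
X = [[1, 1], [-1, 1], [-1, -1], [1, -1]]
Y = [1, 0, 1, 0]

alpha, w0 = [0.0] * len(X), 0.0
ll_before = log_likelihood(alpha, w0, X, Y)
for _ in range(200):
    alpha, w0 = gradient_step(alpha, w0, X, Y, lr=0.01)
ll_after = log_likelihood(alpha, w0, X, Y)

print(ll_before, ll_after)
print([round(1.0 - prob_y0(alpha, w0, X, x), 3) for x in X])
```

As with the perceptron, the features never appear explicitly: training and prediction only use kernel evaluations between pairs of inputs.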

What you need to know
• The kernel trick
• Derive polynomial kernel
• Common kernels
• Kernelized perceptron
