

SLIDE 1

CS446 Introduction to Machine Learning (Fall 2013) University of Illinois at Urbana-Champaign

http://courses.engr.illinois.edu/cs446

  • Prof. Julia Hockenmaier

juliahmr@illinois.edu

LECTURE 9: DUAL AND KERNEL

SLIDE 2

Linear classifiers so far…

What we’ve seen so far is not the whole story:

  • We’ve assumed that the data are linearly separable.
  • We’ve ignored the fact that the perceptron just finds some decision boundary, but not necessarily an optimal one.

SLIDE 3

Data are not linearly separable


  • Noise / outliers
  • The target function is not linear in X

SLIDE 4

Dual representation of linear classifiers

SLIDE 5

Dual representation

Recall the Perceptron update rule: if xm is misclassified, i.e. ym·f(xm) = ym·(w·xm) < 0, add ym·xm to w:

w := w + ym·xm

Dual representation: write w as a weighted sum of training items:

w = ∑n αn yn xn    (αn: how often was xn misclassified?)

f(x) = w·x = ∑n αn yn (xn·x)
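A minimal numpy sketch of this equivalence (not from the slides; the toy data and variable names are illustrative): build w from the counts αn and check that the primal score w·x equals the dual score ∑n αn yn (xn·x).

```python
import numpy as np

# Toy training set: 4 points in 2D with labels in {-1, +1} (made-up example data)
X = np.array([[1.0, 2.0], [2.0, 0.5], [-1.0, -1.5], [-2.0, 1.0]])
y = np.array([1, 1, -1, -1])

# Suppose the dual weights alpha record how often each example was misclassified
alpha = np.array([2.0, 0.0, 1.0, 3.0])

# Primal weight vector: w = sum_n alpha_n * y_n * x_n
w = (alpha * y) @ X

x_new = np.array([0.5, -1.0])
primal_score = w @ x_new                      # f(x) = w . x
dual_score = np.sum(alpha * y * (X @ x_new))  # f(x) = sum_n alpha_n y_n (x_n . x)

print(primal_score, dual_score)               # identical up to floating-point error
```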

SLIDE 6

Dual representation

Primal Perceptron update rule: if xm is misclassified, i.e. ym·f(xm) = ym·(w·xm) < 0, add ym·xm to w:

w := w + ym·xm

Dual Perceptron update rule: if xm is misclassified, i.e. ym·∑d αd yd (xd·xm) < 0, add 1 to αm:

αm := αm + 1
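A minimal sketch of the dual update as a training loop (not from the slides; the toy dataset is illustrative and assumed to be linearly separable). It treats a decision value of 0 as a mistake so training can start from α = 0.

```python
import numpy as np

# Toy linearly separable data (illustrative only)
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
n = len(y)

G = X @ X.T              # Gram matrix of dot products x_d . x_m
alpha = np.zeros(n)      # alpha_m: how often x_m was misclassified

for epoch in range(100):
    mistakes = 0
    for m in range(n):
        # Dual decision value for x_m: sum_d alpha_d y_d (x_d . x_m)
        f_m = np.sum(alpha * y * G[:, m])
        if y[m] * f_m <= 0:      # misclassified (0 counts as a mistake)
            alpha[m] += 1        # dual update: alpha_m := alpha_m + 1
            mistakes += 1
    if mistakes == 0:
        break

print(alpha)
```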

SLIDE 7

Dual representation

Classifying x in the primal: f(x) = w·x
  • w = feature weights (to be learned)
  • w·x = dot product between w and x

Classifying x in the dual: f(x) = ∑n αn yn (xn·x)
  • αn = weight of the n-th training example (to be learned)
  • xn·x = dot product between xn and x

The dual representation is advantageous when #training examples ≪ #features (it requires fewer parameters to learn).

SLIDE 8

Kernels

SLIDE 9

Making data linearly separable

[Figure: data plotted in the original feature space (x1, x2)]

f(x) = 1 iff x1² + x2² ≤ 1

SLIDE 10


Making data linearly separable

[Figure: the same data replotted in the transformed feature space (x1², x2²)]

Transform the data: x = (x1, x2) => x’ = (x1², x2²)

f(x’) = 1 iff x’1 + x’2 ≤ 1
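A small sketch of this transform on a few made-up points (not from the slides): points inside the unit circle satisfy x1² + x2² ≤ 1, and after mapping to (x1², x2²) the same test becomes the linear condition x’1 + x’2 ≤ 1.

```python
import numpy as np

# A few illustrative points: two inside the unit circle, two outside
points = np.array([[0.3, 0.4], [0.5, -0.5], [1.2, 0.8], [-1.5, 0.2]])

transformed = points ** 2          # x = (x1, x2) -> x' = (x1^2, x2^2)

for x, x_prime in zip(points, transformed):
    inside_original = x[0] ** 2 + x[1] ** 2 <= 1    # quadratic boundary in (x1, x2)
    inside_transformed = x_prime.sum() <= 1         # linear boundary in (x1', x2')
    print(x, x_prime, inside_original, inside_transformed)  # the two tests always agree
```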

SLIDE 11

Making data linearly separable

These data aren’t linearly separable in the original x1 space, but adding a second dimension x2 = x1² makes them linearly separable in 〈x1, x2〉:

[Figure: the same points plotted on the x1 axis and in the (x1, x1²) plane]

SLIDE 12

Making data linearly separable

It is common for data to be not linearly separable in the original feature space. We can often introduce new features to make the data linearly separable in the new space:

  • transform the original features (e.g. x → x²)
  • include transformed features in addition to the original features
  • capture interactions between features (e.g. x3 = x1·x2)

But this may blow up the number of features (see the sketch below).
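The sketch below (not from the slides) gives a rough sense of the blow-up: the number of monomial features of degree at most p over d original features is C(d + p, p), which grows quickly with both d and p.

```python
from math import comb

# Number of monomial features of degree <= p over d original features: C(d + p, p)
for d in (10, 100, 1000):
    for p in (2, 3):
        print(f"d={d}, p={p}: {comb(d + p, p)} features")
```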

SLIDE 13

Making data linearly separable

We need to introduce a lot of new features to learn the target function.

Problem for the primal representation: w now has a lot of elements, and we might not have enough data to learn w.

The dual representation is not affected: it still has one parameter αn per training example.

SLIDE 14

The kernel trick

  • Define a feature function φ(x) which maps items x into a higher-dimensional space.
  • The kernel function K(xi, xj) computes the inner product between φ(xi) and φ(xj): K(xi, xj) = φ(xi)·φ(xj)
  • Dual representation: we don’t need to learn w in this higher-dimensional space. It is sufficient to be able to evaluate K(xi, xj).

SLIDE 15

Quadratic kernel

Original features: x = (a, b)
Transformed features: φ(x) = (a², b², √2·ab)

Dot product in the transformed space, for x1 = (a1, b1) and x2 = (a2, b2):

φ(x1)·φ(x2) = a1²a2² + b1²b2² + 2·a1b1a2b2 = (a1a2 + b1b2)² = (x1·x2)²

Kernel: K(x1, x2) = (x1·x2)² = φ(x1)·φ(x2)
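A quick numerical check of this identity (a sketch, not from the slides; the example vectors are arbitrary):

```python
import numpy as np

def phi(x):
    """Explicit quadratic feature map phi(a, b) = (a^2, b^2, sqrt(2)*a*b)."""
    a, b = x
    return np.array([a * a, b * b, np.sqrt(2) * a * b])

x1 = np.array([1.5, -2.0])
x2 = np.array([0.5, 3.0])

explicit = phi(x1) @ phi(x2)      # dot product in the transformed space
kernel = (x1 @ x2) ** 2           # K(x1, x2) = (x1 . x2)^2, no explicit mapping needed

print(explicit, kernel)           # equal up to floating-point error
```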

SLIDE 16

Polynomial kernels

Polynomial kernel of degree p:

  • Basic form: K(xi, xj) = (xi·xj)^p
  • Standard form (captures all lower-order terms): K(xi, xj) = (xi·xj + 1)^p
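Both forms written out as small functions (a sketch, not from the slides); with p = 2 the basic form is exactly the quadratic kernel from the previous slide.

```python
import numpy as np

def poly_kernel_basic(xi, xj, p):
    """Basic polynomial kernel: K(xi, xj) = (xi . xj)^p."""
    return np.dot(xi, xj) ** p

def poly_kernel_standard(xi, xj, p):
    """Standard polynomial kernel: K(xi, xj) = (xi . xj + 1)^p (includes lower-order terms)."""
    return (np.dot(xi, xj) + 1) ** p
```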

SLIDE 17

From dual to kernel perceptron

Dual Perceptron: f(x) = ∑d αd yd (xd·x)

Update: if xm is misclassified, i.e. ym·∑d αd yd (xd·xm) < 0, add 1 to αm:

αm := αm + 1

Kernel Perceptron: f(x) = ∑d αd yd φ(xd)·φ(x) = ∑d αd yd K(xd, x)

Update: if xm is misclassified, i.e. ym·∑d αd yd K(xd, xm) < 0, add 1 to αm:

αm := αm + 1
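A minimal kernel-perceptron sketch (not from the slides): the same dual update as above, but with a degree-2 polynomial kernel in place of raw dot products, on an XOR-like toy dataset that is not linearly separable in the original space. Data and names are illustrative.

```python
import numpy as np

def K(xi, xj, p=2):
    # Standard polynomial kernel: (xi . xj + 1)^p
    return (np.dot(xi, xj) + 1) ** p

# Toy data that is NOT linearly separable in the original space (XOR-like)
X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
y = np.array([1, 1, -1, -1])
n = len(y)

# Precompute the kernel (Gram) matrix K(x_d, x_m)
G = np.array([[K(X[d], X[m]) for m in range(n)] for d in range(n)])
alpha = np.zeros(n)

for epoch in range(100):
    mistakes = 0
    for m in range(n):
        f_m = np.sum(alpha * y * G[:, m])   # f(x_m) = sum_d alpha_d y_d K(x_d, x_m)
        if y[m] * f_m <= 0:                 # misclassified (0 counts as a mistake)
            alpha[m] += 1                   # kernel perceptron update
            mistakes += 1
    if mistakes == 0:
        break

def predict(x):
    return np.sign(np.sum(alpha * y * np.array([K(X[d], x) for d in range(n)])))

print([predict(x) for x in X])  # recovers the training labels
```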

SLIDE 18

Maximum margin classifiers

SLIDE 19

Maximum margin classifiers

SLIDE 20

Hard vs. soft margins

SLIDE 21

Dealing with outliers: Slack variables ξi

ξi measures by how much example (xi, yi) fails to achieve margin δ

SLIDE 22

Soft margins

  • Minimize training error while maximizing the margin.
  • ∑i ξi is an upper bound on the number of training errors.
  • C controls the tradeoff between margin and training error.


Hard margin (primal):

  min over w:  ½ w⋅w
  subject to:  yi(w⋅xi) ≥ 1 for i = 1, ..., n

Soft margin (primal):

  min over w:  ½ w⋅w + C ∑i=1..n ξi
  subject to:  yi(w⋅xi) ≥ 1 − ξi and ξi ≥ 0 for i = 1, ..., n
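As a usage sketch (not from the slides, and assuming scikit-learn is available): a soft-margin SVM with a polynomial kernel, where C sets the margin vs. training-error tradeoff.

```python
import numpy as np
from sklearn.svm import SVC

# Noisy, roughly circular toy data (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = np.where((X ** 2).sum(axis=1) <= 1, 1, -1)   # label by the unit circle
y[:5] = -y[:5]                                   # flip a few labels to simulate outliers

# Soft-margin SVM with a degree-2 polynomial kernel; smaller C tolerates more slack
clf = SVC(kernel="poly", degree=2, C=1.0)
clf.fit(X, y)

print(clf.score(X, y))       # training accuracy, typically below 1.0 due to the flipped labels
print(clf.dual_coef_.shape)  # dual coefficients alpha_i * y_i of the support vectors
```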