SLIDE 1

Learning From Data Lecture 26 Kernel Machines

Popular Kernels The Kernel Measures Similarity Kernels in Different Applications

  • M. Magdon-Ismail

CSCI 4100/6100

SLIDE 2

recap: The Kernel Allows Us to Bypass Z-space

xn ∈ X   ↓ K(·, ·)

g(x) = sign( Σ_{α*_n > 0} α*_n yn K(xn, x) + b* )

b* = ys − Σ_{α*_n > 0} α*_n yn K(xn, xs)   (one can compute b* for several SVs and average)

Solve the QP:

minimize over α:   (1/2) αᵗGα − 1ᵗα
subject to:        yᵗα = 0,   C ≥ α ≥ 0

→ α*;   any index s with C > α*_s > 0 is a free support vector

Overfitting:   high d̃ → complicated separator, but the SVM keeps few support vectors → low effective complexity.
Computation:   high d̃ → expensive or infeasible computation, but inner products via the kernel K(·, ·) → computationally feasible.

Either way, we can go to high (even infinite) d̃.
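As a sanity check on these formulas, here is a minimal numpy sketch. The two-point toy problem and its optimal α* are worked out by hand for illustration (they are not from the lecture): points (0,0) with y = −1 and (2,0) with y = +1 under a linear kernel give α* = (0.5, 0.5) and b* = −1, so the separator is the line x₁ = 1.

```python
import numpy as np

# Hand-solved toy dual: support vectors, labels, optimal alphas (illustrative).
X_sv = np.array([[0.0, 0.0], [2.0, 0.0]])
y_sv = np.array([-1.0, 1.0])
a_sv = np.array([0.5, 0.5])
K = lambda u, v: u @ v                    # linear kernel

def b_star():
    """y_s - sum_n alpha*_n y_n K(x_n, x_s), averaged over the support vectors."""
    return np.mean([y_sv[s] - sum(a_sv[n] * y_sv[n] * K(X_sv[n], X_sv[s])
                                  for n in range(len(a_sv)))
                    for s in range(len(a_sv))])

def g(x):
    """Final hypothesis g(x) = sign( sum_n alpha*_n y_n K(x_n, x) + b* )."""
    s = sum(a_sv[n] * y_sv[n] * K(X_sv[n], x) for n in range(len(a_sv)))
    return np.sign(s + b_star())

print(b_star(), g(np.array([3.0, 0.0])), g(np.array([0.5, 0.5])))  # -1.0 1.0 -1.0
```

Averaging over support vectors is harmless here since both give the same b*; with soft margins one would average only over the free support vectors.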

© AML Creator: Malik Magdon-Ismail (Kernel Machines, slide 2/12)

SLIDE 3

Polynomial Kernel

2nd-Order Polynomial Kernel

Φ(x) = ( x₁, x₂, …, x_d,  x₁², x₂², …, x_d²,  √2·x₁x₂, √2·x₁x₃, …, √2·x₁x_d, √2·x₂x₃, …, √2·x_{d−1}x_d )

K(x, x′) = Φ(x)ᵗΦ(x′) = Σ_{i=1}^d x_i x′_i + Σ_{i=1}^d x_i² x′_i² + 2 Σ_{i<j} x_i x_j x′_i x′_j   ← O(d²) terms

         = (1/2 + xᵗx′)² − 1/4   ← computed quickly in X-space, in O(d)

Q-th order polynomial kernel:

K(x, x′) = (r + xᵗx′)^Q   ← inhomogeneous kernel
K(x, x′) = (xᵗx′)^Q       ← homogeneous kernel
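The 2nd-order identity is easy to verify numerically: the explicit O(d²) feature map and the O(d) kernel formula compute the same inner product (a small sketch; the function names are ours):

```python
import numpy as np

def phi(x):
    """Explicit 2nd-order feature map: linear terms, squares, and
    sqrt(2)-scaled cross terms -- O(d^2) coordinates in Z-space."""
    d = len(x)
    cross = [np.sqrt(2) * x[i] * x[j] for i in range(d) for j in range(i + 1, d)]
    return np.concatenate([x, x**2, cross])

def poly2_kernel(x, xp):
    """The same inner product, computed in O(d) in X-space."""
    return (0.5 + x @ xp) ** 2 - 0.25

rng = np.random.default_rng(0)
x, xp = rng.standard_normal(5), rng.standard_normal(5)
print(phi(x) @ phi(xp) - poly2_kernel(x, xp))  # ~0: the two computations agree
```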

SLIDE 4

RBF-Kernel

One-dimensional RBF-Kernel

Φ(x) = e^{−x²} · ( 1, √(2¹/1!)·x, √(2²/2!)·x², √(2³/3!)·x³, √(2⁴/4!)·x⁴, … )

K(x, x′) = Φ(x)ᵗΦ(x′) = e^{−x²} e^{−x′²} Σ_{i=0}^∞ (2xx′)^i / i!   ← infinite sum, not feasible directly

         = e^{−x²} e^{−x′²} e^{2xx′} = e^{−(x−x′)²}   ← computed quickly in X-space, in O(d)

d-dimensional RBF-Kernel:   K(x, x′) = e^{−γ‖x−x′‖²}   (γ > 0)

[Figures: Hard Margin (γ = 2000, C = ∞); Soft Margin (γ = 2000, C = 0.25); Soft Margin (γ = 100, C = 0.25)]
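Although the feature map is infinite-dimensional, truncating the series already matches the closed form to high accuracy, since the terms decay like 1/i!. A small numerical sketch (function names are ours):

```python
import numpy as np
from math import factorial

def phi_truncated(x, n_terms=25):
    """First n_terms coordinates of the infinite 1-d RBF feature map
    Phi(x) = e^{-x^2} (1, sqrt(2^1/1!) x, sqrt(2^2/2!) x^2, ...)."""
    return np.array([np.exp(-x * x) * np.sqrt(2.0 ** i / factorial(i)) * x ** i
                     for i in range(n_terms)])

def rbf_kernel(x, xp):
    """Closed form obtained by summing the whole series: e^{-(x-x')^2}."""
    return np.exp(-(x - xp) ** 2)

x, xp = 0.7, -0.3
print(phi_truncated(x) @ phi_truncated(xp), rbf_kernel(x, xp))  # nearly identical
```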

SLIDE 5

Choosing the RBF-Kernel Width γ

K(x, x′) = e^{−γ‖x−x′‖²}

[Figures: Small γ; Medium γ; Large γ]

SLIDE 6

RBF-Kernel Simulates k-RBF-Network

RBF-Kernel:

g(x) = sign( Σ_{α*_n > 0} α*_n yn e^{−‖x−xn‖²} + b* )

Centers are at the support vectors; the number of centers is determined automatically.

k-RBF-Network:

g(x) = sign( Σ_{j=1}^k w_j e^{−‖x−μ_j‖²} + w₀ )

Centers are chosen to represent the data; the number of centers k is an input.

SLIDE 7

Neural Network Kernel

K(x, x′) = tanh(κ · xᵗx′ + c)

Neural Network Kernel:

g(x) = sign( Σ_{α*_n > 0} α*_n yn tanh(κ · xnᵗx + c) + b* )

The "first layer weights" are the support vectors; the number of hidden nodes is determined automatically.

2-Layer Neural Network:

g(x) = sign( Σ_{j=1}^m w_j tanh(v_jᵗx) + w₀ )

First layer weights are arbitrary; the number of hidden nodes m is an input.
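The correspondence is visible directly in code: evaluating the kernel machine is exactly a forward pass through a 2-layer tanh network with one hidden unit per support vector. The support vectors, dual coefficients, and kernel parameters below are hypothetical numbers chosen for illustration, not values from the lecture:

```python
import numpy as np

# Hypothetical solved SVM: support vectors, labels, dual coefficients, bias,
# and kernel parameters kappa, c (illustrative values only).
X_sv = np.array([[1.0, 0.5], [-0.8, 1.2], [0.3, -1.0]])
y_sv = np.array([1.0, -1.0, 1.0])
alpha = np.array([0.4, 0.7, 0.3])
b, kappa, c = 0.1, 0.5, -1.0

def g(x):
    """Kernel-machine output, read as a 2-layer neural network: the first-layer
    weight vectors are the support vectors (scaled by kappa), and the
    second-layer weights are alpha*_n y_n."""
    hidden = np.tanh(kappa * (X_sv @ x) + c)   # one tanh hidden unit per SV
    return np.sign((alpha * y_sv) @ hidden + b)

print(g(np.array([0.5, 0.5])))  # -> 1.0
```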

SLIDE 8

The Inner Product Measures Similarity

With z = Φ(x) and z′ = Φ(x′),

K(x, x′) = zᵗz′ = ‖z‖ · ‖z′‖ · cos(θ_{z,z′}) = ‖z‖ · ‖z′‖ · CosSim(z, z′)

After normalizing for size, the kernel measures the similarity between the input vectors.
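Since K(x, x) = ‖z‖², the size normalization can be done entirely with kernel evaluations, without ever forming z. A minimal sketch (the helper name and the example kernel are our choices for illustration):

```python
import numpy as np

def cos_sim(K, x, xp):
    """Normalize a kernel for size: K(x,x') / sqrt(K(x,x) K(x',x'))
    equals CosSim(z, z') in Z-space and always lies in [-1, 1]."""
    return K(x, xp) / np.sqrt(K(x, x) * K(xp, xp))

# Example: homogeneous 2nd-order polynomial kernel.
poly = lambda u, v: (u @ v) ** 2
x, xp = np.array([1.0, 0.0]), np.array([1.0, 1.0])
print(cos_sim(poly, x, xp))  # 0.5: x and x' are 45 degrees apart, cos^2(45) = 0.5
```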

SLIDE 9

Designing Kernels

  • Construct a similarity measure for the data, i.e. a kernel K(·, ·)
  • A linear model should be plausible in the transformed space that K implicitly defines

SLIDE 10

String Kernels

Applications: DNA sequences, Text

ACGGTGTCAAACGTGTCAGTGTG GTCGGGTCAAAACGTGAT

Dear Sir, With reference to your letter dated 26th March, I want to confirm the Order No. 34-09-10 placed on 3rd March, 2010. I would appreciate if you could send me the account details where the payment has to be made. As per the invoice, we are entitled to a cash discount of 2%. Can you please let us know whether it suits you if we make a wire transfer instead of a cheque?

Dear Jane, I am terribly sorry to hear the news of your hip fracture. I can only imagine what a terrible time you must be going through. I hope you and the family are coping well. If there is any help you need, don’t hesitate to let me know.

Similar? Yes, if classifying spam versus non-spam. No, if classifying business versus personal.

To design the kernel, measure similarity between strings:
  • Bag of words (number of occurrences of each atom)
  • Co-occurrence of substrings or subsequences
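A bag-of-words kernel can be sketched in a few lines: count word occurrences and take the (size-normalized) inner product of the count vectors. This is a minimal illustration; real string kernels also exploit substrings and subsequences:

```python
from collections import Counter
import math

def bow_kernel(s, t):
    """Bag-of-words string kernel: cosine similarity of word-count vectors."""
    cs, ct = Counter(s.lower().split()), Counter(t.lower().split())
    dot = sum(cs[w] * ct[w] for w in cs)                      # shared-word overlap
    norm = math.sqrt(sum(c * c for c in cs.values()) *
                     sum(c * c for c in ct.values()))
    return dot / norm

print(bow_kernel("dear sir please confirm the order",
                 "dear jane the news was terrible"))  # 2 shared words / 6 -> 0.333...
```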

SLIDE 11

Graph Kernels

Performing classification on:

Graph structures (e.g. protein networks for function prediction); graph nodes within a network (e.g. deciding whether or not to advertise to Facebook users)

Similarity between graphs:

  • random walks
  • degree sequences, connectivity properties, mixing properties

Measuring similarity between nodes: look at neighborhoods,

K(v, v′) = |N(v) ∩ N(v′)| / |N(v) ∪ N(v′)|
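This neighborhood kernel is the Jaccard similarity of the two neighbor sets, and is one-line code on an adjacency-set representation. The graph below is made up for the sketch:

```python
def node_kernel(G, v, vp):
    """Neighborhood (Jaccard) kernel: |N(v) ∩ N(v')| / |N(v) ∪ N(v')|.
    G is assumed to be an adjacency dict mapping each node to its neighbor set."""
    return len(G[v] & G[vp]) / len(G[v] | G[vp])

# Tiny illustrative graph.
G = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(node_kernel(G, "a", "b"))  # neighborhoods share {c} out of {a, b, c} -> 1/3
```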

SLIDE 12

Image Kernels

Similar? Yes, if trying to recognize pictures with faces. No, if trying to distinguish Malik from Christos.
