SLIDE 1

Online Knowledge-Based Support Vector Machines

Gautam Kunapuli1, Kristin P. Bennett2, Amina Shabbeer2, Richard Maclin3 and Jude W. Shavlik1

1University of Wisconsin-Madison, USA 2Rensselaer Polytechnic Institute, USA 3University of Minnesota, Duluth, USA

ECML 2010, Barcelona, Spain

SLIDE 2

Outline

  • Knowledge-Based Support Vector Machines
  • The Adviceptron: Online KBSVMs
  • A Real-World Task: Diabetes Diagnosis
  • A Real-World Task: Tuberculosis Isolate Classification
  • Conclusions

SLIDE 3

Knowledge-Based SVMs

  • Introduced by Fung et al (2003)
  • Allows incorporation of expert advice into SVM formulations
  • Advice is specified with respect to polyhedral regions in input (feature) space
  • Can be incorporated into the SVM formulation as constraints using advice variables

[slide shows example advice rules of the form (a′x ≥ α) ∧ (b′x ≤ β) ⇒ class]

SLIDE 4

Knowledge-Based SVMs

In classic SVMs, we have T labeled data points (x_t, y_t), t = 1, …, T. We learn a linear classifier w′x − b = 0.

The standard SVM formulation trades off regularization and loss:

min_{w, b, ξ}  ½ w′w + λ e′ξ  subject to  Y(Xw − be) + ξ ≥ e,  ξ ≥ 0.

[figure: separating hyperplane between Class A (y = +1) and Class B (y = −1)]
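The objective above can be evaluated directly for a candidate (w, b); a minimal sketch (function and variable names are illustrative, not from the slides):

```python
import numpy as np

def svm_objective(w, b, X, y, lam):
    """Regularized hinge-loss objective 0.5*||w||^2 + lam * sum(xi),
    where xi_t = max(0, 1 - y_t*(w'x_t - b)) are the slack values."""
    margins = y * (X @ w - b)
    xi = np.maximum(0.0, 1.0 - margins)   # slack variables
    return 0.5 * (w @ w) + lam * xi.sum()

# tiny example: two separable points, so all slacks are zero
X = np.array([[2.0], [-2.0]])
y = np.array([1.0, -1.0])
w = np.array([1.0])
print(svm_objective(w, 0.0, X, y, lam=1.0))  # 0.5
```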

SLIDE 5

Knowledge-Based SVMs

We assume an expert provides polyhedral advice of the form

Dx ≤ d ⇒ w′x ≥ b.

We can transform the logic constraint above using advice variables, u:

D′u + w = 0,  −d′u − b ≥ 0,  u ≥ 0.

These constraints are added to the standard formulation to give Knowledge-Based SVMs.

[figure: advice region Dx ≤ d shown relative to Class A (y = +1) and Class B (y = −1)]
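The advice variables u act as a certificate for the implication: if u ≥ 0 satisfies D′u + w = 0 and −d′u − b ≥ 0, then any x with Dx ≤ d has w′x = −u′Dx ≥ −u′d ≥ b. A small numerical illustration (the region and classifier below are made up for the example):

```python
import numpy as np

# advice region in 2-D: x1 >= 1, written as -x1 <= -1 to match D x <= d
D = np.array([[-1.0, 0.0]])
d = np.array([-1.0])

w = np.array([1.0, 0.0])   # classifier normal
b = 1.0                    # rule: x in region => w'x >= b
u = np.array([1.0])        # advice variables (the certificate)

# certificate conditions from the slide
assert np.allclose(D.T @ u + w, 0.0)   # D'u + w = 0
assert -d @ u - b >= 0                 # -d'u - b >= 0

# spot-check: sampled points inside the region all satisfy w'x >= b
rng = np.random.default_rng(0)
xs = rng.uniform(1.0, 5.0, size=(100, 2))   # every sample has x1 >= 1
assert np.all(D @ xs.T <= d[:, None])       # inside the advice region
print(np.all(xs @ w >= b))                  # True
```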

SLIDE 6

Knowledge-Based SVMs

In general, there are m advice sets, each with label z_i = ±1, for advice belonging to Class A or B:

D_i x ≤ d_i ⇒ z_i (w′x − b) ≥ 1.

Each advice set adds the following constraints to the SVM formulation:

D_i′u_i + z_i w = 0,  −d_i′u_i − z_i b ≥ 1,  u_i ≥ 0.

[figure: advice regions D_i x ≤ d_i for Class A (y = +1) and Class B (y = −1)]

SLIDE 7

Knowledge-Based SVMs

The batch KBSVM formulation introduces advice slack variables to soften the advice constraints:

min_{w, b, ξ, u_i, η_i, ζ_i}  ½ w′w + λ e′ξ + μ Σ_{i=1}^m (e′η_i + ζ_i)

subject to  Y(Xw − be) + ξ ≥ e,  ξ ≥ 0,
            −η_i ≤ D_i′u_i + z_i w ≤ η_i,
            −d_i′u_i − z_i b + ζ_i ≥ 1,
            u_i, η_i, ζ_i ≥ 0,  i = 1, ..., m.

[figure: softened advice regions shown between Class A (y = +1) and Class B (y = −1)]

SLIDE 8

Outline

  • Knowledge-Based Support Vector Machines
  • The Adviceptron: Online KBSVMs
  • A Real-World Task: Diabetes Diagnosis
  • A Real-World Task: Tuberculosis Isolate Classification
  • Conclusions

SLIDE 9

Online KBSVMs

  • Need to derive an online version of KBSVMs
  • Algorithm is provided with advice and one labeled data point at each round
  • Algorithm should update the hypothesis at each step, w_t, as well as the advice vectors, u_{i,t}

SLIDE 10

Passive-Aggressive Algorithms

  • Adopt the framework of passive-aggressive algorithms (Crammer et al, 2006), where at each round, when a new data point is given,
    – if loss = 0, there is no update (passive)
    – if loss > 0, update weights to minimize loss (aggressive)
  • Why passive-aggressive algorithms?
    – readily applicable to most SVM losses
    – possible to derive elegant, closed-form update rules
    – simple rules provide fast updates; scalable
    – analyze performance by deriving regret bounds
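The passive-aggressive scheme above can be sketched for the hinge loss; this is the basic PA rule of Crammer et al (2006), with illustrative names:

```python
import numpy as np

def pa_update(w, x, y):
    """One round of the basic passive-aggressive algorithm: passive when the
    hinge loss is zero, otherwise the smallest step that zeroes the loss on (x, y)."""
    loss = max(0.0, 1.0 - y * (w @ x))
    if loss == 0.0:
        return w                  # passive: no update
    tau = loss / (x @ x)          # aggressive: closed-form step size
    return w + tau * y * x

w = np.zeros(2)
w = pa_update(w, np.array([1.0, 0.0]), +1.0)
print(w @ np.array([1.0, 0.0]))   # margin after the update: 1.0
```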

SLIDE 11

Online KBSVMs

  • There are m advice sets (D_i, d_i, z_i), i = 1, ..., m
  • At round t, the algorithm receives a labeled example (x_t, y_t)
  • The current hypothesis is w_t, and the current advice variables are u_{i,t}, i = 1, ..., m

At round t, the formulation for deriving an update is

min_{w, ξ, u_i, η_i, ζ_i ≥ 0}  ½‖w − w_t‖² + ½ Σ_{i=1}^m ‖u_i − u_{i,t}‖² + λ ξ² + μ Σ_{i=1}^m (‖η_i‖² + ζ_i²)

subject to  y_t w′x_t − 1 + ξ ≥ 0,
            D_i′u_i + z_i w = η_i,
            −d_i′u_i + ζ_i ≥ 1,  i = 1, ..., m.

SLIDE 12

Formulation At The t-th Round

The first two objective terms, ½‖w − w_t‖² and ½ Σ_i ‖u_i − u_{i,t}‖², are proximal terms for the hypothesis and advice vectors.

SLIDE 13

Formulation At The t-th Round

The slack term for the data point (scaled by λ) is the data loss; the slack terms for the advice sets (scaled by μ) are the advice loss.

SLIDE 14

Formulation At The t-th Round

λ and μ are the parameters that trade off the data loss and the advice loss against the proximal terms.

SLIDE 15

Formulation At The t-th Round

The inequality constraints make deriving a closed-form update impossible.

SLIDE 17

Decompose Into m+1 Sub-problems

  • First sub-problem: update the hypothesis by fixing the advice variables to their values at the t-th iteration, u_i = u_{i,t}
  • Some objective terms and constraints drop out of the formulation:

min_{w, ξ, η_i}  ½‖w − w_t‖² + λ ξ² + μ Σ_{i=1}^m ‖η_i‖²

subject to  y_t w′x_t − 1 + ξ ≥ 0,
            D_i′u_{i,t} + z_i w = η_i,  i = 1, ..., m.
SLIDE 18

Deriving The Hypothesis Update

  • First sub-problem: update the hypothesis by fixing the advice vectors
  • Then update the advice vectors by fixing the hypothesis
    – breaks down into m sub-problems, one for each advice set

The first sub-problem, with Lagrange multipliers α and β_i:

min_{w, ξ, η_i}  ½‖w − w_t‖² + λ ξ² + μ Σ_{i=1}^m ‖η_i‖²

subject to  y_t w′x_t − 1 + ξ ≥ 0,   (α)
            D_i′u_{i,t} + z_i w = η_i,  i = 1, ..., m.   (β_i)
SLIDE 19

Deriving The Hypothesis Update

With u_{i,t} fixed, the constraint D_i′u_{i,t} + z_i w = η_i defines an advice-estimate of the hypothesis according to the i-th advice set; denote it r_{i,t} = −z_i D_i′u_{i,t}.
SLIDE 20

Advice-Estimate Of Current Hypothesis

  • Each advice set provides an advice-estimate of the current hypothesis, r_{i,t} = −z_i D_i′u_{i,t}
  • Average the advice-estimates over all m advice vectors and denote the average as

r̄_t = (1/m) Σ_{i=1}^m r_{i,t}.
SLIDE 21

The Hypothesis Update

For λ, μ > 0, the update is

w_{t+1} = (1 − ν)(w_t + α_t y_t x_t) + ν r̄_t,

where the step size α_t is proportional to λ ℓ_t and the loss is

ℓ_t = max(0, 1 − (1 − ν) y_t x_t′w_t − ν y_t x_t′r̄_t).

SLIDE 22

The Hypothesis Update

The update is a convex combination of the standard passive-aggressive update and the average advice-estimate; the parameter ν of the convex combination is determined by μ and the number of advice sets m.
SLIDE 23

The Hypothesis Update

The update weight α_t depends on the hinge loss computed with respect to a composite weight vector that is a convex combination of the current hypothesis and the average advice-estimate, (1 − ν) w_t + ν r̄_t.
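A sketch of the hypothesis update described above. The step-size normalization here follows the basic passive-aggressive rule as a stand-in for the paper's exact closed form, and the names are illustrative:

```python
import numpy as np

def adviceptron_hypothesis_update(w_t, r_bar, x, y, nu):
    """Hinge loss is measured against the composite vector (1-nu)*w_t + nu*r_bar;
    the new hypothesis is a convex combination of a PA-style update and the
    average advice-estimate r_bar. Step size uses the basic PA normalization
    (an assumption for illustration, not the paper's exact constant)."""
    composite = (1.0 - nu) * w_t + nu * r_bar
    loss = max(0.0, 1.0 - y * (composite @ x))
    alpha = loss / (x @ x) if loss > 0 else 0.0
    return (1.0 - nu) * (w_t + alpha * y * x) + nu * r_bar

w = np.zeros(2)
r_bar = np.array([1.0, 0.0])   # average advice-estimate
w_new = adviceptron_hypothesis_update(w, r_bar, np.array([0.0, 1.0]), +1.0, nu=0.5)
print(w_new)                   # [0.5 0.5]
```

With zero loss the rule stays passive in w_t and only pulls toward the advice; with positive loss both the data point and the advice move the hypothesis.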

SLIDE 24

Deriving The Advice Updates

  • Second sub-problem: update the advice vectors by fixing the hypothesis at w_{t+1}
  • Some constraints and objective terms drop out of the formulation
SLIDE 25

Deriving The Advice Updates

With the hypothesis fixed at w_{t+1}, the remaining problem is

min_{u_i, η_i, ζ_i ≥ 0}  ½ Σ_{i=1}^m ‖u_i − u_{i,t}‖² + μ Σ_{i=1}^m (‖η_i‖² + ζ_i²)

subject to  D_i′u_i + z_i w_{t+1} = η_i,
            −d_i′u_i + ζ_i ≥ 1,  i = 1, ..., m.
SLIDE 26

Deriving The Advice Updates

The problem above splits into m independent sub-problems, one for each advice set.
SLIDE 27

Deriving The i-th Advice Updates

  • m sub-problems: update the i-th advice vector by fixing the hypothesis

min_{u_i, η_i, ζ_i}  ½‖u_i − u_{i,t}‖² + μ (‖η_i‖² + ζ_i²)

subject to  D_i′u_i + z_i w_{t+1} = η_i,   (β_i)
            −d_i′u_i + ζ_i ≥ 1,   (γ_i)
            u_i ≥ 0.   (τ_i)
SLIDE 28

Deriving The i-th Advice Updates

  • The cone constraints are still complicating: we cannot derive a closed-form solution
  • Use a projected-gradient approach:
    – drop the constraints to compute an intermediate closed-form update
    – project the intermediate update back onto the cone constraints
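For nonnegativity (cone) constraints u ≥ 0, the projection step is simply componentwise clipping; a minimal projected-gradient sketch, where the gradient argument stands in for the intermediate closed-form update:

```python
import numpy as np

def projected_gradient_step(u, grad, step):
    """One projected-gradient step for a problem constrained to u >= 0:
    take an unconstrained step, then project onto the nonnegative orthant."""
    u_intermediate = u - step * grad        # unconstrained (closed-form) step
    return np.maximum(u_intermediate, 0.0)  # projection onto u >= 0

u = np.array([0.5, 0.1])
g = np.array([-1.0, 2.0])
print(projected_gradient_step(u, g, step=0.2))  # [0.7 0. ]
```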

SLIDE 29

The m Advice Updates

For λ, μ > 0, the i-th advice update, i = 1, ..., m, is

u_{i,t+1} = [u_{i,t} + D_i β_i − d_i γ_i]_+ ,

where [·]_+ denotes projection onto the nonnegative orthant, and the multipliers (β_i, γ_i) are obtained in closed form from a small linear system built from D_i, d_i and the constraint residuals D_i′u_{i,t} + z_i w_{t+1} and −d_i′u_{i,t} − 1.

SLIDE 30

The m Advice Updates

  • Each advice update depends on the newly updated hypothesis, which acts as a hypothesis-estimate of the advice
  • The update is driven by the error: the amount of violation of the constraints by an ideal data point
  • The [·]_+ operation is the projection step

SLIDE 31

The Adviceptron

Inputs: advice sets (D_i, d_i, z_i), i = 1, ..., m; parameters λ, μ > 0; convex-combination weight ν determined by μ and m; initial w and u_i.

At each round t:
  • receive a labeled example (x_t, y_t)
  • compute advice-estimates r_{i,t} = −z_i D_i′u_{i,t} and their average r̄_t
  • compute the loss ℓ_t = max(0, 1 − (1 − ν) y_t x_t′w_t − ν y_t x_t′r̄_t)
  • update the hypothesis: w_{t+1} = (1 − ν)(w_t + α_t y_t x_t) + ν r̄_t, with step size α_t ∝ λ ℓ_t
  • for each advice set, solve for (β_i, γ_i) and update u_{i,t+1} = [u_{i,t} + D_i β_i − d_i γ_i]_+
SLIDE 32

Outline

  • Knowledge-Based Support Vector Machines
  • The Adviceptron: Online KBSVMs
  • A Real-World Task: Diabetes Diagnosis
  • A Real-World Task: Tuberculosis Isolate Classification
  • Conclusions

SLIDE 33

Diagnosing Diabetes

  • Standard data set from the UCI repository (768 x 8)
    – all patients at least 21 years old, of Pima Indian heritage
    – features include body mass index, blood glucose level
  • Expert advice for diagnosing diabetes from the NIH website on risks for Type-2 diabetes
    – a person who is obese (characterized by BMI > 30) and has a high blood glucose level (> 126) is at a strong risk for diabetes
    – a person who is at normal weight (BMI < 25) and has a low blood glucose level (< 100) is at a low risk for diabetes

(BMI ≥ 30) ∧ (glucose ≥ 126) ⇒ diabetes
(BMI ≤ 25) ∧ (glucose ≤ 100) ⇒ ¬diabetes
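The two rules can be encoded as polyhedral advice sets (D_i, d_i, z_i); assuming, for illustration only, that feature 0 is BMI and feature 1 is blood glucose:

```python
import numpy as np

# Rule 1: BMI >= 30 and glucose >= 126 => diabetes (z = +1).
# "a'x >= c" is rewritten as "-a'x <= -c" to match the form D x <= d.
D1 = np.array([[-1.0, 0.0],
               [0.0, -1.0]])
d1 = np.array([-30.0, -126.0])
z1 = +1

# Rule 2: BMI <= 25 and glucose <= 100 => not diabetes (z = -1)
D2 = np.array([[1.0, 0.0],
               [0.0, 1.0]])
d2 = np.array([25.0, 100.0])
z2 = -1

def in_region(D, d, x):
    """True if x lies in the polyhedral advice region D x <= d."""
    return bool(np.all(D @ x <= d))

x = np.array([35.0, 140.0])   # obese patient with high glucose
print(in_region(D1, d1, x))   # True: rule 1 applies
print(in_region(D2, d2, x))   # False: not in the low-risk region
```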

SLIDE 34

Diagnosing Diabetes: Results

  • 200 examples for training, remaining for testing
  • Results averaged over 20 randomized iterations
  • Compared to advice-free online algorithms:
    – Passive-aggressive (Crammer et al, 2006)
    – ROMMA (Li & Long, 2002)
    – Max margin-perceptron (Freund & Schapire, 1999)

[figure: learning curves; the batch KBSVM trained on all 200 points shown as a baseline]

SLIDE 35

Outline

  • Knowledge-Based Support Vector Machines
  • The Adviceptron: Online KBSVMs
  • A Real-World Task: Diabetes Diagnosis
  • A Real-World Task: Tuberculosis Isolate Classification
  • Conclusions

SLIDE 36

Tuberculosis Isolate Classification

  • Task is to classify strains of Mycobacterium tuberculosis complex (MTBC) into major genetic lineages based on DNA fingerprints
  • MTBC is the causative agent for TB
    – leading cause of disease and morbidity
    – strains vary in infectivity, transmission, virulence, immunogenicity and host associations depending on genetic lineage
  • Lineage classification is crucial for surveillance, tracking and control of TB world-wide

SLIDE 37

Tuberculosis Isolate Classification

  • Two types of DNA fingerprints for all culture-positive TB strains collected in the US by the CDC (44 data features)
  • Six major lineages (classes) of TB for classification
    – ancestral: M. bovis, M. africanum, Indo-Oceanic
    – modern: Euro-American, East-Asian, East-African-Indian
  • Problem formulated as six 1-vs-many classification tasks

[figure: example DNA fingerprint patterns for the six lineages]
SLIDE 38

Expert Rules for TB Lineage Classification

[figure: expert decision rules assigning isolates to the East-Asian, Indo-Oceanic, M. bovis, M. africanum, East-African-Indian and Euro-American lineages]

Rules provided by Dr. Lauren Cowan at the Centers for Disease Control, documented in Shabbeer et al (2010)
SLIDE 39

TB Results: Might Need Fewer Examples To Converge With Advice

[figure: learning curves for Euro-American vs. the Rest and M. africanum vs. the Rest, with the batch KBSVM as a baseline]
SLIDE 40

TB Results: Can Converge To A Better Solution With Advice

[figure: learning curves for East-African-Indian vs. the Rest and Indo-Oceanic vs. the Rest, with the batch KBSVM as a baseline]
SLIDE 41

TB Results: Possible To Still Learn Well With Only Advice

[figure: learning curves for East-Asian vs. the Rest and M. bovis vs. the Rest, with the batch KBSVM as a baseline]

SLIDE 42

Outline

  • Knowledge-Based Support Vector Machines
  • The Adviceptron: Online KBSVMs
  • A Real-World Task: Diabetes Diagnosis
  • A Real-World Task: Tuberculosis Isolate Classification
  • Conclusions And Questions

SLIDE 43

Conclusions

  • New online learning algorithm: the adviceptron
  • Makes use of prior knowledge in the form of (possibly imperfect) polyhedral advice
  • Performs simple, closed-form updates via the passive-aggressive framework; scalable
  • Good advice can help converge to a better solution with fewer examples
  • Encouraging empirical results on two important real-world tasks

SLIDE 44

References.

(Fung et al, 2003) G. Fung, O. L. Mangasarian, and J. W. Shavlik. Knowledge-based support vector machine classifiers. In S. Becker, S. Thrun and K. Obermayer, eds, NIPS 15, pp. 521–528, 2003.
(Crammer et al, 2006) K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. J. of Mach. Learn. Res., 7:551–585, 2006.
(Freund and Schapire, 1999) Y. Freund and R. E. Schapire. Large margin classification using the perceptron algorithm. Mach. Learn., 37(3):277–296, 1999.
(Li and Long, 2002) Y. Li and P. M. Long. The relaxed online maximum margin algorithm. Mach. Learn., 46(1/3):361–387, 2002.
(Shabbeer et al, 2010) A. Shabbeer, L. Cowan, J. R. Driscoll, C. Ozcaglar, S. L. Vandenberg, B. Yener, and K. P. Bennett. TB-Lineage: An online tool for classification and analysis of strains of Mycobacterium tuberculosis complex. Unpublished manuscript, 2010.

Acknowledgements.

The authors would like to thank Dr. Lauren Cowan of the Centers for Disease Control (CDC) for providing the TB dataset and the expert-defined rules for lineage classification. We gratefully acknowledge the support of DARPA under grant HR0011-07-C-0060 and the NIH under grant 1-R01-LM009731-01.

Views and conclusions contained in this document are those of the authors and do not necessarily represent the official opinion or policies, either expressed or implied, of the US government or of DARPA.

SLIDE 45

KBSVMs: Deriving The Advice Constraints

We assume an expert provides polyhedral advice of the form

Dx ≤ d ⇒ w′x ≥ b.

We know p ⇒ q is equivalent to ¬p ∨ q: the implication holds exactly when its negation has no solution, i.e., when the homogenized system

Dx − dτ ≤ 0,  w′x − bτ < 0,  −τ < 0

has no solution (x, τ).

SLIDE 46

KBSVMs: Deriving The Advice Constraints

If the system

Dx − dτ ≤ 0,  w′x − bτ < 0,  −τ < 0

has no solution (x, τ), then by Motzkin's Theorem of the Alternative, the system

D′u + w = 0,  −d′u − b ≥ 0,  u ≥ 0

has a solution u.
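The certificate direction can be checked directly: for any x in the advice region and any solution u of the alternative system,

```latex
% Given Dx \le d,\; u \ge 0,\; D'u + w = 0,\; -d'u - b \ge 0:
w'x \;=\; (-D'u)'x \;=\; -u'(Dx) \;\ge\; -u'd \;\ge\; b .
```

The first inequality uses u ≥ 0 and Dx ≤ d, the second uses −d′u − b ≥ 0, so every point in the region is classified as w′x ≥ b.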