Sequential Minimal Optimization
Seth Terashima
April 23, 2012


The problem
The story so far: We've had fun mathing our way to the dual, but... It would be nice if we could actually do something with it. So let's take a look at Sequential Minimal Optimization.

The problem
We want to find $\lambda$ that minimizes
$$\Psi(\lambda) = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} y_i y_j \langle x_i, x_j \rangle \lambda_i \lambda_j - \sum_{i=1}^{N} \lambda_i$$
subject to the constraints
$$0 \le \lambda_i \le C \ \text{(for all } i\text{)} \quad \text{and} \quad \sum_{i=1}^{N} y_i \lambda_i = 0.$$
Each $y_i = \pm 1$ is the class of the training example $x_i$, each $\lambda_i$ is the corresponding Lagrange multiplier, and $C$ controls how "soft" we are willing to let the margin be.

A solution for the constraint-free case
We can minimize $F(\lambda_1, \ldots, \lambda_n)$ one coordinate at a time. Starting with some point $\lambda$:
- Choose some coordinate $j \in \{1, 2, \ldots, n\}$
- View $F$ as a single-variable function of $\lambda_j$ by fixing the other $n - 1$ inputs
- Minimize $F$ with respect to $\lambda_j$
- Update $\lambda$ by setting $\lambda_j$ to its optimal value, then repeat the process for other values of $j$

Example
$F(x, y) = x^2 + xy + y^2$
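
The coordinate-at-a-time idea can be tried directly on this example. The following is a minimal sketch, not taken from the slides: the function name `coordinate_descent`, the starting point, and the number of sweeps are all illustrative assumptions. Each update is the exact single-variable minimizer obtained by setting one partial derivative to zero.

```python
def F(x, y):
    return x**2 + x * y + y**2


def coordinate_descent(x, y, sweeps=20):
    """Minimize F by exact minimization along one coordinate at a time."""
    for _ in range(sweeps):
        # With y fixed: dF/dx = 2x + y = 0  =>  x = -y / 2
        x = -y / 2.0
        # With x fixed: dF/dy = x + 2y = 0  =>  y = -x / 2
        y = -x / 2.0
    return x, y


# Hypothetical starting point; the true minimizer is (0, 0).
x_min, y_min = coordinate_descent(1.0, 1.0)
```

Each full sweep shrinks `y` by a factor of four here, so the iterates approach the minimizer at the origin quickly.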

Great, but the solution doesn't meet the constraints.

First constraint
Our first constraint is $\sum_{i=1}^{N} y_i \lambda_i = 0$. The fix: substitution.
1. Choose two coordinates, $j$ and $i$.
2. Solve for $\lambda_i$ in terms of $\lambda_j$ (and the other multipliers):
$$\lambda_i = -\frac{1}{y_i} \sum_{k \ne i} y_k \lambda_k = -\frac{y_j}{y_i} \lambda_j + \text{garbage},$$
where "garbage" collects the terms that do not depend on $\lambda_j$.
3. We are now back to optimizing a single-variable function.
E.g., if $j = 1$, $i = 2$, and $y_1 = -y_2$, then $f(\lambda_1) = F(\lambda_1, \lambda_1 + \text{garbage}, \lambda_3, \ldots, \lambda_N)$ meets the first constraint for all values of $\lambda_1$.
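
A small numerical check of the substitution, as a sketch only: the helper name `set_pair` and the sample labels and multipliers are made up for illustration. Whatever value is chosen for one multiplier, the partner is re-solved so the linear constraint keeps holding.

```python
import numpy as np


def set_pair(lam, y, j, i, new_lam_j):
    """Set lam[j], then re-solve lam[i] so that sum(y * lam) stays 0."""
    lam = lam.copy()
    lam[j] = new_lam_j
    # Contribution of every multiplier except lam[i].
    rest = np.dot(y, lam) - y[i] * lam[i]
    # lam_i = -(1 / y_i) * sum_{k != i} y_k * lam_k
    lam[i] = -rest / y[i]
    return lam


# Hypothetical data: the starting point already satisfies sum(y * lam) = 0.
y = np.array([1.0, -1.0, 1.0, -1.0])
lam = np.array([0.5, 0.5, 0.25, 0.25])
moved = set_pair(lam, y, j=0, i=1, new_lam_j=0.75)
```

Because the two chosen labels have opposite signs in this example, the partner multiplier moves by the same amount as the one that was set.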

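Second constraint
The second constraint says that for all $i$, $0 \le \lambda_i \le C$. This is just a boundary condition on the single-variable problem: the pair of multipliers being updated is confined to a line segment inside the box $[0, C]^2$. (The slope of that segment could be negative.)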
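
The boundary condition can be turned into an explicit interval for the multiplier that is re-solved. The sketch below uses the case split from Platt's SMO paper rather than anything stated on the slide; the function name `segment_bounds` and its argument names are assumptions made for illustration.

```python
def segment_bounds(lam_j, lam_i, y_j, y_i, C):
    """Range [lo, hi] that lam_i may take while both multipliers stay in [0, C]."""
    if y_j != y_i:
        # Opposite labels: lam_i - lam_j is fixed, so the segment has slope +1.
        diff = lam_i - lam_j
        return max(0.0, diff), min(C, C + diff)
    # Equal labels: lam_i + lam_j is fixed, so the segment has slope -1.
    total = lam_i + lam_j
    return max(0.0, total - C), min(C, total)
```

Any unconstrained minimizer that falls outside the returned interval is moved to the nearer endpoint.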

To recap, we are trying to minimize
$$\Psi(\lambda) = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} y_i y_j \langle x_i, x_j \rangle \lambda_i \lambda_j - \sum_{i=1}^{N} \lambda_i$$
one coordinate at a time (but also changing a second coordinate to meet the linear constraint). When $j = 1$, $i = 2$, and $y_1 = -y_2$, we are minimizing
$$f(\lambda_1) = \Psi(\lambda_1, \lambda_1 + \text{garbage}, \lambda_3, \ldots, \lambda_N) = c_2 \lambda_1^2 + c_1 \lambda_1 + c_0.$$
We can do this analytically (read: quickly)!

Concavity is given by the second derivative:
$$f''(\lambda_1) = \langle x_1, x_1 \rangle + \langle x_2, x_2 \rangle - 2 \langle x_1, x_2 \rangle$$
If this is positive, find the global minimum
$$\lambda_2' = \lambda_2 + \frac{y_2 (E_1 - E_2)}{f''(\lambda_1)}$$
(where $E_k = \hat{y}_k - y_k$), then use the closest $\lambda_1^{\text{new}}$ allowed by the boundary conditions. Set
$$\lambda^{\text{new}} = (\lambda_1^{\text{new}}, \lambda_1^{\text{new}} + \text{garbage}, \lambda_3, \ldots, \lambda_N).$$
Choose new values for $i$, $j$, rinse, repeat.
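
The analytic step can be sketched in a few lines for a linear kernel. This is an illustration under assumptions, not the presenter's code: the name `pair_step` is hypothetical, the interval `[lo, hi]` is assumed to come from the box constraint, and the sketch clips the second multiplier (as Platt's paper does), after which the first follows from the linear constraint.

```python
import numpy as np


def pair_step(x1, x2, y2, lam2, E1, E2, lo, hi):
    """Clipped new value of lam2, or None if the curvature is not positive."""
    # f'' = <x1, x1> + <x2, x2> - 2 <x1, x2>, i.e. ||x1 - x2||^2 for a linear kernel
    curvature = np.dot(x1, x1) + np.dot(x2, x2) - 2.0 * np.dot(x1, x2)
    if curvature <= 0:
        return None
    # Unconstrained minimizer along the constraint line.
    unclipped = lam2 + y2 * (E1 - E2) / curvature
    # Closest value allowed by the boundary conditions.
    return float(np.clip(unclipped, lo, hi))
```

Returning `None` for non-positive curvature is a design choice of this sketch; it simply signals that the pair should be skipped or handled separately.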

So how do we choose $j$ and $i$ for each iteration?
- There is not a clear-cut solution
- We need some heuristics
And how do we decide when we're done?
- Knowing your destination is a good first step towards getting there.

Choosing j
A solution value for $\lambda$ has the following properties (the KKT conditions):
$$\lambda_j = 0 \implies y_j \hat{y}_j \ge 1 \qquad (1)$$
$$\lambda_j = C \implies y_j \hat{y}_j \le 1 \qquad (2)$$
$$0 < \lambda_j < C \implies y_j \hat{y}_j = 1 \qquad (3)$$
We just want to be "close enough" (within $\varepsilon \approx 0.001$) for all $j$.
- If there is some $j$ that violates these, $j$ is a candidate for optimization.
- Priority is given to "unbound" multipliers (when $0 < \lambda_j < C$)
- Multipliers tend to become bound over time (why?)
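
Conditions (1) to (3) translate into a short violation test. The following is a sketch under assumptions: the function name `violates_kkt` and its signature are hypothetical, and the two branches merge the three conditions into "the margin must be at least 1 unless the multiplier is at C" and "the margin must be at most 1 unless the multiplier is at 0".

```python
def violates_kkt(lam_j, y_j, y_hat_j, C, eps=1e-3):
    """True if multiplier j breaks conditions (1)-(3) by more than eps."""
    margin = y_j * y_hat_j
    if lam_j < C and margin < 1.0 - eps:
        # (1) and (3): margin must be >= 1 whenever lam_j < C.
        return True
    if lam_j > 0 and margin > 1.0 + eps:
        # (2) and (3): margin must be <= 1 whenever lam_j > 0.
        return True
    return False
```

A training loop would scan for indices where this returns `True` and stop once none remain.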

Choosing i
Recall that the global minimum of $f(\lambda_j)$ has value
$$\lambda_i' = \lambda_i + \frac{y_i (E_j - E_i)}{f''(\lambda_j)}.$$
After choosing $j$, we choose the $i$ that maximizes $|E_j - E_i|$. Intuitively, this heuristic helps "move" $\lambda_i$ by a large amount each iteration.
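
The second-choice heuristic is a single argmax over cached errors. A minimal sketch, assuming the errors $E_k$ are kept in a NumPy array; the name `choose_second` and the sample values are illustrative only.

```python
import numpy as np


def choose_second(j, errors):
    """Index i != j that maximizes |E_j - E_i|."""
    gaps = np.abs(errors[j] - errors)
    gaps[j] = -1.0  # never pick j itself
    return int(np.argmax(gaps))


# Hypothetical cached errors E_k for four training points.
errors = np.array([0.2, -0.9, 0.5, 0.1])
partner = choose_second(0, errors)
```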

Recomputing the offset
- Our model is $\hat{y} = w \cdot x - b$
- Although $b$ is not part of the dual (why not?), we need $b$ to evaluate $E_k$ and the KKT conditions
- After each iteration, we update $b$ to be halfway between the values that would make $x_i$ and $x_j$ support vectors
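
One way to compute the two candidate offsets and their midpoint is sketched below. The formulas follow Platt's SMO paper under the convention $\hat{y} = w \cdot x - b$; the function name `offset_candidates`, the argument names, and the three-point data set are assumptions for illustration. Each candidate is the offset that drives the error on one of the two updated points to zero.

```python
import numpy as np


def offset_candidates(b, E_j, E_i, y_j, y_i, d_j, d_i, K_jj, K_ji, K_ii):
    """Offsets that zero the error on x_j and on x_i after the update.

    d_j and d_i are the changes (new minus old) in lam_j and lam_i.
    """
    b_j = b + E_j + y_j * d_j * K_jj + y_i * d_i * K_ji
    b_i = b + E_i + y_j * d_j * K_ji + y_i * d_i * K_ii
    return b_j, b_i


# Hypothetical problem with a linear kernel, updating the pair j = 0, i = 1.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, -1.0, 1.0])
old = np.array([0.2, 0.3, 0.1])
new = np.array([0.4, 0.5, 0.1])  # still satisfies sum(y * lam) = 0
b_old = 0.1

K = X @ X.T
E_old = K @ (y * old) - b_old - y  # E_k = y_hat_k - y_k
b_j, b_i = offset_candidates(b_old, E_old[0], E_old[1], y[0], y[1],
                             new[0] - old[0], new[1] - old[1],
                             K[0, 0], K[0, 1], K[1, 1])
b_new = 0.5 * (b_j + b_i)  # halfway between the two candidates
```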

Benchmarks
- Algorithms completed when all KKT conditions were met within $\varepsilon = 0.001$
- The chunking algorithm used in the benchmark used a different convergence condition, but Platt was conservative.
- SMO showed better scaling than chunking, usually by a factor of $N$
- SMO time is dominated by SVM evaluations, which are very fast with linear SVMs
- SMO performed over 1,000 times faster than contemporary state-of-the-art alternatives on real-world data. Not bad.

Conclusion
- We needed an efficient way to minimize the dual
- SMO accomplishes this by changing two multipliers at a time until the KKT conditions are met
- SMO is reasonably simple and very fast compared to previous methods
- Heuristics might be a good place to look for improvements
