Sequential Minimal Optimization. Seth Terashima, April 23, 2012

  1. Sequential Minimal Optimization. Seth Terashima, April 23, 2012

  2. The problem. The story so far: we've had fun mathing our way to the dual, but... it would be nice if we could actually do something with it. So let's take a look at Sequential Minimal Optimization.

  3. The problem. We want to find $\lambda$ that minimizes

     $$\Psi(\lambda) = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} y_i y_j \langle x_i, x_j \rangle \lambda_i \lambda_j - \sum_{i=1}^{N} \lambda_i$$

     subject to the constraints $0 \le \lambda_i \le C$ (for all $i$) and $\sum_{i=1}^{N} y_i \lambda_i = 0$. Each $y_i = \pm 1$ is the class of the training point $x_i$, each $\lambda_i$ is the corresponding Lagrange multiplier, and $C$ controls how "soft" we are willing to let the margin be.
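As a concrete check on the notation, here is a minimal Python sketch that evaluates $\Psi$ and tests both constraints. It assumes a plain inner product (a linear kernel) and vectors stored as Python lists; the function names are hypothetical, not taken from the slides.

```python
def linear_kernel(a, b):
    """Inner product <a, b> of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def dual_objective(lam, X, y):
    """Psi(lam) = 1/2 * sum_i sum_j y_i y_j <x_i, x_j> lam_i lam_j - sum_i lam_i."""
    n = len(y)
    quad = sum(
        y[i] * y[j] * linear_kernel(X[i], X[j]) * lam[i] * lam[j]
        for i in range(n)
        for j in range(n)
    )
    return 0.5 * quad - sum(lam)


def is_feasible(lam, y, C, tol=1e-9):
    """Check 0 <= lam_i <= C for every i and sum_i y_i lam_i = 0, up to tol."""
    in_box = all(-tol <= l <= C + tol for l in lam)
    balanced = abs(sum(yi * li for yi, li in zip(y, lam))) <= tol
    return in_box and balanced
```

For the two points $x_1 = (1, 0)$, $x_2 = (-1, 0)$ with $y = (1, -1)$, the feasible multipliers satisfy $\lambda_1 = \lambda_2 = \lambda$, so $\Psi = 2\lambda^2 - 2\lambda$, which is smallest at $\lambda = 0.5$ with $\Psi = -0.5$.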

  4. A solution for the constraint-free case. We can minimize $F(\lambda_1, \dots, \lambda_n)$ one coordinate at a time. Starting with some point $\lambda$: choose some coordinate $j \in \{1, 2, \dots, n\}$; view $F$ as a single-variable function of $\lambda_j$ by fixing the other $n - 1$ inputs; minimize $F$ with respect to $\lambda_j$; update $\lambda$ by setting $\lambda_j$ to its optimal value, then repeat the process for other values of $j$.
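When $F$ is a quadratic $F(\lambda) = \tfrac{1}{2}\lambda^\top A \lambda - r^\top \lambda$ (an assumption made for this sketch, with $A$ symmetric positive definite), each one-variable problem has a closed-form answer: setting $\partial F / \partial \lambda_j = 0$ gives $\lambda_j = (r_j - \sum_{k \ne j} A_{jk} \lambda_k) / A_{jj}$. Cycling through the coordinates this way is the same as Gauss-Seidel iteration on $A\lambda = r$, which converges for such $A$. The function name is hypothetical.

```python
def coordinate_descent(A, r, lam, sweeps=100):
    """Minimize F(lam) = 1/2 lam^T A lam - r^T lam one coordinate at a time.

    Assumes A is symmetric positive definite, so every single-variable
    problem is a parabola opening upward with a unique minimizer.
    """
    lam = list(lam)
    n = len(lam)
    for _ in range(sweeps):
        for j in range(n):
            # Hold every other coordinate fixed; dF/dlam_j = 0 gives lam_j.
            rest = sum(A[j][k] * lam[k] for k in range(n) if k != j)
            lam[j] = (r[j] - rest) / A[j][j]
    return lam
```

With $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ and $r = 0$, $F$ is exactly the function $x^2 + xy + y^2$ used in the example slides.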

  5. Example: $F(x, y) = x^2 + xy + y^2$ [figure spanning slides 5-10 not recoverable]
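The coordinate-wise method from slide 4 is easy to trace by hand on this function. Holding $y$ fixed, $\partial F / \partial x = 2x + y = 0$ gives $x = -y/2$; by symmetry, holding $x$ fixed gives $y = -x/2$. The sketch below records every point visited; the starting point $(1, 1)$ is an arbitrary choice and the function name is hypothetical.

```python
def trace_coordinate_descent(x, y, sweeps):
    """Coordinate-wise minimization of F(x, y) = x^2 + x*y + y^2.

    Returns every point visited, alternating an x-step and a y-step.
    """
    points = [(x, y)]
    for _ in range(sweeps):
        x = -y / 2          # argmin over x with y held fixed
        points.append((x, y))
        y = -x / 2          # argmin over y with x held fixed
        points.append((x, y))
    return points


for p in trace_coordinate_descent(1.0, 1.0, sweeps=3):
    print(p)
```

Each full sweep shrinks both coordinates by a factor of 4, so the axis-aligned steps staircase toward the true minimum at $(0, 0)$.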

  11. Great, but the solution doesn't meet the constraints.

  12. First constraint. Our first constraint is $\sum_{i=1}^{N} y_i \lambda_i = 0$. The fix: substitution. First, choose two coordinates, $j$ and $i$. Then solve for $\lambda_i$ in terms of $\lambda_j$ (and the other multipliers):

     $$\lambda_i = -\frac{1}{y_i} \sum_{k \ne i} y_k \lambda_k = -\frac{y_j}{y_i} \lambda_j + \text{garbage},$$

     where garbage $= -y_i \sum_{k \ne i, j} y_k \lambda_k$ depends on neither $\lambda_i$ nor $\lambda_j$ (recall $1/y_i = y_i$). We are now back to optimizing a single-variable function. E.g., if $j = 1$, $i = 2$, and $y_1 = -y_2$, then $f(\lambda_1) = F(\lambda_1, \lambda_1 + \text{garbage}, \lambda_3, \dots, \lambda_N)$ meets the first constraint for all values of $\lambda_1$.
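In code, the substitution means that whenever $\lambda_j$ changes by $\Delta$, $\lambda_i$ must change by $-(y_j / y_i)\Delta = -y_i y_j \Delta$, so $y_i \lambda_i + y_j \lambda_j$, and with it the whole constraint, stays unchanged. A minimal sketch; the helper name is hypothetical.

```python
def move_pair(lam, y, i, j, new_lam_j):
    """Set lam[j] to new_lam_j and adjust lam[i] so sum_k y_k lam_k is unchanged.

    Uses 1/y_i = y_i (labels are +1 or -1), so the partner moves by -y_i*y_j*delta.
    Returns a new list; the box constraint 0 <= lam <= C is NOT enforced here.
    """
    lam = list(lam)
    delta = new_lam_j - lam[j]
    lam[j] = new_lam_j
    lam[i] -= y[i] * y[j] * delta
    return lam


y = [1, -1, 1]
lam = [0.3, 0.5, 0.2]          # sum_k y_k lam_k = 0.3 - 0.5 + 0.2 = 0
moved = move_pair(lam, y, i=1, j=0, new_lam_j=0.6)
print(moved)                   # lam[1] grows with lam[0] because y_0 = -y_1
print(sum(yk * lk for yk, lk in zip(y, moved)))  # stays 0 up to rounding
```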

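  13. Second constraint. The second constraint says that for all $i$, $0 \le \lambda_i \le C$. This is just a boundary condition: the pair $(\lambda_j, \lambda_i)$ must stay inside the box $[0, C]^2$ while it moves along the line from slide 12. (The slope of that line could be negative: when $y_i = y_j$, $\lambda_i = -\lambda_j + \text{garbage}$.)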
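Intersecting that line with the box turns the boundary condition into an interval $[L, H]$ for $\lambda_i$. If $y_i \ne y_j$, the difference $\lambda_i - \lambda_j$ is fixed; if $y_i = y_j$, the sum $\lambda_i + \lambda_j$ is fixed. Requiring both multipliers to lie in $[0, C]$ gives the bounds below. A minimal sketch with hypothetical names:

```python
def clip_bounds(lam_i, lam_j, y_i, y_j, C):
    """Interval [L, H] for lam_i when lam_j moves with it so that
    y_i*lam_i + y_j*lam_j stays fixed and both stay in [0, C]."""
    if y_i != y_j:
        d = lam_i - lam_j            # lam_i - lam_j is constant along the line
        return max(0.0, d), min(C, C + d)
    s = lam_i + lam_j                # lam_i + lam_j is constant along the line
    return max(0.0, s - C), min(C, s)


def clip(value, low, high):
    """Closest point of [low, high] to value."""
    return min(high, max(low, value))
```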

  14. To recap, we are trying to minimize

     $$\Psi(\lambda) = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} y_i y_j \langle x_i, x_j \rangle \lambda_i \lambda_j - \sum_{i=1}^{N} \lambda_i$$

     one coordinate at a time (but also changing a second coordinate to meet the linear constraint). When $j = 1$, $i = 2$, and $y_1 = -y_2$, we are minimizing

     $$f(\lambda_1) = \Psi(\lambda_1, \lambda_1 + \text{garbage}, \lambda_3, \dots, \lambda_N) = c_2 \lambda_1^2 + c_1 \lambda_1 + c_0.$$

     We can do this analytically (read: quickly)!
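To see where the curvature on the next slide comes from, write $K_{ij} = \langle x_i, x_j \rangle$ (shorthand introduced here) and let $g$ denote the garbage term. Only the $(1,1)$, $(2,2)$ and $(1,2)$ terms of $\Psi$ can produce $\lambda_1^2$; substituting $\lambda_2 = \lambda_1 + g$ and using $y_1 y_2 = -1$ and $y_k^2 = 1$:

$$
\begin{aligned}
\tfrac{1}{2}\left(K_{11}\lambda_1^2 + K_{22}\lambda_2^2 + 2 y_1 y_2 K_{12}\lambda_1\lambda_2\right)
&= \tfrac{1}{2}\left(K_{11}\lambda_1^2 + K_{22}(\lambda_1 + g)^2 - 2K_{12}\lambda_1(\lambda_1 + g)\right) \\
&= \tfrac{1}{2}\left(K_{11} + K_{22} - 2K_{12}\right)\lambda_1^2 + (\text{terms linear in } \lambda_1 \text{ or constant}).
\end{aligned}
$$

So $c_2 = \tfrac{1}{2}(K_{11} + K_{22} - 2K_{12})$ and $f''(\lambda_1) = 2c_2 = K_{11} + K_{22} - 2K_{12} = \lVert x_1 - x_2 \rVert^2 \ge 0$.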

  15. Concavity is given by the second derivative: $f''(\lambda_1) = \langle x_1, x_1 \rangle + \langle x_2, x_2 \rangle - 2 \langle x_1, x_2 \rangle$. If this is positive, find the global minimum, which is attained where

     $$\lambda_2' = \lambda_2 + \frac{y_2 (E_1 - E_2)}{f''(\lambda_1)} \qquad (\text{where } E_k = \hat{y}_k - y_k);$$

     then, using $\lambda_2 = \lambda_1 + \text{garbage}$, take the closest $\lambda_1^{\text{new}}$ allowed by the boundary conditions. Set $\lambda^{\text{new}} = (\lambda_1^{\text{new}}, \lambda_1^{\text{new}} + \text{garbage}, \lambda_3, \dots, \lambda_N)$. Choose new values for $i$, $j$; rinse, repeat.
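Putting slides 12 to 15 together, one analytic step on a pair might look like the sketch below. It generalizes the slide's $j = 1$, $i = 2$ case to any pair. It works from a precomputed kernel matrix `K[p][q]` $= \langle x_p, x_q \rangle$ and an error list `E`, and it simply gives up on the pair when $f'' \le 0$, a case the slide does not cover. These choices, and all names, are assumptions of the sketch.

```python
def take_step(lam, y, K, E, i, j, C):
    """One analytic step on the pair (i, j).

    lam: multipliers, y: labels (+1/-1), K: kernel matrix K[p][q] = <x_p, x_q>,
    E: errors E_k = yhat_k - y_k. Returns a new multiplier list, or None when
    this sketch cannot make progress on the pair.
    """
    eta = K[i][i] + K[j][j] - 2 * K[i][j]       # f'' along the constraint line
    if eta <= 0:
        return None                             # not handled in this sketch
    # Interval for lam_i that keeps both multipliers in [0, C].
    if y[i] != y[j]:
        low, high = max(0.0, lam[i] - lam[j]), min(C, C + lam[i] - lam[j])
    else:
        low, high = max(0.0, lam[i] + lam[j] - C), min(C, lam[i] + lam[j])
    if high <= low:
        return None
    # Unconstrained minimizer along the line, then the closest allowed value.
    new_i = lam[i] + y[i] * (E[j] - E[i]) / eta
    new_i = min(high, max(low, new_i))
    new = list(lam)
    new[i] = new_i
    new[j] = lam[j] + y[i] * y[j] * (lam[i] - new_i)   # keep sum_k y_k lam_k fixed
    return new
```

For $x_1 = (1, 0)$, $x_2 = (-1, 0)$ with $y = (1, -1)$, starting from $\lambda = 0$ (so $\hat{y} = 0$ and $E = (-1, 1)$), a single step moves both multipliers to $0.5$, the minimizer of $\Psi$ for this data.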

  16. So how do we choose $j$ and $i$ for each iteration? There is no clear-cut answer; we need some heuristics. And how do we decide when we're done? Knowing your destination is a good first step towards getting there.

  17. Choosing $j$. A solution value for $\lambda$ has the following properties (the KKT conditions):

     $$\begin{aligned} \lambda_j = 0 &\implies y_j \hat{y}_j \ge 1 && (1)\\ \lambda_j = C &\implies y_j \hat{y}_j \le 1 && (2)\\ 0 < \lambda_j < C &\implies y_j \hat{y}_j = 1 && (3) \end{aligned}$$

     We just want to be "close enough" (within $\varepsilon \approx 0.001$) for all $j$. If some $j$ violates these, $j$ is a candidate for optimization. Priority is given to "unbound" multipliers (those with $0 < \lambda_j < C$). Multipliers tend to become bound over time (why?).
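A tolerance-based version of conditions (1) to (3), plus the priority rule, fits in a few lines. The sketch below assumes the model outputs $\hat{y}_k$ have already been computed and folds the three cases into two checks: a multiplier below $C$ must not have $y_j \hat{y}_j < 1 - \varepsilon$, and a multiplier above $0$ must not have $y_j \hat{y}_j > 1 + \varepsilon$. Function names are hypothetical.

```python
def violates_kkt(lam_j, y_j, yhat_j, C, eps=1e-3):
    """True if lam_j breaks KKT conditions (1)-(3) by more than eps."""
    margin = y_j * yhat_j
    too_small = lam_j < C and margin < 1 - eps   # breaks (1) or (3)
    too_large = lam_j > 0 and margin > 1 + eps   # breaks (2) or (3)
    return too_small or too_large


def candidates_for_j(lam, y, yhat, C, eps=1e-3):
    """Indices violating KKT, unbound multipliers (0 < lam < C) first."""
    bad = [k for k in range(len(lam)) if violates_kkt(lam[k], y[k], yhat[k], C, eps)]
    return sorted(bad, key=lambda k: not (0 < lam[k] < C))   # sort is stable
```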

  18. Choosing $i$. Recall that the global minimum of $f(\lambda_j)$ is attained at

     $$\lambda_i' = \lambda_i + \frac{y_i (E_j - E_i)}{f''(\lambda_j)}.$$

     After choosing $j$, we choose the $i$ that maximizes $|E_j - E_i|$. Intuitively, this heuristic helps "move" $\lambda_i$ by a large amount each iteration, since $|E_j - E_i|$ is the numerator of the step $\lambda_i' - \lambda_i$.
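The selection rule is a single search over the current errors. A minimal sketch (hypothetical name); ties go to the lowest index, because Python's built-in `max` returns the first maximal item it encounters:

```python
def choose_i(j, E):
    """Second-choice heuristic: the index i != j that maximizes |E_j - E_i|."""
    return max((i for i in range(len(E)) if i != j), key=lambda i: abs(E[j] - E[i]))


E = [0.5, -1.2, 2.0, 0.1]
print(choose_i(0, E))   # partner for j = 0
print(choose_i(2, E))   # partner for j = 2
```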

  19. Recomputing the offset. Our model is $\hat{y} = w \cdot x - b$. Although $b$ is not part of the dual (why not?), we need $b$ to evaluate $E_k$ and the KKT conditions. After each iteration, we update $b$ to be halfway between the values that would make $x_i$ and $x_j$ support vectors (that is, the values giving $\hat{y}_i = y_i$ and $\hat{y}_j = y_j$, respectively).
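For a linear SVM, $w = \sum_k \lambda_k y_k x_k$, and the offset that puts $x_k$ exactly on the margin solves $w \cdot x_k - b = y_k$, i.e. $b = w \cdot x_k - y_k$. The halfway rule is then an average of two such values. A minimal sketch, assuming a linear kernel and list-based vectors (names are hypothetical):

```python
def weight_vector(lam, y, X):
    """w = sum_k lam_k * y_k * x_k for a linear SVM."""
    dim = len(X[0])
    return [sum(l * yk * xk[d] for l, yk, xk in zip(lam, y, X)) for d in range(dim)]


def halfway_offset(lam, y, X, i, j):
    """Average of the two offsets that make yhat_i = y_i and yhat_j = y_j."""
    w = weight_vector(lam, y, X)
    b_i = sum(wd * xd for wd, xd in zip(w, X[i])) - y[i]
    b_j = sum(wd * xd for wd, xd in zip(w, X[j])) - y[j]
    return (b_i + b_j) / 2
```

On $x_1 = (2, 0)$, $x_2 = (0, 0)$ with $y = (1, -1)$ and $\lambda = (0.5, 0.5)$, this gives $w = (1, 0)$ and $b = 1$, so $\hat{y}_1 = 1$ and $\hat{y}_2 = -1$.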

  20. Benchmarks. The algorithms were considered finished when all KKT conditions were met within $\varepsilon = 0.001$. The chunking algorithm used in the benchmark had a different convergence condition, but Platt was conservative. SMO showed better scaling than chunking, usually by a factor of $N$. SMO's running time is dominated by SVM evaluations, which are very fast with linear SVMs. SMO performed over 1000 times faster than contemporary state-of-the-art alternatives on real-world data. Not bad.

  21. Conclusion. We needed an efficient way to minimize the dual. SMO accomplishes this by changing two multipliers at a time until the KKT conditions are met. SMO is reasonably simple and very fast compared to previous methods. Heuristics might be a good place to look for improvements.
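To tie the pieces together, here is a toy end-to-end loop for a linear SVM. It is a teaching sketch under stated assumptions, not Platt's implementation: it recomputes every error from scratch on each step (no caching), skips pairs with $f'' \le 0$, uses only the slides' halfway rule for $b$, and its names are hypothetical.

```python
def smo_train(X, y, C=1.0, eps=1e-3, max_steps=10_000):
    """Toy SMO loop for a linear SVM, following the slides' heuristics.

    Repeats: pick a KKT violator j (unbound first), try partners i in order of
    decreasing |E_j - E_i|, take the first analytic step that makes progress,
    then reset b halfway. Stops when nothing violates KKT by more than eps, or
    when no pair can make progress. Returns (lam, b) with yhat = w.x - b.
    """
    n = len(y)
    K = [[sum(a * c for a, c in zip(X[p], X[q])) for q in range(n)] for p in range(n)]
    lam = [0.0] * n
    b = 0.0

    def yhat(k):
        return sum(lam[m] * y[m] * K[m][k] for m in range(n)) - b

    def violates(k):
        r = y[k] * yhat(k)
        return (lam[k] < C and r < 1 - eps) or (lam[k] > 0 and r > 1 + eps)

    def step(i, j, E):
        nonlocal b
        eta = K[i][i] + K[j][j] - 2 * K[i][j]
        if eta <= 0:
            return False                          # skipped in this sketch
        if y[i] != y[j]:
            low, high = max(0.0, lam[i] - lam[j]), min(C, C + lam[i] - lam[j])
        else:
            low, high = max(0.0, lam[i] + lam[j] - C), min(C, lam[i] + lam[j])
        new_i = min(high, max(low, lam[i] + y[i] * (E[j] - E[i]) / eta))
        if abs(new_i - lam[i]) < 1e-12:
            return False                          # no real progress on this pair
        lam[j] += y[i] * y[j] * (lam[i] - new_i)  # keep sum_k y_k lam_k fixed
        lam[i] = new_i
        # Halfway between the offsets that make yhat_i = y_i and yhat_j = y_j.
        b_i = sum(lam[m] * y[m] * K[m][i] for m in range(n)) - y[i]
        b_j = sum(lam[m] * y[m] * K[m][j] for m in range(n)) - y[j]
        b = (b_i + b_j) / 2
        return True

    for _ in range(max_steps):
        E = [yhat(k) - y[k] for k in range(n)]
        violators = sorted((k for k in range(n) if violates(k)),
                           key=lambda k: not (0 < lam[k] < C))
        progressed = False
        for j in violators:
            partners = sorted((i for i in range(n) if i != j),
                              key=lambda i: -abs(E[j] - E[i]))
            if any(step(i, j, E) for i in partners):
                progressed = True
                break
        if not progressed:                        # converged, or stuck
            break
    return lam, b
```

On the four points $(1, 1)$, $(2, 2)$, $(-1, -1)$, $(-2, -2)$ with labels $(1, 1, -1, -1)$, one step reaches $\lambda = (0.25, 0, 0.25, 0)$ and $b = 0$, i.e. $w = (0.5, 0.5)$, after which no KKT condition is violated.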
