Machine Learning: Chenhao Tan
University of Colorado Boulder, Lecture 10
Slides adapted from Jordan Boyd-Graber and Chris Ketelsen

Roadmap. Last time: the linear SVM formulation when the data are linearly separable. This time: duality, slack variables, and Sequential Minimal Optimization.


  1. Slack variables: New Lagrangian

$$L(\mathbf{w}, b, \boldsymbol{\xi}, \boldsymbol{\alpha}, \boldsymbol{\beta}) = \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{m}\xi_i - \sum_{i=1}^{m}\alpha_i\left[y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1 + \xi_i\right] - \sum_{i=1}^{m}\beta_i\xi_i \qquad (10\text{--}12)$$

Taking the gradients ($\nabla_{\mathbf{w}} L$, $\nabla_b L$, $\nabla_{\xi_i} L$) and solving for zero gives us

$$\mathbf{w} = \sum_{i=1}^{m}\alpha_i y_i \mathbf{x}_i \quad (13), \qquad \sum_{i=1}^{m}\alpha_i y_i = 0 \quad (14), \qquad \alpha_i + \beta_i = C \quad (15)$$

  2. Slack variables: Simplifying the dual objective

Substitute the stationarity conditions

$$\mathbf{w} = \sum_{i=1}^{m}\alpha_i y_i \mathbf{x}_i, \qquad \alpha_i + \beta_i = C, \qquad \sum_{i=1}^{m}\alpha_i y_i = 0$$

back into the Lagrangian

$$L(\mathbf{w}, b, \boldsymbol{\xi}, \boldsymbol{\alpha}, \boldsymbol{\beta}) = \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{m}\xi_i - \sum_{i=1}^{m}\alpha_i\left[y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1 + \xi_i\right] - \sum_{i=1}^{m}\beta_i\xi_i$$
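The substitution is worth carrying out once; the sketch below uses nothing beyond the three stationarity conditions above. The $\xi_i$ terms collect the coefficient $C - \alpha_i - \beta_i = 0$, the $b$ term collects $-b\sum_i \alpha_i y_i = 0$, and the two quadratic terms combine:

$$L = \underbrace{\frac{1}{2}\|\mathbf{w}\|^2 - \sum_i \alpha_i y_i\,\mathbf{w}\cdot\mathbf{x}_i}_{=\,-\frac{1}{2}\|\mathbf{w}\|^2} + \sum_i \alpha_i = \sum_i \alpha_i - \frac{1}{2}\sum_i\sum_j \alpha_i\alpha_j y_i y_j (\mathbf{x}_i\cdot\mathbf{x}_j)$$

which is exactly the dual objective on the next slide.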

  3. Slack variables: Dual problem

$$\max_{\boldsymbol{\alpha}} \;\; \sum_{i=1}^{m}\alpha_i - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j (\mathbf{x}_j\cdot\mathbf{x}_i)$$

$$\text{s.t.} \quad C \ge \alpha_i \ge 0, \; i \in [1, m], \qquad \sum_i \alpha_i y_i = 0$$
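As a concreteness check, here is a minimal NumPy sketch (the names `dual_objective` and `is_dual_feasible` are mine, not from the slides) that evaluates this objective and its constraints for a given $\boldsymbol{\alpha}$:

```python
import numpy as np

def dual_objective(alpha, X, y):
    """Soft-margin SVM dual objective with a linear kernel."""
    K = X @ X.T  # Gram matrix of inner products x_j . x_i
    return alpha.sum() - 0.5 * np.sum(np.outer(alpha * y, alpha * y) * K)

def is_dual_feasible(alpha, y, C, tol=1e-8):
    """Check C >= alpha_i >= 0 and sum_i alpha_i y_i = 0."""
    return bool(np.all(alpha >= -tol) and np.all(alpha <= C + tol)
                and abs(alpha @ y) <= tol)
```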

  4. Slack variables: Karush-Kuhn-Tucker (KKT) conditions

Primal and dual feasibility:
$$y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad C \ge \alpha_i \ge 0, \qquad \beta_i \ge 0 \qquad (16)$$

Stationarity:
$$\mathbf{w} = \sum_{i=1}^{m}\alpha_i y_i \mathbf{x}_i, \qquad \sum_{i=1}^{m}\alpha_i y_i = 0, \qquad \alpha_i + \beta_i = C \qquad (17)$$

Complementary slackness:
$$\alpha_i\left[y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1 + \xi_i\right] = 0, \qquad \beta_i\xi_i = 0 \qquad (18)$$

  5. Slack variables: More on complementary slackness

$$\alpha_i\left[y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1 + \xi_i\right] = 0, \qquad \beta_i\xi_i = 0 \qquad (19)$$

• $\mathbf{x}_i$ satisfies the margin, $y_i(\mathbf{w}\cdot\mathbf{x}_i + b) > 1 \Rightarrow \alpha_i = 0$
• $\mathbf{x}_i$ does not satisfy the margin, $y_i(\mathbf{w}\cdot\mathbf{x}_i + b) < 1 \Rightarrow \alpha_i = C$
• $\mathbf{x}_i$ is on the margin, $y_i(\mathbf{w}\cdot\mathbf{x}_i + b) = 1 \Rightarrow 0 \le \alpha_i \le C$
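These three cases are exactly what SMO later checks when hunting for KKT violations. A small illustrative sketch (the helper name `point_status` and its tolerance handling are my own):

```python
def point_status(alpha_i, margin_i, C, tol=1e-6):
    """Classify a training point by complementary slackness.

    margin_i is y_i * (w . x_i + b); tol absorbs numerical error.
    """
    if alpha_i < tol and margin_i >= 1 - tol:
        return "outside margin (alpha = 0)"
    if alpha_i > C - tol and margin_i <= 1 + tol:
        return "inside or violating margin (alpha = C)"
    if tol <= alpha_i <= C - tol and abs(margin_i - 1) <= tol:
        return "on margin (support vector, 0 < alpha < C)"
    return "violates KKT: candidate for an SMO update"
```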

  6. Sequential Minimal Optimization: Outline
Duality, Slack variables, Sequential Minimal Optimization, Recap

  7. Sequential Minimal Optimization: Trivia
• Invented by John Platt in 1998 at Microsoft Research
• Called "Minimal" because it solves the smallest possible sub-problems: two multipliers at a time

  8. Sequential Minimal Optimization: Dual problem (restated)

$$\max_{\boldsymbol{\alpha}} \;\; \sum_{i=1}^{m}\alpha_i - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i\alpha_j y_i y_j (\mathbf{x}_j\cdot\mathbf{x}_i) \qquad \text{s.t.} \quad C \ge \alpha_i \ge 0, \quad \sum_i \alpha_i y_i = 0$$

  9. Brief interlude: Coordinate ascent

Coordinate ascent on the dual: loop over each training example and change $\alpha_i$ to maximize the objective above. Although coordinate ascent works well for lots of problems, here we have the constraint $\sum_i \alpha_i y_i = 0$: if every other multiplier is held fixed, that constraint pins $\alpha_i$ to a single value, so no one-coordinate move is possible. This is why SMO moves two multipliers at a time.

  10. Outline for SVM optimization (SMO)
1. Select two examples $i$, $j$
2. Update $\alpha_j$, $\alpha_i$ to maximize the dual objective

  11. Karush-Kuhn-Tucker (KKT) conditions (restated)

Primal and dual feasibility:
$$y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad C \ge \alpha_i \ge 0, \qquad \beta_i \ge 0 \qquad (20)$$

Stationarity:
$$\mathbf{w} = \sum_{i=1}^{m}\alpha_i y_i \mathbf{x}_i, \qquad \sum_{i=1}^{m}\alpha_i y_i = 0, \qquad \alpha_i + \beta_i = C \qquad (21)$$

Complementary slackness:
$$\alpha_i\left[y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1 + \xi_i\right] = 0, \qquad \beta_i\xi_i = 0 \qquad (22)$$

  12. Outline for SVM optimization (SMO): the pair constraint

Because $\sum_i \alpha_i y_i = 0$ must continue to hold, any update to the pair $(\alpha_i, \alpha_j)$ must conserve

$$y_i\alpha_i + y_j\alpha_j = y_i\alpha_i^{(\text{old})} + y_j\alpha_j^{(\text{old})} = \gamma$$

  13. Step 2: Optimize $\alpha_j$ (bounds)

1. Compute upper ($H$) and lower ($L$) bounds that ensure $0 \le \alpha_j \le C$.

If $y_i \ne y_j$:
$$L = \max(0, \alpha_j - \alpha_i) \quad (23), \qquad H = \min(C, C + \alpha_j - \alpha_i) \quad (24)$$

If $y_i = y_j$:
$$L = \max(0, \alpha_i + \alpha_j - C) \quad (25), \qquad H = \min(C, \alpha_j + \alpha_i) \quad (26)$$

The case split arises because the update for $\alpha_i$ is based on $y_i y_j$ (the sign matters); see the sketch below.
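A direct transcription of equations (23)-(26) as code (the function name `alpha_bounds` is mine):

```python
def alpha_bounds(alpha_i, alpha_j, y_i, y_j, C):
    """Box bounds [L, H] keeping alpha_j in [0, C] while preserving
    y_i*alpha_i + y_j*alpha_j = gamma (equations 23-26)."""
    if y_i != y_j:
        L = max(0.0, alpha_j - alpha_i)
        H = min(C, C + alpha_j - alpha_i)
    else:
        L = max(0.0, alpha_i + alpha_j - C)
        H = min(C, alpha_j + alpha_i)
    return L, H
```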

  14. Step 2: Optimize $\alpha_j$ (update)

Compute the errors for $i$ and $j$:
$$E_k \equiv f(\mathbf{x}_k) - y_k \qquad (27)$$
$$\eta = 2\,\mathbf{x}_i\cdot\mathbf{x}_j - \mathbf{x}_i\cdot\mathbf{x}_i - \mathbf{x}_j\cdot\mathbf{x}_j \qquad (28)$$

Then the new value for $\alpha_j$:
$$\alpha_j^{*} = \alpha_j^{(\text{old})} - \frac{y_j(E_i - E_j)}{\eta} \qquad (29)$$
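In code, step 2 might look like the sketch below (a linear kernel is assumed, and the final clip to $[L, H]$ applies the bounds from the previous slide, which is why they were computed first):

```python
import numpy as np

def update_alpha_j(X, y, alpha, b, i, j, L, H):
    """Step 2: errors, eta, and the new (clipped) alpha_j."""
    f = lambda x: (alpha * y) @ (X @ x) + b        # f(x) = sum_k alpha_k y_k (x_k . x) + b
    E_i, E_j = f(X[i]) - y[i], f(X[j]) - y[j]      # errors E_k = f(x_k) - y_k
    eta = 2 * X[i] @ X[j] - X[i] @ X[i] - X[j] @ X[j]
    if eta >= 0:                                   # degenerate pair: caller should skip it
        return alpha[j]
    a_j_new = alpha[j] - y[j] * (E_i - E_j) / eta  # equation (29)
    return float(np.clip(a_j_new, L, H))           # keep alpha_j inside [L, H]
```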

  15. Step 3: Optimize $\alpha_i$

Set $\alpha_i$:
$$\alpha_i^{*} = \alpha_i^{(\text{old})} + y_i y_j\left(\alpha_j^{(\text{old})} - \alpha_j^{*}\right) \qquad (30)$$

This balances out the move that we made for $\alpha_j$, so $y_i\alpha_i + y_j\alpha_j = \gamma$ still holds.

  16. Overall algorithm

Repeat until the KKT conditions are met:
    Iterate over $i = 1, \dots, m$:
        Choose $j$ randomly from the $m - 1$ other options
        Update $\alpha_i$, $\alpha_j$
Find $\mathbf{w}$, $b$ based on the stationarity conditions
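Putting the pieces together, here is a compact runnable sketch of this simplified SMO variant (random partner $j$, linear kernel). It is illustrative only: the two-sided bias update (`b1`/`b2`) follows the standard simplified-SMO recipe rather than these slides, which show only the $E_i$-based update of equation (36), and it is not Platt's full heuristic-driven algorithm.

```python
import numpy as np

def simplified_smo(X, y, C, tol=1e-5, max_passes=10, seed=0):
    """Simplified SMO for a linear soft-margin SVM; returns (alpha, b, w)."""
    rng = np.random.default_rng(seed)
    m = len(y)
    alpha, b = np.zeros(m), 0.0
    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(m):
            E_i = (alpha * y) @ (X @ X[i]) + b - y[i]
            # KKT violation: y_i f(x_i) < 1 with alpha_i < C, or y_i f(x_i) > 1 with alpha_i > 0
            if not ((y[i] * E_i < -tol and alpha[i] < C) or
                    (y[i] * E_i > tol and alpha[i] > 0)):
                continue                                  # i does not violate KKT: skip it
            j = int(rng.choice([k for k in range(m) if k != i]))
            E_j = (alpha * y) @ (X @ X[j]) + b - y[j]
            if y[i] != y[j]:                              # bounds, equations (23)-(26)
                L, H = max(0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
            else:
                L, H = max(0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
            eta = 2 * X[i] @ X[j] - X[i] @ X[i] - X[j] @ X[j]
            if L == H or eta >= 0:                        # no room to move / degenerate: skip
                continue
            a_i_old, a_j_old = alpha[i], alpha[j]
            alpha[j] = np.clip(a_j_old - y[j] * (E_i - E_j) / eta, L, H)  # equation (29)
            if abs(alpha[j] - a_j_old) < 1e-8:
                continue
            alpha[i] = a_i_old + y[i] * y[j] * (a_j_old - alpha[j])       # equation (30)
            b1 = b - E_i - y[i]*(alpha[i]-a_i_old)*(X[i]@X[i]) - y[j]*(alpha[j]-a_j_old)*(X[i]@X[j])
            b2 = b - E_j - y[i]*(alpha[i]-a_i_old)*(X[i]@X[j]) - y[j]*(alpha[j]-a_j_old)*(X[j]@X[j])
            b = b1 if 0 < alpha[i] < C else (b2 if 0 < alpha[j] < C else (b1 + b2) / 2)
            changed += 1
        passes = passes + 1 if changed == 0 else 0        # stop after a full quiet sweep
    w = (alpha * y) @ X                                   # stationarity: w = sum_i alpha_i y_i x_i
    return alpha, b, w
```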

  17. Iterations / details

• What if $i$ doesn't violate the KKT conditions? Skip it!
• What if $\eta \ge 0$? Skip it! (Should not happen except for numerical instability.)
• When do we stop? When we can go through all the $\alpha$'s without changing anything.

  18. SMO algorithm: worked example

Training data (indices as used on the following slides; the slide's figure plots points 0-2 as positive and 3-5 as negative):

Positive: $\mathbf{x}_0 = (-2, 2)$, $\mathbf{x}_1 = (0, 4)$, $\mathbf{x}_2 = (2, 1)$
Negative: $\mathbf{x}_3 = (-2, -3)$, $\mathbf{x}_4 = (0, -1)$, $\mathbf{x}_5 = (2, -3)$

• Initially, all alphas are zero: $\boldsymbol{\alpha} = \langle 0, 0, 0, 0, 0, 0 \rangle$
• The intercept $b$ is also zero
• Capacity $C = \pi$
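To follow along in code (the index-to-coordinate mapping is reconstructed from the inner products used on the next slides):

```python
import numpy as np

# Indices 0-2 are positive, 3-5 negative, matching the slides' worked example.
X = np.array([[-2, 2], [0, 4], [2, 1],      # positive points x_0, x_1, x_2
              [-2, -3], [0, -1], [2, -3]],  # negative points x_3, x_4, x_5
             dtype=float)
y = np.array([+1, +1, +1, -1, -1, -1], dtype=float)
alpha = np.zeros(6)   # all multipliers start at zero
b = 0.0               # intercept starts at zero
C = np.pi             # capacity, as on the slide
```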

  19. SMO optimization for $i = 0$, $j = 4$: predictions and step size

• Prediction: $f(\mathbf{x}_0) = 0$
• Prediction: $f(\mathbf{x}_4) = 0$
• Error: $E_0 = -1$
• Error: $E_4 = +1$

$$\eta = 2\langle\mathbf{x}_0, \mathbf{x}_4\rangle - \langle\mathbf{x}_0, \mathbf{x}_0\rangle - \langle\mathbf{x}_4, \mathbf{x}_4\rangle = 2\cdot(-2) - 8 - 1 = -13$$
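These values are easy to check numerically (the snippet repeats the setup so it is self-contained):

```python
import numpy as np

X = np.array([[-2, 2], [0, 4], [2, 1], [-2, -3], [0, -1], [2, -3]], dtype=float)
y = np.array([+1, +1, +1, -1, -1, -1], dtype=float)
alpha, b = np.zeros(6), 0.0

f = lambda k: (alpha * y) @ (X @ X[k]) + b    # all alphas are zero, so f(x) = 0 everywhere
E0, E4 = f(0) - y[0], f(4) - y[4]
eta = 2 * X[0] @ X[4] - X[0] @ X[0] - X[4] @ X[4]
print(E0, E4, eta)                             # -1.0  1.0  -13.0
```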

  20. SMO optimization for $i = 0$, $j = 4$: bounds

• Lower and upper bounds for $\alpha_j$ (here $y_0 \ne y_4$):
$$L = \max(0, \alpha_j - \alpha_i) = 0 \qquad (31)$$
$$H = \min(C, C + \alpha_j - \alpha_i) = \pi \qquad (32)$$

  21. SMO optimization for $i = 0$, $j = 4$: $\alpha$ update

New value for $\alpha_j$:
$$\alpha_j^{*} = \alpha_j - \frac{y_j(E_i - E_j)}{\eta} = -\frac{2}{\eta} = \frac{2}{13} \qquad (33)$$

New value for $\alpha_i$:
$$\alpha_i^{*} = \alpha_i + y_i y_j\left(\alpha_j^{(\text{old})} - \alpha_j^{*}\right) = \alpha_j^{*} = \frac{2}{13} \qquad (34)$$

  22. Margin
[Figure: the decision boundary and margin after the first pair update.]

  23. Find weight vector and bias

• Weight vector:
$$\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i = \frac{2}{13}\begin{pmatrix}-2 \\ 2\end{pmatrix} - \frac{2}{13}\begin{pmatrix}0 \\ -1\end{pmatrix} = \begin{pmatrix}-4/13 \\ 6/13\end{pmatrix} \qquad (35)$$

• Bias:
$$b = b^{(\text{old})} - E_i - y_i\left(\alpha_i^{*} - \alpha_i^{(\text{old})}\right)\mathbf{x}_i\cdot\mathbf{x}_i - y_j\left(\alpha_j^{*} - \alpha_j^{(\text{old})}\right)\mathbf{x}_i\cdot\mathbf{x}_j \qquad (36)$$
$$= 1 - \frac{2}{13}\cdot 8 + \frac{2}{13}\cdot(-2) = -\frac{7}{13} \approx -0.54 \qquad (37)$$
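The full first pair update, end to end, as a runnable check (self-contained; it reproduces $\alpha^{*} = 2/13$, $\mathbf{w} = (-4/13, 6/13)$, and $b \approx -0.54$):

```python
import numpy as np

X = np.array([[-2, 2], [0, 4], [2, 1], [-2, -3], [0, -1], [2, -3]], dtype=float)
y = np.array([+1, +1, +1, -1, -1, -1], dtype=float)
alpha, b, C = np.zeros(6), 0.0, np.pi
i, j = 0, 4

f = lambda k: (alpha * y) @ (X @ X[k]) + b
E_i, E_j = f(i) - y[i], f(j) - y[j]                                 # -1, +1
eta = 2 * X[i] @ X[j] - X[i] @ X[i] - X[j] @ X[j]                   # -13
L, H = max(0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i]) # y_i != y_j case
a_i_old, a_j_old = alpha[i], alpha[j]
alpha[j] = np.clip(a_j_old - y[j] * (E_i - E_j) / eta, L, H)        # 2/13
alpha[i] = a_i_old + y[i] * y[j] * (a_j_old - alpha[j])             # 2/13
w = (alpha * y) @ X                                                 # [-4/13, 6/13]
b = b - E_i - y[i]*(alpha[i]-a_i_old)*(X[i]@X[i]) - y[j]*(alpha[j]-a_j_old)*(X[i]@X[j])
print(alpha[i], alpha[j], w, b)   # 0.1538  0.1538  [-0.3077  0.4615]  -0.5385
```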

  24. SMO optimization for $i = 2$, $j = 4$

Let's skip the boring stuff:
• $E_2 = -1.69$
• $E_4 = 0.00$
• $\eta = -8$
• $\alpha_4 = \alpha_4^{(\text{old})} - \dfrac{y_j(E_i - E_j)}{\eta} = 0.15 + \dfrac{-1.69}{-8} = 0.37$
• $\alpha_2 = \alpha_2^{(\text{old})} + y_i y_j\left(\alpha_4^{(\text{old})} - \alpha_4\right) = 0 - (0.15 - 0.37) = 0.21$
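Continuing the numerical check (the state after the first update is hard-coded so the snippet stands alone):

```python
import numpy as np

X = np.array([[-2, 2], [0, 4], [2, 1], [-2, -3], [0, -1], [2, -3]], dtype=float)
y = np.array([+1, +1, +1, -1, -1, -1], dtype=float)
alpha = np.array([2/13, 0, 0, 0, 2/13, 0])    # state after the (i=0, j=4) update
b = -7/13
i, j = 2, 4

f = lambda k: (alpha * y) @ (X @ X[k]) + b
E_i, E_j = f(i) - y[i], f(j) - y[j]                      # -1.69, 0.00
eta = 2 * X[i] @ X[j] - X[i] @ X[i] - X[j] @ X[j]        # -8
a_j_old = alpha[j]
alpha[j] = a_j_old - y[j] * (E_i - E_j) / eta            # ~0.37
alpha[i] = alpha[i] + y[i] * y[j] * (a_j_old - alpha[j]) # ~0.21
print(E_i, eta, alpha[j], alpha[i])
```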

  25. Margin
[Figure: the updated decision boundary and margin after the second pair update.]

  26. Weight vector and bias

• Bias: $b = -0.12$
• Weight vector:
$$\mathbf{w} = \sum_i^{m} \alpha_i y_i \mathbf{x}_i = \begin{pmatrix}0.12 \\ 0.88\end{pmatrix} \qquad (38)$$

  27. Another iteration ($i = 0$, $j = 2$)

  28. SMO algorithm
• A convenient approach for solving the vanilla, slack-variable, and kernel formulations
• The problem is convex
• Scalable to large datasets (SMO underlies the libsvm solver that scikit-learn wraps)
• What we didn't do in this example:
    ◦ Check the KKT conditions
    ◦ Randomly choose indices

  29. Recap: Outline
Duality, Slack variables, Sequential Minimal Optimization, Recap
