
Optimization for Machine Learning, Lecture 2: Support Vector Machine Training (PowerPoint presentation)



  1. Title: Optimization for Machine Learning, Lecture 2: Support Vector Machine Training. S.V.N. (vishy) Vishwanathan, Purdue University, vishy@purdue.edu. July 11, 2012.

  2. Outline:
     1. Linear Support Vector Machines
     2. Stochastic Optimization
     3. Implicit Updates
     4. Dual Problem

  3.-5. Linear Support Vector Machines: Binary Classification. (Figure, built up over three slides: training points in two classes, labeled $y_i = +1$ and $y_i = -1$.)

  6. Linear Support Vector Machines: Binary Classification. For points $x_1$, $x_2$ on the two margin hyperplanes, $\langle w, x_1 \rangle + b = +1$ and $\langle w, x_2 \rangle + b = -1$, so $\langle w, x_1 - x_2 \rangle = 2$, i.e. $\left\langle \frac{w}{\|w\|}, x_1 - x_2 \right\rangle = \frac{2}{\|w\|}$. (Figure: the hyperplanes $\{x \mid \langle w, x \rangle + b = +1\}$, $\{x \mid \langle w, x \rangle + b = 0\}$, and $\{x \mid \langle w, x \rangle + b = -1\}$ separating the classes $y_i = +1$ and $y_i = -1$.)

  7. Linear Support Vector Machines: Optimization Problem.
     $\min_{w, b, \xi}\ \frac{\lambda}{2}\|w\|^2 + \frac{1}{m}\sum_{i=1}^{m}\xi_i$
     s.t. $y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for all $i$.
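For concreteness, this constrained quadratic program can be handed directly to an off-the-shelf convex solver. Below is a minimal sketch using CVXPY; the library choice and the function name train_svm_qp are illustrative assumptions, not from the slides.

```python
import cvxpy as cp
import numpy as np

def train_svm_qp(X, y, lam):
    """Solve the soft-margin SVM QP; X is m x d, y is in {-1, +1}^m."""
    m, d = X.shape
    w = cp.Variable(d)
    b = cp.Variable()
    xi = cp.Variable(m)                # slack variables, one per example
    objective = cp.Minimize(lam / 2 * cp.sum_squares(w) + cp.sum(xi) / m)
    constraints = [cp.multiply(y, X @ w + b) >= 1 - xi,  # margin constraints
                   xi >= 0]
    cp.Problem(objective, constraints).solve()
    return w.value, b.value
```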

  8. Linear Support Vector Machines: Optimization Problem (hinge-loss form). Eliminating the slacks gives the equivalent unconstrained problem
     $\min_{w, b}\ \frac{\lambda}{2}\|w\|^2 + \frac{1}{m}\sum_{i=1}^{m}\max(0,\ 1 - y_i(\langle w, x_i \rangle + b))$

  9. Linear Support Vector Machines: Optimization Problem (regularized risk form).
     $\min_{w, b}\ \underbrace{\frac{\lambda}{2}\|w\|^2}_{\lambda\,\Omega(w)} + \underbrace{\frac{1}{m}\sum_{i=1}^{m}\max(0,\ 1 - y_i(\langle w, x_i \rangle + b))}_{R_{\mathrm{emp}}(w)}$
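The decomposition into $\lambda\,\Omega(w) + R_{\mathrm{emp}}(w)$ maps directly to code. A small NumPy sketch (function names are illustrative) that evaluates the objective:

```python
import numpy as np

def regularizer(w):
    """Omega(w) = (1/2) ||w||^2."""
    return 0.5 * np.dot(w, w)

def empirical_risk(w, b, X, y):
    """R_emp(w): average hinge loss over the m examples."""
    margins = y * (X @ w + b)
    return np.mean(np.maximum(0.0, 1.0 - margins))

def objective(w, b, X, y, lam):
    return lam * regularizer(w) + empirical_risk(w, b, X, y)
```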

  10. Outline (section divider): now entering part 2, Stochastic Optimization.

  11. Stochastic Optimization: Stochastic Optimization Algorithms. Optimization problem (with no bias):
      $\min_{w}\ \underbrace{\frac{\lambda}{2}\|w\|^2}_{\lambda\,\Omega(w)} + \underbrace{\frac{1}{m}\sum_{i=1}^{m}\max(0,\ 1 - y_i\langle w, x_i \rangle)}_{R_{\mathrm{emp}}(w)}$
      This problem is unconstrained, nonsmooth, and convex.

  12. Stochastic Optimization: Pegasos (Stochastic Gradient Descent).
      Require: number of iterations $T$
      1: $w_1 \leftarrow 0$
      2: for $t = 1, \ldots, T$ do
      3:   $\eta_t \leftarrow \frac{1}{\lambda t}$
      4:   if $y_t \langle w_t, x_t \rangle < 1$ then
      5:     $w'_t \leftarrow (1 - \eta_t \lambda)\, w_t + \eta_t y_t x_t$
      6:   else
      7:     $w'_t \leftarrow (1 - \eta_t \lambda)\, w_t$
      8:   end if
      9:   $w_{t+1} \leftarrow \min\left(1,\ \frac{1/\sqrt{\lambda}}{\|w'_t\|}\right) w'_t$
      10: end for
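A minimal NumPy sketch of this loop, assuming $X$ is an $m \times d$ data matrix and $y \in \{-1, +1\}^m$. Sampling the example $(x_t, y_t)$ uniformly at random is an assumption on my part; the slide leaves the choice of example implicit.

```python
import numpy as np

def pegasos(X, y, lam, T, seed=None):
    rng = np.random.default_rng(seed)
    m, d = X.shape
    w = np.zeros(d)                           # w_1 <- 0
    for t in range(1, T + 1):
        i = rng.integers(m)                   # draw example (x_t, y_t)
        x_t, y_t = X[i], y[i]
        eta = 1.0 / (lam * t)                 # step size eta_t = 1 / (lambda t)
        if y_t * np.dot(w, x_t) < 1:          # hinge term active
            w = (1 - eta * lam) * w + eta * y_t * x_t
        else:                                 # only the regularizer moves w
            w = (1 - eta * lam) * w
        norm = np.linalg.norm(w)              # project onto ||w|| <= 1/sqrt(lam)
        if norm > 1.0 / np.sqrt(lam):
            w *= (1.0 / np.sqrt(lam)) / norm
    return w
```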

  13. Stochastic Optimization: Understanding Pegasos. Objective function revisited:
      $J(w) = \frac{\lambda}{2}\|w\|^2 + \frac{1}{m}\sum_{i=1}^{m}\max(0,\ 1 - y_i\langle w, x_i \rangle)$
      Subgradient: if $y_t \langle w, x_t \rangle < 1$ then $\partial_w J_t(w) = \lambda w - y_t x_t$, else $\partial_w J_t(w) = \lambda w$.

  14.-15. Stochastic Optimization: Understanding Pegasos. Approximate $J(w)$ by the single-example objective
      $J(w) \approx J_t(w) = \frac{\lambda}{2}\|w\|^2 + \max(0,\ 1 - y_t\langle w, x_t \rangle)$
      Subgradient: if $y_t \langle w, x_t \rangle < 1$ then $\partial_w J_t(w) = \lambda w - y_t x_t$, else $\partial_w J_t(w) = \lambda w$.

  16. Stochastic Optimization: Understanding Pegasos. Explicit update: if $y_t \langle w_t, x_t \rangle < 1$ then
      $w'_t = w_t - \eta_t\, \partial_w J_t(w_t) = (1 - \lambda\eta_t)\, w_t + \eta_t y_t x_t$
      else
      $w'_t = w_t - \eta_t\, \partial_w J_t(w_t) = (1 - \lambda\eta_t)\, w_t$
      Projection: project $w'_t$ onto the set $B = \{w : \|w\| \le 1/\sqrt{\lambda}\}$.

  17.-19. Stochastic Optimization: Motivating Stochastic Gradient Descent. How are the updates derived? Minimize the proximal objective
      $w_{t+1} = \operatorname{argmin}_{w}\ \frac{1}{2}\|w - w_t\|^2 + \eta_t J_t(w)$
      This gives us
      $w_{t+1} = w_t - \eta_t\, \partial_w J_t(w_{t+1})$

  20. Stochastic Optimization: Motivating Stochastic Gradient Descent. Since $w_{t+1}$ appears on both sides, evaluate the subgradient at the current iterate $w_t$ instead; this yields the explicit SGD update
      $w_{t+1} \approx w_t - \eta_t\, \partial_w J_t(w_t)$
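Spelled out, the step from the proximal objective to the update rule is the first-order optimality condition (standard subgradient calculus; the notation follows the slides):

```latex
0 \in \partial_w \left[ \tfrac{1}{2}\|w - w_t\|^2 + \eta_t J_t(w) \right]_{w = w_{t+1}}
  = (w_{t+1} - w_t) + \eta_t\, \partial_w J_t(w_{t+1})
```

which rearranges to the implicit rule $w_{t+1} = w_t - \eta_t\, \partial_w J_t(w_{t+1})$; SGD then substitutes the computable subgradient at $w_t$ to obtain $w_{t+1} \approx w_t - \eta_t\, \partial_w J_t(w_t)$.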

  21. Outline (section divider): now entering part 3, Implicit Updates.

  22.-26. Implicit Updates. What if we did not approximate $\partial_w J_t(w_{t+1})$? Keep the implicit relation
      $w_{t+1} = w_t - \eta_t\, \partial_w J_t(w_{t+1})$
      with subgradient $\partial_w J_t(w) = \lambda w - \gamma y_t x_t$, where $\gamma = 1$ if $y_t \langle w, x_t \rangle < 1$, $\gamma \in [0, 1]$ if $y_t \langle w, x_t \rangle = 1$, and $\gamma = 0$ if $y_t \langle w, x_t \rangle > 1$.
      Substituting and solving for $w_{t+1}$:
      $w_{t+1} = w_t - \eta_t \lambda w_{t+1} + \gamma \eta_t y_t x_t \ \Rightarrow\ (1 + \eta_t \lambda)\, w_{t+1} = w_t + \gamma \eta_t y_t x_t \ \Rightarrow\ w_{t+1} = \frac{1}{1 + \eta_t \lambda}\left[ w_t + \gamma \eta_t y_t x_t \right]$

  27. Implicit Updates: Case 1. The implicit update condition is $w_{t+1} = \frac{1}{1 + \eta_t \lambda}\left[ w_t + \gamma \eta_t y_t x_t \right]$. Suppose $1 + \eta_t \lambda < y_t \langle w_t, x_t \rangle$. Set
      $w_{t+1} = \frac{1}{1 + \eta_t \lambda}\, w_t$
      Verify that $y_t \langle w_{t+1}, x_t \rangle > 1$, which implies $\gamma = 0$, so the implicit update condition is satisfied.

  28. Implicit Updates: Case 2. Suppose $y_t \langle w_t, x_t \rangle < 1 + \eta_t \lambda - \eta_t \langle x_t, x_t \rangle$. Set
      $w_{t+1} = \frac{1}{1 + \eta_t \lambda}\left[ w_t + \eta_t y_t x_t \right]$
      Verify that $y_t \langle w_{t+1}, x_t \rangle < 1$, which implies $\gamma = 1$, so the implicit update condition is satisfied.
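The two cases combine into a single closed-form update: solve $y_t \langle w_{t+1}, x_t \rangle = 1$ for $\gamma$ and clip the result to $[0, 1]$; the clipped endpoints reproduce Cases 1 and 2 exactly. This unified form is an inference from the case analysis above, not something stated on the slides. A NumPy sketch:

```python
import numpy as np

def implicit_update(w_t, x_t, y_t, eta_t, lam):
    # gamma that would put the example exactly on the margin, i.e. the
    # solution of y_t <w_{t+1}, x_t> = 1 under the update rule below.
    gamma = (1 + eta_t * lam - y_t * np.dot(w_t, x_t)) / (eta_t * np.dot(x_t, x_t))
    gamma = float(np.clip(gamma, 0.0, 1.0))  # gamma = 0: Case 1; gamma = 1: Case 2
    return (w_t + gamma * eta_t * y_t * x_t) / (1 + eta_t * lam)
```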
