  1. Announcements - Homework • Homework 1 is graded, please collect at the end of lecture • Homework 2 is due today • Homework 3 out soon (watch email) • Question 1 – midterm review

  2. HW1 score distribution [Histogram of HW1 total scores, binned 0–10 through 100–110; vertical axis 0–40]

  3. Announcements - Midterm • When: Wednesday, 10/20 • Where: In class • What: You, your pencil, your textbook, your notes, course slides, your calculator, your good mood :) • What NOT: No computers, iPhones, or anything else that has an internet connection • Material: Everything from the beginning of the semester up to and including SVMs and the kernel trick

  4. Recitation Tomorrow! • Boosting, SVM (convex optimization), midterm review! • Strongly recommended!! • Place: NSH 3305 (note: change from last time) • Time: 5-6 pm (Rob)

  5. Support Vector Machines Aarti Singh Machine Learning 10-701/15-781 Oct 13, 2010

  6. At Pittsburgh G-20 summit …

  7. Linear classifiers – which line is better?

  8. Pick the one with the largest margin!

  9. Parameterizing the decision boundary • w . x = Σ_j w^(j) x^(j) • The two sides of the boundary: w . x + b < 0 and w . x + b > 0 • Data: labelled examples (x_i, y_i), i = 1, 2, …, n
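
A minimal sketch of the decision rule described above (the weights, bias, and data points below are made-up values, not from the lecture):

```python
# Classify points by the sign of w . x + b; w, b, and X are illustrative values.
import numpy as np

w = np.array([2.0, -1.0])              # weight vector, one entry w^(j) per feature
b = 0.5                                # bias / offset
X = np.array([[1.0, 0.0],              # each row is one example x_i
              [0.0, 2.0]])

scores = X @ w + b                     # w . x + b for every example
labels = np.where(scores > 0, +1, -1)  # positive side vs. negative side of the boundary
print(scores, labels)
```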

  10. Parameterizing the decision boundary • The boundary separates the regions w . x + b < 0 and w . x + b > 0

  11. Maximizing the margin • Distance of the closest examples from the line/hyperplane gives the margin γ = 2a/‖w‖

  12. Maximizing the margin • Distance of the closest examples from the line/hyperplane gives the margin γ = 2a/‖w‖ • max over w, b of γ = 2a/‖w‖ s.t. (w . x_j + b) y_j ≥ a for all j • Note: ‘a’ is arbitrary (can normalize the equations by a)
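
As a concrete illustration of the quantity being maximized, here is a small sketch that computes 2a/‖w‖ for a given w, b on toy labelled data (all values below are assumptions for the example):

```python
import numpy as np

w = np.array([1.0, 1.0])
b = -1.0
X = np.array([[2.0, 2.0],            # positive example
              [0.0, 0.0]])           # negative example
y = np.array([+1, -1])

a = np.min(y * (X @ w + b))          # a = min_j (w . x_j + b) y_j
margin = 2 * a / np.linalg.norm(w)   # margin = 2a / ||w||
print(margin)                        # distance between the two margin hyperplanes
```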

  13. Support Vector Machines • min over w, b of w . w s.t. (w . x_j + b) y_j ≥ 1 for all j • Solve efficiently by quadratic programming (QP) – well-studied solution algorithms • Linear hyperplane defined by “support vectors”
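
The QP above can be handed to any off-the-shelf solver; the sketch below uses cvxpy (an assumed choice of library, not something the slides prescribe) on toy separable data:

```python
import numpy as np
import cvxpy as cp

X = np.array([[2.0, 2.0], [3.0, 3.0],     # positive class
              [0.0, 0.0], [-1.0, 0.0]])   # negative class
y = np.array([+1, +1, -1, -1])

w = cp.Variable(2)
b = cp.Variable()

objective = cp.Minimize(cp.sum_squares(w))         # min w . w
constraints = [cp.multiply(y, X @ w + b) >= 1]     # (w . x_j + b) y_j >= 1 for all j
cp.Problem(objective, constraints).solve()

print(w.value, b.value)
```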

  14. Support Vectors • Linear hyperplane defined by “support vectors” • Moving other points a little doesn’t affect the decision boundary • Only need to store the support vectors to predict labels of new points • How many support vectors in the linearly separable case? ≤ m+1
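
A quick way to see this in practice (scikit-learn here is an assumed tool, not part of the lecture): a fitted linear SVM exposes exactly the points that define the boundary.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0],     # positive class
              [0.0, 0.0], [-1.0, 0.0]])   # negative class
y = np.array([+1, +1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # very large C approximates the hard margin
print(clf.support_vectors_)                   # only these points determine the hyperplane
print(len(clf.support_vectors_))              # typically few of them in the separable case
```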

  15. What if data is not linearly separable? • Use features of features of features of features…. e.g. x_1^2, x_2^2, x_1 x_2, …, exp(x_1) • But run the risk of overfitting!
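
For instance, an explicit feature expansion might look like the sketch below (the particular map is a made-up illustration in the spirit of the slide):

```python
import numpy as np

def feature_map(X):
    """Lift 2-D inputs (x1, x2) to (x1, x2, x1^2, x2^2, x1*x2, exp(x1))."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2, np.exp(x1)])

X = np.array([[1.0, 2.0],
              [0.5, -1.0]])
print(feature_map(X))   # a linear classifier in this space is nonlinear in (x1, x2)
```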

  16. What if data is still not linearly separable? • Allow “error” in classification • min over w, b of w . w + C (#mistakes) s.t. (w . x_j + b) y_j ≥ 1 for all j • Maximize margin and minimize # mistakes on training data • C – tradeoff parameter • Not QP :( • 0/1 loss doesn’t distinguish between a near miss and a bad mistake

  17. What if data is still not linearly separable? • Allow “error” in classification • min over w, b of w . w + C Σ_j ξ_j s.t. (w . x_j + b) y_j ≥ 1 - ξ_j and ξ_j ≥ 0 for all j • ξ_j – “slack” variables (> 1 if x_j misclassified); pay a linear penalty for a mistake • C – tradeoff parameter (chosen by cross-validation) • Soft margin approach • Still QP :)
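
The soft-margin program translates directly into a QP with one slack variable per training point; a sketch with cvxpy (an assumed library choice) on deliberately non-separable toy data:

```python
import numpy as np
import cvxpy as cp

X = np.array([[2.0, 2.0], [3.0, 3.0], [0.1, 0.1],      # labelled +1
              [0.0, 0.0], [-1.0, -1.0], [2.5, 2.5]])   # labelled -1 (last point breaks separability)
y = np.array([+1, +1, +1, -1, -1, -1])
C = 1.0                                                # illustrative; normally chosen by cross-validation

w, b = cp.Variable(2), cp.Variable()
xi = cp.Variable(len(y))                               # slack xi_j, one per example

objective = cp.Minimize(cp.sum_squares(w) + C * cp.sum(xi))
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi, xi >= 0]
cp.Problem(objective, constraints).solve()

print(w.value, b.value)
print(xi.value)                                        # xi_j > 1 indicates a misclassified point
```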

  18. Slack variables – Hinge loss • Complexity penalization: min over w, b of w . w + C Σ_j ξ_j s.t. (w . x_j + b) y_j ≥ 1 - ξ_j and ξ_j ≥ 0 for all j • [Plot: hinge loss vs. 0-1 loss as a function of the margin, shown over -1 to 1]
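
The two losses plotted on the slide are easy to write down; a small sketch (the margin values below are arbitrary):

```python
import numpy as np

def hinge_loss(margin):
    return np.maximum(0.0, 1.0 - margin)   # equals the optimal slack xi for that point

def zero_one_loss(margin):
    return (margin <= 0).astype(float)     # 1 if misclassified, 0 otherwise

m = np.array([-1.0, 0.0, 0.5, 1.0, 2.0])   # margin = (w . x + b) y
print(hinge_loss(m))                       # 2, 1, 0.5, 0, 0
print(zero_one_loss(m))                    # 1, 1, 0, 0, 0
```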

  19. SVM vs. Logistic Regression • SVM: hinge loss • Logistic regression: log loss (negative log conditional likelihood) • [Plot: log loss, hinge loss, and 0-1 loss as functions of the margin, shown over -1 to 1]
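
For completeness, the log loss can be put on the same axis as the hinge loss (natural log here; the base is a presentation choice, not specified by the slide):

```python
import numpy as np

margins = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])   # (w . x + b) y

hinge = np.maximum(0.0, 1.0 - margins)            # SVM surrogate loss
log_loss = np.log1p(np.exp(-margins))             # logistic regression: -log P(y | x)
zero_one = (margins <= 0).astype(float)

for m, h, l, z in zip(margins, hinge, log_loss, zero_one):
    print(f"margin={m:+.1f}  hinge={h:.2f}  log={l:.2f}  0/1={z:.0f}")
```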

  20. What about multiple classes?

  21. One against all • Learn 3 classifiers separately: class k vs. rest, (w_k, b_k), k = 1, 2, 3 • y = arg max_k (w_k . x + b_k) • But the w_k's may not be based on the same scale. Note: (a w) . x + (a b) is also a solution
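
The prediction rule is just an arg max over the k per-class scores; a sketch with made-up weights (not learned from anything):

```python
import numpy as np

W = np.array([[1.0, 0.0],          # w_1
              [0.0, 1.0],          # w_2
              [-1.0, -1.0]])       # w_3
b = np.array([0.0, -0.5, 0.2])     # b_1, b_2, b_3

x = np.array([0.3, 0.8])
scores = W @ x + b                 # w_k . x + b_k for each class k
print(np.argmax(scores) + 1)       # predicted class y = arg max_k (w_k . x + b_k)
```

Note that this rule inherits the scale issue flagged on the slide: rescaling one (w_k, b_k) by a constant changes the arg max even though each binary classifier is unchanged.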

  22. Learn 1 classifier: Multi-class SVM • Simultaneously learn 3 sets of weights • Margin – gap between the correct class and the nearest other class • y = arg max_k (w^(k) . x + b^(k))

  23. Learn 1 classifier: Multi-class SVM • Simultaneously learn 3 sets of weights • y = arg max_k (w^(k) . x + b^(k)) • Joint optimization: the w^(k)'s have the same scale.
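
scikit-learn ships a jointly-optimized multi-class linear SVM (the Crammer-Singer formulation), which is in the spirit of, though not necessarily identical to, the formulation on the slide; a sketch on toy 3-class data:

```python
import numpy as np
from sklearn.svm import LinearSVC

X = np.array([[0.0, 0.0], [0.1, 0.2],     # class 0
              [2.0, 2.0], [2.1, 1.9],     # class 1
              [-2.0, 2.0], [-1.9, 2.1]])  # class 2
y = np.array([0, 0, 1, 1, 2, 2])

clf = LinearSVC(multi_class="crammer_singer").fit(X, y)
print(clf.coef_, clf.intercept_)          # one w^(k), b^(k) per class, learned jointly
print(clf.predict([[2.0, 1.8]]))          # arg max_k (w^(k) . x + b^(k))
```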

  24. What you need to know • Maximizing margin • Derivation of the SVM formulation • Slack variables and hinge loss • Relationship between SVMs and logistic regression – 0/1 loss – Hinge loss – Log loss • Tackling multiple classes – One against all – Multi-class SVMs

  25. SVMs reminder • Regularization + hinge loss: min over w, b of w . w + C Σ_j ξ_j s.t. (w . x_j + b) y_j ≥ 1 - ξ_j and ξ_j ≥ 0 for all j • Soft margin approach

  26. Today’s Lecture • Learn one of the most interesting and exciting recent advancements in machine learning – The “kernel trick” – High dimensional feature spaces at no extra cost! • But first, a detour – Constrained optimization!

  27. Constrained Optimization

  28. Lagrange Multiplier – Dual Variables • Move the constraint into the objective function • Form the Lagrangian and solve (a worked sketch follows below) • The constraint is tight when the multiplier α > 0
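
Since the slide's equations did not survive extraction, here is a minimal worked example of the idea, assuming the standard one-variable illustration (minimize x^2 subject to x ≥ b, which also matches the b-positive / b-negative cases on slide 30):

```latex
% Minimize x^2 subject to x >= b (i.e. the constraint b - x <= 0).
\[
  L(x,\alpha) \;=\; x^2 - \alpha\,(x - b), \qquad \alpha \ge 0 .
\]
% Stationarity: \partial L / \partial x = 2x - \alpha = 0, so x = \alpha / 2.
% If b <= 0: the constraint is inactive, \alpha = 0 and x^* = 0.
% If b > 0:  the constraint is tight, x^* = b and \alpha = 2b > 0.
```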

  29. Duality • Primal problem vs. dual problem • Weak duality – the dual value lower-bounds the primal for all feasible points • Strong duality – the primal and dual optima coincide (holds under the KKT conditions)
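
The generic statements behind these bullets (standard forms, not copied from the slide) are:

```latex
\begin{align*}
  \text{Primal:}\quad & p^* = \min_{x}\; f(x) \ \ \text{s.t.}\ g(x) \le 0,\\
  \text{Dual:}\quad   & d^* = \max_{\alpha \ge 0}\; \min_{x}\; \big[\, f(x) + \alpha\, g(x) \,\big],\\
  \text{Weak duality:}\quad  & d^* \le p^* \ \text{(the dual lower-bounds the primal)},\\
  \text{Strong duality:}\quad & d^* = p^* \ \text{(holds under the KKT conditions).}
\end{align*}
```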

  30. Lagrange Multiplier – Dual Variables • Two cases: b positive and b negative • Solving: when α > 0, the constraint is tight
