
Machine Learning for Signal Processing: Detecting Faces (& Other Objects) in Images

Machine Learning for Signal Processing: Detecting faces (& other objects) in images. Class 7, 22 Sep 2015 (11755/18979). Last lecture: how to describe a face; the typical face, a typical face that captures the essence of …


  1. Boosting
  • The basic idea: can a “weak” learning algorithm that performs just slightly better than random guessing be boosted into an arbitrarily accurate “strong” learner?
  • This is a “meta” algorithm that imposes no constraints on the form of the weak learners themselves

  2. Boosting: A Voting Perspective
  • Boosting is a form of voting
    – Let a number of different classifiers classify the data
    – Go with the majority
    – Intuition says that as the number of classifiers increases, the dependability of the majority vote increases
  • Boosting by majority
  • Boosting by weighted majority, shown in the sketch below
    – A (weighted) majority vote taken over all the classifiers
    – How do we compute weights for the classifiers?
    – How do we actually train the classifiers?
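To make the weighted-majority idea concrete, here is a minimal sketch in Python/NumPy; the function name and array shapes are my own, not from the slides:

    import numpy as np

    def weighted_majority(predictions, weights):
        """Weighted-majority vote over binary (+1/-1) classifier outputs.

        predictions: (n_classifiers, n_samples) array of +1/-1 votes
        weights:     (n_classifiers,) array of per-classifier weights
        """
        # weighted sum of votes per sample, then take the sign
        return np.sign(weights @ predictions)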

  3. AdaBoost
  • Challenge: how do we optimize the classifiers and their weights?
    – Trivial solution: train all classifiers independently
    – Optimal: each classifier focuses on what the others missed
    – But joint optimization becomes impossible
  • Adaptive Boosting: greedy incremental optimization of the classifiers
    – Keep adding classifiers incrementally, each fixing what the others missed

  4. AdaBoost: Illustrative Example

  5. AdaBoost: First Weak Learner

  6. AdaBoost: The First Weak Learner Makes Errors

  7. AdaBoost: Reweighted Data

  8. AdaBoost: Second Weak Learner Focuses on Data “Missed” by the First Learner

  9. AdaBoost: Second Strong Learner Combines Both Weak Learners

  10. AdaBoost: Returning to the Second Weak Learner

  11. AdaBoost: The Second Weak Learner Makes Errors

  12. AdaBoost: Reweighting Data

  13. AdaBoost: Third Weak Learner Focuses on Data “Missed” by the First and Second Learners

  14. AdaBoost: Third Strong Learner

  15. Boosting: An Example • Red dots represent training data from the Red class • Blue dots represent training data from the Blue class

  16. Boosting: An Example • The final strong learner has learnt a complicated decision boundary

  17. Boosting: An Example • The final strong learner has learnt a complicated decision boundary • Decision boundaries in areas with a low density of training points are assumed inconsequential

  18. Overall Learning Pattern • The strong learner becomes increasingly accurate as the number of weak learners grows • Residual errors become increasingly difficult to correct – additional weak learners are less and less effective • [Figure: error of the n-th weak learner and error of the n-th strong learner vs. number of weak learners]

  19. Overfitting • Note: we can continue to add weak learners EVEN after the strong learner's error goes to 0! • This has been shown to IMPROVE generalization! • [Figure: error of the n-th weak learner, and error of the n-th strong learner (which may go to 0), vs. number of weak learners]

  20. AdaBoost: Summary • No relation to Ada Lovelace • Adaptive Boosting • Adaptively selects weak learners • ~8K citations for just one of Freund and Schapire's papers

  21. The AdaBoost Algorithm
  • Initialize D_1(x_i) = 1/N
  • For t = 1, …, T:
    – Train a weak classifier h_t using distribution D_t
    – Compute the total error on the training data: e_t = Σ_i D_t(x_i) · ½(1 - y_i h_t(x_i))
    – Set α_t = ½ ln((1 - e_t) / e_t)
    – For i = 1…N: set D_{t+1}(x_i) = D_t(x_i) · exp(-α_t y_i h_t(x_i))
    – Normalize D_{t+1} to make it a distribution
  • The final classifier is H(x) = sign(Σ_t α_t h_t(x))
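The algorithm above translates almost line for line into code. Below is a runnable sketch in Python/NumPy, using decision stumps (threshold tests on a single feature, as in the worked example that follows) as the weak learners. All function names are mine, not from the slides, and the sketch assumes each e_t stays strictly between 0 and 1:

    import numpy as np

    def train_stump(X, y, D):
        """Find the decision stump (feature, threshold, sign) with the
        lowest D-weighted error; labels y are +1/-1."""
        best_err, best_params = np.inf, None
        for f in range(X.shape[1]):
            vals = np.unique(X[:, f])
            # candidate thresholds: below, between, and above the data values
            cands = np.concatenate(([vals[0] - 1],
                                    (vals[:-1] + vals[1:]) / 2,
                                    [vals[-1] + 1]))
            for thresh in cands:
                for sign in (+1, -1):
                    pred = np.where(sign * (X[:, f] - thresh) > 0, 1, -1)
                    err = np.sum(D[pred != y])   # sum of weights of misclassified points
                    if err < best_err:
                        best_err, best_params = err, (f, thresh, sign)
        return best_err, best_params

    def stump_predict(X, params):
        f, thresh, sign = params
        return np.where(sign * (X[:, f] - thresh) > 0, 1, -1)

    def adaboost(X, y, T):
        N = len(y)
        D = np.full(N, 1.0 / N)                  # D_1(x_i) = 1/N
        alphas, stumps = [], []
        for t in range(T):
            e, params = train_stump(X, y, D)     # weak learner on distribution D_t
            alpha = 0.5 * np.log((1 - e) / e)    # alpha_t = 1/2 ln((1 - e_t) / e_t)
            pred = stump_predict(X, params)
            D = D * np.exp(-alpha * y * pred)    # D_{t+1}(x_i) = D_t(x_i) exp(-alpha_t y_i h_t(x_i))
            D = D / D.sum()                      # normalize to a distribution
            alphas.append(alpha)
            stumps.append(params)
        return alphas, stumps

    def adaboost_predict(X, alphas, stumps):
        # H(x) = sign(sum_t alpha_t h_t(x))
        score = sum(a * stump_predict(X, p) for a, p in zip(alphas, stumps))
        return np.sign(score)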

  22. First, some example data
  • Face detection with multiple eigenfaces
  • Step 0: derived the top 2 eigenfaces (E1, E2) from eigenface training data
  • Step 1: on a (different) set of examples, express each image as a linear combination of the eigenfaces: Image = a·E1 + b·E2, so a = Image·E1
    – Examples include both faces and non-faces
    – Even the non-face images are explained in terms of the eigenfaces
  • [Figure: eight example images with their expansions: 0.3 E1 - 0.6 E2; 0.2 E1 + 0.4 E2; 0.5 E1 - 0.5 E2; -0.8 E1 - 0.1 E2; 0.7 E1 - 0.1 E2; 0.4 E1 - 0.9 E2; 0.6 E1 - 0.4 E2; 0.2 E1 + 0.5 E2]
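A one-function sketch of Step 1; the array names are illustrative, and it assumes the eigenfaces are orthonormal, flattened vectors:

    import numpy as np

    def project_onto_eigenfaces(image, E1, E2):
        """Express image ~ a*E1 + b*E2. For orthonormal eigenfaces the
        least-squares coefficients are dot products: a = image . E1."""
        return image @ E1, image @ E2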

  23. Training Data
  The eight examples A–H with their eigenface coefficients (Face = +1, Non-face = -1):
  ID    E1     E2    Class
  A     0.3   -0.6    +1
  B     0.5   -0.5    +1
  C     0.7   -0.1    +1
  D     0.6   -0.4    +1
  E     0.2    0.4    -1
  F    -0.8   -0.1    -1
  G     0.4   -0.9    -1
  H     0.2    0.5    -1

  24. The AdaBoost Algorithm (repeated for reference; see slide 21. Next step: Initialize D_1(x_i) = 1/N)

  25. Initialize D_1(x_i) = 1/N

  26. Training Data
  ID    E1     E2    Class   Weight
  A     0.3   -0.6    +1      1/8
  B     0.5   -0.5    +1      1/8
  C     0.7   -0.1    +1      1/8
  D     0.6   -0.4    +1      1/8
  E     0.2    0.4    -1      1/8
  F    -0.8   -0.1    -1      1/8
  G     0.4   -0.9    -1      1/8
  H     0.2    0.5    -1      1/8

  27. The AdaBoost Algorithm (repeated; see slide 21. Next step: train a weak classifier h_t using distribution D_t)

  28. The E1 “Stump”
  • Classifier based on E1: if (sign × (wt(E1) - thresh) > 0) face = true; sign = +1 or -1; sweep the threshold over the sorted data
  • Data sorted by E1 (weights all 1/8): F(-0.8) E(0.2) H(0.2) A(0.3) G(0.4) B(0.5) D(0.6) C(0.7)
  • At this threshold: sign = +1, error = 3/8; sign = -1, error = 5/8

  29.–32. The E1 “Stump” (threshold sweep continues; at these positions, sign = +1, error = 3/8; sign = -1, error = 5/8)

  33. The E1 “Stump” (threshold moved: sign = +1, error = 2/8; sign = -1, error = 6/8)

  34. The E1 “Stump” (threshold moved: sign = +1, error = 1/8; sign = -1, error = 7/8)

  35. The E1 “Stump” (threshold moved: sign = +1, error = 2/8; sign = -1, error = 6/8)

  36. The E1 “Stump” (threshold moved: sign = +1, error = 1/8; sign = -1, error = 7/8)

  37. The E1 “Stump” (threshold moved: sign = +1, error = 2/8; sign = -1, error = 6/8)

  38. The Best E1 “Stump”
  • Sign = +1, threshold = 0.45: if (wt(E1) > 0.45) face = true
  • Error = 1/8

  39. The E2 “Stump”
  • Classifier based on E2: if (sign × (wt(E2) - thresh) > 0) face = true; sign = +1 or -1
  • Note the order when sorted by E2 (weights all 1/8): G(-0.9) A(-0.6) B(-0.5) D(-0.4) C(-0.1) F(-0.1) E(0.4) H(0.5)
  • At this threshold: sign = +1, error = 3/8; sign = -1, error = 5/8

  40. The Best E2 “Stump”
  • Sign = -1, threshold = 0.15: face = true if wt(E2) < 0.15
  • Error = 2/8

  41. The Best “Stump”
  • The best overall classifier based on a single feature uses E1: if (wt(E1) > 0.45) face = true
  • Sign = +1, error = 1/8
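Running the train_stump sketch from above on these eight examples reproduces this result. One caveat: two thresholds tie at error 1/8 (0.25, misclassifying only G, and the slides' 0.45, misclassifying only A), so the search below happens to return 0.25:

    import numpy as np

    # eigenface coefficients (E1, E2) for images A..H; label +1 = face
    X = np.array([[ 0.3, -0.6], [ 0.5, -0.5], [ 0.7, -0.1], [ 0.6, -0.4],
                  [ 0.2,  0.4], [-0.8, -0.1], [ 0.4, -0.9], [ 0.2,  0.5]])
    y = np.array([+1, +1, +1, +1, -1, -1, -1, -1])
    D = np.full(8, 1/8)

    err, (feat, thresh, sign) = train_stump(X, y, D)
    print(err, feat, thresh, sign)   # 0.125 (= 1/8), feature 0 (E1), sign +1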

  42. The Best “Stump” [figure]

  43. The AdaBoost Algorithm (repeated; see slide 21. Next step: compute the total error e_t on the training data)

  44. The Best “Stump” [figure]

  45. The Best Error
  • The error of the classifier is the sum of the weights of the misclassified instances
  • For the chosen stump: sign = +1, error = 1/8
  • NOTE: THE ERROR IS THE SUM OF THE WEIGHTS OF MISCLASSIFIED INSTANCES
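In code, using the arrays defined above, the weighted error of the chosen stump is exactly this sum (a quick check, not from the slides):

    pred = np.where(X[:, 0] > 0.45, 1, -1)   # h1: face if wt(E1) > 0.45
    print(np.sum(D[pred != y]))              # 0.125 = 1/8 (only A is misclassified)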

  46. The AdaBoost Algorithm (repeated; see slide 21. Next step: set α_t = ½ ln((1 - e_t) / e_t))

  47. Computing Alpha
  • α = 0.5 ln((1 - 1/8) / (1/8)) = 0.5 ln(7) ≈ 0.97
  • Sign = +1, error = 1/8

  48. The Boosted Classifier Thus Far
  • α_1 = 0.5 ln((1 - 1/8) / (1/8)) = 0.5 ln(7) ≈ 0.97
  • h1(X) = (wt(E1) > 0.45) ? +1 : -1
  • H(X) = sign(0.97 · h1(X)), which is the same as h1(X)
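Continuing the running sketch, the one-round boosted classifier (with h1 and the 0.45 threshold as in the slides):

    alpha1 = 0.5 * np.log(7)                     # ~0.97

    def h1(Z):
        return np.where(Z[:, 0] > 0.45, 1, -1)   # face if wt(E1) > 0.45

    def H(Z):
        return np.sign(alpha1 * h1(Z))           # identical to h1 while there is one learner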

  49. The AdaBoost Algorithm (repeated; see slide 21. Next step: set D_{t+1}(x_i) = D_t(x_i) exp(-α_t y_i h_t(x_i)))

  50. The Best Error
  • D_{t+1}(x_i) = D_t(x_i) exp(-α_t y_i h_t(x_i))
  • exp(α_t) = exp(0.97) = 2.63; exp(-α_t) = exp(-0.97) = 0.38
  • Multiply the correctly classified instances by 0.38 and the incorrectly classified instances by 2.63
  ID    E1     E2    Class   Weight        New weight
  A     0.3   -0.6    +1     1/8 × 2.63      0.33
  B     0.5   -0.5    +1     1/8 × 0.38      0.05
  C     0.7   -0.1    +1     1/8 × 0.38      0.05
  D     0.6   -0.4    +1     1/8 × 0.38      0.05
  E     0.2    0.4    -1     1/8 × 0.38      0.05
  F    -0.8   -0.1    -1     1/8 × 0.38      0.05
  G     0.4   -0.9    -1     1/8 × 0.38      0.05
  H     0.2    0.5    -1     1/8 × 0.38      0.05
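The reweighting step in code, continuing the running example:

    D2 = D * np.exp(-alpha1 * y * h1(X))   # correct: x0.38, incorrect: x2.63
    print(np.round(D2, 2))                 # A -> 0.33, all others -> 0.05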

  51. AdaBoost [figure]

  52. AdaBoost [figure]

  53. The AdaBoost Algorithm (repeated; see slide 21. Next step: normalize D_{t+1} to make it a distribution)

  54. The Best Error
  • D' = D / sum(D)
  • Multiply the correctly classified instances by 0.38, the incorrectly classified instances by 2.63, then normalize to sum to 1.0
  ID    E1     E2    Class   Weight        New    Normalized
  A     0.3   -0.6    +1     1/8 × 2.63    0.33     0.48
  B     0.5   -0.5    +1     1/8 × 0.38    0.05     0.074
  C     0.7   -0.1    +1     1/8 × 0.38    0.05     0.074
  D     0.6   -0.4    +1     1/8 × 0.38    0.05     0.074
  E     0.2    0.4    -1     1/8 × 0.38    0.05     0.074
  F    -0.8   -0.1    -1     1/8 × 0.38    0.05     0.074
  G     0.4   -0.9    -1     1/8 × 0.38    0.05     0.074
  H     0.2    0.5    -1     1/8 × 0.38    0.05     0.074

  55. The Best Error
  Updated weights after normalization:
  ID    E1     E2    Class   Weight
  A     0.3   -0.6    +1     0.48
  B     0.5   -0.5    +1     0.074
  C     0.7   -0.1    +1     0.074
  D     0.6   -0.4    +1     0.074
  E     0.2    0.4    -1     0.074
  F    -0.8   -0.1    -1     0.074
  G     0.4   -0.9    -1     0.074
  H     0.2    0.5    -1     0.074
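And the normalization in code. Note the slides' rounded constants (2.63/0.38) give 0.48 and 0.074; with exact arithmetic the values are 0.50 and about 0.071, since reweighting pulls the misclassified mass up to exactly one half:

    D2 = D2 / D2.sum()
    print(np.round(D2, 3))   # A -> ~0.5, others -> ~0.071 (slides: 0.48 and 0.074)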

  56. The AdaBoost Algorithm (repeated; see slide 21. Back to the top of the loop: train a weak classifier, now for t = 2)

  57. E1 classifier
  • Classifier based on E1: if (sign × (wt(E1) - thresh) > 0) face = true; sign = +1 or -1 (weights as on slide 55)
  • Data sorted by E1 with updated weights: F(.074) E(.074) H(.074) A(.48) G(.074) B(.074) D(.074) C(.074)
  • At this threshold: sign = +1, error = 0.222; sign = -1, error = 0.778

  58. E1 classifier (threshold moved: sign = +1, error = 0.148; sign = -1, error = 0.852)

  59. The Best E1 classifier • Sign = +1, error = 0.074

  60. The Best E2 classifier • Data sorted by E2 with updated weights: G(.074) A(.48) B(.074) D(.074) C(.074) F(.074) E(.074) H(.074) • Sign = -1, error = 0.148

  61. The Best Classifier
  • The best classifier in this round is again based on E1: if (wt(E1) > 0.25) face = true (with A's weight raised to 0.48, the minimum-error threshold now lies between 0.2 and 0.3, misclassifying only G)
  • Sign = +1, error = 0.074
  • α_2 = 0.5 ln((1 - 0.074) / 0.074) ≈ 1.26
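Completing the running sketch: round two under the updated weights, and the two-round boosted classifier. The threshold below is what the search finds on this data; the small differences from the slides' numbers are rounding (exact weights give α_2 ≈ 1.28 rather than 1.26):

    err2, params2 = train_stump(X, y, D2)      # ~0.071 (slides: 0.074); only G misclassified
    alpha2 = 0.5 * np.log((1 - err2) / err2)   # ~1.28 (slides, with rounding: 1.26)

    def h2(Z):
        return stump_predict(Z, params2)       # face if wt(E1) > 0.25

    def H2(Z):
        # H(x) = sign(alpha1 * h1(x) + alpha2 * h2(x))
        return np.sign(alpha1 * h1(Z) + alpha2 * h2(Z))

    print(H2(X))   # after two rounds H still errs on G; boosting would continue with t = 3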
