Today: Perceptron. Support Vector Machine.

Labelled points with $x_1, \dots, x_n$. Hyperplane separator.


  7. Alg: Given $x_1, \dots, x_n$. Let $w_1 = x_1$.
     For each $x_i$: if $w_t \cdot x_i$ has the wrong sign (negative), set $w_{t+1} = w_t + x_i$ and $t = t + 1$.
     Claim 2: $|w_{t+1}|^2 \le |w_t|^2 + 1$.
     $w_{t+1} = w_t + x_i$. Less than a right angle! → $|w_{t+1}|^2 \le |w_t|^2 + |x_i|^2 \le |w_t|^2 + 1$.
     [Figure: triangle formed by the vectors $w_t$, $x_i$, and $w_{t+1}$.]
     Algebraically. Positive $x_i$, $w_t \cdot x_i \le 0$.
     $(w_t + x_i)^2 = |w_t|^2 + 2\, w_t \cdot x_i + |x_i|^2 \le |w_t|^2 + |x_i|^2 = |w_t|^2 + 1$.
     Claim 2 holds even if no separating hyperplane!
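The update rule and Claim 2 on this slide can be checked numerically. The following is a minimal sketch, not the lecture's own code: the helper names (`dot`, `unit`, `perceptron`) and the random test data are assumptions made for illustration, and all points are taken to be unit vectors already flipped to the positive side, as in the slide's "Positive $x_i$" convention.

```python
import math
import random

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    """Scale v to unit length."""
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]

def perceptron(points):
    """One pass of the slide's rule; returns w and |w_t|^2 after each update."""
    w = list(points[0])                 # w_1 = x_1
    sq_norms = [dot(w, w)]
    for x in points[1:]:
        if dot(w, x) <= 0:              # wrong sign: a mistake
            w = [a + b for a, b in zip(w, x)]   # w_{t+1} = w_t + x_i
            sq_norms.append(dot(w, w))
    return w, sq_norms

# Hypothetical data: random unit vectors, with no separator assumed.
random.seed(0)
points = [unit([random.gauss(0, 1) for _ in range(5)]) for _ in range(200)]
w, sq_norms = perceptron(points)

# Claim 2: every update raises |w|^2 by at most 1.
increments = [b - a for a, b in zip(sq_norms, sq_norms[1:])]
```

Because the data here is not required to be separable, the check illustrates the slide's closing remark that Claim 2 does not depend on a separating hyperplane.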

  15. Putting it together...
      Claim 1: $w_{t+1} \cdot w \ge w_t \cdot w + \gamma$.
      Claim 2: $|w_{t+1}|^2 \le |w_t|^2 + 1$.
      $M$ = number of mistakes in algorithm.
      $\gamma M \le w_{t+1} \cdot w \le \|w_{t+1}\| \le \sqrt{M}$ → $M \le \frac{1}{\gamma^2}$.
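The bound $M \le 1/\gamma^2$ can be observed on synthetic data. This is a hedged sketch under stated assumptions: the data generator `margin_data`, the choice of separator `w_star`, and the decision to repeat passes until no mistake remains are all illustrative and not taken from the slides, which describe a single pass.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]

def margin_data(n, dim, gamma, w_star, rng):
    """Unit vectors with x . w_star >= gamma (negative points flipped)."""
    pts = []
    while len(pts) < n:
        x = unit([rng.gauss(0, 1) for _ in range(dim)])
        s = dot(x, w_star)
        if abs(s) >= gamma:             # reject points inside the margin
            pts.append(x if s > 0 else [-a for a in x])
    return pts

def count_mistakes(points):
    """Cycle through the points until a full pass makes no update."""
    w = list(points[0])                 # w_1 = x_1
    mistakes = 0
    changed = True
    while changed:
        changed = False
        for x in points:
            if dot(w, x) <= 0:          # wrong sign
                w = [a + b for a, b in zip(w, x)]
                mistakes += 1
                changed = True
    return mistakes, w

rng = random.Random(1)
gamma = 0.2
w_star = [1.0, 0.0, 0.0]                # assumed unit-length separator
points = margin_data(300, 3, gamma, w_star, rng)
mistakes, w = count_mistakes(points)
print(mistakes, "mistakes; bound is", 1 / gamma ** 2)
```

The loop is guaranteed to stop here because the two claims cap the total number of updates at $1/\gamma^2$ regardless of the order in which points are presented.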

  35. Hinge Loss. Most of data has good separator.
      Claim 1: $w_{t+1} \cdot w \ge w_t \cdot w + \gamma$. Don't make progress or tilt the wrong way.
      How much bad tilting? Rotate points to have $\gamma$-margin. Total rotation: $TD_\gamma$.
      Analysis: subtract bad tilting part.
      Claim 1: $w_{t+1} \cdot w \ge w_t \cdot w + \gamma - (\text{rotation for } x_{i_t})$.
      $w_M \cdot w \ge \gamma M - TD_\gamma$ + Claim 2 → $\gamma M - TD_\gamma \le \sqrt{M}$.
      Quadratic equation: $\gamma^2 M^2 - (2 \gamma\, TD_\gamma + 1) M + TD_\gamma^2 \le 0$. Uh...
      One implication: $M \le \frac{1}{\gamma^2} + \frac{2}{\gamma} TD_\gamma$.
      The extra is (twice) the amount of rotation in units of $\gamma$.
      Hinge loss: $\frac{1}{\gamma} TD_\gamma$.
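The step from the quadratic to the stated implication is skipped on the slide. One way to fill it in, written with the shorthand $a$, $b$, $c$ (introduced here only for this sketch), is:

```latex
\begin{align*}
  &\text{Let } a = \gamma^2, \quad b = 2\gamma\, TD_\gamma + 1, \quad c = TD_\gamma^2 \ge 0,
   \quad\text{so that } aM^2 - bM + c \le 0. \\
  &\text{Then } M \text{ is at most the larger root: }
   M \le \frac{b + \sqrt{b^2 - 4ac}}{2a}. \\
  &\text{Since } ac \ge 0, \ \sqrt{b^2 - 4ac} \le b, \text{ hence }
   M \le \frac{2b}{2a} = \frac{b}{a}
     = \frac{2\gamma\, TD_\gamma + 1}{\gamma^2}
     = \frac{1}{\gamma^2} + \frac{2}{\gamma}\, TD_\gamma .
\end{align*}
```

Squaring $\gamma M - TD_\gamma \le \sqrt{M}$ is only valid when the left side is non-negative; if it is negative, then $M < TD_\gamma / \gamma$ and the same bound holds directly.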

  46. Approximately Maximizing Margin Algorithm
      There is a $\gamma$ separating hyperplane. Find it! (Kind of.)
      Any point within $\gamma/2$ is still a mistake.
      Let $w_1 = x_1$.
      For each $x_2, \dots, x_n$: if $w_t \cdot x_i < \gamma/2$, set $w_{t+1} = w_t + x_i$ and $t = t + 1$.
      Claim 1: $w_{t+1} \cdot w \ge w_t \cdot w + \frac{\gamma}{2}$. Same (ish) as before.
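The margin variant differs from the plain perceptron only in its update test. A minimal sketch follows, assuming unit-norm points already flipped to the positive side; the names `margin_perceptron` and `margin_data`, the data, and the `w_star` argument are illustrative assumptions. In particular `w_star` is passed in purely to measure progress for Claim 1, since the algorithm itself never sees the separator.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]

def margin_data(n, dim, gamma, w_star, rng):
    """Unit vectors with x . w_star >= gamma (negative points flipped)."""
    pts = []
    while len(pts) < n:
        x = unit([rng.gauss(0, 1) for _ in range(dim)])
        s = dot(x, w_star)
        if abs(s) >= gamma:
            pts.append(x if s > 0 else [-a for a in x])
    return pts

def margin_perceptron(points, gamma, w_star):
    """One pass; update whenever w_t . x_i < gamma / 2, as on the slide."""
    w = list(points[0])                 # w_1 = x_1
    progress = []                       # gain in w . w_star per update (diagnostic only)
    for x in points[1:]:
        if dot(w, x) < gamma / 2:       # within gamma/2 still counts as a mistake
            new_w = [a + b for a, b in zip(w, x)]
            progress.append(dot(new_w, w_star) - dot(w, w_star))
            w = new_w
    return w, progress

rng = random.Random(2)
gamma = 0.2
w_star = [1.0, 0.0, 0.0]                # assumed unit-length separator
points = margin_data(300, 3, gamma, w_star, rng)
w, progress = margin_perceptron(points, gamma, w_star)
```

On data with margin $\gamma$, every recorded entry of `progress` equals $x_i \cdot w$ for the added point, so it is at least $\gamma$ and in particular at least $\gamma/2$.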

  58. Margin Approximation: Claim 2
      Claim 2(?): $|w_{t+1}|^2 \le |w_t|^2 + 1$??
      Adding $x_i$ to $w_t$ even if in correct direction. Obtuse triangle.
      [Figure: vectors $w_t$, $x_i$, $v$, and $w_{t+1}$; the red segment along $w_t$ is marked $< \gamma/2$.]
      $|v|^2 \le |w_t|^2 + 1$ → $|v| \le |w_t| + \frac{1}{2|w_t|}$ (square right hand side.)
      Red bit is at most $\gamma/2$.
      Together: $|w_{t+1}| \le |w_t| + \frac{1}{2|w_t|} + \frac{\gamma}{2}$.
      If $|w_t| \ge \frac{2}{\gamma}$, then $|w_{t+1}| \le |w_t| + \frac{3}{4}\gamma$.
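The two arithmetic steps on this slide ("square right hand side" and the final $\frac{3}{4}\gamma$ bound) can be written out as follows. This is a worked sketch using only the quantities already on the slide:

```latex
\begin{align*}
  \left(|w_t| + \frac{1}{2|w_t|}\right)^2
    &= |w_t|^2 + 1 + \frac{1}{4|w_t|^2}
     \;\ge\; |w_t|^2 + 1 \;\ge\; |v|^2
    &&\Rightarrow\quad |v| \le |w_t| + \frac{1}{2|w_t|}, \\
  |w_t| \ge \frac{2}{\gamma}
    &\;\Rightarrow\; \frac{1}{2|w_t|} \le \frac{\gamma}{4}
    &&\Rightarrow\quad |w_{t+1}| \le |w_t| + \frac{\gamma}{4} + \frac{\gamma}{2}
       = |w_t| + \frac{3}{4}\gamma .
\end{align*}
```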
