

  1. Computational Learning Theory: Shattering and VC Dimensions
     Machine Learning
     Slides based on material from Dan Roth, Avrim Blum, Tom Mitchell and others

  2. This lecture: Computational Learning Theory
     • The Theory of Generalization
     • Probably Approximately Correct (PAC) learning
     • Positive and negative learnability results
     • Agnostic Learning
     • Shattering and the VC dimension


  3. Infinite Hypothesis Space
     The previous analysis was restricted to finite hypothesis spaces.
     • Some infinite hypothesis spaces are more expressive than others
       – E.g., rectangles vs. 17-sided convex polygons vs. general convex polygons
       – A linear threshold function vs. a combination of LTUs
     • We need a measure of the expressiveness of an infinite hypothesis space other than its size
     • The Vapnik-Chervonenkis dimension (VC dimension) provides such a measure: "What is the expressive capacity of a set of functions?"
     • Analogous to |H|, there are bounds for sample complexity using VC(H)


  4. Learning Rectangles
     Assume the target concept is an axis-parallel rectangle in the X-Y plane.
     [Figure: an axis-parallel rectangle; points inside it are positive, points outside it are negative]

  5. Learning Rectangles
     Assume the target concept is an axis-parallel rectangle.
     [Figure: labeled examples arrive one by one; + points fall inside the target rectangle, - points fall outside]
     Will we be able to learn the target rectangle? Can we come close?
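The standard learner for this class uses a tightest-fit strategy: output the smallest axis-parallel rectangle that encloses all positive examples seen so far. Below is a minimal Python sketch of that idea; the function names and the example data are illustrative, not from the slides.

```python
import numpy as np

def tightest_fit(X, y):
    """Return the smallest axis-parallel rectangle (x_min, x_max, y_min, y_max)
    containing every positive example, or None if there are no positives."""
    pos = np.asarray(X, dtype=float)[np.asarray(y, dtype=bool)]
    if len(pos) == 0:
        return None  # no positives seen yet: predict all-negative
    return (float(pos[:, 0].min()), float(pos[:, 0].max()),
            float(pos[:, 1].min()), float(pos[:, 1].max()))

def predict(rect, point):
    """Label a point positive iff it falls inside the learned rectangle."""
    if rect is None:
        return False
    x_min, x_max, y_min, y_max = rect
    return x_min <= point[0] <= x_max and y_min <= point[1] <= y_max

rect = tightest_fit([[1, 1], [2, 3], [4, 2], [0, 5]],
                    [True, True, True, False])
print(rect)                   # (1.0, 4.0, 1.0, 3.0)
print(predict(rect, [3, 2]))  # True
```

Because the tightest fit is always contained inside the target rectangle, the learner can only err on positive points near the target's boundary, and bounding the probability mass of that boundary strip is the heart of the PAC argument that this class is learnable.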

  6. Let's think about the expressivity of functions
     Suppose we have two points. Can linear classifiers correctly classify any labeling of these points?
     There are four ways to label two points, and in all four cases it is possible to draw a line that separates the positive and negative points.
     We say that linear functions are expressive enough to shatter two points.
     What about fourteen points?
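Whether a point set is shattered by linear classifiers can be checked mechanically: a labeling is linearly separable exactly when a small linear program is feasible. The following sketch assumes NumPy and SciPy are available; the function names are illustrative.

```python
from itertools import product

import numpy as np
from scipy.optimize import linprog

def linearly_separable(X, y):
    """Exact test: is there a (w, b) with y_i * (w . x_i + b) >= 1 for all i?
    This is a linear-programming feasibility problem with a zero objective."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)  # labels in {-1, +1}
    n, d = X.shape
    # Rewrite each constraint as -y_i * (w . x_i + b) <= -1; variables are (w, b).
    A_ub = -y[:, None] * np.hstack([X, np.ones((n, 1))])
    res = linprog(c=np.zeros(d + 1), A_ub=A_ub, b_ub=-np.ones(n),
                  bounds=[(None, None)] * (d + 1), method="highs")
    return res.success

def shattered_by_linear(points):
    """True iff every one of the 2^n labelings is linearly separable."""
    return all(linearly_separable(points, labels)
               for labels in product([-1.0, 1.0], repeat=len(points)))

print(shattered_by_linear([[0.0, 0.0], [1.0, 1.0]]))          # True
print(shattered_by_linear([[0, 0], [0, 1], [1, 0], [1, 1]]))  # False (XOR labeling fails)
```

Since the VC dimension of linear classifiers in the plane is 3, no set of four or more points can pass this test, so fourteen points certainly cannot: among their 2^14 labelings there is always at least one that no line separates.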

  7. Shattering
     [Figure: the fourteen points with one particular assignment of + and - labels]
     What about this labeling?
     This particular labeling of the points cannot be separated by any line.

  8. Shattering
     Linear functions are not expressive enough to shatter fourteen points, because there is at least one labeling that cannot be separated by them.
     Of course, a more complex function could separate them.

  9. Shattering
     Definition: A set S of examples is shattered by a set of functions H if, for every partition of the examples in S into positive and negative examples, there is a function in H that gives exactly these labels to the examples.
     Intuition: a rich set of functions shatters large sets of points.
     Example 1: the hypothesis class of left-bounded intervals on the real axis, [0, a) for some real number a > 0.
     [Figure: the real axis starting at 0; points inside [0, a) are labeled positive, points to the right of a are labeled negative]
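The definition translates directly into a brute-force check. In the sketch below, `realizable` is a hypothetical stand-in for the hypothesis class H: it answers whether some function in H produces a given labeling on S.

```python
from itertools import product

def shatters(realizable, S):
    """Direct translation of the definition: H shatters S iff, for every
    labeling of S, some function in H gives exactly those labels."""
    return all(realizable(S, labels)
               for labels in product([False, True], repeat=len(S)))
```

The loop is exponential in |S|, which is fine here: shattering arguments are made about small witness sets, not about large samples.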

  10. Left-bounded intervals
      Example 1: the hypothesis class of left-bounded intervals on the real axis, [0, a) for some real number a > 0.
      Suppose the set S contains only one point.
      If the point is labeled +, we can choose an a to the right of that point; the hypothesis [0, a) then correctly labels the point as positive.
      If the point is labeled -, we can choose an a to the left of that point; the point falls outside [0, a) and is correctly labeled as negative.
      Any set of one point can be shattered by the hypothesis class of left-bounded intervals.

  11. Left-bounded intervals
      Now consider a set S with two points. We can label them so that no hypothesis in the class matches the labels: label the left point - and the right point +.
      Any hypothesis [0, a) that labels the right point positive must place a to its right, but then the left point also falls inside [0, a) and is labeled positive as well.
      So no set of two points can be shattered by left-bounded intervals, and the VC dimension of this class is 1.
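This case analysis is small enough to verify exhaustively. Below is a self-contained Python sketch, assuming (as in the figures) that all points lie strictly to the right of 0; the names are illustrative.

```python
from itertools import product

def interval_realizable(S, labels):
    """Can some hypothesis [0, a) with a > 0 produce `labels` on points S?
    It can iff every positive point lies strictly left of every negative one,
    since any a between the two groups then works."""
    pos = [x for x, lab in zip(S, labels) if lab]
    neg = [x for x, lab in zip(S, labels) if not lab]
    return not pos or not neg or max(pos) < min(neg)

def shattered_by_intervals(S):
    return all(interval_realizable(S, labels)
               for labels in product([False, True], repeat=len(S)))

print(shattered_by_intervals([3.0]))       # True: any one point is shattered
print(shattered_by_intervals([2.0, 5.0]))  # False: the (-, +) labeling fails
```

The failing case is exactly the labeling on the slide: a negative point to the left of a positive one.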
