Learning From Data, Lecture 5: Training Versus Testing

  1. Learning From Data, Lecture 5: Training Versus Testing
     Outline: The Two Questions of Learning; Theory of Generalization ($E_{\text{in}} \approx E_{\text{out}}$); An Effective Number of Hypotheses; A Combinatorial Puzzle.
     M. Magdon-Ismail, CSCI 4100/6100

  2. recap: The Two Questions of Learning
     1. Can we make sure that E_out(g) is close enough to E_in(g)?
     2. Can we make E_in(g) small enough?
     The Hoeffding generalization bound:
         $E_{\text{out}}(g) \le \underbrace{E_{\text{in}}(g)}_{\text{in-sample error}} + \underbrace{\sqrt{\tfrac{1}{2N}\ln\tfrac{2|\mathcal{H}|}{\delta}}}_{\text{error bar (model complexity)}}$
     E_in: training (e.g. the practice exam). E_out: testing (e.g. the real exam).
     The error bar grows with |H|, so there is a tradeoff when picking |H|.
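A minimal numeric sketch of this bound (the function and parameter names are my own, not from the lecture): the error bar shrinks like $1/\sqrt{N}$ but grows only logarithmically with |H|.

```python
import math

def hoeffding_error_bar(N, M, delta=0.05):
    """Error bar of the Hoeffding bound: sqrt((1/(2N)) * ln(2M/delta)),
    where M = |H| and delta is the tolerated failure probability."""
    return math.sqrt(math.log(2 * M / delta) / (2 * N))

# A bigger H costs surprisingly little; fewer data points cost a lot:
for N, M in [(100, 1), (100, 10_000), (10_000, 10_000)]:
    print(f"N={N:6d}, |H|={M:6d}: error bar = {hoeffding_error_bar(N, M):.3f}")
```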

  3. What Will The Theory of Generalization Achieve?
     The Hoeffding bound for finite H,
         $E_{\text{out}}(g) \le E_{\text{in}}(g) + \sqrt{\tfrac{1}{2N}\ln\tfrac{2|\mathcal{H}|}{\delta}}$,
     will be replaced by a bound with a growth function m_H in place of |H|,
         $E_{\text{out}}(g) \le E_{\text{in}}(g) + \sqrt{\tfrac{8}{N}\ln\tfrac{4\,m_{\mathcal{H}}(2N)}{\delta}}$.
     In both, the first term is the in-sample error and the second is the model complexity.
     The new bound will be applicable to infinite H.
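To see why this helps, here is a hedged numeric sketch (the formula is the bound as reconstructed above; the growth function m_H(n) = n + 1 is that of the 1-D positive ray from a later slide): the bar vanishes as N grows, even though |H| is infinite.

```python
import math

def vc_style_bar(N, m_H_2N, delta=0.05):
    """Error bar of the growth-function bound: sqrt((8/N) * ln(4*m_H(2N)/delta))."""
    return math.sqrt(8.0 / N * math.log(4 * m_H_2N / delta))

# 1-D positive ray: m_H(n) = n + 1, hence m_H(2N) = 2N + 1.
for N in (100, 1_000, 100_000):
    print(f"N={N:7d}: error bar = {vc_style_bar(N, 2 * N + 1):.3f}")
```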

  4. Why is |H| an Overkill?
     How did |H| come in? Define the bad events
         $B_m = \{\, |E_{\text{out}}(h_m) - E_{\text{in}}(h_m)| > \epsilon \,\}$,   $B_g = \{\, |E_{\text{out}}(g) - E_{\text{in}}(g)| > \epsilon \,\}$.
     We do not know which hypothesis will be picked as g, so we use a worst-case union bound:
         $P[B_g] \le P[\text{any } B_m] \le \sum_{m=1}^{|\mathcal{H}|} P[B_m]$.
     • The B_m are events (sets of outcomes); they can overlap.
     • If the B_m overlap, the union bound is loose.
     • If many h_m are similar, the B_m overlap.
     • There are "effectively" fewer than |H| hypotheses.
     • We can replace |H| by something smaller.
     |H| fails to account for similarity between hypotheses.
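A minimal Monte Carlo sketch of this looseness (the coin setup and all names are illustrative assumptions, not from the lecture): when M hypotheses are identical, their bad events overlap completely, so P[any B_m] = P[B_1], while the union bound still charges M * P[B_1].

```python
import random

def union_bound_looseness(trials=20_000, N=100, M=50, eps=0.1, seed=1):
    """M identical 'hypotheses' share one bad event: |nu - mu| > eps
    for a fair coin sampled N times (mu = 0.5, nu = sample frequency)."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        nu = sum(rng.random() < 0.5 for _ in range(N)) / N
        bad += abs(nu - 0.5) > eps
    p_one = bad / trials   # full overlap: P[any B_m] = P[B_1]
    print(f"P[any B_m] ~ {p_one:.3f};  union bound: min(1, M*P[B_1]) = {min(1.0, M * p_one):.3f}")

union_bound_looseness()
```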

  5. Measuring the Diversity (Size) of H
     We need a way to measure the diversity of H. A simple idea: fix any set of N data points. If H is diverse, it should be able to implement all functions on these N points.

  6. A Data Set Reveals the True Colors of an H
     [Figure: a hypothesis set H, before any data is seen.]

  7. A Data Set Reveals the True Colors of an H
     [Figure: the same H, seen through the eyes of the data set D.]

  8. A Data Set Reveals the True Colors of an H
     From the point of view of D, the entire H is just one dichotomy.

  9. An Effective Number of Hypotheses
     If H is diverse, it should be able to implement many dichotomies; |H| only captures the maximum possible diversity of H.
     Consider an h ∈ H and a data set x_1, ..., x_N. Then h gives us an N-tuple of ±1's, (h(x_1), ..., h(x_N)): a dichotomy of the inputs.
     If H is diverse, we get many different dichotomies. If H contains similar functions, we only get a few dichotomies. The growth function quantifies this.
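A one-liner makes the notion concrete (a sketch with assumed names; the example hypothesis is mine, not the lecture's): a dichotomy is just the tuple of a hypothesis's ±1 outputs on the sample.

```python
def dichotomy(h, points):
    """The N-tuple (h(x_1), ..., h(x_N)) that h induces on the sample."""
    return tuple(h(x) for x in points)

h = lambda x: 1 if x > 0.5 else -1            # one 1-D hypothesis
print(dichotomy(h, [0.1, 0.4, 0.7, 0.9]))     # (-1, -1, 1, 1)
```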

  10. The Growth Function m_H(N)
      Define the restriction of H to the inputs x_1, x_2, ..., x_N:
          $\mathcal{H}(\mathbf{x}_1, \ldots, \mathbf{x}_N) = \{\, (h(\mathbf{x}_1), \ldots, h(\mathbf{x}_N)) \mid h \in \mathcal{H} \,\}$   (the set of dichotomies induced by H).
      The growth function m_H(N) is the size of the largest set of dichotomies induced by H:
          $m_{\mathcal{H}}(N) = \max_{\mathbf{x}_1, \ldots, \mathbf{x}_N} |\mathcal{H}(\mathbf{x}_1, \ldots, \mathbf{x}_N)|$.
      Since dichotomies are N-tuples of ±1's, m_H(N) ≤ 2^N.
      Can we replace |H| by m_H, an effective number of hypotheses?
      • Replacing |H| with 2^N is no help in the bound. (Why? With |H| = 2^N, the error bar $\sqrt{\tfrac{1}{2N}\ln\tfrac{2|\mathcal{H}|}{\delta}}$ is at least $\sqrt{\tfrac{\ln 2}{2}}$, a constant that never shrinks with N.)
      • We want m_H(N) ≤ poly(N) to get a useful error bar.
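The definition translates directly to code. A hedged brute-force sketch (helper names are mine): restrict a finite sample of a hypothesis set to fixed points, then take the max over candidate point sets; with sampled hypotheses or point sets this gives a lower bound on the true m_H(N).

```python
def dichotomy_set(hypotheses, points):
    """H(x_1, ..., x_N): the set of dichotomies induced on fixed points."""
    return {tuple(h(x) for x in points) for h in hypotheses}

def growth_estimate(hypotheses, candidate_point_sets):
    """Max over candidate point sets of the number of induced dichotomies;
    a lower bound on m_H(N) unless the candidates cover the worst case."""
    return max(len(dichotomy_set(hypotheses, pts)) for pts in candidate_point_sets)

# 1-D positive rays h(x) = sign(x - w0), sampled on a dense grid of thresholds:
rays = [(lambda w0: lambda x: 1 if x > w0 else -1)(w / 100) for w in range(-100, 201)]
print(growth_estimate(rays, [[0.1, 0.4, 0.7, 0.9], [0.2, 0.3, 0.8, 0.95]]))  # 5 = N + 1
```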

  11. Example: 2-D Perceptron Model
      [Figure panels: a dichotomy the perceptron cannot implement; a 3-point set where it can implement all 8; a 4-point set where it can implement at most 14.]
      m_H(3) = 8 = 2^3.   m_H(4) = 14 < 2^4.   What is m_H(5)?
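A sampling-based check of these counts (a hedged sketch: random weight vectors can only undercount, but with enough samples they find every realizable dichotomy on these small point sets with high probability):

```python
import random

def perceptron_dichotomies(points, samples=200_000, seed=0):
    """Dichotomies h(x) = sign(w0 + w1*x1 + w2*x2) found by random sampling
    of the weights; a lower bound on the exact set."""
    rng = random.Random(seed)
    found = set()
    for _ in range(samples):
        w0, w1, w2 = (rng.uniform(-1, 1) for _ in range(3))
        found.add(tuple(1 if w0 + w1 * x + w2 * y > 0 else -1 for x, y in points))
    return found

print(len(perceptron_dichotomies([(0, 0), (1, 0), (0, 1)])))          # 8 = 2^3
print(len(perceptron_dichotomies([(0, 0), (1, 0), (0, 1), (1, 1)])))  # 14 < 2^4
```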

  12. Example: 1-D Positive Ray Model
      [Figure: points x_1 < x_2 < ... < x_N on a line, with the threshold w_0 marking where h switches from -1 to +1.]
      • h(x) = sign(x − w_0).
      • Consider N points.
      • There are N + 1 dichotomies, depending on where you put w_0.
      • m_H(N) = N + 1.
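The N + 1 count can be verified exactly (a sketch, names assumed): only the gap containing w_0 matters, so trying one threshold per gap enumerates every dichotomy.

```python
def positive_ray_dichotomies(xs):
    """Exact dichotomies of h(x) = sign(x - w0) on points xs:
    one candidate threshold per gap (below all, between neighbors, above all)."""
    xs = sorted(xs)
    gaps = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1]
    return {tuple(1 if x > w0 else -1 for x in xs) for w0 in gaps}

print(len(positive_ray_dichotomies([0.1, 0.4, 0.7, 0.9])))  # 5 = N + 1
```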

  13. Example: Positive Rectangles in 2-D
      [Figures: a 4-point configuration x_1, ..., x_4 and a 5-point configuration x_1, ..., x_5.]
      N = 4: H implements all dichotomies, so m_H(4) = 2^4.
      N = 5: in any configuration, some point will be inside a rectangle defined by the others, so m_H(5) < 2^5.
      We have not computed m_H(5): not impossible, but tricky.
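A brute-force sketch (assumed names; +1 inside the rectangle, −1 outside): only the gaps between sorted coordinate values matter for where the rectangle's edges fall, so a finite grid search enumerates the dichotomies exactly for a given point set.

```python
from itertools import combinations

def rectangle_dichotomies(points):
    """Dichotomies of axis-aligned positive rectangles on fixed points."""
    def cuts(vals):
        vals = sorted(set(vals))
        return [vals[0] - 1] + [(a + b) / 2 for a, b in zip(vals, vals[1:])] + [vals[-1] + 1]
    xc, yc = cuts([p[0] for p in points]), cuts([p[1] for p in points])
    return {
        tuple(1 if x1 <= x <= x2 and y1 <= y <= y2 else -1 for x, y in points)
        for x1, x2 in combinations(xc, 2) for y1, y2 in combinations(yc, 2)
    }

diamond = [(0, 1), (1, 0), (2, 1), (1, 2)]
print(len(rectangle_dichotomies(diamond)))             # 16 = 2^4
print(len(rectangle_dichotomies(diamond + [(1, 1)])))  # fewer than 32: the centre point
                                                       # sits inside any rectangle that
                                                       # contains the other four
```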

  14. Example Growth Functions
      N                     1   2   3   4     5    ...
      2-D perceptron        2   4   8   14    ?    ...
      1-D pos. ray          2   3   4   5     6    ...
      2-D pos. rectangles   2   4   8   16  < 2^5  ...
      • m_H(N) drops below 2^N: there is hope for the generalization bound.
      • A break point is any n for which m_H(n) < 2^n.
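Break points are easy to hunt for mechanically. A tiny sketch (assuming a closed-form growth function is available):

```python
def break_points(m_H, n_max=10):
    """All n up to n_max where the growth function falls below 2^n."""
    return [n for n in range(1, n_max + 1) if m_H(n) < 2 ** n]

print(break_points(lambda n: n + 1))   # 1-D positive ray: [2, 3, ..., 10], first break at n = 2
```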

  15. A Combinatorial Puzzle
      x1  x2  x3
      ◦   ◦   ◦
      ◦   ◦   •
      ◦   •   ◦
      ◦   •   •
      A set of dichotomies.

  16. A Combinatorial Puzzle
      x1  x2  x3
      ◦   ◦   ◦
      ◦   ◦   •
      ◦   •   ◦
      ◦   •   •
      Two points are shattered: all four patterns appear on (x2, x3).

  17. A Combinatorial Puzzle
      x1  x2  x3
      ◦   ◦   ◦
      ◦   ◦   •
      ◦   •   ◦
      •   ◦   ◦
      No pair of points is shattered.
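A quick mechanical check of that claim (a sketch; I encode ◦ as −1 and • as +1): a pair is shattered exactly when all four ±1 patterns appear in its two columns.

```python
from itertools import combinations

dichos = [(-1, -1, -1), (-1, -1, 1), (-1, 1, -1), (1, -1, -1)]  # the table above
for i, j in combinations(range(3), 2):
    patterns = {(d[i], d[j]) for d in dichos}
    print(f"(x{i+1}, x{j+1}): {len(patterns)}/4 patterns, shattered = {len(patterns) == 4}")
```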

  18. A Combinatorial Puzzle
      x1  x2  x3        x1  x2  x3  x4
      ◦   ◦   ◦         ◦   ◦   ◦   ◦
      ◦   ◦   •         ◦   ◦   ◦   •
      ◦   •   ◦         ...
      •   ◦   ◦
      For N = 3, 4 dichotomies is the maximum. If N = 4, how many dichotomies are possible with no 2 points shattered?
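The puzzle can be settled by brute force over all subsets of the 2^N dichotomies (a sketch with assumed names, feasible only for tiny N):

```python
from itertools import combinations, product

def no_pair_shattered(dichos, N):
    """True if no pair of points sees all four +-1 patterns."""
    return not any(
        len({(d[i], d[j]) for d in dichos}) == 4
        for i, j in combinations(range(N), 2)
    )

def max_dichotomies(N):
    """Largest set of dichotomies on N points with no 2 points shattered."""
    all_d = list(product([-1, 1], repeat=N))
    best = 0
    for mask in range(1 << len(all_d)):   # 2^(2^N) subsets: tiny N only
        subset = [d for b, d in enumerate(all_d) if mask >> b & 1]
        if len(subset) > best and no_pair_shattered(subset, N):
            best = len(subset)
    return best

print(max_dichotomies(3))   # 4, matching the slide
print(max_dichotomies(4))   # 5: the puzzle's answer
```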
