Learning From Data, Lecture 14: Three Learning Principles (Occam's Razor, Sampling Bias, Data Snooping)


  1. Learning From Data, Lecture 14: Three Learning Principles. Occam’s Razor, Sampling Bias, Data Snooping. M. Magdon-Ismail, CSCI 4100/6100

  2. recap: Validation and Cross Validation. Validation: split D into D_train (size N − K) and D_val (size K); train g^- on D_train and estimate its error as E_val(g^-) on D_val. Cross validation (leave-one-out): for n = 1, …, N, train g_n on D with (x_n, y_n) removed, compute its error e_n on (x_n, y_n), and take the average: E_cv = (1/N) Σ e_n. Model selection: use these error estimates to choose among models H_1, H_2, …, H_M with trained candidates g_1, g_2, …, g_M.
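The leave-one-out procedure recapped above can be sketched in a few lines. Everything below is a hypothetical setup for illustration: the data-generating target, noise level, and the use of polynomial degree as the "model" being selected are all assumptions, chosen only to show how E_cv is computed and used.

```python
import numpy as np

# Hypothetical 1-D regression data (an assumption for illustration).
rng = np.random.default_rng(0)
N = 20
x = rng.uniform(-1, 1, N)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(N)

def loo_cv_error(x, y, degree):
    """E_cv = (1/N) * sum of e_n, where e_n is the squared error of g_n
    (trained on all points except n) on the left-out point (x_n, y_n)."""
    errors = []
    for n in range(len(x)):
        mask = np.arange(len(x)) != n
        coeffs = np.polyfit(x[mask], y[mask], degree)  # train g_n on D minus (x_n, y_n)
        e_n = (np.polyval(coeffs, x[n]) - y[n]) ** 2   # validate on the held-out point
        errors.append(e_n)
    return np.mean(errors)

# Model selection: pick the hypothesis set (here, polynomial degree)
# with the smallest cross-validation estimate E_cv.
degrees = [1, 2, 3, 5, 8]
e_cv = {d: loo_cv_error(x, y, d) for d in degrees}
best = min(e_cv, key=e_cv.get)
```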

  3. We Will Discuss . . .
     • Occam’s Razor: pick a model carefully
     • Sampling Bias: generate the data carefully
     • Data Snooping: handle the data carefully

  4. Occam’s Razor

  5. Occam’s Razor. Use a ‘razor’ to ‘trim down’ an explanation of the data, making it “as simple as possible, but no simpler.” Attributed to William of Occam (14th century), and often mistakenly to Einstein.

  6. Simpler is Better. The simplest model that fits the data is also the most plausible . . . or: beware of using complex models to fit data.

  7. What is Simpler?
     Simple hypothesis h (complexity Ω(h)): low order polynomial; hypothesis with small weights; easily described hypothesis.
     Simple hypothesis set H (complexity Ω(H)): H with small d_vc; small number of hypotheses; low entropy set.
     The equivalence: a hypothesis set with simple hypotheses must be small.
     We had a glimpse of this: soft order constraint (smaller H) ←→ minimize E_aug with λ (favors simpler h).
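The equivalence between a soft order constraint (a smaller H) and minimizing the augmented error E_aug(w) = E_in(w) + (λ/N)·wᵀw can be seen numerically with weight decay (ridge) regression: as λ grows, the minimizing weights shrink, i.e. the learned hypothesis gets simpler. The data set below is synthetic, an assumption made purely for illustration.

```python
import numpy as np

# Synthetic linear-regression data (an assumption for illustration).
rng = np.random.default_rng(1)
N, d = 30, 5
X = rng.standard_normal((N, d))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + 0.1 * rng.standard_normal(N)

def ridge(X, y, lam):
    """Closed-form minimizer of the augmented squared error:
    w_reg = (X^T X + lam * I)^(-1) X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# The weight norm shrinks monotonically as lambda increases:
# a larger lambda acts like a smaller effective hypothesis set
# (the soft order constraint), favoring simpler hypotheses.
lambdas = [0.0, 1.0, 10.0, 100.0]
norms = [np.linalg.norm(ridge(X, y, lam)) for lam in lambdas]
```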

  10. Why is Simpler Better. Mathematically: simple curtails the ability to fit noise, the VC dimension is small, and so on. Intuitively: simpler is better because you will be more “surprised” when you fit the data; if something unlikely happens, it is very significant when it happens.
     Detective Gregory: “Is there any other point to which you would wish to draw my attention?”
     Sherlock Holmes: “To the curious incident of the dog in the night-time.”
     Detective Gregory: “The dog did nothing in the night-time.”
     Sherlock Holmes: “That was the curious incident.”
     – Silver Blaze, Sir Arthur Conan Doyle

  11. A Scientific Experiment. Axiom: If an experiment has no chance of falsifying a hypothesis, then the result of that experiment provides no evidence one way or the other for the hypothesis.
     [Figure: three plots of resistivity ρ versus temperature T, one per scientist (Scientist 1, Scientist 2, Scientist 3); the outcomes are labeled “no evidence”, “very convincing”, and “some evidence?”.]
     Who provides the most evidence for the hypothesis “ρ is linear in T”?
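A quick numerical way to see the axiom: a line fit to only two (T, ρ) points always fits perfectly, whatever the measurements are, so a perfect fit there cannot falsify "ρ is linear in T" and hence provides no evidence; with many points, a perfect fit could have failed, so it would be informative. A minimal sketch with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(2)

def max_residual(T, rho):
    """Fit rho = a*T + b by least squares and return the worst-case residual."""
    a, b = np.polyfit(T, rho, 1)
    return np.max(np.abs(a * T + b - rho))

# Two points: the residual is (numerically) zero no matter what rho values
# we draw -- the experiment cannot falsify linearity, so it is no evidence.
two_point_residuals = [
    max_residual(np.array([10.0, 20.0]), rng.uniform(0, 5, 2))
    for _ in range(100)
]

# Many points: randomly drawn (nonlinear) rho values are generally NOT fit
# perfectly, so a perfect fit on many points would have been falsifiable,
# and is therefore convincing evidence when it happens.
many_point_residual = max_residual(np.linspace(0.0, 10.0, 30),
                                   rng.uniform(0, 5, 30))
```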

  15. Scientist 2 Versus Scientist 3. (The axiom and figure repeat.) Who provides the most evidence?
  16. Scientist 1 Versus Scientist 3. (The axiom and figure repeat.) Who provides the most evidence?
  17. Axiom of Non-Falsifiability. Axiom: If an experiment has no chance of falsifying a hypothesis, then the result of that experiment provides no evidence one way or the other for the hypothesis.

  18. Falsification and m_H(N). If H shatters x_1, …, x_N: don’t be surprised if you fit the data; you can’t falsify “H is a good set of candidate hypotheses for f”. If H does not shatter x_1, …, x_N, and the target values are uniformly distributed, then P[falsification] ≥ 1 − m_H(N)/2^N. A good fit is then surprising with a simple H, hence significant: you could have falsified “H is a good set of candidate hypotheses for f”, but did not. The data must have a chance to win.
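The bound P[falsification] ≥ 1 − m_H(N)/2^N can be evaluated for concrete growth functions. The example below uses positive rays in one dimension, whose growth function m_H(N) = N + 1 is a standard result from earlier lectures; choosing that particular H is just one illustrative assumption.

```python
from fractions import Fraction

def falsification_bound(m_H_of_N, N):
    """Lower bound on the probability that uniformly random target values
    on N points cannot be fit by H, i.e. that H gets falsified:
    P[falsification] >= 1 - m_H(N) / 2^N."""
    return Fraction(2**N - m_H_of_N, 2**N)

N = 10
# Simple H (positive rays, m_H(N) = N + 1): falsification is very likely,
# so a good fit would be surprising, hence significant.
simple = falsification_bound(N + 1, N)       # 1013/1024

# Complex H that shatters the points (m_H(N) = 2^N): the bound is 0;
# H can never be falsified, and the data has no chance to "win".
complex_ = falsification_bound(2**N, N)      # 0
```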
