Learning From Data
Lecture 14: Three Learning Principles
Occam’s Razor, Sampling Bias, Data Snooping
M. Magdon-Ismail
CSCI 4100/6100

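recap: Validation and Cross Validation

[Figure: Validation. $\mathcal{D}$ ($N$ points) is split into $\mathcal{D}_{\text{train}}$ ($N - K$ points) and $\mathcal{D}_{\text{val}}$ ($K$ points); a hypothesis $g$ is learned from $\mathcal{D}_{\text{train}}$ and evaluated on $\mathcal{D}_{\text{val}}$ to give $E_{\text{val}}(g)$.]

[Figure: Cross Validation. $\mathcal{D}$ is split into $\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_N$ with left-out points $(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)$; the hypotheses $g_1, g_2, \ldots, g_N$ give errors $e_1, e_2, \ldots, e_N$; take the average to get $E_{\text{cv}}$.]

[Figure: Model Selection. $\mathcal{H}_1, \mathcal{H}_2, \mathcal{H}_3, \ldots, \mathcal{H}_M \rightarrow g_1, g_2, g_3, \ldots, g_M$.]

Three Learning Principles. Creator: Malik Magdon-Ismail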
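As a rough illustration of the recap, the sketch below computes a validation error by holding out K points, and a leave-one-out cross-validation error as the average of e_1, ..., e_N. The least-squares line used as the learner, the function names, and the synthetic data are assumptions made for this example; they do not come from the lecture.

```python
import random
from statistics import mean

def fit_line(points):
    """Assumed learner: least-squares line y = a + b*x through (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return lambda x: a + b * x

def squared_error(g, point):
    x, y = point
    return (g(x) - y) ** 2

def validation_error(data, K, rng):
    """Learn on N - K points, report the average error on the K held-out points."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    d_val, d_train = shuffled[:K], shuffled[K:]
    g = fit_line(d_train)
    return mean(squared_error(g, p) for p in d_val)

def cross_validation_error(data):
    """Leave-one-out: e_n is the error of g_n on the single point left out of D_n."""
    errors = []
    for n in range(len(data)):
        d_n = data[:n] + data[n + 1:]   # D_n: all points except (x_n, y_n)
        g_n = fit_line(d_n)
        errors.append(squared_error(g_n, data[n]))
    return mean(errors)                  # take average -> E_cv

if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical noisy linear target, for demonstration only.
    data = [(x / 10, 2.0 * (x / 10) + 1.0 + rng.gauss(0.0, 0.1)) for x in range(-10, 11)]
    print("E_val:", validation_error(data, K=7, rng=rng))
    print("E_cv :", cross_validation_error(data))
```

Both numbers estimate out-of-sample error; the cross-validation estimate uses every point for testing exactly once, at the cost of N separate fits.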

We Will Discuss . . .
• Occam’s Razor: pick a model carefully
• Sampling Bias: generate the data carefully
• Data Snooping: handle the data carefully

Occam’s Razor

use a ‘razor’ to ‘trim down’
“an explanation of the data to make it as simple as possible but no simpler.”
attributed to William of Occam (14th Century) and often mistakenly to Einstein

Simpler is Better

The simplest model that fits the data is also the most plausible.
. . . or, beware of using complex models to fit data

What is Simpler?

simple hypothesis $h$, $\Omega(h)$        simple hypothesis set $\mathcal{H}$, $\Omega(\mathcal{H})$
low order polynomial                      $\mathcal{H}$ with small $d_{\text{vc}}$
hypothesis with small weights             small number of hypotheses
easily described hypothesis               low entropy set
. . .                                     . . .

The equivalence: A hypothesis set with simple hypotheses must be small

We had a glimpse of this:
soft order constraint (smaller $\mathcal{H}$) $\overset{\lambda}{\longleftrightarrow}$ minimize $E_{\text{aug}}$ (favors simpler $h$).
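A minimal sketch of that glimpse, assuming the augmented error takes the usual weight-decay form E_aug(w) = E_in(w) + (λ/N) w·w with squared error, whose minimizer solves (ZᵀZ + λI) w = Zᵀy. The polynomial features, helper names, and data are hypothetical choices for illustration. Raising λ shrinks the weights, which is one of the senses of "simpler h" listed above.

```python
def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - tail) / M[i][i]
    return w

def weight_decay_fit(xs, ys, degree, lam):
    """Minimize E_in(w) + (lam / N) * w.w over polynomials of the given degree."""
    Z = [[x ** k for k in range(degree + 1)] for x in xs]
    d = degree + 1
    ZtZ = [[sum(z[i] * z[j] for z in Z) + (lam if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    Zty = [sum(z[i] * y for z, y in zip(Z, ys)) for i in range(d)]
    return solve(ZtZ, Zty)

def squared_norm(w):
    return sum(v * v for v in w)

if __name__ == "__main__":
    # Hypothetical data: a cubic trend plus alternating "noise".
    xs = [-1 + 2 * i / 9 for i in range(10)]
    ys = [x ** 3 - x + 0.1 * (-1) ** i for i, x in enumerate(xs)]
    for lam in (0.0, 0.01, 1.0, 100.0):
        w = weight_decay_fit(xs, ys, degree=6, lam=lam)
        print(f"lambda = {lam:<6} squared weight norm = {squared_norm(w):.4f}")
```

The printed norms decrease as λ grows: a larger λ corresponds to a tighter soft order constraint, that is, an effectively smaller hypothesis set.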

Why is Simpler Better

Mathematically: simple curtails ability to fit noise, VC-dimension is small, and blah and blah . . .

simpler is better because you will be more “surprised” when you fit the data.

If something unlikely happens, it is very significant when it happens.

. . .
Detective Gregory: “Is there any other point to which you would wish to draw my attention?”
Sherlock Holmes: “To the curious incident of the dog in the night-time.”
Detective Gregory: “The dog did nothing in the night-time.”
Sherlock Holmes: “That was the curious incident.”
. . .
– Silver Blaze, Sir Arthur Conan Doyle

A Scientific Experiment

Axiom of Non-Falsifiability. If an experiment has no chance of falsifying a hypothesis, then the result of that experiment provides no evidence one way or the other for the hypothesis.

[Figure: three panels, Scientist 1, Scientist 2 and Scientist 3, each plotting resistivity $\rho$ against temperature $T$. Panel captions, in order: no evidence; very convincing; some evidence?]

Who provides most evidence for the hypothesis “$\rho$ is linear in $T$”? The comparisons considered are Scientist 2 versus Scientist 3 and Scientist 1 versus Scientist 3.
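The axiom can be made concrete with a small simulation. The sketch below is only an illustration: the number of measurements per experiment, the tolerance `tol`, and the uniformly random resistivity values are assumptions, since the original figure is not recoverable from the text. It asks how often an experiment with a given number of (T, ρ) measurements would contradict "ρ is linear in T" if ρ were in fact unrelated to T.

```python
import random

def max_line_residual(points):
    """Largest deviation of the points from their least-squares line."""
    n = len(points)
    t_bar = sum(t for t, _ in points) / n
    r_bar = sum(r for _, r in points) / n
    stt = sum((t - t_bar) ** 2 for t, _ in points)
    slope = sum((t - t_bar) * (r - r_bar) for t, r in points) / stt
    intercept = r_bar - slope * t_bar
    return max(abs(intercept + slope * t - r) for t, r in points)

def fraction_falsified(num_points, trials=2000, tol=0.05, seed=0):
    """Fraction of random experiments whose data no line fits within tol.

    Requires num_points >= 2. The rho values are drawn uniformly at random,
    so the linear hypothesis is false by construction (an assumption of
    this sketch).
    """
    rng = random.Random(seed)
    temps = [i / (num_points - 1) for i in range(num_points)]
    falsified = 0
    for _ in range(trials):
        data = [(t, rng.random()) for t in temps]
        if max_line_residual(data) > tol:
            falsified += 1
    return falsified / trials

if __name__ == "__main__":
    for n in (2, 3, 10):
        print(n, "measurements -> falsified in", fraction_falsified(n), "of trials")
```

With two measurements a line always fits, so the experiment can never falsify the hypothesis and, by the axiom, a good fit provides no evidence. With more measurements a good fit becomes something that could easily have failed to happen.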

Falsification and $m_{\mathcal{H}}(N)$

If $\mathcal{H}$ shatters $x_1, \ldots, x_N$,
– Don’t be surprised if you fit the data.
– Can’t falsify “$\mathcal{H}$ is a good set of candidate hypotheses for $f$”.

If $\mathcal{H}$ doesn’t shatter $x_1, \ldots, x_N$, and the target values are uniformly distributed,

$$\mathbb{P}[\text{falsification}] \ge 1 - \frac{m_{\mathcal{H}}(N)}{2^N}.$$

A good fit is surprising with simple $\mathcal{H}$, hence significant.

You can, but didn’t falsify “$\mathcal{H}$ is a good set of candidate hypotheses for $f$”

The data must have a chance to win.
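To get a feel for the bound, note that there are 2^N possible ±1 labelings of N points and H can produce at most m_H(N) of them, so under uniformly distributed targets the chance that H fits is at most m_H(N)/2^N. The sketch below evaluates the resulting lower bound on P[falsification]. It substitutes the standard polynomial bound m_H(N) ≤ Σ_{i=0}^{d_vc} C(N, i) for the growth function itself, which is an assumption of this example and makes the result a (possibly looser) lower bound; the function names are hypothetical.

```python
from math import comb

def growth_function_bound(N, d_vc):
    """Upper bound on m_H(N): sum_{i=0}^{d_vc} C(N, i), capped at 2^N."""
    return min(2 ** N, sum(comb(N, i) for i in range(d_vc + 1)))

def falsification_lower_bound(N, d_vc):
    """Lower bound on P[falsification] when targets are uniformly random."""
    return 1.0 - growth_function_bound(N, d_vc) / 2 ** N

if __name__ == "__main__":
    # d_vc = 3 corresponds, for example, to the perceptron in two dimensions.
    for N in (3, 5, 10, 20):
        print(f"N = {N:2d}  P[falsification] >= {falsification_lower_bound(N, 3):.6f}")
```

For N ≤ d_vc the bound is 0, matching the first case on the slide: when the bound on H allows every labeling, the data has no guaranteed chance to win. As N grows past d_vc the bound approaches 1, so a good fit with a simple H becomes increasingly surprising.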
