SLIDE 1

Learning From Data, Lecture 14: Three Learning Principles

Occam’s Razor, Sampling Bias, Data Snooping

M. Magdon-Ismail

CSCI 4100/6100

SLIDE 2

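Recap: Validation and Cross Validation

[Figure, Validation: D (N points) is split into Dtrain (N − K points), which is used to learn g⁻, and Dval (K points), which gives Eval(g⁻).]

[Figure, Cross Validation: Dn is D with (xn, yn) left out; training on D1, D2, …, DN gives g1⁻, g2⁻, …, gN⁻ with errors e1, e2, …, eN on the left-out points; Ecv is their average.]

Model Selection

[Figure: H1, H2, H3, …, HM each produce a candidate g1, g2, g3, …, gM.]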
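For concreteness, here is a minimal sketch of the cross-validation loop. The learner (a least-squares line), the squared-error measure and the data are illustrative assumptions, not taken from the slide. The point is only the loop: leave (xn, yn) out, train on the rest to get gn⁻, record en on the left-out point, and average.

```python
def fit_line(xs, ys):
    """Least-squares line y ~ w0 + w1 * x: the illustrative learner (an assumption)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w1 = sxy / sxx
    return my - w1 * mx, w1


def loo_cv_error(xs, ys):
    """E_cv: the average of e_1, ..., e_N, where e_n is the squared error of g_n^-
    (trained on D_n, i.e. D with (x_n, y_n) left out) on the left-out point."""
    errors = []
    for n in range(len(xs)):
        train_x = xs[:n] + xs[n + 1:]  # D_n
        train_y = ys[:n] + ys[n + 1:]
        w0, w1 = fit_line(train_x, train_y)  # g_n^-
        errors.append((w0 + w1 * xs[n] - ys[n]) ** 2)  # e_n
    return sum(errors) / len(errors)


print(loo_cv_error([0.0, 1.0, 2.0], [0.0, 0.0, 3.0]))  # 6.75 = (9 + 2.25 + 9) / 3
```

For model selection, the same function would be run once per candidate model and the one with the smallest Ecv kept.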

© AML Creator: Malik Magdon-Ismail

Three Learning Principles: 2 /58

SLIDE 3

We Will Discuss . . .

  • Occam’s Razor: pick a model carefully
  • Sampling Bias: generate the data carefully
  • Data Snooping: handle the data carefully

SLIDE 4

Occam’s Razor

SLIDE 5

Occam’s Razor

use a ‘razor’ to ‘trim down’

“an explanation of the data to make it as simple as possible but no simpler.”

attributed to William of Occam (14th Century) and often mistakenly to Einstein

SLIDE 6

Simpler is Better

The simplest model that fits the data is also the most plausible.

. . . or, beware of using complex models to fit data.

SLIDE 7

What is Simpler?

simple hypothesis h              | simple hypothesis set H
Ω(h)                             | Ω(H)
low-order polynomial             | H with small dvc
hypothesis with small weights    | small number of hypotheses
easily described hypothesis      | low-entropy set
. . .                            | . . .

The equivalence: a hypothesis set with simple hypotheses must be small.

We had a glimpse of this: the soft-order constraint (smaller H) is equivalent, for a suitable λ, to minimizing Eaug (which favors a simpler h).
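Here is a small sketch of the λ link. It assumes the weight-decay form Eaug(w) = Ein(w) + (λ/N) w², a one-weight model h(x) = wx and squared error, all chosen for illustration. Increasing λ shrinks the learned weight toward zero, and that is the sense in which minimizing Eaug favors a simpler h.

```python
def weight_decay_fit(xs, ys, lam):
    """Minimize E_aug(w) = (1/N) sum (w*x_n - y_n)^2 + (lam/N) w^2 for h(x) = w*x.
    Setting dE_aug/dw = 0 gives the closed form w = sum(x*y) / (sum(x^2) + lam)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def e_aug(w, xs, ys, lam):
    """Augmented error: in-sample squared error plus the weight-decay penalty."""
    n = len(xs)
    e_in = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
    return e_in + (lam / n) * w * w


xs, ys = [1.0, 2.0], [2.0, 4.0]
for lam in [0.0, 1.0, 5.0, 50.0]:
    # larger lam -> smaller |w|: the penalty pulls the hypothesis toward the simple h(x) = 0
    print(lam, weight_decay_fit(xs, ys, lam))
```

With λ = 0 this is plain least squares (w = 2.0 here); λ = 5 halves the weight to 1.0.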

SLIDE 10

Why Is Simpler Better?

Mathematically: a simple H curtails the ability to fit noise, its VC dimension is small, and blah and blah . . .

Simpler is better because you will be more “surprised” when you fit the data.

If something unlikely happens, it is very significant when it happens.

. . .
Inspector Gregory: “Is there any other point to which you would wish to draw my attention?”
Sherlock Holmes: “To the curious incident of the dog in the night-time.”
Inspector Gregory: “The dog did nothing in the night-time.”
Sherlock Holmes: “That was the curious incident.”
. . .
– Silver Blaze, Sir Arthur Conan Doyle

SLIDE 13

A Scientific Experiment

  • Axiom. If an experiment has no chance of falsifying a hypothesis, then the result of that experiment provides no evidence one way or the other for the hypothesis.

[Figure: Scientist 1, Scientist 2 and Scientist 3 each plot resistivity ρ against temperature T; the panels are annotated “no evidence”, “very convincing” and “some evidence?”]

Who provides most evidence for the hypothesis “ρ is linear in T”?

SLIDE 15

Scientist 2 Versus Scientist 3

Who provides most evidence?

SLIDE 16

Scientist 1 versus Scientist 3

Who provides most evidence?

SLIDE 17

Axiom of Non-Falsifiability

  • Axiom. If an experiment has no chance of falsifying a hypothesis, then the result of that experiment provides no evidence one way or the other for the hypothesis.

Who provides most evidence?

SLIDE 18

Falsification and mH(N)

If H shatters x1, · · · , xN,
  – Don’t be surprised if you fit the data.
  – You can’t falsify “H is a good set of candidate hypotheses for f”.

If H doesn’t shatter x1, · · · , xN, and the target values are uniformly distributed,

$$P[\text{falsification}] \ge 1 - \frac{m_{\mathcal{H}}(N)}{2^N}.$$

A good fit is surprising with a simple H, hence significant. You could have falsified “H is a good set of candidate hypotheses for f”, but didn’t.

The data must have a chance to win.
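To see the bound in a case where it can be checked exactly, take positive rays on the line (an example chosen here, not from the slide): h(x) = +1 if x > a and −1 otherwise, whose growth function is mH(N) = N + 1. Enumerating all 2^N equally likely labelings of N distinct points and counting those that no positive ray fits gives exactly 1 − (N + 1)/2^N. With a simple H, the data almost always has a chance to win.

```python
from itertools import product


def realizable_by_positive_ray(labels):
    """Labels of sorted points x_1 < ... < x_N. A positive ray (+1 if x > a, else -1)
    can produce them iff they are some -1's followed by some +1's."""
    seen_plus = False
    for y in labels:
        if y == +1:
            seen_plus = True
        elif seen_plus:  # a -1 to the right of a +1: no threshold a works
            return False
    return True


def falsification_probability(n):
    """Fraction of the 2^N equally likely labelings that positive rays cannot fit."""
    fits = sum(realizable_by_positive_ray(lab) for lab in product([-1, +1], repeat=n))
    return 1 - fits / 2 ** n


for n in [2, 5, 10]:
    # simulated fraction vs. the bound 1 - m_H(N) / 2^N with m_H(N) = N + 1
    print(n, falsification_probability(n), 1 - (n + 1) / 2 ** n)
```

Here the bound holds with equality because positive rays produce exactly N + 1 dichotomies on any N distinct points.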

SLIDE 21

Learning Goes Beyond Occam’s Razor

We may opt for ‘a simpler fit than possible’, namely an imperfect fit of the data using a simple model over a perfect fit using a more complex one. The reason is that the price we pay for a perfect fit in terms of the penalty for model complexity may be too much in comparison to the benefit of the better fit.

– Learning From Data, Abu-Mostafa, Magdon-Ismail, Lin

SLIDE 22

Postal Scam

SLIDE 23

A Puzzle – The Football Oracle

Saturday, Oct 13, 2012

Home team will win the Monday Night Football game.

This happens for 5 weeks in a row.

SLIDE 26

A Puzzle – The Football Oracle . . . on the 6th week

Saturday, Nov 17, 2012

Call 1-900-555-5555 for winner; $50 charge applied

Ein = 0! Meaningless without knowing the ‘complexity’ of the process leading to that!

SLIDE 27

What did the Oracle Really Do?

[Table: each column is the sequence of predictions mailed to one of 32 recipients; “you” are one of the columns]

day 1:  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
day 2:  1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
day 3:  1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0
day 4:  1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0
day 5:  1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0

Single hypothesis that worked?

SLIDE 33

What did the Oracle Really Do?

Every possible hypothesis, one of which worked?
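The oracle's letters can be simulated directly. The sketch below rebuilds the table's five rows (the bit trick in `prediction` is our own way of generating them, not something on the slide) and checks that, whatever the five real outcomes are, exactly one of the 32 recipients has seen a perfect record. Ein = 0 for that recipient is guaranteed, not surprising.

```python
from itertools import product

WEEKS = 5
RECIPIENTS = 2 ** WEEKS  # 32 columns in the table


def prediction(recipient, week):
    """Prediction (1 or 0) mailed to `recipient` (0..31) in `week` (1..5), matching the
    table: week 1 splits recipients 16/16, week 2 into blocks of 8, ..., week 5 alternates."""
    return 1 - ((recipient >> (WEEKS - week)) & 1)


def perfect_records(outcomes):
    """Recipients whose every mailed prediction matched the actual outcome (E_in = 0)."""
    return [r for r in range(RECIPIENTS)
            if all(prediction(r, w + 1) == outcomes[w] for w in range(WEEKS))]


# Whatever happens on the field, exactly one recipient sees 5 correct predictions in a row.
for outcomes in product([0, 1], repeat=WEEKS):
    assert len(perfect_records(outcomes)) == 1

print(perfect_records((1, 1, 1, 1, 1)))  # [0]: the first column of the table
```

Since the 32 columns cover every possible 5-week sequence, the oracle's "hypothesis set" has 2^5 members, and a perfect fit on 5 points carries no evidence at all.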

SLIDE 36

Sampling Bias

SLIDE 37

November 3rd 1948, Dewey Defeats Truman

  • The Tribune wanted to show off its latest technology: it could go to press earlier.
  • It ran a telephone poll on how people voted; the statisticians had done their thing and were confident.

SLIDE 38

Imagine Their Surprise When . . .

SLIDE 39

Sampling Bias in Learning

If the data is sampled in a biased way, learning will produce a similarly biased outcome.

. . . or, make sure the training and test distributions are the same.

You cannot draw a sample from one bin and make claims about another bin.

SLIDE 40

Examples

  • Kids and social media – the highlight reel.
  • Taller, Fatter, Older: How Humans Have Changed in 100 Years.
  • The GRE: A test that fails.

SLIDE 41

Extrapolation

[Figure: # Copies Sold versus Amazon Ranking]

SLIDE 42

Extrapolation is Hard

[Figure: # Copies Sold versus Amazon Ranking]

SLIDE 43

Dealing with the Training-Test Mismatch

Think more carefully about what f should look like.

You need some additional help from outside the data, by choosing a good H. In our ranking example, account for the fat tail → hyperbola.

[Figure: # Copies Sold versus Amazon Ranking, with a hyperbola fit]

Account for the training-test mismatch during learning.

There are methods that reweight or resample the data which can help. If the test data have zero representation in training, you are in trouble; think carefully about f.

[Figure: Probability versus Amazon Ranking for the test and training distributions]
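A minimal sketch of one reweighting method, importance weighting: each training point's loss is multiplied by w(x) = ptest(x)/ptrain(x), so the weighted average estimates the error under the test distribution. It assumes both input distributions are known (in practice they must be estimated) and uses made-up discrete inputs and losses. The check at the top mirrors the warning above: if some test input has zero probability under training, no weight can recover it.

```python
def importance_weighted_error(train_xs, train_losses, p_train, p_test):
    """Estimate the expected loss under the test distribution from points drawn from the
    training distribution, weighting each loss by w(x) = p_test(x) / p_train(x)."""
    for x, p in p_test.items():
        if p > 0 and p_train.get(x, 0) == 0:
            # test inputs with zero representation in training: reweighting cannot help
            raise ValueError(f"input {x!r} never occurs under the training distribution")
    weighted = [p_test.get(x, 0) / p_train[x] * loss
                for x, loss in zip(train_xs, train_losses)]
    return sum(weighted) / len(weighted)


# Hypothetical discrete inputs 0, 1, 2 (think: ranking buckets) with known distributions.
p_train = {0: 0.6, 1: 0.3, 2: 0.1}
p_test = {0: 0.1, 1: 0.3, 2: 0.6}
train_xs = [0] * 6 + [1] * 3 + [2]      # a training sample in exact proportion to p_train
losses = [float(x) for x in train_xs]   # assume the loss at input x is simply x
print(sum(losses) / len(losses))        # unweighted estimate: 0.5
print(importance_weighted_error(train_xs, losses, p_train, p_test))  # close to 1.5
```

The true test-distribution error here is 0.1·0 + 0.3·1 + 0.6·2 = 1.5; the unweighted average badly underestimates it because the expensive inputs are rare in training.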

SLIDE 44

Puzzle - Credit Analysis

  • Determine credit given salary, debt, years in residence, . . .
  • Banks have lots of data:
    – customer information: salary, debt, etc.
    – whether or not they defaulted on their credit.

age              32 years
gender           male
salary           40,000
debt             26,000
years in job     1 year
years at home    3 years
. . .            . . .

Approve for credit?

Where is the sampling bias?

SLIDE 45

Puzzle - Credit Analysis

  • Banks have lots of data:
    – customer information: salary, debt, etc.
    – whether or not they (who?) defaulted on their credit.
  • Only data on approved customers.
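A hypothetical simulation of where the bias comes from. The risk curve, salary range and approval threshold are invented for illustration. What matters is that the bank only records outcomes for applicants its old policy approved, so estimates from its data describe approved customers, not the applicants a new credit rule will be applied to.

```python
import random


def default_risk(salary):
    """Assumed (hypothetical) risk curve: 60% default at 10,000 falling linearly to 10% at 100,000."""
    return 0.6 - 0.5 * (salary - 10_000) / 90_000


def simulate_bank(n_applicants=100_000, approval_threshold=50_000, seed=0):
    """Return (default rate in the bank's records, default rate over all applicants).
    The old policy approves only salaries >= approval_threshold, so only those
    customers' outcomes are ever recorded: the training data come from a different bin."""
    rng = random.Random(seed)
    recorded, everyone = [], []
    for _ in range(n_applicants):
        salary = rng.uniform(10_000, 100_000)
        defaulted = rng.random() < default_risk(salary)
        everyone.append(defaulted)
        if salary >= approval_threshold:
            recorded.append(defaulted)  # the only labels the bank ever sees
    return sum(recorded) / len(recorded), sum(everyone) / len(everyone)


seen, true_rate = simulate_bank()
print(f"default rate in the bank's data: {seen:.3f}; over all applicants: {true_rate:.3f}")
```

Under these assumptions the recorded rate is about 0.24 while the applicant-wide rate is about 0.35; nothing in the bank's own data reveals the gap.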

SLIDE 47

Data Snooping

SLIDE 48

Data Snooping

If a data set has affected any step in the learning process, it cannot be fully trusted in assessing the outcome.

. . . or, estimate performance with a completely uncontaminated test set

. . . and, choose H before looking at the data

SLIDE 49

Puzzle: The Buy and Hold Strategy on S&P 500 Stocks

[Figure: buy-and-hold portfolio value, 1985 to 2010: 16.2% return]

Sampling bias: we didn’t buy and hold a random sample of stocks.
Snooping: we chose which stocks to hold by ‘snooping’ into the test set (the future).

SLIDE 50

Puzzle: The Buy and Hold Strategy on S&P 500 Stocks

[Figure: buy-and-hold portfolio value, 1985 to 2010: snooping/sampling bias, 16.2% return; actual S&P, 8.3% return]
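A hypothetical simulation of the same effect; it does not reproduce the 16.2% and 8.3% figures. In a made-up market where no stock has a real edge (every yearly log-return is drawn from Normal(0, 0.3)), choosing which stocks to hold by looking at how they ended up still produces an impressive backtest.

```python
import random


def buy_and_hold_demo(n_stocks=500, n_years=25, keep=100, seed=1):
    """Hypothetical market: each stock's yearly log-return is Normal(0, 0.3), so no stock
    has a real edge. Return (average annual log-return of the `keep` stocks with the best
    final value, average annual log-return of all stocks)."""
    rng = random.Random(seed)
    totals = [sum(rng.gauss(0.0, 0.3) for _ in range(n_years)) for _ in range(n_stocks)]
    # Snooping: pick the stocks to "hold" by looking at how they ended up (the future).
    winners = sorted(totals, reverse=True)[:keep]
    return sum(winners) / keep / n_years, sum(totals) / n_stocks / n_years


snooped, honest = buy_and_hold_demo()
print(f"snooped portfolio: {snooped:.3f} per year; all stocks: {honest:.3f} per year")
```

The snooped portfolio shows roughly 8% a year of log-return, while the honest average is close to zero, which is what the market actually offers under these assumptions.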

SLIDE 51

Data Snooping is a Subtle Happy Hell

  • The data looks linear, so I will use a linear model, and it worked.

If the data were different and didn’t look linear, would you do something different?

  • Try linear, it fails; try circles, it works.

If you torture the data enough, it will confess.

  • Try linear, it works; so I don’t need to try circles.

Would you have tried circles if the data were different?

  • Read papers, see what others did on the data. Modify and improve on that.

If the data were different, would that modify what others did and hence what you did? Data snooping can happen all at once, or sequentially by different people.

  • Input normalization: normalize the data, now set aside the test set.

Since the test set was involved in the normalization, wouldn’t your g change if the test set changed?
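A minimal sketch of the safe order for the input-normalization example: split first, compute the normalization statistics from the training part only, then apply those same statistics to the test set. The data and the choice of mean/standard-deviation scaling are illustrative.

```python
import statistics


def fit_normalizer(train_values):
    """Mean and (population) standard deviation computed from the training inputs only."""
    return statistics.fmean(train_values), statistics.pstdev(train_values)


def normalize(values, mu, sigma):
    return [(v - mu) / sigma for v in values]


def split_then_normalize(data, n_train):
    """Set aside the test set FIRST, then fit the normalization on the training part."""
    train, test = data[:n_train], data[n_train:]
    mu, sigma = fit_normalizer(train)  # the test set plays no role here
    return normalize(train, mu, sigma), normalize(test, mu, sigma)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0, 100.0]
train_z, test_z = split_then_normalize(data, n_train=8)
print(train_z[:3], test_z)  # [-1.5, -0.5, -0.5] [47.5]
```

Calling `fit_normalizer(data)` on the full set instead would let the test point shift the mean and scale, so g would change if the test set changed.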

SLIDE 55

Account for Data Snooping

Ask yourself: “If the data were different, could/would I have done something different?”

If yes, then there is data snooping.

[Figure: D → your choices → g]

You must account for every choice influenced by D. We know how to account for the choice of g from H.
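One way to make "account for every choice" concrete, under a simplifying assumption: treat all the data-driven choices as picking among M candidate final hypotheses and use the finite-hypothesis Hoeffding and union bound, P[|Ein − Eout| > ε] ≤ 2M e^(−2ε²N). Solving 2M e^(−2ε²N) = δ gives the error bar below. Real choices such as "try circles" also enlarge H, so this is only a sketch of how the bar grows.

```python
import math


def error_bar(n, m, delta=0.05):
    """Smallest eps with 2 * m * exp(-2 * eps**2 * n) <= delta: with probability at least
    1 - delta, |E_in - E_out| <= eps for every one of the m candidates at once."""
    return math.sqrt(math.log(2 * m / delta) / (2 * n))


# The more data-dependent choices (m) you make on the same N = 1000 points, the wider the bar.
for m in [1, 10, 1_000, 1_000_000]:
    print(f"m = {m:>9,}: eps = {error_bar(1_000, m):.3f}")
```

Choices you do not count are still paid for: the reported ε is simply too optimistic.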

SLIDE 58

Three Learning Principles

  • Occam’s Razor: pick a model carefully

Simpler H is better.

  • Sampling Bias: generate the data carefully

Make sure you train and test from the same bin.

  • Data Snooping: handle the data carefully

Account for all choices the data influenced. Choose H before you see the data.
