Learning From Data Lecture 4 Real Learning is Feasible

Real Learning vs. Verification The Two Step Solution to Learning Closer to Reality: Error and Noise

  • M. Magdon-Ismail

CSCI 4100/6100

recap: Verification

[Figure: a hypothesis h on (Age, Income) space with out-of-sample error Eout(h); a data set D gives in-sample error Ein(h) = 2/9.]

Hoeffding: Eout(h) ≈ Ein(h) (with high probability)

P[|Ein(h) − Eout(h)| > ε] ≤ 2e^(−2ε²N).
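The single-hypothesis bound can be checked numerically. A minimal sketch, assuming a fixed hypothesis that misclassifies each point independently with probability μ = Eout(h) = 0.3 (the values of μ, N, and ε are illustrative, not from the slides):

```python
import random
from math import exp

def hoeffding_check(mu=0.3, N=100, eps=0.1, trials=20000, seed=0):
    """Empirical frequency of |Ein - Eout| > eps for one fixed
    hypothesis whose true (out-of-sample) error is mu."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        # Ein: fraction of N i.i.d. points the hypothesis gets wrong
        e_in = sum(rng.random() < mu for _ in range(N)) / N
        if abs(e_in - mu) > eps:
            bad += 1
    return bad / trials

freq = hoeffding_check()
bound = 2 * exp(-2 * 100 * 0.1 ** 2)   # 2 e^(-2 N eps^2), about 0.27
```

The empirical frequency comes out well under the bound; Hoeffding is loose but distribution-free.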

© AM L Creator: Malik Magdon-Ismail

Real Learning – Finite Learning Models

[Figure: hypotheses h1, h2, h3, …, hM, each a classifier on (Age, Income) space with its own out-of-sample error Eout(h1), Eout(h2), Eout(h3), …, Eout(hM); on the same data set D they score Ein(h1) = 2/9, Ein(h2) = 0, Ein(h3) = 5/9, …, Ein(hM) = 6/9.]

Pick the hypothesis with minimum Ein; will Eout be small?
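Why this question is dangerous can be seen in a small simulation. A sketch under assumed values (M = 1000 hypotheses, all with true error 0.5, so none of them is any good, and N = 9 points as in the samples above):

```python
import random

def pick_min_ein(M=1000, N=9, mu=0.5, seed=1):
    """Sketch: M hypotheses, each with true error mu (all equally bad).
    Pick the one with minimum in-sample error on N points and
    report its Ein -- it looks far better than mu."""
    rng = random.Random(seed)
    eins = [sum(rng.random() < mu for _ in range(N)) / N for _ in range(M)]
    return min(eins)

best_ein = pick_min_ein()
# Every hypothesis has Eout = 0.5, yet the selected Ein is near 0:
# this is why the finite-H bound must pay the factor |H|.
```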


recap: 1000 Monkeys Behind Closed Doors

5-question A/B test. Monkeys answer randomly. Child gets all right.

Door:   1  2  3  4  5  6  …  1000  1001
Wrong:  2  3  0  5  0  4  …  1     3

  • What are your chances of picking the child?
  • What can you do about it? (You can’t peek behind the door.)

More Monkeys: Ein Can’t Reach Out to Eout.


recap: Selection Bias Illustrated with Coins

Coin tossing example:

  • If we toss one coin N times and get no heads, it's very surprising:

P = 1/2^N.

We suspect the coin is biased: P[heads] ≈ 0.

  • Toss 70 coins N times each, and find one with no heads. Is it surprising?

P = 1 − (1 − 1/2^N)^70.

Do we expect P[heads] ≈ 0 for the selected coin? Similar to the “birthday problem”: among 30 people, two will likely share the same birthday.

  • This is called selection bias.

Selection bias is a very serious trap, for example in medical screening.

Search Causes Selection Bias
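The two probabilities can be computed directly; N = 10 tosses per coin is an assumed value for illustration:

```python
# Chance that at least one of 70 fair coins shows no heads in N = 10 tosses
p_one = 1 / 2**10                   # a single coin: all tails, ~0.001
p_any = 1 - (1 - p_one) ** 70       # at least one coin out of 70, ~0.066
print(f"{p_one:.5f}  {p_any:.5f}")
```

A 1-in-1000 event for one coin becomes a 1-in-15 event once you search over 70 coins.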


Hoeffding says that Ein(g) ≈ Eout(g) for Finite H

P[|Ein(g) − Eout(g)| > ε] ≤ 2|H|e^(−2ε²N), for any ε > 0.
P[|Ein(g) − Eout(g)| ≤ ε] ≥ 1 − 2|H|e^(−2ε²N), for any ε > 0.

We don’t care how g was obtained, as long as it is from H.

Some Basic Probability (events A, B):
  • Implication: if A ⇒ B (A ⊆ B), then P[A] ≤ P[B].
  • Union bound: P[A or B] = P[A ∪ B] ≤ P[A] + P[B].
  • Bayes’ rule: P[A|B] = P[B|A] · P[A] / P[B].

Proof: Let M = |H|. The event “|Ein(g) − Eout(g)| > ε” implies

“|Ein(h1) − Eout(h1)| > ε” OR … OR “|Ein(hM) − Eout(hM)| > ε”.

So, by the implication and union bounds:

P[|Ein(g) − Eout(g)| > ε] ≤ P[OR_{m=1}^{M} |Ein(hm) − Eout(hm)| > ε]
                          ≤ Σ_{m=1}^{M} P[|Ein(hm) − Eout(hm)| > ε]
                          ≤ 2Me^(−2ε²N).

(The last inequality follows by applying the single-hypothesis Hoeffding bound to each summand.)
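The union bound can also be checked numerically. A sketch with illustrative values (M = 50, N = 100, ε = 0.2, and, for simplicity, hypotheses whose in-sample errors are drawn independently; the union bound itself requires no independence):

```python
import random
from math import exp

def union_bound_check(M=50, N=100, mu=0.5, eps=0.2, trials=2000, seed=2):
    """Empirical frequency that ANY of M hypotheses has
    |Ein - Eout| > eps, versus the bound 2*M*exp(-2*eps^2*N)."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        for _ in range(M):
            e_in = sum(rng.random() < mu for _ in range(N)) / N
            if abs(e_in - mu) > eps:
                bad += 1
                break  # one bad hypothesis makes the whole trial bad
    return bad / trials, 2 * M * exp(-2 * eps**2 * N)

freq, bound = union_bound_check()
```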


Interpreting the Hoeffding Bound for Finite |H|

P[|Ein(g) − Eout(g)| > ε] ≤ 2|H|e^(−2ε²N), for any ε > 0.
P[|Ein(g) − Eout(g)| ≤ ε] ≥ 1 − 2|H|e^(−2ε²N), for any ε > 0.

Theorem. With probability at least 1 − δ,

Eout(g) ≤ Ein(g) + √( (1/(2N)) log(2|H|/δ) ).

We don’t care how g was obtained, as long as g ∈ H.

Proof: Let δ = 2|H|e^(−2ε²N). Then P[|Ein(g) − Eout(g)| ≤ ε] ≥ 1 − δ. In words, with probability at least 1 − δ, |Ein(g) − Eout(g)| ≤ ε, which implies Eout(g) ≤ Ein(g) + ε. From the definition of δ, solve for ε:

ε = √( (1/(2N)) log(2|H|/δ) ).
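The theorem's error bar is easy to evaluate. A sketch with assumed values N = 1000, |H| = 100, δ = 0.05:

```python
from math import log, sqrt

def error_bar(N, M, delta=0.05):
    """Hoeffding error bar: with probability >= 1 - delta,
    Eout(g) <= Ein(g) + sqrt(log(2*M/delta) / (2*N))."""
    return sqrt(log(2 * M / delta) / (2 * N))

# N = 1000 examples, |H| = 100 hypotheses, 95% confidence
eps = error_bar(1000, 100)
print(f"{eps:.4f}")  # about 0.0644
```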


Ein Reaches Outside to Eout when |H| is Small

Eout(g) ≤ Ein(g) + √( (1/(2N)) log(2|H|/δ) ).

If N ≫ ln |H|, then Eout(g) ≈ Ein(g).

  • Does not depend on X, P(x), f, or how g is found.
  • Only requires P(x) to generate the data points, and the test point, independently.

What about Eout ≈ 0?


The 2 Step Approach to Getting Eout ≈ 0:

(1) Eout(g) ≈ Ein(g).
(2) Ein(g) ≈ 0.

Together, these ensure Eout ≈ 0.

How do we verify (1), since we do not know Eout? We must ensure it theoretically: Hoeffding.

We can ensure (2) directly (for example with the PLA), modulo that we can guarantee (1).

There is a tradeoff:

  • Small |H| ⇒ Ein ≈ Eout.
  • Large |H| ⇒ Ein ≈ 0 is more likely.

[Figure: error versus |H|. The in-sample error decreases as |H| grows; the model-complexity penalty √( (1/(2N)) log(2|H|/δ) ) increases; the out-of-sample error bound, their sum, is minimized at an intermediate |H|∗.]
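The tradeoff can be made concrete by minimizing the bound over |H|. The 1/|H| decay of Ein below is a stylized assumption purely for illustration, not anything the slides claim:

```python
from math import log, sqrt

def bound(M, N=1000, delta=0.05):
    """Stylized out-of-sample bound: an assumed in-sample error that
    shrinks like 1/|H|, plus the Hoeffding complexity penalty."""
    e_in = 1.0 / M                              # assumption: Ein ~ 1/|H|
    penalty = sqrt(log(2 * M / delta) / (2 * N))
    return e_in + penalty

# The sum is minimized at an intermediate |H|*, as in the figure
best_M = min(range(1, 2001), key=bound)
```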


Feasibility of Learning (Finite Models)

  • No Free Lunch: can’t know anything outside D, for sure.
  • Can “learn” with high probability if D is i.i.d. from P(x).

Eout ≈ Ein (Ein can reach outside the data set to Eout).

  • We want Eout ≈ 0.
  • The two step solution. We trade Eout ≈ 0 for 2 goals:

(i) Eout ≈ Ein; (ii) Ein ≈ 0. We know Ein, not Eout, but we can ensure (i) if |H| is small. This is a big step!

  • What about infinite H, e.g. the perceptron?


“Complex” Target Functions are Harder to Learn

What happened to the “difficulty” (complexity) of f?

  • Simple f ⇒ can use small H to get Ein ≈ 0 (need smaller N).
  • Complex f ⇒ need large H to get Ein ≈ 0 (need larger N).


Revising the Learning Problem – Adding in Probability

[Diagram: the unknown target function f : X → Y and the unknown input distribution P(x) generate the training examples (x1, y1), (x2, y2), …, (xN, yN), with yn = f(xn); the learning algorithm A selects the final hypothesis g from the hypothesis set H, aiming for g(x) ≈ f(x).]


Error and Noise

Error Measure: how to quantify that h ≈ f. Noise: yn ≠ f(xn).


Finger Print Recognition

f      +1 you −1 intruder

Two types of error. f +1 −1 h +1 no error false accept −1 false reject no error

In any application you need to think about how to penalize each type of error. f +1 −1 h +1 1 −1 10 f +1 −1 h +1 1000 −1 1 Supermarket CIA

Take Away Error measure is specified by the user. If not, choose one that is – plausible (conceptually appealing) – friendly (practically appealing)


Almost All Error Measures are Pointwise

Compare h and f on individual points x using a pointwise error e(h(x), f(x)):

Binary error: e(h(x), f(x)) = [h(x) ≠ f(x)]   (classification)
Squared error: e(h(x), f(x)) = (h(x) − f(x))²   (regression)

In-sample error: Ein(h) = (1/N) Σ_{n=1}^{N} e(h(xn), f(xn)).
Out-of-sample error: Eout(h) = Ex[e(h(x), f(x))].
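These definitions translate directly to code; the toy values of h and f below are hypothetical:

```python
def binary_error(hx, fx):
    """e(h(x), f(x)) = 1 if h(x) != f(x) else 0  (classification)."""
    return int(hx != fx)

def squared_error(hx, fx):
    """e(h(x), f(x)) = (h(x) - f(x))^2  (regression)."""
    return (hx - fx) ** 2

def e_in(h_vals, f_vals, e):
    """Ein(h) = (1/N) * sum_n e(h(x_n), f(x_n))."""
    return sum(e(hx, fx) for hx, fx in zip(h_vals, f_vals)) / len(h_vals)

h_vals = [+1, -1, +1, +1]
f_vals = [+1, +1, +1, -1]
print(e_in(h_vals, f_vals, binary_error))  # 2 of 4 points disagree -> 0.5
```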


Noisy Targets

age: 32 years; gender: male; salary: 40,000; debt: 26,000; years in job: 1; years at home: 3; … → Approve for credit?

Consider two customers with the same credit data. They can have different behaviors.

The target ‘function’ is not a deterministic function but a stochastic one:

‘f(x)’ = P(y|x)
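A stochastic target can be simulated directly; the approval probability P(y = +1 | x) = 0.8 below is a hypothetical value:

```python
import random

# Stochastic target: identical inputs x can yield different outcomes y.
# P(y = +1 | x) = 0.8 is a hypothetical approval probability.
def sample_y(rng, p_plus=0.8):
    return +1 if rng.random() < p_plus else -1

rng = random.Random(3)
ys = [sample_y(rng) for _ in range(10000)]
frac = ys.count(+1) / len(ys)   # fraction approved, close to 0.8
```

Two customers with the same x are two independent draws from P(y|x), which is exactly how they can behave differently.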


Learning Setup with Error Measure and Noisy Targets

[Diagram: an unknown target distribution P(y | x) (target function f plus noise) and an unknown input distribution P(x) generate the training examples (x1, y1), (x2, y2), …, (xN, yN), with yn ∼ P(y | xn); the learning algorithm A, guided by an error measure, selects the final hypothesis g from the hypothesis set H, aiming for g(x) ≈ f(x).]
