
Learning From Data, Lecture 4: Real Learning is Feasible - PowerPoint PPT Presentation

Learning From Data, Lecture 4: Real Learning is Feasible. Topics: recap of verification (Hoeffding: E_out(h) ≈ E_in(h) with high probability), real learning vs. verification, the two-step solution to learning, and closer to reality: error and noise.


  1. Learning From Data, Lecture 4: Real Learning is Feasible (1/17)
     M. Magdon-Ismail, CSCI 4100/6100
     Outline: recap of verification; real learning vs. verification; the two-step solution to learning; closer to reality: error and noise.

     recap: Verification (2/17)
     For a single, fixed hypothesis h (e.g. a credit rule over income and age), Hoeffding says that E_out(h) ≈ E_in(h) with high probability:
         P[ |E_in(h) − E_out(h)| > ε ] ≤ 2 e^(−2Nε²).
     [Figure: income vs. age data with hypothesis h; on this sample E_in(h) = 2/9.]

     recap: 1000 Monkeys Behind Closed Doors (3/17)
     A 5-question A/B test. 1000 monkeys answer randomly; one child gets all the questions right. Behind doors 1, 2, ..., 1001 the scores read: 2 wrong, 3 wrong, 0 wrong, 5 wrong, 0 wrong, 4 wrong, 1 wrong, 3 wrong, ...
     • What are your chances of picking the child?
     • What can you do about it? (You can't peek behind the doors.)

     More Monkeys: E_in Can't Reach Out to E_out (4/17)
     With many hypotheses h_1, h_2, h_3, ..., h_M, each has its own out-of-sample error E_out(h_m), but we only observe the in-sample errors, e.g. E_in(h_1) = 2/9, E_in(h_2) = 0, E_in(h_3) = 5/9, ..., E_in(h_M) = 6/9.
     Pick the hypothesis with minimum E_in; will E_out be small?
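The monkeys argument is easy to check numerically. Below is a minimal simulation sketch (not part of the original slides; the trial count and random seed are arbitrary choices): with 1000 random guessers on a 5-question A/B test, some monkey is almost always perfect, so a perfect score alone cannot single out the child.

```python
import random

random.seed(0)

N_QUESTIONS = 5      # the 5-question A/B test from the slide
N_MONKEYS = 1000     # monkeys answering at random
N_TRIALS = 2000      # how many times we repeat the whole experiment (arbitrary)

# Chance that one random guesser answers all 5 questions correctly.
p_single = 0.5 ** N_QUESTIONS                      # = 1/32
# Chance that at least one of the 1000 guessers is perfect.
p_at_least_one = 1 - (1 - p_single) ** N_MONKEYS

perfect_runs = 0
for _ in range(N_TRIALS):
    # Number of wrong answers for each monkey (each answer is wrong w.p. 1/2).
    wrong = [sum(random.random() < 0.5 for _ in range(N_QUESTIONS))
             for _ in range(N_MONKEYS)]
    if min(wrong) == 0:        # some monkey got everything right
        perfect_runs += 1

print(f"P[a given monkey is perfect]            = {p_single:.4f}")
print(f"P[at least 1 of 1000 is perfect]        = {p_at_least_one:.6f}")
print(f"simulated frequency of a perfect monkey = {perfect_runs / N_TRIALS:.4f}")
```

The analytic value 1 − (31/32)^1000 is essentially 1, and the simulation agrees: the door with zero wrong answers by itself says almost nothing about E_out.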

  2. recap: Selection Bias Illustrated with Coins (5/17)
     Coin tossing example (simulated numerically after this item):
     • If we toss one coin N times and get no heads, it's very surprising: P = 1/2^N. We expect the coin is biased, with P[heads] ≈ 0.
     • Toss 70 coins and find one with no heads. Is it surprising? P = 1 − (1 − 1/2^N)^70. Do we expect P[heads] ≈ 0 for the selected coin?
     • This is similar to the "birthday problem": among 30 people, two will likely share the same birthday.
     • This is called selection bias. Search causes selection bias, and it is a very serious trap, for example in medical screening.

     Hoeffding Says That E_in(g) ≈ E_out(g) for Finite H (6/17)
         P[ |E_in(g) − E_out(g)| > ε ] ≤ 2|H| e^(−2ε²N),   for any ε > 0.
         P[ |E_in(g) − E_out(g)| ≤ ε ] ≥ 1 − 2|H| e^(−2ε²N),   for any ε > 0.
     We don't care how g was obtained, as long as it is from H.

     Some basic probability (events A, B):
     • Implication: if A ⇒ B (A ⊆ B), then P[A] ≤ P[B].
     • Union bound: P[A or B] = P[A ∪ B] ≤ P[A] + P[B].
     • Bayes' rule: P[A | B] = P[B | A] · P[A] / P[B].

     Proof: Let M = |H|. The event "|E_in(g) − E_out(g)| > ε" implies
         "|E_in(h_1) − E_out(h_1)| > ε"  OR ... OR  "|E_in(h_M) − E_out(h_M)| > ε".
     So, by the implication and union bounds,
         P[ |E_in(g) − E_out(g)| > ε ] ≤ P[ OR_{m=1..M} |E_in(h_m) − E_out(h_m)| > ε ]
                                       ≤ Σ_{m=1..M} P[ |E_in(h_m) − E_out(h_m)| > ε ]
                                       ≤ 2M e^(−2ε²N).
     (The last inequality holds because the single-hypothesis Hoeffding bound applies to each summand.)

     Interpreting the Hoeffding Bound for Finite |H| (7/17)
     If N ≫ ln|H|, then E_out(g) ≈ E_in(g).
     • The bound does not depend on X, P(x), f, or how g is found.
     • It only requires P(x) to generate the data points independently, and also the test point.
     • We don't care how g was obtained, as long as g ∈ H.
     • What about E_out ≈ 0?

     E_in Reaches Outside to E_out when |H| is Small (8/17)
     Theorem. With probability at least 1 − δ,
         E_out(g) ≤ E_in(g) + sqrt( ln(2|H|/δ) / (2N) ).
     Proof: Let δ = 2|H| e^(−2ε²N). Then P[ |E_in(g) − E_out(g)| ≤ ε ] ≥ 1 − δ. In words, with probability at least 1 − δ, |E_in(g) − E_out(g)| ≤ ε, which implies E_out(g) ≤ E_in(g) + ε. From the definition of δ, solve for ε:
         ε = sqrt( ln(2|H|/δ) / (2N) ).
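To make the selection-bias effect and the finite-H error bar concrete, here is a small Python sketch (not from the slides; the number of tosses per coin, N, and δ are illustrative assumptions).

```python
import math
import random

random.seed(1)

N_FLIPS = 5       # tosses per coin -- an assumption; the slide does not fix it
N_COINS = 70      # number of fair coins, as in the slide
N_RUNS = 20000    # simulation repetitions (arbitrary)

# Selection bias: one pre-specified coin vs. the "best" of 70 coins.
p_one = 0.5 ** N_FLIPS                         # P[a single coin shows no heads]
p_some = 1 - (1 - p_one) ** N_COINS            # P[some coin out of 70 shows none]

hits = 0
for _ in range(N_RUNS):
    min_heads = min(sum(random.random() < 0.5 for _ in range(N_FLIPS))
                    for _ in range(N_COINS))
    hits += (min_heads == 0)

print(f"single coin, no heads     : {p_one:.4f}")
print(f"best of {N_COINS}, no heads      : {p_some:.4f}  (simulated {hits / N_RUNS:.4f})")

# Error bar from the finite-H Hoeffding bound: with probability >= 1 - delta,
#     E_out(g) <= E_in(g) + sqrt( ln(2*M/delta) / (2*N) ),   where M = |H|.
def hoeffding_eps(N, M, delta=0.05):
    return math.sqrt(math.log(2 * M / delta) / (2 * N))

for M in (1, 70, 10**6):
    print(f"|H| = {M:>7}, N = 1000: eps = {hoeffding_eps(1000, M):.3f}")
```

The selected minimum-heads coin looks heavily biased far more often than a single pre-specified coin does; that gap is exactly what the 2|H| factor accounts for, and the resulting error bar grows only logarithmically in |H|.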

  3. The 2 Step Approach to Getting E_out ≈ 0 (9/17)
     (1) E_out(g) ≈ E_in(g).   (2) E_in(g) ≈ 0.   Together, these ensure E_out ≈ 0.
     • How do we verify (1), since we do not know E_out? We must ensure it theoretically: Hoeffding.
     • We can ensure (2) (for example with the PLA), modulo that we can guarantee (1).
     • There is a tradeoff (see the numerical sketch after this item):
       – Small |H| ⇒ E_in ≈ E_out.
       – Large |H| ⇒ E_in ≈ 0 is more likely.
     [Figure: error versus |H|. The in-sample error decreases with |H| while the model-complexity term sqrt( ln(2|H|/δ) / (2N) ) increases, so the out-of-sample error is minimized at some intermediate |H|*.]
     • What about infinite H, e.g. the perceptron?

     Feasibility of Learning (Finite Models) (10/17)
     • No Free Lunch: we can't know anything outside D for sure.
     • We can "learn" with high probability if D is i.i.d. from P(x): E_out ≈ E_in (E_in can reach outside the data set to E_out).
     • We want E_out ≈ 0.
     • The two-step solution: we trade E_out ≈ 0 for two goals: (i) E_out ≈ E_in; (ii) E_in ≈ 0. We know E_in, not E_out, but we can ensure (i) if |H| is small. This is a big step!

     "Complex" Target Functions are Harder to Learn (11/17)
     What happened to the "difficulty" (complexity) of f?
     • Simple f ⇒ a small H suffices to get E_in ≈ 0 (need smaller N).
     • Complex f ⇒ a large H is needed to get E_in ≈ 0 (need larger N).

     Revising the Learning Problem: Adding in Probability (12/17)
     [Diagram: unknown target function f : X → Y with y_n = f(x_n); unknown input distribution P(x); training examples (x_1, y_1), (x_2, y_2), ..., (x_N, y_N); the learning algorithm A searches the hypothesis set H and outputs the final hypothesis g, with g(x) ≈ f(x).]
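Here is a small numerical sketch of the |H| tradeoff. The complexity penalty comes from the finite-H bound above; the in-sample error curve is made up purely for illustration (it is not from the lecture), and N and δ are arbitrary choices.

```python
import math

# Model-complexity penalty from the finite-H Hoeffding bound:
#     E_out(g) <= E_in(g) + sqrt( ln(2*M/delta) / (2*N) ),  M = |H|.
def complexity_penalty(N, M, delta=0.05):
    return math.sqrt(math.log(2 * M / delta) / (2 * N))

N, delta = 100, 0.05

# A purely illustrative (made-up) in-sample error curve: richer models fit the
# data better, so E_in decreases as |H| grows.  This curve is NOT from the
# lecture; it only exists to make the tradeoff visible.
def stylized_E_in(M):
    return 0.5 / (1 + math.log(M))

print(f"{'|H|':>10} {'E_in':>8} {'penalty':>9} {'bound on E_out':>15}")
best = None
for M in (2, 10, 100, 1000, 10**4, 10**6, 10**9):
    e_in = stylized_E_in(M)
    pen = complexity_penalty(N, M, delta)
    bound = e_in + pen
    print(f"{M:>10} {e_in:>8.3f} {pen:>9.3f} {bound:>15.3f}")
    if best is None or bound < best[1]:
        best = (M, bound)

print(f"best tradeoff (the |H|* of the figure) in this toy sweep: |H| = {best[0]}")
```

The bound first falls as the richer model drives E_in down, then rises as the penalty term takes over; the minimizer plays the role of |H|* in the slide's figure.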

  4. Error and Noise (13/17)
     Two refinements bring the setup closer to reality:
     • Error measure: how to quantify that h ≈ f.
     • Noise: y_n ≠ f(x_n).

     Fingerprint Recognition (14/17)
     Two types of error. The target is f = +1 for you and f = −1 for an intruder; the hypothesis h accepts (+1) or rejects (−1):
       h = +1, f = +1: no error        h = +1, f = −1: false accept
       h = −1, f = +1: false reject    h = −1, f = −1: no error
     In any application you need to think about how to penalize each type of error. Two example cost matrices (rows: h; columns: f):
       Supermarket              CIA
              +1   −1                  +1     −1
        +1     0    1            +1     0   1000
        −1    10    0            −1     1      0
     Take away: the error measure should be specified by the user. If it is not, choose one that is plausible (conceptually appealing) or friendly (practically appealing).

     Almost All Error Measures are Pointwise (15/17)
     Compare h and f on individual points x using a pointwise error e(h(x), f(x)) (a short code sketch follows this item):
     • Binary error (classification): e(h(x), f(x)) = [[ h(x) ≠ f(x) ]].
     • Squared error (regression): e(h(x), f(x)) = (h(x) − f(x))².
     In-sample error:   E_in(h) = (1/N) Σ_{n=1}^{N} e(h(x_n), f(x_n)).
     Out-of-sample error:   E_out(h) = E_x[ e(h(x), f(x)) ].

     Noisy Targets (16/17)
     Consider two customers with the same credit data (age 32 years, gender male, salary 40,000, debt 26,000, years in job 1 year, years at home 3 years, ...). Approve for credit? They can have different behaviors.
     The target "function" is not a deterministic function but a stochastic one: 'f(x)' = P(y | x).
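Below is a minimal Python sketch of the pointwise errors and the application-specific cost matrices above. The tiny dataset and the always-accept hypothesis are made up for illustration.

```python
import numpy as np

# Pointwise errors from the slide.
def binary_error(h_x, y):              # classification: [[ h(x) != y ]]
    return float(h_x != y)

def squared_error(h_x, y):             # regression: (h(x) - y)^2
    return (h_x - y) ** 2

def E_in(h, X, Y, err):                # E_in(h) = (1/N) * sum_n e(h(x_n), y_n)
    return np.mean([err(h(x), y) for x, y in zip(X, Y)])

# Application-specific error: weigh false accepts / false rejects differently.
# Keys are (h(x), y) with values in {+1, -1}, matching the slide's matrices.
SUPERMARKET = {(+1, +1): 0, (+1, -1): 1,    (-1, +1): 10, (-1, -1): 0}
CIA         = {(+1, +1): 0, (+1, -1): 1000, (-1, +1): 1,  (-1, -1): 0}

def cost_error(cost):
    return lambda h_x, y: cost[(h_x, y)]

# Tiny made-up dataset: x is ignored here, y = +1 means "you", -1 an intruder.
X = list(range(6))
Y = [+1, +1, -1, +1, -1, -1]
h = lambda x: +1                       # a hypothesis that always accepts

print("binary E_in      :", E_in(h, X, Y, binary_error))
print("supermarket E_in :", E_in(h, X, Y, cost_error(SUPERMARKET)))
print("CIA E_in         :", E_in(h, X, Y, cost_error(CIA)))
```

The same hypothesis has a modest binary and supermarket error but a catastrophic CIA error, which is the slide's point: the error measure belongs to the application, not to the learning algorithm.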

  5. Learning Setup with Error Measure and Noisy Targets (17/17)
     [Diagram: the unknown target function is replaced by an unknown target distribution P(y | x) (target function f plus noise), with y_n ~ P(y | x_n); an unknown input distribution P(x) generates x_1, x_2, ..., x_N; the training examples are (x_1, y_1), (x_2, y_2), ..., (x_N, y_N); an error measure is now an explicit component; the learning algorithm A searches the hypothesis set H and outputs the final hypothesis g, with g(x) ≈ f(x).]
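To illustrate the noisy-target setup, here is a tiny sketch that draws y_n from a conditional distribution P(y | x_n) rather than from a deterministic f. The particular form of P(y = +1 | x) below is an assumption made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# An illustrative noisy target: the "target" is the conditional distribution
# P(y = +1 | x), not a deterministic f.  This particular form is assumed
# for the sketch, not taken from the lecture.
def p_plus(x):
    return 1.0 / (1.0 + np.exp(-4 * x))          # +1 more likely for larger x

def sample_y(x):
    return +1 if rng.random() < p_plus(x) else -1

# Generate a noisy training set: x_n ~ P(x), y_n ~ P(y | x_n).
N = 10
X = rng.uniform(-1, 1, size=N)
Y = [sample_y(x) for x in X]
print([(round(float(x), 2), y) for x, y in zip(X, Y)])

# Two "customers" with the same x can get different y -- that is the noise.
x_same = 0.1
print([sample_y(x_same) for _ in range(10)])
```

Two draws at the same x can disagree, so even the best possible hypothesis has nonzero E_out; the error measure is then averaged over both P(x) and P(y | x).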
