Machine Learning: Chenhao Tan, University of Colorado Boulder (PowerPoint PPT Presentation)
  1. Machine Learning: Chenhao Tan, University of Colorado Boulder
     LECTURE 3
     Slides adapted from Thorsten Joachims
     Machine Learning: Chenhao Tan | Boulder | 1 of 29

  2. Logistics
     • Homework assignments
     • Final project

  3. Overview
     • Sample error and generalization error
     • Bias-variance tradeoff
     • Model selection

  4. Outline
     • Sample error and generalization error (current section)
     • Bias-variance tradeoff
     • Model selection

  5. Supervised learning
     • S_train → h
     • Target function f : X → Y (f is unknown)
     • Goal: h approximates f

  6-7. Problem Setup
     • Instances in a learning problem follow a probability distribution P(X, Y).
     • A sample S = {(x_1, y_1), ..., (x_n, y_n)} is drawn independently and identically distributed (i.i.d.) according to P(X, Y).
     • Examples: the training sample S_train and the test sample S_test.

  8-9. Sample Error vs. Generalization Error
     • Generalization error of a hypothesis h for a learning task P(X, Y):
       Err_P(h) = E[Δ(h(x), y)] = Σ_{x∈X, y∈Y} Δ(h(x), y) · P(X = x, Y = y)
     • Sample error of a hypothesis h for a sample S:
       Err_S(h) = (1/n) Σ_{i=1}^{n} Δ(h(x_i), y_i)
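The two quantities above in code: the sample error is directly computable, while the generalization error is what it estimates. A minimal sketch with 0/1 loss and an assumed toy threshold hypothesis (not from the slides):

```python
# Err_S(h) = (1/n) * sum_i Delta(h(x_i), y_i), here with 0/1 loss.

def sample_error(h, S, delta=lambda y_hat, y: int(y_hat != y)):
    """Average loss of hypothesis h over a sample S = [(x, y), ...]."""
    return sum(delta(h(x), y) for x, y in S) / len(S)

h = lambda x: int(x >= 0)                         # toy hypothesis
S = [(-2, 0), (-1, 0), (0.5, 1), (1, 1), (2, 0)]  # last point disagrees with h
print(sample_error(h, S))                         # 1 of 5 misclassified -> 0.2
```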

  10-11. Training error vs. Test error
     • S_train → h
     • Training error: Err_{S_train}(h)
     • Test error: Err_{S_test}(h)

  12-13. A concrete hypothetical example
     • Predict flu trends using search data
     • X: search data; Y: fraction of population with flu
     • S_train = all data before 2012; S_test = all data in 2012
     • What is the problem with this generalization error estimation? [Lazer et al., 2014]

  14. Overfitting [Friedman et al., 2001]

  15. Outline
     • Sample error and generalization error
     • Bias-variance tradeoff (current section)
     • Model selection

  16-18. Bias-Variance Tradeoff
     Assume a simple model y = f(x) + ε, with E(ε) = 0 and Var(ε) = σ_ε².
     Err(x_0) = E[(y − h(x_0))² | X = x_0]
              = σ_ε² + [E h(x_0) − f(x_0)]² + E[h(x_0) − E h(x_0)]²
              = σ_ε² + Bias²(h(x_0)) + Var(h(x_0))
              = Irreducible Error + Bias² + Variance
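The decomposition above can be checked numerically: fit a model on many training samples drawn from P(X, Y) and measure the spread of h(x_0). A Monte Carlo sketch for polynomial fits; the target f, noise level, evaluation point, and degrees are illustrative choices, not from the slides:

```python
# Estimate Bias^2 and Variance of h(x0) empirically for polynomial fits.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)       # assumed true target function
sigma_eps = 0.3
x0 = 0.25                                 # point at which we decompose the error
x_train = np.linspace(0, 1, 20)

results = {}
for degree in (1, 5, 9):
    preds = []
    for _ in range(500):                  # many i.i.d. training samples
        y_train = f(x_train) + rng.normal(0, sigma_eps, x_train.size)
        coefs = np.polyfit(x_train, y_train, degree)
        preds.append(np.polyval(coefs, x0))
    preds = np.array(preds)
    bias2 = (preds.mean() - f(x0)) ** 2   # [E h(x0) - f(x0)]^2
    var = preds.var()                     # E[h(x0) - E h(x0)]^2
    results[degree] = (bias2, var)
    print(f"degree={degree}: Bias^2={bias2:.4f} Var={var:.4f} "
          f"Err(x0)~{sigma_eps**2 + bias2 + var:.4f}")
```

The low-degree fit shows large bias and small variance, the high-degree fit the reverse: the tradeoff in one table.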

  19. Example
     [Figure: polynomial fits of order 1, 5, and 9 to the same data; x ranges over [1.0, 2.0], y over [−1.0, 1.0]]

  20. Revisit Overfitting
     http://scott.fortmann-roe.com/docs/BiasVariance.html

  21. K-NN Example
     Err(x_0) = σ_ε² + [f(x_0) − (1/k) Σ_{l=1}^{k} f(x_{(l)})]² + σ_ε²/k
     In homework 1!
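Plugging into the k-NN formula above: the bias term uses the k nearest training inputs x_(1), ..., x_(k), and the variance term is σ_ε²/k. A sketch with an assumed target f and training grid (illustrative, not from the slides):

```python
# Evaluate the closed-form k-NN regression error at a query point x0.
import numpy as np

def knn_err_at(x0, x_train, f, sigma_eps, k):
    nearest = x_train[np.argsort(np.abs(x_train - x0))[:k]]  # k nearest inputs
    bias2 = (f(x0) - f(nearest).mean()) ** 2
    return sigma_eps**2 + bias2 + sigma_eps**2 / k           # noise + bias^2 + variance

f = lambda x: x**2
x_train = np.linspace(0, 1, 11)
for k in (1, 3, 9):
    print(k, knn_err_at(0.5, x_train, f, sigma_eps=0.2, k=k))
```

As k grows, the variance term σ_ε²/k shrinks while the averaging in the bias term pulls the prediction away from f(x_0): the same tradeoff as above, explored further in homework 1.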

  22. Outline
     • Sample error and generalization error
     • Bias-variance tradeoff
     • Model selection (current section)

  23. Model Selection
     • Training: run the learning algorithm m times (e.g., parameter search), obtaining hypotheses ĥ_1, ..., ĥ_m.
     • Validation error: Err_{S_val}(ĥ_i) is an estimate of Err_P(ĥ_i).
     • Selection: use the ĥ_i with minimum Err_{S_val}(ĥ_i) for prediction on test examples.
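The selection rule above as code: train candidate hypotheses, estimate Err_P on a held-out validation split, keep the minimizer. The candidate family (threshold classifiers) and the data are assumed toys for illustration:

```python
# Pick the hypothesis with minimum validation error.

def err(h, S):
    return sum(h(x) != y for x, y in S) / len(S)

S_train = [(0.1, 0), (0.2, 0), (0.4, 0), (0.6, 1), (0.8, 1)]
S_val = [(0.3, 0), (0.5, 1), (0.7, 1)]

# "Run the learning algorithm m times": here, one hypothesis per threshold t.
candidates = [lambda x, t=t: int(x >= t) for t in (0.25, 0.5, 0.75)]
best = min(candidates, key=lambda h: err(h, S_val))   # min Err_{S_val}
print(err(best, S_val))
```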

  24. Train-val-test

  25. K-fold cross validation
     An estimate using all instances:
     • Input: a sample S and a learning algorithm A.
     • Procedure: randomly split S into K equally-sized folds S_1, ..., S_K. For each S_i, apply A to S_{−i} to get ĥ_i, and compute Err_{S_i}(ĥ_i).
     • Performance estimate: (1/K) Σ_{i=1}^{K} Err_{S_i}(ĥ_i)
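A minimal from-scratch version of the K-fold procedure above; `A` is any learning algorithm mapping a sample to a hypothesis, and the toy majority-label learner is an assumed stand-in:

```python
# K-fold cross validation: train on S_{-i}, evaluate on S_i, average.
import random

def k_fold_estimate(S, A, err, K=5, seed=0):
    S = S[:]
    random.Random(seed).shuffle(S)
    folds = [S[i::K] for i in range(K)]              # K roughly equal folds
    errs = []
    for i in range(K):
        rest = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        h = A(rest)                                  # h_i trained on S_{-i}
        errs.append(err(h, folds[i]))                # Err_{S_i}(h_i)
    return sum(errs) / K                             # (1/K) * sum_i Err_{S_i}(h_i)

# Toy learner (assumed for illustration): always predict the majority label.
def A(S):
    label = int(sum(y for _, y in S) * 2 >= len(S))
    return lambda x: label

err = lambda h, S: sum(h(x) != y for x, y in S) / len(S)
S = [(i, i % 2) for i in range(20)]                  # balanced toy labels
est = k_fold_estimate(S, A, err, K=5)
print(est)
```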

  26-27. K-fold Cross Validation
     Example use (Wrong!):
     • Find good features F using S_train
     • Split S_train into K folds
     • For each fold, use the remaining training data and the features F to build a classifier; estimate prediction error by averaging the error rates over the folds
     Why is this wrong? The features F were selected using all of S_train, so each held-out fold has already influenced the model it is supposed to evaluate; the resulting error estimate is optimistically biased.
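The corrected protocol repeats feature selection inside every fold, using only that fold's training portion. A minimal sketch with toy binary features and an assumed agreement-based selection rule (both illustrative, not from the slides):

```python
# Cross validation with feature selection INSIDE each fold.
import random

def select_feature(train):
    d = len(train[0][0])
    # pick the single feature that agrees with the label most often
    return max(range(d), key=lambda j: sum(x[j] == y for x, y in train))

def cv_with_inner_selection(S, K, seed=0):
    S = S[:]
    random.Random(seed).shuffle(S)
    folds = [S[i::K] for i in range(K)]
    errs = []
    for i in range(K):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        j_star = select_feature(train)             # selected WITHOUT fold i
        h = lambda x, j=j_star: x[j]               # toy classifier: echo the feature
        errs.append(sum(h(x) != y for x, y in folds[i]) / len(folds[i]))
    return sum(errs) / K

# Feature 0 copies the label; feature 1 is uninformative.
S = [((i % 2, (i // 2) % 2), i % 2) for i in range(20)]
est = cv_with_inner_selection(S, K=5)
print(est)
```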

  28. K-fold cross validation
     • Select the best model using the training data
     • Use nested cross-validation for performance estimation
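Nested cross-validation, sketched: an outer loop estimates performance, and an inner loop, run on the outer-training data only, does the model selection. The candidate threshold classifiers and the data are assumed toys:

```python
# Nested CV: inner loop selects, outer loop estimates on untouched folds.
import random

def err(h, S):
    return sum(h(x) != y for x, y in S) / len(S)

def folds_of(S, K, seed):
    S = S[:]
    random.Random(seed).shuffle(S)
    return [S[i::K] for i in range(K)]

def nested_cv(S, candidates, K_outer=5, K_inner=4, seed=0):
    outer = folds_of(S, K_outer, seed)
    outer_errs = []
    for i in range(K_outer):
        train = [ex for j, fold in enumerate(outer) if j != i for ex in fold]
        inner = folds_of(train, K_inner, seed + 1)
        def inner_cv_err(h):
            return sum(err(h, inner[m]) for m in range(K_inner)) / K_inner
        best = min(candidates, key=inner_cv_err)   # selection: inner loop only
        outer_errs.append(err(best, outer[i]))     # estimate: untouched outer fold
    return sum(outer_errs) / K_outer

candidates = [lambda x, t=t: int(x >= t) for t in (0.25, 0.5, 0.75)]
S = [(i / 20, int(i >= 10)) for i in range(20)]    # separable toy data
est = nested_cv(S, candidates)
print(est)
```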

  29. Evaluating the learned hypothesis
     • Goal: find h with small prediction error Err_P(h) over P(X, Y)
     • Question: what is Err_P(ĥ) for the ĥ obtained from training data S_train?
     • Training error: Err_{S_train}(ĥ)
     • Test error: Err_{S_test}(ĥ) is an estimate of Err_P(ĥ)

  30-32. What is the True Error of a Hypothesis?
     • Apply ĥ to S_test and, for each (x, y) ∈ S_test, observe Δ(ĥ(x), y).
     • Binomial distribution estimate: assume each toss is independent and the probability of heads is p; then the probability of observing x heads in a sample of n independent coin tosses is
       Pr(X = x | p, n) = n! / (x!(n − x)!) · p^x (1 − p)^{n−x}
     • Normal approximation: with p̂ = Err_{S_test}(ĥ) = (1/n) Σ_{i=1}^{n} Δ(ĥ(x_i), y_i), the confidence interval is
       p̂ ± z_α √((1/n) p̂(1 − p̂))
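The normal-approximation interval above in code; z = 1.96 (a common choice for roughly 95% coverage) and the example counts are assumptions for illustration:

```python
# Confidence interval for the true error: p_hat +/- z * sqrt(p_hat(1-p_hat)/n).
import math

def error_confidence_interval(n_errors, n, z=1.96):
    p_hat = n_errors / n                              # Err_{S_test}(h)
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)     # z * sqrt(p(1-p)/n)
    return p_hat, (p_hat - half, p_hat + half)

p_hat, (lo, hi) = error_confidence_interval(13, 100)  # e.g., 13 errors on 100 tests
print(f"p_hat={p_hat:.2f}, approx. 95% CI=({lo:.3f}, {hi:.3f})")
```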

  33-34. Is hypothesis ĥ_1 better than ĥ_2? Same test sample
     • Apply ĥ_1 and ĥ_2 to S_test
     • Decide whether Err_P(ĥ_1) ≠ Err_P(ĥ_2)
     • Null hypothesis: Err_{S_test}(ĥ_1) and Err_{S_test}(ĥ_2) come from binomial distributions with the same p
     • Binomial Sign Test (McNemar's Test)
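The binomial sign test above in exact form: under the null hypothesis, among the test examples where ĥ_1 and ĥ_2 disagree, each is equally likely to be the correct one (p = 1/2). The disagreement counts below are made-up inputs:

```python
# Exact two-sided binomial sign test (McNemar) on the disagreement counts.
from math import comb

def mcnemar_exact(n01, n10):
    """n01: h1 wrong, h2 right; n10: h1 right, h2 wrong."""
    n = n01 + n10
    k = min(n01, n10)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n   # Binomial(n, 1/2) tail
    return min(1.0, 2 * tail)                             # two-sided p-value

print(mcnemar_exact(3, 12))
```

The more common chi-square form uses (|n01 − n10| − 1)² / (n01 + n10); the exact version is preferable when disagreements are few.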

  35. Is hypothesis ĥ_1 better than ĥ_2? Different test samples
     • Apply ĥ_1 to S_test1 and ĥ_2 to S_test2
