SLIDE 1

Machine Learning: Chenhao Tan

University of Colorado Boulder

LECTURE 3
Slides adapted from Thorsten Joachims

Machine Learning: Chenhao Tan | Boulder | 1 of 29

SLIDE 2

Logistics

  • Homework assignments
  • Final project

SLIDE 3

Overview

  • Sample error and generalization error
  • Bias-variance tradeoff
  • Model selection

SLIDE 4

Sample error and generalization error

Outline

  • Sample error and generalization error
  • Bias-variance tradeoff
  • Model selection

SLIDE 5

Sample error and generalization error

Supervised learning

  • Training sample S_train → hypothesis h
  • Target function f: X → Y (f is unknown)
  • Goal: h approximates f

SLIDE 7

Sample error and generalization error

Problem Setup

  • Instances in a learning problem follow a probability distribution P(X, Y)
  • A sample S = {(x_1, y_1), . . . , (x_n, y_n)} is drawn independently and identically distributed (i.i.d.) according to P(X, Y)

Examples:

  • training sample S_train
  • test sample S_test

SLIDE 9

Sample error and generalization error

Sample Error vs. Generalization Error

  • Generalization error of a hypothesis h for a learning task P(X, Y):

    Err_P(h) = E[Δ(h(x), y)] = Σ_{x ∈ X, y ∈ Y} Δ(h(x), y) · P(X = x, Y = y)

  • Sample error of a hypothesis h for a sample S:

    Err_S(h) = (1/n) Σ_{i=1}^{n} Δ(h(x_i), y_i)
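As a sketch of the definition above, the sample error is just the average loss over the sample; the hypothesis h, toy sample S, and 0/1 loss Δ below are illustrative assumptions, not from the slides:

```python
def sample_error(h, S, delta=lambda yhat, y: float(yhat != y)):
    """Sample error Err_S(h): average loss of hypothesis h over sample S."""
    return sum(delta(h(x), y) for x, y in S) / len(S)

# Toy hypothesis: predict 1 when x >= 0 (an illustrative stand-in for a learned h).
h = lambda x: 1 if x >= 0 else 0
S = [(-2, 0), (-1, 0), (0, 1), (1, 1), (2, 0)]  # h errs only on the last example
print(sample_error(h, S))  # 0.2
```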

SLIDE 11

Sample error and generalization error

Training error vs. Test error

  • S_train → h
  • Training error = Err_{S_train}(h)
  • Test error = Err_{S_test}(h)

SLIDE 13

Sample error and generalization error

A concrete hypothetical example

  • Predict flu trends using search data
  • X: search data, Y: fraction of population with flu
  • S_train = all data before 2012
  • S_test = all data in 2012
  • What is the problem with estimating generalization error this way?

[Lazer et al., 2014]

SLIDE 14

Sample error and generalization error

Overfitting

[Friedman et al., 2001]

SLIDE 15

Bias-variance tradeoff

Outline

  • Sample error and generalization error
  • Bias-variance tradeoff
  • Model selection

SLIDE 18

Bias-variance tradeoff

Bias-Variance Tradeoff

Assume a simple model y = f(x) + ε, with E(ε) = 0 and Var(ε) = σ_ε².

Err(x_0) = E[(y − h(x_0))² | X = x_0]
         = σ_ε² + [E h(x_0) − f(x_0)]² + E[h(x_0) − E h(x_0)]²
         = σ_ε² + Bias²(h(x_0)) + Var(h(x_0))
         = Irreducible Error + Bias² + Variance
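The decomposition can be checked numerically. A minimal Monte Carlo sketch, assuming a toy target f(x) = x², Gaussian noise, and a deliberately simple "learner" that averages noisy labels near x_0 (all illustrative choices, not the lecture's setup):

```python
import random

random.seed(0)
f = lambda x: x * x          # toy target function (illustrative assumption)
sigma = 0.5                  # noise std, so Var(eps) = sigma^2
x0 = 1.0                     # query point

def train_and_predict():
    # "Learn" h from a fresh training sample: h(x0) is the mean label of
    # 10 noisy observations near x0 (a toy learner for illustration).
    ys = [f(x0 + random.uniform(-0.3, 0.3)) + random.gauss(0, sigma) for _ in range(10)]
    return sum(ys) / len(ys)

preds = [train_and_predict() for _ in range(20000)]   # h(x0) over many training sets
mean_h = sum(preds) / len(preds)
bias2 = (mean_h - f(x0)) ** 2
var_h = sum((p - mean_h) ** 2 for p in preds) / len(preds)

# Direct Monte Carlo estimate of Err(x0) = E[(y - h(x0))^2 | X = x0]
errs = [(f(x0) + random.gauss(0, sigma) - p) ** 2 for p in preds]
err = sum(errs) / len(errs)

print(f"Err(x0) ≈ {err:.3f}")
print(f"sigma^2 + Bias^2 + Var(h) ≈ {sigma**2 + bias2 + var_h:.3f}")
```

The two printed quantities agree up to Monte Carlo noise, matching the identity above term by term.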

Machine Learning: Chenhao Tan | Boulder | 12 of 29

slide-19
SLIDE 19

Bias-variance tradeoff

Example

[Figure: three panels plotting y (range −1 to 1) against x (range 1 to 2), showing polynomial fits of order 1, 5, and 9 to the same sample.]
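A sketch of how such panels could be produced, assuming numpy and a synthetic sinusoidal sample (the slide's actual data are not recoverable). Training error can only go down as the polynomial order grows, which is the point of the figure:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1.0, 2.0, 12)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=x.shape)  # synthetic sample

mse = {}
for order in (1, 5, 9):
    coeffs = np.polyfit(x, y, deg=order)     # least-squares polynomial fit
    mse[order] = float(np.mean((np.polyval(coeffs, x) - y) ** 2))
    print(f"order={order}: training MSE = {mse[order]:.4f}")
```

Low order underfits (high bias); an order-9 polynomial through 12 points tracks the noise (high variance).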

SLIDE 20

Bias-variance tradeoff

Revisit Overfitting

http://scott.fortmann-roe.com/docs/BiasVariance.html

SLIDE 21

Bias-variance tradeoff

K-NN Example

Err(x_0) = σ_ε² + [f(x_0) − (1/k) Σ_{l=1}^{k} f(x_(l))]² + σ_ε²/k

In homework 1!
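The σ_ε²/k variance term can be checked numerically: when the k neighbors share the same f value, the k-NN prediction is an average of k noisy labels, so its variance is σ_ε²/k. A sketch under that simplifying assumption (the constants are illustrative):

```python
import random

random.seed(1)
sigma, k = 0.5, 5
f_x0 = 2.0  # common f value of the k nearest neighbors (simplifying assumption)

# k-NN regression averages the labels of the k nearest training points; with the
# neighbors' f values held fixed, the only randomness left is the label noise.
preds = [sum(f_x0 + random.gauss(0, sigma) for _ in range(k)) / k
         for _ in range(100000)]
mean_h = sum(preds) / len(preds)
var_h = sum((p - mean_h) ** 2 for p in preds) / len(preds)
print(f"Var(h(x0)) ≈ {var_h:.4f}, theory sigma^2/k = {sigma**2 / k:.4f}")
```

Growing k shrinks the variance term but (for neighbors farther from x_0) inflates the bias term, which is the tradeoff the formula exposes.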

SLIDE 22

Model selection

Outline

  • Sample error and generalization error
  • Bias-variance tradeoff
  • Model selection

SLIDE 23

Model selection

Model Selection

  • Training: run the learning algorithm m times (e.g., a parameter search), obtaining hypotheses ĥ1, . . . , ĥm
  • Validation error: Err_{S_val}(ĥi) is an estimate of Err_P(ĥi)
  • Selection: use the ĥi with minimum Err_{S_val}(ĥi) for prediction
  • Evaluation: estimate its error on the n test examples
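The paradigm above starts from a random partition of the data; a minimal sketch (the split fractions and helper name are illustrative):

```python
import random

def train_val_test_split(S, val_frac=0.2, test_frac=0.2, seed=0):
    """Randomly partition a sample S into train/validation/test splits."""
    S = S[:]
    random.Random(seed).shuffle(S)
    n_test = int(len(S) * test_frac)
    n_val = int(len(S) * val_frac)
    return S[n_test + n_val:], S[n_test:n_test + n_val], S[:n_test]

S = [(x, x % 3) for x in range(100)]  # toy labeled sample
train, val, test = train_val_test_split(S)
print(len(train), len(val), len(test))  # 60 20 20
```

Validation data steers the choice among the m hypotheses; the test split stays untouched until the final error estimate.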

SLIDE 24

Model selection

Train-val-test

SLIDE 25

Model selection

K-fold cross validation

An estimate using all instances:

  • Input: a sample S and a learning algorithm A
  • Procedure: randomly split S into K equally-sized folds S_1, . . . , S_K; for each S_i, apply A to S_{−i} (all folds except S_i) to obtain ĥ_i, and compute Err_{S_i}(ĥ_i)
  • Performance estimate: (1/K) Σ_{i=1}^{K} Err_{S_i}(ĥ_i)
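The procedure above, sketched in Python; the majority-class "learning algorithm" is an illustrative stand-in for A:

```python
import random

def k_fold_cv(S, A, K=5, delta=lambda yhat, y: float(yhat != y), seed=0):
    """K-fold CV: train on S_{-i}, score on held-out fold S_i, average over folds."""
    S = S[:]
    random.Random(seed).shuffle(S)
    folds = [S[i::K] for i in range(K)]
    total = 0.0
    for i in range(K):
        rest = [ex for j in range(K) if j != i for ex in folds[j]]
        h = A(rest)                                   # apply A to S_{-i}
        total += sum(delta(h(x), y) for x, y in folds[i]) / len(folds[i])
    return total / K

# Toy learning algorithm: majority-class classifier (illustrative stand-in for A).
def majority(train):
    labels = [y for _, y in train]
    label = max(set(labels), key=labels.count)
    return lambda x: label

S = [(i, 0) for i in range(70)] + [(i, 1) for i in range(70, 100)]
print(k_fold_cv(S, majority, K=5))
```

Every instance is used once for evaluation and K − 1 times for training, which is what makes the estimate use all instances.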

SLIDE 27

Model selection

K-fold Cross Validation

Example use (Wrong!):

  • Find good features F using all of S_train
  • Split S_train into K folds
  • For each fold, use the remaining training data and features F to build a classifier, and estimate prediction error as the average error rate over the folds

This leaks information: F was selected using all of S_train, including each held-out fold, so the cross-validation estimate is optimistically biased.

SLIDE 28

Model selection

K-fold cross validation

  • Select the best model using training data only
  • Use nested cross-validation for performance estimation
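Nested cross-validation can be sketched as two loops: the inner loop sees only the outer training folds and selects a model, while the outer loop scores that whole selection procedure on untouched folds. The constant-predictor "learners" below are illustrative stand-ins:

```python
import random

def err(h, fold):
    """Error rate of hypothesis h on a fold of (x, y) pairs."""
    return sum(h(x) != y for x, y in fold) / len(fold)

def nested_cv(S, learners, K=5, seed=0):
    """Nested CV: inner folds (outer-train only) pick a learner; outer folds
    score the whole selection procedure on data it never saw."""
    S = S[:]
    random.Random(seed).shuffle(S)
    folds = [S[i::K] for i in range(K)]
    outer_errs = []
    for i in range(K):
        outer_train = [ex for j in range(K) if j != i for ex in folds[j]]
        inner = [outer_train[j::K] for j in range(K)]

        def inner_cv_err(A):
            return sum(
                err(A([e for m in range(K) if m != j for e in inner[m]]), inner[j])
                for j in range(K)) / K

        best = min(learners, key=inner_cv_err)        # selection: inner folds only
        outer_errs.append(err(best(outer_train), folds[i]))  # evaluation: held-out fold
    return sum(outer_errs) / K

# Toy "learning algorithms": constant predictors (illustrative stand-ins).
const = lambda label: (lambda train: (lambda x: label))
S = [(i, 0) for i in range(70)] + [(i, 1) for i in range(70, 100)]
print(nested_cv(S, [const(0), const(1)]))
```

Because selection never touches the outer held-out fold, the returned estimate avoids the leakage of the wrong example above.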

SLIDE 29

Model selection

Evaluating learned hypothesis

  • Goal: Find h with small prediction error Err_P(h) over P(X, Y)
  • Question: What is Err_P(ĥ) of the ĥ obtained from training data S_train?
  • Training error: Err_{S_train}(ĥ)
  • Test error: Err_{S_test}(ĥ) is an estimate of Err_P(ĥ)

SLIDE 32

Model selection

What is the True Error of a Hypothesis?

  • Apply ĥ to S_test; for each (x, y) ∈ S_test, observe Δ(ĥ(x), y)
  • Binomial distribution estimate: assume each toss is independent and the probability of heads is p; then the probability of observing x heads in a sample of n independent coin tosses is

    Pr(X = x | p, n) = [n! / (x!(n − x)!)] p^x (1 − p)^(n−x)

  • Normal approximation: Err(ĥ) = p̂ = (1/n) Σ_{i=1}^{n} Δ(ĥ(x_i), y_i)
  • Confidence interval: p̂ ± z_α √(p̂(1 − p̂)/n)
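A sketch of the normal-approximation interval; the error count and z = 1.96 (95% confidence) are illustrative:

```python
import math

def error_confidence_interval(n_errors, n, z=1.96):
    """Normal-approximation confidence interval for the true error
    from the number of errors observed on n test examples."""
    p_hat = n_errors / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

lo, hi = error_confidence_interval(n_errors=30, n=200)
print(f"test error {30 / 200}, 95% CI ≈ ({lo:.3f}, {hi:.3f})")
```

The width shrinks as 1/√n, so a small test set gives only a loose bracket on Err_P(ĥ).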

SLIDE 34

Model selection

Is hypothesis ĥ1 better than ĥ2?

Same test sample

  • Apply ĥ1 and ĥ2 to S_test
  • Decide whether Err_P(ĥ1) = Err_P(ĥ2)
  • Null hypothesis: Err_{S_test}(ĥ1) and Err_{S_test}(ĥ2) come from binomial distributions with the same p
  • Test: Binomial Sign Test (McNemar's Test)
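A sketch of the exact binomial sign test on the discordant test examples (those where exactly one hypothesis errs); the counts below are hypothetical:

```python
from math import comb

def binomial_sign_test(b, c):
    """Two-sided exact sign test on the b + c discordant test examples
    (b: only h1 errs, c: only h2 errs); under H0 each discordant example
    favors either hypothesis with probability 1/2."""
    n, k = b + c, min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n  # P(X <= k), X ~ Bin(n, 1/2)
    return min(1.0, 2 * tail)

# Hypothetical counts: h1 errs alone on 8 examples, h2 errs alone on 20.
print(f"p-value ≈ {binomial_sign_test(8, 20):.4f}")
```

Only the discordant examples carry evidence; examples where both hypotheses agree cancel out of the comparison.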

SLIDE 36

Model selection

Is hypothesis ĥ1 better than ĥ2?

Different test samples

  • Apply ĥ1 to S_test1 and ĥ2 to S_test2
  • Decide whether Err_P(ĥ1) = Err_P(ĥ2)
  • Null hypothesis: Err_{S_test1}(ĥ1) and Err_{S_test2}(ĥ2) come from binomial distributions with the same p
  • Test: t-test

SLIDE 38

Model selection

Is Learning Algorithm A1 better than A2?

  • Given k samples S1, . . . , Sk of labeled instances from P(X, Y), randomly split each Si into S^i_train and S^i_test
  • For each i, train A1 and A2 on S^i_train to obtain ĥ^A1_i and ĥ^A2_i; apply both to S^i_test and compute Err_{S^i_test}(ĥ^A1_i) and Err_{S^i_test}(ĥ^A2_i)
  • Decide whether E_S[Err_P(A1(S_train))] = E_S[Err_P(A2(S_train))]
  • Null hypothesis: Err_{S_test}(A1(S_train)) and Err_{S_test}(A2(S_train)) come from the same distribution over samples S
  • Test: t-test or Wilcoxon Signed-Rank Test
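With matched splits, the k per-sample error differences can be summarized by a paired t statistic; a sketch with hypothetical error values (the p-value would then come from a t table with k − 1 degrees of freedom):

```python
import math
from statistics import mean, stdev

def paired_t_statistic(errs_a1, errs_a2):
    """Paired t statistic over per-sample test errors of two learning algorithms."""
    diffs = [a - b for a, b in zip(errs_a1, errs_a2)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical per-split test errors on k = 5 samples S_1 ... S_5.
errs_a1 = [0.12, 0.15, 0.11, 0.14, 0.13]
errs_a2 = [0.16, 0.18, 0.15, 0.17, 0.19]
t = paired_t_statistic(errs_a1, errs_a2)
print(f"t = {t:.2f} with {len(errs_a1) - 1} degrees of freedom")
```

Pairing by split removes the variation shared by both algorithms on the same S_i, which is what makes the comparison sensitive.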

SLIDE 39

Model selection

Summary

  • The goal is to optimize generalization error, not training error.
  • Look for a model that manages the bias-variance tradeoff.
  • Use the train-val-test paradigm and cross validation.

SLIDE 40

Model selection

References (1)

Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The Elements of Statistical Learning, volume 1. Springer Series in Statistics, New York, 2001.

David Lazer, Ryan Kennedy, Gary King, and Alessandro Vespignani. The parable of Google Flu: traps in big data analysis. Science, 343(6176):1203–1205, 2014.
