Diagnostics Gad Kimmel Outline Introduction. Bootstrap method. - PowerPoint PPT Presentation


  1. Diagnostics Gad Kimmel

  2. Outline ● Introduction. ● Bootstrap method. ● Cross validation. ● ROC plot.

  3. Introduction

  4. Motivation ● Estimating properties of an estimator (an estimator is a function of input points). − Given data samples $x_1, x_2, \ldots, x_N$, evaluate some estimator, say the average: $\frac{1}{N}\sum_i x_i$ − How can we estimate its properties (e.g., its variance)? $\operatorname{var}\left(\frac{1}{N}\sum_i x_i\right) = \frac{1}{N^2}\operatorname{var}\left(\sum_i x_i\right)$ ● Model selection. − How many parameters should we use?

  5. Bootstrap Method

  6. Evaluating Accuracy ● A simple approach for accuracy estimation is to provide the bias or variance of the estimator. ● Example: suppose the samples are independent and identically distributed (i.i.d.), with finite variance. − We know, by the central limit theorem, that $\frac{n^{1/2}(\bar{x}_n - \mu)}{\sigma} \to Z \sim N(0,1)$ − Roughly speaking, $\bar{x}_n$ is normally distributed with expectation $\mu$ and variance $\sigma^2/n$.
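
The variance of the sample mean under the i.i.d. assumption can be derived in one line. This is a short sketch, writing $\sigma^2$ for the common variance of the samples; independence is what allows the variance of the sum to split into a sum of variances.

```latex
\operatorname{var}(\bar{x}_n)
  = \operatorname{var}\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)
  = \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{var}(x_i)
  = \frac{n\,\sigma^2}{n^2}
  = \frac{\sigma^2}{n}
```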

  7. Assumptions Do Not Hold ● What if the r.v. are not i.i.d.? ● What if we want to evaluate another estimator (and not $\bar{x}_n$)? ● It would be nice to have many different samples of samples. ● In that case, one could calculate the estimator for each sample of samples, and infer its distribution. ● But... we don't have it.

  8. Solution - Bootstrap ● Estimating the sampling distribution of an estimator by resampling with replacement from the original sample. ● Efron, The Annals of Statistics, '79.

  9. Bootstrap - Illustration ● Goal: Sampling from P.

  10. Bootstrap - Illustration ● Goal: Sampling from P: $x_1, x_2, x_3, x_4, \ldots, x_n \sim P$

  11. Bootstrap - Illustration ● Goal: Sampling from P: $x_1, x_2, x_3, x_4, \ldots, x_n \sim P$ ... in order to estimate the variance of an estimator.

  12. Bootstrap - Illustration ● Samples drawn from P, with the estimator computed on each: $(x_{1,1}, x_{1,2}, \ldots, x_{1,n}) \to e_1$; $(x_{2,1}, x_{2,2}, \ldots, x_{2,n}) \to e_2$; ...; $(x_{m,1}, x_{m,2}, \ldots, x_{m,n}) \to e_m$

  13. Bootstrap - Illustration ● Samples drawn from P, with the estimator computed on each: $(x_{1,1}, x_{1,2}, \ldots, x_{1,n}) \to e_1$; $(x_{2,1}, x_{2,2}, \ldots, x_{2,n}) \to e_2$; ...; $(x_{m,1}, x_{m,2}, \ldots, x_{m,n}) \to e_m$ ● What is the variance of $e$?

  14. Bootstrap - Illustration ● Samples drawn from P, with the estimator computed on each: $(x_{1,1}, x_{1,2}, \ldots, x_{1,n}) \to e_1$; $(x_{2,1}, x_{2,2}, \ldots, x_{2,n}) \to e_2$; ...; $(x_{m,1}, x_{m,2}, \ldots, x_{m,n}) \to e_m$ ● Estimate the variance by $\operatorname{var}(e) = \frac{1}{m}\sum_{i=1}^{m}(e_i - \bar{e})^2$, where $\bar{e}$ is the mean of $e_1, \ldots, e_m$.

  15. Bootstrap - Illustration ● We only have 1 sample: $x_1, x_2, x_3, x_4, \ldots, x_n \sim P$

  16. Bootstrap - Illustration ● Sampling is done from the empirical distribution of the single sample $x_1, x_2, x_3, x_4, \ldots, x_n \sim P$, with the estimator computed on each resample: $(z_{1,1}, z_{1,2}, \ldots, z_{1,n}) \to e_1$; $(z_{2,1}, z_{2,2}, \ldots, z_{2,n}) \to e_2$; ...; $(z_{m,1}, z_{m,2}, \ldots, z_{m,n}) \to e_m$

  17. Formalization ● The data is $x_1, x_2, \ldots, x_n \sim P$. Note that the distribution function P is unknown. ● We sample m samples $Y_1, Y_2, \ldots, Y_m$. $Y_i = (z_{i,1}, z_{i,2}, \ldots, z_{i,n})$ contains n samples drawn from the empirical distribution of the data: $\Pr[z_{j,k} = x_i] = \frac{\# x_i}{n}$, where $\# x_i$ is the number of times $x_i$ appears in the original data.
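
The resampling scheme can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the author's code: the function names `bootstrap_estimates` and `bootstrap_variance`, the default `m=1000`, and the Gaussian demo data are all assumptions made for the example. Drawing n items with replacement from the data is the same as sampling from its empirical distribution.

```python
import random
import statistics


def bootstrap_estimates(data, estimator, m, rng):
    """Evaluate `estimator` on m bootstrap samples Y_1..Y_m of the data."""
    n = len(data)
    estimates = []
    for _ in range(m):
        # n draws with replacement = n draws from the empirical distribution.
        resample = rng.choices(data, k=n)
        estimates.append(estimator(resample))
    return estimates


def bootstrap_variance(data, estimator, m=1000, seed=0):
    """Estimate var(e) as (1/m) * sum_i (e_i - mean(e))^2 over the resamples."""
    rng = random.Random(seed)
    estimates = bootstrap_estimates(data, estimator, m, rng)
    return statistics.pvariance(estimates)


if __name__ == "__main__":
    # Made-up demo data: 200 draws from a normal distribution.
    rng = random.Random(42)
    data = [rng.gauss(10.0, 2.0) for _ in range(200)]
    boot_var = bootstrap_variance(data, statistics.fmean, m=2000, seed=1)
    # For the mean, the CLT-based value s^2 / n is available for comparison.
    clt_var = statistics.variance(data) / len(data)
    print(f"bootstrap: {boot_var:.5f}  s^2/n: {clt_var:.5f}")
```

For the mean, the two numbers should be close; the point of the bootstrap is that `estimator` can be swapped for a statistic (a median, say) with no simple closed-form variance.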

  18. The Main Idea ● $Y_i \sim \hat{P}$. ● We wish that $P = \hat{P}$. Is it (always) true? NO. ● Rather, $\hat{P}$ is an approximation of $P$.

  19. Example 1 ● The yield of the Dow Jones Index over the past two years is ~12%. ● You are considering a broker that had a yield of 25%, by picking specific stocks from the Dow Jones. ● Let x be a r.v. that represents the yield of randomly selected stocks. ● Do we know the distribution of x ?

  20. Example 1 ● Prepare a sample $x_1, x_2, \ldots, x_{10{,}000}$, where each $x_i$ is the yield of randomly selected stocks. ● Approximate the distribution of x using this sample.

  21. Evaluation of Estimators ● Using the approximate distribution, we can evaluate estimators. E.g.: − Variance of the mean. − Confidence intervals.

  22. Example 1 ● What is the probability of obtaining a yield larger than 25% (p-value)?

  23. Example 1 ● What is the probability of obtaining a yield larger than 25% (p-value)? Answer: 30%.
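
A hedged sketch of how such a p-value could be computed from a sample of random-portfolio yields. Everything specific here is an assumption for illustration: the per-stock yields are made-up numbers, each portfolio is taken to be an equal-weight pick of 5 distinct stocks, and the helper names are hypothetical. It does not reproduce the 30% figure from the slide.

```python
import random


def random_portfolio_yields(stock_yields, picks, n_samples, rng):
    """Yields of n_samples portfolios, each an equal-weight pick of
    `picks` distinct stocks chosen at random."""
    return [
        sum(rng.sample(stock_yields, picks)) / picks
        for _ in range(n_samples)
    ]


def empirical_p_value(sample, observed):
    """Fraction of sampled yields strictly larger than the observed yield."""
    return sum(1 for x in sample if x > observed) / len(sample)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical two-year yields for 30 index stocks (not real data).
    stocks = [rng.gauss(0.12, 0.30) for _ in range(30)]
    sample = random_portfolio_yields(stocks, picks=5, n_samples=10_000, rng=rng)
    print("P(yield > 25%) ~", empirical_p_value(sample, 0.25))
```

A large p-value means that many randomly selected portfolios beat the broker's 25%, so that yield alone is weak evidence of skill.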

  24. Example 2 - Decision tree ● Decision tree - short introduction.

  25. Example 2 ● Building a decision tree.

  26. Example 2 ● Many other trees can be built, using different algorithms. ● For a specific tree one can calculate prediction accuracy: (# of elements classified correctly) / (total # of elements)

  27. Example 2 ● Many other trees can be built, using different algorithms. ● For a specific tree one can calculate prediction accuracy: (# of elements classified correctly) / (total # of elements) ● For calculating error bars for this value, we need to sample more, apply the algorithm many times, and each time evaluate the prediction.

  28. Example 2 - Applying Bootstrap Build decision tree for each sample. Calculate prediction for each tree. Evaluate error bars based on predictions.

  29. Example 2 - Applying Bootstrap Build decision tree for each sample: $T_1, T_2, \ldots, T_n$. Calculate prediction for each tree: $p_1, p_2, \ldots, p_n$. Evaluate error bars based on predictions: $\bar{p} \pm 1.96\,\mathrm{STD}(p_1, p_2, \ldots, p_n)$, where $\bar{p}$ is the mean of $p_1, p_2, \ldots, p_n$.

  30. Example 2 - Applying Bootstrap But we have only one data set! Build decision tree for each sample. Calculate prediction for each tree. Evaluate error bars based on predictions.

  31. Example 2 - Applying Bootstrap Use bootstrap to prepare many samples. Build decision tree for each sample. Calculate prediction for each tree. Evaluate error bars based on predictions.
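
The three steps can be sketched as follows. This is an illustration under stated assumptions, not the method from the slides: a one-level decision tree (a "stump") on a single feature stands in for a full decision tree, accuracy is measured on a separate held-out set (the slides do not say which data the predictions are evaluated on), and the synthetic data and all function names are hypothetical.

```python
import random
import statistics


def make_data(n, rng, noise=0.1):
    """Synthetic 1-D data: label is 1 when x > 0.5, flipped with prob. `noise`."""
    points = []
    for _ in range(n):
        x = rng.random()
        label = 1 if x > 0.5 else 0
        if rng.random() < noise:
            label = 1 - label
        points.append((x, label))
    return points


def fit_stump(points):
    """Fit a one-level tree: pick (threshold, label_below, label_above)
    with the highest training accuracy."""
    best = None
    for threshold in sorted({x for x, _ in points}):
        for below, above in ((0, 1), (1, 0)):
            correct = sum(
                1 for x, y in points
                if (below if x <= threshold else above) == y
            )
            if best is None or correct > best[0]:
                best = (correct, threshold, below, above)
    return best[1:]


def accuracy(stump, points):
    """(# of elements classified correctly) / (total # of elements)."""
    threshold, below, above = stump
    hits = sum(1 for x, y in points if (below if x <= threshold else above) == y)
    return hits / len(points)


def bootstrap_accuracy_interval(train, test, n_boot=200, seed=0):
    """Return (low, mean, high) with error bars mean +/- 1.96 * STD."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_boot):
        resample = rng.choices(train, k=len(train))        # bootstrap sample
        scores.append(accuracy(fit_stump(resample), test))  # one p_i per tree
    mean = statistics.fmean(scores)
    half_width = 1.96 * statistics.pstdev(scores)
    return mean - half_width, mean, mean + half_width


if __name__ == "__main__":
    rng = random.Random(7)
    train, test = make_data(100, rng), make_data(100, rng)
    low, mean, high = bootstrap_accuracy_interval(train, test)
    print(f"accuracy: {mean:.3f}  (error bars {low:.3f} to {high:.3f})")
```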

  32. Cross Validation

  33. Objective ● Model selection.

  34. Formalization ● Let $(x, y)$ be drawn from distribution $P$, where $x \in \mathbb{R}^n$ and $y \in \mathbb{R}$. ● Let $f_\theta : \mathbb{R}^n \to \mathbb{R}$ be a learning algorithm, with parameter(s) $\theta$.

  35. Example ● Regression model.

  36. What Do We Want? ● We want the method that is going to predict future data most accurately, assuming they are drawn from the distribution P.

  37. What Do We Want? ● We want the method that is going to predict future data most accurately, assuming they are drawn from the distribution P. ● Niels Bohr: "It is very difficult to make an accurate prediction, especially about the future."

  38. Choosing the Best Model ● For a sample $(x, y)$ which is drawn from the distribution function $P$, measure the prediction error by: $(f_\theta(x) - y)^2$ or $|f_\theta(x) - y|$ ● Since $(x, y)$ is a r.v. we are usually interested in: $E[(f_\theta(x) - y)^2]$

  39. Choosing the Best Model (cont.) ● Choose the parameter(s) $\theta$: $\operatorname{argmin}_\theta E[(f_\theta(x) - y)^2]$ ● The problem is that we don't know how to sample from $P$.

  40. Regression − Order of 1 (Linear) [plot]

  41. Regression − Order of 2 [plot]

  42. Regression − Order of 3 [plot]

  43. Regression − Order of 4 [plot]

  44. Regression − Order of 5 [plot]

  45. Regression − Join the Dots [plot]

  46. Solution - Cross Validation ● Partition the data into 2 sets: − Training set T. − Test set S. ● Calculate $\theta$ using only the training set T. ● Given $\theta$, calculate $\frac{1}{|S|}\sum_{(x_i, y_i) \in S}(f_\theta(x_i) - y_i)^2$
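
A minimal sketch of the train/test procedure applied to polynomial regression, using only the Python standard library. The assumptions are loud ones: the data are synthetic (a noisy quadratic on [-1, 1]), the fit uses the normal equations solved by Gaussian elimination (adequate for the low orders tried here, not a numerically robust general solver), and every function name is hypothetical.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients (lowest order first) via the normal equations."""
    size = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve_linear_system(a, b)


def mean_squared_error(coeffs, xs, ys):
    """(1/|S|) * sum over the set of (f(x_i) - y_i)^2."""
    def predict(x):
        return sum(c * x ** i for i, c in enumerate(coeffs))
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_degree(train, test, degrees):
    """Fit each order on the training set T only; score it on the test set S."""
    errors = {}
    for d in degrees:
        coeffs = fit_polynomial(*train, d)
        errors[d] = mean_squared_error(coeffs, *test)
    return min(errors, key=errors.get), errors


def make_quadratic_data(n, rng):
    """Synthetic data: y = 1 + 2x - 3x^2 plus Gaussian noise, x in [-1, 1]."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [1 + 2 * x - 3 * x * x + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


if __name__ == "__main__":
    rng = random.Random(0)
    train, test = make_quadratic_data(40, rng), make_quadratic_data(40, rng)
    best, errors = select_degree(train, test, range(1, 6))
    for d, err in errors.items():
        print(f"order {d}: test MSE {err:.4f}")
    print("selected order:", best)
```

The linear fit should show a clearly larger test error than the quadratic one; the higher orders may score close to order 2 on a given split, which is one reason a single train/test partition can be a noisy selector.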

  47. Back to the Example ● In our case, we should try different orders for the regression (or different # of params). ● Each time apply the regression only on the training set, and calculate estimation error on the test set. ● The # of parameters will be the one minimizing the error.

  48. Variants of Cross Validation ● Test set. ● Leave one out. ● k-fold cross validation.

  49. K-fold Cross Validation [figure: the data split into five folds labeled Train, Train, Test, Train, Train]

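  50. K-fold Cross Validation ● We want to find a parameter $\theta$ that minimizes the cross validation estimate of prediction error: $CV(\theta) = \frac{1}{N}\sum_{i=1}^{N} L\left(y_i, f^{-k(i)}(x_i, \theta)\right)$, where $f^{-k(i)}$ is the model fitted with the fold $k(i)$ that contains sample $i$ left out.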

  51. K-fold Cross Validation ● How to choose K? ● K=N (= leave one out) - CV is approximately unbiased for the true prediction error, but can have high variance. ● When K is smaller (e.g., K=5) - CV has lower variance, but bias could be a problem (depending on how the performance of the learning method varies with the size of the training set).
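
The K-fold estimate can be sketched as below. As an assumed stand-in for the learning algorithm, the example uses 1-D nearest-neighbour regression whose parameter is the number of neighbours; the squared-error loss, the synthetic sine data, and all names are illustrative choices rather than anything taken from the slides. Setting `n_folds` to the number of points gives leave-one-out.

```python
import math
import random


def k_fold_indices(n, n_folds, rng):
    """Shuffle 0..n-1 and deal the indices into n_folds roughly equal folds."""
    order = list(range(n))
    rng.shuffle(order)
    return [order[f::n_folds] for f in range(n_folds)]


def knn_predict(train, x, n_neighbors):
    """Average the targets of the n_neighbors training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:n_neighbors]
    return sum(y for _, y in nearest) / len(nearest)


def cv_error(points, n_neighbors, n_folds, seed=0):
    """Mean squared loss over all points, each predicted by a model that
    was fitted without the fold containing that point."""
    folds = k_fold_indices(len(points), n_folds, random.Random(seed))
    total = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [p for i, p in enumerate(points) if i not in held_out]
        for i in fold:
            x, y = points[i]
            total += (y - knn_predict(train, x, n_neighbors)) ** 2
    return total / len(points)


def select_parameter(points, candidates, n_folds=5, seed=0):
    """Return the candidate with the smallest CV error, plus all scores."""
    scores = {c: cv_error(points, c, n_folds, seed) for c in candidates}
    return min(scores, key=scores.get), scores


def make_sine_data(n, rng):
    """Synthetic data: y = sin(x) plus Gaussian noise, x in [0, 10]."""
    return [
        (x, math.sin(x) + rng.gauss(0.0, 0.2))
        for x in (rng.uniform(0.0, 10.0) for _ in range(n))
    ]


if __name__ == "__main__":
    points = make_sine_data(100, random.Random(0))
    best, scores = select_parameter(points, [1, 5, 50], n_folds=5, seed=1)
    for c, err in scores.items():
        print(f"{c} neighbours: CV error {err:.4f}")
    print("selected:", best)
```

Using the same `seed` for every candidate keeps the fold assignment fixed, so the candidates are compared on identical splits.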

  52. ROC Plot (Receiver Operating Characteristic)
