Learning From Data, Lecture 8: Linear Classification and Regression
M. Magdon-Ismail, CSCI 4100/6100

[Title-slide figures: a linear classification example and a linear regression example.]

recap: Approximation Versus Generalization (slide 2/23)

VC Analysis: E_out ≤ E_in + Ω(d_vc)
  1. Did you fit your data well enough (E_in)?
  2. Are you confident your E_in will generalize to E_out?

Bias-Variance Analysis: E_out = bias + var
  1. How well can you fit your data (bias)?
  2. How close to that best fit can you get (var)?

[Figures: the model-complexity plot (in-sample and out-of-sample error versus VC dimension d_vc, with optimum d*_vc), and the average hypotheses ḡ(x) for fitting sin(x) with constants (H_0) and with lines (H_1).]

The sin(x) example (a simulation sketch follows the recap):
  H_0: bias = 0.50, var = 0.25 ⇒ E_out = 0.75 ✓
  H_1: bias = 0.21, var = 1.69 ⇒ E_out = 1.90 ✗

The VC Insurance Co. The VC warranty had conditions for becoming void:
  - You can't look at your data before choosing H.
  - Data must be generated i.i.d. from P(x).
  - Data and test case come from the same P(x) (same bin).

recap: Decomposing The Learning Curve (slide 3/23)

[Figures: two learning curves of expected error versus number of data points N. VC analysis splits E_out into in-sample error plus generalization error; bias-variance analysis splits it into bias plus variance.]

VC analysis: pick H that can generalize and has a good chance to fit the data.
Bias-variance analysis: pick (H, A) to approximate f and not behave wildly after seeing the data.

Three Learning Problems (slide 4/23)

Analysis of credit:
  Classification:       y = ±1       Approve or Deny
  Regression:           y ∈ R        Credit Amount
  Logistic Regression:  y ∈ [0, 1]   Probability of Default

- Linear models are perhaps the most fundamental models.
- The linear model is the first model to try.
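To make the recap's sin(x) numbers concrete, here is a minimal bias-variance simulation. The precise setup is my assumption, following the standard Learning From Data example: target f(x) = sin(πx) on x ∈ [−1, 1], datasets of N = 2 points, H_0 = constant fits, H_1 = least-squares lines.

```python
import numpy as np

# Assumed setup (not spelled out on the slide): f(x) = sin(pi*x) on [-1, 1],
# datasets of N = 2 points, H0 = constants, H1 = least-squares lines.
rng = np.random.default_rng(0)
f = lambda x: np.sin(np.pi * x)
grid = np.linspace(-1, 1, 1000)           # where bias and var are measured

def fit_h0(x, y):
    b = y.mean()                          # best constant: midpoint of the two y's
    return lambda t: np.full_like(t, b)

def fit_h1(x, y):
    a, b = np.polyfit(x, y, 1)            # the line through the two points
    return lambda t: a * t + b

for name, fit in [("H0", fit_h0), ("H1", fit_h1)]:
    preds = np.empty((10000, grid.size))  # one fitted hypothesis per dataset
    for i in range(preds.shape[0]):
        x = rng.uniform(-1, 1, size=2)    # a fresh 2-point dataset
        preds[i] = fit(x, f(x))(grid)
    g_bar = preds.mean(axis=0)            # the average hypothesis g_bar(x)
    bias = np.mean((g_bar - f(grid)) ** 2)
    var = np.mean(preds.var(axis=0))      # E_x of the variance over datasets
    print(f"{name}: bias={bias:.2f} var={var:.2f} E_out={bias + var:.2f}")
```

With these assumptions the printed estimates land close to the slide's values (H_0: 0.50/0.25, H_1: 0.21/1.69), illustrating how the simpler H_0 wins on E_out despite its larger bias.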

The Linear Signal (slides 5/23, 6/23)

All linear models share the same signal, computed from the augmented input vector x ∈ {1} × R^d:

  s = w^T x

The signal is passed through an output transform, and the choice of transform gives the three models:

  sign(w^T x) → {−1, +1}   linear classification
  w^T x       → R          linear regression
  θ(w^T x)    → [0, 1]     logistic regression, y = θ(s)

The signal is linear in x: this gives the line/hyperplane separator.
The signal is linear in w: this is what makes the algorithms work.

Linear Classification (slide 7/23)

  H_lin = { h(x) = sign(w^T x) }

1. E_in ≈ E_out, because d_vc = d + 1:

  E_out(h) ≤ E_in(h) + O( sqrt((d/N) log N) ).

2. If the data is linearly separable, PLA will find a separator ⇒ E_in = 0. The update (a sketch follows below) is

  w(t + 1) = w(t) + x* y*,   where (x*, y*) is a misclassified data point.

E_in = 0 ⇒ E_out ≈ 0 (f is well approximated by a linear fit).

Non-Separable Data (slide 8/23)

What if the data is not separable, so E_in = 0 is not possible? → the pocket algorithm.
How do we ensure that E_in ≈ 0 is possible at all? → select good features.
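A minimal sketch of PLA as just described; the function and variable names are mine, and X is assumed to hold augmented inputs as rows:

```python
import numpy as np

def pla(X, y, max_iters=1000):
    """Perceptron Learning Algorithm.
    X: N x (d+1) matrix of augmented inputs (first column all ones).
    y: labels in {-1, +1}."""
    w = np.zeros(X.shape[1])
    for _ in range(max_iters):
        misclassified = np.flatnonzero(np.sign(X @ w) != y)
        if misclassified.size == 0:
            return w                   # a separator: E_in = 0
        n = misclassified[0]           # any misclassified point will do
        w = w + y[n] * X[n]            # the PLA update: w <- w + y* x*
    return w                           # gave up: data may be non-separable
```

On separable data this terminates with E_in = 0; on non-separable data the loop never settles, which is exactly what motivates the pocket algorithm on the next slide.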

The Pocket Algorithm (slide 9/23)

Minimizing E_in over non-separable data is a hard combinatorial problem. The pocket algorithm:
  - Run PLA.
  - At each step, keep the best E_in (and the corresponding w) seen so far.

(It's not rocket science, but it works. A sketch appears at the end of this group of slides.)

(Other approaches: linear regression, logistic regression, linear programming, ...)

Digits Data (slides 10/23, 11/23)

Each digit is a 16 × 16 grayscale image.

[Figure: a sample digit displayed as its 16 × 16 grid of pixel intensities in [−1, 1].]

Using the raw pixels, the input is 256-dimensional:

  x = (1, x_1, ..., x_256)   ← input          d_vc = 257
  w = (w_0, w_1, ..., w_256) ← linear model

Intensity and Symmetry Features (slide 12/23)

feature: an important property of the input that you think is useful for classification.
(dictionary.com: "a prominent or conspicuous part or characteristic")

With just two features, intensity and symmetry:

  x = (1, x_1, x_2)   ← input          d_vc = 3
  w = (w_0, w_1, w_2) ← linear model
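The slide does not spell out the two features; a common construction (my assumption, following the Learning From Data textbook) is average pixel intensity and left-right symmetry:

```python
import numpy as np

def intensity_symmetry(pixels):
    """Map one 16x16 digit image to the augmented 2-feature input (1, x1, x2).
    Feature definitions are an assumption, not given on the slide:
    x1 = average intensity, x2 = -(average |image - left-right flip|)."""
    img = np.asarray(pixels, dtype=float).reshape(16, 16)
    x1 = img.mean()
    x2 = -np.abs(img - np.fliplr(img)).mean()   # 0 for a perfectly symmetric digit
    return np.array([1.0, x1, x2])
```

Collapsing 256 pixels into 2 features drops d_vc from 257 to 3, buying generalization at the price of some fitting power.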

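And the pocket algorithm promised above, as a minimal sketch using the same conventions as the pla function (names are mine):

```python
import numpy as np

def pocket(X, y, max_iters=1000):
    """Pocket algorithm: PLA updates, but return the weights with the lowest
    in-sample error seen along the way (X: N x (d+1) augmented, y in {-1,+1})."""
    def e_in(w):
        return np.mean(np.sign(X @ w) != y)   # fraction misclassified

    w = np.zeros(X.shape[1])
    best_w, best_e = w.copy(), e_in(w)
    for _ in range(max_iters):
        misclassified = np.flatnonzero(np.sign(X @ w) != y)
        if misclassified.size == 0:
            return w                          # separable after all: E_in = 0
        n = misclassified[0]
        w = w + y[n] * X[n]                   # ordinary PLA update
        e = e_in(w)
        if e < best_e:                        # the "pocket": remember the best w
            best_w, best_e = w.copy(), e
    return best_w
```

The next two slides show exactly this contrast on the digits data: PLA's E_in keeps jumping around, while the pocket's best-so-far error only improves.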
PLA on Digits Data (slide 13/23) and Pocket on Digits Data (slide 14/23)

[Figures: error on a log scale (1% to 50%) versus iteration number t, for t up to 1000. One panel shows E_in and E_out for PLA alone; the others compare PLA with the pocket algorithm.]

Linear Regression (slides 15/23, 16/23)

The credit example again:

  age             32 years
  gender          male
  salary          40,000
  debt            26,000
  years in job    1 year
  years at home   3 years
  ...             ...

Classification: Approve/Deny.
Regression: Credit Line (a dollar amount); regression ≡ y ∈ R.

  h(x) = Σ_{i=0}^{d} w_i x_i = w^T x

Least Squares Linear Regression (slides 17/23, 18/23)

The target is noisy: y = f(x) + ε, so y is generated by a distribution P(y|x). The hypothesis is h(x) = w^T x, and we measure squared error:

  E_in(h) = (1/N) Σ_{n=1}^{N} (h(x_n) − y_n)^2    in-sample error
  E_out(h) = E_x[ (h(x) − y)^2 ]                  out-of-sample error

[Figures: least-squares fits, a line for one input variable and a plane for two.]

Using Matrices for Linear Regression (slide 19/23)

Collect the data into a matrix and vectors:

  X: the data matrix, N × (d + 1), whose n-th row is x_n^T
  y = (y_1, ..., y_N)^T: the target vector
  ŷ = X w = (w^T x_1, ..., w^T x_N)^T: the in-sample predictions

Then

  E_in(w) = (1/N) Σ_{n=1}^{N} (ŷ_n − y_n)^2
          = (1/N) ||ŷ − y||^2
          = (1/N) ||X w − y||^2
          = (1/N) (w^T X^T X w − 2 w^T X^T y + y^T y).

Linear Regression Solution (slide 20/23)

To minimize E_in(w), set ∇_w E_in(w) = 0. Vector calculus:

  ∇_w (w^T A w) = (A + A^T) w,    ∇_w (w^T b) = b.

With A = X^T X (which is symmetric, so (A + A^T) w = 2 X^T X w) and b = X^T y:

  ∇_w E_in(w) = (2/N) (X^T X w − X^T y).

Setting ∇_w E_in(w) = 0 gives

  X^T X w = X^T y              ← the normal equations
  w_lin = (X^T X)^{−1} X^T y   ← when X^T X is invertible
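The closed-form solution is a few lines of numpy. A minimal sketch on synthetic data (the data generation is mine, for illustration); np.linalg.lstsq or the pseudo-inverse is the numerically safer route when X^T X is ill-conditioned or singular:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data (illustration only): y = w_true . x + noise.
N, d = 100, 2
X = np.column_stack([np.ones(N), rng.uniform(-1, 1, (N, d))])  # augmented, N x (d+1)
w_true = np.array([0.5, -1.0, 2.0])
y = X @ w_true + 0.1 * rng.standard_normal(N)

# Normal equations: solve X^T X w = X^T y directly.
w_lin = np.linalg.solve(X.T @ X, X.T @ y)

# Numerically preferable equivalents:
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
w_pinv = np.linalg.pinv(X) @ y

e_in = np.mean((X @ w_lin - y) ** 2)
print(w_lin, e_in)
```

All three computations agree here because X^T X is invertible; lstsq and pinv also cover the singular case.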
