VC Dimension and classification, Prof. John Duchi

  1. VC Dimension and classification (Prof. John Duchi)

  2. Outline
     I. Setting: classification problems
     II. Finite hypothesis classes
        1. Union bounds
        2. Zero error case
     III. Shatter coefficients and Rademacher complexity
     IV. VC Dimension

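  3. Setting for the lecture
     Binary classification problems: data $X \in \mathcal{X}$ and labels $Y \in \{-1, 1\}$.
     Hypothesis class $\mathcal{H} \subset \{h : \mathcal{X} \to \mathbb{R}\}$.
     Goal: find $h \in \mathcal{H}$ with $L(h) := \mathbb{E}[\mathbf{1}\{h(X)Y \le 0\}]$ small.
     The loss is always the zero-one loss
     $$\ell(h; (x, y)) = \mathbf{1}\{h(x)y \le 0\} = \begin{cases} 1 & \text{if } \operatorname{sign}(h(x)) \neq y \\ 0 & \text{if } \operatorname{sign}(h(x)) = y. \end{cases}$$
     Given a sample $(X_1, Y_1), \ldots, (X_n, Y_n)$, the empirical risk is $\widehat{L}_n(h) := \frac{1}{n}\sum_{i=1}^n \ell(h; (X_i, Y_i))$.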
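
A minimal Python sketch of the zero-one loss and of the empirical risk $\widehat{L}_n(h)$ as the average loss over a sample. The function names and the toy threshold hypothesis are illustrative assumptions, not part of the lecture.

```python
from typing import Callable, Sequence, Tuple


def zero_one_loss(h: Callable[[float], float], x: float, y: int) -> int:
    """l(h; (x, y)) = 1{h(x) * y <= 0}; note that h(x) = 0 always counts as an error."""
    return 1 if h(x) * y <= 0 else 0


def empirical_risk(h: Callable[[float], float], sample: Sequence[Tuple[float, int]]) -> float:
    """hat L_n(h): average zero-one loss over the sample (x_i, y_i), with y_i in {-1, 1}."""
    return sum(zero_one_loss(h, x, y) for x, y in sample) / len(sample)


if __name__ == "__main__":
    # Hypothetical toy example: h(x) = x - 0.5 on four labeled points.
    h = lambda x: x - 0.5
    data = [(0.1, -1), (0.4, 1), (0.7, 1), (0.9, -1)]
    print(empirical_risk(h, data))  # 0.5 (errors at x = 0.4 and x = 0.9)
```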

  4. Finite hypothesis classes
     Theorem. Let $\mathcal{H}$ be a finite class. Then
     $$\mathbb{P}\left(\exists h \in \mathcal{H} \text{ s.t. } \big|\widehat{L}_n(h) - L(h)\big| \ge \sqrt{\frac{\log|\mathcal{H}| + t}{2n}}\right) \le 2e^{-t}.$$
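
One standard way to prove the theorem, sketched here assuming i.i.d. samples: apply Hoeffding's inequality to each fixed $h$ (the zero-one loss lies in $[0, 1]$), then take a union bound over $\mathcal{H}$.

```latex
\begin{align*}
\mathbb{P}\left(\exists h \in \mathcal{H} : |\widehat{L}_n(h) - L(h)| \ge \epsilon\right)
  &\le \sum_{h \in \mathcal{H}} \mathbb{P}\left(|\widehat{L}_n(h) - L(h)| \ge \epsilon\right)
   \le 2|\mathcal{H}| \exp(-2n\epsilon^2) \\
  &= 2e^{-t} \quad \text{for } \epsilon = \sqrt{\frac{\log|\mathcal{H}| + t}{2n}}.
\end{align*}
```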

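  5. Finite hypothesis classes: generalization
     Corollary. Let $\mathcal{H}$ be a finite class and $\widehat{h}_n \in \operatorname{argmin}_h \widehat{L}_n(h)$. Then (for a numerical constant $C < \infty$)
     $$L(\widehat{h}_n) \le \min_{h \in \mathcal{H}} L(h) + C\sqrt{\frac{\log(|\mathcal{H}|/\delta)}{n}} \quad \text{with probability at least } 1 - \delta.$$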

  6. Finite hypothesis classes: perfect classifiers
     It is possible to give better guarantees if there are good classifiers: we won't bother looking at bad ones.
     Theorem. Let $\mathcal{H}$ be a finite hypothesis class and assume $\min_h L(h) = 0$, so that $h^\star \in \operatorname{argmin}_h L(h)$ has $L(h^\star) = 0$. Then for $t \ge 0$,
     $$\mathbb{P}\left(L(\widehat{h}_n) \ge L(h^\star) + \frac{\log|\mathcal{H}| + t}{n}\right) \le e^{-t}.$$
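
To see what the zero-error assumption buys, a small Python sketch can solve each bound for the sample size $n$ it requires. It assumes $t = \log(2/\delta)$ in the finite-class theorem (uniform deviation at most $\epsilon$) and $t = \log(1/\delta)$ in the zero-error theorem (excess-risk threshold at most $\epsilon$); the helper names are hypothetical.

```python
import math


def n_uniform_deviation(card_h: int, eps: float, delta: float) -> int:
    """Smallest n with sqrt((log|H| + log(2/delta)) / (2n)) <= eps (finite-class theorem)."""
    return math.ceil((math.log(card_h) + math.log(2 / delta)) / (2 * eps ** 2))


def n_zero_error(card_h: int, eps: float, delta: float) -> int:
    """Smallest n with (log|H| + log(1/delta)) / n <= eps (zero-error theorem)."""
    return math.ceil((math.log(card_h) + math.log(1 / delta)) / eps)


if __name__ == "__main__":
    for eps in (0.1, 0.01):
        print(eps, n_uniform_deviation(1000, eps, 0.05), n_zero_error(1000, eps, 0.05))
```

With $|\mathcal{H}| = 1000$ and $\delta = 0.05$ this gives 530 versus 100 samples at $\epsilon = 0.1$, and 52984 versus 991 at $\epsilon = 0.01$: the zero-error rate scales like $1/\epsilon$ rather than $1/\epsilon^2$.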

  7. Do not pick the bad ones
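
The slide title summarizes the proof idea. A sketch, assuming i.i.d. data and that $\widehat{h}_n$ is an empirical risk minimizer, so $\widehat{L}_n(\widehat{h}_n) = 0$ almost surely (some $h$ has $L(h) = 0$ and therefore makes no errors on the sample). A fixed "bad" hypothesis with $L(h) \ge \epsilon$ gets all $n$ samples right with probability at most $(1 - \epsilon)^n \le e^{-n\epsilon}$, and a union bound over the bad hypotheses gives:

```latex
\begin{align*}
\mathbb{P}\left(L(\widehat{h}_n) \ge \epsilon\right)
  &\le \mathbb{P}\left(\exists h \in \mathcal{H} : L(h) \ge \epsilon,\ \widehat{L}_n(h) = 0\right)
   \le |\mathcal{H}|(1 - \epsilon)^n \\
  &\le |\mathcal{H}| e^{-n\epsilon}
   = e^{-t} \quad \text{for } \epsilon = \frac{\log|\mathcal{H}| + t}{n}.
\end{align*}
```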

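  8. Finite function classes: Rademacher complexity
     Idea: can Rademacher complexity explain generalization even for these finite classes?
     Let $\mathcal{F}$ be finite with $|f| \le 1$ for $f \in \mathcal{F}$, and let $\varepsilon_1, \ldots, \varepsilon_n$ be i.i.d. uniform random signs in $\{-1, 1\}$. Then the Rademacher complexity
     $$R_n(\mathcal{F}) := \mathbb{E}\left[\max_{f \in \mathcal{F}} \left|\frac{1}{n}\sum_{i=1}^n \varepsilon_i f(X_i)\right|\right]$$
     satisfies, for a numerical constant $c > 0$,
     $$\mathbb{P}\left(\max_{f \in \mathcal{F}} \left|\frac{1}{n}\sum_{i=1}^n f(X_i) - \mathbb{E}[f(X_i)]\right| \ge 2R_n(\mathcal{F}) + t\right) \le 2\exp(-cnt^2).$$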

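  9. Finite function classes: sub-Gaussianity
     - Let $P_n$ be the empirical distribution of $x_1, \ldots, x_n$.
     - Define $\|f\|_{L^2(P_n)}^2 = \frac{1}{n}\sum_{i=1}^n f(x_i)^2$.
     - What about the sum $\frac{1}{\sqrt{n}}\sum_{i=1}^n \varepsilon_i f(x_i)$? With $x_{1:n}$ fixed, it is a sum of independent, bounded, mean-zero terms, so by Hoeffding's lemma it is sub-Gaussian with variance proxy $\|f\|_{L^2(P_n)}^2$.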

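  10. Finite function classes: Rademacher complexity
      Proposition (Massart's finite class bound). Let $\mathcal{F}$ be finite with $M := \max_{f \in \mathcal{F}} \|f\|_{L^2(P_n)}$. Then the empirical Rademacher complexity $\widehat{R}_n(\mathcal{F}) := \mathbb{E}\left[\max_{f \in \mathcal{F}} \left|\frac{1}{n}\sum_{i=1}^n \varepsilon_i f(x_i)\right| \,\middle|\, x_{1:n}\right]$ satisfies
      $$\widehat{R}_n(\mathcal{F}) \le \sqrt{\frac{2M^2 \log(2\operatorname{card}(\mathcal{F}))}{n}}.$$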
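
A Python sketch comparing a Monte Carlo estimate of $\widehat{R}_n(\mathcal{F})$ with Massart's bound. The finite class is represented only through its values on the fixed sample (one row $(f(x_1), \ldots, f(x_n))$ per $f$); the random sign-vector class in the usage example is a hypothetical choice.

```python
import math
import random


def empirical_rademacher(values, num_draws=500, seed=0):
    """Monte Carlo estimate of hat R_n(F) = E_eps[max_f |(1/n) sum_i eps_i f(x_i)|].

    values: one row per f in F, each row is (f(x_1), ..., f(x_n)) for the fixed sample.
    """
    rng = random.Random(seed)
    n = len(values[0])
    total = 0.0
    for _ in range(num_draws):
        eps = [rng.choice((-1, 1)) for _ in range(n)]  # i.i.d. Rademacher signs
        total += max(abs(sum(e * v for e, v in zip(eps, row))) / n for row in values)
    return total / num_draws


def massart_bound(values):
    """sqrt(2 M^2 log(2 card(F)) / n) with M = max_f ||f||_{L2(P_n)}."""
    n = len(values[0])
    m_squared = max(sum(v * v for v in row) / n for row in values)
    return math.sqrt(2 * m_squared * math.log(2 * len(values)) / n)


if __name__ == "__main__":
    # Hypothetical finite class: 50 random sign vectors on n = 100 points (so M = 1).
    rng = random.Random(1)
    values = [[rng.choice((-1, 1)) for _ in range(100)] for _ in range(50)]
    print(empirical_rademacher(values), massart_bound(values))
```

Here the bound is $\sqrt{2\log(100)/100} \approx 0.30$, while the estimate should land somewhat below it, since the bound only uses the number of functions and their norms.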

  11. Infinite classes with finite labels
      What if we had a class of classifiers $h : \mathcal{X} \to \{-1, 1\}$ that could only produce a limited number of distinct labelings of a data set?
      Example (sketchy). Say $\mathcal{X} = \mathbb{R}$ and $h_t(x) = \operatorname{sign}(x - t)$. What is the complexity of $\mathcal{F} := \{f(x) = \mathbf{1}\{h_t(x) \le 0\} : t \in \mathbb{R}\}$?

  12. Complexity of function classes
      Define $\mathcal{F}(x_{1:n}) := \{(f(x_1), \ldots, f(x_n)) \mid f \in \mathcal{F}\}$. Then $\widehat{R}_n(\mathcal{F}) = \widehat{R}_n(\mathcal{F}')$ whenever $\mathcal{F}(x_{1:n}) = \mathcal{F}'(x_{1:n})$.
      Proposition. Rademacher complexity depends only on the values of $\mathcal{F}$ on the sample: if $|f(x)| \le M$ for all $x$, then
      $$R_n(\mathcal{F}) \le c \cdot M \sup_{x_1, \ldots, x_n \in \mathcal{X}} \sqrt{\frac{\log \operatorname{card}(\mathcal{F}(x_{1:n}))}{n}}.$$

  13. Proof of complexity
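
One way the proof of the proposition on slide 12 can go, sketched under the assumption $\operatorname{card}(\mathcal{F}(x_{1:n})) \ge 2$: condition on the sample, note that $\widehat{R}_n$ only sees the finite set of vectors $\mathcal{F}(X_{1:n})$, apply Massart's bound with $\|f\|_{L^2(P_n)} \le M$, and use $\log(2k) \le 2\log k$ for $k \ge 2$ (so $c = 2$ suffices).

```latex
\begin{align*}
R_n(\mathcal{F}) = \mathbb{E}\big[\widehat{R}_n(\mathcal{F})\big]
  &\le \mathbb{E}\left[\sqrt{\frac{2M^2 \log\big(2\operatorname{card}(\mathcal{F}(X_{1:n}))\big)}{n}}\right] \\
  &\le \sup_{x_1, \ldots, x_n \in \mathcal{X}} \sqrt{\frac{4M^2 \log \operatorname{card}(\mathcal{F}(x_{1:n}))}{n}}
   = 2M \sup_{x_1, \ldots, x_n \in \mathcal{X}} \sqrt{\frac{\log \operatorname{card}(\mathcal{F}(x_{1:n}))}{n}}.
\end{align*}
```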

  14. Shatter coefficients
      Given a function class $\mathcal{F}$, the shattering coefficient (growth function) is
      $$s_n(\mathcal{F}) := \sup_{x_1, \ldots, x_n \in \mathcal{X}} \operatorname{card}(\mathcal{F}(x_{1:n})) = \sup_{x_{1:n} \in \mathcal{X}^n} \operatorname{card}\{(f(x_1), \ldots, f(x_n)) \mid f \in \mathcal{F}\}.$$
      Example: thresholds in $\mathbb{R}$.
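
For the thresholds example, a brute-force Python sketch can enumerate $\mathcal{F}(x_{1:n})$ for $f_t(x) = \mathbf{1}\{x \le t\}$, which equals $\mathbf{1}\{h_t(x) \le 0\}$ when $\operatorname{sign}(0)$ is taken to be 0. Only the gap in which $t$ falls matters, so the sketch tries one representative $t$ per gap. On $n$ distinct points it finds $n + 1$ labelings, and repeated points can only reduce the count, so $s_n(\mathcal{F}) = n + 1$. The helper names are hypothetical.

```python
def threshold_labelings(xs):
    """Return F(x_{1:n}) for F = {x -> 1{x <= t} : t in R} as a set of 0/1 tuples."""
    pts = sorted(set(xs))
    # One representative threshold per gap: below all points, between neighbours, above all.
    candidates = [pts[0] - 1.0] + [(a + b) / 2 for a, b in zip(pts, pts[1:])] + [pts[-1] + 1.0]
    return {tuple(int(x <= t) for x in xs) for t in candidates}


def shatter_coefficient_thresholds(n):
    """s_n for thresholds, evaluated on n distinct points (the worst case for this class)."""
    return len(threshold_labelings([float(i) for i in range(n)]))


if __name__ == "__main__":
    print([shatter_coefficient_thresholds(n) for n in range(1, 7)])  # [2, 3, 4, 5, 6, 7]
```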

  15. Shatter coefficients and Rademacher complexity
      Proposition. For any function class $\mathcal{F}$ with $|f(x)| \le M$, we have
      $$R_n(\mathcal{F}) \le cM\sqrt{\frac{\log s_n(\mathcal{F})}{n}}.$$

  16. VC Dimension
      How do we use shatter coefficients to give complexity guarantees?
      Definition (VC Dimension). Let $\mathcal{H}$ be a collection of boolean functions. The Vapnik-Chervonenkis (VC) dimension of $\mathcal{H}$ is
      $$\mathrm{VC}(\mathcal{H}) := \sup\{n \in \mathbb{N} : s_n(\mathcal{H}) = 2^n\}.$$

  17. VC Dimension: examples
      Example (Thresholds in $\mathbb{R}$)
      Example (Intervals in $\mathbb{R}$)
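
A brute-force Python sketch of both examples, assuming thresholds $\mathbf{1}\{x \le t\}$ and closed intervals $\mathbf{1}\{a \le x \le b\}$ (the empty labeling included). For these classes the labelings of $n$ distinct reals depend only on their order, so testing $0, 1, \ldots, n-1$ suffices, and since any subset of a shattered set is shattered, the sweep can stop at the first $n$ that fails. It should report 1 for thresholds and 2 for intervals. All names are hypothetical.

```python
def threshold_labelings(xs):
    """Labelings of xs by x -> 1{x <= t}; one candidate t per gap between sorted points."""
    pts = sorted(xs)
    cuts = [pts[0] - 1.0] + [(a + b) / 2 for a, b in zip(pts, pts[1:])] + [pts[-1] + 1.0]
    return {tuple(int(x <= t) for x in xs) for t in cuts}


def interval_labelings(xs):
    """Labelings of xs by x -> 1{a <= x <= b}, including the empty interval (all zeros)."""
    labelings = {tuple(0 for _ in xs)}
    for a in xs:
        for b in xs:
            if a <= b:
                labelings.add(tuple(int(a <= x <= b) for x in xs))
    return labelings


def vc_dimension(labelings_fn, max_n=8):
    """Largest n <= max_n such that n distinct points are shattered (all 2^n labelings)."""
    d = 0
    for n in range(1, max_n + 1):
        xs = [float(i) for i in range(n)]
        if len(labelings_fn(xs)) != 2 ** n:
            break  # shattering is hereditary, so no larger n can be shattered either
        d = n
    return d


if __name__ == "__main__":
    print(vc_dimension(threshold_labelings), vc_dimension(interval_labelings))  # 1 2
```

Intervals fail at $n = 3$ because the labeling $(1, 0, 1)$ would need an interval containing the outer points but not the middle one.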

  18. VC Dimension: examples
      Example (Half-spaces in $\mathbb{R}^2$)
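
A sketch of the usual argument that half-spaces in $\mathbb{R}^2$ have VC dimension 3, assuming half-spaces with an offset, $h_{w,b}(x) = \operatorname{sign}(w^\top x + b)$ with $\operatorname{sign}(0) = 1$. Three non-collinear points are shattered, since any subset of a triangle's vertices can be cut off by a line. For the upper bound, Radon's theorem splits any four points in $\mathbb{R}^2$ into disjoint sets $S$ and $T$ whose convex hulls intersect; the labeling $+1$ on $S$ and $-1$ on $T$ cannot be realized, because both label regions are convex:

```latex
\begin{gather*}
z \in \operatorname{conv}(S) \cap \operatorname{conv}(T), \quad
S \subset \{x : w^\top x + b \ge 0\}, \quad T \subset \{x : w^\top x + b < 0\} \\
\Longrightarrow \quad w^\top z + b \ge 0 \ \text{ and } \ w^\top z + b < 0,
\end{gather*}
```

a contradiction, so no four points are shattered.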

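  19. Finite dimensional hypothesis classes
      Let $\mathcal{F}$ be a class of functions $f : \mathcal{X} \to \mathbb{R}$ and suppose $\dim(\mathcal{F}) = d$.
      - Definition of dimension: $\mathcal{F}$ is a vector space of functions, and $\dim(\mathcal{F})$ is its dimension as a vector space.
      Example (Linear functionals). If $\mathcal{F} = \{f(x) = w^\top x : w \in \mathbb{R}^d\}$, then $\dim(\mathcal{F}) = d$.
      Example (Nonlinear functionals). If $\mathcal{F} = \{f(x) = w^\top \phi(x) : w \in \mathbb{R}^d\}$ for a feature map $\phi : \mathcal{X} \to \mathbb{R}^d$, then $\dim(\mathcal{F}) \le d$, with equality when the coordinate functions $\phi_1, \ldots, \phi_d$ are linearly independent.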

  20. VC dimension of finite dimensional classes
      Let $\mathcal{F}$ have $\dim(\mathcal{F}) = d$ and let $\mathcal{H} := \{h : \mathcal{X} \to \{-1, 1\} \text{ s.t. } h(x) = \operatorname{sign}(f(x)),\ f \in \mathcal{F}\}$.
      Proposition (Dimension bounds VC dimension). $\mathrm{VC}(\mathcal{H}) \le \dim(\mathcal{F})$.

  21. Finite dimensional hypothesis classes: proof
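
One standard argument, sketched assuming the convention $\operatorname{sign}(0) = 1$ (for $\operatorname{sign}(0) = -1$, swap the two labels). Suppose $x_1, \ldots, x_{d+1}$ were shattered. The evaluation map $f \mapsto (f(x_1), \ldots, f(x_{d+1}))$ is linear, so its image is a subspace of $\mathbb{R}^{d+1}$ of dimension at most $d$, and some nonzero $\gamma \in \mathbb{R}^{d+1}$ is orthogonal to it; replacing $\gamma$ by $-\gamma$ if needed, $\gamma$ has a negative coordinate. Take the labeling $h(x_i) = +1$ if and only if $\gamma_i \ge 0$. Any $f \in \mathcal{F}$ realizing it has $f(x_i) \ge 0$ when $\gamma_i \ge 0$ and $f(x_i) < 0$ when $\gamma_i < 0$, so

```latex
\begin{align*}
0 = \sum_{i=1}^{d+1} \gamma_i f(x_i)
  = \underbrace{\sum_{i : \gamma_i \ge 0} \gamma_i f(x_i)}_{\ge 0}
  + \underbrace{\sum_{i : \gamma_i < 0} \gamma_i f(x_i)}_{> 0}
  > 0,
\end{align*}
```

a contradiction, so no $d + 1$ points are shattered.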

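  22. Sauer-Shelah Lemma
      Theorem. Let $\mathcal{H}$ be a class of boolean functions with $\mathrm{VC}(\mathcal{H}) = d$. Then
      $$s_n(\mathcal{H}) \le \sum_{i=0}^{d} \binom{n}{i} \le \begin{cases} 2^n & \text{if } n \le d \\ \left(\frac{ne}{d}\right)^d & \text{if } n > d. \end{cases}$$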
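
A small Python sketch of the two bounds in the lemma. For the classes on slide 17 the first bound is attained: with $d = 1$ the sum is $n + 1$ (thresholds), and with $d = 2$ it is $1 + n + \binom{n}{2} = 1 + n(n+1)/2$ (intervals). The helper names are hypothetical.

```python
import math


def sauer_shelah_bound(n: int, d: int) -> int:
    """sum_{i=0}^{d} C(n, i): the Sauer-Shelah upper bound on s_n(H) when VC(H) = d."""
    return sum(math.comb(n, i) for i in range(d + 1))


def polynomial_bound(n: int, d: int) -> float:
    """(n e / d)^d, the looser bound used when n > d."""
    return (n * math.e / d) ** d


if __name__ == "__main__":
    for n in (5, 10, 50):
        # d = 1 matches thresholds (n + 1); d = 2 matches intervals (1 + n(n+1)/2).
        print(n, sauer_shelah_bound(n, 1), sauer_shelah_bound(n, 2), polynomial_bound(n, 2))
```

When $n \le d$, `math.comb(n, i)` is 0 for $i > n$, so the sum collapses to $2^n$, matching the first case.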

  23. Rademacher complexity of VC classes
      Proposition. Let $\mathcal{H}$ be a collection of boolean functions with $\mathrm{VC}(\mathcal{H}) = d$. Then
      $$R_n(\mathcal{H}) \le c\sqrt{\frac{d \log\frac{n}{d}}{n}}.$$
      The proof is immediate (but a tighter result is possible).
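
A sketch of the immediate proof, assuming $n \ge ed$ (so that $\log\frac{n}{d} \ge 1$): combine the shatter-coefficient bound from slide 15 with $M = 1$ (boolean functions satisfy $|h| \le 1$) and the Sauer-Shelah lemma.

```latex
\begin{align*}
\log s_n(\mathcal{H}) \le d\log\frac{ne}{d} = d\left(1 + \log\frac{n}{d}\right) \le 2d\log\frac{n}{d}
\quad\Longrightarrow\quad
R_n(\mathcal{H}) \le c\sqrt{\frac{\log s_n(\mathcal{H})}{n}} \le \sqrt{2}\,c\sqrt{\frac{d\log\frac{n}{d}}{n}}.
\end{align*}
```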

  24. Generalization bounds for VC classes
      Proposition. Let $\mathcal{H}$ have VC dimension $d$ and $\ell(h; (x, y)) = \mathbf{1}\{h(x) \neq y\}$. Then
      $$\mathbb{P}\left(\exists h \in \mathcal{H} \text{ s.t. } \big|\widehat{L}_n(h) - L(h)\big| \ge c\sqrt{\frac{d\log\frac{n}{d}}{n}} + t\right) \le 2e^{-nt^2}.$$
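
A small simulation can illustrate the proposition for thresholds ($d = 1$). It assumes a hypothetical distribution: $X$ uniform on $[0, 1]$ and $Y = \operatorname{sign}(X - 1/2)$ flipped with probability $\eta = 0.1$, so that $h_t(x) = \operatorname{sign}(x - t)$ has risk $L(h_t) = \eta + (1 - 2\eta)|t - 1/2|$ for $t \in [0, 1]$. The supremum over $t$ is approximated on a grid, so the reported value can slightly underestimate the true uniform deviation; it should shrink as $n$ grows.

```python
import bisect
import random


def true_risk(t: float, noise: float) -> float:
    """L(h_t) for h_t(x) = sign(x - t) under the assumed distribution, t in [0, 1]."""
    return noise + (1 - 2 * noise) * abs(t - 0.5)


def sup_deviation(n: int, noise: float = 0.1, grid: int = 1000, seed: int = 0) -> float:
    """Approximate sup_t |hat L_n(h_t) - L(h_t)| over thresholds t on a grid of [0, 1]."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > 0.5 else -1
        if rng.random() < noise:
            y = -y  # label noise
        pairs.append((x, y))
    pairs.sort()
    xs = [x for x, _ in pairs]
    # pos_prefix[k] = number of +1 labels among the k smallest x values
    pos_prefix = [0]
    for _, y in pairs:
        pos_prefix.append(pos_prefix[-1] + (1 if y == 1 else 0))
    total_neg = n - pos_prefix[-1]
    worst = 0.0
    for j in range(grid + 1):
        t = j / grid
        k = bisect.bisect_right(xs, t)  # points with x <= t are predicted -1
        # errors: +1 labels at or left of t, plus -1 labels right of t
        errors = pos_prefix[k] + (total_neg - (k - pos_prefix[k]))
        worst = max(worst, abs(errors / n - true_risk(t, noise)))
    return worst


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, round(sup_deviation(n), 4))
```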

  25. Things we have not addressed
      - Multiclass problems (Natarajan dimension, due to Bala Natarajan; see also Multiclass Learnability and the ERM Principle by Daniely et al.)
      - Extending “zero error” results to infinite classes
      - Non-boolean classes

  26. Reading and bibliography
      1. M. Anthony and P. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.
      2. P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3:463–482, 2002.
      3. S. Boucheron, O. Bousquet, and G. Lugosi. Theory of classification: a survey of some recent advances. ESAIM: Probability and Statistics, 9:323–375, 2005.
      4. A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, New York, 1996 (Ch. 2.6).
      5. Scribe notes for Statistics 300b: http://web.stanford.edu/class/stats300b/
