VC Dimension and classification John Duchi Prof. John Duchi

Outline
I Setting: classification problems
II Finite hypothesis classes
  1 Union bounds
  2 Zero error case
III Shatter coefficients and Rademacher complexity
IV VC Dimension

Setting for the lecture
Binary classification problems: data $X \in \mathcal{X}$ and labels $Y \in \{-1, 1\}$.
Hypothesis class $\mathcal{H} \subset \{h : \mathcal{X} \to \mathbb{R}\}$.
Goal: find $h \in \mathcal{H}$ with $L(h) := \mathbb{E}[1\{h(X) Y \le 0\}]$ small.
Loss is always
$$\ell(h; (x, y)) = 1\{h(x) y \le 0\} = \begin{cases} 1 & \text{if } \mathrm{sign}(h(x)) \ne y \\ 0 & \text{if } \mathrm{sign}(h(x)) = y. \end{cases}$$

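Finite hypothesis classes
Theorem. Let $\mathcal{H}$ be a finite class. Then
$$P\left(\exists\, h \in \mathcal{H} \text{ s.t. } |L(h) - \hat{L}_n(h)| \ge \sqrt{\frac{\log|\mathcal{H}| + t}{2n}}\right) \le 2 e^{-t}.$$
Here $\hat{L}_n(h) := \frac{1}{n}\sum_{i=1}^n 1\{h(X_i) Y_i \le 0\}$ is the empirical risk on $n$ samples.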
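
The slide states the finite-class theorem without proof. One standard route, sketched here under the assumption that the samples $(X_i, Y_i)$ are i.i.d. (not necessarily the argument given in lecture), is Hoeffding's inequality for a fixed $h$ followed by a union bound:

```latex
% Fix h. The losses 1{h(X_i) Y_i <= 0} are i.i.d. and take values in [0, 1],
% so Hoeffding's inequality gives, for any eps > 0,
\[
  P\bigl(|L(h) - \hat{L}_n(h)| \ge \epsilon\bigr) \le 2 \exp(-2 n \epsilon^2).
\]
% Union bound over the |H| hypotheses:
\[
  P\bigl(\exists\, h \in \mathcal{H} : |L(h) - \hat{L}_n(h)| \ge \epsilon\bigr)
  \le 2\, |\mathcal{H}| \exp(-2 n \epsilon^2).
\]
% Choosing eps^2 = (log|H| + t) / (2n) makes the right-hand side
\[
  2\, |\mathcal{H}| \exp\bigl(-(\log|\mathcal{H}| + t)\bigr) = 2 e^{-t}.
\]
```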

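Finite hypothesis classes: generalization
Corollary. Let $\mathcal{H}$ be a finite class and $\hat{h}_n \in \mathrm{argmin}_{h \in \mathcal{H}} \hat{L}_n(h)$. Then (for a numerical constant $C < \infty$)
$$L(\hat{h}_n) \le \min_{h \in \mathcal{H}} L(h) + C \sqrt{\frac{\log\frac{|\mathcal{H}|}{\delta}}{n}} \quad \text{w.p. } \ge 1 - \delta.$$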
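
A sketch of how the corollary can follow from the uniform deviation bound for finite classes. The symbol $h^\star$ (a population risk minimizer) and the explicit constant are introduced here for illustration and are not on the slide:

```latex
% Let h^* be a minimizer of L over H. Decompose the excess risk:
\[
  L(\hat{h}_n) - L(h^\star)
  = \bigl[L(\hat{h}_n) - \hat{L}_n(\hat{h}_n)\bigr]
  + \bigl[\hat{L}_n(\hat{h}_n) - \hat{L}_n(h^\star)\bigr]
  + \bigl[\hat{L}_n(h^\star) - L(h^\star)\bigr].
\]
% The middle term is <= 0 because \hat{h}_n minimizes the empirical risk, so
\[
  L(\hat{h}_n) - L(h^\star) \le 2 \max_{h \in \mathcal{H}} |L(h) - \hat{L}_n(h)|.
\]
% Take t = log(2 / delta) in the finite-class theorem: with probability >= 1 - delta,
\[
  L(\hat{h}_n) \le L(h^\star) + 2 \sqrt{\frac{\log|\mathcal{H}| + \log\frac{2}{\delta}}{2n}}
  = L(h^\star) + \sqrt{\frac{2 \log\frac{2|\mathcal{H}|}{\delta}}{n}}.
\]
% If |H| >= 2 then log(2|H|/delta) <= 2 log(|H|/delta), so C = 2 suffices in that case.
```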

Finite hypothesis classes: perfect classifiers
Possible to give better guarantees if there are good classifiers! We won't bother looking at bad ones.
Theorem. Let $\mathcal{H}$ be a finite hypothesis class and assume $\min_h L(h) = 0$. Then for $t \ge 0$
$$P\left(L(\hat{h}_n) \ge L(h^\star) + \frac{\log|\mathcal{H}| + t}{n}\right) \le e^{-t},$$
where $h^\star$ denotes a minimizer of $L$ over $\mathcal{H}$, so $L(h^\star) = 0$.
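
To get a feel for how much the zero-error assumption buys, here is a small sketch comparing the two deviation terms from the finite-class theorems. The function names and the sample values of $|\mathcal{H}|$, $n$, and $t$ are illustrative choices, not from the lecture:

```python
import math

def slow_rate(num_hypotheses: int, n: int, t: float) -> float:
    """Deviation sqrt((log|H| + t) / (2n)) from the general finite-class theorem."""
    return math.sqrt((math.log(num_hypotheses) + t) / (2 * n))

def fast_rate(num_hypotheses: int, n: int, t: float) -> float:
    """Excess risk (log|H| + t) / n from the zero-error theorem."""
    return (math.log(num_hypotheses) + t) / n

# Illustrative values only: |H| = 1000, t = 3.
for n in (100, 1_000, 10_000):
    print(n, round(slow_rate(1000, n, 3.0), 5), round(fast_rate(1000, n, 3.0), 5))
```

The fast rate equals twice the square of the slow rate, so it is the smaller of the two whenever the slow-rate deviation is below 1/2.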

Do not pick the bad ones
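
The slide leaves the argument for the board. A sketch of one standard proof of the zero-error theorem, assuming i.i.d. samples and calling a hypothesis "bad" when its risk is at least $\epsilon$ (terminology chosen here to match the slide title):

```latex
% Since min_h L(h) = 0, some h^* makes no mistakes almost surely, so the
% empirical risk minimizer satisfies \hat{L}_n(\hat{h}_n) = 0 almost surely. Hence
\[
  P\bigl(L(\hat{h}_n) \ge \epsilon\bigr)
  \le P\bigl(\exists\, h \in \mathcal{H} : L(h) \ge \epsilon,\ \hat{L}_n(h) = 0\bigr).
\]
% For a fixed bad h, all n samples must be classified correctly:
\[
  P\bigl(\hat{L}_n(h) = 0\bigr) = (1 - L(h))^n \le (1 - \epsilon)^n \le e^{-n\epsilon}.
\]
% Union bound over at most |H| bad hypotheses, then eps = (log|H| + t) / n:
\[
  P\bigl(L(\hat{h}_n) \ge \epsilon\bigr) \le |\mathcal{H}|\, e^{-n\epsilon} = e^{-t}.
\]
```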

Finite function classes: Rademacher complexity
Idea: Use Rademacher complexity to understand generalization even for these?
Let $\mathcal{F}$ be finite with $|f| \le 1$ for $f \in \mathcal{F}$. Then
$$R_n(\mathcal{F}) := \mathbb{E}\left[\max_{f \in \mathcal{F}} \left|\frac{1}{n}\sum_{i=1}^n \varepsilon_i f(X_i)\right|\right]$$
satisfies
$$P\left(\max_{f \in \mathcal{F}} \left|\frac{1}{n}\sum_{i=1}^n \bigl(f(X_i) - \mathbb{E}[f(X_i)]\bigr)\right| \ge 2 R_n(\mathcal{F}) + t\right) \le 2\exp(-c n t^2).$$

Finite function classes: sub-Gaussianity
- Let $P_n$ be the empirical distribution
- Define $\|f\|_{L^2(P_n)}^2 = \frac{1}{n}\sum_{i=1}^n f(x_i)^2$
- What about the sum $\frac{1}{\sqrt{n}}\sum_{i=1}^n \varepsilon_i f(x_i)$?
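
One way to answer the question on this slide, assuming the $\varepsilon_i$ are i.i.d. uniform random signs and the points $x_i$ are held fixed: the normalized sum is sub-Gaussian with variance proxy $\|f\|_{L^2(P_n)}^2$. A short derivation via the moment generating function:

```latex
% For a single random sign and any real a: E exp(lambda * eps * a) = cosh(lambda a) <= exp(lambda^2 a^2 / 2).
% By independence of the signs,
\[
  \mathbb{E} \exp\Bigl(\frac{\lambda}{\sqrt{n}} \sum_{i=1}^n \varepsilon_i f(x_i)\Bigr)
  = \prod_{i=1}^n \cosh\Bigl(\frac{\lambda f(x_i)}{\sqrt{n}}\Bigr)
  \le \exp\Bigl(\frac{\lambda^2}{2} \cdot \frac{1}{n} \sum_{i=1}^n f(x_i)^2\Bigr)
  = \exp\Bigl(\frac{\lambda^2 \|f\|_{L^2(P_n)}^2}{2}\Bigr).
\]
```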

Finite function classes: Rademacher complexity
Proposition (Massart's finite class bound). Let $\mathcal{F}$ be finite with $M := \max_{f \in \mathcal{F}} \|f\|_{L^2(P_n)}$. Then
$$\hat{R}_n(\mathcal{F}) \le \sqrt{\frac{2 M^2 \log(2\,\mathrm{card}(\mathcal{F}))}{n}}.$$
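
A quick numerical sanity check of Massart's bound, as a sketch only. The finite class below is hypothetical (random function values in $[-1, 1]$), the empirical Rademacher complexity is estimated by Monte Carlo rather than computed exactly, and the helper names are illustrative:

```python
import math
import random

def empirical_rademacher(values, num_draws=500, seed=0):
    """Monte Carlo estimate of E max_f |(1/n) sum_i eps_i f(x_i)|.

    values[j] is the vector (f_j(x_1), ..., f_j(x_n)) for the j-th function.
    """
    rng = random.Random(seed)
    n = len(values[0])
    total = 0.0
    for _ in range(num_draws):
        signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # Rademacher signs
        total += max(abs(sum(s * v for s, v in zip(signs, row))) / n for row in values)
    return total / num_draws

def massart_bound(values):
    """sqrt(2 M^2 log(2 card(F)) / n) with M^2 = max_f (1/n) sum_i f(x_i)^2."""
    n = len(values[0])
    m_squared = max(sum(v * v for v in row) / n for row in values)
    return math.sqrt(2.0 * m_squared * math.log(2 * len(values)) / n)

# Hypothetical finite class: 20 functions with random values in [-1, 1] at n = 100 points.
rng = random.Random(1)
values = [[rng.uniform(-1.0, 1.0) for _ in range(100)] for _ in range(20)]
estimate = empirical_rademacher(values)
bound = massart_bound(values)
print(f"Monte Carlo estimate: {estimate:.4f}, Massart bound: {bound:.4f}")
```

The estimate should sit below the bound, up to Monte Carlo error.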

Infinite classes with finite labels
What if we had a classifier $h : \mathcal{X} \to \{-1, 1\}$ that could only give a certain number of different labelings to a data set?
Example (Sketchy). Say $\mathcal{X} = \mathbb{R}$ and $h_t(x) = \mathrm{sign}(x - t)$. Complexity of $\mathcal{F} := \{f(x) = 1\{h_t(x) \le 0\}\}$?
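
The threshold example can be checked by brute force: although $t$ ranges over all of $\mathbb{R}$, the labeling of a fixed data set changes only when $t$ crosses a data point. A minimal sketch, assuming the convention $\mathrm{sign}(0) = +1$ (the slide does not fix one) and using illustrative helper names:

```python
def sign(value: float) -> int:
    """Sign convention assumed here: sign(0) = +1."""
    return 1 if value >= 0 else -1

def threshold_labelings(points):
    """Distinct vectors (h_t(x_1), ..., h_t(x_n)) over all thresholds t."""
    xs = sorted(set(points))
    # One representative threshold per gap: below all points,
    # between each pair of neighbours, and above all points.
    candidates = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]
    return {tuple(sign(x - t) for x in points) for t in candidates}

labelings = threshold_labelings([0.3, 1.7, 2.2, 5.0])
print(len(labelings))  # n + 1 labelings for n distinct points
```

So an infinite class produces only $n + 1$ of the $2^n$ possible labelings on $n$ distinct points.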

Complexity of function classes
Define $\mathcal{F}(x_{1:n}) := \{(f(x_1), \ldots, f(x_n)) \mid f \in \mathcal{F}\}$. Then $\hat{R}_n(\mathcal{F}) = \hat{R}_n(\mathcal{F}')$ whenever $\mathcal{F}(x_{1:n}) = \mathcal{F}'(x_{1:n})$.
Proposition. Rademacher complexity depends on values of $\mathcal{F}$: if $|f(x)| \le M$ for all $x$ then
$$R_n(\mathcal{F}) \le c \cdot M \sup_{x_1, \ldots, x_n \in \mathcal{X}} \sqrt{\frac{\log \mathrm{card}(\mathcal{F}(x_{1:n}))}{n}}.$$

Proof of complexity
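
The proof itself is not on the slide. A sketch of one way to obtain it from Massart's finite class bound, assuming $R_n(\mathcal{F}) = \mathbb{E}[\hat{R}_n(\mathcal{F})]$ with the expectation over the sample:

```latex
% Fix x_{1:n}. The empirical complexity depends on F only through the finite
% set of value vectors F(x_{1:n}), and |f(x)| <= M gives ||f||_{L^2(P_n)} <= M.
% Massart's bound applied to F(x_{1:n}):
\[
  \hat{R}_n(\mathcal{F})
  \le \sqrt{\frac{2 M^2 \log\bigl(2\, \mathrm{card}(\mathcal{F}(x_{1:n}))\bigr)}{n}}.
\]
% Take expectations over the sample and bound by the worst-case x_{1:n}:
\[
  R_n(\mathcal{F})
  \le \sup_{x_1, \ldots, x_n \in \mathcal{X}}
      \sqrt{\frac{2 M^2 \log\bigl(2\, \mathrm{card}(\mathcal{F}(x_{1:n}))\bigr)}{n}}.
\]
% If card(F(x_{1:n})) >= 2 then log(2 card) <= 2 log(card), so c = 2 works in that case.
```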

Shatter coefficients
Given function class $\mathcal{F}$, the shattering coefficient (growth function) is
$$s_n(\mathcal{F}) := \sup_{x_1, \ldots, x_n \in \mathcal{X}} \mathrm{card}(\mathcal{F}(x_{1:n})) = \sup_{x_{1:n} \in \mathcal{X}^n} \mathrm{card}\left(\{(f(x_1), \ldots, f(x_n)) \mid f \in \mathcal{F}\}\right).$$
Example: Thresholds in $\mathbb{R}$

Shatter coefficients and Rademacher complexity
Proposition. For any function class $\mathcal{F}$ with $|f(x)| \le M$ we have
$$R_n(\mathcal{F}) \le c M \sqrt{\frac{\log s_n(\mathcal{F})}{n}}.$$

VC Dimension
How do we use shatter coefficients to give complexity guarantees?
Definition (VC Dimension). Let $\mathcal{H}$ be a collection of boolean functions. The Vapnik-Chervonenkis (VC) Dimension of $\mathcal{H}$ is
$$\mathrm{VC}(\mathcal{H}) := \sup\{n \in \mathbb{N} : s_n(\mathcal{H}) = 2^n\}.$$

VC Dimension: examples
Example (Thresholds in $\mathbb{R}$)
Example (Intervals in $\mathbb{R}$)
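
The slide leaves both examples blank. A brute-force sketch consistent with the standard answers (VC dimension 1 for thresholds, 2 for intervals), assuming $\{0, 1\}$-valued hypotheses $1\{x \ge t\}$ and $1\{a \le x \le b\}$. For distinct points on the line the set of achievable labelings depends only on their order, so checking one point set of each size is representative. Helper names are illustrative:

```python
from itertools import combinations_with_replacement

def gap_values(points):
    """One representative value in each gap around the sorted distinct points."""
    xs = sorted(set(points))
    return [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]

def threshold_labelings(points):
    """Labelings produced by h_t(x) = 1{x >= t} over all t."""
    return {tuple(int(x >= t) for x in points) for t in gap_values(points)}

def interval_labelings(points):
    """Labelings produced by h_{a,b}(x) = 1{a <= x <= b} over all a <= b."""
    cuts = gap_values(points)
    return {tuple(int(a <= x <= b) for x in points)
            for a, b in combinations_with_replacement(cuts, 2)}

def is_shattered(points, labelings):
    """True if all 2^n labelings of the (distinct) points are achieved."""
    return len(labelings(points)) == 2 ** len(points)

print(is_shattered([0.0], threshold_labelings), is_shattered([0.0, 1.0], threshold_labelings))
print(is_shattered([0.0, 1.0], interval_labelings), is_shattered([0.0, 1.0, 2.0], interval_labelings))
```

Thresholds fail on two points because the labeling (1, 0) is unreachable; intervals fail on three because (1, 0, 1) is unreachable.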

VC Dimension: examples
Example (Half-spaces in $\mathbb{R}^2$)
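
This example is also blank on the slide. Assuming half-spaces with an offset, $h(x) = \mathrm{sign}(w^\top x + b)$, the sketch below checks that three non-collinear points are shattered by solving $w^\top x_i + b = y_i$ exactly for each labeling. This shows only the lower bound $\mathrm{VC} \ge 3$. The helper names are illustrative:

```python
from itertools import product

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def interpolating_halfspace(points, labels):
    """Solve w1*x + w2*y + b = label at three points via Cramer's rule."""
    matrix = [[x, y, 1.0] for x, y in points]
    det = det3(matrix)
    if abs(det) < 1e-12:
        raise ValueError("points are collinear")
    solution = []
    for col in range(3):
        # Replace one column of the system matrix by the label vector.
        replaced = [row[:col] + [lab] + row[col + 1:] for row, lab in zip(matrix, labels)]
        solution.append(det3(replaced) / det)
    return tuple(solution)  # (w1, w2, b)

def shatters_three_points(points):
    """Check that every one of the 8 labelings is realized by some half-space."""
    for labels in product((-1.0, 1.0), repeat=3):
        w1, w2, b = interpolating_halfspace(points, labels)
        predictions = [1.0 if w1 * x + w2 * y + b > 0 else -1.0 for x, y in points]
        if predictions != list(labels):
            return False
    return True

print(shatters_three_points([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]))
```

The exact solve works because affine functions on $\mathbb{R}^2$ have three free parameters, which is the quantity that the dimension argument later in the lecture uses for the matching upper bound.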

Finite dimensional hypothesis classes
Let $\mathcal{F}$ be functions $f : \mathcal{X} \to \mathbb{R}$ and suppose $\dim(\mathcal{F}) = d$
- Definition of dimension:
Example (Linear functionals). If $\mathcal{F} = \{f(x) = w^\top x, w \in \mathbb{R}^d\}$ then $\dim(\mathcal{F}) = d$
Example (Nonlinear functionals). If $\mathcal{F} = \{f(x) = w^\top \phi(x), w \in \mathbb{R}^d\}$ then $\dim(\mathcal{F}) = d$

VC dimension of finite dimensional classes
Let $\mathcal{F}$ have $\dim(\mathcal{F}) = d$ and let $\mathcal{H} := \{h : \mathcal{X} \to \{-1, 1\} \text{ s.t. } h(x) = \mathrm{sign}(f(x)), f \in \mathcal{F}\}$.
Proposition (Dimension bounds VC dimension). $\mathrm{VC}(\mathcal{H}) \le \dim(\mathcal{F})$

Finite dimensional hypothesis classes: proof
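
The proof is not on the slide. A sketch of the classical linear-algebra argument, under two assumptions made here: $\mathcal{F}$ is a vector space of functions with vector-space dimension $d$, and $\mathrm{sign}(0) = +1$:

```latex
% Take any d + 1 points x_1, ..., x_{d+1}. The evaluation map
\[
  T : \mathcal{F} \to \mathbb{R}^{d+1}, \qquad T f = (f(x_1), \ldots, f(x_{d+1}))
\]
% is linear, so its image has dimension at most d. Hence there is a nonzero
% v in R^{d+1} orthogonal to the image:
\[
  \sum_{i=1}^{d+1} v_i f(x_i) = 0 \quad \text{for all } f \in \mathcal{F}.
\]
% Replacing v by -v if needed, assume some v_i > 0. Consider the labeling
% y_i = -1 if v_i > 0 and y_i = +1 if v_i <= 0. If sign(f(x_i)) = y_i for all i, then
% f(x_i) < 0 where v_i > 0 and f(x_i) >= 0 where v_i <= 0, so
\[
  \sum_{i=1}^{d+1} v_i f(x_i) < 0,
\]
% a contradiction. So no set of d + 1 points is shattered, and VC(H) <= d.
```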

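Sauer-Shelah Lemma
Theorem. Let $\mathcal{H}$ be boolean functions with $\mathrm{VC}(\mathcal{H}) = d$. Then
$$s_n(\mathcal{H}) \le \sum_{i=0}^d \binom{n}{i} \le \begin{cases} 2^n & \text{if } n \le d \\ \left(\frac{ne}{d}\right)^d & \text{if } n > d. \end{cases}$$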
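
A small numerical illustration of the two bounds in the lemma: the binomial sum, and its closed-form relaxation. Function names are illustrative, and `math.comb` requires Python 3.8 or later:

```python
import math

def sauer_shelah_sum(n: int, d: int) -> int:
    """Binomial sum sum_{i=0}^{d} C(n, i) bounding the shatter coefficient s_n."""
    return sum(math.comb(n, i) for i in range(d + 1))

def closed_form_bound(n: int, d: int) -> float:
    """2^n if n <= d, else (n e / d)^d."""
    return 2.0 ** n if n <= d else (n * math.e / d) ** d

for n, d in [(3, 5), (10, 3), (100, 3)]:
    print(n, d, sauer_shelah_sum(n, d), closed_form_bound(n, d))
```

For fixed $d$ the growth is polynomial in $n$ rather than $2^n$, which is what makes the Rademacher complexity of a VC class vanish as $n$ grows.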

Rademacher complexity of VC classes
Proposition. Let $\mathcal{H}$ be a collection of boolean functions with $\mathrm{VC}(\mathcal{H}) = d$. Then
$$R_n(\mathcal{H}) \le c \sqrt{\frac{d \log\frac{n}{d}}{n}}.$$
Proof is immediate (but a tighter result is possible):
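
The "immediate" proof is left blank on the slide. A sketch of the chain it presumably refers to, combining the shatter-coefficient bound on Rademacher complexity (with $M = 1$) and the Sauer-Shelah lemma, under the assumption $n \ge e \cdot d$ so that the constants can be absorbed:

```latex
\[
  R_n(\mathcal{H})
  \le c \sqrt{\frac{\log s_n(\mathcal{H})}{n}}
  \le c \sqrt{\frac{d \log\frac{ne}{d}}{n}}
  \qquad (n > d).
\]
% Since log(ne/d) = 1 + log(n/d) <= 2 log(n/d) whenever n >= e d,
\[
  R_n(\mathcal{H}) \le c \sqrt{2} \sqrt{\frac{d \log\frac{n}{d}}{n}},
\]
% which is the stated bound with a different numerical constant.
```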

Generalization bounds for VC classes
Proposition. Let $\mathcal{H}$ have VC-dimension $d$ and $\ell(h; (x, y)) = 1\{h(x) \ne y\}$. Then
$$P\left(\exists\, h \in \mathcal{H} \text{ s.t. } |\hat{L}_n(h) - L(h)| \ge c\sqrt{\frac{d \log\frac{n}{d}}{n}} + t\right) \le 2 e^{-n t^2}.$$

Things we have not addressed
- Multiclass problems (Natarajan dimension, due to Bala Natarajan; see also Multiclass Learnability and the ERM Principle by Daniely et al.)
- Extending "zero error" results to infinite classes
- Non-boolean classes

Reading and bibliography
1. M. Anthony and P. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999
2. P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3:463–482, 2002
3. S. Boucheron, O. Bousquet, and G. Lugosi. Theory of classification: a survey of some recent advances. ESAIM: Probability and Statistics, 9:323–375, 2005
4. A. W. van der Vaart and J. A. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, New York, 1996 (Ch. 2.6)
5. Scribe notes for Statistics 300b: http://web.stanford.edu/class/stats300b/
