

  1. Learnability Beyond Uniform Convergence
Shai Shalev-Shwartz, School of CS and Engineering, The Hebrew University of Jerusalem
"Mathematical and Computational Foundations of Learning Theory", Dagstuhl 2011
Joint work with: N. Srebro, O. Shamir, K. Sridharan (COLT'09, JMLR'11) and A. Daniely, S. Sabato, S. Ben-David (COLT'11)
Shai Shalev-Shwartz (Hebrew U), Learnability Beyond Uniform Convergence, Jul'11, 1 / 34

  2. The Fundamental Theorem of Learning Theory: Binary Classification
Finite VC ⇒ Uniform Convergence (VC'71) ⇒ Learnable with ERM (trivial) ⇒ Learnable (trivial) ⇒ Finite VC (No Free Lunch, W'96).
Hence all four properties are equivalent.

  3. The Fundamental Theorem of Learning Theory: Regression
Finite fat-shattering dimension ⇒ Uniform Convergence (BLW'96, ABCH'97) ⇒ Learnable with ERM (trivial) ⇒ Learnable (trivial) ⇒ Finite fat-shattering dimension (KS'94, BLW'96, ABCH'97).
Again, all four properties are equivalent.

  4. For General Learning Problems?
Uniform Convergence ⇒ (trivial) Learnable with ERM ⇒ (trivial) Learnable. Does the converse hold?

  5. For General Learning Problems?
Uniform Convergence ⇒ (trivial) Learnable with ERM ⇒ (trivial) Learnable, but the converse fails: it is not true even in multiclass classification!
What is learnable? How to learn?

  6. Outline
1. Definitions
2. Learnability without uniform convergence
3. Characterizing Learnability using Stability
4. Characterizing Multiclass Learnability
5. Open Questions

  7. The General Learning Setting
Vapnik's General Learning Setting:
Hypothesis class H
Instance space Z with unknown distribution D
Loss function ℓ : H × Z → ℝ
Given: a training set S ∼ D^m
Goal: probably approximately solve min_{h∈H} L(h), where L(h) = E_{z∼D}[ℓ(h, z)]

  8. Examples
Binary classification: Z = X × {0, 1}; h ∈ H is a predictor h : X → {0, 1}; ℓ(h, (x, y)) = 1[h(x) ≠ y]
Multiclass categorization: Z = X × Y; h ∈ H is a predictor h : X → Y; ℓ(h, (x, y)) = 1[h(x) ≠ y]
k-means clustering: Z = ℝ^d; H ⊂ (ℝ^d)^k specifies k cluster centers; ℓ((µ_1, ..., µ_k), z) = min_j ‖µ_j − z‖
Density estimation: h is a parameter of a density p_h(z); ℓ(h, z) = −log p_h(z)
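The example losses above can be written directly in code. A minimal Python sketch (the function names and the toy inputs are my own, not from the slides):

```python
import numpy as np

def zero_one_loss(h, x, y):
    """0-1 loss for binary or multiclass classification: 1[h(x) != y]."""
    return float(h(x) != y)

def kmeans_loss(centers, z):
    """k-means loss: distance from z to the nearest of the k centers."""
    return min(np.linalg.norm(mu - z) for mu in centers)

def nll_loss(p_h, z):
    """Density-estimation loss: negative log-likelihood of z under p_h."""
    return -np.log(p_h(z))

# Toy usage: a threshold classifier and two cluster centers
h = lambda x: int(x > 0.5)
print(zero_one_loss(h, 0.7, 1))                            # correct prediction
print(kmeans_loss([np.zeros(2), np.ones(2)], np.ones(2)))  # z sits on a center
```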

  9. Learnability, ERM, Uniform Convergence
Uniform Convergence: for m ≥ m_UC(ε, δ),  P_{S∼D^m}[ ∀h ∈ H, |L_S(h) − L(h)| ≤ ε ] ≥ 1 − δ
Learnable: ∃ algorithm A s.t. for m ≥ m_PAC(ε, δ),  P_{S∼D^m}[ L(A(S)) ≤ min_{h∈H} L(h) + ε ] ≥ 1 − δ
ERM: an algorithm that returns A(S) ∈ argmin_{h∈H} L_S(h)
Learnable by arbitrary ERM: like "Learnable", but A must be an ERM; denote its sample complexity by m_ERM(ε, δ)
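For a finite hypothesis class, the ERM rule is literally a minimum of empirical losses. A minimal Python sketch (the threshold classifiers and the sample are illustrative choices, not from the slides):

```python
import numpy as np

def empirical_loss(h, S):
    """L_S(h): average 0-1 loss of h on the sample S."""
    return float(np.mean([h(x) != y for x, y in S]))

def erm(H, S):
    """ERM: return a hypothesis in H minimizing the empirical loss L_S."""
    return min(H, key=lambda h: empirical_loss(h, S))

# Toy run: threshold classifiers on [0, 1], labels given by threshold 0.5
H = [lambda x, t=t: int(x > t) for t in np.linspace(0, 1, 11)]
S = [(x, int(x > 0.5)) for x in np.random.default_rng(0).uniform(0, 1, 50)]
print(empirical_loss(erm(H, S), S))  # 0.0: the true threshold lies in H
```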

  13. For Binary Classification
Finite VC ⇒ Uniform Convergence (VC'71) ⇒ Learnable with ERM (trivial) ⇒ Learnable (trivial) ⇒ Finite VC (No Free Lunch, W'96).
m_UC(ε, δ) ≈ m_ERM(ε, δ) ≈ m_PAC(ε, δ) ≈ (VC(H) + log(1/δ)) / ε²

  14. Outline
1. Definitions
2. Learnability without uniform convergence
3. Characterizing Learnability using Stability
4. Characterizing Multiclass Learnability
5. Open Questions

  15. First (Trivial) Counter Example
Minorizing function: let H′ be a class of binary classifiers with infinite VC dimension, and let H = H′ ∪ {h_0}. Define
ℓ(h, (x, y)) = 1 if h ≠ h_0 and h(x) ≠ y;  1/2 if h ≠ h_0 and h(x) = y;  0 if h = h_0.
No uniform convergence (m_UC = ∞), yet learnable by any ERM (m_ERM = 0): h_0 has zero empirical and true loss, so every ERM returns it.

  16. From Vapnik's book ...
[slide shows an excerpt from Vapnik's book]

  17. Second Counter Example: Multiclass
X is a set, Y = 2^X ∪ {∗}. H = {h_T : T ⊆ X}, where h_T(x) = T if x ∈ T, and h_T(x) = ∗ otherwise.

  18. Second Counter Example: Multiclass
Claim: no uniform convergence; m_UC ≥ |X|/ε.
Take the target function to be h_∅. For any training set S, take T = X \ S: then L_S(h_T) = 0 but L(h_T) = P[T].

  19. Second Counter Example: Multiclass
Claim: H is learnable, with m_PAC ≤ 1/ε.
Let T be the target. Define A(S) = h_T if some example (x, T) with label T ≠ ∗ appears in S, and A(S) = h_∅ if S = {(x_1, ∗), ..., (x_m, ∗)}.
In the first case L(A(S)) = 0; in the second, L(A(S)) = P[T]. With high probability, if P[T] > ε then we are in the first case.
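The two claims can be checked numerically. A toy simulation (a finite X with the uniform distribution standing in for D, target h_∅; the variable names are mine, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 10_000, 100
X = np.arange(N)                    # instance space; D is uniform over X

# Target is h_empty: every instance is labeled '*'
S = rng.choice(X, size=m)           # training instances (all labels are '*')

# The algorithm A above sees no label other than '*', so it outputs h_empty,
# whose true loss is 0.

# An adversarial ERM instead picks h_T with T = X \ S: it also fits S perfectly,
T_bad = frozenset(X.tolist()) - frozenset(S.tolist())
emp_loss = np.mean([x in T_bad for x in S])   # 0.0 on the sample ...
true_loss = len(T_bad) / N                    # ... but P[T] ≈ 1 on fresh data
print(emp_loss, true_loss)
```

So the gap |L(h_T) − L_S(h_T)| stays near 1 no matter how large m is, as the m_UC ≥ |X|/ε claim predicts, while A still learns.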

  20. Second Counter Example: Multiclass
Corollary: m_UC / m_PAC ≈ |X|. If |X| → ∞, the problem is learnable but there is no uniform convergence!

  21. Third Counter Example: Stochastic Convex Optimization
Consider the family of problems:
H is a convex set with max_{h∈H} ‖h‖ ≤ 1
For all z, ℓ(h, z) is convex and Lipschitz w.r.t. h

  22. Third Counter Example: Stochastic Convex Optimization
Consider the family of problems: H is a convex set with max_{h∈H} ‖h‖ ≤ 1, and for all z, ℓ(h, z) is convex and Lipschitz w.r.t. h.
Claim: the problem is learnable by the regularized rule
argmin_{h∈H}  (λ/2) ‖h‖² + (1/m) Σ_{i=1}^m ℓ(h, z_i)
No uniform convergence
Not learnable by ERM
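The regularized rule above can be sketched for a 1-D instance of the family, e.g. the convex 1-Lipschitz loss ℓ(h, z) = |h − z| on H = [−1, 1] (my own toy choice, not from the slides), minimized by projected subgradient descent:

```python
import numpy as np

def regularized_erm(zs, lam=0.1, steps=2000, lr=0.01):
    """Minimize (lam/2)*h**2 + (1/m)*sum_i |h - z_i| over h in [-1, 1]
    by projected subgradient descent (a toy 1-D instance of the rule)."""
    h = 0.0
    for _ in range(steps):
        grad = lam * h + np.mean(np.sign(h - zs))     # subgradient of objective
        h = float(np.clip(h - lr * grad, -1.0, 1.0))  # project back onto H
    return h

zs = np.array([0.2, 0.4, 0.6])
print(regularized_erm(zs))  # converges near the empirical median 0.4
```

Any convex solver would do here; subgradient descent is chosen only because the loss is nonsmooth.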

  23. Third Counter Example: Stochastic Convex Optimization
Proof (of "not learnable by arbitrary ERM"): 1-mean + missing features

  24. Third Counter Example: Stochastic Convex Optimization
Proof (of "not learnable by arbitrary ERM"): 1-mean + missing features.
z = (α, x), with α ∈ {0, 1}^d, x ∈ ℝ^d, ‖x‖ ≤ 1
ℓ(h, (α, x)) = √( Σ_i α_i (h_i − x_i)² )
Take P[α_i = 1] = 1/2 independently, and P[x = µ] = 1.
Let h^(i) be given by h^(i)_j = 1 − µ_j if j = i, and h^(i)_j = µ_j otherwise.
If d is large enough, with high probability some coordinate i has α_i = 0 in every example, so h^(i) is an ERM (its empirical loss is 0); but L(h^(i)) ≥ 1/2.
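This construction is easy to simulate. A toy check (µ = 0 for simplicity; d and m chosen so that, with overwhelming probability, some feature is masked in every example; all names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 1000, 5                 # d >> 2^m, so some coordinate is never observed
mu = np.zeros(d)               # P[x = mu] = 1

alphas = rng.integers(0, 2, size=(m, d))   # per-example masks, P[alpha_i=1]=1/2

# Find a coordinate i with alpha_i = 0 in every training example
i = int(np.argmin(alphas.sum(axis=0)))
assert alphas[:, i].sum() == 0

h = mu.copy()
h[i] = 1.0 - mu[i]             # the bad ERM h^(i)

loss = lambda h, a: float(np.sqrt(np.sum(a * (h - mu) ** 2)))
emp_loss = np.mean([loss(h, a) for a in alphas])  # 0.0: coordinate i is unseen
true_loss = 0.5                # analytically, E[sqrt(alpha_i)] = P[alpha_i = 1]
print(emp_loss, true_loss)
```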

  25. Third Counter Example: Stochastic Convex Optimization
Proof (of "not even learnable by a unique ERM"): perturb the loss a little bit:
ℓ(h, (α, x)) = √( Σ_i α_i (h_i − x_i)² ) + ε Σ_i 2^{−i} (h_i − 1)²
Now the loss is strictly convex, so the ERM is unique, but the unique ERM does not generalize (as before).

  26. Outline
1. Definitions
2. Learnability without uniform convergence
3. Characterizing Learnability using Stability
4. Characterizing Multiclass Learnability
5. Open Questions

  27. Characterizing Learnability using Stability
Theorem: a necessary and sufficient condition for learnability is the existence of an Asymptotic ERM (AERM) that is stable.
Uniform Convergence ⇒ (RMP'05, MNPR'06) ERM is stable ⇒ (trivial) ∃ stable AERM ⇔ Learnable
