learnability beyond uniform convergence

Learnability Beyond Uniform Convergence. Shai Shalev-Shwartz, School of CS and Engineering, The Hebrew University of Jerusalem. Algorithmic Learning Theory, Lyon 2012. Joint work with: N. Srebro, O. Shamir, K. Sridharan


  1. Learnability Beyond Uniform Convergence
     Shai Shalev-Shwartz, School of CS and Engineering, The Hebrew University of Jerusalem
     “Algorithmic Learning Theory”, Lyon 2012
     Joint work with:
     - N. Srebro, O. Shamir, K. Sridharan (COLT’09, JMLR’11)
     - A. Daniely, S. Sabato, S. Ben-David (COLT’11)
     - A. Daniely, S. Sabato (NIPS’12)
     Shai Shalev-Shwartz (Hebrew U), Learnability Beyond Uniform Convergence, Oct’12, 1 / 34

  2. The Fundamental Theorem of Learning Theory: For Binary Classification
     Diagram (cycle of implications):
     - Uniform convergence ⇒ learnable with ERM (trivial)
     - Learnable with ERM ⇒ learnable (trivial)
     - Learnable ⇒ finite VC (NFL, W’96)
     - Finite VC ⇒ uniform convergence (VC’71)
     VC = Vapnik and Chervonenkis, W = Wolpert

  3. The Fundamental Theorem of Learning Theory: For Regression
     Diagram (cycle of implications):
     - Uniform convergence ⇒ learnable with ERM (trivial)
     - Learnable with ERM ⇒ learnable (trivial)
     - Learnable ⇒ finite fat-shattering (KS’94, BLW’96, ABCH’97)
     - Finite fat-shattering ⇒ uniform convergence (BLW’96, ABCH’97)
     BLW = Bartlett, Long, Williamson. ABCH = Alon, Ben-David, Cesa-Bianchi, Haussler. KS = Kearns and Schapire

  4. For general learning problems?
     - Uniform convergence ⇒ learnable with ERM (trivial)
     - Learnable with ERM ⇒ learnable (trivial)
     - Learnable ⇒ uniform convergence: ?

  6. For general learning problems?
     - Uniform convergence ⇒ learnable with ERM (trivial)
     - Learnable with ERM ⇒ learnable (trivial)
     - Learnable ⇒ uniform convergence: ✗ not true
     Not true in “convex learning problems”!
     Not true even in “multiclass categorization”!
     What is learnable? How to learn?

  7. Outline
     1. Definitions
     2. Learnability without uniform convergence
     3. Characterizing Learnability using Stability
     4. Characterizing Multiclass Learnability
     5. Analyzing specific, practically relevant, classes
     6. Open Questions

  9. The General Learning Setting (Vapnik)
     - Hypothesis class $\mathcal{H}$
     - Examples domain $Z$ with unknown distribution $\mathcal{D}$
     - Loss function $\ell : \mathcal{H} \times Z \to \mathbb{R}$
     Given: training set $S \sim \mathcal{D}^m$
     Goal: solve $\min_{h \in \mathcal{H}} L(h)$, where $L(h) = \mathbb{E}_{z \sim \mathcal{D}}[\ell(h, z)]$, in the Probably (w.p. $\ge 1 - \delta$) Approximately Correct (up to $\epsilon$) sense
     Training loss: $L_S(h) = \frac{1}{m} \sum_{i=1}^{m} \ell(h, z_i)$

  10. Examples
      - Binary classification: $Z = \mathcal{X} \times \{0, 1\}$; $h \in \mathcal{H}$ is a predictor $h : \mathcal{X} \to \{0, 1\}$; $\ell(h, (x, y)) = \mathbf{1}[h(x) \ne y]$
      - Multiclass categorization: $Z = \mathcal{X} \times \mathcal{Y}$; $h \in \mathcal{H}$ is a predictor $h : \mathcal{X} \to \mathcal{Y}$; $\ell(h, (x, y)) = \mathbf{1}[h(x) \ne y]$
      - $k$-means clustering: $Z = \mathbb{R}^d$; $\mathcal{H} \subset (\mathbb{R}^d)^k$ specifies $k$ cluster centers; $\ell((\mu_1, \dots, \mu_k), z) = \min_j \|\mu_j - z\|$
      - Density estimation: $h$ is a parameter of a density $p_h(z)$; $\ell(h, z) = -\log p_h(z)$
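The losses listed above, together with the training loss $L_S(h)$, can be written down directly. The following is a minimal sketch only: the function names (`zero_one_loss`, `k_means_loss`, `neg_log_likelihood`, `training_loss`) and the toy data are assumptions made for illustration and do not come from the slides.

```python
import math

def zero_one_loss(h, z):
    """Classification loss 1[h(x) != y] for an example z = (x, y)."""
    x, y = z
    return 1.0 if h(x) != y else 0.0

def k_means_loss(centers, z):
    """Distance from the point z to the nearest of the k cluster centers."""
    return min(math.dist(mu, z) for mu in centers)

def neg_log_likelihood(p_h, z):
    """Density-estimation loss -log p_h(z); p_h is a density function."""
    return -math.log(p_h(z))

def training_loss(loss, h, sample):
    """L_S(h) = (1/m) * sum_i loss(h, z_i)."""
    return sum(loss(h, z) for z in sample) / len(sample)

# Toy usage (hypothetical data): a threshold predictor on the real line.
threshold = lambda x: 1 if x >= 0.5 else 0
sample = [(0.1, 0), (0.4, 1), (0.6, 1), (0.9, 1)]
print(training_loss(zero_one_loss, threshold, sample))
```

The same `training_loss` helper works for all three losses because each one is just a function of a hypothesis and a single example, which is the point of the general setting.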

  14. Learnability, ERM, Uniform convergence
      - Uniform Convergence: for $m \ge m_{\mathrm{UC}}(\epsilon, \delta)$, $\mathbb{P}_{S \sim \mathcal{D}^m}\left[\forall h \in \mathcal{H},\ |L_S(h) - L(h)| \le \epsilon\right] \ge 1 - \delta$
      - Learnable: $\exists A$ s.t. for $m \ge m_{\mathrm{PAC}}(\epsilon, \delta)$, $\mathbb{P}_{S \sim \mathcal{D}^m}\left[L(A(S)) \le \min_{h \in \mathcal{H}} L(h) + \epsilon\right] \ge 1 - \delta$
      - ERM: an algorithm that returns $A(S) \in \operatorname{argmin}_{h \in \mathcal{H}} L_S(h)$
      - Learnable by arbitrary ERM (with rate $m_{\mathrm{ERM}}(\epsilon, \delta)$): like “Learnable”, but $A$ should be an ERM.
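To make the three notions concrete, here is a small sketch on a finite problem where the true loss can be computed exactly. Everything specific in it is an assumption chosen for illustration: threshold predictors on the domain {0, ..., 9}, a uniform distribution, labels generated by the threshold 5, and the helper names.

```python
import random

DOMAIN = range(10)        # assumed finite domain, uniform distribution
THRESHOLDS = range(11)    # hypothesis class: h_t(x) = 1[x >= t]

def predict(t, x):
    return 1 if x >= t else 0

def label(x):
    # Assumed target: the threshold t = 5 (so the problem is realizable).
    return predict(5, x)

def true_loss(t):
    """L(h_t) under the uniform distribution on DOMAIN."""
    return sum(predict(t, x) != label(x) for x in DOMAIN) / len(DOMAIN)

def training_loss(t, sample):
    """L_S(h_t) on a list of (x, y) pairs."""
    return sum(predict(t, x) != y for x, y in sample) / len(sample)

def erm(sample):
    """Return some minimizer of the training loss (ties broken by order)."""
    return min(THRESHOLDS, key=lambda t: training_loss(t, sample))

def uniform_gap(sample):
    """sup over the class of |L_S(h) - L(h)|."""
    return max(abs(training_loss(t, sample) - true_loss(t)) for t in THRESHOLDS)

rng = random.Random(0)
sample = [(x, label(x)) for x in (rng.choice(DOMAIN) for _ in range(200))]
h_hat = erm(sample)
print(h_hat, true_loss(h_hat), uniform_gap(sample))
```

When the uniform gap is at most $\epsilon$, any ERM has excess loss at most $2\epsilon$, which is the "trivial" implication from uniform convergence to learnability with ERM.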

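  15. For Binary Classification
      Diagram (cycle of implications):
      - Uniform convergence ⇒ learnable with ERM (trivial)
      - Learnable with ERM ⇒ learnable (trivial)
      - Learnable ⇒ finite VC (NFL, W’96)
      - Finite VC ⇒ uniform convergence (VC’71)
      $m_{\mathrm{UC}}(\epsilon, \delta) \approx m_{\mathrm{ERM}}(\epsilon, \delta) \approx m_{\mathrm{PAC}}(\epsilon, \delta) \approx \frac{\mathrm{VC}(\mathcal{H}) \log(1/\delta)}{\epsilon^2}$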

  18. Counter Example: Stochastic Convex Optimization
      Consider the family of problems:
      - $\mathcal{H}$ is a convex set with $\max_{h \in \mathcal{H}} \|h\| \le 1$
      - For all $z$, $\ell(h, z)$ is convex and Lipschitz w.r.t. $h$
      Claim:
      - Problem is learnable by the rule: $\operatorname{argmin}_{h \in \mathcal{H}} \frac{\lambda_m}{2} \|h\|^2 + \frac{1}{m} \sum_{i=1}^{m} \ell(h, z_i)$
      - No uniform convergence
      - Not learnable by ERM
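The regularized rule above can be approximated numerically. The sketch below is one possible way to do so, not the method of the talk: it assumes the absolute loss $\ell(h, (x, y)) = |\langle h, x \rangle - y|$ (convex and 1-Lipschitz when $\|x\| \le 1$), takes $\mathcal{H}$ to be the unit ball, and uses projected subgradient descent as a stand-in solver. All names, the toy sample, and the value of `lam` are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_to_unit_ball(h):
    """Euclidean projection onto {h : ||h|| <= 1}."""
    norm = math.sqrt(dot(h, h))
    return h if norm <= 1.0 else [a / norm for a in h]

def objective(h, sample, lam):
    """(lam / 2) * ||h||^2 + (1/m) * sum_i |<h, x_i> - y_i|."""
    avg = sum(abs(dot(h, x) - y) for x, y in sample) / len(sample)
    return 0.5 * lam * dot(h, h) + avg

def subgradient(h, sample, lam):
    """One subgradient of the regularized objective at h."""
    g = [lam * a for a in h]
    for x, y in sample:
        r = dot(h, x) - y
        s = (r > 0) - (r < 0)          # sign of the residual, 0 at a kink
        for j, xj in enumerate(x):
            g[j] += s * xj / len(sample)
    return g

def regularized_minimizer(sample, lam, steps=500):
    """Projected subgradient descent; returns the best iterate seen."""
    h = [0.0] * len(sample[0][0])
    best, best_val = h, objective(h, sample, lam)
    for t in range(1, steps + 1):
        g = subgradient(h, sample, lam)
        eta = 1.0 / (lam * t)          # standard step size for strong convexity
        h = project_to_unit_ball([a - eta * b for a, b in zip(h, g)])
        val = objective(h, sample, lam)
        if val < best_val:
            best, best_val = h, val
    return best

# Toy sample (hypothetical): three examples with ||x|| <= 1.
sample = [([1.0, 0.0], 0.5), ([0.0, 1.0], -0.5), ([0.6, 0.8], 0.0)]
h_reg = regularized_minimizer(sample, lam=0.1)
print(h_reg, objective(h_reg, sample, 0.1))
```

The regularization term makes the objective strongly convex, so its minimizer is unique and changes little when one example is replaced; this is the stability property that the later part of the outline refers to.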

  20. Counter Example: Stochastic Convex Optimization
      Proof (of “not learnable by arbitrary ERM”): 1-Mean + missing features
      - $z = (\alpha, x)$, $\alpha \in \{0, 1\}^d$, $x \in \mathbb{R}^d$, $\|x\| \le 1$
      - $\ell(h, (\alpha, x)) = \sqrt{\sum_i \alpha_i (h_i - x_i)^2}$
      - Take $\mathbb{P}[\alpha_i = 1] = 1/2$, $\mathbb{P}[x = \mu] = 1$
      - Let $h^{(i)}$ be s.t. $h^{(i)}_j = 1 - \mu_j$ if $j = i$, and $h^{(i)}_j = \mu_j$ otherwise
      - If $d$ is large enough, there exists $i$ such that $h^{(i)}$ is an ERM
      - But $L(h^{(i)}) \ge 1/2$
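The construction can be checked on a small simulated instance. This is a sketch under stated assumptions: $\mu = 0$, dimension `d = 4096`, and sample size `m = 5` are arbitrary choices with $d \gg 2^m$, and the helper names are hypothetical. With $\mu = 0$, the hypothesis $h^{(i)}$ is the $i$-th standard basis vector, and its expected loss is $\mathbb{P}[\alpha_i = 1] \cdot |1 - 2\mu_i| = 1/2$.

```python
import math
import random

def loss(h, z):
    """sqrt(sum_i alpha_i * (h_i - x_i)^2): distance on the observed features."""
    alpha, x = z
    return math.sqrt(sum(a * (hj - xj) ** 2 for a, hj, xj in zip(alpha, h, x)))

def training_loss(h, sample):
    return sum(loss(h, z) for z in sample) / len(sample)

def expected_loss_of_flipped(mu, i):
    """Exact L(h^(i)): only coordinate i differs from mu, and it is
    observed (alpha_i = 1) with probability 1/2."""
    return 0.5 * abs(1.0 - 2.0 * mu[i])

def bad_erm(sample, mu):
    """Find a coordinate that is missing in every example and flip it."""
    for i in range(len(mu)):
        if all(alpha[i] == 0 for alpha, _ in sample):
            h = list(mu)
            h[i] = 1.0 - mu[i]
            return i, h
    return None

rng = random.Random(0)
d, m = 4096, 5                      # assumed sizes, with d much larger than 2^m
mu = [0.0] * d                      # assumed mean; x = mu with probability 1
sample = [([rng.randint(0, 1) for _ in range(d)], mu) for _ in range(m)]

found = bad_erm(sample, mu)
if found is not None:
    i, h_bad = found
    print(i, training_loss(h_bad, sample), expected_loss_of_flipped(mu, i))
```

Each coordinate is missing from all `m` examples with probability $2^{-m}$, so once $d \gg 2^m$ such a coordinate exists with overwhelming probability, and the flipped hypothesis has zero training loss while its expected loss stays bounded away from zero.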

  21. Counter Example: Stochastic Convex Optimization
      Proof (of “not even learnable by a unique ERM”)
      - Perturb the loss a little bit: $\ell(h, (\alpha, x)) = \sqrt{\sum_i \alpha_i (h_i - x_i)^2} + \epsilon \sum_i 2^{-i} (h_i - 1)^2$
      - Now the loss is strictly convex, so the ERM is unique
      - But the unique ERM does not generalize (as before)

  22. For general learning problems?
      - Uniform convergence ⇒ learnable with ERM (trivial)
      - Learnable with ERM ⇒ learnable (trivial)
      - Learnable ⇒ uniform convergence: ✗ not true
      Not true in “convex learning problems”! ✓
      Not true even in “multiclass categorization”!

  24. Counter Example: Multiclass
      - $\mathcal{X}$: a set, $\mathcal{Y} = \{0, 1, 2, \dots, 2^{|\mathcal{X}|} - 1\}$
      - Let $n : 2^{\mathcal{X}} \to \mathcal{Y}$ be defined by binary encoding
      - $\mathcal{H} = \{h_T : T \subset \mathcal{X}\}$, where $h_T(x) = 0$ if $x \notin T$ and $h_T(x) = n(T)$ if $x \in T$
      Claim: No uniform convergence: $m_{\mathrm{UC}} \ge |\mathcal{X}| / \epsilon$
      - Target function is $h_\emptyset$
      - For any training set $S$, take $T = \mathcal{X} \setminus S$
      - $L_S(h_T) = 0$ but $L(h_T) = \mathbb{P}[T]$

  25. Counter Example: Multiclass
      - $\mathcal{X}$: a set, $\mathcal{Y} = \{0, 1, 2, \dots, 2^{|\mathcal{X}|} - 1\}$
      - Let $n : 2^{\mathcal{X}} \to \mathcal{Y}$ be defined by binary encoding
      - $\mathcal{H} = \{h_T : T \subset \mathcal{X}\}$, where $h_T(x) = 0$ if $x \notin T$ and $h_T(x) = n(T)$ if $x \in T$
      Claim: $\mathcal{H}$ is learnable: $m_{\mathrm{PAC}} \le \frac{1}{\epsilon}$
      - Let $T$ be the target
      - $A(S) = h_T$ if $(x, n(T)) \in S$
      - $A(S) = h_\emptyset$ if $S = \{(x_1, 0), \dots, (x_m, 0)\}$
      - In the 1st case, $L(A(S)) = 0$. In the 2nd case, $L(A(S)) = \mathbb{P}[T]$
      - With high probability, if $\mathbb{P}[T] > \epsilon$ then we’ll be in the 1st case
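Both multiclass claims can be illustrated on a tiny domain. The sketch below assumes $\mathcal{X} = \{0, \dots, 7\}$ with a uniform distribution and represents each hypothesis $h_T$ by its set $T$; the function names are hypothetical. `learner` is the algorithm $A$ from the claim, and `bad_erm` is the hypothesis $h_T$ with $T$ equal to the unseen part of the domain.

```python
def encode(T):
    """Binary encoding n(T): bit x is set exactly when x is in T."""
    return sum(1 << x for x in T)

def decode(label):
    """Inverse of encode: recover T from the label n(T)."""
    return frozenset(x for x in range(label.bit_length()) if (label >> x) & 1)

def h(T):
    """The hypothesis h_T: predicts n(T) on T and 0 elsewhere."""
    code = encode(T)
    return lambda x: code if x in T else 0

def learner(sample):
    """A(S): a single nonzero label reveals the whole target set T."""
    for _, y in sample:
        if y != 0:
            return decode(y)
    return frozenset()

def bad_erm(domain, sample):
    """T = unseen points; consistent with S whenever all labels are 0."""
    seen = {x for x, _ in sample}
    return frozenset(domain) - seen

def training_loss(T_hyp, sample):
    hp = h(T_hyp)
    return sum(hp(x) != y for x, y in sample) / len(sample)

def true_loss(T_hyp, T_target, domain):
    """Zero-one loss under the (assumed) uniform distribution on domain."""
    hp, ht = h(T_hyp), h(T_target)
    return sum(hp(x) != ht(x) for x in domain) / len(domain)

domain = range(8)

# Target h_T with T = {1, 3}: one labeled point from T identifies it.
target = frozenset({1, 3})
sample_a = [(0, 0), (3, encode(target))]
print(learner(sample_a) == target)

# Target h_emptyset: the bad ERM fits the sample but errs on unseen points.
sample_b = [(0, 0), (1, 0)]
T_bad = bad_erm(domain, sample_b)
print(training_loss(T_bad, sample_b), true_loss(T_bad, frozenset(), domain))
```

The contrast is that the label itself carries the identity of the target, so a learner that reads it needs very few examples, while an ERM that ignores this structure can keep choosing a consistent hypothesis that is wrong on every unseen point.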
