Learnability Beyond Uniform Convergence
Shai Shalev-Shwartz, School of CS and Engineering, The Hebrew University of Jerusalem
Algorithmic Learning Theory, Lyon 2012
Joint work with: N. Srebro, O. Shamir, K. Sridharan


SLIDE 1

Learnability Beyond Uniform Convergence

Shai Shalev-Shwartz

School of CS and Engineering, The Hebrew University of Jerusalem

“Algorithmic Learning Theory”, Lyon 2012

Joint work with:

  • N. Srebro, O. Shamir, K. Sridharan (COLT’09,JMLR’11)
  • A. Daniely, S. Sabato, S. Ben-David (COLT’11)
  • A. Daniely, S. Sabato (NIPS’12)

Shai Shalev-Shwartz (Hebrew U) Learnability Beyond Uniform Convergence Oct’12 1 / 34

SLIDE 2

The Fundamental Theorem of Learning Theory

For Binary Classification

Uniform Convergence ⇒ Learnable with ERM ⇒ Learnable ⇒ Finite VC ⇒ Uniform Convergence
(implication labels, in order: trivial, trivial, NFL (W’96), VC’71)

VC = Vapnik and Chervonenkis, W = Wolpert

SLIDE 3

The Fundamental Theorem of Learning Theory

For Regression

Uniform Convergence ⇒ Learnable with ERM ⇒ Learnable ⇒ Finite fat-shattering ⇒ Uniform Convergence
(implication labels, in order: trivial, trivial, KS’94/BLW’96/ABCH’97, BLW’96/ABCH’97)

BLW = Bartlett, Long, Williamson; ABCH = Alon, Ben-David, Cesa-Bianchi, Haussler; KS = Kearns and Schapire

SLIDE 4

For general learning problems?

Uniform Convergence ⇒ Learnable with ERM ⇒ Learnable
(both implications trivial)

Does the reverse implication hold?

SLIDE 5

For general learning problems?

Uniform Convergence ⇒ Learnable with ERM ⇒ Learnable
(both implications trivial)

The reverse implication is NOT true.

SLIDE 6

For general learning problems?

Uniform Convergence ⇒ Learnable with ERM ⇒ Learnable
(both implications trivial)

The reverse implication is NOT true:

  • Not true in “convex learning problems”!
  • Not true even in “multiclass categorization”!

What is learnable? How to learn?

SLIDE 7

Outline

1. Definitions
2. Learnability without uniform convergence
3. Characterizing Learnability using Stability
4. Characterizing Multiclass Learnability
5. Analyzing specific, practically relevant, classes
6. Open Questions

SLIDE 8

The General Learning Setting (Vapnik)

  • Hypothesis class H
  • Examples domain Z with unknown distribution D
  • Loss function ℓ : H × Z → R
  • Given: a training set S ∼ D^m
  • Goal: solve min_{h∈H} L(h), where L(h) = E_{z∼D}[ℓ(h, z)], in the Probably (w.p. ≥ 1 − δ) Approximately Correct (up to ε) sense

SLIDE 9

The General Learning Setting (Vapnik)

  • Hypothesis class H
  • Examples domain Z with unknown distribution D
  • Loss function ℓ : H × Z → R
  • Given: a training set S ∼ D^m
  • Goal: solve min_{h∈H} L(h), where L(h) = E_{z∼D}[ℓ(h, z)], in the Probably (w.p. ≥ 1 − δ) Approximately Correct (up to ε) sense
  • Training loss: L_S(h) = (1/m) Σ_{i=1}^m ℓ(h, z_i)
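The training loss above is just an average over the sample. As a minimal sketch (the scalar hypothesis, scalar examples, and squared loss are illustrative choices, not fixed by the slides):

```python
# Empirical risk L_S(h) = (1/m) * sum_{i=1}^m loss(h, z_i).
# Illustrative choices (not from the slides): scalar h, scalar examples,
# squared loss.
def empirical_risk(loss, h, S):
    return sum(loss(h, z) for z in S) / len(S)

squared = lambda h, z: (h - z) ** 2
S = [0.0, 1.0, 2.0]
risk_at_mean = empirical_risk(squared, 1.0, S)  # the sample mean minimizes it
```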

SLIDE 10

Examples

Binary classification:
  • Z = X × {0, 1}; h ∈ H is a predictor h : X → {0, 1}; ℓ(h, (x, y)) = 1[h(x) ≠ y]

Multiclass categorization:
  • Z = X × Y; h ∈ H is a predictor h : X → Y; ℓ(h, (x, y)) = 1[h(x) ≠ y]

k-means clustering:
  • Z = R^d; H ⊂ (R^d)^k specifies k cluster centers; ℓ((µ1, . . . , µk), z) = min_j ‖µj − z‖

Density estimation:
  • h is a parameter of a density p_h(z); ℓ(h, z) = − log p_h(z)

SLIDE 11

Learnability, ERM, Uniform convergence

Uniform Convergence: for m ≥ m^UC(ε, δ),
P_{S∼D^m}[∀h ∈ H, |L_S(h) − L(h)| ≤ ε] ≥ 1 − δ

SLIDE 12

Learnability, ERM, Uniform convergence

Uniform Convergence: for m ≥ m^UC(ε, δ),
P_{S∼D^m}[∀h ∈ H, |L_S(h) − L(h)| ≤ ε] ≥ 1 − δ

Learnable: ∃A s.t. for m ≥ m^PAC(ε, δ),
P_{S∼D^m}[L(A(S)) ≤ min_{h∈H} L(h) + ε] ≥ 1 − δ

SLIDE 13

Learnability, ERM, Uniform convergence

Uniform Convergence: for m ≥ m^UC(ε, δ),
P_{S∼D^m}[∀h ∈ H, |L_S(h) − L(h)| ≤ ε] ≥ 1 − δ

Learnable: ∃A s.t. for m ≥ m^PAC(ε, δ),
P_{S∼D^m}[L(A(S)) ≤ min_{h∈H} L(h) + ε] ≥ 1 − δ

ERM: an algorithm that returns A(S) ∈ argmin_{h∈H} L_S(h)
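For a finite class, the ERM rule can be sketched as an exhaustive search; the threshold hypotheses and data below are illustrative assumptions, not from the slides:

```python
# ERM: return A(S) in argmin_{h in H} L_S(h), by exhaustive search over H.
def erm(H, loss, S):
    return min(H, key=lambda h: sum(loss(h, z) for z in S) / len(S))

# toy binary classification: threshold hypotheses h_t(x) = 1[x >= t]
zero_one = lambda t, z: int((z[0] >= t) != z[1])  # z = (x, y), 0-1 loss
H = [0.0, 0.5, 1.0, 1.5, 2.0]
S = [(0.2, 0), (0.4, 0), (1.2, 1), (1.8, 1)]
h_hat = erm(H, zero_one, S)   # a threshold with zero training error
```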

SLIDE 14

Learnability, ERM, Uniform convergence

Uniform Convergence: for m ≥ m^UC(ε, δ),
P_{S∼D^m}[∀h ∈ H, |L_S(h) − L(h)| ≤ ε] ≥ 1 − δ

Learnable: ∃A s.t. for m ≥ m^PAC(ε, δ),
P_{S∼D^m}[L(A(S)) ≤ min_{h∈H} L(h) + ε] ≥ 1 − δ

ERM: an algorithm that returns A(S) ∈ argmin_{h∈H} L_S(h)

Learnable by arbitrary ERM (with rate m^ERM(ε, δ)): like “Learnable”, but A must be an ERM.

SLIDE 15

For Binary Classification

Uniform Convergence ⇒ Learnable with ERM ⇒ Learnable ⇒ Finite VC ⇒ Uniform Convergence
(implication labels, in order: trivial, trivial, NFL (W’96), VC’71)

m^UC(ε, δ) ≈ m^ERM(ε, δ) ≈ m^PAC(ε, δ) ≈ (VC(H) + log(1/δ)) / ε²
SLIDE 16

Outline

1. Definitions
2. Learnability without uniform convergence
3. Characterizing Learnability using Stability
4. Characterizing Multiclass Learnability
5. Analyzing specific, practically relevant, classes
6. Open Questions

SLIDE 17

Counter Example — Stochastic Convex Optimization

Consider the family of problems:
  • H is a convex set with max_{h∈H} ‖h‖ ≤ 1
  • For all z, ℓ(h, z) is convex and Lipschitz w.r.t. h

SLIDE 18

Counter Example — Stochastic Convex Optimization

Consider the family of problems:
  • H is a convex set with max_{h∈H} ‖h‖ ≤ 1
  • For all z, ℓ(h, z) is convex and Lipschitz w.r.t. h

Claim: the problem is learnable by the rule
argmin_{h∈H} (λ_m/2)‖h‖² + (1/m) Σ_{i=1}^m ℓ(h, z_i)
  • No uniform convergence
  • Not learnable by ERM
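A numerical sketch of this regularized rule (the particular loss, data, and optimizer are assumptions for illustration; the slides only assert the rule's form): projected subgradient descent on (λ/2)‖h‖² + L_S(h) over the unit ball, with the convex 1-Lipschitz loss ℓ(h, z) = ‖h − z‖.

```python
import numpy as np

# Projected subgradient descent on  (lam/2)*||h||^2 + (1/m)*sum_i ||h - z_i||
# over the unit ball {||h|| <= 1}.  Loss choice and step sizes are illustrative.
def objective(h, Z, lam=0.1):
    return lam / 2 * h @ h + np.mean(np.linalg.norm(h - Z, axis=1))

def regularized_erm(Z, lam=0.1, steps=2000, lr=0.01):
    m, d = Z.shape
    h = np.zeros(d)
    for _ in range(steps):
        diffs = h - Z
        norms = np.maximum(np.linalg.norm(diffs, axis=1, keepdims=True), 1e-12)
        g = lam * h + np.mean(diffs / norms, axis=0)  # subgradient of objective
        h = h - lr * g
        n = np.linalg.norm(h)
        if n > 1.0:                                   # project onto the ball
            h = h / n
    return h

rng = np.random.default_rng(0)
Z = 0.3 * rng.normal(size=(50, 5))
h_reg = regularized_erm(Z)
```

Because λ > 0 makes the objective strictly convex, its minimizer is unique and depends smoothly on the sample, which is what the stability argument later in the talk exploits.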

SLIDE 19

Counter Example — Stochastic Convex Optimization

Proof (of “not learnable by arbitrary ERM”) 1-Mean + missing features

SLIDE 20

Counter Example — Stochastic Convex Optimization

Proof (of “not learnable by arbitrary ERM”): 1-Mean + missing features
  • z = (α, x), α ∈ {0, 1}^d, x ∈ R^d, ‖x‖ ≤ 1
  • ℓ(h, (α, x)) = Σ_i α_i (h_i − x_i)²
  • Take P[α_i = 1] = 1/2, P[x = µ] = 1
  • Let h^(i) be s.t. h^(i)_j = 1 − µ_j if j = i, and µ_j otherwise
  • If d is large enough, there exists i such that h^(i) is an ERM
  • But L(h^(i)) ≥ 1/√2
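A quick simulation (hypothetical parameter choices; µ = 0) makes the failure concrete: with d ≫ 2^m some coordinate is hidden by every α_j in the sample, so the corresponding h^(i) is an ERM with zero training loss while its true risk under the squared per-coordinate loss above is the constant E[α_i] = 1/2.

```python
import numpy as np

# With m samples and d >> 2^m coordinates, some coordinate i satisfies
# alpha_ji = 0 for every sample j, so h = e_i fits the sample perfectly.
rng = np.random.default_rng(0)
m, d = 10, 100_000                        # d * 2^-m ~ 98 expected hidden coords
alpha = rng.integers(0, 2, size=(m, d))   # which features are observed
x = np.zeros((m, d))                      # P[x = mu] = 1, with mu = 0 here

def emp_loss(h):
    return np.mean(np.sum(alpha * (h - x) ** 2, axis=1))

hidden = np.flatnonzero(alpha.sum(axis=0) == 0)  # never-observed coordinates
i = int(hidden[0])
h_bad = np.zeros(d)
h_bad[i] = 1.0                            # h^(i): wrong on the hidden coordinate
true_risk = 0.5                           # E[alpha_i * (1 - 0)^2] = P[alpha_i = 1]
```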

SLIDE 21

Counter Example — Stochastic Convex Optimization

Proof (of “not even learnable by a unique ERM”): perturb the loss a little bit:
ℓ(h, (α, x)) = Σ_i α_i (h_i − x_i)² + ε Σ_i 2^{−i} (h_i − 1)²
  • Now the loss is strictly convex, so the ERM is unique
  • But the unique ERM does not generalize (as before)

SLIDE 22

For general learning problems?

Uniform Convergence ⇒ Learnable with ERM ⇒ Learnable
(both implications trivial)

The reverse implication is NOT true:

  • Not true in “convex learning problems”! ✓
  • Not true even in “multiclass categorization”!

SLIDE 23

Counter Example — Multiclass

X – a set, Y = {0, 1, 2, . . . , 2^|X| − 1}
Let n : 2^X → Y be defined by binary encoding
H = {h_T : T ⊆ X}, where h_T(x) = n(T) if x ∈ T, and 0 if x ∉ T

SLIDE 24

Counter Example — Multiclass

X – a set, Y = {0, 1, 2, . . . , 2^|X| − 1}
Let n : 2^X → Y be defined by binary encoding
H = {h_T : T ⊆ X}, where h_T(x) = n(T) if x ∈ T, and 0 if x ∉ T

Claim: no uniform convergence: m^UC ≥ |X|/ε
  • Target function is h_∅
  • For any training set S, take T = X \ S
  • L_S(h_T) = 0 but L(h_T) = P[T]

SLIDE 25

Counter Example — Multiclass

X – a set, Y = {0, 1, 2, . . . , 2^|X| − 1}
Let n : 2^X → Y be defined by binary encoding
H = {h_T : T ⊆ X}, where h_T(x) = n(T) if x ∈ T, and 0 if x ∉ T

Claim: H is learnable: m^PAC ≤ 1/ε
  • Let T be the target
  • A(S) = h_T if some (x, n(T)) ∈ S
  • A(S) = h_∅ if S = {(x_1, 0), . . . , (x_m, 0)}
  • In the 1st case, L(A(S)) = 0; in the 2nd case, L(A(S)) = P[T]
  • With high probability, if P[T] > ε then we’ll be in the 1st case
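A small instantiation of this learner in code (the size |X| = 5 is a hypothetical choice): n(T) encodes T as a bitmask, and A decodes T from any nonzero label in the sample.

```python
# X = {0,...,4}, Y = {0,...,2^5 - 1}; n(T) is the binary encoding of T.
X = range(5)

def n(T):
    return sum(1 << x for x in T)

def h(T, x):                      # the hypothesis h_T
    return n(T) if x in T else 0

def A(S):
    """The learner from the slide: decode T from any nonzero label,
    and fall back to the empty set when the whole sample is labeled 0."""
    for x, y in S:
        if y != 0:
            return frozenset(i for i in X if (y >> i) & 1)
    return frozenset()

T = frozenset({1, 3})
S = [(0, h(T, 0)), (3, h(T, 3))]  # the sample hits T at x = 3, revealing n(T)
```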

SLIDE 26

Counter Example — Multiclass

Corollary

m^UC / m^PAC ≈ |X|.

If |X| → ∞ then the problem is learnable but there is no uniform convergence!

SLIDE 27

Outline

1. Definitions
2. Learnability without uniform convergence
3. Characterizing Learnability using Stability
4. Characterizing Multiclass Learnability
5. Analyzing specific, practically relevant, classes
6. Open Questions

SLIDE 28

Characterizing Learnability using Stability

Theorem

A necessary and sufficient condition for learnability is the existence of an Asymptotic ERM (AERM) which is stable.

Uniform Convergence ⇒ ERM is stable ⇒ ∃ stable AERM ⇔ Learnable
(implication labels: RMP’05, MNPR’06; trivial)

SLIDE 29

More formally

Definition (Stability)

We say that A is ε_stable(m)-replace-one stable if for all D,
E_{S,z′,i} |ℓ(A(S^(i)); z′) − ℓ(A(S); z′)| ≤ ε_stable(m),
where S^(i) denotes S with its i-th example replaced by z′.

SLIDE 30

More formally

Definition (Stability)

We say that A is ε_stable(m)-replace-one stable if for all D,
E_{S,z′,i} |ℓ(A(S^(i)); z′) − ℓ(A(S); z′)| ≤ ε_stable(m),
where S^(i) denotes S with its i-th example replaced by z′.

Definition (AERM)

We say that A is an AERM (Asymptotic Empirical Risk Minimizer) with rate ε_erm(m) if for all D:
E_{S∼D^m}[L_S(A(S)) − min_{h∈H} L_S(h)] ≤ ε_erm(m)

SLIDE 31

Proof sketch: (a stable AERM is sufficient and necessary for learnability)

Sufficient:
  • For an AERM: stability ⇒ generalization
  • AERM + generalization ⇒ consistency

Necessary:
  • ∃ consistent A ⇒ ∃ consistent and generalizing A′ (using subsampling)
  • Consistent + generalizing ⇒ AERM
  • AERM + generalizing ⇒ stable

SLIDE 32

Intermediate Summary

Learnability ⇔ ∃ stable AERM
But how do we find one? And is there a combinatorial notion of learnability (like the VC dimension)?

SLIDE 33

Outline

1. Definitions
2. Learnability without uniform convergence
3. Characterizing Learnability using Stability
4. Characterizing Multiclass Learnability
5. Analyzing specific, practically relevant, classes
6. Open Questions

SLIDE 34

Why multiclass learning

  • Practical relevance
  • A simple twist on binary classification

SLIDE 35

The Natarajan Dimension

Natarajan dimension: maximal size of an N-shattered set, where C is N-shattered by H if ∃f1, f2 ∈ H s.t. ∀x ∈ C, f1(x) ≠ f2(x), and for every T ⊆ C there exists h ∈ H with
h(x) = f1(x) if x ∈ T, and f2(x) if x ∈ C \ T

SLIDE 36

The Natarajan Dimension

Natarajan dimension: maximal size of an N-shattered set, where C is N-shattered by H if ∃f1, f2 ∈ H s.t. ∀x ∈ C, f1(x) ≠ f2(x), and for every T ⊆ C there exists h ∈ H with
h(x) = f1(x) if x ∈ T, and f2(x) if x ∈ C \ T

When |Y| = 2, the Natarajan dimension equals the VC dimension.
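The definition can be checked by brute force on tiny classes. A toy setup (not from the slides): hypotheses are tuples of labels over a small domain of indices.

```python
from itertools import product

def subsets(C):
    C = list(C)
    for bits in product([0, 1], repeat=len(C)):
        yield {x for x, b in zip(C, bits) if b}

def n_shatters(H, C):
    """Brute-force check of N-shattering: find f1, f2 disagreeing on all of C
    such that every mixture of f1 and f2 along a subset T is realized in H."""
    for f1 in H:
        for f2 in H:
            if any(f1[x] == f2[x] for x in C):
                continue                      # need f1(x) != f2(x) on all of C
            if all(any(all(g[x] == (f1[x] if x in T else f2[x]) for x in C)
                       for g in H)
                   for T in subsets(C)):
                return True
    return False

# all functions from a 2-point domain into 3 labels N-shatter the whole domain
H_all = list(product([0, 1, 2], repeat=2))
```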

SLIDE 37

Does Natarajan dimension characterize multiclass learnability ?

Theorem (Natarajan’89, Ben-David et al. ’95)

If H is a class of functions with Natarajan dimension d then
(d + ln(1/δ)) / ε ≤ m^PAC(ε, δ) ≤ (d ln(|Y|) ln(1/ε) + ln(1/δ)) / ε.

SLIDE 38

Does Natarajan dimension characterize multiclass learnability ?

Theorem (Natarajan’89, Ben-David et al. ’95)

If H is a class of functions with Natarajan dimension d then
(d + ln(1/δ)) / ε ≤ m^PAC(ε, δ) ≤ (d ln(|Y|) ln(1/ε) + ln(1/δ)) / ε.

Remark:
  • A large gap when Y is large
  • The uniform convergence rate does depend on Y

SLIDE 39

How to design a good ERM algorithm?

Consider again our counterexample: Y = {0, . . . , 2^|X| − 1} and H = {h_T : T ⊆ X} with h_T(x) = n(T) if x ∈ T, and 0 if x ∉ T

SLIDE 40

How to design a good ERM algorithm?

Consider again our counterexample: Y = {0, . . . , 2^|X| − 1} and H = {h_T : T ⊆ X} with h_T(x) = n(T) if x ∈ T, and 0 if x ∉ T

Bad ERM:
If S = (x_1, 0), . . . , (x_m, 0), return h_T with T = X \ {x_1, . . . , x_m}

Good ERM:
If S = (x_1, 0), . . . , (x_m, 0), return h_∅
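Both rules are ERMs on an all-zero sample, but they behave very differently off the sample; a toy run (hypothetical sizes, |X| = 20) makes the gap concrete:

```python
import random

# |X| = 20, target h_empty (all labels 0).  The bad ERM memorizes the unseen
# part of X into T; the good ERM returns h_empty.
random.seed(0)
X = list(range(20))

def n(T):
    return sum(1 << x for x in T)

def h(T, x):
    return n(T) if x in T else 0

sample_xs = random.sample(X, 5)
S = [(x, 0) for x in sample_xs]   # labeled by the target h_empty

T_bad = set(X) - set(sample_xs)   # bad ERM: T = X \ {x_1,...,x_m}
bad = lambda x: h(T_bad, x)
good = lambda x: 0                # good ERM: h_empty

train_err_bad = sum(bad(x) != y for x, y in S)
fresh = [x for x in X if x not in sample_xs]
test_err_bad = sum(bad(x) != 0 for x in fresh)
test_err_good = sum(good(x) != 0 for x in fresh)
```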

SLIDE 41

How to design a good ERM algorithm?

Definition

A has an essential range r if ∀h ∈ H, ∃Y′(h) with |Y′(h)| ≤ r s.t. for all S labeled by h we have A(S) ∈ Y′(h).
A good ERM is an ERM that has a small essential range: a principle for designing good ERMs.

Theorem

If a learner has an essential range r then m^A(ε, δ) ≤ (d ln(r/ε) + ln(1/δ)) / ε

SLIDE 42

Characterizing Multiclass Learnability

Conjecture

For any H of Natarajan dimension d,
(d + ln(1/δ)) / ε ≤ m^PAC(ε, δ) ≤ (d ln(d/ε) + ln(1/δ)) / ε.

SLIDE 43

Characterizing Multiclass Learnability

Conjecture

For any H of Natarajan dimension d,
(d + ln(1/δ)) / ε ≤ m^PAC(ε, δ) ≤ (d ln(d/ε) + ln(1/δ)) / ε.

  • Cannot rely on uniform convergence / arbitrary ERM
  • Maybe there is always an ERM with a small essential range?
  • Holds for symmetric classes

SLIDE 44

Outline

1. Definitions
2. Learnability without uniform convergence
3. Characterizing Learnability using Stability
4. Characterizing Multiclass Learnability
5. Analyzing specific, practically relevant, classes
6. Open Questions

SLIDE 45

Sample Complexity of Specific classes

Enables a rigorous comparison of known multiclass algorithms

Previous analyses (e.g. ASS’01,BL’07): how the binary error translates to multiclass error

Multiclass predictors (all using linear predictors in R^d as the binary classifiers):
  • One-vs-All (OvA)
  • Multiclass SVM (MSVM): argmax_i (Wx)_i
  • Tree Classifiers (TC), with Õ(|Y|) nodes
  • Error Correcting Output Codes (ECOC), with code length Õ(|Y|)
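The MSVM prediction rule above is one line of code; a minimal sketch (the shapes and the identity weight matrix are illustrative assumptions):

```python
import numpy as np

# Multiclass SVM prediction: argmax_i (W x)_i, one linear predictor per class.
def msvm_predict(W, x):
    """W: (k, d) matrix with one row per class; returns the top-scoring class."""
    return int(np.argmax(W @ x))

W = np.eye(3)                 # k = d = 3: class i scores coordinate x_i
pred = msvm_predict(W, np.array([0.2, 0.9, 0.1]))
```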

SLIDE 46

Sample Complexity of Specific classes

Enables a rigorous comparison of known multiclass algorithms

Previous analyses (e.g. ASS’01,BL’07): how the binary error translates to multiclass error

Multiclass predictors (all using linear predictors in R^d as the binary classifiers):
  • One-vs-All (OvA)
  • Multiclass SVM (MSVM): argmax_i (Wx)_i
  • Tree Classifiers (TC), with Õ(|Y|) nodes
  • Error Correcting Output Codes (ECOC), with code length Õ(|Y|)

Theorem

The sample complexity of all the above classes is Θ̃(d |Y|).

SLIDE 47

Comparing Approximation Error

Definition

We say that H essentially contains H′ if for any distribution, the approximation error of H is at most the approximation error of H′. H strictly contains H′ if, in addition, there is a distribution for which the approximation error of H is strictly smaller than that of H′.

SLIDE 48

Comparing Approximation Error

MSVM:     ✓ ✓ ✓
OvA:      ✓ ✓ ✗
TC/ECOC*: ✓ ✗ ✗
(* assuming the tree structure and ECOC code are chosen randomly)

SLIDE 49

Comparing Approximation Error

              TC                       OvA       MSVM    random ECOC
Est. error    d|Y|                     d|Y|      d|Y|    d|Y|
Approx. error ≥ MSVM                   ≥ MSVM    best    incomparable
              (≈ 1/2 if d ≪ |Y|)                         (≈ 1/2 if d ≪ |Y|)

SLIDE 50

Open Questions

  • The equivalence between uniform convergence and learnability breaks even in multiclass problems
  • What characterizes multiclass learnability? What is the corresponding learning rule?
  • What characterizes learnability in the general learning setting? What is the corresponding learning rule?

SLIDE 51

Open Questions

  • The equivalence between uniform convergence and learnability breaks even in multiclass problems
  • What characterizes multiclass learnability? What is the corresponding learning rule?
  • What characterizes learnability in the general learning setting? What is the corresponding learning rule?

THANKS
