SLIDE 1

Learnability Beyond Uniform Convergence

Shai Shalev-Shwartz

School of CS and Engineering, The Hebrew University of Jerusalem

"Mathematical and Computational Foundations of Learning Theory", Dagstuhl 2011

Joint work with:

  • N. Srebro, O. Shamir, K. Sridharan (COLT’09,JMLR’11)
  • A. Daniely, S. Sabato, S. Ben-David (COLT’11)

Shai Shalev-Shwartz (Hebrew U) Learnability Beyond Uniform Convergence Jul’11 1 / 34

SLIDE 2

The Fundamental Theorem of Learning Theory

For Binary Classification

Uniform Convergence ⇒ Learnable with ERM ⇒ Learnable ⇒ Finite VC ⇒ Uniform Convergence
       (trivial)              (trivial)       (NFL, W'96)       (VC'71)

SLIDE 3

The Fundamental Theorem of Learning Theory

For Regression

Uniform Convergence ⇒ Learnable with ERM ⇒ Learnable ⇒ Finite fat-shattering ⇒ Uniform Convergence
       (trivial)              (trivial)    (KS'94, BLW'96, ABCH'97)    (BLW'96, ABCH'97)

SLIDE 4

For general learning problems?

Uniform Convergence ⇒ Learnable with ERM ⇒ Learnable ⇒ Uniform Convergence?
       (trivial)              (trivial)              (?)

SLIDE 5

For general learning problems?

Uniform Convergence ⇒ Learnable with ERM ⇒ Learnable ⇏ Uniform Convergence
       (trivial)              (trivial)              (✗)

Not true even in multiclass classification! What is learnable? How to learn?

SLIDE 6

Outline

1. Definitions
2. Learnability without uniform convergence
3. Characterizing Learnability using Stability
4. Characterizing Multiclass Learnability
5. Open Questions

SLIDE 7

The General Learning Setting

Vapnik’s General Learning Setting

Hypothesis class H
Instance space Z with unknown distribution D
Loss function ℓ : H × Z → R

Given: a training set S ∼ D^m
Goal: probably approximately solve

  min_{h∈H} L(h),  where L(h) = E_{z∼D}[ℓ(h, z)]

SLIDE 8

Examples

Binary classification:
  Z = X × {0, 1};  h ∈ H is a predictor h : X → {0, 1};  ℓ(h, (x, y)) = 1[h(x) ≠ y]

Multiclass categorization:
  Z = X × Y;  h ∈ H is a predictor h : X → Y;  ℓ(h, (x, y)) = 1[h(x) ≠ y]

k-means clustering:
  Z = R^d;  H ⊂ (R^d)^k specifies k cluster centers;  ℓ((µ1, . . . , µk), z) = min_j ‖µj − z‖

Density estimation:
  h is a parameter of a density p_h(z);  ℓ(h, z) = − log p_h(z)
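These losses are easy to state in code. A minimal sketch (the threshold predictor and the unit-variance Gaussian density are illustrative choices, not from the slides):

```python
import math

# The 0-1 loss, the k-means loss, and the density-estimation log-loss.
def zero_one_loss(h, x, y):
    # ℓ(h, (x, y)) = 1[h(x) ≠ y]
    return 0.0 if h(x) == y else 1.0

def kmeans_loss(centers, z):
    # distance from z to the nearest of the k centers
    return min(math.dist(mu, z) for mu in centers)

def gaussian_log_loss(h, z):
    # ℓ(h, z) = −log p_h(z), with p_h a unit-variance Gaussian with mean h
    log_p = -0.5 * (z - h) ** 2 - 0.5 * math.log(2 * math.pi)
    return -log_p

print(zero_one_loss(lambda x: int(x >= 0), 3, 1))   # 0.0
print(kmeans_loss([(0, 0), (4, 4)], (1, 0)))        # 1.0
```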

SLIDE 12

Learnability, ERM, Uniform convergence

Uniform Convergence: for m ≥ m_UC(ǫ, δ),
  P_{S∼D^m} [ ∀h ∈ H, |L_S(h) − L(h)| ≤ ǫ ] ≥ 1 − δ

Learnable: ∃ an algorithm A s.t. for m ≥ m_PAC(ǫ, δ),
  P_{S∼D^m} [ L(A(S)) ≤ min_{h∈H} L(h) + ǫ ] ≥ 1 − δ

ERM: an algorithm that returns A(S) ∈ argmin_{h∈H} L_S(h)

Learnable by arbitrary ERM: like "Learnable", but A must be an ERM. Denote the sample complexity by m_ERM(ǫ, δ)
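The three notions can be probed numerically on a toy problem. A sketch with a made-up finite threshold class and label-noise distribution (all constants illustrative): we estimate the uniform-convergence gap sup_h |L_S(h) − L(h)| and the excess risk of an ERM.

```python
import random

# Toy numerical check of the definitions. The class (ten thresholds on
# X = {0,...,9}) and the label-noise distribution are made up for illustration.
random.seed(0)
H = [lambda x, t=t: int(x >= t) for t in range(10)]

def sample(m):
    S = []
    for _ in range(m):
        x = random.randrange(10)
        y = int(x >= 3)
        if random.random() < 0.1:        # flip the label with probability 0.1
            y = 1 - y
        S.append((x, y))
    return S

def emp_risk(h, S):                      # L_S(h)
    return sum(h(x) != y for x, y in S) / len(S)

def true_risk(h):                        # Monte Carlo estimate of L(h)
    return emp_risk(h, sample(20000))

S = sample(1000)
uc_gap = max(abs(emp_risk(h, S) - true_risk(h)) for h in H)  # sup_h |L_S - L|
erm = min(H, key=lambda h: emp_risk(h, S))                   # A(S) in argmin L_S
excess = true_risk(erm) - min(true_risk(h) for h in H)
print(uc_gap, excess)   # both small: this finite class enjoys uniform convergence
```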

SLIDE 13

For Binary Classification

Uniform Convergence ⇒ Learnable with ERM ⇒ Learnable ⇒ Finite VC ⇒ Uniform Convergence
       (trivial)              (trivial)       (NFL, W'96)       (VC'71)

  m_UC(ǫ, δ) ≈ m_ERM(ǫ, δ) ≈ m_PAC(ǫ, δ) ≈ (VC(H) + log(1/δ)) / ǫ²

SLIDE 15

First (trivial) Counter Example

Minorizing function:
  Let H′ be a class of binary classifiers with infinite VC dimension
  Let H = H′ ∪ {h0}
  Let ℓ(h, (x, y)) = 1 if h ≠ h0 ∧ h(x) ≠ y;  1/2 if h ≠ h0 ∧ h(x) = y;  0 if h = h0

No uniform convergence (m_UC = ∞)
Learnable by ERM (m_ERM = 0)

SLIDE 16

From Vapnik’s book ...

SLIDE 18

Second Counter Example — Multiclass

X – a set, Y = 2^X ∪ {∗}. H = {hT : T ⊂ X} where
  hT(x) = ∗ if x ∉ T;  T if x ∈ T

Claim: no uniform convergence: m_UC ≥ |X|/ǫ
  The target function is h∅
  For any training set S, take T = X \ S
  L_S(hT) = 0 but L(hT) = P[T]

SLIDE 19

Second Counter Example — Multiclass

X – a set, Y = 2^X ∪ {∗}. H = {hT : T ⊂ X} where
  hT(x) = ∗ if x ∉ T;  T if x ∈ T

Claim: H is learnable: m_PAC ≤ 1/ǫ
  Let T be the target
  A(S) = hT if some (x, T) ∈ S
  A(S) = h∅ if S = {(x1, ∗), . . . , (xm, ∗)}
  In the 1st case, L(A(S)) = 0. In the 2nd case, L(A(S)) = P[T]
  With high probability, if P[T] > ǫ then we'll be in the 1st case
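Both claims are easy to see in simulation. A toy run with target h∅ (set size, sample size, and seed are arbitrary): the hypothesis h_T with T = X \ S has zero empirical risk but true risk close to 1, while the algorithm above outputs h∅ and suffers zero risk.

```python
import random

# Toy run of the counter example with target h_emptyset: every hypothesis h_T
# labels x with the set T if x in T, and with '*' otherwise.
random.seed(1)
n, m = 1000, 200
Sx = random.choices(range(n), k=m)
S = [(x, '*') for x in Sx]                 # target h_emptyset: all labels are '*'

def h(T, x):
    return T if x in T else '*'

# The algorithm from the slide sees no label other than '*', outputs h_emptyset,
# and its true risk is exactly 0. But the ERM h_T with T = X \ S also has zero
# empirical risk, while its true risk is P[T], close to 1:
T_bad = frozenset(set(range(n)) - set(Sx))
emp_risk = sum(h(T_bad, x) != y for x, y in S) / m
true_risk = sum(h(T_bad, x) != '*' for x in range(n)) / n
print(emp_risk, true_risk)   # 0.0 vs. roughly 0.8
```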

SLIDE 20

Second Counter Example — Multiclass

Corollary

m_UC / m_PAC ≳ |X|.

If |X| → ∞ then the problem is learnable but there is no uniform convergence!

SLIDE 22

Third Counter Example — Stochastic Convex Optimization

Consider the family of problems:
  H is a convex set with max_{h∈H} ‖h‖ ≤ 1
  For all z, ℓ(h, z) is convex and Lipschitz w.r.t. h

Claim:
  The problem is learnable by the rule  argmin_{h∈H} (λ_m/2)‖h‖² + (1/m) Σ_{i=1}^m ℓ(h, zi)
  No uniform convergence
  Not learnable by ERM

SLIDE 24

Third Counter Example — Stochastic Convex Optimization

Proof (of "not learnable by arbitrary ERM"): 1-Mean + missing features

  z = (α, x),  α ∈ {0, 1}^d,  x ∈ R^d,  ‖x‖ ≤ 1
  ℓ(h, (α, x)) = Σ_i αi (hi − xi)²
  Take P[αi = 1] = 1/2 and P[x = µ] = 1
  Let h^(i) be s.t. h^(i)_j = 1 − µj if j = i, and µj otherwise
  If d is large enough, there exists i such that h^(i) is an ERM
  But L(h^(i)) ≥ 1/√2
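A quick numerical sketch of this construction (d, m, seed, and µ = 0 are illustrative choices): when d is far larger than 2^m, with high probability some coordinate is masked in every example, so an ERM that misplaces that coordinate still has zero empirical risk.

```python
import random

# With d >> 2^m, w.h.p. some coordinate i has alpha_i = 0 in every sample,
# so an ERM may place h_i anywhere; doing so keeps L_S = 0 but hurts L.
random.seed(0)
d, m = 10000, 8
mu = [0.0] * d                                     # P[x = mu] = 1
alphas = [[random.randint(0, 1) for _ in range(d)] for _ in range(m)]

def emp_risk(h):
    return sum(sum(a[j] * (h[j] - mu[j]) ** 2 for j in range(d))
               for a in alphas) / m

i = next(j for j in range(d) if all(a[j] == 0 for a in alphas))  # unseen coord

h_good = mu[:]                        # the true mean: an ERM with L = 0
h_bad = mu[:]; h_bad[i] = 1.0         # also an ERM: coordinate i is never seen
print(emp_risk(h_good), emp_risk(h_bad))   # both 0.0
# yet E[loss(h_bad, z)] = P[alpha_i = 1] * 1 = 1/2 > 0
```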

SLIDE 25

Third Counter Example — Stochastic Convex Optimization

Proof (of "not even learnable by a unique ERM")

Perturb the loss a little bit:
  ℓ(h, (α, x)) = Σ_i αi(hi − xi)² + ǫ Σ_i 2^{−i}(hi − 1)²

Now the loss is strictly convex, so the ERM is unique
But the unique ERM does not generalize (as before)

SLIDE 27

Characterizing Learnability using Stability

Theorem

A sufficient and necessary condition for learnability is the existence of an Asymptotic ERM (AERM) which is stable.

Uniform Convergence ⇒ ERM is stable ⇒ ∃ stable AERM ⇔ Learnable
 (RMP'05, MNPR'06)       (trivial)

SLIDE 29

More formally

Definition (Stability)
We say that A is ǫ_stable(m)-uniform-replace-one stable if for all D,
  E_{S,z′,i} |ℓ(A(S^(i)); z′) − ℓ(A(S); z′)| ≤ ǫ_stable(m).

Definition (AERM)
We say that A is an AERM (Asymptotic Empirical Risk Minimizer) with rate ǫ_erm(m) if for all D:
  E_{S∼D^m} [ L_S(A(S)) − min_{h∈H} L_S(h) ] ≤ ǫ_erm(m)
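Replace-one stability can be estimated by Monte Carlo. A sketch comparing plain ERM with the regularized rule from the previous section on a 1-d mean problem (distribution, λ, and trial counts are arbitrary illustrative choices):

```python
import random

# Monte Carlo estimate of uniform-replace-one stability on a 1-d mean problem
# with squared loss, for plain ERM (the empirical mean) and the regularized rule.
random.seed(0)

def erm(S):
    return sum(S) / len(S)

def rerm(S, lam=1.0):
    # argmin_h (lam/2)*h^2 + (1/m) * sum (h - z)^2  =  2*mean(S)/(lam + 2)
    return 2 * sum(S) / (len(S) * (lam + 2))

def stability(alg, m, trials=2000):
    total = 0.0
    for _ in range(trials):
        S = [random.gauss(0, 1) for _ in range(m)]
        z_new, z_test = random.gauss(0, 1), random.gauss(0, 1)
        S2 = S[:]
        S2[random.randrange(m)] = z_new             # the perturbed set S^(i)
        total += abs((alg(S2) - z_test) ** 2 - (alg(S) - z_test) ** 2)
    return total / trials

print(stability(erm, 100), stability(rerm, 100))   # both shrink like O(1/m)
```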

SLIDE 30

Proof sketch: (Stable AERM is sufficient and necessary for Learnability)

Sufficient:
  For an AERM: stability ⇒ generalization
  AERM + generalization ⇒ consistency
Necessary:
  ∃ consistent A ⇒ ∃ consistent and generalizing A′ (using subsampling)
  Consistent + generalizing ⇒ AERM
  AERM + generalizing ⇒ stable

SLIDE 31

Intermediate Summary

Learnability ⟺ ∃ stable AERM

But how do we find one? And is there a combinatorial notion of learnability (like the VC dimension)?

SLIDE 33

Why multiclass learning

Practical relevance
A simple twist on binary classification
In a sense, captures the essence of the difficulty of the General Learning Setting

SLIDE 36

The Graph Dimension

S is G-shattered by H if ∃ f ∈ H s.t. for every T ⊆ S there exists h ∈ H with
  h(x) = f(x) if x ∈ T
  h(x) ≠ f(x) if x ∈ S \ T

Graph dimension: the maximal size of a G-shattered set

Remark: when |Y| = 2, the Graph dimension equals the VC dimension
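For a small finite class the definition can be checked exhaustively. A brute-force sketch (exponential time, toy classes only); the threshold class is an illustrative sanity check using the remark that for binary classes the Graph dimension equals the VC dimension:

```python
from itertools import combinations

# Brute-force Graph dimension: S is G-shattered if some f in H has, for every
# T subset of S, an h in H agreeing with f exactly on T.
def g_shattered(S, H):
    subsets = [set(T) for r in range(len(S) + 1) for T in combinations(S, r)]
    return any(
        all(any(all((h[x] == f[x]) == (x in T) for x in S) for h in H)
            for T in subsets)
        for f in H)

def graph_dim(X, H):
    return max((len(S) for r in range(len(X) + 1)
                for S in combinations(X, r) if g_shattered(S, H)), default=0)

# sanity check on a binary class: thresholds on three points have VC dim 1
X = [0, 1, 2]
H = [{x: int(x >= t) for x in X} for t in range(4)]
print(graph_dim(X, H))   # 1
```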

SLIDE 39

Example

Consider again our counter example: Y = 2^X ∪ {∗} and H = {hT : T ⊂ X} with
  hT(x) = ∗ if x ∉ T;  T if x ∈ T

Claim: the Graph dimension of H is |X|
Proof: take f = h∅ and S = X. For each T ⊂ S take h_{T^c}. Then for x ∈ T, h_{T^c}(x) = ∗ = f(x), and for x ∉ T, h_{T^c}(x) = T^c ≠ ∗.

Conclusion: the Graph dimension does not characterize multiclass learnability (in fact, the Graph dimension characterizes uniform convergence)

SLIDE 42

The Natarajan Dimension

S is N-shattered by H if ∃ f1, f2 ∈ H s.t. ∀x ∈ S, f1(x) ≠ f2(x), and for every T ⊆ S there exists h ∈ H with
  h(x) = f1(x) if x ∈ T
  h(x) = f2(x) if x ∈ S \ T

Natarajan dimension: the maximal size of an N-shattered set

Remarks:
  When |Y| = 2, the Natarajan dimension also equals the VC dimension
  Natarajan ≤ Graph
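The same brute-force approach works for the Natarajan dimension. A sketch, checked on a two-point instance of the running counter example (where the claimed value is 1):

```python
from itertools import combinations

# Brute-force Natarajan dimension: S is N-shattered if some f1, f2 in H
# disagree on all of S and every mixture of f1 (on T) and f2 (on S \ T)
# is realized by some h in H.
def n_shattered(S, H):
    subsets = [set(T) for r in range(len(S) + 1) for T in combinations(S, r)]
    for f1 in H:
        for f2 in H:
            if any(f1[x] == f2[x] for x in S):
                continue
            if all(any(all(h[x] == (f1[x] if x in T else f2[x]) for x in S)
                       for h in H)
                   for T in subsets):
                return True
    return False

def natarajan_dim(X, H):
    return max((len(S) for r in range(len(X) + 1)
                for S in combinations(X, r) if n_shattered(S, H)), default=0)

# tiny instance of the counter example: X = {0, 1}, Y = 2^X + {'*'}
X = [0, 1]
subsets = [frozenset(), frozenset({0}), frozenset({1}), frozenset({0, 1})]
H = [{x: (T if x in T else '*') for x in X} for T in subsets]
print(natarajan_dim(X, H))   # 1
```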

SLIDE 45

Example

Consider again our counter example: Y = 2^X ∪ {∗} and H = {hT : T ⊂ X} with
  hT(x) = ∗ if x ∉ T;  T if x ∈ T

Claim: the Natarajan dimension of H is 1
Proof: take S = {x1, x2}. The only possible labelings of S by H are

        h1        h2      h3      h4
  x1   {x1,x2}   {x1}     ∗       ∗
  x2   {x1,x2}    ∗      {x2}     ∗

The constraints on f1, f2 are that f1(x) ≠ f2(x) for all x ∈ S, and that there exists h with h(x1) = f1(x1) and h(x2) = f2(x2). No pair (f1, f2) satisfies both constraints.

Does the Natarajan dimension characterize multiclass learnability?

SLIDE 47

Multiclass Learnability of Symmetric Classes

Theorem

If H is a class of symmetric functions with Natarajan dimension d, then
  (d + ln(1/δ)) / ǫ  ≤  m_PAC(ǫ, δ)  ≤  (d ln(d/ǫ) + ln(1/δ)) / ǫ.

Open question: is the above also true for non-symmetric hypothesis classes?

SLIDE 50

Proof: The Learning Algorithm

A good ERM is an ERM that, for every target hypothesis, considers only a small number of hypotheses.

A Principle for Designing Good ERMs
  Given a target hypothesis h⋆, let S(h⋆) = {S : err_S(h⋆) = 0}
  Let A(S(h⋆)) = {A(S) : S ∈ S(h⋆)}
  Claim: if |A(S(h⋆))| is small, then A is consistent.
  Obviously |A(S(h⋆))| ≤ |H|, but it can be much smaller
  Example: in our counter example, |A_bad(S(h∅))| = 2^|X| while for every h⋆, |A_good(S(h⋆))| ≤ 2

SLIDE 51

Proof: Natarajan+Symmetric ⇒ small |A(S(h⋆))|

Lemma: |A(S(h⋆))| ≤ m^d · (MaxRange)^{2d}
Lemma: if H is symmetric and has Natarajan dimension d, then the Max Range of each h ∈ H is at most 2d + 1.

SLIDE 52

Sample Complexity of Specific classes

We show how to calculate the sample complexity of popular hypothesis classes, in particular multiclass-to-binary reductions. This enables a rigorous comparison of known multiclass algorithms.

Previous analyses (e.g. SS'01, BL'07): how the binary error translates to multiclass error
Our analysis: a direct calculation of the sample complexity of the multiclass classifier

SLIDE 54

Specific classes

Multiclass-to-binary reductions:
  1-vs-rest
  Linear multiclass construction: arg max_i (Wx)_i
  Filter trees
Use linear predictors in R^d as the binary classifiers

Theorem
The Natarajan dimension of all the above classes is Θ̃(d |Y|). All these reductions have the same estimation error; to compare them, one should analyze the approximation error.
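The first two constructions can be sketched as predictors (the weight matrix and the binary scorers below are made up for illustration, not learned):

```python
# Sketches of two of the reductions as prediction rules.
def one_vs_rest_predict(scorers, x):
    # one real-valued binary scorer per class; predict the most confident class
    return max(range(len(scorers)), key=lambda i: scorers[i](x))

def linear_multiclass_predict(W, x):
    # the linear multiclass construction: arg max_i (W x)_i
    scores = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    return max(range(len(W)), key=lambda i: scores[i])

W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]          # 3 classes, d = 2
print(linear_multiclass_predict(W, [2.0, -0.5]))    # 0
print(one_vs_rest_predict([lambda x: x[0], lambda x: x[1],
                           lambda x: -x[0] - x[1]], [2.0, -0.5]))  # 0
```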

SLIDE 56

Summary and Open Questions

Equivalence between uniform convergence and learnability breaks even in multiclass problems
What characterizes multiclass learnability? What is the corresponding learning rule?
What characterizes learnability in the general learning setting? What is the corresponding learning rule?

THANKS
