Teaching Dimension: COMS 6998-4 Learning Theory, Benjamin Kuykendall

SLIDE 1

Teaching Dimension

COMS 6998-4 Learning Theory Benjamin Kuykendall

brk2117@columbia.edu

1 November 2017

Benjamin Kuykendall (brk2117) Teaching Dimension 1 November 2017 1 / 38

SLIDE 2

Outline

1 Introduction

Learning model Generic bounds

2 Examples

Least Teachable Class Axis Aligned Boxes

3 Teaching versus Learning

Disparities Bounds

4 Recursive Teaching

Almost maximal Classes Recursive Teaching Dimension

SLIDE 5

Consistent learners and Helpful Directors

[Goldman, Rivest, & Schapire 1993]

Definition (Consistent learner)

A learner is consistent when, for all $t$, there is some $f \in C$ such that $f(x_i) = f^*(x_i)$ for all $i < t$ and $f(x_t) = y_t$, where $y_t$ is the learner's prediction on $x_t$.

In the online model, after inputs $x_1, x_2, \dots, x_i$:

No consistent learner will make a mistake at any $t > i$ ⇔ exactly one hypothesis in $C$ is consistent with the labels of $x_1, \dots, x_i$.
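The equivalence can be checked on a small example by tracking the version space, the set of hypotheses that agree with every label seen so far. The sketch below is only illustrative: the threshold class and the names `version_space` and `consistent_predictions` are assumptions, not taken from the slides.

```python
def version_space(concepts, labeled):
    """Concepts that agree with every (x, label) pair seen so far."""
    return [c for c in concepts if all(c[x] == y for x, y in labeled)]

def consistent_predictions(concepts, labeled, x):
    """Labels a consistent learner may predict on x: those assigned by a surviving concept."""
    return {c[x] for c in version_space(concepts, labeled)}

# Hypothetical toy class: thresholds on X = {0, 1, 2, 3}; concept k labels x positive iff x >= k.
X = range(4)
concepts = [tuple(int(x >= k) for x in X) for k in range(5)]
target = concepts[2]  # (0, 0, 1, 1)

# After the labels of x = 1 and x = 2, only the target survives,
# so every consistent prediction is forced to be correct.
labeled = [(1, target[1]), (2, target[2])]
survivors = version_space(concepts, labeled)

# With only x = 1 revealed, several concepts survive and a mistake on x = 2 is possible.
ambiguous = consistent_predictions(concepts, [(1, target[1])], 2)
```

Here `survivors` contains only the target, while `ambiguous` contains both labels, matching the two directions of the equivalence.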

SLIDE 7

Teaching dimension

[Goldman & Kearns 1995]

Definition (Teaching Sequence)

Inputs $x_1, \dots, x_m$ are a teaching sequence for $f$ when there is no other function $g \in C$ such that $g(x_i) = f(x_i)$ for all $i \le m$.

Definition (Teaching Dimension)

The class $C$ has teaching dimension $t$ when $t$ is the smallest integer such that each $f \in C$ has a teaching sequence of length at most $t$.
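For a finite class over a finite domain, both definitions can be evaluated by exhaustive search. The following is a minimal sketch, assuming concepts are stored as 0/1 tuples indexed by input; the function names and the threshold class are illustrative choices, and the search is exponential in the number of inputs.

```python
from itertools import combinations

def min_teaching_set(f, concepts, n_points):
    """Smallest set of inputs on which no other concept in the class agrees with f."""
    others = [g for g in concepts if g != f]
    for size in range(n_points + 1):
        for xs in combinations(range(n_points), size):
            # xs teaches f when every competitor disagrees with f somewhere on xs
            if all(any(g[x] != f[x] for x in xs) for g in others):
                return xs
    raise ValueError("class contains duplicate concepts")

def teaching_dimension(concepts, n_points):
    """Largest minimum teaching-set size over the class."""
    return max(len(min_teaching_set(f, concepts, n_points)) for f in concepts)

# Hypothetical toy class: thresholds over 4 points; concept k labels x positive iff x >= k.
concepts = [tuple(int(x >= k) for x in range(4)) for k in range(5)]
td = teaching_dimension(concepts, 4)
middle = min_teaching_set(concepts[2], concepts, 4)
```

For this class an interior threshold needs the two inputs on either side of its boundary, so `middle` is `(1, 2)` and `td` is 2.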

SLIDE 9

Trivial Teaching Sequence

Theorem (Teaching Upper Bound)

Any finite class has teaching dimension $t \le |C| - 1$.

Proof: enumerate $C = \{f, f_1, \dots, f_{|C|-1}\}$. To teach $f$, choose for each $i$ an input $x_i$ such that $f(x_i) \ne f_i(x_i)$; such an $x_i$ exists because $f_i \ne f$.
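The enumeration argument translates directly into code. This is a sketch under the assumption that concepts are 0/1 tuples; `trivial_teaching_sequence` is a hypothetical helper name, and the resulting sequence is generally far from the shortest one.

```python
def trivial_teaching_sequence(f, concepts):
    """One distinguishing input per competitor, as in the enumeration argument."""
    seq = []
    for g in concepts:
        if g == f:
            continue
        # first input where g and f disagree; it exists because g != f
        x = next(i for i in range(len(f)) if f[i] != g[i])
        seq.append(x)
    return seq

# Hypothetical class of four concepts over three inputs.
concepts = [(1, 1, 1), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
f = concepts[0]
seq = trivial_teaching_sequence(f, concepts)
```

The sequence has exactly $|C| - 1$ entries and rules out every competitor of `f`.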

SLIDE 10

Counting Teaching Sequences

Theorem (Teaching Lower Bound)

Any finite class $C$ over $X$ has teaching dimension $t \ge \frac{\log|C| - 1}{\log|X|}$.

Proof: each $f$ is uniquely identified by some $x_1, \dots, x_t$ together with the labels $f(x_1), \dots, f(x_t)$, so $|C| \le 2^t \binom{|X|}{t} \le 2|X|^t$.
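The step from the counting inequality to the bound on $t$ can be spelled out as follows, assuming $t \ge 1$ and base-2 logarithms (the slide does not state the base):

```latex
\begin{align*}
|C| &\le 2^t \binom{|X|}{t} \le 2^t \cdot \frac{|X|^t}{t!} \le 2\,|X|^t
  && \text{since } 2^t / t! \le 2 \text{ for } t \ge 1, \\
\log_2 |C| &\le 1 + t \log_2 |X|, \\
t &\ge \frac{\log_2 |C| - 1}{\log_2 |X|}.
\end{align*}
```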

SLIDE 11

Summary of Generic Bounds

Theorem (Teaching Bounds)

Any finite class $C$ over $X$ has teaching dimension $t$ such that $|C| - 1 \ge t \ge \frac{\log|C| - 1}{\log|X|}$.

SLIDE 13

Least Teachable Class

Example (Least Teachable Class)

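Consider the following concept class over $X = \{1, 2, \dots, n\}$: $C = \{X \setminus \{1\}, X \setminus \{2\}, \dots, X \setminus \{n\}\} \cup \{X\}$.

To teach $X \setminus \{i\}$, use the teaching sequence $i$.
To teach $X$, the full sequence $1, 2, \dots, n$ is needed: if $i$ is omitted, $X \setminus \{i\}$ remains consistent.
So the teaching dimension is $n = |C| - 1$.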

SLIDE 15

Rectangles in the Plane

Example (Rectangles in Z2)

Two points $x, y \in \mathbb{Z}^2$ define a rectangle $R_{x,y}(z) = 1 \Leftrightarrow z_1 \in [x_1, y_1]$ and $z_2 \in [x_2, y_2]$.

Teaching sequence
Positive examples: $x$ and $y$
Negative examples: $x - (1,0)$, $x - (0,1)$, $y + (1,0)$, $y + (0,1)$

Teaching dimension 6

SLIDES 16-17

Rectangles in the Plane

[Figure: teaching examples for a rectangle in the plane; positive examples are marked +.]

SLIDE 18

Higher Dimensions

Example (Boxes in Zd)

Two points $x, y \in \mathbb{Z}^d$ define a box $R_{x,y}(z) = 1 \Leftrightarrow \forall i \in [d],\ z_i \in [x_i, y_i]$.

Teaching sequence
Positive examples: $x$ and $y$
Negative examples for each $i \in [d]$: $x - e_i$, $y + e_i$

Teaching dimension $2(1 + d)$
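The teaching sequence for a box can be generated and checked by brute force on a small window of the lattice. This sketch assumes non-empty boxes ($x \le y$ coordinatewise) and restricts the search to corners in a bounded grid so that the class is finite; the function names and the sample box are illustrative.

```python
from itertools import product

def box_label(x, y, z):
    """1 when z lies in the box with lower corner x and upper corner y, else 0."""
    return int(all(xi <= zi <= yi for xi, yi, zi in zip(x, y, z)))

def box_teaching_sequence(x, y):
    """Two positive corners plus 2d negative points just outside them."""
    d = len(x)
    examples = [(tuple(x), 1), (tuple(y), 1)]
    for i in range(d):
        below = tuple(x[j] - (j == i) for j in range(d))  # x - e_i
        above = tuple(y[j] + (j == i) for j in range(d))  # y + e_i
        examples += [(below, 0), (above, 0)]
    return examples

def consistent_boxes(examples, lo, hi, d):
    """All boxes with corners in {lo..hi}^d that agree with every labeled example."""
    grid = list(product(range(lo, hi + 1), repeat=d))
    return [(x, y) for x in grid for y in grid
            if all(a <= b for a, b in zip(x, y))
            and all(box_label(x, y, z) == lab for z, lab in examples)]

seq = box_teaching_sequence((1, 2), (3, 3))
survivors = consistent_boxes(seq, 0, 4, 2)
```

For $d = 2$ the sequence has $2(1 + d) = 6$ examples, and the target is the only box in the window consistent with all of them.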

SLIDE 19

Unions of boxes

Example (Union of Boxes)

Fix $k$. For disjoint boxes $R_{x^1,y^1}, \dots, R_{x^k,y^k}$ in $d$ dimensions, let $U_{\{x^i,y^i\}}(z) = \bigvee_{i=1}^{k} R_{x^i,y^i}(z)$.

Use the union of the teaching sequences for each box (with a special case when boxes are adjacent).

Teaching dimension $2k(1 + d)$.

SLIDES 20-23

Union of boxes

[Figures: unions of boxes with positive examples marked +. Panel titles: "Union of boxes", "Union of boxes (k = 2)", "Union of boxes (k = ?)", "Union of boxes (k = 2)".]

SLIDE 25

VC Dimension

Definition (Shattered set)

The class C shatters a set S ⊂ X when {S ∩ c ∶ c ∈ C} = P(S).

Definition (VC dimension)

The Vapnik-Chervonenkis dimension of a class $C$ is the least integer $d$ such that $C$ shatters no set of $d + 1$ points; equivalently, $d$ is the size of the largest set that $C$ shatters.
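Both definitions can be evaluated by exhaustive search on a finite class. A minimal sketch, assuming concepts are 0/1 tuples indexed by point; the names `shatters` and `vc_dimension` and the two sample classes are illustrative, not from the slides.

```python
from itertools import combinations

def shatters(concepts, points):
    """True when the concepts induce all 2^|points| labelings on the given points."""
    patterns = {tuple(c[x] for x in points) for c in concepts}
    return len(patterns) == 2 ** len(points)

def vc_dimension(concepts, n_points):
    """Size of the largest shattered subset of {0, ..., n_points - 1}."""
    d = 0
    for size in range(1, n_points + 1):
        if any(shatters(concepts, pts) for pts in combinations(range(n_points), size)):
            d = size
        else:
            break  # subsets of shattered sets are shattered, so larger sizes fail too
    return d

# Hypothetical toy classes: thresholds over 4 points, and all subsets of 3 points.
thresholds = [tuple(int(x >= k) for x in range(4)) for k in range(5)]
all_subsets = [tuple((k >> i) & 1 for i in range(3)) for k in range(8)]
vc_thresholds = vc_dimension(thresholds, 4)
vc_all = vc_dimension(all_subsets, 3)
```

Thresholds shatter single points but no pair, while the full power set shatters all three points.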

SLIDE 26

Hard to teach, easy to learn

Example (Least Teachable Class)

$C = \{X \setminus \{1\}, X \setminus \{2\}, \dots, X \setminus \{n\}\} \cup \{X\}$.

Teaching dimension $n$
VC dimension 1, as every single point is shattered but no hypothesis induces $(0,0)$ on two points

SLIDE 27

Infinite Teaching Dimension

[Moran, Shpilka, Wigderson, Yehudayoff 2015]

Example (Dedekind cuts)

Consider the class of sets of rational numbers less than some real: $C = \{(-\infty, r) \cap \mathbb{Q} : r \in \mathbb{R}\}$.

VC dimension 1, as for $q_1 < q_2$ no concept induces $(0,1)$
Teaching dimension $\infty$

SLIDE 30

Easy to teach, hard to learn

Set of $n$ easy-to-teach functions: $F = \{\{x\} : x \in [n]\}$
Set of $2^m$ hard-to-learn functions: $G = 2^{[m]}$
Choose $2^m = n$ and construct a class over $[n] \uplus [m]$

Example (Hybrid Concept)

Enumerate $F = f_1, \dots, f_n$ and $G = g_1, \dots, g_n$ above. Define the class $C = \{h_i = f_i \uplus g_i : i \in [n]\}$.

SLIDE 32

Easy to teach, hard to learn

          x_1  x_2  x_3  ...  x_{n-1}  x_n     y_1  ...  y_{m-1}  y_m
h_1        +    −    −   ...     −      −       −   ...     −      −
h_2        −    +    −   ...     −      −       −   ...     −      +
h_3        −    −    +   ...     −      −       −   ...     +      −
⋮
h_{n-1}    −    −    −   ...     +      −       +   ...     +      −
h_n        −    −    −   ...     −      +       +   ...     +      +

Still easy to teach: $h_i$ is identified by the positive example $x_i$
Still hard to learn: $y_1, \dots, y_m$ is shattered
Teaching dimension 1 but VC dimension $\log n$
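The hybrid class can be built explicitly to confirm both properties. In this sketch the enumeration of $G$ by the binary digits of the index is an assumption (any enumeration of the $2^m$ subsets would do), indices start at 0, and $m = 3$ is an arbitrary small choice.

```python
m = 3
n = 2 ** m

def hybrid(i):
    """h_i: the singleton {x_i} on the first n coordinates, the bits of i on the last m."""
    f_part = tuple(int(j == i) for j in range(n))
    g_part = tuple((i >> b) & 1 for b in range(m))
    return f_part + g_part

concepts = [hybrid(i) for i in range(n)]

# Teaching: the single positive example x_i separates h_i from every other concept.
one_example_teaches = all(
    all(h[i] != g[i] for g in concepts if g != h)
    for i, h in enumerate(concepts)
)

# Learning: the y-coordinates (indices n .. n+m-1) receive all 2^m labelings.
y_patterns = {h[n:] for h in concepts}
ys_shattered = len(y_patterns) == 2 ** m
```

Both flags come out true, so one example suffices to teach each concept while the $m = \log_2 n$ points $y_1, \dots, y_m$ are shattered.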

SLIDE 34

Bounding Teaching by VC Dimension

Theorem (Lower bound)

$t \ge \frac{d - 1}{\log|X|}$.

Follows directly from the previous bound: $t \ge \frac{\log|C| - 1}{\log|X|}$ and $\log|C| \ge d$.

SLIDE 35

Bounding Teaching by VC Dimension

Theorem (Upper bound)

$t \le |C| - 2^d + d$.

Teaching sequence:
First, the points of a shattered set of size $d$.
Second, one example to exclude each remaining hypothesis.

The first step removes at least $2^d - 1$ hypotheses with $d$ examples.
The second step removes the at most $|C| - (2^d - 1) - 1$ remaining hypotheses, 1 example each.

SLIDE 36

Summary of VC Dimension Bounds

Theorem (Teaching versus Learning Bounds)

If $C$ has teaching dimension $t$ and VC dimension $d$ then $|C| - 2^d + d \ge t \ge \frac{d - 1}{\log|X|}$.

SLIDE 40

Removing one function

Theorem (Concentration of Teaching Dimension)

If the teaching dimension of $C$ is $t \ge |C| - k$, then for some $f \in C$ the class $C \setminus \{f\}$ has teaching dimension at most $k$.

Fix $f$ requiring a teaching sequence $x_1, x_2, \dots, x_t$ of length $t$.

To prove: fix some $f_1$ in the class $C \setminus \{f\}$ and wlog take $f_1(x_1) \ne f(x_1)$.

Idea: partition $C \setminus \{f\}$ into
$S$, a large set whose other members all disagree with $f_1$ on $x_1$
$T$, a small set

To teach $f_1$, use $x_1$ plus one $x$ to distinguish it from each $g \in T$.

SLIDE 43

Concentration Theorem (proof)

Construct $S$ and $T$ inductively.
Let $C' = C \setminus (\{f\} \cup S \cup T)$ be the remaining concepts.
Define $D(x)$ as the set of $g \in C'$ such that $g(x) \ne f(x)$.

First set $S = \{f_1\}$ and $T = D(x_1) \setminus \{f_1\}$.
Then for $i = 2, \dots, t$:
Pick an arbitrary $f_i \in D(x_i)$.
Add $f_i$ to $S$.
Add any remaining $D(x_i) \setminus \{f_i\}$ to $T$.

SLIDE 45

Concentration Theorem (proof)

Claim 1: every $f_i \in S$ with $i \ge 2$ disagrees with $f_1$ on $x_1$.
Assume $f_i(x_1) = f_1(x_1)$.
Since $f_1(x_1) \ne f(x_1)$ by construction, $f_i(x_1) \ne f(x_1)$.
But then in the first step $f_i \in D(x_1)$, so $f_i \in T$.
$T$ and $S$ are disjoint, so $f_i \notin S$, a contradiction.

SLIDE 46

Concentration Theorem (proof)

Claim 2: $|T| \le k - 1$.
$D(x_i)$ is non-empty at each step; otherwise $\{x_j\} \setminus \{x_i\}$ would be a shorter teaching sequence for $f$.
One $f_i$ gets added to $S$ each round, so $|S| = t$.
$C \setminus \{f\} = S \cup T$ implies $|T| = |C| - 1 - |S|$.
Assumed $t \ge |C| - k$, so $|T| \le k - 1$.
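The inductive construction of $S$ and $T$ can be traced on a small class. This is a sketch under stated assumptions: concepts are 0/1 tuples, `xs` is a minimum-length teaching sequence for `f`, the "arbitrary" pick is simply the first element of $D(x_i)$, and the five-concept class is a made-up example with $t = 3$ and $k = 2$.

```python
def build_S_T(concepts, f, xs):
    """Partition the class minus f into S and T following the inductive construction.

    xs is assumed to be a minimum-length teaching sequence for f.
    """
    remaining = [g for g in concepts if g != f]
    S, T = [], []
    for x in xs:
        # D(x): remaining concepts that disagree with f on x
        D = [g for g in remaining if g[x] != f[x]]
        if not D:
            raise ValueError("xs is not a minimal teaching sequence for f")
        S.append(D[0])      # the arbitrary pick f_i
        T.extend(D[1:])     # everything else in D(x_i)
        remaining = [g for g in remaining if g not in D]
    return S, T

# Hypothetical class: f needs all three inputs, so t = 3 = |C| - 2.
f = (1, 1, 1)
concepts = [f, (0, 1, 1), (0, 0, 1), (1, 0, 1), (1, 1, 0)]
S, T = build_S_T(concepts, f, [0, 1, 2])
f1 = S[0]
```

Here $S$ has $t = 3$ members and $T$ has $|C| - 1 - t = 1$ member, so `f1` can be taught within the class without `f` using input 0 plus one input separating it from the single concept in $T$.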

SLIDE 49

Recursive Teaching Dimension

[Zilles, Lange, Holte, Zinkevich 2011]

Let $\mathrm{MinTD}(C)$ be the set of $f \in C$ with the shortest teaching sequences. Construct levels of $C$ as follows:

$C_i = \mathrm{MinTD}\Bigl(C \setminus \bigcup_{j<i} C_j\Bigr).$

Then we can define a robust notion of teaching dimension.

Definition (Recursive Teaching Dimension)

The recursive teaching dimension of $C$ is the maximum, over the levels $C_i$ constructed above, of the teaching-sequence length that the members of $C_i$ need within the remaining class $C \setminus \bigcup_{j<i} C_j$.
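The level construction can be run by brute force on a finite class. The sketch below assumes concepts are 0/1 tuples and reads the teaching dimension of a level as the shortest teaching-sequence length within the class that remains when the level is peeled off; the function names are illustrative, and the example is the least teachable class from earlier in the deck with $n = 4$.

```python
from itertools import combinations

def min_teaching_len(f, concepts, n_points):
    """Length of the shortest teaching sequence for f within the given class."""
    others = [g for g in concepts if g != f]
    for size in range(n_points + 1):
        for xs in combinations(range(n_points), size):
            if all(any(g[x] != f[x] for x in xs) for g in others):
                return size
    raise ValueError("class contains duplicate concepts")

def recursive_teaching_dimension(concepts, n_points):
    """Peel off MinTD levels and return (max level cost, list of levels)."""
    remaining = list(concepts)
    levels = []
    rtd = 0
    while remaining:
        lens = {f: min_teaching_len(f, remaining, n_points) for f in remaining}
        best = min(lens.values())
        level = [f for f in remaining if lens[f] == best]  # MinTD of what is left
        levels.append(level)
        rtd = max(rtd, best)
        remaining = [f for f in remaining if f not in level]
    return rtd, levels

n = 4
full = tuple([1] * n)
concepts = [tuple(int(j != i) for j in range(n)) for i in range(n)] + [full]
rtd, levels = recursive_teaching_dimension(concepts, n)
```

The first level collects the $n$ sets $X \setminus \{i\}$, each taught by one negative example; once they are removed, $X$ is alone and needs no examples. The recursive teaching dimension is therefore 1, while the ordinary teaching dimension of the same class is $n$.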
