SLIDE 1

Strongly Non-U-Shaped Learning Results by General Techniques

John Case¹   Timo Kötzing²

¹ Computer and Information Science, University of Delaware
² Max Planck Institute for Informatics

June 28, 2010

SLIDE 2

Examples for Language Learning

We want to learn correct programs or programmable descriptions for given languages, such as:

  • 16, 12, 18, 2, 4, 0, 16, . . .  "even numbers"
  • 1, 16, 256, 16, 4, . . .  "powers of 2"
  • 0, 0, 0, 0, 0, . . .  "singleton 0"
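To make "programmable description" concrete, here is a minimal sketch (our illustration, not from the talk; the predicate names are ours) rendering the three example languages as membership predicates over the natural numbers.

```python
# A sketch (illustrative only): the three example languages as
# membership predicates over the natural numbers.
is_even        = lambda n: n % 2 == 0                   # "even numbers"
is_power_of_2  = lambda n: n > 0 and n & (n - 1) == 0   # "powers of 2"
is_singleton_0 = lambda n: n == 0                       # "singleton 0"

print([n for n in range(20) if is_even(n)])        # 0, 2, 4, ..., 18
print([n for n in range(20) if is_power_of_2(n)])  # 1, 2, 4, 8, 16
```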


SLIDE 3

Language Learning from Positive Data

Let N = {0, 1, 2, . . .} be the set of all natural numbers.
A language is a set L ⊆ N.
A presentation for L is essentially an (infinite) listing T of all and only the elements of L. Such a T is called a text for L.

We numerically name programs or grammars in some standard general hypothesis space, where each e ∈ N generates some language.
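As a concrete reading of "text", the following sketch (our illustration; the generator name is an assumption) presents the language of even numbers as an infinite stream listing all and only its elements.

```python
from itertools import count

def text_for_evens():
    """A text T for the even numbers: lists all and only its elements.
    Texts may repeat elements and present them in any order; this one
    happens to be repetition-free and increasing."""
    for n in count():
        yield 2 * n

T = text_for_evens()
print([next(T) for _ in range(7)])  # T(0..6) = [0, 2, 4, 6, 8, 10, 12]
```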


SLIDE 4

Success: TxtEx-Learning

Let L be a language, h an algorithmic learner, and T a text (a presentation) for L.
For all k, we write T[k] for the sequence T(0), . . . , T(k − 1).
The learning sequence p_T of h on T is given by ∀k : p_T(k) = h(T[k]).
Gold 1967: h TxtEx-learns L iff, for all texts T for L, there is an i such that p_T(i) = p_T(i + 1) = p_T(i + 2) = . . . and p_T(i) is a program for L.
A class L of languages is TxtEx-learnable iff there exists an algorithmic learner h that TxtEx-learns each language L ∈ L.
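For intuition, here is a minimal sketch (ours, with hypothetical names) of a learner h that TxtEx-learns the class of all finite languages: its conjecture on T[k] is a canonical grammar for the set of data seen so far, so on any text for a finite L the learning sequence converges syntactically once all of L has appeared.

```python
def h(prefix):
    """Learner: map the finite sequence T[k] to a canonical grammar
    (here: the sorted tuple of distinct elements seen so far)."""
    return tuple(sorted(set(prefix)))

# A prefix of a text for L = {0, 2, 4}; texts may repeat elements.
T = [0, 2, 2, 4, 0, 4, 2, 0]
p_T = [h(T[:k]) for k in range(len(T) + 1)]   # the learning sequence
print(p_T)   # stabilizes on (0, 2, 4) once every element has appeared
```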


SLIDE 5

Restrictions

An (algorithmic) learner h is called set-driven iff, for all σ, τ listing the same (finite) set of elements, h(σ) = h(τ).
A learner h is called partially set-driven iff, for all σ, τ of the same length and listing the same set of elements, h(σ) = h(τ).
These two restrictions model a learner's insensitivity to the order of data presentation.
A learner h is called iterative iff, for all σ, τ with h(σ) = h(τ), and for all x, h(σ ⋄ x) = h(τ ⋄ x).¹

¹ This is equivalent to the learner having access only to the current datum and the just prior hypothesis.
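The finite-language learner sketched above can be recast iteratively, illustrating the footnoted characterization (our illustration, hypothetical names): the next conjecture is computed from the just prior hypothesis and the current datum alone.

```python
def h_update(hypothesis, x):
    """One iterative step: fold the current datum x into the prior
    hypothesis; no other access to past data is needed."""
    return tuple(sorted(set(hypothesis) | {x}))

def learning_sequence(data, start=()):
    """Replay the iterative learner's conjectures on a data prefix."""
    p, hyp = [start], start
    for x in data:
        hyp = h_update(hyp, x)
        p.append(hyp)
    return p

print(learning_sequence([0, 2, 2, 4, 0]))
# [(), (0,), (0, 2), (0, 2), (0, 2, 4), (0, 2, 4)]
```

Note that this same learner is also set-driven: its conjecture depends only on the set of data seen, not on their order or multiplicity.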


SLIDE 6

U-Shapes

For learning with any of the above restrictions, we investigate the necessity of (two kinds of) U-shapes.
U-shaped learning occurs empirically in human child development: learn, unlearn, relearn.
A learner h is said to be non-U-shaped on a class of languages L iff, for each language L ∈ L, h, when learning L, never semantically abandons a correct hypothesis (i.e., never follows a correct conjecture with one generating a different language).
A learner h is said to be strongly non-U-shaped on a class of languages L iff, for each language L ∈ L, h, when learning L, never syntactically abandons a correct hypothesis (i.e., never follows a correct conjecture with a syntactically different program, even one for the same language).
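To separate the two notions operationally, here is a sketch (our illustration; `lang`, the toy hypotheses, and the finite setting are assumptions) that checks a finite prefix of a learning sequence for semantic and syntactic abandonment of a correct hypothesis.

```python
def abandonments(p, lang, L):
    """Scan consecutive conjectures; report whether a correct hypothesis
    was ever abandoned semantically and/or syntactically."""
    semantic = syntactic = False
    for prev, nxt in zip(p, p[1:]):
        if lang(prev) == L:                 # prev is correct for L
            syntactic |= (nxt != prev)      # different program text
            semantic  |= (lang(nxt) != L)   # different language
    return semantic, syntactic

lang = lambda e: frozenset(e)   # toy: a hypothesis lists its language
L = frozenset({0, 2, 4})
p = [(0, 2, 4), (4, 2, 0), (0, 2, 4)]   # correct throughout, re-coded
print(abandonments(p, lang, L))  # (False, True): a syntactic abandonment
                                 # only, so not strongly non-U-shaped
```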


SLIDE 7

Results

  • For set-driven learning, we can assume strongly non-U-shaped learners.
  • For partially set-driven learning, we can assume strongly non-U-shaped learners.
  • Surprisingly, for iterative learning, we cannot assume strongly non-U-shaped learners.
  • From Case and Moelius 2007, we know that, for iterative learning, we can assume (not necessarily strongly) non-U-shaped learners.


SLIDE 8

Techniques

How did we get those results?

  • For unnecessary U-shapes, we give a general scheme for removing them. We apply this scheme to both set-driven and partially set-driven learning.
  • We use a different (self-referential, or self-learning) approach to show the necessity of U-shapes.


SLIDE 9

Surprise re Self-Learning Technique

We have a very general result employing self-learning classes of languages to completely epitomize, or characterize, any strict learning-power difference between two learning criteria.
Suppose L is a self-learning class for this result. Each language of L contains only programs, which completely specify how the corresponding learner of L is to transform its data into output programs.
This technique applies well beyond criteria featuring the presence or absence of U-shapes.
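Purely to convey the flavor (this is a loose toy of ours, not the paper's construction; real self-learning classes use numerical program codes in a fixed acceptable programming system), a learner driven by its own data might look like the following sketch, where each datum is itself code dictating how to map the data seen so far to an output program.

```python
def self_learner(prefix):
    """Toy self-learning step: the most recent datum IS a program; run
    it on the data seen so far to produce the next conjecture."""
    if not prefix:
        return "?"                      # arbitrary initial conjecture
    instruction = eval(prefix[-1])      # toy: data are Python sources
    return instruction(prefix)

datum = "lambda seq: ('grammar-for', frozenset(seq))"
print(self_learner([datum, datum]))    # the data dictated this output
```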


SLIDE 10

Conclusion and Future Work

We added to the picture regarding the necessity of U-shapes.
In the future, we will try to gain an even better understanding of the necessity of U-shapes for other learning criteria.
Regarding self-learning classes of languages, we are currently working on a considerable expansion of the surprising result that self-learning classes characterize learning-power differences.


SLIDE 11

Thank You.
