SLIDE 1


Learning with Partially Ordered Representations

Jane Chandlee, Remi Eyraud, Jeffrey Heinz, Adam Jardine, Jonathan Rawski


SLIDE 2

Jane Chandlee (Haverford)
Remi Eyraud (Aix-Marseille)
Jeff Heinz (Stony Brook)
Adam Jardine (Rutgers)

SLIDE 3

The Talk in a Nutshell

Previously

◮ Efficient learning of subregular languages and functions
◮ Question: how to extend these learners to multiple, shared properties?

Today

◮ Describe the model-theoretic characterization of strings and trees
◮ Describe the partial order structure of the space of feature-based hypotheses
◮ Showcase a learning algorithm which exploits this structure to generalize from data to grammars

SLIDE 5

Finite Word Models

‘word’ is synonymous with ‘structure.’

◮ A model of a word is a representation of it.
◮ A (relational) model contains two kinds of elements:
  ◮ a domain: a finite set of elements;
  ◮ relations over domain elements.
◮ Every word has a model.
◮ Different words have different models.

SLIDE 7

Finite Word Models

1. Successor (immediate precedence): the word abba as domain {1, 2, 3, 4}, labels a, b, b, a, and successor edges 1 ⊳ 2 ⊳ 3 ⊳ 4.

2. General precedence: the same domain and labels, with < holding between every earlier and every later position: 1 < 2, 1 < 3, 1 < 4, 2 < 3, 2 < 4, 3 < 4.
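These two models are easy to make concrete. A minimal Python sketch (the function names and the domain/labels/relation encoding are ours, not the talk's):

```python
from itertools import combinations

def successor_model(word):
    """Word model under immediate precedence: numbered positions,
    a label for each position, and the successor pairs."""
    domain = list(range(1, len(word) + 1))
    labels = {i: word[i - 1] for i in domain}
    succ = [(i, i + 1) for i in domain[:-1]]
    return domain, labels, succ

def precedence_model(word):
    """Same domain and labels, but with general precedence:
    every pair (i, j) where position i precedes position j."""
    domain = list(range(1, len(word) + 1))
    labels = {i: word[i - 1] for i in domain}
    prec = list(combinations(domain, 2))
    return domain, labels, prec

# The word abba from the slide:
print(successor_model("abba")[2])   # [(1, 2), (2, 3), (3, 4)]
print(precedence_model("abba")[2])  # all six pairs (i, j) with i before j
```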

SLIDE 8

Tree Models (Rogers 2003)

Pic courtesy of Rogers 2014 ESSLLI course.

SLIDE 9

Subregular Hierarchy (Rogers et al 2013)

SLIDE 10

Local Factors

Pics courtesy of Heinz and Rogers 2014 ESSLLI course.

SLIDE 11

Locality and Projection

Theorem (Medvedev). A set of strings is regular iff it is a homomorphic image of a strictly 2-local set.

Theorem (Thatcher). A set of Σ-labeled trees is recognizable by a finite-state tree automaton (i.e., regular) iff it is a projection of a local set of trees.

Theorem (Thatcher). A set of strings L is context-free iff it is the yield of a local set of trees (equivalently, the yield of a recognizable set of trees).

SLIDE 12

Unconventional Word Models

Successor (immediate precedence) model of abba with feature bundles in place of atomic segment labels:

1 [vowel, back, low] ⊳ 2 [voiced, labial, stop] ⊳ 3 [voiced, labial, stop] ⊳ 4 [vowel, back, low]
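A sketch of the same idea in code, with each position carrying a bundle of properties rather than a single alphabet symbol (feature names follow the slide; the encoding is ours):

```python
# Feature bundles for the two segments of "abba", as on the slide.
FEATURES = {
    "a": {"vowel", "back", "low"},
    "b": {"voiced", "labial", "stop"},
}

def feature_model(word):
    """Successor model whose labeling relations are feature bundles."""
    domain = list(range(1, len(word) + 1))
    labels = {i: FEATURES[word[i - 1]] for i in domain}
    succ = [(i, i + 1) for i in domain[:-1]]
    return domain, labels, succ

domain, labels, succ = feature_model("abba")
print(labels[1])  # {'vowel', 'back', 'low'}
print(succ)       # [(1, 2), (2, 3), (3, 4)]
```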

SLIDE 13

The Challenge of Features

Distinctive Feature Theory

"part of the heart of phonology" — Rice (2003)
"The most fundamental insight gained during the last century" — Ladefoged & Halle (1988)

*NT → { *nt, *np, *nk, *mt, *mp, *mk, ... }

Wilson & Gallagher (2018): "Could there be a non-statistical model that learns by memorizing feature sequences? The problem confronting such a model is that any given segment sequence has many different featural representations. Without a method for deciding which representations are relevant for assessing wellformedness (the role that statistics plays in MaxEnt), learning is doomed."

SLIDE 15

Example

Imagine the sequence nt is not present in a corpus. There are many possible equivalent constraints:

*nt
*[+nasal][+coronal]
*[+consonant][+coronal, -continuant]
*[+sonorant][-sonorant]
...

How can a learner decide which of these constraints is responsible for the absence of nt?
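The multiplicity is easy to quantify. A sketch with deliberately simplified feature bundles (ours, for illustration; real inventories use many more features): every pair of non-empty sub-bundles of the features of n and of t describes the gap equally well.

```python
from itertools import combinations

def sub_bundles(bundle):
    """All non-empty sub-bundles of a feature bundle."""
    feats = sorted(bundle)
    return [frozenset(c) for r in range(1, len(feats) + 1)
            for c in combinations(feats, r)]

# Toy bundles for n and t (simplified for illustration).
n = {"+nasal", "+coronal", "+sonorant", "+consonant"}
t = {"-continuant", "+coronal", "-sonorant", "+consonant"}

# Every pair of sub-bundles is a featural description that the bigram nt
# satisfies, hence a candidate explanation for its absence.
candidates = [(a, b) for a in sub_bundles(n) for b in sub_bundles(t)]
print(len(candidates))  # (2**4 - 1)**2 = 225 candidate constraints
```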

SLIDE 18

Constraint Explosion (Hayes and Wilson 2008)

As we add segments and features, the number of possible constraint hypotheses grows. How much larger does it get?
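A back-of-the-envelope count, under our assumption (not Hayes and Wilson's exact figures) that a constraint position may set each of n binary features to +, to -, or leave it unspecified:

```python
# 3**n partial bundles per position, so (3**n)**2 bigram constraints.
for n in (5, 10, 15, 20):
    bundles = 3 ** n
    print(f"{n} features: {bundles} bundles, {bundles ** 2} bigram hypotheses")
```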

SLIDE 19

Some definitions

Definition (Restriction)
A = ⟨D^A; ≻, R^A_1, ..., R^A_n⟩ is a restriction of B = ⟨D^B; ≻, R^B_1, ..., R^B_n⟩ iff D^A ⊆ D^B and, for each m-ary relation R_i,

R^A_i = {(x_1, ..., x_m) ∈ R^B_i | x_1, ..., x_m ∈ D^A}.

Intuition: identifies a subset A of the domain of B and strips B of all elements and relations which are not wholly within A.

SLIDE 20

Some definitions

Definition (Subfactor)
Structure A is a subfactor of structure B (A ⊑ B) iff A is connected and there exist a restriction B′ of B and a map h : A → B′ such that, for all a_1, ..., a_m ∈ A and all R_i in the model signature: if h(a_1), ..., h(a_m) ∈ B′ and R_i(a_1, ..., a_m) holds in A, then R_i(h(a_1), ..., h(a_m)) holds in B′.

If A ⊑ B we also say that B is a superfactor of A.

Intuition: properties that hold of the connected structure A also hold in a corresponding way within B.
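A brute-force sketch of the subfactor check for tiny structures (our encoding again; it searches injective maps only and skips the connectedness requirement, so it is an illustration rather than the general definition):

```python
from itertools import permutations

def is_subfactor(dom_A, rels_A, dom_B, rels_B):
    """Search for a map h from A into B that preserves every relation
    tuple of A. Exponential in |A|; fine for toy examples."""
    for image in permutations(sorted(dom_B), len(dom_A)):
        h = dict(zip(sorted(dom_A), image))
        if all(tuple(h[x] for x in tup) in rels_B.get(name, set())
               for name, rel in rels_A.items() for tup in rel):
            return True
    return False

# Is a single b-labeled position a subfactor of the model of "abba"?
rels_B = {"b": {(2,), (3,)}, "succ": {(1, 2), (2, 3), (3, 4)}}
print(is_subfactor({1}, {"b": {(1,)}}, {1, 2, 3, 4}, rels_B))  # True
print(is_subfactor({1}, {"c": {(1,)}}, {1, 2, 3, 4}, rels_B))  # False
```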

SLIDE 21

Subfactor Ideals

Definition (Ideal)
A non-empty subset S of a poset ⟨A, ≤⟩ is an ideal iff

◮ for every x ∈ S, y ≤ x implies y ∈ S, and
◮ for all x, y ∈ S there is some z ∈ S such that x ≤ z and y ≤ z.

Example: the feature bundles [-N], [-N,+V], [-N,+C], [-N,+V,+C], ordered by inclusion, with [-N] at the bottom and [-N,+V,+C] at the top.
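A sketch checking the two ideal conditions over a finite poset (the function and the encoding of bundles as frozensets are ours):

```python
def is_ideal(S, elements, leq):
    """Check downward closure and directedness; leq(x, y) means x ≤ y.
    A sketch over explicitly enumerated poset elements."""
    S = set(S)
    downward_closed = all(y in S for x in S for y in elements if leq(y, x))
    directed = all(any(leq(x, z) and leq(y, z) for z in S)
                   for x in S for y in S)
    return bool(S) and downward_closed and directed

# The feature bundles from the slide, ordered by inclusion.
bundles = [frozenset(b) for b in
           ({"-N"}, {"-N", "+V"}, {"-N", "+C"}, {"-N", "+V", "+C"})]

def subset(x, y):
    return x <= y

print(is_ideal(bundles, bundles, subset))       # True: the whole poset
print(is_ideal(bundles[1:3], bundles, subset))  # False: [-N] is missing
```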

SLIDE 22

Grammatical Entailments

Subfactor Ideals
If s is a subfactor of t for G and G generates t, then G generates s.

Example: the feature bundles [-N], [-N,+V], [-N,+C], [-N,+V,+C], ordered by inclusion.

SLIDE 30

Example with Singular Segments

SLIDE 31

NLP Example

In many NLP applications, text symbols are treated independently: the alphabet {a, ..., z, A, ..., Z} contains 52 symbols. Forbidding, say, all capitals then leads to an explosion of symbol-by-symbol constraints. With a feature [capital], only 27 primitives are needed: 26 letters plus [capital].
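In code, under our toy framing of the slide's arithmetic:

```python
import string

# Atomic alphabet: banning capitals needs one constraint per capital letter.
atomic = list(string.ascii_lowercase + string.ascii_uppercase)
print(len(atomic))     # 52 symbols, 26 constraints *A ... *Z

# Featural alphabet: 26 letters plus one [capital] feature.
featural = list(string.ascii_lowercase) + ["capital"]
print(len(featural))   # 27 primitives, 1 constraint *[capital]
```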

SLIDE 32

Unbounded Linguistic Dependencies

◮ Samala sibilant harmony: sibilants must not disagree in anteriority (Applegate 1972).

(1) a. * hasxintilawaS
    b. * haSxintilawas
    c.   haSxintilawaS

Example (Samala):

  * $ h a s x i n t i l a w a S $
    $ h a S x i n t i l a w a S $
  * $ s t a j a n o w o n w a S $

◮ But: sibilants can be arbitrarily far away from each other!

SLIDE 38

Example: Samala Long-Distance *sS

Banned Structure

1 [+str, +ant] < 2 [+str, -ant]

A position that is [+strident, +anterior] precedes, at any distance, a position that is [+strident, -anterior].
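A sketch of checking this banned structure against strings, using general precedence so distance is irrelevant (the segment-to-feature table and abbreviations are our toy encoding):

```python
# Only sibilants carry stridency/anteriority here; other segments get {}.
FEATS = {"s": {"+str", "+ant"}, "S": {"+str", "-ant"}}

def violates_sS(word):
    """True iff some [+str, +ant] position precedes some [+str, -ant]
    position, at any distance (general precedence)."""
    feats = [FEATS.get(c, set()) for c in word]
    return any({"+str", "+ant"} <= feats[i] and {"+str", "-ant"} <= feats[j]
               for i in range(len(word)) for j in range(i + 1, len(word)))

print(violates_sS("hasxintilawaS"))  # True: banned, s ... S
print(violates_sS("haSxintilawaS"))  # False: well-formed
```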

SLIDE 39

Two Ways to Learn (De Raedt 2008)

Specific-to-General Induction

◮ Start at the most specific points (highest) in the space.
◮ Remove all the subfactors that are present in the data.
◮ Collect the most general substructures remaining.

General-to-Specific Induction

◮ Begin at the lowest (most general) element in the space.
◮ Check whether this structure is a subfactor of the input data.
◮ If it is, extend the structure by adding a domain element or a relation on an existing element, and repeat; if it is not, keep it as a constraint (see the sketch below).
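A minimal sketch of the general-to-specific direction, simplified (by us) so that structures are plain strings, "extend" appends a symbol, and "subfactor of the data" is substring containment:

```python
def general_to_specific(data, alphabet, max_len):
    """Search from most general (shortest) structures to more specific
    ones; keep a structure as a constraint as soon as it is unattested.
    Unattested structures are never extended, which prunes everything
    above them in the order."""
    grammar, frontier = [], [""]
    while frontier:
        s = frontier.pop()
        for t in (s + a for a in alphabet if len(s) < max_len):
            if any(t in w for w in data):   # attested: specialize further
                frontier.append(t)
            else:                           # unattested: a constraint
                grammar.append(t)
    return grammar

data = ["aab", "aba", "baa"]
print(sorted(general_to_specific(data, "ab", 2)))  # ['bb']
```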

SLIDE 40

Learning Algorithm

Bottom-Up Relational Learner

◮ Prunes the hypothesis space according to the ordering relation
◮ Provably identifies correct constraints for sequential data
◮ Uses data sparsity to its advantage!

A feature-based sketch of these points follows below.
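This sketch uses our toy encoding of bundle sequences, not the paper's exact algorithm: candidates are visited from general to specific, unattested structures become constraints, and every superfactor of a known constraint is pruned.

```python
from collections import deque

# Toy segment inventory (ours): nasals and obstruents at two places.
FEATS = {"n": {"+nas", "+cor"}, "t": {"-son", "+cor"},
         "m": {"+nas", "+lab"}, "p": {"-son", "+lab"}}
ALL_FEATURES = sorted({f for bundle in FEATS.values() for f in bundle})

def attested(factor, data):
    """Is this bundle sequence a subfactor (under successor) of a word?"""
    for w in data:
        segs = [FEATS[c] for c in w]
        for i in range(len(segs) - len(factor) + 1):
            if all(b <= segs[i + j] for j, b in enumerate(factor)):
                return True
    return False

def extensions(factor, max_len):
    """Next structures: add one feature to one bundle, or append a bundle."""
    for i, b in enumerate(factor):
        for f in ALL_FEATURES:
            if f not in b:
                yield factor[:i] + (b | {f},) + factor[i + 1:]
    if len(factor) < max_len:
        yield factor + (frozenset(),)

def contains(s, g):
    """Does s have g as a subfactor? (Used to prune above constraints.)"""
    return any(all(gb <= s[i + j] for j, gb in enumerate(g))
               for i in range(len(s) - len(g) + 1))

def bottom_up_learn(data, max_len=2):
    grammar, seen = [], set()
    queue = deque([(frozenset(),)])          # the most general structure
    while queue:
        s = queue.popleft()
        if s in seen or any(contains(s, g) for g in grammar):
            continue                         # visited, or already banned
        seen.add(s)
        if attested(s, data):                # attested: specialize further
            queue.extend(extensions(s, max_len))
        else:
            grammar.append(s)                # a most general banned factor
    return grammar

# Nasal-obstruent clusters agree in place in the data, so the learned
# constraints include *[+cor][+lab] and *[+lab][+cor].
for g in bottom_up_learn(["mp", "nt"]):
    print([sorted(b) for b in g])
```

The pruning check is what the partial order buys: once a structure is banned, its entire filter of superfactors is never visited, so sparse data means a small search.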

SLIDE 63

Learning Guarantees

This learner is provably guaranteed to find the responsible constraints. With what measures?

Theorem
Given a finite positive data sample D, the bottom-up learner finds a constraint grammar G such that:

1. G is consistent, i.e. it covers the data: D ⊆ L(G).
2. L(G) is the smallest language in the class L which covers the data: for all L ∈ L where D ⊆ L, L(G) ⊆ L.
3. The largest forbidden substructure is of size k.
4. G includes structures S that are restrictions of structures S′ included in other grammars G′ that also satisfy (1)-(3): for all S′ ∈ G′, there exists S ∈ G such that S ⊑ S′.

SLIDE 64

Extensions

Things To Do

◮ Determine the trade-off between data sparsity and time complexity. We hypothesize sparser data should yield faster generalization.
◮ Extend the algorithm to learning subregular functions.
◮ Incorporate and compare with statistical learning:
  ◮ What is the efficiency trade-off between statistics and structure?
  ◮ MaxEnt models perform well; can they accommodate structure?

SLIDE 65

Conclusion

Today's Results

◮ Learning is due to representations and structured hypothesis spaces.
◮ There is rich structure in features that partially orders the hypothesis space.
◮ These entailments allow bottom-up inference of collections of constraint ideals and filters to succeed.

SLIDE 66

Thanks!

Special thanks to Jim Rogers for immensely helpful discussions. This work was supported by NIH grant #R01HD87133-01.

SLIDE 67

References I

Applegate, R. B. 1972. Ineseño Chumash grammar. Doctoral dissertation, University of California, Berkeley.