Grammatical inference and subregular phonology · Adam Jardine (PowerPoint presentation)


SLIDE 1

Grammatical inference and subregular phonology

Adam Jardine, Rutgers University · December 11, 2019 · Tel Aviv University

SLIDE 2

Review

SLIDE 3

“[V]arious formal and substantive universals are intrinsic properties of the language-acquisition system, these providing a schema that is applied to data and that determines in a highly restricted way the general form and, in part, even the substantive features of the grammar that may emerge upon presentation of appropriate data.” (Chomsky, Aspects of the Theory of Syntax)

“[I]f an algorithm performs well on a certain class of problems then it necessarily pays for that with degraded performance on the set of all remaining problems.” (Wolpert and Macready 1997, the No Free Lunch theorems)

SLIDE 4

[Diagram: two nested hierarchies. Languages: computable ⊃ regular (Reg) ⊃ strictly local (SL), with phonotactics at the SL level. Functions: computable ⊃ regular ⊃ subsequential (Subseq) ⊃ input strictly local (ISL), with processes at the ISL level.]

SLIDE 5

Review

  • Computational characterizations of phonological patterns identify structure that can be used by a learner

[Diagram: strictly local acceptor structure over {C, V}, arcs labeled with grammaticality values: ⋊:⊥, C:⊥, V:⊤, C:⊤, V:⊤, V:⊤, C:⊤, V:⊥, C:⊥]

SLIDE 6

Review

  • Computational characterizations of phonological patterns identify structure that can be used by a learner

t | datum
0 | CV
1 | V
2 | CVCV

[Diagram: the same strictly local acceptor structure, arcs labeled ⋊:⊥, C:⊥, V:⊤, C:⊤, V:⊤, V:⊤, C:⊤, V:⊥, C:⊥]
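As a concrete sketch of how such an acceptor judges strings: a strictly 2-local grammar can be checked by scanning adjacent symbol pairs, word boundaries included. The banned-bigram formulation below is an equivalent rendering of the arc-labeled diagram, and the *VV grammar is my illustrative assumption, not read off the slide:

```python
def sl2_accepts(banned, word):
    """Accept word iff no adjacent pair of symbols, with boundary
    markers added at the edges, is a banned 2-factor."""
    padded = ["⋊"] + list(word) + ["⋉"]
    return all((a, b) not in banned for a, b in zip(padded, padded[1:]))

banned = {("V", "V")}  # assumed grammar: *VV (no vowel hiatus)
assert sl2_accepts(banned, "CVCV")
assert not sl2_accepts(banned, "CVV")
```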

SLIDE 7

Review

  • Computational characterizations of phonological patterns identify structure that can be used by a learner

[Diagram: input strictly local transducer, arcs labeled with outputs: ⋊:λ, C:λ, V:λ, C:C, V:V, V:V, C:C, V:V, C:C]

SLIDE 8

Review

  • Computational characterizations of phonological patterns identify structure that can be used by a learner

[Diagram: a different ISL transducer over the same structure: ⋊:λ, C:V, V:λ, C:C, V:V, V:V, C:C, V:V, C:VC]

SLIDE 9

Review

  • Computational characterizations of phonological patterns identify structure that can be used by a learner

[Diagram: another ISL transducer over the same structure: ⋊:λ, C:V, V:λ, C:C, V:V, V:V, C:C, V:λ, C:VC]

SLIDE 10

Review

  • Computational characterizations of phonological patterns identify structure that can be used by a learner

[Diagram: the shared transducer structure with all output labels left blank: ⋊:?, C:?, V:?, C:?, V:?, V:?, C:?, V:?, C:?]

SLIDE 11

Today

  • Using automata structure for learning
    – ISL functions
    – SL distributions
  • Open questions

SLIDE 12

Learning ISL functions

SLIDE 13

Learning input strictly local functions

  • When learning languages, presentation is a sequence of examples of L

t | datum
0 | V
1 | CVCV
2 | CVVCVCV
...

  • When learning functions, ...

SLIDE 14

Learning input strictly local functions

  • When learning languages, presentation is a sequence of examples of L

t | datum
0 | V
1 | CVCV
2 | CVVCVCV
...

  • When learning functions, presentation is of example pairs from f

t | datum
0 | (C, CV)
1 | (CVC, CVCV)
2 | (CVCV, CVCV)
...

SLIDE 15

Learning input strictly local functions

t | datum
0 | (C, CV)
1 | (CVC, CVCV)
2 | (CVCV, CVCV)
3 | (VCVC, VCVCV)

?
⟶

[Diagram: the transducer structure to be learned, output labels blank: ⋊:?, C:?, V:?, C:?, V:?, V:?, C:?, V:?, C:?]

SLIDE 16

Learning input strictly local functions

  • The longest common prefix (lcp) is the longest initial sequence shared by a set of strings

lcp({CVCV, CVCCV, CVCVC}) =

SLIDE 17

Learning input strictly local functions

  • The longest common prefix (lcp) is the longest initial sequence shared by a set of strings

lcp({CVCV, CVCCV, CVCVC}) = CVC
lcp({CVCV, CCVCV, CVCVC}) =

SLIDE 18

Learning input strictly local functions

  • The longest common prefix (lcp) is the longest initial sequence shared by a set of strings

lcp({CVCV, CVCCV, CVCVC}) = CVC
lcp({CVCV, CCVCV, CVCVC}) = C
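A minimal Python sketch of lcp (the function name and implementation are mine, not from the slides):

```python
def lcp(strings):
    """Longest common prefix of a nonempty collection of strings."""
    strings = list(strings)
    prefix = strings[0]
    for s in strings[1:]:
        while not s.startswith(prefix):  # shrink until it is a prefix of s
            prefix = prefix[:-1]
    return prefix

assert lcp(["CVCV", "CVCCV", "CVCVC"]) == "CVC"
assert lcp(["CVCV", "CCVCV", "CVCVC"]) == "C"
```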

SLIDE 19

Learning input strictly local functions

  • The longest common prefix (lcp) is the longest initial sequence shared by a set of strings

lcp({CVCV, CVCCV, CVCVC}) = CVC
lcp({CVCV, CCVCV, CVCVC}) = C

  • Call our data sequence d ⊂ f

(CV, CV), (CVC, CVC), (CVCVC, CVCVC), (VCVVC, VCVC), (VCVV, VCV)

d^p(w) = lcp(d(wΣ*))

SLIDE 20

Learning input strictly local functions

  • The longest common prefix (lcp) is the longest initial sequence shared by a set of strings

lcp({CVCV, CVCCV, CVCVC}) = CVC
lcp({CVCV, CCVCV, CVCVC}) = C

  • Call our data sequence d ⊂ f

(CV, CV), (CVC, CVC), (CVCVC, CVCVC), (VCVVC, VCVC), (VCVV, VCV)

d^p(w) = lcp(d(wΣ*))
d^p(CVC) = ...

SLIDE 21

Learning input strictly local functions

  • The longest common prefix (lcp) is the longest initial sequence shared by a set of strings

lcp({CVCV, CVCCV, CVCVC}) = CVC
lcp({CVCV, CCVCV, CVCVC}) = C

  • Call our data sequence d ⊂ f

(CV, CV), (CVC, CVC), (CVCVC, CVCVC), (VCVVC, VCVC), (VCVV, VCV)

d^p(w) = lcp(d(wΣ*))
d^p(CVC) = CVC
d^p(VCVV) = ...

SLIDE 22

Learning input strictly local functions

  • The longest common prefix (lcp) is the longest initial sequence shared by a set of strings

lcp({CVCV, CVCCV, CVCVC}) = CVC
lcp({CVCV, CCVCV, CVCVC}) = C

  • Call our data sequence d ⊂ f

(CV, CV), (CVC, CVC), (CVCVC, CVCVC), (VCVVC, VCVC), (VCVV, VCV)

d^p(w) = lcp(d(wΣ*))
d^p(CVC) = CVC
d^p(VCVV) = VCV

SLIDE 23

Learning input strictly local functions

  • Call our data sequence d ⊂ f

(CV, CV), (CVC, CVC), (CVCVC, CVCVC), (VCVVC, VCVC), (VCVV, VCV)

d^p(w) = lcp(d(wΣ*))
d^p(CVC) = CVC, d^p(VCVV) = VCV
d_w(u) = d^p(w)⁻¹ d(wu)

SLIDE 24

Learning input strictly local functions

  • Call our data sequence d ⊂ f

(CV, CV), (CVC, CVC), (CVCVC, CVCVC), (VCVVC, VCVC), (VCVV, VCV)

d^p(w) = lcp(d(wΣ*))
d^p(CVC) = CVC, d^p(VCVV) = VCV
d_w(u) = d^p(w)⁻¹ d(wu)
d_CV(C) = d^p(CV)⁻¹ d(CVC) = (CV)⁻¹ CVC = C

SLIDE 25

Learning input strictly local functions

  • Call our data sequence d ⊂ f

(CV, CV), (CVC, CVC), (CVCVC, CVCVC), (VCVVC, VCVC), (VCVV, VCV)

d^p(w) = lcp(d(wΣ*))
d^p(CVC) = CVC, d^p(VCVV) = VCV
d_w(u) = d^p(w)⁻¹ d(wu)
d_CV(C) = d^p(CV)⁻¹ d(CVC) = (CV)⁻¹ CVC = C
d_VCV(V) = d^p(VCV)⁻¹ d(VCVV) = (VCV)⁻¹ VCV = λ

SLIDE 26

Learning input strictly local functions

  • Call our data sequence d ⊂ f

(CV, CV), (CVC, CVC), (CVCVC, CVCVC), (VCVVC, VCVC), (VCVV, VCV)

d^p(w) = lcp(d(wΣ*))
d^p(CVC) = CVC, d^p(VCVV) = VCV
d_w(u) = d^p(w)⁻¹ d(wu)
d_CV(C) = d^p(CV)⁻¹ d(CVC) = (CV)⁻¹ CVC = C
d_VCV(V) = d^p(VCV)⁻¹ d(VCVV) = (VCV)⁻¹ VCV = λ
d^p_w(u) = lcp(d_w(uΣ*))
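These three quantities can be sketched directly in Python over a finite sample stored as a dict from input strings to output strings (the names dp, d_tail, and dp_w are mine; lcp is the sketch from above; on a finite sample, lcp(d(wΣ*)) ranges over the observed inputs that extend w):

```python
def dp(d, w):
    """d^p(w) = lcp(d(wΣ*)): lcp of the outputs of all sampled inputs extending w."""
    return lcp([out for inp, out in d.items() if inp.startswith(w)])

def d_tail(d, w, u):
    """d_w(u) = d^p(w)⁻¹ d(wu): what d(wu) adds beyond the already-determined d^p(w)."""
    return d[w + u][len(dp(d, w)):]

def dp_w(d, w, u):
    """d^p_w(u) = lcp(d_w(uΣ*)), which works out to d^p(w)⁻¹ d^p(wu)."""
    return dp(d, w + u)[len(dp(d, w)):]

# The sample from the slides:
d = {"CV": "CV", "CVC": "CVC", "CVCVC": "CVCVC",
     "VCVVC": "VCVC", "VCVV": "VCV"}
assert dp(d, "CVC") == "CVC" and dp(d, "VCVV") == "VCV"
assert d_tail(d, "CV", "C") == "C"
assert d_tail(d, "VCV", "V") == ""  # i.e. λ
```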

SLIDE 27

(CVC, CVC), (CVV, CV), (CVCCV, CVCCV), (CCVCC, CCVCC), (CCCVCV, CCCVCV), (CVVCV, CVCV), (V, V)

[Diagram: the ISL transducer structure with all output labels blank: ⋊:?, C:?, V:?, C:?, V:?, V:?, C:?, V:?, C:?]

SLIDE 28

(CVC, CVC), (CVV, CV), (CVCCV, CVCCV), (CCVCC, CCVCC), (CCCVCV, CCCVCV), (CVVCV, CVCV), (V, V)

[Diagram: the transducer structure with the first output label filled in: a C:C arc]

d^p_λ(C) = C

SLIDE 29

(CVC, CVC), (CVV, CV), (CVCCV, CVCCV), (CCVCC, CCVCC), (CCCVCV, CCCVCV), (CVVCV, CVCV), (V, V)

[Diagram: a second C:C arc filled in]

d^p_C(C) = C

SLIDE 30

(CVC, CVC), (CVV, CV), (CVCCV, CVCCV), (CCVCC, CCVCC), (CCCVCV, CCCVCV), (CVVCV, CVCV), (V, V)

[Diagram: a V:V arc filled in]

d^p_C(V) = V

SLIDE 31

(CVC, CVC), (CVV, CV), (CVCCV, CVCCV), (CCVCC, CCVCC), (CCCVCV, CCCVCV), (CVVCV, CVCV), (V, V)

[Diagram: another C:C arc filled in]

d^p_CV(C) = C

SLIDE 32

(CVC, CVC), (CVV, CV), (CVCCV, CVCCV), (CCVCC, CCVCC), (CCCVCV, CCCVCV), (CVVCV, CVCV), (V, V)

[Diagram: a V:λ arc filled in]

d^p_CV(V) = λ

SLIDE 33

(CVC, CVC), (CVV, CV), (CVCCV, CVCCV), (CCVCC, CCVCC), (CCCVCV, CCCVCV), (CVVCV, CVCV), (V, V)

[Diagram: a V:V arc filled in]

d^p_λ(V) = V

SLIDE 34

(CVC, CVC), (CVV, CV), (CVCCV, CVCCV), (CCVCC, CCVCC), (CCCVCV, CCCVCV), (CVVCV, CVCV), (V, V)

[Diagram: the transducer with the initial and final outputs now filled in: ⋊:λ, C:λ, V:λ, C:C, V:V, V:V, C:C, V:λ, C:C]

d^p(CVC)⁻¹ d(CVC) = λ,  d^p(V)⁻¹ d(V) = λ

SLIDE 35

(CVC, CVC), (CVV, CV), (CVCCV, CVCCV), (CCVCC, CCVCC), (CCCVCV, CCCVCV), (CVVCV, CVCV), (V, V)

[Diagram: the fully learned transducer, all outputs filled in: ⋊:λ, C:λ, V:λ, C:C, V:V, V:V, C:C, V:λ, C:C]

SLIDE 36

Learning input strictly local functions

  • As any two ISL_k functions share the same structure, this method ILPD-learns (identifies in the limit from polynomial time and data) the ISL_k functions

[Diagram: the shared ISL transducer structure, output labels blank]

  • This method extends to any class of functions that shares such a structure (Jardine et al., 2014)
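Putting the pieces together, here is a sketch of the learner under those assumptions (my own rendering, not the authors' implementation; it reuses lcp, dp, and dp_w from the sketches above). Since every ISL_k function shares the same state structure, with states being the last k−1 input symbols, learning reduces to filling in the output labels from a characteristic sample:

```python
def suffix(w, n):
    """The last n symbols of w (all of w if it is shorter)."""
    return w[-n:] if n > 0 else ""

def learn_isl(d, sigma=("C", "V"), k=2):
    """Fill in the outputs of the fixed ISL_k structure from sample d.
    States are input suffixes of length < k ('' plays the role of λ);
    reading a in state q leads to suffix(q + a, k - 1)."""
    prefixes = {w[:i] for w in d for i in range(len(w) + 1)}
    trans, final = {}, {}
    for q in {suffix(p, k - 1) for p in prefixes}:
        for a in sigma:
            # observed prefixes reaching q that continue with a; given a
            # characteristic sample, the shortlex-least one is reliable
            reps = sorted((p for p in prefixes
                           if suffix(p, k - 1) == q and p + a in prefixes),
                          key=lambda p: (len(p), p))
            if reps:
                trans[(q, a)] = dp_w(d, reps[0], a)  # output on reading a in q
    for w in d:  # final output: what d(w) adds beyond d^p(w)
        final[suffix(w, k - 1)] = d[w][len(dp(d, w)):]
    return trans, final

def apply_isl(trans, final, w, k=2):
    """Run the learned transducer: emit each transition's output,
    then the final output of the state reached."""
    out, q = "", ""
    for a in w:
        out += trans[(q, a)]
        q = suffix(q + a, k - 1)
    return out + final[q]

# The sample from the preceding slides (the target process deletes a V after a V):
d = {"CVC": "CVC", "CVV": "CV", "CVCCV": "CVCCV", "CCVCC": "CCVCC",
     "CCCVCV": "CCCVCV", "CVVCV": "CVCV", "V": "V"}
trans, final = learn_isl(d)
assert trans[("C", "V")] == "V" and trans[("V", "V")] == ""  # V kept after C, deleted after V
assert apply_isl(trans, final, "CVCVV") == "CVCV"  # generalizes beyond the sample
```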

SLIDE 37

Learning input strictly local functions

  • A learning algorithm for grammars that explicitly encode computational properties of phonological patterns
  • Learning for OSL (Chandlee et al., 2015) and tier-based OSL (Burness and McMullin, 2019) uses a similar (yet distinct) method
  • Learning URs uses this same structural concept (Hua et al., in progress)
  • Learning for optional ISL processes uses the same basic idea (Heinz, in progress), based on Beros and de la Higuera (2016)

SLIDE 38

Learning SL distributions

SLIDE 39

Learning strictly local distributions

  • Probability distributions can be described with the same structure.

[Diagram: the acceptor structure weighted with probabilities: ⋊:0.0, C:0.2, V:0.5, C:0.6, V:0.4, V:0.6, C:0.2, V:0.1, C:0.4]
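To make the weights concrete: the probability such a strictly 2-local model assigns to a word is the product of the weights of the transitions it takes, ending with an end-of-word transition. A sketch (the keying convention, with '' for the initial state and '⋉' for the end event, is mine):

```python
def sl2_prob(weights, word):
    """Probability of word under a strictly 2-local model: multiply the
    weight of each (previous symbol, next symbol) step, then the
    end-of-word weight of the last state reached."""
    p, state = 1.0, ""  # '' plays the role of the initial state
    for symbol in word:
        p *= weights.get((state, symbol), 0.0)  # unseen transition: probability 0
        state = symbol
    return p * weights.get((state, "⋉"), 0.0)
```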

SLIDE 40

Learning strictly local distributions

Corpus: CVC, CVV, CVCCV, CVCVC, CVCV, CVVCV

[Diagram: the structure with all counts initialized to 0: ⋊:0, C:0, V:0, C:0, V:0, V:0, C:0, V:0, C:0]

SLIDE 41

Learning strictly local distributions

Corpus: CVC, CVV, CVCCV, CVCVC, CVCV, CVVCV

[Diagram: the structure annotated with counts from the corpus: ⋊:0, C:2, V:4, C:6, V:0, V:10, C:1, V:2, C:6]

SLIDE 42

Learning strictly local distributions

Corpus: CVC, CVV, CVCCV, CVCVC, CVCV, CVVCV

[Diagram: the counts normalized per state into relative frequencies: ⋊:0/6, C:2/13, V:4/12, C:6/6, V:0/6, V:10/13, C:1/13, V:2/12, C:6/12]
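A sketch of the estimation step shown above: count each transition over the corpus and normalize within each state (plain maximum-likelihood estimation; it follows the keying convention of the sl2_prob sketch above):

```python
from collections import Counter

def learn_sl2_distribution(corpus):
    """Count (previous symbol, next symbol) transitions, one end-of-word
    event per word, and normalize the counts within each state."""
    counts = Counter()
    for word in corpus:
        state = ""  # initial state
        for symbol in word:
            counts[(state, symbol)] += 1
            state = symbol
        counts[(state, "⋉")] += 1
    totals = Counter()
    for (state, _), n in counts.items():
        totals[state] += n
    return {arc: n / totals[arc[0]] for arc, n in counts.items()}

p = learn_sl2_distribution(["CVC", "CVV", "CVCCV", "CVCVC", "CVCV", "CVVCV"])
assert p[("C", "V")] == 10 / 13  # the 10/13 arc in the diagram above
assert p[("V", "⋉")] == 4 / 12   # the 4/12 end arc
print(sl2_prob(p, "CVC"))        # 1.0 * (10/13) * (6/12) * (2/13) ≈ 0.059
```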

SLIDE 43

Learning structured distributions

  • This same technique can be extended to...

    – Learning strictly piecewise distributions: Heinz and Rogers (2010)
    – Learning SL distributions over features: Heinz and Koirala (2010)
    – ...

SLIDE 44

Review

  • Studying the computational principles that underlie phonological patterns identifies structural properties for learning:
    – phonotactics
    – processes
    – stochastic generalizations
  • A theory of phonology based on these principles derives typological predictions from learning

SLIDE 45

Open questions

  • Non-string representations are best characterized using logic (Jardine, 2016; Strother-Garcia, 2017)
  • Learning with logic is a wide-open question (Strother-Garcia et al., 2016)
  • Learning using features (Chandlee et al., 2019)
  • Learning URs (Hua et al., in progress)
  • Learning optionality (Heinz et al., in progress) and stochastic processes (wide open)
  • Distinguishing accidental versus systematic gaps (Rawski, in progress)

SLIDE 46

Open questions

  • A useful tool: https://github.com/alenaks/SigmaPie
