slide-1
SLIDE 1

Learning reduplication with 2-way finite-state transducers

Hossep Dolatian & Jeffrey Heinz ICGI Wrocław University of Science and Technology Sept 7, 2018

ICGI 2018 Dolatian & Heinz (1)

slide-2
SLIDE 2

Copying sequential information

Copying (=duplication, doubling, mimicry)

  • biological sciences
  • planning and control (robotics)
  • natural language. . .

→ word-formation or morphology (=reduplication)

ICGI 2018 Dolatian & Heinz (3)

slide-3
SLIDE 3

Copying in Natural Language

Many languages (∼83%) use reduplication to mark meaning

Indonesian plural

  • buku → buku∼buku, ‘book’→‘books’
  • wanita→ wanita∼wanita, ‘woman’→‘women’

Tohono O’odham plural

  • kotwa → kok∼twa, ‘shoulder’ → ‘shoulders’
  • sikul → sis∼kul, ‘younger sibling’ → ‘younger siblings’

(Rubino, 2013; Cohn, 1989) and (Anderson and Smith 2017)

ICGI 2018 Dolatian & Heinz (4)

slide-4
SLIDE 4

In this talk, we. . .

  • Present (the old) deterministic 2-way finite-state transducer (FST) as a new way to represent reduplicative processes;
  • Identify a subclass of those transducers which covers most reduplication patterns we studied;
  • Show how this subclass is learnable from examples.

The trick is to decompose the 2-way FSTs into the concatenation of 1-way FSTs and learn the 1-way FSTs with known methods.

ICGI 2018 Dolatian & Heinz (5)

slide-5
SLIDE 5

Studying Linguistic Variation/Typology

Requires two books:

  • “encyclopedia of categories”
  • “encyclopedia of types”

Wilhelm Von Humboldt

ICGI 2018 Dolatian & Heinz (6)

slide-6
SLIDE 6

Basic typology of reduplication

  • Typology: Wide variation in how natural languages copy:

(1) Total reduplication = unbounded copy (∼83%)
    wanita→wanita∼wanita ‘woman’→‘women’ (Indo.)
(2) Partial reduplication = bounded copy (∼75%)
    a. C: gen→g∼gen ‘to sleep’→‘to be sleeping’ (Shilh)
    b. CV: guyon→gu∼guyon ‘to jest’→‘to jest repeatedly’ (Sundanese)
    c. CVC: takki→tak∼takki ‘leg’→‘legs’ (Agta)
    d. CVCV: banagañu→bana∼banagañu ‘return’ (Dyirbal)

ICGI 2018 Dolatian & Heinz (7)

slide-7
SLIDE 7

Basic typology of reduplication

And it gets wider:

(3) Triplication: roar→roar∼roar-roar ‘give a shudder’→‘continue to shudder’ (Mokilese)
(4) Final reduplication: erasi→erasi∼rasi ‘he is sick’→‘he continues being sick’ (Siriono)
(5) Subconstituent copying: ku-haata→ku-haata∼haata ‘to ferment’→‘to start fermenting’ (KiHehe)
(6) Left-right copying: lú:t’uxw→lúxw∼lút’uxw ‘to value’→‘... (plural)’ (Nisgha)

ICGI 2018 Dolatian & Heinz (8)

slide-8
SLIDE 8

Basic Typology of Reduplication

(7) Syllable-counting (Mandarin):
    a. jang→jang∼jang ‘sheet’→‘every sheet’
    b. jialuen→meei jialuen ‘gallon’→‘every gallon’
(8) Echo reduplication: tras→tras∼vras ‘grief’→‘grief schmief’ (Hindi)

ICGI 2018 Dolatian & Heinz (9)

slide-9
SLIDE 9

Computational Nature of Word Formation

Word formation processes are rational relations, analyzable with (1-way) finite-state methods (Beesley and Karttunen, 2003; Roark and Sproat, 2007)

ICGI 2018 Dolatian & Heinz (10)

slide-10
SLIDE 10

1-way FSTs and reduplication

  • 1-way FSTs memorize a large but finite list of strings and their copies
  • For partial reduplication = bounded # of segments copied:
    ▸ Extension: productively modeled
    ▸ Size: burdensome because of state explosion
    ▸ Intension: treated as ‘remembering’ and not ‘copying’
  • For total reduplication = unbounded # of segments copied:
    ▸ Extension: if we assume a finite lexicon, can be modeled ...
    ▸ but can’t be extended productively to new words
    ▸ output language is non-regular: Lww = { ww | w ∈ Σ* }
    ▸ Size: larger state explosion!
    ▸ Intension: can’t capture productivity + ‘remembering’ again
  • Appendix: more contrasts + difference in ‘remembering’ vs. ‘copying’ using origin semantics (Bojańczyk, 2014)
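
The non-regularity of Lww claimed above can be checked with the standard pumping-lemma argument. The sketch below is not from the slides; it assumes Σ contains at least two distinct symbols, written a and b here.

```latex
% Sketch: L_{ww} is not regular (assumes a, b \in \Sigma with a \neq b).
Suppose $L_{ww} = \{\, ww \mid w \in \Sigma^* \,\}$ were regular with pumping length $p$.
Take $s = a^p b\, a^p b \in L_{ww}$ and any split $s = xyz$ with $|xy| \le p$ and $|y| = k \ge 1$.
Then $y = a^k$ lies inside the first block of $a$'s, so
\[
  x y^2 z \;=\; a^{p+k}\, b\, a^{p}\, b .
\]
Any string $uu$ that contains exactly two $b$'s and ends in $b$ has $u = a^m b$,
so it has the form $a^m b\, a^m b$.
Since $p + k \neq p$, we get $x y^2 z \notin L_{ww}$, contradicting the pumping lemma.
```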

ICGI 2018 Dolatian & Heinz (11)

slide-11
SLIDE 11

Responses to the 1-way problem

  • Approximate:
    ▸ Stick to 1-way FST approximations (Walther, 2000; Cohen-Sygal and Wintner, 2006; Beesley and Karttunen, 2003; Hulden, 2009)
    ▸ But: these impose un-linguistic restrictions (e.g. a finite bound on word size, ...) and don’t directly capture reduplication
  • Non-finite-state mechanisms:
    ▸ MCFGs (Albro, 2005), HPSG (Crysmann, 2017), pushdown acceptors with queues (Savitch, 1989)
    ▸ But: those are recognizers, not transducers

ICGI 2018 Dolatian & Heinz (12)

slide-12
SLIDE 12

2-way FSTs

  • Mainstream FSTs are 1-way FSTs because they read the input once from left to right.
  • 2-way FSTs are an enriched class of FSTs that can go back and forth on the input (Engelfriet and Hoogeboom, 2001; Savitch, 1982).
  • A 2-way FST can do everything a 1-way FST can do, and more.
  • Equivalences to logical transductions and other kinds of machines:
    2-way FSTs = MSO-definable transductions = Streaming String Transducers
    (Courcelle, 1997; Engelfriet and Hoogeboom, 2001; Alur, 2010)

ICGI 2018 Dolatian & Heinz (13)

slide-13
SLIDE 13

Definition

2-way deterministic FST

A 2-way, deterministic FST is a six-tuple (Q,Σ⋉,Γ,q0,F,δ) such that:

  • Q is a finite set of states,
  • Σ⋉ = Σ ∪ {⋊,⋉} is the input alphabet,
  • Γ is the output alphabet,
  • q0 ∈ Q is the initial state,
  • F ⊆ Q is the set of final states,
  • δ : Q × Σ⋉ → Q × Γ∗ × D is the transition function, where the direction D = {−1, 0, +1}.
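
The definition can be made concrete with a small simulator. The following is a minimal sketch, not the authors' implementation: the names `run_2way`, `Delta` and `TOTAL_RED` are illustrative, the label `Σ` is used as a wildcard for any non-boundary symbol (and, as an output, for "copy the symbol just read"), and the assignment of arcs to states q0 to q4 for total reduplication is an assumption read off the arc order shown on the slides.

```python
from typing import Dict, Set, Tuple

LEFT, RIGHT = "⋊", "⋉"   # word-boundary symbols from the definition
ANY = "Σ"                # arc label "any ordinary symbol"; as output, "copy it"

# (state, input label) -> (next state, output, direction)
Delta = Dict[Tuple[str, str], Tuple[str, str, int]]


def run_2way(delta: Delta, q0: str, finals: Set[str], word: str,
             max_steps: int = 10_000) -> str:
    """Run a deterministic 2-way FST on ⋊word⋉ and return its output."""
    tape = LEFT + word + RIGHT
    state, pos, out = q0, 0, []
    for _ in range(max_steps):
        if pos == len(tape):            # head has moved past ⋉: stop
            break
        if pos < 0:
            raise ValueError("head fell off the left edge of the tape")
        sym = tape[pos]
        label = sym if sym in (LEFT, RIGHT) else ANY
        if (state, label) not in delta:
            raise ValueError(f"no transition from {state} on {sym!r}")
        state, output, direction = delta[(state, label)]
        out.append(sym if output == ANY else output)   # Σ:Σ copies the symbol read
        pos += direction
    else:
        raise RuntimeError("step limit reached; the machine may be looping")
    if state not in finals:
        raise ValueError(f"halted in non-final state {state}")
    return "".join(out)


# Total reduplication; arc-to-state assignment is an assumption (see lead-in).
TOTAL_RED: Delta = {
    ("q0", LEFT):  ("q1", "", +1),    # ⋊:λ:+1
    ("q1", ANY):   ("q1", ANY, +1),   # Σ:Σ:+1  first copy
    ("q1", RIGHT): ("q2", "∼", -1),   # ⋉:∼:−1  write boundary, turn around
    ("q2", ANY):   ("q2", "", -1),    # Σ:λ:−1  rewind silently
    ("q2", LEFT):  ("q3", "", +1),    # ⋊:λ:+1
    ("q3", ANY):   ("q3", ANY, +1),   # Σ:Σ:+1  second copy
    ("q3", RIGHT): ("q4", "", +1),    # ⋉:λ:+1
}

print(run_2way(TOTAL_RED, "q0", {"q4"}, "wanita"))
```

The step limit is only a guard for hand-written machines that loop; it is not part of the formal definition.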

ICGI 2018 Dolatian & Heinz (14)

slide-14
SLIDE 14

2-way FSTs - Total reduplication

  • Total reduplication copies a string of unbounded size
    wanita→wanita∼wanita ‘woman’→‘women’ (Indo.)
  • 2-way FST reads the input left-to-right (+1), goes back (−1), and reads it again (+1)

[Diagram: 2-way FST with states q0 (start), q1, q2, q3, q4. Arc labels (input:output:direction): ⋊:λ:+1, Σ:Σ:+1, ⋉:∼:−1, Σ:λ:−1, ⋊:λ:+1, Σ:Σ:+1, ⋉:λ:+1]

ICGI 2018 Dolatian & Heinz (15)

slide-15
SLIDE 15

2-way FSTs - Total Reduplication

  • Indonesian example: wanita→wanita∼wanita
  • Working example: bye→bye∼bye

Input: ⋊ b y e ⋉

[Animation on the diagram of the previous slide: states q0 (start), q1, q2, q3, q4; arcs ⋊:λ:+1, Σ:Σ:+1, ⋉:∼:−1, Σ:λ:−1, ⋊:λ:+1, Σ:Σ:+1, ⋉:λ:+1. Each frame highlights the arc taken and shows the output so far.]

  Step  Read  Arc taken   Output so far
  1     ⋊     ⋊:λ:+1      (empty)
  2     b     Σ:Σ:+1      b
  3     y     Σ:Σ:+1      b y
  4     e     Σ:Σ:+1      b y e
  5     ⋉     ⋉:∼:−1      b y e ∼
  6     e     Σ:λ:−1      b y e ∼
  7     y     Σ:λ:−1      b y e ∼
  8     b     Σ:λ:−1      b y e ∼
  9     ⋊     ⋊:λ:+1      b y e ∼
  10    b     Σ:Σ:+1      b y e ∼ b
  11    y     Σ:Σ:+1      b y e ∼ b y
  12    e     Σ:Σ:+1      b y e ∼ b y e
  13    ⋉     ⋉:λ:+1      b y e ∼ b y e

Output: b y e ∼ b y e

ICGI 2018 Dolatian & Heinz (16)

slide-31
SLIDE 31

Observe

Total reduplication

R(x) = f(x) ⋅ g(x) where f = g = id.

ICGI 2018 Dolatian & Heinz (17)

slide-32
SLIDE 32

2-way FSTs - Partial Reduplication

  • Agta initial-CVC copying: takki→tak∼takki
  • Working example: copies→cop∼copies

Input: ⋊ c o p i e s ⋉

[Animation on a 2-way FST diagram: states q0 (start), q1, q2, q3, q4, q5, q6; arcs ⋊:λ:+1, C:C:+1, V:V:+1, C:C:−1, Σ:λ:−1, ⋊:∼:+1, Σ:Σ:+1, ⋉:λ:+1. Each frame highlights the arc taken and shows the output so far.]

  Step  Read  Arc taken   Output so far
  1     ⋊     ⋊:λ:+1      (empty)
  2     c     C:C:+1      c
  3     o     V:V:+1      c o
  4     p     C:C:−1      c o p
  5     o     Σ:λ:−1      c o p
  6     c     Σ:λ:−1      c o p
  7     ⋊     ⋊:∼:+1      c o p ∼
  8     c     Σ:Σ:+1      c o p ∼ c
  9     o     Σ:Σ:+1      c o p ∼ c o
  10    p     Σ:Σ:+1      c o p ∼ c o p
  11    i     Σ:Σ:+1      c o p ∼ c o p i
  12    e     Σ:Σ:+1      c o p ∼ c o p i e
  13    s     Σ:Σ:+1      c o p ∼ c o p i e s
  14    ⋉     ⋉:λ:+1      c o p ∼ c o p i e s

Output: c o p ∼ c o p i e s

ICGI 2018 Dolatian & Heinz (18)

slide-49
SLIDE 49

Observe

Partial reduplication

R(x) = f(x) ⋅ g(x) where f truncates x and g = id.
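
The decomposition R(x) = f(x) ⋅ g(x) can be written directly as function composition. This is a minimal sketch under stated assumptions: `trunc_cvc`, `identity` and `reduplicate` are illustrative names, the vowel set is a toy inventory over plain orthography, and the boundary symbol ∼ is written between the two parts, as the transducers on the preceding slides do.

```python
VOWELS = set("aeiou")   # assumption: toy vowel inventory over plain orthography


def trunc_cvc(x: str) -> str:
    """f: keep only the initial consonant-vowel-consonant of x."""
    if len(x) < 3 or x[0] in VOWELS or x[1] not in VOWELS or x[2] in VOWELS:
        raise ValueError(f"{x!r} does not begin with CVC")
    return x[:3]


def identity(x: str) -> str:
    """g (and also f, for total reduplication): return x unchanged."""
    return x


def reduplicate(x: str, f=trunc_cvc, g=identity, boundary: str = "∼") -> str:
    """R(x) = f(x) · g(x), with the boundary symbol written between the parts."""
    return f(x) + boundary + g(x)


print(reduplicate("takki"))                 # partial: f truncates, g = id
print(reduplicate("wanita", f=identity))    # total: f = g = id
```

Swapping `f` is the only change needed to move between total and partial reduplication, which is the point of the decomposition.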

ICGI 2018 Dolatian & Heinz (19)

slide-50
SLIDE 50

Building a computationally explicit typology of patterns

RedTyp

  • SQL database of reduplicative processes + 2-way FST
  • Modeled 138 reduplicative processes across 90 languages using 57 2-way FSTs
  • Average # of states = 8.8
  • Largest 2-way FST has 30 states (would be 1000s for a 1-way FST)

  • https://github.com/jhdeov/RedTyp

ICGI 2018 Dolatian & Heinz (20)

slide-51
SLIDE 51

Encyclopedia of Categories: Regular languages

[Diagram: subregular hierarchy. Language classes: Regular, Non-Counting, Locally Threshold Testable, Locally Testable, Piecewise Testable, Strictly Local, Strictly Piecewise. Word models: Successor, Precedence. Logics: Monadic Second Order, First Order, Propositional, Conjunctions of Negative Literals]

(McNaughton and Papert, 1971; Rogers and Pullum, 2011; Rogers et al., 2013)

ICGI 2018 Dolatian & Heinz (21)

slide-52
SLIDE 52

Encyclopedia of Categories: Rational functions

[Diagram: hierarchy of function classes: Rational, R-Sequential, L-Sequential, Input Strictly Local, R-Output Strictly Local, L-Output Strictly Local]

(Chandlee et al., 2014, 2015)

ICGI 2018 Dolatian & Heinz (22)

slide-53
SLIDE 53

Left OSL subclass

  • The state of any FST computing a Left Output Strictly k-Local (k-LOSL) function depends only on the last k-1 segments written to the output tape and the last symbol read on the input tape.
  • k-LOSL functions are identifiable in quadratic time and data with k-OSLFIA.

(Chandlee et al., 2015)

ICGI 2018 Dolatian & Heinz (23)

slide-54
SLIDE 54

Truncation belongs to LOSL

  • English nicknames: truncate name to first (C)VC

(9) a. /dʒɛfɹi/ → [dʒɛf] ‘Jeffrey’→‘Jeff’
    b. /deɪvɪd/ → [deɪv] ‘David’→‘Dave’
    c. /ælən/ → [æl] ‘Alan’→‘Al’

  • 3-OSL because we keep track of the last 2 segments outputted + the current segment (and skip anything after the first VC)
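
Truncation to the first (C)VC can be sketched as a 1-way machine whose state is determined by what it has already written. The code below is an illustration, not the authors' FST: the state names follow the labels λ, C, CV, VC on the next slides, the vowel set is a toy inventory, the arc from λ on a vowel is an assumption about how vowel-initial names are handled, and each character is treated as one segment (so affricates and diphthongs written with two characters are out of scope). The boundary symbols ⋊ and ⋉ are left implicit.

```python
VOWELS = set("aeiouæɛəɪ")   # assumption: toy vowel inventory, one character per segment


def seg_class(sym: str) -> str:
    """Classify a segment as vowel (V) or consonant (C)."""
    return "V" if sym in VOWELS else "C"


# (state, segment class) -> (next state, whether the segment is written out)
TRUNC = {
    ("λ", "C"):  ("C", True),
    ("λ", "V"):  ("CV", True),    # assumption: vowel-initial names skip the C state
    ("C", "V"):  ("CV", True),
    ("CV", "C"): ("VC", True),
    ("VC", "C"): ("VC", False),   # after the first VC, read on but write nothing
    ("VC", "V"): ("VC", False),
}


def truncate(word: str) -> str:
    """Map a word to its initial (C)VC, reading strictly left to right."""
    state, out = "λ", []
    for sym in word:
        key = (state, seg_class(sym))
        if key not in TRUNC:
            raise ValueError(f"no transition from {state} on {sym!r}")
        state, emit = TRUNC[key]
        if emit:
            out.append(sym)
    if state != "VC":
        raise ValueError(f"{word!r} has no initial (C)VC")
    return "".join(out)


print(truncate("sæmjəl"))   # 'Samuel', with the schwa written as ə
```

Each state records at most the last two segments written, which is why the function fits the 3-OSL description above.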

ICGI 2018 Dolatian & Heinz (24)

slide-55
SLIDE 55

Truncation belongs to LOSL

  • English: /dʒɛfri/→[dʒɛf]
  • Working example: ‘Samuel’ /sæmjəl/→[sæm]

Input: ⋊ s æ m j ə l ⋉

[Animation on a 1-way FST diagram: states q0 (start), λ, C, CV, VC, qf; arcs ⋊:λ, C:C, V:V, C:C, Σ:λ, ⋉:λ, and a second V:V arc. Each frame highlights the arc taken and shows the output so far.]

  Step  Read  Arc taken  Output so far
  1     ⋊     ⋊:λ        (empty)
  2     s     C:C        s
  3     æ     V:V        s æ
  4     m     C:C        s æ m
  5     j     Σ:λ        s æ m
  6     ə     Σ:λ        s æ m
  7     l     Σ:λ        s æ m
  8     ⋉     ⋉:λ        s æ m

Output: s æ m

ICGI 2018 Dolatian & Heinz (25)

slide-68
SLIDE 68

Observe

Partial reduplication

R(x) = f(x) ⋅ g(x) where f truncates x and g = id.

Both f and g are 1-way LOSL functions!

ICGI 2018 Dolatian & Heinz (26)

slide-69
SLIDE 69

Defining C-LOSL

A function R is C-k-LOSL iff R(x) = f(x) ⋅ g(x) & both f,g are k-LOSL.

  • Right C-OSL can be similarly defined
  • Can be generalized to the concatenation of n 1-way functions.
  • Can define a class of C-Sequential functions in this way and so on.

ICGI 2018 Dolatian & Heinz (27)

slide-70
SLIDE 70

Concatenating 1-way FSTs

Left OSL 1-way FSTs for Trunc(x) and ID(x)

[Diagram: Trunc(x): states q0 (start), λ1, C1, CV1, CVC1, qf; arcs ⋊:λ:+1, C:C:+1, V:V:+1, C:C:+1, Σ:λ:+1, ⋉:λ:+1. ID(x): states q0 (start), λ2, qf; arcs ⋊:λ:+1, Σ:Σ:+1, ⋉:λ:+1]

Make C-OSL by concatenation

ICGI 2018 Dolatian & Heinz (28)

slide-71
SLIDE 71

Concatenating 1-way FSTs

[Diagram: the two 1-way FSTs joined through a new state “rewind”. States: q0 (start), λ1, C1, CV1, CVC1, qf, rewind, λ2, qf. Arcs: ⋊:λ:+1, C:C:+1, V:V:+1, C:C:+1, Σ:λ:+1, ⋉:∼:−1, Σ:λ:−1, ⋊:λ:+1, Σ:Σ:+1, ⋉:λ:+1]
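
The construction on this slide can be sketched as a function that takes two 1-way FSTs and returns one 2-way FST. This is an illustration under stated assumptions, not the authors' code: `concatenate`, `run`, `TRUNC` and `IDENT` are illustrative names, both 1-way machines are assumed to start in a state called `q0` with an arc on ⋊, arcs may be labelled with a literal symbol, a class (C or V over a toy vowel set) or Σ, and `<copy>` marks "write the symbol just read".

```python
from typing import Dict, Tuple

LEFT, RIGHT, ANY, COPY = "⋊", "⋉", "Σ", "<copy>"
VOWELS = set("aeiou")   # assumption: toy vowel inventory

OneWay = Dict[Tuple[str, str], Tuple[str, str]]        # (state, label) -> (state, output)
TwoWay = Dict[Tuple[str, str], Tuple[str, str, int]]   # same, plus a direction


def concatenate(t1: OneWay, t2: OneWay, boundary: str = "∼") -> TwoWay:
    """Join two 1-way FSTs into one 2-way FST through a 'rewind' state."""
    two: TwoWay = {}
    for (q, a), (r, out) in t1.items():
        if a == RIGHT:   # end of the first pass: write the boundary, turn around
            two[(q + "1", a)] = ("rewind", out + boundary, -1)
        else:
            two[(q + "1", a)] = (r + "1", out, +1)
    two[("rewind", ANY)] = ("rewind", "", -1)            # Σ:λ:−1
    nxt, out = t2[("q0", LEFT)]                          # re-enter T2 at ⋊
    two[("rewind", LEFT)] = (nxt + "2", out, +1)
    for (q, a), (r, out) in t2.items():
        two[(q + "2", a)] = (r + "2", out, +1)
    return two


def run(two: TwoWay, start: str, final: str, word: str) -> str:
    """Run the 2-way FST on ⋊word⋉, matching the most specific arc label first."""
    tape = LEFT + word + RIGHT
    state, pos, out = start, 0, []
    while 0 <= pos < len(tape):
        sym = tape[pos]
        labels = [sym] if sym in (LEFT, RIGHT) else [sym, "V" if sym in VOWELS else "C", ANY]
        label = next((a for a in labels if (state, a) in two), None)
        if label is None:
            raise ValueError(f"no transition from {state} on {sym!r}")
        state, template, step = two[(state, label)]
        out.append(template.replace(COPY, sym))
        pos += step
    if state != final:
        raise ValueError(f"halted in non-final state {state}")
    return "".join(out)


TRUNC: OneWay = {
    ("q0", LEFT): ("λ", ""), ("λ", "C"): ("C", COPY), ("C", "V"): ("CV", COPY),
    ("CV", "C"): ("CVC", COPY), ("CVC", ANY): ("CVC", ""), ("CVC", RIGHT): ("qf", ""),
}
IDENT: OneWay = {
    ("q0", LEFT): ("λ", ""), ("λ", ANY): ("λ", COPY), ("λ", RIGHT): ("qf", ""),
}

print(run(concatenate(TRUNC, IDENT), "q01", "qf2", "copies"))
```

Concatenating `IDENT` with itself gives total reduplication, so the same construction covers both cases shown earlier.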

ICGI 2018 Dolatian & Heinz (29)

slide-72
SLIDE 72

What’s C-OSL

  • 87% of RedTyp is C-OSL.

(10) Total reduplication: wanita→wanita∼wanita (Indonesian)
(11) Partial reduplication
     a. CV: guyon→gu∼guyon (Sundanese)
     b. CVC: takki→tak∼takki (Agta)
     c. CVCV: banagañu→bana∼banagañu (Dyirbal)

  • The other ∼13% appear to require:
  • 1. Concatenation of Sequential functions.
  • 2. Compositions of OSL or Sequential functions.
  • More in appendix

ICGI 2018 Dolatian & Heinz (30)

slide-73
SLIDE 73

Learning C-OSL

  • Learning k-C-OSL can be reduced to learning the 1-way k-OSL functions
  • 1. If the boundary is overtly marked, then this is straightforward.
  • 2. If the boundary is not overtly marked, then the learning reduces to finding this boundary (still an open problem).
  • Reducing learning C-OSL to learning OSL modularizes learning and builds on pre-existing results in GI.

ICGI 2018 Dolatian & Heinz (31)

slide-74
SLIDE 74

BB-COSLL - learning with boundaries

BB-COSLL: Boundary-Based C-OSL learner

  • Input: boundary-enriched data sample S and a positive integer k
    ▸ Example S = {(cat, cat∼cat), (bird, bir∼bird), (music, mus∼music), . . . }
  • Algorithm:
  • 1. Break up S into two data sets
       H1 = {(w, u) ∣ (w, u ∼ v) ∈ S} and H2 = {(w, v) ∣ (w, u ∼ v) ∈ S}
  • 2. Submit H1 and H2 and k to OSLFIA, which outputs 1-way transducers T1 and T2.
  • 3. Concatenate T1 and T2.
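
The three steps can be sketched as a thin wrapper around an OSL learner. This is only an outline under stated assumptions: `split_sample` and `bb_cosll` are illustrative names, and OSLFIA itself is not implemented here; `learn_osl` and `concatenate` are hypothetical callables that the caller must supply.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]


def split_sample(sample: Iterable[Pair], boundary: str = "∼") -> Tuple[List[Pair], List[Pair]]:
    """Step 1: split each (w, u∼v) into (w, u) for H1 and (w, v) for H2."""
    h1: List[Pair] = []
    h2: List[Pair] = []
    for w, out in sample:
        if out.count(boundary) != 1:
            raise ValueError(f"expected exactly one boundary in {out!r}")
        u, v = out.split(boundary)
        h1.append((w, u))
        h2.append((w, v))
    return h1, h2


def bb_cosll(sample: Iterable[Pair], k: int,
             learn_osl: Callable, concatenate: Callable,
             boundary: str = "∼"):
    """Steps 1-3; learn_osl stands in for OSLFIA and is supplied by the caller."""
    h1, h2 = split_sample(sample, boundary)
    t1 = learn_osl(h1, k)          # step 2: learn the first 1-way transducer
    t2 = learn_osl(h2, k)          #         and the second
    return concatenate(t1, t2)     # step 3


S = [("cat", "cat∼cat"), ("bird", "bir∼bird"), ("music", "mus∼music")]
H1, H2 = split_sample(S)
print(H1)
print(H2)
```

All of the learning happens inside `learn_osl`; the wrapper only separates the data at the boundary, which is why an overt boundary makes the reduction straightforward.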

ICGI 2018 Dolatian & Heinz (32)

slide-75
SLIDE 75

BB-COSLL Discussion

  • 1. BB-COSLL provably learns C-OSL functions iff H1,H2 are characteristic samples for T1,T2 respectively.
  • 2. What about the boundary?
    ▸ BB-COSLL depends on the boundary ∼ in the input.
    ▸ The boundary is an overt manifestation of the concatenation in the “derivation” of the reduplicated form.
    ▸ Potential phonological evidence for the boundary (appendix)
    ▸ If this boundary is unknown, learning reduces to morpheme segmentation, which is an open problem.

(Goldsmith et al., 2017)

ICGI 2018 Dolatian & Heinz (33)

slide-76
SLIDE 76

2FST Discussion

  • Reduplication is widespread in natural languages.
  • It has generally been considered beyond finite-state analysis.
  • However, it is amenable to an analysis with 2-way FSTs.
  • How come 2-way FSTs have been off the radar of computational linguists all this time??? Maybe transducers are harder to study?
    ▸ While 1DFA = 2DFA = 1NFA = 2NFA = MSO, among these transducer classes only 2DFT = MSO.
      [Diagram: hierarchy of the transducer classes 2NFT, 2DFT = MSO, 1NFT, 1fNFT, 1DFT]
    ▸ Note 50 years passed between Büchi’s theorem and Engelfriet and Hoogeboom’s.

(Filiot and Reynier, 2016)

ICGI 2018 Dolatian & Heinz (34)

slide-77
SLIDE 77

Conclusions

  • Contributions
  • 1. RedTyp database
  • 2. Show that 2-way FSTs can model virtually the entire typology of reduplication
  • 3. ∼87% of this typology belongs to the C-OSL subclass.
  • 4. Simple learning algorithm for C-OSL which builds off of OSLFIA but uses a boundary-enriched sample.
  • Future research:
  • 1. Adding to RedTyp.
  • 2. Studying the non-C-OSL transformations more carefully.
  • 3. Learning without boundaries.
  • 4. Studying the trade-off between 1-way vs. 2-way FSTs for learning partial reduplication.

ICGI 2018 Dolatian & Heinz (35)

slide-78
SLIDE 78

Thank you and Thank you Wrocław [vrɔtswaf]!

ICGI 2018 Dolatian & Heinz (36)

slide-79
SLIDE 79

References

Albro, D. M. (2005). Studies in Computational Optimality Theory, with Special Reference to the Phonological System of Malagasy. Ph. D. thesis, University of California, Los Angeles, Los Angeles.

Alur, R. (2010). Expressiveness of streaming string transducers. In Proceedings of the 30th Annual Conference on Foundations of Software Technology and Theoretical Computer Science, Volume 8, pp. 1–12.

Beesley, K. R. and L. Karttunen (2003). Finite-state morphology: Xerox tools and techniques. CSLI Publications.

Bojańczyk, M. (2014). Transducers with origin information. In J. Esparza, P. Fraigniaud, T. Husfeldt, and E. Koutsoupias (Eds.), Automata, Languages, and Programming, Berlin, Heidelberg, pp. 26–37. Springer.

Chandlee, J., R. Eyraud, and J. Heinz (2014). Learning strictly local subsequential functions. Transactions of the Association for Computational Linguistics 2, 491–503.

Chandlee, J., R. Eyraud, and J. Heinz (2015, July). Output strictly local functions. In Proceedings of the 14th Meeting on the Mathematics of Language (MoL 2015), Chicago, USA, pp. 112–125.

Cohen-Sygal, Y. and S. Wintner (2006). Finite-state registered automata for non-concatenative morphology. Computational

ICGI 2018 Dolatian & Heinz (37)