slide-1
SLIDE 1

T-orders in MaxEnt

Arto Anttila (Stanford University) and Giorgio Magri (CNRS) Society for Computation in Linguistics Salt Lake City | January 4-7, 2018

  • A. Anttila and G. Magri

T-orders in MaxEnt SCiL 2018 1 / 48

slide-2
SLIDE 2

Introduction

slide-7
SLIDE 7

A good linguistic theory should neither under-generate (i.e., miss some attested pattern) nor over-generate (i.e., predict some “unattestable” pattern)

A rich literature argues that Maximum Entropy (ME) is expressive enough to avoid under-generation [Zuraw and Hayes 2017]

But does ME over-generate?

Over-generation is “easy” to investigate for categorical theories such as HG: the typology is usually finite and can be exhaustively listed

The situation is very different for probabilistic theories such as ME: the typology consists of infinitely many probability distributions, which therefore cannot be exhaustively listed

slide-11
SLIDE 11

A natural way around this problem is to enumerate not the individual grammars/distributions in the typology, but the corresponding set of predicted implicational universals

An implicational universal [Greenberg 1963] is an implication $P \rightarrow \widehat{P}$, which holds whenever every language with property $P$ also has property $\widehat{P}$

The idea is that a phonological theory over-generates when it generates so many languages/grammars/distributions that implicational universals become very hard to satisfy (they involve universal quantification)

The theory thus fails to predict many implicational universals that arguably should hold of natural language phonology

slide-17
SLIDE 17

Consider a typology $T$ of categorical phonological grammars, construed as mappings from URs to SRs

Within this framework, the simplest antecedent property $P$ is the property of mapping a certain UR $x$ to a certain SR $y$: $(x, y)$

Analogously, the simplest consequent property $\widehat{P}$ is the property of mapping a certain UR $\widehat{x}$ to a certain SR $\widehat{y}$: $(\widehat{x}, \widehat{y})$

We consider the simplest implicational universal $(x, y) \xrightarrow{T} (\widehat{x}, \widehat{y})$, which holds provided each grammar in $T$ that maps $x$ to $y$ also maps $\widehat{x}$ to $\widehat{y}$

  • $\xrightarrow{T}$ is a partial order called the T-order induced by $T$ [Anttila and Andrus 2006]

For instance, any dialect of English that deletes t/d before a vowel also deletes it before a consonant: $(/\text{cost.us}/, [\text{cos.us}]) \longrightarrow (/\text{cost.me}/, [\text{cos.me}])$
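As a minimal sketch (not the authors' implementation), the categorical definition can be checked by brute force whenever the typology is finite and listed. Each grammar is assumed here to be a Python dict from URs to SRs, and the two-dialect typology is a hypothetical toy built around the t/d deletion example, not data from these slides.

```python
from typing import Dict, List, Tuple

Grammar = Dict[str, str]      # categorical grammar: UR -> SR
Mapping = Tuple[str, str]     # (UR, SR)


def t_order_holds(typology: List[Grammar], antecedent: Mapping, consequent: Mapping) -> bool:
    """(x, y) -T-> (x_hat, y_hat): every grammar mapping x to y also maps x_hat to y_hat."""
    x, y = antecedent
    x_hat, y_hat = consequent
    return all(g[x_hat] == y_hat for g in typology if g[x] == y)


# Hypothetical toy typology: dialect 1 deletes t/d only before C,
# dialect 2 deletes it both before V and before C.
typology = [
    {"/cost.us/": "[cost.us]", "/cost.me/": "[cos.me]"},
    {"/cost.us/": "[cos.us]", "/cost.me/": "[cos.me]"},
]

print(t_order_holds(typology, ("/cost.us/", "[cos.us]"), ("/cost.me/", "[cos.me]")))  # True
print(t_order_holds(typology, ("/cost.me/", "[cos.me]"), ("/cost.us/", "[cos.us]")))  # False
```

An antecedent mapping produced by no grammar in the typology entails every consequent vacuously, since `all()` over an empty selection is `True`.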

slide-21
SLIDE 21

Implicational universals can also be statistical: variable t/d deletion is more frequent before C than before V [Guy 1991; Kiparsky 1993; Coetzee 2004]

Consider a typology $T$ of probabilistic phonological grammars, construed as functions from underlying forms to probability distributions over surface forms

Within this probabilistic setting, we say that the implicational universal $(x, y) \xrightarrow{T} (\widehat{x}, \widehat{y})$ holds provided each grammar in $T$ assigns to the consequent $(\widehat{x}, \widehat{y})$ a probability at least as large as the probability it assigns to the antecedent $(x, y)$

This extension of T-orders to the probabilistic setting makes sense:

◮ the categorical definition of T-orders is a special case of the probabilistic one
◮ the categorical T-orders of HG coincide with the probabilistic T-orders of stochastic HG
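The probabilistic definition admits the same brute-force check over a finite list of grammars. In this sketch each grammar is assumed to map URs to dicts of SR probabilities, and the numbers are hypothetical. Encoding a categorical grammar as point-mass (0/1) distributions recovers the categorical check, which illustrates the first point above.

```python
from typing import Dict, List, Tuple

Dist = Dict[str, float]       # SR -> probability
PGrammar = Dict[str, Dist]    # UR -> distribution over SRs
Mapping = Tuple[str, str]     # (UR, SR)


def prob_t_order_holds(typology: List[PGrammar], antecedent: Mapping, consequent: Mapping) -> bool:
    """Every grammar assigns the consequent at least the antecedent's probability."""
    x, y = antecedent
    x_hat, y_hat = consequent
    return all(g[x_hat].get(y_hat, 0.0) >= g[x].get(y, 0.0) for g in typology)


def as_categorical(mapping: Dict[str, str]) -> PGrammar:
    """Embed a categorical grammar (UR -> SR) as point-mass distributions."""
    return {ur: {sr: 1.0} for ur, sr in mapping.items()}


ante = ("/cost.us/", "[cos.us]")
cons = ("/cost.me/", "[cos.me]")

# Hypothetical variable grammars in which deletion is more frequent before C than V.
variable = [
    {"/cost.us/": {"[cos.us]": 0.2, "[cost.us]": 0.8},
     "/cost.me/": {"[cos.me]": 0.6, "[cost.me]": 0.4}},
    {"/cost.us/": {"[cos.us]": 0.5, "[cost.us]": 0.5},
     "/cost.me/": {"[cos.me]": 0.9, "[cost.me]": 0.1}},
]
print(prob_t_order_holds(variable, ante, cons))  # True

# Categorical grammars as 0/1 distributions: the categorical T-order is recovered.
categorical = [
    as_categorical({"/cost.us/": "[cost.us]", "/cost.me/": "[cos.me]"}),
    as_categorical({"/cost.us/": "[cos.us]", "/cost.me/": "[cos.me]"}),
]
print(prob_t_order_holds(categorical, ante, cons))  # True
print(prob_t_order_holds(categorical, cons, ante))  # False
```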


slide-22
SLIDE 22

T-orders in HG

slide-27
SLIDE 27

We want to establish the HG entailment $(x, y) \xrightarrow{\text{HG}} (\widehat{x}, \widehat{y})$

Focus on the antecedent mapping $(x, y)$:

  • consider a competing loser mapping $(x, z)$
  • compute the corresponding antecedent difference vector, namely the violations of the loser $(x, z)$ minus the violations of the antecedent $(x, y)$:

$$\mathbf{C}(x, y, z) = \begin{bmatrix} C_1(x, z) - C_1(x, y) \\ C_2(x, z) - C_2(x, y) \\ \vdots \\ C_n(x, z) - C_n(x, y) \end{bmatrix}$$

The consequent difference vectors $\mathbf{C}(\widehat{x}, \widehat{y}, \widehat{z})$ are defined analogously, by pitting the consequent mapping $(\widehat{x}, \widehat{y})$ against one of its losers $(\widehat{x}, \widehat{z})$
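A minimal sketch of the difference-vector construction, assuming (hypothetically) that the tableau of a UR is stored as a dict from each candidate SR to its tuple of violations $(C_1, \dots, C_n)$. The tableau below is invented for illustration.

```python
from typing import Dict, Tuple

Profile = Tuple[int, ...]  # (C_1, ..., C_n): violation counts of one candidate


def difference_vectors(tableau: Dict[str, Profile], winner: str) -> Dict[str, Profile]:
    """For each loser z, return C(x, winner, z) = violations of z minus violations of winner."""
    w = tableau[winner]
    return {
        z: tuple(cz - cw for cz, cw in zip(profile, w))
        for z, profile in tableau.items()
        if z != winner
    }


# Hypothetical tableau for one UR x: three candidates, n = 2 constraints.
tableau_x = {"y": (0, 1), "z1": (1, 0), "z2": (2, 2)}
print(difference_vectors(tableau_x, "y"))  # {'z1': (1, -1), 'z2': (2, 1)}
```

The consequent vectors are obtained by calling the same function on the consequent tableau with $\widehat{y}$ as the winner.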

slide-32
SLIDE 32

Suppose there are only $n = 2$ constraints, so that the antecedent difference vectors can be plotted as points in the plane

The red region is their convex cone

The points which are larger than some point in this cone yield the gray region

  • The HG entailment $(x, y) \xrightarrow{\text{HG}} (\widehat{x}, \widehat{y})$ holds (for any $n$) iff each consequent difference vector lies in this gray region, i.e., it is (componentwise) larger than some vector in the cone generated by the antecedent difference vectors

This follows from the Hyperplane Separation Theorem of convex geometry through straightforward algebra [Boyd and Vandenberghe 2004]

slide-35
SLIDE 35

Equivalently, the HG entailment $(x, y) \xrightarrow{\text{HG}} (\widehat{x}, \widehat{y})$ holds iff for every consequent loser $\widehat{z}$, we can assign a non-negative coefficient $\lambda_z \geq 0$ to each antecedent loser $z$ such that:

$$\underbrace{\mathbf{C}(\widehat{x}, \widehat{y}, \widehat{z})}_{\text{consequent difference vector}} \;\geq\; \sum_{z} \lambda_z \underbrace{\mathbf{C}(x, y, z)}_{\text{antecedent difference vector}}$$

This makes sense: it says that each consequent loser $\widehat{z}$ violates the constraints more than the antecedent losers $z$, so that the consequent winner $\widehat{y}$ has to fight an easier battle than the antecedent winner $y$

While $(x, y) \xrightarrow{\text{HG}} (\widehat{x}, \widehat{y})$ is expensive to check directly (because of the universal quantification over weights), the characterization above is easy to check with linear programming (a conic feasibility problem)
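A sketch of the conic feasibility check, not the authors' code. Any LP solver could be used; to stay dependency-free, this version instead enumerates basic solutions exactly with `fractions.Fraction`. Slack columns turn $\mathbf{C}(\widehat{x}, \widehat{y}, \widehat{z}) \geq \sum_z \lambda_z \mathbf{C}(x, y, z)$ into an equality with non-negative unknowns, and every square subsystem is tried. This is exponential and only sensible for a handful of constraints and losers. The function names and the vectors in the usage lines are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Optional, Sequence

Vec = Sequence[int]  # a difference vector (one entry per constraint)


def solve_square(cols: List[List[Fraction]], b: List[Fraction]) -> Optional[List[Fraction]]:
    """Solve M t = b exactly, where M has the given columns; None if M is singular."""
    n = len(b)
    a = [[cols[j][i] for j in range(n)] + [b[i]] for i in range(n)]  # augmented [M | b]
    for c in range(n):  # Gauss-Jordan elimination
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return None
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(n):
            if r != c and a[r][c] != 0:
                f = a[r][c] / a[c][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [a[i][n] / a[i][i] for i in range(n)]


def nonneg_combination_exists(cols: List[List[Fraction]], b: List[Fraction]) -> bool:
    """Is b = sum_j t_j * cols[j] with all t_j >= 0? Tries every basic solution."""
    n = len(b)
    for idx in combinations(range(len(cols)), n):
        t = solve_square([cols[j] for j in idx], b)
        if t is not None and all(v >= 0 for v in t):
            return True
    return False


def hg_entailment_condition(antecedent: List[Vec], consequent: List[Vec]) -> bool:
    """For each consequent vector c, look for lambda >= 0 with c >= sum_z lambda_z * a_z."""
    if not consequent:
        return True
    n = len(consequent[0])
    # Slack columns e_k absorb the componentwise gap between c and sum_z lambda_z * a_z.
    slack = [[Fraction(int(i == k)) for i in range(n)] for k in range(n)]
    cols = [[Fraction(v) for v in a] for a in antecedent] + slack
    # The lambdas may differ for each consequent loser, so each c is checked separately.
    return all(nonneg_combination_exists(cols, [Fraction(v) for v in c]) for c in consequent)


antecedent = [(1, -1), (2, 1)]  # hypothetical antecedent difference vectors
print(hg_entailment_condition(antecedent, [(3, 0)]))   # True
print(hg_entailment_condition(antecedent, [(0, -1)]))  # False
```

Enumerating basic solutions is complete here because the slack columns give the system full row rank: if any non-negative solution exists, one supported on linearly independent columns exists as well.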


slide-36
SLIDE 36

T-orders in ME

slide-42
SLIDE 42

By definition, the ME entailment $(x, y) \xrightarrow{\text{ME}} (\widehat{x}, \widehat{y})$ means that

$$P^{\text{ME}}_w(x, y) = \frac{\exp\{-\sum_k w_k C_k(x, y)\}}{\sum_z \exp\{-\sum_k w_k C_k(x, z)\}} \;\leq\; \frac{\exp\{-\sum_k w_k C_k(\widehat{x}, \widehat{y})\}}{\sum_{\widehat{z}} \exp\{-\sum_k w_k C_k(\widehat{x}, \widehat{z})\}} = P^{\text{ME}}_w(\widehat{x}, \widehat{y})$$

holds for every choice of the weights $w$

For $w = 0$ (all weights equal to zero), this inequality becomes:

$$\frac{1}{\#\,\text{antecedent candidates}} \;\leq\; \frac{1}{\#\,\text{consequent candidates}}$$

Thus, a necessary condition for the ME entailment $(x, y) \xrightarrow{\text{ME}} (\widehat{x}, \widehat{y})$ is that the antecedent has at least as many candidates as the consequent

This makes sense: as the number of candidates increases, each one of them gets a smaller share of the probability mass

The first difference between T-orders in HG and ME is that T-orders in ME are subject to this candidate condition, which we will argue makes little phonological sense
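A small numerical illustration of the definition, as a sketch under assumptions: tableaux are dicts from candidate SR to violation tuple, and the two tableaux below are hypothetical. `maxent_prob` computes $P^{\text{ME}}_w$; at $w = 0$ it returns 1/#candidates. `find_counterexample` tries $w = 0$ and then random non-negative weights (the sampling range is an arbitrary choice). Finding a violating $w$ refutes the entailment, while finding none proves nothing, since the definition quantifies over all weights.

```python
import math
import random
from typing import Dict, Optional, Sequence, Tuple

Tableau = Dict[str, Tuple[float, ...]]  # candidate SR -> violations (C_1, ..., C_n)


def maxent_prob(tableau: Tableau, winner: str, w: Sequence[float]) -> float:
    """P^ME_w(x, winner): exp(-sum_k w_k C_k) of the winner, normalized over all candidates."""
    harmony = {z: sum(wk * ck for wk, ck in zip(w, prof)) for z, prof in tableau.items()}
    h_min = min(harmony.values())  # shifting by the minimum avoids overflow
    scores = {z: math.exp(-(h - h_min)) for z, h in harmony.items()}
    return scores[winner] / sum(scores.values())


def find_counterexample(ante: Tableau, y: str, cons: Tableau, y_hat: str,
                        trials: int = 10_000, seed: int = 0) -> Optional[Tuple[float, ...]]:
    """Return weights w >= 0 with P_w(x, y) > P_w(x_hat, y_hat), or None if none was found.

    Tries w = 0 first, then random weights in [0, 10]. Returning None does NOT
    establish the entailment.
    """
    n = len(next(iter(ante.values())))
    rng = random.Random(seed)
    trial_weights = [tuple([0.0] * n)]
    trial_weights += [tuple(rng.uniform(0.0, 10.0) for _ in range(n)) for _ in range(trials)]
    for w in trial_weights:
        if maxent_prob(ante, y, w) > maxent_prob(cons, y_hat, w):
            return w
    return None


# Hypothetical tableaux: the antecedent UR has 2 candidates, the consequent UR has 3.
ante = {"y": (0, 1), "z1": (1, 0)}
cons = {"y_hat": (0, 1), "z1": (1, 0), "z2": (2, 2)}
print(maxent_prob(ante, "y", (0.0, 0.0)))            # 0.5
print(find_counterexample(ante, "y", cons, "y_hat"))  # (0.0, 0.0)
```

Here the candidate condition already fails at $w = 0$ (1/2 > 1/3), whatever the violation profiles are.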

slide-46
SLIDE 46

Suppose there are only $n = 2$ constraints, so that the antecedent difference vectors can be plotted as points in the plane

The red region is now their convex hull (smaller than the convex cone considered before)

The points which are larger than some point in this hull yield the gray region

  • If the ME entailment $(x, y) \xrightarrow{\text{ME}} (\widehat{x}, \widehat{y})$ holds (for any $n$), then each consequent difference vector lies in this gray region: it is larger than some vector in the convex hull of the antecedent difference vectors

slide-50
SLIDE 50

Equivalently, if the ME entailment $(x, y) \xrightarrow{\text{ME}} (\widehat{x}, \widehat{y})$ holds, then for every consequent loser $\widehat{z}$, we can assign a non-negative coefficient $\lambda_z \geq 0$ to each antecedent loser $z$ such that:

$$\mathbf{C}(\widehat{x}, \widehat{y}, \widehat{z}) \;\geq\; \sum_{z} \lambda_z \mathbf{C}(x, y, z)$$

subject to the additional condition that $\sum_z \lambda_z = 1$

The second difference between T-orders in HG and ME is that T-orders in ME are subject to a normalization condition on the coefficients $\lambda_z$, which we will argue makes little phonological sense

This necessary condition is also sufficient when both the antecedent UR $x$ and the consequent UR $\widehat{x}$ have at most three candidates

For the general case, as long as $x$ and $\widehat{x}$ have the same number of candidates, we derive a (stronger) sufficient constraint condition, based on the Tomic-Weyl majorization theorem [Marshall et al. 2010, p. 157]
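The necessary ME condition differs from the HG one only in the normalization $\sum_z \lambda_z = 1$. This sketch uses exact basis enumeration as a dependency-free stand-in for an LP solver. Each antecedent column gets an extra coordinate 1, each slack column gets 0, and the target gets 1, so every non-negative solution satisfies $\sum_z \lambda_z = 1$. It checks only this necessary condition; the Tomic-Weyl-based sufficient condition is not implemented. Function names and vectors are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Optional, Sequence

Vec = Sequence[int]  # a difference vector (one entry per constraint)


def solve_square(cols: List[List[Fraction]], b: List[Fraction]) -> Optional[List[Fraction]]:
    """Solve M t = b exactly, where M has the given columns; None if M is singular."""
    n = len(b)
    a = [[cols[j][i] for j in range(n)] + [b[i]] for i in range(n)]  # augmented [M | b]
    for c in range(n):  # Gauss-Jordan elimination
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return None
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(n):
            if r != c and a[r][c] != 0:
                f = a[r][c] / a[c][c]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[c])]
    return [a[i][n] / a[i][i] for i in range(n)]


def nonneg_combination_exists(cols: List[List[Fraction]], b: List[Fraction]) -> bool:
    """Is b = sum_j t_j * cols[j] with all t_j >= 0? Tries every basic solution."""
    n = len(b)
    for idx in combinations(range(len(cols)), n):
        t = solve_square([cols[j] for j in idx], b)
        if t is not None and all(v >= 0 for v in t):
            return True
    return False


def me_hull_condition(antecedent: List[Vec], consequent: List[Vec]) -> bool:
    """For each consequent c: lambda >= 0, sum lambda = 1, c >= sum_z lambda_z * a_z?"""
    if not consequent:
        return True
    if not antecedent:
        return False  # sum_z lambda_z = 1 is impossible without antecedent losers
    n = len(antecedent[0])
    # Antecedent columns carry an extra 1, so their coefficients must sum to 1 ...
    cols = [[Fraction(v) for v in a] + [Fraction(1)] for a in antecedent]
    # ... while slack columns carry a 0 and do not count towards that sum.
    cols += [[Fraction(int(i == k)) for i in range(n)] + [Fraction(0)] for k in range(n)]
    return all(
        nonneg_combination_exists(cols, [Fraction(v) for v in c] + [Fraction(1)])
        for c in consequent
    )


antecedent = [(1, -1), (2, 1)]  # hypothetical antecedent difference vectors
print(me_hull_condition(antecedent, [(3, 0)]))   # True
print(me_hull_condition(antecedent, [(3, -2)]))  # False
```

With these toy vectors, $(3, -2)$ passes the HG cone test (take $\lambda = 2$ on $(1, -1)$) but fails the hull test, since every point of the hull has second coordinate at least $-1$.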


slide-51
SLIDE 51

Phonological applications 1: CV syllabification

[Prince and Smolensky 2004]


slide-52
SLIDE 52

CV syllabification


slide-53
SLIDE 53

OT and HG typologies coincide

         1     2     3     4
/CV/     CV    CV    CV    CV
/CVC/    CV    CV    CVC   CVC
/VC/     CV    V     CVC   VC
/V/      CV    V     CV    V


slide-54
SLIDE 54

Therefore OT and HG T-orders coincide

There are 100 entailments:

  • 16 have a feasible antecedent; 8 of them fail in ME
    Example that survives in ME: (VC, CVC) → (CVC, CVC)
    Example that fails in ME: (VC, VC) → (CVC, CVC)
  • 84 have an unfeasible antecedent; 56 of them fail in ME
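Because OT and HG coincide here, the categorical T-order can be read off the four-language typology (SLIDE 53) by brute force. This sketch makes one assumption about the tally: only pairs with distinct URs ($\widehat{x} \neq x$) are counted. Under that assumption the number of entailments with a feasible antecedent comes out as 16, matching the count above. The 84 unfeasible-antecedent entailments depend on the full candidate sets, which are not listed here, so they are not reproduced.

```python
from itertools import product
from typing import Dict, List, Tuple

Mapping = Tuple[str, str]  # (UR, SR)


def feasible_entailments(typology: List[Dict[str, str]]) -> List[Tuple[Mapping, Mapping]]:
    """All (x, y) -> (x_hat, y_hat) with a feasible antecedent and x_hat != x."""
    # Feasible mappings are produced by at least one grammar. The consequent of a
    # feasible antecedent must itself be feasible, so no other candidates are needed.
    feasible = sorted({(x, g[x]) for g in typology for x in g})
    result = []
    for (x, y), (x_hat, y_hat) in product(feasible, repeat=2):
        if x_hat == x:
            continue
        speakers = [g for g in typology if g[x] == y]
        if all(g[x_hat] == y_hat for g in speakers):
            result.append(((x, y), (x_hat, y_hat)))
    return result


# The four CV-syllabification languages (columns 1-4 of the typology table), UR -> SR.
cv_typology = [
    {"CV": "CV", "CVC": "CV", "VC": "CV", "V": "CV"},
    {"CV": "CV", "CVC": "CV", "VC": "V", "V": "V"},
    {"CV": "CV", "CVC": "CVC", "VC": "CVC", "V": "CV"},
    {"CV": "CV", "CVC": "CVC", "VC": "VC", "V": "V"},
]

entailments = feasible_entailments(cv_typology)
print(len(entailments))                                # 16
print((("VC", "CVC"), ("CVC", "CVC")) in entailments)  # True
```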

slide-55
SLIDE 55

8 implicational universals are lost in ME (red bars)


slide-56
SLIDE 56

What do the survivors have in common?

(a) The antecedent and the consequent have identical outputs, and the antecedent input is more marked than the consequent input. Example: (VC, CVC) → (CVC, CVC)

(b) The antecedent is the most marked faithful mapping and the consequent is the least marked faithful mapping. Example: (VC, VC) → (CV, CV)

But why only these?


slide-57
SLIDE 57

Phonological applications 2: Obstruent voicing

[Lombardi 1999; Helgason and Ringen 2008]


slide-58
SLIDE 58

Swedish (Lombardi 1999, Helgason and Ringen 2008)

(a) skog     /g#/    [g]     faithful                ‘forest’
(b) vigsel   /gs/    [ks]    regressive devoicing    ‘wedding’
    byggt    /g-t/   [kt]    regressive devoicing    ‘build-SUPINE’
    stekte   /k-d/   [kt]    progressive devoicing   ‘fry-PAST’
(c) ägde     /g-d/   [gd]    faithful                ‘own-PAST’

(a) Word-final consonants maintain contrast.
(b) If adjacent consonants disagree in voicing, both become voiceless.
(c) If adjacent consonants agree, nothing happens.


slide-59
SLIDE 59

Constraints (Lombardi 1999)

  • a. ID([VOICE]): Be faithful to [voice].
  • b. IDONSET([VOICE]): Be faithful to [voice] in onsets.
  • c. AGREE([VOICE]): CC clusters agree in [voice].
  • d. *VOICE: No [+voice] in the output.

T = voiceless obstruent, D = voiced obstruent


slide-60
SLIDE 60

Constraint violations


slide-61
SLIDE 61

6 OT languages (framed), 9 HG languages

        1        2        3        4        5        6        7        8        9
/T#/    T#       T#       T#       T#       T#       T#       T#       T#       T#
/D#/    D#       D#       D#       T#       T#       T#       T#       D#       T#
/DT/    TT       DT       TT       TT       TT       TT       TT       TT       TT
/TD/    TT       TD       DD       DD       TD       TT       TT       TD       TD
/DD/    DD       DD       DD       DD       TD       TT       DD       DD       DD
        Swedish  English  Yiddish  Polish   German   Finnish


slide-62
SLIDE 62

Nevertheless, OT and HG T-orders coincide

There are 86 entailments:

  • 24 have a feasible antecedent; 9 of them fail in ME
  • 62 have an unfeasible antecedent; 29 of them fail in ME
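The same brute-force count can be run on the nine-language table (SLIDE 61), as a sketch. Three assumptions are made here, not taken from the slides: the table is encoded as read column by column; the candidate sets are two candidates for the word-final URs and four for the clusters; and only pairs with distinct URs are counted. An unfeasible antecedent (produced by no language) entails every mapping of another UR vacuously. Under these assumptions the sketch reproduces the 24 + 62 = 86 split reported above.

```python
from itertools import product
from typing import Dict, List, Tuple

URS = ["T#", "D#", "DT", "TD", "DD"]
# Columns 1-9 of the table: the SR chosen for each UR in URS, in order.
COLUMNS = [
    ("T#", "D#", "TT", "TT", "DD"),
    ("T#", "D#", "DT", "TD", "DD"),
    ("T#", "D#", "TT", "DD", "DD"),
    ("T#", "T#", "TT", "DD", "DD"),
    ("T#", "T#", "TT", "TD", "TD"),
    ("T#", "T#", "TT", "TT", "TT"),
    ("T#", "T#", "TT", "TT", "DD"),
    ("T#", "D#", "TT", "TD", "DD"),
    ("T#", "T#", "TT", "TD", "DD"),
]
typology = [dict(zip(URS, col)) for col in COLUMNS]

# Assumed candidate sets (not listed on the slides).
CC = ["TT", "TD", "DT", "DD"]
candidates = {"T#": ["T#", "D#"], "D#": ["T#", "D#"], "DT": CC, "TD": CC, "DD": CC}


def count_entailments(typology: List[Dict[str, str]],
                      candidates: Dict[str, List[str]]) -> Tuple[int, int]:
    """(#entailments with feasible antecedent, #with unfeasible antecedent), x_hat != x."""
    mappings = [(x, y) for x, ys in candidates.items() for y in ys]
    feasible_count = unfeasible_count = 0
    for (x, y), (x_hat, y_hat) in product(mappings, repeat=2):
        if x_hat == x:
            continue
        speakers = [g for g in typology if g[x] == y]
        if not speakers:
            unfeasible_count += 1  # vacuous: no language maps x to y
        elif all(g[x_hat] == y_hat for g in speakers):
            feasible_count += 1
    return feasible_count, unfeasible_count


print(count_entailments(typology, candidates))  # (24, 62)
```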

slide-63
SLIDE 63

9 implicational universals are lost in ME (red bars)


slide-64
SLIDE 64

This one survives: (TD, TT) → (DT, TT)
‘Progressive devoicing implies regressive devoicing’
This is one of Lombardi’s key results.

Swedish:
stekte   /k-d/  [kt]  progressive devoicing  ‘fry-PAST’
vigsel   /gs/   [ks]  regressive devoicing   ‘wedding’

slide-65
SLIDE 65

This one is lost: (D#, T#) → (DT, TT)
‘Word-final devoicing implies syllable-final devoicing’
This is another of Lombardi’s key results.

Standard German:
Lob      /b#/   [p]   word-final devoicing      ‘praise’
Jagden   /g-d/  [kd]  syllable-final devoicing  ‘hunts’

slide-66
SLIDE 66

Why does (D#, T#) → (DT, TT) fail? The antecedent has fewer candidates than the consequent.
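One way to see why candidate counts matter, sketched under two assumptions: that the ME entailment (x, y) → (x̂, ŷ) requires P(ŷ | x̂) ≥ P(y | x) for every non-negative weight vector, and that the candidates are the voicing variants only (two for /D#/, four for /DT/). With all weights at zero, MaxEnt gives each candidate probability 1/n, so the antecedent /D#/ → [T#] gets 1/2 while the consequent /DT/ → [TT] gets 1/4. The violation profiles (AGREE, ID, *VOICE, IDONSET) were derived by hand assuming the second consonant of a cluster is an onset; `maxent_probability` is a hypothetical helper.

```python
import math

# Hand-derived violation profiles (AGREE, ID, *VOICE, IDONSET), assuming the
# second consonant of a cluster is an onset and a final consonant is a coda.
PROFILES = {
    "D#": {"T#": (0, 1, 0, 0), "D#": (0, 0, 1, 0)},
    "DT": {"TT": (0, 1, 0, 0), "DT": (1, 0, 1, 0),
           "DD": (0, 1, 2, 1), "TD": (1, 2, 1, 1)},
}


def maxent_probability(weights, profiles, winner):
    """P(winner | input): exp(-harmony) normalized over all candidates of the input."""
    def harmony(v):
        return sum(w * c for w, c in zip(weights, v))
    z = sum(math.exp(-harmony(v)) for v in profiles.values())
    return math.exp(-harmony(profiles[winner])) / z


if __name__ == "__main__":
    zero = (0.0, 0.0, 0.0, 0.0)
    p_ant = maxent_probability(zero, PROFILES["D#"], "T#")   # 1/2
    p_cons = maxent_probability(zero, PROFILES["DT"], "TT")  # 1/4
    print(p_ant, p_cons, p_cons >= p_ant)
```

A single weight vector with P(ŷ | x̂) < P(y | x) is enough to break the entailment, and the all-zero vector already does so whenever the antecedent has fewer candidates.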

slide-67
SLIDE 67

If we delete harmonically bounded candidates, the universal (D#, T#) → (DT, TT) is revived, because the antecedent and consequent now have the same number of candidates. (This is the green bar.)
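Simple harmonic bounding can be checked mechanically: a candidate is dropped if some other candidate has no more violations on every constraint and differs from it somewhere. This sketch uses hand-derived profiles (AGREE, ID, *VOICE, IDONSET), assuming the second consonant of a cluster is an onset; it only checks bounding by a single candidate, not collective bounding, and `bounds` and `prune` are hypothetical names.

```python
# Hand-derived violation profiles (AGREE, ID, *VOICE, IDONSET), assuming the
# second consonant of a cluster is an onset and a final consonant is a coda.
PROFILES = {
    "D#": {"T#": (0, 1, 0, 0), "D#": (0, 0, 1, 0)},
    "DT": {"TT": (0, 1, 0, 0), "DT": (1, 0, 1, 0),
           "DD": (0, 1, 2, 1), "TD": (1, 2, 1, 1)},
    "TD": {"TT": (0, 1, 0, 1), "TD": (1, 0, 1, 0),
           "DD": (0, 1, 2, 0), "DT": (1, 2, 1, 1)},
}


def bounds(a, b):
    """True iff profile a simply harmonically bounds profile b."""
    return a != b and all(x <= y for x, y in zip(a, b))


def prune(profiles):
    """Drop every candidate that is harmonically bounded by another candidate."""
    return {
        cand: v
        for cand, v in profiles.items()
        if not any(bounds(u, v) for u in profiles.values())
    }


if __name__ == "__main__":
    for inp, profiles in PROFILES.items():
        kept = prune(profiles)
        print(f"/{inp}/: {len(profiles)} -> {len(kept)} candidates {sorted(kept)}")
```

Under these assumptions /D#/ and /DT/ both keep two candidates, while /TD/ keeps three (TT, TD, DD), which is the asymmetry behind the (DT, DT) → (TD, TD) case.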

slide-68
SLIDE 68

This one survives: (DT, DT) → (TD, TD)
‘Preserving voicing in coda implies preserving it in onset’

English:
obtain, subtitle      /bt/  [bt]  voiced-voiceless
frostbite, seatbelt   /tb/  [tb]  voiceless-voiced

slide-69
SLIDE 69

The violation tableau: antecedent and consequent have the same number of candidates.

slide-70
SLIDE 70

If we delete harmonically bounded candidates, the universal (DT, DT) → (TD, TD) is lost, because now the antecedent has fewer candidates than the consequent. (This is the blue bar.)
slide-71
SLIDE 71

We also added candidates, but the German universal remains lost.

slide-72
SLIDE 72

Sample weights that break the German universal AGR(voi) ID(voi) *VOI 1.79175947 ID-ONS(voi) 0 MAX DEP 1.79175947
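A sketch of how such weights can be checked against the difference vectors listed in the appendix. Two readings are assumptions, not stated on the slide: the six vector components follow the order AGR(voi), ID(voi), *VOI, ID-ONS(voi), MAX, DEP (a reading consistent with the appendix vectors), and constraints shown without a value get weight 0. Note that 1.79175947 ≈ ln 6. Since C(x, y, z) is the loser's violation vector minus the winner's, P(y | x) = 1 / (1 + Σ_z exp(−w · C(x, y, z))). The helper `prob_from_differences` is hypothetical.

```python
import math

# Assumed component order: AGR(voi), ID(voi), *VOI, ID-ONS(voi), MAX, DEP.
# Blank cells on the slide are read as weight 0 (assumption).
WEIGHTS = (0.0, 0.0, 1.79175947, 0.0, 0.0, 1.79175947)

# Difference vectors C(x, y, z) = violations(z) - violations(y), from the appendix.
ANTECEDENT = [  # (D#, T#): losers D#, TT, DT, DD, TD
    (0, -1, 1, 0, 0, 0), (0, 0, 0, 0, 0, 1), (1, -1, 1, 0, 0, 1),
    (0, -1, 2, 0, 0, 1), (1, 0, 1, 0, 0, 1),
]
CONSEQUENT = [  # (DT, TT): losers D#, DT, T#, DD, TD
    (0, -1, 1, 0, 1, 0), (1, -1, 1, 0, 0, 0), (0, 0, 0, 0, 1, 0),
    (0, 0, 2, 1, 0, 0), (1, 1, 1, 1, 0, 0),
]


def prob_from_differences(weights, diffs):
    """MaxEnt probability of the winner, given loser-minus-winner difference vectors."""
    total = 1.0  # the winner's own term, exp(0)
    for d in diffs:
        total += math.exp(-sum(w * c for w, c in zip(weights, d)))
    return 1.0 / total


if __name__ == "__main__":
    p_ant = prob_from_differences(WEIGHTS, ANTECEDENT)   # P(T# | D#), about 0.718
    p_cons = prob_from_differences(WEIGHTS, CONSEQUENT)  # P(TT | DT), about 0.396
    print(f"{p_ant:.3f} {p_cons:.3f} universal broken: {p_cons < p_ant}")
```

Under this reading, the /DT/ candidate T# differs from TT only on MAX, which carries weight 0, so it takes as much probability as TT and pulls P(TT | DT) below P(T# | D#).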

slide-73
SLIDE 73

More puzzles. The following five survive:

  • (DD, TD) → (T#, T#) (any language)
  • (DT, DT) → (T#, T#) (any language)
  • (D#, T#) → (T#, T#) (any language)
  • (TD, TD) → (T#, T#) (any language)
  • (D#, D#) → (T#, T#) (any language)

‘No spontaneous final voicing’

slide-74
SLIDE 74

More puzzles. The following five are lost:

  • (DT, TT) → (T#, T#) (any language)
  • (TD, DD) → (T#, T#) (any language)
  • (DD, TT) → (T#, T#) (any language)
  • (DD, DD) → (T#, T#) (any language)
  • (TD, TT) → (T#, T#) (any language)

‘No spontaneous final voicing’

slide-75
SLIDE 75

Preliminary conclusions

What have we learned about MaxEnt T-orders?

  • They exist, but are relatively sparse.
  • The fact that so many reasonable implicational universals fail suggests that MaxEnt overgenerates.

Software for computing OT, HG, and MaxEnt T-orders will be made available soon.

slide-76
SLIDE 76

Thank you!

slide-77
SLIDE 77

References

Anttila, Arto, and Curtis Andrus. 2006. T-orders. Manuscript and software, Stanford University. URL www.stanford.edu/~anttila/research/torders/t-order-manual.pdf.

Boyd, Stephen, and Lieven Vandenberghe. 2004. Convex optimization. Cambridge University Press.

Coetzee, Andries W. 2004. What it means to be a loser: Non-optimal candidates in Optimality Theory. Doctoral dissertation, University of Massachusetts, Amherst.

Greenberg, Joseph H. 1963. Universals of language. Cambridge, MA: MIT Press.

Guy, G. 1991. Explanation in variable phonology. Language Variation and Change 3:1–22.

Helgason, Pétur, and Catherine Ringen. 2008. Voicing and aspiration in Swedish stops. Journal of Phonetics 36.4:607–628.

Kiparsky, Paul. 1993. An OT perspective on phonological variation. Handout, available at http://www.stanford.edu/~kiparsky/Papers/nwave94.

Lombardi, Linda. 1999. Positional faithfulness and voicing assimilation in Optimality Theory. Natural Language and Linguistic Theory 17:267–302.

Marshall, A., I. Olkin, and B. Arnold. 2010. Inequalities: Theory of majorization and its applications. Springer Series in Statistics. Springer.

Prince, Alan, and Paul Smolensky. 2004. Optimality Theory: Constraint interaction in generative grammar. Oxford: Blackwell. As Technical Report CU-CS-696-93, Department of Computer Science, University of Colorado at Boulder, and Technical Report TR-2, Rutgers Center for Cognitive Science, Rutgers University, New Brunswick, NJ, April 1993. Also available as ROA 537 version.

Zuraw, Kie, and Bruce Hayes. 2017. Intersecting constraint families: an argument for Harmonic Grammar. Language 93.3:497–546.

slide-78
SLIDE 78

Appendix

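Reminder: the HG entailment $(x, y) \xrightarrow{\text{HG}} (\hat{x}, \hat{y})$ holds iff for every consequent loser $\hat{z}$, we can assign a non-negative coefficient $\lambda_z \ge 0$ to each antecedent loser $z$ such that:

$$\underbrace{C(\hat{x}, \hat{y}, \hat{z})}_{\text{consequent difference vector}} \;\ge\; \sum_{z} \lambda_z \, \underbrace{C(x, y, z)}_{\text{antecedent difference vector}}$$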

slide-79
SLIDE 79

Appendix

Reminder: the HG entailment $(x, y) \xrightarrow{\text{HG}} (\hat{x}, \hat{y})$ holds iff for every consequent loser $\hat{z}$, we can assign a non-negative coefficient $\lambda_z \ge 0$ to each antecedent loser $z$ such that:

$$\underbrace{C(\hat{x}, \hat{y}, \hat{z})}_{\text{consequent difference vector}} \;\ge\; \sum_{z} \lambda_z \, \underbrace{C(x, y, z)}_{\text{antecedent difference vector}}$$

This condition holds for the entailment (D#, T#) → (DT, TT) with all λ’s equal to 0 except the one corresponding to the antecedent loser D#, whose difference vector is C(D#, T#, D#) = [0 −1 1 0 0 0]:

C(DT, TT, D#) = [0 −1 1 0 1 0] ≥ λ·C(D#, T#, D#), with λ = 1
C(DT, TT, DT) = [1 −1 1 0 0 0] ≥ λ·C(D#, T#, D#), with λ = 1
C(DT, TT, T#) = [0 0 0 0 1 0] ≥ λ·C(D#, T#, D#), with λ = 0
C(DT, TT, DD) = [0 0 2 1 0 0] ≥ λ·C(D#, T#, D#), with λ = 0
C(DT, TT, TD) = [1 1 1 1 0 0] ≥ λ·C(D#, T#, D#), with λ = 0
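The componentwise check on this slide is easy to automate. The sketch below copies the vectors and λ values from the slide; the helper names `geq` and `hg_condition_holds` are hypothetical.

```python
# Consequent difference vectors C(DT, TT, z), copied from the slide.
CONSEQUENT = {
    "D#": (0, -1, 1, 0, 1, 0),
    "DT": (1, -1, 1, 0, 0, 0),
    "T#": (0, 0, 0, 0, 1, 0),
    "DD": (0, 0, 2, 1, 0, 0),
    "TD": (1, 1, 1, 1, 0, 0),
}
# The only antecedent difference vector with a non-zero coefficient.
C_ANT_D = (0, -1, 1, 0, 0, 0)  # C(D#, T#, D#)
# Coefficient on C(D#, T#, D#) chosen for each consequent loser.
LAMBDA = {"D#": 1, "DT": 1, "T#": 0, "DD": 0, "TD": 0}


def geq(u, v):
    """Componentwise u >= v."""
    return all(a >= b for a, b in zip(u, v))


def hg_condition_holds():
    """Check C(DT, TT, z) >= lambda_z * C(D#, T#, D#) for every consequent loser z."""
    return all(
        geq(CONSEQUENT[z], tuple(LAMBDA[z] * c for c in C_ANT_D))
        for z in CONSEQUENT
    )


if __name__ == "__main__":
    print(hg_condition_holds())  # True
```

For the losers T#, DD and TD the coefficients sum to 0 rather than 1. That is allowed in HG, but not under the additional ME condition.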

slide-80
SLIDE 80

Appendix

Reminder: if the ME entailment $(x, y) \xrightarrow{\text{ME}} (\hat{x}, \hat{y})$ holds, then for every consequent loser $\hat{z}$, we can assign a non-negative coefficient $\lambda_z \ge 0$ to each antecedent loser $z$ such that:

$$\underbrace{C(\hat{x}, \hat{y}, \hat{z})}_{\text{consequent difference vector}} \;\ge\; \sum_{z} \lambda_z \, \underbrace{C(x, y, z)}_{\text{antecedent difference vector}}$$

subject to the additional condition that

$$\sum_{z} \lambda_z = 1$$

slide-81
SLIDE 81

Appendix

Reminder: if the ME entailment $(x, y) \xrightarrow{\text{ME}} (\hat{x}, \hat{y})$ holds, then for every consequent loser $\hat{z}$, we can assign a non-negative coefficient $\lambda_z \ge 0$ to each antecedent loser $z$ such that:

$$\underbrace{C(\hat{x}, \hat{y}, \hat{z})}_{\text{consequent difference vector}} \;\ge\; \sum_{z} \lambda_z \, \underbrace{C(x, y, z)}_{\text{antecedent difference vector}}$$

subject to the additional condition that

$$\sum_{z} \lambda_z = 1$$

This condition fails, for instance, for the consequent difference vector C(DT, TT, T#) = [0 0 0 0 1 0], since the antecedent difference vectors are as follows:

C(D#, T#, D#) = [0 −1 1 0 0 0]
C(D#, T#, TT) = [0 0 0 0 0 1]
C(D#, T#, DT) = [1 −1 1 0 0 1]
C(D#, T#, DD) = [0 −1 2 0 0 1]
C(D#, T#, TD) = [1 0 1 0 0 1]
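The failure can be spelled out in a few lines. Write $\lambda_{D\#}, \lambda_{TT}, \lambda_{DT}, \lambda_{DD}, \lambda_{TD}$ for the coefficients of the five antecedent difference vectors above; the condition requires $[0\;0\;0\;0\;1\;0] \ge \sum_z \lambda_z\, C(D\#, T\#, z)$ componentwise, with all $\lambda_z \ge 0$ and $\sum_z \lambda_z = 1$:

$$
\begin{aligned}
\text{3rd component:}\quad & 0 \;\ge\; \lambda_{D\#} + \lambda_{DT} + 2\lambda_{DD} + \lambda_{TD}
  \;\Longrightarrow\; \lambda_{D\#} = \lambda_{DT} = \lambda_{DD} = \lambda_{TD} = 0,\\
\textstyle\sum_z \lambda_z = 1:\quad & \lambda_{TT} = 1,\\
\text{6th component:}\quad & 0 \;\ge\; \lambda_{TT} + \lambda_{DT} + \lambda_{DD} + \lambda_{TD} = 1,
\end{aligned}
$$

which is impossible. Without the requirement $\sum_z \lambda_z = 1$, as in HG, setting every $\lambda_z = 0$ satisfies the inequality, since $[0\;0\;0\;0\;1\;0] \ge 0$.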
