SLIDE 1

Productivity, Reuse, and Competition between Generalizations

Timothy J. O’Donnell, MIT

SLIDE 2

Two Problems

1. Problem of Competition
2. Problem of Productivity

SLIDE 3

The Problem of Competition

When multiple ways of expressing a meaning exist, how do we decide between them?

SLIDE 4

Competition

(e.g., Aronoff, 1976; Plag, 2003; Rainer, 1988; van Marle, 1986)

  • Examples:
    • Computed v. stored: goed v. went
    • Computed v. computed: splinged v. splang (Albright & Hayes, 2003)
    • Multi-way competition
SLIDE 5

Multi-way Competition

  • Hierarchical and recursive structures often give rise to multi-way competition between different combinations of stored and computed subexpressions.

SLIDE 6

Multi-way Competition

(Aronoff, 1976)

SLIDE 7

Competition Resolution

  • Competition is resolved, in general, following the elsewhere condition (subset principle, Pāṇini’s principle, blocking, pre-emption, etc.).

  • The “more specific” way of expressing a meaning is preferred to the “more general” way.

  • Variability in strength of preferences:
    • goed v. went
    • curiosity v. curiousness, depulsiveness v. depulsivity (Aronoff & Schvaneveldt, 1978)
    • tolerance v. toleration (i.e., doublets; e.g., Kiparsky, 1982a)
  • More frequent items are more strongly preferred (e.g., Marcus et al., 1992)

SLIDE 8

The Problem of Productivity

Why can some potential generalizations actually generalize productively, while others remain “inert” in existing expressions?

SLIDE 9

Productivity

Suffix   Productivity
-ness    Productive (with Adjectives)
-ity     Context-Dependent
-th      Unproductive
SLIDE 10

Productivity

Suffix   Productivity
-ness    Productive (with Adjectives)
-ity     Semi-productive
-th      Unproductive

Existing -ness forms: circuitousness, grandness, orderliness, pretentiousness, cheapness, ...
Novel: pine-scentedness (from pine-scented)
SLIDE 11

Productivity

Suffix   Productivity
-ness    Productive (with Adjectives)
-ity     Semi-productive
-th      Unproductive
SLIDE 12

Productivity

Suffix   Productivity
-ness    Productive (with Adjectives)
-ity     Context-Dependent
-th      Unproductive

Existing -ity forms: verticality, tractability, severity, seniority, inanity, electricity, ...
Novel: *pine-scentedity
SLIDE 13

Productivity

Suffix   Productivity
-ness    Productive (with Adjectives)
-ity     Context-Dependent: productive after -ile, -al, -able, -ic, -(i)an
-th      Unproductive

Novel: subsequentiable → subsequentiability
SLIDE 14

Productivity

Suffix   Productivity
-ness    Productive (with Adjectives)
-ity     Context-Dependent
-th      Unproductive
SLIDE 15

Productivity

Suffix   Productivity
-ness    Productive (with Adjectives)
-ity     Context-Dependent
-th      Unproductive

Existing -th forms: warmth, width, truth, depth, ...
Novel: *coolth
SLIDE 16

Productivity

Suffix   Productivity
-ness    Productive (with Adjectives)
-ity     Context-Dependent
-th      Unproductive
SLIDE 17

Productivity and Reuse

Suffix   Productivity
-ness    Most Productive
-ity     Less Productive
-th      Least Productive

  • 1. How can differences in productivity be represented?
  • 2. How can differences be learned?
SLIDE 18

Unifying the Problems

  • Fundamental problem: how to produce/comprehend linguistic expressions under uncertainty about how meaning is conventionally encoded by combinations of stored items and composed structures.
  • Productivity and competition are often just special cases of this general problem.
SLIDE 19

Approach

  • Build a model of computation and storage under uncertainty, based on an inference which optimizes a tradeoff between productivity (computation) and reuse (storage).
  • This implicitly explains many specific cases of productivity and competition.
SLIDE 20

Case Studies

  • 1. What distributional factors signal productivity?
    • Explaining Baayen’s hapax-based measures.
  • 2. How is competition resolved?
    • Derives the elsewhere condition.
  • 3. Multi-way competition.
    • Explains the productivity and ordering generalization.
    • Handles exceptional cases of paradoxical suffix combinations.
SLIDE 21

Talk Outline

  • 1. Introduction to productivity and reuse with Fragment Grammars (with Noah Goodman).
  • 2. Case Studies on Productivity and Competition.

SLIDE 23

The Framework: Three Ideas

  • 1. Model how expressions are built by composing stored pieces.
  • 2. Treat productivity (computation) and reuse (storage) as properties which must be determined on a case-by-case basis.
  • 3. Infer correct patterns of storage and computation by balancing the ability to predict input data against simplicity biases.
SLIDE 24

A Simple Formal Model: Fragment Grammars

1. Formalization of the hypothesis space.

  • Arbitrary contiguous (sub)trees.

2. Formalization of the inference problem.

  • Probabilistic conditioning to find a good balance between computation and storage.

SLIDE 26

Underlying Computational System

[Tree diagram: a base grammar deriving N from Adj and Adj from V, e.g. V agree → Adj agree+able → N agree+able+ity]
SLIDE 30

Hypothesis Space

Any contiguous subtree can be stored in memory and reused as if it were a single rule from the starting grammar.
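
A minimal sketch of this hypothesis space (my illustration, not the talk’s implementation): the code below enumerates the contiguous fragments of a small derivation tree, where each child is either cut off (leaving its category as a variable) or expanded further. The Tree class and node labels are assumptions for the example.

```python
# Enumerate the contiguous fragments of a derivation tree.
from dataclasses import dataclass
from itertools import product
from typing import Tuple

@dataclass
class Tree:
    label: str
    children: Tuple["Tree", ...] = ()

def fragments(t: Tree):
    if not t.children:
        yield t.label
        return
    options = []
    for c in t.children:
        opts = [c.label]               # cut here: child left as a variable
        if c.children:
            opts.extend(fragments(c))  # or keep a fragment of the child
        options.append(opts)
    for combo in product(*options):
        yield "(" + t.label + " " + " ".join(combo) + ")"

# agree+able+ity as (N (Adj (V agree) -able) -ity):
tree = Tree("N", (Tree("Adj", (Tree("V", (Tree("agree"),)), Tree("-able"))),
                  Tree("-ity")))
for f in fragments(tree):
    print(f)  # includes "(N Adj -ity)": -ity stored with its Adj slot open
```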

SLIDE 31

Hypothesis Space

[Tree diagrams: examples of contiguous subtrees of agree+able+ity that can be stored and reused as units]
SLIDE 34

Computation with Stored Items

[Tree diagrams: stored fragments combining with grammar rules to derive agree+able+ity]
SLIDE 37

A Simple Formal Model: Fragment Grammars

1. Formalization of the hypothesis space.

  • Arbitrary contiguous (sub)trees.

2. Formalization of the inference problem.

  • Probabilistic conditioning to find a good balance between computation and storage.
SLIDE 38

Inference Problem

Find and store the subcomputations which best predict the distribution of forms in the linguistic input, taking into account prior expectations of simplicity.
SLIDE 39

Prior Expectations

Two Opposing Simplicity Biases

  • 1. Fewer, more reusable stored items.
    • Chinese Restaurant Process prior on lexica.
  • 2. Small amounts of computation.
    • Geometric decrease in probability in the number of random choices (both biases are sketched below).
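
Both biases in one small sketch (standard formulas under assumed parameter values; the concentration alpha and stop probability are invented, and this is not the talk’s code):

```python
import math

def crp_log_prob(counts, alpha=1.0):
    # Chinese Restaurant Process probability of a usage pattern:
    # alpha^K * prod_k (c_k - 1)! / prod_{n=0}^{N-1} (n + alpha)
    K, N = len(counts), sum(counts)
    return (K * math.log(alpha)
            + sum(math.lgamma(c) for c in counts)     # log (c_k - 1)!
            - sum(math.log(n + alpha) for n in range(N)))

def geometric_log_prob(n_choices, p_stop=0.5):
    # Probability falls geometrically in the number of random choices.
    return n_choices * math.log(1 - p_stop) + math.log(p_stop)

# Bias 1: one stored item reused 8 times beats 8 single-use items.
print(crp_log_prob([8]), crp_log_prob([1] * 8))      # ~ -2.08 vs ~ -10.61
# Bias 2: a 2-choice derivation beats a 5-choice derivation.
print(geometric_log_prob(2), geometric_log_prob(5))  # ~ -2.08 vs ~ -4.16
```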

SLIDE 40

Example Input

[Tree diagrams: four observed forms: agree+able+ness, count+able+ity, agree+able+ity, agree+able+ity]
SLIDE 41

Storage of Minimal, General Structures

[Tree diagrams: the four input forms derived entirely from minimal, general rules]
SLIDE 42

Computation per Expression

[Tree diagram: with only minimal rules stored, deriving agree+able+ity requires a product of rule probabilities, one factor per random choice: P(rule 1) × P(rule 2) × P(rule 3)]
SLIDE 46

Sharing Across Expressions

[Tree diagrams: with minimal storage, the four input forms share the small general rules across derivations]
SLIDE 48

Storage of Maximal, Specific Structures

[Tree diagrams: each input form stored whole as a single unit]
SLIDE 49

Computation per Expression

[Tree diagram: a stored whole form is retrieved in a single step: P(stored tree)]
SLIDE 51

Sharing Across Expressions

[Tree diagrams: sharing across the four input forms when maximal, specific structures are stored]
SLIDE 53

Storage of Intermediate Structures

[Tree diagrams: fragments of intermediate size stored, e.g. agree+able stored as a unit]
SLIDE 54

Computation per Expression

[Tree diagram: deriving agree+able+ity from a stored intermediate fragment takes two random choices: P(fragment) × P(rule)]
SLIDE 57

Sharing Across Expressions

[Tree diagrams: the stored intermediate fragment agree+able is shared across agree+able+ness, agree+able+ity, and agree+able+ity]
SLIDE 59

Remarks on Inference Tradeoff

  • Nothing fancy here.
  • The two simplicity biases are just the Bayesian prior and likelihood, applied to the computation and storage problem.
  • Equivalently: lexicon code length and data code length given the lexicon in (two-part) MDL (in symbols below).
  • Can be connected with many other frameworks.
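
In symbols (the standard two-part identity, my rendering):

```latex
\underbrace{-\log P(\mathrm{Lexicon}, \mathrm{Data})}_{\text{total code length}}
  = \underbrace{-\log P(\mathrm{Lexicon})}_{\text{lexicon code length}}
  + \underbrace{-\log P(\mathrm{Data} \mid \mathrm{Lexicon})}_{\text{data code length given lexicon}}
```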
SLIDE 60

Inference as Conditioning

  • Inference Process: Probabilistic Conditioning.
  • Define the joint model:

P(Data, Fragments) = P(Data | Fragments) * P(Fragments)

  • P(Data | Fragments): the likelihood (derivation probabilities).
  • P(Fragments): the prior (lexicon probabilities).
SLIDE 63

Inference as Conditioning

  • Inference Process: Probabilistic Conditioning.
  • Condition on a particular dataset:

P(Fragments | Data) ∝ P(Data | Fragments) * P(Fragments)
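
Written out with its normalizing constant (standard Bayes’ rule, my rendering):

```latex
P(\mathrm{Fragments} \mid \mathrm{Data})
  = \frac{P(\mathrm{Data} \mid \mathrm{Fragments})\, P(\mathrm{Fragments})}
         {\sum_{\mathrm{Fragments}'} P(\mathrm{Data} \mid \mathrm{Fragments}')\, P(\mathrm{Fragments}')}
```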

SLIDE 64

Probabilistic Conditioning

  • Intuition: a two-step algorithm (sketched below).
  • 1. Throw away lexicons not consistent with the data.
  • 2. Renormalize the remaining lexicons so that they sum to one.
  • Maximally conservative: relative beliefs are always conserved.
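
A minimal sketch of the two-step intuition, assuming a toy prior over three hypothetical lexicons and a hard 0/1 consistency check (all names and numbers invented):

```python
# Step 1: throw away lexicons inconsistent with the data.
# Step 2: renormalize the survivors; relative beliefs are conserved.
prior = {"lexicon_A": 0.5, "lexicon_B": 0.3, "lexicon_C": 0.2}  # hypothetical
consistent = {"lexicon_A": False, "lexicon_B": True, "lexicon_C": True}

survivors = {h: p for h, p in prior.items() if consistent[h]}
Z = sum(survivors.values())
posterior = {h: p / Z for h, p in survivors.items()}

print(posterior)  # {'lexicon_B': 0.6, 'lexicon_C': 0.4}
# 0.3 : 0.2 before conditioning equals 0.6 : 0.4 after: the B:C ratio survives.
```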

SLIDE 65

The Mathematical Model: Fragment Grammars

  • Generalization of Adaptor Grammars (Johnson et al., 2007).
  • Allows storing of partial trees.
  • Framework first proposed in an MDL setting by de Marcken, 1996.
  • Related to work on probabilistic tree-substitution grammars (e.g., Bod, 2003; Cohn, 2010; Goodman, 2003; Zuidema, 2007; Post, 2013).
SLIDE 66

Talk Outline

  • 1. Introduction to productivity and reuse with Fragment Grammars (with Noah Goodman).
  • 2. Case Studies on Productivity and Competition.
SLIDE 67

Case Studies

  • Other approaches to productivity and reuse.
  • 1. What distributions signal productivity?
  • 2. How is competition resolved?
  • 3. Multi-way competition.
SLIDE 69

Four Strategies for Productivity and Reuse

  • 5 Formal Models.
  • Capture historical proposals from the literature.
  • Minimally different.
  • Same inputs, same underlying space of representations.
  • State-of-the-art probabilistic models.
SLIDE 75

Full-Parsing (FP)

(MAP Multinomial-Dirichlet Context-Free Grammars)

  • All generalizations are productive.
  • Minimal abstract units.
  • Johnson et al., 2007a.
  • Estimated on token frequency.

[Tree diagrams: the four input forms each fully parsed from minimal rules]
SLIDE 76

Full-Listing (FL)

(MAP All-Adapted Adaptor Grammars)

  • Store the whole form after first use (recursively).
  • Maximally specific units.
  • Johnson et al., 2007.
  • Base system estimated on type frequencies.
  • Formalization of classical lexical redundancy rules.

[Tree diagrams: each input form stored whole, contrasted with the Full-Parsing analysis]
SLIDE 77

Exemplar-Based (EB)

(Data-Oriented Parsing)

  • Store all generalizations consistent with the input.
  • Two formalizations: Data-Oriented Parsing 1 (DOP1; Bod, 1998) and Data-Oriented Parsing: Equal-Node Estimator (ENDOP; Goodman, 2003).
  • Argued to be an exemplar model of syntax.

[Tree diagrams: all fragments consistent with the input stored, contrasted with the Full-Parsing and Full-Listing analyses]
SLIDE 78

Inference-Based (IB)

(Fragment Grammars)

  • Store the set of subcomputations which best explains the data.
  • Formalization: Fragment Grammars (O’Donnell et al., 2009).
  • Inference depends on the distribution of tokens over types.
  • The only model which infers variables.

[Tree diagrams: inferred fragments of intermediate size, contrasted with the Full-Parsing, Full-Listing, and Exemplar-Based analyses]
SLIDE 79

Empirical Domains

                   Past Tense (Inflectional)   Derivational Morphology
Productive         +ed (walked)                +ness (goodness)
Context-Dependent  I → æ (sang)                +ity (ability)
Unproductive       suppletion (go/went)        +th (width)
SLIDE 80

Case Studies

  • Other approaches to productivity and reuse.
  • 1. What distributions signal productivity?
  • 2. How is competition resolved?
  • 3. Multi-way competition.
SLIDE 81

Empirical Evaluations

                   Past Tense                  Derivational Morphology
Productive         +ed (walked)                +ness (goodness)
Context-Dependent  I → æ (sang)                +ity (ability)
Unproductive       suppletion (go/went)        +th (width)
SLIDE 82

What (Distributional) Cues Signal Productivity?

  • Many proposals in the literature:
    • Type frequency.
    • Token frequency (combined with something else, e.g., entropy).
    • Heterogeneity of context (generalized type frequency).
SLIDE 83

Top 5 Most Productive Suffixes

[Charts: top 5 most productive suffixes as estimated by each model: Full-Listing (MAG), Full-Parsing (MDPCFG), Inference-Based (FG), Exemplar (ENDOP/GDMN), Exemplar (DOP1)]
SLIDE 86

What Evidences Productivity?

  • Crucial evidence of productivity: the use of a lexical item (morpheme, rule, etc.) to generate new forms.
  • Distributional consequence: a large proportion of low-frequency forms.
SLIDE 87

What Predicts Productivity?

SLIDE 88

Top 5 Most Productive Suffixes

[Charts: the Inference-Based (FG) model’s top suffixes show a high proportion of low-frequency types; the other models’ top suffixes instead track high token frequency or high type frequency]
SLIDE 90
Baayen’s Hapax-Based Measures

  • Baayen’s P / P∗ (e.g., Baayen, 1992).
  • Estimators of productivity based on the proportion of frequency-1 words (hapax legomena) in an input corpus (sketched below).
  • Various derivations:
    • Rate of vocabulary change in an urn model.
    • Good-Turing estimation.
  • Fundamentally, a rule of thumb.
  • Only defined for single-affix estimation.
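
A rough sketch of the category-conditioned measure (my reconstruction from Baayen, 1992; the word lists and counts are invented): P divides the number of hapaxes containing an affix by the total number of tokens containing that affix.

```python
from collections import Counter

def baayen_P(freqs):
    # P = n1 / N: frequency-1 types (hapaxes) with the affix,
    # over total tokens with the affix (Baayen, 1992).
    n1 = sum(1 for f in freqs.values() if f == 1)
    return n1 / sum(freqs.values())

ness = Counter({"goodness": 40, "grandness": 2,
                "pine-scentedness": 1, "crispness": 1})  # invented counts
th = Counter({"warmth": 300, "width": 250, "depth": 400})

print(baayen_P(ness))  # 2/44 ~ 0.045: hapaxes present, some productivity
print(baayen_P(th))    # 0.0: no hapaxes, unproductive
```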
SLIDE 91

Productivity Correlations

(P/P∗ values from Hay & Baayen, 2002)

[Chart: correlation with P/P∗ of productivity estimates from MDPCFG (Full-Parsing), MAG (Full-Listing), DOP1 (Exemplar-Based), ENDOP (Exemplar-Based), and FG (Inference-Based)]
SLIDE 92

Fragment Grammars and Hapaxes

  • For the case of single affixes, Fragment Grammars behave approximately as if they were using hapaxes.
  • This is not an explicit assumption of the model.
  • The model is about how words are built. Given the fact that some new words are built, the behavior arises automatically.
  • Generalizes to multi-way competition.
SLIDE 93

Case Studies

  • Other approaches to productivity and reuse.
  • 1. What distributions signal productivity?
  • 2. How is competition resolved?
  • 3. Multi-way competition.
SLIDE 94

Empirical Domains

                   Past Tense                  Derivational Morphology
Productive         +ed (walked)                +ness (goodness)
Context-Dependent  I → æ (sang)                +ity (ability)
Unproductive       suppletion (go/went)        +th (width)
SLIDE 95

Crucial Facts

  • Defaultness: the regular rule applies when all else fails.
  • Blocking: the existence of an irregular blocks the regular rule.
  • In this domain preferences are sharp.
SLIDE 96

How Can Correct Inflection Be Represented?

[Diagrams: candidate analyses for irregulars and regulars]
SLIDES 98-107

Correct Inflection

[Charts: log odds of producing the correct past-tense form (scale roughly −4 to 8) for irregular, regular, and unattested verbs, by model: FP = Full-Parsing (Multinomial-Dirichlet CFG), FL = Full-Listing (Adaptor Grammars), E1 = Exemplar (Data-Oriented Parsing 1), E2 = Exemplar (DOP: ENDOP), IB = Inference-Based (Fragment Grammars). Successive slides highlight the preference for the correct past form, the preference for the incorrect past form, irregulars in training, regulars in training, regulars and irregulars not in training, and then the Exemplar (DOP1), Full-Listing, and Inference-Based models in turn.]
SLIDE 108

Why Does Blocking Occur?

  • Consequence of two principles.
  • Law of Conservation of Belief: hypotheses that predict a greater variety of observed datasets place less probability on each.
  • Conservativity of Conditioning: posterior distributions have the same relative probabilities as prior distributions.
SLIDES 109-111

Law of Conservation of Belief

[Diagrams: hypotheses that spread probability over more possible datasets assign less probability to each]
SLIDE 112

Observation

[Diagram: a particular dataset x is observed]
SLIDE 113

Conservativity

P(H1 | D) ∝ P(H1)        P(H2 | D) ∝ P(H2)

[Diagram: after observing x, the surviving hypotheses keep their relative probabilities]
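
The derivation behind this is one line (my rendering), for any two hypotheses that explain the observed data equally well:

```latex
\frac{P(H_1 \mid D)}{P(H_2 \mid D)}
  = \frac{P(D \mid H_1)\, P(H_1)}{P(D \mid H_2)\, P(H_2)}
  = \frac{P(H_1)}{P(H_2)}
  \quad \text{whenever } P(D \mid H_1) = P(D \mid H_2).
```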
SLIDE 114

Past Tense

SLIDE 115

Elsewhere

(Kiparsky, 1973; Anderson, 1969; Kiparsky, 1982a; Andrews, 1982)

  • Don’t need the elsewhere condition as an independent stipulation (cf. subset principle, pre-emption, etc.).
  • When a choice must be made between two analyses/derivations, prefer the one which predicts the form more “tightly,” i.e., with the highest P(form | meaning).
  • More general than the original statement:
    • Any factor influencing P(form | meaning): input conditions on rules, frequency, etc.
    • Stored v. stored, stored v. computed, computed v. computed, etc.
SLIDE 116

Case Studies

  • Other approaches to productivity and reuse.
  • 1. What distributions signal productivity?
  • 2. How is competition resolved?
  • 3. Multi-way competition.
SLIDE 117

Empirical Domains

                   Past Tense                  Derivational Morphology
Productive         +ed (walked)                +ness (goodness)
Context-Dependent  I → æ (sang)                +ity (ability)
Unproductive       suppletion (go/went)        +th (width)
SLIDE 118

Hierarchical Structure

  • Derivational morphology is hierarchical and recursive.
  • Multiple suffixes can appear in a word.
SLIDES 119-121

Many Hypotheses

[Tree diagrams: alternative analyses of agree+able+ity, with different subtrees stored as units]
SLIDE 122

Empirical Problem: Suffix Ordering

  • Many combinations of suffixes do not appear in words.
  • Fabb (1988):
    • 43 suffixes.
    • 663 possible pairs (taking into account selectional restrictions).
    • Only 50 exist.
SLIDE 123

Empirical Problem: Suffix Ordering

  • Many theories:
    • Level-ordering (e.g., Siegel, 1974)
    • Selectional-restriction based (e.g., Plag, 2003)
    • Complexity-based ordering (Hay, 2004)
  • Focus on two phenomena:
    • The productivity and ordering generalization
    • Paradoxical suffix combinations
SLIDE 124

Productivity and Ordering Generalization

(Hay, 2004)

On average, more productive suffixes appear after less productive suffixes (Hay, 2002; Hay & Plag, 2004; Plag et al., 2009).
SLIDE 125

Productivity and Ordering Generalization

(Hay, 2004)

  • Implicit in many earlier theories (e.g., the Level-Ordering Generalization of Siegel, 1974).
  • Hay argues for a processing-based view (Complexity-Based Ordering).
  • But: it follows as a logically necessary consequence of the pattern of storage and computation.
SLIDE 126

Productivity and Ordering Generalization

  • Intuition (see the sketch below):
    • Less productive suffixes are stored as parts of words.
    • More productive suffixes can attach to anything, including morphologically complex stored forms.
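
A toy rendering of that intuition (my illustration with invented forms, not the talk’s model): if an unproductive suffix occurs only frozen inside stored wholes while a productive suffix is a free rule, then “unproductive inside, productive outside” orders are derivable and the reverse never is.

```python
# -th exists only inside stored wholes; -ness is a productive rule.
stored_wholes = {"warmth", "width"}       # -th never attaches on its own

def attach_ness(word):                    # productive rule: X -> X + "ness"
    return word + "ness"

derivable = stored_wholes | {attach_ness(w) for w in stored_wholes}
print(sorted(derivable))  # hypothetical 'warmthness': -th before -ness
# Nothing can generate e.g. 'goodnessth': no productive process introduces
# -th, so it can never appear outside a more productive suffix.
```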
SLIDE 127

But: Paradoxical Suffix Combinations

  • Combinations of suffixes which violate the Productivity and Ordering Generalization (as well as the predictions of other earlier theories).
  • -ability, -ation, -istic, -mental
SLIDE 128

Multi-way Competition: -ity v. -ness

  • In general, -ness is more productive than -ity.
  • -ity is more productive after: -ile, -able, -(i)an, -ic.

(Anshen & Aronoff, 1981; Aronoff & Schvaneveldt, 1978; Cutler, 1980)
SLIDE 129

Two Frequent Combinations: -ivity v. -bility

  • -ive + -ity: -ivity (e.g., selectivity).
    • Speakers prefer to use -ness with novel words (Aronoff & Schvaneveldt, 1978): depulsiveness > depulsivity.
  • -ble + -ity: -bility (e.g., sensibility).
    • Speakers prefer to use -ity with novel words (Anshen & Aronoff, 1981): remortibility > remortibleness.
SLIDES 130-139

-ivity v. -bility

[Charts: log preference between -ness and -ity (scale roughly −5 to 5) for novel stems ending in -ive and in -ble, shown against the predicted pattern for each model: Full-Parsing (MDPCFG), Full-Listing (Adaptor Grammars), Exemplar (DOP1), Exemplar (GDMN), and Inference (Fragment Grammars). Successive slides highlight the preference for -ness, the preference for -ity, the preceding suffix -ive, the preceding suffix -ble, and then each model in turn.]
SLIDE 140

Multi-way Competition

  • Explains the productivity and ordering generalization.
  • Explains difficult cases of competition involving paradoxical suffix combinations.
SLIDE 141

Global Summary

  • Inference based on the distribution of tokens over types.
    • Derives Baayen’s hapax-based theory.
  • View the choice of whether to retrieve or compute as an inference.
    • Derives the elsewhere condition.
  • Storage of arbitrary structures explains ordering generalizations.
    • Explains the Productivity and Ordering Generalization.
    • Also accounts for paradoxical suffix combinations such as -ability.
SLIDE 142

Conclusion

  • Model the problem of deriving word forms as a tradeoff between computation and storage, using standard inferential tools.
  • This automatically solves many problems of productivity and competition resolution.
SLIDE 143

Thanks!