Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model
Markus Dreyer (SDL Language Weaver), Jason Eisner (Johns Hopkins University)
This work was done at: Center for Language and Speech Processing (CLSP)


slide-1
SLIDE 1

Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model

Markus Dreyer (SDL Language Weaver)
Jason Eisner (Johns Hopkins University)

This work was done at:

Center for Language and Speech Processing (CLSP)
Human Lang. Tech. Center of Excellence (HLTCOE)
Johns Hopkins University (JHU)

slide-2
SLIDE 2

Motivation

English text: break break break break break break jump jump

German text: brichst brecht brechen breche brichst breche springe springst

Rich morphology

slide-3
SLIDE 3

Motivation

  • Analyzing text:
      • Lack of generalization
      • Data sparseness
  • Generating text:
      • Generate correct forms
      • Produce correctly inflected text

There is a need for a general morphology model that knows how to inflect words.

brichst brecht brechen breche brichst breche springe springst

slide-4
SLIDE 4

Inflectional Paradigm

So how do you inflect a word? You look it up in such a table, for example:

Lemma: treffen

          Present   Past
1st sg    treffe    traf
2nd sg    triffst   trafst
3rd sg    trifft    traf
1st pl    treffen   trafen
2nd pl    trefft    traft
3rd pl    treffen   trafen

But creating such supervised data is expensive.

Motivation

Let’s use unannotated text to learn these paradigms!

slide-5
SLIDE 5

Motivation

  • This talk is about a comprehensive model for inflectional morphology.
  • Main goal:
      • Given some unannotated text, can we learn how to inflect the verbs of a language (incl. irregularities and exceptions)?
      • Discover the inflectional paradigms (tables) of a language, using minimal supervision
slide-6
SLIDE 6

Motivation

Tokens Types

brichst brecht brechen breche brichst breche

German text

springe springst

Paradigm

  • 1. Identify the different lexemes in text
slide-8
SLIDE 8

Motivation

Tokens Types

brichst brecht brechen breche brichst breche

German text

springe springst

Paradigm

brichst brecht brechen breche brichst breche

  • 1. Identify the different lexemes in text
slide-9
SLIDE 9

Motivation

Tokens Types

brichst brecht brechen breche brichst breche

German text

springe springst

Paradigm

brichst brecht brechen breche brichst breche

  • 2. Place each form of a lexeme into its paradigm
slide-10
SLIDE 10

Motivation

Tokens Types

brichst brecht brechen breche brichst breche

German text

springe springst

Paradigm

brichst brecht brechen breche brichst breche

bricht? brecht?

brichen? brechen? brichte? brach?

brichtest?

brachst?

brichte?

brach?

brichten? brachen? brichtet? bracht? brichten? brachen?

brichen? brechen?

  • 2. Place each form of a lexeme into its paradigm
slide-11
SLIDE 11

Motivation

Tokens Types

brichst brecht brechen breche brichst breche

German text

springe springst

Paradigm

brichst brecht brechen breche brichst breche

bricht? brecht?

brichen? brechen? brichte? brach?

brichtest?

brachst?

brichte?

brach?

brichten? brachen? brichtet? bracht? brichten? brachen?

brichen? brechen?

springe springst

  • 2. Place each form of a lexeme into its paradigm
slide-13
SLIDE 13

Motivation

Tokens Types

brichst brecht brechen breche brichst breche

German text

springe springst

Paradigm

brichst brecht brechen breche brichst breche

bricht? brecht?

brichen? brechen? brichte? brach?

brichtest?

brachst?

brichte?

brach?

brichten? brachen? brichtet? bracht? brichten? brachen?

brichen? brechen?

springe springst

springen? sprengen? springt? sprengt? springen? sprengen? springt? sprengt? springen? sprengen? springte? sprang?

springtest? sprangst? springte? sprang?

springte? sprang? springtet? sprangt?

springten? sprangen?

  • 2. Place each form of a lexeme into its paradigm
slide-14
SLIDE 14

Motivation

Tokens Types

brichst brecht brechen breche brichst breche

German text

springe springst

Paradigm

brichst brecht brechen breche brichst breche

bricht? brecht?

brichen? brechen? brichte? brach?

brichtest?

brachst?

brichte?

brach?

brichten? brachen? brichtet? bracht? brichten? brachen?

brichen? brechen?

springe springst

springen? sprengen? springt? sprengt? springen? sprengen? springt? sprengt? springen? sprengen? springte? sprang?

springtest? sprangst? springte? sprang?

springte? sprang? springtet? sprangt?

springten? sprangen?

slide-15
SLIDE 15

Motivation

Tokens Types

brichst brecht brechen breche brichst breche

German text

springe springst

Paradigm

brichst brecht brechen breche brichst breche

bricht? brecht?

brichen? brechen? brichte? brach?

brichtest?

brachst?

brichte?

brach?

brichten? brachen? brichtet? bracht? brichten? brachen?

brichen? brechen?

springe springst

springen? sprengen? springt? sprengt? springen? sprengen? springt? sprengt? springen? sprengen? springte? sprang?

springtest? sprangst? springte? sprang?

springte? sprang? springtet? sprangt?

springten? sprangen?

saufen saufe säufst sauft

säuft? sauft? säufen? saufen? säufen? saufen?

slide-16
SLIDE 16

Motivation

In order to perform this morphological knowledge discovery, we define a probability distribution over a text corpus and its (hidden) inflectional paradigms:

p(corpus, paradigms)

slide-17
SLIDE 17

Overview

[Overview figure with two numbered parts: 1. p( ) and 2. p( )]

slide-19
SLIDE 19

Paradigms

Why build probability model over paradigms?

1

brichst brecht breche brichst breche

bricht? brecht?

brichen? brechen? brichen? brechen? brichen? brechen?

  • Jointly predict missing string values
  • Compute marginals
  • Know what spellings are likely in the different cells

slide-20
SLIDE 20

Paradigms

1

How to build probability model over paradigms?

brichst brecht breche brichst breche

bricht? brecht?

brichen? brechen? brichen? brechen? brichen? brechen?

Dreyer & Eisner (2009)

slide-21
SLIDE 21

Paradigms

1

X1sg X2sg X3sg X1pl X2pl X3pl XLem

Each cell is modeled by a string-valued random variable

Dreyer & Eisner (2009)

How to build probability model over paradigms?

slide-23
SLIDE 23

Paradigms

1

X1sg X2sg X3sg X1pl X2pl X3pl XLem

Dreyer & Eisner (2009)

How to build probability model over paradigms?

slide-26
SLIDE 26

Paradigms

1

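X1sg X2sg X3sg X1pl X2pl X3pl XLem

$$
p(X_{\mathrm{Lem}}, X_{\mathrm{1sg}}, X_{\mathrm{2sg}}, X_{\mathrm{3sg}}, X_{\mathrm{1pl}}, X_{\mathrm{2pl}}, X_{\mathrm{3pl}})
= \frac{1}{Z}\,
F_1(X_{\mathrm{Lem}}, X_{\mathrm{1sg}}) \times
F_2(X_{\mathrm{Lem}}, X_{\mathrm{2sg}}) \times
F_3(X_{\mathrm{Lem}}, X_{\mathrm{3sg}}) \times
F_4(X_{\mathrm{Lem}}, X_{\mathrm{1pl}}) \times
F_5(X_{\mathrm{Lem}}, X_{\mathrm{2pl}}) \times
F_6(X_{\mathrm{Lem}}, X_{\mathrm{3pl}}) \times \cdots
$$

[Factor graph with factor nodes F1 to F6]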

Markov Random Field over string-valued variables

How to build probability model over paradigms?
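
The factorization on this slide can be illustrated with a minimal Python sketch. It is not the authors' implementation: the candidate sets, the cell names and the prefix-based `factor` function are hypothetical stand-ins for the weighted finite-state transducers used in the talk, and the joint is computed by brute-force enumeration over a handful of strings.

```python
from itertools import product

# Hypothetical candidate spellings per cell (tiny sets, for illustration only).
candidates = {
    "Lem": ["brechen", "brichen"],
    "3sg": ["bricht", "brecht"],
    "1pl": ["brechen", "brichen"],
}


def shared_prefix_len(a: str, b: str) -> int:
    """Length of the common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def factor(lemma: str, form: str) -> float:
    """Toy pairwise factor F(XLem, Xcell): rewards a long shared prefix.

    Assumption: the real factors are weighted finite-state transducers.
    """
    return 2.0 ** shared_prefix_len(lemma, form)


def joint(observed=None):
    """Normalized joint over all assignments consistent with `observed`."""
    observed = observed or {}
    names = list(candidates)
    scores = {}
    for values in product(*(candidates[n] for n in names)):
        assign = dict(zip(names, values))
        if any(assign[k] != v for k, v in observed.items()):
            continue
        score = 1.0
        for cell in names:
            if cell != "Lem":
                score *= factor(assign["Lem"], assign[cell])
        scores[values] = score
    z = sum(scores.values())  # the normalizer Z
    return {k: v / z for k, v in scores.items()}


def marginal(dist, cell):
    """Sum the joint over every variable except `cell`."""
    idx = list(candidates).index(cell)
    out = {}
    for values, p in dist.items():
        out[values[idx]] = out.get(values[idx], 0.0) + p
    return out


dist = joint(observed={"3sg": "bricht"})
lemma_marginal = marginal(dist, "Lem")
```

With only "bricht" observed, this toy factor favors the lemma "brichen" over "brechen", which mirrors the kind of plausible but nonexistent form the talk discusses.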

slide-27
SLIDE 27

Paradigms

1

X1sg X2sg X3sg X1pl X2pl X3pl XLem

[Factor graph with factor nodes F1 to F6]

Markov Random Field over string-valued variables

Belief Propagation:

  • Standard inference algorithm
  • Computes marginals through message passing
  • We use a finite-state variant of this algorithm, Dreyer & Eisner (2009)

slide-28
SLIDE 28

Paradigms

1

X1sg X2sg X3sg X1pl X2pl X3pl XLem

[Factor graph with factor nodes F1 to F6, with the form "bricht" and messages over candidate spellings: b r e c h e n, b r i c h e n, ...]

Belief Propagation:

  • Standard inference algorithm
  • Computes marginals through message passing
  • We use a finite-state variant of this algorithm, Dreyer & Eisner (2009)

Candidates: brechen, brichen, ...

slide-29
SLIDE 29

Paradigms

Summary

  • Paradigms modeled as Markov Random Fields (MRF)
  • Weighted finite-state transducers (FST) relate the various spellings to one another
  • They encode morphological knowledge (“grammar”)
  • Use finite-state-based belief propagation (BP) to compute string marginals

1
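
As a rough illustration of how message passing yields marginals, the sketch below runs sum-product belief propagation on a star-shaped graph (a lemma connected to each cell) over small finite candidate sets. This is an assumption-laden simplification: the talk's variant passes finite-state machines over all strings, whereas here `similarity` is a hypothetical toy score and the candidate lists are made up.

```python
def similarity(a: str, b: str) -> float:
    """Toy stand-in for a weighted finite-state transducer score:
    counts position-wise matching letters."""
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 2.0 ** same


def normalize(d):
    z = sum(d.values())
    return {k: v / z for k, v in d.items()}


# Hypothetical candidates; the 3sg cell is observed, so it has one value.
lemma_cands = ["brechen", "brichen"]
cell_cands = {"3sg": ["bricht"], "1pl": ["brechen", "brichen"]}


def message_to_lemma(cell):
    """m_{cell -> Lem}(l) = sum over x of F(l, x), uniform local evidence."""
    return {l: sum(similarity(l, x) for x in cell_cands[cell]) for l in lemma_cands}


def lemma_belief():
    """Belief at the lemma: product of all incoming messages."""
    belief = {l: 1.0 for l in lemma_cands}
    for cell in cell_cands:
        msg = message_to_lemma(cell)
        for l in lemma_cands:
            belief[l] *= msg[l]
    return normalize(belief)


def cell_belief(target):
    """Belief at one cell: the lemma forwards the product of messages
    from all *other* cells, then the factor maps it onto the target."""
    out_msg = {l: 1.0 for l in lemma_cands}
    for cell in cell_cands:
        if cell != target:
            msg = message_to_lemma(cell)
            for l in lemma_cands:
                out_msg[l] *= msg[l]
    belief = {
        x: sum(out_msg[l] * similarity(l, x) for l in lemma_cands)
        for x in cell_cands[target]
    }
    return normalize(belief)


lemma_b = lemma_belief()
first_plural_b = cell_belief("1pl")
```

Because the graph in this sketch is a tree, these beliefs are exact marginals; on graphs with loops, belief propagation is only approximate.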

slide-30
SLIDE 30
  • Dreyer & Eisner (2009):
      • Learn purely from example paradigms (training data)
      • Then use model to predict unseen forms
  • Disadvantages:
      • Training data is expensive
      • Predicts forms that would never occur in real text (where an alternate form may be preferred)

1

Summary

bricht b r i c h e n

b r e c h e n

. . .

We will now address these problems.

Paradigms

slide-31
SLIDE 31

Overview

[Overview figure with two numbered parts: 1. p( ) and 2. p( )]

slide-32
SLIDE 32

Lexicon & Corpus

  • We use the paradigms to construct a probabilistic lexicon that specifies which inflections of which lexemes are common and how they are spelled.
  • We define a generative probability model of the lexicon and a text corpus.
  • This allows for a clean inference procedure to learn morphology from text and discover its inflectional paradigms.

2

slide-33
SLIDE 33

Lexicon & Corpus

Generative story Model Data

generate

Inference (Sampling) Model Data

learn

2

slide-34
SLIDE 34

Lexicon & Corpus

Generative story Model Data

generate

To generate from our model:

  • First, generate the lexicon (types).
  • Then, use it to generate the corpus (tokens).

slide-35
SLIDE 35

(1) Choose a distribution over lexemes

Lexicon & Corpus

02

Stick-breaking process

Generating...

2

slide-36
SLIDE 36

(1) Choose a distribution over lexemes

Lexicon & Corpus

02 0.08 0.04 0.01 0.08 0.06 0.01 0.12 0.03 0.02

Stick-breaking process

Generating...

2

slide-37
SLIDE 37

(1) Choose a distribution over lexemes

Lexicon & Corpus

02 0.08 0.04 0.01 0.08 0.06 0.01 0.12 0.03 0.02

[zoom] Stick-breaking process

Generating...

2

slide-38
SLIDE 38

Lexicon & Corpus

0.02 0.01 0.08

Generating...

2

(1) Choose a distribution over lexemes
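
A stick-breaking draw can be sketched in a few lines of Python. This is a generic illustration rather than the talk's code: the concentration parameter `alpha`, the truncation at `num_pieces` and the fixed random seed are all assumptions made for the example.

```python
import random


def stick_breaking(alpha: float, num_pieces: int, rng: random.Random):
    """Break a unit-length stick into `num_pieces` weights.

    Each step breaks off a Beta(1, alpha) fraction of what remains.
    Returns the weights and the length of the unbroken remainder.
    """
    remaining = 1.0
    weights = []
    for _ in range(num_pieces):
        frac = rng.betavariate(1.0, alpha)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    return weights, remaining


rng = random.Random(0)  # fixed seed so the sketch is reproducible
weights, leftover = stick_breaking(alpha=5.0, num_pieces=10, rng=rng)
```

Each weight plays the role of one lexeme's probability. The process can continue indefinitely, so the number of lexemes does not have to be fixed in advance.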

slide-39
SLIDE 39

Lexicon & Corpus

1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl 1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl 1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl .07 .05 .2 .12 .26 .3

0.02 0.01 0.08

Generating...

(2) For each lexeme, choose a distribution over its inflections

2

slide-40
SLIDE 40

Lexicon & Corpus

1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl 1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl 1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl .07 .05 .2 .12 .26 .3 .08 .06 .1 .11 .25 .4 .13 .4 .06 .08 .3 .03

0.02 0.01 0.08

Generating...

(2) For each lexeme, choose a distribution over its inflections

2

slide-41
SLIDE 41

Lexicon & Corpus

1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl 1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl 1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl .07 .05 .2 .12 .26 .3 .08 .06 .1 .11 .25 .4 .13 .4 .06 .08 .3 .03

0.02 0.01 0.08

Generating...

(3) For each lexeme, choose a paradigm that expresses the lexeme orthographically

2

slide-42
SLIDE 42

Lexicon & Corpus

1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl 1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl 1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl .07 .05 .2 .12 .26 .3 .08 .06 .1 .11 .25 .4 .13 .4 .06 .08 .3 .03

0.02 0.01 0.08

breche brechen brichst brecht bricht brechen treffe treffen triffst trefft trifft treffen springe springen springst springt springt springen

Generating...

(3) For each lexeme, choose a paradigm that expresses the lexeme orthographically

2

slide-43
SLIDE 43

Lexicon & Corpus

1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl 1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl 1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl .07 .05 .2 .12 .26 .3 .08 .06 .1 .11 .25 .4 .13 .4 .06 .08 .3 .03

0.02 0.01 0.08

breche brechen brichst brecht bricht brechen treffe treffen triffst trefft trifft treffen springe springen springst springt springt springen

Generating... Done!

(3) For each lexeme, choose a paradigm that expresses the lexeme orthographically

2

slide-44
SLIDE 44

Lexicon & Corpus

Lexeme prob.   0.02               0.01               0.08

Inflection     Prob.  Spelling    Prob.  Spelling    Prob.  Spelling
1st sg         .07    breche      .08    treffe      .13    springe
1st pl         .05    brechen     .06    treffen     .4     springen
2nd sg         .2     brichst     .1     triffst     .06    springst
2nd pl         .12    brecht      .11    trefft      .08    springt
3rd sg         .26    bricht      .25    trifft      .3     springt
3rd pl         .3     brechen     .4     treffen     .03    springen

Generating... Done!

2

(3) For each lexeme, choose a paradigm that expresses the lexeme orthographically

slide-47
SLIDE 47

Lexicon & Corpus

1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl 1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl 1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl .07 .05 .2 .12 .26 .3 .08 .06 .1 .11 .25 .4 .13 .4 .06 .08 .3 .03

0.02 0.01 0.08

breche brechen brichst brecht bricht brechen treffe treffen triffst trefft trifft treffen springe springen springst springt springt springen

Generating...

The lexicon has been generated (i.e., the types of the language). Now generate the corpus (i.e., the tokens).

2

slide-48
SLIDE 48

Lexicon & Corpus

1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl 1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl 1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl .07 .05 .2 .12 .26 .3 .08 .06 .1 .11 .25 .4 .13 .4 .06 .08 .3 .03

0.02 0.01 0.08

breche brechen brichst brecht bricht brechen treffe treffen triffst trefft trifft treffen springe springen springst springt springt springen

Generating...

Index  POS   Inflection  Spelling
1      VERB  3rd sg      bricht
2      PRON
3      VERB  1st pl      brechen
4      NOUN
5      VERB  2nd pl      springt
6      PREP
7      VERB  1st pl      brechen
8      NOUN
9      VERB  2nd sg      triffst

[Lexeme column: not recoverable from the extracted text]

Done!
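
The two-stage generative story (lexicon first, then corpus) can be sketched as follows. The numbers are the ones shown on the slides, but pairing each probability vector with a particular lexeme follows the extraction order and is an assumption, as are the dictionary layout and the function name `generate_verb_token`. The three lexeme weights are only three pieces of the stick, so `random.choices` treats them as relative weights.

```python
import random

INFLECTIONS = ["1st sg", "1st pl", "2nd sg", "2nd pl", "3rd sg", "3rd pl"]

# Lexicon (types): per lexeme, a weight, a distribution over inflections,
# and a paradigm giving the spelling of each inflection.
lexicon = {
    "brechen": {
        "weight": 0.02,
        "infl_probs": [0.07, 0.05, 0.2, 0.12, 0.26, 0.3],
        "forms": ["breche", "brechen", "brichst", "brecht", "bricht", "brechen"],
    },
    "treffen": {
        "weight": 0.01,
        "infl_probs": [0.08, 0.06, 0.1, 0.11, 0.25, 0.4],
        "forms": ["treffe", "treffen", "triffst", "trefft", "trifft", "treffen"],
    },
    "springen": {
        "weight": 0.08,
        "infl_probs": [0.13, 0.4, 0.06, 0.08, 0.3, 0.03],
        "forms": ["springe", "springen", "springst", "springt", "springt", "springen"],
    },
}


def generate_verb_token(rng: random.Random):
    """Generate one verb token: pick a lexeme, then an inflection,
    then look up the spelling in that lexeme's paradigm."""
    lexemes = list(lexicon)
    lexeme = rng.choices(lexemes, weights=[lexicon[l]["weight"] for l in lexemes])[0]
    entry = lexicon[lexeme]
    idx = rng.choices(range(len(INFLECTIONS)), weights=entry["infl_probs"])[0]
    return lexeme, INFLECTIONS[idx], entry["forms"][idx]


rng = random.Random(1)
corpus = [generate_verb_token(rng) for _ in range(5)]
```

Inference then runs this story in reverse: only the spellings are observed, and the lexeme and inflection of each token have to be recovered.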

2

slide-49
SLIDE 49

Lexicon & Corpus

Generative story Model Data

generate

2

slide-50
SLIDE 50

Lexicon & Corpus

Inference (Sampling) Model Data

learn

2

slide-51
SLIDE 51

Lexicon & Corpus

Inference (Sampling) Model Data

learn

To do inference:

  • Start with observed corpus
  • Construct lexicon and estimate all distributions

2

slide-52
SLIDE 52

Lexicon & Corpus

2 4 6 1 5 3 7 8 9

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst

PRON NOUN PREP VERB VERB VERB VERB NOUN VERB

1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl 1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl 1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl .07 .05 .2 .12 .26 .3 .08 .06 .1 .11 .25 .4 .13 .4 .06 .08 .3 .03

0.02 0.01 0.08

breche brechen brichst brecht bricht brechen treffe treffen triffst trefft trifft treffen springe springen springst springt springt springen 3rd sg 1st pl 2nd pl 1st pl 2nd sg

2

slide-53
SLIDE 53

Lexicon & Corpus

2 4 6 1 5 3 7 8 9

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst

2

slide-54
SLIDE 54

Lexicon & Corpus

2 4 6 1 5 3 7 8 9

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ...

2

slide-55
SLIDE 55

Lexicon & Corpus

2

PRON

4

NOUN

6

CONJ

1

VERB

5

VERB

3

VERB

7

VERB

8

NOUN

9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ...

2

slide-56
SLIDE 56

Lexicon & Corpus

2

PRON

4

NOUN

6

CONJ

1

VERB

5

VERB

3

VERB

7

VERB

8

NOUN

9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ... treffe treffen triffst trefft trifft treffen treffe treffen triffst trefft trifft treffen

Minimal supervision: We observe some paradigms, from which we can estimate an initial θ (which parameterizes the finite-state MRFs)

Seed paradigm

2

slide-57
SLIDE 57

Lexicon & Corpus

2

PRON

4

NOUN

6

CONJ

1

VERB

5

VERB

3

VERB

7

VERB

8

NOUN

9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ... treffe treffen triffst trefft trifft treffen treffe treffen triffst trefft trifft treffen

Seed paradigm

...

Train initial θ values (“morphological grammar”):

  • “e” is likely to change into “i”
  • 3rd sg ends in “t”
  • from 3rd sg to 1st pl, change vowel

2

slide-58
SLIDE 58

Lexicon & Corpus

PRON NOUN CONJ VERB VERB VERB VERB NOUN

2 4 6 1 5 3 7 8 9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ... treffe treffen triffst trefft trifft treffen treffe treffen triffst trefft trifft treffen

1

2

slide-59
SLIDE 59

Lexicon & Corpus

PRON NOUN CONJ VERB VERB VERB VERB NOUN

2 4 6 1 5 3 7 8 9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ... treffe treffen triffst trefft trifft treffen treffe treffen triffst trefft trifft treffen

1

The red lexeme is completely specified and “bricht” does not fit in.

2

slide-60
SLIDE 60

Lexicon & Corpus

PRON NOUN CONJ VERB VERB VERB VERB NOUN

2 4 6 1 5 3 7 8 9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ... treffe treffen triffst trefft trifft treffen treffe treffen triffst trefft trifft treffen

1

3rd sg

The red lexeme is completely specified and “bricht” does not fit in.

2

slide-61
SLIDE 61

Lexicon & Corpus

PRON NOUN CONJ VERB VERB VERB VERB NOUN

2 4 6 1 5 3 7 8 9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ... treffe treffen triffst trefft trifft treffen treffe treffen triffst trefft trifft treffen

The red lexeme is completely specified and “bricht” does not fit in.

2

slide-62
SLIDE 62

Lexicon & Corpus

PRON NOUN CONJ VERB VERB VERB VERB NOUN

2 4 6 1 5 3 7 8 9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ... treffe treffen triffst trefft trifft treffen treffe treffen triffst trefft trifft treffen 1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl

1

2

slide-63
SLIDE 63

Lexicon & Corpus

PRON NOUN CONJ VERB VERB VERB VERB NOUN

2 4 6 1 5 3 7 8 9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ... treffe treffen triffst trefft trifft treffen treffe treffen triffst trefft trifft treffen 1st sg 1st pl 2nd sg 2nd pl bricht 3rd pl

1

2

slide-64
SLIDE 64

Lexicon & Corpus

PRON NOUN CONJ VERB VERB VERB VERB NOUN

2 4 6 1 5 3 7 8 9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ... treffe treffen triffst trefft trifft treffen treffe treffen triffst trefft trifft treffen 1st sg 1st pl 2nd sg 2nd pl bricht 3rd pl 3rd sg

1

2

slide-65
SLIDE 65

Lexicon & Corpus

PRON NOUN CONJ VERB VERB VERB VERB NOUN

2 4 6 1 5 3 7 8 9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ... treffe treffen triffst trefft trifft treffen treffe treffen triffst trefft trifft treffen 3rd sg 1st sg 1st pl 2nd sg 2nd pl bricht 3rd pl

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht

brichen? brechen?

1

We immediately run finite-state-based belief propagation in this new paradigm.

2

slide-66
SLIDE 66

1st sg 1st pl 2nd sg 2nd pl bricht 3rd pl

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht

brichen? brechen?

Lexicon & Corpus

PRON NOUN CONJ VERB VERB VERB VERB NOUN

2 4 6 1 5 3 7 8 9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ... treffe treffen triffst trefft trifft treffen treffe treffen triffst trefft trifft treffen 3rd sg

1 3

2

slide-67
SLIDE 67

1st sg 1st pl 2nd sg 2nd pl bricht 3rd pl

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht brechen

Lexicon & Corpus

PRON NOUN CONJ VERB VERB VERB VERB NOUN

2 4 6 1 5 3 7 8 9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ... treffe treffen triffst trefft trifft treffen treffe treffen triffst trefft trifft treffen 3rd sg

1 3

We have just removed the nonsense form “brichen” (and others)

2

slide-68
SLIDE 68

1st sg 1st pl 2nd sg 2nd pl bricht 3rd pl

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht brechen

Lexicon & Corpus

PRON NOUN CONJ VERB VERB VERB VERB NOUN

2 4 6 1 5 3 7 8 9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ... treffe treffen triffst trefft trifft treffen treffe treffen triffst trefft trifft treffen 3rd sg

1

3rd pl

3

We have just removed the nonsense form “brichen” (and others)

2

slide-69
SLIDE 69

1st sg 1st pl 2nd sg 2nd pl bricht 3rd pl

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht brechen

Lexicon & Corpus

PRON NOUN CONJ VERB VERB VERB VERB NOUN

2 4 6 1 5 3 7 8 9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ... treffe treffen triffst trefft trifft treffen treffe treffen triffst trefft trifft treffen 3rd sg

1 3

3rd pl

5

2

slide-70
SLIDE 70

1st sg 1st pl 2nd sg 2nd pl bricht 3rd pl

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht brechen

Lexicon & Corpus

PRON NOUN CONJ VERB VERB VERB VERB NOUN

2 4 6 1 5 3 7 8 9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ... treffe treffen triffst trefft trifft treffen treffe treffen triffst trefft trifft treffen 3rd sg

1 3

3rd pl 1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl

5

2

slide-71
SLIDE 71

1st sg 1st pl 2nd sg 2nd pl bricht 3rd pl

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht brechen

Lexicon & Corpus

PRON NOUN CONJ VERB VERB VERB VERB NOUN

2 4 6 1 5 3 7 8 9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ... treffe treffen triffst trefft trifft treffen treffe treffen triffst trefft trifft treffen 3rd sg

1 3

3rd pl 1st sg 1st pl 2nd sg 2nd pl springt 3rd pl

5

2

slide-72
SLIDE 72

1st sg 1st pl 2nd sg 2nd pl bricht 3rd pl

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht brechen

Lexicon & Corpus

PRON NOUN CONJ VERB VERB VERB VERB NOUN

2 4 6 1 5 3 7 8 9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ... treffe treffen triffst trefft trifft treffen treffe treffen triffst trefft trifft treffen 3rd sg

1 3

3rd pl 1st sg 1st pl 2nd sg 2nd pl springt 3rd pl

5

3rd sg

2

slide-73
SLIDE 73

springe? sprenge? springen? sprengen? springst? sprengst? springt? sprengt?

springt

springen? sprengen?

1st sg 1st pl 2nd sg 2nd pl bricht 3rd pl

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht brechen

Lexicon & Corpus

PRON NOUN CONJ VERB VERB VERB VERB NOUN

2 4 6 1 5 3 7 8 9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ... treffe treffen triffst trefft trifft treffen treffe treffen triffst trefft trifft treffen 3rd sg

1 3

3rd pl

5

3rd sg

7

Run belief propagation!

2

slide-74
SLIDE 74

springe? sprenge? springen? sprengen? springst? sprengst? springt? sprengt?

springt

springen? sprengen?

1st sg 1st pl 2nd sg 2nd pl bricht 3rd pl

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht

brechen

Lexicon & Corpus

PRON NOUN CONJ VERB VERB VERB VERB NOUN

2 4 6 1 5 3 7 8 9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ... treffe treffen triffst trefft trifft treffen treffe treffen triffst trefft trifft treffen 3rd sg

1 3

3rd pl

5

3rd sg

7

It would fit well in two of the cells

2

slide-75
SLIDE 75

springe? sprenge? springen? sprengen? springst? sprengst? springt? sprengt?

springt

springen? sprengen?

1st sg 1st pl 2nd sg 2nd pl bricht 3rd pl

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht brechen

Lexicon & Corpus

PRON NOUN CONJ VERB VERB VERB VERB NOUN

2 4 6 1 5 3 7 8 9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ... treffe treffen triffst trefft trifft treffen treffe treffen triffst trefft trifft treffen 3rd sg

1 3

3rd pl

5

3rd sg

7

2

slide-76
SLIDE 76

springe? sprenge? springen? sprengen? springst? sprengst? springt? sprengt?

springt

springen? sprengen?

1st sg 1st pl 2nd sg 2nd pl bricht 3rd pl

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht brechen

Lexicon & Corpus

PRON NOUN CONJ VERB VERB VERB VERB NOUN

2 4 6 1 5 3 7 8 9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ... treffe treffen triffst trefft trifft treffen treffe treffen triffst trefft trifft treffen 3rd sg

1 3

3rd pl

5

3rd sg

7

3rd pl

2

slide-77
SLIDE 77

springe? sprenge? springen? sprengen? springst? sprengst? springt? sprengt?

springt

springen? sprengen?

1st sg 1st pl 2nd sg 2nd pl bricht 3rd pl

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht brechen

Lexicon & Corpus

PRON NOUN CONJ VERB VERB VERB VERB NOUN

2 4 6 1 5 3 7 8 9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ... treffe treffen triffst trefft trifft treffen treffe treffen triffst trefft trifft treffen 3rd sg

1 3

3rd pl

5

3rd sg

7

3rd pl

9

2

slide-79
SLIDE 79

springe? sprenge? springen? sprengen? springst? sprengst? springt? sprengt?

springt

springen? sprengen?

1st sg 1st pl 2nd sg 2nd pl bricht 3rd pl

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht brechen

Lexicon & Corpus

PRON NOUN CONJ VERB VERB VERB VERB NOUN

2 4 6 1 5 3 7 8 9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ... treffe treffen triffst trefft trifft treffen treffe treffen triffst trefft trifft treffen 3rd sg

1 3

3rd pl

5

3rd sg

7

3rd pl

9

2nd sg

2

slide-80
SLIDE 80

springe? sprenge? springen? sprengen? springst? sprengst? springt? sprengt?

springt

springen? sprengen?

1st sg 1st pl 2nd sg 2nd pl bricht 3rd pl

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht brechen

Lexicon & Corpus

PRON NOUN CONJ VERB VERB VERB VERB NOUN

2 4 6 1 5 3 7 8 9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ... treffe treffen triffst trefft trifft treffen treffe treffen triffst trefft trifft treffen 3rd sg

1 3

3rd pl

5

3rd sg

7

3rd pl

9

2

We will now re-estimate θ, given our new “observations” (samples). This training method is called MCEM.

slide-81
SLIDE 81

springe? sprenge? springen? sprengen? springst? sprengst? springt? sprengt?

springt

springen? sprengen?

1st sg 1st pl 2nd sg 2nd pl bricht 3rd pl

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht brechen

Lexicon & Corpus

PRON NOUN CONJ VERB VERB VERB VERB NOUN

2 4 6 1 5 3 7 8 9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ... treffe treffen triffst trefft trifft treffen treffe treffen triffst trefft trifft treffen 3rd sg

1 3

3rd pl

5

3rd sg

7

3rd pl

9

2nd sg

2

We will now re-estimate θ, given our new “observations” (samples). This training method is called MCEM.

slide-82
SLIDE 82

springe? sprenge? springen? sprengen? springst? sprengst? springt? sprengt?

springt

springen? sprengen?

1st sg 1st pl 2nd sg 2nd pl bricht 3rd pl

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht brechen

Lexicon & Corpus

PRON NOUN CONJ VERB VERB VERB VERB NOUN

2 4 6 1 5 3 7 8 9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ... treffe treffen triffst trefft trifft treffen treffe treffen triffst trefft trifft treffen 3rd sg

1 3

3rd pl

5

3rd sg

7

3rd pl

9

We go over the corpus over and over again, re-analyzing words in the light of newly acquired knowledge about table frequencies, inflection frequencies and the updated “morphological grammar” θ.

2

slide-83
SLIDE 83

springe? sprenge? springen? sprengen? springst? sprengst? springt? sprengt?

springt

springen? sprengen?

1st sg 1st pl 2nd sg 2nd pl bricht 3rd pl

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht brechen

Lexicon & Corpus

PRON NOUN CONJ VERB VERB VERB VERB NOUN

2 4 6 1 5 3 7 8 9

VERB

Index POS Lexeme Inflection Spelling

bricht brechen springt brechen triffst ... ... ... ... treffe treffen triffst trefft trifft treffen treffe treffen triffst trefft trifft treffen 3rd sg

1 3

3rd pl

5

3rd sg

7

3rd pl

9

2nd sg

We go over the corpus over and over again, re-analyzing words in the light of newly acquired knowledge about table frequencies, inflection frequencies and the updated “morphological grammar” θ.

2

slide-84
SLIDE 84
Text & Paradigms

Summary of the sampling process:

  • Constantly update frequency estimates for lexemes and inflections
  • Often update the “morphological grammar” θ
  • Keep re-analyzing words accordingly
  • Run finite-state BP to fill in missing paradigm cells
  • Important: Often, BP will produce one regular and some more irregular candidates; one of them is found in the corpus and placed in the cell, so we “learn” it!

2
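
One re-analysis step of such a sampler might look like the sketch below. It assumes a Chinese-restaurant-style prior, which is a standard way to sample from Dirichlet process mixtures; the talk does not spell out its exact sampler, so the function `assignment_probs`, the parameters `alpha` and `new_fit`, and the example counts are all hypothetical.

```python
def assignment_probs(lexeme_counts, cell_fit, alpha=1.0, new_fit=0.1):
    """Probability of assigning one token to each existing lexeme or a new one.

    lexeme_counts: number of tokens currently assigned to each lexeme
                   (frequent lexemes attract more tokens).
    cell_fit:      how well the token's spelling fits some open cell of each
                   lexeme's paradigm (in the talk this role is played by
                   finite-state BP marginals; here it is just a given number).
    alpha:         concentration parameter controlling how readily a new
                   lexeme is opened (assumed value).
    new_fit:       fit of the spelling under a fresh, empty paradigm (assumed).
    """
    scores = {
        lex: count * cell_fit.get(lex, 0.0)
        for lex, count in lexeme_counts.items()
    }
    scores["<new lexeme>"] = alpha * new_fit
    z = sum(scores.values())
    return {k: v / z for k, v in scores.items()}


# The token "brechen" fits an open cell of BRECHEN; the TREFFEN paradigm is
# completely specified, so the token cannot be placed there.
probs = assignment_probs(
    lexeme_counts={"BRECHEN": 1, "TREFFEN": 3},
    cell_fit={"BRECHEN": 0.5, "TREFFEN": 0.0},
)
```

A sampler would draw the assignment from `probs`, update the counts, and move on to the next token; repeating this over the corpus is what "re-analyzing words" amounts to in this sketch.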

slide-85
SLIDE 85

Text & Paradigms

Obtaining results for evaluation

  • We add many paradigms in which only the lemma form is given and the other slots are empty.
  • Just keep track of what corpus tokens the sampler places in those empty cells, or what candidates will be suggested from belief propagation.
  • To get an answer for a particular cell, get its marginal probability distribution at the end of each iteration. At the end, get the average prob. per spelling and report the highest-scoring one.
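
The averaging step described above is simple enough to sketch directly. The per-iteration marginals in `history` are invented numbers for illustration, and `best_spelling` is a hypothetical helper name.

```python
from collections import defaultdict


def best_spelling(marginals_per_iteration):
    """Average a cell's marginal distribution over iterations and
    return the highest-scoring spelling together with the averages."""
    totals = defaultdict(float)
    for marginal in marginals_per_iteration:
        for spelling, prob in marginal.items():
            totals[spelling] += prob
    n = len(marginals_per_iteration)
    averages = {s: t / n for s, t in totals.items()}
    return max(averages, key=averages.get), averages


# Made-up marginals for one cell, recorded at the end of three iterations.
history = [
    {"brechen": 0.4, "brichen": 0.6},
    {"brechen": 0.8, "brichen": 0.2},
    {"brechen": 0.9, "brichen": 0.1},
]
winner, averages = best_spelling(history)
```

Averaging over iterations means an early wrong guess (here "brichen" in the first iteration) can be outweighed once later iterations have seen more evidence.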

2

slide-86
SLIDE 86

Experiment: Learn German inflectional morphology

  • Given:
      • 50 seed paradigms (from CELEX)
      • German corpus of 10 million words (from “WaCKy” corpus)
  • Test: For 5,415 German verbs, predict paradigms with 21 inflections each

Results

3

slide-87
SLIDE 87

Results

3

Adding a large text corpus significantly improves prediction accuracy.

Accuracy (%)         50 seed   100 seed
no corpus            89.9      91.5
1 million words      90.6      92.0
10 million words     90.9      92.2
Error reduction      10%       8.2%

Regular: 85.4
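
The reported error reductions can be checked with a short calculation. The sketch assumes the chart values pair up as 89.9 → 90.9 for 50 seed paradigms and 91.5 → 92.2 for 100 seed paradigms (no corpus versus 10 million words); that pairing is inferred from the flattened chart, not stated explicitly on the slide.

```python
def relative_error_reduction(acc_before: float, acc_after: float) -> float:
    """Relative reduction in error rate, with accuracies given in percent."""
    err_before = 100.0 - acc_before
    err_after = 100.0 - acc_after
    return (err_before - err_after) / err_before


# Assumed pairing of chart values: no corpus -> 10 million words.
r50 = relative_error_reduction(89.9, 90.9)   # 50 seed paradigms
r100 = relative_error_reduction(91.5, 92.2)  # 100 seed paradigms
```

Under that pairing the results come out at roughly 0.099 and 0.082, consistent with the 10% and 8.2% figures on the slide.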

slide-88
SLIDE 88

Results

3

Large gains for irregular forms

[Figure: accuracy by word frequency bin (x-axis labels 0-9, 100-999, 10000-; axis annotated from Regular to Irregular), for no corpus, 1 million words and 10 million words. Bar values as extracted: 25, 61.8, 78.1, 84.4, 91.3; 25, 61.4, 79.3, 84.5, 91; 20.7, 57.4, 71.6, 78.1, 90.5]

slide-89
SLIDE 89

Conclusions

  • Formulated a principled framework for semi-supervised learning of structured morphological paradigms
  • Jointly tagged corpus tokens and learned non-concatenative spelling changes between morphological types in the lexicon
  • Filled complete structured paradigms with observed and predicted word forms
  • Ran sampler on large corpora (up to 10 million words), reducing prediction error by up to 10%