Discovery of Inflectional Paradigms from Plain Text using Graphical Models over Strings - PowerPoint PPT Presentation



slide-1
SLIDE 1

A Non-Parametric Model for the

Discovery of Inflectional Paradigms from Plain Text

using Graphical Models over Strings

Markus Dreyer

Center for Language and Speech Processing (CLSP) Human Language Technology Center of Excellence (HLTCOE) Johns Hopkins University (JHU)

Dissertation Defense, November 5, 2010

slide-2
SLIDE 2

Motivation

  • Rich morphology

break break break break break break

English text German text

jump jump brichst brecht brechen breche brichst breche springe springst

slide-3
SLIDE 3

Motivation

  • Analyzing text:
  • lack of generalization
  • data sparseness
  • Generating text:
  • need to generate correct

forms

  • produce correctly

inflected text

German text

brichst brecht brechen breche brichst breche springe springst

slide-4
SLIDE 4

Motivation

  • In NLP, we often need to analyze or generate

text in such languages.

  • So there is a need for a general morphology

model that knows how to inflect words.

  • Since annotations are always expensive, it

would be best to learn from unannotated text.

slide-5
SLIDE 5

Inflectional Paradigm

Motivation

So how do you inflect a verb? You look it up in such a table, for example:

treffen (infinitive): Present: treffe, triffst, trifft, treffen, trefft, treffen. Past: traf, trafst, traf, trafen, traft, trafen.

But creating such supervised data is expensive.

slide-6
SLIDE 6

Motivation

  • This talk is about a comprehensive model for

inflectional morphology.

  • Main goal:
  • Given some unannotated text, can we

learn how to inflect the verbs of a language (incl. irregularities and exceptions)?

  • Discover the inflectional paradigms

(tables) of a language, using minimal supervision

slide-7
SLIDE 7

Motivation

Tokens Types

brichst brecht brechen breche brichst breche

German text

springe springst

Paradigm

  • 1. Identify the different lexemes in text
slide-9
SLIDE 9

Motivation

Tokens Types

brichst brecht brechen breche brichst breche

German text

springe springst

Paradigm

brichst brecht brechen breche brichst breche

  • 1. Identify the different lexemes in text
slide-10
SLIDE 10

Motivation

Tokens Types

brichst brecht brechen breche brichst breche

German text

springe springst

Paradigm

brichst brecht brechen breche brichst breche

  • 2. Place each form of a lexeme into its paradigm
slide-11
SLIDE 11

Motivation

Tokens Types

brichst brecht brechen breche brichst breche

German text

springe springst

Paradigm

brichst brecht brechen breche brichst breche

bricht? brecht?

brichen? brechen? brichte? brach?

brichtest?

brachst?

brichte?

brach?

brichten? brachen? brichtet? bracht? brichten? brachen?

brichen? brechen?

  • 2. Sort each lexeme into a paradigm
slide-12
SLIDE 12

Motivation

Tokens Types

brichst brecht brechen breche brichst breche

German text

springe springst

Paradigm

brichst brecht brechen breche brichst breche

bricht? brecht?

brichen? brechen? brichte? brach?

brichtest?

brachst?

brichte?

brach?

brichten? brachen? brichtet? bracht? brichten? brachen?

brichen? brechen?

springe springst

  • 2. Sort each lexeme into a paradigm
slide-14
SLIDE 14

Motivation

Tokens Types

brichst brecht brechen breche brichst breche

German text

springe springst

Paradigm

brichst brecht brechen breche brichst breche

bricht? brecht?

brichen? brechen? brichte? brach?

brichtest?

brachst?

brichte?

brach?

brichten? brachen? brichtet? bracht? brichten? brachen?

brichen? brechen?

springe springst

springen? sprengen? springt? sprengt? springen? sprengen? springt? sprengt? springen? sprengen? springte? sprang?

springtest? sprangst? springte? sprang?

springte? sprang? springtet? sprangt?

springten? sprangen?

  • 2. Sort each lexeme into a paradigm
slide-15
SLIDE 15

Motivation

Tokens Types

brichst brecht brechen breche brichst breche

German text

springe springst

Paradigm

brichst brecht brechen breche brichst breche

bricht? brecht?

brichen? brechen? brichte? brach?

brichtest?

brachst?

brichte?

brach?

brichten? brachen? brichtet? bracht? brichten? brachen?

brichen? brechen?

springe springst

springen? sprengen? springt? sprengt? springen? sprengen? springt? sprengt? springen? sprengen? springte? sprang?

springtest? sprangst? springte? sprang?

springte? sprang? springtet? sprangt?

springten? sprangen?

slide-16
SLIDE 16

Motivation

Tokens Types

brichst brecht brechen breche brichst breche

German text

springe springst

Paradigm

brichst brecht brechen breche brichst breche

bricht? brecht?

brichen? brechen? brichte? brach?

brichtest?

brachst?

brichte?

brach?

brichten? brachen? brichtet? bracht? brichten? brachen?

brichen? brechen?

springe springst

springen? sprengen? springt? sprengt? springen? sprengen? springt? sprengt? springen? sprengen? springte? sprang?

springtest? sprangst? springte? sprang?

springte? sprang? springtet? sprangt?

springten? sprangen?

saufen saufe säufst sauft

säuft? sauft? säufen? saufen? säufen? saufen?

slide-17
SLIDE 17

Motivation

  • Similar to information extraction tasks:
  • Find information in text,
  • Put it in database,
  • Make deductions,
  • Find more information in text,
  • iterate ...
slide-18
SLIDE 18

Motivation

p( )

In order to perform this morphological knowledge discovery, we define a probability distribution over a text corpus and its (hidden) inflectional paradigms:

slide-19
SLIDE 19

Overview

1

p( )

p( )

p( )

String pairs Multiple strings (paradigms) Text and

paradigms

2 3

slide-20
SLIDE 20

Overview

1

p( )

p( )

p( )

String pairs Multiple strings (paradigms) Text and

paradigms

2 3

Dreyer, Smith & Eisner, 2008 Dreyer & Eisner, 2009 Dreyer & Eisner, in prep

slide-21
SLIDE 21

Overview

1

p( )

p( )

p( )

String pairs Multiple strings (paradigms) Text and

paradigms

2 3

slide-22
SLIDE 22

String Pairs

1

String pair problems are common in NLP:

Morphology:

slide-23
SLIDE 23

String Pairs

1

Transliteration john hardt - yue han ha te patrick johnson - pa te li ke yue han xun frederick william mulley - fu lei de li ke wei lian ma li Spelling correction Honululu - Honolulu braek- break Pronunciation Sternanisöl - xxxxxxxxx loophole - xxxxxx

String pair problems are common in NLP:

Morphology:

slide-24
SLIDE 24
  • We want to build a probability model over string

pairs.

  • Such a model can produce k-best output, can be

plugged into bigger models later, etc.

  • We would like to make use of flexible features, be

able to look at linguistic properties of the strings,

  • and train the parameters from data.

String Pairs

1

slide-25
SLIDE 25

String Pairs

Pr(s1,s2) = (1/Z) F(s1,s2)

Pr(s2 | s1) = (1/Z) F(s1,s2)

  • Function F evaluates how well the

two strings go together.

  • It looks at properties (“features”) of the

string pair and assigns some score.
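The role of F can be sketched as a log-linear score over one fixed alignment: weighted features of the aligned character pairs, exponentiated. A minimal sketch; the feature set and weights below are illustrative, not the talk's actual feature templates:

```python
import math

def aligned_features(s1, s2):
    # Features of one character-level alignment (equal lengths; "ε"
    # marks insertion/deletion). Illustrative feature set: counts of
    # aligned character pairs.
    assert len(s1) == len(s2)
    feats = {}
    for a, b in zip(s1, s2):
        feats[(a, b)] = feats.get((a, b), 0) + 1
    return feats

def score(s1, s2, theta):
    # exp(sum_i theta_i * f_i) for a single alignment
    return math.exp(sum(theta.get(f, 0.0) * v
                        for f, v in aligned_features(s1, s2).items()))

theta = {('e', 'o'): -0.5, ('a', 'ε'): -1.0}   # illustrative weights
s = score('#breaking#', '#broεkeεε#', theta)    # penalized: e→o, a→ε
```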

slide-26
SLIDE 26

String Pairs

#breaking# #broke# s1= s2=

1

slide-27
SLIDE 27

String Pairs

#breaking# #broke# # # # # s1= s2=

1

slide-28
SLIDE 28

#breaking# # broke # #

String Pairs

1

slide-29
SLIDE 29

#breaking# # br oke # # εε ε

String Pairs

1

slide-30
SLIDE 30

#breaking# # br oke # # εε ε #breaking# #broεkeεε# #breaεking# #brεεokeεε# #breakεing# #broεkeεεε#

...

String Pairs

1

slide-31
SLIDE 31

#breaking# #broke# # # # # s1= s2=

String Pairs

1

slide-32
SLIDE 32

F(s1,s2) =

s1 = #breaking#   s2 = #broke#

String Pairs

1

slide-33
SLIDE 33

F(s1,s2) = (            )

s1 = #breaking#   s2 = #broke#

String Pairs

1

slide-34
SLIDE 34

F(s1,s2) = ( exp Σi θi fi(#breaking#, #brεokeεε#) )

s1 = #breaking#   s2 = #broke#

String Pairs

1

slide-35
SLIDE 35

F(s1,s2) = ( exp Σi θi fi(#breaking#, #brεokeεε#)
           + exp Σi θi fi(#breaking#, #broεkeεε#) )

s1 = #breaking#   s2 = #broke#

String Pairs

1

slide-36
SLIDE 36

F(s1,s2) = ( exp Σi θi fi(#breaking#, #brεokeεε#)
           + exp Σi θi fi(#breaking#, #broεkeεε#)
           + exp Σi θi fi(#breaεking#, #brεεokeεε#) )

s1 = #breaking#   s2 = #broke#

String Pairs

1

slide-37
SLIDE 37

F(s1,s2) = ( exp Σi θi fi(#breaking#, #brεokeεε#)
           + exp Σi θi fi(#breaking#, #broεkeεε#)
           + exp Σi θi fi(#breaεking#, #brεεokeεε#)
           + exp Σi θi fi(#breakεing#, #broεkeεεε#) )

s1 = #breaking#   s2 = #broke#

String Pairs

1

slide-38
SLIDE 38

F(s1,s2) = ( exp Σi θi fi(#breaking#, #brεokeεε#)
           + exp Σi θi fi(#breaking#, #broεkeεε#)
           + exp Σi θi fi(#breaεking#, #brεεokeεε#)
           + exp Σi θi fi(#breakεing#, #broεkeεεε#)
           + ... )

s1 = #breaking#   s2 = #broke#

String Pairs

1

slide-39
SLIDE 39

Pr(s1,s2) = (1/Z) F(s1,s2)

F(s1,s2) = ( exp Σi θi fi(#breaking#, #brεokeεε#)
           + exp Σi θi fi(#breaking#, #broεkeεε#)
           + exp Σi θi fi(#breaεking#, #brεεokeεε#)
           + exp Σi θi fi(#breakεing#, #broεkeεεε#)
           + ... )

s1 = #breaking#   s2 = #broke#

String Pairs

1
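The sum over alignments can be sketched in code: F(s1, s2) adds up the exponentiated feature scores of each alignment. A brute-force sketch over an explicitly listed alignment set (a real system sums over all paths of a transducer; the feature set here is simply aligned character pairs):

```python
import math

def alignment_score(a1, a2, theta):
    # exp(sum_i theta_i * f_i) for one alignment;
    # features are the aligned character pairs.
    total = 0.0
    for x, y in zip(a1, a2):
        total += theta.get((x, y), 0.0)
    return math.exp(total)

def F(alignments, theta):
    # F(s1, s2) = sum over alignments of exp(theta . f)
    return sum(alignment_score(a1, a2, theta) for a1, a2 in alignments)

# Two of the alignments of #breaking# / #broke# shown on the slides:
aligns = [('#breaking#', '#broεkeεε#'),
          ('#breaking#', '#brεokeεε#')]
total = F(aligns, {})   # with all-zero weights each term is exp(0) = 1
```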

slide-40
SLIDE 40

#breakεing#
#broεkeεεε#

exp Σi θi fi(#breakεing#, #broεkeεεε#)


slide-46
SLIDE 46

#breakεing# #broεkeεεε#

slide-47
SLIDE 47

#breakεing#
#broεkeεεε#

eak
oεk      full window

slide-48
SLIDE 48

#breakεing#
#broεkeεεε#

eak
oεk      full window

VVC
VεC      vowels, consonants

slide-49
SLIDE 49

#breakεing#
#broεkeεεε#

eak
oεk      full window

???
oεk      target language

VVC
VεC      vowels, consonants

slide-50
SLIDE 50

#breakεing#
#broεkeεεε#

eak
oεk      full window

???
oεk      target language

???
ok       “collapsed”

VVC
VεC      vowels, consonants

slide-51
SLIDE 51

#breakεing#
#broεkeεεε#

eak
oεk      full window

???
oεk      target language

subst del ident      subst, del, ins, ident

???
ok       “collapsed”

VVC
VεC      vowels, consonants

slide-52
SLIDE 52

#breakεing#
#broεkeεεε#

?ak
?εk      full window

???
?εk      target language

???
del ident      subst, del, ins, ident

???
?k       “collapsed”

?VC
?εC      vowels, consonants

Also add versions of these features that are backed off to bigrams!

slide-53
SLIDE 53
  • To compute such feature-based scores

for two string variables S1 and S2, we construct a weighted finite-state transducer F

  • It can assign a score to any string pair s1, s2:

Pr(s1,s2) = (1/Z) F(s1,s2)

String Pairs

1

slide-54
SLIDE 54

What is a finite-state acceptor (FSA)? An automaton with a finite number of states and arcs; it can be used to assign a score to any string. What is a finite-state transducer (FST)? Same as an FSA, but used to assign a score to any string pair (e.g., evaluating how well the two strings go together).

Background: Finite-state machines
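As a concrete picture of "assign a score to any string", here is a toy weighted FSA evaluated by forward summation over accepting paths. The machine itself is invented for illustration, not one from the talk:

```python
def fsa_score(string, arcs, start, finals):
    # arcs: (state, symbol) -> list of (next_state, weight).
    # Forward algorithm: sum, over all accepting paths, of the
    # product of arc weights along the path.
    frontier = {start: 1.0}
    for sym in string:
        nxt = {}
        for state, w in frontier.items():
            for dst, aw in arcs.get((state, sym), []):
                nxt[dst] = nxt.get(dst, 0.0) + w * aw
        frontier = nxt
    return sum(w for state, w in frontier.items() if state in finals)

# Toy acceptor over {a, b} that accepts strings ending in 'b'
arcs = {(0, 'a'): [(0, 1.0)], (0, 'b'): [(0, 0.5), (1, 0.5)]}
score_ab = fsa_score('ab', arcs, 0, {1})   # 0.5
```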

slide-55
SLIDE 55
  • Specific kind of grammar that describes and

scores one or more strings

  • Closure properties under many useful operations

(we will use composition, intersection, projection)

  • Useful for many tasks in natural language

processing

String Pairs

1

slide-56
SLIDE 56

b r e c h e n b r a c h t S1 = S2 = F S2 S1 F

String Pairs

1

slide-57
SLIDE 57

b r e c h e n b r a c h t S1 = S2 = F S2 S1 F

finite-state transducer

String Pairs

1


slide-59
SLIDE 59

b r e c h e n b r a c h t S1 = S2 = F S2 S1 F

finite-state transducer

arcs have weights, determined by their features

String Pairs

1

slide-60
SLIDE 60

b r e c h e n b r a c h t S1 = S2 = F S2 S1 F

finite-state transducer

arcs have weights, determined by their features

Transducer F computes score by looking at all alignments

String Pairs

1

slide-61
SLIDE 61

S1 = b r e c h e n   S2 = b r a c h t   F(S1, S2) = 13.26

Sum over all paths in the finite-state transducer

finite-state transducer

arcs have weights, determined by their features

Transducer F computes score by looking at all alignments

String Pairs

1

slide-62
SLIDE 62
  • The alignment between the string pair is a

latent variable.

  • We add more latent variables to the model:
  • Change regions
  • Conjugation classes

String Pairs

1

For details, see my thesis, and Dreyer, Smith & Eisner, 2008

slide-63
SLIDE 63

String Pairs

1

[Bar chart: Inflection (on German verbs), accuracy 60–95 on tasks 13SIA-13SKE, 2PIE-13PKE, 2PKE-z, rP-pA; systems: Moses (baseline) vs. FST and FST (+latent) (this talk)]

See my thesis, and Dreyer, Smith & Eisner, 2008

slide-64
SLIDE 64

[Bar chart: Lemmatization, accuracy 70–100 on Basque, English, Irish, Tagalog; systems: Wicentowski (2002) vs. this talk]

String Pairs

1

See my thesis, and Dreyer, Smith & Eisner, 2008

slide-65
SLIDE 65

Transliteration competition, NEWS 2009

[Bar chart: Accuracy on English-to-Russian (basic features): UAlberta 61.3, NICT 60.5, This talk 60.0, IBM 54.5; other entrants shown include UTokyo and UIUC]

String Pairs

1

slide-66
SLIDE 66
  • Presented a novel, well-defined probability

model over string pairs (or single strings)

  • General enough to model many string-to-string

problems in NLP (and neighboring disciplines)

  • Achieved high-scoring results in different tasks

(inflection, lemmatization, transliteration) in multiple languages (German, Basque, English, Irish,

Tagalog, Russian)

Conclusions / Contributions 1

slide-67
SLIDE 67
  • Linguistic properties and soft constraints can be

expressed and learned (prefer certain vowel/consonant sequences, prefer identities, ...)

  • Arbitrary-length output is handled elegantly

(eliminates need for limiting structure insertion)

  • Much information does not need to be annotated; it

is inferred as hidden variables (alignments, conjugation classes, regions)

Conclusions / Contributions 1

slide-68
SLIDE 68

Overview

1

p( )

p( )

p( )

String pairs Text and

paradigms

2 3

Multiple strings (paradigms)

slide-69
SLIDE 69
  • We’ve seen how to

model 2 strings, using feature-based finite- state machines

  • But we have bigger

goals ...

Multiple Strings

2

slide-70
SLIDE 70

Multiple Strings

2

slide-71
SLIDE 71

Example applications

2

Inflectional paradigms

slide-73
SLIDE 73

? ? ? ? ? ? ? ? ? ? ? ?

Example applications

2

Inflectional paradigms

slide-74
SLIDE 74

predict predict

Inflectional paradigms

Example applications

2


slide-77
SLIDE 77

predict predict reinforce

Inflectional paradigms

Example applications

2

slide-78
SLIDE 78

ice cream

アイスクリーム

English orthography English phonology Japanese orthography Japanese phonology

Transliteration (using phonology)

Example applications

2

slide-79
SLIDE 79

egg sample

Misspelling Pronunciation

Spelling correction

example

Correct spelling

Example applications

2

slide-80
SLIDE 80

... and all other tasks where word forms and representations interact:

  • Cognate modeling
  • Multiple-string alignment
  • System combination

Example applications

2

slide-81
SLIDE 81

Multiple Strings

  • Let’s build a general probability model over

multiple strings

  • It extends the string-pair model we saw in the last

part.

  • We will later be able to use it to learn how to

inflect verbs.

2

slide-82
SLIDE 82

Factor Graph:

  • Model. Factor graph examples

Pr(s1, s2) = (1/Z) × F1(s1, s2)   [factor graph: S1 - F1 - S2]

2

slide-83
SLIDE 83

Factor Graph:

  • Model. Factor graph examples

Pr(s1, s2) = (1/Z) × F1(s1, s2)   [factor graph: S1 - F1 - S2]

Random variable, ranges over any string

Random variable, ranges over any string

2

slide-84
SLIDE 84

Factor Graph:

  • Model. Factor graph examples

Pr(s1, s2) = (1/Z) × F1(s1, s2)   [factor graph: S1 - F1 - S2]

Random variable, ranges over any string

Potential function, can score any string pair Random variable, ranges over any string

2


slide-86
SLIDE 86

Factor Graph:

  • Model. Factor graph examples

Pr(s1, s2, s3) = (1/Z) × F1(s1, s2) × F2(s1, s3)

2

slide-87
SLIDE 87

Factor Graph:

  • Model. Factor graph examples

Pr(s1, s2, s3, s4) = (1/Z) × F1(s1, s2) × F2(s1, s3) × F3(s1, s4)

2

slide-88
SLIDE 88

Factor Graph:

  • Model. Factor graph examples

Pr(s1, s2, s3, s4) = (1/Z) × F1(s1, s2) × F2(s1, s3) × F3(s1, s4) × F4(s2, s3)

2

slide-89
SLIDE 89

Factor Graph:

  • Model. Factor graph examples

Pr(s1, s2, s3, s4) = (1/Z) × F1(s1, s2) × F2(s1, s3) × F3(s1, s4) × F4(s2, s3) × F5(s3, s4)

2

slide-90
SLIDE 90

Factor Graph:

  • Model. Factor graph examples

Pr(s1, s2, s3, s4) = (1/Z) × F1(s1, s2) × F2(s1, s3) × F3(s1, s4) × F4(s2, s3) × F5(s3, s4) × F6(s2, s4)

2

slide-91
SLIDE 91

Factor Graph:

  • Model. Factor graph examples

Pr(s1, s2, s3, s4) = (1/Z) × F1(s1, s2) × F2(s1, s3) × F3(s1, s4) × F4(s2, s3) × F5(s3, s4) × F6(s2, s4)

Potential function, can score any string pair

2

slide-92
SLIDE 92

Factor Graph:

  • Model. Factor graph examples

F1 S2 S1 F4 F5 F6 S3 F2 F3 S4

Potential function, can score any string pair

Each potential function F is computed by a finite- state transducer.

2

slide-93
SLIDE 93

Factor Graph:

  • Model. Factor graph examples

Pr(s1, s2, s3, s4) = (1/Z) × F1(s1, s2) × F2(s1, s3) × F3(s1, s4) × F4(s2, s3) × F5(s3, s4) × F6(s2, s4)

A formal description of such a model ...

2

slide-94
SLIDE 94
  • Model. Summary
  • It is formally an undirected graphical model

(a.k.a. Markov Random Field, MRF),

  • in which the variables are string-valued,

and the factors (potential functions) are finite-state transducers.

Dreyer & Eisner, 2009

2
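The definition can be sketched with table-valued potentials standing in for the finite-state transducers: the unnormalized score is the product of pairwise factors, and Z sums it over all joint assignments. Candidate strings and potential values below are illustrative:

```python
from itertools import product

def unnorm(assignment, factors):
    # assignment: var -> string; factors: list of ((v1, v2), table),
    # table: (s1, s2) -> potential value (stands in for an FST score).
    p = 1.0
    for (v1, v2), table in factors:
        p *= table.get((assignment[v1], assignment[v2]), 0.0)
    return p

def partition(domains, factors):
    # Z: sum of unnormalized scores over all joint assignments
    # (brute force; tiny candidate sets only).
    names = list(domains)
    return sum(unnorm(dict(zip(names, combo)), factors)
               for combo in product(*(domains[n] for n in names)))

domains = {'S1': ['brechen'], 'S2': ['breche', 'briche']}
F1 = {('brechen', 'breche'): 3.0, ('brechen', 'briche'): 1.0}
factors = [(('S1', 'S2'), F1)]
Z = partition(domains, factors)                                # 4.0
pr = unnorm({'S1': 'brechen', 'S2': 'breche'}, factors) / Z    # 0.75
```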

slide-95
SLIDE 95
  • Model. Less formal description

To model multiple strings and their various interactions, I ...

  • use many finite-state transducers,
  • have each of them look at a different string pair,
  • plug them together into a big network,
  • and coordinate them to predict all strings

jointly (also: train the transducers jointly).

2

slide-96
SLIDE 96
  • Model. Comparison with k-tape FSM
  • Model k strings with a k-tape finite-state machine?

S1 = breεchenε   S2 = brεachεεt   S3 = brεachenε   S4 = brεachεεε   [a single k-tape machine F]

  • Factored model more powerful:
  • Encode swaps and other useful models
  • Encode undecidable models

☺ ☹

2

slide-97
SLIDE 97
  • Model. Comparison with k-tape FSM
  • Model k strings with a k-tape finite-state machine?
  • > 26^k arcs, intractable!

S1 = breεchenε   S2 = brεachεεt   S3 = brεachenε   S4 = brεachεεε   [a single k-tape machine F]

Multiple-sequence alignment

  • Factored model more powerful:
  • Encode swaps and other useful models
  • Encode undecidable models

☺ ☹

2

slide-98
SLIDE 98
  • Inference. Overview

Factor Graph: F1 S2 S1 F4 F5 F6 S3 F2 F3 S4

2

slide-99
SLIDE 99
  • Inference. Overview

Factor Graph: F1 S2 S1 F4 F5 F6 S3 F2 F3 S4

  • Run Belief Propagation

(BP)

2

slide-100
SLIDE 100
  • Inference. Overview

Factor Graph: F1 S2 S1 F4 F5 F6 S3 F2 F3 S4

  • Run Belief Propagation

(BP)

  • BP is a message-passing

algorithm, a generalization of forward-backward.

2

slide-101
SLIDE 101
  • Inference. Overview

Factor Graph: F1 S2 S1 F4 F5 F6 S3 F2 F3 S4

  • Run Belief Propagation

(BP)

  • BP is a message-passing

algorithm, a generalization of forward-backward.

  • BP computes marginals

2

slide-102
SLIDE 102
  • Inference. Overview

Factor Graph: F1 S2 S1 F4 F5 F6 S3 F2 F3 S4

  • Run Belief Propagation

(BP)

  • BP is a message-passing

algorithm, a generalization of forward-backward.

  • BP computes marginals

In our version of BP, all messages and beliefs are finite-state machines, which is novel.

2
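The message updates follow the standard sum-product form; in the talk's novel variant the tables below would be finite-state acceptors, intersected rather than multiplied pointwise. A sketch with ordinary table messages (potential and message values are illustrative):

```python
def factor_to_var_message(table, incoming, target_values):
    # Sum-product update: mu_{F->S2}(s2) = sum_{s1} F(s1, s2) * mu_{S1->F}(s1)
    return {s2: sum(table.get((s1, s2), 0.0) * w
                    for s1, w in incoming.items())
            for s2 in target_values}

# Illustrative potential relating an observed form to two candidates:
F1 = {('brechen', 'brach'): 0.6, ('brechen', 'brech'): 0.2}
incoming = {'brechen': 1.0}                    # S1 is observed
msg = factor_to_var_message(F1, incoming, ['brach', 'brech'])
```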

slide-103
SLIDE 103

S2 F1 S1

  • Inference. Multiple strings

brechen

Example:

2

slide-104
SLIDE 104

S2 F1 S1

  • Inference. Multiple strings

brechen

Example:

predict

0.20 bracht 0.13 brechtet 0.08 brachtet ... (whole distribution)

2

slide-105
SLIDE 105

S2 F1 S3 F2 S1 F4 F5 S4 F3

  • Inference. Multiple strings

brechen

Example:

0.20 bracht 0.13 brechtet 0.08 brachtet ... (whole distribution)

2

slide-106
SLIDE 106

S3

  • Inference. Multiple strings

Example: S2 F1 F2 S1 F4 F5 S4 F3

brechen

0.20 bracht 0.13 brechtet 0.08 brachtet ... (whole distribution)

0.09 brach 0.03 brech 0.02 brich ...

0.27 brachen 0.07 brechten ...

2

slide-107
SLIDE 107

S3

  • Inference. Multiple strings

Example: S2 F1 F2 S1 F4 F5 S4 F3

brechen

0.20 bracht 0.13 brechtet 0.08 brachtet ... (whole distribution)

0.09 brach 0.03 brech 0.02 brich ...

0.27 brachen 0.07 brechten ...

0.23 brachten 0.18 brachen 0.11 brechten ... 0.12 brachen 0.07 brechen 0.01 brichen ...

2

slide-108
SLIDE 108

S3

  • Inference. Multiple strings

Example: S2 F1 F2 S1 F4 F5 S4 F3

brechen

0.20 bracht 0.13 brechtet 0.08 brachtet ... (whole distribution)

0.09 brach 0.03 brech 0.02 brich ...

0.27 brachen 0.07 brechten ...

0.23 brachten 0.18 brachen 0.11 brechten ... 0.12 brachen 0.07 brechen 0.01 brichen ...

Example: S3

0.27 brachen 0.07 brechten ...

0.23 brachten 0.18 brachen 0.11 brechten ... 0.12 brachen 0.07 brechen 0.01 brichen ...

2


slide-110
SLIDE 110

S3

  • Inference. Multiple strings

Example: S2 F1 F2 S1 F4 F5 S4 F3

brechen

0.20 bracht 0.13 brechtet 0.08 brachtet ... (whole distribution)

0.09 brach 0.03 brech 0.02 brich ...

0.27 brachen 0.07 brechten ...

Example:

0.23 brachten 0.18 brachen 0.11 brechten ... 0.12 brachen 0.07 brechen 0.01 brichen ...

Decoding output for S3 (consensus): brachen

S3

0.27 brachen 0.07 brechten ...

2
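Consensus decoding can be sketched as: multiply all incoming messages pointwise (the FSA analogue is intersection) and take the argmax string. The message values below are illustrative, echoing the slide's example:

```python
import math

def belief(messages):
    # Pointwise product of all incoming messages; for FSA-valued
    # messages this corresponds to intersection.
    keys = set().union(*messages)
    return {s: math.prod(m.get(s, 0.0) for m in messages) for s in keys}

msgs = [{'brachen': 0.27, 'brechten': 0.07},
        {'brachen': 0.18, 'brechten': 0.11}]
b = belief(msgs)
best = max(b, key=b.get)   # 'brachen', matching the slide's consensus
```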

slide-111
SLIDE 111

S3

  • Inference. Multiple strings

Example: S2 F1 F2 S1 F4 F5 S4 F3

brechen

0.27 brachen 0.07 brechten ...

Example: S3

  • Each message is a

finite-state acceptor

  • Intersect all

incoming messages

2

slide-112
SLIDE 112

S3

  • Inference. Multiple strings

Example: S2 F1 F2 S1 F4 F5 S4 F3

brechen

0.27 brachen 0.07 brechten ...

Example: S3

Usually, BP just works with simple lookup tables as factors and messages, not finite-state machines. For example, message passing in a CRF:

α β word tag ... ...

0.47 N 0.12 V ... 0.23 V 0.11 A ... 0.53 N 0.41 A ...

2


slide-114
SLIDE 114
  • Inference. Multiple strings
  • We can also run loopy

belief propagation on these finite-state Markov Random Fields (MRFs)

  • Just iterate the message

passing

  • Issues with intractability, see my thesis

S1 = b r e c h e n (observed)

2


slide-116
SLIDE 116
  • Joint inference can be used to train these

models from data

  • Training data consists of complete or

incomplete tables of forms

  • We present a method to induce factor graphs

in a data-driven way

  • See my thesis for the approach and results
  • Inference. Multiple strings

2

slide-117
SLIDE 117
  • Presented general, novel joint probability

model over multiple strings

  • Combines NLP techniques (finite-state

machines) with machine-learning techniques (graphical models)

  • Markov Random Field over strings (variables:

string-valued, potential functions: finite- state machines)

Conclusions / Contributions 2

slide-118
SLIDE 118
  • Novel variant of belief propagation with finite-

state messages

  • Presented approximations
  • Presented novel way of structure induction

for string-based models based on edit distance

  • Achieved significant improvements through

staged joint training of complex factor graphs

Conclusions / Contributions 2

slide-119
SLIDE 119

Overview

1

p( )

p( )

p( )

String pairs Multiple strings (paradigms)

2 3

Text and

paradigms

slide-120
SLIDE 120

Text & Paradigms

  • We have seen how an

inflectional paradigm (“multiple strings”) can be modeled by finite-state Markov Random Fields (MRFs)

  • Now we will build a joint model over inflectional paradigms and a text corpus

  • Goal: Learn how to inflect words

in the language, using clues from the text corpus

3

F1 S2 S1 F4 F5 F6 S3 F2 F3 S4 Pr(s1, s2, s3, s4)

4 systematically related spellings:

slide-121
SLIDE 121

3

Text & Paradigms

  • In Part 2:

We learn how to inflect words, given some observed paradigms (complete or incomplete) that someone created as training data (expensive supervision)

  • Here in Part 3:

We also want to learn how to inflect words, we’ll use a few observed paradigms as well, but mainly learn from plain text (cheap data)

Why do we want to use a text corpus?

☺ ☹

slide-122
SLIDE 122

3

Text & Paradigms

How can a text corpus help? It can potentially fix erroneous MRF string predictions. Intuition: If a spelling predicted by the MRF cannot be found in the corpus, it was probably an incorrect prediction.
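The intuition can be sketched as a simple reranker that penalizes predicted spellings unattested in the corpus. This toy penalty scheme is my illustration, not the talk's actual inference procedure:

```python
def rerank(candidates, corpus_counts, unattested_penalty=0.1):
    # Multiply each MRF score by a penalty when the spelling
    # never occurs in the corpus (toy scheme, not the real model).
    return {s: score * (1.0 if corpus_counts.get(s, 0) > 0
                        else unattested_penalty)
            for s, score in candidates.items()}

cands = {'brechten': 0.31, 'brachen': 0.18}   # MRF's top predictions
counts = {'brachen': 7}                       # 'brechten' is unattested
scores = rerank(cands, counts)
best = max(scores, key=scores.get)            # 'brachen'
```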

slide-123
SLIDE 123

S3 S2 F1 F2 S1 F4 F5 S4 F3

brechen

0.20 bracht 0.13 brechtet 0.08 brachtet ... (whole distribution)

0.09 brach 0.03 brech 0.02 brich ...

0.27 brachen 0.07 brechten ...

0.31 brechten 0.18 brachen 0.11 brichten ... 0.12 brechten 0.07 brachen 0.01 brichen ...

Decoding output for S3 (consensus): brechten

S3

0.27 brachen 0.21 brechten ...

3

Text & Paradigms

MRF is making a mistake: brechten is nonsense and not found in corpus. But the 2nd-best form, brachen, is frequent. It’s probably correct!

slide-124
SLIDE 124

We will make such decisions using statistical inference, under a probability model that uses the finite- state MRFs, but models the text corpus as well.

Text & Paradigms

3

slide-125
SLIDE 125

3

Text & Paradigms

  • Keep tokens and types separate
  • Tokens are in the text corpus
  • Types are in the paradigms
  • We also model abstract morphological

knowledge about how forms are related What kind of probability model do we want?

slide-126
SLIDE 126
  • Each paradigm contains the systematically

related spellings of a lexeme (modeled by finite-state MRFs).

  • Assume an unbounded number of

possible lexemes and paradigms (“non- parametric”) in the text.

3

Text & Paradigms

slide-127
SLIDE 127

3

Text & Paradigms

Generative story Model Data

generate

Inference (Sampling) Model Data

learn

slide-128
SLIDE 128

3

Text & Paradigms

Generative story Model Data

generate

slide-129
SLIDE 129

To generate from our model: First, generate the types of the language. Then, use them to generate the corpus tokens.

3

Text & Paradigms

Generative story Model Data

generate

slide-130
SLIDE 130

Text & Paradigms

3

Generate infinitely many lexemes

(1)

slide-131
SLIDE 131

Text & Paradigms

3

0.02 0.04 0.01 0.08 0.01 0.12 0.03 0.06 0.08 Generate infinitely many lexemes Stick-breaking process

(1)
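The stick-breaking process named on the slide can be sketched directly: repeatedly break off a Beta(1, α) fraction of the remaining stick to get lexeme probabilities. A truncated sketch; α and the truncation level are arbitrary choices here:

```python
import random

def stick_breaking(alpha, k, rng):
    # Truncated stick-breaking: beta_i ~ Beta(1, alpha),
    # pi_i = beta_i * prod_{j<i} (1 - beta_j).
    weights, remaining = [], 1.0
    for _ in range(k):
        beta = rng.betavariate(1.0, alpha)
        weights.append(beta * remaining)
        remaining *= 1.0 - beta
    return weights

rng = random.Random(0)
w = stick_breaking(alpha=2.0, k=10, rng=rng)
# w holds positive lexeme probabilities; the unbroken remainder
# accounts for the infinitely many lexemes not yet generated.
```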

slide-132
SLIDE 132

Text & Paradigms

3

0.02 0.04 0.01 0.08 0.01 0.12 0.03 0.06 0.08 Generate infinitely many lexemes [zoom] Stick-breaking process

(1)

slide-133
SLIDE 133

Text & Paradigms

3

0.02 0.01 0.08 [zoom]

(1) Generate infinitely many lexemes

slide-134
SLIDE 134

Text & Paradigms

3

0.02 0.01 0.08

(1) Generate infinitely many lexemes

slide-135
SLIDE 135

Text & Paradigms

3

0.02 0.01 0.08 Each lexeme has a paradigm with slots for the different inflections

1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl

(2)

slide-136
SLIDE 136

Text & Paradigms

3

0.02 0.01 0.08 Each lexeme has a paradigm with slots for the different inflections

1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl 1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl 1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl

(2)

slide-137
SLIDE 137

Text & Paradigms

3

0.02 0.01 0.08

1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl 1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl 1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl .07 .05 .2 .12 .26 .3 .08 .06 .1 .11 .25 .4 .13 .4 .06 .08 .3 .03

Each paradigm has a distribution over slot frequencies

(3)
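The slides do not name the distribution over slot frequencies; a Dirichlet, sampled here via normalized Gamma draws, is one standard choice, shown purely as an assumption:

```python
import random

def sample_slot_distribution(slots, alpha, rng):
    # Symmetric Dirichlet(alpha) via normalized Gamma draws:
    # a random probability distribution over the paradigm slots.
    draws = [rng.gammavariate(alpha, 1.0) for _ in slots]
    total = sum(draws)
    return {s: d / total for s, d in zip(slots, draws)}

rng = random.Random(0)
dist = sample_slot_distribution(
    ['1st sg', '2nd sg', '3rd sg', '1st pl', '2nd pl', '3rd pl'],
    alpha=1.0, rng=rng)
```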

slide-138
SLIDE 138

Text & Paradigms

3

0.02 0.01 0.08

.07 .05 .2 .12 .26 .3

(4)

1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl 1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl .08 .06 .1 .11 .25 .4 .13 .4 .06 .08 .3 .03

All paradigms generate their spellings using the same finite-state MRF parameterized by θ

1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl

slide-139
SLIDE 139

Text & Paradigms

3

0.02 0.01 0.08

breche brechen brichst brecht bricht brechen .07 .05 .2 .12 .26 .3

(4)

1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl 1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl .08 .06 .1 .11 .25 .4 .13 .4 .06 .08 .3 .03

All paradigms generate their spellings using the same finite-state MRF parameterized by θ

slide-140
SLIDE 140

Text & Paradigms

3

0.02 0.01 0.08

treffe treffen triffst trefft trifft treffen breche brechen brichst brecht bricht brechen .07 .05 .2 .12 .26 .3 .13 .4 .06 .08 .3 .03

(4)

1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl .08 .06 .1 .11 .25 .4

All paradigms generate their spellings using the finite-state MRF parameterized by θ

slide-141
SLIDE 141

Text & Paradigms

3

0.02 0.01 0.08

treffe treffen triffst trefft trifft treffen springe springen springst springt springt springen breche brechen brichst brecht bricht brechen .07 .05 .2 .12 .26 .3 .08 .06 .1 .11 .25 .4 .13 .4 .06 .08 .3 .03

(4) All paradigms generate their

spellings using the finite-state MRF parameterized by θ

slide-142
SLIDE 142

Text & Paradigms

3

0.02 0.01 0.08

treffe treffen triffst trefft trifft treffen springe springen springst springt springt springen breche brechen brichst brecht bricht brechen .07 .05 .2 .12 .26 .3 .08 .06 .1 .11 .25 .4 .13 .4 .06 .08 .3 .03

(4) All paradigms generate their

spellings using the finite-state MRF parameterized by θ

θ is a “morphological grammar”. The true θ for German morphology will know, for example, “from 1st singular to plural, just append the suffix n”, or “between 3rd sg and pl, a vowel change is likely”.
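The rules that θ encodes can be sketched as a log-linear scorer over (source form, target form) pairs. This is a toy stand-in with invented features and weights; the actual model attaches log-linear weights to arcs of finite-state transducers over arbitrary strings.

```python
import math

# Hypothetical weights standing in for the learned grammar θ.
THETA = {"append_n": 2.0, "vowel_i_to_e": 1.5, "identity": 0.5}

def features(src, dst):
    """Crude binary features of a (source form, target form) pair."""
    feats = []
    if dst == src + "n":                        # "just append the suffix n"
        feats.append("append_n")
    if src.replace("i", "e", 1) == dst or dst.replace("i", "e", 1) == src:
        feats.append("vowel_i_to_e")            # "a vowel change is likely"
    if src == dst:
        feats.append("identity")
    return feats

def score(src, dst):
    """Unnormalized log-linear score of rewriting src as dst."""
    return math.exp(sum(THETA.get(f, 0.0) for f in features(src, dst)))
```

Under these toy weights, score("breche", "brechen") is high because the append-n feature fires, and score("bricht", "brecht") benefits from the vowel-change feature.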

slide-143
SLIDE 143

Text & Paradigms

3

0.02 0.01 0.08

treffe treffen triffst trefft trifft treffen springe springen springst springt springt springen breche brechen brichst brecht bricht brechen .07 .05 .2 .12 .26 .3 .08 .06 .1 .11 .25 .4 .13 .4 .06 .08 .3 .03

All types of the language have been generated. Now generate the corpus tokens.

slide-144
SLIDE 144

Text & Paradigms

3

0.02 0.01 0.08

treffe treffen triffst trefft trifft treffen springe springen springst springt springt springen breche brechen brichst brecht bricht brechen .07 .05 .2 .12 .26 .3 .08 .06 .1 .11 .25 .4 .13 .4 .06 .08 .3 .03

slide-145
SLIDE 145

Text & Paradigms

3

0.02 0.01 0.08

treffe treffen triffst trefft trifft treffen springe springen springst springt springt springen breche brechen brichst brecht bricht brechen .07 .05 .2 .12 .26 .3 .08 .06 .1 .11 .25 .4 .13 .4 .06 .08 .3 .03

Generate the corpus: POS tags

(5)

Adv V PPER V N V Prep V N V

slide-146
SLIDE 146

Text & Paradigms

3

0.02 0.01 0.08

treffe treffen triffst trefft trifft treffen springe springen springst springt springt springen breche brechen brichst brecht bricht brechen .07 .05 .2 .12 .26 .3 .08 .06 .1 .11 .25 .4 .13 .4 .06 .08 .3 .03

Generate the corpus: Lexemes

(5)

Adv V PPER V N V Prep V N V

POS Lex

slide-147
SLIDE 147

Text & Paradigms

3

0.02 0.01 0.08

treffe treffen triffst trefft trifft treffen springe springen springst springt springt springen breche brechen brichst brecht bricht brechen .07 .05 .2 .12 .26 .3 .08 .06 .1 .11 .25 .4 .13 .4 .06 .08 .3 .03

Generate the corpus: Inflections

(5)

Adv V PPER V N V Prep V N V

POS Lex Infl

3rd sg 1st pl 2nd pl 1st pl 2nd sg

slide-148
SLIDE 148

Text & Paradigms

3

0.02 0.01 0.08

treffe treffen triffst trefft trifft treffen springe springen springst springt springt springen breche brechen brichst brecht bricht brechen .07 .05 .2 .12 .26 .3 .08 .06 .1 .11 .25 .4 .13 .4 .06 .08 .3 .03

Generate the corpus: Look up the spellings

(5)

Adv V PPER V N V Prep V N V 3rd sg 1st pl 2nd pl 1st pl 2nd sg bricht

POS Lex Infl Spell

brechen springt brechen triffst
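The four generation steps above (POS, lexeme, inflection, spelling lookup) can be sketched as a tiny generative story. The paradigm entries come from the slides, but the probability tables are invented; the thesis instead draws lexemes and inflections from Dirichlet-process distributions and POS tags from a sequence model.

```python
import random

random.seed(0)

PARADIGMS = {                  # lexeme -> slot -> spelling (from the slides)
    "BRECHEN":  {"1st sg": "breche",  "3rd sg": "bricht",  "3rd pl": "brechen"},
    "SPRINGEN": {"1st sg": "springe", "3rd sg": "springt", "3rd pl": "springen"},
}
LEXEME_PROBS = {"BRECHEN": 0.6, "SPRINGEN": 0.4}             # invented
SLOT_PROBS = {"1st sg": 0.2, "3rd sg": 0.5, "3rd pl": 0.3}   # invented

def draw(dist):
    """Sample a key of `dist` proportionally to its value."""
    return random.choices(list(dist), weights=list(dist.values()))[0]

def generate_verb_token():
    """POS is fixed to V here: sample a lexeme, then an inflection,
    then look up the spelling in the lexeme's paradigm."""
    lex = draw(LEXEME_PROBS)
    slot = draw(SLOT_PROBS)
    return lex, slot, PARADIGMS[lex][slot]

tokens = [generate_verb_token() for _ in range(5)]
```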

slide-149
SLIDE 149

Generative story Model Data

generate

Text & Paradigms

3

slide-150
SLIDE 150

Text & Paradigms

3

slide-151
SLIDE 151

Text & Paradigms

3

Inference (Sampling) Model Data

learn

slide-152
SLIDE 152

Text & Paradigms

3

We start with the observed corpus tokens, reconstruct the paradigms, and estimate all distributions. Inference (Sampling) Model Data

learn

slide-153
SLIDE 153

Text & Paradigms

3

Adv V PPER V N V Prep V N V bricht

POS Lex Infl Spell

brechen springt brechen triffst

0.02 0.01 0.08

treffe treffen triffst trefft trifft treffen springe springen springst springt springt springen breche brechen brichst brecht bricht brechen .07 .05 .2 .12 .26 .3 .08 .06 .1 .11 .25 .4 .13 .4 .06 .08 .3 .03 3rd sg 1st pl 2nd pl 1st pl 2nd sg

slide-154
SLIDE 154

Text & Paradigms

3

Adv V PPER V N V Prep V N V bricht

POS Lex Infl Spell

brechen springt brechen triffst

slide-155
SLIDE 155

Text & Paradigms

3

Adv V PPER V N V Prep V N V bricht

POS Lex Infl Spell

brechen springt brechen triffst

slide-156
SLIDE 156

Text & Paradigms

3

Adv V PPER V N V Prep V N V

POS Lex Infl Spell

bricht brechen springt brechen triffst

Minimal supervision: we also observe a few seed paradigms, from which we can estimate an initial θ, which parameterizes the finite-state MRFs

slide-157
SLIDE 157

Text & Paradigms

3

Adv V PPER V N V Prep V N V

POS Lex Infl Spell

Seed paradigm

treffe treffen triffst trefft trifft treffen bricht brechen springt brechen triffst

Minimal supervision: we also observe a few seed paradigms, from which we can estimate an initial θ, which parameterizes the finite-state MRFs

slide-158
SLIDE 158

Text & Paradigms

3

Adv V PPER V N V Prep V N V

POS Lex Infl Spell

Seed paradigm

treffe treffen triffst trefft trifft treffen bricht brechen springt brechen triffst

slide-159
SLIDE 159

Text & Paradigms

3

Adv V PPER V N V Prep V N V

POS Lex Infl Spell

Seed paradigm

treffe treffen triffst trefft trifft treffen

...

Train initial θ values (“morphological grammar”):

  • “e” is likely to change into “i”
  • 3rd sg ends in “t”
  • from 3rd sg to 1st pl, change vowel

bricht brechen springt brechen triffst

slide-160
SLIDE 160

Text & Paradigms

3

Adv V PPER V N V Prep V N V

POS Lex Infl Spell

treffe treffen triffst trefft trifft treffen bricht brechen springt brechen triffst

slide-161
SLIDE 161

Text & Paradigms

3

Adv V PPER V N V Prep V N V

POS Lex Infl Spell

treffe treffen triffst trefft trifft treffen bricht brechen springt brechen triffst

w1 w2 w3 w4 w5

The red lexeme is completely specified and “bricht” does not fit in.

slide-162
SLIDE 162

bricht

Text & Paradigms

3

Adv V PPER V N V Prep V N V

POS Lex Infl Spell

treffe treffen triffst trefft trifft treffen brechen springt brechen triffst

w1 w2 w3 w4 w5

1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl

3rd sg ends in “t”

w1

Remember: θ says,

slide-163
SLIDE 163

treffe treffen triffst trefft trifft treffen 1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl 1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl

Text & Paradigms

3

Adv V PPER V N V Prep V N V

POS Lex Infl Spell

treffe treffen triffst trefft trifft treffen brechen springt brechen triffst

w1 w2 w3 w4 w5

3rd sg ends in “t”

w1

Remember: θ says,

bricht bricht

slide-164
SLIDE 164

Text & Paradigms

3

Adv V PPER V N V Prep V N V

POS Lex Infl Spell

treffe treffen triffst trefft trifft treffen brechen springt brechen triffst

w1 w2 w3 w4 w5

1st sg 1st pl 2nd sg 2nd pl bricht 3rd pl

w1

3rd sg ends in “t”

Remember: θ says,

bricht 3rd sg

slide-165
SLIDE 165

Text & Paradigms

3

Adv V PPER V N V Prep V N V

POS Lex Infl Spell

treffe treffen triffst trefft trifft treffen brechen springt brechen triffst

w1 w2 w3 w4 w5

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht

brichen? brechen?

w1

We immediately run finite-state-based belief propagation in this new paradigm.

bricht 3rd sg
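This belief-propagation step can be mimicked on the star-shaped paradigm graph: each observed slot sends the hidden lemma a message, and the lemma’s belief is the normalized product. The deterministic `predict` rules and the two-lemma candidate set below are invented to fit the slides’ verbs; the real model scores arbitrary string pairs with weighted finite-state machines.

```python
def predict(lemma, slot):
    """Toy stand-in for the lemma-to-form transducer: strip the
    infinitive -en, apply the e->i vowel change in 2nd/3rd sg,
    and attach a personal ending."""
    stem = lemma[:-2]
    if slot in ("2nd sg", "3rd sg"):
        stem = stem.replace("e", "i", 1)
    ending = {"1st sg": "e", "2nd sg": "st", "3rd sg": "t",
              "1st pl": "en", "2nd pl": "t", "3rd pl": "en"}[slot]
    return stem + ending

def lemma_belief(observed, candidates=("brechen", "brichen")):
    """One BP sweep on a star graph: multiply one message per
    observed slot into each lemma candidate, then normalize."""
    beliefs = {}
    for lemma in candidates:
        b = 1.0
        for slot, form in observed.items():
            b *= 5.0 if predict(lemma, slot) == form else 0.1
        beliefs[lemma] = b
    z = sum(beliefs.values())
    return {lemma: b / z for lemma, b in beliefs.items()}
```

With only bricht (3rd sg) observed, brechen and brichen tie, mirroring the “brichen? brechen?” uncertainty on the slide; once brechen is also observed as 3rd pl, the belief collapses onto the lemma brechen.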

slide-166
SLIDE 166

Text & Paradigms

3

Adv V PPER V N V Prep V N V

POS Lex Infl Spell

treffe treffen triffst trefft trifft treffen brechen springt brechen triffst

w1 w2 w3 w4 w5

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht

brichen? brechen?

w1

bricht 3rd sg

w2

slide-167
SLIDE 167

treffe treffen triffst trefft trifft treffen

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht

brichen? brechen? briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht brechen

Text & Paradigms

3

Adv V PPER V N V Prep V N V

POS Lex Infl Spell

treffe treffen triffst trefft trifft treffen springt brechen triffst

w1 w2 w3 w4 w5 w1

bricht 3rd sg

w2

brechen brechen 3rd pl

slide-168
SLIDE 168

brechen treffe treffen triffst trefft trifft treffen

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht

brichen? brechen? briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht brechen

Text & Paradigms

3

Adv V PPER V N V Prep V N V

POS Lex Infl Spell

treffe treffen triffst trefft trifft treffen springt brechen triffst

w1 w2 w3 w4 w5 w1

bricht 3rd sg

w2

3rd pl

slide-169
SLIDE 169

brechen treffe treffen triffst trefft trifft treffen

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht

brichen? brechen? briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht brechen

Text & Paradigms

3

Adv V PPER V N V Prep V N V

POS Lex Infl Spell

treffe treffen triffst trefft trifft treffen springt brechen triffst

w1 w2 w3 w4 w5 w1

bricht 3rd sg

w2

3rd pl 1st sg 1st pl 2nd sg 2nd pl 3rd sg 3rd pl

w3

slide-170
SLIDE 170

brechen treffe treffen triffst trefft trifft treffen

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht

brichen? brechen? briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht brechen

Text & Paradigms

3

Adv V PPER V N V Prep V N V

POS Lex Infl Spell

treffe treffen triffst trefft trifft treffen springt brechen triffst

w1 w2 w3 w4 w5 w1

bricht 3rd sg

w2

3rd pl 1st sg 1st pl 2nd sg 2nd pl springt 3rd pl

w3

springt 3rd sg

slide-171
SLIDE 171

brechen treffe treffen triffst trefft trifft treffen

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht

brichen? brechen? briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht brechen

Text & Paradigms

3

Adv V PPER V N V Prep V N V

POS Lex Infl Spell

treffe treffen triffst trefft trifft treffen springt brechen triffst

w1 w2 w3 w4 w5 w1

bricht 3rd sg

w2

3rd pl

springe? sprenge? springen? sprengen? springst? sprengst? springt? sprengt?

springt

springen? sprengen?

w3

springt 3rd sg

Run belief propagation!

slide-172
SLIDE 172

springe? sprenge? springen? sprengen? springst? sprengst? springt? sprengt?

springt

springen? sprengen?

brechen treffe treffen triffst trefft trifft treffen

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht

brichen? brechen? briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht brechen

Text & Paradigms

3

Adv V PPER V N V Prep V N V

POS Lex Infl Spell

treffe treffen triffst trefft trifft treffen brechen triffst

w1 w2 w3 w4 w5 w1

bricht 3rd sg

w2

3rd pl

w3

springt 3rd sg

slide-173
SLIDE 173

springe? sprenge? springen? sprengen? springst? sprengst? springt? sprengt?

springt

springen? sprengen?

brechen treffe treffen triffst trefft trifft treffen

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht

brichen? brechen? briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht

brechen

Text & Paradigms

3

Adv V PPER V N V Prep V N V

POS Lex Infl Spell

treffe treffen triffst trefft trifft treffen brechen triffst

w1 w2 w3 w4 w5 w1

bricht 3rd sg

w2

3rd pl

w3

springt 3rd sg

It would fit well in two of the cells:

w4

slide-174
SLIDE 174

springe? sprenge? springen? sprengen? springst? sprengst? springt? sprengt?

springt

springen? sprengen?

brechen treffe treffen triffst trefft trifft treffen

briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht

brichen? brechen? briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht

brechen

Text & Paradigms

3

Adv V PPER V N V Prep V N V

POS Lex Infl Spell

treffe treffen triffst trefft trifft treffen brechen triffst

w1 w2 w3 w4 w5 w1

bricht 3rd sg

w2

3rd pl

w3

springt 3rd sg

w4

brechen 3rd pl

We do not have to run BP now, because the paradigm spellings have not changed. But the frequency estimates change right away.

slide-175
SLIDE 175

springe? sprenge? springen? sprengen? springst? sprengst? springt? sprengt?

springt

springen? sprengen? briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht

brechen

brechen treffe treffen triffst trefft trifft treffen

Text & Paradigms

3

Adv V PPER V N V Prep V N V

POS Lex Infl Spell

treffe treffen triffst trefft trifft treffen triffst

w1 w2 w3 w4 w5 w1

bricht 3rd sg

w2

3rd pl

w3

springt 3rd sg

w4

brechen 3rd pl

w5

slide-176
SLIDE 176

springe? sprenge? springen? sprengen? springst? sprengst? springt? sprengt?

springt

springen? sprengen? briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht

brechen

brechen treffe treffen triffst trefft trifft treffen

Text & Paradigms

3

Adv V PPER V N V Prep V N V

POS Lex Infl Spell

treffe treffen triffst trefft trifft treffen triffst

w1 w2 w3 w4 w5 w1

bricht 3rd sg

w2

3rd pl

w3

springt 3rd sg

w4

brechen 3rd pl

w5

triffst 2nd sg

slide-177
SLIDE 177

springe? sprenge? springen? sprengen? springst? sprengst? springt? sprengt?

springt

springen? sprengen? briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht

brechen

brechen treffe treffen triffst trefft trifft treffen

Text & Paradigms

3

Adv V PPER V N V Prep V N V

POS Lex Infl Spell

treffe treffen triffst trefft trifft treffen triffst

w1 w2 w3 w4 w5 w1

bricht 3rd sg

w2

3rd pl

w3

springt 3rd sg

w4

brechen 3rd pl

w5

triffst 2nd sg

We will now re-estimate θ, given our new “observations” (samples). This training method is called MCEM.
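A minimal Monte Carlo EM skeleton, under heavy simplifying assumptions (a fixed three-token corpus and hand-listed candidate analyses): the E-step samples hidden inflections given the current θ, standing in for the Gibbs sampler, and the M-step re-estimates θ from the sampled counts.

```python
import random
from collections import Counter

random.seed(0)

CORPUS = ["bricht", "brechen", "springt"]
ANALYSES = {"bricht": ["3rd sg"],                 # candidate slots per token
            "brechen": ["1st pl", "3rd pl"],
            "springt": ["2nd pl", "3rd sg"]}

def e_step(theta):
    """Sample one analysis per corpus token, weighted by current theta."""
    return [(w, random.choices(ANALYSES[w],
                               weights=[theta.get(s, 1.0) for s in ANALYSES[w]])[0])
            for w in CORPUS]

def m_step(samples):
    """Re-estimate theta as smoothed relative slot frequencies."""
    counts = Counter(slot for _, slot in samples)
    total = sum(counts.values())
    return {slot: (n + 0.1) / (total + 0.1) for slot, n in counts.items()}

theta = {}
for _ in range(10):            # MCEM: alternate sampling and maximization
    theta = m_step(e_step(theta))
```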

slide-178
SLIDE 178

springe? sprenge? springen? sprengen? springst? sprengst? springt? sprengt?

springt

springen? sprengen? briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht

brechen

brechen treffe treffen triffst trefft trifft treffen

Text & Paradigms

3

Adv V PPER V N V Prep V N V

POS Lex Infl Spell

treffe treffen triffst trefft trifft treffen triffst

w1 w2 w3 w4 w5 w1

bricht 3rd sg

w2

3rd pl

w3

springt 3rd sg

w4

brechen 3rd pl

w5

triffst 2nd sg

We go over the corpus over and over again, re-analyzing words in the light of newly acquired knowledge about table frequencies, inflection frequencies and the updated “morphological grammar” θ.

slide-179
SLIDE 179

springe? sprenge? springen? sprengen? springst? sprengst? springt? sprengt?

springt

springen? sprengen? briche? breche? brichen? brechen? brichst? brechst? bricht? brecht?

bricht

brechen

brechen treffe treffen triffst trefft trifft treffen

Text & Paradigms

3

Adv V PPER V N V Prep V N V

POS Lex Infl Spell

treffe treffen triffst trefft trifft treffen

w1 w2 w3 w4 w5 w1

bricht 3rd sg

w2

3rd pl

w3

springt 3rd sg

w4

brechen 3rd pl

w5

triffst 2nd sg

We go over the corpus over and over again, re-analyzing words in the light of newly acquired knowledge about table frequencies, inflection frequencies and the updated “morphological grammar” θ.

slide-180
SLIDE 180

springe? sprenge? springen? sprengen? springst? sprengst? springt? sprengt?

springt

springen? sprengen? briche? breche? brichen? brechen? brichst? brechst?

bricht

bricht? brecht?

brechen

brechen treffe treffen triffst trefft trifft treffen

Text & Paradigms

3

Adv V PPER V N V Prep V N V

POS Lex Infl Spell

treffe treffen triffst trefft trifft treffen

w1 w2 w3 w4 w5 w1

bricht

w2 w3

springt

w4

brechen

w5

We go over the corpus over and over again, re-analyzing words in the light of newly acquired knowledge about table frequencies, inflection frequencies and the updated “morphological grammar” θ.

3rd sg 3rd pl 3rd sg 3rd pl 2nd sg 2nd pl triffst

slide-181
SLIDE 181
  • Constantly update frequency estimates for lexemes

and inflections

  • Often update the “morphological grammar” θ
  • Keep re-analyzing words accordingly
  • Run finite-state BP to fill in missing paradigm cells
  • Important: Often, BP will produce one regular and some

more irregular candidates; when one of them is found in the corpus, it is placed in the cell, so we “learn” it!

Text & Paradigms

3

Summary of the sampling process:

slide-182
SLIDE 182
  • Inflections and lexemes at the corpus positions

are sampled.

  • The missing paradigm cells are marginalized over.
  • The “morphological grammar” θ is maximized.
  • We are using a collapsed Gibbs sampler,

according to a hierarchical Chinese Restaurant Process, with interspersed finite-state belief propagation steps
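The Chinese Restaurant Process behind the lexeme clustering can be sketched as follows. This omits string compatibility and the hierarchical/collapsed aspects; `alpha` is a hypothetical concentration parameter.

```python
import random

random.seed(1)

def crp_assign(tokens, alpha=1.0):
    """Seat each token at an existing lexeme 'table' with probability
    proportional to the table's size, or at a new table with
    probability proportional to alpha (unbounded number of lexemes)."""
    tables = []
    for tok in tokens:
        weights = [len(t) for t in tables] + [alpha]
        k = random.choices(range(len(tables) + 1), weights=weights)[0]
        if k == len(tables):
            tables.append([tok])       # open a new lexeme
        else:
            tables[k].append(tok)
    return tables

tables = crp_assign(["bricht", "brechen", "springt", "triffst", "trefft"])
```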

Text & Paradigms

3

Summary of the sampling process:

slide-183
SLIDE 183
  • Improve mixing and prevent “lock-in”
  • Do not sample word by word
  • Instead: Pick a whole lexeme, remove all its

current words and perform Gibbs sampling just with those words (for one or more iterations)

3

Text & Paradigms

Sampling Speedup:

slide-184
SLIDE 184

Text & Paradigms

3

Obtaining results for evaluation

  • We add many paradigms, in which only the

lemma form is given, but the other slots are empty.

  • Just keep track of which corpus tokens the sampler

places in those empty cells, or which candidates are suggested by belief propagation.

  • To get an answer for a particular cell, take its

marginal probability distribution at the end of each iteration. At the end, average the probabilities per spelling and report the highest-scoring one.
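The readout described above might look like this sketch, with invented per-iteration marginals for a single paradigm cell:

```python
from collections import defaultdict

iteration_marginals = [            # one candidate distribution per iteration
    {"bricht": 0.7, "brecht": 0.3},
    {"bricht": 0.6, "brecht": 0.4},
    {"bricht": 0.8, "brecht": 0.2},
]

def predict_cell(marginals):
    """Average each spelling's probability over iterations and
    report the highest-scoring spelling."""
    avg = defaultdict(float)
    for m in marginals:
        for spelling, p in m.items():
            avg[spelling] += p / len(marginals)
    return max(avg, key=avg.get)

print(predict_cell(iteration_marginals))   # -> bricht
```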

slide-185
SLIDE 185

3

Text & Paradigms

The complete probability model (simplified):

[Plate diagram with variables ∞, N, Π, G, G0, σ², θ, D, S, L, W and hyperparameters H, H0, α′, α, φ. Its components: a Dirichlet process with a base distribution over lexemes; a base distribution over inflections (maxent); and the “morphological grammar” θ, a finite-state transducer defining a distribution over paradigms, tying each word to its paradigm.]
slide-186
SLIDE 186

Experiment: Learn German inflectional morphology

  • Given:
  • 50 seed paradigms (from CELEX)
  • German corpus of 10 million words (from

“WaCKy” corpus)

  • Test:

For 5,415 German verbs, predict paradigms with 21 inflections each

3

Text & Paradigms

slide-187
SLIDE 187

3

Text & Paradigms

[Bar chart: whole-paradigm prediction accuracy (y-axis 89–92) with 50 vs. 100 seed paradigms, for no corpus, a 1-million-word corpus, and a 10-million-word corpus; reported values: 92, 90.9, 91.9, 90.4, 91.5, 89.4]

Adding a large text corpus significantly improves prediction accuracy.

Regular baseline: 85.4

slide-188
SLIDE 188

3

Text & Paradigms

[Bar chart: per-form accuracy (y-axis 50–100) for the individual inflections (13PIA, 13PIE, 13PKA, 13PKE, 13SIA, 13SKA, 1SIE, 2PIA, 2PIE, 2PKA, 2PKE, 2SIA, 2SIE, 2SKA, 2SKE, 3SIE, pA, pE, rP, rS) under no corpus, 1 million words, and 10 million words]

Many forms are easy (~100% accuracy). There are large gains on some forms (irregularities). In rare cases, the corpus hurts.

slide-189
SLIDE 189
  • Simplifications:
  • The finite-state MRFs use a simple factor

graph that just connects the lemma to all other forms, but not the forms among each other

  • Information flows from one form to the

other through the lemma slot

  • Better factor graphs give orthogonal

improvements, see Ch. 3

3

Text & Paradigms

slide-190
SLIDE 190

Possible model extensions

  • Adding context: Take neighboring words

into account, so that 1st pl can be more likely after “we” than after “she”, etc.

  • Adding topic variables: Useful for

deciding that a particular spelling belongs into one lexeme rather than another (singed does not fit into “sing” paradigm because it’s a different topic).

3

Text & Paradigms

slide-191
SLIDE 191

Remaining morphological issues

  • Reduplication
  • Metathesis
  • Consonant doubling
  • Circumfixes
  • Templatic morphology
  • Interaction with phonology

3

Text & Paradigms

slide-192
SLIDE 192
  • Presented novel, principled approach to learning

inflectional morphology of a language

  • Developed joint probability model over text

corpus and inflectional paradigms, which are hidden

  • Presented type-based sampling procedure

that discovers inflectional paradigms from plain text

Conclusions / Contributions 3

slide-193
SLIDE 193
  • Presented novel generative story for inflectional

morphology, ...

  • ... based on ordinary linguistic notions

(lexemes, inflections, paradigms)

  • Clusters corpus words into lexemes and inflections

using hierarchical Dirichlet process

  • Allows unbounded number of lexemes
  • Handles nonconcatenative, irregular morphology

Conclusions / Contributions 3

slide-194
SLIDE 194

Related Work

  • Sherif & Kondrak (2007), Hong et al. (2009), and

others get the 1-best alignment, segment it into chunks, and score the chunks individually

  • Others get the 1-best alignment and train a conventional

n-gram model (Bisani & Ney (2008), and others)

  • In contrast, we sum over all alignments, use features,

add latent variables, generate arbitrary-length output, and use global normalization

1

String pairs

slide-195
SLIDE 195

Related Work

  • Joint models over multiple strings have not been

tackled much before

  • Exception: Bouchard-Côté et al. (2007), who define a

directed graphical model but do not run BP inference and do not use FSTs

2

Multiple Strings

slide-196
SLIDE 196

Related Work

  • No one has modeled structured inflectional paradigms

before

  • Typically, simple concatenative morphology is assumed

(Harris (1955), Chan (2008)), but see Yarowsky and Wicentowski (2002)

  • Goldsmith (2001) and others extract “suffix

paradigms” (lists of verb endings)

  • In contrast, we extract structured paradigms that

seamlessly handle non-concatenative phenomena

3

Text and Paradigms

slide-197
SLIDE 197
  • Presented several novel probability models step by

step, each smaller one being a factor component in the next bigger one

  • Developed a coherent, unified statistical approach to

inflectional morphology, which advances the state-of- the-art in computational morphology

  • Extracted detailed and structured morphological

knowledge from plain text;

  • Presented the most ambitious morphological

knowledge discovery task and method to date

Conclusions / Contributions

slide-198
SLIDE 198
  • All presented models have many further uses in

NLP:

  • string-pair models for transliteration,

pronunciation modeling, spelling correction, etc.

  • multiple-string models for bioinformatics,

historical linguistics, phonology, transliteration, etc.

  • text & paradigms model for text generation,

machine translation, etc.

Conclusions / Contributions

slide-199
SLIDE 199
  • This thesis naturally brings together many different concepts

from machine learning, NLP, and linguistics, in various novel ways:

  • In part 1, we use linguistically inspired features, latent

variables, FSTs and dynamic programming.

  • In part 2, we combine FSTs with graphical models and belief

propagation.

  • In part 3, we bring together all of the above with statistical

tools like Dirichlet process and collapsed Gibbs sampling to tackle a novel task that people have not been able to tackle before.

Conclusions / Contributions

slide-200
SLIDE 200

Publications

1. Dreyer and Eisner. In prep. Discovering Morphological Paradigms From Plain Text.
2. Dreyer and Eisner. 2009. Graphical Models over Multiple Strings. EMNLP.
3. McNamee, Dredze, Gerber, Garera, Finin, Mayfield, Piatko, Rao, Yarowsky, Dreyer. 2009. HLTCOE Approaches to Knowledge Base Population. TAC.
4. Dreyer, Eisner and Smith. 2008. Finite-State Modeling of Log-Linear String Transductions with Latent Variables and Backoff Features. EMNLP.
5. Karakos, Eisner, Khudanpur, Dreyer. 2008. Machine Translation System Combination using ITG-based Alignments. ACL.
6. Dreyer and Shafran. 2007. Exploiting Prosody for PCFGs with Latent Annotations. Interspeech.
7. Dreyer, Hall, Khudanpur. 2007. Comparing Reordering Constraints for SMT Using Efficient BLEU Oracle Computation. SSST.
8. Dreyer and Eisner. 2006. Better Informed Training of Latent Syntactic Features. EMNLP.
9. Dreyer, Smith, Smith. 2006. Vine Parsing and Minimum Risk Reranking for Speed and Precision. CoNLL.
10. Burbank, Carpuat, Clark, Dreyer, Fox, Groves, Hall, Hearne, Melamed, Shen, Way, Wellington, Wu. 2005. Statistical Machine Translation by Parsing. CLSP.