Weakly-Supervised Bayesian Learning of a CCG Supertagger Dan - - PowerPoint PPT Presentation

weakly supervised bayesian learning of a ccg supertagger
SMART_READER_LITE
LIVE PREVIEW

Weakly-Supervised Bayesian Learning of a CCG Supertagger Dan - - PowerPoint PPT Presentation

Weakly-Supervised Bayesian Learning of a CCG Supertagger Dan Garrette, Chris Dyer, Jason Baldridge, Noah A. Smith Type-Level Supervision Type-Level Supervision Unannotated text Incomplete tag dictionary: word {tags} Type-Level


slide-1
SLIDE 1

Weakly-Supervised Bayesian Learning of a CCG Supertagger

Dan Garrette, Chris Dyer, Jason Baldridge, Noah A. Smith

slide-2
SLIDE 2

Type-Level Supervision

slide-3
SLIDE 3

Type-Level Supervision

  • Unannotated text
  • Incomplete tag dictionary: word ↦ {tags}
slide-4
SLIDE 4

Type-Level Supervision

Used for POS tagging for 20+ years

[Kupiec, 1992] [Merialdo, 1994]

slide-5
SLIDE 5

Type-Level Supervision

Good POS tagger performance even with low supervision

[Das & Petrov 2011] [Garrette & Baldridge 2013] [Garrette et al. 2013]

slide-6
SLIDE 6

Combinatory Categorial Grammar (CCG)

slide-7
SLIDE 7

CCG

Every word token is associated with a category Categories combine to categories of constituents

[Steedman, 2000] [Steedman and Baldridge, 2011]

slide-8
SLIDE 8

the dog n np n / np np

CCG

slide-9
SLIDE 9

s dogs np sleep \ s s

CCG

np

slide-10
SLIDE 10

POS vs. Supertags

the dog sleeps np/n n s\np np s the dog sleeps DT NN VBZ NP S VP

slide-11
SLIDE 11

Supertagging

Type-supervised learning for supertagging is much more difficult than for POS Penn Treebank POS CCGBank Supertags 48 tags 1,239 tags

slide-12
SLIDE 12

CCG

The grammar formalism itself can be used to guide learning

slide-13
SLIDE 13

CCG Supertagging

slide-14
SLIDE 14
  • Sequence tagging problem, like POS-tagging
  • Building block for grammatical parsing

CCG Supertagging

slide-15
SLIDE 15

Supertagging

“almost parsing”

[Bangalore and Joshi 1999]

slide-16
SLIDE 16

sleeps the lazy dog

Why Supertagging?

/ np n n / n n \ s np

slide-17
SLIDE 17

n np n np np n sleeps the lazy dog np / n n / \ s s s

Why Supertagging?

slide-18
SLIDE 18

n np np np n sleeps the lazy dog / n n / \ s np s n n

CCG Supertagging

slide-19
SLIDE 19

n np np np n sleeps the lazy dog / n n / \ s np s n

CCG Supertagging

slide-20
SLIDE 20

sleeps the lazy dog s\np n

CCG Supertagging

n / n np n /

slide-21
SLIDE 21

the lazy dog np/n ? n

CCG Supertagging

slide-22
SLIDE 22

Principle #1

the lazy dog np/n n np X X

Prefer Connections

slide-23
SLIDE 23

Supertags vs. POS

the dog sleeps np/n n s\np np s the dog sleeps DT NN VBZ NP S VP universal, intrinsic grammar properties all relationships must be learned ?

slide-24
SLIDE 24

the lazy dog np/n (np\(np/n))/n n

Principle #2 Prefer Simplicity

slide-25
SLIDE 25

Prefer Simplicity

buy := (((sb\np)/pp)/pp)/np appears 342 times in CCGbank buy := (sb\np)/np appears once

e.g. “Opponents don't buy such arguments.” “Tele-Communications agreed to buy half of Showtime Networks from Viacom for $ 225 million.” pp pp

slide-26
SLIDE 26

a {s, np, n,…} A B / B A B \ B patom(a) A A B \ C B / C × pterm pterm pterm pterm pterm × pfwd × pfwd × pfwd × pfwd × pmod × pmod × pmod × pmod

Weighted Tag Grammar

slide-27
SLIDE 27

the lazy dog np/n n/n n (np\(np/n))/n np

CCG Supertagging

slide-28
SLIDE 28

HMM Transition Prior

P(t → u) = λ · P(u) + (1−λ) · P(t → u)

simple is good connecting is good

slide-29
SLIDE 29

Type-Supervised Learning

unlabeled corpus tag dictionary universal properties of the CCG formalism

same as POS tagging

slide-30
SLIDE 30

Training

slide-31
SLIDE 31

Posterior Inference

Forward-Filter Backward-Sample (FFBS)

  • [Carter and Kohn, 1996]
slide-32
SLIDE 32

Posterior Inference

the lazy dogs wander np/n n np n/n np (s\np)/np n n/n np/n s\np …

Tag Dictionary ___ : __, __, __ ___ : __, __, __ ___ : __, __, __ ___ : __, __, __ ___ : __, __, __ Unlabeled Data ______________ ______________ ______________ ______________ ______________

slide-33
SLIDE 33

Posterior Inference

the lazy dogs np/n n np n/n np wander (s\np)/np n n/n np/n s\np …

Priors HMM

slide-34
SLIDE 34

Posterior Inference

the lazy dogs np/n n np n/n np wander (s\np)/np n n/n np/n s\np …

Priors HMM

slide-35
SLIDE 35

Posterior Inference

the lazy dogs np/n n np n/n np wander (s\np)/np n n/n np/n s\np …

Priors HMM

slide-36
SLIDE 36

Posterior Inference

the lazy dogs np np wander (s\np)/np n n/n np/n …

Priors HMM

np/n n n/n s\np np/n n n/n s\np

slide-37
SLIDE 37

Posterior Inference

the lazy dogs np np wander (s\np)/np n n/n np/n …

Priors HMM

np/n n n/n s\np

slide-38
SLIDE 38

Experiments

slide-39
SLIDE 39

Baldridge 2008

Use universal properties of CCG to initialize EM

  • Simpler definition of category complexity
  • No corpus-specific information
slide-40
SLIDE 40

English Supertagging

25 50 75 100 0.1 0.01 0.001 none

Baldridge '08 Ours

78 80 80 67 55 73 41 51 tag dictionary pruning cutoff t a g g i n g a c c u r a c y

slide-41
SLIDE 41

Chinese Supertagging

25 50 75 100 0.1 0.01 0.001 none

Baldridge '08 Ours

66 69 62 43 33 56 28 49 tag dictionary pruning cutoff t a g g i n g a c c u r a c y

slide-42
SLIDE 42

Italian Supertagging

25 50 75 100 0.1 0.01 0.001 none

Baldridge '08 Ours

54 53 47 36 33 45 32 46 tag dictionary pruning cutoff t a g g i n g a c c u r a c y

slide-43
SLIDE 43

Code Available

GitHub repository linked from my website

slide-44
SLIDE 44

Conclusion

Combining annotation exploitation with universal grammatical knowledge yields good models from weak supervision