slide-1
SLIDE 1

An HDP Model for Inducing Combinatory Categorial Grammars

Yonatan Bisk & Julia Hockenmaier University of Illinois at Urbana-Champaign

TACL Vol 1(2013):75−88

Thursday, June 13, 13
slide-2
SLIDE 2

PRP  VBD  ADJ      NN
She  ate  crunchy  granola

slide-5
SLIDE 5

Dependency Grammar Induction

PRP  VBD  ADJ      NN
She  ate  crunchy  granola

Problem for unsupervised dependency grammar learners: unlabeled dependencies provide no explicit structure.

slide-8
SLIDE 8

CFG Induction

PRP  VBD  ADJ      NN
She  ate  crunchy  granola

[Figure: CFG parse tree over the sentence, with node labels NP, VP, S, A, V, N, N]

Problem for unsupervised CFG learners: CFG symbols and rewrite rules are arbitrary.

slide-10
SLIDE 10

CFG Induction in Practice

PRP  VBD  ADJ      NN
She  ate  crunchy  granola

[Figure: induced CFG parse tree over the sentence, with arbitrary node labels X6, X2, X0, X32, X4, X5, X5]

What kind of grammatical representation is suitable for unsupervised induction?

slide-11
SLIDE 11

Categorial Grammar Induction

PRP  VBD      ADJ      NN
She  ate      crunchy  granola
N    (S\N)/N  N/N      N

Derived categories: N (crunchy granola), S\N (ate crunchy granola), S (She ate crunchy granola)

slide-13
SLIDE 13

Categorial Grammar Induction

PRP  VBD      ADJ      NN
She  ate      crunchy  granola
N    (S\N)/N  N/N      N

slide-18
SLIDE 18

Features of CCG

  • Linguistically motivated symbolic representation:
      CCG captures core dependencies
      CCG captures basic word order
  • Rules and categories are heavily constrained:
      CCG categories are functions
      CCG rules = function application & composition

slide-19
SLIDE 19

Advantages of CCG

  • Linguistically motivated symbolic representation:
      Makes CCG more robust than DGs on longer sentences
  • Rules and categories are heavily constrained:
      Gives CCG a simpler probability model than CFGs

slide-21
SLIDE 21

Advantages of CCG

  • Linguistically motivated symbolic representation:
      CCG is more robust than DG on longer sentences
      CCG returns linguistically interpretable parses
  • Rules and categories are heavily constrained:
      CCG has a simpler probability model than CFGs
      CCG allows fast variational inference

slide-22
SLIDE 22

Categorial Grammar

slide-30
SLIDE 30

CCG categories are functions

CCG has two atomic categories: S, N

All other CCG categories are functions, e.g. S/N:

  S   Result
  /   Dir.
  N   Argument

slide-34
SLIDE 34

Rules: Function application

Function: S/N
Argument: N
Result:   S

slide-38
SLIDE 38

Rules: Function application

Function: (S\N)/N
Argument: N
Result:   S\N
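The two function-application slides can be sketched in a few lines of Python. This is only an illustration: the class names `Atom` and `Func` and the helper `apply` are hypothetical and do not come from the talk. It covers forward application (X/Y Y gives X) and backward application (Y X\Y gives X).

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Atom:
    """An atomic category such as S or N."""
    name: str

    def __str__(self) -> str:
        return self.name


@dataclass(frozen=True)
class Func:
    """A function category: result, slash direction, argument."""
    result: "Category"
    slash: str  # "/" seeks the argument to the right, "\\" to the left
    arg: "Category"

    def __str__(self) -> str:
        def wrap(c: "Category") -> str:
            return f"({c})" if isinstance(c, Func) else str(c)
        return f"{wrap(self.result)}{self.slash}{wrap(self.arg)}"


Category = Union[Atom, Func]


def apply(left: Category, right: Category) -> Optional[Category]:
    """Return the result of function application, or None if it does not apply."""
    if isinstance(left, Func) and left.slash == "/" and left.arg == right:
        return left.result   # forward application
    if isinstance(right, Func) and right.slash == "\\" and right.arg == left:
        return right.result  # backward application
    return None


S, N = Atom("S"), Atom("N")
transitive_verb = Func(Func(S, "\\", N), "/", N)  # (S\N)/N
verb_phrase = apply(transitive_verb, N)           # S\N
sentence = apply(N, verb_phrase)                  # S
```

Frozen dataclasses give structural equality for free, which is what the argument check `left.arg == right` relies on.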

slide-39
SLIDE 39

Inducing CCGs

Bisk & Hockenmaier, AAAI 2012

slide-44
SLIDE 44

Seed knowledge: Atoms

Atomic CCG category   Part-of-speech tag class
S                     Verb
N                     Det, Noun, Pron, Num
conj                  Conj
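The seed knowledge amounts to a small lookup table. A minimal sketch, assuming the input has already been mapped to the coarse tag classes named on the slide (the names `SEED_ATOMS` and `seed_category` are illustrative only):

```python
from typing import Optional

# Tag class -> atomic CCG category, as listed on the slide.
SEED_ATOMS = {
    "Verb": "S",
    "Det": "N",
    "Noun": "N",
    "Pron": "N",
    "Num": "N",
    "Conj": "conj",
}


def seed_category(tag_class: str) -> Optional[str]:
    """Return the atomic seed category for a tag class, or None if it has none."""
    return SEED_ATOMS.get(tag_class)
```

Tag classes without an entry start with no category at all; their categories have to be induced.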

slide-53
SLIDE 53

Inducing complex categories

Sentence: The man ate quickly

Induced categories: N, S, N/N, S\S, S/S, N\N, S\N, ...
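The slide does not spell out the induction rules, so the following is only a sketch under stated assumptions: only "man" (N) and "ate" (S) start with atomic categories; a word may modify an adjacent word of category X by taking X/X or X\X; and a word with category S may take an adjacent N as an argument. The function names are hypothetical.

```python
def wrap(cat: str) -> str:
    """Parenthesize complex categories so that nesting stays unambiguous."""
    return f"({cat})" if ("/" in cat or "\\" in cat) else cat


def induce_once(lexicon):
    """One induction round over adjacent word pairs (assumed rules, see above)."""
    new = [set(cats) for cats in lexicon]
    for i in range(len(lexicon) - 1):
        left, right = lexicon[i], lexicon[i + 1]
        for x in right:                    # left word modifies its right neighbour
            new[i].add(f"{wrap(x)}/{wrap(x)}")
        for x in left:                     # right word modifies its left neighbour
            new[i + 1].add(f"{wrap(x)}\\{wrap(x)}")
        if "N" in left and "S" in right:   # S word takes the N on its left
            new[i + 1].add("S\\N")
        if "S" in left and "N" in right:   # S word takes the N on its right
            new[i].add("S/N")
    return new


words = ["The", "man", "ate", "quickly"]
seed = [set(), {"N"}, {"S"}, set()]
induced = induce_once(seed)
```

Under these assumptions a single round yields exactly the seven categories listed on the slide; applying the function again to its own output would produce more complex categories.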

slide-54
SLIDE 54

An HDP Model for CCG

slide-60
SLIDE 60

Hierarchical Dirichlet Process

Nonparametric Bayesian model
    We do not need to fix the category inventory in advance

Hierarchical model
    All distributions share a common base
    Parameter tying (smoothing)

slide-65
SLIDE 65

HDPs for CFGs

[Figure: HDP-CFG parse tree with induced node labels X0, X2, X5, X6, X4, X32, X5]

Liang et al. 2009

slide-71
SLIDE 71

Parameters for Xi → Xj Xk

[Figure: grid of unknown parameters (?), one cell for every pair of children Xj, Xk drawn from X1, X2, ..., X9, ...]

Problem for nonparametric PCFG models: each LHS nonterminal Xi is allowed a doubly infinite cross-product of RHS children Xj, Xk.
slide-72
SLIDE 72

Parameters for S\N → ... ...

[Figure: grid of unknown parameters (?) for the parent S\N, with rows and columns indexed by the candidate children S, N, S/S, S\S, S/N, S\N, (S\N)/N, (S\N)\S, (S\N)\N, ...; every cell is unknown]

slide-80
SLIDE 80

Parameters for S\N → ... ...

[Figure: the grid for the parent S\N over the candidate children S, N, S/S, S\S, S/N, S\N, (S\N)/N, (S\N)\S, (S\N)\N, ..., with only a small fraction of the cells still unknown (?)]

slide-88
SLIDE 88

CCG rules

Parent    Combinator  Left         Right
(S\N)/N   >B0         ((S\N)/N)/Y  Y
(S\N)/N   >B1         (S\N)/Y      Y/N
(S\N)/N   >B2         S\Y          (Y\N)/N
(S\N)/N   <B0         Y            ((S\N)/N)\Y
(S\N)/N   <B1         Y/N          (S\N)\Y
(S\N)/N   <B2         (Y\N)/N      S\Y

slide-94
SLIDE 94

CCG rules

Parent    Y  Combinator  Left         Right
(S\N)/N   S  >B0         ((S\N)/N)/S  S
(S\N)/N   S  >B1         (S\N)/S      S/N
(S\N)/N   S  >B2         S\S          (S\N)/N
(S\N)/N   S  <B0         S            ((S\N)/N)\S
(S\N)/N   S  <B1         S/N          (S\N)\S
(S\N)/N   S  <B2         (S\N)/N      S\S

CCG rules are heavily constrained:

For a given parent category, the Y category and combinator determine both children.
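To make the constraint concrete, the sketch below stores the six child templates that the slides give for the parent (S\N)/N and fills in Y. The templates are copied from the slides as they stand; the names `RULES` and `children` are hypothetical, and the table covers this one parent only.

```python
# Child templates for the parent (S\N)/N, as given on the slides.
# "Y" is a placeholder for the chosen argument category.
RULES = {
    ">B0": (r"((S\N)/N)/Y", "Y"),
    ">B1": (r"(S\N)/Y", "Y/N"),
    ">B2": (r"S\Y", r"(Y\N)/N"),
    "<B0": ("Y", r"((S\N)/N)\Y"),
    "<B1": ("Y/N", r"(S\N)\Y"),
    "<B2": (r"(Y\N)/N", r"S\Y"),
}


def children(combinator: str, y: str):
    """Return (left, right) children of (S\\N)/N for a given Y and combinator."""
    # A complex Y is parenthesized when it is embedded in a larger category.
    y_text = f"({y})" if ("/" in y or "\\" in y) else y

    def fill(template: str) -> str:
        return y if template == "Y" else template.replace("Y", y_text)

    left, right = RULES[combinator]
    return fill(left), fill(right)
```

With Y = S this reproduces the rows of the table, for example `children(">B1", "S")` gives the pair `(S\N)/S` and `S/N`. Once the parent is fixed, nothing about the children remains to be chosen beyond Y and the combinator.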

slide-103
SLIDE 103

HDPs for CCGs

Each parent category is expanded by choosing an argument category Y and a combinator:

S    →  N        S\N    (Y = N, Combinator = <B0)
S\N  →  (S\N)/N  N      (Y = N, Combinator = >B0)
N    →  N/N      N      (Y = N, Combinator = >B0)
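The expansion steps shown in this sequence of slides all use application (>B0 or <B0) with Y = N. A minimal sketch of how the pair (Y, combinator) determines both children of an arbitrary parent in that case; `expand_b0` is an illustrative name and composition (B1, B2) is deliberately left out:

```python
def wrap(cat: str) -> str:
    """Parenthesize complex categories before embedding them."""
    return f"({cat})" if ("/" in cat or "\\" in cat) else cat


def expand_b0(parent: str, y: str, combinator: str):
    """Children of `parent` under function application with argument category y."""
    if combinator == ">B0":   # parent/Y  Y  =>  parent
        return f"{wrap(parent)}/{wrap(y)}", y
    if combinator == "<B0":   # Y  parent\Y  =>  parent
        return y, f"{wrap(parent)}\\{wrap(y)}"
    raise ValueError(f"only >B0 and <B0 are handled, got {combinator!r}")


# The three steps of the derivation for "She ate crunchy granola".
steps = [("S", "N", "<B0"), ("S\\N", "N", ">B0"), ("N", "N", ">B0")]
expansions = [expand_b0(parent, y, c) for parent, y, c in steps]
```

In a generative model along these lines, Y would be drawn from P(Y | parent) and the combinator from P(c | Y, parent); here both are simply given.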

slide-109
SLIDE 109

HDP-CFG vs HDP-CCG

CFG: doubly infinite P(Xi → Xj Xk | Xi)
CCG: infinite P(Y | Xi) and finite P(c | Y, Xi)

[Figure: grid of unknown parameters (?) with rows and columns X1, X2, ..., X9, ...; the CCG categories S, N, S/S, S\S, S/N, S\N, (S\N)/N, (S\N)\S, (S\N)\N, ... are listed alongside]

The HDP-CFG base measure requires ββ^T
The HDP-CCG base measure is the standard β ~ GEM(α) (akin to e.g. HDP-HMMs)
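A draw β ~ GEM(α) is commonly described by the stick-breaking construction: break off a Beta(1, α) fraction of the remaining stick at each step. The sketch below truncates the process after a fixed number of pieces; the truncation level and function name are assumptions made for illustration, not details from the talk.

```python
import random


def gem_stick_breaking(alpha: float, truncation: int, rng: random.Random):
    """Truncated stick-breaking sample; returns (weights, leftover stick mass)."""
    weights = []
    remaining = 1.0
    for _ in range(truncation):
        v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights, remaining


rng = random.Random(0)
beta, leftover = gem_stick_breaking(alpha=2.0, truncation=10, rng=rng)
```

For the CCG model a single such vector over argument categories Y can serve as the shared base, whereas a CFG base measure over child pairs would need the outer product of β with itself.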

slide-114
SLIDE 114

Variational EM for HDP-CCGs

Computation parallels Inside-Outside: trivially parallelizable; efficient

  • Experiments in paper: 1 min – 4 hrs

$W_P(Y) = \Psi\big(C(P,Y) + \alpha_P \beta_Y\big) - \Psi\big(C(P,\ast) + \alpha_P\big)$
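The update can be evaluated with a few lines of code. This sketch assumes that C(P, Y) are expected counts of argument category Y under parent P, that C(P, ∗) is their sum over Y, and that Ψ is the digamma function; the hand-rolled `digamma` and the name `log_weights` are illustrative choices, not the authors' implementation.

```python
import math


def digamma(x: float) -> float:
    """Digamma for x > 0 via psi(x) = psi(x + 1) - 1/x and an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # ln x - 1/(2x) - 1/(12 x^2) + 1/(120 x^4) - 1/(252 x^6)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def log_weights(counts, alpha_p, beta):
    """W_P(Y) for every Y in beta, given expected counts C(P, Y) for one parent P."""
    total = sum(counts.values())  # C(P, *)
    return {
        y: digamma(counts.get(y, 0.0) + alpha_p * beta[y]) - digamma(total + alpha_p)
        for y in beta
    }


W = log_weights(counts={"N": 3.0, "S": 1.0}, alpha_p=1.0, beta={"N": 0.5, "S": 0.5})
```

The exponentiated values exp(W_P(Y)) sum to less than one, so they act as sub-normalized rule weights that can be plugged into an Inside-Outside style computation.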

slide-115
SLIDE 115

Results

slide-120
SLIDE 120

Impact of longer sentences

WSJ comparison with Naseem et al. 2010’s Universal dependency grammar

                 Trained and tested on ≤ 10   Trained and tested on ≤ 20
Naseem et al.    71.9                         50.4
HDP-CCG          68.2                         64.2
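The robustness claim can be read directly off the table as the drop in accuracy when moving from sentences of length ≤ 10 to length ≤ 20. A small sketch using only the four numbers above (the dictionary layout and the name `drop` are illustrative):

```python
# Accuracies from the table, keyed by system and maximum sentence length.
ACCURACY = {
    "Naseem et al.": {10: 71.9, 20: 50.4},
    "HDP-CCG": {10: 68.2, 20: 64.2},
}


def drop(system: str) -> float:
    """Accuracy lost when moving from the <= 10 setting to the <= 20 setting."""
    scores = ACCURACY[system]
    return round(scores[10] - scores[20], 1)


drops = {system: drop(system) for system in ACCURACY}
```

The dependency model loses 21.5 points, the HDP-CCG 4.0 points.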

slide-124
SLIDE 124

Impact of longer sentences

Can long sentences help performance on short sentences?

Yes! HDP-CCG achieves 71.9 on ≤10 if trained on ≤20

                 Trained and tested on ≤ 10   Trained and tested on ≤ 20
Naseem et al.    71.9                         50.4
HDP-CCG          68.2                         64.2

slide-131
SLIDE 131

Multilingual performance

NAACL WILS Shared Task 2012
Average ≤10 accuracy on 10 languages
(Arabic, Danish, Slovene, Swedish, Dutch, Basque, Portuguese, WSJ, CHILDES, Czech)

Representation   System                    Accuracy
Dependencies     Blunsom & Cohn 2010       55.2
Dependencies     State of the Art*         62.3
CCG              Bisk & Hockenmaier 2012   54.2
CCG: new model   MLE                       50.9
CCG: new model   HDP-CCG                   64.5

* Max over all best performing systems (extra data, tuning, etc.)

slide-133
SLIDE 133

Induced Lexicons: Adjectives

English:  Big (Adj, N/N)   Ball (Obj, N)
Arabic:   كرة (ball; Obj, N)   كبيرة (big; Adj, N\N)

slide-136
SLIDE 136

Induced Lexicons: Verbs

English:                The man (subject, N)   wrote (verb, (S\N)/N)   a letter (object, N)
Child Directed Speech:  ∅ (no subject)   write (verb, S/N)   a letter (object, N)
Arabic:                 كتب (wrote; verb, (S/N)/N)   الرجال (the man; subject, N)   رسالة (a letter; object, N)

slide-138
SLIDE 138

Induced Lexicons: Adpositions

English:   ran (V, (S\N)/N)   on (ADP, (S\S)/N)   beach (O, N)
Japanese:  浜 (beach; O, N)   を (on; ADP, (S/S)\N)   走った (ran; V, (S\N)/N)

slide-146
SLIDE 146

Summary of contributions

A new probability model for CCG

  • Exploits CCG’s functional constraints
  • Yields fast variational inference

State-of-the-Art accuracy

  • Performs well on 15 languages
  • Can harness longer sentences
  • Induces linguistically informative lexicons

slide-151
SLIDE 151

Work in progress

  • Performance is robust beyond context-free CCG fragment
  • Performance improves when generating words (not just POS tags)
  • Remove dependence on POS tags

Thank you!
