Learning Dependency-Based Compositional Semantics

Semantic Representations for Textual Inference Workshop, Mar. 10, 2012. Percy Liang (Google/Stanford), joint work with Michael Jordan and Dan Klein.


slide-1
SLIDE 1

Learning Dependency-Based Compositional Semantics

Semantic Representations for Textual Inference Workshop – Mar. 10, 2012

Percy Liang

Google/Stanford joint work with Michael Jordan and Dan Klein

slide-2
SLIDE 2

Motivating Problem: Question Answering

2

slide-3
SLIDE 3

Motivating Problem: Question Answering

What is the largest city in California?

2

slide-4
SLIDE 4

Motivating Problem: Question Answering

What is the largest city in California?

What is the largest city in a state bordering California?

2

slide-5
SLIDE 5

Semantic Interpretation

4

slide-6
SLIDE 6

Semantic Interpretation

What is the largest city in a state bordering California? Phoenix

4

slide-7
SLIDE 7

Semantic Interpretation

What is the largest city in a state bordering California?

?

Phoenix

4

slide-8
SLIDE 8

Semantic Interpretation

What is the largest city in a state bordering California?

argmax({c : city(c) ∧ ∃s.state(s) ∧ loc(c, s) ∧ border(s, CA)}, population)

Phoenix

4

slide-9
SLIDE 9

Semantic Interpretation

What is the largest city in a state bordering California?

argmax({c : city(c) ∧ ∃s.state(s) ∧ loc(c, s) ∧ border(s, CA)}, population)

Phoenix

4

slide-10
SLIDE 10

Semantic Interpretation

What is the largest city in a state bordering California?

argmax({c : city(c) ∧ ∃s.state(s) ∧ loc(c, s) ∧ border(s, CA)}, population)

Phoenix

4

slide-11
SLIDE 11

Semantic Interpretation

What is the largest city in a state bordering California?

argmax({c : city(c) ∧ ∃s.state(s) ∧ loc(c, s) ∧ border(s, CA)}, population)

Phoenix

4

slide-12
SLIDE 12

Semantic Interpretation

What is the largest city in a state bordering California?

argmax({c : city(c) ∧ ∃s.state(s) ∧ loc(c, s) ∧ border(s, CA)}, population)

computation Phoenix

4

slide-13
SLIDE 13

Semantic Interpretation

What is the largest city in a state bordering California?

?

computation Phoenix

4
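The "computation" step that turns the logical form into an answer can be sketched in a few lines of Python. This is only an illustration: the relation names follow the formula on the slides, but the toy facts, the population figures, and the function name `largest_city_in_state_bordering` are assumptions made up for the example, not data from the talk.

```python
# Toy database. Every fact and population figure here is invented for illustration.
city = {"Phoenix", "Tucson", "Las Vegas", "Portland"}
state = {"AZ", "NV", "OR", "CA"}
loc = {("Phoenix", "AZ"), ("Tucson", "AZ"), ("Las Vegas", "NV"), ("Portland", "OR")}
border = {("AZ", "CA"), ("NV", "CA"), ("OR", "CA")}
population = {"Phoenix": 4, "Tucson": 2, "Las Vegas": 3, "Portland": 1}  # arbitrary units


def largest_city_in_state_bordering(target):
    # {c : city(c) ∧ ∃s. state(s) ∧ loc(c, s) ∧ border(s, target)}
    candidates = {
        c for c in city
        if any((c, s) in loc and (s, target) in border for s in state)
    }
    # argmax(candidates, population)
    return max(candidates, key=lambda c: population[c])


answer = largest_city_in_state_bordering("CA")
```

With these toy facts the set comprehension plays the role of the set-builder expression and `max` plays the role of argmax.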

slide-14
SLIDE 14

Supervision for Semantic Interpretation

6

slide-15
SLIDE 15

Supervision for Semantic Interpretation

Detailed Supervision (current)

What is the largest city in California?

argmax({c : city(c) ∧ loc(c, CA)}, population)

6

slide-16
SLIDE 16

Supervision for Semantic Interpretation

Detailed Supervision (current)

What is the largest city in California? expert

argmax({c : city(c) ∧ loc(c, CA)}, population)

6

slide-17
SLIDE 17

Supervision for Semantic Interpretation

Detailed Supervision (current)

  • doesn’t scale up

What is the largest city in California? expert

argmax({c : city(c) ∧ loc(c, CA)}, population)

6

slide-18
SLIDE 18

Supervision for Semantic Interpretation

Detailed Supervision (current)

  • doesn’t scale up

What is the largest city in California? expert

argmax({c : city(c) ∧ loc(c, CA)}, population)

Natural Supervision (new)

What is the largest city in California? Los Angeles

6

slide-19
SLIDE 19

Supervision for Semantic Interpretation

Detailed Supervision (current)

  • doesn’t scale up

What is the largest city in California? expert

argmax({c : city(c) ∧ loc(c, CA)}, population)

Natural Supervision (new)

What is the largest city in California? non-expert Los Angeles

6

slide-20
SLIDE 20

Supervision for Semantic Interpretation

Detailed Supervision (current)

  • doesn’t scale up

What is the largest city in California? expert

argmax({c : city(c) ∧ loc(c, CA)}, population)

Natural Supervision (new)

  • scales up

What is the largest city in California? non-expert Los Angeles

6

slide-21
SLIDE 21

Supervision for Semantic Interpretation

Detailed Supervision (current)

  • doesn’t scale up
  • representation-dependent

What is the largest city in California? expert

argmax({c : city(c) ∧ loc(c, CA)}, population)

Natural Supervision (new)

  • scales up

What is the largest city in California? non-expert Los Angeles

6

slide-22
SLIDE 22

Supervision for Semantic Interpretation

Detailed Supervision (current)

  • doesn’t scale up
  • representation-dependent

What is the largest city in California? expert

argmax({c : city(c) ∧ loc(c, CA)}, population)

Natural Supervision (new)

  • scales up
  • representation-independent

What is the largest city in California? non-expert Los Angeles

6

slide-23
SLIDE 23

Outline

Representation

[Figure: example DCS tree over the predicates city, major, loc, state, border, CA, traverse, river, and AZ]

Learning

[Figure: graphical model over x, θ, z, w, y]

Experiments

9

slide-24
SLIDE 24

Considerations

Computational: how to efficiently search exponential space?

10

slide-25
SLIDE 25

Considerations

Computational: how to efficiently search exponential space? What is the most populous city in California? Los Angeles

10

slide-26
SLIDE 26

Considerations

Computational: how to efficiently search exponential space? What is the most populous city in California? λx.state(x) Los Angeles

10

slide-27
SLIDE 27

Considerations

Computational: how to efficiently search exponential space? What is the most populous city in California? λx.city(x) Los Angeles

10

slide-28
SLIDE 28

Considerations

Computational: how to efficiently search exponential space? What is the most populous city in California? λx.city(x) ∧ loc(x, CA) Los Angeles

10

slide-29
SLIDE 29

Considerations

Computational: how to efficiently search exponential space? What is the most populous city in California? λx.state(x) ∧ border(x, CA) Los Angeles

10

slide-30
SLIDE 30

Considerations

Computational: how to efficiently search exponential space? What is the most populous city in California?

population(CA)

Los Angeles

10

slide-31
SLIDE 31

Considerations

Computational: how to efficiently search exponential space? What is the most populous city in California?

argmax(λx.city(x) ∧ loc(x, CA), λx.population(x))

Los Angeles

10

slide-32
SLIDE 32

Considerations

Computational: how to efficiently search exponential space?

What is the most populous city in California?

[Figure: a large set of candidate logical forms (LF) between the question and the answer]

Los Angeles

10

slide-33
SLIDE 33

Considerations

Computational: how to efficiently search exponential space?

What is the most populous city in California?

[Figure: a large set of candidate logical forms (LF) between the question and the answer]

Los Angeles

Statistical: how to parametrize mapping from sentence to logical form?

What is the most populous city in California?

argmax(λx.city(x) ∧ loc(x, CA), λx.population(x))

10

slide-34
SLIDE 34

Dependency-Based Compositional Semantics (DCS)

What is the most populous city in California?

11

slide-35
SLIDE 35

Dependency-Based Compositional Semantics (DCS)

What is the most populous city in California?

1 1 1 1 c

argmax population

2 1

CA loc city

Los Angeles

11

slide-36
SLIDE 36

Dependency-Based Compositional Semantics (DCS)

What is the most populous city in California?

1 1 1 1 c

argmax population

2 1

CA loc city

Los Angeles Advantages of DCS: nice computational, statistical, linguistic properties

11

slide-37
SLIDE 37

Where do the answers come from?

What is the most populous city in California?

1 1 1 1 c

argmax population

2 1

CA loc city

Los Angeles

12

slide-38
SLIDE 38

Where do the answers come from?

What is the most populous city in California?

1 1 1 1 c

argmax population

2 1

CA loc city

Database Los Angeles

12

slide-39
SLIDE 39

Database

city: San Francisco, Chicago, Boston, · · ·

state: Alabama, Alaska, Arizona, · · ·

loc: (Mount Shasta, California), (San Francisco, California), (Boston, Massachusetts), · · ·

border: (Washington, Oregon), (Washington, Idaho), (Oregon, Washington), · · ·

13

slide-40
SLIDE 40

Basic DCS Trees

DCS tree

city

1 1

loc

2 1

CA

Database

14

slide-41
SLIDE 41

Basic DCS Trees

DCS tree Constraints

city

1 1

loc

2 1

CA

Database A DCS tree encodes a constraint satisfaction problem (CSP)

14

slide-42
SLIDE 42

Basic DCS Trees

DCS tree Constraints

city

c ∈ city

1 1

loc

2 1

CA

Database

city

San Francisco Chicago Boston · · ·

A DCS tree encodes a constraint satisfaction problem (CSP)

14

slide-43
SLIDE 43

Basic DCS Trees

DCS tree Constraints

city

c ∈ city

1 1

loc

ℓ ∈ loc

2 1

CA

Database

city

San Francisco Chicago Boston · · ·

loc

Mount Shasta California San Francisco California Boston Massachusetts · · · · · ·

A DCS tree encodes a constraint satisfaction problem (CSP)

14

slide-44
SLIDE 44

Basic DCS Trees

DCS tree Constraints

city

c ∈ city

1 1

loc

ℓ ∈ loc

2 1

CA

s ∈ CA Database

city

San Francisco Chicago Boston · · ·

loc

Mount Shasta California San Francisco California Boston Massachusetts · · · · · ·

CA

California

A DCS tree encodes a constraint satisfaction problem (CSP)

14

slide-45
SLIDE 45

Basic DCS Trees

DCS tree Constraints

city

c ∈ city

1 1

c₁ = ℓ₁

loc

ℓ ∈ loc

2 1

CA

s ∈ CA Database

city

San Francisco Chicago Boston · · ·

loc

Mount Shasta California San Francisco California Boston Massachusetts · · · · · ·

CA

California

A DCS tree encodes a constraint satisfaction problem (CSP)

14

slide-46
SLIDE 46

Basic DCS Trees

DCS tree Constraints

city

c ∈ city

1 1

c₁ = ℓ₁

loc

ℓ ∈ loc

2 1

ℓ₂ = s₁

CA

s ∈ CA Database

city

San Francisco Chicago Boston · · ·

loc

Mount Shasta California San Francisco California Boston Massachusetts · · · · · ·

CA

California

A DCS tree encodes a constraint satisfaction problem (CSP)

14

slide-50
SLIDE 50

Basic DCS Trees

DCS tree Constraints

city

c ∈ city

1 1

c₁ = ℓ₁

loc

ℓ ∈ loc

2 1

ℓ₂ = s₁

CA

s ∈ CA

Database

city: San Francisco, Chicago, Boston, · · ·

loc: (Mount Shasta, California), (San Francisco, California), (Boston, Massachusetts), · · ·

CA: California

A DCS tree encodes a constraint satisfaction problem (CSP)

Computation: dynamic programming ⇒ time = O(# nodes)

14
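The constraint-satisfaction reading of a basic DCS tree can be sketched as a bottom-up pass that visits each node once. The database rows are the ones shown on the slide; the tuple encoding of trees and the function name `denotation` are assumptions chosen for this illustration, not the authors' implementation.

```python
# Each predicate is a set of tuples, using only the rows visible on the slide.
db = {
    "city": {("San Francisco",), ("Chicago",), ("Boston",)},
    "loc": {
        ("Mount Shasta", "California"),
        ("San Francisco", "California"),
        ("Boston", "Massachusetts"),
    },
    "CA": {("California",)},
}


def denotation(node):
    """Tuples of the node's predicate that are consistent with all of its subtrees.

    A node is (predicate, edges); each edge is (j, j_child, child), meaning
    component j of the parent tuple must equal component j_child of some
    tuple in the child's denotation (components are 1-indexed, as on the slide).
    """
    pred, edges = node
    tuples = db[pred]
    for j, j_child, child in edges:
        child_values = {u[j_child - 1] for u in denotation(child)}
        tuples = {t for t in tuples if t[j - 1] in child_values}
    return tuples


# city --1/1--> loc --2/1--> CA, i.e. c ∈ city, ℓ ∈ loc, s ∈ CA, c₁ = ℓ₁, ℓ₂ = s₁
tree = ("city", [(1, 1, ("loc", [(2, 1, ("CA", []))]))])
result = denotation(tree)
```

Because each subtree is evaluated exactly once and only its set of values is passed upward, the number of recursive calls equals the number of nodes (each call still scans its predicate's table).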

slide-51
SLIDE 51

Properties of DCS Trees

[Figure: example DCS tree over the predicates city, major, loc, state, border, CA, traverse, river, and AZ]

15

slide-52
SLIDE 52

Properties of DCS Trees

[Figure: example DCS tree over the predicates city, major, loc, state, border, CA, traverse, river, and AZ]

Trees

15

slide-53
SLIDE 53

Properties of DCS Trees

[Figure: example DCS tree over the predicates city, major, loc, state, border, CA, traverse, river, and AZ]

Trees

Linguistics: syntactic locality

15

slide-54
SLIDE 54

Properties of DCS Trees

[Figure: example DCS tree over the predicates city, major, loc, state, border, CA, traverse, river, and AZ]

Trees

Linguistics: syntactic locality

Computation: efficient interpretation

15

slide-55
SLIDE 55

Divergence between Syntactic and Semantic Scope

most populous city in California

16

slide-56
SLIDE 56

Divergence between Syntactic and Semantic Scope

most populous city in California Syntax most populous California in city

16

slide-57
SLIDE 57

Divergence between Syntactic and Semantic Scope

most populous city in California Syntax Semantics most populous California in city

argmax(λx.city(x) ∧ loc(x, CA), λx.population(x))

16

slide-59
SLIDE 59

Divergence between Syntactic and Semantic Scope

most populous city in California Syntax Semantics most populous California in city

argmax(λx.city(x) ∧ loc(x, CA), λx.population(x))

Problem: syntactic scope is lower than semantic scope

16

slide-60
SLIDE 60

Divergence between Syntactic and Semantic Scope

most populous city in California Syntax Semantics most populous California in city

argmax(λx.city(x) ∧ loc(x, CA), λx.population(x))

Problem: syntactic scope is lower than semantic scope

If DCS trees look like syntax, how do we get correct semantics?

16

slide-61
SLIDE 61

Solution: Mark-Execute

most populous city in California

x1 x1 1 1 1 1 c

argmax population

2 1

CA loc city

∗∗ Superlatives

17

slide-62
SLIDE 62

Solution: Mark-Execute

most populous city in California Mark at syntactic scope

x1 x1 1 1 1 1 c

argmax population

2 1

CA loc city

∗∗ Superlatives

17

slide-63
SLIDE 63

Solution: Mark-Execute

most populous city in California Execute at semantic scope Mark at syntactic scope

x1 x1 1 1 1 1 c

argmax population

2 1

CA loc city

∗∗ Superlatives

17

slide-64
SLIDE 64

Solution: Mark-Execute

Alaska borders no states. Execute at semantic scope Mark at syntactic scope

x1 x1 2 1 1 1

AK

q

no state border

∗∗ Negation

17

slide-65
SLIDE 65

Solution: Mark-Execute

Some river traverses every city. Execute at semantic scope Mark at syntactic scope

x12 x12 2 1 1 1 q

some river

q

every city traverse

∗∗ Quantification (narrow)

17

slide-66
SLIDE 66

Solution: Mark-Execute

Some river traverses every city. Execute at semantic scope Mark at syntactic scope

x21 x21 2 1 1 1 q

some river

q

every city traverse

∗∗ Quantification (wide)

17

slide-67
SLIDE 67

Solution: Mark-Execute

Some river traverses every city. Execute at semantic scope Mark at syntactic scope

x21 x21 2 1 1 1 q

some river

q

every city traverse

∗∗ Quantification (wide) Analogy: Montague’s quantifying in, Carpenter’s scoping constructor

17
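Why the point of execution matters can be seen by evaluating the two scope readings of "Some river traverses every city" directly. The sketch below does not implement Mark-Execute itself; it only shows that the two orderings of the quantifiers give different truth values on a toy database whose facts are invented for the example.

```python
# Toy facts, invented for illustration only.
rivers = {"r1", "r2"}
cities = {"c1", "c2"}
traverse = {("r1", "c1"), ("r2", "c2")}

# Reading with "some river" outermost: ∃r ∀c traverse(r, c)
some_over_every = any(all((r, c) in traverse for c in cities) for r in rivers)

# Reading with "every city" outermost: ∀c ∃r traverse(r, c)
every_over_some = all(any((r, c) in traverse for r in rivers) for c in cities)
```

Here no single river traverses both cities, so the first reading is false, while every city is traversed by some river, so the second is true.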

slide-68
SLIDE 68

Outline

Representation

[Figure: example DCS tree over the predicates city, major, loc, state, border, CA, traverse, river, and AZ]

Learning

[Figure: graphical model over x, θ, z, w, y]

Experiments

18

slide-69
SLIDE 69

Graphical Model

z

1 2 1 1

CA capital

∗∗

database w

19

slide-70
SLIDE 70

Graphical Model

z

1 2 1 1

CA capital

∗∗

database w y Sacramento

19

slide-71
SLIDE 71

Graphical Model

z

1 2 1 1

CA capital

∗∗

database w y Sacramento Interpretation: p(y | z, w) (deterministic)

19

slide-72
SLIDE 72

Graphical Model

x capital of California? z

1 2 1 1

CA capital

∗∗

database w y Sacramento Interpretation: p(y | z, w) (deterministic)

19

slide-73
SLIDE 73

Graphical Model

x capital of California? parameters θ z

1 2 1 1

CA capital

∗∗

database w y Sacramento Interpretation: p(y | z, w) (deterministic)

19

slide-74
SLIDE 74

Graphical Model

x capital of California? parameters θ z

1 2 1 1

CA capital

∗∗

database w y Sacramento Semantic Parsing: p(z | x, θ) (probabilistic) Interpretation: p(y | z, w) (deterministic)

19

slide-75
SLIDE 75

Plan

x capital of California? parameters θ z

1 2 1 1

CA capital

∗∗

database w y Sacramento

  • What’s possible? z ∈ Z(x)
  • What’s probable? p(z | x, θ)
  • Learning θ from (x, y) data

20

slide-76
SLIDE 76

Words to Predicates (Lexical Semantics)

What is the most populous city in CA ?

21

slide-77
SLIDE 77

Words to Predicates (Lexical Semantics)

CA

What is the most populous city in CA ? Lexical Triggers:

  • 1. String match

CA ⇒ CA

21

slide-78
SLIDE 78

Words to Predicates (Lexical Semantics)

argmax CA

What is the most populous city in CA ? Lexical Triggers:

  • 1. String match

CA ⇒ CA

  • 2. Function words (20 words) most ⇒ argmax

21

slide-79
SLIDE 79

Words to Predicates (Lexical Semantics)

city city state state river river argmax population population CA

What is the most populous city in CA ? Lexical Triggers:

  • 1. String match

CA ⇒ CA

  • 2. Function words (20 words) most ⇒ argmax
  • 3. Nouns/adjectives

city ⇒ city state river population

21
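The three kinds of lexical triggers can be sketched as a lookup from words to candidate predicates. The table contents follow the examples on the slide; the names `DB_CONSTANTS`, `FUNCTION_WORDS`, `NOUNS_ADJECTIVES`, and `candidate_predicates` are hypothetical, and the hard-coded noun/adjective set stands in for a part-of-speech tagger.

```python
DB_CONSTANTS = {"CA": "CA"}                 # 1. string match against database names
FUNCTION_WORDS = {"most": ["argmax"]}       # 2. small hand-written list (about 20 words on the slide)
GENERAL_PREDICATES = ["city", "state", "river", "population"]
NOUNS_ADJECTIVES = {"populous", "city"}     # stand-in for a POS tagger (assumption)


def candidate_predicates(word):
    """Return the predicates a word may trigger; an empty list means none."""
    if word in DB_CONSTANTS:
        return [DB_CONSTANTS[word]]
    if word in FUNCTION_WORDS:
        return list(FUNCTION_WORDS[word])
    if word in NOUNS_ADJECTIVES:
        # 3. nouns/adjectives may trigger any general predicate
        return list(GENERAL_PREDICATES)
    return []


sentence = "What is the most populous city in CA ?".split()
triggers = {w: candidate_predicates(w) for w in sentence if candidate_predicates(w)}
```

The lookup deliberately over-generates for nouns and adjectives; choosing among the candidates is left to the learned model.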

slide-80
SLIDE 80

Predicates to DCS Trees (Compositional Semantics)

Ci,j = set of DCS trees for span [i, j] most populous city in California i j

22

slide-81
SLIDE 81

Predicates to DCS Trees (Compositional Semantics)

Ci,j = set of DCS trees for span [i, j] most populous city in California i j k

22

slide-82
SLIDE 82

Predicates to DCS Trees (Compositional Semantics)

Ci,j = set of DCS trees for span [i, j] most populous city in California i j k Ci,k Ck,j

22

slide-83
SLIDE 83

Predicates to DCS Trees (Compositional Semantics)

Ci,j = set of DCS trees for span [i, j] most populous city in California i j k Ci,k Ck,j

c

argmax population

1 1 2 1

CA loc city

22

slide-84
SLIDE 84

Predicates to DCS Trees (Compositional Semantics)

Ci,j = set of DCS trees for span [i, j] most populous city in California i j k Ci,k Ck,j

c

argmax population

1 1 2 1

CA loc city

1 1 1 1 c

argmax population

2 1

CA loc city

22

slide-85
SLIDE 85

Predicates to DCS Trees (Compositional Semantics)

Ci,j = set of DCS trees for span [i, j] most populous city in California i j k Ci,k Ck,j

c

argmax population

1 1 2 1

CA loc city

1 1 1 2 c

argmax population

2 1

CA loc city

22

slide-86
SLIDE 86

Predicates to DCS Trees (Compositional Semantics)

Ci,j = set of DCS trees for span [i, j] most populous city in California i j k Ci,k Ck,j

c

argmax population

1 1 2 1

CA loc city

1 1 1 1 2 1 c

argmax population loc

2 1

CA loc city

22

slide-87
SLIDE 87

Predicates to DCS Trees (Compositional Semantics)

Ci,j = set of DCS trees for span [i, j] most populous city in California i j k Ci,k Ck,j

c

argmax population

1 1 2 1

CA loc city

1 1 1 2 1 1 c

argmax population loc

2 1

CA loc city

22

slide-88
SLIDE 88

Predicates to DCS Trees (Compositional Semantics)

Ci,j = set of DCS trees for span [i, j] most populous city in California i j k Ci,k Ck,j

c

argmax population

1 1 2 1

CA loc city

1 1 1 2 1 1 c

argmax population border

2 1

CA loc city

22

slide-89
SLIDE 89

Predicates to DCS Trees (Compositional Semantics)

Ci,j = set of DCS trees for span [i, j] most populous city in California i j k Ci,k Ck,j

c

argmax population

1 1 2 1

CA loc city

1 1 c

argmax

1 1 2 1

CA loc city population

22
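The recurrence over spans, with C(i,j) built from C(i,k) and C(k,j), can be sketched as a CKY-style chart. This is a simplified illustration under stated assumptions: trees are nested tuples, `attach` and `build_chart` are hypothetical names, only two relations are tried, and nothing is pruned (a practical system would need a beam). It also omits the variants on the slides that insert an extra predicate between two subtrees.

```python
from itertools import product


def attach(parent, relation, child):
    """Return a copy of `parent` with `child` added under the given relation."""
    pred, children = parent
    return (pred, children + ((relation, child),))


def build_chart(words, lexicon, relations=("1/1", "2/1")):
    n = len(words)
    chart = {}  # chart[(i, j)] = candidate trees for the span words[i:j]
    for i, word in enumerate(words):
        chart[(i, i + 1)] = [(pred, ()) for pred in lexicon.get(word, [])]
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            trees = []
            for k in range(i + 1, j):
                left, right = chart[(i, k)], chart[(k, j)]
                if not left:
                    trees.extend(right)  # words with no predicate are skipped
                if not right:
                    trees.extend(left)
                for a, b in product(left, right):
                    for r in relations:
                        trees.append(attach(a, r, b))  # a is the head
                        trees.append(attach(b, r, a))  # b is the head
            chart[(i, j)] = list(dict.fromkeys(trees))  # drop duplicates, keep order
    return chart


words = ["city", "in", "California"]
lexicon = {"city": ["city"], "in": ["loc"], "California": ["CA"]}
chart = build_chart(words, lexicon)
```

Among the trees in `chart[(0, 3)]` is `city` with a 1/1 edge to `loc`, which in turn has a 2/1 edge to `CA`; the many other candidates are what the scoring model has to rank.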

slide-90
SLIDE 90

Plan

x capital of California? parameters θ z

1 2 1 1

CA capital

∗∗

database w y Sacramento

  • What’s possible? z ∈ Z(x)
  • What’s probable? p(z | x, θ)
  • Learning θ from (x, y) data

23

slide-91
SLIDE 91

Log-linear Model

z:

city city loc CA

x: city in California

1 1 2 1

24

slide-92
SLIDE 92

Log-linear Model

z:

city city loc CA

x: city in California

1 1 2 1

features(x, z) =(

)

∈ ℝ^d

24

slide-93
SLIDE 93

Log-linear Model

z:

city city loc CA

x: city in California

1 1 2 1

features(x, z) =(

in

loc

: 1

)

∈ ℝ^d

24

slide-94
SLIDE 94

Log-linear Model

z:

city city loc CA

x: city in California

1 1 2 1

features(x, z) =(

in

loc

: 1

1 1 loc

city

: 1)

∈ ℝ^d

24

slide-95
SLIDE 95

Log-linear Model

z:

city city loc CA

x: city in California

1 1 2 1

features(x, z) =(

in

loc

: 1

1 1 loc

city

: 1 · · ·

)

∈ ℝ^d

24

slide-96
SLIDE 96

Log-linear Model

z:

city city loc CA

x: city in California

1 1 2 1

features(x, z) =(

in

loc

: 1

1 1 loc

city

: 1 · · ·

)

∈ ℝ^d

score(x, z) = features(x, z) · θ

24

slide-97
SLIDE 97

Log-linear Model

z:

city city loc CA

x: city in California

1 1 2 1

features(x, z) =(

in

loc

: 1

1 1 loc

city

: 1 · · ·

)

∈ ℝ^d

score(x, z) = features(x, z) · θ

p(z | x, θ) = exp(score(x, z)) / Σ_{z′ ∈ Z(x)} exp(score(x, z′))

24
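The log-linear model amounts to a softmax over the candidate trees in Z(x). A minimal sketch, assuming sparse features stored as dictionaries; the feature names and the weight values are made up for illustration and are not taken from the talk.

```python
import math


def score(features, theta):
    """score(x, z) = features(x, z) · θ for sparse feature dictionaries."""
    return sum(theta.get(name, 0.0) * value for name, value in features.items())


def parse_distribution(candidates, theta):
    """p(z | x, θ) over a list of candidate feature vectors, one per tree z."""
    scores = [score(f, theta) for f in candidates]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Two hypothetical candidate trees for "city in California".
candidates = [
    {"word=in,pred=loc": 1.0, "edge=city-1/1-loc": 1.0},
    {"word=in,pred=border": 1.0, "edge=city-1/1-border": 1.0},
]
theta = {"word=in,pred=loc": 1.0}  # illustrative weights
probs = parse_distribution(candidates, theta)
```

Subtracting the maximum score before exponentiating leaves the distribution unchanged and avoids overflow when scores are large.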

slide-98
SLIDE 98

Plan

x capital of California? parameters θ z

1 2 1 1

CA capital

∗∗

database w y Sacramento

  • What’s possible? z ∈ Z(x)
  • What’s probable? p(z | x, θ)
  • Learning θ from (x, y) data

25

slide-99
SLIDE 99

Learning

Objective Function:

p(y | z, w) p(z | x, θ)

Interpretation Semantic parsing

26

slide-100
SLIDE 100

Learning

Objective Function:

max_θ p(y | z, w) p(z | x, θ)

Interpretation Semantic parsing

26

slide-101
SLIDE 101

Learning

Objective Function:

max_θ Σ_z p(y | z, w) p(z | x, θ)

Interpretation Semantic parsing

26

slide-102
SLIDE 102

Learning

Objective Function:

max_θ Σ_z p(y | z, w) p(z | x, θ)

Interpretation Semantic parsing EM-like Algorithm: parameters θ (0, 0, . . . , 0)

26

slide-103
SLIDE 103

Learning

Objective Function:

max_θ Σ_z p(y | z, w) p(z | x, θ)

Interpretation Semantic parsing EM-like Algorithm: parameters θ (0, 0, . . . , 0) enumerate/score DCS trees

26

slide-104
SLIDE 104

Learning

Objective Function:

max_θ Σ_z p(y | z, w) p(z | x, θ)

Interpretation Semantic parsing EM-like Algorithm: parameters θ k-best list (0, 0, . . . , 0) enumerate/score DCS trees

tree1 tree2 tree3 tree4 tree5

26

slide-105
SLIDE 105

Learning

Objective Function:

max_θ Σ_z p(y | z, w) p(z | x, θ)

Interpretation Semantic parsing EM-like Algorithm: parameters θ k-best list (0.2, −1.3, . . . , 0.7) enumerate/score DCS trees numerical optimization (L-BFGS)

tree1 tree2 tree3 tree4 tree5

26

slide-106
SLIDE 106

Learning

Objective Function:

max_θ Σ_z p(y | z, w) p(z | x, θ)

Interpretation Semantic parsing EM-like Algorithm: parameters θ k-best list (0.2, −1.3, . . . , 0.7) enumerate/score DCS trees numerical optimization (L-BFGS)

tree3 tree8 tree6 tree2 tree4

26

slide-107
SLIDE 107

Learning

Objective Function:

max_θ Σ_z p(y | z, w) p(z | x, θ)

Interpretation Semantic parsing EM-like Algorithm: parameters θ k-best list (0.3, −1.4, . . . , 0.6) enumerate/score DCS trees numerical optimization (L-BFGS)

tree3 tree8 tree6 tree2 tree4

26

slide-108
SLIDE 108

Learning

Objective Function:

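max_θ Σ_z p(y | z, w) p(z | x, θ)

Here p(y | z, w) is the interpretation term and p(z | x, θ) is the semantic parsing term.

EM-like Algorithm: alternate between two steps.

  • enumerate/score DCS trees under the current parameters θ, e.g. (0.3, −1.4, . . . , 0.6), to get a k-best list
  • numerical optimization (L-BFGS) of θ on that k-best list

k-best list: tree3 tree8 tree2 tree4 tree9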

26
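The alternation between building k-best lists and re-fitting θ can be sketched as follows. Several things here are assumptions rather than the talk's method: the search step is replaced by sorting a fixed candidate pool, plain gradient ascent is swapped in for L-BFGS to keep the example dependency-free, and the names `train`, `score`, and `softmax` as well as the toy example are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def score(feats, theta):
    return sum(theta.get(k, 0.0) * v for k, v in feats.items())


def train(examples, k=2, iterations=3, steps=20, lr=0.5):
    """examples: list of (candidates, answer); a candidate is (features, denotation)."""
    theta = {}  # all-zero start, as on the slide
    for _iteration in range(iterations):
        # (1) Stand-in for search: keep the k best candidates under the current theta.
        kbest = [
            sorted(cands, key=lambda c: -score(c[0], theta))[:k]
            for cands, _answer in examples
        ]
        # (2) Fit theta on the fixed k-best lists (gradient ascent, not L-BFGS).
        for _step in range(steps):
            grad = {}
            for trees, (_cands, answer) in zip(kbest, examples):
                correct = [1.0 if d == answer else 0.0 for _f, d in trees]
                if not any(correct):
                    continue  # no tree on the list gives the right answer: skip
                p = softmax([score(f, theta) for f, _d in trees])
                z = sum(pi * ci for pi, ci in zip(p, correct))
                for (f, _d), pi, ci in zip(trees, p, correct):
                    weight = pi * ci / z - pi  # posterior given y minus p(z | x, θ)
                    for name, value in f.items():
                        grad[name] = grad.get(name, 0.0) + weight * value
            for name, g in grad.items():
                theta[name] = theta.get(name, 0.0) + lr * g
    return theta


# One toy example with two candidate trees; only the first yields the right answer.
examples = [
    ([({"in:loc": 1.0}, "Los Angeles"), ({"in:border": 1.0}, "Phoenix")], "Los Angeles"),
]
theta = train(examples)
```

The gradient used is that of log Σ_z p(y | z, w) p(z | x, θ) restricted to the k-best list, where p(y | z, w) is 1 if the tree's denotation equals the answer and 0 otherwise.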

slide-109
SLIDE 109

Outline

Representation

[Figure: example DCS tree over the predicates city, major, loc, state, border, CA, traverse, river, and AZ]

Learning

[Figure: graphical model over x, θ, z, w, y]

Experiments

27

slide-110
SLIDE 110

US Geography Benchmark

Standard semantic parsing benchmark since 1990s 600 training examples, 280 test examples

28

slide-111
SLIDE 111

US Geography Benchmark

Standard semantic parsing benchmark since 1990s

600 training examples, 280 test examples

What is the highest point in Florida?

How many states have a city called Rochester?

What is the longest river that runs through a state that borders Tennessee?

Of the states washed by the Mississippi river which has the lowest point?

· · ·

28

slide-112
SLIDE 112

US Geography Benchmark

Standard semantic parsing benchmark since 1990s

600 training examples, 280 test examples

What is the highest point in Florida? ⇒ answer(A,highest(A,(place(A),loc(A,B),const(B,stateid(florida)))))

How many states have a city called Rochester? ⇒ answer(A,count(B,(state(B),loc(C,B),const(C,cityid(rochester,_))),A))

What is the longest river that runs through a state that borders Tennessee? ⇒ answer(A,longest(A,(river(A),traverse(A,B),state(B),next_to(B,C),const(C,stateid(tennessee)))))

Of the states washed by the Mississippi river which has the lowest point? ⇒ answer(A,lowest(B,(state(A),traverse(C,A),const(C,riverid(mississippi)),loc(B,A),place(B))))

· · ·

Supervision in past work: question + program

28

slide-113
SLIDE 113

US Geography Benchmark

Standard semantic parsing benchmark since 1990s

600 training examples, 280 test examples

What is the highest point in Florida? ⇒ Walton County

How many states have a city called Rochester? ⇒ 2

What is the longest river that runs through a state that borders Tennessee? ⇒ Missouri

Of the states washed by the Mississippi river which has the lowest point? ⇒ Louisiana

· · ·

Supervision in past work: question + program

Supervision in this work: question + answer

28

slide-114
SLIDE 114

Input to Learning Algorithm

Training data (600 examples)

What is the highest point in Florida? ⇒ Walton County How many states have a city called Rochester? ⇒ 2 What is the longest river that runs through a state that borders Tennessee? ⇒ Missouri Of the states washed by the Mississippi river which has the lowest point? ⇒ Louisiana · · · · · ·

29

slide-115
SLIDE 115

Input to Learning Algorithm

Training data (600 examples)

What is the highest point in Florida? ⇒ Walton County How many states have a city called Rochester? ⇒ 2 What is the longest river that runs through a state that borders Tennessee? ⇒ Missouri Of the states washed by the Mississippi river which has the lowest point? ⇒ Louisiana · · · · · ·

Lexicon (75 words)

city ⇒ city

state ⇒ state

mountain ⇒ mountain, peak

· · ·

29

slide-116
SLIDE 116

Input to Learning Algorithm

Training data (600 examples)

What is the highest point in Florida? ⇒ Walton County How many states have a city called Rochester? ⇒ 2 What is the longest river that runs through a state that borders Tennessee? ⇒ Missouri Of the states washed by the Mississippi river which has the lowest point? ⇒ Louisiana · · · · · ·

Lexicon (75 words)

city ⇒ city

state ⇒ state

mountain ⇒ mountain, peak

· · ·

Database

city

San Francisco Chicago Boston · · ·

state

Alabama Alaska Arizona · · ·

loc

Mount Shasta California San Francisco California Boston Massachusetts · · · · · ·

border

Washington Oregon Washington Idaho Oregon Washington · · · · · · · · · · · ·

29

slide-117
SLIDE 117

Experiment 1

On Geo, 250 training examples, 250 test examples

[Bar chart: test accuracy (%), axis from 75 to 100]

30

slide-118
SLIDE 118

Experiment 1

On Geo, 250 training examples, 250 test examples

Columns: System, Description, Lexicon (gen./spec.), Logical forms

cgcr10: FunQL [Clarke et al., 2010], 73.2% test accuracy

[Bar chart: test accuracy (%), axis from 75 to 100]

30

slide-119
SLIDE 119

Experiment 1

On Geo, 250 training examples, 250 test examples

Columns: System, Description, Lexicon (gen./spec.), Logical forms

cgcr10: FunQL [Clarke et al., 2010], 73.2% test accuracy

dcs: our system, 78.9% test accuracy

[Bar chart: test accuracy (%), axis from 75 to 100]

30

slide-120
SLIDE 120

Experiment 1

On Geo, 250 training examples, 250 test examples

Columns: System, Description, Lexicon (gen./spec.), Logical forms

cgcr10: FunQL [Clarke et al., 2010], 73.2% test accuracy

dcs: our system, 78.9% test accuracy

dcs+: our system, 87.2% test accuracy

[Bar chart: test accuracy (%), axis from 75 to 100]

30

slide-121
SLIDE 121

Experiment 2

On Geo, 600 training examples, 280 test examples

31

slide-122
SLIDE 122

Experiment 2

On Geo, 600 training examples, 280 test examples

Columns: System, Description, Lexicon, Logical forms

[Bar chart: test accuracy (%), axis from 75 to 100]

31

slide-123
SLIDE 123

Experiment 2

On Geo, 600 training examples, 280 test examples

Columns: System, Description, Lexicon, Logical forms

zc05: CCG [Zettlemoyer & Collins, 2005], 79.3% test accuracy

[Bar chart: test accuracy (%), axis from 75 to 100]

31

slide-124
SLIDE 124

Experiment 2

On Geo, 600 training examples, 280 test examples

Columns: System, Description, Lexicon, Logical forms

zc05: CCG [Zettlemoyer & Collins, 2005], 79.3% test accuracy

zc07: relaxed CCG [Zettlemoyer & Collins, 2007], 86.1% test accuracy

[Bar chart: test accuracy (%), axis from 75 to 100]

31

slide-125
SLIDE 125

Experiment 2

On Geo, 600 training examples, 280 test examples

Columns: System, Description, Lexicon, Logical forms

zc05: CCG [Zettlemoyer & Collins, 2005], 79.3% test accuracy

zc07: relaxed CCG [Zettlemoyer & Collins, 2007], 86.1% test accuracy

kzgs10: CCG w/unification [Kwiatkowski et al., 2010], 88.9% test accuracy

[Bar chart: test accuracy (%), axis from 75 to 100]

31

slide-126
SLIDE 126

Experiment 2

On Geo, 600 training examples, 280 test examples

Columns: System, Description, Lexicon, Logical forms

zc05: CCG [Zettlemoyer & Collins, 2005], 79.3% test accuracy

zc07: relaxed CCG [Zettlemoyer & Collins, 2007], 86.1% test accuracy

kzgs10: CCG w/unification [Kwiatkowski et al., 2010], 88.9% test accuracy

dcs: our system, 88.6% test accuracy

[Bar chart: test accuracy (%), axis from 75 to 100]

31

slide-127
SLIDE 127

Experiment 2

On Geo, 600 training examples, 280 test examples

Columns: System, Description, Lexicon, Logical forms

zc05: CCG [Zettlemoyer & Collins, 2005], 79.3% test accuracy

zc07: relaxed CCG [Zettlemoyer & Collins, 2007], 86.1% test accuracy

kzgs10: CCG w/unification [Kwiatkowski et al., 2010], 88.9% test accuracy

dcs: our system, 88.6% test accuracy

dcs+: our system, 91.1% test accuracy

[Bar chart: test accuracy (%), axis from 75 to 100]

31

slide-128
SLIDE 128

Some Intuition on Learning

32

slide-129
SLIDE 129

Some Intuition on Learning

parameters θ (1) search DCS trees (hard!) (2) numerical optimization k-best lists

32

slide-130
SLIDE 130

Some Intuition on Learning

parameters θ (1) search DCS trees (hard!) (2) numerical optimization k-best lists

If no DCS tree on k-best list is correct, skip example in (2)

32

slide-131
SLIDE 131

Some Intuition on Learning

parameters θ (1) search DCS trees (hard!) (2) numerical optimization k-best lists

If no DCS tree on k-best list is correct, skip example in (2)

1 2 3 4

iteration

20 40 60 80 100

% examples trained on

32

slide-132
SLIDE 132

Some Intuition on Learning

parameters θ (1) search DCS trees (hard!) (2) numerical optimization k-best lists

If no DCS tree on k-best list is correct, skip example in (2)

1 2 3 4

iteration

20 40 60 80 100

% examples trained on

Effect: automatic curriculum learning, learning improves search

32

slide-133
SLIDE 133

Current Limitations

29

slide-134
SLIDE 134

Current Limitations

Only using forward information Execute program to get answer, but want to invert

29

slide-135
SLIDE 135

Current Limitations

Only using forward information Execute program to get answer, but want to invert Non-identifiability of program If all cities in database are in US, then can’t distinguish {c : city(c)} and {c : city(c) ∧ loc(c, US)}

29

slide-136
SLIDE 136

Current Limitations

Only using forward information Execute program to get answer, but want to invert Non-identifiability of program If all cities in database are in US, then can’t distinguish {c : city(c)} and {c : city(c) ∧ loc(c, US)} Unknown facts: How far is Los Angeles from Boston? Database has no distance information

29

slide-137
SLIDE 137

Current Limitations

Only using forward information: execute program to get answer, but want to invert

Non-identifiability of program: if all cities in database are in US, then can’t distinguish {c : city(c)} and {c : city(c) ∧ loc(c, US)}

Unknown facts: How far is Los Angeles from Boston? Database has no distance information

Unknown concepts: What states are landlocked? Need to induce database view for landlocked(x) = ¬border(x, ocean)

29
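The non-identifiability point can be made concrete with a toy check. The sketch assumes a database in which every city is recorded as located in the US; the facts are chosen only to illustrate the argument.

```python
# Toy database in which every city happens to be in the US (illustrative facts).
city = {"San Francisco", "Chicago", "Boston"}
loc = {(c, "US") for c in city}

all_cities = {c for c in city}                      # {c : city(c)}
us_cities = {c for c in city if (c, "US") in loc}   # {c : city(c) ∧ loc(c, US)}

indistinguishable = all_cities == us_cities
```

Both programs have the same denotation on this database, so answer-only supervision gives the learner no signal for preferring one over the other.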

slide-138
SLIDE 138

Conclusion

Goal: learn to answer questions from question/answer pairs

30

slide-139
SLIDE 139

Conclusion

Goal: learn to answer questions from question/answer pairs

Empirical result: DCS (no logical forms) ≈ existing systems (with logical forms)

30

slide-140
SLIDE 140

Conclusion

Goal: learn to answer questions from question/answer pairs

Empirical result: DCS (no logical forms) ≈ existing systems (with logical forms)

Conceptual contribution: DCS trees

  • Trees: connects dependency syntax with efficient evaluation

30

slide-141
SLIDE 141

Conclusion

Goal: learn to answer questions from question/answer pairs

Empirical result: DCS (no logical forms) ≈ existing systems (with logical forms)

Conceptual contribution: DCS trees

  • Trees: connects dependency syntax with efficient evaluation
  • Mark-Execute: unifying framework for handling scope

30

slide-142
SLIDE 142

Thank you

35