Learning Compositional Semantics CS224U: Natural Language - - PowerPoint PPT Presentation

SLIDE 1

Learning Compositional Semantics

CS224U: Natural Language Understanding

Feb. 9, 2012

Percy Liang

Google/Stanford

SLIDE 5

Review

Last time: mapping sentences to logical forms (FOL or lambda calculus).

  Alaska borders no states.
  ¬∃x.state(x) ∧ border(AK, x)

We assumed the following were given:

Lexicon:
  no ⇒ dt : λP.λQ.¬∃x.P(x) ∧ Q(x)
  states ⇒ n : λx.state(x)

Grammar:
  dt : f   n : a ⇒ np : f(a)

Questions:
  • But where do the lexicon and grammar come from?
  • What if a sentence generates multiple logical forms?
  • What if a sentence is slightly ungrammatical?

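The lexicon entries and the single grammar rule above can be sketched with Python closures standing in for lambda terms. This is an illustrative encoding, not the lecture's implementation: the toy world (STATES, BORDER) is hypothetical and exists only so that the logical form can be evaluated to true or false.

```python
# Hypothetical toy world, invented only so the logical form can be evaluated.
STATES = {"AK", "WA", "OR", "ID"}
BORDER = {("WA", "OR"), ("OR", "WA"), ("WA", "ID"), ("ID", "WA")}


def border(a, b):
    """border(a, b): true if a and b share a border in the toy world."""
    return (a, b) in BORDER


# Lexicon: closures standing in for the lambda terms (∃x ranges over the toy domain).
no = lambda P: lambda Q: not any(P(x) and Q(x) for x in STATES)  # λP.λQ.¬∃x.P(x) ∧ Q(x)
states = lambda x: x in STATES                                   # λx.state(x)

# Grammar rule  dt : f   n : a  =>  np : f(a)
no_states = no(states)  # λQ.¬∃x.state(x) ∧ Q(x)

# "Alaska borders no states": apply the np to λx.border(AK, x)
alaska_result = no_states(lambda x: border("AK", x))
washington_result = no_states(lambda x: border("WA", x))
print(alaska_result)      # True: AK borders nothing in the toy world
print(washington_result)  # False: WA borders OR
```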
SLIDE 9

Outline

Today: building real semantic parsers!

  sentence → Semantic Parser → logical form

Strategy: break up the complex mapping into two parts.

Representation (Lexicon/Grammar):
  • Should be simple and require minimal human effort
  • Generates a set of candidate logical forms

Allow overgeneration: state ⇒ n : λx.river(x)

Learning:
  • Score/rank candidates based on features
  • Optimize feature weights discriminatively to minimize training error

SLIDE 10

Outline

Representation
  [Figure: example DCS tree over the predicates city, major, loc, state, border, CA, traverse, river, AZ]

Learning
  [Figure: graphical model over the sentence x, parameters θ, DCS tree z, world w, and answer y]

Experiments

SLIDE 16

Semantic Formalisms

  sentence → Semantic Parser → logical form → Interpretation → denotation

We are free to choose the semantic formalism:
  • What kind of logical forms? FOL? Lambda calculus?
  • What constitutes the lexicon and grammar?

Desiderata:
  • Model-theoretic: logical forms must have a formal interpretation (a mapping from worlds to true/false)
  • Compositional: the meaning (logical form) of a phrase is computed by combining the meanings of its sub-phrases

Semantic formalisms:
  • Combinatory Categorial Grammar (CCG)
  • Dependency-Based Compositional Semantics (DCS)

SLIDE 21

Combinatory Categorial Grammar (CCG)

Lexicalized formalism: simple grammar rules, heavy lexicon Categories (analogous to types in programming languages): np vp ⇒ s np s\np ⇒ s v np ⇒ vp (s\np)/np np ⇒ s\np

6

slide-22
SLIDE 22

Combinatory Categorial Grammar (CCG)

Lexicalized formalism: simple grammar rules, heavy lexicon Categories (analogous to types in programming languages): np vp ⇒ s np s\np ⇒ s v np ⇒ vp (s\np)/np np ⇒ s\np In general: Base categories: s, np, n Derived categories: if X, Y are categories, then X/Y and X\Y are too

6

SLIDE 25

Combinatory Categorial Grammar (CCG)

Lexicon:
  Alice ⇒ np : alice
  Bob ⇒ np : bob
  saw ⇒ (s\np)/np : λy.λx.saw(x, y)

Grammar (templates):
  Forward application (>):   Y/X : f   X : a ⇒ Y : f(a)
  Backward application (<):  X : a   Y\X : f ⇒ Y : f(a)

Derivation:
  saw (s\np)/np : λy.λx.saw(x, y)   Bob np : bob   ⇒ (>)   s\np : λx.saw(x, bob)
  Alice np : alice   s\np : λx.saw(x, bob)   ⇒ (<)   s : saw(alice, bob)

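The two application templates can be sketched in Python with a small category type and closures standing in for lambda terms; semantics are built as strings. The names Slash, Item, forward, and backward are hypothetical and not taken from any CCG toolkit.

```python
from dataclasses import dataclass
from typing import Any, Optional, Union


@dataclass(frozen=True)
class Slash:
    """A derived category: result/arg (argument to the right) or result\\arg (to the left)."""
    result: "Cat"
    direction: str  # "/" or "\\"
    arg: "Cat"


Cat = Union[str, Slash]  # base categories are plain strings: "s", "np", "n"


@dataclass
class Item:
    cat: Cat
    sem: Any  # a string or a closure standing in for a lambda term


def forward(left: Item, right: Item) -> Optional[Item]:
    """Forward application (>): Y/X : f   X : a  =>  Y : f(a)."""
    c = left.cat
    if isinstance(c, Slash) and c.direction == "/" and c.arg == right.cat:
        return Item(c.result, left.sem(right.sem))
    return None


def backward(left: Item, right: Item) -> Optional[Item]:
    """Backward application (<): X : a   Y\\X : f  =>  Y : f(a)."""
    c = right.cat
    if isinstance(c, Slash) and c.direction == "\\" and c.arg == left.cat:
        return Item(c.result, right.sem(left.sem))
    return None


S_NP = Slash("s", "\\", "np")  # s\np
lexicon = {
    "Alice": Item("np", "alice"),
    "Bob": Item("np", "bob"),
    "saw": Item(Slash(S_NP, "/", "np"), lambda y: lambda x: f"saw({x}, {y})"),
}

vp = forward(lexicon["saw"], lexicon["Bob"])  # s\np : λx.saw(x, bob)
s = backward(lexicon["Alice"], vp)            # s : saw(alice, bob)
print(s.cat, ":", s.sem)                      # s : saw(alice, bob)
```

Returning None when categories do not match mirrors the fact that each template applies only when the functor's argument category equals the neighboring category.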
SLIDE 29

Combinatory Categorial Grammar (CCG)

More grammar rule templates:
  Forward composition (B>):  Y/X : f   X/Z : a ⇒ Y/Z : λz.f(a(z))
  Type raising (T>):         X : a ⇒ Y/(Y\X) : λf.f(a)

Derivation:
  Alice np : alice   ⇒ (T>)   s/(s\np) : λf.f(alice)
  s/(s\np) : λf.f(alice)   saw (s\np)/np : λy.λx.saw(x, y)   ⇒ (B>)   s/np : λy.saw(alice, y)
  s/np : λy.saw(alice, y)   Bob np : bob   ⇒ (>)   s : saw(alice, bob)

Composition creates non-traditional bracketing, which is useful for right-node raising:
  [[Alice saw] and [Carol heard]] Bob.   ⇒   s : saw(alice, bob) ∧ heard(carol, bob)

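Type raising followed by forward composition can be checked the same way, with closures standing in for the lambda terms. The coordination helper conj is a hypothetical stand-in for however "and" combines two s/np constituents; the slide does not give that rule.

```python
# Closures stand in for lambda terms; semantics are built as strings.
saw = lambda y: lambda x: f"saw({x}, {y})"      # saw   : (s\np)/np : λy.λx.saw(x, y)
heard = lambda y: lambda x: f"heard({x}, {y})"  # heard : (s\np)/np


def type_raise(a):
    """T>:  X : a  =>  Y/(Y\\X) : λf.f(a)."""
    return lambda f: f(a)


def compose(f, a):
    """B>:  Y/X : f   X/Z : a  =>  Y/Z : λz.f(a(z))."""
    return lambda z: f(a(z))


def conj(f, g):
    """Hypothetical coordination of two s/np constituents that share their object."""
    return lambda z: f"{f(z)} ∧ {g(z)}"


alice_saw = compose(type_raise("alice"), saw)      # s/np : λy.saw(alice, y)
carol_heard = compose(type_raise("carol"), heard)  # s/np : λy.heard(carol, y)

print(alice_saw("bob"))                     # saw(alice, bob)
print(conj(alice_saw, carol_heard)("bob"))  # saw(alice, bob) ∧ heard(carol, bob)
```

The composed derivation yields the same logical form as the application-only derivation, which is why CCG can allow the extra bracketing without changing meaning.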
SLIDE 35

CCG Meets Real Data

Non-contentful words:
  Show me flights to Boston   ⇒   λx.flight(x) ∧ to(x, boston)
  Solution: identity functions: show me ⇒ n/n : λf.f

Missing content:
  Boston flights   ⇒   λx.flight(x) ∧ to(x, boston)
  Solution: type raising: np : x ⇒ np/n : λf.λa.f(a) ∧ to(a, x)

Non-standard ordering:
  flights one-way   ⇒   λx.flight(x) ∧ oneway(x)
  Solution: disharmonic combinators: X : a   Y/X : f ⇒ Y : f(a)

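The three fixes can be sketched with string-building closures. The entry one_way (an n/n functor for "one-way") and the helper names shift_np and disharmonic are hypothetical, introduced only to illustrate the rules on the slide.

```python
# Semantics are built as strings; closures stand in for lambda terms.
flights = lambda x: f"flight({x})"  # n : λx.flight(x)
show_me = lambda f: f               # n/n : λf.f  (identity for non-contentful words)


def shift_np(entity):
    """Type shift  np : x  =>  np/n : λf.λa.f(a) ∧ to(a, x)."""
    return lambda f: lambda a: f"{f(a)} ∧ to({a}, {entity})"


def disharmonic(a, f):
    """X : a   Y/X : f  =>  Y : f(a): a forward functor takes its argument from the left."""
    return f(a)


one_way = lambda f: lambda a: f"{f(a)} ∧ oneway({a})"  # hypothetical n/n entry for "one-way"

print(show_me(flights)("x"))               # flight(x)
print(shift_np("boston")(flights)("x"))    # flight(x) ∧ to(x, boston)
print(disharmonic(flights, one_way)("x"))  # flight(x) ∧ oneway(x)
```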
SLIDE 37

Dependency-Based Compositional Semantics (DCS)

What is the most populous city in California?

[Figure: DCS tree for the question, with nodes argmax, population, city, loc, CA; edges labeled 1-1, 2-1, and c]

SLIDE 40

How to interpret the logical form?

What is the most populous city in California?

[Figure: the DCS tree for the question (argmax, population, city, loc, CA) is interpreted against the World/Database, yielding the answer Los Angeles]

SLIDE 41

World/Database

  city: San Francisco, Chicago, Boston, · · ·
  state: Alabama, Alaska, Arizona, · · ·
  loc: (Mount Shasta, California), (San Francisco, California), (Boston, Massachusetts), · · ·
  border: (Washington, Oregon), (Washington, Idaho), (Oregon, Washington), · · ·

SLIDE 52

Basic DCS Trees

DCS tree (an edge labeled (i, j) joins column i of the parent to column j of the child):
  city -(1,1)- loc -(2,1)- CA

Constraints:
  c ∈ city
  ℓ ∈ loc
  s ∈ CA
  c₁ = ℓ₁   (edge city -(1,1)- loc)
  ℓ₂ = s₁   (edge loc -(2,1)- CA)

Database:
  city: San Francisco, Chicago, Boston, · · ·
  loc: (Mount Shasta, California), (San Francisco, California), (Boston, Massachusetts), · · ·
  CA: California

A DCS tree encodes a constraint satisfaction problem (CSP).
Computation: dynamic programming ⇒ time = O(# nodes)

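Because the CSP is tree-structured, it can be solved bottom-up: each child is solved once and then used to filter its parent's tuples, so the number of join steps grows linearly with the number of nodes (treating table sizes as fixed). A minimal sketch, assuming a hypothetical Node class and database encoding and using only the rows shown on the slide:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    pred: str
    # (i, j, child): parent column i must equal child column j (1-based, as on the slide)
    children: list = field(default_factory=list)


DB = {
    "city": {("San Francisco",), ("Chicago",), ("Boston",)},
    "loc": {("Mount Shasta", "California"), ("San Francisco", "California"),
            ("Boston", "Massachusetts")},
    "CA": {("California",)},
}


def denotation(node, db):
    """Bottom-up DP: the set of tuples of node.pred consistent with its whole subtree."""
    tuples = db[node.pred]
    for i, j, child in node.children:
        child_vals = {t[j - 1] for t in denotation(child, db)}  # values in child column j
        tuples = {t for t in tuples if t[i - 1] in child_vals}  # enforce parent_i = child_j
    return tuples


# city -(1,1)- loc -(2,1)- CA
tree = Node("city", [(1, 1, Node("loc", [(2, 1, Node("CA"))]))])
print(denotation(tree, DB))  # {('San Francisco',)}
```

Mount Shasta survives the loc filter but is removed at the root because it is not in the city table.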
SLIDE 56

Properties of DCS Trees

[Figure: example DCS tree over the predicates city, major, loc, state, border, CA, traverse, river, AZ]

Trees:
  • Linguistics: syntactic locality
  • Computation: efficient interpretation

SLIDE 62

Divergence between Syntactic and Semantic Scope

most populous city in California

Syntax: [Figure: dependency tree over most, populous, city, in, California]

Semantics: argmax(λx.city(x) ∧ loc(x, CA), λx.population(x))

Problem: the syntactic scope of "most" is lower than its semantic scope.
If DCS trees look like syntax, how do we get the correct semantics?

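The scope problem can be made concrete by evaluating the superlative at both scopes. This sketch assumes approximate 2010 U.S. Census populations (in millions, rounded) for three cities, included only for illustration; the helper argmax keeps ties and is not the lecture's implementation.

```python
# Approximate 2010 U.S. Census populations in millions, rounded; illustrative only.
POPULATION = {"New York": 8.2, "Los Angeles": 3.8, "San Francisco": 0.8}
CITY = set(POPULATION)
LOC = {("New York", "NY"), ("Los Angeles", "CA"), ("San Francisco", "CA")}


def argmax(restrict, measure, domain):
    """Elements of {x in domain : restrict(x)} that maximize measure (ties kept)."""
    candidates = [x for x in domain if restrict(x)]
    if not candidates:
        return set()
    best = max(measure(x) for x in candidates)
    return {x for x in candidates if measure(x) == best}


population = lambda x: POPULATION[x]
city_in_ca = lambda x: x in CITY and (x, "CA") in LOC

# Semantic scope: argmax(λx.city(x) ∧ loc(x, CA), λx.population(x))
semantic = argmax(city_in_ca, population, CITY)

# Syntactic scope: argmax applied to "city" before "in California" is added
top_city = argmax(lambda x: x in CITY, population, CITY)
syntactic = {x for x in top_city if (x, "CA") in LOC}

print(semantic)   # {'Los Angeles'}
print(syntactic)  # set()
```

Applying argmax at its syntactic position picks the most populous city overall and then finds it is not in California, so the answer is empty.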
SLIDE 65

Solution: Mark-Execute

Superlatives: most populous city in California
  • Mark at syntactic scope
  • Execute at semantic scope

[Figure: DCS tree with execute node x1 at the root; nodes city, loc, CA, argmax, population; argmax is marked with c]

SLIDE 66

Solution: Mark-Execute

Negation: Alaska borders no states.
  • Mark at syntactic scope
  • Execute at semantic scope

[Figure: DCS tree with execute node x1; nodes border, AK, state, no; no is marked with q]

SLIDE 67

Solution: Mark-Execute

Quantification (narrow): Some river traverses every city.
  • Mark at syntactic scope
  • Execute at semantic scope

[Figure: DCS tree with execute node x12; nodes traverse, some, river, every, city; the quantifiers are marked with q]

SLIDE 69

Solution: Mark-Execute

Quantification (wide): Some river traverses every city.
  • Mark at syntactic scope
  • Execute at semantic scope

[Figure: DCS tree with execute node x21; nodes traverse, some, river, every, city; the quantifiers are marked with q]

Analogy: Montague’s quantifying in, Carpenter’s scoping constructor

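The two execute orders correspond to two scopings of the quantifiers, and they can have different truth values. This sketch evaluates both scopings over a hypothetical toy world; which one the slides call "narrow" and which "wide" is not recoverable from the extracted text, so the functions are named by which quantifier outscopes the other.

```python
# Hypothetical toy world where the two scopings disagree.
RIVERS = {"r1", "r2"}
CITIES = {"c1", "c2"}
TRAVERSE = {("r1", "c1"), ("r2", "c2")}


def some_outscopes_every():
    """∃r. river(r) ∧ ∀c. city(c) → traverse(r, c): one river traverses every city."""
    return any(all((r, c) in TRAVERSE for c in CITIES) for r in RIVERS)


def every_outscopes_some():
    """∀c. city(c) → ∃r. river(r) ∧ traverse(r, c): each city has some river."""
    return all(any((r, c) in TRAVERSE for r in RIVERS) for c in CITIES)


print(some_outscopes_every())  # False
print(every_outscopes_some())  # True
```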
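SLIDE 72

From Sentences to DCS Trees

Lexicon (very simple/crude):
  no ⇒ no
  state ⇒ state

Grammar (very simple/crude): two adjacent trees a and b combine into one tree, with either one as the head (X -(i,j)- Y attaches Y as a child of X on relation (i, j)):
  a b ⇒ b -(i,j)- a
  a b ⇒ a -(i,j)- b
  a b ⇒ b -(i,j)- c -(k,l)- a   (through an intermediate predicate c)
  a b ⇒ a -(i,j)- c -(k,l)- b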

SLIDE 76

Words to Predicates (Lexical Semantics)

What is the most populous city in CA ?

Lexical triggers:
  1. String match: CA ⇒ CA
  2. Function words (20 words): most ⇒ argmax
  3. Nouns/adjectives: each noun or adjective (here "populous" and "city") ⇒ city, state, river, population

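The three trigger types can be sketched as one function per word. The function-word table is a hypothetical two-entry subset of the roughly 20 entries, DB_NAMES is a hypothetical list of string-matchable names, and part-of-speech information is assumed to be given (hard-coded below).

```python
FUNCTION_WORDS = {"most": ["argmax"], "no": ["no"]}  # hypothetical subset of the ~20 entries
NOUN_ADJ_PREDICATES = ["city", "state", "river", "population"]
DB_NAMES = {"CA"}  # hypothetical: names that appear verbatim in the database


def triggers(word, is_noun_or_adj):
    """Candidate predicates for one word, using the three trigger types."""
    preds = []
    if word in DB_NAMES:                        # 1. string match
        preds.append(word)
    preds.extend(FUNCTION_WORDS.get(word, []))  # 2. function words
    if is_noun_or_adj:                          # 3. nouns/adjectives
        preds.extend(NOUN_ADJ_PREDICATES)
    return preds


# POS flags are assumed given; CA is treated as a name handled only by string match.
sentence = [("What", False), ("is", False), ("the", False), ("most", False),
            ("populous", True), ("city", True), ("in", False), ("CA", False)]
lexical_options = {w: triggers(w, pos) for w, pos in sentence}
print(lexical_options["most"])      # ['argmax']
print(lexical_options["populous"])  # ['city', 'state', 'river', 'population']
```

Words such as "in" trigger nothing; their contribution has to come from the grammar, for example through an intermediate predicate.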
SLIDE 80

Predicates to DCS Trees (Compositional Semantics)

C_{i,j} = set of DCS trees for the span [i, j] of "most populous city in California". For each split point k, trees in C_{i,j} are built by combining a tree from C_{i,k} with a tree from C_{k,j}.

[Figures: combining the subtree for "most populous" (argmax, population) with the subtree for "city in California" (city -(1,1)- loc -(2,1)- CA). Candidates include joining the two subtrees directly with different relation labels (e.g. 1-1 or 1-2), joining them through an intermediate predicate (loc or border), and attaching population at a different place in the tree.]

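The chart construction can be sketched as CKY-style enumeration. This is a simplified illustration under stated assumptions: the relation labels (LABELS) and the single bridging predicate loc (BRIDGES) are hypothetical choices, words that trigger no predicate (like "in") are dropped before parsing, and every candidate is kept rather than scored and pruned.

```python
from itertools import product

LABELS = [(1, 1), (1, 2), (2, 1)]  # hypothetical set of relation labels (i, j)
BRIDGES = ["loc"]                  # hypothetical intermediate predicates c


def attach(head, dep, i, j):
    """Add dep as a child of head on relation (i, j); trees are (pred, children) tuples."""
    pred, kids = head
    return (pred, kids + ((i, j, dep),))


def combine(a, b):
    """All trees built from adjacent subtrees a and b, with either one as the head."""
    out = set()
    for head, dep in ((a, b), (b, a)):
        for i, j in LABELS:
            out.add(attach(head, dep, i, j))  # head -(i,j)- dep
            for c in BRIDGES:
                for k, m in LABELS:           # head -(i,j)- c -(k,m)- dep
                    out.add(attach(head, attach((c, ()), dep, k, m), i, j))
    return out


def parse(leaves):
    """CKY-style chart: C[i, j] is built from C[i, k] and C[k, j] for every split k."""
    n = len(leaves)
    C = {(i, i + 1): set(leaves[i]) for i in range(n)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            C[i, j] = set()
            for k in range(i + 1, j):
                for a, b in product(C[i, k], C[k, j]):
                    C[i, j] |= combine(a, b)
    return C


# "city in California" with the trigger-less word "in" dropped (a simplification).
leaves = [[("city", ())], [("CA", ())]]
chart = parse(leaves)
target = ("city", ((1, 1, ("loc", ((2, 1, ("CA", ())),))),))
print(len(chart[0, 2]), target in chart[0, 2])  # 24 True
```

Even two words yield 24 candidates here, which is why the learned model, not the grammar, has to pick out city -(1,1)- loc -(2,1)- CA.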
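SLIDE 92

Comparison

  • Logical form: CCG uses lambda calculus formulae, e.g. λx.city(x) ∧ loc(x, CA); DCS uses DCS trees, e.g. city -(1,1)- loc -(2,1)- CA
  • Lexicon: CCG uses categories + lambda calculus, e.g. major ⇒ n/n : λf.λx.f(x) ∧ major(x); DCS uses predicates, e.g. major ⇒ major
  • Grammar: CCG uses combinator rules, e.g. Y/X : a   X : b ⇒ Y : a(b); DCS is close to (≅) dependency parsing, e.g. a b ⇒ b -(i,j)- a
  • Nature: CCG gives tighter control; DCS is simple/permissive
  • Origin: CCG comes from linguistics; DCS from NLP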

SLIDE 93

Outline

Representation
  [Figure: example DCS tree over the predicates city, major, loc, state, border, CA, traverse, river, AZ]

Learning
  [Figure: graphical model over the sentence x, parameters θ, DCS tree z, world w, and answer y]

Experiments

SLIDE 101

Supervision

Detailed supervision:
  What is the largest city in California?   ⇒ (expert)   argmax({c : city(c) ∧ loc(c, CA)}, population)
  • doesn’t scale up
  • representation-dependent

Natural supervision:
  What is the largest city in California?   ⇒ (non-expert)   Los Angeles
  • scales up
  • representation-independent

SLIDE 111

Considerations

Computational: how to efficiently search the exponential space of logical forms?
  What is the most populous city in California?   ⇒   Los Angeles
  Candidate logical forms include:
    λx.state(x)
    λx.city(x)
    λx.city(x) ∧ loc(x, CA)
    λx.state(x) ∧ border(x, CA)
    population(CA)
    argmax(λx.city(x) ∧ loc(x, CA), λx.population(x))
    · · ·

Statistical: how to parametrize the mapping from sentence to logical form?
  What is the most populous city in California?   ⇒   argmax(λx.city(x) ∧ loc(x, CA), λx.population(x))

SLIDE 117

Graphical Model

  x: capital of California?
  θ: parameters
  z: DCS tree [Figure: tree over the predicates capital and CA]
  w: world
  y: Sacramento

Semantic parsing: p(z | x, θ) (probabilistic)
Interpretation: p(y | z, w) (deterministic)

SLIDE 124

Semantic Parsing Log-linear Model

  x: city in California
  z: city -(1,1)- loc -(2,1)- CA

features(x, z) = ( in → loc : 1,  city -(1,1)- loc : 1,  · · · ) ∈ ℝ^d

score(x, z) = features(x, z) · θ

$$p(z \mid x, \theta) = \frac{e^{\mathrm{score}(x, z)}}{\sum_{z' \in Z(x)} e^{\mathrm{score}(x, z')}}$$

where Z(x) is the set of candidate DCS trees for x.

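The log-linear model can be sketched directly from the three definitions above. The encoding of a candidate z (word-to-predicate pairs plus labeled edges), the feature names, and the two candidates are hypothetical; the softmax subtracts the maximum score for numerical stability, which does not change the probabilities.

```python
import math
from collections import Counter


def features(z):
    """Feature counts for a candidate z, encoded (hypothetically) as
    {"lex": [(word, predicate), ...], "edges": [(parent, i, j, child), ...]}."""
    f = Counter()
    for word, pred in z["lex"]:
        f[("lex", word, pred)] += 1            # e.g. in -> loc
    for parent, i, j, child in z["edges"]:
        f[("edge", parent, i, j, child)] += 1  # e.g. city -(1,1)- loc
    return f


def score(z, theta):
    """score(x, z) = features(x, z) . theta (x enters through z's word-predicate pairs)."""
    return sum(count * theta.get(name, 0.0) for name, count in features(z).items())


def distribution(candidates, theta):
    """p(z | x, theta) over the candidate set Z(x), via a numerically stable softmax."""
    scores = [score(z, theta) for z in candidates]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# x = "city in California": two hypothetical candidates differing in what "in" maps to
z_loc = {"lex": [("city", "city"), ("in", "loc"), ("California", "CA")],
         "edges": [("city", 1, 1, "loc"), ("loc", 2, 1, "CA")]}
z_border = {"lex": [("city", "city"), ("in", "border"), ("California", "CA")],
            "edges": [("city", 1, 1, "border"), ("border", 2, 1, "CA")]}
theta = {("lex", "in", "loc"): 1.0}
print(distribution([z_loc, z_border], theta))  # approx. [0.731, 0.269]
```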
SLIDE 134

Learning

Objective function:

$$\max_\theta \sum_z \underbrace{p(y \mid z, w)}_{\text{interpretation}} \; \underbrace{p(z \mid x, \theta)}_{\text{semantic parsing}}$$

EM-like algorithm:
  1. Start with parameters θ = (0, 0, . . . , 0).
  2. Enumerate/score DCS trees to build a k-best list (e.g. tree1, tree2, tree3, tree4, tree5).
  3. Run numerical optimization (L-BFGS) on the objective to update θ (e.g. to (0.2, −1.3, . . . , 0.7)).
  4. Repeat: with the new θ, the k-best list changes (e.g. tree3, tree8, tree6, tree2, tree4); optimization updates θ again (e.g. to (0.3, −1.4, . . . , 0.6)), and re-enumeration yields a new list (e.g. tree3, tree8, tree2, tree4, tree9).

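One optimization step of the loop above can be sketched for a fixed k-best list. Assumptions: each candidate is a dictionary of feature values, interpretation is deterministic so p(y | z, w) is 1 for trees that execute to the gold answer and 0 otherwise, the objective is summed in log over training examples, and plain gradient ascent stands in for L-BFGS. The gradient is the expected feature vector under the posterior over correct trees minus its expectation under p(z | x, θ). Re-enumerating the k-best list after each update, as the slide describes, is left out.

```python
import math


def learn(examples, steps=50, lr=0.5):
    """Gradient ascent on sum_i log sum_z p(y_i | z, w) p(z | x_i, theta) over fixed k-best lists.

    examples: list of (candidates, correct); candidates is a k-best list of feature dicts,
    correct[k] is True when tree k executes to the gold answer (p(y | z_k, w) = 1).
    """
    theta = {}
    for _ in range(steps):
        grad = {}
        for candidates, correct in examples:
            scores = [sum(v * theta.get(f, 0.0) for f, v in phi.items()) for phi in candidates]
            m = max(scores)
            p = [math.exp(s - m) for s in scores]
            total = sum(p)
            p = [q / total for q in p]                        # p(z | x, theta) on the k-best list
            mass = sum(q for q, ok in zip(p, correct) if ok)  # sum_z p(y | z, w) p(z | x, theta)
            if mass == 0.0:
                continue  # no candidate yields the answer: no learning signal
            for phi, q, ok in zip(candidates, p, correct):
                coef = (q / mass if ok else 0.0) - q  # posterior weight minus model weight
                for f, v in phi.items():
                    grad[f] = grad.get(f, 0.0) + coef * v
        for f, g in grad.items():
            theta[f] = theta.get(f, 0.0) + lr * g
    return theta


examples = [([{"in=>loc": 1.0}, {"in=>border": 1.0}], [True, False])]
theta = learn(examples)
print(theta["in=>loc"] > 0 > theta["in=>border"])  # True
```

Examples whose k-best list contains no correct tree contribute nothing, which is one reason the k-best list is refreshed as θ improves.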
SLIDE 135

Outline

Representation
  [Figure: example DCS tree over the predicates city, major, loc, state, border, CA, traverse, river, AZ]

Learning
  [Figure: graphical model over the sentence x, parameters θ, DCS tree z, world w, and answer y]

Experiments

SLIDE 138

US Geography Benchmark

Standard semantic parsing benchmark since 1990s 600 training examples, 280 test examples What is the highest point in Florida? ⇒ answer(A,highest(A,(place(A),loc(A,B),const(B,stateid(florida))))) How many states have a city called Rochester? ⇒ answer(A,count(B,(state(B),loc(C,B),const(C,cityid(rochester, ))),A)) What is the longest river that runs through a state that borders Tennessee? ⇒ answer(A,longest(A,(river(A),traverse(A,B),state(B),next to(B,C),const(C,stateid(tennessee))))) Of the states washed by the Mississippi river which has the lowest point? ⇒ answer(A,lowest(B,(state(A),traverse(C,A),const(C,riverid(mississippi)),loc(B,A),place(B)))) · · · Supervision in past work: question + program


slide-139
SLIDE 139

US Geography Benchmark

Standard semantic parsing benchmark since the 1990s: 600 training examples, 280 test examples.

  • What is the highest point in Florida? ⇒ Walton County
  • How many states have a city called Rochester? ⇒ 2
  • What is the longest river that runs through a state that borders Tennessee? ⇒ Missouri
  • Of the states washed by the Mississippi river which has the lowest point? ⇒ Louisiana
  • · · ·

Supervision in past work: question + program
Supervision in this work: question + answer
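Answer supervision can be illustrated with a toy sketch: candidate logical forms are written as Python functions over a small, hypothetical world (not the real Geo database), and a candidate counts as correct only if its denotation equals the gold answer. The relation encoding, the world contents, and the two candidate readings are assumptions made for illustration.

```python
from typing import Callable

# Toy world (hypothetical excerpt, not the real Geo database):
# each predicate maps to a set of tuples.
WORLD = {
    "state": {("New York",), ("Minnesota",), ("Massachusetts",)},
    "city": {("Rochester",), ("Boston",), ("Albany",)},
    "loc": {("Rochester", "New York"), ("Rochester", "Minnesota"),
            ("Boston", "Massachusetts"), ("Albany", "New York")},
}


def states_with_city(world, name):
    """{s : state(s) and loc(c, s) for a city c called name}."""
    states = {s for (s,) in world["state"]}
    return {s for (c, s) in world["loc"] if c == name and s in states}


# Two candidate logical forms for "How many states have a city called Rochester?"
CANDIDATES: dict[str, Callable[[dict], object]] = {
    "count states containing a Rochester":
        lambda w: len(states_with_city(w, "Rochester")),
    "count cities named Rochester":
        lambda w: len({c for (c,) in w["city"] if c == "Rochester"}),
}


def consistent_forms(candidates, world, answer):
    """Answer supervision: keep the logical forms whose denotation equals the answer."""
    return [name for name, form in candidates.items() if form(world) == answer]


print(consistent_forms(CANDIDATES, WORLD, 2))  # ['count states containing a Rochester']
```

The design point is that no program is ever annotated: the answer alone filters the candidates, so several programs may pass (spurious ones included) and the learner must sort them out.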


slide-142
SLIDE 142

Input to Learning Algorithm

Training data (600 examples)

  • What is the highest point in Florida? ⇒ Walton County
  • How many states have a city called Rochester? ⇒ 2
  • What is the longest river that runs through a state that borders Tennessee? ⇒ Missouri
  • Of the states washed by the Mississippi river which has the lowest point? ⇒ Louisiana
  • · · ·

Lexicon (20 general entries, 22 specific entries), mapping words to predicates:

  • no ⇒ no
  • most ⇒ argmax
  • city ⇒ city
  • state ⇒ state
  • mountain ⇒ mountain
  • · · ·

World/Database (relations):

  • city: San Francisco, Chicago, Boston, · · ·
  • state: Alabama, Alaska, Arizona, · · ·
  • loc: (Mount Shasta, California), (San Francisco, California), (Boston, Massachusetts), · · ·
  • border: (Washington, Oregon), (Washington, Idaho), (Oregon, Washington), · · ·
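A minimal sketch of how the lexicon and world inputs might be represented, assuming a lexicon stored as a map from words to the predicates they can trigger and a world stored as relations (sets of tuples). The entries are a hypothetical excerpt (the `border` triggers, for instance, are not on the slide), and `states_bordering` is a hand-written query for illustration, not the DCS parser.

```python
# Hypothetical lexicon excerpt: word -> predicates it can trigger
LEXICON: dict[str, set[str]] = {
    "no": {"no"},
    "city": {"city"}, "cities": {"city"},
    "state": {"state"}, "states": {"state"},
    "mountain": {"mountain"},
    "border": {"border"}, "borders": {"border"},  # assumed, not on the slide
}

# Toy world: each predicate maps to its extension, a set of tuples
DATABASE: dict[str, set[tuple[str, ...]]] = {
    "city": {("San Francisco",), ("Chicago",), ("Boston",)},
    "state": {("Washington",), ("Oregon",), ("Idaho",), ("California",)},
    "loc": {("Mount Shasta", "California"), ("San Francisco", "California")},
    "border": {("Washington", "Oregon"), ("Washington", "Idaho"),
               ("Oregon", "Washington")},
}


def triggered_predicates(sentence: str) -> set[str]:
    """Collect every predicate that some word in the sentence can trigger."""
    preds: set[str] = set()
    for token in sentence.lower().rstrip("?.!").split():
        preds |= LEXICON.get(token, set())
    return preds


def states_bordering(db, name: str) -> set[str]:
    """Hand-written join: {x : state(x) and border(x, name)}."""
    states = {s for (s,) in db["state"]}
    return {a for (a, b) in db["border"] if b == name and a in states}


print(sorted(triggered_predicates("Which states border Oregon?")))  # ['border', 'state']
print(states_bordering(DATABASE, "Oregon"))  # {'Washington'}
```

The lexicon only proposes predicates; how they combine (here, a join of `state` and `border`) is what the parser has to learn from question/answer pairs.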


slide-146
SLIDE 146

Experiment 1

On Geo, 250 training examples, 250 test examples. Test accuracy by system (from the slide's bar chart):

  • cgcr10, FunQL [Clarke et al., 2010]: 73.2%
  • ljk11 (dcs), DCS [Liang et al., 2011]: 78.9%
  • ljk11+ (dcs+), DCS [Liang et al., 2011]: 87.2%


slide-153
SLIDE 153

Experiment 2

On Geo, 600 training examples, 280 test examples. Test accuracy by system (from the slide's bar chart):

  • zc05, CCG [Zettlemoyer & Collins, 2005]: 79.3%
  • zc07, relaxed CCG [Zettlemoyer & Collins, 2007]: 86.1%
  • kzgs10, CCG with unification [Kwiatkowski et al., 2010]: 88.9%
  • ljk11 (dcs), DCS [Liang et al., 2011]: 88.6%
  • ljk11+ (dcs+), DCS [Liang et al., 2011]: 91.1%


slide-158
SLIDE 158

Some Intuition on Learning

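Learning alternates two steps, starting from parameters θ:

  (1) Search: find DCS trees for each example and keep a k-best list (the hard step).
  (2) Numerical optimization: update θ using the k-best lists.

If no DCS tree on an example's k-best list is correct (i.e., evaluates to the gold answer), skip that example in (2).

[Figure: % of examples trained on (y-axis, 20 to 100) vs. iteration (x-axis, 1 to 4)]

Effect: automatic curriculum learning. Examples whose correct tree has not yet been found are set aside; as learning improves θ, search finds correct trees for more of them. Learning improves search.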
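The loop below sketches the two-step procedure, including the skip rule, on a toy problem. Assumptions: `k_best` ranks a fixed, fully enumerated candidate list instead of searching over DCS trees, and a few passes of per-example gradient ascent on the log-probability of correct trees stand in for L-BFGS. All names and the toy data are hypothetical.

```python
import math


def score(theta, feats):
    """Linear score theta . phi(x, z) for a sparse feature dict."""
    return sum(theta.get(f, 0.0) * v for f, v in feats.items())


def k_best(theta, candidates, k):
    """(1) Search stand-in: rank a fully enumerated candidate list, keep the top k.
    Python's sort is stable (also with reverse=True), so ties keep their order."""
    return sorted(candidates, key=lambda c: score(theta, c[0]), reverse=True)[:k]


def gradient(theta, kbest, answer):
    """Gradient of log sum_{correct z} p(z | x, theta), p normalized over kbest."""
    scores = [score(theta, feats) for feats, _ in kbest]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # shift for numerical stability
    z_all = sum(weights)
    z_ok = sum(wt for wt, (_, den) in zip(weights, kbest) if den == answer)
    grad = {}
    for wt, (feats, den) in zip(weights, kbest):
        # E[phi | answer is correct] - E[phi]
        coef = (wt / z_ok if den == answer else 0.0) - wt / z_all
        for f, v in feats.items():
            grad[f] = grad.get(f, 0.0) + coef * v
    return grad


def train(examples, k=2, iterations=4, steps=10, lr=1.0):
    """examples: list of (candidates, answer); a candidate is (features, denotation)."""
    theta = {}
    history = []  # fraction of examples trained on, per iteration
    for _ in range(iterations):
        lists = [(k_best(theta, cands, k), y) for cands, y in examples]
        # Skip examples whose k-best list has no tree with the right answer
        usable = [(kb, y) for kb, y in lists if any(den == y for _, den in kb)]
        history.append(len(usable) / len(examples))
        # (2) Optimization stand-in: per-example gradient ascent instead of L-BFGS
        for _ in range(steps):
            for kb, y in usable:
                for f, g in gradient(theta, kb, y).items():
                    theta[f] = theta.get(f, 0.0) + lr * g
    return theta, history


ex_a = ([({"good": 1.0}, "yes"), ({"bad": 1.0}, "no"), ({"bad": 1.0}, "maybe")], "yes")
ex_b = ([({"bad": 1.0}, "no"), ({"bad": 1.0}, "no"), ({"good": 1.0}, "yes")], "yes")
theta, history = train([ex_a, ex_b])
print(history)  # [0.5, 1.0, 1.0, 1.0]
```

In this toy run, `ex_b`'s correct tree starts outside its k-best list, so it is skipped in iteration 1; training on `ex_a` raises the weight of the shared `good` feature, after which search finds `ex_b`'s correct tree. This is the curriculum effect in miniature.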


slide-161
SLIDE 161

Current Limitations

  • Unknown facts: How far is Los Angeles from Boston? The database has no distance information.
  • Unknown concepts: What states are landlocked? The system would need to induce a database view for landlocked(x) = ¬border(x, ocean).
  • Unknown words: What is the largest settlement in California? The training examples do not contain the word settlement.
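For concreteness, the view named in the second limitation can be written out by hand over toy relations. This assumes (hypothetically) that `border` also stores state-ocean pairs and that a list of oceans is available; the point of the limitation is that the system would have to induce such a definition itself rather than be given it.

```python
# Toy relations (hypothetical; not the Geo database)
STATES = {"California", "Oregon", "Nevada"}
OCEANS = {"Pacific Ocean"}
BORDER = {
    ("California", "Pacific Ocean"),
    ("Oregon", "Pacific Ocean"),
    ("Nevada", "California"),
    ("Nevada", "Oregon"),
}


def landlocked(x: str, border=BORDER, oceans=OCEANS) -> bool:
    """landlocked(x) = not border(x, ocean), checked against every known ocean."""
    return not any((x, o) in border for o in oceans)


def landlocked_states(states=STATES) -> set[str]:
    """Database view: all states satisfying landlocked(x)."""
    return {s for s in states if landlocked(s)}


print(sorted(landlocked_states()))  # ['Nevada']
```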


slide-164
SLIDE 164

Summary

sentence → [Semantic Parser] → logical form → [Interpretation] → denotation

Learning from weak supervision:

  • Model the logical form as a latent variable
  • Semantic formalisms: CCG, DCS

Strategy:

  • The lexicon/grammar generates a set of candidate logical forms
  • Learned feature weights capture linguistic generalizations
