Learning Compositional Semantics
CS224U: Natural Language Understanding
Feb. 9, 2012
Percy Liang
Google/Stanford
Review
Last time: Mapping sentences to logical forms (FOL or lambda calculus)
Alaska borders no states.
¬∃x.state(x) ∧ border(AK, x)
We assumed the following were given:
Lexicon:
no ⇒ dt : λP.λQ.¬∃x.P(x) ∧ Q(x)
states ⇒ n : λx.state(x)
Grammar:
dt : f   n : a ⇒ np : f(a)
Questions:
But where do they come from?
What if a sentence generates multiple logical forms?
What if a sentence is slightly ungrammatical?
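The review example can be checked mechanically. The sketch below writes the two lexicon entries as Python functions and applies the rule dt : f   n : a ⇒ np : f(a). The toy world (`ENTITIES`, `STATES`, `BORDER`) and the helper `borders` are assumptions made for illustration; they are not the lecture's database.

```python
# Toy world (illustrative assumption, not the lecture's database).
ENTITIES = {"AK", "WA", "OR"}
STATES = {"AK", "WA", "OR"}
BORDER = {("WA", "OR"), ("OR", "WA")}

# Lexicon entries from the slide, written as Python functions.
# no ⇒ dt : λP.λQ.¬∃x.P(x) ∧ Q(x)
no = lambda P: lambda Q: not any(P(x) and Q(x) for x in ENTITIES)
# states ⇒ n : λx.state(x)
states = lambda x: x in STATES

# Grammar rule: dt : f   n : a ⇒ np : f(a)
no_states = no(states)

def borders(subject):
    """Return the predicate λx.border(subject, x) (hypothetical helper)."""
    return lambda x: (subject, x) in BORDER

# ¬∃x.state(x) ∧ border(AK, x)
alaska_borders_no_states = no_states(borders("AK"))
washington_borders_no_states = no_states(borders("WA"))
print(alaska_borders_no_states)      # True
print(washington_borders_no_states)  # False
```

In this toy world the sentence is true for Alaska and false for Washington, which borders Oregon.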
Today: building real semantic parsers!
sentence → Semantic Parser → logical form
Strategy: break up complex mapping into two parts
Representation (Lexicon/Grammar):
Allow overgeneration: state ⇒ n : λx.river(x)
Learning:
Representation
[Figure: example DCS trees over CA, border, state, loc, major, AZ, traverse, river, city]
Learning
[Figure: graphical model over x, θ, z, w, y]
Experiments
sentence → Semantic Parser → logical form → Interpretation → denotation
We are free to choose the semantic formalism:
Desiderata:
Model-theoretic: logical forms must have formal interpretation (mapping from world to true/false)
Compositional: meaning (logical form) of phrase computed from combining meaning of sub-phrases
Semantic Formalisms:
Lexicalized formalism: simple grammar rules, heavy lexicon
Categories (analogous to types in programming languages):
np vp ⇒ s
np s\np ⇒ s
v np ⇒ vp
(s\np)/np np ⇒ s\np
In general:
Base categories: s, np, n
Derived categories: if X, Y are categories, then X/Y and X\Y are too
Lexicon:
Alice ⇒ np : alice
Bob ⇒ np : bob
saw ⇒ (s\np)/np : λy.λx.saw(x, y)
Grammar (template):
Forward application (>): Y/X : f   X : a ⇒ Y : f(a)
Backward application (<): X : a   Y\X : f ⇒ Y : f(a)
Derivation:
saw Bob: (s\np)/np : λy.λx.saw(x, y)   np : bob ⇒ s\np : λx.saw(x, bob)   (>)
Alice saw Bob: np : alice   s\np : λx.saw(x, bob) ⇒ s : saw(alice, bob)   (<)
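The two application rules and the derivation of "Alice saw Bob" can be sketched in a few lines of Python. The `Cat` and `Item` classes and the helper names `fwd` and `bwd` are assumptions introduced for this illustration; the slides do not prescribe a data structure.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass(frozen=True)
class Cat:
    """A CCG category: a base category (s, np, n) or a slash category."""
    name: Optional[str] = None      # set for base categories
    result: Optional["Cat"] = None  # Y in Y/X or Y\X
    slash: Optional[str] = None     # "/" or "\\"
    arg: Optional["Cat"] = None     # X in Y/X or Y\X

S, NP = Cat(name="s"), Cat(name="np")

def fwd(result, arg):
    """Build the category result/arg."""
    return Cat(result=result, slash="/", arg=arg)

def bwd(result, arg):
    """Build the category result\\arg."""
    return Cat(result=result, slash="\\", arg=arg)

@dataclass
class Item:
    cat: Cat
    sem: Any

def forward_apply(left, right):
    """(>)  Y/X : f   X : a  =>  Y : f(a)"""
    if left.cat.slash == "/" and left.cat.arg == right.cat:
        return Item(left.cat.result, left.sem(right.sem))
    return None

def backward_apply(left, right):
    """(<)  X : a   Y\\X : f  =>  Y : f(a)"""
    if right.cat.slash == "\\" and right.cat.arg == left.cat:
        return Item(right.cat.result, right.sem(left.sem))
    return None

lexicon = {
    "Alice": Item(NP, "alice"),
    "Bob": Item(NP, "bob"),
    # saw => (s\np)/np : λy.λx.saw(x, y)
    "saw": Item(fwd(bwd(S, NP), NP), lambda y: lambda x: f"saw({x}, {y})"),
}

vp = forward_apply(lexicon["saw"], lexicon["Bob"])   # s\np : λx.saw(x, bob)
sentence = backward_apply(lexicon["Alice"], vp)      # s : saw(alice, bob)
print(sentence.cat.name, ":", sentence.sem)          # s : saw(alice, bob)
```

Logical forms are rendered as strings here only to make the result easy to read; a real parser would build structured terms.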
More grammar rule templates:
Forward composition (B>): Y/X : f   X/Z : a ⇒ Y/Z : λz.f(a(z))
Type raising (T>): X : a ⇒ Y/(Y\X) : λf.f(a)
Derivation:
Alice: np : alice ⇒ s/(s\np) : λf.f(alice)   (T>)
Alice saw: s/(s\np) : λf.f(alice)   (s\np)/np : λy.λx.saw(x, y) ⇒ s/np : λy.saw(alice, y)   (B>)
Alice saw Bob: s/np : λy.saw(alice, y)   np : bob ⇒ s : saw(alice, bob)   (>)
Composition creates non-traditional bracketing useful for right-node raising:
[[Alice saw] and [Carol heard]] Bob.
s : saw(alice, bob) ∧ heard(carol, bob)
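On the semantic side, type raising and forward composition are ordinary higher-order functions. The sketch below tracks categories only in comments. The coordination step `conjoin` is an assumption, since the slide shows the result of right-node raising but gives no rule for "and".

```python
# Semantics only; categories are tracked in comments.
alice, bob, carol = "alice", "bob", "carol"          # np
saw = lambda y: lambda x: f"saw({x}, {y})"           # (s\np)/np
heard = lambda y: lambda x: f"heard({x}, {y})"       # (s\np)/np

def type_raise(a):
    """(T>)  X : a  =>  Y/(Y\\X) : λf.f(a)"""
    return lambda f: f(a)

def compose(f, g):
    """(B>)  Y/X : f   X/Z : g  =>  Y/Z : λz.f(g(z))"""
    return lambda z: f(g(z))

alice_saw = compose(type_raise(alice), saw)      # s/np : λy.saw(alice, y)
result = alice_saw(bob)                          # s : saw(alice, bob)
print(result)                                    # saw(alice, bob)

# Right-node raising. The coordination rule is an assumption of this sketch.
def conjoin(f, g):
    return lambda z: f"{f(z)} ∧ {g(z)}"

carol_heard = compose(type_raise(carol), heard)  # s/np : λy.heard(carol, y)
rnr = conjoin(alice_saw, carol_heard)(bob)
print(rnr)                                       # saw(alice, bob) ∧ heard(carol, bob)
```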
Non-contentful words: Show me flights to Boston
λx.flight(x) ∧ to(x, boston)
Solution: identity functions: show me ⇒ n/n : λf.f
Missing content: Boston flights
λx.flight(x) ∧ to(x, boston)
Solution: type-raising: np : x ⇒ np/n : λf.λa.f(a) ∧ to(a, x)
Non-standard ordering: flights one-way
λx.flight(x) ∧ oneway(x)
Solution: disharmonic combinators: X : a   Y/X : f ⇒ Y : f(a)
What is the most populous city in California?
[Figure: DCS tree with nodes city, loc, CA, population, and argmax, evaluated against the World/Database]
Los Angeles
World/Database
city: San Francisco, Chicago, Boston, · · ·
state: Alabama, Alaska, Arizona, · · ·
loc: (Mount Shasta, California), (San Francisco, California), (Boston, Massachusetts), · · ·
border: (Washington, Oregon), (Washington, Idaho), (Oregon, Washington), · · ·
DCS tree: city --1/1--> loc --2/1--> CA
Constraints:
c ∈ city
c₁ = ℓ₁
ℓ ∈ loc
ℓ₂ = s₁
s ∈ CA
Database:
city: San Francisco, Chicago, Boston, · · ·
loc: (Mount Shasta, California), (San Francisco, California), (Boston, Massachusetts), · · ·
CA: California
A DCS tree encodes a constraint satisfaction problem (CSP)
Computation: dynamic programming ⇒ time = O(# nodes)
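The CSP reading of a DCS tree can be sketched as a bottom-up pass that filters each node's tuples by the values its children allow. The tuple-based tree encoding and the function name `denotation` are assumptions of this sketch; the database rows are the ones visible on the slide, with the elided rows left out.

```python
# Database relations from the slide (rows hidden behind "· · ·" are omitted).
DB = {
    "city": {("San Francisco",), ("Chicago",), ("Boston",)},
    "loc": {("Mount Shasta", "California"),
            ("San Francisco", "California"),
            ("Boston", "Massachusetts")},
    "CA": {("California",)},
}

# A node is (predicate, edges); each edge is (parent_index, child_index, child).
# Indices are 1-based, matching the edge labels 1/1 and 2/1 on the slide.
tree = ("city", [(1, 1, ("loc", [(2, 1, ("CA", []))]))])

def denotation(node):
    """Return the tuples of the node's predicate consistent with its subtree."""
    pred, edges = node
    tuples = DB[pred]
    for j, k, child in edges:
        # Values that the child's k-th component can take.
        allowed = {t[k - 1] for t in denotation(child)}
        # Keep parent tuples whose j-th component is allowed (a semi-join).
        tuples = {t for t in tuples if t[j - 1] in allowed}
    return tuples

answer = denotation(tree)
print(answer)  # {('San Francisco',)}
```

Each node is visited once, which is the sense in which the computation is linear in the number of nodes; the cost of each visit still depends on the size of the relations involved.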
[Figure: example DCS trees over CA, border, state, loc, major, AZ, traverse, river, city]
Trees
Linguistics: syntactic locality
Computation: efficient interpretation
most populous city in California
Syntax: [Figure: syntactic tree over most, populous, city, in, California]
Semantics: argmax(λx.city(x) ∧ loc(x, CA), λx.population(x))
Problem: syntactic scope is lower than semantic scope
If DCS trees look like syntax, how do we get correct semantics?
Mark at syntactic scope, execute at semantic scope
Superlatives: most populous city in California
[Figure: DCS tree over city, loc, CA, population, argmax; mark relation c, execute relation x1]
Negation: Alaska borders no states.
[Figure: DCS tree over border, AK, state, no; mark relation q, execute relation x1]
Quantification (narrow): Some river traverses every city.
[Figure: DCS tree over traverse, river, some, city, every; two mark relations q, execute relation x12]
Quantification (wide): Some river traverses every city.
[Figure: same tree with execute relation x21]
Analogy: Montague’s quantifying in, Carpenter’s scoping constructor
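The two readings of "Some river traverses every city" differ only in which quantifier takes scope over the other. The sketch below evaluates both scope orders on a hypothetical toy world; the entity names `r1`, `r2`, `c1`, `c2` are made up. The extracted slide text does not say which scope order each label (narrow, wide) denotes, so the variables are named by scope order instead.

```python
# Hypothetical toy world: two rivers, two cities.
RIVERS = {"r1", "r2"}
CITIES = {"c1", "c2"}
TRAVERSE = {("r1", "c1"), ("r2", "c2")}  # (river, city)

def traverse(r, c):
    return (r, c) in TRAVERSE

# "some" outscopes "every": ∃r. river(r) ∧ ∀c. city(c) → traverse(r, c)
some_over_every = any(all(traverse(r, c) for c in CITIES) for r in RIVERS)

# "every" outscopes "some": ∀c. city(c) → ∃r. river(r) ∧ traverse(r, c)
every_over_some = all(any(traverse(r, c) for r in RIVERS) for c in CITIES)

print(some_over_every, every_over_some)  # False True
```

In this world every city has a river, but no single river reaches every city, so the two scope orders give different truth values.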
Lexicon (very simple/crude):
no ⇒ no
state ⇒ state
Grammar (very simple/crude):
a b ⇒ [DCS tree joining a and b by an edge labeled i/j; either a or b may be the root]
a b ⇒ [DCS tree joining a and b through an intermediate predicate c, with edges labeled i/j and k/l; either a or b may be the root]
What is the most populous city in CA?
[Figure: candidate predicates shown above the words: argmax, population, city, state, river, CA]
Lexical Triggers:
CA ⇒ CA
city ⇒ city, state, river, population
Ci,j = set of DCS trees for span [i, j]
most populous city in California
Split the span [i, j] at k and combine trees from Ci,k and Ck,j
[Figure: a tree for "most populous" (population with argmax attached by c) and a tree for "city in California" (city --1/1--> loc --2/1--> CA) are combined in several ways: joined directly with different edge labels (1/1, 1/2), joined through an intermediate predicate (loc, border), or with the other tree as the root]
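The chart construction follows the familiar CKY pattern: the trees for a span are built from the trees of its two sub-spans at every split point. The sketch below is a simplified illustration under stated assumptions: trees are nested tuples without edge labels, `combine` only attaches one tree under the other (no intermediate predicates), and the trigger table is hypothetical.

```python
from itertools import product

# A tree is (predicate, children), with children a tuple of trees.
# Edge labels such as 1/1 or 2/1 are left out to keep the sketch short.
def leaf(pred):
    return (pred, ())

def combine(a, b):
    """Join two trees: either one may become the parent of the other."""
    return {(a[0], a[1] + (b,)), (b[0], b[1] + (a,))}

def build_chart(words, triggers):
    """Return C where C[i, j] is the set of trees for the span words[i:j]."""
    n = len(words)
    C = {(i, i + 1): {leaf(p) for p in triggers.get(words[i], ())}
         for i in range(n)}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = set()
            for k in range(i + 1, j):
                left, right = C[i, k], C[k, j]
                # Assumption: a sub-span without trees is skipped, so the
                # other side's trees pass through unchanged.
                if not left or not right:
                    cell |= left | right
                for a, b in product(left, right):
                    cell |= combine(a, b)
            C[i, j] = cell
    return C

words = ["city", "in", "California"]
triggers = {"city": ["city"], "in": ["loc"], "California": ["CA"]}  # hypothetical
chart = build_chart(words, triggers)
print(len(chart[0, 3]), "candidate trees for the full span")
```

Even this three-word example overgenerates: the intended tree, city over loc over CA, is only one of the candidates, and the learned model is what has to pick it out.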
Aspect | CCG | DCS
Logical form | lambda calculus formulae, e.g. λx.city(x) ∧ loc(x, CA) | DCS trees, e.g. city --1/1--> loc --2/1--> CA
Lexicon | categories + lambda calculus, e.g. major ⇒ n/n : λf.λx.f(x) ∧ major(x) | predicates, e.g. major ⇒ major
Grammar | combinator rules, e.g. Y/X : a   X : b ⇒ Y : a(b) | ≅ dependency parsing, e.g. a b ⇒ a and b joined by an edge i/j
Nature | tighter control | simple/permissive
Origin | linguistics | NLP
Representation
Learning
Experiments
Detailed Supervision (expert)
What is the largest city in California?
argmax({c : city(c) ∧ loc(c, CA)}, population)
Natural Supervision (non-expert)
What is the largest city in California?
Los Angeles
Computational: how to efficiently search exponential space?
What is the most populous city in California? ⇒ Los Angeles
Candidate logical forms (LF):
λx.state(x)
λx.city(x)
λx.city(x) ∧ loc(x, CA)
λx.state(x) ∧ border(x, CA)
population(CA)
argmax(λx.city(x) ∧ loc(x, CA), λx.population(x))
· · ·
Statistical: how to parametrize mapping from sentence to logical form?
What is the most populous city in California?
argmax(λx.city(x) ∧ loc(x, CA), λx.population(x))
x: capital of California?
parameters θ
z: [Figure: DCS tree over capital and CA]
world w
y: Sacramento
Semantic Parsing: p(z | x, θ) (probabilistic)
Interpretation: p(y | z, w) (deterministic)
x: city in California
z: [DCS tree: city --1/1--> loc --2/1--> CA]
Features:
[word in aligned with predicate loc] : 1
[edge 1/1 between city and loc] : 1
· · ·
e^{score(x, z)}
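A common way to turn such feature counts into the probabilistic parser p(z | x, θ) is a log-linear model, in which each candidate tree is weighted by e^{score(x, z)} and the weights are normalized over the candidates. The sketch below assumes that form, with score(x, z) taken to be the dot product of the features and θ. The feature names, candidate labels, and weights are hypothetical.

```python
import math

def score(features, theta):
    """score(x, z) = features(x, z) · θ, with sparse dict features."""
    return sum(value * theta.get(name, 0.0) for name, value in features.items())

def parse_distribution(candidates, theta):
    """Return p(z | x, θ) ∝ exp(score(x, z)) over a dict of candidates."""
    scores = [score(f, theta) for f in candidates.values()]
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return {z: e / total for z, e in zip(candidates, exps)}

# Hypothetical feature names and weights, for illustration only.
candidates = {
    "city-loc-CA": {"word=in,pred=loc": 1, "edge=city-1/1-loc": 1},
    "city-border-CA": {"word=in,pred=border": 1, "edge=city-1/1-border": 1},
}
theta = {"word=in,pred=loc": 1.0}
probs = parse_distribution(candidates, theta)
print(probs)
```

With all weights at zero every candidate is equally likely; raising the weight of a feature shifts probability toward the trees that contain it.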
Objective Function: [equation with one term labeled Interpretation and one labeled Semantic parsing]
EM-like Algorithm:
parameters θ → enumerate/score DCS trees → k-best list → numerical optimization (L-BFGS) → parameters θ
Example run:
θ = (0, 0, . . . , 0) ⇒ k-best list: tree1 tree2 tree3 tree4 tree5
θ = (0.2, −1.3, . . . , 0.7) ⇒ k-best list: tree3 tree8 tree6 tree2 tree4
θ = (0.3, −1.4, . . . , 0.6) ⇒ k-best list: tree3 tree8 tree2 tree4 tree9
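The equation itself did not survive extraction. One objective that is consistent with the two labeled terms and with the distributions p(y | z, w) and p(z | x, θ) introduced earlier is the marginal log-likelihood of the answers, summing over the latent logical forms. This is a reconstruction under that assumption, not a transcription of the slide; the training set symbol 𝒟 is introduced here, and the original may also include a regularization term.

```latex
\max_{\theta} \sum_{(x, y) \in \mathcal{D}} \log \sum_{z}
  \underbrace{p(y \mid z, w)}_{\text{Interpretation}} \;
  \underbrace{p(z \mid x, \theta)}_{\text{Semantic parsing}}
```

Because the interpretation term is deterministic, the inner sum simply collects the probability of all logical forms z whose denotation in w equals the observed answer y.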
Representation
Learning
Experiments
Standard semantic parsing benchmark since 1990s
600 training examples, 280 test examples
What is the highest point in Florida?
Program: answer(A,highest(A,(place(A),loc(A,B),const(B,stateid(florida)))))
Answer: Walton County
How many states have a city called Rochester?
Program: answer(A,count(B,(state(B),loc(C,B),const(C,cityid(rochester,_))),A))
Answer: 2
What is the longest river that runs through a state that borders Tennessee?
Program: answer(A,longest(A,(river(A),traverse(A,B),state(B),next_to(B,C),const(C,stateid(tennessee)))))
Answer: Missouri
Of the states washed by the Mississippi river which has the lowest point?
Program: answer(A,lowest(B,(state(A),traverse(C,A),const(C,riverid(mississippi)),loc(B,A),place(B))))
Answer: Louisiana
· · ·
Supervision in past work: question + program
Supervision in this work: question + answer
Training data (600 examples)
What is the highest point in Florida? ⇒ Walton County
How many states have a city called Rochester? ⇒ 2
What is the longest river that runs through a state that borders Tennessee? ⇒ Missouri
Of the states washed by the Mississippi river which has the lowest point? ⇒ Louisiana
· · ·
Lexicon (20 general, 22 specific)
no ⇒ no
most ⇒ argmax
city ⇒ city
state ⇒ state
mountain ⇒ mountain
· · ·
World/Database
city: San Francisco, Chicago, Boston, · · ·
state: Alabama, Alaska, Arizona, · · ·
loc: (Mount Shasta, California), (San Francisco, California), (Boston, Massachusetts), · · ·
border: (Washington, Oregon), (Washington, Idaho), (Oregon, Washington), · · ·
On Geo, 250 training examples, 250 test examples
System | Description | Test accuracy
cgcr10 | FunQL [Clarke et al., 2010] | 73.2%
ljk11 | DCS [Liang et al., 2011] | 78.9%
ljk11+ | DCS [Liang et al., 2011] | 87.2%
[Additional table columns: Lexicon (gen./spec.), Logical forms]
On Geo, 600 training examples, 280 test examples
System | Description | Test accuracy
zc05 | CCG [Zettlemoyer & Collins, 2005] | 79.3%
zc07 | relaxed CCG [Zettlemoyer & Collins, 2007] | 86.1%
kzgs10 | CCG w/unification [Kwiatkowski et al., 2010] | 88.9%
ljk11 | DCS [Liang et al., 2011] | 88.6%
ljk11+ | DCS [Liang et al., 2011] | 91.1%
[Additional table columns: Lexicon, Logical forms]
parameters θ → (1) search DCS trees (hard!) → k-best lists → (2) numerical optimization → parameters θ
If no DCS tree on k-best list is correct, skip example in (2)
[Figure: % examples trained on (y-axis, 20 to 100) against iteration (x-axis, 1 to 4)]
Effect: automatic curriculum learning, learning improves search
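The loop and its skip rule can be sketched as follows. This is a minimal illustration under several assumptions: candidates are given as an explicit list per example rather than found by search, plain gradient ascent stands in for L-BFGS, and the feature names and denotation labels are hypothetical.

```python
import math

def score(feats, theta):
    return sum(v * theta.get(k, 0.0) for k, v in feats.items())

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def train(examples, k=2, iterations=4, step=1.0):
    """examples: list of (candidates, answer); a candidate is (features, denotation)."""
    theta, trained_on = {}, []
    for _ in range(iterations):
        # (1) Search: keep the k highest-scoring candidates per example.
        kbest = [sorted(cands, key=lambda c: score(c[0], theta), reverse=True)[:k]
                 for cands, _ in examples]
        used, grad = 0, {}
        # (2) Optimization: one gradient step on the log-probability of the answer.
        for (_, answer), best in zip(examples, kbest):
            correct = [c[1] == answer for c in best]
            if not any(correct):
                continue  # no correct tree on the k-best list: skip this example
            used += 1
            p = softmax([score(c[0], theta) for c in best])
            p_correct = sum(pi for pi, ok in zip(p, correct) if ok)
            for (feats, _), pi, ok in zip(best, p, correct):
                # Posterior over correct trees minus the model distribution.
                weight = (pi / p_correct if ok else 0.0) - pi
                for name, v in feats.items():
                    grad[name] = grad.get(name, 0.0) + weight * v
        for name, g in grad.items():
            theta[name] = theta.get(name, 0.0) + step * g
        trained_on.append(used)
    return theta, trained_on

# Hypothetical data: the second example's correct tree starts outside the top 2.
examples = [
    ([({"in:loc": 1}, "ans1"), ({"in:border": 1}, "other1")], "ans1"),
    ([({"in:border": 1}, "other2"), ({"in:traverse": 1}, "other3"),
      ({"in:loc": 1}, "ans2")], "ans2"),
]
theta, trained_on = train(examples)
print(trained_on)  # [1, 2, 2, 2]
```

In the first iteration only the easy example contributes. The weight it puts on the shared feature lifts the correct tree of the second example into its k-best list, which is the curriculum effect described on the slide.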
Unknown facts: How far is Los Angeles from Boston?
Database has no distance information
Unknown concepts: What states are landlocked?
Need to induce database view for landlocked(x) = ¬border(x, ocean)
Unknown words: What is the largest settlement in California?
Training examples do not contain the word settlement
sentence → Semantic Parser → logical form → Interpretation → denotation
Learning from Weak Supervision
Strategy: