SLIDE 1 Driving Semantic Parsing from the World’s Response
James Clarke, Dan Goldwasser, Ming-Wei Chang, Dan Roth
Cognitive Computation Group University of Illinois at Urbana-Champaign
CoNLL 2010
Clarke, Goldwasser, Chang, Roth 1
SLIDE 2 What is Semantic Parsing?
I’d like a coffee with no sugar and just a little milk make(coffee, sugar=0, milk=0.3)
Meaning Representation
SLIDE 4 Supervised Learning Problem
Diagram: (text, meaning) pairs are fed to a training algorithm, which produces a model. Challenges: this is a structured prediction problem; should part of the structure be modeled as hidden?
SLIDE 6 Lots of previous work
Multiple approaches to the problem:
KRISP (Kate & Mooney 2006): SVM-based parser using string kernels.
Zettlemoyer & Collins 2005, 2007: probabilistic parser based on relaxed CCG grammars.
WASP (Wong & Mooney 2006, 2007): based on synchronous CFG.
Ge & Mooney 2009: integrated syntactic and semantic parser.

Assumption: a training set consisting of natural language and meaning representation pairs.
SLIDE 9 Using the World’s response
I’d like a coffee with no sugar and just a little milk make(coffee, sugar=0, milk=0.3)
Meaning Representation (judged Good! or Bad! by the World's response). Question: can we use feedback based on the response to provide supervision?
SLIDE 10 This work
We aim to: reduce the burden of annotation for semantic parsing.
We focus on: using the World's response to learn a semantic parser; developing new training algorithms to support this learning paradigm; a lightweight semantic parsing model that doesn't require annotated data.
This results in: learning a semantic parser using zero annotated meaning representations.
SLIDE 11 Outline
1. Semantic Parsing
2. Learning (DIRECT Approach, AGGRESSIVE Approach)
3. Semantic Parsing Model
4. Experiments
SLIDE 16 Semantic Parsing
INPUT x: What is the largest state that borders Texas?
HIDDEN y
OUTPUT z: largest(state(next_to(texas)))
Response r: New Mexico

F : X → Z
ẑ = F_w(x) = argmax_{y ∈ Y, z ∈ Z} w^T Φ(x, y, z)

Model: the nature of inference and feature functions.
Learning Strategy: how we obtain the weights.
SLIDE 19 Learning
Inputs: natural language sentences; Feedback : X × Z → {+1, −1}; zero meaning representations.

Feedback(x, z) = +1 if execute(z) = r, and −1 otherwise.
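The feedback definition above can be sketched directly in code; `execute_query` stands in for the GEOQUERY database executor, which the slides do not specify, and the toy answer table below is purely illustrative:

```python
def feedback(z, r, execute_query):
    """World feedback: +1 if executing the predicted meaning
    representation z yields the expected response r, else -1."""
    return +1 if execute_query(z) == r else -1


# Toy executor standing in for the real GEOQUERY database (illustration only).
def toy_execute(z):
    answers = {"largest(state(next_to(texas)))": "new mexico"}
    return answers.get(z)
```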
SLIDE 20 Learning
Goal: a weight vector that scores the correct meaning representation higher than all other meaning representations.

Response Driven Learning: input text → predict → meaning representation → apply to the World → feedback.
SLIDE 21 Learning Strategies

For each sentence x_i, predict (y_i, z_i) = argmax_{y, z} w^T Φ(x_i, y, z), then query the Feedback function to obtain a +1 or −1 label for each prediction:

x1, y1, z1  +1
x2, y2, z2  −1
x3, y3, z3  −1
...
xn, yn, zn  −1

repeat
  for all input sentences do
    Solve the inference problem
    Query Feedback function
  end for
  Learn a new w using feedback
until Convergence
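The loop above can be sketched as follows; `infer`, `feedback`, and `learn` are placeholders for the argmax inference, the World's feedback function, and whichever weight-update strategy (DIRECT or AGGRESSIVE) is in use:

```python
def response_driven_learning(sentences, infer, feedback, learn, w,
                             max_iters=20):
    """Generic response-driven learning loop: predict a meaning
    representation for every sentence, label each prediction via
    the World's feedback, learn a new w, and repeat until the
    labeled set stops changing (convergence)."""
    prev = None
    for _ in range(max_iters):
        labeled = []
        for x in sentences:
            y, z = infer(x, w)                 # solve the inference problem
            labeled.append((x, y, z, feedback(x, z)))
        if labeled == prev:                    # convergence check
            break
        prev = labeled
        w = learn(labeled, w)                  # learn a new w using feedback
    return w
```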
SLIDE 26 DIRECT Approach
Diagram: input text → predict → meaning representation → apply to the World → feedback → binary learning.

DIRECT: learn a binary classifier to discriminate between good and bad meaning representations.
SLIDE 28 DIRECT Approach
x1, y1, z1  +1
x2, y2, z2  −1
x3, y3, z3  −1
...
xn, yn, zn  −1

Use each (x, y, z) as a training example with its label f taken from feedback. Find w such that f · w^T Φ(x, y, z) > 0.
SLIDE 29 DIRECT Approach

[Figure: each point is Φ(x, y, z), normalized by |x|; a linear separator w discriminates between good and bad meaning representations.]
SLIDE 31 DIRECT Approach

For each sentence x_i, predict (y′_i, z′_i) = argmax_{y, z} w^T Φ(x_i, y, z) and query the Feedback function for a label:

x1, y′1, z′1  +1
x2, y′2, z′2  +1
x3, y′3, z′3  +1
...
xn, y′n, z′n  −1

repeat
  for all input sentences do
    Solve the inference problem
    Query Feedback function
  end for
  Learn a new w using feedback
until Convergence
SLIDE 36 DIRECT Approach

[Figure: the separator w is retrained on the newly labeled examples. Repeat until convergence!]
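A minimal sketch of the DIRECT learner. The slides do not fix the underlying binary learner, so a perceptron-style update is assumed here for illustration; each example is a feature vector Φ(x, y, z) (a plain list of floats) paired with its feedback label f, and the update enforces the constraint f · w^T Φ(x, y, z) > 0:

```python
def direct_update(examples, w, lr=1.0, epochs=10):
    """DIRECT sketch: binary learning over feedback-labeled examples.
    `examples` holds (phi, f) pairs with f in {+1, -1}; a perceptron
    update fires whenever f * w.phi <= 0, i.e. the constraint
    f * w.Phi(x, y, z) > 0 is violated.  The choice of learner is an
    assumption, not fixed by the slides."""
    w = list(w)
    for _ in range(epochs):
        for phi, f in examples:
            score = sum(wi * pi for wi, pi in zip(w, phi))
            if f * score <= 0:                    # misclassified example
                w = [wi + lr * f * pi for wi, pi in zip(w, phi)]
    return w
```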
SLIDE 41 AGGRESSIVE Approach
Diagram: input text → predict → meaning representation → apply to the World → feedback → structured learning.

AGGRESSIVE: positive feedback is a good indicator of the correct meaning representation. Use data with positive feedback as training data for structured learning.
SLIDE 42 AGGRESSIVE Approach

x1, y1, z1  +1
x2, y2, z2  −1
x3, y3, z3  −1
...
xn, yn, zn  −1

Use items with positive feedback as training data for a structured learner. Implicitly consider all other meaning representations for these examples as bad: find w such that w^T Φ(x, y*, z*) > w^T Φ(x, y′, z′) for every competing (y′, z′).
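The AGGRESSIVE constraint can be sketched as a structured-perceptron update, assuming (an assumption for illustration, not the paper's actual learner) that each positive-feedback prediction (x, y*, z*) is treated as a gold structure and the current argmax competitor supplies the violated constraint; `infer` and `phi` are placeholders for the model's inference procedure and feature function:

```python
def aggressive_update(positives, infer, phi, w, lr=1.0, epochs=10):
    """AGGRESSIVE sketch: structured learning from positive feedback.
    `positives` holds (x, y_star, z_star) triples that received +1
    feedback; all other structures for these x are implicitly bad,
    so w is pushed until w.Phi(x, y*, z*) beats the current argmax
    competitor w.Phi(x, y', z')."""
    w = list(w)
    for _ in range(epochs):
        for x, y_star, z_star in positives:
            y_hat, z_hat = infer(x, w)            # best competing structure
            if (y_hat, z_hat) != (y_star, z_star):
                good = phi(x, y_star, z_star)
                bad = phi(x, y_hat, z_hat)
                w = [wi + lr * (g - b) for wi, g, b in zip(w, good, bad)]
    return w
```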
SLIDE 47 AGGRESSIVE Approach

For each sentence x_i, predict (y′_i, z′_i) = argmax_{y, z} w^T Φ(x_i, y, z) and query the Feedback function; only the predictions receiving positive feedback are kept as training data:

x1, y′1, z′1  +1
x2, y′2, z′2  +1
x3, y′3, z′3  −1
...
xn, y′n, z′n  +1

repeat
  for all input sentences do
    Solve the inference problem
    Query Feedback function
  end for
  Learn a new w using feedback
until Convergence
SLIDE 54 Summary of Learning Strategies
Diagram: input text → predict → meaning representation → apply to the World → feedback → learning strategy.

DIRECT: uses both positive and negative feedback as examples to train a binary classifier.
AGGRESSIVE: adapts the feedback signal and uses only positive feedback to train a structured predictor.
SLIDE 56 Model
INPUT x: What is the largest state that borders Texas?
HIDDEN y
OUTPUT z: largest(state(next_to(texas)))
ẑ = F_w(x) = argmax_{y ∈ Y, z ∈ Z} w^T Φ(x, y, z)

First-order decisions: map lexical items, e.g. largest → largest.
Second-order decisions: composition, e.g. next_to(state(·)) or state(next_to(·)).
The inference procedure leverages the typing information of the domain.
SLIDE 57 First-order Decisions

How many people live in the state of Texas?
Candidate predicates and constants: loc, texas, next_to, state, population, null.
Goal: population(state(texas))

Use a simple lexicon to bootstrap the process (predicate: trigger words):
texas: texas
state: state
population: population
loc: in
next_to: next, borders, adjacent

Lexical resources help us move beyond the lexicon, e.g. wordnet_sim(people, population). Context helps disambiguate between choices.
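The bootstrapping step can be sketched as a lexicon lookup with a similarity fallback. The seed lexicon mirrors the slide; `wordnet_sim` is a hypothetical stand-in for a WordNet-based similarity score, with one toy value for (people, population):

```python
# Seed lexicon from the slide: predicate -> trigger words.
LEXICON = {
    "texas": {"texas"},
    "state": {"state"},
    "population": {"population"},
    "loc": {"in"},
    "next_to": {"next", "borders", "adjacent"},
}


def wordnet_sim(word, predicate):
    """Hypothetical lexical-resource similarity; a toy table here."""
    return {("people", "population"): 0.8}.get((word, predicate), 0.0)


def first_order_candidates(word, threshold=0.5):
    """Map a word to candidate predicates: exact lexicon matches
    first, then lexical-resource matches above a threshold."""
    word = word.lower()
    hits = [p for p, triggers in LEXICON.items() if word in triggers]
    if not hits:
        hits = [p for p in LEXICON if wordnet_sim(word, p) >= threshold]
    return hits
```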
SLIDE 66 Second-order Decisions
How do we compose the predicates and constants?
Domain dependent: encode the typing information inherent in the domain into the inference procedure, e.g. population(state(·)) is valid while state(population(·)) is not.
Features: dependency path distance, word position distance, predicate "bigrams" such as next_to(state(·)) vs state(next_to(·)).
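The typing constraint can be sketched as a signature check: a composition outer(inner(·)) is well-typed only when the inner predicate's return type matches the outer predicate's argument type. The signatures below are illustrative guesses for a few GEOQUERY predicates, not the paper's actual type system:

```python
# Illustrative signatures: predicate -> (argument type, return type).
SIGNATURES = {
    "state":      ("state", "state"),
    "next_to":    ("state", "state"),
    "population": ("state", "num"),
    "largest":    ("state", "state"),
}


def can_compose(outer, inner):
    """True iff outer(inner(.)) is well-typed: the inner predicate's
    return type must equal the outer predicate's argument type."""
    return SIGNATURES[outer][0] == SIGNATURES[inner][1]
```

Under these assumed signatures, population(state(·)) is allowed but state(population(·)) is rejected, matching the slide's example; next_to(state(·)) and state(next_to(·)) are both well-typed, so the second-order features must disambiguate them.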
SLIDE 68 Evaluation
Domain: GEOQUERY, U.S. geographical questions.
Response 250: (x, r) pairs; zero meaning representations.
Query 250: sentences x only.
Evaluation metric: accuracy (percentage of meaning representations that return the correct answer).
SLIDE 69 Learning Behavior

Algorithm    R250   Q250
NOLEARN      22.2   —
DIRECT       75.2   69.2
AGGRESSIVE   82.4   73.2
SUPERVISED   87.6   80.4

NOLEARN is used to initialize both learning approaches.

Q: How good is our model when trained in a fully supervised manner?
A: 80% on test data; other supervised methods range from 60% to 85% accuracy.

Q: Is it possible to learn without any meaning representations?
A: Yes! The learner covers more of the Response data set and is only 7% below the SUPERVISED upper bound.
SLIDE 76 Learning Behavior
[Figure: accuracy on Response 250 over learning iterations for DIRECT, AGGRESSIVE, and the initialization.]

AGGRESSIVE correctly interprets 16% of the sentences that DIRECT does not, and 9% vice versa, leaving only 9% incorrect.
SLIDE 77 Learning from Indirect Supervision
Similar to indirect learning protocols:
Learning a binary classifier with a "hidden explanation": supervision is only required for the binary data; no labeled structures. NAACL 2010 (Chang, Goldwasser, Roth, Srikumar 2010a).
Structured learning with binary and structured labels: a mix of supervision for binary data and structured data, where a binary label indicates whether an input has a "good" structure. ICML 2010 (Chang, Goldwasser, Roth, Srikumar 2010b).
SLIDE 78 Conclusions
Contributions:
Response Driven Learning: a new learning paradigm that doesn't rely on annotated meaning representations; supervision comes at the response level, a natural supervision signal.
Two learning algorithms capable of working within response driven learning.
A shallow semantic parsing model.
Future work: Can we combine the two learning algorithms? Other semantic parsing domains? Response driven learning for other tasks?