slide-1
SLIDE 1

Driving Semantic Parsing from the World’s Response

James Clarke, Dan Goldwasser, Ming-Wei Chang, Dan Roth

Cognitive Computation Group University of Illinois at Urbana-Champaign

CoNLL 2010

Clarke, Goldwasser, Chang, Roth 1

slide-2
SLIDE 2

What is Semantic Parsing?

“I’d like a coffee with no sugar and just a little milk” → make(coffee, sugar=0, milk=0.3)

Meaning Representation


slide-4
SLIDE 4

Supervised Learning Problem

Training: a learning algorithm maps (text, meaning) pairs to a model.

Challenges:
Structured prediction problem.
Should part of the structure be modelled as hidden?

slide-6
SLIDE 6

Lots of previous work

Multiple approaches to the problem:
KRISP (Kate & Mooney 2006): an SVM-based parser using string kernels.
Zettlemoyer & Collins 2005, 2007: a probabilistic parser based on relaxed CCG grammars.
WASP (Wong & Mooney 2006, 2007): based on synchronous CFGs.
Ge & Mooney 2009: an integrated syntactic and semantic parser.

Assumption: a training set consisting of natural language and meaning representation pairs.

slide-7
SLIDE 7

Using the World’s response

“I’d like a coffee with no sugar and just a little milk” → make(coffee, sugar=0, milk=0.3)

Meaning Representation


slide-9
SLIDE 9

Using the World’s response

“I’d like a coffee with no sugar and just a little milk” → make(coffee, sugar=0, milk=0.3)

Meaning representations, judged Good! or Bad! by the world.

Question: Can we use feedback based on the response to provide supervision?

slide-10
SLIDE 10

This work

We aim to: reduce the burden of annotation for semantic parsing.

We focus on:
Using the World’s response to learn a semantic parser.
Developing new training algorithms to support this learning paradigm.
A lightweight semantic parsing model that doesn’t require annotated data.

This results in: learning a semantic parser using zero annotated meaning representations.

slide-11
SLIDE 11

Outline

1. Semantic Parsing
2. Learning: DIRECT Approach; AGGRESSIVE Approach
3. Semantic Parsing Model
4. Experiments

slide-13
SLIDE 13

Semantic Parsing

INPUT x: “What is the largest state that borders Texas?”
HIDDEN y
OUTPUT z: largest(state(next_to(texas)))

slide-16
SLIDE 16

Semantic Parsing

INPUT x: “What is the largest state that borders Texas?”
HIDDEN y
OUTPUT z: largest(state(next_to(texas)))
Response r: New Mexico

F : X → Z
ẑ = F_w(x) = argmax_{y∈Y, z∈Z} wᵀΦ(x, y, z)

Model: the nature of inference and feature functions.
Learning strategy: how we obtain the weights w.
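As a toy illustration of the inference rule ẑ = argmax wᵀΦ(x, y, z), here is a minimal sketch. The feature map, weights, and candidate list below are invented for illustration; real inference searches the space of typed predicate compositions rather than enumerating an explicit list.

```python
def predict(x, candidates, phi, w):
    """Return the (y, z) pair maximizing the linear score w . phi(x, y, z).
    `candidates` is an explicit list standing in for the real search space."""
    def score(yz):
        y, z = yz
        return sum(wi * fi for wi, fi in zip(w, phi(x, y, z)))
    return max(candidates, key=score)

# Toy 2-dimensional feature map (illustrative, not the paper's features).
phi = lambda x, y, z: [1.0 if "largest" in x else 0.0,
                       1.0 if z.startswith("largest(") else 0.0]
w = [0.1, 1.0]
candidates = [("y1", "largest(state(next_to(texas)))"),
              ("y2", "state(next_to(texas))")]
print(predict("What is the largest state that borders Texas?", candidates, phi, w))
# -> ('y1', 'largest(state(next_to(texas)))')
```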

slide-17
SLIDE 17

Outline

1. Semantic Parsing
2. Learning: DIRECT Approach; AGGRESSIVE Approach
3. Semantic Parsing Model
4. Experiments

slide-19
SLIDE 19

Learning

Inputs:
Natural language sentences.
A feedback function Feedback : X × Z → {+1, −1}.
Zero meaning representations.

Feedback(x, z) = +1 if execute(z) = r, and −1 otherwise.
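The Feedback function on this slide can be written down directly; `execute` and the toy lookup table below stand in for the real query engine (e.g. executing a Geoquery logical form against the database).

```python
def feedback(x, z, gold_response, execute):
    """Binary supervision from the world's response: +1 if executing the
    predicted meaning representation z yields the correct response r for
    sentence x, and -1 otherwise."""
    return +1 if execute(z) == gold_response else -1

# Toy world: a lookup table plays the role of the database.
toy_db = {"largest(state(next_to(texas)))": "new mexico"}
execute = lambda z: toy_db.get(z)
print(feedback("What is the largest state that borders Texas?",
               "largest(state(next_to(texas)))", "new mexico", execute))  # prints 1
```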

slide-20
SLIDE 20

Learning

Inputs: natural language sentences; a feedback function Feedback : X × Z → {+1, −1}; zero meaning representations.

Goal: a weight vector that scores the correct meaning representation higher than all other meaning representations.

Response Driven Learning: input text → (predict) → meaning representation → (apply to) → World → Feedback.

slide-21
SLIDE 21

Learning Strategies

Sentences x1, x2, x3, …, xn.

repeat
  for all input sentences do
    Solve the inference problem
    Query the Feedback function
  end for
  Learn a new w using the feedback
until convergence
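The training loop above can be sketched in code; `predict`, `feedback`, and `learn` are placeholders for the components described elsewhere in the talk, and the convergence check is deliberately crude.

```python
def response_driven_training(sentences, predict, feedback, learn, w, max_iters=10):
    """Repeat: parse every sentence with the current weights, query the
    Feedback function on each prediction, then learn a new w from the
    feedback-labelled predictions, until the weights stop changing."""
    for _ in range(max_iters):
        labelled = []
        for x in sentences:
            y, z = predict(x, w)        # solve the inference problem
            f = feedback(x, z)          # query the Feedback function
            labelled.append((x, y, z, f))
        new_w = learn(labelled)         # learn a new w using the feedback
        if new_w == w:                  # crude convergence test
            return w
        w = new_w
    return w
```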

slide-22
SLIDE 22

Learning Strategies

Predictions (x1, y1, z1), (x2, y2, z2), (x3, y3, z3), …, (xn, yn, zn), where

y, z = argmax_{y∈Y, z∈Z} wᵀΦ(x, y, z)

repeat
  for all input sentences do
    Solve the inference problem
    Query the Feedback function
  end for
  Learn a new w using the feedback
until convergence

slide-23
SLIDE 23

Learning Strategies

Feedback-labelled predictions: (x1, y1, z1, +1), (x2, y2, z2, −1), (x3, y3, z3, −1), …, (xn, yn, zn, −1).

repeat
  for all input sentences do
    Solve the inference problem
    Query the Feedback function
  end for
  Learn a new w using the feedback
until convergence

slide-25
SLIDE 25

Outline

1. Semantic Parsing
2. Learning: DIRECT Approach; AGGRESSIVE Approach
3. Semantic Parsing Model
4. Experiments

slide-26
SLIDE 26

DIRECT Approach

Input text → (predict) → meaning representation → (apply to) → World → Feedback → binary learning.

DIRECT: learn a binary classifier to discriminate between good and bad meaning representations.

slide-28
SLIDE 28

DIRECT Approach

Labelled examples: (x1, y1, z1, +1), (x2, y2, z2, −1), (x3, y3, z3, −1), …, (xn, yn, zn, −1). Use each (x, y, z) as a training example with its label f from feedback.

Find w such that f · wᵀΦ(x, y, z) > 0.
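A minimal sketch of the DIRECT condition f · wᵀΦ(x, y, z) > 0 as a perceptron-style pass; this is a stand-in for the actual binary learner, and `phi` is a hypothetical feature map.

```python
def direct_update(examples, phi, w, lr=1.0):
    """One perceptron-style pass for the DIRECT strategy: each prediction
    (x, y, z) with feedback label f should satisfy f * w . phi(x, y, z) > 0;
    on a violation, move w in the direction of f * phi(x, y, z)."""
    w = list(w)
    for x, y, z, f in examples:
        features = phi(x, y, z)
        margin = f * sum(wi * fi for wi, fi in zip(w, features))
        if margin <= 0:  # misclassified (or on the boundary)
            w = [wi + lr * f * fi for wi, fi in zip(w, features)]
    return w
```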

slide-29
SLIDE 29

DIRECT Approach

Each point is represented by Φ(x, y, z), normalized by |x|.

slide-30
SLIDE 30

DIRECT Approach

Learn a binary classifier (the weight vector w) to discriminate between good and bad meaning representations.

slide-33
SLIDE 33

DIRECT Approach

New predictions (x1, y′1, z′1), (x2, y′2, z′2), (x3, y′3, z′3), …, (xn, y′n, z′n), where

y, z = argmax_{y∈Y, z∈Z} wᵀΦ(x, y, z)

repeat
  for all input sentences do
    Solve the inference problem
    Query the Feedback function
  end for
  Learn a new w using the feedback
until convergence

slide-34
SLIDE 34

DIRECT Approach

Feedback on the new predictions: (x1, y′1, z′1, +1), (x2, y′2, z′2, +1), (x3, y′3, z′3, +1), …, (xn, y′n, z′n, −1).

repeat
  for all input sentences do
    Solve the inference problem
    Query the Feedback function
  end for
  Learn a new w using the feedback
until convergence

slide-39
SLIDE 39

DIRECT Approach

Learn the new w. Repeat until convergence!

slide-40
SLIDE 40

Outline

1. Semantic Parsing
2. Learning: DIRECT Approach; AGGRESSIVE Approach
3. Semantic Parsing Model
4. Experiments

slide-41
SLIDE 41

AGGRESSIVE Approach

Input text → (predict) → meaning representation → (apply to) → World → Feedback → structured learning.

AGGRESSIVE: positive feedback is a good indicator of the correct meaning representation. Use data with positive feedback as training data for structured learning.

slide-42
SLIDE 42

AGGRESSIVE Approach

Feedback-labelled predictions: (x1, y1, z1, +1), (x2, y2, z2, −1), (x3, y3, z3, −1), …, (xn, yn, zn, −1).

repeat
  for all input sentences do
    Solve the inference problem
    Query the Feedback function
  end for
  Learn a new w using the feedback
until convergence

slide-43
SLIDE 43

AGGRESSIVE Approach

(x1, y1, z1, +1), (x2, y2, z2, −1), (x3, y3, z3, −1), …, (xn, yn, zn, −1). Use the items with positive feedback as training data for a structured learner.

slide-46
SLIDE 46

AGGRESSIVE Approach

Keep the positively-labelled items (x1, y1, z1), (x2, y2, z2), (x3, y3, z3), …, (xn, yn, zn) as training data for a structured learner. Implicitly consider all other meaning representations for these examples as bad.

Find w such that wᵀΦ(x, y*, z*) > wᵀΦ(x, y′, z′).
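A margin-style sketch of the AGGRESSIVE constraint wᵀΦ(x, y*, z*) > wᵀΦ(x, y′, z′): each positively-labelled structure should outscore the current best competitor. This simple perceptron-style variant, with hypothetical `predict_all` and `phi`, only illustrates the idea, not the paper's actual structured learner.

```python
def aggressive_update(positives, predict_all, phi, w, lr=1.0):
    """Treat each (x, y*, z*) that received positive feedback as a gold
    structure; whenever the current model prefers a different structure,
    push w toward the gold features and away from the competitor's."""
    w = list(w)
    for x, y_star, z_star in positives:
        y_hat, z_hat = predict_all(x, w)          # best structure under w
        if (y_hat, z_hat) != (y_star, z_star):    # constraint violated
            good = phi(x, y_star, z_star)
            bad = phi(x, y_hat, z_hat)
            w = [wi + lr * (g - b) for wi, g, b in zip(w, good, bad)]
    return w
```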

slide-49
SLIDE 49

AGGRESSIVE Approach

New predictions (x1, y′1, z′1), (x2, y′2, z′2), (x3, y′3, z′3), …, (xn, y′n, z′n), where

y, z = argmax_{y∈Y, z∈Z} wᵀΦ(x, y, z)

repeat
  for all input sentences do
    Solve the inference problem
    Query the Feedback function
  end for
  Learn a new w using the feedback
until convergence

slide-50
SLIDE 50

AGGRESSIVE Approach

Feedback on the new predictions: (x1, y′1, z′1, +1), (x2, y′2, z′2, +1), (x3, y′3, z′3, −1), …, (xn, y′n, z′n, +1).

repeat
  for all input sentences do
    Solve the inference problem
    Query the Feedback function
  end for
  Learn a new w using the feedback
until convergence

slide-52
SLIDE 52

AGGRESSIVE Approach

The positively-labelled predictions are kept as structured training data: (x1, y′1, z′1), (x2, y′2, z′2), (x3, y′3, z′3), …, (xn, y′n, z′n).

repeat
  for all input sentences do
    Solve the inference problem
    Query the Feedback function
  end for
  Learn a new w using the feedback
until convergence

slide-54
SLIDE 54

Summary of Learning Strategies

Input text → (predict) → meaning representation → (apply to) → World → Feedback → learning strategy.

DIRECT: uses both positive and negative feedback as examples to train a binary classifier.
AGGRESSIVE: adapts the feedback signal and uses only positive feedback to train a structured predictor.

slide-55
SLIDE 55

Outline

1. Semantic Parsing
2. Learning: DIRECT Approach; AGGRESSIVE Approach
3. Semantic Parsing Model
4. Experiments

slide-56
SLIDE 56

Model

INPUT x: “What is the largest state that borders Texas?”
HIDDEN y
OUTPUT z: largest(state(next_to(texas)))

ẑ = F_w(x) = argmax_{y∈Y, z∈Z} wᵀΦ(x, y, z)

First-order decisions: map lexical items, e.g. largest → largest.
Second-order decisions: composition, e.g. next_to(state(·)) or state(next_to(·)).
The inference procedure leverages the typing information of the domain.

slide-57
SLIDE 57

First-order Decisions

Sentence: “How many people live in the state of Texas?”
Goal: population(state(texas))

slide-65
SLIDE 65

First-order Decisions

Sentence: “How many people live in the state of Texas?”
Candidate predicates: loc, texas, next_to, state, population, null.

Use a simple lexicon to bootstrap the process:
texas → texas
state → state
population → population
in → loc
next, borders, adjacent → next_to

Lexical resources help us move beyond the lexicon, e.g. wordnet_sim(people, population).
Context helps disambiguate between choices.

Goal: population(state(texas))
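The first-order decision above can be sketched as a lexicon lookup with a similarity fallback. The function name, threshold, and stub similarity below are illustrative; the talk uses WordNet as the lexical resource.

```python
def first_order_candidates(word, lexicon, similarity, threshold=0.5):
    """Map a word to candidate domain predicates: first via the small
    bootstrap lexicon, then, for words the lexicon misses, via a
    lexical-similarity resource (WordNet in the talk; a stub here)."""
    if word in lexicon:
        return [lexicon[word]]
    return [pred for pred in set(lexicon.values())
            if similarity(word, pred) >= threshold]

lexicon = {"texas": "texas", "state": "state", "population": "population",
           "in": "loc", "next": "next_to", "borders": "next_to"}
sim = lambda w, p: 0.8 if (w, p) == ("people", "population") else 0.0
print(first_order_candidates("people", lexicon, sim))  # -> ['population']
```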

slide-66
SLIDE 66

Second-order Decisions

How do we compose the predicates and constants?

Domain dependent: encode the typing information inherent in the domain into the inference procedure, e.g. population(state(·)) vs. state(population(·)).

Features:
Dependency-path distance.
Word-position distance.
Predicate “bigrams”, e.g. next_to(state(·)) vs. state(next_to(·)).
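The domain-dependent typing check can be sketched with a tiny predicate-signature table; the signatures and helper below are hypothetical, not the paper's actual inference procedure.

```python
# Hypothetical (argument type, return type) signatures for a few
# Geoquery-style predicates; constants take no argument.
SIGNATURES = {
    "population": ("place", "number"),
    "state": ("place", "place"),
    "next_to": ("place", "place"),
    "texas": (None, "place"),
}

def can_compose(outer, inner):
    """Second-order decision sketch: outer(inner(.)) is legal only when the
    inner predicate's return type matches the outer's argument type."""
    return SIGNATURES[inner][1] == SIGNATURES[outer][0]

print(can_compose("population", "state"))  # population(state(.)): True
print(can_compose("state", "population"))  # state(population(.)): False
```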

slide-67
SLIDE 67

Outline

1. Semantic Parsing
2. Learning: DIRECT Approach; AGGRESSIVE Approach
3. Semantic Parsing Model
4. Experiments

slide-68
SLIDE 68

Evaluation

Domain: GEOQUERY — U.S. geographical questions.
Response 250: 250 (x, r) pairs; zero meaning representations.
Query 250: 250 sentences x alone.
Evaluation metric: accuracy, the percentage of meaning representations that return the correct answer.

slide-70
SLIDE 70

Learning Behavior

Algorithm    R250   Q250
NOLEARN      22.2   —
DIRECT       —      —
AGGRESSIVE   —      —
SUPERVISED   87.6   80.4

NOLEARN is used to initialize both learning approaches.

slide-72
SLIDE 72

Learning Behavior

Algorithm    R250   Q250
NOLEARN      22.2   —
DIRECT       —      —
AGGRESSIVE   —      —
SUPERVISED   87.6   80.4

Q: How good is our model when trained in a fully supervised manner?
A: 80% on test data. Other supervised methods range from 60% to 85% accuracy.

slide-74
SLIDE 74

Learning Behavior

Algorithm    R250   Q250
NOLEARN      22.2   —
DIRECT       75.2   69.2
AGGRESSIVE   82.4   73.2
SUPERVISED   87.6   80.4

Q: Is it possible to learn without any meaning representations?
A: Yes!
A: The learner covers more of the Response data set.
A: And it ends up only 7% below the SUPERVISED upper bound.

slide-76
SLIDE 76

Learning Behavior

[Figure: accuracy on Response 250 over learning iterations 1–6 for DIRECT, AGGRESSIVE, and the initialization.]

AGGRESSIVE correctly interprets 16% of sentences that DIRECT does not; 9% vice versa. That leaves only 9% interpreted incorrectly.

slide-77
SLIDE 77

Learning from Indirect Supervision

Similar to indirect learning protocols:
Learning a binary classifier with a “hidden explanation”: supervision is only required for the binary labels; no labeled structures. NAACL 2010 (Chang, Goldwasser, Roth, Srikumar 2010a).
Structured learning with binary and structured labels: a mix of supervision for binary and structured data; the binary label indicates whether an input has a “good” structure. ICML 2010 (Chang, Goldwasser, Roth, Srikumar 2010b).

slide-78
SLIDE 78

Conclusions

Contributions:
Response Driven Learning: a new learning paradigm that doesn’t rely on annotated meaning representations; supervision comes at the response level, a natural supervision signal.
Two learning algorithms capable of working within response-driven learning.
A shallow semantic parsing model.

Future work:
Can we combine the two learning algorithms?
Other semantic parsing domains?
Response-driven learning for other tasks?