SLIDE 1

Good Question!

Statistical Ranking for Question Generation


  • Michael Heilman and Noah A. Smith
SLIDE 2

The Goal

Input: educational text
Output: quiz


SLIDE 3

The Goal

Input: educational text
Output: ranked list of candidate questions to present to a teacher

Text-to-text generation

  • Knight & Marcu, 00; Clarke, 06 (Compression); Barzilay & McKeown, 05 (Sentence Fusion); Callison-Burch, 07 (Paraphrase Generation); inter alia

SLIDE 4

Our Approach

  • Sentence-level factual questions
  • Acceptable (e.g., grammatical) questions
  • QG as a series of sentence-structure transformations

SLIDE 5

Outline

  • Challenges in Question Generation (QG)
  • Implementation Details
  • Step-by-Step Example
  • Rating Questions
  • Ranking Model
  • Experiments

SLIDE 6

Constraints on WH movement

Darwin studied how species evolve.
Who studied how species evolve?
*What did Darwin study how _ evolve?

WH movement is well studied. We encode this linguistic knowledge with rules.

  • Ross, 67; Chomsky, 77; inter alia

SLIDE 7

Complex Input Sentences

Lincoln, who was born in Kentucky, moved to Illinois in 1831.
Intermediate Form: Lincoln was born in Kentucky.
Where was Lincoln born?

  • Step 1: Extraction of Simple Factual Statements (rule-based)
  • Step 2: Transformation into Questions (rule-based)
SLIDE 8

Vague and Awkward Questions, etc.

Lincoln, who was born in Kentucky… Where was Lincoln born?
Lincoln, who faced many challenges… What did Lincoln face?

Weak predictors: # proper nouns, WH word, transformations, etc.

  • Step 1: Extraction of Simple Factual Statements (rule-based)
  • Step 2: Transformation into Questions (rule-based)
  • Step 3: Statistical Ranking (learned from labeled output from steps 1 & 2)
SLIDE 9

Connections to Prior Work on QG

Most prior work:

  • Sentence-level factual questions
  • Syntactic rules for transformation or extraction
  • Generation in a single step
  • Mitkov & Ha, 03; Kunichika et al., 04; Gates, 08; inter alia

Contributions:

  • Multi-step framework
  • Ranking model learned from labeled output
  • QG evaluation methodology with broad-domain corpora

  • Overgeneration and Ranking for NLG: Langkilde & Knight, 98; Walker et al., 01

SLIDE 10

Outline

  • Challenges in QG
  • Implementation Details
  • Step-by-Step Example
  • Rating Questions
  • Ranking Model
  • Experiments

SLIDE 11

Implementation Details

We use BBN IdentiFinder to find entity labels, and map these to WH words.

  • PERSON -> Who
  • LOCATION -> Where
  • etc.

  • Bikel et al., 99

We use phrase structure parses from the Stanford Parser.

We encode transformations in the Tregex tree-searching language.

  • Klein & Manning, 03; Levy & Andrew, 06
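The entity-label to WH-word mapping can be sketched as a simple lookup table. Only PERSON and LOCATION appear on the slide; the other entries and the "What" default are my own assumed extensions, not the system's actual table.

```python
# Slide shows PERSON and LOCATION; DATE/ORGANIZATION and the default
# are assumed extensions for illustration, not the real system's mapping.
ENTITY_TO_WH = {
    "PERSON": "Who",
    "LOCATION": "Where",
    "DATE": "When",          # assumed, not shown on the slide
    "ORGANIZATION": "What",  # assumed, not shown on the slide
}

def wh_word(entity_label):
    """Map a named-entity label to a WH word, defaulting to 'What'."""
    return ENTITY_TO_WH.get(entity_label, "What")
```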

SLIDE 12

Example Tregex Rule

Constraint: Phrases dominated by a clause with a WH-complementizer cannot undergo movement.

SBAR < /^WH.*P$/ << NP|ADJP|VP|ADVP|PP=unmv

  • “<” denotes immediate dominance; “<<” denotes dominance

Darwin studied how species evolve.
*What did Darwin study how _ evolve?

[Parse-tree diagram omitted: SBAR, WHADVP, WRB, S, NP, VP]

More details on rules in the technical report: M. Heilman and N. A. Smith. 2009. Question Generation via Overgenerating Transformations and Ranking.
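What the Tregex pattern above matches can be sketched in pure Python over a tuple-encoded parse tree. This stands in for Tregex itself, which is not reimplemented here; the tree encoding and helper names are my own.

```python
import re

WH_CHILD = re.compile(r"^WH.*P$")
UNMOVABLE = {"NP", "ADJP", "VP", "ADVP", "PP"}

def subtrees(tree):
    """Yield every non-terminal node of a (label, *children) tuple tree."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from subtrees(child)

def unmovable_phrases(tree):
    """Find phrases dominated by an SBAR that immediately dominates a
    WH complementizer; these cannot undergo WH movement."""
    found = []
    for node in subtrees(tree):
        if node[0] != "SBAR":
            continue
        has_wh = any(not isinstance(c, str) and WH_CHILD.match(c[0])
                     for c in node[1:])
        if not has_wh:
            continue
        for desc in subtrees(node):
            if desc is not node and desc[0] in UNMOVABLE:
                found.append(desc)
    return found
```

On a parse of "Darwin studied how species evolve." this marks "species" (NP) and "evolve" (VP) inside the "how" clause as unmovable, which is exactly why "*What did Darwin study how _ evolve?" is blocked.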

SLIDE 13

Outline

  • Challenges in QG
  • Implementation Details
  • Step-by-Step Example
  • Rating Questions
  • Ranking Model
  • Experiments

SLIDE 14

Source sentence:

During the Gold Rush years in northern California, Los Angeles became known as the "Queen of the Cow Counties" for its role in supplying beef and other foodstuffs to hungry miners in the north.

Preprocessing, then Extraction of Simplified Factual Statements:

Los Angeles became known as the "Queen of the Cow Counties" for its role in supplying beef and other foodstuffs to hungry miners in the north.

… (other candidates)

SLIDE 15

Answer Phrase Selection:

Los Angeles became known as the "Queen of the Cow Counties" for (Answer Phrase: its role in…)

Main Verb Decomposition:

Los Angeles did become known as the "Queen of the Cow Counties" for (Answer Phrase: its role in…)

Subject-Auxiliary Inversion:

Did Los Angeles become known as the "Queen of the Cow Counties" for (Answer Phrase: its role in…)
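Main verb decomposition followed by subject-auxiliary inversion can be sketched on a whitespace-tokenized clause. The tiny past-tense table and the assumption that everything before the verb is the subject are simplifications for illustration; the real system performs these steps on parse trees.

```python
# Toy lexicon; the real system would use a morphological analyzer.
PAST_TO_BASE = {"became": "become", "studied": "study", "moved": "move"}

def decompose_main_verb(tokens):
    """Rewrite '<subject> <V-past> ...' as '<subject> did <V-base> ...'."""
    for i, tok in enumerate(tokens):
        if tok in PAST_TO_BASE:
            return tokens[:i] + ["did", PAST_TO_BASE[tok]] + tokens[i + 1:]
    return tokens

def invert_subject_aux(tokens):
    """Front the auxiliary 'did' (subject-auxiliary inversion)."""
    i = tokens.index("did")
    return ["Did"] + tokens[:i] + tokens[i + 1:]
```

Applied to "Los Angeles became known as the Queen of the Cow Counties", decomposition gives "Los Angeles did become known …" and inversion gives "Did Los Angeles become known …", matching the two steps above.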

SLIDE 16

Movement and Insertion of Question Phrase:

Did Los Angeles become known as the "Queen of the Cow Counties" for (Answer Phrase: its role in…)
What did Los Angeles become known as the "Queen of the Cow Counties" for?

Question Ranking:

  • 1. What became known as…?
  • 2. What did Los Angeles become known as the "Queen of the Cow Counties" for?
  • 3. Whose role in supplying beef…?
  • 4. …

SLIDE 17

Outline

  • Challenges in QG
  • Implementation Details
  • Step-by-Step Example
  • Rating Questions
  • Ranking Model
  • Experiments

SLIDE 18

Rating Questions

We use rated questions to…

  • Learn a ranking model
  • Evaluate our system


SLIDE 19

Sources of Data

Existing datasets of questions?

  • Not focused on sentence-level facts
  • Lack negative examples
  • Noisy (e.g., Yahoo questions)
  • Relatively small

Potential future work.

Tailored data set: annotators rated output from the overgeneration steps 1 & 2.

SLIDE 20

Rating Scheme

8 possible deficiencies

  • ungrammatical, vague, wrong WH word, …

Binary rating for each

No deficiencies: acceptable. Any deficiencies: unacceptable.

“Moderate” agreement (κ = .42)
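The inter-annotator agreement reported above is Cohen's kappa; a minimal sketch of the computation for two annotators' binary ratings follows. The data in the usage example is illustrative, not the paper's.

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """kappa = (p_o - p_e) / (1 - p_e) for two equal-length label lists:
    observed agreement corrected for chance agreement."""
    n = len(ratings_a)
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    labels = set(ratings_a) | set(ratings_b)
    p_e = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)
```

Perfect agreement gives κ = 1; agreement at exactly the chance rate gives κ = 0, so the .42 above sits in the "moderate" band of the usual Landis-Koch scale.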

SLIDE 21

Corpora

                                   Texts   Questions
English Wikipedia                    14      1,448
Simple English Wikipedia             18      1,313
Wall Street Journal (PTB Sec. 23)    10        474
Total                                42      3,235

Training: 2,807 questions (36 texts)
Testing: 428 questions (6 texts)

SLIDE 22

Outline

  • Challenges in QG
  • Implementation Details
  • Step-by-Step Example
  • Rating Questions
  • Ranking Model
  • Experiments

SLIDE 23

Ranking Model

Logistic Regression

  • y ∈ {acceptable, unacceptable}
  • Parameters are estimated by optimizing L2-regularized conditional log-likelihood.
  • We use a variant of Newton’s method.

To rank, sort by P(acceptable | question).

  • le Cessie & Houwelingen, 97
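Ranking by a logistic-regression score can be sketched as follows: each candidate question is a feature vector, and candidates are sorted by the model's probability of acceptability. The weights and feature names here are made up for illustration; the real model's weights are learned from the rated data.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def p_acceptable(features, weights, bias=0.0):
    """P(acceptable | question) under a logistic-regression model."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)

def rank(candidates, weights):
    """Sort (question, feature-vector) pairs by descending acceptability."""
    return sorted(candidates,
                  key=lambda qf: p_acceptable(qf[1], weights),
                  reverse=True)
```

For example, with a hypothetical weight vector [1.0, -2.0] (a length bonus and a vagueness penalty), a candidate with vagueness feature 0 outranks an otherwise identical candidate with vagueness feature 1.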

SLIDE 24

Surface Features

  • WH words in question
  • Negation words in question
  • Language model probabilities
  • Sentence lengths

Separate features for question, source sentence, answer phrase.

SLIDE 25

Features based on Syntactic Analysis

Grammatical categories

  • Numbers of POS tags, NPs, VPs, etc.

Transformations

  • E.g., extracted from relative clause

“Vague NP”

  • Counts of NPs headed by common nouns and with no modifiers
  • 1.0 for “the president”
  • 0.0 for “Abraham Lincoln” or “the U.S. president during the Civil War”
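The "vague NP" count can be sketched on the same tuple-tree encoding: an NP counts as vague when its head is a bare common noun with nothing beyond an optional determiner. The exact modifier test is a simplification of the paper's criterion.

```python
COMMON_NOUN = {"NN", "NNS"}

def subtrees(tree):
    """Yield every non-terminal node of a (label, *children) tuple tree."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from subtrees(child)

def is_vague_np(node):
    """NP whose children are at most a determiner plus one common noun,
    e.g. (NP (DT the) (NN president)), but not (NP (NNP Abraham) (NNP Lincoln))."""
    if isinstance(node, str) or node[0] != "NP":
        return False
    kids = [c for c in node[1:] if not isinstance(c, str)]
    non_det = [c for c in kids if c[0] != "DT"]
    return len(non_det) == 1 and non_det[0][0] in COMMON_NOUN

def count_vague_nps(tree):
    """Feature value: number of vague NPs in the parse."""
    return sum(is_vague_np(node) for node in subtrees(tree))
```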

SLIDE 26

Outline

  • Challenges in QG
  • Implementation Details
  • Step-by-Step Example
  • Rating Questions
  • Ranking Model
  • Experiments

SLIDE 27

Evaluation Metric

Percentage of top-ranked test set questions that were rated acceptable
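The metric is a simple acceptability-at-N computation; a minimal sketch, taking the ranked list's human acceptability labels (best-ranked first):

```python
def pct_acceptable_at(ranked_labels, n):
    """Percentage of the top-n ranked questions rated acceptable.
    ranked_labels: booleans in rank order, best first."""
    top = ranked_labels[:n]
    return 100.0 * sum(top) / len(top)
```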

SLIDE 28

Rankers & Baselines

  • Ranker with all features
  • Ranker with surface features only: sentence lengths, WH words, negation, language model log probabilities
  • Expected random (i.e., no ranking)
  • Oracle

SLIDE 29

Ranking Results

[Line chart omitted: Pct. Rated Acceptable (y-axis, 20%–70%) vs. Number of Top-Ranked Questions (x-axis, 100–400) on the test set, with curves for Oracle, All Features, Surface Features, and Expected Random.]

Noisy at top ranks.

  • All Features performed significantly better than Surface Features (p < .05).

SLIDE 30

Ablation Experiments

Feature Set          % Acceptable in Top-Ranked Fifth
All Features                52.3
All – Length                52.3
All – Negation              51.7
All – Lang. Model           51.2
All – WH                    50.6
All – Vagueness             48.3
All – Transforms            46.5
All – Grammatical           43.2

SLIDE 31

Conclusions

Overgeneration and ranking for QG.

  • Rules encode linguistic knowledge
  • Statistical ranker captures trends not easily encoded with rules

Statistical ranking improved top-ranked output.
SLIDE 32

Questions?

Generated from our paper’s abstract:

  • Which challenge do we address?
  • Who use manually written rules to perform a sequence of general purpose syntactic transformations to turn declarative sentences into questions?
  • Is our approach to overgenerate questions, then rank them?
  • What kind of regression model are these questions then ranked by?
  • What do experimental results show that ranking nearly doubles?
  • What kind of results show that ranking nearly doubles the percentage of questions rated as acceptable by annotators ranked 20 % of questions?