Question answering, CS685 Fall 2020 Advanced Natural Language Processing (PowerPoint PPT presentation)



SLIDE 1

Question answering

CS685 Fall 2020

Advanced Natural Language Processing

Mohit Iyyer

College of Information and Computer Sciences, University of Massachusetts Amherst

(some slides from Jordan Boyd-Graber, Jacob Devlin, and Chris Manning)

SLIDE 2

Stuff from last time

  • HW0 grades published, good job!
  • HW1 coming soon :)
  • Project proposal feedback in early October
  • Exam pushed back to end of October
  • Thanks to whoever posted all those Notability tips in the anonymous form!

SLIDE 3

Who wrote the song “Kiss from a Rose”?

Pipeline: Question Analysis (POS/Parsing/NER) → Query Formulation / Template Extraction → Knowledge Base Search / Candidate Answer Generation → Answer Type Selection → Evidence Retrieval / Candidate Scoring → Final Ranking

A: Seal

SLIDE 4

Who wrote the song “Kiss from a Rose”? → [Neural Network + External Knowledge + Classifier] → Seal

Can we replace all of these modules with a single neural network?

SLIDE 5
  • factoid QA: the answer is a single entity / numeric
  • “who wrote the book “Dracula”?”
  • non-factoid QA: the answer is free text
  • “why is Dracula so evil?”
  • QA subtypes (could be factoid or non-factoid):
  • semantic parsing: the question is mapped to a logical form which is then executed over some database
  • “how many people did Dracula bite?”
  • reading comprehension: the answer is a span of text within a document (could be factoid or non-factoid)
  • community-based QA: the question is answered by multiple web users (e.g., Yahoo! Answers)
  • visual QA: questions about images
SLIDE 6

Machine reading (“reading comprehension”)

SLIDE 7

SQuAD

SLIDE 8

Let’s look at the DrQA model (Chen et al., ACL 2017)

(pre-BERT)

SLIDE 9

Big idea

SLIDE 10

Start and End Probabilities

P_start(i) ∝ exp(p̃_i W_s q̃)  (1)
P_end(i) ∝ exp(p̃_i W_e q̃)  (2)

  • 1. q̃: a vector representing our question
  • 2. p̃_i: a vector representing each word in the passage text
  • 3. W_s, W_e: learned parameters scoring the start/end of the answer

How does this work at test-time?
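As a concrete sketch of the bilinear start/end probabilities and greedy test-time span decoding, here is a NumPy toy version with random stand-in vectors (the function names `span_probs` and `decode_span` are ours, not DrQA's):

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def span_probs(P, q, W_s, W_e):
    # Bilinear scores: P_start(i) ∝ exp(p_i W_s q), P_end(i) ∝ exp(p_i W_e q)
    # P: (n, d) passage token vectors; q: (d,) question vector
    # W_s, W_e: (d, d) learned parameter matrices (random here)
    return softmax(P @ W_s @ q), softmax(P @ W_e @ q)

def decode_span(p_start, p_end, max_len=15):
    # Test time: pick the pair (i, j) with i <= j < i + max_len
    # that maximizes P_start(i) * P_end(j)
    n = len(p_start)
    best_score, best_span = -1.0, (0, 0)
    for i in range(n):
        for j in range(i, min(i + max_len, n)):
            score = p_start[i] * p_end[j]
            if score > best_score:
                best_score, best_span = score, (i, j)
    return best_span
```

The span-pair search answers the test-time question: instead of taking independent argmaxes (which could put the end before the start), we score all valid (start, end) pairs up to a maximum span length.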

SLIDE 16

Stanford Attentive Reader++

Figure from SLP3, Chapter 23: the passage (“Beyonce’s debut album …”) and the question (“When did Beyonce …”) are each embedded with GloVe vectors and encoded by two stacked bi-LSTMs; the passage side additionally uses POS/NER features and question-aligned embeddings (q-align_i), and attention-weighted similarity between each passage representation p̃_i and the question vector q̃ produces p_start(i) and p_end(i) at every position.

Training objective: minimize the negative log-likelihood of the gold span boundaries, L = −log P_start(a_s) − log P_end(a_e)
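The standard training objective for these span models, the negative log-likelihood of the gold start/end positions, can be sketched as:

```python
import numpy as np

def span_nll(p_start, p_end, a_s, a_e):
    # L = -log P_start(a_s) - log P_end(a_e)
    # p_start, p_end: predicted distributions over passage positions
    # a_s, a_e: gold start and end indices of the answer span
    return -np.log(p_start[a_s]) - np.log(p_end[a_e])
```

With uniform predictions over n positions the loss is 2·log n, which is the value an untrained model should start near.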

SLIDE 17
BiDAF: Bi-Directional Attention Flow for Machine Comprehension (Seo, Kembhavi, Farhadi, Hajishirzi, ICLR 2017)

SLIDE 18

Coattention Encoder

[Coattention encoder diagram: document D (m+1 positions) and question Q (n+1 positions) are encoded with bi-LSTMs; products of their affinities give attention maps A^Q and A^D, which yield attention contexts C^Q and coattention contexts C^D; a final bi-LSTM over the concatenated document and coattention representations produces the output states U.]

SLIDE 19

SQuAD v1.1 leaderboard, end of 2016 (Dec 6)

[leaderboard screenshot with EM and F1 columns]

SLIDE 20

All of these models are trained from scratch on the SQuAD training set!!!

SLIDE 21

SLIDE 22
  • Simply concatenate the question and paragraph into a single sequence, pass it through BERT, and apply a softmax layer to the final-layer token representations to predict the start/end answer span boundaries
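A minimal sketch of this span-prediction head, with random stand-ins for BERT's final-layer token representations (the function name and shapes are our assumptions, not a reference implementation):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def bert_span_head(H, S, E):
    # H: (n, d) final-layer representations of the packed
    #    [CLS] question [SEP] paragraph [SEP] sequence (random stand-ins here)
    # S, E: (d,) learned start/end vectors
    # Each token gets a dot-product score; softmax runs over all n tokens.
    return softmax(H @ S), softmax(H @ E)
```

Compared with DrQA's bilinear layer, the only task-specific parameters here are the two vectors S and E; everything else is the pretrained encoder.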

SLIDE 23

SQuAD v1.1 leaderboard, 2019-02-07 – it’s solved!

SLIDE 24

Transfer learning via BERT made most of the task-specific QA architectures obsolete

SLIDE 25

SQuAD 2.0 Example

Q: When did Genghis Khan kill Great Khan?
Gold answers: <No Answer>
Prediction: 1234 [from Microsoft nlnet]

SLIDE 26

SQuAD 2.0 leaderboard, 2019-02-07

[leaderboard screenshot with EM and F1 columns]

SLIDE 27

SQuAD 2.0 leaderboard, 2019-02-07


SLIDE 28

Good systems are great, but they still make basic NLU errors

Q: What dynasty came before the Yuan?
Gold answers: Song dynasty; Mongol Empire; the Song dynasty
Prediction: Ming dynasty [BERT (single model) (Google AI)]

SLIDE 29

SQuAD limitations

  • SQuAD has a number of other key limitations too:
  • Only span-based answers (no yes/no, counting, or implicit why)
  • Questions were constructed while looking at the passages
  • Not genuine information needs
  • Generally greater lexical and syntactic matching between the question and the answer span than you get IRL
  • Barely any multi-fact/sentence inference beyond coreference
  • Nevertheless, it is a well-targeted, well-structured, clean dataset
  • It has been the most used and competed-on QA dataset
  • It has also been a useful starting point for building systems in industry (though in-domain data always really helps!)

SLIDE 30

Several variants of the SQuAD-style setup (all easily portable to BERT :)

SLIDE 31

Conversational question answering: multiple questions about the same document (answers are still spans from the document)

Datasets: QuAC, CoQA, CSQA, etc.

How do we use BERT to solve this task?
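One simple way to apply a BERT-style reader here is to prepend the last few dialogue turns to the current question before packing it with the passage. The input format below is a hypothetical sketch, not the exact scheme used by any particular QuAC/CoQA system:

```python
def pack_conversational_input(history, question, passage, max_turns=2):
    # history: list of (question, answer) pairs from earlier turns.
    # Keep only the most recent max_turns turns so the input stays short.
    turns = [f"{q} {a}" for q, a in history[-max_turns:]]
    packed_question = " ".join(turns + [question])
    # Same packed format as single-turn BERT QA, with history folded
    # into the question segment.
    return f"[CLS] {packed_question} [SEP] {passage} [SEP]"
```

The key design point is that the reader itself is unchanged; only the question segment grows to carry conversational context.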

SLIDE 32

Multi-hop question answering: requires models to perform more “reasoning” over the document

Datasets: HotpotQA, QAngaroo

SLIDE 33

Long-form question answering: answers must be generated, not extracted

Datasets: ELI5, NarrativeQA, etc.

More on these later!

SLIDE 34
  • Open-domain question answering: a model must retrieve relevant documents and use them to generate an answer (no evidence given!)

The future of QA?

No supporting documents are given to the model!!!
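A minimal sketch of the retrieval half of such a retrieve-then-read system, using plain TF-IDF term overlap (DrQA-style systems use similar sparse retrieval; the function name and scoring details here are our simplification):

```python
import math
from collections import Counter

def tfidf_retrieve(question, documents, k=1):
    # Rank documents by TF-IDF overlap with the question, then hand
    # the top-k documents to a reader model (reader not shown).
    tokenized = [doc.lower().split() for doc in documents]
    n = len(documents)
    df = Counter()                       # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    idf = {t: math.log(n / df[t]) for t in df}
    q_toks = question.lower().split()
    scores = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)               # term frequency in this document
        score = sum(tf[t] * idf.get(t, 0.0) for t in q_toks)
        scores.append((score, i))
    scores.sort(reverse=True)
    return [documents[i] for _, i in scores[:k]]
```

Terms appearing in every document get an IDF of zero, so common words contribute nothing to the ranking.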

SLIDE 35

All of these QA tasks are very similar… can we share information across different datasets to improve our performance across the board? (more next time!)

SLIDE 36

finally… a real-world example of deploying QA models
SLIDE 37

Quiz Bowl

SLIDE 38

what is quiz bowl?

  • a trivia game that contains questions about famous entities (e.g., novels, battles, countries)
  • we developed a deep learning system, QANTA, to play quiz bowl
  • one of the first applications of deep learning to question answering

Iyyer et al., EMNLP 2014 & ACL 2015

SLIDE 39

This author described a "plank in reason" breaking and hitting a "world at every plunge" in a poem which opens "I felt a funeral in my brain." She wrote that "the stillness round my form was like the stillness in the air" in "I heard a fly buzz when I died." She wrote about a scarcely visible roof and a cornice that was "but a mound" in a poem about a carriage ride with Immortality and Death. For 10 points, name this reclusive "Belle of Amherst" who wrote "Because I could not stop for Death."

A: Emily Dickinson

SLIDE 40

… name this reclusive "Belle of Amherst"… → NN classifier → Emily Dickinson

SLIDE 41

name this reclusive belle … …

dependency-tree NNs → softmax: predict Emily Dickinson out of a set of ~5000 answers

Iyyer et al., EMNLP 2014

SLIDE 42

simple discourse-level representations by averaging

In one novel, one of these figures antagonizes an impoverished family before leaping into an active volcano. Another of these figures titles a novella in which General Spielsdorf describes the circumstances of his niece Bertha Reinfeldt's death to the narrator, Laura. In addition to Varney and Carmilla, another of these figures sails on the Russian ship Demeter in order to reach London. That figure bites Lucy Westenra before being killed by a coalition including Jonathan Harker and Van Helsing. For 10 points, identify these bloodsucking beings most famously exemplified by Bram Stoker’s Dracula.

av = (1/n) Σ_{i=1}^{n} c_i
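The averaging representation, followed by a softmax over the answer set, can be sketched as below (the classifier parameters W and b, and the tiny dimensions, are hypothetical stand-ins):

```python
import numpy as np

def average_and_classify(C, W, b):
    # av = (1/n) * sum_i c_i : average the word vectors c_i in the question
    av = C.mean(axis=0)
    logits = W @ av + b      # one logit per candidate answer (~5000 in QANTA)
    e = np.exp(logits - logits.max())
    return e / e.sum()       # softmax over the answer set
```

Despite ignoring word order entirely, this kind of averaged representation is a surprisingly strong baseline for answer prediction.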

SLIDE 43

Of course, nowadays we would just put these questions into BERT and place a classifier over the [CLS] token to predict the answer!

SLIDE 44

2015: defeated Ken Jennings 300-160

SLIDE 45

2016: lost to top quiz bowlers 345-145

SLIDE 46

2017: beat top quiz bowlers 260-215

SLIDE 47

late 2017: crushed top team 475-185

SLIDE 48

deep learning ~ memorization

during training, QANTA becomes very good at associating named entities in questions with answers…

That figure bites Lucy Westenra before being killed by a coalition including Jonathan Harker and Van Helsing.

Vampire

SLIDE 49

deep learning ~ memorization

during training, QANTA becomes very good at associating named entities in questions with answers…

In one novel, one of these figures antagonizes an impoverished family before leaping into an active volcano.

???

These types of questions are still beyond the capabilities of our models