Real-Time Open-Domain QA with Dense-Sparse Phrase Index (PowerPoint PPT Presentation)



SLIDE 1

Real-Time Open-Domain QA with Dense-Sparse Phrase Index

Minjoon Seo*, Jinhyuk Lee*, Tom Kwiatkowski, Ankur Parikh, Ali Farhadi, Hannaneh Hajishirzi

* denotes equal contribution

SLIDE 2

Open-domain QA?

SLIDE 3

[Diagram] Question "When was Obama born?" → Some Model → Answer "1961", over 5 Million documents / 3 Billion tokens.

SLIDE 4

Retrieve & Read (Chen et al., 2017)

[Diagram] "When was Obama born?" → Information Retrieval (TF-IDF, BM-25, LSA) → Reader (Model) → "1961"

  • 1. Error propagation: reading only 5-10 docs
  • 2. Query-dependent encoding: 30s+ per query
SLIDE 5

We want…

  • To "read" entire Wikipedia
    • 5-10 docs → 5 Million docs
    • Reach long-tail answers
  • Fast inference on CPUs
    • 35s → 0.5s
  • Maintain high accuracy

HOW?

SLIDE 6

Our approach: index phrases!

SLIDE 7

[Diagram: Phrase Indexing (Seo et al., 2018)] Every phrase in "Barack Obama (1961-present) was the 44th President of the United States." is encoded offline (phrase encoding). Questions such as "When was Obama born?" or "Who is the 44th President of the U.S.?" are encoded at query time (question encoding) and answered by nearest neighbor search over the phrase index.

SLIDE 8

[Diagram: Document Indexing] "When was Obama born?" is encoded as a query vector ([0.5, 0.1, …]) and matched by nearest neighbor search against the indexed document vectors ([-3, 0.1, …], [0.3, -0.2, …], [0.7, -0.4, …], …).

  • Locality Sensitive Hashing (LSH)
  • aLSH (Shrivastava & Li, 2014)
  • HNSW (Malkov & Yashunin, 2018)
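The nearest neighbor lookup above can be sketched as exact maximum-inner-product search; libraries implementing LSH or HNSW approximate exactly this at scale. A minimal sketch with toy random vectors (the sizes and data are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy index: one dense vector per item (DenSPI indexes billions of phrases).
index = rng.standard_normal((1000, 64)).astype(np.float32)

def nearest(query, index):
    """Exact maximum-inner-product search; LSH/HNSW trade exactness for speed."""
    scores = index @ query          # one dot product per indexed vector
    return int(np.argmax(scores))

# A query close to item 42 should retrieve item 42.
query = index[42] + 0.01 * rng.standard_normal(64).astype(np.float32)
best = nearest(query, index)
```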
SLIDE 9

Query-Agnostic Decomposition

Standard model (phrase a, question q, document d):

  â = argmax_a F_θ(a, q, d)

Decomposed into a question encoder G_θ and a phrase encoder H_θ:

  â = argmax_a G_θ(q) · H_θ(a, d)
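The payoff of the decomposition is that the phrase side can be encoded once, offline; at query time only the question encoder and dot products run. A toy sketch (the two encoder functions here are hypothetical stand-ins, not the paper's BERT encoders):

```python
import numpy as np

rng = np.random.default_rng(1)

phrases = [f"phrase-{i}" for i in range(500)]

def phrase_encoder(phrases):        # H(a, d): runs once, offline
    return rng.standard_normal((len(phrases), 32))

def question_encoder(question):     # G(q): the only per-query model call
    return rng.standard_normal(32)

H = phrase_encoder(phrases)         # indexed ahead of any query

def answer(question):
    g = question_encoder(question)
    return phrases[int(np.argmax(H @ g))]   # argmax_a G(q) . H(a, d)
```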

SLIDE 10

Phrase (and Question) Representation

  • Dense representation
    • Can utilize deep neural networks
    • Great for capturing semantic and syntactic information
    • Not great for disambiguating "Einstein" vs. "Tesla"
  • Sparse representation (bag-of-words)
    • Great for capturing lexical information
  • Represent each phrase with a concatenation of both
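With the concatenated representation, the inner product decomposes into a dense part plus a sparse part, which is what lets the sparse side disambiguate entities the dense side conflates. A minimal sketch (the vectors and weights are invented for illustration):

```python
import numpy as np

# Score of a phrase for a question: inner product of concatenated vectors
# = dense inner product + sparse (bag-of-words) inner product.
def score(q_dense, q_sparse, p_dense, p_sparse):
    dense = float(np.dot(q_dense, p_dense))
    sparse = sum(w * p_sparse.get(t, 0.0) for t, w in q_sparse.items())
    return dense + sparse

q_dense, q_sparse = np.array([0.5, 0.1]), {"einstein": 1.0}
# Two phrases with identical dense vectors but different lexical content.
einstein = (np.array([0.5, 0.1]), {"einstein": 2.0})
tesla = (np.array([0.5, 0.1]), {"tesla": 2.0})
```

The lexical overlap breaks the tie that the dense vectors alone cannot.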
SLIDE 11

Dense-Sparse Phrase Index (DenSPI)

[Diagram] Retrieve & Read (Chen et al., 2017): "When was Barack Obama born?" → query vector for document → Document Index (N = 5 Million) → Reader Model → "1961".
DenSPI (Ours): "When was Barack Obama born?" → dense + sparse query vector → Phrase Index (N = 60 Billion) → "1961" directly, with no reader.

SLIDE 12

Dense Representation for Phrases

[Diagram] The document ("According to the American Library Association …") is run through a text encoder (BERT). The dense vector of a phrase is the concatenation of the start vector of its first token, the end vector of its last token, and a coherency scalar (the dot product of the two tokens' coherency vectors).
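The split above can be sketched on toy encoder outputs. The dimensions here (hidden size 8, split as 0:3 start / 3:6 end / 6:8 coherency) are illustrative only, not the model's actual sizes:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed toy encoder output: one hidden vector per token (BERT in the paper).
hidden = rng.standard_normal((10, 8))   # 10 tokens, hidden size 8

def phrase_vector(hidden, i, j):
    """Dense vector for the span from token i to token j."""
    start = hidden[i, 0:3]                              # start vector of first token
    end = hidden[j, 3:6]                                # end vector of last token
    coherency = np.dot(hidden[i, 6:8], hidden[j, 6:8])  # coherency scalar
    return np.concatenate([start, end, [coherency]])

vec = phrase_vector(hidden, 2, 5)   # vector for the phrase spanning tokens 2-5
```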

SLIDE 13

Dense Representation for Questions

[Diagram] The question ("When was Barack Obama born?") is run through the text encoder (BERT) with a [CLS] token. The question vector is built the same way from the [CLS] token: its start vector, its end vector, and a coherency scalar fixed to 1.

SLIDE 14

Sparse Representation

  • TF-IDF document & paragraph vectors, computed over Wikipedia
  • Unigram & Bigram (vocab size = 17 Million)
  • Adopted DrQA's vocab/TF-IDF (Chen et al., 2017)
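A stdlib-only sketch of such a sparse vector: hashed unigram + bigram counts. This is only the term-frequency half; the real system weights each n-gram by TF-IDF over a fixed 17M-term vocabulary, and the hashing scheme here is an assumption for illustration:

```python
import hashlib
from collections import Counter

def sparse_vector(text, num_bins=2**24):
    """Bag of hashed unigrams + bigrams as {bin: count} (TF only, no IDF)."""
    tokens = text.lower().split()
    ngrams = tokens + [" ".join(p) for p in zip(tokens, tokens[1:])]
    vec = Counter()
    for g in ngrams:
        vec[int(hashlib.md5(g.encode()).hexdigest(), 16) % num_bins] += 1
    return vec

vec = sparse_vector("Barack Obama was born in 1961")
```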
SLIDE 15

Beware of the scale…

  • 60 Billion phrases in Wikipedia!
  • Training
    • Softmax over 60 Billion phrases?
  • Storage
    • 60 Billion phrases x 4 KB per phrase = 240 TB?
  • Search
    • Exact search over 60 Billion phrases?

We want to be open-research-friendly

SLIDE 16

Training

  • Closed-domain QA datasets: the model can easily overfit
    • e.g. a "who" question when there is only one named entity in the context
  • Negative sampling and concatenation
    • Sampling strategy is crucial
    • Use the query encoder to associate similar questions in the training set
    • Concatenate the context that the similar question belongs to
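The sampling idea can be sketched as: encode every training question, and for each one pick the most similar *other* question's context as a hard negative to concatenate. The bag-of-words encoder below is a hypothetical stand-in for the trained query encoder:

```python
import numpy as np

# Toy training set: each question paired with the paragraph containing its answer.
questions = ["who wrote hamlet", "who wrote macbeth", "when was obama born"]
contexts = ["ctx-hamlet", "ctx-macbeth", "ctx-obama"]

vocab = sorted({w for q in questions for w in q.split()})

def encode(q):
    """Stand-in for the query encoder: a bag-of-words indicator vector."""
    v = np.zeros(len(vocab))
    for w in q.split():
        if w in vocab:
            v[vocab.index(w)] = 1.0
    return v

E = np.stack([encode(q) for q in questions])

def negative_context(i):
    """Context of the most similar other question, concatenated as a hard negative."""
    sims = E @ E[i]
    sims[i] = -np.inf          # exclude the question itself
    return contexts[int(np.argmax(sims))]
```

For "who wrote hamlet", the nearest other question is "who wrote macbeth", so its context becomes the hard negative.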
SLIDE 17

Storage

  • 60 Billion phrases x 4 KB per phrase = 240 TB!
  • 1. Pointer: share start and end vectors
    • 240 TB → 12 TB
  • 2. Filter: 1-layer classifier on phrase vectors
    • 12 TB → 4.5 TB
  • 3. Scalar Quantization: 4 bytes → 1 byte per dim
    • 4.5 TB → 1.5 TB
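Step 3 can be sketched directly: map each float32 dimension to one uint8 code over the observed min-max range, a 4x storage reduction at a small reconstruction cost (a minimal min-max scheme; the exact quantizer used in the system may differ):

```python
import numpy as np

def quantize(x):
    """Scalar quantization: one uint8 code per float32 dim (4 bytes -> 1 byte)."""
    lo, hi = float(x.min()), float(x.max())
    codes = np.round((x - lo) / (hi - lo) * 255).astype(np.uint8)
    return codes, lo, hi

def dequantize(codes, lo, hi):
    return codes.astype(np.float32) / 255 * (hi - lo) + lo

x = np.linspace(-1.0, 1.0, 960, dtype=np.float32)
codes, lo, hi = quantize(x)
max_err = float(np.abs(dequantize(codes, lo, hi) - x).max())
```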
SLIDE 18

Search

  • An open-source library for large-scale dense+sparse nearest neighbor search is non-existent
  • Dense-first search (DFS)
  • Sparse-first search (SFS)
  • Hybrid
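The two strategies can be sketched as shortlist-then-rerank in opposite orders: DFS shortlists by the dense score and reranks with the sparse score, SFS the reverse. The data below is random toy data; a per-question sparse score array stands in for the real TF-IDF machinery:

```python
import numpy as np

rng = np.random.default_rng(4)
N, D, K = 200, 16, 10
dense = rng.standard_normal((N, D))    # dense phrase vectors
sparse_scores = rng.random(N)          # stand-in for the question's sparse scores

def dense_first(q):
    """DFS: shortlist top-K by dense score, then rerank with sparse added in."""
    top = np.argsort(dense @ q)[-K:]
    return int(top[np.argmax(dense[top] @ q + sparse_scores[top])])

def sparse_first(q):
    """SFS: shortlist top-K by sparse score, then rerank with dense added in."""
    top = np.argsort(sparse_scores)[-K:]
    return int(top[np.argmax(dense[top] @ q + sparse_scores[top])])

q = rng.standard_normal(D)
```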
SLIDE 19

Experiments

SLIDE 20

Open-Domain SQuAD

  • DrQA (Chen et al., 2017): 30% EM, 35 s/Q
  • Multi-step reasoner (Das et al., 2019): 32% EM
  • MINIMAL (Min et al., 2018): 35% EM, 115 s/Q
  • BERTserini (Yang et al., 2019): 39% EM
  • Weaver (Raison et al., 2018): 42% EM
  • DenSPI (Ours, query-agnostic): 36% EM, 0.8 s/Q (44x faster than DrQA, 144x faster than MINIMAL)

SLIDE 21

Qualitative Comparisons

Q: What can hurt a teacher's mental and physical health?

  • Retrieve & Read (Chen et al., 2017): "… and poor mental health can lead to problems such as substance abuse." (Mental health)
  • DenSPI (Ours): "Teachers face several occupational hazards in their line of work, including occupational stress …" (Teacher)

SLIDE 22

Q: Who was Kennedy's science adviser that opposed manned spacecraft flights?

  • "Kennedy's science advisor Jerome Wiesner, … his opposition to manned spaceflight …" (Apollo program)
  • "… and the sun by NASA manager Abe Silverstein, who later said that …" (Apollo program)
  • "Although Grumman wanted a second unmanned test, George Low decided … be manned." (Apollo program)
  • "Kennedy's science advisor Jerome Wiesner, … his opposition to manned spaceflight …" (Apollo program)
  • "Jerome Wiesner of MIT, who served as a … advisor to … Kennedy, … opponent of manned" (Space Race)
  • "… science advisor Jerome Wiesner … strongly opposed to manned space exploration, …" (John F. Kennedy)

SLIDE 23

Q: What is the best thing to do when bored?

  • "I'm nearly bored to death" (Bored to Death (song))
  • "The twin tunnels were bored by … tunnel boring machine (TBM) …" (Waterview Connection)
  • "It's easier to say you're bored, or to be angry, than it is to be sad." (Bored to Death (song))
  • "When bored, she enjoys drawing." (Big Brother 2)
  • "he can think of a much more fun thing he can do while on his back: painting." (Angry Kid)
  • "She is a live music goer, and her hobby is watching movies." (Pearls Before Swine)

SLIDE 24

Demo

  • http://nlp.cs.washington.edu/denspi
SLIDE 25

http://nlp.cs.washington.edu/denspi

SLIDE 26

Conclusion

  • "Read" entire Wikipedia in 0.5s with CPUs
  • Query-agnostic, indexable phrase representations
  • Utilize both dense (BERT-based) and sparse (bag-of-words) representations for encoding lexical, syntactic, and semantic information
  • 6,000x lower computational cost with higher accuracy for exact search
  • At least 44x faster open-domain QA with higher accuracy
  • The (query-agnostic) decomposability gap still exists (6-10%); we hope future research can close the gap