SLIDE 1

Multi-Hop RC, HotpotQA & GNNs

Select, Answer and Explain: Interpretable Multi-hop Reading Comprehension over Multiple Documents – Tu et al., AAAI 2020 Presented By: Lovish Madaan

SLIDE 2

References

  • HotpotQA - Peng Qi (Stanford)
  • GNNs - Jure Leskovec (Stanford); AAAI 2019 Tutorial by William Hamilton (McGill)
  • Some elements and images borrowed from Tu et al. (AAAI 2020), Yang et al. (EMNLP 2018), and Jay Alammar
SLIDE 3

Topics

  • Introduction and HotpotQA
  • Select, Answer and Explain
  • GNNs
  • Answer and Explain
  • Results and Ablation Study
  • Reviews
SLIDE 4

The Promise of Question Answering

Q: In which city was Facebook first launched?
A: Cambridge, Massachusetts. This is because Mark Zuckerberg and his business partners launched it from his Harvard dormitory [1], and Harvard is located in Cambridge, Massachusetts [2].

[1] https://en.wikipedia.org/wiki/Mark_Zuckerberg [2] https://en.wikipedia.org/wiki/Harvard_University

SLIDE 5

The Promise of Question Answering


Reality: Sorry, folks from Google!

SLIDE 6

The Promise of Question Answering


Multi-hop reasoning

SLIDE 7

The Promise of Question Answering


Multi-hop reasoning; text-based, diverse

SLIDE 8

The Promise of Question Answering


Multi-hop reasoning; explainability; text-based, diverse

SLIDE 9

HotpotQA: multi-hop reasoning, explainability, text-based & diverse, comparison questions

SLIDE 10

Multi-hop Reasoning across Multiple Documents

  • Previous work (SQuAD, TriviaQA, etc.): "When was Chris Martin born?"
  • HotpotQA: "When was the lead singer of Coldplay born?"

(Rajpurkar et al., 2016; Joshi et al., 2017; Dunn et al., 2017)

SLIDE 11

Explainability

  • Previous work: predict the answer only
  • HotpotQA: predict the answer plus the supporting facts (Sup fact 1, Sup fact 2)

SLIDE 12

Evaluation Settings

  • Distractor Setting: 2 gold paragraphs + 8 distractor paragraphs retrieved by an IR system
  • Fullwiki Setting: the entire Wikipedia as context
SLIDE 13
Types of Instances

  • Bridge Entity Questions
  • Comparison Questions
SLIDE 14

Topics

  • Introduction and HotpotQA
  • Select, Answer and Explain
  • GNNs
  • Answer and Explain
  • Results and Ablation Study
  • Reviews
SLIDE 15

Multi-hop RC – Previous Works

  • Adapt techniques from single-hop QA
  • Use Graph Neural Networks (GNNs)
  • Cao et al., 2018 – build an entity graph and realize multi-hop reasoning over it

SLIDE 16

Shortcomings – Previous Works

  • Concatenate multiple documents, or process documents separately
  • No document filters
  • Current application of GNNs:
  • Entities as nodes – either pre-specified or extracted with NER
  • Further processing needed if the answer is not an entity
SLIDE 17

Select, Answer and Explain (SAE)

SLIDE 18

Preprocessing & Inputs

  • Question and set of documents
  • Answer text
  • Set of labelled supporting sentences from each document
  • Label for each document: $E_j \in \{0, 1\}$
  • Answer type: "Span" / "Yes" / "No"
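
For concreteness, a minimal sketch of how one might bundle these inputs in Python; the class and field names are illustrative, not from the paper:

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class SAEInstance:
        # One training instance; field names are illustrative
        question: str
        documents: List[str]             # candidate documents
        answer_text: str
        support_labels: List[List[int]]  # 0/1 label per sentence, per document
        doc_labels: List[int]            # E_j: 0/1 relevance label per document
        answer_type: str                 # "Span" / "Yes" / "No"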
SLIDE 19

Select Module

  • Input: [CLS] + Q + [SEP] + D + [SEP]
  • One approach: binary cross-entropy (BCE) with the [CLS] embeddings as features
  • Drawback: neglects inter-document interactions
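
A minimal sketch of this baseline, assuming a HuggingFace BERT encoder (the model name and classifier head are illustrative):

    import torch
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased")
    clf = nn.Linear(bert.config.hidden_size, 1)   # per-document relevance logit
    bce = nn.BCEWithLogitsLoss()

    def select_loss(question, documents, labels):
        # Encode [CLS] + Q + [SEP] + D + [SEP] for every question/document pair
        enc = tokenizer([question] * len(documents), documents,
                        padding=True, truncation=True, return_tensors="pt")
        cls = bert(**enc).last_hidden_state[:, 0]  # [CLS] embedding per pair
        logits = clf(cls).squeeze(-1)
        return bce(logits, torch.tensor(labels, dtype=torch.float))

Each pair is scored independently here, which is exactly the inter-document limitation the MHSA step below addresses.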

SLIDE 20

MHSA – Single Attention Head

X – matrix of [CLS] embeddings of question/document pairs

SLIDE 21

MHSA – Multiple Attention Heads

Output is the matrix of modified [CLS] embeddings having contextual information
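
As a concrete sketch, the same computation with PyTorch's built-in multi-head attention (head count and dimensions are illustrative):

    import torch
    import torch.nn as nn

    d = 768                                 # hidden size of the [CLS] embeddings
    mhsa = nn.MultiheadAttention(embed_dim=d, num_heads=8, batch_first=True)

    X = torch.randn(1, 10, d)               # [CLS] embeddings of 10 question/document pairs
    X_ctx, attn = mhsa(X, X, X)              # self-attention mixes information across documents;
                                             # X_ctx: modified [CLS] embeddings with contextual info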

SLIDE 22

Pairwise Bi-Linear Layer

  • $T(E_j)$ – relevance score label for each document (0 / 1 / 2)

$$m_{j,k} = \begin{cases} 1 & \text{if } T(E_j) > T(E_k) \\ 0 & \text{if } T(E_j) \le T(E_k) \end{cases}$$

  • Pairwise binary cross-entropy loss over all document pairs:

$$\mathcal{M} = -\sum_{j}\sum_{k \neq j} \big[ m_{j,k} \log Q(E_j, E_k) + (1 - m_{j,k}) \log\big(1 - Q(E_j, E_k)\big) \big]$$

  • Relevance score for each document: $S_j = \sum_{k} \mathbb{I}\big(Q(E_j, E_k) > 0.5\big)$
  • Take the top-k documents according to this relevance score (see the sketch below)
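
A sketch of the pairwise comparison, assuming a bilinear scoring layer for $Q(E_j, E_k)$ over document embeddings (dimensions illustrative):

    import torch
    import torch.nn as nn

    class PairwiseBilinear(nn.Module):
        # Q(E_j, E_k): probability that document j is more relevant than document k
        def __init__(self, d):
            super().__init__()
            self.bilinear = nn.Bilinear(d, d, 1)

        def forward(self, E):                       # E: (N, d) document embeddings
            N, d = E.shape
            Ej = E.unsqueeze(1).expand(N, N, d).reshape(N * N, d)
            Ek = E.unsqueeze(0).expand(N, N, d).reshape(N * N, d)
            return torch.sigmoid(self.bilinear(Ej, Ek)).view(N, N)

    def ranking_loss_and_scores(Q, T):
        # T: (N,) relevance labels in {0, 1, 2}
        m = (T.unsqueeze(1) > T.unsqueeze(0)).float()    # m_{j,k} = 1 iff T(E_j) > T(E_k)
        off_diag = ~torch.eye(len(T), dtype=torch.bool)  # exclude k == j
        M = -(m * torch.log(Q) + (1 - m) * torch.log(1 - Q))[off_diag].sum()
        S = (Q > 0.5).sum(dim=1)                         # S_j, used to pick the top-k documents
        return M, S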
SLIDE 23

Answer Prediction

  • Gold documents extracted by the Select module
  • Input: [CLS] + Q + [SEP] + Context + [SEP]
  • BERT output: $H \in \mathbb{R}^{M \times d}$ ($M$ tokens, hidden size $d$)
  • 2-layer MLP ($g_{span}$) maps $H$ to span logits $Z \in \mathbb{R}^{M \times 2}$ (start / end)
SLIDE 24

Contextual Sentence Embeddings

  • Sentence representation: attention-weighted sum of the token embeddings in each sentence
  • Self-attention weights: produced by a 2-layer MLP ($g_{att}$)
  • The weighted representation is used to predict the 0/1 supporting-sentence label
SLIDE 25

Contextual Sentence Embeddings - 2

  • Motivation for adding start and end span probabilities:
  • Answer span → supporting sentence: the sentence containing the predicted answer span is very likely a supporting fact
  • Final sentence embeddings:
SLIDE 26

Sentence Graph

  • Construct a graph with the following properties:
  • Nodes represent the sentences
  • Each node has a 0/1 label (supporting sentence or not)
  • 3 types of edges (see the sketch after this list):
  • Type 1: between nodes in the same document
  • Type 2: between nodes of different documents if they contain named entities / noun phrases (possibly different ones) that appear in the question
  • Type 3: between nodes of different documents if they contain the same named entity / noun phrase
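
A sketch of the edge construction, assuming named entities / noun phrases have already been extracted per sentence (the function is ours, simplified):

    from itertools import combinations

    def build_sentence_graph(doc_of, ents, q_ents):
        # doc_of[i]: document id of sentence i
        # ents[i]:   set of entity / noun-phrase strings in sentence i
        # q_ents:    set of entities / noun phrases appearing in the question
        edges = []
        for i, j in combinations(range(len(doc_of)), 2):
            if doc_of[i] == doc_of[j]:
                edges.append((i, j, 1))        # Type 1: same document
            else:
                if ents[i] & q_ents and ents[j] & q_ents:
                    edges.append((i, j, 2))    # Type 2: both mention (possibly different) question entities
                if ents[i] & ents[j]:
                    edges.append((i, j, 3))    # Type 3: shared entity / noun phrase
        return edges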

SLIDE 27

Sentence Graph

SLIDE 28

Topics

  • Introduction and HotpotQA
  • Select, Answer and Explain
  • GNNs
  • Answer and Explain
  • Results and Ablation Study
  • Reviews
SLIDE 29

(Jure Leskovec, Stanford University)

The modern deep learning toolbox is designed for simple sequences and grids (images, text/speech).

SLIDES 30–36 (figure-only slides from the GNN tutorial)
SLIDE 37

The Math

(Tutorial on Graph Representation Learning, AAAI 2019, slide 19)

§ Average neighbor messages and apply a neural network:

$$h_v^0 = x_v$$

$$h_v^k = \sigma\left( W_k \sum_{u \in N(v)} \frac{h_u^{k-1}}{|N(v)|} + B_k h_v^{k-1} \right), \quad \forall k > 0$$

Here the initial "layer 0" embeddings are equal to the node features $x_v$; $h_v^k$ is the $k$-th layer embedding of $v$; $\sigma$ is a non-linearity (e.g., ReLU or tanh); the sum averages the neighbors' previous-layer embeddings; and $B_k h_v^{k-1}$ carries over the previous-layer embedding of $v$ itself.
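
A direct PyTorch sketch of this update rule over a dense adjacency matrix (dimensions and the choice of ReLU are illustrative):

    import torch
    import torch.nn as nn

    class MeanAggLayer(nn.Module):
        # One layer of the update above: average neighbor messages, apply W_k and B_k
        def __init__(self, d_in, d_out):
            super().__init__()
            self.W = nn.Linear(d_in, d_out, bias=False)  # neighbor transform W_k
            self.B = nn.Linear(d_in, d_out, bias=False)  # self transform B_k

        def forward(self, H, A):
            # H: (N, d_in) node embeddings h^{k-1}; A: (N, N) adjacency matrix
            deg = A.sum(dim=1, keepdim=True).clamp(min=1)  # |N(v)|
            neigh = (A @ H) / deg                          # mean of neighbors' embeddings
            return torch.relu(self.W(neigh) + self.B(H))   # sigma = ReLU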

SLIDES 38–39 (figure-only slides)
SLIDE 40

Graph Attention Networks

(Tutorial on Graph Representation Learning, AAAI 2019, slide 60)

§ Augment the basic graph neural network model with attention:

$$h_v^k = \sigma\left( \sum_{u \in N(v) \cup \{v\}} \alpha_{v,u} W_k h_u^{k-1} \right)$$

The sum runs over all neighbors of $v$ (and $v$ itself), weighted by learned attention coefficients $\alpha_{v,u}$, followed by the non-linearity $\sigma$.
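
A sketch of one such attention layer; the tutorial leaves the scoring function unspecified, so this borrows the additive score from Velickovic et al. (2018) as an assumption:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SimpleGATLayer(nn.Module):
        def __init__(self, d_in, d_out):
            super().__init__()
            self.W = nn.Linear(d_in, d_out, bias=False)
            self.a = nn.Linear(2 * d_out, 1, bias=False)  # attention scoring (an assumption)

        def forward(self, H, A):
            # A: (N, N) adjacency with self-loops, i.e. N(v) ∪ {v}
            Wh = self.W(H)                                 # (N, d_out)
            N = Wh.size(0)
            pairs = torch.cat([Wh.unsqueeze(1).expand(N, N, -1),
                               Wh.unsqueeze(0).expand(N, N, -1)], dim=-1)
            e = F.leaky_relu(self.a(pairs)).squeeze(-1)    # raw scores e_{v,u}
            e = e.masked_fill(A == 0, float("-inf"))       # restrict to neighbors
            alpha = torch.softmax(e, dim=1)                # alpha_{v,u} over N(v) ∪ {v}
            return torch.relu(alpha @ Wh)                  # h_v^k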

SLIDE 41

Training the Model

(Tutorial on Graph Representation Learning, AAAI 2019, slide 20; figure only)

SLIDE 42

Training the Model

(Tutorial on Graph Representation Learning, AAAI 2019, slide 21)

§ After K layers of neighborhood aggregation, we get output embeddings for each node.
§ Feed these embeddings into any loss function and run stochastic gradient descent to train the aggregation parameters $W_k$ and $B_k$ (the trainable matrices, i.e., what we learn).

$$h_v^0 = x_v$$

$$h_v^k = \sigma\left( W_k \sum_{u \in N(v)} \frac{h_u^{k-1}}{|N(v)|} + B_k h_v^{k-1} \right), \quad \forall k \in \{1, \dots, K\}$$

$$z_v = h_v^K$$

SLIDE 43

Training the Model

(Tutorial on Graph Representation Learning, AAAI 2019, slide 24)

§ Directly train the model for a supervised task (e.g., binary node classification):

$$\mathcal{L} = -\sum_{v \in V} \Big[ y_v \log\big(\sigma(z_v^\top \theta)\big) + (1 - y_v) \log\big(1 - \sigma(z_v^\top \theta)\big) \Big]$$

where $z_v$ is the output node embedding, $\theta$ the classification weights, and $y_v$ the node class label.
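
Putting the pieces together, a sketch of supervised training with this loss, reusing the MeanAggLayer sketch from above (K = 2 layers; sizes illustrative):

    import torch
    import torch.nn as nn

    layers = nn.ModuleList([MeanAggLayer(16, 32), MeanAggLayer(32, 32)])
    theta = nn.Linear(32, 1)                 # classification weights θ
    params = list(layers.parameters()) + list(theta.parameters())
    opt = torch.optim.SGD(params, lr=0.01)
    bce = nn.BCEWithLogitsLoss()             # the binary cross-entropy L above

    def train_step(X, A, y):
        # X: (N, 16) node features, A: (N, N) adjacency, y: (N,) 0/1 node labels
        Z = X
        for layer in layers:                 # K layers of neighborhood aggregation
            Z = layer(Z, A)                  # z_v = h_v^K after the last layer
        loss = bce(theta(Z).squeeze(-1), y)
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()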

SLIDE 44

Overview of Model

(Tutorial on Graph Representation Learning, AAAI 2019, slide 25; figure only)

SLIDE 45

Overview of Model

(Tutorial on Graph Representation Learning, AAAI 2019, slide 26; figure only)

SLIDE 46

Overview of Model

(Tutorial on Graph Representation Learning, AAAI 2019, slide 27; figure only)

SLIDE 47

Topics

  • Introduction and HotpotQA
  • Select, Answer and Explain
  • GNNs
  • Answer and Explain
  • Results and Ablation Study
  • Reviews
SLIDE 48

Aggregation mechanism in SAE

SLIDE 49

Graph Representation

  • Weighted sum of the embeddings of the nodes of the graph
  • The weights are given by
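
The exact weighting is given on the slide; as a generic stand-in, an attention-pooling sketch (the scoring layer is our assumption, not the paper's exact form):

    import torch
    import torch.nn as nn

    class AttentionPool(nn.Module):
        # Graph representation = weighted sum of node embeddings
        def __init__(self, d):
            super().__init__()
            self.score = nn.Linear(d, 1)   # node scoring (an assumption)

        def forward(self, H):              # H: (N, d) node embeddings
            w = torch.softmax(self.score(H).squeeze(-1), dim=0)  # node weights sum to 1
            return w @ H                   # (d,) graph embedding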
SLIDE 50

Answer and Explain Pipeline

SLIDE 51

Topics

  • Introduction and HotpotQA
  • Select, Answer and Explain
  • GNNs
  • Answer and Explain
  • Results and Ablation Study
  • Reviews
SLIDE 52

Dataset Details

  • Train – 90K
  • Validation/Dev – 7.4K
  • Test – 7.4K
SLIDE 53

Results

SLIDE 54

Ablation Study – Document Selection Module

SLIDE 55

Ablation Study – Answer & Explain Module

SLIDE 56

Ablation Study – Bridge / Comp. Questions

SLIDE 57

Attention Heatmap Example

Question - “Were Scott Derrickson and Ed Wood of the same nationality?”

SLIDE 58

HotpotQA Leaderboard

SLIDE 59

Topics

  • Introduction and HotpotQA
  • Select, Answer and Explain
  • GNNs
  • Answer and Explain
  • Results and Ablation Study
  • Reviews
SLIDE 60

Reviews (Pros)

  • Detailed Ablation Study [Atishya, Pratyush, Rajas, Saransh]
  • Usage of contextualized sentence embeddings [Atishya, Jigyasa]
  • MHSA in Document Selection [Pratyush, Shubham, Rajas, Siddhant]
  • “Learning to Rank” framework is general [Keshav]
  • Top 3 position on the leaderboard [Pratyush, Keshav, Rajas, …]
  • Simple Idea [Soumya]
  • Single Model gives good performance [Keshav]
  • Careful modelling of the loss function [Vipul]
  • “Explainability” of the model [Various people]
SLIDE 61

Reviews (Cons)

  • Motivation for Type 2 edges not present [Pratyush, Rajas]
  • No clear flow [Atishya]
  • Entire context fed to BERT [Pratyush, Jigyasa]
  • Pairwise ranking costly [Siddhant, Saransh, Jigyasa]
  • Do not evaluate on Fullwiki setting, simple method for edges [Keshav]
  • Post-facto explanation [Rajas, Soumya]
  • Layers for GCN not mentioned [Vipul]
  • GNN not explained clearly, performance gain is low [Pratyush]
SLIDE 62

Reviews (Extensions)

  • Extract relevant spans instead of documents [Pratyush]
  • Modify the above extension as span-prediction [Keshav]
  • Replace pairwise ranking [Shubham, Saransh]
  • End-to-end training (RL/integrate REALM) [Siddhant, Jigyasa]
  • OpenIE for graph generation [Keshav]
  • Enforce constraints in pairwise prediction models [Atishya]
  • Handle exposure bias by gradually replacing gold documents with retrieved documents [Rajas]

  • Link sentences using clustering methods [Soumya]
SLIDE 63

Thank You!

Questions?