Retrieval-augmented language models (CS 685, Fall 2020)


SLIDE 1

Retrieval-augmented language models

CS 685, Fall 2020

Advanced Natural Language Processing

Mohit Iyyer

College of Information and Computer Sciences

University of Massachusetts Amherst

SLIDE 2

BERT (teacher): 24-layer Transformer

Bob went to the <MASK> to get a buzz cut

barbershop: 54%
barber: 20%
salon: 6%
stylist: 4%
…

SLIDE 3

BERT (teacher): 24-layer Transformer

Bob went to the <MASK> to get a buzz cut

barbershop: 54%
barber: 20%
salon: 6%
stylist: 4%
…

World knowledge is implicitly encoded in BERT’s parameters! (e.g., that barbershops are places to get buzz cuts)
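This fill-in-the-blank behavior is easy to reproduce. Below is a minimal sketch using the HuggingFace transformers library (an implementation detail assumed here, not something the slides specify); the 24-layer model on the slide corresponds to bert-large, and exact probabilities depend on the checkpoint.

```python
# Probe BERT's implicit knowledge via its masked-LM head.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-large-uncased")

text = "Bob went to the [MASK] to get a buzz cut"
inputs = tokenizer(text, return_tensors="pt")
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

with torch.no_grad():
    logits = model(**inputs).logits          # (1, seq_len, vocab_size)

probs = logits[0, mask_pos].softmax(dim=-1)  # distribution over the vocabulary
top = probs.topk(5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode([int(idx)]):>12}: {p.item():.1%}")
```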

SLIDE 4

Guu et al., 2020 (“REALM”)

SLIDE 5

One option: condition predictions on explicit knowledge graphs

Wang et al., 2019

SLIDE 6

Pros / cons

  • Explicit graph structure makes KGs easy to navigate
  • Knowledge graphs are expensive to produce at scale
  • Automatic knowledge graph induction is an open research problem
  • Knowledge graphs struggle to encode complex relations between entities

SLIDE 7

Another source of knowledge: unstructured text!

  • Readily available at scale, requires no processing
  • We have powerful methods of encoding semantics (e.g., BERT)
  • However, these methods don’t really work with larger units of text (e.g., books)
  • Extracting relevant information from unstructured text is more difficult than it is with KGs

SLIDE 8

SLIDE 9

SLIDE 10

SLIDE 11

SLIDE 12

SLIDE 13

How can we train this retriever???

SLIDE 14

SLIDE 15

Neural knowledge retriever
Knowledge-augmented encoder
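For reference, REALM (Guu et al., 2020) factors its prediction into exactly these two components, marginalizing over documents z from the knowledge corpus Z:

```latex
p(y \mid x) \;=\; \sum_{z \in \mathcal{Z}}
  \underbrace{p(y \mid z, x)}_{\text{knowledge-augmented encoder}}\;
  \underbrace{p(z \mid x)}_{\text{neural knowledge retriever}}
```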

SLIDE 16

SLIDE 17

Embed function is just BERT!
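Concretely, the retriever scores each document by an inner product of BERT embeddings and normalizes with a softmax. A small sketch (variable names are illustrative, not REALM's actual code):

```python
import torch

def retrieval_distribution(x_emb: torch.Tensor, doc_embs: torch.Tensor) -> torch.Tensor:
    """p(z | x) over all documents in the knowledge corpus.

    x_emb:    (d,)          Embed_input(x), e.g., BERT's [CLS] vector for the input
    doc_embs: (num_docs, d) Embed_doc(z) for every document z
    """
    scores = doc_embs @ x_emb            # f(x, z) = inner-product relevance score
    return torch.softmax(scores, dim=0)  # normalize over the whole corpus
```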

SLIDE 18

SLIDE 19

Isn’t training the retriever extremely expensive?

Imagine if your knowledge corpus were every article in Wikipedia… computing the softmax over every document would be super expensive without an approximation

SLIDE 20

Maximum inner product search (MIPS)

  • Algorithms that approximately find the top-k documents
  • Scales sub-linearly with the number of documents (both time and storage)
  • Shrivastava and Li, 2014 (“Asymmetric LSH…”)
  • Requires precomputing the BERT embedding of every document in the knowledge corpus and then building an index over the embeddings (see the sketch after this list)
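As an illustration of the precompute-then-index workflow, here is a sketch using the FAISS library (my choice of tool; the slide doesn't name a specific index). IndexFlatIP does exact inner-product search; sub-linear time requires one of FAISS's approximate indexes, e.g., an IVF index.

```python
# Sketch of maximum inner product search over precomputed doc embeddings.
import numpy as np
import faiss

d = 768                                                  # BERT-base embedding size
doc_embs = np.random.randn(50_000, d).astype("float32")  # stand-in for Embed_doc(z)

index = faiss.IndexFlatIP(d)  # exact inner-product search; swap in an
index.add(doc_embs)           # approximate index (e.g., IVF) for sub-linear time

query = np.random.randn(1, d).astype("float32")          # stand-in for Embed_input(x)
scores, doc_ids = index.search(query, 8)                 # top-8 docs by inner product
```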

SLIDE 21

Need to refresh the index!

  • We are training the parameters of the retriever, i.e., the BERT architecture that produces Embed_doc(z)
  • If we precompute all of the embeddings, the search index becomes stale when we update the parameters of the retriever
  • REALM solution: asynchronously refresh the index by re-embedding all docs after a few hundred training iterations (sketched below)
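A sketch of that schedule, with loudly hypothetical helpers (embed_all_docs, build_mips_index, and train_step stand in for REALM's trainer and index-builder jobs, which actually run asynchronously on separate workers):

```python
REFRESH_EVERY = 500  # "a few hundred training iterations"

# hypothetical helpers: embed every doc with the current doc encoder,
# then build a fresh MIPS index over those embeddings
index = build_mips_index(embed_all_docs(doc_encoder, corpus))

for step, batch in enumerate(train_loader):
    # retrieval uses the (slightly stale) index; gradients still update
    # both the retriever and the knowledge-augmented encoder
    loss = train_step(batch, index)
    if (step + 1) % REFRESH_EVERY == 0:
        # REALM does this asynchronously; shown inline for clarity
        index = build_mips_index(embed_all_docs(doc_encoder, corpus))
```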

SLIDE 22

SLIDE 23

Other tricks in REALM

  • Salient span masking: mask out spans of text corresponding to named entities and dates
  • Null document: always include an empty document in the top-k retrieved docs, allowing the model to rely on its implicit knowledge as well

SLIDE 24

Evaluation on open-domain QA

  • Unlike SQuAD-style QA, in open-domain QA we are only given a question, not a supporting document that is guaranteed to contain the answer
  • Open-domain QA generally has a large retrieval component, since the answer to any given question could occur anywhere in a large collection of documents

SLIDE 25

SLIDE 26

SLIDE 27

Can retrieval-augmented LMs improve other tasks?

SLIDE 28

Nearest-neighbor machine translation

Khandelwal et al., 2020

SLIDE 29

Nearest-neighbor machine translation

Khandelwal et al., 2020

SLIDE 30

Nearest-neighbor machine translation

Khandelwal et al., 2020

SLIDE 31

Nearest-neighbor machine translation

Khandelwal et al., 2020

SLIDE 32

Nearest-neighbor machine translation

Khandelwal et al., 2020

SLIDE 33

Nearest-neighbor machine translation

Khandelwal et al., 2020

Final kNN distribution

SLIDE 34

Interpolate between kNN prediction and decoder’s actual prediction

Khandelwal et al., 2020

Final kNN distribution
Decoder’s predicted distribution
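In Khandelwal et al., 2020, the two distributions are mixed with a fixed interpolation weight λ:

```latex
p(y_t \mid x, y_{<t}) \;=\; \lambda\, p_{\text{kNN}}(y_t \mid x, y_{<t})
  \;+\; (1 - \lambda)\, p_{\text{MT}}(y_t \mid x, y_{<t})
```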

SLIDE 35

Unlike REALM, this approach doesn’t require any training! It retrieves the kNNs via L2 distance using a fast kNN library (FAISS)
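A minimal sketch of that lookup with FAISS, assuming a datastore that maps decoder hidden states (keys) to the next target token (values); the sizes, temperature, and variable names here are illustrative, not the paper's exact setup:

```python
import numpy as np
import faiss

d, vocab_size, temp, k = 1024, 32_000, 10.0, 64
keys = np.random.randn(10_000, d).astype("float32")  # stand-in decoder states
values = np.random.randint(vocab_size, size=10_000)  # target token stored per key

index = faiss.IndexFlatL2(d)  # exact (squared) L2 search
index.add(keys)

query = np.random.randn(1, d).astype("float32")      # current decoder state
dists, ids = index.search(query, k)                  # k nearest datastore entries

# p_kNN: softmax over negative distances, aggregating retrieved
# neighbors that share the same target token
weights = np.exp(-dists[0] / temp)
p_knn = np.zeros(vocab_size)
np.add.at(p_knn, values[ids[0]], weights)
p_knn /= p_knn.sum()
```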

SLIDE 36

This is quite expensive!

SLIDE 37

But also increases translation quality!

SLIDE 38

Can make it faster by using a smaller datastore