SLIDE 1

Commonsense Knowledge in Pre-trained Language Models

Vered Shwartz

July 5th, 2020

SLIDE 4

If I lean on Ernie my back will hurt less

Elmo will feel appreciated if I give him a flower

Om nom nom!
SLIDE 5

Do pre-trained LMs already capture commonsense knowledge?

SLIDE 7

To fine-tune or not to fine-tune, that is the question

Out-of-the-box

SLIDE 10

Knowledge-base Completion

Converting KB relations to natural-language templates and using LMs to query / score.

  • Petroni et al. (2019):
    ○ LMs: ELMo / BERT
    ○ Templates: hand-crafted
    ○ KBs: ConceptNet and Wikidata
    ○ Conclusion: BERT performs well, but all models perform poorly on many-to-many relations

  • Feldman et al. (2019):
    ○ LMs: BERT
    ○ Templates: hand-crafted, scored by GPT-2
    ○ KBs: ConceptNet, mining from Wikipedia
    ○ Conclusion: performs worse than supervised methods on ConceptNet, but is more likely to generalize to different domains
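The template-based probing recipe above can be sketched end-to-end. This is a minimal illustration: the templates, the `toy_score` function, and the tiny corpus are all invented for the example; a real setup would instead rank candidates by ELMo/BERT log-probabilities.

```python
# Sketch of LM-based knowledge-base completion (in the style of Petroni et al., 2019).
# A KB triple (subject, relation, object) is verbalized with a hand-crafted
# template; candidate objects are then ranked by the score a model assigns to
# each filled-in sentence. The scorer here is a toy word-overlap stand-in, NOT a real LM.

TEMPLATES = {
    "IsA": "{subj} is a {obj}.",
    "UsedFor": "{subj} is used for {obj}.",
    "AtLocation": "You are likely to find {subj} in {obj}.",
}

def verbalize(subj, relation, obj):
    """Turn a KB triple into a natural-language sentence via its template."""
    return TEMPLATES[relation].format(subj=subj, obj=obj)

def toy_score(sentence, corpus):
    """Stand-in for an LM score: total word overlap with corpus sentences."""
    words = set(sentence.lower().rstrip(".").split())
    return sum(len(words & set(s.lower().split())) for s in corpus)

def complete(subj, relation, candidates, corpus):
    """Rank candidate objects for (subj, relation, ?) and return the best."""
    return max(candidates, key=lambda obj: toy_score(verbalize(subj, relation, obj), corpus))

corpus = ["a violin is a musical instrument", "you play a violin with a bow"]
print(complete("violin", "IsA", ["musical instrument", "vegetable"], corpus))
```

Swapping `toy_score` for a sum of token log-probabilities from a pre-trained LM recovers the querying setup the slide describes.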

SLIDE 15

Properties of Concepts (Weir et al., 2020)

1) Do pre-trained LMs correctly distinguish concepts associated with a given set of assumed properties?

A has fur.
A has fur, is big, and has claws.
A has fur, is big, and has claws, has teeth, is an animal, ...

SLIDE 16

Properties of Concepts (Weir et al., 2020)

1) Do pre-trained LMs correctly distinguish concepts associated with a given set of assumed properties?

  • Good performance; RoBERTa > BERT
  • Perceptual properties (e.g. visual) are harder than non-perceptual ones (e.g. encyclopaedic or functional) - they can't be learned from text alone
  • Highly-ranked incorrect answers typically apply to a subset of the properties

SLIDE 20

Properties of Concepts (Weir et al., 2020)

2) Can pre-trained LMs be used to list the properties associated with given concepts?

Low correlation with human-elicited properties, but the generated properties are coherent and mostly "verifiable by humans".

SLIDE 21

Can we trust knowledge from LMs?

SLIDE 22

How well do LMs handle mutual exclusivity?

https://demo.allennlp.org/masked-lm

SLIDE 25

LMs also generate fictitious facts!

Two common failure modes: distractors that are distributionally related or syntactically similar to the correct answer.

SLIDE 26

Zero-shot LM-based Models for commonsense tasks

SLIDE 29

Zero-shot setup

Language Model:
  P_LM(The answer is answer_choice_1)
  P_LM(The answer is answer_choice_2)
  ...
  P_LM(The answer is answer_choice_k)

Masked Language Model:
  P_LM(answer_choice_1 | The answer is [MASK])
  P_LM(answer_choice_2 | The answer is [MASK])
  ...
  P_LM(answer_choice_k | The answer is [MASK])
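The zero-shot scoring scheme can be sketched as follows. The per-word log-probabilities here are a toy stand-in invented for the example; a real implementation would sum token log-probabilities from a causal LM, or read the masked LM's distribution at the [MASK] position.

```python
# Sketch of the zero-shot setup: pick the answer choice to which the
# (stand-in) language model assigns the highest probability for the
# filled-in template "The answer is <choice>".

# Toy "LM": fixed per-word log-probabilities, standing in for a real model.
TOY_LOGPROBS = {"paris": -1.0, "rome": -3.0, "is": -0.5, "the": -0.4, "answer": -2.0}

def sentence_logprob(sentence):
    """Stand-in for a causal LM: sum of per-word log-probs (unknown word = -10)."""
    return sum(TOY_LOGPROBS.get(w, -10.0) for w in sentence.lower().split())

def zero_shot_answer(choices, template="The answer is {}"):
    """Score each filled-in template and return the highest-scoring choice."""
    return max(choices, key=lambda c: sentence_logprob(template.format(c)))

print(zero_shot_answer(["Paris", "Rome"]))  # the toy LM assigns "Paris" a higher score
```

No task-specific training is involved: the model's pre-trained distribution alone decides among the candidates, which is exactly what makes this setup a probe of the knowledge already in the LM.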

SLIDE 30

Unsupervised Commonsense Question Answering with Self-Talk (Shwartz et al., 2020)

Can we use LMs to generate required, missing, or implicit knowledge for multiple-choice commonsense question answering tasks?

SLIDE 31

Model

Each answer choice is scored by the LM in the context of the question and each generated clarification:

s₁₁: What do professors primarily do? teach courses. The main function of a professor's teaching career is to teach students how they can improve their knowledge.
s₁₂: What do professors primarily do? wear wrinkled tweed jackets. The main function of a professor's teaching career is to teach students how they can improve their knowledge.
...
sₖ₁: What do professors primarily do? teach courses. The main function of a professor's teaching career is to provide instruction in the subjects they teach.
sₖ₂: What do professors primarily do? wear wrinkled tweed jackets. The main function of a professor's teaching career is to provide instruction in the subjects they teach.

For each answer choice j, the model takes minᵢ(sᵢⱼ) over the clarifications and predicts the choice with the best resulting score.

SLIDE 35

Generating Clarifications

Question: What do professors primarily do?  Answer choice: teach courses.

Question generation: the question is concatenated with a question prefix p₁ ("What is the main function of"), and DistilGPT2 completes it into a full clarification question: "What is the main function of a professor's teaching career?"

Clarification generation: the generated question is mapped to the corresponding answer prefix p₂ ("The main function of a professor's teaching career is"), and DistilGPT2 completes it into a clarification: "to teach students how they can improve their knowledge."
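The two-stage prompting can be sketched with a stub generator in place of DistilGPT2. The prefixes follow the slide's example; `stub_generate` and its canned completions are stand-ins invented for illustration, where a real system would sample continuations from the LM.

```python
# Sketch of self-talk clarification generation (in the style of Shwartz et al., 2020):
# stage 1 completes a question prefix into a clarification question; stage 2
# completes the matching answer prefix into a clarification sentence.

# Canned continuations standing in for DistilGPT2 sampling.
CANNED = {
    "What do professors primarily do? What is the main function of":
        "a professor's teaching career?",
    "The main function of a professor's teaching career is":
        "to teach students how they can improve their knowledge.",
}

def stub_generate(prompt):
    """Stand-in for LM continuation; a real system samples from DistilGPT2."""
    return CANNED[prompt]

def self_talk(context, question_prefix, answer_prefix):
    # Stage 1: context + question prefix -> completed clarification question.
    question = question_prefix + " " + stub_generate(context + " " + question_prefix)
    # Stage 2: the corresponding answer prefix -> completed clarification.
    clarification = answer_prefix + " " + stub_generate(answer_prefix)
    return question, clarification

q, c = self_talk(
    "What do professors primarily do?",
    "What is the main function of",
    "The main function of a professor's teaching career is",
)
print(q)
print(c)
```

The resulting clarification is then appended to the context before scoring each answer choice, as on the model slide.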

SLIDE 39

Knowledge-informed Model

Context: Taylor was doing her job so she put the money in the drawer.
Question: What will Taylor want to do next?

Generating clarifications from ConceptNet, Google Ngrams and COMET:

  • ConceptNet (job -type of-> work, work -motivated by goal-> money): "Job is a type of work. You would work because you want money."
  • Google Ngrams ("job, money"): "Job to earn money."
  • COMET (xWant): "As a result, Taylor wants to keep the money in the drawer."
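The ConceptNet branch of this baseline can be sketched as a relation-to-template mapping. The templates below are plausible paraphrases written for this example, not necessarily the exact ones used in the paper.

```python
# Sketch of verbalizing ConceptNet-style edges into clarification sentences
# for a knowledge-informed baseline. Templates are illustrative paraphrases.

RELATION_TEMPLATES = {
    "TypeOf": "{head} is a type of {tail}.",
    "MotivatedByGoal": "You would {head} because you want {tail}.",
}

def verbalize_edges(edges):
    """Map (head, relation, tail) edges to one clarification string."""
    return " ".join(
        RELATION_TEMPLATES[rel].format(head=head, tail=tail).capitalize()
        for head, rel, tail in edges
    )

edges = [("job", "TypeOf", "work"), ("work", "MotivatedByGoal", "money")]
print(verbalize_edges(edges))  # "Job is a type of work. You would work because you want money."
```

The Google Ngrams and COMET branches would produce clarifications analogously, from co-occurrence counts and generated ATOMIC-style inferences respectively.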

SLIDE 41

Unsupervised Commonsense Question Answering with Self-Talk

  • Generating knowledge with LMs improves upon the baseline and performs similarly to knowledge-informed models.
  • Generated clarifications don't align with what humans consider helpful.
SLIDE 43

To fine-tune or not to fine-tune, that is the question

SLIDE 48

LMs provide a good basis for commonsense task models

...but they need a "push in the right direction" (fine-tuning)

SLIDE 49

Can good performance be attributed to knowledge in LMs, or to training a large model on a large dataset?

SLIDE 52

HellaSwag (Zellers et al., 2019)

  • LMs mostly pick up lexical cues.
  • No model to date actually solves commonsense reasoning.
  • If no algorithmic advance is made, it would take 100k GPU hours to reach human performance on HellaSwag!

SLIDE 53

PIQA (Bisk et al., 2020)

LMs lack an understanding of some of the most basic physical properties of the world.

SLIDE 54

Can you teach LMs commonsense?

SLIDE 56

Do Neural Language Representations Learn Physical Commonsense?

Forbes et al. (2019): fine-tune BERT to predict object properties ("uses electricity"), affordances ("plug in"), and the inferences between them (e.g. plug-in(x) ⇒ x uses electricity).

  • Best performance: functional properties (e.g. "uses electricity") given affordances.
  • Reasonable performance: encyclopedic ("is an animal") and commonsense ("comes in pairs") properties.
  • Worst performance: perceptual properties ("smooth"), which are often not expressed by affordances.
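The inference part of this setup - checking that predicted affordances and properties respect implications like plug-in(x) ⇒ uses-electricity(x) - can be sketched with toy predictions. The dictionaries and the single implication below are invented for the example; the real models are fine-tuned BERT classifiers.

```python
# Sketch of checking consistency between affordance and property predictions
# (in the spirit of Forbes et al., 2019): if a model asserts an affordance,
# the implied property should be asserted too.

IMPLICATIONS = [("plug in", "uses electricity")]  # affordance -> implied property

def inconsistencies(affordance_preds, property_preds):
    """Return implications violated: affordance predicted true, property false."""
    return [
        (aff, prop)
        for aff, prop in IMPLICATIONS
        if affordance_preds.get(aff) and not property_preds.get(prop)
    ]

preds_aff = {"plug in": True}
preds_prop = {"uses electricity": False}  # inconsistent with the affordance
print(inconsistencies(preds_aff, preds_prop))
```

A representation that has truly learned physical commonsense should produce few such violations.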

SLIDE 57

Can you teach LMs symbolic reasoning?

Talmor et al. (2019): oLMpics - testing BERT and RoBERTa on a set of symbolic reasoning tasks.

SLIDE 60

Can you teach LMs symbolic reasoning?

oLMpics Always-Never task: "A chicken [MASK] has horns."

  • A. never  B. rarely  C. sometimes  D. often  E. always

Reporting bias: LMs are trained on texts describing things that do happen!

SLIDE 63

Can you teach LMs symbolic reasoning?

oLMpics Age Comparison task: "A 21 year old person age is [MASK] than a 35 year old person."

  • A. younger  B. older

RoBERTa also performs well in a zero-shot setup, choosing from the entire vocabulary.

SLIDE 64

Can you teach LMs symbolic reasoning?

oLMpics Negation task: "It was [MASK] hot, it was really cold."

  • A. really  B. not
SLIDE 68

Can you teach LMs symbolic reasoning?

  • RoBERTa > BERT
  • Worse performance on compositionality tasks
  • LMs are context-dependent, and small changes to the input hurt their performance.

SLIDE 71

Summary

  • Pre-trained language models capture some commonsense and factual world knowledge - but they are far from an exhaustive source.
  • Insufficient coverage (reporting bias; Gordon and Van Durme, 2013).
  • Insufficient precision: use with caution! LMs also generate false facts.

Thank you! Questions?

vereds@allenai.org

SLIDE 72

References + Additional Reading

[1] Language Models as Knowledge Bases? Fabio Petroni, Tim Rocktäschel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander H. Miller, and Sebastian Riedel. EMNLP 2019.
[2] Commonsense Knowledge Mining from Pretrained Models. Joshua Feldman, Joe Davison, and Alexander M. Rush. EMNLP 2019.
[3] Barack's Wife Hillary: Using Knowledge Graphs for Fact-Aware Language Modeling. Robert Logan, Nelson F. Liu, Matthew E. Peters, Matt Gardner, and Sameer Singh. ACL 2019.
[4] Negated and Misprimed Probes for Pretrained Language Models: Birds Can Talk, But Cannot Fly. Nora Kassner and Hinrich Schütze. ACL 2020.
[5] Do Neural Language Representations Learn Physical Commonsense? Maxwell Forbes, Ari Holtzman, and Yejin Choi. CogSci 2019.
[6] oLMpics - On what Language Model Pre-training Captures. Alon Talmor, Yanai Elazar, Yoav Goldberg, and Jonathan Berant. arXiv 2019.
[7] On the Existence of Tacit Assumptions in Contextualized Language Models. Nathaniel Weir, Adam Poliak, and Benjamin Van Durme. arXiv 2020.
[8] Deep Contextualized Word Representations. Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. NAACL 2018.
[9] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. NAACL 2019.
[10] RoBERTa: A Robustly Optimized BERT Pretraining Approach. Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. arXiv 2019.
[11] HellaSwag: Can a Machine Really Finish Your Sentence? Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, and Yejin Choi. ACL 2019.
[12] PIQA: Reasoning about Physical Commonsense in Natural Language. Yonatan Bisk, Rowan Zellers, Ronan Le Bras, Jianfeng Gao, and Yejin Choi. AAAI 2020.
[13] Unsupervised Commonsense Question Answering with Self-Talk. Vered Shwartz, Peter West, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. arXiv 2020.