Cheap Tricks and the Perils of Machine Learning (Percy Liang)


slide-1
SLIDE 1

Cheap Tricks and the Perils of Machine Learning

Percy Liang

Stanford / (Semantic Machines / Microsoft)

NAACL Workshop on New Forms of Generalization — June 5, 2018

slide-2
SLIDE 2

Reading Comprehension (SQuAD)

[with Pranav Rajpurkar et al.; 2016]

slide-4
SLIDE 4

Reading comprehension

The way a teacher promotes the course they are teaching, the more the student will get out of the subject matter. The three most important aspects of teacher enthusiasm are enthusiasm about teaching, enthusiasm about the students, and enthusiasm about the subject matter. A teacher must enjoy teaching. If they do not enjoy what they are doing, the students will be able to tell. They also must enjoy being around their students. A teacher who cares for their students is going to help that individual succeed in their life in the future. The teacher also needs to be enthusiastic about the subject matter they are teaching. For example, a teacher talking about chemistry needs to enjoy the art of chemistry and show that to their students. A spark in the teacher may create a spark of excitement in the student as well. An enthusiastic teacher has the ability to be very influential in the young student’s life.

What can an enthusiastic teacher be to a young student?

r-net+ [MSR-A]: very influential

[with Robin Jia; EMNLP 2017]

slide-6
SLIDE 6

Reading comprehension

The way a teacher promotes the course they are teaching, the more the student will get out of the subject matter. The three most important aspects of teacher enthusiasm are enthusiasm about teaching, enthusiasm about the students, and enthusiasm about the subject matter. A teacher must enjoy teaching. If they do not enjoy what they are doing, the students will be able to tell. They also must enjoy being around their students. A teacher who cares for their students is going to help that individual succeed in their life in the future. The teacher also needs to be enthusiastic about the subject matter they are teaching. For example, a teacher talking about chemistry needs to enjoy the art of chemistry and show that to their students. A spark in the teacher may create a spark of excitement in the student as well. An enthusiastic teacher has the ability to be very influential in the young student’s life. An unenthusiastic teacher can be troubling to a young student.

What can an enthusiastic teacher be to a young student?

r-net+ [MSR-A]: troubling

[with Robin Jia; EMNLP 2017]

slide-7
SLIDE 7

Humans versus machines

slide-8
SLIDE 8

Harder tests?

We want multiple-choice questions that people can answer easily. But we also want to avoid as much as possible questions that can be answered using cheap tricks (aka heuristics). — Hector Levesque, 2013

slide-10
SLIDE 10

Winograd schema

The dog chased the cat, which ran up a tree. It waited at the top.

The dog chased the cat, which ran up a tree. It waited at the bottom.

slide-11
SLIDE 11

Winograd schema

The dog chased the cat, which ran up a tree. It waited at the top.

[hat tip: Fernando Pereira]

slide-15
SLIDE 15

Winograd schema

The dog chased the cat, which ran up a tree. It waited at the bottom.

[hat tip: Fernando Pereira]

slide-20
SLIDE 20

But what about...

  • Commonsense knowledge
  • Logical reasoning
  • Linguistic phenomena
  • Intuitive physics
  • ...

MACHINE LEARNING DOESN’T CARE

slide-22
SLIDE 22

Interpolation is insufficient

[Figure: Train and Test, marked ≈]

Any expressive model with enough data will do the job

slide-25
SLIDE 25

Extrapolation is harder

[Figure: Train and Test, marked ≈]

To extrapolate (and be robust), must get a more “correct” model
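As a toy illustration of the gap between interpolation and extrapolation (not from the talk; the data, both models, and all names below are assumptions), the sketch trains a 1-nearest-neighbor regressor and a least-squares line on y = 2x for x in 0..9, then queries them in range and far out of range.

```python
def nearest_neighbor_predict(train, x):
    """Predict y by copying the label of the closest training x."""
    _, nearest_y = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_y


def fit_line(train):
    """Least-squares fit of y = a * x + b."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in train)
    var = sum((x - mean_x) ** 2 for x, _ in train)
    a = cov / var
    return a, mean_y - a * mean_x


# Training data covers only x in 0..9 (hypothetical toy task).
train = [(float(x), 2.0 * x) for x in range(10)]
a, b = fit_line(train)

nn_in = nearest_neighbor_predict(train, 4.4)     # in range: small error
nn_out = nearest_neighbor_predict(train, 100.0)  # out of range: copies the edge label
line_out = a * 100.0 + b                         # the line keeps following y = 2x
```

Both models fit the training range, but only the one whose form matches the underlying rule stays accurate at x = 100, which is the sense of a more “correct” model here.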

slide-26
SLIDE 26

Outline

  • Harder data
  • Stronger models

slide-27
SLIDE 27

Adversarial evaluation of reading comprehension (EMNLP 2017)

Robin Jia

slide-31
SLIDE 31

worksheets.codalab.org

submit model to evaluate on hidden test set

slide-33
SLIDE 33

Results on SQuAD models

Model        Original F1   Adversarial F1
Humans       92.6          89.2
SLQA+        88.6          64.2
r-net+       88.5          63.4
ReasoNet-E   81.1          49.8
SEDT-E       80.1          46.5
BiDAF-E      80.0          46.9
Mnemonic-E   79.1          55.3
Ruminating   78.8          47.7
jNet         78.6          47.0
Mnemonic-S   78.5          56.0
ReasoNet-S   78.2          50.3
MPCM-S       77.0          50.0
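Assuming the F1 columns use the usual SQuAD token-overlap F1 between a predicted span and a gold answer, a simplified per-answer version can be sketched as follows. The function names are illustrative, and the maximum over several gold answers and the averaging over questions are left out.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall for one prediction."""
    pred_tokens, gold_tokens = normalize(prediction), normalize(gold)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


exact = token_f1("very influential", "very influential")
partial = token_f1("influential", "very influential")
fooled = token_f1("troubling", "very influential")
```

Under this metric the distracted answer "troubling" shares no tokens with the gold answer "very influential", so it scores zero for that question.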

slide-35
SLIDE 35

Training versus testing

The way a teacher promotes the course they are teaching, the more the student will get out of the subject matter. The three most important aspects of teacher enthusiasm are enthusiasm about teaching, enthusiasm about the students, and enthusiasm about the subject matter. A teacher must enjoy teaching. If they do not enjoy what they are doing, the students will be able to tell. They also must enjoy being around their students. A teacher who cares for their students is going to help that individual succeed in their life in the future. The teacher also needs to be enthusiastic about the subject matter they are teaching. For example, a teacher talking about chemistry needs to enjoy the art of chemistry and show that to their students. A spark in the teacher may create a spark of excitement in the student as well. An enthusiastic teacher has the ability to be very influential in the young student’s life. An unenthusiastic teacher can be troubling to a young student.

  • If we retrain the model on adversarial examples (appended): 74.3 F1 ⇒ 70.0 F1
  • If we test this retrained model on prepended sentences: 70.0 F1 ⇒ 36.9 F1

Cannot patch these issues automatically!
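The difference between the appended and prepended settings can be made concrete with a small sketch. It only shows where the distractor sentence is placed; how the distractor itself is produced is not covered, and the helper name `add_distractor` is an assumption for illustration.

```python
def add_distractor(passage: str, distractor: str, position: str = "append") -> str:
    """Return the passage with a distractor sentence added at the end or the start."""
    if position == "append":
        return f"{passage} {distractor}"
    if position == "prepend":
        return f"{distractor} {passage}"
    raise ValueError(f"unknown position: {position!r}")


# Last sentence of the passage on the slide, plus the slide's distractor.
passage = (
    "An enthusiastic teacher has the ability to be very influential "
    "in the young student’s life."
)
distractor = "An unenthusiastic teacher can be troubling to a young student."

appended = add_distractor(passage, distractor, "append")    # setting used for retraining
prepended = add_distractor(passage, distractor, "prepend")  # setting used at test time
```

A model retrained only on the appended form may simply learn to ignore the final sentence, which would explain why moving the distractor to the front hurts it again.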

slide-36
SLIDE 36

Question: can we supply stronger information with the help of humans?

slide-37
SLIDE 37

Know What You Don’t Know: Unanswerable Questions for SQuAD (ACL 2018)

Pranav Rajpurkar, Robin Jia

slide-39
SLIDE 39

Cheap tricks

Who was ...?

... Jefferson ...

When was ...?

... In 1812, ...

Can find plausible answers by just detecting entities
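A minimal sketch of such a cheap trick, assuming nothing more than regular expressions keyed on the question word: it never reads the rest of the question. The function name and the patterns are hypothetical and are not any system from the talk.

```python
import re
from typing import Optional


def cheap_trick_answer(question: str, passage: str) -> Optional[str]:
    """Pick the first entity whose type matches the question word."""
    q = question.strip().lower()
    if q.startswith("when"):
        match = re.search(r"\b(1\d{3}|20\d{2})\b", passage)  # a four-digit year
    elif q.startswith("who"):
        match = re.search(r"\b[A-Z][a-z]+\b", passage)       # a capitalized word
    elif q.startswith("how many"):
        match = re.search(r"\b\d[\d,]*\b", passage)          # any number
    else:
        match = None
    return match.group(0) if match else None


who = cheap_trick_answer("Who was ...?", "... Jefferson ...")
when = cheap_trick_answer("When was ...?", "... In 1812, ...")
janitors = cheap_trick_answer(
    "How many full time janitors does Victoria have?",
    "Victoria has about 63,519 full-time teachers.",
)
```

The last call returns a number even though the sentence says nothing about janitors, which is the failure mode that unanswerable questions are meant to expose.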

slide-40
SLIDE 40

Unanswerable questions

As of August 2010, Victoria had 1,548 public schools, 489 Catholic schools and 214 independent schools. Just under 540,800 students were enrolled in public schools, and just over 311,800 in private schools. Over 61 per cent of private students attend Catholic schools. More than 462,000 students were enrolled in primary schools and more than 390,000 in secondary schools. Retention rates for the final two years of secondary school were 77 per cent for public school students and 90 per cent for private school students. Victoria has about 63,519 full-time teachers.

How many full time janitors does Victoria have?

H: no answer

slide-41
SLIDE 41

Unanswerable questions

As of August 2010, Victoria had 1,548 public schools, 489 Catholic schools and 214 independent schools. Just under 540,800 students were enrolled in public schools, and just over 311,800 in private schools. Over 61 per cent of private students attend Catholic schools. More than 462,000 students were enrolled in primary schools and more than 390,000 in secondary schools. Retention rates for the final two years of secondary school were 77 per cent for public school students and 90 per cent for private school students. Victoria has about 63,519 full-time teachers.

How many full time janitors does Victoria have?

BiDAF + self-attention [Clark/Gardner 2017]: 63,519

slide-43
SLIDE 43

Unanswerable questions

  • Took the same 536 documents from SQuAD
  • 100K original SQuAD + 50K unanswerable questions

DocQA+ELMo: 66.3
Humans: 89.5

  • Automatically generated questions are too easy

TF-IDF questions: 83.0
Rule-based questions: 89.6
Crowdsourced: 62.6

Need humans to give us information!

slide-44
SLIDE 44

Evaluating Reading Comprehension Models based on Reductions

Robin Jia

slide-46
SLIDE 46

Reductions

slotfilling, semantic parsing, relation extraction ⇒ question answering

  • If it works: solve many other NLP tasks
  • If it doesn’t work: evaluation benchmark for question answering
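One way to picture a reduction from slot filling to question answering is to phrase each slot as a question and hand it to a reading comprehension model. The sketch below assumes exactly that; the slot names, the question templates, and the stand-in `toy_qa_model` are all hypothetical and do not describe the setup used in the talk.

```python
from typing import Callable, Dict

# Hypothetical question templates, one per slot.
SLOT_QUESTIONS: Dict[str, str] = {
    "departure_city": "Where is the user leaving from?",
    "arrival_city": "Where is the user going?",
}


def fill_slots(utterance: str, qa_model: Callable[[str, str], str]) -> Dict[str, str]:
    """Fill each slot by asking the QA model that slot's question about the utterance."""
    return {
        slot: qa_model(question, utterance)
        for slot, question in SLOT_QUESTIONS.items()
    }


def toy_qa_model(question: str, passage: str) -> str:
    """Stand-in for a trained QA model: returns the word after a cue word."""
    words = passage.split()
    cue = "from" if "leaving" in question else "to"
    if cue in words and words.index(cue) + 1 < len(words):
        return words[words.index(cue) + 1]
    return ""


slots = fill_slots("book a flight from Boston to Denver", toy_qa_model)
```

Swapping `toy_qa_model` for a real SQuAD-trained model is what turns the slot-filling dataset into a test of that model.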

slide-49
SLIDE 49

Slotfilling to question answering

Results:

Baseline             49.1
Reduction to QA      75.4  :(
Baseline + 4 rules   83.2
Zhai et al. (2017)   95.9

slide-50
SLIDE 50

Summary

  • Adversarial examples for question answering expose problems quickly
  • Need humans to cover the space, but space is still too big
  • Reductions as a way of evaluating systems in a useful way

slide-51
SLIDE 51

Outline

  • Harder data
  • Stronger models

slide-52
SLIDE 52

Example 1: laws of physics

Train: small objects, the past
Test: large objects, the future

Extrapolate to novel configurations

slide-54
SLIDE 54

Example 2: compositional semantics

[Figure: 1 2 3 4 5]

Train: the blue block right of the blue block
Test: right of the blue block and left of the green block

Extrapolate to longer sentences

slide-55
SLIDE 55

Inductive bias in neural networks

Convolutional neural networks: [figure]
Attention-based mechanisms: [figure]

slide-58
SLIDE 58

Extrapolation

[Figure: Train and Test, marked ≈]

Domain adaptation (unseen distribution):
Train: input ⇒ output
Test: input’ ⇒ output’

Unsupervised learning (unseen task):
Train: input, side information
Test: input ⇒ output

slide-59
SLIDE 59

Style / attribute transfer in natural language (NAACL 2018)

Juncen Li, Robin Jia, He He

slide-60
SLIDE 60

Task setup

Train (review ⇒ sentiment):
very tasty burritos, and cheap too! ⇒ positive
found hair in my soup, would never go back again ⇒ negative
...

Test (negative review ⇒ positive review):
great food but very rude workers ⇒ great food and very friendly staff

slide-63
SLIDE 63

Deletion-based model

Step 1: extract attributes
Step 2: delete + predict

Inductive bias: attribute/style is localized in the text
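The first step and the deletion half of the second can be sketched with simple word counts, under the assumption that an attribute marker is a word much more frequent in one sentiment class than in the other. The smoothed ratio, the threshold, and the toy corpora are illustrative choices rather than values from the slides, and the "predict" half (generating target-attribute words) is not shown.

```python
from collections import Counter


def attribute_markers(target_corpus, other_corpus, smoothing=1.0, threshold=2.0):
    """Words whose smoothed count ratio (target vs. other) reaches the threshold."""
    target_counts = Counter(w for s in target_corpus for w in s.split())
    other_counts = Counter(w for s in other_corpus for w in s.split())
    return {
        word
        for word, count in target_counts.items()
        if (count + smoothing) / (other_counts[word] + smoothing) >= threshold
    }


def delete_markers(sentence, markers):
    """Remove attribute markers, keeping the remaining words in order."""
    return " ".join(w for w in sentence.split() if w not in markers)


# Toy corpora (hypothetical), labeled only by sentiment.
negative = ["slow and lazy service", "slow service and rude staff", "rude and lazy staff"]
positive = ["quick service", "friendly staff and great service", "great and quick staff"]

markers = attribute_markers(negative, positive)
content = delete_markers("we got some really slow and lazy service", markers)
```

What remains after deletion is the attribute-free content, which a generator can then complete with words carrying the target attribute.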

slide-64
SLIDE 64

Datasets

[Shen+ 2017; Fu+ 2018; Gan+ 2017]

slide-65
SLIDE 65

Results

Human evaluation: grammatical, preserve content, has target attribute

[Shen+ 2017; Fu+ 2018]

slide-68
SLIDE 68

Results

Source: we sit down and we got some really slow and lazy service .
CrossAligned: we went down and we were a good , friendly food .
StyleEmbedding: we sit down and we got some really slow and prices suck .
MultiDecoder: we sit down and we got some really and fast food .
Delete: we sit down and we got some great and quick service .
Delete+Retrieve: we got very nice place to sit down and we got some service .

Locality inductive bias helps!

slide-69
SLIDE 69

SAT solving with neural networks

Daniel Selsam, Matt Lamm, Benedikt Bunz, Leonardo de Moura, David Dill

slide-70
SLIDE 70

SAT solving

(x1 ∨ x2) ∧ (¬x1 ∨ x3) ⇒ x1 = 0, x2 = 1, x3 = 1
x1 ∧ ¬x1 ⇒ unsat

Can neural networks do logical reasoning?
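For reference, the two formulas on this slide can be checked with a brute-force search. This is a plain enumeration, not the neural approach of the talk; the clause encoding (positive integer k for x_k, negative for its negation) and the function names are assumptions made for the sketch.

```python
from itertools import product


def satisfies(clauses, assignment):
    """True if every clause contains at least one literal made true by the assignment."""
    return all(
        any(assignment[abs(lit)] == (1 if lit > 0 else 0) for lit in clause)
        for clause in clauses
    )


def brute_force_sat(clauses, num_vars):
    """Try all 2^n assignments; return the first satisfying one, or None if unsat."""
    for bits in product([0, 1], repeat=num_vars):
        assignment = {i + 1: bits[i] for i in range(num_vars)}
        if satisfies(clauses, assignment):
            return assignment
    return None


formula = [[1, 2], [-1, 3]]      # (x1 ∨ x2) ∧ (¬x1 ∨ x3)
contradiction = [[1], [-1]]      # x1 ∧ ¬x1

slide_assignment_ok = satisfies(formula, {1: 0, 2: 1, 3: 1})
found = brute_force_sat(formula, 3)
unsat = brute_force_sat(contradiction, 1)
```

The search may return a different satisfying assignment than the one on the slide, since the first formula has several.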

slide-71
SLIDE 71

Model

  • Embedding for each literal, clause and time step
  • Literals and clauses exchange messages
  • At the end, predict vote for each literal, and average

Captures inductive bias of survey propagation
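The data flow described by these bullets can be sketched in plain Python. This is only an untrained illustration of the message-passing shape: the real model presumably uses learned update functions, whereas here a fixed tanh of summed messages and random weights stand in, so the output probability is meaningless. All names and sizes are assumptions.

```python
import math
import random


def message_passing_forward(clauses, num_vars, dim=4, rounds=3, seed=0):
    """Run a few literal/clause message rounds and return an averaged vote in (0, 1)."""
    rng = random.Random(seed)
    num_lits = 2 * num_vars

    def index(lit):
        # x_k maps to row k-1, NOT x_k maps to row num_vars + k-1.
        return lit - 1 if lit > 0 else num_vars - lit - 1

    lits = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_lits)]
    vote_weights = [rng.uniform(-1, 1) for _ in range(dim)]
    members = [[index(lit) for lit in clause] for clause in clauses]

    for _ in range(rounds):
        # Clauses receive messages from the literals they contain.
        cls = [
            [math.tanh(sum(lits[i][k] for i in m)) for k in range(dim)]
            for m in members
        ]
        # Literals receive messages from their clauses and from their negation.
        new_lits = []
        for i in range(num_lits):
            neg = (i + num_vars) % num_lits
            incoming = [c for c, m in zip(cls, members) if i in m]
            new_lits.append([
                math.tanh(lits[i][k] + lits[neg][k] + sum(c[k] for c in incoming))
                for k in range(dim)
            ])
        lits = new_lits

    # Each literal casts a scalar vote; the mean vote is squashed to a probability.
    votes = [sum(w * x for w, x in zip(vote_weights, emb)) for emb in lits]
    mean_vote = sum(votes) / len(votes)
    return 1.0 / (1.0 + math.exp(-mean_vote))


p_sat = message_passing_forward([[1, 2], [-1, 3]], num_vars=3)
```

Because the same update is applied at every round and to every literal and clause, the number of rounds and the problem size can be changed at test time without changing the parameters.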

slide-73
SLIDE 73

Predicting satisfiability

Train: random instances of sat/unsat minimal pairs
(x1 ∨ x2) ··· (sat) ⇒ 1
(¬x1 ∨ x2) ··· (unsat) ⇒ 0

Test: random instances (same distribution)

Test accuracy: 88%

slide-75
SLIDE 75

Decoding satisfying assignments

Can decode 90% of instances where model predicts sat — extrapolation!

slide-76
SLIDE 76

Extrapolation to larger instances, more iterations

slide-77
SLIDE 77

Transferring to different problems

slide-78
SLIDE 78

Summary

  • Unsupervised learning: evaluate on structurally unseen task
  • Strong inductive bias permits this unsupervised learning

slide-81
SLIDE 81

Tasks that require understanding?

man in black shirt is playing guitar

[Karpathy+ 2014; Zhang+ 2016]

If want to evaluate ML, need to think statistically...

slide-83
SLIDE 83

Challenge to the community

Today: [Figure: Training set and Held out test set, marked ≈]
Tomorrow (hopefully): [Figure: Training distribution and Held out test distribution/task, marked ≈]

Every paper should evaluate on a novel distribution or novel task.

slide-84
SLIDE 84

Final message

Need to think like machine learning when using machine learning

slide-88
SLIDE 88

Worksheets

worksheets.codalab.org

Robin Jia, Pranav Rajpurkar, Juncen Li, He He, Daniel Selsam, Matt Lamm, Benedikt Bunz, Leonardo de Moura, David Dill

Future of Life, OpenPhil, NSF, Facebook, Microsoft, Tencent

Thank you!