Machine Reading and Reasoning with Neural Program Interpreters


slide-1
SLIDE 1

Machine Reading and Reasoning with Neural Program Interpreters

Sebastian Riedel

Machine Reading

Bloomsbury AI

@riedelcastro

slide-2
SLIDE 2

Collaborators

Tim Rocktäschel (now at Oxford)

Matko Bosnjak (UCL)

Pontus Stenetorp (UCL)

Johannes Welbl (UCL)

Jason Naradowsky (now Johns Hopkins University)

slide-3
SLIDE 3

“Should we separate meaning from language?”

Chris Manning @AKBC 2013

[Language] [Meaning] ? [Information Need] !

[Maybe not?] The “Classic” NLP Paradigm

Convolutional 2D Knowledge Graph Embeddings, Tim Dettmers et al. AAAI18, Mon 11:30-12:30, Room 8

slide-4
SLIDE 4

End-to-End Reading and Comprehension

(Hermann et al., 2015; Seo et al., 2016; Rajpurkar et al., 2016; Weissenborn, 2016; …)

[Language] ? [Information Need] !

slide-5
SLIDE 5

Machine Reading

Nikola Tesla … In January 1880, two of Tesla's uncles put together enough money to help him leave Gospić for Prague where he was to study. Unfortunately, he arrived too late to enroll at Charles-Ferdinand University; he never studied Greek, a required subject; and he was illiterate in Czech, another required subject. Tesla did, however, attend lectures at the university, although, as an auditor, he did not receive grades for the courses.

What city did Tesla move to in 1880? Prague

Why was he unable to enroll at the university? arrived too late to enroll

slide-6
SLIDE 6

slide-7
SLIDE 7

DEEP LEARNING KILLED THE LINGUISTS BOSTON SCIENTIST CLAIMS

slide-8
SLIDE 8

BOSTON SCIENTIST IS COMPLETELY WRONG CLAIMS PHILADELPHIA PROFESSOR

slide-9
SLIDE 9

How to read and reason end-to-end?

slide-10
SLIDE 10

Machine Reading and Reasoning

Question: Which medical specialty deals with pituitary ACTH hypersecretion?

Answer: Endocrinology

Pituitary ACTH hypersecretion ... is a form of hyperpituitarism characterized by an abnormally high level of ACTH produced by the anterior pituitary … A major organ of the endocrine system, the anterior pituitary is the glandular, anterior lobe that ... The endocrine system is ... ... The field of study dealing with the endocrine system and its disorders is endocrinology, a branch of internal medicine.

slide-11
SLIDE 11

Machine Reading and Reasoning

Isabel uploaded 2 pictures from her phone and 4 from her camera to Facebook. She sorted the pics into 3 different albums with the same amount of pics in each album.

Question: How many pictures were in each of the albums?

Answer: 2

slide-12
SLIDE 12

Can we learn this end-to-end?

slide-13
SLIDE 13

Part 1: Learning to Read and Calculate

Tim Rocktäschel Matko Bosnjak Jason Naradowsky

slide-14
SLIDE 14

Machine Reading and Reasoning: Math

Isabel uploaded 2 pictures from her phone and 4 from her camera to Facebook. She sorted the pics into 3 different albums with the same amount of pics in each album.

Question: How many pictures were in each of the albums?

Answer: 2

slide-15
SLIDE 15

Differentiable Program Interpreters

[Figure: Reader Model and Program Interpreter. The interpreter state is Code (pointer p), Stack (pointer d) and Heap, each drawn as cells 1 to 9. Stack contents shown: 3 4 2]

Bosnjak et al. ICML 2017

Isabel uploaded 2 pictures from her phone and 4 from her camera to facebook. She sorted the pics into 3 different albums with the same amount of pics in each album. How many pictures were in each of the albums?

slide-16
SLIDE 16

Differentiable Program Interpreters

[Figure: Reader Model and Program Interpreter with Code (pointer p), Stack (pointer d) and Heap, each drawn as cells 1 to 9. Successive stack contents shown: 3 4 2, 4 2 3]

Bosnjak et al. ICML 2017

slide-17
SLIDE 17

Differentiable Program Interpreters

[Figure: Reader Model and Program Interpreter with Code (pointer p), Stack (pointer d) and Heap, each drawn as cells 1 to 9. Successive stack contents shown: 3 4 2, 4 2 3, 6 3]

Bosnjak et al. ICML 2017

slide-18
SLIDE 18

Differentiable Program Interpreters

[Figure: Reader Model and Program Interpreter with Code (pointer p), Stack (pointer d) and Heap, each drawn as cells 1 to 9. Successive stack contents shown: 3 4 2, 4 2 3, 6 3, 3 6]

Bosnjak et al. ICML 2017

slide-19
SLIDE 19

Differentiable Program Interpreters

[Figure: Reader Model and Program Interpreter with Code (pointer p), Stack (pointer d) and Heap, each drawn as cells 1 to 9. Successive stack contents shown: 3 4 2, 4 2 3, 6 3, 3 6, 2]

Bosnjak et al. ICML 2017
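
The stack contents shown across these slides (3 4 2, 4 2 3, 6 3, 3 6, 2) can be reproduced with an ordinary, non-differentiable stack machine. The sketch below is one reading of that trace, not the authors' code. It assumes that the leftmost number is the top of the stack, that the reader pushes 2, 4 and 3 in text order, and that the program is the Forth-style word sequence -ROT + SWAP /. None of these assumptions is stated on the slides.

```python
def step(stack, op):
    """Apply one Forth-style word to a stack (a list whose last item is the top)."""
    s = list(stack)
    if op == "-ROT":            # (a b c -- c a b): move the top item below the next two
        c, b, a = s.pop(), s.pop(), s.pop()
        s += [c, a, b]
    elif op == "SWAP":          # (a b -- b a)
        b, a = s.pop(), s.pop()
        s += [b, a]
    elif op == "+":             # (a b -- a+b)
        b, a = s.pop(), s.pop()
        s.append(a + b)
    elif op == "/":             # (a b -- a/b), integer division
        b, a = s.pop(), s.pop()
        s.append(a // b)
    else:
        raise ValueError(f"unknown word: {op}")
    return s


def show(stack):
    """Render a stack with the top item first, matching the assumed slide layout."""
    return " ".join(str(v) for v in reversed(stack))


# Assumed reader output: the numbers 2, 4, 3 pushed in the order they occur in the text.
stack = [2, 4, 3]
trace = [show(stack)]
for word in ["-ROT", "+", "SWAP", "/"]:   # assumed program, i.e. (2 + 4) / 3
    stack = step(stack, word)
    trace.append(show(stack))

print(trace)
```

In the differentiable setting, each of these discrete steps would be replaced by a soft update of pointer and memory distributions.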

slide-20
SLIDE 20

Training

[Figure: Reader Model and Program Interpreter with Code (pointer p), Stack (pointer d) and Heap. Successive stack contents shown: 3 4 2, 4 2 3, 6 3, 3 6, 2]

Bosnjak et al. ICML 2017

slide-21
SLIDE 21

Training

[Figure: Reader Model and Program Interpreter with Code (pointer p), Stack (pointer d) and Heap. Successive stack contents shown: 3 4 2, 4 2 3, 6 3, 3 6, 2]

Bosnjak et al. ICML 2017

slide-22
SLIDE 22

Training

[Figure: Reader Model and Program Interpreter with Code (pointer p), Stack (pointer d) and Heap. Successive stack contents shown: 3 4 2, 4 2 3, 6 3, 3 6, 2]

Bosnjak et al. ICML 2017

slide-23
SLIDE 23

Training

[Figure: Reader Model and Program Interpreter with Code (pointer p), Stack (pointer d) and Heap. Successive stack contents shown: 3 4 2, 4 2 3, 6 3, 3 6, 2]

Bosnjak et al. ICML 2017

def solve(x): {+|-|%|*} solve(y)

slide-24
SLIDE 24

Zoom in: State Transitions

[Figure: one interpreter step maps state h_{t-1} to h_t; the state comprises the Heap, the Code with pointer p, and the Stack with pointer d, each drawn as cells 1 to 9]

slide-25
SLIDE 25

Zoom in: Pop Operation

[Figure: the Pop operation between h_{t-1} and h_t, built from a Circular Shift Matrix; Code with pointer p and Stack with pointer d, each drawn as cells 1 to 9]

slide-26
SLIDE 26

Zoom in: Pop Operation

[Figure: the Pop operation between h_{t-1} and h_t, built from a Circular Shift Matrix; Code with pointer p before and after the step, and Stack with pointer d]

slide-27
SLIDE 27

Zoom in: Pop Operation

[Figure: the Pop operation between h_{t-1} and h_t, built from a Circular Shift Matrix; Code with pointer p and Stack with pointer d, both shown before and after the step]
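
One way to read "Pop" together with "Circular Shift Matrix": if the stack pointer d is a probability distribution over the nine cells, moving it is a matrix-vector product, which keeps the step differentiable. The pure-Python sketch below assumes that pop moves the pointer one cell towards cell 1 and wraps around. The direction, the wrap-around and all helper names are assumptions made for illustration.

```python
N = 9  # number of stack cells drawn on the slides


def shift_matrix(n, k):
    """n x n circular shift matrix: moves the mass in cell j to cell (j + k) mod n."""
    return [[1.0 if i == (j + k) % n else 0.0 for j in range(n)] for i in range(n)]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(row[j] * vector[j] for j in range(len(vector))) for row in matrix]


POP = shift_matrix(N, -1)  # assumed: popping decrements the stack pointer

# A soft pointer: 70% on the second cell, 30% on the third.
d = [0.0, 0.7, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
d_new = matvec(POP, d)

print(d_new)  # the same two masses, each moved one cell down
```

Because the update is linear in d, a pointer that is uncertain between two cells stays uncertain between the two shifted cells.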

slide-28
SLIDE 28

Zoom in: Add 1 Operation

[Figure: the +1 operation between h_{t-1} and h_t, built from a Circular Shift Matrix; pointer p and stack with pointer d]

slide-29
SLIDE 29

Zoom in: Add 1 Operation

[Figure: the +1 operation between h_{t-1} and h_t, built from a Circular Shift Matrix; pointer p and stack with pointer d, shown before and after the step]

slide-30
SLIDE 30

Zoom in: Add 1 Operation

[Figure: the +1 operation between h_{t-1} and h_t, built from a Circular Shift Matrix; the stack is read and written through products with pointer d and its transpose]
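
A possible reading of the +1 step is: read the stack top softly through the pointer d, increment the value with a circular shift over value slots, and write the result back through d. The sketch below follows that reading in pure Python. The one-hot value encoding with ten slots, the wrap-around at the largest value and the helper names are assumptions, not details given on the slides.

```python
N_CELLS, N_VALUES = 9, 10  # assumed sizes: nine stack cells, values 0..9


def one_hot(i, n):
    return [1.0 if j == i else 0.0 for j in range(n)]


def soft_read(stack, d):
    """Expected cell content under the pointer distribution d."""
    return [sum(d[i] * stack[i][v] for i in range(N_CELLS)) for v in range(N_VALUES)]


def add_one(x):
    """Circularly shift a value distribution by one slot (9 wraps to 0)."""
    return [x[(v - 1) % N_VALUES] for v in range(N_VALUES)]


def soft_write(stack, d, x):
    """Blend x into each cell in proportion to the pointer mass on that cell."""
    return [[(1.0 - d[i]) * stack[i][v] + d[i] * x[v] for v in range(N_VALUES)]
            for i in range(N_CELLS)]


# Every cell holds the value 0, except the first cell, which holds 2.
stack = [one_hot(0, N_VALUES) for _ in range(N_CELLS)]
stack[0] = one_hot(2, N_VALUES)
d = one_hot(0, N_CELLS)  # pointer sits on the first cell

top = soft_read(stack, d)
new_stack = soft_write(stack, d, add_one(top))
top_value = max(range(N_VALUES), key=lambda v: new_stack[0][v])

print(top_value)
```

With a sharp pointer this reduces to an ordinary increment of the top cell; with a soft pointer every cell is updated in proportion to its pointer mass.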

slide-31
SLIDE 31

Code Pointer Uncertainty

[Figure: transition from h_{t-1} to h_t when the code pointer p is uncertain; the candidate operations shown are +1 and Pop, acting on the stack with pointer d]

slide-32
SLIDE 32

Code Uncertainty

[Figure: transition from h_{t-1} to h_t when the code itself is uncertain; the candidate operations shown are +1 and Pop, acting on the stack with pointer d]

slide-33
SLIDE 33

Code Learning

[Figure: transition from h_{t-1} to h_t with unknown code cells (shown as ?) that are to be learned; the candidate operations shown are +1 and Pop]
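
The three slides on code pointer uncertainty, code uncertainty and code learning suggest that the next state is a weighted mixture of what each candidate operation would produce, and that the weights are trained by gradient descent. The sketch below illustrates this on the stack pointer alone, with a single unknown code cell that chooses between Pop and +1. The softmax parameterisation, the negative log-likelihood loss, the learning rate and the reduction of the state to the pointer are assumptions chosen to keep the example small.

```python
import math

N = 9  # stack cells


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def pop(d):
    """Assumed effect of Pop on the pointer: move one cell down, circularly."""
    return [d[(i + 1) % N] for i in range(N)]


def add_one(d):
    """+1 changes the value on top of the stack and leaves the pointer untouched."""
    return list(d)


OPS = [pop, add_one]


def next_pointer(weights, d):
    """Mixture of the candidate results, weighted by the code cell distribution."""
    outcomes = [op(d) for op in OPS]
    return [sum(w * o[i] for w, o in zip(weights, outcomes)) for i in range(N)]


d_prev = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # pointer on the third cell
target = 1                                              # desired: second cell
logits = [0.0, 0.0]                                     # the unknown code cell

for _ in range(50):
    w = softmax(logits)
    q = [op(d_prev)[target] for op in OPS]        # target mass under each operation
    p = sum(wi * qi for wi, qi in zip(w, q))      # target mass under the mixture
    # Gradient of -log(p) with respect to each logit.
    grads = [-(wi * (qi - p)) / p for wi, qi in zip(w, q)]
    logits = [l - 1.0 * g for l, g in zip(logits, grads)]

weights = softmax(logits)
print(weights)  # most of the mass ends up on Pop
```

The same mixture idea extends to uncertainty about the code pointer: each code cell's mixture is itself weighted by the pointer mass on that cell.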

slide-34
SLIDE 34

Dynamic Code

[Figure: transition from h_{t-1} to h_t; pointer p and stack with pointer d, shown before and after the step]

slide-35
SLIDE 35

Direct Data Manipulation

[Figure: transition from h_{t-1} to h_t; pointer p and stack with pointer d, shown before and after the step]

slide-36
SLIDE 36

Related Work

Program Synthesis (Manna & Waldinger 71, Koza 92, Nordin 97, …)

We learn via SGD, allow dynamic code outside the host language

Probabilistic Programming (Goodman 08, Pfeffer 01, Milch 05, De Raedt 07)

We use procedural language, discriminative, easy to integrate into end-to-end neural architectures

Neural Program Induction (Graves et al. 14, Reed & de Freitas 15, …)

We enable program sketches; the host language is a proper 3rd generation language

slide-37
SLIDE 37

Results on Learning To Sort

Bosnjak et al. ICML 2017

[Bar chart: accuracy (%), axis from 25 to 100, comparing Seq2Seq and Ours in three groups labelled 8, 4 and 2; values shown: 100, 100, 100, 37.9, 57.8, 99.8]

Test Length: 8

def bubble(list):
    if len(list) == 1:
        return list
    else:
        return slot1(bubble(slot2(list)))

def sort(list):
    for i in range(0, len(list)):
        list = bubble(list)
    return list

slide-38
SLIDE 38

Program Trace During Learning

(a) Program Counter trace in early stages of training.

(b) Program Counter trace in the middle of training.

(c) Program Counter trace at the end of training.

slide-39
SLIDE 39

Results on Math Problems (Accuracy)

Bosnjak et al. ICML 2017

Roy & Roth (2015): 55.5

Seq2Seq: 95

Ours: 96

How many pictures were in each of the albums? (2 + 4) / 3

Seq2Seq solves simpler problem

slide-40
SLIDE 40

Limitations

Continuous relaxations difficult to train

    Hard to learn long programs

    Hard to learn with recursive function calls

Need better gradients in presence of discrete variables

slide-41
SLIDE 41

Part 2: Learning to Aggregate

Tim Rocktäschel

slide-42
SLIDE 42

Neural Theorem Provers

Which medical specialty deals with pituitary ACTH hypersecretion?

Pituitary ACTH hypersecretion ... is a form of hyperpituitarism characterized by an abnormally high level of ACTH produced by the anterior pituitary … A major organ of the endocrine system, the anterior pituitary is the glandular, anterior lobe that ... The endocrine system is ... ... The field of study dealing with the endocrine system and its disorders is endocrinology, a branch of internal medicine.

slide-43
SLIDE 43

Neural Theorem Provers

[Figure: Agent, Program Interpreter and Reader; the Question has its own Code memory (cells 1 to 9, pointer p)]

Rocktäschel and Riedel NIPS 2017

Question: Which specialty deals with pituitary ACTH hypersecretion?

slide-44
SLIDE 44

Neural Theorem Provers

[Figure: Agent, Program Interpreter and Reader; each Question has its own Code memory (cells 1 to 9, pointer p)]

Rocktäschel and Riedel NIPS 2017

Question: Which specialty deals with pituitary ACTH hypersecretion?

Question: What is ACTH hypersecretion created by?

slide-45
SLIDE 45

Neural Theorem Provers

[Figure: Agent, Program Interpreter and Reader; each Question has its own Code memory (cells 1 to 9, pointer p)]

Rocktäschel and Riedel NIPS 2017

Question: Which specialty deals with pituitary ACTH hypersecretion?

Question: What is the pituitary a part of?

slide-46
SLIDE 46

Neural Theorem Provers

[Figure: Agent, Program Interpreter and Reader; each Question has its own Code memory (cells 1 to 9, pointer p)]

Rocktäschel and Riedel NIPS 2017

Question: Which specialty deals with pituitary ACTH hypersecretion?

Question: What is the pituitary a part of?

Question: What topic covers the endocrine system?

slide-47
SLIDE 47

Vectors Correspond to Interpretable Rules

[Figure: Agent, Program Interpreter and Reader; each Question has its own Code memory (cells 1 to 9, pointer p)]

Rocktäschel and Riedel NIPS 2017

Question: Which specialty deals with pituitary ACTH hypersecretion?

Question: What is the pituitary a part of?

Question: What topic covers the endocrine system?

X deals with Y if
    Y produced by Z
    Z is a part of U
    X deals with U

slide-48
SLIDE 48

Catch: Currently only Works on Relational Data

createdBy(hypersecretion, anterior pituitary)

partOf(endocrine system, anterior pituitary)

dealsWith(endocrine system, endocrinology)

dealsWith(hypersecretion, X)

Rocktäschel and Riedel NIPS 2017

[Figure: Agent, Program Interpreter and Reader; each Question has its own Code memory (cells 1 to 9, pointer p)]

Question: Which specialty deals with pituitary ACTH hypersecretion?

Question: What is the pituitary a part of?

Question: covers(endocrine system, X)

X deals with Y if
    Y produced by Z
    Z is a part of U
    X deals with U

slide-49
SLIDE 49

Catch: Currently only Works on Relational Data

createdBy(hypersecretion, anterior pituitary)

partOf(endocrine system, anterior pituitary)

dealsWith(endocrine system, endocrinology)

Rocktäschel and Riedel NIPS 2017

Amounts to a differentiable version of the backward chaining algorithm used in Prolog

dealsWith(hypersecretion, X)
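
For reference, the symbolic (non-differentiable) backward chaining that this slide alludes to can be sketched in a few lines. The facts and the query below are taken from the slide. The rule is a translation of the "X deals with Y if ..." rule into the argument order used by the facts, and that translation, the depth limit and all function names are assumptions made for illustration.

```python
from itertools import count

FACTS = [
    ("createdBy", "hypersecretion", "anterior pituitary"),
    ("partOf", "endocrine system", "anterior pituitary"),
    ("dealsWith", "endocrine system", "endocrinology"),
]
# Assumed translation of the slide's rule: (head, body).
RULES = [
    (("dealsWith", "Y", "X"),
     [("createdBy", "Y", "Z"), ("partOf", "U", "Z"), ("dealsWith", "U", "X")]),
]
fresh = count()  # supplies a new tag for every rule application


def is_var(term):
    return term[:1].isupper()  # variables start with an upper-case letter


def walk(term, subst):
    """Follow variable bindings until a constant or an unbound variable is reached."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term


def unify(a, b, subst):
    """Return an extended substitution that makes atoms a and b equal, or None."""
    if len(a) != len(b):
        return None
    subst = dict(subst)
    for x, y in zip(a, b):
        x, y = walk(x, subst), walk(y, subst)
        if x == y:
            continue
        if is_var(x):
            subst[x] = y
        elif is_var(y):
            subst[y] = x
        else:
            return None
    return subst


def rename(atom, tag):
    """Give the variables of a rule fresh names so applications do not clash."""
    return tuple(f"{t}_{tag}" if is_var(t) else t for t in atom)


def prove(goals, subst, depth):
    """Yield every substitution that proves all goals within the depth limit."""
    if not goals:
        yield subst
        return
    goal, rest = goals[0], goals[1:]
    for fact in FACTS:
        s = unify(goal, fact, subst)
        if s is not None:
            yield from prove(rest, s, depth)
    if depth > 0:
        for head, body in RULES:
            tag = next(fresh)
            s = unify(goal, rename(head, tag), subst)
            if s is not None:
                yield from prove([rename(b, tag) for b in body] + rest, s, depth - 1)


query = ("dealsWith", "hypersecretion", "X")
answers = [walk("X", s) for s in prove([query], {}, 2)]
print(answers)
```

A differentiable version would replace the exact equality test inside `unify` with a similarity score and carry that score through the proof.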

slide-50
SLIDE 50

Supports Soft Unification

createdBy(hypersecretion, anterior pituitary)

partOf(endocrine system, anterior pituitary)

dealsWith(endocrine system, endocrinology)

dealsWith(hypersecretion, X)

Rocktäschel and Riedel NIPS 2017

[Figure: Agent, Program Interpreter and Reader; each Question has its own Code memory (cells 1 to 9, pointer p)]

Question: Which specialty deals with pituitary ACTH hypersecretion?

Question: What is the pituitary a part of?

Question: covers(endocrine system, X)

X deals with Y if
    Y produced by Z
    Z is a part of U
    X deals with U
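
Soft unification can be illustrated by replacing symbol equality with a similarity between symbol embeddings, so that a query such as covers(endocrine system, X) can still match the fact dealsWith(endocrine system, endocrinology). In the sketch below, the two-dimensional embeddings are hand-picked rather than learned, and both the similarity exp(-distance) and the minimum used to combine scores are choices made for this illustration.

```python
import math

# Hand-picked, hypothetical embeddings; a trained model would learn these.
EMB = {
    "covers": (1.0, 0.0),
    "dealsWith": (0.9, 0.1),
    "createdBy": (-1.0, 0.5),
    "partOf": (0.0, -1.0),
    "endocrine system": (0.3, 0.8),
    "hypersecretion": (-0.7, -0.2),
    "anterior pituitary": (0.5, -0.6),
    "endocrinology": (0.2, 0.9),
}

FACTS = [
    ("createdBy", "hypersecretion", "anterior pituitary"),
    ("partOf", "endocrine system", "anterior pituitary"),
    ("dealsWith", "endocrine system", "endocrinology"),
]


def is_var(term):
    return term[:1].isupper()


def sim(a, b):
    """Similarity in (0, 1]: 1 for identical embeddings, smaller when far apart."""
    return math.exp(-math.dist(EMB[a], EMB[b]))


def soft_unify(query, fact):
    """Score how well a query atom matches a fact, and bind the query's variables."""
    score, binding = 1.0, {}
    for q, f in zip(query, fact):
        if is_var(q):
            binding[q] = f                    # variables unify with anything
        else:
            score = min(score, sim(q, f))     # weakest symbol match limits the score
    return score, binding


query = ("covers", "endocrine system", "X")
scored = [(soft_unify(query, fact), fact) for fact in FACTS]
(best_score, best_binding), best_fact = max(scored, key=lambda item: item[0][0])

print(best_fact, best_binding, round(best_score, 3))
```

Because the score depends smoothly on the embeddings, a loss on the final proof score can be propagated back to the symbol representations.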

slide-51
SLIDE 51

Results on Benchmark (Rank of Correct Answer)

[Bar chart: scores (axis from 25 to 100) for ComplEx and NTP on Countries S3, UMLS and Nations; values shown: 89, 99, 77.3, 86, 96, 48.4]

Comparable or Better than Baselines, and interpretable

Rocktäschel and Riedel NIPS 2017

slide-52
SLIDE 52

Interpretability: Learnt Rules

if X is located in Y and Y is located in Z then X is located in Z

if X expels diplomats of Y then X shows negative behaviour towards Y

if X interacts with Y and Y interacts with Z then X interacts with Z

slide-53
SLIDE 53

Related Work

Probabilistic Logic Programming: IBAL (Pfeffer, 2001), BLOG (Milch et al., 2005), Markov Logic Networks (Richardson and Domingos, 2006), ProbLog (De Raedt et al., 2007), BLP (Kersting and De Raedt, 2007)

Inductive Logic Programming: Plotkin (1970), Shapiro (1991), Muggleton (1991), De Raedt (1999) . . .

Statistical Predicate Invention (Kok and Domingos, 2007)

Neural-symbolic Connectionism

    Propositional rules: EBL-ANN (Shavlik and Towell, 1989), KBANN (Towell and Shavlik, 1994), C-LIP (Garcez and Zaverucha, 1999)

    First-order inference (no training of symbol representations): Unification Neural Networks (Hölldobler, 1990; Komendantskaya, 2011), SHRUTI (Shastri, 1992), Neural Prolog (Ding, 1995), CLIP++ (França et al., 2014), Lifted Relational Networks (Šourek et al., 2015), TensorLog (Cohen, 2016)

slide-54
SLIDE 54

Limitations

Scalability

    Currently only works for KBs with < 10k facts

    Small proof depth

Still requires relational representation

slide-55
SLIDE 55

Part 3: A Read & Reason Dataset

Pontus Stenetorp Johannes Welbl

slide-56
SLIDE 56

slide-57
SLIDE 57

A Single Instance

What is the nationality of Jamie Burnett?

Candidates: Scotland (correct), China (incorrect), …

Jamie Burnett (born 16 September 1975) is a professional snooker player from Hamilton, South Lanarkshire… He began the 2014/2015 season with a quarter-final showing at the Yixing Open…

…Hamilton is a town in South Lanarkshire, in the central Lowlands of Scotland…

The Yixing Open was a professional minor-ranking snooker tournament that took place at the Yixing Sports Centre in Yixing, China.

slide-58
SLIDE 58

Dataset Construction Method [Conditionally Accepted to TACL]

[Figure: Unlabelled Text and a Knowledge Base go into the Dataset Construction Method, which produces a Multihop Dataset]

slide-59
SLIDE 59

Dataset Construction Method

KB Triple: Jamie Burnett, citizenship, Scotland

Instance: What is the nationality of Jamie Burnett? Scotland, China

[Figure: graph linking Documents and Entities through "described in" and "mentions" edges; nodes shown: Jamie Burnett, Hamilton, Yixing Open, Scotland, China]
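
The construction on this slide can be pictured as a search over a graph in which documents mention entities and entities are described in documents. The sketch below uses the Jamie Burnett example and a breadth-first search to find a chain of documents that links the subject of a KB triple to its answer. The data layout, the hypothetical set of country names used for candidate typing and the function names are assumptions for illustration, not the published construction procedure.

```python
from collections import deque

# Each key is an entity that has a describing document; the value lists the
# entities that document mentions (taken from the example instance).
MENTIONS = {
    "Jamie Burnett": ["Hamilton", "Yixing Open"],
    "Hamilton": ["Scotland"],
    "Yixing Open": ["China"],
}
COUNTRIES = {"Scotland", "China"}  # hypothetical type information for candidates


def support_chain(subject, answer, mentions):
    """Shortest chain of documents from the subject's document to one mentioning the answer."""
    queue = deque([[subject]])
    seen = {subject}
    while queue:
        chain = queue.popleft()
        doc = chain[-1]
        if answer in mentions.get(doc, []):
            return chain
        for entity in mentions.get(doc, []):
            # Follow "mentions" then "described in": only entities with a document.
            if entity in mentions and entity not in seen:
                seen.add(entity)
                queue.append(chain + [entity])
    return None


def candidates(mentions, answer_type):
    """Type-consistent entities mentioned anywhere in the document set."""
    found = {e for ents in mentions.values() for e in ents}
    return sorted(found & answer_type)


triple = ("Jamie Burnett", "citizenship", "Scotland")
chain = support_chain(triple[0], triple[2], MENTIONS)
cands = candidates(MENTIONS, COUNTRIES)

print(chain, cands)
```

In this toy graph the answer is only reachable through the Hamilton document, which is what makes the instance multi-hop, while the Yixing Open document contributes the distractor candidate China.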

slide-60
SLIDE 60

Dataset Construction Method [Conditionally Accepted to TACL]

[Figure: Unlabelled Text and a Knowledge Base go into the Dataset Construction Method, which produces a Multihop Dataset]

slide-61
SLIDE 61

Dataset Construction Method

[Figure: Unlabelled Text and a Knowledge Base go into the Dataset Construction Method, which produces WikiHop]

slide-62
SLIDE 62

Dataset Construction Method

[Figure: Unlabelled Text and a Knowledge Base go into the Dataset Construction Method, which produces MedHop]

slide-63
SLIDE 63

Baseline Results

Accuracy [%]

               WikiHop   MedHop
Random          11.5      13.9
Max-Mention     10.6       9.5
TF-IDF          25.6       9.0
BiDAF           42.9      47.8

slide-64
SLIDE 64

Reduction to Traditional Machine Comprehension

Question: Which medical specialty deals with pituitary ACTH hypersecretion?

Answer: Endocrinology

Pituitary ACTH hypersecretion ... is a form of hyperpituitarism characterized by an abnormally high level of ACTH produced by the anterior pituitary … A major organ of the endocrine system, the anterior pituitary is the glandular, anterior lobe that ... The endocrine system is ... ... The field of study dealing with the endocrine system and its disorders is endocrinology, a branch of internal medicine.

slide-65
SLIDE 65

Reduction to Traditional Machine Comprehension

Question: Which medical specialty deals with pituitary ACTH hypersecretion?

Answer: Endocrinology

A major organ of the endocrine system, the anterior pituitary is the glandular, anterior lobe that ... Pituitary ACTH hypersecretion ... is a form of hyperpituitarism characterized by an abnormally high level of ACTH produced by the anterior pituitary … The endocrine system is ... ... The field of study dealing with the endocrine system and its disorders is endocrinology, a branch of internal medicine.

slide-66
SLIDE 66

Do Neural Reading Models Aggregate?

Question: Which medical specialty deals with pituitary ACTH hypersecretion?

Answer: Endocrinology

A major organ of the endocrine system, the anterior pituitary is the glandular, anterior lobe that ... Pituitary ACTH hypersecretion ... is a form of hyperpituitarism characterized by an abnormally high level of ACTH produced by the anterior pituitary … The endocrine system is ... ... The field of study dealing with the endocrine system and its disorders is endocrinology, a branch of internal medicine.

slide-67
SLIDE 67

Removing Relevant Documents, Keep Answer Documents

Accuracy [%]

                 WikiHop   MedHop
BiDAF             54.5      33.7
BiDAF doc-rem     44.6      30.4

10%

slide-68
SLIDE 68

Conclusion

Great Progress in End-to-End Reading Comprehension

Reasoning (aggregation, calculation etc.) end-to-end is still very challenging

Our Approaches

    create datasets

    cast reasoning as program learning and execution

    are end-to-end differentiable (can be trained on downstream loss)

    are inspired by and tied to traditional symbolic formalisms (Forth, Prolog/Datalog)

    learnt models are interpretable

    allow injection of prior knowledge