Deep Reasoning: A Vision for Automated Deduction - Stephan Schulz (PowerPoint presentation)

SLIDE 1

Deep Reasoning

A Vision for Automated Deduction Stephan Schulz

SLIDE 2

Deep Reasoning

A Vision for Automated Deduction Wer Visionen hat, sollte zum Arzt gehen!

SLIDE 3

Deep Reasoning

A Vision for Automated Deduction Anybody with visions should go see a doctor!

SLIDE 4

Agenda

◮ Introduction ◮ Deep Learning ◮ Automated Theorem Proving ◮ Deep Reasoning ◮ Conclusion

SLIDE 5

Introduction: Historical Perspective

1955 Logic Theorist
1956 Dartmouth Workshop - “Birth of AI”
1957 Perceptron
1958 LISP
1960 Davis-Putnam procedure (DPLL 1962)
1965 Resolution/unification
1965-1975 MLP/back-propagation
1970 Knuth-Bendix completion
1972 PROLOG (1983 WAM)
1980s Expert systems/planners
1986 Decision tree learning
1990-1994 Superposition calculus
since 1997 Development of E (E 0.3 January 1999)
since ca. 2005 “Deep Learning”
2008 E 1.0

SLIDE 6

Deep Learning

SLIDE 7

Deep Learning - Introduction

◮ Instance of machine learning ◮ Typical setting: Supervised learning

◮ Large number of pre-classified examples ◮ Examples are presented with expected output ◮ System learns classification/evaluation

◮ Result: Trained model

◮ Will provide classification/evaluation when presented with new input
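As an illustration, the whole supervised setting fits in a few lines for the simplest trainable model from the timeline, the perceptron. This is a toy sketch in plain Python (the training set and learning rate are invented for illustration), not a deep-learning system: pre-classified examples are presented together with their expected output, and the resulting trained model classifies new input.

```python
# Minimal supervised learning: train a perceptron on pre-classified
# examples of the boolean AND function, then apply the trained model.

def train_perceptron(examples, epochs=20, lr=0.1):
    """examples: list of ((x1, x2), expected_output) pairs."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), expected in examples:
            predicted = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
            error = expected - predicted          # the supervision signal
            w1 += lr * error * x1
            w2 += lr * error * x2
            b += lr * error
    return w1, w2, b

def classify(model, x1, x2):
    """Apply the trained model to (possibly new) input."""
    w1, w2, b = model
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

training_set = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
model = train_perceptron(training_set)
print([classify(model, x1, x2) for (x1, x2), _ in training_set])  # → [0, 0, 0, 1]
```

Since AND is linearly separable, the perceptron converges; the trained model then reproduces the expected classification on all four inputs.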

SLIDE 8

Deep Learning - Methods

◮ Application of known techniques on a new scale

◮ Supervised learning (classification/evaluation/association) ◮ Artificial neural networks ◮ Gradient-based learning/back-propagation

◮ New:

◮ Big networks ◮ Complex network structure

◮ Multiple sub-networks ◮ Convolution layers ◮ Recurrence

◮ (Mostly) raw input

◮ Feature extraction is part of the learning ◮ Encoding is part of the learning
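A minimal sketch of what gradient-based learning with back-propagation means beneath the frameworks: a two-layer network (one hidden sigmoid layer) trained on XOR, the classic function a single perceptron cannot represent. Everything here (network size, learning rate, seed, epoch count) is an arbitrary toy choice in plain Python.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_xor(epochs=2000, lr=0.5, seed=1):
    """Train a 2-2-1 sigmoid network on XOR by back-propagation.
    Returns the trained weights and the per-epoch squared error."""
    rng = random.Random(seed)
    w_h = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]  # hidden, incl. bias
    w_o = [rng.uniform(-1, 1) for _ in range(3)]                      # output, incl. bias
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    errors = []
    for _ in range(epochs):
        sq_err = 0.0
        for (x1, x2), target in data:
            # Forward pass through hidden layer and output unit.
            h = [sigmoid(w[0] * x1 + w[1] * x2 + w[2]) for w in w_h]
            out = sigmoid(w_o[0] * h[0] + w_o[1] * h[1] + w_o[2])
            sq_err += (out - target) ** 2
            # Backward pass: propagate the output error to all weights.
            d_out = (out - target) * out * (1 - out)
            for i in range(2):
                d_h = d_out * w_o[i] * h[i] * (1 - h[i])
                w_h[i][0] -= lr * d_h * x1
                w_h[i][1] -= lr * d_h * x2
                w_h[i][2] -= lr * d_h
            for i, hi in enumerate(h + [1.0]):
                w_o[i] -= lr * d_out * hi
        errors.append(sq_err)
    return (w_h, w_o), errors

model, errors = train_xor()
```

The per-epoch error lets one check that learning happens at all (the error shrinks over training); deep learning adds scale, better optimizers, and learned feature extraction on top of exactly this loop.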

SLIDE 10

Deep Learning - Successes

◮ AI used to have problems with “easy” tasks ◮ Deep learning successfully addresses these problems

◮ Image recognition ◮ Voice recognition ◮ Natural language translation ◮ Hard games

◮ Video games (real time) ◮ Go ◮ Poker

Deep learning drives resurgence of Artificial Intelligence!

SLIDE 11

Deep Learning - Why Now?

◮ Popularity of Deep Learning

◮ . . . slowly growing since the mid 2000s ◮ . . . explosively growing since mid 2010s

◮ Driven by “big hardware”

◮ Clusters of computers ◮ . . . with clusters of GPUs

◮ Driven by “big data”

◮ Large training sets ◮ Large individual examples

◮ Driven by Open Source

◮ Algorithms and models published under permissive licenses ◮ Many state-of-the-art machine learning libraries available

SLIDE 15

Deep Learning - A Parable

Cast of Characters:
Neanderthal Man
Sir Isaac Newton
Dr. Albert Einstein

SLIDE 22

Neanderthal Learning

Don’t sit under tree! Ugh! Round things fall down! Ugh!

SLIDE 28

Enlightenment!

F = ma

F = G m₁m₂ / r²

SLIDE 31

Compare and Contrast

F = ma
F = G m₁m₂ / r²

SLIDE 32

Compare and Contrast

Gµν = (8πG/c⁴) Tµν

E = mc²

SLIDE 34

Compare and Contrast

Round things fall down! Ugh!

SLIDE 35

Compare and Contrast

What an interesting early human. I wonder what he thinks!

SLIDE 37

Deep Learning Weaknesses

◮ Computationally expensive

◮ Big models use specialized hardware for training ◮ Even model application has non-trivial cost

◮ Knowledge is represented by a large set of distributed weights

◮ Low inherent level of abstraction ◮ Model is noisy

◮ Knowledge is largely inaccessible

◮ Hard to understand ◮ Hard to explain ◮ Hard to communicate

Unsupported claim (still true): Deep learning alone will run into natural limits!

SLIDE 38

Automated Theorem Proving

SLIDE 47

Theorem Proving: Big Picture

∀X : human(X) → mortal(X)
∀X : philosopher(X) → human(X)
philosopher(socrates)
?⊨ mortal(socrates)

Real World Problem → Formalized Problem → ATP → Proof / Countermodel / Timeout
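The formalized Socrates problem on this slide is decidable by a completely mechanical procedure. A toy sketch in Python (naive forward chaining over one-variable Horn rules, nothing like a real ATP) that derives the conjecture from the three axioms:

```python
# Naive forward chaining for the Socrates example: rules are
# (premise_predicate, conclusion_predicate) pairs over one variable X.
rules = [("human", "mortal"),        # ∀X: human(X) → mortal(X)
         ("philosopher", "human")]   # ∀X: philosopher(X) → human(X)
facts = {("philosopher", "socrates")}

changed = True
while changed:                       # iterate to a fixed point
    changed = False
    for premise, conclusion in rules:
        for pred, arg in list(facts):
            if pred == premise and (conclusion, arg) not in facts:
                facts.add((conclusion, arg))
                changed = True

print(("mortal", "socrates") in facts)   # → True: the conjecture follows
```

Two chaining steps derive human(socrates) and then mortal(socrates); full first-order ATP handles arbitrary quantified clauses via unification rather than ground matching.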

SLIDE 48

Logics of Interest

◮ Propositional logic

◮ SAT-solving: relatively independent sub-field

◮ First-order logics

◮ . . . with free symbols ◮ . . . with free symbols and equality ◮ . . . with background theories ◮ . . . with free symbols and background theories

◮ Higher-order logics

◮ Currently developing field

SLIDE 50

Contradiction and Saturation

◮ Proof by contradiction

◮ Assume negation of conjecture ◮ Show that axioms and negated conjecture imply falsity

◮ Saturation

◮ Convert problem to Clause Normal Form ◮ Systematically enumerate logical consequences of axioms and negated conjecture ◮ Goal: Explicit contradiction (empty clause)

◮ Redundancy elimination

◮ Use contracting inferences to simplify or eliminate some clauses

Search control problem: How and in which order do we enumerate consequences?

[Diagram: formula set → Clausifier → equisatisfiable clause set]
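The saturation procedure above can be illustrated at the propositional level. A toy given-clause loop in Python (a hypothetical sketch; real provers like E add term orderings, literal selection, and redundancy elimination, and termination is not guaranteed in general): clauses are frozensets of literals, negation is a leading "~", and the goal is the empty clause.

```python
def resolvents(c1, c2):
    """All binary resolvents of two propositional clauses."""
    out = []
    for lit in c1:
        comp = lit[1:] if lit.startswith("~") else "~" + lit
        if comp in c2:
            out.append((c1 - {lit}) | (c2 - {comp}))
    return out

def saturate(clauses):
    """Given-clause loop: returns True iff the empty clause is derived."""
    unprocessed = [frozenset(c) for c in clauses]
    processed = []
    while unprocessed:
        given = unprocessed.pop(0)          # clause selection: plain FIFO
        if not given:
            return True                     # empty clause: explicit contradiction
        processed.append(given)
        for other in processed:             # enumerate consequences of `given`
            for r in resolvents(given, other):
                r = frozenset(r)
                if r not in processed and r not in unprocessed:
                    unprocessed.append(r)
    return False                            # saturated without contradiction

# Axioms plus negated conjecture: p, p → q (as ~p ∨ q), and ¬q.
print(saturate([{"p"}, {"~p", "q"}, {"~q"}]))   # → True
```

From p and ~p ∨ q the loop derives q, which resolves with ~q to the empty clause, completing the proof by contradiction.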

SLIDE 51

Proof Search

# SZS status Theorem
# SZS output start CNFRefutation
[Full E proof object for TPTP problem PUZ001+1 (Pelletier No. 55, “Who killed Aunt Agatha?”): fof axiom steps with file references, clausification steps (fof_nnf, variable_rename, skolemize, split_conjunct), and inference steps (spm, rw, sr), each carrying its inference record. Most lines of the listing are truncated in this transcript; a readable excerpt follows on the next slide.]

SLIDE 52

Proof Search

# SZS output start CNFRefutation fof(pel55_4, axiom, (![X1]:![X2]:(killed(X1,X2)=>hates(X1,X2))), file(’PUZ001+1.p’, pel55_4)). ... fof(pel55, conjecture, (killed(agatha,agatha)), file(’PUZ001+1.p’, pel55)). ... fof(c_0_12, plain, ((lives(esk1_0)&killed(esk1_0,agatha))), inference(skolemize,[status(esa)], [inference(variable_rename,[status(thm)],[pel55_1])])). ... cnf(c_0_14,plain,(hates(X1,X2)|~killed(X1,X2)), inference(split_conjunct,[status(thm)],[c_0_11])). ... cnf(c_0_23,plain,(hates(esk1_0,agatha)), inference(spm,[status(thm)],[c_0_14, c_0_15])). ... cnf(c_0_45,plain,($false), inference(sr,[status(thm)],[inference(rw,[status(thm)], [c_0_15, c_0_43]), c_0_44]), [’proof’]). # SZS output end CNFRefutation

SLIDE 53

Proof Search and Choice Points

◮ First-order logic is semi-decidable

◮ Provers search for proof in infinite space ◮ . . . of possible derivations ◮ . . . of possible consequences

◮ Major choice points of Superposition calculus:

◮ Term ordering (which terms are bigger) ◮ (Negative) literal selection ◮ Selection of clauses for inferences (with the given clause algorithm)
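Of these choice points, clause selection is the one most provers expose as a tunable heuristic. A common scheme, sketched here in simplified, hypothetical Python (E, for instance, mixes priority queues in a configurable pick-given ratio, but its real queues evaluate clauses far more elaborately), interleaves "lightest clause first" with "oldest clause first" so that small clauses are preferred without starving old ones:

```python
import heapq
from itertools import count

class ClauseQueue:
    """Select given clauses by interleaving two orderings:
    symbol weight (here crudely: clause length) and FIFO age."""

    def __init__(self, ratio=5):
        self.ratio = ratio        # weight-based picks per one FIFO pick
        self.tick = 0
        self.by_weight = []       # heap of (weight, age, clause)
        self.by_age = []          # heap of (age, clause)
        self.age = count()
        self.picked = set()       # lazy deletion across the two heaps

    def push(self, clause):
        a = next(self.age)
        heapq.heappush(self.by_weight, (len(clause), a, clause))
        heapq.heappush(self.by_age, (a, clause))

    def pop(self):
        use_fifo = self.tick % (self.ratio + 1) == self.ratio
        self.tick += 1
        heap = self.by_age if use_fifo else self.by_weight
        while True:               # assumes the queue is non-empty
            clause = heapq.heappop(heap)[-1]
            if clause not in self.picked:   # skip entries already taken
                self.picked.add(clause)
                return clause
```

With ratio=2, pushing the clauses "pqr", "pq", "p", "pqrs" yields the pick order p, pq (lightest first), then pqr (FIFO turn: oldest not yet picked), then pqrs.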

SLIDE 54

Some Properties of ATP

◮ Individual operations cheap(ish)

◮ Computing one consequence is no problem ◮ Computing 1000 consequences is no problem

◮ But: Large/infinite search space

◮ 1000 consequences is usually enough for a proof ◮ . . . but rarely enough to find it!

◮ Combinatorial explosion

◮ High branching factor ◮ Simplification helps a lot ◮ . . . but not nearly enough!

SLIDE 57

Big Data and ATP

◮ Automated tuning of theorem provers since the 1990s

◮ Examples:

◮ E-SETHEO schedules ◮ E’s automatic auto mode ◮ Vampire’s black magic box

◮ Based on performance only

◮ Reason: Proof search traces are big!

◮ . . . really big! ◮ . . . and theorem provers are memory-limited anyway

◮ Ca. 2014: Something wonderful happens

◮ Hardware finally catches up ◮ Implementation techniques improve

What is wrong? The prover is not running out of memory!

We can finally afford to look DEEPLY into proofs!

SLIDE 58

Deep Reasoning

SLIDE 60

Vision: Search Control

◮ Long-term goal: Extract search control knowledge

◮ . . . from examples of successful proof searches ◮ . . . from examples of failing proof searches

◮ Primary use case: Clause selection

◮ Which of the current candidate consequences should be considered first? ◮ Extract good/bad search decisions from proof protocols

◮ It’s happening!

◮ Premise selection (Urban, Irving, et al.) ◮ Clause selection (Loos, Irving, Kaliszyk, et al.) - see next session
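The common shape of such work, reduced to a deliberately crude sketch (hand-invented features, toy protocol data, plain logistic regression; actual systems use learned term embeddings and real proof traces): clauses recorded during search are labeled by whether they ended up in the final proof, and a model is trained to score new clauses for selection.

```python
import math

def sigmoid(z):
    z = max(-60.0, min(60.0, z))      # clip to avoid float overflow
    return 1.0 / (1.0 + math.exp(-z))

def features(clause):
    """Hand-picked toy features: bias, literal count, symbol count,
    largest literal. (Deep approaches learn the encoding instead.)"""
    lits = clause.split("|")
    return [1.0, float(len(lits)), float(len(clause.replace(" ", ""))),
            float(max(len(l) for l in lits))]

def train(protocol, epochs=200, lr=0.1):
    """Logistic regression over (clause, used_in_proof) pairs."""
    w = [0.0] * 4
    for _ in range(epochs):
        for clause, label in protocol:
            x = features(clause)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i in range(4):
                w[i] += lr * (label - p) * x[i]   # gradient step on log-likelihood
    return w

def score(w, clause):
    """Higher score = try this clause earlier."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(clause))))

# Toy "proof protocol": clauses used in the proof (label 1) were short.
protocol = [("p(X)", 1), ("q(a)|r(a)", 1),
            ("p(f(f(a)))|q(f(b))|r(g(a,b))", 0),
            ("s(f(a),g(b,c))|t(h(a))", 0)]
w = train(protocol)
```

In a prover, such a learned score would replace or augment the symbol-count weight in the given-clause selection queue.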

SLIDE 62

Vision: Automated Scientist

◮ Setting: Background theory+examples

◮ Background theory in explicit logic ◮ Examples

◮ Process

◮ Deep learner hypothesizes relationship ◮ Hypothesis is converted to symbolic logic (Magic happens here) ◮ ATP system checks hypotheses for consistency with background theory

◮ Failure: Abduction can refine hypothesis ◮ Success: Tentatively add hypothesis to theory

◮ ATP system generates new consequences to test on examples

SLIDE 63

Vision: Fully Interactive AI

◮ Setting: Rational agent interacting with environment ◮ Deep learner:

◮ Vision ◮ Voice ◮ Language ◮ Suggest actions

◮ Symbolic reasoning system

◮ Hard-coded world knowledge ◮ Hard-coded constraints on behavior

SLIDE 64

The End

SLIDE 67

Conclusion

◮ Deep learning and symbolic reasoning are complementary ◮ Hardware is now finally sufficient for both

◮ . . . even in combined systems

◮ We’re looking forward to an interesting future

And when the time comes to decide whether to switch on the new, improved AI that is vastly superior to humans and will eliminate all errors, a couple of imperial bureaucrats will gather round a table, and one will say: “We’ve already paid for it, so let’s switch it on”. . .

Marc-Uwe Kling (as “the Kangaroo”)

SLIDE 68

Thank you!

Questions? Discussion?
