Measuring progress to predict success: Can a good proof strategy be - - PowerPoint PPT Presentation



SLIDE 1

Measuring progress to predict success: Can a good proof strategy be evolved?

Giles Reger1, Martin Suda2

1School of Computer Science, University of Manchester, UK 2TU Wien, Vienna, Austria

AITP 2017 – Obergurgl, March 29, 2017

SLIDE 2

Vampire advertising

Vampire:
- a “reasonably well-performing” first-order ATP
- unfortunately not open source
- known to be notoriously hard to obtain

SLIDE 3

Vampire advertising

Vampire:
- a “reasonably well-performing” first-order ATP
- unfortunately not open source
- known to be notoriously hard to obtain

Things are actually not so dark:
- email me, I can send you an executable
- find one at https://www.starexec.org/
- (don’t) look for the source at:
  http://www.cs.miami.edu/~tptp/CASC/J8/Entrants.html

SLIDE 4

Outline

1. The role of strategies in modern ATPs
2. Proving with orderings
3. How to evolve a precedence?
4. Conclusion

SLIDE 5

The role of strategies in modern ATPs

Strategy:
- there are many, many options for setting up the proving process
- a strategy is a concrete way to do this setup

SLIDE 6

The role of strategies in modern ATPs

Strategy:
- there are many, many options for setting up the proving process
- a strategy is a concrete way to do this setup

From the ATP lore: if a strategy solves a problem, then it typically solves it within a short amount of time (say, 5 seconds).

SLIDE 7

The role of strategies in modern ATPs

Strategy:
- there are many, many options for setting up the proving process
- a strategy is a concrete way to do this setup

From the ATP lore: if a strategy solves a problem, then it typically solves it within a short amount of time (say, 5 seconds).

What does this mean?
- There is no single best strategy
- It is usually better to start something else than to wait
- Strategy scheduling (the portfolio approach)
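The strategy-scheduling idea can be sketched as a simple portfolio driver (a minimal sketch; the strategy names and the `prove` callback below are hypothetical, not Vampire's API):

```python
def run_schedule(strategies, prove, total_budget):
    """Try each (strategy, time_slice) in order; stop at the first success.

    `prove(strategy, limit)` is a hypothetical callback that returns True
    iff the strategy finds a proof within `limit` seconds.
    """
    spent = 0.0
    for strategy, limit in strategies:
        limit = min(limit, total_budget - spent)
        if limit <= 0:
            break
        if prove(strategy, limit):
            return strategy
        spent += limit
    return None

# Toy example: only "s2" succeeds, and only when given at least 5 s.
fake_prover = lambda s, t: s == "s2" and t >= 5
schedule = [("s1", 3), ("s2", 5), ("s3", 10)]
print(run_schedule(schedule, fake_prover, 300))  # prints "s2"
```

Because any one strategy either solves a problem quickly or (typically) not at all, spending the whole budget on short slices of many strategies beats waiting on a single one.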

SLIDE 8

CASC-mode: a conditional schedule of strategies

case Property::FNE:
    if (atoms > 2000) {
      quick.push("dis+1011_40_bs=on:cond=on:gs=on:gsaa=from_current:nwc=1:sfr=on:ssfp=1000:ssfq=2.0:smm=sco:ssnc=none:updr=off_282");
      quick.push("lrs+1011_3_nwc=1:stl=90:sos=on:spl=off:sp=reverse_arity_133");
      quick.push("dis-10_5_cond=fast:gsp=input_only:gs=on:gsem=off:nwc=1:sas=minisat:sos=all:spl=off:sp=occurrence_190");
      quick.push("lrs+1011_5_cond=fast:gs=on:nwc=2.5:stl=30:sd=3:ss=axioms:sdd=off:sfr=on:ssfp=100000:ssfq=1.0:smm=sco:ssnc=none:sp=occurrence_278");
      quick.push("lrs-3_5:4_bs=on:bsr=on:cond=on:fsr=off:gsp=input_only:gs=on:gsaa=from_current:gsem=on:lcm=predicate:nwc=1.1:nicw=on:sas=minisat:stl=
    } else if (atoms > 1200) {
      quick.push("lrs+1011_5_cond=fast:gs=on:nwc=2.5:stl=30:sd=3:ss=axioms:sdd=off:sfr=on:ssfp=100000:ssfq=1.0:smm=sco:ssnc=none:sp=occurrence_2");
      quick.push("dis+1011_8_bsr=unit_only:cond=fast:fsr=off:gs=on:gsaa=full_model:nm=0:nwc=1:sas=minisat:sos=all:sfr=on:ssfp=4000:ssfq=1.1:smm=off:sp
      quick.push("dis+11_7_gs=on:gsaa=full_model:lcm=predicate:nwc=1.1:sas=minisat:ssac=none:ssfp=1000:ssfq=1.0:smm=sco:sp=reverse_arity:urr=ec_only_8
      quick.push("ins+11_5_br=off:gs=on:gsem=off:igbrr=0.9:igrr=1/64:igrp=1400:igrpq=1.1:igs=1003:igwr=on:lcm=reverse:nwc=1:spl=off:urr=on:updr=off_11
    } else {
      quick.push("dis+11_7_16");
      quick.push("dis+1011_5:4_gs=on:gsssp=full:nwc=1.5:sas=minisat:ssac=none:sdd=off:sfr=on:ssfp=40000:ssfq=1.4:smm=sco:ssnc=all:sp=reverse_arity:upd
      quick.push("dis+1011_40_bs=on:cond=on:gs=on:gsaa=from_current:nwc=1:sfr=on:ssfp=1000:ssfq=2.0:smm=sco:ssnc=none:updr=off_14");
      ...

SLIDE 9

Results for FOF division of CASC 2016 [1]

[1] www.cs.miami.edu/~tptp/CASC/J8/WWWFiles/ResultsPlots.html

SLIDE 10

Outline

1. The role of strategies in modern ATPs
2. Proving with orderings
3. How to evolve a precedence?
4. Conclusion

SLIDE 11

The Saturation Loop

Saturate a set of clauses with respect to an inference system

[Diagram: the clause containers Active, Passive, and Unprocessed]

- Initially: the input clauses start in passive; active is empty
- Given clause: selected from passive as the next clause to be processed
- Move the given clause from passive to active and perform all inferences between the given clause and the clauses in active
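The given-clause loop above can be sketched in a few lines (a toy illustration, not Vampire's implementation; `infer` and `select` stand in for the real calculus and clause-selection heuristic):

```python
def saturate(input_clauses, infer, select, max_activations=10_000):
    """Given-clause saturation sketch.

    `infer(given, active)` returns the clauses derivable between `given`
    and the active set; `select(passive)` picks and removes the next
    given clause from the passive set.
    """
    active, passive = set(), set(input_clauses)
    for _ in range(max_activations):
        if not passive:
            return "saturated", active
        given = select(passive)
        if given == frozenset():  # the empty clause: a proof was found
            return "proof", active
        active.add(given)
        for new in infer(given, active):
            if new not in active and new not in passive:
                passive.add(new)
    return "timeout", active

def resolve(given, active):
    """Toy inference: binary resolution on clauses of signed integers."""
    out = []
    for other in active:
        for lit in given:
            if -lit in other:
                out.append(frozenset((given - {lit}) | (other - {-lit})))
    return out

# {p} and {~p} resolve to the empty clause, so this refutes immediately.
status, _ = saturate([frozenset({1}), frozenset({-1})], resolve, lambda p: p.pop())
print(status)  # prints "proof"
```

The clause-selection heuristic (here just `set.pop`) and the ordering restrictions on inferences are exactly the knobs that a strategy configures.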

SLIDE 12

The superposition calculus (≻)

Resolution:
  from A ∨ C1 and ¬A′ ∨ C2 derive (C1 ∨ C2)θ
Factoring:
  from A ∨ A′ ∨ C derive (A ∨ C)θ
where, for both inferences, θ = mgu(A, A′), A is not an equality literal, and A and ¬A′ are (strictly) maximal in their respective clauses.

Superposition:
  from l ≃ r ∨ C1 and L[s]p ∨ C2 derive (L[r]p ∨ C1 ∨ C2)θ
  from l ≃ r ∨ C1 and t[s]p ⊗ t′ ∨ C2 derive (t[r]p ⊗ t′ ∨ C1 ∨ C2)θ
where θ = mgu(l, s) and rθ ⋡ lθ; for the left rule, L[s] is not an equality literal; for the right rule, ⊗ stands for either ≃ or ≄, and t′θ ⋡ t[s]θ.

EqualityResolution:
  from s ≄ t ∨ C derive Cθ, where θ = mgu(s, t)
EqualityFactoring:
  from s ≃ t ∨ s′ ≃ t′ ∨ C derive (t ≄ t′ ∨ s′ ≃ t′ ∨ C)θ, where θ = mgu(s, s′), tθ ⋡ sθ, and t′θ ⋡ s′θ

SLIDE 13

How important could an ordering be?

Consider proving the formula ψ = ⋀_{i=1,…,n} (ai ∨ bi) → ⋀_{i=1,…,n} (ai ∨ bi)

SLIDE 14

How important could an ordering be?

Consider proving the formula ψ = ⋀_{i=1,…,n} (ai ∨ bi) → ⋀_{i=1,…,n} (ai ∨ bi)

- a naive clausification of ¬ψ has 2^n + n clauses!

SLIDE 15

How important could an ordering be?

Consider proving the formula ψ = ⋀_{i=1,…,n} (ai ∨ bi) → ⋀_{i=1,…,n} (ai ∨ bi)

- a naive clausification of ¬ψ has 2^n + n clauses!
- this goes down to 3n + 1 with the Tseitin encoding: (ai ∨ bi), (¬mi ∨ ¬ai), (¬mi ∨ ¬bi), (m1 ∨ … ∨ mn), where mi is a name for ¬ai ∧ ¬bi

SLIDE 16

How important could an ordering be?

Consider proving the formula ψ = ⋀_{i=1,…,n} (ai ∨ bi) → ⋀_{i=1,…,n} (ai ∨ bi)

- a naive clausification of ¬ψ has 2^n + n clauses!
- this goes down to 3n + 1 with the Tseitin encoding: (ai ∨ bi), (¬mi ∨ ¬ai), (¬mi ∨ ¬bi), (m1 ∨ … ∨ mn), where mi is a name for ¬ai ∧ ¬bi

Question: What will superposition derive under an ordering where mi ≻ aj and mi ≻ bj for every i and j?
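The two clause counts are easy to check numerically (a quick sanity check added here, not part of the talk):

```python
def naive_cnf_size(n):
    # Clausifying ¬ψ directly: distributing the disjunction of the n
    # conjuncts (¬ai ∧ ¬bi) yields one clause per choice of ¬ai or ¬bi
    # from each conjunct (2^n clauses), plus the n clauses (ai ∨ bi).
    return 2**n + n

def tseitin_size(n):
    # Tseitin naming: 3 clauses per index i, plus (m1 ∨ … ∨ mn).
    return 3 * n + 1

for n in (5, 10, 20):
    print(n, naive_cnf_size(n), tseitin_size(n))  # n=10: 1034 vs 31
```

Already at n = 20 the naive encoding produces over a million clauses while the Tseitin encoding needs 61, which is why the ordering's treatment of the names mi matters so much.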

SLIDE 17

Choosing an ordering

Orderings typically used in ATPs: Knuth-Bendix Ordering (KBO), Lexicographic Path Ordering (LPO)

SLIDE 18

Choosing an ordering

Orderings typically used in ATPs: the Knuth-Bendix Ordering (KBO) and the Lexicographic Path Ordering (LPO)

- Both are determined by a precedence on the problem’s signature: a linear order on the symbols occurring in the problem
- With n symbols, there are n! possibilities for choosing the precedence

SLIDE 19

Choosing an ordering

Orderings typically used in ATPs: the Knuth-Bendix Ordering (KBO) and the Lexicographic Path Ordering (LPO)

- Both are determined by a precedence on the problem’s signature: a linear order on the symbols occurring in the problem
- With n symbols, there are n! possibilities for choosing the precedence
- ATPs typically provide a few schemes for fixing the precedence
  Examples — Vampire: arity, reverse arity, occurrence; E: frequency (invfreq), and many more
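Such precedence schemes amount to sorting the signature by some key. A toy sketch (the signature representation and tie-breaking rules here are assumptions for illustration, not Vampire's or E's exact definitions):

```python
from collections import Counter

def arity_precedence(signature):
    """Order symbols by arity, ascending, breaking ties by name
    (roughly the spirit of Vampire's 'arity' scheme)."""
    return sorted(signature, key=lambda s: (signature[s], s))

def frequency_precedence(signature, clauses):
    """Order symbols by how often they occur in the problem
    (the 'invfreq'-style idea; details are a sketch, not E's exact rule)."""
    counts = Counter(sym for clause in clauses for sym in clause)
    return sorted(signature, key=lambda s: (counts[s], s))

# Hypothetical signature: symbol -> arity, and clauses as symbol lists.
sig = {"f": 2, "g": 1, "a": 0}
clauses = [["f", "a", "a"], ["g", "a"]]
print(arity_precedence(sig))               # prints ['a', 'g', 'f']
print(frequency_precedence(sig, clauses))  # prints ['f', 'g', 'a']
```

Each scheme picks one of the n! linear orders; the point of the talk is that the choice among them is far from neutral.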

SLIDE 20

Playing with precedence

Rules of the game: fix a single theorem proving strategy in Vampire:

  -av off -sa discount -awr 10 -lcm predicate

Then, by varying only the precedence, try to solve as many TPTP problems as possible.

SLIDE 21

Playing with precedence

Rules of the game: fix a single theorem proving strategy in Vampire:

  -av off -sa discount -awr 10 -lcm predicate

Then, by varying only the precedence, try to solve as many TPTP problems as possible.

The TPTP library, version 6.4.0, contains 17280 first-order problems.

SLIDE 22

Playing with precedence

Rules of the game: fix a single theorem proving strategy in Vampire:

  -av off -sa discount -awr 10 -lcm predicate

Then, by varying only the precedence, try to solve as many TPTP problems as possible.

The TPTP library, version 6.4.0, contains 17280 first-order problems.
- 9277 solved by “arity” in 300 s

SLIDE 23

Playing with precedence

Rules of the game: fix a single theorem proving strategy in Vampire:

  -av off -sa discount -awr 10 -lcm predicate

Then, by varying only the precedence, try to solve as many TPTP problems as possible.

The TPTP library, version 6.4.0, contains 17280 first-order problems.
- 9277 solved by “arity” in 300 s
- 9457 solved by “frequency” in 300 s (Thank you, Stephan!)

SLIDE 24

Playing with precedence

Rules of the game: fix a single theorem proving strategy in Vampire:

  -av off -sa discount -awr 10 -lcm predicate

Then, by varying only the precedence, try to solve as many TPTP problems as possible.

The TPTP library, version 6.4.0, contains 17280 first-order problems.
- 9277 solved by “arity” in 300 s
- 9457 solved by “frequency” in 300 s (Thank you, Stephan!)
- ∼12500 solved in 300 s by either casc or casc_sat mode

SLIDE 25

How good is a random precedence?

From the previous page:
- 9277 by “arity” in 300 s
- 9457 by “frequency” in 300 s

SLIDE 26

How good is a random precedence?

From the previous page:
- 9277 by “arity” in 300 s
- 9457 by “frequency” in 300 s

Shuffle once:
- ∼7100 solved with a random precedence (3 s)
- ∼8450 solved with a random precedence (60 s)
- ∼9100 solved with a random precedence (300 s)

SLIDE 27

How good is a random precedence?

From the previous page:
- 9277 by “arity” in 300 s
- 9457 by “frequency” in 300 s

Shuffle once:
- ∼7100 solved with a random precedence (3 s)
- ∼8450 solved with a random precedence (60 s)
- ∼9100 solved with a random precedence (300 s)

Shuffle a few times:
- 9387 solved in the union of 9 independent random-precedence 60 s runs (1678 problems in the grey zone)

SLIDE 28

Scheduler with Dice and Harmonic numbers

Question: If the only way to vary a strategy were to randomise the precedence, how many TPTP problems could I solve given a time limit of 300 s per problem?

SLIDE 29

Scheduler with Dice and Harmonic numbers

Question: If the only way to vary a strategy were to randomise the precedence, how many TPTP problems could I solve given a time limit of 300 s per problem?

The setup: for i = 1 to 100, run over TPTP with seed = i and time limit 300.0/i s

17280 · H100 · 300 s ≈ 311 days of computation

SLIDE 30

Scheduler with Dice and Harmonic numbers

Question: If the only way to vary a strategy were to randomise the precedence, how many TPTP problems could I solve given a time limit of 300 s per problem?

The setup: for i = 1 to 100, run over TPTP with seed = i and time limit 300.0/i s

17280 · H100 · 300 s ≈ 311 days of computation

How many slices could a reasonably good schedule use?

SLIDE 31

Scheduler with Dice and Harmonic numbers

Question: If the only way to vary a strategy were to randomise the precedence, how many TPTP problems could I solve given a time limit of 300 s per problem?

The setup: for i = 1 to 100, run over TPTP with seed = i and time limit 300.0/i s

17280 · H100 · 300 s ≈ 311 days of computation

How many slices could a reasonably good schedule use?

3.0s (7093) 3.0s (330) 3.1s (192) 3.2s (111) 3.3s (101) 4.4s (163) 4.5s (87) 4.8s (79) 5.0s (64) 6.2s (108) 9.6s (156) 11.1s (104) 11.5s (64) 21.4s (169) 205.3s (736)

Solves 9557 problems (9566 on validation set)
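The ≈ 311 days figure is just the harmonic-series budget worked out, which is easy to reproduce:

```python
def schedule_cost(problems, slices, base_limit):
    """Total CPU time for running every problem with time limits
    base_limit/1, base_limit/2, ..., base_limit/slices seconds."""
    harmonic = sum(1 / i for i in range(1, slices + 1))  # H_slices
    return problems * harmonic * base_limit  # seconds

seconds = schedule_cost(17280, 100, 300)
print(round(seconds / 86400))  # prints 311 (days), matching the slide
```

Because H100 ≈ 5.19, a hundred geometrically shrinking slices cost only about five times a single full-length pass over the library.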

SLIDE 32

Outline

1. The role of strategies in modern ATPs
2. Proving with orderings
3. How to evolve a precedence?
4. Conclusion

SLIDE 33

The Slowly-Growing-Search-Space heuristic

SGSS in a nutshell: A strategy that leads to a slowly growing search space will likely be more successful at finding a proof (in reasonable time) than a strategy that leads to a rapidly growing one.

SLIDE 34

The Slowly-Growing-Search-Space heuristic

SGSS in a nutshell: A strategy that leads to a slowly growing search space will likely be more successful at finding a proof (in reasonable time) than a strategy that leads to a rapidly growing one.

Intuition: Can we find the proof before the prover chokes? Since it is hard to predict whether we are getting close… try to postpone the choking until we (hopefully) get there.

SLIDE 35

The Slowly-Growing-Search-Space heuristic

SGSS in a nutshell: A strategy that leads to a slowly growing search space will likely be more successful at finding a proof (in reasonable time) than a strategy that leads to a rapidly growing one.

Intuition: Can we find the proof before the prover chokes? Since it is hard to predict whether we are getting close… try to postpone the choking until we (hopefully) get there.

Successfully applied in previous work on literal selection [RSV16]

SLIDE 36

Using SGSS to look for a good precedence

Main complication: Ordering must be fixed during the entire proof attempt

SLIDE 37

Using SGSS to look for a good precedence

Main complication: Ordering must be fixed during the entire proof attempt Idea Look for strategies which minimize the number of derived clauses after a certain (small) number of iterations of the saturation loop.

SLIDE 38

Using SGSS to look for a good precedence

Main complication: the ordering must be fixed during the entire proof attempt

Idea: look for strategies which minimize the number of derived clauses after a certain (small) number of iterations of the saturation loop.

Can this work in practice? Probably not under tight time constraints. In any case: are there actually any good precedences out there?

Possible application: solving hard, previously unsolved problems

SLIDE 39

A(n a)typical development of the passive set’s size

SLIDE 40

A(n a)typical development of the passive set’s size

SLIDE 41

Can it possibly work?

Using the 9 independent random-precedence 60 second runs, on the set P of 1678 problems from the “grey zone”:
- record the size of passive every 100 activations
- compute nine respective sums si until the first stream stops:
    S1(p) = s1(p, 0) + s1(p, 100) + s1(p, 200) + …
    …
    S9(p) = s9(p, 0) + s9(p, 100) + s9(p, 200) + …
- denote the average of Si(p) over the (un)successful runs i as S̄(un)succ(p)
- For how many p ∈ P is S̄succ(p) < S̄unsucc(p)?

SLIDE 42

Can it possibly work?

Using the 9 independent random-precedence 60 second runs, on the set P of 1678 problems from the “grey zone”:
- record the size of passive every 100 activations
- compute nine respective sums si until the first stream stops:
    S1(p) = s1(p, 0) + s1(p, 100) + s1(p, 200) + …
    …
    S9(p) = s9(p, 0) + s9(p, 100) + s9(p, 200) + …
- denote the average of Si(p) over the (un)successful runs i as S̄(un)succ(p)
- For how many p ∈ P is S̄succ(p) < S̄unsucc(p)?

Answer: 1130 (out of 1669)
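The comparison described above can be sketched for a single problem (a minimal illustration with made-up run data; the real experiment records passive-set sizes from actual prover runs):

```python
from statistics import mean

def sgss_signal(runs):
    """For one problem: given [(succeeded, [passive sizes every 100
    activations]), ...], compare the mean summed passive size of
    successful vs unsuccessful runs.  Sums are truncated at the length
    of the shortest recorded stream, as on the slide."""
    cutoff = min(len(sizes) for _, sizes in runs)
    sums = [(ok, sum(sizes[:cutoff])) for ok, sizes in runs]
    succ = [s for ok, s in sums if ok]
    fail = [s for ok, s in sums if not ok]
    if not succ or not fail:
        return None  # cannot compare (all runs succeeded or all failed)
    return mean(succ) < mean(fail)  # True: the SGSS prediction holds

# Hypothetical data: two successful runs with small search spaces,
# one unsuccessful run that blows up.
runs = [(True, [10, 20, 30]), (True, [12, 25]), (False, [50, 90, 200])]
print(sgss_signal(runs))  # prints True
```

On 1130 of the 1669 comparable grey-zone problems this signal points the right way, which is what justifies using the passive-set-size sum as a fitness function.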

SLIDE 43

How did we evolve, then?

SLIDE 44

How did we evolve, then?

Optimize_precedence(p, t1, t2):
  run “frequency” for 1 s to establish act_cnt
  spawn a population Π of n random precedences
  (the fitness of π ∈ Π is Sπ(p): the sum of the passive set sizes during a run on p, summed at every step from 0 to act_cnt activations)
  loop for t1 seconds:
    pick a π ∈ Π randomly (adaptively)
    perturb π to obtain π′
    evaluate π′ as above
    keep the better of π and π′
  finally, run with πbest for t2 seconds
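The perturb-and-keep-the-better loop is a stochastic local search over permutations. A toy sketch of that core step (with a made-up fitness standing in for Sπ(p); this is an illustration of the idea, not Vampire's code):

```python
import random

def optimize_precedence(fitness, n_symbols, steps, rng=None):
    """(1+1)-style local search over precedences (permutations of the
    signature): perturb by swapping two symbols, keep the better of the
    two.  `fitness` stands in for the passive-set-size sum; lower is
    better."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    best = list(range(n_symbols))
    rng.shuffle(best)
    for _ in range(steps):
        cand = best[:]
        i, j = rng.randrange(n_symbols), rng.randrange(n_symbols)
        cand[i], cand[j] = cand[j], cand[i]  # perturb: swap two symbols
        if fitness(cand) <= fitness(best):
            best = cand  # keep the better (or equal) candidate
    return best

# Toy fitness: number of out-of-order adjacent pairs (0 when sorted).
toy = lambda p: sum(p[i] > p[i + 1] for i in range(len(p) - 1))
result = optimize_precedence(toy, 6, 2000)
print(result, toy(result))
```

The real fitness evaluation is far more expensive (a truncated prover run per candidate), which is why the loop is bounded by a wall-clock budget t1 rather than a step count.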

SLIDE 45

Results

First a test run:
- optimizing for 300 s and a final run for 60 s: 8965
- “control” where the final run uses “frequency”: 8888

SLIDE 46

Results

First a test run:
- optimizing for 300 s and a final run for 60 s: 8965
- “control” where the final run uses “frequency”: 8888

The “long” run (1200 s optimizing, 300 s final run): 9604
- solved a rating 1.0 problem: SWV978-1

SLIDE 47

Results

First a test run:
- optimizing for 300 s and a final run for 60 s: 8965
- “control” where the final run uses “frequency”: 8888

The “long” run (1200 s optimizing, 300 s final run): 9604
- solved a rating 1.0 problem: SWV978-1

How many problems were solved in total?
- “frequency” 300 s: 9457 (40 uniques)
- all the “harmonic” runs: 10030 (202 uniques)
- the long optimizing run: 9604 (87 uniques)
In total: 10176

SLIDE 48

Conclusion

Lessons learned:
- A good ordering can make a difference
- If out of ideas, check out what E does
- The slowly-growing-search-space heuristic works!

SLIDE 49

Conclusion

Lessons learned:
- A good ordering can make a difference
- If out of ideas, check out what E does
- The slowly-growing-search-space heuristic works!

Future work:
- Where else could SGSS be applied?
- How to make it more useful in a time-critical setting?

SLIDE 50

Conclusion

Lessons learned:
- A good ordering can make a difference
- If out of ideas, check out what E does
- The slowly-growing-search-space heuristic works!

Future work:
- Where else could SGSS be applied?
- How to make it more useful in a time-critical setting?

Thank you for your attention!