Hints for AVATAR (and some more)
Martin Suda
Czech Technical University in Prague, Czech Republic
PIWo 2019, Prague, October 2019

“Interactive Theorem Proving” with ATPs
Some people actually use ATPs to do math!
  - e.g., Bob Veroff and Michael Kinyon using Otter, Prover9, Mace4
  - questions from algebra: axiom bases for Boolean algebras, ortho-lattices, loop theory
  - targeting open problems (e.g. the AIM conjecture)
In what sense interactive?
  - a single proof attempt (ATP call) usually does not solve it
  - trying different formulations / axiomatizations
  - trying various additional assumptions and learning from them
➥ By the way, these attempts may run for weeks!

Hints
What is a hint?
  - a clause supplied by the user as part of the input
  - whenever a newly derived clause C subsumes a hint clause, this C is prioritized for selection
➥ Hints are a means for steering the proof search!
Where do hints come from?
  - the (expert) user just thinks of some
  - more realistically: clauses from proofs of similar theorems or of the same theorem but under different assumptions
➥ Hope that similar theorems can be proved using similar intermediate steps.
How to come up with hints automatically?
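
The matching test behind hints can be sketched as follows. This is only an illustration under assumptions: a toy term representation (strings starting with an uppercase letter are variables, tuples are function or predicate applications) and the set-based definition of subsumption (C subsumes D if some substitution maps every literal of C onto a literal of D). The names match, subsumes and matches_a_hint are hypothetical and are not taken from Otter, Prover9 or Vampire.

```python
from typing import Dict, List, Optional, Tuple, Union

Term = Union[str, tuple]      # variable/constant name, or (symbol, arg, ...)
Literal = Tuple[bool, Term]   # (polarity, atom)
Clause = List[Literal]


def is_var(t: Term) -> bool:
    """Toy convention: a string starting with an uppercase letter is a variable."""
    return isinstance(t, str) and t[:1].isupper()


def match(pattern: Term, target: Term,
          subst: Dict[str, Term]) -> Optional[Dict[str, Term]]:
    """One-way matching: extend subst so that pattern instantiated by it
    equals target. Variables of target are treated like constants."""
    if is_var(pattern):
        if pattern in subst:
            return subst if subst[pattern] == target else None
        return {**subst, pattern: target}
    if isinstance(pattern, str) or isinstance(target, str):
        return subst if pattern == target else None
    if pattern[0] != target[0] or len(pattern) != len(target):
        return None
    for p, t in zip(pattern[1:], target[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst


def subsumes(c: Clause, d: Clause,
             subst: Optional[Dict[str, Term]] = None) -> bool:
    """True if one substitution maps every literal of c onto a literal of d."""
    subst = {} if subst is None else subst
    if not c:
        return True
    (pol, atom), rest = c[0], c[1:]
    for d_pol, d_atom in d:
        if pol == d_pol:
            extended = match(atom, d_atom, subst)
            if extended is not None and subsumes(rest, d, extended):
                return True
    return False


def matches_a_hint(derived: Clause, hints: List[Clause]) -> bool:
    """A derived clause would be prioritized if it subsumes some hint."""
    return any(subsumes(derived, h) for h in hints)


# Hint: p(f(a), b) | q(a)
hint = [(True, ("p", ("f", "a"), "b")), (True, ("q", "a"))]
general = [(True, ("p", ("f", "X"), "Y"))]   # p(f(X), Y)
too_strict = [(True, ("p", "Y", "Y"))]       # p(Y, Y)
print(matches_a_hint(general, [hint]), matches_a_hint(too_strict, [hint]))
```

A prover would typically back this test with term indexing rather than the naive backtracking search used here.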

AVATAR: a reminder
AVATAR [Voronkov’14]
  - modern architecture of first-order theorem provers
  - integrates saturation with a SAT solver (or an SMT solver)
  - efficient realization of the clause splitting rule
  - instead of one monolithic proof search, a sequence of proof searches on (much) smaller sub-problems
  - implemented in theorem prover Vampire
  - shown highly successful in practice

AVATAR architecture overview
[Diagram: FO solver, Splitting Interface, Base (SAT or SMT) solver]
FO solver → Splitting Interface:
  - New splittable clause: C_1 ∨ ... ∨ C_n
  - New contradiction: ⊥ ← [C_1], ..., [C_n]
Splitting Interface → FO solver (Update model):
  - Assert C ← [C]
  - Remove component C
Splitting Interface → Base (SAT or SMT) solver:
  - Insert split clause [C_1] ∨ ... ∨ [C_n]
  - Insert contradiction clause ¬[C_1] ∨ ... ∨ ¬[C_n]
  - Solve
Base (SAT or SMT) solver → Splitting Interface:
  - Model or Unsatisfiable
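
How a splittable clause C_1 ∨ ... ∨ C_n turns into a split clause [C_1] ∨ ... ∨ [C_n] can be sketched like this. It is a simplified illustration, not Vampire's implementation: literals are plain strings, variables are recognised by an uppercase first letter, and components are named by their sorted literal strings (a real prover would identify components up to variable renaming). The function names are hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple


def variables(literal: str) -> Set[str]:
    """Toy convention: identifiers starting with an uppercase letter are variables."""
    return set(re.findall(r"\b[A-Z]\w*", literal))


def split_components(clause: List[str]) -> List[List[str]]:
    """Group literals so that different groups share no variables.
    A ground literal always ends up in a component of its own."""
    components: List[dict] = []
    for lit in clause:
        vs = variables(lit)
        touching = [c for c in components if c["vars"] & vs]
        rest = [c for c in components if not (c["vars"] & vs)]
        merged = {
            "lits": [l for c in touching for l in c["lits"]] + [lit],
            "vars": set(vs).union(*(c["vars"] for c in touching)),
        }
        components = rest + [merged]
    return [c["lits"] for c in components]


def name_components(components: List[List[str]],
                    names: Dict[Tuple[str, ...], str]) -> List[str]:
    """Map each component to a propositional name, reusing known names.
    The returned list stands for the split clause given to the base solver."""
    split_clause = []
    for comp in components:
        key = tuple(sorted(comp))
        names.setdefault(key, f"c{len(names) + 1}")
        split_clause.append(names[key])
    return split_clause


names: Dict[Tuple[str, ...], str] = {}
clause = ["p(X, Y)", "~q(Y, a)", "r(Z)", "s(b)"]
components = split_components(clause)
print(components)
print(name_components(components, names))
```

Here p(X, Y) and ~q(Y, a) stay together because they share Y, while r(Z) and the ground literal s(b) each form a separate component.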

Boosting AVATAR with hints
Instead of waiting for the user to supply hints for problem P . . .
. . . attempt P using AVATAR and collect as hints the first-order parts of the clauses appearing in the sub-proofs of the so far derived contradiction clauses
DEMO!
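
The collection step can be pictured as a walk over the derivation graph. The sketch below assumes a hypothetical record type Step (first-order part, split assertions, parent ids) and treats every step with an empty first-order part as a contradiction clause; it does not reflect Vampire's internal data structures.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple


@dataclass(frozen=True)
class Step:
    fo_part: Tuple[str, ...]                   # first-order literals; () means "false"
    assertions: FrozenSet[str] = frozenset()   # split components the clause depends on
    parents: Tuple[int, ...] = ()              # ids of the premises


def collect_hints(derivation: Dict[int, Step]) -> Set[Tuple[str, ...]]:
    """Return the first-order parts of all clauses that occur in the
    sub-proof of some contradiction clause derived so far."""
    hints: Set[Tuple[str, ...]] = set()
    seen: Set[int] = set()
    stack = [i for i, s in derivation.items() if not s.fo_part]
    while stack:
        i = stack.pop()
        if i in seen:
            continue
        seen.add(i)
        step = derivation[i]
        if step.fo_part:            # the contradiction itself carries no literals
            hints.add(step.fo_part)
        stack.extend(step.parents)
    return hints


derivation = {
    1: Step(("p(a)",)),
    2: Step(("~p(X)", "q(X)")),
    3: Step(("~q(a)",), frozenset({"c1"})),
    4: Step(("q(a)",), parents=(1, 2)),
    5: Step((), frozenset({"c1"}), parents=(3, 4)),   # contradiction depending on c1
    6: Step(("r(b)",)),                               # not used by any contradiction
}
hints = collect_hints(derivation)
print(sorted(hints))
```

Note that the assertions are dropped on purpose: only the first-order part of each clause becomes a hint.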

Outline
1 Hints for AVATAR
2 An Experiment
3 What is a Significant Improvement?

Experimental setup
Vampire setup:
  --saturation_algorithm discount   (for stability)
  --age_weight_ratio 1:10           (works well with discount)
  --time_limit 10                   (reasonable time to finish)
Computers:
  - either Starexec or CTU’s (slurm) cluster
The benchmark:
  - TPTP v7.2.0
  - 17573 eligible first-order problems
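
For reference, the options above could be assembled into a command line as sketched below. The three options are the ones listed on the slide; the binary name "vampire", the placeholder problem file and the helper vampire_command are assumptions made for illustration, and the command is only printed, not executed.

```python
import shlex
from typing import List


def vampire_command(problem: str, binary: str = "vampire") -> List[str]:
    """Build the argument list for one proof attempt with the slide's settings.
    `binary` and `problem` are placeholders chosen by the caller."""
    return [
        binary,
        "--saturation_algorithm", "discount",
        "--age_weight_ratio", "1:10",
        "--time_limit", "10",
        problem,
    ]


cmd = vampire_command("problem.p")
print(shlex.join(cmd))
```

Keeping the arguments as a list makes it straightforward to hand them to a job scheduler or to subprocess.run without shell quoting issues.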

Results (on Starexec)
configuration   solved   uniques   additional
base              7914         0         7914
base+hints        7882         2           62
sac               8100        13          299
sac+hints         8106        13           23
base = -sa discount -awr 10 -t 10
sac = --split_at_activation on
Experimented with AVATAR flushing; also not very interesting

Let’s try a different benchmark . . .
MIZAR bushy “small”
  - 57 880 problems translated from the MIZAR library
  - (base: -sa discount -awr 10 -t 10 -sac on)
Results
configuration   solved   uniques
base             14843       184
base+hints       14873       214
(30 problems is approx. 0.5‰ of the benchmark size)
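
A quick check of the arithmetic behind the last remark, using only the numbers from this slide (the variable names are illustrative):

```python
total_problems = 57880
solved_base, solved_with_hints = 14843, 14873
uniques_base, uniques_with_hints = 184, 214

gain = solved_with_hints - solved_base          # net difference in solved problems
unique_gain = uniques_with_hints - uniques_base
per_mille = 1000 * gain / total_problems        # share of the whole benchmark

print(gain, unique_gain, round(per_mille, 2))
```

The gain is about half a per mille of the benchmark, i.e. roughly 0.05 percent, and it equals the difference between the two "uniques" counts.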

So, should we be sad and abandon the idea?
Maybe, but . . .
  - maybe it only gets interesting with really hard problems!
  - maybe we should have a smarter notion of similarity! demodulate hints?
  - maybe we need restarts to prevent the prover from choking
  - we should also try strengthening the theory with reasonable additional assumptions, as routinely done by Veroff et al.
➥ Ongoing and future work!

A Methodology Question
When should we get excited about a new technique?
1 The idea looks clever and sophisticated
➥ Could aim for a pure theory paper at CADE!
