Project Proposal: Machine Learning Good Symbol Precedences


SLIDE 1

Project Proposal: Machine Learning Good Symbol Precedences¹

Filip Bártek Martin Suda

Czech Technical University in Prague, Czech Republic

September 16, 2020

¹Supported by the ERC Consolidator grant AI4REASON no. 649043 under the EU-H2020 programme, the Czech Science Foundation project 20-06390Y and the Grant Agency of the Czech Technical University in Prague, grant no. SGS20/215/OHK3/3T/37.
SLIDE 2

Outline

  • Motivation
  • Precedence recommender system
  • Architecture
  • Training
  • Experimental results

SLIDE 3

Context

Theorem prover of choice: Vampire

  • Automated theorem proving for first-order logic (FOL)
  • Refutation-based
  • Saturation-based
  • Superposition calculus
  • Symbol precedence
  • Simplification ordering on terms

SLIDE 4

Why does symbol precedence matter?

FOL problem: a = b ⇒ f(a, b) = f(b, b)

CNF (after negating the conjecture): a = b ∧ f(a, b) ≠ f(b, b)

Precedence [f, a, b] orders a < b, so b rewrites to a:
f(a, b) ≠ f(b, b) → f(a, a) ≠ f(b, b) → f(a, a) ≠ f(a, b) → f(a, a) ≠ f(a, a) → ⊥ (three rewrite steps)

Precedence [f, b, a] orders b < a, so a rewrites to b:
f(a, b) ≠ f(b, b) → f(b, b) ≠ f(b, b) → ⊥ (one rewrite step)
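A minimal Python sketch of this example (not Vampire's actual KBO machinery): ground terms are nested tuples, the equation a = b is oriented by the given precedence, and we count how many rewrite steps it takes until both sides of the disequation become identical, i.e. until the refutation closes.

```python
# Minimal rewriting sketch for the slide's example, assuming a two-constant
# signature where the precedence-larger constant rewrites to the smaller one.

def normalize(term, src, dst):
    """Exhaustively replace constant `src` by `dst`; return (normal form, #replacements)."""
    if term == src:
        return dst, 1
    if isinstance(term, tuple):
        head, *args = term
        total = 0
        new_args = []
        for arg in args:
            nf, n = normalize(arg, src, dst)
            new_args.append(nf)
            total += n
        return (head, *new_args), total
    return term, 0

def refutation_steps(precedence):
    """Rewrite steps to refute a = b AND f(a,b) != f(b,b) under `precedence`."""
    # Orient a = b: the constant that comes later in the precedence is larger
    # and rewrites to the smaller one.
    if precedence.index("a") < precedence.index("b"):
        src, dst = "b", "a"
    else:
        src, dst = "a", "b"
    lhs, rhs = ("f", "a", "b"), ("f", "b", "b")
    l_nf, l_n = normalize(lhs, src, dst)
    r_nf, r_n = normalize(rhs, src, dst)
    assert l_nf == r_nf  # both sides reach the same normal form -> contradiction
    return l_n + r_n
```

With precedence [f, a, b] this takes three steps; with [f, b, a] a single step, matching the derivations above.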

SLIDE 5

Precedence recommender system

  • First-order logic problem
  • Clause normal form (CNF), produced by Vampire (clausification mode)
  • Symbol embeddings, produced by a Graph Convolution Network
  • Symbol costs, produced by a feed-forward neural network
  • Symbol precedence: order symbols by their costs

SLIDE 6

Training data

Repeat:

  • 1. Sample a problem P from TPTP.
  • 2. Try to solve P using Vampire with two random precedences π0, π1.
  • 3. If π0 leads to a faster proof search than π1, store the training sample (P, π0, π1).

We train a classifier that decides: Is π0 better than π1?
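The sampling loop above can be sketched as follows. `proof_search_time` is a hypothetical stand-in for invoking Vampire with a given precedence and timing its proof search (returning None on failure), and problems are simplified to dicts carrying a symbol list.

```python
# Sketch of the training-data generation loop, assuming a caller supplies
# the problem list and a `proof_search_time(problem, precedence)` function.
import random

def random_precedence(symbols):
    """A uniformly random permutation of the problem's symbols."""
    pi = list(symbols)
    random.shuffle(pi)
    return pi

def generate_samples(problems, proof_search_time, n_samples):
    """Collect (problem, better_precedence, worse_precedence) triples."""
    samples = []
    while len(samples) < n_samples:
        P = random.choice(problems)
        pi0 = random_precedence(P["symbols"])
        pi1 = random_precedence(P["symbols"])
        t0 = proof_search_time(P, pi0)
        t1 = proof_search_time(P, pi1)
        # Store only when pi0 demonstrably led to a faster proof search.
        if t0 is not None and (t1 is None or t0 < t1):
            samples.append((P, pi0, pi1))
    return samples
```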

SLIDE 7

Model of “precedence π0 is better than π1”

  • 1. Trainable symbol cost model csym : Σ → R
  • 2. Precedence cost cprec : Precedences(Σ) → R:

cprec(π) = Σ_{1≤i≤|Σ|} csym(π(i)) · i

Ordering symbols in decreasing order by csym minimizes cprec.

  • 3. Precedence pair cost:

cpair(π0, π1) = cprec(π1) − cprec(π0)

  • 4. Probability that π0 is better than π1:

sigmoid(cpair(π0, π1))
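A small numeric sketch of steps 2-4, with made-up symbol costs for a three-symbol signature:

```python
# Precedence cost, pair cost, and the resulting probability, using
# illustrative (made-up) symbol costs.
import math

def c_prec(pi, c_sym):
    """cprec(pi) = sum over positions i = 1..|Sigma| of c_sym(pi(i)) * i."""
    return sum(c_sym[s] * i for i, s in enumerate(pi, start=1))

def p_pi0_better(pi0, pi1, c_sym):
    """sigmoid(cprec(pi1) - cprec(pi0)): close to 1 when pi0 is cheaper."""
    c_pair = c_prec(pi1, c_sym) - c_prec(pi0, c_sym)
    return 1.0 / (1.0 + math.exp(-c_pair))

c_sym = {"f": 3.0, "a": 2.0, "b": 1.0}              # made-up trained costs
best = sorted(c_sym, key=c_sym.get, reverse=True)    # decreasing cost minimizes cprec
```

Here `best` is ["f", "a", "b"] with cprec = 10, while the reversed order costs 14, so the model assigns the former a probability above 0.5 of being the better precedence.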

SLIDE 8

Classifier: Is precedence π0 better than π1?

  • Problem P → Vampire → clause normal form (CNF)
  • CNF → Graph Convolution Network → symbol embeddings
  • Symbol embeddings → feed-forward neural network → symbol costs
  • Symbol costs → order symbols by their costs → symbol precedence
  • π0, π1 → invert → π0⁻¹, π1⁻¹ → inverse precedence difference → normalize → normalized inverse precedence difference
  • Symbol costs, normalized inverse precedence difference → precedence pair cost
  • Precedence pair cost → loss (binary cross-entropy)

SLIDE 9

Graph Convolution Network example

a = b ∧ f(a, b) ≠ f(b, b)

Nodes:

  • a = b : clause
  • a = b : equality atom (+)
  • f(a, b) ≠ f(b, b) : clause
  • f(a, b) = f(b, b) : equality atom (−)
  • a : term; b : term; f(a, b) : term; f(b, b) : term
  • a : function; b : function; f : function

Edges labelled "argument 1" and "argument 2" connect atoms and terms to their arguments.
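A sketch of this graph as plain Python data. The node ids and the `contains`/`symbol` edge labels are illustrative naming of my own; only the node types and the `argument 1`/`argument 2` labels come from the slide.

```python
# Graph for a = b AND f(a,b) != f(b,b): typed nodes plus labelled edges,
# as a plain data structure a GCN implementation could consume.
nodes = {
    "c1": "clause",             # a = b
    "c2": "clause",             # f(a,b) != f(b,b)
    "e1": "equality atom (+)",  # a = b
    "e2": "equality atom (-)",  # f(a,b) = f(b,b), negated inside c2
    "t_a": "term", "t_b": "term", "t_fab": "term", "t_fbb": "term",
    "s_a": "function", "s_b": "function", "s_f": "function",
}
edges = [
    ("c1", "e1", "contains"), ("c2", "e2", "contains"),
    ("e1", "t_a", "argument 1"), ("e1", "t_b", "argument 2"),
    ("e2", "t_fab", "argument 1"), ("e2", "t_fbb", "argument 2"),
    ("t_fab", "t_a", "argument 1"), ("t_fab", "t_b", "argument 2"),
    ("t_fbb", "t_b", "argument 1"), ("t_fbb", "t_b", "argument 2"),
    ("t_a", "s_a", "symbol"), ("t_b", "s_b", "symbol"),
    ("t_fab", "s_f", "symbol"), ("t_fbb", "s_f", "symbol"),
]
```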

SLIDE 10

Preliminary experimental results

Figure: Accuracy versus training iterations (y-axis: accuracy 0.52 to 0.72; x-axis: 10k to 120k iterations)

Symbol cost model            Accuracy
Graph Convolution Network    0.70
Frequency heuristic          0.56

Dataset: 4,821 problems, 1,411,730 precedence pairs

SLIDE 11

Section 4 Backup slides

SLIDE 12

Symbol costs rationale

Symbol cost function csym : Σ → R is optimal on problem P iff ordering the symbols by their cost values in descending order yields an optimal symbol precedence π∗. This is true iff π∗ minimizes

Σ_{1≤i≤n} i · csym(π(i)) over all precedences π, where n = |ΣP|.

What is a good symbol cost function? How can we train symbol costs such that ordering symbols by them yields good precedences?
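A brute-force check of the minimization claim, over all permutations of a small symbol set with arbitrary example costs:

```python
# Verify that sorting symbols by descending cost minimizes
# sum_i i * c_sym(pi(i)) over all permutations (rearrangement inequality).
from itertools import permutations

def c_prec(pi, c_sym):
    """Weighted sum of symbol costs, position i weighted by i."""
    return sum(i * c_sym[s] for i, s in enumerate(pi, start=1))

c_sym = {"p": 4.0, "q": 2.5, "r": 1.0, "s": 0.5}  # arbitrary example costs
best = min(permutations(c_sym), key=lambda pi: c_prec(pi, c_sym))
descending = tuple(sorted(c_sym, key=c_sym.get, reverse=True))
```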

SLIDE 13

Training data

Model layers:

  • 1. Problem → symbol embeddings
  • 2. Symbol embedding → symbol cost
  • 3. Symbol costs → precedence cost

Let s ∈ Σ. Let Mc be a differentiable symbol cost model: csym(s) = Mc(fv(s)).

cprec(π) = C Σ_{1≤i≤n} csym(π(i)) · i = C Σ_{1≤i≤n} csym(si) · π⁻¹(si)

cprec(π) = C Σ_{1≤i≤n} csym(π(i)) · f(i) = C Σ_{1≤i≤n} csym(si) · f(π⁻¹(si))

C = 2/(n(n+1)), so that csym(s) = 1 for all s implies cprec(π) = 1 for all π.

cpair(π0, π1) = cprec(π1) − cprec(π0) = C Σ_{1≤i≤n} csym(si) · [π1⁻¹(si) − π0⁻¹(si)]

Loss: L(…)
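A quick check of the normalization constant: with C = 2/(n(n+1)) and unit symbol costs, every precedence gets cost exactly 1.

```python
# With c_sym(s) = 1 for every symbol, the weighted sum is C * (1 + 2 + ... + n)
# = C * n(n+1)/2 = 1 regardless of the permutation.
from itertools import permutations

def c_prec(pi, c_sym):
    """Normalized precedence cost with C = 2 / (n(n+1))."""
    n = len(pi)
    C = 2.0 / (n * (n + 1))
    return C * sum(c_sym[s] * i for i, s in enumerate(pi, start=1))

c_sym = {s: 1.0 for s in "pqrs"}
vals = [c_prec(pi, c_sym) for pi in permutations(c_sym)]
```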

SLIDE 14

Our math model of precedence cost: weighted sum of symbol costs. Show on an example that minimizing this expression corresponds to sorting in descending order. We search for csym such that cprec correlates with the quality of precedence. Why pairs of precedences? We are sure which of two is better but we are not sure what is a good (target) quality value of a precedence.

SLIDE 15

Graph Convolution Network schema

Node types: term or atom, predicate, function, argument, clause, +/− equality, +/− variable

Symbol features: in conjecture, introduced

SLIDE 16

GNN architecture

Trainable parameters are emphasized.

◮ For each node type: layer-0 node embedding
◮ For each layer:
  ◮ For each edge type: message model (dense layer)
    ◮ Input: source node embedding, source node features, edge features
    ◮ Output: message
  ◮ Message aggregation step (sum all incoming messages for each node and incoming edge type)
  ◮ For each node type: node aggregation model (dense layer)
    ◮ Input: node embedding, aggregated message for each incoming edge type
    ◮ Output: node embedding
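A minimal numpy sketch of one such layer, simplified to a single node type and a single edge type (the real architecture has one message model per edge type and one aggregation model per node type, plus node and edge features):

```python
# One message-passing layer: dense message model, sum-aggregation of
# incoming messages per node, dense node-update model. All sizes are
# illustrative; weights would normally be trained, not random.
import numpy as np

rng = np.random.default_rng(0)
dim = 4
n_nodes = 3
edges = [(0, 1), (1, 2), (0, 2)]           # (source, target) pairs, one edge type

h = rng.normal(size=(n_nodes, dim))         # layer-0 node embeddings (trainable)
W_msg = rng.normal(size=(dim, dim))         # message model (dense layer, no bias)
W_upd = rng.normal(size=(2 * dim, dim))     # node aggregation model

# Message step: each edge sends a transformed copy of its source embedding.
agg = np.zeros((n_nodes, dim))
for src, dst in edges:
    agg[dst] += np.tanh(h[src] @ W_msg)     # sum all incoming messages per node

# Update step: new embedding from old embedding plus aggregated messages.
h_new = np.tanh(np.concatenate([h, agg], axis=1) @ W_upd)
```

Stacking several such layers lets information propagate from clauses through atoms and terms down to the symbol nodes whose final embeddings feed the symbol cost model.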

SLIDE 17

References

Geoff Sutcliffe. The TPTP problem library and associated infrastructure. From CNF to TH0, TPTP v6.4.0. Journal of Automated Reasoning, 59(4):483–502, 2017. doi: 10.1007/s10817-017-9407-7.

SLIDE 18

Experimental setup

◮ Only predicate precedences are learned. Function symbols are ordered by invfreq.
◮ Problems from TPTP Sutcliffe [2017]: CNF and FOF (clausified with Vampire)
  ◮ Ptrain (8217 problems): at most 200 predicate symbols, at least 1 out of 24 random predicate precedences yields success
  ◮ Ptest (15751 problems): at most 1024 predicate symbols
◮ 5 evaluation iterations (splits): 1000 training problems and 1000 test problems
◮ 100 precedences per training problem
◮ Vampire configuration: time limit 10 seconds, memory limit 8192 MB, literal comparison mode predicate, function symbol precedence invfreq, saturation algorithm discount, age-weight ratio 1:10, AVATAR disabled
◮ 10⁶ symbol pair samples to train M

SLIDE 19

Elastic-Net feature coefficients of individual symbols

Training set   Arity   Frequency   Unit frequency
0              −.98    .01         −.01
1                      .56         .44
2                      .36         .64
3              −.88    .04
4                      .93         .07
Ptrain                 .43         .57

Symbol order: descending by predicted value

◮ Sets 1, 2, 4, Ptrain:
  ◮ Descending by frequency: low frequency ∼ early inference
  ◮ Similar to invfreq and vampire --sp frequency
◮ Sets 0, 3:
  ◮ Ascending by arity: high arity ∼ early inference
  ◮ Similar to vampire --sp arity