Project Proposal: Machine Learning Good Symbol Precedences


SLIDE 1

Project Proposal: Machine Learning Good Symbol Precedences¹

Filip Bártek Martin Suda

Czech Technical University in Prague, Czech Republic

September 16, 2020

¹Supported by the ERC Consolidator grant AI4REASON no. 649043 under the EU-H2020 programme, the Czech Science Foundation project 20-06390Y and the Grant Agency of the Czech Technical University in Prague, grant no. SGS20/215/OHK3/3T/37.
SLIDE 2

Outline

  • Motivation
  • Precedence recommender system
  • Architecture
  • Training
  • Experimental results

SLIDE 3

Context

Theorem prover of choice: Vampire

  • Automated theorem proving for first-order logic (FOL)
  • Refutation-based
  • Saturation-based
  • Superposition calculus
  • Symbol precedence
  • Simplification ordering on terms

SLIDE 4

Why does symbol precedence matter?

FOL problem: a = b ⇒ f(a, b) = f(b, b)

CNF (after negating the conjecture): a = b ∧ f(a, b) ≠ f(b, b)

Precedence [f, a, b] orders a < b, so b rewrites to a:
f(a, b) ≠ f(b, b) → f(a, a) ≠ f(b, b) → f(a, a) ≠ f(a, b) → f(a, a) ≠ f(a, a) → ⊥ (three rewrite steps)

Precedence [f, b, a] orders b < a, so a rewrites to b:
f(a, b) ≠ f(b, b) → f(b, b) ≠ f(b, b) → ⊥ (one rewrite step)
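A minimal Python sketch of this example (not Vampire's actual KBO machinery): ground terms are nested tuples, the equation a = b is oriented by the given precedence, and we count how many rewrite steps it takes until both sides of the disequation become identical, i.e. until the refutation closes.

```python
# Minimal rewriting sketch for the slide's example, assuming a two-constant
# signature where the precedence-larger constant rewrites to the smaller one.

def normalize(term, src, dst):
    """Exhaustively replace constant `src` by `dst`; return (normal form, #replacements)."""
    if term == src:
        return dst, 1
    if isinstance(term, tuple):
        head, *args = term
        total = 0
        new_args = []
        for arg in args:
            nf, n = normalize(arg, src, dst)
            new_args.append(nf)
            total += n
        return (head, *new_args), total
    return term, 0

def refutation_steps(precedence):
    """Rewrite steps to refute a = b AND f(a,b) != f(b,b) under `precedence`."""
    # Orient a = b: the constant that comes later in the precedence is larger
    # and rewrites to the smaller one.
    if precedence.index("a") < precedence.index("b"):
        src, dst = "b", "a"
    else:
        src, dst = "a", "b"
    lhs, rhs = ("f", "a", "b"), ("f", "b", "b")
    l_nf, l_n = normalize(lhs, src, dst)
    r_nf, r_n = normalize(rhs, src, dst)
    assert l_nf == r_nf  # both sides reach the same normal form -> contradiction
    return l_n + r_n
```

With precedence [f, a, b] this takes three steps; with [f, b, a] a single step, matching the derivations above.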

SLIDE 5

Precedence recommender system

  • First-order logic problem
  • Clause normal form (CNF), produced by Vampire (clausification mode)
  • Symbol embeddings, produced by a Graph Convolution Network
  • Symbol costs, produced by a feed-forward neural network
  • Symbol precedence: order symbols by their costs

SLIDE 6

Training data

Repeat:

  • 1. Sample a problem P from TPTP.
  • 2. Try to solve P using Vampire with two random precedences π0, π1.
  • 3. If π0 leads to a faster proof search than π1, store the training sample (P, π0, π1).

We train a classifier that decides: Is π0 better than π1?
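The sampling loop above can be sketched as follows. `proof_search_time` is a hypothetical stand-in for invoking Vampire with a given precedence and timing its proof search (returning None on failure), and problems are simplified to dicts carrying a symbol list.

```python
# Sketch of the training-data generation loop, assuming a caller supplies
# the problem list and a `proof_search_time(problem, precedence)` function.
import random

def random_precedence(symbols):
    """A uniformly random permutation of the problem's symbols."""
    pi = list(symbols)
    random.shuffle(pi)
    return pi

def generate_samples(problems, proof_search_time, n_samples):
    """Collect (problem, better_precedence, worse_precedence) triples."""
    samples = []
    while len(samples) < n_samples:
        P = random.choice(problems)
        pi0 = random_precedence(P["symbols"])
        pi1 = random_precedence(P["symbols"])
        t0 = proof_search_time(P, pi0)
        t1 = proof_search_time(P, pi1)
        # Store only when pi0 demonstrably led to a faster proof search.
        if t0 is not None and (t1 is None or t0 < t1):
            samples.append((P, pi0, pi1))
    return samples
```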

SLIDE 7

Model of “precedence π0 is better than π1”

  • 1. Trainable symbol cost model csym : Σ → R
  • 2. Precedence cost cprec : Precedences(Σ) → R:

cprec(π) = Σ_{1≤i≤|Σ|} csym(π(i)) · i

Ordering symbols in decreasing order by csym minimizes cprec.

  • 3. Precedence pair cost:

cpair(π0, π1) = cprec(π1) − cprec(π0)

  • 4. Probability that π0 is better than π1:

sigmoid(cpair(π0, π1))
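A small numeric sketch of steps 2-4, with made-up symbol costs for a three-symbol signature:

```python
# Precedence cost, pair cost, and the resulting probability, using
# illustrative (made-up) symbol costs.
import math

def c_prec(pi, c_sym):
    """cprec(pi) = sum over positions i = 1..|Sigma| of c_sym(pi(i)) * i."""
    return sum(c_sym[s] * i for i, s in enumerate(pi, start=1))

def p_pi0_better(pi0, pi1, c_sym):
    """sigmoid(cprec(pi1) - cprec(pi0)): close to 1 when pi0 is cheaper."""
    c_pair = c_prec(pi1, c_sym) - c_prec(pi0, c_sym)
    return 1.0 / (1.0 + math.exp(-c_pair))

c_sym = {"f": 3.0, "a": 2.0, "b": 1.0}              # made-up trained costs
best = sorted(c_sym, key=c_sym.get, reverse=True)    # decreasing cost minimizes cprec
```

Here `best` is ["f", "a", "b"] with cprec = 10, while the reversed order costs 14, so the model assigns the former a probability above 0.5 of being the better precedence.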

SLIDE 8

Classifier: Is precedence π0 better than π1?

  • Problem P → Vampire → clause normal form (CNF)
  • CNF → Graph Convolution Network → symbol embeddings
  • Symbol embeddings → feed-forward neural network → symbol costs
  • Symbol costs → order symbols by their costs → symbol precedence
  • π0, π1 → invert → π0⁻¹, π1⁻¹ → inverse precedence difference → normalize → normalized inverse precedence difference
  • Symbol costs, normalized inverse precedence difference → precedence pair cost
  • Precedence pair cost → loss (binary cross-entropy)

SLIDE 9

Graph Convolution Network example

a = b ∧ f(a, b) ≠ f(b, b)

Nodes:

  • a = b : clause
  • a = b : equality atom (+)
  • f(a, b) ≠ f(b, b) : clause
  • f(a, b) = f(b, b) : equality atom (−)
  • a : term; b : term; f(a, b) : term; f(b, b) : term
  • a : function; b : function; f : function

Edges labelled "argument 1" and "argument 2" connect atoms and terms to their arguments.
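A sketch of this graph as plain Python data. The node ids and the `contains`/`symbol` edge labels are illustrative naming of my own; only the node types and the `argument 1`/`argument 2` labels come from the slide.

```python
# Graph for a = b AND f(a,b) != f(b,b): typed nodes plus labelled edges,
# as a plain data structure a GCN implementation could consume.
nodes = {
    "c1": "clause",             # a = b
    "c2": "clause",             # f(a,b) != f(b,b)
    "e1": "equality atom (+)",  # a = b
    "e2": "equality atom (-)",  # f(a,b) = f(b,b), negated inside c2
    "t_a": "term", "t_b": "term", "t_fab": "term", "t_fbb": "term",
    "s_a": "function", "s_b": "function", "s_f": "function",
}
edges = [
    ("c1", "e1", "contains"), ("c2", "e2", "contains"),
    ("e1", "t_a", "argument 1"), ("e1", "t_b", "argument 2"),
    ("e2", "t_fab", "argument 1"), ("e2", "t_fbb", "argument 2"),
    ("t_fab", "t_a", "argument 1"), ("t_fab", "t_b", "argument 2"),
    ("t_fbb", "t_b", "argument 1"), ("t_fbb", "t_b", "argument 2"),
    ("t_a", "s_a", "symbol"), ("t_b", "s_b", "symbol"),
    ("t_fab", "s_f", "symbol"), ("t_fbb", "s_f", "symbol"),
]
```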

SLIDE 10

Preliminary experimental results

Figure: Accuracy versus training iterations (y-axis: accuracy 0.52 to 0.72; x-axis: 10k to 120k iterations)

Symbol cost model            Accuracy
Graph Convolution Network    0.70
Frequency heuristic          0.56

Dataset: 4,821 problems, 1,411,730 precedence pairs

SLIDE 11

Section 4 Backup slides

SLIDE 12

Symbol costs rationale

Symbol cost function csym : Σ → R is optimal on problem P iff ordering the symbols by their cost values in descending order yields an optimal symbol precedence π∗. This is true iff π∗ minimizes

Σ_{1≤i≤n} i · csym(π(i)) over all precedences π, where n = |ΣP|.

What is a good symbol cost function? How can we train symbol costs such that ordering symbols by them yields good precedences?
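A brute-force check of the minimization claim, over all permutations of a small symbol set with arbitrary example costs:

```python
# Verify that sorting symbols by descending cost minimizes
# sum_i i * c_sym(pi(i)) over all permutations (rearrangement inequality).
from itertools import permutations

def c_prec(pi, c_sym):
    """Weighted sum of symbol costs, position i weighted by i."""
    return sum(i * c_sym[s] for i, s in enumerate(pi, start=1))

c_sym = {"p": 4.0, "q": 2.5, "r": 1.0, "s": 0.5}  # arbitrary example costs
best = min(permutations(c_sym), key=lambda pi: c_prec(pi, c_sym))
descending = tuple(sorted(c_sym, key=c_sym.get, reverse=True))
```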

SLIDE 13

Training data

Model layers:

  • 1. Problem → symbol embeddings
  • 2. Symbol embedding → symbol cost
  • 3. Symbol costs → precedence cost

Let s ∈ Σ. Let Mc be a differentiable symbol cost model: csym(s) = Mc(fv(s)).

cprec(π) = C Σ_{1≤i≤n} csym(π(i)) · i = C Σ_{1≤i≤n} csym(si) · π⁻¹(si)

cprec(π) = C Σ_{1≤i≤n} csym(π(i)) · f(i) = C Σ_{1≤i≤n} csym(si) · f(π⁻¹(si))

C = 2/(n(n+1)), so that csym(s) = 1 for all s implies cprec(π) = 1 for all π.

cpair(π0, π1) = cprec(π1) − cprec(π0) = C Σ_{1≤i≤n} csym(si) · [π1⁻¹(si) − π0⁻¹(si)]

Loss: L(…)
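A quick check of the normalization constant: with C = 2/(n(n+1)) and unit symbol costs, every precedence gets cost exactly 1.

```python
# With c_sym(s) = 1 for every symbol, the weighted sum is C * (1 + 2 + ... + n)
# = C * n(n+1)/2 = 1 regardless of the permutation.
from itertools import permutations

def c_prec(pi, c_sym):
    """Normalized precedence cost with C = 2 / (n(n+1))."""
    n = len(pi)
    C = 2.0 / (n * (n + 1))
    return C * sum(c_sym[s] * i for i, s in enumerate(pi, start=1))

c_sym = {s: 1.0 for s in "pqrs"}
vals = [c_prec(pi, c_sym) for pi in permutations(c_sym)]
```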

SLIDE 14

Our math model of precedence cost: weighted sum of symbol costs. Show on an example that minimizing this expression corresponds to sorting in descending order. We search for csym such that cprec correlates with the quality of precedence. Why pairs of precedences? We are sure which of two is better but we are not sure what is a good (target) quality value of a precedence.

SLIDE 15

Graph Convolution Network schema

Node types: term or atom, predicate, function, argument, clause, +/− equality, +/− variable

Symbol features: in conjecture, introduced

SLIDE 16

GNN architecture

Trainable parameters are emphasized.

◮ For each node type: layer-0 node embedding
◮ For each layer:
  ◮ For each edge type: message model (dense layer)
    ◮ Input: source node embedding, source node features, edge features
    ◮ Output: message
  ◮ Message aggregation step (sum all incoming messages for each node and incoming edge type)
  ◮ For each node type: node aggregation model (dense layer)
    ◮ Input: node embedding, aggregated message for each incoming edge type
    ◮ Output: node embedding
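A minimal numpy sketch of one such layer, simplified to a single node type and a single edge type (the real architecture has one message model per edge type and one aggregation model per node type, plus node and edge features):

```python
# One message-passing layer: dense message model, sum-aggregation of
# incoming messages per node, dense node-update model. All sizes are
# illustrative; weights would normally be trained, not random.
import numpy as np

rng = np.random.default_rng(0)
dim = 4
n_nodes = 3
edges = [(0, 1), (1, 2), (0, 2)]           # (source, target) pairs, one edge type

h = rng.normal(size=(n_nodes, dim))         # layer-0 node embeddings (trainable)
W_msg = rng.normal(size=(dim, dim))         # message model (dense layer, no bias)
W_upd = rng.normal(size=(2 * dim, dim))     # node aggregation model

# Message step: each edge sends a transformed copy of its source embedding.
agg = np.zeros((n_nodes, dim))
for src, dst in edges:
    agg[dst] += np.tanh(h[src] @ W_msg)     # sum all incoming messages per node

# Update step: new embedding from old embedding plus aggregated messages.
h_new = np.tanh(np.concatenate([h, agg], axis=1) @ W_upd)
```

Stacking several such layers lets information propagate from clauses through atoms and terms down to the symbol nodes whose final embeddings feed the symbol cost model.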

SLIDE 17

References

Geoff Sutcliffe. The TPTP problem library and associated infrastructure. From CNF to TH0, TPTP v6.4.0. Journal of Automated Reasoning, 59(4):483–502, 2017. doi: 10.1007/s10817-017-9407-7.

SLIDE 18

Experimental setup

◮ Only predicate precedences are learned. Function symbols are ordered by invfreq.
◮ Problems from TPTP Sutcliffe [2017]: CNF and FOF (clausified with Vampire)
  ◮ Ptrain (8217 problems): at most 200 predicate symbols, at least 1 out of 24 random predicate precedences yields success
  ◮ Ptest (15751 problems): at most 1024 predicate symbols
◮ 5 evaluation iterations (splits): 1000 training problems and 1000 test problems
◮ 100 precedences per training problem
◮ Vampire configuration: time limit 10 seconds, memory limit 8192 MB, literal comparison mode predicate, function symbol precedence invfreq, saturation algorithm discount, age-weight ratio 1:10, AVATAR disabled
◮ 10⁶ symbol pair samples to train M

SLIDE 19

Elastic-Net feature coefficients of individual symbols

Training set   Arity   Frequency   Unit frequency
0              −.98    .01         −.01
1                      .56         .44
2                      .36         .64
3              −.88    .04
4                      .93         .07
Ptrain                 .43         .57

Symbol order: descending by predicted value

◮ Sets 1, 2, 4, Ptrain:
  ◮ Descending by frequency: low frequency ∼ early inference
  ◮ Similar to invfreq and vampire --sp frequency
◮ Sets 0, 3:
  ◮ Ascending by arity: high arity ∼ early inference
  ◮ Similar to vampire --sp arity