logic is everywhere: The Core Method

SLIDE 1

logika je všude
logic is everywhere
logika je svuda
Logika ada di mana-mana
Hikmat har Jaga Hai
a lógica está em toda parte
Logik ist überall
la logique est partout
Mantık her yerde
la logica è dappertutto
la lógica está por todas partes
Logica este peste tot

The First Order CORE Method

Steffen Hölldobler
International Center for Computational Logic
Technische Universität Dresden
Germany

◮ The Core Method
◮ Logic Programs
◮ Mapping Interpretations to Real Numbers
◮ Approximation of Interpretations
◮ Constructive Approaches
◮ Implementation
◮ Open Problems

SLIDE 2

The CORE Method

◮ Relate logic programs and connectionist systems.
◮ Embed interpretations into (vectors of) real numbers.
◮ Hence, obtain an embedded version $f_P$ of the $T_P$-operator.
◮ Construct a network computing one application of $f_P$.
◮ Add recurrent connections from the output layer to the input layer.
◮ Compute (or approximate) the least fixed point of $T_P$.

[Figure: Symbolic System and Connectionist System, connected by embedding and extraction; labels: writable, readable, trainable]

SLIDE 3

Logic Programs

◮ A logic program $P$ over a first-order language $L$ is a finite set of clauses $A \leftarrow L_1 \wedge \ldots \wedge L_n$, where $A$ is an atom, the $L_i$ are literals and $n \geq 0$.
◮ $T_L$ is the set of all ground terms over $L$.
◮ $B_L$ is the set of all ground atoms over $L$, called the Herbrand base.
◮ A Herbrand interpretation $I$ is a mapping $B_L \to \{\top, \bot\}$.
◮ $2^{B_L}$ is the set of all Herbrand interpretations.
◮ $gP$ is the set of all ground instances of clauses in $P$.
◮ Immediate consequence operator $T_P : 2^{B_L} \to 2^{B_L}$:
  $T_P(I) = \{A \mid \text{there is a clause } A \leftarrow L_1 \wedge \ldots \wedge L_n \in gP \text{ such that } I \models L_1 \wedge \ldots \wedge L_n\}$.
◮ $I$ is a supported model iff $T_P(I) = I$.

SLIDE 4

Two Examples

◮ Natural numbers
    n0              % 0 is a natural number.
    nsX ← nX        % The successor sX is a natural number if X is a natural number.
◮ Even and odd numbers
    e0              % 0 is an even number.
    esX ← oX        % The successor of an odd X is even.
    oX ← ¬eX        % If X is not even then it is odd.
  ⊲ Herbrand base $B_L = \{e0, es0, \ldots, o0, os0, \ldots\}$
  ⊲ Some interpretations
      $I_1 = \{es^{2m}0 \mid m \geq 1\}$
      $I_2 = \{os^{2m+1}0 \mid m \geq 0\}$
      $I_3 = I_1 \cap I_2 = \emptyset$
      $I_4 = I_1 \cup I_3$

SLIDE 5

The Immediate Consequence Operator

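◮ $T_P(I) = \{A \mid \text{there is a clause } A \leftarrow L_1 \wedge \ldots \wedge L_n \in gP \text{ such that } I \models L_1 \wedge \ldots \wedge L_n\}$.
◮ Natural numbers {n0, nsX ← nX}
    $\emptyset \mapsto \{n0\}$
    $\{n0\} \mapsto \{n0, ns0\}$
    $\{n0, ns0\} \mapsto \{n0, ns0, nss0\}$
    $B_L \mapsto B_L$
◮ Even and odd numbers {e0, esX ← oX, oX ← ¬eX}
    $\emptyset \mapsto \{e0, oX \mid X \in T_L\}$
    $\{oX \mid X \in T_L\} \mapsto \{e0, esX, oX \mid X \in T_L\}$
    $\{es^{2m}0 \mid m \geq 0\} \mapsto \{e0, os^{2m+1}0 \mid m \geq 0\}$
    $\{os^{2m+1}0 \mid m \geq 0\} \mapsto \{e0, es^{2m+2}0, oX \mid m \geq 0, X \in T_L\}$ (every $oX$ is derived because the input contains no atom $eX$)
    $B_L \mapsto \{e0, esX \mid X \in T_L\}$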
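The mappings listed above can be reproduced with a small Python sketch. It is only an illustration under stated assumptions: an atom $ps^m0$ is encoded as the pair `(p, m)`, the infinite set of ground instances is cut off at a chosen `depth`, and the names `ground_nat`, `ground_even_odd` and `tp` are hypothetical helpers, not part of the original slides.

```python
def ground_nat(depth):
    """Ground instances of {n0, nsX <- nX} for X = s^k 0 with k < depth."""
    clauses = [(("n", 0), [])]                       # fact n0
    for k in range(depth):
        # n s^(k+1) 0 <- n s^k 0
        clauses.append((("n", k + 1), [(("n", k), True)]))
    return clauses


def ground_even_odd(depth):
    """Ground instances of {e0, esX <- oX, oX <- not eX}, cut off at depth."""
    clauses = [(("e", 0), [])]                       # fact e0
    for k in range(depth):
        clauses.append((("e", k + 1), [(("o", k), True)]))    # esX <- oX
    for k in range(depth + 1):
        clauses.append((("o", k), [(("e", k), False)]))       # oX <- not eX
    return clauses


def tp(clauses, interp):
    """One application of the immediate consequence operator.

    A body literal is a pair (atom, positive); it holds in interp
    when membership of the atom matches the sign of the literal.
    """
    return {
        head
        for head, body in clauses
        if all((atom in interp) == positive for atom, positive in body)
    }


nat = ground_nat(5)
step1 = tp(nat, set())      # corresponds to the empty set being mapped to {n0}
step2 = tp(nat, step1)      # corresponds to {n0} being mapped to {n0, ns0}
print(sorted(step1), sorted(step2))
```

Because of the depth cutoff, the sketch only agrees with $T_P$ on atoms up to that depth; the full operator works on the countably infinite Herbrand base.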

SLIDE 6

The Initial Approach

◮ Hölldobler, Kalinke, Störr 1999: Can the core method be extended to first-order logic programs?
◮ Problem
  ⊲ Given a logic program $P$ over a first-order language $L$ together with $T_P : 2^{B_L} \to 2^{B_L}$.
  ⊲ $B_L$ is countably infinite.
  ⊲ The method used to relate propositional logic and connectionist systems is not applicable.
  ⊲ How can the gap between the discrete, symbolic setting of logic and the continuous, real-valued setting of connectionist networks be closed?

SLIDE 7

The Goal

◮ Find $rep : 2^{B_L} \to \mathbb{R}$ and $f_P : \mathbb{R} \to \mathbb{R}$ such that the following conditions hold.
  ⊲ $T_P(I) = I'$ implies $f_P(rep(I)) = rep(I')$.
    $f_P(x) = x'$ implies $T_P(rep^{-1}(x)) = rep^{-1}(x')$.
    ◮◮ $f_P$ is a sound and complete encoding of $T_P$.
  ⊲ $T_P$ is a contraction on $2^{B_L}$ iff $f_P$ is a contraction on $\mathbb{R}$.
    ◮◮ The contraction property and fixed points are preserved.
  ⊲ $f_P$ is continuous on $\mathbb{R}$.
    ◮◮ A connectionist network approximating $f_P$ is known to exist.
    ◮◮ $f_P$ and, hence, $T_P$ can be trained by backpropagation and related training methods.

SLIDE 8

Level Mappings

◮ Let $P$ be a program over a first-order language $L$.
◮ A level mapping for $P$ is a function $l : B_L \to \mathbb{N}$.
  ⊲ We define $l(\neg A) = l(A)$.
◮ Examples
  ⊲ Natural numbers {n0, nsX ← nX}: $l(ns^m0) = m + 1$
  ⊲ Even and odd numbers {e0, esX ← oX, oX ← ¬eX}: $l(es^m0) = 2m + 1$, $l(os^m0) = 2m + 2$

SLIDE 9

Acyclic Logic Programs

◮ We can associate a metric $d_L$ with $L$ and a level mapping $l$ as follows. Let $I, J \in 2^{B_L}$:
  $d_L(I, J) = \begin{cases} 0 & \text{if } I = J, \\ 2^{-n} & \text{if } n \text{ is the smallest level on which } I \text{ and } J \text{ differ.} \end{cases}$
◮ Proposition (Fitting 1994): $(2^{B_L}, d_L)$ is a complete metric space.
◮ $P$ is said to be acyclic wrt a level mapping $l$ if for every $A \leftarrow L_1 \wedge \ldots \wedge L_n \in gP$ we find $l(A) > l(L_i)$ for all $i$.
◮ $P$ is said to be acyclic if $P$ is acyclic wrt some level mapping.
  ⊲ Both running examples are acyclic.
◮ Proposition: Let $P$ be an acyclic logic program wrt $l$ and $d_L$ the metric associated with $L$ and $l$. Then $T_P$ is a contraction on $(2^{B_L}, d_L)$.
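The metric and the contraction property can be checked numerically on the natural-numbers program. The following is a minimal sketch, assuming the atom $ns^m0$ is encoded as the integer `m` and interpretations are finite Python sets; `level`, `distance` and `tp_nat` are illustrative names only.

```python
def level(m):
    """Level of the atom n s^m 0 under l(n s^m 0) = m + 1."""
    return m + 1


def distance(i, j):
    """d_L(I, J): 0 if I = J, else 2^-n for the smallest level n on which they differ."""
    differing = i ^ j                     # symmetric difference of the two sets
    if not differing:
        return 0.0
    return 2.0 ** -min(level(m) for m in differing)


def tp_nat(i):
    """T_P for {n0, nsX <- nX}: n0 always holds, plus the successor of every atom in I."""
    return {0} | {m + 1 for m in i}


i, j = {0, 1}, {0}
before = distance(i, j)                   # I and J first differ on level 2
after = distance(tp_nat(i), tp_nat(j))    # the images first differ on level 3
print(before, after)
```

In this example one application of the operator halves the distance, which is what the contraction proposition predicts for an acyclic program.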

SLIDE 10

Mapping Interpretations to Real Numbers

◮ Let $l$ be a bijective level mapping.
◮ Let $rep$ be defined as $rep(I) = \sum_{A \in I} 4^{-l(A)}$.
◮ Example: $B_L = \{e0, o0, es0, os0, ess0, \ldots\}$
    $rep(\{e0\}) = 0.1_4 = 0.25_{10}$
    $rep(\{e0, es0, ess0\}) = 0.10101_4 \approx 0.27_{10}$
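The embedding can be computed exactly with rational arithmetic. This sketch assumes the level mapping $l(es^m0) = 2m + 1$, $l(os^m0) = 2m + 2$ from the level-mapping examples and encodes an atom as a pair `(predicate, m)`; the helper names `base4_digits` and `rep_inverse` are illustrative and not taken from the slides.

```python
from fractions import Fraction


def level(atom):
    """Bijective level mapping for the even/odd program."""
    pred, m = atom
    return 2 * m + 1 if pred == "e" else 2 * m + 2


def rep(interp):
    """rep(I) = sum over A in I of 4^(-l(A)), as an exact fraction."""
    return sum((Fraction(1, 4 ** level(a)) for a in interp), Fraction(0))


def base4_digits(x, n):
    """First n digits of x after the point in base 4."""
    digits = []
    for _ in range(n):
        x *= 4
        d = int(x)            # integer part is the next digit
        digits.append(d)
        x -= d
    return digits


def rep_inverse(x, max_level):
    """Levels l <= max_level whose base-4 digit is 1, i.e. the atoms that are true."""
    return {l for l, d in enumerate(base4_digits(x, max_level), start=1) if d == 1}


r = rep({("e", 0), ("e", 1), ("e", 2)})   # {e0, es0, ess0}
print(r, float(r), base4_digits(r, 5), rep_inverse(r, 6))
```

Each atom occupies exactly one base-4 digit, and only the digits 0 and 1 occur, which is why decoding by digits is unambiguous.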

SLIDE 11

The Set of all Embedded Interpretations

◮ Let $E = \{rep(I) \mid I \in 2^{B_L}\}$.
◮ Bader 2003: $E$ is the attractor of an iterated function system.
◮ Proposition: $rep$ is a bijection between $2^{B_L}$ and $E$. We have a sound and complete encoding of interpretations.
◮ Proposition: $E$ is compact.

SLIDE 12

Mapping Immediate Consequence Operators to Functions on the Reals

◮ We define $f_P : E \to E : r \mapsto rep(T_P(rep^{-1}(r)))$.

  $\begin{array}{ccc} I & \xrightarrow{\;T_P\;} & I' \\ \downarrow rep & & \downarrow rep \\ r & \xrightarrow{\;f_P\;} & r' \end{array}$

  We have a sound and complete encoding of $T_P$.
◮ Proposition: Let $P$ be an acyclic program wrt a bijective level mapping. Then $f_P$ is a contraction on $E$. The contraction property and fixed points are preserved.

SLIDE 13

Approximating Continuous Functions

◮ Corollary: $f_P$ is continuous.
◮ Recall Funahashi's theorem:
  ⊲ Let $K \subseteq \mathbb{R}^n$ be compact. Every continuous function $f : K \to \mathbb{R}$ can be uniformly approximated by input-output functions of 3-layer feed-forward networks.
◮ Theorem: $f_P$ can be uniformly approximated by input-output functions of 3-layer feed-forward networks.
  ⊲ $T_P$ can be approximated as well by applying $rep^{-1}$.
A connectionist network approximating the immediate consequence operator exists.

SLIDE 14

An Example

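◮ Consider $P = \{n0, nsX \leftarrow nX\}$ and let $l(ns^m0) = m + 1$.
  ⊲ $P$ is acyclic wrt $l$, $l$ is bijective, $rep(B_L) = \frac{1}{3}$.
  ⊲ $f_P(rep(I)) = 4^{-l(n0)} + \sum_{nX \in I} 4^{-l(nsX)} = 4^{-l(n0)} + \sum_{nX \in I} 4^{-(l(nX)+1)} = \frac{1 + rep(I)}{4}$.
◮ Approximation of $f_P$ to accuracy $\varepsilon$ yields $f(x) \in \left[\frac{1+x}{4} - \varepsilon, \frac{1+x}{4} + \varepsilon\right]$.
◮ Starting with some $x$ and iterating $f$ yields in the limit a value $r \in \left[\frac{1 - 4\varepsilon}{3}, \frac{1 + 4\varepsilon}{3}\right]$.
◮ Select $\bar{r} \in E$ with minimal distance to $r$.
◮ Applying $rep^{-1}$ to $\bar{r}$ we find $ns^m0 \in rep^{-1}(\bar{r})$ if $m < -\log_4 \varepsilon - 1$.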
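The iteration in this example can be simulated directly. The sketch below assumes the approximation error is modelled as uniform noise in $[-\varepsilon, \varepsilon]$ added to the exact $f_P(x) = \frac{1+x}{4}$; that noise model, the seed, and the names `f_exact`, `f_noisy` and `iterate` are assumptions made for illustration.

```python
import random


def f_exact(x):
    """Embedded operator for {n0, nsX <- nX}: f_P(x) = (1 + x) / 4."""
    return (1 + x) / 4


def f_noisy(x, eps, rng):
    """An approximation of f_P that is off by at most eps (assumed noise model)."""
    return f_exact(x) + rng.uniform(-eps, eps)


def iterate(f, x, steps):
    """Apply f repeatedly, feeding each output back in as the next input."""
    for _ in range(steps):
        x = f(x)
    return x


eps = 1e-3
rng = random.Random(0)
limit_exact = iterate(f_exact, 0.9, 60)
limit_noisy = iterate(lambda v: f_noisy(v, eps, rng), 0.9, 60)
low, high = (1 - 4 * eps) / 3, (1 + 4 * eps) / 3
print(limit_exact, limit_noisy, (low, high))
```

The exact iteration settles at $\frac{1}{3} = rep(B_L)$, and the noisy one stays inside the interval stated above because every step shrinks the accumulated error by a factor of 4.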

SLIDE 15

Approximation of Interpretations

◮ Let $P$ be a logic program over a first-order language $L$ and $l$ a level mapping.
◮ An interpretation $I$ approximates an interpretation $J$ to a degree $n \in \mathbb{N}$ if for all atoms $A \in B_L$ with $l(A) < n$ we find $I(A) = \top$ iff $J(A) = \top$.
  ⊲ $I$ approximates $J$ to a degree $n$ iff $d_L(I, J) \leq 2^{-n}$.
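The equivalence between approximation to a degree and the metric bound can be tested on small cases. This is a sketch only, again assuming the natural-numbers level mapping $l(ns^m0) = m + 1$ with the atom $ns^m0$ encoded as the integer `m`; the function names are illustrative.

```python
def level(m):
    """l(n s^m 0) = m + 1."""
    return m + 1


def approximates(i, j, n):
    """True if I and J agree on every atom whose level is below n."""
    # level(m) < n holds exactly for m = 0, ..., n - 2
    return all((m in i) == (m in j) for m in range(n - 1))


def distance(i, j):
    """d_L(I, J) for finite interpretations given as sets of integers."""
    differing = i ^ j
    if not differing:
        return 0.0
    return 2.0 ** -min(level(m) for m in differing)


i, j = {0, 1, 2}, {0, 1, 5}
checks = [(n, approximates(i, j, n), distance(i, j) <= 2.0 ** -n) for n in range(8)]
print(checks)
```

For this pair the two sets first differ on level 3, so both tests succeed up to degree 3 and fail from degree 4 on.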

SLIDE 16

Approximation of Supported Models

◮ Given an acyclic logic program $P$ with bijective level mapping.
◮ Let $T_P$ be the immediate consequence operator associated with $P$ and $M_P$ the least supported model of $P$.
◮ We can approximate $T_P$ by a 3-layer feed-forward network.
◮ We can turn this network into a recurrent one. Does the recurrent network approximate the supported model of $P$?
◮ Theorem: For an arbitrary $m \in \mathbb{N}$ there exists a recurrent network with sigmoidal activation functions for the hidden layer units and linear activation functions for the input and output layer units computing a function $f_P$ such that there exists an $n_0 \in \mathbb{N}$ such that for all $n \geq n_0$ and for all $x \in [-1, 1]$ we find $d_L(rep^{-1}(f_P^n(x)), M_P) \leq 2^{-m}$.

SLIDE 17

First Order Core Method – Extensions

◮ Detailed study of (topological) continuity of semantic operators (Hitzler, Seda 2003 and Hitzler, Hölldobler, Seda 2004):
  ⊲ many-valued logics,
  ⊲ larger class of logic programs,
  ⊲ other approximation theorems.
◮ A core method for reflexive reasoning (Hölldobler, Kalinke, Wunderlich 2000).
◮ The graph of $f_P$ is an attractor of some iterated function system (Bader 2003 and Bader, Hitzler 2004):
  ⊲ representation theorems,
  ⊲ fractal interpolation,
  ⊲ core with units computing radial basis functions.

SLIDE 18

Constructive Approaches: Approximating Piecewise Constant Functions

◮ Consider the graph of $f_P$.
◮ Approximate $f_P$ up to a given level $l$.
◮ Construct a core computing a piecewise constant function.
  ⊲ Step activation functions.
  ⊲ Sigmoidal activation functions.
  ⊲ Radial basis functions.

[Figure: graph of $f_P$]


◮ Bader, Hitzler, Hölldobler, Witzel 2007.
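A piecewise constant approximation with step activation functions can be sketched for the natural-numbers program, where $f_P(x) = \frac{1+x}{4}$. This is a generic illustration, not the construction from Bader, Hitzler, Hölldobler, Witzel 2007: it assumes one hidden unit per breakpoint, with breakpoints at the embeddings of all interpretations that use only levels up to `n`. The names `breakpoints`, `step` and `build_core` are hypothetical.

```python
from itertools import combinations


def f_p(x):
    """Embedded operator for {n0, nsX <- nX}."""
    return (1 + x) / 4


def breakpoints(n):
    """Embeddings of all interpretations that only use levels 1..n, sorted."""
    levels = range(1, n + 1)
    points = []
    for size in range(n + 1):
        for subset in combinations(levels, size):
            points.append(sum(4.0 ** -l for l in subset))
    return sorted(points)


def step(z):
    """Step activation function."""
    return 1.0 if z >= 0 else 0.0


def build_core(n):
    """Single hidden layer of step units computing a piecewise constant function."""
    ts = breakpoints(n)
    vs = [f_p(t) for t in ts]                 # target value on each piece
    bias = vs[0]
    weights = [vs[k] - vs[k - 1] for k in range(1, len(ts))]   # jump at each breakpoint
    thresholds = ts[1:]

    def core(x):
        return bias + sum(w * step(x - t) for w, t in zip(weights, thresholds))

    return core


core = build_core(3)
print(core(0.25), f_p(0.25), core(1 / 3), f_p(1 / 3))
```

On the breakpoints the core reproduces $f_P$, and between them it holds the value of the nearest breakpoint to the left, so raising the level `n` tightens the approximation at the cost of doubling the number of hidden units per level.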

SLIDE 23

A Problem

◮ The accuracy of this approach is very limited.
◮ E.g., on a 32-bit computer only 16 atoms can be represented.
◮ We need to use real vectors instead of a single real number to represent interpretations.
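The limit of 16 atoms can be made concrete with a small sketch. It assumes the embedded value is stored as an unsigned 32-bit fixed-point fraction, so that each level (one base-4 digit) takes two bits; the slides do not say which number format is meant, and `encode_fixed` and `decode_fixed` are illustrative names.

```python
BITS = 32


def encode_fixed(levels):
    """rep(I) as an unsigned 32-bit fixed-point fraction (unit 2^-32).

    4^-l equals 2^(32 - 2l) units, so levels above 16 fall below the resolution.
    """
    total = 0
    for l in levels:
        shift = BITS - 2 * l
        if shift < 0:
            continue          # 4^-l cannot be represented and is lost
        total += 1 << shift
    return total


def decode_fixed(value, max_level):
    """Levels whose two-bit base-4 digit equals 1."""
    found = set()
    for l in range(1, max_level + 1):
        shift = BITS - 2 * l
        if shift >= 0 and (value >> shift) & 3 == 1:
            found.add(l)
    return found


word = encode_fixed({1, 16, 17})
print(hex(word), decode_fixed(word, 20))
```

Under this assumed format the atom on level 17 disappears during encoding, while levels 1 and 16 survive.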

SLIDE 24

Multi-Dimensional Level Mappings

◮ A $k$-dimensional level mapping $|\cdot|$ assigns to each ground atom a level $l \in \mathbb{N}^+$ and a dimension $d \in \{1, \ldots, k\}$.
◮ Example: $|es^m0| = (m + 1, 1)$, $|os^m0| = (m + 1, 2)$.
◮ We can now define the embedding $rep$ as follows:
  $rep(I) = \sum_{A \in I} rep(A)$, where $rep(A) = (rep_1(A), \ldots, rep_k(A))$ and
  $rep_j(A) = \begin{cases} 4^{-l} & \text{if } |A| = (l, j), \\ 0 & \text{otherwise.} \end{cases}$
◮ All results can be extended to $k$-dimensional level mappings.
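The vector-valued embedding can be sketched for the even/odd example with $k = 2$. Atoms are assumed to be encoded as pairs `(predicate, m)`; `level_dim` and `rep` are illustrative names, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

K = 2   # dimension 1 holds the even atoms, dimension 2 the odd atoms


def level_dim(atom):
    """|e s^m 0| = (m + 1, 1) and |o s^m 0| = (m + 1, 2)."""
    pred, m = atom
    return (m + 1, 1 if pred == "e" else 2)


def rep(interp):
    """rep(I) as a K-dimensional vector: each atom adds 4^-l in its own dimension."""
    vec = [Fraction(0)] * K
    for atom in interp:
        l, d = level_dim(atom)
        vec[d - 1] += Fraction(1, 4 ** l)
    return tuple(vec)


v = rep({("e", 0), ("o", 0), ("e", 1)})   # {e0, o0, es0}
print(v)
```

Splitting the atoms over two dimensions halves the number of digits each component has to carry, which is the point of moving from a single real number to a vector.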

SLIDE 25

Implementation

◮ A first prototype (FineBlend) was implemented in Witzel 2006.
  ⊲ Merging of the techniques described above and supervised growing neural gas (SGNG) developed in Fritzke 1998.
  ⊲ Radial basis function network approximating $T_P$.
  ⊲ Very robust with respect to noise and damage.
  ⊲ Trainable using a version of backpropagation together with techniques from SGNG.

SLIDE 26

FineBlend versus SGNG

[Figure: error and #units plotted against #examples; series: #units (FineBlend 1), error (FineBlend 1), #units (SGNG), error (SGNG)]

SLIDE 27

FineBlend: Unit Failure

[Figure: error and #units plotted against #examples; series: #units (FineBlend 1), error (FineBlend 1)]

SLIDE 28

FineBlend: Iterating Random Inputs

[Figure: axes: dimension 1 (even) and dimension 2 (odd), both ranging from 0.05 to 0.3]

SLIDE 29

Open Problems

◮ How can first-order terms be represented and manipulated in a connectionist system (Pollack 1990, Hölldobler 1990, Plate 1991)?
◮ Can the mapping $rep$ be learned (Gust, Kühnberger 2005)?
◮ How can first-order rules be extracted from a connectionist system?
◮ How can multiple instances of first-order rules be represented in a connectionist system (Shastri 1990)?
◮ What does a theory for the integration of logic and connectionist systems look like?
◮ Can such a theory be applied in real domains outperforming conventional approaches?
