

Slide 1

logic is everywhere

logika je všude · logika je svuda · Logika ada di mana-mana · Hikmat har Jaga Hai · a lógica está em toda parte · Logik ist überall · la logique est partout · Mantık her yerde · la logica è dappertutto · la lógica está por todas partes · Logica este peste tot

Symmetric Networks

Steffen Hölldobler
International Center for Computational Logic, Technische Universität Dresden, Germany

◮ Associative Memories
◮ Symmetric Networks
◮ Energy Functions
◮ Stochastic Networks
◮ Combinatorial Optimization Problems

Slide 2

Associative Memories

◮ Literature Hertz, Krogh, Palmer: Introduction to the Theory of Neural Computation. Addison-Wesley Publishing Company (1991).
◮ How can we model an associative memory?
  ⊲ Let M = {v1, . . . , vm} be a set of patterns.
  ⊲ Here, patterns are bit vectors of length l.
  ⊲ Let x be a bit vector of length l.
  ⊲ Find the vj ∈ M which is most similar to x.
◮ Possible solution:
  ⊲ For all j = 1 . . . m compute the Hamming distance between vj and x:

      Σ_{i=1}^{l} (vji(1 − xi) + (1 − vji)xi)

  ⊲ Select the vj whose Hamming distance to x is smallest.
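This brute-force recall procedure can be sketched directly in Python; the function names `hamming` and `recall` are mine, not from the slides:

```python
def hamming(v, x):
    # Σ_i (v_i(1 - x_i) + (1 - v_i) x_i): counts the positions where v and x differ
    return sum(vi * (1 - xi) + (1 - vi) * xi for vi, xi in zip(v, x))

def recall(M, x):
    # return the stored pattern with minimal Hamming distance to x
    return min(M, key=lambda v: hamming(v, x))
```

For M = [(1, 0, 1, 1), (0, 0, 0, 1)] and x = (1, 0, 1, 0), `recall` returns (1, 0, 1, 1), which differs from x in one position rather than three.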


Slide 3

Symmetric Networks

◮ A symmetric network consists of a finite set U of binary threshold units and a finite set W ⊆ U × U of weighted connections such that
  ⊲ whenever (ui, uj) ∈ W then (uj, ui) ∈ W,
  ⊲ wij = wji for all (ui, uj) ∈ W, and
  ⊲ wjj = 0 for all (uj, uj) ∈ W.
◮ Asynchronous update procedure: while the current state is unstable, update an arbitrary unit.
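The asynchronous update procedure can be sketched as follows, under conventions the slides leave open (outputs in {0, 1}; a unit fires iff its weighted input reaches its threshold); the names are mine:

```python
import random

def update(state, weights, thresholds):
    """Run the asynchronous update procedure until a stable state is reached.

    state: list of outputs in {0, 1}; weights: symmetric matrix with zero
    diagonal; thresholds: one threshold per unit.  A sketch, not the slides'
    own formulation.
    """
    state = list(state)
    while True:
        # a unit is unstable if its output disagrees with its weighted input
        unstable = [i for i in range(len(state))
                    if state[i] != int(sum(weights[i][j] * state[j]
                                           for j in range(len(state)))
                                       >= thresholds[i])]
        if not unstable:
            return state                      # no unit wants to change: stable
        state[random.choice(unstable)] ^= 1   # update an arbitrary unstable unit
```

Because only unstable units are flipped, the returned state is stable by construction; termination is guaranteed for symmetric weights by the energy argument of the later slides.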


Slide 4

Symmetric Networks – Examples

[Six example networks are shown as diagrams on this slide; only their weights and thresholds (values 1, 2, and 5) survive the slide export.]


Slide 5

Symmetric Networks – Another Example

◮ Consider the following network with initial external activation. [Network diagram; its weights, thresholds, and activations (1, 2, −1, 1, −2, 3, −1, 1, −1, 3, −1) are not recoverable from the slide export.]
◮ Exercise Find the stable states of the network shown on this slide.
◮ Notation In the sequel, I will omit the threshold iff it is 0.


Slide 6

Attractors

◮ Consider the space of states of a given network.
  ⊲ Stable states are also often called attractors.
  ⊲ The computation starts in the state corresponding to x.
  ⊲ Repeatedly updating this state leads to trajectories of states.
  ⊲ The trajectories are finite and yield attractors as final states.
  ⊲ The set of states whose trajectories lead to the same attractor is called the basin of this attractor.
◮ Exercise Consider the network shown on the previous slide. Specify all basins of attraction and all trajectories.


Slide 7

Symmetric Networks and Associative Memories

◮ Can we use symmetric networks as associative memories?
◮ Let M be a set of patterns and x a bit vector of length l.
◮ Idea
  ⊲ Externally activate a network of l units by x at time t = 0; all inputs at time t > 0 are 0.
  ⊲ Search for weights such that after some time the network reaches a stable state which represents the pattern with minimal Hamming distance to x.


Slide 8

Notational Convention

◮ To simplify the mathematical model we make the following assumptions:
  ⊲ Threshold θk = 0 for all units uk in a symmetric network.
  ⊲ Output vk ∈ {−1, 1} for all units uk.
  I.e., we use binary bipolar threshold units with threshold 0.
◮ Exercise Are these assumptions a restriction?
◮ Let l be the number of units in the network.
◮ Then

      vi = sgn(Σ_{j=1}^{l} wij vj),  where sgn(x) = 1 if x ≥ 0, and −1 otherwise.


Slide 9

Storing a Single Bit Vector

◮ What should the weights look like?
◮ Let v be a bit vector of length l.
◮ v is a stable state if for all i we find vi = sgn(Σ_{j=1}^{l} wij vj).
◮ This holds if the weights are proportional to vivj, e.g., wij = (1/l) vivj:

      vi = sgn(Σ_{j=1}^{l} (1/l) vi vj vj)
         = sgn(Σ_{j=1}^{l} (1/l) vi)
         = sgn(vi)
         = vi.

◮ Errors in x are corrected if #errors(x) < l/2.
◮ v is an attractor.
◮ But −v is also an attractor, which is reached if #errors(x) > l/2.

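The single-pattern construction can be checked numerically. This sketch uses a synchronous update in place of the asynchronous procedure, keeps the zero diagonal of slide 3, and all helper names are mine:

```python
def sgn(x):
    # sgn(x) = 1 if x >= 0, and -1 otherwise
    return 1 if x >= 0 else -1

def store_one(v):
    # w_ij = v_i v_j / l with zero diagonal (weights for a single pattern)
    l = len(v)
    return [[0 if i == j else v[i] * v[j] / l for j in range(l)]
            for i in range(l)]

def recall_sync(w, x, steps=10):
    # iterate v_i := sgn(Σ_j w_ij v_j); a synchronous sketch of the
    # asynchronous procedure used on the slides
    for _ in range(steps):
        x = [sgn(sum(w[i][j] * x[j] for j in range(len(x))))
             for i in range(len(x))]
    return x
```

Starting from an input with fewer than l/2 flipped bits, the iteration converges to v; starting from the complement, it converges to the second attractor −v.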

Slide 10

Storing Several Bit Vectors

◮ Let m be the number of bit vectors and l the number of units in the network:

      wij = (1/l) Σ_{k=1}^{m} vki vkj.

◮ Remark This is often called the (generalized) Hebb rule (see Hebb 1949).
◮ Are all vectors vr ∈ M stable states?

      vri = sgn(Σ_{j=1}^{l} wij vrj)
          = sgn((1/l) Σ_{j=1}^{l} Σ_{k=1}^{m} vki vkj vrj)
          = sgn(vri + (1/l) Σ_{j=1}^{l} Σ_{k=1, k≠r}^{m} vki vkj vrj).

◮ Let Cri = (1/l) Σ_{j=1}^{l} Σ_{k=1, k≠r}^{m} vki vkj vrj.
◮ If Cri = 0 for all i, then each vector is a stable state.
◮ If |Cri| < 1 for all i, then Cri cannot change the sign of vri.
◮ Storage capacity If the vectors are stochastically independent and should be perfectly recalled, then the maximum storage capacity is proportional to l / log l.

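The generalized Hebb rule and the crosstalk term Cri can be sketched directly (names and data layout are mine; patterns are lists over {−1, 1}):

```python
def hebb(patterns):
    # w_ij = (1/l) Σ_k v_ki v_kj, with zero diagonal (generalized Hebb rule)
    l = len(patterns[0])
    return [[0 if i == j else sum(v[i] * v[j] for v in patterns) / l
             for j in range(l)]
            for i in range(l)]

def crosstalk(patterns, r, i):
    # C_ri = (1/l) Σ_j Σ_{k≠r} v_ki v_kj v_rj; if |C_ri| < 1 for all i,
    # pattern r is a stable state
    l = len(patterns[0])
    return sum(patterns[k][i] * patterns[k][j] * patterns[r][j]
               for j in range(l)
               for k in range(len(patterns)) if k != r) / l
```

For mutually orthogonal patterns the inner sum Σ_j vkj vrj vanishes, so Cri = 0 and every stored pattern is stable, matching the slide.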

slide-11
SLIDE 11

Hopfield and Symmetric Networks

◮ A network realizing an associative memory as shown on the previous slide is often called Hopfield network. ◮ J.J. Hopfield: Neural Networks and Physical Systems with Emergent Collective Computational Abilities. In: Proceedings of the National Academy of Sciences USA, 2554-2558 (1982). ◮ Exercise Suppose we want to store the vectors (−1, −1, 1, −1, 1, −1) and (1, 1, −1, −1, 1, 1) in a symmetric network with 6 units. Construct the network which solves this problem.


Slide 12

Energy Functions

◮ What happens precisely when a symmetric network is updated?
◮ Consider the energy function

      E(t) = −(1/2) Σ_{i,j=1}^{l} wij vi(t) vj(t)

  describing the state of a symmetric network with l units at time t.
◮ Example [Network with units u1, u2, u3, u4 and weights −1, 2, 2, 2; the diagram is not recoverable from the slide export.]

      E(t) = v1(t)v2(t) − 2v1(t)v4(t) − 2v2(t)v4(t) − 2v3(t)v4(t).
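The energy function is straightforward to evaluate. The weight matrix below is read off from the example's E(t) (w12 = −1, w14 = w24 = w34 = 2; the diagram itself did not survive the export), so treat it as an assumption:

```python
def energy(w, v):
    # E(t) = -(1/2) Σ_{i,j} w_ij v_i(t) v_j(t); the double sum counts each
    # pair of units twice, hence the factor 1/2
    l = len(v)
    return -0.5 * sum(w[i][j] * v[i] * v[j]
                      for i in range(l) for j in range(l))

# weights read off from the example: w12 = -1, w14 = w24 = w34 = 2
w = [[0, -1, 0, 2],
     [-1, 0, 0, 2],
     [0, 0, 0, 2],
     [2, 2, 2, 0]]
```

For v = (1, 1, 1, 1) this gives E = 1 − 2 − 2 − 2 = −5, in agreement with the expanded E(t) on the slide.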


Slide 13

Properties of Energy Functions

◮ Theorem E is monotonically decreasing, i.e., E(t + 1) ≤ E(t).
◮ Exercise How does an update change the energy of a symmetric network if we do not assume that wii = 0?
◮ Exercise Is the energy function still monotonically decreasing if we do not assume that wij = wji? Prove your answer.
◮ How plausible is the assumption that wij = wji?
◮ Exercise Consider symmetric networks where the thresholds of the units need not be 0. Define a monotonically decreasing energy function for these networks. Prove your claim.


Slide 14

Relation to Ising Models

◮ Spins are magnetic atoms with directions 1 and −1.
◮ Suppose there are l atoms.
◮ For each atom vi a magnetic field hi is defined by

      hi = Σ_{j=1}^{l} wij vj + h_e,

  where h_e is the external field.
◮ At low temperatures spins follow the magnetic field. This is described by the energy function

      H = −(1/2) Σ_{i,j=1}^{l} wij vi vj − h_e Σ_{i=1}^{l} vi.


Slide 15

More on Ising Models

◮ At high temperatures spins do not follow the magnetic field.
◮ Depending on the temperature T, thermal fluctuations occur.
◮ Mathematical model (Glauber dynamics)

      vi = 1 with probability g(hi), and −1 with probability 1 − g(hi),

  where g(h) = 1 / (1 + exp(−2βh)), β = 1/(kB T), and kB is Boltzmann's constant.
◮ Note that 1 − g(h) = g(−h).
◮ Behaviour of spins: prob(vi = ±1) = 1 / (1 + exp(∓2βhi)).
◮ In equilibrium, states with low energy are more likely than states with higher energy.
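One Glauber update step can be sketched as follows (function names are mine; β is passed explicitly rather than derived from kB and T):

```python
import math
import random

def g(h, beta):
    # g(h) = 1 / (1 + exp(-2 beta h)); note that 1 - g(h) = g(-h)
    return 1.0 / (1.0 + math.exp(-2.0 * beta * h))

def glauber_step(v, w, beta, h_ext=0.0):
    # pick one spin at random and redraw it according to its local field
    v = list(v)
    i = random.randrange(len(v))
    h = sum(w[i][j] * v[j] for j in range(len(v))) + h_ext
    v[i] = 1 if random.random() < g(h, beta) else -1
    return v
```

For large β (low temperature) g approaches a step function and the spin deterministically follows its field; for β → 0 the spin is drawn uniformly, modelling strong thermal fluctuations.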


Slide 16

Stochastic Networks

◮ Hinton, Sejnowski: Optimal Perceptual Inference. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 448-453 (1983).
  ⊲ They applied the previously mentioned results to symmetric networks:

      prob(vi = 1) = 1 / (1 + exp(−β Σ_{j=1}^{l} wij vj)),

    where β = 1/T.
  ⊲ Such networks are called Boltzmann machines or stochastic networks.
  ⊲ As T → 0 they behave like (deterministic) symmetric networks.
◮ Kirkpatrick, Gelatt, Vecchi: Optimization by Simulated Annealing. Science 220, 671-680 (1983).
  ⊲ Simulated annealing.
◮ Geman, Geman: Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721-741 (1984).
  ⊲ Simulated annealing is guaranteed to find a global minimum of the energy function if the temperature is lowered in infinitesimally small steps.
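Simulated annealing on such a network amounts to running the stochastic updates while lowering T. A toy sketch (the cooling schedule, sweep size, and all names are mine):

```python
import math
import random

def anneal(w, v, schedule):
    # simulated annealing: Glauber-style stochastic updates while the
    # temperature T is lowered according to the given cooling schedule
    v = list(v)
    for T in schedule:
        beta = 1.0 / T
        for _ in range(len(v)):                 # one sweep per temperature
            i = random.randrange(len(v))
            h = sum(w[i][j] * v[j] for j in range(len(v)))
            p = 1.0 / (1.0 + math.exp(-2.0 * beta * h))
            v[i] = 1 if random.random() < p else -1
    return v
```

With a ferromagnetic coupling (positive weights) and a schedule ending at low T, the spins end up aligned, i.e., in one of the two energy minima; a finite schedule only approximates the Geman/Geman guarantee.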


Slide 17

Combinatorial Optimization Problems

◮ We consider problems of size n and want to find the optimal solution.
◮ Class P There exists a deterministic algorithm solving the problem in polynomial time.
◮ Class NP (non-deterministic polynomial) One can test in polynomial time whether any guess of the solution is right.
◮ NP-complete problems If one could find a deterministic algorithm solving such a problem in polynomial time, then all other NP problems could be solved in polynomial time.
◮ It is widely believed that P ≠ NP, but no proof has been found yet.
◮ Literature Garey, Johnson: Computers and Intractability. W. H. Freeman and Company (1979).


Slide 18

The Travelling Salesman Problem

◮ Given n cities and costs cij for travelling to city i from city j.
◮ Problem Find a tour visiting each city exactly once and returning to the start city such that the accumulated costs of the tour are minimal.

[Figure: a grid of stops versus cities illustrating a tour; only the axis labels "stops" and "cities" and the cost label c23 survive the slide export.]


Slide 19

Modelling the Travelling Salesman Problem (1)

◮ Binary threshold units

      vik = 1 if the kth stop is in the ith city, and 0 otherwise.

◮ Costs of the tour

      (1/2) Σ_{i,j,k=1}^{n} cij vik (vj,k+1 + vj,k−1),

  where indices are taken modulo n.
◮ Each city occurs only once on the tour: (∀i) Σ_{k=1}^{n} vik = 1.
◮ Each stop on the tour is at just one city: (∀k) Σ_{i=1}^{n} vik = 1.


Slide 20

Modelling the Travelling Salesman Problem (2)

◮ Altogether, we obtain

      (1/2) Σ_{i,j,k=1}^{n} cij vik (vj,k+1 + vj,k−1)
      + (γ/2) (Σ_{i=1}^{n} (1 − Σ_{k=1}^{n} vik)² + Σ_{k=1}^{n} (1 − Σ_{i=1}^{n} vik)²).

◮ This corresponds to an energy function.
◮ Exercise Construct the corresponding symmetric network.
◮ Exercise What is the role of γ?
◮ For a discussion of different solutions for the travelling salesman problem see (Hertz et al. 1991).
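The combined energy function can be evaluated directly on a 0/1 assignment; a sketch with hypothetical names, where `v[i][k] = 1` iff the kth stop is in the ith city:

```python
def tsp_energy(c, v, gamma):
    # tour cost: (1/2) Σ_{i,j,k} c_ij v_ik (v_{j,k+1} + v_{j,k-1}),
    # stop indices taken modulo n
    n = len(c)
    cost = 0.5 * sum(c[i][j] * v[i][k] * (v[j][(k + 1) % n] + v[j][(k - 1) % n])
                     for i in range(n) for j in range(n) for k in range(n))
    # constraint terms: each city on exactly one stop, each stop at one city
    penalty = 0.5 * gamma * (
        sum((1 - sum(v[i][k] for k in range(n))) ** 2 for i in range(n)) +
        sum((1 - sum(v[i][k] for i in range(n))) ** 2 for k in range(n)))
    return cost + penalty
```

On a valid tour (a permutation matrix) the penalty vanishes and the cost term equals the accumulated travel cost, since each edge of the tour is counted once in each direction and halved by the factor 1/2; γ trades off constraint violation against tour cost.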


Slide 21

Graph Bipartitioning

◮ Exercise Consider the following optimization problem: a graph with an even number of vertices shall be bipartitioned with a minimal cut, i.e., the vertices of the graph shall be split into two sets A and B of the same cardinality such that the number of edges going from A to B is minimal.
  ⊲ Construct an energy function such that the minima of the energy function correspond precisely to the minimal cuts through the graph.
  ⊲ Specify an additional term ensuring |A| = |B|.
◮ Hint Let vi = 1 if vertex i ∈ A and −1 otherwise, and let cij = 1 if there is an edge to i from j and 0 otherwise.


Slide 22

Propositional Logic

◮ Variables are p1, p2, . . ..
◮ Connectives are ¬, ∨, ∧.
◮ Atoms are variables.
◮ Literals are atoms and negated atoms.
◮ Clauses are (generalized) disjunctions of literals.
◮ Formulas in clause form are (generalized) conjunctions of clauses.
◮ Notation Sometimes variables are denoted by different letters if there is a bijection between the set of these letters and {p1, . . . , pn}.
◮ Example

      (o → m) ∧ (s → ¬m) ∧ (c → m) ∧ (c → s) ∧ (v → ¬m)
    ≡ (¬o ∨ m) ∧ (¬s ∨ ¬m) ∧ (¬c ∨ m) ∧ (¬c ∨ s) ∧ (¬v ∨ ¬m)
    = [¬o, m], [¬s, ¬m], [¬c, m], [¬c, s], [¬v, ¬m].


Slide 23

Interpretations and Models

◮ Notation (all symbols may be indexed)
  ⊲ A denotes an atom.
  ⊲ L denotes a literal.
  ⊲ F, G denote formulas.
  ⊲ C denotes a clause.
◮ The set of truth values is {0, 1}. (Later it will be {⊥, ⊤}.)
◮ An interpretation I is a mapping from the language of propositional logic into the set of truth values such that

      I(¬F) = 1 − I(F),
      I(F ∧ G) = min(I(F), I(G)) = I(F) × I(G),
      I(F ∨ G) = max(I(F), I(G)) = I(F) + I(G) − I(F) × I(G).

◮ Let F be a formula and var(F) the set of variables occurring in F. Each interpretation for F can be represented by a mapping from var(F) to {0, 1}.
◮ I is a model for F iff I(F) = 1.
◮ F is satisfiable if it has a model.
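The truth-value computation can be sketched directly. The clause encoding below (strings for atoms, `("not", A)` for negated atoms) is mine, not from the slides:

```python
def eval_literal(I, lit):
    # I(¬A) = 1 - I(A) for a negated atom, I(A) otherwise
    if isinstance(lit, tuple):
        return 1 - I[lit[1]]
    return I[lit]

def eval_clause(I, clause):
    # a clause is a disjunction: the max of its literal values
    return max(eval_literal(I, lit) for lit in clause)

def eval_formula(I, clauses):
    # clause form is a conjunction of clauses: the min of the clause values
    return min(eval_clause(I, c) for c in clauses)
```

`I` is represented as a mapping from var(F) to {0, 1}, exactly as on the slide.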


Slide 24

Interpretations and Models – Example

◮ Let F = [¬p1, p2], [p3, ¬p2], I(p1) = 1, I(p2) = 0 and I(p3) = 1. Then:

      I(F) = min(I([¬p1, p2]), I([p3, ¬p2]))
           = min(max(I(¬p1), I(p2)), max(I(p3), I(¬p2)))
           = min(max(1 − I(p1), I(p2)), max(I(p3), 1 − I(p2)))
           = min(max(1 − 1, 0), max(1, 1 − 0))
           = min(0, 1)
           = 0.

◮ Hence, I is not a model for F, but it is a model for [p3, ¬p2].
◮ Exercise
  ⊲ Is F satisfiable? Prove your claim.
  ⊲ Is [¬p], [p, ¬q], [q] satisfiable? Prove your claim.
  ⊲ Find all models of [¬o, m], [¬s, ¬m], [¬c, m], [¬c, s], [¬v, ¬m].


Slide 25

Propositional Reasoning and Energy Minimization

◮ Pinkas 1991: Is there a link between propositional logic and symmetric networks?
◮ Let F = C1, . . . , Cm be a propositional formula in clause form.
◮ We define

      rep(C) = 0                                    if C = [ ],
      rep(C) = A                                    if C = [A],
      rep(C) = 1 − A                                if C = [¬A],
      rep(C) = rep(C1) + rep(C2) − rep(C1)rep(C2)   if C = (C1 ∨ C2),

      rep(F) = Σ_{i=1}^{m} (1 − rep(Ci)).

◮ Example rep([¬o, m], [¬s, ¬m], [¬c, m], [¬c, s], [¬v, ¬m]) = vm − cm − cs + sm − om + 2c + o.
◮ Exercise Compute rep([¬p], [p, ¬q], [q]).


Slide 26

Propositional Reasoning and Symmetric Networks

◮ Theorem I(F) = 1 iff rep(F) has a global minimum at I and rep(F)(I) = 0.
◮ Compare rep(F) = vm − cm − cs + sm − om + 2c + o with

      E = −Σ_{k<j} wkj vj vk + Σ_k θk vk.

[Network diagram over the units m, s, v, c, o with weights and thresholds read off from rep(F); the diagram itself is not recoverable from the slide export.]


Slide 27

Propositional Non-Monotonic Reasoning

◮ Pinkas 1991a: Can the above-mentioned approach be extended to non-monotonic reasoning?
◮ Consider F = (C1, k1), . . . , (Cm, km), where the Ci are clauses and ki ∈ R+.
◮ We define

      penalty(I, (C, k)) = 0 if I(C) = 1, and k if I(C) = 0,
      penalty(I, F) = Σ_{j=1}^{m} penalty(I, (Cj, kj)).

◮ I is preferred over J wrt F iff penalty(I, F) < penalty(J, F).
◮ Modify rep to become rep(F) = Σ_{i=1}^{m} ki(1 − rep(Ci)), e.g.,

      rep(([¬o, m], 1), ([¬s, ¬m], 2), ([¬c, m], 4), ([¬c, s], 4), ([¬v, ¬m], 4))
      = 4vm − 4cm − 4cs + 2sm − om + 8c + o.

◮ The corresponding stochastic network computes the most preferred interpretations.
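The penalty function is easy to sketch with a hypothetical clause encoding (strings for atoms, `("not", A)` for negated atoms, weighted clauses as (clause, k) pairs):

```python
def penalty(I, weighted_clauses):
    # penalty(I, F) = Σ_j k_j over exactly those clauses C_j with I(C_j) = 0;
    # I maps variable names to 0/1
    def value(clause):
        # I(C): a clause is true iff some literal evaluates to 1
        return max((1 - I[l[1]]) if isinstance(l, tuple) else I[l]
                   for l in clause)
    return sum(k for clause, k in weighted_clauses if value(clause) == 0)
```

Comparing penalties of two interpretations then realizes the preference relation: I is preferred over J iff penalty(I, F) < penalty(J, F).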


Slide 28

Exercises

◮ Exercise Consider F = ([¬o, m], 1), ([¬s, ¬m], 2), ([¬c, m], 4), ([¬c, s], 4), ([¬v, ¬m], 4).
  ⊲ Compute the most preferred interpretations of F.
  ⊲ What happens if we add (o, 100) to F?
  ⊲ What happens if we add (o, 100) and (s, 100) to F?
◮ Exercise Extend symmetric networks such that a unit becomes active as soon as a model for the corresponding propositional logic formula has been found. Hint: The extension may contain units other than logical threshold units, and the additional connections need not be symmetric.
