

SLIDE 1

NeurIPS 2019

GRAPH REPRESENTATIONS, BACKPROPAGATION AND BIOLOGICAL PLAUSIBILITY

Marco Gori SAILAB, University of Siena

SLIDE 2


OUTLINE

  • Learning in structured domains
  • Diffusion machines and spatiotemporal locality
  • Backpropagation diffusion and biological plausibility

SLIDE 3


LEARNING IN STRUCTURED DOMAINS


SLIDE 4

[figure: physical / chemical behavior]

What are the features?

image classification

Graphs as Pattern Models

SLIDE 5

Social nets

  • Social networks
  • Citation networks
  • Communication networks
  • Multi-agent systems

Here we need to make predictions at the node level! Quasi-equilibrium dynamic models.

SLIDE 6


GRAPH NEURAL NETS


pictures from Z. Wu et al

Popular and successful mostly thanks to graph convolutional networks.

Non-Euclidean Deep Learning

SLIDE 7


HISTORICALLY … ANOTHER PATH WAS FOLLOWED!

SLIDE 8

Extension of the idea of time unfolding …

SLIDE 9

Structure unfolding

The case of binary trees …

SLIDE 10


Graph Compiling …


A recurrent net arises from cyclic graphs

The Graph Neural Network Model

Gori et al., IJCNN 2005; Scarselli et al., IEEE-TNN 2009
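To make the unfolding idea concrete, here is a minimal Python sketch (not the original GNN implementation; the state dimension, the tanh transition and the toy tree are my own choices): a shared transition function is replicated over a binary tree exactly as a recurrent cell is replicated over time, and on a cyclic graph the same replication never terminates, which is why a recurrent net arises there.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4                                    # state dimension (assumption)
W = rng.standard_normal((D, 2 * D))      # shared transition weights for binary nodes

def f_w(child_states):
    """Shared state-transition function, replicated at every internal node."""
    return np.tanh(W @ np.concatenate(child_states))

def unfold(node, children, leaf_state):
    """Structure unfolding: the encoding of a node is computed from the
    encodings of its children, exactly as time unfolding replicates a
    recurrent cell along a sequence."""
    kids = children.get(node, [])
    if not kids:
        return leaf_state[node]
    return f_w([unfold(k, children, leaf_state) for k in kids])

# toy binary tree: root 0 -> (1, 2), node 1 -> leaves (3, 4), node 2 is a leaf
children = {0: [1, 2], 1: [3, 4]}
leaf_state = {n: rng.standard_normal(D) for n in (2, 3, 4)}
print(unfold(0, children, leaf_state))   # encoding of the whole structure
# on a cyclic graph this recursion would not terminate: a recurrent net arises
```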

SLIDE 11


LEARNING AS A DIFFUSION PROCESS

THE FRAMEWORK OF CONSTRAINT-BASED LEARNING AND THE ROLE OF TIME COHERENCE


SLIDE 12


$A = \int_0^T e^{-t/\epsilon}\Big(\tfrac{1}{2}\epsilon^2\rho\,\ddot q^2 + \tfrac{1}{2}\epsilon\nu\,\dot q^2 + V(q, t)\Big)\,dt$

kinetic energy: the $\ddot q^2$ and $\dot q^2$ terms; potential energy: $V(q, t)$, the loss function of the neural net

Natural Laws of Learning: the links with mechanics

Once we believe in ergodicity … there is no distinction between training and test sets!

The kinetic terms play the role of the regularization term; the laws of learning parallel the laws of mechanics.
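Purely as an illustrative aid, the cognitive action above can be discretized and evaluated on a sampled trajectory. The sketch below is my own discretization (the function name, the quadratic toy potential and all constants are assumptions), not code from the talk.

```python
import numpy as np

def cognitive_action(q, V, dt, eps, rho, nu):
    """Discretized A = sum_t e^(-t/eps) ( 0.5*eps^2*rho*q''^2 + 0.5*eps*nu*q'^2 + V(q,t) ) * dt,
    evaluated on a trajectory q sampled every dt seconds."""
    t = np.arange(len(q)) * dt
    q_dot = np.gradient(q, dt)            # generalized velocity (weight variation)
    q_ddot = np.gradient(q_dot, dt)       # generalized acceleration
    kinetic = 0.5 * eps**2 * rho * q_ddot**2 + 0.5 * eps * nu * q_dot**2
    potential = V(q, t)                   # the loss of the neural net
    return float(np.sum(np.exp(-t / eps) * (kinetic + potential)) * dt)

# toy trajectory: a single "weight" relaxing toward the minimum of V(q) = 0.5*(q - 1)^2
T, n = 5.0, 500
dt = T / (n - 1)
q = 1.0 - np.exp(-np.linspace(0.0, T, n))
print(cognitive_action(q, lambda q, t: 0.5 * (q - 1.0) ** 2, dt, eps=0.1, rho=1.0, nu=1.0))
```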

SLIDE 13


Natural Learning Theory ↔ Mechanics:
  • $w_i$ ↔ $q_i$: weights are interpreted as generalized coordinates.
  • $\dot w_i$ ↔ $\dot q_i$: weight variations are interpreted as generalized velocities.
  • $\upsilon_i$ ↔ $p_i$: the conjugate momentum to the weights is defined by using the machinery of Legendre transforms.
  • $A(w)$ ↔ $S(q)$: the cognitive action is the dual of the action in mechanics.
  • $F(t, w, \dot w)$ ↔ $L(t, q, \dot q)$: the Lagrangian $F$ is associated with the classic Lagrangian $L$ in mechanics.
  • $H(t, w, \upsilon)$ ↔ $H(t, q, p)$: when using $w$ and $\upsilon$, we can define the Hamiltonian, just like in mechanics.

Natural Laws of Cognition: A Pre-Algorithmic Step

[figure: the weights of the neural net as the positions of particles]

SLIDE 14


Training set (XOR): $y((0,1), (1,0))$, $\neg y((0,0), (1,1))$, i.e. $\mathcal{L} = \{((0,0),0),\ ((0,1),1),\ ((1,0),1),\ ((1,1),0)\}$.

[figure: XOR network with ∧ and ∨ hidden units]

Constraint Reactions

“hard” architectural constraints; training set constraints

$x_{\kappa 5} - \sigma(w_{53} x_{\kappa 3} + w_{54} x_{\kappa 4} + b_5) = 0$
$x_{\kappa 4} - \sigma(w_{41} x_{\kappa 1} + w_{42} x_{\kappa 2} + b_4) = 0$
$x_{\kappa 3} - \sigma(w_{31} x_{\kappa 1} + w_{32} x_{\kappa 2} + b_3) = 0$
$\kappa = 1, 2, 3, 4$

$x_{15} = 1,\ x_{25} = 1,\ x_{35} = 0,\ x_{45} = 0$


architectural and environmental constraints
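A minimal sketch of how the architectural and environmental constraints of the toy XOR network could be written as residuals; the neuron/weight indexing and all helper names are my assumptions, and learning amounts to driving every residual to zero.

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

# training set ordered so that x_{1,5}=1, x_{2,5}=1, x_{3,5}=0, x_{4,5}=0 (XOR)
L = [((0, 1), 1), ((1, 0), 1), ((0, 0), 0), ((1, 1), 0)]

def residuals(x, w, b):
    """Constraint residuals for the 5-neuron XOR net (inputs 1-2, hidden 3-4,
    output 5): architectural ("hard") constraints plus the environmental
    (supervision) constraints.  All residuals vanish iff the constraints hold."""
    res = []
    for kappa, ((u1, u2), y) in enumerate(L):
        xk = x[kappa]                      # xk[i-1] is the state of neuron i
        res += [
            xk[0] - u1,                                                 # input clamping
            xk[1] - u2,
            xk[2] - sigma(w[3, 1] * xk[0] + w[3, 2] * xk[1] + b[3]),    # neuron 3
            xk[3] - sigma(w[4, 1] * xk[0] + w[4, 2] * xk[1] + b[4]),    # neuron 4
            xk[4] - sigma(w[5, 3] * xk[2] + w[5, 4] * xk[3] + b[5]),    # neuron 5
            xk[4] - y,                                                  # x_{kappa,5} = y
        ]
    return np.array(res)

w, b = np.zeros((6, 6)), np.zeros(6)       # 1-based weight/bias indexing, untrained
x = np.full((4, 5), 0.5)                   # a candidate state assignment
print(np.abs(residuals(x, w, b)).max())    # > 0: constraints not yet satisfied
```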

SLIDE 15

Lagrange multipliers: static models (holonomic constraints) vs. dynamic models (non-holonomic constraints)

Lagrangian Approach

functional optimization: variational calculus under subsidiary conditions

SLIDE 16

Formulation of Learning

$A(x, W) := \int \tfrac{1}{2}\big(m_x\,|\dot x(t)|^2 + m_W\,|\dot W(t)|^2\big)\,\varpi(t)\,dt + F(x, W)$   (regularization term + risk function)

$F(x, W) := \int F(t, x, \dot x, \ddot x, W, \dot W, \ddot W)\,dt$, subject to the holonomic constraints (DAGs)

$G_j(t, x(t), W(t)) = 0, \quad 1 \le j \le \nu$,

$G_j(\tau, \xi, M) := \begin{cases} \xi_j - e_j(\tau) & \text{if } 1 \le j \le \omega \\ \xi_j - \sigma(m_{jk}\,\xi_k) & \text{if } \omega < j \le \nu \end{cases}$   (neural constraints, Einstein's notation)

Proposition 1: the constraints are functionally independent for acyclic graphs (feedforward nets).

SLIDE 17

Formulation of Learning (cont'd): holonomic constraints for any digraph, via slack variables.

$G_j(\tau, \xi, M, \zeta) := \begin{cases} \xi_j - e_j(\tau) + \zeta_j & \text{if } 1 \le j \le \omega \\ \xi_j - \sigma(m_{jk}\,\xi_k) + \zeta_j & \text{if } \omega < j \le \nu \end{cases}$   (neural constraints)

$A(x, W, s) := \int \tfrac{1}{2}\big(m_x\,|\dot x(t)|^2 + m_W\,|\dot W(t)|^2 + m_s\,|\dot s(t)|^2\big)\,\varpi(t)\,dt + F(x, W, s)$   (12)

$F(x, W, s) := \int F(t, x, \dot x, \ddot x, W, \dot W, \ddot W, s)\,dt$   (regularization term + risk function)

Proposition 2: the constraints are functionally independent for any graph.

SLIDE 18

Formulation of Learning (cont'd): non-holonomic constraints (any digraph).

$A(x, W) = \int \big(\tfrac{m_x}{2}\,|\dot x(t)|^2 + \tfrac{m_W}{2}\,|\dot W(t)|^2 + F(t, x, W)\big)\,\varpi(t)\,dt$   (regularization term + loss term)

Neural constraints: $\dot x_i(t) + c\,x_i(t) - \sigma(w_{ik}(t)\,x_k(t)) = 0$, with $0 < c < 1$.

Proposition 3: the constraints are functionally independent for any graph.
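The point of the non-holonomic formulation is that the neural constraint also makes sense on cyclic digraphs, where no feedforward (DAG) evaluation order exists. Below is a small sketch (my own weights, constants and integration scheme, for illustration only) that Euler-integrates the constraint on a directed ring.

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

# directed ring 0 -> 1 -> 2 -> 0: a cyclic digraph, so no feedforward ordering exists
n = 3
W = np.zeros((n, n))
W[1, 0], W[2, 1], W[0, 2] = 0.8, -0.6, 0.5   # arbitrary illustrative weights

c, dt = 0.5, 0.01
x = np.zeros(n)
for _ in range(5000):
    # non-holonomic neural constraint as a dynamical system: x' = -c*x + sigma(W x)
    x = x + dt * (-c * x + sigma(W @ x))

# at equilibrium the state satisfies c*x = sigma(W x), even though the graph is cyclic
print(x, np.abs(c * x - sigma(W @ x)).max())
```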

SLIDE 19

Feedforward Networks (DAGs)

$-m_x\,\varpi(t)\,\ddot x(t) - m_x\,\dot\varpi(t)\,\dot x(t) - \lambda_j(t)\,G^j_\xi(x(t), W(t)) + L_x F(x(t), W(t)) = 0$;
$-m_W\,\varpi(t)\,\ddot W(t) - m_W\,\dot\varpi(t)\,\dot W(t) - \lambda_j(t)\,G^j_M(x(t), W(t)) + L_W F(x(t), W(t)) = 0$,

where $L_x F = F_x - d(F_{\dot x})/dt + d^2(F_{\ddot x})/dt^2$ and $L_W F = F_W - d(F_{\dot W})/dt + d^2(F_{\ddot W})/dt^2$ are the variational derivatives of $F$ with respect to $x$ and $W$, respectively (see (9)). The Lagrange multipliers satisfy the instantaneous linear equation

$\Big(\frac{G^i_{\xi_a} G^j_{\xi_a}}{m_x} + \frac{G^i_{m_{ab}} G^j_{m_{ab}}}{m_W}\Big)\lambda_j = \varpi\Big[G^i_{\tau\tau} + 2\big(G^i_{\tau\xi_a}\,\dot x_a + G^i_{\tau m_{ab}}\,\dot w_{ab} + G^i_{\xi_a m_{bc}}\,\dot x_a\,\dot w_{bc}\big) + G^i_{\xi_a\xi_b}\,\dot x_a\,\dot x_b + G^i_{m_{ab} m_{cd}}\,\dot w_{ab}\,\dot w_{cd}\Big] - \dot\varpi\,\big(\dot x_a\,G^i_{\xi_a} + \dot w_{ab}\,G^i_{m_{ab}}\big) + \frac{L_{x_a} F\; G^i_{\xi_a}}{m_x} + \frac{L_{w_{ab}} F\; G^i_{m_{ab}}}{m_W}$.

For supervised learning, $F(t, x, \dot x, \ddot x, W, \dot W, \ddot W) = F(t, x)$, so that $L_x F = \partial_x F$ and $L_W F = 0$.

SLIDE 20

Reduction to Backpropagation

In the limit $m_x \to 0$:

$\dot W_{ij} = -\tfrac{1}{m_W}\,\sigma'(w_{ik} x_k)\,\lambda_i\,x_j$; $\quad G^i_{\xi_a} G^j_{\xi_a}\,\lambda_j = -V_{x_a}\,G^i_{\xi_a}$, i.e. $T\lambda = -V_x$,

where, for the chain $1 \to 2 \to 3$,

$T = \begin{pmatrix} 1 & -\sigma'(w_{21} x_1)\,w_{21} & 0 \\ 0 & 1 & -\sigma'(w_{32} x_2)\,w_{32} \\ 0 & 0 & 1 \end{pmatrix}$

From Eq. (26) the Lagrange multipliers are derived by backward substitution:

$\lambda_3 = -V_{x_3}; \quad \lambda_2 = \sigma'(w_{32} x_2)\,w_{32}\,\lambda_3; \quad \lambda_1 = \sigma'(w_{21} x_1)\,w_{21}\,\lambda_2$.

[figure: trajectory in the Augmented Learning Space, coordinates (x, W, λ)]

the chain rule arises …

A somewhat surprising kinship with the BP delta-error; early discovery by Yann LeCun, 1989.
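A small numerical check of this reduction for the three-neuron chain on the slide (the quadratic loss and the concrete numbers are my own choices): the multipliers obtained by backward substitution on $T\lambda = -V_x$ reproduce exactly the delta errors of classical backpropagation.

```python
import numpy as np

def sigma(z):  return 1.0 / (1.0 + np.exp(-z))
def dsigma(z): return sigma(z) * (1.0 - sigma(z))

# toy chain 1 -> 2 -> 3 used on the slide (weights w21, w32), loss V(x3) = 0.5*(x3 - y)^2
w21, w32, x1, y = 0.7, -1.3, 0.9, 1.0
x2 = sigma(w21 * x1)
x3 = sigma(w32 * x2)

# Lagrange multipliers from the triangular system T lambda = -V_x (backward substitution)
lam3 = -(x3 - y)                       # lambda_3 = -V_{x3}
lam2 = dsigma(w32 * x2) * w32 * lam3   # lambda_2 = sigma'(w32 x2) w32 lambda_3
lam1 = dsigma(w21 * x1) * w21 * lam2   # lambda_1 = sigma'(w21 x1) w21 lambda_2

# gradient-descent direction recovered from the multipliers
grad_w32 = -lam3 * dsigma(w32 * x2) * x2
grad_w21 = -lam2 * dsigma(w21 * x1) * x1

# check against the ordinary chain rule (classical backprop)
bp_w32 = (x3 - y) * dsigma(w32 * x2) * x2
bp_w21 = (x3 - y) * dsigma(w32 * x2) * w32 * dsigma(w21 * x1) * x1
print(np.allclose([grad_w32, grad_w21], [bp_w32, bp_w21]))   # True
```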

SLIDE 21

Euler-Lagrange Equations

non-holonomic constraints

[figure: Augmented Learning Space, coordinates (x, W, λ); BP-like GNN factorization $\delta_j x_i$]

Unlike BPTT and RTRL, learning equations are local in space and time: connections with Equilibrium Propagation (Y. Bengio et al)

Augmented Learning Space

intuition: we need to store the multipliers and provide temporal updating

$\dot x_i(t) + c\,x_i(t) - \sigma(w_{ik}(t)\,x_k(t)) = 0$;
$\dot W(t) = -\tfrac{1}{m_W}\,\lambda_j(t)\,G^j_M(t, x(t), W(t), \dot x(t))$;
$\dot\lambda(t) = \lambda_j(t)\,G^j_\xi(t, x(t), W(t), \dot x(t)) + V_\xi(t, x(t))$

This makes GNN efficient!
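Below is a heavily simplified toy version of such spatiotemporally local updates (all signs, constants and the supervision term are my own choices, not the equations of the paper): the point it illustrates is that each update only touches quantities attached to one neuron or one synapse at the current time step, so no global backward sweep through space or time is required.

```python
import numpy as np

def sigma(z):  return np.tanh(z)
def dsigma(z): return 1.0 - np.tanh(z) ** 2

rng = np.random.default_rng(1)
n, out = 6, 5                              # 6 neurons; neuron 5 carries the supervision
W = 0.15 * rng.standard_normal((n, n))
u = np.zeros(n); u[0] = 1.0                # external input clamped on neuron 0
x, lam = np.zeros(n), np.zeros(n)
c, m_w, dt, target = 0.5, 20.0, 0.01, 0.8

for _ in range(8000):
    net = W @ x + u
    # every quantity below is attached to a single neuron or a single synapse
    # at the current time: no backward sweep through space or time is needed
    x_dot   = -c * x + sigma(net)                          # state diffusion
    lam_dot = -c * lam + (lam * dsigma(net)) @ W           # multiplier (delta) diffusion
    lam_dot[out] += x[out] - target                        # local supervision drive
    W_dot   = -np.outer(lam * dsigma(net), x) / m_w        # synapse-local weight change
    x, lam, W = x + dt * x_dot, lam + dt * lam_dot, W + dt * W_dot

# with these toy settings the supervised output typically drifts toward the target
print(round(float(x[out]), 3), "target:", target)
```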

SLIDE 22

DIFFUSION LEARNING AND BIOLOGICAL PLAUSIBILITY

LOCALITY IN SPACE AND IN TIME

[figure: $\delta_i x_j$; reactions: Lagrange multipliers; environmental interaction: inputs]

slide-23
SLIDE 23

Biological Plausibility of Backpropagation

Biological concerns should not involve BP itself, but rather the instantaneous map

$x_i(t) = \sigma(w_{ik}\,x_k(t))$

replace it with the delayed map

$x_i(t) = \sigma(w_{ik}(t-1)\,x_k(t-1))$,

or, in continuous time, with the diffusion constraint

$\dot x_i(t) + c\,x_i(t) - \sigma(w_{ik}(t)\,x_k(t)) = 0$.

… clever related comment by Francis Crick, 1989.

BP diffusion is biologically plausible; the BP algorithm is NOT biologically plausible.
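To see that nothing is lost by the replacement, the sketch below (my own weights and constants, for illustration only) compares the fixed point of the instantaneous map with the equilibrium of the diffusion dynamics for $c = 1$: they coincide (for general $c$ the equilibrium satisfies $c\,x_i = \sigma(w_{ik} x_k)$).

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n = 4
W = 0.5 * rng.standard_normal((n, n))     # fixed weights, only for the comparison

# (a) instantaneous map x_i = sigma(w_ik x_k), iterated to its fixed point
x_map = np.zeros(n)
for _ in range(200):
    x_map = sigma(W @ x_map)

# (b) diffusion constraint x_i' + c*x_i - sigma(w_ik x_k) = 0 with c = 1, Euler-integrated
x_diff, c, dt = np.zeros(n), 1.0, 0.01
for _ in range(5000):
    x_diff = x_diff + dt * (-c * x_diff + sigma(W @ x_diff))

print(np.abs(x_map - x_diff).max())       # both relax to the same equilibrium
```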

SLIDE 24

[figure: forward and backward waves propagating over time steps t, t+1, …, t+8]

Forward and Backward Waves

BP diffusion is biologically plausible; the BP algorithm is NOT biologically plausible.

SLIDE 25
  • GNNs: success largely due to graph convolutional networks, but the “diffusion path” is still worth exploring

  • What happens with deep networks in graph compiling?
  • Laws of learning, pre-algorithmic issues, and biological plausibility

  • Dynamic models for Lagrange multipliers (always delta-error): a new perspective whenever time-coherence does matter!

  • Euler-Lagrangian Learning and SGD

Conclusions

PRELIMINARY EXPERIMENTAL CHECK

SLIDE 26

Acknowledgments


Alessandro Betti, SAILAB

Publications

  • F. Scarselli et al., “The Graph Neural Network Model,” IEEE-TNN, 2009
  • A. Betti, M. Gori, and S. Melacci, “Cognitive Action Laws: The Case of Visual Features,” IEEE-TNNLS, 2019
  • A. Betti, M. Gori, and S. Melacci, “Motion Invariance in Visual Environment,” IJCAI, 2019
  • A. Betti and M. Gori, “Backprop Diffusion is Biologically Plausible,” arXiv:1912.04635
  • A. Betti and M. Gori, “Spatiotemporal Local Propagation,” arXiv:1907.05106
  • Software: preliminary version

SLIDE 27

Machine Learning: A Constraint-Based Approach (Marco Gori)