GRAPH REPRESENTATIONS, BACKPROPAGATION AND BIOLOGICAL PLAUSIBILITY
Marco Gori, SAILAB, University of Siena
NeurIPS 2019
OUTLINE
- Learning in structured domains
- Diffusion machines and spatiotemporal locality
- Backpropagation diffusion and biological plausibility
LEARNING IN STRUCTURED DOMAINS
Physico-chemical behavior: what are the features?
image classification
Graphs as Pattern Models
- Social networks
- Citation networks
- Communication networks
- Multi-agent systems
Here we need to make predictions at the node level! Quasi-equilibrium dynamic models.
GRAPH NEURAL NETS
pictures from Z. Wu et al
Popular and successful, mostly thanks to graph convolutional networks: non-Euclidean deep learning.
HISTORICALLY … ANOTHER PATH WAS FOLLOWED!
Extension of the idea of time unfolding …
Structure unfolding
The case of binary trees …
Graph Compiling …
[Figure: unfolding of a graph with nodes 1-5 into layers of replicated node copies]
A recurrent net arises from cyclic graphs
The Graph Neural Network Model
Gori et al., IJCNN 2005; Scarselli et al., IEEE-TNN 2009
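As a small illustration of the compiling idea (a sketch only, not the original construction; the toy graph, weights, and the name unfold_forward are hypothetical), a DAG can be "compiled" into a single feedforward pass by visiting nodes in topological order, whereas a cyclic graph forces the same equations to be iterated in time, i.e. a recurrent net:

```python
import math

# Toy DAG given as: node -> list of (parent, weight); nodes 1 and 2 are inputs.
# The graph and the weights are hypothetical; only the mechanism is illustrated.
GRAPH = {
    1: [],
    2: [],
    3: [(1, 0.5), (2, -0.3)],
    4: [(1, 0.2), (2, 0.8)],
    5: [(3, 1.0), (4, -1.0)],
}

def sigma(a):
    return 1.0 / (1.0 + math.exp(-a))

def unfold_forward(graph, inputs):
    """'Compile' a DAG into a feedforward pass: visit nodes in topological order
    and set x_i = sigma(sum_k w_ik * x_k)."""
    x = dict(inputs)
    for i in sorted(graph):          # node ids happen to be topologically ordered here
        if i not in x:
            x[i] = sigma(sum(wk * x[k] for k, wk in graph[i]))
    return x

print(unfold_forward(GRAPH, {1: 1.0, 2: 0.0}))
# A cyclic graph admits no such ordering: the same equations must then be iterated
# over time until a (quasi-)equilibrium is reached, i.e. a recurrent net arises.
```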
LEARNING AS A DIFFUSION PROCESS
THE FRAMEWORK OF CONSTRAINT-BASED LEARNING AND THE ROLE OF TIME COHERENCE
A = \int_0^T e^{-t/\epsilon} \Big( \tfrac{1}{2}\epsilon^2 \rho\, \ddot q^2 + \tfrac{1}{2}\epsilon \nu\, \dot q^2 + V(q, t) \Big)\, dt
kinetic energy: the \dot q^2 and \ddot q^2 terms; potential energy: V(q, t), the loss function of the neural net
Natural Laws of Learning: the Links with Mechanics
Once we believe in ergodicity … there is no distinction between training and test sets! Regularization term; laws of learning ↔ laws of mechanics.
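A minimal numerical sketch of how the action above could be evaluated on a discretized trajectory; the scalar variable q(t), the constants rho, nu, eps, and the quadratic potential standing in for the loss are all illustrative assumptions, not values from the talk:

```python
import numpy as np

def cognitive_action(q, dt, eps=0.1, rho=1.0, nu=1.0, V=lambda q, t: 0.5 * q**2):
    """Rectangle-rule discretization of
       A = int exp(-t/eps) * (eps^2*rho/2 * qdd^2 + eps*nu/2 * qd^2 + V(q, t)) dt
    for a scalar trajectory q sampled on a uniform time grid."""
    t = np.arange(len(q)) * dt
    qd = np.gradient(q, dt)       # first derivative (finite differences)
    qdd = np.gradient(qd, dt)     # second derivative
    integrand = np.exp(-t / eps) * (0.5 * eps**2 * rho * qdd**2
                                    + 0.5 * eps * nu * qd**2
                                    + V(q, t))
    return float(np.sum(integrand) * dt)

# Example: a trajectory relaxing toward the minimum of the toy potential at q = 0.
q = np.exp(-np.linspace(0.0, 5.0, 500))
print(cognitive_action(q, dt=5.0 / 499))
```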
Natural Learning Theory | Mechanics | Remarks
w_i | q_i | Weights are interpreted as generalized coordinates.
\dot w_i | \dot q_i | Weight variations are interpreted as generalized velocities.
υ_i | p_i | The conjugate momentum to the weights is defined by using the machinery of Legendre transforms.
A(w) | S(q) | The cognitive action is the dual of the action in mechanics.
F(t, w, \dot w) | L(t, q, \dot q) | The Lagrangian F is associated with the classic Lagrangian L in mechanics.
H(t, w, υ) | H(t, q, p) | When using w and υ, we can define the Hamiltonian, just like in mechanics.
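For completeness, the standard Legendre-transform machinery the table alludes to, written with the learning variables (this is textbook mechanics, not a result specific to the talk):

```latex
% Conjugate momenta and Hamiltonian via the Legendre transform of the Lagrangian F
% (Einstein summation over i):
\upsilon_i = \frac{\partial F}{\partial \dot w_i},
\qquad
H(t, w, \upsilon) = \upsilon_i \dot w_i - F(t, w, \dot w),
% with the corresponding Hamilton equations:
\dot w_i = \frac{\partial H}{\partial \upsilon_i},
\qquad
\dot \upsilon_i = -\frac{\partial H}{\partial w_i}.
```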
Natural Laws of Cognition: A Pre-Algorithmic Step
weights of the neural net as positions of particles
y(0,1) ∧ y(1,0) ∧ ¬y(0,0) ∧ ¬y(1,1)
L = {((0,0),0), ((0,1),1), ((1,0),1), ((1,1),0)}
Constraint Reactions
“hard” architectural constraints vs. training set constraints
x_{\kappa 5} - \sigma(w_{53} x_{\kappa 3} + w_{54} x_{\kappa 4} + b_5) = 0
x_{\kappa 4} - \sigma(w_{41} x_{\kappa 1} + w_{42} x_{\kappa 2} + b_4) = 0
x_{\kappa 3} - \sigma(w_{31} x_{\kappa 1} + w_{32} x_{\kappa 2} + b_3) = 0
\kappa = 1, 2, 3, 4; \qquad x_{15} = 1, \; x_{25} = 1, \; x_{35} = 0, \; x_{45} = 0
architectural and environmental constraints
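To make the constraint view concrete, here is a sketch that evaluates these architectural and environmental constraints for the four XOR patterns; the numeric weights are hypothetical values that roughly satisfy the constraints, not values from the talk:

```python
import numpy as np

def sigma(a):
    return 1.0 / (1.0 + np.exp(-a))

# Inputs (x1, x2) and supervised outputs x5 for the four XOR patterns.
X_IN = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
X5 = np.array([0., 1., 1., 0.])

def output_residual(w, b, x_in, x5):
    """Architectural constraints
         x3 - sigma(w31*x1 + w32*x2 + b3) = 0
         x4 - sigma(w41*x1 + w42*x2 + b4) = 0
         x5 - sigma(w53*x3 + w54*x4 + b5) = 0
    with x3, x4 chosen to satisfy the first two exactly; the returned value is
    the residual of the environmental (output) constraint."""
    x1, x2 = x_in
    x3 = sigma(w['31'] * x1 + w['32'] * x2 + b['3'])
    x4 = sigma(w['41'] * x1 + w['42'] * x2 + b['4'])
    return x5 - sigma(w['53'] * x3 + w['54'] * x4 + b['5'])

# Hypothetical weights that approximately solve XOR (x3 ~ OR, x4 ~ AND).
w = {'31': 8., '32': 8., '41': 6., '42': 6., '53': 10., '54': -12.}
b = {'3': -4., '4': -9., '5': -5.}
for x_in, x5 in zip(X_IN, X5):
    print(x_in, x5, round(float(output_residual(w, b, x_in, x5)), 3))
```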
Lagrange Multipliers: Static Models and Dynamic Models
holonomic constraints vs. non-holonomic constraints
Lagrangian Approach
functional optimization: variational calculus under subsidiary conditions
Formulation of Learning
A(x, W) := \int \tfrac{1}{2}\big( m_x |\dot x(t)|^2 + m_W |\dot W(t)|^2 \big)\, \varphi(t)\, dt + F(x, W),
subject to the constraints G_j(t, x(t), W(t)) = 0, \; 1 \le j \le \nu,
where F(x, W) := \int F(t, x, \dot x, \ddot x, W, \dot W, \ddot W)\, dt. Then the following proposition holds true:
G_j(\tau, \xi, M) := \begin{cases} \xi_j - e_j(\tau), & \text{if } 1 \le j \le \omega; \\ \xi_j - \sigma(m_{jk}\xi_k), & \text{if } \omega < j \le \nu, \end{cases}
neural constraints (Einstein’s notation)
holonomic constraints (DAGs)
Proposition 1: the constraints are functionally independent for acyclic graphs (feedforward nets).
risk function; regularization term
G_j(\tau, \xi, M, \zeta) := \begin{cases} \xi_j - e_j(\tau) + \zeta_j, & \text{if } 1 \le j \le \omega; \\ \xi_j - \sigma(m_{jk}\xi_k) + \zeta_j, & \text{if } \omega < j \le \nu. \end{cases}
A(x, W, s) := \int \tfrac{1}{2}\big( m_x|\dot x(t)|^2 + m_W|\dot W(t)|^2 + m_s|\dot s(t)|^2 \big)\, \varphi(t)\, dt + F(x, W, s),
where F(x, W, s) := \int F(t, x, \dot x, \ddot x, W, \dot W, \ddot W, s)\, dt.
neural constraints
Formulation of Learning (cont'd)
holonomic constraints (any digraph)
slack variables
Proposition 2: the constraints are functionally independent for any graph. (regularization term; risk function)
A(x, W) = \int \Big( \tfrac{m_x}{2}|\dot x(t)|^2 + \tfrac{m_W}{2}|\dot W(t)|^2 + F(t, x, W) \Big)\, \varphi(t)\, dt
Formulation of Learning (cont'd)
Non-holonomic constraints (any digraph)
neural constraints
Proposition 3: the constraints are functionally independent for any graph. (regularization term; loss term)
\dot x_i(t) + c\, x_i(t) - \sigma\big(w_{ik}(t)\, x_k(t)\big) = 0;
0 < c < 1
Feedforward Networks (DAGs)
-m_x \varphi(t)\, \ddot x(t) - m_x \dot\varphi(t)\, \dot x(t) - \lambda_j(t)\, G^j_\xi\big(x(t), W(t)\big) + L_x F\big(x(t), W(t)\big) = 0;
-m_W \varphi(t)\, \ddot W(t) - m_W \dot\varphi(t)\, \dot W(t) - \lambda_j(t)\, G^j_M\big(x(t), W(t)\big) + L_W F\big(x(t), W(t)\big) = 0,
where L_x F = F_x - d(F_{\dot x})/dt + d^2(F_{\ddot x})/dt^2 and L_W F = F_W - d(F_{\dot W})/dt + d^2(F_{\ddot W})/dt^2 are the variational derivatives of F with respect to x and W, respectively. An expression for the Lagrange multipliers is obtained from the following instantaneous linear equation:
\Big( \frac{G^i_{\xi_a} G^j_{\xi_a}}{m_x} + \frac{G^i_{m_{ab}} G^j_{m_{ab}}}{m_W} \Big)\, \lambda_j = \varphi \Big[ G^i_{\tau\tau} + 2\big( G^i_{\tau\xi_a}\dot x_a + G^i_{\tau m_{ab}}\dot w_{ab} + G^i_{\xi_a m_{bc}}\dot x_a \dot w_{bc} \big) + G^i_{\xi_a\xi_b}\dot x_a \dot x_b + G^i_{m_{ab} m_{cd}}\dot w_{ab}\dot w_{cd} \Big] - \dot\varphi \big( \dot x_a G^i_{\xi_a} + \dot w_{ab} G^i_{m_{ab}} \big) + \frac{L_{x_a} F\, G^i_{\xi_a}}{m_x} + \frac{L_{w_{ab}} F\, G^i_{m_{ab}}}{m_W}
F(t, x, \dot x, \ddot x, W, \dot W, \ddot W) = F(t, x) → L_x F = \partial_x F, \; L_W F = 0
supervised learning
Reduction to Backpropagation
→ \dot W_{ij} = -\tfrac{1}{m_W}\, \sigma'(w_{ik}x_k)\, \lambda_i\, x_j; \qquad G^i_{\xi_a} G^j_{\xi_a}\, \lambda_j = -V_{x_a} G^i_{\xi_a},
that is, T\lambda = -V_x, where for the three-unit chain T is
T = \begin{pmatrix} 1 & -\sigma'(w_{21}x_1)\, w_{21} & 0 \\ 0 & 1 & -\sigma'(w_{32}x_2)\, w_{32} \\ 0 & 0 & 1 \end{pmatrix}.
From this instantaneous linear equation the Lagrange multipliers are derived as follows:
\lambda_3 = -V_{x_3}; \quad \lambda_2 = \sigma'(w_{32}x_2)\, w_{32}\, \lambda_3; \quad \lambda_1 = \sigma'(w_{21}x_1)\, w_{21}\, \lambda_2.
The chain rule arises in the limit m_x → 0 …
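A quick numerical check of this reduction for the three-unit chain (toy numbers, quadratic loss assumed; not the talk's experiments): solving the triangular system T λ = -V_x reproduces exactly the backprop delta recursion:

```python
import numpy as np

def sigma(a):  return 1.0 / (1.0 + np.exp(-a))
def dsigma(a): return sigma(a) * (1.0 - sigma(a))

# Chain x1 -> x2 -> x3 with hypothetical numbers and loss V = 0.5*(x3 - target)^2.
w21, w32, x1, target = 1.3, -0.7, 0.5, 1.0
x2 = sigma(w21 * x1)
x3 = sigma(w32 * x2)
V_x = np.array([0.0, 0.0, x3 - target])   # only the output unit is supervised

# Triangular system T @ lam = -V_x from the slide.
T = np.array([[1.0, -dsigma(w21 * x1) * w21, 0.0],
              [0.0, 1.0, -dsigma(w32 * x2) * w32],
              [0.0, 0.0, 1.0]])
lam = np.linalg.solve(T, -V_x)

# The backprop delta recursion for comparison.
lam3 = -(x3 - target)
lam2 = dsigma(w32 * x2) * w32 * lam3
lam1 = dsigma(w21 * x1) * w21 * lam2
print(lam, (lam1, lam2, lam3))            # the two computations coincide
```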
Augmented Learning Space
A somewhat surprising kinship with the BP delta-error: an early discovery by Yann LeCun, 1989.
Euler-Lagrange Equations
non-holonomic constraints
BP-like GNN factorization: \delta_j x_i
Unlike BPTT and RTRL, the learning equations are local in space and time: connections with Equilibrium Propagation (Y. Bengio et al.).
Augmented Learning Space
intuition: we need to store the multipliers and provide temporal updating
\dot x_i(t) + c\, x_i(t) - \sigma\big(w_{ik}(t)\, x_k(t)\big) = 0;
\dot W(t) = -\tfrac{1}{m_W}\, \lambda_j(t)\, G^j_M\big(t, x(t), W(t), \dot x(t)\big);
\dot\lambda(t) = \lambda_j(t)\, G^j_\xi\big(t, x(t), W(t), \dot x(t)\big) + V_\xi\big(t, x(t)\big)
This makes GNN efficient!
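A rough sketch of what an Euler discretization of such local updates could look like for a single unit; the multiplier dynamics used here simply relax toward the output error, an illustrative stand-in for the λ̇ equation above, and all constants are assumptions rather than values from the talk:

```python
import math

def sigma(a):  return 1.0 / (1.0 + math.exp(-a))
def dsigma(a): return sigma(a) * (1.0 - sigma(a))

# One unit driven by a constant input u through weight w, with target y.
# All constants, and the simplified multiplier dynamics, are assumptions.
u, y, c, dt, m_w = 1.0, 0.8, 0.5, 0.01, 2.0
x, w, lam = 0.0, 0.1, 0.0

for _ in range(5000):
    a = w * u
    x_dot = -c * x + sigma(a)                  # state diffusion constraint
    lam_dot = -(lam - (y - x))                 # multiplier relaxes toward the local error
    w_dot = (1.0 / m_w) * lam * dsigma(a) * u  # weight driven by multiplier * local gradient
    x, lam, w = x + dt * x_dot, lam + dt * lam_dot, w + dt * w_dot

# At quasi-equilibrium x tracks sigma(w*u)/c, and w has moved to reduce y - x.
print(round(x, 3), round(sigma(w * u) / c, 3), y)
```

Every update uses only quantities available at the unit itself at the current time step, which is the sense in which the scheme is local in space and time.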
DIFFUSION LEARNING AND BIOLOGICAL PLAUSIBILITY
LOCALITY IN SPACE AND IN TIME
δi xj
reactions: Lagrange multipliers; environmental interaction: inputs
x_i(t) = \sigma\big(w_{ik}(t-1)\, x_k(t-1)\big).
Biological Plausibility of Backpropagation
x_i(t) = \sigma\big(w_{ik}\, x_k(t)\big)
Biological concerns should not involve BP, but the instantaneous map above: replace it with the diffusion equation below.
… a clever related comment by Francis Crick, 1989. BP diffusion is biologically plausible; the BP algorithm is NOT biologically plausible.
\dot x_i(t) + c\, x_i(t) - \sigma\big(w_{ik}(t)\, x_k(t)\big) = 0
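A minimal simulation (assumed constants, frozen weights, tanh standing in for the squashing function) of how such a diffusion equation propagates a newly switched-on input through a chain of units as a wave in time, instead of applying the instantaneous map:

```python
import math

# Three-unit chain u -> x0 -> x1 -> x2 with frozen weights; all constants are
# illustrative assumptions (tanh is used as the squashing function).
c, dt = 0.9, 0.05
weights = [1.0, 1.5, -2.0]            # input->x0, x0->x1, x1->x2
x = [0.0, 0.0, 0.0]
u = 1.0                               # input switched on at t = 0

for step in range(200):
    pre = [u, x[0], x[1]]             # local pre-synaptic signals
    # diffusion constraint: x_i_dot = -c*x_i + squash(w_i * pre_i), purely local
    x = [xi + dt * (-c * xi + math.tanh(wi * pi))
         for xi, wi, pi in zip(x, weights, pre)]
    if step in (5, 25, 100, 199):
        print(step, [round(v, 3) for v in x])
# The activity settles unit by unit: the switched-on input reaches deeper units
# only after the shallower ones have started to relax (a forward wave in time).
```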
Forward and Backward Waves
BP diffusion is biologically plausible; the BP algorithm is NOT biologically plausible.
Conclusions
- GNN: success largely due to graph convolutional networks, but the “diffusion path” is still worth exploring
- What happens with deep networks in graph compiling?
- Laws of learning, pre-algorithmic issues, and biological plausibility
- Dynamic models for Lagrange multipliers (always delta-error): a new perspective whenever time-coherence does matter!
- Euler-Lagrange learning and SGD
PRELIMINARY EXPERIMENTAL CHECK
Acknowledgments
Alessandro Betti, SAILAB
Publications
- F. Scarselli et al., “The Graph Neural Network Model,” IEEE-TNN, 2009
- A. Betti, M. Gori, and S. Melacci, “Cognitive Action Laws: The Case of Visual Features,” IEEE-TNNLS, 2019
- A. Betti, M. Gori, and S. Melacci, “Motion Invariance in Visual Environment,” IJCAI, 2019
- A. Betti and M. Gori, “Backprop Diffusion is Biologically Plausible,” arXiv:1912.04635
- A. Betti and M. Gori, “Spatiotemporal Local Propagation,” arXiv:1907.05106
- Software
- Marco Gori, Machine Learning: A Constraint-Based Approach (preliminary version)