Neural Networks
Stefan Edelkamp
1 Overview
- Introduction
- Perceptron
- Hopfield Nets
- Self-Organizing Maps
- Feed-Forward Neural Networks
- Backpropagation
2 Introduction
Idea: Mimic the principles of biological neural networks with artificial neural networks
- adopt solutions proven in nature
- parallelization ⇒ high performance
- redundancy ⇒ tolerance for failures
- enable learning with little programming effort
Ingredients
What an artificial neural network needs:
- behavior of the artificial neurons
- order of computation
- activation function
- structure of the net (topology)
- recurrent nets
- feed-forward nets
- integration in environment
- learning algorithm
Perceptron Learning
. . . a very simple network with no hidden neurons
Inputs: x, weighted with w; the weighted inputs are summed
Activation function: Θ
Output: z, determined by computing Θ(w^T x)
Additionally: a weighted input representing the constant 1
Training
f : M ⊂ ℝ^d → {0, 1} net function
- 1. initialize the counter i and the initial weight vector w_0 to 0
- 2. as long as there is a vector x for which w_i^T x ≤ 0, set w_{i+1} := w_i + x and increase i by 1
- 3. return w_{i+1}
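A minimal sketch of this loop in Python (NumPy), assuming the training vectors are already augmented with the constant-1 component and sign-normalized so that every example should satisfy w^T x > 0; the function name and the iteration cap are illustrative safeguards, not part of the slides:

import numpy as np

def perceptron_train(X, max_iter=10_000):
    """X: array of shape (N, d+1); each row is an (x, 1) vector that the
    final weight vector should classify as positive (w^T x > 0)."""
    w = np.zeros(X.shape[1])           # step 1: w_0 = 0
    for _ in range(max_iter):
        misclassified = X[X @ w <= 0]  # vectors with w_i^T x <= 0
        if len(misclassified) == 0:
            return w                   # step 3: every example is separated
        w = w + misclassified[0]       # step 2: w_{i+1} = w_i + x
    return w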
Termination on Training Data
Assume a final (separating) weight vector w* exists, normalized with ||w*|| = 1
- f = Θ((x, 1)^T w*); there are constants δ and γ with |(x, 1)^T w*| ≥ δ and ||(x, 1)|| ≤ γ
- for the angle α_i between w_i and w* we have 1 ≥ cos α_i = w_i^T w* / ||w_i||
- w_{i+1}^T w* = (w_i + x_i)^T w* = w_i^T w* + x_i^T w*
- x_i^T w* ≥ δ ⇒ w_{i+1}^T w* ≥ δ(i + 1)
- ||w_{i+1}||² = (w_i + x_i)^T (w_i + x_i) = ||w_i||² + ||x_i||² + 2 w_i^T x_i ≤ ||w_i||² + γ² (since x_i was misclassified, w_i^T x_i ≤ 0), hence ||w_{i+1}|| ≤ γ√(i + 1) (induction: ||w_i|| ≤ γ√i)
⇒ cos α_{i+1} ≥ δ√(i + 1) / γ, which would grow without bound as i → ∞; since cos α ≤ 1, only finitely many updates are possible, so the algorithm terminates on separable training data
3 Hopfield Nets
Neurons: 1, 2, . . . , d
Activations: x_1, x_2, . . . , x_d; x_i ∈ {0, 1}
Connections: w_ij ∈ ℝ (1 ≤ i, j ≤ d) with w_ii = 0, w_ij = w_ji ⇒ W := (w_ij)_{d×d}
Update: asynchronous & stochastic
x'_j := 0 if Σ_{i=1}^d x_i w_ij < 0
x'_j := 1 if Σ_{i=1}^d x_i w_ij > 0
x'_j := x_j otherwise
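A sketch of one asynchronous update step in Python (NumPy); the function name and the choice of which neuron to update are illustrative:

import numpy as np

def hopfield_update(x, W, j):
    """Update activation x_j of a Hopfield net with weight matrix W in place."""
    s = x @ W[:, j]            # sum_i x_i * w_ij
    if s < 0:
        x[j] = 0
    elif s > 0:
        x[j] = 1
    # s == 0: x_j stays unchanged
    return x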
Example
Three neurons x_1, x_2, x_3 with w_12 = 1, w_13 = −2, w_23 = 3:

W =
(  0   1  −2 )
(  1   0   3 )
( −2   3   0 )
Use:
- associative memory
- computing Boolean functions
- combinatorial optimization
Energy of a Hopfield Net
Let x = (x_1, x_2, . . . , x_d)^T ⇒ E(x) := −(1/2) x^T W x = −Σ_{i<j} x_i w_ij x_j be the energy of a Hopfield net.

Theorem: Every update that changes the state of the Hopfield net reduces the energy.

Proof: Assume the update changes x_k into x'_k (and x'_j = x_j for all j ≠ k) ⇒

E(x) − E(x') = −Σ_{i<j} x_i w_ij x_j + Σ_{i<j} x'_i w_ij x'_j
             = −Σ_{j≠k} x_k w_kj x_j + Σ_{j≠k} x'_k w_kj x_j
             = (x'_k − x_k) Σ_{j≠k} w_kj x_j > 0

The last expression is positive because the update rule sets x_k to 1 only if Σ_{j≠k} w_kj x_j > 0 and to 0 only if Σ_{j≠k} w_kj x_j < 0, so the two factors always have the same sign.
Solving a COP
Input: Combinatorial Optimization Problem (COP)
Output: Solution of the COP
Algorithm (a sketch in code follows below):
- choose a Hopfield net whose weights encode the parameters of the COP so that solutions lie at minima of the energy
- start the net with a random activation
- compute a sequence of updates until the net stabilizes
- read off the parameters
- test feasibility and optimality of the solution
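A minimal sketch of this loop in Python (NumPy), reusing the asynchronous update rule from above; the stabilization criterion (one full sweep without a change) and the function name are assumptions for illustration:

import numpy as np

def hopfield_solve(W, rng=None):
    """Run asynchronous updates on a Hopfield net with weights W until stable."""
    rng = rng if rng is not None else np.random.default_rng()
    d = W.shape[0]
    x = rng.integers(0, 2, size=d).astype(float)    # random initial activation
    while True:
        changed = False
        for j in rng.permutation(d):                # stochastic update order
            s = x @ W[:, j]
            new = 0.0 if s < 0 else (1.0 if s > 0 else x[j])
            if new != x[j]:
                x[j] = new
                changed = True
        if not changed:                             # no change in a full sweep
            return x                                # read off the parameters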
Multi-Flop Problem
Problem Instance: k, n ∈ ℕ, k < n
Feasible Solutions: x̃ = (x_1, . . . , x_n) ∈ {0, 1}^n
Objective Function: P(x̃) = Σ_{i=1}^n x_i
Optimal Solution: solution x̃ with P(x̃) = k

Minimization Problem: d = n + 1, x_d = 1, x = (x_1, x_2, . . . , x_n, x_d)^T ⇒

E(x) = ( Σ_{i=1}^d x_i − (k + 1) )²
     = Σ_{i=1}^d x_i² + Σ_{i≠j} x_i x_j − 2(k + 1) Σ_{i=1}^d x_i + (k + 1)²      (using x_i² = x_i)
     = Σ_{i≠j} x_i x_j − (2k + 1) Σ_{i=1}^{d−1} x_i x_d + k²
     = −(1/2) Σ_{i<j} x_i (−4) x_j − (1/2) Σ_{i<d} x_i (4k + 2) x_d + k²
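As an illustration, one consistent choice of weights read off from this energy (any positive rescaling gives the same minima) is w_ij = −2 between the x_i and w_id = 2k − 1 towards the constant neuron x_d. A small Python (NumPy) sketch with x_d initialized to 1; the concrete values n = 6, k = 2 are illustrative:

import numpy as np

def multiflop_weights(n, k):
    """Hopfield weights for the multi-flop problem (d = n + 1 neurons)."""
    d = n + 1
    W = -2.0 * (np.ones((d, d)) - np.eye(d))   # w_ij = -2, w_ii = 0
    W[:n, d - 1] = W[d - 1, :n] = 2 * k - 1    # coupling to the constant neuron x_d
    return W

n, k = 6, 2
W = multiflop_weights(n, k)
rng = np.random.default_rng(0)
x = np.append(rng.integers(0, 2, size=n), 1).astype(float)  # x_d = 1
changed = True
while changed:
    changed = False
    for j in range(n):                  # update only the n free neurons
        s = x @ W[:, j]
        new = 0.0 if s < 0 else (1.0 if s > 0 else x[j])
        if new != x[j]:
            x[j] = new
            changed = True
print(x[:n], "sum =", x[:n].sum())      # after stabilization the sum equals k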
Example
(n = 3, k = 1):
[Figure: Hopfield net for n = 3, k = 1 — neurons x_1, x_2, x_3 pairwise connected with weight −2, each connected to the constant neuron x_4 with weight 1]
Traveling Salesperson Problem (TSP)
Problem Instance: cities 1, 2, . . . , n; distances d_ij ∈ ℝ⁺ (1 ≤ i, j ≤ n) with d_ii = 0
Feasible Solution: permutation π of (1, 2, . . . , n)
Objective Function: P(π) = Σ_{i=1}^n d_{π(i), π(i mod n + 1)}
Optimal Solution: feasible solution π with minimal P(π)
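For concreteness, the tour length P(π) in Python (NumPy) with 0-based indices, so the successor of position i is position (i + 1) mod n; the names D and pi are illustrative:

import numpy as np

def tour_length(D, pi):
    """P(pi): sum of distances along the cyclic tour pi over distance matrix D."""
    n = len(pi)
    return sum(D[pi[i], pi[(i + 1) % n]] for i in range(n))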
Encoding
Idea: Hopfield net with d = n² + 1 neurons, indexed by tour position i and city π(i)

[Figure: grid of neurons over positions i and cities π(i), connected by negative distance weights such as −d_12, −d_21, −d_23, −d_32, plus a constant neuron]

Problem: the "size" of the weights must allow both feasible and good solutions
Trick: transition to a continuous Hopfield net and modified weights ⇒ good solutions for the TSP
4 Self-Organizing Maps (SOM)
Neurons:
- Input: 1, 2, . . . , d for the components x_i
- Map: 1, 2, . . . , m on a regular (linear, rectangular, or hexagonal) grid with positions r_i, storing pattern vectors µ_i ∈ ℝ^d
- Output: 1, 2, . . . , d for µ_c
Update: L ⊂ ℝ^d learning set; at time t ∈ ℕ⁺ an x ∈ L is chosen at random ⇒ the winner c ∈ {1, . . . , m} is determined by ||x − µ_c|| ≤ ||x − µ_i|| (∀i ∈ {1, . . . , m}) and the patterns are adapted:
µ'_i := µ_i + h(c, i, t) (x − µ_i)   ∀i ∈ {1, . . . , m}
with h(c, i, t) a time-dependent neighborhood relation and h(c, i, t) → 0 for t → ∞, e.g.
h(c, i, t) = α(t) · exp( −||r_c − r_i||² / (2σ(t)²) )
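A sketch of one SOM training step in Python (NumPy); the exponential decay schedules for α(t) and σ(t) and all constants are illustrative assumptions:

import numpy as np

def som_step(mu, grid, x, t, alpha0=0.5, sigma0=2.0, tau=200.0):
    """One update: mu (m, d) pattern vectors, grid (m, q) neuron positions r_i,
    x (d,) randomly chosen training vector, t current time step."""
    c = np.argmin(np.linalg.norm(mu - x, axis=1))        # winner neuron c
    alpha = alpha0 * np.exp(-t / tau)                    # learning rate alpha(t)
    sigma = sigma0 * np.exp(-t / tau)                    # neighborhood width sigma(t)
    dist2 = np.sum((grid - grid[c]) ** 2, axis=1)        # ||r_c - r_i||^2
    h = alpha * np.exp(-dist2 / (2 * sigma ** 2))        # h(c, i, t)
    return mu + h[:, None] * (x - mu)                    # mu'_i = mu_i + h (x - mu_i)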
Applications of SOM
. . . include: visualization and interpretation, dimensionality reduction, clustering and classification, COPs, . . .
A size-50 map adapts to a triangle
A 15 × 15 grid is adapted to a triangle
SOM for Combinatorial Optimization
∆-TSP — Idea: use a growing ring (elastic band) of neurons (a sketch follows below)
Tests with n ≤ 2392 cities show that the running time scales linearly and the tours deviate from the optimum by less than 9 %
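A minimal sketch of the elastic-band idea in Python (NumPy), simplified to a fixed-size ring of neurons in the plane rather than a growing one; all schedule constants and the function name are illustrative assumptions:

import numpy as np

def som_tsp(cities, m=None, iters=20_000, rng=np.random.default_rng(0)):
    """cities: (n, 2) coordinates. Returns a tour as an ordering of the cities."""
    n = len(cities)
    m = m or 5 * n                                      # ring of m neurons
    ring = cities.mean(0) + 0.1 * rng.standard_normal((m, 2))
    for t in range(iters):
        x = cities[rng.integers(n)]                     # random city
        c = np.argmin(np.linalg.norm(ring - x, axis=1)) # winner neuron on the ring
        sigma = max(1.0, m / 8 * np.exp(-t / (iters / 4)))
        alpha = 0.8 * np.exp(-t / (iters / 2))
        d = np.abs(np.arange(m) - c)
        d = np.minimum(d, m - d)                        # circular distance on the ring
        h = alpha * np.exp(-(d ** 2) / (2 * sigma ** 2))
        ring += h[:, None] * (x - ring)                 # pull the band towards the city
    # assign each city to its nearest neuron; the ring order then gives the tour
    winners = [np.argmin(np.linalg.norm(ring - city, axis=1)) for city in cities]
    return np.argsort(winners)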
SOM for Combinatorial Optimization
[Figures: the resulting tours with 10, 50, 500, and 2000 neurons]
SOM for Combinatorial Optimization
Tour with 2526 neurons:
5 Layered Feed-Forward Nets (MLP)
[Figure: example of a layered feed-forward network]
Formalization
An L-layered MLP (multi-layer perceptron)
Layers: S_0, S_1, . . . , S_{L−1}, S_L
Connections: from each neuron i in S_ℓ to each neuron j in S_{ℓ+1} with weight w_ij, except for the constant 1-neurons
Update: layer-wise synchronous
x'_j := ϕ( Σ_{i∈V(j)} x_i w_ij )  with ϕ differentiable, e.g. ϕ(a) = σ(a) = 1 / (1 + exp(−a))
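A sketch of the layer-wise forward computation in Python (NumPy), assuming fully connected layers with an appended constant 1-neuron per layer; the function name and weight layout are illustrative:

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mlp_forward(x, weights):
    """weights: list of matrices; weights[l] has shape (|S_l| + 1, |S_{l+1}|),
    the extra row holding the weights of the constant 1-neuron."""
    for W in weights:
        x = sigmoid(np.append(x, 1.0) @ W)   # x'_j = phi(sum_i x_i w_ij)
    return x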
Layered Feed-Forward Nets
Applications: function approximation, classification
Theorem: Every Boolean function can be computed by a 2-layered MLP (no proof; see the XOR sketch below)
Theorem: Continuous real functions and their derivatives can be jointly approximated to arbitrary precision on compact sets (no proof)
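As a concrete instance of the first theorem, a hand-constructed 2-layered MLP computing XOR with the threshold activation Θ; the specific weights are one illustrative choice, not taken from the slides:

import numpy as np

def theta(a):
    return (a > 0).astype(float)             # threshold activation

def xor_mlp(x1, x2):
    x = np.array([x1, x2, 1.0])              # inputs plus constant 1-neuron
    W1 = np.array([[ 1.0,  1.0],             # hidden 1: x1 + x2 - 0.5 > 0  (OR)
                   [ 1.0,  1.0],             # hidden 2: x1 + x2 - 1.5 > 0  (AND)
                   [-0.5, -1.5]])
    h = np.append(theta(x @ W1), 1.0)
    w2 = np.array([1.0, -1.0, -0.5])         # output: OR and not AND
    return theta(h @ w2)

# xor_mlp(0, 0) == 0, xor_mlp(1, 0) == 1, xor_mlp(0, 1) == 1, xor_mlp(1, 1) == 0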
Learning Parameters in MLP
Given: x^1, . . . , x^N ∈ ℝ^d and t^1, . . . , t^N ∈ ℝ^c, an MLP with d input and c output neurons;
w = (w_1, . . . , w_M) contains all weights, f(x, w) is the net function
Task: find the optimal w* that minimizes the error

E(w) := (1/2) Σ_{n=1}^N Σ_{k=1}^c ( f_k(x^n, w) − t^n_k )²

The partial derivatives of f exist with respect to the inputs and the parameters ⇒ any gradient-based optimization method can be used (conjugate gradient, . . . )

∇_w E(w) = Σ_{n=1}^N Σ_{k=1}^c ( f_k(x^n, w) − t^n_k ) ∇_w f_k(x^n, w)
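A small sketch of this squared error and a finite-difference estimate of its gradient in Python (NumPy), useful for checking any gradient implementation; here f stands for an arbitrary net function taking (x, w), and all names are illustrative:

import numpy as np

def error(f, w, X, T):
    """E(w) = 1/2 * sum_n sum_k (f_k(x^n, w) - t^n_k)^2."""
    return 0.5 * sum(np.sum((f(x, w) - t) ** 2) for x, t in zip(X, T))

def numerical_gradient(f, w, X, T, eps=1e-6):
    """Finite-difference approximation of grad_w E(w), component by component."""
    g = np.zeros_like(w)
    for m in range(len(w)):
        e = np.zeros_like(w)
        e[m] = eps
        g[m] = (error(f, w + e, X, T) - error(f, w - e, X, T)) / (2 * eps)
    return g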
Backpropagation
Basic calculus (chain rule):

∂/∂t f(g(t)) |_{t=t_0} = ( ∂/∂s f(s) |_{s=g(t_0)} ) · ( ∂/∂t g(t) |_{t=t_0} )
Example: ϕ(a) := 9 − a², x = (1, 2)^T, w = (1, 1)^T, t = 2:

[Figure: computation graph — x_1 w_1 + x_2 w_2 → ϕ → f, then E = (f − t)²/2]

∇_w E(w) |_{w=(1,1)^T} is assembled by the chain rule from the local derivatives of the building blocks:
h(x, y) = x · y ⇒ ∂/∂x h(x, y) = y
h(x, y) = x + y ⇒ ∂/∂x h(x, y) = 1
h(x, y) = x − y ⇒ ∂/∂x h(x, y) = 1
ϕ(x) = 9 − x² ⇒ ∂/∂x ϕ(x) = −2x
h(x) = x²/2 ⇒ ∂/∂x h(x) = x
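A sketch of this example in Python: one forward pass storing every intermediate value, then multiplying the local derivatives backward along the graph; with the stated values the gradient evaluates to (12, 24)^T. Variable names are illustrative:

import numpy as np

# forward pass: store every intermediate value
x = np.array([1.0, 2.0]); w = np.array([1.0, 1.0]); t = 2.0
a = w @ x                 # a = w1*x1 + w2*x2 = 3
f = 9.0 - a ** 2          # phi(a) = 0
e = f - t                 # e = -2
E = e ** 2 / 2            # E = 2

# backward pass: chain rule with the local derivatives from the slide
dE_de = e                 # d(e^2/2)/de = e
de_df = 1.0               # d(f - t)/df = 1
df_da = -2.0 * a          # d(9 - a^2)/da = -2a
dE_dw = dE_de * de_df * df_da * x   # da/dw_i = x_i
print(dE_dw)              # [12. 24.]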
Backpropagation
Theorem: ∇_w E(w) can be computed in time O(N · M) if the network is of size O(M)
Algorithm: for all n ∈ {1, . . . , N}
- compute the net function f(x^n, w) and the associated error E in forward direction and store the intermediate values in the net
- compute the partial derivatives of E with respect to all intermediate values in backward direction and add up all parts to obtain the total gradient (a general sketch follows below)
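A sketch of the full procedure in Python (NumPy) for the sigmoid MLP used in the forward-pass sketch above: one forward pass per example storing the layer inputs, one backward pass accumulating the gradient of the squared error; the weight layout with one bias row per matrix is an illustrative assumption:

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_gradient(W, X, T):
    """W: list of weight matrices (each with a bias row), X: (N, d), T: (N, c).
    Returns the gradient of E(w) = 1/2 * sum_n ||f(x^n, w) - t^n||^2."""
    grads = [np.zeros_like(Wl) for Wl in W]
    for x, t in zip(X, T):
        # forward pass: store the (bias-extended) input of every layer
        inputs = []
        a = x
        for Wl in W:
            a1 = np.append(a, 1.0)
            inputs.append(a1)
            a = sigmoid(a1 @ Wl)
        # backward pass: delta = dE/d(pre-activation) of the current layer
        delta = (a - t) * a * (1.0 - a)            # output layer, sigma' = a(1 - a)
        for l in range(len(W) - 1, -1, -1):
            grads[l] += np.outer(inputs[l], delta)
            if l > 0:
                back = W[l] @ delta                # propagate through the weights
                h = inputs[l][:-1]                 # drop the bias component
                delta = back[:-1] * h * (1.0 - h)  # sigma' of the hidden activations
    return grads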