SLIDE 1

Neural Networks

Stefan Edelkamp

SLIDE 2

1 Overview

  • Introduction
  • Perceptron
  • Hopfield Nets
  • Self-Organizing Maps
  • Feed-Forward Neural Networks
  • Backpropagation


SLIDE 3

2 Introduction

Idea: Mimic principle of biological neural networks with artificial neural networks


  • adopt proven solutions from nature
  • parallelization ⇒ high performance
  • redundancy ⇒ fault tolerance
  • enables learning with little programming effort


SLIDE 4

Ingredients

An artificial neural network requires:

  • behavior of the artificial neurons
  • order of computation
  • activation function
  • structure of the net (topology)
      • recurrent nets
      • feed-forward nets
  • integration into the environment
  • learning algorithm


SLIDE 5

Perceptron Learning

…a very simple network with no hidden neurons.

Inputs: $x$, weighted with $w$; the weighted inputs are summed
Activation function: $\Theta$
Output: $z$, determined by computing $\Theta(w^T x)$
Additionally: a weighted input representing the constant 1

SLIDE 6

Training

$f : M \subset \mathbb{R}^d \to \{0, 1\}$ net function

  1. Initialize the counter $i$ and the initial weight vector $w_0$ to 0.
  2. As long as there are vectors $x$ with $w_i^T x \le 0$, set $w_{i+1} := w_i + x$ and increase $i$ by 1.
  3. Return the final weight vector.
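
A minimal Python sketch of this loop (my illustration, not code from the slides; it assumes each training vector is already extended by the constant input 1 and that class-0 vectors are negated, so a correct $w$ satisfies $w^T x > 0$ for every vector):

```python
import numpy as np

def perceptron_train(X, max_updates=10_000):
    """Perceptron learning as in steps 1-3 above.
    X: array of shape (N, d); every row x should end up with w @ x > 0."""
    w = np.zeros(X.shape[1])                  # step 1: w_0 = 0
    for _ in range(max_updates):
        wrong = [x for x in X if w @ x <= 0]  # vectors with w_i^T x <= 0
        if not wrong:
            break                             # all vectors classified: done
        w = w + wrong[0]                      # step 2: w_{i+1} = w_i + x
    return w                                  # step 3

# Example: 1-D points 2 (class 1) and -1 (class 0), each extended by the
# constant input 1; the class-0 row is negated.
X = np.array([[2.0, 1.0],
              [1.0, -1.0]])
print(perceptron_train(X))   # [2. 1.], which separates both vectors
```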

SLIDE 7

Termination on Training Data

Assume every training vector to be normalized, and $w^*$ to be a final weight vector with $\|w^*\| = 1$.

  • $f = \Theta((x, 1)^T w^*)$; there are constants $\delta$ and $\gamma$ with $|(x, 1)^T w^*| \ge \delta$ and $\|(x, 1)\| \le \gamma$ for all training vectors $x$
  • for the angle between $w_i$ and $w^*$ we have $1 \ge \cos \alpha_i = w_i^T w^* / \|w_i\|$
  • $w_{i+1}^T w^* = (w_i + x_i)^T w^* = w_i^T w^* + x_i^T w^*$
  • $x_i^T w^* \ge \delta \Rightarrow w_{i+1}^T w^* \ge \delta (i + 1)$
  • $\|w_{i+1}\| = \sqrt{(w_i + x_i)^T (w_i + x_i)} = \sqrt{\|w_i\|^2 + \|x_i\|^2 + 2 w_i^T x_i} \le \sqrt{\|w_i\|^2 + \gamma^2} \le \gamma \sqrt{i + 1}$ (induction: $\|w_i\| \le \gamma \sqrt{i}$; updates occur only when $w_i^T x_i \le 0$)

$\Rightarrow \cos \alpha_{i+1} \ge \delta (i + 1) / (\gamma \sqrt{i + 1}) = \delta \sqrt{i + 1} / \gamma$, which would grow beyond every bound as $i \to \infty$. Since $\cos \alpha_i \le 1$, the number of updates is bounded by $(\gamma / \delta)^2$, so training terminates.

SLIDE 8

3 Hopfield Nets

Neurons: $1, 2, \ldots, d$
Activations: $x_1, x_2, \ldots, x_d$; $x_i \in \{0, 1\}$
Connections: $w_{ij} \in \mathbb{R}$ ($1 \le i, j \le d$) with $w_{ii} = 0$, $w_{ij} = w_{ji}$ $\Rightarrow$ $W := (w_{ij})_{d \times d}$
Update: asynchronous & stochastic:

$$x'_j := \begin{cases} 0 & \text{if } \sum_{i=1}^{d} x_i w_{ij} < 0 \\ 1 & \text{if } \sum_{i=1}^{d} x_i w_{ij} > 0 \\ x_j & \text{otherwise} \end{cases}$$
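
The update rule transcribes directly to Python (a minimal sketch):

```python
import random

def hopfield_step(x, W, j):
    """One asynchronous update of neuron j under the rule above."""
    s = sum(x[i] * W[i][j] for i in range(len(x)))
    if s < 0:
        x[j] = 0
    elif s > 0:
        x[j] = 1
    # s == 0: x[j] stays unchanged
    return x

def hopfield_run(x, W, steps=100, seed=0):
    """Asynchronous & stochastic: update a randomly chosen neuron each step."""
    rng = random.Random(seed)
    for _ in range(steps):
        hopfield_step(x, W, rng.randrange(len(x)))
    return x
```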

SLIDE 9

Example

[Figure: three neurons $x_1, x_2, x_3$ with weights $w_{12} = 3$, $w_{13} = -2$, $w_{23} = 1$]

$$W = \begin{pmatrix} 0 & 3 & -2 \\ 3 & 0 & 1 \\ -2 & 1 & 0 \end{pmatrix}$$

Use:

  • associative memory
  • computing Boolean functions
  • combinatorial optimization


SLIDE 10

Energy of a Hopfield Net

For $x = (x_1, x_2, \ldots, x_d)^T$ let

$$E(x) := -\tfrac{1}{2}\, x^T W x = -\sum_{i<j} x_i w_{ij} x_j$$

be the energy of a Hopfield net.

Theorem: Every update that changes the state of the Hopfield net reduces the energy.

Proof: Assume the update changes $x_k$ into $x'_k$; all other activations are unchanged ($x'_j = x_j$ for $j \ne k$). Then

$$E(x) - E(x') = -\sum_{i<j} x_i w_{ij} x_j + \sum_{i<j} x'_i w_{ij} x'_j = -\sum_{j \ne k} x_k w_{kj} x_j + \sum_{j \ne k} x'_k w_{kj} x_j = -(x_k - x'_k) \sum_{j \ne k} w_{kj} x_j > 0,$$

since $x_k$ flips from 0 to 1 only if $\sum_{j \ne k} w_{kj} x_j > 0$ (then $x_k - x'_k = -1$) and from 1 to 0 only if $\sum_{j \ne k} w_{kj} x_j < 0$ (then $x_k - x'_k = 1$).
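
The theorem can be checked numerically on the example net from above, reusing hopfield_step from the earlier sketch:

```python
def energy(x, W):
    """E(x) = -1/2 x^T W x = -sum_{i<j} x_i w_ij x_j."""
    d = len(x)
    return -0.5 * sum(x[i] * W[i][j] * x[j]
                      for i in range(d) for j in range(d))

W = [[0, 3, -2],          # the example net from above
     [3, 0, 1],
     [-2, 1, 0]]
x = [1, 0, 1]
e0 = energy(x, W)         # E = 2
hopfield_step(x, W, 1)    # input 3 + 1 > 0, so x_2 flips to 1
assert energy(x, W) < e0  # energy strictly decreased (now E = -2)
```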

SLIDE 11

Solving a COP

Input: combinatorial optimization problem (COP)
Output: solution for the COP
Algorithm:

  • construct a Hopfield net whose weights encode the parameters of the COP, such that solutions correspond to minima of the energy
  • start the net with a random activation
  • compute a sequence of updates until the net stabilizes
  • read off the parameters of the stable state
  • test feasibility and optimality of the solution

SLIDE 12

Multi-Flop Problem

Problem instance: $k, n \in \mathbb{N}$, $k < n$
Feasible solutions: $\tilde{x} = (x_1, \ldots, x_n) \in \{0, 1\}^n$
Objective function: $P(\tilde{x}) = \sum_{i=1}^{n} x_i$
Optimal solutions: solutions $\tilde{x}$ with $P(\tilde{x}) = k$

Minimization problem: $d = n + 1$, $x_d = 1$, $x = (x_1, x_2, \ldots, x_n, x_d)^T$ $\Rightarrow$

$$E(x) = \Big(\sum_{i=1}^{d} x_i - (k+1)\Big)^2 = \sum_{i=1}^{d} \underbrace{x_i^2}_{=\,x_i} + \sum_{i \ne j} x_i x_j - 2(k+1) \sum_{i=1}^{d} x_i + (k+1)^2$$

$$= \sum_{i \ne j} x_i x_j - (2k+1) \sum_{i=1}^{d-1} x_i x_d + k^2 \qquad \text{(using } x_d = 1 \text{, so } \textstyle\sum_{i=1}^{d} x_i = \sum_{i<d} x_i x_d + 1 \text{)}$$

$$= -\tfrac{1}{2} \sum_{i<j} x_i (-4)\, x_j - \tfrac{1}{2} \sum_{i<d} x_i (4k+2)\, x_d + k^2$$

Reading off $E(x) = -\tfrac{1}{2} \sum_{i<j} x_i w_{ij} x_j + k^2$ gives a Hopfield net with $w_{ij} = -4$ for $i < j < d$ and $w_{id} = -4 + (4k+2) = 4k - 2$, whose energy minima are exactly the optimal solutions.

SLIDE 13

Example

$(n = 3, k = 1)$:

[Figure: Hopfield net with variable neurons $x_1, x_2, x_3$ and constant neuron $x_4$; weight $-2$ between each pair of variable neurons and weight $1$ from each variable neuron to $x_4$]
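
A runnable sketch of this construction (the weights generalize the example above: $-2$ between every pair of variable neurons and $2k - 1$ to the constant neuron, i.e. the derived weights scaled by $\tfrac{1}{2}$, which leaves the energy minima unchanged):

```python
import random

def multiflop_weights(n, k):
    """Hopfield weights: neurons 0..n-1 are the variables, neuron n is the
    constant-1 neuron; energy minima have exactly k active variables."""
    d = n + 1
    W = [[0.0] * d for _ in range(d)]
    for i in range(n):
        for j in range(i + 1, n):
            W[i][j] = W[j][i] = -2.0       # mutual inhibition
        W[i][n] = W[n][i] = 2 * k - 1.0    # excitation from the constant neuron
    return W

def solve_multiflop(n=8, k=3, steps=1000, seed=1):
    rng = random.Random(seed)
    W = multiflop_weights(n, k)
    x = [rng.randint(0, 1) for _ in range(n)] + [1]   # x_d clamped to 1
    for _ in range(steps):
        j = rng.randrange(n)               # never update the constant neuron
        s = sum(x[i] * W[i][j] for i in range(n + 1))
        x[j] = 1 if s > 0 else 0 if s < 0 else x[j]
    return x[:n]

x = solve_multiflop()
print(x, sum(x))   # stabilizes at a state with sum(x) == k
```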

SLIDE 14

Traveling Salesperson Problem (TSP)

Problem instance: cities $1, 2, \ldots, n$; distances $d_{ij} \in \mathbb{R}^+$ ($1 \le i, j \le n$) with $d_{ii} = 0$
Feasible solutions: permutations $\pi$ of $(1, 2, \ldots, n)$
Objective function: $P(\pi) = \sum_{i=1}^{n} d_{\pi(i),\, \pi(i \bmod n + 1)}$
Optimal solutions: a feasible solution $\pi$ with minimal $P(\pi)$

SLIDE 15

Encoding

Idea: Hopfield net with $d = n^2 + 1$ neurons:

[Figure: an $n \times n$ grid of neurons encoding $\pi(i)$; the weights include negated distances $-d_{12}, -d_{21}, -d_{23}, -d_{32}, \ldots$]

Problem: the "size" of the weights must allow both feasible and good solutions
Trick: transition to a continuous Hopfield net with modified weights ⇒ good solutions for the TSP

SLIDE 16

4 Self-Organizing Maps (SOM)

Neurons:
  • Input: $1, 2, \ldots, d$ for the components $x_i$
  • Map: $1, 2, \ldots, m$ on a regular (linear, rectangular, hexagonal, …) grid; neuron $i$ at grid position $r_i$ stores a pattern vector $\mu_i \in \mathbb{R}^d$
  • Output: $1, 2, \ldots, d$ for $\mu_c$

Update: let $L \subset \mathbb{R}^d$ be the learning set; at time $t \in \mathbb{N}^+$, an $x \in L$ is chosen at random $\Rightarrow$ the winner $c \in \{1, \ldots, m\}$ is determined by $\|x - \mu_c\| \le \|x - \mu_i\|$ ($\forall i \in \{1, \ldots, m\}$), and all patterns are adapted:

$$\mu'_i := \mu_i + h(c, i, t)\,(x - \mu_i) \qquad \forall i \in \{1, \ldots, m\}$$

with $h(c, i, t)$ a time-dependent neighborhood relation and $h(c, i, t) \to 0$ for $t \to \infty$, e.g.

$$h(c, i, t) = \alpha(t) \cdot \exp\left(\frac{-\|r_c - r_i\|^2}{2 \sigma(t)^2}\right)$$
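
A compact Python sketch of this training loop; the concrete decay schedules for $\alpha(t)$ and $\sigma(t)$ are assumptions, since the slide only requires $h(c, i, t) \to 0$:

```python
import numpy as np

def train_som(L, grid, T=10_000, alpha0=0.5, sigma0=2.0, seed=0):
    """L: learning set, shape (N, d). grid: positions r_i of the m map
    neurons, e.g. shape (m, 2) for a rectangular map. Returns the mu_i."""
    rng = np.random.default_rng(seed)
    mu = rng.random((len(grid), L.shape[1]))       # initial pattern vectors
    for t in range(1, T + 1):
        x = L[rng.integers(len(L))]                # choose x from L at random
        c = np.argmin(np.linalg.norm(mu - x, axis=1))   # winner neuron c
        alpha = alpha0 * (1.0 - t / T)             # assumed decay, -> 0
        sigma = sigma0 * (1.0 - t / T) + 1e-3
        h = alpha * np.exp(-np.sum((grid - grid[c]) ** 2, axis=1)
                           / (2.0 * sigma ** 2))   # neighborhood h(c, i, t)
        mu += h[:, None] * (x - mu)                # mu' = mu + h(c,i,t)(x - mu)
    return mu
```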

SLIDE 17

Application of SOM

…include visualization and interpretation, dimension reduction schemes, clustering and classification, and COPs.


slide-18
SLIDE 18

A size-50 map adapts to a triangle


SLIDE 19

A 15 × 15 grid adapts to a triangle


SLIDE 22

SOM for Combinatorial Optimization

∆-TSP. Idea: use a growing ring (an elastic band) of neurons. Tests with $n \le 2392$ show that the running time scales linearly and the tours deviate from the optimum by less than 9%.


SLIDE 23

SOM for Combinatorial Optimization


SLIDE 24

[Figure: SOM tours with 10, 50, 500, and 2000 neurons]

SLIDE 26

SOM for Combinatorial Optimization

Tour with 2526 neurons:


SLIDE 28

5 Layered Feed-Forward Nets (MLP)

[Figure: a layered feed-forward net with layers 1, 2, 3]


SLIDE 29

Formalization

An $L$-layered MLP (multi-layer perceptron)
Layers: $S_0, S_1, \ldots, S_{L-1}, S_L$
Connections: from each neuron $i$ in $S_\ell$ to each neuron $j$ in $S_{\ell+1}$ with weight $w_{ij}$, except for the constant-1 neurons
Update: layer-wise synchronous

$$x'_j := \varphi\Big(\sum_{i \in V(j)} x_i w_{ij}\Big)$$

with $\varphi$ differentiable, e.g. $\varphi(a) = \sigma(a) = \frac{1}{1 + \exp(-a)}$

[Figure: graph of the sigmoid $\sigma(a)$ for $a \in [-5, 5]$]
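
In code, the layer-wise update is only a few lines (a sketch; modeling the constant-1 neurons by appending a 1 to each layer's activations is one common convention):

```python
import numpy as np

def sigma(a):
    """The sigmoid from the slide: 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + np.exp(-a))

def mlp_forward(x, weights):
    """weights[l] has shape (|S_l| + 1, |S_{l+1}|); the extra row holds the
    weights of the constant-1 neuron of layer l."""
    for W in weights:
        x = sigma(np.append(x, 1.0) @ W)   # x'_j = phi(sum_i x_i w_ij)
    return x
```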

SLIDE 30

Layered Feed-Forward Nets

Applications: function approximation, classification
Theorem: all Boolean functions can be computed with a 2-layered MLP (no proof)
Theorem: continuous real functions and their derivatives can be jointly approximated to arbitrary precision on compact sets (no proof)


SLIDE 31

Learning Parameters in MLP

Given: $x_1, \ldots, x_N \in \mathbb{R}^d$ and $t_1, \ldots, t_N \in \mathbb{R}^c$, an MLP with $d$ input and $c$ output neurons; $w = (w_1, \ldots, w_M)$ contains all weights, $f(x, w)$ is the net function

Task: find the optimal $w^*$ that minimizes the error

$$E(w) := \frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{c} \big(f_k(x_n, w) - t_{nk}\big)^2$$

The partial derivatives of $f$ exist with respect to the inputs and the parameters $\Rightarrow$ any gradient-based optimization method can be used (conjugate gradient, …):

$$\nabla_w E(w) = \sum_{n=1}^{N} \sum_{k=1}^{c} \big(f_k(x_n, w) - t_{nk}\big)\, \nabla_w f_k(x_n, w)$$
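
Since only $E(w)$ and $\nabla_w E(w)$ enter the optimization, any differentiable net function can be plugged into a generic optimizer. The sketch below (my illustration, not from the slides) approximates the gradient by central finite differences, which also serves as a standard correctness check for a backpropagation implementation:

```python
import numpy as np

def error(f, X, T, w):
    """E(w) = 1/2 sum_n sum_k (f_k(x_n, w) - t_nk)^2."""
    return 0.5 * sum(np.sum((f(x, w) - t) ** 2) for x, t in zip(X, T))

def numeric_grad(f, X, T, w, eps=1e-6):
    """Central finite differences; costs O(N * M^2) versus O(N * M)
    for backpropagation, so it is only useful for checking."""
    g = np.zeros_like(w)
    for m in range(len(w)):
        e = np.zeros_like(w)
        e[m] = eps
        g[m] = (error(f, X, T, w + e) - error(f, X, T, w - e)) / (2 * eps)
    return g
```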

SLIDE 32

Backpropagation

Basic calculus (chain rule):

$$\frac{\partial}{\partial t} f(g(t))\Big|_{t = t_0} = \frac{\partial}{\partial s} f(s)\Big|_{s = g(t_0)} \cdot \frac{\partial}{\partial t} g(t)\Big|_{t = t_0}$$

Example: $\varphi(a) := 9 - a^2$, $x = (1, 2)^T$, $w = (1, 1)^T$, $t = 2$:

[Figure: computation graph multiplying $x_1, x_2$ with $w_1, w_2$, summing, applying $\varphi$ to obtain $f$, and comparing with $t$ via $E = (f - t)^2 / 2$]

SLIDE 33

To evaluate $\nabla_w E(w)|_{w = (1,1)^T}$, use the local derivatives of the node functions:

  • $h(x, y) = x \cdot y \Rightarrow \partial h / \partial x = y$
  • $h(x, y) = x + y \Rightarrow \partial h / \partial x = 1$
  • $h(x, y) = x - y \Rightarrow \partial h / \partial x = 1$
  • $\varphi(x) = 9 - x^2 \Rightarrow \partial \varphi / \partial x = -2x$
  • $h(x) = x^2 / 2 \Rightarrow \partial h / \partial x = x$
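
Carrying the example through by hand (forward values first, then one chain-rule factor per node):

```python
x1, x2 = 1.0, 2.0          # inputs
w1, w2 = 1.0, 1.0          # weights
t = 2.0                    # target

# forward pass: store every intermediate value
a = x1 * w1 + x2 * w2      # a = 3
f = 9.0 - a ** 2           # f = phi(a) = 0
E = (f - t) ** 2 / 2.0     # E = 2

# backward pass: multiply local derivatives along each path
dE_df = f - t              # h(x) = x^2/2 and h(x,y) = x - y  ->  -2
dE_da = dE_df * (-2.0 * a) # phi'(a) = -2a                    ->  12
dE_dw1 = dE_da * x1        # da/dw1 = x1                      ->  12
dE_dw2 = dE_da * x2        # da/dw2 = x2                      ->  24
print(dE_dw1, dE_dw2)      # grad_w E at w = (1,1)^T is (12, 24)
```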

SLIDE 34

Backpropagation

Theorem: $\nabla_w E(w)$ can be computed in time $O(N \cdot M)$ if the network is of size $O(M)$

Algorithm: for each $n \in \{1, \ldots, N\}$:

  • compute the net function $f(x_n, w)$ and the associated error $E$ in the forward direction; store all intermediate values in the net
  • compute the partial derivatives of $E$ with respect to all intermediate values in the backward direction, and add up all parts for the total gradient