  1. Neural Networks Stefan Edelkamp

  2. 1 Overview - Introduction - Perceptron - Hopfield Nets - Self-Organizing Maps - Feed-Forward Neural Networks - Backpropagation Overview 1

  3. 2 Introduction Idea: Mimic the principle of biological neural networks with artificial neural networks [figure: a small network of numbered neurons] - adopt solutions proven in nature - parallelization ⇒ high performance - redundancy ⇒ tolerance for failures - enable learning with small programming effort Introduction 2

  4. Ingredients What an artificial neural network needs: • behavior of the artificial neurons • order of computation • activation function • structure of the net (topology) • recurrent nets • feed-forward nets • integration into the environment • learning algorithm Introduction 3

  5. Perceptron Learning . . . a very simple network with no hidden neurons Inputs: x, weighted with w, and the weighted inputs are summed Activation function: Θ Output: z, determined by computing Θ(w^T x) Additional: a weighted input representing the constant 1 Introduction 4

  6. Training net function f : M ⊂ ℝ^d → {0, 1} 1. initialize the counter i and the initial weight vector w_0 to 0 2. as long as there is a vector x with w_i^T x ≤ 0, set w_{i+1} to w_i + x and increase i by 1 3. return w_{i+1} Introduction 5
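
A minimal sketch of this training loop (Python with NumPy; the toy data set, the update cap, and the convention of appending a constant-1 input and sign-flipping class-0 vectors are illustrative assumptions, not part of the slides):

```python
import numpy as np

def perceptron_train(X, y, max_updates=100):
    """Perceptron learning as on the slide: add misclassified (sign-adjusted)
    input vectors to the weight vector until w_i^T x > 0 holds for all of them."""
    # Append the constant-1 input and flip class-0 vectors so that a correct
    # classification always means w^T x > 0.
    X1 = np.hstack([X, np.ones((len(X), 1))])
    V = np.where(y[:, None] == 1, X1, -X1)
    w = np.zeros(X1.shape[1])          # step 1: w_0 = 0
    for _ in range(max_updates):       # guard against non-separable data
        wrong = V[V @ w <= 0]
        if len(wrong) == 0:            # all vectors satisfy w^T x > 0
            return w
        w = w + wrong[0]               # step 2: w_{i+1} = w_i + x
    return w

# Hypothetical linearly separable toy data (logical OR).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1])
w = perceptron_train(X, y)
print(w, [bool(np.dot(w[:2], x) + w[2] > 0) for x in X])
```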

  7. Termination on Training Data Assume w* to be a final (separating) weight vector, normalized so that ||w*|| = 1 - f = Θ((x, 1)^T w*), constants δ and γ with |(x, 1)^T w*| ≥ δ and ||(x, 1)|| ≤ γ - for the angle α_i between w_i and w* we have 1 ≥ cos α_i = w_i^T w* / ||w_i|| - w_{i+1}^T w* = (w_i + x_i)^T w* = w_i^T w* + x_i^T w* ≥ w_i^T w* + δ ⇒ w_{i+1}^T w* ≥ δ(i + 1) - ||w_{i+1}|| = sqrt((w_i + x_i)^T (w_i + x_i)) = sqrt(||w_i||^2 + ||x_i||^2 + 2 w_i^T x_i) ≤ sqrt(||w_i||^2 + γ^2) (since w_i^T x_i ≤ 0 for a misclassified vector and ||x_i|| ≤ γ) ≤ γ sqrt(i + 1) (induction: ||w_i|| ≤ γ sqrt(i)) ⇒ cos α_i ≥ δ sqrt(i + 1)/γ, which would grow without bound as i → ∞; since cos α_i ≤ 1, only finitely many updates can occur Introduction 6

  8. 3 Hopfield Nets Neurons: 1, 2, . . . , d Activations: x_1, x_2, . . . , x_d with x_i ∈ {0, 1} Connections: w_ij ∈ ℝ (1 ≤ i, j ≤ d) with w_ii = 0, w_ij = w_ji ⇒ W := (w_ij)_{d×d} Update: asynchronous & stochastic x'_j := 0 if Σ_{i=1}^d x_i w_ij < 0, 1 if Σ_{i=1}^d x_i w_ij > 0, x_j otherwise Hopfield Nets 7

  9. Example Three neurons x_1, x_2, x_3 with weight matrix W = ( 0 1 −2 ; 1 0 3 ; −2 3 0 ) Use: • associative memory • computing Boolean functions • combinatorial optimization Hopfield Nets 8
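
A small sketch of the asynchronous, stochastic update rule from the previous slide, run on this example weight matrix (Python with NumPy; the random initialization and the fixed number of update steps are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

W = np.array([[ 0, 1, -2],
              [ 1, 0,  3],
              [-2, 3,  0]])          # example weight matrix from the slide

def hopfield_step(x, W, j):
    """Asynchronous update of neuron j: x'_j becomes 0 or 1 depending on the
    sign of sum_i x_i w_ij, and stays unchanged if that sum is exactly 0."""
    s = x @ W[:, j]
    if s < 0:
        x[j] = 0
    elif s > 0:
        x[j] = 1
    return x

x = rng.integers(0, 2, size=3)       # random initial activation
for _ in range(100):                 # stochastic choice of the neuron to update
    j = rng.integers(0, 3)
    hopfield_step(x, W, j)
print(x)
```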

  10. Energy of a Hopfield Net x = (x_1, x_2, . . . , x_d)^T ⇒ E(x) := −(1/2) x^T W x = −Σ_{i<j} x_i w_ij x_j is the energy of a Hopfield net Theorem: Every update that changes the state of the Hopfield net reduces the energy. Proof: Assume the update changes x_k into x'_k (and x'_j = x_j for j ≠ k) ⇒ E(x) − E(x') = −Σ_{i<j} x_i w_ij x_j + Σ_{i<j} x'_i w_ij x'_j = −Σ_{j≠k} x_k w_kj x_j + Σ_{j≠k} x'_k w_kj x_j = (x'_k − x_k) Σ_{j≠k} w_kj x_j > 0, since the update sets x'_k to 1 exactly when Σ_{j≠k} w_kj x_j > 0 and to 0 exactly when it is < 0 Hopfield Nets 9
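
A quick numerical check of the theorem (a sketch; the random symmetric weight matrix and the update loop are assumptions made for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

def energy(x, W):
    return -0.5 * x @ W @ x           # E(x) = -1/2 x^T W x

# Random symmetric Hopfield net with zero diagonal.
d = 6
A = rng.normal(size=(d, d))
W = (A + A.T) / 2
np.fill_diagonal(W, 0.0)

x = rng.integers(0, 2, size=d).astype(float)
E = energy(x, W)
for _ in range(200):
    j = rng.integers(0, d)
    s = x @ W[:, j]
    new = 0.0 if s < 0 else (1.0 if s > 0 else x[j])
    if new != x[j]:                        # a state-changing update ...
        x[j] = new
        assert energy(x, W) < E + 1e-12    # ... strictly reduces the energy
        E = energy(x, W)
print("stable state:", x, "energy:", E)
```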

  11. Solving a COP Input: combinatorial optimization problem (COP) Output: solution of the COP Algorithm: • choose a Hopfield net whose weights encode the parameters of the COP so that solutions lie at minima of the energy • start the net with a random activation • compute a sequence of updates until the net stabilizes • read off the parameters • test feasibility and optimality of the solution Hopfield Nets 10

  12. Multi-Flop Problem Problem Instance: k, n ∈ ℕ, k < n Feasible Solutions: x̃ = (x_1, . . . , x_n) ∈ {0, 1}^n Objective Function: P(x̃) = Σ_{i=1}^n x_i Optimal Solution: solution x̃ with P(x̃) = k Minimization Problem: d = n + 1, x_d = 1, x = (x_1, x_2, . . . , x_n, x_d)^T ⇒ E(x) = (Σ_{i=1}^d x_i − (k + 1))^2 = Σ_{i=1}^d x_i^2 + Σ_{i≠j} x_i x_j − 2(k + 1) Σ_{i=1}^d x_i + (k + 1)^2 = Σ_{i≠j} x_i x_j − (2k + 1) Σ_{i=1}^{d−1} x_i x_d + k^2 (using x_i^2 = x_i and x_d = 1) = −(1/2) Σ_{i<j<d} x_i (−4) x_j − (1/2) Σ_{i<d} x_i (4k − 2) x_d + k^2, i.e. weight −2 between every pair of input neurons and weight 2k − 1 between each input neuron and the constant neuron Hopfield Nets 11

  13. Example (n = 3, k = 1): neurons x_1, x_2, x_3 and the constant neuron x_4; weight −2 between every pair of x_1, x_2, x_3 and weight 1 between each x_i and x_4 Hopfield Nets 12
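
A sketch that instantiates the COP procedure of slide 11 for the multi-flop problem, using the weights read off above (w_ij = −2 between input neurons, w_id = 2k − 1 to the constant neuron; the problem size, seed, and step count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

def multiflop_weights(n, k):
    """Hopfield weights whose energy minima have exactly k of the n inputs set
    to 1. The last neuron is the constant-1 neuron x_d."""
    d = n + 1
    W = np.full((d, d), -2.0)          # -2 between every pair of input neurons
    W[:, -1] = W[-1, :] = 2 * k - 1    # 2k-1 between each input and the constant neuron
    np.fill_diagonal(W, 0.0)
    return W

def run_to_stability(W, x, steps=2000):
    d = len(x)
    for _ in range(steps):
        j = rng.integers(0, d - 1)     # never update the clamped constant neuron
        s = x @ W[:, j]
        x[j] = 0.0 if s < 0 else (1.0 if s > 0 else x[j])
    return x

n, k = 6, 2
W = multiflop_weights(n, k)
x = np.append(rng.integers(0, 2, size=n).astype(float), 1.0)   # random start, x_d = 1
x = run_to_stability(W, x)
print(x[:n], "number of ones:", int(x[:n].sum()))               # should equal k
```

With these weights the update rule drives the number of active input neurons up while it is below k and down while it is above k, so the stable states are exactly the feasible solutions.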

  14. Traveling Salesperson Problem (TSP) Problem Instance: cities 1, 2, . . . , n and distances d_ij ∈ ℝ⁺ (1 ≤ i, j ≤ n) with d_ii = 0 Feasible Solution: permutation π of (1, 2, . . . , n) Objective Function: P(π) = Σ_{i=1}^n d_{π(i), π(i mod n + 1)} Optimal Solutions: feasible solution π with minimal P(π) Hopfield Nets 13
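
The objective function P(π) as code (the small distance matrix is made up for illustration):

```python
import numpy as np

def tour_length(perm, D):
    """P(pi) = sum_i d[pi(i), pi(i mod n + 1)]: the length of the closed tour
    that visits the cities in the order given by the permutation."""
    n = len(perm)
    return sum(D[perm[i], perm[(i + 1) % n]] for i in range(n))

# Hypothetical symmetric distance matrix for 4 cities.
D = np.array([[ 0, 2, 9, 10],
              [ 2, 0, 6,  4],
              [ 9, 6, 0,  8],
              [10, 4, 8,  0]], dtype=float)
print(tour_length([0, 1, 3, 2], D))   # tour 0 -> 1 -> 3 -> 2 -> 0
```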

  15. Encoding Idea: Hopfield net with d = n² + 1 neurons [figure: encoding with weights −d_12, −d_21, −d_23, −d_32, . . . between neurons, positions indexed by i and π(i)] Problem: "size" of the weights to allow both feasible and good solutions Trick: transition to a continuous Hopfield net with modified weights ⇒ good solutions of the TSP Hopfield Nets 14

  16. 4 Self-Organizing Maps (SOM) Neurons: Input: 1, 2, . . . , d for the components x_i Map: 1, 2, . . . , m; a regular (linear, rectangular, or hexagonal) grid with positions r_i that stores pattern vectors µ_i ∈ ℝ^d Output: 1, 2, . . . , d for µ_c Update: L ⊂ ℝ^d is the learning set; at time t ∈ ℕ⁺, x ∈ L is chosen at random ⇒ c ∈ {1, . . . , m} is determined with ||x − µ_c|| ≤ ||x − µ_i|| (∀ i ∈ {1, . . . , m}) and the patterns are adapted: µ'_i := µ_i + h(c, i, t)(x − µ_i) ∀ i ∈ {1, . . . , m}, with h(c, i, t) a time-dependent neighborhood relation and h(c, i, t) → 0 for t → ∞, e.g. h(c, i, t) = α(t) · exp(−||r_c − r_i||² / (2σ(t)²)) Self-Organizing Maps (SOM) 15
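
A sketch of this update loop for a linear (1-D) grid; the schedules α(t) and σ(t) and the uniform learning set are assumptions, since the slides do not fix them:

```python
import numpy as np

rng = np.random.default_rng(3)

m, d, T = 20, 2, 2000
mu = rng.uniform(size=(m, d))                 # pattern vectors mu_i in R^d
r = np.arange(m, dtype=float)[:, None]        # grid positions r_i on a line

# Learning set L: points drawn uniformly from the unit square (an assumption).
L = rng.uniform(size=(500, d))

for t in range(1, T + 1):
    alpha = 0.5 * (1 - t / T)                 # assumed learning-rate schedule
    sigma = 1 + (m / 2) * (1 - t / T)         # assumed neighborhood-width schedule
    x = L[rng.integers(0, len(L))]            # choose x in L at random
    c = np.argmin(np.linalg.norm(x - mu, axis=1))            # winner neuron
    h = alpha * np.exp(-np.sum((r[c] - r) ** 2, axis=1) / (2 * sigma ** 2))
    mu += h[:, None] * (x - mu)               # mu'_i = mu_i + h(c,i,t)(x - mu_i)

print(mu[:5])                                 # first few adapted pattern vectors
```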

  17. Applications of SOM include: visualization and interpretation, dimension reduction, clustering and classification, COPs, . . . Self-Organizing Maps (SOM) 16

  18. A size-50 map adapts to a triangle Self-Organizing Maps (SOM) 17

  19. A 15 × 15 grid adapts to a triangle Self-Organizing Maps (SOM) 18

  20. SOM for Combinatorial Optimization: ∆-TSP Idea: use a growing ring (elastic band) of neurons Tests with n ≤ 2392 show that the running time scales linearly and that the resulting tours deviate from the optimum by less than 9 % Self-Organizing Maps (SOM) 19
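
A very compact sketch of the elastic-band idea (fixed ring size rather than a growing ring, made-up schedules and city coordinates), just to show how a neuron ring is read off as a tour:

```python
import numpy as np

rng = np.random.default_rng(4)

cities = rng.uniform(size=(30, 2))            # hypothetical city coordinates
m = 8 * len(cities)                           # ring of neurons (fixed size here)
angles = np.linspace(0, 2 * np.pi, m, endpoint=False)
ring = 0.5 + 0.1 * np.c_[np.cos(angles), np.sin(angles)]   # initial circular band

T = 20000
for t in range(1, T + 1):
    alpha = 0.8 * (1 - t / T) + 0.01          # assumed schedules
    sigma = max(1.0, (m / 10) * (1 - t / T))
    x = cities[rng.integers(0, len(cities))]  # pick a random city
    c = np.argmin(np.linalg.norm(x - ring, axis=1))          # closest ring neuron
    dist = np.abs(np.arange(m) - c)
    dist = np.minimum(dist, m - dist)         # cyclic distance along the ring
    h = alpha * np.exp(-dist ** 2 / (2 * sigma ** 2))
    ring += h[:, None] * (x - ring)           # pull the band towards the city

# Read off the tour: order the cities by the index of their closest ring neuron.
closest = [np.argmin(np.linalg.norm(city - ring, axis=1)) for city in cities]
tour = np.argsort(closest)
length = sum(np.linalg.norm(cities[tour[i]] - cities[tour[(i + 1) % len(tour)]])
             for i in range(len(tour)))
print("tour length:", length)
```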

  21. SOM for Combinatorial Optimization Self-Organizing Maps (SOM) 20

  22. [Figure panels: the ring with 10 neurons, 50 neurons, 500 neurons, 2000 neurons]

  23. SOM for Combinatorial Optimization [Figure: tour with 2526 neurons] Self-Organizing Maps (SOM) 21

  24. 5 Layered Feed-Forward Nets (MLP) [Figure: a three-layer feed-forward network] Layered Feed-Forward Nets (MLP) 22

  25. Formalization An L-layered MLP (multi-layer perceptron) Layers: S_0, S_1, . . . , S_{L−1}, S_L Connections: from each neuron i in S_ℓ to each neuron j in S_{ℓ+1} with weight w_ij, except for the constant 1-neurons Update: layer-wise synchronous x'_j := ϕ(Σ_{i ∈ V(j)} x_i w_ij) with ϕ differentiable, e.g. ϕ(a) = σ(a) = 1/(1 + exp(−a)) [figure: plot of σ on [−5, 5]] Layered Feed-Forward Nets (MLP) 23
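
A minimal forward pass for such an MLP with the logistic activation (layer sizes, random weights, and the way the constant 1-neuron is appended are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

def sigma(a):
    return 1.0 / (1.0 + np.exp(-a))             # phi(a) = 1 / (1 + exp(-a))

def forward(x, weights):
    """Layer-wise synchronous update: each layer computes phi(sum_i x_i w_ij).
    A constant-1 input is appended to every layer to model the 1-neurons."""
    for W in weights:
        x = sigma(np.append(x, 1.0) @ W)
    return x

d, hidden, c = 3, 4, 2                          # input, hidden, output sizes
weights = [rng.normal(size=(d + 1, hidden)),    # S_0 -> S_1 (incl. 1-neuron)
           rng.normal(size=(hidden + 1, c))]    # S_1 -> S_2
print(forward(rng.normal(size=d), weights))
```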

  26. Layered Feed-Forward Nets Applications: function approximation, classification Theorem: all Boolean functions can be computed with a 2-layered MLP (no proof) Theorem: continuous real functions and their derivatives can be jointly approximated to arbitrary precision on compact sets (no proof) Layered Feed-Forward Nets (MLP) 24

  27. Learning Parameters in MLPs Given: x^1, . . . , x^N ∈ ℝ^d and t^1, . . . , t^N ∈ ℝ^c, an MLP with d input and c output neurons, w = (w_1, . . . , w_M) containing all weights, f(x, w) the net function Task: find an optimal w* that minimizes the error E(w) := (1/2) Σ_{n=1}^N Σ_{k=1}^c (f_k(x^n, w) − t^n_k)^2 The partial derivatives of f exist with respect to the inputs and the parameters ⇒ any gradient-based optimization method can be used (conjugate gradient, . . . ) ∇_w E(w) = Σ_{n=1}^N Σ_{k=1}^c (f_k(x^n, w) − t^n_k) ∇_w f_k(x^n, w) Layered Feed-Forward Nets (MLP) 25
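
A sketch of E(w) and the gradient formula for the simplest possible case, a single sigmoid output neuron, where ∇_w f is available in closed form (data and network are made up; a finite-difference check stands in for a proof):

```python
import numpy as np

rng = np.random.default_rng(6)

def sigma(a):
    return 1.0 / (1.0 + np.exp(-a))

# Simplest case: d inputs, one sigmoid output neuron, so f(x, w) = sigma(w^T x)
# and grad_w f(x, w) = f (1 - f) x.
d, N = 3, 10
X = rng.normal(size=(N, d))                      # x^1, ..., x^N
t = rng.uniform(size=N)                          # t^1, ..., t^N
w = rng.normal(size=d)

def E(w):
    f = sigma(X @ w)
    return 0.5 * np.sum((f - t) ** 2)            # E(w) = 1/2 sum_n (f(x^n,w) - t^n)^2

def grad_E(w):
    f = sigma(X @ w)
    return ((f - t) * f * (1 - f)) @ X           # sum_n (f(x^n,w) - t^n) grad_w f(x^n,w)

# Check the analytic gradient against a central finite-difference approximation.
eps = 1e-6
g_num = np.array([(E(w + eps * e) - E(w - eps * e)) / (2 * eps) for e in np.eye(d)])
print("gradient check:", np.allclose(grad_E(w), g_num, atol=1e-5))
print("E before / after one gradient step:", E(w), E(w - 0.1 * grad_E(w)))
```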

  28. Backpropagation Basic calculus (chain rule): ∂/∂t f(g(t))|_{t=t_0} = ∂/∂s f(s)|_{s=g(t_0)} · ∂/∂t g(t)|_{t=t_0} Example: ϕ(a) := 9 − a², x = (1, 2)^T, w = (1, 1)^T, t = 2: [figure: computation graph with nodes * (for w_1 x_1 and w_2 x_2), +, ϕ, − (subtracting t), and (·)²/2, yielding the output f and the error E] Layered Feed-Forward Nets (MLP) 26

  29. The gradient ∇_w E(w)|_{w=(1,1)^T} is obtained by combining the local derivatives of the nodes in the graph: h(x, y) = x · y ⇒ ∂/∂x h(x, y) = y; h(x, y) = x + y ⇒ ∂/∂x h(x, y) = 1; h(x, y) = x − y ⇒ ∂/∂x h(x, y) = 1; ϕ(x) = 9 − x² ⇒ ∂/∂x ϕ(x) = −2x; h(x) = x²/2 ⇒ ∂/∂x h(x) = x
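
Applying these local rules to the example of slide 28 (this assumes, as the (·)²/2 rule and the graph suggest, that the error node computes E = (f − t)²/2; the concrete numbers follow from that assumption):

```python
# Forward pass through the computation graph of the example:
# a = w1*x1 + w2*x2, f = phi(a) = 9 - a^2, E = (f - t)^2 / 2.
x1, x2, w1, w2, t = 1.0, 2.0, 1.0, 1.0, 2.0
a = w1 * x1 + w2 * x2          # a = 3
f = 9 - a ** 2                 # f = 0
E = (f - t) ** 2 / 2           # E = 2

# Backward pass, one local derivative per node (chain rule):
dE_df = f - t                  # d/dx (x^2/2) = x, applied to (f - t)
df_da = -2 * a                 # phi'(a) = -2a
dE_da = dE_df * df_da
dE_dw1 = dE_da * x1            # d/dw1 (w1*x1) = x1
dE_dw2 = dE_da * x2            # d/dw2 (w2*x2) = x2
print(dE_dw1, dE_dw2)          # gradient of E with respect to (w1, w2)
```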

  30. Backpropagation Theorem: ∇_w E(w) can be computed in time O(N · M) if the network is of size O(M) Algorithm: for all n ∈ {1, . . . , N} • compute the net function f(x^n, w) and the associated error E in the forward direction and store the intermediate values in the net • compute the partial derivatives of E with respect to all intermediate values in the backward direction and add up all parts to obtain the total gradient Layered Feed-Forward Nets (MLP) 27
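
A compact sketch of this forward/backward scheme for a 2-layered MLP with logistic activations and squared error (architecture, data, learning rate, and the absence of bias handling are simplifications, not the slides' exact formulation):

```python
import numpy as np

rng = np.random.default_rng(7)

def sigma(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop(X, T, W1, W2):
    """Forward pass (storing intermediates), then backward pass accumulating
    the gradient of E = 1/2 sum_n sum_k (f_k(x^n) - t^n_k)^2 over all N pairs."""
    H = sigma(X @ W1)                      # hidden activations, shape (N, hidden)
    F = sigma(H @ W2)                      # output activations, shape (N, c)
    E = 0.5 * np.sum((F - T) ** 2)
    dF = (F - T) * F * (1 - F)             # dE w.r.t. the output pre-activations
    dH = (dF @ W2.T) * H * (1 - H)         # dE w.r.t. the hidden pre-activations
    return E, X.T @ dH, H.T @ dF           # error and gradients for W1, W2

d, hidden, c, N = 3, 5, 2, 50
X = rng.normal(size=(N, d))
T = rng.uniform(size=(N, c))
W1 = 0.1 * rng.normal(size=(d, hidden))
W2 = 0.1 * rng.normal(size=(hidden, c))

for _ in range(200):                       # plain gradient descent on E(w)
    E, gW1, gW2 = backprop(X, T, W1, W2)
    W1 -= 0.05 * gW1
    W2 -= 0.05 * gW2
print("error after training:", E)
```

Each of the N training pairs touches every one of the M weights a constant number of times per pass, which matches the O(N × M) bound stated in the theorem.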
