

SLIDE 1

Multi-Layer vs. Single-Layer Networks

Single-layer networks

  • are based on a linear combination of the input variables, transformed by a linear or non-linear activation function
  • are limited in the range of functions they can represent

Multi-layer networks

  • consist of multiple layers of units and can approximate any continuous functional mapping
  • are, compared to single-layer networks, not as straightforward to train

SLIDE 2

Multi-Layer Network

[Figure: a two-layer network with inputs $x_0 = 1, x_1, \ldots, x_d$, hidden units $z_0 = 1, z_1, \ldots, z_M$, and outputs $y_1, \ldots, y_K$; $x_0$ and $z_0$ are the bias units, the first layer carries the weights $w_{ji}$ and the second layer the weights $v_{kj}$.]

The connection in the first layer from input unit $i$ to hidden unit $j$ is denoted $w_{ji}$. The connection from hidden unit $j$ to output unit $k$ is denoted $v_{kj}$.

SLIDE 3

Multi-Layer Network (cont.)

Hidden unit $j$ receives the input

$$a_j = \sum_{i=1}^{d} w_{ji} x_i + w_{j0} = \sum_{i=0}^{d} w_{ji} x_i$$

and produces the output

$$z_j = g(a_j) = g\!\left(\sum_{i=0}^{d} w_{ji} x_i\right).$$

Output unit $k$ thus receives

$$a_k = \sum_{j=1}^{M} v_{kj} z_j + v_{k0} = \sum_{j=0}^{M} v_{kj} z_j$$

SLIDE 4

Multi-Layer Network (cont.)

and produces the final output

$$y_k = g(a_k) = g\!\left(\sum_{j=0}^{M} v_{kj} z_j\right) = g\!\left(\sum_{j=0}^{M} v_{kj}\, g\!\left(\sum_{i=0}^{d} w_{ji} x_i\right)\right)$$

Note that the activation function $g(\cdot)$ used in the first layer can differ from the one used in the second layer (or in further layers).
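As a concrete illustration, here is a minimal NumPy sketch of this forward pass. The array shapes, the choice of tanh for $g$, and the convention of storing the bias weights in column 0 of each weight matrix are assumptions made for the example, not something prescribed by the slides.

```python
import numpy as np

def forward(x, W, V, g=np.tanh):
    """Forward pass of a two-layer network.

    x : input vector of length d
    W : (M, d+1) first-layer weights, column 0 holds the biases w_j0
    V : (K, M+1) second-layer weights, column 0 holds the biases v_k0
    g : activation function (here the same g is used in both layers)
    """
    x_ext = np.concatenate(([1.0], x))   # prepend x_0 = 1 (bias input)
    z = g(W @ x_ext)                     # z_j = g(a_j),  a_j = sum_i w_ji x_i
    z_ext = np.concatenate(([1.0], z))   # prepend z_0 = 1 (bias unit)
    y = g(V @ z_ext)                     # y_k = g(a_k),  a_k = sum_j v_kj z_j
    return y, z_ext, x_ext

# Example with d = 3 inputs, M = 4 hidden units, K = 2 outputs
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3 + 1))
V = rng.normal(size=(2, 4 + 1))
y, _, _ = forward(rng.normal(size=3), W, V)
print(y)
```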


SLIDE 5

Multi-Layer Networks Example

[Figure: an example two-layer network with inputs $x_0 = 1, x_1, x_2$, hidden units $z_0 = 1, z_1, z_2, z_3$, and outputs $y_1, y_2$; $x_0$ and $z_0$ are bias units. Individual weights are labelled, e.g. $w_{10}, w_{20}, w_{30}, w_{12}, w_{22}, w_{32}$ in the first layer and $v_{10}, v_{20}, v_{13}, v_{23}$ in the second layer.]

Note: sometimes the layers of units are counted (here three layers) rather than the layers of adaptive weights. In this course, an L-layer network refers to a network with L layers of adaptive weights.

SLIDE 6

LMS Learning Rule for Multi-Layer Networks

  • We have seen that the LMS learning rule is based on the gradient descent algorithm.
  • The LMS learning rule worked because the error is proportional to the squared difference between the actual output $y$ and the target output $t$, and can be evaluated for each output unit.
  • In a multi-layer network we can use the LMS learning rule for the hidden-to-output weights because the target outputs are known.

Problem: we cannot compute target outputs for the input-to-hidden weights because these values are unknown; or, to put it the other way around, how do we update the weights in the first layer?

SLIDE 7

Backpropagation (Hidden-to-Output Layer)

Recall that we want to minimize the error on the training patterns between the actual output $y_k$ and the target output $t_k$:

$$E = \frac{1}{2} \sum_{k=1}^{K} (y_k - t_k)^2.$$

The backpropagation learning rule is based on gradient descent:

$$\Delta \mathbf{w} = -\eta \frac{\partial E}{\partial \mathbf{w}}, \qquad \text{or in component form} \quad \Delta w_{st} = -\eta \frac{\partial E}{\partial w_{st}}$$

Apply the chain rule for differentiation:

$$\frac{\partial E}{\partial v_{kj}} = \frac{\partial E}{\partial a_k}\,\frac{\partial a_k}{\partial v_{kj}}$$
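Writing out the two factors (a small intermediate step, using $a_k = \sum_{j=0}^{M} v_{kj} z_j$ and $y_k = g(a_k)$ from the earlier slides) gives

$$\frac{\partial a_k}{\partial v_{kj}} = z_j, \qquad \frac{\partial E}{\partial a_k} = (y_k - t_k)\, g'(a_k),$$

which combine into the update rule derived on the next slide.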

SLIDE 8

Backprop. (Hidden-to-Output Layer) (cont.)

The gradient descent rule gives:

$$\Delta v_{kj} = -\eta \frac{\partial E}{\partial v_{kj}} = -\eta\, (y_k - t_k)\, g'(a_k)\, z_j = -\eta\, \delta_k z_j$$

where

$$\delta_k = (y_k - t_k)\, g'(a_k).$$

Observe that this result is identical to that obtained for LMS.

SLIDE 9

Backpropagation (Input-to-Hidden Layer)

For the input-to-hidden connections we must differentiate with respect to the $w_{ji}$'s, which are deeply embedded in

$$E = \frac{1}{2} \sum_{k=1}^{K} \left( g\!\left(\sum_{j=0}^{M} v_{kj}\, g\!\left(\sum_{i=0}^{d} w_{ji} x_i\right)\right) - t_k \right)^{\!2}$$

Apply the chain rule:

$$\Delta w_{ji} = -\eta \frac{\partial E}{\partial w_{ji}}
= -\eta\, \frac{\partial E}{\partial z_j}\,\frac{\partial z_j}{\partial a_j}\,\frac{\partial a_j}{\partial w_{ji}}
= -\eta \sum_{k=1}^{K} \underbrace{(y_k - t_k)\, g'(a_k)}_{\delta_k}\, v_{kj}\; g'(a_j)\, x_i
= -\eta \sum_{k=1}^{K} \delta_k v_{kj}\; g'(a_j)\, x_i$$

SLIDE 10

Backprop. (Input-to-Hidden Layer) (cont.)

$$\Delta w_{ji} = -\eta\, \delta_j x_i$$

where

$$\delta_j = g'(a_j) \sum_{k=1}^{K} v_{kj}\, \delta_k$$

Observe that we need to propagate the errors (the $\delta$'s) backwards to update the weights $v$ and $w$:

$$\Delta v_{kj} = -\eta\, \delta_k z_j, \qquad \delta_k = (y_k - t_k)\, g'(a_k)$$

$$\Delta w_{ji} = -\eta\, \delta_j x_i, \qquad \delta_j = g'(a_j) \sum_{k=1}^{K} v_{kj}\, \delta_k$$
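In vectorized form these updates are only a few lines of NumPy. The sketch below is illustrative: the sizes, the bias-in-column-0 layout, and the choice $g = \tanh$ (for which $g'(a) = 1 - g(a)^2$) are assumptions, not part of the slides.

```python
import numpy as np

# Illustrative sizes: d = 3 inputs, M = 4 hidden units, K = 2 outputs.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))                      # first-layer weights, bias in column 0
V = rng.normal(size=(2, 5))                      # second-layer weights, bias in column 0
x = np.concatenate(([1.0], rng.normal(size=3)))  # input with x_0 = 1
t = np.array([1.0, -1.0])                        # target vector
eta = 0.1

z = np.tanh(W @ x)                               # z_j = g(a_j)
z_ext = np.concatenate(([1.0], z))               # z_0 = 1
y = np.tanh(V @ z_ext)                           # y_k = g(a_k)

delta_k = (y - t) * (1.0 - y**2)                 # delta_k = (y_k - t_k) g'(a_k)
delta_j = (1.0 - z**2) * (V[:, 1:].T @ delta_k)  # delta_j = g'(a_j) sum_k v_kj delta_k

V -= eta * np.outer(delta_k, z_ext)              # Delta v_kj = -eta delta_k z_j
W -= eta * np.outer(delta_j, x)                  # Delta w_ji = -eta delta_j x_i
```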

SLIDE 11

Error Backpropagation

  • Apply an input $\mathbf{x}$ and forward propagate it through the network using $a_j = \sum_{i=0}^{d} w_{ji} x_i$ and $z_j = g(a_j)$ to find the activations of all the hidden and output units.
  • Compute the deltas $\delta_k$ for all the output units using $\delta_k = (y_k - t_k)\, g'(a_k)$.
  • Backpropagate the $\delta$'s using $\delta_j = g'(a_j) \sum_{k=1}^{K} v_{kj}\, \delta_k$ to obtain $\delta_j$ for each hidden unit in the network.

Time and space complexity:

With $d$ input units, $M$ hidden units and $K$ output units there are $M(d+1)$ weights in the first layer and $K(M+1)$ weights in the second layer. Space and time complexity is therefore $O(M(K+d))$. If $e$ training epochs are performed, the time complexity is $O(e\,M(K+d))$.
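A quick sanity check of these weight counts (the sizes below are arbitrary example values, not taken from the slides):

```python
d, M, K = 3, 4, 2                     # example sizes
first_layer = M * (d + 1)             # M(d+1) weights, biases included
second_layer = K * (M + 1)            # K(M+1) weights, biases included
print(first_layer, second_layer, first_layer + second_layer)   # 16 10 26
```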

SLIDE 12

Backprop. (Output-to-Hidden Layer) Vis.

[Figure: the example network from before (inputs $x_0 = 1, x_1, x_2$; hidden units $z_0 = 1, z_1, z_2, z_3$; outputs $y_1, y_2$), with the output error $\delta_1$ shown being propagated back along the second-layer weights.]

$$v_{13}^{\text{new}} = v_{13} - \eta\, \delta_1 z_3$$

SLIDE 13

Backprop. (Hidden-to-Input Layer) Vis.

[Figure: the same example network, with the output errors $\delta_1$ and $\delta_2$ propagated back to hidden unit 1.]

$$w_{12}^{\text{new}} = w_{12} - \eta\, \underbrace{\left[\, g'(a_1)\,\bigl(v_{11}\delta_1 + v_{21}\delta_2\bigr) \right]}_{\delta_j}\, x_2$$

SLIDE 14

Property of Activation Functions

  • In the backpropagation algorithm the derivative of $g(a)$ is required to evaluate the $\delta$'s.
  • The activation functions

$$g_1(a) = \frac{1}{1 + \exp(-\beta a)} \qquad \text{and} \qquad g_2(a) = \tanh(\beta a)$$

obey the properties

$$g_1'(a) = \beta\, g_1(a)\,\bigl(1 - g_1(a)\bigr), \qquad g_2'(a) = \beta\,\bigl(1 - g_2(a)^2\bigr)$$
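These identities are easy to check numerically; in the sketch below the value of β and the finite-difference step are arbitrary choices for the example.

```python
import numpy as np

beta, h = 1.5, 1e-6
a = np.linspace(-3.0, 3.0, 7)

g1 = lambda a: 1.0 / (1.0 + np.exp(-beta * a))   # logistic sigmoid
g2 = lambda a: np.tanh(beta * a)                  # tanh

# Central finite differences vs. the closed-form derivatives from the slide
fd1 = (g1(a + h) - g1(a - h)) / (2 * h)
fd2 = (g2(a + h) - g2(a - h)) / (2 * h)
print(np.allclose(fd1, beta * g1(a) * (1 - g1(a)), atol=1e-6))  # True
print(np.allclose(fd2, beta * (1 - g2(a) ** 2), atol=1e-6))     # True
```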

SLIDE 15

Online Backpropagation Algorithm

input : (x_1, t_1), ..., (x_N, t_N) ∈ R^d × {C_1, C_2, ..., C_K},  η ∈ R^+,  max.epoch ∈ N,  ε ∈ R^+
output: w, v

begin
    randomly initialize w, v
    epoch ← 0
    repeat
        for n ← 1 to N do
            x ← select pattern x_n
            (forward propagate x and compute the δ_k and δ_j)
            v_kj ← v_kj − η δ_k z_j
            w_ji ← w_ji − η δ_j x_i
        epoch ← epoch + 1
    until (epoch = max.epoch) or (‖∇E‖ < ε)
    return w, v
end
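A runnable NumPy sketch of this online algorithm for a two-layer tanh network is given below. The real-valued target vectors, the initialization scale, and the use of the gradient accumulated over an epoch as a stand-in for ∇E in the stopping test are assumptions made for the example.

```python
import numpy as np

def train_online(X, T, M, eta=0.1, max_epoch=1000, eps=1e-4, seed=0):
    """Online backpropagation for a two-layer tanh network.

    X : (N, d) input patterns, T : (N, K) target vectors.
    Bias weights are stored in column 0 of W and V, as on the earlier slides.
    """
    rng = np.random.default_rng(seed)
    N, d = X.shape
    K = T.shape[1]
    W = rng.normal(scale=0.5, size=(M, d + 1))   # first-layer weights w_ji
    V = rng.normal(scale=0.5, size=(K, M + 1))   # second-layer weights v_kj

    for epoch in range(1, max_epoch + 1):
        gW, gV = np.zeros_like(W), np.zeros_like(V)   # gradient accumulated over the epoch
        for n in range(N):
            # forward pass
            x = np.concatenate(([1.0], X[n]))         # x_0 = 1
            z = np.tanh(W @ x)
            z_ext = np.concatenate(([1.0], z))        # z_0 = 1
            y = np.tanh(V @ z_ext)
            # deltas
            delta_k = (y - T[n]) * (1.0 - y**2)               # (y_k - t_k) g'(a_k)
            delta_j = (1.0 - z**2) * (V[:, 1:].T @ delta_k)   # g'(a_j) sum_k v_kj delta_k
            # online update: the weights change after every pattern
            dV, dW = np.outer(delta_k, z_ext), np.outer(delta_j, x)
            V -= eta * dV
            W -= eta * dW
            gV += dV
            gW += dW
        if np.sqrt(np.sum(gV**2) + np.sum(gW**2)) < eps:      # approximate gradient-norm test
            break
    return W, V, epoch
```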

SLIDE 16

Batch Backpropagation Algorithm

input : (x_1, t_1), ..., (x_N, t_N) ∈ R^d × {C_1, C_2, ..., C_K},  η ∈ R^+,  max.epoch ∈ N,  ε ∈ R^+
output: w, v

begin
    randomly initialize w, v
    epoch ← 0
    repeat
        Δw_ji ← 0, Δv_kj ← 0
        for n ← 1 to N do
            x ← select pattern x_n
            (forward propagate x and compute the δ_k and δ_j)
            Δv_kj ← Δv_kj − η δ_k z_j
            Δw_ji ← Δw_ji − η δ_j x_i
        v_kj ← v_kj + Δv_kj
        w_ji ← w_ji + Δw_ji
        epoch ← epoch + 1
    until (epoch = max.epoch) or (‖∇E‖ < ε)
    return w, v
end
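Compared with the online sketch above, only the update schedule changes: per-pattern gradients are accumulated and the weights move once per epoch. A minimal sketch of a single batch epoch, under the same assumptions (tanh activation, bias in column 0):

```python
import numpy as np

def batch_epoch(X, T, W, V, eta=0.1):
    """One epoch of batch backpropagation: accumulate over all patterns, then update once."""
    dW, dV = np.zeros_like(W), np.zeros_like(V)
    for n in range(X.shape[0]):
        x = np.concatenate(([1.0], X[n]))
        z = np.tanh(W @ x)
        z_ext = np.concatenate(([1.0], z))
        y = np.tanh(V @ z_ext)
        delta_k = (y - T[n]) * (1.0 - y**2)
        delta_j = (1.0 - z**2) * (V[:, 1:].T @ delta_k)
        dV -= eta * np.outer(delta_k, z_ext)   # accumulate Delta v_kj
        dW -= eta * np.outer(delta_j, x)       # accumulate Delta w_ji
    V += dV                                    # single weight update per epoch
    W += dW
    return W, V
```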

SLIDE 17

Multi-Layer Networks & Heaviside Step Func.

  • Possible decision boundaries that can be generated by networks with various numbers of layers, using the Heaviside step activation function.

SLIDE 18

Multi-Layer NN for XOR Separability Problem

[Figure: a two-layer network with inputs $x_0, x_1, x_2$, hidden units $z_0, z_1, z_2$ and output $y$ ($x_0$ and $z_0$ are bias units); the connection weights shown are 1, 1, 1, 1, −1, 0.7, −0.4, 0.5, −1.5.]

x_1   x_2   x_1 XOR x_2
−1    −1       −1
−1    +1       +1
+1    −1       +1
+1    +1       −1

$$g(a) = \begin{cases} -1 & \text{if } a < 0 \\ +1 & \text{if } a \geq 0 \end{cases}$$
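The exact weight-to-connection assignment is hard to read off the extracted figure, so the sketch below uses one illustrative set of weights (not necessarily the slide's) to show that a two-layer network with this step activation solves XOR: hidden unit 1 computes OR, hidden unit 2 computes AND, and the output fires for OR-but-not-AND.

```python
import numpy as np

g = lambda a: np.where(a >= 0, 1, -1)    # step activation from the slide

# Illustrative weights (bias in column 0); not necessarily those in the figure.
W = np.array([[ 1.0, 1.0, 1.0],          # z_1 = OR(x_1, x_2)
              [-1.0, 1.0, 1.0]])         # z_2 = AND(x_1, x_2)
V = np.array([[-1.0, 1.0, -1.0]])        # y = OR and not AND  ->  XOR

for x1, x2 in [(-1, -1), (-1, 1), (1, -1), (1, 1)]:
    z = g(W @ np.array([1.0, x1, x2]))           # hidden layer
    y = g(V @ np.concatenate(([1.0], z)))[0]     # output layer
    print(x1, x2, '->', y)                       # reproduces the XOR truth table
```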

SLIDE 19

Multi-Layer NN for XOR Sep. Problem (cont.)

[Figure: three plots in the $(x_1, x_2)$ plane, with points labelled $+1$ and $-1$, illustrating the decision boundaries of the individual hidden units and the resulting decision region of the network for the XOR problem.]

SLIDE 20

Expressive Power of Multi-Layer Networks

With a two-layer network and a sufficient number of hidden units, any continuous function can be represented, given proper nonlinearities and weights. The famous mathematician Andrey Kolmogorov proved that any continuous function $y(\mathbf{x})$ defined on the unit hypercube $[0,1]^n$, $n \geq 2$, can be represented in the form

$$y(\mathbf{x}) = \sum_{j=1}^{2n+1} \Xi_j\!\left( \sum_{i=1}^{n} \Psi_{ij}(x_i) \right)$$

for properly chosen functions $\Xi_j$ and $\Psi_{ij}$.

SLIDE 21

Bayes Decision Region vs. Neural Network

[Figure: scatter plot of the two classes in the $(x, y)$ plane with the Bayes-optimal and the learned decision boundaries.]

Points from the blue and the red class are generated by a mixture of Gaussians. The black curve shows the optimal separation in the Bayes sense. The gray curves show the separations found by the neural network in two independent backpropagation learning runs.

SLIDE 22

Neural Network (Density) Decision Region
