Class #21: Back-Propagation; Tuning Hyper-Parameters

Machine Learning (COMP 135): M. Allen, 06 Apr. 20


Learning in Neural Networks

} A neural network can learn a classification function by adjusting its weights to compute different responses

} This process is another version of gradient descent: the algorithm moves through a complex space of partial solutions, always seeking to minimize overall error

Monday, 6 Apr. 2020 Machine Learning (COMP 135) 2

[Network diagram: inputs 1 and 2 feed hidden nodes 3 and 4 via weights w1,3, w1,4, w2,3, w2,4; hidden nodes feed outputs 5 and 6 via weights w3,5, w3,6, w4,5, w4,6.]

Back-Propagation (Hinton, et al.)

function BACK-PROP-LEARNING(examples, network) returns a neural network
   inputs: examples, a set of examples, each with input vector x and output vector y
           network, a multilayer network with L layers, weights w_i,j, activation function g
   local variables: Δ, a vector of errors, indexed by network node
   repeat
      for each weight w_i,j in network do
         w_i,j ← a small random number
      for each example (x, y) in examples do
         /* Propagate the inputs forward to compute the outputs */
         for each node i in the input layer do a_i ← x_i
         for ℓ = 2 to L do
            for each node j in layer ℓ do
               in_j ← Σ_i w_i,j a_i
               a_j ← g(in_j)
         /* Propagate deltas backward from output layer to input layer */
         for each node j in the output layer do
            Δ[j] ← g′(in_j) × (y_j − a_j)
         for ℓ = L − 1 to 1 do
            for each node i in layer ℓ do
               Δ[i] ← g′(in_i) Σ_j w_i,j Δ[j]
         /* Update every weight in network using deltas */
         for each weight w_i,j in network do
            w_i,j ← w_i,j + α × a_i × Δ[j]
   until some stopping criterion is satisfied
   return network


  • Initial random weights
  • Loop over all training examples, generating the output, and then updating weights based on error
  • Stop when weights converge, or error is minimized

Source: Russell & Norvig, AI: A Modern Approach (Prentice Hall, 2010)
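The pseudocode above can be turned into a short, self-contained Python sketch. This is a hedged illustration, not Russell & Norvig's reference code: the weight layout (`w[l][j]` holds a bias followed by the incoming weights for node j of layer l+1), the uniform(−0.5, 0.5) initialization, the learning-rate and epoch defaults, and the `predict` helper are all assumptions of this sketch, and weights are initialized once rather than looping until a convergence test fires.

```python
import math
import random

def g(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def g_prime(z):
    """Derivative of g, written in terms of the weighted input z."""
    a = g(z)
    return a * (1.0 - a)

def back_prop_learning(examples, sizes, alpha=0.5, epochs=2000, seed=0):
    """Train a fully connected network by back-propagation.

    examples: list of (x, y) pairs; sizes: layer sizes, e.g. [2, 2, 1].
    """
    rng = random.Random(seed)
    # w[l][j] = [bias, weight from node 0, weight from node 1, ...]
    w = [[[rng.uniform(-0.5, 0.5) for _ in range(sizes[l] + 1)]
          for _ in range(sizes[l + 1])]
         for l in range(len(sizes) - 1)]
    for _ in range(epochs):
        for x, y in examples:
            # propagate the inputs forward to compute the outputs
            acts, ins = [list(x)], []
            for layer in w:
                ins.append([ws[0] + sum(wi * ai for wi, ai in zip(ws[1:], acts[-1]))
                            for ws in layer])
                acts.append([g(v) for v in ins[-1]])
            # propagate deltas backward from output layer to input layer
            deltas = [[g_prime(in_j) * (yj - aj)
                       for in_j, yj, aj in zip(ins[-1], y, acts[-1])]]
            for l in range(len(w) - 1, 0, -1):
                deltas.insert(0, [g_prime(ins[l - 1][i]) *
                                  sum(w[l][j][i + 1] * deltas[0][j]
                                      for j in range(len(w[l])))
                                  for i in range(len(ins[l - 1]))])
            # update every weight in network using deltas (bias input is 1)
            for l, layer in enumerate(w):
                for j, ws in enumerate(layer):
                    ws[0] += alpha * deltas[l][j]
                    for i, ai in enumerate(acts[l]):
                        ws[i + 1] += alpha * ai * deltas[l][j]
    return w

def predict(w, x):
    """Forward pass with trained weights; returns the output activations."""
    a = list(x)
    for layer in w:
        a = [g(ws[0] + sum(wi * ai for wi, ai in zip(ws[1:], a))) for ws in layer]
    return a
```

On a trivially separable toy task (map (0, 0) to 0 and (1, 1) to 1), a few thousand epochs drive the two outputs apart, which is enough to see the gradient-descent behavior the slides describe.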


Propagating Output Values Forward

for each example (x, y) in examples do
   /* Propagate the inputs forward to compute the outputs */
   for each node i in the input layer do a_i ← x_i
   for ℓ = 2 to L do
      for each node j in layer ℓ do
         in_j ← Σ_i w_i,j a_i
         a_j ← g(in_j)


At the first ("top") layer, each neuron input is set to the corresponding feature value. Go down layer-by-layer, calculating the weighted input sum for each neuron, and computing its output via the activation function g.


Propagating Error Backward

/* Propagate deltas backward from output layer to input layer */
for each node j in the output layer do
   Δ[j] ← g′(in_j) × (y_j − a_j)
for ℓ = L − 1 to 1 do
   for each node i in layer ℓ do
      Δ[i] ← g′(in_i) Σ_j w_i,j Δ[j]
/* Update every weight in network using deltas */
for each weight w_i,j in network do
   w_i,j ← w_i,j + α × a_i × Δ[j]
until some stopping criterion is satisfied


At the output ("bottom") layer, each delta-value is set to the error on that neuron, multiplied by the derivative of function g. Go bottom-up, setting each delta to the derivative value multiplied by the sum of the deltas at the next layer down (weighting each such value appropriately). After all the delta values are computed, update the weights on every node in the network.


A Back-Propagation Example

} Consider the following simple network, with:
  1. Two inputs
  2. A single hidden layer, consisting of one neuron
  3. Two output neurons
  4. Initial weights as shown

} Suppose we have the following data-point:

(x, y) = ((0.5, 0.4), (1, 0))

[Diagram: inputs I1 and I2 feed hidden neuron H; H feeds outputs O1 and O2; every weight, including each bias weight, is initially 0.1.]

A Back-Propagation Example

} For this data, we start by computing the output of H

} We have the weighted linear sum:

} And, assuming the logistic activation function, we get output:


in_H = 0.1 + (0.1 × 0.5) + (0.1 × 0.4) = 0.19

a_H = 1 / (1 + e^−0.19) ≈ 0.547


A Back-Propagation Example

} Next, we compute the output of each of the two output neurons

} Since each has identical weights, initial outputs will be the same:

in_O = 0.1 + (0.1 × 0.547) = 0.1547

a_O = 1 / (1 + e^−0.1547) ≈ 0.539

a_O1 = a_O2 = 0.539

A Back-Propagation Example

} Given the output vector y = (1, 0), we can compute the error terms for the two output neurons:

e_O1 = 1 − 0.539 = 0.461

e_O2 = 0 − 0.539 = −0.539

A Back-Propagation Example

} The derivative of the activation function at each output neuron is:

g′(in_O) = a_O × (1 − a_O) = 0.539 × 0.461 ≈ 0.2485

} And so we have our Δ values:

Δ[O1] = g′(in_O) × e_O1 = 0.2485 × 0.461 ≈ 0.115

Δ[O2] = g′(in_O) × e_O2 = 0.2485 × (−0.539) ≈ −0.134

A Back-Propagation Example

} Similarly, we can compute the derivative of the activation function and the Δ value for the hidden-layer neuron, H:

g′(in_H) = a_H × (1 − a_H) = 0.547 × 0.453 ≈ 0.248

Δ[H] = g′(in_H) × ( Σ_O w_H,O × Δ[O] )
     = 0.248 × ((0.1 × 0.115) + (0.1 × −0.134))
     = 0.248 × −0.0019 ≈ −0.0005

A Back-Propagation Example

} Finally, all weights in the network can be updated, using the Δ values (here, we will assume that the learning rate α = 1):

w_i,j ← w_i,j + α × a_i × Δ[j]

} The output-layer weights change noticeably: into O1, the bias weight becomes 0.1 + 0.115 = 0.215 and w_H,O1 = 0.1 + (0.547 × 0.115) ≈ 0.1629; into O2, the bias weight becomes 0.1 − 0.134 = −0.034 and w_H,O2 = 0.1 + (0.547 × −0.134) ≈ 0.0267

} Since Δ[H] ≈ −0.0005, the weights into H barely move: bias ≈ 0.0995, w_I1,H = 0.1 + (0.5 × −0.0005) ≈ 0.0998, w_I2,H = 0.1 + (0.4 × −0.0005) ≈ 0.0998

} Exercise for the reader: now that the weights are updated, re-compute the outputs of the network, along with the error values for each output neuron. What do we see?
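The hand computations in this example can be checked numerically. A minimal sketch, assuming the logistic activation throughout and carrying full precision rather than the slides' rounded values (so last digits can differ slightly; note that Δ[H] comes out slightly negative):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, y = (0.5, 0.4), (1.0, 0.0)    # the slides' data-point
w = 0.1                          # every initial weight, including biases

# forward pass
in_H = w + w * x[0] + w * x[1]                 # 0.19
a_H = sigmoid(in_H)                            # ~0.547
a_O1 = a_O2 = sigmoid(w + w * a_H)             # ~0.539

# backward pass: deltas at the outputs, then at H
d_O1 = a_O1 * (1 - a_O1) * (y[0] - a_O1)       # ~ +0.115
d_O2 = a_O2 * (1 - a_O2) * (y[1] - a_O2)       # ~ -0.134
d_H = a_H * (1 - a_H) * (w * d_O1 + w * d_O2)  # ~ -0.0005

# weight updates with learning rate alpha = 1
alpha = 1.0
w_b_O1, w_H_O1 = w + alpha * d_O1, w + alpha * a_H * d_O1
w_b_O2, w_H_O2 = w + alpha * d_O2, w + alpha * a_H * d_O2
w_b_H = w + alpha * d_H
w_I1_H, w_I2_H = w + alpha * x[0] * d_H, w + alpha * x[1] * d_H

# the exercise: re-run the forward pass with the updated weights
a_H2 = sigmoid(w_b_H + w_I1_H * x[0] + w_I2_H * x[1])
a_O1_new = sigmoid(w_b_O1 + w_H_O1 * a_H2)
a_O2_new = sigmoid(w_b_O2 + w_H_O2 * a_H2)
```

Comparing the new output errors against the old ones answers the "what do we see?" question directly.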


Hyperparameters for Neural Networks

} Multi-layer (deep) neural networks involve a number of different possible design choices, each of which can affect classifier accuracy:
  } Number of hidden layers
  } Size of each hidden layer
  } Activation function employed
  } Regularization term (controls over-fitting)

} This is not unique to neural networks
  } Logistic regression: regularization (C parameter in sklearn), class weights, etc.
  } SVM: kernel type, kernel parameters (like polynomial degree), error penalty (C again), etc.

} The question is often how we can tune these model control parameters effectively to find the best combinations


Heldout Cross-Validation

} We can use k-fold cross-validation techniques to estimate the real effectiveness of various parameter settings:
  1. Divide labeled data into k folds, each of size 1/k
  2. Repeat k times:
     a. Hold aside one of the folds; train on the remaining (k − 1); test on the heldout data
     b. Record classification error for both training and heldout data
  3. Average over the k trials

} This can give us a more robust estimate of real effectiveness

} It can also allow us to better detect over-fitting: when average heldout error is significantly worse than average training error, the model has grown too complex or is otherwise problematic
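The steps above can be sketched in a few lines. This is a minimal illustration, not a production harness: the `train_and_test` callback and its `(train_error, test_error)` return shape are assumptions of the sketch (in practice a library routine such as scikit-learn's cross-validation utilities would be used).

```python
import random

def k_fold_error(data, k, train_and_test, seed=0):
    """Average training and held-out error over k folds.

    data: list of labeled examples.
    train_and_test(train, heldout): hypothetical callback that fits the
    model being tuned and returns (train_error, heldout_error).
    """
    data = list(data)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]   # k folds of roughly equal size
    train_errs, test_errs = [], []
    for i in range(k):
        heldout = folds[i]                   # hold aside one fold
        train = [ex for j, f in enumerate(folds) if j != i for ex in f]
        tr, te = train_and_test(train, heldout)
        train_errs.append(tr)
        test_errs.append(te)
    # average over the k trials
    return sum(train_errs) / k, sum(test_errs) / k
```

Comparing the two returned averages is exactly the over-fitting check described above: a held-out average much worse than the training average signals trouble.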


Modifying Model Parameters

} Using heldout validation techniques, we can begin to explore various parts of the hyperparameter space
  } In each case, we try to maximize average performance on the heldout validation data

} For example: the number of layers in a neural network can be explored iteratively, starting with one layer and increasing one at a time (up to some reasonable limit) until over-fitting is detected

} Similarly, we can explore a range of layer sizes, starting with hidden layers of size equal to the number of input features, and increasing in some logarithmic manner until over-fitting occurs, or some practical limit is reached


Using Grid Search for Tuning

} One basic technique is to list out the different values of each parameter that we want to test, and systematically try different combinations of those values
  } For P distinct tuning parameters, this defines a P-dimensional space (or "grid") that we can explore, one combination at a time

} In many cases, since building, training, and testing the models for each combination all take some time, we may find that there are far too many such combinations to try
  } One possibility: many such models can be explored in parallel, allowing large numbers of combinations to be compared at the same time, given sufficient resources
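A minimal sketch of grid search, assuming a caller-supplied `evaluate` function that returns a held-out score to maximize; the parameter names and candidate values here are hypothetical stand-ins:

```python
from itertools import product

# hypothetical parameter grid -- any names and values could be substituted
grid = {
    "hidden_layers": [1, 2, 3],
    "layer_size": [16, 64, 256],
    "activation": ["logistic", "relu", "tanh"],
}

def grid_search(grid, evaluate):
    """Try every combination in the grid; keep the best-scoring one."""
    names = list(grid)
    best_score, best_params = float("-inf"), None
    # product walks the P-dimensional grid one combination at a time
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)          # e.g. average held-out accuracy
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score
```

Because each `evaluate` call is independent, the loop body is exactly the unit of work that can be farmed out in parallel, as noted above.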


Costs of Grid Search

} When we have large numbers of combinations of possible parameters, we may decide to limit the range of some of the parts of our "grid" for feasibility

} For example, we might try:
  1. # Hidden layers: 1, 2, …, 10
  2. Layer size: N, 2N, 5N, 10N, 20N (N: # input features)
  3. Activation: Sigmoid, ReLU, tanh
  4. Regularization (alpha): 10^−5, 10^−3, 10^−1, 10^1, 10^3

} Produces (10 × 5 × 3 × 5) = 750 different models

} If we are doing 10-fold validation, we need to run 7,500 total tests

} Still only a small fragment of the possible parameter-space
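The cost arithmetic above is easy to sketch directly:

```python
from math import prod

# values tried per hyperparameter in the slide's example grid
counts = {"hidden_layers": 10, "layer_size": 5, "activation": 3, "alpha": 5}
models = prod(counts.values())   # 10 * 5 * 3 * 5 = 750 models
tests = models * 10              # 7,500 training runs under 10-fold validation
```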


Random Search

} Instead of limiting our grid even further, or trying to spend even more time on more combinations, we might try to randomize the process

} Instead of limiting values, we choose randomly from any of a (larger) range of values:
  1. # Hidden layers: [1, 20]
  2. Layer size: [8, 1024]
  3. Activation: [Sigmoid, ReLU, tanh]
  4. Regularization (alpha): [10^−7, 10^7]

} For each of these, we assign a probability distribution over its values (uniform or otherwise)
  } We may presume these distributions are independent of one another

} For T tests, we sample each of the ranges for one possible value, giving us T different combinations of those values
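Sampling T combinations can be sketched as follows. The parameter names mirror the list above; the distributions (uniform over each range, log-uniform for alpha) and the `evaluate` scoring callback are assumptions of the sketch:

```python
import random

def sample_params(rng):
    """Draw one combination, sampling each range independently."""
    return {
        "hidden_layers": rng.randint(1, 20),
        "layer_size": rng.randint(8, 1024),
        "activation": rng.choice(["logistic", "relu", "tanh"]),
        # log-uniform over [1e-7, 1e7]: uniform in the exponent
        "alpha": 10.0 ** rng.uniform(-7, 7),
    }

def random_search(T, evaluate, seed=0):
    """Evaluate T independently sampled combinations; keep the best."""
    rng = random.Random(seed)
    best_score, best_params = float("-inf"), None
    for _ in range(T):
        params = sample_params(rng)
        score = evaluate(params)          # e.g. average held-out accuracy
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score
```

Unlike a grid, every trial here draws a fresh value for every parameter, which is why random search rarely wastes trials re-testing the same value of an important parameter.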


Performance of Random Search

} This technique can sometimes out-perform grid search

} When using a grid, it is sometimes possible that we just miss some intermediate, and important, value completely

} The random approach can often hit upon the better combinations with the same (or far less) testing involved

[Figure 1 (two panels, "Grid Layout" and "Random Layout"; axes: important parameter vs. unimportant parameter): Grid and random search of nine trials for optimizing a function f(x, y) = g(x) + h(y) ≈ g(x) with low effective dimensionality. Above each square g(x) is shown in green, and left of each square h(y) is shown in yellow. With grid search, nine trials only test g(x) in three distinct places. With random search, all nine trials explore distinct values of g. This failure of grid search is the rule rather than the exception in high-dimensional hyper-parameter optimization. From: J. Bergstra & Y. Bengio, "Random search for hyper-parameter optimization," Journal of Machine Learning Research 13 (2012).]


Performance of Random Search

[Figure: accuracy on rotated MNIST (y-axis, 0.3–1.0) vs. experiment size in trials (x-axis, 1–32). From: J. Bergstra & Y. Bengio, "Random search for hyper-parameter optimization," Journal of Machine Learning Research 13 (2012).]

} Performance for grid search over 100 different neural network parameter combinations

} Statistically significant improvement for as few as 8 randomly chosen combination models


This Week & Upcoming

} Topics: Neural Networks, Reinforcement Learning

} Homework 04: due Monday, 13 April, 5:00 PM

} Project 02: due Monday, 27 April, 5:00 PM
  } Sentiment analysis in review text
  } Uses two different models of textual data

} Office Hours:
  } Hours and Zoom links can be found on Piazza and Canvas
