Optimal Learning Rate: What is the optimal value η_opt of the learning rate? (PowerPoint presentation)

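Optimal Learning Rate

• What is the optimal value $\eta_{\text{opt}}$ of the learning rate? Consider the 1-dimensional case. Use a second-order Taylor expansion around the current weight $w_c$ (neglecting higher-order terms, i.e. treating $E$ as locally quadratic):

$$E(w) = E(w_c) + (w - w_c)\,\frac{\partial E(w_c)}{\partial w} + \frac{1}{2}(w - w_c)^2\,\frac{\partial^2 E(w_c)}{\partial w^2}$$

Differentiating both sides with respect to $w$ gives:

$$\frac{\partial E(w)}{\partial w} = \frac{\partial E(w_c)}{\partial w} + (w - w_c)\,\frac{\partial^2 E(w_c)}{\partial w^2}$$

Setting $w = w_{\min}$ and noting that $\partial E(w_{\min})/\partial w = 0$, one obtains:

$$0 = \frac{\partial E(w_c)}{\partial w} + (w_{\min} - w_c)\,\frac{\partial^2 E(w_c)}{\partial w^2}$$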

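Optimal Learning Rate (cont.)

Solving for $w_{\min}$:

$$w_{\min} = w_c - \underbrace{\left(\frac{\partial^2 E(w_c)}{\partial w^2}\right)^{-1}}_{\eta_{\text{opt}}} \frac{\partial E(w_c)}{\partial w}$$

[Figure: two plots of E(w) versus w with the minimum w_min marked, one for η < η_opt and one for η = η_opt]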
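As a quick numerical check of this result, the sketch below runs plain gradient descent $w \leftarrow w - \eta\,\partial E/\partial w$ on a hypothetical quadratic error $E(w) = 2(w - 3)^2$ (our own choice, not taken from the slides). Because a quadratic equals its second-order Taylor expansion, $\eta_{\text{opt}} = 1/E''(w_c) = 0.25$ should land on $w_{\min} = 3$ in a single step, while a smaller rate such as $\eta_{\text{opt}}/2$ only approaches it gradually. The helper names `dE`, `d2E` and `gradient_descent` are illustrative.

```python
# Sketch: 1-D gradient descent on a hypothetical quadratic E(w) = a * (w - b)**2.
# For a quadratic, the second-order Taylor expansion is exact, so
# eta_opt = 1 / E''(w_c) jumps to w_min = b in one step.

def dE(w, a=2.0, b=3.0):
    """First derivative of E(w) = a * (w - b)**2."""
    return 2.0 * a * (w - b)

def d2E(w, a=2.0, b=3.0):
    """Second derivative of E (constant for a quadratic)."""
    return 2.0 * a

def gradient_descent(w, eta, steps):
    """Return the weights visited by the update w <- w - eta * dE(w)."""
    path = [w]
    for _ in range(steps):
        w = w - eta * dE(w)
        path.append(w)
    return path

w_c = 0.0
eta_opt = 1.0 / d2E(w_c)
print(eta_opt)                                  # 0.25
print(gradient_descent(w_c, eta_opt, 1))        # [0.0, 3.0]: w_min reached in one step
print(gradient_descent(w_c, 0.5 * eta_opt, 3))  # [0.0, 1.5, 2.25, 2.625]: eta < eta_opt
```

With $\eta = \eta_{\text{opt}}/2$ the remaining distance to $w_{\min}$ halves at every step, which is the slow approach sketched in the left-hand plot.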

Hopfield Network: Introductory Example

• Suppose we want to store N binary images in some memory.
• The memory should be content-addressable and insensitive to small errors.
• We present corrupted images to the memory (e.g. our brain) and recall the corresponding images.

[Figure: presentation of corrupted images, and the images recalled by the memory]

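Hopfield Network

• $w_{ij}$ denotes the weight of the connection from unit $j$ to unit $i$.
• No unit has a connection with itself: $w_{ii} = 0,\ \forall i$.
• Connections are symmetric: $w_{ij} = w_{ji},\ \forall i, j$.

[Figure: network of five units S_1, ..., S_5 with symmetric connections, e.g. w_51 = w_15]

The state of unit $i$ can take the values $\pm 1$ and is denoted $S_i$. The state dynamics are governed by the activity rule:

$$S_i = \operatorname{sgn}\Big(\sum_j w_{ij} S_j\Big), \quad \text{where } \operatorname{sgn}(a) = \begin{cases} +1 & \text{if } a \ge 0, \\ -1 & \text{if } a < 0. \end{cases}$$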
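The activity rule for a single unit can be sketched in a few lines of Python. This is a minimal illustration, assuming weights stored as a list of lists `W[i][j]` (weight from unit $j$ to unit $i$) and states as $\pm 1$ integers; the 3-unit weight matrix is hypothetical. Note the slide's convention $\operatorname{sgn}(0) = +1$.

```python
# Minimal sketch of the activity rule S_i = sgn(sum_j w_ij S_j).
# W[i][j] is the weight of the connection from unit j to unit i; states are +1/-1.

def sgn(a):
    """Sign function with the slide's convention sgn(0) = +1."""
    return 1 if a >= 0 else -1

def local_field(W, S, i):
    """Return a_i = sum_j w_ij * S_j (w_ii = 0, so unit i does not feed itself)."""
    return sum(W[i][j] * S[j] for j in range(len(S)))

def update_unit(W, S, i):
    """Return a copy of S in which unit i has been updated by the activity rule."""
    S = list(S)
    S[i] = sgn(local_field(W, S, i))
    return S

# Hypothetical 3-unit network: symmetric weights, zero diagonal.
W = [[0, 1, -1],
     [1, 0, 1],
     [-1, 1, 0]]
print(update_unit(W, [-1, -1, 1], 0))  # field -2, unit 0 stays -1: [-1, -1, 1]
print(update_unit(W, [-1, -1, 1], 1))  # field 0, sgn(0) = +1: [-1, 1, 1]
```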

Learning Rule in a Hopfield Network

Learning in Hopfield networks:
• Store a set of desired memories $\{x^{(n)}\}$ in the network, where each memory is a binary pattern with $x_i \in \{-1, +1\}$.
• The weights are set using the sum of outer products
$$w_{ij} = \frac{1}{N} \sum_n x_i^{(n)} x_j^{(n)},$$
where $N$ denotes the number of units ($N$ can also be some other positive constant, e.g. the number of patterns).

Given an $m \times 1$ column vector $a$ and a $1 \times n$ row vector $b$, the outer product $a \otimes b$ (short: $ab$) is defined as the $m \times n$ matrix with entries $a_i b_j$. For $m = n = 3$:

$$\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} \otimes \begin{bmatrix} b_1 & b_2 & b_3 \end{bmatrix} = \begin{bmatrix} a_1 b_1 & a_1 b_2 & a_1 b_3 \\ a_2 b_1 & a_2 b_2 & a_2 b_3 \\ a_3 b_1 & a_3 b_2 & a_3 b_3 \end{bmatrix}$$
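A sketch of the outer-product learning rule in pure Python follows. It assumes patterns are given as lists of $\pm 1$ values, uses `fractions.Fraction` so that the factor $1/N$ stays exact, and sets $w_{ii} = 0$ because no unit connects to itself. The function names `outer` and `hebbian_weights` are illustrative, not from the slides.

```python
from fractions import Fraction

def outer(a, b):
    """Outer product a ⊗ b: a len(a) x len(b) matrix with entries a_i * b_j."""
    return [[ai * bj for bj in b] for ai in a]

def hebbian_weights(patterns, scale=None):
    """w_ij = (1/scale) * sum_n x_i^(n) x_j^(n), with w_ii = 0.

    scale defaults to N, the number of units; pass a positive int
    (e.g. the number of patterns) to use another constant.
    """
    N = len(patterns[0])
    scale = N if scale is None else scale
    W = [[Fraction(0)] * N for _ in range(N)]
    for x in patterns:
        P = outer(x, x)  # contribution of one stored pattern
        for i in range(N):
            for j in range(N):
                if i != j:  # no self-connections
                    W[i][j] += Fraction(P[i][j], scale)
    return W

print(outer([1, 2, 3], [4, 5, 6]))  # [[4, 5, 6], [8, 10, 12], [12, 15, 18]]
W = hebbian_weights([[-1, +1, -1], [+1, -1, +1]])
print(W[0][1], W[0][2], W[1][2])    # -2/3 2/3 -2/3
```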

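Learning in Hopfield Network (Example)

Suppose we want to store the patterns $x^{(1)} = [-1, +1, -1]$ and $x^{(2)} = [+1, -1, +1]$. The two outer products are:

$$\begin{bmatrix} -1 \\ +1 \\ -1 \end{bmatrix} \otimes [-1, +1, -1] = \begin{bmatrix} +1 & -1 & +1 \\ -1 & +1 & -1 \\ +1 & -1 & +1 \end{bmatrix}$$

$$\begin{bmatrix} +1 \\ -1 \\ +1 \end{bmatrix} \otimes [+1, -1, +1] = \begin{bmatrix} +1 & -1 & +1 \\ -1 & +1 & -1 \\ +1 & -1 & +1 \end{bmatrix}$$

and they are added together.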

Learning in Hopfield Network (Example) (cont.)

$$W = \frac{1}{3} \begin{bmatrix} 0 & -2 & +2 \\ -2 & 0 & -2 \\ +2 & -2 & 0 \end{bmatrix}$$

Here $N = 3$ units, hence the factor $1/3$. Recall: no unit has a connection with itself, so the diagonal entries are 0.

The storage of patterns in the network can also be interpreted as constructing stable states. The condition for patterns to be stable is:

$$\operatorname{sgn}\Big(\sum_j w_{ij} x_j\Big) = x_i, \quad \forall i.$$

Suppose we present pattern $x^{(1)}$ to the network and want to restore the corresponding pattern.
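The weight matrix and the stability condition for this example can be checked mechanically. The sketch below rebuilds $W$ from the two patterns (using `fractions.Fraction` to keep $1/3$ exact) and tests $\operatorname{sgn}(\sum_j w_{ij} x_j) = x_i$ for every unit; `is_stable` is a hypothetical helper name.

```python
from fractions import Fraction

x1 = [-1, +1, -1]
x2 = [+1, -1, +1]
patterns = [x1, x2]
N = len(x1)  # 3 units, so the scale factor is 1/3

# w_ij = (1/N) * sum_n x_i^(n) x_j^(n), with w_ii = 0
W = [[Fraction(sum(x[i] * x[j] for x in patterns), N) if i != j else Fraction(0)
      for j in range(N)] for i in range(N)]

def sgn(a):
    """Sign function with sgn(0) = +1."""
    return 1 if a >= 0 else -1

def is_stable(W, x):
    """True if sgn(sum_j w_ij x_j) == x_i for every unit i."""
    return all(sgn(sum(W[i][j] * x[j] for j in range(len(x)))) == x[i]
               for i in range(len(x)))

print([[int(3 * w) for w in row] for row in W])  # [[0, -2, 2], [-2, 0, -2], [2, -2, 0]]
print(is_stable(W, x1), is_stable(W, x2))        # True True
print(is_stable(W, [1, 1, 1]))                   # False
```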

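Learning in Hopfield Network (Example) (cont.)

Let us assume that the network states are set as follows: $S_i = x_i,\ \forall i$. We can restore pattern $x^{(1)} = [-1, +1, -1]$ as follows:

$$S_1 = \operatorname{sgn}\Big(\sum_{j=1}^{3} w_{1j} S_j\Big) = \operatorname{sgn}\big(-\tfrac{4}{3}\big) = -1, \quad S_2 = \operatorname{sgn}\Big(\sum_{j=1}^{3} w_{2j} S_j\Big) = \operatorname{sgn}\big(+\tfrac{4}{3}\big) = +1, \quad S_3 = \operatorname{sgn}\Big(\sum_{j=1}^{3} w_{3j} S_j\Big) = \operatorname{sgn}\big(-\tfrac{4}{3}\big) = -1$$

The state is unchanged, so $x^{(1)}$ is a stable state of the network.

Can we also restore the original patterns by presenting “similar” patterns which are corrupted by noise?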
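For this 3-unit example the question can be tried directly. The sketch below is one possible recall loop, assuming asynchronous updates in the fixed order unit 1, 2, 3, repeated until no unit changes (the `max_sweeps` cap is only a safety guard); `recall` is a hypothetical helper name. Presenting $x^{(1)}$ with its first unit flipped brings the network back to $x^{(1)}$.

```python
from fractions import Fraction

# Weights of the example: W = (1/3) * [[0, -2, 2], [-2, 0, -2], [2, -2, 0]]
W = [[Fraction(w, 3) for w in row]
     for row in [[0, -2, 2], [-2, 0, -2], [2, -2, 0]]]

def sgn(a):
    """Sign function with sgn(0) = +1."""
    return 1 if a >= 0 else -1

def recall(W, S, max_sweeps=100):
    """Update units 0..N-1 one at a time, in a fixed order, until none changes."""
    S = list(S)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(S)):
            new = sgn(sum(W[i][j] * S[j] for j in range(len(S))))
            if new != S[i]:
                S[i] = new
                changed = True
        if not changed:
            break
    return S

print(recall(W, [-1, +1, -1]))  # x(1) itself stays put: [-1, 1, -1]
print(recall(W, [+1, +1, -1]))  # x(1) with unit 1 flipped: [-1, 1, -1]
```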

Updating States in a Hopfield Network

Synchronous updates:
• All units update their states $S_i = \operatorname{sgn}\big(\sum_j w_{ij} S_j\big)$ simultaneously.

Asynchronous updates:
• One unit at a time updates its state. The sequence of selected units may be a fixed sequence or a random sequence.

Synchronously updating states can lead to oscillation (no convergence to a stable state).

[Figure: two units with states S_1 = +1 and S_2 = −1, connected with weight 1]
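The oscillation can be reproduced with two units. This sketch assumes the figure's two "1" labels mean $w_{12} = w_{21} = 1$ (our reading of the figure) and starts from $S_1 = +1$, $S_2 = -1$. Synchronous updates flip both units back and forth forever, while one asynchronous sweep (unit 1, then unit 2) reaches a stable state.

```python
def sgn(a):
    """Sign function with sgn(0) = +1."""
    return 1 if a >= 0 else -1

# Assumed two-unit network: w_12 = w_21 = 1, no self-connections.
W = [[0, 1],
     [1, 0]]

def synchronous_step(W, S):
    """Every unit computes its new state from the *old* state vector."""
    return [sgn(sum(W[i][j] * S[j] for j in range(len(S)))) for i in range(len(S))]

def asynchronous_sweep(W, S):
    """Units 0, 1, ... update one at a time, each seeing the latest states."""
    S = list(S)
    for i in range(len(S)):
        S[i] = sgn(sum(W[i][j] * S[j] for j in range(len(S))))
    return S

S = [+1, -1]
sync_trace = [S]
for _ in range(4):
    sync_trace.append(synchronous_step(W, sync_trace[-1]))
print(sync_trace)  # [[1, -1], [-1, 1], [1, -1], [-1, 1], [1, -1]]: oscillates

async_state = asynchronous_sweep(W, S)
print(async_state)                         # [-1, -1]
print(asynchronous_sweep(W, async_state))  # [-1, -1]: stable
```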

Aim of a Hopfield Network

Our aim is that, when we present a corrupted pattern and iteratively apply the state update rule, the Hopfield network settles down in a stable state which corresponds to the desired pattern.

The Hopfield network is a method for
• pattern completion
• error correction.

The state of a Hopfield network can be expressed in terms of the energy function

$$E = -\frac{1}{2} \sum_{i,j} w_{ij} S_i S_j.$$

Hopfield observed that if a state is a local minimum of the energy function, it is also a stable state of the network.
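The energy function also explains why asynchronous updates settle down. With symmetric weights and $w_{ii} = 0$, changing unit $i$ by $\Delta S_i$ changes the energy by $\Delta E = -\Delta S_i \sum_j w_{ij} S_j$, and the activity rule always picks the sign that makes this $\le 0$. The sketch below checks this on a hypothetical random network (integer weights in $[-3, 3]$, 8 units, fixed seed); `random_symmetric_weights` and `async_energy_trace` are illustrative names.

```python
import random

def sgn(a):
    """Sign function with sgn(0) = +1."""
    return 1 if a >= 0 else -1

def energy(W, S):
    """E = -1/2 * sum_{i,j} w_ij S_i S_j."""
    n = len(S)
    return -0.5 * sum(W[i][j] * S[i] * S[j] for i in range(n) for j in range(n))

def random_symmetric_weights(n, rng):
    """Hypothetical test network: symmetric integer weights, zero diagonal."""
    W = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            W[i][j] = W[j][i] = rng.randint(-3, 3)
    return W

def async_energy_trace(W, S, updates, rng):
    """Update randomly chosen units one at a time; record E after each update."""
    S = list(S)
    trace = [energy(W, S)]
    for _ in range(updates):
        i = rng.randrange(len(S))
        S[i] = sgn(sum(W[i][j] * S[j] for j in range(len(S))))
        trace.append(energy(W, S))
    return S, trace

rng = random.Random(0)
W = random_symmetric_weights(8, rng)
S0 = [rng.choice([-1, 1]) for _ in range(8)]
S, trace = async_energy_trace(W, S0, 50, rng)
print(all(b <= a for a, b in zip(trace, trace[1:])))  # True: E never increases
```

The same check would generally fail for synchronous updates, which is consistent with the oscillation shown for two units.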

Basin of Attraction and Stable States

[Figure: state space with stable states and their basins of attraction]

Within the state space, the stored patterns $x^{(n)}$ act like attractors.
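For the tiny 3-unit example the basins can be listed exhaustively: start from each of the $2^3$ states, update asynchronously until nothing changes, and record which stable state is reached. This sketch assumes the fixed update order unit 1, 2, 3 and the convention $\operatorname{sgn}(0) = +1$; both choices shape the basins. Under these assumptions $x^{(2)}$ attracts six states and $x^{(1)}$ only two, so for instance $[-1, -1, -1]$, which differs from $x^{(1)}$ in one unit, ends up in $x^{(2)}$.

```python
from fractions import Fraction
from itertools import product

# Weights of the 3-unit example: W = (1/3) * [[0, -2, 2], [-2, 0, -2], [2, -2, 0]]
W = [[Fraction(w, 3) for w in row] for row in [[0, -2, 2], [-2, 0, -2], [2, -2, 0]]]

def sgn(a):
    """Sign function with sgn(0) = +1."""
    return 1 if a >= 0 else -1

def settle(W, S, max_sweeps=100):
    """Fixed-order asynchronous updates until no unit changes; return the final state."""
    S = list(S)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(S)):
            new = sgn(sum(W[i][j] * S[j] for j in range(len(S))))
            if new != S[i]:
                S[i], changed = new, True
        if not changed:
            break
    return tuple(S)

# Map each stable state to the list of starting states that end in it.
basins = {}
for start in product([-1, 1], repeat=3):
    basins.setdefault(settle(W, start), []).append(start)

for attractor, members in sorted(basins.items()):
    print(attractor, len(members))
# (-1, 1, -1) 2
# (1, -1, 1) 6
```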

Haykin’s Digit Example

Suppose we stored the following digits in the Hopfield network:

[Figure: the stored patterns, each labeled with its energy]
• Pattern 0: E = −67.73
• Pattern 1: E = −67.87
• Pattern 2: E = −82.33
• Pattern 3: E = −86.6
• Pattern 4: E = −77.73
• Pattern 6: E = −90.47
• Pattern 9: E = −83.13
• Pattern "box": E = −66.93

Updated States of Corrupted Digit 6

[Figure: network state after each single-unit update, labeled with the updated unit and the energy]
Start pattern: E = −10.27
Updated unit (energy after the update): 40 (−12.2) → 39 (−13.6) → 81 (−14.87) → 98 (−15.87) → 80 (−18.07) → 12 (−20.4) → 114 (−22.2) → 115 (−23.33) → 49 (−25.73) → 117 (−26.8) → 3 (−29.67) → 48 (−30.13) → 6 (−31.47) → 79 (−34.4)

Updated States of Corrupted Digit 6 (cont.)

Updated unit (energy after the update): 113 (−36.73) → 57 (−38.4) → 103 (−41.07) → 18 (−42.4) → 109 (−45.27) → 83 (−47.6) → 71 (−50.4) → 77 (−52.67) → 26 (−56.47) → 15 (−58.4) → 31 (−60.67) → 58 (−63.33) → 16 (−64.47) → 29 (−68) → 88 (−71.27)

Updated States of Corrupted Digit 6 (cont.)

Updated unit (energy after the update): 72 (−73.73) → 90 (−77.27) → 19 (−81.47) → 21 (−84.27) → 25 (−87.33) → 73 (−90.47)
Original pattern 6: E = −90.47

The resulting pattern (a stable state with energy −90.47) matches the desired pattern.

Recall a Spurious Pattern

[Figure: network state after each single-unit update, labeled with the updated unit and the energy]
Start pattern: E = −28.27
Updated unit (energy after the update): 44 (−28.27) → 12 (−30.27) → 64 (−31.93) → 45 (−32.8) → 98 (−33.4) → 111 (−35.6) → 50 (−37.6) → 81 (−40) → 95 (−42.6) → 65 (−44.53) → 15 (−44.8) → 54 (−48.13) → 62 (−50.53) → 33 (−51.87)

Recall a Spurious Pattern (cont.)

Updated unit (energy after the update): 37 (−53.73) → 91 (−56.53) → 58 (−59.93) → 84 (−61.6) → 43 (−63.2) → 28 (−63.73) → 112 (−66.8) → 48 (−67.6) → 88 (−69) → 26 (−70.4) → 73 (−71.93) → 70 (−74.13) → 40 (−76.6) → 117 (−80.27) → 106 (−81.4)

Recall a Spurious Pattern (cont.)

The Hopfield network settled down in a local minimum with energy −84.93. However, this pattern is not the desired pattern: it is a pattern which was not stored in the network (a spurious pattern).

Updated unit (energy after the update): 61 (−84.8) → 15 (−84.93)
Original pattern 9: E = −83.13
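One simple way to see that a Hopfield network can have stable states that were never stored: with the outer-product rule, the sign-flipped pattern $-x$ gives every unit the local field $-\sum_j w_{ij} x_j$, so it is stable whenever $x$ is (provided no local field is exactly 0), and it has the same energy. This is only one kind of spurious state and not necessarily the kind reached in the digit example. The sketch below uses a hypothetical 6-unit pattern; `hebbian`, `energy` and `is_stable` are illustrative helper names.

```python
from fractions import Fraction

def sgn(a):
    """Sign function with sgn(0) = +1."""
    return 1 if a >= 0 else -1

def hebbian(patterns):
    """w_ij = (1/N) * sum_n x_i^(n) x_j^(n) with w_ii = 0 (N = number of units)."""
    n = len(patterns[0])
    return [[Fraction(0) if i == j else Fraction(sum(x[i] * x[j] for x in patterns), n)
             for j in range(n)] for i in range(n)]

def energy(W, S):
    """E = -1/2 * sum_{i,j} w_ij S_i S_j."""
    n = len(S)
    return -Fraction(1, 2) * sum(W[i][j] * S[i] * S[j] for i in range(n) for j in range(n))

def is_stable(W, S):
    """True if no unit would change under the activity rule."""
    return all(sgn(sum(W[i][j] * S[j] for j in range(len(S)))) == S[i]
               for i in range(len(S)))

stored = [1, -1, 1, -1, 1, 1]   # hypothetical pattern, the only one stored
W = hebbian([stored])
flipped = [-s for s in stored]  # never stored

print(is_stable(W, stored), is_stable(W, flipped))  # True True
print(energy(W, stored), energy(W, flipped))        # -5/2 -5/2
```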

Incorrect Recall of Corrupted Pattern 2

[Figure: network state after each single-unit update, labeled with the updated unit and the energy]
Start pattern: E = −22.07
Updated unit (energy after the update): 97 (−22.07) → 17 (−22.13) → 58 (−22.33) → 45 (−24.13) → 18 (−24.53) → 100 (−27.6) → 7 (−28.33) → 103 (−29.87) → 81 (−31.47) → 68 (−32.13) → 86 (−32.33) → 119 (−35.47) → 33 (−36.53) → 87 (−38.67)

Incorrect Recall of Corrupted Pattern 2 (cont.)

Updated unit (energy after the update): 57 (−39.2) → 73 (−41.73) → 120 (−45.47) → 104 (−48) → 43 (−49.6) → 91 (−51.6) → 37 (−51.67) → 3 (−55.6) → 31 (−56.4) → 24 (−58.27) → 101 (−60.73) → 41 (−61.87) → 117 (−62.87) → 65 (−64.8) → 10 (−68.93)
