optimal learning rate

Optimal Learning Rate - PowerPoint PPT Presentation


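  1. Optimal Learning Rate
     • What is the optimal value $\eta_{\text{opt}}$ of the learning rate? Consider the 1-dim. case. Use a second-order Taylor expansion around the current weight $w_c$:
       $$E(w) = E(w_c) + (w - w_c)\,\frac{\partial E(w_c)}{\partial w} + \frac{1}{2}(w - w_c)^2\,\frac{\partial^2 E(w_c)}{\partial w^2}.$$
       Differentiating both sides with respect to $w$ gives:
       $$\frac{\partial E(w)}{\partial w} = \frac{\partial E(w_c)}{\partial w} + (w - w_c)\,\frac{\partial^2 E(w_c)}{\partial w^2}.$$
       Setting $w = w_{\min}$ and noting that $\frac{\partial E(w_{\min})}{\partial w} = 0$, one obtains
       $$0 = \frac{\partial E(w_c)}{\partial w} + (w_{\min} - w_c)\,\frac{\partial^2 E(w_c)}{\partial w^2}.$$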

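  2. Optimal Learning Rate (cont.)
     $$w_{\min} = w_c - \underbrace{\left(\frac{\partial^2 E(w_c)}{\partial w^2}\right)^{-1}}_{\eta_{\text{opt}}} \frac{\partial E(w_c)}{\partial w}$$
     [Figure: two plots of $E(w)$ against $w$ with the minimum $w_{\min}$ marked, one for $\eta < \eta_{\text{opt}}$ and one for $\eta = \eta_{\text{opt}}$.]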
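The result above can be checked numerically. The following is a minimal sketch, assuming a hypothetical quadratic error $E(w) = 2(w - 3)^2 + 1$ that is not taken from the slides; the function names are illustrative as well. With $\eta_{\text{opt}}$ set to the inverse second derivative, a single gradient step should land on $w_{\min}$, while a smaller rate covers only part of the distance.

```python
def grad(w):
    """First derivative of the illustrative error E(w) = 2*(w - 3)**2 + 1."""
    return 4.0 * (w - 3.0)


def second_derivative(w):
    """Second derivative of E; constant because E is quadratic."""
    return 4.0


def gradient_step(w_c, eta):
    """One gradient-descent update: w_c - eta * dE/dw evaluated at w_c."""
    return w_c - eta * grad(w_c)


w_c = 0.0                                   # current weight (assumed start)
eta_opt = 1.0 / second_derivative(w_c)      # inverse curvature
w_after_opt = gradient_step(w_c, eta_opt)             # one step with eta_opt
w_after_small = gradient_step(w_c, 0.5 * eta_opt)     # one step with eta < eta_opt
```

For a non-quadratic error the expansion is only a local approximation, so a step with this rate would generally not reach the minimum exactly.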

  3. Hopfield Network Introductory Example
     • Suppose we want to store N binary images in some memory.
     • The memory should be content-addressable and insensitive to small errors.
     • We present corrupted images to the memory (e.g. our brain) and recall the corresponding images.
     [Figure: presentation of corrupted images; images recalled by the memory.]

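  4. Hopfield Network
     • $w_{ij}$ denotes the weight of the connection from unit $j$ to unit $i$
     • no unit has a connection with itself: $w_{ii} = 0, \ \forall i$
     • connections are symmetric: $w_{ij} = w_{ji}, \ \forall i, j$
     [Figure: network of five units $S_1, \dots, S_5$; one connection is labelled $w_{51} = w_{15}$.]
     The state of unit $i$ can take values $\pm 1$ and is denoted $S_i$. State dynamics are governed by the activity rule:
     $$S_i = \operatorname{sgn}\Big(\sum_j w_{ij} S_j\Big), \quad \text{where } \operatorname{sgn}(a) = \begin{cases} +1 & \text{if } a \ge 0, \\ -1 & \text{if } a < 0. \end{cases}$$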

  5. Learning Rule in a Hopfield Network
     Learning in Hopfield networks:
     • Store a set of desired memories $\{x^{(n)}\}$ in the network, where each memory is a binary pattern with $x_i \in \{-1, +1\}$.
     • The weights are set using the sum of outer products
       $$w_{ij} = \frac{1}{N} \sum_n x_i^{(n)} x_j^{(n)},$$
       where $N$ denotes the number of units ($N$ can also be some positive constant, e.g. the number of patterns).
     Given an $m \times 1$ column vector $a$ and a $1 \times n$ row vector $b$, the outer product $a \otimes b$ (short: $ab$) is defined as the $m \times n$ matrix. For $m = n = 3$:
     $$\begin{bmatrix} a_1 \\ a_2 \\ a_3 \end{bmatrix} \otimes \begin{bmatrix} b_1 & b_2 & b_3 \end{bmatrix} = \begin{bmatrix} a_1 b_1 & a_1 b_2 & a_1 b_3 \\ a_2 b_1 & a_2 b_2 & a_2 b_3 \\ a_3 b_1 & a_3 b_2 & a_3 b_3 \end{bmatrix}$$
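The learning rule can be sketched in a few lines of plain Python. The helper names `outer` and `hopfield_weights` are illustrative and not from the slides; the sketch assumes $N$ is the number of units and zeroes the diagonal because no unit connects to itself.

```python
def outer(a, b):
    """Outer product of a column vector a (length m) and a row vector b (length n)."""
    return [[a_i * b_j for b_j in b] for a_i in a]


def hopfield_weights(patterns):
    """Sum of the outer products of each pattern with itself, scaled by 1/N.

    N is taken to be the number of units; diagonal entries stay zero.
    """
    n_units = len(patterns[0])
    W = [[0.0] * n_units for _ in range(n_units)]
    for x in patterns:
        P = outer(x, x)
        for i in range(n_units):
            for j in range(n_units):
                if i != j:                      # w_ii = 0 for all i
                    W[i][j] += P[i][j] / n_units
    return W


# A 3 x 3 outer product and the weights for one hypothetical 4-unit pattern.
example_outer = outer([1, 2, 3], [4, 5, 6])
example_W = hopfield_weights([[1, -1, 1, -1]])
```

Because each term $x_i x_j$ equals $x_j x_i$, the resulting matrix is symmetric without any extra step.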

  6. Learning in Hopfield Network (Example)
     Suppose we want to store patterns $x^{(1)} = [-1, +1, -1]$ and $x^{(2)} = [+1, -1, +1]$.
     $$\begin{bmatrix} -1 \\ +1 \\ -1 \end{bmatrix} \otimes [-1, +1, -1] = \begin{bmatrix} +1 & -1 & +1 \\ -1 & +1 & -1 \\ +1 & -1 & +1 \end{bmatrix}$$
     $$+$$
     $$\begin{bmatrix} +1 \\ -1 \\ +1 \end{bmatrix} \otimes [+1, -1, +1] = \begin{bmatrix} +1 & -1 & +1 \\ -1 & +1 & -1 \\ +1 & -1 & +1 \end{bmatrix}$$

  7. Learning in Hopfield Network (Example) (cont.)
     $$W = \frac{1}{3} \begin{bmatrix} 0 & -2 & +2 \\ -2 & 0 & -2 \\ +2 & -2 & 0 \end{bmatrix}$$
     Recall: no unit has a connection with itself.
     The storage of patterns in the network can also be interpreted as constructing stable states. The condition for a pattern to be stable is:
     $$\operatorname{sgn}\Big(\sum_j w_{ij} x_j\Big) = x_i, \quad \forall i.$$
     Suppose we present pattern $x^{(1)}$ to the network and want to restore the corresponding pattern.

  8. Learning in Hopfield Network (Example) (cont.)
     Let us assume that the network states are set as follows: $S_i = x_i, \ \forall i$. We can restore pattern $x^{(1)} = [-1, +1, -1]$ as follows:
     $$S_1 = \operatorname{sgn}\Big(\sum_{j=1}^{3} w_{1j} S_j\Big) = -1, \qquad S_2 = \operatorname{sgn}\Big(\sum_{j=1}^{3} w_{2j} S_j\Big) = +1, \qquad S_3 = \operatorname{sgn}\Big(\sum_{j=1}^{3} w_{3j} S_j\Big) = -1$$
     Can we also restore the original patterns by presenting "similar" patterns which are corrupted by noise?
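The three-unit example can be replayed in code. This is a minimal sketch: the function names, the fixed update order 0, 1, 2, and the choice to corrupt the first component of $x^{(1)}$ are assumptions made for illustration. In a network this small, a different update order or a different corrupted component may end in the other stored pattern instead.

```python
def sgn(a):
    """Sign function from the activity rule: +1 if a >= 0, otherwise -1."""
    return 1 if a >= 0 else -1


def hopfield_weights(patterns):
    """w_ij = (1/N) * sum over patterns of x_i * x_j, with a zero diagonal."""
    n = len(patterns[0])
    return [[0.0 if i == j else sum(x[i] * x[j] for x in patterns) / n
             for j in range(n)] for i in range(n)]


def recall(W, state, max_sweeps=5):
    """Update units one at a time in index order until no unit changes."""
    S = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(S)):
            new_state = sgn(sum(W[i][j] * S[j] for j in range(len(S))))
            if new_state != S[i]:
                S[i] = new_state
                changed = True
        if not changed:
            break
    return S


x1 = [-1, 1, -1]
x2 = [1, -1, 1]
W = hopfield_weights([x1, x2])

restored_clean = recall(W, x1)            # stored pattern presented unchanged
restored_noisy = recall(W, [1, 1, -1])    # x1 with its first component flipped
```

With these assumptions the corrupted input settles back into $x^{(1)}$, which suggests that small corruptions can be repaired, though not for every corruption and update order.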

  9. Updating States in a Hopfield Network
     Synchronous updates:
     • all units update their states $S_i = \operatorname{sgn}\big(\sum_j w_{ij} S_j\big)$ simultaneously.
     Asynchronous updates:
     • one unit at a time updates its state. The sequence of selected units may be a fixed sequence or a random sequence.
     Synchronously updating states can lead to oscillation (no convergence to a stable state).
     [Figure: two units with states $S_1 = +1$ and $S_2 = -1$; the connections are labelled 1.]
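The oscillation can be reproduced with a two-unit network. This sketch assumes the figure shows two units coupled by weight 1 in both directions and starting at $S_1 = +1$, $S_2 = -1$; the function names are illustrative.

```python
def sgn(a):
    """Sign function from the activity rule: +1 if a >= 0, otherwise -1."""
    return 1 if a >= 0 else -1


def synchronous_step(W, S):
    """All units compute their new state from the same old state S."""
    n = len(S)
    return [sgn(sum(W[i][j] * S[j] for j in range(n))) for i in range(n)]


def asynchronous_sweep(W, S):
    """Units update one at a time in index order; each sees earlier updates."""
    S = list(S)
    n = len(S)
    for i in range(n):
        S[i] = sgn(sum(W[i][j] * S[j] for j in range(n)))
    return S


W = [[0, 1], [1, 0]]      # assumed reading of the figure: w_12 = w_21 = 1
start = [1, -1]

sync_1 = synchronous_step(W, start)     # both units flip
sync_2 = synchronous_step(W, sync_1)    # both flip back: a cycle of length 2
async_1 = asynchronous_sweep(W, start)  # second unit sees the first unit's new state
async_2 = asynchronous_sweep(W, async_1)
```

Under synchronous updates the state alternates between two configurations, whereas the asynchronous sweep reaches a state that further sweeps leave unchanged.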

  10. Aim of a Hopfield Network
      Our aim is that, by presenting a corrupted pattern and iteratively applying the state update rule, the Hopfield network will settle down in a stable state which corresponds to the desired pattern.
      A Hopfield network is a method for
      • pattern completion
      • error correction.
      The state of a Hopfield network can be expressed in terms of the energy function
      $$E = -\frac{1}{2} \sum_{i,j} w_{ij} S_i S_j$$
      Hopfield observed that if a state is a local minimum in the energy function, it is also a stable state for the network.
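The energy function can be evaluated while units update one at a time. The sketch below assumes the three-unit weight matrix $W = \frac{1}{3}[[0, -2, 2], [-2, 0, -2], [2, -2, 0]]$ from the earlier example and a hypothetical corrupted start state; `energy` and `energy_trace` are illustrative names.

```python
def sgn(a):
    """Sign function from the activity rule: +1 if a >= 0, otherwise -1."""
    return 1 if a >= 0 else -1


def energy(W, S):
    """E = -1/2 * sum over all i, j of w_ij * S_i * S_j."""
    n = len(S)
    return -0.5 * sum(W[i][j] * S[i] * S[j] for i in range(n) for j in range(n))


def energy_trace(W, S):
    """Run one asynchronous sweep and record the energy after every unit update."""
    S = list(S)
    trace = [energy(W, S)]
    for i in range(len(S)):
        S[i] = sgn(sum(W[i][j] * S[j] for j in range(len(S))))
        trace.append(energy(W, S))
    return S, trace


# Weight matrix of the three-unit example (assumed here, not recomputed).
W = [[0, -2 / 3, 2 / 3],
     [-2 / 3, 0, -2 / 3],
     [2 / 3, -2 / 3, 0]]

final_state, trace = energy_trace(W, [1, 1, -1])
```

In this run the recorded energies never increase from one update to the next, and the sweep ends in the stored pattern $[-1, +1, -1]$.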

  11. Basin of Attraction and Stable States
      [Figure: basins of attraction and stable states.]
      Within the state space, the stored patterns $x^{(n)}$ act like attractors.

  12. Haykin's Digit Example
      Suppose we stored the following digits in the Hopfield network:
      [Figure: the eight stored patterns, each labelled with its energy.]
      Pattern 0: Energy = −67.73
      Pattern 1: Energy = −67.87
      Pattern 2: Energy = −82.33
      Pattern 3: Energy = −86.6
      Pattern 4: Energy = −77.73
      Pattern 6: Energy = −90.47
      Pattern 9: Energy = −83.13
      Pattern box: Energy = −66.93

  13. Updated States of Corrupted Digit 6
      [Figure: network states after successive single-unit updates; each panel is labelled with the updated unit and the resulting energy.]
      Start Pattern: −10.27; unit 40: −12.2; unit 39: −13.6; unit 81: −14.87; unit 98: −15.87
      unit 80: −18.07; unit 12: −20.4; unit 114: −22.2; unit 115: −23.33; unit 49: −25.73
      unit 117: −26.8; unit 3: −29.67; unit 48: −30.13; unit 6: −31.47; unit 79: −34.4

  14. Updated States of Corrupted Digit 6 (cont.)
      [Figure: network states after successive single-unit updates; each panel is labelled with the updated unit and the resulting energy.]
      unit 113: −36.73; unit 57: −38.4; unit 103: −41.07; unit 18: −42.4; unit 109: −45.27
      unit 83: −47.6; unit 71: −50.4; unit 77: −52.67; unit 26: −56.47; unit 15: −58.4
      unit 31: −60.67; unit 58: −63.33; unit 16: −64.47; unit 29: −68; unit 88: −71.27

  15. Updated States of Corrupted Digit 6 (cont.)
      The resulting pattern (stable state with energy −90.47) matches the desired pattern.
      [Figure: network states after successive single-unit updates, followed by the original pattern; each panel is labelled with its energy.]
      unit 72: −73.73; unit 90: −77.27; unit 19: −81.47; unit 21: −84.27; unit 25: −87.33
      unit 73: −90.47; Original Pattern 6: −90.47

  16. Recall a Spurious Pattern
      [Figure: network states after successive single-unit updates; each panel is labelled with the updated unit and the resulting energy.]
      Start Pattern: −28.27; unit 44: −28.27; unit 12: −30.27; unit 64: −31.93; unit 45: −32.8
      unit 98: −33.4; unit 111: −35.6; unit 50: −37.6; unit 81: −40; unit 95: −42.6
      unit 65: −44.53; unit 15: −44.8; unit 54: −48.13; unit 62: −50.53; unit 33: −51.87

  17. Recall a Spurious Pattern (cont.)
      [Figure: network states after successive single-unit updates; each panel is labelled with the updated unit and the resulting energy.]
      unit 37: −53.73; unit 91: −56.53; unit 58: −59.93; unit 84: −61.6; unit 43: −63.2
      unit 28: −63.73; unit 112: −66.8; unit 48: −67.6; unit 88: −69; unit 26: −70.4
      unit 73: −71.93; unit 70: −74.13; unit 40: −76.6; unit 117: −80.27; unit 106: −81.4

  18. Recall a Spurious Pattern (cont.)
      The Hopfield network settled down in a local minimum with energy −84.93. However, this is not the desired pattern: it is a spurious pattern that was never stored in the network.
      [Figure: the last two single-unit updates and the original pattern; each panel is labelled with its energy.]
      unit 61: −84.8; unit 15: −84.93; Original Pattern 9: −83.13

  19. Incorrect Recall of Corrupted Pattern 2
      [Figure: network states after successive single-unit updates; each panel is labelled with the updated unit and the resulting energy.]
      Start Pattern: −22.07; unit 97: −22.07; unit 17: −22.13; unit 58: −22.33; unit 45: −24.13
      unit 18: −24.53; unit 100: −27.6; unit 7: −28.33; unit 103: −29.87; unit 81: −31.47
      unit 68: −32.13; unit 86: −32.33; unit 119: −35.47; unit 33: −36.53; unit 87: −38.67

  20. Incorrect Recall of Corrupted Pattern 2 (cont.)
      [Figure: network states after successive single-unit updates; each panel is labelled with the updated unit and the resulting energy.]
      unit 57: −39.2; unit 73: −41.73; unit 120: −45.47; unit 104: −48; unit 43: −49.6
      unit 91: −51.6; unit 37: −51.67; unit 3: −55.6; unit 31: −56.4; unit 24: −58.27
      unit 101: −60.73; unit 41: −61.87; unit 117: −62.87; unit 65: −64.8; unit 10: −68.93
