  1. Neural networks and Reinforcement learning review CS 540 Yingyu Liang

  2. Neural Networks

  3. Outline • Building unit: neuron • Linear perceptron • Non-linear perceptron • The power/limit of a single perceptron • Learning of a single perceptron • Neural network: a network of neurons • Layers, hidden units • Learning of neural network: backpropagation (gradient descent)

  4. Linear perceptron • Input: $x_1, x_2, \dots, x_d$ (for notational simplicity, define $x_0 = 1$) • Weights: $w_1, w_2, \dots, w_d$ • Bias: $w_0$ • Output: $a = \sum_{i=0}^{d} w_i x_i$

  5. Nonlinear perceptron • Input: $x_1, x_2, \dots, x_d$ (for notational simplicity, define $x_0 = 1$) • Weights: $w_1, w_2, \dots, w_d$ • Bias: $w_0$ • Activation function: $h(z) = \text{step}(z), \text{sigmoid}(z), \text{relu}(z), \dots$ • Output: $a = h\big(\sum_{i=0}^{d} w_i x_i\big)$
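
A minimal sketch of this unit; the sigmoid and step activations follow the slide, while the example weights and input are illustrative assumptions:

```python
import math

def perceptron(x, w, h):
    """Nonlinear perceptron: a = h(sum_i w_i x_i), with x_0 = 1 prepended so w_0 acts as the bias."""
    x = [1.0] + list(x)
    z = sum(wi * xi for wi, xi in zip(w, x))
    return h(z)

sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
step = lambda z: 1.0 if z >= 0 else 0.0

# hypothetical weights [w_0, w_1, w_2] and input [x_1, x_2]
print(perceptron([0.5, -1.0], w=[0.1, 2.0, 0.3], h=sigmoid))   # smooth output in (0, 1)
print(perceptron([0.5, -1.0], w=[0.1, 2.0, 0.3], h=step))      # hard 0/1 decision
```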

  6. Example Question • Will you go to the festival? • Go only if Weather is favorable and at least one of the other two conditions is favorable • Inputs: Weather, Company, Proximity • All inputs are binary; 1 means favorable
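
One way to realize this rule with a single step perceptron (these particular weights are an illustrative choice, not an official answer from the slides): weight Weather heavily enough that it is necessary, so the unit fires only when $2 \cdot \text{Weather} + \text{Company} + \text{Proximity} \ge 2.5$:

```python
from itertools import product

# hypothetical weights: bias, weather, company, proximity
w0, w_weather, w_company, w_proximity = -2.5, 2.0, 1.0, 1.0

for weather, company, proximity in product([0, 1], repeat=3):
    z = w0 + w_weather * weather + w_company * company + w_proximity * proximity
    go = 1 if z >= 0 else 0                      # step activation
    rule = 1 if weather == 1 and (company == 1 or proximity == 1) else 0
    assert go == rule                            # perceptron reproduces the stated rule
    print(weather, company, proximity, "->", go)
```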

  7. Multi-layer neural networks • Training: encode a label $y$ by an indicator vector: class 1 = $(1,0,0,\dots,0)$, class 2 = $(0,1,0,\dots,0)$, etc. • Test: choose the class corresponding to the largest output unit • [Figure: a network where hidden units compute $a_j^{(2)} = h\big(\sum_i w_{ji}^{(2)} x_i\big)$ and output units compute $a_k^{(3)} = h\big(\sum_j w_{kj}^{(3)} a_j^{(2)}\big)$, for $k = 1, \dots, K$]
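
A small sketch of the train/test conventions above: labels become indicator (one-hot) vectors, and at test time the predicted class is the index of the largest output unit. The two-layer forward pass, the random weights, and the omission of bias terms are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
h = lambda z: 1.0 / (1.0 + np.exp(-z))           # sigmoid activation

def one_hot(label, num_classes):
    y = np.zeros(num_classes)
    y[label] = 1.0                               # class c -> (0, ..., 1, ..., 0)
    return y

def forward(x, W2, W3):
    a2 = h(W2 @ x)                               # hidden layer: a_j^(2) = h(sum_i w_ji^(2) x_i)
    a3 = h(W3 @ a2)                              # output layer: a_k^(3) = h(sum_j w_kj^(3) a_j^(2))
    return a3

x = np.array([0.2, 0.7])                         # hypothetical input
W2 = rng.normal(size=(3, 2))                     # 3 hidden units
W3 = rng.normal(size=(4, 3))                     # 4 output units = 4 classes

print(one_hot(1, 4))                             # training target for class 2: [0. 1. 0. 0.]
print(int(np.argmax(forward(x, W2, W3))))        # test-time prediction: largest output unit
```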

  8. Learning in neural network • Again we will minimize the error ($K$ outputs): $E = \sum_{x \in D} E_x$, where $E_x = \lVert y - a \rVert^2 = \sum_{c=1}^{K} (a_c - y_c)^2$ • $x$: one training point in the training set $D$ • $a_c$: the $c$-th output for the training point $x$ • $y_c$: the $c$-th element of the label indicator vector for $x$ • [Figure: a network with inputs $x_1, x_2$ whose outputs $(a_1, \dots, a_K)$ are compared to the indicator vector $y = (1, 0, \dots, 0)$]

  9. Backpropagation • [Figure: layers (1) through (4); inputs $x_1, x_2$ feed forward to outputs $a_1, a_2$, giving error $\lVert y - a \rVert^2 = E_x$] • By the chain rule: $\delta_1^{(4)} = \frac{\partial E_x}{\partial z_1^{(4)}} = 2(a_1 - y_1)\, h'\big(z_1^{(4)}\big)$ and $\frac{\partial E_x}{\partial w_{11}^{(4)}} = \delta_1^{(4)}\, a_1^{(3)}$

  10. Backpropagation of $\delta$ • [Figure: the error terms $\delta^{(2)}, \delta^{(3)}, \delta^{(4)}$ are passed backwards from layer (4) to layer (2)] • Thus, for any neuron in the network: $\delta_k^{(m)} = h'\big(z_k^{(m)}\big) \sum_l w_{lk}^{(m+1)} \delta_l^{(m+1)}$ • $\delta_k^{(m)}$: $\delta$ of the $k$-th neuron in layer $m$ • $\delta_l^{(m+1)}$: $\delta$ of the $l$-th neuron in layer $m+1$ • $h'\big(z_k^{(m)}\big)$: derivative of the $k$-th neuron in layer $m$ w.r.t. its linear combination input • $w_{lk}^{(m+1)}$: weight from the $k$-th neuron in layer $m$ to the $l$-th neuron in layer $m+1$
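
A compact sketch putting slides 8-10 together on a two-layer network with sigmoid units and the squared error above; the shapes and data are illustrative assumptions. The output-layer term is $2(a - y)\,h'(z)$, hidden terms follow the recursion, and each weight gradient is the neuron's $\delta$ times the activation feeding that weight:

```python
import numpy as np

h  = lambda z: 1.0 / (1.0 + np.exp(-z))     # sigmoid activation
dh = lambda z: h(z) * (1.0 - h(z))          # its derivative h'(z)

rng = np.random.default_rng(0)
x  = np.array([0.2, 0.7])                   # input (layer-1 activations)
y  = np.array([1.0, 0.0])                   # one-hot target
W2 = rng.normal(size=(3, 2))                # layer-2 weights (biases omitted)
W3 = rng.normal(size=(2, 3))                # layer-3 (output) weights

# forward pass, keeping each linear combination z for the backward pass
z2 = W2 @ x;  a2 = h(z2)
z3 = W3 @ a2; a3 = h(z3)

# backward pass
delta3 = 2.0 * (a3 - y) * dh(z3)            # output layer: 2(a - y) h'(z)
delta2 = dh(z2) * (W3.T @ delta3)           # recursion: h'(z_k) * sum_l w_lk delta_l
grad_W3 = np.outer(delta3, a2)              # dE_x/dw_kj^(3) = delta_k^(3) a_j^(2)
grad_W2 = np.outer(delta2, x)               # dE_x/dw_kj^(2) = delta_k^(2) x_j

# finite-difference check of one entry of grad_W3
eps = 1e-6
E = lambda W: np.sum((h(W @ h(W2 @ x)) - y) ** 2)
W3p = W3.copy(); W3p[0, 0] += eps
print(grad_W3[0, 0], (E(W3p) - E(W3)) / eps)   # the two numbers should nearly match
```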

  11. Example Question

  12. Example Question

  13. Example Question

  14. Example Question

  15. Convolution: discrete version • Given arrays $x[t]$ and $w[t]$, their convolution is a function $s[t]$: $s[t] = \sum_{a=-\infty}^{+\infty} x[a]\, w[t-a]$ • Written as $s = x * w$ or $s[t] = (x * w)[t]$ • When $x[t]$ or $w[t]$ is not defined, it is assumed to be 0
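
A direct sketch of this definition for finite arrays (entries outside their range treated as 0), assuming 0-based indexing:

```python
def convolve(x, w):
    """s[t] = sum_a x[a] * w[t - a], treating out-of-range entries as 0."""
    n = len(x) + len(w) - 1                      # length of the full convolution
    s = [0] * n
    for t in range(n):
        for a in range(len(x)):
            if 0 <= t - a < len(w):              # skip terms where w[t - a] is undefined
                s[t] += x[a] * w[t - a]
    return s

print(convolve([1, 2, 3, 4, 5, 6], [-1, 1, 1]))  # full output; the middle entries form the "valid" part
```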

  16. Convolution illustration • [Figure: filter $w = [z, y, x]$ and array $x = [a, b, c, d, e, f]$; the reversed filter $(x, y, z)$ aligns with $(b, c, d)$, giving $s[3] = xb + yc + zd$]

  17. Pooling illustration • [Figure: array $x = [a, b, c, d, e, f]$; a max-pooling window over $(b, c, d)$ outputs $\max(b, c, d)$]
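
A minimal sketch of 1-D max pooling, assuming a window of size 3 and stride 1 as in the picture (one window covers b, c, d); the numeric values stand in for a, ..., f:

```python
def max_pool_1d(x, window=3, stride=1):
    """Slide a window over x and keep the maximum of each window."""
    return [max(x[i:i + window]) for i in range(0, len(x) - window + 1, stride)]

print(max_pool_1d([1, 5, 2, 4, 3, 6]))   # second entry is max(b, c, d) = max(5, 2, 4) = 5
```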

  18. Example question • Filter $w = [-1, 1, 1]$, array $x = [1, 2, 3, 4, 5, 6]$ • What is the value of $s = x * w$? (Valid padding)
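
One way to check an answer, assuming NumPy's convention matches the definition on slide 15 (np.convolve flips the second argument, and mode='valid' keeps only the positions where the two arrays fully overlap):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])
w = np.array([-1, 1, 1])
print(np.convolve(x, w, mode='valid'))   # prints [0 1 2 3] under this convention
```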

  19. Reinforcement Learning

  20. Outline • The reinforcement learning task • Markov decision process • Value functions • Value iteration • Q functions • Q learning

  21. Reinforcement learning as a Markov decision process (MDP) • Markov assumption: $P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \dots) = P(s_{t+1} \mid s_t, a_t)$ • Also assume the reward is Markovian: $P(r_t \mid s_t, a_t, s_{t-1}, a_{t-1}, \dots) = P(r_t \mid s_t, a_t)$ • [Figure: the agent-environment loop producing states $s_0, s_1, s_2, \dots$, actions $a_0, a_1, a_2, \dots$, and rewards $r_0, r_1, r_2, \dots$] • Goal: learn a policy $\pi : S \to A$ for choosing actions that maximizes $E[r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots]$, where $0 \le \gamma < 1$, for every possible starting state $s_0$
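
For concreteness, a tiny sketch of the discounted return being maximized, for an assumed finite reward sequence and gamma = 0.9:

```python
def discounted_return(rewards, gamma=0.9):
    """r_t + gamma*r_{t+1} + gamma^2*r_{t+2} + ... for one observed reward sequence."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

print(discounted_return([0, 0, 10], gamma=0.9))  # 0 + 0.9*0 + 0.81*10 = 8.1
```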

  22. Value function for a policy • Given a policy $\pi : S \to A$, define $V^{\pi}(s) = E\big[\sum_{t=0}^{\infty} \gamma^t r_t\big]$, assuming the action sequence is chosen according to $\pi$ starting at state $s$ • We want the optimal policy $\pi^*$ where $\pi^* = \arg\max_{\pi} V^{\pi}(s)$ for all $s$ • We'll denote the value function for this optimal policy as $V^*(s)$

  23. Value iteration for learning $V^*(s)$
      initialize $V(s)$ arbitrarily
      loop until policy good enough {
        loop for $s \in S$ {
          loop for $a \in A$ {
            $Q(s, a) \leftarrow r(s, a) + \gamma \sum_{s' \in S} P(s' \mid s, a)\, V(s')$
          }
          $V(s) \leftarrow \max_a Q(s, a)$
        }
      }
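
A runnable sketch of this loop on a tiny made-up MDP (two states, two actions, with hypothetical rewards and transition probabilities); the stopping rule is simply a fixed number of sweeps:

```python
# hypothetical 2-state, 2-action MDP
S, A = [0, 1], [0, 1]
R = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 0.0}   # r(s, a)
P = {(0, 0): [1.0, 0.0], (0, 1): [0.5, 0.5],               # P(s' | s, a)
     (1, 0): [0.0, 1.0], (1, 1): [1.0, 0.0]}
gamma = 0.9

V = {s: 0.0 for s in S}                                    # initialize V(s) arbitrarily
for _ in range(100):                                       # stand-in for "until policy good enough"
    for s in S:
        Q = {a: R[(s, a)] + gamma * sum(P[(s, a)][sp] * V[sp] for sp in S) for a in A}
        V[s] = max(Q.values())                             # V(s) <- max_a Q(s, a)

print(V)
```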

  24. Q function • Define a new function, closely related to $V^*$: $Q(s, a) = E[r(s, a)] + \gamma\, E_{s' \mid s, a}[V^*(s')]$ • If the agent knows $Q(s, a)$, it can choose the optimal action without knowing $P(s' \mid s, a)$: $\pi^*(s) = \arg\max_a Q(s, a)$ and $V^*(s) = \max_a Q(s, a)$ • And it can learn $Q(s, a)$ without knowing $P(s' \mid s, a)$

  25. Q learning for deterministic worlds
      for each $s, a$: initialize table entry $\hat{Q}(s, a) \leftarrow 0$
      observe current state $s$
      do forever:
        select an action $a$ and execute it
        receive immediate reward $r$
        observe the new state $s'$
        update table entry: $\hat{Q}(s, a) \leftarrow r + \gamma \max_{a'} \hat{Q}(s', a')$
        $s \leftarrow s'$
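
A sketch of this procedure on a small made-up deterministic world (three states on a line, moving into the rightmost state yields reward 10); the environment, the random action selection, and the step budget are all illustrative assumptions:

```python
import random

# hypothetical deterministic world: states 0, 1, 2 on a line; action 0 = left, 1 = right
# moving into state 2 yields reward 10, every other move yields 0
def step(s, a):
    s_next = max(0, s - 1) if a == 0 else min(2, s + 1)
    r = 10.0 if s_next == 2 else 0.0
    return r, s_next

gamma = 0.9
Q = {(s, a): 0.0 for s in range(3) for a in range(2)}   # initialize table entries Q_hat(s, a) = 0

s = 0                                                   # observe current state s
for _ in range(5000):                                   # "do forever", truncated to a step budget
    a = random.choice([0, 1])                           # select an action a and execute it
    r, s_next = step(s, a)                              # receive reward r, observe new state s'
    Q[(s, a)] = r + gamma * max(Q[(s_next, ap)] for ap in range(2))   # Q_hat(s,a) <- r + gamma max_a' Q_hat(s',a')
    s = s_next                                          # s <- s'

print({k: round(v, 2) for k, v in Q.items()})
```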

  26. Example question

  27. Example question
