
Hope you had a FANTASTIC spring break! - PowerPoint PPT Presentation



  1. Hope you had a FANTASTIC spring break!

  2. Hope you had a FANTASTIC spring break! Thanksgiving

  3. CS 188: Artificial Intelligence Neural Nets (ctd) and IRL Instructor: Anca Dragan --- University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

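  4. Reminder: Linear Classifiers § Inputs are feature values § Each feature has a weight § Sum is the activation: $\text{activation}_w(x) = \sum_i w_i \cdot f_i(x) = w \cdot f(x)$ § If the activation is: § Positive, output +1 § Negative, output -1 [Diagram: features $f_1, f_2, f_3$, weighted by $w_1, w_2, w_3$, are summed (Σ) and the sum is tested for >0?]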
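Here is a minimal Python sketch of the binary linear classifier on slide 4. The function names and the example weights are hypothetical. The slide does not say what to do when the activation is exactly 0, so sending that case to -1 is an assumption.

```python
def activation(weights, features):
    """Weighted sum of the feature values: sum_i w_i * f_i(x)."""
    return sum(w * f for w, f in zip(weights, features))


def classify(weights, features):
    """Output +1 for a positive activation, -1 otherwise (ties at 0 go to -1 here)."""
    return 1 if activation(weights, features) > 0 else -1


# Hypothetical weights and feature values, for illustration only.
w = [2.0, -1.0, 0.5]
f = [1.0, 1.0, 2.0]
print(activation(w, f), classify(w, f))  # 2.0 1
```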

  5. Multiclass Logistic Regression § Multi-class linear classification § A weight vector for each class: $w_y$ § Score (activation) of a class y: $z_y = w_y \cdot f(x)$ § Prediction w/ highest score wins: $y = \arg\max_y w_y \cdot f(x)$ § How to make the scores into probabilities? Apply the softmax: $z_1, z_2, z_3 \rightarrow \frac{e^{z_1}}{e^{z_1} + e^{z_2} + e^{z_3}}, \frac{e^{z_2}}{e^{z_1} + e^{z_2} + e^{z_3}}, \frac{e^{z_3}}{e^{z_1} + e^{z_2} + e^{z_3}}$ (original activations → softmax activations)
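The scoring, prediction and softmax steps of slide 5 can be sketched in plain Python. This is a minimal illustration under stated assumptions. The helper names are hypothetical. Subtracting the maximum score before exponentiating is a common numerical-stability trick and is not part of the slide. It leaves the probabilities unchanged.

```python
import math


def scores(weight_vectors, features):
    """One activation per class: z_y = w_y . f(x)."""
    return [sum(w * f for w, f in zip(w_y, features)) for w_y in weight_vectors]


def softmax(z):
    """Turn activations into probabilities e^{z_y} / sum_y' e^{z_y'}."""
    m = max(z)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weight_vectors, features):
    """The class with the highest score wins."""
    z = scores(weight_vectors, features)
    return max(range(len(z)), key=lambda y: z[y])


# Hypothetical weights for 3 classes over 2 features.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x = [2.0, 1.0]
print(scores(W, x), predict(W, x))  # [2.0, 1.0, -3.0] 0
```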

  6. Best w? § Maximum likelihood estimation: $\max_w ll(w) = \max_w \sum_i \log P(y^{(i)} \mid x^{(i)}; w)$ with $P(y^{(i)} \mid x^{(i)}; w) = \frac{e^{w_{y^{(i)}} \cdot f(x^{(i)})}}{\sum_y e^{w_y \cdot f(x^{(i)})}}$ = Multi-Class Logistic Regression
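The objective $ll(w)$ from slide 6 can be computed as follows. This is a minimal sketch that assumes the data is a list of (feature list, integer label) pairs; those names and the toy data are hypothetical. With all-zero weights every class gets probability 1/3, which makes the result easy to check by hand.

```python
import math


def log_softmax(z):
    """log( e^{z_y} / sum_y' e^{z_y'} ) for every class y, computed stably."""
    m = max(z)
    log_total = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - log_total for v in z]


def log_likelihood(weight_vectors, data):
    """ll(w) = sum_i log P(y_i | x_i; w) over (features, label) pairs."""
    total = 0.0
    for features, label in data:
        z = [sum(w * f for w, f in zip(w_y, features)) for w_y in weight_vectors]
        total += log_softmax(z)[label]
    return total


# With all-zero weights, every class gets probability 1/3.
zero_w = [[0.0, 0.0] for _ in range(3)]
toy_data = [([1.0, 2.0], 0), ([3.0, 4.0], 2)]
print(log_likelihood(zero_w, toy_data))  # equals 2 * log(1/3)
```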

  7. Gradient in n dimensions: $\nabla g = \begin{bmatrix} \frac{\partial g}{\partial w_1} \\ \frac{\partial g}{\partial w_2} \\ \vdots \\ \frac{\partial g}{\partial w_n} \end{bmatrix}$
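One way to make the gradient vector of slide 7 concrete is to approximate each partial derivative $\partial g / \partial w_i$ by central finite differences. This is a numerical check only, not how the lecture computes gradients. The function name, the step size `eps` and the example objective are hypothetical.

```python
def numerical_gradient(g, w, eps=1e-6):
    """Approximate [dg/dw_1, ..., dg/dw_n] at w with central differences."""
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_plus[i] += eps
        w_minus = list(w)
        w_minus[i] -= eps
        grad.append((g(w_plus) - g(w_minus)) / (2 * eps))
    return grad


# Example objective: g(w) = -(w_1 - 1)^2 - (w_2 + 2)^2,
# whose exact gradient is [-2 (w_1 - 1), -2 (w_2 + 2)].
def g(w):
    return -(w[0] - 1.0) ** 2 - (w[1] + 2.0) ** 2


print(numerical_gradient(g, [0.0, 0.0]))  # approximately [2.0, -4.0]
```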

  8. Optimization Procedure: Gradient Ascent § init $w$ § for iter = 1, 2, …: $w \leftarrow w + \alpha \cdot \nabla g(w)$ § $\alpha$: learning rate, a tweaking parameter that needs to be chosen carefully § How? Try multiple choices § Crude rule of thumb: each update changes $w$ by about 0.1–1%
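Here is a sketch of slide 8's loop applied to the multi-class logistic regression objective of slide 6. It uses the standard analytic gradient $\partial ll / \partial w_y = \sum_i f(x^{(i)}) \left(\mathbb{1}[y^{(i)} = y] - P(y \mid x^{(i)}; w)\right)$, which is not shown on the slides. The toy data, the fixed `alpha=0.1` and the fixed iteration count are illustrative assumptions, not tuned values.

```python
import math


def probs(weight_vectors, features):
    """Softmax probabilities P(y | x; w) for every class y."""
    z = [sum(w * f for w, f in zip(w_y, features)) for w_y in weight_vectors]
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def ll_gradient(weight_vectors, data):
    """d ll / d w_y = sum_i f(x_i) * (1[y_i == y] - P(y | x_i; w))."""
    grad = [[0.0] * len(w_y) for w_y in weight_vectors]
    for features, label in data:
        p = probs(weight_vectors, features)
        for y in range(len(weight_vectors)):
            coeff = (1.0 if y == label else 0.0) - p[y]
            for j, f in enumerate(features):
                grad[y][j] += coeff * f
    return grad


def gradient_ascent(data, num_classes, num_features, alpha=0.1, iters=200):
    w = [[0.0] * num_features for _ in range(num_classes)]  # init w
    for _ in range(iters):
        grad = ll_gradient(w, data)
        # w <- w + alpha * grad g(w), applied to every class's weight vector
        w = [[wj + alpha * gj for wj, gj in zip(w_y, g_y)]
             for w_y, g_y in zip(w, grad)]
    return w


# Hypothetical toy data: feature 0 is a constant bias, feature 1 separates the classes.
data = [([1.0, 2.0], 0), ([1.0, -2.0], 1), ([1.0, 2.5], 0), ([1.0, -1.5], 1)]
w_learned = gradient_ascent(data, num_classes=2, num_features=2)
print([max(range(2), key=lambda y: probs(w_learned, f)[y]) for f, _ in data])  # [0, 1, 0, 1]
```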

  9. Neural Networks

  10. Multi-class Logistic Regression § = special case of neural network: features $f_1(x), f_2(x), f_3(x), \ldots, f_K(x)$ feed the activations $z_1, z_2, z_3$, which pass through a softmax: $P(y_1 \mid x; w) = \frac{e^{z_1}}{e^{z_1} + e^{z_2} + e^{z_3}}$, $P(y_2 \mid x; w) = \frac{e^{z_2}}{e^{z_1} + e^{z_2} + e^{z_3}}$, $P(y_3 \mid x; w) = \frac{e^{z_3}}{e^{z_1} + e^{z_2} + e^{z_3}}$

  11. Deep Neural Network = Also learn the features! [Diagram: inputs $x_1, x_2, x_3, \ldots, x_L$ feed hidden layers $z^{(1)}, z^{(2)}, \ldots, z^{(n-1)}, z^{(n)}$ of sizes $K^{(1)}, K^{(2)}, \ldots, K^{(n-1)}, K^{(n)}$; the output layer $z^{(OUT)}_1, z^{(OUT)}_2, z^{(OUT)}_3$ passes through a softmax to give $P(y_1 \mid x; w), P(y_2 \mid x; w), P(y_3 \mid x; w)$] § $z_i^{(k)} = g\left(\sum_j W_{i,j}^{(k-1,k)} z_j^{(k-1)}\right)$, where $g$ = nonlinear activation function
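Here is a minimal sketch of the forward pass on slide 11. Each layer applies $z_i^{(k)} = g\left(\sum_j W_{i,j}^{(k-1,k)} z_j^{(k-1)}\right)$, and the output scores go through a softmax. The slide only says $g$ is nonlinear, so choosing `tanh` is an assumption, as is leaving the output layer linear before the softmax. Like the slide's formula, the sketch has no bias terms. The network and weights are hypothetical.

```python
import math


def layer(W, z_prev, g):
    """z_i^(k) = g( sum_j W[i][j] * z_j^(k-1) ); W has one row per unit in layer k."""
    return [g(sum(w_ij * z_j for w_ij, z_j in zip(row, z_prev))) for row in W]


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights, x, g=math.tanh):
    """weights = [W^(0,1), ..., W^(n-1,n), W^(n,OUT)]; returns P(y_j | x; w)."""
    z = x
    for W in weights[:-1]:
        z = layer(W, z, g)  # hidden layers use the nonlinearity g
    z_out = layer(weights[-1], z, lambda v: v)  # output scores, left linear here
    return softmax(z_out)


# Hypothetical network: 2 inputs -> 3 hidden units -> 2 output classes.
weights = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # W^(0,1): 3 x 2
    [[1.0, -1.0, 0.5], [0.0, 0.0, 0.0]],   # W^(1,OUT): 2 x 3
]
print(forward(weights, [0.5, -0.5]))
```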

  12. Deep Neural Network: Also Learn the Features! § Training the deep neural network is just like logistic regression: $\max_w ll(w) = \max_w \sum_i \log P(y^{(i)} \mid x^{(i)}; w)$ § just $w$ tends to be a much, much larger vector ☺ → just run gradient ascent + stop when log likelihood of hold-out data starts to decrease
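The stopping rule on slide 12 can be sketched as gradient ascent on the training objective that halts as soon as the hold-out log likelihood decreases, keeping the best weights seen so far. For brevity `w` is a scalar here, whereas a real network's `w` is a very large vector. The function name, the two toy quadratic objectives, and stopping on the first decrease (no patience window) are all assumptions.

```python
def train_with_early_stopping(w, grad_fn, holdout_ll_fn, alpha=0.1, max_iters=1000):
    """Gradient ascent on the training objective that stops as soon as the
    hold-out log likelihood decreases, returning the best weights seen so far."""
    best_w, best_ll = w, holdout_ll_fn(w)
    for _ in range(max_iters):
        w = w + alpha * grad_fn(w)  # one gradient-ascent step on the training objective
        ll = holdout_ll_fn(w)
        if ll < best_ll:  # hold-out log likelihood started to decrease: stop
            break
        best_w, best_ll = w, ll
    return best_w


# Toy 1-D stand-ins: training objective -(w - 3)^2, hold-out objective -(w - 2)^2.
best = train_with_early_stopping(
    0.0,
    grad_fn=lambda w: -2.0 * (w - 3.0),
    holdout_ll_fn=lambda w: -(w - 2.0) ** 2,
)
print(best)  # about 2.017: stops shortly after passing the hold-out optimum at 2
```

The training optimum (3) and the hold-out optimum (2) differ on purpose. This mimics overfitting, where continuing to climb the training objective starts to hurt hold-out performance.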

  13. How well does it work?

  14. Computer Vision

  15. Object Detection

  16. Manual Feature Design

  17. Features and Generalization [HoG: Dalal and Triggs, 2005]

  18. Features and Generalization Image HoG

  19. Performance graph credit Matt Zeiler, Clarifai

  20. Performance graph credit Matt Zeiler, Clarifai

  21. Performance AlexNet graph credit Matt Zeiler, Clarifai

  22. Performance AlexNet graph credit Matt Zeiler, Clarifai

  23. Performance AlexNet graph credit Matt Zeiler, Clarifai

  24. MS COCO Image Captioning Challenge Karpathy & Fei-Fei, 2015; Donahue et al., 2015; Xu et al., 2015; many more

  25. Visual QA Challenge Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh

  26. Speech Recognition graph credit Matt Zeiler, Clarifai

  27. Machine Translation Google Neural Machine Translation (in production)

  28. What’s still missing? – correlation $\neq$ causation [Ribeiro et al.]

  29. What’s still missing? – covariate shift [Carroll et al.]

  30. What’s still missing? – covariate shift [Carroll et al.]

  31. What’s still missing? – knowing what loss to optimize

  32. CS 188: Artificial Intelligence Neural Nets (ctd) and IRL Instructor: Anca Dragan --- University of California, Berkeley [These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]

  33. Reminder: Optimal Policies [Figure: optimal policies for R(s) = -0.01, R(s) = -0.03, R(s) = -0.4, and R(s) = -2.0]

  34. Utility? [Panels: clear utility function vs. not so clear utility function]

  35. Planning/RL: $R \rightarrow \pi^*$

  36. Inverse Planning/RL: $\pi^* \rightarrow R$

  37. Inverse Planning/RL: $\xi \rightarrow R$

  38. Inverse Planning/RL

  39. Inverse Planning/RL

  40. IRL is relevant to all 3 types of people: the person in its environment, its end-user, and its designer

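  41. Inverse Planning/RL: given $\xi_D$, find $R(s, a)$ s.t. $\mathbf{R}(\xi_D) \geq \mathbf{R}(\xi)\ \forall \xi$

  42. Inverse Planning/RL: given $\xi_D$, find $R(s, a) = \theta^\top \phi(s, a)$ s.t. $\mathbf{R}(\xi_D) \geq \mathbf{R}(\xi)\ \forall \xi$

  43. Inverse Planning/RL: given $\xi_D$, find $R(s, a) = \theta^\top \phi(s, a)$ s.t. $\mathbf{R}(\xi_D) \geq \max_\xi \mathbf{R}(\xi)$

  44. Problem: given $\xi_D$, find $R(s, a) = \theta^\top \phi(s, a)$ s.t. $\mathbf{R}(\xi_D) \geq \max_\xi \mathbf{R}(\xi)$ § zero/constant reward is a solution

  45. Revised formulation: given $\xi_D$, find $R(s, a) = \theta^\top \phi(s, a)$ s.t. $\mathbf{R}(\xi_D) \geq \max_\xi \left[\mathbf{R}(\xi) + l(\xi, \xi_D)\right]$, where $l(\xi, \xi_D)$ is small for $\xi$ close to the demonstration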

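  46. Optimization: $\max_\theta \left[\mathbf{R}(\xi_D) - \max_\xi \left[\mathbf{R}(\xi) + l(\xi, \xi_D)\right]\right]$

  47. Optimization: $\max_\theta \left[\theta^\top \boldsymbol{\phi}(\xi_D) - \max_\xi \left[\theta^\top \boldsymbol{\phi}(\xi) + l(\xi, \xi_D)\right]\right]$

  48. Optimization: $\max_\theta \left[\theta^\top \boldsymbol{\phi}(\xi_D) - \max_\xi \left[\theta^\top \boldsymbol{\phi}(\xi) + l(\xi, \xi_D)\right]\right]$, with $\xi^* = \arg\max_\xi \left[\theta^\top \boldsymbol{\phi}(\xi) + l(\xi, \xi_D)\right]$

  49. Optimization: $\max_\theta \left[\theta^\top \boldsymbol{\phi}(\xi_D) - \max_\xi \left[\theta^\top \boldsymbol{\phi}(\xi) + l(\xi, \xi_D)\right]\right]$ § subgradient: $\nabla_\theta = \boldsymbol{\phi}(\xi_D) - \boldsymbol{\phi}(\xi^*)$

  50. Optimization: $\max_\theta \left[\theta^\top \boldsymbol{\phi}(\xi_D) - \max_\xi \left[\theta^\top \boldsymbol{\phi}(\xi) + l(\xi, \xi_D)\right]\right]$ § subgradient: $\nabla_\theta = \boldsymbol{\phi}(\xi_D) - \boldsymbol{\phi}(\xi^*)$ § $\theta_{t+1} = \theta_t + \alpha\left(\boldsymbol{\phi}(\xi_D) - \boldsymbol{\phi}(\xi^*)\right)$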
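Here is a minimal sketch of the subgradient update on slides 46 to 50. It assumes a small, hypothetical finite set of trajectories, each summarized by its feature vector $\boldsymbol{\phi}(\xi)$, so that loss-augmented planning ($\xi^* = \arg\max_\xi$) becomes a simple `max` over the set. A real planner would search the trajectory space instead. The slides do not fix the margin $l(\xi, \xi_D)$, so the 0/1 margin below (0 for the demonstration itself, 1 otherwise) is an assumption. The [rocks, grass] features match the interpretation slides.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def max_margin_irl(phi, demo, loss, alpha=0.1, iters=100):
    """Subgradient ascent on  theta^T phi(xi_D) - max_xi [theta^T phi(xi) + l(xi, xi_D)].

    phi: dict mapping a trajectory name to its feature vector phi(xi); here a small
    finite set stands in for "all trajectories".
    """
    theta = [0.0] * len(phi[demo])
    for _ in range(iters):
        # Loss-augmented planning: xi* = argmax_xi theta^T phi(xi) + l(xi, xi_D)
        xi_star = max(phi, key=lambda xi: dot(theta, phi[xi]) + loss(xi, demo))
        # Subgradient step: theta <- theta + alpha * (phi(xi_D) - phi(xi*))
        theta = [t + alpha * (d - s) for t, d, s in zip(theta, phi[demo], phi[xi_star])]
    return theta


# Hypothetical features [rocks, grass]; the demonstration stays on grass.
phi = {"rocks": [1.0, 0.0], "grass": [0.0, 1.0], "mixed": [0.5, 0.5]}
margin = lambda xi, xi_d: 0.0 if xi == xi_d else 1.0  # 0/1 margin, an assumption
theta = max_margin_irl(phi, "grass", margin)
print(theta)  # rocks weight negative, grass weight positive
```

Once the demonstration beats every other trajectory by its margin, $\xi^* = \xi_D$ and the update becomes zero. At that point $\theta$ stops changing.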

  51. Interpretation § $\boldsymbol{\phi}(\xi^*)$ goes on rocks: [1,0] § $\boldsymbol{\phi}(\xi_D)$ goes on grass: [0,1] § $\theta_{t+1} = \theta_t + \alpha\left(\boldsymbol{\phi}(\xi_D) - \boldsymbol{\phi}(\xi^*)\right)$

  52. Interpretation § $\boldsymbol{\phi}(\xi^*)$ goes on rocks: [1,0] § $\boldsymbol{\phi}(\xi_D)$ goes on grass: [0,1] § $\theta_{t+1} = \theta_t + \alpha\left(\boldsymbol{\phi}(\xi_D) - \boldsymbol{\phi}(\xi^*)\right) = \theta_t + \alpha([-1, 1])$

  53. Interpretation § $\boldsymbol{\phi}(\xi^*)$ goes on rocks: [1,0] § $\boldsymbol{\phi}(\xi_D)$ goes on grass: [0,1] § $\theta_{t+1} = \theta_t + \alpha([-1, 1])$: rocks weight goes down, grass weight goes up

  54. Interpretation § $\boldsymbol{\phi}(\xi^*)$ goes on rocks: [1,0] § $\boldsymbol{\phi}(\xi_D)$ goes on grass: [0,1] § rocks weight goes down, grass weight goes up § The new reward likes grass more and rocks less.

  55. Inverse Planning/RL

  56. Inverse Planning/RL

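  57. Is the demonstrator really optimal? $\mathbf{R}(\xi_D) \geq \mathbf{R}(\xi)\ \forall \xi$

  58. The Bayesian view: $P(\xi_D \mid \theta)$, with $\xi_D$ the evidence and $\theta$ hidden

  59. The Bayesian view: $P(\xi_D \mid \theta) \propto e^{\beta \theta^\top \boldsymbol{\phi}(\xi_D)}$

  60. The Bayesian view: $P(\xi_D \mid \theta) = \frac{e^{\beta \theta^\top \boldsymbol{\phi}(\xi_D)}}{\sum_\xi e^{\beta \theta^\top \boldsymbol{\phi}(\xi)}}$

  61. The Bayesian view: $P(\xi_D \mid \theta) = \frac{e^{\beta \theta^\top \boldsymbol{\phi}(\xi_D)}}{\sum_\xi e^{\beta \theta^\top \boldsymbol{\phi}(\xi)}}$ § $b'(\theta) \propto b(\theta)\, P(\xi_D \mid \theta)$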
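The belief update on slide 61 can be sketched over a discrete set of candidate reward parameters $\theta$. The sketch makes two simplifying assumptions: the candidate $\theta$s are hypothetical, and the sum over "all trajectories" is replaced by a small finite set. Here `beta` plays the role of $\beta$ in the likelihood.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def trajectory_likelihood(theta, phi, demo, beta=1.0):
    """P(xi_D | theta) = e^{beta theta^T phi(xi_D)} / sum_xi e^{beta theta^T phi(xi)}."""
    scores = {xi: beta * dot(theta, f) for xi, f in phi.items()}
    m = max(scores.values())  # shift for numerical stability
    normalizer = sum(math.exp(s - m) for s in scores.values())
    return math.exp(scores[demo] - m) / normalizer


def update_belief(belief, phi, demo, beta=1.0):
    """b'(theta) proportional to b(theta) * P(xi_D | theta), normalized over thetas."""
    unnormalized = {th: b * trajectory_likelihood(th, phi, demo, beta)
                    for th, b in belief.items()}
    total = sum(unnormalized.values())
    return {th: v / total for th, v in unnormalized.items()}


# Hypothetical setup: features [rocks, grass], two candidate rewards, uniform prior.
phi = {"rocks": [1.0, 0.0], "grass": [0.0, 1.0]}
prior = {(1.0, 0.0): 0.5, (0.0, 1.0): 0.5}  # theta "likes rocks" vs. "likes grass"
posterior = update_belief(prior, phi, "grass")
print(posterior)  # the grass-loving theta gets e / (1 + e), about 0.731
```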

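  62. The Bayesian view: $P(\xi_D \mid \theta) = \frac{e^{\beta \theta^\top \boldsymbol{\phi}(\xi_D)}}{\sum_\xi e^{\beta \theta^\top \boldsymbol{\phi}(\xi)}}$ § $\max_\theta P(\xi_D \mid \theta)$

  63. The Bayesian view: $\max_\theta \log \frac{e^{\beta \theta^\top \boldsymbol{\phi}(\xi_D)}}{\sum_\xi e^{\beta \theta^\top \boldsymbol{\phi}(\xi)}}$

  64. The Bayesian view: $\max_\theta \; \beta \theta^\top \boldsymbol{\phi}(\xi_D) - \log \sum_\xi e^{\beta \theta^\top \boldsymbol{\phi}(\xi)}$

  65. The Bayesian view: $\max_\theta \; \beta \theta^\top \boldsymbol{\phi}(\xi_D) - \log \sum_\xi e^{\beta \theta^\top \boldsymbol{\phi}(\xi)}$ § $\nabla_\theta = \beta \boldsymbol{\phi}(\xi_D) - \frac{1}{\sum_\xi e^{\beta \theta^\top \boldsymbol{\phi}(\xi)}} \sum_\xi \nabla_\theta e^{\beta \theta^\top \boldsymbol{\phi}(\xi)}$

  66. The Bayesian view: $\max_\theta \; \beta \theta^\top \boldsymbol{\phi}(\xi_D) - \log \sum_\xi e^{\beta \theta^\top \boldsymbol{\phi}(\xi)}$ § $\nabla_\theta = \beta \boldsymbol{\phi}(\xi_D) - \frac{1}{\sum_\xi e^{\beta \theta^\top \boldsymbol{\phi}(\xi)}} \sum_\xi \nabla_\theta e^{\beta \theta^\top \boldsymbol{\phi}(\xi)}$

  67. The Bayesian view: $\max_\theta \; \beta \theta^\top \boldsymbol{\phi}(\xi_D) - \log \sum_\xi e^{\beta \theta^\top \boldsymbol{\phi}(\xi)}$ § $\nabla_\theta = \beta \boldsymbol{\phi}(\xi_D) - \frac{1}{\sum_\xi e^{\beta \theta^\top \boldsymbol{\phi}(\xi)}} \sum_\xi e^{\beta \theta^\top \boldsymbol{\phi}(\xi)}\, \beta \boldsymbol{\phi}(\xi)$

  68. The Bayesian view: $\max_\theta \; \beta \theta^\top \boldsymbol{\phi}(\xi_D) - \log \sum_\xi e^{\beta \theta^\top \boldsymbol{\phi}(\xi)}$ § $\nabla_\theta = \beta \boldsymbol{\phi}(\xi_D) - \sum_\xi \frac{e^{\beta \theta^\top \boldsymbol{\phi}(\xi)}}{\sum_{\xi'} e^{\beta \theta^\top \boldsymbol{\phi}(\xi')}}\, \beta \boldsymbol{\phi}(\xi)$

  69. The Bayesian view: $\max_\theta \; \beta \theta^\top \boldsymbol{\phi}(\xi_D) - \log \sum_\xi e^{\beta \theta^\top \boldsymbol{\phi}(\xi)}$ § $\nabla_\theta = \beta \boldsymbol{\phi}(\xi_D) - \sum_\xi P(\xi \mid \theta)\, \beta \boldsymbol{\phi}(\xi)$

  70. The Bayesian view: $\max_\theta \; \beta \theta^\top \boldsymbol{\phi}(\xi_D) - \log \sum_\xi e^{\beta \theta^\top \boldsymbol{\phi}(\xi)}$ § $\nabla_\theta = \beta\left(\boldsymbol{\phi}(\xi_D) - \mathbb{E}_{\xi \sim \theta}[\boldsymbol{\phi}(\xi)]\right)$

  71. The Bayesian view: $\max_\theta \; \beta \theta^\top \boldsymbol{\phi}(\xi_D) - \log \sum_\xi e^{\beta \theta^\top \boldsymbol{\phi}(\xi)}$ § $\nabla_\theta = \beta\left(\boldsymbol{\phi}(\xi_D) - \mathbb{E}_{\xi \sim \theta}[\boldsymbol{\phi}(\xi)]\right)$ § $\mathbb{E}_{\xi \sim \theta}[\boldsymbol{\phi}(\xi)]$: expected feature values produced by the current reward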
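Here is a minimal sketch of gradient ascent with the gradient derived on slides 64 to 71, $\beta\left(\boldsymbol{\phi}(\xi_D) - \mathbb{E}_{\xi \sim \theta}[\boldsymbol{\phi}(\xi)]\right)$. As in the max-margin sketch, a small hypothetical trajectory set stands in for the full sum over $\xi$. The step size `alpha`, `beta=1.0` and the iteration count are illustrative assumptions. The update stops moving $\theta$ once the expected features under the current reward match the demonstration's features, which is exactly the slide's reading of the gradient.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def expected_features(theta, phi, beta):
    """E_{xi ~ theta}[phi(xi)] with P(xi | theta) proportional to e^{beta theta^T phi(xi)}."""
    scores = {xi: beta * dot(theta, f) for xi, f in phi.items()}
    m = max(scores.values())
    weights = {xi: math.exp(s - m) for xi, s in scores.items()}
    total = sum(weights.values())
    dim = len(next(iter(phi.values())))
    return [sum(weights[xi] / total * phi[xi][j] for xi in phi) for j in range(dim)]


def fit_theta(phi, demo, beta=1.0, alpha=0.5, iters=200):
    """Gradient ascent on log P(xi_D | theta) with
    grad = beta * (phi(xi_D) - E_{xi ~ theta}[phi(xi)])."""
    theta = [0.0] * len(phi[demo])
    for _ in range(iters):
        exp_phi = expected_features(theta, phi, beta)
        theta = [t + alpha * beta * (d - e) for t, d, e in zip(theta, phi[demo], exp_phi)]
    return theta


# Hypothetical features [rocks, grass]; the demonstration stays on grass.
phi = {"rocks": [1.0, 0.0], "grass": [0.0, 1.0], "mixed": [0.5, 0.5]}
theta = fit_theta(phi, "grass")
print(theta, expected_features(theta, phi, beta=1.0))
```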

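  72. The Bayesian view: $P(\xi_D \mid \theta) = \frac{e^{\beta \theta^\top \boldsymbol{\phi}(\xi_D)}}{\sum_\xi e^{\beta \theta^\top \boldsymbol{\phi}(\xi)}}$ § $b'(\theta) \propto b(\theta)\, P(\xi_D \mid \theta)$

  73. The Bayesian view (actions): $P(a_D \mid s, \theta) = \frac{e^{\beta Q(s, a_D; \theta)}}{\sum_a e^{\beta Q(s, a; \theta)}}$ § $b'(\theta) \propto b(\theta)\, P(a_D \mid \theta)$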
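The action-level version on slide 73 can be sketched the same way. This is a minimal sketch assuming the values $Q(s, a; \theta)$ are already available for one fixed state $s$ and each candidate $\theta$. Computing them would require solving the MDP under each candidate reward, which is not shown here. The reward names, actions and Q-values are hypothetical. Setting `beta=0` makes every action equally likely, so the observed action leaves the belief unchanged.

```python
import math


def action_likelihood(q_values, action, beta=1.0):
    """P(a_D | s, theta) = e^{beta Q(s, a_D; theta)} / sum_a e^{beta Q(s, a; theta)}.

    q_values maps each action a to Q(s, a; theta) for one fixed state s and one theta.
    """
    m = max(q_values.values())  # shift for numerical stability (beta >= 0 assumed)
    normalizer = sum(math.exp(beta * (q - m)) for q in q_values.values())
    return math.exp(beta * (q_values[action] - m)) / normalizer


def update_belief(belief, q_tables, action, beta=1.0):
    """b'(theta) proportional to b(theta) * P(a_D | s, theta)."""
    unnormalized = {th: b * action_likelihood(q_tables[th], action, beta)
                    for th, b in belief.items()}
    total = sum(unnormalized.values())
    return {th: v / total for th, v in unnormalized.items()}


# Hypothetical Q-values in one state under two candidate rewards.
q_tables = {
    "likes_grass": {"to_grass": 1.0, "to_rocks": 0.0},
    "likes_rocks": {"to_grass": 0.0, "to_rocks": 1.0},
}
prior = {"likes_grass": 0.5, "likes_rocks": 0.5}
print(update_belief(prior, q_tables, "to_grass", beta=1.0))  # likes_grass: about 0.731
print(update_belief(prior, q_tables, "to_grass", beta=0.0))  # beta = 0: belief unchanged
```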

  74. [Ratliff et al., Maximum Margin Planning]
