Hope you had a FANTASTIC spring break!
CS 188: Artificial Intelligence
Neural Nets (ctd) and IRL
Instructor: Anca Dragan --- University of California, Berkeley
[These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]
Reminder: Linear Classifiers
§ Inputs are feature values
§ Each feature has a weight
§ Sum is the activation
§ If the activation is:
§ Positive, output +1
§ Negative, output -1

[Diagram: feature values $f_1, f_2, f_3$ are weighted by $w_1, w_2, w_3$, summed ($\Sigma$), then thresholded ($>0$?).]
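A minimal sketch of this decision rule in Python (the feature and weight values are made up for illustration):

```python
import numpy as np

def linear_classify(f, w):
    """Return +1 if the activation w . f is positive, else -1."""
    activation = np.dot(w, f)
    return 1 if activation > 0 else -1

# Hypothetical feature values f1, f2, f3 and weights w1, w2, w3
f = np.array([1.0, 0.5, 2.0])
w = np.array([0.3, -1.2, 0.4])
print(linear_classify(f, w))  # 0.3 - 0.6 + 0.8 = 0.5 > 0, so +1
```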
Multiclass Logistic Regression
§ Multi-class linear classification
§ A weight vector for each class: $w_y$
§ Score (activation) of a class $y$: $w_y \cdot f(x)$
§ Prediction: highest score wins: $\hat{y} = \arg\max_y w_y \cdot f(x)$
§ How to make the scores into probabilities?

$$z_1, z_2, z_3 \;\rightarrow\; \frac{e^{z_1}}{e^{z_1}+e^{z_2}+e^{z_3}},\; \frac{e^{z_2}}{e^{z_1}+e^{z_2}+e^{z_3}},\; \frac{e^{z_3}}{e^{z_1}+e^{z_2}+e^{z_3}}$$

original activations → softmax activations
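A quick numerical sketch of the softmax transformation (the activation values are made up):

```python
import numpy as np

def softmax(z):
    """Turn raw activations into probabilities that sum to 1."""
    z = z - np.max(z)          # subtract the max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

z = np.array([2.0, 1.0, 0.1])  # hypothetical activations z1, z2, z3
print(softmax(z))              # approximately [0.66, 0.24, 0.10]
```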
Best w?
§ Maximum likelihood estimation:

$$\max_w \; ll(w) = \max_w \sum_i \log P(y^{(i)} \mid x^{(i)}; w)$$

with:

$$P(y^{(i)} \mid x^{(i)}; w) = \frac{e^{w_{y^{(i)}} \cdot f(x^{(i)})}}{\sum_y e^{w_y \cdot f(x^{(i)})}}$$

= Multi-Class Logistic Regression
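A sketch of this log likelihood in numpy (the function name and data shapes are assumptions for illustration):

```python
import numpy as np

def log_likelihood(W, X, y):
    """Sum of log P(y_i | x_i; W) for multi-class logistic regression.

    W: (num_classes, num_features) weight matrix
    X: (n, num_features) feature vectors f(x)
    y: (n,) integer class labels
    """
    scores = X @ W.T                             # activations w_y . f(x)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return log_probs[np.arange(len(y)), y].sum()
```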
Gradient in n dimensions
$$\nabla g = \begin{bmatrix} \dfrac{\partial g}{\partial w_1} \\ \dfrac{\partial g}{\partial w_2} \\ \vdots \\ \dfrac{\partial g}{\partial w_n} \end{bmatrix}$$
Optimization Procedure: Gradient Ascent
§ init $w$
§ for iter = 1, 2, …

$$w \leftarrow w + \alpha \cdot \nabla g(w)$$

§ $\alpha$: learning rate --- tweaking parameter that needs to be chosen carefully
§ How? Try multiple choices
§ Crude rule of thumb: each update should change $w$ by about 0.1 – 1%
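A minimal gradient-ascent loop in Python (the objective and its gradient here are toy stand-ins; in the lecture's setting $g$ would be the log likelihood):

```python
import numpy as np

def gradient_ascent(grad_g, w0, alpha=0.1, iters=100):
    """Repeatedly step in the direction of the gradient of g."""
    w = w0.copy()
    for _ in range(iters):
        w = w + alpha * grad_g(w)
    return w

# Toy example: maximize g(w) = -(w - 3)^2, whose gradient is -2(w - 3)
grad = lambda w: -2 * (w - 3.0)
print(gradient_ascent(grad, np.array([0.0])))  # converges toward w = 3
```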
Neural Networks
Multi-class Logistic Regression
§ = special case of neural network
[Diagram: features $f_1(x), f_2(x), f_3(x), \ldots, f_K(x)$ feed linear activations $z_1, z_2, z_3$, followed by a softmax layer.]

$$P(y_1 \mid x; w) = \frac{e^{z_1}}{e^{z_1}+e^{z_2}+e^{z_3}} \qquad P(y_2 \mid x; w) = \frac{e^{z_2}}{e^{z_1}+e^{z_2}+e^{z_3}} \qquad P(y_3 \mid x; w) = \frac{e^{z_3}}{e^{z_1}+e^{z_2}+e^{z_3}}$$
Deep Neural Network = Also learn the features!
[Diagram: inputs $x_1, x_2, x_3, \ldots, x_L$ pass through hidden layers $z^{(1)}_1, \ldots, z^{(1)}_{K^{(1)}}$ up to $z^{(n)}_1, \ldots, z^{(n)}_{K^{(n)}}$, producing output activations $z^{(\mathrm{OUT})}_1, z^{(\mathrm{OUT})}_2, z^{(\mathrm{OUT})}_3$ and softmax probabilities $P(y_1 \mid x; w), P(y_2 \mid x; w), P(y_3 \mid x; w)$.]

$$z^{(k)}_i = g\Big(\sum_j W^{(k-1,k)}_{i,j} \, z^{(k-1)}_j\Big)$$
g = nonlinear activation function
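A sketch of this forward pass in numpy (the layer sizes and the choice of ReLU for $g$ are illustrative assumptions):

```python
import numpy as np

def relu(a):
    return np.maximum(0.0, a)

def forward(x, weights):
    """Apply z^(k) = g(W^(k-1,k) z^(k-1)) layer by layer, softmax at the end."""
    z = x
    for W in weights[:-1]:
        z = relu(W @ z)           # hidden layers apply the nonlinearity g
    out = weights[-1] @ z         # output activations z^(OUT)
    out = np.exp(out - out.max())
    return out / out.sum()        # softmax -> P(y | x; w)

# Hypothetical sizes: 4 inputs, two hidden layers of 5 units, 3 classes
rng = np.random.default_rng(0)
weights = [rng.normal(size=(5, 4)),
           rng.normal(size=(5, 5)),
           rng.normal(size=(3, 5))]
print(forward(rng.normal(size=4), weights))
```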
Deep Neural Network: Also Learn the Features!
§ Training the deep neural network is just like logistic regression:

$$\max_w \; ll(w) = \max_w \sum_i \log P(y^{(i)} \mid x^{(i)}; w)$$

§ just $w$ tends to be a much, much larger vector ☺
§ → just run gradient ascent + stop when the log likelihood of held-out data starts to decrease
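A sketch of that stopping rule (the training-loop details are placeholders; `ll_holdout` is a hypothetical function returning the held-out log likelihood):

```python
def train_with_early_stopping(w, grad_ll, ll_holdout, alpha=0.01, max_iters=10_000):
    """Gradient ascent on the training log likelihood; stop as soon as
    the held-out log likelihood starts to decrease."""
    best_ll = ll_holdout(w)
    for _ in range(max_iters):
        w_new = w + alpha * grad_ll(w)
        new_ll = ll_holdout(w_new)
        if new_ll < best_ll:   # held-out likelihood dropped: stop, keep old w
            break
        w, best_ll = w_new, new_ll
    return w
```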
How well does it work?
Computer Vision
Object Detection
Manual Feature Design
Features and Generalization
[HoG: Dalal and Triggs, 2005]
Features and Generalization
[Left: image. Right: HoG features.]
Performance

AlexNet

graph credit Matt Zeiler, Clarifai
MS COCO Image Captioning Challenge
Karpathy & Fei-Fei, 2015; Donahue et al., 2015; Xu et al., 2015; many more
Visual QA Challenge
Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh
Speech Recognition
graph credit Matt Zeiler, Clarifai
Machine Translation
Google Neural Machine Translation (in production)
What’s still missing? – correlation ≠ causation
[Ribeiro et al.]
What’s still missing? – covariate shift

[Carroll et al.]
What’s still missing? – knowing what loss to optimize
CS 188: Artificial Intelligence
Neural Nets (ctd) and IRL
Instructor: Anca Dragan --- University of California, Berkeley
[These slides were created by Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley. All CS188 materials are available at http://ai.berkeley.edu.]
Reminder: Optimal Policies
[Four gridworld policies, one for each living reward: R(s) = -2.0, R(s) = -0.4, R(s) = -0.03, R(s) = -0.01]
Utility?
[Left: clear utility function. Right: not-so-clear utility function.]
Planning/RL

$$R \;\rightarrow\; \pi^*$$

Inverse Planning/RL

$$\pi^* \;\rightarrow\; R$$

Inverse Planning/RL

$$\xi \;\rightarrow\; R$$

Inverse Planning/RL

IRL is relevant to all 3 types of people around the robot: its end user, a person in its environment, and its designer.
Inverse Planning/RL

given: $\xi_D$   find: $R(s, a)$ s.t.

$$R(\xi_D) \ge R(\xi) \quad \forall \xi$$

Inverse Planning/RL

given: $\xi_D$   find: $R(s, a) = w^\top \phi(s, a)$ s.t.

$$R(\xi_D) \ge R(\xi) \quad \forall \xi$$

Inverse Planning/RL

given: $\xi_D$   find: $R(s, a) = w^\top \phi(s, a)$ s.t.

$$R(\xi_D) \ge \max_{\xi} R(\xi)$$

Problem

given: $\xi_D$   find: $R(s, a) = w^\top \phi(s, a)$ s.t.

$$R(\xi_D) \ge \max_{\xi} R(\xi)$$

zero/constant reward is a solution
Revised formulation
given: $\xi_D$   find: $R(s, a) = w^\top \phi(s, a)$ s.t.

$$R(\xi_D) \ge \max_{\xi} \left[ R(\xi) + m(\xi, \xi_D) \right]$$

margin $m(\xi, \xi_D)$: small for trajectories close to the demonstration
Optimization

$$\max_{w} \Big[ R(\xi_D) - \max_{\xi} \big[ R(\xi) + m(\xi, \xi_D) \big] \Big] = \max_{w} \Big[ w^\top \phi(\xi_D) - \max_{\xi} \big[ w^\top \phi(\xi) + m(\xi, \xi_D) \big] \Big]$$

$$\xi^* = \arg\max_{\xi} \big[ w^\top \phi(\xi) + m(\xi, \xi_D) \big]$$
Optimization

$$\max_{w} \Big[ w^\top \phi(\xi_D) - \max_{\xi} \big[ w^\top \phi(\xi) + m(\xi, \xi_D) \big] \Big]$$

subgradient: $\nabla_w = \phi(\xi_D) - \phi(\xi^*)$

update: $w_{k+1} = w_k + \beta \big( \phi(\xi_D) - \phi(\xi^*_k) \big)$
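A sketch of this subgradient update, assuming a hypothetical `plan` routine that solves the inner, margin-augmented planning problem:

```python
import numpy as np

def max_margin_irl_step(w, phi_demo, plan, beta=0.1):
    """One subgradient step of max-margin IRL.

    phi_demo: feature counts of the demonstration, phi(xi_D).
    plan(w):  hypothetical planner returning the feature counts of
              xi* = argmax_xi [w . phi(xi) + m(xi, xi_D)].
    """
    phi_star = plan(w)                       # phi(xi*) under the current reward
    return w + beta * (phi_demo - phi_star)  # move toward the demo's features
```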
Interpretation

$$w_{k+1} = w_k + \beta \big( \phi(\xi_D) - \phi(\xi^*_k) \big)$$

$\phi(\xi^*_k)$: goes on rocks, features $[1, 0]$
$\phi(\xi_D)$: goes on grass, features $[0, 1]$

$$w_{k+1} = w_k + \beta \, [-1, 1]$$

rocks weight goes down, grass weight goes up: the new reward likes grass more and rocks less.
Inverse Planning/RL

Is the demonstrator really optimal?

$$R(\xi_D) \ge R(\xi) \quad \forall \xi$$
The Bayesian view

$P(\xi_D \mid w)$: the demonstration $\xi_D$ is the evidence; the reward parameters $w$ are hidden.

$$P(\xi_D \mid w) = \frac{e^{\gamma w^\top \phi(\xi_D)}}{\sum_{\xi} e^{\gamma w^\top \phi(\xi)}} \qquad b'(w) \propto b(w) \, P(\xi_D \mid w)$$
The Bayesian view

$$\max_{w} \; P(\xi_D \mid w) = \max_{w} \; \frac{e^{\gamma w^\top \phi(\xi_D)}}{\sum_{\xi} e^{\gamma w^\top \phi(\xi)}}$$

Taking the log:

$$\max_{w} \; \log \frac{e^{\gamma w^\top \phi(\xi_D)}}{\sum_{\xi} e^{\gamma w^\top \phi(\xi)}}$$
The Bayesian view
$$\max_{w} \; \gamma w^\top \phi(\xi_D) - \log \sum_{\xi} e^{\gamma w^\top \phi(\xi)}$$

$$\nabla_w = \gamma \phi(\xi_D) - \frac{1}{\sum_{\xi} e^{\gamma w^\top \phi(\xi)}} \, \nabla_w \sum_{\xi} e^{\gamma w^\top \phi(\xi)}$$

$$= \gamma \phi(\xi_D) - \frac{1}{\sum_{\xi} e^{\gamma w^\top \phi(\xi)}} \sum_{\xi} e^{\gamma w^\top \phi(\xi)} \, \gamma \phi(\xi)$$

$$= \gamma \phi(\xi_D) - \sum_{\xi} \frac{e^{\gamma w^\top \phi(\xi)}}{\sum_{\xi'} e^{\gamma w^\top \phi(\xi')}} \, \gamma \phi(\xi)$$

$$= \gamma \phi(\xi_D) - \sum_{\xi} P(\xi \mid w) \, \gamma \phi(\xi)$$

$$= \gamma \big( \phi(\xi_D) - \mathbb{E}_{\xi \sim P(\cdot \mid w)}[\phi(\xi)] \big)$$

expected feature values produced by the current reward
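A sketch of this gradient over a small, enumerable set of trajectories (the trajectory set and feature counts are made-up illustrations; real problems compute the expectation with dynamic programming or sampling):

```python
import numpy as np

def boltzmann_irl_gradient(w, phi_demo, phi_all, gamma=1.0):
    """Gradient of log P(xi_D | w) for the Boltzmann trajectory model.

    phi_demo: feature counts of the demonstration, shape (d,).
    phi_all:  feature counts of every candidate trajectory, shape (N, d).
    """
    scores = gamma * phi_all @ w
    scores -= scores.max()                     # numerical stability
    p = np.exp(scores) / np.exp(scores).sum()  # P(xi | w)
    expected_phi = p @ phi_all                 # E_{xi ~ P(.|w)}[phi(xi)]
    return gamma * (phi_demo - expected_phi)
```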
The Bayesian view
$$P(\xi_D \mid w) = \frac{e^{\gamma w^\top \phi(\xi_D)}}{\sum_{\xi} e^{\gamma w^\top \phi(\xi)}} \qquad b'(w) \propto b(w) \, P(\xi_D \mid w)$$
The Bayesian view (actions)
$$P(a_D \mid s, w) = \frac{e^{\gamma Q(s, a_D; w)}}{\sum_{a} e^{\gamma Q(s, a; w)}} \qquad b'(w) \propto b(w) \, P(a_D \mid w)$$
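A sketch of this Boltzmann action model and the corresponding belief update over a discrete set of candidate rewards (the Q-values are a hypothetical input here; computing them requires solving the MDP for each candidate $w$):

```python
import numpy as np

def boltzmann_action_likelihood(q_values, a_D, gamma=1.0):
    """P(a_D | s, w), proportional to exp(gamma * Q(s, a_D; w))."""
    z = gamma * q_values - (gamma * q_values).max()  # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return p[a_D]

def belief_update(belief, q_per_reward, a_D, gamma=1.0):
    """b'(w) proportional to b(w) P(a_D | w), over discrete candidate rewards."""
    likelihoods = np.array([boltzmann_action_likelihood(q, a_D, gamma)
                            for q in q_per_reward])
    posterior = belief * likelihoods
    return posterior / posterior.sum()
```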
[Ratliff et al. Maximum Margin Planning]
[Levine et al. Continuous Inverse Optimal Control with Locally Linear Examples]