SLIDE 1
On variational inference and optimal control
Patrick van der Smagt
Director of AI Research Volkswagen Group Munich, Germany https://argmax.ai
VOLKSWAGEN GROUP AI RESEARCH
SLIDE 2
[block diagram: controller K → plant; control u(t) → next state x(t+1); feedback through unit delay z⁻¹]
control approach #1: feedback control
problem: requires very fast feedback loop
SLIDE 3
[block diagram: LQR controller K with inverse model model⁻¹ → plant; control u(t) → next state x(t+1); feedback through unit delay z⁻¹]
control approach #2: model-based feedback control
problem: requires fast feedback loop and an inverse model
SLIDE 4
[block diagram: controller K drives both a simulator "model", which predicts x(t+1:T), and the plant, which receives u(t) and outputs x(t+1); feedback through unit delay z⁻¹]
control approach #3: model-reference control
the simulator "dreams" the future, aka predictive coding
problem: how do I get this model?
SLIDE 5
problems
1) engineered models are expensive to set up
2) engineered models are expensive to compute
3) engineered models do not scale
SLIDE 6
we can write:
we really want to represent p(x)
p(x) = ∫ p(x | z) p(z) dz
[graphical model: z → x]
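The integral over z can be approximated by simple Monte Carlo: sample z from the prior and average the likelihood. A minimal sketch with a hypothetical one-dimensional Gaussian model (all parameter values are illustrative, chosen so the marginal p(x) is also known in closed form as a check):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy latent-variable model (hypothetical choice): z ~ N(0, 1), x|z ~ N(z, 0.5²).
# Then p(x) = ∫ p(x|z) p(z) dz is analytically N(x; 0, 1 + 0.5²), which lets us
# check the Monte Carlo estimate.
def gauss_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def p_x_monte_carlo(x, n_samples=200_000):
    """Estimate p(x) by sampling z from the prior and averaging p(x|z)."""
    z = rng.standard_normal(n_samples)           # z ~ p(z) = N(0, 1)
    return gauss_pdf(x, mean=z, std=0.5).mean()  # ≈ E_z[p(x|z)]

x = 0.3
estimate = p_x_monte_carlo(x)
exact = gauss_pdf(x, mean=0.0, std=np.sqrt(1.25))
print(estimate, exact)  # the two agree to about two decimal places
```

Naive sampling from the prior works here only because z is one-dimensional; in higher dimensions most prior samples contribute nothing, which is exactly the intractability discussed on the next slides.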
SLIDE 7
Two problems:
(1) how do we shape p(z) to carry the right information about x? A: We don't hand-design it; assume it is a Gaussian pdf.
(2) how do we compute the integral? It is intractable (we only have the data; we need MCMC).
we really want to represent p(x) = ∫ p(x | z) p(z) dz
[graphical model: z → x, with prior p(z)]
SLIDE 8
Trick to do efficient MCMC:
(1) choose a specific x and look in its neighbourhood (to find the z that most likely produced it), which needs p(z | x): bummer, we don't have it
(2) use p(x | z) to sample the corresponding x
(3) evaluate p(x) there
we really want to represent p(x) = ∫ p(x | z) p(z) dz
[graphical model: z → x]
SLIDE 9
Trick to do efficient MCMC:
(1) choose a specific x and look in its neighbourhood, using q(z | x) to find the z that most likely produced it
(2) use p(x | z) to sample the corresponding x
(3) evaluate p(x) there
we really want to represent p(x) = ∫ p(x | z) p(z) dz
[graphical model: z → x, with approximate posterior q(z | x)]
SLIDE 10 minimise the Kullback-Leibler divergence to make q look like p:
KL[q(z|x) ‖ p(z|x)] = Σ_z q(z|x) log [q(z|x) / p(z|x)]
= E[log q(z|x) − log p(z|x)]
= E[log q(z|x) − log (p(x|z) p(z) / p(x))]
= E[log q(z|x) − log p(x|z) − log p(z) + log p(x)]
⇒ log p(x) − KL[q(z|x) ‖ p(z|x)] = E[log p(x|z) − (log q(z|x) − log p(z))]
= E[log p(x|z)] − KL[q(z|x) ‖ p(z)]
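The identity on this slide is exact, not approximate, and can be verified numerically for a discrete toy model (the distributions below are hypothetical, chosen only so every term is computable in closed form):

```python
import numpy as np

# Hypothetical discrete toy model: z ∈ {0, 1}, one fixed observation x.
p_z = np.array([0.7, 0.3])          # prior p(z)
p_x_given_z = np.array([0.2, 0.9])  # likelihood p(x|z) for this x
q = np.array([0.4, 0.6])            # approximate posterior q(z|x)

p_x = (p_x_given_z * p_z).sum()        # evidence p(x) = Σ_z p(x|z) p(z)
p_z_given_x = p_x_given_z * p_z / p_x  # exact posterior via Bayes' rule

kl = lambda a, b: (a * np.log(a / b)).sum()  # KL divergence, discrete case

lhs = np.log(p_x) - kl(q, p_z_given_x)              # log p(x) − KL[q ‖ p(z|x)]
rhs = (q * np.log(p_x_given_z)).sum() - kl(q, p_z)  # E_q[log p(x|z)] − KL[q ‖ p(z)]
print(lhs, rhs)  # identical up to floating-point error
```

Because KL[q ‖ p(z|x)] ≥ 0, the right-hand side is a lower bound on log p(x): the evidence lower bound (ELBO) maximised on the next slide.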
SLIDE 11 log p(x) − KL[q(z|x) ‖ p(z|x)] = E[log p(x|z)] − KL[q(z|x) ‖ p(z)]
we can get our generative model … while we want q to be close to it … by maximising the MLE of x given z … (optimising the reconstruction by sampling) … and please make z equal to the prior.
this is why I chose argmax.ai for our lab website
argmaxθ
I need this (left-hand side); I can compute this (right-hand side)
SLIDE 12 efficient computation as a neural network: the Variational AutoEncoder
[architecture: system state x → encoder q(z|x) → latent space z → decoder p(x|z) → x̃ = reconstruction of x]
probability density with (Gaussian) prior
loss = reconstruction loss + KL[q(z|x) ‖ prior]; "nonlinear PCA"
Durk Kingma and Max Welling, 2013; Rezende, Mohamed & Wierstra, 2014
[graphical model: z → x]
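The VAE loss above has a simple numerical form. A minimal one-sample sketch, assuming a diagonal-Gaussian encoder and a Gaussian decoder (real VAEs train both networks with an autodiff framework; the identity decoder here is purely hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

def vae_loss(x, enc_mean, enc_logvar, decode):
    """One-sample estimate of the negative ELBO:
    reconstruction loss + KL[q(z|x) || N(0, I)]."""
    # Reparameterisation trick: z = μ + σ·ε with ε ~ N(0, I).
    eps = rng.standard_normal(enc_mean.shape)
    z = enc_mean + np.exp(0.5 * enc_logvar) * eps
    # Squared error is proportional to −log p(x|z) for a Gaussian decoder.
    recon = np.sum((x - decode(z)) ** 2)
    # Closed-form KL between a diagonal Gaussian q and the standard normal prior.
    kl = 0.5 * np.sum(np.exp(enc_logvar) + enc_mean ** 2 - 1.0 - enc_logvar)
    return recon + kl

# Tiny usage example with a hypothetical identity decoder:
x = np.array([0.5, -1.0])
loss = vae_loss(x, enc_mean=np.zeros(2), enc_logvar=np.zeros(2), decode=lambda z: z)
print(loss)
```

The KL term is what pulls q(z|x) toward the prior; with encoder mean 0 and log-variance 0 it vanishes, leaving only the reconstruction term.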
SLIDE 13 preprocessing sensor data with VAE—emerging properties
Maximilian Karl, Nutan Chen, Patrick van der Smagt (2014)
[plot: taxel values over time (s)]
video: SynTouch, LLC
unsupervised
Maximilian Karl, Nutan Chen
SLIDE 14 Deep Variational Bayes Filter
Graphical model assumes latent Markovian dynamics:
i) observations depend only on the current state,
ii) state depends only on the previous state and the control signal.
[graphical model: z_t → z_{t+1} → z_{t+2}; z_t → x_t; u_t → z_{t+1}]
p(x_{1:T}, z_{1:T} | u_{1:T}) = ρ(z_1) ∏_{t=1}^{T−1} p(z_{t+1} | z_t, u_t) ∏_{t=1}^{T} p(x_t | z_t)
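The factorisation above supports ancestral sampling: draw z₁ from the initial density, roll the transition forward under the controls, and emit an observation per step. A sketch with illustrative linear-Gaussian stand-ins for the learned transition and emission:

```python
import numpy as np

rng = np.random.default_rng(2)

# Ancestral sampling from the factorisation
#   p(x_{1:T}, z_{1:T} | u_{1:T}) = ρ(z_1) ∏ p(z_{t+1}|z_t,u_t) ∏ p(x_t|z_t).
# The scalar linear-Gaussian transition and emission below are hypothetical
# placeholders for the learned networks.
A, B = 0.95, 0.1

def sample_trajectory(u):
    T = len(u) + 1
    z = np.zeros(T)
    z[0] = rng.standard_normal()                   # z_1 ~ ρ(z_1)
    for t in range(T - 1):                         # z_{t+1} ~ p(z_{t+1}|z_t,u_t)
        z[t + 1] = A * z[t] + B * u[t] + 0.05 * rng.standard_normal()
    x = z + 0.1 * rng.standard_normal(T)           # x_t ~ p(x_t|z_t)
    return z, x

z, x = sample_trajectory(u=np.ones(9))
print(z.shape, x.shape)  # (10,) (10,)
```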
Maximilian Karl, Justin Bayer, Maximilian Sölch
SLIDE 15 Deep Variational Bayes Filtering: filtering in latent space of a variational autoencoder
[diagram: latent states z(t), z(t+1), z(t+2) over system states x(t), x(t+1), x(t+2); transitions z(t+1) = A z(t) + B u(t) + C x̃(t+1), with x̃ denoting process noise; control inputs u(t), u(t+1)]
Karl & Soelch & Bayer & van der Smagt, ICLR 2017
SLIDE 16
Deep Variational Bayes Filter: example
m l² φ̈(t) = −μ φ̇(t) + m g l sin φ(t) + u(t)
transition model: z(t+1) = A z(t) + B u(t) + C x(t+1)
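The pendulum ODE above can be simulated directly for a ground-truth comparison. A semi-implicit Euler sketch with u = 0; all parameter values here are illustrative, not taken from the slides:

```python
import numpy as np

# Semi-implicit Euler integration of
#   m l² φ̈(t) = −μ φ̇(t) + m g l sin φ(t) + u(t),   with u = 0.
def simulate_pendulum(phi0, steps=20_000, dt=0.001, m=1.0, l=1.0, g=9.81, mu=2.0):
    phi, phid = phi0, 0.0
    for _ in range(steps):
        phidd = (-mu * phid + m * g * l * np.sin(phi)) / (m * l ** 2)
        phid += dt * phidd   # update velocity first (semi-implicit Euler)
        phi += dt * phid
    return phi

# With the +mgl sin φ term, the position φ = 0 is an unstable equilibrium:
# a small perturbation makes the pendulum fall and settle at φ = π.
phi_final = simulate_pendulum(phi0=0.01)
print(phi_final)  # ≈ π
```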
SLIDE 17 Formula E use case with Audi Motorsport
Audi Motorsport is interested in optimal energy strategies, and knowing the future battery temperature is key.
Approach: learn a simulator of battery temperature given race conditions and control commands, then use the simulator to choose the strategy with the best temperature for final performance.
Project:
… initiated end of August,
… started a week later,
… deployed to hardware during a test in November 2017,
… tested on the car during a race in early December 2017.
Results: error < ±1 degree in 50% of the races (vs baseline)
Philip Becker
SLIDE 18 Deep Variational Bayes Filter with DMP
[graphical model: z_t → z_{t+1} → z_{t+2}; z_t → x_t; u_t → z_{t+1}]
transition model:
τ z̈_{t+1} = α_z (β_z (z_goal − z_t) − ż_t) + f_t + ε
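This DMP-style transition is a damped spring toward the goal plus a forcing term. A sketch of its rollout with the forcing term set to zero; the gains below are the conventional critically damped choice (β = α/4), not values from the slides:

```python
# Semi-implicit Euler rollout of the DMP-style transition
#   τ z̈ = α (β (z_goal − z) − ż) + f,   with f = 0.
def rollout_dmp(z0, z_goal, tau=1.0, alpha=25.0, beta=25.0 / 4, dt=0.001, steps=10_000):
    z, zd = z0, 0.0
    for _ in range(steps):
        zdd = alpha * (beta * (z_goal - z) - zd) / tau
        zd += dt * zdd   # update velocity first (semi-implicit Euler)
        z += dt * zd
    return z

z_final = rollout_dmp(z0=0.0, z_goal=1.0)
print(z_final)  # converges to the goal, ≈ 1.0
```

With β = α/4 the spring is critically damped, so z converges to z_goal without overshoot; the learned forcing term f shapes the trajectory on the way there.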
Maximilian Karl, Nutan Chen
SLIDE 19
Deep Variational Bayes Filtering: DMPs in latent space of a variational autoencoder
unsupervised
SLIDE 20 latent space z(t) is not straight!
latent space sampling #2: this is the optimal "shortest" path
unsupervised
Chen & Klushyn & Kurle & Bayer & van der Smagt, 2018
SLIDE 21 how do geodesics work?
The length of a curve γ(t) in latent space (the curve parameterised by a neural network) is
L(γ) := ∫₀¹ √⟨γ′(t), γ′(t)⟩_{γ(t)} dt = ∫₀¹ √(γ′(t)ᵀ G γ′(t)) dt
with metric tensor G = Jᵀ J, where J is the Jacobian of the decoder neural network.
The magnification factor is MF := √(det G).
The geodesic between z₀ and z₁ is found by solving
min_ω L(g_ω(t)) s.t. g_ω(0) = z₀, g_ω(1) = z₁.
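The pullback length is easy to approximate numerically: for small latent steps, ‖x_{i+1} − x_i‖ ≈ ‖J Δz‖ = √(Δzᵀ G Δz), so summing decoded segment lengths approximates ∫ √(γ′ᵀ G γ′) dt. A sketch with a hypothetical stand-in decoder:

```python
import numpy as np

# Hypothetical smooth decoder z ∈ ℝ² → x ∈ ℝ³, standing in for the decoder NN.
def decoder(z):
    return np.array([np.sin(z[0]), np.cos(z[0]) * z[1], z[1] ** 2])

def curve_length(curve):
    """Sum decoded segment lengths along a discretised latent curve,
    approximating the Riemannian length under G = Jᵀ J."""
    points = np.array([decoder(z) for z in curve])
    return np.sum(np.linalg.norm(np.diff(points, axis=0), axis=1))

# A straight latent line from z0 = (0, 0) to z1 = (1, 1), 100 segments:
t = np.linspace(0.0, 1.0, 101)[:, None]
curve = (1 - t) * np.array([0.0, 0.0]) + t * np.array([1.0, 1.0])
length = curve_length(curve)
print(length)
```

A geodesic solver would now deform the curve (e.g. via the parameters ω of g_ω) to minimise this length while keeping the endpoints fixed.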
Richard Kurle, Alexej Klushyn, Nutan Chen
SLIDE 22 ...on a 6-DoF robot arm...
Alex Paraschos, Alexej Klushyn, Nutan Chen, Djalel Benbouzid
SLIDE 23
SLIDE 24
Deep Variational Bayes Filter with a map
[graphical model: latent states z_t → z_{t+1} → z_{t+2}; map cells m_t, m_{t+1}, m_{t+2} drawn from global map M; observations x_t, x_{t+1}, x_{t+2}; controls u_t, u_{t+1}]
Graphical model assumes a global map:
iii) observations are extracted from the map through an attention model based on the current location,
iv) the latent state is identified with the location.
Justin Bayer, Atanas Mirchev, Baris Kayalibay
SLIDE 25
Our approach is data-driven: deep neural networks, attention models and variational inference.
End-to-End SLAM
[diagram: agent traversing the map; inverse pose model & odometry; sensor fusion; attention model & grid-based map]
SLIDE 26 mapping, localisation and planning—all in the same model. Navigation via optimal control:
The cost at the goal is 0 and -1 everywhere else. Optimisation is performed in a learned model and executed only after planning has finished.
optimal control of a learnt model
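Planning with a goal-centred cost can be sketched with value iteration on a small grid (the grid and unit step cost are illustrative only; they are not the learned model from the slides):

```python
import numpy as np

# Value iteration: cost-to-go V with zero cost at the goal and unit cost per
# step elsewhere; each sweep applies a Bellman backup for the four moves.
def plan(shape, goal, iters=50):
    V = np.full(shape, np.inf)
    V[goal] = 0.0                                    # zero cost at the goal
    for _ in range(iters):
        for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            shifted = np.roll(V, (dr, dc), axis=(0, 1))
            # forbid wrap-around moves at the grid border
            if dr == -1: shifted[-1, :] = np.inf
            if dr == 1:  shifted[0, :] = np.inf
            if dc == -1: shifted[:, -1] = np.inf
            if dc == 1:  shifted[:, 0] = np.inf
            V = np.minimum(V, shifted + 1.0)         # Bellman backup per action
        V[goal] = 0.0
    return V

V = plan((4, 4), goal=(0, 0))
print(V[3, 3])  # Manhattan distance to the goal: 6.0
```

As on the slide, the optimisation runs entirely inside the model; the plan is executed only once it has converged.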
SLIDE 27 The Bayesian nature of the model allows a principled quantification of uncertainty.
We can estimate how well the model knows certain regions of its environment. Optimal control drives the agent into unexplored regions.
exploration—maximise expected surprise
SLIDE 28 can it be efficiently computed?
Erwin Schrödinger, 1944: Negentropy Klyubin et al, 2005: Empowerment Wissner-Gross et al, 2013: Causal Entropic Forces
Karl & Sölch & Ehmck & Benbouzid & van der Smagt & Bayer: Unsupervised Real-Time Control through Variational Empowerment, arXiv 2017
Empowerment:
Emp(z) := max_ω ∬ p(z′, u | z) ln [ p(z′, u | z) / (p(z′ | z) ω(u | z)) ] dz′ du
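Empowerment conditions on the current state z; as a sketch, one can fix a single state and compute the capacity of a small discrete channel p(z′|u) with the Blahut–Arimoto algorithm (the toy channel below is hypothetical, not the variational estimator from the paper):

```python
import numpy as np

# Empowerment as channel capacity: max over ω(u) of the mutual information
# I(Z′; U) for the channel p(z′|u), via Blahut–Arimoto fixed-point updates.
def empowerment(p_next_given_u, iters=200):
    n_u = p_next_given_u.shape[0]
    w = np.full(n_u, 1.0 / n_u)                  # action distribution ω(u)
    for _ in range(iters):
        p_next = w @ p_next_given_u              # marginal p(z′) under ω
        # per-action KL[p(z′|u) ‖ p(z′)] drives the Blahut–Arimoto update
        d = np.sum(p_next_given_u * np.log(p_next_given_u / p_next + 1e-30), axis=1)
        w = w * np.exp(d)
        w /= w.sum()
    p_next = w @ p_next_given_u
    return np.sum(w[:, None] * p_next_given_u *
                  np.log(p_next_given_u / p_next + 1e-30))  # I(Z′; U) in nats

# Two actions with fully distinguishable outcomes give capacity ln 2 nats:
channel = np.array([[1.0, 0.0],
                    [0.0, 1.0]])
emp = empowerment(channel)
print(emp)  # ≈ 0.693
```

For continuous states and actions this exact computation is intractable, which is what motivates the variational lower bound on the next slide.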
SLIDE 29
Maximilian Sölch
empowerment is the channel capacity between action and the following state
how is efficient empowerment computed?
looking for the best state, where each action has a meaningful consequence
intractable (computed for all actions)
plan
Karl & ... & van der Smagt & Bayer: Unsupervised Real-Time Control through Variational Empowerment, arXiv, 2017
approximate with a lower bound
Maximilian Karl, Justin Bayer, Philip Becker, Djalel Benbouzid
SLIDE 30
empowerment on a pendulum
SLIDE 31 independent balls with 40-dimensional lidar sensors
unsupervised
Karl & ... & van der Smagt & Bayer: Unsupervised Real-Time Control through Variational Empowerment, arXiv, 2017
SLIDE 32
control through DVBF: exploration
SLIDE 33 control through DVBF: after unsupervised learning with Empowerment
unsupervised
Karl & ... & van der Smagt & Bayer: Unsupervised Real-Time Control through Variational Empowerment, arXiv, 2017
SLIDE 34
SLIDE 35
[video: actions in lidar space; empowerment]