on variational inference and optimal control | VOLKSWAGEN GROUP AI | PowerPoint PPT Presentation



SLIDE 1

on variational inference and optimal control

Patrick van der Smagt

Director of AI Research Volkswagen Group Munich, Germany https://argmax.ai

VOLKSWAGEN GROUP AI RESEARCH

SLIDE 2

control approach #1: feedback control

[block diagram: setpoint xd(t) → controller K → u(t) → plant → x(t+1), fed back through a delay z^-1]

problem: requires a very fast feedback loop
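The loop on this slide can be sketched in a few lines. This is a minimal illustration, not the deck's implementation: the scalar plant x(t+1) = a·x(t) + b·u(t) and all constants are hypothetical, chosen only to show a proportional controller u(t) = K·(xd − x) in action.

```python
import numpy as np

# Minimal sketch of approach #1: a proportional feedback controller
# driving a hypothetical scalar plant x(t+1) = a*x(t) + b*u(t).
# All constants are illustrative.
def simulate_feedback(x0=0.0, x_d=1.0, K=0.5, a=0.9, b=1.0, steps=50):
    x = x0
    for _ in range(steps):
        u = K * (x_d - x)   # controller acts on the tracking error
        x = a * x + b * u   # plant responds one step later
    return x

# The closed loop converges to b*K*x_d / (1 - a + b*K), i.e. a pure
# proportional controller leaves a steady-state offset from x_d.
```

Note also that each control decision needs a fresh measurement of x, which is exactly the fast-feedback-loop requirement stated above.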

SLIDE 3

control approach #2: model-based feedback control (LQR)

[block diagram: setpoint xd(t) → controller K → u(t) → plant → x(t+1), fed back through a delay z^-1]

problem: requires a fast feedback loop and an inverse model
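The LQR mentioned above computes the gain K from a linear model x(t+1) = A x(t) + B u(t). A hedged sketch, with toy matrices (a double integrator) standing in for a real plant model, iterating the discrete-time Riccati recursion to a fixed point:

```python
import numpy as np

# Sketch of an LQR gain: iterate the discrete-time Riccati recursion
#   P <- Q + A'P A - A'P B (R + B'P B)^-1 B'P A
# until it settles, then u = -K x. System matrices are toy values.
def lqr_gain(A, B, Q, R, iters=500):
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

A = np.array([[1.0, 1.0], [0.0, 1.0]])  # double integrator
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K = lqr_gain(A, B, Q, R)
# The closed loop A - B K has all eigenvalues inside the unit circle.
```

This makes the slide's problem concrete: the controller is only as good as the engineered model (A, B), which motivates the next slides.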

SLIDE 4

control approach #3: model-reference control

[block diagram: setpoint xd(t) → controller K with delay z^-1; the control u(t) drives both a simulator "model", which predicts x(t+1:T), and the real plant, which returns x(t+1)]

the simulator "dreams" the future, aka predictive coding
problem: how do I get this model?

SLIDE 5

problems

1) engineered models are expensive to set up
2) engineered models are expensive to compute
3) engineered models do not scale

SLIDE 6

we really want to represent p(x); we can write

p(x) = ∫ p(x | z) p(z) dz

[graphical model: z → x]
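The integral above can be estimated naively by Monte Carlo: draw z from p(z) and average p(x | z). A small sketch with an assumed toy model (p(z) = N(0,1), p(x|z) = N(z,1), so the true marginal is N(0,2)), purely to make the integral concrete:

```python
import numpy as np

# Naive Monte Carlo estimate of p(x) = ∫ p(x|z) p(z) dz.
# Toy model (illustrative): p(z) = N(0,1), p(x|z) = N(z,1),
# whose exact marginal is p(x) = N(0,2).
rng = np.random.default_rng(0)

def gauss_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def mc_marginal(x, n_samples=200_000):
    z = rng.standard_normal(n_samples)  # z ~ p(z)
    return gauss_pdf(x, z, 1.0).mean()  # average of p(x|z)
```

The estimator needs very many samples to be accurate, which is exactly the inefficiency the following slides address.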

SLIDE 7

we really want to represent p(x):  p(x) = ∫ p(x | z) p(z) dz

Two problems:
(1) how do we shape p(z) to carry the right information of x? A: we don't hand-design it; assume it is a Gaussian pd.
(2) how do we compute the integral? It is intractable (we only have the data; we need MCMC).

[graphical model: z → x]

SLIDE 8

we really want to represent the posterior

p(z | x) = p(x | z) p(z) / p(x)

bummer, we don't have it.

Trick to do efficient MCMC:
(1) choose a specific x and look in its neighbourhood (to find the z that most likely produced it)
(2) use p(x | z) to sample the corresponding x
(3) evaluate p(x) there

[graphical model: z → x]

SLIDE 9

since we don't have p(z | x), approximate it with a recognition model q(z | x).

Trick to do efficient MCMC:
(1) choose a specific x and look in its neighbourhood, via q(z | x), to find the z that most likely produced it
(2) use p(x | z) to sample the corresponding x
(3) evaluate p(x) there

[graphical model: z → x, with q(z | x) pointing back from x to z]

SLIDE 10

minimise the Kullback-Leibler divergence to make q look like p:

KL[q(z|x) ‖ p(z|x)] = Σ_z q(z|x) log [q(z|x) / p(z|x)]
                    = E[log q(z|x) − log p(z|x)]
                    = E[log q(z|x) − log (p(x|z) p(z) / p(x))]
                    = E[log q(z|x) − log p(x|z) − log p(z) + log p(x)]

rearranging:

log p(x) − KL[q(z|x) ‖ p(z|x)] = E[log p(x|z) − (log q(z|x) − log p(z))]
                               = E[log p(x|z)] − KL[q(z|x) ‖ p(z)]
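The identity above can be checked numerically on a small discrete model, where every term is an explicit sum. The probability tables below are arbitrary toy values, not from the talk:

```python
import numpy as np

# Numerical check of:
# log p(x) - KL[q(z|x)||p(z|x)] = E_q[log p(x|z)] - KL[q(z|x)||p(z)]
# for an arbitrary toy model over 3 latent values.
p_z = np.array([0.5, 0.3, 0.2])          # prior p(z)
p_x_given_z = np.array([0.7, 0.1, 0.4])  # p(x|z) at the observed x
q = np.array([0.6, 0.3, 0.1])            # some approximate posterior

p_x = np.sum(p_x_given_z * p_z)          # p(x) = sum_z p(x|z) p(z)
p_z_given_x = p_x_given_z * p_z / p_x    # exact posterior via Bayes

kl = lambda a, b: np.sum(a * np.log(a / b))
lhs = np.log(p_x) - kl(q, p_z_given_x)
rhs = np.sum(q * np.log(p_x_given_z)) - kl(q, p_z)
# lhs == rhs up to floating point; the right-hand side is the
# evidence lower bound we can actually compute.
```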

SLIDE 11

log p(x) − KL[q(z|x) ‖ p(z|x)] = E[log p(x|z)] − KL[q(z|x) ‖ p(z)]

I need this (the left-hand side); I can compute this (the right-hand side).

we can get our generative model ... while we want q to be close to p ... by maximising the MLE for x given z ... (optimising the reconstruction by sampling) ... and please make z equal to the prior.

this is why I chose argmax.ai for our lab website: argmax_θ

SLIDE 12

efficient computation as a neural network: the Variational AutoEncoder

[diagram: system state x → encoder q(z|x) → latent space z, a probability density with (Gaussian) prior → decoder p(x|z) → x̃, the reconstruction of x]

loss = reconstruction loss + KL[q(z|x) ‖ prior]    ("nonlinear PCA")

Durk Kingma and Max Welling, 2013; Rezende, Mohamed & Wierstra, 2014
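The loss on this slide has a simple closed form when q(z|x) is a diagonal Gaussian and the prior is N(0, I). A sketch of just the loss evaluation, with the encoder/decoder outputs given as plain arrays (a real VAE learns them):

```python
import numpy as np

# Sketch of the VAE loss: reconstruction term plus KL[q(z|x) || N(0,I)].
# Encoder and decoder outputs are stand-ins here; no training involved.
def gaussian_kl(mu, log_var):
    # Closed-form KL between N(mu, diag(exp(log_var))) and N(0, I).
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def vae_loss(x, x_recon, mu, log_var):
    recon = np.sum((x - x_recon) ** 2)  # Gaussian reconstruction loss
    return recon + gaussian_kl(mu, log_var)

x = np.array([1.0, 0.0])
# With a perfect reconstruction and q(z|x) = N(0, I), the loss is 0;
# any mismatch in either term makes it positive.
```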

SLIDE 13

preprocessing sensor data with VAE: emerging properties (unsupervised)

Maximilian Karl, Nutan Chen, Patrick van der Smagt (2014)

[plot: taxel values over time (s); video: SynTouch, LLC]

SLIDE 14

Deep Variational Bayes Filter

Graphical model assumes latent Markovian dynamics:
i) observations depend only on the current state,
ii) the state depends only on the previous state and control signal.

[graphical model: z_t → z_{t+1} → z_{t+2}, each z_t → x_t, controls u_t, u_{t+1} entering the transitions]

p(x_{1:T}, z_{1:T} | u_{1:T}) = ρ(z_1) ∏_{t=1}^{T−1} p(z_{t+1} | z_t, u_t) ∏_{t=1}^{T} p(x_t | z_t)

Maximilian Karl, Justin Bayer, Maximilian Sölch

SLIDE 15

Deep Variational Bayes Filtering: filtering in latent space of a variational autoencoder

[diagram: latent states z(t), z(t+1), z(t+2) generate system states x(t), x(t+1), x(t+2); transitions z(t+1) = A z(t) + B u(t) + C x̃(t+1), where x̃ denotes process noise, driven by control inputs u(t), u(t+1)]

Karl & Sölch & Bayer & van der Smagt, ICLR 2017
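The locally linear transition on this slide is easy to roll out once A, B, C are known. A sketch with hypothetical fixed matrices (in DVBF they are produced per step by a neural network):

```python
import numpy as np

# Sketch of the DVBF-style latent transition
#   z(t+1) = A z(t) + B u(t) + C w(t),
# with w(t) the process noise (the x-tilde on the slide).
# Matrices are toy values; DVBF predicts them per time step.
rng = np.random.default_rng(0)
A = np.array([[0.95, 0.1], [0.0, 0.95]])
B = np.array([[0.0], [0.5]])
C = 0.01 * np.eye(2)

def rollout(z0, controls):
    z, path = z0, [z0]
    for u in controls:
        w = rng.standard_normal(2)              # process noise sample
        z = A @ z + B @ np.atleast_1d(u) + C @ w
        path.append(z)
    return np.stack(path)

traj = rollout(np.zeros(2), controls=np.zeros(10))  # 11 latent states
```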

SLIDE 16

Deep Variational Bayes Filter: example

m l² φ̈(t) = −µ φ̇(t) + m g l sin φ(t) + u(t)

transition model: z(t+1) = A z(t) + B u(t) + C x(t+1)
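The pendulum above can be integrated with a simple Euler scheme to generate observations; a sketch with illustrative constants (not the ones used in the experiments):

```python
import numpy as np

# Euler integration of the pendulum
#   m l^2 phi'' = -mu phi' + m g l sin(phi) + u.
# Constants are illustrative only.
m, l, g, mu = 1.0, 1.0, 9.81, 0.1

def step(phi, phi_dot, u, dt=0.01):
    phi_ddot = (-mu * phi_dot + m * g * l * np.sin(phi) + u) / (m * l**2)
    return phi + dt * phi_dot, phi_dot + dt * phi_ddot

phi, phi_dot = 0.0, 0.0
for _ in range(1000):
    phi, phi_dot = step(phi, phi_dot, u=0.0)
# phi = 0 is an equilibrium: without input the state stays at zero,
# while any displacement is accelerated away by the +mgl sin(phi) term.
```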

SLIDE 17

Formula E use case with Audi Motorsport

Audi Motorsport is interested in optimal energy strategies; knowing future battery temperature is key.
Approach: learn a simulator of battery temperature given race conditions and control commands, then use the simulator to choose the strategy with the best temperature for final performance.

Project:
… initiated end of August,
… started a week later,
… deployed to hardware during a test in November 2017,
… tested on the car during a race early December 2017.

Results: error < ±1 degree in 50% of the races

[plot: baseline vs. our method]

Philip Becker

SLIDE 18

Deep Variational Bayes Filter with DMP

[graphical model: z_t → z_{t+1} → z_{t+2}, each z_t → x_t, controls u_t, u_{t+1} entering the transitions]

transition model (dynamic movement primitive):

τ z̈_{t+1} = α_z (β_z (z_goal − z_t) − ż_t) + f_t + ε

Maximilian Karl, Nutan Chen

SLIDE 19

Deep Variational Bayes Filtering: DMPs in latent space of a variational autoencoder

unsupervised

SLIDE 20

latent space z(t) is not straight!

latent space sampling #2: this is the optimal "shortest" path

unsupervised

Chen & Klushyn & Kurle & Bayer & van der Smagt, 2018

SLIDE 21

how do geodesics work?

the length of a curve γ in latent space, measured through the decoder f:

L(γ) := ∫₀¹ ‖∂f(γ(t))/∂t‖ dt = ∫₀¹ ‖(∂f(γ(t))/∂γ(t)) (∂γ(t)/∂t)‖ dt = ∫₀¹ ‖J ∂γ(t)/∂t‖ dt,

where J is the Jacobian of the decoder NN. The geodesic problem can then be expressed as

min_ω L(g_ω(t))   s.t.   g_ω(0) = z₀, g_ω(1) = z₁,

with g_ω the curve in z, parametrised by a neural network. Equivalently,

L(γ) = ∫₀¹ √⟨γ′(t), γ′(t)⟩_γ(t) dt = ∫₀¹ √(γ′(t)ᵀ G γ′(t)) dt

with the metric tensor G = Jᵀ J. The magnification factor is MF := √(det G).

Richard Kurle, Alexej Klushyn, Nutan Chen
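Discretising the integral above gives a simple recipe: map a latent curve through the decoder and sum the segment lengths of its image. A sketch with a hypothetical decoder standing in for the trained network:

```python
import numpy as np

# Discretised curve length L(gamma): the length of the decoded curve
# f(gamma) in data space. The decoder here is a hypothetical
# nonlinearity, standing in for a trained network.
def decoder(z):
    return np.stack([z, np.sin(2.0 * z)], axis=-1)

def curve_length(z_points):
    x = decoder(z_points)  # map the latent curve into data space
    return np.sum(np.linalg.norm(np.diff(x, axis=0), axis=1))

straight = np.linspace(-1.0, 1.0, 200)  # a straight line in z
# Its decoded image is longer than the straight chord between the
# decoded endpoints, which is why the "shortest" path in latent
# space is generally curved.
```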

SLIDE 22

...on a 6-DoF robot arm...

Alex Paraschos, Alexej Klushyn, Nutan Chen, Djalel Benbouzid

SLIDE 23

SLIDE 24


Deep Variational Bayes Filter with a map

[graphical model: latent states z_t → z_{t+1} → z_{t+2} and a global map M jointly generate observations x_t, x_{t+1}, x_{t+2}; controls u_t, u_{t+1} enter the transitions]

Graphical model assumes a global map:
iii) observations are extracted from the map through an attention model based on the current location,
iv) the latent state is identified with the location.

Justin Bayer, Atanas Mirchev, Baris Kayalibay

SLIDE 25


Our approach is data-driven: deep neural networks, attention models and variational inference.

End-to-End SLAM

[diagram: an agent traversing the map; inverse pose model & odometry; sensor fusion; attention model & grid-based map]

SLIDE 26

optimal control of a learnt model

mapping, localisation and planning, all in the same model. Navigation via optimal control: the cost at the goal is 0 and -1 everywhere else. Optimisation is performed in the learned model and executed only after planning has finished.
SLIDE 27

exploration: maximise expected surprise

The Bayesian nature of the model allows a principled quantification of uncertainty. We can estimate how well the model knows certain regions of its environment; optimal control then drives the agent into unexplored regions.

SLIDE 28

can it be efficiently computed?

Erwin Schrödinger, 1944: Negentropy
Klyubin et al., 2005: Empowerment
Wissner-Gross et al., 2013: Causal Entropic Forces

Karl & Sölch & Ehmck & Benbouzid & van der Smagt & Bayer: Unsupervised Real-Time Control through Variational Empowerment, arXiv 2017

Empowerment:

Emp(z) := max_ω ∬ p(z′, u | z) ln [ p(z′, u | z) / (p(z′ | z) ω(u | z)) ] dz′ du
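For a discrete action set and a known transition table, the maximisation above is a channel-capacity computation and can be done with the classic Blahut-Arimoto iteration. This is a sketch on a toy two-action channel, not the variational estimator used in the paper:

```python
import numpy as np

# Empowerment as channel capacity: maximise the mutual information
# between action u and next state z' over the action distribution
# omega(u|z), via the Blahut-Arimoto iteration. Toy discrete channel.
def empowerment(p_next, iters=200):
    # p_next[u, z'] = p(z' | z, u) for a fixed current state z
    n_u = p_next.shape[0]
    omega = np.full(n_u, 1.0 / n_u)
    for _ in range(iters):
        marginal = omega @ p_next  # p(z' | z) under omega
        # KL of each action's outcome distribution from the marginal
        ratio = np.sum(p_next * np.log(p_next / marginal + 1e-30), axis=1)
        omega = omega * np.exp(ratio)
        omega /= omega.sum()
    marginal = omega @ p_next
    return np.sum(omega[:, None] * p_next
                  * np.log(p_next / marginal + 1e-30))

# Two actions leading deterministically to two distinct states:
# the empowerment is ln 2 nats (one fully controllable bit).
p_det = np.array([[1.0, 0.0], [0.0, 1.0]])
```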

SLIDE 29

how is efficient empowerment computed?

empowerment is the channel capacity between the action and the following state: we look for the best state, where each action has a meaningful consequence. Computing it exactly (over all actions) is intractable, so we approximate it with a lower bound.

Karl & ... & van der Smagt & Bayer: Unsupervised Real-Time Control through Variational Empowerment, arXiv, 2017

Maximilian Sölch, Maximilian Karl, Justin Bayer, Philip Becker, Djalel Benbouzid

SLIDE 30

empowerment on a pendulum

SLIDE 31

independent balls with 40-dimensional lidar sensors

unsupervised

Karl & ... & van der Smagt & Bayer: Unsupervised Real-Time Control through Variational Empowerment, arXiv, 2017

SLIDE 32

control through DVBF: exploration

SLIDE 33

control through DVBF: after unsupervised learning with Empowerment

unsupervised

Karl & ... & van der Smagt & Bayer: Unsupervised Real-Time Control through Variational Empowerment, arXiv, 2017

SLIDE 34

SLIDE 35

empowerment: actions in lidar space