Bayesian Decision Theory in Sensorimotor Control


SLIDE 1

TU Graz - Signal Processing and Speech Communication Laboratory

Bayesian Decision Theory in Sensorimotor Control

Matthias Freiberger, Martin Öttl

Signal Processing and Speech Communication Laboratory

Advanced Signal Processing

Matthias Freiberger, Martin Öttl · Advanced Signal Processing · page 1/88

SLIDE 2

Outline

Introduction
◮ Definition
◮ Challenges of sensorimotor control
◮ Bayesian Integration in motor control
◮ Cost Functions
Optimal Estimation
Optimal Feedback Control
◮ Introduction
◮ Linear Quadratic Gaussian Framework (LQG)
◮ LQG + Kalman Filter
◮ LQG Multiplicative Noise
◮ Minimal Intervention Principle
◮ Hierarchical Optimal Controller
Conclusion
References

SLIDE 3

Intro - What is sensorimotor control?

"sen·so·ri·mo·tor: (adj.) Of, relating to, or involving both sensory and motor activity: sensorimotor nerve centers; sensorimotor pathways."

The American Heritage Dictionary of the English Language, Fourth Edition

”Movement is the only way for humans to interact with the world. All communication including speech, sign language, gestures and writing, is mediated by the motor system.”

http://www.pom.cam.ac.uk/research/sensorimotor.html

SLIDE 4

Intro - What is sensorimotor control?

Using methods from computer science and control theory, we want to understand and describe how

◮ ... human beings are able to return a tennis ball
◮ ... or grab a bottle of water and drink
◮ ... birds of prey are capable of catching a mouse in flight
◮ ... basically, how any kind of physical interaction with the environment is performed by biological systems, pursuing a certain objective while permanently performing corrections using sensor input

SLIDE 5

Intro - Challenges

◮ Action selection is a fundamental decision process
◮ The CNS constantly sends motor commands to the muscles
◮ At each point in time, the appropriate motor command needs to be selected
◮ Knowledge about the environment needs to be combined with actual observation data and with knowledge about the cost/reward of currently possible actions to make optimal decisions.

SLIDE 6

Intro - Schematic Control Flow

SLIDE 7

Intro - Uncertainty of human sensorium

◮ The human sensorium is plagued by noise
◮ Muscle output is noisy as well
◮ Therefore the state of the environment/body needs to be estimated
◮ Additionally, the "cost" of each movement shall be minimized
◮ Bayesian statistics comes in as a powerful way to deal with the uncertainty of the human sensorium

SLIDE 8

Intro - Bayesian integration

◮ The CNS needs to integrate prior knowledge about the environment with knowledge obtained from sensory data to estimate the state of the environment optimally
◮ When estimating the bounce location of a tennis ball, the ball might be more likely to bounce at the edges of the court

SLIDE 9

Intro - Bayesian Cue Combination

Combination of sensor signals for better estimates

◮ Combination of different sensor modalities (e.g. vision and proprioception)
◮ Combination of signals of the same modality (several visual cues into a stereo image, ...)
◮ Cues need to be weighted against each other

SLIDE 10

Intro - Bayesian Cue Combination

Given a set of observations from different cues d_1, d_2, ..., d_n, and under the assumption that the cues are independent of each other given s, we can rewrite the likelihood P(d_1, d_2, ..., d_n | s) as

P(d_1, d_2, ..., d_n | s) = \prod_{k=1}^{n} P(d_k | s)   (1)

Therefore we can rewrite the corresponding posterior probability:

P(s | d_1, d_2, ..., d_n) = P(s) \cdot \prod_{k=1}^{n} P(d_k | s) / P(d_1, d_2, ..., d_n)   (2)
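For Gaussian cues and a flat prior, the posterior (2) is again Gaussian, and the fused estimate weights each cue by its relative precision (inverse variance). A minimal sketch, with made-up cue values:

```python
import numpy as np

def fuse_gaussian_cues(means, variances):
    """Fuse independent Gaussian cues: the posterior precision is the sum
    of the cue precisions, and each cue mean is weighted by its share."""
    precisions = 1.0 / np.asarray(variances, dtype=float)
    weights = precisions / precisions.sum()
    fused_mean = float(weights @ np.asarray(means, dtype=float))
    fused_var = float(1.0 / precisions.sum())
    return fused_mean, fused_var

# vision: 10.0 cm (variance 1.0); proprioception: 14.0 cm (variance 4.0)
m, v = fuse_gaussian_cues([10.0, 14.0], [1.0, 4.0])  # m = 10.8, v = 0.8
```

The fused estimate lies closer to the more reliable cue, and its variance is smaller than either cue's alone.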

SLIDE 11

Intro - Cost Functions

◮ Model how good or bad the outcome of a particular move is
◮ It seems reasonable to minimize consumed energy and strain on muscles
◮ Several cost functions have been proposed (smoothness, precision)
◮ The CNS also adapts very well to external cost functions

SLIDE 12

Intro - Cost Functions

◮ The actual cost function of human movement can be inferred using indifference lines
◮ A utility function can be found from these lines: compare points from different lines, and assign utilities to the lines

SLIDE 13

Intro - Cost Functions

SLIDE 14

Intro - Cost Functions

Given a set of possible actions X and a set of possible outcomes O, as well as a utility function U(o): O → R, for any x ∈ X we can compute the expected utility

E\{U\} = \sum_{o \in O} P(o | x) \cdot U(o)   (3)

Therefore the optimal decision with respect to the cost function U(o) is considered to be the one that maximizes the expected utility E{U}.
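Equation (3) can be sketched directly in code. The task, outcome probabilities and utilities below are hypothetical illustration values, not from the slides:

```python
# Decision-making by expected utility: pick the action x maximizing
# E{U} = sum_o P(o|x) U(o), as in eq. (3).
def expected_utility(x, outcomes, p, utility):
    return sum(p[x][o] * utility[o] for o in outcomes)

def best_action(actions, outcomes, p, utility):
    return max(actions, key=lambda x: expected_utility(x, outcomes, p, utility))

# a reaching movement: moving fast risks missing the target
p = {"fast": {"hit": 0.6, "miss": 0.4}, "slow": {"hit": 0.95, "miss": 0.05}}
u = {"hit": 1.0, "miss": -1.0}
choice = best_action(["fast", "slow"], ["hit", "miss"], p, u)  # "slow"
```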

SLIDE 15

Outline

Introduction
◮ Definition
◮ Challenges of sensorimotor control
◮ Bayesian Integration in motor control
◮ Cost Functions
Optimal Estimation
Optimal Feedback Control
◮ Introduction
◮ Linear Quadratic Gaussian Framework (LQG)
◮ LQG + Kalman Filter
◮ LQG Multiplicative Noise
◮ Minimal Intervention Principle
◮ Hierarchical Optimal Controller
Conclusion
References

SLIDE 16

Optimal Estimation – Intro

Until now

Find the optimal action among a finite set of actions.

But the world is continuous...

The actual continuous state of our body parts has to be estimated permanently, and optimal actions according to this state estimate need to be found.

In control terms...

We need to model an observer that estimates the inner state (e.g. the position and velocity) of our limbs.

SLIDE 17

Optimal Estimation – Experiment

Experiment setup

◮ Test subjects had to estimate the location of their thumb after moving their arm
◮ A resistive or assistive force was added by torque motors
◮ The hand is constrained to move on a straight line
◮ The arm is illuminated for 2 s to give an initial state
◮ After that, participants have to rely solely on proprioception

SLIDE 18

Optimal Estimation – Experiment

Experiment setup

SLIDE 19

Optimal Estimation – Models

A system that mimics the behavior of a natural process is called an internal model. Internal models are an important concept in motor control. Basically, two classes of internal models can be distinguished: forward models and backward models.

SLIDE 20

Optimal Estimation – Internal models: forward vs. backward

Forward models

◮ Mimic the causal flow of a process by predicting its next state
◮ Arise naturally, since delays in most sensorimotor loops are large; feedback control may be too slow for rapid movements
◮ Key ingredient in systems that use motor outflow (efference copy)

Backward models

◮ Estimate the motor command that caused a particular state transition

SLIDE 21

Optimal Estimation – Internal models: forward vs. backward

How do we optimally model our limbs now?

◮ Wolpert et al. used a forward model incorporating a correction term for the given problem.
◮ State estimation for a system containing noise is a complex task
◮ We will follow an intuitive approach by modeling an observer for a deterministic system first
◮ From our deterministic observer, we will perform the transition to a probabilistic observer (Kalman filter)

SLIDE 22

Optimal Estimation - The Plant

Model the arm as a damped mass system.

State model

\dot{x} = A x + b u   (state update equation)
y = c^T x + d u   (model for sensory output)

State variables

x_1 position of the mass (hand)
x_2 velocity of the mass (hand)
u(t) applied force
y(t) sensory output

SLIDE 23

Optimal Estimation - The Plant

Model parameters

A = \begin{pmatrix} 0 & 1 \\ 0 & -\beta/m \end{pmatrix}, \quad b = \begin{pmatrix} 0 \\ 1/m \end{pmatrix}, \quad c = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad d = 0

m mass of the hand
β damping parameter
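The plant above can be simulated directly; a minimal Euler-integration sketch, where m, β and the time step are illustrative values rather than slide parameters:

```python
import numpy as np

# Damped point-mass "arm": x_dot = A x + b u, y = c^T x
m, beta, dt = 1.0, 0.5, 0.01
A = np.array([[0.0, 1.0], [0.0, -beta / m]])
b = np.array([0.0, 1.0 / m])
c = np.array([1.0, 0.0])

def step(x, u):
    """One Euler step of the state update; returns (x, y)."""
    x = x + dt * (A @ x + b * u)
    return x, float(c @ x)

x = np.zeros(2)
for _ in range(100):        # apply a constant 1 N force for 1 s
    x, y = step(x, 1.0)
```

Because of the damping term, the velocity saturates at u/β instead of growing without bound.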

SLIDE 24

Optimal Estimation- Luenberger Observer

Observer Model

[Block diagram: plant \dot{x} = A x + b u with input u(t) and output y(t), feeding the Observer]

Ansatz for the Luenberger Observer

\dot{\hat{x}} = \hat{A}\hat{x} + \hat{b}_1 u + \hat{b}_2 y   (4)

SLIDE 25

Optimal Estimation- Luenberger Observer

Derivation

Error constraint: e(t) = x(t) − \hat{x}(t),  \lim_{t \to \infty} e(t) = 0

\dot{e} = \dot{x} − \dot{\hat{x}} = (A x + b u) − (\hat{A}\hat{x} + \hat{b}_1 u + \hat{b}_2 y)

Set y = c^T x and rearrange the equation:

\dot{e} = (A − \hat{b}_2 c^T) x − \hat{A}\hat{x} + (b − \hat{b}_1) u

SLIDE 26

Optimal Estimation- Luenberger Observer

Derivation

\dot{e} = (A − \hat{b}_2 c^T) x − \hat{A}\hat{x} + (b − \hat{b}_1) u

The error shall be independent of the input → set \hat{b}_1 = b:

\dot{e} = (A − \hat{b}_2 c^T) x − \hat{A}\hat{x}

Choose \hat{A} = A − \hat{b}_2 c^T and get for the error:

\dot{e} = (A − \hat{b}_2 c^T) e

Final model: \dot{\hat{x}} = (A − \hat{b}_2 c^T)\hat{x} + b u + \hat{b}_2 y

SLIDE 27

Optimal Estimation- Luenberger Observer

Derivation

\dot{\hat{x}} = (A − \hat{b}_2 c^T)\hat{x} + \hat{b}_1 u + \hat{b}_2 y

Rewrite \hat{b}_2 = \hat{b} and c^T \hat{x} = \hat{y}:

\dot{\hat{x}} = A\hat{x} − \hat{b}\hat{y} + \hat{b} y + b u

Collect terms:

\dot{\hat{x}} = A\hat{x} + b u + \hat{b}(y − \hat{y})

SLIDE 28

Optimal Estimation- Luenberger Observer

Where are our models now?

\dot{\hat{x}} = A\hat{x} + b u  (forward model)  + \hat{b}(y − \hat{y})  (sensory correction)

◮ The forward model takes the actual state estimate and tries to predict the further trend of the state
◮ The difference between the actual sensory feedback y and the prediction \hat{y}, weighted by \hat{b}, is used to update the state estimate.

How to choose \hat{b}?

For deterministic systems: choose \hat{b} such that (A − \hat{b} c^T) is asymptotically stable.
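A minimal simulation sketch of this observer for the damped-mass arm. The gain b̂ is a hand-picked illustration; it only needs to make A − b̂ c^T asymptotically stable:

```python
import numpy as np

# Luenberger observer for the damped-mass plant (illustrative values)
m, beta, dt = 1.0, 0.5, 0.001
A = np.array([[0.0, 1.0], [0.0, -beta / m]])
b = np.array([0.0, 1.0 / m])
c = np.array([1.0, 0.0])
b_hat = np.array([2.0, 1.0])   # eigenvalues of A - b_hat c^T: -1.25 +/- 0.66j

x = np.array([1.0, 0.0])       # true state, unknown to the observer
x_hat = np.zeros(2)            # observer starts with a wrong estimate

for _ in range(5000):          # 5 s of simulated time, u = 0
    y = float(c @ x)
    y_hat = float(c @ x_hat)
    x = x + dt * (A @ x)                                     # plant
    x_hat = x_hat + dt * (A @ x_hat + b_hat * (y - y_hat))   # observer
```

The estimation error decays at the rate set by the eigenvalues of A − b̂ c^T, so after a few time constants x̂ has converged to x.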

SLIDE 29

Optimal Estimation- Probabilistic Observer

Real world can be mean and difficult

Noise is everywhere...

◮ Circuits are plagued by noise
◮ so are radio transmissions
◮ and even our body

[Block diagram: plant \dot{x} = A x + b u with input u(t) and output y(t), feeding the observer \dot{\hat{x}} = A\hat{x} + b u + \hat{b}(y − \hat{y})]

SLIDE 30

Optimal Estimation- Probabilistic Observer

Real world can be mean and difficult

Noise is everywhere...

◮ Circuits are plagued by noise
◮ so are radio transmissions
◮ and even our body

[Block diagram: noisy plant \dot{x} = A x + b u + w with input u(t), noise v(t), and output y(t): which Observer?]

SLIDE 31

Optimal Estimation- Probabilistic Observer

Stochastic model

\dot{x} = A x + b u + w   (state update equation)
y = C x + v   (model for sensory output)

w(t) motor noise
v(t) sensory noise

◮ w and v are random variables
◮ Therefore, the state vector x is a vector of RVs as well
◮ This means that we need a Bayesian estimator to estimate the mean \bar{x} and covariance matrix P of the RV X

SLIDE 32

Optimal Estimation- Probabilistic Observer

Some simplifications

We assume that the noise is additive white Gaussian noise, uncorrelated with the initial state x_0: w(t) ∼ N(0, Q_c), v(t) ∼ N(0, R_c). It can be shown that in this case the minimum-variance estimator is the Kalman filter.

SLIDE 33

Optimal Estimation- Kalman Filter

Model for a Kalman filter

\dot{\hat{x}} = A\hat{x} + b u  (forward model)  + K_t(y − \hat{y})  (sensory correction)   (5)

Computation of K and P

K_t = P_t C^T R_c^{-1}   (Kalman gain matrix)
\dot{P}_t = -K_t R_c K_t^T + A P_t + P_t A^T + Q_c   (update rule for P)
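The filter (5) and the update rule for P can be sketched with Euler integration. The noise intensities Q_c, R_c and the plant values are illustrative assumptions:

```python
import numpy as np

# Continuous-time Kalman filter for the noisy damped-mass plant
rng = np.random.default_rng(0)
m, beta, dt = 1.0, 0.5, 0.001
A = np.array([[0.0, 1.0], [0.0, -beta / m]])
b = np.array([0.0, 1.0 / m])
C = np.array([[1.0, 0.0]])
Qc = 0.01 * np.eye(2)               # motor noise intensity
Rc = np.array([[0.01]])             # sensory noise intensity

x = np.zeros(2); x_hat = np.zeros(2); P = np.eye(2)
for _ in range(2000):               # 2 s, constant force u = 1
    u = 1.0
    w = rng.normal(0.0, np.sqrt(0.01 / dt), 2)   # sampled white noise
    v = rng.normal(0.0, np.sqrt(0.01 / dt), 1)
    x = x + dt * (A @ x + b * u + w)             # noisy plant
    y = C @ x + v
    K = P @ C.T @ np.linalg.inv(Rc)              # K_t = P_t C^T Rc^{-1}
    x_hat = x_hat + dt * (A @ x_hat + b * u + K @ (y - C @ x_hat))
    P = P + dt * (A @ P + P @ A.T + Qc - K @ Rc @ K.T)
```

P settles to the steady-state estimation error covariance, and the estimate tracks the noisy true state.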

SLIDE 34

Optimal Estimation- Experiment results

Test subjects (GAM)

SLIDE 35

Optimal Estimation- Experiment results

Kalman Filter

SLIDE 36

Optimal Estimation- Experiment results

Test subjects (GAM) / Kalman Filter

SLIDE 37

Optimal Estimation- Experiment conclusions

Kalman Filter

◮ Curves are quite similar
◮ The noticeable peak at 1 s seems to be a tradeoff between forward model and backward model
◮ The variance jitters in the experiment for changing forces; no force-dependent change in variance is predicted by the Kalman filter
◮ The experiment provides support for the use of forward models applying sensory correction

SLIDE 38

Outline

Introduction
◮ Definition
◮ Challenges of sensorimotor control
◮ Bayesian Integration in motor control
◮ Cost Functions
Optimal Estimation
Optimal Feedback Control
◮ Introduction
◮ Linear Quadratic Gaussian Framework (LQG)
◮ LQG + Kalman Filter
◮ LQG Multiplicative Noise
◮ Minimal Intervention Principle
◮ Hierarchical Optimal Controller
Conclusion
References

SLIDE 39

Optimal Feedback Control

Introduction

SLIDE 40

Intro – Markov Decision Process (MDP)

Some notation

x ∈ X state of the Markov process
u ∈ U(x) action / control in state x
p(x′|x, u) control-dependent transition probability distribution
ℓ(x, u) ≥ 0 immediate cost for choosing control u in state x

Shortest Path problem

[Figure: shortest-path example on a small graph with edge costs; from the current node the immediate cost is 2 and the cumulative cost to the target is 5]

SLIDE 41

Intro – MDP First exit formulation (1)

Goal

◮ find for each state a control law / policy u = π(x) ∈ U(x) which moves the trajectory towards a terminal state x ∈ T
◮ each trajectory should cause the lowest total cost v^π(x); v^π(x) is also called the cost-to-go
◮ the cost at a terminal state is v^π(x) = q_T(x)

SLIDE 42

Intro – MDP First exit formulation (2)

Cost-to-go as path sum

v^\pi(x) = E_{x_0 = x,\ x_{k+1} \sim p(\cdot|x_k, \pi_k(x_k))} \Big[ q_T(x_{t_{first}}) + \sum_{k=0}^{t_{first}-1} \ell(x_k, \pi_k(x_k)) \Big]

Definition Hamiltonian

H[x, u, v(\cdot)] := \ell(x, u) + E_{x' \sim p(\cdot|x,u)} \{v(x')\}

Bellman equations

policy-specific cost-to-go: v^\pi(x) = H[x, \pi(x), v^\pi(\cdot)]
optimal cost-to-go: v^*(x) = \min_{u \in U(x)} H[x, u, v^*(\cdot)]
optimal policy: \pi^*(x) = \arg\min_{u \in U(x)} H[x, u, v^*(\cdot)]

SLIDE 43

Intro – MDP Finite horizon formulation

All trajectories end at t = N.

Cost-to-go as path sum

v^\pi_t(x) = E_{x_t = x,\ x_{k+1} \sim p(\cdot|x_k, \pi_k(x_k))} \Big[ q_T(x_N) + \sum_{k=t}^{N-1} \ell(x_k, \pi_k(x_k)) \Big]

Definition Hamiltonian

H[x, u, v(\cdot)] := \ell(x, u) + E_{x' \sim p(\cdot|x,u)} \{v(x')\}

Bellman equations

policy-specific cost-to-go: v^\pi_t(x) = H[x, \pi_t(x), v^\pi_{t+1}(\cdot)]
optimal cost-to-go: v^*_t(x) = \min_{u \in U(x)} H[x, u, v^*_{t+1}(\cdot)]
optimal policy: \pi^*_t(x) = \arg\min_{u \in U(x)} H[x, u, v^*_{t+1}(\cdot)]

SLIDE 44

Intro – MDP Infinite horizon discounted cost form.

Trajectories continue forever; future costs are exponentially discounted with α < 1 to ensure a finite cost-to-go.

Cost-to-go as path sum

v^\pi(x) = E_{x_0 = x,\ x_{k+1} \sim p(\cdot|x_k, \pi(x_k))} \Big[ \sum_{k=0}^{\infty} \alpha^k \ell(x_k, \pi(x_k)) \Big]

Definition Hamiltonian

H_\alpha[x, u, v(\cdot)] := \ell(x, u) + \alpha E_{x' \sim p(\cdot|x,u)} \{v(x')\}

Bellman equations

policy-specific cost-to-go: v^\pi(x) = H_\alpha[x, \pi(x), v^\pi(\cdot)]
optimal cost-to-go: v^*(x) = \min_{u \in U(x)} H_\alpha[x, u, v^*(\cdot)]
optimal policy: \pi^*(x) = \arg\min_{u \in U(x)} H_\alpha[x, u, v^*(\cdot)]

SLIDE 45

Intro – MDP Infinite horizon average cost formulation

Trajectories continue forever; there is no discounting, and therefore the resulting cost-to-go is infinite.

Average cost-to-go

c^\pi = \lim_{N \to \infty} \frac{1}{N} v^{\pi,N}(x)

Differential cost-to-go

\tilde{v}^\pi(x) = v^{\pi,N}(x) - N c^\pi

Bellman equations

policy-specific cost-to-go: c^\pi + \tilde{v}^\pi(x) = H[x, \pi(x), \tilde{v}^\pi(\cdot)]
optimal cost-to-go: c^* + \tilde{v}^*(x) = \min_{u \in U(x)} H[x, u, \tilde{v}^*(\cdot)]
optimal policy: \pi^*(x) = \arg\min_{u \in U(x)} H[x, u, \tilde{v}^*(\cdot)]

SLIDE 46

Intro – MDP Solution

Algorithms for calculating an optimal cost-to-go

◮ Value Iteration
◮ Policy Iteration
◮ Linear Programming
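The first of these can be sketched in a few lines. The three-state MDP below is a made-up deterministic shortest-path example (state 2 is terminal with cost 0):

```python
# Value iteration: repeatedly apply the Bellman update
#   v(x) <- min_u [ l(x,u) + sum_x' p(x'|x,u) v(x') ]
def value_iteration(states, actions, p, cost, terminal_cost, sweeps=100):
    v = {x: terminal_cost.get(x, 0.0) for x in states}
    for _ in range(sweeps):
        for x in states:
            if x in terminal_cost:      # absorbing terminal state
                continue
            v[x] = min(
                cost[(x, u)] + sum(pr * v[x2] for x2, pr in p[(x, u)].items())
                for u in actions(x)
            )
    return v

p = {(0, "go"): {1: 1.0}, (1, "go"): {2: 1.0}}
cost = {(0, "go"): 1.0, (1, "go"): 2.0}
v = value_iteration([0, 1, 2], lambda x: ["go"], p, cost, {2: 0.0})
# v == {0: 3.0, 1: 2.0, 2: 0.0}
```

The optimal policy is then recovered as the argmin of the same Bellman expression.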

SLIDE 47

Intro – Continuous-time stochastic system (1)

System Dynamics

dx = f(x, u) dt + G(x, u) d\xi

x(t) ∈ R^n state vector
u(t) ∈ R^m control vector
ξ(t) ∈ R^k Brownian motion (integral of white noise)

Interpretation

x(t) - x(0) = \int_0^t f(x(s), u(s)) ds + \int_0^t G(x(s), u(s)) d\xi(s)

The last integral is an Itô integral.

SLIDE 48

Intro – Continuous-time stochastic system (2)

Itô integral

An Itô integral of a square-integrable function g(t) is defined as

\int_0^t g(s) d\xi(s) = \lim_{n \to \infty} \sum_{k=0}^{n-1} g(s_k)\,(\xi(s_{k+1}) - \xi(s_k))

with 0 = s_0 < s_1 < \dots < s_n = t.
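The definition can be checked numerically. For g(s) = 1 the Itô sum is just ξ(t), so across many sample paths its variance must be close to t (all parameters below are illustrative):

```python
import numpy as np

# Monte-Carlo sketch of the Ito sum  sum_k g(s_k) (xi(s_{k+1}) - xi(s_k))
rng = np.random.default_rng(1)
t, n, trials = 1.0, 1000, 2000
ds = t / n
vals = []
for _ in range(trials):
    dxi = rng.normal(0.0, np.sqrt(ds), n)     # Brownian increments
    vals.append(np.sum(1.0 * dxi))            # g evaluated at left endpoints
var = float(np.var(vals))                     # should be close to t = 1.0
```

Evaluating g at the left endpoints s_k (rather than midpoints) is what distinguishes the Itô from the Stratonovich convention.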

SLIDE 49

Intro – Continuous-time stochastic system (3)

Definition Hamiltonian

H[x, u, v(\cdot)] := \ell(x, u) + f(x, u)^T v_x(x) + \tfrac{1}{2} \mathrm{tr}(\Sigma(x, u)\, v_{xx}(x))

Σ(x, u) = G(x, u) G(x, u)^T is the noise covariance.

Hamilton-Jacobi-Bellman (HJB) equations for the optimal cost-to-go

first exit: 0 = \min_u H[x, u, v^*(\cdot)],  v^*(x \in T) = q_T(x)
finite horizon: -v^*_t(x, t) = \min_u H[x, u, v^*(\cdot, t)],  v^*(x, T) = q_T(x)
discounted: \tfrac{1}{\tau} v^*(x) = \min_u H[x, u, v^*(\cdot)]
average: c = \min_u H[x, u, \tilde{v}^*(\cdot)]

with the discounted cost-to-go defined as v^\pi(x) = E\big[ \int_0^\infty \exp(-t/\tau)\, \ell(x(t), u(t))\, dt \big]

SLIDE 50

Intro – Inverse pendulum example (1)

Task

◮ find the optimal control law for the inverse pendulum \ddot{\theta} = k \sin(\theta) + u (gravity constant k, angle θ, torque u)
◮ state-dependent cost q(θ) = 1 - \exp(-2\theta^2)
◮ control-dependent cost \tfrac{r}{2} u^2
◮ overall cost per step: \ell(x, u) = q(\theta) + \tfrac{r}{2} u^2

[Figures: pendulum mechanics; state-dependent cost plotted over x_1 = θ ∈ (−π, π) and x_2 = \dot{\theta}]

SLIDE 51

Intro – Inverse pendulum example (2)

Stochastic dynamics

\ddot{\theta} = k \sin(\theta) + u \;\Rightarrow\; dx = (a(x) + B u)\, dt + G\, d\xi

x = (x_1, x_2)^T = (\theta, \dot{\theta})^T, \quad a(x) = (x_2,\ k \sin(x_1))^T, \quad B = (0, 1)^T, \quad G = (0, \sigma)^T

Discounted HJB equation (from above)

\tfrac{1}{\tau} v^*(x) = \min_u H[x, u, v^*(\cdot)] = \min_u \big\{ \ell(x, u) + f(x, u)^T v^*_x(x) + \tfrac{1}{2} \mathrm{tr}(\Sigma(x, u)\, v^*_{xx}(x)) \big\}

HJB for the inverse pendulum

\tfrac{1}{\tau} v^*(x) = \min_u \big\{ q(x) + \tfrac{r}{2} u^2 + (a(x) + B u)^T v^*_x(x) + \tfrac{1}{2} \mathrm{tr}(G G^T v^*_{xx}(x)) \big\}

SLIDE 52

Intro – Inverse pendulum example (3)

Optimal control law minimizes the Hamiltonian

◮ differentiate the Hamiltonian with respect to u and set it to zero:
◮ r u + B^T v^*_x(x) = 0

u = -\tfrac{1}{r} B^T v^*_x(x) = -\tfrac{1}{r} v^*_{x_2}(x)

Remarks

◮ v^*_x is also called the costate vector
◮ the optimal control law depends on multiplying a matrix containing system dynamics and energy costs with the costate vector

SLIDE 53

Intro – Inverse pendulum example (4)

Calculation of costate vector

◮ insert optimal control law into HJB 1 τ v∗(x) = q(x) − 1 2rv∗ x2 2 + x2v∗ x1 + k sin(x1)v∗ x2 + 1 2σ2v∗ x2x2 ◮ construct MDP (discretize state space, approximate derivates

with finite differences, ...)

◮ solve MDP

SLIDE 54

Intro – Inverse pendulum example (5)

[Figures: state-dependent cost q(x), optimal cost-to-go v^*(x), and optimal policy u = -\tfrac{1}{r} v^*_{x_2}(x), each plotted over x_1 = θ ∈ (−π, π) and x_2 = \dot{\theta}]

SLIDE 55

Optimal Feedback Control

Linear Quadratic Gaussian Framework (LQG)

SLIDE 56

LQG – Linear Quadratic Gaussian framework

In most cases an optimal control law can't be obtained in closed form; one exception is the LQG system.

LQG properties

◮ linear dynamics
◮ quadratic costs
◮ additive Gaussian noise (if present)

Here the Hamiltonian can be minimized analytically.

SLIDE 57

LQG – Continuous-time stochastic system

Continuous-time LQG

dynamics: dx = (A x + B u)\, dt + G\, d\xi
cost rate: \ell(x, u) = \tfrac{1}{2} u^T R u + \tfrac{1}{2} x^T Q x
final cost: h(x) = \tfrac{1}{2} x^T Q_f x

R control cost matrix (symmetric positive definite)
Q state cost matrix (symmetric)
Q_f final state cost matrix (symmetric)

SLIDE 58

LQG – Derivation cont.-time stochastic system (1)

Guess for the optimal value function

v(x, t) = \tfrac{1}{2} x^T V(t) x + a(t), \quad V(t) symmetric

Derivatives

v_t(x, t) = \tfrac{1}{2} x^T \dot{V}(t) x + \dot{a}(t)
v_x(x, t) = V(t)\, x
v_{xx}(x, t) = V(t)

Substitution into the finite-horizon HJB

-\tfrac{1}{2} x^T \dot{V}(t) x - \dot{a}(t) = \min_u \big\{ \tfrac{1}{2} u^T R u + \tfrac{1}{2} x^T Q x + (A x + B u)^T V(t) x + \tfrac{1}{2} \mathrm{tr}(G G^T V(t)) \big\}

Remember

\partial(x^T A x)/\partial x = (A + A^T)\, x
\partial(a^T x)/\partial x = \partial(x^T a)/\partial x = a

SLIDE 59

LQG – Derivation cont.-time stochastic system (2)

Analytically found minimum

u = -R^{-1} B^T V(t)\, x

Using this u, the control-dependent part of the HJB becomes

\tfrac{1}{2} u^T R u + (B u)^T V(t) x = -\tfrac{1}{2} x^T V(t) B R^{-1} B^T V(t)\, x

Simplifications (V is symmetric)

◮ x^T A^T V x = x^T V^T A x = x^T V A x
◮ 2 x^T A^T V x = x^T A^T V x + x^T V A x

SLIDE 60

LQG – Derivation cont.-time stochastic system (3)

Regrouping the HJB equation yields

-\tfrac{1}{2} x^T \dot{V}(t) x - \dot{a}(t) = \tfrac{1}{2} x^T \big( Q + A^T V(t) + V(t) A - V(t) B R^{-1} B^T V(t) \big) x + \tfrac{1}{2} \mathrm{tr}(G G^T V(t))

Our guess of the optimal value function is correct iff both of the following equations hold; the first one is called the continuous-time Riccati equation:

-\dot{V}(t) = Q + A^T V(t) + V(t) A - V(t) B R^{-1} B^T V(t)
-\dot{a}(t) = \tfrac{1}{2} \mathrm{tr}(G G^T V(t))

SLIDE 61

LQG – Derivation cont.-time stochastic system (4)

Boundary conditions (we used the finite-horizon HJB)

◮ v(x, t_f) = \tfrac{1}{2} x^T V(t_f) x + a(t_f) = h(x)
◮ V(t_f) = Q_f
◮ a(t_f) = 0

V(t) and a(t) can be obtained by using the boundary conditions and integrating -\dot{V}(t) = . . . and -\dot{a}(t) = . . . backward in time.

Optimal control law (repeated from above)

u = -R^{-1} B^T V(t)\, x \;\Rightarrow\; the control law is independent of the noise
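The backward integration can be sketched with an Euler sweep; the system matrices below are an illustrative double integrator, not slide values. For this long horizon, V converges to the stationary solution of the algebraic Riccati equation:

```python
import numpy as np

# Backward Euler sweep of the continuous-time Riccati equation
#   -V_dot = Q + A^T V + V A - V B R^{-1} B^T V,   V(tf) = Qf
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2); Qf = np.eye(2); R = np.array([[1.0]])
dt, tf = 0.001, 5.0

V = Qf.copy()
for _ in range(int(tf / dt)):
    Vdot = -(Q + A.T @ V + V @ A - V @ B @ np.linalg.inv(R) @ B.T @ V)
    V = V - dt * Vdot                 # step backward in time
L = np.linalg.inv(R) @ B.T @ V        # feedback gain of u = -L x
```

For this particular system the stationary gain is known in closed form, L = (1, sqrt(3)), which makes the sweep easy to check.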

SLIDE 62

LQG – Discrete-time stochastic system (1)

In practice, one usually uses discrete-time systems.

Discrete-time LQG

dynamics: x_{t+1} = A x_t + B u_t + \xi_t
cost rate: \ell(x, u) = \tfrac{1}{2} u_t^T R u_t + \tfrac{1}{2} x_t^T Q x_t
final cost: h(x) = \tfrac{1}{2} x_{t_f}^T Q_f x_{t_f}

Optimal control law

u_t = -L_t x_t

Control gain

L_t = \big( R + B^T V_{t+1} B \big)^{-1} B^T V_{t+1} A

SLIDE 63

LQG – Discrete-time stochastic system (2)

Discrete-time Riccati equation

V_t = Q_t + A^T V_{t+1} (A - B L_t)

Solving the above equations

◮ the control gain is independent of the state sequence and can be computed offline
◮ V_t is computed by initializing V_{t_f} = Q_f and iterating the Riccati equation backward in time
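The backward pass and the resulting regulator can be sketched as follows; the plant is an illustrative discretized point mass, not taken from the slides:

```python
import numpy as np

# Finite-horizon discrete-time LQR: backward pass
#   L_t = (R + B^T V_{t+1} B)^{-1} B^T V_{t+1} A
#   V_t = Q + A^T V_{t+1} (A - B L_t)
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.eye(2); Qf = np.eye(2); R = np.array([[0.1]])
N = 100

V = Qf.copy()
gains = []
for _ in range(N):
    L = np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)
    V = Q + A.T @ V @ (A - B @ L)
    gains.append(L)
gains.reverse()               # gains[t] is L_t

# simulate the regulator from x0 = (1, 0) without noise
x = np.array([1.0, 0.0])
for t in range(N):
    x = A @ x - B @ (gains[t] @ x)
```

Since the gains depend only on the system and cost matrices, the whole backward pass happens offline, exactly as the slide states.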

SLIDE 64

Optimal Feedback Control

LQG + Kalman Filter

SLIDE 65

LQG + Kalman Filter – Overview

◮ the controller generates the motor command u_t and needs the actual state estimate x̂_t
◮ the estimator compensates sensory delays by using the efference copy of u_t
◮ controller and estimator operate in a loop and can therefore generate motor commands even when sensory data become unavailable

[Block diagram: the Controller (LQG) sends the motor command u_t to the biomechanical plant (subject to process noise) and an efference copy to the Estimator (Kalman filter); the sensory apparatus returns sensory data y_t (subject to measurement noise w_t), from which the estimator produces the estimated state x̂_t for the controller]

SLIDE 66

LQG + Kalman Filter – System model

System model

dynamics: x_{t+1} = A x_t + B u_t + \xi_t
feedback: y_t = H x_t + \omega_t
cost per step: \ell(x, u) = x_t^T Q_t x_t + u_t^T R u_t

ξ_t process noise, Gaussian with zero mean and covariance Ω^ξ
ω_t measurement noise, Gaussian with zero mean and covariance Ω^ω
H observation matrix

SLIDE 67

LQG + Kalman Filter – Controller/Estimator

Kalman Filter

state estimate: \hat{x}_{t+1} = A \hat{x}_t + B u_t + K_t (y_t - H \hat{x}_t)
filter gain: K_t = A \Sigma_t H^T \big( H \Sigma_t H^T + \Omega^\omega \big)^{-1}
estimation error covariance: \Sigma_{t+1} = \Omega^\xi + (A - K_t H) \Sigma_t A^T
\hat{x}_0 and \Sigma_0 are given!

Linear-Quadratic Regulator (LQR)

control law: u_t = -L_t \hat{x}_t
control gain: L_t = \big( R + B^T V_{t+1} B \big)^{-1} B^T V_{t+1} A
Riccati equation: V_t = Q_t + A^T V_{t+1} (A - B L_t)
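The two blocks above can be wired into the closed loop of the overview slide. The plant matrices and noise levels below are illustrative assumptions:

```python
import numpy as np

# Closed loop: discrete Kalman filter feeds the LQR (slide equations)
rng = np.random.default_rng(2)
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
H = np.array([[1.0, 0.0]])
Q = np.eye(2); R = np.array([[0.1]])
Ox = 1e-4 * np.eye(2)        # process noise covariance
Ow = np.array([[1e-2]])      # measurement noise covariance
N = 100

# backward pass for the control gains L_t
V = Q.copy(); Ls = []
for _ in range(N):
    L = np.linalg.solve(R + B.T @ V @ B, B.T @ V @ A)
    V = Q + A.T @ V @ (A - B @ L)
    Ls.append(L)
Ls.reverse()

# forward pass: noisy plant + Kalman filter + controller
x = np.array([1.0, 0.0]); x_hat = np.zeros(2); S = np.eye(2)
for t in range(N):
    u = -(Ls[t] @ x_hat)                                  # efference copy
    y = H @ x + rng.multivariate_normal(np.zeros(1), Ow)
    K = A @ S @ H.T @ np.linalg.inv(H @ S @ H.T + Ow)
    x_hat = A @ x_hat + B @ u + K @ (y - H @ x_hat)
    S = Ox + (A - K @ H) @ S @ A.T
    x = A @ x + B @ u + rng.multivariate_normal(np.zeros(2), Ox)
```

Note that the estimator is driven by the same u_t the controller sends to the plant, which is exactly the efference-copy pathway of the overview slide.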

SLIDE 68

Optimal Feedback Control

LQG Multiplicative Noise

SLIDE 69

LQG Multiplicative Noise – Overview

Motivation

◮ Fitts's law ("faster movements are less accurate") suggests that noise is control-dependent
◮ the standard deviation of muscle force can be well approximated by a linear function of the mean force
◮ no explicit smoothness cost formulation is necessary to achieve smooth trajectories

SLIDE 70

LQG Multiplicative Noise – Definition

System Model

dynamics       x_{t+1} = A x_t + B u_t + ξ_t + Σ_{i=1}^{c} ε_t^i C_i u_t
feedback       y_t = H x_t + ω_t + Σ_{i=1}^{d} ǫ_t^i D_i x_t
cost per step  ℓ(x, u) = x_t^T Q_t x_t + u_t^T R u_t

◮ C_i    scaling matrices for control-dependent system noise
◮ ε_t^i  i-th control-dependent noise component, Gaussian with zero mean and covariance Ω^ε = I
◮ D_i    scaling matrices for state-dependent observation noise
◮ ǫ_t^i  i-th state-dependent noise component, Gaussian with zero mean and covariance Ω^ǫ = I

SLIDE 71

LQG Multiplicative Noise – Controller/Estimator

Estimator

x̂_{t+1} = (A − B L_t) x̂_t + K_t (y_t − H x̂_t) + η_t

Controller

u_t = −L_t x̂_t

Properties

◮ also takes internal noise η_t into account
◮ estimation and control are no longer independent (the separation principle no longer holds)
◮ K_t and L_t are calculated offline; for the equations see [4]

SLIDE 72

LQG Multiplicative Noise – Algorithm

Algorithm to calculate Kt and Lt

1. initialize the filter gains K_1 … K_{n−1} with zero or with the Kalman filter gains
2. calculate the control gains L_t in a backward pass through time
3. calculate the filter gains K_t in a forward pass through time
4. repeat steps 2 and 3 until convergence

SLIDE 73

LQG Multiplicative Noise – Example (1)

Task

◮ 1-D positioning of a point mass
◮ from start position p(0) = 0 to target position p*
◮ time step ∆; duration t_end = 0.3 s
◮ minimal energy consumption

Biomechanical plant

[Figure: a point mass m = 1 kg at position p(t) is driven by a force f(t); the control signal u(t) passes through a muscle-like low-pass filter before producing f(t).]

Dynamics (mechanics)

p(t + ∆) = p(t) + ṗ(t) ∆
ṗ(t + ∆) = ṗ(t) + f(t) ∆/m

Dynamics (muscle, time constants τ1, τ2)

f(t + ∆) = f(t)(1 − ∆/τ2) + g(t) ∆/τ2
g(t + ∆) = g(t)(1 − ∆/τ1) + u(t)(1 + σ_c ε_t) ∆/τ1

SLIDE 74

LQG Multiplicative Noise – Example (2)

Dynamics matrix formulation

x_{t+1} = A x_t + B u_t + ε_t C_1 u_t,    x_t = [p(t), ṗ(t), f(t), g(t), p*]^T

      | 1  ∆   0        0        0 |
      | 0  1   ∆/m      0        0 |
A =   | 0  0   1−∆/τ2   ∆/τ2    0 | ,   B = [0, 0, 0, ∆/τ1, 0]^T,   C_1 = B σ_c
      | 0  0   0        1−∆/τ1  0 |
      | 0  0   0        0        1 |

Feedback matrix formulation

y_t = H x_t + ω_t,    H = [I_3  0_{3×2}]  (position, velocity and force are observed)
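The plant dynamics above can be simulated directly. A minimal sketch, assuming illustrative values for ∆, τ1, τ2, σ_c and the target (the slides only fix m = 1 kg and t_end = 0.3 s); the proportional controller is a placeholder, not the optimal LQG law.

```python
import random

# Plant parameters: the slides fix m = 1 kg and t_end = 0.3 s; the time
# step, time constants, noise level and target below are assumptions.
m, dt = 1.0, 0.01
tau1, tau2 = 0.04, 0.04    # muscle time constants [s] (assumed)
sigma_c = 0.5              # control-dependent noise scale (assumed)
p_target = 0.1             # target position [m] (assumed)

def step(state, u, eps):
    """One step of the mechanics + muscle dynamics from the slide."""
    p, pd, f, g = state
    return (p + pd * dt,                                     # position
            pd + f * dt / m,                                 # velocity
            f * (1 - dt / tau2) + g * dt / tau2,             # muscle force
            g * (1 - dt / tau1) + u * (1 + sigma_c * eps) * dt / tau1)

rng = random.Random(0)
state = (0.0, 0.0, 0.0, 0.0)           # start at rest at p(0) = 0
for _ in range(30):                    # 30 steps of 10 ms = 0.3 s
    # placeholder proportional controller, not the LQG control law
    u = 5.0 * (p_target - state[0])
    state = step(state, u, rng.gauss(0.0, 1.0))
```

Setting eps = 0 gives the deterministic plant; with noise, the control-multiplicative term makes larger commands noisier, which is exactly what motivates this model.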

SLIDE 75

LQG Multiplicative Noise – Example (3)

Total cost

(p(t_end) − p*)²                       (1)
+ (w_v ṗ(t_end))²                      (2)
+ (w_f f(t_end))²                      (3)
+ (r/(n−1)) Σ_{k=1}^{n−1} u²(k∆)       (4)

◮ (1) penalizes deviations from the target position
◮ (2)+(3) force the movement to be finished at t_end
◮ (4) ensures energy minimization
◮ w_v, w_f and r are the corresponding weights

SLIDE 76

LQG Multiplicative Noise – Example (4)

Cost per step matrix formulation

◮ define p = [1, 0, 0, 0, −1]^T; then p(t_end) − p* = p^T x_t, so term (1) can be expressed as x_t^T (p p^T) x_t
◮ for terms (2) and (3) use v = [0, w_v, 0, 0, 0]^T and f = [0, 0, w_f, 0, 0]^T
◮ this leads to ℓ(x, u) = x_t^T Q_t x_t + u_t^T R u_t with Q_{1,…,n−1} = 0, Q_n = p p^T + v v^T + f f^T and R = r
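The terminal cost matrix can be assembled directly from these vectors. A minimal sketch for the 5-dimensional state [p, ṗ, f, g, p*]; the weight values w_v, w_f are assumptions.

```python
# Building Q_n = p p^T + v v^T + f f^T from the three cost vectors.
# The weight values are illustrative assumptions.

w_v, w_f = 0.2, 0.02

p = [1.0, 0.0, 0.0, 0.0, -1.0]   # p^T x_t = p(t) - p*
v = [0.0, w_v, 0.0, 0.0, 0.0]    # v^T x_t = w_v * p_dot(t)
f = [0.0, 0.0, w_f, 0.0, 0.0]    # f^T x_t = w_f * f(t)

Qn = [[p[i] * p[j] + v[i] * v[j] + f[i] * f[j] for j in range(5)]
      for i in range(5)]
```

The terminal cost x^T Q_n x then equals (p(t_end) − p*)² + (w_v ṗ(t_end))² + (w_f f(t_end))², while Q_t = 0 for all earlier steps.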

SLIDE 77

LQG Multiplicative Noise – Example (5)

Resulting trajectories

◮ smooth trajectories arise without modeling smoothness in the cost
◮ the system can in principle become unstable, but this was not encountered in the problems the author deals with

SLIDE 78

Optimal Feedback Control

Minimal Intervention Principle

SLIDE 79

Minimal Intervention Principle – Definition

Definition

ignore task-irrelevant deviations

Simple example

◮ x1, x2 are uncoupled state variables
◮ the states are driven by controls u1, u2
◮ control-multiplicative noise
◮ the initial state is sampled from a circular Gaussian

SLIDE 80

Minimal Intervention Principle – Example 1

Task

◮ x1 + x2 = target
◮ use small u1, u2

Optimum

◮ u1 = u2
◮ the control law depends only on x1 + x2
◮ ⇒ u1, u2 form a motor synergy

Result

◮ the black ellipse shows the distribution of final states

SLIDE 81

Minimal Intervention Principle – Example 2

Alternative control law

◮ x1 = x2 = target/2

Results

◮ the gray circle shows the distribution of final states
◮ variance in the redundant direction is reduced
◮ variance in the task-relevant direction is increased
◮ the control signals are increased (⇒ not optimal)
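The difference between the two control laws can be checked numerically. A minimal one-step Monte Carlo sketch of the uncoupled two-state task with control-multiplicative noise; the target, noise level, initial spread and sample count are assumptions, not values from the slides.

```python
import random

rng = random.Random(1)
TARGET, SIGMA, N = 1.0, 0.5, 20000   # assumed task and noise parameters

def apply_controls(x1, x2, u1, u2):
    """One step with control-multiplicative noise on each control."""
    return (x1 + u1 * (1 + SIGMA * rng.gauss(0, 1)),
            x2 + u2 * (1 + SIGMA * rng.gauss(0, 1)))

task_err = {"synergy": 0.0, "independent": 0.0}
redundant = {"synergy": 0.0, "independent": 0.0}
for _ in range(N):
    x1, x2 = rng.gauss(0, 0.3), rng.gauss(0, 0.3)   # circular Gaussian start
    # minimal-intervention law: correct only the task-relevant sum
    u = (TARGET - x1 - x2) / 2.0
    s1, s2 = apply_controls(x1, x2, u, u)
    task_err["synergy"] += (s1 + s2 - TARGET) ** 2 / N
    redundant["synergy"] += (s1 - s2) ** 2 / N
    # alternative law: drive each state to target/2 separately
    s1, s2 = apply_controls(x1, x2, TARGET / 2 - x1, TARGET / 2 - x2)
    task_err["independent"] += (s1 + s2 - TARGET) ** 2 / N
    redundant["independent"] += (s1 - s2) ** 2 / N
```

With these numbers the synergy should show a smaller task-relevant error but a larger redundant spread, matching the ellipse versus the circle on the slides.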

SLIDE 82

Optimal Feedback Control

Hierarchical Optimal Controller

SLIDE 83

Hierarchical Optimal Controller – Overview

Principle

◮ the low-level controller generates an abstract representation y(x) of the state x
◮ the high-level controller generates commands v(y) to change y
◮ the low-level controller computes energy-efficient controls u(v, x) consistent with v

Comparison with example 1, minimal intervention principle

◮ y = x1 + x2
◮ v = f(y)
◮ u = [v, v]^T
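The two-level scheme can be sketched for this example. The target, the high-level gain and the equal split of v across the two controls are illustrative assumptions; the state argument of u(v, x) is not needed in this linear toy.

```python
# Toy sketch of the hierarchy above for two uncoupled states with
# abstract state y = x1 + x2. Target and gain are assumed values.

TARGET, GAIN = 1.0, 0.5

def abstract_state(x):
    """Low level reports only the task-relevant summary y(x)."""
    return x[0] + x[1]

def high_level(y):
    """High level commands a change v of y toward the target."""
    return GAIN * (TARGET - y)

def low_level(v):
    """Low level realizes v; splitting it equally between the two
    controls is the minimum-norm (energy-efficient) choice here."""
    return (v / 2.0, v / 2.0)

x = (0.3, -0.1)
for _ in range(20):
    u = low_level(high_level(abstract_state(x)))
    x = (x[0] + u[0], x[1] + u[1])
```

After a few iterations y converges to the target while the redundant difference x1 − x2 is left untouched, consistent with the minimal intervention principle.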

SLIDE 84

Optimal Feedback Control

Conclusion

SLIDE 85

Conclusion – Summary

We talked about ...

◮ Markov Decision Processes (MDP), cost-to-go formulations
◮ continuous-time stochastic systems
◮ the Linear Quadratic Gaussian framework (LQG)
◮ LQG + Kalman filter
◮ LQG with multiplicative noise
◮ the minimal intervention principle
◮ the hierarchical optimal controller

SLIDE 86

Outline

Introduction
  Definition
  Challenges of sensorimotor control
  Bayesian Integration in motor control
  Cost Functions
  Optimal Estimation
Optimal Feedback Control
  Introduction
  Linear Quadratic Gaussian Framework (LQG)
  LQG + Kalman Filter
  LQG Multiplicative Noise
  Minimal Intervention Principle
  Hierarchical Optimal Controller
Conclusion
References

SLIDE 87

References

[1] Konrad P. Körding, Daniel M. Wolpert, "Bayesian decision theory in sensorimotor control", Trends in Cognitive Sciences, vol. 10, no. 7, pp. 319–326, July 2006, doi:10.1016/j.tics.2006.05.003.
[2] Konrad P. Körding, Daniel M. Wolpert, "Bayesian integration in sensorimotor learning", Nature, vol. 427, pp. 244–247, 15 January 2004.
[3] Emanuel Todorov, "Optimality principles in sensorimotor control", Nature Neuroscience, vol. 7, no. 9, pp. 907–915, September 2004, doi:10.1038/nn1309.
[4] Emanuel Todorov, "Stochastic Optimal Control and Estimation Methods Adapted to the Noise Characteristics of the Sensorimotor System", Neural Computation, vol. 17, no. 5, pp. 1084–1108, May 2005, doi:10.1162/0899766053491887.
[5] Kenji Doya et al., "Bayesian Brain", chapter "Optimal Control Theory", MIT Press, 2006.
[6] Emanuel Todorov, lecture "Intelligent control through learning and optimization", http://www.cs.washington.edu/homes/todorov/courses/amath579/, accessed 14 May 2012.
