
SLIDE 1

Reduction of continuous-time control to discrete-time control

A. Jean-Marie

OCOQS Meeting, 24 January 2012

SLIDE 2

Outline

1. Problem statement
2. The basic model
3. Uniformization
4. Event model
5. Application

SLIDE 3

Progress: 1. Problem statement

SLIDE 4

Problem statement

Consider some continuous-time, discrete-event, infinite-horizon control problem. The standard way to analyze such problems is to reduce them to a discrete-time problem using some embedding of a discrete-time process into the continuous-time one. The optimal policy is deduced from the solution of the discrete-time problem.

SLIDE 5

Problem statement (ctd)

There are various ways to place the observation points: jump instants, controllable event instants, uniformization instants. They may result in different value functions.

Question: is there a way to “play” with the embedding process in order to obtain structural properties of the optimal policy?

SLIDE 6

Progress: 2. The basic model

SLIDE 7

A basic continuous-time control model

As a starting point, consider:

  • a continuous-time, piecewise-constant process $\{X(t);\ t \ge 0\}$ over some discrete state space $X$;
  • a sequence of endogenous decision instants $\{T_n;\ n \in \mathbb{N}\}$;
  • a finite set of actions $A$;
  • at a decision point $t$, given the current state $x = X(t)$, a feasible set of actions $A_x \subset A$.

Assuming that action $a \in A_x$ is applied:

  • a reward $r(x, a, y)$ is obtained;
  • the state jumps to a random state $T_a(x)$ with distribution $P_{xay} = P(T_a(x) = y)$;
  • given $y$, the next decision point is at $t + \tau$, where $\tau$ has an exponential distribution with parameter $\lambda_y$.

Between decision points, reward accumulates at rate $\ell(X(t))$, which is piecewise constant by assumption.

SLIDE 8

Basic model (ctd.)

Reward criterion: expected total discounted reward. Given $X(0) = x$,

$$ J(x) = \mathbb{E}\left[ \int_0^\infty e^{-\alpha t}\,\ell(X(t))\,dt + \sum_{n=1}^{\infty} e^{-\alpha T_n}\, r\big(X(T_n^-), A(T_n), X(T_n^+)\big) \right]. $$

The goal is to find the optimal feedback control $d : X \to A$ (with the constraint that $d(x) \in A_x$ for all $x$) that maximizes $J$.
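To make the criterion concrete, here is a minimal sketch that estimates $J(x)$ by Monte Carlo simulation for one fixed feedback control $d$. It runs on a small hypothetical model: three states, and all rates, rewards, transition probabilities and the policy `POLICY` are made up for illustration. It assumes that at time 0 the process has just entered $x$, so the first decision instant is $T_1 \sim \text{Exp}(\lambda_x)$. It also truncates the infinite horizon at $t = 100$, where $e^{-\alpha t} \approx 4.5\cdot 10^{-5}$. For cross-checking, the hypothetical helper `exact_J` evaluates the same policy by fixed-point iteration on $J(x) = \big(\ell(x) + \lambda_x \sum_y P_{x d(x) y}(r(x,d(x),y) + J(y))\big)/(\alpha+\lambda_x)$, which follows from conditioning on $T_1$.

```python
import math
import random

# Hypothetical toy model: all numbers below are illustrative assumptions.
ALPHA = 0.1                              # discount rate alpha
LAM = {0: 1.0, 1: 2.0, 2: 0.5}           # decision rates lambda_x
ELL = {0: 0.0, 1: -1.0, 2: 2.0}          # running reward rates ell(x)
# ACTIONS[x][a] = list of (y, P_xay, r(x, a, y)); the keys of ACTIONS[x] form A_x
ACTIONS = {
    0: {"stay": [(0, 1.0, 0.0)], "go": [(1, 0.7, 1.0), (2, 0.3, -0.5)]},
    1: {"stay": [(1, 1.0, 0.0)], "go": [(2, 1.0, 0.5)]},
    2: {"reset": [(0, 0.5, 0.0), (2, 0.5, 0.2)]},
}
POLICY = {0: "go", 1: "go", 2: "reset"}  # a fixed feedback control d(x)


def draw(transitions, rng):
    """Sample (y, r(x, a, y)) from a list of (y, prob, reward)."""
    u, acc = rng.random(), 0.0
    for y, p, r in transitions:
        acc += p
        if u < acc:
            return y, r
    return transitions[-1][0], transitions[-1][2]


def simulate_J(x, policy, rng, horizon=100.0):
    """One sample path of the discounted reward, truncated at `horizon`."""
    t, total = 0.0, 0.0
    while t < horizon:
        tau = rng.expovariate(LAM[x])  # time until the next decision instant
        # running reward ell(x), discounted and integrated over [t, t + tau)
        total += ELL[x] * (math.exp(-ALPHA * t) - math.exp(-ALPHA * (t + tau))) / ALPHA
        t += tau
        y, r = draw(ACTIONS[x][policy[x]], rng)
        total += math.exp(-ALPHA * t) * r  # impulse reward at T_n
        x = y
    return total


def estimate_J(x, policy, n_runs=4000, seed=1):
    rng = random.Random(seed)
    return sum(simulate_J(x, policy, rng) for _ in range(n_runs)) / n_runs


def exact_J(policy, n_iter=2000):
    """Policy evaluation by fixed-point iteration (conditioning on T_1)."""
    J = {x: 0.0 for x in LAM}
    for _ in range(n_iter):
        J = {
            x: (ELL[x] + LAM[x] * sum(p * (r + J[y]) for y, p, r in ACTIONS[x][policy[x]]))
            / (ALPHA + LAM[x])
            for x in LAM
        }
    return J
```

For instance, `estimate_J(0, POLICY)` and `exact_J(POLICY)[0]` should agree up to Monte Carlo error.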

SLIDE 9

Basic embedding

Features of this model:

  • control is instantaneous and localized in time;
  • the evolution is strictly Markovian;
  • it generalizes immediately to semi-Markov decision/transition instants.

Two possibilities for observing the process:

  • just before a transition/control: $\to V^-(x)$;
  • just after a transition/control: $\to V^+(x)$.

Question: what is their relation with $J(x)$?

SLIDE 10

Direct Bellman equations

Conditioning on $T_1$, the first decision point, we get:

$$ V^+(x) = \frac{1}{\alpha+\lambda_x}\Big(\ell(x) + \lambda_x V^-(x)\Big) $$

$$ V^-(x) = \max_{a\in A_x} \sum_y P_{xay}\Big(r(x,a,y) + V^+(y)\Big) = \max_{a\in A_x} \mathbb{E}\Big[r(x,a,T_a(x)) + V^+(T_a(x))\Big]. $$
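The two relations can be solved jointly by value iteration. Alternate the $V^-$ update, which maximizes over $A_x$, with the $V^+$ update, which is a linear averaging. The sketch below does this on a small hypothetical model (all numbers are illustrative assumptions, not from the talk) and also extracts a maximizing decision rule $d(x)$. The helper names `q_value` and `direct_value_iteration` are ours.

```python
# Hypothetical toy model (illustrative numbers only).
ALPHA = 0.1                              # discount rate alpha
LAM = {0: 1.0, 1: 2.0, 2: 0.5}           # decision rates lambda_x
ELL = {0: 0.0, 1: -1.0, 2: 2.0}          # running reward rates ell(x)
# ACTIONS[x][a] = list of (y, P_xay, r(x, a, y)); the keys of ACTIONS[x] form A_x
ACTIONS = {
    0: {"stay": [(0, 1.0, 0.0)], "go": [(1, 0.7, 1.0), (2, 0.3, -0.5)]},
    1: {"stay": [(1, 1.0, 0.0)], "go": [(2, 1.0, 0.5)]},
    2: {"reset": [(0, 0.5, 0.0), (2, 0.5, 0.2)]},
}


def q_value(x, a, v_plus):
    """sum_y P_xay (r(x, a, y) + V+(y)) for one action a in A_x."""
    return sum(p * (r + v_plus[y]) for y, p, r in ACTIONS[x][a])


def direct_value_iteration(tol=1e-12, max_iter=10_000):
    v_plus = {x: 0.0 for x in LAM}
    for _ in range(max_iter):
        # V-(x): best expected impulse reward plus post-jump value
        v_minus = {x: max(q_value(x, a, v_plus) for a in ACTIONS[x]) for x in LAM}
        # V+(x): running reward until the next decision instant, then V-(x)
        new_plus = {x: (ELL[x] + LAM[x] * v_minus[x]) / (ALPHA + LAM[x]) for x in LAM}
        gap = max(abs(new_plus[x] - v_plus[x]) for x in LAM)
        v_plus = new_plus
        if gap < tol:
            break
    v_minus = {x: max(q_value(x, a, v_plus) for a in ACTIONS[x]) for x in LAM}
    policy = {x: max(ACTIONS[x], key=lambda a: q_value(x, a, v_plus)) for x in LAM}
    return v_plus, v_minus, policy


v_plus, v_minus, policy = direct_value_iteration()
print(policy)
```

Since $\lambda_x/(\alpha+\lambda_x) < 1$ on a finite state space, the composed update is a sup-norm contraction, so the loop converges from any starting point.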

SLIDE 11

Basic functional equations

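Eliminating $V^+$ or $V^-$ leads to two forms of Bellman's equation:

Bellman equations

$$ V^+(x) = \frac{1}{\alpha+\lambda_x}\Big(\ell(x) + \lambda_x \max_{a\in A_x}\sum_y P_{xay}\big(r(x,a,y) + V^+(y)\big)\Big) $$

$$ V^-(x) = \max_{a\in A_x}\sum_y P_{xay}\Big(r(x,a,y) + \frac{1}{\alpha+\lambda_y}\big(\ell(y) + \lambda_y V^-(y)\big)\Big). $$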
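Each eliminated form is a fixed-point equation in a single function, so either one can be iterated on its own. The sketch below uses hypothetical toy data (illustrative numbers only). It solves both equations separately, then checks that the results are linked by $V^+(x) = (\ell(x) + \lambda_x V^-(x))/(\alpha+\lambda_x)$.

```python
# Hypothetical toy model (illustrative numbers only).
ALPHA = 0.1
LAM = {0: 1.0, 1: 2.0, 2: 0.5}           # lambda_x
ELL = {0: 0.0, 1: -1.0, 2: 2.0}          # ell(x)
# ACTIONS[x][a] = list of (y, P_xay, r(x, a, y))
ACTIONS = {
    0: {"stay": [(0, 1.0, 0.0)], "go": [(1, 0.7, 1.0), (2, 0.3, -0.5)]},
    1: {"stay": [(1, 1.0, 0.0)], "go": [(2, 1.0, 0.5)]},
    2: {"reset": [(0, 0.5, 0.0), (2, 0.5, 0.2)]},
}


def iterate_v_plus(n_iter=2000):
    """Fixed-point iteration on the first Bellman equation (V+ only)."""
    v = {x: 0.0 for x in LAM}
    for _ in range(n_iter):
        v = {
            x: (ELL[x] + LAM[x] * max(
                sum(p * (r + v[y]) for y, p, r in tr) for tr in ACTIONS[x].values()
            )) / (ALPHA + LAM[x])
            for x in LAM
        }
    return v


def iterate_v_minus(n_iter=2000):
    """Fixed-point iteration on the second Bellman equation (V- only)."""
    w = {x: 0.0 for x in LAM}
    for _ in range(n_iter):
        w = {
            x: max(
                sum(p * (r + (ELL[y] + LAM[y] * w[y]) / (ALPHA + LAM[y])) for y, p, r in tr)
                for tr in ACTIONS[x].values()
            )
            for x in LAM
        }
    return w


v_plus, v_minus = iterate_v_plus(), iterate_v_minus()
for x in LAM:
    # both columns should coincide: V+(x) = (ell(x) + lambda_x V-(x)) / (alpha + lambda_x)
    print(x, v_plus[x], (ELL[x] + LAM[x] * v_minus[x]) / (ALPHA + LAM[x]))
```

Both updates are contractions with modulus at most $\max_x \lambda_x/(\alpha+\lambda_x)$, so the choice between the two forms is mainly a matter of which observation point is convenient.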

SLIDE 12

Progress: 3. Uniformization

SLIDE 13

Uniformization à la carte

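For each state $x$, define $\nu_x \ge \lambda_x$ and introduce a new, uncontrollable transition point after $\tau \sim \text{Exp}(\nu_x)$. Extend the state space to $X \times \{r, u\}$, with $r$ = regular event and $u$ = uniformization event.

Table of rewards and transition probabilities ($*$: no control at a uniformization event):

$$
\begin{array}{ccccc}
x' & a & y' & r(x', a, y') & P_{x'ay'} \\ \hline
(x, r) & a & (y, r) & r(x, a, y) & \dfrac{\lambda_y}{\nu_y}\, P_{xay} \\
(x, r) & a & (y, u) & r(x, a, y) & \dfrac{\nu_y - \lambda_y}{\nu_y}\, P_{xay} \\
(x, u) & * & (x, r) & 0 & \dfrac{\lambda_x}{\nu_x} \\
(x, u) & * & (x, u) & 0 & \dfrac{\nu_x - \lambda_x}{\nu_x}
\end{array}
$$

Running reward: $\ell(x, e) = \ell(x)$; transition rate: $\lambda(x, e) = \nu_x$.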
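The table translates directly into code. The sketch below uses a hypothetical helper `uniformize`, written for illustration. It builds the extended model on $X \times \{r, u\}$ from a base model stored as `actions[x][a] = [(y, P_xay, r(x, a, y)), ...]`. The label `"*"` stands for the single, uncontrolled action at uniformization states. We assume, as in the table, that a uniformization event yields no reward. The toy data and the state-dependent choice of `NU` are illustrative only.

```python
# Hypothetical toy base model (illustrative numbers only).
LAM = {0: 1.0, 1: 2.0, 2: 0.5}           # lambda_x
ELL = {0: 0.0, 1: -1.0, 2: 2.0}          # ell(x)
# ACTIONS[x][a] = list of (y, P_xay, r(x, a, y))
ACTIONS = {
    0: {"stay": [(0, 1.0, 0.0)], "go": [(1, 0.7, 1.0), (2, 0.3, -0.5)]},
    1: {"stay": [(1, 1.0, 0.0)], "go": [(2, 1.0, 0.5)]},
    2: {"reset": [(0, 0.5, 0.0), (2, 0.5, 0.2)]},
}
NU = {0: 1.5, 1: 2.0, 2: 4.0}            # state-dependent nu_x >= lambda_x


def uniformize(actions, lam, ell, nu):
    """Extended model on X x {'r', 'u'}: returns (actions_u, rate_u, ell_u)."""
    actions_u, rate_u, ell_u = {}, {}, {}
    for x in actions:
        if nu[x] < lam[x]:
            raise ValueError(f"need nu_x >= lambda_x in state {x}")
        for e in ("r", "u"):
            rate_u[(x, e)] = nu[x]   # lambda(x, e) = nu_x
            ell_u[(x, e)] = ell[x]   # ell(x, e) = ell(x)
        # regular event: controlled jump x -> y, then the type of the next event is drawn
        actions_u[(x, "r")] = {
            a: [((y, "r"), p * lam[y] / nu[y], r) for y, p, r in trans]
            + [((y, "u"), p * (nu[y] - lam[y]) / nu[y], r) for y, p, r in trans]
            for a, trans in actions[x].items()
        }
        # uniformization event: no control ('*'), no reward, the state stays x
        actions_u[(x, "u")] = {
            "*": [((x, "r"), lam[x] / nu[x], 0.0),
                  ((x, "u"), (nu[x] - lam[x]) / nu[x], 0.0)]
        }
    return actions_u, rate_u, ell_u


ACTIONS_U, RATE_U, ELL_U = uniformize(ACTIONS, LAM, ELL, NU)
for state, acts in ACTIONS_U.items():
    for a, trans in acts.items():
        print(state, a, round(sum(p for _, p, _ in trans), 12))  # each row sums to 1
```

When $\nu_y = \lambda_y$ (state 1 here), the transitions to $(y, u)$ simply get probability 0.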

SLIDE 14

Relationships

Lemma. Let $V(\cdot)$ be the direct value function and $V_u(\cdot,\cdot)$ the uniformized value function. Then:

$$ V_u^-(x, r) = V^-(x), \qquad V_u^-(x, u) = V^+(x), $$

$$ V_u^+(x, r) = \frac{1}{\alpha+\nu_x}\big(\ell(x) + \nu_x V^-(x)\big), \qquad V_u^+(x, u) = \frac{1}{\alpha+\nu_x}\big(\ell(x) + \nu_x V^+(x)\big). $$
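The lemma can be checked numerically. Solve the direct equations for the base model, then solve the same equations for the uniformized model built from the table, and compare. In the sketch below, the model, the choice of $\nu_x$, and the helper names `solve` and `uniformize` are all hypothetical and chosen for illustration.

```python
# Hypothetical toy model (illustrative numbers only, not from the talk).
ALPHA = 0.1
LAM = {0: 1.0, 1: 2.0, 2: 0.5}           # lambda_x
ELL = {0: 0.0, 1: -1.0, 2: 2.0}          # ell(x)
# ACTIONS[x][a] = list of (y, P_xay, r(x, a, y))
ACTIONS = {
    0: {"stay": [(0, 1.0, 0.0)], "go": [(1, 0.7, 1.0), (2, 0.3, -0.5)]},
    1: {"stay": [(1, 1.0, 0.0)], "go": [(2, 1.0, 0.5)]},
    2: {"reset": [(0, 0.5, 0.0), (2, 0.5, 0.2)]},
}
NU = {0: 1.5, 1: 2.0, 2: 4.0}            # nu_x >= lambda_x


def solve(actions, rate, ell, n_iter=3000):
    """Value iteration on the direct equations; returns (V+, V-)."""
    def v_minus_of(v_plus):
        return {
            x: max(sum(p * (r + v_plus[y]) for y, p, r in tr) for tr in actions[x].values())
            for x in actions
        }

    v_plus = {x: 0.0 for x in actions}
    for _ in range(n_iter):
        v_minus = v_minus_of(v_plus)
        v_plus = {x: (ell[x] + rate[x] * v_minus[x]) / (ALPHA + rate[x]) for x in actions}
    return v_plus, v_minus_of(v_plus)


def uniformize(actions, lam, ell, nu):
    """Extended model on X x {'r', 'u'} built from the uniformization table."""
    acts, rate, ell_u = {}, {}, {}
    for x in actions:
        rate[(x, "r")] = rate[(x, "u")] = nu[x]
        ell_u[(x, "r")] = ell_u[(x, "u")] = ell[x]
        acts[(x, "r")] = {
            a: [((y, e), p * q, r)
                for y, p, r in tr
                for e, q in (("r", lam[y] / nu[y]), ("u", (nu[y] - lam[y]) / nu[y]))]
            for a, tr in actions[x].items()
        }
        acts[(x, "u")] = {"*": [((x, "r"), lam[x] / nu[x], 0.0),
                                ((x, "u"), (nu[x] - lam[x]) / nu[x], 0.0)]}
    return acts, rate, ell_u


v_plus, v_minus = solve(ACTIONS, LAM, ELL)
vu_plus, vu_minus = solve(*uniformize(ACTIONS, LAM, ELL, NU))
for x in LAM:
    # by the lemma, both differences should vanish (up to rounding)
    print(x, vu_minus[(x, "r")] - v_minus[x], vu_minus[(x, "u")] - v_plus[x])
```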

SLIDE 15

Interpretations

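No uniformization ($\nu_x = \lambda_x$):

$$ V_u^+(x, r) = \frac{1}{\alpha+\lambda_x}\big(\ell(x) + \lambda_x V^-(x)\big) = V^+(x), $$

$$ V_u^+(x, u) = \mathbb{E}\left[\int_0^{T_1} e^{-\alpha u}\,\ell(x)\,du + e^{-\alpha T_1} V^+(x)\right], \qquad T_1 \sim \text{Exp}(\lambda_x). $$

Hyper-frequent uniformization ($\nu_x \to \infty$):

$$ \lim_{\nu_x\to\infty} V_u^+(x, r) = V^-(x) = V_u^-(x, r), \qquad \lim_{\nu_x\to\infty} V_u^+(x, u) = V^+(x) = V_u^-(x, u). $$

No discounting ($\alpha \to 0$):

$$ V_u^+(x, r) \sim \frac{\ell(x)}{\nu_x} + V^-(x), \qquad V_u^+(x, u) \sim \frac{\ell(x)}{\nu_x} + V^+(x). $$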
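The three regimes can be checked from the lemma's closed forms. The sketch below fixes hypothetical values of $\ell(x)$, $\lambda_x$ and $V^+(x)$ for a single state and derives $V^-(x)$ from $V^+(x) = (\ell(x)+\lambda_x V^-(x))/(\alpha+\lambda_x)$. It holds these values fixed, which isolates how the closed forms depend on $\nu_x$ and $\alpha$. It also checks the $\nu_x = \lambda_x$ interpretation of $V_u^+(x,u)$ by Monte Carlo.

```python
import math
import random

# Hypothetical numbers for a single state x (illustration only).
ELL_X, LAM_X, ALPHA = 2.0, 0.5, 0.1
V_PLUS = 10.0
V_MINUS = ((ALPHA + LAM_X) * V_PLUS - ELL_X) / LAM_X  # consistent with the direct equation


def vu_plus_r(nu, alpha=ALPHA):
    """Lemma: V_u^+(x, r) = (ell(x) + nu_x V^-(x)) / (alpha + nu_x)."""
    return (ELL_X + nu * V_MINUS) / (alpha + nu)


def vu_plus_u(nu, alpha=ALPHA):
    """Lemma: V_u^+(x, u) = (ell(x) + nu_x V^+(x)) / (alpha + nu_x)."""
    return (ELL_X + nu * V_PLUS) / (alpha + nu)


def mc_vu_plus_u_no_unif(n=100_000, seed=0):
    """Monte Carlo for E[int_0^T1 e^{-alpha u} ell du + e^{-alpha T1} V+], T1 ~ Exp(lambda_x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        disc = math.exp(-ALPHA * rng.expovariate(LAM_X))
        total += ELL_X * (1.0 - disc) / ALPHA + disc * V_PLUS
    return total / n


print(vu_plus_r(LAM_X), V_PLUS)                           # nu_x = lambda_x
print(vu_plus_u(LAM_X), mc_vu_plus_u_no_unif())           # nu_x = lambda_x, simulated
print(vu_plus_r(1e9), V_MINUS, vu_plus_u(1e9), V_PLUS)    # nu_x -> infinity
print(vu_plus_r(3.0, alpha=1e-9), ELL_X / 3.0 + V_MINUS)  # alpha -> 0
```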

SLIDE 16

Bellman equations for the uniformized process

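Lemma. The basic value functions $V^+$ and $V^-$ satisfy:

$$ V^+(x) = \frac{1}{\alpha+\nu_x}\Big(\ell(x) + (\nu_x-\lambda_x)V^+(x) + \lambda_x \max_{a\in A_x}\sum_y P_{xay}\big(r(x,a,y) + V^+(y)\big)\Big) $$

$$ V^-(x) = \frac{1}{\alpha+\nu_x}\Big((\nu_x-\lambda_x)V^-(x) + (\alpha+\lambda_x)\max_{a\in A_x}\sum_y P_{xay}\Big(r(x,a,y) + \frac{1}{\alpha+\lambda_y}\big(\ell(y) + \lambda_y V^-(y)\big)\Big)\Big) $$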
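One practical consequence: iterating the uniformized equation for $V^+$ gives the same fixed point, but its contraction modulus is $\max_x \nu_x/(\alpha+\nu_x)$ instead of $\max_x \lambda_x/(\alpha+\lambda_x)$. Plain value iteration therefore typically needs more steps. The sketch below uses hypothetical toy data and a constant $\nu_x = 4$. It counts the iterations, started from $V \equiv 0$, until the sup-norm error falls below a tolerance.

```python
# Hypothetical toy model (illustrative numbers only).
ALPHA = 0.1
LAM = {0: 1.0, 1: 2.0, 2: 0.5}           # lambda_x
ELL = {0: 0.0, 1: -1.0, 2: 2.0}          # ell(x)
# ACTIONS[x][a] = list of (y, P_xay, r(x, a, y))
ACTIONS = {
    0: {"stay": [(0, 1.0, 0.0)], "go": [(1, 0.7, 1.0), (2, 0.3, -0.5)]},
    1: {"stay": [(1, 1.0, 0.0)], "go": [(2, 1.0, 0.5)]},
    2: {"reset": [(0, 0.5, 0.0), (2, 0.5, 0.2)]},
}
NU = {0: 4.0, 1: 4.0, 2: 4.0}            # uniformization rates nu_x >= lambda_x


def best_jump_value(x, v):
    """max_a sum_y P_xay (r(x, a, y) + v(y))."""
    return max(sum(p * (r + v[y]) for y, p, r in tr) for tr in ACTIONS[x].values())


def direct_step(v):
    return {x: (ELL[x] + LAM[x] * best_jump_value(x, v)) / (ALPHA + LAM[x]) for x in LAM}


def uniformized_step(v):
    # first equation of the lemma: fictitious self-transitions at rate nu_x - lambda_x
    return {
        x: (ELL[x] + (NU[x] - LAM[x]) * v[x] + LAM[x] * best_jump_value(x, v)) / (ALPHA + NU[x])
        for x in LAM
    }


def iterations_to_converge(step, v_ref, tol=1e-6, max_iter=100_000):
    v = {x: 0.0 for x in LAM}
    for n in range(1, max_iter + 1):
        v = step(v)
        if max(abs(v[x] - v_ref[x]) for x in LAM) < tol:
            return n, v
    raise RuntimeError("no convergence")


v_ref = {x: 0.0 for x in LAM}
for _ in range(5000):  # reference solution of the direct equation
    v_ref = direct_step(v_ref)

n_direct, v_direct = iterations_to_converge(direct_step, v_ref)
n_unif, v_unif = iterations_to_converge(uniformized_step, v_ref)
print(n_direct, n_unif)
```

The uniformized update is the convex combination $\theta_x T(V) + (1-\theta_x)V$ of the direct update $T$ and the identity, with $\theta_x = (\alpha+\lambda_x)/(\alpha+\nu_x)$. It is an under-relaxed version of the direct iteration. Its value lies in the structural information it exposes, not in computational speed.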

SLIDE 17

Progress: 4. Event model

SLIDE 18

The event model

If transitions have several “types”, the strictly Markovian model requires extending the state space: $x = (s, e)$, with $s$ the actual system state and $e$ the event type. We get:

$$ V^+(s,e) = \frac{1}{\alpha+\lambda_{s,e}}\Big(\ell(s,e) + \lambda_{s,e}\max_{a\in A_{s,e}}\sum_{s',e'} P((s,e);a;(s',e'))\big(r((s,e),a,(s',e')) + V^+(s',e')\big)\Big) $$

$$ V^-(s,e) = \max_{a\in A_{s,e}}\sum_{s',e'} P((s,e);a;(s',e'))\Big(r((s,e),a,(s',e')) + \frac{\ell(s',e') + \lambda_{s',e'}V^-(s',e')}{\alpha+\lambda_{s',e'}}\Big) $$

SLIDE 19

The event model

Question: under which conditions is it possible to “get rid” of the event part of the state representation? That is, is it possible that $V^+(s, e) = V^+(s)$ for all $e$?
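One easy sufficient condition: suppose $\ell$, $\lambda$, the feasible actions, and the rewards and transition law out of $(s, e)$ are all independent of $e$. Then the equations are symmetric in $e$, and their unique solution satisfies $V^\pm(s,e) = V^\pm(s)$. The sketch below checks this on a hypothetical model in which every number is made up. It also shows that letting only the rate $\lambda_{s,e}$ depend on $e$ already breaks the property for $V^+$. $V^-$, which only looks at the transition out of $(s,e)$, stays event-free in that case.

```python
# Hypothetical event model on pairs x = (s, e); all data are illustrative assumptions.
ALPHA = 0.1
S, E = (0, 1, 2), ("a", "b")


def make_event_model(rate_depends_on_event):
    actions, lam, ell = {}, {}, {}
    for s in S:
        for e in E:
            x = (s, e)
            bump = 0.5 if (rate_depends_on_event and e == "b") else 0.0
            lam[x] = 1.0 + s + bump          # lambda_{s,e}
            ell[x] = float(s) - 1.0          # ell(s, e) depends on s only
            actions[x] = {}
            for a, step in (("up", 1), ("down", -1)):
                s2 = min(max(s + step, 0), len(S) - 1)
                # next event type e' drawn uniformly; reward depends on the move only
                actions[x][a] = [((s2, e2), 1.0 / len(E), 0.3 * step) for e2 in E]
    return actions, lam, ell


def solve(actions, lam, ell, n_iter=3000):
    """Value iteration on the event-model equations; returns (V+, V-)."""
    def v_minus_of(v_plus):
        return {x: max(sum(p * (r + v_plus[y]) for y, p, r in tr) for tr in actions[x].values())
                for x in actions}

    v_plus = {x: 0.0 for x in actions}
    for _ in range(n_iter):
        vm = v_minus_of(v_plus)
        v_plus = {x: (ell[x] + lam[x] * vm[x]) / (ALPHA + lam[x]) for x in actions}
    return v_plus, v_minus_of(v_plus)


def event_spread(v):
    """max_s |v(s, a) - v(s, b)|: zero iff v does not depend on e."""
    return max(abs(v[(s, "a")] - v[(s, "b")]) for s in S)


for flag in (False, True):
    vp, vm = solve(*make_event_model(flag))
    print("rate depends on e:", flag, "| spread V+:", event_spread(vp), "| spread V-:", event_spread(vm))
```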

SLIDE 20

Progress: 5. Application

SLIDE 21

Application: arrival control in the M/M/1

Let $\lambda$ and $\mu$ denote the arrival and service rates. A reward $R$ is earned for each accepted customer, and a (negative) running reward $\ell(s)$ is incurred for keeping $s$ customers in the queue. Markovian state: $x \in \mathbb{N} \times \{a, d\}$ (numbered 1/0 in Puterman). After uniformization at the uniform rate $\lambda + \mu$, the equations for the value function are:

$$ V_P(s, d) = \frac{1}{\alpha+\lambda+\mu}\Big(\ell(s) + \mu V_P((s-1)^+, d) + \lambda V_P(s, a)\Big) $$

$$ V_P(s, a) = \max\left\{ R + \frac{1}{\alpha+\lambda+\mu}\Big[\ell(s+1) + \mu V_P(s, d) + \lambda V_P(s+1, a)\Big],\ \frac{1}{\alpha+\lambda+\mu}\Big[\ell(s) + \mu V_P((s-1)^+, d) + \lambda V_P(s, a)\Big] \right\}. $$
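A minimal value-iteration sketch of these equations follows. Its assumptions are ours, not the talk's: a linear holding cost $\ell(s) = -c\,s$, illustrative values of $\lambda$, $\mu$, $\alpha$, $R$ and $c$, and truncation of the queue at a hypothetical capacity $N$ where admissions are refused. The function `solve_admission` is a hypothetical name. It reports the queue lengths at which accepting is strictly better than rejecting. For this kind of model one expects a threshold (control-limit) policy, which the sketch lets you observe.

```python
def solve_admission(lam=1.0, mu=1.2, alpha=0.05, R=10.0, c=1.0, N=60, n_iter=3000):
    """Value iteration for V_P(s, d) and V_P(s, a), s = 0..N (hypothetical parameters)."""
    k = 1.0 / (alpha + lam + mu)

    def ell(s):
        return -c * s  # assumed linear holding cost

    vd = [0.0] * (N + 1)  # V_P(s, d)
    va = [0.0] * (N + 1)  # V_P(s, a)

    def reject(s, vd, va):
        # second branch of the max; it coincides with the V_P(s, d) update
        return k * (ell(s) + mu * vd[max(s - 1, 0)] + lam * va[s])

    def accept(s, vd, va):
        return R + k * (ell(s + 1) + mu * vd[s] + lam * va[s + 1])

    for _ in range(n_iter):
        new_d = [reject(s, vd, va) for s in range(N + 1)]
        new_a = [max(accept(s, vd, va), reject(s, vd, va)) for s in range(N)]
        new_a.append(reject(N, vd, va))  # truncation: no admission at s = N
        vd, va = new_d, new_a
    accepted = [s for s in range(N) if accept(s, vd, va) > reject(s, vd, va)]
    return vd, va, accepted


vd, va, accepted = solve_admission()
print("accept when s in", accepted)
```

The truncation level $N$ only needs to be well above the observed threshold for the result to be insensitive to it.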

SLIDE 22

Where is the observation?

But Puterman (p. 568) says: “The system is in state ⟨s, 0⟩ if there are s jobs in the system and no arrivals. We observe this state when a transition corresponds to a departure. [...] The state ⟨s, 1⟩ occurs when there are s jobs in the system and a new job arrives.”

In our notation, this would correspond to setting:

$$ V_P(s, d) = V_u^+((s+1, d), r), \qquad V_P(s, a) = V_u^-((s, a), r). $$

Work in progress...

SLIDE 23

Lunch time!