Critical Level Policies in Lost Sales Inventory Systems with Different Demand Classes (PowerPoint PPT presentation transcript)

SLIDE 1

Markov Decision Processes Model description Extensions

Critical Level Policies in Lost Sales Inventory Systems with Different Demand Classes

Aleksander Wieczorek 1,4, Ana Bušić 1, Emmanuel Hyon 2,3

1 INRIA/ENS, Paris, France
2 Université Paris Ouest Nanterre, Nanterre, France
3 LIP6, UPMC, Paris, France
4 Institute of Computing Science, Poznan University of Technology, Poznan, Poland

EPEW, Borrowdale, UK, October 13, 2011

  • A. Wieczorek, A. Bušić, E. Hyon

SLIDE 2

Table of Contents

1. Markov Decision Processes: Definition, Optimal control

2. Model description: Admission control, Policies, Results

3. Extensions

SLIDE 3

Model presentation

[Model diagram: a stock of capacity S, replenished through N phases with rates μ1, ..., μN; arrivals at rate λ from J classes of customers, class j occurring with probability pj and cost cj, costs increasing in j.]

SLIDE 4

Plan

1. Markov Decision Processes: Definition, Optimal control

2. Model description: Admission control, Policies, Results

3. Extensions

SLIDE 5

Markov Decision Process

Formalism and notation [3]
A collection of objects (X, A, p(y|x, a), c(x, a)) where:

  • X — state space, X = {1, . . . , S} × {1, . . . , N} ∪ {(0, 1)}; for (x, k) ∈ X, x is the replenishment level and k the phase,
  • A — set of actions, A = {0, 1}, with 1 for acceptance and 0 for rejection,
  • p(y|x, a) — probability of moving from state x to state y when action a is taken,
  • c(x, a) — instantaneous cost in state x when action a is taken.
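As a quick illustration (not from the slides), the state space can be enumerated directly; S and N below are assumed example values:

```python
# Enumerate the state space X = {1, ..., S} x {1, ..., N} u {(0, 1)}:
# x is the replenishment level, k the phase; the empty state is the
# single pair (0, 1). S and N are assumed example values.
S, N = 3, 2

states = [(0, 1)] + [(x, k) for x in range(1, S + 1) for k in range(1, N + 1)]

# |X| = S * N + 1
assert len(states) == S * N + 1
```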

SLIDE 6

Plan

1. Markov Decision Processes: Definition, Optimal control

2. Model description: Admission control, Policies, Results

3. Extensions

SLIDE 7

Optimal control problem

Policy
A policy π is a sequence of decision rules that map the information history (past states and actions) to the action set A.

Markov deterministic policy
A Markov deterministic policy has the form (a(·), a(·), . . .), where a(·) is a single deterministic decision rule mapping the current state to a decision (hence, in our case, a(·) is a function from X to A).

SLIDE 8

Optimal control problem — optimality criteria

Minimal long-run average cost

    v̄∗ = min_π lim_{n→∞} (1/n) E^π_y [ Σ_{ℓ=0}^{n−1} C(yℓ, aℓ) ]

Policies π∗ optimising some optimality criterion are called optimal policies (with respect to that criterion).
Goal: characterise an optimal policy π∗ that reaches v̄∗.

SLIDE 9

Optimal control problem — optimality criteria

Minimal (expected) n-stage total cost

    Vn(y) = min_{π(n)} E^{π(n)}_y [ Σ_{ℓ=0}^{n−1} C(yℓ, aℓ) ],   y ∈ X, y0 = y

Convergence results [2], [3, Chapter 8]
The minimal n-stage total cost value function Vn does not converge as n tends to infinity, but the difference Vn+1(y) − Vn(y) converges to the minimal long-run average cost v̄∗.

Relation between different optimality criteria [2], [3, Chapter 8]
The optimal n-stage policy (minimising Vn) tends to the optimal average-cost policy π∗ (minimising v̄∗) as n tends to infinity.

SLIDE 10

Cost value function

Bellman equation
Vn+1 = T Vn, where T is the dynamic programming operator:

    (Tf)(y) = min_a (T̂f)(y, a) = min_a { C(y, a) + Σ_{y′∈X} P(y′ | y, a) f(y′) }

Decomposition of T
The dynamic programming equation is:

    Vn(x, k) = Tunif( Σ_{i=1}^{J} pi TCA(i)(Vn−1), TD(Vn−1) ),    (1)

where V0(x, k) ≡ 0 and Tunif, TCA(i) and TD are the different event operators.
SLIDE 11

Plan

1. Markov Decision Processes: Definition, Optimal control

2. Model description: Admission control, Policies, Results

3. Extensions

SLIDE 12

Description of operators

Controlled arrival operator of a customer of class i, TCA(i)

    TCA(i) f(x, k) = min{ f(x + 1, k), f(x, k) + ci }   if x < S,
    TCA(i) f(x, k) = f(x, k) + ci                       if x = S.

SLIDE 13

Description of operators

Let μ′k = μk/α.

Departure operator, TD

    TD f(x, k) = μ′k f(x, k + 1) + (1 − μ′k) f(x, k)       if k < N and x > 0,
    TD f(x, k) = μ′k f((x − 1)+, 1) + (1 − μ′k) f(x, k)    if k = N or x = 0.

Uniformization operator, Tunif

    Tunif(f(x, k), g(x, k)) = λ/(λ + α) f(x, k) + α/(λ + α) g(x, k).
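To make the event operators concrete, here is a small value-iteration sketch (not from the talk) that iterates Vn = Tunif(Σi pi TCA(i)(Vn−1), TD(Vn−1)) from V0 ≡ 0 and reads the critical levels off the converged value differences. All numeric parameters (S, N, J, λ, μk, pi, ci) are assumed example values:

```python
# Value iteration V_n = T_unif( sum_i p_i T_CA(i) V_{n-1}, T_D V_{n-1} ),
# V_0 = 0, for the uniformized chain; parameters are illustrative only.
S, N, J = 4, 3, 2                # capacity, phases, customer classes
lam = 1.0                        # arrival rate λ
mu = [1.5, 2.0, 1.0]             # phase rates μ_1..μ_N
p = [0.6, 0.4]                   # class probabilities p_i
c = [1.0, 4.0]                   # rejection costs, c_1 <= c_2
alpha = max(mu)                  # uniformization constant α >= μ_k
mup = [m / alpha for m in mu]    # μ'_k = μ_k / α

states = [(0, 1)] + [(x, k) for x in range(1, S + 1) for k in range(1, N + 1)]

def T_CA(f, ci):
    # Controlled arrival: accept (move to x + 1) or reject (pay ci);
    # at x = S only rejection is possible.
    return {(x, k): min(f[(x + 1, k)], f[(x, k)] + ci) if x < S
                    else f[(x, k)] + ci
            for (x, k) in f}

def T_D(f):
    # Phase advance (k < N and x > 0) or service completion (k = N or x = 0).
    out = {}
    for (x, k) in f:
        nxt = f[(x, k + 1)] if (k < N and x > 0) else f[(max(x - 1, 0), 1)]
        out[(x, k)] = mup[k - 1] * nxt + (1 - mup[k - 1]) * f[(x, k)]
    return out

def T_unif(f, g):
    # Mix the arrival and departure events with weights λ and α.
    return {s: (lam * f[s] + alpha * g[s]) / (lam + alpha) for s in f}

V = {s: 0.0 for s in states}
for _ in range(500):
    arr = [T_CA(V, ci) for ci in c]
    mixed = {s: sum(pi * a[s] for pi, a in zip(p, arr)) for s in states}
    V = T_unif(mixed, T_D(V))

def levels(ci):
    # t[k]: smallest x at which a class with cost ci is rejected in phase k.
    t = {}
    for k in range(1, N + 1):
        xs = range(0 if k == 1 else 1, S)
        t[k] = next((x for x in xs if V[(x + 1, k)] > V[(x, k)] + ci), S)
    return t

print([levels(ci) for ci in c])
```

Because each Vn is convex in x (see the lemmas later in the talk), the acceptance set for each class and phase is an interval [0, t), which is exactly what `levels` extracts.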

SLIDE 14

Plan

1. Markov Decision Processes: Definition, Optimal control

2. Model description: Admission control, Policies, Results

3. Extensions

SLIDE 15

Critical level policies

Definition (Critical level policy)
A policy is called a critical level policy if for every fixed phase k and customer class j there exists a level tk,j in x, depending on k and j, such that in state (x, k):

  • for all 0 ≤ x < tk,j it is optimal to accept any customer of class j,
  • for all x ≥ tk,j it is optimal to reject any customer of class j.
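Whether a tabulated acceptance rule has this threshold structure is mechanical to check; a minimal sketch (not from the talk), where the acceptance table `rule` is hypothetical:

```python
# Check whether a tabulated acceptance rule is a critical level policy:
# for each (phase k, class j) there must be a level t with
# "accept iff x < t". The acceptance table below is hypothetical.
def critical_levels(accept):
    """accept[(k, j)]: list over x of booleans. Returns {(k, j): t_{k,j}}
    if every row is of threshold type, otherwise None."""
    levels = {}
    for key, row in accept.items():
        t = sum(row)  # candidate level: number of accepted stock values
        if row != [x < t for x in range(len(row))]:
            return None  # a "hole" in the acceptance set: not critical level
        levels[key] = t
    return levels

# Hypothetical acceptance table for phase k = 1 and classes j = 1, 2:
rule = {(1, 1): [True, True, False, False, False],
        (1, 2): [True, True, True, True, False]}
print(critical_levels(rule))  # {(1, 1): 2, (1, 2): 4}
```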
SLIDE 16

Structural properties of policies

Assume a critical level policy and consider the decision for a fixed customer class j.

Definition (Switching curve)
For every k, define a level t(k) = tk,j such that in state (x, k) decision 1 is taken if and only if x < t(k), and decision 0 otherwise. The mapping k → t(k) is called a switching curve.

Definition (Monotone switching curve)
A decision rule is of the monotone switching curve type if the mapping k → t(k) is monotone.

SLIDE 17

Example — critical levels, switching curve

[Plot of acceptance points: x − no. of customers in queue (1–10), k − phase (1–5).]

Figure: Acceptance points for different customer classes. Blue circle — all classes are accepted, green triangle — classes 2 and 3 are accepted, pink square — only class 3 is accepted, red asterisk — rejection of any class.

SLIDE 18

Properties of value functions

Definition (Convexity)
f is convex in x (denoted by Convex(x)) if for all y = (x, k):

    2 f(x + 1, k) ≤ f(x, k) + f(x + 2, k).

Definition (Submodularity)
f is submodular in x and k (denoted by Sub(x, k)) if for all y = (x, k):

    f(x + 1, k + 1) + f(x, k) ≤ f(x + 1, k) + f(x, k + 1).

Theorem (Th 8.1 [2])
Let a(y) be the optimal decision rule:
i) If f ∈ Convex(x), then a(y) is decreasing in x.
ii) If f ∈ Sub(x, k), then a(y) is increasing in k.
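Both properties are easy to check numerically on a tabulated function; a minimal sketch (not from the talk), using the hypothetical function f(x, k) = (x − k)², which satisfies both:

```python
# Numerically check Convex(x) and Sub(x, k) on a finite grid;
# f(x, k) = (x - k)**2 is a hypothetical example satisfying both,
# and X, K are assumed grid sizes.
X, K = 10, 5

def f(x, k):
    return (x - k) ** 2

def is_convex_in_x(f):
    # 2 f(x+1, k) <= f(x, k) + f(x+2, k) for every phase k
    return all(2 * f(x + 1, k) <= f(x, k) + f(x + 2, k)
               for k in range(1, K + 1) for x in range(X - 2))

def is_submodular(f):
    # f(x+1, k+1) + f(x, k) <= f(x+1, k) + f(x, k+1)
    return all(f(x + 1, k + 1) + f(x, k) <= f(x + 1, k) + f(x, k + 1)
               for k in range(1, K) for x in range(X - 1))

print(is_convex_in_x(f), is_submodular(f))  # True True
```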

SLIDE 20

Plan

1. Markov Decision Processes: Definition, Optimal control

2. Model description: Admission control, Policies, Results

3. Extensions

SLIDE 21

Properties of the value function of the model

Let Vn be the n-step total cost value function of the model:

    Vn(x, k) = Tcost( Tunif( Σ_{i=1}^{J} pi TCA(i)(Vn−1), TD(Vn−1) ) ).

Lemma
For all n ≥ 0, Vn is in Incr() ∩ AConvex(x) ∩ Convex(x).

Lemma
For all n ≥ 0, Vn is in Sub(x, k) ∩ BSub(x, k).

Here f ∈ AConvex(x) means that for all k ∈ {1, .., N}: f(0, 1) + f(2, k) ≥ 2 f(1, k), and f ∈ BSub(x, k) means that for all 0 < x < S: f(x, 1) + f(x, N) ≤ f(x − 1, 1) + f(x + 1, N).

The proofs check that every operator preserves each of these properties.

SLIDE 22

Main structural results

Theorem
The optimal policy is a critical level policy.

Theorem
For any critical level policy, if the rejection costs are nondecreasing (c1 ≤ · · · ≤ cJ), then the levels tk,j are nondecreasing in the customer class j, i.e. tk,j ≤ tk,j+1.

Proofs: convexity (+ convergence).

[Plot of acceptance points: x − no. of customers in queue (1–10), k − phase (1–5).]

SLIDE 23

Main structural results

Theorem
The optimal policy defines an increasing switching curve.

Proof: submodularity (+ convergence).

[Plot of acceptance points: x − no. of customers in queue (1–10), k − phase (1–5).]

SLIDE 24

Hyperexponential model

[Plot of acceptance points: x − no. of customers in queue (1–10), k − phase (1–5).]

Figure: Acceptance points for different customer classes. Blue circle — all classes are accepted, green triangle — classes 2 and 3 are accepted, pink square — only class 3 is accepted, red asterisk — rejection of any class.

SLIDE 25

Extensions

Holding costs
The addition of holding costs breaks the similarities between queueing models and inventory systems.

Holding cost operator, Tcost

    Tcost f(x, k) = x/(λ + α) + f(x, k)

Universality of the approach
The same reasoning applies to queueing models with holding costs and yields the same structural properties of optimal policies.
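Adding holding costs only wraps the iteration in one more operator; a minimal sketch (not from the talk), with λ, α and the tabulated values assumed for illustration:

```python
# Holding cost operator T_cost f(x, k) = x/(λ + α) + f(x, k): each state
# pays a holding cost linear in the level x per uniformized step.
# Parameter and table values are assumed examples.
lam, alpha = 1.0, 2.0

def T_cost(f):
    return {(x, k): x / (lam + alpha) + v for (x, k), v in f.items()}

f = {(0, 1): 0.0, (1, 1): 0.5, (2, 1): 1.25}
print(T_cost(f))
```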

SLIDE 26

References

[1] A.Y. Ha. Inventory rationing in a make-to-stock production system with several demand classes and lost sales. Management Science, 43(8):1093–1103, 1997.

[2] G. Koole. Monotonicity in Markov reward and decision chains: Theory and applications. Foundations and Trends in Stochastic Systems, 1(1), 2006.

[3] M. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, 2005.
