Critical Level Policies in Lost Sales Inventory Systems with Different Demand Classes

PowerPoint presentation (transcript)


  1. Critical Level Policies in Lost Sales Inventory Systems with Different Demand Classes. Aleksander Wieczorek (1,4), Emmanuel Hyon (2,3), Ana Bušić (1). (1) INRIA/ENS, Paris, France; (2) Université Paris Ouest Nanterre, Nanterre, France; (3) LIP6, UPMC, Paris, France; (4) Institute of Computing Science, Poznan University of Technology, Poznan, Poland. EPEW, Borrowdale, UK, October 13, 2011.

  2. Table of Contents. 1. Markov Decision Processes (Definition, Optimal control). 2. Model description (Admission control, Policies, Results). 3. Extensions.

  3. Model presentation. [Figure: customers arrive at rate λ and belong to one of J classes, with probabilities p1 (cost c1), p2 (cost c2), ..., pJ (cost cJ) and increasing costs; they are served from a stock of capacity S; replenishment proceeds through N phases with rates μ1, μ2, ..., μN.]

  4. Plan. 1. Markov Decision Processes (Definition, Optimal control). 2. Model description (Admission control, Policies, Results). 3. Extensions.

  5. Markov Decision Process: formalism and notation [3]. An MDP is a collection of objects $(\mathcal{X}, \mathcal{A}, p(y \mid x, a), c(x, a))$ where:
  - $\mathcal{X}$ is the state space, $\mathcal{X} = \{1, \ldots, S\} \times \{1, \ldots, N\} \cup \{(0, 1)\}$; for $(x, k) \in \mathcal{X}$, $x$ is the replenishment level and $k$ the phase;
  - $\mathcal{A} = \{0, 1\}$ is the set of actions, with 1 for acceptance and 0 for rejection;
  - $p(y \mid x, a)$ is the probability of moving to state $y$ from state $x$ when action $a$ is triggered;
  - $c(x, a)$ is the instantaneous cost in state $x$ when action $a$ is triggered.
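  As a concrete illustration of this state space, here is a minimal sketch in Python; the values of S and N are assumptions chosen for the example, not taken from the talk.

```python
# Illustrative encoding of the state and action spaces. S (stock capacity)
# and N (number of replenishment phases) are model parameters; the values
# below are assumptions for the sake of the example.
S, N = 5, 3
states = [(x, k) for x in range(1, S + 1) for k in range(1, N + 1)] + [(0, 1)]
actions = (0, 1)  # 1: accept the arriving customer, 0: reject it
```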

  6. Plan. 1. Markov Decision Processes (Definition, Optimal control). 2. Model description (Admission control, Policies, Results). 3. Extensions.

  7. Optimal control problem.
  Policy: a policy π is a sequence of decision rules, each mapping the information history (past states and actions) to the action set $\mathcal{A}$.
  Markov deterministic policy: a Markov deterministic policy is of the form $(a(\cdot), a(\cdot), \ldots)$, where $a(\cdot)$ is a single deterministic decision rule mapping the current state to a decision (hence, in our case, $a(\cdot)$ is a function from $\mathcal{X}$ to $\mathcal{A}$).

  8. Optimal control problem: optimality criteria.
  Minimal long-run average cost:
  $$\bar{v}^* = \min_{\pi} \lim_{n \to \infty} \frac{1}{n} \, \mathbb{E}_y^{\pi} \left[ \sum_{\ell=0}^{n-1} C(y_\ell, a_\ell) \right].$$
  Policies π* optimising a given optimality criterion are called optimal policies (with respect to that criterion). Goal: characterise the optimal policy π* that attains $\bar{v}^*$.

  9. Optimal control problem: optimality criteria.
  Minimal (expected) n-stage total cost:
  $$V_n(y) = \min_{\pi(n)} \mathbb{E}_y^{\pi(n)} \left[ \sum_{\ell=0}^{n-1} C(y_\ell, a_\ell) \right], \quad y \in \mathcal{X},\ y_0 = y.$$
  Convergence results [2], [3, Chapter 8]: the minimal n-stage total cost value function $V_n$ does not converge as n tends to infinity, but the difference $V_{n+1}(y) - V_n(y)$ converges to the minimal long-run average cost $\bar{v}^*$.
  Relation between the optimality criteria [2], [3, Chapter 8]: the optimal n-stage policy (minimising $V_n$) tends to the optimal average-cost policy π* (attaining $\bar{v}^*$) as n tends to infinity.

  10. Cost value function.
  Bellman equation: $V_{n+1} = T V_n$, where $T$ is the dynamic programming operator:
  $$(Tf)(y) = \min_a (\hat{T} f)(y, a) = \min_a \left[ C(y, a) + \sum_{y' \in \mathcal{X}} P(y' \mid y, a) \, f(y') \right].$$
  Decomposition of $T$: the dynamic programming equation is
  $$V_n(x, k) = T_{\mathrm{unif}} \left( \sum_{i=1}^{J} p_i \, T_{CA(i)}(V_{n-1}), \; T_D(V_{n-1}) \right), \qquad (1)$$
  where $V_0(x, k) \equiv 0$ and $T_{\mathrm{unif}}$, $T_{CA(i)}$ and $T_D$ are the different event operators.
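  The operator $T$ can be sketched generically in Python as follows; this is a hypothetical helper, assuming the cost table C and transition kernel P are supplied as dictionaries keyed by (state, action) and (next_state, state, action).

```python
def bellman(f, C, P, states, actions):
    """One application of the dynamic programming operator T:
    (Tf)(y) = min_a [ C(y, a) + sum_{y'} P(y' | y, a) f(y') ]."""
    return {y: min(C[(y, a)] + sum(P[(yp, y, a)] * f[yp] for yp in states)
                   for a in actions)
            for y in states}
```

  In practice the decomposition (1) avoids building P explicitly; the event operators are implemented directly in the sketch after slide 13.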

  11. Plan. 1. Markov Decision Processes (Definition, Optimal control). 2. Model description (Admission control, Policies, Results). 3. Extensions.

  12. Description of operators.
  Controlled arrival operator of a customer of class i, $T_{CA(i)}$:
  $$T_{CA(i)} f(x, k) = \begin{cases} \min\{ f(x+1, k), \, f(x, k) + c_i \} & \text{if } x < S, \\ f(x, k) + c_i & \text{if } x = S. \end{cases}$$

  13. Description of operators.
  Let $\mu'_k = \mu_k / \alpha$. Departure operator, $T_D$:
  $$T_D f(x, k) = \mu'_k \begin{cases} f(x, k+1) & \text{if } k < N \text{ and } x > 0, \\ f((x-1)^+, 1) & \text{if } k = N \text{ or } x = 0, \end{cases} \; + \; (1 - \mu'_k) \, f(x, k).$$
  Uniformization operator, $T_{\mathrm{unif}}$:
  $$T_{\mathrm{unif}}(f(x, k), g(x, k)) = \frac{\lambda}{\lambda + \alpha} f(x, k) + \frac{\alpha}{\lambda + \alpha} g(x, k).$$
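  Putting the three event operators together, here is a minimal runnable sketch of the value iteration (1) in Python. All parameter values (λ, α, the μ_k, the p_i and c_i) are illustrative assumptions, not the paper's data.

```python
# Model parameters: illustrative assumptions, not values from the talk.
S, N, J = 5, 3, 2                # stock capacity, phases, customer classes
lam, alpha = 1.0, 2.0            # arrival rate lambda, uniformization rate alpha
mu = {1: 1.2, 2: 1.6, 3: 2.0}    # phase rates mu_k (each mu_k <= alpha)
p = [0.6, 0.4]                   # class probabilities p_1, ..., p_J
c = [1.0, 4.0]                   # increasing rejection costs c_1, ..., c_J

# State space {1,...,S} x {1,...,N} together with the empty state (0, 1).
states = [(x, k) for x in range(1, S + 1) for k in range(1, N + 1)] + [(0, 1)]

def T_CA(f, ci):
    """Controlled arrival of a class-i customer: accept (move to x+1) or
    reject (pay c_i); at x = S the customer is necessarily rejected."""
    return {(x, k): (min(f[(x + 1, k)], f[(x, k)] + ci) if x < S
                     else f[(x, k)] + ci)
            for (x, k) in f}

def T_D(f):
    """Departure/phase-completion event, occurring w.p. mu'_k = mu_k / alpha."""
    g = {}
    for (x, k) in f:
        nxt = f[(x, k + 1)] if (k < N and x > 0) else f[(max(x - 1, 0), 1)]
        g[(x, k)] = (mu[k] / alpha) * nxt + (1 - mu[k] / alpha) * f[(x, k)]
    return g

def T_unif(fa, fd):
    """Uniformization: weight the arrival and departure parts by their rates."""
    return {y: (lam * fa[y] + alpha * fd[y]) / (lam + alpha) for y in fa}

# Value iteration for equation (1), starting from V_0 = 0. As stated on
# slide 9, V_{n+1} - V_n approaches the minimal long-run average cost.
V = {y: 0.0 for y in states}
for n in range(500):
    per_class = [T_CA(V, ci) for ci in c]
    arrivals = {y: sum(pi * g[y] for pi, g in zip(p, per_class))
                for y in states}
    V_new = T_unif(arrivals, T_D(V))
    v_bar = V_new[(0, 1)] - V[(0, 1)]
    V = V_new
print(f"estimated minimal average cost: {v_bar:.4f}")
```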

  14. Plan. 1. Markov Decision Processes (Definition, Optimal control). 2. Model description (Admission control, Policies, Results). 3. Extensions.

  15. Critical level policies.
  Definition (Critical level policy): a policy is called a critical level policy if, for any fixed phase k and any customer class j, there exists a level $t_{k,j}$ in x, depending on k and j, such that in state (x, k):
  - for all $0 \le x < t_{k,j}$ it is optimal to accept any customer of class j,
  - for all $x \ge t_{k,j}$ it is optimal to reject any customer of class j.
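  Continuing the value-iteration sketch after slide 13, the critical level $t_{k,j}$ can be read off the computed value function numerically. This is a hypothetical helper; it relies on rejection being optimal exactly when $f(x, k) + c_j \le f(x+1, k)$ in the arrival operator.

```python
def critical_level(V, cj, k):
    """Smallest stock level x at which rejecting a class-j customer
    (cost cj) is optimal in phase k; returns S if acceptance is optimal
    at every level below S. Reuses V and S from the earlier sketch."""
    for x in range(0 if k == 1 else 1, S):
        if V[(x, k)] + cj <= V[(x + 1, k)]:  # rejection no more expensive
            return x
    return S
```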

  16. Structural properties of policies.
  Assume a critical level policy and consider the decision for a fixed customer class j.
  Definition (Switching curve): for every k, we define a level $t(k) = t_{k,j}$ such that, in state (x, k), decision 1 is taken if and only if $x < t(k)$, and decision 0 otherwise. The mapping $k \mapsto t(k)$ is called a switching curve.
  Definition (Monotone switching curve): a decision rule is of the monotone switching curve type if the mapping $k \mapsto t(k)$ is monotone.
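  Under the same illustrative assumptions, the switching curve of one class and its monotonicity can be checked directly, reusing V, c, N and critical_level from the previous sketches.

```python
j = 0  # fixed customer class (0-based index into p and c)
t = [critical_level(V, c[j], k) for k in range(1, N + 1)]  # k -> t(k)
increasing = all(a <= b for a, b in zip(t, t[1:]))
decreasing = all(a >= b for a, b in zip(t, t[1:]))
print("switching curve t(k) =", t, "| monotone:", increasing or decreasing)
```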

  17. Example: critical levels and switching curves.
  [Figure: acceptance points for different customer classes, plotted over x (number of customers in queue, 0 to 10) and k (phase, 1 to 5). Blue circle: all classes accepted; green triangle: classes 2 and 3 accepted; pink square: only class 3 accepted; red asterisk: every class rejected.]

  18. Properties of value functions.
  Definition (Convexity): f is convex in x (denoted Convex(x)) if for all $y = (x, k)$: $2 f(x+1, k) \le f(x, k) + f(x+2, k)$.
  Definition (Submodularity): f is submodular in x and k (denoted Sub(x, k)) if for all $y = (x, k)$: $f(x+1, k+1) + f(x, k) \le f(x+1, k) + f(x, k+1)$.
  Theorem (Th. 8.1 [2]): let $a(y)$ be the optimal decision rule:
  i) if $f \in \mathrm{Convex}(x)$, then $a(y)$ is decreasing in x;
  ii) if $f \in \mathrm{Sub}(x, k)$, then $a(y)$ is increasing in k.
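  These two properties can be verified numerically for the value function computed in the earlier sketch. This is a sanity check over the finite state space, not a proof; EPS is a float-comparison tolerance introduced here for the example.

```python
EPS = 1e-9  # tolerance for floating-point comparisons

def convex_in_x(f):
    """Convex(x): 2 f(x+1, k) <= f(x, k) + f(x+2, k) wherever defined."""
    return all(2 * f[(x + 1, k)] <= f[(x, k)] + f[(x + 2, k)] + EPS
               for (x, k) in f if x + 2 <= S and (x >= 1 or k == 1))

def submodular(f):
    """Sub(x, k): f(x+1, k+1) + f(x, k) <= f(x+1, k) + f(x, k+1)."""
    return all(f[(x + 1, k + 1)] + f[(x, k)]
               <= f[(x + 1, k)] + f[(x, k + 1)] + EPS
               for (x, k) in f if 1 <= x < S and k < N)

print("Convex(x):", convex_in_x(V), "| Sub(x,k):", submodular(V))
```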


  20. Plan. 1. Markov Decision Processes (Definition, Optimal control). 2. Model description (Admission control, Policies, Results). 3. Extensions.
