The simplex method is strongly polynomial for deterministic Markov decision processes
Ian Post Yinyu Ye Fields Institute November 29, 2013
Post, Ye Simplex on MDPs Fields, Nov 29, 2013 1 / 18
◮ Close to being strongly polynomial [Ye05] and possess a lot of structure
◮ ...but also appear hard for powerful algorithms [Fea10] [FHZ11]
◮ A number of open questions including their performance on special cases
◮ Important for developing new algorithms with better performance
◮ Long conjectured to be strongly polynomial, but only exponential bounds were known
◮ Recently shown to be exponential [Fea10]
◮ ε-approximation to the optimum [Bel57]
◮ True optimum [Ye11] [HMZ11]
◮ This defines a Markov chain
◮ Key property: increasing the value of one state only increases values of
◮ Flux through an action in π is always between 1 and $\frac{n}{1-\gamma} = n \sum_{i=0}^{\infty} \gamma^i$
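The flux bound can be checked numerically on a small example. The sketch below is illustrative only and is not taken from the talk: it assumes the usual convention that one unit of flow enters every state at each step, so the flux through the action chosen in state s satisfies x[s] = 1 + γ · (sum of x[p] over states p whose chosen action leads to s). The function name `policy_flux`, the four-state policy, and γ = 0.9 are all hypothetical.

```python
def policy_flux(successor, gamma, iterations=2000):
    """Discounted flux through the action a deterministic policy picks in each state.

    successor[s] is the state reached by the chosen action in state s
    (an assumed encoding of a policy in a deterministic MDP).
    Solves x[s] = 1 + gamma * sum(x[p] for p with successor[p] == s)
    by fixed-point iteration, which converges because gamma < 1.
    """
    n = len(successor)
    x = [1.0] * n
    for _ in range(iterations):
        nxt = [1.0] * n                 # one unit of flow enters every state
        for s, t in enumerate(successor):
            nxt[t] += gamma * x[s]      # discounted flow pushed along s -> t
        x = nxt
    return x


gamma = 0.9
successor = [1, 2, 2, 2]   # hypothetical policy: 0 -> 1 -> 2, 3 -> 2, 2 loops on itself
flux = policy_flux(successor, gamma)
n = len(successor)
upper = n / (1 - gamma)

# Every flux value lies in [1, n / (1 - gamma)], and the total flux is n / (1 - gamma).
assert all(1.0 - 1e-9 <= f <= upper + 1e-9 for f in flux)
assert abs(sum(flux) - upper) < 1e-6
print(flux)
```

In this example the states with no incoming actions sit at the lower bound of 1, while the self-looping state collects most of the total flux n/(1−γ) = 40.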