
Online Convex Optimization in Adversarial MDPs, Aviv Rosenberg



  1. Poster #150. Online Convex Optimization in Adversarial MDPs. Aviv Rosenberg, Yishay Mansour.
     Motivation:
     ▪ MDPs are very popular but don't consider time-changing environments
     ▪ BGP routing is a great motivating example
     An adversarial MDP is an MDP in which the losses might change arbitrarily.
     Model:
     ▪ Episodic MDP
     ▪ Transition function is fixed but unknown to the learner
     ▪ Sequence of loss functions is chosen by an adversary
     ▪ Success is measured by the regret, comparing to the best policy in hindsight
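
The regret named in the last bullet can be written out as follows. This is a sketch in assumed notation that does not appear on the poster: T is the number of episodes, H the episode length, P the fixed transition function, \pi_t the learner's policy in episode t, and \ell_t the adversary's loss function for that episode.

```latex
% Expected loss of policy \pi in episode t (assumed notation)
L_t(\pi) = \mathbb{E}\Big[ \sum_{h=1}^{H} \ell_t(s_h, a_h) \,\Big|\, P, \pi \Big]

% Regret against the best fixed policy in hindsight
R_T = \sum_{t=1}^{T} L_t(\pi_t) - \min_{\pi} \sum_{t=1}^{T} L_t(\pi)
```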

  2. Problem Reformulation:
     ▪ The learner picks policies or, equivalently, occupancy measures
     ▪ Picking occupancy measures makes this an instance of online convex optimization
     An occupancy measure is a probability distribution over the state-action pairs.
     Algorithm:
     ▪ Basic idea: run online mirror descent
     ▪ Problem: the unknown transition function means we don't know if an occupancy measure is legal
     ▪ Solution: maintain confidence sets that contain the MDP with high probability
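
The equivalence between policies and occupancy measures can be illustrated with a minimal sketch. It assumes a layered episodic MDP with known transitions (the poster's setting has unknown transitions, which is what the confidence sets address), and every name and number below is hypothetical. It computes the occupancy measure induced by a policy, evaluates the expected episode loss as an inner product with the loss (the linearity that makes the problem online convex optimization), and recovers the policy from the occupancy measure.

```python
def occupancy_measure(P, policy, start):
    """Return q with q[h][s][a] = Pr[s_h = s, a_h = a].

    P[h][s][a] is a list of next-state probabilities for step h,
    policy[h][s] is a list of action probabilities, and start is the
    initial state distribution. All of these are illustrative inputs.
    """
    q = []
    state_dist = list(start)
    horizon = len(policy)
    for h in range(horizon):
        q_h = [[state_dist[s] * p_a for p_a in policy[h][s]]
               for s in range(len(policy[h]))]
        q.append(q_h)
        if h == horizon - 1:
            break  # no transition after the last step of the episode
        # Push the state-action mass through the transition function.
        n_next = len(P[h][0][0])
        next_dist = [0.0] * n_next
        for s, row in enumerate(q_h):
            for a, mass in enumerate(row):
                for s2 in range(n_next):
                    next_dist[s2] += mass * P[h][s][a][s2]
        state_dist = next_dist
    return q


def linear_loss(q, loss):
    """Expected episode loss as the inner product of q and loss."""
    return sum(q[h][s][a] * loss[h][s][a]
               for h in range(len(q))
               for s in range(len(q[h]))
               for a in range(len(q[h][s])))


def policy_from_occupancy(q):
    """Recover pi_h(a | s) = q_h(s, a) / sum over a' of q_h(s, a')."""
    policy = []
    for q_h in q:
        layer = []
        for row in q_h:
            total = sum(row)
            # Unreachable states get a uniform policy (arbitrary choice).
            layer.append([x / total for x in row] if total > 0
                         else [1.0 / len(row)] * len(row))
        policy.append(layer)
    return policy


# Toy instance: 2 steps, 2 states, 2 actions (made-up numbers).
start = [1.0, 0.0]
P = [[[[0.5, 0.5], [0.0, 1.0]], [[1.0, 0.0], [0.5, 0.5]]]]
policy = [[[0.5, 0.5], [1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]]
loss = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 1.0], [1.0, 0.5]]]

q = occupancy_measure(P, policy, start)
expected_loss = linear_loss(q, loss)
recovered = policy_from_occupancy(q)
```

In this sketch each step's slice of q sums to one, and the recovered policy agrees with the original on every state that is reached with positive probability.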

  3. Challenges:
     ▪ Efficient implementation of the algorithm
     ▪ Regret analysis
     A performance criterion is a function that aggregates all the losses of a single episode. Examples involve risk-sensitivity and robustness.
     Contributions:
     ▪ Handling performance criteria that are convex with respect to the occupancy measures
     ▪ High confidence regret bound of $\tilde{O}(HS\sqrt{AT})$
     Previous state-of-the-art:
     • Based on Follow the Perturbed Leader
     • Regret bound of $\tilde{O}(HSA\sqrt{T})$ in expectation
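
As an illustration of a performance criterion that is convex in the occupancy measure, the sketch below uses a hypothetical robustness-style criterion: the worst-case expected loss over a small set of candidate loss vectors. This particular criterion and all the numbers are assumptions made for the example, not taken from the poster. A maximum of linear functions is convex, and the code checks the convexity inequality on one mixture of two occupancy measures.

```python
def inner(q, loss):
    """Inner product of a flattened occupancy measure and a loss vector."""
    return sum(q_i * l_i for q_i, l_i in zip(q, loss))


def robust_criterion(q, losses):
    """Hypothetical criterion: worst-case expected loss over `losses`."""
    return max(inner(q, loss) for loss in losses)


def mix(q1, q2, lam):
    """Convex combination lam * q1 + (1 - lam) * q2."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(q1, q2)]


# Two occupancy measures over four state-action pairs (made-up numbers).
q1 = [1.0, 0.0, 0.0, 0.0]
q2 = [0.0, 1.0, 0.0, 0.0]
losses = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]

lam = 0.5
f_mix = robust_criterion(mix(q1, q2, lam), losses)
f_avg = (lam * robust_criterion(q1, losses)
         + (1.0 - lam) * robust_criterion(q2, losses))
is_convex_here = f_mix <= f_avg
```

Here the mixture hedges between the two candidate losses, so its worst-case value is strictly below the average of the two endpoint values.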
