
ARTIFICIAL INTELLIGENCE Planning under uncertainty: POMDPs



  1. Utrecht University, INFOB2KI 2019-2020, The Netherlands. ARTIFICIAL INTELLIGENCE: Planning under uncertainty: POMDPs. Lecturer: Silja Renooij. These slides are part of the INFOB2KI Course Notes, available from www.cs.uu.nl/docs/vakken/b2ki/schema.html

  2. Markov model types
     - Fully observable: Markov chain (prediction) / MDP, Markov decision process (planning)
     - Partially observable: hidden Markov model (prediction) / POMDP, partially observable Markov decision process (planning)
     Prediction models can be represented at the variable level by a (dynamic) Bayesian network. [Figure: DBN with state variables S1, S2, S3, ... and observation variables O1, O2, O3]

  3. Recap: Markov Decision Process
     An MDP is defined by:
     - A set of states s ∈ S; typically finite
     - A set of actions a ∈ A; typically finite
     - A transition function T: S × A × S → [0,1], with T(s, a, s') = P(S_{t+1} = s' | S_t = s and A_t = a)
     - A reward function R: S × A → ℝ (note: simpler than before)
     Stationary process: T and R are time independent. Note the representation at variable level!
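As an illustration (not part of the original slides), a finite, stationary MDP can be stored as two arrays; the class and variable names below are my own assumptions:

```python
from typing import NamedTuple
import numpy as np

class MDP(NamedTuple):
    """Finite, stationary MDP: T and R do not change over time."""
    T: np.ndarray  # shape (|S|, |A|, |S|): T[s, a, s'] = P(S_{t+1} = s' | S_t = s, A_t = a)
    R: np.ndarray  # shape (|S|, |A|): R[s, a]

# Tiny example: 2 states, 2 actions.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
mdp = MDP(T=T, R=R)
assert np.allclose(mdp.T.sum(axis=-1), 1.0)  # each T[s, a, :] is a probability distribution
```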

  4. Solving a Markov System
     Goal: find an optimal policy π* (recall: the best action to choose in each state s)
     - Compute, for each state s, the expected sum of future rewards when acting optimally:
       V*(s) = max_a Σ_{s'} T(s, a, s') [ R(s, a) + γ V*(s') ]
       (this can be done by e.g. Value Iteration (DP))
     - For π*, take the argmax, as before
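A minimal value-iteration sketch (my own illustration, not from the slides), using the array layout assumed above and an assumed discount factor gamma:

```python
import numpy as np

def value_iteration(T, R, gamma=0.95, eps=1e-6):
    """Compute V* and a greedy policy for a finite MDP.

    T: (|S|, |A|, |S|) transition probabilities, R: (|S|, |A|) rewards.
    """
    n_states, n_actions, _ = T.shape
    V = np.zeros(n_states)
    while True:
        # Q[s, a] = sum_{s'} T[s, a, s'] * (R[s, a] + gamma * V[s'])
        Q = R + gamma * (T @ V)
        V_new = Q.max(axis=1)          # V*(s) = max_a Q[s, a]
        if np.max(np.abs(V_new - V)) < eps:
            break
        V = V_new
    policy = Q.argmax(axis=1)          # pi*(s) = argmax_a Q[s, a]
    return V, policy

# Example with the toy MDP arrays from the previous sketch:
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
V_star, pi_star = value_iteration(T, R)
```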

  5. Planning under partial observability
     [Figure: agent-environment loop: the agent acts on the environment in pursuit of a goal and receives only imperfect observations back]
     A POMDP allows partial satisfaction of goals and tradeoffs among competing goals.

  6. POMDP
     [Figure: agent interacting with a world through actions a and observations o]
     - Goal is to maximize expected long-term reward from the initial state distribution
     - But: the state is not directly observed

  7. Definition of POMDP
     A Partially Observable Markov Decision Process (POMDP) is defined by:
     - All MDP variables/sets and functions
     - A set of observations o ∈ O; typically finite
     - An observation function Z: S × A × O → [0,1], with Z(s, a, o) = P(O_t = o | S_t = s and A_{t-1} = a)
     [Figure: two-layer dynamic model with a hidden states layer and an observations layer]
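Illustrative only (not from the slides): the POMDP adds an observation matrix to the MDP arrays sketched earlier; the names and shapes are my own assumptions.

```python
from typing import NamedTuple
import numpy as np

class POMDP(NamedTuple):
    T: np.ndarray  # (|S|, |A|, |S|): T[s, a, s'] = P(S_{t+1} = s' | S_t = s, A_t = a)
    R: np.ndarray  # (|S|, |A|): R[s, a]
    Z: np.ndarray  # (|S|, |A|, |O|): Z[s', a, o] = P(O_t = o | S_t = s', A_{t-1} = a)

# Tiny example: 2 states, 2 actions, 2 observations (a noisy state sensor).
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
Z = np.array([[[0.85, 0.15], [0.85, 0.15]],
              [[0.10, 0.90], [0.10, 0.90]]])
pomdp = POMDP(T=T, R=R, Z=Z)
assert np.allclose(pomdp.Z.sum(axis=-1), 1.0)  # each Z[s', a, :] is a probability distribution
```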

  8. Memory vs Markov
     The POMDP is non-Markovian from the viewpoint of the agent:
     - any of the past actions and observations may influence the agent's belief concerning the current state
     - if action choices are based only on the most recent observation, the policy becomes memoryless

  9. Two sources of POMDP complexity
     - Curse of dimensionality: size of the state space; shared by other planning problems
     - Curse of memory: size of the value function (number of vectors), or equivalently, size of the controller (memory); unique to POMDPs
     Complexity of each iteration of DP: O(|S|² |A| |Γ_{n-1}|^{|O|}), where |S| reflects the curse of dimensionality and |Γ_{n-1}| the curse of memory; each element of Γ is a vector of values for the possible states (the true state is unknown).

  10. Note
     The following slides contain several formulas; you don't have to understand these in detail if you grasp the general idea:
     - Rather than states, we use a probability distribution over states
     - The transition and observation functions of the POMDP contain all information necessary to update this distribution

  11. Solution: Belief state
     In a regular MDP we track and update the current state. Since in a POMDP the actual state is uncertain, we instead maintain a belief state b: S → [0,1] (note: this is a probability distribution!)
     The belief state contains all relevant information from the history: consider belief b; after subsequent action a and observation o this belief can be updated (using POMDP ingredients):
       b^{a,o}(s') = P(s' | a, o, b)
                   = P(o | a, b, s') P(s' | a, b) / P(o | a, b)                    (Bayes' rule)
                   = Z(s', a, o) Σ_{s ∈ S} T(s, a, s') b(s) / P(o | a, b)
     The denominator P(o | a, b) is also expressible in Z, T and b (see next slide).
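A small belief-update sketch (my own illustration, reusing the array layout assumed in the earlier sketches):

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """Return b^{a,o}: the belief after taking action a and observing o.

    b: (|S|,) current belief; T: (|S|, |A|, |S|); Z: (|S|, |A|, |O|).
    """
    pred = b @ T[:, a, :]        # P(s' | a, b) = sum_s T[s, a, s'] * b[s]
    unnorm = Z[:, a, o] * pred   # Z(s', a, o) * P(s' | a, b)
    p_o = unnorm.sum()           # P(o | a, b), the normalizing constant
    return unnorm / p_o

# Example with the toy POMDP arrays from the previous sketch:
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])
Z = np.array([[[0.85, 0.15], [0.85, 0.15]],
              [[0.10, 0.90], [0.10, 0.90]]])
b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, a=0, o=1, T=T, Z=Z)  # belief shifts toward states likely to emit o=1
```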

  12. A belief-state MDP
     A belief-state MDP is a continuous space MDP, completely specified with ingredients from the POMDP. It contains:
     - A set of states b ∈ B, where B is the space of distributions over S
     - A set of actions a ∈ A; same as in the original POMDP
     - A transition function T': B × A × B → [0,1] with
       T'(b, a, b') = P(b' | b, a) = Σ_{o ∈ O} P(b' | b, a, o) P(o | b, a)
       where P(b' | b, a, o) is zero or one (one exactly when b' = b^{a,o})
       and P(o | b, a) = Σ_{s' ∈ S} Z(s', a, o) Σ_{s ∈ S} T(s, a, s') b(s)
     - A reward function R': B × A → ℝ with R'(b, a) = Σ_{s ∈ S} b(s) R(s, a)
     where S, T, R and Z are defined by the corresponding POMDP
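Again illustrative only: the belief-MDP reward and the observation probability P(o | b, a), written against the array layout assumed earlier.

```python
import numpy as np

def belief_reward(b, a, R):
    """rho(b, a) = sum_s b(s) * R(s, a): expected immediate reward under belief b."""
    return float(b @ R[:, a])

def obs_prob(b, a, o, T, Z):
    """P(o | b, a) = sum_{s'} Z(s', a, o) * sum_s T(s, a, s') * b(s)."""
    pred = b @ T[:, a, :]        # P(s' | a, b)
    return float(Z[:, a, o] @ pred)

# obs_prob is exactly the normalizing constant used in the belief update above.
```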

  13. Belief state and Markov property
     The process of maintaining the belief state is Markovian!
     - For any belief state, the successor belief state depends only on the action and observation
     [Figure: belief points between P(s0) = 0 and P(s0) = 1, reached via actions a1, a2 and observations o1, o2]

  14. Solving the POMDP
     [Figure: controller loop: a state estimator (register) updates the current belief b to b' via P(b' | b, a, o) after action a and observation o, and a policy maps the belief b to the next action a]
     - Update the belief state after each action and observation
     - The policy maps the belief state to an action
     - The policy is found by solving the belief-state MDP
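A sketch of that loop (illustration only; the simulated environment and the placeholder policy are stand-ins for a policy obtained by actually solving the belief-state MDP):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(T, Z, b0, policy, steps=10):
    """Run the act-observe-update loop while tracking the belief state."""
    n_states, _, n_obs = Z.shape
    s = rng.choice(n_states, p=b0)            # hidden true state, never shown to the agent
    b = b0.copy()
    for _ in range(steps):
        a = policy(b)                         # policy maps belief -> action
        s = rng.choice(n_states, p=T[s, a])   # environment transitions to a new hidden state
        o = rng.choice(n_obs, p=Z[s, a])      # agent receives a noisy observation
        b = Z[:, a, o] * (b @ T[:, a, :])     # Bayes update of the belief (unnormalized)
        b /= b.sum()
    return b, s

# e.g. with the toy arrays from the earlier sketches and a trivial placeholder policy:
# b_final, s_true = simulate(T, Z, np.array([0.5, 0.5]), policy=lambda b: 0)
```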

  15. Solving a belief-state MDP
     For solving the continuous space MDP, use the value iteration algorithm... after some adaptations to cope with the continuous space:
     - We cannot find a state's new value by looping over all the possible (= infinitely many) next states
     - Representing the value function in tabular form is not possible; fortunately, POMDP restrictions cause the finite horizon value function to be piecewise linear and convex
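A piecewise linear and convex value function can be stored as a finite set of vectors (often called alpha-vectors, one per conditional plan). The sketch below is my own illustration of how such a representation is evaluated, not an algorithm from the slides:

```python
import numpy as np

# Each row is an alpha-vector: a linear function of the belief, tagged with an action.
alphas = np.array([[2.0, 0.0],
                   [0.5, 1.5],
                   [1.0, 1.0]])
actions = np.array([0, 1, 0])      # action associated with each vector (hypothetical values)

def value(b, alphas):
    """V(b) = max_i alpha_i . b  -- piecewise linear and convex in b."""
    return float(np.max(alphas @ b))

def greedy_action(b, alphas, actions):
    """Act according to the maximizing vector at belief b."""
    return int(actions[np.argmax(alphas @ b)])

b = np.array([0.3, 0.7])
print(value(b, alphas), greedy_action(b, alphas, actions))
```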

  16. Conclusions
     - MDPs have some efficient solution methods, but require fully observable states
     - POMDPs are used in more realistic settings, but require sophisticated solution methods

  17. Let’s learn!
     - We now know how to plan, if we have a fully specified (PO)MDP
     - But what if we don’t…?
