SLIDE 1

Making Complex Decisions

Paolo Turrini
Department of Computing, Imperial College London

Introduction to Artificial Intelligence, 2nd Part

SLIDE 2–7

AlphaGo beats World Go Champion

Welcome to scientific journalism! It's the number of possible positions that makes the fundamental difference, together with the branching factor.

SLIDE 8

Outline

Time
Patience
Risk

SLIDE 9

The main reference

Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, Chapter 17.

SLIDE 10–13

The World

Begin at the start state. The game ends when we reach either goal state, +1 or −1. Collision results in no movement.

SLIDE 14–18

The Agent

The agent goes:

towards the intended direction with probability 0.8
to the left of the intended direction with probability 0.1
to the right of the intended direction with probability 0.1
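
This motion model is easy to write down. Below is a minimal Python sketch, assuming the standard 4×3 grid world of Russell and Norvig (wall at (2,2), terminals at (4,3) and (4,2)); the coordinate convention and helper names are illustrative, not from the slides.

```python
# Minimal sketch of the motion model, assuming the 4x3 grid of Russell &
# Norvig, Ch. 17: wall at (2,2), terminal squares at (4,3) and (4,2).
WALL = (2, 2)
WIDTH, HEIGHT = 4, 3

MOVES = {'Up': (0, 1), 'Down': (0, -1), 'Left': (-1, 0), 'Right': (1, 0)}
LEFT_OF = {'Up': 'Left', 'Left': 'Down', 'Down': 'Right', 'Right': 'Up'}
RIGHT_OF = {v: k for k, v in LEFT_OF.items()}

def step(state, direction):
    """Deterministic move; hitting the wall or the border means no movement."""
    x, y = state[0] + MOVES[direction][0], state[1] + MOVES[direction][1]
    if (x, y) == WALL or not (1 <= x <= WIDTH and 1 <= y <= HEIGHT):
        return state
    return (x, y)

def transitions(state, action):
    """P(s' | s, a): intended direction w.p. 0.8, left/right of it w.p. 0.1 each."""
    dist = {}
    for direction, p in [(action, 0.8), (LEFT_OF[action], 0.1), (RIGHT_OF[action], 0.1)]:
        s2 = step(state, direction)
        dist[s2] = dist.get(s2, 0.0) + p
    return dist

print(transitions((1, 1), 'Up'))  # {(1, 2): 0.8, (1, 1): 0.1, (2, 1): 0.1}
```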

SLIDE 19–22

The Agent and the World

The environment is fully observable:

the agent always knows what the world looks like: e.g., there is a wall, where the wall is, how to get to the wall...

the agent always knows his or her position during the game, even though some trajectories might not be reached with certainty.

SLIDE 23–24

The Agent and the World

The environment is Markovian: the probability of reaching a state only depends on the state the agent is in and the action she performs.

SLIDE 25–30

The Agent and the World

[x, y]_t is the fact that the agent is at square [x, y] at time t.

(x, y)_t is the fact that the agent intends to go to [x, y] at time t.

P([x, y]_t | (x, y)_{t−1}, [x−1, y]_{t−1})
  = P([x, y]_t | (x, y)_{t−1}, [x−1, y]_{t−1}, [x−5, y−6]_{t−20})
  = P([x, y]_t | (x, y)_{t−1}, [x−1, y]_{t−1}, (x−4, y−6)_{t−20})

SLIDE 31–32

The Agent and the World

These properties allow us to make plans. E.g., plans with deterministic agents: as we know, P([x, y]_t | (x, y)_{t−1}, [x−1, y]_{t−1}) = 1.

SLIDE 33–35

Let's make plans, then

We use {Up, Down, Left, Right} to denote the intended directions. So [Up, Down, Up, Right] is going to be the plan that, from the starting state, executes the moves in the specified order.

SLIDE 36–41

Making plans

Goal: get to +1.

Consider the plan [Up, Up, Right, Right, Right]. With deterministic agents, it gets us to +1 with probability 1. But now? What's the probability that [Up, Up, Right, Right, Right] gets us to +1?

SLIDE 42–45

Making plans

It's not 0.8^5!

0.8^5 is the probability that we get to +1 actually using the intended plan [Up, Up, Right, Right, Right].

0.8^5 = 0.32768: this means that we do not even get there one time out of three.

SLIDE 46–49

Making plans

There is a small chance of [Up, Up, Right, Right, Right] accidentally reaching the goal by going the other way round! The probability of this happening is 0.1^4 × 0.8 = 0.00008.

So the probability that [Up, Up, Right, Right, Right] gets us to +1 is 0.32768 + 0.00008 = 0.32776.
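
These numbers can be checked mechanically. Below is a sketch that propagates the state distribution through the plan, treating the terminal squares as absorbing; the grid layout (wall at (2,2), +1 at (4,3), −1 at (4,2), start at (1,1)) is assumed from Russell and Norvig.

```python
# Sketch: propagate the state distribution through a plan, treating the
# terminal squares as absorbing. Layout assumed from Russell & Norvig:
# wall at (2,2), +1 at (4,3), -1 at (4,2), start at (1,1).
WALL, PLUS, MINUS = (2, 2), (4, 3), (4, 2)
MOVES = {'Up': (0, 1), 'Down': (0, -1), 'Left': (-1, 0), 'Right': (1, 0)}
LEFT_OF = {'Up': 'Left', 'Left': 'Down', 'Down': 'Right', 'Right': 'Up'}
RIGHT_OF = {v: k for k, v in LEFT_OF.items()}

def step(s, d):
    x, y = s[0] + MOVES[d][0], s[1] + MOVES[d][1]
    return s if (x, y) == WALL or not (1 <= x <= 4 and 1 <= y <= 3) else (x, y)

def run_plan(plan, start=(1, 1)):
    dist = {start: 1.0}
    for a in plan:
        nxt = {}
        for s, p in dist.items():
            if s in (PLUS, MINUS):            # the game has already ended here
                nxt[s] = nxt.get(s, 0.0) + p
                continue
            for d, q in [(a, 0.8), (LEFT_OF[a], 0.1), (RIGHT_OF[a], 0.1)]:
                s2 = step(s, d)
                nxt[s2] = nxt.get(s2, 0.0) + p * q
        dist = nxt
    return dist

final = run_plan(['Up', 'Up', 'Right', 'Right', 'Right'])
print(round(final[PLUS], 5))  # 0.32776 = 0.8^5 + 0.1^4 * 0.8
```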

SLIDE 50–52

Making plans

In this case, the probability of accidental successes doesn't play a significant role. However, it very well might under different decision models, rewards, environments, etc.

0.32776 is still less than 1/3, so we don't seem to be doing very well.

SLIDE 53–55

Rewards

We introduce a utility function r : S → R. Here r stands for rewards; to avoid confusion with established terminology, we also call it a reward function.

SLIDE 56–59

Terminology

rewards: local utilities, assigned to states - denoted r
values: global, long-range utilities, also assigned to states - denoted v
utility and expected utility: general terms applied to actions, states, sequences of states, etc. - denoted u

SLIDE 60–64

Rewards

Consider now the following. The reward is: +1 at state +1, −1 at state −1, −0.04 in all other states.

What's the expected utility of [Up, Up, Right, Right, Right]? IT DEPENDS on how we are going to put rewards together!

SLIDE 65–73

Utility of state sequences

We need to compare sequences of states. Look at the following: u[s_1, s_2, ..., s_n] is the utility of sequence s_1, s_2, ..., s_n. Does it remind you of anything? Multi-criteria decision making.

Many ways of comparing states:

summing all the rewards
giving priority to the immediate rewards
...

SLIDE 74–76

Utility of state sequences

We are going to assume only one axiom, stationary preferences on reward sequences:

[r, r_0, r_1, r_2, ...] ≻ [r, r'_0, r'_1, r'_2, ...] ⇔ [r_0, r_1, r_2, ...] ≻ [r'_0, r'_1, r'_2, ...]

SLIDE 77–82

Utility of state sequences

Theorem. Under stationary preferences, there are only two ways to combine rewards over time.

Additive utility function:

u([s_0, s_1, s_2, ...]) = r(s_0) + r(s_1) + r(s_2) + ···

Discounted utility function:

u([s_0, s_1, s_2, ...]) = r(s_0) + γ r(s_1) + γ^2 r(s_2) + ···

where γ ∈ [0, 1] is the discount factor.
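
As a quick illustration, both combination rules fit in one Python helper (a minimal sketch; γ = 1 recovers the additive case):

```python
def discounted_utility(rewards, gamma=1.0):
    # gamma = 1.0 gives the additive utility function as a special case
    return sum(gamma**t * r for t, r in enumerate(rewards))

# Five steps at the running example's -0.04 step reward, discounted at 0.9:
print(discounted_utility([-0.04] * 5, gamma=0.9))  # about -0.1638
```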

SLIDE 83–87

Discount factor

γ is a measure of the agent's patience: how much more she values a gain of 5 today than a gain of 5 tomorrow, the day after, etc.

Used everywhere in AI, game theory, cognitive psychology. A lot of experimental research on it. Variants: hyperbolic discounting.
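
To make the comparison concrete, here is a small sketch contrasting exponential weights γ^t with the hyperbolic form 1/(1 + kt); the parameter values are illustrative only:

```python
gamma, k = 0.9, 0.5  # illustrative patience parameters
for t in range(4):
    exponential = 5 * gamma**t      # discounted value of a gain of 5 at time t
    hyperbolic = 5 / (1 + k * t)    # hyperbolic-discounting variant
    print(t, round(exponential, 2), round(hyperbolic, 2))
```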

SLIDE 88–91

Discounting

With discounted rewards, the utility of an infinite sequence is finite. In fact, if γ < 1 and rewards are bounded by r, we have:

u[s_1, s_2, ...] = Σ_{t=0}^{∞} γ^t r(s_t) ≤ Σ_{t=0}^{∞} γ^t r = r / (1 − γ)
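
A quick numeric check of the geometric-series bound, with illustrative values of γ and the reward bound r:

```python
gamma, r = 0.9, 1.0  # illustrative discount factor and reward bound
partial = sum(gamma**t * r for t in range(1000))
print(partial, r / (1 - gamma))  # both print (approximately) 10.0
```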

SLIDE 92–96

Markov Decision Process

A Markov Decision Process is a sequential decision problem for a:

fully observable environment
with stochastic actions
with a Markovian transition model
and with discounted (possibly additive) rewards

SLIDE 97

MDPs formally

Definition.
States s ∈ S; actions a ∈ A.
Model P(s'|s, a) = probability that a in s leads to s'.
Reward function R(s) (or R(s, a), R(s, a, s')):
  −0.04 (small penalty) for nonterminal states
  ±1 for terminal states
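
Put together, the running example fits this definition directly. Below is a sketch of it as an explicit MDP in Python; the layout is assumed from Russell and Norvig, and the names are illustrative:

```python
# The 4x3 world as an explicit MDP: states S, actions A, model P, rewards R.
S = [(x, y) for x in range(1, 5) for y in range(1, 4) if (x, y) != (2, 2)]
A = ['Up', 'Down', 'Left', 'Right']
TERMINALS = {(4, 3): +1.0, (4, 2): -1.0}

def R(s):
    """Reward function: +-1 at the terminals, -0.04 everywhere else."""
    return TERMINALS.get(s, -0.04)

MOVES = {'Up': (0, 1), 'Down': (0, -1), 'Left': (-1, 0), 'Right': (1, 0)}
LEFT_OF = {'Up': 'Left', 'Left': 'Down', 'Down': 'Right', 'Right': 'Up'}
RIGHT_OF = {v: k for k, v in LEFT_OF.items()}

def P(s, a):
    """Transition model P(s'|s, a), returned as a dict of next-state probabilities."""
    def move(d):
        nxt = (s[0] + MOVES[d][0], s[1] + MOVES[d][1])
        return nxt if nxt in S else s  # collisions (wall or border): no movement
    dist = {}
    for d, p in [(a, 0.8), (LEFT_OF[a], 0.1), (RIGHT_OF[a], 0.1)]:
        dist[move(d)] = dist.get(move(d), 0.0) + p
    return dist
```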

SLIDE 98–100

Value of plans

The utility of executing a plan p from state s is given by:

v_p(s) = E[ Σ_{t=0}^{∞} γ^t r(S_t) ]

where S_t is a random variable and the expectation is with respect to the probability distribution over state sequences determined by s and p.

SLIDE 101–103

Value of plans

Calculate the utility of the sequences you can actually perform, times the probability of reaching them. Add these numbers. Forget about the rest.

SLIDE 104–116

Value of plans

For instance, the plan [Up, Up] can generate the sequences:

([1, 1], [2, 1], [3, 1]) with probability 0.8^2
([1, 1], [2, 1], [2, 1]) with probability 2 × 0.8 × 0.1 (collisions)
([1, 1], [1, 2], [1, 2]) with probability 0.1 × 0.8
([1, 1], [1, 1], [1, 1]) with probability 0.1^2
([1, 1], [1, 1], [2, 1]) with probability 0.1 × 0.8
([1, 1], [1, 1], [1, 2]) with probability 0.1^2
([1, 1], [1, 2], [1, 1]) with probability 0.1^2
([1, 1], [1, 2], [1, 3]) with probability 0.1^2

for a total of nine sequences.

SLIDE 117–118

Value of plans

Adding utilities, weighted by probabilities, the expected utility comes out at −0.08. To be expected: no matter how we proceed, we are making two steps and at each step getting −0.04 of reward.
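
The same number falls out of a brute-force enumeration. A sketch, with the usual assumed layout; the accounting convention here (a −0.04 reward collected at each of the two states entered) is chosen to match the slide's −0.08:

```python
from itertools import product

# Enumerate every slip pattern of a short plan and average the sequence
# utilities, weighted by probability. Layout assumed as before; no terminal
# square is reachable within two steps of (1,1), so terminals are ignored.
WALL = (2, 2)
MOVES = {'Up': (0, 1), 'Down': (0, -1), 'Left': (-1, 0), 'Right': (1, 0)}
LEFT_OF = {'Up': 'Left', 'Left': 'Down', 'Down': 'Right', 'Right': 'Up'}
RIGHT_OF = {v: k for k, v in LEFT_OF.items()}

def step(s, d):
    x, y = s[0] + MOVES[d][0], s[1] + MOVES[d][1]
    return s if (x, y) == WALL or not (1 <= x <= 4 and 1 <= y <= 3) else (x, y)

def expected_utility(plan, start=(1, 1), r=-0.04, gamma=1.0):
    total = 0.0
    # each action goes straight (0.8), slips left (0.1), or slips right (0.1)
    outcomes = [(lambda a: a, 0.8), (lambda a: LEFT_OF[a], 0.1),
                (lambda a: RIGHT_OF[a], 0.1)]
    for slips in product(outcomes, repeat=len(plan)):
        s, prob, utility = start, 1.0, 0.0
        for t, (a, (direction_of, p)) in enumerate(zip(plan, slips)):
            s = step(s, direction_of(a))
            prob *= p
            utility += gamma**(t + 1) * r  # -0.04 collected at each state entered
        total += prob * utility
    return total

print(round(expected_utility(['Up', 'Up']), 2))  # -0.08
```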

SLIDE 119–123

Plans vs Policies

We have looked at a finite sequence of actions. But why should the agent stop after, say, five steps, if she can reach the terminal states in a few steps?

The intuitively "best" course of action is not getting us there in 2/3 of the cases, even if we count the unwanted trajectories. Can we do better?

The idea is that we don't only care about specifying a sequence of moves; we need to think of what to do in each situation. A policy is a specification of moves at each decision point.

SLIDE 124

A policy

SLIDE 125–128

Expected utility of a policy

The expected utility (or value) of policy π, from state s, is:

v_π(s) = E[ Σ_{t=0}^{∞} γ^t r(s_t) ]

The probability distribution over the state sequences is induced by the policy π, the initial state s and the transition model for the environment.

We want the optimal policy:

π*_s = argmax_π v_π(s)
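
Since v_π(s) is just an expectation, it can be estimated by simulation. Below is a Monte Carlo sketch under the usual assumed layout; the example policy, episode count, horizon, and γ are illustrative choices, not the slides':

```python
import random

# Monte Carlo sketch of v_pi(s): simulate the policy many times and average
# the discounted reward sums. Layout assumed from Russell & Norvig; a policy
# is any dict mapping states to actions.
WALL, W, H = (2, 2), 4, 3
TERMINALS = {(4, 3): +1.0, (4, 2): -1.0}
MOVES = {'Up': (0, 1), 'Down': (0, -1), 'Left': (-1, 0), 'Right': (1, 0)}
LEFT_OF = {'Up': 'Left', 'Left': 'Down', 'Down': 'Right', 'Right': 'Up'}
RIGHT_OF = {v: k for k, v in LEFT_OF.items()}

def step(s, d):
    x, y = s[0] + MOVES[d][0], s[1] + MOVES[d][1]
    return s if (x, y) == WALL or not (1 <= x <= W and 1 <= y <= H) else (x, y)

def sample_next(s, a):
    d = random.choices([a, LEFT_OF[a], RIGHT_OF[a]], weights=[0.8, 0.1, 0.1])[0]
    return step(s, d)

def value_estimate(policy, s, gamma=0.99, episodes=20_000, horizon=200):
    total = 0.0
    for _ in range(episodes):
        state, ret = s, 0.0
        for t in range(horizon):
            if state in TERMINALS:            # absorbing: collect and stop
                ret += gamma**t * TERMINALS[state]
                break
            ret += gamma**t * (-0.04)
            state = sample_next(state, policy[state])
        total += ret
    return total / episodes

# A hypothetical policy: head Up, then Right along the top row.
policy = {(x, y): ('Right' if y == 3 else 'Up') for x in range(1, 5)
          for y in range(1, 4) if (x, y) != (2, 2)}
print(round(value_estimate(policy, (1, 1)), 2))
```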

SLIDE 129–130

A remarkable fact

Theorem. With discounted rewards and infinite horizons, π*_s = π*_{s'} for each s' ∈ S.

Idea: take π*_a and π*_b. If they both reach a state c then, because they are both optimal, there is no reason why they should disagree from c onwards: π*_c is identical for both. But then they behave the same at all states!

SLIDE 131

Optimal policies

Figure: optimal policy when the state penalty R(s) is −0.04.

SLIDE 132–136

Risk and reward

SLIDE 137

To be continued

Next Tuesday we are going to finish the slides on MDPs.