Module 3: Utility Theory
CS 886 Sequential Decision Making and Reinforcement Learning, University of Waterloo
(c) 2013 Pascal Poupart

Decision Making under Uncertainty
I give a planning problem to the robot: I want coffee
– but coffee maker is broken: robot reports “No plan!”
– we don't just want a plan: the robot should know our preferences and act on the best available option, e.g.:
– coffee better than tea
– tea better than water
– water better than nothing, etc.
– it could wait 45 minutes for the coffee maker to be fixed
– what's better: tea now, or coffee in 45 minutes?
– could express preferences for <beverage, time> pairs
A preference ordering ranks possible states:
– these could be outcomes of actions, truth assignments, states in a search problem, etc.
– s ≽ t: means that state s is at least as good as t
– s ≻ t: means that state s is strictly preferred to t
– s ~ t: means that the agent is indifferent between states s and t
A lottery is a probability distribution over outcomes:
– lottery L = [p1,s1; p2,s2; …; pn,sn]
– s1 occurs with probability p1, s2 occurs with probability p2, …
Axioms of rational preference (over states and lotteries A, B, C):
– Orderability: (A ≻ B) ∨ (B ≻ A) ∨ (A ~ B)
– Transitivity: (A ≻ B) ∧ (B ≻ C) ⇒ (A ≻ C)
– Continuity: A ≻ B ≻ C ⇒ ∃p [p,A; 1-p,C] ~ B
– Substitutability: A ~ B ⇒ [p,A; 1-p,C] ~ [p,B; 1-p,C]
– Monotonicity: A ≻ B ⇒ (p ≥ q ⇔ [p,A; 1-p,B] ≽ [q,A; 1-q,B])
– Decomposability: [p,A; 1-p,[q,B; 1-q,C]] ~ [p,A; (1-p)q,B; (1-p)(1-q),C]
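To make the lottery notation concrete, here is a minimal Python sketch (the list-of-pairs representation and the `flatten` helper are illustrative, not from the course) that encodes a lottery as (probability, outcome) pairs and flattens a compound lottery exactly as decomposability prescribes:

```python
# A lottery [p1,s1; ...; pn,sn] as a list of (probability, outcome) pairs.
# Outcomes may themselves be lotteries, giving a compound lottery.

def flatten(lottery):
    """Flatten a compound lottery into a simple one, as decomposability
    prescribes: [p,A; 1-p,[q,B; 1-q,C]] ~ [p,A; (1-p)q,B; (1-p)(1-q),C]."""
    simple = []
    for p, outcome in lottery:
        if isinstance(outcome, list):        # nested lottery: recurse
            for q, s in flatten(outcome):
                simple.append((p * q, s))
        else:                                # atomic outcome
            simple.append((p, outcome))
    return simple

# The compound lottery from the decomposability axiom, with p = .5, q = .3:
L = [(0.5, "A"), (0.5, [(0.3, "B"), (0.7, "C")])]
print(flatten(L))   # [(0.5, 'A'), (0.15, 'B'), (0.35, 'C')]
```

Decomposability is what lets us treat a compound lottery and its flattened version interchangeably (it is sometimes glossed as "no fun in gambling").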
Why impose these axioms? Intransitive preferences make you exploitable:
– suppose you (strictly) prefer coffee to tea, tea to OJ, and OJ to coffee
– if you prefer X to Y, you'll trade me Y plus $1 for X
– then I can construct a "money pump": sell you coffee for your tea plus $1, then OJ for your coffee plus $1, then tea for your OJ plus $1, and repeat, extracting arbitrary amounts of money from you
A decision problem under certainty consists of:
– a set of decisions D
– a set of outcomes or states S
– an outcome function f : D → S
– a preference ordering ≽ over S
A solution is any d ∈ D such that f(d) ≽ f(d′) for all d′ ∈ D.
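A minimal Python sketch of solving such a problem (the beverage names and ranking are illustrative), assuming the preference ordering is given as a list of outcomes, best first:

```python
# A minimal decision problem under certainty (names are illustrative).
decisions = ["get_coffee", "get_tea", "do_nothing"]
outcome = {"get_coffee": "coffee", "get_tea": "tea", "do_nothing": "nothing"}
ranking = ["coffee", "tea", "water", "nothing"]   # best ... worst

def best_decision(decisions, outcome, ranking):
    """Return a decision d whose outcome f(d) is maximal under the ordering."""
    return min(decisions, key=lambda d: ranking.index(outcome[d]))

print(best_decision(decisions, outcome, ranking))   # get_coffee
```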
Now add uncertainty in action outcomes:
– e.g., when the robot pours coffee, it spills 20% of the time (mess)
– preferences: (c, ~mess) ≻ (~c, ~mess) ≻ (~c, mess)
– getcoffee leads to a good or a bad outcome with some probability
– donothing leads to the medium outcome for sure
– we must somehow compare these decisions: but how?
[Figure: getcoffee leads to (c, ~mess) or (~c, mess); donothing leads to (~c, ~mess)]
A utility function U : S → ℝ quantifies how strongly we prefer one state to another:
– e.g., how much more important is c than ~mess?
– U(s) measures the degree of preference for s
– U induces an ordering ≽U over S: s ≽U t iff U(s) ≥ U(t)
– obviously ≽U is reflexive and transitive
Each decision d induces a distribution over outcomes, and we rank decisions by expected utility:
– EU(d) = Σs Prd(s) U(s)
– Prd(s) is the probability of outcome s under decision d
Example:
– if U(c,~ms) = 10, U(~c,~ms) = 5, U(~c,ms) = 0, then EU(getcoffee) = (0.8)(10) + (0.2)(0) = 8 and EU(donothing) = 5
– if U(c,~ms) = 10, U(~c,~ms) = 9, U(~c,ms) = 0, then EU(getcoffee) = (0.8)(10) + (0.2)(0) = 8 and EU(donothing) = 9
[Figure: when the robot pours coffee, it spills 20% of the time (mess): getcoffee leads to (c, ~mess) w.p. 0.8 or (~c, mess) w.p. 0.2; donothing leads to (~c, ~mess) for sure]
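A short Python sketch of this computation (the dictionary encoding is illustrative); it reproduces both cases above:

```python
# Expected utility of the two decisions, using the numbers above.
# outcomes[d] lists (Prd(s), s) pairs for decision d.
outcomes = {
    "getcoffee": [(0.8, ("c", "~mess")), (0.2, ("~c", "mess"))],
    "donothing": [(1.0, ("~c", "~mess"))],
}

def expected_utility(decision, U):
    """EU(d) = sum over outcomes s of Prd(s) * U(s)."""
    return sum(p * U[s] for p, s in outcomes[decision])

U1 = {("c", "~mess"): 10, ("~c", "~mess"): 5, ("~c", "mess"): 0}
U2 = {("c", "~mess"): 10, ("~c", "~mess"): 9, ("~c", "mess"): 0}

for U in (U1, U2):
    eu = {d: expected_utility(d, U) for d in outcomes}
    print(eu, "->", max(eu, key=eu.get))
# {'getcoffee': 8.0, 'donothing': 5.0} -> getcoffee
# {'getcoffee': 8.0, 'donothing': 9.0} -> donothing
```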
– if my utility function is the first one, my robot should get coffee
– if your utility function is the second one, your robot should do nothing
A decision problem under uncertainty consists of:
– a set of decisions D
– a set of outcomes or states S
– an outcome function Pr : D → Δ(S), where Δ(S) is the set of distributions over S
– a utility function U over S
A solution is any d ∈ D maximizing EU(d).
Where does the uncertainty come from?
– uncertainty in action outcomes
– uncertainty in state of knowledge
– any combination of the two
[Figure: two small trees. Left (stochastic actions): from s0, action a reaches s1 w.p. 0.8 and s2 w.p. 0.2, while action b reaches s3 w.p. 0.3 and s4 w.p. 0.7. Right (uncertain knowledge): with the current state uncertain, each candidate state leads to two outcomes, e.g. s1/s2, t1/t2, w1/w2, each w.p. 0.7/0.3]
– the underlying foundations of utility theory tightly couple utility with action/choice
– a utility function can be determined by asking someone about their preferences for actions in specific scenarios (or "lotteries" over outcomes)
– if we multiply U by a positive constant, all decisions keep the same relative utility
– if we add a constant to U, same thing
– so U is unique only up to positive affine transformations: U′(s) = aU(s) + b with a > 0
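To see why, note that expected utility is linear in U, so an affine change rescales every EU the same way:

```latex
EU'(d) \;=\; \sum_{s} \Pr_d(s)\,\bigl(a\,U(s) + b\bigr)
       \;=\; a \sum_{s} \Pr_d(s)\,U(s) \;+\; b \sum_{s} \Pr_d(s)
       \;=\; a\,EU(d) + b
```

Since Σs Prd(s) = 1 and a > 0, EU′ is a monotone increasing function of EU, so every pair of decisions keeps its relative order.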
Why not just enumerate decisions and pick the best?
– state spaces can be huge: we don't want to spell out distributions like Prd explicitly
– usually decisions are not one-shot actions
– rather they involve sequential choices (like plans)
– if we treat each plan as a distinct decision, the decision space is too large to handle directly
– solution: use dynamic programming methods to construct policies (as in game trees)
Consider a simple two-stage problem with actions a and b; there are four possible action sequences:
– [a,a], [a,b], [b,a], [b,b]
– e.g., Pra(s2 | s1) = .9 means the probability of moving to state s2 when a is performed at s1 is .9
– there is a similar distribution for action b
[Figure: two-stage decision tree; the transition probabilities are]
s1:  a → s2 (.9), s3 (.1);   b → s12 (.2), s13 (.8)
s2:  a → s4 (.5), s5 (.5);   b → s6 (.6), s7 (.4)
s3:  a → s8 (.2), s9 (.8);   b → s10 (.7), s11 (.3)
s12: a → s14 (.1), s15 (.9); b → s16 (.2), s17 (.8)
s13: a → s18 (.2), s19 (.8); b → s20 (.7), s21 (.3)
Each sequence induces a distribution over terminal states:
– [a,a]: Pr(s4) = .45, Pr(s5) = .45, Pr(s8) = .02, Pr(s9) = .08
– [a,b]: Pr(s6) = .54, Pr(s7) = .36, Pr(s10) = .07, Pr(s11) = .03
– and similar distributions for sequences [b,a] and [b,b]
To compare sequences, assign utilities to the terminal states:
– how good is it to end up at s4, s5, s6, …?
– note: we could assign utilities to the intermediate states s2, s3, s12, and s13 also; we ignore this here for simplicity, though in general utility can depend on the entire trajectory or sequence of states we pass through
– EU(aa) = .45u(s4) + .45u(s5) + .02u(s8) + .08u(s9)
– EU(ab) = .54u(s6) + .36u(s7) + .07u(s10) + .03u(s11)
– etc.
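A Python sketch tying this together (the terminal utilities u are placeholders I made up; the transition table matches the tree above). `plan_distribution` reproduces the distributions quoted above, and the final loop computes EU for all four sequences:

```python
from itertools import product

# Transition table from the decision tree above:
# P[state][action] = list of (probability, next_state) pairs.
P = {
    "s1":  {"a": [(.9, "s2"),  (.1, "s3")],  "b": [(.2, "s12"), (.8, "s13")]},
    "s2":  {"a": [(.5, "s4"),  (.5, "s5")],  "b": [(.6, "s6"),  (.4, "s7")]},
    "s3":  {"a": [(.2, "s8"),  (.8, "s9")],  "b": [(.7, "s10"), (.3, "s11")]},
    "s12": {"a": [(.1, "s14"), (.9, "s15")], "b": [(.2, "s16"), (.8, "s17")]},
    "s13": {"a": [(.2, "s18"), (.8, "s19")], "b": [(.7, "s20"), (.3, "s21")]},
}

def plan_distribution(plan, start="s1"):
    """Distribution over terminal states under an unconditional plan like [a,a]."""
    dist = {start: 1.0}
    for action in plan:
        new = {}
        for s, p in dist.items():
            for q, s2 in P[s][action]:
                new[s2] = new.get(s2, 0.0) + p * q
        dist = new
    return dist

print(plan_distribution(["a", "a"]))
# {'s4': 0.45, 's5': 0.45, 's8': 0.02, 's9': 0.08}  (up to float rounding)

# EU of every plan under made-up terminal utilities u(s):
u = {"s%d" % i: float(i) for i in range(4, 22)}     # placeholder values only
for plan in product("ab", repeat=2):
    dist = plan_distribution(list(plan))
    print(plan, sum(p * u[s] for s, p in dist.items()))
```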
Key observation:
– at s2, suppose EU(a) = .5u(s4) + .5u(s5) > EU(b) = .6u(s6) + .4u(s7)
– at s3, suppose EU(a) = .2u(s8) + .8u(s9) < EU(b) = .7u(s10) + .3u(s11)
– then we want to do a second if we reach s2, but we want to do b second if we reach s3
– so the best second action depends on the first action's outcome: we should choose a policy, not a fixed sequence
This tree admits eight policies (a first action plus an action for each state that may be reached):
[a; if s2 a, if s3 a]   [b; if s12 a, if s13 a]
[a; if s2 a, if s3 b]   [b; if s12 a, if s13 b]
[a; if s2 b, if s3 a]   [b; if s12 b, if s13 a]
[a; if s2 b, if s3 b]   [b; if s12 b, if s13 b]
– compare with only four unconditional sequences: [a; a], [a; b], [b; a], [b; b]
– note: we can only gain by allowing the decision maker to use policies
How large is the policy space for a horizon of k stages?
– the number of action sequences alone is exponential in k: |A|^k if A is our action set
– policies are worse: if we have n = |A| actions and m = |O| outcomes per action, then we have (nm)^k policies
Fortunately, we don't need to enumerate policies:
– e.g., suppose EU(a) > EU(b) at s2; then we should never consider a policy that does anything else at s2
– instead, back values up the tree (made precise below)
A decision tree has three kinds of nodes:
– choice nodes: these denote action choices by the decision maker (decision nodes)
– chance nodes: these denote uncertainty regarding action effects; "nature" chooses the child with the specified probability
– terminal nodes: these denote the utility of the "trajectory" (branch) to the decision maker
[Figure: small tree rooted at s1; action a yields utility 5 w.p. .9 or 2 w.p. .1, action b yields 4 w.p. .2 or 3 w.p. .8]
Values are computed bottom-up:
– U(t) is defined for all terminal nodes t (part of the input)
– U(n) = expectation of {U(c) : c a child of n} if n is a chance node
– U(n) = max {U(c) : c a child of n} if n is a choice node
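This recursive definition translates directly into code; a minimal Python sketch (the node encoding is illustrative), run on the small tree from the figure above:

```python
# Backward induction on a decision tree. A node is one of:
#   ("leaf", utility)
#   ("chance", [(prob, child), ...])
#   ("choice", {action: child, ...})

def U(node):
    kind, data = node
    if kind == "leaf":
        return data
    if kind == "chance":                       # expectation over children
        return sum(p * U(child) for p, child in data)
    return max(U(child) for child in data.values())   # choice: best child

# The small tree from the figure above:
tree = ("choice", {
    "a": ("chance", [(.9, ("leaf", 5)), (.1, ("leaf", 2))]),
    "b": ("chance", [(.2, ("leaf", 4)), (.8, ("leaf", 3))]),
})
print(U(tree))   # EU(a) = 4.7 beats EU(b) = 3.2, so the value is 4.7
```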
Working the example bottom-up:
– at s2: U(s2) = max{U(n3), U(n4)}, i.e. decision a or b (whichever is max)
– at s1: U(s1) = max{U(n1), U(n2)}; decision: max of a, b
[Figure: tree rooted at s1; action a leads to chance node n1 (s2 w.p. .3, s3 w.p. .7) and action b to n2; at s2, action a leads to chance node n3 (utility 5 w.p. .9, 2 w.p. .1) and action b to n4 (utilities 3, 4 w.p. .8/.2)]
Policies only need to specify choices at reachable nodes:
– e.g., if a policy chooses a at node s1, the choice at s4 doesn't matter because s4 won't be reached
– two policies are implementationally indistinguishable if they disagree only at unreachable decision nodes
[Figure: tree rooted at s1; chance node n1 reaches s2 (.3) and s3 (.7); node s4 lies under the unchosen action, so it is unreachable]
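A small sketch of the reachability computation (reusing the transition table P from the plan-evaluation sketch above; the policy shown is illustrative):

```python
# Which decision nodes does a policy actually reach?

def reachable(start, policy, P, depth):
    """Decision nodes visited when following `policy` for `depth` steps."""
    seen, frontier = set(), {start}
    for _ in range(depth):
        seen |= frontier
        frontier = {s2 for s in frontier if s in policy
                       for _, s2 in P[s][policy[s]]}
    return seen

policy = {"s1": "a", "s2": "a", "s3": "b", "s12": "a", "s13": "b"}
print(reachable("s1", policy, P, 2))
# {'s1', 's2', 's3'}: s12 and s13 are unreachable, so the actions assigned
# there never matter; changing them gives an indistinguishable policy
```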
How expensive is backing values up the tree?
– the tree has O((nm)^d) nodes at horizon d, so the total computational cost is O((nm)^d)
– compare with explicit policy enumeration: evaluating a single policy explicitly requires substantial computation, O(m^d), so explicitly evaluating each of the (nm)^d policies would cost O(n^d m^(2d)) !!!
Further refinements:
– detecting repeated states
– pruning by branch-and-bound
– approximating expectations by sampling (sketched below)
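For the last item, a minimal Monte Carlo sketch (reusing P and the placeholder utilities u from the earlier example): instead of enumerating all m^d trajectories of a plan, sample some and average.

```python
import random

def sampled_EU(plan, u, P, n=100_000, start="s1"):
    """Monte Carlo estimate of a plan's EU: sample trajectories instead of
    enumerating all m^d outcome sequences."""
    total = 0.0
    for _ in range(n):
        s = start
        for action in plan:
            r, acc = random.random(), 0.0
            for p, s2 in P[s][action]:      # sample the next state
                acc += p
                if r < acc:
                    break
            s = s2   # falls through to the last child on float rounding
        total += u[s]
    return total / n

print(sampled_EU(["a", "a"], u, P))   # ~4.93 with the placeholder u above
```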