

Lecture 11: Utility Theory
October 14, 2008, CS 486/686
Lecture slides (c) 2008 C. Boutilier, P. Poupart & K. Larson

Outline

• Decision making
  – Utility Theory
  – Decision Trees
• Chapter 16 in R&N
  – Note: some of the material we are covering today is not in the textbook

Decision Making under Uncertainty

• I give the robot a planning problem: I want coffee
  – but the coffee maker is broken, so the robot reports "No plan!"
• If I want more robust behavior, i.e. if I want the robot to know what to do when my primary goal can't be satisfied, I should provide it with some indication of my preferences over alternatives
  – e.g., coffee better than tea, tea better than water, water better than nothing, etc.
• But it's more complex than that:
  – the robot could wait 45 minutes for the coffee maker to be fixed
  – what's better: tea now, or coffee in 45 minutes?
  – I could express preferences over <beverage, time> pairs

Preferences

• A preference ordering ≽ is a ranking of all possible states of affairs (worlds) S
  – these could be outcomes of actions, truth assignments, states in a search problem, etc.
  – s ≽ t: state s is at least as good as state t
  – s ≻ t: state s is strictly preferred to state t
  – s ~ t: the agent is indifferent between states s and t
• If an agent's actions are deterministic, then we know exactly what states will occur
• If an agent's actions are not deterministic, then we represent the uncertainty with lotteries, as in the sketch below
  – a lottery is a probability distribution over outcomes
  – Lottery L = [p1, s1; p2, s2; …; pn, sn]
  – s1 occurs with probability p1, s2 occurs with probability p2, …
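To make the lottery notation concrete, here is a minimal Python sketch that represents a lottery as a list of (probability, outcome) pairs. The Lottery alias, the is_valid helper, and the example outcomes are illustrative choices, not part of the lecture.

```python
from typing import List, Tuple

# A lottery [p1,s1; p2,s2; ...; pn,sn] as a list of (probability, outcome) pairs.
Lottery = List[Tuple[float, str]]

def is_valid(lottery: Lottery) -> bool:
    """Probabilities must be non-negative and sum to 1."""
    probs = [p for p, _ in lottery]
    return all(p >= 0 for p in probs) and abs(sum(probs) - 1.0) < 1e-9

# A sure outcome is just a degenerate lottery.
sure_tea = [(1.0, "tea now")]
coffee_gamble = [(0.8, "coffee in 45 min"), (0.2, "nothing in 45 min")]
print(is_valid(sure_tea), is_valid(coffee_gamble))  # True True
```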
Axioms

• Orderability: given 2 states A and B,
  – (A ≻ B) ∨ (B ≻ A) ∨ (A ~ B)
• Transitivity: given 3 states A, B, and C,
  – (A ≻ B) ∧ (B ≻ C) ⇒ (A ≻ C)
• Continuity:
  – A ≻ B ≻ C ⇒ ∃p [p,A; 1-p,C] ~ B
• Substitutability:
  – A ~ B ⇒ [p,A; 1-p,C] ~ [p,B; 1-p,C]
• Monotonicity:
  – A ≻ B ⇒ (p ≥ q ⇔ [p,A; 1-p,B] ≽ [q,A; 1-q,B])
• Decomposability (illustrated in the sketch below):
  – [p,A; 1-p,[q,B; 1-q,C]] ~ [p,A; (1-p)q,B; (1-p)(1-q),C]
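Decomposability says a compound lottery can be collapsed into an equivalent simple lottery by multiplying probabilities along each branch. The small sketch below (the flatten helper is a made-up name, not from the lecture) does exactly that for lotteries nested as Python lists.

```python
# Flatten [p,A; 1-p,[q,B; 1-q,C]] into [p,A; (1-p)q,B; (1-p)(1-q),C].
def flatten(lottery):
    simple = []
    for p, outcome in lottery:
        if isinstance(outcome, list):          # the outcome is itself a lottery
            for q, inner in flatten(outcome):  # recurse, then scale by p
                simple.append((p * q, inner))
        else:
            simple.append((p, outcome))
    return simple

nested = [(0.5, "A"), (0.5, [(0.5, "B"), (0.5, "C")])]
print(flatten(nested))  # [(0.5, 'A'), (0.25, 'B'), (0.25, 'C')]
```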
Why Impose These Conditions?

• The structure of the preference ordering imposes certain "rationality requirements": it is a weak ordering of states from best to worst
• E.g., why transitivity?
  – Suppose you (strictly) prefer coffee to tea, tea to OJ, and OJ to coffee
  – If you prefer X to Y, you'll trade me Y plus $1 for X
  – I can then construct a "money pump" and extract arbitrary amounts of money from you

Decision Problems: Certainty

• A decision problem under certainty is:
  – a set of decisions D
    • e.g., paths in a search graph, plans, actions, etc.
  – a set of outcomes or states S
    • e.g., states you could reach by executing a plan
  – an outcome function f : D → S
    • the outcome of any decision
  – a preference ordering ≽ over S
• A solution to a decision problem is any d* ∈ D such that f(d*) ≽ f(d) for all d ∈ D (see the sketch below)
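Under certainty, solving the decision problem is just picking a decision whose outcome is maximal in the preference ordering. A minimal sketch, with made-up decisions, outcomes, and ranking:

```python
# Preference ordering over outcomes, listed from best to worst (illustrative).
ranking = ["coffee", "tea", "water", "nothing"]

# Outcome function f : D -> S, mapping each decision to the state it produces.
f = {
    "go to cafe": "coffee",
    "boil kettle": "tea",
    "do nothing": "nothing",
}

# A solution d* satisfies f(d*) >= f(d) for every d: no decision leads to a
# strictly better-ranked outcome.
d_star = min(f, key=lambda d: ranking.index(f[d]))
print(d_star)  # go to cafe
```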
Decision Making under Uncertainty

• Suppose actions don't have deterministic outcomes
  – e.g., when the robot pours coffee, it spills 20% of the time, making a mess
  – getcoffee leads to either c,~mess or ~c,mess; donothing leads to ~c,~mess for sure
  – preferences: c,~mess ≻ ~c,~mess ≻ ~c,mess
• What should the robot do?
  – decision getcoffee leads to a good outcome or a bad outcome, each with some probability
  – decision donothing leads to a medium outcome for sure
• Should the robot be optimistic? Pessimistic?
• Really, the odds of success should influence the decision
  – but how?

Utilities

• Rather than just ranking outcomes, we must quantify our degree of preference
  – e.g., how much more important is c than ~mess?
• A utility function U : S → ℝ associates a real-valued utility with each outcome
  – U(s) measures your degree of preference for s
• Note: U induces a preference ordering ≽_U over S defined as s ≽_U t iff U(s) ≥ U(t)
  – obviously ≽_U will be reflexive, transitive, and connected

Expected Utility

• Under conditions of uncertainty, each decision d induces a distribution Pr_d over possible outcomes
  – Pr_d(s) is the probability of outcome s under decision d
• The expected utility of decision d is defined as
  EU(d) = Σ_{s ∈ S} Pr_d(s) U(s)
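A direct Python rendering of this definition, with Pr_d and U stored as dictionaries keyed by outcome; the distribution below is the getcoffee distribution from the coffee example, and the utility values are illustrative:

```python
# EU(d) = sum over s in S of Pr_d(s) * U(s)
def expected_utility(pr_d, utility):
    return sum(p * utility[s] for s, p in pr_d.items())

pr_d = {"c,~mess": 0.8, "~c,mess": 0.2}   # distribution induced by some decision d
U = {"c,~mess": 10, "~c,mess": 0}
print(expected_utility(pr_d, U))          # 8.0
```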
Expected Utility

• When the robot pours coffee, it spills 20% of the time, making a mess
• If U(c,~ms) = 10, U(~c,~ms) = 5, U(~c,ms) = 0, then
  – EU(getcoffee) = (0.8)(10) + (0.2)(0) = 8 and EU(donothing) = 5
• If U(c,~ms) = 10, U(~c,~ms) = 9, U(~c,ms) = 0, then
  – EU(getcoffee) = (0.8)(10) + (0.2)(0) = 8 and EU(donothing) = 9

The MEU Principle

• The principle of maximum expected utility (MEU) states that the optimal decision under conditions of uncertainty is the one with the greatest expected utility
• In our example (reproduced in code below):
  – if my utility function is the first one, my robot should get coffee
  – if your utility function is the second one, your robot should do nothing

Expected Utility: Notes

• Note that this viewpoint accounts for both:
  – uncertainty in action outcomes (stochastic actions)
  – uncertainty in the state of knowledge
  – any combination of the two

Decision Problems: Uncertainty

• A decision problem under uncertainty is:
  – a set of decisions D
  – a set of outcomes or states S
  – an outcome function Pr : D → Δ(S)
    • Δ(S) is the set of distributions over S (e.g., Pr_d)
  – a utility function U over S
• A solution to a decision problem under uncertainty is any d* ∈ D such that EU(d*) ≥ EU(d) for all d ∈ D
• Again, for single-shot problems, this is trivial

Expected Utility: Notes

• Why MEU? Where do utilities come from?
  – the underlying foundations of utility theory tightly couple utility with action/choice
  – a utility function can be determined by asking someone about their preferences for actions in specific scenarios (or "lotteries" over outcomes)
• Utility functions needn't be unique
  – if I multiply U by a positive constant, all decisions have the same relative utility
  – if I add a constant to U, same thing
  – U is unique up to positive affine transformation (see the second code sketch below)

So What are the Complications?

• The outcome space is large
  – like all of our problems, state spaces can be huge
  – we don't want to spell out distributions like Pr_d explicitly
  – Solution: Bayes nets (or related: influence diagrams)
• The decision space is large
  – usually our decisions are not one-shot actions
  – rather, they involve sequential choices (like plans)
  – if we treat each plan as a distinct decision, the decision space is too large to handle directly
  – Solution: use dynamic programming methods to construct optimal plans (actually generalizations of plans, called policies, as in game trees)
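The coffee example can be reproduced in a few lines of Python: maximizing expected utility over the two decisions picks getcoffee under the first utility function and donothing under the second. The dictionary encoding of decisions and outcomes is an illustrative choice, not from the lecture.

```python
def expected_utility(pr_d, utility):
    return sum(p * utility[s] for s, p in pr_d.items())

# Distributions over outcomes induced by each decision (from the slides).
decisions = {
    "getcoffee": {"c,~mess": 0.8, "~c,mess": 0.2},
    "donothing": {"~c,~mess": 1.0},
}

def meu_decision(decisions, utility):
    return max(decisions, key=lambda d: expected_utility(decisions[d], utility))

U1 = {"c,~mess": 10, "~c,~mess": 5, "~c,mess": 0}
U2 = {"c,~mess": 10, "~c,~mess": 9, "~c,mess": 0}
print(meu_decision(decisions, U1))  # getcoffee  (EU = 8 vs 5)
print(meu_decision(decisions, U2))  # donothing  (EU = 8 vs 9)
```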

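To see that U is only unique up to positive affine transformation, the sketch below applies a transform a*U + b with a > 0 to the coffee example's utilities and checks that the MEU decision does not change. The constants a and b are arbitrary illustrative values.

```python
def expected_utility(pr_d, utility):
    return sum(p * utility[s] for s, p in pr_d.items())

decisions = {
    "getcoffee": {"c,~mess": 0.8, "~c,mess": 0.2},
    "donothing": {"~c,~mess": 1.0},
}
U = {"c,~mess": 10, "~c,~mess": 5, "~c,mess": 0}

a, b = 3.0, 7.0                                    # any a > 0 works
U_prime = {s: a * u + b for s, u in U.items()}     # positive affine transform

best = max(decisions, key=lambda d: expected_utility(decisions[d], U))
best_prime = max(decisions, key=lambda d: expected_utility(decisions[d], U_prime))
print(best, best_prime)  # getcoffee getcoffee (the optimal decision is unchanged)
```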