Goals and Preferences
©D. Poole and A. Mackworth 2010, Artificial Intelligence, Lectures 9.1 and 9.2

Alice ... went on: "Would you tell me, please, which way I ought to go from here?" "That depends a good deal on where you want to get to," said the Cat. "I don't much care where," said Alice.


1. Factored Representation of Utility
Suppose the outcomes can be described in terms of features X_1, ..., X_n.
An additive utility is one that can be decomposed into a set of factors:
u(X_1, ..., X_n) = f_1(X_1) + ··· + f_n(X_n).
This assumes additive independence.
Strong assumption: the contribution of each feature doesn't depend on the other features.
There are many ways to represent the same utility: a number can be added to one factor as long as it is subtracted from the others.

2–3. Additive Utility
An additive utility has a canonical representation:
u(X_1, ..., X_n) = w_1 × u_1(X_1) + ··· + w_n × u_n(X_n).
If best_i is the best value of X_i, u_i(X_i = best_i) = 1.
If worst_i is the worst value of X_i, u_i(X_i = worst_i) = 0.
The w_i are weights with Σ_i w_i = 1; the weights reflect the relative importance of features.
We can determine the weights by comparing outcomes:
w_1 = u(best_1, x_2, ..., x_n) − u(worst_1, x_2, ..., x_n)
for any values x_2, ..., x_n of X_2, ..., X_n.
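
A minimal Python sketch of this canonical additive form; the hotel-style features, sub-utility tables, and weights are made up for illustration, not taken from the slides:

```python
# Additive utility u(X_1,...,X_n) = sum_i w_i * u_i(X_i), where each u_i maps
# its feature's worst value to 0 and its best value to 1.

sub_utility = {
    "rate":     {"$75": 1.0, "$125": 0.6, "$200": 0.0},      # cheaper is better
    "location": {"downtown": 1.0, "suburb": 0.4, "airport": 0.0},
}
weights = {"rate": 0.7, "location": 0.3}                      # weights sum to 1

def additive_utility(outcome):
    """outcome maps each feature name to one of its domain values."""
    return sum(weights[f] * sub_utility[f][outcome[f]] for f in weights)

print(additive_utility({"rate": "$125", "location": "downtown"}))  # 0.7*0.6 + 0.3*1.0 = 0.72
```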

4–5. General Setup for Additive Utility
Suppose there are:
multiple users,
multiple alternatives to choose among, e.g., hotel1, ...,
multiple criteria upon which to judge, e.g., rate, location.
Utility is a function of users and alternatives.
fact(crit, alt) is the fact about the domain: the value of criterion crit for alternative alt. E.g., fact(rate, hotel1) is the room rate for hotel #1, which is $125 per night.
score(val, user, crit) gives the score of domain value val for user on criterion crit.
utility(user, alt) = Σ_crit weight(user, crit) × score(fact(crit, alt), user, crit)
for user user and alternative alt, summing over the criteria crit.
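
A sketch of the same multi-user, multi-criteria utility in Python; the hotels, weights, and scoring rules are invented stand-ins for fact, score, and weight:

```python
# utility(user, alt) = sum over crit of weight(user, crit) * score(fact(crit, alt), user, crit)

facts = {  # fact(crit, alt): the objective value of each criterion for each alternative
    ("rate", "hotel1"): 125, ("location", "hotel1"): "downtown",
    ("rate", "hotel2"): 89,  ("location", "hotel2"): "airport",
}

weights = {  # weight(user, crit): how much each user cares about each criterion
    ("alice", "rate"): 0.8, ("alice", "location"): 0.2,
    ("bob", "rate"): 0.3,   ("bob", "location"): 0.7,
}

def score(val, user, crit):
    """Map a domain value into [0, 1] for this user on this criterion."""
    if crit == "rate":                       # cheaper rooms score higher
        return max(0.0, 1.0 - val / 250.0)
    return {"downtown": 1.0, "suburb": 0.5, "airport": 0.2}[val]

def utility(user, alt, criteria=("rate", "location")):
    return sum(weights[(user, c)] * score(facts[(c, alt)], user, c) for c in criteria)

print(utility("alice", "hotel1"), utility("bob", "hotel1"))
```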

6–8. Complements and Substitutes
Often additive independence is not a good assumption.
Values x_1 of feature X_1 and x_2 of feature X_2 are complements if having both is better than the sum of the two.
Values x_1 of feature X_1 and x_2 of feature X_2 are substitutes if having both is worse than the sum of the two.
Example: on a holiday,
◮ an excursion for 6 hours North on day 3;
◮ an excursion for 6 hours South on day 3.
Example: on a holiday,
◮ a trip to a location 3 hours North on day 3;
◮ the return trip for the same day.

9. Generalized Additive Utility
A generalized additive utility can be written as a sum of factors:
u(X_1, ..., X_n) = f_1(X_1) + ··· + f_k(X_k)
where each X_i is a subset of {X_1, ..., X_n} (the subsets may overlap).
An intuitive canonical representation is difficult to find.
It can represent complements and substitutes.
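
A small Python sketch of a generalized additive utility with overlapping factors; the outbound-trip / return-trip numbers are invented, chosen so the interaction factor captures a complement:

```python
# A generalized additive utility is a sum of factors over (possibly overlapping)
# subsets of the features.  Here the interaction factor f_both makes having both
# legs of the trip worth more than the sum of their individual contributions,
# which a fully additive utility cannot express.

def f_out(outbound):                 # factor over {Outbound}
    return 1.0 if outbound else 0.0

def f_ret(ret):                      # factor over {Return}
    return 1.0 if ret else 0.0

def f_both(outbound, ret):           # factor over {Outbound, Return}: the interaction
    return 8.0 if (outbound and ret) else 0.0

def utility(outbound, ret):
    return f_out(outbound) + f_ret(ret) + f_both(outbound, ret)

print(utility(True, False), utility(False, True), utility(True, True))  # 1.0 1.0 10.0
```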

10. Utility and time
Would you prefer $1000 today or $1000 next year?
What price would you pay now to have an eternity of happiness?
How can you trade off pleasures today with pleasures in the future?

11–12. Pascal's Wager (1670)
Decide whether to believe in God.
[Payoff table: the decision Believe in God against the state God Exists, with the Utility of each combination.]

13. Utility and time
How would you compare the following sequences of rewards (per week)?
A: $1000000, $0, $0, $0, $0, $0, ...
B: $1000, $1000, $1000, $1000, $1000, ...
C: $1000, $0, $0, $0, $0, ...
D: $1, $1, $1, $1, $1, ...
E: $1, $2, $3, $4, $5, ...

14–15. Rewards and Values
Suppose the agent receives a sequence of rewards r_1, r_2, r_3, r_4, ... in time.
What utility ("return" or "value") should be assigned?
total reward: V = Σ_{i=1}^{∞} r_i
average reward: V = lim_{n→∞} (r_1 + ··· + r_n) / n
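
A rough Python look at how the total-reward and average-reward definitions behave on the sequences A–E above; the true definitions are limits, so this simply truncates at a finite horizon:

```python
# Truncated total and average reward for the reward sequences A-E.

def reward(seq, i):                      # i = 1, 2, 3, ...
    if seq == "A": return 1_000_000 if i == 1 else 0
    if seq == "B": return 1000
    if seq == "C": return 1000 if i == 1 else 0
    if seq == "D": return 1
    if seq == "E": return i
    raise ValueError(seq)

def totals(seq, n=1000):
    rs = [reward(seq, i) for i in range(1, n + 1)]
    return sum(rs), sum(rs) / n          # truncated total, truncated average

for seq in "ABCDE":
    print(seq, totals(seq))
# Average reward ignores any finite prefix: A and C average toward 0 and B toward
# 1000 as n grows, while E's total and average both grow without bound.
```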

16. Average vs Accumulated Rewards
[Flowchart for choosing between them, branching on two questions:]
Does the agent go on forever? (yes / no)
Does the agent get stuck in "absorbing" state(s) with zero reward? (yes / no)

17. Rewards and Values
Suppose the agent receives a sequence of rewards r_1, r_2, r_3, r_4, ... in time.
discounted return: V = r_1 + γ r_2 + γ² r_3 + γ³ r_4 + ···
γ is the discount factor, 0 ≤ γ ≤ 1.

18–22. Properties of the Discounted Rewards
The discounted return for rewards r_1, r_2, r_3, r_4, ... is
V = r_1 + γ r_2 + γ² r_3 + γ³ r_4 + ···
  = r_1 + γ (r_2 + γ (r_3 + γ (r_4 + ···)))
If V_t is the value obtained from time step t, then
V_t = r_t + γ V_{t+1}.
How is the infinite future valued compared to immediate rewards?
1 + γ + γ² + γ³ + ··· = 1 / (1 − γ)
Therefore (minimum reward) / (1 − γ) ≤ V_t ≤ (maximum reward) / (1 − γ).
We can approximate V with the first k terms, with error
V − (r_1 + γ r_2 + ··· + γ^(k−1) r_k) = γ^k V_{k+1}.
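
A Python sketch checking these identities on an arbitrary reward stream: the recursive form V_t = r_t + γ V_{t+1}, the truncation error γ^k V_{k+1}, and the 1/(1 − γ) bound. The reward list and discount factor are arbitrary examples:

```python
# Discounted-return identities on a (long but finite) stand-in for an infinite stream.

def discounted_return(rewards, g, start=0):
    """V_{start+1} computed right-to-left: r + g*(r' + g*(...))."""
    v = 0.0
    for r in reversed(rewards[start:]):
        v = r + g * v
    return v

g = 0.9
rewards = [5, 0, 2, 7, 1, 3, 4, 0, 6, 2] * 20

V1 = discounted_return(rewards, g)
V2 = discounted_return(rewards, g, start=1)
print(abs(V1 - (rewards[0] + g * V2)) < 1e-9)        # recurrence V_1 = r_1 + g*V_2

k = 5
head = sum((g ** i) * rewards[i] for i in range(k))  # first k terms
print(abs((V1 - head) - g ** k * discounted_return(rewards, g, start=k)) < 1e-9)

print(min(rewards) / (1 - g) <= V1 <= max(rewards) / (1 - g))  # bound from 1/(1-g)
```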

23–26. Allais Paradox (1953)
What would you prefer:
A: $1m (one million dollars)
B: the lottery [0.10 : $2.5m, 0.89 : $1m, 0.01 : $0]
What would you prefer:
C: the lottery [0.11 : $1m, 0.89 : $0]
D: the lottery [0.10 : $2.5m, 0.90 : $0]
It is inconsistent with the axioms of preference to have A ≻ B and D ≻ C.
A and C are the lottery [0.11 : $1m, 0.89 : X], and B and D are the lottery [0.10 : $2.5m, 0.01 : $0, 0.89 : X], where X = $1m for the A/B choice and X = $0 for the C/D choice.
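
A quick numerical check, with an arbitrary made-up utility function, that the two choices share a common consequence, so an expected-utility maximizer must rank A vs. B the same way as C vs. D:

```python
# For ANY utility u over {$0, $1m, $2.5m}, EU(A) - EU(B) equals EU(C) - EU(D),
# so a single expected-utility maximizer cannot have A > B and D > C.

def eu(lottery, u):
    return sum(p * u[outcome] for p, outcome in lottery)

u = {"0": 0.0, "1m": 10.0, "2.5m": 12.0}    # any numbers will do

A = [(1.00, "1m")]
B = [(0.10, "2.5m"), (0.89, "1m"), (0.01, "0")]
C = [(0.11, "1m"), (0.89, "0")]
D = [(0.10, "2.5m"), (0.90, "0")]

print(eu(A, u) - eu(B, u))   # 0.11*u(1m) - 0.10*u(2.5m) - 0.01*u(0)
print(eu(C, u) - eu(D, u))   # the same quantity, so the two differences have the same sign
```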

27–29. Framing Effects [Tversky and Kahneman]
A disease is expected to kill 600 people. Two alternative programs have been proposed:
Program A: 200 people will be saved.
Program B: with probability 1/3, 600 people will be saved; with probability 2/3, no one will be saved.
Which program would you favor?
A disease is expected to kill 600 people. Two alternative programs have been proposed:
Program C: 400 people will die.
Program D: with probability 1/3, no one will die; with probability 2/3, 600 will die.
Which program would you favor?
Tversky and Kahneman: 72% chose A over B; 22% chose C over D.

30. Prospect Theory
[Figure: the prospect-theory value function, plotting psychological value against dollar losses and gains.]
In mixed gambles, loss aversion causes extreme risk-averse choices.
In bad choices, diminishing sensitivity causes risk seeking.

31–32. Reference Points
Consider Anthony and Betty:
Anthony's current wealth is $1 million.
Betty's current wealth is $4 million.
They are both offered the choice between a gamble and a sure thing:
Gamble: an equal chance to end up owning $1 million or $4 million.
Sure thing: own $2 million.
What does expected utility theory predict? What does prospect theory predict?
[From D. Kahneman, Thinking, Fast and Slow, 2011, pp. 275–276.]

33–35. Framing Effects
What do you think of Alan and Ben?
Alan: intelligent, industrious, impulsive, critical, stubborn, envious.
Ben: envious, stubborn, critical, impulsive, industrious, intelligent.
[From D. Kahneman, Thinking, Fast and Slow, 2011, p. 82.]

36–37. Framing Effects
Suppose you had bought tickets for the theatre for $50. When you got to the theatre, you had lost the tickets. You have your credit card and can buy equivalent tickets for $50. Do you buy the replacement tickets on your credit card?
Suppose you had $50 in your pocket to buy tickets. When you got to the theatre, you had lost the $50. You have your credit card and can buy equivalent tickets for $50. Do you buy the tickets on your credit card?
[From R.M. Dawes, Rational Choice in an Uncertain World, 1988.]

38–40. The Ellsberg Paradox
Two bags:
Bag 1: 40 white chips, 30 yellow chips, 30 green chips.
Bag 2: 40 white chips, 60 chips that are yellow or green (in unknown proportion).
What do you prefer:
A: receive $1m if a white or yellow chip is drawn from bag 1
B: receive $1m if a white or yellow chip is drawn from bag 2
C: receive $1m if a white or green chip is drawn from bag 2
What about D: the lottery [0.5 : B, 0.5 : C]?
However, A and D should give the same outcome, no matter what the proportion of yellow to green in bag 2.

41–45. St. Petersburg Paradox
What if there is no "best" outcome? Are utilities unbounded?
Suppose utilities are unbounded. Then for any outcome o_i there is an outcome o_{i+1} such that u(o_{i+1}) > 2 u(o_i).
Would the agent prefer o_1 or the lottery [0.5 : o_2, 0.5 : 0], where 0 is the worst outcome?
Is it rational to gamble o_1 on a coin toss to get o_2?
Is it rational to gamble o_2 on a coin toss to get o_3?
Is it rational to gamble o_3 on a coin toss to get o_4?
What will eventually happen?
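
A small simulation of this repeated double-or-nothing gamble; the 2.1 multiplier is an arbitrary stand-in for "the next outcome is worth more than twice as much":

```python
# Each round the agent stakes its current outcome o_i on a fair coin toss to get
# o_{i+1} (worth more than twice as much) or the worst outcome.  Although every
# individual gamble has positive expected utility, the agent is eventually ruined.

import random

def play(max_rounds=1000):
    utility = 1.0                    # u(o_1)
    for _ in range(max_rounds):
        if random.random() < 0.5:    # lose the toss: worst outcome, utility 0
            return 0.0
        utility *= 2.1               # win: move up to the next, more-than-doubled outcome
    return utility

results = [play() for _ in range(100_000)]
print(sum(r == 0.0 for r in results) / len(results))   # fraction ruined: ~1.0
```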

46–47. Predictor Paradox
Two boxes:
Box 1: contains $10,000.
Box 2: contains either $0 or $1m.
You can either choose both boxes or just box 2.
The "predictor" has put $1m in box 2 if he thinks you will take only box 2, and $0 in box 2 if he thinks you will take both. The predictor has been correct in previous predictions.
Do you take both boxes or just box 2?

48. Making Decisions Under Uncertainty
What an agent should do depends on:
The agent's ability: what options are available to it.
The agent's beliefs: the ways the world could be, given the agent's knowledge. Sensing updates the agent's beliefs.
The agent's preferences: what the agent wants, and the tradeoffs when there are risks.
Decision theory specifies how to trade off the desirability and probabilities of the possible outcomes for competing actions.

49. Decision Variables
Decision variables are like random variables that an agent gets to choose a value for.
A possible world specifies a value for each decision variable and each random variable.
For each assignment of values to all decision variables, the measure of the set of worlds satisfying that assignment sums to 1.
The probability of a proposition is undefined unless the agent conditions on the values of all decision variables.

50. Decision Tree for Delivery Robot
The robot can choose to wear pads to protect itself or not.
The robot can choose to go the short way past the stairs or a long way that reduces the chance of an accident.
There is one random variable: whether there is an accident.
[Decision tree over the eight possible worlds:]
wear pads, short way: accident → w0 (moderate damage); no accident → w1 (quick, extra weight)
wear pads, long way: accident → w2 (moderate damage); no accident → w3 (slow, extra weight)
don't wear pads, short way: accident → w4 (severe damage); no accident → w5 (quick, no weight)
don't wear pads, long way: accident → w6 (severe damage); no accident → w7 (slow, no weight)

51. Expected Values
The expected value of a function of possible worlds is its average value, weighting possible worlds by their probability.
Suppose f(ω) is the value of function f on world ω.
◮ The expected value of f is E(f) = Σ_{ω ∈ Ω} P(ω) × f(ω).
◮ The conditional expected value of f given e is E(f | e) = Σ_{ω ⊨ e} P(ω | e) × f(ω).
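
A minimal Python sketch of these two definitions over a toy set of possible worlds; the worlds, probabilities, and f values are made up:

```python
# E(f) = sum over worlds of P(w)*f(w); E(f | e) = sum over worlds satisfying e of P(w | e)*f(w).

worlds = {"w0": 0.1, "w1": 0.4, "w2": 0.2, "w3": 0.3}        # world -> probability (sums to 1)
f = {"w0": 30.0, "w1": 75.0, "w2": 35.0, "w3": 95.0}         # value of f in each world
e = {"w0", "w1"}                                             # worlds in which evidence e holds

def expected_value(f, worlds):
    return sum(p * f[w] for w, p in worlds.items())

def conditional_expected_value(f, worlds, e):
    p_e = sum(p for w, p in worlds.items() if w in e)
    return sum((p / p_e) * f[w] for w, p in worlds.items() if w in e)

print(expected_value(f, worlds))
print(conditional_expected_value(f, worlds, e))
```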

52. Single decisions
With a single decision variable, the agent can choose D = d_i for any d_i ∈ dom(D).
The expected utility of decision D = d_i is E(u | D = d_i), where u(ω) is the utility of world ω.
An optimal single decision is a decision D = d_max whose expected utility is maximal:
E(u | D = d_max) = max_{d_i ∈ dom(D)} E(u | D = d_i).

53. Single-stage decision networks
Extend belief networks with:
Decision nodes, for which the agent chooses the value. The domain is the set of possible actions. Drawn as a rectangle.
A utility node, whose parents are the variables on which the utility depends. Drawn as a diamond.
[Network for the delivery robot: decision nodes Which Way and Wear Pads, random variable Accident, and a Utility node.]
This shows explicitly which nodes affect whether there is an accident.

54–56. Finding an optimal decision
Suppose the random variables are X_1, ..., X_n, and the utility depends on X_{i1}, ..., X_{ik}.
E(u | D) = Σ_{X_1, ..., X_n} P(X_1, ..., X_n | D) × u(X_{i1}, ..., X_{ik})
         = Σ_{X_1, ..., X_n} Π_{i=1}^{n} P(X_i | parents(X_i)) × u(X_{i1}, ..., X_{ik})
To find an optimal decision:
◮ Create a factor for each conditional probability and for the utility.
◮ Sum out all of the random variables.
◮ This creates a factor on D that gives the expected utility for each D.
◮ Choose the D with the maximum value in the factor.

57. Example Initial Factors
P(Accident | Which Way):
Which Way  Accident  Value
long       true      0.01
long       false     0.99
short      true      0.2
short      false     0.8

Utility(Which Way, Accident, Wear Pads):
Which Way  Accident  Wear Pads  Value
long       true      true       30
long       true      false      0
long       false     true       75
long       false     false      80
short      true      true       35
short      true      false      3
short      false     true       95
short      false     false      100

58. After summing out Accident
Which Way  Wear Pads  Value
long       true       74.55
long       false      79.2
short      true       83.0
short      false      80.6
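
A short Python sketch that reproduces this table from the initial factors and then picks the maximizing decision; the numbers are exactly those in the slides:

```python
# Sum out Accident from P(Accident | WhichWay) * u(WhichWay, Accident, WearPads),
# then choose the (WhichWay, WearPads) assignment with maximum expected utility.

p_accident = {"long": 0.01, "short": 0.2}     # P(Accident=true | WhichWay)
utility = {                                   # u(WhichWay, Accident, WearPads)
    ("long", True, True): 30,  ("long", True, False): 0,
    ("long", False, True): 75, ("long", False, False): 80,
    ("short", True, True): 35, ("short", True, False): 3,
    ("short", False, True): 95, ("short", False, False): 100,
}

expected = {}
for way in ("long", "short"):
    for pads in (True, False):
        p = p_accident[way]
        expected[(way, pads)] = p * utility[(way, True, pads)] + (1 - p) * utility[(way, False, pads)]

print(expected)                         # ('short', True) gets 83.0, the largest value
print(max(expected, key=expected.get))  # ('short', True): go the short way and wear pads
```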

59. Decision Networks
flat, or modular, or hierarchical
explicit states, or features, or individuals and relations
static, or finite stage, or indefinite stage, or infinite stage
fully observable or partially observable
deterministic or stochastic dynamics
goals or complex preferences
single agent or multiple agents
knowledge is given or knowledge is learned
perfect rationality or bounded rationality

60. Sequential Decisions
An intelligent agent doesn't carry out a multi-step plan ignoring information it receives between actions.
A more typical scenario is where the agent observes, acts, observes, acts, ...
Subsequent actions can depend on what is observed. What is observed depends on previous actions.
Often the sole reason for carrying out an action is to provide information for future actions. For example: diagnostic tests, spying.

61. Sequential decision problems
A sequential decision problem consists of a sequence of decision variables D_1, ..., D_n.
Each D_i has an information set of variables parents(D_i), whose value will be known at the time decision D_i is made.

62. Decision Networks
A decision network is a graphical representation of a finite sequential decision problem, with three types of nodes:
A random variable is drawn as an ellipse. Arcs into the node represent probabilistic dependence.
A decision variable is drawn as a rectangle. Arcs into the node represent the information available when the decision is made.
A utility node is drawn as a diamond. Arcs into the node represent the variables that the utility depends on.

63. Umbrella Decision Network
[Network: random variables Weather and Forecast, decision node Umbrella, and a Utility node.]
You don't get to observe the weather when you have to decide whether to take your umbrella. You do get to observe the forecast.

64. Decision Network for the Alarm Problem
[Network over the random variables Tampering, Fire, Alarm, Smoke, Leaving, SeeSmoke, and Report, the decisions Check Smoke and Call, and a Utility node.]

65. No-forgetting
A no-forgetting decision network is a decision network where:
The decision nodes are totally ordered. This is the order in which the actions will be taken.
All decision nodes that come before D_i are parents of decision node D_i. Thus the agent remembers its previous actions.
Any parent of a decision node is a parent of subsequent decision nodes. Thus the agent remembers its previous observations.

66. What should an agent do?
What an agent should do at any time depends on what it will do in the future.
What an agent does in the future depends on what it did before.

67. Policies
A policy specifies what an agent should do under each circumstance.
A policy is a sequence δ_1, ..., δ_n of decision functions
δ_i : dom(parents(D_i)) → dom(D_i).
This policy means that when the agent has observed O ∈ dom(parents(D_i)), it will do δ_i(O).

68. Expected Utility of a Policy
Possible world ω satisfies policy δ, written ω ⊨ δ, if the world assigns the value to each decision node that the policy specifies.
The expected utility of policy δ is
E(u | δ) = Σ_{ω ⊨ δ} u(ω) × P(ω).
An optimal policy is one with the highest expected utility.

69–70. Finding an optimal policy
Create a factor for each conditional probability table and a factor for the utility.
Repeat:
◮ Sum out random variables that are not parents of a decision node.
◮ Select a variable D that is only in a factor f with (some of) its parents.
◮ Eliminate D by maximizing. This returns:
  an optimal decision function for D: arg max_D f
  a new factor: max_D f
until there are no more decision nodes.
Sum out the remaining random variables.
Multiply the factors: this is the expected utility of an optimal policy.

71. Initial factors for the Umbrella Decision
P(Weather):
Weather  Value
norain   0.7
rain     0.3

P(Forecast | Weather):
Weather  Forecast  Value
norain   sunny     0.7
norain   cloudy    0.2
norain   rainy     0.1
rain     sunny     0.15
rain     cloudy    0.25
rain     rainy     0.6

Utility(Weather, Umbrella):
Weather  Umbrella  Value
norain   take      20
norain   leave     100
rain     take      70
rain     leave     0

72. Eliminating By Maximizing
f(Forecast, Umbrella):
Forecast  Umbrella  Value
sunny     take      12.95
sunny     leave     49.0
cloudy    take      8.05
cloudy    leave     14.0
rainy     take      14.0
rainy     leave     7.0

max_Umbrella f:
Forecast  Value
sunny     49.0
cloudy    14.0
rainy     14.0

arg max_Umbrella f:
Forecast  Umbrella
sunny     leave
cloudy    leave
rainy     take
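
A Python sketch of the whole umbrella computation, using the factors from the slides: sum out Weather, then eliminate Umbrella by maximizing to get the optimal decision function and the expected utility of the optimal policy:

```python
p_weather = {"norain": 0.7, "rain": 0.3}
p_forecast = {                                   # P(Forecast | Weather)
    ("norain", "sunny"): 0.7, ("norain", "cloudy"): 0.2, ("norain", "rainy"): 0.1,
    ("rain", "sunny"): 0.15,  ("rain", "cloudy"): 0.25,  ("rain", "rainy"): 0.6,
}
utility = {("norain", "take"): 20, ("norain", "leave"): 100,
           ("rain", "take"): 70,   ("rain", "leave"): 0}

# Sum out Weather: f(Forecast, Umbrella)
f = {}
for fc in ("sunny", "cloudy", "rainy"):
    for umb in ("take", "leave"):
        f[(fc, umb)] = sum(p_weather[w] * p_forecast[(w, fc)] * utility[(w, umb)]
                           for w in ("norain", "rain"))

# Eliminate Umbrella by maximizing: decision function (arg max) and residual value (max)
policy = {fc: max(("take", "leave"), key=lambda umb: f[(fc, umb)])
          for fc in ("sunny", "cloudy", "rainy")}
value = sum(f[(fc, policy[fc])] for fc in ("sunny", "cloudy", "rainy"))

print(policy)   # {'sunny': 'leave', 'cloudy': 'leave', 'rainy': 'take'}
print(value)    # 77.0 = 49.0 + 14.0 + 14.0, the expected utility of the optimal policy
```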

73. Exercise
[Decision network with random variables Disease, Symptoms, Test Result, and Outcome, decision nodes Test and Treatment, and a Utility node.]
What are the factors?
Which random variables get summed out first?
Which decision variable is eliminated? What factor is created?
Then what is eliminated (and how)? What factors are created after maximization?
