SLIDE 1

Goals and Preferences

Alice . . . went on: “Would you tell me, please, which way I ought to go from here?” “That depends a good deal on where you want to get to,” said the Cat. “I don’t much care where —” said Alice. “Then it doesn’t matter which way you go,” said the Cat.

Lewis Carroll (1832–1898), Alice’s Adventures in Wonderland, 1865, Chapter 6.

© D. Poole and A. Mackworth 2010

SLIDE 2

Learning Objectives

At the end of the class you should be able to:
• justify the use and semantics of utility
• estimate the utility of an outcome
• build a decision network for a domain
• compute the optimal policy of a decision network

SLIDE 3

Preferences

• Actions result in outcomes.
• Agents have preferences over outcomes.
• A rational agent will do the action that has the best outcome for them.
• Sometimes agents don’t know the outcomes of the actions, but they still need to compare actions.
• Agents have to act. (Doing nothing is (often) an action.)

SLIDE 4

Preferences Over Outcomes

If o1 and o2 are outcomes:

• o1 ⪰ o2 means o1 is at least as desirable as o2.
• o1 ∼ o2 means o1 ⪰ o2 and o2 ⪰ o1.
• o1 ≻ o2 means o1 ⪰ o2 and not o2 ⪰ o1.

SLIDE 5

Lotteries

An agent may not know the outcomes of their actions, but only have a probability distribution over the outcomes. A lottery is a probability distribution over outcomes. It is written

[p1 : o1, p2 : o2, . . . , pk : ok]

where the oi are outcomes and pi ≥ 0 such that ∑i pi = 1. The lottery specifies that outcome oi occurs with probability pi. When we talk about outcomes, we will include lotteries.

SLIDE 8

Properties of Preferences

Completeness: Agents have to act, so they must have preferences:

∀o1 ∀o2 (o1 ⪰ o2 or o2 ⪰ o1)

Transitivity: Preferences must be transitive:

if o1 ⪰ o2 and o2 ≻ o3 then o1 ≻ o3

(and similarly for other mixtures of ≻ and ⪰). Rationale: otherwise o1 ⪰ o2, o2 ≻ o3, and o3 ⪰ o1. An agent that is prepared to pay to get o2 instead of o3, is happy to have o1 instead of o2, and is happy to have o3 instead of o1 can be exploited as a money pump.

SLIDE 9

Properties of Preferences (cont.)

Monotonicity: An agent prefers a larger chance of getting a better outcome than a smaller chance: If o1 ≻ o2 and p > q then [p : o1, 1 − p : o2] ≻ [q : o1, 1 − q : o2]

SLIDE 10

Consequence of axioms

Suppose o1 ≻ o2 and o2 ≻ o3. Consider whether the agent would prefer

◮ o2
◮ the lottery [p : o1, 1 − p : o3]

for different values of p ∈ [0, 1]. Plot which one is preferred as a function of p.

[Figure: a p-axis from 0 to 1, to be marked with the region where o2 is preferred and the region where the lottery is preferred.]

SLIDE 11

Properties of Preferences (cont.)

Continuity: Suppose o1 ≻ o2 and o2 ≻ o3. Then there exists p ∈ [0, 1] such that

o2 ∼ [p : o1, 1 − p : o3]

SLIDE 12

Properties of Preferences (cont.)

Decomposability: (“no fun in gambling”). An agent is indifferent between lotteries that have the same probabilities over the same outcomes. This includes lotteries over lotteries. For example:

[p : o1, 1 − p : [q : o2, 1 − q : o3]] ∼ [p : o1, (1 − p)q : o2, (1 − p)(1 − q) : o3]
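
Since probabilities multiply through nested lotteries, decomposability is easy to check numerically once outcomes carry utilities (as the theorem a few slides ahead justifies). A minimal sketch in Python, with illustrative probabilities and utility values that are not from the slides:

```python
# Check decomposability: a lottery over lotteries equals its flattened form.
p, q = 0.3, 0.6
u = {"o1": 1.0, "o2": 0.4, "o3": 0.0}   # illustrative utilities

def expected_utility(lottery):
    """lottery: list of (probability, outcome) pairs."""
    return sum(pi * u[oi] for pi, oi in lottery)

inner = [(q, "o2"), (1 - q, "o3")]
# Compound lottery [p : o1, 1-p : inner], evaluated in two stages:
compound = p * u["o1"] + (1 - p) * expected_utility(inner)
# Flattened lottery [p : o1, (1-p)q : o2, (1-p)(1-q) : o3]:
flat = expected_utility([(p, "o1"),
                         ((1 - p) * q, "o2"),
                         ((1 - p) * (1 - q), "o3")])
assert abs(compound - flat) < 1e-12     # same value either way
```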

SLIDE 13

Properties of Preferences (cont.)

Substitutability: if o1 ∼ o2 then the agent is indifferent between lotteries that only differ by o1 and o2: [p : o1, 1 − p : o3] ∼ [p : o2, 1 − p : o3]

SLIDE 14

Alternative Axiom for Substitutability

Substitutability: if o1 ⪰ o2 then the agent weakly prefers lotteries that contain o1 instead of o2, everything else being equal. That is, for any number p and outcome o3:

[p : o1, (1 − p) : o3] ⪰ [p : o2, (1 − p) : o3]

SLIDE 16

What we would like

We would like a measure of preference that can be combined with probabilities, so that

value([p : o1, 1 − p : o2]) = p × value(o1) + (1 − p) × value(o2)

Money does not act like this. What would you prefer: $1,000,000 or [0.5 : $0, 0.5 : $2,000,000]? It may seem that preferences are too complex and multi-faceted to be represented by single numbers.

SLIDE 17

Theorem

If preferences follow the preceding properties, then preferences can be measured by a function

utility : outcomes → [0, 1]

such that:
• o1 ⪰ o2 if and only if utility(o1) ≥ utility(o2).
• Utilities are linear with probabilities:

utility([p1 : o1, p2 : o2, . . . , pk : ok]) = ∑_{i=1}^{k} pi × utility(oi)
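
The linearity property is straightforward to operationalize. A minimal sketch in Python (the outcome names and utility values are invented for illustration), applied to the money question from the previous slide:

```python
# Expected utility of a lottery [p1 : o1, ..., pk : ok].
def lottery_utility(lottery, utility):
    """lottery: list of (pi, oi) pairs, pi >= 0, summing to 1."""
    assert abs(sum(p for p, _ in lottery) - 1.0) < 1e-9
    return sum(p * utility[o] for p, o in lottery)

utility = {"$0": 0.0, "$1m": 0.95, "$2m": 1.0}   # a risk-averse assignment
sure = lottery_utility([(1.0, "$1m")], utility)                  # 0.95
gamble = lottery_utility([(0.5, "$0"), (0.5, "$2m")], utility)   # 0.5
print(sure > gamble)   # True: this agent takes the sure million
```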

SLIDE 20

Proof

If all outcomes are equally preferred, set utility(oi) = 0 for all outcomes oi. Otherwise, suppose the best outcome is best and the worst outcome is worst. For any outcome oi, define utility(oi) to be the number ui such that

oi ∼ [ui : best, 1 − ui : worst]

This exists by the Continuity property.

SLIDE 23

Proof (cont.)

Suppose o1 ⪰ o2 and utility(oi) = ui. Then, by Substitutability,

[u1 : best, 1 − u1 : worst] ⪰ [u2 : best, 1 − u2 : worst]

which, by completeness and monotonicity, implies u1 ≥ u2.

SLIDE 24

Proof (cont.)

Suppose p = utility([p1 : o1, p2 : o2, . . . , pk : ok]) and utility(oi) = ui. We know:

oi ∼ [ui : best, 1 − ui : worst]

By substitutability, we can replace each oi by [ui : best, 1 − ui : worst], so

p = utility([p1 : [u1 : best, 1 − u1 : worst], . . . , pk : [uk : best, 1 − uk : worst]])

SLIDE 25

Proof (cont.)

By decomposability, this is equivalent to:

p = utility([p1u1 + · · · + pkuk : best, p1(1 − u1) + · · · + pk(1 − uk) : worst])

Thus, by the definition of utility,

p = p1 × u1 + · · · + pk × uk

SLIDE 26

Utility as a function of money

[Figure: utility as a function of money, from $0 to $2,000,000, with utility scaled 0 to 1, showing three curves: risk averse (concave), risk neutral (linear), and risk seeking (convex).]

SLIDE 27

Possible utility as a function of money

Someone who really wants a toy worth $30, but who would also like one worth $20:

[Figure: utility as a function of dollars from $10 to $100, utility scaled 0 to 1; the curve rises sharply around $20 and again around $30, and is nearly flat elsewhere.]

SLIDE 28

Factored Representation of Utility

Suppose the outcomes can be described in terms of features X1, . . . , Xn. An additive utility is one that can be decomposed into a sum of factors:

u(X1, . . . , Xn) = f1(X1) + · · · + fn(Xn)

This assumes additive independence, a strong assumption: the contribution of each feature doesn’t depend on the other features. There are many ways to represent the same utility: a number can be added to one factor as long as it is subtracted from others.

SLIDE 30

Additive Utility

An additive utility has a canonical representation:

u(X1, . . . , Xn) = w1 × u1(X1) + · · · + wn × un(Xn)

If besti is the best value of Xi, ui(Xi = besti) = 1. If worsti is the worst value of Xi, ui(Xi = worsti) = 0. The wi are weights with ∑i wi = 1. The weights reflect the relative importance of features. We can determine weights by comparing outcomes:

w1 = u(best1, x2, . . . , xn) − u(worst1, x2, . . . , xn)

for any values x2, . . . , xn of X2, . . . , Xn.
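
A small sketch of this representation in Python (the two features and all numbers are invented for illustration): each feature gets a local utility scaled to [0, 1], and a weight can be recovered by toggling its feature between best and worst values while holding the others fixed.

```python
# Canonical additive utility over two features, weights summing to 1.
w = {"price": 0.7, "location": 0.3}            # illustrative weights
u_local = {                                     # local utilities in [0, 1]
    "price":    {"cheap": 1.0, "mid": 0.5, "expensive": 0.0},
    "location": {"downtown": 1.0, "suburb": 0.0},
}

def u(price, location):
    return (w["price"] * u_local["price"][price]
            + w["location"] * u_local["location"][location])

# Recover w_price: compare best vs. worst price, holding location fixed.
w_price = u("cheap", "suburb") - u("expensive", "suburb")
print(w_price)   # 0.7, whichever location we held fixed
```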

SLIDE 32

General Setup for Additive Utility

Suppose there are:
• multiple users
• multiple alternatives to choose among, e.g., hotel1, . . .
• multiple criteria upon which to judge, e.g., rate, location

Utility is a function of users and alternatives. fact(crit, alt) is the fact about the domain value of criterion crit for alternative alt; e.g., fact(rate, hotel1) is the room rate for hotel1, which is $125 per night. score(val, user, crit) gives the score of domain value val for user on criterion crit. Then

utility(user, alt) = ∑_crit weight(user, crit) × score(fact(crit, alt), user, crit)

for user user, alternative alt, and criteria crit.
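
A minimal sketch of this setup in Python. The hotels, weights, and scoring functions are assumptions made up for illustration (only the $125 rate comes from the text above):

```python
# utility(user, alt) = sum over crit of weight(user, crit) * score(fact(crit, alt), user, crit)
fact = {("rate", "hotel1"): 125, ("rate", "hotel2"): 200,
        ("location", "hotel1"): "suburb", ("location", "hotel2"): "downtown"}
weight = {("sam", "rate"): 0.8, ("sam", "location"): 0.2}   # Sam mostly cares about price

def score(val, user, crit):
    if crit == "rate":            # cheaper is better; map $100-$300 onto [1, 0]
        return max(0.0, min(1.0, (300 - val) / 200))
    return 1.0 if val == "downtown" else 0.3

def utility(user, alt, criteria=("rate", "location")):
    return sum(weight[(user, c)] * score(fact[(c, alt)], user, c) for c in criteria)

print(utility("sam", "hotel1"), utility("sam", "hotel2"))   # ~0.76 vs ~0.60: hotel1 wins
```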

SLIDE 35

Complements and Substitutes

Often additive independence is not a good assumption. Values x1 of feature X1 and x2 of feature X2 are complements if having both is better than the sum of the two. Values x1 of feature X1 and x2 of feature X2 are substitutes if having both is worse than the sum of the two.

Example (substitutes): on a holiday,
◮ an excursion for 6 hours North on day 3
◮ an excursion for 6 hours South on day 3

Example (complements): on a holiday,
◮ a trip to a location 3 hours North on day 3
◮ the return trip for the same day

SLIDE 36

Generalized Additive Utility

A generalized additive utility can be written as a sum of factors:

u(X1, . . . , Xn) = f1(X1) + · · · + fk(Xk)

where each Xi ⊆ {X1, . . . , Xn} is a set of features. It can represent complements and substitutes, but an intuitive canonical representation is difficult to find.

SLIDE 37

Utility and time

Would you prefer $1000 today or $1000 next year? What price would you pay now to have an eternity of happiness? How can you trade off pleasures today with pleasures in the future?

SLIDE 39

Pascal’s Wager (1670)

Decide whether to believe in God.

[Decision network: a decision node Believe in God and a random variable God Exists, both parents of the Utility node.]

SLIDE 40

Utility and time

How would you compare the following sequences of rewards (per week)?
A: $1000000, $0, $0, $0, $0, $0, . . .
B: $1000, $1000, $1000, $1000, $1000, . . .
C: $1000, $0, $0, $0, $0, . . .
D: $1, $1, $1, $1, $1, . . .
E: $1, $2, $3, $4, $5, . . .

SLIDE 42

Rewards and Values

Suppose the agent receives a sequence of rewards r1, r2, r3, r4, . . . in time. What utility should be assigned? The “return” or “value” can be defined as:
• total reward: V = ∑_{i=1}^{∞} ri
• average reward: V = lim_{n→∞} (r1 + · · · + rn)/n

SLIDE 43

Average vs Accumulated Rewards

[Flowchart: choosing between average and accumulated reward, based on whether the agent goes on forever and whether it can get stuck in “absorbing” state(s) with zero reward.]

SLIDE 44

Rewards and Values

Suppose the agent receives a sequence of rewards r1, r2, r3, r4, . . . in time. The discounted return is

V = r1 + γr2 + γ²r3 + γ³r4 + · · ·

where γ is the discount factor, 0 ≤ γ ≤ 1.

SLIDE 49

Properties of the Discounted Rewards

The discounted return for rewards r1, r2, r3, r4, . . . is

V = r1 + γr2 + γ²r3 + γ³r4 + · · · = r1 + γ(r2 + γ(r3 + γ(r4 + . . . )))

If Vt is the value obtained from time step t, then

Vt = rt + γVt+1

How is the infinite future valued compared to immediate rewards? Since 1 + γ + γ² + γ³ + · · · = 1/(1 − γ),

(minimum reward)/(1 − γ) ≤ Vt ≤ (maximum reward)/(1 − γ)

We can approximate V with the first k terms, with error

V − (r1 + γr2 + · · · + γ^(k−1) rk) = γ^k Vk+1
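
These identities can be checked numerically. A small Python sketch with an arbitrary reward sequence (chosen only for illustration) confirms the recursion Vt = rt + γVt+1 and the truncation-error formula:

```python
# Discounted return, its recursion, and the error of a k-term approximation.
gamma = 0.9
rewards = [3.0, -1.0, 2.0, 2.0, 2.0] + [1.0] * 200   # long enough to look infinite

def discounted_return(rs, gamma):
    v = 0.0
    for r in reversed(rs):          # V_t = r_t + gamma * V_{t+1}, back to front
        v = r + gamma * v
    return v

V1 = discounted_return(rewards, gamma)
V2 = discounted_return(rewards[1:], gamma)
assert abs(V1 - (rewards[0] + gamma * V2)) < 1e-9     # the recursion holds

k = 3
head = sum(gamma**i * rewards[i] for i in range(k))   # first k terms
tail = discounted_return(rewards[k:], gamma)          # V_{k+1}
assert abs((V1 - head) - gamma**k * tail) < 1e-9      # error = gamma^k * V_{k+1}
```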

SLIDE 53

Allais Paradox (1953)

What would you prefer:
A: $1m — one million dollars
B: the lottery [0.10 : $2.5m, 0.89 : $1m, 0.01 : $0]

What would you prefer:
C: the lottery [0.11 : $1m, 0.89 : $0]
D: the lottery [0.10 : $2.5m, 0.90 : $0]

It is inconsistent with the axioms of preferences to have A ≻ B and D ≻ C. To see why, write all four as lotteries with a common component X:
A, C: lottery [0.11 : $1m, 0.89 : X]
B, D: lottery [0.10 : $2.5m, 0.01 : $0, 0.89 : X]
X = $1m gives A and B; X = $0 gives C and D.
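
The common-component argument can be verified mechanically: for any utility assignment, EU(A) − EU(B) and EU(C) − EU(D) are the same expression, 0.11 u($1m) − 0.10 u($2.5m) − 0.01 u($0). A short Python check over random utility assignments:

```python
# Allais check: A beats B in expected utility exactly when C beats D.
import random

for _ in range(1000):
    u0, u1m, u25m = sorted(random.random() for _ in range(3))   # u($0) <= u($1m) <= u($2.5m)
    EU_A = u1m
    EU_B = 0.10 * u25m + 0.89 * u1m + 0.01 * u0
    EU_C = 0.11 * u1m + 0.89 * u0
    EU_D = 0.10 * u25m + 0.90 * u0
    assert abs((EU_A - EU_B) - (EU_C - EU_D)) < 1e-9
# So no expected-utility maximizer can have both A ≻ B and D ≻ C.
```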

SLIDE 56

Framing Effects [Tversky and Kahneman]

A disease is expected to kill 600 people. Two alternative programs have been proposed:
Program A: 200 people will be saved.
Program B: with probability 1/3, 600 people will be saved; with probability 2/3, no one will be saved.
Which program would you favor?

The same situation, reframed:
Program C: 400 people will die.
Program D: with probability 1/3, no one will die; with probability 2/3, 600 will die.
Which program would you favor?

Tversky and Kahneman: 72% chose A over B; 22% chose C over D.

SLIDE 57

Prospect Theory

[Figure: the prospect-theory value curve: psychological value as a function of dollar gains and losses; concave for gains, convex and steeper for losses.]

In mixed gambles, loss aversion causes extreme risk-averse choices. In bad choices, diminishing sensitivity causes risk seeking.

SLIDE 59

Reference Points

Consider Anthony and Betty:
• Anthony’s current wealth is $1 million.
• Betty’s current wealth is $4 million.
They are both offered the choice between a gamble and a sure thing:
• Gamble: equal chance to end up owning $1 million or $4 million.
• Sure thing: own $2 million.
What does expected utility theory predict? What does prospect theory predict?

[From D. Kahneman, Thinking, Fast and Slow, 2011, pp. 275–276.]

SLIDE 62

Framing Effects

What do you think of Alan and Ben?
Alan: intelligent — industrious — impulsive — critical — stubborn — envious
Ben: envious — stubborn — critical — impulsive — industrious — intelligent

[From D. Kahneman, Thinking, Fast and Slow, 2011, p. 82.]

SLIDE 64

Framing Effects

Suppose you had bought theatre tickets for $50. When you got to the theatre, you found you had lost the tickets. You have your credit card and can buy equivalent tickets for $50. Do you buy the replacement tickets on your credit card?

Now suppose you had $50 in your pocket to buy tickets. When you got to the theatre, you found you had lost the $50. You have your credit card and can buy the tickets for $50. Do you buy the tickets on your credit card?

[From R.M. Dawes, Rational Choice in an Uncertain World, 1988.]

SLIDE 67

The Ellsberg Paradox

Two bags:
Bag 1: 40 white chips, 30 yellow chips, 30 green chips.
Bag 2: 40 white chips, 60 chips that are yellow or green, in unknown proportion.
What do you prefer:
A: receive $1m if a white or yellow chip is drawn from bag 1
B: receive $1m if a white or yellow chip is drawn from bag 2
C: receive $1m if a white or green chip is drawn from bag 2
What about D: the lottery [0.5 : B, 0.5 : C]? Note that A and D give the same chance of winning, no matter what the proportion in bag 2.

SLIDE 72
St. Petersburg Paradox

What if there is no “best” outcome? Are utilities unbounded? Suppose utilities are unbounded. Then for any outcome oi there is an outcome oi+1 such that u(oi+1) > 2u(oi). Would the agent prefer o1 or the lottery [0.5 : o2, 0.5 : 0], where 0 is the worst outcome? Is it rational to gamble o1 on a coin toss to get o2? Is it rational to gamble o2 on a coin toss to get o3? Is it rational to gamble o3 on a coin toss to get o4? What will eventually happen?

SLIDE 74

Predictor Paradox

Two boxes:
Box 1: contains $10,000.
Box 2: contains either $0 or $1m.
You can either choose both boxes or just box 2. The “predictor” has put $1m in box 2 if he thinks you will take just box 2, and $0 in box 2 if he thinks you will take both. The predictor has been correct in previous predictions. Do you take both boxes or just box 2?

SLIDE 75

Making Decisions Under Uncertainty

What an agent should do depends on:
• The agent’s ability — what options are available to it.
• The agent’s beliefs — the ways the world could be, given the agent’s knowledge. Sensing updates the agent’s beliefs.
• The agent’s preferences — what the agent wants, and its tradeoffs when there are risks.
Decision theory specifies how to trade off the desirability and probabilities of the possible outcomes for competing actions.

SLIDE 76

Decision Variables

Decision variables are like random variables that an agent gets to choose a value for. A possible world specifies a value for each decision variable and each random variable. For each assignment of values to all decision variables, the measures of the worlds satisfying that assignment sum to 1. The probability of a proposition is undefined unless the agent conditions on the values of all decision variables.

SLIDE 77

Decision Tree for Delivery Robot

The robot can choose to wear pads to protect itself or not. The robot can choose to go the short way past the stairs or a long way that reduces the chance of an accident. There is one random variable: whether there is an accident.

[Decision tree over the eight possible worlds:]
wear pads, short way, accident → w0: moderate damage
wear pads, short way, no accident → w1: quick, extra weight
wear pads, long way, accident → w2: moderate damage
wear pads, long way, no accident → w3: slow, extra weight
don’t wear pads, short way, accident → w4: severe damage
don’t wear pads, short way, no accident → w5: quick, no weight
don’t wear pads, long way, accident → w6: severe damage
don’t wear pads, long way, no accident → w7: slow, no weight

SLIDE 78

Expected Values

The expected value of a function of possible worlds is its average value, weighting possible worlds by their probability. Suppose f(ω) is the value of function f on world ω.

◮ The expected value of f is

E(f) = ∑_{ω∈Ω} P(ω) × f(ω)

◮ The conditional expected value of f given e is

E(f | e) = ∑_{ω⊨e} P(ω | e) × f(ω)

SLIDE 79

Single decisions

Given a single decision variable D, the agent can choose D = di for any di ∈ dom(D). The expected utility of decision D = di is E(u | D = di), where u(ω) is the utility of world ω. An optimal single decision is a decision D = dmax whose expected utility is maximal:

E(u | D = dmax) = max_{di ∈ dom(D)} E(u | D = di)

SLIDE 80

Single-stage decision networks

Extend belief networks with:
• Decision nodes, which the agent chooses the value for. The domain is the set of possible actions. Drawn as a rectangle.
• A utility node, whose parents are the variables on which the utility depends. Drawn as a diamond.

[Network: decision nodes Which Way and Wear Pads; random variable Accident with parent Which Way; Utility with parents Which Way, Accident, and Wear Pads.]

This shows explicitly which nodes affect whether there is an accident.

SLIDE 83

Finding an optimal decision

Suppose the random variables are X1, . . . , Xn, and the utility depends on Xi1, . . . , Xik. Then

E(u | D) = ∑_{X1,...,Xn} P(X1, . . . , Xn | D) × u(Xi1, . . . , Xik)
         = ∑_{X1,...,Xn} ∏_{i=1}^{n} P(Xi | parents(Xi)) × u(Xi1, . . . , Xik)

To find an optimal decision:
◮ Create a factor for each conditional probability and for the utility.
◮ Sum out all of the random variables.
◮ This creates a factor on D that gives the expected utility for each value of D.
◮ Choose the value of D with the maximum value in the factor.

SLIDE 84

Example Initial Factors

P(Accident | Which Way):

Which Way   Accident   Value
long        true       0.01
long        false      0.99
short       true       0.2
short       false      0.8

Utility(Which Way, Accident, Wear Pads):

Which Way   Accident   Wear Pads   Value
long        true       true        30
long        true       false       0
long        false      true        75
long        false      false       80
short       true       true        35
short       true       false       3
short       false      true        95
short       false      false       100
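
Summing out Accident and then maximizing can be scripted directly from these two factors. A minimal sketch in Python (the variable names are mine); its output matches the table on the next slide:

```python
# f(way, pads) = sum over acc of P(acc | way) * u(way, acc, pads)
P_acc = {("long", True): 0.01, ("long", False): 0.99,
         ("short", True): 0.2, ("short", False): 0.8}
u = {("long", True, True): 30,  ("long", True, False): 0,
     ("long", False, True): 75, ("long", False, False): 80,
     ("short", True, True): 35, ("short", True, False): 3,
     ("short", False, True): 95, ("short", False, False): 100}

eu = {(way, pads): sum(P_acc[(way, acc)] * u[(way, acc, pads)]
                       for acc in (True, False))
      for way in ("long", "short") for pads in (True, False)}

print(eu)                    # {('long', True): 74.55, ('long', False): 79.2,
                             #  ('short', True): 83.0, ('short', False): 80.6}
print(max(eu, key=eu.get))   # ('short', True): go the short way, wearing pads
```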

SLIDE 85

After summing out Accident

Which Way   Wear Pads   Value
long        true        74.55
long        false       79.2
short       true        83.0
short       false       80.6

SLIDE 86

Decision Networks

• flat or modular or hierarchical
• explicit states or features or individuals and relations
• static or finite stage or indefinite stage or infinite stage
• fully observable or partially observable
• deterministic or stochastic dynamics
• goals or complex preferences
• single agent or multiple agents
• knowledge is given or knowledge is learned
• perfect rationality or bounded rationality

SLIDE 87

Sequential Decisions

An intelligent agent doesn’t carry out a multi-step plan ignoring information it receives between actions. A more typical scenario is that the agent observes, acts, observes, acts, . . . Subsequent actions can depend on what is observed, and what is observed depends on previous actions. Often the sole reason for carrying out an action is to provide information for future actions; for example, diagnostic tests or spying.

SLIDE 88

Sequential decision problems

A sequential decision problem consists of a sequence of decision variables D1, . . . , Dn. Each Di has an information set of variables parents(Di), whose value will be known at the time decision Di is made.

SLIDE 89

Decision Networks

A decision network is a graphical representation of a finite sequential decision problem, with 3 types of nodes:
• A random variable is drawn as an ellipse. Arcs into the node represent probabilistic dependence.
• A decision variable is drawn as a rectangle. Arcs into the node represent information available when the decision is made.
• A utility node is drawn as a diamond. Arcs into the node represent variables that the utility depends on.

SLIDE 90

Umbrella Decision Network

[Network: Weather is a random variable and the parent of Forecast; Umbrella is a decision node with parent Forecast; Utility has parents Weather and Umbrella.]

You don’t get to observe the weather when you have to decide whether to take your umbrella. You do get to observe the forecast.

SLIDE 91

Decision Network for the Alarm Problem

[Network: random variables Tampering, Fire, Alarm, Leaving, Report, Smoke, and SeeSmoke; decision nodes Check Smoke and Call; a single Utility node.]

SLIDE 92

No-forgetting

A no-forgetting decision network is a decision network where:
• The decision nodes are totally ordered. This is the order in which the actions will be taken.
• All decision nodes that come before Di are parents of decision node Di. Thus the agent remembers its previous actions.
• Any parent of a decision node is a parent of subsequent decision nodes. Thus the agent remembers its previous observations.

SLIDE 93

What should an agent do?

What an agent should do at any time depends on what it will do in the future. What an agent does in the future depends on what it did before.

SLIDE 94

Policies

A policy specifies what an agent should do under each circumstance. A policy is a sequence δ1, . . . , δn of decision functions δi : dom(parents(Di)) → dom(Di). This policy means that when the agent has observed O ∈ dom(parents(Di)), it will do δi(O).

SLIDE 95

Expected Utility of a Policy

Possible world ω satisfies policy δ, written ω ⊨ δ, if the world assigns the value to each decision node that the policy specifies. The expected utility of policy δ is

E(u | δ) = ∑_{ω⊨δ} u(ω) × P(ω)

An optimal policy is one with the highest expected utility.

SLIDE 97

Finding an optimal policy

Create a factor for each conditional probability table and a factor for the utility. Repeat:
◮ Sum out random variables that are not parents of a decision node.
◮ Select a variable D that appears only in one factor f, together with (some of) its parents.
◮ Eliminate D by maximizing. This returns:
   ◮ an optimal decision function for D: arg max_D f
   ◮ a new factor: max_D f
until there are no more decision nodes. Then sum out the remaining random variables and multiply the remaining factors: this gives the expected utility of an optimal policy.

SLIDE 98

Initial factors for the Umbrella Decision

P(Weather):

Weather   Value
norain    0.7
rain      0.3

P(Fcast | Weather):

Weather   Fcast    Value
norain    sunny    0.7
norain    cloudy   0.2
norain    rainy    0.1
rain      sunny    0.15
rain      cloudy   0.25
rain      rainy    0.6

Utility(Weather, Umb):

Weather   Umb     Value
norain    take    20
norain    leave   100
rain      take    70
rain      leave   0

SLIDE 99

Eliminating By Maximizing

f(Fcast, Umb), with Weather summed out:

Fcast    Umb     Value
sunny    take    12.95
sunny    leave   49.0
cloudy   take    8.05
cloudy   leave   14.0
rainy    take    14.0
rainy    leave   7.0

max_Umb f:

Fcast    Value
sunny    49.0
cloudy   14.0
rainy    14.0

arg max_Umb f:

Fcast    Umb
sunny    leave
cloudy   leave
rainy    take
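
The whole elimination can be scripted from the factors two slides back. A minimal sketch in Python (names are mine), reproducing the f, max, and arg max tables above:

```python
# Umbrella network: sum out Weather, then eliminate Umbrella by maximizing.
P_w = {"norain": 0.7, "rain": 0.3}
P_f = {("norain", "sunny"): 0.7, ("norain", "cloudy"): 0.2, ("norain", "rainy"): 0.1,
       ("rain", "sunny"): 0.15, ("rain", "cloudy"): 0.25, ("rain", "rainy"): 0.6}
u = {("norain", "take"): 20, ("norain", "leave"): 100,
     ("rain", "take"): 70, ("rain", "leave"): 0}
forecasts, choices = ("sunny", "cloudy", "rainy"), ("take", "leave")

f = {(fc, umb): sum(P_w[w] * P_f[(w, fc)] * u[(w, umb)] for w in P_w)
     for fc in forecasts for umb in choices}

policy = {fc: max(choices, key=lambda umb: f[(fc, umb)]) for fc in forecasts}
value = sum(max(f[(fc, umb)] for umb in choices) for fc in forecasts)
print(policy)   # {'sunny': 'leave', 'cloudy': 'leave', 'rainy': 'take'}
print(value)    # 77.0: expected utility of the optimal policy
```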

SLIDE 100

Exercise

[Network: random variables Disease, Symptoms, Test Result, and Outcome; decision nodes Test and Treatment; a Utility node.]

What are the factors? Which random variables get summed out first? Which decision variable is eliminated? What factor is created? Then what is eliminated (and how)? What factors are created after maximization?

SLIDE 105

Complexity of finding an optimal policy

Suppose decision D has k binary parents and b possible actions:
• there are 2^k assignments of values to the parents
• there are b^(2^k) different decision functions
If there are multiple decision nodes:
• the number of policies is the product of the numbers of decision functions
• the number of optimizations in the dynamic programming is the sum of the numbers of assignments of values to parents
The dynamic programming algorithm is much more efficient than searching through policy space.
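
For concreteness, a quick count in Python with invented sizes: two decisions, one with 2 binary parents and 3 actions, one with 3 binary parents and 2 actions.

```python
# Decision functions and policies vs. dynamic-programming optimizations.
decisions = [(2, 3), (3, 2)]    # (k binary parents, b actions), illustrative

policies, optimizations = 1, 0
for k, b in decisions:
    assignments = 2 ** k              # parent contexts for this decision
    policies *= b ** assignments      # decision functions for this decision
    optimizations += assignments      # one maximization per parent context

print(policies)        # 3**4 * 2**8 = 20736 policies to search naively
print(optimizations)   # only 4 + 8 = 12 maximizations in dynamic programming
```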

SLIDE 109

Value of Information

The value of information X for decision D is the utility of the network with an arc from X to D (plus no-forgetting arcs) minus the utility of the network without the arc. The value of information is always non-negative. It is positive only if the agent changes its action depending on X. The value of information provides a bound on how much an agent should be prepared to pay for a sensor: how much is a better weather forecast worth? We need to be careful when adding an arc would create a cycle; e.g., how much would it be worth knowing whether the fire truck will arrive quickly when deciding whether to call them?
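
In the umbrella network this is a one-line computation from the factor f computed earlier: with the Forecast → Umbrella arc the agent chooses per forecast; without it, it must fix one action for all forecasts. A small Python sketch reusing that table:

```python
# Value of the forecast information for the umbrella decision.
f = {("sunny", "take"): 12.95, ("sunny", "leave"): 49.0,
     ("cloudy", "take"): 8.05, ("cloudy", "leave"): 14.0,
     ("rainy", "take"): 14.0,  ("rainy", "leave"): 7.0}
forecasts, choices = ("sunny", "cloudy", "rainy"), ("take", "leave")

with_arc = sum(max(f[(fc, umb)] for umb in choices) for fc in forecasts)     # 77.0
without_arc = max(sum(f[(fc, umb)] for fc in forecasts) for umb in choices)  # 70.0
print(with_arc - without_arc)  # 7.0: never pay more than this for the forecast
```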

SLIDE 110

Value of Control

The value of control of a variable X is the value of the network when you make X a decision variable (and add no-forgetting arcs) minus the value of the network when X is a random variable. You need to be explicit about what information is available when you control X. If you control X without observing it, controlling X can be worse than observing X; e.g., controlling a thermometer. If you keep the parents the same, the value of control is always non-negative.

SLIDE 111

Agents as Processes

Agents carry out actions:
• forever: infinite horizon
• until some stopping criterion is met: indefinite horizon
• for a finite and fixed number of steps: finite horizon

SLIDE 112

Decision-theoretic Planning

What should an agent do when:
• it gets rewards (and punishments) and tries to maximize the rewards received;
• actions can be stochastic, so the outcome of an action can’t be fully predicted;
• there is a model that specifies the (probabilistic) outcomes of actions and the rewards;
• the world is fully observable?

SLIDE 113

Initial Assumptions

• flat or modular or hierarchical
• explicit states or features or individuals and relations
• static or finite stage or indefinite stage or infinite stage
• fully observable or partially observable
• deterministic or stochastic dynamics
• goals or complex preferences
• single agent or multiple agents
• knowledge is given or knowledge is learned
• perfect rationality or bounded rationality

SLIDE 126

World State

The world state is the information such that, if the agent knew the world state, no information about the past would be relevant to the future. This is the Markovian assumption. If Si is the state at time i and Ai is the action at time i:

P(St+1 | S0, A0, . . . , St, At) = P(St+1 | St, At)

P(s′ | s, a) is the probability that the agent will be in state s′ immediately after doing action a in state s. The dynamics is stationary if the distribution is the same for each time point.

SLIDE 127

Decision Processes

A Markov decision process augments a Markov chain with actions and values:

[Diagram: a chain of states S0, S1, S2, S3, with actions A0, A1, A2 influencing each transition and rewards R1, R2, R3 received along the way.]

SLIDE 128

Markov Decision Processes

An MDP consists of:
• a set S of states
• a set A of actions
• P(St+1 | St, At), which specifies the dynamics
• R(St, At, St+1), which specifies the reward at time t; R(s, a, s′) is the expected reward received when the agent is in state s, does action a, and ends up in state s′
• γ, the discount factor

SLIDE 129

Example: to exercise or not?

Each week Sam has to decide whether to exercise or not:
States: {fit, unfit}
Actions: {exercise, relax}

Dynamics:

State   Action     P(fit | State, Action)
fit     exercise   0.99
fit     relax      0.7
unfit   exercise   0.2
unfit   relax      0.0

Reward (does not depend on the resulting state):

State   Action     Reward
fit     exercise   8
fit     relax      10
unfit   exercise
unfit   relax      5

SLIDE 130

Example: Simple Grid World

[Figure: a 10×10 grid world. Four special squares carry rewards +10, +3, −5, and −10; the −1 values mark the penalty for crashing into a wall.]

SLIDE 131

Grid World Model

Actions: up, down, left, right. There are 100 states, corresponding to the positions of the robot. The robot goes in the commanded direction with probability 0.7, and in one of the other three directions with probability 0.1 each. If it crashes into an outside wall, it remains in its current position and gets a reward of −1. There are four special rewarding states; the agent gets the reward when leaving such a state.

SLIDE 132

Planning Horizons

The planning horizon is how far ahead the planner looks to make a decision.

If the robot gets flung to one of the corners at random after leaving a positive (+10 or +3) reward state:
◮ the process never halts
◮ infinite horizon

If the robot instead gets the +10 or +3 in the state and then stays there getting no reward (an absorbing state):
◮ the robot will eventually reach an absorbing state
◮ indefinite horizon

SLIDE 133

Information Availability

What information is available when the agent decides what to do?
• Fully-observable MDP: the agent gets to observe St when deciding on action At.
• Partially-observable MDP (POMDP): the agent has some noisy sensor of the state, and needs to remember its sensing and acting history.
[This lecture only considers fully-observable MDPs.]

SLIDE 134

Policies

A stationary policy is a function π : S → A. Given a state s, π(s) specifies the action that an agent following π will do. An optimal policy is one with maximum expected discounted reward. For a fully-observable MDP with stationary dynamics and rewards, and an infinite or indefinite horizon, there is always an optimal stationary policy.

SLIDE 136

Example: to exercise or not?

Each week Sam has to decide whether to exercise or not:
States: {fit, unfit}
Actions: {exercise, relax}
How many stationary policies are there? What are they? For the grid world with 100 states and 4 actions, how many stationary policies are there? (In general there are |A|^|S| stationary policies: 2² = 4 for Sam, 4¹⁰⁰ for the grid world.)

SLIDE 137

Value of a Policy

Given a policy π:
• Qπ(s, a), where a is an action and s is a state, is the expected value of doing a in state s and then following policy π.
• Vπ(s), where s is a state, is the expected value of following policy π in state s.

Qπ and Vπ can be defined mutually recursively:

Qπ(s, a) = ∑_{s′} P(s′ | s, a) (R(s, a, s′) + γVπ(s′))
Vπ(s) = Qπ(s, π(s))

SLIDE 138

Value of the Optimal Policy

• Q∗(s, a), where a is an action and s is a state, is the expected value of doing a in state s and then following the optimal policy.
• V∗(s), where s is a state, is the expected value of following the optimal policy in state s.

Q∗ and V∗ can be defined mutually recursively:

Q∗(s, a) = ∑_{s′} P(s′ | s, a) (R(s, a, s′) + γV∗(s′))
V∗(s) = max_a Q∗(s, a)
π∗(s) = arg max_a Q∗(s, a)

SLIDE 139

Value Iteration

Let Vk and Qk be the k-step lookahead value and Q functions. Idea: given an estimate Vi of the i-step lookahead value function, determine the (i + 1)-step lookahead value function. Set V0 arbitrarily, then compute Qi+1 and Vi+1 from Vi. This converges exponentially fast (in k) to the optimal value function; the error reduces proportionally to γ^k/(1 − γ).
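
A runnable sketch of value iteration on Sam's exercise MDP (Python). Two assumptions, since they are not fixed by the slides: γ = 0.9, and reward 0 for (unfit, exercise), the cell left blank in the earlier table.

```python
# Value iteration: V_{i+1}(s) = max_a [ R(s, a) + gamma * sum_s' P(s'|s, a) V_i(s') ]
gamma = 0.9                                   # assumed discount factor
states, actions = ("fit", "unfit"), ("exercise", "relax")
P_fit = {("fit", "exercise"): 0.99, ("fit", "relax"): 0.7,
         ("unfit", "exercise"): 0.2, ("unfit", "relax"): 0.0}
R = {("fit", "exercise"): 8, ("fit", "relax"): 10,
     ("unfit", "exercise"): 0,                # blank on the slide; 0 assumed
     ("unfit", "relax"): 5}

def q(s, a, V):
    p = P_fit[(s, a)]                         # reward doesn't depend on s'
    return R[(s, a)] + gamma * (p * V["fit"] + (1 - p) * V["unfit"])

V = {s: 0.0 for s in states}
for _ in range(200):                          # gamma**200 / (1 - gamma) is tiny
    V = {s: max(q(s, a, V) for a in actions) for s in states}

policy = {s: max(actions, key=lambda a: q(s, a, V)) for s in states}
print(V)        # approx {'fit': 77.5, 'unfit': 50.0}
print(policy)   # {'fit': 'exercise', 'unfit': 'relax'} under these assumptions
```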

SLIDE 140

Asynchronous Value Iteration

The agent doesn’t need to sweep through all the states, but can update the value functions for each state individually. This converges to the optimal value functions, if each state and action is visited infinitely often in the limit. It can either store V [s] or Q[s, a].

SLIDE 141

Asynchronous VI: storing V [s]

Repeat forever:
◮ Select a state s.
◮ V[s] ← max_a ∑_{s′} P(s′ | s, a) (R(s, a, s′) + γV[s′])
SLIDE 142

Asynchronous VI: storing Q[s, a]

Repeat forever:
◮ Select a state s and an action a.
◮ Q[s, a] ← ∑_{s′} P(s′ | s, a) (R(s, a, s′) + γ max_{a′} Q[s′, a′])
SLIDE 144

Policy Iteration

Set π0 arbitrarily, let i = 0.
Repeat:
◮ Evaluate Qπi(s, a).
◮ Let πi+1(s) = arg max_a Qπi(s, a).
◮ Set i = i + 1.
until πi = πi−1.

Evaluating Qπi(s, a) means finding a solution to a set of |S| × |A| linear equations with |S| × |A| unknowns. It can also be approximated iteratively.
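
A compact sketch of policy iteration on the same exercise MDP (Python, same assumptions as the value-iteration sketch: γ = 0.9 and reward 0 for (unfit, exercise)). With two states it is easy to evaluate a policy exactly by solving the 2×2 linear system for Vπ instead of the |S| × |A| system for Qπ:

```python
# Policy iteration: exact evaluation (2x2 linear solve), then greedy improvement.
gamma = 0.9
states, actions = ("fit", "unfit"), ("exercise", "relax")
P_fit = {("fit", "exercise"): 0.99, ("fit", "relax"): 0.7,
         ("unfit", "exercise"): 0.2, ("unfit", "relax"): 0.0}
R = {("fit", "exercise"): 8, ("fit", "relax"): 10,
     ("unfit", "exercise"): 0, ("unfit", "relax"): 5}    # 0 assumed, as before

def evaluate(pi):
    # Solve  V_f = r_f + g*(a*V_f + (1-a)*V_u),  V_u = r_u + g*(b*V_f + (1-b)*V_u)
    a, b = P_fit[("fit", pi["fit"])], P_fit[("unfit", pi["unfit"])]
    r_f, r_u = R[("fit", pi["fit"])], R[("unfit", pi["unfit"])]
    c1, c2 = 1 - gamma * a, gamma * (1 - a)     #  c1*V_f - c2*V_u = r_f
    c3, c4 = gamma * b, 1 - gamma * (1 - b)     # -c3*V_f + c4*V_u = r_u
    V_f = (r_f * c4 + c2 * r_u) / (c1 * c4 - c2 * c3)
    return {"fit": V_f, "unfit": (r_u + c3 * V_f) / c4}

pi = {s: "relax" for s in states}
while True:
    V = evaluate(pi)
    q = lambda s, a: R[(s, a)] + gamma * (P_fit[(s, a)] * V["fit"]
                                          + (1 - P_fit[(s, a)]) * V["unfit"])
    new_pi = {s: max(actions, key=lambda a: q(s, a)) for s in states}
    if new_pi == pi:
        break
    pi = new_pi
print(pi, V)   # same optimal policy as value iteration finds
```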

SLIDE 145

Modified Policy Iteration

Set π[s] arbitrarily.
Set Q[s, a] arbitrarily.
Repeat forever:
◮ Repeat for a while:
   ◮ Select a state s and an action a.
   ◮ Q[s, a] ← ∑_{s′} P(s′ | s, a) (R(s, a, s′) + γQ[s′, π[s′]])
◮ π[s] ← arg max_a Q[s, a]

SLIDE 147

Q, V, π, R

Q∗(s, a) = ∑_{s′} P(s′ | a, s) (R(s, a, s′) + γV∗(s′))
V∗(s) = max_a Q∗(s, a)
π∗(s) = arg max_a Q∗(s, a)

Let R(s, a) = ∑_{s′} P(s′ | a, s) R(s, a, s′). Then:

Q∗(s, a) = R(s, a) + γ ∑_{s′} P(s′ | a, s) V∗(s′)
