
Intro to AI (2nd Part): Decision-Making

Paolo Turrini
Department of Computing, Imperial College London

Introduction to Artificial Intelligence, 2nd Part

Outline

- Lotteries (and how to win them)
- Risky moves
- Maybe "Time", but I very much doubt it

Lotteries
(and how to win them)

The main reference

Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Chapters 16-17.

Rewards

Sensors: Breeze, Glitter, Smell
Actuators: Turn L/R, Go, Grab, Release, Shoot, Climb
Rewards: +1000 escaping with the gold; −1000 dying; −10 using the arrow; −1 walking
Environment:
- Squares adjacent to the Wumpus are smelly
- Squares adjacent to a pit are breezy
- Glitter iff gold is in the same square
- Shooting kills the Wumpus if you are facing it
- Shooting uses up the only arrow
- Grabbing picks up the gold if in the same square
- Releasing drops the gold in the same square



State space

The universe in which the agent moves is a finite set of states S = {s1, . . . , sn}, e.g., the squares in the Wumpus World.

States can also take into account the inner state of the agent, e.g., the knowledge base KB, or the actions they have performed, e.g., climbing out of the cave with the gold.


Utility functions

A utility function is a function u : S → R associating a real number to each state.

Important: utility functions are not the same as money. Utility functions are a representation of happiness, goal satisfaction, fulfilment and the like. They are just a mathematical tool to represent a comparison between outcomes. So altruism, unselfishness, and so forth can be modelled using utility functions. (Paolo Turrini 2016)


Lotteries

A lottery is a probability distribution over the set of states, e.g., for outcomes A1 and A2, and p ∈ [0, 1]:

Lottery A = [p, A1; (1 − p), A2]

L is the set of lotteries over S.
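To make the definition concrete, here is a minimal sketch (not from the slides) representing a lottery as a list of (probability, outcome) pairs and checking that it is a genuine probability distribution:

```python
# A lottery as a list of (probability, outcome) pairs (illustrative sketch):
# probabilities must be non-negative and sum to 1.
def is_lottery(pairs, tol=1e-9):
    probs = [p for p, _ in pairs]
    return all(p >= 0 for p in probs) and abs(sum(probs) - 1.0) <= tol

A = [(0.5, "A1"), (0.5, "A2")]                  # A = [p, A1; (1 - p), A2], p = 0.5
print(is_lottery(A))                            # True
print(is_lottery([(0.7, "A1"), (0.7, "A2")]))   # False: mass sums to 1.4
```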


Simple Lotteries

Observation: a state s ∈ S can be seen as a lottery where s is assigned probability 1 and all other states probability 0, e.g., A = [1, A1; 0, A2; 0, A3; . . .]: we get A1 with probability 1 and the rest with probability 0.


Compound Lotteries

A lottery over the set of lotteries is itself a lottery. For instance, taking A = [p1, A1; p2, A2; . . . ; pn, An]:

[q1, A; q2, B; . . . ; qn, C]
= [q1, [p1, A1; p2, A2; . . . ; pn, An]; q2, B; . . . ; qn, C]
= [q1 p1, A1; q1 p2, A2; . . . ; q1 pn, An; q2, B; . . . ; qn, C] = . . .

Compound lotteries can be reduced to simple lotteries.
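The reduction above can be sketched as code, under the same pair-list representation assumed earlier (an outcome that is itself a list is a nested lottery):

```python
# Reducing a compound lottery to a simple one (sketch): nested lotteries
# have their probabilities multiplied through and merged per final state.
def flatten(lottery):
    acc = {}
    def go(pairs, weight):
        for p, outcome in pairs:
            if isinstance(outcome, list):       # nested lottery: recurse
                go(outcome, weight * p)
            else:                               # plain state: accumulate mass
                acc[outcome] = acc.get(outcome, 0.0) + weight * p
    go(lottery, 1.0)
    return [(prob, state) for state, prob in sorted(acc.items())]

inner = [(0.5, "A1"), (0.5, "A2")]              # A = [0.5, A1; 0.5, A2]
compound = [(0.4, inner), (0.6, "B")]           # [0.4, A; 0.6, B]
print(flatten(compound))                        # [(0.2, 'A1'), (0.2, 'A2'), (0.6, 'B')]
```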


Expected Utility

Let A = [p1, A1; p2, A2; . . . ; pn, An] be a lottery. The expected utility of A is

u(A) = Σi pi × u(Ai)

e.g., rolling a fair six-sided die, I win 27k if a 6 comes up and lose 3k otherwise. The expected utility is

(1/6) × 27k − (5/6) × 3k = 2k.
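The die example can be recomputed directly from the definition (a sketch; outcome labels are illustrative):

```python
# Expected utility of a lottery: u(A) = sum_i p_i * u(A_i).
def expected_utility(lottery, u):
    return sum(p * u[outcome] for p, outcome in lottery)

u = {"win 27k": 27_000, "lose 3k": -3_000}
bet = [(1/6, "win 27k"), (5/6, "lose 3k")]
print(expected_utility(bet, u))   # ≈ 2000.0, i.e., 2k
```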


Humans and Expected Utility

Tversky and Kahneman's Prospect Theory: humans have complex utility estimates (risk aversion, satisfaction levels).

Warning! Controversial statement: Prospect Theory does not refute the principle of maximization of expected utility. We can incorporate risk aversion and satisfaction as properties of outcomes.

Figure: Typical empirical data


Preferences

A preference relation ≿ is a relation ≿ ⊆ L × L over the set of lotteries.

A ≿ B means that lottery A is weakly preferred to lottery B.
A ≻ B, defined as (A ≿ B and not B ≿ A), means that lottery A is strictly preferred to lottery B.
A ∼ B, defined as (A ≿ B and B ≿ A), means that lottery A is the same as lottery B value-wise (indifference).


Rational preferences

Let A, B, C be three states and let p, q ∈ [0, 1]. A preference relation makes sense if it satisfies the following constraints:

Orderability: (A ≻ B) ∨ (B ∼ A) ∨ (B ≻ A)
Transitivity: (A ≻ B) ∧ (B ≻ C) ⇒ (A ≻ C)
Continuity: A ≻ B ≻ C ⇒ ∃p [p, A; 1 − p, C] ∼ B
Substitutability: A ∼ B ⇒ [p, A; 1 − p, C] ∼ [p, B; 1 − p, C]
Monotonicity: A ≻ B ⇒ (p ≥ q ⇔ [p, A; 1 − p, B] ≿ [q, A; 1 − q, B])
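On a finite set of outcomes, constraints like transitivity can be checked mechanically. A minimal sketch (the relation encoding is an assumption, not from the slides):

```python
from itertools import permutations

# Check transitivity of a strict preference relation over a finite set of
# outcomes, where pref(a, b) means a is strictly preferred to b.
def transitive(outcomes, pref):
    return all(pref(a, c)
               for a, b, c in permutations(outcomes, 3)
               if pref(a, b) and pref(b, c))

cycle = {("A", "B"), ("B", "C"), ("C", "A")}    # A over B over C over A: a cycle
linear = {("A", "B"), ("B", "C"), ("A", "C")}   # A over B over C: fine
print(transitive("ABC", lambda x, y: (x, y) in cycle))    # False
print(transitive("ABC", lambda x, y: (x, y) in linear))   # True
```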


Rational preferences contd.

Violating the constraints leads to self-evident irrationality. Take transitivity:

- If B ≻ C, then an agent who has C would pay (say) 1 cent to get B.
- If A ≻ B, then an agent who has B would pay (say) 1 cent to get A.
- If C ≻ A, then an agent who has A would pay (say) 1 cent to get C.

The agent keeps trading in a circle, paying at every step: a "money pump".


Representation Theorem

Theorem (Ramsey, 1931; von Neumann and Morgenstern, 1944). A preference relation ≿ makes sense if and only if there exists a real-valued function u such that:

u(A) ≥ u(B) ⇔ A ≿ B
u([p1, S1; . . . ; pn, Sn]) = Σi pi u(Si)

[⇐] By contraposition: e.g., pick transitivity and show that if the relation is not transitive there is no way of associating numbers to outcomes.

[⇒] We use the axioms to show that there are infinitely many functions that satisfy them, but they are all "equivalent" to a unique real-valued utility function.


Representation Theorem

Michael Maschler, Eilon Solan and Shmuel Zamir. Game Theory (Ch. 2). Cambridge University Press, 2013.

The main message: give me any order on outcomes that makes sense and I can turn it into a utility function!
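The main message in miniature, as a sketch: over a finite set of outcomes (no lotteries), any total order can be turned into a utility function simply by ranking. Note this pins down only the ordering; matching expected-utility behaviour over lotteries is what the full set of axioms buys on top. The outcome names are illustrative.

```python
# Turn a total order on outcomes (worst to best) into a utility function
# via ranks: u(A) >= u(B) exactly when A is weakly preferred to B.
def utility_from_order(worst_to_best):
    return {state: rank for rank, state in enumerate(worst_to_best)}

u = utility_from_order(["dying", "walking out empty-handed", "escaping with gold"])
print(u["escaping with gold"] > u["walking out empty-handed"] > u["dying"])   # True
```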


Multicriteria decision-making

Certain outcomes seem difficult to compare:

- what factors are more important?
- have we considered all the relevant ones?
- do factors interfere with one another?

In other situations the utility function may be updated because of new incoming information (e.g., evaluating non-terminal positions in a long extensive game like Chess or Go).

Multicriteria decision-making

Figure: Deep Blue-Kasparov 1996, final game. Material favours Black but the position is hopeless.


Multicriteria decision-making

How can we handle utility functions of many variables X1 . . . Xn? E.g., what is U(king safety, material advantage, control of the centre)?

We need to find ways to compare bundles of factors, which might be difficult in general (strict dominance, stochastic dominance). Some search methods avoid multicriteria altogether: Monte Carlo Tree Search generates random endgames. We assume there is a way of assigning a utility function to bundles of factors and therefore comparing them.


Rationality and expected utility

Robert J. Aumann, Nobel Prize winner in Economics:

"A person's behavior is rational if it is in his best interests, given his information"

Choose an action that maximises the expected utility.

Beliefs and Expected Utility

Rewards: −1000 for dying, 0 for any other square.

What's the expected utility of going to [3, 1], [2, 2], [1, 3]?

Using conditional independence contd.

P(P1,3 | known, b) = α′ ⟨0.2(0.04 + 0.16 + 0.16), 0.8(0.04 + 0.16)⟩ ≈ ⟨0.31, 0.69⟩
P(P2,2 | known, b) ≈ ⟨0.86, 0.14⟩


Beliefs and expected utility

The expected utility u(1, 3) of the action (1, 3) of going to [1, 3] from an explored adjacent square is:

u(1, 3) = u[0.31, −1000; 0.69, 0] = −310
u(3, 1) = u(1, 3)
u(2, 2) = u[0.86, −1000; 0.14, 0] = −860

Clearly going to [2, 2] from either [1, 2] or [2, 1] is irrational. Either going to [1, 3] or [3, 1] is the rational choice.
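These numbers follow directly from the expected-utility definition, assuming the rewards above (−1000 for dying, 0 otherwise):

```python
# Recompute the slide's expected utilities from the pit probabilities.
def expected_utility(lottery):
    return sum(p * reward for p, reward in lottery)

u_13 = expected_utility([(0.31, -1000), (0.69, 0)])   # going to [1,3] (or [3,1])
u_22 = expected_utility([(0.86, -1000), (0.14, 0)])   # going to [2,2]
print(u_13, u_22)   # -310.0 -860.0
```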

Risky moves

Actuators

Sensors: Breeze, Glitter, Smell
Actuators: Turn L/R, Go, Grab, Release, Shoot, Climb
Rewards: +1000 escaping with the gold; −1000 dying; −10 using the arrow; −1 walking
Environment:
- Squares adjacent to the Wumpus are smelly
- Squares adjacent to a pit are breezy
- Glitter iff gold is in the same square
- Shooting kills the Wumpus if you are facing it
- Shooting uses up the only arrow
- Grabbing picks up the gold if in the same square
- Releasing drops the gold in the same square



Deterministic actions

Actions in the Wumpus World are deterministic. If I want to go from [2, 3] to [2, 2] I just go:

P([2, 2] | [2, 3], (2, 2)) = 1
slide-95
SLIDE 95

Intro to AI (2nd Part)

Stochastic actions

The result of performing a in state s is a lottery over S, i.e., probability distribution over the set of all possible states.

Paolo Turrini Intro to AI (2nd Part)

slide-96
SLIDE 96

Intro to AI (2nd Part)

Stochastic actions

The result of performing a in state s is a lottery over S, i.e., probability distribution over the set of all possible states. (s, a) = [p1, A1; p2, A2; . . . pn, An]

Paolo Turrini Intro to AI (2nd Part)

slide-97
SLIDE 97

Intro to AI (2nd Part)

Stochastic actions

The result of performing a in state s is a lottery over S, i.e., probability distribution over the set of all possible states. (s, a) = [p1, A1; p2, A2; . . . pn, An] e.g., the agent decides to go from [2, 1] to [2, 2] but:

Paolo Turrini Intro to AI (2nd Part)

slide-98
SLIDE 98

Intro to AI (2nd Part)

Stochastic actions

The result of performing a in state s is a lottery over S, i.e., probability distribution over the set of all possible states. (s, a) = [p1, A1; p2, A2; . . . pn, An] e.g., the agent decides to go from [2, 1] to [2, 2] but: Goes to [2, 2] with probability 0.5

Paolo Turrini Intro to AI (2nd Part)

slide-99
SLIDE 99

Intro to AI (2nd Part)

Stochastic actions

The result of performing a in state s is a lottery over S, i.e., probability distribution over the set of all possible states. (s, a) = [p1, A1; p2, A2; . . . pn, An] e.g., the agent decides to go from [2, 1] to [2, 2] but: Goes to [2, 2] with probability 0.5 Goes to [3, 1] with probability 0.3

Paolo Turrini Intro to AI (2nd Part)

slide-100
SLIDE 100

Intro to AI (2nd Part)

Stochastic actions

The result of performing a in state s is a lottery over S, i.e., probability distribution over the set of all possible states. (s, a) = [p1, A1; p2, A2; . . . pn, An] e.g., the agent decides to go from [2, 1] to [2, 2] but: Goes to [2, 2] with probability 0.5 Goes to [3, 1] with probability 0.3 Goes back to [1, 1] with probability 0.1

Paolo Turrini Intro to AI (2nd Part)

slide-101
SLIDE 101

Intro to AI (2nd Part)

Stochastic actions

The result of performing a in state s is a lottery over S, i.e., probability distribution over the set of all possible states. (s, a) = [p1, A1; p2, A2; . . . pn, An] e.g., the agent decides to go from [2, 1] to [2, 2] but: Goes to [2, 2] with probability 0.5 Goes to [3, 1] with probability 0.3 Goes back to [1, 1] with probability 0.1 Bumps his head on the wall and stays in [2, 1] with prob. 0.1

Paolo Turrini Intro to AI (2nd Part)

Stochastic actions

The result of performing a in state s is a lottery over S, i.e., a probability distribution over the set of all possible states:

(s, a) = [p1, A1; p2, A2; . . . ; pn, An]

e.g., the agent decides to go from [2, 1] to [2, 2] but:
- goes to [2, 2] with probability 0.5
- goes to [3, 1] with probability 0.3
- goes back to [1, 1] with probability 0.1
- bumps his head on the wall and stays in [2, 1] with probability 0.1
- goes to any other square with probability 0
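The example above can be encoded as a transition model, a sketch mapping (state, action) pairs to lotteries over successor states (the action name is illustrative):

```python
import random

# The stochastic "go from [2,1] towards [2,2]" action as a transition model.
T = {
    ((2, 1), "go(2,2)"): [(0.5, (2, 2)), (0.3, (3, 1)), (0.1, (1, 1)), (0.1, (2, 1))],
}

def sample_result(state, action, rng=random):
    lottery = T[(state, action)]
    states = [s for _, s in lottery]
    weights = [p for p, _ in lottery]
    return rng.choices(states, weights=weights)[0]   # draw one successor state

print(sample_result((2, 1), "go(2,2)"))   # one of (2, 2), (3, 1), (1, 1), (2, 1)
```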


Beliefs, Expected Utility and Stochastic Actions

Rewards: −1000 for dying, 0 for any other square.

What's the expected utility of going to [3, 1], [2, 2], [1, 3]?



slide-112
SLIDE 112

Intro to AI (2nd Part)

Beliefs, Expected Utility and Stochastic Actions

Let (s, a) = [p1, A1; p2, A2; . . . pn, An] be the result of performing action a in state s, where each Ai is of the form [q1, A1i; q2, A2i, . . . , qn, Ani]. Then the utility of such action is given be: u(s, a) =

  • pi,Ai

pi × u(Ai) The expected utility of each outcome, assuming we have reached it, times the probability of actually reaching it. It is a lottery of lotteries!

Paolo Turrini Intro to AI (2nd Part)
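The "lottery of lotteries" can be evaluated recursively: the utility of a lottery is the probability-weighted utility of its entries, and an entry may itself be a lottery. A minimal sketch, representing a lottery as a list of (probability, outcome) pairs:

```python
def u(lottery):
    """Expected utility of a lottery [(p1, A1), ..., (pn, An)], where each
    Ai is either a numeric reward or itself a nested lottery (a list)."""
    total = 0.0
    for p, outcome in lottery:
        total += p * (u(outcome) if isinstance(outcome, list) else outcome)
    return total

# The lottery faced in square [2, 2]: die (reward -1000) with probability 0.86.
in_2_2 = [(0.86, -1000), (0.14, 0)]
print(u(in_2_2))  # approximately -860
```

Because the recursion bottoms out at numeric rewards, the same function handles both a plain lottery and the nested lottery produced by a stochastic action.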

slide-113
SLIDE 113

Beliefs, Expected Utility and Stochastic Actions

u(1, 3) = 0.8 × u[0.31, −1000; 0.69, 0] + 0.1 × u[1, 0] + 0.1 × u[0.86, −1000; 0.14, 0]
        = 0.8 × (−310) + 0.1 × 0 + 0.1 × (−860)
        = −248 − 86
        = −334

We can get to [2, 2] from two directions, but by symmetry the utility is the same.

slide-120
SLIDE 120

Beliefs, Expected Utility and Stochastic Actions

u(2, 2) = 0.8 × u[0.86, −1000; 0.14, 0] + 0.1 × u[0.31, −1000; 0.69, 0] + 0.1 × u[1, 0]
        = 0.8 × (−860) + 0.1 × (−310) + 0.1 × 0
        = −688 − 31
        = −719

u(1, 3) = u(3, 1) (by symmetry)

Going to [2, 2] is still the irrational choice, but not as bad as before. The rational choice is to go to either [1, 3] or [3, 1]. Naturally, the noisier the action outcomes, the smaller the impact of the reward differences.
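The full comparison can be sketched end to end. This is a hedged sketch under the slides' assumptions: the square coordinates, the pit posteriors, and the 0.8 / 0.1 / 0.1 movement noise are taken from the example, while `DEATH` and `action_utility` are helper names introduced here.

```python
def u(lottery):
    """Expected utility of [(p, outcome), ...]; an outcome may itself be a lottery."""
    return sum(p * (u(a) if isinstance(a, list) else a) for p, a in lottery)

# Death lottery per destination square, using the pit posteriors computed earlier.
DEATH = {
    (1, 3): [(0.31, -1000), (0.69, 0)],
    (3, 1): [(0.31, -1000), (0.69, 0)],
    (2, 2): [(0.86, -1000), (0.14, 0)],
    (1, 1): [(1.0, 0)],                    # known to be safe
}

def action_utility(intended, slip_a, slip_b):
    # 0.8 to the intended square, 0.1 to each of the two slip destinations.
    return u([(0.8, DEATH[intended]), (0.1, DEATH[slip_a]), (0.1, DEATH[slip_b])])

utilities = {
    (1, 3): action_utility((1, 3), (1, 1), (2, 2)),
    (3, 1): action_utility((3, 1), (1, 1), (2, 2)),
    (2, 2): action_utility((2, 2), (1, 3), (1, 1)),
}
best = max(utilities, key=utilities.get)   # (1, 3) or (3, 1), both ~ -334
```

Maximising expected utility picks [1, 3] or [3, 1] over [2, 2], reproducing the slides' conclusion.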

slide-128
SLIDE 128

Summary

Utility, lotteries and preferences
Maximisation of expected utility
Stochastic actions

slide-129
SLIDE 129

What's next

Risky plans: what's the best "strategy" to follow?
Estimating future gains: how patient should we be?