343H: Honors AI
Lecture 7: Expectimax Search 2/6/2014
Kristen Grauman UT-Austin Slides courtesy of Dan Klein, UC-Berkeley Unless otherwise noted
Announcements
PS1 is out, due in 2 weeks
Last time
Adversarial search with game trees
[Figure: minimax tree. A max root over two min nodes; leaves 10, 10, 9, 100; min values 10 and 9; root value 10.]
must reason about impact of their actions when computing value of a state
action
entire tree
iterative deepening) to deal with resource limits.
Imperfect adversaries
[Figure: the same minimax tree. Leaves 10, 10, 9, 100; min values 10 and 9; root value 10.]
Optimal against a perfect player.
Factors of chance
P(T=none) = 0.25, P(T=light) = 0.50, P(T=heavy) = 0.25
L(none) = 20, L(light) = 30, L(heavy) = 60 min
E[ L(T) ] = L(none)*P(none) + L(light)*P(light) + L(heavy)*P(heavy)
E[ L(T) ] = (20 * 0.25) + (30 * 0.5) + (60 * 0.25) = 35 minutes
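The expected commute time above can be checked with a few lines of Python. This is a minimal sketch: the dictionary names `P` and `L` mirror the slide's notation, and the helper `expectation` is an illustrative name, not something defined by the course.

```python
def expectation(values, weights):
    """Weighted average of values; weights are probabilities summing to 1."""
    return sum(v * w for v, w in zip(values, weights))

# Traffic example from the slide: probability and commute length (minutes)
# for each traffic level.
P = {"none": 0.25, "light": 0.50, "heavy": 0.25}
L = {"none": 20, "light": 30, "heavy": 60}

outcomes = list(P)
expected_minutes = expectation([L[t] for t in outcomes],
                               [P[t] for t in outcomes])
print(expected_minutes)  # 35.0
```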
respond randomly
wheels could slip
case outcomes, not worst-case (minimax) outcomes
score under optimal play
the outcome is uncertain
[Figure: expectimax tree. A max root over chance nodes; leaves 10, 10, 9, 100 give chance-node values 10 and 54.5.]
def value(s):
    if s is a terminal node: return utility(s)
    if s is a max node: return maxValue(s)
    if s is an exp node: return expValue(s)

def maxValue(s):
    values = [value(s') for s' in successors(s)]
    return max(values)

def expValue(s):
    values = [value(s') for s' in successors(s)]
    weights = [probability(s') for s' in successors(s)]
    return expectation(values, weights)
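The pseudocode above can be turned into a small runnable sketch. The tree encoding here is an assumption made for illustration: a leaf is a number, a max node is `("max", [children])`, and a chance node is `("exp", [(probability, child), ...])`. The example tree uses the leaves 10, 10, 9, 100 from the figure, with each chance node assumed to be a 50/50 split, which reproduces the values 10 and 54.5.

```python
def value(node):
    """Expectimax value of a node in the assumed tuple encoding."""
    if isinstance(node, (int, float)):   # terminal node: its utility
        return node
    kind, children = node
    if kind == "max":                    # max node: best child value
        return max(value(child) for child in children)
    if kind == "exp":                    # chance node: probability-weighted average
        return sum(p * value(child) for p, child in children)
    raise ValueError(f"unknown node kind: {kind!r}")

# Max root over two chance nodes (uniform probabilities are an assumption).
tree = ("max", [
    ("exp", [(0.5, 10), (0.5, 10)]),   # expected value 10.0
    ("exp", [(0.5, 9), (0.5, 100)]),   # expected value 54.5
])
print(value(tree))  # 54.5
```

Minimax on the same leaves would pick the left branch (worst case 10 versus 9), while expectimax prefers the right branch because its average outcome is higher.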
[Figure: expectimax example trees. Chance-node branch probabilities 1/2, 1/3, 1/6; leaf values 12, 9, 6, 3, 2, 15, 4, 6.]
[Figure: depth-limited expectimax tree with node values 492, 362, 400, 300.]
Estimate of true expectimax value (which would require a lot of work to compute)
[Figure: leaf values 40, 20, 30 become 1600, 400, 900 under the transform x^2; chance-node values are 20 and 25 before the transform, 800 and 650 after.]
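The numbers in the figure suggest that the magnitudes of utilities matter for expectimax, not just their ordering. The sketch below reproduces them under two assumptions that are not visible in the extracted text: each chance node is a 50/50 split, and the left chance node has leaves 0 and 40 (the 0 is inferred because it yields the averages 20 and 800). The function names are illustrative.

```python
def chance_value(leaves):
    """Expected value of a chance node with equally likely leaves."""
    return sum(leaves) / len(leaves)

def best_branch(branches):
    """Return (index of the best chance node, list of chance-node values)."""
    values = [chance_value(b) for b in branches]
    return values.index(max(values)), values

# Assumed tree: max root over two uniform chance nodes.
branches = [[0, 40], [20, 30]]
squared = [[x ** 2 for x in b] for b in branches]  # apply x^2 to every leaf

print(best_branch(branches))  # (1, [20.0, 25.0])
print(best_branch(squared))   # (0, [800.0, 650.0])
```

Squaring preserves the ordering of the leaves, yet the preferred branch flips from the right one to the left one. Minimax would be unaffected by such a monotonic transform; expectimax is not.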
probabilistic model of how the
behave in any state
distribution (roll a die)
require a great deal of computation
adversarial actions are likely!
magically have a distribution to assign probabilities to opponent actions / environment outcomes
Having a probabilistic belief about an agent’s action does not mean that agent is flipping any coins!
Assuming chance when the world is adversarial
Assuming the worst case when it’s not likely
Adapted from Dan Klein
                  | Adversarial Ghost        | Random Ghost
Minimax Pacman    | Won 5/5, Avg Score: 483  | Won 5/5, Avg Score: 493
Expectimax Pacman | Won 1/5                  | Won 5/5, Avg Score: 503
Pacman used depth 4 search with an eval function that avoids trouble.
Ghost used depth 2 search with an eval function that seeks Pacman.
player that moves after each agent
expectations, otherwise like minimax
ExpectiMinimax-Value(state):
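The body of this routine did not survive extraction. As a hedged sketch only (not the slide's own pseudocode), expectiminimax can be written as expectimax with an added min case. The tuple encoding and the example tree below are assumptions made for illustration: leaves are numbers, `("max", [...])` and `("min", [...])` hold child nodes, and `("exp", [...])` holds `(probability, child)` pairs.

```python
def expectiminimax_value(node):
    """Value of a node in a tree that mixes max, min, and chance layers."""
    if isinstance(node, (int, float)):   # terminal node: its utility
        return node
    kind, children = node
    if kind == "max":                    # our move: take the best child
        return max(expectiminimax_value(c) for c in children)
    if kind == "min":                    # opponent's move: take the worst child
        return min(expectiminimax_value(c) for c in children)
    if kind == "exp":                    # chance layer: compute an expectation
        return sum(p * expectiminimax_value(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind!r}")

# Hypothetical tree: max root, then a chance layer, then min nodes.
tree = ("max", [
    ("exp", [(0.5, ("min", [3, 5])), (0.5, ("min", [8, 2]))]),  # 0.5*3 + 0.5*2
    ("exp", [(0.5, ("min", [4, 6])), (0.5, ("min", [7, 9]))]),  # 0.5*4 + 0.5*7
])
print(expectiminimax_value(tree))  # 5.5
```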
with 2 dice
reaching a given search node shrinks
search + very good evaluation function + reinforcement learning: world-champion level play
utility tuples
also utility tuples
maximizes its
cooperation and competition dynamically…
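[Figure: multi-player game tree with leaf utility tuples (1,6,6), (7,1,2), (6,1,2), (7,2,1), (5,1,7), (1,5,2), (7,7,1), (5,2,5); annotated node value [1,6,6].]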
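The idea of backing up utility tuples, with each player maximizing its own component, can be sketched as follows. The tree shape is an assumption: the eight leaf tuples from the slide are arranged as a full binary tree with players 0, 1, 2 moving in turn from the root down. The original figure's layout is not recoverable from the text, so the root value computed here may differ from the slide's. The name `multi_value` is illustrative.

```python
def multi_value(node, player=0, num_players=3):
    """Back up utility tuples; the player to move maximizes its own entry.

    A leaf is a tuple of utilities (one per player); an internal node is a
    list of child nodes. Players are assumed to alternate level by level.
    """
    if isinstance(node, tuple):          # terminal: utility tuple
        return node
    next_player = (player + 1) % num_players
    child_values = [multi_value(c, next_player, num_players) for c in node]
    return max(child_values, key=lambda u: u[player])

# Leaf tuples from the slide, in an assumed binary-tree arrangement.
tree = [
    [[(1, 6, 6), (7, 1, 2)], [(6, 1, 2), (7, 2, 1)]],
    [[(5, 1, 7), (1, 5, 2)], [(7, 7, 1), (5, 2, 5)]],
]
print(multi_value(tree[0], player=1))  # left subtree, player 1 to move
print(multi_value(tree))               # root, player 0 to move
```

Under this assumed arrangement the left subtree backs up to (1, 6, 6), the same tuple as the annotation in the slide text.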
expected utility, given its knowledge
can be summarized as a utility function
[Figure: "Getting ice cream" decision tree with choices Get Single and Get Double and outcomes Oops and Whew.]
uncertain prizes
would pay (say) 1 cent to get B
would pay (say) 1 cent to get A
would pay (say) 1 cent to get C
a real-valued function U such that:
and lotteries!
without ever representing or manipulating utilities and probabilities
reduce product risks, etc.
involving substantial risk
can be determined, i.e., total order on prizes
the utility of having money (or being in debt)
reduce their risk