CS 188: Artificial Intelligence, Spring 2009
Lecture 7: Expectimax Search


2/10/2009
John DeNero – UC Berkeley
Slides adapted from Dan Klein, Stuart Russell or Andrew Moore

Announcements
- Written Assignment 1: Due at the end of lecture. If you haven't done it but still want some points, come talk to me after class.
- Project 1: Most of you did very well. We promise not to steal your slip days. Come to office hours if you didn't finish and want help.
- Project 2: Due a week from tomorrow (Wednesday). Want a partner? Come to the front after lecture.

Today
- Mini-contest 1 results
- Pruning game trees
- Chance in game trees

Mini-Contest Winners
- Problem: eat all the food in bigSearch. Challenge: finding a provably optimal path is very difficult.
- Winning solutions (baseline is 350):
  - 5th: Greedy hill-climbing, Jeremy Cowles: 314
  - 4th: Local choices, Jon Hirschberg and Nam Do: 292
  - 3rd: Local choices, Richard Guo and Shendy Kurnia: 290
  - 2nd: Local choices, Tim Swift: 286
  - 1st: A* with an inadmissible heuristic, Nikita Mikhaylin: 284

GamesCrafters
http://gamescrafters.berkeley.edu/

Adversarial Games
- Deterministic, zero-sum games: tic-tac-toe, chess, checkers. One player maximizes the result; the other minimizes the result.
- Minimax search: a state-space search tree in which players alternate turns. Each node has a minimax value: the best achievable utility against a rational adversary. Minimax values are computed recursively from the terminal values.
(Figure: a max node with minimax value 5 over two min nodes with values 2 and 5, computed from terminal values 8, 2, 5, and 6.)

Computing Minimax Values
Two recursive functions:
- max-value maxes the values of successors
- min-value mins the values of successors

def value(state):
    if the state is a terminal state: return the state's utility
    if the next agent is MAX: return max-value(state)
    if the next agent is MIN: return min-value(state)

def max-value(state):
    initialize max = -infinity
    for each successor of state:
        compute value(successor)
        update max accordingly
    return max

Pruning in Minimax Search
(Figure: the example tree with leaves 3, 12, 8, 2, 14, 5, 2; as the search proceeds, value intervals such as [3, 3], [3, 14], and [3, 5] are propagated up, and subtrees that cannot affect the root value are skipped.)

Alpha-Beta Pruning
General configuration:
- a is the best value that MAX can get at any choice point along the current path.
- If n becomes worse than a, MAX will avoid it, so MAX can stop considering n's other children.
- Define b similarly for MIN.

Alpha-Beta Pruning Example
(Figure: the same tree searched with [a, b] windows, starting from a = -inf, b = +inf; the root's a rises to 3 after the first min node, and later min nodes are cut off once their values are known to be at most 2 or 1. Here a is MAX's best alternative in the branch and b is MIN's best alternative in the branch.)

Alpha-Beta Pruning Properties
- This pruning has no effect on the final result at the root.
- Values of intermediate nodes might be wrong!
- Good move ordering improves the effectiveness of pruning.
- With "perfect ordering", time complexity drops to O(b^(m/2)), which doubles the solvable depth. A full search of, e.g., chess is still hopeless!
- This is a simple example of metareasoning, and the only one you need to know in detail.
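The alpha-beta rule can be sketched concretely in Python. As before, this assumes a hypothetical explicit-tree representation (a number is a terminal utility, a list is a node's children), which is an illustration rather than the project's search interface.

```python
# Alpha-beta pruning sketch over an explicit game tree (illustrative
# representation: a number is a terminal utility, a list holds children).
# a tracks MAX's best alternative on the current path, b tracks MIN's.

def alphabeta(tree, maximizing, a=float('-inf'), b=float('inf')):
    if isinstance(tree, (int, float)):
        return tree
    if maximizing:
        v = float('-inf')
        for child in tree:
            v = max(v, alphabeta(child, False, a, b))
            if v >= b:       # MIN already has a better alternative:
                return v     # prune this node's remaining children
            a = max(a, v)
        return v
    else:
        v = float('inf')
        for child in tree:
            v = min(v, alphabeta(child, True, a, b))
            if v <= a:       # MAX already has a better alternative
                return v
            b = min(b, v)
        return v

# A tree in the spirit of the slide's example, with leaves 3, 12, 8, 2, 14, 5, 2.
tree = [[3, 12, 8], [2, 14, 5], [14, 5, 2]]
print(alphabeta(tree, True))  # 3, with some leaves never examined
```

Note that pruning never changes the root's value, only which leaves are inspected: here the second min node is abandoned after its first leaf (2), since MAX already has an alternative worth 3.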

Expectimax Search Trees
- What if we don't know what the result of an action will be? E.g., in solitaire the next card is unknown; in Monopoly the dice are random; in Pacman the ghosts act randomly.
- We can do expectimax search: calculate expected utilities. Max nodes act as in minimax search; chance nodes are like min nodes, except the outcome is uncertain, so they take the average (expectation) of the values of their children.
(Figure: a max node over chance nodes whose children have values 10, 4, 5, and 7.)
- Later, we'll learn how to formalize the underlying problem as a Markov Decision Process.

Maximum Expected Utility
- Why should we average utilities? Why not minimax?
- Principle of maximum expected utility: an agent should choose the action which maximizes its expected utility, given its knowledge.
- This is a general principle for decision making, often taken as the definition of rationality. We'll see this idea over and over in this course!
- Let's decompress this definition...

Reminder: Probabilities
- A random variable represents an event whose outcome is unknown.
- A probability distribution is an assignment of weights to outcomes.
- Example: traffic on the freeway. Random variable: T = how much traffic there is. Outcomes: T in {none, light, heavy}. Distribution: P(T=none) = 0.25, P(T=light) = 0.5, P(T=heavy) = 0.25. Common abbreviation: P(light) = 0.5.
- Some laws of probability (more later): probabilities are always non-negative, and probabilities over all possible outcomes sum to one.
- As we get more evidence, probabilities may change: P(T=heavy) = 0.25, but P(T=heavy | Hour=8am) = 0.60. We'll talk about methods for reasoning about and updating probabilities later.

What are Probabilities?
- Objectivist / frequentist answer: averages over repeated experiments, e.g. empirically estimating P(rain) from historical observation. An assertion about how future experiments will go (in the limit); new evidence changes the reference class. Makes one think of inherently random events, like rolling dice.
- Subjectivist / Bayesian answer: degrees of belief about unobserved variables, e.g. an agent's belief that it's raining given the temperature, or Pacman's belief that the ghost will turn left given the state. Probabilities are often learned from past experiences (more later), and new evidence updates beliefs (more later).

Uncertainty Everywhere
- Not just for games of chance! I'm sick: will I sneeze this minute? Email contains "FREE!": is it spam? Tooth hurts: do I have a cavity? Is 60 min enough to get to the airport? The robot rotated its wheel three times: how far did it advance? Safe to cross the street? (Look both ways!)
- Sources of uncertainty in random variables: inherently random processes (dice, etc.), insufficient or weak evidence, ignorance of underlying processes, unmodeled variables. The world's just noisy; it doesn't behave according to plan!
- Compare to fuzzy logic, which has degrees of truth, rather than just degrees of belief.

Reminder: Expectations
- We can define a function f(X) of a random variable X.
- The expected value of a function is its average value, weighted by the probability distribution over inputs.
- Example: how long to get to the airport? Length of driving time as a function of traffic: L(none) = 20, L(light) = 30, L(heavy) = 60. What is my expected driving time? Notation: E[ L(T) ]. Remember, P(T) = {none: 0.25, light: 0.5, heavy: 0.25}.
- E[ L(T) ] = L(none) * P(none) + L(light) * P(light) + L(heavy) * P(heavy)
- E[ L(T) ] = (20 * 0.25) + (30 * 0.5) + (60 * 0.25) = 35
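The driving-time expectation above is a one-liner once the distribution and the function are written down. A minimal Python sketch of the slide's own numbers:

```python
# E[L(T)]: the probability-weighted average of driving time L over
# outcomes of the traffic variable T, using the slide's numbers.

P = {'none': 0.25, 'light': 0.5, 'heavy': 0.25}  # distribution over T
L = {'none': 20, 'light': 30, 'heavy': 60}       # driving time per outcome

expected_time = sum(L[t] * P[t] for t in P)
print(expected_time)  # 20*0.25 + 30*0.5 + 60*0.25 = 35.0
```

This is exactly the sum in the slide: each outcome's value weighted by its probability, and the weights sum to one.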

Expectimax Search
- In expectimax search, we have a probabilistic model of how the opponent (or environment) will behave in any state.
- The model could be a simple uniform distribution (roll a die), or it could be sophisticated and require a great deal of computation.
- We have a node for every outcome out of our control: opponent or environment. The model might say that adversarial actions are likely!
- For now, assume that for any state we have a distribution assigning probabilities to opponent actions / environment outcomes.
- Having a probabilistic belief about an agent's action does not mean that agent is flipping any coins!

Expectimax Pseudocode

def value(s)
    if s is a max node: return maxValue(s)
    if s is an exp node: return expValue(s)
    if s is a terminal node: return evaluation(s)

def maxValue(s)
    values = [value(s') for s' in successors(s)]
    return max(values)

def expValue(s)
    values = [value(s') for s' in successors(s)]
    weights = [probability(s, s') for s' in successors(s)]
    return expectation(values, weights)

Expectimax for Pacman
- Notice that we've gotten away from thinking that the ghosts are trying to minimize Pacman's score. Instead, they are now part of the environment, and Pacman has a belief (distribution) over how they will act.
- Questions: Is minimax a special case of expectimax? What happens if we think ghosts move randomly, but they really do try to minimize Pacman's score?

Results from playing 5 games (Pacman used depth-4 search with an eval function that avoids trouble; the ghost used depth-2 search with an eval function that seeks Pacman):

                      Minimizing Ghost             Random Ghost
Minimax Pacman        Won 5/5, Avg. Score: 493     Won 5/5, Avg. Score: 483
Expectimax Pacman     Won 1/5, Avg. Score: -303    Won 5/5, Avg. Score: 503

[Demo]

Expectimax Pruning?

Expectimax Evaluation
- For minimax search, the evaluation function's scale doesn't matter: we just want better states to have higher evaluations (get the ordering right). We call this property insensitivity to monotonic transformations.
- For expectimax, we need the magnitudes to be meaningful as well: e.g., we must know whether a 50% / 50% lottery between A and B is better than a 100% chance of C. "100 or -10" versus 0 is different from "10 or -100" versus 0.
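The expectimax pseudocode above can be made concrete with a small Python sketch. The tree encoding here is an illustrative assumption (not the project's game interface): a terminal is a number, a max node is the tuple ('max', children), and a chance node is ('exp', [(probability, child), ...]).

```python
# Expectimax sketch over an explicit tree (illustrative encoding):
#   terminal        -> a number (its utility / evaluation)
#   max node        -> ('max', [child, ...])
#   chance node     -> ('exp', [(probability, child), ...])

def value(node):
    if isinstance(node, (int, float)):
        return node                               # terminal: its utility
    kind, children = node
    if kind == 'max':
        return max(value(c) for c in children)    # best action for MAX
    if kind == 'exp':
        # Expectation: probability-weighted average over outcomes.
        return sum(p * value(c) for p, c in children)
    raise ValueError('unknown node kind: %r' % kind)

# The slide's figure: a max node over two uniform chance nodes
# with leaves (10, 4) and (5, 7).
tree = ('max', [('exp', [(0.5, 10), (0.5, 4)]),
                ('exp', [(0.5, 5), (0.5, 7)])])
print(value(tree))  # chance nodes average to 7.0 and 6.0; max picks 7.0
```

Note how minimax falls out as a special case: replace each chance node with one that puts probability 1 on its worst child, and the expectation becomes a min.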
