



Introduction to Artificial Intelligence
V22.0472-001 Fall 2009
Lecture 6: Expectimax Search
Rob Fergus, Dept of Computer Science, Courant Institute, NYU
Many slides from Dan Klein, Stuart Russell or Andrew Moore

Recap: Resource Limits
• Cannot search to leaves
• Depth-limited search: instead, search a limited depth of the tree
• Replace terminal utilities with an eval function for non-terminal positions
• Guarantee of optimal play is gone
• Replanning agents:
  • Search to choose the next action
  • Replan each new turn in response to the new state

Evaluation for Pacman
[figure only]

Iterative Deepening
• Iterative deepening uses DFS as a subroutine:
  1. Do a DFS which only searches for paths of length 1 or less (DFS gives up on any path of length 2).
  2. If "1" failed, do a DFS which only searches paths of length 2 or less.
  3. If "2" failed, do a DFS which only searches paths of length 3 or less.
  …and so on.
• This works for single-agent search as well! Why do we want to do this for multiplayer games?

α-β Pruning Example
[worked example figure]

α-β Pruning
• General configuration:
  • α is the best value that the MAX player can get at any choice point along the current path
  • If n becomes worse than α, MAX will avoid it, so we can stop considering n's other children
  • Define β similarly for MIN
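As a concrete sketch of the pruning rule above, here is a minimal Python implementation of minimax with α-β pruning. The nested-list tree (leaves as numeric utilities, layers alternating MAX/MIN) is a hypothetical stand-in for a real game's successor function, used purely for illustration.

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Minimax with alpha-beta pruning over a tree of nested lists.

    Leaves are numbers (terminal utilities); internal nodes
    alternate MAX and MIN layers starting from MAX at the root.
    """
    if not isinstance(node, list):          # terminal utility
        return node
    if maximizing:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, v)
            if v >= beta:                   # a MIN ancestor will avoid this node
                break                       # prune remaining children
        return v
    else:
        v = math.inf
        for child in node:
            v = min(v, alphabeta(child, alpha, beta, True))
            beta = min(beta, v)
            if v <= alpha:                  # a MAX ancestor will avoid this node
                break                       # prune remaining children
        return v

# Classic textbook tree: root value is 3, and the middle subtree's
# remaining children are pruned once its first leaf (2) drops below alpha.
print(alphabeta([[3, 12, 8], [2, 4, 6], [14, 5, 2]]))  # -> 3
```

Note the pruning test matches the slide exactly: once a MIN node's running value falls to or below α, no further children of that node can affect the root's choice.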

α-β Pruning Pseudocode
[figure: pseudocode]

α-β Pruning Properties
• Pruning has no effect on the final result
• Good move ordering improves the effectiveness of pruning
• With "perfect ordering":
  • Time complexity drops to O(b^(m/2))
  • Doubles the solvable depth
• Full search of, e.g., chess is still hopeless!
• A simple example of metareasoning: here, reasoning about which computations are relevant

Expectimax Search Trees
• What if we don't know what the result of an action will be? E.g.:
  • In solitaire, the next card is unknown
  • In minesweeper, the mine locations
  • In pacman, the ghosts act randomly
• Can do expectimax search:
  • Chance nodes, like min nodes, except the outcome is uncertain
  • Calculate expected utilities
  • Max nodes as in minimax search
  • Chance nodes take the average (expectation) of the values of their children
• Later, we'll learn how to formalize the underlying problem as a Markov Decision Process

Maximum Expected Utility
• Why should we average utilities? Why not minimax?
• Principle of maximum expected utility: an agent should choose the action which maximizes its expected utility, given its knowledge
• A general principle for decision making
• Often taken as the definition of rationality
• We'll see this idea over and over in this course!
• Let's decompress this definition…

Reminder: Probabilities
• A random variable represents an event whose outcome is unknown
• A probability distribution is an assignment of weights to outcomes
• Example: traffic on the freeway
  • Random variable: T = whether there's traffic
  • Outcomes: T in {none, light, heavy}
  • Distribution: P(T=none) = 0.25, P(T=light) = 0.55, P(T=heavy) = 0.20
• Some laws of probability (more later):
  • Probabilities are always non-negative
  • Probabilities over all possible outcomes sum to one
• As we get more evidence, probabilities may change:
  • P(T=heavy) = 0.20, but P(T=heavy | Hour=8am) = 0.60
  • We'll talk about methods for reasoning about and updating probabilities later
• Often we learn probabilities from past experiences (more later)

What are Probabilities?
• Objectivist / frequentist answer:
  • Averages over repeated experiments
  • E.g. empirically estimating P(rain) from historical observation
  • An assertion about how future experiments will go (in the limit)
  • New evidence changes the reference class
  • Makes one think of inherently random events, like rolling dice
• Subjectivist / Bayesian answer:
  • Degrees of belief about unobserved variables
  • E.g. an agent's belief that it's raining, given the temperature
  • E.g. pacman's belief that the ghost will turn left, given the state
  • New evidence updates beliefs (more later)
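The frequentist view above ("averages over repeated experiments") can be illustrated with a short simulation. The "historical observations" here are synthetic, sampled from the slide's traffic distribution purely to have data; a real frequentist estimate would use actual logs.

```python
import random
from collections import Counter

random.seed(0)

# The slide's distribution over the traffic variable T.
P_T = {"none": 0.25, "light": 0.55, "heavy": 0.20}

# Hypothetical historical observations, sampled from P_T for illustration.
history = random.choices(list(P_T), weights=list(P_T.values()), k=10_000)

# Frequentist estimate: the relative frequency of each outcome.
counts = Counter(history)
P_hat = {t: counts[t] / len(history) for t in P_T}

# The estimate obeys the two laws above: non-negative, summing to one.
assert all(p >= 0 for p in P_hat.values())
assert abs(sum(P_hat.values()) - 1.0) < 1e-9
print(P_hat)  # close to P_T, and closer still as k grows
```

This is the "in the limit" part of the frequentist answer: as the number of repeated experiments grows, the relative frequencies converge to the underlying probabilities.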

Uncertainty Everywhere
• Not just for games of chance!
  • I'm snuffling: am I sick?
  • Email contains "FREE!": is it spam?
  • Tooth hurts: do I have a cavity?
  • Is 60 min enough to get to the airport?
  • Robot rotated its wheel three times: how far did it advance?
  • Safe to cross the street? (Look both ways!)
• Why can a random variable have uncertainty?
  • Inherently random process (dice, etc.)
  • Insufficient or weak evidence
  • Ignorance of underlying processes
  • Unmodeled variables
  • The world's just noisy!
• Compare to fuzzy logic, which has degrees of truth, rather than just degrees of belief

Reminder: Expectations
• Often a quantity of interest depends on a random variable
• The expected value of a function is its average output, weighted by a given distribution over inputs
• Example: How late will I be if I leave 60 min before my flight?
  • Lateness is a function of traffic: L(none) = -10, L(light) = -5, L(heavy) = 15
  • What is my expected lateness? We need to specify some belief over T to weight the outcomes
  • Say P(T) = {none: 2/5, light: 2/5, heavy: 1/5}
  • The expected lateness: E[L(T)] = (2/5)(-10) + (2/5)(-5) + (1/5)(15) = -3

Expectations
• Expectation of a real-valued function of a random variable
• Example: Expected value of a fair die roll
  X: 1, 2, 3, 4, 5, 6, each with P(X) = 1/6 and f(X) = X
  E[f(X)] = (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5

Utilities
• Utilities are functions from outcomes (states of the world) to real numbers that describe an agent's preferences
• Where do utilities come from?
  • In a game, they may be simple (+1/-1)
  • Utilities summarize the agent's goals
  • Theorem: any set of preferences between outcomes can be summarized as a utility function (provided the preferences meet certain conditions)
• In general, we hard-wire utilities and let actions emerge (why don't we let agents decide their own utilities?)
• More on utilities soon…

Expectimax Search
• In expectimax search, we have a probabilistic model of how the opponent (or environment) will behave in any state
  • The model could be a simple uniform distribution (roll a die)
  • The model could be sophisticated and require a great deal of computation
  • We have a node for every outcome out of our control: opponent or environment
  • The model might say that adversarial actions are likely!
• For now, assume for any state we magically have a distribution to assign probabilities to opponent actions / environment outcomes
• Having a probabilistic belief about an agent's action does not mean that agent is flipping any coins!

Expectimax Pseudocode

  def value(s)
      if s is a max node return maxValue(s)
      if s is an exp node return expValue(s)
      if s is a terminal node return evaluation(s)

  def maxValue(s)
      values = [value(s') for s' in successors(s)]
      return max(values)

  def expValue(s)
      values = [value(s') for s' in successors(s)]
      weights = [probability(s, s') for s' in successors(s)]
      return expectation(values, weights)
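The pseudocode above can be fleshed out into runnable Python. In this sketch a node is, by assumption, either a bare number (terminal utility), a ("max", children) pair, or an ("exp", [(probability, child), …]) pair; this toy tuple representation replaces the slide's abstract successors(s) and probability(s, s') for illustration only.

```python
def value(s):
    """Expectimax over a small hand-built tree.

    A node is either a number (terminal utility),
    ('max', [children]), or ('exp', [(prob, child), ...]).
    """
    if isinstance(s, (int, float)):              # terminal node
        return s
    kind, children = s
    if kind == "max":                            # max node, as in minimax
        return max(value(c) for c in children)
    if kind == "exp":                            # chance node: weighted average
        return sum(p * value(c) for p, c in children)
    raise ValueError(f"unknown node type: {kind}")

# A max layer over two chance nodes, matching the lecture's picture of
# alternating max and chance layers (the numbers are made up).
tree = ("max", [
    ("exp", [(0.5, 8), (0.5, 4)]),               # expected value 6.0
    ("exp", [(0.25, 5), (0.75, 7)]),             # expected value 6.5
])
print(value(tree))  # -> 6.5
```

Note how the only change from minimax is at the chance layer: instead of taking a min, we take the probability-weighted average of the children's values.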

Expectimax for Pacman
• Notice that we've gotten away from thinking that the ghosts are trying to minimize pacman's score
• Instead, they are now a part of the environment
• Pacman has a belief (distribution) over how they will act
• Quiz: Can we see minimax as a special case of expectimax?
• Quiz: What would pacman's computation look like if we assumed that the ghosts were doing 1-ply minimax and taking the result 80% of the time, otherwise moving randomly?
• If you take this further, you end up calculating belief distributions over your opponents' belief distributions over your belief distributions, etc.
• Can get unmanageable very quickly!

Expectimax Pruning?
[figure only]

Expectimax Evaluation
• For minimax search, the evaluation function is insensitive to monotonic transformations
  • We just want better states to have higher evaluations (get the ordering right)
• For expectimax, we need the magnitudes to be meaningful as well
  • E.g. we must know whether a 50% / 50% lottery between A and B is better than a 100% chance of C
  • 100 or -10 vs 0 is different than 10 or -100 vs 0

Mixed Layer Types
• E.g. backgammon
• Expectiminimax: the environment is an extra player that moves after each agent
• Chance nodes take expectations; otherwise, it's like minimax

Stochastic Two-Player
• Dice rolls increase b: 21 possible rolls with 2 dice
• Backgammon has ≈ 20 legal moves
• Depth 4 = 20 × (21 × 20)^3 ≈ 1.2 × 10^9
• As depth increases, the probability of reaching a given node shrinks
  • So the value of lookahead is diminished
  • So limiting depth is less damaging
  • But pruning is less possible…
• TDGammon uses depth-2 search + a very good eval function + reinforcement learning: world-champion level play

Non-Zero-Sum Games
• Similar to minimax, but utilities are now tuples
• Each player maximizes their own entry at each node
• Propagate (or back up) nodes from children
• Can give rise to cooperation and competition dynamically…
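The depth-4 count above is easy to check directly. One caveat: the exact product of these numbers is about 1.5 × 10^9; the slide's quoted ≈ 1.2 × 10^9 is the same order of magnitude.

```python
legal_moves = 20   # typical number of legal backgammon moves
dice_rolls = 21    # distinct rolls of two dice (order ignored: 6*7/2)

# One move layer at the root, then three (chance, move) layers:
# the depth-4 node count from the slide, 20 * (21 * 20)^3.
nodes = legal_moves * (dice_rolls * legal_moves) ** 3
print(nodes)  # -> 1481760000, i.e. about 1.5e9
```

This is why the slide concludes that deep lookahead is hopeless here, and why TDGammon instead pairs a shallow (depth-2) search with a strong learned evaluation function.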
