CS 188: Artificial Intelligence, Lecture 8: MEU / Utilities

Announcements
- W2 is due today (lecture or drop box)
- P2 is out and due on 2/18

CS 188: Artificial Intelligence
Spring 2010
Lecture 8: MEU / Utilities
2/11/2010
Pieter Abbeel – UC Berkeley
Many slides over the course adapted from Dan Klein

Expectimax Search Trees
- What if we don't know what the result of an action will be? E.g.,
  - In solitaire, next card is unknown
  - In minesweeper, mine locations
  - In pacman, the ghosts act randomly
- Can do expectimax search
  - Chance nodes, like min nodes, except the outcome is uncertain
  - Calculate expected utilities
  - Max nodes as in minimax search
  - Chance nodes take average (expectation) of value of children
- Later, we'll learn how to formalize the underlying problem as a Markov Decision Process
[figure: a one-ply tree with a max node over chance nodes and leaf values 10, 4, 5, 7]

Maximum Expected Utility
- Why should we average utilities? Why not minimax?
- Principle of maximum expected utility: an agent should choose the action which maximizes its expected utility, given its knowledge
  - General principle for decision making
  - Often taken as the definition of rationality
  - We'll see this idea over and over in this course!
- Let's decompress this definition…
  - Probability --- Expectation --- Utility

Reminder: Probabilities
- A random variable represents an event whose outcome is unknown
- A probability distribution is an assignment of weights to outcomes
- Example: traffic on freeway?
  - Random variable: T = amount of traffic
  - Outcomes: T in {none, light, heavy}
  - Distribution: P(T=none) = 0.25, P(T=light) = 0.55, P(T=heavy) = 0.20
- Some laws of probability (more later):
  - Probabilities are always non-negative
  - Probabilities over all possible outcomes sum to one
- As we get more evidence, probabilities may change:
  - P(T=heavy) = 0.20, P(T=heavy | Hour=8am) = 0.60
  - We'll talk about methods for reasoning and updating probabilities later

What are Probabilities?
- Objectivist / frequentist answer:
  - Averages over repeated experiments
  - E.g. empirically estimating P(rain) from historical observation
  - Assertion about how future experiments will go (in the limit)
  - New evidence changes the reference class
  - Makes one think of inherently random events, like rolling dice
- Subjectivist / Bayesian answer:
  - Degrees of belief about unobserved variables
  - E.g. an agent's belief that it's raining, given the temperature
  - E.g. pacman's belief that the ghost will turn left, given the state
  - Often learn probabilities from past experiences (more later)
  - New evidence updates beliefs (more later)
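The traffic distribution above can be checked in a few lines of code. This is a small illustrative sketch (not from the slides): the dictionary representation and variable names are assumptions, and the 8am distribution below is made up except for the heavy-traffic figure quoted on the slide.

    # Distribution over the random variable T (traffic), from the Reminder: Probabilities slide.
    P_T = {"none": 0.25, "light": 0.55, "heavy": 0.20}

    # Two laws of probability mentioned on the slide:
    assert all(p >= 0 for p in P_T.values())          # probabilities are non-negative
    assert abs(sum(P_T.values()) - 1.0) < 1e-9        # probabilities sum to one

    # New evidence replaces one distribution with another, e.g. conditioning on the hour.
    # Only the 0.60 heavy-traffic value comes from the slide; the other entries are
    # placeholders chosen so the conditional distribution still sums to one.
    P_T_given_8am = {"none": 0.05, "light": 0.35, "heavy": 0.60}
    assert abs(sum(P_T_given_8am.values()) - 1.0) < 1e-9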

Uncertainty Everywhere
- Not just for games of chance!
  - I'm sick: will I sneeze this minute?
  - Email contains "FREE!": is it spam?
  - Tooth hurts: have cavity?
  - 60 min enough to get to the airport?
  - Robot rotated wheel three times, how far did it advance?
  - Safe to cross street? (Look both ways!)
- Sources of uncertainty in random variables:
  - Inherently random process (dice, etc)
  - Insufficient or weak evidence
  - Ignorance of underlying processes
  - Unmodeled variables
  - The world's just noisy – it doesn't behave according to plan!

Reminder: Expectations
- We can define a function f(X) of a random variable X
- The expected value of a function is its average value, weighted by the probability distribution over inputs
- Example: How long to get to the airport?
  - Length of driving time as a function of traffic: L(none) = 20, L(light) = 30, L(heavy) = 60
  - What is my expected driving time? Notation: E[ L(T) ]
  - Remember, P(T) = {none: 0.25, light: 0.5, heavy: 0.25}
  - E[ L(T) ] = L(none) * P(none) + L(light) * P(light) + L(heavy) * P(heavy)
  - E[ L(T) ] = (20 * 0.25) + (30 * 0.5) + (60 * 0.25) = 35

Utilities
- Utilities are functions from outcomes (states of the world) to real numbers that describe an agent's preferences
- Where do utilities come from?
  - In a game, may be simple (+1/-1)
  - Utilities summarize the agent's goals
  - Theorem: any set of preferences between outcomes can be summarized as a utility function (provided the preferences meet certain conditions)
  - In general, we hard-wire utilities and let actions emerge (why don't we let agents decide their own utilities?)
- More on utilities soon…

Expectimax Search
- In expectimax search, we have a probabilistic model of how the opponent (or environment) will behave in any state
  - Model could be a simple uniform distribution (roll a die)
  - Model could be sophisticated and require a great deal of computation
  - We have a node for every outcome out of our control: opponent or environment
  - The model might say that adversarial actions are likely!
- For now, assume for any state we magically have a distribution to assign probabilities to opponent actions / environment outcomes
- Having a probabilistic belief about an agent's action does not mean that agent is flipping any coins!

Expectimax Search (continued)
- Chance nodes
  - Chance nodes are like min nodes, except the outcome is uncertain
  - Calculate expected utilities
  - Chance nodes average successor values (weighted)
- Each chance node has a probability distribution over its outcomes (called a model)
  - For now, assume we're given the model (which would require a lot of work to compute)
- Utilities for terminal states
  - Static evaluation functions give us limited-depth search
[figure: a one-ply search tree; leaf evaluations (8, 4, 5, 6) give an estimate of the true expectimax value]

Expectimax Pseudocode

    def value(s)
        if s is a max node return maxValue(s)
        if s is an exp node return expValue(s)
        if s is a terminal node return evaluation(s)

    def maxValue(s)
        values = [value(s') for s' in successors(s)]
        return max(values)

    def expValue(s)
        values = [value(s') for s' in successors(s)]
        weights = [probability(s, s') for s' in successors(s)]
        return expectation(values, weights)
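The pseudocode above is nearly runnable Python already. Below is one way to flesh it out as a self-contained sketch; the Node class, the uniform outcome probabilities, and the demo tree are assumptions made for illustration, not part of the course code.

    # A minimal, runnable version of the expectimax pseudocode on the slide.
    class Node:
        def __init__(self, kind, children=None, probs=None, utility=None):
            self.kind = kind              # "max", "exp", or "terminal"
            self.children = children or []
            self.probs = probs            # outcome probabilities (exp nodes only)
            self.utility = utility        # evaluation for terminal nodes

    def value(s):
        if s.kind == "max":
            return max_value(s)
        if s.kind == "exp":
            return exp_value(s)
        return evaluation(s)              # terminal node

    def max_value(s):
        # Max nodes behave exactly as in minimax search.
        return max(value(c) for c in s.children)

    def exp_value(s):
        # Chance nodes average successor values, weighted by the model's probabilities.
        return sum(p * value(c) for p, c in zip(s.probs, s.children))

    def evaluation(s):
        return s.utility

    # Tiny demo: a max node over two chance nodes, using the leaf values 10, 4, 5, 7
    # from the earlier Expectimax Search Trees figure and assuming uniform probabilities.
    left = Node("exp", [Node("terminal", utility=10), Node("terminal", utility=4)], probs=[0.5, 0.5])
    right = Node("exp", [Node("terminal", utility=5), Node("terminal", utility=7)], probs=[0.5, 0.5])
    print(value(Node("max", [left, right])))   # left averages to 7, right to 6, so the max is 7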

Expectimax Evaluation
- Evaluation functions quickly return an estimate for a node's true value (which value, expectimax or minimax?)
- For minimax, evaluation function scale doesn't matter
  - We just want better states to have higher evaluations (get the ordering right)
  - We call this insensitivity to monotonic transformations
- For expectimax, we need magnitudes to be meaningful (a small numeric check appears after this slide group)
[figure: two trees with leaf values 20, 30 and 0, 40; in the second tree the leaves are squared to 400, 900 and 0, 1600]

Mixed Layer Types
- E.g. Backgammon
- Expectiminimax
  - Environment is an extra player that moves after each agent
  - Chance nodes take expectations, otherwise like minimax
- ExpectiMinimax-Value(state): a recursive recurrence over max, min, and chance layers (shown as a formula on the original slide; sketched in code after this slide group)

Stochastic Two-Player
- Dice rolls increase b: 21 possible rolls with 2 dice
  - Backgammon ≈ 20 legal moves
  - Depth 4 = 20 x (21 x 20)^3 ≈ 1.2 x 10^9
- As depth increases, probability of reaching a given node shrinks
  - So value of lookahead is diminished
  - So limiting depth is less damaging
  - But pruning is less possible…
- TDGammon uses depth-2 search + very good eval function + reinforcement learning: world-champion level play

Maximum Expected Utility
- Principle of maximum expected utility:
  - A rational agent should choose the action which maximizes its expected utility, given its knowledge
- Questions:
  - Where do utilities come from?
  - How do we know such utilities even exist?
  - Why are we taking expectations of utilities (not, e.g. minimax)?
  - What if our behavior can't be described by utilities?

Utilities: Unknown Outcomes
- Going to airport from home
[figure: a decision tree with two actions, "Take surface streets" and "Take freeway"; the possible outcomes are "Clear, 10 min (arrive early)", "Traffic, 50 min (arrive late)", and "Clear, 20 min (arrive on time)"]
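The scale-sensitivity point from the Expectimax Evaluation slide can be checked numerically. The sketch below is illustrative (uniform chance probabilities are assumed); it uses the leaf values from that slide's figure and squares them, a monotonic transformation. The minimax choice is unchanged, but the expectimax choice flips, which is why expectimax needs evaluation magnitudes to be meaningful.

    # Leaf values from the Expectimax Evaluation figure.
    left_leaves, right_leaves = [20, 30], [0, 40]

    def minimax_choice(left, right):
        # The opponent minimizes each branch; the root picks the better branch.
        return "left" if min(left) >= min(right) else "right"

    def expectimax_choice(left, right):
        # Chance nodes average their children (uniform probabilities assumed here).
        avg = lambda xs: sum(xs) / len(xs)
        return "left" if avg(left) >= avg(right) else "right"

    square = lambda xs: [x * x for x in xs]   # monotonic: 20,30 -> 400,900 and 0,40 -> 0,1600

    print(minimax_choice(left_leaves, right_leaves),                      # left
          minimax_choice(square(left_leaves), square(right_leaves)))      # still left
    print(expectimax_choice(left_leaves, right_leaves),                   # left (25 vs 20)
          expectimax_choice(square(left_leaves), square(right_leaves)))   # flips to right (650 vs 800)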

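The ExpectiMinimax-Value recurrence named on the Mixed Layer Types slide alternates max, min, and chance layers. Here is one possible rendering, reusing the hypothetical Node interface from the expectimax sketch above (an illustration, not the course's implementation):

    def expectiminimax_value(s):
        if s.kind == "terminal":
            return s.utility                      # evaluation / utility at the leaves
        child_values = [expectiminimax_value(c) for c in s.children]
        if s.kind == "max":                       # the maximizing agent moves
            return max(child_values)
        if s.kind == "min":                       # the minimizing agent moves
            return min(child_values)
        # Chance layer: the environment (e.g. a dice roll) moves after each agent.
        return sum(p * v for p, v in zip(s.probs, child_values))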
Preferences
- An agent chooses among:
  - Prizes: A, B, etc.
  - Lotteries: situations with uncertain prizes
- Notation:
  - A > B: the agent prefers A to B
  - A ~ B: the agent is indifferent between A and B
  - [p, A; 1-p, B]: a lottery that yields prize A with probability p and prize B with probability 1-p

Rational Preferences
- We want some constraints on preferences before we call them rational
- For example: an agent with intransitive preferences can be induced to give away all of its money
  - Transitivity requires (A > B) ∧ (B > C) ⇒ (A > C); suppose instead the agent's preferences cycle
  - If B > C, then an agent with C would pay (say) 1 cent to get B
  - If A > B, then an agent with B would pay (say) 1 cent to get A
  - If C > A, then an agent with A would pay (say) 1 cent to get C

Rational Preferences (continued)
- Preferences of a rational agent must obey constraints.
- The axioms of rationality (shown as formulas on the original slide)
- Theorem: Rational preferences imply behavior describable as maximization of expected utility

MEU Principle
- Theorem [Ramsey, 1931; von Neumann & Morgenstern, 1944]:
  - Given any preferences satisfying these constraints, there exists a real-valued function U such that U(A) ≥ U(B) exactly when A is weakly preferred to B, and the utility of a lottery equals the probability-weighted sum of the utilities of its prizes
- Maximum expected utility (MEU) principle:
  - Choose the action that maximizes expected utility
  - Note: an agent can be entirely rational (consistent with MEU) without ever representing or manipulating utilities and probabilities
  - E.g., a lookup table for perfect tictactoe

Utility Scales
- Normalized utilities: u+ = 1.0, u- = 0.0
- Micromorts: one-millionth chance of death, useful for paying to reduce product risks, etc.
- QALYs: quality-adjusted life years, useful for medical decisions involving substantial risk
- Note: behavior is invariant under positive linear transformation
- With deterministic prizes only (no lottery choices), only ordinal utility can be determined, i.e., a total order on prizes

Human Utilities
- Utilities map states to real numbers. Which numbers?
- Standard approach to assessment of human utilities:
  - Compare a state A to a standard lottery L_p between
    - "best possible prize" u+ with probability p
    - "worst possible catastrophe" u- with probability 1-p
  - Adjust lottery probability p until A ~ L_p
  - Resulting p is a utility in [0, 1]
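The standard-lottery assessment on the Human Utilities slide can also be sketched as code. Everything here is a toy illustration: prefers_lottery stands in for asking a person which option they take, and bisection is just one way to adjust p until indifference.

    def assess_utility(prefers_lottery, tol=1e-3):
        # Adjust the lottery probability p until the subject is (approximately)
        # indifferent between state A and the lottery L_p = [p: best prize, 1-p: worst catastrophe].
        # The resulting p is A's utility on the normalized [0, 1] scale.
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            p = (lo + hi) / 2
            if prefers_lottery(p):   # lottery too attractive, so lower p
                hi = p
            else:                    # state A still preferred, so raise p
                lo = p
        return (lo + hi) / 2

    # Toy subject whose true utility for A is 0.7: they take the lottery whenever p exceeds 0.7.
    print(round(assess_utility(lambda p: p > 0.7), 2))   # 0.7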
