
CSE 473: Artificial Intelligence, Autumn 2015
Hill Climbing, Expectimax Search, Uncertainty
Fereshteh Sadeghi and Steve Tanimoto
With slides from: Dieter Fox, Dan Weld, Dan Klein, Pieter Abbeel, and others.
10/23/2015



  3. Worst-Case vs. Average Case
     • Idea: uncertain outcomes controlled by chance!
     • Example tree: a max node over chance nodes, with leaf values 10, 10, 9, 100.

     Reminder: Probabilities
     • A random variable represents an event whose outcome is unknown.
     • A probability distribution is an assignment of weights to outcomes.
     • Example: traffic on the freeway
       • Random variable: T = whether there's traffic
       • Outcomes: T in {none, light, heavy}
       • Distribution: P(T=none) = 0.25, P(T=light) = 0.50, P(T=heavy) = 0.25
     • Some laws of probability (more later):
       • Probabilities are always non-negative.
       • Probabilities over all possible outcomes sum to one.
     • As we get more evidence, probabilities may change:
       • P(T=heavy) = 0.25, but P(T=heavy | Hour=8am) = 0.60
       • We'll talk about methods for reasoning about and updating probabilities later.

     Reminder: Expectations
     • The expected value of a function of a random variable is the average of its values, weighted by the probability distribution over outcomes.
     • Example: how long to get to the airport?
       • Time: 20 min, 30 min, or 60 min, with probability 0.25, 0.50, 0.25
       • Expected time: 0.25 × 20 + 0.50 × 30 + 0.25 × 60 = 35 min
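The airport example above can be checked in a few lines of Python; the times and probabilities are taken directly from the slide:

```python
# Expected travel time to the airport, weighted by the traffic distribution.
times = {"none": 20, "light": 30, "heavy": 60}    # minutes
probs = {"none": 0.25, "light": 0.50, "heavy": 0.25}

expected_time = sum(probs[t] * times[t] for t in times)
print(expected_time)  # 35.0 minutes
```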

  4. Worst-Case vs. Average Case
     • In minimax, min nodes assume a worst-case adversary (leaf values 10, 10, 9, 100).
     • Idea: uncertain outcomes controlled by chance, not an adversary!

     Randomness?
     • Why wouldn't we know the results of an action?
       • Explicit randomness: rolling dice
       • Unpredictable opponents: the ghosts respond erratically
       • Actions can fail: when a robot moves, its wheels might slip

     What Probabilities to Use?
     • In expectimax search, we have a probabilistic model of how the opponent (or environment) will behave in any state.
       • The model could be a simple uniform distribution (roll a die).
       • The model could be sophisticated and require a great deal of computation.
       • We have a chance node for any outcome out of our control: opponent or environment.
       • The model might say that adversarial actions are likely!
     • For now, assume each chance node magically comes along with probabilities that specify the distribution over its outcomes.

     Expectimax Search
     • Values now reflect average-case (expected) outcomes, not worst-case (minimum) outcomes.
     • Expectimax search: compute the average score under optimal play.
       • Max nodes work as in minimax search.
       • Chance nodes are like min nodes, but the outcome is uncertain: calculate their expected utilities, i.e., take the weighted average (expectation) of the children.
     [Demo: min vs exp (L7D1,2)]

     Expectimax Pseudocode

     def value(state):
         if the state is a terminal state: return the state's utility
         if the next agent is MAX: return max-value(state)
         if the next agent is EXP: return exp-value(state)

     def max-value(state):
         initialize v = -∞
         for each successor of state:
             v = max(v, value(successor))
         return v

     def exp-value(state):
         initialize v = 0
         for each successor of state:
             p = probability(successor)
             v += p * value(successor)
         return v
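The pseudocode above can be turned into a small runnable sketch. The node representation here is an assumption made for this example (leaves are plain numbers, a max node is `("max", children)`, a chance node is `("exp", [(p, child), ...])`):

```python
import math

def value(node):
    if isinstance(node, (int, float)):   # terminal state: return its utility
        return node
    kind, children = node
    if kind == "max":
        return max_value(children)
    if kind == "exp":
        return exp_value(children)
    raise ValueError(f"unknown node kind: {kind}")

def max_value(children):
    v = -math.inf
    for child in children:
        v = max(v, value(child))
    return v

def exp_value(children):
    v = 0.0
    for p, child in children:            # p = probability(successor)
        v += p * value(child)
    return v

# A max node choosing between two chance nodes with uniform outcomes,
# using the slide's leaf values 10, 10, 9, 100:
tree = ("max", [
    ("exp", [(0.5, 10), (0.5, 10)]),     # expected value 10
    ("exp", [(0.5, 9), (0.5, 100)]),     # expected value 54.5
])
print(value(tree))  # 54.5
```

Note how the max node prefers the risky branch: its average-case value (54.5) beats the safe branch's 10, whereas a minimax agent would see only the worst case (9 vs. 10) and choose the safe branch.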

  5. Expectimax Pseudocode: Example
     • A chance node whose successors have probabilities 1/2, 1/3, 1/6 and utilities 8, 24, -12:

     def exp-value(state):
         initialize v = 0
         for each successor of state:
             p = probability(successor)
             v += p * value(successor)
         return v

     v = (1/2)(8) + (1/3)(24) + (1/6)(-12) = 4 + 8 - 2 = 10

     Maximum Expected Utility
     • Why should we average utilities?
     • Principle of maximum expected utility: a rational agent should choose the action that maximizes its expected utility, given its knowledge.
     • Questions:
       • Where do utilities come from?
       • How do we know such utilities even exist?
       • How do we know that averaging even makes sense?
       • What if our behavior (preferences) can't be described by utilities?

     Utilities
     • Utilities are functions from outcomes (states of the world) to real numbers that describe an agent's preferences.
     • Where do utilities come from?
       • In a game, they may be simple (+1/-1).
       • Utilities summarize the agent's goals.
       • Theorem: any "rational" preferences can be summarized as a utility function.
     • We hard-wire utilities and let behaviors emerge:
       • Why don't we let agents pick utilities?
       • Why don't we prescribe behaviors?

     Utilities: Uncertain Outcomes
     • Example: getting ice cream. "Get single" has a certain outcome; "get double" is a gamble whose outcome may be "Oops" or "Whew!".

     Preferences
     • An agent must have preferences among:
       • Prizes: A, B, etc.
       • Lotteries: situations with uncertain prizes, e.g. L = [p, A; (1-p), B]
     • Notation:
       • Preference: A ≻ B
       • Indifference: A ~ B
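The worked chance-node value can be verified exactly using Python's `fractions` module, avoiding any floating-point rounding:

```python
from fractions import Fraction as F

# Chance node from the slide: utilities 8, 24, -12 with probabilities 1/2, 1/3, 1/6.
outcomes = [(F(1, 2), 8), (F(1, 3), 24), (F(1, 6), -12)]

v = sum(p * u for p, u in outcomes)
print(v)  # 10
```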

  6. Rationality / Rational Preferences
     • We want some constraints on preferences before we call them rational, such as:
       • Axiom of transitivity: (A ≻ B) ∧ (B ≻ C) ⇒ (A ≻ C)
     • For example, an agent with intransitive preferences can be induced to give away all of its money:
       • If B ≻ C, then an agent holding C would pay (say) 1 cent to get B.
       • If A ≻ B, then an agent holding B would pay (say) 1 cent to get A.
       • If C ≻ A, then an agent holding A would pay (say) 1 cent to get C.
       • ...and so on around the cycle, forever.

     MEU Principle
     • Theorem [Ramsey, 1931; von Neumann & Morgenstern, 1944]: given any preferences satisfying the axioms of rationality, there exists a real-valued function U such that U(A) ≥ U(B) iff A ⪰ B, and U([p₁, S₁; ...; pₙ, Sₙ]) = Σᵢ pᵢ U(Sᵢ).
       • I.e., the values assigned by U preserve preferences over both prizes and lotteries!
     • Maximum expected utility (MEU) principle: choose the action that maximizes expected utility.
     • Note: an agent can be entirely rational (consistent with MEU) without ever representing or manipulating utilities and probabilities.
       • E.g., a lookup table for perfect tic-tac-toe, or a reflex vacuum cleaner.
     • Theorem: rational preferences imply behavior describable as maximization of expected utility.

     Human Utilities
     • Would you play Russian roulette?
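The money-pump argument can be simulated directly. The starting prize, cash amount, and 1-cent trade price below are illustrative assumptions; the cyclic preferences are the slide's:

```python
# Money pump against intransitive preferences: B > C, A > B, C > A.
# Each trade up to a "better" prize costs 1 cent; the cycle never ends.
prefers = {("B", "C"), ("A", "B"), ("C", "A")}   # (preferred, over)

holding, cents = "C", 10    # start holding prize C with 10 cents
trades = 0
while cents > 0:
    for better, worse in prefers:
        if worse == holding:
            holding, cents = better, cents - 1   # pay 1 cent to trade up
            trades += 1
            break
print(trades, cents)  # the agent trades until it is broke
```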

  7. Playing Russian Roulette?
     • How much would you pay to avoid a risk?
     • What value would people place on their own lives?
       • Perhaps tens of thousands of dollars...??
     • A micromort is a one-in-a-million chance of death.
     • Actual human behavior reflects a much lower monetary value for a micromort!
       • Driving for 230 miles incurs a risk of one micromort!
       • Over the life of your car (~92k miles) that's 400 micromorts!
       • Yet people are willing to pay $10k for a car that halves the risk of death!

     Utility Scales
     • Normalized utilities: u+ = 1.0, u- = 0.0
     • Micromorts: one-millionth chance of death; useful for paying to reduce product risks, etc.
     • QALYs: quality-adjusted life years; useful for medical decisions involving substantial risk.
     • Note: behavior is invariant under positive linear transformations of the utility function.
     • With deterministic prizes only (no lottery choices), only ordinal utility can be determined, i.e., a total order on prizes.
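The implied price of a micromort follows from the numbers on the slide (assuming, as the slide suggests, that the $10k purchase halves the ~400-micromort lifetime driving risk):

```python
# Implied monetary value of a micromort from the slide's car example.
miles_per_micromort = 230
lifetime_miles = 92_000
price_to_halve_risk = 10_000          # dollars

lifetime_micromorts = lifetime_miles / miles_per_micromort
micromorts_avoided = lifetime_micromorts / 2        # halving the risk
dollars_per_micromort = price_to_halve_risk / micromorts_avoided

print(round(lifetime_micromorts))     # 400
print(dollars_per_micromort)          # 50.0 dollars per micromort
```

Fifty dollars per micromort is indeed far below the "tens of thousands of dollars" people claim they would pay, which is the slide's point.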

  8. Human Utilities
     • Utilities map states to real numbers. Which numbers?
     • Standard approach to assessment (elicitation) of human utilities:
       • Compare a prize A to a standard lottery L_p between:
         • the "best possible prize" u+ with probability p, and
         • the "worst possible catastrophe" u- with probability 1-p.
       • Adjust the lottery probability p until indifference: A ~ L_p.
       • The resulting p is a utility in [0, 1].
     • Example: "pay $30" ~ [0.999999, no change; 0.000001, instant death]

     Utility of Money
     • Money plays a significant role in human utility functions.
     • Usually an agent prefers more money to less: the agent exhibits a monotonic preference for more money.
     • But this does not mean that money behaves as a utility function! It says nothing about preferences between lotteries involving money.
     • Money does not behave as a utility function, but we can talk about the utility of having money (or being in debt):
       • Given a lottery L = [p, $X; (1-p), $Y]
       • The expected monetary value is EMV(L) = p·X + (1-p)·Y
       • The expected utility is U(L) = p·U($X) + (1-p)·U($Y)
       • Typically, U(L) < U(EMV(L)).
     • In this sense, people are risk-averse; when deep in debt, people are risk-prone.

     Money Example
     • In a television game show, you may:
       • A) take a $1,000,000 prize, or
       • B) gamble on the flip of a coin: if heads, nothing; if tails, you get $2,500,000.
     • Which one would you take, A or B?
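The game-show choice can be made quantitative by plugging in a concave utility function. The square-root utility below is an illustrative assumption, not something from the slides; any concave U produces the same qualitative U(L) < U(EMV(L)) behavior:

```python
import math

# Game show: A = sure $1,000,000; B = [0.5, $0; 0.5, $2,500,000].
p, x, y = 0.5, 0, 2_500_000
sure = 1_000_000

def U(dollars):
    # Illustrative concave (risk-averse) utility function.
    return math.sqrt(dollars)

emv_B = p * x + (1 - p) * y          # expected monetary value of the gamble
eu_B = p * U(x) + (1 - p) * U(y)     # expected utility of the gamble
eu_A = U(sure)                       # utility of the sure prize

print(emv_B)        # 1250000.0 -- higher EMV than the sure $1,000,000...
print(eu_A > eu_B)  # True -- ...yet the risk-averse agent still takes A
```

This is exactly the slide's point: the gamble has the higher expected monetary value, but for a risk-averse agent U(L) < U(EMV(L)), so the sure prize wins.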
