
Announcements
• Homework 3: Game Trees (lead TA: Zhaoqing)
  • Due Mon 30 Sep at 11:59pm
• Project 2: Multi-Agent Search (lead TA: Zhaoqing)
  • Due Thu 10 Oct at 11:59pm (and Thursdays thereafter)
• Office Hours
  • Iris: Mon 10.00am-noon, RI 237
  • JW: Tue 1.40pm-2.40pm, DG 111
  • Eli: Fri 10.00am-noon, RY 207
  • Zhaoqing: Thu 9.00am-11.00am, HS 202

CS 4100: Artificial Intelligence
Uncertainty and Utilities
Jan-Willem van de Meent
Northeastern University
[These slides were created by Dan Klein, Pieter Abbeel for CS188 Intro to AI at UC Berkeley (ai.berkeley.edu).]

Uncertain Outcomes

Worst-Case vs. Average Case
[Figure: game tree with a max root, a min layer, and leaves 10, 10, 9, 100]
• Idea: Uncertain outcomes controlled by chance, not an adversary!
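To make the contrast concrete, here is a minimal sketch that scores the tree from the figure both ways. It assumes the leaves pair up as (10, 10) under the left action and (9, 100) under the right action, and that chance picks each leaf with equal probability. The names `branches`, `worst_case` and `average_case` are illustrative only.

```python
# Leaves grouped by the action that leads to them (the grouping is an assumption).
branches = {"left": [10, 10], "right": [9, 100]}

def worst_case(leaves):
    # An adversary picks the leaf that is worst for us.
    return min(leaves)

def average_case(leaves):
    # Chance picks a leaf uniformly at random (assumed distribution).
    return sum(leaves) / len(leaves)

minimax_choice = max(branches, key=lambda a: worst_case(branches[a]))
expectimax_choice = max(branches, key=lambda a: average_case(branches[a]))

print(minimax_choice, worst_case(branches[minimax_choice]))          # left 10
print(expectimax_choice, average_case(branches[expectimax_choice]))  # right 54.5
```

Under these assumptions the worst-case agent plays it safe, while the average-case agent accepts the small risk of a 9 for the chance of a 100.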

Expectimax Search
• Why wouldn’t we know what the result of an action will be?
  • Explicit randomness: rolling dice
  • Unpredictable opponents: the ghosts respond randomly
  • Actions can fail: when moving a robot, wheels might slip
• Idea: Values should reflect average-case (expectimax) outcomes, not worst-case (minimax) outcomes
• Expectimax search: compute the average score under optimal play
  • Max nodes as in minimax search
  • Chance nodes are like min nodes but the outcome is uncertain
  • Calculate their expected utilities
  • I.e. take weighted average (expectation) of children
• Later, we’ll learn how to formalize the underlying uncertain-result problems as Markov Decision Processes
[Figure: a max node over chance nodes; values shown: 10, 10, 10, 4, 9, 5, 100, 7]
[Demo: min vs exp (L7D1,2)]

Minimax vs Expectimax (Minimax)

Minimax vs Expectimax (Expectimax)

Expectimax Pseudocode

def value(state):
    if the state is a terminal state: return the state’s utility
    if the next agent is MAX: return max-value(state)
    if the next agent is EXP: return exp-value(state)

def max-value(state):
    initialize v = -∞
    for each successor of state:
        v = max(v, value(successor))
    return v

def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v
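The pseudocode can be turned into runnable Python along the following lines. This is a sketch, not the course's reference implementation: the tree encoding (a number for a terminal, otherwise a `(kind, children)` tuple, with `(probability, subtree)` pairs under chance nodes) is an assumption made for illustration, and the sample tree is hypothetical.

```python
import math

# A node is either a number (terminal utility) or a tuple (kind, children).
# kind is "max" or "exp"; under an "exp" node each child is a
# (probability, subtree) pair. This encoding is an assumption.

def value(node):
    if isinstance(node, (int, float)):   # terminal state: return its utility
        return node
    kind, children = node
    if kind == "max":
        return max_value(children)
    if kind == "exp":
        return exp_value(children)
    raise ValueError(f"unknown node kind: {kind!r}")

def max_value(children):
    v = -math.inf
    for child in children:
        v = max(v, value(child))
    return v

def exp_value(children):
    v = 0.0
    for p, child in children:
        v += p * value(child)            # weight each child by its probability
    return v

# Hypothetical tree: a max root choosing between two chance nodes.
tree = ("max", [
    ("exp", [(0.5, 10), (0.5, 10)]),
    ("exp", [(0.5, 9), (0.5, 100)]),
])
print(value(tree))  # 54.5
```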

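Expectimax Pseudocode

def exp-value(state):
    initialize v = 0
    for each successor of state:
        p = probability(successor)
        v += p * value(successor)
    return v

[Figure: a chance node whose children have probabilities 1/2, 1/3, 1/6 and values 8, 24, -12; the figure also shows the values 5 and 7]
v = (1/2) (8) + (1/3) (24) + (1/6) (-12) = 10

Expectimax Example
[Figure: a max node over three chance nodes, each with three equally likely (1/3) children; leaves: 3, 12, 9 | 2, 4, 6 | 15, 6, 0]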
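The Expectimax Example can be worked by hand. The derivation below assumes the leaves group left to right as (3, 12, 9), (2, 4, 6) and (15, 6, 0) under three chance nodes V_1, V_2, V_3 (labels introduced here for convenience), with a max node at the root.

```latex
\begin{align*}
V_1 &= \tfrac{1}{3}(3 + 12 + 9) = 8 \\
V_2 &= \tfrac{1}{3}(2 + 4 + 6) = 4 \\
V_3 &= \tfrac{1}{3}(15 + 6 + 0) = 7 \\
V_{\text{root}} &= \max(V_1, V_2, V_3) = 8
\end{align*}
```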

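Expectimax Pruning?
[Figure: chance nodes with leaves 3, 12, 9, 2, …]

Depth-Limited Expectimax
[Figure: depth-limited tree with evaluations 400, 300, …, 492, 362 at the cutoff; each is an estimate of the true expectimax value (which would require a lot of work to compute)]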
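A depth limit can be sketched by replacing the recursion below the cutoff with an evaluation function. The dictionary-based node layout, the `estimate` field and the `evaluate` callback are all assumptions for illustration; the numbers in the sample tree are hypothetical and are not the ones in the figure.

```python
def value(node, depth, evaluate):
    """Expectimax value of `node`, searching at most `depth` plies."""
    if node["kind"] == "leaf":
        return node["utility"]
    if depth == 0:
        return evaluate(node)            # estimate instead of searching deeper
    if node["kind"] == "max":
        return max(value(c, depth - 1, evaluate) for c in node["children"])
    # Chance node: children are (probability, subtree) pairs.
    return sum(p * value(c, depth - 1, evaluate) for p, c in node["children"])

def leaf(utility):
    return {"kind": "leaf", "utility": utility}

# Hypothetical tree; "estimate" stands in for a real evaluation function.
deep = {"kind": "max", "estimate": 6.0, "children": [leaf(4), leaf(8)]}
root = {"kind": "exp", "estimate": 0.0,
        "children": [(0.5, deep), (0.5, leaf(2))]}

def evaluate(node):
    return node["estimate"]

print(value(root, 2, evaluate))  # 5.0 (full search)
print(value(root, 1, evaluate))  # 4.0 (uses the estimate 6.0 for `deep`)
```

The gap between the two printed values is the error introduced by the estimate at the cutoff.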

Probabilities

Reminder: Probabilities
• A random variable represents an event whose outcome is unknown
• A probability distribution assigns weights to outcomes
• Example: Traffic on freeway
  • Random variable: T = amount of traffic
  • Outcomes: T ∈ {none, light, heavy}
  • Distribution: P(T=none) = 0.25, P(T=light) = 0.50, P(T=heavy) = 0.25
• Some laws of probability (more later):
  • Probabilities are always non-negative
  • Probabilities over all possible outcomes sum to one
• As we get more evidence, probabilities may change:
  • P(T=heavy) = 0.25, P(T=heavy | Hour=8am) = 0.60
  • We’ll talk about methods for reasoning and updating probabilities later
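The two laws listed above can be checked mechanically. The sketch below stores the traffic distribution as a plain dictionary; the helper name `is_valid_distribution` and the tolerance are assumptions.

```python
import math

# Traffic distribution from the example above.
P_T = {"none": 0.25, "light": 0.50, "heavy": 0.25}

def is_valid_distribution(dist, tol=1e-9):
    """Check the two laws: non-negative weights that sum to one."""
    non_negative = all(p >= 0 for p in dist.values())
    sums_to_one = math.isclose(sum(dist.values()), 1.0, abs_tol=tol)
    return non_negative and sums_to_one

print(is_valid_distribution(P_T))                          # True
print(is_valid_distribution({"none": 0.7, "heavy": 0.7}))  # False
```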

Reminder: Expectations
• The expected value of a function f(X) of a random variable X is a weighted average over outcomes.
• Example: How long to get to the airport?
  Time:         20 min   30 min   60 min
  Probability:  0.25     0.50     0.25
  Expected time: 0.25 x 20 min + 0.50 x 30 min + 0.25 x 60 min = 35 min

What Probabilities to Use?
• In expectimax search, we have a probabilistic model of the opponent (or environment)
  • Model could be a simple uniform distribution (roll a die)
  • Model could be sophisticated and require a great deal of computation
  • We have a chance node for any outcome out of our control: opponent or environment
  • The model might say that adversarial actions are likely!
• For now, assume each chance node “magically” comes along with probabilities that specify the distribution over its outcomes
Having a probabilistic belief about another agent’s action does not mean that the agent is flipping any coins!
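The airport example from the expectations reminder, and the "roll a die" uniform model, can both be computed with one small helper. The function name `expectation` and the list-of-pairs representation are assumptions for this sketch.

```python
def expectation(distribution, f=lambda x: x):
    """Weighted average of f over (outcome, probability) pairs."""
    return sum(p * f(x) for x, p in distribution)

# Travel time to the airport in minutes, with the probabilities given above.
travel_time = [(20, 0.25), (30, 0.50), (60, 0.25)]
print(expectation(travel_time))  # 35.0

# A simple uniform model: one fair six-sided die.
die = [(face, 1 / 6) for face in range(1, 7)]
print(expectation(die))          # close to 3.5
```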

Quiz: Informed Probabilities
• Let’s say you know that your opponent is actually running a depth 2 minimax, using the result 80% of the time, and moving randomly otherwise
• Question: What tree search should you use?
• Answer: Expectimax!
  • To compute EACH chance node’s probabilities, you have to run a simulation of your opponent
  • This kind of thing gets very slow very quickly
  • Even worse if you have to simulate your opponent simulating you…
  • … except for minimax, which has the nice property that it all collapses into one game tree
[Figure: a chance node with outcome probabilities 0.1 and 0.9]

Modeling Assumptions
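The chance-node probabilities for the quiz's opponent can be sketched as follows. Two assumptions are made: "moving randomly" means choosing uniformly among all legal moves (including the minimax move), and the minimax move is supplied by the caller rather than computed by an actual depth-2 search. The function and move names are hypothetical.

```python
def opponent_move_probabilities(legal_moves, minimax_move, p_minimax=0.8):
    """Distribution over the opponent's moves under the quiz's mixed model."""
    # With probability 1 - p_minimax the opponent picks uniformly at random.
    p_random = (1.0 - p_minimax) / len(legal_moves)
    probs = {move: p_random for move in legal_moves}
    # With probability p_minimax it plays its depth-2 minimax move.
    probs[minimax_move] += p_minimax
    return probs

probs = opponent_move_probabilities(["west", "east"], minimax_move="east")
print(probs)  # roughly {'west': 0.1, 'east': 0.9}
```

With two legal moves this reproduces the 0.1 / 0.9 split shown in the figure. In a real agent, `minimax_move` would have to come from simulating the opponent's search at every chance node, which is why this gets slow.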

The Dangers of Optimism and Pessimism
• Dangerous Optimism: Assuming chance when the world is adversarial
• Dangerous Pessimism: Assuming the worst case when it’s not likely

Assumptions vs. Reality

                     Adversarial Ghost            Random Ghost
Minimax Pacman       Won 5/5, Avg. Score: 483     Won 5/5, Avg. Score: 493
Expectimax Pacman    Won 1/5, Avg. Score: -303    Won 5/5, Avg. Score: 503

Results from playing 5 games
Pacman used depth 4 search with an eval function that avoids trouble
Ghost used depth 2 search with an eval function that seeks Pacman
[Demos: world assumptions (L7D3,4,5,6)]

Adversarial Ghost vs. Minimax Pacman

Random Ghost vs. Expectimax Pacman

Adversarial Ghost vs. Expectimax Pacman

Random Ghost vs. Minimax Pacman

Other Game Types

Mixed Layer Types
• E.g. Backgammon
• Expectiminimax
  • Environment is an extra “random agent” player that moves after each min/max agent
  • Each node computes the appropriate combination of its children

Example: Backgammon
• Dice rolls increase breadth: 21 outcomes with 2 dice
  • Backgammon: ~20 legal moves
  • Depth 2: 20 x (21 x 20)^3 ≈ 1.5 x 10^9
• As depth increases, probability of reaching a given search node shrinks
  • So usefulness of search is diminished
  • So limiting depth is less damaging
  • But pruning is trickier…
• Historic AI: TDGammon uses depth-2 search + very good evaluation function + reinforcement learning: world-champion level play
• 1st AI world champion in any game!
Image: Wikipedia
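The backgammon numbers can be checked directly. The sketch below enumerates the 21 distinct two-dice outcomes, notes that they are not equally likely (doubles occur one way in 36, other pairs two ways), and evaluates the depth-2 expression 20 x (21 x 20)^3. Variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

# Unordered outcomes of two six-sided dice: (1,1), (1,2), ..., (6,6).
rolls = list(combinations_with_replacement(range(1, 7), 2))

def roll_probability(roll):
    # Doubles can occur one way out of 36; any other pair can occur two ways.
    a, b = roll
    return Fraction(1 if a == b else 2, 36)

legal_moves = 20  # rough branching factor quoted on the slide
nodes = legal_moves * (len(rolls) * legal_moves) ** 3

print(len(rolls))                               # 21
print(sum(roll_probability(r) for r in rolls))  # 1
print(nodes)                                    # 1481760000
```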

Multi-Agent Utilities
• What if the game is not zero-sum, or has multiple players?
• Generalization of minimax:
  • Terminals have utility tuples (one for each agent)
  • Node values are also utility tuples
  • Each player maximizes its own component
  • Can give rise to cooperation and competition dynamically…
[Figure: three-player game tree with utility tuples (1,6,6), (7,1,2), (6,1,2), (7,2,1), (5,1,7), (1,5,2), (7,7,1), (5,2,5)]

Utilities
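The tuple-valued generalization can be sketched in a few lines. The node layout (dictionaries with `player` and `children`, or `utilities` at terminals) is an assumption, and the small tree below is hypothetical: it borrows four leaf tuples from the figure but does not claim to reproduce the figure's actual shape.

```python
def value(node):
    """Utility tuple of `node` when each player maximizes its own component."""
    if "utilities" in node:              # terminal: one utility per agent
        return node["utilities"]
    player = node["player"]              # index of the agent moving here
    child_values = (value(child) for child in node["children"])
    return max(child_values, key=lambda utilities: utilities[player])

def terminal(*utilities):
    return {"utilities": utilities}

# Hypothetical two-ply tree: player 0 moves first, then player 1.
tree = {"player": 0, "children": [
    {"player": 1, "children": [terminal(1, 6, 6), terminal(7, 1, 2)]},
    {"player": 1, "children": [terminal(6, 1, 2), terminal(7, 2, 1)]},
]}

print(value(tree))  # (7, 2, 1)
```

Player 1 picks (1, 6, 6) on the left and (7, 2, 1) on the right, and player 0 then prefers the right branch because its own component is 7 rather than 1.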

Maximum Expected Utility
• Why should we average utilities? Why not minimax?
  • Minimax will be overly risk-averse in most settings.
• Principle of maximum expected utility
  • A rational agent should choose the action that maximizes its expected utility, given its knowledge of the world.
• Questions:
  • Where do utilities come from?
  • How do we know such utilities even exist?
  • How do we know that averaging even makes sense?
  • What if our behavior (preferences) can’t be described by utilities?

What Utilities to Use?
[Figure: leaves 20, 30 and 0, 40 become 400, 900 and 0, 1600 under the transformation x^2]
• For worst-case minimax reasoning, terminal scaling doesn’t matter
  • We just want better states to have higher evaluations (get the ordering right)
  • We call this insensitivity to monotonic transformations
• For average-case expectimax reasoning, magnitudes matter
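The x^2 figure can be replayed in code. The sketch assumes the leaves group as (20, 30) and (0, 40) under two actions and that chance nodes weight their children equally; the function names are illustrative.

```python
def minimax_choice(branches):
    # Index of the branch with the best worst-case leaf.
    return max(range(len(branches)), key=lambda i: min(branches[i]))

def expectimax_choice(branches):
    # Index of the branch with the best average leaf (uniform chance assumed).
    return max(range(len(branches)),
               key=lambda i: sum(branches[i]) / len(branches[i]))

original = [[20, 30], [0, 40]]
squared = [[x ** 2 for x in leaves] for leaves in original]  # monotonic on x >= 0

print(minimax_choice(original), minimax_choice(squared))        # 0 0
print(expectimax_choice(original), expectimax_choice(squared))  # 0 1
```

Squaring preserves the ordering of the leaves, so the minimax decision is unchanged, but the averages move from 25 vs. 20 to 650 vs. 800 and the expectimax decision flips.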

Utilities
• Utilities are functions from outcomes (states of the world) to real numbers that describe an agent’s preferences
• Where do utilities come from?
  • In a game, may be simple (+1/-1)
  • Utilities summarize the agent’s goals
  • Theorem: any “rational” preferences can be summarized as a utility function
• We hard-wire utilities and let behaviors emerge
  • Why don’t we let agents pick utilities?
  • Why don’t we prescribe behaviors?

Utilities: Uncertain Outcomes
[Figure: getting ice cream; choices “Get Single” and “Get Double”, with possible outcomes “Oops” and “Whew!”]

Preferences
• An agent must have preferences among:
  • Prizes: A, B, etc.
  • Lotteries: situations with uncertain prizes
    [Figure: a prize A; a lottery yielding A with probability p and B with probability 1-p]
• Notation:
  • Preference: A ≻ B
  • Indifference: A ∼ B

Rationality
