

  1. Adversarial Search
CS 486 / 686
May 19, 2005
University of Waterloo
CS486/686 Lecture Slides (c) 2005 K. Larson and P. Poupart

Introduction
• So far we have studied environments where there is only a single agent
• Today we look at what happens if we are in a setting where there are multiple agents planning against each other
  – Game theory: zero-sum games

Outline
• Games
• Minimax search
• Evaluation functions
• Alpha-beta pruning
• Coping with chance
• Game programs

Games
• Games are one of the oldest, most well-studied domains in AI
• Why?
  – They are fun
  – Games are usually easy to represent and the rules are clear
  – State spaces can be very large (so more challenging than "toy problems")
    • In chess the search tree has ~10^154 nodes
  – Like the "real world" in that decisions have to be made and time is vitally important
  – Easy to determine when a program is doing well
    • i.e. it wins

Types of games
• Perfect vs imperfect information
  – Perfect info means that you can see the entire state of the game
  – Chess, checkers, othello, go, …
  – Imperfect info games include scrabble, poker, most card games
• Deterministic vs stochastic
  – Chess is deterministic
  – Backgammon is stochastic

Games as search problems
• Consider a 2-player perfect information game
  – State: board configuration plus the player whose turn it is to move
  – Successor function: given a state, returns a list of (move, state) pairs, indicating a legal move and the resulting board
  – Terminal state: states where there is a win/loss/draw
  – Utility function: assigns a numerical value to terminal states (e.g. in chess +1 for a win, -1 for a loss, 0 for a draw)
  – Solution: a strategy (way of picking moves) that wins the game
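The four ingredients listed under "Games as search problems" can be sketched for tic-tac-toe as follows. This is a minimal illustration, not code from the lecture: the board encoding (a tuple of nine cells) and the function names `initial_state`, `successors`, `is_terminal` and `utility` are assumptions made for this example.

```python
from typing import List, Optional, Tuple

# Assumed encoding: a board is a tuple of 9 cells ('X', 'O' or ' '), and a
# state pairs the board with the player whose turn it is to move.
Board = Tuple[str, ...]
State = Tuple[Board, str]

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals


def initial_state() -> State:
    """Empty board, X (MAX) to move."""
    return (tuple(" " * 9), "X")


def winner(board: Board) -> Optional[str]:
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def is_terminal(state: State) -> bool:
    """Terminal state: someone has won, or the board is full (draw)."""
    board, _ = state
    return winner(board) is not None or " " not in board


def successors(state: State) -> List[Tuple[int, State]]:
    """Successor function: (move, state) pairs for every legal move."""
    if is_terminal(state):
        return []
    board, player = state
    other = "O" if player == "X" else "X"
    result = []
    for i, cell in enumerate(board):
        if cell == " ":
            new_board = board[:i] + (player,) + board[i + 1:]
            result.append((i, (new_board, other)))
    return result


def utility(state: State) -> int:
    """Utility from MAX's (X's) point of view: +1 win, -1 loss, 0 draw."""
    w = winner(state[0])
    return 1 if w == "X" else -1 if w == "O" else 0
```

A search procedure only needs these four functions, so the same interface would serve for any other 2-player perfect information game.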

  2. Example: Tic-Tac-Toe
[Figure: partial tic-tac-toe game tree. Levels alternate between MAX (X) and MIN (O) down to the TERMINAL states, which have utility −1, 0 or +1.]
MAX's job is to use the search tree to determine the best move.

Game search challenge
• What makes game search challenging?
  – There is an opponent!
  – The opponent is malicious: it wants to win (i.e. it is trying to make you lose)
  – We need to take this into account when choosing moves
    • Simulate the opponent's behaviour in our search
• Notation: One player is called MAX (who wants to maximize its utility) and one player is called MIN (who wants to minimize MAX's utility)

Optimal strategies
• In standard search the optimal solution is a sequence of moves leading to a winning terminal state
• But MIN has something to say about this
• Strategy (from MAX's perspective):
  – Specify a move for the initial state, specify a move for all possible states arising from MIN's response, then all possible responses to all of MIN's responses to MAX's previous move…

Optimal strategies
• Want to find the optimal strategy
  – One that leads to outcomes at least as good as any other strategy, given that MIN is playing optimally
  – Equilibrium (game theory)
  – Zero-sum games of perfect information are "easy games" from a game theoretic perspective

Minimax Value
$$
\text{MINIMAX-VALUE}(n) =
\begin{cases}
\text{Utility}(n) & \text{if } n \text{ is a terminal state} \\
\max_{s \in \text{Succ}(n)} \text{MINIMAX-VALUE}(s) & \text{if } n \text{ is a MAX node} \\
\min_{s \in \text{Succ}(n)} \text{MINIMAX-VALUE}(s) & \text{if } n \text{ is a MIN node}
\end{cases}
$$

Minimax algorithm
• Returns action corresponding to best possible move
[Figure: two-ply game tree. MAX node A (value 3) has moves a1, a2, a3 leading to MIN nodes B (value 3), C (value 2) and D (value 2). Leaves: b1, b2, b3 = 3, 12, 8; c1, c2, c3 = 2, 4, 6; d1, d2, d3 = 14, 5, 2.]
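The MINIMAX-VALUE definition can be checked against the two-ply example tree with a short sketch. The nested-dictionary encoding and the names `minimax_value` and `minimax_decision` are assumptions made for this illustration; the leaf values are the ones in the figure.

```python
from typing import Dict, Union

# A node is either a terminal utility (int) or a mapping from move name to child.
Tree = Union[int, Dict[str, "Tree"]]

# The two-ply example: MAX moves at the root A, MIN moves at B, C and D.
EXAMPLE: Dict[str, Tree] = {
    "a1": {"b1": 3, "b2": 12, "b3": 8},
    "a2": {"c1": 2, "c2": 4, "c3": 6},
    "a3": {"d1": 14, "d2": 5, "d3": 2},
}


def minimax_value(node: Tree, max_to_move: bool) -> int:
    """Utility at terminal nodes, max over successors at MAX nodes, min at MIN nodes."""
    if isinstance(node, int):
        return node
    values = [minimax_value(child, not max_to_move) for child in node.values()]
    return max(values) if max_to_move else min(values)


def minimax_decision(root: Dict[str, Tree]) -> str:
    """Return MAX's move whose resulting MIN node has the highest minimax value."""
    return max(root, key=lambda move: minimax_value(root[move], max_to_move=False))


print(minimax_value(EXAMPLE, max_to_move=True))  # 3
print(minimax_decision(EXAMPLE))                 # a1
```

The recursion visits every leaf exactly once, depth first, which is where the time and space bounds for minimax come from.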

  3. Properties of Minimax
• Complete if tree is finite
• Time complexity: O(b^m)
• Space complexity: O(bm) (it is DFS)
• Optimal against an optimal opponent
  – If MIN does not play optimally then we might be able to do better following a different strategy
b is the branching factor; m is the depth of the tree

Minimax and multi-player games
[Figure: three-player game tree. Players A, B and C move in turn, and every node is labelled with a utility vector (A, B, C); one of the C nodes is marked X.
  to move A: (1, 2, 6)
  to move B: (1, 2, 6), (−1, 5, 2)
  to move C: (1, 2, 6), (6, 1, 2), (−1, 5, 2), (5, 4, 5)
  to move A (leaves): (1, 2, 6), (4, 2, 3), (6, 1, 2), (7, 4, −1), (5, −1, −1), (−1, 5, 2), (7, 7, −1), (5, 4, 5)]
• Cannot handle alliances, sidepayments…

• Can we now write a program that will play chess reasonably well?
• No!
  – For chess b~35 and m~100
  – Do we really need to look at all those nodes?

Alpha-Beta Pruning
• If we are smart (and lucky) we can do pruning
  – Eliminate large parts of the tree from consideration
• Alpha-Beta pruning applied to a minimax tree
  – Returns the same decision as minimax
  – Prunes branches that cannot influence final decision
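To see why the answer for chess is "No!", the O(b^m) bound can be evaluated with the quoted figures b~35 and m~100. The sketch below is only a back-of-the-envelope check; the search speed `NODES_PER_SECOND` is a hypothetical number chosen to give a sense of scale, not a figure from the lecture.

```python
import math

b, m = 35, 100                 # branching factor and depth quoted for chess
nodes = b ** m                 # size of the full minimax tree, O(b^m)

exponent = m * math.log10(b)
print(f"b^m is about 10^{exponent:.1f}")   # b^m is about 10^154.4
print(len(str(nodes)) - 1)                 # 154, the order of magnitude

# Hypothetical search speed, used only for illustration.
NODES_PER_SECOND = 1e9
SECONDS_PER_YEAR = 3600 * 24 * 365
years = nodes / (NODES_PER_SECOND * SECONDS_PER_YEAR)
print(f"roughly 10^{math.log10(years):.0f} years at that speed")
```

The result matches the ~10^154 nodes quoted for the chess search tree, so exhaustive minimax is out of reach and the tree has to be pruned.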

  4. Alpha-Beta Pruning
• Alpha:
  – Value of best (highest value) choice we have found so far on the path for MAX
• Beta:
  – Value of best (lowest value) choice we have found so far on the path for MIN
• Update alpha and beta as search continues
• Prune as soon as the value of the current node is known to be worse than current alpha or beta values for MAX or MIN

Alpha-Beta example
[Figure sequence: a MAX root above MIN nodes, with leaves examined left to right. The bracketed interval beside each node is the range of values it can still take.]
• Leaf 3 examined: MAX [-inf, inf]; first MIN node [-inf, 3]
• Leaves 3, 12 examined: MAX [-inf, inf]; first MIN node [-inf, 3]
• Leaves 3, 12, 8 examined: MAX [3, inf]; first MIN node [3, 3]
• Leaf 2 examined under the second MIN node: MAX [3, inf]; MIN nodes [3, 3], [-inf, 2]
• Prune remaining children of the second MIN node

  5. Alpha-Beta example
• Leaf 14 examined under the third MIN node: MAX [3, 14]; MIN nodes [3, 3], [-inf, 2], [-inf, 14]
• Leaf 5 examined: MAX [3, 5]; MIN nodes [3, 3], [-inf, 2], [-inf, 5]
• Leaf 2 examined: MAX [3, 3]; MIN nodes [3, 3], [-inf, 2], [2, 2]

Properties of Alpha-Beta
• Pruning does not affect the final result
  – You prune parts of the tree that you would never reach in actual play
• The order in which moves are evaluated is important
  – With bad move ordering, will prune nothing
  – With perfect node ordering, can reduce time complexity to O(b^(m/2))

Real-time decisions
• Alpha-beta can be a huge improvement over minimax
  – Still not good enough, as we need to search all the way to terminal states for at least part of the search space
  – Need to make a decision about a move quickly
• Heuristic evaluation function + cutoff test

Evaluation functions
• Apply an evaluation function to a state
  – If terminal state, function returns actual utility
  – If non-terminal, function returns estimate of the expected utility (i.e. the chance of winning from that state)
  – Function must be fast to compute
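The alpha-beta example and the "evaluation function + cutoff test" idea can be sketched together. This is an illustration under stated assumptions: the nested-list tree encoding, the names `alphabeta` and `mean_leaf`, and the depth-counter cutoff are choices made for this example. The two pruned leaves (4 and 6) are assumed from the earlier minimax tree, and `mean_leaf` is a stand-in heuristic, not a realistic evaluation function.

```python
import math
from typing import Callable, List, Union

Tree = Union[int, List["Tree"]]   # int = terminal utility, list = children

EXAMPLE: Tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]


def leaves(node: Tree) -> List[int]:
    """All terminal utilities below a node, left to right."""
    return [node] if isinstance(node, int) else [x for c in node for x in leaves(c)]


def mean_leaf(node: Tree) -> float:
    """Stand-in evaluation function: average utility below the node."""
    vals = leaves(node)
    return sum(vals) / len(vals)


def alphabeta(node: Tree, alpha: float, beta: float, max_to_move: bool,
              depth_left: int, evaluate: Callable[[Tree], float],
              visited: List[int]) -> float:
    if isinstance(node, int):          # terminal state: return the actual utility
        visited.append(node)
        return node
    if depth_left == 0:                # cutoff test: return the heuristic estimate
        return evaluate(node)
    value = -math.inf if max_to_move else math.inf
    for child in node:
        child_value = alphabeta(child, alpha, beta, not max_to_move,
                                depth_left - 1, evaluate, visited)
        if max_to_move:
            value = max(value, child_value)
            alpha = max(alpha, value)
        else:
            value = min(value, child_value)
            beta = min(beta, value)
        if alpha >= beta:              # remaining children cannot change the decision
            break
    return value


# Deep enough to reach every terminal state: same answer as minimax.
visited: List[int] = []
full = alphabeta(EXAMPLE, -math.inf, math.inf, True, 10, mean_leaf, visited)
print(full, visited)      # 3 [3, 12, 8, 2, 14, 5, 2]

# Cut off after one ply: MIN nodes are scored by the evaluation function.
shallow = alphabeta(EXAMPLE, -math.inf, math.inf, True, 1, mean_leaf, [])
print(round(shallow, 2))  # 7.67
```

With the full search, the leaves 4 and 6 are never visited, matching the "prune remaining children" step of the example. With the cutoff, the value is only an estimate, so the quality of the decision depends on the evaluation function.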
