
Game Playing (plus the tail end of Constraint Satisfaction; Ch. 5.1-5.3, 5.4.1, 5.5)



On to Games

• Reading: tail end of Constraint Satisfaction; Ch. 5.1-5.3, 5.4.1, 5.5
• Questions from the reading?
• Framework
• Game trees
• Minimax
• Alpha-beta pruning
• Adding randomness
• We've seen search problems where other agents' moves need to be taken into account. But what if those agents are actively moving against us?

Cynthia Matuszek, CMSC 671. Based on slides by Marie desJardins and Francisco Iacobelli.

Why Games?

• Clear criteria for success
• Offer an opportunity to study problems involving {hostile / adversarial / competing} agents
• Interesting, hard problems that require minimal setup
• Often define very large search spaces
  • Chess: ~35^100 nodes in the search tree, ~10^40 legal states
• Many problems can be formalized as games

State-of-the-Art

"A computer can't be intelligent; one could never beat a human at ____"

• Chess:
  • Deep Blue beat Garry Kasparov in 1997
  • Kasparov vs. Deep Junior (Feb 2003): tie!
  • Kasparov vs. X3D Fritz (November 2003): tie!
  • Deep Fritz beat world champion Vladimir Kramnik (2006)
  • Now computers play computers
• Checkers: "Chinook" (sigh), an AI program with a very large endgame database, is world champion and can provably never be beaten. Retired 1995.

State-of-the-Art

• Bridge: "expert-level" AI, but no world champions
  • "Computer bridge world champion Jack played seven top Dutch pairs ... and two reigning European champions. A total of 196 boards were played. Jack defeated three out of the seven pairs (including the Europeans). Overall, the program lost by a small margin (359 versus 385)." (2006)
  • Bridge is stochastic: the computer has imperfect information.
• Go: "A computer can't be intelligent; one could never beat a human at ____"
  • AlphaGo Master defeated Ke Jie three to zero during its 60 straight wins in online games at the end of 2016 and beginning of 2017.

wikipedia: Computer_bridge
www.wired.com/2017/05/googles-alphago-levels-board-games-power-grids

State-of-the-Art: Go

• Computers finally got there: AlphaGo!
• Made by Google DeepMind in London
• 2015: beat a professional Go player without handicaps
• 2016: beat a 9-dan professional without handicaps
• 2017: beat Ke Jie, the #1 human player
• 2017: DeepMind published AlphaGo Zero
  • No human games data: learns by playing itself
  • Better than AlphaGo after 3 days of play

Typical Games

• 2-person game
• Players alternate moves
• The easiest games are:
  • Zero-sum: one player's loss is the other's gain
  • Fully observable: both players have access to complete information about the state of the game
  • Deterministic: no chance (e.g., dice) involved
• Examples: Tic-Tac-Toe, Checkers, Chess, Go, Nim, Othello
• Not: Bridge, Solitaire, Backgammon, ...

How to Play (How to Search)

• Obvious approach, from the current game state:
  1. Consider all the legal moves you can make
  2. Compute the new position resulting from each move
  3. Evaluate each resulting position
  4. Decide which is best
  5. Make that move
  6. Wait for your opponent to move
  7. Repeat
• Key problems:
  • Representing the "board" (game state): we've seen that there are different ways to make these choices
  • Generating all legal next boards: that can get ugly
  • Evaluating a position

Evaluation Function

• An evaluation function (static evaluator) is used to evaluate the "goodness" of a game position (state)
• The zero-sum assumption allows one evaluation function to describe the goodness of a board for both players
  • One player's gain of n means the other loses n
• How?

Evaluation Function: The Idea

• I am always trying to reach the highest value
• You are always trying to reach the lowest value
• This captures everyone's goal in a single function:
  • f(n) >> 0: position n is good for me and bad for you
  • f(n) << 0: position n is bad for me and good for you
  • f(n) = 0 ± ε: position n is a neutral position
  • f(n) = +∞: win for me
  • f(n) = -∞: win for you
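The sign convention above can be sketched as a toy static evaluator. The position encoding and the single material feature used here are illustrative assumptions, not part of the slides:

```python
import math

def evaluate(my_pieces, your_pieces, i_won=False, you_won=False):
    """Toy zero-sum static evaluator f(n): positive favors me, negative favors you."""
    if i_won:
        return math.inf      # f(n) = +inf: win for me
    if you_won:
        return -math.inf     # f(n) = -inf: win for you
    # A simple material feature: my piece count minus yours.
    return my_pieces - your_pieces

print(evaluate(5, 3))              # positive: good for me
print(evaluate(2, 6))              # negative: good for you
print(evaluate(4, 4))              # zero: neutral position
print(evaluate(0, 0, i_won=True))  # inf: win for me
```

A real evaluator would combine many weighted features, as the next slides show; the sign convention is the only part that matters here.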

Evaluation Function Examples

• An example evaluation function for Tic-Tac-Toe:
  • f(n) = [# 3-lengths open for X] - [# 3-lengths open for O]
  • A 3-length is a complete row, column, or diagonal
• Alan Turing's function for chess:
  • f(n) = w(n) / b(n)
  • w(n) = sum of the point values of white's pieces
  • b(n) = sum of the point values of black's pieces
• Most evaluation functions are specified as a weighted sum of position features:
  • f(n) = w1 * feat1(n) + w2 * feat2(n) + ... + wk * featk(n)
  • Example features for chess: piece count, piece placement, squares controlled, ...
• Deep Blue had over 8000 features in its nonlinear evaluation function: square control, rook-in-file, x-rays, king safety, pawn structure, passed pawns, ray control, outposts, pawn majority, rook on the 7th, blockade, restraint, trapped pieces, color complex, ...

Evaluation Function: The Idea (revisited)

• I am maximizing f(n) on my turn; the opponent is minimizing f(n) on their turn

Game Trees

• Problem spaces for typical games are represented as trees
• The player must decide the best single move to make next
• Root node = current board configuration
• Arcs = possible legal moves for a player
• The static evaluator function rates a board position: f(board) = R, with f > 0 for me, f < 0 for you
• If it is my turn to move, the root is labeled a "MAX" node; otherwise it is a "MIN" node
• Each level's nodes are all MAX or all MIN; nodes at level i are the opposite of those at level i+1

Minimax Procedure

• Create the start node: a MAX node with the current board state
• Expand nodes down to a depth of lookahead
• Apply the evaluation function at each leaf node
• "Back up" values for each non-leaf node until a value is computed for the root node:
  • MIN: the backed-up value is the lowest of the children's values (opponent's turn)
  • MAX: the backed-up value is the highest of the children's values
• Pick the operator associated with the child node whose backed-up value set the value at the root

Minimax Algorithm

• [Figure: a minimax tree with lookahead = 3; static-evaluator values 2, 7, 1, 8 at the leaves; MIN backs up 2 and 1; MAX chooses 2]
• We can only choose the "best" move up to the lookahead depth
• https://www.youtube.com/watch?v=6ELUvkSkCts

Example: Nim

• In Nim, there are a certain number of objects (coins, sticks, etc.) on the table. We'll play 7-coin Nim.
• Each player in turn has to pick up either one or two objects
• Whoever picks up the last object loses

Partial Game Tree for Tic-Tac-Toe

• f(n) = +1 if the position is a win for X
• f(n) = -1 if the position is a win for O
• f(n) = 0 if the position is a draw

Minimax Tree

• [Figure: legend showing MAX nodes, MIN nodes, static f values, and values computed by minimax]

Nim Game Tree

• In-class exercise: draw the minimax search tree for 4-coin Nim
• Things to consider:
  • What's your start state?
  • What's the maximum depth of the tree? Minimum?
• Rules: pick up either one or two objects; whoever picks up the last object loses
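The Nim exercise can also be worked mechanically. This sketch applies minimax to the rules above, using the convention +1 for a player 1 win and -1 for a player 2 win (player 1 moves first and is MAX):

```python
# Minimax value of misere Nim: take 1 or 2 coins; whoever takes the last coin loses.

def nim_value(coins, player1_to_move=True):
    if coins == 0:
        # The previous player took the last coin and therefore lost,
        # so the player now "to move" is the winner.
        return 1 if player1_to_move else -1
    values = [nim_value(coins - take, not player1_to_move)
              for take in (1, 2) if take <= coins]
    return max(values) if player1_to_move else min(values)

print(nim_value(4))  # -1: 4-coin Nim is a loss for the first player
print(nim_value(7))  # -1: so is 7-coin Nim
print(nim_value(2))  # +1: take one coin, leaving the last for the opponent
```

Note the maximum tree depth is the coin count (always taking 1) and the minimum is about half of it (always taking 2), matching the exercise questions.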

Nim Game Tree

• [Figure: the 4-coin Nim game tree, built over three slides; states 4 → 3 or 2 → 2 or 1 → 1 or 0 → 0, with player 1 wins scored +1 and player 2 wins scored -1; the backed-up root value is -1, so 4-coin Nim is a loss for the first player]

Games 2

• Expectiminimax
• Alpha-beta pruning

Improving Minimax

• Basic problem: we must examine a number of states that is exponential in the depth d!
• Solution: judicious pruning of the search tree
  • "Cut off" whole sections that can't be part of the best solution
  • Or, sometimes, that probably won't be
• Can be a completeness vs. efficiency tradeoff, especially in stochastic problem spaces

Alpha-Beta Pruning

• We can improve on the performance of the minimax algorithm through alpha-beta pruning
• "If you have an idea that is surely bad, don't take the time to see how truly awful it is." - Pat Winston
• [Figure: a MAX root over two MIN nodes; the left MIN node sees leaves 2 and 7 and backs up 2; the right MIN node sees leaf 1 and an unexplored leaf "?", so its value is ≤ 1]
• We don't need to compute the value at the "?" node: no matter what it is, it can't affect the value of the root, because the MAX player will choose the left branch's value of 2
