CS 730/830: Intro AI: Adversarial Search


SLIDE 1

CS 730/830: Intro AI

Adversarial Search

Wheeler Ruml (UNH) Lecture 7, CS 730 – 1 / 19

1 handout: slides

You think you know when you can learn, are more sure when you can write, even more when you can teach, but certain when you can program.

SLIDE 2

EOLQs

SLIDE 3

Adversarial Search

■ Problems
■ Different!
■ Minimax
■ Tic-tac-toe
■ Improvements
■ Break
■ α-β Pruning
■ α-β Pseudo-code
■ Why α-β?
■ Progress
■ EOLQs

SLIDE 4

Planning Problems

Observability: complete, partial, hidden
State: discrete, continuous
Actions: deterministic, stochastic, discrete, continuous
Nature: static, deterministic, stochastic
Interaction: one decision, sequential
Time: static/off-line, on-line, discrete, continuous
Percepts: discrete, continuous, uncertain
Others: solo, cooperative, competitive

SLIDE 5

Multi-agent is Different

Shortest-path (missionaries and cannibals, vacuum, tile puzzle)

want least-cost path to goal at unknown depth

Decisions with an adversary (chess, tic-tac-toe)

adversary might prevent path to best goal

want best assured outcome assuming rational opponent

irrational opponent can only do worse, so the assured outcome is a lower bound

SLIDE 6

Adversarial Search: Minimax

Each ply corresponds to half a move.
Terminal states are labeled with value.

incorrect version by Zermelo (1912)
full treatment by von Neumann and Morgenstern (1944)

Can also bound depth and use a static evaluation function on non-terminal states.
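A minimal sketch of depth-bounded minimax as described on this slide. It assumes a hypothetical game tree encoded as nested Python lists (a number is a terminal value, a list holds the children) and a caller-supplied static evaluation function `sef`; these names and the example tree are illustrative and do not come from the lecture.

```python
def minimax(state, depth, maximizing, sef):
    """Return the minimax value of `state`, searching at most `depth` plies."""
    if not isinstance(state, list):      # terminal state: labeled with its value
        return state
    if depth == 0:                       # depth bound reached: estimate instead
        return sef(state)
    values = [minimax(child, depth - 1, not maximizing, sef) for child in state]
    return max(values) if maximizing else min(values)


# Hypothetical two-ply tree: Max moves at the root, Min replies.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
full = minimax(tree, 2, True, sef=lambda s: 0)      # Min picks 3, 2, 2; Max picks 3
shallow = minimax(tree, 1, True, sef=lambda s: 0)   # cutoff: every child scored 0
```

With the full two plies the root value is 3; with a one-ply bound the answer depends entirely on the (here trivial) evaluation function.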

SLIDE 7

Evaluation for Tic-tac-toe

A 3-length is a complete row, column, or diagonal.

value of position
  = ∞ if a win for me, or
  = −∞ if a win for you, otherwise
  = (# 3-lengths open for me) − (# 3-lengths open for you)
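The evaluation above could be sketched as follows. This assumes a board stored as a 9-character string in row-major order with `' '` for an empty cell, and it assumes "open for me" means a 3-length containing none of the opponent's marks; both the encoding and the function name `evaluate` are illustrative choices, not part of the slides.

```python
import math

# The eight 3-lengths of a 3x3 board, as index triples into a 9-cell sequence.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals


def evaluate(board, me, you):
    """Static evaluation of `board` (9 cells holding me, you, or ' ')."""
    cells = [[board[i] for i in line] for line in LINES]
    if any(all(c == me for c in line) for line in cells):
        return math.inf                      # win for me
    if any(all(c == you for c in line) for line in cells):
        return -math.inf                     # win for you
    # Assumption: a 3-length is "open" for a player if the opponent has no mark in it.
    open_for_me = sum(1 for line in cells if you not in line)
    open_for_you = sum(1 for line in cells if me not in line)
    return open_for_me - open_for_you
```

For example, an X alone in the center leaves all 8 lines open for X and only 4 for O, so `evaluate("    X    ", "X", "O")` is 4, while an X alone in a corner scores 3.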

SLIDE 8

Tic-tac-toe: two-ply search

SLIDE 9

Tic-tac-toe: second move

SLIDE 10

Tic-tac-toe: third move

SLIDE 11

Improving the Search

partial expansion, static evaluation function (SEF)

symmetry (‘transposition tables’)

search more ply as we have time (De Groot figure)

avoid unnecessary evaluations
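One way to "search more ply as we have time" is to rerun a depth-bounded search with a growing depth limit until a deadline passes. The sketch below is only an illustration under stated assumptions: the nested-list tree encoding, the names `best_move_anytime`, `time_budget`, and `max_depth` are hypothetical, and the clock is checked only between iterations (a real player would likely also abort mid-search).

```python
import time


def minimax(state, depth, maximizing, sef):
    """Depth-bounded minimax over nested lists (numbers are terminal values)."""
    if not isinstance(state, list):
        return state
    if depth == 0:
        return sef(state)
    values = [minimax(c, depth - 1, not maximizing, sef) for c in state]
    return max(values) if maximizing else min(values)


def best_move_anytime(state, sef, time_budget, max_depth):
    """Deepen one ply at a time; return (best child index, depth reached)."""
    deadline = time.monotonic() + time_budget
    best, reached = 0, 0
    for depth in range(1, max_depth + 1):
        if time.monotonic() >= deadline:
            break
        # Children of the root are Min nodes, searched to depth - 1.
        scores = [minimax(c, depth - 1, False, sef) for c in state]
        best, reached = scores.index(max(scores)), depth
    return best, reached


tree = [[2, 4, 6], [3, 12, 8], [14, 5, 2]]
choice = best_move_anytime(tree, lambda s: 0, time_budget=5.0, max_depth=2)
```

The result of the last completed iteration is always available, so the player has a move to make whenever time runs out.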

SLIDE 12

Break

asst 3

asst 4

projects! talk with me well before break

SLIDE 13

Which Values are Necessary?

SLIDE 14

α-β Pruning

α: best outcome Max can force at previous decision on this path (init to −∞)
β: best outcome Min can force at previous decision on this path (init to ∞)

α and β values are copied down the tree (but not up).
Minimax values are passed up the tree, as usual.

John McCarthy (1956 but never published)
simple version used by Newell, Shaw, and Simon (1958)
published by Hart and Edwards (1961)
proved correct and analyzed by Knuth and Moore (1975)
proved optimal by Pearl (1982)

SLIDE 15

α-β Pseudo-code

Max-value (state, α, β):
  when depth-cutoff (state), return SEF(state)
  for each child of state
    α ← max(α, Min-value (child, α, β))
    when α ≥ β, return α
  return α

Min-value (state, α, β):
  when depth-cutoff (state), return SEF(state)
  for each child of state
    β ← min(β, Max-value (child, α, β))
    when β ≤ α, return β
  return β
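The pseudo-code above can be transcribed into runnable Python roughly as follows. This is a sketch under assumptions not in the slides: the tree is a hypothetical nested list whose leaves stand in for SEF values at the depth cutoff, and the extra `seen` list exists only to record which leaves were actually evaluated.

```python
import math


def max_value(state, alpha, beta, seen):
    if not isinstance(state, list):      # depth cutoff: a leaf carries its SEF value
        seen.append(state)
        return state
    for child in state:
        alpha = max(alpha, min_value(child, alpha, beta, seen))
        if alpha >= beta:                # Min can already force beta or less elsewhere
            return alpha
    return alpha


def min_value(state, alpha, beta, seen):
    if not isinstance(state, list):
        seen.append(state)
        return state
    for child in state:
        beta = min(beta, max_value(child, alpha, beta, seen))
        if beta <= alpha:                # Max can already force alpha or more elsewhere
            return beta
    return beta


# Hypothetical two-ply tree: Max at the root, three Min nodes below it.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
seen = []
value = max_value(tree, -math.inf, math.inf, seen)
```

On this tree the root value is 3 and only 7 of the 9 leaves are evaluated: once the second Min node sees 2, which is no better than the α of 3 copied down from the root, its remaining children 4 and 6 are skipped.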

SLIDE 16

α-β in action

SLIDE 17

Why α-β?

Time complexity of α-β is about O(b^(d/2))
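A short worked reading of this bound, assuming b is the branching factor and d the search depth in plies (the slide does not define the symbols), that plain minimax costs about O(b^d), and that the bound holds when good moves are tried first. The numbers b = 35 and d = 8 are chosen only to make the arithmetic concrete.

```latex
b^{d/2} = \left(\sqrt{b}\right)^{d}
\qquad\text{e.g. } b = 35,\ d = 8:\quad
b^{d} = 35^{8} \approx 2.25 \times 10^{12},
\qquad
b^{d/2} = 35^{4} = 1{,}500{,}625 .
```

Read the other way, the same budget of node evaluations lets α-β look roughly twice as deep as plain minimax.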

SLIDE 18

Progress on Games

Computers best: chess, checkers, backgammon, Scrabble, Jeopardy, Go
Computers competitive: bridge, crosswords, poker
Computers amateur: soccer?

SLIDE 19

EOLQs

Please write down the most pressing question you have about the course material covered so far and put it in the box on your way out. Thanks!