Sep 06, 2023

SLIDE 1

CS 730/830: Intro AI

Adversarial Search

Wheeler Ruml (UNH) Lecture 7, CS 730 – 1 / 19

1 handout: slides

You think you know when you can learn, are more sure when you can write, even more when you can teach, but certain when you can program.

SLIDE 2

EOLQs

Adversarial Search


SLIDE 3

Adversarial Search

■ Problems
■ Different!
■ Minimax
■ Tic-tac-toe
■ Improvements
■ Break
■ α-β Pruning
■ α-β Pseudo-code
■ Why α-β?
■ Progress
■ EOLQs


SLIDE 4

Planning Problems


Observability: complete, partial, hidden
State: discrete, continuous
Actions: deterministic, stochastic, discrete, continuous
Nature: static, deterministic, stochastic
Interaction: one decision, sequential
Time: static/off-line, on-line, discrete, continuous
Percepts: discrete, continuous, uncertain
Others: solo, cooperative, competitive

SLIDE 5

Multi-agent is Different


■ Shortest-path (M&C, vacuum, tile puzzle)
  ◆ want least-cost path to goal at unknown depth
■ Decisions with an adversary (chess, tic-tac-toe)
  ◆ adversary might prevent path to best goal
  ◆ want best assured outcome, assuming a rational opponent
  ◆ an irrational opponent can only do worse for itself, so the assured outcome still holds

SLIDE 6

Adversarial Search: Minimax


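Each ply corresponds to half a move (one player's turn).
Terminal states are labeled with their value; Max nodes back up the maximum of their children's values and Min nodes back up the minimum.
Incorrect version by Zermelo (1912); full treatment by von Neumann and Morgenstern (1944).
We can also bound the depth and apply a static evaluation function (SEF) to the non-terminal states at the cutoff.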
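A minimal sketch of depth-bounded minimax, assuming a toy tree encoding that is not from the slides: a number is a terminal state labeled with its value for Max, and a list is a decision node whose elements are its children, so each level down is one ply. The `sef` parameter stands in for a hypothetical static evaluation function applied at the depth bound.

```python
from typing import Callable, Optional, Union

# Assumed encoding: number = terminal state (value for Max),
# list = decision node whose elements are its children.
Tree = Union[float, list]


def minimax(node: Tree, max_to_move: bool, depth: float = float("inf"),
            sef: Optional[Callable[[list], float]] = None) -> float:
    """Backed-up minimax value of `node` from Max's point of view."""
    if not isinstance(node, list):      # terminal state: use its label
        return node
    if depth == 0:                      # depth bound hit at a non-terminal state
        if sef is None:
            raise ValueError("a depth bound needs a static evaluation function")
        return sef(node)
    # Players alternate each ply: Max's children are Min nodes and vice versa.
    values = [minimax(child, not max_to_move, depth - 1, sef) for child in node]
    return max(values) if max_to_move else min(values)
```

For example, `minimax([[3, 12, 8], [2, 4, 6], [14, 5, 2]], True)` returns 3: the Min nodes back up 3, 2, and 2, and Max picks the largest.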

SLIDE 7

Evaluation for Tic-tac-toe


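A 3-length is a complete row, column, or diagonal; it is open for a player if the other player has no mark on it.

$$
\text{value}(s) =
\begin{cases}
\infty & \text{if } s \text{ is a win for me,}\\
-\infty & \text{if } s \text{ is a win for you,}\\
\#\,\text{3-lengths open for me} - \#\,\text{3-lengths open for you} & \text{otherwise.}
\end{cases}
$$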
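The evaluation function above can be sketched directly. This is an illustrative sketch, not course code: it assumes a board encoded as a list of 9 cells indexed row by row, each `"X"`, `"O"`, or `" "`, and the names `LINES` and `evaluate` are hypothetical.

```python
import math
from typing import List

# All 8 three-lengths, as index triples into a row-major 3x3 board.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
    (0, 4, 8), (2, 4, 6),              # diagonals
]


def evaluate(board: List[str], me: str, you: str) -> float:
    """Static evaluation of `board` from `me`'s point of view."""
    for line in LINES:
        marks = {board[i] for i in line}
        if marks == {me}:               # complete 3-length for me
            return math.inf
        if marks == {you}:              # complete 3-length for you
            return -math.inf
    # A 3-length is open for a player if the opponent has no mark on it.
    open_for_me = sum(1 for line in LINES if all(board[i] != you for i in line))
    open_for_you = sum(1 for line in LINES if all(board[i] != me for i in line))
    return open_for_me - open_for_you
```

With X in the center and nothing else on the board, all 8 lines are open for X but only the 4 lines avoiding the center are open for O, so the value for X is 8 − 4 = 4; a corner gives 8 − 5 = 3 and an edge gives 8 − 6 = 2.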

SLIDE 8

Tic-tac-toe: two-ply search


SLIDE 9

Tic-tac-toe: second move


SLIDE 10

Tic-tac-toe: third move


SLIDE 11

Improving the Search


■ partial expansion, SEF
■ symmetry (‘transposition tables’)
■ search more ply as we have time (De Groot figure)
■ avoid unnecessary evaluations
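The transposition-table idea can be sketched by caching the backed-up value of each position the first time it is searched, so a position reached again by a different move order is looked up instead of re-searched. This is a minimal sketch under stated assumptions, not the course's code: it uses Python's `functools.lru_cache` as the table, encodes a tic-tac-toe board as a 9-character string, and searches to the end of the game (values +1, 0, −1 for X win, draw, O win). Real game programs typically key the table on a hash such as Zobrist hashing and also store search depth and bounds; canonicalizing boards under rotations and reflections would exploit symmetry further. Neither is shown here.

```python
from functools import lru_cache
from typing import Optional

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board: str) -> Optional[str]:
    """Return "X" or "O" if that player has a complete 3-length, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


@lru_cache(maxsize=None)  # the cache plays the role of a transposition table
def value(board: str, to_move: str) -> int:
    """Exact minimax value from X's point of view: +1 X wins, -1 O wins, 0 draw."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if " " not in board:                # full board, no winner: draw
        return 0
    nxt = "O" if to_move == "X" else "X"
    children = [value(board[:i] + to_move + board[i + 1:], nxt)
                for i, cell in enumerate(board) if cell == " "]
    return max(children) if to_move == "X" else min(children)
```

`value(" " * 9, "X")` returns 0, the familiar result that tic-tac-toe is a draw under perfect play; `value.cache_info()` reports how many distinct positions were stored.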

SLIDE 12

Break


■ asst 3
■ asst 4
■ projects! talk with me well before break

SLIDE 13

Which Values are Necessary?


SLIDE 14

α-β Pruning


α: best outcome Max can force at a previous decision on this path (initialized to −∞)
β: best outcome Min can force at a previous decision on this path (initialized to ∞)
α and β values are copied down the tree (but not up).
Minimax values are passed up the tree, as usual.
Once α ≥ β at a node, its remaining children can be pruned: the opponent at an earlier decision on this path already has an alternative at least as good, so play will never reach this node.

John McCarthy (1956, but never published); simple version used by Newell, Shaw, and Simon (1958); published by Hart and Edwards (1961); proved correct and analyzed by Knuth and Moore (1975); proved optimal by Pearl (1982).

SLIDE 15

α-β Pseudo-code


Max-value(state, α, β):
    when depth-cutoff(state), return SEF(state)
    for each child of state:
        α ← max(α, Min-value(child, α, β))
        when α ≥ β, return α
    return α

Min-value(state, α, β):
    when depth-cutoff(state), return SEF(state)
    for each child of state:
        β ← min(β, Max-value(child, α, β))
        when β ≤ α, return β
    return β
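A minimal runnable sketch that follows the Max-value/Min-value pseudo-code line for line in Python (fail-hard: each call returns its α or β). It assumes a toy tree encoding that is not from the slides: a number is a terminal state labeled with its value for Max and a list is a node whose elements are its children. The names `alpha_beta`, `max_value`, `min_value`, the `sef` parameter, and the evaluation counter are illustrative. α and β travel down as arguments; backed-up values are returned up.

```python
import math
from typing import Callable, Dict, Optional, Tuple, Union

# Assumed encoding: number = terminal state (value for Max), list = children.
Tree = Union[float, list]
SEF = Optional[Callable[[list], float]]


def _cutoff_value(node: Tree, sef: SEF, stats: Dict[str, int]) -> float:
    """Value at depth-cutoff(state): the terminal label, or SEF(state)."""
    stats["evaluations"] += 1
    if isinstance(node, list):
        if sef is None:
            raise ValueError("a depth cutoff at a non-terminal node needs an SEF")
        return sef(node)
    return node


def max_value(node: Tree, alpha: float, beta: float, depth: float,
              sef: SEF, stats: Dict[str, int]) -> float:
    if not isinstance(node, list) or depth == 0:      # depth-cutoff(state)
        return _cutoff_value(node, sef, stats)
    for child in node:
        alpha = max(alpha, min_value(child, alpha, beta, depth - 1, sef, stats))
        if alpha >= beta:   # Min above already has something no worse: prune
            return alpha
    return alpha


def min_value(node: Tree, alpha: float, beta: float, depth: float,
              sef: SEF, stats: Dict[str, int]) -> float:
    if not isinstance(node, list) or depth == 0:      # depth-cutoff(state)
        return _cutoff_value(node, sef, stats)
    for child in node:
        beta = min(beta, max_value(child, alpha, beta, depth - 1, sef, stats))
        if beta <= alpha:   # Max above already has something no worse: prune
            return beta
    return beta


def alpha_beta(root: Tree, depth: float = math.inf,
               sef: SEF = None) -> Tuple[float, int]:
    """Value of `root` with Max to move, and how many states were evaluated."""
    stats = {"evaluations": 0}
    value = max_value(root, -math.inf, math.inf, depth, sef, stats)
    return value, stats["evaluations"]
```

On `[[3, 12, 8], [2, 4, 6], [14, 5, 2]]` this returns `(3, 7)`: the root value matches plain minimax, but only 7 of the 9 leaves are evaluated, because the second Min node is cut off after its first leaf (2 ≤ α = 3).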

SLIDE 16

α-β in action


SLIDE 17

Why α-β?


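With perfect move ordering, the time complexity of α-β is about $O(b^{d/2})$, compared with $O(b^{d})$ for plain minimax, so in the same time it can search roughly twice as deep. Poorer move ordering gives smaller savings.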
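A worked count makes the savings concrete. For a uniform tree with branching factor $b$ and depth $d$, plain minimax evaluates $b^d$ leaves; with perfect move ordering, α-β evaluates only the leaves of the minimal tree analyzed by Knuth and Moore (1975). The values $b = 35$ and $d = 4$ below are illustrative assumptions (35 is a commonly quoted rough branching factor for chess).

$$
\underbrace{b^{d}}_{\text{minimax}}
\quad\text{vs.}\quad
\underbrace{b^{\lceil d/2 \rceil} + b^{\lfloor d/2 \rfloor} - 1}_{\alpha\text{-}\beta\text{, perfect ordering}}
$$

$$
b = 35,\ d = 4:\qquad 35^{4} = 1{,}500{,}625 \quad\text{vs.}\quad 35^{2} + 35^{2} - 1 = 2{,}449
$$

The effective branching factor drops from $b$ to about $\sqrt{b} \approx 5.9$.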

SLIDE 18

Progress on Games


Computers best: chess, checkers, backgammon, Scrabble, Jeopardy, Go
Computers competitive: bridge, crosswords, poker
Computers amateur: soccer?

SLIDE 19

EOLQs


Please write down the most pressing question you have about the course material covered so far and put it in the box on your way out. Thanks!