SLIDE 1

Search Algorithms

SLIDE 2

3 Search Algorithms
3.1 Problem-solving agents
3.2 Basic search algorithms
3.3 Heuristic search
  • Greedy search
  • A∗ search
3.4 Local search
  • Hill-climbing
  • Simulated annealing∗
  • Genetic algorithms∗
3.5 Online search∗
3.6 Adversarial search
  • minimax decisions
  • α–β pruning
  • Monte Carlo tree search∗
3.7 Metaheuristic∗

SLIDE 3

Problem-solving agents

Problem-solving agents: finding sequences of actions that lead to desirable states (goal-based)
State: some description of the current world states, abstracted for problem solving as a state space
Goal: a set of world states
Action: a transition between world states
Search: an algorithm that takes a problem as input and returns a solution in the form of an action sequence

SLIDE 4

Problem-solving agents

function Simple-Problem-Solving-Agent(p) returns an action
  persistent: s, an action sequence, initially empty
              state, some description of the current world state
              g, a goal, initially null
              problem, a problem formulation
  state ← Update-State(state, p)
  if s is empty then
    g ← Formulate-Goal(state)
    problem ← Formulate-Problem(state, g)
    s ← Search(problem)
    if s = failure then return a null action
  action ← First(s, state)
  s ← Rest(s, state)
  return action

Note: offline vs. online problem solving

SLIDE 5

Example: Romania

On holiday in Romania; currently in Arad. Flight leaves tomorrow from Bucharest.
Formulate goal: be in Bucharest
Formulate problem:
  states: various cities
  actions: drive between cities
Find solution: sequence of cities, e.g., Arad, Sibiu, Fagaras, Bucharest

SLIDE 6

Example: Romania

[Figure: road map of Romania with driving distances between cities]

SLIDE 7

Problem types

Deterministic, fully observable ⇒ single-state problem
  Agent knows exactly which state it will be in; solution is a sequence
Non-observable ⇒ conformant problem
  Agent may have no idea where it is; solution (if any) is a sequence
Nondeterministic and/or partially observable ⇒ contingency problem
  percepts provide new information about the current state
  solution is a contingent plan or a policy
  – often interleave search and execution
Unknown state space ⇒ exploration problem (“online”)

SLIDE 8–11

Example: vacuum world

[Figure: the eight vacuum-world states, numbered 1–8]

Single-state, start in #5. Solution?? [Right, Suck]
Conformant, start in {1, 2, 3, 4, 5, 6, 7, 8}
  e.g., Right goes to {2, 4, 6, 8}. Solution?? [Right, Suck, Left, Suck]
Contingency, start in #5
  Murphy’s Law: Suck can dirty a clean carpet
  Local sensing: dirt, location only. Solution?? [Right, if dirt then Suck]

SLIDE 12

Problem formulation

A problem is defined formally by five components:
initial state that the agent starts in – any state s ∈ S (the set of states); the initial state s0 ∈ S
  e.g., In(Arad) (“at Arad”)
actions: given a state s, Actions(s) returns the set of actions that can be executed in s
  e.g., from the state In(Arad), the applicable actions are {Go(Sibiu), Go(Timisoara), Go(Zerind)}
transition model: a function Result(s, a) (or Do(a, s)) that returns the state that results from doing action a in state s
  – we also use the term successor for any state reachable from a given state by a single action
  e.g., Result(In(Arad), Go(Zerind)) = In(Zerind)

SLIDE 13

Problem formulation

goal test: can be explicit, e.g., x = In(Bucharest), or implicit, e.g., NoDirt(x)
path cost: a function that assigns a numeric cost to each path
  e.g., sum of distances, number of actions executed, etc.
  c(s, a, s′) is the step cost, assumed to be ≥ 0
A solution is a sequence of actions [a1, a2, · · · , an] leading from the initial state to a goal state
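The five components map directly onto code. Below is a minimal Python sketch (an illustration of mine, not the slides' code); the class name RouteProblem and the map excerpt are assumptions:

# Sketch of the five-component problem formulation for Romania route-finding.
class RouteProblem:
    def __init__(self, initial, goal, roads):
        self.initial = initial            # initial state, e.g., "Arad"
        self.goal = goal                  # goal state, e.g., "Bucharest"
        self.roads = roads                # dict: city -> {neighbor: step cost}

    def actions(self, s):                 # Actions(s): actions executable in s
        return list(self.roads[s])        # an action = "drive to that neighbor"

    def result(self, s, a):               # Result(s, a): the transition model
        return a                          # driving to a city puts us in that city

    def goal_test(self, s):               # explicit goal test
        return s == self.goal

    def step_cost(self, s, a, s2):        # c(s, a, s'), assumed >= 0
        return self.roads[s][s2]

# An excerpt of the Romania map (driving distances in km).
roads = {
    "Arad": {"Sibiu": 140, "Timisoara": 118, "Zerind": 75},
    "Zerind": {"Arad": 75, "Oradea": 71},
    "Oradea": {"Zerind": 71, "Sibiu": 151},
    "Sibiu": {"Arad": 140, "Oradea": 151, "Fagaras": 99, "Rimnicu Vilcea": 80},
    "Fagaras": {"Sibiu": 99, "Bucharest": 211},
    "Rimnicu Vilcea": {"Sibiu": 80, "Pitesti": 97, "Craiova": 146},
    "Pitesti": {"Rimnicu Vilcea": 97, "Craiova": 138, "Bucharest": 101},
    "Craiova": {"Rimnicu Vilcea": 146, "Pitesti": 138},
    "Timisoara": {"Arad": 118},
    "Bucharest": {"Fagaras": 211, "Pitesti": 101},
}
problem = RouteProblem("Arad", "Bucharest", roads)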

SLIDE 14–18

Example: vacuum world state space graph

[Figure: the eight-state vacuum world as a graph; arcs labeled L (Left), R (Right), S (Suck)]

states??: integer dirt and robot locations (ignore dirt amounts etc.)
actions??: Left, Right, Suck, NoOp
goal test??: no dirt
path cost??: 1 per action (0 for NoOp)

SLIDE 19–23

Example: the 8-puzzle

[Figure: start and goal configurations of the 8-puzzle]

states??: integer locations of tiles (ignore intermediate positions)
actions??: move blank left, right, up, down (ignore unjamming etc.)
goal test??: = goal state (given)
path cost??: 1 per move
Note: finding the optimal solution of the n-puzzle family is NP-hard

SLIDE 24

Example: robotic assembly

[Figure: robot arm assembling parts]

states??: real-valued coordinates of robot joint angles and of the parts of the object to be assembled
actions??: continuous motions of robot joints
goal test??: complete assembly with no robot included
path cost??: time to execute

SLIDE 25

Basic (tree) search algorithms

Simulated (offline) exploration of the state space, as a tree, by generating successors of already-explored states
frontier: the set of all leaf nodes available for expansion at any given moment

function Tree-Search(problem) returns a solution, or failure
  initialize the frontier using the initial state of problem
  loop do
    if the frontier is empty then return failure
    choose a leaf node and remove it from the frontier (by a certain strategy)
    if the node contains a goal state then return the corresponding solution
    expand the node and add the resulting nodes to the search tree
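The loop above translates almost line for line into code. The sketch below (my illustration, reusing the RouteProblem interface assumed earlier) passes the node-selection strategy in as a function, so different choices of choose yield the strategies of the following slides:

def tree_search(problem, choose):
    """Generic tree search; choose() picks and removes a node from the frontier.
    A node is a (state, path) pair, where path is the action sequence so far."""
    frontier = [(problem.initial, [])]
    while frontier:                               # empty frontier => failure
        state, path = choose(frontier)            # strategy-specific selection
        if problem.goal_test(state):
            return path                           # solution: an action sequence
        for a in problem.actions(state):          # expand the node
            frontier.append((problem.result(state, a), path + [a]))
    return None                                   # failure

bfs_choose = lambda frontier: frontier.pop(0)     # FIFO: breadth-first behaviour
dfs_choose = lambda frontier: frontier.pop()      # LIFO: depth-first behaviour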

SLIDE 26–28

Tree search example

[Figure: three stages of expanding the tree from Arad: Arad; then its children Sibiu, Timisoara, Zerind; then Sibiu's children Arad, Fagaras, Oradea, Rimnicu Vilcea]

Note the loopy path (repeated state) in the leftmost branch (Arad appears again below Sibiu)
To avoid exploring redundant paths, use a data structure:
  explored set – remembering every expanded node
  – newly generated nodes already in the explored set can be discarded
Graph ⇒ Tree

SLIDE 29

Implementation: states vs. nodes

A state is a (representation of) a physical configuration A node is a data structure constituting part of a search tree

[Figure: a node in the search tree for the 8-puzzle, with fields State, Parent, Action = Right, Path-Cost = 6]

Notation (.) for data structures:
  – n.State: the state (in the state space) to which the node corresponds
  – n.Parent: the node in the search tree that generated this node
  – n.Action: the action applied to the parent to generate the node
  – n.Path-Cost: the cost, g(n), of the path from the initial state to n

SLIDE 30

Search strategies

A strategy is defined by picking the order of node expansion
Strategies are evaluated along the following dimensions:
  completeness—does it always find a solution if one exists?
  time complexity—number of nodes generated/expanded
  space complexity—maximum number of nodes in memory
  optimality—does it always find a least-cost solution?
Time and space complexity are measured in terms of
  b—maximum branching factor of the search tree
  d—depth of the least-cost solution
  m—maximum depth of the state space (may be ∞)

SLIDE 31

Uninformed search strategies

Uninformed (blind) strategies use only the information available in the problem definition

  • Breadth-first search
  • Uniform-cost search
  • Depth-first search
  • Depth-limited search
  • Iterative deepening search

SLIDE 32

Breadth-first search

function Breadth-First-Search(problem) returns a solution, or failure
  node ← a node with State = problem.Initial-State, Path-Cost = 0
  if problem.Goal-Test(node.State) then return Solution(node)
  frontier ← a FIFO queue with node as the only element
  explored ← an empty set
  loop do
    if Empty?(frontier) then return failure
    node ← Pop(frontier)
    add node.State to explored
    for each action in problem.Actions(node.State) do
      child ← Child-Node(problem, node, action)
      if child.State is not in explored or frontier then
        if problem.Goal-Test(child.State) then return Solution(child)
        frontier ← Insert(child, frontier)

Notation: (.) abbreviates the statements above
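A hedged Python rendering of this pseudocode (my sketch, assuming the problem interface introduced earlier):

from collections import deque

def breadth_first_search(problem):
    """BFS with an explored set; applies the goal test when a node is generated."""
    if problem.goal_test(problem.initial):
        return []
    frontier = deque([(problem.initial, [])])     # FIFO queue of (state, path)
    in_frontier = {problem.initial}
    explored = set()
    while frontier:
        state, path = frontier.popleft()
        in_frontier.discard(state)
        explored.add(state)
        for a in problem.actions(state):
            s2 = problem.result(state, a)
            if s2 not in explored and s2 not in in_frontier:
                if problem.goal_test(s2):         # goal test on generation
                    return path + [a]
                frontier.append((s2, path + [a]))
                in_frontier.add(s2)
    return None                                   # failure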

SLIDE 33–36

Breadth-first search

Expand shallowest unexpanded node
Implementation: frontier = FIFO (First-In-First-Out) queue, i.e., new successors go at the end

[Figure: four snapshots of breadth-first expansion on the binary tree A–G, expanding A, then B, then C]

SLIDE 37–42

Properties of breadth-first search

Complete?? Yes (if b is finite)
Time?? 1 + b + b^2 + b^3 + . . . + b^d + b(b^d − 1) = O(b^{d+1}), i.e., exponential in d
Space?? O(b^{d+1}) (keeps every node in memory)
Optimal?? Yes (if cost = 1 per step); not optimal in general (the shallowest goal node is not necessarily the optimal one)
O(b^d) growth: with b = 10, d = 16, 1 million nodes/second, 1000 bytes/node:
  Time — 350 years
  Space — 10 exabytes
  – Space is the big problem; one can easily generate nodes at 100MB/sec, so 24hrs = 8640GB

SLIDE 43

Uniform-cost search

Expand least-cost unexpanded node
Implementation: frontier = queue ordered by path cost, lowest first
Equivalent to breadth-first if step costs are all equal
Complete?? Yes, if step cost ≥ ε
Time?? # of nodes with g ≤ cost of optimal solution: O(b^{⌈C∗/ε⌉}), where C∗ is the cost of the optimal solution
Space?? # of nodes with g ≤ cost of optimal solution: O(b^{⌈C∗/ε⌉})
Optimal?? Yes—nodes are expanded in increasing order of g(n)

SLIDE 44

Uniform-cost search

O(b^{⌈C∗/ε⌉}):
  – can be much greater than O(b^d): UCS may explore large trees of small steps before exploring paths involving large and perhaps useful steps
  – when all step costs are equal, O(b^{⌈C∗/ε⌉}) is just O(b^{d+1})
UCS is similar to BFS, except that BFS stops as soon as it generates a goal, whereas UCS examines all the nodes at the goal's depth to see if one has a lower cost
  – strictly more work, since nodes at depth d are expanded unnecessarily
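A priority-queue sketch makes the contrast concrete (my illustration using Python's heapq, not the slides' code); note that UCS applies the goal test when a node is popped, not when it is generated:

import heapq

def uniform_cost_search(problem):
    frontier = [(0, problem.initial, [])]         # (g, state, path), ordered by g
    best_g = {problem.initial: 0}
    while frontier:
        g, state, path = heapq.heappop(frontier)
        if problem.goal_test(state):              # test on expansion => optimal
            return path, g
        if g > best_g.get(state, float("inf")):
            continue                              # stale queue entry
        for a in problem.actions(state):
            s2 = problem.result(state, a)
            g2 = g + problem.step_cost(state, a, s2)
            if g2 < best_g.get(s2, float("inf")):
                best_g[s2] = g2
                heapq.heappush(frontier, (g2, s2, path + [a]))
    return None, float("inf")                     # failure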

SLIDE 45–56

Depth-first search

Expand deepest unexpanded node
Implementation: frontier = LIFO (Last-In-First-Out) queue, i.e., put successors at the front

[Figure: twelve snapshots of depth-first expansion on the binary tree A–O, descending along the left branch and backtracking]

SLIDE 57–61

Properties of depth-first search

Complete?? No: fails in infinite-depth spaces and in spaces with loops
  Modify to avoid repeated states along the path ⇒ complete in finite spaces
Time?? O(b^m): terrible if m is much larger than d
  but if solutions are dense, may be much faster than breadth-first
Space?? O(bm), i.e., linear space
Optimal?? No

SLIDE 62

Depth-limited search

= depth-first search with depth limit l; returns cutoff if there is no solution within depth l
Recursive implementation:

function Depth-Limited-Search(problem, limit) returns a solution, or failure/cutoff
  return Recursive-DLS(Make-Node(problem.Initial-State), problem, limit)

function Recursive-DLS(node, problem, limit) returns a solution, or failure/cutoff
  if problem.Goal-Test(node.State) then return Solution(node)
  else if limit = 0 then return cutoff
  else
    cutoff-occurred? ← false
    for each action in problem.Actions(node.State) do
      child ← Child-Node(problem, node, action)
      result ← Recursive-DLS(child, problem, limit − 1)
      if result = cutoff then cutoff-occurred? ← true
      else if result ≠ failure then return result
    if cutoff-occurred? then return cutoff else return failure

SLIDE 63

Iterative deepening search

function Iterative-Deepening-Search(problem) returns a solution, or failure
  for depth = 0 to ∞ do
    result ← Depth-Limited-Search(problem, depth)
    if result ≠ cutoff then return result
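A compact Python sketch of both procedures (an illustration under the same assumed problem interface; the string "cutoff" stands in for the cutoff value):

def depth_limited_search(problem, state, limit, path=()):
    """Recursive DLS; returns a solution path, "cutoff", or None (failure)."""
    if problem.goal_test(state):
        return list(path)
    if limit == 0:
        return "cutoff"
    cutoff_occurred = False
    for a in problem.actions(state):
        result = depth_limited_search(problem, problem.result(state, a),
                                      limit - 1, path + (a,))
        if result == "cutoff":
            cutoff_occurred = True
        elif result is not None:                  # a solution was found below
            return result
    return "cutoff" if cutoff_occurred else None

def iterative_deepening_search(problem, max_depth=50):
    for depth in range(max_depth + 1):            # in place of depth = 0 to infinity
        result = depth_limited_search(problem, problem.initial, depth)
        if result != "cutoff":
            return result
    return None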

SLIDE 64–67

Iterative deepening search, limits l = 0 to 3

[Figure: the trees explored at depth limits 0, 1, 2, and 3 on the binary tree A–O]

SLIDE 68–72

Properties of iterative deepening search

Complete?? Yes
Time?? (d + 1)b^0 + d·b^1 + (d − 1)b^2 + . . . + b^d = O(b^d)
Space?? O(bd)
Optimal?? Yes, if step cost = 1
  Can be modified to explore a uniform-cost tree
Numerical comparison for b = 10 and d = 5, solution at far right leaf:
  N(IDS) = 50 + 400 + 3,000 + 20,000 + 100,000 = 123,450
  N(BFS) = 10 + 100 + 1,000 + 10,000 + 100,000 + 999,990 = 1,111,100
IDS does better because other nodes at depth d are not expanded
BFS can be modified to apply the goal test when a node is generated

SLIDE 73

Bidirectional search

Idea: run two simultaneous searches, hoping that the two searches meet in the middle
  – one forward from the initial state
  – another backward from the goal
2b^{d/2} is much less than b^d
Implementation: check whether the frontiers of the two searches intersect; if they do, a solution has been found
The first solution found may not be optimal; some additional search is required to make sure there is no other short-cut across the gap
Both-ends-against-the-middle (BEATM) endeavors to combine the best features of top-down and bottom-up designs into one process

SLIDE 74

Summary of algorithms

Criterion    Breadth-First   Uniform-Cost     Depth-First   Depth-Limited    Iterative Deepening
Complete?    Yes∗            Yes∗             No            Yes, if l ≥ d    Yes
Time         b^{d+1}         b^{⌈C∗/ε⌉}       b^m           b^l              b^d
Space        b^{d+1}         b^{⌈C∗/ε⌉}       bm            bl               bd
Optimal?     Yes∗            Yes              No            No               Yes∗

SLIDE 75

Graph search: repeated states

Failure to detect repeated states can turn a linear problem into an exponential one

[Figure: a chain state space A–B–C–D and its search tree, in which each state after the first appears repeatedly]

A state space with d + 1 states ⇒ 2^d paths
All the tree-search versions of the algorithms can be extended to graph-search versions by checking for repeated states

SLIDE 76

Graph search

function Graph-Search(problem) returns a solution, or failure
  initialize the frontier using the initial state of problem
  initialize the explored set to be empty
  loop do
    if the frontier is empty then return failure
    choose a leaf node and remove it from the frontier
    if the node contains a goal state then return the corresponding solution
    add the node to the explored set
    expand the chosen node and add the resulting nodes to the frontier
      (only if not already in the frontier or explored set)

Note: using the explored set avoids exploring redundant paths

SLIDE 77

Heuristic Search

Informed (heuristic) strategies use problem-specific knowledge to find solutions more efficiently
Best-first search: use an evaluation function for each node
  – Eval-Fn: estimate of “desirability”
  ⇒ expand the most desirable unexpanded node
Implementation: Queueing-Fn = insert successors in decreasing order of desirability
Special cases:
  – greedy (best-first) search
  – A∗ search

SLIDE 78

Best-first search

function Best-First-Search(problem, Eval-Fn) returns a solution sequence
  inputs: problem, a problem
          Eval-Fn, an evaluation function
  Queueing-Fn ← a function that orders nodes by Eval-Fn
  return General-Search(problem, Queueing-Fn)

SLIDE 79

Romania with step costs in km

[Figure: the Romania road map with step costs in km, together with a table of straight-line distances to Bucharest (e.g., Arad 366, Sibiu 253, Fagaras 176, Pitesti 100, Bucharest 0)]

SLIDE 80

Greedy search

Evaluation function h(n) (heuristic) = estimate of cost from n to the closest goal
E.g., hSLD(n) = straight-line distance from n to Bucharest
Greedy search expands the node that appears to be closest to the goal

SLIDE 81–84

Greedy search example

[Figure: four stages of greedy best-first search from Arad using hSLD: Arad (366) is expanded, then Sibiu (253), then Fagaras (176), whose successor Bucharest is the goal]

SLIDE 85–89

Properties of greedy search

Complete?? No – can get stuck in loops
  e.g., with Oradea as goal, Iasi → Neamt → Iasi → Neamt → . . . (following hSLD(n))
  Complete in finite space with repeated-state checking
Time?? O(b^m), but a good heuristic can give dramatic improvement
Space?? O(b^m)—keeps all nodes in memory
Optimal?? No

SLIDE 90

A∗ search

Idea: avoid expanding paths that are already expensive
Evaluation function f(n) = g(n) + h(n)
  g(n) = cost so far to reach n
  h(n) = estimated cost to goal from n
  f(n) = estimated total cost of path through n to goal
Algorithm: identical to Uniform-Cost-Search except for using g + h instead of g
A∗ search uses an admissible heuristic
  i.e., h(n) ≤ h∗(n) where h∗(n) is the true cost from n
  (Also require h(n) ≥ 0, so h(G) = 0 for any goal G)
E.g., hSLD(n) never overestimates the actual road distance
Theorem: A∗ search is optimal
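A minimal Python sketch of A∗ (my illustration, reusing the assumed problem interface; h is any admissible heuristic, such as a straight-line-distance lookup):

import heapq

def astar_search(problem, h):
    """Best-first search on f = g + h."""
    start = problem.initial
    frontier = [(h(start), 0, start, [])]         # (f, g, state, path)
    best_g = {start: 0}
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if problem.goal_test(state):
            return path, g
        for a in problem.actions(state):
            s2 = problem.result(state, a)
            g2 = g + problem.step_cost(state, a, s2)
            if g2 < best_g.get(s2, float("inf")):
                best_g[s2] = g2
                heapq.heappush(frontier, (g2 + h(s2), g2, s2, path + [a]))
    return None, float("inf")

# Straight-line distances to Bucharest (excerpt from the slides' table);
# the default of 0 for missing cities is still admissible, just less informed.
h_sld = {"Arad": 366, "Sibiu": 253, "Fagaras": 176, "Rimnicu Vilcea": 193,
         "Pitesti": 100, "Craiova": 160, "Bucharest": 0}
# path, cost = astar_search(problem, lambda s: h_sld.get(s, 0))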

SLIDE 91–96

A∗ search example

[Figure: six stages of A∗ search from Arad, labeling each node f = g + h, e.g., Sibiu 393 = 140 + 253, Rimnicu Vilcea 413 = 220 + 193, Fagaras 415 = 239 + 176, Pitesti 417 = 317 + 100; the search ends at Bucharest with 418 = 418 + 0]

SLIDE 97

Optimality of A∗

Suppose some suboptimal goal G2 has been generated and is in the frontier (queue). Let n be an unexpanded node on a shortest path to an optimal goal G.

[Figure: search tree from Start, with n on the path to the optimal goal G, and G2 elsewhere in the frontier]

  f(G2) = g(G2)   since h(G2) = 0
        > g(G)    since G2 is suboptimal
        ≥ f(n)    since h is admissible

Since f(G2) > f(n), A∗ will never select G2 for expansion

SLIDE 98

Optimality of A∗

Lemma: A∗ expands nodes in order of increasing f value∗
Gradually adds “f-contours” of nodes (cf. breadth-first adds layers)
Contour i has all nodes with f = fi, where fi < fi+1

[Figure: the Romania map with f-contours at 380, 400, and 420 spreading out from Arad]

SLIDE 99

Heuristic consistency

A heuristic is consistent if

  h(n) ≤ c(n, a, n′) + h(n′)

[Figure: triangle inequality among n, its successor n′, and the goal G]

If h is consistent, we have
  f(n′) = g(n′) + h(n′) = g(n) + c(n, a, n′) + h(n′) ≥ g(n) + h(n) = f(n)
i.e., f(n) is nondecreasing along any path (proof of the lemma)
Note
  – consistency is stronger than admissibility
  – the graph-search version of A∗ is optimal if h(n) is consistent
  – inconsistent heuristics can still be made effective by enhancements

SLIDE 100–104

Properties of A∗

Complete?? Yes, unless there are infinitely many nodes with f ≤ f(G)
Time?? Exponential in [relative error in h × length of solution]
  – absolute error: ∆ = h∗ − h; relative error: ε = (h∗ − h)/h∗
  – exponential in the maximum absolute error, O(b^∆)
  – for constant step costs, O(b^{εd}), i.e., O((b^ε)^d) w.r.t. h∗
  (Polynomial for various variants of the heuristics)
Space?? Keeps all nodes in memory
  – usually runs out of space long before running out of time
  – the space problem can be overcome without sacrificing optimality or completeness, at a small cost in execution time
Optimal?? Yes—cannot expand f_{i+1} until f_i is finished
  (C∗ is the cost of the optimal solution path)
  A∗ expands all nodes with f(n) < C∗
  A∗ expands some nodes with f(n) = C∗
  A∗ expands no nodes with f(n) > C∗
  prune – eliminating possibilities without having to examine them

SLIDE 105

Recursive best-first search

RBFS: a recursive algorithm
  – best-first search, but using only linear space
  – similar to recursive depth-first search, but
    – uses the f-limit variable to keep track of the f-value of the best alternative path available from any ancestor of the current node
Complete?? Yes, like A∗
Time?? Exponential, depending both on the accuracy of the heuristic and on how often the best path changes as nodes are expanded
Space?? Linear in the depth of the deepest optimal solution
Optimal?? Yes, like A∗ (if h(n) is admissible)

SLIDE 106

Recursive best-first algorithm

function Recursive-Best-First-Search(problem) returns a solution, or failure
  return RBFS(problem, Make-Node(problem.Initial-State), ∞)

function RBFS(problem, node, f-limit) returns a solution, or failure and a new f-cost limit
  if problem.Goal-Test(node.State) then return Solution(node)
  successors ← [ ]
  for each action in problem.Actions(node.State) do
    add Child-Node(problem, node, action) into successors
  if successors is empty then return failure, ∞
  for each s in successors do
    s.f ← max(s.g + s.h, node.f)
  loop do
    best ← the lowest f-value node in successors
    if best.f > f-limit then return failure, best.f
    alternative ← the second-lowest f-value among successors
    result, best.f ← RBFS(problem, best, min(f-limit, alternative))
    if result ≠ failure then return result

SLIDE 107–108

Admissible heuristics

E.g., for the 8-puzzle:
  h1(n) = number of misplaced tiles
  h2(n) = total Manhattan distance (i.e., number of squares from desired location of each tile)

[Figure: start and goal states of the 8-puzzle]

h1(S) =?? 6
h2(S) =?? 4 + 0 + 3 + 3 + 1 + 0 + 2 + 1 = 14
New kind of distance??
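Both heuristics are a few lines of Python. The sketch below is illustrative; because the slide's puzzle figure did not survive extraction, the demo uses the AIMA third-edition start state, for which h1 = 8 and h2 = 18 (not the 6 and 14 of the slides' own figure):

def h1(state, goal):
    """Number of misplaced tiles (the blank, 0, is not counted)."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def h2(state, goal):
    """Total Manhattan distance of each tile from its goal square (3x3 board)."""
    pos = {tile: (i // 3, i % 3) for i, tile in enumerate(state)}
    total = 0
    for i, tile in enumerate(goal):
        if tile != 0:
            r, c = pos[tile]
            total += abs(r - i // 3) + abs(c - i % 3)
    return total

start = (7, 2, 4, 5, 0, 6, 8, 3, 1)   # AIMA 3e start state; 0 is the blank
goal  = (0, 1, 2, 3, 4, 5, 6, 7, 8)
assert h1(start, goal) == 8 and h2(start, goal) == 18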

SLIDE 109

Dominance

If h2(n) ≥ h1(n) for all n (both admissible), then h2 dominates h1 and is better for search
Typical search costs:
  d = 14: IDS = 3,473,941 nodes; A∗(h1) = 539 nodes; A∗(h2) = 113 nodes
  d = 24: IDS ≈ 54,000,000,000 nodes; A∗(h1) = 39,135 nodes; A∗(h2) = 1,641 nodes
Given any admissible heuristics ha, hb: h(n) = max(ha(n), hb(n)) is also admissible and dominates ha and hb

SLIDE 110

Relaxed problems

Admissible heuristics can be derived from the exact solution cost of a relaxed version of the problem
If the rules of the 8-puzzle are relaxed so that a tile can move anywhere, then h1(n) gives the shortest solution
If the rules are relaxed so that a tile can move to any adjacent square, then h2(n) gives the shortest solution
Key point: the optimal solution cost of a relaxed problem is no greater than the optimal solution cost of the real problem

SLIDE 111

Example: n-queens

Put n queens on an n × n board with no two queens on the same row, column, or diagonal Move a queen to reduce number of conflicts

[Figure: three n-queens boards with h = 5, h = 2, and h = 0 conflicts]

Almost always solves n-queens problems almost instantaneously for very large n, e.g., n = 1 million

SLIDE 112

Local Search

Local search algorithms operate using a single current node (rather than multiple paths) and generally move only to neighbors of that node
Local search vs. global search
  – global search, including informed or uninformed search, systematically explores paths from an initial state
  – global search suits problems with observable, deterministic, known environments
  – local search uses very little memory and finds reasonable solutions in large or infinite (continuous) state spaces for which global search is unsuitable

SLIDE 113

Local Search

Local search algorithms are useful for solving optimization problems
  – find the best state according to an “objective function”, e.g., reproductive fitness in nature by Darwinian evolution
Special cases:
  – Hill-climbing (greedy local search)
  – Simulated annealing
  – Genetic algorithms

SLIDE 114

Hill-climbing

Useful to consider the state-space landscape

[Figure: a one-dimensional state-space landscape, objective function vs. state space, marking the current state, a shoulder, a “flat” local maximum, a local maximum, and the global maximum]

Random-restart hill climbing overcomes local maxima — trivially complete (with probability approaching 1)
Random sideways moves: escape from shoulders, but loop on flat maxima

SLIDE 115

Hill-climbing

Like climbing a hill with amnesia (or gradient ascent/descent)

function Hill-Climbing(problem) returns a state that is a local maximum
  current ← Make-Node(problem.Initial-State)
  loop do
    neighbor ← a highest-valued successor of current
    if neighbor.Value ≤ current.Value then return current.State
    current ← neighbor

The algorithm halts if it reaches a plateau where the best successor has the same value as the current state
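A hedged Python sketch of steepest-ascent hill climbing, together with the random-restart wrapper described above (neighbors and value are assumed to be supplied by the problem):

def hill_climbing(initial, neighbors, value):
    """Climb until no successor improves on the current state (a local maximum
    or a plateau); neighbors(state) yields the successor states."""
    current = initial
    while True:
        succ = list(neighbors(current))
        if not succ:
            return current
        best = max(succ, key=value)
        if value(best) <= value(current):
            return current                        # local maximum (or plateau)
        current = best

def random_restart_hill_climbing(make_random_state, neighbors, value, restarts=25):
    """Trivially complete variant: keep the best of several random restarts."""
    return max((hill_climbing(make_random_state(), neighbors, value)
                for _ in range(restarts)), key=value)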

SLIDE 116

Simulated annealing

Idea: escape local maxima by allowing some “bad” moves but gradually decrease their size and frequency

function Simulated-Annealing(problem, schedule) returns a solution state
  inputs: schedule, a mapping from time to “temperature”
  current ← Make-Node(problem.Initial-State)
  for t ← 1 to ∞ do
    T ← schedule[t]
    if T = 0 then return current
    next ← a randomly selected successor of current
    ∆E ← next.Value − current.Value
    if ∆E > 0 then current ← next
    else current ← next only with probability e^{∆E/T}
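The same procedure in Python (an illustrative sketch; the exponential cooling schedule shown is one arbitrary choice):

import math, random

def simulated_annealing(initial, neighbors, value, schedule):
    """schedule(t) maps the time step to a temperature T; stop when T reaches 0."""
    current = initial
    t = 0
    while True:
        t += 1
        T = schedule(t)
        if T <= 0:
            return current
        nxt = random.choice(list(neighbors(current)))
        delta_e = value(nxt) - value(current)
        # Accept improving moves; accept bad moves with probability e^(dE/T).
        if delta_e > 0 or random.random() < math.exp(delta_e / T):
            current = nxt

schedule = lambda t: 0 if t > 10_000 else 100 * (0.99 ** t)   # example cooling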

SLIDE 117

Properties of simulated annealing

At a fixed “temperature” T, the state occupation probability reaches the Boltzmann distribution (see later in probabilistic distributions)

  p(x) = α e^{E(x)/kT}

If T is decreased slowly enough ⇒ always reaches the best state x∗, because

  e^{E(x∗)/kT} / e^{E(x)/kT} = e^{(E(x∗)−E(x))/kT} ≫ 1 for small T

⇒ finds a global optimum with probability approaching 1
Devised by Metropolis et al. (1953) for physical process modeling
Simulated annealing is a field in itself, widely used in VLSI layout, airline scheduling, etc.

SLIDE 118

Local beam search

Idea: keep k states instead of 1; choose the top k of all their successors
Not the same as k searches run in parallel: searches that find good states recruit other searches to join them
Problem: quite often, all k states end up on the same local hill
Idea: choose k successors randomly, biased towards good ones
Observe the close analogy to natural selection

SLIDE 119

Genetic algorithms

GA = stochastic local beam search + generate successors from pairs of states

[Figure: one GA generation on 8-queens strings (e.g., 24748552, 32752411, 24415124, 32543213): fitness scores (24, 23, 20, 11) give selection probabilities (29%, 31%, 26%, 14%); selected pairs are recombined by cross-over and then mutated]

SLIDE 120

Genetic algorithms

GAs require states encoded as strings (GPs use programs)
Crossover helps iff substrings are meaningful components

[Figure: crossover of two 8-queens board configurations]

GAs ≠ evolution: e.g., real genes encode replication machinery
Genetic programming (GP) is closely related to GAs
Artificial Life (AL) moves one step further
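A compact, generic GA sketch in Python (my illustration; for 8-queens the states would be digit strings and fitness the number of non-attacking pairs, which is positive as required by the weighted selection used here):

import random

def genetic_algorithm(population, fitness, alphabet, generations=1000, p_mutate=0.1):
    """States are strings over alphabet; fitness must be positive."""
    for _ in range(generations):
        weights = [fitness(x) for x in population]
        new_pop = []
        for _ in range(len(population)):
            x, y = random.choices(population, weights=weights, k=2)   # selection
            c = random.randrange(1, len(x))                           # cross-over
            child = x[:c] + y[c:]
            if random.random() < p_mutate:                            # mutation
                i = random.randrange(len(child))
                child = child[:i] + random.choice(alphabet) + child[i + 1:]
            new_pop.append(child)
        population = new_pop
    return max(population, key=fitness)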

SLIDE 121

Local search in continuous state spaces

Suppose we want to site three airports in Romania:
  – 6-D state space defined by (x1, y1), (x2, y2), (x3, y3)
  – objective function f(x1, y1, x2, y2, x3, y3) = sum of squared distances from each city to the nearest airport
Discretization methods turn continuous space into discrete space
  e.g., empirical gradient considers a ±δ change in each coordinate
Gradient methods compute

  ∇f = (∂f/∂x1, ∂f/∂y1, ∂f/∂x2, ∂f/∂y2, ∂f/∂x3, ∂f/∂y3)

to increase/reduce f, e.g., by x ← x + α∇f(x)
Sometimes we can solve for ∇f(x) = 0 exactly (e.g., with one city)
Newton–Raphson (1664, 1690) iterates

  x ← x − H_f^{−1}(x) ∇f(x)

to solve ∇f(x) = 0, where H_ij = ∂²f/∂x_i∂x_j
Hint: the Newton–Raphson method is an efficient local search

SLIDE 122

Online search

Offline search algorithms compute a complete solution before executing it
vs.
Online search algorithms interleave computation and action (processing input data as they are received)
  – necessary for unknown environments (dynamic or semidynamic, nondeterministic domains) ⇐ exploration problem

SLIDE 123

Example: maze problem

An online search agent solves a problem by executing actions, rather than by pure computation (offline)

[Figure: a 3 × 3 maze with start S and goal G]

The competitive ratio = the total cost of the path the agent actually travels (online cost) / the cost of the path the agent would follow if it knew the search space in advance (offline cost) ⇐ should be as small as possible
Online search expands nodes in a local order; DepthFirst and HillClimbing have exactly this property

SLIDE 124

Online search agents

function Online-DFS-Agent(s′) returns an action
  inputs: s′, a percept (the current state)
  persistent: result, a table indexed by state and action, initially empty
              untried, a table listing, for each state, the actions not yet tried
              unbacktracked, a table listing the backtracks not yet tried
              s, a, the previous state and action, initially null
  if Goal-Test(s′) then return stop
  if s′ is a new state (not in untried) then untried[s′] ← Actions(s′)
  if s is not null then
    result[s, a] ← s′
    add s to the front of unbacktracked[s′]
  if untried[s′] is empty then
    if unbacktracked[s′] is empty then return stop
    else a ← an action b such that result[s′, b] = Pop(unbacktracked[s′])
  else a ← Pop(untried[s′])
  s ← s′
  return a

SLIDE 125

Adversarial search

  • Games
  • Perfect play

– minimax decisions – α–β pruning

  • Imperfect play

– Monte Carlo tree search

SLIDE 126

Games

Game as adversarial search
“Unpredictable” opponent ⇒ solution is a strategy specifying a move for every possible opponent reply
Time limits ⇒ unlikely to find the goal, must approximate
Plan of attack:
  • Computer considers possible lines of play (Babbage, 1846)
  • Algorithm for perfect play (Zermelo, 1912; Von Neumann, 1944)
  • Finite horizon, approximate evaluation (Zuse, 1945; Wiener, 1948; Shannon, 1950)
  • First chess program (Turing, 1951)
  • Machine learning to improve evaluation accuracy (Samuel, 1952–57)
  • Pruning to allow deeper search (McCarthy, 1956)

SLIDE 127

Types of games

                        deterministic                    chance
perfect information     chess, checkers, go, othello     backgammon, monopoly
imperfect information                                    bridge, poker, scrabble, nuclear war

Computer game playing
  Single game playing: a program to play one game
  General Game Playing (GGP): a program to play more than one game

SLIDE 128

Perfect play

Perfect information: deterministic, fully visible to each player; zero-sum games
Games of chess: Checkers → Othello → Chess (/Chinese Chess/Shogi) → Go
Search state spaces are vast for Go/Chess
  – each state is a point of decision-making for a move
Go: legal positions (Tromp and Farnebäck 2007): 3^361 board configurations (empty/black/white on a 19 × 19 board), about 1.2% of which are legal
  3^361 × 0.01196 · · · ≈ 2.08 × 10^170
  – the observable universe contains around 10^80 atoms
Is it possible to reduce the space to something small enough to search exhaustively??

SLIDE 129

Game tree (2-player, tic-tac-toe)

[Figure: partial game tree for tic-tac-toe: MAX (X) and MIN (O) alternate moves down to terminal states with utilities −1, 0, +1]

Small state space ⇒ first win
Go: a high branching factor (b ≈ 250) and a deep (d ≈ 150) tree

SLIDE 130

Minimax

Perfect play for deterministic, perfect-information games
Idea: choose the move to the position with the best minimax value
  = best achievable payoff, computed from the utility (assuming both players play optimally to the end of the game; the minimax value of a terminal state is just its utility)
E.g., a 2-ply game:

[Figure: a 2-ply game tree: MAX chooses among a1, a2, a3; MIN replies with b-, c-, d-moves; leaf utilities 3, 12, 8, 2, 4, 6, 14, 5, 2 give MIN-node values 3, 2, 2 and root value 3]

MAX takes the highest minimax value; MIN the lowest
argmax_{a∈S} f(a) computes the element a of set S that has the maximum value of f(a) (argmin_{a∈S} f(a) for the minimum)

SLIDE 131

Minimax algorithm

function Minimax-Decision(state) returns an action
  return argmax_{a ∈ Actions(state)} Min-Value(Result(state, a))

function Max-Value(state) returns a utility value
  if Terminal-Test(state) then return Utility(state)
  v ← −∞
  for each a in Actions(state) do
    v ← Max(v, Min-Value(Result(state, a)))
  return v

function Min-Value(state) returns a utility value
  if Terminal-Test(state) then return Utility(state)
  v ← ∞
  for each a in Actions(state) do
    v ← Min(v, Max-Value(Result(state, a)))
  return v
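The pseudocode translates directly into Python; the sketch below assumes a game object exposing actions, result, terminal_test, and utility (the names are mine, not the slides'):

def minimax_decision(state, game):
    """Choose the action leading to the highest minimax value for MAX."""
    return max(game.actions(state),
               key=lambda a: min_value(game.result(state, a), game))

def max_value(state, game):
    if game.terminal_test(state):
        return game.utility(state)
    return max(min_value(game.result(state, a), game)
               for a in game.actions(state))

def min_value(state, game):
    if game.terminal_test(state):
        return game.utility(state)
    return min(max_value(game.result(state, a), game)
               for a in game.actions(state))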

SLIDE 132–136

Properties of minimax

Complete?? Yes, if the tree is finite (chess has specific rules for this)
  (A finite strategy can exist even in an infinite tree)
Optimal?? Yes, against an optimal opponent. Otherwise??
Time complexity?? O(b^m)
Space complexity?? O(bm) (depth-first exploration)
For chess, b ≈ 35, m ≈ 100 for “reasonable” games ⇒ exhaustive search is infeasible
For Go, b ≈ 250, m ≈ 150
But do we need to explore every path?

SLIDE 137–141

α–β pruning example

[Figure: five stages of evaluating the 2-ply minimax tree with α–β pruning]

  • The first leaf below the first MIN node gives that node a value of at most 3
  • The second leaf has value 12, which MIN would avoid, so the MIN node is still at most 3
  • The third leaf has value 8, so the value of the first MIN node is exactly 3, all successors seen
  • The first leaf below the second MIN node gives it a value of at most 2; but the first MIN node is worth 3, so MAX would never choose the second MIN node, and we need not look at its other successors — an example of α–β pruning
  • The remaining MIN node is evaluated the same way, and the root’s minimax value is 3
3 3 2 2 X X 14 14 5 5 2 2 3

SLIDE 142

α–β pruning

[Figure: a game tree alternating MAX and MIN levels, with a node of value V deep below the current path]

α is the best value (to MAX) found so far off the current path
If V is worse than α, MAX will avoid it ⇒ prune that branch
Define β similarly for MIN

SLIDE 143

The α–β algorithm

function Alpha-Beta-Pruning(state) returns an action
  v ← Max-Value(state, −∞, ∞)
  return the action in Actions(state) with value v

function Max-Value(state, α, β) returns a utility value
  if Terminal-Test(state) then return Utility(state)
  v ← −∞
  for each a in Actions(state) do
    v ← Max(v, Min-Value(Result(state, a), α, β))
    if v ≥ β then return v
    α ← Max(α, v)
  return v

function Min-Value(state, α, β) returns a utility value
  /* symmetric to Max-Value, with the roles of α and β reversed */
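A hedged Python version (same assumed game interface as the minimax sketch; for simplicity the root evaluates each action with fresh (−∞, ∞) bounds, giving up a little pruning between root actions):

import math

def alpha_beta_search(state, game):
    return max(game.actions(state),
               key=lambda a: min_value(game.result(state, a),
                                       game, -math.inf, math.inf))

def max_value(state, game, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = -math.inf
    for a in game.actions(state):
        v = max(v, min_value(game.result(state, a), game, alpha, beta))
        if v >= beta:
            return v                     # MIN would avoid this branch: prune
        alpha = max(alpha, v)
    return v

def min_value(state, game, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = math.inf
    for a in game.actions(state):
        v = min(v, max_value(game.result(state, a), game, alpha, beta))
        if v <= alpha:
            return v                     # MAX would avoid this branch: prune
        beta = min(beta, v)
    return v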

SLIDE 144

Properties of α–β

Pruning does not affect the final result
Good move ordering improves the effectiveness of pruning
With “perfect ordering,” time complexity = O(b^{m/2}) ⇒ doubles the solvable depth
A simple example of the value of reasoning about which computations are relevant (a form of metareasoning)
Unfortunately, 35^{50} is still impossible
Depth-first minimax search with α–β pruning achieved super-human performance in chess, checkers and othello, but is not effective in Go

SLIDE 145

Imperfect play

Resource limits
  – a deterministic game may have imperfect information in real time
Standard approach:
  • Use Cutoff-Test instead of Terminal-Test
    e.g., depth limit
  • Use Eval instead of Utility
    i.e., an evaluation function that estimates the desirability of a position
Suppose we have 100 seconds and explore 10^4 nodes/second
  ⇒ 10^6 nodes per move ≈ 35^{8/2}
  ⇒ α–β reaches depth 8 ⇒ pretty good chess program

SLIDE 146

Evaluation functions

[Figure: two chess positions: Black to move, White slightly better; White to move, Black winning]

For chess, typically a linear weighted sum of features
  Eval(s) = w1 f1(s) + w2 f2(s) + . . . + wn fn(s)
e.g., w1 = 9 with f1(s) = (number of white queens) − (number of black queens), etc.

SLIDE 147

Evaluation functions

For Go, simply a linear weighted sum of features
  EvalFn(s) = w1 f1(s) + w2 f2(s) + . . . + wn fn(s)
e.g., for some state s, w1 = 9 with f1(s) = (number of Black good positions) − (number of White good positions), etc.
Evaluation functions need human knowledge and are hard to design
Go lacks any known reliable heuristic function – a greater difficulty than (Chinese) Chess

SLIDE 148

Deterministic (perfect information) games in practice

Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions
Othello: human champions refuse to compete against computers, which are too good
Chess: IBM Deep Blue defeated human world champion Garry Kasparov in 1997. Deep Blue used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply

SLIDE 149

Deterministic games in practice

Go: Google DeepMind AlphaGo
  – defeated human world champion Lee Sedol in 2016 and Ke Jie in 2017
  – AlphaGo Zero defeated the champion-defeating versions AlphaGo Lee/Master in 2017
    – winning 100–0
    – after 3 days of learning without a teacher
AlphaZero: a GGP program
  – achieved within 24h a superhuman level of play in the games of Chess/Shogi/Go (and defeated AlphaGo Zero) in Dec. 2017
Achievements:
  • “the God of chess”: superhuman play
  • self-learning without prior human knowledge (except the game rules)

SLIDE 150

Deterministic games in practice

Chinese Chess: not yet seriously studied
  – in principle, the AlphaZero algorithm can be directly used for Chinese Chess and similar deterministic games
All deterministic games have been well defeated by AI

SLIDE 151

Nondeterministic games: backgammon

[Figure: a backgammon board with points numbered 1–24]

SLIDE 152

Nondeterministic games in general

In nondeterministic games, chance is introduced by dice or card-shuffling
Simplified example with coin-flipping:

[Figure: a game tree with MAX, CHANCE (0.5/0.5 branches), and MIN levels; leaf utilities include 7, 4, 6, 5, −2, 2, 4, −2, giving chance-node values 3 and −1 and root value 2]

SLIDE 153

Algorithm for nondeterministic games

Expectiminimax gives perfect play
Just like Minimax, except we must also handle chance nodes:
  . . .
  if state is a MAX node then
    return the highest ExpectiMinimax-Value of Successors(state)
  if state is a MIN node then
    return the lowest ExpectiMinimax-Value of Successors(state)
  if state is a chance node then
    return the probability-weighted average of the ExpectiMinimax-Values of Successors(state)
  . . .
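As a sketch, the three cases become three branches; node_type and outcomes below are assumed interface names (outcomes returning (probability, state) pairs at chance nodes):

def expectiminimax(state, game):
    kind = game.node_type(state)         # "max", "min", "chance", or "terminal"
    if kind == "terminal":
        return game.utility(state)
    if kind == "chance":
        # probability-weighted average over dice/card outcomes
        return sum(p * expectiminimax(s2, game) for p, s2 in game.outcomes(state))
    values = [expectiminimax(game.result(state, a), game)
              for a in game.actions(state)]
    return max(values) if kind == "max" else min(values)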

SLIDE 154

Nondeterministic (perfect information) games in practice

Dice rolls increase b: 21 possible rolls with 2 dice
Backgammon ≈ 20 legal moves (can be 6,000 with a 1-1 roll)
  depth 4 ⇒ 20 × (21 × 20)^3 ≈ 1.2 × 10^9
As depth increases, the probability of reaching a given node shrinks
  ⇒ the value of lookahead is diminished
  ⇒ α–β pruning is much less effective
TD-Gammon uses depth-2 search + a very good Eval ≈ world-champion level

SLIDE 155

Games of imperfect information

E.g., card games, where the opponent’s initial cards are unknown
Typically we can calculate a probability for each possible deal
Seems just like having one big dice roll at the beginning of the game
Idea: compute the minimax value of each action in each deal, then choose the action with the highest expected value over all deals
Special case: if an action is optimal for all deals, it is optimal
The current best bridge program approximates this idea by
  1) generating 100 deals consistent with bidding information
  2) picking the action that wins the most tricks on average

SLIDE 156

Monte Carlo tree search

MCTS – a heuristic search that expands the tree based on random sampling of the state space
  – plays a role like depth-limited minimax (with α–β pruning)
  – interest is due to its success in computer Go since 2006
Motivation: EvalFn ⇐ stochastic simulation

SLIDE 157

MCTS

  • 1. Selection: starting at the root, a child is recursively selected to descend through the tree until the most expandable node is reached
  • 2. Expansion: one (or more) child nodes are added to expand the tree, according to the available actions
  • 3. Simulation: a simulation is run from the new node(s) according to the default policy to produce an outcome (random playout or rollout)
  • 4. Backpropagation: the simulation result is “backed up” through the selected nodes to update their statistics

SLIDE 158

MCTS

  • Tree Policy: select or create a leaf node from the nodes already contained within the search tree (selection and expansion)
    – the tree policy attempts to balance exploration (look in areas that have not been well sampled yet) and exploitation (look in areas which appear to be promising)
  • Default (Value) Policy: play out the domain from a given non-terminal state to produce a value estimate (simulation or evaluation)
    – in the simplest case, make uniform random moves

AI Slides (6e) c Lin Zuoquan@PKU 2003-2020 3 158

slide-159
SLIDE 159

MCTS algorithm

function MCTS-Search(state) returns an action
  create root node v0 with state s0
  while within computational budget do
    vl ← TreePolicy(v0)
    ∆ ← DefaultPolicy(s(vl))
    Backup(vl, ∆)
  return a(BestChild(v0))
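A compact runnable sketch of the four phases in Python, with UCT as the tree policy (the state interface – legal_moves, result, is_terminal, payout – is assumed; rewards are taken from the root player's viewpoint, whereas two-player code would flip signs per ply):

import math, random

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children, self.n, self.w = [], 0, 0.0
        self.untried = list(state.legal_moves())   # moves not yet expanded

def uct_select(node, c=1.4):
    # tree policy: exploitation (w/n) balanced against an exploration bonus
    return max(node.children,
               key=lambda ch: ch.w / ch.n + c * math.sqrt(math.log(node.n) / ch.n))

def mcts_search(root_state, budget=1000):
    root = Node(root_state)
    for _ in range(budget):
        node = root
        # 1. Selection: descend while fully expanded and non-terminal
        while not node.untried and node.children:
            node = uct_select(node)
        # 2. Expansion: add one child for an untried action
        if node.untried:
            move = node.untried.pop()
            node.children.append(Node(node.state.result(move), node, move))
            node = node.children[-1]
        # 3. Simulation: random playout under the default policy
        state = node.state
        while not state.is_terminal():
            state = state.result(random.choice(state.legal_moves()))
        outcome = state.payout()
        # 4. Backpropagation: update visit counts and value sums up to the root
        while node:
            node.n += 1
            node.w += outcome
            node = node.parent
    return max(root.children, key=lambda ch: ch.n).move   # most-visited move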

AI Slides (6e) c Lin Zuoquan@PKU 2003-2020 3 159

slide-160
SLIDE 160

Example: Alpha0

Go – MCTS had a dramatic effect on narrowing the gap to human play, but was competitive only on small boards (say, 9 × 9), or at weak-amateur level on the standard 19 × 19 board
– Pachi: an open-source Go program using MCTS, ranked at amateur 2 dan on KGS, that executes 100,000 simulations per move
  • Ref. Rimmel, A. et al., Current Frontiers in Computer Go, IEEE Trans. Comput. Intell. AI Games, vol. 2, no. 4, 2010

AI Slides (6e) c Lin Zuoquan@PKU 2003-2020 3 160

slide-161
SLIDE 161

Example: Alpha0

Alpha0 algorithm design

  • 1. combine deep learning in an MCTS algorithm
– a single DNN (deep neural network) provides both a policy for breadth pruning and a value for depth pruning
  • 2. in each position, an MCTS search is executed, guided by the DNN, with data generated by self-play reinforcement learning, without human knowledge beyond the game rules (prior knowledge)
  • 3. asynchronous multi-threaded search that executes simulations on CPUs, and computes the DNN in parallel on GPUs

Motivation: EvalFn ⇐ stochastic simulation ⇐ deep learning (see later in machine learning)

AI Slides (6e) c Lin Zuoquan@PKU 2003-2020 3 161

slide-162
SLIDE 162

Example: Alpha0

Implementation
Raw board representation: 19 × 19 × 17 historic position
s_t = [X_t, Y_t, X_{t−1}, Y_{t−1}, · · · , X_{t−7}, Y_{t−7}, C]

Reading: Silver, D. et al., Mastering the game of Go without human knowledge, Nature 550, 354–359, 2017; or
Silver, D. et al., A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play, Science, vol. 362, issue 6419, pp. 1140–1144, 2018
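A hedged numpy sketch of assembling the 19 × 19 × 17 input tensor (the plane ordering follows the slide; the helper's argument conventions are assumptions):

import numpy as np

def board_tensor(history, to_play):
    # history: list of (black_stones, white_stones) 19x19 binary arrays,
    # most recent first; to_play: 1 if black moves next, 0 if white
    planes = []
    for t in range(8):   # X_t, Y_t, ..., X_{t-7}, Y_{t-7}
        black, white = history[t] if t < len(history) else (np.zeros((19, 19)),) * 2
        planes += [black, white]
    planes.append(np.full((19, 19), float(to_play)))   # C: colour-to-play plane
    return np.stack(planes, axis=-1)                   # shape (19, 19, 17)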

AI Slides (6e) c Lin Zuoquan@PKU 2003-2020 3 162

slide-163
SLIDE 163

Alpha0 algorithm: MCTS

Improving MCTS by using a DNN and self-play
  • a. Selection: at each step, choose the action with maximum action value Q plus upper confidence bound U, where
Q(s, a) = (1 / N(s, a)) Σ_{s′ | s,a→s′} V(s′)
U(s, a) ∝ P(s, a) / (1 + N(s, a))
(stored prior probability P and visit count N; each simulation traverses the tree, so no rollout is needed)

AI Slides (6e) c Lin Zuoquan@PKU 2003-2020 3 163

slide-164
SLIDE 164

Alpha0 algorithm: MCTS

Improving MCTS by using a DNN and self-play
  • b. Expanding the leaf and evaluating s by the DNN: (P(s, ·), V(s)) = f_θ(s)
  • c. Updating Q to track all V in the subtree
  • d. Once completed, search probabilities π ∝ N(s, a)^{1/τ} are returned (τ is a hyperparameter)
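A small Python sketch of steps a and d (arrays are indexed by action; c_puct is a hypothetical exploration constant, since the slide only states the proportionality U ∝ P/(1 + N)):

def puct_select(P, N, W, c_puct=1.0):
    # step a: pick the action maximizing Q(s,a) + U(s,a);
    # W accumulates subtree values (step c), so Q = W/N
    def score(a):
        q = W[a] / N[a] if N[a] else 0.0
        u = c_puct * P[a] / (1 + N[a])   # AlphaZero's full rule also scales U
        return q + u                     # by sqrt of the parent visit count
    return max(range(len(P)), key=score)

def search_probs(N, tau=1.0):
    # step d: search probabilities pi proportional to N(s,a)^(1/tau)
    x = [n ** (1.0 / tau) for n in N]
    s = sum(x)
    return [v / s for v in x]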

AI Slides (6e) c Lin Zuoquan@PKU 2003-2020 3 164

slide-165
SLIDE 165

Alpha0 pseudocode

function Alpha0(state) returns a move
  inputs: rules, the game rules
          scores, the game scores
          board, the board representation /* historic data and color for players */
  create root node with state s0, initially random play
  while within computational budget do
    αθ ← Mcts(s(fθ))
    a ← Move(s, αθ)
  return a(BestMove(αθ))

AI Slides (6e) c Lin Zuoquan@PKU 2003-2020 3 165

slide-166
SLIDE 166

Alpha0 pseudocode

function Mcts(state, fθ) returns αθ
  inputs: tree, the search tree /* First In Last Out */
          P(s, a): prior probability, for each edge (s, a) in tree
          N(s, a): visit count
          Q(s, a): action value
  while within computational budget do
    fθ ← Dnn(st, πt, zt)
    (P(s′, ·), V(s′)) ← fθ
    U(s, a) ← Policy(P(s′, ·), s0)
    Q(s, a) ← Value(V(s′), s0)
    s′ ← Max(U(s, a) + Q(s, a))
    Backup(s′, Q)
  return αθ(BestChild(s0))

A free open-source implementation of AlphaGo Zero is Leela Zero (there are others, e.g., from Facebook): https://github.com/ssxy00/leela-zero

AI Slides (6e) c Lin Zuoquan@PKU 2003-2020 3 166

slide-167
SLIDE 167

Complexity of Alpha0 algorithm

Go/Chess are NP-hard (“almost” in PSPACE) problems
Alpha0 does not reduce the complexity of Go/Chess, but
– outperforms humans despite the complexity
– – a practical approach to handling NP-hard problems
– obeys the complexity of MCTS and machine learning
– – performance improvements come from deep learning with big data and computational power
Alpha0 toward N-step optimization??
– if so, Alpha0 vs. Alpha0 at Go/Chess would always be a draw

AI Slides (6e) c Lin Zuoquan@PKU 2003-2020 3 167

slide-168
SLIDE 168

Alpha0 vs. Deep Blue

Alpha0 (Go/Chess) exceeded the performance of all other Go/Chess programs, demonstrating that a DNN provides a viable alternative to Monte Carlo simulation
It evaluated thousands of times fewer positions than Deep Blue did in its match
– while Deep Blue relied on a handcrafted evaluation function, Alpha0’s neural network is trained purely through self-play reinforcement learning

AI Slides (6e) c Lin Zuoquan@PKU 2003-2020 3 168

slide-169
SLIDE 169

Example: Card

Four-card bridge/whist/hearts hand, Max to play first

[Figure: game trees over the four-card hand, plies alternating MAX and MIN; the variations shown each evaluate to −0.5]

AI Slides (6e) c Lin Zuoquan@PKU 2003-2020 3 169

slide-170
SLIDE 170

Imperfect information games in practice

Poker: AI has surpassed human experts in heads-up no-limit Texas hold’em, which has over 10^160 decision points
– DeepStack: defeated a collection of poker pros in heads-up no-limit in 2016 (its predecessors had beaten top pros in limit Texas hold’em in 2008)
– Libratus: two-time champion of the Annual Computer Poker Competition in heads-up no-limit, and defeated a team of top heads-up no-limit specialist pros in 2017
StarCraft II (real-time strategy): DeepMind’s AlphaStar defeated a top professional player 5–0 in Dec. 2018
Imperfect information games involve obstacles not present in classic board games like Go, but which arise in many real-world applications, such as negotiation, auctions, security, weather prediction, climate modelling, etc.

AI Slides (6e) c Lin Zuoquan@PKU 2003-2020 3 170

slide-171
SLIDE 171

Funny games

Games illustrate several important points about AI

  • perfection is unattainable ⇒ approximate
  • good idea to think about what to think about
  • uncertainty constrains the assignment of values to states
  • optimal decisions depend on information state, not real state
  • hard to capture the principles of human thinking

Games are fun to work on, but dangerous

AI Slides (6e) c Lin Zuoquan@PKU 2003-2020 3 171

slide-172
SLIDE 172

Metaheuristic

Metaheuristic: a higher-level procedure or heuristic to find a heuristic for optimization ⇐ local search, e.g., simulated annealing
Metalevel vs. object-level state space
Each state in a metalevel state space captures the internal state of a program that is searching in an object-level state space
An agent can learn how to search better
– a metalevel learning algorithm can learn from experience to avoid exploring unpromising subtrees
– learning is to minimize the total cost of problem solving
– specially, learning admissible heuristics from examples, e.g., the 8-puzzle

AI Slides (6e) c Lin Zuoquan@PKU 2003-2020 3 172

slide-173
SLIDE 173

Tabu search

Tabu search is a metaheuristic employing local search that resolves getting stuck in local minima or plateaus, e.g., in Hill-Climbing
tabu (forbidden): uses memory structures (a tabu list) that describe the visited solutions, or user-provided rules
– to discourage the search from coming back to previously visited solutions
– entries stay tabu for a certain short-term period, or while they violate a rule (marked as “tabu”), to avoid repetition

AI Slides (6e) c Lin Zuoquan@PKU 2003-2020 3 173

slide-174
SLIDE 174

Tabu search algorithm

function Tabu-Search(solution) returns solution-best
  inputs: solution, a solution (maybe chosen at random)
  persistent: tabulist, a memory structure for visited states, initially empty
  solution-best ← solution
  while not Stopping-Condition() do
    best-candidate ← null
    for each candidate in solution.Neighborhood do
      if (not tabulist.contains(candidate)) and
         (best-candidate = null or Fitness(candidate) > Fitness(best-candidate))
        best-candidate ← candidate
    solution ← best-candidate
    if Fitness(best-candidate) > Fitness(solution-best)
      solution-best ← best-candidate
    tabulist.push(best-candidate)
    if tabulist.Size > maxTabuSize
      tabulist.removeFirst()
  return solution-best
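A runnable Python sketch of the same loop (solutions must be hashable for the tabu-list membership test; neighbors and fitness are problem-specific stand-ins):

from collections import deque

def tabu_search(initial, neighbors, fitness, max_iter=1000, max_tabu=50):
    best = current = initial
    tabu = deque(maxlen=max_tabu)   # short-term memory: oldest entries expire
    for _ in range(max_iter):
        candidates = [c for c in neighbors(current) if c not in tabu]
        if not candidates:          # stopping condition: no admissible neighbour
            break
        current = max(candidates, key=fitness)   # best non-tabu neighbour
        tabu.append(current)                     # forbid revisiting for a while
        if fitness(current) > fitness(best):
            best = current
    return best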

AI Slides (6e) c Lin Zuoquan@PKU 2003-2020 3 174

slide-175
SLIDE 175

Searching search

Researchers have taken inspiration for search (and optimization) algorithms from a wide variety of fields
– metallurgy (simulated annealing)
– biology (genetic algorithms)
– economics (market-based algorithms)
– entomology (ant colony)
– neurology (neural networks)
– animal behavior (reinforcement learning)
– mountaineering (hill climbing)
– politics (struggle forms), and others

Ref: Pearl, J. (1984), Heuristics: Intelligent Search Strategies for Computer Problem Solving, Addison-Wesley

Is there a general problem solver achieving the generality of intelligence?? – NO
Consider, say, the General Problem Solver (GPS), or Alpha0

AI Slides (6e) c Lin Zuoquan@PKU 2003-2020 3 175