The Gizmo Player: Simon Dollé, Jan Kopcsek, Alper Tunga (Dresden, 13.02.2008)



slide-1
SLIDE 1

The Gizmo Player

Faculty of Computer Science, Department of Computer Science, Institute of Intelligent Systems

Dresden, 13.02.2008

Simon Dollé, Jan Kopcsek, Alper Tunga

slide-2
SLIDE 2

TU Dresden, 13.02.2008 Gizmo Player Slide 2 of 10

Finding a heuristic function

Two ways of learning a heuristic function:

  • Deductive

– Analyzing the rules
– Identifying common elements like game boards or pieces
– Finding patterns

  • Inductive

– Playing and learning from experience
– Monte Carlo strategy

slide-3
SLIDE 3

Monte Carlo Strategy

  • Play random games
  • Compute the mean score for each move
  • Use these means as a heuristic function
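The three steps above can be sketched on a toy game. This is a minimal illustration, not the Gizmo Player's code: the Nim rules, the 0/100 scoring, and the names `legal_moves`, `random_playout` and `monte_carlo_heuristic` are all assumptions made for the example.

```python
import random

def legal_moves(stones):
    """Toy Nim (assumed game): take 1, 2 or 3 stones; taking the last stone wins."""
    return [k for k in (1, 2, 3) if k <= stones]

def random_playout(stones, our_turn, rng):
    """Play uniformly random moves to the end; return our score (100 win, 0 loss)."""
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        our_turn = not our_turn
    # After the final flip, our_turn is True exactly when the opponent
    # took the last stone, so that case is a loss for us.
    return 0 if our_turn else 100

def monte_carlo_heuristic(stones, games_per_move=2000, seed=0):
    """Mean random-playout score for each of our legal moves."""
    rng = random.Random(seed)
    means = {}
    for move in legal_moves(stones):
        total = 0
        for _ in range(games_per_move):
            # Once we have moved, it is the opponent's turn.
            total += random_playout(stones - move, our_turn=False, rng=rng)
        means[move] = total / games_per_move
    return means

if __name__ == "__main__":
    print(monte_carlo_heuristic(3))
```

With three stones left, taking all three always wins and taking two always loses, so the means for those moves are exact; only taking one stone depends on the random opponent.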

slide-8
SLIDE 8

Monte Carlo Strategy

  • Problem:

– The same effort is spent on interesting and uninteresting moves
– Equivalent to playing against a dummy player

  • UCT algorithm (Upper Confidence bounds applied to Trees):

– An algorithm to balance deeper exploration of interesting parts of the graph against exploration of new parts
– Makes random games more realistic
slide-9
SLIDE 9

UCT Algorithm

  • As long as there are unexplored moves from our current state, explore them

slide-15
SLIDE 15

UCT Algorithm

  • As long as there are unexplored moves from our current state, explore them
  • Otherwise, choose the one with the highest score, computed from:

– h: the heuristic value
– n: the number of games through the parent node
– n_i: the number of games through the node
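The score formula itself is not preserved in the slide text. A widely used choice in UCT is the UCB1 form h + C * sqrt(ln n / n_i); whether the authors used exactly this form, and which constant C, is an assumption here. The sketch below applies the two rules with h scaled to [0, 1] and a hypothetical C = 1.4; `uct_score` and `select_move` are illustrative names.

```python
import math

def uct_score(h, n, n_i, c=1.4):
    """Heuristic value plus an exploration bonus that shrinks as n_i grows.

    Assumed UCB1 form: h + c * sqrt(ln(n) / n_i), with h in [0, 1].
    """
    return h + c * math.sqrt(math.log(n) / n_i)

def select_move(children, n, c=1.4):
    """Pick the move to explore next.

    children maps move -> (h, n_i); n is the number of games through the parent.
    """
    # Rule 1: unexplored moves (n_i == 0) are tried first.
    for move, (_, n_i) in children.items():
        if n_i == 0:
            return move
    # Rule 2: otherwise take the move with the highest score.
    return max(children, key=lambda m: uct_score(children[m][0], n, children[m][1], c))

if __name__ == "__main__":
    stats = {"a": (0.6, 10), "b": (0.5, 1)}
    print(select_move(stats, n=11))       # rarely tried "b" gets a large bonus
    print(select_move(stats, n=11, c=0))  # without the bonus, "a" has the higher h
```

Setting `c=0` removes the exploration term and reduces the choice to the plain heuristic value, which shows what the bonus contributes.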

slide-20
SLIDE 20

UCT Algorithm

  • Which move to play?

– The one with the highest heuristic value

  • In multiplayer games:

– Store the heuristic value for each player
slide-21
SLIDE 21

Good points

  • Heuristic is directly linked to the final score
  • Heuristic converges to minimax values
  • Time scalable: more thinking time allows more random games
  • Easily parallelisable
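The time scalability can be illustrated with a loop that keeps simulating until a deadline passes. This is a minimal sketch; `mean_score_within` and its `simulate` callback are hypothetical names standing in for one random game.

```python
import time

def mean_score_within(budget_seconds, simulate):
    """Run simulate() repeatedly until the time budget is used up.

    Returns (mean score, number of games); the mean is None if no game ran.
    """
    deadline = time.monotonic() + budget_seconds
    total, games = 0.0, 0
    while time.monotonic() < deadline:
        total += simulate()
        games += 1
    mean = total / games if games else None
    return mean, games

if __name__ == "__main__":
    import random
    rng = random.Random(0)
    print(mean_score_within(0.1, lambda: rng.choice([0, 100])))
```

Because the result is just a total and a count, independent workers could each run such a loop and have their totals and counts added afterwards, which is one reading of the parallelisation point.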
slide-22
SLIDE 22

Problems

  • Simultaneous moves:

– What rule to choose to explore the nodes?
– Which move to play?

  • Long games and loops:

– Depth-first search problem
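A random game runs to the end like a depth-first search, so a very long or looping game can stall it. One common mitigation, which the slides do not claim to use, is to cap the playout depth and return a neutral score at the cutoff. In the sketch below, `capped_playout`, its callback parameters and the cutoff score of 50 are all assumptions.

```python
import random

def capped_playout(state, step, is_terminal, score, rng,
                   max_depth=200, cutoff_score=50):
    """Random playout that gives up after max_depth moves.

    step(state, rng) returns the next state, is_terminal(state) detects the
    end of the game, score(state) is the final score of a terminal state.
    """
    for _ in range(max_depth):
        if is_terminal(state):
            return score(state)
        state = step(state, rng)
    # Depth limit reached: score the state if the game happens to be over,
    # otherwise fall back to the assumed neutral value.
    return score(state) if is_terminal(state) else cutoff_score

if __name__ == "__main__":
    # Toy looping game (assumed): a counter moves by +1 or -1 and ends at 3.
    rng = random.Random(0)
    walk = lambda s, r: s + r.choice([-1, 1])
    print(capped_playout(0, walk, lambda s: s >= 3, lambda s: 100, rng))
```

The cutoff value biases the heuristic towards "undecided" for long games, so it is a trade-off rather than a fix.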

slide-23
SLIDE 23

Thank you for your attention, and good luck to your players!