Reinforcement Learning in Board Games - PowerPoint PPT Presentation



SLIDE 1

Reinforcement Learning in Board Games

GEORGE TUCKER

SLIDE 2

Paper Background

"Reinforcement learning in board games" (Imran Ghory, 2004)

Surveys progress in the last decade
Suggests improvements
Formalizes key game properties
Develops a TD-learning game system

SLIDE 3

Why board games?

Regarded as a sign of intelligence and learning
Chess

Games as simplified models
Battleship

Existing methods of comparison
Rating systems

SLIDE 4

What is reinforcement learning?

After a sequence of actions, the agent receives a reward
Positive or negative

Temporal credit assignment problem
Determine credit for the reward
Temporal Difference methods
TD-lambda
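The TD-lambda idea above can be sketched as a tabular value update with eligibility traces, which spread credit for each prediction error back over earlier states. The trajectory format and hyperparameter values here are illustrative, not from the paper:

```python
# Minimal sketch of a tabular TD(lambda) update with eligibility traces.
# trajectory is a list of (state, reward-on-arrival) pairs.
def td_lambda_update(V, trajectory, alpha=0.1, gamma=1.0, lam=0.7):
    """Update value table V in place and return it."""
    e = {}  # eligibility traces: how much credit each visited state gets
    for t in range(len(trajectory) - 1):
        s, _ = trajectory[t]
        s_next, r = trajectory[t + 1]
        delta = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)  # TD error
        e[s] = e.get(s, 0.0) + 1.0
        for state in e:
            V[state] = V.get(state, 0.0) + alpha * delta * e[state]
            e[state] *= gamma * lam  # decay credit for older states
    return V
```

After one game ending in a reward of 1, states closer to the reward receive larger updates, which is exactly the temporal credit assignment the slide describes.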

SLIDE 5

History

Basics developed by Arthur Samuel
Checkers

Richard Sutton introduced TD-lambda
Gerald Tesauro created TD-Gammon

Chess and Go
Worse than conventional AI

SLIDE 6

History

Othello
Contradictory results

Substantial growth since then
TD-lambda has the potential to learn game variants

SLIDE 7

Conventional Strategies

Most methods use an evaluation function
Use minimax / alpha-beta search
Hand-designed feature detectors
The evaluation function is a weighted sum

So why TD learning?
Does not need hand-coded features
Generalization
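The weighted-sum evaluation function described above can be sketched as follows; the feature detectors and weights are hypothetical toy examples, not from the paper:

```python
# Minimal sketch of a linear (weighted-sum) board evaluation function,
# the conventional strategy the slide describes.
def evaluate(board, weights, feature_detectors):
    """Score a board as a weighted sum of hand-designed feature values."""
    return sum(w * f(board) for w, f in zip(weights, feature_detectors))

# Two toy hand-designed feature detectors over a generic board dict.
material = lambda board: board.get("my_pieces", 0) - board.get("opp_pieces", 0)
mobility = lambda board: board.get("my_moves", 0) - board.get("opp_moves", 0)

score = evaluate(
    {"my_pieces": 5, "opp_pieces": 3, "my_moves": 10, "opp_moves": 8},
    weights=[1.0, 0.1],
    feature_detectors=[material, mobility],
)  # 1.0 * 2 + 0.1 * 2
```

TD learning's appeal is precisely that the weights (and, with a neural network, the features themselves) need not be hand-tuned as they are here.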

SLIDE 8

Temporal Difference Learning

SLIDE 9

Temporal Difference Learning
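Slides 8 and 9 evidently carried the TD(lambda) equations, which did not survive extraction. For reference, the standard weight-update rule introduced by Sutton (a reconstruction, not necessarily the slides' own notation) is:

```latex
\Delta w_t = \alpha \left( V(s_{t+1}) - V(s_t) \right) \sum_{k=1}^{t} \lambda^{\,t-k} \, \nabla_w V(s_k)
```

where \(\alpha\) is the learning rate and \(\lambda\) controls how far credit for each prediction error propagates back to earlier positions.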

SLIDE 10

Disadvantage

Requires lots of training
Self-play

Short-term pathologies
Randomization

SLIDE 11

TD Algorithm Variants

TD-Leaf
Evaluation function search

TD-Directed
Minimax search

TD-Mu
Fixed opponent
Use evaluation function on opponent’s moves
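The TD-Leaf variant above trains the evaluation function toward the value of the leaf reached by a shallow minimax search, rather than the raw next-position value. A minimal sketch, assuming a hypothetical game interface (`legal_moves`, `apply`, `is_terminal`) and a toy counting game for illustration:

```python
# Sketch of the TD-Leaf target: back up the minimax value of the
# principal-variation leaf instead of the raw next-state evaluation.
def minimax_leaf_value(state, depth, evaluate, game, maximizing=True):
    if depth == 0 or game.is_terminal(state):
        return evaluate(state)
    values = [minimax_leaf_value(game.apply(state, m), depth - 1,
                                 evaluate, game, not maximizing)
              for m in game.legal_moves(state)]
    return max(values) if maximizing else min(values)

# Toy game for illustration only: states are integers, moves add 1 or 2,
# the game ends at 4 or more, and the evaluation is the state itself.
class ToyGame:
    def legal_moves(self, s): return [1, 2]
    def apply(self, s, m): return s + m
    def is_terminal(self, s): return s >= 4

target = minimax_leaf_value(0, 2, float, ToyGame())  # TD-Leaf training target
```

TD-Directed uses the same minimax machinery only for move selection, while the update target remains the plain next-state value.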

SLIDE 12

Current State

Many improvements
Sparse and dubious validation
Hard to check

Tuning weights
Nonlinear combinations
Differentiate between effective and ineffective

Automated evolution method of feature generation
Turian

SLIDE 13

Important Game Properties

Board smoothness
Capabilities tied to smoothness
Based on the board representation

Divergence rate
Measures how a single move changes the board
Backgammon and Chess: low to medium
Othello: high
Forced exploration

State space complexity
Longer training
Possibly the most important factor

SLIDE 14

Importance of State Space Complexity

SLIDE 15

Training Data

Random play
Limited use

Fixed opponent
Game environment and opponent are one

Database play
Speed

Self-play
No outside sources for data
Slow
Learns what works

Hybrid methods

SLIDE 16

Improvement: General

Reward size
Fixed value
Based on end board

Board encoding

When to learn?
Every move?
Random moves?

Repetitive learning
Board inversion
Batch learning

SLIDE 17

Improvement: Neural Network

Functions in Neural Network
Radial Basis Functions

Training algorithm
RPROP

Random weight initialization
Significance

SLIDE 18

Improvement: Self-play

Asymmetry
Game-tree + function approximator

Player handling
Tesauro adds an extra unit
Negate score (zero-sum game)
Reverse colors

Random moves
Algorithm

Informed final board evaluation
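The two player-handling options above, negating the score and reversing colors, coincide for a zero-sum evaluator. A minimal sketch under a toy +1/-1 board encoding (the encoding and evaluator are assumptions, not the paper's representation):

```python
# Sketch of self-play player handling under a zero-sum assumption:
# evaluate every position from the side to move, either by negating
# the score or by reversing piece colors.
# Toy encoding: +1 our piece, -1 opponent piece, 0 empty.
def reverse_colors(board):
    return [-cell for cell in board]

def score_for(player, board, evaluate):
    if player == +1:
        return evaluate(board)
    # Opponent's view: for an evaluator that is odd in this encoding,
    # reversing colors is equivalent to negating the zero-sum score.
    return evaluate(reverse_colors(board))
```

With a linear evaluator such as `sum`, `score_for(-1, b, sum)` equals `-score_for(+1, b, sum)`, which is why a single network can serve both sides in self-play.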

SLIDE 19

Evaluation

Tic-tac-toe and Connect 4
Amenable to TD-learning
Human board encoding is near optimal

Networks across multiple games

A general game player
Plays perfectly near the end game
Plays randomly otherwise
Random-decay handicap: % of moves are random
Common system
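The benchmark opponent above can be sketched as follows; the function names, the endgame-depth cutoff, and the assumed perfect endgame solver are all hypothetical, not from the paper:

```python
# Sketch of the random-decay handicap opponent: perfect play near the
# end of the game, otherwise random with a tunable fraction of moves.
import random

def handicap_move(state, legal_moves, perfect_move, moves_left,
                  random_fraction, endgame_depth=3):
    """Pick a move: perfect in the endgame, else random with given probability."""
    if moves_left <= endgame_depth:
        return perfect_move(state)          # assumed perfect endgame solver
    if random.random() < random_fraction:   # handicap: % of moves are random
        return random.choice(legal_moves)
    return perfect_move(state)
```

Varying `random_fraction` gives a graded scale of opponent strength, which is what makes this a usable common benchmarking system.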

SLIDE 20

Random Initializations

Significant impact on learning

SLIDE 21

Inverted Board

Speeds up initial training
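The inverted-board trick can be read as a data-augmentation step: each self-play position also yields a training example from the opponent's point of view. A minimal sketch under a zero-sum, +1/-1 board encoding (an assumption, not the paper's representation):

```python
# Sketch of inverted-board training data: flip piece colors and negate
# the final reward to get a second example per position (zero-sum).
def inverted_examples(positions, outcome):
    """positions: list of boards; outcome: final reward for player +1."""
    inverted = [[-cell for cell in board] for board in positions]
    originals = [(board, outcome) for board in positions]
    flipped = [(board, -outcome) for board in inverted]
    return originals + flipped
```

Doubling the examples per game in this way is one plausible reason the technique speeds up the early phase of training.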

SLIDE 22

Random Move Selection

More sophisticated techniques are required

SLIDE 23

Reversed Color Evaluation

SLIDE 24

Batch Learning

Similar to control

SLIDE 25

Repetitive learning

No advantage

SLIDE 26

Informed Final Board Evaluation

Extremely significant
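Informed final board evaluation replaces the network's own estimate of a terminal position with the known game result; a minimal sketch (the function names are hypothetical):

```python
# Sketch of informed final board evaluation: at terminal positions,
# train toward the exact game outcome instead of the learned estimate.
def td_target(board, is_final, predict, true_result):
    if is_final:
        return true_result(board)  # exact win/draw/loss score
    return predict(board)          # learned estimate elsewhere
```

Anchoring the end of each game to the true outcome gives every earlier TD update a reliable signal to propagate back from, consistent with the strong effect reported on the slide.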

SLIDE 27

Conclusion

Inverted boards and reversed-color evaluation
Initialization is important
Biased randomization techniques
Batch learning has promise
Informed final board evaluation is important