SLIDE 1

Comparing Direct and Indirect Encodings Using Both Raw and Hand-Designed Features in Tetris

By Lauren Gillespie, Gabby Gonzales and Jacob Schrum

gillespl@southwestern.edu

gonzale9@alumni.southwestern.edu

schrum2@southwestern.edu

Howard Hughes Medical Institute

SLIDE 2

GECCO 2017.

Introduction

  • Challenge: Use less domain-specific knowledge

✦ Important for general agents
✦ Achievable by using raw inputs
✦ A neural network must be able to process those raw inputs

  • Why is this challenging?

✦ Complex domains = large input space
✦ Large input space = large neural networks
✦ Large neural networks = difficult to train

SLIDE 3

Addressing Challenges

  • Deep learning applies large neural networks to hard tasks†
  • HyperNEAT is also capable of handling large neural networks

✦ Indirect encoding, good with geometric inputs‡
✦ Compare to the direct encoding, NEAT
✦ Test whether the indirect encoding is advantageous
✦ Also compare with hand-designed features

† Mnih et al. 2013. Playing Atari with Deep Reinforcement Learning.
‡ Hausknecht et al. 2012. HyperNEAT-GGP: A HyperNEAT-based Atari General Game Player.

SLIDE 4

Direct vs. Indirect Encoding

[Figure: Direct encoding (NEAT): the evolved network is also the agent network. Indirect encoding (HyperNEAT): the evolved network and the agent network are separate. Node labels: UTILITY; X, Y, X, Y, Bias]

SLIDE 5

Tetris Domain

  • Played on a 10 x 20 game board
  • Orient and place tetrominoes to clear lines
  • Clearing multiple lines at once = more points
  • NP-complete domain†
  • One-piece controller

✦ Agent knows only the current piece

† Breukelaar et al. 2004. Tetris is hard, even to approximate.
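To make the line-clearing rule concrete, here is a minimal Python sketch. It assumes a hypothetical board representation (a list of 20 rows, top row first, each holding 10 cells with 1 = block and 0 = empty). The actual point values are not given on this slide, so the sketch only counts how many lines a placement clears.

```python
WIDTH, HEIGHT = 10, 20  # Tetris board dimensions from the slide

def empty_board():
    """Hypothetical representation: HEIGHT rows (top first), WIDTH cells each."""
    return [[0] * WIDTH for _ in range(HEIGHT)]

def clear_lines(board):
    """Remove every full row and return (new_board, number_of_lines_cleared).

    Rows above a cleared line shift down, and empty rows are added at the
    top so the board stays WIDTH x HEIGHT.
    """
    remaining = [row for row in board if not all(row)]
    cleared = len(board) - len(remaining)
    new_rows = [[0] * WIDTH for _ in range(cleared)]
    return new_rows + remaining, cleared

# Usage: fill the bottom two rows completely, leave one gap in the third row.
board = empty_board()
board[19] = [1] * WIDTH
board[18] = [1] * WIDTH
board[17] = [1] * (WIDTH - 1) + [0]
board, lines = clear_lines(board)
print(lines)      # 2
print(board[19])  # the partial row has dropped to the bottom
```

A scoring function that rewards multi-line clears would take `lines` as its input; its exact values are left out here because the slide does not state them.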

SLIDE 6

Previous Work

  • Tetris domain: all cited work uses hand-designed features

✦ Reinforcement learning:

❖ Temporal difference learning: Bertsekas et al. 1996, Genesereth & Björnsson 2013
❖ Policy search: Szita & Lőrincz 2006
❖ Approximate dynamic programming: Gabillon et al. 2013

✦ Evolutionary computation:

❖ Simple EA with linear function approximator: Böhm et al. 2004
❖ Covariance Matrix Adaptation Evolution Strategy: Boumaza 2009

  • Raw visual inputs

✦ Neuroevolution: Gauci & Stanley 2008, Verbancsics & Stanley 2010
✦ General video game playing in Atari: Hausknecht et al. 2012, Mnih et al. 2013

[Figure: Asterix game from the Atari 2600 suite]

SLIDE 7

Hand-Designed Features

  • Most common input scheme for training ANNs†
  • Hand-picked information about the game state used as input

Pros:

✦ Network doesn't have to deal with excess information
✦ Smaller input space, easier to learn

Cons:

✦ Very domain-specific, not versatile
✦ Human expertise needed
✦ Useful features not always apparent

† Schrum & Miikkulainen. 2016. Discovering Multimodal Behavior in Ms. Pac-Man through Evolution of Modular Neural Networks.

SLIDE 8

Raw Features

  • One feature per game state element
  • Minimal input processing by the user

Pros:

✦ Networks less limited by domain†
✦ Less human expertise needed

Cons:

✦ Large input space & networks
✦ Harder to learn, require more time

† Gauci & Stanley. 2008. A Case Study on the Critical Role of Geometric Regularity in Machine Learning.

SLIDE 9

NEAT

  • NeuroEvolution of Augmenting Topologies†
  • Synaptic and structural mutations
  • Direct encoding

✦ Network size proportional to genome size

  • Crossover alignment via historical markings
  • Inefficient with large input sets

✦ Mutations do not alter behavior effectively

[Figure: NEAT mutation operators: Perturb Weight, Add Connection, Add Node]

† Stanley & Miikkulainen. 2002. Evolving Neural Networks Through Augmenting Topologies
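The three mutation operators in the figure can be illustrated on a simplified direct-encoding genome in which every link is one gene. This is a minimal Python sketch, not NEAT's full implementation: the class and function names are hypothetical, innovation numbers come from a simple global counter (real NEAT also gives identical structural innovations within a generation the same number), and `add_connection` does not check for duplicate or recurrent links.

```python
import random
from dataclasses import dataclass, field

@dataclass
class ConnectionGene:
    in_node: int
    out_node: int
    weight: float
    enabled: bool
    innovation: int  # historical marking used to align genes during crossover

@dataclass
class Genome:
    num_nodes: int
    connections: list = field(default_factory=list)

class InnovationCounter:
    """Hands out increasing innovation numbers (simplified global counter)."""
    def __init__(self):
        self.next_id = 0

    def new(self):
        self.next_id += 1
        return self.next_id - 1

def perturb_weight(genome, rng, sigma=0.5):
    """Synaptic mutation: add Gaussian noise to one link weight."""
    gene = rng.choice(genome.connections)
    gene.weight += rng.gauss(0.0, sigma)

def add_connection(genome, in_node, out_node, rng, innov):
    """Structural mutation: add one new link gene with a random weight."""
    genome.connections.append(
        ConnectionGene(in_node, out_node, rng.uniform(-1.0, 1.0), True, innov.new()))

def add_node(genome, rng, innov):
    """Structural mutation: split an enabled link A->B into A->new->B.

    The old link is disabled; A->new gets weight 1.0 and new->B keeps the
    old weight, so behavior initially changes little.
    """
    old = rng.choice([g for g in genome.connections if g.enabled])
    old.enabled = False
    new_node = genome.num_nodes
    genome.num_nodes += 1
    genome.connections.append(ConnectionGene(old.in_node, new_node, 1.0, True, innov.new()))
    genome.connections.append(ConnectionGene(new_node, old.out_node, old.weight, True, innov.new()))
```

Because each link is its own gene, genome size grows with network size: on a 10 x 20 board, every link from a board cell needs a separate gene, and a single mutation touches only one of them.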

SLIDE 10

HyperNEAT

  • Hypercube-based NEAT†
  • Extension of NEAT
  • Indirect encoding

✦ Evolved CPPNs encode larger substrate-based agent ANNs

  • Compositional Pattern-Producing Networks (CPPNs)

✦ CPPN queried across substrate to create agent ANN
✦ Inputs = neuron coordinates, outputs = link weights

  • Substrates

✦ Layers of neurons with geometric coordinates
✦ Substrate layout determined by domain/experimenter

† Stanley et al. 2009. A Hypercube-Based Encoding for Evolving Large-Scale Neural Networks.

[Figure: CPPN; node labels UTILITY; X, Y, X, Y, Bias]
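The querying step can be sketched as follows. This minimal Python sketch assumes substrate coordinates normalized to [-1, 1] and uses a hand-written stand-in for the CPPN (`toy_cppn` is hypothetical; in HyperNEAT the CPPN is itself evolved by NEAT). Leaving out links whose CPPN output magnitude falls below a threshold follows the common HyperNEAT convention; whether and how this work thresholded or scaled weights is not stated on the slide, and weight scaling is omitted here.

```python
import math

def substrate_coords(width, height):
    """Give each neuron (col, row) of a width x height layer an (x, y) in [-1, 1]."""
    coords = {}
    for row in range(height):
        for col in range(width):
            x = -1.0 + 2.0 * col / (width - 1) if width > 1 else 0.0
            y = -1.0 + 2.0 * row / (height - 1) if height > 1 else 0.0
            coords[(col, row)] = (x, y)
    return coords

def toy_cppn(x1, y1, x2, y2, bias=1.0):
    """Hypothetical stand-in for an evolved CPPN: coordinates in, weight out."""
    return math.sin(3.0 * (x1 - x2)) * math.exp(-(y1 - y2) ** 2) + 0.1 * bias

def build_weights(cppn, source, target, threshold=0.2):
    """Query the CPPN once per (source, target) neuron pair.

    Links whose CPPN output magnitude does not exceed the threshold are left out.
    """
    weights = {}
    for s_id, (x1, y1) in source.items():
        for t_id, (x2, y2) in target.items():
            w = cppn(x1, y1, x2, y2)
            if abs(w) > threshold:
                weights[(s_id, t_id)] = w
    return weights

board_layer = substrate_coords(10, 20)  # one neuron per board cell
output_layer = substrate_coords(1, 1)   # a single utility neuron at (0, 0)
links = build_weights(toy_cppn, board_layer, output_layer)
print(len(board_layer), len(links))
```

The CPPN has the same size no matter how many neurons the substrate holds, which is why an indirect encoding can describe a large agent network with a small genome.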

SLIDE 11

HyperNEAT with Tetris

  • Geometric awareness: arises from indirect encoding
  • CPPN encodes geometry of domain into agent via substrates
  • Agent network can learn from task-relevant domain geometry

[Figure, detailed view: Game State, Input substrates, Substrate layers, CPPN (node labels UTILITY; X, Y, X, Y, Bias), Agent network]

SLIDE 12

Raw Features Setup

  • Board configuration:

✦ Two input sets

  • 1. Location of all blocks

❖ block = 1, no block = 0

  • 2. Location of all holes

❖ hole = -1, no hole = 0

  • NEAT: Both input sets in one linear sequence (2 x 10 x 20 = 400 inputs)
  • HyperNEAT: Two 2D input substrates, one per input set
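The two raw input sets can be sketched as below. This minimal Python sketch assumes a hypothetical board representation (a list of 20 rows, top row first, each holding 10 cells with 1 = block and 0 = empty). It also assumes that a hole is an empty cell with at least one block somewhere above it in the same column, since the slide does not define "hole" precisely. The function names are illustrative.

```python
def raw_inputs(board):
    """Return (blocks, holes) grids matching the board's shape.

    blocks: block = 1, no block = 0
    holes:  hole = -1, no hole = 0 (an empty cell under a block in its column)
    """
    height, width = len(board), len(board[0])
    blocks = [[1 if cell else 0 for cell in row] for row in board]
    holes = [[0] * width for _ in range(height)]
    for col in range(width):
        covered = False
        for row in range(height):  # scan each column from the top down
            if board[row][col]:
                covered = True
            elif covered:
                holes[row][col] = -1
    return blocks, holes

def neat_inputs(blocks, holes):
    """NEAT: flatten both grids, row by row, into one linear input sequence."""
    return [v for row in blocks for v in row] + [v for row in holes for v in row]

# HyperNEAT instead keeps `blocks` and `holes` as two 2D input substrates.
board = [[0] * 10 for _ in range(20)]
board[17][0] = 1  # a block resting on top of ...
board[19][0] = 1  # ... a gap at row 18, which becomes a hole
blocks, holes = raw_inputs(board)
flat = neat_inputs(blocks, holes)
print(len(flat))     # 400
print(holes[18][0])  # -1
```

Flattening discards the 2D neighborhood structure that the HyperNEAT substrates preserve, which is the geometric information the indirect encoding can exploit.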

SLIDE 13

Hand-Designed Features Setup

  • Bertsekas et al. features† plus an additional holes-per-column feature
  • All scaled to [0,1]

✦ Column height
✦ Height difference
✦ Tallest column
✦ Number of holes
✦ Holes per column

† Bertsekas et al. 1996. Neuro-Dynamic Programming.

[Figure: node labels X, Y, X, Y, Bias, UTILITY; input groups HEIGHTS, DIFFS, HOLES, MAX HEIGHT, TOTAL HOLES]
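A minimal Python sketch of these features follows, assuming a hypothetical board representation (a list of 20 rows, top row first, 1 = block, 0 = empty) and that a hole is an empty cell with a block above it in the same column. The features follow the order of the slide's list, and height differences are absolute differences between adjacent columns, as in the Bertsekas et al. feature set. The scaling divisors (board height for heights, differences and per-column holes; board area for total holes) are assumptions chosen only so that every value lands in [0, 1]; the slide does not give the actual normalization.

```python
def column_heights(board):
    """Height of each column: rows from the bottom up to its highest block."""
    height, width = len(board), len(board[0])
    heights = []
    for col in range(width):
        top = next((row for row in range(height) if board[row][col]), None)
        heights.append(0 if top is None else height - top)
    return heights

def holes_per_column(board):
    """Count empty cells that have a block above them in the same column."""
    height, width = len(board), len(board[0])
    counts = []
    for col in range(width):
        covered, n = False, 0
        for row in range(height):
            if board[row][col]:
                covered = True
            elif covered:
                n += 1
        counts.append(n)
    return counts

def hand_designed_features(board):
    """Column heights, adjacent height differences, tallest column,
    total holes, and holes per column, each scaled to [0, 1] (assumed divisors)."""
    height, width = len(board), len(board[0])
    heights = column_heights(board)
    diffs = [abs(heights[i] - heights[i + 1]) for i in range(width - 1)]
    holes = holes_per_column(board)
    return ([h / height for h in heights]
            + [d / height for d in diffs]
            + [max(heights) / height]
            + [sum(holes) / (width * height)]
            + [n / height for n in holes])

board = [[0] * 10 for _ in range(20)]
board[17][0] = board[19][0] = 1  # column 0: height 3, one hole at row 18
features = hand_designed_features(board)
print(len(features))  # 31 (10 heights, 9 diffs, 1 max, 1 total, 10 per column)
```

Compared with the 400 raw inputs, this gives the agent network only 31 inputs, which illustrates the "smaller input space" advantage of hand-designed features.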

SLIDE 14

Experimental Setup

  • Agent networks are afterstate evaluators
  • Each experiment consists of 30 runs

✦ 500 generations/run, 50 agents/generation
✦ Objectives averaged across 3 trials/agent

❖ Noisy domain, multiple trials needed

  • NSGA-II objectives: game score & survival time
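Because the domain is noisy, each agent's two objectives are averaged over 3 trials. Here is a minimal Python sketch of that step. `play_game` is a hypothetical callback that plays one game and returns (game score, survival time), and passing one shared random generator to every trial is an assumption, not something the slide specifies.

```python
import random
from statistics import mean

TRIALS_PER_AGENT = 3  # from the slide

def evaluate_agent(agent, play_game, trials=TRIALS_PER_AGENT, seed=None):
    """Play several noisy games and average each objective separately.

    Returns (mean game score, mean survival time), the two NSGA-II objectives.
    """
    rng = random.Random(seed)  # shared generator, so each trial can draw different pieces
    results = [play_game(agent, rng) for _ in range(trials)]
    return (mean(score for score, _ in results),
            mean(survival for _, survival in results))

# Usage with a stand-in game that returns fixed outcomes for three trials.
outcomes = iter([(100, 50), (200, 70), (300, 90)])
print(evaluate_agent(None, lambda agent, rng: next(outcomes)))
```

Averaging each objective separately keeps them as two distinct values for NSGA-II instead of merging them into one fitness score.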

SLIDE 15

NEAT vs. HyperNEAT: Raw Features

[Figure: Game Score by Generation for HyperNEAT Raw and NEAT Raw]

SLIDE 16

NEAT vs. HyperNEAT: Hand-Designed Features

[Figure: Game Score by Generation for HyperNEAT Features and NEAT Features]

SLIDE 17

Raw Features Champion Behavior

[Figure panels: NEAT with Raw Features; HyperNEAT with Raw Features]

SLIDE 18

Hand-Designed Features Behavior

[Figure panels: NEAT with Hand-Designed Features; HyperNEAT with Hand-Designed Features]

SLIDE 19

Visualizing Substrates

[Figure panels: Inputs, Hidden, Output, Result]

SLIDE 20

Discussion

  • Raw features: HyperNEAT clearly better than NEAT

✦ Indirect encoding advantageous
✦ NEAT ineffective at evolving large networks

  • Hand-Designed: HyperNEAT has less of an advantage

✦ Geometric awareness less important
✦ HyperNEAT CPPN limited by substrate topology

SLIDE 21

Future Work

  • HybrID†

✦ Start with HyperNEAT, switch to NEAT
✦ Gain the advantages of both encodings

  • Raw-feature Tetris with deep learning
  • Raw features in other visual domains

✦ Video games: DOOM, Mario, Ms. Pac-Man
✦ Board games: Othello, Checkers

† Clune et al. 2009. HybrID: A Hybridization of Indirect and Direct Encodings for Evolutionary Computation.

SLIDE 22

Conclusion

  • Raw features

✦ Indirect encoding HyperNEAT effective
✦ Geometric awareness an advantage

  • Hand-designed features

✦ Ultimately NEAT produced better agents

  • HybrID might combine strengths of both

SLIDE 23

Questions?

  • Contact info:

gillespl@southwestern.edu schrum2@southwestern.edu gonzale9@alumni.southwestern.edu

  • Movies and Code:

https://tinyurl.com/tetris-gecco2017

SLIDE 24

Auxiliary Slides

SLIDE 25

NSGA-II

  • Pareto-based multiobjective evolutionary algorithm
  • Parent population of size μ evaluated in domain
  • Child population of size λ evolved from the parents and evaluated
  • Combined μ + λ population sorted into non-dominated Pareto fronts
  • Pareto front: all individuals not dominated by any other individual in the set
  • Vector $v = (v_1, \dots, v_n)$ dominates vector $u = (u_1, \dots, u_n)$ iff

1. $\forall i \in \{1, \dots, n\}: v_i \ge u_i$, and
2. $\exists i \in \{1, \dots, n\}: v_i > u_i$.

  • New parent population of size μ picked from the best fronts
  • Tetris objectives: game score, time alive

[Figure: Pareto front plotted by Time alive vs. Game score]
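The dominance test and the sorting into fronts can be sketched in Python as below, with both objectives maximized. This is a deliberately simple version for illustration. NSGA-II's fast non-dominated sort and its crowding-distance tie-breaking within the last accepted front are omitted, so `select_parents` just truncates that front. The function names are hypothetical.

```python
def dominates(v, u):
    """True iff v >= u in every objective and v > u in at least one."""
    return (all(a >= b for a, b in zip(v, u))
            and any(a > b for a, b in zip(v, u)))

def non_dominated_fronts(points):
    """Split indices of `points` into Pareto fronts, best front first."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

def select_parents(points, mu):
    """Fill the next parent population from the best fronts (no crowding distance)."""
    chosen = []
    for front in non_dominated_fronts(points):
        chosen.extend(front[: mu - len(chosen)])
        if len(chosen) == mu:
            break
    return chosen

# (game score, time alive) for a combined mu + lambda population
population = [(10, 5), (8, 8), (5, 10), (4, 4), (9, 4)]
print(non_dominated_fronts(population))  # [[0, 1, 2], [4], [3]]
print(select_parents(population, 4))     # [0, 1, 2, 4]
```

Individual 4, with objectives (9, 4), lands in the second front because (10, 5) is at least as good in both objectives and strictly better in one.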

SLIDE 26

Visualizing Link Weights

SLIDE 27

Afterstate Evaluation

  • Evolved agents used as afterstate evaluators
  • Next move determined by evaluating the state after placing the piece
  • All possible placements of the current piece determined and evaluated
  • Placement whose afterstate has the best evaluation is chosen
  • Placements that lead to a loss are not considered
  • Agent moves the piece to the best placement, then repeats with the next piece
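A single afterstate decision step can be sketched in Python as below. All callbacks (`legal_placements`, `apply_placement`, `is_loss`, `evaluate`) are hypothetical stand-ins. Here `evaluate` takes the place of the evolved agent network applied to an afterstate's raw or hand-designed features.

```python
def choose_placement(state, piece, legal_placements, apply_placement, is_loss, evaluate):
    """Return the placement whose afterstate the agent rates highest.

    Placements whose afterstate is a loss are skipped; returns None if
    every placement loses.
    """
    best, best_value = None, float("-inf")
    for placement in legal_placements(state, piece):
        afterstate = apply_placement(state, piece, placement)
        if is_loss(afterstate):
            continue
        value = evaluate(afterstate)
        if value > best_value:
            best, best_value = placement, value
    return best

# Toy usage: states are integers, placement p adds p, and reaching 13 loses.
choice = choose_placement(
    10, None,
    legal_placements=lambda s, piece: range(4),
    apply_placement=lambda s, piece, p: s + p,
    is_loss=lambda after: after >= 13,
    evaluate=lambda after: after,
)
print(choice)  # 2
```

In the toy run, placement 3 would score highest but leads to a loss, so it is skipped and placement 2 is chosen. The game loop then applies that placement, draws the next piece, and calls the function again.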