

  1. Comparing Direct and Indirect Encodings Using Both Raw and Hand-Designed Features in Tetris
     By Lauren Gillespie, Gabby Gonzales, and Jacob Schrum
     gillespl@southwestern.edu, gonzale9@alumni.southwestern.edu, schrum2@southwestern.edu
     Howard Hughes Medical Institute

  2. Introduction
     • Challenge: use less domain-specific knowledge
       ✦ Important for general agents
       ✦ Accomplished using raw inputs
       ✦ Raw inputs must still be processable by a neural network
     • Why is this challenging?
       ✦ Complex domains = large input space
       ✦ Large input space = large neural networks
       ✦ Large neural networks = difficult to train

  3. Addressing Challenges
     • Deep Learning applies large NNs to hard tasks †
     • HyperNEAT is also capable of handling large NNs
       ✦ Indirect encoding, good with geometric inputs ‡
       ✦ Compare to a direct encoding, NEAT
       ✦ See whether the indirect encoding is advantageous
       ✦ Also compare with hand-designed features
     † Mnih et al. 2013. Playing Atari with Deep Reinforcement Learning.
     ‡ Hausknecht et al. 2012. HyperNEAT-GGP: A HyperNEAT-based Atari General Game Player.

  4. Direct vs. Indirect Encoding
     [Figure: direct encoding (NEAT), where the evolved network is the agent network itself, vs. indirect encoding (HyperNEAT), where an evolved network with coordinate and bias inputs generates the agent network that outputs UTILITY]

  5. Tetris Domain
     • Consists of a 10 x 20 game board
     • Orient tetrominoes to clear lines
     • Clearing multiple lines at once = more points
     • NP-complete domain †
     • One-piece controller
       ✦ Agent has knowledge of the current piece only
     † Breukelaar et al. 2004. Tetris is Hard, Even to Approximate.

  6. Previous Work
     • Tetris domain: all previous work uses hand-designed features
       ✦ Reinforcement Learning:
         ❖ Temporal difference learning: Bertsekas et al. 1996, Genesereth & Björnsson 2013
         ❖ Policy search: Szita & Lőrincz 2006
         ❖ Approximate dynamic programming: Gabillon et al. 2013
       ✦ Evolutionary Computation:
         ❖ Simple EA with a linear function approximator: Böhm et al. 2004
         ❖ Covariance Matrix Adaptation Evolution Strategy: Boumaza 2009
     • Raw visual inputs:
       ✦ Asterix game from the Atari 2600 suite via neuroevolution: Gauci & Stanley 2008, Verbancsics & Stanley 2010
       ✦ General video game playing in Atari: Hausknecht et al. 2012, Mnih et al. 2013

  7. Hand-Designed Features
     • Most common input scheme for training ANNs †
     • Hand-picked information about the game state as input
     • Pros:
       ✦ Network doesn't deal with excess info
       ✦ Smaller input space, easier to learn
     • Cons:
       ✦ Very domain-specific, not versatile
       ✦ Human expertise needed
       ✦ Useful features not always apparent
     † Schrum & Miikkulainen. 2016. Discovering Multimodal Behavior in Ms. Pac-Man through Evolution of Modular Neural Networks.

  8. Raw Features
     • One feature per game state element
     • Minimal input processing by the user
     • Pros:
       ✦ Networks less limited by domain †
       ✦ Less human expertise needed
     • Cons:
       ✦ Large input space & networks
       ✦ Harder to learn, takes more time
     † Gauci & Stanley. 2008. A Case Study on the Critical Role of Geometric Regularity in Machine Learning.

  9. NEAT
     • NeuroEvolution of Augmenting Topologies †
     • Synaptic and structural mutations: perturb weight, add connection, add node
     • Direct encoding
       ✦ Network size proportional to genome size
     • Crossover alignment via historical markings
     • Inefficient with large input sets
       ✦ Mutations do not alter behavior effectively
     † Stanley & Miikkulainen. 2002. Evolving Neural Networks Through Augmenting Topologies.
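A minimal sketch of the three NEAT mutation operators named on the slide, using illustrative names (`Genome`, `ConnectionGene`, `next_innovation`) rather than the authors' code; innovation numbers stand in for the historical markings used to align crossover.

```python
import random

class ConnectionGene:
    def __init__(self, src, dst, weight, innovation):
        self.src, self.dst = src, dst
        self.weight = weight
        self.innovation = innovation   # historical marking used to align crossover
        self.enabled = True

class Genome:
    def __init__(self, nodes, connections):
        self.nodes = list(nodes)              # node ids
        self.connections = list(connections)  # ConnectionGene objects

def perturb_weight(genome, power=0.5):
    """Synaptic mutation: add Gaussian noise to one connection weight."""
    gene = random.choice(genome.connections)
    gene.weight += random.gauss(0.0, power)

def add_connection(genome, next_innovation):
    """Structural mutation: link two randomly chosen neurons."""
    src, dst = random.sample(genome.nodes, 2)
    genome.connections.append(
        ConnectionGene(src, dst, random.uniform(-1.0, 1.0), next_innovation()))

def add_node(genome, next_innovation):
    """Structural mutation: split an existing connection with a new hidden neuron."""
    gene = random.choice(genome.connections)
    gene.enabled = False
    new_node = max(genome.nodes) + 1
    genome.nodes.append(new_node)
    # By NEAT convention, the incoming link gets weight 1.0 and the outgoing
    # link keeps the old weight, so behavior is initially preserved.
    genome.connections.append(ConnectionGene(gene.src, new_node, 1.0, next_innovation()))
    genome.connections.append(ConnectionGene(new_node, gene.dst, gene.weight, next_innovation()))
```
Because the genome lists every node and connection explicitly, the genome grows with the network, which is why the slide notes NEAT scales poorly to large input sets.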

  10. HyperNEAT
      • Hypercube-based NEAT †
      • Extension of NEAT
      • Indirect encoding
        ✦ Evolved CPPNs encode larger substrate-based agent ANNs
      • Compositional Pattern-Producing Networks (CPPNs)
        ✦ CPPN queried across the substrate to create the agent ANN
        ✦ Inputs = neuron coordinates, outputs = link weights
      • Substrates
        ✦ Layers of neurons with geometric coordinates
        ✦ Substrate layout determined by domain/experimenter
      [Figure: CPPN with inputs X, Y, X, Y, Bias generating the agent network, whose output is UTILITY]
      † Stanley et al. 2009. A Hypercube-based Encoding for Evolving Large-scale Neural Networks.
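A minimal sketch of how the CPPN is queried across substrate coordinates to produce link weights, as described above. Here `cppn` is an assumed callable `cppn(x1, y1, x2, y2, bias) -> float` (in HyperNEAT it is itself an evolved network), and the magnitude threshold for expressing a link is a common convention, not necessarily the authors' setting.

```python
import numpy as np

def build_substrate_weights(cppn, source_coords, target_coords, threshold=0.2):
    """Query the CPPN for every (source, target) neuron pair in the substrate
    and return the resulting weight matrix (targets x sources)."""
    weights = np.zeros((len(target_coords), len(source_coords)))
    for j, (x2, y2) in enumerate(target_coords):
        for i, (x1, y1) in enumerate(source_coords):
            w = cppn(x1, y1, x2, y2, 1.0)   # inputs: both neurons' coordinates + bias
            if abs(w) > threshold:          # weak links are typically left unexpressed
                weights[j, i] = w
    return weights

# Example layout: one input neuron per cell of the 10 x 20 Tetris board,
# feeding a smaller hidden layer; coordinates normalized to [0, 1].
board_layer  = [(x / 9.0, y / 19.0) for y in range(20) for x in range(10)]
hidden_layer = [(x / 4.0, y / 4.0) for y in range(5) for x in range(5)]
# W = build_substrate_weights(my_cppn, board_layer, hidden_layer)
```
Because the CPPN is a function of coordinates, nearby neurons get related weights, which is how the indirect encoding exploits the board's geometry.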

  11. HyperNEAT with Tetris
      • Geometric awareness arises from the indirect encoding
      • CPPN encodes the geometry of the domain into the agent via substrates
      • Agent network can learn from task-relevant domain geometry
      [Figure: game state fed into input substrates; the CPPN (inputs X, Y, X, Y, Bias) generates the weights of the agent network layers, whose output is UTILITY]

  12. Raw Features Setup
      • Board configuration: two input sets
        1. Location of all blocks
           ❖ block = 1, no block = 0
        2. Location of all holes
           ❖ hole = -1, no hole = 0
      • NEAT: inputs in a linear sequence
      • HyperNEAT: two 2D input substrates
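A minimal sketch of this raw input scheme, assuming the board arrives as a 20 x 10 array with 1 for filled cells and 0 for empty ones; the array layout and function name are illustrative.

```python
import numpy as np

def raw_inputs(board):
    """Return (blocks, holes): blocks marks filled cells with 1;
    holes marks empty cells lying below a filled cell in the same column with -1."""
    board = np.asarray(board, dtype=float)   # shape (20, 10); row 0 is the top
    blocks = (board != 0).astype(float)
    holes = np.zeros_like(blocks)
    for col in range(board.shape[1]):
        seen_block = False
        for row in range(board.shape[0]):    # scan each column top to bottom
            if blocks[row, col]:
                seen_block = True
            elif seen_block:
                holes[row, col] = -1.0       # empty cell under a block = hole
    return blocks, holes

# NEAT reads the two grids as one flat 400-element vector;
# HyperNEAT keeps them as two separate 20 x 10 input substrates.
# blocks, holes = raw_inputs(board)
# neat_inputs = np.concatenate([blocks.ravel(), holes.ravel()])
```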

  13. Hand-Designed Features Setup
      • Bertsekas et al. features † plus an additional holes-per-column feature
      • All scaled to [0, 1]:
        ✦ Column heights
        ✦ Height differences between adjacent columns
        ✦ Height of the tallest column
        ✦ Total number of holes
        ✦ Holes per column
      [Figure: feature groups HEIGHTS, DIFFS, MAX HEIGHT, TOTAL HOLES, HOLES feeding a network with output UTILITY]
      † Bertsekas et al. 1996. Neuro-Dynamic Programming.
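A minimal sketch of computing the feature groups listed above; the scaling constants (dividing by the board height or cell count to land in [0, 1]) are assumptions, not necessarily the authors' exact normalization.

```python
import numpy as np

def hand_designed_features(board):
    """Bertsekas-style features plus holes per column, scaled into [0, 1]."""
    board = np.asarray(board)                      # shape (rows, cols); 1 = filled
    rows, cols = board.shape
    # Column height = number of rows from the highest filled cell down to the floor.
    heights = np.array([rows - np.argmax(board[:, c]) if board[:, c].any() else 0
                        for c in range(cols)], dtype=float)
    diffs = np.abs(np.diff(heights))               # height differences, adjacent columns
    holes_per_col = np.array(
        [np.sum((board[:, c] == 0) & (np.cumsum(board[:, c]) > 0)) for c in range(cols)],
        dtype=float)
    return np.concatenate([
        heights / rows,                            # per-column heights
        diffs / rows,                              # adjacent height differences
        [heights.max() / rows],                    # tallest column
        [holes_per_col.sum() / (rows * cols)],     # total holes
        holes_per_col / rows,                      # holes per column
    ])
```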

  14. Experimental Setup
      • Agent networks are afterstate evaluators
      • Each experiment evaluated with 30 runs
        ✦ 500 generations/run, 50 agents/generation
        ✦ Objectives averaged across 3 trials/agent
          ❖ Noisy domain, so multiple trials are needed
      • NSGA-II objectives: game score & survival time
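Because the piece sequence is random, each agent's objectives are averaged over several games before NSGA-II sees them. A minimal sketch, where `play_tetris(agent)` is an assumed helper returning one game's (score, time alive):

```python
def evaluate_agent(agent, play_tetris, trials=3):
    """Average the two NSGA-II objectives over several noisy games."""
    scores, times = [], []
    for _ in range(trials):
        score, time_alive = play_tetris(agent)
        scores.append(score)
        times.append(time_alive)
    return sum(scores) / trials, sum(times) / trials
```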

  15. NEAT vs. HyperNEAT: Raw Features
      [Plot: game score vs. generation (0-500) for HyperNEAT Raw and NEAT Raw; y-axis 0-400]

  16. NEAT vs. HyperNEAT: Hand-Designed Features
      [Plot: game score vs. generation (0-500) for HyperNEAT Features and NEAT Features; y-axis 0-35000]

  17. Raw Features Champion Behavior
      [Videos: NEAT with raw features; HyperNEAT with raw features]

  18. Hand-Designed Features Behavior
      [Videos: NEAT with hand-designed features; HyperNEAT with hand-designed features]

  19. Visualizing Substrates
      [Figure: input, hidden, and output substrates, and the resulting activations]

  20. Discussion
      • Raw features: HyperNEAT clearly better than NEAT
        ✦ Indirect encoding advantageous
        ✦ NEAT ineffective at evolving large networks
      • Hand-designed features: HyperNEAT has less of an advantage
        ✦ Geometric awareness less important
        ✦ HyperNEAT CPPN limited by substrate topology

  21. Future Work
      • HybrID †
        ✦ Start with HyperNEAT, switch to NEAT
        ✦ Gain the advantages of both encodings
      • Raw-feature Tetris with Deep Learning
      • Raw features in other visual domains
        ✦ Video games: DOOM, Mario, Ms. Pac-Man
        ✦ Board games: Othello, Checkers
      † Clune et al. 2009. HybrID: A Hybridization of Indirect and Direct Encodings for Evolutionary Computation.

  22. Conclusion
      • Raw features
        ✦ Indirect encoding HyperNEAT effective
        ✦ Geometric awareness an advantage
      • Hand-designed features
        ✦ Ultimately NEAT produced better agents
      • HybrID might combine the strengths of both

  23. Questions?
      • Contact info: gillespl@southwestern.edu, schrum2@southwestern.edu, gonzale9@alumni.southwestern.edu
      • Movies and code: https://tinyurl.com/tetris-gecco2017

  24. Auxiliary Slides

  25. NSGA-II
      • Pareto-based multiobjective EA
      • Parent population μ evaluated in the domain
      • Child population λ evolved from μ and evaluated
      • μ + λ sorted into non-dominated Pareto fronts
      • Pareto front: all individuals not dominated by any other
        ✦ v = (v_1, ..., v_n) dominates u = (u_1, ..., u_n) iff
          1. ∀ i ∈ {1, ..., n}: v_i ≥ u_i, and
          2. ∃ i ∈ {1, ..., n}: v_i > u_i
      • New μ picked from the highest fronts
      • Tetris objectives: game score, time alive
      [Figure: Pareto front in objective space with axes Game score and Time alive]
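A minimal sketch of the dominance test above and of extracting a Pareto front from a set of objective vectors, assuming both objectives (e.g. game score and time alive) are maximized.

```python
def dominates(v, u):
    """v dominates u iff v >= u in every objective and > u in at least one."""
    return all(vi >= ui for vi, ui in zip(v, u)) and any(vi > ui for vi, ui in zip(v, u))

def pareto_front(points):
    """Individuals not dominated by any other point in the population."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

# Example: pareto_front([(1200, 340), (900, 500), (800, 200)])
# returns [(1200, 340), (900, 500)]; (800, 200) is dominated by (1200, 340).
```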

  26. Visualizing Link Weights

  27. Afterstate Evaluation
      • Evolved agents are used as afterstate evaluators
      • The next move is determined from the state after placing the piece
      • All possible piece placements are determined and evaluated
      • The placement whose resulting state has the best evaluation is chosen
      • Placements that lead to a loss are not considered
      • The agent moves the piece to the best placement, then repeats
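A minimal sketch of this afterstate-evaluation loop. The helpers `legal_placements`, `apply_placement`, and `features` are assumed to exist, and `evaluate` stands in for the evolved network's utility output; none of these names come from the authors' code.

```python
def choose_placement(board, piece, evaluate, legal_placements, apply_placement, features):
    """Score the afterstate of every legal placement of the current piece
    and return the placement with the highest evaluation."""
    best_placement, best_value = None, float("-inf")
    for placement in legal_placements(board, piece):
        afterstate = apply_placement(board, piece, placement)
        if afterstate is None:                 # placements that lose the game are skipped
            continue
        value = evaluate(features(afterstate))
        if value > best_value:
            best_placement, best_value = placement, value
    return best_placement                      # None only if every placement loses
```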
