Emergent solutions to high dimensional multi-task reinforcement learning


Humies Competition, GECCO 2018. Emergent solutions to high dimensional multi-task reinforcement learning. Stephen Kelly & Malcolm Heywood. Why does the result qualify as human competitive?


  1. “Humies” Competition, GECCO 2018. Emergent solutions to high dimensional multi-task reinforcement learning. Stephen Kelly & Malcolm Heywood

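  2. Why does the result qualify as human competitive? [Figure: agent-environment loop. The game (title: Atari or Doom) supplies the visual state s(t) to the game-playing agent, the agent returns an atomic action a(t), and the game score is reported at the end of evaluation.] July 2018 Humies 2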

  3. Visual RL dominated by Deep learning
     • DQN (2015)
       – Visual RL on Atari Learning Environment (49 titles)
       – Q-learning with Deep learning
       – Cropped visual image (84 × 84)
       – Frame stacking (removes the interleaving of sprites & stochastic properties)
       – “able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games” [Nature (2015) Vol. 518]
     • Gorila (2015), Double Q (2016), Dueling DL (2016), A3C (2016), Noisy DQN (2017), Distributional DQN (2017), Rainbow (2018)
     • One policy per game title
     • Learning parameters and DNN topology identified a priori

  4. Visual RL compared to ‘human’. [Figure: log-scale plot of % human score, computed as 100 × (algorithm − rnd)/(human − rnd), showing the best and worst title for TPG, DQN, Gorila, Double-DQN and H-NEAT. The y axis runs from 1 to 10000; 100 marks human level, with "algorithm better than human" above it and "algorithm worse than human" below it. Annotation: statistically equivalent.]
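The normalisation on this slide, 100 × (algorithm − rnd)/(human − rnd), can be sketched in a few lines of Python. This is only an illustration: the function name and the example scores are hypothetical and are not taken from the presentation.

```python
def human_normalized_score(agent: float, random: float, human: float) -> float:
    """Return 100 * (agent - random) / (human - random).

    0 means the agent scores like a random policy, 100 means human level,
    and values above 100 mean the agent outscores the human reference.
    """
    if human == random:
        raise ValueError("human and random scores must differ")
    return 100.0 * (agent - random) / (human - random)


if __name__ == "__main__":
    # Hypothetical raw game scores, chosen only to illustrate the arithmetic.
    print(human_normalized_score(agent=1500.0, random=200.0, human=1200.0))  # 130.0
```

Plotting the logarithm of this percentage, as the slide does, keeps titles with very large and very small ratios readable on one axis.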

  5. Visual RL and Multi-task learning
     • Multiple game titles played by single agent
     • Single title DQN provides the baseline
     • Best DNN result needs prior knowledge regarding parameters and topology
     • Constitutes an example of a task pertaining to ‘Artificial General Intelligence’

  6. Multi-title TPG versus Single-title DQN. [Figure: log-scale plot (1 to 1000) of multi-title TPG score as a percentage of the single title DQN score, with regions marked "better" and "worse". Titles, arranged in Groups 1, 2 and 3: Alien, Asteroids, Bank Heist, Battle Zone, Bowling, Centipede, Chopper Command, Fishing Derby, Frostbite, Kangaroo, Krull, Kung-Fu, Ms. Pac-Man, Private Eye, Time Pilot.]

  7. Why [is our entry] ‘best’ in comparison to other entries?
     • Single title task
       – TPG provides solutions competitive with human and DQN
       – Agents have to be competitive over multiple game titles
     • Multi-title task
       – TPG multi-task solution is competitive with DQN trained under single title setting
       – DNN state-of-the-art in single task does not address Multi-title task
     • TPG for Single title task is a special case of TPG for Multi-title task

  8. The ‘icing on the cake’
     • TPG addresses multiple issues simultaneously:
       – Complexity of topology is emergent and:
         • Highly modular
         • Unique to the task
         • Explicitly reflects a decomposition of the task
       – No image-specific instructions, just:
         • Four 2-argument operators {+, −, ×, ÷}
         • Three 1-argument operators {log, exp, cosine}
         • One conditional operator
       – TPG highly efficient computationally
       – Some examples…
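To make the operator set on this slide concrete, here is a minimal Python sketch of a register-based program built only from those eight operators and applied to raw pixel values. The slide does not specify the details, so everything beyond the operator list is an assumption made for illustration: the instruction layout, the register count, the protected forms of ÷, log and exp, and the "negate if smaller" reading of the conditional operator.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical instruction layout: (op, destination register, source kind, source index).
# The source kind is "pixel" (read the screen) or "reg" (read another register).
Instruction = Tuple[str, int, str, int]


def _safe(value: float) -> float:
    """Assumed guard: replace NaN/inf by 0 so every program yields a number."""
    return value if math.isfinite(value) else 0.0


def run_program(program: Sequence[Instruction],
                pixels: Sequence[float],
                n_registers: int = 8) -> float:
    """Execute the instructions in order and return register 0 as the output."""
    reg: List[float] = [0.0] * n_registers
    for op, dest, kind, idx in program:
        src = pixels[idx % len(pixels)] if kind == "pixel" else reg[idx % n_registers]
        d = dest % n_registers
        x = reg[d]
        if op == "+":
            y = x + src
        elif op == "-":
            y = x - src
        elif op == "*":
            y = x * src
        elif op == "/":                      # protected division (assumption)
            y = x / src if src != 0 else 0.0
        elif op == "log":                    # protected log (assumption)
            y = math.log(abs(src)) if src != 0 else 0.0
        elif op == "exp":                    # clamped to avoid overflow (assumption)
            y = math.exp(min(src, 50.0))
        elif op == "cos":
            y = math.cos(src)
        elif op == "if<":                    # conditional: negate if smaller (assumption)
            y = -x if x < src else x
        else:
            raise ValueError(f"unknown operator: {op}")
        reg[d] = _safe(y)
    return reg[0]


if __name__ == "__main__":
    screen = [0.0, 2.0, 5.0, 4.0]            # made-up pixel values
    program = [
        ("+", 0, "pixel", 3),                # R0 = 0 + 4
        ("*", 0, "pixel", 1),                # R0 = 4 * 2
        ("cos", 1, "reg", 0),                # R1 = cos(8)
        ("if<", 0, "reg", 1),                # R0 unchanged, since 8 is not below cos(8)
    ]
    print(run_program(program, screen))      # 8.0
```

Note that nothing in the sketch is specific to images: a program simply indexes pixel values, which is the point the slide makes about the absence of image-specific instructions.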

  9. [Figure: number of teams (nodes) versus generation (0 to 800), log-scale y axis (1 to 200). Legend: entire champion policy graph; teams visited per decision during test [ditto pixels used]. Annotations include "Overall solution graph emerge…" and "Per decision complexity". Labelled curves: Bowling, Ms. Pac-Man, Boxing, Alien, Asteroids, Rand.]

  10. Emergent discovery of Multi-title solutions. [Figure: solution graph with regions labelled Ms. Pac-Man, Frostbite and Centipede.]

  11. Run time complexity
      • DQN
        – ≈1.6 million weights in MLP
        – ≈3.2 million convolution operations in DNN
        – 3.2 GHz Intel i7-4700s: 5 decisions per second
        – GPU acceleration: 330 decisions per second
      • TPG
        – Single title: 71–2346 instructions (avg)
        – Multi-title: 413–869 instructions (avg)
        – 2.2 GHz Intel E5-2650
          • Single title: 758–2853 decisions per sec.
          • Multi-title: 1832–2922 decisions per sec.
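As a rough reading of the throughput figures on this slide, the Python sketch below computes how many times more decisions per second TPG makes than DQN. The numbers are the ones quoted on the slide; the variable and function names are hypothetical, and the two systems were timed on different processors, so the ratios are indicative only.

```python
from typing import Dict, Tuple

# Decisions per second as quoted on the slide.
DQN_CPU = 5.0      # 3.2 GHz Intel i7-4700s, no GPU
DQN_GPU = 330.0    # with GPU acceleration
TPG_RANGES: Dict[str, Tuple[float, float]] = {
    "single-title": (758.0, 2853.0),   # 2.2 GHz Intel E5-2650
    "multi-title": (1832.0, 2922.0),
}


def speedup_range(tpg: Tuple[float, float], baseline: float) -> Tuple[float, float]:
    """Return (slowest, fastest) TPG throughput as a multiple of the baseline."""
    low, high = tpg
    return low / baseline, high / baseline


if __name__ == "__main__":
    for setting, tpg in TPG_RANGES.items():
        cpu_lo, cpu_hi = speedup_range(tpg, DQN_CPU)
        gpu_lo, gpu_hi = speedup_range(tpg, DQN_GPU)
        print(f"{setting}: {cpu_lo:.0f}x to {cpu_hi:.0f}x vs DQN on CPU, "
              f"{gpu_lo:.1f}x to {gpu_hi:.1f}x vs DQN on GPU")
```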

  12. Questions?
