Emergent solutions to high dimensional multi-task reinforcement learning - PowerPoint PPT Presentation

SLIDE 1

“Humies” Competition GECCO 2018

Emergent solutions to high dimensional multi-task reinforcement learning

Stephen Kelly & Malcolm Heywood

SLIDE 2

Why does the result qualify as human-competitive?

Game title:

  • Atari
  • Doom

[Diagram: Game-Playing Agent receives visual state s(t), emits atomic action a(t), and is evaluated by the end-of-evaluation game score]

July 2018 2 Humies

SLIDE 3

Visual RL dominated by deep learning

  • DQN (2015)

– Visual RL on the Arcade Learning Environment (49 titles)
– Q-learning with deep learning
– Cropped visual image (84 × 84)
– Frame stacking (removes the interleaving of sprites & stochastic properties)
– “able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games” [Nature (2015) Vol. 518]

  • Gorila (2015), Double Q (2016), Dueling DQN (2016), A3C (2016), Noisy DQN (2017), Distributional DQN (2017), Rainbow (2018)

  • One policy per game title
  • Learning parameters and DNN topology identified a priori
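The DQN input pipeline summarized above (cropping/resizing to 84 × 84 and frame stacking) can be sketched roughly as follows; the function names, the grayscale weights, and the nearest-neighbour resize are illustrative simplifications, not the exact DQN implementation:

```python
import numpy as np

def preprocess(frame: np.ndarray, size: int = 84) -> np.ndarray:
    """frame: (H, W, 3) uint8 RGB -> (size, size) float32 grayscale."""
    gray = frame @ np.array([0.299, 0.587, 0.114])   # luminance weights
    h, w = gray.shape
    rows = np.arange(size) * h // size               # nearest-neighbour
    cols = np.arange(size) * w // size               # resampling indices
    return gray[np.ix_(rows, cols)].astype(np.float32)

def stack_frames(frames: list) -> np.ndarray:
    """Stack the 4 most recent preprocessed frames into one (4, 84, 84) state."""
    return np.stack(frames[-4:], axis=0)

raw = np.zeros((210, 160, 3), dtype=np.uint8)        # native Atari frame size
state = stack_frames([preprocess(raw)] * 4)
print(state.shape)  # (4, 84, 84)
```

Stacking consecutive frames gives the otherwise memoryless network access to short-term motion information (e.g. sprite velocity).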


SLIDE 4

Visual RL Compared to ‘human’

[Figure: best- and worst-title scores for TPG, DQN, Gorila, Double-DQN, and H-NEAT on a log(% human) axis (1–10000), where % human = 100 × (algorithm − rnd) / (Human − rnd). The human-level line sits at 100: algorithms above it are better than human, below it worse. Statistically equivalent results are marked.]
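The human-normalized metric used in this comparison can be computed directly; the function name below is illustrative:

```python
def human_normalized(score: float, random_score: float, human_score: float) -> float:
    """Scale a raw game score so that random play maps to 0 and human
    play maps to 100; values above 100 beat the human baseline."""
    return 100.0 * (score - random_score) / (human_score - random_score)

# Example: an agent scoring halfway between random and human play.
print(human_normalized(50.0, 0.0, 100.0))  # 50.0
```

Normalizing against random and human baselines makes scores comparable across titles whose raw score scales differ by orders of magnitude.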

SLIDE 5

Visual RL and Multi-task learning

  • Multiple game titles played by a single agent
  • Single-title DQN provides the baseline
  • Best DNN result needs prior knowledge regarding parameters and topology
  • Constitutes an example of a task pertaining to ‘Artificial General Intelligence’


SLIDE 6

Multi-title TPG versus Single-title DQN

[Figure: log(% DQN score) per game title for the multi-title TPG agent, relative to the single-title DQN score (axis 1–1000), with titles arranged in Groups 1–3. Titles: Alien, Battle Zone, Asteroids, Bank Heist, Bowling, Chopper Command, Centipede, Fishing Derby, Kangaroo, Frostbite, Krull, Kung-Fu, Ms. Pac-Man, Time Pilot, Private Eye. Bars above the single-title DQN score are better, below are worse.]

SLIDE 7

Why [is our entry] ‘best’ in comparison to other entries?

  • Single-title task
    – TPG provides solutions competitive with human and DQN
    – Agents have to be competitive over multiple game titles
  • Multi-title task
    – TPG multi-task solution is competitive with DQN trained under a single-title setting
    – DNN state-of-the-art in the single task does not address the multi-title task
  • TPG for the single-title task is a special case of TPG for the multi-title task


SLIDE 8

The ‘icing on the cake’

  • TPG addresses multiple issues simultaneously:
    – Complexity of topology is emergent and:
      • Highly modular
      • Unique to the task
      • Explicitly reflects a decomposition of the task
    – No image-specific instructions, just:
      • Four 2-argument operators {+, −, ×, ÷}
      • Three 1-argument operators {log, exp, cosine}
      • One conditional operator
    – TPG highly efficient computationally
    – Some examples…
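A minimal sketch of a linear register-machine program restricted to the operator set above; the instruction encoding, register count, and overflow/division protections are illustrative assumptions, not TPG's actual representation:

```python
import math

# Two-argument operators {+, -, *, /} with protected division.
OPS2 = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b if b != 0 else a,
}
# One-argument operators {log, exp, cos}, protected against domain errors.
OPS1 = {
    "log": lambda a: math.log(abs(a)) if a != 0 else 0.0,
    "exp": lambda a: math.exp(min(a, 50.0)),
    "cos": math.cos,
}

def run(program, inputs, n_regs=8):
    """Execute (op, dst, src) triples: src indexes the input (pixel) vector,
    dst indexes a register. The 'if' op skips the next instruction when
    register dst is less than the selected input."""
    r = [0.0] * n_regs
    i = 0
    while i < len(program):
        op, dst, src = program[i]
        x = inputs[src % len(inputs)]
        if op == "if":
            if r[dst] < x:
                i += 1                 # conditional skip
        elif op in OPS2:
            r[dst] = OPS2[op](r[dst], x)
        else:
            r[dst] = OPS1[op](x)
        i += 1
    return r[0]                        # register 0 holds the output

print(run([("add", 0, 0), ("mul", 0, 1)], [2.0, 3.0]))  # 6.0
```

Because programs index directly into the raw input vector, no image-specific feature extraction is required: which pixels matter is itself an evolved property of the program.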


SLIDE 9

Teams (nodes) per graph emerge… [ditto pixels used]

[Figure: number of teams (1–200, log scale) versus generation (200–800) for Alien, Asteroids, Bowling, Boxing, Ms. Pac-Man, and Rand. Overall solution complexity: entire champion policy graph. Per-decision complexity: teams visited per decision during test.]

SLIDE 10

Emergent discovery of Multi-title solutions


[Figure: champion multi-title policy graph, with subgraphs emerging for Ms. Pac-Man, Frostbite, and Centipede]

SLIDE 11

Run-time complexity

DQN

  • ≈1.6 million weights in MLP
  • ≈3.2 million convolution operations in DNN
  • 3.2 GHz Intel i7-4700s
    – 5 decisions per second
  • GPU acceleration
    – 330 decisions per second

TPG

  • Single title
    – 71–2346 instructions (avg)
  • Multi-title
    – 413–869 instructions (avg)
  • 2.2 GHz Intel E5-2650
    – Single title: 758–2853 decisions per sec.
    – Multi-title: 1832–2922 decisions per sec.


SLIDE 12

Questions?
