Emergent solutions to high dimensional multi-task reinforcement learning
Humies Competition, GECCO 2018
Emergent solutions to high dimensional multi-task reinforcement learning
Stephen Kelly & Malcolm Heywood
Why does the result qualify as human competitive?
Game title:
- Atari
- Doom
[Diagram: Game Playing Agent receives Visual State s(t), emits Atomic Action a(t); evaluated by End-of-Evaluation Game score]
July 2018 2 Humies
Visual RL dominated by deep learning
- DQN (2015)
  – Visual RL on the Arcade Learning Environment (49 titles)
  – Q-learning with deep learning
  – Cropped visual image (84 × 84)
  – Frame stacking (removes the interleaving of sprites & stochastic properties)
  – “able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games” [Nature (2015) Vol. 518]
- Gorila (2015), Double DQN (2016), Dueling DQN (2016), A3C (2016), Noisy DQN (2017), Distributional DQN (2017), Rainbow (2018)
- One policy per game title
- Learning parameters and DNN topology identified a priori
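The DQN preprocessing bullets above can be sketched as follows. This is a minimal illustration assuming 210×160 RGB Atari frames and NumPy; the published pipeline uses luminance extraction and bilinear resizing, which are approximated here with a channel mean and a strided crop.

```python
import numpy as np
from collections import deque

def preprocess(frame_rgb):
    """Down-sample an RGB Atari frame to an 84x84 grayscale image.

    A sketch of DQN-style preprocessing; the crop offsets and the
    naive 2x down-sample are illustrative assumptions.
    """
    gray = frame_rgb.mean(axis=2)          # (210, 160) luminance proxy
    cropped = gray[26:194, :]              # drop the score bar -> (168, 160)
    resized = cropped[::2, ::2]            # naive 2x down-sample -> (84, 80)
    out = np.zeros((84, 84), dtype=np.float32)
    out[:, :80] = resized                  # pad width to 84
    return out

class FrameStack:
    """Stack the last k processed frames into the network's input state,
    so a single state captures motion across consecutive frames."""
    def __init__(self, k=4):
        self.frames = deque(maxlen=k)

    def push(self, frame_rgb):
        self.frames.append(preprocess(frame_rgb))
        while len(self.frames) < self.frames.maxlen:
            self.frames.append(self.frames[-1])   # pad with copies at episode start
        return np.stack(self.frames)              # shape (4, 84, 84)
```

Stacking frames is what removes the sprite interleaving mentioned above: a single Atari frame may omit flickering sprites, but a 4-frame stack almost always contains them.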
Visual RL compared to ‘human’
[Figure: bar chart on a log scale of % human score, comparing TPG, DQN, Gorila, Double-DQN, and H-NEAT. Normalized score = 100 × (algorithm − random) / (human − random), so 100 marks human level; bars span each algorithm's best to worst title, above human level on the best and below on the worst. TPG and DQN are statistically equivalent.]
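The human-normalized score used in the figure above is a one-line computation; the numbers in the example are illustrative only, not taken from the presentation.

```python
def human_normalized(score, human, random_play):
    """Agent score as a percentage of human performance, per the slide:
    100 * (algorithm - random) / (human - random).
    100 marks human level; 0 marks random play."""
    return 100.0 * (score - random_play) / (human - random_play)

# Illustrative: an agent scoring 900 on a title where random play
# scores 100 and a human scores 500 sits at 200% of human level.
print(human_normalized(900, 500, 100))  # 200.0
```

Subtracting the random-play score anchors the scale so that an agent doing no better than chance scores 0%, which is why the chart can meaningfully use a log axis relative to human level.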
Visual RL and multi-task learning
- Multiple game titles played by a single agent
- Single-title DQN provides the baseline
- Best DNN result needs prior knowledge regarding parameters and topology
- Constitutes an example of a task pertaining to ‘Artificial General Intelligence’
Multi-title TPG versus single-title DQN
[Figure: bar chart of log(% DQN score) relative to each single-title DQN score, over three groups of titles: Alien, Battle Zone, Asteroids, Bank Heist, Bowling, Chopper Command, Centipede, Fishing Derby, Kangaroo, Frostbite, Krull, Kung-Fu, Ms. Pac-Man, Time Pilot, Private Eye. Bars above 100% are better than DQN, below are worse.]
Why [is our entry] ‘best’ in comparison to other entries?
- Single-title task
  – TPG provides solutions competitive with human and DQN
  – Agents have to be competitive over multiple game titles
- Multi-title task
  – TPG multi-task solution is competitive with DQN trained under a single-title setting
  – DNN state of the art in the single task does not address the multi-title task
- TPG for the single-title task is a special case of TPG for the multi-title task
The ‘icing on the cake’
- TPG addresses multiple issues simultaneously:
  – Complexity of topology is emergent and:
    - Highly modular
    - Unique to the task
    - Explicitly reflects a decomposition of the task
  – No image-specific instructions, just:
    - Four 2-argument operators {+, −, ×, ÷}
    - Three 1-argument operators {log, exp, cos}
    - One conditional operator
  – TPG is highly efficient computationally
  – Some examples…
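To make that instruction set concrete, here is a minimal register-machine interpreter over exactly those operators. The tuple encoding, the protected division and log, and the conditional semantics ("skip the next instruction unless the test holds") are illustrative assumptions, not the authors' exact TPG implementation.

```python
import math

def protected_div(a, b):
    return a / b if abs(b) > 1e-6 else a       # guard against /0 (assumed convention)

def protected_log(a):
    return math.log(abs(a)) if abs(a) > 1e-6 else 0.0

BINARY = {"+": lambda a, b: a + b,             # the four 2-argument operators
          "-": lambda a, b: a - b,
          "*": lambda a, b: a * b,
          "/": protected_div}
UNARY = {"log": protected_log,                 # the three 1-argument operators
         "exp": lambda a: math.exp(min(a, 50.0)),  # clamp to avoid overflow
         "cos": math.cos}

def run(program, inputs, n_regs=8):
    """Execute a linear program against screen-pixel inputs.

    A source index s reads register R[s] when s < n_regs, otherwise
    inputs[s - n_regs].  Instruction forms:
      (op, dst, s1, s2)   binary:  R[dst] = op(src1, src2)
      (op, dst, s)        unary:   R[dst] = op(src)
      ("if<", dst, s)     skip the next instruction unless R[dst] < src
    The program's output (its bid) is read from register 0.
    """
    R = [0.0] * n_regs

    def fetch(s):
        return R[s] if s < n_regs else inputs[s - n_regs]

    skip = False
    for ins in program:
        if skip:
            skip = False
            continue
        op, dst = ins[0], ins[1]
        if op in BINARY:
            R[dst] = BINARY[op](fetch(ins[2]), fetch(ins[3]))
        elif op in UNARY:
            R[dst] = UNARY[op](fetch(ins[2]))
        elif op == "if<":
            skip = not (R[dst] < fetch(ins[2]))
    return R[0]

# Example: R0 = inputs[0] + inputs[1], then square it.
prog = [("+", 0, 8, 9), ("*", 0, 0, 0)]
print(run(prog, [2.0, 3.0]))  # 25.0
```

Because programs only index a small subset of the input, no image-specific operators are needed: which pixels matter is itself discovered by evolution.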
Teams (nodes) per graph emerge… [ditto pixels used]
[Figure: number of teams (up to ~800) versus generation (log scale, 1–200) for Alien, Asteroids, Bowling, Boxing, Ms. Pac-Man, and Rand. Overall solution complexity: teams in the entire champion policy graph. Per-decision complexity: teams visited per decision during test.]
Emergent discovery of multi-title solutions
[Figure: champion multi-title policy graphs; nodes are teams of programs, with subgraphs specializing per title for Ms. Pac-Man, Frostbite, and Centipede.]
Run-time complexity

DQN
- ≈1.6 million weights in MLP
- ≈3.2 million convolution operations in DNN
- 3.2 GHz Intel i7-4700s: 5 decisions per second
- GPU acceleration: 330 decisions per second

TPG
- Single title: 71–2346 instructions (avg)
- Multi-title: 413–869 instructions (avg)
- 2.2 GHz Intel E5-2650
  – Single title: 758–2853 decisions per sec.
  – Multi-title: 1832–2922 decisions per sec.
Questions?