SLIDE 1

Chrome Dino DQN

AUTHOR: George Margaritis
INSTRUCTOR: Prof. M. Lagoudakis
COURSE: COMP 513, Autonomous Agents
SCHOOL: ECE, Technical University of Crete
PERIOD: Fall Semester, 2019-2020

SLIDE 2

Overview

  • What is Chrome Dino?
  • Model
  • Deep Q Learning
  • Implementation
  • Results
  • Conclusions
  • Pros – Cons
  • Future Work
  • References
SLIDE 3

What is Chrome Dino?

  • 2D Arcade Game created by Google for Chrome
  • Designed as an “Easter egg” game, shown when Chrome has no internet connection
  • Player: A little Dino
  • Task: The player controls the Dino and can either jump or duck at any time. The goal is to avoid as many obstacles as possible in order to maximize the score. As time progresses, the game becomes harder: the environment moves faster and more obstacles appear.

SLIDE 4

What is Chrome Dino?

SLIDE 5

Model

State space -> Very Large:

  • Each state -> Represented by 4 frames of 84x84 binary images

Actions:

  • Do nothing
  • Jump
  • Duck

Rewards:

  • +0.1 in every frame the Dino is alive
  • -1 when the Dino dies
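
As a concrete illustration of this encoding (a minimal NumPy sketch; the class and helper names are hypothetical, not taken from the project code):

    import numpy as np
    from collections import deque

    class StateBuilder:
        """Keeps the 4 most recent 84x84 binary frames and stacks them into a state."""
        def __init__(self, history=4):
            self.frames = deque(maxlen=history)

        def reset(self, first_frame):
            # At episode start, repeat the first frame to fill the history.
            for _ in range(self.frames.maxlen):
                self.frames.append(first_frame)
            return self.state()

        def push(self, frame):
            self.frames.append(frame)
            return self.state()

        def state(self):
            return np.stack(list(self.frames), axis=0)  # shape: (4, 84, 84)

    def reward(alive):
        # +0.1 for every frame the Dino survives, -1 on death (as above).
        return 0.1 if alive else -1.0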
SLIDE 6

Deep Q Learning
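
In brief: deep Q-learning trains a network Q_θ(s, a) to predict the expected return of each action by minimizing the squared temporal-difference error over transitions (s, a, r, s’) sampled from a replay buffer D, using a separate target network Q_θ⁻ (both mechanisms are described on Slide 8). This is the standard DQN formulation from the Atari DQN paper cited in the references:

    \mathcal{L}(\theta) = \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}}
        \left[ \left( r + \gamma \max_{a'} Q_{\theta^{-}}(s', a') - Q_{\theta}(s, a) \right)^{2} \right]

where γ is the discount factor.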

SLIDE 7

Implementation

  • The game runs in a browser automated by Selenium
  • Python uses a Chrome WebDriver to communicate with Selenium and play the game
  • Our DQN model is implemented in TensorFlow 2.0
  • The agent interacts with the environment, and the environment returns a transition (s, a, r, s’) where:
    • s: current state (4x84x84 matrix)
    • a: action (0 for do nothing, 1 for jump, 2 for duck)
    • r: numeric reward
    • s’: new state
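
The slides do not show the network architecture itself; the sketch below is an assumption modeled on the Atari DQN paper (cited in the references), written for TensorFlow 2.0 as stated above:

    import tensorflow as tf

    def build_q_network(num_actions=3):
        """CNN mapping 4 stacked 84x84 frames to one Q-value per action."""
        return tf.keras.Sequential([
            # Keras convolutions expect channels-last, so the 4x84x84 state
            # is fed transposed as 84x84x4.
            tf.keras.layers.Conv2D(32, 8, strides=4, activation="relu",
                                   input_shape=(84, 84, 4)),
            tf.keras.layers.Conv2D(64, 4, strides=2, activation="relu"),
            tf.keras.layers.Conv2D(64, 3, strides=1, activation="relu"),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(512, activation="relu"),
            tf.keras.layers.Dense(num_actions),  # a = 0: nothing, 1: jump, 2: duck
        ])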
SLIDE 8

Implementation

For better results and smoother training, our agent uses:

  • Experience replay:
    • Past transitions are stored and replayed in batches during training
    • The same transition can be used multiple times, which improves learning
  • Target network:
    • Use of 2 networks: a target network to estimate the target Q-value and a policy network to get the Q-values
    • Increases training stability
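
A minimal sketch of both mechanisms (the capacity, batch size, discount factor, and sync period below are illustrative assumptions, not the project's actual hyperparameters):

    import random
    from collections import deque

    import numpy as np

    class ReplayBuffer:
        """Stores past transitions; each can be sampled many times for training."""
        def __init__(self, capacity=50_000):
            self.buffer = deque(maxlen=capacity)

        def add(self, s, a, r, s_next, done):
            self.buffer.append((s, a, r, s_next, done))

        def sample(self, batch_size=32):
            batch = random.sample(self.buffer, batch_size)
            s, a, r, s_next, done = map(np.array, zip(*batch))
            return s, a, r, s_next, done

    def td_targets(target_net, r, s_next, done, gamma=0.99):
        # Target Q-values come from the *target* network, not the policy network.
        q_next = target_net(s_next).numpy().max(axis=1)
        return r + gamma * (1.0 - done.astype(np.float32)) * q_next

    # Every K training steps, sync the target network with the policy network:
    #     target_net.set_weights(policy_net.get_weights())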
SLIDE 9

Results

In our experiments we tested 2 different models:

  • Model 1: without the duck action (learning rate = 10⁻³)
  • Model 2: with the duck action (learning rate = 10⁻⁴)

For those models, we measured every 20 episodes (games):

  • The maximum score of the last 20 episodes
  • The average score of the last 20 episodes
  • The minimum score of the last 20 episodes

Then we smoothed the curves in order to better observe the trend.
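
As a sketch of how such curves can be computed (the 20-episode window comes from the text; the moving-average smoothing is an assumption, since the slides do not name the method):

    import numpy as np

    def window_stats(scores, window=20):
        """Max/avg/min over each consecutive window of 20 episodes."""
        chunks = [scores[i:i + window]
                  for i in range(0, len(scores) - window + 1, window)]
        return ([max(c) for c in chunks],
                [sum(c) / len(c) for c in chunks],
                [min(c) for c in chunks])

    def smooth(curve, k=5):
        # Simple moving average to expose the trend.
        return np.convolve(curve, np.ones(k) / k, mode="valid")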

SLIDE 10

Results (max score)

[Plot: maximum score of the last 20 episodes vs. episodes (1k–11k), Model 1 (No duck) vs. Model 2 (Duck); y-axis 50–350]

SLIDE 11

Results (avg score)

[Plot: average score of the last 20 episodes vs. episodes (1k–11k), Model 1 (No duck) vs. Model 2 (Duck); y-axis 20–140]

SLIDE 12

Results (min score)

[Plot: minimum score of the last 20 episodes vs. episodes (1k–11k), Model 1 (No duck) vs. Model 2 (Duck); y-axis 41–47]

SLIDE 13

Conclusions

By reducing the learning rate and allowing the duck action we observe:

  • Slower convergence, BUT
  • Better and more consistent results

Observation: Using the duck action, our agent discovers a hidden strategy:

  • Jump
  • While in the air, hit duck to descend:
    • Minimizes air time
    • Returns to the ground, where the agent has more control -> avoids more obstacles
SLIDE 14

Pros - Cons

Advantages:

  • Can be used without any domain-specific knowledge or assumptions about the environment
  • The exact same model can be used to beat many different games when trained in a different environment

Disadvantages:

  • Slow learning:
    • Training takes a lot of time (1 or 2 days)
  • Scores of nearby episodes are not very consistent:
    • Increased score variation between nearby episodes
SLIDE 15

Future Work

Try to improve DQN using:

  • Better hyperparameter tuning
  • Double DQN (target computation sketched after this list)
  • Prioritized Experience Replay
  • Dueling DQN
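
For the Double DQN item above: the only change from vanilla DQN is the target computation, where the policy network selects the next action and the target network evaluates it, reducing overestimation bias. A minimal sketch, reusing the hypothetical policy/target networks from the implementation slides:

    import numpy as np

    def double_dqn_targets(policy_net, target_net, r, s_next, done, gamma=0.99):
        # Select the next action with the policy network...
        best_a = np.argmax(policy_net(s_next).numpy(), axis=1)
        # ...but evaluate it with the target network.
        q_next = target_net(s_next).numpy()[np.arange(len(best_a)), best_a]
        return r + gamma * (1.0 - done.astype(np.float32)) * q_next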

Try different approaches:

  • Use statistics (Dino height, distance to the next obstacle, etc.) instead of images -> already implemented in the code (enabled with the --use-statistics flag)
  • Use NEAT (NeuroEvolution of Augmenting Topologies) in conjunction with statistics, instead of DQN -> should yield better results

SLIDE 16

Code

The source code is available on GitHub with documentation: https://github.com/margaeor/dino-dqn

SLIDE 17

References

  • Atari DQN – Paper
  • Intro to Deep RL – Article
  • Intro to DQN – Article
  • DQN Hands On – Article
  • DQN by sentdex – Video
  • COMP 513 course lectures