

SLIDE 1

Beating Sonic and Knuckles

With reinforcement learning and world models
Michael Clark & Anthony DiPofi
A talk for the Perth Machine Learning Group

SLIDE 2

The project

  • You can probably recognise the top left pane
  • But what do the other panes represent?
  • Let's see…
SLIDE 3

Concepts

  • I’ll introduce you to these 3 concepts:
  • 1. Reinforcement learning
  • 2. World models
  • 3. Mixture Density Networks

SLIDE 4

Reinforcement learning
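
For the unfamiliar: reinforcement learning is a loop of act, observe, collect reward. A minimal sketch of that loop using the classic (pre-0.26) OpenAI Gym API of the time; CartPole is just a stand-in environment here, while the contest itself runs on Gym Retro:

```python
import gym

# The RL loop: the agent acts, the environment responds with an
# observation and a reward, and the agent adapts its policy.
env = gym.make("CartPole-v1")           # stand-in; the contest uses Gym Retro
obs = env.reset()
done, total_reward = False, 0.0
while not done:
    action = env.action_space.sample()  # a random policy, to be replaced by learning
    obs, reward, done, info = env.step(action)
    total_reward += reward
print(total_reward)
```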

SLIDE 5

Can be applied in industry

Google’s robot arm farm

SLIDE 6

Can be applied in industry

Spica.ai: cryptocurrency trading. The black line is our RL agent; it does OK

SLIDE 7

But...

  • It needs to train for much longer than humans (not sample efficient)
  • It “cheats” by doing unintended things if it can. “But you told me to get rid of the mess”
  • More reading: “Deep Reinforcement Learning Doesn't Work Yet” https://www.alexirpan.com/2018/02/14/rl-hard.html
  • If it worked really well... we wouldn’t know how to control it (yet)
  • I recommend Bostrom’s book Superintelligence (the audiobook) on this topic

What are we missing?

  • Prior experience and memory
  • Unsupervised learning (without explicit labels)
  • Meta learning
  • ???
SLIDE 8

Cheating…

SLIDE 9

Yann LeCun’s cake

SLIDE 10

The Competition

  • OpenAI has started a competition to beat Sonic the Hedgehog
  • They pay staff $1M but can’t put up prize money :p
  • I’m going to beat you, “Deep Blockchain Quantum AI”
  • https://contest.openai.com/
  • https://contest.openai.com/leaderboard
SLIDE 11

My approach: World Models

  • We talked about this a few weeks ago; perhaps someone can give a summary?
    ○ Compress visual information
    ○ Predict the future
    ○ Act on the prediction
  • Why is this interesting?
    ○ Reinforcement learning struggles
    ○ This is the “year of unsupervised learning”
    ○ Like humans, it would allow artificial intelligence to learn without instruction
    ○ “World models” does that

SLIDE 12

World models - we will come back to this slide

SLIDE 13

World models: (V) A “visual cortex” to reduce dimensionality

Z is the “latent vector”
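
As a rough sketch of the V component: a convolutional VAE encoder that compresses a 64x64 game frame into the latent vector z. The layer sizes follow the World Models paper and the 32-d latent is illustrative only (later slides mention this project used 512 latent dims):

```python
import torch
import torch.nn as nn

class ConvVAE(nn.Module):
    """Minimal VAE encoder sketch: compress a 64x64 RGB frame to a latent z."""
    def __init__(self, z_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),    # 64 -> 31
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),   # 31 -> 14
            nn.Conv2d(64, 128, 4, stride=2), nn.ReLU(),  # 14 -> 6
            nn.Conv2d(128, 256, 4, stride=2), nn.ReLU(), # 6 -> 2
        )
        self.fc_mu = nn.Linear(256 * 2 * 2, z_dim)
        self.fc_logvar = nn.Linear(256 * 2 * 2, z_dim)

    def forward(self, x):
        h = self.encoder(x).flatten(start_dim=1)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterisation trick: sample z while keeping gradients
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return z, mu, logvar

frame = torch.rand(1, 3, 64, 64)   # a fake game frame
z, mu, logvar = ConvVAE()(frame)
print(z.shape)                     # torch.Size([1, 32])
```

A matching decoder (not shown) reconstructs the frame from z, which is what the decoded panes in the demo display.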

SLIDE 14

World models: (M) MDN-RNNs

  • This part predicts the future.
  • It has two components:
    ○ A recurrent neural network, to predict the future
    ○ A mixture density network, to output multiple probabilities

Sean, please explain RNNs :p
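
A hedged sketch of how those two components fit together: an LSTM consumes the current latent vector and action, and a linear head emits Gaussian-mixture parameters for the next latent vector. All sizes and the mixture count here are illustrative assumptions, not this project's exact settings:

```python
import torch
import torch.nn as nn

class MDNRNN(nn.Module):
    """Sketch of the M component: predict a distribution over the next z."""
    def __init__(self, z_dim=32, action_dim=12, hidden=256, n_mix=5):
        super().__init__()
        self.n_mix, self.z_dim = n_mix, z_dim
        self.rnn = nn.LSTM(z_dim + action_dim, hidden, batch_first=True)
        # Per latent dim: n_mix mixture logits (pi), means (mu), log-sigmas
        self.head = nn.Linear(hidden, 3 * n_mix * z_dim)

    def forward(self, z, action, state=None):
        out, state = self.rnn(torch.cat([z, action], dim=-1), state)
        pi, mu, log_sigma = self.head(out).chunk(3, dim=-1)
        shape = out.shape[:-1] + (self.n_mix, self.z_dim)
        return (pi.view(shape), mu.view(shape), log_sigma.view(shape)), state

z = torch.randn(1, 10, 32)   # a sequence of 10 latent vectors
a = torch.zeros(1, 10, 12)   # matching action vectors
(pi, mu, log_sigma), _ = MDNRNN()(z, a)
print(mu.shape)              # torch.Size([1, 10, 5, 32])
```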

SLIDE 15

Mixture Density Networks (M)

  • These output means and standard deviations
  • e.g. means = [1, 2], standard deviations = [0.5, 0.7]
  • But how do we measure the error on a distribution?
  • The loss is the probability density of the true value (sketched in code below)
  • Sampling:
    ○ Training: sample randomly
    ○ Testing: take the mean
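
Concretely, a minimal sketch of the MDN loss: the negative log of the probability density the predicted mixture assigns to the true value. The shapes and the per-dimension factorisation are assumptions:

```python
import math
import torch
import torch.nn.functional as F

def mdn_nll(pi_logits, mu, log_sigma, target):
    """Negative log-likelihood of target under a per-dimension Gaussian mixture.

    pi_logits, mu, log_sigma: (batch, n_mix, dim); target: (batch, dim).
    """
    target = target.unsqueeze(-2)              # (batch, 1, dim): broadcast over mixtures
    log_pi = F.log_softmax(pi_logits, dim=-2)  # normalise mixture weights
    # Log-density of each Gaussian component, per dimension
    log_prob = -0.5 * (((target - mu) / log_sigma.exp()) ** 2
                       + 2 * log_sigma + math.log(2 * math.pi))
    # Mix components in log space, then sum over dimensions
    return -torch.logsumexp(log_pi + log_prob, dim=-2).sum(-1).mean()

# The slide's example: means [1, 2], standard deviations [0.5, 0.7],
# as an equally weighted two-component mixture over one dimension
pi = torch.zeros(1, 2, 1)
mu = torch.tensor([[[1.0], [2.0]]])
log_sigma = torch.tensor([[[0.5], [0.7]]]).log()
print(mdn_nll(pi, mu, log_sigma, torch.tensor([[1.5]])))
```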
SLIDE 16

World models: (C) Controller

SLIDE 17

World models: (C) Controller

  • In World Models they used evolution strategies, but I use “Proximal Policy Optimization” (PPO)
  • A policy gradient method
  • Continuous action space
  • Why?
    ○ Well tested, reliable, and general
    ○ Lots of code exists
    ○ Stockholm syndrome
  • https://arxiv.org/abs/1707.06347

SLIDE 18

PPO: Key insight

  • We’re at the black dot; we want to go up.
  • Red line: the actual performance of policy parameter theta
  • Green line: the unconstrained loss, a local approximation. But if you go too far away, all bets are off
  • The blue line is pessimistic; let’s just make a tiny jump to the top. That way we are always guaranteed to improve and not overshoot! (it’s a surrogate loss penalised with KL divergence, forming a lower bound)
  • Expert explanation: https://youtu.be/xvRrgxcpaHY?t=17m27s
    ○ From “Deep RL Bootcamp”
  • https://arxiv.org/abs/1707.06347
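
For reference, a minimal sketch of the clipped surrogate objective from the PPO paper (arXiv:1707.06347). The slide describes the KL-penalised lower-bound view; clipping is the variant most implementations, and this sketch, use to enforce the same "tiny jump":

```python
import torch

def ppo_clip_loss(log_prob_new, log_prob_old, advantage, clip_eps=0.2):
    """PPO clipped surrogate loss (to minimise).

    The probability ratio between the new and old policies is clipped,
    so a gradient step cannot move the policy too far from the old one.
    """
    ratio = (log_prob_new - log_prob_old).exp()
    unclipped = ratio * advantage
    clipped = ratio.clamp(1 - clip_eps, 1 + clip_eps) * advantage
    # Taking the min makes the objective a pessimistic lower bound
    return -torch.min(unclipped, clipped).mean()
```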
SLIDE 19

The project

  • You can probably recognise the top left pane
  • But what do the other panes represent?
  • Latent vectors, and decoded latent vectors

SLIDE 20

World models: Summary

SLIDE 21

Code

  • Worked with Anthony DiPofi (Alabama), who I met on reddit.com/r/reinforcementlearning
    ○ https://github.com/goolulusaurs
  • PyTorch: https://github.com/ShangtongZhang/DeepRL <3
  • ~3 weekends
  • ~$200 of compute
  • ~10,000 tears later
  • ~100,000 hedgehogs were virtually harmed
  • I’ll release the code at https://github.com/wassname in a month
SLIDE 22

Demo: Before training

SLIDE 23

1 hour of training on first three levels

SLIDE 24

100k steps of training, ALL levels, 512 latent dims

SLIDE 25

100k steps of training, ALL levels, 512 latent dims

SLIDE 26

Final status

  • I haven’t had time to tweak the controller, so it has only learnt to mash buttons
  • The competition ends at the end of the month
  • There seems to be a bug with the predicted latent state when running
SLIDE 27

More reading:

  • Podcasts:
    ○ http://lineardigressions.com/episodes/2018/3/11/autoencoders
    ○ http://www.thetalkingmachines.com/episodes/strong-ai-and-autoencoders
  • Audiobook:
    ○ Superintelligence: Paths, Dangers, Strategies
  • Mixture density networks tutorial:
    ○ https://github.com/hardmaru/pytorch_notebooks/blob/master/mixture_density_networks.ipynb
  • RL courses:
    ○ Berkeley Deep RL Bootcamp
    ○ David Silver’s course
  • Papers: all the papers
SLIDE 28
SLIDE 29

Some practical tips

  • To do joint training I needed a low learning rate, and to weight the losses in order of dependency (see the sketch below)
  • The VAE took the longest to train (days) and needed the most data (300,000 frames)
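
A toy sketch of what that weighting might look like; the weights and loss values here are hypothetical, ordered so the upstream components dominate:

```python
import torch

# Hypothetical stand-ins for the three component losses
vae_loss = torch.tensor(1.0, requires_grad=True)
mdn_loss = torch.tensor(0.8, requires_grad=True)
ctrl_loss = torch.tensor(0.5, requires_grad=True)

# Weight in order of dependency: the VAE feeds the MDN-RNN, which feeds
# the controller, so upstream terms get the larger (assumed) weights.
joint_loss = 1.0 * vae_loss + 0.1 * mdn_loss + 0.01 * ctrl_loss
joint_loss.backward()
# ...then step an optimiser with a low learning rate, e.g. Adam(lr=1e-4)
```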