Neural Networks and their Application to Go - PowerPoint PPT Presentation


SLIDE 1

Neural Networks and their Application to Go

Anne-Marie Bausch

ETH, D-MATH

May 31, 2016

SLIDE 2

Table of Contents

1 Neural Networks
  • Theory
  • Training neural networks
  • Problems

2 AlphaGo
  • The Game of Go
  • Policy Network
  • Value Network
  • Monte Carlo Tree Search

SLIDE 3

Perceptron

A perceptron is the most basic artificial neuron (developed in the 1950s and 1960s). The input is X ∈ Rⁿ; w1, . . . , wn ∈ R are called weights, and the output is Y ∈ {0, 1}.

The output depends on some threshold value τ:

output =
  0, if W · X = Σj wjxj ≤ τ,
  1, if W · X = Σj wjxj > τ.
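The threshold rule above can be sketched in a few lines of Python (the function and variable names are illustrative, not from the slides):

```python
def perceptron(x, w, tau):
    # Weighted sum W . X = sum_j w_j * x_j, compared against the threshold tau.
    weighted_sum = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if weighted_sum > tau else 0

# A perceptron with weights (1, 1) and threshold 1.5 computes logical AND.
print(perceptron([1, 1], [1, 1], 1.5))  # -> 1
print(perceptron([1, 0], [1, 1], 1.5))  # -> 0
```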

SLIDE 4

Bias

Next, we introduce what is known as the perceptron’s bias B := −τ. This gives us a new formula for the output:

output =
  0, if W · X + B ≤ 0,
  1, if W · X + B > 0.

Example: NAND gate.
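The NAND example can be checked directly with the biased rule; the weights (−2, −2) and bias 3 are the classic textbook choice, not numbers from the slide:

```python
def perceptron(x, w, b):
    # Fires (returns 1) iff the biased weighted sum W . X + B is positive.
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

# NAND: the output is 0 only for the input (1, 1).
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(x, (-2, -2), 3))
```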

SLIDE 5

Sigmoid Neuron

Problem: A small change in the input can change the output a lot → Solution: Sigmoid Neuron

Input X ∈ Rⁿ
Output = σ(X · W + B) = (1 + exp(−X · W − B))⁻¹ ∈ (0, 1),
where σ(z) := 1/(1 + exp(−z)) is called the sigmoid function.
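A direct sketch of the sigmoid neuron (names are illustrative):

```python
import math

def sigmoid(z):
    # sigma(z) = 1 / (1 + exp(-z)); squashes any real z into the interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_neuron(x, w, b):
    # Unlike the perceptron, the output varies smoothly with the inputs.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

print(sigmoid(0.0))  # -> 0.5
```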

SLIDE 6

Neural Networks

Given an input X, as well as some training and testing data, we want to find a function fW,B : X → Y, where Y denotes the output. How do we choose the weights and the bias?

SLIDE 7

Example: XOR Gate
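A single perceptron cannot compute XOR (the two classes are not linearly separable), but a small two-layer network can. A sketch with hand-chosen weights (OR and NAND in the hidden layer, AND on top; the numbers are illustrative):

```python
def perceptron(x, w, b):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

def xor(x1, x2):
    h1 = perceptron((x1, x2), (2, 2), -1)     # OR
    h2 = perceptron((x1, x2), (-2, -2), 3)    # NAND
    return perceptron((h1, h2), (2, 2), -3)   # AND of the two hidden outputs

for inp in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(inp, xor(*inp))  # -> 0, 1, 1, 0
```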

SLIDE 8

Learning Algorithm

A learning algorithm chooses the weights and biases without interference from the programmer. The smoothness of σ gives

Δoutput ≈ Σj (∂output/∂wj) Δwj + (∂output/∂B) ΔB

SLIDE 9

How to update weights and bias

How does the learning algorithm update the weights (and the bias)? It solves

argminW,B ‖fW,B(X) − Y‖²

→ One method to do this is gradient descent
→ Choose an appropriate learning rate!
Example: Digit Recognition (1990s) → YouTube video
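A minimal sketch of gradient descent on the squared loss for a single sigmoid neuron, learning the AND function (the toy data, learning rate, and epoch count are illustrative assumptions, not from the slides):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy training set: the AND function.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = [0.0, 0.0], 0.0
eta = 1.0  # learning rate: too large diverges, too small learns slowly

for epoch in range(2000):
    for x, y in data:
        out = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        # Chain rule: d/dz (out - y)^2 = 2 * (out - y) * out * (1 - out)
        grad = 2 * (out - y) * out * (1 - out)
        w = [wj - eta * grad * xj for wj, xj in zip(w, x)]
        b -= eta * grad

predictions = [round(sigmoid(w[0] * x[0] + w[1] * x[1] + b)) for x, _ in data]
print(predictions)  # -> [0, 0, 0, 1]
```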

SLIDE 10

Example: One image consists of 28×28 pixels, which explains why the input layer has 784 neurons.

SLIDE 11

3 main types of learning

• Supervised Learning (SL) → Learning some mapping from inputs to outputs. Example: classifying digits
• Unsupervised Learning (UL) → Given input and no output, what kinds of patterns can you find? Example: visual input is at first too complex; the number of dimensions has to be reduced
• Reinforcement Learning (RL) → The learning method interacts with its environment by producing actions a1, a2, . . . that produce rewards or punishments r1, r2, . . . Example: human learning

SLIDE 12

Why was there a recent boost in the employment of neural networks?

The evolution of neural networks stagnated because networks with more than 2 hidden layers proved to be too difficult. The main problems and their solutions are:
• Huge amount of data needed → Big Data
• Number of weights (capacity of computers) → the capacity of computers improved (parallelism, GPUs)
• Theoretical limits → difficult (⇒ see next slide)

SLIDE 13

Theoretical Limits

Back-propagated error signals either shrink rapidly (exponentially in the number of layers) or grow out of bounds. 3 solutions:
(a) Unsupervised pre-training ⇒ facilitates subsequent supervised credit assignment through back-propagation (1991).
(b) LSTM-like networks (since 1997) avoid the problem through a special architecture.
(c) Today, fast GPU-based computers allow for propagating errors a few layers further down within reasonable time.
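The shrinking of back-propagated error signals can be illustrated numerically: each sigmoid layer multiplies the error by a factor of at most σ′(z) ≤ 0.25, so the signal decays exponentially in the depth (a schematic upper bound, not a full backpropagation computation):

```python
# sigma'(z) = sigma(z) * (1 - sigma(z)) attains its maximum 0.25 at z = 0,
# so across n layers the error signal is scaled by at most 0.25 ** n.
max_deriv = 0.25
for n in [1, 5, 10, 20]:
    print(n, max_deriv ** n)
```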

SLIDE 14

The Game of Go

SLIDE 15

Main rules

• Origin: ancient China, more than 2500 years ago
• Goal: gain the most points
• White gets 6.5 points for moving second
• Points are awarded for territory at the end of the game
• Points are awarded for prisoners → a stone is captured if it has no more liberties (liberties are “supply chains”)
• It is not allowed to commit suicide
• Ko rule: it is not allowed to play such that the board position repeats

SLIDE 16

End of Game

The game is over when both players have passed consecutively → Prisoners are removed and points are counted!

SLIDE 17

AlphaGo

• DeepMind was founded in 2010 as a startup in London
• Google bought DeepMind for $500M in 2014
• AlphaGo beat the European Go champion Fan Hui (2-dan) in October 2015
• AlphaGo beat Lee Sedol (9-dan), one of the best players in the world, in March 2016 (4 out of 5 games)
• A victory of AI in Go was thought to be 10 years in the future
• 1920 CPUs and 280 GPUs were used during the match against Lee Sedol → this equals around $1M, without counting the electricity used for training and playing
• Next game attacked by Google DeepMind: StarCraft

SLIDE 18

AlphaGo

Difficulty: the search space of future Go moves is larger than the number of particles in the known universe. AlphaGo combines:
• Policy Network
• Value Network
• Monte Carlo Tree Search (MCTS)

SLIDE 19

Policy Network Part 1

• Multi-layered neural network
• Supervised learning (SL)
• Goal: look at the board position and choose the next best move (does not care about winning, just about the next move)
• Is trained on millions of example moves made by strong human players on KGS (Kiseido Go Server)
• Matches strong human players about 57% of the time (mismatches are not necessarily mistakes)
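In spirit, the policy network maps a board position to a probability distribution over legal moves and picks from it; a toy sketch, with made-up scores standing in for the network’s output (everything here is a hypothetical illustration, not AlphaGo’s actual interface):

```python
import math

def softmax(scores):
    # Turns arbitrary real-valued scores into a probability distribution.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

legal_moves = ["D4", "Q16", "C3"]   # hypothetical candidate points
scores = [1.2, 0.7, -0.5]           # hypothetical network scores
probs = softmax(scores)
best_move = legal_moves[probs.index(max(probs))]
print(best_move)  # -> D4
```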

SLIDE 20

Policy Network Part 2

2 additional versions of the policy network: a stronger move picker and a faster move picker.
• The stronger version uses RL → trained more intensively by playing games to the end (trained by millions of training games against previous editions of itself; it does no reading, i.e., it does not try to simulate any future moves) → needed for creating enough training data for the value network
• The faster version is called the “rollout network” → does not look at the entire board but at a smaller window around the previous move → about 1000 times faster!

SLIDE 21

Value Network

• Multi-layered neural network
• Estimates the probability of each player winning the game
• Useful for speeding up reading: if a particular position is bad, one can skip any more moves along that line of play
• Trained on millions of example board positions which were randomly picked from games between two copies of AlphaGo’s strong move picker

SLIDE 22

MCTS

MCTS accomplishes reading and exploring. The full-power AlphaGo system then uses all of its “brains” in the following way:
→ Choose a few possible next moves using the basic move picker (the stronger version made AlphaGo weaker!)
→ Evaluate each next move using the value network and a deeper MC simulation (called a “rollout”, which uses the fast move picker)
→ This gives 2 independent guesses → use a parameter to combine the 2 guesses (the optimal parameter is 0.5)
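The combination of the two guesses can be sketched as a simple linear mix with a parameter λ (the slide’s optimal value is 0.5; the function name is illustrative):

```python
def leaf_value(value_net_estimate, rollout_result, lam=0.5):
    # Mix the value-network estimate with the rollout outcome.
    return (1 - lam) * value_net_estimate + lam * rollout_result

# With lam = 0.5, both guesses contribute equally.
print(leaf_value(0.6, 1.0))  # approximately 0.8
```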

SLIDE 23

How the strength of AlphaGo varies

SLIDE 24

References

• Silver et al., Mastering the game of Go with deep neural networks and tree search, Nature, Volume 529, 2016
• http://neuralnetworksanddeeplearning.com/chap1.html
• https://www.dcine.com/2016/01/28/alphago/
• Wikipedia: Go (game)