SLIDE 1

AlphaGo

2/17/17

SLIDE 2

Video

https://www.youtube.com/watch?v=g-dKXOlsf98

SLIDE 3

Figure from the AlphaGo Paper

[Figure: regular MCTS compared with MCTS augmented by neural networks]

SLIDE 4

AlphaGo Neural Networks

[Figure: where the neural networks plug into MCTS: the tree policy (selection) and the default policy (simulation)]

SLIDE 5

Step 1: learn to predict human moves

  • Used a large database of online expert games.
  • Learned two versions of the neural network (training sketch below):
    • A fast network P𝜌 for use in evaluation.
    • An accurate network P𝜏 for use in selection.

CS63 topic: neural networks (weeks 8–9)
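
Step 1 is ordinary supervised learning: maximize the probability the network assigns to the move the expert actually played. Below is a minimal sketch, assuming a toy linear-softmax stand-in for the policy network over flattened 19x19 board features; `sl_update`, the shapes, and the dataset names are illustrative, not from the slides or the paper.

```python
import numpy as np

N_FEATURES, N_MOVES = 361, 361                  # flattened 19x19 board, one output per point
rng = np.random.default_rng(0)
W = rng.normal(0, 0.01, (N_MOVES, N_FEATURES))  # toy one-layer "network"

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sl_update(board, expert_move, lr=0.01):
    """One SGD step on cross-entropy: make the expert's move more likely."""
    global W
    grad = softmax(W @ board)   # predicted move distribution
    grad[expert_move] -= 1.0    # p - one_hot(expert_move)
    W -= lr * np.outer(grad, board)

# Hypothetical training loop over the expert-game database:
# for board, expert_move in expert_positions:
#     sl_update(board, expert_move)
```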

SLIDE 6

Step 2: improve P𝜏 (accurate network)

  • Run large numbers of self-play games.
  • Update P𝜏 using reinforcement learning (sketch below).
    • Weights updated by stochastic gradient descent.

CS63 topic: reinforcement learning (weeks 6–7)
CS63 topic: stochastic gradient descent (week 3)
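
A minimal sketch of the step 2 update, again using a toy linear-softmax stand-in for the network. It is a REINFORCE-style policy-gradient step: every move made by the eventual winner becomes more likely, every move by the loser less likely. The `trajectory` format and reward convention are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

N = 361
W = np.random.default_rng(1).normal(0, 0.01, (N, N))  # toy stand-in for P𝜏

def rl_update(trajectory, z, lr=0.001):
    """One REINFORCE step over one self-play game.
    trajectory: (board_features, move_index) pairs for one player;
    z: +1 if that player eventually won, -1 if they lost."""
    global W
    for board, move in trajectory:
        grad_logp = -softmax(W @ board)
        grad_logp[move] += 1.0                    # one_hot(move) - p
        W += lr * z * np.outer(grad_logp, board)  # stochastic gradient step on z * log p
```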

SLIDE 7

Step 3: learn a better boardEval V𝜄

  • Use random samples from the self-play database.
  • Prediction target: probability that black wins from a given board (sketch below).

CS63 topic: avoiding overfitting (weeks 9–10)
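
A minimal sketch of the step 3 regression, assuming a toy linear model with a sigmoid output so the prediction is a probability; `sample_position` and `self_play_database` are hypothetical names. Sampling only one position per self-play game (as the AlphaGo paper does) keeps training examples nearly independent, which is the overfitting concern the slide flags.

```python
import numpy as np

N = 361
w = np.random.default_rng(2).normal(0, 0.01, N)  # toy stand-in for V𝜄

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def value_update(board, black_won, lr=0.01):
    """One SGD step on log loss; target is P(black wins), black_won in {0, 1}."""
    global w
    p = sigmoid(w @ board)
    w -= lr * (p - black_won) * board  # gradient of cross-entropy through sigmoid

# Hypothetical loop: one random position per self-play game, so successive
# training examples are decorrelated (less overfitting).
# for game in self_play_database:
#     board, black_won = sample_position(game)
#     value_update(board, black_won)
```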

SLIDE 8

AlphaGo Tree Policy (selection)

  • Select nodes randomly according to a per-move weight (formula sketched below).
  • The prior is determined by the improved policy network P𝜏.
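
The weight formula on the original slide did not survive extraction. For reference, the AlphaGo paper scores each move by its running value estimate Q plus an exploration bonus u proportional to the network's prior P and shrinking with the move's visit count N, then plays the highest-scoring move (the paper's version is a deterministic argmax rather than a random draw). A minimal sketch, with `node` a hypothetical object holding per-move Q, P, and N tables:

```python
import math

def select_move(node, c_puct=5.0):
    """PUCT-style weight: exploit moves with high value estimates (Q), but
    explore moves the policy network favors (P) that have few visits (N)."""
    sqrt_total = math.sqrt(sum(node.N.values()))
    def weight(a):
        u = c_puct * node.P[a] * sqrt_total / (1 + node.N[a])
        return node.Q[a] + u
    return max(node.moves, key=weight)
```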
SLIDE 9

AlphaGo Default Policy (simulation)

When expanding a node, its initial value combines:

  • an evaluation from value network V𝜄
  • a rollout using fast policy P𝜌

A rollout according to P𝜌 chooses each move at random with the probability the network estimates a human would pick it, rather than uniformly at random.
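
A minimal sketch of that default policy, assuming hypothetical helpers: `fast_policy(board)` returns P𝜌's move probabilities as a dict, `value_net(board)` returns V𝜄's estimate, and `board` is a stand-in Go implementation. The 50/50 blend of the two evaluations matches the equal weighting reported in the AlphaGo paper.

```python
import random

MIX = 0.5  # the paper weights the value network and the rollout equally

def rollout(board):
    """Play to the end, choosing each move with the probability the fast
    network P𝜌 thinks a human would pick it, not uniformly at random."""
    while not board.is_over():
        probs = fast_policy(board)  # hypothetical: move -> probability
        moves = list(probs)
        weights = [probs[m] for m in moves]
        board = board.play(random.choices(moves, weights=weights)[0])
    return board.winner()           # e.g. +1 black win, -1 white win

def evaluate_leaf(board):
    """Initial value of a newly expanded node: value net blended with a rollout."""
    return (1 - MIX) * value_net(board) + MIX * rollout(board)
```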

SLIDE 10

AlphaGo Results

  • Played Fan Hui (October 2015)
    • World #522.
    • AlphaGo won 5-0.
  • Played Lee Sedol (March 2016)
    • World #5, previously world #1 (2007-2011).
    • AlphaGo won 4-1.
  • Played against top pros (Dec 2016 – Jan 2017)
    • Included games against the world #1–4.
    • Games played online with short time limits.
    • AlphaGo won 60-0.
SLIDE 11

MCTS vs Bounded Min/Max

UCT / MCTS

  • Optimal with infinite rollouts.
  • Anytime algorithm: can give an answer immediately, and improves its answer with more time (see the sketch after this comparison).
  • A heuristic is not required, but can be used if available.
  • Handles incomplete information gracefully.

MinMax/Backward Induction

  • Optimal once the entire tree is explored or pruned.
  • Can prove the outcome of the game.
  • Can be made anytime-ish with iterative deepening.
  • A heuristic is required unless the game tree is small.
  • Hard to use on incomplete information games.
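
To make the anytime contrast concrete, here is a minimal sketch of the MCTS loop, assuming hypothetical `select`, `expand`, `simulate`, and `backpropagate` helpers: it can return its current best move whenever the clock runs out, and more time simply means more rollouts.

```python
import time

def anytime_mcts(root, seconds=1.0):
    """Run select/expand/simulate/backpropagate until time is up, then
    return the most-visited move; usable after even one iteration."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        leaf = select(root)           # tree policy walks down the tree
        child = expand(leaf)
        result = simulate(child)      # default policy rollout
        backpropagate(child, result)
    return max(root.children, key=lambda c: c.visits)
```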

SLIDE 12

Discussion: why use MCTS for go?

  • We’re using MCTS in lab because we don’t want to write new heuristics for every game.
  • AlphaGo is all about heuristics. They’re learned by neural networks, but they’re still heuristics.
  • MCTS handles randomness and incomplete information better than Min/Max.
  • Go is a deterministic, perfect information game.

So why does MCTS make so much sense for go?