

  1. AlphaGo, etc.

  2. Lab 4
     ● Due Feb. 29 (you have two weeks … 1.5 remaining)
     ● new game0.py with show_values for debugging

  3. Exam on Tuesday in lab
     ● I sent out a topics list last night.
     ● On Monday in lecture, we’ll be doing review problems, plus Q&A.
       ○ We’ll also do Q&A at the end today if there’s time.
       ○ I plan to send out review problems over the weekend.
     What sorts of questions will be on the exam?
     ● selecting an appropriate algorithm for various problems
       ○ state space search vs. local search; BFS vs. A*; minimax vs. MCTS...
     ● setting up an appropriate model for the problem and algorithm
       ○ generating neighbors; identifying a goal; describing utilities; choosing a heuristic...
     ● stepping through algorithms
       ○ identify the next state; list the order nodes are expanded; eliminate dominated strategies...

  4. AlphaGo neural networks [figure: AlphaGo's search compared with normal MCTS]

  5. AlphaGo neural networks [figure: where the networks plug into MCTS — the selection and evaluation steps]

  6. Step 1: learn to predict human moves [CS63 topic: neural networks, weeks 7, 14?]
     ● used a large database of online expert games
     ● learned two versions of the neural network
       ○ a fast network Pπ for use in evaluation
       ○ an accurate network Pσ for use in selection
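
Concretely, Step 1 is plain supervised classification: board position in, expert's move out. Below is a minimal sketch using a linear softmax model in NumPy; the feature encoding, array sizes, and function names are illustrative assumptions, not AlphaGo's actual deep convolutional architecture.

    import numpy as np

    N_FEATURES = 361  # one feature per point of a 19x19 board (assumption)
    N_MOVES = 361     # one output class per board location (assumption)

    def softmax(z):
        z = z - z.max()              # subtract max for numerical stability
        e = np.exp(z)
        return e / e.sum()

    def supervised_step(W, board, expert_move, lr=0.01):
        """One SGD step on cross-entropy loss: nudge the model toward
        predicting the move the human expert actually played."""
        probs = softmax(board @ W)       # P(move | board)
        grad = np.outer(board, probs)    # gradient of -log P(expert_move)
        grad[:, expert_move] -= board
        return W - lr * grad

    # Usage: loop over (board, expert_move) pairs from the expert-game database.
    W = np.zeros((N_FEATURES, N_MOVES))
    board = np.zeros(N_FEATURES)
    board[72] = 1.0                      # a made-up position
    W = supervised_step(W, board, expert_move=60)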

  7. Step 2: improve the accurate network [CS63 topics: reinforcement learning, weeks 9-10; stochastic gradient ascent, week 3]
     ● run large numbers of self-play games
     ● update the network using reinforcement learning
       ○ weights updated by stochastic gradient ascent
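
The self-play update is a policy-gradient (REINFORCE-style) step: moves from games the network won are made more probable, moves from lost games less probable. A sketch under the same linear-model assumptions as above; AlphaGo's published update acted on deep-network weights and differs in detail.

    import numpy as np

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    def self_play_update(W, states, moves, outcome, lr=0.01):
        """Stochastic gradient ASCENT over one self-play game.
        outcome: +1 if this player won the game, -1 if it lost.
        Each chosen move is reinforced or suppressed accordingly."""
        for board, move in zip(states, moves):
            probs = softmax(board @ W)
            g = -np.outer(board, probs)   # gradient of log P(move | board)
            g[:, move] += board
            W = W + lr * outcome * g      # ascend if we won, descend if we lost
        return W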

  8. Step 3: learn a board evaluation network, Vθ
     ● use random samples from the self-play database
     ● prediction target: probability that black wins from a given board
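
Step 3 is a regression problem: position in, estimated probability that black wins out. A logistic-regression sketch with an assumed feature vector; AlphaGo's Vθ was a deep network, trained on positions sampled from distinct self-play games so that training examples stay largely uncorrelated.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def value_step(w, board, black_won, lr=0.01):
        """One SGD step on log loss for V(board) ~ P(black wins).
        board: feature vector; black_won: 1.0 if black won this game, else 0.0."""
        pred = sigmoid(board @ w)                   # current win-probability estimate
        return w - lr * (pred - black_won) * board  # log-loss gradient step

    # Usage: draw (position, result) samples at random from the
    # self-play database and loop value_step over them.
    w = np.zeros(361)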

  9. AlphaGo tree policy
     Select nodes randomly according to weight: the prior is determined by the improved policy network Pρ.
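
Each child's weight combines its value estimate so far with an exploration bonus proportional to the network prior, shrinking as the node accumulates visits. A sketch: since the slide describes weighted random selection, this version samples; the published AlphaGo rule instead deterministically takes the child maximizing Q + u. The dict layout and the c_puct constant are assumptions.

    import math, random

    def select_child(children, c_puct=1.0):
        """Prior-weighted tree policy. Each child is a dict with:
          'Q'     -- mean value from search so far
          'N'     -- visit count
          'prior' -- Prho(move), the improved policy network's probability."""
        total = sum(ch['N'] for ch in children) + 1
        def weight(ch):
            u = c_puct * ch['prior'] * math.sqrt(total) / (1 + ch['N'])
            return max(1e-9, ch['Q'] + u)   # clip so sampling weights stay positive
        return random.choices(children,
                              weights=[weight(ch) for ch in children])[0]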

  10. AlphaGo default policy
      When expanding a node, its initial value combines:
      ● an evaluation from the value network Vθ
      ● a rollout using the fast policy Pπ
      A rollout according to Pπ selects random moves with the estimated probability a human would select them, instead of uniformly randomly.
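
The combination is a simple mix: the published AlphaGo scores a leaf as V = (1 - λ)·Vθ(leaf) + λ·(rollout outcome), with λ = 0.5. In this sketch value_net, fast_policy, and the board's play/is_over/winner methods are assumed interfaces, not real APIs.

    import random

    def evaluate_leaf(board, value_net, fast_policy, lam=0.5):
        """Blend the value network's estimate with one fast-policy rollout."""
        v = value_net(board)                  # Vtheta(board): estimated P(black wins)
        state = board
        while not state.is_over():
            moves, probs = fast_policy(state) # Ppi: human-like move distribution
            # pick each move with the probability a human would choose it,
            # rather than uniformly at random
            state = state.play(random.choices(moves, weights=probs)[0])
        z = 1.0 if state.winner() == 'black' else 0.0
        return (1 - lam) * v + lam * z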

  11. AlphaGo results
      ● Beat a low-ranked professional player (Fan Hui) 5 games to 0.
      ● Will take on a top professional player (Lee Sedol) March 8-15 in Seoul.
      ● There are good reasons to think AlphaGo may lose:
        ○ AlphaGo’s estimated Elo rating is lower than Lee’s.
        ○ Professionals who analyzed AlphaGo’s moves don’t think it can win.
        ○ Deep Blue lost to Kasparov on its first attempt, after beating lower-ranked grandmasters.

  12. Transforming normal to extensive form
      Key idea: represent simultaneous moves with information sets.
      Normal form:
                     Player 2
                     A      B
      Player 1  A   5,5    2,8
                B   1,3    3,0
      [figure: equivalent game tree in which player 2's decision nodes lie in a single information set, with leaf payoffs (5,5), (2,8), (1,3), (3,0)]

  13. Transforming extensive to normal form
      Key idea: strategies are complete policies, specifying an action for every information set.
      [figure: game tree in which player 1 acts at three information sets and player 2 at one, each with actions L/R]
      Normal form (player 1's eight strategies vs. player 2's L/R):
               L      R
      LLL     1,2    4,4
      LLR     1,2    4,4
      LRL     0,3    4,4
      LRR     0,3    4,4
      RLL     1,4    3,2
      RLR     1,4    0,0
      RRL     1,4    3,2
      RRR     1,4    0,0
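
The eight rows above come from player 1 choosing one of two actions at each of three information sets: 2^3 = 8 complete policies. A small sketch of that enumeration; the information-set names are made up for illustration.

    from itertools import product

    def pure_strategies(info_sets):
        """Enumerate a player's pure strategies: a complete policy assigns
        one action to every information set. info_sets: name -> action list."""
        names = list(info_sets)
        return [dict(zip(names, combo))
                for combo in product(*(info_sets[n] for n in names))]

    # Player 1 acts at three information sets with actions L/R each,
    # giving the strategies LLL ... RRR.
    strats = pure_strategies({'I1': ['L', 'R'], 'I2': ['L', 'R'], 'I3': ['L', 'R']})
    print([''.join(s.values()) for s in strats])
    # ['LLL', 'LLR', 'LRL', 'LRR', 'RLL', 'RLR', 'RRL', 'RRR']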

  14. DESIGN DIMENSIONS
      - modularity
      - representation scheme
      - discreteness
      - planning horizon
      - uncertainty
      - dynamic environment
      - number of agents
      - learning
      - computational limitations
      STATE SPACE SEARCH
      - state space modeling
      - completeness
      - optimality
      - time/space complexity
      - Uninformed Search: depth-first, breadth-first, uniform cost
      - Informed Search: greedy, A*, heuristics/admissibility
      - Improvements: iterative deepening; branch and bound, IDA*; multiple searches
      LOCAL SEARCH
      - state spaces
      - cost functions
      - neighbor generation
      - state representation
      - satisfiability
      - gradient ascent
      - Hill-Climbing: random restarts, random moves, simulated annealing, temperature/decay rate
      - Population Search: (stochastic) beam search, gibbs sampling, genetic algorithms (select/crossover/mutate)
      GAME THEORY
      - Utility: preferences, expected utility maximizing
      - Extensive-Form Games: game tree representation, backwards induction, minimax, alpha-beta pruning, heuristic evaluation
      - Normal Form Games: payoff matrix repr., removing dominated strats, pure-strategy Nash eq. (find one), mixed strategy Nash eq. (verify one), matrix/tree equivalence
      MONTE CARLO SEARCH
      - random sampling evaluation
      - explore/exploit tradeoff
      - Monte Carlo Tree Search: tree policy, default policy, UCT/UCB
