Bandit-based Search for Constraint Programming


  1. Bandit-based Search for Constraint Programming
     Manuel Loth (1,2,4), Michèle Sebag (2,4,1), Youssef Hamadi (3,1), Marc Schoenauer (4,2,1), Christian Schulte (5)
     1: Microsoft-INRIA joint centre; 2: LRI, Univ. Paris-Sud and CNRS; 3: Microsoft Research Cambridge; 4: INRIA Saclay; 5: KTH, Stockholm
     Review AERES, Nov. 2013

  2. Search/Optimization and Machine Learning
     Different learning contexts:
     ◮ Supervised (from examples) vs. reinforcement (from reward)
     ◮ Off-line (static) vs. on-line (while searching)
     Here: use on-line reinforcement learning (MCTS) to improve CP search.

  3. Main idea
     Constraint Programming:
     ◮ Explore a search tree
     ◮ Heuristics: (learn to) order variables & values (see the sketch below)
     Monte-Carlo Tree Search:
     ◮ A tree-search method
     ◮ A breakthrough for games and planning
     Hybridizing MCTS and CP: Bandit-based Search for Constraint Programming (BaSCoP).
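
To make the CP side concrete, here is a minimal depth-first search skeleton in which pluggable heuristics order the variables and values. The set-based domains and the `propagate` hook are simplified assumptions for illustration, not a real solver's API and not the paper's method.

```python
def dfs(domains, select_var, order_values, propagate):
    """Depth-first CP search: heuristics choose the branching variable and value."""
    if any(len(d) == 0 for d in domains.values()):
        return None                                   # failure: a domain is empty
    if all(len(d) == 1 for d in domains.values()):
        return {v: next(iter(d)) for v, d in domains.items()}  # all assigned
    unassigned = {v: d for v, d in domains.items() if len(d) > 1}
    var = select_var(unassigned)                      # variable-ordering heuristic
    for val in order_values(var, domains[var]):       # value-ordering heuristic
        child = {v: set(d) for v, d in domains.items()}
        child[var] = {val}                            # branching decision
        if propagate(child):                          # prune domains; False on failure
            sol = dfs(child, select_var, order_values, propagate)
            if sol is not None:
                return sol
    return None

# Example: smallest-domain-first, increasing values, trivial propagation.
sol = dfs({"x": {1, 2}, "y": {1, 2, 3}},
          select_var=lambda u: min(u, key=lambda v: len(u[v])),
          order_values=lambda v, dom: sorted(dom),
          propagate=lambda doms: all(doms.values()))
```

The quality of `select_var` and `order_values` determines how quickly this search finds solutions, which is exactly the slot where learned, bandit-based heuristics can plug in.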

  4. Overview
     ◮ MCTS
     ◮ BaSCoP
     ◮ Experimental validation
     ◮ Conclusions and Perspectives

  5. The Multi-Armed Bandit problem (Lai & Robbins, 1985)
     In a casino, one wants to maximize one's gains while playing: lifelong learning.
     The Exploration vs. Exploitation dilemma:
     ◮ Play the best arm so far? Exploitation.
     ◮ But there might exist better arms... Exploration.

  6. The Multi-Armed Bandit problem (2)
     ◮ K arms; the i-th arm gives reward 1 with probability µ_i, 0 otherwise
     ◮ At each time t, one selects an arm i*_t and gets a reward r_t
     ◮ n_{i,t}: number of times arm i has been selected in [0, t]
     ◮ µ̂_{i,t}: average reward of arm i in [0, t]
     Upper Confidence Bound (Auer et al., 2002): be optimistic when facing the unknown.
     Select argmax_i { µ̂_{i,t} + C * sqrt( log(Σ_j n_{j,t}) / n_{i,t} ) }
     ε-greedy:
     ◮ with probability 1 − ε, select argmax_i µ̂_{i,t} (exploitation)
     ◮ else, select an arm uniformly at random (exploration)
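
A minimal Python sketch of the two selection rules on this slide. The Bernoulli arm probabilities, the horizon, and the constant C are illustrative assumptions, not values from the talk.

```python
import math
import random

def ucb_select(counts, rewards, C=1.0):
    """UCB rule: argmax_i  mu_hat_{i,t} + C * sqrt(log(sum_j n_{j,t}) / n_{i,t})."""
    for i, n in enumerate(counts):
        if n == 0:
            return i                                    # play every arm once first
    total = sum(counts)
    return max(range(len(counts)),
               key=lambda i: rewards[i] / counts[i]
                             + C * math.sqrt(math.log(total) / counts[i]))

def eps_greedy_select(counts, rewards, eps=0.1):
    """With probability 1 - eps exploit the empirically best arm, else explore."""
    if random.random() < eps or 0 in counts:
        return random.randrange(len(counts))            # exploration
    return max(range(len(counts)),
               key=lambda i: rewards[i] / counts[i])    # exploitation

# Bernoulli bandit: arm i pays 1 with hidden probability mu[i], 0 otherwise.
mu = [0.2, 0.5, 0.8]
counts, rewards = [0] * len(mu), [0.0] * len(mu)
for t in range(1000):
    i = ucb_select(counts, rewards)
    counts[i] += 1
    rewards[i] += 1.0 if random.random() < mu[i] else 0.0
print(counts)   # pulls should concentrate on the best arm (index 2)
```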

  7. Monte-Carlo Tree Search (Kocsis & Szepesvári, 2006)
     UCT = UCB for Trees: gradually grow the search tree by iterating tree-walks.
     Building blocks of a tree-walk:
     ◮ Select next action (bandit phase, within the search tree)
     ◮ Add a node (grow a leaf of the search tree)
     ◮ Select further actions (random phase, roll-out)
     ◮ Compute instant reward (evaluate)
     ◮ Update information in visited nodes of the search tree (propagate)
     Returned solution: the path visited most often.
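
A compact sketch of one UCT tree-walk implementing the building blocks listed above. The `actions` list and the `simulate` roll-out hook are hypothetical placeholders, not the paper's CP instantiation.

```python
import math
import random

class Node:
    def __init__(self):
        self.children = {}   # action -> Node
        self.visits = 0
        self.value = 0.0     # sum of rewards propagated through this node

def tree_walk(root, actions, simulate, C=1.0):
    """One MCTS iteration: select (bandit), grow a leaf, roll out, propagate."""
    path, node = [root], root
    # Bandit phase: descend while the current node is fully expanded.
    while node.children and len(node.children) == len(actions):
        node = max(node.children.values(),
                   key=lambda c: c.value / c.visits
                                 + C * math.sqrt(math.log(node.visits) / c.visits))
        path.append(node)
    # Grow: attach one new child, a new leaf of the search tree.
    untried = [a for a in actions if a not in node.children]
    if untried:
        child = Node()
        node.children[random.choice(untried)] = child
        path.append(child)
    # Random phase + evaluation: roll out from the leaf, get an instant reward.
    reward = simulate()
    # Propagate: update statistics in every visited node.
    for n in path:
        n.visits += 1
        n.value += reward

# After many walks, the returned solution follows the most-visited children.
root = Node()
for _ in range(1000):
    tree_walk(root, ["left", "right"], simulate=lambda: random.random())
best_action = max(root.children, key=lambda a: root.children[a].visits)
```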

  22. Overview
      ◮ MCTS
      ◮ BaSCoP
      ◮ Experimental validation
      ◮ Conclusions and Perspectives
