Bandit-based Search for Constraint Programming, Manuel Loth (PowerPoint presentation)



SLIDE 1

Bandit-based Search for Constraint Programming

Manuel Loth (1,2,4), Michèle Sebag (2,4,1), Youssef Hamadi (3,1), Marc Schoenauer (4,2,1), Christian Schulte (5)

(1) Microsoft-INRIA joint centre, (2) LRI, Univ. Paris-Sud and CNRS, (3) Microsoft Research Cambridge, (4) INRIA Saclay, (5) KTH, Stockholm

Review AERES, Nov. 2013


1 / 23

SLIDE 2

Search/Optimization and Machine Learning

Different Learning contexts

◮ Supervised (from examples) vs reinforcement (from reward)
◮ Off-line (static) vs on-line (while searching)

Here: use on-line reinforcement learning (MCTS) to improve CP search

SLIDE 3

Main idea

Constraint Programming

◮ Explore a search tree
◮ Heuristics: (learn to) order variables & values

Monte-Carlo Tree Search
◮ A tree-search method
◮ Breakthrough for games and planning

Hybridizing MCTS and CP: Bandit-based Search for Constraint Programming

SLIDE 4

Overview

◮ MCTS
◮ BaSCoP
◮ Experimental validation
◮ Conclusions and Perspectives

SLIDE 5

The Multi-Armed Bandit problem

Lai & Robbins, 1985

In a casino, one wants to maximize one's gains while playing.

Lifelong learning: the Exploration vs Exploitation dilemma

◮ Play the best arm so far? (exploitation)
◮ But there might exist better arms... (exploration)

SLIDE 6

The Multi-Armed Bandit problem (2)

◮ K arms; arm i yields reward 1 with probability µ_i, 0 otherwise
◮ At each time t, one selects an arm i*_t and gets a reward r_t

n_{i,t} = number of times arm i has been selected in [0, t]
µ̂_{i,t} = average reward of arm i in [0, t]

Upper Confidence Bound (Auer et al., 2002)

Be optimistic when facing the unknown: select

    argmax_i { µ̂_{i,t} + C · sqrt( log( Σ_j n_{j,t} ) / n_{i,t} ) }

ε-greedy
◮ with probability 1 − ε, select argmax_i { µ̂_{i,t} } (exploitation)
◮ else select an arm uniformly at random (exploration)
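The two selection rules above can be sketched in a few lines of Python (an illustration; the function names and the counts/means bookkeeping are our own, not from the talk):

```python
import math
import random

def ucb_select(counts, means, C=1.0):
    """Pick the arm maximizing mean_i + C * sqrt(log(t) / n_i), t = total plays.

    Arms never tried yet get an infinite bonus, so they are played first."""
    t = sum(counts)
    best, best_score = 0, float("-inf")
    for i, (n, mu) in enumerate(zip(counts, means)):
        score = float("inf") if n == 0 else mu + C * math.sqrt(math.log(t) / n)
        if score > best_score:
            best, best_score = i, score
    return best

def epsilon_greedy_select(counts, means, eps=0.1):
    """With probability 1 - eps exploit the best empirical mean, else explore uniformly."""
    if random.random() < eps:
        return random.randrange(len(means))
    return max(range(len(means)), key=lambda i: means[i])
```

After each play, the caller would update `counts[i]` and the running average `means[i]` with the observed reward; larger `C` (or `eps`) shifts the balance toward exploration.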

SLIDE 7

Monte-Carlo Tree Search

Kocsis & Szepesvári, 2006

UCT == UCB for Trees: gradually grow the search tree

◮ Iterate tree-walks
◮ Building blocks:
  ◮ Select next action (bandit phase)
  ◮ Add a node (grow a leaf of the search tree)
  ◮ Select next action bis (random phase, roll-out)
  ◮ Compute instant reward (evaluate)
  ◮ Update information in visited nodes of the search tree (propagate)
◮ Returned solution: path visited most often

[Figure: the explored tree within the full search tree]
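The tree-walk loop above can be written out as follows (a generic MCTS sketch, not the BaSCoP code; the `state` interface with `actions()`, `apply()`, `is_terminal()`, `reward()` is a hypothetical stand-in):

```python
import math
import random

class Node:
    """One node of the gradually grown search tree."""
    def __init__(self):
        self.children = {}  # action -> Node
        self.visits = 0
        self.value = 0.0    # mean reward of the tree-walks through this node

def ucb(parent, child, C):
    """UCB score of a child, as used in the bandit phase."""
    if child.visits == 0:
        return float("inf")
    return child.value + C * math.sqrt(math.log(parent.visits) / child.visits)

def tree_walk(root, state, C=1.4):
    """One MCTS iteration: bandit descent, leaf growth, roll-out, propagation."""
    path, node = [root], root
    # 1. Bandit phase: descend while every action already has a child node
    while not state.is_terminal() and len(node.children) == len(state.actions()):
        a = max(state.actions(), key=lambda a: ucb(node, node.children[a], C))
        state.apply(a)
        node = node.children[a]
        path.append(node)
    # 2. Grow one leaf of the search tree
    if not state.is_terminal():
        a = random.choice([a for a in state.actions() if a not in node.children])
        node.children[a] = Node()
        state.apply(a)
        node = node.children[a]
        path.append(node)
    # 3. Random phase (roll-out) down to a terminal state
    while not state.is_terminal():
        state.apply(random.choice(state.actions()))
    # 4. Evaluate, then propagate along the visited nodes
    r = state.reward()
    for n in path:
        n.visits += 1
        n.value += (r - n.value) / n.visits
    return r
```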

SLIDE 22

Overview

◮ MCTS
◮ BaSCoP
◮ Experimental validation
◮ Conclusions and Perspectives

SLIDE 23

Adaptation

Main issues

◮ Which default policy? (random phase)
◮ Which reward?
◮ Which selection rule?

Desired

◮ As problem-independent as possible
◮ Compatible with multiple restarts
◮ (Some) guarantees of completeness

SLIDE 24

Default policy: Depth-first search (DFS)

◮ Enforces completeness
◮ Accounts for priors about values (some are better than others; neighborhood of the last best solution)
◮ Limited memory required: under each MCTS leaf node, store the current DFS path (assignments on the left of the DFS path are closed)

SLIDE 25

Reward

◮ With multiple restarts, rewards cannot be attached to tree nodes
◮ → rewards are attached to elementary assignments, i.e. (variable = value)

Guiding principles
◮ Variables: fail first (existing heuristics perform well)
◮ Values: fail deep
  → reward(var = val) = 1 if the failure is deeper than the (local) average, 0 otherwise

Discussion

◮ Compatible with multiple restarts
◮ Noise: var might occur at different depths
◮ But noise equally affects all values of var
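The "fail deep" reward can be sketched as a small bookkeeping class (illustrative only; the class and field names are our own, and the actual BaSCoP code may aggregate failure depths differently):

```python
class FailDeepReward:
    """Per-assignment 'fail deep' reward: 1 if the current failure is deeper
    than the running average failure depth seen for that (variable, value)
    assignment, 0 otherwise."""
    def __init__(self):
        self.count = {}      # (var, val) -> number of observed failures
        self.avg_depth = {}  # (var, val) -> running average failure depth

    def reward(self, var, val, fail_depth):
        key = (var, val)
        n = self.count.get(key, 0)
        avg = self.avg_depth.get(key, 0.0)
        # 1 iff this failure is strictly deeper than the local average so far
        r = 1.0 if n > 0 and fail_depth > avg else 0.0
        # incremental update of the running average
        self.count[key] = n + 1
        self.avg_depth[key] = avg + (fail_depth - avg) / (n + 1)
        return r
```

Keeping statistics per assignment rather than per tree node is what makes the reward survive restarts: the (variable = value) pair is meaningful in every restarted tree.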

SLIDE 26

Selection rules

L-value: left value (0); R-value: right value (1)

Baselines (non-adaptive)
◮ Uniform
◮ ε-left: with probability 1 − ε select the L-value, otherwise the R-value

Adaptive selection rules

◮ UCB: select the value val maximizing

    reward(var = val) + C · sqrt( log( Σ_v n(var = v) ) / n(var = val) )

◮ UCB-left: same, but C_left = ρ · C_right, with ρ > 1
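As a sketch, the UCB-left rule for a binary (L-value/R-value) choice might look like this (the `stats` dict is hypothetical bookkeeping, not the Gecode/BaSCoP data structures):

```python
import math

def ucb_left_select(stats, var, C_right=0.1, rho=2.0):
    """Choose the value (0 = left, 1 = right) for `var` maximizing
    mean_reward + C * sqrt(log(total) / n), with a larger constant on the
    left branch (C_left = rho * C_right) to bias exploration leftward.

    stats[(var, val)] = (n, mean_reward): selection count and average reward."""
    total = sum(stats.get((var, v), (0, 0.0))[0] for v in (0, 1))
    best_val, best_score = 0, float("-inf")
    for val, C in ((0, rho * C_right), (1, C_right)):
        n, mean = stats.get((var, val), (0, 0.0))
        # untried values get priority; otherwise standard UCB score
        score = float("inf") if n == 0 else mean + C * math.sqrt(math.log(total) / n)
        if score > best_score:
            best_val, best_score = val, score
    return best_val
```

With equal empirical rewards the larger left constant breaks the tie toward the L-value, which mirrors the ε-left baseline's positional bias while still adapting to observed rewards.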

SLIDE 27

Overview

◮ MCTS
◮ BaSCoP
◮ Experimental validation
◮ Conclusions and Perspectives

SLIDE 28

Goal of experiments

Compare BaSCoP with baselines

◮ DFS alone
◮ Adaptive and non-adaptive selection rules

Genericity

◮ Robustness wrt multiple restarts
◮ Sensitivity analysis wrt parameters

SLIDE 29

Experimental setting

Algorithmic framework: Gecode

http://gecode.org

Top policies:
  non-adaptive: Uniform, ε-Left
  adaptive: UCB, UCB-Left
Bottom policy: Depth-First Search

Parameters
◮ ε ∈ {.05, .1, .15, .2} (ε-Left)
◮ C ∈ {.05, .1, .2, .5} (UCB)
◮ ρ ∈ {1, 2, 4, 8} (UCB-Left)

SLIDE 30

Benchmark problems

Job-shop scheduling

◮ 40 Taillard instances

http://mistic.heig-vd.ch/taillard/

◮ Multiple restarts (Luby sequence), neighborhood search
◮ Performance: mean relative error (to best known results)

Car-sequencing

◮ 70 instances, circa 200 n-ary variables
◮ Performance: number of violations
◮ No restart

All results averaged over 11 runs
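The Luby restart sequence mentioned above (1, 1, 2, 1, 1, 2, 4, ...) can be generated with the standard recurrence (a common textbook formulation, independent of the talk's implementation):

```python
def luby(i):
    """i-th term (1-based) of the Luby restart sequence: 1, 1, 2, 1, 1, 2, 4, ...

    If i == 2^k - 1, the term is 2^(k-1); otherwise recurse on the position
    within the current block."""
    k = 1
    while (1 << k) - 1 < i:
        k += 1
    if (1 << k) - 1 == i:
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)
```

In restart-based search the i-th restart is granted a cutoff proportional to `luby(i)`, which balances many short runs against occasional long ones.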

SLIDE 31

Structures of visited trees

[Figure: typical tree shapes under Uniform, UCB, ε-Left, and UCB-Left on a JSP Taillard 15×20 instance]

SLIDE 32

Experimental Results

State-of-the-art results on several instances (200 000 tree-walks)

[Figure: mean relative error to the best-known solution vs. number of tree-walks (20,000 to 100,000), for DFS, Balanced, ε-left (ε = 0.15), UCB (C = 0.1), and UCB-left (C_l = 0.2, C_r = 0.1)]

Sample result: mean relative error on Taillard 11-20

SLIDE 33

Car Sequencing

[Figure: car assembly line with option stations and capacities (1/2, 2/3, 2/5)]

◮ Car assembly line, different options on the ordered cars
◮ Stalls can handle a given number of cars
◮ Arrange the car sequence so as not to exceed any capacity
  → minimize the number of empty stalls

n-ary variables, no restart, no positional bias of values

SLIDE 34

Car Sequencing

[Figure: number of empty stalls per instance (5 to 35), DFS vs. UCB with C ∈ {0.05, 0.1, 0.2, 0.5}]

BaSCoP is modestly but significantly better than DFS... but both are significantly worse than ad hoc heuristics

SLIDE 35

Overview

◮ MCTS
◮ BaSCoP
◮ Experimental validation
◮ Conclusions and Perspectives

SLIDE 36

Conclusion

BaSCoP integrated in the Gecode framework

◮ Generic heuristics for value ordering
◮ Compatible with multiple restarts
◮ DFS as rollout policy provides completeness guarantees
◮ Improves on DFS on 2 of 3 benchmark families
◮ State-of-the-art CP results on JSP without any ad hoc heuristics

SLIDE 37

Perspectives

Extensions

◮ Rank-based reward for values (for n-ary contexts)
◮ When there is no restart, full MCTS (rewards attached to partial assignments)

◮ Rewards for variable ordering
◮ Control of the parallelization scheme (adaptive work stealing)
