  1. Reinforcement Learning: Course Outline
  Context
  Algorithms: value functions; optimal policy; temporal differences and eligibility traces; Q-learning
  Playing Go: MoGo
  Feature Selection as a Game: problem setting; Monte-Carlo Tree Search; Feature Selection: the FUSE algorithm; experimental validation
  Active Learning as a Game: problem setting; the BAAL algorithm; experimental validation
  Constructive Induction

  2. Go as an AI Challenge
  Features
  ◮ Number of games: about 2 × 10^170, far more than the number of atoms in the observable universe
  ◮ Branching factor: ~200 (vs. ~30 for chess)
  ◮ How to assess a game position?
  ◮ Local and global features (symmetries, liberties, ...)
  Principles of MoGo (Gelly & Silver, 2007)
  ◮ A weak but unbiased assessment function: Monte-Carlo based
  ◮ Let the machine play against itself and build its own strategy

  3. Weak Unbiased Assessment: Monte-Carlo
  Brügmann (1993)
  1. While possible, add a stone (white and black alternately)
  2. Compute Win(black)
  3. Average over many such playouts
  Remark: the point is to be unbiased. If there exist situations where you (wrongly) think you are in good shape, then you go there, and you are in bad shape...
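
To make the idea concrete, here is a minimal sketch of Monte-Carlo position assessment. The game interface (`copy`, `legal_moves`, `play`, `winner`) is a hypothetical stand-in for a real Go engine, not the MoGo code.

```python
import random

def playout(position):
    """Play uniformly random legal moves until the game ends; return 1 if Black wins, else 0.
    `position` is assumed to expose copy(), legal_moves(), play(move) and winner()."""
    pos = position.copy()
    while True:
        moves = pos.legal_moves()
        if not moves:
            break
        pos.play(random.choice(moves))
    return 1 if pos.winner() == "black" else 0

def monte_carlo_value(position, n_playouts=1000):
    """Unbiased (but noisy) estimate of Black's winning probability from `position`."""
    wins = sum(playout(position) for _ in range(n_playouts))
    return wins / n_playouts
```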

  4. Build a Strategy: Monte-Carlo Tree Search
  In a given situation: select a move (Multi-Armed Bandit)
  At the end of the game:
  1. Assess the final position (Monte-Carlo)
  2. Update the reward of all moves played

  5. Select a Move: the Exploration vs Exploitation Dilemma
  Multi-Armed Bandits (Lai & Robbins, 1985)
  ◮ In a casino, one wants to maximize one's gains while playing
  ◮ Play the best arms so far? Exploitation
  ◮ But there might exist better arms... Exploration

  6. Multi-Armed Bandits, cont'd
  Auer et al. 2001, 2002; Kocsis & Szepesvári 2006
  For each arm (move):
  ◮ Reward: Bernoulli variable with mean $\mu_i$, $0 \le \mu_i \le 1$
  ◮ Empirical estimate: $\hat{\mu}_i \pm \mathrm{Confidence}(n_i)$, with $n_i$ the number of trials
  Decision: optimism in front of the unknown!
  Select $i^* = \operatorname{argmax}_i \left( \hat{\mu}_i + C \sqrt{\frac{\log\left(\sum_j n_j\right)}{n_i}} \right)$

  7. Multi-Armed Bandits, cont'd
  Auer et al. 2001, 2002; Kocsis & Szepesvári 2006
  For each arm (move):
  ◮ Reward: Bernoulli variable with mean $\mu_i$, $0 \le \mu_i \le 1$
  ◮ Empirical estimate: $\hat{\mu}_i \pm \mathrm{Confidence}(n_i)$, with $n_i$ the number of trials
  Decision: optimism in front of the unknown!
  Select $i^* = \operatorname{argmax}_i \left( \hat{\mu}_i + C \sqrt{\frac{\log\left(\sum_j n_j\right)}{n_i}} \right)$ (sketched below)
  Variants
  ◮ Take into account the standard deviation of $\hat{\mu}_i$
  ◮ Trade-off controlled by $C$
  ◮ Progressive widening
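
A minimal sketch of the selection rule above; `counts` and `rewards` are illustrative names, and the exploration constant `C` is the trade-off parameter of the slide.

```python
import math

def ucb_select(counts, rewards, C=1.0):
    """Pick the arm maximizing the empirical mean plus a UCB exploration bonus.
    counts[i]  -- number of times arm i was played (n_i)
    rewards[i] -- cumulative reward collected by arm i
    """
    total = sum(counts)
    best_arm, best_score = None, float("-inf")
    for i, n_i in enumerate(counts):
        if n_i == 0:
            return i  # play every arm once before trusting the formula
        score = rewards[i] / n_i + C * math.sqrt(math.log(total) / n_i)
        if score > best_score:
            best_arm, best_score = i, score
    return best_arm
```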

  8. Monte-Carlo Tree Search
  Comments: MCTS grows an asymmetric tree
  ◮ The most promising branches are explored more often,
  ◮ so their assessment becomes more precise
  ◮ Needs heuristics to deal with many arms...
  ◮ Shares information among branches
  MoGo: world champion in 2006, 2007, 2009; first program to beat a 7th-dan player on the 19 × 19 board

  9. Reinforcement Learning: Course Outline
  Context
  Algorithms: value functions; optimal policy; temporal differences and eligibility traces; Q-learning
  Playing Go: MoGo
  Feature Selection as a Game: problem setting; Monte-Carlo Tree Search; Feature Selection: the FUSE algorithm; experimental validation
  Active Learning as a Game: problem setting; the BAAL algorithm; experimental validation
  Constructive Induction

  10. When Learning Amounts to Feature Selection
  Bioinformatics
  ◮ 30,000 genes
  ◮ few examples (they are expensive)
  ◮ goal: find the relevant genes

  11. Problem Setting
  Goals
  • Selection: find a subset of features
  • Ranking: order the features
  Formulation
  Let $\mathcal{F} = \{f_1, \ldots, f_d\}$ be the set of features, and define
  $G : \mathcal{P}(\mathcal{F}) \to \mathbb{R}$, $F \subseteq \mathcal{F} \mapsto \mathrm{Err}(F)$ = minimal error of the hypotheses built on $F$.
  Find $\operatorname{argmin}(G)$.
  Difficulties
  • A combinatorial optimization problem ($2^d$ subsets)
  • over an unknown function $G$...

  12. Approaches
  Filter (univariate method)
  Define $\mathrm{score}(f_i)$; iteratively add the features maximizing the score, or iteratively remove the features minimizing it (a greedy sketch follows this slide).
  + simple and cheap
  − very local optima
  Note: one can backtrack: better optima, but more expensive.
  Wrapper (multivariate method)
  Measure the quality of features together with other features: estimate $G(f_{i_1}, \ldots, f_{i_k})$.
  − expensive: one estimate = one learning problem
  + better optima
  Hybrid methods.
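
A minimal sketch of greedy forward selection under the views above; the name and signature `score(f, selected)` are illustrative, and `selected` may simply be ignored by a univariate filter score.

```python
def greedy_forward_selection(features, score, k):
    """Greedy forward selection: repeatedly add the feature that maximizes `score`.
    `score(f, selected)` may ignore `selected` (filter style) or retrain a model
    on selected + [f] (wrapper style)."""
    selected, remaining = [], set(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(f, selected))
        selected.append(best)
        remaining.remove(best)
    return selected
```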

  13. Filter Approaches
  Notation
  Training set: $E = \{(x_i, y_i),\ i = 1..n,\ y_i \in \{-1, 1\}\}$; $f(x_i)$ = value of feature $f$ for example $x_i$.
  Information gain (decision trees)
  $p([f = v]) = \Pr(y = 1 \mid f(x_i) = v)$
  $QI([f = v]) = -p \log p - (1 - p) \log(1 - p)$
  $QI = \sum_v p(v)\, QI([f = v])$
  Correlation
  $\mathrm{corr}(f) = \frac{\sum_i f(x_i)\, y_i}{\sqrt{\sum_i f(x_i)^2 \times \sum_i y_i^2}} \propto \sum_i f(x_i)\, y_i$
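
A minimal sketch of these two filter scores with NumPy, for a discrete feature column `f` and labels `y` in {-1, 1} (both NumPy arrays; the function names are illustrative).

```python
import numpy as np

def conditional_entropy_score(f, y):
    """Entropy of P(y=1 | f=v), weighted by P(f=v) -- the QI quantity of the slide.
    Lower values mean a more informative feature."""
    score = 0.0
    for v in np.unique(f):
        mask = (f == v)
        p = np.mean(y[mask] == 1)
        if 0 < p < 1:
            score += np.mean(mask) * (-p * np.log(p) - (1 - p) * np.log(1 - p))
    return score

def correlation_score(f, y):
    """Cosine-style correlation between the feature column and the labels."""
    return np.dot(f, y) / np.sqrt(np.dot(f, f) * np.dot(y, y))
```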

  14. Wrapper Approaches
  Principle: generate and test
  Given a list of candidates $L = \{f_1, \ldots, f_p\}$:
  • Generate a candidate subset $F$
  • Compute $G(F)$:
    • learn $h_F$ from $E_{|F}$
    • test $h_F$ on a test set, giving $\hat{G}(F)$
  • Update $L$.
  Algorithms
  • hill-climbing / multiple restart
  • genetic algorithms (Vafaie & De Jong, IJCAI'95)
  • (*) genetic programming & feature construction (Krawiec, GPEH'01)
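
A minimal sketch of a hill-climbing wrapper, where `estimate_error(F)` is a hypothetical callback that trains a model on the feature subset `F` and returns its held-out error (the learn-then-test step of the slide).

```python
import random

def hill_climbing_wrapper(features, estimate_error, n_iter=100):
    """Local search over feature subsets: flip one feature at a time and
    keep the move if the estimated generalization error improves."""
    features = list(features)
    current = set(random.sample(features, k=max(1, len(features) // 2)))
    best_err = estimate_error(current)
    for _ in range(n_iter):
        f = random.choice(features)
        candidate = current ^ {f}          # add f if absent, remove it otherwise
        if not candidate:
            continue
        err = estimate_error(candidate)
        if err < best_err:
            current, best_err = candidate, err
    return current, best_err
```

Multiple restarts simply rerun this search from different random initial subsets and keep the best result.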

  15. A Posteriori Approaches
  Principle
  • Build hypotheses
  • Deduce the important features from them
  • Eliminate the others
  • Repeat
  Algorithm: SVM Recursive Feature Elimination (Guyon et al. 2002)
  • Linear SVM → $h(x) = \operatorname{sign}\left(\sum_i w_i f_i(x) + b\right)$
  • If $|w_i|$ is small, $f_i$ is not important
  • Eliminate the $k$ features with the smallest weights
  • Repeat.
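
A minimal sketch of this recursive elimination loop using a linear SVM from scikit-learn (assumed available); the step size `k` and the stopping size `n_keep` are illustrative parameters, not part of the original algorithm statement.

```python
import numpy as np
from sklearn.svm import LinearSVC

def svm_rfe(X, y, n_keep=10, k=1):
    """Recursive Feature Elimination: repeatedly fit a linear SVM and drop the
    features with the smallest absolute weights, until n_keep features remain."""
    active = list(range(X.shape[1]))
    while len(active) > n_keep:
        clf = LinearSVC().fit(X[:, active], y)
        weights = np.abs(clf.coef_).ravel()                      # one weight per active feature
        drop = set(np.argsort(weights)[:min(k, len(active) - n_keep)])
        active = [f for i, f in enumerate(active) if i not in drop]
    return active
```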

  16. Limitations
  Linear hypotheses
  • One weight per feature.
  Number of examples
  • The feature weights are coupled.
  • The dimension of the system is tied to the number of examples.
  Yet the feature selection problem typically arises precisely when there are not enough examples.

  17. Some References
  ◮ Filter approaches [1]
  ◮ Wrapper approaches
    ◮ Tackling combinatorial optimization [2,3,4]
    ◮ Exploration vs Exploitation dilemma
  ◮ Embedded approaches
    ◮ Using the learned hypothesis [5,6]
    ◮ Using a regularization term [7,8]
    ◮ Restricted to linear models [7] or linear combinations of kernels [8]
  [1] K. Kira and L. A. Rendell, ML'92
  [2] D. Margaritis, NIPS'09
  [3] T. Zhang, NIPS'08
  [4] M. Boullé, J. Mach. Learn. Res., 2007
  [5] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, Mach. Learn., 2002
  [6] J. Rogers and S. R. Gunn, SLSFS'05
  [7] R. Tibshirani, Journal of the Royal Statistical Society, '94
  [8] F. Bach, NIPS'08

  18. Feature Selection
  Notation
  $\mathcal{F}$: set of features; $F$: feature subset; $E$: training data set; $\mathcal{A}$: machine learning algorithm; $\mathrm{Err}$: generalization error
  Optimization problem
  Find $F^* = \operatorname{argmin}_{F \subseteq \mathcal{F}} \mathrm{Err}(\mathcal{A}, F, E)$
  Feature Selection goals
  ◮ Reduced generalization error
  ◮ More cost-effective models
  ◮ More understandable models
  Bottlenecks
  ◮ Combinatorial optimization problem: find $F \subseteq \mathcal{F}$
  ◮ Generalization error unknown

  19. FS as a Markov Decision Process
  [Figure: lattice of feature subsets over {f_1, f_2, f_3}]
  ◮ Set of features: $\mathcal{F}$
  ◮ Set of states: $\mathcal{S} = 2^{\mathcal{F}}$
  ◮ Initial state: $\emptyset$
  ◮ Set of actions: $\mathcal{A} = \{\text{add } f,\ f \in \mathcal{F}\}$
  ◮ Final state: any state
  ◮ Reward function: $V : \mathcal{S} \to [0, 1]$
  Goal: find $\operatorname{argmin}_{F \subseteq \mathcal{F}} \mathrm{Err}(\mathcal{A}(F, \mathcal{D}))$

  20. Optimal Policy
  [Figure: lattice of feature subsets over {f_1, f_2, f_3}]
  ◮ Policy: $\pi : \mathcal{S} \to \mathcal{A}$
  ◮ Final state reached by following a policy: $F_\pi$
  ◮ Optimal policy: $\pi^\star = \operatorname{argmin}_\pi \mathrm{Err}(\mathcal{A}(F_\pi, E))$
  Bellman's optimality principle
  $\pi^\star(F) = \operatorname{argmin}_{f \in \mathcal{F}} V^\star(F \cup \{f\})$
  $V^\star(F) = \begin{cases} \mathrm{Err}(\mathcal{A}(F)) & \text{if } \mathrm{final}(F) \\ \min_{f \in \mathcal{F}} V^\star(F \cup \{f\}) & \text{otherwise} \end{cases}$
  In practice
  ◮ $\pi^\star$ is intractable ⇒ approximation using UCT
  ◮ Computing $\mathrm{Err}(F)$ using a fast estimate
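
For a handful of features the Bellman recursion above can be solved exactly; a minimal sketch, where `err(F)` is a hypothetical fast error estimate for the subset `F` and `stop(F)` decides whether `F` is treated as final. This is only feasible for very small $d$, which is exactly why UCT is used instead.

```python
from functools import lru_cache

def optimal_value(features, err, stop):
    """Exact Bellman recursion on the subset lattice.
    err(F) -> estimated error of frozenset F; stop(F) -> True if F is a final state."""
    features = frozenset(features)

    @lru_cache(maxsize=None)
    def V(F):
        if stop(F) or F == features:
            return err(F)
        return min(V(F | {f}) for f in features - F)

    return V(frozenset())
```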

  21. FS as a Game
  [Figure: lattice of feature subsets over {f_1, f_2, f_3}]
  Exploration vs Exploitation tradeoff
  ◮ Virtually explore the whole lattice
  ◮ Gradually focus the search on the most promising subsets $F$
  ◮ Use a frugal, unbiased assessment of $F$
  How?
  ◮ Upper Confidence Tree (UCT) [1]
  ◮ UCT ⊂ Monte-Carlo Tree Search
  ◮ UCT tackles tree-structured optimization problems
  [1] L. Kocsis and C. Szepesvári, ECML'06

  22. Reinforcement Learning: Course Outline
  Context
  Algorithms: value functions; optimal policy; temporal differences and eligibility traces; Q-learning
  Playing Go: MoGo
  Feature Selection as a Game: problem setting; Monte-Carlo Tree Search; Feature Selection: the FUSE algorithm; experimental validation
  Active Learning as a Game: problem setting; the BAAL algorithm; experimental validation
  Constructive Induction

  23. The UCT Scheme
  ◮ Upper Confidence Tree (UCT) [1]
  ◮ Gradually grow the search tree
  ◮ Building blocks:
    ◮ Select the next action (bandit-based phase, inside the search tree)
    ◮ Add a node (a new leaf of the search tree)
    ◮ Select the next actions again (random phase, outside the tree)
    ◮ Compute the instant reward
    ◮ Update the information in all visited nodes
  ◮ Returned solution: the path of the explored tree visited most often
  [1] L. Kocsis and C. Szepesvári, ECML'06
  (A skeleton of one UCT simulation is sketched below.)
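
A minimal sketch of one UCT simulation following these building blocks. The `Node` structure and the environment interface (`initial_state`, `actions`, `step`, `rollout_reward`) are illustrative assumptions, not the MoGo or FUSE implementation.

```python
import math
import random

class Node:
    def __init__(self):
        self.children = {}      # action -> Node
        self.visits = 0
        self.total_reward = 0.0

def ucb(parent, child, C):
    """Bandit score of a child node: empirical mean plus exploration bonus."""
    if child.visits == 0:
        return float("inf")
    return child.total_reward / child.visits + C * math.sqrt(math.log(parent.visits) / child.visits)

def uct_simulation(root, env, C=1.0):
    """One UCT iteration: bandit-based descent, expansion of one leaf,
    random rollout, and backpropagation of the reward."""
    state, node, path = env.initial_state(), root, [root]
    # 1. Bandit-based phase: descend while every action of the node is already expanded
    while node.children and len(node.children) == len(env.actions(state)):
        action = max(node.children, key=lambda a: ucb(node, node.children[a], C))
        state, node = env.step(state, action), node.children[action]
        path.append(node)
    # 2. Expansion: add one new leaf to the search tree
    untried = [a for a in env.actions(state) if a not in node.children]
    if untried:
        action = random.choice(untried)
        child = node.children[action] = Node()
        state, node = env.step(state, action), child
        path.append(node)
    # 3. Random phase: finish the episode with random actions and compute the instant reward
    reward = env.rollout_reward(state)
    # 4. Backpropagation: update the information in all visited nodes
    for n in path:
        n.visits += 1
        n.total_reward += reward
```

Running `uct_simulation` many times grows the asymmetric tree described on slide 8; the returned solution is then read off as the most visited path from the root.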
