Internship Defense David Taralla University of Liège Thursday 19 December 2013

Contents
◮ Introduction
◮ Context
◮ Basic idea
◮ From the idea to the theoretical implementation
◮ Conclusion

Internship Defense David Taralla University of Liège 1st Master in Engineering Sciences

MCTS algorithm discovery
◮ Much research on game AI uses MCTS
◮ When the problem is known in advance, MCTS can be customized in a problem-driven way
◮ Why not automate this task?
⇒ Monte Carlo search algorithm discovery, for finite-horizon, fully observable, deterministic sequential decision-making problems
For example:
• Sudoku puzzles
• Pyramid card game
• ...

Grammar & algorithm space
◮ Generate a rich space of MCTS algorithms by combining search components
• simulate
• repeat
• step
• ...
◮ The cardinality of the space grows combinatorially with the algorithm length and the number of search components
◮ Multi-armed bandit approach to get a collection of well-performing algorithms

Multi-armed bandit model
Bandit in this context
◮ Machine with multiple arms
◮ Pulling an arm has a budget cost and gives some reward
◮ Finite budget

Multi-armed bandit model
Model description
Here,
◮ Arm = algorithm execution
◮ Reward = the reward of this algorithm execution
◮ We want the best arm to be the algorithm with the best mean reward, i.e. the algorithm performing best on average

Multi-armed bandit model
Model flaws
◮ Discrete
One cannot pull half an arm!
◮ Big cardinality
Existing methods are not well suited to a large arm space with a finite budget
◮ They used a UCB policy with 100 × #AlgoSpace steps
Length up to 5 → #AlgoSpace = 3155: this method does not scale easily
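To make the scaling issue concrete, here is a minimal sketch of a UCB policy over a finite arm space. It assumes the UCB1 variant (the slides only say "UCB policy"), and the names `ucb1`, `pull`, `n_arms` and `budget` are hypothetical. With a budget of 100 × #AlgoSpace, every algorithm is executed many times, which is what becomes expensive when the space is large.

```python
import math


def ucb1(pull, n_arms, budget):
    """Run UCB1 for `budget` pulls and return (best_arm, means, counts).

    `pull(arm)` is a hypothetical callback that executes the algorithm
    associated with `arm` once and returns its reward.
    """
    if budget < n_arms:
        raise ValueError("budget must allow at least one pull per arm")
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(budget):
        if t < n_arms:
            # Initialization: pull every arm once.
            arm = t
        else:
            # Pick the arm with the highest upper confidence bound.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        sums[arm] += pull(arm)
        counts[arm] += 1
    means = [sums[a] / counts[a] for a in range(n_arms)]
    best = max(range(n_arms), key=lambda a: means[a])
    return best, means, counts
```

For the figures on the slide, the call would be `ucb1(pull, 3155, 100 * 3155)`, i.e. 315,500 algorithm executions.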

Multi-armed bandit model
An alternative approach
Design an alternative to the standard UCB exploration of the arm space
◮ This is the best arm identification problem
◮ Use the information gathered about the arms pulled so far to select the next arm
⇒ Perform some kind of information transfer from a (set of) arm(s) to another
⇒ This internship was about this problem

Basic idea
◮ Maximize the “distance” between the arms already pulled and the next pull
Get maximal information → reduce the number of samples required!
◮ Many challenges in this “simple” idea

Best arm identification algorithm
[Flowchart]
• Create sampling plan
• Add resulting data to memory
• Prune arm space
• Get a regressor using RLS on data gathered so far
• Get lower & upper confidence bounds
• Get best arm a* using predictions
• Are we confident enough for a*? Yes → return a*; No → continue the loop

From the idea to the theoretical implementation
Create sampling plan
◮ G-optimal experiment design
• Concerned with the variance of predictions
• Get an allocation vector γ such that information is, in some way, maximized
(Erratum: the report says we maximize J(γ). That is incorrect: we minimize J(γ).)
◮ Simple rounding procedure
• “Translate” γ into a sequence of arms to pull
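The two steps of this slide can be sketched as follows. This is only an illustration under several assumptions that do not come from the slides: arms are described by explicit feature vectors (rows of a hypothetical matrix `Phi` with full column rank), J(γ) is taken to be the worst-case prediction variance max_a φ_aᵀ A(γ)⁻¹ φ_a with A(γ) = Σ_a γ_a φ_a φ_aᵀ, the minimization uses a Fedorov-Wynn style vertex-direction iteration (a standard technique swapped in here, not necessarily the one used in the report), and the rounding is a largest-remainder rule.

```python
import numpy as np


def g_criterion(Phi, gamma):
    """Return (J, variances), where J is the worst-case prediction variance."""
    A = Phi.T @ (gamma[:, None] * Phi)  # information matrix A(gamma)
    variances = np.einsum("ij,jk,ik->i", Phi, np.linalg.inv(A), Phi)
    return variances.max(), variances


def g_optimal_design(Phi, n_iter=2000):
    """Approximately minimize J(gamma) over the probability simplex."""
    n, d = Phi.shape
    gamma = np.full(n, 1.0 / n)  # start from the uniform allocation
    best_gamma, best_J = gamma.copy(), np.inf
    for _ in range(n_iter):
        J, variances = g_criterion(Phi, gamma)
        if J < best_J:
            best_gamma, best_J = gamma.copy(), J
        if J <= d * (1.0 + 1e-9):  # J cannot go below d, so stop here
            break
        a = int(np.argmax(variances))  # arm with the largest variance
        step = (J / d - 1.0) / (J - 1.0)  # Fedorov-Wynn step length
        gamma = (1.0 - step) * gamma
        gamma[a] += step
    return best_gamma, best_J


def round_design(gamma, n_pulls):
    """Translate gamma into a sequence of n_pulls arm indices."""
    raw = gamma * n_pulls
    counts = np.floor(raw).astype(int)
    leftover = n_pulls - counts.sum()
    # Give the remaining pulls to the largest fractional parts.
    order = np.argsort(raw - counts)[::-1]
    counts[order[:leftover]] += 1
    return np.repeat(np.arange(len(gamma)), counts)
```

The returned J can be compared with the number of features d: by the Kiefer-Wolfowitz equivalence theorem, d is the smallest value the worst-case variance can reach.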

From the idea to the theoretical implementation
Get a regressor using RLS on data gathered so far
◮ Predictions?
• Regressor θ
• Features Φ
• $r_a = \langle \phi_a, \theta \rangle + \eta$, $\hat{r}_a = \langle \phi_a, \hat{\theta} \rangle$
◮ Features of an algorithm
• ???
• In fact, we just need features to compute $\hat{r}_a = \langle \phi_a, \hat{\theta} \rangle$
• Features dual: kernels
n arms (...) ⇒ $\exists\, \hat{\alpha} \in \mathbb{R}^{n \times 1}$:
$\langle \phi_a, \hat{\theta} \rangle = \left\langle \phi_a, \sum_{t=1}^{n} \hat{\alpha}_t \phi_{a_t} \right\rangle = \sum_{t=1}^{n} \hat{\alpha}_t \underbrace{\langle \phi_a, \phi_{a_t} \rangle}_{K(a, a_t)}$
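A minimal sketch of the dual (kernel) form of regularized least squares may help here. The regularization convention (K + λI), the function names, and the RBF kernel on numeric vectors are assumptions made for illustration; the slides do not say which kernel between algorithms was used.

```python
import numpy as np


def rbf_kernel(X, Y, bandwidth=1.0):
    """Gaussian kernel matrix between rows of X and rows of Y (illustrative choice)."""
    sq_dist = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth**2))


def fit_alpha(K, r, lam):
    """Dual RLS weights: alpha_hat = (K + lam * I)^{-1} r."""
    return np.linalg.solve(K + lam * np.eye(len(r)), r)


def predict(K_new, alpha):
    """r_hat_a = sum_t alpha_t * K(a, a_t), one row of K_new per queried arm."""
    return K_new @ alpha
```

Here `K` holds K(a_s, a_t) for the arms pulled so far, `r` holds the observed rewards, and `K_new` holds K(a, a_t) for the arms to predict. With the linear kernel K(a, a') = ⟨φ_a, φ_a'⟩, these predictions coincide with those of ridge regression on the features, which is the sense in which the kernel replaces the feature vectors.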

From the idea to the theoretical implementation
Get a regressor using RLS on data gathered so far
Kernels
The kernel “mimics” the inner product of two feature vectors

Estimating θ | Estimating α
Based on features | Based on kernel
Get $\hat{\theta}$ → get $\hat{r}_a$ | Get $\hat{\alpha}$ → get $\hat{r}_a$

From the idea to the theoretical implementation
Get a regressor using RLS on data gathered so far
Regularization parameter λ
◮ Automatic tuning of λ given the dataset
⇒ Minimize $e(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \left( f_{D_{-i},\lambda}(a_i) - r_i \right)^2$
◮ Naïve approach:
1. Get $\hat{\alpha}$: $O(n^3)$ (1 matrix inversion)
2. Do it for n different datasets: $O(n)$
⇒ If M evaluations of e(λ), total complexity of $O(Mn^4)$!
◮ Kernelized generalized cross-validation
⇒ If M evaluations of e(λ), achievable total complexity of $O(n^3 + Mn^2)$
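One way to reach the O(n³ + Mn²) cost is sketched below. It reads e(λ) as the leave-one-out error (the regressor is fitted without sample i and evaluated on it) and uses the standard closed-form identity for regularized least squares, where the leave-one-out residual equals α̂_i divided by the i-th diagonal entry of (K + λI)⁻¹. The function name and the (K + λI) convention are assumptions, and the report's exact derivation may differ.

```python
import numpy as np


def loo_error_curve(K, r, lambdas):
    """Leave-one-out error e(lambda) for each value in `lambdas`.

    One eigendecomposition of K costs O(n^3); each lambda then costs O(n^2).
    """
    eigval, Q = np.linalg.eigh(K)  # K = Q diag(eigval) Q^T, done once
    Qr = Q.T @ r
    Q_sq = Q**2
    errors = []
    for lam in lambdas:
        inv = 1.0 / (eigval + lam)
        alpha = Q @ (inv * Qr)  # (K + lam I)^{-1} r
        diag_inv = Q_sq @ inv  # diagonal of (K + lam I)^{-1}
        loo_residuals = alpha / diag_inv
        errors.append(np.mean(loo_residuals**2))
    return np.array(errors)
```

The best λ on a grid is then `lambdas[np.argmin(loo_error_curve(K, r, lambdas))]`.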

From the idea to the theoretical implementation
Get a regressor using RLS on data gathered so far
Regularization parameter λ: example
[Figure: mean error when predicting the mean reward of an algorithm. y-axis: mean errors using GCV (0 to 24); x-axis: lambda (1e-6 to 10000, logarithmic scale).]

From the idea to the theoretical implementation
Get lower & upper confidence bounds
◮ Theorem developed by Abbasi-Yadkori et al. (2011)
◮ Extension to the kernel case by Abbasi-Yadkori (2012)
◮ Given some assumptions on the model, the theorem allows us to compute the (symmetrical) bounds
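For orientation, the linear (feature-based) form of the bound is usually stated as below. The notation is an assumption, since the slides do not spell it out: $V_t$ is the regularized design matrix, the noise is $R$-sub-Gaussian, $\|\theta\|_2 \le S$, and the bound holds with probability at least $1 - \delta$. The kernel extension used in the internship is not reproduced here.

```latex
V_t = \lambda I + \sum_{s=1}^{t} \phi_{a_s} \phi_{a_s}^{\top},
\qquad
\bigl\| \hat{\theta}_t - \theta \bigr\|_{V_t}
\le \beta_t
= R \sqrt{ 2 \log \frac{ \det(V_t)^{1/2} \, \det(\lambda I)^{-1/2} }{ \delta } }
  + \lambda^{1/2} S

% Symmetric bounds on the mean reward of arm a (by Cauchy-Schwarz):
\bigl| \langle \phi_a, \hat{\theta}_t \rangle - \langle \phi_a, \theta \rangle \bigr|
\le \beta_t \, \| \phi_a \|_{V_t^{-1}}
```

The lower and upper bounds of an arm are then $\hat{r}_a \mp \beta_t \|\phi_a\|_{V_t^{-1}}$, which is why they are symmetrical around the prediction.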

From the idea to the theoretical implementation
Prune arm space
◮ Discard all arms whose upper bound is smaller than the lower bound on a*
◮ Illustration [on the board]
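The pruning rule can be sketched in a few lines. The function name and the dictionary-based interface (arm identifier mapped to a value) are hypothetical; the confidence bounds are taken as given.

```python
def prune_arms(predictions, lower, upper):
    """Return (best_arm, kept_arms) after applying the pruning rule.

    predictions, lower and upper map each arm to its predicted mean reward
    and to its lower and upper confidence bounds.
    """
    best = max(predictions, key=predictions.get)
    # Keep an arm only if its upper bound reaches the lower bound on a*.
    kept = [arm for arm in predictions if upper[arm] >= lower[best]]
    return best, kept
```

The best arm always survives, since its own upper bound is at least its lower bound.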

Conclusion
Wrap up: Sudoku 16 × 16
A small wrap-up example
Data
◮ Problem: 16 × 16 Sudoku, grid 1/3 prefilled
◮ About 3200 algorithms
◮ 2 rounds with sampling plans consisting of sequences of $n_1$ and $n_2$ algorithms

Conclusion
Wrap up: Sudoku 16 × 16
[Flowchart]
• Create sampling plan
• Add resulting data to memory
• Prune arm space
• Get a regressor using RLS on data gathered so far
• Get lower & upper confidence bounds
• Get best arm a* using predictions
• Are we confident enough for a*? Yes → return a*; No → continue the loop
