Upper confidence bound algorithms
Christos Dimitrakakis
EPFL
November 6, 2013
Christos Dimitrakakis (EPFL) Upper confidence bound algorithms November 6, 2013 1 / 22
1 Introduction
2 Bandit problems
    UCB
3 Structured bandit problems
4
Bandit problems
Bandit problems UCB
$[\ldots]_{t-1} + B_{t-1,\,n^{*}_{t-1}} \le \max \hat{[\ldots]}$
Structured bandit problems
Reinforcement learning problems Optimality Criteria
Reinforcement learning problems UCRL