High-Dimensional Function Approximation for Knowledge-Free Reinforcement Learning: a Case Study in SZ-Tetris
Wojciech Jaśkowski, Marcin Szubert, Pawel Liskowski, Krzysztof Krawiec
Institute of Computing Science
July 14, 2015
Introduction: RL
1 direct policy search (e.g. EAs), good for Tetris, Othello
2 value function-based methods (e.g. TD), good for Backgammon
High-Dimensional Function Approximation in RL: SZ-Tetris 2 / 17 Jaśkowski et al.
1 state-value function (estimates the expected future scores from a
2 state-preference function (no interpretation, larger is better)
1 Height hk of the kth column of the
2 Absolute difference between the
3 Maximum column height maxh.
4 Number of 'holes' on the board.
21
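The four feature groups listed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the board encoding (a list of rows, top row first, 1 for a filled cell) is an assumption, the differences are assumed to be taken between heights of adjacent columns, and a 'hole' is assumed to be an empty cell with a filled cell somewhere above it in the same column. The names `column_heights`, `count_holes` and `bi_features` are hypothetical.

```python
from typing import List

Board = List[List[int]]  # board[row][col], row 0 is the top; 1 = filled, 0 = empty


def column_heights(board: Board) -> List[int]:
    """Height of each column, measured from the floor to its topmost filled cell."""
    rows, cols = len(board), len(board[0])
    heights = []
    for c in range(cols):
        h = 0
        for r in range(rows):
            if board[r][c]:
                h = rows - r
                break
        heights.append(h)
    return heights


def count_holes(board: Board) -> int:
    """Assumed definition: empty cells that have a filled cell above them."""
    rows, cols = len(board), len(board[0])
    holes = 0
    for c in range(cols):
        seen_filled = False
        for r in range(rows):
            if board[r][c]:
                seen_filled = True
            elif seen_filled:
                holes += 1
    return holes


def bi_features(board: Board) -> List[int]:
    """Concatenate heights, adjacent height differences, max height and hole count."""
    heights = column_heights(board)
    diffs = [abs(a - b) for a, b in zip(heights, heights[1:])]
    return heights + diffs + [max(heights), count_holes(board)]
```

Under these assumptions, a board with 10 columns yields 10 + 9 + 1 + 1 = 21 feature values.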
Lookup table of a 4-tuple with locations 0, 1, 2, 3:
0123   value
0000    3.04
0001   −3.90
0010   −2.14
...     ...
1100   −2.01
...     ...
1110    6.12
1111    3.21
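As a rough sketch of how such a lookup table is typically used, the code below reads the board cells covered by a tuple, turns them into a binary index, and sums the looked-up values over all tuples in the network. The function names, the `(row, col)` cell encoding, and the choice of location 0 as the most significant bit are assumptions made for illustration; table entries elided in the slide are filled with a 0.0 placeholder.

```python
from typing import List, Sequence, Tuple

Cell = Tuple[int, int]  # (row, col) of one tuple location


def tuple_index(board: Sequence[Sequence[int]], cells: Sequence[Cell]) -> int:
    """Read the cells in order and interpret their 0/1 values as a binary number."""
    index = 0
    for r, c in cells:
        index = index * 2 + board[r][c]
    return index


def evaluate(board: Sequence[Sequence[int]],
             tuples: Sequence[Sequence[Cell]],
             luts: Sequence[Sequence[float]]) -> float:
    """Network output: the sum of one lookup-table entry per tuple."""
    return sum(lut[tuple_index(board, cells)] for cells, lut in zip(tuples, luts))


# One 4-tuple covering a 2x2 block; only the entries shown in the slide are set.
example_lut: List[float] = [0.0] * 16  # 0.0 is a placeholder for elided entries
example_lut[0b0000] = 3.04
example_lut[0b0001] = -3.90
example_lut[0b0010] = -2.14
example_lut[0b1100] = -2.01
example_lut[0b1110] = 6.12
example_lut[0b1111] = 3.21
example_cells: List[Cell] = [(0, 0), (0, 1), (1, 0), (1, 1)]
```

For instance, a board whose covered cells read 1, 1, 1, 0 maps to the entry for 1110.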
1 Othello [Lucas, 2007,
2 Connect-4 [Thill, 2012],
3 2048 [Szubert, 2015]
1 3 × 3-tuples (size = 9),
2 4 × 4-tuples (size = 16),
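To give a feel for why these networks are high-dimensional, the sketch below enumerates square tuples and counts lookup-table entries. It assumes that "systematic" means one tuple for every placement of the square on the board, that cells are binary, that the board is 10 columns by 20 rows, and that no weights are shared between tuples. None of these is stated in the slides, so the resulting counts may differ from the authors' networks. The function names are hypothetical.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (row, col)


def systematic_tuples(board_rows: int, board_cols: int, size: int) -> List[List[Cell]]:
    """All placements of a size x size square of cells that fit on the board."""
    tuples = []
    for top in range(board_rows - size + 1):
        for left in range(board_cols - size + 1):
            cells = [(top + dr, left + dc) for dr in range(size) for dc in range(size)]
            tuples.append(cells)
    return tuples


def weight_count(board_rows: int, board_cols: int, size: int) -> int:
    """Total lookup-table entries: 2^(size*size) per tuple for binary cells."""
    n_tuples = len(systematic_tuples(board_rows, board_cols, size))
    return n_tuples * 2 ** (size * size)
```

Under these assumptions, 3 × 3 squares give 144 placements with 512 entries each (73,728 weights), and 4 × 4 squares give 119 placements with 65,536 entries each (7,798,784 weights).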
1 Cross-Entropy Method [CEM, Rubinstein, 2004]:
2 Covariance Matrix Adaptation Evolution Strategy [CMA-ES, Hansen
3 CMA-ES for high dimensions [VD-CMA-ES, Akimoto, 2014]
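Of the three methods listed, the Cross-Entropy Method is the simplest to illustrate. The sketch below is a generic textbook-style variant with an independent Gaussian per weight, not the configuration used in the talk: the population size, elite size, initial standard deviation and the toy objective are all assumptions. In the Tetris setting, `score` would be replaced by the average number of cleared lines obtained with a given weight vector.

```python
import random
import statistics
from typing import Callable, List


def cross_entropy_method(score: Callable[[List[float]], float],
                         dim: int,
                         generations: int = 30,
                         population: int = 50,
                         elite: int = 10,
                         init_std: float = 2.0,
                         seed: int = 0) -> List[float]:
    """Maximize `score` by repeatedly refitting a diagonal Gaussian to the elite."""
    rng = random.Random(seed)
    mean = [0.0] * dim
    std = [init_std] * dim
    for _ in range(generations):
        # Sample candidate weight vectors from the current distribution.
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(population)]
        # Keep the best-scoring candidates.
        samples.sort(key=score, reverse=True)
        best = samples[:elite]
        # Refit mean and standard deviation to the elite, one dimension at a time.
        for i in range(dim):
            column = [w[i] for w in best]
            mean[i] = statistics.mean(column)
            std[i] = statistics.pstdev(column)
    return mean


# Toy objective standing in for the game score (hypothetical).
TARGET = [1.0, -2.0, 0.5]


def toy_score(w: List[float]) -> float:
    return -sum((wi - ti) ** 2 for wi, ti in zip(w, TARGET))
```

Calling `cross_entropy_method(toy_score, dim=3)` returns the final mean vector, which should lie closer to `TARGET` than the all-zero starting point. CMA-ES and VD-CMA-ES additionally adapt a (restricted) covariance structure, which this sketch does not attempt.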
a∈A(s)
[Figure: average score (cleared lines) per generation for CEM, CMA-ES, and VD-CMA-ES; panels: B&I Features and 3x3 Tuple Network]
[Figure: average score (cleared lines) vs. training games (x1000); panels: 3x3 Tuple Network and 4x4 Tuple Network]
1 High-dimensional representation (systematic n-tuple network) to:
2 VD-CMA-ES vs. TD:
1 The best player to date (nearly 300 lines on average) > hand-coded
2 Systematic n-tuple networks solve Challenge #1 posed by Szita
Source code: http://github.com/wjaskowski/gecco-2015-sztetris