Autonomous Task Sequencing for Customized Curriculum Design in Reinforcement Learning
Sanmit Narvekar, Jivko Sinapov, and Peter Stone
Department of Computer Science
University of Texas at Austin
{sanmit, jsinapov, pstone}@cs.utexas.edu
Empty task, Pawns only, Pawns + King, One piece per type, Target task
and transfer learning
[Figure: Agent and Environment exchanging Action, State, and Reward]
Task = MDP
Presented at AAMAS '16
via Value Function Transfer
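As a rough illustration of what value function transfer between tasks can look like, here is a minimal Python sketch. It assumes tabular Q-values and a state-action encoding shared by the source and target tasks; the function name `transfer_q_values` and the example entries are hypothetical and do not come from the slides.

```python
from typing import Dict, Iterable, Tuple

State = Tuple[int, int]
StateAction = Tuple[State, str]
QTable = Dict[StateAction, float]


def transfer_q_values(source_q: QTable, target_pairs: Iterable[StateAction]) -> QTable:
    """Initialise a target-task Q-table from a source-task Q-table.

    Assumption: both tasks encode states and actions the same way.
    Pairs seen in the source keep their learned value; unseen pairs start at 0.0.
    """
    return {pair: source_q.get(pair, 0.0) for pair in target_pairs}


# Hypothetical values learned in a small source task.
source_q: QTable = {((0, 0), "right"): 0.5, ((0, 1), "right"): 0.9}

# The target task has one extra state-action pair the source never visited.
target_pairs = [((0, 0), "right"), ((0, 1), "right"), ((0, 2), "right")]

target_q = transfer_q_values(source_q, target_pairs)
```

The agent would then continue learning in the target task from `target_q` instead of from an all-zero table.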
[Figure: curriculum MDP. Policies: π0, π1, π2, π3, π4, π5, πf. Tasks: M1, M2, M3, M4. Rewards: R0,1, R0,2, R0,3, R1,3, R2,4, R3,3, R4,4, R5,4]
[Figure: curriculum MDP over policies π0 to π5 and πf, tasks M1 to M4, rewards Ri,j]
train on given learning agent policy πi
[Figure: curriculum MDP over policies π0 to π5 and πf, tasks M1 to M4, rewards Ri,j]
relevant for the final state policy πf
to guide selection of tasks
Target Task
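The curriculum MDP in the figure can be sketched as a lookup table: the state is the agent's current policy, the action is the task it trains on next, and each step yields a reward. The sketch below is only an illustration. The (policy, task) pairs mirror the reward labels in the figure, but the destination policies and the numeric rewards are hypothetical placeholders, as is the helper name `follow_curriculum`.

```python
from typing import Dict, List, Tuple

# (current policy, task trained on) -> (resulting policy, reward)
# Assumption: destinations and reward values are invented for illustration.
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, float]] = {
    ("pi0", "M1"): ("pi1", -3.0),
    ("pi0", "M2"): ("pi2", -5.0),
    ("pi0", "M3"): ("pi3", -4.0),
    ("pi1", "M3"): ("pi3", -2.0),
    ("pi3", "M3"): ("pi5", -2.0),
    ("pi2", "M4"): ("pi4", -6.0),
    ("pi4", "M4"): ("pi_f", -2.0),
    ("pi5", "M4"): ("pi_f", -1.0),
}


def follow_curriculum(start: str, tasks: List[str]) -> Tuple[List[str], float]:
    """Train on each task in order; return the visited policies and total reward."""
    policy = start
    trace = [start]
    total = 0.0
    for task in tasks:
        policy, reward = TRANSITIONS[(policy, task)]
        trace.append(policy)
        total += reward
    return trace, total


# One candidate curriculum: a sequence of tasks starting from the initial policy.
trace, total = follow_curriculum("pi0", ["M1", "M3", "M3", "M4"])
```

Under these made-up numbers, a curriculum is simply one path from `pi0` to `pi_f`, and different task sequences can be compared by their total reward.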
Target Task
[Figure: tasks 1 to 6, grouped into Solvable Tasks and Unsolvable Tasks]
to learn. Done
Target Task
Solvable Tasks, Unsolvable Tasks
solved have policies that are relevant to the target task
transfer
Initial Policy π0
[Figure: policies π1 and π2]
Solvable Tasks
New Policy π1
[s1, s2, s3, s4, …]
target task with those experienced in sources
Solvable Tasks, Unsolvable Tasks
[s1, s2, s3, …] [s4, s5, s6, …] [s1, s2, s3, …]
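One simple way to compare the states seen in the target task with those seen in a source task is to measure set overlap. The sketch below is an assumption made for illustration, not necessarily the measure used in the talk; the function name `state_overlap` and the state labels are hypothetical.

```python
from typing import Hashable, Iterable


def state_overlap(target_states: Iterable[Hashable], source_states: Iterable[Hashable]) -> float:
    """Fraction of distinct target-task states that also appear in the source task."""
    target = set(target_states)
    if not target:
        return 0.0
    return len(target & set(source_states)) / len(target)


# Hypothetical state samples gathered in the target task and in two source tasks.
target_sample = ["s1", "s2", "s3", "s4"]
source_a = ["s1", "s2", "s3"]
source_b = ["s4", "s5", "s6"]

overlap_a = state_overlap(target_sample, source_a)
overlap_b = state_overlap(target_sample, source_b)
```

A higher overlap suggests that what the agent experiences in the source is more likely to matter in the target, which is one plausible signal for ranking candidate tasks.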
[Figure: tasks 1 to 6, grouped into Solvable Tasks and Unsolvable Tasks]
[Figure: curriculum MDP over policies π0 to π5 and πf, tasks M1 to M4, rewards Ri,j]
[Figure: tasks 1 to 6, grouped into Solvable Tasks and Unsolvable Tasks]
curriculum generation as an MDP
trace in this MDP
create curricula tailored to sensing and action capabilities of agents
[Figure: curriculum MDP over policies π0 to π5 and πf, tasks M1 to M4, rewards Ri,j]
[Figure: tasks 1 to 6, grouped into Solvable Tasks and Unsolvable Tasks]