Actor-Critic Policy Learning in Cooperative Planning
Josh Redding, Alborz Geramifard, Han-Lim Choi and Jonathan P. How
Aerospace Controls Lab, MIT
August 22, 2011
Redding et al (ACL) Actor-Critic Cooperative Planning August 22, 2011 1 / 1
Cooperative Planning Introduction
1 Cooperative planning relies on models
2 These models are only approximate
3 The result is sub-optimal planner output
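Points 1 to 3 can be illustrated with a toy example. The two tasks, their rewards and both sets of success probabilities below are hypothetical: a planner that maximizes expected reward under its approximate model commits to a task that turns out worse under the true probabilities.

```python
# Toy illustration (hypothetical numbers): a planner that maximizes expected
# reward under an approximate model can choose a sub-optimal task.

REWARDS = {"A": 100.0, "B": 150.0}
MODEL_P = {"A": 0.5, "B": 0.6}  # planner's approximate success probabilities
TRUE_P = {"A": 0.9, "B": 0.4}   # how the world actually behaves (assumed)


def expected_reward(task: str, probs: dict) -> float:
    """Expected reward of attempting `task` under a given success model."""
    return probs[task] * REWARDS[task]


def plan(probs: dict) -> str:
    """Pick the task with the highest expected reward under `probs`."""
    return max(REWARDS, key=lambda t: expected_reward(t, probs))


planned = plan(MODEL_P)  # choice made with the approximate model
optimal = plan(TRUE_P)   # choice a perfect model would make

print(planned, expected_reward(planned, TRUE_P))
print(optimal, expected_reward(optimal, TRUE_P))
```

Here the model favors B (expected 90 under the model), but under the true probabilities B yields 60 while A would yield 90; the gap comes entirely from model error.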
Cooperative Planning Introduction
1 How can current multi-agent planners balance between robustness
2 How should the learning algorithms be formulated to best address
3 How can a learning algorithm be formulated to enable a more
Planning + Learning Framework for Cooperative Planning and Learning
[Figure: framework block diagram connecting the Cooperative Planner, Learning Algorithm, Performance Analysis, Agent/Vehicle and World, with disturbances and noise entering the loop.]
Problem Description Scenario
[Figure: mission scenario with seven numbered targets (1 to 7), annotated with rewards (+100, +200, +300), time windows ([2,3], [3,4]) and probabilities (.5, .6, .7).]
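Reading the scenario annotations as per-target rewards, visit time windows and probabilities of receiving the reward (an interpretation, not stated on the slide), a route can be scored by its expected reward. The sketch below assumes one time step per move and that a target reached outside its window earns nothing; `Target`, `expected_route_value` and the three targets are hypothetical and do not reproduce the actual scenario layout.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass
class Target:
    reward: float
    window: Optional[Tuple[int, int]] = None  # assumed (earliest, latest) visit time
    p_success: float = 1.0                    # assumed probability the reward is received


def expected_route_value(targets: Dict[int, Target], route: Sequence[int]) -> float:
    """Expected reward of visiting `route` in order, starting at time 0.

    Assumes (hypothetically) one time step per move, and that a target
    reached outside its time window earns nothing.
    """
    t = 0
    total = 0.0
    for tid in route:
        t += 1  # assumed unit travel time between consecutive targets
        tgt = targets[tid]
        if tgt.window is not None and not (tgt.window[0] <= t <= tgt.window[1]):
            continue  # window missed: no reward
        total += tgt.p_success * tgt.reward
    return total


# Hypothetical targets styled after the annotations, not the actual layout.
targets = {
    1: Target(100, (2, 3), 0.5),
    2: Target(100),
    3: Target(200, (3, 4)),
}
print(expected_route_value(targets, [2, 1, 3]))  # reaches 1 at t=2, 3 at t=3
print(expected_route_value(targets, [1, 2, 3]))  # reaches 1 at t=1, too early
```

Ordering matters: [2, 1, 3] earns 100 + 0.5 * 100 + 200 = 350 in expectation, while [1, 2, 3] reaches target 1 before its window opens and earns 300.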
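[Figure: the framework instantiated with the Consensus Based Bundle Algorithm (CBBA) as the cooperative planner, Actor-Critic RL as the learning algorithm and Risk Analysis as the performance analysis, connected to the Agent/Vehicle and the World. Labeled signals: x, r(x) (state and reward), π0, π(x)a, π(x)b and π(x).]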
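One reading of this diagram is that the actor-critic learner proposes an action, Risk Analysis checks it, and if it is judged too risky the planner's (CBBA) action is executed instead. A minimal sketch of that arbitration step, assuming that reading; `select_action` and the three callables standing in for the blocks are hypothetical.

```python
from typing import Callable, Hashable

State = Hashable
Action = Hashable


def select_action(
    x: State,
    learner_action: Callable[[State], Action],  # stand-in for Actor-Critic RL
    planner_action: Callable[[State], Action],  # stand-in for the CBBA plan
    is_risky: Callable[[State, Action], bool],  # stand-in for Risk Analysis
) -> Action:
    """Return the learner's proposal unless risk analysis rejects it,
    in which case fall back to the planner's action."""
    proposal = learner_action(x)
    if is_risky(x, proposal):
        return planner_action(x)
    return proposal


# Toy usage: the learner proposes target 7, the hypothetical risk check
# rejects it, so the planner's choice (target 6) is executed.
chosen = select_action(
    x="s0",
    learner_action=lambda s: 7,
    planner_action=lambda s: 6,
    is_risky=lambda s, a: a == 7,
)
print(chosen)
```

The fallback keeps the executed behavior no worse than the planner's whenever the risk check is reliable, while still letting the learner try alternatives it considers safe.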
Problem Description Cooperative Planner
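[Figure: CBBA flowchart alternating Phase 1 (bundle construction) and Phase 2 (consensus), repeated until the "Assigned?" check answers Yes.]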
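CBBA alternates bundle construction, in which each agent greedily builds an ordered bundle of tasks and bids on them, with consensus, in which conflicting bids are resolved and an outbid agent releases the lost task and every task it added after it. The sketch below is a simplified, centralized illustration of that two-phase loop, not CBBA itself: it assumes fixed per-task scores rather than path-dependent marginal scores, a single shared view of all bids instead of message passing between agents, and a round cap as a safety guard. The name `two_phase_assign` is hypothetical.

```python
from typing import Dict, List


def two_phase_assign(
    scores: List[List[float]], max_bundle: int, max_rounds: int = 100
) -> Dict[int, int]:
    """Simplified, centralized two-phase auction in the spirit of CBBA.

    scores[i][j] is agent i's fixed score for task j (hypothetical inputs).
    Returns a mapping task -> winning agent.
    """
    n_agents, n_tasks = len(scores), len(scores[0])
    bundles: List[List[int]] = [[] for _ in range(n_agents)]
    winners: Dict[int, int] = {}  # task -> agent currently holding it

    for _ in range(max_rounds):
        # Phase 1 (bundle construction): each agent greedily appends the
        # highest-scoring task it could still win against current holders.
        added = False
        for i in range(n_agents):
            while len(bundles[i]) < max_bundle:
                candidates = [
                    j for j in range(n_tasks)
                    if j not in bundles[i]
                    and scores[i][j] > 0
                    and (j not in winners or scores[i][j] > scores[winners[j]][j])
                ]
                if not candidates:
                    break
                bundles[i].append(max(candidates, key=lambda j: scores[i][j]))
                added = True
        if not added:
            break  # no agent can improve the assignment: converged

        # Phase 2 (conflict resolution): the highest bid wins each task
        # (lower agent index on ties). An outbid agent drops that task and
        # every task it added after it, mirroring CBBA's bundle release.
        best: Dict[int, int] = {}
        for j in range(n_tasks):
            bidders = [i for i in range(n_agents) if j in bundles[i]]
            if bidders:
                best[j] = max(bidders, key=lambda i: (scores[i][j], -i))
        for i in range(n_agents):
            for k, j in enumerate(bundles[i]):
                if best[j] != i:
                    bundles[i] = bundles[i][:k]
                    break
        winners = {j: i for i in range(n_agents) for j in bundles[i]}

    return winners


# Two agents, three tasks, at most two tasks per agent.
print(two_phase_assign([[10, 5, 1], [8, 9, 2]], max_bundle=2))
```

With these scores, task 0 goes to agent 0 and tasks 1 and 2 go to agent 1, the same result a sequential greedy assignment would give.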
Problem Description Learning Algorithm
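Actor: Gibbs (softmax) policy over action preferences P(s,a), with temperature τ
$$\pi(s,a) = \frac{e^{P(s,a)/\tau}}{\sum_{b} e^{P(s,b)/\tau}}$$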
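A tabular actor-critic built around a Gibbs (softmax) policy over preferences P(s,a) with temperature τ can be sketched as follows. This is the standard textbook form, assumed here rather than taken from the slides: the critic learns V(s) by TD(0) and the actor shifts P(s,a) by the TD error. The step sizes, the discount, and the `bias_toward` helper (a hypothetical way to nudge the actor toward a planner-suggested action such as π0(s)) are all illustrative assumptions.

```python
import math
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


class GibbsActorCritic:
    """Tabular actor-critic with a Gibbs (softmax) actor (illustrative sketch)."""

    def __init__(self, actions: List[Hashable], tau: float = 1.0,
                 alpha: float = 0.1, beta: float = 0.1, gamma: float = 0.95):
        self.actions = actions
        self.tau, self.alpha, self.beta, self.gamma = tau, alpha, beta, gamma
        self.P: Dict[Tuple[Hashable, Hashable], float] = defaultdict(float)  # actor
        self.V: Dict[Hashable, float] = defaultdict(float)                   # critic

    def bias_toward(self, s: Hashable, a: Hashable, bonus: float) -> None:
        """Hypothetical helper: raise the preference of a suggested action."""
        self.P[(s, a)] += bonus

    def probs(self, s: Hashable) -> List[float]:
        """pi(s,a) = exp(P(s,a)/tau) / sum_b exp(P(s,b)/tau), computed stably."""
        prefs = [self.P[(s, a)] / self.tau for a in self.actions]
        m = max(prefs)  # subtract the max before exponentiating
        exps = [math.exp(p - m) for p in prefs]
        z = sum(exps)
        return [e / z for e in exps]

    def act(self, s: Hashable, rng: random.Random) -> Hashable:
        """Sample an action from the Gibbs policy."""
        return rng.choices(self.actions, weights=self.probs(s), k=1)[0]

    def update(self, s: Hashable, a: Hashable, r: float,
               s_next: Hashable, done: bool) -> float:
        """One TD(0) critic step and one actor step; returns the TD error."""
        target = r if done else r + self.gamma * self.V[s_next]
        delta = target - self.V[s]            # TD error
        self.V[s] += self.alpha * delta       # critic update
        self.P[(s, a)] += self.beta * delta   # actor update
        return delta


# Usage on a one-state problem where only action "b" is rewarded.
agent = GibbsActorCritic(["a", "b"])
rng = random.Random(0)
for _ in range(2000):
    a = agent.act("s", rng)
    agent.update("s", a, 1.0 if a == "b" else 0.0, None, done=True)
print(agent.probs("s"))
```

A larger τ flattens the policy toward uniform exploration, while a smaller τ makes it greedier with respect to the preferences.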
Problem Description Performance Analysis
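One way to realize the Risk Analysis block, assumed here rather than taken from the slides, is Monte Carlo simulation: roll the proposed action through an approximate model many times and reject it when the estimated failure probability exceeds a threshold. The rollout function `toy_model`, its failure rates and the 0.2 threshold are hypothetical.

```python
import random
from typing import Callable, Hashable

Rollout = Callable[[Hashable, Hashable, random.Random], bool]


def estimate_failure_prob(x: Hashable, a: Hashable, simulate_failure: Rollout,
                          n_samples: int = 1000, seed: int = 0) -> float:
    """Fraction of sampled model rollouts in which taking `a` in `x` fails."""
    rng = random.Random(seed)
    failures = sum(simulate_failure(x, a, rng) for _ in range(n_samples))
    return failures / n_samples


def is_risky(x: Hashable, a: Hashable, simulate_failure: Rollout,
             threshold: float = 0.2, n_samples: int = 1000) -> bool:
    """Reject an action whose estimated failure probability exceeds `threshold`."""
    return estimate_failure_prob(x, a, simulate_failure, n_samples) > threshold


# Toy model (assumption): "risky" fails 40% of the time, "safe" 5%.
FAIL_P = {"risky": 0.4, "safe": 0.05}


def toy_model(x: Hashable, a: Hashable, rng: random.Random) -> bool:
    return rng.random() < FAIL_P[a]


print(is_risky("s0", "risky", toy_model), is_risky("s0", "safe", toy_model))
```

With 1000 samples the estimates land close to 0.4 and 0.05, so "risky" is rejected and "safe" is accepted; more samples tighten the estimate at the cost of planning time.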
Numerical Results Setup
Numerical Results Scenario 1
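[Figure: Scenario 1, the same seven-target layout as in the Problem Description, with rewards (+100, +200, +300), time windows ([2,3], [3,4]) and probabilities (.5, .6, .7).]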
[Plot: Return vs. Steps for Actor-Critic, CBBA, Optimal and iCCA.]
Numerical Results Scenario 2
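[Figure: Scenario 2, ten numbered targets (1 to 10) that extend the seven-target layout with targets 8 to 10, annotated with rewards (+100, +150, +200, +300), time windows ([2,3], [3,4], [3,5], [4,6], [4,7]) and probabilities (.5, .6, .7, .8, .9).]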
[Plot: Return vs. Steps for Actor-Critic, CBBA and iCCA.]
Numerical Results Conclusions & Future Work