Online Learning within Cooperative Planning
Alborz Geramifard (PowerPoint presentation)


slide-1
SLIDE 1

Online Learning within Cooperative Planning

Alborz Geramifard September, 2010 agf@mit.edu

1

Joint Work: Finale Doshi, Josh Redding, Nicholas Roy, Jonathan How

Supported by: AFOSR

slide-2
SLIDE 2

Problem

2

Waypoint, Base, Obstacle

slide-3
SLIDE 3

Why is this a hard problem?

  • Scalability (Large State Space)
  • Stochastic Environment
  • Inaccuracy in Model
  • Limited Online Computation
  • Safety

3

slide-4
SLIDE 4

Existing Methods

  • DP/MILP: ✓ ✓
  • Cooperative Planners: Limited, Limited, ✓ ✓
  • Online Model-Free RL: ✓ ✓ ✓

4


slide-6
SLIDE 6

Big Picture

Combine a Cooperative Planner with Online Model-Free RL to address:
  • Sample Complexity
  • Limited Online Computation
  • Safe Exploration

5


slide-8
SLIDE 8

  • Introduced incremental Feature Dependency Discovery (iFDD) as a novel adaptive function approximation
  • Scaled existing online RL methods to large domains using iFDD
  • Combined online learning methods with cooperative planners

Contributions

6

slide-9
SLIDE 9

Challenges (Reminder)

7

  • Sample Complexity
  • Limited Computation
  • Safe Exploration

slide-10
SLIDE 10

Outline

  1. Learner
  2. Planner + Learner

8

slide-11
SLIDE 11

Learner

9

Reinforcement Learning

The agent sends action a_t to the environment and receives state and reward (s_t, r_t), learning a policy π : S → A.

slide-12
SLIDE 12

Learner

10

State Values

V^π(s) = E[ ∑_{t=1}^∞ γ^{t−1} r_t | s_0 = s, π ]
slide-13
SLIDE 13

Learner

10

State Values

V^π(s) = E[ ∑_{t=1}^∞ γ^{t−1} r_t | s_0 = s, π ]

Scalability: represent V compactly and learn fast.
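A minimal numerical illustration of the discounted-return definition above (the reward sequence and γ are made-up sample values, not from the talk):

```python
def discounted_return(rewards, gamma):
    """Sum of gamma^(t-1) * r_t over one finite sampled trajectory,
    i.e. a single-sample estimate of the expectation defining V^pi."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

rewards = [1.0, 0.0, 1.0]   # r_1, r_2, r_3 (hypothetical sample)
gamma = 0.5
print(discounted_return(rewards, gamma))  # 1.0 + 0.0 + 0.25 = 1.25
```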

slide-14
SLIDE 14

Learner

11

Linear Function Approximation

V(s) ≈ φ(s)^T θ, with features φ_1, φ_2, …, φ_n and weights θ_1, θ_2, …, θ_n.

slide-17
SLIDE 17

LFA: Example

For state s_t, three binary features are active (φ_i(s_t) = 1), with weights θ of −5, 10, and 10:

V(s) = −5 + 10 + 10 = 15

12

Learner
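The slide's arithmetic as a one-liner: with binary features, V(s) = φ(s)^T θ reduces to summing the weights of the active features (a sketch using the example's numbers; the fourth, inactive feature's weight is made up):

```python
def value(phi, theta):
    """Linear value estimate V(s) = phi(s)^T theta."""
    return sum(p * t for p, t in zip(phi, theta))

theta = [-5.0, 10.0, 10.0, 10.0]   # feature weights
phi   = [1, 1, 1, 0]               # three active binary features
print(value(phi, theta))           # -5 + 10 + 10 = 15.0
```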

slide-18
SLIDE 18

Why Expand Features?

13

Learner

(Diagram: state s mapped through the feature representation to V(s).)

slide-21
SLIDE 21

Learner

14

Existing Gap

  • Lack of Convergence [Rivest et al. 2003]
  • Computational Complexity [Wu et al. 2004]
  • Sample Complexity [Whiteson et al. 2007]
  • Design Skill Requirement [Kolter et al. 2009]

slide-22
SLIDE 22

Control Loop: Sarsa updates the weights; iFDD uses |δ| to update (expand) the features.

15

Learner


slide-26
SLIDE 26

Sarsa [Sutton 88]

16

Following π, the agent takes action a in s_t, receives reward r, and transitions to s_{t+1}.

δ_t = r_t + γ V(s_{t+1}) − V(s_t)
θ_{t+1} = θ_t + α_t φ(s_t) δ_t    (linear function approximation)

Learner
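A minimal sketch of one TD update of the form shown above, under linear function approximation (states, reward, and step size are made up for illustration; this is a single value update, not a full Sarsa control loop):

```python
def td_update(theta, phi_s, phi_s_next, r, gamma, alpha):
    """Compute delta = r + gamma*V(s') - V(s), then theta += alpha*phi(s)*delta."""
    v = lambda phi: sum(p * t for p, t in zip(phi, theta))
    delta = r + gamma * v(phi_s_next) - v(phi_s)
    return [t + alpha * p * delta for t, p in zip(theta, phi_s)], delta

theta, delta = td_update(theta=[0.0, 0.0], phi_s=[1, 0],
                         phi_s_next=[0, 1], r=1.0, gamma=0.9, alpha=0.5)
print(theta, delta)  # [0.5, 0.0] 1.0
```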

slide-27
SLIDE 27

Control Loop (reminder): Sarsa updates the weights; iFDD uses |δ| to update the features.

17

Learner


slide-29
SLIDE 29

Learner

18

Approach

incremental Feature Dependency Discovery (iFDD): add necessary feature dependencies (conjunctions) as new features; n initial binary features may lead to up to 2^n features.

slide-34
SLIDE 34

Learner

19

iFDD (binary features)

  • 1. Find potential feature dependencies among simultaneously active features
  • 2. Track each candidate's relevance by accumulating |δ|
  • 3. Discover: add the conjunction as a new feature once relevance > threshold (e.g. ϕ1∧ϕ2, then ϕ2∧ϕ3, eventually ϕ1∧ϕ2∧ϕ3)
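The three steps can be sketched as follows (a simplified illustration of the discovery rule, not the paper's implementation; the threshold, TD errors, and feature indices are made up):

```python
from itertools import combinations

def ifdd_discover(active, delta, features, relevance, threshold):
    """For each pair of simultaneously active features, accumulate |delta|
    as that pair's relevance; once relevance crosses the threshold, add the
    conjunction as a new feature."""
    for pair in combinations(sorted(active), 2):
        if pair in features:        # conjunction already discovered
            continue
        relevance[pair] = relevance.get(pair, 0.0) + abs(delta)
        if relevance[pair] > threshold:
            features.add(pair)
    return features, relevance

features, relevance = set(), {}
for delta in [0.6, 0.6]:            # two TD errors while phi1 and phi2 are active
    ifdd_discover({1, 2}, delta, features, relevance, threshold=1.0)
print(features)  # the conjunction phi1 ^ phi2 has been discovered
```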

slide-46
SLIDE 46

Learner

20

iFDD

Convergence proof; per-time-step complexity is independent of the total number of features*. [Geramifard, et al. ICML 2011]

slide-47
SLIDE 47

Control Loop (recap): Sarsa updates the weights; iFDD uses |δ| to update the features.

21

Learner

slide-48
SLIDE 48

Sarsa with 5 representations:
  • Initial (No Expansion)
  • Initial + iFDD
  • Tabular (Fully Expanded)
  • Adaptive Tile Coding [Whiteson et al. 2007]
  • Sparse Distributed Memories [Ratitch et al. 2004]

Empirical Results

22

Learner

slide-50
SLIDE 50

Adaptive Tile Coding (ATC)

23

[Whiteson et al. 2007]

slide-55
SLIDE 55

Sarsa with 5 representations:
  • Initial (No Expansion)
  • Initial + iFDD
  • Tabular (Fully Expanded)
  • Adaptive Tile Coding [Whiteson et al. 2007]
  • Sparse Distributed Memories [Ratitch et al. 2004]

Empirical Results

24

Learner

slide-57
SLIDE 57

Sparse Distributed Memories (SDM)

25

[Ratitch et al. 2004]

slide-58
SLIDE 58

Sarsa with 5 representations:
  • Initial (No Expansion)
  • Initial + iFDD
  • Tabular (Fully Expanded)
  • Adaptive Tile Coding [Whiteson et al. 2007]
  • Sparse Distributed Memories [Ratitch et al. 2004]

Empirical Results

26

Learner

slide-59
SLIDE 59

Persistent Surveillance

27

Map areas: Maintenance, Refuel, Communication, Target
UAV actions: Advance, Retreat, Loiter
Targets and UAVs (fuel = 10 each)

~150 million state-action pairs (s, a)

Learner

slide-60
SLIDE 60

28

Persistent Surveillance

(Plot: Return vs. Steps (0 to 10×10^4), comparing Initial, ATC, SDM, Tabular, and Initial+iFDD.)

Learner

[Geramifard, et al. ICML 2011]

slide-61
SLIDE 61

Learner

29

Rescue Mission

(Mission graph: nodes 1-8 with rewards +1, +5, +10 and success probabilities .8 and .5.)

~200 million state-action pairs (s, a) [Geramifard, et al. ICML 2011]

slide-62
SLIDE 62

(Plot: Return vs. Steps (0 to 10×10^4), comparing Initial+iFDD, ATC, Initial, Tabular, and SDM.)

Learner

30

Rescue Mission

[Geramifard, et al. ICML 2011]

slide-63
SLIDE 63

Outline

  1. Learner
  2. Planner + Learner

31

slide-64
SLIDE 64

32

Effect of Model Inaccuracy

Planner + Learner

2

[Geramifard, et al. ACC 2011]

slide-65
SLIDE 65

  • Overly Restrictive [Heger 1994]
  • Lack of Analytical Convergence [Geibel et al. 2005]
  • No Robustness Guarantees [Abbeel et al. 2005]

Existing Gap

33

Planner + Learner

2

slide-66
SLIDE 66

Planner + Learner

2

Approach

(Architecture diagram: Cooperative Planner, Learning Algorithm, Performance Analysis, Agent/Vehicle, World.)

[Redding, Geramifard, How, ACC 2010]

34

slide-67
SLIDE 67

Planner + Learner

2

Approach

Stochastic Risk Model; Learners with Implicit Policy Formulation

(Architecture diagram: Consensus-Based Bundle Algorithm (CBBA), Cooperative Learner, Risk Analysis, Agent/Vehicle, World.)

35

[Geramifard, et al. ACC 2011]

slide-68
SLIDE 68

Planner + Learner

2

Grid World Example

36

30% uniform noise on movement (not known to the agent); rewards {+1, −1, −0.001}

slide-69
SLIDE 69

37

(Plots: Return vs. Steps. Left: Optimal, NAC, CNAC, Planner. Right: Optimal, Sarsa, CSarsa, Planner.)

Grid World

Planner + Learner

2

[Geramifard, et al. ACC 2011]

slide-70
SLIDE 70

38

UAV Mission

Planner + Learner

2

(Mission graph: nodes 1-7 with rewards +100, +200, +300, success probabilities .5, .6, .7, and visit time windows [2,3] and [3,4].)

5% movement failure (not known to the agent)

slide-71
SLIDE 71

Planner + Learner

2

UAV Mission Results

39

(Bar charts: Optimality (40-100) and P(Crash) (0%-100%) for Planner, Learner, and Planner + Learner.)

[Geramifard, et al. ACC 2011]

slide-72
SLIDE 72

Outline

  1. Learner
  2. Planner + Learner

40

slide-73
SLIDE 73

Contributions

41

  1. Learner: Introduced incremental Feature Dependency Discovery (iFDD); scaled existing online RL methods to large domains using iFDD
  2. Planner + Learner: Combined online learning methods with cooperative planners

slide-74
SLIDE 74

Backup Slides

42

slide-75
SLIDE 75

iFDD

slide-76
SLIDE 76

44

Algorithm 1: Discover
Input: φ(s), δ_t, ξ, F, ψ
Output: F, ψ
foreach (g, h) ∈ {(i, j) | φ_i(s) φ_j(s) = 1} do
    f ← g ∧ h
    if f ∉ F then
        ψ_f ← ψ_f + |δ_t|
        if ψ_f > ξ then
            F ← F ∪ {f}

slide-77
SLIDE 77

45

Algorithm 2: Activate Features
Input: φ⁰(s), F
Output: φ(s)
φ(s) ← 0̄
activeInitialFeatures ← {i | φ⁰_i(s) = 1}
Candidates ← ℘(activeInitialFeatures)   (* sorted by set size *)
while activeInitialFeatures ≠ ∅ do
    f ← Candidates.next()
    if f ∈ F then
        activeInitialFeatures ← activeInitialFeatures − f
        φ_f(s) ← 1
return φ(s)
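A simplified sketch of the activation rule (an illustration assuming features are identified by index tuples and singletons count as always-known features; not the paper's implementation):

```python
from itertools import chain, combinations

def activate(active_initial, discovered):
    """Activate the largest discovered conjunctions first, so each active
    initial feature is covered by at most one activated feature."""
    remaining = set(active_initial)
    # candidate subsets of the active initial features, largest first
    candidates = chain.from_iterable(
        combinations(sorted(active_initial), r)
        for r in range(len(active_initial), 0, -1))
    phi = set()
    for cand in candidates:
        if not remaining:
            break
        if set(cand) <= remaining and (len(cand) == 1 or cand in discovered):
            phi.add(cand)
            remaining -= set(cand)
    return phi

# phi1^phi2 has been discovered; phi3 falls back to its initial feature
print(activate({1, 2, 3}, discovered={(1, 2)}))  # activates (1, 2) and (3,)
```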

slide-78
SLIDE 78

46

(Plot: Balancing Steps (500-3000) vs. Steps (0 to 10×10^4), comparing Initial, Tabular, Gaussian, ATC, and Initial+iFDD.)
slide-79
SLIDE 79

47

(Plot: Return vs. Steps, comparing Tabular, Initial, ATC, and Initial+iFDD.)

slide-80
SLIDE 80

SDM

slide-81
SLIDE 81

Sparse Distributed Memories (SDM)

49

[Ratitch et al. 2004]


slide-99
SLIDE 99

iCCA

slide-100
SLIDE 100

Stochastic Domain, Known Deterministic Risk Model [ACC 2010, GNC 2010]

(Architecture diagram: iCCA with a CBBA planner, an Actor-Critic learner, analysis, and a controller over the agent/vehicle and world.)

51

slide-101
SLIDE 101

52

Algorithm 1: Cooperative Natural Actor-Critic (CNAC)
Input: π_p, ξ
Output: a
a ∼ π_AC(s, a)
if not safe(s, a) then
    P(s, a) ← P(s, a) − ξ
    a ← π_p
Q(s, a) ← Q(s, a) + α δ_t(Q)
P(s, a) ← P(s, a) + α Q(s, a)

[ACC 2010]

slide-102
SLIDE 102

53

Algorithm 2: safe
Input: s, a
Output: isSafe
risk ← 0
for i ← 1 to M do
    t ← 1
    s_t ∼ T_p(s, a)
    while not constrained(s_t) and not isTerminal(s_t) and t < H do
        s_{t+1} ∼ T_p(s_t, π_p(s_t))
        t ← t + 1
    risk ← risk + (1/i)(constrained(s_t) − risk)
isSafe ← (risk < ψ)

[ACC 2010, GNC 2010]
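The inner incremental average of the safe() routine, isolated as a sketch (the rollout simulator and constraint check are replaced by a precomputed list of rollout outcomes; 1 means the constraint was violated):

```python
def estimate_risk(outcomes):
    """Incremental mean: risk += (1/i) * (outcome_i - risk), matching
    the running-average update inside the safe() routine."""
    risk = 0.0
    for i, outcome in enumerate(outcomes, start=1):
        risk += (outcome - risk) / i
    return risk

# three simulated rollouts: constraint violated in two of them
print(estimate_risk([1, 0, 1]))  # running mean = 2/3
```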

slide-103
SLIDE 103

54

Algorithm 3: Cooperative Learning
Input: N, π_p, s, learner
Output: a
a ← π_p(s)
π_l ← learner.π
knownness ← min{1, count(s, a)/N}
if rand() < knownness then
    a′ ∼ π_l(s, a)
    if safe(s, a′) then
        a ← a′
else
    count(s, a) ← count(s, a) + 1
learner.update()

[ACC 2011]
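The knownness gate of Algorithm 3 as an isolated sketch (the planner and learner policies, the random draw, and the safety check are stand-ins passed in as arguments):

```python
def choose_action(count, N, rand_draw, planner_action, learner_action, is_safe):
    """Follow the learner with probability min(1, count/N), but only if
    its proposed action passes the safety check; otherwise fall back to
    the planner's action."""
    knownness = min(1.0, count / N)
    if rand_draw < knownness and is_safe:
        return learner_action
    return planner_action

# early on (count=1, N=10) the planner usually acts:
print(choose_action(1, 10, rand_draw=0.5,
                    planner_action="advance", learner_action="loiter",
                    is_safe=True))  # advance
```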

slide-104
SLIDE 104

55

(Mission graph: nodes 1-7 with rewards +100, +200, +300, success probabilities .5, .6, .7, and visit windows [2,3], [3,4].)

(a) Step-based performance (plot: Return vs. Steps for Actor-Critic, CBBA, Optimal, iCCA)

(b) Optimality after training (plot: Optimality 30-100 for iCCA, CBBA, Actor-Critic)

[ACC 2011]

slide-105
SLIDE 105

References

slide-106
SLIDE 106
  • J.-H. Wu and R. Givan, "Feature-discovering approximate value iteration methods," in Symposium on Abstraction, Reformulation, and Approximation (SARA), 2005.
  • J. Z. Kolter and A. Y. Ng, "Regularization and feature selection in least-squares temporal difference learning," in ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning. New York, NY, USA: ACM, 2009, pp. 521-528.
  • S. Whiteson, M. E. Taylor, and P. Stone, "Adaptive tile coding for value function approximation," University of Texas at Austin, Tech. Rep. AI-TR-07-339, 2007.
  • F. Rivest and D. Precup, "Combining TD-learning with cascade-correlation networks," in Proceedings of the Twentieth International Conference on Machine Learning. AAAI Press, 2003, pp. 632-639.
  • P. Abbeel and A. Y. Ng, "Exploration and apprenticeship learning in reinforcement learning," in Proc. 21st International Conference on Machine Learning. ICML, 2005, pp. 1-8.
  • P. Geibel and F. Wysotzki, "Risk-sensitive reinforcement learning applied to chance constrained control," JAIR, vol. 24, 2005.
  • M. Heger, "Consideration of risk and reinforcement learning," in Machine Learning, Proceedings of the 11th International Conference, 1994, pp. 105-111.

57

slide-107
SLIDE 107
  • R. S. Sutton and S. D. Whitehead, "Online learning with random representations," in Proceedings of the Tenth International Conference on Machine Learning. Morgan Kaufmann, 1993, pp. 314-321.
  • S. I. Reynolds, "Adaptive resolution model-free reinforcement learning: Decision boundary partitioning," in Proc. 17th International Conf. on Machine Learning. Morgan Kaufmann, 2000, pp. 783-790.
  • B. Ratitch and D. Precup, "Sparse distributed memories for on-line value-based reinforcement learning," in ECML, ser. Lecture Notes in Computer Science, J. F. Boulicaut, F. Esposito, F. Giannotti, and D. Pedreschi, Eds., vol. 3201. Springer, 2004, pp. 347-358.
  • H. R. Maei and R. S. Sutton, "GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces," in Proceedings of the Third Conference on Artificial General Intelligence (AGI-10), Lugano, Switzerland, 2010.
  • W. B. Knox and P. Stone, "Combining manual feedback with subsequent MDP reward signals for reinforcement learning," in Proc. of 9th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2010), May 2010.

58

slide-108
SLIDE 108

Publications

slide-109
SLIDE 109
  • A. Geramifard, F. Doshi, J. Redding, N. Roy, and J. P. How, "Incremental Feature Dependency Discovery," International Conference on Machine Learning, 2011 (submitted)
  • A. Geramifard, J. Redding, N. Roy, and J. P. How, "UAV Cooperative Control with Stochastic Risk Model," American Control Conference, 2011 (submitted)
  • J. Redding, A. Geramifard, H.-L. Choi, and J. P. How, "Actor-critic policy learning in cooperative planning," in AIAA Guidance, Navigation, and Control Conference (GNC), 2010.
  • J. Redding, A. Geramifard, and J. How, "Actor-critic policy learning in cooperative planning," in AAAI Spring Symposium Series, 2010.
  • J. Redding, A. Geramifard, A. Undurti, H. Choi, and J. How, "An intelligent cooperative control architecture," in American Control Conference, 2010
  • R. He, A. Bachrach, M. Achtelik, A. Geramifard, D. Gurdan, S. Prentice, J. Stumpf, N. Roy, "On the Design and Use of a Micro Air Vehicle to Track and Avoid Adversaries," International Journal of Robotics Research (IJRR), 2008

60

Refereed Publications

slide-110
SLIDE 110
  • A. Bachrach, A. Geramifard, D. Gurdan, R. He, S. Prentice, J. Stumpf, N. Roy, "Coordinated Planning Under Uncertainty with Air and Ground Vehicles," Proceedings of the 11th International Symposium on Experimental Robotics (ISER), 2008
  • R. Sutton, Cs. Szepesvári, A. Geramifard and M. Bowling, "Dyna-Style Planning with Linear Function Approximation and Prioritized Sweeping," Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (UAI), pages 528-536, 2008 [28% acceptance]
  • M. Bowling, A. Geramifard, D. Wingate, "Sigma Point Policy Iteration," Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 379-386, 2008 [22% acceptance]
  • A. Geramifard, M. Bowling, M. Zinkevich, R. Sutton, "iLSTD: Eligibility Traces & Convergence Analysis," in B. Schölkopf, J. C. Platt, and T. Hofmann, editors, Advances in Neural Information Processing Systems 19 (NIPS), pages 440-448, 2007 [24% acceptance]
  • A. Geramifard, M. Bowling, R. Sutton, "Incremental Least-Square Temporal Difference Learning," Proceedings of the 21st Conference, American Association for Artificial Intelligence (AAAI), pages 356-361, 2006 [30% acceptance]

Refereed Publications

61

slide-111
SLIDE 111
  • A. Geramifard, P. Chubak, V. Bulitko, "Biased Cost Pathfinding," Proceedings of the Second Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE), 2006 [73% acceptance]
  • A. Geramifard, P. Nayei, R. Zamaninasab, J. Habibi, "A Hybrid Three Layer Architecture for Fire Agent Management in Rescue Simulation Environment," International Journal of Advanced Robotic Systems, Vol. 2, No. 2, 2005
  • A. Nouri, R. Zamani-Nasab, J. Habibi, A. Geramifard, "Task Allocation in Complex Multiagent Systems with Parallel Scheduling," Workshop on Information Technology & its Disciplines, Kish Island, Iran, February 2004
  • J. Habibi, M. Ahmadi, A. Nouri, M. M. Nevisi, A. Geramifard, et al., "Arian Agents: A Set of Implemented Agents for RoboCup Rescue Simulation Environment," in Proceedings of the RoboCup Symposium, Padova, Italy, 2003.

Refereed Publications

62

slide-112
SLIDE 112

Course Work
slide-113
SLIDE 113

Graduate Course Work

Tehran University:
  • Robotics*
  • Image Processing*

University of Alberta:
  • Introduction to Reinforcement Learning (A-)
  • Individual Study: AIBO Programming (A)
  • Machine Learning (A)
  • Real-Time Search (A+)
  • Practical Reinforcement Learning (A-)
  • AI and Games (A-)

64

slide-114
SLIDE 114
Graduate Course Work

MIT-Major (Autonomy):
  • 16.412 Cognitive Robotics (A)
  • 16.413 Principles of Autonomy & Decision Making (A+)
  • 16.420 Planning Under Uncertainty (A)
  • 6.832 Underactuated Robotics (A)

Transferred:
  • Introduction to Reinforcement Learning (A-)

65

slide-115
SLIDE 115
Graduate Course Work

MIT-Minor (Brain & Cognitive Science):
  • 9.660 Computational Cognitive Science (A)
  • 9.914 Exploration Within Exploration (A)
  • 6.833 The Human Intelligence Enterprise (A)

MIT-Extra:
  • 18.085 Computational Science and Eng. (A+)
  • 21F.501 Japanese Language I (A)

66