Online Learning within Cooperative Planning, Alborz Geramifard

Online Learning within Cooperative Planning. Alborz Geramifard, September 2010, agf@mit.edu. Joint work: Finale Doshi, Josh Redding, Nicholas Roy, Jonathan How. Supported by: AFOSR. Problem: Waypoint, Obstacle, Base. Why is this a hard


  1. Existing Gap (Part 2, Planner + Learner): Overly Restrictive [Heger 1994]; Lack of Analytical Convergence [Geibel et al. 2005]; No Robustness Guarantees [Abbeel et al. 2005]

  2. Approach [Figure: block diagram connecting Cooperative Planner, Learning Algorithm, Performance Analysis, Agent/Vehicle, and World, with disturbances, observations, and noise signals] [Redding, Geramifard, How, ACC 2010]

  3. Approach: Stochastic Risk Model, Learners with Implicit Policy Formulation [Figure: architecture block diagram] [Geramifard, et al. ACC 2011]

  4. Grid World Example: 30% Uniform Noise for Movement (Not known to the agent); Rewards {+1, -1, -.001}
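A minimal sketch of the movement noise in the grid world example, assuming that "30% uniform noise" means the intended action is replaced by a uniformly random action with probability 0.3. The action names, the grid dimensions, and the clipping at the walls are hypothetical and not taken from the slides.

```python
import random

# Hypothetical action set: (row offset, column offset).
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def noisy_step(state, action, rows, cols, noise=0.3, rng=random):
    """Return the next (row, col) cell after a possibly perturbed move.

    Assumption: with probability `noise` the chosen action is replaced by
    one drawn uniformly from all actions; moves off the grid are clipped.
    """
    if rng.random() < noise:
        action = rng.choice(sorted(ACTIONS))
    d_row, d_col = ACTIONS[action]
    row = min(max(state[0] + d_row, 0), rows - 1)
    col = min(max(state[1] + d_col, 0), cols - 1)
    return (row, col)
```

The agent only ever observes the resulting cell, which is consistent with the slide's note that the noise level is not known to the agent.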

  5. Grid World [Figure: Return vs. Steps for Optimal, Planner, CSarsa, Sarsa, CNAC, and NAC] [Geramifard, et al. ACC 2011]

  6. UAV Mission [Figure: mission graph with numbered nodes and target rewards of +100, +200, and +300] 5% Movement Failure (Not known to the agent)

  7. UAV Mission Results [Figure: bar charts of P(Crash) and Optimality for Learner, Planner + Learner, and Planner] [Geramifard, et al. ACC 2011]

  8. Outline: 1. Learner; 2. Planner + Learner

  9. Contributions. 1 (Learner): Introduced incremental Feature Dependency Discovery (iFDD); scaled existing online RL methods to large domains using iFDD. 2 (Planner + Learner): Combined online learning methods with cooperative planners

  10. Backup Slides

  11. iFDD

  12. Algorithm 1: Discover
      Input: φ(s), δ_t, ξ, F, ψ
      Output: F, ψ
      foreach (g, h) ∈ {(i, j) | φ_i(s) φ_j(s) = 1} do
          f ← g ∧ h
          if f ∉ F then
              ψ_f ← ψ_f + |δ_t|
              if ψ_f > ξ then
                  F ← F ∪ f
              end
          end
      end
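The Discover step can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the authors' implementation: it assumes each feature is represented as a `frozenset` of initial feature indices, so that the conjunction g ∧ h becomes a set union, and that the pairs (g, h) are unordered pairs of distinct active features. The function and argument names are hypothetical.

```python
from itertools import combinations


def discover(active, delta_t, xi, features, psi):
    """One Discover step for the current state.

    active   -- list of active features (frozensets), i.e. phi_i(s) = 1
    delta_t  -- TD error at time t
    xi       -- discovery threshold
    features -- set F of known features (updated in place)
    psi      -- dict mapping candidate feature -> accumulated |delta_t|
    """
    for g, h in combinations(active, 2):
        f = g | h  # assumption: conjunction encoded as union of index sets
        if f not in features:
            psi[f] = psi.get(f, 0.0) + abs(delta_t)
            if psi[f] > xi:
                features.add(f)
    return features, psi
```

A candidate conjunction is only promoted to a feature once the absolute TD error accumulated over the states where it was active exceeds the threshold ξ.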

  13. Algorithm 2: Activate Features
      Input: φ⁰(s), F
      Output: φ(s)
      φ(s) ← 0̄
      activeInitialFeatures ← {i | φ⁰_i(s) = 1}
      Candidates ← ℘(activeInitialFeatures) (*sorted by set size)
      while activeInitialFeatures ≠ ∅ do
          f ← Candidates.next()
          if f ∈ F then
              activeInitialFeatures ← activeInitialFeatures − f
              φ_f(s) ← 1
          end
      end
      return φ(s)
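A minimal Python sketch of feature activation, under stated assumptions. Features are again taken to be `frozenset`s of initial feature indices, and "sorted by set size" is assumed to mean largest first, so that a conjunction takes priority over its parts. The sketch also adds one check the pseudocode does not spell out: a candidate is activated only if all of its initial features are still uncovered. The function and argument names are hypothetical.

```python
from itertools import combinations


def activate_features(phi0_active, features):
    """Return the set of active features for a state.

    phi0_active -- indices i of initial features with phi0_i(s) = 1
    features    -- set F of known features (frozensets of indices)
    """
    remaining = set(phi0_active)
    initial = sorted(remaining)
    active = set()
    # Enumerate the power set of active initial features, largest sets first.
    for size in range(len(initial), 0, -1):
        for combo in combinations(initial, size):
            if not remaining:
                return active
            f = frozenset(combo)
            # Assumption: skip candidates overlapping already-covered features.
            if f in features and f <= remaining:
                remaining -= f
                active.add(f)
    return active
```

For example, with F containing the single features {0}, {1}, {2} and the conjunction {0, 1}, a state with initial features 0, 1, and 2 active yields the active set {0, 1} and {2}. Enumerating the full power set is exponential in the number of active initial features, so this form is only practical when few initial features are active at once.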

  14. [Figure: Balancing Steps (0 to 3000) vs. Steps (0 to 10 ×10^4) for Initial+iFDD, ATC, Gaussian, Tabular, and Initial]

  15. [Figure: Return vs. Steps (0 to 10 ×10^4) for Initial+iFDD, Tabular, Initial, and ATC]

  16. SDM

  17. Sparse Distributed Memories (SDM) [Ratitch et al. 2004]

  35. iCCA

  36. [Figure: block diagram with CBBA Planner, Actor-Critic RL, Risk Analysis, Controller, and World blocks, with disturbances, observations, and noise signals] Stochastic Domain, Known Deterministic Risk Model [ACC 2010, GNC 2010]
