
Autonomous Learning of Ball Trapping in the Four-legged Robot League - PowerPoint PPT Presentation



  1. Autonomous Learning of Ball Trapping in the Four-legged Robot League  Hayato Kobayashi¹, Tsugutoyo Osaki², Eric Williams², Akira Ishino¹, Ayumi Shinohara²  ¹Kyushu University, Japan  ²Tohoku University, Japan

  2. Motivation  Pass work in the four-legged robot league  KeepAway Soccer [Stone et al. 2001]  A benchmark of good passing abilities in the simulation league  Passing Challenge  This year's technical challenge; it is too difficult for the dogs  KeepAway Soccer: http://www.cs.utexas.edu/~AustinVilla/sim/keepaway/

  3. Ball Trapping  Stop and control an oncoming ball

  4. One-dimensional Model The passer is watching the chest of the receiver. The receiver is watching the ball.

  5. Autonomous Method  Train the same way diligent humans do: kick the ball against a wall so that it comes back [Diagram: Kick → Wall]

  6. Training Equipment  Limit the ball's movement and the robot's locomotion to one dimension  A slope made of cardboard  Rails made of string

  7. Learning Method  Sarsa(λ) [Rummery and Niranjan 1994; Sutton 1996]  A reinforcement learning algorithm  Tile coding (a.k.a. CMACs [Albus 1975])  Linear function approximation, used to speed up learning  A sketch of both follows below

  8. Reinforcement Learning  Acquire a mapping from state input to action output that maximizes the sum of rewards [Diagram: the agent (the AIBO) sends action a_t to the environment, which returns state s_t and reward r_{t+1}]  In our study, the time steps t = 0, 1, 2, … correspond to 0 ms, 40 ms, 80 ms, …

  9. Implementation  State s_t = (x_t, dx_t)  x_t : the distance from the robot to the ball, in [0, 2000] mm  dx_t : the difference between the current x_t and the x of one time step before, in [-200, 200] mm  Action a_t  ready : move the head to watch the ball  trap : initiate the trapping motion

  10. Implementation  Reward r_{t+1}  Positive if the ball is correctly captured between the chin and the chest after the trap action  Negative if the trap action fails, or if the ball touches the chest PSD sensor before the trap action is performed  Zero otherwise

  11. Implementation  Episode: the period from kicking the ball until receiving any reward other than zero [Diagram: Kick! → Trap!]

  12. Experiments  Using one robot  Using two robots  without communication  with communication

  13. Using One Robot  Earlier phase https://youtu.be/hv1sgIZLpKA

  14. Using One Robot  Later phase https://youtu.be/XJBllv7wJXQ

  15. Result of Learning Using One Robot  [Graph: trapping success rate (%) per 10 episodes, over episodes 0-350]

  16. Episodes 1…50  [Scatter plot of each episode's result over the state space (x: 0-2000 mm, dx: -200 to 200 mm): ● successful, × failed in spite of trying, ▲ failed because of doing nothing (collision)]

  17. Episodes 51…100  [Same scatter plot for episodes 51-100]

  18. Episodes 101…150  [Same scatter plot for episodes 101-150]

  19. Episodes 151…200  [Same scatter plot for episodes 151-200]

  20. Using Two Robots  Simply replace the slope with another robot  Active Learner (AL): the original robot, trained the same way as in the one-robot case  Passive Learner (PL): replaces the slope; does not approach the ball if the trapping failed

  21. Using Two Robots  Earlier phase https://youtu.be/sXkVYZjOzjg

  22. Using Two Robots  Later phase https://youtu.be/opvoyv9h-GU

  23. Result of Learning Using Two Robots Without Communication  [Graph: trapping success rate (%) per 10 episodes for the Active Learner (AL) and the Passive Learner (PL), over episodes 0-350]

  24. Problem of Using Two Robots  Learning takes a long time  AL can learn only when PL itself succeeds, and cannot learn at all if the ball is not returned  Even using two ALs does not resolve the problem; they just learn slowly, though simultaneously

  25. Solution  Share their experiences (see the sketch below)  Each experience consists of  the action a_t (trap or ready)  the state variables s_t = (x_t, dx_t)  the reward r_{t+1}

  26. Result of Learning Using Two Robots With Communication  [Graph: trapping success rate (%) per 10 episodes for the Active Learner (AL) and the Passive Learner (PL), over episodes 0-400]

  27. Conclusion  The goal of pass work was achieved in one dimension  The robots learned the skill without human intervention  They learned more quickly by exchanging experiences with each other

  28. Future Work  Extend trapping skills to two dimensions  Layered Learning [Stone 2000]  Make goalies stronger  Make robots learn passing skills simultaneously

  29. Thank you for your attention! Bremen is a good town!
