SLIDE 1

Autonomous Learning of Ball Trapping in the Four-legged Robot League

Hayato Kobayashi (1), Tsugutoyo Osaki (2), Eric Williams (2), Akira Ishino (1), Ayumi Shinohara (2)

(1) Kyushu University, Japan; (2) Tohoku University, Japan

SLIDE 2

Motivation

- Pass work in the four-legged robot league
- KeepAway Soccer [Stone et al. 2001]
  - A benchmark of good passing abilities in the simulation league
  - http://www.cs.utexas.edu/~AustinVilla/sim/keepaway/
- Passing Challenge
  - This year's technical challenge
  - It is too difficult for the dogs

SLIDE 3

Ball Trapping

- Stop and control an oncoming ball

SLIDE 4

One-dimensional Model

The passer watches the chest of the receiver; the receiver watches the ball.

SLIDE 5

Autonomous Method

- Train the same way diligent humans do
- Kick the ball against a wall

SLIDE 6

Training Equipment

- Limit the ball's movement and the robot's locomotion to one dimension
- Slope made of cardboard
- Rails made of string

SLIDE 7

Learning Method

- Sarsa(λ) [Rummery and Niranjan 1994; Sutton 1996]
  - A reinforcement learning algorithm
- Tile coding (a.k.a. CMACs [Albus 1975])
  - Linear function approximation
  - Used to speed up learning
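The combination above can be sketched roughly as follows. This is a generic Sarsa(λ) update over a toy tile coder, not the authors' implementation; the number of tilings, tile widths, and the step-size/trace parameters are illustrative assumptions.

```python
import random
from collections import defaultdict

N_TILINGS = 8                    # assumption: number of overlapping tilings
TILE_X, TILE_DX = 250.0, 50.0    # assumption: tile widths in mm

def active_tiles(x, dx):
    """Tile coding (CMAC): map a continuous state to N_TILINGS tile indices."""
    tiles = []
    for i in range(N_TILINGS):
        # each tiling is offset by a fraction of the tile width
        ox = i * TILE_X / N_TILINGS
        odx = i * TILE_DX / N_TILINGS
        tiles.append((i, int((x + ox) // TILE_X), int((dx + 200 + odx) // TILE_DX)))
    return tiles

ACTIONS = ("ready", "trap")
theta = defaultdict(float)       # linear weights, one per (tile, action)
trace = defaultdict(float)       # eligibility traces
ALPHA, GAMMA, LAMBDA, EPS = 0.1, 1.0, 0.9, 0.1

def q(state, action):
    """Linear action-value: sum of weights over the active tiles."""
    return sum(theta[(t, action)] for t in active_tiles(*state))

def choose(state):
    """ε-greedy action selection."""
    if random.random() < EPS:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q(state, a))

def sarsa_step(s, a, r, s_next, a_next, done):
    """One Sarsa(λ) backup with replacing traces."""
    delta = r - q(s, a)
    if not done:
        delta += GAMMA * q(s_next, a_next)
    for t in active_tiles(*s):
        trace[(t, a)] = 1.0
    for key in list(trace):
        theta[key] += ALPHA / N_TILINGS * delta * trace[key]
        trace[key] *= GAMMA * LAMBDA
```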

SLIDE 8

Reinforcement Learning

The agent (AIBO) interacts with the environment: at each time step it observes state s_t, takes action a_t, and receives reward r_{t+1}. In our study, the time steps t = 0, 1, 2, … correspond to 0 ms, 40 ms, 80 ms, ….

- Goal: acquire a mapping from state input to action output that maximizes the sum of rewards

SLIDE 9

Implementation

- State s_t = (x_t, dx_t)
  - x_t: the distance from the robot to the ball, in [0, 2000] mm
  - dx_t: the difference between the current x_t and the x_t of one time step before, in [-200, 200] mm
- Action a_t
  - ready: move the head to watch the ball
  - trap: initiate the trapping motion
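A minimal sketch of how s_t might be assembled from two successive distance readings, assuming the clipping bounds above; the function name and its inputs are hypothetical, not the authors' code.

```python
def make_state(ball_dist_mm, prev_dist_mm):
    """Build s_t = (x_t, dx_t) from two successive distance readings.

    x_t  : robot-to-ball distance, clipped to [0, 2000] mm
    dx_t : change since the previous 40 ms time step, clipped to [-200, 200] mm
    """
    x = min(max(ball_dist_mm, 0.0), 2000.0)
    dx = min(max(ball_dist_mm - prev_dist_mm, -200.0), 200.0)
    return (x, dx)
```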

SLIDE 10

Implementation

- Reward r_{t+1}
  - Positive: the ball was correctly captured between the chin and the chest after the trap action
  - Negative: the trap action failed, or the ball touched the chest PSD sensor before the trap action was performed
  - Zero: otherwise
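The reward rules above fit in a small function. The magnitudes ±1 are assumptions; the slides specify only the signs.

```python
def reward(trap_attempted, ball_captured, chest_hit):
    """Reward r_{t+1}: sign follows the slide's rules, magnitude assumed ±1."""
    if chest_hit and not trap_attempted:
        return -1.0   # ball reached the chest PSD sensor before any trap
    if trap_attempted:
        return 1.0 if ball_captured else -1.0
    return 0.0        # episode still in progress
```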

SLIDE 11

Implementation

- Episode
  - The period from kicking the ball to receiving any reward other than zero

[Figure: Kick! → Trap!]
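The episode definition suggests a simple control loop; `env` and `policy` here are hypothetical stand-ins for the robot's sensing/actuation interface and the learned policy, not an interface from the paper.

```python
def run_episode(env, policy):
    """One training episode: starts when the ball is kicked and ends at the
    first nonzero reward (trap succeeded, trap failed, or chest collision)."""
    s = env.kick_and_observe()
    steps = 0
    while True:
        a = policy(s)             # "ready" or "trap"
        s_next, r = env.step(a)   # one 40 ms control step
        steps += 1
        if r != 0.0:
            return r, steps       # nonzero reward terminates the episode
        s = s_next
```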

SLIDE 12

Experiments

- Using one robot
- Using two robots
  - Without communication
  - With communication

SLIDE 13

Using One Robot

- Earlier phase

https://youtu.be/hv1sgIZLpKA

SLIDE 14

Using One Robot

- Later phase

https://youtu.be/XJBllv7wJXQ

SLIDE 15

Result of Learning Using One Robot

[Plot: trapping success rate (%), measured every 10 episodes, over episodes 0-350]

SLIDE 16

Result of Each Episode: Episodes 1…50

[Scatter plot over the state space, x in [0, 2000] mm vs. dx in [-200, 200] mm: • successful, × failed in spite of trying, ▲ failed because of doing nothing]

SLIDE 17

Result of Each Episode: Episodes 51…100

[Scatter plot over the state space, x in [0, 2000] mm vs. dx in [-200, 200] mm: • successful, × failed in spite of trying, ▲ failed because of doing nothing]

SLIDE 18

Result of Each Episode: Episodes 101…150

[Scatter plot over the state space, x in [0, 2000] mm vs. dx in [-200, 200] mm: • successful, × failed in spite of trying, ▲ failed because of doing nothing]

SLIDE 19

Result of Each Episode: Episodes 151…200

[Scatter plot over the state space, x in [0, 2000] mm vs. dx in [-200, 200] mm: • successful, × failed in spite of trying, ▲ failed because of doing nothing]

SLIDE 20

Using Two Robots

- Simply replace the slope with another robot
- Active Learner (AL)
  - The original robot
  - Trained the same way as in the one-robot case
- Passive Learner (PL)
  - Replaces the slope
  - Does not approach the ball if trapping failed

SLIDE 21

Using Two Robots

- Earlier phase

https://youtu.be/sXkVYZjOzjg

SLIDE 22

Using Two Robots

- Later phase

https://youtu.be/opvoyv9h-GU

SLIDE 23

Result of Learning Using Two Robots Without Communication

[Plot: trapping success rate (%), measured every 10 episodes, for the Active Learner (AL) and Passive Learner (PL), over episodes 0-350]

SLIDE 24

Problem of Using Two Robots

- It takes a long time to learn
  - AL can only learn when PL itself succeeds
  - AL cannot learn if the ball is not returned
- Even if we use two ALs, the problem is not resolved
  - They just learn slowly, though simultaneously

SLIDE 25

Solution

- Share their experiences
- An experience consists of:
  - Action a_t (trap or ready)
  - State variables s_t = (x_t, dx_t)
  - Reward r_{t+1}
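A minimal sketch of what sharing could look like: each robot serializes its (s_t, a_t, r_{t+1}) transitions and a teammate replays them through its own update rule. The message layout and the `update` callback are assumptions, not the paper's protocol.

```python
def encode_experience(s, a, r_next):
    """Pack one transition (s_t, a_t, r_{t+1}) for broadcast to the teammate."""
    x, dx = s
    return {"x": x, "dx": dx, "action": a, "reward": r_next}

def apply_shared_experience(msg, update):
    """Replay a teammate's transition through the local learner's update."""
    s = (msg["x"], msg["dx"])
    update(s, msg["action"], msg["reward"])

# Illustration: a received message feeds the same update path as local experience.
shared_log = []
apply_shared_experience(
    encode_experience((800.0, -120.0), "trap", 1.0),
    lambda s, a, r: shared_log.append((s, a, r)),
)
```

The point of the design is that the receiving robot treats a shared transition exactly like one of its own, so both robots improve from every successful trap.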

SLIDE 26

Result of Learning Using Two Robots With Communication

[Plot: trapping success rate (%), measured every 10 episodes, for the Active Learner (AL) and Passive Learner (PL), over episodes 0-400]

SLIDE 27

Conclusion

- The goal of pass work is achieved in one dimension
- The robots learned the skills without human intervention
- They learned more quickly by exchanging experiences with each other

SLIDE 28

Future Work

- Extend trapping skills to two dimensions
  - Layered Learning [Stone 2000]
- Make goalies stronger
- Make robots learn passing skills simultaneously

SLIDE 29

Thank you for your attention!

Bremen is a good town!