
An Architecture for Action Selection in Robotic Soccer (Peter Stone)



  1. An Architecture for Action Selection in Robotic Soccer
     Peter Stone
     Joint work with David McAllester

  2. RoboCup
     An international AI and Robotics research initiative
     • Use soccer as a rich and realistic test-bed
     • Multiple teammates with a common goal
     • Multiple adversaries, not known in advance
     Research challenges
     • Real-time decision making necessary
     • Noisy sensors and actuators
     • Enormous state-space

  3. CMUnited-99
     • Stone, Riley, Veloso
     • 1999 simulator league world champions
     • 37-team field; total score: 110–0 (8 games)
     • Learned low-level behaviors
     • Heuristic high-level action decision
     • Dribble; Shoot; Hold; Clear; Pass (10)
     Here: improvements over CMUnited-99

  4. Outline
     • RoboCup simulator
     • Action Selection Architecture
     • Leading Passes
     • Force Field Control for Off-Ball Motion
     • Results

  5. RoboCup Simulator
     • Distributed: each player a separate client
     • Server models dynamics and kinematics
     • Clients receive sensations, send actions
     • Parametric actions: dash, turn, kick, say
     • Abstract, noisy sensors, hidden state
       - Hear sounds from limited distance
       - See relative distance, angle to objects ahead
     • More than $(10^9)^{23}$ states
     • Limited resources: stamina
     • Play occurs in real time (≈ human parameters)
     [Figure: Client 1 and Client 2 exchanging messages with the Server over cycles t-1, t, t+1, t+2]

  6. Outline
     • RoboCup simulator
     • Action Selection Architecture
     • Leading Passes
     • Force Field Control for Off-Ball Motion
     • Results

  7. Motivation: Decisions based on a Value Function
     • $v(s) \equiv$ expected reward from state $s$ (RL)
     • $P(s' \mid s, a) \equiv$ probability of outcome $s'$ when selecting option (action) $a$ from $s$
     • Select the option with the highest $\sum_{s'} P(s' \mid s, a)\, v(s')$
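
The selection rule on this slide can be sketched in a few lines of Python. This is only an illustration: the transition table `P`, the value table `v`, and all state and option names are made-up toy values, not anything taken from the team's code.

```python
def expected_value(outcomes, v):
    """Sum of P(s' | s, a) * v(s') over the possible outcomes s'."""
    return sum(prob * v[s_next] for s_next, prob in outcomes.items())


def select_option(P, v, s):
    """Return the option a with the highest expected value from state s."""
    return max(P[s], key=lambda a: expected_value(P[s][a], v))


# Hypothetical toy numbers, for illustration only.
v = {"goal": 1.0, "keep_ball": 0.3, "lose_ball": -0.5}
P = {
    "has_ball": {
        "shoot": {"goal": 0.2, "lose_ball": 0.8},
        "pass": {"keep_ball": 0.9, "lose_ball": 0.1},
    }
}
best = select_option(P, v, "has_ball")
```

With these toy numbers the pass has the higher expected value, so it is the option selected.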

  8. Options
     An option can be scored and executed
     • Scoring:
       - $p_s \equiv$ probability of success
       - $v_s, v_f \equiv$ values of succeeding, failing
       - Score: $p_s v_s + (1 - p_s) v_f$
       - Value function currently hand-written
     • Execute the option with the highest score
     • Scoring across options must be comparable
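
A minimal sketch of option scoring as described on this slide, assuming each option exposes its own success probability and values. The `Option` class and the numbers are hypothetical; the slides only say that the real value function was hand-written.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    p_success: float  # p_s: probability that the option succeeds
    v_success: float  # v_s: value if it succeeds
    v_failure: float  # v_f: value if it fails

    def score(self) -> float:
        """Score = p_s * v_s + (1 - p_s) * v_f."""
        return (self.p_success * self.v_success
                + (1.0 - self.p_success) * self.v_failure)


def best_option(options):
    """Execute-the-best rule: pick the option with the highest score."""
    return max(options, key=lambda o: o.score())


# Hypothetical numbers; scores are comparable because all options
# use the same value scale.
options = [
    Option("pass", 0.9, 0.5, 0.0),
    Option("shoot", 0.1, 10.0, 0.0),
    Option("clear", 0.95, 0.2, 0.0),
]
chosen = best_option(options)
```

Here the unlikely but very valuable shot outscores the safe pass, which shows why the values of different option types have to be calibrated against each other.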

  9. Aside: Soft Boolean Expressions
     • $x <_\delta y \in [0, 1]$ (continuous)
     • Avoid discontinuities
       - $x = y \Rightarrow (x <_\delta y) = 1/2$
       - $x \ll y \Rightarrow (x <_\delta y) \approx 1$
       - $x \gg y \Rightarrow (x <_\delta y) \approx 0$
     • $\mathrm{if}(p, x, y)$ assumes $p \in [0, 1]$
     • $\mathrm{if}(p, x, y) \equiv px + (1 - p)y$
     • Often write $\mathrm{if}(x <_\delta y, z, w)$.
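
One way to realize a soft comparison with these properties is a logistic function of the difference, scaled by a softness parameter. The slide does not give the exact formula, so the logistic form below is an assumption, and the names `soft_less` and `soft_if` are hypothetical.

```python
import math


def soft_less(x, y, delta):
    """Soft version of 'x < y', returning a value in [0, 1].

    Assumed form: logistic((y - x) / delta). It equals 1/2 when x == y,
    approaches 1 when x is far below y, and 0 when x is far above y.
    """
    z = (y - x) / delta
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # branch avoids overflow for very negative z
    return e / (1.0 + e)


def soft_if(p, x, y):
    """Soft conditional: p * x + (1 - p) * y, with p in [0, 1]."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return p * x + (1.0 - p) * y


# Soft version of "if x < y then z else w" with softness delta = 1.
blended = soft_if(soft_less(3.0, 3.0, 1.0), 2.0, 4.0)
```

Because the condition is exactly balanced in the example, the result is the midpoint of the two branches rather than a hard switch between them.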

  10. Pass Option
     • Consider hundreds of passes:
       - angle increments of 4°
       - speed increments of 0.2 m/sec
     • $I_t$ ($I_o$) $\equiv$ teammate (opponent) interception time
       - Approximate, fast computation
     • Score: larger margin $\Rightarrow$ larger $p_s$
       - $p_s = \mathrm{if}(I_t <_5 I_o,\ 0.9,\ 0)$
       - $v_s$ based on ball's predicted location after pass
       - $v_f = 0$
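
The pass search can be sketched as a grid over kick angle and speed, with each candidate scored from the interception-time margin. This is a rough illustration under stated assumptions: the logistic soft comparison, the upper speed limit, and the caller-supplied `intercept_times` and `value_after` functions are all hypothetical stand-ins for the team's fast approximations.

```python
import math


def soft_less(x, y, delta):
    """Soft 'x < y' in [0, 1]; the logistic shape is an assumption."""
    z = (y - x) / delta
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def candidate_passes(max_speed_steps=10):
    """All (angle in degrees, speed in m/sec) pairs on the grid.

    Angles step by 4 degrees and speeds by 0.2 m/sec, as on the slide;
    the upper speed limit (max_speed_steps * 0.2) is a hypothetical choice.
    """
    return [(angle, round(0.2 * k, 1))
            for angle in range(0, 360, 4)
            for k in range(1, max_speed_steps + 1)]


def score_pass(i_t, i_o, v_s, v_f=0.0):
    """p_s * v_s + (1 - p_s) * v_f, with p_s = if(I_t <_5 I_o, 0.9, 0)."""
    p = soft_less(i_t, i_o, 5.0)
    p_s = p * 0.9 + (1.0 - p) * 0.0
    return p_s * v_s + (1.0 - p_s) * v_f


def best_pass(intercept_times, value_after):
    """Return the highest-scoring (angle, speed) candidate."""
    def score(candidate):
        i_t, i_o = intercept_times(*candidate)
        return score_pass(i_t, i_o, value_after(*candidate))
    return max(candidate_passes(), key=score)


def toy_intercept(angle, speed):
    """Toy model: a teammate is best reached at 40 degrees and 1.0 m/sec."""
    i_t = 10.0 + abs(angle - 40) / 4.0 + 10.0 * abs(speed - 1.0)
    i_o = 20.0
    return i_t, i_o


angle, speed = best_pass(toy_intercept, lambda a, s: 1.0)
```

With 90 angles and 10 speeds the grid holds 900 candidates, in line with the "hundreds of passes" on the slide; in the toy model the search recovers the pass aimed at the teammate.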

  11. Other Options
     Shot Option: kick towards a point in the goal
     • $p_s$ related only to $I_o$
     • $v_s \gg 0$
     • $v_f = 0$
     Clear Option: kick the ball down the field
     • $p_s$ related only to $I_o$
     • $v_s > 0$
     • $v_f = 0$
     Others: dribble, send, hold, cross, ...
     • Difficult to calibrate many

  12. Leading Passes
     • Usually a pass option is selected
     • Many leading passes seen
       - CMUnited-99: only direct passes
       - Now: hundreds considered
     Movement without the ball is also crucial
     • CMUnited-99: SPAR
       - Forces over limited regions
       - Boundaries treated as hard constraints

  13. Outline
     • RoboCup simulator
     • Action Selection Architecture
     • Leading Passes
     • Force Field Control for Off-Ball Motion
     • Results

  14. Movement Off the Ball
     In principle: derivative of value function
     Here: vector sum of force fields
     • $d_b \equiv$ distance of the player to the ball
     • $F \equiv B + O + \mathrm{if}(d_b <_{10} 20,\ T + C,\ S)$
     [Figure: field diagram showing teammates, opponents, the offsides line, and the force fields B, O, S, T, C]
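
The force combination on this slide can be sketched as a vector sum in which the soft comparison blends the near-ball forces (T + C) with the strategic force S. The logistic soft comparison and the 2-D tuple representation are assumptions made for the example; how each individual force is computed is not shown here.

```python
import math


def soft_less(x, y, delta):
    """Soft 'x < y' in [0, 1]; the logistic shape is an assumption."""
    z = (y - x) / delta
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def resultant_force(d_b, B, O, T, C, S):
    """F = B + O + if(d_b <_10 20, T + C, S) for 2-D force vectors.

    d_b is the player's distance to the ball; each force is an (x, y)
    tuple that the caller has already computed.
    """
    p = soft_less(d_b, 20.0, 10.0)
    near = (T[0] + C[0], T[1] + C[1])
    blended = tuple(p * n + (1.0 - p) * s for n, s in zip(near, S))
    return (B[0] + O[0] + blended[0], B[1] + O[1] + blended[1])


# Hypothetical force values, for illustration only.
zero = (0.0, 0.0)
at_20m = resultant_force(20.0, zero, zero, (1.0, 0.0), (1.0, 0.0), (0.0, 2.0))
far_away = resultant_force(200.0, zero, zero, (1.0, 0.0), (1.0, 0.0), (0.0, 2.0))
```

At exactly 20 m the two regimes are weighted equally, and far from the ball the strategic force dominates, so the player's behavior changes smoothly rather than switching at a hard boundary.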

  15. Force Fields
     • Bounds-Repellent (B): Stay on the field
     • Offsides-Repellent (O): Stay on-sides
     • Strategic (S): Stay about 20m from teammates
     • Tactical (T): But not too close
     • Get-clear (C): Move away from “key” defender
     [Figure: field diagram showing teammates, opponents, the offsides line, and the regions where forces B, O, S, T, C act]

  16. Results
     • Keepaway vs. CMUnited-99
       - Goal: maintain possession
       - No offensive or defensive reasoning
     • Possession time in 95% confidence intervals

     Program       Possession Time   Mean Ball x Position
     CMUnited-99   5.7-6.6 sec       -19.5
     New Team      16.9-18.7 sec     -33.6

     Very insensitive to most parameters

  17. Varying S
     • $S_b$: Force of unit magnitude towards the ball
     • $S_d$: Force downfield
     • $S'$: $S$, $S + S_b$, $S + S_d$, or $S + S_b + S_d$
     • $F \equiv B + O + \mathrm{if}(d_b <_{10} 20,\ T + C,\ S')$

     Program           Possession Time   Mean Ball x Position
     CMUnited          5.7-6.6           -19.5
     S                 16.9-18.7         -33.6
     S + S_b           24.8-27.9         -35.9
     S + S_d           22.2-25.2         25.7
     S + S_b + S_d     23.7-26.8         26.6

  18. Overall Results
     • CMUnited-99 vs. CMUnited-99: 0.3 – 0.3
     • New Team vs. CMUnited-99: 2.5 – 0.3
     RoboCup-2000 Competition
     • ATT-CMUnited-2000: 3rd place
     • Stone, Riley, McAllester, Veloso
     • Also included dynamic set plays [Riley & Veloso, 2001]
     • 35-team field; total score: 26–11 (8 games)

  19. Summary
     • An option-based action-selection architecture
     • Leading Passes in RoboCup soccer
     • Force Field Control for Off-Ball Motion
     Related Work
     • Samba [Riekki & Roenig, ’98]: force fields for action selection
     • SPAR [Veloso et al., ’99]: limited regions, hard constraints
     Future Work
     • Learn the option value functions using RL
