Contextual Awareness for Robot Autonomy (FA2386-10-1-4138)
PI: Reid Simmons (Carnegie Mellon University)


SLIDE 1

Contextual Awareness for Robot Autonomy

(FA2386-10-1-4138) PI: Reid Simmons (Carnegie Mellon University)

AFOSR Joint Program Review: Cognition and Decision Program Human-System Interaction and Robust Decision Making Program Robust Computational Intelligence Program (Jan 28-Feb 1, 2013, Washington DC)

SLIDE 2

Contextual Awareness (Simmons)

Objective: Develop approaches that provide robots with contextual awareness: awareness of surroundings, capabilities, and intent

Technical Approach: Anticipate possible failures and respond appropriately by planning and reasoning explicitly about uncertainty. Current work switches between policies at run time to maximize the probability of exceeding a reward threshold, and finds anomalies in execution data using deviation from a nominal model

DoD Benefit: Robots that are more robust to uncertainty in the environment; robots that are capable of understanding their limitations and responding intelligently

Budget (Actual / Planned, $K, as of 12/31/12):
FY11: 179,479 / 166,995
FY12: 155,232 / 173,353
FY13: 74,670 / 186,394

Annual Progress Report Submitted? FY11: Y; FY12: Y; FY13: N/A
Project End Date: 8/23/2013

SLIDE 3

List of Project Goals

1. Develop algorithms to reliably detect, diagnose, and recover from exceptional (and uncertain) situations
2. Develop approaches to determine the robot’s own limitations and ask for assistance
3. Develop algorithms to explain actions to people
4. Develop approaches to learn from people

SLIDE 4

Progress Towards Goals

1. Develop algorithms to reliably detect, diagnose, and recover from exceptional (and uncertain) situations (ongoing work by Breelyn Kane, Robotics PhD)
2. Develop approaches to determine the robot’s own limitations and ask for assistance (ongoing work by Juan Mendoza, Robotics PhD)
3. Develop algorithms to explain actions to people (postponed)
4. Develop approaches to learn from people (PhD thesis “Graph-based Trajectory Planning through Programming by Demonstration,” Nik Melchior, defended December 2010, thesis completed August 2012)

SLIDE 5

Policy Switching to Exceed Reward Thresholds

(Breelyn Kane, PhD student, Robotics)

“Risk-Variant Policy Switching to Exceed Reward Thresholds,” B. Kane and R. Simmons, In Proceedings International Conference on Automated Planning and Scheduling, Sao Paulo, Brazil, June 2012.

SLIDE 6

Acting to Exceed Reward Thresholds

  • In competitive domains, second is no better than last

– “The person that said winning isn’t everything, never won anything” (Mia Hamm)
– “If you’re not first, you’re last!” (Ricky Bobby, Talladega Nights)
– Arcade games: not just beating a level, going for the top score

SLIDE 7

Straightforward Approach

  • Add time and cumulative reward to state space
  • Generate optimal plan and execute
  • Significantly increases state space

– Planning probably infeasible for real-world domains

SLIDE 8

Our Approach

  • Offline: generate different policies of varying risk attitude
  • Offline: estimate each policy’s reward distribution
  • Online: switch between policies by calculating, at each time step, the maximum probability of ending above the threshold given the current cumulative (discounted) reward

SLIDE 9

Distribution of Rewards

  • Our work reasons about the complete, non-parametric reward distribution, including the distribution tails

– Estimate the reward distribution by running the policy and gathering statistics, yielding the CDF F_π(x) = P( V_π(s) ≤ x )
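The statistics-gathering step can be sketched as a Monte Carlo estimate: run the policy repeatedly, record each run’s cumulative reward, and build an empirical CDF from the samples. A minimal sketch, not the authors’ code; the `policy`/`env.step` interface and all names are illustrative assumptions:

```python
def simulate_return(policy, env, start, horizon=100, gamma=0.95):
    """Run one episode and return the discounted cumulative reward."""
    s, total, discount = start, 0.0, 1.0
    for _ in range(horizon):
        a = policy(s)
        s, r, done = env.step(s, a)  # illustrative environment interface
        total += discount * r
        discount *= gamma
        if done:
            break
    return total

def empirical_cdf(samples):
    """Return F(x) = P(V <= x) estimated from sampled returns."""
    xs = sorted(samples)
    n = len(xs)
    def F(x):
        # fraction of sampled returns at or below x
        return sum(1 for v in xs if v <= x) / n
    return F
```

Running `simulate_return` many times per policy and passing the results to `empirical_cdf` gives the per-policy distributions used by the switching criterion.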

SLIDE 10

Switching Decision Criterion

At each time step, maximize P( R ≥ thresh ), the probability that the final cumulative reward exceeds the threshold:

π_{t+1} = argmin_π P( V_π(s_{t+1}) < thresh − R(s_t) )

i.e., switch to the policy whose reward-to-go is least likely to fall short of the remaining threshold, given the cumulative reward R(s_t) accumulated so far
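Given each policy’s offline-estimated CDF, the online decision reduces to a one-line comparison. A hedged sketch, with `cdfs` an assumed mapping from policy names to their estimated CDFs (all names illustrative):

```python
def choose_policy(cdfs, cumulative_reward, thresh):
    """Pick the policy most likely to end above `thresh`.

    cdfs: maps policy name -> F(x) = P(V <= x), its reward-to-go CDF
    estimated offline. The reward still needed is thresh - cumulative_reward,
    and P(V >= needed) = 1 - F(needed) (ignoring ties exactly at `needed`).
    """
    needed = thresh - cumulative_reward
    return max(cdfs, key=lambda name: 1.0 - cdfs[name](needed))
```

With a comfortable reward cushion the risk-neutral policy wins; when the cushion shrinks, a riskier policy with heavier upper tail can become the better bet, which is the behavior the pizza-domain results below illustrate.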

SLIDE 11

Pizza Delivery Domain

Risk-neutral policy vs. risky policy (risk = 1.2)

“30 Minutes or It’s Free”
SLIDE 12

Pizza Domain Results

  • Execute 10,000 runs in original MDP
  • Same start state every time
  • Risk-neutral vs switching (with risky policy d=1.2)

Threshold = −100: switching fails 9.5% less than risk-neutral (fails to exceed threshold: risk-neutral 3120, switching 2166) and reduces losses by 30.6%

Threshold = −70: switching fails 22.4% less than risk-neutral (fails to exceed threshold: risk-neutral 8026, switching 5790) and reduces losses by 27.9%

SLIDE 13

Augmented State Approach

  • Augment state space with cumulative reward

– Integer-valued, no discounting
– Reward capped to [−150, 0]
– Action rewards based on location and current cumulative reward
– State space increases by two orders of magnitude

SLIDE 14

Comparison of Approaches

  • Execute 10,000 runs in original MDP
  • Same start state every time
  • No discounting
  • Augmented space vs switching (with risky policy d=1.2)

Augmented state fails 16.9% less than risk-neutral in the original space and reduces losses by 21.2%; augmented state fails 6.8% less than the switching strategy and reduces losses by 9.9%

Fails to exceed the threshold (−70): risk-neutral 7946, switching 6945, augmented state 6256

SLIDE 15

Comparison of Approaches

+ Augmented state approach performs close to optimal

  • Very large planning time
  • Must re-generate policy when threshold changes
  • State space is enormous if discounting is needed

                          Augmented State        Risk-Variant Switching
Planning (offline) time   Solve policy: 18 hrs   Solve policy: < 1 min
                                                 Generate reward distr: 5–10 min
                                                 Construct CDF: 1 min
                          Total: 18 hours        Total: 12 min × 2 policies = 24 min
Execution (online) time   0.015 s                Eval switch: 0.02 s

SLIDE 16

Robust Execution Monitoring

(Juan Pablo Mendoza, PhD student, Robotics)

“Motion Interference Detection in Mobile Robots,” J.P. Mendoza, M. Veloso and R. Simmons, In Proceedings of International Conference on Intelligent Robots and Systems, Vilamoura, Portugal, October 2012

“Mobile Robot Fault Detection based on Redundant Information Statistics,” J.P. Mendoza, M. Veloso and R. Simmons, In IROS Workshop on Safety in Human-Robot Coexistence and Interaction, Vilamoura, Portugal, October 2012

SLIDE 17

Learning to Detect Motion Interference

  • Learn HMM from robot data

– Includes nominal and Motion Interference (MI) states
– Hand-labeled training data
– Learn transition probabilities to nominal states
– Learn observation probabilities of all states
– Transition probability to MI state is a tunable parameter

HMM states: Stop, Accel, Constant, Decel, MI
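With hand-labeled runs, the nominal transition and observation probabilities can be estimated by simple counting, while the transition probability into MI stays a tunable parameter, as the slide describes. A sketch under those assumptions; the data layout and function names are hypothetical, not the authors’ implementation:

```python
from collections import Counter, defaultdict

def learn_hmm(labeled_runs, p_mi):
    """Count-based parameter estimates from hand-labeled runs.

    labeled_runs: sequences of (state, observation) pairs, where state is
    one of 'Stop', 'Accel', 'Constant', 'Decel', 'MI'. Transitions into
    'MI' are not learned: p_mi is the tunable MI-transition probability.
    """
    trans, totals = Counter(), Counter()
    obs = defaultdict(list)
    for run in labeled_runs:
        for (s, _), (s2, _) in zip(run, run[1:]):
            if s2 != 'MI':            # learn only transitions to nominal states
                trans[(s, s2)] += 1
                totals[s] += 1
        for s, o in run:
            obs[s].append(o)
    # nominal transitions share probability 1 - p_mi; MI gets the fixed p_mi
    T = {k: (1 - p_mi) * c / totals[k[0]] for k, c in trans.items()}
    for s in totals:
        T[(s, 'MI')] = p_mi
    # Gaussian observation model per state: (mean, variance)
    emit = {}
    for s, vals in obs.items():
        m = sum(vals) / len(vals)
        emit[s] = (m, sum((x - m) ** 2 for x in vals) / len(vals))
    return T, emit
```

Raising `p_mi` makes the monitor quicker to declare interference at the cost of false positives, which is the precision/recall trade-off plotted later.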

SLIDE 18

Learning Behavior Model

  • Observations

– Commanded velocity
– Velocity difference: difference between commanded and perceived velocity, from encoders
– Acceleration: linear regression of last N measures of velocity
– Jerk: linear regression of last M measures of acceleration
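The acceleration and jerk features are slopes of least-squares lines over recent samples. A sketch of that regression; the sample spacing `dt` is an assumed parameter:

```python
def regression_slope(values, dt=1.0):
    """Slope of the least-squares line through equally spaced samples.

    Applied to the last N velocity samples this gives a smoothed
    acceleration estimate; applied to acceleration samples, a jerk estimate.
    """
    n = len(values)
    ts = [i * dt for i in range(n)]
    t_mean = sum(ts) / n
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(ts, values))
    den = sum((t - t_mean) ** 2 for t in ts)
    return num / den
```

Fitting a line over a window, rather than differencing adjacent samples, damps encoder noise before the HMM sees the feature.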

SLIDE 19

Example Runs

SLIDE 20

Overall Performance

  • Precision/recall as MI transition probability varies
  • With precision = 1 and recall = 0.93, median detection time was 0.36 s (mean = 0.647 s)

SLIDE 21

Detecting Unexpected Anomalies

  • Basic Idea

– Model nominal behavior – Detect significant deviation from nominal – Determine extent of anomaly

  • Make execution monitoring efficient, effective, and informative

SLIDE 22

Modeling Nominal Behavior

  • Define a residual function that is (close to) zero during nominal behavior

– For instance, velocity difference or difference between estimates of redundant sensors

– Future work: Make residual function dependent on current state (e.g., using HMM)

SLIDE 23

Detecting Deviation from Nominal

  • Compute sample mean of residual function from observed data
  • Compute probability that sample mean is not within epsilon of zero

r̄ ~ N( μ_r , σ_r² / N )

a(D) = 1 − ( Φ( (ε − r̄)·√N / σ_r ) − Φ( (−ε − r̄)·√N / σ_r ) )
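Numerically, the deviation test evaluates the standard normal CDF at two z-scores around ±ε. A sketch consistent with that idea; the interface is illustrative, and `sigma` (the residual standard deviation) is assumed known or estimated separately:

```python
import math

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def anomaly_score(residuals, eps, sigma):
    """Probability that the mean residual lies outside [-eps, eps],
    modeling the sample mean as N(r_bar, sigma^2 / N). Near 0 under
    nominal behavior, near 1 when residuals clearly deviate from zero.
    """
    n = len(residuals)
    r_bar = sum(residuals) / n
    scale = math.sqrt(n) / sigma
    return 1.0 - (phi((eps - r_bar) * scale) - phi((-eps - r_bar) * scale))
```

The √N factor is what makes monitoring over a region pay off: pooling more observations sharpens the sample-mean distribution, so persistent small deviations become detectable.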

SLIDE 24

Estimate Extent of Anomaly

  • Define region around current state

– Currently: grid state space – Future: maintain continuous state space

  • Extend region in the direction that increases a(d(R)) the most

– Currently: extend to form axis-aligned hyper-rectangles – Future: general convex shaped regions

  • Stop when a locally maximal anomaly region is found

– Current: Continue while a(d(R)) is non-decreasing – Future: Skip over “gaps”
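The steps above can be sketched as greedy expansion of an axis-aligned box on the gridded state space, keeping an extension only while the region’s anomaly score is non-decreasing. All names and the `score` interface are illustrative assumptions:

```python
def grow_region(score, start, grid_shape):
    """Greedy axis-aligned box growth on a gridded state space.

    score(lo, hi): anomaly score a(d(R)) of the box with inclusive corner
    cells lo and hi. Extensions along each face are kept only while the
    score is non-decreasing; otherwise they are undone.
    """
    lo, hi = list(start), list(start)
    best = score(tuple(lo), tuple(hi))
    improved = True
    while improved:
        improved = False
        for dim in range(len(grid_shape)):
            for bound, step in ((lo, -1), (hi, +1)):
                old = bound[dim]
                bound[dim] = old + step
                if 0 <= bound[dim] < grid_shape[dim]:
                    new = score(tuple(lo), tuple(hi))
                    if new >= best:
                        best = new
                        improved = True
                        continue
                bound[dim] = old   # out of bounds or score dropped: undo
    return tuple(lo), tuple(hi), best
```

Each accepted extension strictly enlarges the box and the grid is finite, so the loop terminates at a locally maximal region.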

SLIDE 25

Example Runs

Panels: Push from Stop, Pull from Stop, Collision

SLIDE 26

Discovering a Global Anomaly

  • Region grows and becomes more certain as the robot travels
  • Ideally, keep upper and lower bounds on the anomaly region
SLIDE 27
SLIDE 28

Interaction with Other Groups and Organizations

  • Received Infinite Mario software and support from John Laird’s group at the University of Michigan; interaction with student Shiwali Mohan
  • Interacted with Sven Koenig (USC/NSF) and former student Yaxin Liu regarding generation of risk-sensitive policies
  • Interaction with Manuela Veloso (CMU CSD/RI): co-advising Juan Pablo Mendoza

SLIDE 29

List of Publications Attributed to the Grant

  • “Risk-Variant Policy Switching to Exceed Reward Thresholds,” B. Kane and R. Simmons, In Proceedings of International Conference on Automated Planning and Scheduling, Sao Paulo, Brazil, June 2012
  • “Graph-based Trajectory Planning through Programming by Demonstration,” Nik Melchior, PhD Thesis, CMU RI-TR-11-40, August 2012
  • “Motion Interference Detection in Mobile Robots,” J.P. Mendoza, M. Veloso and R. Simmons, In Proceedings of International Conference on Intelligent Robots and Systems, Vilamoura, Portugal, October 2012
  • “Mobile Robot Fault Detection based on Redundant Information Statistics,” J.P. Mendoza, M. Veloso and R. Simmons, In IROS Workshop on Safety in Human-Robot Coexistence and Interaction, Vilamoura, Portugal, October 2012

SLIDE 30