Assistive robotics: helping with tasks for fun and profit.
Neil Bell 3 March 2016 CMSC691-HRI
Who has...
Five to ten year goals?
1. Finish school?
2. Pay off the car loan?
3. First (or second) child?

Barriers to these goals?
1. Injury?
2. Family emergency?
3. A relative growing old, requiring assistance?

Today’s focus: leveraging robotic assistance to ease the burden of extra or increasingly difficult tasks.
1. The Domesticated Robot: Design Guidelines for Assisting Older Adults to Age in Place. J. Beer, C. Smarr, T. Chen, A. Prakash, T. Mitzner, C. Kemp, et al.
2. Online Development of Assistive Robot Behaviors for Collaborative Manipulation and Human-Robot Teamwork. B. Hayes, B. Scassellati. AAAI 2014.
Goals:
1. Retain or enhance functionality despite age-related changes:
   a. Cognition: reduced working memory
   b. Physical: arthritis or pain
   c. Perception: weakened senses, vision loss
2. Thereby eliminate the need for relocation by satisfying, or substituting for, goal #1.
Selection: development of and commitment to personal goals. Reframe or update goals based on life events and changes.
Optimization: increasing the odds of success. Invest time and energy in behaviors that support chosen goals.
Compensation: regulation of loss. Use mechanisms to prevent or balance age-related changes.
Psychological, such as mnemonics and memory aids.
Technological:
1. Hearing aids
2. Wheelchairs
3. Eyeglasses
4. Robots

“Competence” is dynamic, capturing how a person functions in isolation. Relocation decisions depend heavily on an individual’s level of competence.
Identifies daily upkeep tasks as having the highest potential for assistive robotics.
1. Assess older adults’ preference for assistance from robots or humans on upkeep tasks.
2. Understand older adults’ opinions of using a home robot.
3. Consider the implications of the findings for directing improvement efforts in designing home assistive robots.
1. Questionnaire
   a. Technology experience
   b. Demographics, health, and current living situation
2. Conduct group interviews with adults ranging in age from 65 to 93.
3. Assess familiarity with robots.
   a. Most were familiar with the concept of robots.
   b. Few had controlled or interacted with one.
4. Introduce participants to the capabilities of the Personal Robot 2 (PR2).
5. Administer the Assistance Preference Checklist.
Goal: assess how participants’ preferences (robot vs. human) vary per task.
Process:
1. Assume the robot could perform the task at the level of a human.
2. Imagine the participant needed assistance with the given task.
3. Rate preference on a scale:
   a. Human only (1)
   b. No preference (3)
   c. Robot only (5)
Participants preferred robots over humans for many cleaning tasks. (M > 3.00 => preference for robot assistance, where 3.00 = no preference)
Other tasks provided less decisive results. (M ≅ 3.00, where 3.00 = no preference)
Fetching tasks were also geared more towards robotic assistance. (M > 3.00 => preference for robot assistance, where 3.00 = no preference)
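The M > 3.00 decision rule above can be sketched in a few lines. The task names and ratings below are invented for illustration; they are not the study’s data.

```python
# Classify mean ratings on the 1-5 assistance preference scale.
# 1 = human only, 3 = no preference, 5 = robot only.
from statistics import mean

NO_PREFERENCE = 3.0

# Hypothetical per-participant ratings for three example tasks.
ratings = {
    "cleaning the kitchen": [4, 5, 4, 3, 5],
    "reminding about medication": [2, 3, 3, 2, 3],
    "fetching objects": [4, 4, 5, 3, 4],
}

for task, scores in ratings.items():
    m = mean(scores)
    if m > NO_PREFERENCE:
        verdict = "robot preferred"
    elif m < NO_PREFERENCE:
        verdict = "human preferred"
    else:
        verdict = "no preference"
    print(f"{task}: M = {m:.2f} ({verdict})")
```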
A coding scheme was used to identify patterns and themes from the discussion:
1. Transcript segmented by the researcher.
2. Segments categorized into groups: pro and con.
3. Patterns emerge, identifying commonalities in the participant responses.
Pro:
1. Compensation
2. Time saving
3. Delegation of undesirable tasks
4. Effort saving
5. Optimization

Con:
1. Damage to environment
2. Dependency
3. Mental model
4. Reliability in the system
5. Storage & space requirements
Customizability: tailor behavior to user preference.
Interaction: cooperative robot-human effort.
Manipulation: level of dexterous manipulation.
Payload: range of weights expected to work with.
Range of motion: size of kinematic workspace (high/low/near/far).
Storage & size: physical attributes such as footprint, height, and mass.
1. The Domesticated Robot: Design Guidelines for Assisting Older Adults to Age in Place. J. Beer, C. Smarr, T. Chen, A. Prakash, T. Mitzner, C. Kemp, et al.
2. Online Development of Assistive Robot Behaviors for Collaborative Manipulation and Human-Robot Teamwork. B. Hayes, B. Scassellati. AAAI 2014.
If the previous paper can be seen as the “why”, this is the “how”.
General agreement:
1. Human-robot teaming can improve efficiency, quality of life, and safety.
2. Robots should:
   a. Provide assistance when useful.
   b. Do dull or undesirable tasks when possible.
   c. Assist with dangerous tasks when feasible.
Key contribution: one possible process and training model for enabling a robot to learn from demonstration.
Learning by demonstration: providing input to a learning system without complex interfaces.
Novice operators can demonstrate, and even verbally describe, the process.
Algorithms can be robust to the inconsistencies of inexperienced trainers.
Key point: learn the human’s intent rather than solely a sequence of actions.
Consider the concerns that participants raised in the previous paper:
1. “I keep thinking of it in terms of how it could help prepare my food but I don’t know whether robots could cook.”
2. “I can see that if it does laundry, it needs to be able to sort by color. I can see that that would be a con and it couldn’t do it.”
3. “You tell him to bring glasses, he brings you a pair of shoes.”
Learning by demonstration is designed to be flexible and efficient, converging quickly even as new tasks are added.
Markov Decision Process (MDP)
A generalized structure for efficiently representing flexible, arbitrarily complex options, capturing closed-loop policies and action sequences.
S: a set of possible states (s in S) that the agent can be in at any moment.
A: a set of possible actions (a in A) that the agent can take.
R(s′ | s, a): reward for arriving at state s′ having taken action a from state s.
P(s′ | s, a): probability that taking action a from state s actually results in arriving at state s′.
Quite necessary in stochastic settings (“robotics...duh”).
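The (S, A, R, P) tuple above is enough to solve for a policy. Below is a minimal value-iteration sketch over an invented two-state “flashlight” world; the states, actions, and probabilities are illustrative assumptions, not taken from the paper.

```python
# Tiny MDP: the agent wants the flashlight on, but turning it on
# only succeeds 90% of the time (stochastic transitions).
S = ["flashlight_off", "flashlight_on"]
A = ["turn_on", "turn_off", "wait"]

# P[(s, a)] = {s_next: probability}. Missing (s, a) pairs are invalid.
P = {
    ("flashlight_off", "turn_on"): {"flashlight_on": 0.9, "flashlight_off": 0.1},
    ("flashlight_off", "wait"): {"flashlight_off": 1.0},
    ("flashlight_on", "turn_off"): {"flashlight_off": 1.0},
    ("flashlight_on", "wait"): {"flashlight_on": 1.0},
}

def R(s_next, s, a):
    # Reward for arriving at s_next, per the R(s' | s, a) definition.
    return 1.0 if s_next == "flashlight_on" else 0.0

gamma = 0.9  # discount factor
V = {s: 0.0 for s in S}
for _ in range(100):  # value-iteration sweeps until (near) convergence
    V = {
        s: max(
            sum(p * (R(s2, s, a) + gamma * V[s2]) for s2, p in P[(s, a)].items())
            for a in A if (s, a) in P
        )
        for s in S
    }

# Greedy policy: the best action from each state under the learned values.
policy = {
    s: max(
        (a for a in A if (s, a) in P),
        key=lambda a, s=s: sum(
            p * (R(s2, s, a) + gamma * V[s2]) for s2, p in P[(s, a)].items()
        ),
    )
    for s in S
}
```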
Markov Decision Process (MDP)
It’s all about the policy, fully enumerating what actions can be taken from what states. Generally static once the policy graph is defined.
Semi-Markov Decision Process (SMDP): extensible, can adapt and learn new connections.
Partially Observable Markov Decision Process (POMDP)
Builds on MDPs by including a realistic lack of perfect knowledge: observers don’t know exactly what state the agent is in and can only take observations.
Add two new variables to (S, A, R, P):
O: a set of observations, generally collected from states in S.
Ω: conditional probability of an observation given a state and an action.
Ω(o in O | s, a) = probability that o is observed given some action a from s.
Looping across (s, a) tuples gives the best guess at the current state (argmax).
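The added O and Ω terms drive the standard POMDP belief update (a Bayes filter): predict with P, then correct with Ω. The two-state “fetch” world below (echoing the glasses-vs-shoes concern) is an invented illustration, not the paper’s model.

```python
# One belief-update step for a POMDP with states S, transitions P,
# and observation model Omega (the Ω above).
S = ["holding_glasses", "holding_shoes"]

# P[(s, a)] = {s_next: probability} for a single action "fetch".
P = {
    ("holding_glasses", "fetch"): {"holding_glasses": 0.8, "holding_shoes": 0.2},
    ("holding_shoes", "fetch"): {"holding_glasses": 0.3, "holding_shoes": 0.7},
}

# Omega[(o, s_next, a)] = probability of observing o in state s_next.
Omega = {
    ("looks_like_glasses", "holding_glasses", "fetch"): 0.9,
    ("looks_like_glasses", "holding_shoes", "fetch"): 0.2,
    ("looks_like_shoes", "holding_glasses", "fetch"): 0.1,
    ("looks_like_shoes", "holding_shoes", "fetch"): 0.8,
}

def belief_update(belief, a, o):
    """Bayes filter: predict over transitions, correct with the observation."""
    new_belief = {}
    for s_next in S:
        # Predict: marginalize over prior states.
        pred = sum(belief[s] * P[(s, a)].get(s_next, 0.0) for s in S)
        # Correct: weight by how likely o is in s_next.
        new_belief[s_next] = Omega[(o, s_next, a)] * pred
    total = sum(new_belief.values())
    return {s: b / total for s, b in new_belief.items()}

b0 = {"holding_glasses": 0.5, "holding_shoes": 0.5}
b1 = belief_update(b0, "fetch", "looks_like_glasses")
# The argmax over b1 is the "best guess" mentioned above.
```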
Paper’s focus is on sharing a workspace with an assistive robot while cooperating on high-level construction & assembly tasks:
1. Materials stabilization (e.g., holding a magnifying glass)
2. Materials retrieval (e.g., fetching a cell phone)
3. Collaborative object manipulation (e.g., opening a jar)
4. Enhancing awareness (e.g., recognizing danger and alerting)
5. Task progression guidance (e.g., preparing items for a task in advance)
This correlates heavily with the tasks necessary for successful aging in place.
Stepping back: the goal is to identify the behavior policy, using SMDPs to adapt and learn new connections. We can map vertices and edges to components of the task at hand.
Standard SMDP directed graph: state-centric; take actions to get to new states.
Vertices represent possible states:
1. Robot arm position
2. Worker tool selection
3. Worker position
4. Material being worked on
5. Tool state (drill rotating, flashlight shining, ...)
Edges represent actions to transition between states:
1. Turn on flashlight (if in a state where the flashlight is held and off)
2. Turn off flashlight (if in a state where the flashlight is held and on)
The graph’s “dual” swaps the purpose of vertices and edges, becoming action-centric.
Edges represent possible states, i.e., configurations of the workspace:
1. (flashlight = off, worker has hammer)
2. (flashlight = on, worker has hammer)
3. (flashlight = on, worker not present)
Vertices represent possible actions:
1. Turn on flashlight
2. Turn off flashlight
An action-centric graph focuses on taking actions when a given state configuration is present.
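As a rough sketch of the idea (my construction, not the paper’s), the state-centric graph can be mechanically “dualized” so that actions become vertices and the states connecting them become edges. The flashlight example is invented for illustration.

```python
# State-centric graph: keys are (from_state, to_state) edges,
# values are the actions that traverse them.
state_graph = {
    ("flashlight_off", "flashlight_on"): "turn_on",
    ("flashlight_on", "flashlight_off"): "turn_off",
}

def dualize(graph):
    """Swap roles: actions become vertices; an edge connects action A
    to action B, labeled by the state configuration in which B can
    follow A (A's result state is B's start state)."""
    dual = {}
    for (s_from, s_to), action in graph.items():
        for (s2_from, s2_to), next_action in graph.items():
            if s_to == s2_from:
                dual[(action, next_action)] = s_to
    return dual

action_graph = dualize(state_graph)
# e.g. "turn_off" can follow "turn_on", via the "flashlight_on" state.
```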
Skill acquisition: provide guidance and information so the SMDP can be built.
The authors use explicit, kinesthetic teaching, clarifying which attributes of the current task are critical to learn. Manipulator distance may be useful in one scenario but not in another; different workers have preferences or styles (left- vs. right-handed, ...).
Interpretation: a blend of supervised and unsupervised learning.
Supervised: learning is structured and guided.
Unsupervised: the actual internal model is a black box; thus, POMDP.
Skill refinement: on-the-job retraining, not re-scripting as with assembly-line robots.
The task changes over time: new tools, new materials, new targets.
As with humans, sub-optimal behavior requires immediate correction.
New training data is provided by kinesthetic movement or positioning.
Updates are added to the SMDP, updating the reward function R().
Layered rewards provide both top-level and sub-level rewards for finer granularity.
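One way to picture the reward-function update during refinement: nudge R(s, a) toward the value implied by the trainer’s correction. The table-based R and the update rule below are assumptions for illustration, not the paper’s exact mechanism.

```python
# Reward table R[(state, action)] updated from kinesthetic corrections.
R = {("workpiece_aligned", "grasp"): 0.5}

def apply_correction(R, s, a, demonstrated_reward, lr=0.5):
    """Move R(s, a) part-way toward the reward implied by the trainer's
    demonstration, at learning rate lr."""
    old = R.get((s, a), 0.0)
    R[(s, a)] = old + lr * (demonstrated_reward - old)
    return R

# The trainer physically guides the arm, signaling the grasp was good.
apply_correction(R, "workpiece_aligned", "grasp", 1.0)
```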
The robot has a difficult role: contribute to the current state while predicting the next state.
1. Choose actions from the set of valid actions in the current state, based on the POMDP.
2. Optimize: minimize time spent in the current state s by reducing the reward according to duration, encouraging progress.
3. Create a list of possible assistive policies based on observed task-level policies.
4. Select the policy that most lowers the expected policy-wide time-in-state, given all available observations.
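Step 2 above (reduce reward with duration) can be sketched as an exponential time discount on each candidate action. The action names, rewards, durations, and decay rate are invented for illustration.

```python
import math

# (expected_reward, expected_duration_seconds) per candidate action.
candidates = {
    "hand_over_drill": (1.0, 4.0),
    "point_at_drill": (0.6, 1.0),
    "do_nothing": (0.0, 0.5),
}

DECAY = 0.2  # assumed per-second penalty rate

def time_discounted_value(reward, duration):
    # Exponentially decay reward with duration: slow actions that keep
    # the system in the current state score lower, encouraging progress.
    return reward * math.exp(-DECAY * duration)

best = max(candidates, key=lambda a: time_discounted_value(*candidates[a]))
```

With these numbers the quick, moderately helpful action wins over the slower, more helpful one, which is exactly the trade-off the duration penalty encodes.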
After a policy is chosen, personalize the behavior: choose action a based on the worker.
Action rejection is valid; if the worker denies robot interaction, update the SMDP policy.
Execute the behavior, leveraging kinematic calculations for pose, actions, and response.
Leverage POMDPs for flexible training and refinement. Robots receive on-the-job training with live human workers. Training leads to optimal policy selection and adaptation during execution.
Final thoughts, ideas, questions?