Learning Models of Human Behavior using a Value Directed Approach
Jesse Hoey, Computer Science Department, University of Toronto, http://www.cs.toronto.edu/~jhoey/
IRIS Learning Workshop - June 9, 2004
Motivation: Modeling Human Behaviors
[Figure: the behavior model. The previous world state and the Context ("don't steal cake" / "steal cake") condition a behavior; the behavior selects an Action ("get cake"), whose Outcome ("get caught") carries a utility (hunger).]
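The decision-theoretic reading of this picture is that the agent picks the action with the highest expected utility over outcomes. A toy calculation for the cake example, with made-up probabilities and utilities (not numbers from the talk):

```python
# Illustrative expected-utility choice for the cake example; the
# probability and utility values below are assumptions, not from the talk.
utilities = {"eat cake": 10.0, "get caught": -50.0, "stay hungry": 0.0}
p_caught = 0.3  # assumed chance the stealing attempt ends in "get caught"

eu_steal = (1 - p_caught) * utilities["eat cake"] + p_caught * utilities["get caught"]
eu_dont = utilities["stay hungry"]

best = "steal cake" if eu_steal > eu_dont else "don't steal cake"
print(best)  # with these numbers, the expected penalty outweighs the cake
```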
[Figure: dynamic Bayesian network over behaviors. Nodes: action A_t, behavior A^{b:a}, states S^a_{t-1} and S^a_t, observations O, with parameters Θ^A, Θ^O, Θ^D. Inference identifies the most likely behavior, A^{b:a} = 1.]
[Figure: observation sequences from video. Frames 1187-1189 and 1249-1251, with discrete state chains Z^x, Z^w and variables H, V, W, X; the "smile" behavior is recovered as the most likely, A^{b:a} = 1.]
P(O | A^{b:a}) = Σ_{i,j,k,l} P(I_T | W_{T,i}, A^{b:a}) · P(∇f_T | X_{T,j}, A^{b:a}) · Θ^X_{ijkn} · Θ^W_{jkln} · P(X_{T-1,k}, W_{T-1,l}, {O}_{1,T-1} | A^{b:a})
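This likelihood can be computed with a standard forward recursion over the coupled state chains. A minimal sketch, with hypothetical state sizes and random stand-ins for the image term P(I_T | W_T) and the flow term P(∇f_T | X_T) (not the talk's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: NX pose states, NW dynamics states, T frames.
NX, NW, T = 3, 2, 5

# Transition parameters: theta_X[i, j, k] ~ P(X_t=i | W_t=j, X_{t-1}=k),
# theta_W[j, l] ~ P(W_t=j | W_{t-1}=l), normalized over the first axis.
theta_X = rng.random((NX, NW, NX)); theta_X /= theta_X.sum(axis=0, keepdims=True)
theta_W = rng.random((NW, NW)); theta_W /= theta_W.sum(axis=0, keepdims=True)

# Per-frame observation likelihoods (random stand-ins):
# p_I[t, j] ~ P(I_t | W_{t,j}), p_F[t, i] ~ P(grad f_t | X_{t,i}).
p_I = rng.random((T, NW))
p_F = rng.random((T, NX))

def sequence_likelihood(p_I, p_F, theta_X, theta_W):
    """Forward recursion: alpha[i, j] = P(X_t=i, W_t=j, O_{1..t})."""
    alpha = p_F[0][:, None] * p_I[0][None, :] / (NX * NW)  # uniform prior
    for t in range(1, len(p_I)):
        # Sum over previous pose k and dynamics l, weighted by transitions.
        pred = np.einsum('ijk,jl,kl->ij', theta_X, theta_W, alpha)
        alpha = p_F[t][:, None] * p_I[t][None, :] * pred
    return alpha.sum()  # P(O | A^{b:a}) for one behavior model

print(sequence_likelihood(p_I, p_F, theta_X, theta_W))
```

Running this once per candidate behavior model and comparing the resulting likelihoods is what yields the "most likely A^{b:a}" decision.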
[Figure: the DBN unrolled for learning, with behavior A^{b:a}, actions A^a_{τ-1}, A^a_τ, states S^a_{τ-1}, S^a_τ, S^a_{τ+1}, observations O_{τ-1}, O_τ, and parameters Θ^A, Θ^O, Θ^D.]
Learning vs. Decision Making

[Figure: the landscape of sequential decision problems, ordered by difficulty (from "I can win!" and "Bring it on!" up to "Hardcore!" and "Nightmare!"). Problem classes: discrete observations, continuous observations, unobservable state, and multi-agent systems. Solution methods: the decision-analytic approach, incremental pruning, entropy approximations, factored solvers (SPUDD), EM for POMDPs, MDP approximations, finding equilibria, general POMDP solvers, and Monte Carlo methods.]
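In the unobservable-state (POMDP) setting, the agent acts on a belief state maintained by Bayesian filtering. A minimal two-state sketch (the transition and observation numbers are invented for illustration):

```python
import numpy as np

# Hypothetical 2-state POMDP fragment. Belief update:
# b'(s') proportional to P(o | s') * sum_s P(s' | a, s) b(s).
T = np.array([[0.9, 0.1],   # T[s, s'] = P(s' | a, s) for one fixed action a
              [0.2, 0.8]])
O = np.array([0.7, 0.4])    # O[s'] = P(o | s') for one observed o

def belief_update(b, T, O):
    """One step of Bayesian filtering: predict with T, correct with O."""
    b_new = O * (b @ T)
    return b_new / b_new.sum()   # renormalize to a distribution

b = belief_update(np.array([0.5, 0.5]), T, O)
print(b)
```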
[Figure: the DBN augmented with reward nodes R, linking actions A^a and states S^a_{τ-1}, S^a_τ, S^a_{τ+1} for decision making.]
Value iteration:

V^{n+1}(s) = R(s) + max_{a∈A} Σ_t Pr(t | a, s) · V^n(t),   with V^0 = R

π^n(s) = argmax_{a∈A} Σ_t Pr(t | a, s) · V^n(t)
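This backup, in its finite-horizon form, is a few lines of code. A sketch on a made-up three-state, two-action MDP (not the talk's model):

```python
import numpy as np

# Hypothetical MDP: P[a, s, t] = Pr(t | a, s), R[s] = reward in state s.
P = np.array([[[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],
              [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]])
R = np.array([0.0, 1.0, 10.0])

def value_iteration(P, R, horizon=10):
    """Iterate V^{n+1}(s) = R(s) + max_a sum_t Pr(t|a,s) V^n(t), from V^0 = R."""
    V = R.copy()
    for _ in range(horizon):
        Q = R[None, :] + P @ V     # Q[a, s]: backup for each action
        V = Q.max(axis=0)          # greedy max over actions
    return V, Q.argmax(axis=0)     # values and policy pi(s)

V, pi = value_iteration(P, R)
print(V, pi)
```

The policy is read off as the maximizing action at each state, matching the argmax line above.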
[Figure: robot control task. The human issues a control command A_t (a behavior A^{b:a}); the robot's actions are {"go left", "stop", "go right", "forwards"}, and the human's feedback observations O_t are {"good robot", "bad robot"}, which carry the reward R. Sample video frames 40-42 and 51-53 show commands being performed, with discrete state chains below each frame.]
[Figure: learned six-cluster model. Commands A^com map to clusters: left → d1, right → d2, stop → d3, forward → d4, {right, forward, stop} → d5, {left, forward, stop} → d6. Each executed action A^act carries a pair of expected values for bad/good feedback: 0.50/1.50, 0.54/1.54, 0.60/1.60, and 0.65/1.65.]

[Figure: learned four-cluster model, with left → d1, right → d2, stop → d3, forward → d4 and the same bad/good value pairs.]
[Figure: the assisted handwashing task. The previous world state and the caregiver's behavior form the Context; the Action is a prompt; the Outcome is hands washed, which carries the utility (reward).]
self-occlusion