
SLIDE 1

Learning Models of Human Behavior using a Value Directed Approach

Jesse Hoey, Computer Science Department, University of Toronto, http://www.cs.toronto.edu/~jhoey/

IRIS Learning Workshop - June 9, 2004

SLIDE 2

Motivation: Modeling Human Behaviors

[Diagram: VIDEO → computer vision → human behaviors → decision theory → ACTION; "cognitive vision"]




SLIDE 6

POMDPs for Human Behavior Understanding

[Influence diagram: previous world state + behavior → Outcome; Context: steal cake vs. don’t steal cake; Action: get cake vs. get caught; utility (hunger)]

Partially Observable Markov Decision Process

SLIDE 7

Overview

➜ POMDPs for Display Understanding in Context
➜ Computer Vision: Modeling video sequences

  • spatial abstraction
  • temporal abstraction

➜ Learning POMDPs
➜ Solving POMDPs
➜ Value-Directed Learning
➜ Experiments

  • Robot Control
  • Card Matching Game

➜ Conclusions, Current & Future Work


SLIDE 8

Partially Observable Markov Decision Processes (POMDPs)

A POMDP is a probabilistic temporal model of an agent interacting with its environment: a tuple ⟨S, A, T, R, O, B⟩

  • S: finite set of unobservable states
  • A: finite set of agent actions
  • T : S × A → Δ(S), the transition function
  • R : S × A → ℝ, the reward function
  • O: set of observations
  • B : S × A → Δ(O), the observation function

[DBN: states S, observations O, action A, reward R over time slices t−1, t]
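The tuple above maps directly onto a small data structure. A minimal sketch (the class name, dict-of-matrices layout, and toy numbers are my own, not from the talk), with the standard POMDP belief update added for illustration:

```python
import numpy as np

class POMDP:
    """Minimal container for the tuple <S, A, T, R, O, B> above.

    T[a][s, s2] = Pr(s2 | s, a)   (transition function)
    B[a][s2, o] = Pr(o | s2, a)   (observation function)
    """
    def __init__(self, S, A, T, R, O, B):
        self.S, self.A, self.T, self.R, self.O, self.B = S, A, T, R, O, B

    def belief_update(self, b, a, o):
        # Bayes filter over the unobservable state:
        # b'(s2) ∝ Pr(o | s2, a) * sum_s Pr(s2 | s, a) b(s)
        bp = self.B[a][:, o] * (self.T[a].T @ b)
        return bp / bp.sum()

# Toy two-state model (illustrative numbers only)
m = POMDP(S=[0, 1], A=[0],
          T={0: np.array([[0.9, 0.1], [0.2, 0.8]])},
          R={0: np.array([0.0, 1.0])},
          O=[0, 1],
          B={0: np.array([[0.8, 0.2], [0.3, 0.7]])})
b1 = m.belief_update(np.array([0.5, 0.5]), a=0, o=0)
```

Because the state is unobservable, the agent reasons over such beliefs b rather than states s.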

SLIDE 9

POMDPs for Human Behavior Understanding

[Influence diagram (cake example) augmented with parameter nodes Θ^A, Θ^O and decision node D: previous world state + behavior → Outcome; utility]



SLIDE 12

Output Model

[Figure: instantiated output model on video frames 1187–1251 — chains over hidden states Z^x, Z^w; tracked variables H, V, W, X; image observations I and flow ∇f; most likely behavior A^{b:a} = 1 ("smile")]

[DBN: action A^a_t, behavior A^{b:a}, state S^a_t, observation O over time slices t−1, t, with parameters Θ^A, Θ^O, Θ^D]

P(O | A^{b:a}) = Σ_{ij} P(I_T | W_{T,i}, A^{b:a}) P(∇f_T | X_{T,j}, A^{b:a}) Σ_{kl} Θ^X_{ijkn} Θ^W_{jkln} P(X_{T−1,k}, W_{T−1,l}, {O}_{1..T−1} | A^{b:a})
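The likelihood above is computed by a forward recursion over the hidden mixture states. A generic, simplified version (a single hidden chain z in place of the talk's factored X, W; names are illustrative):

```python
import numpy as np

def log_likelihood(obs_lik, Theta, prior):
    """Scaled forward recursion: returns log P(o_1..T | behavior).

    obs_lik[t, z] = P(o_t | z, behavior)
    Theta[z1, z2] = P(z_t = z2 | z_{t-1} = z1, behavior)
    prior[z]      = P(z_0 | behavior)
    """
    alpha = prior * obs_lik[0]
    c = alpha.sum()
    ll = np.log(c)
    alpha /= c
    for t in range(1, obs_lik.shape[0]):
        alpha = obs_lik[t] * (Theta.T @ alpha)
        c = alpha.sum()          # rescale to avoid numerical underflow
        ll += np.log(c)
        alpha /= c
    return ll

# Sanity check: uniform 2-state chain, constant observation likelihood 0.5
ll = log_likelihood(np.full((4, 2), 0.5), np.full((2, 2), 0.5),
                    np.array([0.5, 0.5]))
```

Selecting the behavior with the highest likelihood gives the "most likely A^{b:a}" shown in the figure.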

SLIDE 13

Learning the Model

[DBN: observations O_{τ−1}, O_τ; states S^a_{τ−1}, S^a_τ; behaviors A^{b:a}_{τ−1}, A^{b:a}_τ; actions A^a_{τ−1}, A^a_τ, A^a_{τ+1}; parameters Θ^A, Θ^D, Θ^O]

Find parameters Θ* = arg max_Θ P(O, S^a, A^a | Θ).

Use the expectation-maximization (EM) algorithm:

Θ* = arg max_Θ [ Σ_{A^{b:a}} P(A^{b:a} | O, S^a, A^a, Θ′) log P(A^{b:a}, O, S^a, A^a | Θ) + log P(Θ) ]

which finds a local maximum of the a posteriori probability.
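The E-step/M-step structure of this update can be sketched on a toy model in which the hidden behavior A^{b:a} is a plain mixture label over categorical observations (the talk's model is a full DBN; the data, sizes, and flat prior here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
obs = rng.choice(3, size=200, p=[0.6, 0.3, 0.1])   # illustrative observations

pi = np.array([0.5, 0.5])                   # P(behavior)
theta = np.array([[0.4, 0.4, 0.2],          # P(o | behavior), rough init
                  [0.2, 0.3, 0.5]])

one_hot = obs[:, None] == np.arange(3)      # (N, 3) indicator of each symbol

for _ in range(50):
    # E-step: posterior P(behavior | o, Theta') for every observation
    post = pi[:, None] * theta[:, obs]      # shape (2, N)
    post /= post.sum(axis=0, keepdims=True)
    # M-step: maximize the expected complete-data log-likelihood
    pi = post.mean(axis=1)
    counts = post @ one_hot                 # (2, 3) expected symbol counts
    theta = counts / counts.sum(axis=1, keepdims=True)
```

A log-prior term log P(Θ), as in the objective above, would enter the M-step as Dirichlet pseudo-counts; it is omitted here for brevity.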

SLIDE 14

Solution Techniques

[Chart: decision-making problems ordered by difficulty ("I can win!" … "Bring it on!" … "Hurt me plenty!" … "Hardcore!" … "Nightmare!") — observable state; unobservable state, discrete observations; unobservable state, continuous observations; multi-agent systems / finding equilibria; general optimal solution. Solution techniques: decision-analytic approach, incremental pruning, entropy approximation, factored solvers (SPUDD), EM for POMDPs, MDP approximation, Monte Carlo POMDPs]

SLIDE 15

Solving the Model

[DBN: behaviors A^{b:a}_{τ−1}, A^{b:a}_τ; states S^a_{τ−1}, S^a_τ; actions A^a_{τ−1}, A^a_τ, A^a_{τ+1}; rewards R]

MDP Approximation: assume A^{b:a} is observable.

Dynamic Programming: Value Iteration

V^{n+1}(s) = R(s) + max_{a∈A} Σ_{t∈S} Pr(t | a, s) · V^n(t),   with V^0 = R

n-stage-to-go Policy: actions that maximize expected value

π^n(s) = arg max_{a∈A} Σ_{t∈S} Pr(t | a, s) · V^n(t)
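The two updates above translate directly into a few lines of code. The 3-state, 2-action MDP below is an illustrative stand-in (finite horizon, no discount, as on the slide):

```python
import numpy as np

# P[a, s, t] = Pr(t | a, s); illustrative transition matrices and rewards.
R = np.array([0.0, 0.0, 1.0])
P = np.array([
    [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]],   # action 0: cautious
    [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],   # action 1: move right
])

def value_iteration(R, P, n_stages):
    V = R.copy()                      # V^0 = R
    for _ in range(n_stages):
        Q = P @ V                     # Q[a, s] = sum_t Pr(t | a, s) V(t)
        V = R + Q.max(axis=0)         # V^{n+1}(s) = R(s) + max_a Q[a, s]
    policy = (P @ V).argmax(axis=0)   # pi^n(s) = argmax_a sum_t Pr(t|a,s) V^n(t)
    return V, policy

V, pi = value_iteration(R, P, n_stages=10)
```

Here action 1 always moves probability mass toward the rewarding state 2, so the greedy policy selects it everywhere.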

SLIDE 16

Value Directed Structure Learning

[Figure: two value functions V1, V2 over behaviors (salami, background, cook); behaviors relevant to value are split, irrelevant ones are merged]

SLIDE 17

Value Directed Structure Learning

State merging:
repeat
  1. learn the POMDP model
  2. compute value functions for behaviors
  3. compute distance between value functions
  4. if policies agree, merge behaviors closest in value
until the number of behaviors stops changing

State splitting:
repeat
  1. learn the POMDP model
  2. examine states for predictive power (e.g. entropy)
  3. split behaviors which predict different outcomes
until the number of behaviors stops changing
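The merge test in the loop above can be sketched as follows (the data structures and the value-averaging step are illustrative; in the talk's procedure the POMDP is re-learned after each merge):

```python
import numpy as np

def merge_closest(values, policies):
    """values: dict name -> value vector; policies: dict name -> greedy action.
    Merge the pair of behavior states closest in value whose policies agree;
    return the dicts unchanged if no pair's policies agree."""
    names = list(values)
    best, pair = np.inf, None
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if policies[a] == policies[b]:                 # policies must agree
                d = np.linalg.norm(values[a] - values[b])  # distance in value
                if d < best:
                    best, pair = d, (a, b)
    if pair is None:
        return values, policies
    a, b = pair
    merged_v = 0.5 * (values[a] + values[b])   # placeholder; a model refit follows
    merged_p = policies[a]
    values = {k: v for k, v in values.items() if k not in pair}
    policies = {k: p for k, p in policies.items() if k not in pair}
    values[a + "+" + b] = merged_v
    policies[a + "+" + b] = merged_p
    return values, policies

# Two behavior states with agreeing policies and near-identical values merge:
vals = {"left": np.array([0.60, 1.60]), "stop": np.array([0.54, 1.54]),
        "stop2": np.array([0.55, 1.55])}
pols = {"left": "go_left", "stop": "halt", "stop2": "halt"}
v2, p2 = merge_closest(vals, pols)
```

Repeating this until no pair qualifies implements the "until the number of behaviors stops changing" termination test.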

SLIDE 18

Experiments: Robot Control Gestures

[DBN: reward R, operator action A^b_t, behavior A^{b:a}_t, observation O_t]

  • observation of gesture: O_t
  • operator action: {"good robot", "bad robot"}
  • robot action (control command A^a_t): {"go left", "stop", "go right", "forwards"}

[Figure: video frames 40–53 with tracked hand-gesture states]

SLIDE 19

Value-Directed Structure Learning

Part of value function & policy for robot control:

[Table: value function over A^{com}, A^{act} with states d1–d6 — entries such as 0.60 (bad) / 1.60 (good), 0.50 / 1.50, 0.54 / 1.54, 0.65 / 1.65]
[Table: policy — A^{com}: left → d1, right → d2, stop → d3, forward → d4, {right, forward, stop} → d5, {left, forward, stop} → d6]

  • Some states of A^{com} are redundant
  • detect & merge using state aggregation in the policy and value function
  • re-compute policy

SLIDE 20

Value-Directed Structure Learning

[Table: after merging, value function over A^{com}, A^{act} with four states — entries 0.54 (bad) / 1.54 (good), 0.50 / 1.50, 0.60 / 1.60, 0.65 / 1.65]
[Table: policy — A^{com}: left → d1, right → d2, stop → d3, forward → d4]

  • Leave-four-out cross validation (12 times)
  • Take actions and accumulate rewards
  • Success rate: 47/48 = 98% or 11/12 correct policies
  • Merges to 4 states all 12 times.


SLIDE 21

Experiments: Card Matching Game

Cooperative two-player game. Goal: match cards.

[Figure: game stages 1, 2, 3]

SLIDE 22

Card Matching Results

3 behaviors identified: nodding, shaking, null

Predicts:

  • 6/7 human actions in test data
  • 19/20 human actions in training data.

Errors: lack of POMDP data, temporal segmentation problems


SLIDE 23

Handwashing Behavior Understanding

[Influence diagram: previous world state + caregiver behavior → Outcome: hands washed; Action: prompt; utility (reward)]

  • P(video | behavior): statistically significant
  • value directed

SLIDE 24

Difficult Cases

  • self-occlusion
  • object occlusion
  • objects appear to merge

SLIDE 25

Conclusions

  • Computer Vision + Probabilistic Models + Decision Theory
  • Learning purposeful human behavior models from unlabeled data.
  • System is general and portable - no reliance on expert knowledge
  • Applications: HCI, surveillance, assisted living, driver support
  • Future work

  – Spatial segmentation and representation + tracking
  – Multimodal observations
  – Temporal segmentation
  – POMDP solutions
  – Value-directed learning (Hoey & Little, CVPR 2004)
