SLIDE 1

Inferring Human Interaction from Motion Trajectories

Tianmin Shu¹  Yujia Peng²  Lifeng Fan¹  Hongjing Lu²  Song-Chun Zhu¹

University of California, Los Angeles, USA

¹ Department of Statistics   ² Department of Psychology

SLIDE 2

People are adept at inferring social interactions from highly simplified stimuli.

Heider and Simmel (1944)

SLIDE 3
  • Later studies showed that the perception of human-like interactions relies on critical low-level motion cues, e.g., speed and motion direction (Dittrich & Lea, 1994; Scholl & Tremoulet, 2000; Tremoulet & Feldman, 2000, 2006; Gao, Newman, & Scholl, 2009; Gao, McCarthy, & Scholl, 2010…).

[Gao & Scholl, 2011] Chasing vs. Stalking

SLIDE 4

Real-life Stimuli

[Shu et al., 2015] [Choi et al., 2009]

SLIDE 5


Tracking human trajectories and labeling group human interactions.

[Shu et al., CVPR 2015]

SLIDE 6

Experiment 1

  • Participants
    • 33 participants from the UCLA subject pool
  • Stimuli
    • 24 interactive actions
    • 24 non-interactive actions
    • Duration: 15-33 s
    • 4 of the 48 actions were used as practice
  • Task
    • Judging whether the two agents are interacting at each moment

[Figure: examples of interactive and non-interactive instances]

SLIDE 7

Interactive Example

SLIDE 8

Non-interactive Example

SLIDE 9

Human Experiment Results

[Plot: interactive ratings over video frames for interactive action 4 and non-interactive action 40]

SLIDE 10

Human Experiment Results

N = 33

[Plot: interactive ratings over video frames for interactive action 4 and non-interactive action 40]

SLIDE 11
Computational Model

  • Previous studies have developed Bayesian models to reason about the intentions of agents moving in maze-like environments (Baker, Goodman, & Tenenbaum, 2008; Baker, Saxe, & Tenenbaum, 2009; Ullman et al., 2009; Baker, 2012…).

[Ullman et al., 2009] [Baker, 2012]

SLIDE 12
Computational Model

  • Previous studies have developed Bayesian models to reason about the intentions of agents moving in maze-like environments (Baker, Goodman, & Tenenbaum, 2008; Baker, Saxe, & Tenenbaum, 2009; Ullman et al., 2009; Baker, 2012…).
  • In the current study, we are not trying to explicitly infer intention. Instead, we want to see whether the model can capture statistical regularities (e.g., motion patterns) that signal human interaction.

SLIDE 13

Computational Model

  • Γ: input layer, motion trajectories of the two agents
  • S: latent sub-interactions
  • Y: interaction labels (1: interactive, 0: non-interactive)

SLIDE 14
  • 1. Conditional interactive fields (CIFs)

Interactivity can be represented by latent motion fields, each capturing the relative motion between the two agents.

Linear Dynamic System:
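As a minimal sketch only, assuming a standard linear dynamic system over the relative state x_t with per-sub-interaction parameters A_s, b_s, Σ_s (not necessarily the exact formulation of the model):

$$x_{t+1} = A_s\,x_t + b_s + w_t, \qquad w_t \sim \mathcal{N}(0, \Sigma_s)$$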

SLIDE 15

An example CIF: Orbiting

  • Arrows: the mean relative motion at different locations
  • Arrow intensity: the relative spatial density, increasing from light to dark
SLIDE 16
  • 2. Temporal parsing by latent sub-interactions

Latent motion fields can vary over time, which enables the model to characterize changes in the agents' behavior.

SLIDE 17

A Simple View of Our Model

Interaction ≈ Fields + Procedure

SLIDE 18

Formulation

  • Γ: input of motion trajectories
  • S: latent variables, sub-interactions
  • Y: interaction labels (1: interactive, 0: non-interactive)

Given the input motion trajectories Γ, the model infers the posterior distribution of the latent variables S and Y.
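As a hedged sketch, a generic factorization for a hierarchical model of this shape (an assumption, not necessarily the exact equation of the model) would be:

$$p(S, Y \mid \Gamma) \;\propto\; p(\Gamma \mid S)\,p(S \mid Y)\,p(Y)$$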


SLIDE 22

Learning

  • Gibbs sampling
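The slide names only the method. Below is a minimal, self-contained sketch of Gibbs sampling over per-frame sub-interaction labels; the Gaussian emission model, the sticky Markov transition prior, and every name in it are illustrative assumptions, not the actual learning procedure of the model.

    # Sketch: Gibbs sweeps over per-frame sub-interaction labels s_t.
    # Emission and transition models below are assumed for illustration.
    import numpy as np

    rng = np.random.default_rng(0)

    def gibbs_resample_labels(features, means, sigma, trans, n_sweeps=10):
        """Resample each s_t given its neighbors under a Gaussian-emission,
        Markov-transition model (a stand-in for the CIF likelihood)."""
        T, K = len(features), len(means)
        s = rng.integers(K, size=T)
        for _ in range(n_sweeps):
            for t in range(T):
                log_p = np.zeros(K)
                for k in range(K):
                    # Emission: how well field k explains this frame's motion.
                    diff = features[t] - means[k]
                    log_p[k] = -0.5 * np.sum(diff**2) / sigma**2
                    # Temporal coherence with the neighboring labels.
                    if t > 0:
                        log_p[k] += np.log(trans[s[t - 1], k])
                    if t < T - 1:
                        log_p[k] += np.log(trans[k, s[t + 1]])
                p = np.exp(log_p - log_p.max())
                s[t] = rng.choice(K, p=p / p.sum())
        return s

    # Toy usage: 2-D relative-motion features, K = 3 sub-interactions.
    feats = rng.normal(size=(50, 2))
    mus = rng.normal(size=(3, 2))
    A = np.full((3, 3), 0.1) + 0.7 * np.eye(3)  # sticky transitions, rows sum to 1
    print(gibbs_resample_labels(feats, mus, 1.0, A))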
SLIDE 23

Inference

The model infers the current status of the latent variables.

SLIDE 24

Inference

The model infers the current status of the latent variables:

  • Infer s_t under the assumption of interaction (i.e., y_t = 1)

SLIDE 25

Inference

The model infers the current status of the latent variables:

  • Infer s_t under the assumption of interaction (i.e., y_t = 1)
  • Compute the posterior probability of y_t = 1 given s_t ∈ S
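As a rough illustration of this two-step inference: score the frame's motion feature under each sub-interaction field, then update p(y_t = 1). The best-matching-field likelihood and the flat non-interactive baseline are assumptions, not the exact model.

    # Sketch: per-frame interactiveness posterior under assumed forms.
    import numpy as np

    def interactive_posterior(feature, means, sigma, prior=0.5, baseline=1e-2):
        diffs = feature - means                      # (K, d) residuals
        lik_s = np.exp(-0.5 * np.sum(diffs**2, axis=1) / sigma**2)
        lik_interactive = lik_s.max()                # best-matching field
        num = prior * lik_interactive
        return num / (num + (1 - prior) * baseline)

    mus = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
    print(interactive_posterior(np.array([0.9, 0.1]), mus, 1.0))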

SLIDE 26

Prediction

  • Predict/synthesize x_{t+1} given y_t and s_t
  • Predict/synthesize s_{t+1} given y_{t+1} and all previous s
  • y_{t+1} = 1: interactive trajectories
  • y_{t+1} = 0: non-interactive trajectories
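A minimal sketch of this generative rollout, with sticky Markov transitions over sub-interactions and a drift-plus-noise update for the relative position; the step size and all parameter forms are illustrative assumptions.

    # Sketch: sample s_{t+1} from transitions, then roll x forward
    # through the chosen field's mean motion. All forms are assumed.
    import numpy as np

    rng = np.random.default_rng(1)

    def synthesize(T, means, trans, sigma=0.05):
        K, d = means.shape
        x = np.zeros((T, d))
        x[0] = rng.normal(size=d)
        s = int(rng.integers(K))
        for t in range(T - 1):
            s = rng.choice(K, p=trans[s])            # next sub-interaction
            # Drift toward the field's mean motion, plus Gaussian noise.
            x[t + 1] = x[t] + 0.1 * means[s] + sigma * rng.normal(size=d)
        return x

    mus = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
    A = np.full((3, 3), 0.05) + 0.85 * np.eye(3)     # rows sum to 1
    print(synthesize(20, mus, A)[:5])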

SLIDE 27

Training Data

  • 1. UCLA aerial event dataset (Shu et al., 2015)
    http://www.stat.ucla.edu/~tianmin.shu/AerialVideo/AerialVideo.html
    • 131 training instances (excluding trajectories used in the stimuli)
    • 22 validation instances from the 44 stimuli
    • 22 testing instances from the remaining stimuli
SLIDE 28

Training Data

  • 1. UCLA aerial event dataset (Shu et al., 2015)
    http://www.stat.ucla.edu/~tianmin.shu/AerialVideo/AerialVideo.html
    • 131 training instances (excluding trajectories used in the stimuli)
    • 22 validation instances from the 44 stimuli
    • 22 testing instances from the remaining stimuli
  • 2. A second dataset created from the original Heider-Simmel animation (i.e., two triangles and one circle)
    • 27 training instances
SLIDE 29

A few predominant CIFs:

  • 1. Approaching
  • Arrows: the mean relative motion at different locations
  • Arrow intensity: the relative spatial density, increasing from light to dark
SLIDE 30

A few predominant CIFs:

  • 2. Passing by (upper part)
SLIDE 31

A few predominant CIFs:

  • 2. Passing by (upper part) and following (lower part)
SLIDE 32

A few predominant CIFs:

  • 3. Leaving/avoiding
SLIDE 33

The frequencies of the CIFs, with an illustration of the five most frequent CIFs learned from the training data.

SLIDE 34

Temporal Parsing of Fields in Heider-Simmel Animations

SLIDE 35

Interactiveness Inference in Aerial Videos

SLIDE 36

Experiment 1 Results

Comparison of online predictions by our full model (|S| = 15) (orange) and humans (blue) over time (in seconds) on testing videos.


SLIDE 37

Experiment 1 Results

Comparison of online predictions by our full model (|S| = 15) (orange) and humans (blue) over time (in seconds) on testing videos.

Trained on Heider-Simmel stimuli and tested on aerial-video stimuli: r = 0.640, RMSE = 0.227.

SLIDE 38

Testing the Model Trained on Aerial Videos with Heider-Simmel Stimuli

SLIDE 39

Experiment 2

  • We used the model trained on aerial videos to synthesize new interactive videos:
SLIDE 40

Experiment 2

  • We used the model trained on aerial videos to synthesize new interactive videos
  • The model generated 10 interactive animations

[Video: synthesized interactive animation (y = 1), with the model-predicted interactiveness shown; played at 5x]


SLIDE 42

Experiment 2

  • We used the model trained on aerial videos to synthesize new interactive videos
  • The model generated 10 interactive animations and 10 non-interactive animations

[Video: synthesized non-interactive animation (y = 0), with the model-predicted interactiveness shown; played at 5x]

SLIDE 43

Experiment 2 Results

N = 17

  • The interactiveness between the two agents in the synthesized videos was judged accurately by human observers.
  • The model effectively captured the visual features that signal potential interactivity between agents.

SLIDE 44

Conclusion

  • Decontextualized animations based on real-life videos enable human perception of social interactions.

SLIDE 45

Conclusion

  • Decontextualized animations based on real-life videos enable human perception of social interactions.
  • The hierarchical model can learn the statistical regularities of common sub-interactions, and accounts for human judgments of interactiveness.

SLIDE 46

Conclusion

  • Decontextualized animations based on real-life videos enable human perception of social interactions.
  • The hierarchical model can learn the statistical regularities of common sub-interactions, and accounts for human judgments of interactiveness.
  • Results suggest that human interactions can be decomposed into sub-interactions such as approaching, walking in parallel, or orbiting.

SLIDE 47

For more details, please visit our website:

http://www.stat.ucla.edu/~tianmin.shu/HeiderSimmel/CogSci17/