  1. Inferring Human Interaction from Motion Trajectories. Tianmin Shu¹, Yujia Peng², Lifeng Fan¹, Hongjing Lu², Song-Chun Zhu¹. University of California, Los Angeles, USA. ¹Department of Statistics, ²Department of Psychology

  2. People are adept at inferring social interactions from highly simplified stimuli. Heider and Simmel (1944)

  3. • Later studies showed that the perception of human-like interactions relies on critical low-level motion cues, e.g., speed and motion direction (Dittrich & Lea, 1994; Scholl & Tremoulet, 2000; Tremoulet & Feldman, 2000, 2006; Gao, Newman, & Scholl, 2009; Gao, McCarthy, & Scholl, 2010…). Chasing vs. stalking [Gao & Scholl, 2011]

  4. Real-life Stimuli [Choi et al., 2009] [Shu et al., 2015]

  5. Tracking human trajectories and labeling group human interactions [Shu et al., CVPR 2015]

  6. Experiment 1 [panels: interactive vs. non-interactive instances, with the two agents labeled 1 and 2] • Participants: 33 participants from the UCLA subject pool • Stimuli: 24 interactive and 24 non-interactive actions; duration 15–33 s; 4 of the 48 actions were used as practice • Task: judging whether the two agents are interacting at each moment

  7. Interactive Example

  8. Non-interactive Example

  9.–10. Human Experiment Results (N = 33) [figure: interactiveness ratings as a function of video frame for an interactive action and a non-interactive action]

  11.–12. Computational Model • Previous studies have developed Bayesian models to reason about the intentions of agents moving in maze-like environments (Baker, Goodman, & Tenenbaum, 2008; Baker, Saxe, & Tenenbaum, 2009; Ullman et al., 2009; Baker, 2012…) [Ullman et al., 2009] [Baker, 2012] • In the current study, we are not trying to explicitly infer intention. Instead, we want to see whether the model can capture statistical regularities (e.g., motion patterns) that signal human interaction.

  13. Computational Model • Y: interaction labels (1: interactive, 0: non-interactive) • S: latent sub-interactions • Γ: input layer, motion trajectories of the two agents

  14. 1. Conditional interactive fields (CIFs) • Interactivity can be represented by latent motion fields, each capturing the relative motion between the two agents. • Linear dynamic system:
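  The LDS equation itself did not survive this transcript. As a minimal sketch, assuming the standard linear-dynamic-system form, with x_t the relative position of one agent with respect to the other and (A_s, b_s, Σ_s) hypothetical parameters of sub-interaction s:

      \[ x_{t+1} = A_s\, x_t + b_s + \epsilon_t, \qquad \epsilon_t \sim \mathcal{N}(0, \Sigma_s) \]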

  15. An example CIF: orbiting • Arrows: the mean relative motion at different locations • Arrow intensity: the relative spatial density, increasing from light to dark

  16. 2. Temporal parsing by latent sub-interactions • Latent motion fields can vary over time, which enables the model to characterize behavioral changes of the agents.
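  One standard way to formalize this temporal variation, offered here only as an illustrative assumption (the talk's own parsing prior may differ), is a Markov chain over the sub-interaction labels:

      \[ p(S) = p(s_1) \prod_{t=2}^{T} p(s_t \mid s_{t-1}) \]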

  17. A Simple View of Our Model Interaction ≈ Fields + Procedure

  18. Formulation • Given the input motion trajectories Γ, the model infers the posterior distribution of the latent variables S and Y • Γ: input motion trajectories • S: latent variables, sub-interactions • Y: interaction labels (1: interactive, 0: non-interactive)
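  The equations on this slide were lost in extraction. Assuming a standard generative factorization consistent with the variables above (an illustrative reading, not necessarily the talk's exact formulation):

      \[ p(S, Y \mid \Gamma) \;\propto\; p(\Gamma \mid S)\, p(S \mid Y)\, p(Y) \]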

  19.–21. Formulation (continued) • Γ: input motion trajectories • S: latent variables, sub-interactions • Y: interaction labels (1: interactive, 0: non-interactive)

  22. Learning • Gibbs sampling (an illustrative sketch follows)
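  The talk gives no further detail on the sampler. Below is a minimal, hypothetical Python sketch of one Gibbs sweep over the frame-wise sub-interaction labels s_t, assuming Markov transitions between labels and precomputed emission log-likelihoods; all names (gibbs_sweep, log_lik, log_trans) are illustrative, not from the talk:

      import numpy as np

      def gibbs_sweep(s, log_lik, log_trans, rng):
          # One Gibbs sweep over the latent sub-interaction labels.
          #   s         : (T,) int array, current sub-interaction label per frame
          #   log_lik   : (T, K) array, log p(motion at frame t | sub-interaction k)
          #   log_trans : (K, K) array, log transition probabilities between labels
          T, K = log_lik.shape
          for t in range(T):
              logp = log_lik[t].copy()
              if t > 0:
                  logp += log_trans[s[t - 1]]        # coherence with the previous frame
              if t < T - 1:
                  logp += log_trans[:, s[t + 1]]     # coherence with the next frame
              p = np.exp(logp - logp.max())          # numerically stable normalization
              s[t] = rng.choice(K, p=p / p.sum())    # resample s_t from its full conditional
          return s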

  23.–25. Inference • The model infers the current values of the latent variables: infer s_t under the assumption of interaction (i.e., y_t = 1), then compute the posterior probability of y_t = 1 given s_t ∈ S.
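  Read together, the online judgment could take the following Bayes-rule form (an assumed expression, not the talk's exact equation):

      \[ p(y_t = 1 \mid \Gamma) \;\propto\; \sum_{s_t \in S} p(\Gamma \mid s_t)\, p(s_t \mid y_t = 1)\, p(y_t = 1) \]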

  26. Prediction • Predict/synthesize s_{t+1} given y_{t+1} and all previous s • y_{t+1} = 1: interactive trajectories; y_{t+1} = 0: non-interactive trajectories • Predict/synthesize x_{t+1} given y_t and s_t (a sketch follows)
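  A minimal, hypothetical sketch of this two-step rollout, assuming per-label transition matrices and the linear dynamics sketched earlier; predict_next, trans, A, and b are illustrative names, not from the talk:

      import numpy as np

      def predict_next(x_t, s_t, y_next, trans, A, b, rng):
          # Sample the next sub-interaction, then the next relative position.
          #   trans : dict mapping y in {0, 1} -> (K, K) transition matrix over labels
          #   A, b  : per-label dynamics, x_{t+1} = A[s] @ x_t + b[s]
          s_next = rng.choice(len(A), p=trans[y_next][s_t])  # next sub-interaction label
          x_next = A[s_next] @ x_t + b[s_next]               # mean of that CIF's dynamics
          return s_next, x_next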

  27.–28. Training Data • 1. UCLA aerial event dataset (Shu et al., 2015), http://www.stat.ucla.edu/~tianmin.shu/AerialVideo/AerialVideo.html — 131 training instances (excluding trajectories used in the stimuli); 22 validation instances from the 44 stimuli; 22 testing instances from the remaining stimuli • 2. A second dataset created from the original Heider-Simmel animation (i.e., two triangles and one circle) — 27 training instances

  29. A few predominant CIFs: 1. Approaching • Arrows: the mean relative motion at different locations • Arrow intensity: the relative spatial density, increasing from light to dark

  30.–31. A few predominant CIFs: 2. Passing by (upper part) and following (lower part)

  32. A few predominant CIFs: 3. Leaving/avoiding

  33. [Figure: illustration of the top five most frequent CIFs learned from the training data, with their frequencies.]

  34. Temporal Parsing of Fields in Heider-Simmel Animations

  35. Interactiveness Inference in Aerial Videos

  36.–37. Experiment 1 Results [figure: comparison of online predictions by our full model (|S| = 15; orange) and humans (blue) over time, in seconds, on testing videos; y-axis: interactive ratings, x-axis: time (s)] • Trained on Heider-Simmel stimuli and tested on aerial video stimuli: r = 0.640, RMSE = 0.227

  38. Test the Model Trained from Aerial Videos on Heider-Simmel Stimuli

  39. Experiment 2 • We used the model trained on aerial videos to synthesize new interactive videos.

  40.–41. Experiment 2 • The model generated 10 interactive animations. [Video: synthesized interactive example (y = 1), 5×, with the model's predicted interactiveness.]

  42. Experiment 2 • The model generated 10 interactive and 10 non-interactive animations. [Video: synthesized non-interactive example (y = 0), 5×, with the model's predicted interactiveness.]

  43. Experiment 2 Results (N = 17) • Human observers accurately judged the interactiveness between the two agents in the synthesized videos. • The model effectively captured the visual features that signal potential interactivity between agents.

  44.–46. Conclusion • Decontextualized animations based on real-life videos enable human perception of social interactions. • The hierarchical model can learn the statistical regularities of common sub-interactions and accounts for human judgments of interactiveness. • The results suggest that human interactions can be decomposed into sub-interactions such as approaching, walking in parallel, or orbiting.

  47. For more details, please visit our website http://www.stat.ucla.edu/~tianmin.shu/HeiderSimmel/CogSci17/
