
Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses. Maryam Daneshi, Konstantin Bayandin. May 28th, 2013


  1. Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses. Maryam Daneshi, Konstantin Bayandin. May 28th, 2013

  2. Agenda • Introduction & Motivation • Dataset description • Model • Training • Inference • Results

  3. Context and Recognition: The human visual system uses context for recognition

  4. Human Object Interaction (HOI)

  5. Human Poses and Objects: Human pose estimation is challenging. • Unusual part appearances • Self occlusion • Patch looks like body part

  6. Human Poses and Objects: Given the object is detected.

  7. Human Poses and Objects: Object detection is challenging. • Small, low-resolution, partially occluded • Image region similar to detection target

  8. Human Poses and Objects: Given the pose is estimated.

  9. Datasets - Sports: Images of six sports activities

  10. Datasets - PPMI: People interacting with 12 classes of musical instruments

  11. Atomic poses – pose dictionary

  12. Mutual Context Model
      • Goal: Estimate the human pose and detect the objects that the human interacts with
        – Occluded or small objects
        – Articulated human poses
        – Variation of poses within one class of activity
      • Conditional random field model
      • Human interacting with any number of objects

  13. Model
      $$\Psi(A,O,H,I) = \phi_1(A,O,H) + \phi_2(H,O) + \phi_3(O,I) + \phi_4(H,I) + \phi_5(A,I)$$
      • $\phi_1$: co-occurrence context
      • $\phi_2$: spatial context
      • $\phi_3$: modeling objects
      • $\phi_4$: modeling human pose
      • $\phi_5$: modeling activity
      [Figure: graphical model linking activity $A$, human pose $H$, objects $O^1, \dots, O^M$, and body parts $P^1, \dots, P^L$ to the image $I$ of the human-object interaction]

  14. Model: Co-occurrence Context. Compatibility between actions, objects, and human poses:
      $$\phi_1(A,O,H) = \sum_{i=1}^{N_h} \sum_{m=1}^{M} \sum_{j=1}^{N_o} \sum_{k=1}^{N_a} \mathbf{1}(H = h_i) \cdot \mathbf{1}(O^m = o_j) \cdot \mathbf{1}(A = a_k) \cdot \zeta_{i,j,k}$$

  15. Model: Co-occurrence Context
      $$\phi_1(A,O,H) = \sum_{i=1}^{N_h} \sum_{m=1}^{M} \sum_{j=1}^{N_o} \sum_{k=1}^{N_a} \mathbf{1}(H = h_i) \cdot \mathbf{1}(O^m = o_j) \cdot \mathbf{1}(A = a_k) \cdot \zeta_{i,j,k}$$
      • $N_h$: total number of atomic poses; $h_i$: the $i$-th atomic pose
      • $N_o$: total number of objects; $o_j$: the $j$-th object
      • $N_a$: total number of activities; $a_k$: the $k$-th activity
      • $\zeta_{i,j,k}$: strength of the co-occurrence interaction
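
A minimal sketch of how the co-occurrence term could be evaluated. Because each indicator selects exactly one pose index and one activity index, the four-fold sum reduces to one table lookup per object box. The function name, the nested-list layout of the weight table, and the toy values are assumptions made for illustration, not part of the slides.

```python
def cooccurrence_potential(pose_idx, object_labels, activity_idx, zeta):
    """Co-occurrence term: sum of zeta[i][j][k] over the object boxes.

    pose_idx      -- index i of the atomic pose assigned to H
    object_labels -- list of object indices j, one per object box
    activity_idx  -- index k of the activity assigned to A
    zeta          -- nested list of shape [N_h][N_o][N_a] (toy weights)
    """
    return sum(zeta[pose_idx][j][activity_idx] for j in object_labels)


# Hypothetical table: 2 atomic poses, 2 object classes, 2 activities.
zeta = [
    [[0.9, 0.1], [0.2, 0.0]],
    [[0.0, 0.3], [0.1, 0.8]],
]

# Pose 1, two boxes labelled as objects 1 and 0, activity 1.
score = cooccurrence_potential(pose_idx=1, object_labels=[1, 0],
                               activity_idx=1, zeta=zeta)
```

Storing the weights as a dense table keeps the lookup trivial; a learned model would fill the table during training rather than by hand.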

  16. Model: Spatial Context. Spatial relationship between the object and different body parts of the human:
      $$\phi_2(H,O) = \sum_{m=1}^{M} \sum_{i=1}^{N_h} \sum_{j=1}^{N_o} \sum_{l=1}^{L} \mathbf{1}(H = h_i) \cdot \mathbf{1}(O^m = o_j) \cdot \lambda_{i,j,l}^{T} \cdot b(x_I^l, O^m)$$

  17. Model: Spatial Context
      $$\phi_2(H,O) = \sum_{m=1}^{M} \sum_{i=1}^{N_h} \sum_{j=1}^{N_o} \sum_{l=1}^{L} \mathbf{1}(H = h_i) \cdot \mathbf{1}(O^m = o_j) \cdot \lambda_{i,j,l}^{T} \cdot b(x_I^l, O^m)$$
      • $x_I^l$: location of the center of the human’s $l$-th body part in image $I$
      • $b(x_I^l, O^m)$: spatial relationship between $x_I^l$ and the $m$-th object bounding box; a sparse binary vector with exactly one 1
      • $\lambda_{i,j,l}$: weight for the relationship
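
A rough sketch of one way the sparse binary relationship vector could be built and scored. The slides only say the vector has a single 1; the 3x3 grid of relative-displacement bins, the cell size, and both function names are assumptions chosen for illustration.

```python
def spatial_bin(part_xy, box_center_xy, cell=0.5, grid=3):
    """One-hot vector encoding where the box centre lies relative to a part.

    The displacement is quantised into a grid x grid layout of cells of
    width `cell` (hypothetical binning); offsets beyond the grid are clamped.
    """
    half = grid // 2
    dx = box_center_xy[0] - part_xy[0]
    dy = box_center_xy[1] - part_xy[1]
    col = min(max(int(round(dx / cell)), -half), half) + half
    row = min(max(int(round(dy / cell)), -half), half) + half
    b = [0] * (grid * grid)
    b[row * grid + col] = 1
    return b


def spatial_term(lam, b):
    """Dot product of a weight vector with the one-hot vector."""
    return sum(w * x for w, x in zip(lam, b))


# Object box centre slightly to the right of the body part.
b = spatial_bin((0.0, 0.0), (0.6, -0.1))
lam = [0.0] * 9
lam[5] = 0.7  # toy weight for the "right of the part" bin
value = spatial_term(lam, b)
```

Since the vector is one-hot, the dot product simply picks the weight of the active bin, which is why one weight vector per (pose, object, part) triple is enough to encode a preferred layout.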

  18. Model: Objects. Modeling objects using the detection scores in all the object bounding boxes and the spatial relationship between these boxes:
      $$\phi_3(O,I) = \sum_{m=1}^{M} \sum_{j=1}^{N_o} \mathbf{1}(O^m = o_j) \cdot \gamma_j^{T} \cdot g(O^m) + \sum_{m=1}^{M} \sum_{m'=1}^{M} \sum_{j=1}^{N_o} \sum_{j'=1}^{N_o} \mathbf{1}(O^m = o_j) \cdot \mathbf{1}(O^{m'} = o_{j'}) \cdot \gamma_{j,j'}^{T} \cdot b(O^m, O^{m'})$$

  19. Model: Objects
      $$\phi_3(O,I) = \sum_{m=1}^{M} \sum_{j=1}^{N_o} \mathbf{1}(O^m = o_j) \cdot \gamma_j^{T} \cdot g(O^m) + \sum_{m=1}^{M} \sum_{m'=1}^{M} \sum_{j=1}^{N_o} \sum_{j'=1}^{N_o} \mathbf{1}(O^m = o_j) \cdot \mathbf{1}(O^{m'} = o_{j'}) \cdot \gamma_{j,j'}^{T} \cdot b(O^m, O^{m'})$$
      • $g(O^m)$: vector of detection scores of all objects in the $m$-th box
      • $\gamma_j$: detection score weight for the $j$-th object
      • $b(O^m, O^{m'})$: binary vector of the spatial relationship between pairs of objects
      • $\gamma_{j,j'}$: weight for the geometric configuration between $o_j$ and $o_{j'}$ [Desai et al., 2009]

  20. Model: Human Pose. Likelihood of observing image $I$ given the atomic pose $h_i$:
      $$\phi_4(H,I) = \sum_{i=1}^{N_h} \sum_{l=1}^{L} \mathbf{1}(H = h_i) \cdot \left( \alpha_{i,l}^{T} \cdot p(x_I^l \mid x_{h_i}^l) + \beta_{i,l}^{T} \cdot f^l(I) \right)$$

  21. Model: Human Pose
      $$\phi_4(H,I) = \sum_{i=1}^{N_h} \sum_{l=1}^{L} \mathbf{1}(H = h_i) \cdot \left( \alpha_{i,l}^{T} \cdot p(x_I^l \mid x_{h_i}^l) + \beta_{i,l}^{T} \cdot f^l(I) \right)$$
      • $p(x_I^l \mid x_{h_i}^l)$: Gaussian likelihood of observing $x_I^l$, given the standard joint location of the $l$-th body part in pose $h_i$
      • $f^l(I)$: the $l$-th body part detection output
      • $\alpha_{i,l}$: location weight for the $l$-th body part in pose $h_i$
      • $\beta_{i,l}$: appearance weight for the $l$-th body part in pose $h_i$
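
A small sketch of the human pose term for one fixed atomic pose. It simplifies the slide's formulation in two ways that are assumptions: the weights are scalars rather than vectors, and the Gaussian is isotropic with a hand-picked standard deviation. All names and numbers are illustrative.

```python
import math


def gaussian_likelihood(x, mean, sigma):
    """Isotropic 2-D Gaussian density of location x around mean."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)


def pose_potential(parts_xy, pose_means, det_scores, alpha, beta, sigma=0.2):
    """Sum over body parts of location likelihood plus detector score.

    parts_xy   -- observed part centres in the image
    pose_means -- standard part locations of the chosen atomic pose
    det_scores -- body part detector outputs, one per part
    alpha/beta -- scalar location / appearance weights per part (toy values)
    """
    total = 0.0
    for x, mu, f, a, b in zip(parts_xy, pose_means, det_scores, alpha, beta):
        total += a * gaussian_likelihood(x, mu, sigma) + b * f
    return total


# One part observed exactly at its standard location versus displaced.
at_mean = pose_potential([(0.0, 0.0)], [(0.0, 0.0)], [0.5], [1.0], [1.0])
displaced = pose_potential([(0.3, 0.0)], [(0.0, 0.0)], [0.5], [1.0], [1.0])
```

The location term rewards parts that sit near where the atomic pose expects them, while the appearance term lets a confident part detector compensate for an unusual layout.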

  22. Model: Activities. Activity classifier to model the HOI activity:
      $$\phi_5(A,I) = \sum_{k=1}^{N_a} \mathbf{1}(A = a_k) \cdot \eta_k^{T} \cdot s(I)$$

  23. Model: Activities
      $$\phi_5(A,I) = \sum_{k=1}^{N_a} \mathbf{1}(A = a_k) \cdot \eta_k^{T} \cdot s(I)$$
      • $\eta_k$: feature weight for activity $a_k$
      • $s(I)$: output of the one-versus-all discriminative classifier

  24. Training: Atomic Poses. Hierarchical clustering of a given set of poses from the training images:
      • Position and orientation of parts, with distance $w^{T} \cdot |x_l - x'_l|$ between the $l$-th parts of two poses
      • Normalization to the same position/size of torso (sports) or head (music)
      • Variations in position and orientation are normalized to [-1, 1]
      • Missing parts are filled from the image’s nearest neighbor
      • Atomic poses are shared by all activities
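
The clustering step can be sketched as follows, under stated assumptions. The slide specifies hierarchical clustering over part positions and orientations but not the linkage rule or the weights, so the average linkage, the weight vector `w`, and the (x, y, orientation) layout of each part are all hypothetical choices for illustration.

```python
def pose_distance(pose_a, pose_b, w=(1.0, 1.0, 0.5)):
    """Weighted L1 distance between two poses, summed over body parts.

    Each pose is a list of parts; each part is (x, y, orientation),
    already normalised to [-1, 1].
    """
    return sum(
        wi * abs(a - b)
        for part_a, part_b in zip(pose_a, pose_b)
        for wi, a, b in zip(w, part_a, part_b)
    )


def cluster_poses(poses, n_clusters):
    """Naive agglomerative clustering with average linkage."""
    clusters = [[i] for i in range(len(poses))]

    def linkage(c1, c2):
        total = sum(pose_distance(poses[i], poses[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        a, b = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters


# Four single-part toy poses forming two obvious groups.
poses = [[(0.0, 0.0, 0.0)], [(0.1, 0.0, 0.0)], [(0.9, 0.9, 0.0)], [(1.0, 1.0, 0.0)]]
clusters = cluster_poses(poses, n_clusters=2)
```

Each resulting cluster would correspond to one atomic pose; because the clustering ignores activity labels, the same atomic pose can be shared by several activities.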

  25. Training: Objects and Part Detectors. Deformable Parts Model with SVM on HOG feature detectors:
      • One mixture component per body part
      • Two mixture components per object unless aspect ratios do not change
      • $g(O^m)$: value of the object detection score divided by the threshold
      • $f^l(I)$: value of the body part detection score divided by the threshold

  26. Training: Activity Classifier. Spatial Pyramid Matching (SPM) method:
      • Sparse SIFT features on three layers
      • $s(I)$: a vector of confidence scores obtained from an SVM classifier

  27. Training: Estimating Model Parameters. Conditional Random Field with no hidden variables:
      • $\{\zeta, \lambda, \gamma, \alpha, \beta, \eta\}$: model parameters
      • Maximum likelihood approach
      • Zero-mean Gaussian priors
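
As a worked form of this slide, maximum-likelihood estimation with zero-mean Gaussian priors is commonly written as a penalized conditional log-likelihood. The sketch below assumes the standard CRF form in which the conditional probability is proportional to the exponentiated overall model score $\Psi$; the training index $n$, the vector $\theta$ collecting all weights, and the prior variance $\sigma^2$ are notation introduced here, not taken from the slides.

```latex
\theta^{*} = \arg\max_{\theta} \sum_{n=1}^{N} \log p(A_n, O_n, H_n \mid I_n; \theta)
             - \frac{\lVert \theta \rVert^{2}}{2\sigma^{2}},
\qquad
p(A, O, H \mid I; \theta) =
  \frac{\exp \Psi(A, O, H, I)}{\sum_{A', O', H'} \exp \Psi(A', O', H', I)}
```

Under this reading the Gaussian prior acts as an L2 penalty on the weights, and the absence of hidden variables means every label in the sum is observed in the training annotations.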

  28. Inference: Iterative Process
      Initialization:
      • Action classification with the SPM classifier
      • Object bounding boxes from independent object detectors (scores > 0.9)
      • Initial pose from a pictorial structure model trained on all training images
      Two iterations:
      • Updating the layout of human body parts: update the Gaussian priors for part locations using the marginal probabilities of the poses
      • Updating object detection results: greedy forward search
      • Updating the activity and atomic pose labels: maximize the overall sum by enumerating all possible values for actions and human poses
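
Two of the update steps can be sketched in a few lines. This is an illustrative simplification, not the presenters' implementation: the greedy search below adds one candidate detection at a time while the score still increases, and the label update exhaustively scans (activity, pose) pairs. The score tables, candidate names, and function names are hypothetical.

```python
from itertools import product


def greedy_forward_search(candidates, unary, pairwise):
    """Select detections one at a time while the total score increases.

    unary    -- dict: candidate -> score of selecting it on its own
    pairwise -- dict: frozenset({a, b}) -> interaction score (default 0)
    """
    selected, remaining = [], list(candidates)
    while remaining:
        def gain(c):
            return unary[c] + sum(pairwise.get(frozenset((c, s)), 0.0) for s in selected)

        best = max(remaining, key=gain)
        if gain(best) <= 0:
            break  # no remaining candidate improves the score
        selected.append(best)
        remaining.remove(best)
    return selected


def best_labels(n_activities, n_poses, score):
    """Enumerate all (activity, pose) pairs and return the best one."""
    return max(product(range(n_activities), range(n_poses)),
               key=lambda ki: score(*ki))


# Toy detections: two overlapping ball candidates penalise each other.
unary = {"bat": 1.0, "ball": 0.4, "ball2": 0.3}
pairwise = {frozenset(("ball", "ball2")): -1.0}
chosen = greedy_forward_search(["bat", "ball", "ball2"], unary, pairwise)

# Toy score table: 3 activities x 2 atomic poses.
table = [[0.1, 0.4], [0.9, 0.2], [0.3, 0.5]]
labels = best_labels(3, 2, lambda k, i: table[k][i])
```

Exhaustive enumeration is feasible for the label update because the numbers of activities and atomic poses are small, whereas the set of possible object configurations grows too quickly and is handled greedily.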

  29. Results: Examples for Testing Images

  30. Results: Sports – Object Detection • Better overall performance across all objects • Better discrimination of similar objects (cricket ball vs. croquet ball)

  31. Results: Sports – Human Pose Estimation • Better overall performance across all poses • Outperforms even the Pictorial Structure model trained on separate classes

  32. Results: Sports – Activity Classification • Better overall performance • About 4% better than SPM alone

  33. Results: Music – Object Detection • Better overall performance across all objects • Larger improvement for “playing instrument” situations, where context plays a more important role

  34. Results: Music – Object Detection • Demonstration of the importance of human poses for object detection

  35. Results: Music – Human Pose Estimation • Better performance for poses with “playing instrument” • Only marginally better for poses with “not playing instrument” • No significant improvement compared to the Pictorial Structure model

  36. Results: Music – Activity Classification • Better overall performance compared to the SPM and grouplet approaches
