of Objects and Human Poses Maryam Daneshi, Konstantin Bayandin May - PowerPoint PPT Presentation

Recognizing Human-Object Interactions in Still Images by Modeling the Mutual Context of Objects and Human Poses
Maryam Daneshi, Konstantin Bayandin
May 28th, 2013

Agenda
• Introduction & Motivation
• Dataset description
• Model
• Training
• Inference
• Results

Context and Recognition
The human visual system uses context for recognition.

Human Object Interaction (HOI)

Human Poses and Objects
Human pose estimation is challenging.
• Unusual part appearances
• Self-occlusion
• Patch looks like body part

Human Poses and Objects
Given the object is detected.

Human Poses and Objects
Object detection is challenging.
• Small, low-resolution, partially occluded
• Image region similar to detection target

Human Poses and Objects
Given the pose is estimated.

Datasets - Sports
Images of six sports activities

Datasets - PPMI
People interacting with 12 classes of musical instruments

Atomic poses – pose dictionary

Mutual Context Model
• Goal: Estimate the human pose and detect the objects that the human interacts with
  – Occluded or small objects
  – Articulated human poses
  – Variation of poses in one class of activity
• Conditional random field model
• Human interacting with any number of objects

Model
$$\Psi(A, O, H, I) = f_1(A, O, H) + f_2(H, O) + f_3(O, I) + f_4(H, I) + f_5(A, I)$$
• $f_1$: co-occurrence context
• $f_2$: spatial context
• $f_3$: modeling objects
• $f_4$: modeling human pose
• $f_5$: modeling activity
[Figure: graphical model linking activity $A$, human pose $H$, objects $O^1, \dots, O^M$, body parts $P^1, P^2, \dots, P^L$, and the image $I$ of the human-object interaction]

Model: Co-occurrence Context
Compatibility between actions, objects, and human poses:
$$f_1(A, O, H) = \sum_{i=1}^{N_h} \sum_{m=1}^{M} \sum_{j=1}^{N_o} \sum_{k=1}^{N_a} \mathbf{1}(H = h_i) \cdot \mathbf{1}(O^m = o_j) \cdot \mathbf{1}(A = a_k) \cdot \zeta_{i,j,k}$$
• $N_h$: total number of atomic poses; $h_i$: the $i$-th atomic pose
• $N_o$: total number of objects; $o_j$: the $j$-th object
• $N_a$: total number of activities; $a_k$: the $k$-th activity
• $\zeta_{i,j,k}$: strength of the co-occurrence interaction
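
A minimal sketch of how the co-occurrence term could be evaluated, assuming the weights are stored in a nested list indexed as zeta[i][j][k]. Because the indicator functions select exactly one entry per detected object, the four nested sums reduce to one table lookup per object. The function name, the argument layout, and the toy numbers are hypothetical and not taken from the slides.

```python
from typing import Sequence


def cooccurrence_potential(zeta, pose: int, objects: Sequence[int], activity: int) -> float:
    """Evaluate f_1(A, O, H) for one labeling.

    zeta[i][j][k] is the co-occurrence strength of atomic pose i,
    object class j, and activity k. `objects` holds the class index
    of each of the M detected objects.
    """
    # The indicators keep only the (pose, object, activity) entries that match.
    return sum(zeta[pose][obj][activity] for obj in objects)


# Hypothetical toy table: 2 atomic poses x 2 object classes x 2 activities.
zeta = [
    [[0.9, 0.1], [0.2, 0.0]],
    [[0.1, 0.3], [0.0, 0.8]],
]
score = cooccurrence_potential(zeta, pose=0, objects=[0, 1], activity=0)
```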

Model: Spatial Context
Spatial relationship between the object and different body parts of the human:
$$f_2(H, O) = \sum_{m=1}^{M} \sum_{i=1}^{N_h} \sum_{j=1}^{N_o} \sum_{l=1}^{L} \mathbf{1}(H = h_i) \cdot \mathbf{1}(O^m = o_j) \cdot \lambda_{i,j,l}^{T} \cdot b(x_I^l, O^m)$$
• $x_I^l$: location of the center of the human's $l$-th body part in image $I$
• $b(x_I^l, O^m)$: spatial relationship between $x_I^l$ and the $m$-th object bounding box, a sparse binary vector with one 1
• $\lambda_{i,j,l}$: weight for the relationship
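
The spatial term can be sketched as follows. The slides only say that b is a sparse binary vector with a single 1, so the 3x3 binning of the part center relative to the object box used here is an assumption made for illustration, as are the function names and the toy weights.

```python
def spatial_bin(part_xy, box):
    """One-hot vector of length 9 describing where a body-part center lies
    relative to an object box (x0, y0, x1, y1). Hypothetical 3x3 binning."""
    x, y = part_xy
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    w, h = x1 - x0, y1 - y0

    def bin_index(offset, size):
        # 0: before the box, 1: within the box extent, 2: after the box
        if offset < -size / 2:
            return 0
        if offset > size / 2:
            return 2
        return 1

    col = bin_index(x - cx, w)
    row = bin_index(y - cy, h)
    one_hot = [0] * 9
    one_hot[row * 3 + col] = 1
    return one_hot


def spatial_potential(lam, pose, object_classes, boxes, part_centers):
    """Evaluate f_2(H, O): sum of lambda[i][j][l] . b(x_I^l, O^m) over objects and parts."""
    total = 0.0
    for obj_class, box in zip(object_classes, boxes):
        for l, xy in enumerate(part_centers):
            b = spatial_bin(xy, box)
            weights = lam[pose][obj_class][l]
            total += sum(wk * bit for wk, bit in zip(weights, b))
    return total


# Toy weights: one pose, one object class, one body part, nine bins.
lam = [[[list(range(9))]]]
value = spatial_potential(lam, 0, [0], [(0, 0, 10, 10)], [(5, 5)])
```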

Model: Objects
Modeling objects using the detection scores in all the object bounding boxes and the spatial relationship between these boxes:
$$f_3(O, I) = \sum_{m=1}^{M} \sum_{j=1}^{N_o} \mathbf{1}(O^m = o_j) \cdot \gamma_j^{T} \cdot g(O^m) + \sum_{m=1}^{M} \sum_{m'=1}^{M} \sum_{j=1}^{N_o} \sum_{j'=1}^{N_o} \mathbf{1}(O^m = o_j) \cdot \mathbf{1}(O^{m'} = o_{j'}) \cdot \gamma_{j,j'}^{T} \cdot b(O^m, O^{m'})$$
• $g(O^m)$: vector of scores of all detected objects in the $m$-th box
• $\gamma_j$: the detection score weight for the $j$-th object
• $b(O^m, O^{m'})$: binary vector of the spatial relationship between pairs of objects
• $\gamma_{j,j'}$: weight for the geometric configuration between $o_j$ and $o_{j'}$ [Desai et al, 2009]
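
As a rough illustration, the object term splits into a unary part (weighted detection scores) and a pairwise part (weighted spatial relations between boxes). In the sketch below the pairwise relation is supplied as a callback, and the two-way "left of / not left of" encoding in the toy example is a stand-in chosen for brevity, not the encoding used by the authors. All names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def object_potential(gamma_unary, gamma_pair, classes, scores, relation):
    """Evaluate f_3(O, I).

    classes[m]  : class index j assigned to box m
    scores[m]   : g(O^m), detector scores for box m
    relation    : callable (m, m2) -> binary vector b(O^m, O^m')
    """
    n_boxes = len(classes)
    unary = sum(dot(gamma_unary[classes[m]], scores[m]) for m in range(n_boxes))
    pairwise = sum(
        dot(gamma_pair[classes[m]][classes[m2]], relation(m, m2))
        for m in range(n_boxes)
        for m2 in range(n_boxes)
    )
    return unary + pairwise


# Toy setup: two boxes, two object classes, relation = [left of, not left of].
left_edges = [10, 50]


def toy_relation(m, m2):
    if m == m2:
        return [0, 0]  # a box has no relation to itself in this toy encoding
    return [1, 0] if left_edges[m] < left_edges[m2] else [0, 1]


gamma_unary = [[1.0, 0.0], [0.0, 1.0]]
gamma_pair = [
    [[0.0, 0.0], [0.5, -0.5]],
    [[-0.5, 0.5], [0.0, 0.0]],
]
scores = [[0.8, 0.1], [0.2, 0.7]]
total = object_potential(gamma_unary, gamma_pair, [0, 1], scores, toy_relation)
```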

Model: Human Pose
Likelihood of observing image $I$ given the atomic pose $h_i$:
$$f_4(H, I) = \sum_{i=1}^{N_h} \sum_{l=1}^{L} \mathbf{1}(H = h_i) \cdot \left( \alpha_{i,l}^{T} \cdot p(x_I^l \mid x_{h_i}^l) + \beta_{i,l}^{T} \cdot f^l(I) \right)$$
• $p(x_I^l \mid x_{h_i}^l)$: Gaussian likelihood of observing $x_I^l$, given the standard joint location of the $l$-th body part in pose $h_i$
• $f^l(I)$: the $l$-th body part detection output
• $\alpha_{i,l}$: location weight for the $l$-th body part in pose $h_i$
• $\beta_{i,l}$: appearance weight for the $l$-th body part in pose $h_i$
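
A small sketch of the pose term under simplifying assumptions: the weights are treated as scalars, each part location is a 2-D point, and the Gaussian is isotropic with a hypothetical standard deviation sigma. None of these choices is stated in the slides; they only make the structure of the sum concrete.

```python
import math


def gaussian_likelihood(x, mean, sigma):
    """Isotropic 2-D Gaussian density of location x around `mean`."""
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    norm = 2 * math.pi * sigma ** 2
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma ** 2)) / norm


def pose_potential(alpha, beta, pose, part_xy, standard_xy, part_scores, sigma=1.0):
    """Evaluate f_4(H, I) for atomic pose index `pose`.

    part_xy[l]           : observed center of body part l in the image
    standard_xy[pose][l] : standard location of part l in that atomic pose
    part_scores[l]       : body-part detector output f^l(I)
    """
    total = 0.0
    for l, x in enumerate(part_xy):
        p = gaussian_likelihood(x, standard_xy[pose][l], sigma)
        total += alpha[pose][l] * p + beta[pose][l] * part_scores[l]
    return total


# Toy example: one pose, one part observed exactly at its standard location.
value = pose_potential(
    alpha=[[1.0]], beta=[[2.0]], pose=0,
    part_xy=[(0.0, 0.0)], standard_xy=[[(0.0, 0.0)]], part_scores=[0.5],
)
```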

Model: Activities
Activity classifier to model the HOI activity:
$$f_5(A, I) = \sum_{k=1}^{N_a} \mathbf{1}(A = a_k) \cdot \eta_k^{T} \cdot s(I)$$
• $\eta_k$: feature weight for activity $a_k$
• $s(I)$: output of the one-versus-all discriminative classifier
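
For a fixed activity label the indicator keeps a single term, so the activity potential is just an inner product between that activity's weight vector and the classifier scores. The sketch below assumes the scores arrive as a plain list with one entry per activity; the names and numbers are illustrative only.

```python
def activity_potential(eta, activity, classifier_scores):
    """Evaluate f_5(A, I) = eta_k^T . s(I) for activity index k = `activity`."""
    return sum(w * s for w, s in zip(eta[activity], classifier_scores))


# Toy example with three activities and identity weights.
eta = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
s = [0.2, 0.7, 0.1]
best_activity = max(range(len(eta)), key=lambda k: activity_potential(eta, k, s))
```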

Training: Atomic Poses
Hierarchical clustering from a given set of poses on training images:
• Position and orientation of parts, compared with the distance $w^{T} \cdot \lvert x_i^l - x_j^l \rvert$
• Normalization to the same position/size of torso (sports) or head (music)
• Variations in position and orientation are normalized to [-1, 1]
• Missing parts are filled from the image's nearest neighbor
• Atomic poses are shared by all activities
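
The clustering step can be sketched in plain Python. The slides specify hierarchical clustering and a weighted absolute-difference distance, but not the linkage criterion or the stopping rule, so the average linkage, the sum over body parts, and the fixed number of clusters below are assumptions. The feature layout (x, y, orientation) and the weights are hypothetical as well.

```python
def pose_distance(p, q, w):
    """Weighted L1 distance between two poses, summed over body parts.

    Each pose is a list of per-part feature tuples, already normalized
    to [-1, 1]; w holds one weight per feature.
    """
    return sum(
        sum(wk * abs(a - b) for wk, a, b in zip(w, part_p, part_q))
        for part_p, part_q in zip(p, q)
    )


def cluster_poses(poses, w, n_clusters):
    """Agglomerative clustering with average linkage (assumed, not from the slides)."""
    clusters = [[i] for i in range(len(poses))]

    def linkage(c1, c2):
        dists = [pose_distance(poses[a], poses[b], w) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        _, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy data: four single-part poses with (x, y, orientation) features.
poses = [
    [(0.0, 0.0, 0.0)],
    [(0.1, 0.0, 0.0)],
    [(0.9, 0.9, 0.5)],
    [(1.0, 0.9, 0.5)],
]
atomic = cluster_poses(poses, w=(1.0, 1.0, 0.5), n_clusters=2)
```

Each resulting cluster would then stand for one atomic pose, for example by taking the mean of its members.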

Training: Objects and Part Detectors
Deformable Parts Model detectors with SVM on HOG features:
• One mixture component per body part
• Two mixture components per object unless aspect ratios do not change
• $g(O^m)$: value of the object detection score divided by the threshold
• $f^l(I)$: value of the body part detection score divided by the threshold

Training: Activity Classifier
Spatial Pyramid Matching method:
• Sparse SIFT features on three layers
• $s(I)$: a vector with confidence scores obtained from an SVM classifier

Training: Estimating Model Parameters
Conditional Random Field with no hidden variables:
• $\{\zeta, \lambda, \gamma, \alpha, \beta, \eta\}$: model parameters
• Maximum likelihood approach
• Zero-mean Gaussian priors
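
One common way to write such an objective, given here as a sketch rather than the authors' exact formulation: with all labels observed, the weights $w$ (all model parameters stacked together) maximize the regularized conditional log-likelihood over the training images. The index $n$, the number of training images $N$, and the prior variance $\sigma^2$ are symbols introduced for this illustration, and $\Psi$ denotes the overall model score from the Model slide.

$$\hat{w} = \arg\max_{w} \sum_{n=1}^{N} \log p(A_n, O_n, H_n \mid I_n; w) - \frac{\lVert w \rVert^2}{2\sigma^2}, \qquad p(A, O, H \mid I; w) = \frac{\exp \Psi(A, O, H, I)}{\sum_{A', O', H'} \exp \Psi(A', O', H', I)}$$

The quadratic penalty is the log of a zero-mean Gaussian prior on $w$, up to an additive constant.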

Inference: Iterative Process
Initialization:
• Action classification with the SPM classifier
• Object bounding boxes from independent object detectors (scores > 0.9)
• Initial pose from a pictorial structure model trained on all training images
Two iterations:
• Updating the layout of human body parts: update the Gaussian priors for part locations using the marginal probabilities of the poses
• Updating object detection results: greedy forward search
• Updating the activity and atomic pose labels: maximize the overall sum by enumerating all possible values for actions and human poses
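
Two of the update steps lend themselves to a short sketch: the greedy forward search over candidate detections and the exhaustive enumeration over activity and pose labels. Both functions take the scoring function as a parameter, since the actual model score is not reproduced here; the toy scores and the conflict penalty are invented for the example.

```python
from itertools import product


def greedy_forward_search(candidates, score):
    """Start from no detections and repeatedly add the candidate that raises
    the total score the most; stop when no addition improves it."""
    selected = []
    remaining = list(candidates)
    best = score(selected)
    while remaining:
        new_best, choice = max(
            ((score(selected + [c]), c) for c in remaining),
            key=lambda pair: pair[0],
        )
        if new_best <= best:
            break
        selected.append(choice)
        remaining.remove(choice)
        best = new_best
    return selected


def best_activity_and_pose(score, n_activities, n_poses):
    """Enumerate every (activity, pose) pair and return the best-scoring one."""
    return max(
        product(range(n_activities), range(n_poses)),
        key=lambda pair: score(*pair),
    )


# Toy detection scores with a penalty when two conflicting objects co-occur.
unary = {"bat": 1.0, "ball": 0.5, "racket": 0.8}


def toy_object_score(selection):
    total = sum(unary[name] for name in selection)
    if "bat" in selection and "racket" in selection:
        total -= 2.0
    return total


objects = greedy_forward_search(["bat", "ball", "racket"], toy_object_score)

# Toy table of scores indexed by [activity][pose].
table = [[0.1, 0.4], [0.9, 0.2]]
activity, pose = best_activity_and_pose(lambda a, h: table[a][h], 2, 2)
```

Enumeration is feasible here because the numbers of activities and atomic poses are small, whereas the set of possible object configurations is too large to enumerate, which is why a greedy search is used for that step.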

Results: Examples for Testing Images

Results: Sports – Object Detection
• Better overall performance across all objects
• Better discrimination of similar objects (cricket ball vs. croquet ball)

Results: Sports – Human Pose Estimation
• Better overall performance across all poses
• Outperforms even the Pictorial Structure model trained on separate classes

Results: Sports – Activity Classification
• Better overall performance
• Performance is better than SPM alone by about 4%

Results: Music – Object Detection
• Better overall performance across all objects
• Larger improvement for "playing instrument" situations, where context plays a more important role

Results: Music – Object Detection
• Demonstration of the importance of human poses for object detection

Results: Music – Human Pose Estimation
• Better performance for poses with "playing instrument"
• Only marginally better for poses with "not playing instrument"
• No significant improvement compared to the Pictorial Structure model

Results: Music – Activity Classification
• Better overall performance compared to SPM and the grouplet approach
