Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition

  1. Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition • Ziyu Liu (1,3), Hongwen Zhang (2), Zhenghao Chen (1), Zhiyong Wang (1), Wanli Ouyang (1,3) • (1) The University of Sydney • (2) University of Chinese Academy of Sciences & CASIA • (3) The University of Sydney, SenseTime Computer Vision Research Group, Australia

  2. Agenda • Overview • Contributions: 1. Factorized Modeling → Unified Spatial-Temporal Modeling; 2. Adjacency Powering → Disentangling Neighborhoods • Experiments & Results

  3. Action Recognition from Skeletons • Human actions can be efficiently represented by skeletons • Free of background clutter / lighting conditions / clothing variations [Figure: Input Video → Estimated 2D/3D Poses (Skeletons) → Skeleton-Based Action Recognition → Predicted Action, e.g. “Hand Shaking”] Image Credit: Amir Shahroudy, Jun Liu, Tian-Tsong Ng, Gang Wang, "NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis", CVPR 2016

  4. Previous Approaches • Traditional • Hand-crafted features (e.g. Vemulapalli et al., CVPR’14; Huang et al., CVPR’17) • CNNs / RNNs (e.g. Ke et al., Wang et al., Liu et al., CVPR’17; Si et al., ECCV’18) • Often overlook semantic connectivity patterns between joints • Graph-Based (e.g. Shi et al., Li et al., Si et al., CVPR’19; Li et al., Wen et al., AAAI’19) • Graphs naturally capture the structure of human bodies: G = (V, E) • Joints → nodes, bones → edges • No hand-crafted node traversal [Figure credit: Huang et al., CVPR’17]

  5. Preliminaries • Actions as Graph Sequences • Structure: $N$-node graph with adjacency matrix $A$ (normalized: $\hat{A}$) • Features: joint locations $X$ over $T$ frames • Goal: learn to classify graph sequences • Each frame: structure $A \in \mathbb{R}^{N \times N}$, features $X_t \in \mathbb{R}^{N \times C}$ • Entire action: structure $A \in \mathbb{R}^{N \times N}$, features $X \in \mathbb{R}^{T \times N \times C}$

  6. Preliminaries • Feature Learning with Graph Convolutional Nets (GCNs) (Kipf et al., ICLR’17) • 1. Neighborhood feature aggregation • 2. Layer-wise feature update • $X^{(l+1)} = \sigma(\hat{A} X^{(l)} \Theta^{(l)})$, where $\hat{A} X^{(l)}$ is the neighborhood aggregation and $\sigma(\cdot\, \Theta^{(l)})$ is the feature update • Each frame: structure $A \in \mathbb{R}^{N \times N}$, features $X_t \in \mathbb{R}^{N \times C}$ • Entire action: structure $A \in \mathbb{R}^{N \times N}$, features $X \in \mathbb{R}^{T \times N \times C}$
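
The GCN update on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the symmetric normalization with added self-loops follows the usual Kipf et al. convention and is an assumption here, as are the toy 3-joint chain, the ReLU choice for $\sigma$, and the names `normalize_adjacency` and `gcn_layer`.

```python
import numpy as np

def normalize_adjacency(A):
    """Assumed normalization: D^{-1/2} (A + I) D^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])      # add self-loops
    deg = A_tilde.sum(axis=1)             # degrees are >= 1 thanks to self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(A_hat, X, Theta):
    """One layer: aggregate neighbors (A_hat @ X), update (@ Theta), then ReLU."""
    return np.maximum(A_hat @ X @ Theta, 0.0)

# Hypothetical toy skeleton: a 3-joint chain 0 - 1 - 2, one frame, C = 2 channels.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
A_hat = normalize_adjacency(A)

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 2))        # N x C features of a single frame
Theta = rng.standard_normal((2, 4))    # C x C_out weights (random stand-ins)
X_next = gcn_layer(A_hat, X, Theta)    # N x C_out
```

Because `@` broadcasts over leading dimensions, the same `gcn_layer` call also accepts a whole sequence of shape `(T, N, C)` and applies the layer to every frame independently.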

  7. Existing Graph-Based Approaches • 1. Factorized Modeling: GCNs + temporal models (spatial aggregation followed by temporal aggregation), e.g. Li et al. CVPR’19, Shi et al. CVPR’19, Shi et al. CVPR’19, Yan et al. AAAI’18 • 2. Multi-Scale Graph Convolutions: GCNs over different adjacency powers (spatial aggregation), e.g. Li et al. CVPR’19, Li et al. AAAI’18

  8. Agenda • Overview • Contributions: 1. Factorized Modeling → Unified Spatial-Temporal Modeling; 2. Adjacency Powering → Disentangling Neighborhoods • Experiments & Results

  9. Previous Approach #1: Factorized Modeling • Learn spatial-temporal features with separate spatial / temporal modules • Spatial: neighborhood aggregation (GCNs) • Temporal: node-wise sequence models (1D conv / recurrent) [Figure: spatial aggregation followed by temporal aggregation; cf. factorized 3D CNNs]
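
A factorized block of this kind can be sketched as a per-frame graph convolution followed by a node-wise 1D convolution along time. This is an illustrative sketch under assumptions, not any cited model: the helper names, the "valid" (unpadded) temporal convolution, the ReLU, and the identity stand-in for the normalized adjacency are all choices made here for brevity.

```python
import numpy as np

def spatial_aggregate(A_hat, X, Theta):
    """Spatial module: GCN applied to every frame. X has shape (T, N, C)."""
    return np.maximum(A_hat @ X @ Theta, 0.0)           # (T, N, C_mid)

def temporal_conv(X, W):
    """Temporal module: 'valid' 1D conv along time, applied to each node separately.

    X: (T, N, C), W: (k, C, C_out) -> output (T - k + 1, N, C_out).
    """
    k, T = W.shape[0], X.shape[0]
    out = np.zeros((T - k + 1, X.shape[1], W.shape[2]))
    for dt in range(k):
        # Frame t + dt contributes to output step t through kernel slice W[dt].
        out += X[dt:dt + T - k + 1] @ W[dt]
    return out

rng = np.random.default_rng(0)
T, N, C, C_mid, C_out, k = 8, 3, 2, 4, 5, 3
A_hat_f = np.eye(N)                      # placeholder for a normalized adjacency
X_f = rng.standard_normal((T, N, C))
Theta_f = rng.standard_normal((C, C_mid))
W_f = rng.standard_normal((k, C_mid, C_out))

H_f = spatial_aggregate(A_hat_f, X_f, Theta_f)   # space only, frame by frame
Y_f = temporal_conv(H_f, W_f)                    # time only, node by node
```

Note that the temporal step never mixes different joints, and the spatial step never mixes different frames; information between joint i at one frame and joint j at another has to pass through both modules in turn.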

  10. Motivation #1: Indirect Information Flow • Factorization can create bottlenecks for feature propagation • Unweighted message passing (GCNs) can also make aggregated features generic • The resulting information bottleneck makes spatial-temporal relationships hard to learn (cf. factorized 3D CNNs)

  11. Idea #1: Unified Spatial-Temporal Modeling • G3D modules: neighborhood aggregation across space and time • Edges serve as skip connections, allowing more direct information flow [Figure: G3D spatial-temporal information flow]

  12. Idea #1: Unified Spatial-Temporal Modeling • Inputs: spatial graph $A$: $N \times N$; skeleton features $X$: $T_{in} \times N \times C_{in}$ • (1) Sliding temporal window of size $\tau$ and dilation $d$, giving spatial-temporal window features $X_{(\tau)}$: $T_{out} \times \tau N \times C_{in}$ • (2) Extrapolate spatial connectivity over the window (spatial, temporal, and spatial-temporal edges), giving $A_{(\tau)}$: $\tau N \times \tau N$ • (3) Graph convolutions (GCN) over windows, giving $X_{(\tau)}$: $T_{out} \times \tau N \times C_{mid}$ • (4) Squeeze windows with 1x1 conv (collapse window via reshape + FC; BatchNorm), giving $X$: $T_{out} \times N \times C_{out}$ [Figure: G3D spatial-temporal information flow]
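
The four steps on this slide can be sketched with NumPy as follows. This is a rough sketch under several assumptions that the slide does not spell out: the window adjacency is built by tiling $A + I$ into a $\tau \times \tau$ block matrix, zero padding with stride 1 is used so that $T_{out} = T_{in}$, BatchNorm is omitted, and all function names and toy sizes are made up for illustration.

```python
import numpy as np

def sliding_windows(X, tau, dilation=1):
    """Step (1): (T, N, C) -> (T, tau*N, C), tau dilated frames per time step."""
    T, N, C = X.shape
    total = (tau - 1) * dilation
    left = total // 2
    Xp = np.pad(X, ((left, total - left), (0, 0), (0, 0)))   # zero-pad in time
    return np.stack([
        np.concatenate([Xp[t + i * dilation] for i in range(tau)], axis=0)
        for t in range(T)
    ])

def window_adjacency(A, tau):
    """Step (2): tile (A + I) into tau x tau blocks (assumed), then normalize."""
    A_tilde = A + np.eye(A.shape[0])
    A_tau = np.tile(A_tilde, (tau, tau))                      # (tau*N, tau*N)
    d_inv_sqrt = 1.0 / np.sqrt(A_tau.sum(axis=1))
    return A_tau * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def g3d(X, A, tau, Theta, W_out, dilation=1):
    T, N, _ = X.shape
    Xw = sliding_windows(X, tau, dilation)                    # (T, tau*N, C_in)
    A_tau = window_adjacency(A, tau)
    H = np.maximum(A_tau @ Xw @ Theta, 0.0)                   # step (3): (T, tau*N, C_mid)
    H = H.reshape(T, tau, N, -1).transpose(0, 2, 1, 3)        # (T, N, tau, C_mid)
    return H.reshape(T, N, -1) @ W_out                        # step (4): (T, N, C_out)

rng = np.random.default_rng(0)
T, N, C_in, C_mid, C_out, tau = 6, 3, 2, 4, 5, 3
A_g = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X_g = rng.standard_normal((T, N, C_in))
Theta_g = rng.standard_normal((C_in, C_mid))
W_out_g = rng.standard_normal((tau * C_mid, C_out))
Xw_g = sliding_windows(X_g, tau)
Y_g = g3d(X_g, A_g, tau, Theta_g, W_out_g)
```

In this sketch the size of `Theta` depends only on the channel widths, not on `tau` or on the number of joints reached; only the final collapse weights `W_out` grow with the window size.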

  17. Discussion • Analogous to 3D convolutions on RGB videos • Unlike 3D convolutions, the number of parameters is independent of the receptive field size • Temporal receptive field can be controlled based on input resolution • Considers more information at once, which helps prevent losing features during unweighted spatial aggregation • Memory footprint [Figure: spatial-temporal neighborhood aggregation]

  18. Agenda • Overview • Contributions: 1. Factorized Modeling → Unified Spatial-Temporal Modeling; 2. Adjacency Powering → Disentangling Neighborhoods • Experiments & Results

  19. Previous Approach #2: Multi-Scale Graph Convolutions • Making $k$-hop neighbors reachable with $\tilde{A}^k$ • Mixing features $\hat{A}^k X$ for $k = 0, 1, \dots$ with normalized $\hat{A}$ • Multi-scale aggregation with different $\hat{A}^k$ and $\Theta_{(k)}$: $X_t^{(l+1)} = \sigma\left(\sum_{k=0}^{K} \hat{A}^k X_t^{(l)} \Theta_{(k)}^{(l)}\right)$ [Figure: an example graph and its neighborhoods under $\hat{A}^1$, $\hat{A}^2$, $\hat{A}^3$] e.g. Li et al. CVPR’19, Abu-El-Haija et al. ICML’19, Luan et al. NeurIPS’19, Liao et al. ICLR’19
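
The multi-scale sum over adjacency powers can be written out directly. The snippet below is a minimal sketch of that formula for a single frame, assuming ReLU for $\sigma$; the function name, the toy chain graph, and the simple row normalization used as a stand-in for $\hat{A}$ are illustrative assumptions.

```python
import numpy as np

def multi_scale_gcn(A_hat, X, Thetas):
    """sigma( sum_k A_hat^k X Theta_(k) ) for k = 0..K, with K = len(Thetas) - 1."""
    A_k = np.eye(A_hat.shape[0])        # A_hat^0 = I
    out = 0.0
    for Theta_k in Thetas:
        out = out + A_k @ X @ Theta_k   # scale-k aggregation with its own weights
        A_k = A_k @ A_hat               # next power of the adjacency
    return np.maximum(out, 0.0)

# Hypothetical 4-joint chain with self-loops, row-normalized as a stand-in for A_hat.
A_m = np.array([[1, 1, 0, 0],
                [1, 1, 1, 0],
                [0, 1, 1, 1],
                [0, 0, 1, 1]], dtype=float)
A_hat_m = A_m / A_m.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X_m = rng.standard_normal((4, 2))                              # N x C
Thetas_m = [rng.standard_normal((2, 3)) for _ in range(3)]     # K = 2
Y_m = multi_scale_gcn(A_hat_m, X_m, Thetas_m)                  # N x C_out
```

With a single weight matrix (K = 0) the sum reduces to a per-node linear map followed by the nonlinearity, since $\hat{A}^0 = I$ involves no neighbors at all.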

  20. Motivation #2: Biased Node Weighting • Multi-scale aggregation with different $\hat{A}^k$ and $\Theta_{(k)}$: $X_t^{(l+1)} = \sigma\left(\sum_{k=0}^{K} \hat{A}^k X_t^{(l)} \Theta_{(k)}^{(l)}\right)$ • Node weights from $\hat{A}^k$ are biased towards closer nodes • More length-$k$ walks to closer nodes due to cyclic walks [Figure: two plots of number of walks against walk length $k$: "Number of length-k walks from Node 1" (to Nodes 1 to 5) and "Number of length-k walks to Self" (Nodes 1 to 5)]

  21. Motivation #2: Biased Node Weighting • Multi-scale aggregation with different $\hat{A}^k$ and $\Theta_{(k)}$: $X_t^{(l+1)} = \sigma\left(\sum_{k=0}^{K} \hat{A}^k X_t^{(l)} \Theta_{(k)}^{(l)}\right)$ • Node weights from $\hat{A}^k$ are biased towards closer nodes • More length-$k$ walks to closer nodes due to cyclic walks [Figure: "Number of length-k walks to Self" for Nodes 1 to 5, with and without self-loops; x-axis: walk length $k$; y-axis: number of walks]
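
The walk-count bias can be checked numerically, because entry $(i, j)$ of the $k$-th power of an adjacency matrix counts the length-$k$ walks from node $i$ to node $j$. The example below uses a hypothetical 5-node chain (not the graph plotted on the slides) and compares counts with and without self-loops.

```python
import numpy as np

def walk_counts(A, k):
    """Entry (i, j) of A^k is the number of length-k walks from node i to node j."""
    return np.linalg.matrix_power(A, k)

# Hypothetical chain 0 - 1 - 2 - 3 - 4 (integer adjacency, no self-loops).
A_w = np.zeros((5, 5), dtype=np.int64)
for i in range(4):
    A_w[i, i + 1] = A_w[i + 1, i] = 1
A_w_loops = A_w + np.eye(5, dtype=np.int64)    # same graph with self-loops

from_node0 = walk_counts(A_w, 4)[0]            # -> [2, 0, 3, 0, 1]
from_node0_loops = walk_counts(A_w_loops, 4)[0]  # -> [9, 12, 9, 4, 1]
```

In both cases the farthest node (4 hops away) is reached by exactly one length-4 walk, while nearby nodes are reached by many walks that double back on themselves, so powers of the adjacency matrix weight close neighbors far more heavily than distant ones. Self-loops amplify the effect because every walk can also stay in place.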
