
Feature Representation in Person Re-identification, Hong Chang (presentation)



  1. Feature Representation in Person Re-identification
     Hong Chang, Institute of Computing Technology, Chinese Academy of Sciences, 2020.1

  2. Contents
      Feature representation in person Re-ID
       – Related recent works
      Learning features with
       – High robustness
       – High discriminativeness
       – Low information loss/redundancy
      Discussions

  3. Person Re-identification
      The problem
      Main challenges: pose, scale, occlusion, illumination

  4. Feature Representation & Metric Learning
      The work flow of person Re-ID
       – Camera A: Image/Video → Detection → Feature representation
       – Camera B: Image/Video → Detection → Feature representation
       – Metric learning → matching results
      Two key components
       – Feature representation
       – Metric learning
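The two-component work flow above can be sketched end to end. The learned CNN feature extractor is replaced here by a fixed random projection (an illustrative stand-in, not the talk's model), and the metric step is plain cosine similarity rather than a learned metric:

```python
import numpy as np

rng = np.random.default_rng(0)
PROJ = rng.standard_normal((32 * 16, 128))  # fixed stand-in "learned" projection

def extract_feature(image):
    # Feature-representation step: in a real system this is a CNN;
    # here, a fixed linear projection of the flattened image, L2-normalized.
    f = image.flatten() @ PROJ
    return f / np.linalg.norm(f)

def rank_gallery(query_feat, gallery_feats):
    # Metric step: with L2-normalized features, cosine similarity is a
    # dot product; return gallery indices sorted best-match first.
    sims = gallery_feats @ query_feat
    return np.argsort(-sims)

images = [rng.random((32, 16)) for _ in range(4)]          # camera-A "detections"
gallery = np.stack([extract_feature(im) for im in images])
query = extract_feature(images[2] + 0.01 * rng.random((32, 16)))  # camera-B view
print(rank_gallery(query, gallery)[0])   # index of the best-matching identity
```

The sketch only fixes the interface between the two components; all of the works discussed in the following slides improve the feature-representation box while keeping this overall flow.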

  5. Recent Works in Feature Representation
      For images: traditional feature vs. deep feature
       – Global feature; local feature (hard part, adaptive part, part detection) [1-3][4-6][7-10]
       – Better person part alignment
       – Weaknesses: part detection loss, extra computation, etc.
       – Unsolved problems: (a) discriminative region? (b) occlusion?

  6. Recent Works in Feature Representation
      For videos: image set feature [11-13] vs. spatial-temporal feature
       – Low-order information [14]; high-order information
       – Recurrent network, non-local [14-16], 3D convolution [16]
       – Unsolved problems: (a) disturbance? (b) occlusion?

  7. Feature Representation for Person Re-ID
      Existing feature representation, extended along three axes:
       – Discriminativeness (towards disturbance & occlusion): cross-attention network, occlusion recovery
       – Robustness (towards pose & scale changes): interaction-aggregation
       – Completeness (low information loss): knowledge propagation

  8. Feature Representation for Person Re-ID (roadmap, as slide 7)

  9. Interaction-Aggregation Feature Representation
      To deal with pose and scale changes
      Main idea:
       – Unsupervised, lightweight
       – Semantic similarity

  10. Interaction-Aggregation Feature Representation
      Spatial IA
       – Adaptively determines the receptive fields according to the input person pose and scale
       – Interaction: models the relations between spatial features to generate a semantic relation map T
       – Aggregation: aggregates semantically related features across different positions based on T
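A minimal sketch of the two spatial IA steps above, assuming a plain dot-product similarity for the relation map T (the actual SIA module applies learned embeddings before the comparison):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_ia(feat):
    # feat: (c, h, w) feature map.
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)        # c x (h*w) column features
    # Interaction: semantic relation map T between all position pairs.
    T = x.T @ x                       # (h*w) x (h*w)
    # Aggregation: each position gathers semantically related features,
    # weighted by its (softmax-normalized) row of T.
    weights = softmax(T, axis=-1)
    out = x @ weights.T               # c x (h*w)
    return out.reshape(c, h, w)

rng = np.random.default_rng(0)
f = rng.standard_normal((8, 4, 3))
g = spatial_ia(f)
print(g.shape)   # same (c, h, w) shape, positions re-aggregated
```

Because the weights depend on the input itself, the effective receptive field of each position adapts to the person's pose and scale, which is the point made on this slide.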

  11. Interaction-Aggregation Feature Representation
      Channel IA
       – Selectively aggregates channel features to enhance the feature representation, especially for small-scale visual cues
       – Interaction: models the relations between channel features to generate a semantic relation map C
       – Aggregation based on relation map C
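The channel counterpart follows the same interaction-then-aggregation pattern, only transposed to operate over channels; as above, the learned projections of the real CIA module are omitted (an assumption of this sketch):

```python
import numpy as np

def channel_ia(feat):
    # feat: (c, h, w). Relate channels to one another, then re-mix them;
    # aggregating related channels can amplify cues spread thinly across
    # several channels, e.g. small-scale visual details.
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)                   # c x (h*w)
    C = x @ x.T                                  # Interaction: c x c relation map
    e = np.exp(C - C.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)  # row-wise softmax
    return (weights @ x).reshape(c, h, w)        # Aggregation over channels

rng = np.random.default_rng(0)
out = channel_ia(rng.standard_normal((8, 4, 3)))
print(out.shape)
```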

  12. Interaction-Aggregation Feature Representation
      Overall model
       – IANet: CNN with IA modules
       – Extension: spatial-temporal context IA

  13. Interaction-Aggregation Feature Representation
      Visualization results (figure: images and their receptive fields)
       – Receptive fields: sub-relation maps with high relation values
       – SIA can adaptively localize the body parts and visual attributes under various poses and scales

  14. Interaction-Aggregation Feature Representation
      Visualization for pose and scale robustness
      Quantitative results: ablation study on Market-1501 & DukeMTMC
       – G: global feature; P: part feature; MS: multi-scale feature
     [17] R. Hou, B. Ma, H. Chang, X. Gu, S. Shan, and X. Chen. Interaction-and-Aggregation Network for Person Re-identification. In CVPR, 2019.

  15. Feature Representation for Person Re-ID (roadmap, as slide 7)

  16. Cross-Attention Feature Representation
      Motivation: to localize the relevant regions and generate more discriminative features
       – Person re-identification
       – Few-shot classification
      Main idea: utilize semantic relations; meta-learn where to focus

  17. Cross-Attention Feature Representation
      Cross-attention module
       – Highlights the relevant regions and generates more discriminative feature pairs
       – Correlation Layer: calculates a correlation map S ∈ ℝ^((h×w)×(h×w)) between support feature Q and query feature R; it denotes the semantic relevance between each pair of spatial positions of Q and R

  18. Cross-Attention Feature Representation
      Cross-attention module
       – Fusion Layer: generates the attention map pair B^q, B^r ∈ ℝ^(h×w) based on the corresponding correlation maps S
          The kernel w fuses each correlation vector into an attention scalar
          The kernel w should draw attention to the target object
          A meta fusion layer is designed to generate the kernel w
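The correlation and fusion layers can be sketched together. The meta-learned fusion kernel w is replaced here by simple averaging over correlation vectors (an assumption, made to keep the sketch self-contained), so only the structure of the module is shown:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def cross_attention(Q, R):
    # Q, R: (c, h, w) support / query feature maps.
    c, h, w = Q.shape
    q = Q.reshape(c, h * w); q = q / np.linalg.norm(q, axis=0, keepdims=True)
    r = R.reshape(c, h * w); r = r / np.linalg.norm(r, axis=0, keepdims=True)
    # Correlation Layer: semantic relevance between every position pair.
    S = q.T @ r                                    # (h*w) x (h*w)
    # Fusion Layer: collapse each correlation vector to an attention scalar
    # (mean pooling stands in for the meta-learned kernel w).
    A_q = softmax(S.mean(axis=1)).reshape(h, w)    # attention on support
    A_r = softmax(S.mean(axis=0)).reshape(h, w)    # attention on query
    return Q * A_q, R * A_r                        # reweighted feature pair

rng = np.random.default_rng(1)
Qa, Ra = cross_attention(rng.standard_normal((16, 5, 5)),
                         rng.standard_normal((16, 5, 5)))
print(Qa.shape, Ra.shape)
```

The key design point survives the simplification: attention on each feature map is computed *from the other* map, so the pair is highlighted jointly rather than independently.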

  19. Cross-Attention Feature Representation
      Experiments on few-shot classification
       – State-of-the-art on miniImageNet and tieredImageNet datasets
       – O: optimization-based; P: parameter-generating; M: metric-learning; T: transductive
     [18] R. Hou, H. Chang, B. Ma, S. Shan, and X. Chen. Cross Attention Network for Few-shot Classification. In NeurIPS, 2019.

  20. Feature Representation for Person Re-ID (roadmap, as slide 7)

  21. Temporal Knowledge Propagation
      Image-to-video Re-ID
       – Image lacks temporal information
       – Information asymmetry increases matching difficulty
      Our solution: temporal knowledge propagation

  22. Temporal Knowledge Propagation
      The framework
       – Propagation via features
       – Propagation via cross-sample distances
       – Integrated triplet loss
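The two propagation routes above can be illustrated with toy losses; the exact formulation and weighting of the paper's integrated triplet loss are not reproduced here, and the triplet term itself is omitted (this is a sketch of the propagation idea only):

```python
import numpy as np

def tkp_losses(img_feats, vid_feats):
    # img_feats, vid_feats: (n, d) features of the same n samples from the
    # image branch and the video branch.
    # (1) Propagation via features: pull each image feature toward the
    #     video feature of the same sample.
    feat_loss = np.mean(np.sum((img_feats - vid_feats) ** 2, axis=1))
    # (2) Propagation via cross-sample distances: align the pairwise
    #     distance structure of the two modalities.
    d_img = np.linalg.norm(img_feats[:, None] - img_feats[None], axis=-1)
    d_vid = np.linalg.norm(vid_feats[:, None] - vid_feats[None], axis=-1)
    dist_loss = np.mean((d_img - d_vid) ** 2)
    return feat_loss, dist_loss

rng = np.random.default_rng(0)
v = rng.standard_normal((6, 32))
fl, dl = tkp_losses(v, v)   # identical branches: both terms vanish
print(fl, dl)
```

Minimizing either term propagates the temporal knowledge held by the video branch into the image branch, which addresses the information asymmetry noted on the previous slide.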

  23. Temporal Knowledge Propagation
      Testing pipeline of I2V Re-ID
       – SAP: spatial average pooling
       – TAP: temporal average pooling

  24. Temporal Knowledge Propagation
      Visualization
       – The learned image features focus more on the foreground
       – More consistent feature distributions across the two modalities

  25. Temporal Knowledge Propagation
      Experimental results: comparison among I2I, I2V and V2V Re-ID
     [19] X. Gu, B. Ma, H. Chang, S. Shan, and X. Chen. Temporal Knowledge Propagation for Image-to-Video Person Re-identification. In ICCV, 2019.

  26. Feature Representation for Person Re-ID (roadmap, as slide 7)

  27. Occlusion-free Video Re-ID
      Occlusion problem → information loss
      Our solution: explicitly recover the appearance of the occluded parts
      Method overview
       – Similarity scoring mechanism: locate the occluded parts
       – STCnet: recover the appearance of the occluded parts
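A similarity-scoring mechanism of this kind can be sketched as follows; scoring each frame's part against the temporal average of that part, and the particular threshold value, are assumptions made for illustration:

```python
import numpy as np

def locate_occluded(part_feats, threshold=0.5):
    # part_feats: (t, p, d) per-frame, per-part features of one tracklet.
    # A part whose feature disagrees with that part's average over the
    # tracklet (low cosine similarity) is flagged as occluded.
    mean = part_feats.mean(axis=0, keepdims=True)            # (1, p, d)
    num = (part_feats * mean).sum(-1)                        # (t, p)
    den = (np.linalg.norm(part_feats, axis=-1)
           * np.linalg.norm(mean, axis=-1))
    scores = num / np.maximum(den, 1e-12)                    # cosine scores
    return scores < threshold                                # True = occluded

rng = np.random.default_rng(0)
feats = np.abs(rng.standard_normal((4, 3, 64)))  # 4 frames, 3 body parts
feats[2, 1] = -feats[2, 1]     # simulate one occluded part in frame 2
mask = locate_occluded(feats)
print(mask[2, 1], mask[0, 0])  # flagged part vs. a clean part
```

The flagged entries of the mask are then the positions whose appearance STCnet is asked to recover.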

  28. Occlusion-free Video Re-ID
      Spatial-Temporal Completion network (STCnet)
       – Spatial Structure Generator: makes a coarse prediction for occluded parts conditioned on the visible parts
       – Temporal Attention Generator: refines the occluded contents with temporal information
       – Discriminator: real or not?
       – ID Guider: classification target

  29. Occlusion-free Video Re-ID
      Visualization results
      Quantitative results: MARS, with ablation study
     [20] R. Hou, B. Ma, H. Chang, X. Gu, S. Shan, and X. Chen. VRSTC: Occlusion-free Video Person Re-identification. In CVPR, 2019.

  30. Discussions
      As for our methods ...
       – Cross-attention network: meta-attended discriminative regions; good generalization ability
       – Occlusion recovery: necessity? redundancy?
       – Interaction-aggregation: extension to ST context for video? plug-in for CNNs
       – Knowledge propagation: leads in temporal information; from videos to images
       – Completeness: low information loss & redundancy

  31. Discussions
      Limitations in feature representation learning
       – For images, the discriminative ability is upper bounded
          Appearance {y_1, y_2, ..., y_n} → identity z
          Large appearance variation & little relation with identity, e.g., the same person with different clothes or accessories
          Application: short term, restricted regions
       – For videos, more discriminative spatial-temporal features are required
          Key: temporal information representation
          Other information: trajectory, other spatial-temporal references
          Application: more real-world scenarios
