Semantic Spaces for Zero-Shot Behaviour Analysis


  1. Semantic Spaces for Zero-Shot Behaviour Analysis • Xun Xu, Computer Vision and Interactive Media Lab, NUS Singapore

  2. Collaborators • Prof. Shaogang Gong • Dr. Timothy Hospedales

  3. Outline • Background • Transductive Zero-Shot Action Recognition • Multi-Task Zero-Shot Embedding • Zero-Shot Crowd Analysis

  4. Video Behaviour • Defined as visually distinguishable activities • Human Actions • Crowd Behaviour

  5. Human Actions • Individual or multiple interactive human activities • Soomro, et al. “UCF101: A Dataset of 101 human actions classes from videos in the wild.” 2012

  6. Human Actions Tasks • Action Recognition (example classes: Eye Makeup, Rafting, Swimming, Fencing, Diving, Archery)

  7. Human Actions Tasks • Action Detection (Retrieval): given the query “Swimming”, return a ranked list of videos

  8. Crowd Behaviour • A group of people acting collectively • Shao, J., et al. “Deeply learned attributes for crowded scene understanding.” CVPR 2015

  9. Crowd Behaviour Tasks • Crowd Behaviour Profiling

  10. Crowd Behaviour Tasks • Crowd Anomaly Detection • Hassner, T., et al. “Violent flows: Real-time detection of violent crowd behavior.” CVPR 2012

  11. Potential Applications • Human Computer Interaction • Surveillance • Video Sharing

  12. Outline • Background • Transductive Zero-Shot Action Recognition • Multi-Task Zero-Shot Embedding • Zero-Shot Crowd Analysis

  13. Motivation • Ever-increasing number of categories for action recognition • 2004: KTH (6 classes) • 2005: Weizmann (9 classes) • 2010: Olympic Sports (16 classes) • 2011: HMDB51 (51 classes) • 2012: UCF101 (101 classes) • 2015: 203 classes

  14. Motivation • Ever-increasing number of categories (same dataset timeline as the previous slide) • Limitations: training data is expensive to collect; annotating video is costly

  15. Zero-Shot Learning (ZSL) • Can we use videos from known classes to help predict videos from unknown classes? • Example classes, split into known and unknown: Shot-Put, Hammer Throw, Discus Throw

  16. Attribute Semantic Space • Attribute based • Classes: Hammer Throw, Discus Throw • Attributes: Throw Away, Outdoor, Turn Around, Ball, Bend

  17. Attribute Semantic Space • Attribute based • Classes: Hammer Throw, Discus Throw, Shot-put • Attributes: Throw Away, Outdoor, Turn Around, Ball, Bend • Known a priori

  18. Attribute Semantic Space • Attribute based • Classes: Hammer Throw, Discus Throw, Shot-put • Attributes: Throw Away, Outdoor, Turn Around, Ball, Bend • Test video

  19. Attribute Semantic Space • Attribute based • Classes: Discus Throw, Hammer Throw, Shot-put • Attributes: Throw Away, Outdoor, Turn Around, Ball, Bend • Limitations: ontological problem; manually labelling attributes is costly for videos; incompatible with other attribute sets

  20. Word-Vector Semantic Space • Feature Space X is mapped to Word-Vector Space Z by z = f(x) • Discus Throw = [0.2 0.5 0.1 …] • Hammer Throw = [0.1 0.6 0.1 …]

  21. Word-Vector Semantic Space • Feature Space X and Word-Vector Space Z • Discus Throw = [0.2 0.5 0.1 …] • ShotPut = [0.3 0.4 0.2 …] • Hammer Throw = [0.1 0.6 0.1 …]

  22. Semantic Word-Vector • The skip-gram model predicts adjacent words
      • Objective: $\max_{\{z\}} \frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\ j \ne 0} \log p(z_{t+j} \mid z_t)$
      • Softmax: $p(z_i \mid z_j) = \frac{\exp(z_i^\top z_j)}{\sum_{i'} \exp(z_{i'}^\top z_j)}$
      • Result of this optimization: vec(“ball”) = [-0.004 0.01 0.01 -0.03 0.05]; vec(“sword”) = [0.16 0.06 0.09 -0.06 -0.002]; vec(“archery”) = [0.02 0.01 0.02 -0.03 -0.03]; vec(“boxing”) = [-0.08 -0.01 0.15 -0.01 0.09]
      • Mikolov, T., et al. “Distributed representations of words and phrases and their compositionality.” NIPS 2013
      • Pennington, J., et al. “GloVe: Global vectors for word representation.” EMNLP 2014
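The softmax on this slide can be evaluated directly on the example vectors. The following is a minimal sketch, assuming (as the slide's notation suggests) a single set of word vectors rather than the separate input and output vectors used in practical skip-gram implementations; the function name `softmax_prob` is hypothetical.

```python
import math

def softmax_prob(vectors, i, j):
    """Return p(z_i | z_j) = exp(z_i . z_j) / sum_k exp(z_k . z_j)."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    scores = [dot(z_k, vectors[j]) for z_k in vectors]
    # Subtract the largest score for numerical stability; the ratio is unchanged.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    return exps[i] / sum(exps)

# Toy vocabulary built from the example vectors shown on the slide.
vocab = ["ball", "sword", "archery", "boxing"]
vectors = [
    [-0.004, 0.01, 0.01, -0.03, 0.05],
    [0.16, 0.06, 0.09, -0.06, -0.002],
    [0.02, 0.01, 0.02, -0.03, -0.03],
    [-0.08, -0.01, 0.15, -0.01, 0.09],
]

# Distribution over the vocabulary conditioned on "sword".
j = vocab.index("sword")
probs = [softmax_prob(vectors, i, j) for i in range(len(vocab))]
```

Because the denominator sums over the whole vocabulary, the values in `probs` form a probability distribution.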

  23. Benefits • Geometrically meaningful word-vector space: the figure places the words ship, Run, Walk, cat and dog, with related words closer together and unrelated words far away

  24. Benefits • Unsupervised semantic space

  25. Benefits • Wide coverage of words • Vec(“Apple”) = [0.2 0.3 0.1 …] • Vec(“Bear”) = [0.1 0.9 0.1 …] • Vec(“Car”) = [0.6 0.2 0.4 …] • Vec(“Desk”) = [0.2 0.8 0.4 …] • Vec(“Fish”) = [0.5 0.2 0.3 …] • …

  26. Benefits • Uniform across datasets: Dataset 1 and Dataset 2 share the same vectors • Discus Throw = [0.2 0.5 …] • HammerThrow = [0.1 0.2 …]

  27. Challenges • Domain Shift between Feature Space X and Semantic Vector Space Y • Classes shown: Discus Throw, Hammer Throw, Sword Exercise, Play Guitar

  28. Challenges • Domain Shift between Feature Space X and Semantic Vector Space Y • Classes shown: Discus Throw, Hammer Throw, Sword Exercise, Play Guitar • Confusion

  29. Our Solution • Xu, X., et al. “Transductive Zero-Shot Action Recognition by Word-Vector Embedding.” IJCV 2017

  30. Our Solution

  31. Low-Level Visual Feature • Improved Trajectory Feature for x • Wang, H. and Schmid, C. “Action recognition with improved trajectories.” ICCV 2013

  32. Our Solution

  33. Combining Multiple Words • A phrase vector is constructed from single word vectors by additive composition • vec(“Apply Eye Makeup”) = vec(“Apply”) + vec(“Eye”) + vec(“Makeup”) • vec(“Brushing Teeth”) = vec(“Brushing”) + vec(“Teeth”) • vec(“Playing Guitar”) = vec(“Playing”) + vec(“Guitar”)
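Additive composition is simple enough to show in a few lines. This is a minimal sketch with hypothetical 3-dimensional toy vectors (real word vectors have far more dimensions); the helper name `phrase_vector` is an assumption, not part of the presentation.

```python
def phrase_vector(phrase, word_vectors):
    """Sum the vectors of the individual words in a multi-word class name."""
    words = phrase.lower().split()
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for word in words:
        total = [t + v for t, v in zip(total, word_vectors[word])]
    return total

# Hypothetical toy vectors, for illustration only.
toy_vectors = {
    "playing": [0.1, 0.2, 0.0],
    "guitar": [0.3, -0.1, 0.5],
}

v = phrase_vector("Playing Guitar", toy_vectors)
```

A word missing from `word_vectors` raises a `KeyError` here; how out-of-vocabulary words are handled is not stated on the slide.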

  34. Our Solution

  35. Visual to Semantic Mapping by Regularized Linear Regression • Multi-dimensional regularized linear regression
      • $\min_W \sum_{i=1}^{N} \|z_i - W x_i\|_2^2 + \lambda \|W\|_2^2$
      • W maps the feature space (x_1, x_2, x_3, …) to the semantic space (z_1, z_2, …); x is N-dimensional, z is D-dimensional
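A regularized least-squares objective of this form has a closed-form solution. The sketch below assumes a squared (Frobenius) penalty on W with weight `lam`, stores samples as rows, and uses synthetic data; the function name `fit_ridge` and all shapes are illustrative assumptions.

```python
import numpy as np

def fit_ridge(X, Z, lam=1.0):
    """Fit W minimising sum_i ||z_i - W x_i||^2 + lam * ||W||^2.

    X: (n, d_x) array, one visual feature vector per row.
    Z: (n, d_z) array, one word vector per row.
    Returns W with shape (d_z, d_x), so that z is approximated by W @ x.
    """
    d_x = X.shape[1]
    A = X.T @ X + lam * np.eye(d_x)   # (d_x, d_x), positive definite for lam > 0
    B = X.T @ Z                       # (d_x, d_z)
    return np.linalg.solve(A, B).T    # solve A V = B, then W = V^T

# Synthetic check: recover a hypothetical ground-truth mapping.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))          # 50 toy videos with 4-d features
W_true = rng.normal(size=(3, 4))      # hypothetical true mapping to 3-d vectors
Z = X @ W_true.T
W = fit_ridge(X, Z, lam=1e-8)
```

With noise-free data and a tiny `lam`, the fitted `W` closely matches `W_true`; larger values of `lam` shrink it towards zero.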

  36. Domain Shift – Semi-Supervised (Manifold-Regularized) Regression • Semi-supervised regression tackles domain shift by taking the test data distribution into consideration
      • Train and test data in feature space: $X_{tr} = X^{trg}_{tr}$ (target train data), $X_{te} = X^{trg}_{te}$ (target test data)
      • A KNN graph over the train and test data models the manifold, with edge weights $\omega_{ij}$
      • Manifold regularizer: $\sum_{ij} \omega_{ij} \|f(x_i) - f(x_j)\|_2^2, \quad x \in [X_{tr}; X_{te}]$

  37. Domain Shift – Semi-Supervised (Manifold-Regularized) Regression • The manifold regularizer is added to the regression objective
      • $\min_W \sum_{i=1}^{N} \|z_i - W x_i\|_2^2 + \lambda \|W\|_2^2 + \gamma \sum_{ij} \omega_{ij} \|W x_i - W x_j\|_2^2$
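The manifold-regularized objective also admits a closed-form solution via the graph Laplacian. The following is a sketch under several assumptions not stated on the slides: binary, symmetrised KNN edge weights, Euclidean distances for the graph, and the trade-off weights `lam` and `gamma`. The function names are hypothetical.

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric binary KNN adjacency matrix over the rows of X."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)                  # exclude self-matches
    idx = np.argsort(d2, axis=1)[:, :k]           # k nearest neighbours per row
    omega = np.zeros_like(d2)
    omega[np.repeat(np.arange(len(X)), k), idx.ravel()] = 1.0
    return np.maximum(omega, omega.T)             # symmetrise

def fit_manifold_ridge(X_tr, Z_tr, X_te, lam=1.0, gamma=1.0, k=5):
    """Ridge regression plus a smoothness penalty over train and test features."""
    X_all = np.vstack([X_tr, X_te])               # unlabelled test data joins the graph
    omega = knn_graph(X_all, k)
    L = np.diag(omega.sum(axis=1)) - omega        # graph Laplacian
    d = X_tr.shape[1]
    # sum_ij omega_ij ||W x_i - W x_j||^2 = 2 tr(W X_all^T L X_all W^T)
    A = X_tr.T @ X_tr + lam * np.eye(d) + 2.0 * gamma * X_all.T @ L @ X_all
    return np.linalg.solve(A, X_tr.T @ Z_tr).T    # W with shape (d_z, d_x)

rng = np.random.default_rng(0)
X_tr = rng.normal(size=(30, 4))                   # toy labelled training features
Z_tr = rng.normal(size=(30, 3))                   # toy word vectors for training data
X_te = rng.normal(size=(20, 4)) + 0.5             # toy test features with a shift
W_0 = fit_manifold_ridge(X_tr, Z_tr, X_te, gamma=0.0)
W_g = fit_manifold_ridge(X_tr, Z_tr, X_te, gamma=1.0)
```

Setting `gamma=0.0` reduces this to plain ridge regression, while a positive `gamma` makes the projections of neighbouring videos, including test videos, more similar.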

  38. Our Solution • Additional datasets are available

  39. Data Augmentation • Use more training data from an auxiliary dataset to help learn a better regression
      • Augmented train and test data in feature space: $X_{tr} = [X^{trg}_{tr}; X^{aux}]$, $X_{te} = X^{trg}_{te}$
      • $X^{trg}_{tr}$: target dataset train data (e.g. HMDB51)
      • $X^{aux}$: auxiliary dataset data (e.g. UCF101)
      • $X^{trg}_{te}$: target test data
      • More data is used to learn a more robust regressor

  40. Semantic Word Vector Approach

  41. Zero-Shot Recognition by Nearest Neighbor • Do a nearest-neighbor search in word-vector space to predict the category of test data: the test video instance is projected by W and assigned the category name at minimal distance • Category names shown: HulaHoop, Fencing, Basketball, Diving, Kayaking, Rafting, TaiChi
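The nearest-neighbor step can be sketched as follows. The slide only says "minimal distance", so the Euclidean metric used here is an assumption, and the 2-dimensional prototypes, the identity mapping `W`, and the function names are hypothetical toy values.

```python
import math

def project(W, x):
    """Map a visual feature x into word-vector space: z = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def predict_category(W, x, prototypes):
    """Return the category whose word vector is closest to the projected video."""
    z = project(W, x)
    return min(prototypes, key=lambda name: math.dist(z, prototypes[name]))

# Hypothetical 2-d word vectors for three unseen categories.
prototypes = {
    "TaiChi": [1.0, 0.0],
    "Rafting": [0.0, 1.0],
    "Diving": [-1.0, 0.0],
}
W = [[1.0, 0.0], [0.0, 1.0]]   # toy mapping; in practice W is learned by regression
label = predict_category(W, [0.9, 0.2], prototypes)
```

No training videos of the candidate categories are needed at this stage: only their names, embedded as word vectors, are required.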

  42. Domain Shift – Self-Training • Self-training is applied to tackle domain shift
      • Test video instance: $z_{te} = f(x_{te})$
      • Category name prototype: $Z(\text{"Taichi"}) = g(\text{"Taichi"})$
      • Adapted prototype: $Z^*(\text{"Taichi"}) = \frac{1}{K} \sum_{z_{te} \in NN(Z(\text{"Taichi"}), K)} z_{te}$, where $NN(Z_{proto}, K)$ is the KNN function
      • 4-NN example: $Z^*(\text{"Taichi"}) = \frac{1}{4}(z_5 + z_6 + z_7 + z_8)$
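The prototype update in self-training amounts to averaging the K projected test points nearest to a category's word vector. This is a minimal sketch, assuming Euclidean distance for the KNN function and using hypothetical 2-dimensional toy points; the function name `self_train_prototype` is an assumption.

```python
import math

def self_train_prototype(prototype, test_embeddings, k):
    """Replace a prototype by the mean of its K nearest projected test points."""
    nearest = sorted(test_embeddings, key=lambda z: math.dist(z, prototype))[:k]
    dim = len(prototype)
    # Divide by the number of neighbours actually found, in case fewer than k exist.
    return [sum(z[d] for z in nearest) / len(nearest) for d in range(dim)]

# Hypothetical projected test videos; the last two lie far from the prototype.
test_embeddings = [
    [1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [2.0, 2.0],
    [10.0, 10.0], [-9.0, 8.0],
]
adapted = self_train_prototype([0.0, 0.0], test_embeddings, k=4)
```

The adapted prototype moves from the original word vector towards the region where test videos actually land, and the nearest-neighbor classification can then be run against the adapted prototypes.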

