
Video Analytics, Xavier Giró-i-Nieto - PowerPoint PPT Presentation



  1. Day 4, Lecture 4: Video Analytics. Xavier Giró-i-Nieto

  2. Motivation

  3. Motivation

  4. Motivation

  5. Outline: (1) Scene Classification; (2) Object Detection & Tracking

  6. Scene Classification (Slides by Victor Campos). Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. CVPR 2014

  7. Scene Classification. Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015

  8. Scene Classification. Previous lectures. Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015

  9. Scene Classification. Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015

  10. Scene Classification: DeepVideo: Architectures (Slides by Victor Campos). Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. CVPR 2014
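The DeepVideo architectures differ in where temporal information is fused. As we recall the paper, early fusion stacks a short run of consecutive frames (10 in the paper) along the channel axis of the first layer, while late fusion runs two single-frame towers on frames 15 frames apart and merges them in the fully connected layers. The minimal NumPy sketch below only builds the *inputs* for those two variants; the function names and the dummy frame size are hypothetical, and no network is included.

```python
import numpy as np

def early_fusion_input(clip: np.ndarray) -> np.ndarray:
    """Stack T RGB frames along channels: (T, H, W, C) -> (H, W, T*C).

    Channel order is frame 0 RGB, frame 1 RGB, ... so a single
    first-layer filter sees all T frames at once.
    """
    t, h, w, c = clip.shape
    # Bring time next to the channel axis, then merge the two axes.
    return clip.transpose(1, 2, 0, 3).reshape(h, w, t * c)

def late_fusion_inputs(video: np.ndarray, start: int, gap: int = 15):
    """Return the two single frames, `gap` frames apart, fed to the two towers."""
    return video[start], video[start + gap]

# Dummy 30-frame video of 64x64 RGB frames (illustrative size only).
video = np.random.default_rng(0).random((30, 64, 64, 3), dtype=np.float32)
x_early = early_fusion_input(video[:10])
first, second = late_fusion_inputs(video, start=0)
print(x_early.shape)              # (64, 64, 30)
print(first.shape, second.shape)  # (64, 64, 3) (64, 64, 3)
```

Slow fusion, the third multi-frame variant, extends temporal connectivity gradually through several layers instead of fusing everything in one place.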

  11. Scene Classification: DeepVideo: Features. Unsupervised learning [Le et al. '11]; supervised learning [Karpathy et al. '14] (Slides by Victor Campos). Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. CVPR 2014

  12. Scene Classification: DeepVideo: Multires (Slides by Victor Campos). Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. CVPR 2014
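The multiresolution ("Multires") variant splits each frame into a low-resolution context stream that sees the whole frame and a full-resolution fovea stream that sees only the centre. As we recall the paper, both streams receive 89×89 inputs derived from 178×178 frames. The sketch below reproduces that split; 2×2 average pooling is our assumed stand-in for whatever downsampling the original pipeline used, and `multires_streams` is a hypothetical name.

```python
import numpy as np

def multires_streams(frame: np.ndarray):
    """Split one (H, W, C) frame into (context, fovea) inputs.

    context: whole frame at half resolution (2x2 average pooling).
    fovea:   central (H/2, W/2) crop at full resolution.
    """
    h, w, c = frame.shape
    assert h % 2 == 0 and w % 2 == 0, "sketch assumes even frame sizes"
    # Group pixels into 2x2 blocks and average each block.
    context = frame.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))
    top, left = h // 4, w // 4
    fovea = frame[top:top + h // 2, left:left + w // 2]
    return context, fovea

frame = np.random.default_rng(0).random((178, 178, 3))
context, fovea = multires_streams(frame)
print(context.shape, fovea.shape)  # (89, 89, 3) (89, 89, 3)
```

Both streams end up the same size, so they can share one tower design while seeing different fields of view.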

  13. Scene Classification: DeepVideo: Results (Slides by Victor Campos). Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. CVPR 2014

  14. Scene Classification. Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015

  15. Scene Classification: C3D. Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015

  16. Scene Classification: C3D: Spatial Dimensions. K. Simonyan, A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” ICLR 2015.

  17. Scene Classification: C3D: Temporal dimension. 3D ConvNets are more suitable for spatiotemporal feature learning than 2D ConvNets. (Figure labels: 2D ConvNets, Temporal depth.) Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015
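Why 3D convolutions keep temporal information while 2D convolutions over stacked frames lose it can be seen from output shapes alone. The naive single-channel NumPy sketch below (our illustration, not C3D code) slides a 3×3×3 kernel through time and space, then repeats the operation with a kernel that spans all 16 frames, which is what a 2D ConvNet does when the frames are treated as input channels.

```python
import numpy as np

def conv3d_valid(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Naive single-channel 3D cross-correlation with 'valid' padding.

    x: (T, H, W) grayscale clip; k: (kt, kh, kw) kernel.
    The output keeps a temporal axis of length T - kt + 1.
    """
    kt, kh, kw = k.shape
    t, h, w = x.shape
    out = np.zeros((t - kt + 1, h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for m in range(out.shape[2]):
                out[i, j, m] = np.sum(x[i:i + kt, j:j + kh, m:m + kw] * k)
    return out

clip = np.ones((16, 8, 8))

# 3D ConvNet style: a 3x3x3 kernel slides along time as well as space.
y3d = conv3d_valid(clip, np.ones((3, 3, 3)))
# 2D ConvNet on a stack of frames: the kernel spans all 16 frames,
# so the temporal axis collapses to length 1 after the first layer.
y2d = conv3d_valid(clip, np.ones((16, 3, 3)))

print(y3d.shape)  # (14, 6, 6)
print(y2d.shape)  # (1, 6, 6)
```

Once time has collapsed to a single map, later layers have no temporal axis left to model motion over.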

  18. Scene Classification: C3D: Temporal dimension. A homogeneous architecture with small 3 × 3 × 3 convolution kernels in all layers is among the best-performing architectures for 3D ConvNets. Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015
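One way to see why small kernels are attractive is the argument VGG (slide 16) makes for 3×3 2D kernels, carried over to 3D: two stacked stride-1 3×3×3 layers cover the same 5×5×5 volume as one 5×5×5 layer with far fewer weights. The arithmetic below is our illustration with an arbitrary channel count of 64, ignoring biases and pooling.

```python
def conv3d_params(c_in: int, c_out: int, k=(3, 3, 3)) -> int:
    """Weights of one 3D conv layer (bias omitted)."""
    kt, kh, kw = k
    return kt * kh * kw * c_in * c_out

def stacked_receptive_field(n_layers: int, k: int = 3) -> int:
    """Receptive field along one axis of n stride-1 convs, no pooling."""
    return 1 + n_layers * (k - 1)

c = 64
two_small = 2 * conv3d_params(c, c)           # two stacked 3x3x3 layers
one_large = conv3d_params(c, c, k=(5, 5, 5))  # one 5x5x5 layer
print(stacked_receptive_field(2))  # 5 -> both cover a 5x5x5 volume
print(two_small, one_large)        # 221184 512000
```

The stacked version also inserts an extra non-linearity between the two layers.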

  19. Scene Classification: C3D: Temporal dimension. No gain when varying the temporal depth across layers. Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015

  20. Scene Classification: C3D: Network Architecture (figure label: Feature vector). Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015
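The C3D architecture can be summarized by tracking tensor shapes through it. The sketch below assumes the layer configuration as we recall it from the paper (3×16×112×112 input, eight 3×3×3 conv layers with 64/128/256/256/512/512/512/512 filters, pool1 of 1×2×2 and the other pools 2×2×2, then two 4096-unit fully connected layers); treat that list, and the round-up at pool5, as assumptions to verify against the paper.

```python
import math

# (name, kind, arg): conv arg = output channels, pool arg = (kt, kh, kw).
# Layer list as we recall it from the C3D paper; check it against the paper.
C3D_LAYERS = [
    ("conv1a", "conv", 64), ("pool1", "pool", (1, 2, 2)),
    ("conv2a", "conv", 128), ("pool2", "pool", (2, 2, 2)),
    ("conv3a", "conv", 256), ("conv3b", "conv", 256), ("pool3", "pool", (2, 2, 2)),
    ("conv4a", "conv", 512), ("conv4b", "conv", 512), ("pool4", "pool", (2, 2, 2)),
    ("conv5a", "conv", 512), ("conv5b", "conv", 512), ("pool5", "pool", (2, 2, 2)),
]

def trace_shapes(c=3, t=16, h=112, w=112):
    """Return (layer name, (C, T, H, W)) after every layer."""
    shapes = []
    for name, kind, arg in C3D_LAYERS:
        if kind == "conv":
            # 3x3x3 kernel, stride 1, padding 1: only the channel count changes.
            c = arg
        else:
            kt, kh, kw = arg
            # Rounding up (assumed) turns the 7x7 map into 4x4 at pool5.
            t, h, w = math.ceil(t / kt), math.ceil(h / kh), math.ceil(w / kw)
        shapes.append((name, (c, t, h, w)))
    return shapes

shapes = trace_shapes()
for name, shape in shapes:
    print(name, shape)
c, t, h, w = shapes[-1][1]
print("pool5 flattened:", c * t * h * w)  # 8192, then fc6 (4096) -> fc7 (4096) -> softmax
```

Because pool1 does not pool in time, the temporal axis shrinks more slowly than the spatial ones early in the network.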

  21. Scene Classification: C3D: Feature Vector. The video sequence is split into 16-frame-long clips with an 8-frame overlap. Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015
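Splitting a video into 16-frame clips that overlap by 8 frames is a sliding window with stride 8. A minimal sketch, assuming frames that do not fill a final complete clip are simply dropped (`clip_windows` is a hypothetical name):

```python
def clip_windows(num_frames: int, clip_len: int = 16, stride: int = 8):
    """Return (start, end) frame indices of clips overlapping by clip_len - stride."""
    if num_frames < clip_len:
        return []
    return [(s, s + clip_len) for s in range(0, num_frames - clip_len + 1, stride)]

print(clip_windows(40))
# [(0, 16), (8, 24), (16, 32), (24, 40)]
```

Padding or re-aligning the last window would be an alternative design when trailing frames matter.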

  22. Scene Classification: C3D: Feature Vector. Each 16-frame clip yields a 4096-dim descriptor; the clip descriptors are averaged and L2-normalized into a single 4096-dim video descriptor. Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015
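The pooling step from clip descriptors to one video descriptor is an average followed by L2 normalization. The sketch below assumes the per-clip features are already extracted (4096-dim in C3D); the toy 2-dim input only makes the arithmetic easy to check, and `video_descriptor` is a hypothetical name.

```python
import numpy as np

def video_descriptor(clip_features: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Average per-clip features, then L2-normalize the mean.

    clip_features: (num_clips, dim) array, one row per 16-frame clip.
    """
    mean = clip_features.mean(axis=0)
    return mean / max(float(np.linalg.norm(mean)), eps)

# Toy 2-dim "features" for two clips.
feats = np.array([[3.0, 0.0], [3.0, 8.0]])
print(video_descriptor(feats))  # mean [3, 4], norm 5 -> [0.6 0.8]
```

Normalizing makes descriptors of short and long videos comparable, which suits the simple linear classifier used on top.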

  23. Scene Classification: C3D: Visualization. Based on Deconvnets by Zeiler and Fergus [ECCV 2014]; see [ReadCV Slides] for more details. Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015

  24. Scene Classification: C3D: Visualization. C3D features with a simple linear classifier outperformed state-of-the-art methods on 4 different benchmarks and were comparable to the state of the art on the other 2. Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015

  25. Scene Classification: C3D: Software. Implementation by Michael Gygli (GitHub). Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015

  26. Classification: Image & Optical Flow CNN + LSTM. Yue-Hei Ng, Joe, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, and George Toderici. "Beyond short snippets: Deep networks for video classification." CVPR 2015

  27. (Scene Classification: Image &) Optical Flow. Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., FlowNet: Learning Optical Flow With Convolutional Networks. ICCV 2015

  28. (Scene Classification: Image &) Optical Flow. Data augmentation: since existing ground-truth datasets are not large enough to train a ConvNet, a synthetic dataset is generated… and augmented (translation, rotation and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color). Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., FlowNet: Learning Optical Flow With Convolutional Networks. ICCV 2015
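The photometric part of this augmentation (noise, brightness, contrast, gamma, color) can be sketched without touching the ground-truth flow, because it changes pixel values but does not move pixels; the geometric part (translation, rotation, scaling) would also have to transform the flow field and is left out here. The parameter ranges, the order of operations and the name `photometric_augment` are illustrative assumptions, not FlowNet's settings.

```python
import numpy as np

def photometric_augment(img1, img2, rng, noise_std=0.02):
    """Randomly perturb an (H, W, 3) image pair with values in [0, 1].

    Both frames share one set of brightness/contrast/gamma/color parameters;
    Gaussian noise is drawn independently per frame.
    """
    brightness = rng.uniform(-0.2, 0.2)
    contrast = rng.uniform(0.8, 1.2)
    gamma = rng.uniform(0.7, 1.5)
    color = rng.uniform(0.9, 1.1, size=3)  # per-channel gain

    out = []
    for img in (img1, img2):
        x = img * color                            # color change
        x = (x - x.mean()) * contrast + x.mean()   # contrast around the mean
        x = x + brightness                         # brightness shift
        x = np.clip(x, 0.0, 1.0) ** gamma          # gamma needs non-negative input
        x = x + rng.normal(0.0, noise_std, size=x.shape)  # additive Gaussian noise
        out.append(np.clip(x, 0.0, 1.0))
    return out[0], out[1]

rng = np.random.default_rng(0)
a = rng.random((32, 32, 3))
b = rng.random((32, 32, 3))
a_aug, b_aug = photometric_augment(a, b, rng)
```

The flow target for the pair is reused unchanged after this step.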

  29. Scene Classification & Detection: CNN + RNN (example label: “Biking”). Slide credit: Alberto Montes

  30. Classification & Detection: Proposals + C3D (Slidecast and Slides by Alberto Montes). Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code]
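Segment proposals for untrimmed videos can be enumerated with sliding windows of several lengths. As we recall, the multi-stage CNN approach uses windows of 16 to 512 frames with 75% overlap; those values, and the name `multiscale_windows`, are assumptions to check against the paper. Each window would then go through the stages listed on the following slides.

```python
def multiscale_windows(num_frames, lengths=(16, 32, 64, 128, 256, 512), overlap=0.75):
    """Return (start, end) frame indices of candidate segments at every length."""
    windows = []
    for length in lengths:
        # 75% overlap means the window advances by a quarter of its length.
        stride = max(1, int(length * (1 - overlap)))
        for start in range(0, num_frames - length + 1, stride):
            windows.append((start, start + length))
    return windows

w = multiscale_windows(64, lengths=(16, 32))
print(len(w))  # 18 (13 windows of 16 frames, 5 of 32 frames)
```

Short windows catch brief actions and long windows catch extended ones, at the cost of many candidates to classify.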

  31. Classification & Detection: Proposals + C3D. (1) Binary classification: action or no action (Slidecast and Slides by Alberto Montes). Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code]

  32. Classification & Detection: Proposals + C3D. (2) One-vs-all action classification (Slidecast and Slides by Alberto Montes). Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code]

  33. Classification & Detection: Proposals + C3D. (3) Refinement with a temporal-aware loss function (Slidecast and Slides by Alberto Montes). Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code]

  34. Classification & Detection: Proposals + C3D: Post-processing (Slidecast and Slides by Alberto Montes). Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code]
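Post-processing of scored temporal segments typically includes non-maximum suppression (NMS), which keeps the highest-scoring segment and drops others that overlap it too much. The greedy variant below, its 0.5 IoU threshold and the function names are our assumptions, not necessarily what the paper uses; in practice it would run separately for each action class.

```python
def temporal_iou(a, b):
    """Intersection over union of two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def temporal_nms(segments, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept segments, highest score first."""
    order = sorted(range(len(segments)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(temporal_iou(segments[i], segments[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

segs = [(0.0, 10.0), (1.0, 11.0), (20.0, 30.0)]
scores = [0.9, 0.8, 0.7]
print(temporal_nms(segs, scores))  # [0, 2]: (1, 11) overlaps (0, 10) with IoU 9/11
```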

  35. Classification & Detection: Proposals + C3D (Slidecast and Slides by Alberto Montes). Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code]

  36. Classification & Detection: Image + RNN + REINFORCE. Yeung, Serena, Olga Russakovsky, Greg Mori, and Li Fei-Fei. "End-to-end Learning of Action Detection from Frame Glimpses in Videos." CVPR 2016

  37. Scene Classification & Detection: C3D + LSTM. Montes, A. “Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks”. BSc thesis submitted to ETSETB (2016) [code available in Keras]

  38. Outline: (1) Scene Classification; (2) Object Detection & Tracking

  39. Objects: ImageNet Video [ILSVRC 2015 Slides and videos]

  40. Objects: ImageNet Video [ILSVRC 2015 Slides and videos]

  41. Objects: ImageNet Video: T-CNN (Object Detection, Object Tracking). Slides by Andrea Ferri. Kai Kang, Hongsheng Li, Junjie Yan, Xingyu Zeng, Bin Yang, Tong Xiao, Cong Zhang, Zhe Wang, Ruohui Wang, Xiaogang Wang, and Wanli Ouyang, “Object Detection From Video Tubelets With Convolutional Neural Networks”, CVPR 2016 [code]
