

SLIDE 1

Analyzing and Predicting Human Activities in Video

Greg Mori

Professor, School of Computing Science, Simon Fraser University
Research Director, Borealis AI, Vancouver

SLIDE 2

What does activity recognition involve?

SLIDE 3

Detection: are there people?

SLIDE 4

Objects and scenes: where are they?

(Labels: chair, walker, floor; indoor scene; long-term care facility.)

SLIDE 5

Action recognition: what are they doing?

(Action labels: squat, fall, stand, run.)

SLIDE 6

Intention/social role: why are they doing this?

(Role labels: comfort, watch, get help.)

SLIDE 7

Group activity recognition: what is the overall situation?

help the fallen person

SLIDE 8

(Combined labels from the previous slides: group activity "help the fallen person"; scene: chair, walker, floor, indoor scene, long-term care facility; roles: comfort, watch, get help; actions: squat, fall, stand, run.)

These are inter-related problems: they call for model structures that connect them.

SLIDE 9

Desiderata for Activity Recognition Models

Label structure · Temporal structure · Group structure

(Figure: label structure illustrated by a labeled scene (chair, walker, floor, indoor scene, long-term care facility); temporal structure by a timeline; group structure by the group activity "help the fallen person".)

Hu et al., CVPR 16; Deng et al., CVPR 16; Nauata et al., CVPRW 17; Deng et al., CVPR 17; Yeung et al., CVPR 16; Yeung et al., IJCV 17; He et al., WACV 18; Chen et al., ICCVW 17; Ibrahim et al., CVPR 16; Mehrasa et al., SLOAN 18; Khodabandeh et al., arXiv 17; Lan et al., CVPR 12; Zhong et al., 2018

SLIDE 10

Task: action detection

(Figure: input video on a timeline from t = 0 to t = T; output temporal extents for the actions "Running" and "Talking".)

Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.

SLIDE 11

Dominant paradigm: Dense processing

Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.

(Figure: dense processing scores windows across the entire timeline from t = 0 to t = T.)

Sliding windows: standard in THUMOS challenge action detection entries (Oneata et al. 2014; Wang et al. 2014; Yuan et al. 2015).

Action proposals: Gkioxari and Malik 2015; Yu et al. 2015; Escorcia et al. 2016; Peng and Schmid 2016; He et al. 2018.
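For contrast with the glimpse model introduced next, here is a minimal sketch of what dense processing enumerates; the function and its parameters are illustrative, not taken from any particular THUMOS entry.

```python
# Dense sliding-window enumeration: every window at every scale is handed
# to an action classifier (illustrative parameters, not a specific system).
def sliding_windows(num_frames, scales=(16, 32, 64), stride=8):
    for size in scales:
        for start in range(0, num_frames - size + 1, stride):
            yield (start, start + size)  # candidate [start, end) window

# Even a 1000-frame video yields hundreds of candidate windows to score.
print(sum(1 for _ in sliding_windows(1000)))
```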

SLIDE 12

Efficiently detecting actions

Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.

SLIDE 13

Our model for efficient action detection

(Figure, built up over slides 13 through 28: a video on a timeline from t = 0 to t = T. A convolutional neural network extracts frame information at each glimpsed frame; a recurrent neural network carries time information across glimpses. At each step the model outputs a detection instance hypothesis [start, end], an emission indicator saying whether to emit that hypothesis as a detected action, and the next frame to glimpse; detected actions accumulate as the model glimpses its way through the video.)

Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
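A minimal sketch of this architecture, assuming a PyTorch implementation; the layer sizes, the use of precomputed frame features, and all names are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class GlimpseAgent(nn.Module):
    """One glimpse step: frame feature in; detection hypothesis,
    emission probability, and next glimpse location out."""
    def __init__(self, feat_dim=512, hidden_dim=256):
        super().__init__()
        # Stands in for the CNN's "frame information" (assumes frame
        # features are precomputed so the sketch stays self-contained).
        self.frame_encoder = nn.Linear(feat_dim, hidden_dim)
        self.rnn = nn.LSTMCell(hidden_dim, hidden_dim)  # time information
        self.detection_head = nn.Linear(hidden_dim, 2)  # [start, end]
        self.emission_head = nn.Linear(hidden_dim, 1)   # emit this detection?
        self.location_head = nn.Linear(hidden_dim, 1)   # next frame to glimpse

    def step(self, frame_feat, state=None):
        h, c = self.rnn(torch.relu(self.frame_encoder(frame_feat)), state)
        span = self.detection_head(h)                    # candidate [start, end]
        emit_prob = torch.sigmoid(self.emission_head(h))
        next_loc = torch.sigmoid(self.location_head(h))  # normalized position
        return span, emit_prob, next_loc, (h, c)

agent = GlimpseAgent()
span, emit, loc, state = agent.step(torch.randn(1, 512))
```

At inference the loop would glimpse the frame nearest `next_loc`, feed its feature back into `step`, and keep hypotheses whose emission probability is high.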

SLIDE 29

Training the detection instance output

(Figure: training data consists of a positive and a negative video on timelines from t = 0 to t = T, with ground-truth instances g1, g2 and model detections d1, d2, d3, d4.)

Each detection is matched to a ground-truth instance: here y1 = 1, y2 = 1, y3 = 2, and y4 = 0 (unmatched). Training combines an L2 distance localization loss, a cross-entropy classification loss, and a reward for detection.

Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
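A sketch of the two supervised loss terms on emitted detections, assuming each detection has already been matched to a ground-truth instance as on this slide (y_i > 0) or to background (y_i = 0); the function and its shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def detection_loss(pred_spans, pred_logits, gt_spans, match_ids):
    """pred_spans: (N, 2) predicted [start, end]; pred_logits: (N,);
    gt_spans: (M, 2); match_ids: (N,) with 0 meaning unmatched,
    otherwise a 1-based index into gt_spans (as in y1..y4 above)."""
    matched = match_ids > 0
    # L2 distance localization loss on matched detections only
    # (assumes at least one detection is matched).
    loc_loss = F.mse_loss(pred_spans[matched],
                          gt_spans[match_ids[matched] - 1])
    # Cross-entropy classification loss: is this a true detection?
    cls_loss = F.binary_cross_entropy_with_logits(pred_logits,
                                                  matched.float())
    return loc_loss + cls_loss
```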


SLIDE 31

Training the non-differentiable outputs

Two outputs are non-differentiable: (1) whether to predict a detection, and (2) where to look next.

(Figure: the model's action sequence a glimpses frame 1, then frame 8, then frame 6, then frame 15, emitting detections d1, d2, d3 along the way; ground-truth instances are shown on the training timeline from t = 0 to t = T.)

Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.

SLIDE 32

Training the non-differentiable outputs

Train a policy for actions (1) whether to predict a detection and (2) where to look next using REINFORCE [Williams 1992].

Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.

SLIDE 33

Training the non-differentiable outputs

Train a policy for actions (1) and (2) using REINFORCE [Williams 1992]. The reward for an action sequence a scores its outcomes: in the figure, detections d1 and d2 are bad, while d3 is good.

Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.

SLIDE 34

Training the non-differentiable outputs

Train a policy for actions (1) and (2) using REINFORCE [Williams 1992]. With reward R(a) for an action sequence a sampled from the policy p(a; θ), the standard REINFORCE quantities are:

Objective: $J(\theta) = \mathbb{E}_{p(a;\theta)}[R(a)]$

Gradient: $\nabla_\theta J(\theta) = \mathbb{E}_{p(a;\theta)}\left[R(a)\,\nabla_\theta \log p(a;\theta)\right]$

Monte-Carlo approximation: $\nabla_\theta J(\theta) \approx \frac{1}{N}\sum_{n=1}^{N} R(a^n)\,\nabla_\theta \log p(a^n;\theta)$

Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
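A minimal REINFORCE gradient estimate for these two outputs, following the Monte-Carlo approximation above; `log_probs` is assumed to collect log p(a_t; θ) for each sampled emission and glimpse-location action of one episode, and the baseline is an optional variance-reduction term not spelled out on the slide.

```python
import torch

def reinforce_loss(log_probs, reward, baseline=0.0):
    """Minimizing this loss ascends grad J = E[R(a) grad log p(a; theta)],
    estimated from a single sampled action sequence."""
    advantage = reward - baseline  # subtracting a baseline reduces variance
    return -torch.stack(log_probs).sum() * advantage
```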

SLIDE 35

Action detection results

Detection AP at IoU 0.5:

  Dataset              State-of-the-art   Our result
  THUMOS 2014                14.4             17.1
  ActivityNet sports         33.2             36.7
  ActivityNet work           31.1             39.9

Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.

While glimpsing only 2% of frames

SLIDE 36

Learned policies

Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.

SLIDE 37

Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.

Learned policies

SLIDE 38

Importance of prediction indicator output

Deciding when to output a prediction (learning to do non-maximum suppression) matters.

Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.

mAP (IoU = 0.5):

  Ours (full model)                                        17.1
  Ours w/o prediction indicator output (always predict)    12.4

SLIDE 39

Importance of location output

Deciding where to look next (the location output) has an even greater effect.

Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.

mAP (IoU = 0.5):

  Ours (full model)                                        17.1
  Ours w/o prediction indicator output (always predict)    12.4
  Ours w/o location output (uniform sampling)               9.3

SLIDE 40

Importance of location output

Uniform sampling does not always have sufficient temporal resolution where it’s needed.

Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.

(Figure: qualitative comparison of Ours vs. Ours w/o location output (uniform sampling).)

SLIDE 41

Removing both prediction indicator and location outputs

Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.

mAP (IoU = 0.5):

  Ours (full model)                                        17.1
  Ours w/o prediction indicator output (always predict)    12.4
  Ours w/o location output (uniform sampling)               9.3
  Ours w/o prediction indicator, w/o location output
    (always predict, with uniform sampling)                 8.6

SLIDE 42

Importance of location regression

mAP (IoU = 0.5):

  Ours (full model)                                                  17.1
  Ours w/o prediction indicator output (always predict)              12.4
  Ours w/o location output (uniform sampling)                         9.3
  Ours w/o prediction indicator, w/o location output
    (always predict, with uniform sampling)                           8.6
  Ours w/o location regression (always output mean action duration)   5.5

Simply outputting mean action duration gives significantly worse performance.

Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.

SLIDE 43

Desiderata for Activity Recognition Models

Label structure · Temporal structure · Group structure

(Figure, repeated from slide 9: label structure illustrated by a labeled scene (chair, walker, floor, indoor scene, long-term care facility); temporal structure by a timeline; group structure by the group activity "help the fallen person".)

Hu et al., CVPR 16; Deng et al., CVPR 16; Nauata et al., CVPRW 17; Deng et al., CVPR 17; Yeung et al., CVPR 16; Yeung et al., IJCV 17; He et al., WACV 18; Chen et al., ICCVW 17; Ibrahim et al., CVPR 16; Mehrasa et al., SLOAN 18; Khodabandeh et al., arXiv 17; Lan et al., CVPR 12; Zhong et al., 2018

SLIDE 44

Role of Context in Actions

Who has the puck?

SLIDE 45

SLIDE 46

Analyzing Human Trajectories to Recognize Actions

Which team is it? Who was player X? Will the shot be successful?

Mehrasa, Zhong, Tung, Bornn, Mori, Learning Person Trajectory Representations for Team Activity Analysis, SLOAN 2018

SLIDE 47

Motivation

Using trajectories of players on the rink:

  • Player 1 is passing the puck to player 5
  • Player 2 is trying to block player 1

Trajectory definition: a sequence of player movements across space over time.

(Rink diagram with players numbered 1 through 5.)

Mehrasa, Zhong, Tung, Bornn, Mori, Learning Person Trajectory Representations for Team Activity Analysis, SLOAN 2018

SLIDE 48

Motivation

locations matter!

Mehrasa, Zhong, Tung, Bornn, Mori, Learning Person Trajectory Representations for Team Activity Analysis, SLOAN 2018

SLIDE 49

Key Player Definition

Mehrasa, Zhong, Tung, Bornn, Mori, Learning Person Trajectory Representations for Team Activity Analysis, SLOAN 2018

SLIDE 50

Shared-Compare Trajectory Network

(Figure: player trajectories feed the Shared-Compare Trajectory Network, which classifies the event as pass, dump out, dump in, puck protection, carry, or shot.)

Mehrasa, Zhong, Tung, Bornn, Mori, Learning Person Trajectory Representations for Team Activity Analysis, SLOAN 2018


SLIDE 52

Shared Trajectory Network

  • Consists of 1D convolution and max-pooling layers
  • Learns a generic representation for each individual (see the sketch below)

(Figure: a trajectory's x and y coordinate sequences (x1 … x5, y1 … y5) pass through alternating 1D convolution and 1D max-pooling layers; pooling stride 2, kernel size C × K × M.)

Mehrasa, Zhong, Tung, Bornn, Mori, Learning Person Trajectory Representations for Team Activity Analysis, SLOAN 2018
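A minimal sketch of such a shared network, assuming a PyTorch implementation; the channel counts and kernel sizes are assumptions, while the input (an (x, y) trajectory of 16 frames, per the dataset slide later) and the conv/max-pool alternation follow the slide.

```python
import torch
import torch.nn as nn

# Shared trajectory network: the same weights are applied to every
# player's trajectory to produce a generic per-individual representation.
shared_traj_net = nn.Sequential(
    nn.Conv1d(in_channels=2, out_channels=32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool1d(kernel_size=2, stride=2),   # pooling stride = 2
    nn.Conv1d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool1d(kernel_size=2, stride=2),
)

traj = torch.randn(1, 2, 16)      # one player's (x, y) over 16 frames
feats = shared_traj_net(traj)     # (1, 64, 4) per-player representation
```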

SLIDE 53

Shared-Compare Trajectory Network

Mehrasa, Zhong, Tung, Bornn, Mori, Learning Person Trajectory Representations for Team Activity Analysis, SLOAN 2018

SLIDE 54

Shared Compare Network

Input:

  • Pairs of individual trajectory features provided by the Shared Trajectory Network
  • Pairs are formed relative to a "key player"

Learning:

  • The relative motion patterns of pairs
  • Interaction cues between players

Output: a relative-motion-pattern representation for each pair (see the sketch below).

Enforce an ordering among the players.

Mehrasa, Zhong, Tung, Bornn, Mori, Learning Person Trajectory Representations for Team Activity Analysis, SLOAN 2018
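A sketch of forming key-player-relative pairs, assuming per-player features from the shared network; concatenation as the pairing operation and all names are assumptions (the slide does not fully specify the compare subnetwork).

```python
import torch

def key_player_pairs(player_feats, key_idx):
    """player_feats: (P, D) features from the shared trajectory network.
    Returns (P-1, 2D): the key player's features stacked with each other
    player's. The slide's ordering ranks the others by proximity to the
    key player; for brevity this sketch keeps the original order."""
    key = player_feats[key_idx]
    others = torch.cat([player_feats[:key_idx], player_feats[key_idx + 1:]])
    return torch.stack([torch.cat([key, other]) for other in others])
```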

SLIDE 55

Experiments

  • Event Recognition on the Sportlogiq Dataset
  • Team Identification on the NBA Dataset

Mehrasa, Zhong, Tung, Bornn, Mori, Learning Person Trajectory Representations for Team Activity Analysis, SLOAN 2018

SLIDE 56

Event recognition using Sportlogiq dataset

Task Definition

  • Event classification
  • 6 event classes: pass, dump in, dump out, shot, carry, puck protection
  • Dataset: Sportlogiq hockey dataset

(Figure: the Shared-Compare Trajectory Network predicts the event label.)

Mehrasa, Zhong, Tung, Bornn, Mori, Learning Person Trajectory Representations for Team Activity Analysis, SLOAN 2018

SLIDE 57

Event recognition using Sportlogiq dataset

How the Sportlogiq dataset looks

Mehrasa, Zhong, Tung, Bornn, Mori, Learning Person Trajectory Representations for Team Activity Analysis, SLOAN 2018

SLIDE 58

Event recognition using Sportlogiq dataset

  • Sportlogiq Dataset Information
    ○ State-of-the-art algorithms are used to automatically detect and track players in raw broadcast video
    ○ Trajectory data are estimated using homography
    ○ Trajectory length: 16 frames
    ○ Number of players used is fixed: 5
    ○ Number of samples per event class (shown in the original chart)
    ○ 4 games for training, 2 games for validation, and 2 games for testing

Mehrasa, Zhong, Tung, Bornn, Mori, Learning Person Trajectory Representations for Team Activity Analysis, SLOAN 2018

SLIDE 59

Event recognition using Sportlogiq dataset

  • Training phase:
    ○ Key player is provided
    ○ Remaining players are ranked by proximity to the key player
  • Test phase:
    ○ Both cases of known and unknown key player
    ○ Average-pooling strategy for the case of an unknown key player (see the sketch below)
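A sketch of that average-pooling strategy, assuming some scoring function over player features with a hypothesized key player; the names are illustrative.

```python
import torch

def average_pooled_scores(score_fn, player_feats):
    """Unknown key player: score the event once per key-player hypothesis
    (score_fn maps (P, D) features plus a key index to class scores),
    then average over the hypotheses."""
    scores = [score_fn(player_feats, k) for k in range(len(player_feats))]
    return torch.stack(scores).mean(dim=0)
```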

SLIDE 60

Event recognition on Sportlogiq dataset

(Results table: unknown key player vs. known key player.)

  • In comparison to IDT: 13.2 higher mAP
  • In comparison to C3D trained from scratch: 8.7 higher mAP
  • In comparison to fine-tuned C3D: 1.7 higher mAP

SLIDE 61

Event recognition on Sportlogiq dataset

Precision-recall curve

(Precision-recall curves for the unknown and known key player settings.)

SLIDE 62

Experiments

  • Event Recognition on the Sportlogiq Dataset
  • Team Identification on the NBA Dataset

SLIDE 63

Team Identification on the NBA Dataset

Task Definition

  • Team identification
  • Stacked Trajectory Network
  • 30 NBA teams
  • Dataset: NBA basketball dataset

(Figure: the Stacked Trajectory Network performs team identification.)

Mehrasa, Zhong, Tung, Bornn, Mori, Learning Person Trajectory Representations for Team Activity Analysis, SLOAN 2018

SLIDE 64

Team Identification on the NBA Dataset

How the NBA dataset looks

Mehrasa, Zhong, Tung, Bornn, Mori, Learning Person Trajectory Representations for Team Activity Analysis, SLOAN 2018

SLIDE 65

Team Identification using NBA dataset

  • Dataset Information
    ○ Trajectory data are acquired by a multi-camera system
    ○ Sampling rate: 25 Hz
    ○ 137,176 possessions extracted from 1,076 games
    ○ 200 frames per possession
    ○ 82,375 possessions for training, 27,437 for testing, and 27,437 for validation
    ○ Number of possessions per team (shown in the original chart)

Mehrasa, Zhong, Tung, Bornn, Mori, Learning Person Trajectory Representations for Team Activity Analysis, SLOAN 2018

SLIDE 66

Team Identification on the NBA Dataset

Results


Mehrasa, Zhong, Tung, Bornn, Mori, Learning Person Trajectory Representations for Team Activity Analysis, SLOAN 2018

SLIDE 67

Shot Location Prediction

  • Task: Predict where the next shot will take place
  • Input: A sequence of 2D positions of 10 players and the ball in court coordinates
  • Output: A distribution over shooting zones; the cell where the next shot will most likely take place
  • This discretization is commonly used for analyzing hot shooting zones

Reference: http://www.nba.com/bucks/features/boeder-130923

(Court diagram: shooting zones numbered 1 through 13.)
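A sketch of a shot-zone classifier over these inputs; the slides do not specify the architecture, so the layers below are assumptions, while the input shape (10 players plus the ball, (x, y) each, over the frames of a possession) and the 13-zone output follow the slide.

```python
import torch
import torch.nn as nn

class ShotZonePredictor(nn.Module):
    def __init__(self, n_zones=13):
        super().__init__()
        # Per frame: (10 players + ball) * (x, y) = 22 input channels.
        self.encoder = nn.Sequential(
            nn.Conv1d(22, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),    # pool over time
        )
        self.classifier = nn.Linear(64, n_zones)

    def forward(self, positions):                      # (B, 22, T)
        feats = self.encoder(positions).squeeze(-1)    # (B, 64)
        return self.classifier(feats).softmax(dim=-1)  # zone distribution

probs = ShotZonePredictor()(torch.randn(4, 22, 200))   # (4, 13) zone probs
```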

SLIDE 68

Result

  • Baseline 1: Use the most frequent cell as output
  • Baseline 2: Use the ball position as output

(Plot: accuracy vs. distance from the current frame to the last frame.)

SLIDE 69

Show video


SLIDE 70

  • Predict the next activity:
    • When
    • Where
    • What
SLIDE 71


SLIDE 72

Conclusion

Methods for handling structures in deep networks:

  • Label structure: message passing algorithms for multi-level image/video labeling, purely from image data or with partial labels
  • Temporal structure: action detection in time; efficient glimpsing of video frames
  • Group structure: network structures to connect related people; gating functions or modules for reasoning about relations

SLIDE 73