Learning to Detect Activity in Untrimmed Video
Prof. Bernard Ghanem
Bernard Ghanem
An image is worth a thousand words
A video is worth a million words
Source: YouTube
Image: “a tiger attacking a person on a grass field”
Video: “the tiger is being playful”
Fun facts about video
- By 2017, online video will account for 74% of all online traffic [3]
- 45% of people watch more than an hour of Facebook or YouTube videos a week [2]
- Almost 50% of internet users look for videos related to a product or service before visiting a store [4]
- 85% of Facebook video is watched without sound [5]
- 55% of people watch videos online every day [1]
Sources: 1) MWP Statistics, 2015; 2) HubSpot, 2016; 3) KPCB, 2016; 4) Google, 2016; 5) DIGIDAY, 2016
Problem: Detecting Human Activities in Video
Input: untrimmed video (frames omitted)
Output: Class: Pole Vault; Bounds: (23.1s, 25.2s)
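The output on this slide (a class label plus temporal bounds) can be pictured as a small record. The following is a minimal sketch, not the speaker's code: the `Detection` type and the `temporal_iou` helper are illustrative names that do not appear in the slides, the ground-truth segment is invented, and temporal intersection-over-union is shown only as one common way to compare a predicted segment with an annotated one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected activity: a class label plus temporal bounds in seconds."""
    label: str
    start: float
    end: float


def temporal_iou(a: Detection, b: Detection) -> float:
    """Intersection-over-union of two temporal segments (0.0 if disjoint)."""
    intersection = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - intersection
    return intersection / union if union > 0 else 0.0


# The example from the slide: a pole vault between 23.1 s and 25.2 s.
prediction = Detection("Pole Vault", 23.1, 25.2)
# Hypothetical ground-truth annotation, invented for illustration.
ground_truth = Detection("Pole Vault", 23.0, 25.0)
print(round(temporal_iou(prediction, ground_truth), 3))
```

A detection would typically be counted as correct when its label matches and the overlap exceeds a chosen threshold; the threshold itself is a benchmark-specific choice not stated here.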
Why Activity Detection?
Challenges of Detecting Human Activities
1. Not enough large-scale training data
2. Large number of activities
3. Real-time processing is not enough
1. Not enough large-scale training data
1st Version (R1.1):
- ~200 classes
- ~850 hours
- class hierarchy
ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding [CVPR 2015]
ActivityNet Challenge at CVPR 2017 (July 26, afternoon): http://activity-net.org/challenges/2017
Classical Activity Detection Pipeline
[Figure: video frames fed to per-class classifiers: Basketball Dunk Classifier, Volleyball Spiking Classifier, ...]
Using proposals is important
[Figure: an Action Proposal stage, followed by per-class classifiers: Basketball Dunk Classifier, Volleyball Spiking Classifier]
What have we done?
- Fast Temporal Activity Proposals for Efficient Detection of Human Actions in Untrimmed Videos [CVPR 2016]: proposals are represented as sparse combinations of STIPs (10 FPS on a single CPU core)
- DAPs: Deep Action Proposals for Action Understanding [ECCV 2016]: multi-scale (sparse) proposals are output by an LSTM in one pass (130 FPS on a single GPU)
- SST: Single-Stream Temporal Action Proposals [CVPR 2017]: multi-scale (dense) proposals are scored by a GRU in one pass + streaming (300 FPS on a single GPU)
SST: Single-Stream Temporal Action Proposals
[Figure: SST overview. Untrimmed input video → SST → temporal action proposals → classifier → localized action detections. Inside SST: input video → visual encoder (ϕ) → sequence encoder → output proposals, along the time axis. At time step t the output c_t scores k proposals; the maximum proposal size per output is k · δ.]
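The labels in this figure (k proposals per output, maximum proposal size k · δ, output c_t at time step t) suggest how one output step could map to candidate segments. The following is a minimal sketch under a stated assumption: score j at step t is read as belonging to a segment that ends at t and spans j · δ steps, for j = 1..k. That reading, the function name `proposals_at_step`, the threshold, and the sample scores are illustrative assumptions, not details confirmed by the slides.

```python
from typing import List, Sequence, Tuple

Proposal = Tuple[int, int, float]  # (start_step, end_step, confidence)


def proposals_at_step(t: int, scores: Sequence[float], delta: int,
                      threshold: float = 0.5) -> List[Proposal]:
    """Turn the k confidence scores emitted at time step t into proposals.

    Assumption: score j (1-based) belongs to a segment that ends at t and
    has length j * delta, so the longest candidate has length k * delta.
    """
    proposals = []
    for j, score in enumerate(scores, start=1):
        start = t - j * delta
        if start < 0:  # segment would begin before the video starts
            break
        if score >= threshold:
            proposals.append((start, t, score))
    return proposals


# Hypothetical scores for k = 4 scales at step t = 12 with delta = 4.
print(proposals_at_step(12, [0.9, 0.2, 0.7, 0.8], delta=4))
```

Because every step emits all k scales at once, a single forward pass over the video would cover every candidate length up to k · δ, which fits the "one pass + streaming" description.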
SS-TAD: Single-Stream Temporal Action Detection
End-to-End, Single-Stream Temporal Action Detection in Untrimmed Videos [BMVC 2017]: multi-scale (dense) detections are scored in one pass + streaming (700 FPS on a TitanX GPU)
[Figure: panels (a), (b), (c) compare routes from untrimmed video input to action detections: proposals followed by classifiers, frame-level classifiers followed by merging/smoothing, and SS-TAD.]
[Video demo. Key: detection vs. ground truth over time. Actions are played at 1x speed; background video is sped up.]
2. Large number of activities
- Applying activity detectors to a large number of activity classes is expensive.
- Can we do better than linear computational growth in the number of activity classes?
Activity-Object and Activity-Scene Relations
SCC: Semantic Context Cascade for Efficient Action Detection [CVPR 2017]
DAPs: Deep Action Proposals for Action Understanding [ECCV 2016]
Typical Activity Detection Pipeline
Video Sequence
Stage 1: Action Proposals
Stage 2: Action Classifiers (with a Reject output)
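The two stages above, together with the question about linear growth in the number of activity classes, can be illustrated by counting classifier evaluations. The following is a minimal sketch with stand-in components: `detect`, `score_proposal`, the toy classifiers, and the rejection threshold are hypothetical placeholders, not the SCC or DAPs models.

```python
from typing import Callable, Dict, List, Tuple

Segment = Tuple[float, float]  # (start_s, end_s)


def detect(segments: List[Segment],
           score_proposal: Callable[[Segment], float],
           classifiers: Dict[str, Callable[[Segment], float]],
           reject_below: float = 0.5):
    """Stage 1 rejects unlikely segments; stage 2 classifies the survivors.

    Returns the detections and the number of classifier evaluations, which
    is (surviving segments) x (number of classes).
    """
    detections, evaluations = [], 0
    for segment in segments:
        if score_proposal(segment) < reject_below:
            continue  # rejected in stage 1: no classifier is run
        scores = {}
        for name, classify in classifiers.items():
            scores[name] = classify(segment)
            evaluations += 1
        best = max(scores, key=scores.get)
        detections.append((best, segment, scores[best]))
    return detections, evaluations


# Toy stand-ins: only segments longer than 2 s count as likely actions.
segments = [(0.0, 1.0), (3.0, 6.0), (10.0, 11.5), (20.0, 25.0)]
classifiers = {
    "Basketball Dunk": lambda s: 0.8 if s[0] < 10 else 0.1,
    "Volleyball Spiking": lambda s: 0.3 if s[0] < 10 else 0.9,
}
detections, evaluations = detect(
    segments, lambda s: 1.0 if s[1] - s[0] > 2 else 0.0, classifiers)
print(detections, evaluations)
```

In this toy run, stage 1 halves the work (4 evaluations instead of 8), but each surviving segment still costs one evaluation per class, so the cost remains linear in the number of classes unless classes are pruned as well.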
SCC: Semantic Context Cascade
SCC: Semantic Context Cascade for Efficient Action Detection [CVPR 2017]
3. Real-time processing is not enough
- In the past, real-time processing was a “good-to-have”, i.e. 1 min of video → 1 min of processing.
- But not anymore!
- We need to stay ahead of the increasing video upload rate. How?
  - hardware acceleration (GPUs)
  - more efficient implementations
  - do we need to visit every frame?
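The "1 min of video → 1 min of processing" baseline can be turned into simple arithmetic. The following is a minimal sketch, assuming a 30 fps source video and assuming the throughput figures quoted earlier in the talk (10, 130, 300, and 700 FPS) are counted in video frames; both assumptions are for illustration only, and the helper names are not from the slides.

```python
def realtime_factor(processing_fps: float, video_fps: float = 30.0) -> float:
    """Seconds of video processed per second of compute (1.0 = real time)."""
    return processing_fps / video_fps


def processing_minutes(video_minutes: float, processing_fps: float,
                       video_fps: float = 30.0) -> float:
    """Compute time in minutes needed for a video of the given length."""
    return video_minutes / realtime_factor(processing_fps, video_fps)


# Throughput numbers quoted on the earlier slides; 30 fps video is assumed.
for name, fps in [("STIP-based proposals", 10), ("DAPs", 130),
                  ("SST", 300), ("SS-TAD", 700)]:
    print(f"{name}: {realtime_factor(fps):.1f}x real time, "
          f"{processing_minutes(60, fps):.1f} min per hour of video")
```

A factor above 1.0 means a single worker keeps up with one incoming stream; keeping up with an upload rate of many streams would require the factor, multiplied by the number of workers, to exceed the number of video-minutes uploaded per minute.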
Do we have to visit every frame?
[Figure: search history over time t]
- Log how the human annotator moves the time slider instead of throwing it away
- Can we learn from how humans move the slider to localize activities?
Action Search: Learning to Search for Human Activities in Untrimmed Videos [arXiv 2017] [To be submitted to CVPR 2018]
[Figure: Action Search model, unrolled over search steps ..., i-3, i-2, i-1, i, i+1, ... along time t. Given a target activity, a 3D ConvNet encodes each visual observation into a feature vector, and an LSTM outputs a temporal location.
Legend: X: visual observation; v: feature vector; h: LSTM state; f(X): temporal location.]
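The idea of searching for an activity rather than visiting every frame can be sketched as a loop that repeatedly asks a model where to look next. The following is a minimal sketch, not the Action Search implementation: `action_search`, the `next_location` callback (a stand-in for the 3D ConvNet and LSTM), the stopping rule, and the toy policy are all assumptions made for illustration.

```python
from typing import Callable, List


def action_search(video_length: float,
                  start: float,
                  next_location: Callable[[float, List[float]], float],
                  max_steps: int = 10,
                  tolerance: float = 0.5) -> List[float]:
    """Visit a few temporal locations instead of every frame.

    `next_location(current, history)` stands in for the learned model: given
    the current location and the search history, it returns where to look
    next. The search stops when the step becomes smaller than `tolerance`.
    """
    history = [start]
    for _ in range(max_steps):
        proposed = next_location(history[-1], history)
        proposed = min(max(proposed, 0.0), video_length)  # stay inside the video
        if abs(proposed - history[-1]) < tolerance:
            break
        history.append(proposed)
    return history


# Toy policy (an assumption, not the learned model): move halfway towards a
# hidden target at 42 s, mimicking an annotator homing in with the slider.
def toy_policy(current: float, history: List[float]) -> float:
    return current + 0.5 * (42.0 - current)


print(action_search(video_length=120.0, start=100.0, next_location=toy_policy))
```

In this toy run the search inspects seven locations in a 120-second video, which is the kind of saving the "do we have to visit every frame?" question points at.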
Action Search or Action Spotting
[Figure: example searches. Activities shown: “shot put”, “shot put”, “basketball dunk”]
Prof. Bernard Ghanem
bernard.ghanem@kaust.edu.sa
ivul.kaust.edu.sa
[Images: baseball throw, dunk, shoveling, washing dishes, pole vault, dancing]