

slide-1
SLIDE 1

MIRIS: Fast Object Track Queries in Video

Favyen Bastani, Songtao He, Arjun Balasingam, Karthik Gopalakrishnan, Mohammad Alizadeh, Hari Balakrishnan, Michael Cafarella, Tim Kraska, Sam Madden MIT CSAIL

slide-2
SLIDE 2

Traffic Cameras, Miscellaneous, Dashcams

slide-3
SLIDE 3

Video Analytics

  • Debugging Autonomous Vehicle Software
  • Traffic Planning
  • Finding Interesting Events
  • Real-Time Mapping

slide-4
SLIDE 4

Prior Work [1, 2, 3]

[1] NoScope: Optimizing Neural Network Queries over Video at Scale. Daniel Kang et al. VLDB 2017.
[2] Accelerating Machine Learning Inference with Probabilistic Predicates. Yao Lu et al. SIGMOD 2018.
[3] BlazeIt: Optimizing Declarative Aggregation and Limit Queries for Neural Network-Based Video Analytics. Daniel Kang et al. VLDB 2020.

Select video frames with three buses

slide-5
SLIDE 5

Prior Work [1, 2, 3]


[Figure: an object detector is applied to each frame]

slide-6
SLIDE 6

Prior Work [1, 2, 3]


[Figure: an object detector is applied to each frame]

slide-7
SLIDE 7

Prior Work [1, 2, 3]


[Figure: an object detector is applied to each frame; frames are labeled with the counts 3 and 1]

slide-8
SLIDE 8

Prior Work [1, 2, 3]


[Figure: an approximate classifier scores each frame (0.03 ❌, 0.96, 0.23). Fast, inaccurate.]

slide-9
SLIDE 9

Prior Work [1, 2, 3]


[Figure: an approximate classifier scores each frame (0.03, 0.96, 0.23); one frame is dropped (X) and the object detector runs on the other two, finding 3 buses ✅]

slide-10
SLIDE 10

Prior Work [1, 2, 3]


[Figure: an approximate classifier scores each frame (0.03, 0.96, 0.23); one frame is dropped (X) and the object detector runs on the other two, finding 3 buses ✅ in one and only 1 bus ❌ in the other]

Query: Select video frames with three buses

slide-11
SLIDE 11

Object Track Queries

<(t1, x1, y1, w1, h1), … , (tn, xn, yn, wn, hn)>
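As a rough illustration of the track notation above, a track can be held as a time-ordered list of detections. This is only a sketch: the names `Detection`, `Track`, `center`, and `duration` are hypothetical, and the slide does not say whether (x, y) is a box corner or its center, so the top-left convention here is an assumption.

```python
from typing import List, NamedTuple, Tuple


class Detection(NamedTuple):
    """One element (t, x, y, w, h) of a track."""
    t: float  # timestamp in seconds (assumed unit)
    x: float  # left edge of the box in pixels (assumed convention)
    y: float  # top edge of the box in pixels (assumed convention)
    w: float  # box width
    h: float  # box height


# A track is the time-ordered sequence of detections of one object.
Track = List[Detection]


def center(d: Detection) -> Tuple[float, float]:
    """Center point of a detection's bounding box."""
    return (d.x + d.w / 2.0, d.y + d.h / 2.0)


def duration(track: Track) -> float:
    """Seconds between the first and last detection of a track."""
    return track[-1].t - track[0].t if track else 0.0


example_track: Track = [
    Detection(10.0, 100.0, 200.0, 40.0, 20.0),
    Detection(10.5, 110.0, 200.0, 40.0, 20.0),
    Detection(11.0, 120.0, 200.0, 40.0, 20.0),
]
```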

slide-12
SLIDE 12

Object Track Queries

slide-13
SLIDE 13

Object Track Queries

slide-14
SLIDE 14

Object Track Queries

slide-15
SLIDE 15

Object Track Queries

slide-16
SLIDE 16

Find cars that rapidly decelerate

slide-17
SLIDE 17

Find cars that rapidly decelerate

Given track A: select A if there is a 1 sec interval I such that, if v1 is A’s velocity in the first half of I and v2 is its velocity in the second half, then v1 - v2 exceeds a threshold.
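The deceleration predicate can be sketched in a few lines. This is an illustrative reading of the slide, not the MIRIS implementation: it assumes a track is a list of `(t, cx, cy)` box centers with strictly increasing timestamps, interpolates positions linearly, and only tries intervals that start at a sample time. The function names and the threshold units (pixels per second) are hypothetical.

```python
import math
from bisect import bisect_right
from typing import List, Tuple

Point = Tuple[float, float, float]  # (t, cx, cy): time in seconds, box center in pixels


def position_at(track: List[Point], t: float) -> Tuple[float, float]:
    """Linearly interpolate the object's center at time t (clamped to the track ends)."""
    times = [p[0] for p in track]
    i = bisect_right(times, t)
    if i == 0:
        return track[0][1:]
    if i == len(track):
        return track[-1][1:]
    (t0, x0, y0), (t1, x1, y1) = track[i - 1], track[i]
    a = (t - t0) / (t1 - t0)
    return (x0 + a * (x1 - x0), y0 + a * (y1 - y0))


def speed(track: List[Point], t_start: float, t_end: float) -> float:
    """Average speed (pixels/sec) between two times."""
    x0, y0 = position_at(track, t_start)
    x1, y1 = position_at(track, t_end)
    return math.hypot(x1 - x0, y1 - y0) / (t_end - t_start)


def rapidly_decelerates(track: List[Point], threshold: float, interval: float = 1.0) -> bool:
    """True if some interval has first-half speed minus second-half speed above threshold."""
    half = interval / 2.0
    for start, _, _ in track:
        if start + interval > track[-1][0]:
            break  # the interval would run past the end of the track
        v1 = speed(track, start, start + half)
        v2 = speed(track, start + half, start + interval)
        if v1 - v2 > threshold:
            return True
    return False


braking = [(0.0, 0.0, 0.0), (0.5, 10.0, 0.0), (1.0, 12.0, 0.0)]
cruising = [(0.0, 0.0, 0.0), (0.5, 10.0, 0.0), (1.0, 20.0, 0.0)]
```

With a threshold of 10 pixels/sec, `rapidly_decelerates(braking, 10.0)` selects the braking track while the cruising track is rejected.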

slide-18
SLIDE 18

Find bears catching salmon

slide-19
SLIDE 19

Find bears catching salmon

Given bear A and salmon B: select (A, B) if the bounding boxes of A and B intersect for at least two seconds.
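A minimal sketch of this two-track predicate follows. It assumes both tracks are lists of `(t, x, y, w, h)` boxes sampled at the same timestamps, with (x, y) as the top-left corner, and it requires the overlap to be continuous; none of these details are stated on the slide, and the function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float, float]  # (t, x, y, w, h), top-left corner assumed


def boxes_intersect(a: Box, b: Box) -> bool:
    """Axis-aligned rectangle overlap test (the timestamps are ignored)."""
    _, ax, ay, aw, ah = a
    _, bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def intersect_for(track_a: List[Box], track_b: List[Box], min_seconds: float = 2.0) -> bool:
    """True if the two tracks overlap continuously for at least min_seconds."""
    b_by_time = {box[0]: box for box in track_b}
    run_start = None  # time at which the current run of overlapping frames began
    for box in track_a:
        other = b_by_time.get(box[0])
        if other is not None and boxes_intersect(box, other):
            if run_start is None:
                run_start = box[0]
            if box[0] - run_start >= min_seconds:
                return True
        else:
            run_start = None
    return False


bear = [(float(t), 0.0, 0.0, 10.0, 10.0) for t in range(5)]
salmon_caught = [(float(t), x, 5.0, 4.0, 2.0) for t, x in enumerate([20.0, 5.0, 5.0, 5.0, 20.0])]
salmon_escaped = [(float(t), x, 5.0, 4.0, 2.0) for t, x in enumerate([20.0, 5.0, 5.0, 20.0, 20.0])]
```

In the toy data, the first salmon overlaps the bear from t = 1 to t = 3 and is selected; the second overlaps for only one second and is not.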

slide-20
SLIDE 20

Find cars that run a red light

slide-21
SLIDE 21

Find cars that run a red light

Given car A and red light B: select (A, B) if A starts in the bottom-right and ends in the top-left, and the time interval of A is contained in the time interval of B.
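The red-light predicate combines a spatial test on the car's endpoints with a temporal containment test. The sketch below is one possible reading: it assumes image coordinates with the origin at the top-left (y grows downward), treats "bottom-right" and "top-left" as frame quadrants, and represents the red light only by its `(start, end)` time interval. All names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (t, cx, cy): time in seconds, box center in pixels


def runs_red_light(
    car: List[Point],
    red_interval: Tuple[float, float],
    frame_w: float,
    frame_h: float,
) -> bool:
    """True if the car goes from the bottom-right to the top-left quadrant while the light is red."""
    t_first, x_first, y_first = car[0]
    t_last, x_last, y_last = car[-1]
    # Quadrant tests; y grows downward in image coordinates (assumption).
    starts_bottom_right = x_first > frame_w / 2 and y_first > frame_h / 2
    ends_top_left = x_last < frame_w / 2 and y_last < frame_h / 2
    # The car's time interval must lie inside the red light's interval.
    red_start, red_end = red_interval
    contained = red_start <= t_first and t_last <= red_end
    return starts_bottom_right and ends_top_left and contained


car = [(12.0, 900.0, 650.0), (13.0, 600.0, 400.0), (14.0, 200.0, 100.0)]
```

For a 1280x720 frame, `runs_red_light(car, (10.0, 20.0), 1280, 720)` holds, while a red interval of `(13.0, 20.0)` does not contain the car's interval and the pair is rejected.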

slide-22
SLIDE 22
slide-23
SLIDE 23

[Figure: an object detector is applied to each of six frames]

slide-24
SLIDE 24

[Figure: an object detector is applied to each of six frames]

slide-25
SLIDE 25

[Figure: an object detector is applied to each of six frames]

slide-26
SLIDE 26

[Figure: an object detector is applied to each of six frames]

  • Costly!
  • On a $10,000 GPU, object detection runs at ~30 fps
  • On AWS, $1 per video hour
  • => $72K to execute a query over one month of video captured from 100 cameras
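The $72K figure follows directly from the per-hour price. The short calculation below reproduces it, taking "one month" as 30 days (an assumption; the slide does not give the exact length).

```python
cameras = 100
days = 30                   # "one month" taken as 30 days (assumption)
hours_per_day = 24
cost_per_video_hour = 1.0   # dollars, the AWS figure quoted on the slide

video_hours = cameras * days * hours_per_day      # 100 * 30 * 24 = 72,000 hours
total_cost = video_hours * cost_per_video_hour    # $72,000

print(f"{video_hours} video hours -> ${total_cost:,.0f}")
```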

slide-27
SLIDE 27

[Figure: an object detector is applied to two frames]

slide-28
SLIDE 28

Low-Framerate Tracking: Matching Errors

slide-29
SLIDE 29

Low-Framerate Tracking: Matching Errors

[Figure: candidate matches with scores 0.38, 0.42, 0.02]

slide-30
SLIDE 30

Low-Framerate Tracking: Predicate Errors

slide-31
SLIDE 31

Predicate Errors

slide-32
SLIDE 32

Predicate Errors

slide-33
SLIDE 33

Predicate Errors

slide-34
SLIDE 34

Predicate Errors

slide-35
SLIDE 35

MIRIS: Fast Object Track Queries over Video

Key ideas:

  • Track at low framerate; but may need to re-visit some intermediate frames
  • Query Planning + Object Tracking

    ○ Parameterizable query-driven object tracking method
    ○ Query planner to select the parameters using AQP techniques

slide-36
SLIDE 36
slide-37
SLIDE 37
slide-38
SLIDE 38
slide-39
SLIDE 39

[Figure: timeline at 10, 12, 14, 16, and 18 sec showing object detections]

slide-40
SLIDE 40

[Figure: timeline at 10, 12, 14, 16, and 18 sec showing object detections]

slide-41
SLIDE 41

[Figure: timeline at 10, 12, 14, 16, and 18 sec]

slide-42
SLIDE 42

[Figure: timeline at 10, 12, 14, 16, and 18 sec showing an object track]

slide-43
SLIDE 43

[Figure: timeline at 10, 12, 14, 16, and 18 sec]

slide-44
SLIDE 44

Low-Framerate Tracking: Matching Errors

[Figure: candidate matches with scores 0.38, 0.42, 0.02]

slide-45
SLIDE 45

Low-Framerate Tracking: Matching Errors

[Figure: candidate matches with scores 0.38, 0.42, 0.02]

Close: keep both
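One way to read "Close: keep both" is as a threshold on how far a candidate's matching score may fall below the best score. The sketch below assumes an absolute score gap; the slides do not define "close" precisely, so the rule, the `closeness` parameter, and the function name are all illustrative assumptions.

```python
from typing import Dict, List


def ambiguous_matches(scores: Dict[str, float], closeness: float) -> List[str]:
    """Return candidates whose score is within `closeness` of the best score, best first."""
    best = max(scores.values())
    kept = [name for name, score in scores.items() if best - score <= closeness]
    return sorted(kept, key=lambda name: scores[name], reverse=True)


# Scores taken from the slide; the candidate names are made up for the example.
scores = {"candidate_a": 0.38, "candidate_b": 0.42, "candidate_c": 0.02}
kept_loose = ambiguous_matches(scores, closeness=0.1)
kept_strict = ambiguous_matches(scores, closeness=0.01)
```

With `closeness=0.1` both the 0.42 and 0.38 candidates are kept, so the match stays uncertain; with `closeness=0.01` only the 0.42 candidate survives.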

slide-46
SLIDE 46

10 sec 12 sec 14 sec 16 sec 18 sec

slide-47
SLIDE 47

Filtering

  • Remove groups of paths that we are sure do not satisfy the predicate
  • Several filtering methods for planner to choose from: nearest-neighbor, RNN
slide-48
SLIDE 48
slide-49
SLIDE 49
slide-50
SLIDE 50
slide-51
SLIDE 51

Refinement: Address Predicate Errors

slide-52
SLIDE 52

MIRIS: Fast Object Track Queries over Video

Key ideas:

  • Track at low framerate; but may need to re-visit some intermediate frames
  • Query Planning + Object Tracking

    ○ Parameterizable query-driven object tracking method
    ○ Query planner to select the parameters using AQP techniques

slide-53
SLIDE 53

Query Planning

Video Dataset

Select tracks satisfying P, with 99% accuracy.

slide-54
SLIDE 54

Query Planning

Video Dataset

Select tracks satisfying P, with 99% accuracy.

slide-55
SLIDE 55

Query Planning

Video Dataset

Sampled Video Segments

Select tracks satisfying P, with 99% accuracy.

slide-56
SLIDE 56

Query Planning

Video Dataset

Sampled Video Segments

Select tracks satisfying P, with 99% accuracy.

slide-57
SLIDE 57

Query Planning

Video Dataset

Sampled Video Segments

Select tracks satisfying P, with 99% accuracy.

slide-58
SLIDE 58

Query Planning

Video Dataset

Sampled Video Segments

Select tracks satisfying P, with 99% accuracy.

Initial Tracking → Filtering → Uncertainty Resolution → Refinement

slide-59
SLIDE 59

Query Planning

Video Dataset

Sampled Video Segments

Select tracks satisfying P, with 99% accuracy.

Initial Tracking → Filtering → Uncertainty Resolution → Refinement

Parameters:

  • Initial Tracking: sampling framerate
  • Uncertainty Resolution: “closeness” threshold

slide-60
SLIDE 60

Query Planning

Video Dataset

Sampled Video Segments

Select tracks satisfying P, with 99% accuracy.

Initial Tracking → Filtering → Uncertainty Resolution → Refinement

Parameters:

  • Initial Tracking: sampling framerate
  • Uncertainty Resolution: “closeness” threshold

Methods:

  • Filtering: NND, RNN
  • Refinement: Prefix-Suffix, Accel, RNN

Per-method threshold parameters (T)
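The planning idea on these slides (try parameter settings on sampled video segments, then pick one that meets the accuracy target) can be illustrated with a brute-force search. This is a sketch under strong assumptions and is not the MIRIS planner: the plan encoding `(framerate, closeness)`, the `evaluate` callback, and the exhaustive search are all hypothetical stand-ins.

```python
import math
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical plan encoding: (sampling framerate in fps, closeness threshold).
Plan = Tuple[int, float]


def choose_plan(
    candidates: Iterable[Plan],
    evaluate: Callable[[Plan], Tuple[float, float]],
    target_accuracy: float = 0.99,
) -> Optional[Plan]:
    """Pick the cheapest plan whose accuracy on the sampled segments meets the target.

    `evaluate(plan)` is assumed to return (accuracy, cost) measured on sampled segments.
    Returns None if no candidate reaches the target.
    """
    best_plan, best_cost = None, math.inf
    for plan in candidates:
        accuracy, cost = evaluate(plan)
        if accuracy >= target_accuracy and cost < best_cost:
            best_plan, best_cost = plan, cost
    return best_plan


# Toy measurements (made-up numbers, for illustration only).
measured = {
    (1, 0.1): (0.950, 10.0),
    (2, 0.1): (0.992, 18.0),
    (4, 0.1): (0.999, 35.0),
}
chosen = choose_plan(measured.keys(), measured.__getitem__)
```

In the toy table the 1 fps plan is cheapest but misses the 99% target, so the 2 fps plan is chosen.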

slide-61
SLIDE 61

Evaluation: 9 Queries over 5 Video Sources

Diverse range of video sources:

  • UAV: video captured by UAV over traffic junction
  • Tokyo, Warsaw: video captured by fixed traffic camera
  • Resort: video of a pedestrian walkway
  • BDD: dashcam video
slide-62
SLIDE 62

[Plot axes: Higher Speed, Higher Accuracy]

[1] Simple Online and Realtime Tracking. Alex Bewley et al. ICIP 2016.
[2] High-Speed Tracking with Kernelized Correlation Filters. Joao Henriques et al. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014.
[3] FlowNet: Learning Optical Flow with Convolutional Networks. Alexey Dosovitskiy et al. ICCV 2015.
[4] NoScope: Optimizing Neural Network Queries over Video at Scale. Daniel Kang et al. VLDB 2017.
[5] Accelerating Machine Learning Inference with Probabilistic Predicates. Yao Lu et al. SIGMOD 2018.
[6] BlazeIt: Optimizing Declarative Aggregation and Limit Queries for Neural Network-Based Video Analytics. Daniel Kang et al. VLDB 2020.

Four baselines:

  • Overlap-based tracking [1]
  • Kernel correlation filters (KCF) [2]
  • FlowNet [3]
  • Probabilistic predicates [4, 5, 6]
slide-63
SLIDE 63

[Plot axes: Higher Speed, Higher Accuracy]

Four baselines:

  • Overlap-based tracking [1]
  • Kernel correlation filters (KCF) [2]
  • FlowNet [3]
  • Probabilistic predicates [4, 5, 6]

GNN: apply our tracker model without filtering, uncertainty resolution, and refinement


slide-64
SLIDE 64

[Plot axes: Higher Speed, Higher Accuracy]

slide-65
SLIDE 65

[Plot axes: Higher Speed, Higher Accuracy]

slide-66
SLIDE 66

[Plot axes: Higher Speed, Higher Accuracy]

slide-67
SLIDE 67

Conclusion

  • MIRIS is an approach for efficiently executing object track queries on large video datasets

  • Provides a 9x average speedup (at the highest accuracy levels)
  • Code: https://favyen.com/miris/