Wrap Up Lecture Instructor - Simon Lucey 16-423 - Designing - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Wrap Up Lecture

Instructor - Simon Lucey

16-423 - Designing Computer Vision Apps

slide-2
SLIDE 2

Today

  • Review - Project Presentations
  • Emerging Trends in Mobile Vision
  • Project Discussions
slide-3
SLIDE 3

Reminder - Project Presentation

  • Each team will be given approximately 2.5 minutes per member to present (for example, a 2-member team will have 5 minutes allotted).
  • Each team will fill out the following form, providing a short (must be shorter than your allotted time) YouTube clip describing your App in action.
  • Teams can submit their YouTube clips through the form http://goo.gl/forms/YoeQt0c1Hf.
  • 16-423 staff will select the best presentations, with the winner receiving the best project prize.

slide-4
SLIDE 4

Today

  • Review - Project Presentations
  • Emerging Trends in Mobile Vision
  • Project Discussions
slide-5
SLIDE 5

Why is Mobile CV Different?


slide-6
SLIDE 6

Why is Mobile CV Different?


slide-7
SLIDE 7

Why is Mobile CV Different?


slide-8
SLIDE 8

Why is Mobile CV Different?


slide-9
SLIDE 9

Why is Mobile CV Different?


slide-10
SLIDE 10

Balancing Power versus Perception


slide-11
SLIDE 11

Algorithm Software Architecture SOC Hardware

slide-12
SLIDE 12

Algorithm Software Architecture SOC Hardware

Correlation Filters with Limited Boundaries

Hamed Kiani Galoogahi, Istituto Italiano di Tecnologia, Genova, Italy, hamed.kiani@iit.it
Terence Sim, National University of Singapore, Singapore, tsim@comp.nus.edu.sg
Simon Lucey, Carnegie Mellon University, Pittsburgh, USA, slucey@cs.cmu.edu

Abstract

Correlation filters take advantage of specific properties in the Fourier domain allowing them to be estimated efficiently: O(ND log D) in the frequency domain, versus O(D^3 + ND^2) spatially where D is signal length, and N is the number of signals. Recent extensions to correlation filters, such as MOSSE, have reignited interest of their use in the vision community due to their robustness and attractive computational properties. In this paper we demonstrate, however, that this computational efficiency comes at a cost. Specifically, we demonstrate that only 1/D proportion of shifted examples are unaffected by boundary effects which has a dramatic effect on detection/tracking performance. In this paper, we propose a novel approach to correlation filter estimation that: (i) takes advantage of inherent computational redundancies in the frequency domain, (ii) dramatically reduces boundary effects, and (iii) is able to implicitly exploit all possible patches densely extracted from training examples during learning process. Impressive object tracking and detection results are presented in terms of both accuracy and computational efficiency.

1. Introduction

Correlation between two signals is a standard approach to feature detection/matching. Correlation touches nearly every facet of computer vision from pattern detection to object tracking. Correlation is rarely performed naively in the spatial domain. Instead, the fast Fourier transform (FFT) affords the efficient application of correlating a desired template/filter with a signal. Correlation filters, developed initially in the seminal work of Hester and Casasent [15], are a method for learning a template/filter in the frequency domain that rose to some prominence in the 80s and 90s. Although many variants have been proposed [15, 18, 20, 19], the approach’s central tenet is to learn a filter, that when correlated with a set of training signals, gives a desired response, e.g. Figure 1 (b). Like correlation, one of the central advantages of the approach is that it attempts to learn the filter in the frequency domain due to the efficiency of correlation in that domain.

Figure 1. (a) Defines the example of fixed spatial support within the image from which the peak correlation output should occur. (b) The desired output response, based on (a), of the correlation filter when applied to the entire image. (c) A subset of patch examples used in a canonical correlation filter where green denotes a non-zero correlation output, and red denotes a zero correlation output in direct accordance with (b). (d) A subset of patch examples used in our proposed correlation filter. Note that our proposed approach uses all possible patches stemming from different parts of the image, whereas the canonical correlation filter simply employs circular shifted versions of the same single patch. The central dilemma in this paper is how to perform (d) efficiently in the Fourier domain. The two last patches of (d) show that (D−1)/T patches near the image border are affected by circular shift in our method which can be greatly diminished by choosing D << T, where D and T indicate the length of the vectorized face patch in (a) and the whole image in (a), respectively.

Interest in correlation filters has been reignited in the vision world through the recent work of Bolme et al. [5] on Minimum Output Sum of Squared Error (MOSSE) correlation filters for object detection and tracking. Bolme et al.’s work was able to circumvent some of the classical problems
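
The link between frequency-domain correlation and circular shifts, which the abstract on this slide relies on, can be checked numerically. The sketch below is not the paper's estimator; it only assumes 1-D real signals and NumPy, and the function names are made up for illustration. It compares correlation computed from explicit circular shifts with the same quantity computed through the FFT.

```python
import numpy as np

def correlate_circular_shifts(x, h):
    """Correlate h with every circular shift of x, one shift at a time.

    r[k] = sum_n x[(n + k) % D] * h[n], which costs O(D^2) for a length-D signal.
    """
    D = len(x)
    return np.array([np.dot(np.roll(x, -k), h) for k in range(D)])

def correlate_fft(x, h):
    """Same quantity via the FFT, which costs O(D log D).

    For real h, circular correlation is a product with the complex
    conjugate of the filter spectrum in the frequency domain.
    """
    X = np.fft.fft(x)
    H = np.fft.fft(h)
    return np.real(np.fft.ifft(X * np.conj(H)))

rng = np.random.default_rng(0)
x = rng.standard_normal(16)   # hypothetical signal
h = rng.standard_normal(16)   # hypothetical filter

r_shifts = correlate_circular_shifts(x, h)
r_fft = correlate_fft(x, h)
```

The two results agree to floating-point precision. Every shift other than k = 0 wraps samples around the signal boundary, which is one way to read the 1/D proportion of unaffected examples quoted in the abstract.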
slide-13
SLIDE 13

Algorithm Software Architecture SOC Hardware

Ax = b

slide-14
SLIDE 14

Algorithm Software Architecture Hardware

slide-15
SLIDE 15

Algorithm Software Architecture Hardware

slide-16
SLIDE 16

Algorithm Software Architecture SOC Hardware

SIMD (Single Instruction, Multiple Data)

slide-17
SLIDE 17

Algorithm Software Architecture SOC Hardware

SIMD (Single Instruction, Multiple Data)

  • (length 2, 4, 8, …) vectors of integers or floats
  • Names: MMX, SSE, SSE2, …
  • Figure: 4-way vector add (+) and multiply (x)
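
As a rough illustration of the 4-way idea on this slide, the sketch below applies one add and one multiply to four float lanes at once and compares that with a lane-by-lane loop. It uses NumPy, which is an assumption here (the slide only names the MMX/SSE instruction families); whether a given NumPy build actually issues SSE-style instructions for these operations depends on the platform.

```python
import numpy as np

def scalar_add_mul(a, b):
    """One lane at a time: the non-SIMD baseline."""
    sums, prods = [], []
    for x, y in zip(a, b):
        sums.append(x + y)
        prods.append(x * y)
    return sums, prods

def vector_add_mul(a, b):
    """One add and one multiply applied to all lanes together,
    mirroring a 4-way SIMD add and multiply."""
    return a + b, a * b

# Two hypothetical 4-lane float vectors.
a = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
b = np.array([10.0, 20.0, 30.0, 40.0], dtype=np.float32)

vec_sum, vec_prod = vector_add_mul(a, b)
loop_sum, loop_prod = scalar_add_mul(a, b)
```

Both versions produce the same numbers; the difference is that the vector form expresses the work as a single operation over four values.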

slide-18
SLIDE 18

Algorithm Software Architecture SOC Hardware

slide-19
SLIDE 19

Algorithm Software Architecture SOC Hardware

APIs in the current versions of OpenGL ES do not have the “scatter”

slide-20
SLIDE 20

Algorithm Software Architecture SOC Hardware

APIs in the current versions of OpenGL ES do not have the “scatter”

slide-21
SLIDE 21

Algorithm Software Architecture SOC Hardware

slide-22
SLIDE 22

Algorithm Software Architecture SOC Hardware

Optimize

slide-23
SLIDE 23

Algorithm Software Architecture SOC Hardware

Optimize

slide-24
SLIDE 24

Algorithm Software Architecture SOC Hardware

Optimize

slide-25
SLIDE 25

OpenCV MATLAB

slide-26
SLIDE 26

OpenCV MATLAB

slide-27
SLIDE 27

OpenCV MATLAB

slide-28
SLIDE 28

Some Insights for Mobile CV

  • Very difficult to write the fastest code.
  • When you are prototyping an idea you should not worry about this, but you have to be aware of where bottlenecks can occur.
  • This is what you will learn in this course.
  • Highest performance in general is non-portable.
  • If you want to get the most out of your system it is good to go deep.
  • However, options like OpenCV are good when you need to build something quickly that works.
  • To build good computer vision apps you need to know them algorithmically.
  • Simply knowing how to write fast code is not enough.
  • You need to also understand computer vision algorithmically.
  • OpenCV can be dangerous here.

Some insights taken from Markus Püschel’s lectures on “How to Write Fast Numerical Code”.

slide-29
SLIDE 29

Source: http://www.slashgear.com/iphone-7-potential-wanes-as-android-n-starts-to-tango-20440932/

slide-30
SLIDE 30

Source: http://www.slashgear.com/iphone-7-potential-wanes-as-android-n-starts-to-tango-20440932/

slide-31
SLIDE 31

Better Selfies

Ohad Fried, Eli Shechtman, Dan B Goldman, and Adam Finkelstein. Perspective-aware Manipulation of Portrait Photos. ACM Transactions on Graphics (Proc. SIGGRAPH), July 2016.

slide-32
SLIDE 32

Better Selfies

Ohad Fried, Eli Shechtman, Dan B Goldman, and Adam Finkelstein. Perspective-aware Manipulation of Portrait Photos. ACM Transactions on Graphics (Proc. SIGGRAPH), July 2016.

slide-33
SLIDE 33

Emerging Trends - Low Power


Taken from: http://lpirc.net/

slide-34
SLIDE 34

Emerging Trends - High Speed Camera


iPhone 6 Samsung Galaxy S5

slide-35
SLIDE 35

Emerging Trends - High Speed Camera


iPhone 6 Samsung Galaxy S5

slide-36
SLIDE 36

Depth From Shake

  • H. Alismail, B. Browning, S. Lucey. “Photometric Bundle Adjustment”, ACCV 2016.
slide-37
SLIDE 37

Depth From Shake

  • H. Alismail, B. Browning, S. Lucey. “Photometric Bundle Adjustment”, ACCV 2016.
slide-38
SLIDE 38

Results

slide-39
SLIDE 39

Results

slide-40
SLIDE 40

Results

slide-41
SLIDE 41

Results

slide-42
SLIDE 42

Emerging Trends - Depth Cameras


slide-43
SLIDE 43


Kinect 2 Sensor Standard Basketball Hoop

slide-44
SLIDE 44

Ball Tracking

slide-45
SLIDE 45

Ball Tracking

slide-46
SLIDE 46

Ball Tracking

slide-47
SLIDE 47


slide-48
SLIDE 48


slide-49
SLIDE 49

Limitations - Range


slide-50
SLIDE 50

Limitations - Ambient Light

  • A sunny day on Earth can reach up to 1120 W m^-2.
  • A tabletop projector releases on average 10 W of light.

Figure: solar spectral irradiance (in W m^-2 nm^-1) versus wavelength (in nm, 500 to 4000), with curves for Extraterrestrial Radiation and Direct + Circumsolar Irradiance.
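
To put the two numbers on this slide on a common scale, the sketch below converts the projector's 10 W into an irradiance by spreading it over an illuminated area. The 1 m^2 area, and the assumption that all 10 W land on it, are hypothetical and chosen only to make the arithmetic concrete.

```python
SUNLIGHT_IRRADIANCE = 1120.0  # W m^-2, peak value quoted on the slide
PROJECTOR_POWER = 10.0        # W, average light output quoted on the slide

def projector_irradiance(power_w, area_m2):
    """Irradiance if the full light output is spread evenly over area_m2."""
    return power_w / area_m2

def sunlight_to_projector_ratio(area_m2):
    """How many times stronger peak sunlight is than the projected light."""
    return SUNLIGHT_IRRADIANCE / projector_irradiance(PROJECTOR_POWER, area_m2)

# Hypothetical scene: the projector illuminates exactly one square metre.
ratio = sunlight_to_projector_ratio(1.0)
```

Under these assumptions the projected pattern is about two orders of magnitude weaker than direct sunlight on the same surface, and the gap grows linearly with the illuminated area.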

slide-51
SLIDE 51

The Future

slide-52
SLIDE 52

The Future

slide-53
SLIDE 53

Emerging Trends - Augmented Reality


slide-54
SLIDE 54
slide-55
SLIDE 55
slide-56
SLIDE 56

Emerging Trends - Mobile SLAM


  • P. Tanskanen, K. Kolev, L. Meier, F. Camposeco, O. Saurer, M. Pollefeys : Live metric 3d reconstruction on mobile phones. (ICCV 2013)
slide-57
SLIDE 57

Emerging Trends - Mobile SLAM


  • P. Tanskanen, K. Kolev, L. Meier, F. Camposeco, O. Saurer, M. Pollefeys : Live metric 3d reconstruction on mobile phones. (ICCV 2013)
slide-58
SLIDE 58

Emerging Trends - Mobile SLAM


  • P. Tanskanen, K. Kolev, L. Meier, F. Camposeco, O. Saurer, M. Pollefeys : Live metric 3d reconstruction on mobile phones. (ICCV 2013)
slide-59
SLIDE 59

Photometric Bundle Adjustment

  • J. Engel, V. Koltun, and D. Cremers. Direct sparse odometry. arXiv preprint arXiv:1607.02565, 2016.
  • H. Alismail, B. Browning and S. Lucey “Photometric Bundle Adjustment for Vision-based SLAM”, ACCV 2016.
slide-60
SLIDE 60

Photometric Bundle Adjustment

F - frames

\[ \sum_{r=1}^{F} \]

  • J. Engel, V. Koltun, and D. Cremers. Direct sparse odometry. arXiv preprint arXiv:1607.02565, 2016.
  • H. Alismail, B. Browning and S. Lucey “Photometric Bundle Adjustment for Vision-based SLAM”, ACCV 2016.
slide-61
SLIDE 61

Photometric Bundle Adjustment

F - frames

\[ \sum_{r=1}^{F} \]

“reference frame”

  • J. Engel, V. Koltun, and D. Cremers. Direct sparse odometry. arXiv preprint arXiv:1607.02565, 2016.
  • H. Alismail, B. Browning and S. Lucey “Photometric Bundle Adjustment for Vision-based SLAM”, ACCV 2016.
slide-62
SLIDE 62

Photometric Bundle Adjustment

F - frames

\[ \sum_{r=1}^{F} \sum_{x \in I_r} \]

x ∈ I_r

“reference frame”

  • J. Engel, V. Koltun, and D. Cremers. Direct sparse odometry. arXiv preprint arXiv:1607.02565, 2016.
  • H. Alismail, B. Browning and S. Lucey “Photometric Bundle Adjustment for Vision-based SLAM”, ACCV 2016.
slide-63
SLIDE 63

Photometric Bundle Adjustment

F - frames

\[ \arg\min_{\lambda,\theta} \sum_{r=1}^{F} \sum_{x \in I_r} \sum_{f \in \mathrm{obs}(x)} \left\| I_r(x) - I_f\left(W\{x; \theta_f, \lambda_r(x)\}\right) \right\|_2^2 \]

x ∈ I_r

“reference frame”

  • J. Engel, V. Koltun, and D. Cremers. Direct sparse odometry. arXiv preprint arXiv:1607.02565, 2016.
  • H. Alismail, B. Browning and S. Lucey “Photometric Bundle Adjustment for Vision-based SLAM”, ACCV 2016.
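
The inner term of the objective on this slide, the squared difference between a reference-frame pixel and its warped location in another frame, can be sketched for a single frame pair as follows. This is a minimal illustration and not the authors' implementation: it assumes a pinhole camera with intrinsics K, reads θ_f as a rotation R and translation t, reads λ_r(x) as an inverse depth, and uses bilinear sampling. All function names are hypothetical.

```python
import numpy as np

def warp(x, inv_depth, R, t, K):
    """Map pixel x = (u, v) from the reference frame into frame f.

    inv_depth plays the role of lambda_r(x); (R, t) play the role of theta_f.
    """
    u, v = x
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # back-project to a viewing ray
    point = ray / inv_depth                         # 3-D point at depth 1 / lambda
    p = K @ (R @ point + t)                         # move into frame f and project
    return p[:2] / p[2]

def bilinear(img, uv):
    """Bilinearly sample a grayscale image; returns None outside the image."""
    u, v = uv
    h, w = img.shape
    if not (0 <= u <= w - 1 and 0 <= v <= h - 1):
        return None
    u0 = min(int(np.floor(u)), w - 2)
    v0 = min(int(np.floor(v)), h - 2)
    du, dv = u - u0, v - v0
    return ((1 - du) * (1 - dv) * img[v0, u0] + du * (1 - dv) * img[v0, u0 + 1]
            + (1 - du) * dv * img[v0 + 1, u0] + du * dv * img[v0 + 1, u0 + 1])

def photometric_cost(I_r, I_f, pixels, inv_depths, R, t, K):
    """Sum of squared intensity residuals over the given reference pixels."""
    cost = 0.0
    for x, lam in zip(pixels, inv_depths):
        sample = bilinear(I_f, warp(x, lam, R, t, K))
        if sample is not None:  # skip points that leave frame f
            cost += (I_r[x[1], x[0]] - sample) ** 2
    return cost

# Hypothetical data: a horizontal intensity ramp seen by the same camera twice.
K = np.array([[10.0, 0.0, 4.0], [0.0, 10.0, 4.0], [0.0, 0.0, 1.0]])
ramp = np.tile(np.arange(8, dtype=float), (8, 1))
pixels = [(2, 3), (4, 5)]
inv_depths = [1.0, 1.0]

cost_same_pose = photometric_cost(ramp, ramp, pixels, inv_depths, np.eye(3), np.zeros(3), K)
cost_shifted = photometric_cost(ramp, ramp, pixels, inv_depths, np.eye(3), np.array([0.1, 0.0, 0.0]), K)
```

With the identity pose the cost is zero, and a small sideways translation raises it. A bundle adjustment solver would minimise this quantity over the poses and inverse depths, summed over all reference frames and all frames that observe each point.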
slide-64
SLIDE 64

DSO SLAM

Figure: full evaluation result, showing all error values for the TUM-monoVO dataset. Panels: ORB-SLAM and DSO, each with axis labels s_01 to s_50, Fwd, Bwd and ticks 2 to 10.

  • J. Engel, V. Koltun, and D. Cremers. Direct sparse odometry. arXiv preprint arXiv:1607.02565, 2016.

slide-65
SLIDE 65


slide-66
SLIDE 66


slide-67
SLIDE 67

Emerging Trends - Deep Learning

Figure: ImageNet Challenge results by year, split into BC (before ConvNets) and AD (after deep learning); labelled value: 6.8%.

slide-68
SLIDE 68

Emerging Trends - Deep Vision


slide-69
SLIDE 69

Convolutional Pose Machines

Figure: Convolutional Pose Machines (T-stage) architecture. Panels: (a) Stage 1, (b) Stage ≥ 2, (c) Stage 1, (d) Stage ≥ 2, (e) Effective Receptive Field (9 × 9, 26 × 26, 60 × 60, 96 × 96, 160 × 160, 240 × 240, 320 × 320, 400 × 400). Legend: P = Pooling, C = Convolution. Input image: h×w×3.

  • S. Wei, et al. “Convolutional Pose Machines”. In CVPR 2016.
slide-70
SLIDE 70

High-Fidelity Deep Learning

  • S. Wei, et al. “Convolutional Pose Machines”. In CVPR 2016.
slide-71
SLIDE 71

High-Fidelity Deep Learning

  • S. Wei, et al. “Convolutional Pose Machines”. In CVPR 2016.
slide-72
SLIDE 72

Current State of the Art

Figure labels: image patch 3@(224x224); CAD Selection (VGG-NET); Extrinsics Selection (VGG-NET).

  • H. Su, et al. “Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views”. In ICCV 2015.
  • A. Bansal, et al. “Marr Revisited: 2D-3D Alignment via Surface Normal Prediction”. In CVPR 2016.

slide-73
SLIDE 73

Current State of the Art

Figure labels: image patch 3@(224x224); CAD Selection (VGG-NET); Extrinsics Selection (VGG-NET).

  • H. Su, et al. “Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views”. In ICCV 2015.
  • A. Bansal, et al. “Marr Revisited: 2D-3D Alignment via Surface Normal Prediction”. In CVPR 2016.

slide-74
SLIDE 74

Compressible CAD Models

slide-75
SLIDE 75

Compressible CAD Models

slide-76
SLIDE 76

3D Convolutional LSTM

  • Front/Side
  • Back
  • C. Choy, et al. “3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction”. In ECCV 2016.
slide-77
SLIDE 77

Recurrent Neural Network

[Christopher Olah] Understanding LSTM Networks, http://colah.github.io/posts/2015-08-Understanding-LSTMs/

slide-78
SLIDE 78

3D-R2N2

  • C. Choy, et al. “3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction”. In ECCV 2016.
slide-79
SLIDE 79

3D-R2N2

  • C. Choy, et al. “3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction”. In ECCV 2016.
slide-80
SLIDE 80

3D-R2N2

  • C. Choy, et al. “3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction”. In ECCV 2016.
slide-81
SLIDE 81

3D Convolutional LSTM

3D-R2N2

  • C. Choy, et al. “3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction”. In ECCV 2016.
slide-82
SLIDE 82

3D-R2N2

  • C. Choy, et al. “3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction”. In ECCV 2016.
slide-83
SLIDE 83

3D-R2N2

  • C. Choy, et al. “3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction”. In ECCV 2016.
slide-84
SLIDE 84

3D-R2N2

  • C. Choy, et al. “3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction”. In ECCV 2016.
slide-85
SLIDE 85

3D-R2N2

  • C. Choy, et al. “3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction”. In ECCV 2016.
slide-86
SLIDE 86

3D-R2N2

  • C. Choy, et al. “3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction”. In ECCV 2016.
slide-87
SLIDE 87

Figure: MVS versus Ours, by No. of views (20, 30, 40).
  • C. Choy, et al. “3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction”. In ECCV 2016.
slide-88
SLIDE 88

Emerging Trends - Deep Learning

  • Data and Compute Speed Matter
  • Energy and Space do NOT!!!

Training

slide-89
SLIDE 89

Emerging Trends - Deep Learning

  • Data and Compute Speed Matter
  • Energy and Space now Matter!!!!

Testing

slide-90
SLIDE 90

Emerging Trends - Deep Learning

  • Data and Compute Speed Matter
  • Energy and Space now Matter!!!!

Testing

slide-91
SLIDE 91

Lower Precision

  • R: 32-bit
  • B: 1-bit, ∈ {−1, +1}
  • 8-bit (I) [Han et al. 2016]

Reducing Precision

  • Saving Memory
  • Saving Computation

{−1,+1}     {0,1}
MUL         XNOR
ADD, SUB    Bit-Count (popcount)

Figure: plot with axis ticks from −0.05 to 0.05 and 1600, 3200, 4800, 6400.

Rastegari, Mohammad, et al. "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks." ECCV 2016
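
The substitution listed on this slide (multiply becomes XNOR, add/subtract becomes a bit count) can be tried out on a dot product of two {−1, +1} vectors. The sketch below is a plain-Python illustration under one assumed encoding (+1 stored as bit 1, −1 as bit 0); it is not the XNOR-Net implementation, and the function names are hypothetical.

```python
def pack_bits(v):
    """Encode a {-1, +1} vector as an integer bitmask: +1 -> 1, -1 -> 0."""
    bits = 0
    for i, s in enumerate(v):
        if s == 1:
            bits |= 1 << i
    return bits

def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1, +1} vectors of length n.

    XNOR marks the positions where the signs agree (product +1);
    the remaining positions contribute -1, so dot = 2 * matches - n.
    """
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ b_bits) & mask
    matches = bin(xnor).count("1")  # popcount
    return 2 * matches - n

a = [1, -1, 1, 1]
b = [1, 1, -1, 1]

reference = sum(x * y for x, y in zip(a, b))            # ordinary multiply-add
fast = binary_dot(pack_bits(a), pack_bits(b), len(a))   # XNOR + popcount
```

Both paths give the same value; the second uses only bitwise operations and one count, which is why this encoding suits low-power hardware.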

slide-92
SLIDE 92

Why Binary?

  • Binary Instructions
  • AND, OR, XOR, XNOR, PopCount (Bit-Count)
  • Low Power Device

Rastegari, Mohammad, et al. "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks." ECCV 2016

slide-93
SLIDE 93

Reminder

Operations         Memory    Computation
+, −, ×            1x        1x
+, −               ~32x      ~2x
XNOR, Bit-count    ~32x      ~58x    (XNOR-Networks)

Figure labels: I ∗, R, B.
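
The ~32x memory figure on this slide follows from storing one bit per weight instead of a 32-bit float. The sketch below checks that arithmetic with NumPy; the weight count and the sign-based binarization are assumptions made for illustration, and the code says nothing about the ~2x and ~58x computation figures.

```python
import numpy as np

def binarize(weights):
    """Map real weights to {-1, +1} by sign (zeros go to +1)."""
    return np.where(weights >= 0, 1, -1).astype(np.int8)

def pack(binary_weights):
    """Store each {-1, +1} weight as one bit: +1 -> 1, -1 -> 0."""
    return np.packbits(binary_weights > 0)

rng = np.random.default_rng(0)
weights = rng.standard_normal(1024).astype(np.float32)  # hypothetical 32-bit weights

binary = binarize(weights)
packed = pack(binary)
memory_saving = weights.nbytes / packed.nbytes  # bytes before / bytes after
```

For 1024 weights this is 4096 bytes against 128 bytes, a factor of 32.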

slide-94
SLIDE 94

Reminder: ASICs for Low Energy

  • Application Specific Integrated Circuits (ASIC)
  • ASICs are perfect for targeting a specific application domain.
  • Inherently low-power as they are “frozen in silicon” for a specific application domain (e.g. graphics cards, ethernet cards, DSPs, etc.).
  • Drawbacks:
  • Incredibly expensive to develop.
  • Time-consuming and resource-intensive to develop.
  • Positives:
  • Extremely energy efficient.


slide-95
SLIDE 95

Emerging Trends - “Frozen” DeepNets


(Taken from a recent talk by Yann LeCun at the Hot Chips conference in May 2015).

slide-96
SLIDE 96

Emerging Trends - Deep Learning


APIs in the current versions of OpenGL ES do not have the “scatter”

slide-97
SLIDE 97

Emerging Trends - Deep Learning


APIs in the current versions of OpenGL ES do not have the “scatter”

Deep Learning

Check out this article.

slide-98
SLIDE 98

Emerging Trends - Deep Learning


(Taken from a recent talk by Yann LeCun at the Hot Chips conference in May 2015).