Computer Vision and Deep Learning: Introduction to Data Science 2019


SLIDE 1

Computer Vision and Deep Learning

Introduction to Data Science 2019, University of Helsinki

Mats Sjöberg
mats.sjoberg@csc.fi

CSC – IT Center for Science

September 23, 2019

SLIDE 2

Computer vision

Giving computers the ability to understand visual information. Examples:

◮ A robot that can move around obstacles by analysing the input of its camera(s)
◮ A computer system finding images of cats among millions of images (e.g., on the Internet).

SLIDE 3

From picture to pixels

◮ The camera image needs to be digitised for computer processing
◮ Turning it into millions of discrete picture elements, or pixels

[Figure: a photograph, "There's a cat among some flowers in the grass", with a zoomed-in patch shown as a grid of pixel intensities around 0.5]

◮ How do we get from pixels to understanding?
◮ . . . or even some kind of useful/actionable interpretation.
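As a concrete illustration of "picture to pixels", here is a minimal sketch using Pillow and NumPy; the file name cat.jpg is a hypothetical stand-in for the slide's photograph:

```python
# Digitising a picture into an array of pixel intensities.
from PIL import Image
import numpy as np

img = Image.open("cat.jpg").convert("L")            # load, convert to greyscale
pixels = np.asarray(img, dtype=np.float32) / 255.0  # scale intensities to [0, 1]

print(pixels.shape)              # (height, width): millions of pixels
print(pixels[100:105, 100:105])  # a small patch, like the grid on the slide
```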

SLIDE 4

Deep learning

Before:

◮ Hand-crafted features, e.g., colour distributions, edge histograms
◮ Complicated feature selection mechanisms
◮ "Classical" machine learning, e.g., kernel methods (SVM)

About 5 years ago: deep learning

◮ End-to-end learning, i.e., the network itself learns the features
◮ Each layer typically learns a higher level of representation
◮ However: entirely data-driven, features can be hard to interpret

Computer vision was one of the first breakthroughs of deep learning.

SLIDE 5

Deep learning = neural networks

Fully connected or dense layer: inputs x_1, ..., x_n, outputs y_1, ..., y_m, weight w_{ji} connecting input i to output j, and a non-linearity f(·):

$$y_j = f\Big(\sum_{i=1}^{n} w_{ji}\, x_i\Big)$$

$$\mathbf{y} = f\left(\begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1n} \\ w_{21} & w_{22} & \cdots & w_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ w_{m1} & w_{m2} & \cdots & w_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}\right) = f(W^{T}\mathbf{x})$$

(we're ignoring the bias term here . . . )
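A minimal NumPy sketch of the dense layer above; the sigmoid non-linearity and the layer sizes are illustrative assumptions, not from the slides:

```python
import numpy as np

def dense(x, W, f):
    """Fully connected layer: y = f(W^T x), bias omitted as on the slide."""
    return f(W.T @ x)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n, m = 5, 3                   # n inputs, m outputs (illustrative sizes)
x = np.random.rand(n)         # input vector x_1 ... x_n
W = np.random.randn(n, m)     # weight matrix, W[i, j] = w_ji
y = dense(x, W, sigmoid)      # output vector y_1 ... y_m
print(y.shape)                # (3,)
```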

SLIDE 6

Learning in neural networks

◮ A feedforward network has a huge number of parameters that need to be learned
◮ Each output node interacts with every input node via the weights in W
◮ n × m weights (and that's just one layer!)
◮ Learning is typically done with stochastic gradient descent (see the sketch below): http://ruder.io/optimizing-gradient-descent/
◮ Gradients for each neuron are obtained with backpropagation
◮ Given enough time and data, the network can in theory learn to model any complex phenomenon (universal approximation theorem)
◮ In practice, we often use domain knowledge to restrict the number of parameters that need to be learned: http://playground.tensorflow.org/
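A hedged sketch of stochastic gradient descent for the single sigmoid layer from the previous slide, with a squared-error loss; the toy data and learning rate are made up, and the gradient is written out by hand rather than obtained by a full backpropagation through many layers:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.random((100, 5))             # 100 toy samples, 5 features
t = rng.random((100, 1))             # toy targets in [0, 1]

W = rng.standard_normal((5, 1)) * 0.1
lr = 0.1                             # learning rate

for epoch in range(50):
    for i in rng.permutation(len(X)):       # stochastic: one sample at a time
        x = X[i:i+1].T                      # column vector, shape (5, 1)
        y = sigmoid(W.T @ x)                # forward pass
        err = y - t[i:i+1].T                # gradient of squared error w.r.t. y
        grad_W = x @ (err * y * (1 - y)).T  # chain rule through the sigmoid
        W -= lr * grad_W                    # gradient descent step
```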

SLIDE 7

Deep learning for vision

While we don't hand-craft features anymore, in practice we still apply some "expert knowledge" to make learning feasible:

◮ Neighbouring pixels are probably related (convolutions)
◮ There are common image features which can appear anywhere, such as edges, corners, etc. (weight sharing)
◮ Often the exact location of a feature isn't important (max pooling)

⇒ Convolutional neural networks (CNN, ConvNet).

SLIDE 8

Feedforward to convolutional net

[Figure: a fully connected layer with inputs x1 ... x7 and outputs y1 ... y7, each connection having its own weight w11, w21, ..., w77]

Network changes from this . . .

[Figure: the same inputs and outputs, but each output connected only to its neighbouring inputs through a small shared set of weights w1, w2, ...]

to this.
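The second network is a 1D convolution: the same few weights slide over the input instead of every connection having its own weight. A small NumPy sketch of the parameter-count difference, with illustrative sizes and values:

```python
import numpy as np

x = np.arange(7, dtype=float)     # inputs x1 ... x7

# Fully connected: 7 x 7 = 49 independent weights.
W = np.random.randn(7, 7)
y_dense = W.T @ x

# Convolutional: the same 3 shared weights reused at every position.
w = np.array([0.25, 0.5, 0.25])   # shared kernel
y_conv = np.convolve(x, w, mode="same")

print(W.size, "weights vs", w.size, "weights")  # 49 vs 3
```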

SLIDE 9

Convolution in 2D

◮ We arrange the input and output neurons in 2D
◮ The output is the result of a weighted sum of a small local area in the previous layer – convolution:

$$S(i, j) = \sum_{m} \sum_{n} I(i + m, j + n)\, K(m, n)$$

◮ The weights K(m, n) are what is learned.

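As written, the sum above is technically cross-correlation (the kernel is not flipped), which is what deep learning libraries implement under the name "convolution". A small NumPy sketch; the 8×8 input and the hand-picked edge kernel are illustrative, since in a real CNN the weights K are learned:

```python
import numpy as np

def conv2d(I, K):
    """S(i, j) = sum_m sum_n I(i+m, j+n) * K(m, n), valid region only."""
    kh, kw = K.shape
    H, W = I.shape
    S = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(S.shape[0]):
        for j in range(S.shape[1]):
            S[i, j] = np.sum(I[i:i+kh, j:j+kw] * K)
    return S

I = np.random.rand(8, 8)              # illustrative 8x8 "image"
K = np.array([[-1.0, -1.0, -1.0],     # hand-picked horizontal-edge kernel
              [ 0.0,  0.0,  0.0],
              [ 1.0,  1.0,  1.0]])
print(conv2d(I, K).shape)             # (6, 6)
```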

SLIDE 11

Learning in layers

◮ The convolutional layer learns several sets of weights, each a kind of feature detector
◮ These are built up in layers
◮ Until we get our end result, e.g., an object detector.

[Figure: layers of feature maps leading from an input image to the output label "cat"]

SLIDE 12

Visualising convolutional layers

Krizhevsky et al 2012

SLIDE 13

Deconvnet

Map activations back to the image space.

Zeiler and Fergus 2014, https://arxiv.org/abs/1311.2901

SLIDE 14

Real convolutional neural nets

◮ What we call CNNs actually also contain other types of operations/layers: fully connected layers, non-linearities
◮ Modern CNNs have a huge bag of tricks: pooling, various training shortcuts, 1x1 convolutions, inception modules, residual connections, etc.

[Figure: LeNet-5 architecture. INPUT 32x32 → convolutions → C1: feature maps 6@28x28 → subsampling → S2: f. maps 6@14x14 → convolutions → C3: f. maps 16@10x10 → subsampling → S4: f. maps 16@5x5 → C5: layer 120 → F6: layer 84 (full connection) → OUTPUT 10 (Gaussian connections)]

LeNet5 (LeCun et al 1998)
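A hedged PyTorch sketch of a LeNet-5-shaped network following the figure; ReLU, max pooling, and a plain linear output layer are modern substitutions for the original 1998 details (e.g., the Gaussian connections):

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    """LeNet-5-shaped CNN: conv/pool feature extractor plus dense head."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32 -> C1: 6@28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                  # -> S2: 6@14x14
            nn.Conv2d(6, 16, kernel_size=5),  # -> C3: 16@10x10
            nn.ReLU(),
            nn.MaxPool2d(2),                  # -> S4: 16@5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),       # -> C5: 120
            nn.ReLU(),
            nn.Linear(120, 84),               # -> F6: 84
            nn.ReLU(),
            nn.Linear(84, num_classes),       # -> OUTPUT: 10
        )

    def forward(self, x):
        return self.classifier(self.features(x))

net = LeNet5()
out = net(torch.randn(1, 1, 32, 32))          # one 32x32 greyscale image
print(out.shape)                              # torch.Size([1, 10])
```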

SLIDE 15

Examples of real CNNs

AlexNet (Krizhevsky et al 2012)

SLIDE 16

Examples of real CNNs

GoogLeNet (Szegedy et al 2014)

SLIDE 17

Examples of real CNNs

Inception v3 (Szegedy et al 2015)

SLIDE 18

Examples of real CNNs

ResNet-152 (He et al 2015) https://github.com/KaimingHe/deep-residual-networks

SLIDE 19

Object recognition challenge

ImageNet benchmark:

◮ ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
◮ More than 1 million images
◮ Task: classify into 1000 object categories.

SLIDE 20

Object recognition challenge

◮ First won by a CNN in 2012 (Krizhevsky et al)
◮ Wide margin: top-5 error rate dropped from 26% to 16%
◮ CNNs have ruled ever since.
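Top-5 error counts a prediction as wrong only if the true class is not among the model's five highest-scoring classes. A small NumPy sketch of the metric; the scores and labels are random placeholders:

```python
import numpy as np

def top5_error(scores, labels):
    """scores: (N, C) class scores; labels: (N,) true class indices."""
    top5 = np.argsort(scores, axis=1)[:, -5:]      # 5 best classes per sample
    hits = (top5 == labels[:, None]).any(axis=1)   # is the true class among them?
    return 1.0 - hits.mean()

rng = np.random.default_rng(0)
scores = rng.random((1000, 1000))      # 1000 samples, 1000 ImageNet classes
labels = rng.integers(0, 1000, 1000)
print(top5_error(scores, labels))      # ~0.995 for random scores (5/1000 chance)
```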

SLIDE 21

Accuracy vs model complexity

◮ Accuracy vs number of inference operations
◮ Circle size represents number of parameters
◮ Newer nets are better, faster, and have fewer parameters.

Image from https://arxiv.org/pdf/1605.07678.pdf

SLIDE 22

Computer vision applications

SLIDE 23

Object detection and localisation

Rich feature hierarchies for accurate object detection and semantic segmentation. Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. CVPR 2014.

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. arXiv:1506.01497

SLIDE 24

Semantic segmentation

Learning Deconvolution Network for Semantic Segmentation. Hyeonwoo Noh, Seunghoon Hong, Bohyung Han. arXiv:1505.04366

SLIDE 25

Object detection and localisation

https://github.com/facebookresearch/Detectron

SLIDE 26

Describing an image

Show and Tell: A Neural Image Caption Generator. Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan. arXiv:1411.4555

SLIDE 27

Describing an image

DenseCap: Fully Convolutional Localization Networks for Dense Captioning. Justin Johnson, Andrej Karpathy, Li Fei-Fei. CVPR 2016.

SLIDE 28

Visual question answering

VQA: Visual Question Answering. Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Dhruv Batra, Devi Parikh. ICCV 2015.

SLIDE 29

Generative Adversarial Networks (GANs)

"The coolest idea in machine learning in the last twenty years" – Yann LeCun

◮ We have two networks: a generator and a discriminator
◮ The generator produces samples, while the discriminator tries to distinguish between real data items and the generated samples
◮ The discriminator tries to learn to classify correctly, while the generator in turn tries to learn to fool the discriminator (see the training-loop sketch below).
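A hedged PyTorch sketch of the adversarial training loop; the tiny fully connected generator and discriminator and the toy 2D "real" data are illustrative placeholders, not from the slides:

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))  # noise -> fake sample
D = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))   # sample -> real/fake logit

opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(32, 2) + 3.0   # placeholder "real" data distribution
    fake = G(torch.randn(32, 16))     # generator produces samples from noise

    # Discriminator step: learn to label real as 1 and generated as 0.
    d_loss = (bce(D(real), torch.ones(32, 1)) +
              bce(D(fake.detach()), torch.zeros(32, 1)))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Generator step: learn to make the discriminator say "real" for fakes.
    g_loss = bce(D(fake), torch.ones(32, 1))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
```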

SLIDE 30

GAN examples

Generated bedrooms

https://arxiv.org/abs/1511.06434v2

SLIDE 31

GAN examples

Generated "celebrities"

Progressive Growing of GANs for Improved Quality, Stability, and Variation. Tero Karras, Timo Aila, Samuli Laine, Jaakko Lehtinen. arXiv:1710.10196

SLIDE 32

GAN examples

CycleGAN

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks https://junyanz.github.io/CycleGAN/

SLIDE 33

GAN examples

Generative Adversarial Text to Image Synthesis

https://arxiv.org/pdf/1605.05396.pdf

SLIDE 34

Neural style

A Neural Algorithm of Artistic Style. https://arxiv.org/pdf/1508.06576.pdf

https://github.com/jcjohnson/neural-style

SLIDE 35

AI vs humans?

SLIDE 36

AI vs humans?

Recall our ImageNet benchmark . . . where do humans stand?

http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/

SLIDE 37

AI better than humans?

◮ Don't confuse classification accuracy with understanding!
◮ Neural nets learn to optimize for a particular problem pretty well
◮ But in the end it's just pixel statistics
◮ Humans can generalize and understand the context.

SLIDE 38

AI better than humans?

Microsoft CaptionBot: "I think it's a group of people standing next to a man in a suit and tie."

https://karpathy.github.io/2012/10/22/state-of-computer-vision/

SLIDE 39

[Image-only slide]

SLIDE 40

Adversarial examples

◮ Deep nets are fooled by deliberately crafted inputs (see the sketch below)
◮ Revealing: what deep nets learn is quite different from what humans learn

https://blog.openai.com/adversarial-example-research/
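One standard way to craft such inputs is the fast gradient sign method (FGSM), which the OpenAI post linked above discusses. A hedged PyTorch sketch; the classifier `model`, the input `x`, and the `label` are placeholders for any image classifier:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, label, eps=0.01):
    """Fast gradient sign method: nudge every pixel by +/-eps to raise the loss."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    x_adv = x + eps * x.grad.sign()    # tiny, deliberately crafted perturbation
    return x_adv.clamp(0, 1).detach()  # keep pixel values in the valid range

# Illustrative usage:
#   x_adv = fgsm(model, x, label)
# x_adv is often misclassified even though it looks unchanged to a human.
```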

SLIDE 41

Conclusion

◮ Deep learning has been a big leap for computer vision
◮ We can solve some specific problems really well
◮ Still far away from true understanding of visual information

SLIDE 42

About CSC

SLIDE 43

CSC – IT Center for Science

◮ Finnish non-profit state enterprise with special tasks
◮ Owned by the Finnish state (70%) and higher education institutions (30%)
◮ ICT expertise for research, education, public administration
◮ Services mostly free for universities and state research institutions
◮ You might have heard about: Funet, HAKA, eduroam, VIRTA, Finland's fastest supercomputers, . . .
◮ Headquarters in Espoo (Keilaniemi), datacenter in Kajaani.

SLIDE 44

CSC's services

Some other services which might be relevant for you:

◮ Notebooks – notebooks.csc.fi
    ◮ Jupyter notebooks, e.g., with deep learning environments
    ◮ Anyone with a student account can access them
◮ Puhti CPU and GPU cluster
    ◮ 320 NVIDIA Volta V100 GPUs for deep learning
    ◮ Requires a research project or university course
    ◮ https://research.csc.fi/dl2021-utilization
◮ In Q4/2020: EuroHPC pre-exascale supercomputer LUMI
    ◮ Among the world's fastest computers, ∼200 petaflops
    ◮ Largely GPU based
    ◮ https://datacenter.csc.fi/wp/about-eurohpc/

SLIDE 45

Thank you!