Computer Vision and Deep Learning
Introduction to Data Science 2019, University of Helsinki
Mats Sjöberg
mats.sjoberg@csc.fi
CSC – IT Center for Science
September 23, 2019
Computer vision

Giving computers the ability to understand visual information. Examples:
◮ A robot that can move around obstacles by analysing the input of its camera(s)
◮ A computer system finding images of cats among millions of images (e.g., on the Internet).
◮ The camera image needs to be digitised for computer processing
◮ Turning it into millions of discrete picture elements, or pixels
(figure: grid of pixel intensity values, e.g., 0.4941, 0.5058, . . . )
“There’s a cat among some flowers in the grass”
◮ How do we get from pixels to understanding?
◮ . . . or even some kind of useful/actionable interpretation.
Before
◮ Hand-crafted features, e.g., colour distributions, edge histograms
◮ Complicated feature selection mechanisms
◮ “Classical” machine learning, e.g., kernel methods (SVM)
About 5 years ago: deep learning
◮ End-to-end learning, i.e., the network itself learns the features
◮ Each layer typically learns a higher level of representation
◮ However: entirely data-driven, features can be hard to interpret
Computer vision was one of the first breakthroughs of deep learning.
Fully connected or dense layer
Each output $y_j$ is a weighted sum of all inputs $x_1, \ldots, x_n$ passed through an activation function $f(\cdot)$:

$$y_j = f\Big(\sum_{i=1}^{n} w_{ji}\, x_i\Big)$$

or, collecting the weights $w_{ji}$ into a matrix, in vector form:

$$y = f(W^T x)$$

(we’re ignoring the bias term here . . . )
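The dense-layer formula fits in a few lines of NumPy (a minimal illustration, not from the lecture; the bias term is omitted as on the slide, and `tanh` is just one possible activation):

```python
import numpy as np

def dense_forward(x, W, f=np.tanh):
    """Fully connected layer: y = f(W^T x).

    x: input vector of shape (n,)
    W: weight matrix of shape (n, m), one column per output neuron
    f: elementwise activation function
    """
    return f(W.T @ x)

# Tiny example: 3 inputs, 2 outputs
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W = rng.normal(size=(3, 2))
y = dense_forward(x, W)
print(y.shape)  # (2,)
```

Note that `W` here has n × m entries, which is exactly the parameter count discussed on the next slide.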
◮ A feedforward network has a huge number of parameters that need to be learned
◮ Each output node interacts with every input node via the weights in W
◮ n × m weights (and that’s just one layer!)
◮ Learning is typically done with stochastic gradient descent
http://ruder.io/optimizing-gradient-descent/
◮ Gradients for each neuron obtained with backpropagation
◮ Given enough time and data the network can in theory learn to model any complex phenomenon (universal approximation theorem)
◮ In practice, we often use domain knowledge to restrict the number of parameters that need to be learned.
http://playground.tensorflow.org/
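Stochastic gradient descent itself fits in a few lines; here is a toy NumPy sketch (a one-parameter linear model rather than a neural network, so no backpropagation is needed, but the update rule is the same):

```python
import numpy as np

# Minimal stochastic gradient descent sketch (illustrative, not from the
# lecture): fit a single weight w in the model y = w * x to noisy data,
# updating on one randomly picked sample per step.
rng = np.random.default_rng(42)
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + rng.normal(scale=0.1, size=200)  # true weight is 3.0

w, lr = 0.0, 0.1
for step in range(2000):
    i = rng.integers(len(x))               # pick one training sample
    grad = 2.0 * (w * x[i] - y[i]) * x[i]  # gradient of the squared error
    w -= lr * grad                         # take a small step downhill

print(w)  # w ends up close to the true weight 3.0
```

In a real network the per-sample gradient `grad` would come from backpropagation, and updates are usually averaged over mini-batches rather than single samples.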
While we don’t hand-craft features anymore, in practice we still apply some “expert knowledge” to make learning feasible:
◮ Neighbouring pixels are probably related (convolutions)
◮ There are common image features which can appear anywhere, such as edges, corners, etc. (weight sharing)
◮ Often the exact location of a feature isn’t important (max pooling)
⇒ Convolutional neural networks (CNN, ConvNet).
Network changes from this (figure: a fully connected layer, where every input x1 . . . x7 is connected to every output y1 . . . y7 through its own weight w11, w21, . . . , w77) . . .

to this (figure: a convolutional layer, where each output is connected only to a few neighbouring inputs, and the same weights w1, w2 are shared across all positions).
◮ We arrange the input and output neurons in 2D
◮ The output is the result of a weighted sum of a small local area in the previous layer – a convolution:

$$S(i, j) = \sum_{m}\sum_{n} I(i + m,\, j + n)\, K(m, n)$$

◮ The weights K(m, n) are what is learned.
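The sum on the slide is easy to implement directly (a minimal NumPy sketch, not from the lecture; the toy edge-detector kernel is a made-up example):

```python
import numpy as np

def conv2d(I, K):
    """'Valid' 2D convolution as on the slide:
    S(i, j) = sum_m sum_n I(i + m, j + n) * K(m, n)
    (strictly speaking cross-correlation, which is what deep
    learning frameworks implement under the name 'convolution')."""
    kh, kw = K.shape
    oh, ow = I.shape[0] - kh + 1, I.shape[1] - kw + 1
    S = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            S[i, j] = np.sum(I[i:i + kh, j:j + kw] * K)
    return S

# Toy image: dark on the left, bright on the right, and a small
# hand-crafted vertical-edge detector kernel
I = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
K = np.array([[-1, 1],
              [-1, 1]], dtype=float)
S = conv2d(I, K)
print(S)  # responds strongly exactly at the edge column
```

The same small kernel slides over every position, which is the weight sharing from the previous slide: only kh × kw weights are learned, regardless of the image size.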
◮ The convolutional layer learns several sets of weights, each a kind of feature detector
◮ These are built up in layers
◮ Until we get our end result, e.g., an object detector: “cat”
Krizhevsky et al 2012
Map activations back to the image space
Zeiler and Fergus 2014, https://arxiv.org/abs/1311.2901
◮ What we call CNNs actually also contain other types of layers
◮ Modern CNNs have a huge bag of tricks: pooling, various training shortcuts, 1×1 convolutions, inception modules, residual connections, etc.
LeNet-5 (LeCun et al 1998): INPUT 32×32 → convolutions → C1: feature maps 6@28×28 → subsampling → S2: f. maps 6@14×14 → convolutions → C3: f. maps 16@10×10 → subsampling → S4: f. maps 16@5×5 → full connection → C5: layer 120 → full connection → F6: layer 84 → Gaussian connections → OUTPUT 10.
AlexNet (Krizhevsky et al 2012)

GoogLeNet (Szegedy et al 2014)

Inception v3 (Szegedy et al 2015)

ResNet-152 (He et al 2015) https://github.com/KaimingHe/deep-residual-networks
ImageNet benchmark
◮ ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
◮ More than 1 million images
◮ Task: classify into 1000 object categories.
◮ First time won by a CNN in 2012 (Krizhevsky et al)
◮ Wide margin: top-5 error rate dropped from 26% to 16%
◮ CNNs have ruled ever since.
◮ Accuracy vs number of inference operations
◮ Circle size represents number of parameters
◮ Newer nets are better, faster, and have fewer parameters.
Image from https://arxiv.org/pdf/1605.07678.pdf
Rich feature hierarchies for accurate object detection and semantic segmentation. Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. CVPR 2014.
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. arXiv:1506.01497
Learning Deconvolution Network for Semantic Segmentation. Hyeonwoo Noh, Seunghoon Hong, Bohyung Han. arXiv:1505.04366
https://github.com/facebookresearch/Detectron
Show and Tell: A Neural Image Caption Generator. Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan. CVPR 2015.
DenseCap: Fully Convolutional Localization Networks for Dense Captioning. Justin Johnson, Andrej Karpathy, Li Fei-Fei. CVPR 2016.
VQA: Visual Question Answering. Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Dhruv Batra, Devi Parikh. ICCV 2015.
“The coolest idea in machine learning in the last twenty years” – Yann LeCun
◮ We have two networks: a generator and a discriminator
◮ The generator produces samples, while the discriminator tries to distinguish between real data items and the generated samples
◮ The discriminator tries to learn to classify correctly, while the generator in turn tries to learn to fool the discriminator.
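The adversarial game described in the bullets above is usually written as a minimax objective (from Goodfellow et al 2014; the formula is not on the slide but summarises the description):

```latex
\min_G \max_D \; V(D, G) =
    \mathbb{E}_{x \sim p_{\text{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]
```

The discriminator D maximises V (classify real vs generated samples correctly), while the generator G minimises it (fool D); z is random noise that G maps to a sample.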
Generated bedrooms
https://arxiv.org/abs/1511.06434v2
Generated “celebrities”
Progressive Growing of GANs for Improved Quality, Stability, and Variation. Tero Karras, Timo Aila, Samuli Laine, Jaakko Lehtinen. arXiv:1710.10196
CycleGAN
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. https://junyanz.github.io/CycleGAN/
Generative Adversarial Text to Image Synthesis
https://arxiv.org/pdf/1605.05396.pdf
A Neural Algorithm of Artistic Style https://arxiv.org/pdf/1508.06576.pdf https://github.com/jcjohnson/neural-style
Recall our ImageNet benchmark . . . where do humans stand?
http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/
◮ Don’t confuse classification accuracy with understanding!
◮ Neural nets learn to optimize for a particular problem pretty well
◮ But in the end it’s just pixel statistics
◮ Humans can generalize and understand the context.
Microsoft CaptionBot: “I think it’s a group of people standing next to a man in a suit and tie.”
https://karpathy.github.io/2012/10/22/state-of-computer-vision/
◮ Deep nets can be fooled by deliberately crafted inputs
◮ Revealing: what deep nets learn is quite different from what humans learn
https://blog.openai.com/adversarial-example-research/
◮ Deep learning has been a big leap for computer vision
◮ We can solve some specific problems really well
◮ Still far away from true understanding of visual information
◮ Finnish non-profit state enterprise with special tasks
◮ Owned by the Finnish state (70%) and higher education institutions (30%)
◮ ICT expertise for research, education, public administration
◮ Services mostly free for universities and state research institutions
◮ You might have heard about: Funet, HAKA, eduroam, VIRTA, Finland’s fastest supercomputers, . . .
◮ Headquarters in Espoo (Keilaniemi), datacenter in Kajaani.
Some other services, which might be relevant for you:
◮ Notebooks – notebooks.csc.fi
  ◮ Jupyter notebooks, e.g., with deep learning environments
  ◮ Anyone with a student account can access
◮ Puhti CPU and GPU cluster
  ◮ 320 NVIDIA Volta V100 GPUs for deep learning
  ◮ requires a research project or university course
  ◮ https://research.csc.fi/dl2021-utilization
◮ In Q4/2020: EuroHPC pre-exascale supercomputer LUMI
  ◮ among the world’s fastest computers, ∼ 200 petaflop/s
  ◮ largely GPU based
  ◮ https://datacenter.csc.fi/wp/about-eurohpc/