8. More Tasks in Computer Vision, CS 519 Deep Learning, Winter 2018



slide-1
SLIDE 1
  • 8. More Tasks in Computer Vision

CS 519 Deep Learning, Winter 2018 Fuxin Li

With materials from Zsolt Kira, Roger Grosse, Nitish Srivastava

slide-2
SLIDE 2

Image Classification History

  • Caltech datasets:
  • Caltech-101: 3,030 images in 101 categories
  • Caltech-256: 30,607 images in 256 categories
  • ImageNet
  • Full set: more than 10 million images
  • WordNet taxonomy
  • Challenge: 1.2 million images in 1,000 categories

  • Dog breeds

Caltech-256

slide-3
SLIDE 3

ImageNet

slide-4
SLIDE 4

Dog breeds

  • More than 120 different dog breeds in the dataset
  • Hard for humans to discriminate
slide-5
SLIDE 5

ILSVRC

[Figure: ILSVRC results chart; labeled entries: AlexNet, VGG, and a 152-layer conv net (covered later) at 3.6%]

slide-6
SLIDE 6

AlexNet (60 million parameters)

slide-7
SLIDE 7

The VGG Network (138 Million parameters)

[Figure: VGG feature-map resolutions 224 x 224, 224 x 224, 112 x 112, 56 x 56, 28 x 28, 14 x 14, 7 x 7, followed by class outputs: Airplane, Dog, Car, SUV, Minivan, Sign, Pole, ……]

(Simonyan and Zisserman 2014)
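
The resolutions listed for VGG follow from simple arithmetic. A minimal sketch, assuming (as in the VGG paper) that the convolutions are padded so they keep the spatial size and that each of the five pooling stages is a 2 x 2 max pool with stride 2. The function name `vgg_resolutions` is made up for illustration.

```python
def vgg_resolutions(input_size=224, num_pools=5):
    """Spatial size of the feature map before and after each pooling stage."""
    sizes = [input_size]
    for _ in range(num_pools):
        # A 2 x 2 max pool with stride 2 halves the height and width.
        sizes.append(sizes[-1] // 2)
    return sizes


print(vgg_resolutions())
```

The final 7 x 7 map is what the first fully connected layer of the network consumes.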

slide-8
SLIDE 8

Softmax Cross-Entropy

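  • Softmax layer in multi-class classification: $P(y = j \mid \mathbf{x}) = \frac{e^{\mathbf{x}^\top \mathbf{w}_j}}{\sum_k e^{\mathbf{x}^\top \mathbf{w}_k}}$
  • Log-likelihood: $\log P(y = j \mid \mathbf{x}) = \mathbf{x}^\top \mathbf{w}_j - \log \sum_k e^{\mathbf{x}^\top \mathbf{w}_k}$
  • Loss function is minus log-likelihood: $-\log P(y = j \mid \mathbf{x}) = -\mathbf{x}^\top \mathbf{w}_j + \log \sum_k e^{\mathbf{x}^\top \mathbf{w}_k}$
  • Total energy: $\min_{\mathbf{W}} \sum_i -\log P(y = y_i \mid \mathbf{x}_i)$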

slide-9
SLIDE 9

Reinforcement Learning: Atari games

  • Predict the Q-function (value function) of each move using the current scene as input

  • Use standard MDP value iteration to decide the best current move

Mnih et al. Playing Atari with Deep Reinforcement Learning
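
How the predicted Q-values are used can be sketched as follows. This is an illustration under assumptions: `q_values` is a plain list standing in for the network's output (one entry per move), and `gamma` is a discount factor with an arbitrary default. The target `reward + gamma * max(next_q_values)` is the usual Q-learning (Bellman) backup.

```python
def greedy_action(q_values):
    """Pick the move with the highest predicted Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def q_target(reward, next_q_values, gamma=0.99, terminal=False):
    """Regression target for Q(s, a) after observing one transition."""
    if terminal:
        return reward  # no future reward once the game is over
    return reward + gamma * max(next_q_values)


print(greedy_action([0.1, 0.7, 0.3]))  # index of the best move
```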

slide-10
SLIDE 10

Reinforcement Learning: Playing go

  • Predict next 3 moves using CNN
  • Combine with Monte Carlo Tree Search to obtain state-of-the-art Go-playing system

Tian and Zhu. arXiv:1511.06410

slide-11
SLIDE 11
slide-12
SLIDE 12

Object Detection

  • Faster/Mask R-CNN:
  • Deep network on object proposals
  • Jointly train the network to propose boxes inside the image and to classify the content of each box

slide-13
SLIDE 13

Predicting Regions

slide-14
SLIDE 14

Semantic Segmentation

  • Given an image, identify the category and spatial extent of all relevant objects

[Figure: Segment-based Framework. Panels: Image, Category Label (Horse, Person), Object Label (Obj 1, Obj 2, Obj 3, Obj 4)]

slide-15
SLIDE 15

Fully Convolutional Network

  • Idea: fully connected layers can be turned into convolutional layers
  • Zero-padding can help output more numbers!

[Figure: 512 x 7 x 7 feature map, convolved with 7x7 filters, giving 4096 outputs]
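
The equivalence between a fully connected layer and a convolution can be checked on a toy example. A minimal sketch with made-up helper names and tiny sizes (2 channels, 2 x 2 filters) standing in for the 512-channel, 7 x 7 case: on an input of exactly the filter size, the convolution returns the same single number as the dense layer; on a larger input, the same weights return a grid of outputs.

```python
def conv_valid(feature, filt):
    """feature: [C][H][W], filt: [C][K][K] -> [H-K+1][W-K+1] output map."""
    channels = len(feature)
    height, width = len(feature[0]), len(feature[0][0])
    k = len(filt[0])
    out = []
    for i in range(height - k + 1):
        row = []
        for j in range(width - k + 1):
            row.append(sum(feature[c][i + di][j + dj] * filt[c][di][dj]
                           for c in range(channels)
                           for di in range(k)
                           for dj in range(k)))
        out.append(row)
    return out


def fully_connected(feature, filt):
    """The same weights applied as a dense layer on the flattened input."""
    flat_x = [v for channel in feature for row in channel for v in row]
    flat_w = [v for channel in filt for row in channel for v in row]
    return sum(x * w for x, w in zip(flat_x, flat_w))


filt = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
small = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # same size as the filter
large = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
         [[0, 0, 0], [0, 0, 0], [0, 0, 0]]]   # larger input

print(fully_connected(small, filt), conv_valid(small, filt))
print(conv_valid(large, filt))  # one output per filter position
```

Each output unit of the original dense layer becomes one such filter, so a layer with 4096 outputs corresponds to 4096 filters.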

slide-16
SLIDE 16

Decoding Step (Deconvolution)

  • Can also train network to “decode”
  • Suppose CNN is an “encoding” process
  • One could train a “decoder” to retain the full image resolution
  • Decoder is another CNN, with filter weights tied/not tied to the filter weights in the “encoder” CNN

  • One could use “Un-max-pooling” to increase resolution
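
The slide does not spell out how un-max-pooling works. One common variant, used in the deconvolution network of Noh et al., records where each maximum came from during pooling and writes the pooled value back to that position, filling the rest with zeros. A minimal sketch of that variant for 2 x 2 pooling on a single channel, with function names chosen here for illustration:

```python
def max_pool_2x2(x):
    """x: [H][W] with even H and W. Returns (pooled map, argmax positions)."""
    pooled, switches = [], []
    for i in range(0, len(x), 2):
        pooled_row, switch_row = [], []
        for j in range(0, len(x[0]), 2):
            window = [(x[i + di][j + dj], (i + di, j + dj))
                      for di in range(2) for dj in range(2)]
            value, position = max(window, key=lambda t: t[0])
            pooled_row.append(value)
            switch_row.append(position)  # remember where the max was
        pooled.append(pooled_row)
        switches.append(switch_row)
    return pooled, switches


def max_unpool_2x2(pooled, switches, height, width):
    """Place each pooled value back at its recorded position; zeros elsewhere."""
    out = [[0] * width for _ in range(height)]
    for pooled_row, switch_row in zip(pooled, switches):
        for value, (i, j) in zip(pooled_row, switch_row):
            out[i][j] = value
    return out


x = [[1, 2, 0, 0],
     [3, 4, 0, 5],
     [0, 0, 9, 8],
     [7, 0, 6, 0]]
pooled, switches = max_pool_2x2(x)
print(pooled)
print(max_unpool_2x2(pooled, switches, 4, 4))
```

The unpooled map is sparse, which is why it is followed by a (learned) deconvolution layer that fills in the details.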

slide-17
SLIDE 17

Deconvolution used for finer details

  • Same convolutional networks – deconvolute all the way
  • H. Noh, S. Hong, B. Han. Learning Deconvolution Network for Semantic Segmentation. ICCV 2015
slide-18
SLIDE 18

Deconvolution

[Figure: some conv result, followed by four alternating stages of un-max-pooling and deconvolution]

slide-19
SLIDE 19

U-Net: Add linkage

  • Add linkage between conv layers and deconv layers with the same resolution

  • Improves spatial precision and helps at boundaries (low-level information)
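
In U-Net the linkage is a concatenation along the channel axis. A minimal sketch, assuming feature maps are stored as nested lists `[C][H][W]` and that the two maps already have the same height and width (the original U-Net crops the encoder map to make this true). `skip_concat` is a name chosen for illustration.

```python
def skip_concat(encoder_fmap, decoder_fmap):
    """Stack encoder and decoder feature maps ([C][H][W]) along channels."""
    enc_shape = (len(encoder_fmap[0]), len(encoder_fmap[0][0]))
    dec_shape = (len(decoder_fmap[0]), len(decoder_fmap[0][0]))
    if enc_shape != dec_shape:
        raise ValueError("linked layers must have the same resolution")
    # List concatenation on the outer axis is channel concatenation here.
    return encoder_fmap + decoder_fmap


encoder = [[[1, 2], [3, 4]]]                    # 1 channel, 2 x 2
decoder = [[[0, 0], [0, 0]], [[5, 5], [5, 5]]]  # 2 channels, 2 x 2
merged = skip_concat(encoder, decoder)
print(len(merged))  # number of channels after the linkage
```

The following conv layer then sees both the coarse decoder features and the fine, low-level encoder features at each pixel.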

slide-20
SLIDE 20

Sample results for deconvolution-based semantic segmentation

slide-21
SLIDE 21

Other trivia: Fine-Tuning

  • Take pre-trained network

  • Remove last layer

  • Add your new layer

  • Say with 10 classes

  • Best results
  • Train last layer
  • Retrain entire network
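
The "train last layer" step can be sketched on a toy problem. Everything here is a stand-in chosen for illustration: `features` plays the role of the frozen pre-trained layers (a single tanh layer), the new last layer is a softmax classifier without bias, and the learning rate and epoch count are arbitrary. Retraining the entire network would additionally update `frozen_weights`, which this sketch does not do.

```python
import math


def features(x, frozen_weights):
    """Stand-in for the pre-trained layers: one tanh layer, never updated."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
            for row in frozen_weights]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def head_loss(data, frozen_weights, head):
    """Total softmax cross-entropy of the new last layer on the data."""
    loss = 0.0
    for x, y in data:
        h = features(x, frozen_weights)
        p = softmax([sum(w * hj for w, hj in zip(row, h)) for row in head])
        loss -= math.log(p[y])
    return loss


def train_last_layer(data, frozen_weights, num_classes, lr=0.5, epochs=50):
    """SGD on the new last layer only; frozen_weights is left untouched."""
    head = [[0.0] * len(frozen_weights) for _ in range(num_classes)]
    for _ in range(epochs):
        for x, y in data:
            h = features(x, frozen_weights)
            p = softmax([sum(w * hj for w, hj in zip(row, h)) for row in head])
            for k in range(num_classes):
                grad = p[k] - (1.0 if k == y else 0.0)  # d loss / d score_k
                for j, hj in enumerate(h):
                    head[k][j] -= lr * grad * hj
    return head


frozen = [[1.0, 0.0], [0.0, 1.0]]
data = [([2.0, 0.0], 0), ([0.0, 2.0], 1), ([-2.0, -2.0], 2)]
head = train_last_layer(data, frozen, num_classes=3)
print(head_loss(data, frozen, head))
```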

slide-22
SLIDE 22

Fine-tuning