Learning Based Vision II
Computer Vision, Fall 2018, Columbia University
Project
- Project Proposals due October 31
- Pick one of our suggested projects, or pitch your own
- Must use something in this course
- Groups of 2 strongly recommended
- If you want help finding a team, see post on Piazza
- We’ll give you Google Cloud credits once you turn in your project proposal
- Details here: http://w4731.cs.columbia.edu/project
Neural Networks
Figure: AlexNet activation sizes from input to output: 224x224x3 (input) → 55x55x96 (conv1) → 27x27x256 (conv2) → 13x13x384 (conv3) → 13x13x384 (conv4) → 13x13x256 (conv5) → 1x1x4096 (“fc6”) → 1x1x4096 (“fc7”) → 1x1x1000 (output)
Red layers are followed by max pooling
The visualization hides the dimensions of the filters
Convolutional Network (AlexNet)
Slide credit: Deva Ramanan
w_k ∈ ℝ^(w×h×D) (k = 1, ..., K filters),  x_i ∈ ℝ^(W×H×D),  x_{i+1} = w * x_i ∈ ℝ^(W×H×K)
Convolutional Layer
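To make the layer concrete, here is a minimal PyTorch sketch. The hyperparameters are chosen to reproduce the AlexNet conv1 shapes above; they are illustrative assumptions, not part of the slide.

```python
# A minimal sketch of a convolutional layer: K filters of size w x h x D slide
# over a W x H x D input and produce a W' x H' x K output.
import torch
import torch.nn as nn

D, K = 3, 96                             # input depth, number of filters (assumed)
conv = nn.Conv2d(in_channels=D, out_channels=K, kernel_size=11,
                 stride=4, padding=2)    # AlexNet-style conv1 hyperparameters

x = torch.randn(1, D, 224, 224)          # x_i  in R^{W x H x D}
y = conv(x)                              # x_{i+1} in R^{W' x H' x K}
print(y.shape)                           # torch.Size([1, 96, 55, 55])
```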
Learning
min_θ ∑_i ℒ(f(x_i; θ), y_i) + λ‖θ‖²₂

x_i: Input (image)   y_i: Target (labels)   θ: Parameters   f(x_i; θ): Prediction
ℒ: Loss Function, e.g. cross-entropy ℒ(z, y) = −∑_j y_j log z_j
Slide from Rob Fergus, NYU
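A minimal sketch of this objective in PyTorch, with a placeholder model and random data (not the course's actual setup): the optimizer's weight_decay plays the role of λ‖θ‖²₂, and CrossEntropyLoss implements the log-loss above.

```python
# Minimize cross-entropy loss plus an L2 penalty on the parameters with SGD.
import torch
import torch.nn as nn

model = nn.Linear(10, 3)                                 # stand-in for f(x; theta)
opt = torch.optim.SGD(model.parameters(), lr=0.1,
                      weight_decay=1e-4)                 # lambda * ||theta||^2
loss_fn = nn.CrossEntropyLoss()                          # L(z, y) = -sum_j y_j log z_j

x = torch.randn(8, 10)                                   # x_i: inputs
y = torch.randint(0, 3, (8,))                            # y_i: target labels

opt.zero_grad()
loss = loss_fn(model(x), y)                              # L(f(x_i; theta), y_i)
loss.backward()
opt.step()                                               # theta <- theta - alpha dL/dtheta
```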
Let’s break them
“school bus”
“school bus” “ostrich”
original image + perturbation (scaled for visualization) = adversarial image
Images on the left are correctly classified; images on the right are incorrectly classified as ostrich
How can we find these?
max_Δ ℒ(f(x + Δ), y) − λ‖Δ‖²₂
Solve optimization problem to find minimal change that maximizes the loss
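A minimal sketch of that optimization in PyTorch: gradient ascent on a perturbation Δ that increases the loss while an L2 penalty keeps it small. The classifier, image, and label are placeholders; real attacks (e.g. FGSM, PGD) differ in details such as step size, sign, and norm constraints.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # stand-in classifier
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(1, 3, 32, 32)          # a correctly classified image (placeholder)
y = torch.tensor([3])                  # its true label (placeholder)
delta = torch.zeros_like(x, requires_grad=True)
lam, lr = 0.1, 0.05

for _ in range(50):
    loss = loss_fn(model(x + delta), y) - lam * delta.pow(2).sum()
    loss.backward()
    with torch.no_grad():
        delta += lr * delta.grad       # ascend: maximize the loss
        delta.grad.zero_()

x_adv = x + delta.detach()             # looks like x, but is misclassified
```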
99% confidence! Also 99% confidence!
Nguyen et al., Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images
Universal attacks
Attack is agnostic to the image content
Moosavi-Dezfooli et al., arXiv:1610.08401
Change just one pixel
Su et al, “One pixel attack for fooling deep neural networks”
In the physical world
In the 3D physical world
Neural network camouflage
https://cvdazzle.com/
Which Pixels in the Input Affect the Neuron the Most?
- Rephrased: which pixels would make the neuron not turn on if they had been different?
- In other words, for which inputs x_j is ∂(neuron)/∂x_j large?
Typical Gradient of a Neuron
- Visualize the gradient of a particular neuron with respect to the
input x
- Do a forward pass:
- Compute the gradient of a particular neuron using backprop:
“Guided Backpropagation”
- Idea: neurons act like detectors of particular image
features
- We are only interested in what image features the
neuron detects, not in what kind of stuff it doesn’t detect
- So when propagating the gradient, we set all the
negative gradients to 0
- We don’t care if a pixel “suppresses” a neuron somewhere along the path to our neuron
Guided Backpropagation
At each layer: compute the gradient, zero out the negatives, and backpropagate to the layer below
Guided Backpropagation
Backprop Guided Backprop
Guided Backpropagation
Springenberg et al., Striving for Simplicity: The All Convolutional Net (ICLR 2015 workshop)
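A minimal sketch of guided backpropagation in PyTorch, assuming torchvision's pretrained AlexNet: a backward hook on every ReLU zeroes the negative gradients as they flow back to the input. The class index and the input image are placeholders.

```python
import torch
import torch.nn as nn
import torchvision.models as models

model = models.alexnet(weights="IMAGENET1K_V1").eval()   # downloads pretrained weights

def zero_negative_grads(module, grad_in, grad_out):
    # keep only positive gradients flowing back through this ReLU
    return (torch.clamp(grad_in[0], min=0.0),)

for m in model.modules():
    if isinstance(m, nn.ReLU):
        m.inplace = False                                 # hooks need non-inplace ReLUs
        m.register_full_backward_hook(zero_negative_grads)

x = torch.randn(1, 3, 224, 224, requires_grad=True)      # input image (placeholder)
score = model(x)[0, 130]                                  # some class neuron (assumed index)
score.backward()
saliency = x.grad                                         # guided-backprop visualization
```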
What About Doing Gradient Descent?
- Want to maximize the i-th output of the softmax
- Can compute the gradient of the i-th output of the
softmax with respect to the input x (the W’s and b’s are fixed to make classification as good as possible)
- Perform gradient descent on the input
Yosinski et al, Understanding Neural Networks Through Deep Visualization (ICML 2015)
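A minimal sketch of gradient ascent on the input: the weights stay fixed and the image itself is updated to increase one class score. The regularizers (blurring, jitter, etc.) used by Yosinski et al. are omitted, and the class index is an assumed example.

```python
import torch
import torchvision.models as models

model = models.alexnet(weights="IMAGENET1K_V1").eval()
for p in model.parameters():
    p.requires_grad_(False)                     # the W's and b's are fixed

img = torch.zeros(1, 3, 224, 224, requires_grad=True)
target_class = 130                              # assumed example class index
opt = torch.optim.SGD([img], lr=1.0)

for _ in range(200):
    opt.zero_grad()
    score = model(img)[0, target_class]
    (-score).backward()                         # descend on -score = ascend on score
    opt.step()                                  # img gradually becomes the "preferred" input
```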
Image → ConvNet → P(category)
What if we learn to generate adversarial examples?
Noise → ConvNet → P(category)
Generative Adversarial Networks
Goodfellow et al.  Noise → G (generator) → D (discriminator) → P(real)
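A minimal sketch of the adversarial game: the discriminator D learns to output P(real), while the generator G learns to fool it. The fully connected G and D here are toy stand-ins, not the architectures from the paper.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real):                       # real: a batch of flattened images
    b = real.size(0)
    fake = G(torch.randn(b, 100))           # noise -> generated images

    # discriminator: real -> 1, fake -> 0
    opt_d.zero_grad()
    d_loss = bce(D(real), torch.ones(b, 1)) + bce(D(fake.detach()), torch.zeros(b, 1))
    d_loss.backward()
    opt_d.step()

    # generator: fool D into predicting "real" on fakes
    opt_g.zero_grad()
    g_loss = bce(D(fake), torch.ones(b, 1))
    g_loss.backward()
    opt_g.step()
```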
Generated images
Trained with CIFAR-10
Introduced a form of ConvNet more stable under adversarial training than previous attempts.
Generator
Random uniform vector (100 numbers)
Synthesized images
Transposed-convolution
Convolution vs. transposed-convolution
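A minimal sketch of a DCGAN-style generator built from transposed convolutions: a 100-dimensional random vector is repeatedly upsampled until it becomes a 64x64 RGB image. The channel widths and output size are assumptions, not taken from the slide.

```python
import torch
import torch.nn as nn

generator = nn.Sequential(
    nn.ConvTranspose2d(100, 512, kernel_size=4, stride=1, padding=0),  # 1x1  -> 4x4
    nn.BatchNorm2d(512), nn.ReLU(),
    nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1),  # 4x4  -> 8x8
    nn.BatchNorm2d(256), nn.ReLU(),
    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),  # 8x8  -> 16x16
    nn.BatchNorm2d(128), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),   # 16x16 -> 32x32
    nn.BatchNorm2d(64), nn.ReLU(),
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),     # 32x32 -> 64x64
    nn.Tanh(),
)

z = torch.randn(16, 100, 1, 1)       # random uniform/normal vector (100 numbers) per image
images = generator(z)                # torch.Size([16, 3, 64, 64])
```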
Generated Images
Brock et al. Large scale GAN training for high fidelity natural image synthesis
Image Interpolation
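A minimal sketch of how such interpolations are made, assuming any trained generator (a toy stand-in is used here): blend two latent codes and decode each blend, giving a smooth morph between two generated images.

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(100, 784), nn.Tanh())     # stand-in generator

z0, z1 = torch.randn(1, 100), torch.randn(1, 100)      # two latent codes
frames = [G((1 - t) * z0 + t * z1)                     # linear blend in latent space
          for t in torch.linspace(0.0, 1.0, steps=8)]
```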
Nearest Neighbors
Generating Dynamics
Two components
- Generator
- Network to visualize: conv1 → conv2 → conv3 → conv4 → conv5 → fc6 → fc7 → classification layer, with a unit to visualize (e.g. “car”, “table lamp”)
Synthesizing Images Preferred by CNN
Nguyen A, Dosovitskiy A, Yosinski J, Brox T, Clune J. "Synthesizing the preferred inputs for neurons in neural networks via deep generator networks." arXiv:1605.09304, 2016.
ImageNet-Alexnet-final units (class units)
Where to start training?
Gradient Descent
θ ← θ − α ∂ℒ/∂θ
How to pick where to start?
Idea 0: Train many models
Drop-out regularization
(a) Standard Neural Net (b) After applying dropout.
- Intuition: we should really train a family of models with different architectures and average their predictions (c.f. model averaging from machine learning)
- Practical implementation: learn a single “superset” architecture that randomly removes nodes (by randomly zeroing out activations) during gradient updates
Slide credit: Deva Ramanan
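A minimal sketch of the two modes of dropout in PyTorch: during training, activations are randomly zeroed (and rescaled); at test time the layer is a no-op, approximating an average over the family of thinned networks.

```python
import torch
import torch.nn as nn

layer = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Dropout(p=0.5))

x = torch.randn(4, 10)
layer.train()
print(layer(x))     # roughly half the activations are zeroed each forward pass
layer.eval()
print(layer(x))     # deterministic: dropout is disabled at test time
```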
Idea 1: Carefully pick starting point
Backprop
x_0 → f_1 → x_1 → f_2 → x_2 → ... → f_L → x_L → ℓ → z ∈ ℝ, with weights w_1, w_2, ..., w_L

dz/dw_l = d/dw_l [ ℓ_y ∘ f_L(·; w_L) ∘ ... ∘ f_2(·; w_2) ∘ f_1(x_0; w_1) ]

dz/dw_l = dz/d(vec x_L)ᵀ · d vec x_L/d(vec x_{L−1})ᵀ · ... · d vec x_{l+1}/d(vec x_l)ᵀ · d vec x_l/dw_lᵀ
Slide credit: Deva Ramanan
Idea 1: Carefully pick starting point
He et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Exploding and vanishing gradient
- How does the determinant of each layer’s gradient (Jacobian) affect the final gradient?
- What if the determinant is less than one?
- What if the determinant is greater than one?
dz/dw_l = dz/d(vec x_L)ᵀ · d vec x_L/d(vec x_{L−1})ᵀ · ... · d vec x_{l+1}/d(vec x_l)ᵀ · d vec x_l/dw_lᵀ
Exploding and vanishing gradient
Source: Roger Grosse
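A tiny numerical illustration of the point above (the per-layer factors are made up, not taken from the slide): because the end-to-end gradient is a product of per-layer factors, factors below 1 shrink it exponentially and factors above 1 blow it up.

```python
depth = 50
for factor in (0.9, 1.1):
    grad = 1.0
    for _ in range(depth):
        grad *= factor        # one multiplicative factor per layer
    print(f"factor {factor}: gradient after {depth} layers ~ {grad:.2e}")
# factor 0.9: ~ 5.15e-03 (vanishing); factor 1.1: ~ 1.17e+02 (exploding)
```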
Initialization
- Key idea: initialize weights so that the variance of the activations is one at each layer
- You can derive what this should be for different layers and
nonlinearities
- For ReLU:
He et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
w_i ∼ 𝒩(0, 2/k),  b_i = 0
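A minimal sketch of this initialization for a ReLU layer: weights drawn from a zero-mean Gaussian with variance 2/k (k = fan-in), biases set to 0. PyTorch provides it as kaiming_normal_; the equivalent manual version is shown as well.

```python
import torch
import torch.nn as nn

layer = nn.Linear(512, 256)

# built-in He initialization for ReLU nonlinearities
nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")
nn.init.zeros_(layer.bias)

# equivalent by hand: w ~ N(0, 2/k), b = 0
k = layer.weight.size(1)                      # fan-in
with torch.no_grad():
    layer.weight.normal_(0.0, (2.0 / k) ** 0.5)
    layer.bias.zero_()
```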
Idea 2: How to maintain this throughout training?
Batch Normalization
! " ! = ! − % & ' = (" ! + *
- %: mean of ! in mini-batch
- &: std of ! in mini-batch
- (: scale
- *: shift
- %, &: functions of !,
analogous to responses
- (, *: parameters to be learned,
analogous to weights
Ioffe & Szegedy. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”. ICML 2015
Batch Normalization
2 modes of BN:
- Train mode: μ, σ are functions of the current mini-batch of x
- Test mode: μ, σ are pre-computed on the training set
Caution: make sure your BN usage is correct! (this has caused many bugs in my research experience!)
x̂ = (x − μ) / σ,   y = γ x̂ + β
Ioffe & Szegedy. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”. ICML 2015
Batch Normalization
Figure credit: Ioffe & Szegedy
Figure: accuracy vs. iteration, with and without BN
Ioffe & Szegedy. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”. ICML 2015
Back to breaking things…
Architecture of Krizhevsky et al.
- 8 layers total
- Trained on Imagenet
dataset [Deng et al. CVPR’09]
- 18.2% top-5 error
- Our reimplementation:
18.1% top-5 error
Input Image → Layer 1: Conv + Pool → Layer 2: Conv + Pool → Layer 3: Conv → Layer 4: Conv → Layer 5: Conv + Pool → Layer 6: Full → Layer 7: Full → Softmax Output
Architecture of Krizhevsky et al.
- Remove top fully
connected layer
– Layer 7
- Drop 16 million
parameters
- Only 1.1% drop in
performance!
Input Image → Layer 1: Conv + Pool → Layer 2: Conv + Pool → Layer 3: Conv → Layer 4: Conv → Layer 5: Conv + Pool → Layer 6: Full → Softmax Output
Architecture of Krizhevsky et al.
- Remove both fully connected
layers
– Layer 6 & 7
- Drop ~50 million parameters
- 5.7% drop in performance
Input Image → Layer 1: Conv + Pool → Layer 2: Conv + Pool → Layer 3: Conv → Layer 4: Conv → Layer 5: Conv + Pool → Softmax Output
Architecture of Krizhevsky et al.
- Now try removing upper feature
extractor layers:
– Layers 3 & 4
- Drop ~1 million parameters
- 3.0% drop in performance
Input Image → Layer 1: Conv + Pool → Layer 2: Conv + Pool → Layer 5: Conv + Pool → Layer 6: Full → Layer 7: Full → Softmax Output
Architecture of Krizhevsky et al.
- Now try removing upper feature
extractor layers & fully connected:
– Layers 3, 4, 6, 7
- Now only 4 layers
- 33.5% drop in performance
→ Depth of network is key
Input Image → Layer 1: Conv + Pool → Layer 2: Conv + Pool → Layer 5: Conv + Pool → Softmax Output
What should happen if I train a deeper network?
Figure 1. Training error (left) and test error (right) on CIFAR-10 for 20-layer and 56-layer networks (x-axis: iter. (1e4); y-axis: error (%))
Simply stacking layers?
Figure: error (%) vs. iter. (1e4) for plain nets of increasing depth. Left: CIFAR-10 (plain-20, plain-32, plain-44, plain-56). Right: ImageNet-1000 (plain-18, plain-34). Solid: test/val; dashed: train.
- “Overly deep” plain nets have higher training error
- A general phenomenon, observed in many datasets
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”. CVPR 2016.
Figure: a shallower model (18 layers) vs. a deeper counterpart (34 layers). The 34-layer net is the 18-layer net with “extra” 3x3 conv layers inserted (7x7 conv, 64, /2, followed by stacks of 3x3 convs with 64, 128, 256, and 512 channels, ending in fc 1000).
- Richer solution space
- A deeper model should not have higher
training error
- A solution by construction:
- original layers: copied from a
learned shallower model
- extra layers: set as identity
- at least the same training error
- Optimization difficulties: solvers cannot
find the solution when going deeper…
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”. CVPR 2016.
Deep Residual Learning
- Plain net
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”. CVPR 2016.
Any two stacked layers: x → weight layer → relu → weight layer → relu → H(x)
H(x) is any desired mapping; hope the 2 weight layers fit H(x)
Deep Residual Learning
- ! " is a residual mapping w.r.t. identity
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”. CVPR 2016.
- If identity were optimal,
easy to set weights as 0
- If optimal mapping is closer to identity,
easier to find small fluctuations weight layer weight layer
relu relu
" # " = ! " + "
identity
" !(")
CIFAR-10 experiments
Figure: error (%) vs. iter. (1e4) on CIFAR-10. Left: plain nets (plain-20, plain-32, plain-44, plain-56). Right: ResNets (ResNet-20, ResNet-32, ResNet-44, ResNet-56, ResNet-110).
- Deep ResNets can be trained without difficulties
- Deeper ResNets have lower training error, and also lower test error
solid: test dashed: train
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”. CVPR 2016.
ImageNet experiments
Figure: error (%) vs. iter. (1e4) on ImageNet. Left: plain nets (plain-18, plain-34). Right: ResNets (ResNet-18, ResNet-34). Solid: test; dashed: train.
- Deep ResNets can be trained without difficulties
- Deeper ResNets have lower training error, and also lower test error
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”. CVPR 2016.
How much data do you need?
Systematic evaluation of CNN advances on the ImageNet
How much data do you need?
CNN Features off-the-shelf: an Astounding Baseline for Recognition
Next Class
Neural networks for visual recognition