Convolutional Neural Nets
EECS 442 – David Fouhey Fall 2019, University of Michigan
http://web.eecs.umich.edu/~fouhey/teaching/EECS442_F19/
Previously – Backpropagation
Example: g(x) = (−x + 3)², built from simple blocks: negate, add 3, square.
Forward pass: compute the function: x → −x → −x + 3 → (−x + 3)².
Backward pass: compute the derivative of all parts of the function; chaining them gives dg/dx = −2(−x + 3) = 2x − 6.
Setting Up A Neural Net
[Diagram: inputs x1, x2 → hidden units h1–h4 → outputs y1–y3]
Setting Up A Neural Net
[Diagram: inputs x1, x2 → hidden layer 1 (h1–h4) → hidden layer 2 (a1–a4) → outputs y1–y3]
Fully Connected Network
Each neuron connects to each neuron in the previous layer
[Diagram: every unit in each layer connected to every unit in the previous layer]
Fully Connected Network
h: all layer values
w_i, b_i: neuron i weights, bias
g: activation function

h = g(Wx + b)

[Diagram: the whole layer computed at once as a matrix multiply; each row of W holds one neuron's weights, and the bias vector b holds the per-neuron biases]
Fully Connected Network
Define New Block: “Linear Layer”
(Ok technically it’s Affine)
[Block diagram: inputs x, W, b → linear layer L → output n = Wx + b]
Can get the gradient with respect to all the inputs (derive it on your own; useful trick: the dimensions have to work out for the matrix multiply).
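As a concrete sketch of the linear-layer block and its gradients: the helper names below (linear_forward, linear_backward) are my own, not from the slides, and the loss used for the check is simply L = sum(n).

```python
import numpy as np

def linear_forward(W, b, x):
    # The linear (affine) layer block: n = Wx + b
    return W @ x + b

def linear_backward(W, x, dn):
    # Given dL/dn flowing in from above, return dL/dW, dL/db, dL/dx.
    # Useful trick: the dimensions have to work out for the matrix multiply.
    dW = np.outer(dn, x)   # (out, in), same shape as W
    db = dn                # same shape as b
    dx = W.T @ dn          # same shape as x, passed to the layer below
    return dW, db, dx

# Check one entry of dW against a finite difference, pretending L = sum(n):
rng = np.random.default_rng(0)
W, b, x = rng.normal(size=(3, 2)), rng.normal(size=3), rng.normal(size=2)
dW, db, dx = linear_backward(W, x, np.ones(3))
eps = 1e-6
W2 = W.copy(); W2[0, 1] += eps
numeric = (linear_forward(W2, b, x).sum() - linear_forward(W, b, x).sum()) / eps
```

The analytic entry dW[0, 1] should match the numerical derivative closely.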
Fully Connected Network
[Diagram: x → Linear(W1, b1) → f → Linear(W2, b2) → f → Linear(W3, b3) → output]
Fully Connected Network
[Diagram: x → Linear(W1, b1) → f → Linear(W2, b2) → f → Linear(W3, b3) → output]
Backpropagation lets us calculate the derivative of the output/error with respect to all the Ws at a given point x.
Putting It All Together – 1
[Diagram: x → Linear(W1, b1) → f → Linear(W2, b2) → f → Linear(W3, b3)]
Function: NN(x; Wi, bi), parameterized by W = {Wi, bi}
Putting It All Together
[Diagram: x → Linear(W1, b1) → f → Linear(W2, b2) → f → Linear(W3, b3) → compared against label y → Loss]
Function: NN(x; Wi, bi). Training objective: Loss(NN(x; Wi, bi), y)
Putting It All Together
W = initializeWeights()
for i in range(numIterations):
    # sample a batch
    batch = random.subset(0, numDatapoints, K)
    batchX, batchY = dataX[batch], dataY[batch]
    # compute gradient with batch
    gradW = backprop(Loss(NN(batchX, W), batchY))
    # update W with gradient step
    W += -stepsize * gradW
return W
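The loop above is pseudocode; here is a runnable sketch of the same minibatch gradient descent, using the simplest possible "network" NN(x; W) = xW with a squared loss. The data, step size, and batch size are made-up illustration values, and the gradient is written by hand rather than via a backprop call.

```python
import numpy as np

# Toy dataset: noiseless labels from a known linear map, so gradient
# descent should recover trueW.
rng = np.random.default_rng(0)
dataX = rng.normal(size=(100, 2))
trueW = np.array([[2.0], [-3.0]])
dataY = dataX @ trueW

W = np.zeros((2, 1))          # initializeWeights()
stepsize, K = 0.1, 16
for i in range(500):          # numIterations
    batch = rng.integers(0, len(dataX), K)      # sample a batch
    bX, bY = dataX[batch], dataY[batch]
    pred = bX @ W                               # NN(batchX, W)
    gradW = 2 * bX.T @ (pred - bY) / K          # gradient of mean sq. loss
    W += -stepsize * gradW                      # update W with gradient step
```

After training, W should be close to trueW.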
What Can We Represent?
[Diagram: x → Linear(W, b) → f → y1, a one-hidden-layer network with units h1–h4]
What Can We Represent?
Can We Train a Network To Do It?
[Scatter plot: labeled data points in the (x1, x2) plane; target output y1]
Can We Train a Network To Do It?
[Scatter plot: the same data, with hidden units h1–h4 overlaid]
Can We Train a Network To Do It?
[Scatter plot: the data, with two lines defined by w1 and w2]

max(w1ᵀx + b, 0) + max(−(w1ᵀx + b), 0) = |w1ᵀx + b| — (scaled) distance to the line defined by w1
max(w2ᵀx + b, 0) + max(−(w2ᵀx + b), 0) = |w2ᵀx + b| — (scaled) distance to the line defined by w2
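The construction above rests on the identity max(t, 0) + max(−t, 0) = |t|: a pair of ReLU units fed wᵀx + b measures how far x is from the line wᵀx + b = 0 (up to the scale ||w||). A quick check, with arbitrary made-up w, b, x:

```python
def relu(t):
    # The ReLU activation max(t, 0)
    return max(t, 0.0)

w, b = (1.0, -2.0), 0.5
x = (0.3, 0.7)
t = w[0] * x[0] + w[1] * x[1] + b   # w^T x + b, here about -0.6
pair = relu(t) + relu(-t)           # the two hidden units, summed
```

Whatever the sign of t, the pair sums to |t|.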
Can We Train a Network To Do It?
[Scatter plot: the data re-plotted by distance to w1's line vs. distance to w2's line]
Next layer computes: (w1 distance) − (w2 distance) > 0
Can We Train a Network To Do It?
Result: feedforward neural networks with a finite number of neurons in a hidden layer can approximate any reasonable* function
*Continuous, with bounded domain.
Cybenko (1989) for neural networks with sigmoids; Hornik (1991) more generally. In practice, this doesn't give a practical guarantee. Why?
Developing Intuitions
There is no royal road to geometry. – Euclid
Verify everything you do; be skeptical of everything you are told. Imagine how you would set the weights by hand if you were forced to act as a deep net.
Parameters
[Diagram: x1, x2 → y1]
How many parameters does this network have? Weights: 1×2 = 2. Parameters: 2 + 1 = 3 (don't forget the bias!)
Parameters
How many parameters does this network have?
[Diagram: x1, x2 → h1–h4 → y1]
Weights: 1×4 + 4×2 = 12. Parameters: 12 + 5 = 17 (5 biases: one per hidden unit plus the output).
Parameters
How many parameters does this network have? Weights: 3×4 + 4×4 + 4×2 = 36. Parameters: 36 + 11 = 47 (11 biases: 4 + 4 + 3).
[Diagram: inputs x1, x2 → two hidden layers of four units → outputs y1–y3]
Parameters
Flatten the image into a P×1 vector x; pass it through three hidden layers of H neurons each, then O output neurons.
Parameters: (H·P + H) + (H·H + H) + (H·H + H) + (O·H + O)
P: a 285×350 picture (terrible resolution!), H: 1000, O: 3 → ~102 million parameters (~400 MB)
Parameters
[Diagram: image flattened to a P×1 vector x → H neurons]
This squeezes all the visual information into a single H-dimensional vector. Suppose you want one neuron to represent dx/dy at each pixel. How many neurons do you need?
Parameters
Flatten the image into a P×1 vector x; three hidden layers of H neurons each, then O output neurons.
Parameters: (H·P + H) + (H·H + H) + (H·H + H) + (O·H + O)
P: 285×350, H: 2P, O: 3 → ~100 billion parameters (~400 GB)
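Plugging the numbers into the parameter-count formula (three hidden layers of H neurons on a flattened P-pixel image, O outputs); the function name fc_params is my own:

```python
def fc_params(P, H, O):
    # (H*P + H) + (H*H + H) + (H*H + H) + (O*H + O):
    # three hidden layers of H units each, then O outputs, with biases.
    return (H * P + H) + (H * H + H) + (H * H + H) + (O * H + O)

P = 285 * 350
small = fc_params(P, H=1000, O=3)   # ~102 million parameters
big = fc_params(P, H=2 * P, O=3)    # ~100 billion parameters
```

At 4 bytes per weight these come out to roughly 400 MB and 400 GB, matching the slides.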
Convnets
Keep Spatial Resolution Around
Neural net: flatten into a P×1 vector. Data: vector (F×1). Transform: matrix multiply.
Convnet: keep the image dimensions. Data: image (H×W×F). Transform: convolution.
Convnet
[Images as 3D volumes (height × width × depth): e.g., 300×500×3 or 32×32×3]
Convnet
[Diagram: a neuron looking at a 32×32×3 input volume]
Fully connected: connects to everything. Convnet: connects locally.
Slide credit: Karpathy and Fei-Fei
Convnet
[Diagram: a neuron looking at a 32×32×3 input volume]
Neuron is the same: a weighted linear average over a local Fh × Fw × d window:

output(x, y) = sum_{j=1..Fh} sum_{k=1..Fw} sum_{l=1..d} F[j,k,l] · I[x+j, y+k, l]

Slide credit: Karpathy and Fei-Fei
Convnet
[Diagram: a neuron looking at a 32×32×3 input volume]
Neuron is the same: a weighted linear average:

output(x, y) = sum_{j=1..Fh} sum_{k=1..Fw} sum_{l=1..d} F[j,k,l] · I[x+j, y+k, l]

Filter is local in space: the sum covers only an Fh × Fw window.
Filter is global over channels/depth: the sum covers all d channels.
Slide credit: Karpathy and Fei-Fei
Convnet
Get a spatial output by sliding the filter over the image:

output(x, y) = sum_{j=1..Fh} sum_{k=1..Fw} sum_{l=1..d} F[j,k,l] · I[x+j, y+k, l]

Slide credit: Karpathy and Fei-Fei
Differences From Lecture 4 Filtering
[Diagram: a 5×6 grid of input values I11…I56 with a 3×3 filter F11…F33 sliding over it]
(a) the number of input channels can be greater than one; (b) forget you learned the difference between convolution and cross-correlation (deep learning's "convolution" is cross-correlation: no filter flip).
Output[1,2] = I[1,2]·F[1,1] + I[1,3]·F[1,2] + … + I[3,4]·F[3,3]
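A direct (slow) transcription of the sliding-window sum: per (a), the input can have many channels; per (b), we don't flip the filter (cross-correlation). This sketch assumes stride 1 and no padding; conv2d is my own name.

```python
import numpy as np

def conv2d(I, F):
    # I: H x W x d input, F: Fh x Fw x d filter.
    # Output is (H - Fh + 1) x (W - Fw + 1): one number per window.
    H, W, d = I.shape
    Fh, Fw, _ = F.shape
    out = np.zeros((H - Fh + 1, W - Fw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(I[y:y + Fh, x:x + Fw, :] * F)
    return out

I = np.arange(5 * 6 * 3, dtype=float).reshape(5, 6, 3)  # 5x6 image, 3 channels
F = np.ones((3, 3, 3))                                  # 3x3x3 box filter
out = conv2d(I, F)   # spatial size (5-3+1) x (6-3+1) = 3 x 4
```

With an all-ones filter, each output entry is just the sum over its 3×3×3 window, which makes the result easy to verify by hand.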
Convnet
Input: 32×32×3; filter: 5×5. How big is the output? Height: 32 − 5 + 1 = 28. Width: 32 − 5 + 1 = 28. Channels: 1. One filter is not very useful by itself.
Slide credit: Karpathy and Fei-Fei
Multiple Filters
You’ve already seen this before Input: 400x600x1 Output: 400x600x2
Convnet
[Diagram: 32×32×3 input; 200 filters of size 5×5; output with depth 200 (the depth dimension)]
Multiple output channels via multiple filters. How big is the output? Height: 32 − 5 + 1 = 28. Width: 32 − 5 + 1 = 28. Channels: 200.
Slide credit: Karpathy and Fei-Fei
Convnet, Summarized
Neural net: a series of matrix multiplies parameterized by W, b, plus a nonlinearity/activation. Fit by gradient descent.
Convnet: a series of convolutions parameterized by F, b, plus a nonlinearity/activation. Fit by gradient descent.
One Additional Subtlety – Stride
[Diagram: a 7×7 grid of input values I11…I77 with a 3×3 filter F11…F33]
Warmup: how big is the output spatially? Normal (stride 1): 5×5 output.
Example credit: Karpathy and Fei-Fei
One Additional Subtlety – Stride
Stride: skip a few (here, 2). Normal (stride 1): 5×5 output.
Example credit: Karpathy and Fei-Fei
One Additional Subtlety – Stride
Stride: skip a few (here, 2). Normal (stride 1): 5×5 output. Stride 2 convolution: 3×3 output.
Example credit: Karpathy and Fei-Fei
One Additional Subtlety – Stride
What about stride 3? Normal (stride 1): 5×5 output. Stride 2 convolution: 3×3 output. Stride 3 convolution: doesn't work! ((7 − 3)/3 + 1 isn't an integer.)
Example credit: Karpathy and Fei-Fei
One Additional Subtlety
What happens at the edges? Options from the filtering lecture: pad/fill (add a value, often 0), symm (fold the sides over), circular/wrap (wrap around).
Zero padding is extremely common, although it is not the only option.
In General
Input: N×N. Filter: F×F. Stride: S. Output size: (N − F)/S + 1.
Slide credit: Karpathy and Fei-Fei
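The output-size rule as code: the fraction must come out to a whole number, otherwise the filter doesn't fit at that stride (as in the stride-3 example above). The function name output_size is my own.

```python
def output_size(N, F, S):
    # (N - F) / S + 1, valid only when the filter tiles the input.
    if (N - F) % S != 0:
        raise ValueError("stride doesn't fit")
    return (N - F) // S + 1

s1 = output_size(7, 3, 1)   # stride 1: 5x5
s2 = output_size(7, 3, 2)   # stride 2: 3x3
try:
    output_size(7, 3, 3)    # stride 3: doesn't work!
    stride3_fits = True
except ValueError:
    stride3_fits = False
```

The same function reproduces the 32×32 examples on the following slides: output_size(32, 5, 1) is 28 and output_size(32, 5, 3) is 10.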
More Examples
Input volume: 32x32x3 Receptive fields: 5x5, stride 1 Number of neurons: 5
Slide credit: Karpathy and Fei-Fei
Output size: (N − F)/S + 1
More Examples
Input volume: 32x32x3 Receptive fields: 5x5, stride 1 Number of neurons: 5 Output volume: (32 - 5) / 1 + 1 = 28, so: 28x28x5
Slide credit: Karpathy and Fei-Fei
Output size: (N − F)/S + 1
More Examples
Input volume: 32x32x3 Receptive fields: 5x5, stride 1 Number of neurons: 5 Output volume: (32 - 5) / 1 + 1 = 28, so: 28x28x5 How many parameters? 5x5x3x5 + 5 = 380
Slide credit: Karpathy and Fei-Fei
Output size: (N − F)/S + 1
More Examples
Input volume: 32x32x3 Receptive fields: 5x5, stride 3 Number of neurons: 5
Slide credit: Karpathy and Fei-Fei
Output size: (N − F)/S + 1
More Examples
Input volume: 32x32x3 Receptive fields: 5x5, stride 3 Number of neurons: 5 Output volume: (32 - 5) / 3 + 1 = 10, so: 10x10x5
Slide credit: Karpathy and Fei-Fei
More Examples
Input volume: 32x32x3 Receptive fields: 5x5, stride 3 Number of neurons: 5 Output volume: (32 - 5) / 3 + 1 = 10, so: 10x10x5 How many parameters? 5x5x3x5 + 5 = 380. Same!
Slide credit: Karpathy and Fei-Fei
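The "Same!" observation above can be checked directly: a conv layer's parameter count is one Fh × Fw × depth filter plus a bias, times the number of filters, and stride doesn't enter at all. The function name conv_layer_params is my own.

```python
def conv_layer_params(Fh, Fw, depth, num_filters):
    # Each filter: Fh*Fw*depth weights + 1 bias; stride is irrelevant.
    return (Fh * Fw * depth + 1) * num_filters

p = conv_layer_params(5, 5, 3, 5)   # the example above: 5x5x3 filters, 5 of them
```

Both the stride-1 and stride-3 examples give the same 380 parameters.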
Thought Problem
convnet?
Other Layers – Pooling
Idea: just want spatial resolution of activations / images smaller; applied per-channel
Max-pool, 2×2 filter, stride 2:

Input (4×4):      Output (2×2):
1 1 2 4           6 8
5 6 7 8           3 4
3 2 1 0
1 1 3 4
Slide credit: Karpathy and Fei-Fei
Other Layers – Pooling
Idea: just want spatial resolution of activations / images smaller; applied per-channel
Avg-pool, 2×2 filter, stride 2:

Input (4×4):      Output (2×2):
1 1 2 4           3.25 5.25
5 6 7 8           1.75 2.0
3 2 1 0
1 1 3 4
Slide credit: Karpathy and Fei-Fei
Other Layers – Pooling
Idea: just want spatial resolution of activations / images smaller; applied per-channel
[Diagram: 7×7 input I with a 3×3 window in the top-left corner]
Max-pool, 3×3 filter, stride 2: O11 = maximum value in the highlighted 3×3 window.
Other Layers – Pooling
Idea: just want spatial resolution of activations / images smaller; applied per-channel
[Diagram: the 3×3 window shifted right by the stride of 2]
O12 = maximum value in the shifted window.
Other Layers – Pooling
Idea: just want spatial resolution of activations / images smaller; applied per-channel
[Diagram: the 3×3 window shifted right again]
O13 = maximum value in that window.
Other Layers – Pooling
[Diagram: pooling applied per-channel: each channel of the 7×7 input I produces its own 3×3 output O]
Max-pool, 3×3 filter, stride 2.
Squeezing a Loaf of Bread
Max-pool, 2×2 filter, stride 2, squeezes the input down:

Input (4×4):      Output (2×2):
1 1 2 4           6 8
5 6 7 8           3 4
3 2 1 0
1 1 3 4
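The 2×2, stride-2 pooling examples can be written out directly. The input matrix below is the 4×4 example reconstructed from the (garbled) slides, and pool_2x2 is my own helper name.

```python
import numpy as np

X = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 1, 3, 4]], dtype=float)

def pool_2x2(X, op):
    # Apply op (e.g. np.max or np.mean) to each 2x2 block, stride 2.
    out = np.zeros((X.shape[0] // 2, X.shape[1] // 2))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = op(X[2 * i:2 * i + 2, 2 * j:2 * j + 2])
    return out

max_out = pool_2x2(X, np.max)    # [[6, 8], [3, 4]]
avg_out = pool_2x2(X, np.mean)   # [[3.25, 5.25], [1.75, 2.0]]
```

Swapping the op between np.max and np.mean reproduces both the max-pool and avg-pool slides.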
Example Network
Figure Credit: Karpathy and Fei-Fei; see http://cs231n.stanford.edu/
Suppose we want to convert a 32x32x3 image into a 10x1 vector of classification results
input: [32x32x3]
CONV with 10 3x3 filters, stride 1, pad 1 → [32x32x10]; new parameters: (3·3·3)·10 + 10 = 280
RELU
CONV with 10 3x3 filters, stride 1, pad 1 → [32x32x10]; new parameters: (3·3·10)·10 + 10 = 910
RELU
POOL with 2x2 filters, stride 2 → [16x16x10]; parameters: 0
Example Network
Previous output: [16x16x10]
CONV with 10 3x3 filters, stride 1, pad 1 → [16x16x10]; new parameters: (3·3·10)·10 + 10 = 910
RELU
CONV with 10 3x3 filters, stride 1, pad 1 → [16x16x10]; new parameters: (3·3·10)·10 + 10 = 910
RELU
POOL with 2x2 filters, stride 2 → [8x8x10]; parameters: 0
Slide credit: Karpathy and Fei-Fei
Example Network
Conv, ReLU, Conv, ReLU, Pool continues until the volume is [4x4x10]. A fully-connected (FC) layer maps it to 10 neurons (which are our class scores). Number of parameters: 10·4·4·10 + 10 = 1610. Done!
Slide credit: Karpathy and Fei-Fei
An Alternate Conclusion
Conv, ReLU, Conv, ReLU, Pool continues until the volume is [4x4x10]. Average-pool the 4x4x10 down to 10 neurons (one per channel), then an FC layer to 10 neurons (which are our class scores). Number of parameters: 10·10 + 10 = 110. Done!
Slide credit: Karpathy and Fei-Fei
Example Network
Figure Credit: Zeiler and Fergus, Visualizing and Understanding Convolutional Networks. ECCV 2014
Example Network
(1) filter the image with 96 7x7 filters; (2) ReLU; (3) 3x3 max pool with stride 2 (and contrast normalization, which is no longer used)
Figure Credit: Zeiler and Fergus, Visualizing and Understanding Convolutional Networks. ECCV 2014
What Do The Filters Represent?
Recall: filters are images and we can look at them
What Do The Filters Represent?
First-layer filters of a network trained to distinguish 1000 categories of objects. Remember: these filters operate over color (all three input channels).
Figure Credit: Karpathy and Fei-Fei
For the interested: Gabor filter
What Do The Filters Do?
CONV → ReLU → CONV → ReLU → POOL → CONV → ReLU → CONV → ReLU → POOL → CONV → ReLU → CONV → ReLU → POOL → FC (fully-connected)
Figure Credit: Karpathy and Fei-Fei; see http://cs231n.stanford.edu/